My Experience with the Google Search Appliance

Google Search Appliance

by keif on May 7, 2010

For a couple of projects in the past year I’ve used the Google Search Appliance (from here on, the GSA). The GSA is a powerful tool: it’s like your own slice of Google, only you can customize it and, in theory, tweak the algorithm and manipulate your results. Think of it as an “SEO box.” To my understanding the algorithm isn’t identical to the one behind the public Google search, but in principle it works the same way.

The Versions of GSA

I’ve worked on a Google Mini, the Virtual GSA (version 5.0, no longer offered for download, for shame) and a 6.0 GSA. In terms of front-end coding, you edit the XSL/XSLT code to change the default appearance. It’s cool: you get the power of Google without the “branded” Google crap you get from the customized Google. So you end up with your own branded, somewhat tweakable Google box. Cool.
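To make the XSLT part a bit more concrete, here is a rough sketch of how a search request picks between styled HTML and raw XML. The parameter names (`q`, `site`, `client`, `output`, `proxystylesheet`) are the GSA search protocol parameters as I recall them, so check them against the documentation for your version. The host name, the collection and front-end names, and the `build_search_url` helper are placeholders made up for illustration.

```python
from urllib.parse import urlencode


def build_search_url(host, query, frontend=None, collection="default_collection"):
    """Build a GSA search URL (hypothetical helper; defaults are assumptions).

    With a front end given, the appliance is asked to run the results through
    that front end's XSLT and return HTML. Without one, it returns raw XML.
    """
    params = {
        "q": query,
        "site": collection,                       # which collection to search
        "client": frontend or "default_frontend",
        "output": "xml_no_dtd",
    }
    if frontend:
        # Name of the front end whose stylesheet should render the results.
        params["proxystylesheet"] = frontend
    return f"http://{host}/search?{urlencode(params)}"


# Styled HTML through a custom front end:
styled = build_search_url("gsa.example.com", "annual report", frontend="my_frontend")
# Raw XML, for code that wants to do its own rendering:
raw = build_search_url("gsa.example.com", "annual report")
```

The raw XML form is the one you want when something other than the GSA’s own stylesheet is going to display the results.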

What is Google?

Of course, Google is not Bing. It is not Yahoo. It is not a commerce engine – it is a relevancy machine. The sole purpose of a Google crawl is to gather documents, have you execute a query, and for Google to say:

“I think this is what you want?”

At least, that’s *MY* experience with the GSA. It has a capability referred to as “Results Biasing,” which allows you to “influence” the results. My issue with this is that you can make changes, but without arduous testing you are never sure how long it’ll take for the changed results to show up.

You can force a recrawl, but it seems the GSA “caches” the prior results for roughly 15 minutes. That means tweaking your results involves retesting every 15 minutes, and even with results biasing, you may still be out of luck. It’s a crapshoot. A coworker and I spent several hours manipulating and testing results, only to eventually bump one result up one slot, and we were working against a tightly knit test set: we basically had 500 pages that were pure keywords plus some other custom attributes, to be used through a customized “Search As You Type” feature.
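If you end up in the same tweak, wait, retest loop, it helps to record the result order before and after each change instead of eyeballing it. A minimal sketch, assuming you already have each run’s results as an ordered list of URLs (the `rank_of` and `rank_changes` helpers are made up for illustration, not anything the GSA provides):

```python
def rank_of(url, results):
    """Return the 1-based position of url in an ordered result list, or None."""
    try:
        return results.index(url) + 1
    except ValueError:
        return None


def rank_changes(before, after):
    """Map each URL whose position moved to a tuple of (old_rank, new_rank)."""
    changes = {}
    for url in set(before) | set(after):
        old, new = rank_of(url, before), rank_of(url, after)
        if old != new:
            changes[url] = (old, new)
    return changes


# Results captured before a biasing change, and again once the cache has
# had time to expire (roughly 15 minutes in our experience).
before = ["/widgets/blue", "/widgets/red", "/widgets/green"]
after = ["/widgets/red", "/widgets/blue", "/widgets/green"]
moved = rank_changes(before, after)
```

An empty result after a change means either the change did nothing or you are still looking at cached results, which is exactly the ambiguity that made this so tedious.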

Search As You Type

Yes, this was cool. A mix of plain JavaScript and PHP that essentially created a JSON result of your Google Search query. It’s a very cool project, and it allowed for some unusual experimentation with the results of a Google query. I can’t delve into it too much, but we reshaped the data to present the results in a different display, for a more “spread” experience and more relevant results. Very cool.
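I can’t share the real code, but the general shape of the XML-to-JSON step looks something like the sketch below. It is written in Python rather than the PHP we actually used, and the sample XML is hand-written to resemble the GSA’s XML output as I remember it (`R` for a result, `U` for the URL, `T` for the title, `S` for the snippet, `MT` for meta tags). Treat the element names, the sample data and the `results_to_json` helper as assumptions to verify against your own appliance.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written sample that imitates the shape of a GSA XML response.
SAMPLE_XML = """<GSP VER="3.2">
  <Q>widgets</Q>
  <RES SN="1" EN="2">
    <R N="1">
      <U>http://www.example.com/blue-widget</U>
      <T>Blue Widget</T>
      <S>A blue widget.</S>
      <MT N="category" V="widgets"/>
    </R>
    <R N="2">
      <U>http://www.example.com/red-widget</U>
      <T>Red Widget</T>
      <S>A red widget.</S>
    </R>
  </RES>
</GSP>"""


def results_to_json(xml_text):
    """Flatten the result elements into a JSON string for the front end."""
    root = ET.fromstring(xml_text)
    results = []
    for r in root.iter("R"):
        results.append({
            "rank": int(r.get("N", 0)),
            "url": r.findtext("U", default=""),
            "title": r.findtext("T", default=""),
            "snippet": r.findtext("S", default=""),
            # Custom attributes arrive as name/value pairs on MT elements.
            "meta": {mt.get("N"): mt.get("V") for mt in r.findall("MT")},
        })
    return json.dumps({"query": root.findtext("Q", default=""), "results": results})


payload = results_to_json(SAMPLE_XML)
```

Once the results are plain JSON, the JavaScript side is free to group and display them however it likes, which is where the custom attributes earned their keep.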

The Lessons Learned

I think the most difficult thing about the work was digging through the documentation. It was somewhat inaccurate at times, and very often some important, pertinent information was buried in a single sentence, in some random blurb. Like meta tag length limits (320 characters). Agitating, but we were able to rewrite our original meta tag code (to be better, in my opinion) to generate structural details in the PHP vs. including them in the meta data (it was a prototype stage, okay?). The only problem was the keywords were still being stuffed: way too many repeats and useless character phrases. This is a matter of educating the client, and clarifying that the GSA was not an “easy to manipulate” set of data. It was a complex beast, and the results of ANY kind of manipulation should be taken with a grain of salt.
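For the keyword stuffing, the kind of cleanup I have in mind is simple: drop the repeats and stay under the length limit. Here is a minimal sketch in Python (our actual code was PHP). The 320 figure is the limit mentioned above, the assumption that it applies to the keyword string as a whole is mine, and `clean_keywords` is a made-up helper.

```python
META_LIMIT = 320  # character limit we ran into; confirm it for your GSA version


def clean_keywords(raw, limit=META_LIMIT):
    """De-duplicate a comma-separated keyword string and keep it under limit."""
    seen = set()
    kept = []
    length = 0
    for word in raw.split(","):
        word = " ".join(word.split())  # trim and collapse inner whitespace
        key = word.lower()             # treat "Widget" and "widget" as repeats
        if not word or key in seen:
            continue
        extra = len(word) + (2 if kept else 0)  # 2 accounts for the ", " joiner
        if length + extra > limit:
            break  # stop at whole keywords instead of cutting one in half
        seen.add(key)
        kept.append(word)
        length += extra
    return ", ".join(kept)


cleaned = clean_keywords("widget, Widget,  blue   widget, , gadget")
```

Stopping at a whole keyword is a deliberate choice: a keyword chopped off mid-word at the limit is just another useless character phrase.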

By no means is this article definitive or, to be 100% honest, totally accurate. I spent many emails and phone calls with Google Support (which was a decent experience, I’d dare say surprisingly so) to reach many of our conclusions about workarounds and test data issues. Along the way we uncovered new information (little nuggets of wisdom) hidden in the documentation, and even a few inconsistencies in the documentation (from 2009) in terms of how the GSA was handling result diagnostics.

It’s a powerful tool, but it’s not EXTREMELY customizable to fit every situation. I wish they had more “virtual” instances (like the VGSA they had) of all their products. I’d love to really see the full capabilities without having to drop a ton of cash just to play with some hardware.

  • Matt Sidesinger

    Thanks for the post Keith. Hopefully, I will never have to use this information.

  • keif

    I've got a follow up planned with more details.
