September 19, 2008
Searching the Invisible - Advances in Video and Audio Search

The Iceberg’s Tip
Since their inception, search engines have relied on the visible to locate relevant content. Visible text, to be precise. And not just any visible text, either - the text had to be accessible and readable to a web crawler. That is, it couldn’t be inside an animation, image, script, video, or a wide assortment of other file formats. It couldn’t be stored in a ‘deep web’ database, such as those run by the CDC or USGS, that can be reached only via an active user query. And it certainly couldn’t be spoken.
This meant that, for all the mind-blowing number of visible pages out there on the web, only a tiny fraction of the available content has ever been indexable and searchable by search engines. In 2001 the company BrightPlanet estimated in its white paper “The Deep Web: Surfacing Hidden Value” that public search engines made only 0.03% of the total web content available to searchers. That’s tiny. And this estimate didn’t even consider content hidden in images, audio files, and video. Like a giant iceberg of data and content, the majority of the web remained - and still remains - invisible to search. But this is all changing.
Making sense out of sound
Google’s beta release of Gaudi, its audio indexing tool, heralds to the wider public a profound shift in the search environment. Why? Hasn’t audio search been around for years now? Actually, no. Not this way. (Read the full article…)











