
February 22, 2005

Full text, part II

As Tilly points out in the comments on yesterday's entry on this topic, Jean Véronis's Technologies du Langage blog covers the problem in more detail. In a recent entry Véronis demonstrates some amusing bugs in how Google references web pages, but suggests the crux of the matter lies deeper:

Now, Google's great weakness is precisely its lack of researchers in the field of language processing. An analysis of its researchers' areas of expertise, through their CVs and their publications, reveals an almost total absence of expertise in this field. Such expertise does exist among the developers of small search engines (notably in France), but the little Davids seem quite weak next to the great Googliath.

(Source: Référencement: Drôlement verni !)

Interesting point, and a good reason to keep alternative engines alive. But going back to Jeanneney's plea: in getting the full text onto the web, are the Davids really in competition with the Googliath? Once the text is out there, will it really matter that Google makes a hash of referencing the content?

Granted, there won't necessarily be buckets of stock options, $3M bonuses, and 1000 new millionaires when software publishers do quality work rather than focusing first on advertising revenue. (Unless David sells out to Goliath in return for quickly vesting options.) But fast bucks don't seem to be the goal. In the case brought forward by Jeanneney, we're talking about the availability of texts selected by European librarians. The problems are separate. The solutions may be as well.

Posted by Mark at February 22, 2005 09:44 PM