August 31, 2005
Not-So-Smart Answers at AskJeeves
Gary Price reported the latest additions to AskJeeves "Smart Answers" in his August 22 Search Engine Watch blog. Although they expand the existing Jeeves collection of sources pre-selected to handle many common factual queries, they remain, like so many other "answer" capabilities of the major search engines, pretty mundane. Search Portfolio's research team tested many popular topics (e.g., marijuana, botox) to see if Smart Answers would deliver pre-selected content, and it didn't. Surely those sample searches are as common as one for burkina faso, which in AJ turns up the CIA World Factbook through Smart Answers. Nothing new here: all the major search engines would turn up the CIA WF in the first 10 hits for the same query.
AJ's Smart Answers harkens back to the early days of AskJeeves, when it differentiated itself from other search engines by matching simple queries for common questions against a set of pre-determined web sites whose content could answer the "question". Eventually, the site morphed into a meta-search engine, and then, with the integration of Teoma, became a more conventional search-engine-with-benefits. There is clearly little new here, and not enough of real value to recommend this over other answer engines.
August 26, 2005
More Silly Search Engine Size Stories
Since Yahoo disclosed the jump of its index size to just over 19 billion (!) documents, I've been following a series of interesting posts at the Technologie du Langage blog from Jean Véronis, professor of Information and Technology at the University of Provence. In great detail (and in English), Véronis recounts, with good link references, the index-size story starting with Yahoo's announcement. He then systematically and persuasively refutes both the allegations of database size and the research methodology of a US study comparing database sizes of Google and Yahoo.
On the US study, Véronis concludes, "I find it amazing how quickly such a flawed study could be quoted with so much excitement all over the blogosphere and even make its way to the respectable New York Times." Those of us who are used to the republication as "news" of unverified company press releases are, sadly, not so surprised.
Although most of Véronis's posts at Technologie du Langage are in French, the blog is an outstanding (and rare) source of competent criticism of search engines, and deserves to be in the RSS feeds of serious web-watchers.
August 22, 2005
Google Tests "Commercial" Listings in "Organic" Search Results
From August 19's Clickz, an article on Google's testing of commercial listings in the 6th-8th position in "pure" search results. I replicated the test with the keywords in question and the results are clearly visible. Interestingly, what isn't at all obvious is that the results are commercial/sponsored/paid in nature. The only apparent difference is the line above and below the commercial results, and the absence of either the CACHED or SIMILAR PAGES links that usually accompany Google results.
I hate to say I told you so, but I predicted that Google would cross the line into paid-into-pure integration as early as 2003, when rumours began to surface about a possible public offering of Google. This appears to be the first public indication that the company is seriously testing the waters.
August 17, 2005
Data Mining Primer
From the US government, Data Mining: An Overview is a short primer for those wanting to understand what data mining is all about. It is written by Jeffrey Seifert, an information analyst at the US Congressional Research Service. In PDF format.
August 04, 2005
Interesting Tool: Copyscape
Plagiarist alert! There's an interesting online tool to help those of you who want to track those who have lifted content off your website. Copyscape uses Google API technology to identify distinctive sentences and phrases from your site and then sniffs around for other sites that use the same or similar phrases. Although Copyscape tends to sniff out a lot of blogs (probably because bloggers copy or paraphrase stuff a lot from other sources), this is a great way to track unapproved uses of your web site content. The folks who make Copyscape also produce Google Alert.
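For the curious, here is a rough idea of what phrase matching of this kind can look like. This is only a sketch and not Copyscape's actual method, which isn't documented here: it assumes that "distinctive phrases" can be approximated by overlapping runs of five words, and the function names `shingles` and `overlap` are made up for illustration. It skips the search step entirely and simply compares two pieces of text you already have in hand.

```python
import re


def shingles(text, size=5):
    """Return the set of lowercase word runs ("shingles") of the given length."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(original, candidate, size=5):
    """Fraction of the original's shingles that also appear in the candidate.

    Hypothetical helper: 0.0 means no shared phrases, 1.0 means every
    phrase of the original turns up in the candidate text.
    """
    source = shingles(original, size)
    if not source:
        # Text shorter than one shingle: nothing to compare.
        return 0.0
    return len(source & shingles(candidate, size)) / len(source)
```

A score near 1.0 suggests the candidate page reproduces most of your wording; paraphrases score lower because changing even one word breaks every five-word run that contains it.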
August 02, 2005
An Info Pro's View on Yahoo Search Subscriptions
In "Searching More of the Opaque Web," Mary Ellen Bates provides an excellent overview of the relative merits of Yahoo! Search Subscriptions, Yahoo's new (and still fairly modest) service selling low-cost journal and news articles from a small group of sources such as Consumer Reports, New England Journal of Medicine, Wall Street Journal, Lexis-Nexis and Factiva.
Bates reminds us that the service doesn't allow for comprehensive coverage: it only allows searching of a subset of each journal/service's full text content, and the focus is on recent items.