July 30, 2004
Article on Search Engine Gigablast
There's a very interesting article on the independent search engine Gigablast by Canadian Internet consultant Gwen Harris, in this month's issue of Information Highways. Harris covers a bit of background on Gigablast's owner, Matt Wells (he's formerly of Infospace), and the business model (he wants to sell the technology rather than ads), plus an overview of one of Gigablast's most interesting features, Gigabits, which is used to show related concepts to a previously executed search.
Search Engine Comparison/Relationship Charts
Librarian Diana Botluk has produced a Search Engine Comparison Chart in the latest issue of LLRX.com. Botluk covers AlltheWeb, AltaVista, Google, Lycos, MSN, Teoma, Wisenut, and Yahoo. Gigablast, an important independent search engine, is absent from the list.
Although Botluk's chart is similar in style to the one that Greg Notess has maintained for several years as part of Search Engine Showdown, Botluk has focused on the major functions of each search tool rather than the databases that actually feed into these search engines. Notess, unlike Botluk, includes Gigablast. He chose not to include AltaVista, Lycos or AlltheWeb as separate entities, probably because they are all now fed by Yahoo!'s Inktomi engine plus paid listings from other sources.
The databases that serve results to search engines are at least as important as the functionality and features of each engine. Readers are advised to consider findings from both Botluk's and Notess's charts, and also to keep aware of the ever-changing feeds that provide each engine's content. There's a good (and frequently updated) chart at Search Engine Watch: the latest version is dated July 23, 2004.
July 29, 2004
ConsumerWebWatch weighs in on online lawyer directories
A newly released report, Law and Disorder: The Complicated Online Search for Lawyers, evaluates several lawyer directories online and, as expected, finds some are better than others. From the abstract:
"Consumers searching for a local lawyer may find little more than advertising-based listings and nothing resembling thoughtful advice. Some sites, like FindLaw.com, are legitimate, ad-supported directories. But others, such as TheBestLegalServices.com, collect personal information and fail to disclose who they are or where they're sending the data. The result can be unwanted phone or e-mail contact from any number of law firms, and with no way to stop it."
July 23, 2004
How to Search Google from your IE address bar
I've reviewed several articles lately on how to change Internet Explorer's various search defaults so that Google, rather than MSN, becomes the default search engine in IE.
Almost all of the articles suggest registry changes, which is fine if you're comfortable changing the registry, but most of us would never touch the thing.
Happily, there is a fix: If all you want to do is change your default address bar search engine to Google, you do not need to edit the registry. You can make Google your default address bar search engine by clicking the Search button on the top toolbar of IE, then clicking Customize, and choosing Autosearch Settings at the bottom of the box.
For those comfortable with registry changes, here are some reference articles, but readers should be aware that I haven't tested the instructions provided and can't vouch for their effectiveness.
http://pubs.logicalexpressions.com/Pub0009/LPMArticle.asp?ID=87
http://www.google.com/options/defaults.html
July 19, 2004
More on Yahoo! and Google's Inclusion of WorldCat records
There has been more news this month of Yahoo!'s inclusion of Worldcat records (Google already has them) in its database.
This is interesting because it illustrates some real differences in how Yahoo! and Google currently rank and sort results.
As a test to see if the Worldcat records for a book would come up during an average search, I selected the book Your Guide to Passing the AMP Real Estate Exam by Joyce Bea Sterling (Real Estate Education Co., 2000) which is one of the Worldcat records captured by both Google and Yahoo. I chose the title because it was recent, because users looking to pass the exam could conceivably use Google to help them, and because the word selections for searching would be fairly obvious (amp real estate exam).
I typed the query amp real estate exam into Google (without any punctuation or double quotation marks). As I expected, Google's algorithmic ranking and sorting methods, which prefer popular web pages (as opposed to Worldcat's obscure and rarely linked documents), delivered lots of links, including lots of links to booksellers selling this and similar books, but in the first 10 pages of results, there was no link to the Worldcat record for this book.
I did the same in Yahoo - typed in the query amp real estate exam. The results were dramatically different. There, on the first page of Yahoo results, was the Worldcat record.
What does this mean? Well, it provides an illustration of how Yahoo's ranking and sorting algorithms are different from Google's. Neither better nor worse, just different. It may also mean that, at least for a while, Yahoo may have given preferential treatment to the worldcatlibraries.org domain. We don't know for sure; we just know that the domain seems to rank higher in Yahoo's results than in Google's for a similar search.
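For readers who want to repeat this kind of comparison with their own queries, the check itself is easy to sketch in Python: given the result URLs in the order an engine returned them, find where a domain first appears and which results page that falls on. This is only an illustration. The function name, the page size of 10, and both result lists are my assumptions, and the URLs are made-up placeholders rather than real output from Google or Yahoo.

```python
from urllib.parse import urlparse


def first_rank_of_domain(result_urls, domain, page_size=10):
    """Return (position, page) of the first result hosted on `domain`, or None.

    Positions and pages are 1-based. `page_size` is an assumed number of
    results per page.
    """
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself or any subdomain such as "www."
        if host == domain or host.endswith("." + domain):
            page = (position - 1) // page_size + 1
            return position, page
    return None


# Hypothetical placeholder lists, NOT real search results.
engine_a = [
    "http://bookseller.example.com/amp-exam-guide",
    "http://www.example.org/real-estate-exam-tips",
    "http://shop.example.net/exam-prep",
]
engine_b = [
    "http://www.example.org/real-estate-exam-tips",
    "http://www.worldcatlibraries.org/placeholder-record",
    "http://bookseller.example.com/amp-exam-guide",
]

for name, results in (("engine_a", engine_a), ("engine_b", engine_b)):
    print(name, first_rank_of_domain(results, "worldcatlibraries.org"))
```

A result of None simply means the domain did not appear in the results that were collected; it says nothing about whether the record is indexed.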
Clearly Google hasn't given preference to the worldcatlibraries.org domain (not yet, anyhow), but that doesn't mean that its results aren't just as -- or more -- relevant. I've always had a problem with domain preference decisions by the search engines (who are they to judge quality anyhow?) so if anything, the moral of the story is to continue to use multiple search engines.
I'm puzzled at the very positive response by most information professionals to these announcements of database dumps into search engines. In Searcher, Barbara Quint recently quoted several ecstatic responses to these announcements from people who are usually a lot more measured in their opinions. On the other hand, Gary Price of Resourceshelf.com provided considerably more balanced views on the topic.
Although click-throughs are way up at OCLC through these search engine links to Worldcat records, users will often fail to find these records unless they know that they want them. Sure, if I had added the keyword library or worldcat to my search string in Google, I would have found the Worldcat record for the Sterling book on the first page of results. But who would ever think of doing that when they don't know exactly what they want?
(See my posting, Just Because It's Indexed Doesn't Mean You'll Find It for another example, this one with PubMed records in Google)
July 14, 2004
Some Cautionary Notes on Vivisimo
In a recent issue of Resourceshelf.com, I spotted a link to a Pittsburgh Business Times article on Vivisimo, a popular meta-search engine, and on the profitability of its public "test bed" site, which demonstrates Vivisimo's clustering technology. Profitability, they say? Time to take another look at Vivisimo's public meta-search engine.
At the heart of Vivisimo's popularity is its excellent clustering technology, which is also used to facilitate targeted search in many other online products. Raves about Vivisimo's clustering have brought many users to its public meta-search site.
But a closer look at the underlying databases used by Vivisimo shows it to be a substandard meta-search tool for serious searchers. Its default web search databases (MSN, Lycos, Looksmart, Wisenut, Open Directory, and Overture) are generally agreed to be less-than-stellar choices in their respective categories. Overture and Looksmart are almost exclusively pay-for-placement products. Lycos is now principally Yahoo's Inktomi database with added sponsored links; the Open Directory is generally agreed to be an occasionally useful directory, but crowded with commercial content because of its preferential treatment in Google's algorithms. Wisenut is owned by Looksmart, and according to Search Engine Showdown, it has one of the smallest databases of all the spidered search engines.
Conclusion? No wonder Vivisimo is boasting of profitability -- most of its source database partners are pay-for players. According to the article, Vivisimo earns 35% of its revenue from paid placement and advertising on its public web site.
Vivisimo's story isn't really new -- many search engines (including Google and the original Altavista) have in the past used their public web search utilities as test beds to promote their technology, only to soon discover that there was more money in search than in selling the technology outright.
I like Vivisimo's clustering technology a lot. But it's important for serious searchers to understand that even great technology will produce poor results if the underlying databases aren't good. In Vivisimo's case, paid content in, (clustered) paid content out.
Get Journal Contents by RSS Thru Ingenta Connect
Ingenta has launched a new service, IngentaConnect, which enables users to keep track of new contents of journal titles through an RSS feed. For those readers using a news aggregator, this is a great way to keep track of contents of new journal issues. Users can search by journal title, then click on the RSS button to display the feed for new issues of the journal. The RSS web URL can then be inserted into any aggregator, or fed to other RSS-friendly sources.
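As an example of feeding such a feed to "other RSS-friendly sources", here is a minimal Python sketch that pulls the title and link out of each item in a feed. It is an illustration under stated assumptions, not a description of how IngentaConnect's feeds are actually laid out: the sample feed, journal name, and URLs are made-up placeholders, and I am assuming only that each entry is an item element with title and link children. Tags are matched by local name so that namespaced feeds are handled the same way.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def list_feed_items(feed_xml):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        # Collect the text of each direct child, keyed by its local tag name.
        fields = {
            local_name(child.tag): (child.text or "").strip()
            for child in element
        }
        items.append((fields.get("title", ""), fields.get("link", "")))
    return items


# Hypothetical placeholder feed, NOT a real IngentaConnect feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Journal (placeholder)</title>
    <item>
      <title>Placeholder article one</title>
      <link>http://journal.example.com/articles/1</link>
    </item>
    <item>
      <title>Placeholder article two</title>
      <link>http://journal.example.com/articles/2</link>
    </item>
  </channel>
</rss>"""

for title, link in list_feed_items(SAMPLE_FEED):
    print(title, "-", link)
```

To use it on a live feed, you would fetch the feed URL shown by the RSS button and pass the downloaded text to list_feed_items in place of SAMPLE_FEED.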
July 11, 2004
The Nature of Meaning in the Age of Google
In the April 2004 issue of Information Research, Terrence Brooks of the University of Washington's Information School wrote a very interesting article on how Google's algorithmic method of indexing has created a new culture of "lay indexing" with a variety of consequences for serious information seekers.
Brooks notes that in the age of Google, the nature of meaning has changed, moving from trust in the expertise of a few to reliance on the aggregation of many opinions, including many uninformed ones. The article is full of interesting insights, including thoughts on the competition between effective algorithmic measures of meaning and optimization attempts to distort it; Google's bias against obscure web sites; and much more.
July 07, 2004
2003 National Survey of Information Technology in US Higher Education
The Campus Computing Project is an important, ongoing compilation of information on the state of university and college computing issues. Started in 1990, the project surveys over 600 two- and four-year public and private colleges and universities in the United States, and publishes the results.
A summary of the 2003 national survey is available in PDF format. As expected, the report shows substantial growth in campus wireless access, policies to control illegal downloading of music and video, and the increasing presence of campus portals.