December 20, 2004
A way for Google to monetize article searching?
In "Google Magazine Search" Susan Kuchinskas explains how Google may be able to monetize its popular Google News service (and no doubt other Google services that deliver magazine content, like Google Scholar) without alienating publishers. According to Kuchinskas, "Google's problem with its News service is that there's no way to monetize it. News publishers would cry foul if it displayed contextual ads against their content, even if it is just headlines and openers."
U.S. Patent Application No. 20040122811, filed under the name "Method for searching media," gives some insight as to how Google could "offer pay-per-view documents, scanned documents with clickable ads and even the ability for print publishers to swap out ads in digital copies of their printed pages.
"There are two key elements of the patent: a method for executing a permission protocol so that the publisher could authorize Google to display more text from the relevant publication; and storing scanned versions of printed documents along with data sets representing the ads that went with them."
The creative aspect of the patent is that it allows Google to take a flexible approach to delivering publisher content to users. One route is ad-revenue sharing: publishers could use Google's permission protocol to deliver more content via referral from Google than they might through other sites, and share the revenue generated from the accompanying ads. The other is a straight pay-per-view option, with revenue shared between Google and the publishers.
Kuchinskas speculates that the patent application is a likely indicator that Google has already developed the technology to do all this.
December 16, 2004
Understanding the Tension between Libraries and Google
There's an excellent article by May Wong of the Canadian Press and Associated Press, which appeared in the December 15 issue of the Globe and Mail. "Google Move Could Commercialize Libraries" discusses the differences between libraries' goals for digitization (better access, free access) and Google's (unclear at the moment, but likely better access, pay-to-view).
Many library leaders wonder about who will win the war for eyeballs. "There is anxiety about whether the student researcher, scholar or citizen will be guided into the free public access rather than being lured into a purchasing relationship with the publisher," said Duane Webster, the executive director of the Association of Research Libraries.
December 14, 2004
Government Gazettes Online
Another excellent resource list from the University of Michigan Library -- Government Gazettes Online points users to links for government gazettes from around the world.
Cool Tool: Google Suggest
A nifty, simple auto-complete tool at Google, called Google Suggest, produces an automatic drop-down menu of possible choices drawn from Google's most popular searches. OK, good for spelling help, or for novice searchers looking for basic information. By the way, there seem to be some basic safe-search restrictions here -- I received no auto-complete options for a few typical roots.
What's behind Google's massive library digitization project?
Gary Price has provided the essential details of Google's just-announced project to begin rapid digitization of millions of out-of-copyright library materials. Although a handful of libraries (albeit major ones) are involved at this point, there is no doubt that many more will follow. (Google is footing the cost of much of this, so the offer is hard to pass up.)
What's behind this venture, and what type of business value can Google expect to see from this? Database size alone doesn't make for a sound business model, as revenue must follow, particularly in a newly-public company with shareholders looking for ever-increasing quarterly results.
One can see that all this free non-copyright full text content will enable Google to successfully market its now-way-bigger database as the source of a major portion of "serious" full text knowledge. That's an important first step that can allow Google to fully dominate the information market as the "first place to look" for ANY type of information. And following that, it will be much easier to convince trade publishers of commercial books to FULLY digitize their content AND make it indexed and available via Google Scholar with revenue generated from the shared sales of that for-fee material. (Remember that the trade content that's available through Google/Google Scholar right now is either just abstracts OR a few pages of the book, not the entire book.)
But book and article purchases alone probably aren't enough to produce the necessary revenue for Google, so I'm guessing that the company also factors a large increase in advertising revenue into this business model.
Why will so much new advertising accrue? Simple -- because all this new digitized content will be dumped into an already-too-huge Google database. Giant databases make information harder to find, not easier, particularly when the only finding methodology is keyword guessing and algorithmic ranking.
Huge databases often frustrate users who are looking for something beyond basic information and general opinion. That, plus the unpredictable ranking of search results, ends up convincing advertisers that the only reliable way to reach their customers is to buy keyword placement. Even subdividing the Google database into sections (which currently exists with Google Scholar, Google US Government, and other Google special searches) is not necessarily good for normal human searchers (too many places to look). But for Google, it's all good -- a bigger package of targeted, Google-branded real estate to hold ads, and more advertisers competing for a limited number of keywords.
Here's my holiday wish for those academics and librarians who are now busily sharing this hot "news" about this latest digitization project: Please, don't go so gaga over this new Google venture that you see it for more than it actually is -- simply a way for Google to further increase its database size, thereby maximizing its revenue potential over time.
December 07, 2004
Peter Jacso reviews Google Scholar (with feeling)
Peter Jacso of the University of Hawaii has written a long review of Google Scholar in the December issue of Péter's Digital Reference Shelf. Although most observant reviewers have noticed that Google Scholar has missed a lot of stuff that one would normally expect to find, Jacso conducts many more tests and finds Google Scholar lacking in pretty well every aspect -- content and relevance being the major gripes.
Evaluating New Web Search Tools
When I'm not teaching or writing, I spend a large chunk of time evaluating new web search tools for inclusion (or not!) in the Search Portfolio. I'm often struck by how many reviewers of web search tools seem to completely miss essential elements that affect search tool quality. Although much has been written on how to evaluate content-rich web sites, there is almost nothing about how to effectively evaluate new search tools.
I recently spoke about my methodology on web search tool evaluation at Internet Librarian 2004 in Monterey, and I'm working on an article which will be published in Spring 2005 on the same topic. I see web search tool evaluation as a multi-step process, which involves evaluating both functionality and features of the tool as well as the source of content that is delivered through the tool. Most evaluators of web search tools tend to restrict their discussion to issues of functionality, but I would argue that content informs the quality of the tool at least as much as the functional bells and whistles.
It's also critically important to compare new web search tools to others of the same type that currently exist on the free web. Even if you like the way a new search tool behaves, it's only upon comparison to other existing "best of breed" tools of a similar type that you can really adequately determine whether or not this new tool is worth adding to your search tool roster.
I recently set a group of my students (mainly librarians) to the task of evaluating a newly announced web search tool, DonBusca.com. This group was just over halfway through taking the online course Beyond Google: Searching Faster and Smarter on the Web offered through the partnership of Canadian library associations in cooperation with Workingfaster.com. They had already been exposed to many excellent search tools in the first half of the course.
Those who focused their evaluation on functionality and ignored content sources had more difficulty in objectively judging quality. Those who tested the functionality of DonBusca (which uses a form of clustering in search results) against other clustering meta-search tools of a similar type (like Clusty.com, for example) found notable differences in the quality of the clustering in these different search tools, which helped lead them to more objective conclusions. And those who dug deeply into the sources of content were able to make the best decisions, because they were able to assess the capacity of the search tool to deliver quality regardless of the functional capacity of the tool.
What did they think of DonBusca.com? Well, they liked the Thumbshots previews that accompanied some of the links (and so did I -- this is a handy preview feature for broadband users). They didn't like DonBusca's clustering capability as much as they liked Clusty's -- and they conducted side-by-side comparisons to examine the clustering results. (For example, here's a result in DonBusca of 360 degree feedback and the same search in Clusty). One student looked pretty carefully at the sources of content for DonBusca. She went through each source in turn (DonBusca passes queries to 7search, About.com, AOL, AskJeeves, Dmoz, Epilot, FindWhat, MSN, Netscape, Overture, Wisenut, and Yahoo). She determined that several of the partners are pure pay-for-placement, and that Netscape and AOL are basically Google searches, so they kind of cancelled each other out. Another student was concerned about the prominent placement of Wikipedia in search results, since as a school librarian she had met kids who deliberately put incorrect information into Wikipedia just to prove that they could.
Evaluating web search tools isn't easy, and this group of very capable students produced varying results. But for those who attempted to dig down "under the hood" and go beyond the "hmm, this is cool" conclusion, their evaluations proved more satisfying and ultimately more conclusive.
December 01, 2004
Recommended Resource: Who Named It?
Whonamedit.com lets you search for information about medical phenomena named for a person, known as eponyms. Search by person, or browse an alphabetical list (great if you can't remember the spelling of the name!). Each eponym entry includes a very detailed description, quotations, a bibliography (!) and links to more information on the persons involved.
Side-by-Side Measurement of Google Scholar vs. Publisher's own "native" search engines
Peter Jacso of the University of Hawaii has developed a simple but very effective tool to help web search evaluators measure the relative capabilities of Google Scholar against the publisher's own "native" search tools. Side-by-Side Native Search Engines vs. Google Scholar easily demonstrates that there are some limitations to Google Scholar's capacity to deliver results as good as the publisher's own native search tool.
From Jacso's commentary:
"Preliminary tests have shown that Google Scholar often retrieves far fewer unique items than the native search engines of the publishers. On the positive side, Google Scholar links to citing references if the document was cited by journals indexed in Google Scholar, and provides the immensely useful citedness score of the documents.
When Google Scholar has more "hits" for a query, they often turn out to be duplicates and triplicates (not always displayed adjacently) with a separate hit for the TOC entry, the abstract, the PDF file and (if available) the HTML file. Although their URLs are slightly different, they take you to the same spot in the archive."
This conclusion is consistent with similar tests that I ran on Google Scholar's capacity to retrieve PubMed records, compared with a search for the same records on the PubMed site itself. Not only does PubMed retrieve far more results than Google Scholar, but it also offers highly sophisticated native search logic that would never appear in Google. My search of the keywords asthma children in Google Scholar (with an added limit to retrieve only PubMed results) came up with approximately 14,400 records, as compared to the same search in PubMed, which retrieved over 23,000. I reported similarly disparate results earlier this year in a side-by-side comparison between Google and PubMed, titled "Just because it's indexed doesn't mean you'll find it."
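For a rough sense of scale, here is a minimal Python sketch that turns the two hit counts quoted above into a proportion. It assumes the approximate figures from this post and treats PubMed's "over 23,000" as a lower bound, so the result is an upper limit on Google Scholar's share. The function name `coverage_share` is just an illustrative helper, not part of any tool.

```python
def coverage_share(subset_hits: int, reference_hits: int) -> float:
    """Return subset_hits as a fraction of reference_hits."""
    if reference_hits <= 0:
        raise ValueError("reference_hits must be positive")
    return subset_hits / reference_hits


# Figures quoted in the post; both are approximate.
scholar_hits = 14_400  # Google Scholar, limited to PubMed results
pubmed_hits = 23_000   # PubMed itself ("over 23,000", so a lower bound)

share = coverage_share(scholar_hits, pubmed_hits)
print(f"Google Scholar returned at most about {share:.0%} of PubMed's count")
# prints: Google Scholar returned at most about 63% of PubMed's count
```

Bear in mind that reported hit counts are estimates, and that Jacso's commentary notes Google Scholar hits often include duplicates, so the share of unique records may well be lower than this figure suggests.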