April 29, 2003
Why Citation Errors Perpetuate
In the December 12, 2002 issue of Nature Magazine, Philip Ball explores why identical article citation errors seem to perpetuate over time. The research suggests that this results from "lazy citation" -- a case of authors not actually reading the article that is being cited, but simply copying a previous citation with the original errors intact.
The numbers are striking. Based on the number of distinct misprints tracked, only 22–23% of citations followed from a reading of the original paper.
April 25, 2003
The Trouble with Meta-Search
Consumer Web Watch columnist Angela Gunn's article, "In Search of Disclosure," explains how paid-for sites get mixed up with "real" search results in meta-search tools like Dogpile, Metacrawler, Mamma and many more. The implications may be inconsequential for many searches, but for information in which accuracy and specificity are essential -- health information, for instance -- a meta-search tool would be a poor choice.
Gunn also reminds readers to pay close attention to links on meta-search sites titled "About Results," "About Search," or something similar. These links will usually indicate which source tools provided the link -- and a few clicks on those links should provide clues on which search tools deliver paid listings.
April 24, 2003
More from "Golden Search"
The nice people at U.S. Bancorp Piper Jaffray sent me the full 90-page report Golden Search today. This is required reading for anyone who searches at an advanced level or who teaches end users how to use search engines.
Some fascinating tidbits from my first reading:
- Together Google, Yahoo!, MSN, and AOL have greater than 80% market share of search, with Google running at almost half of that, at 34%.
- Total searches done: almost 550 million per day worldwide; 245 million per day in the United States
- What do people search on: 65% on information and reference; 15% on commerce-related searches; 20% on entertainment-related searches - and of that total, the report estimates that 35% of all searches could be commercial in nature
- Great improvements in "search monetization" (which translates as making money from search technology improvements) will happen in the coming years to deliver eyeballs to advertisers at the moment they are actively looking to purchase something.
- Expect Yahoo! to switch its search engine partner from Google to Inktomi, imminently, as Yahoo's purchase of Inktomi is finalized
- Partnerships are everywhere, usually in sets of three (algorithmic search + paid inclusion + paid listings) and the number of players is relatively small (which explains why you see the same search results coming up in different places) -- examples include Lycos+Overture+AlltheWeb; AskJeeves+Teoma+Google; MSN+Looksmart+Overture
Rashtchy's full report, The Golden Search, is available free with registration through Multex Investor (U.S. addresses are required) and is also available to Investext Plus subscribers.
5 Key Trends to Shape the Future of the Search Industry
As an amateur (and largely unsuccessful) stock picker, I'm fascinated by how differently information seekers see web search as compared to search business professionals. Information seekers think of web search sites as helpful tools, but web search specialists know that most commercial web search tools are in the business of growing a customer base. The business of search engines has little to do with searching and everything to do with revenue growth.
Analyst Safa Rashtchy tracks search companies for investor clients of U.S. Bancorp Piper Jaffray. In a March 20, 2003 press release from U.S. Bancorp Piper Jaffray, he identified five key trends that he believes will shape the future of the search industry.
Search Capitalism - Overturism, or the idea of paid search as a market-driven customer acquisition vehicle
Googlism - increased importance of relevance and a race to provide the best search experience,
Globalism - the increased importance of international markets and its impact on the partnerships among search companies,
Elitism - concentration of search among key destinations and the increasing importance of branded destinations, and
Realism - the next phase in search: in-context search.
Rashtchy asserts that the industry represents a major growth market and will grow in excess of 35% per year. In addition, he believes that the key driver of growth is "the increased popularity of search as the most efficient way to find products and information, and simultaneously the rise of search as the best way for advertisers to find and acquire customers." Rashtchy sees Overture and MSN as key companies to watch in this area.
Rashtchy's full report, The Golden Search, is available free with registration through Multex Investor (U.S. addresses are required) and is also available to Investext Plus subscribers.
Phrase Your Question as the Answer
In an interview with Greg Kline of the Champaign News-Gazette, Craig Silverstein, technology director of Google, suggests that web searchers looking for answers “always phrase [the] query in the form of an answer.” So that means if you're looking for the capital of Iowa, you might search using the phrase "the capital of iowa is" and expect to retrieve pages that have that phrase -- followed by the answer.
Most of us search uncritically in search engines using keywords that match the subject of our search. But search engines don't search for subjects, they just search for patterns of words on pages.
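To make the pattern-matching point concrete, here is a minimal Python sketch. The sample pages and the helper name find_answers are invented for illustration, and a real search engine works against an index rather than a short list of strings, so treat this as a toy model of the idea and not as how Google does it.

```python
import re

# Hypothetical page texts standing in for a search engine's database.
PAGES = [
    "Iowa travel guide: corn, state fairs, and covered bridges.",
    "Quick facts: the capital of Iowa is Des Moines, on the Des Moines River.",
    "What is the capital of Iowa? Ask your teacher!",
]


def find_answers(answer_phrase, pages):
    """Return the words that follow answer_phrase on each page that contains it."""
    # Match the exact phrase (ignoring case), then capture text up to the
    # next punctuation mark.
    pattern = re.compile(re.escape(answer_phrase) + r"\s+([^.,;!?]+)", re.IGNORECASE)
    hits = []
    for text in pages:
        match = pattern.search(text)
        if match:
            hits.append(match.group(1).strip())
    return hits


print(find_answers("the capital of iowa is", PAGES))
```

Only the second page contains the full answer phrase, so it is the only one that yields a hit. The first and third pages are "about" the same subject, but they do not contain the pattern of words being searched.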
Read the full article
April 23, 2003
RCLS DeskRef - Gone!
DeskRef is gone. Produced by the Ramapo Catskill Library system, it was one of my favorite portals to quick reference sources.
Gary Price reminded me that you can still see parts of DeskRef through the Wayback Machine's archived files.
In our web search classes at Workingfaster.com, we would have a race to find information (an easy way to pick up the pace at that mid-day slump!). I would divide the class in half, asking one half to use search engines and bookmarks to find answers to quick reference questions and the other half to use DeskRef. The difference in the results was dramatic -- DeskRef questions were often answered in 30 seconds or less, while the search engine users were still pointing and clicking long afterward on most questions, even those where the "answer" could conceivably appear in the first page of search engine results.
I continue to believe that libraries and governments have to take a leadership role in building and promoting high quality free portal sites like DeskRef -- the .com world won't step in and do it systematically or for the long term. Even within the professional searching and library communities, the value of these selective, quality-filtered tools is grossly underestimated. In time they will be sorely missed.
April 22, 2003
Tips for Yahoo! Users
Yahoo!'s new look focuses this portal more than ever on white/yellow pages services, shopping, and searching. Serious web users who still rely on Yahoo! for its information sources are probably better served by starting at Yahoo's directory page rather than the cluttered main Yahoo page.
For web site selectors who are interested in finding new resources, the Yahoo directory page has handy links to resources added to the directory during the past week by Yahoo's staff. A quick view of these What's New lists shows that Yahoo! still favors business sites: the vast majority of added resources fall into either the business or regional (local business) categories.
April 18, 2003
Search Engine Robot Simulator
The Sim Spider Search Engine Robot Simulator is a spider that simulates what search engine robots read from your website. Readers can input a web page URL and visualize the links that will be spidered, the "word dump" that will go into the database, and keyword density analysis for each page. This is a highly illustrative example of the difference between the page you see on the screen and the content that actually ends up in the search engine's database.
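For readers who want to see the idea in miniature, the following Python sketch pulls the same three things out of a page: the links, the "word dump," and a keyword density figure. It is not the Sim Spider tool's code. The class name SpiderView, the helper keyword_density, and the sample page are all invented here, and real robots apply many rules this sketch ignores.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class SpiderView(HTMLParser):
    """Collects the links and visible words a very simple robot might record."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not inside a script or style block.
        if not self._skip_depth:
            self.words.extend(re.findall(r"[a-z0-9']+", data.lower()))


def keyword_density(words):
    """Map each word to its share of all words on the page."""
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()} if total else {}


# A made-up page used only for this illustration.
SAMPLE = """<html><head><title>Left-handed scissors</title>
<style>body { color: black; }</style></head>
<body><h1>Scissors for lefties</h1>
<p>Our scissors are made for left-handed people.</p>
<a href="/catalog.html">Catalog</a> <a href="/about.html">About</a>
</body></html>"""

spider = SpiderView()
spider.feed(SAMPLE)
density = keyword_density(spider.words)
print(spider.links)
print(spider.words)
print(density["scissors"])
```

Note what is missing from the word dump: the layout, the styling, and anything else that is not plain text or a link. That gap is the difference between the page on the screen and the page in the database.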
April 17, 2003
Trends in the Evolution of the Public Web
An interesting study in D-Lib Magazine by researchers at OCLC, Trends in the Evolution of the Public Web, suggests that 1) growth in the number of web sites has reached a plateau and actually shrank slightly last year; 2) globalization of the public web continues to be a myth, as web content is dominated by English-language content originating in the U.S. with no sign that this dominance may be shifting; 3) there is little if any progress toward adoption of formal metadata schemes for public Web resources.
All About Search Indexing Robots and Spiders
Good searchers seek to understand the nature and content of the database that they are searching. Understanding how content "happens" in databases can enable advanced searchers to tailor their searches to the content, and to know why some searches won't work well.
This principle also applies to search engines, but few of us really know how search engine database content "happens" and how search engines gather their web pages. All About Search Indexing Robots and Spiders by Avi Rappoport of SearchTools.com provides an excellent summary, with additional links, about how spiders actually find and download pages into their mega-databases. Of particular interest are the links to how robots.txt pages work and the Robots Exclusion Protocol, which enables webmasters to steer web spiders away from selected directories or pages.
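As a small illustration of how a robots.txt file is read, here is a sketch using the robots.txt parser in Python's standard library. The file contents, the robot names FriendlyBot and BadBot, and the example.com address are all made up. Keep in mind that the protocol is a request, not a lock: a robot has to choose to obey it.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt file; the paths and robot names are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/report.html

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(robot, path):
    """Ask whether the named robot may fetch the given path on the example site."""
    return parser.can_fetch(robot, "http://www.example.com" + path)


print(allowed("FriendlyBot", "/index.html"))          # open to everyone
print(allowed("FriendlyBot", "/private/notes.html"))  # excluded directory
print(allowed("BadBot", "/index.html"))               # this robot is excluded everywhere
```

Pages excluded this way never reach the search engine's database, which is one reason a search can fail even when the page exists.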
April 14, 2003
Knowledge Translation Lessons for Libraries
My top trend for libraries in 2003 is Knowledge Translation. Defined as the process that transfers research results from knowledge producers to knowledge users, knowledge translation products digest information from many sources, make decisions on what in that body of information is good information, and then repackage that “good information” into an easy-to-use tool that professionals can use with confidence.
Health care leads in the knowledge translation area.
Health care publishers have taken the lead in creating knowledge translation tools for use by physicians. Products like PDXMD.com (Elsevier), Inforetriever, and BMJ's Clinical Evidence are some examples of products currently in the marketplace.
With the easy availability of handheld computers with enlarged storage capacity, these knowledge-translation tools can bring content to actual practice, enabling physicians to carry around their practice tools as they move through their rounds.
Knowledge Translation tools are stepping beyond health care settings.
Although more popular in medicine than any other profession, knowledge translation tools are creeping into other professions, because professionals need access to information but lack the time to gather and process it themselves. Workingfaster.com's Search Portfolio is a product of this trend -- a web site selection service for librarians and libraries that simply lack the time to do nitty gritty site selection themselves. I also subscribe to Execubooks -- which takes major bestselling business books and digests them into 2-5 pages that I can read online, print, or download onto my handheld and read on the subway. Welcome back Reader's Digest -- with a brave new business face.
The trend toward knowledge translation is important for libraries.
These tools present a purchasing challenge for libraries, which already purchase the primary sources that form the basis of knowledge translation tools. Why should libraries buy the translated product when they already own the "real thing"? Much like the paperback-purchasing quandary that stymied public libraries decades ago, libraries are quickly coming to the conclusion that if they don't buy knowledge translation tools, their users will. And information consumers care little about the processes that go into creation of a knowledge product, whereas librarians care a lot about the decisions that may have an impact on the end-product.
For example, will a commercial publisher select or prefer their own family of published content for synthesis? Who writes the synthesized content? How is it reviewed? How often is it updated and how does the publisher respond to major new developments? Particularly in subject areas like health care, practice changes may happen quickly based on new evidence.
Knowledge translation sells because no one has time to keep up with their profession.
Libraries have never really seen themselves as being in the time-saving business. But time is being seen by our users as an increasingly precious commodity, and now more than ever, people want their information pre-digested and packaged for easy use when and where they need it. Libraries may want to take up the charge of knowledge translation tools to see how they can offer their users time-saving tools to uncomplicate their lives.
Further Reading
The Canadian Institutes of Health Research is a leader in knowledge translation research and practice. The site has a good bibliography plus links to relevant web sites.
Review of Image Search Tools
An excellent review by the TASI (Technical Advisory Service for Images) group on several free image search tools on the web.
April 11, 2003
Google's SafeSearch
Benjamin Edelman of Harvard University's Berkman Center for Internet and Society has just released an empirical study of Google's SafeSearch options. Google's SafeSearch is a checkbox on Google's Advanced Search screen that is meant to eliminate sexually explicit results. Edelman's study demonstrates clearly that SafeSearch omitted thousands of pages without any sexually explicit content, shutting out substantial amounts of valuable web content, including teacher lesson plans, educational institutions, and political content.
April 09, 2003
How to Decode a Web Address
Do you ever wonder how those very long web addresses are constructed? Genie Tyburski of the Virtual Chase explains why those long addresses look the way they do.
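If you would like to take an address apart yourself, here is a short Python sketch using the standard library's URL tools. The address is a hypothetical one invented for this example, not one taken from Tyburski's article.

```python
from urllib.parse import urlsplit, parse_qs

# A hypothetical long address, invented for illustration.
url = "http://www.example.com/search/results.asp?query=left+handed+scissors&page=2#top"

parts = urlsplit(url)
print(parts.scheme)    # the protocol used to fetch the page
print(parts.netloc)    # the host computer's name
print(parts.path)      # the directory and file on that host
print(parts.query)     # the raw instructions passed to the script
print(parts.fragment)  # a named spot within the page

# The query string breaks down further into name/value pairs.
params = parse_qs(parts.query)
print(params)
```

Most of the length in a long address usually sits in the query portion, where a script on the server is being handed its search terms and settings.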
April 08, 2003
Yahoo's Search Changes -- How New?
There has been much news over the last few days on the new features of Yahoo's search. So far, there's not much to report -- Yahoo is still delivering results from its own directory and broader search results from Google (rather strange, given that Yahoo! now owns Inktomi, a rival search engine database producer). There is an added image search tool, which seems to deliver exactly the same results as Google's image search, plus the usual shopping searches. Early assessment? Not much new here for serious web searchers.
If you want to search using a search engine, use Google directly. Yahoo still has some good listings in its directory section: it's easier to search the directory by starting at http://dir.yahoo.com than at the overloaded main Yahoo page.
Gary Price is following the Yahoo story closely at http://www.resourceshelf.com -- I'll keep my eye on the major issues but search engine news junkies might want to take an occasional peek at Gary's site.
April 07, 2003
Sites for Lefties
When my dentist told me that she had to specially equip her entire office -- at some considerable cost -- to accommodate her lefthandedness, it struck me that there must be an entire lefthanded world out there -- of implements just for lefties. Sure enough, there is, and here are a few interesting sites...
Rosemary West's Left-Handed World contains plenty of information and trivia on left-handedness. Anything Left-Handed and The Left Hand are both shopping sites for lefties.
April 06, 2003
Finding the Original Source for Health News
It's been around since 1997, but Biomedicine and Health in the News is a great tool for finding the research behind the headlines in health care news.
Two similar services -- one for the Minneapolis Star Tribune and another for the New York Times -- link health news headlines in these papers to the original research articles. Most news stories never precisely cite the original source of medical breakthroughs, and this tool is a great way to fill in the gap.
Subject-Specific Popularity -- Teoma's Magic Bullet?
There's a buzz about a new way to rank order pages, called "subject-specific popularity." Teoma, which uses the algorithm in order to directly compete with Google's PageRank, states that "subject-specific popularity ranks a site based on the number of same-subject pages that reference it, not just general popularity, to determine a site's level of authority."
This is done by first analyzing the web as a whole to identify subject communities. Teoma then employs link popularity within those communities to determine which sites are the "authorities" on the subject of the query, and those are the sites returned as results for a search.
You can see the results of subject-specific popularity in any topic search in Teoma. On the left side of the page, you'll see the raw results (which appear to use standard link analysis). On the right side, you'll see the "authority" subject portal sites that were returned through subject-specific popularity algorithms.
Teoma's engineers suggest that standard link popularity (such as Google's PageRank) does not help determine the subject or the context of the site, and larger more popular sites tend to overwhelm smaller sites that may actually be more relevant to a search.
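The contrast between the two kinds of popularity can be shown with a toy Python sketch. This is emphatically not Teoma's algorithm, which is not public in this detail: the site names, the hand-assigned subjects, and both function names are assumptions made up for illustration, and the hard part (discovering the subject communities in the first place) is skipped entirely.

```python
from collections import Counter

# Hypothetical sites with hand-assigned subjects (an assumption of this sketch).
SUBJECT = {
    "bigportal.example": "general",
    "news.example": "general",
    "shop.example": "general",
    "blog.example": "general",
    "lefty-guide.example": "left-handedness",
    "lefty-club.example": "left-handedness",
    "lefty-tools.example": "left-handedness",
    "lefty-faq.example": "left-handedness",
}

# Each pair is (linking site, site being linked to).
LINKS = [
    ("news.example", "bigportal.example"),
    ("shop.example", "bigportal.example"),
    ("blog.example", "bigportal.example"),
    ("lefty-club.example", "bigportal.example"),
    ("lefty-club.example", "lefty-guide.example"),
    ("lefty-tools.example", "lefty-guide.example"),
    ("lefty-faq.example", "lefty-guide.example"),
    ("lefty-guide.example", "lefty-tools.example"),
]


def general_popularity(links):
    """Count every inbound link, whatever its source."""
    return Counter(target for _, target in links)


def subject_popularity(links, subject_of, subject):
    """Count only inbound links from sites in the query's subject community."""
    return Counter(
        target for source, target in links if subject_of[source] == subject
    )


general = general_popularity(LINKS)
specific = subject_popularity(LINKS, SUBJECT, "left-handedness")
print(general.most_common(1)[0][0])   # the big portal wins on raw link counts
print(specific.most_common(1)[0][0])  # the specialist guide wins within its community
```

In this toy web the large portal collects the most links overall, but within the left-handedness community the specialist guide is the clear authority, which is the effect Teoma's engineers describe.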
April 04, 2003
Google's PageRank Explained
Phil Craven's excellent article may be more than you'll ever want to know about PageRank, but it is required reading for anyone interested in just how Google rank orders its search results.
Although the algorithms of PageRank are complex, the results produced by PageRank are pretty easy to predict. Searchers should keep in mind that the PageRank algorithm is a popularity ranking tool, not a relevancy ranking tool. So if you think that Google brings the most relevant results to the top of the hit list, you're wrong: it brings the best known, most established results to the top of the hit list. Relevancy in any substantive sense would require human assessment and intervention, which doesn't happen in search engines.
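For the curious, the popularity idea can be sketched in a few lines of Python. This is a simplified textbook version of the published PageRank calculation, not Google's production system and not code from Craven's article. The four-page web, the damping value of 0.85, and the fixed number of iterations are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate rank for each page; links maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with rank shared equally
    for _ in range(iterations):
        # Every page gets a small baseline share on each pass.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# A made-up four-page web: C is linked to by A, B, and D; nothing links to D.
WEB = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(WEB)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page C, with the most inbound links, comes out on top, and page D, which nobody links to, comes out last. Notice that nothing in this calculation looks at the words of a query: the score depends only on who links to whom.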