April 29, 2003

Why Citation Errors Perpetuate

In the December 12, 2002 issue of Nature, Philip Ball explores why identical article citation errors seem to perpetuate over time. The research suggests that this results from "lazy citation" -- authors not actually reading the article being cited, but simply copying a previous citation with its original errors intact.

The numbers are striking. Based on the number of distinct misprints tracked, only 22–23% of citations followed from a reading of the original paper.

April 25, 2003

The Trouble with Meta-Search

Consumer Web Watch columnist Angela Gunn's article, "In Search of Disclosure," explains how paid-for sites get mixed up with "real" search results in meta-search tools like Dogpile, Metacrawler, Mamma and many more. The implications may be inconsequential for many searches, but for information in which accuracy and specificity are essential -- health information, for instance -- a meta-search tool would be a poor choice.

Gunn also reminds readers to pay close attention to links on meta-search sites titled "About Results," "About Search," or something similar. These links will usually indicate which source tools provided the link -- and a few clicks on those links should provide clues on which search tools deliver paid listings.

April 24, 2003

More from "Golden Search"

The nice people at U.S. Bancorp Piper Jaffray sent me the full 90-page report Golden Search today. This is required reading for anyone at the advanced search level or who teaches search engines to end users.

Some fascinating tidbits from my first reading:

  • Together Google, Yahoo!, MSN, and AOL have greater than 80% market share of search, with Google running at almost half of that, at 34%.
  • Total searches done: almost 550 million per day worldwide; 245 million per day in the United States
  • What do people search on: 65% on information and reference; 15% on commerce-related searches; 20% on entertainment-related searches - and of that total, the report estimates that 35% of all searches could be commercial in nature
  • Great improvements in "search monetization" (which translates as making money from search technology improvements) will happen in the coming years to deliver eyeballs to advertisers at the moment they are actively looking to purchase something.
  • Expect Yahoo! to switch its search engine partner from Google to Inktomi, imminently, as Yahoo's purchase of Inktomi is finalized
  • Partnerships are everywhere, usually in sets of three (algorithmic search + paid inclusion + paid listings) and the number of players is relatively small (which explains why you see the same search results coming up in different places) -- examples include Lycos+Overture+AlltheWeb; AskJeeves+Teoma+Google; MSN+Looksmart+Overture

Rashtchy's full report, The Golden Search, is available free with registration through Multex Investor (U.S. addresses are required) and also to Investext Plus subscribers.

5 Key Trends to Shape the Future of the Search Industry

As an amateur (and largely unsuccessful) stock picker, I'm fascinated by how differently information seekers see web search as compared to search business professionals. Information seekers think of web search sites as helpful tools, but web search specialists know that most commercial web search tools are in the business of growing a customer base. The business of search engines has little to do with searching and everything to do with revenue growth.

Analyst Safa Rashtchy tracks search companies for investor clients of U.S. Bancorp Piper Jaffray. In a March 20, 2003 press release from U.S. Bancorp Piper Jaffray, he identified five key trends that he believes will shape the future of the search industry.

Search Capitalism - Overturism, or the idea of paid search as a market-driven customer acquisition vehicle
Googlism - increased importance of relevance and a race to provide the best search experience,
Globalism - the increased importance of international markets and its impact on the partnerships among search companies,
Elitism - concentration of search among key destinations and the increasing importance of branded destinations, and
Realism - the next phase in search: in-context search.

Rashtchy asserts that the industry represents a major growth market and will grow in excess of 35% per year. In addition, he believes that the key driver of growth is "the increased popularity of search as the most efficient way to find products and information, and simultaneously the rise of search as the best way for advertisers to find and acquire customers." Rashtchy sees Overture and MSN as key companies to watch in this area.

Rashtchy's full report, The Golden Search, is available free with registration through Multex Investor (U.S. addresses are required) and also to Investext Plus subscribers.

Phrase Your Question as the Answer

In an interview with Greg Kline of the Champaign News-Gazette, Craig Silverstein, technology director of Google, suggests that web searchers looking for answers “always phrase [the] query in the form of an answer.” So that means if you're looking for the capital of Iowa, you might search using the phrase "the capital of iowa is" and expect to retrieve pages that have that phrase -- followed by the answer.

Most of us search uncritically in search engines using keywords that match the subject of our search. But search engines don't search for subjects, they just search for patterns of words on pages.
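The tip above can be sketched in a few lines of Python. This is only an illustration: the helper names answer_phrase and search_url are hypothetical, the "the ... is" template is just one way to phrase an answer, and the search URL is an assumption rather than anything taken from the interview.

```python
from urllib.parse import urlencode

def answer_phrase(subject: str) -> str:
    """Turn a subject into a quoted phrase that begins a sentence stating the answer."""
    return f'"the {subject.strip().lower()} is"'

def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Build a search URL for the query (the base address is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"

query = answer_phrase("capital of Iowa")
print(query)
print(search_url(query))
```

The quotation marks matter: they ask the search engine to match the words as a phrase, which is what makes the page likely to contain the answer right after it.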

Read the full article

April 23, 2003

RCLS DeskRef - Gone!

DeskRef is gone. Produced by the Ramapo Catskill Library system, it was one of my favorite portals to quick reference sources.

Gary Price reminded me that you can still see parts of DeskRef through the Wayback Machine's archived files.

In our web search classes at Workingfaster.com, we would have a race to find information (an easy way to pick up the pace at that mid-day slump!). I would divide the class in half, asking one half to use search engines and bookmarks to find answers to quick reference questions and the other half to use DeskRef. The difference in the results was dramatic -- DeskRef questions were often answered in 30 seconds or less, while the search engine users were still pointing and clicking long after for most questions, even those where the "answer" could conceivably appear in the first page of search engine results.

I continue to believe that libraries and governments have to take a leadership role in building and promoting high quality free portal sites like DeskRef -- the .com world won't step in and do it systematically or for the long term. Even within the professional searching and library communities, the value of these selective, quality-filtered tools is grossly underestimated. In time they will be sorely missed.

April 22, 2003

Tips for Yahoo! Users

Yahoo!'s new look focuses this portal more than ever on white/yellow pages services, shopping, and searching. Serious web users who still rely on Yahoo! for its information sources are probably better served by starting at Yahoo's directory page rather than the cluttered main Yahoo page.

For web site selectors who are interested in finding new resources, the Yahoo directory page has handy links to resources added to the directory during the past week by Yahoo's staff. A quick view of these What's New lists shows that Yahoo! still favors business sites: the vast majority of added resources fall into either the business or regional (local business) categories.

April 18, 2003

Search Engine Robot Simulator

The Sim Spider Search Engine Robot Simulator is a spider that simulates what search engine robots read from your website. Readers can input a web page URL and visualize the links that will be spidered, the "word dump" that will go into the database, and keyword density analysis for each page. This is a highly illustrative example of the difference between the page you see on the screen and the content that actually ends up in the search engine's database.
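To make the idea concrete, here is a minimal sketch of what such a simulator might do, using only Python's standard library. It is not the Sim Spider tool's actual method: the class name SpiderView, the keyword_density helper, and the sample page are all hypothetical, and real robots apply many more rules.

```python
from collections import Counter
from html.parser import HTMLParser
import re

class SpiderView(HTMLParser):
    """Collects the links and visible words a simple robot might read."""

    def __init__(self):
        super().__init__()
        self.links = []   # href values found in <a> tags
        self.words = []   # lowercase words from visible text (the "word dump")
        self._skip = 0    # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore script and style contents; keep everything else as words.
        if not self._skip:
            self.words.extend(re.findall(r"[a-z0-9']+", data.lower()))

def keyword_density(words, top=5):
    """Return the most frequent words with their share of all words."""
    total = len(words) or 1
    return [(word, count / total) for word, count in Counter(words).most_common(top)]

# Hypothetical sample page.
page = """<html><head><title>Left-Handed Scissors</title>
<style>body {color: red}</style></head>
<body><h1>Scissors for lefties</h1>
<p>Buy left-handed scissors <a href="/shop">here</a>.</p>
<script>var x = 1;</script></body></html>"""

view = SpiderView()
view.feed(page)
print(view.links)
print(view.words)
print(keyword_density(view.words, top=3))
```

Notice that the styling and script never reach the word list, while the title does: the robot's view of the page is just links and words.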

April 17, 2003

Trends in the Evolution of the Public Web

An interesting study in D-Lib Magazine by researchers at OCLC, Trends in the Evolution of the Public Web, suggests that 1) growth in the number of web sites has reached a plateau and actually shrank slightly last year; 2) globalization of the public web continues to be a myth, as web content is dominated by English-language content originating in the U.S. with no sign that this dominance may be shifting; 3) there is little if any progress toward adoption of formal metadata schemes for public Web resources.


All About Search Indexing Robots and Spiders

Good searchers seek to understand the nature and content of the database that they are searching. Understanding how content "happens" in databases can enable advanced searchers to tailor their searches to the content, and to know why some searches won't work well.

This principle also applies to search engines, but few of us really know how search engine database content "happens" and how search engines gather their web pages. All About Search Indexing Robots and Spiders by Avi Rappoport of SearchTools.com provides an excellent summary, with additional links, about how spiders actually find and download pages into their mega-databases. Of particular interest are the links to how robots.txt pages work and the Robots Exclusion Protocol, which enables webmasters to steer web spiders away from selected directories or pages.
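As a rough illustration of how a well-behaved spider consults robots.txt, the sketch below uses Python's standard urllib.robotparser module. The rules, the site address www.example.com, the robot name ExampleBot, and the allowed helper are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an imaginary site.
rules = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def allowed(path, agent="ExampleBot"):
    """Ask whether the named robot may fetch the given path on the imaginary site."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

for path in ("/index.html", "/private/report.html",
             "/drafts/notes.html", "/drafts/other.html"):
    print(path, allowed(path))
```

Keep in mind that the protocol is a request, not a lock: a robot that chooses to ignore robots.txt can still fetch the pages.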

April 14, 2003

Knowledge Translation Lessons for Libraries

My top trend for libraries in 2003 is Knowledge Translation. Defined as the process that transfers research results from knowledge producers to knowledge users, knowledge translation products digest information from many sources, make decisions on what in that body of information is good information, and then repackage that “good information” into an easy-to-use tool that professionals can use with confidence.

Health care leads in the knowledge translation area.

Health care publishers have taken the lead in creating knowledge translation tools for use by physicians. Products like PDXMD.com (Elsevier), Inforetriever, and BMJ's Clinical Evidence are some examples of products currently in the marketplace.

With the easy availability of handheld computers with enlarged storage capacity, these knowledge-translation tools can bring content to actual practice, enabling physicians to carry around their practice tools as they move through their rounds.

Knowledge Translation tools are stepping beyond health care settings.

Although more popular in medicine than any other profession, knowledge translation tools are creeping into other professions, because professionals need access to information but lack the time to gather and process it themselves. Workingfaster.com's Search Portfolio is a product of this trend -- a web site selection service for librarians and libraries that simply lack the time to do nitty-gritty site selection themselves. I also subscribe to Execubooks -- which takes major bestselling business books and digests them into 2-5 pages that I can read online, print, or download onto my handheld and read on the subway. Welcome back Reader's Digest -- with a brave new business face.

The trend toward knowledge translation is important for libraries.

These tools present a purchasing challenge for libraries, which already purchase the primary sources that form the basis of knowledge translation tools. Why should libraries buy the translated product when they already own the "real thing"? Much like the paperback-purchasing quandary that stymied public libraries decades ago, libraries are quickly coming to the conclusion that if they don't buy knowledge translation tools, their users will. And information consumers care little about the processes that go into creation of a knowledge product, whereas librarians care a lot about the decisions that may have an impact on the end-product.

For example, will a commercial publisher select or prefer their own family of published content for synthesis? Who writes the synthesized content? How is it reviewed? How often is it updated and how does the publisher respond to major new developments? Particularly in subject areas like health care, practice changes may happen quickly based on new evidence.

Knowledge translation sells because no one has time to keep up with their profession.

Libraries have never really seen themselves as being in the time-saving business. But time is being seen by our users as an increasingly precious commodity, and now more than ever, people want their information pre-digested and packaged for easy use when and where they need it. Libraries may want to take up the charge of knowledge translation tools to see how they can offer their users time-saving tools to uncomplicate their lives.

Further Reading
The Canadian Institutes of Health Research is a leader in knowledge translation research and practice. The site has a good bibliography plus links to relevant web sites.

Review of Image Search Tools

An excellent review by the TASI (Technical Advisory Service for Images) group on several free image search tools on the web.

April 11, 2003

Google's SafeSearch

Benjamin Edelman of Harvard University's Berkman Center for Internet and Society has just released an empirical study of Google's SafeSearch options. Google's SafeSearch is a checkbox enabled from the Advanced Search screen in Google to eliminate sexually explicit results. Edelman's study demonstrates clearly that SafeSearch omitted thousands of pages without any sexually explicit content, including substantial amounts of valuable web content such as teacher lesson plans, educational institutions, and political content.

April 09, 2003

How to Decode a Web Address

Do you ever wonder how those very long web addresses are constructed? Genie Tyburski of the Virtual Chase explains why those long addresses look the way they do.
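For a quick look at the general anatomy of a long address, the sketch below splits one into its parts with Python's standard urllib.parse module. The address itself is invented for illustration, and this shows only the generic URL structure, not the specifics of Tyburski's explanation.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical long web address.
url = "http://www.example.com:8080/catalog/search.cgi?subject=left+handed&page=2#results"

parts = urlsplit(url)
print("scheme:  ", parts.scheme)           # how to connect
print("host:    ", parts.hostname)         # which server
print("port:    ", parts.port)             # which door on that server
print("path:    ", parts.path)             # which file or program
print("query:   ", parse_qs(parts.query))  # what was passed to the program
print("fragment:", parts.fragment)         # where to land on the page
```

Most of the length in a typical long address comes from the query: everything after the question mark is a list of name=value pairs handed to a program on the server.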

April 08, 2003

Yahoo's Search Changes -- How New?

There has been much news over the last few days on the new features of Yahoo's search. So far, there's not much to report -- Yahoo is still delivering results from its own directory and broader search results from Google (rather strange, given that Yahoo! now owns Inktomi, a rival search engine database producer). There is an added image search tool, which seems to deliver exactly the same results as Google's image search, plus the usual shopping searches. Early assessment? Not much new here for serious web searchers.

If you want to search using a search engine, use Google directly. Yahoo still has some good listings in its directory section: it's easier to search the directory by starting at http://dir.yahoo.com than at the overloaded main Yahoo page.

Gary Price is following the Yahoo story closely at http://www.resourceshelf.com -- I'll keep my eye on the major issues but search engine news junkies might want to take an occasional peek at Gary's site.

April 07, 2003

Sites for Lefties

When my dentist told me that she had to specially equip her entire office -- at some considerable cost -- to accommodate her lefthandedness, it struck me that there must be an entire lefthanded world out there -- of implements just for lefties. Sure enough, there is, and here are a few interesting sites...

Rosemary West's Left-Handed World contains plenty of information and trivia on left-handedness. Anything Left-Handed and The Left Hand are both shopping sites for lefties.

April 06, 2003

Finding the Original Source for Health News

It's been around since 1997, but Biomedicine and Health in the News is a great tool for finding the research behind the headlines in health care news.

Two similar services -- one for the Minneapolis Star Tribune and another for the New York Times -- link health news headlines in these papers to the original research articles. Most news stories never precisely cite the original source of medical breakthroughs, and this tool is a great way to fill in the gap.

Subject-Specific Popularity -- Teoma's Magic Bullet?

There's a buzz about a new way to rank order pages, called "subject-specific popularity." Teoma, which uses the algorithm in order to directly compete with Google's PageRank, states that "subject-specific popularity ranks a site based on the number of same-subject pages that reference it, not just general popularity, to determine a site's level of authority."

This is done by first analyzing the web as a whole to identify subject communities. Teoma then employs link popularity within those communities to determine which sites are the "authorities" on the subject of the query, and it is those sites that are returned as the results of a search.

You can see the results of subject-specific popularity in any topic search in Teoma. On the left side of the page, you'll see the raw results (which appear to use standard link analysis). On the right side, you'll see the "authority" subject portal sites that were returned through subject-specific popularity algorithms.

Teoma's engineers suggest that standard link popularity (such as Google's PageRank) does not help determine the subject or the context of the site, and larger, more popular sites tend to overwhelm smaller sites that may actually be more relevant to a search.
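The contrast between general and subject-specific popularity can be shown with a toy example. This is a sketch built only from Teoma's one-sentence description, not Teoma's actual algorithm: the page names, subject labels, links, and both function names are invented, and real subject communities would have to be discovered rather than hand-labeled.

```python
from collections import Counter

# Hypothetical toy web: each page's subject, and (source, target) links.
subject = {
    "bigportal": "general", "news1": "general",
    "news2": "general", "news3": "general",
    "leftyshop": "left-handedness", "leftyfaq": "left-handedness",
    "leftyclub": "left-handedness",
}
links = [
    ("news1", "bigportal"), ("news2", "bigportal"), ("news3", "bigportal"),
    ("leftyfaq", "bigportal"),
    ("leftyfaq", "leftyshop"), ("leftyclub", "leftyshop"),
    ("leftyshop", "leftyfaq"),
]

def general_popularity(links):
    """Count every incoming link, whatever the linking page is about."""
    return Counter(target for _, target in links)

def subject_popularity(links, subject, topic):
    """Count only incoming links from pages about the query's topic."""
    return Counter(target for source, target in links if subject[source] == topic)

print(general_popularity(links).most_common())
print(subject_popularity(links, subject, "left-handedness").most_common())
```

In this toy web the big portal wins on raw link counts, but the small shop comes out on top once only links from left-handedness pages are counted.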

April 04, 2003

Google's PageRank Explained

Although it's more than you'll ever want to know, Phil Craven's excellent article is required reading for anyone interested in just how Google rank orders its search results.

Although the algorithms of PageRank are complex, the results produced by PageRank are pretty easy to predict. Searchers should keep in mind that the PageRank algorithm is a popularity ranking tool, not a relevancy ranking tool. So if you think that Google brings the most relevant results to the top of the hit list, you're wrong: it brings the best known, most established results to the top of the hit list. Relevancy in any substantive sense would require human assessment and intervention, which doesn't happen in search engines.
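A simplified sketch of the published PageRank idea shows why the results are easy to predict. This is not Google's production system and is not taken from Craven's article: the tiny web of pages is invented, the damping value of 0.85 is the commonly cited default, and the handling of pages with no outgoing links is one of several possible conventions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's rank along its outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical toy web: three fan pages link to one established site.
web = {
    "established": ["niche"],
    "fan1": ["established"],
    "fan2": ["established"],
    "fan3": ["established"],
    "niche": ["established"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:12s} {score:.3f}")
```

Nothing in the calculation looks at the words on a page or at the query: the score comes entirely from who links to whom, which is why the best-linked site rises to the top.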

April 01, 2003

Important News About Google!

Google is now available in Klingon:

A joyous April 1 to all.

Description
SiteLines is written by Rita Vine, a professional librarian, web search trainer, and lead site evaluator of the Search Portfolio web search product.

Together with other members of the Search Portfolio selection team, Rita monitors over 50 key alerting services related to web search tools, site announcements, and the business of web search. SiteLines is intended to present a distillation of the most important trends, news, and new web search tools and directories.

SiteLines is sponsored by the Search Portfolio, a licensed web desktop of the 100 top peer-reviewed web sites for searching.
