June 13, 2005

Search engines are "surprisingly ineffective" for many queries: New York Times

Eureka! Journalist James Fallows, in today's New York Times, finally gets it -- search engines are great at answering easy, quick-fact questions, but are pretty terrible for more in-depth, critical questions. In his article "Enough Keyword Searches. Just Answer My Question," (free registration required to view), he reminds readers that for complex, in-depth questions, searchers try in vain to outguess the engines. Fallows describes his frustration trying to use keyword searches to find consistent state-by-state data covering the last 40 years -- and coming up completely empty after hours of fruitless searching. "We live with these imperfections by trying to outguess the engines - what if I put "per capita spending by states" in quotation marks? - and by realizing that they're right for some jobs and wrong for others."

Posted by ritavine at 12:59 PM

May 28, 2004

Coming Soon - the Death of Search Engines?

Is search weariness finally settling in? Are mass market consumers ready to look beyond search engines to other ways of web searching? In "Coming Soon: The Death of Search Engines", I ponder the issues and look for some solutions.

Posted by ritavine at 11:28 AM

December 02, 2003

Link Competition on the Web

Although it is now almost 18 months old, Winners don't take all: Characterizing the competition for links on the web by David Pennock, Gary Flake, Steve Lawrence, Eric Glover, and C. Lee Giles remains an excellent study of how the distribution of links to web sites approximates a "power law," in which a small number of sites receive the majority of links and always rise to the top of search engine results for a given keyword combination. The study, which was published in the Proceedings of the National Academy of Sciences 99(8): 5207-5211, is also available in synopsis form.

The study notes that the competition for web links is particularly fierce in publications, entertainment, and consumer electronics topics. Although the paper doesn't directly mention Google or its PageRank methodology, which ranks partially by link frequency, one can easily make the connection and conclude that link competition will continue to degrade the usefulness of PageRank, making Google less and less suitable for serious information searches in popular topics.

Posted by ritavine at 01:32 PM

Could Microsoft search your computer's files?

In Microsoft Aims for Search On Its Own Terms, Michael Kanellos describes Microsoft's experiment with "different search technologies that will, among other tasks, conduct Google-like searches on an individual's hard drive or categorize query results in different ways intended to make the data easier to digest."

Using this technology, the system "retrieves links, music files, e-mails and other materials that relate to applications running in the foreground." A Microsoft spokesperson describes the technology as "being able to retrieve a bunch of things without you explicitly asking for them."

If the technology could retrieve files based on the context of what you are working on now, it isn't a big stretch to think that the same technology might also conduct a web search and deliver web links based on the same contextual considerations.

Besides enabling Microsoft to fully undermine the utility of stand-alone search engines like Google by making its own software so easy to use, the prospect of such an invasive tool being built into an operating system has the sort of big-brother overtones that will likely raise privacy concerns among those who still care about such things.

If that's the idea (and Microsoft has persistently indicated that it wants to integrate web search into its next operating system), the idea is brilliant: Microsoft stands to enrich itself tremendously by persistently delivering external contextual content through a variety of revenue-producing streams. Harried computer users should find the convenience of integrated search irresistible, so this appears to be a strategy that can't miss.

Posted by ritavine at 11:03 AM

November 12, 2003

Rich-Get-Richer with Link Analysis

Google's PageRank, known generically as link analysis, has become the subject of some interesting research which leads many search professionals to conclude that search engines which rely on link analysis will favor the most popular, well-established and best-known web sites in their results.

The rich-get-richer concept of web linking -- whereby a large percentage of web links point to a relatively small number of web pages -- is described in reasonably plain language in Merrick E. Lozano's article "Rich Get Richer - Why Yahoo, DMOZ, Google and PageRank are Important." Lozano also touches on ideas like power laws and preferential attachment as they apply to web linking. A good introduction to a complex topic.
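
For readers who like to see the effect rather than take it on faith, here is a toy simulation of preferential attachment. It is only a sketch under stated assumptions, not the model from Lozano's article: the names simulate_links and share_of_top are made up for illustration, every new page adds exactly one link, and each page gets one baseline "ticket" so that newcomers can be chosen at all.

```python
import random


def simulate_links(num_pages=2000, seed=42):
    """Grow a toy web in which each new page links to one existing page.

    Assumption: an existing page is chosen with probability proportional
    to (inbound links + 1), so already-popular pages attract more links.
    """
    rng = random.Random(seed)
    inbound = [0]   # inbound[i] = number of links pointing at page i
    tickets = [0]   # page i appears (inbound[i] + 1) times in this list
    for new_page in range(1, num_pages):
        chosen = rng.choice(tickets)   # popular pages hold more tickets
        inbound[chosen] += 1
        tickets.append(chosen)         # one extra ticket for the new link
        inbound.append(0)
        tickets.append(new_page)       # baseline ticket for the newcomer
    return inbound


def share_of_top(inbound, fraction=0.1):
    """Return the share of all links held by the most-linked pages."""
    ranked = sorted(inbound, reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


inbound = simulate_links()
print(f"Top 10% of pages hold {share_of_top(inbound):.0%} of all links")
```

Running it with different seeds or page counts gives a feel for how lopsided the distribution of links becomes when popularity feeds on itself.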

Posted by ritavine at 10:18 PM

April 18, 2003

Search Engine Robot Simulator

The Sim Spider Search Engine Robot Simulator is a spider that simulates what search engine robots read from your website. Readers can input a web page URL and visualize the links that will be spidered, the "word dump" that will go into the database, and keyword density analysis for each page. This is a highly illustrative example of the difference between the page you see on the screen and the content that actually ends up in the search engine's database.
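
To make the idea concrete, here is a rough sketch of what such a simulator might do, using only Python's standard library. It is not how Sim Spider itself works (the tool's internals aren't described here): the class name SpiderView, the keyword_density helper, and the sample page are all invented for illustration, and treating title text as part of the word dump is an assumption.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collect the links and the plain words a simple robot would read."""

    SKIPPED = {"script", "style"}   # contents a text indexer would ignore

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that sits outside script and style blocks.
        if not self._skip_depth:
            self.words.extend(re.findall(r"[a-z0-9']+", data.lower()))


def keyword_density(words, top=3):
    """Return the most frequent words with their share of all words."""
    total = len(words)
    return [(word, count / total) for word, count in Counter(words).most_common(top)]


page = """<html><head><title>Search tips</title>
<style>body { color: navy; }</style></head>
<body><h1>Search tips</h1>
<p>Good search habits beat clever search tricks.</p>
<a href="/archive.html">Archive</a>
<script>var hidden = "not indexed";</script>
</body></html>"""

spider = SpiderView()
spider.feed(page)
print("Links:", spider.links)
print("Word dump:", " ".join(spider.words))
print("Keyword density:", keyword_density(spider.words))
```

The styling and the script never reach the word dump, which is exactly the gap between the page on the screen and the page in the database.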

Posted by ritavine at 03:25 PM

April 17, 2003

All About Search Indexing Robots and Spiders

Good searchers seek to understand the nature and content of the database that they are searching. Understanding how content "happens" in databases can enable advanced searchers to tailor their searches to the content, and to know why some searches won't work well.

This principle also applies to search engines, but few of us really know how search engine database content "happens" and how search engines gather their web pages. All About Search Indexing Robots and Spiders by Avi Rappoport of SearchTools.com provides an excellent summary, with additional links, about how spiders actually find and download pages into their mega-databases. Of particular interest are the links to how robots.txt pages work and the Robots Exclusion Protocol, which enables webmasters to redirect web spiders away from selected directories or pages.
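
As a small illustration of the Robots Exclusion Protocol, the sketch below feeds a made-up robots.txt file to Python's standard urllib.robotparser module and asks which pages a well-behaved spider may fetch. The robots.txt rules, the example.com URLs, and the "ExampleSpider" user-agent name are all hypothetical. Keep in mind that compliance is voluntary: the file only asks robots to stay away.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks every robot to skip one directory
# and one individual page.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def allowed(url, agent="ExampleSpider"):
    """Return True if the rules above permit this robot to fetch the URL."""
    return parser.can_fetch(agent, url)


for url in (
    "http://www.example.com/index.html",
    "http://www.example.com/private/report.html",
    "http://www.example.com/drafts/notes.html",
):
    print(url, "->", "fetch" if allowed(url) else "skip")
```

Pages that are skipped this way never make it into the search engine's database, which is one reason a search can fail even when the content exists on the web.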

Posted by ritavine at 02:13 PM

April 04, 2003

Google's PageRank Explained

Although it's more than you'll ever want to know about the subject, Phil Craven's excellent article is required reading for anyone interested in just how Google rank orders its search results.

Although the algorithms of PageRank are complex, the results produced by PageRank are pretty easy to predict. Searchers should keep in mind that the PageRank algorithm is a popularity ranking tool, not a relevancy ranking tool. So if you think that Google brings the most relevant results to the top of the hit list, you're wrong: it brings the best known, most established results to the top of the hit list. Relevancy in any substantive sense would require human assessment and intervention, which doesn't happen in search engines.
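
The popularity effect is easy to see in a toy version of the calculation. What follows is a minimal sketch of the simplified, publicly described PageRank idea, not Google's actual ranking system: the five-page "web," the function name pagerank, the damping value of 0.85, and the 50 iterations are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate scores for a tiny web given as {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: three fan pages and one niche page all link to "famous".
toy_web = {
    "famous": ["niche"],
    "niche": ["famous"],
    "fan1": ["famous"],
    "fan2": ["famous"],
    "fan3": ["famous"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:8s} {score:.3f}")
```

Notice that nothing in the calculation looks at what a page says. The scores come entirely from who links to whom, which is why the best-known pages float to the top.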

Posted by ritavine at 11:50 PM
Description
SiteLines is written by Rita Vine, a professional librarian, web search trainer, and lead site evaluator of the Search Portfolio web search product.

Together with other members of the Search Portfolio selection team, Rita monitors over 50 key alerting services related to web search tools, site announcements, and the business of web search. SiteLines is intended to present a distillation of the most important trends, news, and new web search tools and directories.

SiteLines is sponsored by the Search Portfolio, a licensed web desktop of the 100 top peer-reviewed web sites for searching.
