October 19, 2005

Wikipedia Founder Admits to Serious Quality Problems

It's hard to believe that the Wikipedia has led such a charmed life. An encyclopedia-by-committee, even with some editorial oversight, is prone to hazards. Quality varies widely between entries, and it's almost impossible to prevent ever-present hackers from inputting bad, wrong, or dubious information... just for fun.

So it's no surprise that the Register reports this week that Wikipedia's founder is acknowledging serious quality problems. Since when did information-by-committee replace serious editorial review? Read the article at http://www.theregister.co.uk/2005/10/18/wikipedia_quality_problem/.

Posted by ritavine at 09:01 PM

May 12, 2005

Thinking About Where Information Lives

Over the last few weeks, I've been thinking about how to help serious web searchers think more creatively about how to find information on the web.

Most users think about searching the search engines by topic, and then perhaps about looking in a topical directory or catalogue of resources. But great searchers go beyond topic to think about sources or types of information (book, journal article, bibliography, dictionary, encyclopedia), and that clearly gives them a leg up on topic-fixated web searchers.

Thinking about the type of source that an information nugget may appear in can help web searchers be more creative in the way they use the web, and web search engines in particular.

For example, if you're looking for trends in soft drink consumption in Europe, and you think about type rather than topic, you might think about finding associations that are concerned with the topic, or journal articles on the topic, or government starter sites that link to European industry information.

And, following from that, a search engine can be used to search the keywords that represent the type of information you're looking for, rather than the topic. The benefit of this sort of searching is that it tends to deliver better results than topical keyword searches, particularly for information that is likely to be heavily optimized by search engine marketers.
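As a rough illustration of the idea, here is a minimal Python sketch that builds one query per type of source for the soft drink example. The type terms, the helper name, and the choice to keep a short topic phrase after the type words are all assumptions made for illustration; they are one possible reading of type-first searching, not a tested recipe.

```python
# Hypothetical mapping from a type of source to keywords that tend to
# appear on pages of that type. These terms are illustrative guesses.
SOURCE_TYPE_TERMS = {
    "association": ["association", "federation"],
    "journal article": ["journal", "article"],
    "government starter site": ["government", "statistics"],
}


def type_first_queries(topic, source_types=SOURCE_TYPE_TERMS):
    """Return one query string per source type, leading with the type terms."""
    queries = []
    for terms in source_types.values():
        queries.append(" ".join(terms) + " " + topic)
    return queries


queries = type_first_queries("soft drinks Europe")
for query in queries:
    print(query)
```

Each printed line can be pasted into a search engine in turn; the point is that the words naming the kind of source come first, and the topic only narrows the results.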

Posted by ritavine at 06:12 PM

November 09, 2004

Problems Dating Web Pages

Read this excellent summary by Greg Notess on the problems of attributing a date to web pages (even for those pages that are dated!), in the November-December issue of ONLINE.

Posted by ritavine at 11:09 AM

October 27, 2004

RECONCILING THE ORDER OF THE LIBRARY WITH THE CHAOS OF THE WEB

Spotted on RLG's Shelflife, some insightful quotations from Judith Pearce of the National Library of Australia on the relationship/collisions (you choose) between the library and the web (extract from Shelflife:)

Noting that the library has long been "a metaphor for order and
rationality," where the search for information is aided by knowledgeable
librarians, Judith Pearce, director of business analysis at the National
Library of Australia, contrasts such structure with the "anarchy" of the
Web: "The Web is free-associating, unrestricted and disorderly. Searching
is secondary to finding and the process by which things are found is
unimportant. Collections are temporary and subjective where a blog entry
may be as valuable to the individual as an unpublished paper as are six
pages of a book made available by Amazon. The individual searches alone
without expert help and, not knowing what is undiscovered, is satisfied."
Services like Google and Amazon have raised the expectations of library
users. For others, they have introduced a "world of information in which
libraries and their collections have new audiences and new roles to play."
Pearce describes recent changes at the National Library, aimed at reducing
the separation between the Web site and the library catalog in order to
draw users into the collection. Visitors to the new color-coded site do not
even need to know what a catalog is in order to find information, she says.
(National Library of Australia Sep 2004)
http://www.nla.gov.au/nla/staffpaper/2004/pearce2.html

Posted by ritavine at 12:27 PM

October 11, 2004

Cautionary Tales of the Wikipedia

Genie Tyburski of the Virtual Chase contributed a brief summary recently of some tests of the Wikipedia encyclopedia. The Wikipedia is a collective encyclopedia which accepts contributions and edits from any contributor, using software to control blatant vandalism. The Wiki model should make any serious searcher cautious, but the Wiki encyclopedia is well known and popular nonetheless. (It will come up high in search results of any search on the keyword encyclopedia.)

I was thinking about the Wiki model today, and it struck me that its "majority-rules" approach to information isn't really much different from those of search engines that rely on link analysis to rank order resources. Anytime I conduct a web search for unknown information in a tool like Google, which uses link analysis in its PageRank algorithms, my results will favor popular, frequently-linked resources. Isn't that almost the same majority-rules approach as wiki-style resources?
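To make the "majority-rules" point concrete, here is a small Python sketch of the simplified, textbook form of link-analysis ranking. It is not Google's actual algorithm, and the page names, the damping value, and the tiny link graph are assumptions invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank to everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical web: three pages link to "popular", only one links to "niche".
links = {
    "a": ["popular"],
    "b": ["popular"],
    "c": ["popular"],
    "d": ["niche"],
    "popular": [],
    "niche": [],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page with three incoming links outranks the page with one, regardless of which page is more accurate, which is the sense in which link analysis resembles a vote.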

Posted by ritavine at 10:35 AM

June 02, 2004

Business web users have a hard time finding what they need

Hmm, are real people getting the message that while search is easy, finding good information can be a challenge for employees? In his article "Not So Simple Search" Jon Surmacz summarizes the results of a Delphi Group survey, which found that business web users are often frustrated with the quality of their search capabilities.

According to the survey, nearly 30 percent of business users spend more than 8 hours per week searching for electronic information, and more than 40 percent spend at least 7 hours per week (that's a full day!) hunting for information. Of the survey's respondents, 62 percent said that they are either dissatisfied or very dissatisfied with their search experience. Note that the definition of "search" included web search tools, enterprise search tools for company intranets, and other proprietary databases.

Analysts at Delphi say that searching is easier if you know exactly what you need, but harder if you want to browse or discover something new. They lay some of the blame on the search tool designers, who need to design differently. But surely searchers need to think about alternate ways to search besides simply popping keywords into boxes and hoping for the best? There are any number of high quality search tools that permit browsing (think Librarians Index, Resource Discovery Network, even Yahoo's directory). Of course, tools aren't the only answer -- searchers also need to turn their brains "on" and, as Mary Ellen Bates proclaims, "get involved with information."

Posted by ritavine at 07:08 PM

Mary Ellen Bates ponders newfangled search skills in a more-than-Boolean world

In the provocatively-titled "Is Boolean Dead?" in the April 5, 2004 issue of EContent Magazine, professional searcher Mary Ellen Bates considers the changing nature of information-seeking in a digital world of big databases, search engines, and services that rely "on the 30,000-foot view of the information landscape." Bates remarks that traditional drill-down keyword approaches don't always work well with large, broad-based databases, and require the searcher to rely on intuition and on fuzzier relationships with words and concepts.

Bates advises searchers "to learn how to manipulate and, well, get involved with information, rather than just typing in words and waiting for the search engine or research source to evaluate the syntax, consult its inverted index, sort the results, and present us with an ordered list of "hits." We will have to understand the nature of the information we are looking at in order to recognize the answer when we see it. In fact, we will have to evolve into Zen researchers, unearthing the answer from a myriad of options rather than simply scrolling through 10, 50, or 100 Web sites, articles, or patents served up by our search tool. Not only will we have to be good searchers, but we will have to be good intuiters as well, able to know where to look for information and how to sort through all the options presented to us."

Posted by ritavine at 06:40 PM

May 21, 2004

Yahoo Results Are Looking More Like Google's

For the last several weeks, Thumbshots.com has offered a free online tool that compares results from the same search conducted in Yahoo and Google. Until recently, the results in the two search engines were different enough to suggest that searching in both might be one way to bring up a greater variety of results.

But the Search Engine Journal weblog reports that results in the first twenty hits (the ones that most searchers look at) are looking more alike now. This is a development worth following, as search relevance will be an important competitive advantage as these two major players duke it out for dominance over the coming months.
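For readers who want to track this themselves, here is a minimal Python sketch of one way to measure how alike two result lists are in their first twenty hits. It is not how the Thumbshots tool works (the post does not say), and the function name and sample URLs are made up for illustration.

```python
def top_n_overlap(results_a, results_b, n=20):
    """Return the fraction of the top-n results that both lists share."""
    top_a = set(results_a[:n])
    top_b = set(results_b[:n])
    if not top_a and not top_b:
        return 0.0
    shared = top_a & top_b
    return len(shared) / max(len(top_a), len(top_b))


# Hypothetical result lists for the same query in two engines.
engine_one = ["example.com/a", "example.com/b", "example.com/c", "example.com/d"]
engine_two = ["example.com/b", "example.com/a", "example.org/x", "example.org/y"]

overlap = top_n_overlap(engine_one, engine_two)
print(f"Shared results: {overlap:.0%}")
```

A figure that creeps toward 100 percent over time would suggest that searching in both engines adds little variety; this measure ignores ranking order, which a fuller comparison would also consider.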

Posted by ritavine at 08:28 AM

March 18, 2004

THE INFOGRAPHY - Best sources of information on selected subjects

Spotted on RLG's Shelflife, an organization called The Fields of Knowledge has created a Web service called The Infography (in-fóg-ra-phy) as a reference tool that facilitates the identification of "superlative sources of information about a subject of inquiry, viewed through the lens of expert opinion." This is a commercial venture, and it appears that ad revenues will support the site and also provide for some (probably modest) royalties for those subject specialists who elect to contribute resource lists to this tool.

The best way to think about the site is as a searchable list of reading lists. Many of the lists include relevant web links as well as book and journal citations. Though questions remain about how often the lists will be updated (particularly for scientific and technical topics in rapidly changing fields), this site is definitely worth consideration for libraries that support high school and college students.

It's not clear just how many lists are included in the database, but it's substantial -- I searched the keyword literature across the full text of the site and got 192 lists that contained the word somewhere in the full text. A search of marketing delivered 31 lists, including topics such as "Mass Media - Uses and Effects" and "Economics of the Arts." One quibble - the only statement of responsibility on each list is too generic. For example, on the Economics of the Arts page, we see the statement, "A professor whose research specialty is the economics of the arts recommends these sources." Surely fuller attribution is warranted and required.

Posted by ritavine at 01:40 PM

March 03, 2004

FaganFinder: Search By File Format

Ontario university student Michael Fagan has produced another handy meta-search template for searching the major search engines that offer file-type limits. This beta version file format searcher enables users to select the file format type (only one at a time), and then select any one of the search engines that permit the search. Google, AlltheWeb, Yahoo!, MSN Search, Gigablast, AltaVista, and Elsevier's Scirus are included. Readers should note that not all formats can be searched in all of the tools.

Posted by ritavine at 04:51 PM

March 02, 2004

Searching by File Types

In "Fiddling with Filetypes", Greg Notess examines how to search using file type limiters in Google, Alltheweb, and Gigablast. Notess also points out the little-known fact that PDF conversion tools often interpret text strangely, and that initial letters can be separated from the remainder of the word. Notess suggests leaving off the first letter of the word in a doc-type search (for example, try using nalyze filetype:pdf).
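The tip is easy to apply mechanically. The short Python sketch below produces both the normal query and the clipped one for a given word. The filetype: spelling is taken from the example above; the function name is hypothetical, and not every search engine uses the same operator.

```python
def filetype_queries(word, file_format="pdf"):
    """Build a normal file-type query plus one with the first letter dropped.

    The clipped form is meant to catch documents where a conversion tool
    separated the initial letter from the rest of the word.
    """
    full = f"{word} filetype:{file_format}"
    if len(word) < 2:
        # Nothing sensible is left after clipping a one-letter word.
        return [full]
    clipped = f"{word[1:]} filetype:{file_format}"
    return [full, clipped]


pdf_queries = filetype_queries("analyze")
for query in pdf_queries:
    print(query)
```

Running both queries and comparing the results is a quick way to see whether the clipped form turns up documents the full word misses.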

Posted by ritavine at 11:29 PM

December 02, 2003

Think Like a Web Page

There is now an open link to my article "Think like a web page: 5 tips for smarter search engine searching", published originally in the September 2003 issue of Informed Librarian.

Posted by ritavine at 04:35 PM

November 25, 2003

Challenges In Web Search Engines

In the November 16, 2003 issue of his ResourceShelf.com weblog, Gary Price reminded readers of a few older (2002) but still-valuable articles related to web searching. Click on the ResourceShelf.com post for links to all three articles, but I recommend that serious searchers of search engines pay special attention to Challenges in Web Search Engines. This 2002 paper offers explanations of the problems inherent in search engine database design and ranking systems, and provides the information succinctly, in plain language (well, as plain as this sort of technical explanation can get). The authors cover a variety of issues that can affect crawling, ranking, and retrieval of results, including search engine spam, differences between text-based and links-based approaches to ranking, cloaking, doorway pages, addressing conventions, and use and abuse of meta tags. The PDF report is linked from the document's main web page.

Posted by ritavine at 11:53 AM

September 24, 2003

Web Research Guide from Elsevier

Elsevier has produced a highly readable Web Research Guide, aimed "at scientists, faculty members, students, researchers and authors who have access to ScienceDirect." Although clearly intended to promote ScienceDirect, the general suggestions about search tools, written by an unidentified editorial board, are really very good -- and attractively presented as lists of tips with examples and templates.

Posted by ritavine at 05:47 PM

September 19, 2003

Does Information Visualization Matter?

At an industry event earlier this week, I was asked what I thought about information visualization techniques and developments that can enhance the search process.

Tools that help users visualize search results have made news lately. Anacubis is running a demo of visualization using a joint Amazon/Google database; Kartoo boasts a metasearch engine that visualizes results in map-style form.

I've always been skeptical of the value of visualization of print information, for two main reasons.

First, information wants to be read, not turned into a map, and I'm not sure that transforming it into a map actually helps people make better choices from lists of links than they would by simply reading and picking from the list. I much prefer clustering technology, which enables grouping of similar resources while retaining a printed, linear environment.

Second, visualization does nothing to improve information quality. Meta-search engine hopeful Kartoo (a French venture) searches against a dozen tools -- search engines Altavista, AlltheWeb, Teoma, Lycos, Wisenut, HotBot and MSN; French portals such as Voila and La Toile du Quebec; and a handful of others. All the existing problems of meta-search are still present in Kartoo. For example, sponsored links rise to the top, and the nature of the ranking extractions means that many excellent sites will be missed.

Visualization of bad information doesn't make the information better: it's still bad information. We must consider the sources used in any meta-search tool before accepting it as a valid solution to an information seeking problem.

Posted by ritavine at 08:27 PM

March 15, 2003

Do I Search Fee or Free?

How do you make a decision between fee and free web sources? Do you spend your valuable search time using free web sources, or do you pay the price for fee-based web sites that offer copyrighted content or value-added information?

Free web sites can be great sources of information, but they can be hard to find and even harder to use. Free sites aren't often designed for easy download of information, and there's often no "peer review" to ensure that information is current or correct. Users need to assess the source of free web information carefully in order to validate data that may or may not be adequately referenced.

Fee-based web sites that offer copyrighted information, like journal articles, are an important extension of the free web for any serious researcher. Much high quality information is published in journal articles, and will never appear for free on the web. Searchers seeking comprehensive coverage of a topic need to make sure that they cover the journal literature as fully as possible. There is a smattering of free journal content on the web (e.g. FindArticles.com) and journal indexes (e.g. PubMed) but the great majority of journal content lives in licensed databases, off-limits to non-subscribers.

Value-added services, which allow the searcher to search multiple databases at once, are good alternatives when time is money. Examples include services like Factiva, Dialog, or Lexis-Nexis. Some (or portions of) the databases available in value-added services can be found for free on the web -- if only you knew where to look. Even so, many of these value-added services offer deep archival coverage, easy and powerful search options, updater services, and content download features that are rarely offered on the free counterparts.

Is there a "rule of thumb" to help make the decision on fee-versus-free? In our experience, the more commercial or "saleable" the information, the harder it will be to track down on the free web. Business, investment, sales and marketing searches can benefit significantly from value-added services. For academic research, there is good free web content in many disciplines, but journal articles remain essential. You can try some searchable journal article services like FindArticles.com or document delivery services like Ingenta, but their search capabilities are primitive and frustrating. A visit to your local library to access the best indexes and databases will probably be required.

Posted by ritavine at 11:11 AM

March 08, 2003

Planning a Trip with the Web

Consider going beyond search engines to find better and more varied information for multi-faceted projects. Trip planning using web sources can demonstrate how using multiple sources improves search quality.

Try finding information on travelling to London for a week-long holiday -- and test your search in Google using the keywords london and travel.

The search results are illustrative of one of the limitations of search engines -- that they are good at helping you find something that you know is on the web already. Using our travel example, Google will turn up links to the most popular sites -- hotel sellers, the BBC, and some popular dotcom web sites.

Test a similar information search in Librarians Index to the Internet. Instead of using keywords, browse under the heading Travel. In either the General Resources section (often mistaken for a title in LII, not a link!) or any of the travel subcategories, you'll discover some gems -- a universal packing list, customs regulations, links for senior and youth travellers, a world museum portal and much, much more. Keyword searching in the LII search template using britain reveals useful subject headings that can lead you to more regional resources, route planners, art guides, historical information, locales that served as models for the Harry Potter book series, and the list goes on and on.

Consider the role that browsing and serendipity play in information retrieval. Above all, when you are using browsable tools, browse them rather than search using keywords. You may be surprised at how much you are able to find.

Posted by ritavine at 10:16 AM
Description
SiteLines is written by Rita Vine, a professional librarian, web search trainer, and lead site evaluator of the Search Portfolio web search product.

Together with other members of the Search Portfolio selection team, Rita monitors over 50 key alerting services related to web search tools, site announcements, and the business of web search. SiteLines is intended to present a distillation of the most important trends, news, and new web search tools and directories.

SiteLines is sponsored by the Search Portfolio, a licensed web desktop of the 100 top peer-reviewed web sites for searching.
