May 29, 2003
Does Google prefer recent pages?
In the article "What Google Leaves Out," Elwyn Jenkins of MicroDoc News describes a test he performed to identify what types of pages Google doesn't spider. Jenkins set an automatic crawler to look for occurrences of the benchmark word "googlology" (he knew something about the history of the term and the pages that contained it) and found that Google picked up about 29% of available pages that had the term. The pages left out of the Google database tended to be older uses of the term.
Does Google prefer to index recent pages rather than older ones? In the long term, what impact will this have on the availability -- and retrieval -- of legacy information on the web? I hope that this early-stage research will be expanded and we'll see more search engine legacy studies in the future.
May 28, 2003
Meta-Search Template for Images
Michael Fagan is an Ontario high school student and creator of FaganFinder, a search meta-resource consisting of several all-in-one pages that permit keyword searching for various categories of materials. One of the most useful parts of FaganFinder is the Image Search Engines page, which enables users to conduct off-the-page keyword searching of over 40 image databases. Just check the radio button of the source you want to search, then type in your keyword(s) and search.
I tried searching "superman" off Fagan's page for about 2 minutes (I timed myself) and came up with several good-enough links to make my time well spent.
I'm not usually a fan of secondary-source search templates. "Secondary source" in this case means one site (e.g. FaganFinder) providing a keyword search link to a second site (e.g. Google Image Search), which is the actual "primary" source of the database. Much can go wrong with secondary-source programming: you're entrusting the search to a third party who may not instantly update the search link when it changes. As a result, your search may incorrectly produce low or zero results. In addition, secondary-source search templates may not contain all the search options of the primary search page, which could mean more low/zero results.
But with those caveats, this little tool is a handy way to quickly check many sources of pictures and icons.
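For readers curious about what such a template does behind the scenes, here is a minimal sketch in Python. The site names, URLs and parameter names are hypothetical placeholders, not the real query formats of FaganFinder or any image database. It also shows why these templates are fragile: the link format is stored on the secondary site, so if the primary source changes it, the template breaks until someone updates the table.

```python
from urllib.parse import urlencode

# Hypothetical endpoints: these URLs and parameter names are placeholders,
# not the real query formats of any actual image database.
SOURCES = {
    "example-images": ("https://images.example.com/search", "q"),
    "example-icons": ("https://icons.example.org/find", "term"),
}


def build_search_url(source: str, keywords: str) -> str:
    """Return the search URL on the chosen primary source for the keywords."""
    base_url, param = SOURCES[source]
    # urlencode escapes spaces and punctuation so the keywords survive the trip
    return f"{base_url}?{urlencode({param: keywords})}"


print(build_search_url("example-images", "superman"))
```

Picking a radio button corresponds to choosing a key in the table; typing keywords fills in the query parameter.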
May 26, 2003
Creating Learner-Centered Instruction
Creating Learner-Centered Instruction is an open-access, free online course from the Faculty Development Institute at Virginia Tech. The course is "designed to provide ... the necessary resources to begin investigating the process of designing instruction to enhance student learning." The self-study program is dense, and not for beginners, but experienced trainers will find many fresh ways of thinking about design and planning of learner-centered training.
I particularly liked the lessons on creating student goals and objectives. I spend a portion of my time training librarians to plan and design effective end-user learning experiences, and without question, creating good objectives is the hardest part of the process.
Good objectives can inspire all other aspects of a training session, and so it's important to get them right. As occasional trainers, it's common for library professionals to get too caught up in seeing goals from their own point of view, rather than the learner's. As a result, much unnecessary content ends up being included in training programs, and the learner isn't fully engaged in the learning experience.
May 22, 2003
Key findings from the Consumer WebWatch Study on recognition of paid results in search engines
Consumer WebWatch (http://www.consumerwebwatch.org), the web watchdog arm of Consumer Reports, will release an important study in late June 2003 on the degree to which consumer web users can recognize the presence of paid listings on a search site. We'll have to wait until June for the full report, but preliminary results of the study were reported at a recent industry panel held on April 24 and titled Building Trust on the Web: Consumer WebWatch's First National Summit on Web Credibility.
The April 24 panel paired pay-for-placement leaders Matt Cutts of Google and Doug Leeds of Overture against the research. The study's key findings tell the story beautifully. "All study participants expressed surprise after learning about pay-for-placement (some had emotional reactions), and their behavior changed accordingly. Most consumers said paid search disclosure information on search and navigation sites was too difficult to recognize or find on many sites, and that the information available was clearly written for the advertiser, not the consumer."
Keyword-selling search tools (like Google and Overture) have always asserted that consumers know the difference between pay-for-placement content and "true" search results. Our own experience at Workingfaster.com in training both typical and more advanced searchers matches the findings of the report -- that almost no searchers are aware of pay-for-placement content until it is pointed out to them, and thereafter their behavior and attitudes change.
May 21, 2003
Resources on Corporate Governance
Looking for something else, I came across an excellent annotated list of web resources related to corporate governance, written by Louise Klusek, librarian at Salomon Smith Barney in New York. It appears in the Winter 2003 issue of ChapterNews, the newsletter of the Special Libraries Association New York Chapter. Available at http://www.sla.org/chapter/cny/chapternews/75_1.pdf, the article is on pp. 12-13.
These web site features appear regularly in ChapterNews -- I spotted similar articles on anti-money laundering (Fall 2002) and on venture capital and private equity (Spring 2002). Business and law librarians may want to bookmark the ChapterNews site -- the web resource lists are valuable and the topics -- which lack precise terminology -- often prove difficult to research effectively in search engines alone.
Dispute Escalating Elements of Email
This has little to do with web research, but it's a great article by an established authority in the field of dispute resolution and worth a read. Raymond Friedman of Vanderbilt University specializes in the study of conflict, negotiation, and diversity. His article "E-Mail Escalation: Dispute Exacerbating Elements of Electronic Communication" suggests that some of the benefits of email are responsible for its tendency to escalate conflict.
An excerpt from Friedman's conclusion:
"...e-mail does have some characteristics that make it highly susceptible to conflict escalation: E-mails [sic] reduces feedback and social cues, allows for excess attention to be focused on statements made, introduces new tactics (such as argument bundling) that can lead to the use of heavy tactics, makes the other’s party’s tactics seem more heavy, creates deindividuation, enhances biased perceptions of the other party, and makes it harder to resolve disputes. As a result, escalation is more likely than would be the case in face-to-face or phone communication. These problems can be managed, and perhaps – over time – most people will be come skilled enough in e-mail and aware enough of its risks that the effects we propose will disappear. For the time being – and probably into the foreseeable future—we must use caution regarding how we act when addressing and resolving disputes via e-mail."
May 16, 2003
What happens when Google updates its index?
Denise Bisson, librarian and indefatigable selector of quality web sources for the Search Portfolio, found information on the "Google Dance" -- the process that Google goes through to recalculate rankings with its PageRank algorithm. The process can take several days to complete. During this period, search results may fluctuate, sometimes minute-by-minute, hence the use of the term "dance" to describe it.
Phil Craven's article about the Google dance provides a good explanation in plain language. I also discovered a helpful although unattributed visualization tool of the "dance" in action, which allows you to check the index at all 3 Google servers simultaneously. Possibly not useful for the average searcher, but the story feeds our inner geek.
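For the inner geek, here is a toy sketch of the kind of calculation involved. It is the textbook iterative formulation of PageRank, assuming a tiny three-page web and a commonly cited damping factor of 0.85; it is not a description of how Google actually runs its update. The point is that scores are refined over repeated passes, so results read partway through can differ from the final ones.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outgoing links.

    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # every page gets a small baseline share regardless of links
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # a page splits its damped score evenly among pages it links to
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # a page with no outgoing links spreads its score over all pages
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy web, page c ends up ahead of page b because it collects links from two pages rather than one.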
May 15, 2003
AgeSource Worldwide
AgeSource Worldwide is a new free database on aging from the American Association of Retired Persons (AARP). It's a nice complement to the better-known AgeLine (also available from the AARP), which indexes aging-related information in books, journals, and videos.
AgeSource Worldwide is not a journal index; rather, it is a metasite that includes links to "clearinghouses, databases, libraries, directories, statistical resources, bibliographies and reading lists, texts, and Web "metasites" focused on aging or closely allied subjects." This is an excellent resource that belongs in any link list related to aging, health or wellness, and a great model of how high quality links can be compiled and delivered over the web.
As a new resource, it's not large yet - only 200 resources, and although the project is international in scope, over half originate in the U.S. Don't be put off by the apparently small number -- each link represents a sizeable collection of items.
AgeSource enables topic selection through a series of checkboxes -- a big bonus when search language can be variable -- and it's also keyword-searchable. Helpful annotations precede entry into any of the links. The site is also available in French and Spanish.
May 13, 2003
Google World - Everything Google
Since its inception, Google has quietly taken on many small experimental projects -- some quite useful to serious searchers. Tools like Google Sets (algorithmically finds synonyms to keywords in the Google cache), Google Viewer (delivers snapshots of web pages in the hit list, slowly) and others are interesting and fun to try, and some offsite searches (like the links to Google's patent applications) offer clues to development strategies taking place at Google.
It's always a struggle to find these pieces on the web or within the Google sitemap, but now there's a way. Google World, an easy-to-use mini-directory from Indicateur.com, links to all those little Google pieces that are hard to find unless you can remember the exact names or URLs.
Thanks to Genie Tyburski of the Virtual Chase for finding this little gem from Chris Sherman's longer Search Day article.
May 12, 2003
Google launches Canadian and UK news services
Google announced today that it has launched both Canadian and UK news services. Readers should be advised that there are only the most minor differences between these regional Google news versions and the baseline version of Google News (http://news.google.com).
I compared http://news.google.com with http://news.google.ca. Although both baseline Google News and Google News Canada cover the same 4,500 news sources, an algorithm selects Canadian-focused news stories in the .ca version and places them more prominently on the front headlines page.
The differences pretty well stop there. I searched the keyword "sars" in both the .com and .ca versions of Google News to discover precisely the same results in the results page, in precisely the same order, so no visible differences there.
The news announcement received loads of press today -- all this for a barely altered algorithm?
More on what's behind web addresses
Librarian Greg Notess' On the Net columns in Online Magazine are required reading for serious searchers. In the May/June issue, Greg deconstructs web addresses, helping us understand basic conventions but also covering such topics as web-address shortening tools, alternative URLs, how URLs can be altered for tracking purposes, and ways that spammers can cloak a URL so it can't easily be identified and reported as spam.
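To make the idea of deconstructing a web address concrete, here is a small sketch using Python's standard library. The address is a made-up example (not one from Notess' column), and the `ref` parameter stands in for the kind of tracking tag that can be appended to a URL.

```python
from urllib.parse import urlsplit, parse_qs

# Made-up address for illustration only
url = "http://www.example.com:8080/catalog/item.html?id=42&ref=newsletter#reviews"

parts = urlsplit(url)
print("scheme:  ", parts.scheme)    # protocol used to fetch the page
print("host:    ", parts.hostname)  # the server's name
print("port:    ", parts.port)      # usually omitted when it is the default
print("path:    ", parts.path)      # location of the page on the server
print("query:   ", parse_qs(parts.query))  # name/value pairs sent to the server
print("fragment:", parts.fragment)  # a spot within the page
```

Removing the `ref=newsletter` pair would typically still lead to the same page; it is the sort of addition that exists to tell the site where a visitor came from.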
May 10, 2003
Web Links Losing to Search Engines
According to a March 2003 report from WebSideStory, the percentage of Internet users worldwide using search engines to arrive at their desired online destination is growing while all other ways of finding sites are shrinking. Over the past year, search engine use grew to over 13% from just over 7%. Meanwhile, the share of people using web links has declined dramatically from over 42% in March 2002 to roughly 21% as of March 2003.
This trend will likely continue. Search engines like Google are featured so frequently in the news that awareness and use of them will doubtless increase. As a result, browsing through high quality filtered catalogues will become a lost art, as will the discovery of web resources through serendipity (much like browsing the library card catalogue, if anyone remembers that).
Yet our classroom tests and exercises routinely prove to serious searchers that search engines alone don't perform as well as searching/browsing several different types of search tools to identify topical resources. Search engines are great for finding things you know are on the web already, but insufficient for richer resource discovery. Search engines like Google help us discover what is already well known and popular, but relegate unpopular (but often high quality) resources to the bottom of the hit lists where they will almost surely be missed.
May 07, 2003
NLM Implements Educational Clearinghouse
Another winning initiative from the US National Network of Libraries of Medicine -- an Educational Clearinghouse Database as part of their National Training Center and Clearinghouse initiative. The database links to information, training resources and available courses related to biomedical topics and tools from the National Library of Medicine, the National Network of Libraries of Medicine and other governmental, educational and not-for-profit sites.
There isn't much in the database yet (I counted about 100 entries today) but that will likely change quickly. Medical information searchers and librarians will find many of the resources immediately useful. One cautionary note -- as a clearinghouse there is some variability of content quality, so caveat lector.
May 06, 2003
How to Search by Format
Read this article by author and business information researcher Mary Ellen Bates on how to search by format for special information types (Adobe Acrobat files, MS Office formats). Although Bates provides examples from the competitive intelligence world, instructors can use the format lookup options in selected search engines as a method to identify sample training materials in DOC or PDF formats and topical presentations in PowerPoint.
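As a quick illustration of a format search, the sketch below builds query strings using Google's `filetype:` operator. The helper name and sample topics are my own, not taken from Bates' article, and other search engines may use a different syntax or an advanced-search menu instead.

```python
def format_query(keywords: str, file_format: str) -> str:
    """Build a query string that limits results to one file format.

    Uses Google's `filetype:` operator; other engines may differ.
    """
    extension = file_format.lower().lstrip(".")  # accept "PDF", ".pdf", "pdf"
    return f"{keywords} filetype:{extension}"


# Hypothetical training-material searches
print(format_query("competitive intelligence", "PDF"))
print(format_query("information literacy lesson plan", "doc"))
print(format_query("web searching workshop", ".ppt"))
```

Paste any of the printed strings into the search box to try it.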
May 05, 2003
Librarians in the Movies
Martin Raish, Director of the library at BYU-Idaho, keeps a list. Two choices here: fuel your fiery rage against the persistence of librarian stereotypes, or just enjoy it. Shhhh....
May 04, 2003
From Google: Why Pages are Missing from the Google Database
Although it's certainly not a complete explanation of why so much content is missing from Google's database, Google provides some answers. The article is directed at search engine optimizers but is helpful to advanced searchers as well, and reminds us that Google is but one of many search starting points.
May 02, 2003
Comments on AskJeeves' Redesign
The AskJeeves.com redesign presents a cleaner, whiter interface, with image, news and shopping search options. Initial results are identical to Teoma (which is owned by AskJeeves) but lack Teoma's Related Search links and helpful sidebar to subject portals. It's these "subject specific popularity" algorithms that make Teoma an interesting Google alternative. (See my posting Subject-Specific Popularity -- Teoma's Magic Bullet? for more info on this feature.)
Nothing in the redesign changes my resistance to including AskJeeves in a serious web searcher's toolbox. If you want to keyword-search the web for information, Teoma is a better choice than AskJeeves. For images, check out some of the resources reviewed in Review of Image Search Tools.
AskJeeves for Kids remains a good quality choice for kid-friendly sites, as it lacks some of the advertising gimmickry that seems to infect most commercial web search tools aimed at adults.