October 27, 2005
The Challenge of Evaluating Health Search Tools
Tony Gentile of Healthline has posted a long and interesting comment on my review of Healthline.com at http://www.buzzhit.com/2005/10/rita-doesnt-dig-us.html
Near the end of his article, Gentile asks,
“Is your fundamental believe [sic] that only ad-free content can be trusted? If so, unfortunately, that would put many companies out of business.”
I believe that there is good information on health matters provided by all kinds of web sources -- educational institutions, commercial web sources, and maybe even in some wiki-style sources (although even Wikipedia’s founder has publicly admitted to serious problems with content quality).
In my view, the issue of trustworthiness of search tools isn't simple. Evaluators like me are concerned more about disclosure, the content of the index, and the relevance and quality of results. Trustworthiness of search tools emerges when levels of disclosure, content, relevance and quality are satisfied. It's a package deal, and it doesn't matter if the tool is commercial or non-commercial in origin.
The vast majority of commercial search tools on the web earn their revenue from different sources in order to deliver free information – and that includes not just sidebar ads, but also things like paid inclusion of content, paid placement for top results, partnerships, sponsorships, and other types of mechanisms that lead to preferred inclusion, or improve the positioning of content of paying partners. That immediately distorts the playing field of information indexed and retrieved by these search tools.
More disclosure, and more visible disclosure (i.e. alongside the actual result), would help a lot. I'm not the only one who feels this way: Consumerwebwatch.org has commissioned several studies on disclosure, and continues to conclude that there isn't enough of it, and that more of it, more prominently placed, would help consumers a lot.
Unfortunately it rarely happens, because there’s no outright requirement to disclose, except for lightweight US FTC guidelines, which govern only US-domiciled search tools.
Search tool marketers aren't any different from marketers of any stripe -- if they do their job right, people will believe that their product is somehow better, different, more comprehensive, and more relevant – in other words, more trustworthy. Trust is particularly important for consumer health information. Sometimes on-the-page marketing content is really helpful in building trust, often it's a little over-zealous, sometimes it is downright misleading, and sometimes there is no information at all. It's my job to scratch the surface and figure out if the search tool can actually deliver the goods.
Reviewers like me face challenges when reviewing commercial search tools. Without sufficient on-the-page disclosure, we must look at results of searches, in order to figure out the underlying index, the content delivery structure, and compare those results to what we could find in existing search tools.
Let’s look at the disclosures and promotional information in a tool like Healthline, as an illustrative example. According to the web site, 1,100 doctors are associated with Healthline. They helped to build Healthline's healthmaps, synonym and taxonomy structure, and are associated with Healthline's original content. Healthline's editorial policy speaks of physician and editorial involvement in original content creation and selection of licensed content.
I couldn't find any original Healthline-created medical content anywhere in the Healthline site during the three times I checked it (my last check was October 27) -- the best I could find was third-party licensed content. In his blog post, Gentile also says that there is much original Healthline content that isn't yet on the site -- built by those same doctors. I hope that the original content is added soon, because Healthline’s editorial policy page leads readers to believe that the original content is there already.
In the How We're Different section of the help file, Healthline states:
"...when you search with Healthline, we use the collective knowledge of over 1100 doctors to give you more precise search options, all just a click away (we call this Medically Guided Search)."
And in the Guided Tour, this statement appears:
"What if you could have over one thousand doctors help you search for health information on the Internet?"
In my view, Healthline's statements about physician involvement can make searchers believe that whatever they see linked from the Healthline site is either selected or written by those doctors. That’s not the case, and although the exact nature of physician involvement is disclosed in the site's help file, it took me three careful re-reads to realize that much -- even most -- of the content in Healthline's search index isn't vetted by those physicians.
There's also no disclosure that I can find anywhere on the site of just who some of those 1,100 Healthline-associated physicians are. Where are the names and credentials?
What about Healthline's healthmaps? Healthline claims that at least some of those 1,100 doctors helped build those healthmaps.
Some of the healthmaps are so generic that they look like placeholders. Perhaps they are, and will be updated in future builds of Healthline content. (I ran this check on October 27.) For example, see the Healthline healthmap for LUPUS NEPHRITIS, which looks fairly generic. In fact, it’s identical to the one for HEADACHE. Like the healthmap for HEADACHE, one of the Healthline healthmap links for Lupus Nephritis is to PREVENTION, an odd choice for a condition where Healthline's licensed ADAM-derived reference article states that there is no known way to prevent the disease.
And what happens when you click on PREVENTION in the Lupus Nephritis healthmap? You get links from the Healthline index that match the keywords lupus+nephritis+prevention but not, as the healthmap would lead you to believe, about prevention of lupus nephritis specifically. In fact, I only spotted one link, from a 2002 research article, that tangentially mentioned the possibility that heparin could prevent the occurrence of nephritis in experimental lupus.
Searching for another, more developed, healthmap for a disease, I went to the ACNE healthmap. When you click on a link for SYMPTOMS in the ACNE healthmap, the links presented come from the general index, using the pre-encoded search keywords ACNE SYMPTOMS. To see if the mapped links would be any different from the links I would get from a keyword search in Healthline, I went back and did another search in the top page of Healthline for the keywords acne symptoms. I got exactly the same 13 results that were delivered by the healthmap.
What good is a visual taxonomy if the linked information isn't any better than one could get with a simple keyword search? The idea of a medical taxonomy is to deliver information results that keywords alone couldn’t achieve.
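For reviewers who want to repeat this kind of healthmap-versus-keyword check on any search tool, the comparison can be sketched in a few lines of Python. This is only an illustration: the function name compare_results and the example.com URLs are my own placeholders, not Healthline data, and the result lists are assumed to have been copied by hand from the two result pages.

```python
def compare_results(taxonomy_links, keyword_links):
    """Compare two ordered lists of result URLs (hypothetical helper)."""
    tax_set, kw_set = set(taxonomy_links), set(keyword_links)
    union = tax_set | kw_set
    return {
        # True only when both searches return the same links in the same order
        "identical_order": list(taxonomy_links) == list(keyword_links),
        "only_in_taxonomy": sorted(tax_set - kw_set),
        "only_in_keyword": sorted(kw_set - tax_set),
        # Share of all distinct links that appear in both lists (1.0 = same set)
        "overlap": len(tax_set & kw_set) / len(union) if union else 1.0,
    }


# Placeholder URLs standing in for links copied from the two result pages.
healthmap = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
keyword = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

report = compare_results(healthmap, keyword)
print(report)
```

An overlap of 1.0 with identical ordering is what I saw in effect: the mapped link added nothing that the plain keyword search didn't already deliver.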
Ultimately, it's the quality of the search results for ACNE SYMPTOMS that matters most to health information searchers. Healthline tells us:
"Healthline only searches the top health sites on the web, so you only receive the best health information, without having to sift through pages of unnecessary and unrelated results."
Among the 13 results delivered from a search on the healthmap for acne symptoms, I got links that included basic dictionary-type acne information from a yoga site; an online questionnaire that I could submit to a Rosacea-treatment manufacturer on my Rosacea symptoms; and an "Acne Knowledge Map" from an alternative treatment shopping site, GoldBamboo.com.
When one compares the quality of results in Healthline to those in other comparable health search tools, like MedHunt (http://www.hon.ch/MedHunt/), it’s easy to see the difference. MedHunt’s search for the keywords acne symptoms returns 717 results (compared to 13 for Healthline) that are clearly differentiated in useful ways. Commercial sites are noted. Some are reviewed, others not, and that too is disclosed. Hits are organized by relevance but there are alternative ways to display them. Disclosure statements and content descriptions are easy to find in MedHunt's About Us section.
The web-based health information world already has good search tools – MedHunt and Medline Plus are just two examples, but I could provide more. I hope that more good search tools will continue to be developed and existing ones refined. When we review new search tools, it’s important to compare new tools to best-of-breed tools that already exist, and to test all claims of quality, comprehensiveness, relevance, and authority.
UPDATE October 28 2005: Somehow I made an error citing the contact for Healthline. I have corrected the information in this post today. Oops. Thanks to Tony of Healthline for spotting this and letting me know.
October 19, 2005
Wikipedia Founder Admits to Serious Quality Problems
It's hard to believe that the Wikipedia has led such a charmed life. Encyclopedia-by-committee, even with some editorial oversight, is prone to hazards. There's amazingly variable quality between entries, and it's almost impossible to prevent always-present hackers from inputting bad, wrong, or dubious information ...just for fun.
So it's no surprise that the Register reports this week that Wikipedia's founder is reporting serious quality problems. Since when did information-by-committee replace serious editorial review? Read the article at http://www.theregister.co.uk/2005/10/18/wikipedia_quality_problem/.
October 18, 2005
Scratching Under the Surface of a "New" Health Search Engine
Lots of buzz this week about Healthline.com, a new vertical search engine for medical information. Chris Sherman, in his SearchDay review, quotes the company's promotional material, which indicates that the site covers "62,000 web sites with between 45-50 million pages... [and] hosted content licensed from reliable content providers."
However, my own initial examination showed a site that offers little to rival the best quality ad-and-sponsorship-free medical content on the web through sites like Medline Plus. Healthline relies principally on content from popular pre-existing 3rd party .com sources that could be obtained from any commercial search engine.
I conducted a search of the keywords lung cancer in order to obtain results. The first link, Lung cancer - small cell (Doctor-Reviewed information) led to a brief definition of the term, reviewed by Allen J. Blaivas, D.O., Division of Pulmonary and Critical Care Medicine, UMDNJ-New Jersey Medical School, Newark, NJ, and updated in early 2005. A small unlinked logo on the right side of the page suggested that the content was derived from A.D.A.M., a popular consumer health encyclopedia which is also used in Medline Plus and many others. A close comparison revealed exact duplication of content, and attribution to Blaivas, in both the Healthline entry and the A.D.A.M. entry.
Returning to the list of top results in Healthline.com for lung cancer, I wondered about the inclusion of lung cancer links from the site Worldhistory.com. What's the link between lung cancer and history? Answer: nothing. The site is simply a domain name that repurposes the content of the Wikipedia, word for word.
Another link, to lung cancer links from Goldbamboo.com, offers "a comprehensive online source that combines Eastern and Western health and wellness information." The links to lung cancer information are largely sourceless, although it's clear from a cursory examination that all or almost all the informational content is repurposed from pre-existing sources that contain the keywords "lung cancer". In the about page of Goldbamboo.com, it becomes clear that the purpose of the site is to offer paid inclusion of content: "Each page view includes highly relevant product and advertising information tailored to the consumer's stated interests."
Most of the remaining links on the Healthline search results page represented commercial sites such as Medicinenet.com, Healthwise (in this case repurposed through the Everett Medical Clinic, a chain of private medical clinics in Washington State), and Emedicine.com -- three popular commercial information sites whose content is repurposed in many other information sites, all well-linked in major search engines. The news links are derived from general news sources repurposed from Topix.net, a well-known commercial provider of news services.
The only site in the list of links that lacked advertising or some other form of paid sponsorship or repurposing of pre-existing commercial partnered content was NLM's MedlinePlus.gov.
I don't mind companies like Healthline trying to market themselves with a little puffery during launch (like asserting that the site was "created in collaboration with 1,100 physician specialists") but it's important for reviewers not to believe everything in the press kit. Clearly the vast majority of those "physician specialists" in Healthline come from somewhere else, likely the 3rd party content providers who supply much of the Healthline-branded content.
If plain-language lay-level searching is important, there are plenty of better options out there for non-medical searchers. You don't have to speak "medicalese" to MedlinePlus, the Canadian Health Network, or OMNI, either, and even the research-focused PubMed offers spelling checks, terminology alternatives, and sophisticated back-end query reformulations to non-experts.
And what about the content of Healthline? Much of it is derived from common, well-known and well-positioned consumer web content, and there is little about the search results that looks different from anything else on the commercial, free web. And if sites like Wikipedia -- where volunteers with minimal editorial oversight contribute content, and even its founder concedes serious quality problems -- qualify as "high-quality, authoritative information", what does that say about the standards that information professionals set for the quality of health information?
October 17, 2005
Google Scholar Grows - An Update
Google Scholar's chief engineer, Anurag Acharya, contributed a presentation “Searching Scholarly Literature: A Google Scholar Perspective” at the 9th World Congress on Health Information and Libraries, September 23, 2005.
Some key points:
The index has grown significantly in the last six months, although the company does not disclose the actual index size
Coverage by category is focused on medicine and sciences -- medicine 22%; engineering 14%; biology and sociology, 13% each; physics 12%
GS indexes full text of all publishers except for Elsevier and ACS (probably because Elsevier's competitor product, Scirus, is the publisher's preferred source of Elsevier's full-text content)
Google Scholar's sources come not just from publishers (both open-access and commercial), but also from MANY 3rd party hosting services like Highwire, Ingenta, and academic institutional repositories.
Acharya also provided some clues on the question of how relevance is achieved in Google Scholar. He mentions conditions such as "who wrote it, where it was published, how many people cite it, where citations are from" as clues to the relevancy question.
Acharya claims that, contrary to reviews indicating otherwise, the indexing of PubMed content is "fairly complete." (Watch for a test of that assertion in a future issue of SiteLines.)
October 05, 2005
New! Google Blog Search
Not to be outdone by upstart competitors (Technorati, Blogdigger, Feedster, and more), Google has announced a beta-version of its blog search. This is still a baby-beta version: it covers blog content back only to June 2005 so far, although it's reasonable to expect that the coverage will increase as takeup of the product ramps up.
Unlike most Google search appliances, Google's Blog Search doesn't search the full text of blogs -- rather, it only searches the "feed" -- the part of the blog posting that an author sends out through an RSS feed. Most bloggers only send a short part of their blog posts through feeds, and as a result Google's blog search won't cover the parts that aren't fed. What's more, if a blog lacks a feed (through either RSS or Atom) Google's Blog Search won't index it at all.
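To see why feed-only indexing matters, here is a small Python sketch. It is not Google's actual implementation, just an illustration under my own assumptions: the sample RSS feed, the example.com URL, and the function names are all invented, and the "index" is nothing more than the title and description text of each feed item.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 feed: the description carries only a short excerpt.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Evaluating health search tools</title>
      <link>http://example.com/post-1</link>
      <description>A short excerpt about disclosure.</description>
    </item>
  </channel>
</rss>"""

# Invented full text of the same post, as it would appear on the blog itself.
FULL_POSTS = {
    "http://example.com/post-1": (
        "A short excerpt about disclosure. The full post goes on to "
        "discuss taxonomy and relevance."
    ),
}


def feed_text_by_link(feed_xml):
    """Map each item's link to the text a feed-only indexer would see."""
    root = ET.fromstring(feed_xml)
    texts = {}
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        texts[link] = f"{title} {description}".lower()
    return texts


def feed_only_search(feed_xml, term):
    """Return links whose feed text (not full post text) contains the term."""
    term = term.lower()
    return [link for link, text in feed_text_by_link(feed_xml).items() if term in text]


print(feed_only_search(SAMPLE_FEED, "disclosure"))
print(feed_only_search(SAMPLE_FEED, "taxonomy"))
```

In this made-up example the word "taxonomy" appears in the full post but not in the feed excerpt, so a search limited to the feed never finds it.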
Although this seems like a substantial problem for the moment, I expect that most bloggers will catch on to this and adjust their feed content to contain either fuller blog content or more carefully-crafted keywords in the feed content.
Google's Blog Search also provides its own RSS feed. That's not particularly novel, but with a tool of this size, it's a big plus for those of us who are trying to keep up to date in the ever-changing web world. Just key in your search, then click the feed button to get a feed link that you can plug into your newsreader of choice.