Informatics Institute, UMDNJ
Bioinformatics Workshop: Information Integration
 

Literature and Data

Literature Search Techniques

Overview

OK, it's pretty easy just to click on a link in a database and get a journal citation. But more often than not we need to do a more complex search. Our experience with web searches shows how easily a search can be run. That same experience also shows how little relevance the results often have to what we were looking for.

Understanding search strategies and the rules of an information system can make us better scientists. We'll use two tools, PubMed and the libraries' licensed resource, OVID, to examine details of searching the literature.

Controlled Vocabularies

Mature literature databases employ a "controlled" vocabulary, relevant to the discipline. There are many ways to say that you are interested in a specific disease, for example. A controlled vocabulary for searches in the biomedical literature is defined by MeSH, "Medical Subject Headings." The primary value of a controlled vocabulary is that it accounts for synonyms and variations in spelling. Controlled vocabularies enhance the specificity of searches.

Finding the correct term is not always easy. New terms are continuously evaluated and, sometimes, older terms require adaptation.

Examples:

  • The MeSH term for influenza in humans was "influenza" until 2006, when it was changed to "influenza, human".
  • The MeSH term for haemagglutinin is "hemagglutinins" (with additional narrower terms)
  • The MeSH term for influenza virus hemagglutinin glycoprotein is "hemagglutinin glycoproteins, influenza virus."
  • The MeSH term for bird flu was "influenza, avian" until 2006, when it was changed to "influenza in birds".

MeSH headings can be

  • "major" or "focused" - specifically about the topic
  • "minor" - including, but not necessarily about, the topic

MeSH headings are organized into a hierarchy.

  • Articles are indexed by humans to the most specific, or narrow, term
  • When searching a broad term, you need to "explode" the search if you wish to also retrieve citations indexed under the narrower terms.
    • PubMed explodes topics automatically for you.
    • Ovid asks you to choose regarding the "explode" option.
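The effect of "exploding" a heading can be sketched in a few lines of Python. This is only a toy model: the small hierarchy and the citation IDs below are invented for illustration and are much simpler than the real MeSH tree, and the function names are our own.

```python
# Toy hierarchy: each heading maps to its narrower headings.
# Simplified and illustrative only; not the real MeSH tree.
NARROWER = {
    "hemagglutinins": ["hemagglutinins, viral"],
    "hemagglutinins, viral": ["hemagglutinin glycoproteins, influenza virus"],
    "hemagglutinin glycoproteins, influenza virus": [],
}

# Hypothetical citation IDs, each indexed under its most specific heading.
INDEX = {
    "hemagglutinins": {101},
    "hemagglutinins, viral": {102, 103},
    "hemagglutinin glycoproteins, influenza virus": {104},
}


def explode(heading):
    """Return the heading plus every narrower heading beneath it."""
    headings = [heading]
    for child in NARROWER.get(heading, []):
        headings.extend(explode(child))
    return headings


def search(heading, exploded=True):
    """Collect citation IDs for a heading, optionally including narrower terms."""
    headings = explode(heading) if exploded else [heading]
    hits = set()
    for h in headings:
        hits |= INDEX.get(h, set())
    return hits


print(sorted(search("hemagglutinins", exploded=False)))  # [101]
print(sorted(search("hemagglutinins", exploded=True)))   # [101, 102, 103, 104]
```

Without the explode step, only the citation indexed directly under the broad term is found; the three citations indexed under narrower terms are missed.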

Different tools provide varied ways to hunt for the MeSH terms you will need to do good literature searches. Some of these tools are public, some require a subscription.

Consider the issue of "personalized medicine," a phrase used to describe how medical treatments may be customized to individuals' specific genetic makeup. We'll see how the term is handled in both "public" and "subscribed" spaces.

  • Public space: PubMed or MeSH Query = personalized medicine
  • What's happening? PubMed or MeSH combines four searches with Boolean operators, promising to return hits with "personalized" in any field AND (as well as) a set of other properties enclosed in parentheses.
    • TIAB searches the Title/Abstract fields of any PubMed record that is not part of MEDLINE; the query then searches the entire MEDLINE database for references indexed with the MeSH terms "pharmaceutical preparations" OR "medicine," OR, finally, containing medicine as a text word in the record.
  • Note: in this example, both the PubMed and MeSH databases translate to the same query. That is not always the case.
[Screenshot: query translation]
  • Subscription space for UMDNJ - Ovid Query = personalized medicine
  • What's happening? Ovid offers a selection of MeSH headings based on a proprietary algorithm. In this case you would probably select "pharmacogenetics."

[Screenshots: Ovid search and Ovid results]
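To see how the Boolean pieces of the PubMed translation fit together, here is a minimal Python sketch that applies the logic described above to a few invented records. Everything in it is an assumption made for illustration: the records, the field names, and in particular the phrase carried by the Title/Abstract clause (taken here to be "pharmaceutical preparations," which the description above does not state). It is not PubMed's actual implementation.

```python
# Hypothetical records; fields and contents are invented for illustration.
RECORDS = [
    {"pmid": 1, "title": "Personalized dosing of warfarin", "abstract": "",
     "mesh": {"pharmaceutical preparations"}, "in_medline": True},
    {"pmid": 2, "title": "Personalized medicine in oncology", "abstract": "",
     "mesh": set(), "in_medline": False},
    {"pmid": 3, "title": "History of medicine", "abstract": "",
     "mesh": {"medicine"}, "in_medline": True},
    {"pmid": 4, "title": "Personalized learning in schools", "abstract": "",
     "mesh": {"education"}, "in_medline": True},
]


def text_of(rec, fields):
    """Join the chosen fields of a record into one lowercase string."""
    parts = []
    for name in fields:
        value = rec[name]
        parts.append(" ".join(sorted(value)) if isinstance(value, set) else value)
    return " ".join(parts).lower()


def matches(rec):
    """Apply: personalized[any field] AND (clause1 OR clause2 OR clause3 OR clause4)."""
    all_fields = text_of(rec, ["title", "abstract", "mesh"])
    tiab = text_of(rec, ["title", "abstract"])
    # Clause 1 (assumed phrase): Title/Abstract match, only for non-MEDLINE records.
    clause_tiab = "pharmaceutical preparations" in tiab and not rec["in_medline"]
    # Clauses 2 and 3: indexed with either MeSH heading.
    clause_mesh_drug = "pharmaceutical preparations" in rec["mesh"]
    clause_mesh_med = "medicine" in rec["mesh"]
    # Clause 4: "medicine" as a text word anywhere in the record.
    clause_text_word = "medicine" in all_fields
    return "personalized" in all_fields and (
        clause_tiab or clause_mesh_drug or clause_mesh_med or clause_text_word
    )


print([rec["pmid"] for rec in RECORDS if matches(rec)])  # [1, 2]
```

Record 3 satisfies the parenthesized part but lacks "personalized," and record 4 has "personalized" but nothing from the parenthesized part, so the AND excludes both.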

Keyword Searching - Uncontrolled Vocabularies

"Keywords" were a good idea gone bad. Some people thought that journal authors or web page authors could imagine, in an unstructured vocabulary, a set of terms that would be chosen by people who might be interested in their subjects. As computers got faster and databases came to encompass full text, the need for specific key words has shrunk.

Generally when we search for "key words," we are looking for words in an article or book title, an abstract, any controlled vocabulary, or possibly the full text of an article, book, or web page. Sometimes free-text or keyword searching is the right way to go. For example,

  • The topic may be too new to have a controlled term assigned.
  • You may want to restrict your results to include a term that is not in the controlled vocabulary.
  • You haven't found many results using the controlled vocabulary. Keyword searching will locate references that include your terms in the title, abstract, or full text. The choice of which of these to search may be yours.
  • Keep in mind that, unlike with a controlled vocabulary, you must think of all possible synonyms and spellings yourself.
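The burden of supplying every synonym and spelling yourself can be illustrated with a short Python sketch. The list of variants and the sample titles are made up for the example. A real search interface would have its own syntax, but most accept terms joined with OR.

```python
import re

# Spelling variants the searcher has to supply by hand.
VARIANTS = ["hemagglutinin", "haemagglutinin"]

# Invented titles standing in for a free-text database.
TITLES = [
    "Structure of the influenza haemagglutinin",
    "Hemagglutinins of avian viruses",
    "Neuraminidase inhibitors in practice",
]


def build_pattern(terms):
    """Compile a case-insensitive pattern matching any term, with optional plural 's'."""
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:%s)s?\b" % alternation, re.IGNORECASE)


# The same list written as a query string for an interface that accepts OR.
or_query = " OR ".join(VARIANTS)
print(or_query)  # hemagglutinin OR haemagglutinin

pattern = build_pattern(VARIANTS)
hits = [title for title in TITLES if pattern.search(title)]
print(hits)
```

Searching for only one spelling would miss one of the two relevant titles, which is exactly the gap a controlled vocabulary closes for you.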

Field Searching

Field searching is a technique that can be used for both controlled and uncontrolled vocabularies. By now most of you are familiar with the field structure of GenBank. MEDLINE, too, has a defined structure of fields that can be used in searching.

  • GenBank queries are sometimes best made with attention to special fields such as
    • LOCUS
    • ACCESSION
    • REFERENCE
    • etc.
  • MEDLINE queries through PubMed can specify one or more fields such as
    • Author
    • Affiliation (Institutional)
    • Grant Number
    • Secondary Source ID (Molecular sequence or structure)
    • MeSH
    • Title (Article)
    • Journal Title
    • Title/Abstract
    • etc.

Earlier in the term we examined the records and field definitions for GenBank. An exhaustive description of the MEDLINE fields is likewise available at NCBI. Depending on the database or search interface you are using, the field options may differ slightly.
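As a rough sketch of field searching in PubMed, the Python below attaches search field tags to terms and builds (but does not send) a request URL for NCBI's ESearch utility. The tags shown ([au], [ad], [mh], [ti], [ta], [tiab]) are the short forms we understand PubMed to accept; check them against the NCBI field descriptions before relying on them. The helper names are our own.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Short PubMed field tags (verify against the NCBI field descriptions).
FIELD_TAGS = {
    "author": "au",
    "affiliation": "ad",
    "mesh": "mh",
    "title": "ti",
    "journal": "ta",
    "title_abstract": "tiab",
}


def field_query(**criteria):
    """Build a query that ANDs together one quoted, field-tagged term per criterion."""
    parts = []
    for field, value in criteria.items():
        parts.append('"%s"[%s]' % (value, FIELD_TAGS[field]))
    return " AND ".join(parts)


def esearch_url(term, retmax=20):
    """Return an ESearch URL for PubMed; nothing is fetched here."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


query = field_query(mesh="influenza, human", title_abstract="hemagglutinin")
print(query)  # "influenza, human"[mh] AND "hemagglutinin"[tiab]
print(esearch_url(query))
```

Restricting "influenza, human" to the MeSH field and "hemagglutinin" to Title/Abstract mixes a controlled term with a free-text term in a single query.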

Web Searching

We are all familiar with dropping terms into Google to find interesting pages on the web, but we have only a limited idea of what it is doing behind the scenes, and search engines will mix scientific journal articles with other, less well-founded information.

Google publishes its own account of how it ranks results. Web search engines use private, or proprietary, algorithms for searching and ranking the results. These algorithms change often in a never-ending battle with savvy webmasters who figure out how to get their sites to the top of the list.

Google Scholar is an attempt to help web users retrieve more scholarly literature than general search engines return.

  • Google has made arrangements (and is having other fights) with some publishers to search full text and link to it, but actual retrieval of the full text depends on whether the articles are freely available or your institution has a subscription to the e-journal.
  • Set Google Scholar's preferences to "Show links to import citations into" your bibliographic software if you wish. More on that at the end of this workshop.

Structured Queries on the Web

Most search engines use some form of Boolean logic, letting you include or exclude terms with AND/OR or +/-, and most offer some kind of advanced search for fine-tuning your results. Behind the scenes, they quietly combine your terms and, with luck, come up with a useful list of results. The advanced search features add some structure to your query, but you are still using key words and need to allow for synonyms and variations in spelling.
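One common way to picture the Boolean combination is as set operations over a word index. The sketch below is a deliberately simplified model with three invented pages. Real search engines use proprietary and far more elaborate methods, so treat this only as an illustration of AND, OR, and exclusion.

```python
from collections import defaultdict

# Invented pages standing in for the web.
PAGES = {
    1: "avian influenza outbreak in poultry",
    2: "human influenza vaccine trial",
    3: "poultry farming economics",
}


def build_index(pages):
    """Map each word to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query, all_ids):
    """Treat 'word' or '+word' as required (AND) and '-word' as excluded (NOT)."""
    results = set(all_ids)
    for token in query.lower().split():
        if token.startswith("-"):
            results -= index.get(token[1:], set())
        else:
            results &= index.get(token.lstrip("+"), set())
    return results


INDEX = build_index(PAGES)
print(search(INDEX, "influenza -avian", PAGES))     # {2}
print(search(INDEX, "+influenza +poultry", PAGES))  # {1}
print(INDEX["avian"] | INDEX["human"])              # OR as a set union: {1, 2}
```

Note that the index matches exact words only, so "flu" would find nothing here. That is the synonym problem again, and structure in the query does not solve it.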

Google Scholar, as well as general and specialized web search engines, will retrieve citations, and sometimes full text, from scholarly literature. Google Scholar's advanced search feature allows you to specify an author, journal, time frame, or general subject area as part of your query.

Though they offer some basic form of structured searching, most web-based search engines do not offer the highly precise and structured query interface available through bibliographic database interfaces such as Ovid MEDLINE or PubMed.

Page last updated March 4, 2008
