The Advanced Retrieval search engine is a powerful tool offered by Textpresso for real-time information retrieval from the C. elegans literature. Like Simple Retrieval, the Advanced Retrieval tool is designed to combine keywords with ontology categories in performing searches. However, the Advanced Retrieval tool also allows the user to specify boolean operators, ontology category subgroups and the number of occurrences of an ontology category or keyword in the search.
If used properly, Advanced Retrieval facilitates the formulation of queries that are closer to semantic questions than those possible with any other search engine for biological literature. A well-designed Advanced Retrieval query allows the user to ask questions such as, "What genetic expression affects Ras signaling?".
In order to make optimal use of the Advanced Retrieval tool, the user must have a good understanding of the meaning of each ontology category and its subgroups.
A few notes before we start:
- The first row in the query table must be filled out; otherwise no matches will be returned (this prevents the user from performing just a keyword search with the Advanced Retrieval tool). Any other rows left empty will not affect the query.
- The Advanced Retrieval search engine processes rows from top to bottom. In other words, the matches of the first row are found first; the system then applies the boolean operator of the next filled row (if one is present) and matches that row against the results of the first, and so on.
- It is possible to generate very large result sets with the Advanced Retrieval tool. The goldturtle server is set to time out after 5 minutes, so, until we get a better server, try queries with large returns at times of low traffic (e.g. at night) for best results.
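The top-to-bottom processing described in the notes above can be pictured with a small Python sketch. It is only an illustration, not Textpresso's actual implementation: the `combine_rows` function, the use of sentence numbers as matches, and the reading of AND, OR and NOT as set intersection, union and difference are all assumptions made for this example.

```python
from typing import Iterable, Set, Tuple

# Hypothetical model: each filled row is (boolean operator, set of matching
# sentence numbers). The operator of the first row is ignored.
Row = Tuple[str, Set[int]]

def combine_rows(rows: Iterable[Row]) -> Set[int]:
    """Fold rows top to bottom, applying each row's operator to the running result."""
    result: Set[int] = set()
    first = True
    for operator, matches in rows:
        if first:
            # The matches of the first row are found first.
            result = set(matches)
            first = False
        elif operator == "AND":
            result &= matches   # keep only what both have in common
        elif operator == "OR":
            result |= matches   # add the new matches
        elif operator == "NOT":
            result -= matches   # remove the new matches
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result

rows = [
    ("AND", {1, 2, 3, 4}),  # first row: sentences 1-4 match
    ("AND", {2, 3, 4, 5}),  # keep sentences that also match this row
    ("NOT", {4}),           # drop sentence 4
    ("OR", {9}),            # add sentence 9
]
print(sorted(combine_rows(rows)))  # [2, 3, 9]
```

Because rows are applied strictly in order, with no operator precedence, rearranging the rows can change the result.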
The output of the search is displayed on a "Summary Page", which summarizes details of the publications that contain matches. The user selects which matches they would like to view and the results are shown on the "Results Page".
Read on to find out how to use this powerful tool!
1. Search Buttons
Clicking on the "Search!" button initiates a search with the entered parameters. There are three other buttons. The "Load last query!" button remembers the last user query and loads it again. The "Clear last query" button sets the query table back to the default values. The "Undo current changes!" button allows the user to undo the last change they made to their search.
2. Text Corpus Selection and Author/Year Search
The user may search any combination of "Titles", "Abstracts" and "Papers" by selecting the boxes beside these options.
3. Sentence Search vs Publication Search
The user has the option to search the input parameters within individual sentences or entire publications. Note: this only applies when
"Abstracts" and/or "Papers" are selected to search against.
4. Setting the number of category and keyword rows in the query table
The user may set a different number of ontology category and keyword rows in the query table. The default number (three rows of each) may be changed on the customization page. Alternatively, to set the number of rows temporarily, use the "Reproduce query table" button at the bottom of the Advanced Retrieval search page, setting the category and keyword row numbers as desired.
5. The query table features
- Boolean Operator
The user may specify one of three boolean operators (AND, OR, NOT) between any two keywords or ontology categories. The default boolean operator is AND.
- Category or Keyword
Keywords and/or ontology classes are selected in this column. In the top portion of the query table, the user may choose from a drop-down menu of ontology classes. Keywords to be searched are entered in the search fields in the bottom portion of the query table, for example, "let-60", "dumpy" or "1999". A wildcard (*) is inserted by default after each word. For example, if you type "egl" in a keyword field, the return would include matches with egl, egl-1, egl-2, egl-30, etc. The user can select the exact match box to turn off the wildcard insertion (see Category or Match Attributes below). Note: the keyword search is not case sensitive. There are three ontology class menus and three keyword search fields in the query table by default. The number of keyword fields and ontology class menus that appear on the Advanced Retrieval search page may be altered for a particular search (see "Setting the number of category and keyword rows in the query table" above), or the default number that appears may be set on the customization page.
- Category or Match Attributes
Attribute values of the corresponding ontology class, and/or whether or not a keyword should be matched exactly, are selected in this column. The attribute values of the ontology classes are listed in alphabetical order in a scrollable list window in the ontology class rows. The category that a particular attribute is associated with is in parentheses after the attribute value name in the format:
<ontology_class> <attribute_type>: <attribute_value>
For example, the ontology class "involvement" has an attribute, "requirement", with two values, "yes" and "no", which appear in the attribute list as "involvement requirement: yes" and "involvement requirement: no" respectively. The user may select multiple attributes for a particular ontology class, if that class has more than one attribute type. For example, the ontology class "biological process" has three attributes, "source", "biosynthesis" and "type", and the user may choose values for any or all of these attributes. For more information about the ontology classes and their associated attributes, check out our Ontology page. Selecting the box beside "Exact match" on a keyword row determines whether the keyword is matched exactly or may also appear as the start of a word.
- Specification
Some ontology classes have a special attribute that determines whether they are referred to directly or indirectly. For example, in the case of genes (the "gene" ontology class), a gene may be mentioned directly by name, "eat-4", or indirectly as "the gene". Here, the user may specify whether they want to match something that is only directly named (the "named" option), only indirectly named (the "unnamed" option) or both (the "all" option, which is the default). The following ontology classes can be referred to directly or indirectly: "allele", "cell", "drugs", "entity_feature", "molecular function", "gene", "life stage", "mutants", "nucleic acid", "organism", "sex", "strain".
- Numerical Comparison of Matches and Number of Matches
The "numerical comparison of matches" and "number of matches" values allow the user to determine how many times the ontology class or keyword set on the same row must be matched in the text. Combining these two values gives the user great flexibility in matching the frequency of occurrences. From the drop-down menu in the "number of matches" column the user can select a number and then, by choosing one of the options in the "numerical comparison of matches" menu, decide whether they would like to see that exact number ("equal to") of occurrences, or a greater ("greater than") or lesser ("less than") number.
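The keyword options described above (implicit trailing wildcard, exact match, case insensitivity, and the numerical comparison of matches) can be summarized in a short Python sketch. This is a rough model under stated assumptions, not the code Textpresso runs: the function names `count_keyword` and `satisfies`, the tokenization rule and the sample text are all invented for illustration.

```python
import operator
import re

def count_keyword(text: str, keyword: str, exact: bool = False) -> int:
    """Count case-insensitive keyword occurrences in text.

    By default the keyword may be the start of a longer word (the implicit
    trailing wildcard); with exact=True the whole word must match.
    """
    # Assumption: letters, digits and hyphens form one word, so "egl-1"
    # is a single token.
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    key = keyword.lower()
    if exact:
        return sum(1 for token in tokens if token == key)
    return sum(1 for token in tokens if token.startswith(key))

# The three options of the "numerical comparison of matches" menu.
COMPARISONS = {
    "equal to": operator.eq,
    "greater than": operator.gt,
    "less than": operator.lt,
}

def satisfies(count: int, comparison: str, number: int) -> bool:
    """Check a match count against the chosen comparison and number."""
    return COMPARISONS[comparison](count, number)

sample = "Mutants tested: EGL-1, egl-30 and egl."  # made-up text
print(count_keyword(sample, "egl"))              # 3
print(count_keyword(sample, "egl", exact=True))  # 1
print(satisfies(count_keyword(sample, "egl"), "greater than", 0))  # True
```

With the wildcard active, "egl" matches all three gene mentions in the sample; with exact matching only the bare "egl" counts.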
1. Matches Information
This is where the number of matches of the search parameters is displayed. If the system is searching either sentences or publications, the total number of sentences containing the search parameters is displayed. Also shown is the total number of publications that contain one or more hits.
2. Summary Display Controls
The summary page is returned with ten summaries displayed per page (this is the default; the number of result summaries displayed per page can be customized on the "Customization" page). The summary display controls allow the user to display any page by selecting the page from the drop-down menu and pressing the "Display" button. Alternatively, the user can navigate the summary page using the "Previous" and "Next" buttons.
3. Email Settings
The user can opt to have the summary page sent to them via email. To do this the user must enter their email address in the text box and
press the "E-mail" button. By selecting the include matches option, the email will also contain the resulting matches from the search.
Beware, this can result in very large emails!
4. "View all matches" Button
Clicking this button brings the user to a results page containing all the results for a given search.
5. Abstract Expansion Buttons
By default (and for the sake of clarity) only the first two sentences of an abstract are displayed in the "Abstract" column. Pressing the "Expand abstract" button will display the full abstract in the column. The full abstract can be collapsed again by pressing the "Collapse abstract" button.
6. "View matches" Button
Clicking this button brings the user to a results page containing the results for that particular publication.
7. "PDF" Button
Clicking on the PDF button displays the PDF version of that publication (only available to Caltech users).
8. "Related articles" Button
This button links out to the PubMed page of citations that are related to that particular publication.
9. "Results in PDF" Button
Clicking on the "Results in PDF" button brings the user to a web page where they can opt to download all the resulting hits for their query in PDF format. (This may take a few minutes, depending on the number of resulting hits.)
1. Query Display
At the top of the results page the search query is displayed. Note that the different search parameters are displayed in different colors as a visual aid.
2. Results Display
The results display is returned with ten matches displayed per page (this is the default; the number of matches displayed per page can be customized on the "Customization" page). The display controls allow the user to display any page by selecting the page from the drop-down menu and pressing the "Display" button. Alternatively, the user can navigate the results pages using the "Previous" and "Next" buttons.
3. Publication Identifier
The "File ID" identifies the publication from which the match comes according to WormBase abstract nomenclature. The type of publication is displayed in parentheses after the File ID, e.g. Abstract.
4. Sentence Identifier
The "Sentence ID" specifies the sentence number of the match in the publication.
5. Search Matches
The matching sentences are displayed in boldface font. For the sake of context, the sentences that surround the match in the publication may also be displayed (the default number is ten; the number of surrounding sentences displayed per match can be customized on the "Customization" page).
6. Links to WormBase
Some words and terms in the matching sentences will link to their corresponding report pages in WormBase, a database repository for the biology and genome of C. elegans.
Below is an example of a text search using the Advanced Retrieval tool in Textpresso. Please see the "Examples" section for many more examples of Textpresso searches.
What genetic expression affects Ras signaling?
A number of ontology classes are employed to formulate a query that asks, "What genetic expression affects Ras signaling?". In the Textpresso Ontology, terms that indicate expression are contained in the Biological Process ontology class and have the attribute value "expression". Therefore, in the first row, where the ontology class "Biological Process" is chosen, the attribute "biosynthesis" with the value "expression" is assigned to the ontology class. The next row determines that the match must also contain (by specifying the AND operator) one or more ("greater than 0") directly named genes. The selection of the ontology class "Effect" in the third ontology row specifies the relationship between the two entities, the named gene and "Ras", which is entered as a keyword in the fifth row to be matched exactly. The inclusion of one or more occurrences of a "Pathway" class term in the fourth row serves to refine the search to Ras signaling. Substituting a different "relationship" ontology class for "Effect", such as "involvement" or "purpose", could be used to pose subtly different queries: "How is genetic expression involved in Ras signaling?" and "What role does genetic expression play in Ras signaling?" respectively. The differing results from each of these three subtle variations are shown below:
What genetic expression affects Ras signaling?
How is genetic expression involved in Ras signaling?
What role does genetic expression play in Ras signaling?
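To make the structure of the example query easier to see, the five rows can be written out as data. The Python sketch below is purely illustrative: the `QueryRow` class, its field names and the `ras_query` helper are hypothetical and do not correspond to any Textpresso interface. Settings that the description above does not state (the operators of rows three to five and the match counts of rows one, three and five) are assumed to be left at "AND" and "greater than 0".

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class QueryRow:
    """One row of the query table (hypothetical field names)."""
    kind: str                         # "category" or "keyword"
    term: str                         # ontology class or keyword
    operator: str = "AND"             # AND is the default boolean operator
    attribute: Optional[str] = None   # "<class> <type>: <value>" label
    specification: str = "all"        # "named", "unnamed" or "all"
    exact: bool = False               # exact match (keyword rows only)
    comparison: str = "greater than"  # assumed where not stated above
    number: int = 0                   # assumed where not stated above

def ras_query(relationship: str = "Effect") -> List[QueryRow]:
    """Build the example query; the third row carries the relationship class."""
    return [
        QueryRow("category", "Biological Process",
                 attribute="biological process biosynthesis: expression"),
        QueryRow("category", "gene", specification="named"),
        QueryRow("category", relationship),
        QueryRow("category", "Pathway"),
        QueryRow("keyword", "Ras", exact=True),
    ]

# The three variations differ only in the third row.
for relationship in ("Effect", "involvement", "purpose"):
    print(relationship, "->", [row.term for row in ras_query(relationship)])
```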