NIK2009 - Retrieving BioMedical Information with BioTracer: Challenges and Possibilities
|Publication||Norsk informatikkonferanse (NIK)|
|ISSN/ISSN2||1892-0713 (print) / 1892-0721 (online)|
|Publisher||Tapir Akademisk Forlag|
|Publisher address||Nardoveien 12, 7005 Trondheim|
Abstract
A large amount of biomedical information is available to researchers today,
and it is continuously increasing. As a result, researchers widely agree
that the ability to precisely retrieve desired information is vital for using the
available knowledge. One way to achieve this is to provide a retrieval system
that is not only able to retrieve the available and sought information, but
also to filter out irrelevant documents, while giving the relevant ones the
highest ranking. The main goal of this work has been to investigate how
to improve the ability of a system to find and rank relevant documents. As
described and discussed in this paper, our method is based on applying a series
of information retrieval techniques to searching biomedical information and
combining them in an optimal manner. These techniques include extending and
using well-established information retrieval (IR) similarity models such as
TF-IDF and BM25 as scoring schemes, and applying personalisation so
that researchers may affect the ranking based on their view of relevance. The
techniques have been implemented and tested in a proof-of-concept prototype
called BioTracer, extending a Java-based open source search engine library.
The preliminary results from our experiments using the TREC 2004 Genomics
Track collection seem satisfactory, with the best mean average precision
(MAP) of 0.5129 and the best precision at 100 retrieved documents (P@100)
of 0.473. These results suggest that involving users in the search
often has a positive effect on the ranking of search results,
and that our BioTracer system represents a tool that may be able to meet the
user’s information needs.
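
The two figures reported above, MAP and P@100, are standard ranked-retrieval measures. The following is a minimal sketch of how they are conventionally computed. It is not BioTracer's evaluation code; the function names and the toy document IDs are illustrative assumptions.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    Divides by k even if fewer than k documents were retrieved.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document
    appears, normalised by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Average of the per-topic average precision over all topics."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical document IDs, not from the TREC collection).
topic_1 = (["d1", "d2", "d3", "d4"], {"d1", "d3"})
topic_2 = (["d5", "d6"], {"d6"})

p_at_2 = precision_at_k(topic_1[0], topic_1[1], k=2)
map_score = mean_average_precision([topic_1, topic_2])
```

For the toy data, the first topic has relevant documents at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2, while the second has one relevant document at rank 2, giving 1/2. The MAP over both topics is therefore about 0.667, and P@2 for the first topic is 0.5.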