Publications

2014
Ofoghi, B, Lopez Campos GH, Verspoor K, Martin Sanchez F.  2014.  BiomRKRS: A Biomarker Retrieval and Knowledge Reasoning System, 20-23 January. Proceedings of the Seventh Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2014), Auckland, NZ
Verspoor, K.  2014.  Diving deep into data to crack the gene code on disease, 14 February 2014. The Conversation.
Jimeno Yepes, A, Verspoor K.  2014.  Literature mining of genetic variants for curation: Quantifying the importance of supplementary material. Database: The Journal of Biological Databases and Curation. 2014:bau003.
2013
Verspoor, K, MacKinlay A, Cohn JD, Wall ME.  2013.  Detection of protein catalytic sites in the biomedical literature, Jan 3-7, 2013. Pacific Symposium on Biocomputing, Hawaii
MacKinlay, A, Martinez D, Jimeno Yepes A, Liu H, Wilbur WJ, Verspoor K.  2013.  Extracting Biomedical Events and Modifications Using Subgraph Matching with Noisy Training Data, 9 August. Proceedings of the BioNLP Shared Task Workshop at the Association for Computational Linguistics 2013 meeting, Sofia, Bulgaria
Liu, H, Verspoor K, Comeau DC, MacKinlay A, Wilbur WJ.  2013.  Generalizing an Approximate Subgraph Matching-based System to Extract Events in Molecular Biology and Cancer Genetics, 9 August. Proceedings of the BioNLP Shared Task Workshop at the Association for Computational Linguistics 2013 meeting, Sofia, Bulgaria
Matykiewicz, P, Cohen KB, Holland KD, Glauser TA, Standridge SM, Verspoor KM, Pestian J.  2013.  Earlier Identification of Epilepsy Surgery Candidates Using Natural Language Processing, 8 August. Proceedings of the BioNLP Shared Task Workshop at the Association for Computational Linguistics 2013 meeting, Sofia, Bulgaria
Jimeno Yepes, A, Verspoor K.  2013.  Towards automatic large-scale curation of genomic variation: improving coverage based on supplementary material, 20 July. Proceedings of BioLINK SIG 2013, Berlin, Germany
Cavedon, L, Martinez D, Suominen H, Ananda-Rajah M, Pitson G, Verspoor K.  2013.  Roles for language technology and text mining for next-generation healthcare, 19 April. HISA Big Data, Melbourne, Australia
Verspoor, K, Jimeno Yepes A, Ong C-S, Macintyre G.  2013.  Prioritising genetic mutations by mining the biomedical literature, 18 April. HISA Big Data, Melbourne, Australia
MacKinlay, A, Verspoor K.  2013.  Information Extraction from Medication Prescriptions Within Drug Administration Data, 11 February. The 4th International Workshop on Health Document Text Mining and Information Analysis with the Focus of Cross-Language Evaluation (LOUHI), Sydney, Australia
Liu, H, Hunter L, Keselj V, Verspoor K.  2013.  Approximate Subgraph Matching-Based Literature Mining for Biomedical Events and Relations. PLoS ONE. 8(4):e60954. Public Library of Science

The biomedical text mining community has focused on developing techniques to automatically extract important relations between biological components and semantic events involving genes or proteins from literature. In this paper, we propose a novel approach for mining relations and events in the biomedical literature using approximate subgraph matching. Extraction of such knowledge is performed by searching for an approximate subgraph isomorphism between key contextual dependencies and input sentence graphs. Our approach significantly increases the chance of retrieving relations or events encoded within complex dependency contexts by introducing error tolerance into the graph matching process, while maintaining the extraction precision at a high level. When evaluated on practical tasks, it achieves a 51.12% F-score in extracting nine types of biological events on the GE task of the BioNLP-ST 2011 and an 84.22% F-score in detecting protein-residue associations. The performance is comparable to the reported systems across these tasks, and thus demonstrates the generalizability of our proposed approach.

Rimell, L, Lippincott T, Verspoor K, Johnson HL, Korhonen A.  2013.  Acquisition and evaluation of verb subcategorization resources for biomedicine. Journal of Biomedical Informatics. 46(2):228-237.
Verspoor, K, Jimeno Yepes A, Cavedon L, McIntosh T, Herten-Crabb A, Thomas Z, Plazzer J-P.  2013.  Annotating the biomedical literature for the human variome. Database: The Journal of Biological Databases and Curation. 2013.

This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.

Lippincott, T, Rimell L, Verspoor K, Korhonen A.  2013.  Approaches to Verb Subcategorization for Biomedicine. Journal of Biomedical Informatics. 46(2):212-227.
Comeau, DC, Doğan RI, Ciccarese P, Cohen KB, Krallinger M, Leitner F, Lu Z, Peng Y, Rinaldi F, Torii M, Valencia A, Verspoor K, Wiegers TC, Wu CH, Wilbur WJ.  2013.  BioC: A Minimalist Approach to Interoperability for Biomedical Text Processing. Database: The Journal of Biological Databases and Curation. 2013:bat064.
Sokolov, A, Funk C, Graim K, Verspoor K, Ben-Hur A.  2013.  Combining Heterogeneous Data Sources for Accurate Functional Annotation of Proteins. BMC Bioinformatics. 14(Suppl 3):S10.
Shmanina, T, Zukerman I, Cavedon L, Jimeno Yepes A, Verspoor K.  2013.  Impact of Corpus Diversity and Complexity on NER Performance. Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013). :91-95.
Radivojac, P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, Graim K, Funk C, Verspoor K, Ben-Hur A, Mooney S, Friedberg I, et al.  2013.  A large-scale evaluation of computational protein function prediction. Nature Methods. Advance online publication. Nature Publishing Group
Karimi, S, Verspoor K.  2013.  Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013). Australasian Language Technology Association Workshop 2013 (ALTA 2013).
Verspoor, K, Blaschke C, Hirschman L, Shatkay H, Valencia A.  2013.  Proceedings of the BioLINK SIG 2013: Roles for text mining in biomedical knowledge discovery and translational medicine. BioLINK SIG 2013, Berlin, Germany: ISMB BioLINK SIG
Livingston, K, Bada M, Hunter LE, Verspoor K.  2013.  Representing Annotation Compositionality and Provenance for the Semantic Web. Journal of Biomedical Semantics. 4:38.
2012
Martinez, DM, MacKinlay A, Molla-Aliod D, Cavedon L, Verspoor K.  2012.  Simple similarity-based question answering strategies for biomedical text, Sept 17-20, 2012. Conference and Labs of the Evaluation Forum (CLEF), Rome, Italy
MacKinlay, AD, Verspoor K.  2012.  Extracting Structured Information from Free-Text Medication Prescriptions, October 29, 2012. ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO), Hawaii, USA
Liu, H, Keselj V, Blouin C, Verspoor K.  2012.  Subgraph Matching-based Literature Mining for Biomedical Relations and Events, Nov 2-4, 2012. AAAI 2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, Arlington, VA, USA
Verspoor, K, Livingston K.  2012.  Towards Adaptation of Linguistic Annotations to Scholarly Annotation Formalisms on the Semantic Web, 12-13 July 2012. Linguistic Annotation Workshop at the Association for Computational Linguistics annual meeting, Jeju Island, Korea
Liu, H, Christiansen T, Baumgartner Jr WA, Verspoor K.  2012.  BioLemmatizer: a lemmatization tool for morphological processing of biomedical text. Journal of Biomedical Semantics. 3:3

Background
The wide variety of morphological variants of domain-specific technical terms contributes to the complexity of performing natural language processing of the scientific literature related to molecular biology. For morphological analysis of these texts, lemmatization has been actively applied in recent biomedical research.

Results
In this work, we developed a domain-specific lemmatization tool, BioLemmatizer, for the morphological analysis of biomedical literature. The tool focuses on the inflectional morphology of English and is based on the general English lemmatization tool MorphAdorner. The BioLemmatizer is further tailored to the biological domain through incorporation of several published lexical resources. It retrieves lemmas based on the use of a word lexicon, and defines a set of rules that transform a word to a lemma if it is not encountered in the lexicon. An innovative aspect of the BioLemmatizer is the use of a hierarchical strategy for searching the lexicon, which enables the discovery of the correct lemma even if the input Part-of-Speech information is inaccurate. The BioLemmatizer achieves an accuracy of 97.5% in lemmatizing an evaluation set prepared from the CRAFT corpus, a collection of full-text biomedical articles, and an accuracy of 97.6% on the LLL05 corpus. The contribution of the BioLemmatizer to accuracy improvement of a practical information extraction task is further demonstrated when it is used as a component in a biomedical text mining system.

Conclusions
The BioLemmatizer outperforms all eight existing lemmatizers it was compared with. The BioLemmatizer is released as open-source software and can be downloaded from http://biolemmatizer.sourceforge.net.

Bada, M, Eckert M, Evans D, Garcia K, Shipley K, Sitnikov D, Baumgartner WA, Cohen KB, Verspoor K, Blake JA, Hunter LE.  2012.  Concept Annotation in the CRAFT corpus. BMC Bioinformatics. 13:161

Background
Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text.

Results
This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement.

Conclusions
As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.

Verspoor, K, Cohen KB, Lanfranchi A, Warner C, Johnson HL, Roeder C, Choi JD, Funk C, Malenkiy Y, Eckert M, Xue N, Baumgartner WA, Bada M, Palmer M, Hunter LE.  2012.  A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools. BMC Bioinformatics. 13:207.
Ravikumar, KE, Liu H, Cohn JD, Wall ME, Verspoor K.  2012.  Literature Mining of Protein-Residue Associations with Graph Rules Learned through Distant Supervision. Journal of Biomedical Semantics. 3(S3):S2.
Verspoor, KM, Cohn JD, Ravikumar KE, Wall ME.  2012.  Text Mining Improves Prediction of Protein Functional Sites. PLoS ONE. 7(2):e32171. Public Library of Science

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.

2011
Cohen, KB, Christiansen T, Baumgartner Jr WA, Verspoor K, Hunter LE.  2011.  Fast and simple semantic class assignment for biomedical text. ACL HLT 2011. :38.
Lu, Z, Kao HY, Wei CH, Huang M, Liu J, Kuo CJ, Hsu CN, Tsai R, Dai HJ, Okazaki N, others, Verspoor K, Livingston K, Wilbur WJ.  2011.  The gene normalization task in BioCreative III. BMC Bioinformatics. 12(Suppl 8):S2. BioMed Central Ltd
Cohen, KB *, Verspoor K *, Johnson HL, Roeder C, Ogren PV, Baumgartner WA, White E, Tipney H, Hunter L.  2011.  High-precision biological event extraction: Effects of system and data. Computational Intelligence. 27(4):681–701.
Ravikumar, KE, Liu H, Cohn JD, Wall ME, Verspoor K.  2011.  Pattern Learning through Distant Supervision for Extraction of Protein-Residue Associations in the Biomedical Literature. Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on. 2:59–65. IEEE
Verspoor, CM, Sims BH, Ambrosiano JJ, Cleland TJ.  2011.  System and method for knowledge based matching of users in a network. Los Alamos National Security LLC (Los Alamos, NM)
Kano, Y, Bjorne J, Ginter F, Salakoski T, Buyko E, Hahn U, Cohen KB, Verspoor K, Roeder C, Hunter LE, others, Ohta T, Tsujii J.  2011.  U-Compare bio-event meta-service: compatible BioNLP event extraction services. BMC Bioinformatics. 12(1):481. BioMed Central Ltd
2010
Ramakrishnan, C, Baumgartner Jr WA, Blake JA, Burns GAPC, Cohen KB, Drabkin H, Eppig J, Hovy E, Hsu CN, Hunter LE, Ingulfsen T, Pokkunuri S, Onda H, Riloff E, Roeder C, Verspoor K.  2010.  Building the Scientific Knowledge Mine (SciKnowMine): a Community-driven Framework for Text Mining Tools in Direct Service to Biocuration. Language Resources and Evaluation, Malta
Verspoor, K, Roeder C, Johnson HL, Cohen KB, Baumgartner Jr WA, Hunter LE.  2010.  Exploring species-based strategies for gene normalization. Computational Biology and Bioinformatics, IEEE/ACM Transactions on. 7(3):462–471. IEEE
Livingston, KM, Johnson HL, Verspoor K, Hunter LE.  2010.  Leveraging Gene Ontology Annotations to Improve a Memory-Based Language Understanding System. Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on. :40–45. IEEE
Cohen, KB, Johnson H, Verspoor K, Roeder C, Hunter L.  2010.  The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics. 11(1):492. BioMed Central Ltd
Cohen, KB, Baumgartner Jr WA, Roeder C, Hunter LE, Verspoor K.  2010.  Test suite design for ontology concept recognition systems. Language Resources and Evaluation Conference (LREC).
Roeder, C, Jonquet C, Shah NH, Baumgartner Jr WA, Verspoor K, Hunter L.  2010.  A UIMA wrapper for the NCBO annotator. Bioinformatics. 26(14):1800–1801. Oxford University Press
Görg, C, Tipney H, Verspoor K, Baumgartner W, Cohen K, Stasko J, Hunter L.  2010.  Visualization and language processing for supporting analysis across the biomedical literature. Knowledge-Based and Intelligent Information and Engineering Systems. :420–429. Springer
2009
Ferrucci, D, Lally A, Verspoor K, Nyberg DE.  2009.  Unstructured Information Management Architecture (UIMA) Version 1.0, March 2, 2009. OASIS Technical Standard
Verspoor, K, Baumgartner Jr W, Roeder C, Hunter L.  2009.  Abstracting the types away from a UIMA type system. From Form to Meaning: Processing Texts Automatically (Chiarcos C, Eckart de Castilho R, Stede M, eds.). :249–256.
Cohen, KB, Verspoor K, Johnson HL, Roeder C, Ogren PV, Baumgartner Jr WA, White E, Tipney H, Hunter L.  2009.  High-precision biological event extraction with a concept recognizer. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task. :50–58. Association for Computational Linguistics
Verspoor, K, Dvorkin D, Cohen KB, Hunter L.  2009.  Ontology quality assurance through analysis of term transformations. Bioinformatics. 25(12):i77. Oxford University Press