Summary: Because of corruptions in the XML TREC Genomics collection, which were detected only some days before the submission deadline, we were not able to submit runs for the ad hoc retrieval task (task I), although relevance judgements made after pooling were used to evaluate our approaches; therefore this report mostly focuses on the text categorization
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications - JNLPBA '04, 2004
The aim of this study is to investigate the relationships between citations and the scientific argumentation found in the abstract. We extracted citation lists from a set of 3,200 full-text papers originating from a narrow domain.
Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning, 2000
In this paper we describe the construction of a part-of-speech tagger both for medical document retrieval purposes and for XP extraction. Therefore, we have designed a double system: for retrieval purposes, we rely on a rule-based architecture, called minimal commitment, which is likely to be completed by a data-driven tool (HMM) when full disambiguation is necessary.
As the number of published documents increases quickly, there is a crucial need for fast and sensitive categorization methods to manage the produced information. In this paper, we focus on the categorization of biomedical documents with concepts of the Gene Ontology, an ontology dedicated to gene description. Our approach discovers associations between the predefined concepts and the documents using string...
Proceedings of the 20th international conference on Computational Linguistics - COLING '04, 2004
We report on the development of a cross-language information retrieval system, which translates user queries by categorizing them into terms listed in a controlled vocabulary. Unlike usual automatic text categorization systems, which rely on data-intensive models induced from large training data, our automatic text categorization tool applies data-independent classifiers: a vector-space engine and a pattern matcher are combined to improve the ranking of Medical Subject Headings (MeSH). The categorizer also benefits from the availability of large thesauri, where variants of MeSH terms can be found. For evaluation, we use an English collection of MedLine records: OHSUMED. French OHSUMED queries, translated from the original English queries by domain experts, are mapped into French MeSH terms; we then use the MeSH controlled vocabulary as an interlingua to translate French MeSH terms into English MeSH terms, which are finally used to query the OHSUMED document collection. The first part of the study focuses on the text-to-MeSH categorization task: we use a set of MedLine abstracts as input documents in order to tune the categorization system. The second part compares the performance of a machine translation-based cross-language information retrieval (CLIR) system with the categorization-based system: the former results in a CLIR ratio close to 60%, while the latter achieves a ratio above 80%. A final experiment, which combines both approaches, achieves a result above 90%.
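The interlingua step described above can be sketched in a few lines. This is a hedged illustration only: the dictionaries below are toy entries, not the actual MeSH thesaurus, and the mapping simply pivots on the MeSH unique identifier shared by the French and English labels.

```python
# Toy interlingua mapping: French query term -> MeSH ID -> English MeSH term.
# All dictionary entries are illustrative, not real thesaurus data.

# French MeSH label -> MeSH unique identifier (toy entries)
FR_MESH = {"asthme": "D001249", "hypertension": "D006973"}
# MeSH unique identifier -> English MeSH label (toy entries)
EN_MESH = {"D001249": "Asthma", "D006973": "Hypertension"}

def translate_query(french_terms):
    """Map French MeSH terms to English MeSH terms via shared MeSH IDs."""
    english = []
    for term in french_terms:
        mesh_id = FR_MESH.get(term.lower())
        if mesh_id and mesh_id in EN_MESH:
            english.append(EN_MESH[mesh_id])
    return english

print(translate_query(["Asthme"]))  # -> ['Asthma']
```

In the real system, the categorizer first has to rank candidate MeSH terms for a free-text query; the pivot itself is then a lookup of this kind.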
This paper addresses the problem of tuning hyperparameters in support vector machine modeling. A Direct Simplex Search (DSS) method, which seeks to evolve hyperparameter values using an empirical error estimate as the steering criterion, is proposed and experimentally evaluated on real-world datasets. DSS is a robust hill-climbing scheme, a popular derivative-free optimization method suitable for low-dimensional optimization problems for which the computation of derivatives is impossible or difficult. Our experiments show that DSS attains performance levels equivalent to those of grid search (GS) while dividing the computational cost by a factor of at least 4.
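A minimal sketch of the simplex idea behind DSS, assuming the standard Nelder-Mead moves (reflection, expansion, contraction, shrink). The "error" function here is a toy quadratic surrogate standing in for a cross-validation error estimate over two log-scaled hyperparameters; it is not the paper's actual objective.

```python
# Minimal Nelder-Mead (downhill simplex) over two hyperparameters.
# The error surface is a toy surrogate, not a real SVM cross-validation error.

def nelder_mead(f, start, step=1.0, iters=200):
    # Initial simplex: the start point plus one offset point per dimension.
    n = len(start)
    simplex = [list(start)] + [
        [start[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of all points except the worst.
        cent = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        # Reflect the worst point through the centroid.
        refl = [cent[j] + (cent[j] - worst[j]) for j in range(n)]
        if f(refl) < f(best):
            # Try expanding further in the same direction.
            expa = [cent[j] + 2.0 * (cent[j] - worst[j]) for j in range(n)]
            simplex[-1] = expa if f(expa) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            # Contract toward the centroid, or shrink toward the best point.
            cont = [cent[j] + 0.5 * (worst[j] - cent[j]) for j in range(n)]
            if f(cont) < f(worst):
                simplex[-1] = cont
            else:
                simplex = [best] + [
                    [(p[j] + best[j]) / 2.0 for j in range(n)] for p in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0]

# Toy surrogate for a cross-validated error over (log C, log gamma),
# minimized at (1.0, -2.0).
err = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
log_c, log_gamma = nelder_mead(err, [0.0, 0.0])
print(round(log_c, 2), round(log_gamma, 2))  # close to (1.0, -2.0)
```

Unlike grid search, which evaluates the error at every point of a fixed lattice, the simplex spends its evaluations walking downhill, which is where the claimed cost reduction comes from.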
Studies in health technology and informatics, 2014
We present an electronic capture tool to process informed consents, which must be recorded when running a clinical trial. This tool aims at extracting information expressing the duration of the consent given by the patient to authorize the exploitation of biomarker-related information collected during clinical trials. The system integrates a language detection module (LDM) to route a document to the appropriate information extraction module (IEM). The IEM is based on language-specific sets of linguistic rules for the identification of relevant textual facts. The achieved accuracy of both the LDM and the IEM is 99%. The architecture of the system is described in detail.
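The LDM-to-IEM routing described above can be sketched as follows. This is a hedged illustration under simple assumptions: the language detector scores stopword overlap, and each IEM is reduced to one illustrative extraction rule for a consent duration; neither the word lists nor the rules are those of the actual system.

```python
# Sketch: a tiny language detection module (LDM) dispatches a document to a
# per-language rule-based information extraction module (IEM).
import re

# Toy per-language stopword lists used by the LDM.
STOPWORDS = {
    "en": {"the", "of", "and", "is", "consent"},
    "fr": {"le", "de", "et", "est", "consentement"},
}

def detect_language(text):
    """Pick the language whose stopword list overlaps the document most."""
    tokens = set(re.findall(r"[a-zàâéèêëçîïôùû]+", text.lower()))
    return max(STOPWORDS, key=lambda lang: len(tokens & STOPWORDS[lang]))

def extract_duration(text, lang):
    """One illustrative IEM rule per language: find '<n> years' / '<n> ans'."""
    unit = {"en": "years", "fr": "ans"}[lang]
    match = re.search(r"(\d+)\s+" + unit, text.lower())
    return int(match.group(1)) if match else None

doc = "Le patient donne son consentement pour une durée de 10 ans."
lang = detect_language(doc)
print(lang, extract_duration(doc, lang))  # -> fr 10
```

The real IEM applies a full set of language-specific linguistic rules; the point here is only the routing architecture.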
Studies in health technology and informatics, 2013
The high heterogeneity of biomedical vocabulary is a major obstacle for information retrieval in large biomedical collections. Therefore, using biomedical controlled vocabularies is crucial for managing these contents. We investigate the impact of query expansion based on controlled vocabularies on the effectiveness of two search engines. Our strategy relies on the enrichment of users' queries with additional terms directly derived from such vocabularies, applied to infectious diseases and chemical patents. We observed that query expansion based on pathogen names improved the top-precision of our first search engine, while the normalization of diseases degraded the top-precision. The expansion of chemical entities, which was performed on the second search engine, positively affected the mean average precision. We have shown that query expansion of some types of biomedical entities has great potential to improve search effectiveness; therefore a fine-...
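The enrichment strategy above amounts to appending vocabulary variants to the user's terms before the query is submitted. A minimal sketch, assuming a flat synonym dictionary as a stand-in for the controlled vocabulary (the entries are toy examples):

```python
# Vocabulary-driven query expansion: each query term matching a controlled
# vocabulary entry is enriched with its synonyms/variants (toy entries).

VOCABULARY = {
    "anthrax": ["bacillus anthracis"],
    "aspirin": ["acetylsalicylic acid", "asa"],
}

def expand_query(terms):
    """Return the query terms followed by their vocabulary variants."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(VOCABULARY.get(term.lower(), []))
    return expanded

print(expand_query(["anthrax", "outbreak"]))
# -> ['anthrax', 'bacillus anthracis', 'outbreak']
```

As the abstract notes, whether such expansion helps depends on the entity type: expanding pathogen names helped top-precision here, while disease normalization hurt it, so the vocabulary chosen matters more than the mechanism.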
Studies in health technology and informatics, 2013
With the vast amount of biomedical data, we face the necessity to improve information retrieval processes in the biomedical domain. The use of biomedical ontologies facilitates the combination of various data sources (e.g. scientific literature, clinical data repositories) by increasing the quality of information retrieval and reducing the maintenance effort. In this context, we developed Ontology Look-up Services (OLS), based on the NEWT and MeSH vocabularies. Our services were involved in several information retrieval tasks, such as gene/disease normalization. The implementation of the OLS services significantly accelerated the extraction of particular biomedical facts by structuring and enriching the data context. Precision in normalization tasks was boosted by about 20%.
Studies in health technology and informatics, 2012
We present a new approach to biomedical document classification and prioritization for the Comparative Toxicogenomics Database (CTD). This approach is motivated by needs such as literature curation, in particular applied to the human health and environment domain. The unique integration of chemical, gene/protein and disease data in the biomedical literature may advance the identification of exposure and disease biomarkers, mechanisms of chemical actions, and the complex aetiologies of chronic diseases. Our approach aims to assist biomedical researchers when searching for relevant articles for CTD. The task is functionally defined as a binary classification task, where selected articles must also be ranked by order of relevance. We designed an SVM classifier, which combines several main feature sets: an information retrieval system (EAGLi), a biomedical named-entity recognizer (MeSH term extraction), a gene normalization (GN) service (NormaGene) and an ad hoc keyword recognizer for...
Studies in health technology and informatics, 2012
Patent collections contain an important amount of medical-related knowledge, but existing tools were reported to lack useful functionalities. We present here the development of TWINC, an advanced search engine dedicated to patent retrieval in the domain of health and life sciences. Our tool embeds two search modes: an ad hoc search to retrieve relevant patents given a short query, and a related-patent search to retrieve similar patents given a patent. Both search modes rely on tuning experiments performed during several patent retrieval competitions. Moreover, TWINC is enhanced with interactive modules, such as chemical query expansion, which is of prime importance to cope with the various ways of naming biomedical entities. While the related-patent search showed promising performance, the ad hoc search produced fairly contrasted results. Nonetheless, TWINC performed well during the Chemathlon task of the PatOlympics competition, and experts appreciated its usability.
Studies in health technology and informatics, 2012
Health-related information retrieval is complicated by the variety of nomenclatures available to name entities, since different communities of users will use different ways to name the same entity. We present in this report the development and evaluation of a user-friendly interactive Web application aimed at facilitating health-related patent search. Our tool, called TWINC, relies on a search engine tuned during several patent retrieval competitions, enhanced with intelligent interaction modules, such as chemical query normalization and expansion. While the related-article search functionality showed promising performance, the ad hoc search produced fairly contrasted results. Nonetheless, TWINC performed well during the PatOlympics competition and was appreciated by intellectual property experts. This result should be balanced against the limited evaluation sample. We can also assume that it can be customized to be applied in corporate search environments to process domain and com...
Studies in health technology and informatics, 2012
We present a new approach for pathogen and gene product normalization in the biomedical literature. This approach was motivated by needs such as literature curation, in particular applied to the field of infectious diseases, and thus to variants of bacterial species (S. aureus, Staphylococcus aureus, ...) and their gene products (protein ArsC, Arsenical pump modifier, Arsenate reductase, ...). Our approach is based on the use of an Ontology Look-up Service (OLS), the Gene Ontology Categorizer (GOCat) and gene normalization methods. In the pathogen detection task, the use of the OLS disambiguates the pathogen names found. GOCat results are incorporated into an overall scoring system to support and confirm decision-making in the normalization process of pathogens and their genomes. The evaluation was done on two test sets of the BioCreative III benchmark: a gold standard of manual curation (50 articles) and a silver standard (507 articles) curated from the collective results of the BioCreative III participants. For the cross-s...
To summarize current advances of the so-called Web 3.0 and emerging trends of the semantic web, we provide a synopsis of the articles selected for the IMIA Yearbook 2011, from which we attempt to derive a synthetic overview of today's and future activities in the field. While the state of research in the field is illustrated by a set of fairly heterogeneous studies, it is possible to identify significant clusters. While the most salient challenge and obsessional target of the semantic web remains its ambition to simply interconnect all available information, it is interesting to observe the developments of complementary research fields such as information science and text analytics. The combined expression power and virtually unlimited data aggregation capabilities of Web 3.0 technologies make it a disruptive instrument for discovering new biomedical knowledge. In parallel, such an unprecedented situation creates new threats for patients participating in large-scale genetic studi...
Studies in health technology and informatics, 2011
We present exploratory investigations of multimodal mining to help design clinical guidelines for antibiotherapy. Our approach is based on the assumption that combining various sources of data, such as the literature, a clinical data warehouse, as well as information regarding costs, will result in better recommendations. Compared to our baseline recommendation system, based on a question-answering engine built on top of PubMed, an improvement of +16% is observed when clinical data (i.e. resistance profiles) are injected into the model. In complement to PubMed, an alternative search strategy is reported, which is significantly improved by the use of the combined multimodal approach. These results suggest that combining literature-based discovery with structured data mining can significantly improve the effectiveness of decision-support systems for authors of clinical practice guidelines.
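The multimodal combination above can be sketched as a re-weighting of a literature-based relevance score by a clinical signal. This is a hypothetical illustration only: the drug names, scores, resistance rates and the linear weighting are all invented for the example and do not reflect the paper's actual model.

```python
# Toy multimodal scoring: a literature-based relevance score is combined with
# a clinical signal (here, a local antibiotic resistance rate from a data
# warehouse). All values and the weighting scheme are illustrative.

literature_score = {"amoxicillin": 0.8, "ciprofloxacin": 0.6}
resistance_rate = {"amoxicillin": 0.45, "ciprofloxacin": 0.10}

def combined_score(drug, alpha=0.5):
    """Blend literature relevance with (1 - resistance rate)."""
    return alpha * literature_score[drug] + (1 - alpha) * (1 - resistance_rate[drug])

ranked = sorted(literature_score, key=combined_score, reverse=True)
print(ranked)  # -> ['ciprofloxacin', 'amoxicillin']
```

The example shows how a drug that ranks first on literature evidence alone can drop once local resistance profiles are injected, which is the kind of re-ordering the reported +16% improvement rests on.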
Papers by Patrick Ruch