Cheminformatics Breathes New Life Into Legacy Data

It has been a privilege to take part in an exciting new collaboration with Alexander Tropsha and Denis Fourches at the University of North Carolina. The team carried out a cheminformatics analysis of the assertional metadata (AMD) from BioWisdom’s Safety Intelligence Program (SIP), the world’s largest, ever-expanding collection of the known effects of drugs and other chemicals. SIP makes use of BioWisdom’s collection of key concept metadata and Sofia technology, which translates the varied language used by scientists into a semantically consistent form. The study has provided a fresh perspective on historic public domain data, bringing together isolated fragments of toxicology information spanning decades to provide new insight into drug safety. The resulting article, entitled “Cheminformatics Analysis of Assertions Mined from Literature That Describe Drug-Induced Liver Injury in Different Species”, was published in Chemical Research in Toxicology on December 16, 2009.
The work focused on the effects of chemicals on the liver, in humans as well as in a range of rodent and non-rodent animal models. A list of compounds that cause injury to, or alter the normal physiological functioning of, the liver, as stated in Medline abstracts, was exported from SIP. The AMD for this collection of toxicants were used to evaluate three things: how well preclinical models predict hepatotoxicity in humans; the relationship between drug-induced liver injury (DILI) and chemical structure; and the use of quantitative structure-activity relationship (QSAR) modeling to predict whether a chemical is likely to induce liver effects in humans.
The predictive power of preclinical models to signal human hepatotoxicity was measured by looking at the concordance of liver effects across human, rodent and non-rodent groups. In each case the level of concordance, and therefore predictivity, was shown to be relatively low (39-44%), which is consistent with recent publications. This raised the question of whether species-specific liver effects may be, in part, related to chemical structure. This was addressed in the first part of the cheminformatics study, in which clusters of structurally related compounds were created and the balance of human and non-human liver effects was compared within each cluster. We were able to show that chemically similar compounds tend to have similar liver effect profiles across different species, suggesting that such a link does exist. Interestingly, a number of distinct “chemotypes” emerged that have an increased liability for human liver effects. It is hoped that further development of this approach could translate into “structural alerts”, and help prevent attrition at the costly later stages of drug development by flagging up compounds that may cause injury to humans despite appearing safe in preclinical models.
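To make the idea of concordance concrete, here is a minimal Python sketch. It assumes one simple definition: of all compounds with a reported liver effect in either of two species groups, the fraction reported in both. The article may define concordance differently, and the compound identifiers below are hypothetical placeholders, not data from SIP.

```python
from itertools import combinations


def concordance(effects_a, effects_b):
    """Share of compounds with a liver effect in either group that have one in both.

    Assumed definition for illustration only; the published study may use another.
    """
    either = effects_a | effects_b
    if not either:
        return 0.0
    return len(effects_a & effects_b) / len(either)


# Hypothetical compounds reported as liver toxicants in each species group.
liver_effects = {
    "human": {"cmpd_1", "cmpd_2", "cmpd_3", "cmpd_4"},
    "rodent": {"cmpd_2", "cmpd_3", "cmpd_5"},
    "non_rodent": {"cmpd_3", "cmpd_4", "cmpd_6"},
}

# Pairwise concordance between every two species groups.
pairwise = {
    (a, b): concordance(liver_effects[a], liver_effects[b])
    for a, b in combinations(liver_effects, 2)
}

for (a, b), value in pairwise.items():
    print(f"{a} vs {b}: {value:.0%}")
```

A low value for a pair such as human versus rodent means that most compounds flagged in one group are not flagged in the other, which is the sense in which preclinical models were found to be weakly predictive.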
In the second part of the cheminformatics analysis, the AMD and structural information were used to build QSAR models capable of predicting whether a compound is likely to produce liver injury in rodent species only (and therefore not humans) or exclusively in humans. The models were tested against a set of compounds from regulatory documents, in which the toxicology had been thoroughly investigated, and reached prediction accuracies of up to 72.6%. This figure compares favourably with in vitro methods, and demonstrates the great value that lies in historical data.
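The general shape of such a model can be sketched in a few lines of Python. This post does not say which descriptors or learning algorithm the study used, so the sketch below swaps in a plain nearest-neighbour vote over Tanimoto similarity as a stand-in. The structural feature labels (`f1`, `f2`, ...), the class names and all of the compounds are hypothetical.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict(query, training, k=3):
    """Label a compound by majority vote among its k most similar training compounds."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (structural features, species profile from the AMD).
training = [
    ({"f1", "f2", "f3"}, "human_only"),
    ({"f1", "f2", "f4"}, "human_only"),
    ({"f2", "f3", "f4"}, "human_only"),
    ({"f5", "f6", "f7"}, "rodent_only"),
    ({"f5", "f6", "f8"}, "rodent_only"),
    ({"f6", "f7", "f8"}, "rodent_only"),
]

# Hypothetical external set, standing in for the compounds from regulatory documents.
external = [
    ({"f1", "f2", "f3", "f4"}, "human_only"),
    ({"f5", "f7", "f8"}, "rodent_only"),
    ({"f1", "f5", "f6"}, "human_only"),  # resembles the rodent-only compounds
]

predictions = [predict(features, training) for features, _ in external]
correct = sum(pred == label for pred, (_, label) in zip(predictions, external))
accuracy = correct / len(external)
print(f"External accuracy: {accuracy:.1%}")
```

The point of the external set is the same as in the study: accuracy is measured on compounds the model never saw during training, so the figure reflects prediction rather than recall.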
To the best of our knowledge, this application of QSAR modeling and other cheminformatics techniques to observations extracted by text mining is the first of its kind, and is made possible by the rigorous methods used in the creation of AMD. The study has opened up new opportunities for generating fresh insight from “legacy” language and for supplementing traditional whole-animal and in vitro approaches to measuring drug safety.
Tags: AMD, cheminformatics, DILI, drug safety, Hepatic Injury, hepatotox, historic data, legacy data, prediction of risk, QSAR
March 11th, 2010 at 5:35 am
I found your blog through Google. While not open source specifically, Ariadne is just launching a commercially supported product designed to pick out small molecule to protein, small molecule to disease, and several other biological concept entities from electronically stored text sources. It will be launched as a product called ChemEffect and was made to meet standards required by some of the larger pharmas.
While the backend programming is not open, the program is designed to be modified by users: at the entity level by non-programmer users, and through extraction patterns at an advanced level.
It might help! Just a thought.
Cheers.
Matt John