Sharing Data – and Knowledge
How to use a knowledge repository that doesn’t retire or leave you for a competitor.
Millions of dollars have been invested to harness data for intellectual property protection and regulatory purposes, but the industry severely lacks systems that reuse the data generated in analytical laboratories on a daily basis. In fact, many organizations still rely on scientists’ brains or interpretations scribbled on paper spectra when it comes to analytical data and knowledge, even though far more data is generated than can possibly fit in a person’s head.
My colleagues and I are responsible for studying the metabolic fate of molecules in development for GSK’s drug metabolism and pharmacokinetics department. We generate and consume a lot of data (analytical, structural and species specific) to build metabolite schemes that help us to understand the fate of molecules. Until a few years ago, a lot of our data were recorded on paper, so when I tried to discover if anything similar had been seen in another project or species, I had to ask colleagues or search through the paper files. I also had colleagues in the US, so sharing data in the days of paper records was extremely difficult, particularly as they used software to store analytical data, but not to map metabolic outcomes. I suspect you’ll find a similarly fragmented approach to analytical data in other global companies.
Quite often the terms ‘data’, ‘information’, and ‘knowledge’ are used interchangeably, but understanding the distinctions between them can help you identify where you have a gap.
- Data is raw and represents a set of discrete facts; it has no significance beyond its existence.
- Information is data that has been processed to derive meaning and purpose.
- Knowledge is the human understanding of the subject matter, acquired through study and experience, which helps us draw meaningful conclusions.
These terms form an ascending scale of value and context. The following metaphor makes the difference clear: Out shopping, you might spot an old colleague. The facial recognition represents data. The value is increased by information or metadata that begins to fill in the picture. You remember his dog’s name and what his daughters were studying in school. Knowledge is how you recall that he is dreadfully dull! You quickly duck into a store to avoid him, thus using your acquired knowledge to guide your actions to a preferable outcome.
Several years ago at GSK, we set out to create a repository of knowledge that doesn’t forget, doesn’t go senile, doesn’t retire, and doesn’t leave the company for a competitor. Our goal was to capture spectra generated in sample investigations, as well as the context (associated metabolites, and project details) and insights associated with the data (interpretation and conclusions drawn), so that our investigators could share information easily and learn from past outcomes. The end result was very positive and allowed us to better manage our knowledge, which is why I am sharing it here.
Our solution was to implement software from ACD/Labs that could store, search, and share analytical data and metadata linked to structures in a biotransformation map. We were able to collect analytical data from different techniques (mass spectrometry and nuclear magnetic resonance are widely used in our research); connect it with metabolite structures and other information; and map the data onto biotransformation schemes that record the metabolism pathway, where the parent drug can turn into 100 metabolites. Importantly, it was a way of sharing data with colleagues worldwide so that we could all benefit from previous experiences when looking to develop compounds that could avoid a particular metabolic fate. The data could be searched from almost any facet; for example, by molecular mass, project, analyst, site, species, or structure. Sharing data is extremely important because access to colleagues’ findings can give you confidence in your own conclusions, or reveal additional considerations when analyses have proven tricky.
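To make the idea of faceted searching concrete, here is a minimal sketch in Python of a repository of spectrum records searchable by any facet. The field names and the `search` helper are illustrative assumptions for this article, not the ACD/Labs data model or API:

```python
from dataclasses import dataclass

@dataclass
class SpectrumRecord:
    # Hypothetical record linking a spectrum to its context and insight
    technique: str        # e.g. "LC-MS" or "NMR"
    project: str
    analyst: str
    site: str
    species: str
    molecular_mass: float
    metabolite: str       # structure identifier (name, SMILES, ...)
    interpretation: str   # the conclusion captured alongside the raw data

def search(records, **criteria):
    """Return records matching every requested facet
    (molecular mass is matched within a small tolerance)."""
    hits = []
    for record in records:
        for facet, wanted in criteria.items():
            value = getattr(record, facet)
            if facet == "molecular_mass":
                if abs(value - wanted) > 0.01:
                    break
            elif value != wanted:
                break
        else:
            hits.append(record)
    return hits

# Illustrative entries only -- projects, analysts, and sites are invented.
repo = [
    SpectrumRecord("LC-MS", "ProjA", "Analyst 1", "Site A", "rat", 312.15,
                   "M1 (hydroxylated parent)", "Hydroxylation on aromatic ring"),
    SpectrumRecord("NMR", "ProjB", "Analyst 2", "Site B", "human", 298.12,
                   "M2 (glucuronide)", "Direct glucuronidation of parent"),
]

# Any combination of facets can drive a query:
rat_hits = search(repo, species="rat")
mass_hits = search(repo, molecular_mass=298.12)
```

A real system also stores the spectra themselves and links records into biotransformation schemes; the point here is only that once data, context, and interpretation live in one structured record, any facet becomes a search key.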
We’ve configured the software to fully meet our needs and it’s also provided other benefits beyond access to information. Reports that used to take weeks to compile, requiring cut and paste from various vendor software and data management silos, can now be created much more easily.
We haven’t looked back, but I can see why others in our industry may be wary. Even when a new technology or piece of software offers benefits, the pharma industry is cautious of change; after all, there is a mentality of “if it ain’t broke, don’t fix it” – the change curve could bring about a dip in productivity. But many companies are perhaps unaware that if they don’t facilitate the sharing of data and knowledge, they are already experiencing lower than optimal productivity.
Graduating from Warwick with a Chemistry degree, Steve Thomas joined the NMR department of Merck’s Neuroscience Research Centre at Terlings Park in 1990. “The wealth of experience in Medicinal Chemistry support made me analytically bilingual; speaking both NMR and Mass Spec.” Closure of the site in 2006 led him to the Biotransformation and Drug Disposition group at GSK. “I have always loved puzzles and science,” Steve explains, “and structural identification is a straight combination of the two.” Having studied metabolic transformations his entire career, he eventually found the lure of more challenging samples, and of close proximity to the development compounds that change people’s quality of life, too strong to resist.