Welcome to the Lab of the Future
AnIML and SiLA are community-supported, interoperable data and communication standards that can be seamlessly combined to enable the whole lifecycle of a sample – including data from multiple combined technologies – to be visualized. The lab of the future is closer than you think!
Arne Kusserow | 7 min read
sponsored by MilliporeSigma
The pharmaceutical industry can be a slow-moving beast, with a tendency toward self-preservation in a highly regulated environment. This is certainly the case with the industry’s adoption of digitalization and open data. Pharmaceutical laboratory processes often involve manual tasks that don’t necessarily lend themselves to digitalization, which in turn hampers the industry’s ability to share data internally and externally.
There are other historical reasons for the lack of connectivity in laboratories. In the past, instrument vendors often had no choice but to develop their own software from scratch. If you’ve put the time and effort into developing a proprietary system, you would understandably like to maximize your return on investment by giving (limited) access to paying customers only. In addition, vendors might be cautious about other companies’ analytical-software packages replacing their own. Nevertheless, I’ve seen a growing appreciation in recent years for the benefits of embracing accessible, community-supported, interoperable data standards as part of a drive to connect various instruments from different vendors.
BSSN’s co-founder helped develop Analytical Information Markup Language (AnIML), an open-source data standard, in 2003. BSSN, which was acquired by Merck KGaA, Darmstadt, Germany in 2019, was the first company to introduce a fully compliant AnIML solution to the market.
Suitable for a wide range of analytical measurement techniques, AnIML lets users accurately record and document lab workflows and results, regardless of the instruments or measurement techniques used. This makes it far easier to process, share, and archive experimental data. And because AnIML is based on XML (which is text-based), it can be easily implemented with readily available off-the-shelf tools for XML manipulation and read by human beings – an important factor for long-term storage.
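As a rough illustration of what "off-the-shelf tools" can mean here, the sketch below reads a small XML fragment using only Python's standard library. The fragment is a simplified, AnIML-like structure invented for this example: the element and attribute names (`Series`, `Value`, `unit`, and so on) and the helper `read_series` are assumptions, and the fragment has not been validated against the actual AnIML schema.

```python
import xml.etree.ElementTree as ET

# Simplified, AnIML-like fragment for illustration only. It is NOT
# validated against the real AnIML schema; all names are assumptions.
ANIML_LIKE = """\
<AnIML>
  <SampleSet>
    <Sample name="Batch 42 aliquot" sampleID="S-001"/>
  </SampleSet>
  <ExperimentStepSet>
    <ExperimentStep name="UV scan" experimentStepID="E-001">
      <Result name="Spectrum">
        <Series name="Wavelength" unit="nm">
          <Value>250</Value><Value>260</Value><Value>270</Value>
        </Series>
        <Series name="Absorbance" unit="AU">
          <Value>0.12</Value><Value>0.48</Value><Value>0.31</Value>
        </Series>
      </Result>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>
"""


def read_series(xml_text):
    """Return {series name: (unit, [float values])} for every Series element."""
    root = ET.fromstring(xml_text)
    series = {}
    for node in root.iter("Series"):
        # Each Value element holds one numeric data point as text.
        values = [float(v.text) for v in node.findall("Value")]
        series[node.get("name")] = (node.get("unit"), values)
    return series


spectrum = read_series(ANIML_LIKE)
for name, (unit, values) in spectrum.items():
    print(f"{name} [{unit}]: {values}")
```

The resulting dictionary could be handed directly to a plotting library, which is the kind of lightweight visualization tooling a developer with a little XML knowledge might build.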
In short, AnIML is human readable; clearly structured; flexible; royalty free; available to all; and supported and maintained by an independent, non-profit consortium of industry, academia, vendors, and governmental bodies. And that’s exactly what an open data standard should look like.
If the industry genuinely wants to embrace open data, we need open, community-supported standards like AnIML. Of course, we’re not married to AnIML in particular; if another community-supported standard emerges and makes sense from a scientific perspective, we would gladly embrace it. But, as it stands, I can’t see that happening for a number of years. As stated above, AnIML is easy to share and simple to read. It’s free – you don’t need expensive software tools to use it – and it can be used for any purpose. You can store AnIML files. If you have a software developer who knows a little XML, they can easily use AnIML files to build data visualization tools.
- Long-term storage. Archiving data in a proprietary format often means you need the original application to open the data in the future. Open, standardized, multi-technique formats greatly reduce the number of software tools you must maintain to retain access to the data.
- Data interchange. Due to the proprietary nature of instrument data files, sharing data with other scientists is challenging. Although standards exist to address this issue, they only apply to particular measurement methods. AnIML is a single standard that supports multiple techniques, making data exchange much easier.
- Regulatory compliance. Certain regulations require electronic signatures on data records. AnIML supports such regulations by providing a standardized way of applying digital signatures to scientific data. Changes to AnIML documents can also be recorded in the built-in audit trail. And, to further increase security, audit trail entries can be digitally signed.
- Cross-technique data mining. Having data from multiple techniques in the same format allows cross-technique data analysis and provides a solid foundation for data mining tools.
- LIMS integration. Rather than implementing a LIMS interface for every instrument, you can establish a single AnIML interface from which LIMS would extract the required information.
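The last two points can be sketched in a few lines: when every instrument's output shares one format, a single extraction routine serves all of them, and records from different techniques can be grouped by sample. The documents below are hypothetical, heavily simplified AnIML-like fragments (not schema-valid AnIML), and `extract_for_lims` is an illustrative helper name, not part of any real LIMS or AnIML toolkit.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified AnIML-like documents, one per instrument.
# Element and attribute names are assumptions made for this sketch.
DOCS = [
    '<AnIML><Sample sampleID="S-001"/>'
    '<ExperimentStep technique="HPLC">'
    '<Result name="Purity" unit="%">99.2</Result>'
    '</ExperimentStep></AnIML>',
    '<AnIML><Sample sampleID="S-001"/>'
    '<ExperimentStep technique="Balance">'
    '<Result name="Mass" unit="mg">12.5</Result>'
    '</ExperimentStep></AnIML>',
    '<AnIML><Sample sampleID="S-002"/>'
    '<ExperimentStep technique="HPLC">'
    '<Result name="Purity" unit="%">97.8</Result>'
    '</ExperimentStep></AnIML>',
]


def extract_for_lims(xml_text):
    """One extractor for every technique: return a list of flat records."""
    root = ET.fromstring(xml_text)
    sample_id = root.find("Sample").get("sampleID")
    records = []
    for step in root.iter("ExperimentStep"):
        for result in step.findall("Result"):
            records.append({
                "sample_id": sample_id,
                "technique": step.get("technique"),
                "name": result.get("name"),
                "value": float(result.text),
                "unit": result.get("unit"),
            })
    return records


# Group records from all instruments by sample for cross-technique analysis.
by_sample = defaultdict(list)
for doc in DOCS:
    for record in extract_for_lims(doc):
        by_sample[record["sample_id"]].append(record)

for sample_id, records in by_sample.items():
    summary = ", ".join(
        f'{r["technique"]} {r["name"]}={r["value"]} {r["unit"]}' for r in records
    )
    print(f"{sample_id}: {summary}")
```

The design point is that adding a new instrument requires no new parsing code on the consuming side, only a converter that emits the shared format.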
Data standards aren’t enough
All that said, AnIML isn’t an open data panacea; it’s a file format that organizes the structure of analytical data. If you want to transfer these data from one system to another, you need a communication standard that prevents them from becoming siloed. And that’s where SiLA comes in.
SiLA enables communication between instruments and software systems so that you can use your data in the way you want. Data generated by an NMR machine, for example, are stored as an XML-based AnIML file, which can be transferred bi-directionally to whatever software you’re using to process your data. Through the seamless integration of AnIML with SiLA, a new ecosystem is emerging that allows end-to-end integration of instrument control, data capture, and leading systems (such as LIMS). Your sample might have its NMR spectral data, HPLC measurements, and laboratory balance readings – all stored as traceable AnIML data packages describing device information, batch status, audit trail, and other parameters – feeding into any data consumption system via SiLA.
This ability to visualize the whole lifecycle of a sample, including data from multiple combined technologies, is incredibly valuable for integration. It also simplifies audit trails, digital signatures, and validation for regulatory compliance, as well as the sharing of companion data with scientific publications or external collaborators. All of this enables scientists to focus on what they do best: science! Researchers can generate and analyze data without having to think about how to store them, which facilitates such benefits as effective collaboration, comparisons of results, and regulatory compliance. Relieving this administrative burden is, to me, the fundamental benefit of standardizing and connecting data with AnIML and SiLA.
We can even go further by integrating the management of data produced by instruments with the data associated with the materials you use in the lab. Integrating AnIML and SiLA with a materials management system would, for example, add detailed information about the purity, pH, and volume of your raw materials to your global data lake. From there, you can devise standard operating procedures that take variability in your raw materials into account. Welcome to the lab of the future: truly digitalized and automated.
The major challenge for companies today is that none of this is available out of the box: setting up interfaces between different systems remains a manual task. Our aim is to promote the use of standards such as AnIML and SiLA to make this possible, but the next stage will be for vendors to embrace FAIR (findable, accessible, interoperable, and re-usable) principles in the design of their software systems – something of ever-increasing interest to end users and regulators. From our point of view, we’re happy to support all instrument vendors in adopting AnIML and SiLA as part of a wider push toward open data. We currently offer converters for 300+ instruments, and this number will increase rapidly thanks to the additional converters in development. Converters available now include those for complex chromatography data systems (CDS) like Chromeleon and Empower.
The concept of open data is exciting, but we are at the beginning of this journey. In labs as a whole, adoption of AnIML is relatively low, but it is increasing. Early adopters tend to be the bigger players in pharma: those with the most complex laboratories feel the strongest pressure from authorities to ensure that their data are secure, and they’re often already using complex software systems like LIMS, which makes adoption more straightforward.
Whatever the sense of urgency, I think few people would now argue against a move toward FAIR data; that wasn’t the case just a few years ago. Most of us now recognize that the lab of the future must be open, digitized, and automated. Embracing open data standards is a step toward that future, and one that we can all take today.