Executive Summary
Chemical R&D activities continue to generate a deluge of instrumental analytical data daily, regardless of industry. Regulatory submissions and critical R&D or manufacturing decisions rest on analytical data every day. When data are siloed and unavailable in standard, accessible formats, access and re-use for decision-making and problem-solving are difficult, if not impossible. Organizations must have ways to standardize, homogenize, and digitize analytical data to improve data access while maintaining data integrity and facilitating scientific business innovation. In this drive for standardization, however, we postulate the importance of grounding it in chemical context, since analytical experiments are diverse in purpose and support many chemical workflows.

Introduction: Data-to-Information-to-Knowledge
Around 1600 AD, Johannes Kepler published new interpretations of empirical scientific data to postulate new insights into how the universe worked. In the foreword to the seminal 2009 text, ‘The Fourth Paradigm—Data-Intensive Scientific Discovery’ (inspired by Jim Gray’s 2007 paper on eScience), Gordon Bell of Microsoft Research relays the following: “It was Tycho Brahe’s assistant Johannes Kepler who took Brahe’s systematic astronomical observations and discovered the laws of planetary motion. This established the division between the mining and analysis of captured and carefully archived experimental data and the creation of theories.”1 In his influential paper ‘eScience: A Transformed Scientific Method’, Jim Gray described data exploration as the fourth paradigm: “A thousand years ago, science was empirical, describing natural phenomena—[first paradigm]
In the last few hundred years, the theoretical branch used models and generalizations—[second paradigm]
In the last few decades, the computational branch afforded the simulation of complex phenomena—[third paradigm]
Today, data exploration (or eScience) aims to unify theory, experiment, and simulation—[fourth paradigm]:
• Data is captured by instruments or generated by a simulator
• Data is processed by software
• Information/knowledge is stored in a computer
• Scientist analyzes database/files using data management and statistics”2

In this fourth paradigm, data is therefore the lineage of information, which in turn provides the knowledge that enables managers to make strategic and tactical, data-based decisions that maximize benefits and limit risks. Data exchange between organizations and data sharing within organizations are necessary to communicate this ‘data-information-knowledge’ lineage effectively. Such an approach, however, demands dealing with the data deluge arising from the overwhelming volume, velocity, variety, and variability of data. This is especially true for analytical data generated on a variety of instruments, from disparate techniques, to answer any number of different questions and address diverse situations throughout chemical R&D.