MetAssimulo

Developed by Harriet Muncey (1), Rebecca Jones (1), Maria De Iorio (1) and Timothy M D Ebbels (2)

1. Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London.
2. Biomolecular Medicine, Department of Surgery and Cancer, Imperial College London.

Contact: Tim Ebbels

Metabolic profiling (also called metabolomics or metabonomics), the study of the small molecules involved in metabolic reactions, is a rapidly expanding 'omics' field that probes the interactions between genes and environment. A major technique for capturing metabolite data is 1H-NMR spectroscopy, which yields highly complex profiles that require sophisticated statistical analysis. Developing such methods relies on sufficient representative training data, but experimental data are difficult to control and expensive to obtain. Data simulation is therefore a productive route to aid algorithm development.

Example of MetAssimulo output: (a) Real normal urine sample, (b) Mean simulated normal urine sample.

MetAssimulo is a MATLAB-based package that simulates 1H-NMR spectra of complex mixtures such as metabolic profiles. It draws on a database of metabolite standard spectra, together with concentration information that is either supplied by the user or constructed automatically from the Human Metabolome Database, to create realistic metabolic profiles containing large numbers of metabolites with a range of user-defined properties. Current features include the simulation of two groups ('case' and 'control'), each specified by the mean and standard deviation of the concentration of every metabolite. The software also allows the addition of spectral noise with a realistic autocorrelation structure at user-controllable levels. A crucial feature of the algorithm is its ability to simulate both intra- and inter-metabolite correlations, the analysis of which is fundamental to many techniques in the field. Further, MetAssimulo can simulate the shifts in NMR peak positions that result from matrix effects such as pH differences, which are often observed in metabolic NMR spectra and pose serious challenges for statistical algorithms.
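The general idea of mixture simulation described above can be illustrated with a small sketch. This is not MetAssimulo's code or its algorithm: it is a toy Python example (standard library only) built on stated assumptions. Each metabolite is represented by a few Lorentzian peaks rather than a measured standard spectrum, concentrations are drawn independently from a normal distribution truncated at zero (so inter-metabolite correlations are not modelled), noise follows a simple AR(1) process, and each metabolite receives one random shift per spectrum. All names, peak positions and parameter values (`metabolite_A`, `simulate_spectrum`, `shift_sd`, and so on) are hypothetical.

```python
import math
import random

def lorentzian(ppm, centre, width):
    """Unit-height Lorentzian line shape evaluated at chemical shift `ppm`."""
    half = width / 2.0
    return half * half / ((ppm - centre) ** 2 + half * half)

def template(axis, peaks, shift=0.0, width=0.02):
    """Toy spectrum of one metabolite at unit concentration.

    `peaks` is a list of (position in ppm, relative height) pairs.
    """
    return [sum(h * lorentzian(x, c + shift, width) for c, h in peaks) for x in axis]

def ar1_noise(n, sigma, rho, rng):
    """Gaussian noise with lag-1 autocorrelation `rho` and marginal sd `sigma`."""
    noise = [rng.gauss(0.0, sigma)]
    innovation_sd = sigma * math.sqrt(1.0 - rho * rho)
    for _ in range(n - 1):
        noise.append(rho * noise[-1] + rng.gauss(0.0, innovation_sd))
    return noise

def simulate_spectrum(axis, library, group, rng,
                      noise_sd=0.002, rho=0.5, shift_sd=0.002):
    """Simulate one mixture spectrum for a group of (mean, sd) concentrations."""
    spectrum = [0.0] * len(axis)
    for name, peaks in library.items():
        mean, sd = group[name]
        conc = max(0.0, rng.gauss(mean, sd))   # assumption: normal, truncated at zero
        shift = rng.gauss(0.0, shift_sd)       # assumption: one shift per metabolite
        for i, value in enumerate(template(axis, peaks, shift)):
            spectrum[i] += conc * value
    noise = ar1_noise(len(axis), noise_sd, rho, rng)
    return [s + e for s, e in zip(spectrum, noise)]

# Hypothetical two-metabolite library and group definitions (illustrative values only).
rng = random.Random(0)
axis = [i * 0.01 for i in range(1001)]  # 0 to 10 ppm
library = {
    "metabolite_A": [(1.0, 1.0)],
    "metabolite_B": [(3.0, 1.0), (3.05, 0.5)],
}
control = {"metabolite_A": (1.0, 0.1), "metabolite_B": (2.0, 0.2)}
case = {"metabolite_A": (1.5, 0.1), "metabolite_B": (2.0, 0.2)}

control_spectra = [simulate_spectrum(axis, library, control, rng) for _ in range(5)]
case_spectra = [simulate_spectrum(axis, library, case, rng) for _ in range(5)]
```

In this sketch the two groups differ only in the mean concentration of `metabolite_A`, so any downstream method applied to `control_spectra` and `case_spectra` should pick out the region around 1.0 ppm. A closer imitation of the package would replace the Lorentzian templates with measured standard spectra and draw concentrations jointly to reproduce inter-metabolite correlations.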

We thank the UK Medical Research Council for financial support via a Capacity Building Studentship awarded to Harriet Muncey and Rebecca Jones.

The paper describing this work can be found here.