
Prototype of a distributed FAME identification system for improved identification of prokaryotes

Research project C3/012 (Research action C3)

Description :

Context

Fast and accurate identification of microorganisms is of great environmental, clinical and economic importance, with applications in a wide range of areas such as biodefense, bioremediation, clinical diagnosis, crop protection, food safety and water management, among many others. Although the enormous potential that faster and more sophisticated computing offers to pattern recognition has been exploited in many fields, its application to the identification of bacterial species remains at a comparatively early stage. The technical challenges remain considerable. However, the progress made by a few largely exploratory projects is impressive, and it suggests that the limits of what machine learning can achieve have yet to be established. Cultural and organisational issues, together with the limited availability of adequate and appropriate resources, have undoubtedly severely constrained progress so far. However, given the high value of the possible output (a generic automated species identification system that could open up new opportunities for pure and applied work in microbiology and related fields), it would seem foolish not to find ways of overcoming these obstacles in the near future.

Project description

Automated identification of a bacterial strain based on knowledge of some of its phenotypic and genomic traits is in many cases unreliable without unrestricted access to the accumulated empirical evidence on which prokaryotic taxonomy is founded. Generating enough observational information to cover the whole taxonomic range, a necessary precursor to the implementation of automated identification systems, is far too ambitious to be accomplished by a single research laboratory. Hence, such a large-scale technology platform for biological resources should be set up as a community-wide effort in which the databases owned by different research institutes are integrated. At the same time, this platform could create a scientific workplace where collaborative research with mathematicians, engineers and computer scientists leads to new approaches for bacterial identification, while providing a solid basis for the harmonization of international policies and laying the foundation for a better understanding of the dynamics behind prokaryotic diversification. Setting up a platform for the exchange of bacterial knowledge involves lowering the barriers to global data sharing, identifying and filling the gaps in observational efforts, and exploring the possibilities of novel data mining techniques for understanding bacterial life.

With the many improvements in automated calibration and interpretation of chromatographic profiles, reproducible fatty acid profiles can nowadays be generated rapidly, provided that strains are grown under specified and standardized conditions. As a result, the identification of microorganisms by analysis of their cellular fatty acid composition has become a routine method in many laboratories. The high-throughput nature of fatty acid analysis enables the screening of a larger segment of prokaryotic diversity than genotypic markers such as the 16S rRNA gene sequence or multilocus sequence information. Therefore, as a first step towards the construction of a complex technology framework, this project will focus on the integration of the fatty acid databases that are autonomously generated using the Sherlock Microbial Identification System within the BCCM/LMG and DSMZ bacteria collections. We will demonstrate how some previously unseen patterns can be learned from this large knowledge base, resulting in an improved discriminatory power of bacterial whole cell fatty acid methyl ester (FAME) analysis for recognizing bacterial species. These findings should support the applicability of knowledge discovery in databases within the field of microbiology. However, apart from extending the size of the knowledge base, the accuracy and flexibility of automated species identification may be further enhanced by the application of state-of-the-art machine learning methods. From the successful application of these algorithms in other domains, it can be anticipated that they should be able to accurately model the whole spectrum of operational taxonomic units, provided that enough high-quality training examples can be collected.
Finally, we will also focus our attention on the opportunities that are offered by the integration of fatty acid databases from different research institutions for quality control issues concerning the authenticity of biological material as it is transferred between these institutions.

Work packages

To achieve the aims of the project, the following work packages (WPs) were defined:

WORK PACKAGE 1. integration of BCCM/LMG and DSMZ fatty acid databases
1.1 integrated strain database management
1.2 incorporation of taxonomic name resolver
1.3 linking fatty acid databases onto the integrated strain database
1.4 standardisation of experiment annotation format
1.5 analysis of the information content of the integrated FAME databases
1.6 evaluate scalability of information system for future extensions
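
As an illustration of tasks 1.1 to 1.3, the following minimal sketch shows how a taxonomic name resolver could be used to bring strain records from two collections under one accepted species name. It is not the project's implementation: the synonym table, the record layout (strain identifier, species name) and the strain identifiers are all hypothetical, and a real resolver would draw on a curated nomenclature source.

```python
from collections import defaultdict

# Illustrative synonym table (assumption): older name -> currently accepted name.
SYNONYMS = {
    "pseudomonas cepacia": "Burkholderia cepacia",
    "xanthomonas maltophilia": "Stenotrophomonas maltophilia",
}


def resolve_name(name, synonyms=SYNONYMS):
    """Return the accepted name for `name`, normalising case and whitespace."""
    key = " ".join(name.lower().split())
    # Unknown names are kept, with only the genus capitalised.
    return synonyms.get(key, key.capitalize())


def integrate(collections, synonyms=SYNONYMS):
    """Group (strain_id, name) records from several collections by resolved name.

    `collections` maps a collection label to a list of (strain_id, name) pairs.
    """
    merged = defaultdict(list)
    for label, records in collections.items():
        for strain_id, name in records:
            merged[resolve_name(name, synonyms)].append((label, strain_id))
    return dict(merged)


# Made-up records; the identifiers are placeholders, not real catalogue numbers.
example = {
    "collection_A": [("A-0001", "Pseudomonas cepacia"), ("A-0002", "Bacillus  subtilis")],
    "collection_B": [("B-0001", "burkholderia cepacia")],
}
integrated = integrate(example)
```

In this sketch, records filed under an outdated name and under the accepted name end up in the same group, which is the precondition for linking their fatty acid profiles in task 1.3.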

WORK PACKAGE 2. improved automated identification of fatty acid compounds
2.1 data warehousing for OLAP
2.2 re-evaluation of existing peak naming libraries
2.3 test stability of new fatty acid windows
2.4 determine chemical composition of new fatty acid windows
2.5 effect of improved fatty acid peak identification on species discrimination
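
The idea of a peak naming library with fatty acid windows (tasks 2.2 and 2.3) can be sketched as follows, assuming that each chromatographic peak is characterised by an equivalent chain length (ECL) value and that a peak is named when its ECL falls within a tolerance window around a library entry. The library contents, the tolerance value and the function name are assumptions made for illustration; only the convention that a straight-chain saturated fatty acid has an ECL equal to its carbon number is relied upon.

```python
# Hypothetical mini-library: fatty acid name -> expected ECL (window centre).
# Straight-chain saturated fatty acids have an ECL equal to their carbon number.
LIBRARY = {"14:0": 14.0, "16:0": 16.0, "18:0": 18.0}


def name_peak(ecl, library=LIBRARY, tolerance=0.01):
    """Return the library name whose window contains `ecl`, or None if unnamed.

    If several windows contain the peak, the closest window centre wins.
    """
    best_name, best_delta = None, None
    for name, centre in library.items():
        delta = abs(ecl - centre)
        if delta <= tolerance and (best_delta is None or delta < best_delta):
            best_name, best_delta = name, delta
    return best_name


# Peaks that fall outside every window stay unnamed; such peaks are the
# candidates for new fatty acid windows.
observed = [14.002, 15.480, 16.004]
named = [(ecl, name_peak(ecl)) for ecl in observed]
```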

WORK PACKAGE 3. design of novel species identification strategies
3.1 supervised identification
3.1.1 training set management
3.1.2 implementation of state-of-the-art machine learning methods
3.2 unsupervised identification
3.3 compare alternative identification strategies: accuracy and flexibility
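
To make the notion of supervised identification (task 3.1) concrete, the sketch below trains a nearest-centroid classifier on labelled fatty acid profiles and assigns an unknown profile to the closest species centroid. Nearest centroid is used here only as a simple stand-in; the project text does not say which machine learning methods are applied. The species labels and profile values are made up, and profiles are assumed to be dictionaries mapping fatty acid names to peak amounts.

```python
import math
from collections import defaultdict


def normalise(profile):
    """Rescale a profile so that its fatty acid amounts sum to 100 percent."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile has no signal")
    return {name: 100.0 * value / total for name, value in profile.items()}


def centroid(profiles):
    """Mean profile; fatty acids missing from a profile count as 0."""
    names = set().union(*profiles)
    return {n: sum(p.get(n, 0.0) for p in profiles) / len(profiles) for n in names}


def distance(p, q):
    """Euclidean distance between two profiles over the union of fatty acids."""
    names = set(p) | set(q)
    return math.sqrt(sum((p.get(n, 0.0) - q.get(n, 0.0)) ** 2 for n in names))


def train(labelled_profiles):
    """Build one centroid per species from (label, profile) pairs."""
    by_label = defaultdict(list)
    for label, profile in labelled_profiles:
        by_label[label].append(normalise(profile))
    return {label: centroid(ps) for label, ps in by_label.items()}


def identify(model, profile):
    """Return the label of the centroid closest to the normalised profile."""
    p = normalise(profile)
    return min(model, key=lambda label: distance(model[label], p))


# Made-up training data for two hypothetical species.
training = [
    ("species_A", {"16:0": 30, "18:1 w7c": 50, "17:0 cyclo": 20}),
    ("species_A", {"16:0": 28, "18:1 w7c": 54, "17:0 cyclo": 18}),
    ("species_B", {"15:0 iso": 45, "15:0 anteiso": 35, "16:0": 20}),
    ("species_B", {"15:0 iso": 50, "15:0 anteiso": 30, "16:0": 20}),
]
model = train(training)
prediction = identify(model, {"16:0": 29, "18:1 w7c": 51, "17:0 cyclo": 20})
```

Comparing alternative strategies (task 3.3) would then amount to swapping `train` and `identify` for other methods and measuring accuracy on held-out profiles.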

WORK PACKAGE 4. quality control of authenticity of biological material
4.1 error rate estimation of inter-laboratory reproducibility
4.2 determine list of bacterial strains shared between BCCM/LMG and DSMZ
4.3 detect fatty acid profile inconsistencies across laboratory boundaries
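
Tasks 4.2 and 4.3 can be pictured with the following minimal sketch, which lists the strains present in both laboratories and flags those whose fatty acid profiles differ by more than a chosen threshold. It assumes that strains have already been mapped to a common key across collections and that profiles are expressed as percentages; the strain keys, the profile values and the threshold are hypothetical, and Euclidean distance is only one possible measure of disagreement.

```python
import math


def profile_distance(p, q):
    """Euclidean distance between two percentage profiles (missing acids = 0)."""
    names = set(p) | set(q)
    return math.sqrt(sum((p.get(n, 0.0) - q.get(n, 0.0)) ** 2 for n in names))


def shared_strains(lab_a, lab_b):
    """Sorted list of strain keys that occur in both laboratories."""
    return sorted(set(lab_a) & set(lab_b))


def flag_inconsistencies(lab_a, lab_b, threshold):
    """Return (strain, distance) for shared strains whose profiles disagree."""
    flagged = []
    for strain in shared_strains(lab_a, lab_b):
        d = profile_distance(lab_a[strain], lab_b[strain])
        if d > threshold:
            flagged.append((strain, d))
    return flagged


# Made-up profiles keyed by a hypothetical common strain identifier.
lab_a = {
    "strain-001": {"16:0": 30.0, "18:1 w7c": 70.0},
    "strain-002": {"16:0": 40.0, "18:1 w7c": 60.0},
    "strain-003": {"16:0": 55.0, "18:0": 45.0},
}
lab_b = {
    "strain-001": {"16:0": 31.0, "18:1 w7c": 69.0},
    "strain-002": {"15:0 iso": 60.0, "16:0": 40.0},
    "strain-004": {"16:0": 50.0, "18:0": 50.0},
}
suspects = flag_inconsistencies(lab_a, lab_b, threshold=10.0)
```

In practice the threshold would have to be derived from the inter-laboratory reproducibility estimated in task 4.1, so that normal measurement variation is not mistaken for a mix-up of biological material.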

Partners

Partner 1: BCCM/LMG, Ghent University
Promotor: Prof. Dr. Paul De Vos
Laboratory of Microbiology
K.L. Ledeganckstraat 35
B-9000 Gent, Belgium
Tel: 09 264 51 10
Fax: 09 264 50 92
Email: Paul.DeVos@UGent.be
http://lmg.ugent.be; http://bccm.belspo.be

Partner 2: KERMIT, Ghent University
Promotor: Prof. Dr. Bernard De Baets
Research unit Knowledge-based Systems
Department of Applied Mathematics, Biometrics and Process Control
Coupure links 653
B-9000 Gent, Belgium
Tel: 09 264 59 41
Fax: 09 264 62 20
Email: Bernard.DeBaets@UGent.be
http://users.ugent.be/~bdebaets/

User committee

Member 1
Contact person: Prof. Dr. Erko Stackebrandt
Institution: DSMZ (German collection of microorganisms and cell cultures)
Address: DSMZ GmbH, Mascheroder Weg 1b, 38124 Braunschweig, Germany
Tel: ++49 531 2616 352
Fax: ++49 531 2616 418
Email: erko@dsmz.de
Type of Institution: Member of the Leibniz Wissenschaftsgemeinschaft, funded by the Ministry of Science and Technology, Germany; a service institute with the mandate to perform collection-related research.

Member 2
Contact person: Dr. David Smith
Institution: CABI Bioscience
Address: CABI Bioscience, Bakeham Lane, Egham, Surrey, TW20 9TY, UK
Tel: ++44 1491 829046
Fax: ++44 1491 829100
Email: d.smith@cabi.org
Type of Institution: Not-for-profit intergovernmental organisation with two divisions: Publishing and Bioscience. Bioscience work includes conservation and utilisation of microorganisms, molecular biology, identification of fungi and bacteria, biopesticides, crop protection, ecology, and industrial and environmental biology.

Member 3
Contact person: Dr. Myron Sasser
Institution: MIDI, Inc.
Address: MIDI, Inc., 125 Sandy Drive, Newark, DE 19713, USA
Tel: 302-737-4297 or 800-276-8068
Fax: 302-737-7781
Email: myron@midi-inc.com
Type of Institution: MIDI, Inc. is a microbiological instrument company, founded in 1985 by Dr. Myron Sasser and located in Newark, Delaware. This small biotech company develops, manufactures and sells the Sherlock Microbial Identification System (MIS) worldwide. The Sherlock System identifies aerobic bacteria (including bioterrorism agents), anaerobic bacteria and yeast based on gas chromatographic (GC) analysis of whole cell fatty acid content.