A Markov Classification Model for Metabolic Pathways

Prof. Hiroshi Mamitsuka (Kyoto University, Japan)

Abstract

The networked representation of complex processes such as metabolism gives researchers an intuitive understanding of the key elements that determine overall network function. However, as these networks grow in size and complexity, it becomes impossible to understand the interactions between components by visual inspection. A computational model that automatically extracts from a network the key functional components relevant to a specific problem would therefore be useful to biologists. We present a novel method for identifying the key pathways through metabolic networks that relate to an observed biological response. In a realistic context, the response could represent different experimental conditions, and a pathway is defined as a sequence of active genes observed within a metabolic network. Our proposed model, called HME3M, first searches for frequently observed, or dominant, pathways within the set of all pathways using a Markov mixture model. This cluster analysis is then supervised by training local classifiers on the pathways of each cluster. The two steps are not performed separately: they are optimized simultaneously with an EM algorithm. The EM optimization allows information to pass between the pathway clustering algorithm and each local classifier, which has the effect of identifying pathway clusters that optimize the performance of each classifier. HME3M is therefore a probabilistic ensemble classifier in which each classifier is localized to a specific dominant pathway. We compare the performance of HME3M against logistic regression and support vector machines (SVM) using simulated pathways and two metabolic networks: glycolysis and the pentose phosphate pathway. We use AltGenExpress microarray data and focus on the pathway differences in the developmental stages and stress responses of the benchmark organism, Arabidopsis thaliana. The results show that HME3M outperforms the comparison methods as network complexity and pathway noise increase. Furthermore, an analysis of the paths identified by HME3M for each metabolic network confirmed the known biological responses of Arabidopsis.
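
To make the clustering step concrete, the following is a minimal sketch of EM for a plain mixture of first-order Markov chains over pathways, written in Python with the standard library only. It is not the HME3M implementation: the local classifiers and their coupling to the clustering are omitted, and the function name, parameters, and smoothing constant are assumptions made for illustration.

```python
import math
import random


def fit_markov_mixture(paths, n_states, n_clusters, n_iter=50, seed=0, smooth=1e-3):
    """Fit a mixture of first-order Markov chains with EM (illustrative only).

    paths: list of non-empty lists of integer states in range(n_states),
           e.g. indices of active genes along a pathway.
    Returns (weights, init, trans, resp), where resp[i][k] is the posterior
    probability that path i belongs to cluster k.
    """
    rng = random.Random(seed)
    # Start from random soft assignments to break the symmetry between clusters.
    resp = []
    for _ in paths:
        row = [rng.random() + 1e-6 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: re-estimate parameters from responsibility-weighted counts.
        # `smooth` is an assumed pseudo-count that keeps every probability > 0.
        weights = [smooth] * n_clusters
        init = [[smooth] * n_states for _ in range(n_clusters)]
        trans = [[[smooth] * n_states for _ in range(n_states)]
                 for _ in range(n_clusters)]
        for path, r in zip(paths, resp):
            for k in range(n_clusters):
                weights[k] += r[k]
                init[k][path[0]] += r[k]
                for a, b in zip(path, path[1:]):
                    trans[k][a][b] += r[k]
        w_total = sum(weights)
        weights = [w / w_total for w in weights]
        for k in range(n_clusters):
            i_total = sum(init[k])
            init[k] = [v / i_total for v in init[k]]
            for a in range(n_states):
                t_total = sum(trans[k][a])
                trans[k][a] = [v / t_total for v in trans[k][a]]

        # E-step: posterior cluster probabilities, computed in log space.
        for i, path in enumerate(paths):
            logp = []
            for k in range(n_clusters):
                lp = math.log(weights[k]) + math.log(init[k][path[0]])
                for a, b in zip(path, path[1:]):
                    lp += math.log(trans[k][a][b])
                logp.append(lp)
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]

    return weights, init, trans, resp
```

In a model of the kind described above, the E-step responsibilities would additionally depend on how well each cluster's local classifier predicts the observed response, which is how information would pass between the clustering and the classifiers.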

BIO:
Hiroshi Mamitsuka is a Professor at the Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan, and is jointly appointed as a Professor in the Graduate School of Pharmaceutical Sciences of the same university. He received his Ph.D. in Information Science from the University of Tokyo, Japan. He worked in machine learning for more than ten years, on a variety of applications in business and the sciences, and has focused primarily on bioinformatics for the past seven years. His current research interest is the development of data mining and machine learning techniques for semi-structured data in biology.