Estimating Tree-Structured Covariance Matrices via Mixed-Integer Programming with an Application to Phylogenetic Analysis of Gene Expression
Authors
Abstract
We present a novel method for estimating tree-structured covariance matrices directly from observed continuous data. A representation of these classes of matrices as linear combinations of rank-one matrices indicating object partitions is used to formulate estimation as instances of well-studied numerical optimization problems. In particular, we present estimation based on projection, where the covariance estimate is the nearest tree-structured covariance matrix to an observed sample covariance matrix. The problem is posed as a linear or quadratic mixed-integer program (MIP), where a setting of the integer variables in the MIP specifies a set of tree topologies of the structured covariance matrix. We solve these problems to optimality using efficient and robust existing MIP solvers. We also show that the least squares distance method of Fitch and Margoliash (1967) can be formulated as a quadratic MIP and thus solved exactly using existing, robust branch-and-bound MIP solvers. Our motivation for this method is the discovery of phylogenetic structure directly from gene expression data. Recent studies have adapted traditional phylogenetic comparative analysis methods to expression data. Typically, these methods first estimate a phylogenetic tree from genomic sequence data and subsequently analyze expression data. A covariance matrix constructed from the sequence-derived tree is used to correct for the lack of independence in phylogenetically related taxa. However, recent results have shown that the hierarchical structure of sequence-derived tree estimates is highly sensitive to the genomic region chosen to build them. To circumvent this difficulty, we propose a stable method for deriving tree-structured covariance matrices directly from gene expression as an exploratory step that can guide investigators in their modelling choices for these types of comparative analysis. We present a case study in phylogenetic analysis of expression in yeast gene families.
Our method corroborates the presence of phylogenetic structure in the response of expression in a subset of the gene families under particular experimental conditions. Additionally, when used in conjunction with transcription factor occupancy data, our methods show that alternative modelling choices should be considered when creating sequence-derived trees for this comparative analysis.
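The representation mentioned in the abstract, a tree-structured covariance matrix written as a linear combination of rank-one matrices indicating object partitions, can be illustrated with a minimal sketch. It assumes that each edge of a rooted tree contributes its length times the rank-one matrix v vᵀ, where v is the 0/1 indicator of the leaves below that edge. The example tree, the edge lengths, and the function names `tree_covariance` and `satisfies_three_point` are hypothetical and not taken from the paper. The three-point check is a standard characterization of tree-structured matrices, included here only as a sanity test. The sketch covers only the forward construction for a fixed topology; it does not attempt the mixed-integer projection step.

```python
from itertools import combinations


def tree_covariance(n_leaves, edges):
    """Build B = sum_k d_k * v_k v_k^T for a fixed rooted tree.

    `edges` is a list of (leaves_below, length) pairs; v_k is the 0/1
    indicator of `leaves_below`, so v_k v_k^T adds `length` to every
    entry B[i][j] with both i and j below the edge.
    """
    B = [[0.0] * n_leaves for _ in range(n_leaves)]
    for leaves_below, length in edges:
        for i in leaves_below:
            for j in leaves_below:
                B[i][j] += length
    return B


def satisfies_three_point(B, tol=1e-9):
    """Return True if, for every triple of leaves, the two smallest of
    B[i][j], B[i][k], B[j][k] are equal (up to `tol`)."""
    for i, j, k in combinations(range(len(B)), 3):
        smallest, middle, _ = sorted((B[i][j], B[i][k], B[j][k]))
        if abs(smallest - middle) > tol:
            return False
    return True


# Hypothetical rooted tree on four leaves with topology ((0,1),(2,3)).
edges = [
    ({0, 1}, 2.0),  # internal edge above leaves 0 and 1
    ({2, 3}, 1.0),  # internal edge above leaves 2 and 3
    ({0}, 1.0),     # leaf edges
    ({1}, 1.5),
    ({2}, 0.5),
    ({3}, 2.0),
]
B = tree_covariance(4, edges)

# A symmetric matrix that no tree can generate under this construction.
not_tree = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]

for row in B:
    print(row)
print(satisfies_three_point(B), satisfies_three_point(not_tree))
```

In this construction, the covariance between two leaves is the total length of the edges they share on their paths from the root, and each variance is the full root-to-leaf path length. Estimation by projection would search over both the edge lengths and the topology, which is where the integer variables described in the abstract come in.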
Similar resources
Estimating Tree-Structured Covariance Matrices via Mixed-Integer Programming
We present a novel method for estimating tree-structured covariance matrices directly from observed continuous data. Specifically, we estimate a covariance matrix from observations of p continuous random variables encoding a stochastic process over a tree with p leaves. A representation of these classes of matrices as linear combinations of rank-one matrices indicating object partitions is used...
A Mixed Integer Programming Approach to Optimal Feeder Routing for Tree-Based Distribution System: A Case Study
A genetic algorithm is proposed to optimize a tree-structured power distribution network considering optimal cable sizing. For minimizing the total cost of the network, a mixed-integer programming model is presented determining the optimal sizes of cables with minimized location-allocation cost. For designing the distribution lines in a power network, the primary factors must be considered as m...
Application of Gene Expression Programming and Support Vector Regression models to Modeling and Prediction Monthly precipitation
Estimating and predicting precipitation and the resulting runoff plays an important role in the sound management and exploitation of basins, the management of dams and reservoirs, the minimization of flood and drought damage, and water resource management, so these tasks are of interest to hydrologists. The appropriate performance of intelligent models leads researchers to use them for predicting hydrological ph...
ALTERNATIVE MIXED INTEGER PROGRAMMING FOR FINDING EFFICIENT BCC UNIT
Data Envelopment Analysis (DEA) cannot provide adequate discrimination among efficient decision making units (DMUs). Discriminating among these efficient DMUs is an interesting research subject. The purpose of this paper is to develop the mixed-integer linear model which was proposed by Foroughi (Foroughi A.A. A new mixed integer linear model for selecting the best decision making units in data envelo...
OPTIMIZATION OF TREE-STRUCTURED GAS DISTRIBUTION NETWORK USING ANT COLONY OPTIMIZATION: A CASE STUDY
An Ant Colony Optimization (ACO) algorithm is proposed for optimal tree-structured natural gas distribution network. Design of pipelines, facilities, and equipment systems are necessary tasks to configure an optimal natural gas network. A mixed integer programming model is formulated to minimize the total cost in the network. The aim is to optimize pipe diameter sizes so that the location-alloc...