Abstract
In this project, we take a Bayesian perspective on estimating the neighbourhood of a set of p query variables in an undirected network of dependencies. Gaussian graphical models (GGMs) are a tool for representing such relationships in an interpretable way. In the classical GGM setting, the sparsity pattern of the inverse covariance matrix W encodes conditional independence between the variables of the graph: a zero entry W_ij means that variables i and j are conditionally independent given all the others. Consequently, various estimators have been proposed that reduce the number of parameters by imposing sparsity constraints on W, e.g. the graphical lasso and its Bayesian extensions. We consider the sub-network corresponding to the neighbourhood of a set of query variables, where the set of potential neighbours is large. We aim to develop an efficient inference scheme that estimates this sub-network without inferring the entire network.
In real-world situations we often have to estimate a full network but interpret only part of it. An example is modelling the dependence between clinical variables and a potentially large set of genetic explanatory variables. Here, we are more interested in the links between these two sets of variables than in the links within each set. The proposed approach avoids prohibitive computations on the whole network and makes it possible to estimate only the parts of interest. An additional challenge is handling missing values and heterogeneous data, i.e. continuous and discrete random variables at the same time. We plan to address this with a copula extension.
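
To make the role of the inverse covariance matrix W concrete, the following minimal sketch reads the neighbourhood of two query variables off a small, known W. It is an illustration of the quantity being estimated, not of the proposed inference scheme: the matrix values, the choice of query indices, and the helper names `partial_correlations` and `query_neighbourhood` are all hypothetical, and a real application would not have access to the full W.

```python
import numpy as np

# Hypothetical toy precision (inverse covariance) matrix W for 5 variables.
# Values are illustrative only. Variables 0 and 1 act as query variables;
# variables 2, 3 and 4 are the potential neighbours.
W = np.array([
    [2.0, 0.5, 0.8, 0.0, 0.0],
    [0.5, 2.0, 0.0, 0.0, 0.6],
    [0.8, 0.0, 2.0, 0.7, 0.0],
    [0.0, 0.0, 0.7, 2.0, 0.0],
    [0.0, 0.6, 0.0, 0.0, 2.0],
])


def partial_correlations(W):
    """Partial correlation of i and j given the rest: -W_ij / sqrt(W_ii * W_jj)."""
    d = np.sqrt(np.diag(W))
    P = -W / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P


def query_neighbourhood(W, query, tol=1e-10):
    """For each query variable, list the non-query variables it is linked to.

    Only the off-diagonal block of W between query and non-query variables
    is inspected; links within either set are ignored.
    """
    query = list(query)
    others = [j for j in range(W.shape[0]) if j not in query]
    W12 = W[np.ix_(query, others)]  # query-by-others block of W
    return {
        q: [others[k] for k in np.flatnonzero(np.abs(W12[i]) > tol)]
        for i, q in enumerate(query)
    }


# The covariance matrix is dense even though W is sparse: variables 0 and 3
# are marginally correlated (through variable 2) but conditionally independent.
Sigma = np.linalg.inv(W)

neighbours = query_neighbourhood(W, query=[0, 1])
P = partial_correlations(W)
```

Here the block of W linking query and non-query variables is the only part that is inspected, which mirrors the interest in links between the two sets of variables rather than within them. Zeros in W, not in the covariance matrix, are what define the missing edges.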