Bioinformatics Research Centre

 

Mark Girolami and Rainer Breitling
Biologically Valid Linear Factor Models of Gene Expression
To appear in Bioinformatics, 2004

The identification of physiological processes underlying and generating the expression pattern observed in microarray experiments is a major challenge. Principal Component Analysis (PCA) is a linear multivariate statistical method that is regularly employed for that purpose, as it provides a reduced-dimensional representation for subsequent study of possible biological processes responding to the particular experimental conditions. Making explicit the data assumptions underlying PCA highlights their lack of biological validity, which makes biological interpretation of the principal components problematic. A microarray data representation which enables clear biological interpretation is a desirable analysis tool. We address this issue by employing the probabilistic interpretation of Principal Component Analysis and proposing alternative Linear Factor Models which are based on refined biological assumptions. A practical study on two well-understood microarray data sets highlights the weakness of Principal Component Analysis and the greater biological interpretability of the linear models we have developed.


Matlab Code
Download the following zipped file: LFM_DEMO.zip
Unzip the file to a directory of your choice, then start Matlab and make sure that directory is on your Matlab path. To view the demo, type efm_demo at the Matlab command line.
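
The same steps can also be carried out from within Matlab. The following is a minimal sketch, assuming that LFM_DEMO.zip has been saved to the current working directory; the target folder name lfm_demo_dir is a hypothetical choice and not part of the distribution.

```matlab
% Assumption: LFM_DEMO.zip is in the current working directory.
zip_file = 'LFM_DEMO.zip';

% Hypothetical target folder; any writable directory will do.
demo_dir = fullfile(pwd, 'lfm_demo_dir');

% Extract the archive into the target folder.
unzip(zip_file, demo_dir);

% Add the folder and any subfolders it contains to the Matlab path,
% in case the archive unpacks into a nested directory.
addpath(genpath(demo_dir));

% Launch the demo named in the instructions above.
efm_demo
```

The call to addpath only affects the current session; run savepath afterwards if the directory should stay on the path between sessions.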