The aim of this project is to use a statistical approach to predict the structure of protein molecules from their chemical composition. It builds on existing work by Roslin bin Ismael, and involves implementing his algorithm in C++ and displaying the results graphically. Some knowledge of statistics is desirable, but no knowledge of biochemistry is required.
Proteins are large molecules involved in the chemistry of life. Each molecule is a long chain, folded into a complicated shape, and its chemical properties are largely determined by that shape. The chemical sequence of any protein can be determined relatively easily and can be thought of as a string of length around 200 over an alphabet of size 20. Sections of this string form what are called secondary structures, such as helices and sheets, which are then folded to form the tertiary structure, the complete molecule. It is quite hard to find the shape of a molecule experimentally, so there are several theoretical approaches to the problem, which has not yet been solved. An example of a class of molecule whose shape it would be good to determine is the prion. This is the molecule responsible for BSE and CJD, and it is thought to have two different shapes, one benign and one fatal.
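To make the "string over an alphabet of size 20" view concrete, here is a minimal C++ sketch of how a sequence might be stored and checked. It is only an illustration and not part of the existing algorithm: the function name countResidues and the constant kAlphabet are hypothetical, and the alphabet is assumed to be the standard one-letter codes for the 20 amino acids.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Assumption: the 20-letter alphabet is the standard one-letter
// amino acid codes. The name kAlphabet is hypothetical.
const std::string kAlphabet = "ACDEFGHIKLMNPQRSTVWY";

// Hypothetical helper: counts how often each of the 20 letters occurs
// in seq. counts[i] corresponds to kAlphabet[i]. Returns false (and
// leaves counts in a partially filled state) if seq contains a
// character outside the alphabet.
bool countResidues(const std::string& seq,
                   std::array<std::size_t, 20>& counts) {
    counts.fill(0);
    for (char c : seq) {
        const std::size_t idx = kAlphabet.find(c);
        if (idx == std::string::npos) {
            return false;  // not one of the 20 letters
        }
        ++counts[idx];
    }
    return true;
}
```

Per-letter counts like these are the kind of simple summary a statistical method could start from. The prediction method itself is not shown here.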
This project builds on a secondary structure prediction method developed by Roslin and Poet. I would like to put in some extra effort to improve the prediction, which is where some statistical expertise would help. I would also like to incorporate it into my graphics tool, written in C++. The work is mainly software engineering, but an interest in socially relevant research would be an asset.