A computational study of crystallization data: Application and potential prediction
Protein crystallization is an ongoing bottleneck in protein structure determination via X-ray crystallography. Large structural biology projects such as the Protein Structure Initiative (PSI) have produced an unprecedented volume of data on proteins and their crystallization outcomes. To mine these data effectively and move crystallization from its current empirical state towards a more methodical process, new analysis methods are needed. This thesis presents the Cocktail Distance Coefficient (CD_coeff), a distance metric for comparing crystallization cocktails, and shows how hierarchical clustering using this metric can highlight trends in crystallization data. The CD_coeff correctly identifies changes in the crystallization screens used by the Hauptman-Woodward Medical Research Institute (HWI) and, when connected with a specific protein's crystallization results, can lead to the correction of that protein's structure. The CD_coeff is then applied to the study of crystal contacts, the points of contact between protein molecules in a crystal structure, in an attempt to connect properties of these contacts to proteins' crystallization propensities. In this limited analysis, no strong correlations were found, but the data do support published theories on contact composition.
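As a rough illustration of clustering cocktails hierarchically under a pairwise distance, the sketch below runs average-linkage agglomerative clustering on four made-up cocktails. The abstract does not define the CD_coeff, so `placeholder_distance` is a hypothetical stand-in (a Bray-Curtis-style dissimilarity over component concentrations), not the thesis's metric. The cocktail compositions, the function names, and the choice of average linkage are likewise assumptions made only for this example.

```python
from itertools import combinations


def placeholder_distance(a, b):
    """Hypothetical stand-in for a cocktail distance (NOT the CD_coeff).

    Bray-Curtis-style dissimilarity over component concentrations:
    0.0 for identical cocktails, 1.0 for cocktails sharing no component.
    """
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    den = sum(a.get(k, 0.0) + b.get(k, 0.0) for k in keys)
    return num / den if den else 0.0


def average_linkage(c1, c2, items, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(items[i], items[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(items, dist, n_clusters):
    """Agglomerative clustering: merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average-linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(
                clusters[p[0]], clusters[p[1]], items, dist
            ),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Made-up cocktails: component name -> concentration. Values are treated as
# unitless numbers here purely for illustration.
cocktails = [
    {"PEG 4000": 20.0, "NaCl": 0.1},
    {"PEG 4000": 25.0, "NaCl": 0.1},
    {"ammonium sulfate": 2.0},
    {"ammonium sulfate": 1.6, "NaCl": 0.1},
]

groups = cluster(cocktails, placeholder_distance, n_clusters=2)
print(sorted(groups))
```

Under this placeholder distance the two PEG-based cocktails fall into one cluster and the two ammonium sulfate cocktails into the other. A real analysis would substitute the CD_coeff as defined in the thesis for `placeholder_distance`.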