Structural and bioinformatic studies of the short chain oxidoreductase enzyme family
Huether, Robert Paul
The superfamily of short chain oxidoreductase (SCOR) enzymes contains more than 40,000 members that share a conserved Tyr and Lys in the catalytic center. The largest subfamily, characterized by an N-terminal Gly-rich motif TGxxxGxG (TGYK-SCOR), comprises over 19,000 members. These enzymes control essential metabolic processes involving nutrients, hormones, and toxins. TGYK-SCOR sequences are approximately 250 amino acid residues long, of which 40 are highly conserved. This thesis describes a technique for clustering subsets of the TGYK-SCOR enzymes in order to correlate sequence with substrate specificity and species distribution. We report the crystal structure determinations of three TGYK-SCOR enzymes that represent subsets of the family for which no structure has been reported previously. The structures provide new details about the nature of cofactor and substrate binding and the mechanism of action. The substrates of 38% (7,311) of the TGYK-SCOR enzymes are predicted on the basis of biochemical analysis and/or sequence homology. Crystal structures of 63 TGYK-SCOR enzymes reveal that the substrate-binding pocket is made up of three flexible loops. Contacts between bound substrates and amino acid residues at five specific positions in two of these loops identify a "minimum substrate fingerprint" for dozens of substrates in the TGYK-SCOR family. Using amino acid identities, a subset composed of 11,788 TGYK-SCOR sequences was divided into 151 potential substrate-specific families. Three families of uncertain substrate fingerprint were chosen for further structural study and analysis. The structure of the TGYK-SCOR enzyme A3DFK9 from C. thermocellum, a bacterium with the ability to convert biomaterial into ethanol, was determined to 1.7 Å resolution. On the basis of sequence analysis, the protein was predicted to interact with the cofactor NAD. The structure of the protein in complex with NAD supported this prediction. Several variations in highly conserved residues were observed.
The variations were found to be structurally analogous to the residues typically observed in this family. The clustering revealed that two residues previously identified as essential to the mechanism of action could be replaced by two residues not previously recognized as compatible with activity. Additionally, a glycerol molecule was bound in the substrate-binding region. When TGYK-SCOR sequences having the ensemble of residues that interacted with this glycerol were isolated, 36 proteins from SwissProt/TrEMBL clustered with A3DFK9. The structure revealed that the loops surrounding the substrate-binding region are shifted and expose different binding positions. The function of this protein is unknown, but our analysis of the substrate-binding loops putatively identifies A3DFK9 as a carbohydrate- or polyalcohol-metabolizing enzyme. The structure of the TGYK-SCOR enzyme Q9HYA2, from the pathogenic bacterium P. aeruginosa, was solved to 2.3 Å resolution. It possesses an atypical catalytic tetrad composed of Lys118-Ser146-Thr159-Arg163. The substrate-binding and cofactor-recognition residues are conserved in 86 orthologs. The structure revealed that the putative active site of Q9HYA2 contains chemically similar amino acids at each catalytically important position of a typical TGYK-SCOR enzyme (N → K118, Y → T159, K → R163; typical TGYK-SCOR residue → Q9HYA2 residue). This is the first observation of a TGYK-SCOR protein having a catalytic center with threonine replacing the catalytic tyrosine and a Cl⁻ ion replacing the hydroxyl of the tyrosine. The structure of an NADP⁺-dependent serine dehydrogenase [EC 220.127.116.116] from Saccharomyces cerevisiae (YMR226C) was determined to a resolution of 2.36 Å. This is the first structure solved from the putative NADP⁺ serine 3-dehydrogenase group in which the conformation of all three substrate-binding loops is fully resolved.
This protein contains a five-residue substrate fingerprint of AG-YTG, which is one of the five most commonly observed substrate fingerprints in the TGYK-SCOR family. This fingerprint is found in over 637 members from different species of bacteria and lower eukaryotes. The binding of the cofactor and a hydrogen bond between the substrate fingerprint residues Y162 and R209 stabilize the third substrate-binding loop, forming the binding pocket. Although not all residues in the predicted five-residue substrate-binding fingerprint may directly contact the substrate, the structure revealed their importance in forming the secondary shell of the binding pocket and supported the use of the predicted residues in clustering and characterizing members of this subfamily. The structures of the three proteins described in this work each have sequence variations in highly conserved residues that illustrate species divergence in substrate and details of the mechanism of action. Although the endogenous substrates of these enzymes are still unknown, all three contain the hallmarks of active TGYK-SCOR proteins.
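The idea of selecting sequences by the TGxxxGxG motif and grouping them by a five-residue substrate fingerprint can be illustrated with a minimal Python sketch. It is not the procedure used in the thesis. The function names, the 40-residue N-terminal search window, the alignment column indices, and the toy sequences are all hypothetical, and reading the dash in a fingerprint such as AG-YTG as a separator between residues from two different loops is an assumption.

```python
import re
from collections import defaultdict

# Gly-rich motif TGxxxGxG, where x stands for any residue.
TGYK_MOTIF = re.compile(r"TG...G.G")


def has_tgyk_motif(sequence, n_terminal_window=40):
    """True if the motif occurs in the first n_terminal_window residues.

    The window size is an illustrative assumption, not a value from the thesis.
    """
    return TGYK_MOTIF.search(sequence[:n_terminal_window]) is not None


def fingerprint(aligned_sequence, loop_a_columns, loop_b_columns):
    """Join the residues at the given 0-based alignment columns as 'AA-BBB'."""
    loop_a = "".join(aligned_sequence[i] for i in loop_a_columns)
    loop_b = "".join(aligned_sequence[i] for i in loop_b_columns)
    return f"{loop_a}-{loop_b}"


def cluster_by_fingerprint(aligned, loop_a_columns, loop_b_columns):
    """Group motif-containing sequences that share an identical fingerprint."""
    clusters = defaultdict(list)
    for name, seq in aligned.items():
        if has_tgyk_motif(seq):
            key = fingerprint(seq, loop_a_columns, loop_b_columns)
            clusters[key].append(name)
    return dict(clusters)


# Invented toy sequences, pre-aligned to equal length (not real proteins).
toy_alignment = {
    "toy1": "MTGASSGIGAGKLYTGW",
    "toy2": "MTGGSRGLGAGRVYTGF",
    "toy3": "MTGATKGIGSNELFSAW",
    "toy4": "MKLAVVGAGSNELFSAW",  # lacks the TGxxxGxG motif, so it is skipped
}

# Hypothetical fingerprint columns: two in one loop, three in another.
clusters = cluster_by_fingerprint(toy_alignment, (9, 10), (13, 14, 15))
```

In this toy run, sequences are grouped only when all five fingerprint residues are identical. A real analysis would need a multiple sequence alignment of the full family and structurally justified column choices.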