Computational tools for investigating the role of chromatin in regulating genomic functional elements
Lai, William Kai Ming
Chromatin is the nucleoprotein complex of histone proteins and DNA found in the nucleus of every eukaryotic cell. The basic subunit of chromatin is the nucleosome, which consists of 147 bp of DNA wrapped around a histone octamer. Nucleosome position, nucleosome occupancy, and the presence or absence of post-translational covalent modifications on the histone proteins are responsible for differential gene regulation. Remarkable advances in genomics technology have revealed the enormous complexity and combinatorial nature of chromatin. However, analysis has become correspondingly more difficult as the datasets have grown in size. To facilitate interpretation of the combinatorial nature of chromatin, we have developed a novel method that integrates all chromatin datasets into distinct nucleosome types (a nucleosome alphabet). We applied this approach to S. cerevisiae, generating a nucleosome alphabet that forms chromatin motifs when mapped back to the genome. By applying novel chromatin alignment and global word search approaches, we have defined distinctive chromatin motifs for introns, origins of replication, tRNAs, antisense transcripts, double-strand break hotspots, and DNase hypersensitive sites, and we can distinguish genes by expression level. We have also uncovered strong associations between transcription factor binding and specific types of nucleosomes. Our results demonstrate the utility of defining a chromatin alphabet and provide a novel framework for exploring chromatin architecture.
Characterization of chromatin architecture remains a complex task because of the difficulties inherent in identifying the location of genomic functional elements. Under the assumption that genomic functional elements with conserved function also possess conserved chromatin structure, we have developed a chromatin architecture alignment algorithm (ArchAlign). ArchAlign identifies shared chromatin structural patterns in high-resolution chromatin structural datasets, derived from next-generation sequencing or tiled microarray approaches, for user-defined regions of interest. We validated ArchAlign using well-characterized functional elements and used it to explore the chromatin structural architecture at CTCF binding sites in the human genome.
Finally, we have developed and implemented a novel chromatin Architecture Basic Local Alignment Search Tool (ArchBLAST) to identify and characterize new classes of genomic features. The ArchBLAST algorithm takes the conserved chromatin architecture at known sites of interest and searches the whole genome for similar sites. ArchBLAST differs from other approaches in that it uses both the amplitude and the spatial arrangement of chromatin modifications to score similarity. Importantly, ArchBLAST allows identification of subtypes of known genomic features and can accurately predict previously uncharacterized locations. ArchBLAST builds an innovative weighted profile from only the most informative chromatin datasets and then scores the entire genome against it. We have validated the accuracy of our approach with multiple well-characterized genomic features from yeast and humans. We show that ArchBLAST can predict both gene expression and genomic feature directionality, and can identify cell-type-specific enhancers, using only chromatin architecture.
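The general idea of a weighted profile search can be illustrated with a minimal sketch. This is not the ArchBLAST implementation: the function names (`build_profile`, `scan`), the track names, the toy signal values, the variance-based weighting, and the squared-error score are all assumptions chosen for illustration. The sketch averages each chromatin track around known sites, gives more weight to tracks on which those sites agree, and then scores every window of the genome by how closely its amplitude and shape match the profile.

```python
from statistics import mean, pvariance


def build_profile(tracks, sites, half_width):
    """Average each track over windows centred on the known sites.

    Assumes every site lies at least half_width positions from either end.
    """
    profile, weights = {}, {}
    for name, signal in tracks.items():
        windows = [signal[s - half_width: s + half_width + 1] for s in sites]
        columns = list(zip(*windows))
        profile[name] = [mean(col) for col in columns]
        # Hypothetical weighting: tracks whose signal is consistent across
        # the known sites (low variance) contribute more to the score.
        spread = mean(pvariance(col) for col in columns)
        weights[name] = 1.0 / (1.0 + spread)
    total = sum(weights.values())
    weights = {name: w / total for name, w in weights.items()}
    return profile, weights


def scan(tracks, profile, weights, half_width):
    """Score every full window; 0 is a perfect match, lower is worse."""
    length = len(next(iter(tracks.values())))
    scores = {}
    for centre in range(half_width, length - half_width):
        score = 0.0
        for name, reference in profile.items():
            window = tracks[name][centre - half_width: centre + half_width + 1]
            # Squared error penalises differences in amplitude and in shape.
            sq_err = mean((w - r) ** 2 for w, r in zip(window, reference))
            score -= weights[name] * sq_err
        scores[centre] = score
    return scores


# Toy data (invented): two tracks over an 18-position "genome".
tracks = {
    "mark_A": [0, 0, 1, 5, 1, 0, 0, 0, 1, 5, 1, 0, 0, 0, 1, 5, 1, 0],
    "mark_B": [2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1],
}
known_sites = [3, 9]
half_width = 2

profile, weights = build_profile(tracks, known_sites, half_width)
scores = scan(tracks, profile, weights, half_width)

# Best-scoring position that was not used to build the profile.
candidates = {c: s for c, s in scores.items() if c not in known_sites}
best_new = max(candidates, key=candidates.get)
print(best_new)  # 15
```

In this toy example, position 15 carries the same pattern as the two known sites and is recovered as the top new candidate. A real analysis would need strand handling, normalization across datasets, and a principled choice of informative tracks, none of which this sketch attempts.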