Fellow 2009-10

Saurabh Sinha

Computer Science

The New Frontier of Genomics: Predicting Biological Function of a DNA Sequence

Genes in a cell are “switched” on or off in precise patterns that determine how the cell functions, a process known as gene regulation. Located near each gene is a segment of DNA known as a regulatory sequence that governs that particular gene. A typical regulatory sequence contains several subsegments known as binding sites, which together determine when and where the nearby gene is switched on.

Professor Sinha is combining ideas from statistical mechanics and bioinformatics to approach one of the great scientific challenges in this area: Given a particular regulatory sequence, how can we determine its exact biological function? As part of this project, he will make the first attempt to design DNA sequences that can perform specific, complex regulatory functions.

During his Center appointment, Professor Sinha will (a) model the interaction between a transcription factor (protein) and its binding site (DNA) through the energy of binding, estimated with the position weight matrix, a standard bioinformatics technique, (b) model the interaction between a transcription factor and the basic machinery for a gene’s regulation through an unknown energy term that will be learned from examples, and (c) model the interaction between any two transcription factors through an optional cooperative energy term that will depend on the distance between the two factors along the DNA.
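To make parts (a) and (c) concrete, here is a minimal Python sketch of how a position weight matrix can score a candidate binding site, with the log-odds score serving as a rough proxy for binding strength. It is an illustration only, not Professor Sinha's model: the function names, the example sites, the uniform background frequency, the pseudocount, and the step-shaped distance rule in `cooperative_energy` are all assumptions introduced here.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned, equal-length sites.

    Assumes a uniform background frequency for every base (hypothetical choice).
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        # Start from the pseudocount so unseen bases do not give log(0).
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log((counts[base] / total) / background)
                    for base in BASES})
    return pwm

def score_site(pwm, site):
    """Sum the per-position log-odds; a higher score suggests stronger binding."""
    return sum(column[base] for column, base in zip(pwm, site))

def best_site(pwm, sequence):
    """Slide the matrix along the sequence; return (score, start) of the best window."""
    width = len(pwm)
    windows = [(score_site(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1)]
    return max(windows)

def cooperative_energy(distance, max_distance=50, strength=-1.0):
    """Hypothetical distance rule: a fixed bonus when two factors bind close together."""
    return strength if distance <= max_distance else 0.0

# Illustrative aligned sites for one imaginary transcription factor.
example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
example_pwm = build_pwm(example_sites)
score, start = best_site(example_pwm, "GGGTATAATCCC")
```

In this sketch the best-scoring window starts at the position where the consensus site `TATAAT` occurs in the test sequence. A real model would learn the energy scale and the form of the distance dependence from data rather than fixing them by hand.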

Once the modeling is complete, Professor Sinha and his group will study two main technical questions. First, when multiple transcription factors together activate a gene, what is the relationship between the number of such activators and the extent of activation? Second, when a transcription factor represses a gene, does it do so by interacting with activators, with the basic machinery, with the DNA, or with some combination of these? The research team will analyze datasets from a variety of biological systems, searching for the most informative clues to these questions.