Characterization of physicochemical environments of proteins
Tan, Kuan Pern
Date of Issue: 2017-03-01
School of Computer Science and Engineering
Bioinformatics Institute, A*STAR
Proteins are molecular machines in cells that perform a diverse set of essential biological functions. The function of a protein is determined by its 3D structure. The structure creates local microenvironments in which protein atoms interact with one another. A detailed understanding of these microenvironments would allow better characterization and engineering of protein functions. For example, this knowledge forms the basis of modern therapeutic innovations such as rational drug and vaccine design, and could have implications in other industries, including bioprocessing, biomimetics and biomaterials, among others. This thesis presents my results on the characterization of the physicochemical properties of microenvironments in proteins. To investigate the complex nature of protein microenvironments, the characterization effort is broadly categorized into three interconnected topics, namely (i) residue depth, (ii) hydrogen bonding, and (iii) multibody statistical potential.

The first topic aims to quantify the protein microenvironment using the biophysical parameter of residue depth. The depth of an amino acid measures its degree of burial in a protein. I have shown that the energetics of proteins, and the spatial distribution and chemical properties of amino acids, are dependent on residue depth. To exemplify and utilize the results, I have designed several computational methods for protein engineering and functional characterization. First, a novel method to design temperature-sensitive alleles of proteins was proposed by making point mutations at residues chosen according to their depth. Next, I have used residue depth to identify small-molecule ligand binding sites on proteins by supplementing it with solvent accessibility and evolutionary information. Benchmarks have shown that the method performs comparably to or better than the best available methods, and can reveal unconventional sites unidentifiable with other methods.
In addition, I have shown that residue depth can be used to estimate protein cavity volume using a Monte Carlo sampling approach, and the pKa of amino acid residues using a linear model.

The second topic studies the physicochemical properties of hydrogen bonding in different protein environments. I have performed statistical analysis on databases, classified hydrogen bonds into different types, and characterized the geometrical preferences and variations of the different types. By analyzing quantum simulations of the system, I have shown that the geometrical preference of main-chain hydrogen bonds is due to electron density arising from the planar nature of the peptide bond. I have also performed empirical simulations that strongly suggest a causal link between this geometrical preference and secondary structure formation. Next, I have discovered that low-resolution protein models in databases are consistently missing hydrogen bonds. To improve these models, I have designed a two-step refinement protocol. First, a simple algorithm is used to predict the missing donor-acceptor pairs that should form hydrogen bonds, based on their mutual preference and specificity. Second, Gaussian restraints are applied to the geometry of the missing pairs, after which a standard modelling protocol can be used to refine the protein model. The refinement protocol was shown to be capable of re-introducing hydrogen bonds in the local environment as well as improving overall model quality. The refinement has functional implications for the chemical properties of the protein, as exemplified by more accurate pKa prediction.

The third topic is the construction of an environment-dependent protein statistical potential, Packpred. Here, I have explicitly defined protein microenvironments as sets of tightly packed amino acids, dubbed "residue cliques". Employing Sippl's formulation, the non-random occurrence of microenvironments is characterized.
The non-random occurrence is indicative of the strength of interaction among amino acids, and can be interpreted as an energy potential. I have evaluated the capability of the potential to describe protein energetics on a large set of mutagenesis data. The benchmark has shown that, compared to all other competing methods, Packpred has the best performance not only in binary classification of destabilizing mutants, but also in correctly rank-ordering the degree of phenotypic change associated with different mutations.

Lastly, I also present three biomolecular system modelling studies involving non-globular proteins. These systems are (i) the cohesin ring protein with coiled-coil structure, (ii) the transmembrane transporters OCTN-1 and -2, and (iii) the interaction interface between the oncogenic proteins VAV1 and EZH2. Modelling these systems is challenging because the conventional tools and framework of comparative modelling are not applicable. Instead, an integrative modelling approach tailored to each individual system was undertaken. In all the modelling work I have proposed experimentally testable hypotheses to decipher the biological mechanisms underlying the systems.

In conclusion, in this thesis I have presented an extensive characterization of the physicochemical environments of proteins. The complex nature of the environment was elucidated through three interdependent topics: residue depth, hydrogen bonding and amino acid cliques. In addition to novel results, for every investigation I have also explored its biological utility and built open-access tools. I hope that the work presented here will facilitate future research into protein structures and their functions.
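
As an illustration of the statistical potential described under the third topic, the sketch below scores residue cliques with a generic inverse-Boltzmann relation, E = -kT ln(P_observed / P_expected). It is not the Packpred implementation. The function name `clique_potential`, the input format (tuples of one-letter residue codes), and the reference state (residues drawn independently according to their pooled frequencies) are all assumptions made for this example.

```python
import math
from collections import Counter


def clique_potential(observed_cliques, kT=1.0):
    """Inverse-Boltzmann pseudo-energy for each observed clique composition.

    observed_cliques: iterable of tuples of one-letter residue codes
    (hypothetical input format). Residue order within a clique is ignored.
    Returns a dict mapping each sorted composition to -kT * ln(P_obs / P_exp).
    Compositions that never occur in the input are not scored.
    """
    cliques = [tuple(sorted(c)) for c in observed_cliques]
    clique_counts = Counter(cliques)
    size_totals = Counter(len(c) for c in cliques)

    # Assumed reference state: residues drawn independently from the
    # frequencies pooled over all cliques.
    residue_counts = Counter(r for c in cliques for r in c)
    total_residues = sum(residue_counts.values())

    potential = {}
    for comp, n_obs in clique_counts.items():
        # Number of distinct orderings of this multiset of residues.
        orderings = math.factorial(len(comp)) // math.prod(
            math.factorial(k) for k in Counter(comp).values()
        )
        p_exp = float(orderings)
        for r in comp:
            p_exp *= residue_counts[r] / total_residues
        # Observed frequency among cliques of the same size.
        p_obs = n_obs / size_totals[len(comp)]
        potential[comp] = -kT * math.log(p_obs / p_exp)
    return potential
```

A negative value means the composition occurs more often than the reference state predicts and would be read as favourable; a positive value means it is under-represented. In practice the reference state and the handling of unobserved compositions (for example, pseudocounts) strongly affect such scores, and both are left out of this sketch.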