Proteins are large, complex molecules that perform a vast array of functions. They are one of the four major classes of biological molecules, alongside nucleic acids, carbohydrates and lipids. The human body contains a great variety of proteins, which are essential to survival and health. Data about these proteins are collected and stored in multiple databases, and scientists and clinicians use these data to understand and fight diseases. However, because the databases differ in scope, the data are widely scattered and lack a uniform structure.
Published in: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0210475
For more information, please write to firstname.lastname@example.org
Diseases and heritable disorders such as cancer and diabetes are linked to mutations in proteins. Here, a mutation is a change in the amino acid sequence of a protein. In clinical settings, a patient’s sequence data can be used to identify the presence of a particular mutation, which can inform the prognosis and help predict outcomes for particular modes of treatment. Sophisticated analysis of this kind requires advanced, precise algorithms. Researchers and clinicians working towards personalized medicine are currently creating mutation databases for use in predictive algorithms and other large-scale analyses. While a number of relevant databases are available, they usually do not provide additional details, such as how the mutation affects the protein. For example, existing databases do not give information about the protein’s location in the cell (“subcellular location”). Sometimes, the data found in one database may even conflict with the data found in another. A comprehensive database could mitigate these shortcomings.
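To make the idea of a sequence change concrete, the short Python sketch below applies a single amino acid substitution to a toy sequence. It assumes the common shorthand in which a substitution is written as the original residue, its 1-based position, and the new residue (for example `K3E`). The function name `apply_substitution` and the five-residue sequence are illustrative only and are not taken from HuVarBase.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Return a copy of `sequence` with one substitution such as 'K3E' applied.

    Hypothetical helper for illustration; positions are 1-based.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation format: {mutation!r}")

    wild_type = match.group(1)
    position = int(match.group(2))
    variant = match.group(3)

    index = position - 1  # convert to 0-based indexing
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + variant + sequence[index + 1:]


# Toy sequence, not a real protein.
toy_sequence = "MKKLV"
mutated_sequence = apply_substitution(toy_sequence, "K3E")
print(mutated_sequence)
```

Checking the original residue before substituting is a small safeguard against applying a mutation to the wrong sequence or the wrong numbering scheme.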
Prof Michael Gromiha and his team at IITM have created several such databases: MutHTP, for disease-causing mutations in transmembrane proteins (proteins located in the membrane); CPAD, for mutational effects on protein aggregation; ProTherm, for protein stability; and PROXiMATE, for the binding affinity of protein-protein complexes. HuVarBase, one of the newest resources created by the team, contains data on mutations in human proteins, with comprehensive information at the gene and protein levels. The database is publicly available.
The first step in constructing HuVarBase was to go through the literature and existing variant databases to collect all the necessary data. Where the data conflicted, the team went back to the original literature sources to decide which data could be included. Additional features of each protein were then added: the protein sequence (i.e. the order of amino acids used to build the protein); the disease class (i.e. the type of disease, such as cardiovascular disease or skin disease); links to structural details of the protein; the protein’s location in the cell; whether the protein has any additional molecular changes (collectively termed “post-translational modifications”); and so on.
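The collection step described above can be sketched roughly in Python as follows. This is only an illustration of the general idea (group records from several sources by variant, keep the ones that agree, and set aside the ones that conflict for manual checking against the literature); it is not the team's actual pipeline. The `VariantRecord` fields, the `integrate` function, and all gene, mutation and source names are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantRecord:
    """One variant as reported by one source (hypothetical schema)."""
    gene: str
    mutation: str
    disease_class: str
    source: str


def integrate(records):
    """Group records by (gene, mutation) and separate agreeing from conflicting ones.

    Returns (merged, conflicts):
      merged    maps a variant key to its agreed disease class and its sources;
      conflicts maps a variant key to the disagreeing records, which would need
                to be resolved by hand against the original literature.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[(record.gene, record.mutation)].append(record)

    merged, conflicts = {}, {}
    for key, group in grouped.items():
        disease_classes = {record.disease_class for record in group}
        sources = sorted({record.source for record in group})
        if len(disease_classes) == 1:
            merged[key] = {
                "disease_class": next(iter(disease_classes)),
                "sources": sources,  # kept so every entry stays traceable
            }
        else:
            conflicts[key] = group
    return merged, conflicts


# Placeholder data, invented purely for illustration.
records = [
    VariantRecord("GENE1", "A10V", "cardiovascular", "source_A"),
    VariantRecord("GENE1", "A10V", "cardiovascular", "source_B"),
    VariantRecord("GENE2", "G5D", "skin", "source_A"),
    VariantRecord("GENE2", "G5D", "cardiovascular", "source_B"),
]

merged, conflicts = integrate(records)
print(merged)
print(sorted(conflicts))
```

Keeping the list of sources alongside each merged entry mirrors the article's point that every record should remain traceable to where it came from.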
The database offers advanced search options, an easy-to-follow tutorial, FAQs and a glossary, making it accessible to anyone interested in the study of protein mutations and diseases. Ensuring the reliability of the data was one of the challenges the team faced. To keep the data accurate and traceable, the team has included links to the original sources of the data.
“In this genomic era, a large volume of disparate ‘omics’ data pile-up almost on a daily basis. The information content in such data sets are enormous. It is non-trivial to uncover biologically relevant and significant information from these massive data sets. One of the first requirements in such ventures is to be able to integrate the disparate data sets in a logical and effective way before querying the data to address a biological question. Gromiha and co-workers, in their development named HuVarBase, made a strategic integration of variant Omics datasets on humans. Though one talks about human genome sequence data as though it is absolute, there are variations between humans. Sometimes these variations have genetic basis leading to an explanation for the vulnerability of an individual human for example, to cancer, diabetes and like. Therefore HuVarBase should greatly aid our understanding of the molecular basis of a disease process. This forms the firm first steps in the pipeline of drug design and discovery. Use of development depends on the ease of using the web interface. HuVarBase has several useful features incorporated to enable user community to make a complex query quite easily.”
Molecular Biophysics Unit
Indian Institute of Science, Bangalore