Computational analysis of Human papillomavirus (HPV) E1, E2, E6, E7 proteins, the LCR regions, and biological consequences

Researcher(s)

  • Sean Fletcher, Medical Diagnostics, University of Delaware

Faculty Mentor(s)

  • Subhasis Biswas, MMSC, University of Delaware

Abstract

Human papillomavirus (HPV) is a significant risk factor for cervical cancer. The HPV genome is divided into three functional regions: the early region, the late region, and the long control region (LCR). The early region encodes the E1, E2, E4, E5, E6, and E7 proteins, which are involved in viral replication and transcription. The LCR contains regulatory elements that control the expression of the early genes.

We used bioinformatics tools to analyze the E1, E2, E6, and E7 proteins and the LCR of HPV to identify the amino acid residues that remained invariant during the evolution of the virus family. The number of invariant residues identified in each protein was as follows: 26 in E1, 7 in E2, 9 in E6, and 4 in E7. Furthermore, the only invariant residues in E6 and E7 were cysteines arranged in C-X-X-C motifs, which could suggest the formation of a complex like that found in zinc-finger proteins.
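As an illustration of the kind of analysis described above, the following is a minimal Python sketch, not the tools used in this study. It assumes the protein sequences have already been aligned to equal length by a separate alignment program, with "-" marking gaps. The function names `invariant_columns` and `cxxc_positions` and the toy sequences are hypothetical and chosen only for this example.

```python
import re


def invariant_columns(aligned):
    """Return (index, residue) for every alignment column in which
    all sequences carry the same residue. Gap-only columns are skipped."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    result = []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        if len(column) == 1:
            residue = column.pop()
            if residue != "-":
                result.append((i, residue))
    return result


def cxxc_positions(protein):
    """Return 0-based start positions of C-X-X-C motifs in an ungapped
    protein sequence. The lookahead allows overlapping matches."""
    return [m.start() for m in re.finditer(r"(?=C[A-Z]{2}C)", protein)]


# Hypothetical toy alignment, not real HPV data.
toy_alignment = ["MCPECK-A", "LCAQCKTA", "MCDRCR-A"]
print(invariant_columns(toy_alignment))
print(cxxc_positions("MCPECKA"))
```

Column indices here refer to positions in the alignment, so they would still need to be mapped back to residue numbers in each individual protein.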

The E2 binding sites within the LCR were compiled and classified into high-risk, low-risk, and probable high-risk strains using the 12-nucleotide palindromic DNA sequences of the four E2 binding sites.
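A search for candidate E2 binding sites in an LCR sequence can be sketched as follows. This is an illustration rather than the procedure used in this study. It assumes the consensus ACCNNNNNNGGT (often written ACCN6GGT) reported in the papillomavirus literature, which is not stated in this abstract. The function names and the test sequences are hypothetical.

```python
import re

# Maps each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_e2_sites(dna):
    """Return (start, site) for every 12-nucleotide window that matches
    the assumed consensus ACCNNNNNNGGT. Overlapping hits are reported."""
    dna = dna.upper()
    pattern = r"(?=(ACC[ACGT]{6}GGT))"
    return [(m.start(), m.group(1)) for m in re.finditer(pattern, dna)]


def is_palindromic(site):
    """True when the site equals its own reverse complement."""
    return site == reverse_complement(site)


# Hypothetical sequence, not taken from any HPV genome.
for start, site in find_e2_sites("ttACCGAATTCGGTaa"):
    print(start, site, is_palindromic(site))
```

Under this consensus the ACC and GGT half-sites are always reverse complements of each other, while the six-nucleotide spacer may or may not be, so `is_palindromic` separates fully palindromic sites from those that are palindromic only in the flanks.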

The results of this study provide new insights into the molecular biology of HPV. Because the HPV family evolved over 400 million years, residues that remained invariant within these proteins could be significant for the structure and function of each protein and may help in developing new diagnostic tests for HPV infection. The classification of the E2 binding sites into high-risk, low-risk, and probable high-risk strains could be used to develop new vaccines and therapeutic agents for HPV-associated diseases.