Bioinformatics Scribbles

GO Enrichment Analysis Practice

Hazel Y. 2023. 12. 28. 15:04

 

In this post, I practice GO enrichment analysis using the names of genes that are upregulated and downregulated in response to Zika virus (ZIKV) infection. For reference, upregulated genes are those whose expression level increases relative to a baseline or reference state, and downregulated genes are those whose expression level decreases. The gene data come from a 2016 study titled Zika Virus Infects Human Cortical Neural Progenitors and Attenuates Their Growth.

 

Zika Virus Infects Human Cortical Neural Progenitors and Attenuates Their Growth - PubMed (pubmed.ncbi.nlm.nih.gov)
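As a rough illustration of the up/down definition above, the sketch below labels a gene by its log2 fold change between an infected sample and a baseline. The function name `classify_gene`, the 2-fold cutoff (|log2FC| ≥ 1) and the pseudocount are my own hypothetical choices, not the criteria used in the 2016 study; a real differential-expression analysis would also require a statistical test with multiple-testing correction.

```python
import math


def classify_gene(infected_expr: float, baseline_expr: float,
                  min_abs_log2fc: float = 1.0, pseudocount: float = 1.0) -> str:
    """Return 'up', 'down' or 'unchanged' from a log2 fold change.

    Hypothetical helper: the 1.0 cutoff (a 2-fold change) and the pseudocount
    are illustrative defaults, not the thresholds used in the ZIKV study.
    """
    # The pseudocount avoids log2(0) when a gene is not detected in one condition.
    log2fc = math.log2((infected_expr + pseudocount) / (baseline_expr + pseudocount))
    if log2fc >= min_abs_log2fc:
        return "up"
    if log2fc <= -min_abs_log2fc:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    print(classify_gene(31.0, 7.0))   # log2(32 / 8) = 2 -> up
    print(classify_gene(3.0, 15.0))   # log2(4 / 16) = -2 -> down
    print(classify_gene(10.0, 9.0))   # log2(11 / 10) ~ 0.14 -> unchanged
```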

 

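However, the paper does not clearly state where the data are deposited, so finding them on your own is not easy. Fortunately, I was able to download them through a separate link that István Albert, the author of the book I am currently studying (The Biostar Handbook: 2nd Edition), set up so that students could access the data easily. Because the book is not open source, I decided it would not be appropriate to share that link directly. Instead, I am sharing the text files containing only the gene names, which I extracted for this GO enrichment practice.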

- up-regulated.txt (0.02MB)
- down-regulated.txt (0.02MB)
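To inspect these files before pasting them into PANTHER or Enrichr, a small loader like the one below is enough. It assumes each file holds one gene symbol per line; `read_gene_list` is a hypothetical helper name, not part of either tool.

```python
from pathlib import Path


def read_gene_list(path: str) -> list[str]:
    """Load gene symbols from a text file, assumed to hold one symbol per line.

    Blank lines and surrounding whitespace are ignored, and duplicates are
    dropped while keeping the order of first occurrence.
    """
    genes: list[str] = []
    seen: set[str] = set()
    # utf-8-sig also strips a byte-order mark if the file was saved with one.
    for line in Path(path).read_text(encoding="utf-8-sig").splitlines():
        gene = line.strip()
        if gene and gene not in seen:
            seen.add(gene)
            genes.append(gene)
    return genes


if __name__ == "__main__":
    files = ["up-regulated.txt", "down-regulated.txt"]
    if all(Path(name).exists() for name in files):
        up, down = (read_gene_list(name) for name in files)
        print(f"{len(up)} upregulated, {len(down)} downregulated")
        # A gene should not be both up- and downregulated in the same comparison.
        print("in both lists:", sorted(set(up) & set(down)))
```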


For this practice I used PANTHER 18.0 and Enrichr as tools and looked at the results for the molecular function ontology. The goal is to find out which activities are shared among the upregulated genes and among the downregulated genes, based on GO terms with Fold Enrichment > 2 in the PANTHER 18.0 results and GO terms with a Benjamini-Hochberg adjusted p-value < 0.05 in the Enrichr results.

 

Gene Ontology Resource (geneontology.org)

Enrichr (maayanlab.cloud)
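The Benjamini-Hochberg correction used for the Enrichr cutoff turns raw p-values into adjusted p-values that control the false discovery rate. The sketch below is the standard BH step-up procedure in plain Python (the function name is mine), meant to show the idea; I have not checked that it reproduces Enrichr's numbers digit for digit. A raw p-value at ascending rank i out of m tests is scaled to p·m/i, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone and capped at 1.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest so a running minimum
    # keeps adjusted values monotone in the raw p-values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    adj = benjamini_hochberg(raw)
    print([round(q, 3) for q in adj])  # [0.02, 0.04, 0.04, 0.02]
    print([q < 0.05 for q in adj])     # which terms pass adjusted p < 0.05
```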


1. Upregulated genes

- Both tools returned many GO terms that pass their cutoffs (Fold Enrichment > 2 in PANTHER 18.0, adjusted p-value < 0.05 in Enrichr), so only the top 10 GO terms from each are summarized in the tables below.
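The selection rule behind the tables (apply each tool's cutoff, then keep the 10 best terms) can be sketched as below. The input format, a dictionary mapping GO accession to score, is my own hypothetical choice; in practice the values would come from each tool's exported result table. The example uses a few rows from the upregulated-gene tables in this post plus one made-up failing term per dictionary to show the filter.

```python
def top_by_fold_enrichment(fold: dict[str, float], threshold: float = 2.0,
                           n: int = 10) -> list[str]:
    """GO accessions with fold enrichment > threshold, largest first (at most n)."""
    passing = {go: fe for go, fe in fold.items() if fe > threshold}
    return sorted(passing, key=passing.__getitem__, reverse=True)[:n]


def top_by_adjusted_p(adj_p: dict[str, float], threshold: float = 0.05,
                      n: int = 10) -> list[str]:
    """GO accessions with adjusted p-value < threshold, smallest first (at most n)."""
    passing = {go: q for go, q in adj_p.items() if q < threshold}
    return sorted(passing, key=passing.__getitem__)[:n]


if __name__ == "__main__":
    # PANTHER 18.0 fold enrichments from the upregulated-gene table;
    # "made-up term" is hypothetical and fails the > 2 cutoff.
    panther = {"GO:0050321": 3.91, "GO:0000993": 3.51, "GO:0004812": 3.22,
               "GO:0070530": 3.20, "made-up term": 1.40}
    # Enrichr adjusted p-values from the upregulated-gene table;
    # "made-up term" is hypothetical and fails the < 0.05 cutoff.
    enrichr = {"GO:0004674": 5.138e-12, "GO:0003723": 3.343e-11,
               "GO:0004842": 3.712e-9, "made-up term": 0.20}
    print(top_by_fold_enrichment(panther, n=3))  # ['GO:0050321', 'GO:0000993', 'GO:0004812']
    print(top_by_adjusted_p(enrichr))            # ['GO:0004674', 'GO:0003723', 'GO:0004842']
```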

 

(A) PANTHER 18.0

| # | Name | Accession | Definition | Fold Enrichment | FDR |
|---|---|---|---|---|---|
| 1 | Tau-protein kinase activity | GO:0050321 | Catalysis of the reaction: ATP + tau-protein = ADP + O-phospho-tau-protein. | 3.91 | 8.58E-03 |
| 2 | RNA polymerase II complex binding | GO:0000993 | Binding to an RNA polymerase II core enzyme, a multisubunit eukaryotic nuclear RNA polymerase typically composed of twelve subunits. | 3.51 | 1.56E-03 |
| 3 | Aminoacyl-tRNA ligase activity | GO:0004812 | Catalysis of the formation of aminoacyl-tRNA from ATP, amino acid, and tRNA with the release of diphosphate and AMP. | 3.22 | 1.67E-03 |
| 4 | K63-linked polyubiquitin modification-dependent protein binding | GO:0070530 | Binding to a protein upon poly-ubiquitination formed by linkages between lysine residues at position 63 in the target protein. | 3.20 | 4.83E-02 |
| 5 | Tau protein binding | GO:0048156 | Binding to tau protein. tau is a microtubule-associated protein, implicated in Alzheimer's disease, Down Syndrome and ALS. | 3.14 | 2.13E-03 |
| 6 | Protein serine/threonine/tyrosine kinase activity | GO:0004712 | Catalysis of the reactions: ATP + a protein serine = ADP + protein serine phosphate; ATP + a protein threonine = ADP + protein threonine phosphate; and ATP + a protein tyrosine = ADP + protein tyrosine phosphate. | 3.14 | 1.55E-03 |
| 7 | 2-oxoglutarate-dependent dioxygenase activity | GO:0016706 | Catalysis of the reaction: A + 2-oxoglutarate + O2 = B + succinate + CO2. This is an oxidation-reduction (redox) reaction in which hydrogen or electrons are transferred from 2-oxoglutarate and one other donor, and one atom of oxygen is incorporated into each donor. | 2.82 | 1.15E-03 |
| 8 | Demethylase activity | GO:0032451 | Catalysis of the removal of a methyl group from a substrate. | 2.70 | 3.54E-02 |
| 9 | SMAD binding | GO:0046332 | Binding to a SMAD signaling protein. | 2.55 | 1.89E-03 |
| 10 | Protein kinase A binding | GO:0051018 | Binding to a protein kinase A. | 2.53 | 3.21E-02 |
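The Fold Enrichment column compares how many input genes carry a GO term with how many would be expected if the input genes were a random draw from the reference list. With k input genes annotated to the term out of n input genes, and K annotated reference genes out of N, the expected count is n·K/N and Fold Enrichment = k / (n·K/N). The sketch below computes that ratio together with a one-sided hypergeometric p-value, which is equivalent to a one-sided Fisher's exact test for over-representation. The function names and the counts in the example are hypothetical, not taken from PANTHER's output, and PANTHER's reference list and test settings may differ from this simplified setup.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed / expected number of input genes annotated to one GO term.

    k: input genes annotated to the term     n: input genes analysed
    K: reference genes annotated to the term N: reference genes in total
    """
    expected = n * K / N
    return k / expected


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n draws from N genes, K of them annotated."""
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible outcomes simply contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 10 of 500 input genes carry a term that
    # 100 of 20,000 reference genes carry, so 2.5 are expected by chance.
    print(fold_enrichment(k=10, n=500, K=100, N=20_000))  # 4.0
    print(f"{hypergeom_upper_tail(k=10, n=500, K=100, N=20_000):.2e}")
```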

 

(B) Enrichr

| # | Name | Accession | Definition | Adjusted p-value |
|---|---|---|---|---|
| 1 | Protein serine/threonine kinase activity | GO:0004674 | Catalysis of the reactions: ATP + protein serine = ADP + protein serine phosphate, and ATP + protein threonine = ADP + protein threonine phosphate. | 5.138e-12 |
| 2 | RNA binding | GO:0003723 | Binding to an RNA molecule or a portion thereof. | 3.343e-11 |
| 3 | Ubiquitin-protein transferase activity | GO:0004842 | Catalysis of the transfer of ubiquitin from one protein to another via the reaction X-Ub + Y = Y-Ub + X, where both X-Ub and Y-Ub are covalent linkages. | 3.712e-9 |
| 4 | Ubiquitin-like protein transferase activity | GO:0019787 | Catalysis of the transfer of a ubiquitin-like protein from one protein to another via the reaction X-ULP + Y = Y-ULP + X, where both X-ULP and Y-ULP are covalent linkages. ULP represents a ubiquitin-like protein. | 1.079e-8 |
| 5 | Ubiquitin-like protein ligase activity | GO:0061659 | Catalysis of the transfer of a ubiquitin-like protein (ULP) to a substrate protein via the reaction X-ULP + S = X + S-ULP, where X is either an E2 or E3 enzyme, the X-ULP linkage is a thioester bond, and the S-ULP linkage is an isopeptide bond between the C-terminal glycine of ULP and the epsilon-amino group of lysine residues in the substrate. | 1.079e-8 |
| 6 | RNA polymerase II transcription regulatory region sequence-specific DNA binding | GO:0000977 | Binding to a specific sequence of DNA that is part of a regulatory region that controls the transcription of a gene or cistron by RNA polymerase II. | 1.079e-8 |
| 7 | Cis-regulatory region sequence-specific DNA binding | GO:0000987 | Binding to a specific upstream regulatory DNA sequence (transcription factor recognition sequence or binding site) located in cis relative to the transcription start site (i.e., on the same strand of DNA) of a gene transcribed by some RNA polymerase. Cis-regulatory sites are often referred to as sequence motifs, enhancers, or silencers. | 2.542e-8 |
| 8 | Ubiquitin protein ligase activity | GO:0061630 | Catalysis of the transfer of ubiquitin to a substrate protein via the reaction X-ubiquitin + S = X + S-ubiquitin, where X is either an E2 or E3 enzyme, the X-ubiquitin linkage is a thioester bond, and the S-ubiquitin linkage is an amide bond: an isopeptide bond between the C-terminal glycine of ubiquitin and the epsilon-amino group of lysine residues in the substrate or, in the linear extension of ubiquitin chains, a peptide bond between the C-terminal glycine and N-terminal methionine of ubiquitin residues. | 4.416e-8 |
| 9 | RNA polymerase II cis-regulatory region sequence-specific DNA binding | GO:0000978 | Binding to a specific upstream regulatory DNA sequence (transcription factor recognition sequence or binding site) located in cis relative to the transcription start site (i.e., on the same strand of DNA) of a gene transcribed by RNA polymerase II. | 7.794e-8 |
| 10 | Transcription cis-regulatory region binding | GO:0000976 | Binding to a specific sequence of DNA that is part of a regulatory region that controls transcription of that section of the DNA. The transcribed region might be described as a gene, cistron, or operon. | 9.208e-8 |

 

(C) Results

- catabolic/anabolic processes

- transcription

- translation

- protein transport


2. Downregulated genes

 

(A) PANTHER 18.0

| # | Name | Accession | Definition | Fold Enrichment | FDR |
|---|---|---|---|---|---|
| 1 | Single-stranded DNA helicase activity | GO:0017116 | Catalysis of the reaction: ATP + H2O = ADP + phosphate, in the presence of single-stranded DNA; drives the unwinding of a DNA helix. | 5.18 | 3.04E-04 |
| 2 | DNA secondary structure binding | GO:0000217 | Binding to a DNA secondary structure element such as a four-way junction, a bubble, a loop, Y-form DNA, or a double-strand/single-strand junction. | 3.13 | 2.32E-02 |
| 3 | Damaged DNA binding | GO:0003684 | Binding to damaged DNA. | 3.03 | 3.15E-04 |
| 4 | Extracellular matrix binding | GO:0050840 | Binding to a component of the extracellular matrix. | 2.70 | 2.73E-02 |
| 5 | Single-stranded DNA binding | GO:0003697 | Binding to single-stranded DNA. | 2.61 | 3.19E-05 |
| 6 | DNA nuclease activity | GO:0004536 | Catalysis of the hydrolysis of ester linkages within deoxyribonucleic acid. | 2.49 | 4.88E-02 |
| 7 | Extracellular matrix structural constituent | GO:0005201 | The action of a molecule that contributes to the structural integrity of the extracellular matrix. | 2.25 | 2.63E-04 |

 

(B) Enrichr

| # | Name | Accession | Definition | Adjusted p-value |
|---|---|---|---|---|
| 1 | Single-stranded DNA helicase activity | GO:0017116 | Catalysis of the reaction: ATP + H2O = ADP + phosphate, in the presence of single-stranded DNA; drives the unwinding of a DNA helix. | 9.277e-7 |
| 2 | Single-stranded DNA binding | GO:0003697 | Binding to single-stranded DNA. | 5.917e-6 |
| 3 | Damaged DNA binding | GO:0003684 | Binding to damaged DNA. | 5.720e-5 |
| 4 | DNA secondary structure binding | GO:0000217 | Binding to a DNA secondary structure element such as a four-way junction, a bubble, a loop, Y-form DNA, or a double-strand/single-strand junction. | 1.512e-3 |
| 5 | DNA exonuclease activity, producing 5'-phosphomonoesters | GO:0016895 | Catalysis of the hydrolysis of ester linkages within deoxyribonucleic acids by removing nucleotide residues from the 3' or 5' end to yield 5' phosphomonoesters. | 3.159e-3 |
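Comparing the two downregulated-gene tables by accession shows how far the two tools agree. The sketch below intersects the GO accessions copied from those tables and reports a Jaccard index (shared terms divided by all distinct terms); the helper name `compare_term_sets` and the choice of the Jaccard index are mine, not something either tool reports.

```python
def compare_term_sets(a: set[str], b: set[str]) -> tuple[list[str], float]:
    """Return the shared accessions (sorted) and the Jaccard index of two term sets."""
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# GO accessions copied from the downregulated-gene tables in this post.
panther_down = {
    "GO:0017116", "GO:0000217", "GO:0003684", "GO:0050840",
    "GO:0003697", "GO:0004536", "GO:0005201",
}
enrichr_down = {"GO:0017116", "GO:0003697", "GO:0003684", "GO:0000217", "GO:0016895"}

shared, jaccard = compare_term_sets(panther_down, enrichr_down)
print(shared)        # ['GO:0000217', 'GO:0003684', 'GO:0003697', 'GO:0017116']
print(f"{jaccard:.2f}")  # 0.50
print(sorted(panther_down - enrichr_down))  # terms reported only by PANTHER
```

The four shared accessions are all DNA-related (single-stranded DNA helicase activity, DNA secondary structure binding, damaged DNA binding, single-stranded DNA binding), while the two extracellular-matrix terms appear only in the PANTHER list.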

 

(C) Results

- DNA replication/repair

- tissue maintenance


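Therefore, according to this study's data, Zika virus infection was associated with increased activity in catabolic/anabolic processes, transcription, translation, and protein transport (terms enriched among the upregulated genes), and with decreased activity in DNA replication/repair and tissue maintenance (terms enriched among the downregulated genes).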