DAVIDenrich
Queries the DAVID database for an enrichment analysis. Check https://david.ncifcrf.gov/content.jsp?file=DAVID_API.html for database == "type" tag and categories == "annot" tag.
DAVIDenrich(database, categories, user, ids, ids_bg = None, name = '', name_bg = '', verbose = False, p = 0.1, n = 2)
database
A string for the database to query, e.g. 'WORMBASE_GENE_ID'categories
A comma separated string with databasesuser
A user ID registered at DAVID for queryingids
A list with identifiersname
A string with the name for the query setids_bg
A list with the background identifiers to enrich against, 'None' for whole setname_bg
A string with the name for the background setp
Maximum p value for enrichment of a termn
Minimum number of genes within a termreturns
None if no ids match the queried database, or a pandas dataframe with results
>>> import AGEpy as age
>>> print sigGenes[:10]
[u'WBGene00022275', u'WBGene00004418', u'WBGene00018774',
u'WBGene00018772', u'WBGene00018958', u'WBGene00021662',
u'WBGene00255594', u'WBGene00021658', u'WBGene00021026',
u'WBGene00022042']
>>> categories=['GOTERM_BP_FAT', 'GOTERM_CC_FAT', 'GOTERM_MF_FAT', 'KEGG_PATHWAY','BIOCARTA', 'PFAM', 'PROSITE' ]
>>> DAVIDdf=age.DAVIDenrich('WORMBASE_GENE_ID', categories, 'email.registered@david.com', sigGenes)
>>> print DAVIDdf.head()
categoryName termName listHits \
0 GOTERM_BP_FAT GO:0006412~translation 177
1 GOTERM_BP_FAT GO:0006518~peptide metabolic process 198
2 GOTERM_BP_FAT GO:0043043~peptide biosynthetic process 177
3 GOTERM_BP_FAT GO:0043604~amide biosynthetic process 180
4 GOTERM_BP_FAT GO:0043603~cellular amide metabolic process 206
percent ease \
0 5.85704831238 4.32627669357e-43
1 6.55195234944 1.36601477909e-42
2 5.85704831238 4.04090150003e-42
3 5.95632031767 1.05565138148e-40
4 6.81667769689 3.74871147863e-40
geneIds listTotals popHits \
0 WBGENE00002063, WBGENE00013678, WBGENE00006626... 1878 379
1 WBGENE00002063, WBGENE00006626, WBGENE00007584... 1878 455
2 WBGENE00002063, WBGENE00013678, WBGENE00006626... 1878 384
3 WBGENE00002063, WBGENE00013678, WBGENE00006626... 1878 402
4 WBGENE00002063, WBGENE00006626, WBGENE00007584... 1878 499
popTotals foldEnrichment bonferroni benjamini \
0 11221 2.79042292227 1.28576943333e-39 1.28576943333e-39
1 11221 2.60009830425 4.05979592345e-39 2.02989796172e-39
2 11221 2.75408929047 1.20095592581e-38 4.00318641936e-39
3 11221 2.6753612131 3.13739590576e-37 7.84348976441e-38
4 11221 2.46662227543 1.11411705145e-36 2.2282341029e-37
afdr
0 7.78207551683e-40
1 2.45717759656e-39
2 7.26874466353e-39
3 1.89889814083e-37
4 6.7431553467e-37
DAVIDgetGeneAttribute
Returns a list of gene names for given gene ids.
DAVIDgetGeneAttribute(x, df, refCol="ensembl_gene_id", fieldTOretrieve="gene_name")
x
a string with the list of IDs separated by ', 'df
a dataframe with the reference column and a the column to retrieverefCol
the header of the column containing the identifiersfieldTOretrieve
the field to retrieve from parsedGTF eg. 'gene_name'returns
list of fieldTOretrieve separeted by ', ' in the same order as the given in x
>>> import AGEpy as age
>>> print df.head()
ensembl_gene_id gene locus sample_1 sample_2 status \
0 WBGene00022275 Y74C9A.1 I:43732-44677 N2 daf2 OK
1 WBGene00004418 F53G12.9,rpl-7 I:111037-113672 N2 daf2 OK
2 WBGene00018774 F53G12.9,rpl-7 I:111037-113672 N2 daf2 OK
3 WBGene00018772 F53G12.4 I:134336-137282 N2 daf2 OK
4 WBGene00018958 F56C11.6 I:171339-175991 N2 daf2 OK
value_1 value_2 log2(fold_change) test_stat p_value q_value \
0 0.195901 0.986634 2.332390 2.32959 0.00570 0.031216
1 3354.820000 2463.480000 -0.445539 -2.71381 0.00005 0.000556
2 3354.820000 2463.480000 -0.445539 -2.71381 0.00005 0.000556
3 1.235670 2.992460 1.276040 3.16508 0.00005 0.000556
4 2.651180 3.795600 0.517696 1.73994 0.00410 0.024157
significant GO_id \
0 yes NaN
1 yes GO:0003735; GO:0000463; GO:0044822; GO:0002181...
2 yes NaN
3 yes NaN
4 yes GO:0016787; GO:0005615; GO:0004104
GO_term gene_biotype \
0 NaN protein_coding
1 structural constituent of ribosome; maturation... protein_coding
2 NaN protein_coding
3 NaN protein_coding
4 hydrolase activity; extracellular space; choli... protein_coding
NormInt evidence
0 -0.356904 no
1 3.458609 no
2 3.458609 no
3 0.283965 no
4 0.501360 no
>>> print DAVIDdf.head()
categoryName termName listHits \
0 GOTERM_BP_FAT GO:0006412~translation 177
1 GOTERM_BP_FAT GO:0006518~peptide metabolic process 198
2 GOTERM_BP_FAT GO:0043043~peptide biosynthetic process 177
3 GOTERM_BP_FAT GO:0043604~amide biosynthetic process 180
4 GOTERM_BP_FAT GO:0043603~cellular amide metabolic process 206
percent ease \
0 5.85704831238 4.32627669357e-43
1 6.55195234944 1.36601477909e-42
2 5.85704831238 4.04090150003e-42
3 5.95632031767 1.05565138148e-40
4 6.81667769689 3.74871147863e-40
geneIds listTotals popHits \
0 WBGENE00002063, WBGENE00013678, WBGENE00006626... 1878 379
1 WBGENE00002063, WBGENE00006626, WBGENE00007584... 1878 455
2 WBGENE00002063, WBGENE00013678, WBGENE00006626... 1878 384
3 WBGENE00002063, WBGENE00013678, WBGENE00006626... 1878 402
4 WBGENE00002063, WBGENE00006626, WBGENE00007584... 1878 499
popTotals foldEnrichment bonferroni benjamini \
0 11221 2.79042292227 1.28576943333e-39 1.28576943333e-39
1 11221 2.60009830425 4.05979592345e-39 2.02989796172e-39
2 11221 2.75408929047 1.20095592581e-38 4.00318641936e-39
3 11221 2.6753612131 3.13739590576e-37 7.84348976441e-38
4 11221 2.46662227543 1.11411705145e-36 2.2282341029e-37
afdr
0 7.78207551683e-40
1 2.45717759656e-39
2 7.26874466353e-39
3 1.89889814083e-37
4 6.7431553467e-37
>>> gene_names=df[["ensembl_gene_id","gene"]].drop_duplicates()
>>> DAVIDdf["gene_names"]=DAVIDdf["geneIds"].apply(lambda x: \
age.DAVIDgetGeneAttribute(x,\
gene_names,\
refCol="ensembl_gene_id",\
fieldTOretrieve="gene"))
>>> print DAVIDdf.head()
categoryName termName listHits \
0 GOTERM_BP_FAT GO:0006412~translation 177
1 GOTERM_BP_FAT GO:0006518~peptide metabolic process 198
2 GOTERM_BP_FAT GO:0043043~peptide biosynthetic process 177
3 GOTERM_BP_FAT GO:0043604~amide biosynthetic process 180
4 GOTERM_BP_FAT GO:0043603~cellular amide metabolic process 206
percent ease \
0 5.85704831238 4.32627669357e-43
1 6.55195234944 1.36601477909e-42
2 5.85704831238 4.04090150003e-42
3 5.95632031767 1.05565138148e-40
4 6.81667769689 3.74871147863e-40
geneIds listTotals popHits \
0 WBGENE00002063, WBGENE00013678, WBGENE00006626... 1878 379
1 WBGENE00002063, WBGENE00006626, WBGENE00007584... 1878 455
2 WBGENE00002063, WBGENE00013678, WBGENE00006626... 1878 384
3 WBGENE00002063, WBGENE00013678, WBGENE00006626... 1878 402
4 WBGENE00002063, WBGENE00006626, WBGENE00007584... 1878 499
popTotals foldEnrichment bonferroni benjamini \
0 11221 2.79042292227 1.28576943333e-39 1.28576943333e-39
1 11221 2.60009830425 4.05979592345e-39 2.02989796172e-39
2 11221 2.75408929047 1.20095592581e-38 4.00318641936e-39
3 11221 2.6753612131 3.13739590576e-37 7.84348976441e-38
4 11221 2.46662227543 1.11411705145e-36 2.2282341029e-37
afdr gene_names
0 7.78207551683e-40 ife-5, Y105E8A.20, tsn-1, yars-1, ife-3, C14C1...
1 2.45717759656e-39 ife-5, tsn-1, C14C10.1, ife-3, rps-30, iff-2, ...
2 7.26874466353e-39 ife-5, Y105E8A.20, tsn-1, yars-1, ife-3, C14C1...
3 1.89889814083e-37 ife-5, Y105E8A.20, tsn-1, yars-1, ife-3, C14C1...
4 6.7431553467e-37 ife-5, tsn-1, C14C10.1, ife-3, rps-30, Y51H4A....
DAVIDplot
Queries the DAVID database for an enrichment analysis and plots CellPlots as well as SymPlots (see plots) using the 20 most significant terms. Check https://david.ncifcrf.gov/content.jsp?file=DAVID_API.html for database == "type" tag and categories == "annot" tag.
DAVIDplot(database, categories, user, df_ids, output, df_ids_bg = None, name = '', name_bg = '', verbose = False, p = 0.1, n = 2)
database
a string for the database to query, e.g. 'WORMBASE_GENE_ID'categories
a comma separated string with databasesuser
a user ID registered at DAVID for queryingdf_ids
a dataframe where the first column contains the identifiers to be queried and the second column the respective log2fc for each identifier.output
/path/to/output/prefixdf_ids_bg
a dataframe where the first column contains the identifiers to be used as background. 'None' for whole setname
a string with the name for the query setname_bg
a string with the name for the background setp
Maximum p value for enrichment of a term-
n
Minimum number of genes within a term -
returns
nothing
>>> import AGEpy as age
>>> print df.head()
ensembl_gene_id log2(fold_change)
0 ENSG00000272449 1.859500
1 ENSG00000130762 0.601051
2 ENSG00000083444 -0.881957
3 ENSG00000162493 -0.638433
4 ENSG00000253368 0.654517
>>> categories=['GOTERM_BP_FAT', 'GOTERM_CC_FAT', 'GOTERM_MF_FAT', 'KEGG_PATHWAY','BIOCARTA', 'PFAM', 'PROSITE' ]
>>> DAVIDdf=DAVIDplot('ENSEMBL_GENE_ID', categories, 'email.registered@david.com', df, "/usr/home/JDoe/mydataset")