organismsKEGG
Lists all organisms present in the KEGG database.
organismsKEGG()
returns
a dataframe containing one organism per row.
>>> import AGEpy as age
>>> KEGGo=age.organismsKEGG()
>>> print KEGGo.head()
0 1 2 \
0 T01001 hsa Homo sapiens (human)
1 T01005 ptr Pan troglodytes (chimpanzee)
2 T02283 pps Pan paniscus (bonobo)
3 T02442 ggo Gorilla gorilla gorilla (western lowland gorilla)
4 T01416 pon Pongo abelii (Sumatran orangutan)
3
0 Eukaryotes;Animals;Vertebrates;Mammals
1 Eukaryotes;Animals;Vertebrates;Mammals
2 Eukaryotes;Animals;Vertebrates;Mammals
3 Eukaryotes;Animals;Vertebrates;Mammals
4 Eukaryotes;Animals;Vertebrates;Mammals
databasesKEGG
Finds KEGG database identifiers for a respective organism given example ensembl ids.
databasesKEGG(organism,ens_ids)
organism
an organism as listed in organismsKEGG()ens_ids
a list of ensenbl ids of the respective organismreturns
nothing if no database was found, or a string if a database was found
>>> import AGEpy as age
>>> print sigGenes[:10]
['ENSG00000272449', 'ENSG00000130762', 'ENSG00000083444', 'ENSG00000162493', 'ENSG00000253368', 'ENSG00000235912', 'ENSG00000169174', 'ENSG00000240563', 'ENSG00000200174', 'ENSG00000162607']
>>> KEGGd=age.databasesKEGG("hsa",sigGenes)
For hsa the following db was found: Ensembl
>>> print KEGGd.head()
Ensembl
ensembl_to_kegg
Looks up KEGG mappings of KEGG ids to ensembl ids.
ensembl_to_kegg(organism,kegg_db)
organism
an organisms as listed in organismsKEGG()kegg_db
a matching KEGG db as reported in databasesKEGGreturns
a Pandas dataframe of with 'KEGGid' and 'ENSid'.
>>> import AGEpy as age
>>> KEGGi=age.ensembl_to_kegg("hsa","Ensembl")
KEGG API: http://rest.genome.jp/link/Ensembl/hsa
>>> print KEGGi.head()
KEGGid ENSid
0 hsa:1 ENSG00000121410
1 hsa:10 ENSG00000156006
2 hsa:100 ENSG00000196839
3 hsa:1000 ENSG00000170558
4 hsa:10000 ENSG00000117020
ecs_idsKEGG
Uses KEGG to retrieve all ids and respective ecs for a given KEGG organism.
ecs_idsKEGG(organism)
organism
an organisms as listed in organismsKEGG()returns
a Pandas dataframe of with 'ec' and 'KEGGid'.
>>> import AGEpy as age
>>> KEGGie=age.ecs_idsKEGG("hsa")
>>> print KEGGie.head()
ec KEGGid
0 ec:2.7.11.1 hsa:9344
1 ec:2.7.11.1 hsa:5894
2 ec:2.7.11.1 hsa:673
3 ec:2.7.12.2 hsa:5607
4 ec:2.7.11.24 hsa:5598
idsKEGG
Uses KEGG to retrieve all ids for a given KEGG organism.
idsKEGG(organism)
organism
an organism as listed in organismsKEGG()returns
a Pandas dataframe of with 'gene_name' and 'KEGGid'.
>>> import AGEpy as age
>>> KEGGids = age.idsKEGG("hsa")
>>> print KEGGids.head()
gene_name KEGGid
0 uncharacterized LOC100287010 hsa:100287010
1 uncharacterized LOC100288846 hsa:100288846
2 DKFZp434L192 hsa:222029
3 uncharacterized protein FLJ30679 hsa:146512
4 uncharacterized LOC100128288 hsa:100128288
pathwaysKEGG
Retrieves all pathways for a given organism.
pathwaysKEGG(organism)
organism
an organism as listed in organismsKEGG()returns df
a Pandas dataframe with the columns 'KEGGid','pathIDs', and 'pathName'.returns df_
a Pandas dataframe with a columns for 'KEGGid', and one column for each pathway with the corresponding gene ids below
>>> import AGEpy as age
>>> KEGGp, KEGGp_ =age.pathwaysKEGG("hsa")
>>> print KEGGp.head()
KEGGid pathID \
0 hsa:10 path:hsa00232, path:hsa01100, path:hsa00983, p...
1 hsa:100 path:hsa05340, path:hsa01100, path:hsa00230
2 hsa:1000 path:hsa04514, path:hsa05412
3 hsa:10000 path:hsa04211, path:hsa04630, path:hsa04152, p...
4 hsa:10005 path:hsa01100, path:hsa00120, path:hsa04146
pathName
0 Caffeine metabolism - Homo sapiens (human), Dr...
1 Primary immunodeficiency - Homo sapiens (human...
2 Arrhythmogenic right ventricular cardiomyopath...
3 Longevity regulating pathway - multiple specie...
4 Metabolic pathways - Homo sapiens (human), Pri...
>>> print KEGGp_.head()
KEGGid path:hsa04151 path:hsa00250 path:hsa05230 path:hsa00790 \
0 hsa:10327 NaN NaN NaN NaN
1 hsa:124 NaN NaN NaN NaN
2 hsa:125 NaN NaN NaN NaN
3 hsa:126 NaN NaN NaN NaN
4 hsa:127 NaN NaN NaN NaN
path:hsa04610 path:hsa00020 path:hsa04612 path:hsa01524 path:hsa01521 \
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
... path:hsa04672 path:hsa00730 path:hsa04670 path:hsa05168 \
0 ... NaN NaN NaN NaN
1 ... NaN NaN NaN NaN
2 ... NaN NaN NaN NaN
3 ... NaN NaN NaN NaN
4 ... NaN NaN NaN NaN
path:hsa00531 path:hsa00532 path:hsa00533 path:hsa00534 path:hsa04914 \
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
path:hsa04340
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
>>> print KEGGp_[["path:hsa04151"]].dropna().head()
path:hsa04151
27 hsa:2538
41 hsa:5105
42 hsa:5106
58 hsa:57818
66 hsa:92579
KEGGmatrix
Looks for all KEGG annotatios of an organism in biomart and the respective pathways in KEGG. It can also retrieve links to pathways figures with red labeled genes provided in a dataframe.
KEGGmatrix(organism, dataset, query_attributes=['ensembl_gene_id', 'kegg_enzyme'], host=biomart_host, links=True, dfexp=None, kegg_db=None, database=None )
organism
a KEGG organism identifierdataset
a biomaRt datasetquery_attributes
biomaRt query attributes, the name can change but the output should stay in the same order ie. 'ensembl_gene_id','kegg_enzyme'host
biomart_hostlinks
if True, returns df_linksdfexp
a Pandas dataframe with the following columns: 'ensembl_gene_id', 'log2FC'kegg_db
a KEGG database as recovered by the databasesKEGG functiondatabase
a biomaRt database, depecrated, default=None.returns df
a Pandas dataframe with the 'KEGGid','pathsIDs','pathName','ensembl_gene_id','kegg_enzyme'returns df_
a matrix with a column for each KEGG pathway for a given organism and the expression values in the respective dfexp in parameterreturns fullmatrix
a matrix with a column for each KEGG pathway for a given organismreturns df_links
a dataframe with links for each pathway and the links in the dfexp highlighted red (if df_links.
>>> import AGEpy as age
>>> print dge.head()
ensembl_gene_id log2FC
0 ENSG00000272449 1.859500
1 ENSG00000130762 0.601051
2 ENSG00000083444 -0.881957
3 ENSG00000162493 -0.638433
4 ENSG00000253368 0.654517
>>> results=age.KEGGmatrix("hsa",'hsapiens_gene_ensembl', dfexp=dge, kegg_db="Ensembl" )
>>> print results[0]
index KEGGid pathIDs \
0 58519 hsa:8813 path:hsa00510, path:hsa01100
1 120375 hsa:8711 NaN
2 120515 hsa:6725 NaN
3 48229 hsa:6714 path:hsa04360, path:hsa04611, path:hsa05100, p...
4 20996 hsa:2534 path:hsa04725, path:hsa04360, path:hsa05416, p...
pathName ensembl_gene_id \
0 Metabolic pathways - Homo sapiens (human), N-G... ENSG00000000419
1 NaN ENSG00000000938
2 NaN ENSG00000000938
3 Mitophagy - animal - Homo sapiens (human), EGF... ENSG00000000938
4 Axon guidance - Homo sapiens (human), Adherens... ENSG00000000938
kegg_enzyme
0 ec:2.4.1.83
1 ec:2.7.10.2
2 ec:2.7.10.2
3 ec:2.7.10.2
4 ec:2.7.10.2
>>> print results[1]
ensembl_gene_id kegg_enzyme KEGGid log2FC path:hsa04151 \
26 ENSG00000149930 ec:2.7.11.1 hsa:10000 0.456776 0.456776
251 ENSG00000183421 ec:2.7.11.1 hsa:10000 0.407559 0.407559
476 ENSG00000112079 ec:2.7.11.1 hsa:10000 -0.498607 -0.498607
701 ENSG00000196730 ec:2.7.11.1 hsa:10000 0.920654 0.920654
926 ENSG00000123572 ec:2.7.11.1 hsa:10000 2.108620 2.108620
path:hsa00250 path:hsa05230 path:hsa00790 path:hsa04610 \
26 NaN 0.456776 NaN NaN
251 NaN 0.407559 NaN NaN
476 NaN -0.498607 NaN NaN
701 NaN 0.920654 NaN NaN
926 NaN 2.108620 NaN NaN
path:hsa00020 ... path:hsa04672 path:hsa00730 \
26 NaN ... NaN NaN
251 NaN ... NaN NaN
476 NaN ... NaN NaN
701 NaN ... NaN NaN
926 NaN ... NaN NaN
path:hsa04670 path:hsa05168 path:hsa00531 path:hsa00532 \
26 NaN NaN NaN NaN
251 NaN NaN NaN NaN
476 NaN NaN NaN NaN
701 NaN NaN NaN NaN
926 NaN NaN NaN NaN
path:hsa00533 path:hsa00534 path:hsa04914 path:hsa04340
26 NaN NaN 0.456776 NaN
251 NaN NaN 0.407559 NaN
476 NaN NaN -0.498607 NaN
701 NaN NaN 0.920654 NaN
926 NaN NaN 2.108620 NaN
>>> print results[2]
KEGGid path:hsa04151 path:hsa00250 path:hsa05230 path:hsa00790 \
0 hsa:10327 NaN NaN NaN NaN
1 hsa:124 NaN NaN NaN NaN
2 hsa:125 NaN NaN NaN NaN
3 hsa:126 NaN NaN NaN NaN
4 hsa:127 NaN NaN NaN NaN
path:hsa04610 path:hsa00020 path:hsa04612 path:hsa01524 path:hsa01521 \
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
... path:hsa04672 path:hsa00730 path:hsa04670 path:hsa05168 \
0 ... NaN NaN NaN NaN
1 ... NaN NaN NaN NaN
2 ... NaN NaN NaN NaN
3 ... NaN NaN NaN NaN
4 ... NaN NaN NaN NaN
path:hsa00531 path:hsa00532 path:hsa00533 path:hsa00534 path:hsa04914 \
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
path:hsa04340
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
>>> print results[3]
URL pathway
0 http://www.kegg.jp/kegg-bin/show_pathway?hsa04... hsa04151
1 http://www.kegg.jp/kegg-bin/show_pathway?hsa00... hsa00250
2 http://www.kegg.jp/kegg-bin/show_pathway?hsa05... hsa05230
3 http://www.kegg.jp/kegg-bin/show_pathway?hsa00... hsa00790
4 http://www.kegg.jp/kegg-bin/show_pathway?hsa04... hsa04610
>>> print results[3].loc[0,"URL"]
http://www.kegg.jp/kegg-bin/show_pathway?hsa04151/ec:2.7.11.1%09red/ec:2.7.11.22%09red/ec:2.7.10.1%09red/ec:3.1.3.16%09red/ec:2.7.10.2%09red/ec:2.3.2.27%09red/ec:1.14.13.39%09red/ec:2.7.12.2%09red/ec:3.1.3.48%09red