Intro
aDiff is an annotation tool for differential gene expression results generated by cuffdiff (Trapnell C., Nature Biotechnology, 2012).
It annotates cuffdiff outputs with Ensembl gene ids, Gene Ontology terms and KEGG ids.
Additionally, it uses DAVID's API (Huang DW, Nature Protoc., 2009; Huang DW, Nucleic Acids Res., 2009; Xiaoli J, Bioinformatics, 2012) to perform enrichment analysis.
A Cytoscape (Shannon P, Genome Research, 2003) instance running with the String (Szklarczyk D, Nucleic Acids Res., 2017) App installed can additionally be plugged in to generate expanded protein-protein interactions.
For a full RNAseq pipeline including aDiff, check: http://bioinformatics.age.mpg.de/presentations-tutorials/presentations/modules/rnaseq-tuxedo-update/#/intro
Examples
Example of an aDiff call on a C. elegans dataset:
$ aDiff -D -i cuffdiff_output -o adiff_output \
-G references/cel.latest.ensembl.gtf \
-C cuffmerge_output/merged.gtf \
--DAVIDuser "<Registered.Email@david.com>" \
--organismtag CEL \
--cytoscape_host 'localhost' \
--cytoscape_port 1234
Example of an aDiff call on a D. melanogaster dataset:
$ aDiff -D -i cuffdiff_output -o adiff_output \
-G references/Drosophila_melanogaster.BDGP6.90.gtf \
-C cuffmerge_output/merged.gtf \
--dataset dmelanogaster_gene_ensembl \
--filter flybase_gene_id \
--outputBiotypes 'flybase_gene_id gene_biotype' \
--outputGoterms 'flybase_gene_id go_id name_1006' \
--DAVIDid FLYBASE_GENE_ID \
--DAVIDuser "<Registered.Email@david.com>" \
--organismtag DMEL \
--species 'drosophila melanogaster' \
--cytoscape_host 'localhost' \
--cytoscape_port 1234
Example of an aDiff call on a Mus musculus dataset:
$ aDiff -i cuffdiff_output -o adiff_output \
-G ensembl.mus_musculus.83.original.gtf \
-C cuffmerge_output/merged.gtf \
--TSV \
--dataset mmusculus_gene_ensembl \
-u "<Registered.Email@david.com>" \
--DAVIDid ENSEMBL_GENE_ID \
--host http://dec2015.archive.ensembl.org/biomart \
--organismtag MUS \
--species 'mus musculus' \
--cytoscape_host 'localhost' \
--cytoscape_port 1234
Example of an aDiff call on an H. sapiens dataset:
$ aDiff -i cuffdiff_output -o adiff_output \
-G ensembl.homo_sapiens.83.original.gtf \
-C cuffmerge_output/merged.gtf \
--TSV \
--dataset hsapiens_gene_ensembl \
-u "<Registered.Email@david.com>" \
--DAVIDid ENSEMBL_GENE_ID \
--host http://dec2015.archive.ensembl.org/biomart \
--organismtag HSA \
--species 'homo sapiens' \
--cytoscape_host 'localhost' \
--cytoscape_port 1234
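The four calls above differ mainly in a handful of organism-specific options. As a rough illustration, the sketch below collects those options in one place and builds an argument list for each organism tag. The helper name `build_adiff_command` and the `ORGANISM_OPTIONS` table are hypothetical and not part of aDiff; the flags and values are copied from the example calls. The sketch always passes `-D` and leaves out `--TSV`, `--host` and the Cytoscape options, which is a simplification of the examples.

```python
# Hypothetical helper, not part of aDiff: it only assembles an argument list
# from the organism-specific options shown in the example calls above.
ORGANISM_OPTIONS = {
    # The C. elegans call relies on the defaults listed in `aDiff --help`.
    "CEL": [],
    "DMEL": [
        "--dataset", "dmelanogaster_gene_ensembl",
        "--filter", "flybase_gene_id",
        "--outputBiotypes", "flybase_gene_id gene_biotype",
        "--outputGoterms", "flybase_gene_id go_id name_1006",
        "--DAVIDid", "FLYBASE_GENE_ID",
        "--species", "drosophila melanogaster",
    ],
    "MUS": [
        "--dataset", "mmusculus_gene_ensembl",
        "--DAVIDid", "ENSEMBL_GENE_ID",
        "--species", "mus musculus",
    ],
    "HSA": [
        "--dataset", "hsapiens_gene_ensembl",
        "--DAVIDid", "ENSEMBL_GENE_ID",
        "--species", "homo sapiens",
    ],
}


def build_adiff_command(organismtag, input_folder, output_folder,
                        original_gtf, merged_gtf, david_user):
    """Return an aDiff argument list for one organism tag (does not run it)."""
    if organismtag not in ORGANISM_OPTIONS:
        raise ValueError("unknown organism tag: %s" % organismtag)
    command = [
        "aDiff", "-D",  # assumption: DAVID enrichment is always requested
        "-i", input_folder,
        "-o", output_folder,
        "-G", original_gtf,
        "-C", merged_gtf,
        "--DAVIDuser", david_user,
        "--organismtag", organismtag,
    ]
    command.extend(ORGANISM_OPTIONS[organismtag])
    return command


if __name__ == "__main__":
    print(" ".join(build_adiff_command(
        "DMEL", "cuffdiff_output", "adiff_output",
        "references/Drosophila_melanogaster.BDGP6.90.gtf",
        "cuffmerge_output/merged.gtf", "Registered.Email@david.com")))
```

The returned list could be handed to `subprocess.run(command, check=True)` once aDiff is installed and on the PATH.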
Output files
Example of the output for the H. sapiens call above.
- diff_sig_geneexp.xlsx: reports significant differential gene expression. It is based on the gene_exp.diff output file of cuffdiff, with annotation columns added. It contains one sheet for each pairwise comparison, filtered to significant values (as defined by cuffdiff).
- diff_sig_iso.xlsx: reports significant differential isoform expression. It is based on the isoform_exp.diff output file of cuffdiff, with annotation columns added. It contains one sheet for each pairwise comparison, filtered to significant values (as defined by cuffdiff).
- diff_sig_prom.xlsx: reports significant differential promoter usage. It is based on the promoters.diff output file of cuffdiff, with annotation columns added. It contains one sheet for each pairwise comparison, filtered to significant values (as defined by cuffdiff).
- diff_sig_splic.xlsx: reports significant differential splicing. It is based on the splicing.diff output file of cuffdiff, with annotation columns added. It contains one sheet for each pairwise comparison, filtered to significant values (as defined by cuffdiff).
- diff_sig_cds.xlsx: reports significant differential cds usage. It is based on the cds.diff output file of cuffdiff, with annotation columns added. It contains one sheet for each pairwise comparison, filtered to significant values (as defined by cuffdiff).
- geneexp_ALL.tsv: based on the gene_exp.diff output file of cuffdiff, with annotation columns added.
- iso_ALL.tsv: based on the isoform_exp.diff output file of cuffdiff, with annotation columns added.
- prom_ALL.tsv: based on the promoters.diff output file of cuffdiff, with annotation columns added.
- splic_ALL.tsv: based on the splicing.diff output file of cuffdiff, with annotation columns added.
- cds_ALL.tsv: based on the cds.diff output file of cuffdiff, with annotation columns added.
- diff_p.05.xlsx: contains a sheet for each of the files above (i.e. geneexp_ALL.tsv, iso_ALL.tsv, prom_ALL.tsv, splic_ALL.tsv, cds_ALL.tsv), subset to p values below 0.05.
- KEGG_PATHWAY_diff_sig_geneexp.xlsx: based on the gene_exp.diff output file of cuffdiff. It contains a result sheet for each pairwise comparison and reports DAVID enrichment results for KEGG using genes labeled as significant by cuffdiff.
- GOTERM_BP_FAT_diff_sig_splic.xlsx: based on the splicing.diff output file of cuffdiff. It contains a result sheet for each pairwise comparison and reports DAVID enrichment results for Gene Ontology Biological Process (GOTERM BP) using genes labeled as significant by cuffdiff.
- OMIM_DISEASE_diff_sig_geneexp.xlsx: based on the gene_exp.diff output file of cuffdiff. It contains a result sheet for each pairwise comparison and reports DAVID enrichment results for OMIM DISEASE using genes labeled as significant by cuffdiff.
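The relationship between the `*_ALL.tsv` tables, the `diff_sig_*` workbooks and `diff_p.05.xlsx` can be sketched in a few lines of Python. This is only an illustration of the filtering described above, not aDiff's own code. It assumes the standard cuffdiff column names (`sample_1`, `sample_2`, `p_value`, `significant`) are carried over unchanged into the annotated tables, and it uses a small made-up table in place of a real file.

```python
import csv
import io
from collections import defaultdict


def split_by_comparison(rows):
    """Group rows by pairwise comparison, i.e. by (sample_1, sample_2)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["sample_1"], row["sample_2"])].append(row)
    return dict(groups)


def significant_rows(rows):
    """Keep rows that cuffdiff labeled as significant."""
    return [row for row in rows if row["significant"] == "yes"]


def rows_below_p(rows, threshold=0.05):
    """Keep rows whose p value is below the threshold."""
    return [row for row in rows if float(row["p_value"]) < threshold]


# Made-up example data; a real run would read e.g. geneexp_ALL.tsv instead.
EXAMPLE_TSV = (
    "gene_id\tsample_1\tsample_2\tp_value\tq_value\tsignificant\n"
    "geneA\tctrl\ttreat\t0.0001\t0.004\tyes\n"
    "geneB\tctrl\ttreat\t0.03\t0.2\tno\n"
    "geneC\tctrl\ttreat\t0.6\t0.9\tno\n"
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))

if __name__ == "__main__":
    for comparison, group in split_by_comparison(rows).items():
        print(comparison,
              "significant:", [r["gene_id"] for r in significant_rows(group)],
              "p < 0.05:", [r["gene_id"] for r in rows_below_p(group)])
```

In this toy table geneB passes the p < 0.05 filter without being labeled significant, which is why the two kinds of workbook can list different genes.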
DAVID output columns:
- categoryName: category name, e.g. GOTERM_BP_FAT.
- termName: term name, e.g. GO:0048468~cell development.
- listHits: number of items in the query list matching this term.
- percent: percentage of items in the query list matching this term.
- ease: EASE test p value.
- geneIds: gene ids.
- Gene_name: gene name.
- listTotals: number of genes in the query list.
- popHits: number of genes in the background population list matching this term.
- popTotals: number of genes in the background population list.
- foldEnrichment: fold enrichment.
- bonferroni: Bonferroni corrected p values.
- benjamini: Benjamini-Hochberg corrected p values.
- afdr: false discovery rate.
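To make the relationship between the count columns and foldEnrichment concrete, here is a minimal sketch. It assumes the usual DAVID definition, in which fold enrichment is the fraction of query-list genes matching a term divided by the fraction of background genes matching it. The function name is hypothetical and the numbers are invented for illustration.

```python
def fold_enrichment(list_hits, list_totals, pop_hits, pop_totals):
    """Ratio of the hit fraction in the query list to that in the background.

    Arguments mirror the listHits, listTotals, popHits and popTotals columns.
    """
    if list_totals <= 0 or pop_totals <= 0 or pop_hits <= 0:
        raise ValueError("totals and popHits must be positive")
    list_fraction = list_hits / list_totals
    pop_fraction = pop_hits / pop_totals
    return list_fraction / pop_fraction


if __name__ == "__main__":
    # Invented numbers: 3 of 300 query genes versus 40 of 30000 background genes.
    print(round(fold_enrichment(3, 300, 40, 30000), 2))
```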
More information on the standard output columns of cuffdiff can be found in the cuffdiff documentation.
The cytoscape folder contains Cytoscape session files (cys), as well as pdfs and pngs of the generated networks. Networks are generated by String PPI queries allowing a 25% size expansion and a confidence cutoff of 0.4. aDiff also generates subnetworks: genes are ranked by abs(log2(fold change)), and the top 10% of nodes with edges are selected either together with their first neighbours or, for the same 10% selection, using diffusion. Node color maps log2(fold change) (blue down, red up), while node border color and size map normalized expression.
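The ranking step and the default expansion size described above can be sketched as follows. This is a simplified illustration rather than aDiff's implementation: it works on a plain dictionary of log2 fold changes instead of a Cytoscape network, it does not check whether nodes have edges, and rounding the 10% up to at least one gene is an assumption. The function names are hypothetical. The 25% rule follows the `--limit` help text (`limit=N(query_genes)*.25`).

```python
import math


def top_fraction_by_abs_log2fc(log2_fold_changes, fraction=0.10):
    """Return the gene names with the largest abs(log2 fold change).

    log2_fold_changes maps gene name -> log2(fold change).
    """
    ranked = sorted(log2_fold_changes,
                    key=lambda gene: abs(log2_fold_changes[gene]),
                    reverse=True)
    # Assumption: round up and always keep at least one gene.
    n_keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:n_keep]


def default_string_limit(n_query_genes):
    """Number of extra genes to request when no --limit is given (25%)."""
    return int(n_query_genes * 0.25)


if __name__ == "__main__":
    example = {"a": 2.0, "b": -3.5, "c": 0.1, "d": 1.0}  # invented values
    print(top_fraction_by_abs_log2fc(example, fraction=0.5))
    print(default_string_limit(40))
```

Ranking on the absolute value means strongly down-regulated genes are kept alongside strongly up-regulated ones.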
Help
$ aDiff --help
aDiff is an annotation tool for differential gene expression results generated
by cuffdiff (Trapnell C., Nature Biotechnology, 2012).
usage: aDiff [-h] [-D] [-i INPUTFOLDER] [-o OUTPUTFOLDER] [-G ORIGINALGTF]
[-C CUFFCOMPAREGTF] [-f INPUTFILES] [-s SHORTOUTPUTNAME]
[--sigOnly] [--TSV] [--TSVall] [--description] [--listMarts]
[--mart MART] [--listDatasets] [--dataset DATASET]
[--listFilters] [--filter FILTER] [--listAttributes]
[--outputBiotypes OUTPUTBIOTYPES] [--outputGoterms OUTPUTGOTERMS]
[--KEGG] [--listKEGGorganisms] [--KEGGorg KEGGORG] [--findKEGGdb]
[--KEGGdb KEGGDB] [--DAVIDid DAVIDID] [--DAVIDcat DAVIDCAT]
[-u DAVIDUSER] [--host HOST] [--organismtag {DMEL,CEL,MUS,HSA}]
[--species SPECIES] [--limit LIMIT] [--cuttoff CUTTOFF]
[--taxon TAXON] [--cytoscape_host CYTOSCAPE_HOST]
[--cytoscape_port CYTOSCAPE_PORT]
optional arguments:
-h, --help show this help message and exit
-D, --DAVID Use this flag to perform DAVID GO enrichment analysis
(default: False)
-i INPUTFOLDER, --inputFolder INPUTFOLDER
Cuffdiff output folder (default: None)
-o OUTPUTFOLDER, --outputFolder OUTPUTFOLDER
Output folder (default: None)
-G ORIGINALGTF, --originalGTF ORIGINALGTF
Original/downloaded GTF (default: None)
-C CUFFCOMPAREGTF, --cuffcompareGTF CUFFCOMPAREGTF
Merged cuffcompared GTF (default: None)
-f INPUTFILES, --inputFiles INPUTFILES
Implies -s. Use this option to select which *.diff
files you wish to analyse.'. (default: gene_exp.diff
promoters.diff splicing.diff cds.diff
isoform_exp.diff)
-s SHORTOUTPUTNAME, --shortOutputName SHORTOUTPUTNAME
Use this option to select a short outpput name for
each *.diff file used in '-f'. No '.' (dots) allowed.
(default: geneexp prom splic cds iso)
--sigOnly Only create report tables for cuffdiff-labeled
significantly changed genes (default: False)
--TSV For p values > = 0.05 write tables as tab separated
values (default: False)
--TSVall Save p < 0.05 save tables as tab separated values in a
folder called TSV (default: False)
--description Get a description of what this script does. (default:
False)
--listMarts List biomaRt Marts (default: False)
--mart MART Your mart of choice. (default: ENSEMBL_MART_ENSEMBL)
--listDatasets List datasets for your mart (default: False)
--dataset DATASET Dataset of your choice. (default:
celegans_gene_ensembl)
--listFilters List available filters (default: False)
--filter FILTER Filter to use to identify your genes. (default:
ensembl_gene_id)
--listAttributes List available attributes for your dataset. (default:
False)
--outputBiotypes OUTPUTBIOTYPES
Outputs/attributes for your biotypes data. Order has
to be kept, ie. first IDs then biotype. (default:
ensembl_gene_id gene_biotype)
--outputGoterms OUTPUTGOTERMS
Outputs/attributes for your goterms data. Order has to
be kept, ie. 1st gene_id, then go_id, then
go_term_name (default: ensembl_gene_id go_id
name_1006)
--KEGG Add KEGG annotations (default: False)
--listKEGGorganisms List KEGG organisms. (default: False)
--KEGGorg KEGGORG KEGG organism. (default: cel)
--findKEGGdb KEGG has DB identifier for each linked DB. Use this
function to find the label of your DB, eg: 'ensembl-
hsa', 'FlyBase'. This option requires --originalGTF
and --KEGGorg (default: False)
--KEGGdb KEGGDB KEGG database linked to your ensembl organism.
(default: EnsemblGenomes-Gn)
--DAVIDid DAVIDID DAVID's id for your dataset. List of ids available in
http://david.abcc.ncifcrf.gov/content.jsp?file=DAVID_A
PI.html#input_list (default: WORMBASE_GENE_ID)
--DAVIDcat DAVIDCAT DAVID's categories you wish to analyse. List of
available categories in https://david.ncifcrf.gov/cont
ent.jsp?file=DAVID_API.html#approved_list. (default: G
OTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,KEGG_PATHWAY,
PFAM,PROSITE,GENETIC_ASSOCIATION_DB_DISEASE,OMIM_DISEA
SE)
-u DAVIDUSER, --DAVIDuser DAVIDUSER
Your DAVID's user id. example: 'John.Doe@age.mpg.de'
(default: None)
--host HOST Ensembl host. Check http://www.ensembl.org/info/websit
e/archives/index.html for older releases. (default:
http://www.ensembl.org/biomart)
--organismtag {DMEL,CEL,MUS,HSA}
Organism tag. (default: None)
--species SPECIES Species for string app query. eg. 'caenorhabditis
elegans', 'drosophila melanogaster', 'mus musculus',
'homo sapiens'. Default='caenorhabditis elegans'
(default: caenorhabditis elegans)
--limit LIMIT Limit for string app query. Number of extra genes to
recover. If None, limit=N(query_genes)*.25 (default:
None)
--cuttoff CUTTOFF Confidence cuttoff for sting app query. Default=0.4
(default: 0.4)
--taxon TAXON Taxon id for string app query. For the species shown
above, taxon id will be automatically identified.
(default: None)
--cytoscape_host CYTOSCAPE_HOST
Host address for cytoscape. (default: None)
--cytoscape_port CYTOSCAPE_PORT
Cytoscape port. (default: None)