CellPlot

Python implementation of the CellPlot from the CellPlot package for R. -inf or inf enrichments will come out as min found float or max found float, respectively.

CellPlot(df, output_file=None, gene_expression="log2FC", figure_title="CellPlot", pvalCol="elimFisher", lowerLimit=None, upperLimit=None, colorBarType='Spectral')

  • df pandas dataframe with the following columns - 'Enrichment', 'Term', and 'log2fc'. For log2fc each cell must contain a comma separated string with the log2fc for the genes enriched in the respective term. eg. '-inf,-1,2,3.4,3.66,inf'
  • output_file prefix for an output file. If given it will create output_file.CellPlot.svg and output_file.CellPlot.png
  • gene_expression label for the color gradiant bar.
  • figure_title Figure title.
  • pvalCol name of the column containing the p values to determine if the terms should be marked as NS - not significant, use None for no marking
  • lowerLimit lower limit for the heatmap bar (default is the 0.1 percentile)
  • upperLimit upper limit for the heatmap bar (default is the 0.9 percentile)
  • colorBarType type of heatmap, 'Spectral' is default, alternative eg. 'seismic'
  • returns a matplotlib figure
>>> import AGEpy as age
>>> print df.head()

Term  Annotated  Enrichment  \
0          GO:0008544~epidermis development         38    4.006021   
1               GO:0043588~skin development         33    4.359840   
2         GO:0045087~innate immune response         61    2.385984   
3               GO:0006952~defense response         90    1.913315   
4  GO:0009605~response to external stimulus        113    1.736641   

ease                                             log2fc  
0  1.193931e-12  1.13845,0.771811,0.926561,0.578588,-0.694105,1...  
1  4.757460e-12  1.13845,0.926561,-0.694105,1.48945,0.94486,-1....  
2  5.609421e-10  -1.91507,-0.630414,-1.87466,-0.898252,0.458041...  
3  2.238959e-09  -0.538926,0.667335,-1.91507,-0.630414,-1.87466...  
4  3.051460e-09  0.667335,-1.91507,-0.630414,1.46227,0.755911,-...  

>>> cellplot=age.CellPlot(df[:20], "cellplot",  "log2(mt/wt)", "mutant 1", \
pvalCol="ease", colorBarType="bwr", lowerLimit=-1.25,upperLimit=1.25)

cellpot


SymPlot

Python implementation of the SymPlot from the CellPlot package for R. -inf or inf enrichments will come out as min found float or max found float, respectively.

SymPlot(df,output_file=None,figure_title="SymPlot",pvalCol="elimFisher")

  • df pandas dataframe with the following columns - 'Enrichment', 'Significant', 'Annotated', 'Term', and 'log2fc'. 'Annotated'i stands for number of genes annotated with the respective GO term. As reported in DAVID by listHits. For log2fc each cell must contain a comma separated string with the log2fc for the genes enriched in the respective term. eg. '-inf,-1,2,3.4,3.66,inf'
  • output_file prefix for an output file. If given it witll create output_file.SymPlot.svg and output_file.SymPlot.png
  • figure_title Figure title.
  • pvalCol name of the column containing the p values to determine if the terms should be marked as NS - not significant, use None for no marking
  • returns a matplotlib figure
>>> import AGEpy as age
>>> symplot=age.SymPlot(df[:20],"symplot", "mutant 1",pvalCol="ease")

sympot


MA

Plots an MA like plot.

MA(df, title, figName, c, daType="counts", nbins=10, perc=.5, deg=3, eq=True, splines=True, spec=None, Targets=None, ylim=None, sizeRed=8)

  • df dataframe output of GetData()
  • title plot title, 'Genes' or 'Transcripts'
  • figName /path/to/saved/figure/prefix
  • c pair of samples to be plotted in list format
  • daType data type, ie. 'counts' or 'FPKM'
  • nbins number of bins on normalized intensities to fit the splines
  • per log2(fold change) percentil to which the splines will be fitted
  • deg degress of freedom used to fit the splines
  • eq if true assumes for each bin that the lower and upper values are equally distant to 0, taking the smaller distance for both
  • splines plot splines, default=True
  • spec list of ids to be highlighted
  • Targets list of ids that will be highlighted if outside of the fitted splines
  • ylim a list of limits to apply on the y-axis of the plot
  • sizeRed size of the highlight marker
  • returns df_ a Pandas dataframe similar to the GetData() output with normalized intensities and spline outbounds rows marked as 1.
  • returns red list of ids that are highlighted
>>> import AGEpy as age
>>> print df.head()

gene_id                    gene  wt0  wt20  log2(wt20/wt0)  \
0  ENSG00000223972                 DDX11L1  0.0   0.0             NaN   
1  ENSG00000243485  MIR1302-2,RP11-34P13.3  0.0   0.0             NaN   
2  ENSG00000274890  MIR1302-2,RP11-34P13.3  0.0   0.0             NaN   
3  ENSG00000268020                  OR4G4P  0.0   0.0             NaN   
4  ENSG00000240361                 OR4G11P  0.0   0.0             NaN   

p_value  q_value significant  
0      1.0      1.0          no  
1      1.0      1.0          no  
2      1.0      1.0          no  
3      1.0      1.0          no  
4      1.0      1.0          no  

>>> madf1,sig1=age.MA(dge_, 'Genes',"MA1",["wt0","wt20"], daType="FPKM")

ma1

>>> sigGenes=df[df["significant"=="yes"]]["gene_id"].tolist()
>>> madf2,sig2=age.MA(dge_, 'Genes',"MA2", ["wt0","wt20"], splines=False, daType="FPKM",spec=sigGenes)

ma2

>>> madf3,sig3=age.MA(dge_, 'Genes',"MA3", ["wt0","wt20"], splines=True, daType="FPKM",Targets=sigGenes)

ma3