BLASTquery
Performs a blast query online. As in https://ncbi.github.io/blast-cloud/
BLASTquery(query,database,program,filter=None, format_type=None, expect=None, nucl_reward=None, nucl_penalty=None, gapcosts=None, matrix=None, hitlist_size=None, descriptions=None, alignments=None, ncbi_gi=None, threshold=None, word_size=None, composition_based_statistics=None, organism=None, others=None, num_threads=None, baseURL="http://blast.ncbi.nlm.nih.gov", verbose=False)
querySearch query. Allowed values: Accession, GI, or FASTA.databaseBLAST database. Allowed values: nt, nr, refseq_rna, refseq_protein, swissprot, pdbaa, pdbntprogramBLAST program. Allowed values: blastn, megablast, blastp, blastx, tblastn, tblastxfilterLow complexity filtering. Allowed values: F to disable. T or L to enable. Prepend “m” for mask at lookup (e.g., mL)format_typeReport type. Allowed values: HTML, Text, XML, XML2, JSON2, or Tabular. HTML is the default.expectExpect value. Allowed values: Number greater than zero.nucl_rewardReward for matching bases (BLASTN and megaBLAST). Allowed values: Integer greater than zero.nucl_penaltyCost for mismatched bases (BLASTN and megaBLAST). Allowed values: Integer less than zero.gapcostsGap existence and extension costs. Allowed values: Pair of positive integers separated by a space such as “11 1”.matrixScoring matrix name. Allowed values: One of BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80, BLOSUM90, PAM250, PAM30 or PAM70. Default: BLOSUM62 for all applicable programs.hitlist_sizeNumber of databases sequences to keep. Allowed values: Integer greater than zero.descriptionsNumber of descriptions to print (applies to HTML and Text). Allowed values: Integer greater than zero.alignmentsNumber of alignments to print (applies to HTML and Text). Allowed values: Integer greater than zero.ncbi_giShow NCBI GIs in report. Allowed values: T or F.thresholdNeighboring score for initial words. Allowed values: Positive integer (BLASTP default is 11). Does not apply to BLASTN or MegaBLAST).word_sizeSize of word for initial matches. Allowed values: Positive integer.composition_based_statisticsComposition based statistics algorithm to use. Allowed values: One of 0, 1, 2, or 3. See comp_based_stats command line option in the BLAST+ user manual for details.organisman organism as in https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthomeothershere you can add other parameters as seen in a blast bookmarked page. Define you query in https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome Once your query is defined click on "Bookmark" on right upper side of the page. You can copy fragments of the URL which define the query. Eg. For organism "Homo sapiens (taxid:9606)" you will see the string "EQ_MENU=Homo%20sapiens%20%28taxid%3A9606%29" - this is the string you can use here in others.num_threadsNumber of virtual CPUs to use. Allowed values: Integer greater than zero (default is 1). Supported only on the cloud.-
verboseprint more -
returnsBLAST search request identifier
>>> import AGEpy as age
>>> seq="CTCCTCAGCATCTTATCCGAGTGGAAGGAAATTTGCGTGTGGAGTATTTGGATGACAGAAACACTTTTCGACATAGTGTGGTGGTGCCCTATGAGCCGCCTGAGGTTGGCTCTGACTGTACCACCATCCACTACAACTAC"
>>> RID=age.BLASTquery(seq,"nt","blastn")
>>> print RID
4MS2JV8T014
BLASTcheck
Checks the status of a query.
BLASTcheck(rid,baseURL="http://blast.ncbi.nlm.nih.gov")
ridBLAST search request identifier. Allowed values: The Request ID (RID) returned when the search was submitted-
baseURLserver url. Default=http://blast.ncbi.nlm.nih.gov -
returns statusstatus for the query. returns therearehistyes or no for existing hits on a finished query.
>>> import AGEpy as age
>>> status, therearehits=age.BLASTcheck(RID)
RID: 4MRYDZSC014; status:READY; hits: yes
>>> print status, therearehits
READY yes
BLASTresults
Retrieves results for an RID.
BLASTresults(rid, format_type="Tabular", hitlist_size= None, alignments=None, ncbi_gi = None, format_object=None, baseURL="http://blast.ncbi.nlm.nih.gov")
ridBLAST search request identifier. Allowed values: The Request ID (RID) returned when the search was submittedformat_typeReport type. Allowed values: HTML, Text, XML, XML2, JSON2, or Tabular.hitlist_sizeNumber of databases sequences to keep. Allowed values: Integer greater than zero.alignmentsNumber of alignments to print (applies to HTML and Text). Allowed values: Integer greater than zero.ncbi_giShow NCBI GIs in report. Allowed values: T or F.format_objectObject type. Allowed values: SearchInfo (status check) or Alignment (report formatting).-
baseURLserver url. Default=http://blast.ncbi.nlm.nih.gov -
returnsthe result of a BLAST query. If format_type="Tabular" it will parse the content into a Pandas dataframe.
>>> import AGEpy as age
>>> r=age.BLASTresults(RID)
>>> print r.head()
query id subject ids \
0 Query_17381 gi|1012955506|gb|JN214348.1|
1 Query_17381 gi|631786534|tpe|HG975427.1|
2 Query_17381 gi|369762889|gb|JN900492.1|
3 Query_17381 gi|371502118|ref|NM_001126118.1|
4 Query_17381 gi|371502115|ref|NM_001126112.2|;gi|454521556|...
query acc.ver subject acc.ver % identity alignment length mismatches \
0 Query_17381 JN214348.1 100.000 1190 0
1 Query_17381 HG975427.1 100.000 1190 0
2 Query_17381 JN900492.1 100.000 1190 0
3 Query_17381 NM_001126118.1 100.000 1190 0
4 Query_17381 NM_001126112.2 100.000 1190 0
gap opens q. start q. end s. start s. end evalue bit scor
0 0 1 1190 614 1803 0.0 2147
1 0 1 1190 766 1955 0.0 2147
2 0 1 1190 877 2066 0.0 2147
3 0 1 1190 888 2077 0.0 2147
4 0 1 1190 768 1957 0.0 2147