BLASTquery
Performs a BLAST query online, as described at https://ncbi.github.io/blast-cloud/
BLASTquery(query,database,program,filter=None, format_type=None, expect=None, nucl_reward=None, nucl_penalty=None, gapcosts=None, matrix=None, hitlist_size=None, descriptions=None, alignments=None, ncbi_gi=None, threshold=None, word_size=None, composition_based_statistics=None, organism=None, others=None, num_threads=None, baseURL="http://blast.ncbi.nlm.nih.gov", verbose=False)
query
Search query. Allowed values: Accession, GI, or FASTA.
database
BLAST database. Allowed values: nt, nr, refseq_rna, refseq_protein, swissprot, pdbaa, pdbnt
program
BLAST program. Allowed values: blastn, megablast, blastp, blastx, tblastn, tblastx
filter
Low complexity filtering. Allowed values: F to disable. T or L to enable. Prepend “m” for mask at lookup (e.g., mL)
format_type
Report type. Allowed values: HTML, Text, XML, XML2, JSON2, or Tabular. HTML is the default.
expect
Expect value. Allowed values: Number greater than zero.
nucl_reward
Reward for matching bases (BLASTN and megaBLAST). Allowed values: Integer greater than zero.
nucl_penalty
Cost for mismatched bases (BLASTN and megaBLAST). Allowed values: Integer less than zero.
gapcosts
Gap existence and extension costs. Allowed values: Pair of positive integers separated by a space such as “11 1”.
matrix
Scoring matrix name. Allowed values: One of BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80, BLOSUM90, PAM250, PAM30 or PAM70. Default: BLOSUM62 for all applicable programs.
hitlist_size
Number of database sequences to keep. Allowed values: Integer greater than zero.
descriptions
Number of descriptions to print (applies to HTML and Text). Allowed values: Integer greater than zero.
alignments
Number of alignments to print (applies to HTML and Text). Allowed values: Integer greater than zero.
ncbi_gi
Show NCBI GIs in report. Allowed values: T or F.
threshold
Neighboring score for initial words. Allowed values: Positive integer (BLASTP default is 11). Does not apply to BLASTN or MegaBLAST.
word_size
Size of word for initial matches. Allowed values: Positive integer.
composition_based_statistics
Composition based statistics algorithm to use. Allowed values: One of 0, 1, 2, or 3. See comp_based_stats command line option in the BLAST+ user manual for details.
organism
An organism, as in https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome
others
Other parameters, as seen in a bookmarked BLAST page. Define your query at https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome Once your query is defined, click "Bookmark" in the upper right of the page. You can copy the fragments of the URL that define the query. E.g., for organism "Homo sapiens (taxid:9606)" you will see the string "EQ_MENU=Homo%20sapiens%20%28taxid%3A9606%29"; this is the string you can use here in others.
num_threads
Number of virtual CPUs to use. Allowed values: Integer greater than zero (default is 1). Supported only on the cloud.
verbose
Print more output.
returns
BLAST search request identifier
>>> import AGEpy as age
>>> seq="CTCCTCAGCATCTTATCCGAGTGGAAGGAAATTTGCGTGTGGAGTATTTGGATGACAGAAACACTTTTCGACATAGTGTGGTGGTGCCCTATGAGCCGCCTGAGGTTGGCTCTGACTGTACCACCATCCACTACAACTAC"
>>> RID=age.BLASTquery(seq,"nt","blastn")
>>> print(RID)
4MS2JV8T014
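The string for others can also be built in code instead of being copied from a bookmarked URL. The following is a minimal Python 3 sketch that assumes only the EQ_MENU key and the percent-encoding shown in the description of others; the helper name organism_param is hypothetical and not part of AGEpy.

```python
from urllib.parse import quote

def organism_param(organism):
    """Build an EQ_MENU fragment (hypothetical helper, not part of AGEpy)."""
    # safe="" percent-encodes every reserved character, including
    # spaces, parentheses and colons.
    return "EQ_MENU=" + quote(organism, safe="")

fragment = organism_param("Homo sapiens (taxid:9606)")
print(fragment)
# prints EQ_MENU=Homo%20sapiens%20%28taxid%3A9606%29
```

The resulting string could then be passed as others=fragment in the BLASTquery call.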
BLASTcheck
Checks the status of a query.
BLASTcheck(rid,baseURL="http://blast.ncbi.nlm.nih.gov")
rid
BLAST search request identifier. Allowed values: The Request ID (RID) returned when the search was submitted
baseURL
Server URL. Default=http://blast.ncbi.nlm.nih.gov
returns status
Status for the query.
returns therearehits
yes or no for existing hits on a finished query.
>>> import AGEpy as age
>>> status, therearehits=age.BLASTcheck(RID)
RID: 4MRYDZSC014; status:READY; hits: yes
>>> print(status, therearehits)
READY yes
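A search is usually not finished right after submission, so the check is typically repeated until the status is READY. The following is a minimal sketch of such a polling loop, not part of AGEpy: the name wait_for_blast, the 60 second interval and the limit of 30 checks are assumptions, and any status other than READY is simply treated as "not finished yet". The checking function is passed in as an argument so the loop can be tried without network access.

```python
import time

def wait_for_blast(rid, check, interval=60, max_checks=30, sleep=time.sleep):
    """Poll check(rid) until it reports READY (hypothetical helper).

    check is expected to return (status, therearehits), as BLASTcheck does.
    Returns True if the finished query has hits, False otherwise.
    """
    for _ in range(max_checks):
        status, there_are_hits = check(rid)
        if status == "READY":
            return there_are_hits == "yes"
        # Assumed polite default: wait a minute between checks.
        sleep(interval)
    raise RuntimeError(
        "BLAST request %s not ready after %d checks" % (rid, max_checks)
    )
```

With AGEpy imported as age, this could be called as has_hits = wait_for_blast(RID, age.BLASTcheck).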
BLASTresults
Retrieves results for an RID.
BLASTresults(rid, format_type="Tabular", hitlist_size= None, alignments=None, ncbi_gi = None, format_object=None, baseURL="http://blast.ncbi.nlm.nih.gov")
rid
BLAST search request identifier. Allowed values: The Request ID (RID) returned when the search was submitted
format_type
Report type. Allowed values: HTML, Text, XML, XML2, JSON2, or Tabular.
hitlist_size
Number of database sequences to keep. Allowed values: Integer greater than zero.
alignments
Number of alignments to print (applies to HTML and Text). Allowed values: Integer greater than zero.
ncbi_gi
Show NCBI GIs in report. Allowed values: T or F.
format_object
Object type. Allowed values: SearchInfo (status check) or Alignment (report formatting).
baseURL
Server URL. Default=http://blast.ncbi.nlm.nih.gov
returns
The result of a BLAST query. If format_type="Tabular", the content is parsed into a pandas DataFrame.
>>> import AGEpy as age
>>> r=age.BLASTresults(RID)
>>> print(r.head())
      query id                                        subject ids  \
0  Query_17381                       gi|1012955506|gb|JN214348.1|
1  Query_17381                       gi|631786534|tpe|HG975427.1|
2  Query_17381                        gi|369762889|gb|JN900492.1|
3  Query_17381                   gi|371502118|ref|NM_001126118.1|
4  Query_17381  gi|371502115|ref|NM_001126112.2|;gi|454521556|...

   query acc.ver  subject acc.ver  % identity  alignment length  mismatches  \
0    Query_17381       JN214348.1     100.000              1190           0
1    Query_17381       HG975427.1     100.000              1190           0
2    Query_17381       JN900492.1     100.000              1190           0
3    Query_17381   NM_001126118.1     100.000              1190           0
4    Query_17381   NM_001126112.2     100.000              1190           0

   gap opens  q. start  q. end  s. start  s. end  evalue  bit score
0          0         1    1190       614    1803     0.0       2147
1          0         1    1190       766    1955     0.0       2147
2          0         1    1190       877    2066     0.0       2147
3          0         1    1190       888    2077     0.0       2147
4          0         1    1190       768    1957     0.0       2147
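Once the tabular result is available, hits are often narrowed down by identity and e-value. The following is a minimal sketch, not part of AGEpy: the helper name filter_hits and both thresholds are assumptions, and the column names are taken from the example output above. The helper works on plain dictionaries, one per hit, so it can be tried without pandas.

```python
def filter_hits(rows, min_identity=95.0, max_evalue=1e-10):
    """Keep hits that pass both thresholds (hypothetical helper).

    rows is a list of dicts keyed by the tabular column names.
    """
    return [
        row for row in rows
        if float(row["% identity"]) >= min_identity
        and float(row["evalue"]) <= max_evalue
    ]

# Two rows copied from the example output (only the columns used here).
rows = [
    {"subject acc.ver": "JN214348.1", "% identity": 100.0, "evalue": 0.0},
    {"subject acc.ver": "HG975427.1", "% identity": 100.0, "evalue": 0.0},
]
kept = filter_hits(rows)
print([row["subject acc.ver"] for row in kept])
```

For a DataFrame r returned by BLASTresults, the rows could be obtained with r.to_dict("records").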