Workflow Entry: KEGG Pathway Analysis
Created at: 19/03/10 @ 13:46:37
Version 1 (of 1)
Title: KEGG Pathway Analysis
Type: Taverna 2
Description
The KEGG pathway analysis part of the workflow takes a list of UniProt accession numbers. IDs from any of the following external databases are accepted, provided each ID carries the matching database prefix:
External database   Database prefix
-----------------   ---------------
NCBI GI             ncbi-gi:
NCBI GeneID         ncbi-geneid:
GenBank             genbank:
UniGene             unigene:
UniProt             uniprot:
The workflow converts these IDs to KEGG IDs using the web service bconv, which is provided by the KEGG database (Kanehisa et al., 2010) and described in the KEGG API manual at: http://www.genome.jp/kegg/docs/keggapi_manual.html#label:42.
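As a minimal sketch of how the input for bconv might be prepared, the following Python snippet adds a database prefix to bare accession numbers and joins everything into one string. The helper name build_bconv_query, the space-separated layout of the query string and the sample accession numbers are assumptions made for illustration; they are not taken from the workflow itself.

```python
# Database prefixes listed in the table above; anything else is rejected.
KNOWN_PREFIXES = ("ncbi-gi:", "ncbi-geneid:", "genbank:", "unigene:", "uniprot:")


def build_bconv_query(ids, default_prefix="uniprot:"):
    """Return one space-separated string of prefixed IDs.

    Bare IDs (no colon) receive `default_prefix`; IDs that already carry
    a known prefix are passed through unchanged.
    """
    if default_prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unknown prefix: {default_prefix!r}")
    prefixed = []
    for raw in ids:
        item = raw.strip()
        if not item:
            continue  # skip blank entries
        if ":" not in item:
            item = default_prefix + item
        elif not item.startswith(KNOWN_PREFIXES):
            raise ValueError(f"unsupported database prefix in {item!r}")
        prefixed.append(item)
    return " ".join(prefixed)


# Sample accession numbers chosen only for illustration.
print(build_bconv_query(["P04637", "uniprot:P38398", " "]))
```

Rejecting unknown prefixes early keeps malformed IDs from reaching the web service, where the failure would be harder to trace.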
bconv produces a list of KEGG IDs in a tabular format: the first element of each line is the input ID, the second is the KEGG ID, and the third is a string confirming that the protein exists in both databases. Each line is then split into its three elements, using white-space as the regular expression, and every element is entered into a new list. The next step removes the confirmation string and the input ID, leaving only the KEGG IDs of the proteins. This is done by filtering with the regular expression:
.{3}:.*
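The split and filter steps can be sketched in Python as follows. This is an illustration under stated assumptions: the sample bconv output (including the confirmation string "equivalent") is made up for the example, the function name extract_kegg_ids is hypothetical, and the filter is assumed to require the whole string to match the expression.

```python
import re

# The filter expression from the workflow: three characters, a colon, anything.
KEGG_ID_PATTERN = re.compile(r".{3}:.*")


def extract_kegg_ids(bconv_table):
    """Split bconv output on whitespace and keep only the KEGG IDs."""
    tokens = re.split(r"\s+", bconv_table.strip())
    # fullmatch: the whole token must fit the pattern (assumed semantics).
    return [t for t in tokens if KEGG_ID_PATTERN.fullmatch(t)]


# Assumed sample output: input ID, KEGG ID, confirmation string per line.
sample = (
    "uniprot:P04637\thsa:7157\tequivalent\n"
    "uniprot:P38398\thsa:672\tequivalent"
)
print(extract_kegg_ids(sample))  # prints ['hsa:7157', 'hsa:672']
```

With full-match semantics the expression only accepts tokens whose fourth character is a colon, so prefixed input IDs such as uniprot:P04637 are dropped. A substring search would let them through, and a KEGG ID whose organism code is not exactly three characters long would be rejected.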
The get_pathways_by_genes web service then queries the KEGG database and retrieves the pathways in which the proteins participate. The mark_pathway_by_objects method marks the input proteins from the filtered list in the KEGG pathways found by get_pathways_by_genes and returns a list of URLs. Each URL points to the image of a KEGG pathway in which the target proteins are marked in orange. The Get_Image_From_URL method then fetches these images. The final output is a list of images of the KEGG pathways, with the target proteins highlighted in orange.
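The chaining of these three steps can be sketched as below. The web services are not called here; they are passed in as plain Python callables, and the fake_* functions, the pathway IDs and the URLs are hypothetical stand-ins so that the sketch runs offline. The loop over pathway IDs is an assumption about how the list returned by get_pathways_by_genes is fed, one pathway at a time, into mark_pathway_by_objects.

```python
def highlight_pathways(gene_ids, get_pathways_by_genes,
                       mark_pathway_by_objects, get_image_from_url):
    """Chain the three steps; the service calls are injected as callables."""
    results = []
    # One call returns every pathway the genes take part in.
    for pathway_id in get_pathways_by_genes(gene_ids):
        # Each pathway is marked with the full filtered gene list.
        url = mark_pathway_by_objects(pathway_id, gene_ids)
        results.append((pathway_id, url, get_image_from_url(url)))
    return results


# Hypothetical stand-ins so the sketch runs without network access.
def fake_get_pathways(gene_ids):
    return ["path:hsa04115", "path:hsa05200"]


def fake_mark_pathway(pathway_id, gene_ids):
    return "https://example.invalid/" + pathway_id.replace(":", "_") + ".png"


def fake_get_image(url):
    return b"image-bytes-for-" + url.encode("ascii")


for pathway_id, url, image in highlight_pathways(
        ["hsa:7157", "hsa:672"],
        fake_get_pathways, fake_mark_pathway, fake_get_image):
    print(pathway_id, url, len(image))
```

Injecting the service calls keeps the data flow visible and lets the chain be exercised without depending on the availability of the KEGG services.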
Workflow Components
Authors (2)
| Haakon Berven and Monica Bhaskar. |
| Baywatch Solutions |
Descriptions (6)
| This workflow receives a list of UniProt accession numbers as input. These are then converted to KEGG IDs using the bconv service provided by KEGG, and the results are entered into a separate list in a tabular format. |
| Uses workflows made by: Franck Tanoh, Paul Fischer and Michael Gerlich. |
Inputs (4)
| Name | Description |
| minP | This input provides the minimum p-value for parsing the BLAST file generated from the search between dissimilar proteins and the drug target database. |
| maxE | This input provides the maximum e-value for parsing the BLAST file generated from the search between dissimilar proteins and the drug target database. |
| blast_output_file | This input provides the full BLAST result from a BLAST search comparing dissimilar proteins to the drug target database. |
| protein_id_list | This input provides a list of the UniProt accession numbers for the dissimilar proteins compared against the drug target database by BLAST. The list is fed into bconv, which translates the accession numbers into KEGG IDs in a tabular format. |
Processors (11)
| Name | Type | Description |
| bconv_2 | wsdl | See the Service entry for this web service. |
| Split_string_into_string_list_by_regular_expression | localworker | |
| regex_value | stringconstant | |
| Filter_List_of_Strings_by_regex | localworker | |
| regex_value_1 | stringconstant | |
| get_pathways_by_genes | wsdl | See the Service entry for this web service. |
| Get_Image_From_URL | localworker | |
| mark_pathway_by_objects | wsdl | See the Service entry for this web service. |
| parse_blast_results | beanshell | |
| load_blast_results | beanshell | |
| path | stringconstant | |
Beanshells (2)
| Name | Description | Inputs | Outputs |
| parse_blast_results | | blast_results, minP, maxE | protein_list, records |
| load_blast_results | | path, filename | blast_results |
Outputs (7)
| Name | Description |
| Filtered | This output shows the list generated from the tabular result returned by bconv. Every line returned by bconv is split into three parts, using whitespace as the regular expression, and each part is added to a new list. The three types of values are UniProt accession numbers, KEGG IDs and a string confirming that the protein in question exists in both databases. |
| KEGG_ID | This output shows a list of KEGG IDs, generated by filtering the "Filtered" output list for KEGG IDs using .{3}:.* as the regular expression. |
| Pathway_ID | This output retrieves, as a list, the KEGG pathway IDs for the UniProt accession numbers given in the protein_id_list input. |
| image | This output retrieves a KEGG pathway image pinpointing the location of the proteins given in the protein_id_list input. |
| blast_hits | This output retrieves the BLAST hits from a BLAST search comparing dissimilar proteins to the drug target database. The BLAST result has been filtered to remove proteins with an e-value higher than the one entered in the input "maxE" and a p-value higher than the one entered in "minP". |
| protein_list | This output retrieves a list of the UniProt accession numbers for the dissimilar proteins compared against the drug target database by BLAST. The BLAST result has been filtered to remove proteins with an e-value higher than the one entered in the input "maxE" and a p-value higher than the one entered in "minP". |
| img_url | This output retrieves the URL for the KEGG pathway image pinpointing the location of the proteins given in the protein_id_list input. |
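The filtering described for blast_hits and protein_list can be sketched as follows. The source of the parse_blast_results beanshell is not shown on this page, so everything here is an assumption: the BlastHit record and its field names are hypothetical, a hit is kept only when its e-value does not exceed maxE and its p-value does not exceed minP, and the accession list is de-duplicated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical record for one parsed BLAST hit."""
    accession: str  # UniProt accession number of the query protein
    evalue: float
    pvalue: float


def filter_hits(hits, max_e, min_p):
    """Drop hits above either threshold; return (records, protein_list)."""
    records = [h for h in hits if h.evalue <= max_e and h.pvalue <= min_p]
    # dict.fromkeys removes repeated accessions while keeping first-seen order.
    protein_list = list(dict.fromkeys(h.accession for h in records))
    return records, protein_list


# Made-up hits for illustration only.
hits = [
    BlastHit("P04637", 1e-30, 1e-30),
    BlastHit("P38398", 1.0, 0.6),
    BlastHit("P04637", 1e-10, 1e-10),
]
records, protein_list = filter_hits(hits, max_e=1e-5, min_p=1e-5)
print(len(records), protein_list)  # prints 2 ['P04637']
```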
Datalinks (21)
| Source | Sink |
| protein_id_list | bconv_2:string |
| bconv_2:return | Split_string_into_string_list_by_regular_expression:string |
| regex_value:value | Split_string_into_string_list_by_regular_expression:regex |
| Split_string_into_string_list_by_regular_expression:split | Filter_List_of_Strings_by_regex:stringlist |
| regex_value_1:value | Filter_List_of_Strings_by_regex:regex |
| Filter_List_of_Strings_by_regex:filteredlist | get_pathways_by_genes:genes_id_list |
| mark_pathway_by_objects:return | Get_Image_From_URL:url |
| Filter_List_of_Strings_by_regex:filteredlist | mark_pathway_by_objects:object_id_list |
| get_pathways_by_genes:return | mark_pathway_by_objects:pathway_id |
| load_blast_results:blast_results | parse_blast_results:blast_results |
| maxE | parse_blast_results:maxE |
| minP | parse_blast_results:minP |
| path:value | load_blast_results:path |
| blast_output_file | load_blast_results:filename |
| Split_string_into_string_list_by_regular_expression:split | Filtered |
| Filter_List_of_Strings_by_regex:filteredlist | KEGG_ID |
| get_pathways_by_genes:return | Pathway_ID |
| Get_Image_From_URL:image | image |
| parse_blast_results:records | blast_hits |
| parse_blast_results:protein_list | protein_list |
| mark_pathway_by_objects:return | img_url |
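The processor-to-processor datalinks above imply an execution order. As a rough sketch, they can be written as a dependency map and sorted with Python's standard graphlib module (Python 3.9 or later). The map is transcribed by hand from the datalinks table, leaves out workflow inputs and outputs, and is only an illustration; Taverna itself schedules processors as their data becomes available.

```python
from graphlib import TopologicalSorter

SPLIT = "Split_string_into_string_list_by_regular_expression"
FILTER = "Filter_List_of_Strings_by_regex"

# Each processor maps to the processors whose outputs it consumes.
DEPENDS_ON = {
    "bconv_2": set(),
    SPLIT: {"bconv_2", "regex_value"},
    FILTER: {SPLIT, "regex_value_1"},
    "get_pathways_by_genes": {FILTER},
    "mark_pathway_by_objects": {FILTER, "get_pathways_by_genes"},
    "Get_Image_From_URL": {"mark_pathway_by_objects"},
    "load_blast_results": {"path"},
    "parse_blast_results": {"load_blast_results"},
}

# static_order() yields every processor after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
for name in order:
    print(name)
```

The map makes it visible that the BLAST parsing branch (path, load_blast_results, parse_blast_results) shares no datalink with the KEGG branch, so several valid orderings exist.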
License
All versions of this Workflow are
licensed under: