This workflow uses one or more services that are deprecated as of 31st December 2012 (over 12 years ago), and may no longer function.

KEGG Pathway Analysis

Created: 2010-03-19 13:46:37

The KEGG pathway analysis part of the workflow takes a list of protein identifiers (UniProt accession numbers in this workflow). Identifiers may be in any of the following formats, each with its database prefix:

External database	Database prefix
NCBI GI	ncbi-gi:
NCBI GeneID	ncbi-geneid:
GenBank	genbank:
UniGene	unigene:
UniProt	uniprot:

The identifiers are converted to KEGG IDs using the bconv web service, provided by the KEGG database (Kanehisa et al., 2010) and described in the KEGG API manual at: http://www.genome.jp/kegg/docs/keggapi_manual.html#label:42. bconv produces a list of KEGG IDs in a tabular format: the first element of each row is the input ID, the second is the KEGG ID, and the third is a string confirming that the protein exists in both databases. Each row is split into its three elements using whitespace (\s) as the regular expression, and every element from every row is entered into a new list.

The next step removes the confirmation string and the input ID, leaving only the KEGG IDs of the proteins. This is done by keeping the elements that match the regular expression: .{3}:.*

The get_pathways_by_genes web service then queries the KEGG database and retrieves the pathways in which the proteins participate. The mark_pathway_by_objects method marks the input proteins from the filtered list in the KEGG pathways found by get_pathways_by_genes and returns a list of URLs. These URLs correspond to images of the KEGG pathways in which the target proteins are marked in orange. The Get_Image_From_URL method then downloads the images. The final output is a list of images with the target proteins highlighted in orange in their respective KEGG pathways.
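The split and filter steps described above can be sketched in plain Java. This is a minimal illustration rather than the Taverna local workers themselves: the class name `KeggIdFilter`, the method name `extractKeggIds` and the sample rows are hypothetical, and real bconv output may be laid out differently.

```java
import java.util.ArrayList;
import java.util.List;

class KeggIdFilter {

    // Split bconv-style output on single whitespace characters (the workflow's
    // "\s" constant), then keep only the tokens that match ".{3}:.*".
    static List<String> extractKeggIds(String bconvOutput) {
        List<String> keggIds = new ArrayList<>();
        for (String token : bconvOutput.split("\\s")) {
            // String.matches tests the whole token, so the colon must be
            // the fourth character for a token to be kept.
            if (token.matches(".{3}:.*")) {
                keggIds.add(token);
            }
        }
        return keggIds;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: input ID, KEGG ID, confirmation string.
        String sample = "uniprot:P00734\thsa:2147\tequivalent\n"
                      + "uniprot:P00533\thsa:1956\tequivalent\n";
        System.out.println(extractKeggIds(sample));
    }
}
```

Because the pattern requires the colon in the fourth position, tokens such as `uniprot:P00734` and `equivalent` are dropped, and so is any identifier whose prefix is longer or shorter than three characters.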

Run

Run this Workflow in the Taverna Workbench...

Option 1:

Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/1179/download?version=1
Workflow Components

Authors (2)

Titles (1)

Descriptions (6)

This workflow receives a list of UniProt accession numbers as input. These are then converted to KEGG IDs, which are entered into a separate list in a tabular format.

Uses workflows made by: Franck Tanoh, Paul Fischer and Michael Gerlich.

Dependencies (0)

Inputs (4)

Name	Description
minP	This input provides the minimum percent identity used when parsing the BLAST file generated from the search between dissimilar proteins and the drug target database.
maxE	This input provides the maximum e-value used when parsing the BLAST file generated from the search between dissimilar proteins and the drug target database.
blast_output_file	This input provides the full BLAST result from a BLAST search comparing dissimilar proteins to the drug target database. It is passed to load_blast_results as the file name.
protein_id_list	This input provides a list of the UniProt accession numbers for the dissimilar proteins compared against the drug target database by BLAST. The list is fed into bconv, which translates the accession numbers into KEGG IDs in a tabular format.

Processors (11)

Name	Type	Description
bconv_2	wsdl	Wsdl http://soap.genome.jp/KEGG.wsdl Wsdl Operation bconv
Split_string_into_string_list_by_regular_expression	localworker	Script List split = new ArrayList();if (!string.equals("")) { String regexString = ","; if (regex != void) { regexString = regex; } String[] result = string.split(regexString); for (int i = 0; i < result.length; i++) { split.add(result[i]); }}
regex_value	stringconstant	Value \s
Filter_List_of_Strings_by_regex	localworker	Script filteredlist = new ArrayList();StringBuffer sb = new StringBuffer();for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); if (item.matches(regex)) { filteredlist.add(item); }}
regex_value_1	stringconstant	Value .{3}:.*
get_pathways_by_genes	wsdl	Wsdl http://soap.genome.jp/KEGG.wsdl Wsdl Operation get_pathways_by_genes
Get_Image_From_URL	localworker	Script URL inputURL = new URL(url);byte[] contents = new byte[4];if (url == null) return;if (inputURL.openConnection().getContentLength() == -1) { // Content size unknown, must read first... byte[] buffer = new byte[1024]; int bytesRead = 0; int totalBytesRead = 0; InputStream is = inputURL.openStream(); while (bytesRead != -1) { totalBytesRead += bytesRead; bytesRead = is.read(buffer, 0, 1024); } contents = new byte[totalBytesRead];} else { contents = new byte[inputURL.openConnection().getContentLength()];}int bytesRead = 0;int totalBytesRead = 0;InputStream is = inputURL.openStream();while (bytesRead != -1) { bytesRead = is.read(contents, totalBytesRead, contents.length - totalBytesRead); totalBytesRead += bytesRead; if (contents.length==totalBytesRead) break;}image = contents;
mark_pathway_by_objects	wsdl	Wsdl http://soap.genome.jp/KEGG.wsdl Wsdl Operation mark_pathway_by_objects
parse_blast_results	beanshell	Script // "uniprot:P02745 "// takes a string of \t separated results and \n //String minP = "30.0";//String maxE = "2.0";StringBuffer sb1 = new StringBuffer();StringBuffer sb2 = new StringBuffer(); double minPercent = Double.parseDouble(minP);double maxEvalue = Double.parseDouble(maxE);int count = 0;protein_list = new ArrayList();sb2.append("thresholds: minP=" + minP + ", maxE=" + maxE + "\n========================\n");String [] rows = blast_results.split("\n"); for(int i = 0; i < rows.length; ++i) { String [] cols = rows[i].split("\t"); if(cols != null && cols.length > 9) { String [] query = cols[0].split("[|]"); String uniProtId = cols[1].replaceAll("[|]","").trim(); String percent = cols[2].trim(); String e_val = cols[10].trim(); double max1 = 0; double max2 = 0; if( Double.parseDouble(percent) >= minPercent && Double.parseDouble(e_val) <= maxEvalue ) { //sb1.append("uniprot:" + uniProtId + " "); protein_list.add("uniprot:" + uniProtId); sb2.append(">>> query id=" + query[1]); sb2.append(", name=" + query[2]); sb2.append(", hit uniProt id=" + uniProtId); sb2.append(", % identity=" + percent); sb2.append(", e value =" + e_val + "\n"); } } }// "uniprot:P00734 uniprot:P00737"; // probably on works if both have same pathway id// sb1.toString(); records = sb2.toString();
load_blast_results	beanshell	Script //String blast_results = "sp|Q2FH34|ACYP_STAA3 |P00734 66.67 9 3 0 47 55 441 449 1.1 17.7\n" +//"sp|Q2FEF3|3MGH_STAA3 |P00533 25.00 72 46 2 40 103 862 933 0.18 21.9";import java.io.*;StringBuffer sb = new StringBuffer();try { BufferedReader br = new BufferedReader(new FileReader(path + filename)); String line = br.readLine(); while(line != null) { sb.append(line + "\n"); line = br.readLine(); } br.close();}catch(Exception ex) { System.out.println(ex); }blast_results = sb.toString();
path	stringconstant	Value D:/DATA/demo/
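
The Get_Image_From_URL script listed above builds the URL before its null check and, when the content length is unknown, reads the stream once to count bytes and a second time to fill the array. A single-pass alternative might look like the following sketch. It is not the Taverna local worker: `ImageFetcher`, `readAll` and `fetch` are hypothetical names, and the demonstration uses an in-memory stream so that it runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;

class ImageFetcher {

    // Read a stream to the end in one pass, letting the output buffer grow.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fetch the bytes behind a URL; the null check happens before any
    // URL object is constructed.
    static byte[] fetch(String url) {
        if (url == null) {
            return null;
        }
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Offline demonstration: 3000 bytes spans several 1024-byte reads.
        byte[] data = readAll(new ByteArrayInputStream(new byte[3000]));
        System.out.println(data.length);
    }
}
```

Wrapping `IOException` in `UncheckedIOException` is a design choice made here to keep the calling code short; a script that needs to report failures per URL would handle the exception instead.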

Beanshells (2)

Name	Description	Inputs	Outputs
parse_blast_results		blast_results minP maxE	protein_list records
load_blast_results		path filename	blast_results
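
The filtering performed by parse_blast_results can be sketched as a standalone Java method. This is an illustration under assumptions, not the BeanShell script itself: it assumes tab-separated BLAST rows with percent identity in the third column and the e-value in the eleventh, as the script's column indices suggest, and the names `BlastHitFilter` and `filterHits` are hypothetical. The sample rows come from the commented-out example in load_blast_results, with tabs assumed as the separator.

```java
import java.util.ArrayList;
import java.util.List;

class BlastHitFilter {

    // Keep hits whose percent identity is at least minPercent and whose
    // e-value is at most maxEvalue, returning them as "uniprot:<id>" strings.
    static List<String> filterHits(String blastResults, double minPercent, double maxEvalue) {
        List<String> proteins = new ArrayList<>();
        for (String row : blastResults.split("\n")) {
            String[] cols = row.split("\t");
            // Column index 10 is read below, so at least 11 columns are needed.
            if (cols.length <= 10) {
                continue;
            }
            String uniProtId = cols[1].replace("|", "").trim();
            double percent = Double.parseDouble(cols[2].trim());
            double eValue = Double.parseDouble(cols[10].trim());
            if (percent >= minPercent && eValue <= maxEvalue) {
                proteins.add("uniprot:" + uniProtId);
            }
        }
        return proteins;
    }

    public static void main(String[] args) {
        String rows =
            "sp|Q2FH34|ACYP_STAA3\t|P00734\t66.67\t9\t3\t0\t47\t55\t441\t449\t1.1\t17.7\n"
          + "sp|Q2FEF3|3MGH_STAA3\t|P00533\t25.00\t72\t46\t2\t40\t103\t862\t933\t0.18\t21.9";
        // Thresholds taken from the commented-out defaults in the script.
        System.out.println(filterHits(rows, 30.0, 2.0));
    }
}
```

The original script accepts rows with more than nine columns but then reads column index 10, so a row with exactly ten columns would fail there; the sketch requires eleven columns before reading.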

Outputs (7)

Name	Description
Filtered	This output shows the filtered list generated from the tabular format made by bconv. Every line generated by bconv is split into three parts and added to a new list, by using whitespace as a regular expression. The three different types of values are KEGG IDs, UniProt accession numbers and a string confirming the existence of the protein in question in both databases.
KEGG_ID	This output shows a list of KEGG IDs generated by filtering the "Filtered" output value list specifically for KEGG IDs using .{3}:.* as a regular expression.
Pathway_ID	This output retrieves the KEGG pathway IDs as a list for the UniProt accession numbers input in the protein_id_list parameter.
image	This output retrieves a KEGG pathway image pinpointing the location of the proteins input from the protein_id_list parameter.
blast_hits	This output retrieves the BLAST hits from a BLAST search comparing dissimilar proteins to the drug target database. The BLAST result has been filtered to remove hits with an e-value higher than the one entered in the input "maxE" or a percent identity lower than the one entered in "minP".
protein_list	This output retrieves a list of the UniProt accession numbers for the dissimilar proteins compared against the drug target database by BLAST. The BLAST result has been filtered to remove proteins with an e-value higher than the one entered in the input "maxE" or a percent identity lower than the one entered in "minP".
img_url	This output retrieves the URL for the KEGG pathway image pinpointing the location of the proteins input from the protein_id_list parameter.

Datalinks (21)

Source	Sink
protein_id_list	bconv_2:string
bconv_2:return	Split_string_into_string_list_by_regular_expression:string
regex_value:value	Split_string_into_string_list_by_regular_expression:regex
Split_string_into_string_list_by_regular_expression:split	Filter_List_of_Strings_by_regex:stringlist
regex_value_1:value	Filter_List_of_Strings_by_regex:regex
Filter_List_of_Strings_by_regex:filteredlist	get_pathways_by_genes:genes_id_list
mark_pathway_by_objects:return	Get_Image_From_URL:url
Filter_List_of_Strings_by_regex:filteredlist	mark_pathway_by_objects:object_id_list
get_pathways_by_genes:return	mark_pathway_by_objects:pathway_id
load_blast_results:blast_results	parse_blast_results:blast_results
maxE	parse_blast_results:maxE
minP	parse_blast_results:minP
path:value	load_blast_results:path
blast_output_file	load_blast_results:filename
Split_string_into_string_list_by_regular_expression:split	Filtered
Filter_List_of_Strings_by_regex:filteredlist	KEGG_ID
get_pathways_by_genes:return	Pathway_ID
Get_Image_From_URL:image	image
parse_blast_results:records	blast_hits
parse_blast_results:protein_list	protein_list
mark_pathway_by_objects:return	img_url

Coordinations (0)

Workflow Type

Taverna 2

Uploader

Andrew David King

License

All versions of this Workflow are licensed under:

Version 1 (of 1)

Credits (1)

(People/Groups)

Baywatch Solutions

Attributions (0)

(Workflows/Files)

None

Tags (1)

Uploader tags

kegg

Shared with Groups (1)

Baywatch Solutions

Featured In Packs (0)

None

Attributed By (0)

(Workflows/Files)

None

Favourited By (0)

No one

Statistics

3496 viewings

1899 downloads

Citations (0)

None

Version History

In chronological order:

KEGG Pathway Analysis

Created by Andrew David King on Friday 19 March 2010 13:46:37 (UTC)
