Extract_unique_proteins_from_blast_results

blastFile 0 0
The BLAST results in XML format 2010-03-19 03:21:20.950 GMT
tfasta 0 0
FASTA file of the target proteins from which to extract the sequences 2010-03-19 03:19:27.444 GMT
cfasta_file_path 0 0
File path where the workflow will save the unique proteins 2010-03-19 03:20:20.859 GMT

Read_Text_File
fileurl 0
filecontents 0 0
net.sf.taverna.t2.activities localworker-activity 1.0 net.sf.taverna.t2.activities.localworker.LocalworkerActivity
net.sourceforge.taverna.scuflworkers.io.TextFileReader
workflow
java.lang.String true fileurl 0 'text/plain'
0 filecontents 0 'text/plain'
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Extract_unique_proteins
xml_result 0
gi_lines_val 0 0
net.sf.taverna.t2.activities beanshell-activity 1.0 net.sf.taverna.t2.activities.beanshell.BeanshellActivity
workflow
java.lang.String true xml_result 0 text/plain
0 gi_lines_val 0
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Tfasta_parser
gi_val 0
tfasta_in 0
cfasta_out 0 0
net.sf.taverna.t2.activities beanshell-activity 1.0 net.sf.taverna.t2.activities.beanshell.BeanshellActivity
workflow
java.lang.String true gi_val 0 text/plain
java.lang.String true tfasta_in 0 text/plain
0 cfasta_out 0
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Write_Text_File
outputFile 0
filecontents 0
net.sf.taverna.t2.activities localworker-activity 1.0 net.sf.taverna.t2.activities.localworker.LocalworkerActivity
net.sourceforge.taverna.scuflworkers.io.TextFileWriter
workflow
java.lang.String true outputFile 0 'text/plain'
java.lang.String true filecontents 0 'text/plain'
0 outputFile 0 'text/plain'
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0
net.sf.taverna.t2.core workflowmodel-impl 1.0 net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Read_Text_File.fileurl <- blastFile
Extract_unique_proteins.xml_result <- Read_Text_File.filecontents
Tfasta_parser.gi_val <- Extract_unique_proteins.gi_lines_val
Tfasta_parser.tfasta_in <- tfasta
Write_Text_File.outputFile <- cfasta_file_path
Write_Text_File.filecontents <- Tfasta_parser.cfasta_out

2010-03-19 03:14:00.733 GMT
fetchEnsemblSeqsAndBlast 2010-03-15 11:30:59.109 GMT
Extract unique proteins from blast results 2010-03-19 03:16:29.89 GMT
nclteamc 2010-03-19 03:15:57.858 GMT
The workflow outputs a list of the proteins encoded by the target genomes that have no sequence similarity to those encoded by the source genome. 2010-03-19 03:18:26.823 GMT
This workflow allows you to configure a BioMart query to fetch the sequences you want from Ensembl. These sequences are retrieved and a BLAST database of them is created (by default, in the directory you ran Taverna from). Warning: This workflow assumes that you have blastall and formatdb installed on the machine, and that by default both are found or linked in /usr/local/bin. It also assumes that you have write permission to the directory you ran Taverna from. The beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs" are what you need to edit if the default locations are not appropriate for you. Shortcomings: The names of all the files created and used are hard-coded in this workflow. This means that if you run this workflow more than once without editing anything, you will overwrite files you have previously created. The workflow does not yet delete any of the files it creates in the working directory. Ideally there would be an option that lets the user choose whether the files are kept or deleted after use. 2010-03-15 11:30:59.109 GMT
The workflow uses the BLAST results to determine the unique proteins found in the target genome that have no similarity to the source genome.
Using these unique protein IDs and the original target protein FASTA file, the workflow creates a FASTA file of the unique proteins. 2010-03-19 03:23:47.653 GMT
Bela Tiwari 2010-03-15 11:30:59.109 GMT
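
The two steps described above (find the target proteins with no BLAST hit, then pull their records from the target FASTA file) can be sketched in Python. This is only an illustration under stated assumptions, not the Beanshell code behind the Extract_unique_proteins and Tfasta_parser processors, which is not shown here. It assumes that the target proteins were the BLAST queries, that the results use the NCBI BLAST XML layout (one Iteration element per query, with Hit elements under Iteration_hits), and that the first word of each query definition matches the first word of the FASTA header. The function names unique_query_ids and filter_fasta are hypothetical.

```python
import xml.etree.ElementTree as ET


def unique_query_ids(blast_xml):
    """Return the IDs of queries whose BLAST iteration contains no hits.

    Assumption: NCBI BLAST XML with one <Iteration> per query sequence.
    """
    root = ET.fromstring(blast_xml)
    ids = []
    for iteration in root.iter("Iteration"):
        query_def = iteration.findtext("Iteration_query-def", default="").strip()
        if not query_def:
            continue  # no usable identifier for this iteration
        if iteration.find("Iteration_hits/Hit") is None:
            # No <Hit> element: the query has no similarity to the database.
            ids.append(query_def.split()[0])
    return ids


def filter_fasta(fasta_text, keep_ids):
    """Return only the FASTA records whose header ID is in keep_ids."""
    keep = set(keep_ids)
    kept_lines = []
    writing = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header_words = line[1:].split()
            # Assumption: the record ID is the first word of the header.
            writing = bool(header_words) and header_words[0] in keep
        if writing:
            kept_lines.append(line)
    return "".join(line + "\n" for line in kept_lines)
```

In this sketch, unique_query_ids plays the role of the step that turns the BLAST XML into a list of unique protein IDs, and filter_fasta plays the role of the step that turns those IDs plus the target FASTA file into the text written to cfasta_file_path. Some BLAST versions lay out multi-query XML differently, so the one-Iteration-per-query assumption should be checked against real output.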