PubMed_Search_and_Sosearch_term00 text/plain 2011-02-03 14:53:06.572 UTC "Text Mining" AND Leiden 2013-08-05 14:01:31.929 UTC I want to find all abstracts that contain the words: 2013-07-25 14:05:03.937 UTC end_date00 2015 2013-08-05 13:36:08.823 UTC The found articles should be older than ____: 2013-07-25 14:07:17.530 UTC start_date00 The found articles should not be older than ____. 2013-07-25 14:06:46.323 UTC 1900 2013-08-05 13:36:20.405 UTC maximum_articles00 I want a maximum of ____ articles. 10 is set as default for testing. More than 100 is good for a normal run. 2013-07-25 14:06:07.459 UTC 10 2013-07-25 13:34:01.622 UTC ResearcherID00 The ID of the researcher that runs this workflow. If you do not have an ID, please register at www.researcherid.com. 2013-07-30 11:24:03.84 UTC H-8686-2013 2013-07-30 11:29:43.224 UTC Workspace00 /tmp/ 2013-08-07 12:28:32.783 UTC This is the workspace where the found articles will be stored. The name of the files will be [ID of the article].xml. 2013-08-07 12:28:27.306 UTC SolrWorkspace00 This is where the indexed files are stored. This can be the same as the workspace where the abstracts are stored. 2013-08-09 10:46:14.845 UTC /tmp/ 2013-08-09 10:46:26.451 UTC SolrImport_STDERR This is the standard error (stderr) from Solr. If there were any errors while running Solr then the values will turn red. Red is generally considered to be a bad colour when programming. 2013-08-09 11:03:49.481 UTC SolrImport_STDOUT This is the standard output (stdout) from Solr. 2013-07-25 14:20:00.680 UTC IDsInDatabase This output contains the list of IDs that were found by eSearch but were already in the working directory. Better luck next time. 2013-08-05 11:16:31.507 UTC IDsBeingSearched This output port contains the IDs that have been extracted. 2013-08-05 11:17:00.143 UTC AbstractWithProv This is the abstract that has been extracted + some provenance. 
2013-08-05 11:19:55.518 UTC IndexedFileLocation The location of the indexed file that was imported in Solr. 2013-08-09 11:02:53.603 UTC pubmed_databasevalue00 Which database is being used. 2013-07-25 14:08:02.977 UTC net.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivity pubmed net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 0 0 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeextractPMIDxpath0xml-text0nodelist11nodelistAsXML11 This process extracts the pubmed ID's based on the eSearch run. 
2013-08-05 11:02:44.568 UTC net.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivity xpath 0 'text/plain' java.lang.String true xml-text 0 'text/xml' java.lang.String true nodelist 1 l('text/plain') 1 nodelistAsXML 1 l('text/plain') 1 workflow dom4j dom4j 1.6 716010169 dom4j:dom4j:1.6 net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 0 0 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invokexpathvalue00net.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivity /*[local-name(.)='eSearchResult']/*[local-name(.)='IdList']/*[local-name(.)='Id'] net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 0 0 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invokerun_eSearchparameters0attachmentList11parameters00 This process will run eSearch that will extract the ID's of the articles that give a hit on the query. 
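The XPath constant above selects every Id element under IdList in an eSearchResult, ignoring namespaces by matching on local-name(). A minimal Python sketch of the same selection follows. The sample response and the helper names (local_name, extract_pmids) are assumptions made for illustration; they are not part of the workflow, which uses the XPathTextWorker local worker instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical eSearch response, trimmed to the parts the XPath touches.
SAMPLE = """<eSearchResult xmlns="http://www.ncbi.nlm.nih.gov/soap/eutils/esearch">
  <Count>2</Count>
  <IdList><Id>20618981</Id><Id>19134199</Id></IdList>
</eSearchResult>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, mimicking XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


def extract_pmids(xml_text):
    """Return the text of every eSearchResult/IdList/Id element, in document order."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "eSearchResult":
        return []
    pmids = []
    for child in root:
        if local_name(child.tag) != "IdList":
            continue
        for element in child:
            if local_name(element.tag) == "Id" and element.text:
                pmids.append(element.text.strip())
    return pmids


pmids = extract_pmids(SAMPLE)
print(pmids)
```

Matching on the local name keeps the selection working whether or not the response declares a default namespace, which is presumably why the workflow's XPath is written that way.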
2013-08-05 10:17:21.59 UTC net.sf.taverna.t2.activitieswsdl-activity1.4net.sf.taverna.t2.activities.wsdl.WSDLActivity http://eutils.ncbi.nlm.nih.gov/entrez/eutils/soap/eutils.wsdl run_eSearch net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 0 0 5 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeparametersXML_eFecthdb0term0maxdate0mindate0RetMax0output00 This process will create the parameters that can then be used by eSearch and eFetch. 2013-08-05 11:29:52.580 UTC net.sf.taverna.t2.activitieswsdl-activity1.4net.sf.taverna.t2.activities.wsdl.xmlsplitter.XMLInputSplitterActivity db 0 'text/plain' java.lang.String true term 0 'text/plain' java.lang.String true WebEnv 0 'text/plain' java.lang.String true QueryKey 0 'text/plain' java.lang.String true usehistory 0 'text/plain' java.lang.String true tool 0 'text/plain' java.lang.String true email 0 'text/plain' java.lang.String true field 0 'text/plain' java.lang.String true reldate 0 'text/plain' java.lang.String true mindate 0 'text/plain' java.lang.String true maxdate 0 'text/plain' java.lang.String true datetype 0 'text/plain' java.lang.String true RetStart 0 'text/plain' java.lang.String true RetMax 0 'text/plain' java.lang.String true rettype 0 'text/plain' java.lang.String true sort 0 'text/plain' java.lang.String true output 0 'text/xml' 0 <s:extensions xmlns:s="http://org.embl.ebi.escience/xscufl/0.1alpha"><s:complextype optional="false" unbounded="false" typename="eSearchRequest" name="parameters" 
qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/esearch}eSearchRequest"><s:elements><s:basetype optional="true" unbounded="false" typename="string" name="db" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="term" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="WebEnv" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="QueryKey" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="usehistory" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="tool" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="email" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="field" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="reldate" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="mindate" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="maxdate" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="datetype" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="RetStart" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="RetMax" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="rettype" 
qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="sort" qname="{http://www.w3.org/2001/XMLSchema}string" /></s:elements></s:complextype></s:extensions> net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 0 0 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeRetrive_abstractspubmed_ids0AbstractXML00 This nested workflow was part of Fisher's workflow, but has been reduced in size. It stores the XML files returned by eFetch and, unlike the original workflow, does not need to extract the plain-text abstract. 2013-07-25 14:22:59.512 UTC net.sf.taverna.t2.activitiesdataflow-activity1.4net.sf.taverna.t2.activities.dataflow.DataflowActivitynet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 0 0 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeLookAtWatchActivateTimeLookUp0CurrentTime00 Time flies like an arrow; fruit flies like a banana. This process receives the abstract XML, which triggers a time lookup. The current time is then passed to the next process. 
2013-08-05 11:12:03.855 UTC net.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivity ActivateTimeLookUp 0 text/plain java.lang.String true CurrentTime 0 0 workflow net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateProvenanceResearcherID0MyExperimentID0WorkflowDevelopers0WorkflowVersion0ExtractionDate0MaximumArticles0SearchTerm0StartDate0EndDate0Prov00 Provenance is important when you want to trace back your data. For this reason I added a process in the workflow that will add some basic provenance based on the work of the W3C (www.w3.org). The process adds the following types of provenance: ResearcherID - Use www.researcherid.com to get a ResearcherID. This can then be linked to your research. ExtractionDate - The date and time the article was extracted. MyExperimentID - The MyExperimentID of the workflow used. WorkflowVersion - The version of the workflow used. WorkflowDevelopers - A list of the developers of the workflow. StartDate - The starting date of the article search, see input port for more information. EndDate - The ending date of the article search, see input port for more information. SearchTerm - The original search query, see input port for more information. MaximumArticles - The maximum number of articles that have been searched, see input port for more information. 
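The nine provenance fields listed above can be gathered into a single record before being attached to an abstract. The sketch below shows one possible way to do that in Python. The Provenance element name, the flat child layout, and the function create_provenance are assumptions for illustration only; the actual Beanshell script behind CreateProvenance is not shown in this file, and its output format may differ. The example values are the defaults that appear elsewhere in this workflow.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Field names taken from the input ports of the CreateProvenance process.
FIELDS = [
    "ResearcherID", "ExtractionDate", "MyExperimentID", "WorkflowVersion",
    "WorkflowDevelopers", "StartDate", "EndDate", "SearchTerm", "MaximumArticles",
]


def create_provenance(values):
    """Serialize the provenance fields as one XML element (hypothetical layout)."""
    missing = [name for name in FIELDS if name not in values]
    if missing:
        raise ValueError("missing provenance fields: " + ", ".join(missing))
    prov = ET.Element("Provenance")  # element name is an assumption
    for name in FIELDS:
        ET.SubElement(prov, name).text = str(values[name])
    return ET.tostring(prov, encoding="unicode")


example = {
    "ResearcherID": "H-8686-2013",
    "ExtractionDate": datetime.now(timezone.utc).isoformat(),
    "MyExperimentID": "3659",
    "WorkflowVersion": "5",
    "WorkflowDevelopers": "Sander van Boom and Paul Fisher",
    "StartDate": "1900",
    "EndDate": "2015",
    "SearchTerm": '"Text Mining" AND Leiden',
    "MaximumArticles": "10",
}
prov_xml = create_provenance(example)
print(prov_xml)
```

Failing loudly on a missing field is a deliberate choice in this sketch: a provenance record with silent gaps is hard to trust later.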
2013-08-09 11:21:45.129 UTC net.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivity ResearcherID 0 text/plain java.lang.String true ExtractionDate 0 text/plain java.lang.String true MyExperimentID 0 text/plain java.lang.String true WorkflowVersion 0 text/plain java.lang.String true WorkflowDevelopers 0 text/plain java.lang.String true StartDate 0 text/plain java.lang.String true EndDate 0 text/plain java.lang.String true SearchTerm 0 text/plain java.lang.String true MaximumArticles 0 text/plain java.lang.String true Prov 0 0 workflow net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeWrite_Text_FileoutputFile0filecontents0 This process writes the content of the workflow to files. The location of the file is created in the create file location process. NOTE: You might want to change the working directory. This can be done by changing the CreateFileLocation process. 
2013-08-05 11:04:07.216 UTC net.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivity outputFile 0 'text/plain' java.lang.String true filecontents 0 'text/plain' java.lang.String true encoding 0 'text/plain' java.lang.String true outputFile 0 0 workflow net.sourceforge.taverna.scuflworkers.io.TextFileWriter UserNameHere 2013-08-05 13:18:51.98 UTC net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateListOfArticlesThatNeedExtractingIDsInDatabase0ExtractableIDs0IDsInDatabaseOut11ListOfArticlesThatNeedExtracting11 This Conditional branch splits the workflow in two directions: IDsInDatabase: A list of ID's of articles that are already in the working directory. They should not be extracted. ListOfArticlesThatNeedExtracting - The List of Pubmed ID's that should be extracted and added to the database. 
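The split described above can be pictured as a partition of the PubMed IDs by whether a file for that ID already exists in the workspace. The following Python sketch is an illustration under assumptions: the function names are hypothetical, the file naming follows the [ID of the article].xml convention stated for the Workspace port, and the real Beanshell script of CreateListOfArticlesThatNeedExtracting is not shown in this file.

```python
import os
import tempfile


def check_article(file_location):
    """Return 'File exists' when the file is present, otherwise the location itself."""
    return "File exists" if os.path.isfile(file_location) else file_location


def partition_ids(pmids, workspace):
    """Split IDs into (already in workspace, still need extracting)."""
    in_database, need_extracting = [], []
    for pmid in pmids:
        location = os.path.join(workspace, pmid + ".xml")
        if check_article(location) == "File exists":
            in_database.append(pmid)
        else:
            need_extracting.append(pmid)
    return in_database, need_extracting


# Usage: one article is already stored, the other is not.
with tempfile.TemporaryDirectory() as workspace:
    open(os.path.join(workspace, "20618981.xml"), "w").close()
    known, missing = partition_ids(["20618981", "19134199"], workspace)
    print(known, missing)
```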
2013-08-05 10:15:05.873 UTC net.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivity ExtractableIDs 0 text/plain java.lang.String true IDsInDatabase 0 text/plain java.lang.String true ListOfArticlesThatNeedExtracting 1 1 IDsInDatabaseOut 1 1 workflow net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCheckIfArticleIsInDatabaseExtractableArticles0STDOUT00 This process calls the command line and checks whether the file at the file location exists. If it does, the process returns the string "File exists"; otherwise it returns the file location that was checked. 
2013-08-05 10:58:54.113 UTC net.sf.taverna.t2.activitiesexternal-tool-activity1.4net.sf.taverna.t2.activities.externaltool.ExternalToolActivity 789663B8-DA91-428A-9F7D-B3F3DA185FD4 default local <?xml version="1.0" encoding="UTF-8"?> <localInvocation><shellPrefix>/bin/sh -c</shellPrefix><linkCommand>/bin/ln -s %%PATH_TO_ORIGINAL%% %%TARGET_NAME%%</linkCommand></localInvocation> 5f0528ad-82e5-47c0-99ae-a67cb01e2038 //Check if article is in the database [ -f %%ExtractableArticles%% ] && echo -n "File exists" || echo %%ExtractableArticles%% 1200 1800 ExtractableArticles ExtractableArticles ExtractableArticles false false false UTF-8 false false false false true true 0 false net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeFlatten_Listinputlist2outputlist11 We need to decrease the depth of the list by one level. Otherwise we will get errors in the Validation report. 
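Flattening by one level turns a list of lists (depth 2) into a single list (depth 1) while leaving the items themselves untouched. A minimal Python sketch of that operation is shown here; the function name is hypothetical, and the workflow itself uses Taverna's FlattenList local worker rather than this code.

```python
def flatten_one_level(nested):
    """Remove exactly one level of nesting: [[a, b], [c]] becomes [a, b, c]."""
    return [item for sublist in nested for item in sublist]


# Usage: each inner list holds the IDs produced by one iteration.
nested_ids = [["20618981"], ["19134199", "11825149"], []]
print(flatten_one_level(nested_ids))
```

Only one level is removed, so any deeper nesting inside the items is preserved.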
2013-08-05 10:15:47.895 UTC net.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivity inputlist 2 l(l('')) [B true outputlist 1 l('') 1 workflow org.embl.ebi.escience.scuflworkers.java.FlattenList net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateFileLocation_2PubmedID0Workspace0FileLocation00 This process creates the location of the file. This can then be used to check whether the file exists in the next process (CheckIfArticleIsInDatabase). NOTE: if you want to change the working directory, please change this process so it will link to the correct directory. 
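Building the file location amounts to joining the Workspace value with the PubMed ID and the .xml extension, following the [ID of the article].xml naming described for the Workspace port. The Python sketch below illustrates this; the function name is hypothetical, POSIX-style paths are assumed because the workflow invokes /bin/sh, and the actual Beanshell script of this process is not shown in this file.

```python
import posixpath


def create_file_location(workspace, pubmed_id):
    """Join the workspace directory and '<PubMed ID>.xml' into one POSIX path."""
    if not pubmed_id.isdigit():
        raise ValueError("PubMed ID should contain digits only: %r" % pubmed_id)
    return posixpath.join(workspace, pubmed_id + ".xml")


# Usage with the default workspace of this workflow.
print(create_file_location("/tmp/", "20618981"))
```

Using a path join rather than plain string concatenation means the result is the same whether or not the workspace value ends with a slash.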
2013-08-05 11:04:34.562 UTC net.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivity PubmedID 0 text/plain java.lang.String true Workspace 0 text/plain java.lang.String true FileLocation 0 0 workflow net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeMyExperimentID_valuevalue00 This value stores the MyExperiment ID of the workflow. If you reupload this workflow with improvements feel free to change this value. 2013-08-05 11:15:01.977 UTC net.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivity 3659 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeWorkflowDevelopers_valuevalue00 Sander van Boom and Paul Fisher created this workflow. If you've changed this workflow and uploaded it on myExperiment feel free to add your name in this variable as well. 
2013-08-05 11:13:38.464 UTC net.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivity Sander van Boom and Paul Fisher net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeWorkflowVersion_valuevalue00 This value stores the current version of the workflow. 2013-08-05 11:14:06.878 UTC net.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivity 5 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeAddProvenanceProvenance0Abstract0AbstractWithProvenance00 This process fuses the provenance, the found abstracts and the header of the file. The output is also sent to an output port so the values can be checked. 
2013-08-06 09:54:37.903 UTC net.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivity Provenance 0 text/plain java.lang.String true Abstract 0 text/plain java.lang.String true AbstractWithProvenance 0 0 workflow net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeXPath_Servicexml_text0nodelistAsXML11 This XPath Service removes the header from the file. This is because we want to add provenance to the file later in the workflow. After we've added the provenance then we add the header back to the file. 
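Removing the header here means selecting the PubmedArticleSet element inside eFetchResult and discarding the XML declaration and the outer wrapper, using the expression /default:eFetchResult/default:PubmedArticleSet with the default prefix bound to the efetch_pubmed namespace. The Python sketch below illustrates that selection under assumptions: the sample document is a heavily trimmed, hypothetical stand-in for a real eFetch response, and strip_header is a made-up name. The workflow itself uses Taverna's XPath activity.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace binding, matching the one configured for the XPath service.
NS = {"default": "http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed"}

# Hypothetical, trimmed eFetch response.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<eFetchResult xmlns="http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed">
  <PubmedArticleSet>
    <PubmedArticle><MedlineCitation><PMID Version="1">20618981</PMID></MedlineCitation></PubmedArticle>
  </PubmedArticleSet>
</eFetchResult>"""


def strip_header(xml_text):
    """Return the PubmedArticleSet subtree as text, without declaration or wrapper."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "{%s}eFetchResult" % NS["default"]:
        raise ValueError("unexpected root element: " + root.tag)
    article_set = root.find("default:PubmedArticleSet", NS)
    if article_set is None:
        raise ValueError("no PubmedArticleSet element found")
    return ET.tostring(article_set, encoding="unicode")


fragment = strip_header(SAMPLE)
print(fragment)
```

ElementTree rewrites namespace prefixes when serializing, so the fragment is equivalent XML rather than a byte-for-byte slice of the input.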
2013-08-06 09:53:59.709 UTC net.sf.taverna.t2.activitiesxpath-activity1.4net.sf.taverna.t2.activities.xpath.XPathActivity <?xml version="1.0" encoding="UTF-8"?> <eFetchResult xmlns="http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed"> <PubmedArticleSet> <PubmedArticle> <MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE"> <PMID Version="1">20618981</PMID> <DateCreated> <Year>2010</Year> <Month>07</Month> <Day>12</Day> </DateCreated> <DateCompleted> <Year>2012</Year> <Month>10</Month> <Day>02</Day> </DateCompleted> <DateRevised> <Year>2012</Year> <Month>11</Month> <Day>09</Day> </DateRevised> <Article PubModel="Electronic"> <Journal> <ISSN IssnType="Electronic">2041-1480</ISSN> <JournalIssue CitedMedium="Internet"> <Volume>1</Volume> <Issue>1</Issue> <PubDate> <Year>2010</Year> </PubDate> </JournalIssue> <Title>Journal of biomedical semantics</Title> <ISOAbbreviation>J Biomed Semantics</ISOAbbreviation> </Journal> <ArticleTitle>Rewriting and suppressing UMLS terms for improved biomedical term identification.</ArticleTitle> <Pagination> <MedlinePgn>5</MedlinePgn> </Pagination> <ELocationID EIdType="doi" ValidYN="Y">10.1186/2041-1480-1-5</ELocationID> <Abstract> <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Identification of terms is essential for biomedical text mining.. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact on the number of terms identified by the different rules on a MEDLINE corpus. 
The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms together with a sample of 100 randomly selected terms were evaluated for every rule.</AbstractText> <AbstractText Label="RESULTS" NlmCategory="RESULTS">Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms and seven out of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we were able to identify 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without the rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, which is an increase of 2.8% in the number of terms and an increase of 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size. 7,397 terms were suppressed in the corpus.</AbstractText> <AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is to be used for biomedical term identification in MEDLINE. A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.</AbstractText> </Abstract> <Affiliation>Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands. 
k.hettne@erasmusmc.nl.</Affiliation> <AuthorList CompleteYN="Y"> <Author ValidYN="Y"> <LastName>Hettne</LastName> <ForeName>Kristina M</ForeName> <Initials>KM</Initials> </Author> <Author ValidYN="Y"> <LastName>van Mulligen</LastName> <ForeName>Erik M</ForeName> <Initials>EM</Initials> </Author> <Author ValidYN="Y"> <LastName>Schuemie</LastName> <ForeName>Martijn J</ForeName> <Initials>MJ</Initials> </Author> <Author ValidYN="Y"> <LastName>Schijvenaars</LastName> <ForeName>Bob Ja</ForeName> <Initials>BJ</Initials> </Author> <Author ValidYN="Y"> <LastName>Kors</LastName> <ForeName>Jan A</ForeName> <Initials>JA</Initials> </Author> </AuthorList> <Language>eng</Language> <PublicationTypeList> <PublicationType>Journal Article</PublicationType> </PublicationTypeList> <ArticleDate DateType="Electronic"> <Year>2010</Year> <Month>03</Month> <Day>31</Day> </ArticleDate> </Article> <MedlineJournalInfo> <Country>England</Country> <MedlineTA>J Biomed Semantics</MedlineTA> <NlmUniqueID>101531992</NlmUniqueID> </MedlineJournalInfo> <CommentsCorrectionsList> <CommentsCorrections RefType="Cites"> <RefSource>BMC Bioinformatics. 2009;10:14</RefSource> <PMID Version="1">19134199</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Proc AMIA Symp. 2001;:17-21</RefSource> <PMID Version="1">11825149</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Pharmacogenomics. 2007 Nov;8(11):1521-34</RefSource> <PMID Version="1">18034617</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Brief Bioinform. 2007 Sep;8(5):358-75</RefSource> <PMID Version="1">17977867</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Bioinformatics. 2005 Sep 1;21 Suppl 2:ii259-67</RefSource> <PMID Version="1">16204115</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Proteomics. 
2007 Mar;7(6):921-31</RefSource> <PMID Version="1">17370270</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>J Biomed Discov Collab. 2007;2:2</RefSource> <PMID Version="1">17480215</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>IEEE Trans Nanobioscience. 2007 Mar;6(1):51-9</RefSource> <PMID Version="1">17393850</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>BMC Bioinformatics. 2007;8:14</RefSource> <PMID Version="1">17233900</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Drug Discov Today. 2006 Apr;11(7-8):315-25</RefSource> <PMID Version="1">16580973</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Brief Bioinform. 2005 Mar;6(1):57-71</RefSource> <PMID Version="1">15826357</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>J Biomed Inform. 2004 Dec;37(6):512-26</RefSource> <PMID Version="1">15542023</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>PLoS Biol. 2004 Nov;2(11):e309</RefSource> <PMID Version="1">15383839</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Neurology. 1996 Sep;47(3):809-10</RefSource> <PMID Version="1">8797484</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Perspect Biol Med. 1988 Summer;31(4):526-57</RefSource> <PMID Version="1">3075738</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Perspect Biol Med. 1986 Autumn;30(1):7-18</RefSource> <PMID Version="1">3797213</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Genome Biol. 2004;5(6):R43</RefSource> <PMID Version="1">15186494</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Nucleic Acids Res. 
2004 Jan 1;32(Database issue):D267-70</RefSource> <PMID Version="1">14681409</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>J Am Med Inform Assoc. 2003 May-Jun;10(3):252-9</RefSource> <PMID Version="1">12626374</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Proc AMIA Symp. 2002;:504-8</RefSource> <PMID Version="1">12463875</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Pac Symp Biocomput. 2003;:451-62</RefSource> <PMID Version="1">12603049</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Proc AMIA Symp. 2002;:727-31</RefSource> <PMID Version="1">12463920</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Genome Biol. 2002 Sep 13;3(10):RESEARCH0055</RefSource> <PMID Version="1">12372143</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>Proc AMIA Symp. 2001;:448-52</RefSource> <PMID Version="1">11825228</PMID> </CommentsCorrections> <CommentsCorrections RefType="Cites"> <RefSource>BMC Bioinformatics. 
2007;8 Suppl 9:S5</RefSource> <PMID Version="1">18047706</PMID> </CommentsCorrections> </CommentsCorrectionsList> <OtherID Source="NLM">PMC2895736</OtherID> </MedlineCitation> <PubmedData> <History> <PubMedPubDate PubStatus="received"> <Year>2009</Year> <Month>7</Month> <Day>15</Day> </PubMedPubDate> <PubMedPubDate PubStatus="accepted"> <Year>2010</Year> <Month>3</Month> <Day>31</Day> </PubMedPubDate> <PubMedPubDate PubStatus="aheadofprint"> <Year>2010</Year> <Month>3</Month> <Day>31</Day> </PubMedPubDate> <PubMedPubDate PubStatus="entrez"> <Year>2010</Year> <Month>7</Month> <Day>13</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus="pubmed"> <Year>2010</Year> <Month>7</Month> <Day>14</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus="medline"> <Year>2010</Year> <Month>7</Month> <Day>14</Day> <Hour>6</Hour> <Minute>1</Minute> </PubMedPubDate> </History> <PublicationStatus>epublish</PublicationStatus> <ArticleIdList> <ArticleId IdType="pii">2041-1480-1-5</ArticleId> <ArticleId IdType="doi">10.1186/2041-1480-1-5</ArticleId> <ArticleId IdType="pubmed">20618981</ArticleId> <ArticleId IdType="pmc">PMC2895736</ArticleId> </ArticleIdList> </PubmedData> </PubmedArticle> </PubmedArticleSet> </eFetchResult> /default:eFetchResult/default:PubmedArticleSet default http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeFlatten_List_2inputlist2outputlist11 We need 
to decrease the depth of the list by one level. Otherwise we will get errors in the Validation report. 2013-08-05 10:15:47.895 UTC net.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivity inputlist 2 l(l('')) [B true outputlist 1 l('') 1 workflow org.embl.ebi.escience.scuflworkers.java.FlattenList net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateFileLocation_2_2PubmedID0Workspace0FileLocation00 This process creates the location of the file. This can then be used to check whether the file exists in the next process (CheckIfArticleIsInDatabase). NOTE: if you want to change the working directory, please change this process so that it points to the correct directory. 
2013-08-05 11:04:34.562 UTC net.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivity PubmedID 0 text/plain java.lang.String true Workspace 0 text/plain java.lang.String true FileLocation 0 0 workflow net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeInformationExtractionAndSolrImportAbstractLocation0Workspace0PathToPostJar0SolrImport_STDOUT11SolrImport_STDERR11OutputFileLocation11 Read a file, extract its content, extract the PubMed ID from the abstract, and write the file back to a new workspace. Then import it into Solr. NOTE: Make sure that Solr is installed and that the variable pathToPostJar points to the correct path of post.jar. BONUS NOTE: If you want Solr to detect more than just the title and the ID, you should add extra XPaths and update the Solr schema accordingly. 
2013-08-09 10:44:01.563 UTC net.sf.taverna.t2.activitiesdataflow-activity1.4net.sf.taverna.t2.activities.dataflow.DataflowActivitynet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokepathToPostJarvalue00 This is the path to the post.jar that Solr uses to import its documents. NOTE: Please change this variable to your Solr directory. 2013-08-05 11:27:45.222 UTC net.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivity /run/media/sander/Second Space/Downloads/Solaria/solr-4.4.0/example/exampledocs/post.jar net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 
net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeextractPMIDxpathxpathvalueextractPMIDxml-textrun_eSearchparametersrun_eSearchparametersparametersXML_eFecthoutputparametersXML_eFecthdbpubmed_databasevalueparametersXML_eFecthtermsearch_termparametersXML_eFecthmaxdateend_dateparametersXML_eFecthmindatestart_dateparametersXML_eFecthRetMaxmaximum_articlesRetrive_abstractspubmed_idsFlatten_ListoutputlistLookAtWatchActivateTimeLookUpRetrive_abstractsAbstractXMLCreateProvenanceResearcherIDResearcherIDCreateProvenanceMyExperimentIDMyExperimentID_valuevalueCreateProvenanceWorkflowDevelopersWorkflowDevelopers_valuevalueCreateProvenanceWorkflowVersionWorkflowVersion_valuevalueCreateProvenanceExtractionDateLookAtWatchCurrentTimeCreateProvenanceMaximumArticlesmaximum_articlesCreateProvenanceSearchTermsearch_termCreateProvenanceStartDatestart_dateCreateProvenanceEndDateend_dateWrite_Text_FileoutputFileCreateFileLocation_2FileLocationWrite_Text_FilefilecontentsAddProvenanceAbstractWithProvenanceCreateListOfArticlesThatNeedExtractingIDsInDatabaseCheckIfArticleIsInDatabaseSTDOUTCreateListOfArticlesThatNeedExtractingExtractableIDsextractPMIDnodelistCheckIfArticleIsInDatabaseExtractableArticlesCreateFileLocation_2_2FileLocationFlatten_ListinputlistCreateListOfArticlesThatNeedExtractingListOfArticlesThatNeedExtractingCreateFileLocation_2PubmedIDFlatten_ListoutputlistCreateFileLocation_2WorkspaceWorkspaceAddProvenanceProvenanceCreateProvenanceProvAddProvenanceAbstractFlatten_List_2outputlistXPath_Servicexml_textRetrive_abstractsAbstractXMLFlatten_List_2inputlistXPath_ServicenodelistAsXMLCreateFileLocation_2_2PubmedIDextractPMIDnodelistCreateFileLocation_2_2WorkspaceWorkspaceInformationExtractionAndSolrImportAbstractLocationCreateFileLocation_2FileLocationInformationExtractionAndSolrImportWorkspaceSolrWorkspaceInformationExtractionAndSolrImportPathToPostJarpathToPostJarvalueSolrImport_STDERRInformationExtractionAndSolrImportSolrIm
port_STDERRSolrImport_STDOUTInformationExtractionAndSolrImportSolrImport_STDOUTIDsInDatabaseCreateListOfArticlesThatNeedExtractingIDsInDatabaseOutIDsBeingSearchedFlatten_ListoutputlistAbstractWithProvAddProvenanceAbstractWithProvenanceIndexedFileLocationInformationExtractionAndSolrImportOutputFileLocation 28bf3d16-934e-44fc-94de-7c87f4c41030 2013-08-05 11:35:21.859 UTC c1e30222-30f4-48ec-a194-ad5e34c92047 2013-08-02 13:18:41.739 UTC a56da4d8-cc40-45e8-a6bc-ee99d799df8d 2013-08-01 14:51:39.516 UTC 9d0a018d-a88e-4e9e-91a8-6484c9be0cc3 2013-08-01 15:04:44.157 UTC 633f3632-d8e8-471c-bdc5-c14f0845c256 2013-08-01 13:59:58.6 UTC 0be07374-fa4b-4ca7-b87a-99f72155aaf1 2013-08-05 09:49:14.166 UTC 34cec394-b4d5-467d-97e3-80576202b8a1 2013-07-31 12:08:28.188 UTC abc3b001-f660-4424-b7e4-b84048b890a0 2013-08-05 09:42:19.127 UTC ca16d958-1f9e-4f5a-a104-d6627217f307 2013-08-02 15:07:25.897 UTC 251f2cb5-f496-464a-92e0-0b3af747c34d 2013-08-05 11:04:37.153 UTC 695fd90a-fb08-426c-b068-c1f77156cb03 2013-08-05 11:12:07.214 UTC 510c4ef5-5b98-4384-9ace-9e8c9fe3c368 2013-08-05 11:02:05.834 UTC c072e496-2000-4d17-b0b0-7f86fdc7b9e7 2013-08-01 15:06:06.726 UTC dc9b637d-68ca-4940-9e17-ead4d1a19e32 2013-07-31 11:54:07.584 UTC 30555d6c-da87-42c9-9a70-1b2cfe37bbe9 2013-08-02 14:49:45.84 UTC e7bcf406-5544-47a9-8a6a-3e02f8233be2 2013-07-25 14:23:04.289 UTC e1a018c8-4dd3-40ab-966e-0fd10101000a 2013-08-09 10:44:05.29 UTC 9b4e81ed-3c43-4f5d-b99a-73b809992b4b 2013-08-05 12:54:44.331 UTC 4918d574-774c-4973-a9af-06bb0f7e50ef 2013-07-30 11:25:13.262 UTC a8d7cf24-2934-4c30-885f-26f93d0e2d20 2013-08-05 11:14:09.606 UTC a98f054d-1840-42f7-9288-d89a7fee337d 2013-08-05 12:31:05.871 UTC bece0fe1-5a8e-4de6-8547-c7f3e1795cd8 2013-08-02 13:09:26.124 UTC 3436ce86-a526-4522-a33f-3a688a0dc4d9 2013-08-05 13:20:01.209 UTC 87d5379e-e1d3-4ba5-af58-b1d0f3be7f6b 2013-08-05 12:24:28.396 UTC 021d14b7-f71d-465a-8cc2-40f249414539 2013-08-05 10:17:22.778 UTC 390037a5-901d-4f4f-914d-9fc35011f599 2013-08-07 12:11:31.536 UTC 
ed01feeb-8f26-491a-b98a-dbd55b6082b9 2013-08-01 13:01:27.520 UTC bbfed93e-4f84-439d-8a03-ca9a58d25543 2013-08-05 13:24:20.215 UTC 9ede5928-dd21-4a22-89cc-5441148ff588 2013-08-02 12:30:24.566 UTC bb33b0f9-cbd8-4ee1-8e35-04d3ed983264 2013-08-02 12:52:58.971 UTC 16d30de7-778d-4ff2-971f-086bfd372210 2013-08-01 13:42:26.816 UTC e9a2e3d2-eb50-434b-b2a0-a51b537f03a6 2013-08-05 11:31:30.521 UTC d0e7b42b-b6e5-438b-8702-c5b573c9a3eb 2013-08-09 11:24:12.493 UTC 57da9a79-91d7-49e5-bc17-720a5f8c027b 2013-08-06 10:14:16.630 UTC 78039f41-0953-47fd-b758-f53eb52df92a 2013-08-05 13:34:38.718 UTC 16fb9c33-5aab-4350-9bde-dfde5167133f 2013-08-06 09:51:47.305 UTC a193e3d6-a485-4073-a1b6-5c490a3260dd 2013-07-25 13:41:59.997 UTC PubMed Search and Solr storage 2013-07-25 14:00:44.958 UTC b8446a0f-abaf-4610-961d-30b6b46409dd 2013-07-25 13:29:52.623 UTC 801a5b11-eaa4-4965-93e1-25e2c6a87d82 2013-08-09 11:12:55.156 UTC c60821fa-1326-45cf-bc1a-798a2325ed10 2013-08-05 12:08:59.769 UTC 85ccb2cb-b20c-43e6-b9bd-f13b9bc92037 2013-07-29 10:19:36.356 UTC 4fbee505-4970-4393-9a62-d3b7759f0b11 2013-07-29 14:12:17.418 UTC 1ea94807-433d-4a4f-88da-dc557bb7138b 2013-08-01 14:47:21.302 UTC 4dfbabb5-9197-451e-9e92-02fae79aa4d3 2013-08-02 13:23:00.602 UTC 44ed3c5b-7b91-4d4f-b5e6-52c3df127011 2013-08-02 12:24:49.468 UTC 64c26331-190a-475a-a16e-d070aa7c391a 2013-08-05 11:29:58.399 UTC 77d1a1a0-6a43-4450-a94b-0575c0243d78 2013-08-02 14:35:40.916 UTC 68df372d-6140-41a8-bb0e-9089e490233d 2013-07-25 13:33:03.208 UTC 8b5bbbda-bb25-4e31-9d19-94cb4df65c00 2013-08-02 13:37:45.695 UTC f236f9b4-5c12-45c6-9884-98676e5f9837 2013-08-01 15:02:27.252 UTC 1575312b-b90b-4a95-ae15-ac3cec5bbb44 2013-07-31 11:45:02.645 UTC ea73a848-e4d9-444a-b3c3-bc97e006d585 2013-08-02 15:03:50.411 UTC 8113db84-5222-48bf-95e9-a86c2df01e4b 2013-08-02 14:53:12.557 UTC cb41787b-8927-4ff8-ad2a-4ec50dc1027d 2013-08-01 14:41:46.463 UTC 015db7bf-1fd9-4f19-b570-b3e9c21aba27 2013-08-06 09:45:47.778 UTC 67e6dc72-e6f0-4334-8560-cb2fee160540 2013-08-05 
11:13:42.734 UTC 77aabeaf-d02f-4bda-a291-e82790295d05 2013-08-01 15:07:27.808 UTC 9b77a50d-baf5-4936-9455-4db8754a1e1e 2013-08-05 10:58:56.293 UTC 778bd18c-70ea-4831-a8ff-ca222bcdd3ab 2011-02-03 15:12:56.329 UTC 808bf10f-79cf-4ad8-b7c4-0915567296b2 2013-08-05 11:24:50.935 UTC 8a657ece-8722-428c-8eff-ce1281e159ee 2013-07-25 14:15:07.256 UTC 50392d9f-9021-4655-86d6-4e6c628232c8 2013-08-02 12:35:21.621 UTC 104da24c-03a3-4941-a6c3-3fdee4b04af6 2013-07-31 12:12:55.498 UTC 804a0584-07a7-4334-9f13-4d25ba89ce7f 2013-08-05 11:37:59.345 UTC 238c9aa4-2842-4dec-ba40-8a9a422ca87c 2013-08-08 14:47:13.913 UTC 9deb185a-d5b3-421d-b8b7-867f535b07f0 2013-08-01 14:29:45.250 UTC 3dd2a242-b51a-4924-b15a-7acf1e86dd8e 2013-08-01 14:57:49.545 UTC 6ad6a1a5-d0b0-4bae-9aad-50271417b8b6 2013-08-01 12:57:48.123 UTC 3ae47817-e0df-46c3-a9a9-f4d38e57020b 2013-08-05 13:25:13.232 UTC 0bd54491-9f34-45ae-8e1f-cad023f20c1a 2013-08-05 10:15:51.297 UTC f482df22-fa2d-4b69-b07b-e733ec99ccf2 2013-08-01 14:37:39.664 UTC b23e87c9-c8f7-4857-bcb2-5b649f85135c 2013-08-01 13:03:41.141 UTC a8332100-0556-4a8f-adbc-39e0eb500c93 2011-02-03 14:54:58.829 UTC 8aa0e942-dbe9-423c-b4e2-cf414400860d 2013-08-01 14:02:46.310 UTC a8090a72-ef06-480f-b430-d231cf3ccf56 2013-08-05 09:44:47.503 UTC 6d06e03f-e72a-409e-8168-0db7e7e8c31a 2013-07-31 12:21:00.201 UTC 4785d728-b29b-4bf5-8095-915f27192628 2013-07-25 13:37:21.944 UTC c85b29cd-b254-42bc-98e8-b75fc4c245bf 2013-08-05 09:49:41.933 UTC 1c28bdfa-d41d-4311-9215-10bc571fc76d 2013-07-31 12:10:50.446 UTC a2d64b2d-305a-44ed-9c3e-4dbcf575f931 2013-07-31 12:06:02.75 UTC cfe5cfd0-557f-442c-8760-87b0ae1b306f 2013-08-01 14:53:26.552 UTC 027cfd43-b69c-4173-a28d-967f76995324 2013-08-02 14:16:52.758 UTC 27614fd7-b142-4335-9d64-137f112703c3 2013-08-02 15:11:19.668 UTC 5d01758b-4938-4141-9066-23da04f57ce5 2013-08-01 13:36:56.385 UTC dbc96ad2-018f-473d-acca-3e33b44cc59d 2013-08-05 09:51:01.133 UTC d7dcb486-3934-4b45-8264-8db539865745 2013-07-30 11:29:45.552 UTC 
d52ae246-a8f4-4695-953d-ce977b5200c5 2013-08-05 12:37:06.980 UTC fac3b253-9743-4ea5-b040-b430ebd35e0a 2013-08-07 12:16:19.171 UTC 73c90213-5465-49e8-ae2c-75ea43271545 2013-08-05 11:28:10.560 UTC 23c02cd5-ebae-4ec3-97e0-ab0e46a4bb17 2013-08-01 13:44:48.519 UTC 559e9841-0893-4caa-b1c3-c9aebf0adc4c 2013-08-08 14:55:49.964 UTC 266c2de9-b756-4915-8269-f8411f0121f4 2013-08-05 13:07:27.415 UTC fa1ba2b9-8f0d-4e15-9f5d-0e67993cb458 2013-08-06 09:54:40.78 UTC da4a0084-8d34-48e9-a1f6-b105367a8f6e 2013-07-25 13:38:55.807 UTC bcdf7647-6a41-4abb-a49e-fa052111a1be 2013-08-02 13:15:15.722 UTC ee5896c5-1953-4343-a44d-6236f12dd9a7 2013-08-06 11:05:41.319 UTC 8bc7e6bb-8060-44fa-a31c-9322ab44c3ac 2013-08-02 13:31:18.812 UTC 72ea5398-092c-4313-99a8-cb24da36d6fb 2013-08-05 13:10:02.979 UTC 9a5233bb-b5f4-4a82-9ad5-91b6faa2df3d 2013-08-05 11:23:06.510 UTC 2bcf5ab4-af3e-40ab-947d-fcffdcac302a 2013-08-01 13:08:59.891 UTC 4e6de8f6-4f85-4086-9cac-55825bd4f590 2013-08-01 15:10:45.802 UTC 7f0e1834-99b1-4607-99bf-1d5a9cca664a 2013-08-02 14:58:59.430 UTC c2a26e6c-36a3-450d-9441-0051bc9a3e55 2013-08-02 13:13:36.645 UTC 7ee85d66-b88a-4f43-a944-a4fea254b745 2013-08-05 10:15:14.703 UTC 3806ac74-1442-43cf-89a1-ca61a0a63a3a 2013-08-05 13:47:05.557 UTC ca415cd3-750b-44f6-8cf6-b0fff3eb9e92 2013-08-01 13:10:44.925 UTC 9618806c-90c4-40ba-b004-5cdf4725b4a1 2013-07-25 13:28:42.189 UTC 9b9b6151-edc6-49ea-b7e5-8da5b55cb319 2013-07-29 14:09:55.421 UTC dcf224e9-c257-410c-aa56-1cb13d083194 2013-08-06 09:48:48.376 UTC 095a1912-763f-46f5-ae15-d2a16b7f37fe 2013-08-05 09:54:17.750 UTC ba4fc1f4-0624-4eac-af9b-f970db6d3796 2013-08-01 13:40:43.128 UTC a2c9cb5d-4826-4e75-b26d-551c65081b2f 2013-08-08 15:00:16.153 UTC 8b5bbbda-bb25-4e31-9d19-94cb4df65c00 2013-08-02 13:29:07.675 UTC 8b6bcd9f-4030-4b67-b196-838fe332a12b 2013-08-05 13:13:23.595 UTC 9ea869b9-2a18-4725-905d-933f8f8254d5 2013-07-25 14:03:56.306 UTC d5491639-3bc5-4208-8f03-683820f123db 2013-07-29 14:08:38.490 UTC 5c47732e-3d43-4224-b5a8-ba5110575135 2013-08-05 
13:36:21.297 UTC ca3f0ef7-8cb8-48ee-b8cf-6684b3bbe058 2013-07-25 13:35:00.414 UTC a7966040-5302-4ad2-82af-3ce18e3b417f 2013-08-01 15:14:48.394 UTC 0d522cd9-e99e-4311-845a-f24b8545c21a 2013-08-01 14:30:45.677 UTC a353a37a-6536-4b73-8ce2-c7a378332347 2013-08-09 10:55:20.254 UTC ac6a2084-57f0-487c-8e6b-ef283bcc571f 2013-08-02 13:01:59.325 UTC 3c5ec6a8-e649-4019-a7b5-deafc259131f 2013-08-09 10:46:31.299 UTC 454a4c58-5ea2-4538-87e2-1b7bd5d26f30 2013-08-05 13:42:51.628 UTC Sander van Boom, Paul Fisher 2013-07-25 14:00:21.554 UTC 1a45a255-b1ce-4e06-be03-ce4025991f3f 2013-08-08 14:45:19.731 UTC 0f73bf4a-63e2-45a0-ac4e-00115b8976eb 2013-08-05 12:58:28.131 UTC 8f5ee0fb-adfd-4b78-b0c4-be530f4c43f8 2013-08-02 14:29:28.785 UTC Based on the work of Fisher: this workflow takes a search term, passes it to the eSearch function, and searches for it in PubMed. I extended it by removing outputs and text extraction, and added an automatic Solr storage process that uses a post.jar specified by the user. Before running this workflow, make sure that a Solr server is up and running and that the variable attached to the SolrImport process contains the correct path. 
Dependencies: - Solr 2013-07-25 14:03:50.958 UTC 20efc1da-dbf9-495b-876f-d2df86c32a63 2013-07-31 12:18:13.833 UTC 276bd7a3-3b53-4cb4-b2f7-f7fe7cb781ba 2013-08-02 14:39:41.720 UTC 3713d8f2-2fb5-4d4d-ad33-2f3bd787d2d1 2013-08-01 12:55:09.939 UTC 3498bc34-1c89-4aaf-b542-c3669568ea42 2013-07-30 11:27:57.638 UTC d54ddd8a-3a89-4913-b445-4289983e09cb 2013-08-05 13:03:51.185 UTC 7e2a22e9-5f80-41f0-9488-53f179b69c8f 2013-07-30 11:23:16.662 UTC 412a298e-90f7-40c9-a04e-33dce4cfebba 2013-07-29 10:20:56.876 UTC 868c69db-94c6-43a4-b23f-fb0a349a7832 2013-08-05 14:01:35.954 UTC 7c44ad63-2295-4240-8444-96c392b8f3ae 2013-07-25 14:08:16.582 UTC f9dfce1f-40b1-4bd2-a61e-848fbbc78bf3 2013-08-07 12:40:15.762 UTC 3439bdc0-268a-4e97-9a48-a63ab1ab3a0d 2013-08-05 12:47:50.761 UTC 34f05c20-4402-4ccd-b46f-d394a483b89c 2013-08-01 13:24:52.947 UTC XML_extraction_and_SAbstractLocation00 This is the location of the chosen abstract. 2013-08-09 11:12:32.861 UTC Workspace00 /tmp/ 2013-08-09 11:11:59.799 UTC This is where the workflow will store the files that are being indexed in Solr. 2013-08-09 11:12:19.929 UTC PathToPostJar00 /tmp/solr/example/exampledocs/post.jar 2013-08-09 11:11:48.834 UTC This is the path to the example Post.jar. 2013-08-09 11:11:25.982 UTC OutputFileLocationSolrImport_STDERRSolrImport_STDOUTExtractPubmedIDxml_text0nodelist11 This XPath extracts the PubmedID from the abstract. 
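The PubmedID extraction can be reproduced outside Taverna. The following is a minimal Python sketch, not the workflow's own implementation: it assumes the namespace URI and the `default` prefix that appear in the XPath configuration of this workflow, and it uses a shortened sample modelled on the eFetch results embedded here. The function name `extract_pmids` and the constant `SAMPLE` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace binding as configured in the workflow's XPath activities.
NS = {"default": "http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed"}

# Shortened sample; the real eFetch result carries many more elements.
SAMPLE = """<eFetchResult xmlns="http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed">
  <PubmedArticleSet>
    <PubmedArticle>
      <MedlineCitation Owner="NLM" Status="In-Data-Review">
        <PMID Version="1">23918963</PMID>
        <CommentsCorrectionsList>
          <CommentsCorrections RefType="Cites">
            <PMID Version="1">17370270</PMID>
          </CommentsCorrections>
        </CommentsCorrectionsList>
      </MedlineCitation>
    </PubmedArticle>
  </PubmedArticleSet>
</eFetchResult>"""


def extract_pmids(xml_text):
    """Return the PMID of every article, ignoring PMIDs of cited references."""
    root = ET.fromstring(xml_text)  # root element is eFetchResult
    # Same steps as the workflow's XPath, written relative to the root element.
    path = ("default:PubmedArticleSet/default:PubmedArticle/"
            "default:MedlineCitation/default:PMID")
    return [node.text for node in root.findall(path, NS)]


if __name__ == "__main__":
    print(extract_pmids(SAMPLE))
```

Because the path names each step explicitly, PMID elements nested under CommentsCorrections are not matched; only the article's own identifier is returned.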
2013-08-09 11:08:37.672 UTC net.sf.taverna.t2.activitiesxpath-activity1.4net.sf.taverna.t2.activities.xpath.XPathActivity <?xml version="1.0" encoding="UTF-8"?> <eFetchResult xmlns="http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed"> <WorkflowRun> <ResearcherID>H-8686-2013</ResearcherID> <ArticleExtractionDate>Wed Aug 07 15:04:04 CEST 2013</ArticleExtractionDate> <WorkflowMyExperimentID>3659</WorkflowMyExperimentID> <WorkflowVersion>3</WorkflowVersion> <WorkflowDevelopers>Sander van Boom and Paul Fisher</WorkflowDevelopers> <WorkflowInputs> <StartDate/> <EndDate/> <SearchTerm>proteomics</SearchTerm> <MaximumArticles>10</MaximumArticles> </WorkflowInputs> </WorkflowRun> <PubmedArticleSet> <PubmedArticle> <MedlineCitation Owner="NLM" Status="In-Data-Review"> <PMID Version="1">23918963</PMID> <DateCreated> <Year>2013</Year> <Month>08</Month> <Day>06</Day> </DateCreated> <Article PubModel="Print"> <Journal> <ISSN IssnType="Electronic">1460-2431</ISSN> <JournalIssue CitedMedium="Internet"> <Volume>64</Volume> <Issue>11</Issue> <PubDate> <Year>2013</Year> <Month>Aug</Month> </PubDate> </JournalIssue> <Title>Journal of experimental botany</Title> <ISOAbbreviation>J. Exp. Bot.</ISOAbbreviation> </Journal> <ArticleTitle>Leaf proteome alterations in the context of physiological and morphological responses to drought and heat stress in barley (Hordeum vulgare L.).</ArticleTitle> <Pagination> <MedlinePgn>3201-12</MedlinePgn> </Pagination> <ELocationID EIdType="doi" ValidYN="Y">10.1093/jxb/ert158</ELocationID> <Abstract> <AbstractText>The objective of this study was to identify barley leaf proteins differentially regulated in response to drought and heat and the combined stresses in context of the morphological and physiological changes that also occur. The Syrian landrace Arta and the Australian cultivar Keel were subjected to drought, high temperature, or a combination of both treatments starting at heading. 
Changes in the leaf proteome were identified using differential gel electrophoresis and mass spectrometry. The drought treatment caused strong reductions of biomass and yield, while photosynthetic performance and the proteome were not significantly changed. In contrast, the heat treatment and the combination of heat and drought reduced photosynthetic performance and caused changes of the leaf proteome. The proteomic analysis identified 99 protein spots differentially regulated in response to heat treatment, 14 of which were regulated in a genotype-specific manner. Differentially regulated proteins predominantly had functions in photosynthesis, but also in detoxification, energy metabolism, and protein biosynthesis. The analysis indicated that de novo protein biosynthesis, protein quality control mediated by chaperones and proteases, and the use of alternative energy resources, i.e. glycolysis, play important roles in adaptation to heat stress. In addition, genetic variation identified in the proteome, in plant growth and photosynthetic performance in response to drought and heat represent stress adaption mechanisms to be exploited in future crop breeding efforts.</AbstractText> </Abstract> <Affiliation>Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Köln, Germany.</Affiliation> <AuthorList CompleteYN="Y"> <Author ValidYN="Y"> <LastName>Rollins</LastName> <ForeName>J A</ForeName> <Initials>JA</Initials> </Author> <Author ValidYN="Y"> <LastName>Habte</LastName> <ForeName>E</ForeName> <Initials>E</Initials> </Author> <Author ValidYN="Y"> <LastName>Templer</LastName> <ForeName>S E</ForeName> <Initials>SE</Initials> </Author> <Author ValidYN="Y"> <LastName>Colby</LastName> <ForeName>T</ForeName> <Initials>T</Initials> </Author> <Author ValidYN="Y"> <LastName>Schmidt</LastName> <ForeName>J</ForeName> <Initials>J</Initials> </Author> <Author ValidYN="Y"> <LastName>von Korff</LastName> <ForeName>M</ForeName> <Initials>M</Initials> </Author> 
</AuthorList> <Language>eng</Language> <PublicationTypeList> <PublicationType>Journal Article</PublicationType> </PublicationTypeList> </Article> <MedlineJournalInfo> <Country>England</Country> <MedlineTA>J Exp Bot</MedlineTA> <NlmUniqueID>9882906</NlmUniqueID> <ISSNLinking>0022-0957</ISSNLinking> </MedlineJournalInfo> <CitationSubset>IM</CitationSubset> <KeywordList Owner="NOTNLM"> <Keyword MajorTopicYN="N">Abiotic stress</Keyword> <Keyword MajorTopicYN="N">Rubisco activase</Keyword> <Keyword MajorTopicYN="N">barley</Keyword> <Keyword MajorTopicYN="N">drought</Keyword> <Keyword MajorTopicYN="N">heat</Keyword> <Keyword MajorTopicYN="N">proteomics</Keyword> <Keyword MajorTopicYN="N">yield.</Keyword> </KeywordList> </MedlineCitation> <PubmedData> <History> <PubMedPubDate PubStatus="entrez"> <Year>2013</Year> <Month>8</Month> <Day>7</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus="pubmed"> <Year>2013</Year> <Month>8</Month> <Day>7</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus="medline"> <Year>2013</Year> <Month>8</Month> <Day>7</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> </History> <PublicationStatus>ppublish</PublicationStatus> <ArticleIdList> <ArticleId IdType="pii">ert158</ArticleId> <ArticleId IdType="doi">10.1093/jxb/ert158</ArticleId> <ArticleId IdType="pubmed">23918963</ArticleId> </ArticleIdList> </PubmedData> </PubmedArticle> </PubmedArticleSet> </eFetchResult> /default:eFetchResult/default:PubmedArticleSet/default:PubmedArticle/default:MedlineCitation/default:PMID default http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 
net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeRead_Text_Filefileurl0filecontents00 Some files are to be tasted, others to be swallowed, and some few to be chewed and digested. This program reads a file and returns the content. 2013-08-09 11:09:50.861 UTC net.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivity fileurl 0 'text/plain' java.lang.String true encoding 0 'text/plain' java.lang.String true filecontents 0 'text/plain' 0 workflow net.sourceforge.taverna.scuflworkers.io.TextFileReader net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateFileContentPubmedID0Title0XMLContent00 This process takes all the values from the XPaths and creates an XML file with a format that Solr can understand. 
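The Beanshell script behind CreateFileContent is not reproduced in this definition, so the following Python sketch only illustrates the idea: it wraps a PubmedID and a title in Solr's XML update format (`<add><doc><field name="...">`). The field names `id` and `title` are assumptions and must match whatever the target Solr schema defines; the function name `build_solr_document` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_solr_document(pubmed_id, title):
    """Build a Solr XML update message holding one document.

    The field names "id" and "title" are assumed, not taken from the workflow.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", pubmed_id), ("title", title)):
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value  # ElementTree escapes &, < and > on serialization
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_solr_document(
        "23917802",
        "Urine proteomes of healthy aging humans reveal extracellular matrix "
        "(ECM) alterations and immune system dysfunction.",
    ))
```

Building the message through an XML library rather than string concatenation keeps titles that contain characters such as `&` or `<` from producing a malformed file.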
2013-08-09 11:07:44.258 UTC net.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivity PubmedID 0 text/plain java.lang.String true Title 0 text/plain java.lang.String true XMLContent 0 0 workflow net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeWrite_Text_Filefilecontents0outputFile0 This process writes a text file to a known location. But I guess you already figured that out from the title... 2013-08-09 11:05:55.436 UTC net.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivity outputFile 0 'text/plain' java.lang.String true filecontents 0 'text/plain' java.lang.String true encoding 0 'text/plain' java.lang.String true outputFile 0 'text/plain' 0 workflow net.sourceforge.taverna.scuflworkers.io.TextFileWriter net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateFileLocationWorkspace0PubmedID0FileLocation00 This process creates the final location to 
store the indexed files based on the workspace and the pubmedID. 2013-08-09 11:06:42.218 UTC net.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivity Workspace 0 text/plain java.lang.String true PubmedID 0 text/plain java.lang.String true FileLocation 0 0 workflow net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeSolrImportinputFile0pathToPostJar0STDERR00STDOUT00 SolrImport takes the path of the txt file and stores the file in a Solr database. Make sure that the Solr database is running and that the correct path is inside the variable. If Solr is running locally you can check if the files have been stored by browsing to the following location: http://localhost:8983/solr/#/ Solr can be downloaded at: http://lucene.apache.org/solr/ 2013-07-25 14:14:44.103 UTC net.sf.taverna.t2.activitiesexternal-tool-activity1.4net.sf.taverna.t2.activities.externaltool.ExternalToolActivity 789663B8-DA91-428A-9F7D-B3F3DA185FD4 default local <?xml version="1.0" encoding="UTF-8"?> <localInvocation><shellPrefix>/bin/sh -c</shellPrefix><linkCommand>/bin/ln -s %%PATH_TO_ORIGINAL%% %%TARGET_NAME%%</linkCommand></localInvocation> 6acb54d9-5501-46e2-85de-ee69bbd97fc4 #Note the -Dauto argument in the command. This makes Solr automatically detect #the extension of the file, so the process can handle a wide range of different #extensions. 
java -Dauto -jar "%%pathToPostJar%%" "%%inputFile%%" 1200 1800 inputFile pathToPostJar inputFile inputFile false false false UTF-8 false false false pathToPostJar pathToPostJar false false false UTF-8 false false false false true true 0 false net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeExtractTitlexml_text0nodelist11 This XPath extracts the title from the abstract. 2013-08-09 11:08:21.80 UTC net.sf.taverna.t2.activitiesxpath-activity1.4net.sf.taverna.t2.activities.xpath.XPathActivity <?xml version="1.0" encoding="UTF-8"?> <eFetchResult xmlns="http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed"> <WorkflowRun> <ResearcherID>H-8686-2013</ResearcherID> <ArticleExtractionDate>Wed Aug 07 15:04:12 CEST 2013</ArticleExtractionDate> <WorkflowMyExperimentID>3659</WorkflowMyExperimentID> <WorkflowVersion>3</WorkflowVersion> <WorkflowDevelopers>Sander van Boom and Paul Fisher</WorkflowDevelopers> <WorkflowInputs> <StartDate/> <EndDate/> <SearchTerm>proteomics</SearchTerm> <MaximumArticles>10</MaximumArticles> </WorkflowInputs> </WorkflowRun> <PubmedArticleSet> <PubmedArticle> <MedlineCitation Owner="NLM" Status="Publisher"> <PMID Version="1">23917802</PMID> <DateCreated> <Year>2013</Year> <Month>8</Month> <Day>6</Day> </DateCreated> <Article PubModel="Print-Electronic"> <Journal> <ISSN IssnType="Electronic">1574-4647</ISSN> <JournalIssue CitedMedium="Internet"> <PubDate> <Year>2013</Year> <Month>Aug</Month> <Day>6</Day> </PubDate> </JournalIssue> <Title>Age 
(Dordrecht, Netherlands)</Title> <ISOAbbreviation>Age (Dordr)</ISOAbbreviation> </Journal> <ArticleTitle>Urine proteomes of healthy aging humans reveal extracellular matrix (ECM) alterations and immune system dysfunction.</ArticleTitle> <Pagination> <MedlinePgn/> </Pagination> <Abstract> <AbstractText NlmCategory="UNLABELLED">Aging is a complex physiological process that poses considerable conundrums to rapidly aging societies. For example, the risk of dying from cardiovascular diseases and/or cancer steadily declines for people after their 60s, and other causes of death predominate for seniors older than 80 years of age. Thus, physiological aging presents numerous unanswered questions, particularly with regard to changing metabolic patterns. Urine proteomics analysis is becoming a non-invasive and reproducible diagnostic method. We investigated the urine proteomes in healthy elderly people to determine which metabolic processes were weakened or strengthened in aging humans. Urine samples from 37 healthy volunteers aged 19-90 years (19 men, 18 women) were analyzed for protein expression by liquid chromatography-tandem mass spectrometry. This generated a list of 19 proteins that were differentially expressed in different age groups (young, intermediate, and old age). In particular, the oldest group showed protein changes reflective of altered extracellular matrix turnover and declining immune function, in which changes corresponded to reported changes in cardiovascular tissue remodeling and immune disorders in the elderly. Thus, urinary proteome changes in the elderly appear to reflect the physiological processes of aging and are particularly clearly represented in the circulatory and immune systems. Detailed identification of "protein trails" creates a more global picture of metabolic changes that occur in the elderly.</AbstractText> </Abstract> <Affiliation>Mass Spectrometry Laboratory, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, ul. 
Pawinskiego 5a, 02-106, Warsaw, Poland.</Affiliation> <AuthorList> <Author> <LastName>Bakun</LastName> <ForeName>M</ForeName> <Initials>M</Initials> </Author> <Author> <LastName>Senatorski</LastName> <ForeName>G</ForeName> <Initials>G</Initials> </Author> <Author> <LastName>Rubel</LastName> <ForeName>T</ForeName> <Initials>T</Initials> </Author> <Author> <LastName>Lukasik</LastName> <ForeName>A</ForeName> <Initials>A</Initials> </Author> <Author> <LastName>Zielenkiewicz</LastName> <ForeName>P</ForeName> <Initials>P</Initials> </Author> <Author> <LastName>Dadlez</LastName> <ForeName>M</ForeName> <Initials>M</Initials> </Author> <Author> <LastName>Paczek</LastName> <ForeName>L</ForeName> <Initials>L</Initials> </Author> </AuthorList> <Language>ENG</Language> <PublicationTypeList> <PublicationType>JOURNAL ARTICLE</PublicationType> </PublicationTypeList> <ArticleDate DateType="Electronic"> <Year>2013</Year> <Month>8</Month> <Day>6</Day> </ArticleDate> </Article> <MedlineJournalInfo> <MedlineTA>Age (Dordr)</MedlineTA> <NlmUniqueID>101250497</NlmUniqueID> </MedlineJournalInfo> </MedlineCitation> <PubmedData> <History> <PubMedPubDate PubStatus="received"> <Year>2013</Year> <Month>2</Month> <Day>5</Day> </PubMedPubDate> <PubMedPubDate PubStatus="accepted"> <Year>2013</Year> <Month>7</Month> <Day>1</Day> </PubMedPubDate> <PubMedPubDate PubStatus="aheadofprint"> <Year>2013</Year> <Month>8</Month> <Day>6</Day> </PubMedPubDate> <PubMedPubDate PubStatus="entrez"> <Year>2013</Year> <Month>8</Month> <Day>7</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus="pubmed"> <Year>2013</Year> <Month>8</Month> <Day>7</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> <PubMedPubDate PubStatus="medline"> <Year>2013</Year> <Month>8</Month> <Day>7</Day> <Hour>6</Hour> <Minute>0</Minute> </PubMedPubDate> </History> <PublicationStatus>aheadofprint</PublicationStatus> <ArticleIdList> <ArticleId IdType="doi">10.1007/s11357-013-9562-7</ArticleId> <ArticleId 
IdType="pubmed">23917802</ArticleId> </ArticleIdList> </PubmedData> </PubmedArticle> </PubmedArticleSet> </eFetchResult> /default:eFetchResult/default:PubmedArticleSet/default:PubmedArticle/default:MedlineCitation/default:Article/default:ArticleTitle default http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeExtractPubmedIDxml_textRead_Text_FilefilecontentsRead_Text_FilefileurlAbstractLocationCreateFileContentPubmedIDExtractPubmedIDnodelistCreateFileContentTitleExtractTitlenodelistWrite_Text_FilefilecontentsCreateFileContentXMLContentWrite_Text_FileoutputFileCreateFileLocationFileLocationCreateFileLocationWorkspaceWorkspaceCreateFileLocationPubmedIDExtractPubmedIDnodelistSolrImportinputFileCreateFileLocationFileLocationSolrImportpathToPostJarPathToPostJarExtractTitlexml_textRead_Text_FilefilecontentsOutputFileLocationCreateFileLocationFileLocationSolrImport_STDERRSolrImportSTDERRSolrImport_STDOUTSolrImportSTDOUT This workflow extracts information from the abstracts and imports them in Solr. 2013-08-09 11:11:01.596 UTC eb33b76c-eef6-4571-a24c-1c1259277d71 2013-08-08 14:04:39.26 UTC 1edcb63d-d915-44df-a049-4d08e51a4c4b 2013-08-09 11:12:40.460 UTC 221d0764-d2f9-4b5e-b919-d34b4090f83b 2013-08-08 14:19:03.953 UTC 8336e90d-0d11-4ae4-bcc7-9095c78600e4 2013-08-08 14:31:46.947 UTC XML extraction and Solr import of abstracts. 
2013-08-09 11:10:32.633 UTC cd08d3c5-d20e-44fd-86d9-6497f8d40703 2013-08-09 11:09:46.680 UTC c78610c0-f7f7-4231-b8f1-f3b24adf386d 2013-08-09 10:53:40.522 UTC 459e8502-d439-434b-8456-592ebb255977 2013-08-08 14:15:43.30 UTC b7721579-d224-491e-ab14-34596ffc34f6 2013-08-08 14:06:16.517 UTC 1889a560-dffe-4bdb-a5dc-dd36ddd2de33 2013-08-08 14:17:58.94 UTC 7a67c280-f0b2-4e49-8c6c-39ecdfe2cbb2 2013-08-08 14:40:42.973 UTC 59f6b762-1b12-44c8-aae4-85a05e31d26d 2013-08-08 14:59:03.531 UTC Sander van Boom 2013-08-09 11:10:03.667 UTC XPathPubmedIdspubmed_ids00 text/plain 2011-02-03 14:53:07.50 UTC 2011-02-03 14:53:07.50 UTC AbstractXMLrun_eFetchinpp0attachmentList11outp00net.sf.taverna.t2.activitieswsdl-activity1.4net.sf.taverna.t2.activities.wsdl.WSDLActivity http://www.ncbi.nlm.nih.gov/entrez/eutils/soap/v2.0/efetch_pubmed.wsdl run_eFetch net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 0 0 0 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeinppXMLid0output00net.sf.taverna.t2.activitieswsdl-activity1.4net.sf.taverna.t2.activities.wsdl.xmlsplitter.XMLInputSplitterActivity id 0 'text/plain' java.lang.String true WebEnv 0 'text/plain' java.lang.String true query_key 0 'text/plain' java.lang.String true tool 0 'text/plain' java.lang.String true email 0 'text/plain' java.lang.String true retstart 0 'text/plain' java.lang.String true retmax 0 'text/plain' java.lang.String true rettype 0 'text/plain' java.lang.String true output 0 'text/xml' 0 <s:extensions xmlns:s="http://org.embl.ebi.escience/xscufl/0.1alpha"><s:complextype 
optional="false" unbounded="false" typename="eFetchRequest" name="inpp" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}eFetchRequest"><s:elements><s:basetype optional="true" unbounded="false" typename="string" name="id" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>id" /><s:basetype optional="true" unbounded="false" typename="string" name="WebEnv" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>WebEnv" /><s:basetype optional="true" unbounded="false" typename="string" name="query_key" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>query_key" /><s:basetype optional="true" unbounded="false" typename="string" name="tool" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>tool" /><s:basetype optional="true" unbounded="false" typename="string" name="email" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>email" /><s:basetype optional="true" unbounded="false" typename="string" name="retstart" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>retstart" /><s:basetype optional="true" unbounded="false" typename="string" name="retmax" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>retmax" /><s:basetype optional="true" unbounded="false" typename="string" name="rettype" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>rettype" /></s:elements></s:complextype></s:extensions> net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 0 0 0 
net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invokerun_eFetchinppinppXMLoutputinppXMLidpubmed_idsAbstractXMLrun_eFetchoutp dd13f2ba-ad40-4acc-bc47-ee46fafcba52 2013-07-25 13:32:10.434 UTC 363be023-70a0-4cc5-b02a-199d04d05e58 2011-02-03 14:54:58.907 UTC This workflow takes in a number of search terms (as used in the normal PubMed interface) and retrieves a list of PubMed ids in a list format. 2011-02-03 14:53:07.444 UTC Paul Fisher 2011-02-03 14:53:07.444 UTC XPath Pubmed Ids 2011-02-03 14:53:07.444 UTC