PubMed_Search_and_Sosearch_term00text/plain2011-02-03 14:53:06.572 UTC"Text Mining" AND Leiden2013-08-05 14:01:31.929 UTCI want to find all abstracts that contain the words:2013-07-25 14:05:03.937 UTCend_date0020152013-08-05 13:36:08.823 UTCThe found articles should be older than ____:2013-07-25 14:07:17.530 UTCstart_date00The found articles should not be older than ____.2013-07-25 14:06:46.323 UTC19002013-08-05 13:36:20.405 UTCmaximum_articles00I want a maximum of ____ articles.
10 is set as default for testing. More than 100 is good for a normal run. 2013-07-25 14:06:07.459 UTC102013-07-25 13:34:01.622 UTCResearcherID00The ID of the researcher that runs this workflow. If you do not have an ID, please register at www.researcherid.com. 2013-07-30 11:24:03.84 UTCH-8686-20132013-07-30 11:29:43.224 UTCWorkspace00/tmp/2013-08-07 12:28:32.783 UTCThis is the workspace where the found articles will be stored. The name of the files will be [ID of the article].xml.2013-08-07 12:28:27.306 UTCSolrWorkspace00This is where the indexed files are stored. This can be the same workspace as the one where the abstracts are stored.2013-08-09 10:46:14.845 UTC/tmp/2013-08-09 10:46:26.451 UTCSolrImport_STDERRThis is the standard error (stderr) output from Solr. If there were any errors while running Solr then the values will turn red. Red is generally considered to be a bad colour when programming. 2013-08-09 11:03:49.481 UTCSolrImport_STDOUTThis is the standard output (stdout) from Solr. 2013-07-25 14:20:00.680 UTCIDsInDatabaseThis output contains the list of IDs that were found by eSearch but were already in the working directory. Better luck next time.2013-08-05 11:16:31.507 UTCIDsBeingSearchedThis output port contains the IDs that have been extracted. 2013-08-05 11:17:00.143 UTCAbstractWithProvThis is the abstract that has been extracted plus some provenance. 
2013-08-05 11:19:55.518 UTCIndexedFileLocationThe location of the indexed file that was imported in Solr.2013-08-09 11:02:53.603 UTCpubmed_databasevalue00Which database is being used.2013-07-25 14:08:02.977 UTCnet.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivitypubmednet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeextractPMIDxpath0xml-text0nodelist11nodelistAsXML11This process extracts the pubmed ID's based on the eSearch run.2013-08-05 11:02:44.568 
UTCnet.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivityxpath0'text/plain'java.lang.Stringtruexml-text0'text/xml'java.lang.Stringtruenodelist1l('text/plain')1nodelistAsXML1l('text/plain')1workflowdom4jdom4j1.6716010169dom4j:dom4j:1.6net.sourceforge.taverna.scuflworkers.xml.XPathTextWorkernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invokexpathvalue00net.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivity/*[local-name(.)='eSearchResult']/*[local-name(.)='IdList']/*[local-name(.)='Id']net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invokerun_eSearchparameters0attachmentList11parameters00This process will run eSearch that will extract the ID's of the articles that give a hit on the query. 
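
The extractPMID step applies the namespace-agnostic XPath `/*[local-name(.)='eSearchResult']/*[local-name(.)='IdList']/*[local-name(.)='Id']` to the eSearch response. As a rough illustration outside Taverna, the same selection could be sketched in Python as below. This is not the workflow's own code: the function names and the sample response are hypothetical, and because the standard-library `xml.etree.ElementTree` module does not support `local-name()`, the sketch compares tag names after stripping the namespace instead.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return an element tag without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def extract_pmids(esearch_xml):
    """Collect the text of every eSearchResult/IdList/Id element, ignoring namespaces."""
    root = ET.fromstring(esearch_xml)
    if local_name(root.tag) != "eSearchResult":
        return []
    pmids = []
    for id_list in root:
        if local_name(id_list.tag) != "IdList":
            continue
        for id_element in id_list:
            if local_name(id_element.tag) == "Id" and id_element.text:
                pmids.append(id_element.text.strip())
    return pmids


# Hypothetical, hand-written response used only for illustration;
# a real eSearch response contains more elements than shown here.
SAMPLE_RESPONSE = """<eSearchResult xmlns="http://www.ncbi.nlm.nih.gov/soap/eutils/esearch">
  <Count>2</Count>
  <IdList>
    <Id>20618981</Id>
    <Id>19134199</Id>
  </IdList>
</eSearchResult>"""

if __name__ == "__main__":
    print(extract_pmids(SAMPLE_RESPONSE))
```

Matching on local names mirrors the intent of the XPath above: the IDs are found whether or not the response declares the eSearch namespace.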
2013-08-05 10:17:21.59 UTCnet.sf.taverna.t2.activitieswsdl-activity1.4net.sf.taverna.t2.activities.wsdl.WSDLActivityhttp://eutils.ncbi.nlm.nih.gov/entrez/eutils/soap/eutils.wsdlrun_eSearchnet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0005net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeparametersXML_eFecthdb0term0maxdate0mindate0RetMax0output00This process will create the parameters that can then be used by eSearch and eFetch. 2013-08-05 11:29:52.580 UTCnet.sf.taverna.t2.activitieswsdl-activity1.4net.sf.taverna.t2.activities.wsdl.xmlsplitter.XMLInputSplitterActivitydb0'text/plain'java.lang.Stringtrueterm0'text/plain'java.lang.StringtrueWebEnv0'text/plain'java.lang.StringtrueQueryKey0'text/plain'java.lang.Stringtrueusehistory0'text/plain'java.lang.Stringtruetool0'text/plain'java.lang.Stringtrueemail0'text/plain'java.lang.Stringtruefield0'text/plain'java.lang.Stringtruereldate0'text/plain'java.lang.Stringtruemindate0'text/plain'java.lang.Stringtruemaxdate0'text/plain'java.lang.Stringtruedatetype0'text/plain'java.lang.StringtrueRetStart0'text/plain'java.lang.StringtrueRetMax0'text/plain'java.lang.Stringtruerettype0'text/plain'java.lang.Stringtruesort0'text/plain'java.lang.Stringtrueoutput0'text/xml'0<s:extensions xmlns:s="http://org.embl.ebi.escience/xscufl/0.1alpha"><s:complextype optional="false" unbounded="false" typename="eSearchRequest" name="parameters" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/esearch}eSearchRequest"><s:elements><s:basetype optional="true" unbounded="false" typename="string" name="db" 
qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="term" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="WebEnv" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="QueryKey" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="usehistory" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="tool" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="email" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="field" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="reldate" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="mindate" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="maxdate" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="datetype" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="RetStart" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="RetMax" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="rettype" qname="{http://www.w3.org/2001/XMLSchema}string" /><s:basetype optional="true" unbounded="false" typename="string" name="sort" qname="{http://www.w3.org/2001/XMLSchema}string" 
/></s:elements></s:complextype></s:extensions>net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeRetrive_abstractspubmed_ids0AbstractXML00This nested workflow was part of Fisher's workflow, but has been reduced in size. This workflow stores the XML files from eFetch and does not need to extract the plain-text abstract as the original workflow did.2013-07-25 14:22:59.512 UTCnet.sf.taverna.t2.activitiesdataflow-activity1.4net.sf.taverna.t2.activities.dataflow.DataflowActivitynet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeLookAtWatchActivateTimeLookUp0CurrentTime00Time flies like an arrow; fruit flies like a banana.
This process receives the Abstract XML that activates a time lookup. This is then given to the next process.2013-08-05 11:12:03.855 UTCnet.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivityActivateTimeLookUp0text/plainjava.lang.StringtrueCurrentTime00workflownet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateProvenanceResearcherID0MyExperimentID0WorkflowDevelopers0WorkflowVersion0ExtractionDate0MaximumArticles0SearchTerm0StartDate0EndDate0Prov00Provenance is important when you want to trace back your data. For this reason I added a process in the workflow that will add some basic provenance based on the work of the w3 (www.w3.org).
The process adds the following types of provenance:
ResearcherID - Use www.researcherid.com to get a ResearcherID. This can then be linked to your research.
ExtractionDate - The date and time the article was extracted.
MyExperimentID - The MyExperimentID of the used workflow.
WorkflowVersion - The version of the used workflow.
WorkflowDevelopers - A list of the developers of the workflow.
StartDate - The starting date of the article search, see input port for more information.
EndDate - The ending date of the article search, see input port for more information.
SearchTerm - The original search query, see input port for more information.
MaximumArticles - The maximum number of articles to retrieve, see input port for more information.2013-08-09 11:21:45.129 UTCnet.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivityResearcherID0text/plainjava.lang.StringtrueExtractionDate0text/plainjava.lang.StringtrueMyExperimentID0text/plainjava.lang.StringtrueWorkflowVersion0text/plainjava.lang.StringtrueWorkflowDevelopers0text/plainjava.lang.StringtrueStartDate0text/plainjava.lang.StringtrueEndDate0text/plainjava.lang.StringtrueSearchTerm0text/plainjava.lang.StringtrueMaximumArticles0text/plainjava.lang.StringtrueProv00workflownet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeWrite_Text_FileoutputFile0filecontents0This process writes the content of the workflow to files. The location of the file is created in the CreateFileLocation process.
NOTE: You might want to change the working directory. This can be done by changing the CreateFileLocation process. 2013-08-05 11:04:07.216 UTCnet.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivityoutputFile0'text/plain'java.lang.Stringtruefilecontents0'text/plain'java.lang.Stringtrueencoding0'text/plain'java.lang.StringtrueoutputFile00workflownet.sourceforge.taverna.scuflworkers.io.TextFileWriterUserNameHere2013-08-05 13:18:51.98 UTCnet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateListOfArticlesThatNeedExtractingIDsInDatabase0ExtractableIDs0IDsInDatabaseOut11ListOfArticlesThatNeedExtracting11This Conditional branch splits the workflow in two directions:
IDsInDatabase - A list of IDs of articles that are already in the working directory. They should not be extracted again.
ListOfArticlesThatNeedExtracting - The List of Pubmed ID's that should be extracted and added to the database. 2013-08-05 10:15:05.873 UTCnet.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivityExtractableIDs0text/plainjava.lang.StringtrueIDsInDatabase0text/plainjava.lang.StringtrueListOfArticlesThatNeedExtracting11IDsInDatabaseOut11workflownet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCheckIfArticleIsInDatabaseExtractableArticles0STDOUT00This process calls the commandline and checks if the file at the filelocation exists. If this is not the case the process will return the string false.2013-08-05 10:58:54.113 UTCnet.sf.taverna.t2.activitiesexternal-tool-activity1.4net.sf.taverna.t2.activities.externaltool.ExternalToolActivity789663B8-DA91-428A-9F7D-B3F3DA185FD4default local<?xml version="1.0" encoding="UTF-8"?>
<localInvocation><shellPrefix>/bin/sh -c</shellPrefix><linkCommand>/bin/ln -s %%PATH_TO_ORIGINAL%% %%TARGET_NAME%%</linkCommand></localInvocation>
5f0528ad-82e5-47c0-99ae-a67cb01e2038//Check if article is in the database
[ -f %%ExtractableArticles%% ] && echo -n "File exists" || echo %%ExtractableArticles%%12001800ExtractableArticlesExtractableArticlesExtractableArticlesfalsefalsefalseUTF-8falsefalsefalsefalsetruetrue0falsenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeFlatten_Listinputlist2outputlist11We need to decrease the depth of the list by one level. Otherwise we will get errors in the Validation report. 2013-08-05 10:15:47.895 UTCnet.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivityinputlist2l(l(''))[Btrueoutputlist1l('')1workfloworg.embl.ebi.escience.scuflworkers.java.FlattenListnet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateFileLocation_2PubmedID0Workspace0FileLocation00This process creates the location of the file. This can then be used to check whether the file exists in the next process (CheckIfArticleIsInDatabase).
NOTE: if you want to change the working directory, please change this process so it will link to the correct directory.2013-08-05 11:04:34.562 UTCnet.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivityPubmedID0text/plainjava.lang.StringtrueWorkspace0text/plainjava.lang.StringtrueFileLocation00workflownet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeMyExperimentID_valuevalue00This value stores the MyExperiment ID of the workflow. If you reupload this workflow with improvements feel free to change this value. 2013-08-05 11:15:01.977 UTCnet.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivity3659net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeWorkflowDevelopers_valuevalue00Sander van Boom and Paul Fisher created this workflow. 
If you've changed this workflow and uploaded it on myExperiment feel free to add your name in this variable as well.2013-08-05 11:13:38.464 UTCnet.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivitySander van Boom and Paul Fishernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeWorkflowVersion_valuevalue00This value stores the current version of the workflow. 2013-08-05 11:14:06.878 UTCnet.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivity5net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeAddProvenanceProvenance0Abstract0AbstractWithProvenance00This process fuses the provenance, the found abstracts and the header of the file.
The output is also sent to an output port for checking the values.2013-08-06 09:54:37.903 UTCnet.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivityProvenance0text/plainjava.lang.StringtrueAbstract0text/plainjava.lang.StringtrueAbstractWithProvenance00workflownet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeXPath_Servicexml_text0nodelistAsXML11This XPath Service removes the header from the file. This is because we want to add provenance to the file later in the workflow.
After we've added the provenance then we add the header back to the file.2013-08-06 09:53:59.709 UTCnet.sf.taverna.t2.activitiesxpath-activity1.4net.sf.taverna.t2.activities.xpath.XPathActivity<?xml version="1.0" encoding="UTF-8"?>
<eFetchResult xmlns="http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed">
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE">
<PMID Version="1">20618981</PMID>
<DateCreated>
<Year>2010</Year>
<Month>07</Month>
<Day>12</Day>
</DateCreated>
<DateCompleted>
<Year>2012</Year>
<Month>10</Month>
<Day>02</Day>
</DateCompleted>
<DateRevised>
<Year>2012</Year>
<Month>11</Month>
<Day>09</Day>
</DateRevised>
<Article PubModel="Electronic">
<Journal>
<ISSN IssnType="Electronic">2041-1480</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>1</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2010</Year>
</PubDate>
</JournalIssue>
<Title>Journal of biomedical semantics</Title>
<ISOAbbreviation>J Biomed Semantics</ISOAbbreviation>
</Journal>
<ArticleTitle>Rewriting and suppressing UMLS terms for improved biomedical term identification.</ArticleTitle>
<Pagination>
<MedlinePgn>5</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/2041-1480-1-5</ELocationID>
<Abstract>
<AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Identification of terms is essential for biomedical text mining.. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact on the number of terms identified by the different rules on a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms together with a sample of 100 randomly selected terms were evaluated for every rule.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms and seven out of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we were able to identify 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without the rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, which is an increase of 2.8% in the number of terms and an increase of 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size. 7,397 terms were suppressed in the corpus.</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is to be used for biomedical term identification in MEDLINE. A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.</AbstractText>
</Abstract>
<Affiliation>Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands. k.hettne@erasmusmc.nl.</Affiliation>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Hettne</LastName>
<ForeName>Kristina M</ForeName>
<Initials>KM</Initials>
</Author>
<Author ValidYN="Y">
<LastName>van Mulligen</LastName>
<ForeName>Erik M</ForeName>
<Initials>EM</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Schuemie</LastName>
<ForeName>Martijn J</ForeName>
<Initials>MJ</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Schijvenaars</LastName>
<ForeName>Bob Ja</ForeName>
<Initials>BJ</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Kors</LastName>
<ForeName>Jan A</ForeName>
<Initials>JA</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2010</Year>
<Month>03</Month>
<Day>31</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>J Biomed Semantics</MedlineTA>
<NlmUniqueID>101531992</NlmUniqueID>
</MedlineJournalInfo>
<CommentsCorrectionsList>
<CommentsCorrections RefType="Cites">
<RefSource>BMC Bioinformatics. 2009;10:14</RefSource>
<PMID Version="1">19134199</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc AMIA Symp. 2001;:17-21</RefSource>
<PMID Version="1">11825149</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Pharmacogenomics. 2007 Nov;8(11):1521-34</RefSource>
<PMID Version="1">18034617</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Brief Bioinform. 2007 Sep;8(5):358-75</RefSource>
<PMID Version="1">17977867</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2005 Sep 1;21 Suppl 2:ii259-67</RefSource>
<PMID Version="1">16204115</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proteomics. 2007 Mar;7(6):921-31</RefSource>
<PMID Version="1">17370270</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Biomed Discov Collab. 2007;2:2</RefSource>
<PMID Version="1">17480215</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>IEEE Trans Nanobioscience. 2007 Mar;6(1):51-9</RefSource>
<PMID Version="1">17393850</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>BMC Bioinformatics. 2007;8:14</RefSource>
<PMID Version="1">17233900</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Drug Discov Today. 2006 Apr;11(7-8):315-25</RefSource>
<PMID Version="1">16580973</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Brief Bioinform. 2005 Mar;6(1):57-71</RefSource>
<PMID Version="1">15826357</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Biomed Inform. 2004 Dec;37(6):512-26</RefSource>
<PMID Version="1">15542023</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>PLoS Biol. 2004 Nov;2(11):e309</RefSource>
<PMID Version="1">15383839</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Neurology. 1996 Sep;47(3):809-10</RefSource>
<PMID Version="1">8797484</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Perspect Biol Med. 1988 Summer;31(4):526-57</RefSource>
<PMID Version="1">3075738</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Perspect Biol Med. 1986 Autumn;30(1):7-18</RefSource>
<PMID Version="1">3797213</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Genome Biol. 2004;5(6):R43</RefSource>
<PMID Version="1">15186494</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Nucleic Acids Res. 2004 Jan 1;32(Database issue):D267-70</RefSource>
<PMID Version="1">14681409</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Am Med Inform Assoc. 2003 May-Jun;10(3):252-9</RefSource>
<PMID Version="1">12626374</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc AMIA Symp. 2002;:504-8</RefSource>
<PMID Version="1">12463875</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Pac Symp Biocomput. 2003;:451-62</RefSource>
<PMID Version="1">12603049</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc AMIA Symp. 2002;:727-31</RefSource>
<PMID Version="1">12463920</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Genome Biol. 2002 Sep 13;3(10):RESEARCH0055</RefSource>
<PMID Version="1">12372143</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc AMIA Symp. 2001;:448-52</RefSource>
<PMID Version="1">11825228</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>BMC Bioinformatics. 2007;8 Suppl 9:S5</RefSource>
<PMID Version="1">18047706</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<OtherID Source="NLM">PMC2895736</OtherID>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2009</Year>
<Month>7</Month>
<Day>15</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2010</Year>
<Month>3</Month>
<Day>31</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="aheadofprint">
<Year>2010</Year>
<Month>3</Month>
<Day>31</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2010</Year>
<Month>7</Month>
<Day>13</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2010</Year>
<Month>7</Month>
<Day>14</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2010</Year>
<Month>7</Month>
<Day>14</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pii">2041-1480-1-5</ArticleId>
<ArticleId IdType="doi">10.1186/2041-1480-1-5</ArticleId>
<ArticleId IdType="pubmed">20618981</ArticleId>
<ArticleId IdType="pmc">PMC2895736</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>
</eFetchResult>/default:eFetchResult/default:PubmedArticleSetdefaulthttp://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmednet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeFlatten_List_2inputlist2outputlist11We need to decrease the depth of the list by one level. Otherwise we will get errors in the Validation report. 2013-08-05 10:15:47.895 UTCnet.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivityinputlist2l(l(''))[Btrueoutputlist1l('')1workfloworg.embl.ebi.escience.scuflworkers.java.FlattenListnet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateFileLocation_2_2PubmedID0Workspace0FileLocation00This process creates the location of the file. This can then be used to check whether the file exists in the next process (CheckIfArticleIsInDatabase).
NOTE: if you want to change the working directory, please change this process so that it points to the correct directory.2013-08-05 11:04:34.562 UTCnet.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivityPubmedID0text/plainjava.lang.StringtrueWorkspace0text/plainjava.lang.StringtrueFileLocation00workflownet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeInformationExtractionAndSolrImportAbstractLocation0Workspace0PathToPostJar0SolrImport_STDOUT11SolrImport_STDERR11OutputFileLocation11Read a file, extract the content, extract the PubMed ID from the abstract and write the file back to a new workspace. Then import it into Solr.
NOTE: Make sure that Solr is installed and that the variable pathToPostJar points to the correct location of post.jar.
BONUS NOTE: If you want Solr to detect more than just the title and the ID, you should add extra XPaths and update your Solr schema accordingly.2013-08-09 10:44:01.563 UTCnet.sf.taverna.t2.activitiesdataflow-activity1.4net.sf.taverna.t2.activities.dataflow.DataflowActivitynet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokepathToPostJarvalue00This is the path to the Post.jar that Solr uses to import its documents.
NOTE: Please change this variable to your Solr directory. 2013-08-05 11:27:45.222 UTCnet.sf.taverna.t2.activitiesstringconstant-activity1.4net.sf.taverna.t2.activities.stringconstant.StringConstantActivity/run/media/sander/Second Space/Downloads/Solaria/solr-4.4.0/example/exampledocs/post.jarnet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeextractPMIDxpathextractPMIDxml-textrun_eSearchparametersparametersXML_eFecthdbparametersXML_eFecthtermparametersXML_eFecthmaxdateparametersXML_eFecthmindateparametersXML_eFecthRetMaxRetrive_abstractspubmed_idsLookAtWatchActivateTimeLookUpCreateProvenanceResearcherIDCreateProvenanceMyExperimentIDCreateProvenanceWorkflowDevelopersCreateProvenanceWorkflowVersionCreateProvenanceExtractionDateCreateProvenanceMaximumArticlesCreateProvenanceSearchTermCreateProvenanceStartDateCreateProvenanceEndDateWrite_Text_FileoutputFileWrite_Text_FilefilecontentsCreateListOfArticlesThatNeedExtractingIDsInDatabaseCreateListOfArticlesThatNeedExtractingExtractableIDsCheckIfArticleIsInDatabaseExtractableArticlesFlatten_ListinputlistCreateFileLocation_2PubmedIDCreateFileLocation_2WorkspaceAddProvenanceProvenanceAddProvenanceAbstractXPath_Servicexml_textFlatten_List_2inputlistCreateFileLocation_2_2PubmedIDCreateFileLocation_2_2WorkspaceInformationExtractionAndSolrImportAbstractLocationInformationExtractionAndSolrImportWorkspaceInformationExtractionAndSolrImportPathToPostJarSolrImport_STDERRSolrImport_STDOUTIDsInDatabaseIDsBeingSearchedAbstractWithProvIndexe
dFileLocation28bf3d16-934e-44fc-94de-7c87f4c410302013-08-05 11:35:21.859 UTCc1e30222-30f4-48ec-a194-ad5e34c920472013-08-02 13:18:41.739 UTCa56da4d8-cc40-45e8-a6bc-ee99d799df8d2013-08-01 14:51:39.516 UTC9d0a018d-a88e-4e9e-91a8-6484c9be0cc32013-08-01 15:04:44.157 UTC633f3632-d8e8-471c-bdc5-c14f0845c2562013-08-01 13:59:58.6 UTC0be07374-fa4b-4ca7-b87a-99f72155aaf12013-08-05 09:49:14.166 UTC34cec394-b4d5-467d-97e3-80576202b8a12013-07-31 12:08:28.188 UTCabc3b001-f660-4424-b7e4-b84048b890a02013-08-05 09:42:19.127 UTCca16d958-1f9e-4f5a-a104-d6627217f3072013-08-02 15:07:25.897 UTC251f2cb5-f496-464a-92e0-0b3af747c34d2013-08-05 11:04:37.153 UTC695fd90a-fb08-426c-b068-c1f77156cb032013-08-05 11:12:07.214 UTC510c4ef5-5b98-4384-9ace-9e8c9fe3c3682013-08-05 11:02:05.834 UTCc072e496-2000-4d17-b0b0-7f86fdc7b9e72013-08-01 15:06:06.726 UTCdc9b637d-68ca-4940-9e17-ead4d1a19e322013-07-31 11:54:07.584 UTC30555d6c-da87-42c9-9a70-1b2cfe37bbe92013-08-02 14:49:45.84 UTCe7bcf406-5544-47a9-8a6a-3e02f8233be22013-07-25 14:23:04.289 UTCe1a018c8-4dd3-40ab-966e-0fd10101000a2013-08-09 10:44:05.29 UTC9b4e81ed-3c43-4f5d-b99a-73b809992b4b2013-08-05 12:54:44.331 UTC4918d574-774c-4973-a9af-06bb0f7e50ef2013-07-30 11:25:13.262 UTCa8d7cf24-2934-4c30-885f-26f93d0e2d202013-08-05 11:14:09.606 UTCa98f054d-1840-42f7-9288-d89a7fee337d2013-08-05 12:31:05.871 UTCbece0fe1-5a8e-4de6-8547-c7f3e1795cd82013-08-02 13:09:26.124 UTC3436ce86-a526-4522-a33f-3a688a0dc4d92013-08-05 13:20:01.209 UTC87d5379e-e1d3-4ba5-af58-b1d0f3be7f6b2013-08-05 12:24:28.396 UTC021d14b7-f71d-465a-8cc2-40f2494145392013-08-05 10:17:22.778 UTC390037a5-901d-4f4f-914d-9fc35011f5992013-08-07 12:11:31.536 UTCed01feeb-8f26-491a-b98a-dbd55b6082b92013-08-01 13:01:27.520 UTCbbfed93e-4f84-439d-8a03-ca9a58d255432013-08-05 13:24:20.215 UTC9ede5928-dd21-4a22-89cc-5441148ff5882013-08-02 12:30:24.566 UTCbb33b0f9-cbd8-4ee1-8e35-04d3ed9832642013-08-02 12:52:58.971 UTC16d30de7-778d-4ff2-971f-086bfd3722102013-08-01 13:42:26.816 
UTCe9a2e3d2-eb50-434b-b2a0-a51b537f03a62013-08-05 11:31:30.521 UTCd0e7b42b-b6e5-438b-8702-c5b573c9a3eb2013-08-09 11:24:12.493 UTC57da9a79-91d7-49e5-bc17-720a5f8c027b2013-08-06 10:14:16.630 UTC78039f41-0953-47fd-b758-f53eb52df92a2013-08-05 13:34:38.718 UTC16fb9c33-5aab-4350-9bde-dfde5167133f2013-08-06 09:51:47.305 UTCa193e3d6-a485-4073-a1b6-5c490a3260dd2013-07-25 13:41:59.997 UTCPubMed Search and Solr storage2013-07-25 14:00:44.958 UTCb8446a0f-abaf-4610-961d-30b6b46409dd2013-07-25 13:29:52.623 UTC801a5b11-eaa4-4965-93e1-25e2c6a87d822013-08-09 11:12:55.156 UTCc60821fa-1326-45cf-bc1a-798a2325ed102013-08-05 12:08:59.769 UTC85ccb2cb-b20c-43e6-b9bd-f13b9bc920372013-07-29 10:19:36.356 UTC4fbee505-4970-4393-9a62-d3b7759f0b112013-07-29 14:12:17.418 UTC1ea94807-433d-4a4f-88da-dc557bb7138b2013-08-01 14:47:21.302 UTC4dfbabb5-9197-451e-9e92-02fae79aa4d32013-08-02 13:23:00.602 UTC44ed3c5b-7b91-4d4f-b5e6-52c3df1270112013-08-02 12:24:49.468 UTC64c26331-190a-475a-a16e-d070aa7c391a2013-08-05 11:29:58.399 UTC77d1a1a0-6a43-4450-a94b-0575c0243d782013-08-02 14:35:40.916 UTC68df372d-6140-41a8-bb0e-9089e490233d2013-07-25 13:33:03.208 UTC8b5bbbda-bb25-4e31-9d19-94cb4df65c002013-08-02 13:37:45.695 UTCf236f9b4-5c12-45c6-9884-98676e5f98372013-08-01 15:02:27.252 UTC1575312b-b90b-4a95-ae15-ac3cec5bbb442013-07-31 11:45:02.645 UTCea73a848-e4d9-444a-b3c3-bc97e006d5852013-08-02 15:03:50.411 UTC8113db84-5222-48bf-95e9-a86c2df01e4b2013-08-02 14:53:12.557 UTCcb41787b-8927-4ff8-ad2a-4ec50dc1027d2013-08-01 14:41:46.463 UTC015db7bf-1fd9-4f19-b570-b3e9c21aba272013-08-06 09:45:47.778 UTC67e6dc72-e6f0-4334-8560-cb2fee1605402013-08-05 11:13:42.734 UTC77aabeaf-d02f-4bda-a291-e82790295d052013-08-01 15:07:27.808 UTC9b77a50d-baf5-4936-9455-4db8754a1e1e2013-08-05 10:58:56.293 UTC778bd18c-70ea-4831-a8ff-ca222bcdd3ab2011-02-03 15:12:56.329 UTC808bf10f-79cf-4ad8-b7c4-0915567296b22013-08-05 11:24:50.935 UTC8a657ece-8722-428c-8eff-ce1281e159ee2013-07-25 14:15:07.256 UTC50392d9f-9021-4655-86d6-4e6c628232c82013-08-02 
12:35:21.621 UTC104da24c-03a3-4941-a6c3-3fdee4b04af62013-07-31 12:12:55.498 UTC804a0584-07a7-4334-9f13-4d25ba89ce7f2013-08-05 11:37:59.345 UTC238c9aa4-2842-4dec-ba40-8a9a422ca87c2013-08-08 14:47:13.913 UTC9deb185a-d5b3-421d-b8b7-867f535b07f02013-08-01 14:29:45.250 UTC3dd2a242-b51a-4924-b15a-7acf1e86dd8e2013-08-01 14:57:49.545 UTC6ad6a1a5-d0b0-4bae-9aad-50271417b8b62013-08-01 12:57:48.123 UTC3ae47817-e0df-46c3-a9a9-f4d38e57020b2013-08-05 13:25:13.232 UTC0bd54491-9f34-45ae-8e1f-cad023f20c1a2013-08-05 10:15:51.297 UTCf482df22-fa2d-4b69-b07b-e733ec99ccf22013-08-01 14:37:39.664 UTCb23e87c9-c8f7-4857-bcb2-5b649f85135c2013-08-01 13:03:41.141 UTCa8332100-0556-4a8f-adbc-39e0eb500c932011-02-03 14:54:58.829 UTC8aa0e942-dbe9-423c-b4e2-cf414400860d2013-08-01 14:02:46.310 UTCa8090a72-ef06-480f-b430-d231cf3ccf562013-08-05 09:44:47.503 UTC6d06e03f-e72a-409e-8168-0db7e7e8c31a2013-07-31 12:21:00.201 UTC4785d728-b29b-4bf5-8095-915f271926282013-07-25 13:37:21.944 UTCc85b29cd-b254-42bc-98e8-b75fc4c245bf2013-08-05 09:49:41.933 UTC1c28bdfa-d41d-4311-9215-10bc571fc76d2013-07-31 12:10:50.446 UTCa2d64b2d-305a-44ed-9c3e-4dbcf575f9312013-07-31 12:06:02.75 UTCcfe5cfd0-557f-442c-8760-87b0ae1b306f2013-08-01 14:53:26.552 UTC027cfd43-b69c-4173-a28d-967f769953242013-08-02 14:16:52.758 UTC27614fd7-b142-4335-9d64-137f112703c32013-08-02 15:11:19.668 UTC5d01758b-4938-4141-9066-23da04f57ce52013-08-01 13:36:56.385 UTCdbc96ad2-018f-473d-acca-3e33b44cc59d2013-08-05 09:51:01.133 UTCd7dcb486-3934-4b45-8264-8db5398657452013-07-30 11:29:45.552 UTCd52ae246-a8f4-4695-953d-ce977b5200c52013-08-05 12:37:06.980 UTCfac3b253-9743-4ea5-b040-b430ebd35e0a2013-08-07 12:16:19.171 UTC73c90213-5465-49e8-ae2c-75ea432715452013-08-05 11:28:10.560 UTC23c02cd5-ebae-4ec3-97e0-ab0e46a4bb172013-08-01 13:44:48.519 UTC559e9841-0893-4caa-b1c3-c9aebf0adc4c2013-08-08 14:55:49.964 UTC266c2de9-b756-4915-8269-f8411f0121f42013-08-05 13:07:27.415 UTCfa1ba2b9-8f0d-4e15-9f5d-0e67993cb4582013-08-06 09:54:40.78 
UTCda4a0084-8d34-48e9-a1f6-b105367a8f6e2013-07-25 13:38:55.807 UTCbcdf7647-6a41-4abb-a49e-fa052111a1be2013-08-02 13:15:15.722 UTCee5896c5-1953-4343-a44d-6236f12dd9a72013-08-06 11:05:41.319 UTC8bc7e6bb-8060-44fa-a31c-9322ab44c3ac2013-08-02 13:31:18.812 UTC72ea5398-092c-4313-99a8-cb24da36d6fb2013-08-05 13:10:02.979 UTC9a5233bb-b5f4-4a82-9ad5-91b6faa2df3d2013-08-05 11:23:06.510 UTC2bcf5ab4-af3e-40ab-947d-fcffdcac302a2013-08-01 13:08:59.891 UTC4e6de8f6-4f85-4086-9cac-55825bd4f5902013-08-01 15:10:45.802 UTC7f0e1834-99b1-4607-99bf-1d5a9cca664a2013-08-02 14:58:59.430 UTCc2a26e6c-36a3-450d-9441-0051bc9a3e552013-08-02 13:13:36.645 UTC7ee85d66-b88a-4f43-a944-a4fea254b7452013-08-05 10:15:14.703 UTC3806ac74-1442-43cf-89a1-ca61a0a63a3a2013-08-05 13:47:05.557 UTCca415cd3-750b-44f6-8cf6-b0fff3eb9e922013-08-01 13:10:44.925 UTC9618806c-90c4-40ba-b004-5cdf4725b4a12013-07-25 13:28:42.189 UTC9b9b6151-edc6-49ea-b7e5-8da5b55cb3192013-07-29 14:09:55.421 UTCdcf224e9-c257-410c-aa56-1cb13d0831942013-08-06 09:48:48.376 UTC095a1912-763f-46f5-ae15-d2a16b7f37fe2013-08-05 09:54:17.750 UTCba4fc1f4-0624-4eac-af9b-f970db6d37962013-08-01 13:40:43.128 UTCa2c9cb5d-4826-4e75-b26d-551c65081b2f2013-08-08 15:00:16.153 UTC8b5bbbda-bb25-4e31-9d19-94cb4df65c002013-08-02 13:29:07.675 UTC8b6bcd9f-4030-4b67-b196-838fe332a12b2013-08-05 13:13:23.595 UTC9ea869b9-2a18-4725-905d-933f8f8254d52013-07-25 14:03:56.306 UTCd5491639-3bc5-4208-8f03-683820f123db2013-07-29 14:08:38.490 UTC5c47732e-3d43-4224-b5a8-ba51105751352013-08-05 13:36:21.297 UTCca3f0ef7-8cb8-48ee-b8cf-6684b3bbe0582013-07-25 13:35:00.414 UTCa7966040-5302-4ad2-82af-3ce18e3b417f2013-08-01 15:14:48.394 UTC0d522cd9-e99e-4311-845a-f24b8545c21a2013-08-01 14:30:45.677 UTCa353a37a-6536-4b73-8ce2-c7a3783323472013-08-09 10:55:20.254 UTCac6a2084-57f0-487c-8e6b-ef283bcc571f2013-08-02 13:01:59.325 UTC3c5ec6a8-e649-4019-a7b5-deafc259131f2013-08-09 10:46:31.299 UTC454a4c58-5ea2-4538-87e2-1b7bd5d26f302013-08-05 13:42:51.628 UTCSander van Boom, Paul Fisher2013-07-25 
14:00:21.554 UTC1a45a255-b1ce-4e06-be03-ce4025991f3f2013-08-08 14:45:19.731 UTC0f73bf4a-63e2-45a0-ac4e-00115b8976eb2013-08-05 12:58:28.131 UTC8f5ee0fb-adfd-4b78-b0c4-be530f4c43f82013-08-02 14:29:28.785 UTCBased on the work of Fisher: this workflow takes a search term, which is passed to the eSearch function and searched for in PubMed. I extended it by removing outputs and text extraction and added an automatic Solr storage process that uses a post.jar specified by the user.
Before running this workflow, make sure that a Solr server is up and running and that the variable attached to the SolrImport process contains the correct path.
Dependencies:
- Solr2013-07-25 14:03:50.958 UTC20efc1da-dbf9-495b-876f-d2df86c32a632013-07-31 12:18:13.833 UTC276bd7a3-3b53-4cb4-b2f7-f7fe7cb781ba2013-08-02 14:39:41.720 UTC3713d8f2-2fb5-4d4d-ad33-2f3bd787d2d12013-08-01 12:55:09.939 UTC3498bc34-1c89-4aaf-b542-c3669568ea422013-07-30 11:27:57.638 UTCd54ddd8a-3a89-4913-b445-4289983e09cb2013-08-05 13:03:51.185 UTC7e2a22e9-5f80-41f0-9488-53f179b69c8f2013-07-30 11:23:16.662 UTC412a298e-90f7-40c9-a04e-33dce4cfebba2013-07-29 10:20:56.876 UTC868c69db-94c6-43a4-b23f-fb0a349a78322013-08-05 14:01:35.954 UTC7c44ad63-2295-4240-8444-96c392b8f3ae2013-07-25 14:08:16.582 UTCf9dfce1f-40b1-4bd2-a61e-848fbbc78bf32013-08-07 12:40:15.762 UTC3439bdc0-268a-4e97-9a48-a63ab1ab3a0d2013-08-05 12:47:50.761 UTC34f05c20-4402-4ccd-b46f-d394a483b89c2013-08-01 13:24:52.947 UTCXML_extraction_and_SAbstractLocation00This is the location of the chosen abstract.2013-08-09 11:12:32.861 UTCWorkspace00/tmp/2013-08-09 11:11:59.799 UTCThis is where the workflow will store the files that are being indexed in Solr.2013-08-09 11:12:19.929 UTCPathToPostJar00/tmp/solr/example/exampledocs/post.jar2013-08-09 11:11:48.834 UTCThis is the path to the example Post.jar. 2013-08-09 11:11:25.982 UTCOutputFileLocationSolrImport_STDERRSolrImport_STDOUTExtractPubmedIDxml_text0nodelist11This XPath extracts the PubmedID from the abstract.2013-08-09 11:08:37.672 UTCnet.sf.taverna.t2.activitiesxpath-activity1.4net.sf.taverna.t2.activities.xpath.XPathActivity<?xml version="1.0" encoding="UTF-8"?>
<eFetchResult xmlns="http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed">
<WorkflowRun>
<ResearcherID>H-8686-2013</ResearcherID>
<ArticleExtractionDate>Wed Aug 07 15:04:04 CEST 2013</ArticleExtractionDate>
<WorkflowMyExperimentID>3659</WorkflowMyExperimentID>
<WorkflowVersion>3</WorkflowVersion>
<WorkflowDevelopers>Sander van Boom and Paul Fisher</WorkflowDevelopers>
<WorkflowInputs>
<StartDate/>
<EndDate/>
<SearchTerm>proteomics</SearchTerm>
<MaximumArticles>10</MaximumArticles>
</WorkflowInputs>
</WorkflowRun>
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="In-Data-Review">
<PMID Version="1">23918963</PMID>
<DateCreated>
<Year>2013</Year>
<Month>08</Month>
<Day>06</Day>
</DateCreated>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1460-2431</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>64</Volume>
<Issue>11</Issue>
<PubDate>
<Year>2013</Year>
<Month>Aug</Month>
</PubDate>
</JournalIssue>
<Title>Journal of experimental botany</Title>
<ISOAbbreviation>J. Exp. Bot.</ISOAbbreviation>
</Journal>
<ArticleTitle>Leaf proteome alterations in the context of physiological and morphological responses to drought and heat stress in barley (Hordeum vulgare L.).</ArticleTitle>
<Pagination>
<MedlinePgn>3201-12</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/jxb/ert158</ELocationID>
<Abstract>
<AbstractText>The objective of this study was to identify barley leaf proteins differentially regulated in response to drought and heat and the combined stresses in context of the morphological and physiological changes that also occur. The Syrian landrace Arta and the Australian cultivar Keel were subjected to drought, high temperature, or a combination of both treatments starting at heading. Changes in the leaf proteome were identified using differential gel electrophoresis and mass spectrometry. The drought treatment caused strong reductions of biomass and yield, while photosynthetic performance and the proteome were not significantly changed. In contrast, the heat treatment and the combination of heat and drought reduced photosynthetic performance and caused changes of the leaf proteome. The proteomic analysis identified 99 protein spots differentially regulated in response to heat treatment, 14 of which were regulated in a genotype-specific manner. Differentially regulated proteins predominantly had functions in photosynthesis, but also in detoxification, energy metabolism, and protein biosynthesis. The analysis indicated that de novo protein biosynthesis, protein quality control mediated by chaperones and proteases, and the use of alternative energy resources, i.e. glycolysis, play important roles in adaptation to heat stress. In addition, genetic variation identified in the proteome, in plant growth and photosynthetic performance in response to drought and heat represent stress adaption mechanisms to be exploited in future crop breeding efforts.</AbstractText>
</Abstract>
<Affiliation>Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Köln, Germany.</Affiliation>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Rollins</LastName>
<ForeName>J A</ForeName>
<Initials>JA</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Habte</LastName>
<ForeName>E</ForeName>
<Initials>E</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Templer</LastName>
<ForeName>S E</ForeName>
<Initials>SE</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Colby</LastName>
<ForeName>T</ForeName>
<Initials>T</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Schmidt</LastName>
<ForeName>J</ForeName>
<Initials>J</Initials>
</Author>
<Author ValidYN="Y">
<LastName>von Korff</LastName>
<ForeName>M</ForeName>
<Initials>M</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>J Exp Bot</MedlineTA>
<NlmUniqueID>9882906</NlmUniqueID>
<ISSNLinking>0022-0957</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Abiotic stress</Keyword>
<Keyword MajorTopicYN="N">Rubisco activase</Keyword>
<Keyword MajorTopicYN="N">barley</Keyword>
<Keyword MajorTopicYN="N">drought</Keyword>
<Keyword MajorTopicYN="N">heat</Keyword>
<Keyword MajorTopicYN="N">proteomics</Keyword>
<Keyword MajorTopicYN="N">yield.</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2013</Year>
<Month>8</Month>
<Day>7</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2013</Year>
<Month>8</Month>
<Day>7</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2013</Year>
<Month>8</Month>
<Day>7</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pii">ert158</ArticleId>
<ArticleId IdType="doi">10.1093/jxb/ert158</ArticleId>
<ArticleId IdType="pubmed">23918963</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>
</eFetchResult>/default:eFetchResult/default:PubmedArticleSet/default:PubmedArticle/default:MedlineCitation/default:PMIDdefaulthttp://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmednet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeRead_Text_Filefileurl0filecontents00Some files are to be tasted, others to be swallowed and some few to be chewed and digested.
This program reads a file and returns the content. 2013-08-09 11:09:50.861 UTCnet.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivityfileurl0'text/plain'java.lang.Stringtrueencoding0'text/plain'java.lang.Stringtruefilecontents0'text/plain'0workflownet.sourceforge.taverna.scuflworkers.io.TextFileReadernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateFileContentPubmedID0Title0XMLContent00This process takes all the values from the XPaths and creates an XML file with a format that Solr can understand. 2013-08-09 11:07:44.258 UTCnet.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivityPubmedID0text/plainjava.lang.StringtrueTitle0text/plainjava.lang.StringtrueXMLContent00workflownet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeWrite_Text_Filefilecontents0outputFile0This process writes a text file to a known location. 
But I guess you already figured that out from the title...2013-08-09 11:05:55.436 UTCnet.sf.taverna.t2.activitieslocalworker-activity1.4net.sf.taverna.t2.activities.localworker.LocalworkerActivityoutputFile0'text/plain'java.lang.Stringtruefilecontents0'text/plain'java.lang.Stringtrueencoding0'text/plain'java.lang.StringtrueoutputFile0'text/plain'0workflownet.sourceforge.taverna.scuflworkers.io.TextFileWriternet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeCreateFileLocationWorkspace0PubmedID0FileLocation00This process creates the final location to store the indexed files based on the workspace and the pubmedID.2013-08-09 11:06:42.218 UTCnet.sf.taverna.t2.activitiesbeanshell-activity1.4net.sf.taverna.t2.activities.beanshell.BeanshellActivityWorkspace0text/plainjava.lang.StringtruePubmedID0text/plainjava.lang.StringtrueFileLocation00workflownet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeSolrImportinputFile0pathToPostJar0STDERR00STDOUT00SolrImport takes the path of 
the txt file and stores this in a Solr database.
Make sure that the Solr server is running and that the pathToPostJar variable contains the correct path to post.jar.
If Solr is running locally, you can check whether the files have been stored by browsing to the following location: http://localhost:8983/solr/#/
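The SolrImport step comes down to a single external command. As a rough illustration only, the Python sketch below builds the same `java -Dauto -jar <post.jar> <file>` invocation that this workflow uses and refuses to run when the jar or the input file is missing. The function names and the up-front path check are assumptions made for this example; they are not part of the workflow.

```python
import os
import subprocess


def build_post_command(path_to_post_jar, input_file):
    """Build the argument list for the post.jar call used by SolrImport."""
    # Passing a list (instead of one shell string) keeps paths that contain
    # spaces intact without any extra quoting.
    return ["java", "-Dauto", "-jar", path_to_post_jar, input_file]


def solr_import(path_to_post_jar, input_file):
    """Run post.jar on one file and return (stdout, stderr) as text.

    Hypothetical helper: it only checks that both paths exist before
    calling Java; it does not verify that a Solr server is reachable.
    """
    for path in (path_to_post_jar, input_file):
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
    result = subprocess.run(
        build_post_command(path_to_post_jar, input_file),
        capture_output=True,
        text=True,
        check=False,  # inspect stderr yourself, as the workflow does
    )
    return result.stdout, result.stderr
```

A call such as `solr_import("/tmp/solr/example/exampledocs/post.jar", "/tmp/23918963.xml")` would then return the two streams that the workflow exposes as SolrImport_STDOUT and SolrImport_STDERR.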
Solr can be downloaded at:
http://lucene.apache.org/solr/2013-07-25 14:14:44.103 UTCnet.sf.taverna.t2.activitiesexternal-tool-activity1.4net.sf.taverna.t2.activities.externaltool.ExternalToolActivity789663B8-DA91-428A-9F7D-B3F3DA185FD4default local<?xml version="1.0" encoding="UTF-8"?>
<localInvocation><shellPrefix>/bin/sh -c</shellPrefix><linkCommand>/bin/ln -s %%PATH_TO_ORIGINAL%% %%TARGET_NAME%%</linkCommand></localInvocation>
6acb54d9-5501-46e2-85de-ee69bbd97fc4# Note the -Dauto argument in the command. It makes post.jar detect the file
# type from the file extension, so this process can handle a wide range of
# different extensions.
java -Dauto -jar "%%pathToPostJar%%" "%%inputFile%%"12001800inputFilepathToPostJarinputFileinputFilefalsefalsefalseUTF-8falsefalsefalsepathToPostJarpathToPostJarfalsefalsefalseUTF-8falsefalsefalsefalsetruetrue0falsenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeExtractTitlexml_text0nodelist11This XPath extracts the title from the abstract.2013-08-09 11:08:21.80 UTCnet.sf.taverna.t2.activitiesxpath-activity1.4net.sf.taverna.t2.activities.xpath.XPathActivity<?xml version="1.0" encoding="UTF-8"?>
<eFetchResult xmlns="http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed">
<WorkflowRun>
<ResearcherID>H-8686-2013</ResearcherID>
<ArticleExtractionDate>Wed Aug 07 15:04:12 CEST 2013</ArticleExtractionDate>
<WorkflowMyExperimentID>3659</WorkflowMyExperimentID>
<WorkflowVersion>3</WorkflowVersion>
<WorkflowDevelopers>Sander van Boom and Paul Fisher</WorkflowDevelopers>
<WorkflowInputs>
<StartDate/>
<EndDate/>
<SearchTerm>proteomics</SearchTerm>
<MaximumArticles>10</MaximumArticles>
</WorkflowInputs>
</WorkflowRun>
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="Publisher">
<PMID Version="1">23917802</PMID>
<DateCreated>
<Year>2013</Year>
<Month>8</Month>
<Day>6</Day>
</DateCreated>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1574-4647</ISSN>
<JournalIssue CitedMedium="Internet">
<PubDate>
<Year>2013</Year>
<Month>Aug</Month>
<Day>6</Day>
</PubDate>
</JournalIssue>
<Title>Age (Dordrecht, Netherlands)</Title>
<ISOAbbreviation>Age (Dordr)</ISOAbbreviation>
</Journal>
<ArticleTitle>Urine proteomes of healthy aging humans reveal extracellular matrix (ECM) alterations and immune system dysfunction.</ArticleTitle>
<Pagination>
<MedlinePgn/>
</Pagination>
<Abstract>
<AbstractText NlmCategory="UNLABELLED">Aging is a complex physiological process that poses considerable conundrums to rapidly aging societies. For example, the risk of dying from cardiovascular diseases and/or cancer steadily declines for people after their 60s, and other causes of death predominate for seniors older than 80 years of age. Thus, physiological aging presents numerous unanswered questions, particularly with regard to changing metabolic patterns. Urine proteomics analysis is becoming a non-invasive and reproducible diagnostic method. We investigated the urine proteomes in healthy elderly people to determine which metabolic processes were weakened or strengthened in aging humans. Urine samples from 37 healthy volunteers aged 19-90 years (19 men, 18 women) were analyzed for protein expression by liquid chromatography-tandem mass spectrometry. This generated a list of 19 proteins that were differentially expressed in different age groups (young, intermediate, and old age). In particular, the oldest group showed protein changes reflective of altered extracellular matrix turnover and declining immune function, in which changes corresponded to reported changes in cardiovascular tissue remodeling and immune disorders in the elderly. Thus, urinary proteome changes in the elderly appear to reflect the physiological processes of aging and are particularly clearly represented in the circulatory and immune systems. Detailed identification of "protein trails" creates a more global picture of metabolic changes that occur in the elderly.</AbstractText>
</Abstract>
<Affiliation>Mass Spectrometry Laboratory, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, ul. Pawinskiego 5a, 02-106, Warsaw, Poland.</Affiliation>
<AuthorList>
<Author>
<LastName>Bakun</LastName>
<ForeName>M</ForeName>
<Initials>M</Initials>
</Author>
<Author>
<LastName>Senatorski</LastName>
<ForeName>G</ForeName>
<Initials>G</Initials>
</Author>
<Author>
<LastName>Rubel</LastName>
<ForeName>T</ForeName>
<Initials>T</Initials>
</Author>
<Author>
<LastName>Lukasik</LastName>
<ForeName>A</ForeName>
<Initials>A</Initials>
</Author>
<Author>
<LastName>Zielenkiewicz</LastName>
<ForeName>P</ForeName>
<Initials>P</Initials>
</Author>
<Author>
<LastName>Dadlez</LastName>
<ForeName>M</ForeName>
<Initials>M</Initials>
</Author>
<Author>
<LastName>Paczek</LastName>
<ForeName>L</ForeName>
<Initials>L</Initials>
</Author>
</AuthorList>
<Language>ENG</Language>
<PublicationTypeList>
<PublicationType>JOURNAL ARTICLE</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2013</Year>
<Month>8</Month>
<Day>6</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<MedlineTA>Age (Dordr)</MedlineTA>
<NlmUniqueID>101250497</NlmUniqueID>
</MedlineJournalInfo>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2013</Year>
<Month>2</Month>
<Day>5</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2013</Year>
<Month>7</Month>
<Day>1</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="aheadofprint">
<Year>2013</Year>
<Month>8</Month>
<Day>6</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2013</Year>
<Month>8</Month>
<Day>7</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2013</Year>
<Month>8</Month>
<Day>7</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2013</Year>
<Month>8</Month>
<Day>7</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>aheadofprint</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="doi">10.1007/s11357-013-9562-7</ArticleId>
<ArticleId IdType="pubmed">23917802</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>
</eFetchResult>/default:eFetchResult/default:PubmedArticleSet/default:PubmedArticle/default:MedlineCitation/default:Article/default:ArticleTitledefaulthttp://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmednet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0100050000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeExtractPubmedIDxml_textRead_Text_FilefileurlCreateFileContentPubmedIDCreateFileContentTitleWrite_Text_FilefilecontentsWrite_Text_FileoutputFileCreateFileLocationWorkspaceCreateFileLocationPubmedIDSolrImportinputFileSolrImportpathToPostJarExtractTitlexml_textOutputFileLocationSolrImport_STDERRSolrImport_STDOUTThis workflow extracts information from the abstracts and imports them in Solr. 
2013-08-09 11:11:01.596 UTCeb33b76c-eef6-4571-a24c-1c1259277d712013-08-08 14:04:39.26 UTC1edcb63d-d915-44df-a049-4d08e51a4c4b2013-08-09 11:12:40.460 UTC221d0764-d2f9-4b5e-b919-d34b4090f83b2013-08-08 14:19:03.953 UTC8336e90d-0d11-4ae4-bcc7-9095c78600e42013-08-08 14:31:46.947 UTCXML extraction and Solr import of abstracts.2013-08-09 11:10:32.633 UTCcd08d3c5-d20e-44fd-86d9-6497f8d407032013-08-09 11:09:46.680 UTCc78610c0-f7f7-4231-b8f1-f3b24adf386d2013-08-09 10:53:40.522 UTC459e8502-d439-434b-8456-592ebb2559772013-08-08 14:15:43.30 UTCb7721579-d224-491e-ab14-34596ffc34f62013-08-08 14:06:16.517 UTC1889a560-dffe-4bdb-a5dc-dd36ddd2de332013-08-08 14:17:58.94 UTC7a67c280-f0b2-4e49-8c6c-39ecdfe2cbb22013-08-08 14:40:42.973 UTC59f6b762-1b12-44c8-aae4-85a05e31d26d2013-08-08 14:59:03.531 UTCSander van Boom2013-08-09 11:10:03.667 UTCXPathPubmedIdspubmed_ids00text/plain2011-02-03 14:53:07.50 UTC2011-02-03 14:53:07.50 UTCAbstractXMLrun_eFetchinpp0attachmentList11outp00net.sf.taverna.t2.activitieswsdl-activity1.4net.sf.taverna.t2.activities.wsdl.WSDLActivityhttp://www.ncbi.nlm.nih.gov/entrez/eutils/soap/v2.0/efetch_pubmed.wsdlrun_eFetchnet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeinppXMLid0output00net.sf.taverna.t2.activitieswsdl-activity1.4net.sf.taverna.t2.activities.wsdl.xmlsplitter.XMLInputSplitterActivityid0'text/plain'java.lang.StringtrueWebEnv0'text/plain'java.lang.Stringtruequery_key0'text/plain'java.lang.Stringtruetool0'text/plain'java.lang.Stringtrueemail0'text/plain'java.lang.Stringtrueretstart0'text/plain'java.lang.Stringtrueretmax0'text/plain'java.lang.Stringtruerettype0'text/plain'java.lang.Stringtrueoutput0'text/xml'0<s:extensions xmlns:s="http://org.embl.ebi.escience/xscufl/0.1alpha"><s:complextype optional="false" unbounded="false" typename="eFetchRequest" name="inpp" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}eFetchRequest"><s:elements><s:basetype optional="true" unbounded="false" typename="string" name="id" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>id" /><s:basetype optional="true" unbounded="false" typename="string" name="WebEnv" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>WebEnv" /><s:basetype optional="true" unbounded="false" typename="string" name="query_key" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>query_key" /><s:basetype optional="true" unbounded="false" typename="string" name="tool" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>tool" /><s:basetype optional="true" unbounded="false" typename="string" name="email" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>email" /><s:basetype optional="true" unbounded="false" typename="string" name="retstart" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>retstart" /><s:basetype optional="true" unbounded="false" typename="string" name="retmax" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>retmax" /><s:basetype optional="true" unbounded="false" typename="string" name="rettype" qname="{http://www.ncbi.nlm.nih.gov/soap/eutils/efetch_pubmed}>eFetchRequest>rettype" 
/></s:elements></s:complextype></s:extensions>net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize1net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry1.0000net.sf.taverna.t2.coreworkflowmodel-impl1.4net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invokerun_eFetchinppinppXMLidAbstractXMLdd13f2ba-ad40-4acc-bc47-ee46fafcba522013-07-25 13:32:10.434 UTC363be023-70a0-4cc5-b02a-199d04d05e582011-02-03 14:54:58.907 UTCThis workflow takes in a number of search terms (as used in the normal PubMed interface) and retrieves a list of PubMed ids in a list format.2011-02-03 14:53:07.444 UTCPaul Fisher2011-02-03 14:53:07.444 UTCXPath Pubmed Ids2011-02-03 14:53:07.444 UTC