Version 7 (latest) of 7
Version created on: 16/05/09 @ 01:06:26 by: Marco Roos
Last edited on: 16/05/09 @ 01:13:19 by: Marco Roos
Title: BioAID_EnirchBioModelWithProteinsFromText
Type: Taverna 1
Description
This workflow is for demonstration purposes only. Please contact the authors if you wish to try it. We will gladly collaborate with you.
Summary
This workflow extracts proteins and protein relations from Medline. Extracted protein names (symbols of at least 3 characters) are validated against mouse, rat, and human UniProt symbols, so the results are limited to these species. The workflow follows these basic steps:
To support hypothesis formation, the results are added to a repository containing proto-ontologies with biological classes and procedural classes to log evidence. The models are based on RDF and OWL.
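The validation rule described in the summary (symbols of at least 3 characters that match a known UniProt symbol) can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the function name, the small symbol table, and the case-insensitive matching are all assumptions made for the example.

```python
def validate_protein_names(candidates, known_symbols, min_length=3):
    """Keep candidates of at least min_length characters that match a known symbol.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    known = {symbol.upper() for symbol in known_symbols}
    validated = []
    for name in candidates:
        symbol = name.strip()
        if len(symbol) >= min_length and symbol.upper() in known:
            validated.append(symbol)
    return validated


# Hypothetical stand-in for the mouse, rat, and human UniProt symbol lists.
KNOWN_SYMBOLS = {"EZH2", "TP53", "BRCA1"}

if __name__ == "__main__":
    print(validate_protein_names(["EZH2", "p5", "tp53", "FOO1"], KNOWN_SYMBOLS))
```

In this sketch "p5" is rejected for being too short and "FOO1" for not appearing in the symbol table.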
Acknowledgements:
Synonyms and Uniprot services: Martijn Scheumie, BioSemantics Group, University of Rotterdam, The Netherlands (BioRange project)
Known issues
Occasionally the workflow fails when an intermediate step returns no results (e.g. after a timeout or because of a bug in the workflow). This problem will be addressed in Taverna 2 using its stricter list iteration mechanism and the AIDA plugin for Taverna 2. The workflow also contains some elements that are not yet functional. These show as failed when the workflow is run and can be ignored.
Please contact us if you have any questions about the workflow, our approach, or if you experience technical difficulties.
Download
Run
Option 1:
Note: you need to have both the WHIP Launcher and the Taverna myExperiment/WHIP plugin installed on your machine for this to work. See here for information.
Option 2:
Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/379/download?version=7
Taverna is available from http://taverna.sourceforge.net/
If you are having problems downloading it in Taverna, you may need to provide your username and password in the URL so that Taverna can access the Workflow:
Replace http:// in the link above with http://yourusername:yourpassword@
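The URL rewrite described above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and percent-encoding the credentials is an addition of this sketch (it keeps characters such as '@' in a password from breaking the URL), not something the page prescribes.

```python
from urllib.parse import quote


def add_credentials(url, username, password):
    """Replace the leading http:// with http://username:password@."""
    prefix = "http://"
    if not url.startswith(prefix):
        raise ValueError("expected a URL starting with http://")
    # Percent-encode so reserved characters in the credentials stay unambiguous.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{prefix}{user}:{secret}@{url[len(prefix):]}"


if __name__ == "__main__":
    link = "http://www.myexperiment.org/workflows/379/download?version=7"
    print(add_credentials(link, "yourusername", "yourpassword"))
```

The resulting string is what would be pasted into File > 'Open workflow location...'.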
Workflow Components
Inputs
| Name | Description |
|---|---|
| query | Biological query, e.g. a protein of interest. See the Lucene documentation for advanced queries (http://lucene.apache.org/). For terms within double quotes, synonyms of protein names are looked up and added to the query. |
| max_document_nr | Limits the maximum number of hits the search will produce. In Taverna 1, '100' works well, while 1000 and above is likely to halt Taverna 1 due to memory problems. This also depends on the memory setting of the Java virtual machine on the client (usually your local Taverna). |
| aida_magic_word | A magic word is required to make use of the AIDA semantic repository for BioAID workflows. Please ask Scott Marshall (marshall@science.uva.nl) or Marco Roos (M.Roos1@uva.nl) for the magic word. NB: this semantic repository is for temporary data only. You should expect the repository to be cleared often and without warning. |
| clear_knowledge_repository | true if you would like the knowledge base (triple store) to be cleared and the proto-ontologies reloaded; false otherwise. |
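How double-quoted terms in the query input might be picked out and expanded with synonyms can be sketched as follows. This is an illustration under stated assumptions, not the workflow's own code: the function names and the synonym table are hypothetical, and the real workflow obtains synonyms from a web service rather than from a dictionary.

```python
import re

QUOTED_TERM = re.compile(r'"([^"]+)"')


def quoted_terms(query):
    """Return the terms that appear within double quotes."""
    return QUOTED_TERM.findall(query)


def expand_query(query, synonyms):
    """Replace each quoted term that has synonyms by an OR group of alternatives."""
    def expand(match):
        term = match.group(1)
        alternatives = [term] + list(synonyms.get(term, []))
        if len(alternatives) == 1:
            return match.group(0)  # no synonyms known: leave the term untouched
        return "(" + " OR ".join(f'"{alt}"' for alt in alternatives) + ")"

    return QUOTED_TERM.sub(expand, query)


if __name__ == "__main__":
    # Hypothetical synonym table standing in for the synonym service.
    table = {"EZH2": ["ENX-1"]}
    print(quoted_terms('(EZH2 OR "Enhancer of Zeste")'))
    print(expand_query('"EZH2" AND histone', table))
```

Unquoted terms are passed through unchanged, which matches the input description: only terms within double quotes are candidates for synonym expansion.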
Processors
| Name | Type | Description |
|---|---|---|
| false | stringconstant | |
| MedLineTotalDocCount | stringconstant | |
| MinLogLikelhoodScoreDescription | stringconstant | |
| true | stringconstant | |
| Fail | local | |
| s06_AddProteinRelationToSemanticModel | workflow | |
| negate | beanshell | |
| CountProteinsPerDocument | beanshell | |
| CloneStringToList_NERserviceRunInstance | beanshell | |
| CloneStringToList_DocumentInstance | beanshell | |
| CloneStringToList_NERprocessRunInstance | beanshell | |
| Timestamp | beanshell | |
| s07_AddScoreToSemanticModel | workflow | Add ranking score for discovered protein terms to the semantic model. |
| s02_AddOriginalQueryToSemanticModel | workflow | Add Query to Semantic model with Sesame service cf example Biological Query |
| 02_RetrieveDocumentsFromMedline | workflow | This workflow applies the search web service from the AIDA toolbox. Comments: This search service is based on Lucene defaults; it may be necessary to optimize the query string to adapt the behaviour to what is most relevant in a particular domain (e.g. for Medline, prioritizing by publication date is useful). Lucene favours shorter sentences, which may be bad for subsequent information extraction. |
| p03_AddProteinDiscoveryToSemanticModel | workflow | |
| p02_AddDocumentDiscoveryToSemanticModel | workflow | |
| s03b_AddQueryProteinsToSemanticModel | workflow | Add Protein to Semantic model with Sesame service cf example Discovered Proteins |
| 01b_UniProtXrefURLs | workflow | Adds URL cross references to various protein information resources. |
| s03_AddExpandedQueryToSemanticModel | workflow | Add automatically expanded query to Semantic model. |
| p01_AddWorkflowToSemanticModel | workflow | |
| 05_ScoreExtractedProteins | workflow | This workflow calculates a min log likelihood score for the combination of a discovered protein and a protein of interest (the query protein). Note that at the moment the total count of Medline papers, which is part of the formula, is hard-coded and not exact. Given its size this should not matter much, and certainly not when comparing against other likelihoods calculated with the same value. |
| s08_AddRdfToRepository | workflow | |
| 01b_CalculateQueryFrequency | workflow | |
| s01_AddBiologicalModelToSemanticModel | workflow | Add Query to Semantic model with Sesame service cf example Biological Query |
| s00_InitializeSemanticStorage | workflow | |
| 06_UniProtXrefURLs_iHopBYPASS | workflow | Adds URL cross references to various protein information resources. |
| 04_ExtractProteinRelations_HomoSapiens | workflow | Workflow to extract protein-protein interactions from text, followed by filtering for protein names known as human protein names. The protein-protein interaction service takes the output in 'IOB' format from applyCRF, which annotates proteins as such in text. |
| s05_AddDiscoveredProteinToSemanticModel | workflow | Add Protein to Semantic model with Sesame service cf example Discovered Proteins |
| s04_AddDocToSemanticModel | workflow | Add Document to Semantic model with Sesame service cf example discovered document |
| 01_ProcessQuery | workflow | Workflow to optimize a Lucene document retrieval query to (1) increase the priority of recent years (in decreasing order from 2009 down to 2002) and (2) limit a subsequent search to a specific organism using a MeSH organism tag. |
| 03_ExtractProteins_UniProtValidation | workflow | Workflow to extract proteins from text, followed by filtering protein names known as human protein names. |
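The year prioritisation described for 01_ProcessQuery can be approximated with Lucene's boost syntax (term^boost). The sketch below is an assumption-laden illustration rather than the workflow's implementation: the index field name "year", the linear boost values, and the function name are all hypothetical. It makes the original query a required clause (+) and adds the boosted year clauses as optional ones, so they influence ranking without excluding older documents.

```python
def prioritise_by_year(query, newest=2009, oldest=2002, field="year"):
    """Append optional, boosted year clauses so that recent years rank higher."""
    clauses = []
    # Boost grows with the year: oldest gets 1, newest gets the highest value.
    for boost, year in enumerate(range(oldest, newest + 1), start=1):
        clauses.append(f"{field}:{year}^{boost}")
    clauses.reverse()  # list the most recent (highest boost) year first
    return f"+({query}) ({' OR '.join(clauses)})"


if __name__ == "__main__":
    print(prioritise_by_year('EZH2 OR "Enhancer of Zeste"'))
```

Whether the actual index exposes the publication year under this field name would need to be checked against the AIDA search service.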
Beanshells
| Name | Description | Inputs | Outputs |
|---|---|---|---|
| InstantiateDiscoveryScore | | instance_ontology_url, instance_name, type_uri, label_string, comment_string, datetime | NTriple_InstanceOf_statement, instance_uri |
| ConcatenateStringList | | stringlist, delimiter | output |
| DefineHasDiscoveryScoreRelation | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| DefineTypedPropertyOfInstance | | instance_uri, property_uri, property_string, property_type | NTriple_PropertyOfInstance_statement |
| DefinePropertyOfInstance | | property_uri, property_string, instance_uri | NTriple_PropertyOfInstance_statement |
| InstantiateQueryInstance | | instance_ontology_url, instance_name, type_uri, label_string, comment_string, datetime | NTriple_InstanceOf_statement, instance_uri |
| ConcatenateRDFstatements | | stringlist, delimiter | output |
| ReplaceCharsForQueryID | | input | output |
| SlashDoubleQuotes | | input, findstring, replacestring | output |
| DefineSemanticRelation | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| DefineBooleanPropertyOfInstance | | instance_uri, property_uri, property_string | NTriple_PropertyOfInstance_statement |
| DefineSemanticRelationRunComponent | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| InstantiateRunOfNERProcessAndService | | instance_ontology_url, datetime, timestamp_shortstring, executed_service_instance_uri, executed_process_instance_uri, process_service_relation_uri, run_of_process_property_uri, run_of_service_property_uri, process_run_type_uri, service_run_type_uri, input | NTriple_statements, process_run_instance_uri, service_run_instance_uri |
| ConcatenateStringList | | stringlist, delimiter | output |
| ConcatenateStringList | | stringlist, delimiter | output |
| DefineSemanticRelationRunComponent | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| InstantiateRunOfDocRetrProcessAndService | | instance_ontology_url, datetime, timestamp_shortstring, executed_service_instance_uri, executed_process_instance_uri, process_service_relation_uri, run_of_process_property_uri, run_of_service_property_uri, process_run_type_uri, service_run_type_uri, input | NTriple_statements, process_run_instance_uri, service_run_instance_uri |
| DefineSeeAlsoEntrezUniprotForInstance | | instance_uri, annotation_property_uri, seeAlso_url | NTriple_PropertyOfInstance_statement |
| DefineSeeAlsoiHopForInstance | | instance_uri, annotation_property_uri, seeAlso_url | NTriple_PropertyOfInstance_statement |
| InstantiateSemanticType_Protein | | instance_ontology_url, instance_name, type_uri, label_string, comment_string, datetime | NTriple_InstanceOf_statement, instance_uri |
| DefineSeeAlsoiHopQueryForInstance | | instance_uri, annotation_property_uri, seeAlso_url | NTriple_PropertyOfInstance_statement |
| DefineSemanticRelation_isModelComponent | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| DefineSemanticRelation_references | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| DefineSeeAlsoExpasyForInstance | | instance_uri, annotation_property_uri, seeAlso_url | NTriple_PropertyOfInstance_statement |
| ConcatenateStringList | | stringlist, delimiter | output |
| InstantiateSemanticType_ProteinTerm | | instance_ontology_url, instance_name, type_uri, label_string, comment_string, datetime | NTriple_InstanceOf_statement, instance_uri |
| ProteinTermAnnotations | | protein_name, uniprot_id | protein_label, protein_comment, protein_term_label, protein_term_comment |
| ReplaceCharsForQueryID | | input | output |
| DefineSemanticRelation_expansion_of | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| ConcatenateRDFstatements | | stringlist, delimiter | output |
| DefinePropertyOfInstance | | property_uri, property_string, instance_uri | NTriple_PropertyOfInstance_statement |
| DefineSemanticRelation_references | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| SlashDoubleQuotes | | input, findstring, replacestring | output |
| InstantiateQueryInstance | | instance_ontology_url, instance_name, type_uri, label_string, comment_string, datetime | NTriple_InstanceOf_statement, instance_uri |
| DefineBooleanPropertyOfInstance | | instance_uri, property_uri, property_string | NTriple_PropertyOfInstance_statement |
| ConcatenateStringList | | stringlist, delimiter | output |
| InstantiateRunOfTextMiningProcessAndWorkflow | | instance_ontology_url, datetime, timestamp_shortstring, executed_service_instance_uri, executed_process_instance_uri, process_service_relation_uri, run_of_process_property_uri, run_of_service_property_uri, process_run_type_uri, service_run_type_uri, input | NTriple_statements, process_run_instance_uri, service_run_instance_uri |
| PoiAndDpQuery | | poi_query, dp_query | poi_and_dp_query |
| RelativeFrequencyPoiInCorpus | | corpus_total, poi_count_in_corpus | relative_frequency |
| validate_query | | query | validated_query, count_brackets |
| CloneFrequencies | | copy_number, input | clones |
| MinLogLikelihood | | query_frequency, discovered_frequency, query_discovered_frequency, total_frequency | minloglikelihood |
| CountListElements | | list | count |
| RelativeFrequencyPoiInCorpus | | corpus_total, poi_count_in_corpus | relative_frequency |
| InstantiateSemanticType | | instance_ontology_url, instance_name, type_uri, label_string, comment_string | NTriple_InstanceOf_statement, instance_uri |
| ParseToBioModelID | | input | output |
| RepositoryRef | | sesame_url, repository | bioaid_repository_url |
| AddQueryToFilename | | input_query, input_filename | new_filename |
| FilterTrueProteinPairsByUniProtID | | protein1, uniprot1, protein2, uniprot2 | true_protein1, true_uniprot1, true_protein2, true_uniprot2 |
| UniProtOrNot2 | | uniprotIDlist | uniprotID_or_False |
| UniProtOrNot1 | | uniprotIDlist | uniprotID_or_False |
| example_interaction_doc | | | interaction_doc |
| ConcatenateRelation | | protein_name1, protein_name2, interaction_term, uniprot_id1, uniprot_id2 | relation, id_relation |
| DefineSeeAlsoiHopForInstance | | instance_uri, annotation_property_uri, seeAlso_url | NTriple_PropertyOfInstance_statement |
| InstantiateSemanticType_ProteinTerm | | instance_ontology_url, instance_name, type_uri, label_string, comment_string, datetime | NTriple_InstanceOf_statement, instance_uri |
| DefineSemanticRelation_discovered_by | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| DefineSeeAlsoiHopQueryForInstance | | instance_uri, annotation_property_uri, seeAlso_url | NTriple_PropertyOfInstance_statement |
| DefineDocComponentRelation | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| DefineRelation_NERCRFhasInput | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| DefineSemanticRelation_isModelComponent | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| DefineSeeAlsoExpasyForInstance | | instance_uri, annotation_property_uri, seeAlso_url | NTriple_PropertyOfInstance_statement |
| DefineRelation_NERCRFhasOutput | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| ProteinTermAnnotations | | protein_name, uniprot_id | protein_label, protein_comment, protein_term_label, protein_term_comment |
| InstantiateSemanticType_Protein | | instance_ontology_url, instance_name, type_uri, label_string, comment_string, datetime | NTriple_InstanceOf_statement, instance_uri |
| DefineSemanticRelation_references | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| ConcatenateStringList | | stringlist, delimiter | output |
| DefineSeeAlsoEntrezUniprotForInstance | | instance_uri, annotation_property_uri, seeAlso_url | NTriple_PropertyOfInstance_statement |
| DefineRelation_DocRetr_hasInput | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| ConcatenateStringList | | stringlist, delimiter | output |
| DefineSemanticRelation_has_output | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| DefineSemanticRelation_discovered_by | | domain_instance_uri, relation_uri, range_instance_uri | NTriple_Relation_statement |
| InstantiateSemanticType | | instance_ontology_url, instance_name, type_uri, label_string, comment_string, datetime | NTriple_InstanceOf_statement, instance_uri |
| validatePotentialProteinName | | uniProtIDlist, potentialProteinName | validatedUniProtIDlist, validatedUniProtID, validatedProteinName, validated |
| QueryToStringIfNotList | | input_list | output |
| RemoveFalseProteins | | validated, uniProtIDlist, proteinNameList, uniProtIDlistList | cleanUniProtIDlist, cleanProteinNamelist, cleanUniProtIDlistList |
| Concat_synonyms | | synonymlist, query_term | new_query |
| ListFindAndReplace | | input, findstringlist, replacestringlist | output |
| ReListEmpty | | listOfLists_in | listOfLists_out |
| Prioritise_lucene_query | | query_string, priority_string | lucene_query |
| validatePotentialProteinName | | uniProtIDlist, potentialProteinName | validatedUniProtIDlist, validatedUniProtID, validatedProteinName, validated |
| Trim_protein_gene | | input | output |
| negate | | true_or_false | false_or_true |
| CountProteinsPerDocument | | list | count |
| CloneStringToList_NERserviceRunInstance | | copy_number, input | clones |
| CloneStringToList_DocumentInstance | | copy_number, input | clones |
| CloneStringToList_NERprocessRunInstance | | copy_number, input | clones |
| Timestamp | | | now_RFC822, now_short, now_ISO8601 |
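Many of the beanshells listed above take URIs and strings and emit N-Triples statements. A minimal sketch of that pattern in Python is given below. It is not the beanshell source: the function names only mirror the port names in the table, the example.org URIs are hypothetical, and joining the ontology URL and the instance name with '#' is an assumption of this sketch.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape backslashes, double quotes, and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def relation_statement(domain_instance_uri, relation_uri, range_instance_uri):
    """One triple linking two instances (cf. NTriple_Relation_statement)."""
    return f"<{domain_instance_uri}> <{relation_uri}> <{range_instance_uri}> ."


def property_statement(instance_uri, property_uri, property_string):
    """One triple with a plain literal object (cf. NTriple_PropertyOfInstance_statement)."""
    return f'<{instance_uri}> <{property_uri}> "{escape_literal(property_string)}" .'


def instantiate(instance_ontology_url, instance_name, type_uri, label_string):
    """Return (statements, instance_uri) for a new typed, labelled instance."""
    # Assumption: instance URIs are formed as <ontology URL>#<instance name>.
    instance_uri = f"{instance_ontology_url}#{instance_name}"
    statements = [
        relation_statement(instance_uri, RDF_TYPE, type_uri),
        property_statement(instance_uri, RDFS_LABEL, label_string),
    ]
    return "\n".join(statements), instance_uri


if __name__ == "__main__":
    triples, uri = instantiate(
        "http://example.org/instances",   # hypothetical ontology URL
        "EZH2_term",                      # hypothetical instance name
        "http://example.org/bio#ProteinTerm",
        'protein term "EZH2"',
    )
    print(triples)
```

Escaping double quotes before embedding a string in a literal is presumably what beanshells such as SlashDoubleQuotes are for, although the page does not say so explicitly.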
Outputs
| Name | Description |
|---|---|
| ProteinURL | |
| PubMed_URL | |
| protein_discovery_score | |
| Protein_name | |
| Relation | |
| RDFtriples_doc_url | |
| BioAID_RDFrepository_URL | |
Datalinks
| Source | Sink |
|---|---|
| aida_magic_word | s00_InitializeSemanticStorage:aida_magic_word |
| aida_magic_word | s08_AddRdfToRepository:aida_magic_word |
| max_document_nr | 02_RetrieveDocumentsFromMedline:maxHits |
| query | 01_ProcessQuery:query_string |
| 01_ProcessQuery:UniProtID | s03b_AddQueryProteinsToSemanticModel:uniprot_id |
| 01_ProcessQuery:UniProtIDlist | 01b_UniProtXrefURLs:UniProtID |
| 01_ProcessQuery:extended_lucene_query | 01b_CalculateQueryFrequency:query |
| 01_ProcessQuery:extended_lucene_query | 02_RetrieveDocumentsFromMedline:queryString |
| 01_ProcessQuery:extended_lucene_query | 05_ScoreExtractedProteins:query |
| 01_ProcessQuery:extended_lucene_query | s03_AddExpandedQueryToSemanticModel:query |
| 01_ProcessQuery:protein_name | s03b_AddQueryProteinsToSemanticModel:protein_name |
| 01b_CalculateQueryFrequency:poi_count_in_corpus | 05_ScoreExtractedProteins:query_frequency |
| 01b_UniProtXrefURLs:EntrezUniProtURL | s03b_AddQueryProteinsToSemanticModel:entrez_pubmed_URL |
| 01b_UniProtXrefURLs:ExpasyUniProtURL | s03b_AddQueryProteinsToSemanticModel:expasy_URL |
| 01b_UniProtXrefURLs:iHopSearchURL | s03b_AddQueryProteinsToSemanticModel:iHop_search_URL |
| 01b_UniProtXrefURLs:iHopSentencesURL | s03b_AddQueryProteinsToSemanticModel:iHop_sentence_URL |
| 02_RetrieveDocumentsFromMedline:abstract | 03_ExtractProteins_UniProtValidation:input_text |
| 02_RetrieveDocumentsFromMedline:abstract | 04_ExtractProteinRelations_HomoSapiens:input_text |
| 02_RetrieveDocumentsFromMedline:pubmed_URL | s04_AddDocToSemanticModel:pubmed_URL |
| 02_RetrieveDocumentsFromMedline:pubmed_id | s04_AddDocToSemanticModel:pubmed_id |
| 03_ExtractProteins_UniProtValidation:protein_name | 05_ScoreExtractedProteins:discovered_protein |
| 03_ExtractProteins_UniProtValidation:protein_name | CountProteinsPerDocument:list |
| 03_ExtractProteins_UniProtValidation:protein_name | s05_AddDiscoveredProteinToSemanticModel:protein_name |
| 03_ExtractProteins_UniProtValidation:uniProtIDlist | 06_UniProtXrefURLs_iHopBYPASS:UniProtID |
| 03_ExtractProteins_UniProtValidation:uniprotID | s05_AddDiscoveredProteinToSemanticModel:uniprot_id |
| 04_ExtractProteinRelations_HomoSapiens:protein1 | s06_AddProteinRelationToSemanticModel:protein_name1 |
| 04_ExtractProteinRelations_HomoSapiens:protein2 | s06_AddProteinRelationToSemanticModel:protein_name2 |
| 04_ExtractProteinRelations_HomoSapiens:relation_term | s06_AddProteinRelationToSemanticModel:interaction_term |
| 04_ExtractProteinRelations_HomoSapiens:uniprot_id1 | s06_AddProteinRelationToSemanticModel:uniprot_id1 |
| 04_ExtractProteinRelations_HomoSapiens:uniprot_id2 | s06_AddProteinRelationToSemanticModel:uniprot_id2 |
| 05_ScoreExtractedProteins:min_log_likelihood | s07_AddScoreToSemanticModel:score_value |
| 06_UniProtXrefURLs_iHopBYPASS:EntrezUniProtURL | s05_AddDiscoveredProteinToSemanticModel:entrez_pubmed_URL |
| 06_UniProtXrefURLs_iHopBYPASS:ExpasyUniProtURL | s05_AddDiscoveredProteinToSemanticModel:expasy_URL |
| 06_UniProtXrefURLs_iHopBYPASS:iHopSearchURL | s05_AddDiscoveredProteinToSemanticModel:iHop_search_URL |
| 06_UniProtXrefURLs_iHopBYPASS:iHopSentencesURL | s05_AddDiscoveredProteinToSemanticModel:iHop_sentence_URL |
| CloneStringToList_DocumentInstance:clones | s05_AddDiscoveredProteinToSemanticModel:doc_instance_uri |
| CloneStringToList_NERprocessRunInstance:clones | s05_AddDiscoveredProteinToSemanticModel:protein_discovery_process_run_instance_uri |
| CloneStringToList_NERserviceRunInstance:clones | s05_AddDiscoveredProteinToSemanticModel:protein_discovery_service_run_instance_uri |
| CountProteinsPerDocument:count | CloneStringToList_DocumentInstance:copy_number |
| CountProteinsPerDocument:count | CloneStringToList_NERprocessRunInstance:copy_number |
| CountProteinsPerDocument:count | CloneStringToList_NERserviceRunInstance:copy_number |
| MedLineTotalDocCount:value | 01b_CalculateQueryFrequency:corpus_total_doc_count |
| MedLineTotalDocCount:value | 05_ScoreExtractedProteins:PubMedTotalDocCount |
| MinLogLikelhoodScoreDescription:value | s07_AddScoreToSemanticModel:discovery_score_method |
| Timestamp:now_ISO8601 | p01_AddWorkflowToSemanticModel:timestamp_iso8601 |
| Timestamp:now_ISO8601 | p02_AddDocumentDiscoveryToSemanticModel:timestamp_iso8601 |
| Timestamp:now_ISO8601 | p03_AddProteinDiscoveryToSemanticModel:timestamp_iso8601 |
| Timestamp:now_ISO8601 | s02_AddOriginalQueryToSemanticModel:datetime |
| Timestamp:now_ISO8601 | s03_AddExpandedQueryToSemanticModel:datetime |
| Timestamp:now_ISO8601 | s03b_AddQueryProteinsToSemanticModel:datetime |
| Timestamp:now_ISO8601 | s04_AddDocToSemanticModel:datetime |
| Timestamp:now_ISO8601 | s05_AddDiscoveredProteinToSemanticModel:datetime |
| Timestamp:now_ISO8601 | s07_AddScoreToSemanticModel:datetime |
| Timestamp:now_short | p01_AddWorkflowToSemanticModel:timestamp_shortstring |
| Timestamp:now_short | p02_AddDocumentDiscoveryToSemanticModel:timestamp_shortstring |
| Timestamp:now_short | p03_AddProteinDiscoveryToSemanticModel:timestamp_shortstring |
| clear_knowledge_repository | negate:true_or_false |
| false:value | s00_InitializeSemanticStorage:do_not_add_to_repository |
| false:value | s00_InitializeSemanticStorage:do_not_clear_tmp_rdf_file |
| false:value | s08_AddRdfToRepository:do_not_add_to_repository |
| query | s00_InitializeSemanticStorage:original_input_query |
| query | s01_AddBiologicalModelToSemanticModel:BioModelComment |
| query | s01_AddBiologicalModelToSemanticModel:ModelIdentifyingName |
| query | s02_AddOriginalQueryToSemanticModel:query |
| negate:false_or_true | s00_InitializeSemanticStorage:do_not_clear_repository |
| p01_AddWorkflowToSemanticModel:AIDA_TextMiningWorkflowRun_instance | p02_AddDocumentDiscoveryToSemanticModel:workflow_run_instance_uri |
| p01_AddWorkflowToSemanticModel:AIDA_TextMiningWorkflowRun_instance | p03_AddProteinDiscoveryToSemanticModel:workflow_run_instance_uri |
| p02_AddDocumentDiscoveryToSemanticModel:AIDA_DocRetrProcessRun_instance | s04_AddDocToSemanticModel:docretrieval_process_run_instance_uri |
| p02_AddDocumentDiscoveryToSemanticModel:AIDA_DocRetrServiceRun_instance | s04_AddDocToSemanticModel:docretrieval_service_run_instance_uri |
| p03_AddProteinDiscoveryToSemanticModel:AIDA_NERProcessRun_instance | CloneStringToList_NERprocessRunInstance:input |
| p03_AddProteinDiscoveryToSemanticModel:AIDA_NERserviceRun_instance | CloneStringToList_NERserviceRunInstance:input |
| s00_InitializeSemanticStorage:BioAIDinstances_ontology_url | p01_AddWorkflowToSemanticModel:instance_ontology_url |
| s00_InitializeSemanticStorage:BioAIDinstances_ontology_url | p02_AddDocumentDiscoveryToSemanticModel:instance_ontology_url |
| s00_InitializeSemanticStorage:BioAIDinstances_ontology_url | p03_AddProteinDiscoveryToSemanticModel:instance_ontology_url |
| s00_InitializeSemanticStorage:BioAIDinstances_ontology_url | s01_AddBiologicalModelToSemanticModel:InstanceOntologyURL |
| s00_InitializeSemanticStorage:BioAIDinstances_ontology_url | s02_AddOriginalQueryToSemanticModel:InstanceOntologyURL |
| s00_InitializeSemanticStorage:BioAIDinstances_ontology_url | s03_AddExpandedQueryToSemanticModel:InstanceOntologyURL |
| s00_InitializeSemanticStorage:BioAIDinstances_ontology_url | s03b_AddQueryProteinsToSemanticModel:instance_ontology_url |
| s00_InitializeSemanticStorage:BioAIDinstances_ontology_url | s04_AddDocToSemanticModel:instance_ontology_url |
| s00_InitializeSemanticStorage:BioAIDinstances_ontology_url | s05_AddDiscoveredProteinToSemanticModel:instance_ontology_url |
| s00_InitializeSemanticStorage:BioAIDinstances_ontology_url | s07_AddScoreToSemanticModel:instance_ontology_url |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | p01_AddWorkflowToSemanticModel:tmp_RDFdoc_fileref |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | p02_AddDocumentDiscoveryToSemanticModel:tmp_RDFdoc_fileref |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | p03_AddProteinDiscoveryToSemanticModel:tmp_RDFdoc_fileref |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | s08_AddRdfToRepository:rdfFile_url |
| s01_AddBiologicalModelToSemanticModel:biomodel_instance_uri | s02_AddOriginalQueryToSemanticModel:model_instance_uri |
| s01_AddBiologicalModelToSemanticModel:biomodel_instance_uri | s03_AddExpandedQueryToSemanticModel:model_instance_uri |
| s01_AddBiologicalModelToSemanticModel:biomodel_instance_uri | s03b_AddQueryProteinsToSemanticModel:model_instance_uri |
| s01_AddBiologicalModelToSemanticModel:biomodel_instance_uri | s05_AddDiscoveredProteinToSemanticModel:model_instance_uri |
| s02_AddOriginalQueryToSemanticModel:query_instance | p01_AddWorkflowToSemanticModel:workflow_input |
| s02_AddOriginalQueryToSemanticModel:query_instance | p02_AddDocumentDiscoveryToSemanticModel:computation_input |
| s02_AddOriginalQueryToSemanticModel:query_instance | s03_AddExpandedQueryToSemanticModel:original_query_instance_uri |
| s03_AddExpandedQueryToSemanticModel:query_instance | s04_AddDocToSemanticModel:query_instance_uri |
| s04_AddDocToSemanticModel:doc_instance_uri | CloneStringToList_DocumentInstance:input |
| s04_AddDocToSemanticModel:doc_instance_uri | p03_AddProteinDiscoveryToSemanticModel:computation_input |
| s04_AddDocToSemanticModel:doc_instance_uri | s06_AddProteinRelationToSemanticModel:doc_instance |
| 02_RetrieveDocumentsFromMedline:pubmed_URL | PubMed_URL |
| 03_ExtractProteins_UniProtValidation:protein_name | Protein_name |
| 04_ExtractProteinRelations_HomoSapiens:relation | Relation |
| 05_ScoreExtractedProteins:min_log_likelihood | protein_discovery_score |
| 06_UniProtXrefURLs_iHopBYPASS:EntrezUniProtURL | ProteinURL |
| s00_InitializeSemanticStorage:BioAID_Repository_url | BioAID_RDFrepository_URL |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | RDFtriples_doc_url |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | s01_AddBiologicalModelToSemanticModel:RDF_doc_filename |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | s02_AddOriginalQueryToSemanticModel:RDF_doc_filename |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | s03_AddExpandedQueryToSemanticModel:RDF_doc_filename |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | s03b_AddQueryProteinsToSemanticModel:tmp_rdf_output_fileref |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | s04_AddDocToSemanticModel:rdf_output_doc_url |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | s05_AddDiscoveredProteinToSemanticModel:tmp_rdf_output_fileref |
| s00_InitializeSemanticStorage:RDFoutput_doc_url | s07_AddScoreToSemanticModel:tmp_rdf_doc_ref |
| s00_InitializeSemanticStorage:Rdf_NTriple_format | s08_AddRdfToRepository:rdf_format |
| s05_AddDiscoveredProteinToSemanticModel:protein_term_instance | s07_AddScoreToSemanticModel:discovered_instance_uri |
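The links above route CountProteinsPerDocument:count into the copy_number port of the CloneStringToList beanshells, whose clones then feed s05_AddDiscoveredProteinToSemanticModel alongside the protein names. One plausible reading is that a single document URI is repeated once per discovered protein so that the two lists line up. The sketch below illustrates that reading only; the function names and example values are hypothetical, and converting copy_number with int() assumes the count may arrive as a string.

```python
def clone_string_to_list(input_value, copy_number):
    """Repeat one value copy_number times (copy_number may arrive as a string)."""
    return [input_value] * int(copy_number)


def pair_proteins_with_document(doc_instance_uri, protein_names):
    """Align one document URI with every protein discovered in that document."""
    count = len(protein_names)                      # cf. CountProteinsPerDocument
    clones = clone_string_to_list(doc_instance_uri, count)
    return list(zip(clones, protein_names))


if __name__ == "__main__":
    doc = "http://example.org/instances#doc_1"      # hypothetical document URI
    for pair in pair_proteins_with_document(doc, ["EZH2", "EED", "SUZ12"]):
        print(pair)
```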
Coordinations
| Controller | Target |
|---|---|
| s07_AddScoreToSemanticModel | s08_AddRdfToRepository |
| Fail | s06_AddProteinRelationToSemanticModel |
| Fail | 04_ExtractProteinRelations_HomoSapiens |
| 05_ScoreExtractedProteins | 06_UniProtXrefURLs_iHopBYPASS |
| s00_InitializeSemanticStorage | 01_ProcessQuery |
| s03_AddExpandedQueryToSemanticModel | s03b_AddQueryProteinsToSemanticModel |
| p01_AddWorkflowToSemanticModel | s03_AddExpandedQueryToSemanticModel |
| s03b_AddQueryProteinsToSemanticModel | p02_AddDocumentDiscoveryToSemanticModel |
Earliest Version:
[1] - BioAID_ProteinDiscovery_HomoSapiens
Created on: Friday 22 August 2008 @ 11:00:29 (GMT)
Created by: Marco Roos
Last edited on: Tuesday 26 August 2008 @ 15:56:58 (GMT)
Last edited by: Marco Roos
Revision comments:
None
Previous Versions:
[2] - BioAID_EnirchBioModelWithProteinsFromText
Created on: Friday 22 August 2008 @ 11:00:29 (GMT)
Created by: Marco Roos
Last edited on: Tuesday 26 August 2008 @ 16:05:11 (GMT)
Last edited by: Marco Roos
Revision comments:
Protein discovery workflow that stores instances in a semantic model that separates biology (intensional) knowledge from procedural (extensional) knowledge.
Semantic types in the semantic sub workflows are obtained provisionally using strings produced by a workflow (GetFromSesame.xml) that gets types from a Sesame repository containing the template ontologies.
Created on: Friday 22 August 2008 @ 11:00:29 (GMT)
Created by: Marco Roos
Last edited on: Wednesday 27 August 2008 @ 23:15:01 (GMT)
Last edited by: Marco Roos
Revision comments:
Minor update. Changed the description.
Created on: Wednesday 29 October 2008 @ 09:14:51 (GMT)
Created by: Marco Roos
Revision comments:
Adjustments for semantic model updates.
Created on: Wednesday 29 October 2008 @ 09:27:18 (GMT)
Created by: Marco Roos
Last edited on: Friday 15 May 2009 @ 16:38:18 (GMT)
Last edited by: Marco Roos
Revision comments:
Minor updates to get the syncing of document instances and proteins right.
Created on: Saturday 16 May 2009 @ 00:57:14 (GMT)
Created by: Marco Roos
Revision comments:
Latest Version:
[7] - BioAID_EnirchBioModelWithProteinsFromText
Created on: Saturday 16 May 2009 @ 01:06:26 (GMT)
Created by: Marco Roos
Last edited on: Saturday 16 May 2009 @ 01:13:19 (GMT)
Last edited by: Marco Roos
Revision comments:
Linked Data
Non-Information Resource URI: http://www.myexperiment.org/workflows/379
Copyright © 2007 - 2011 The University of Manchester and University of Southampton

This workflow may need some work because of a recent server migration... Our apologies.