This workflow extracts proteins and protein relations from Medline. Extracted protein names (symbols of at least 3 characters) are validated against mouse, rat, and human UniProt symbols, so the results are limited to these species.
This workflow performs the following basic steps:
1. it retrieves documents relevant to the query string
2. it discovers proteins in those documents that are considered relevant to the query string (colocation, in text-mining terms)
3. it extracts protein-protein relations (slightly stronger evidence than colocation)
In addition, the results are added to a biological model to support hypothesis formation, and to a procedural model that logs trails to evidence. The models are based on description logic (RDF/OWL format).
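The three steps above can be sketched as a simple pipeline. This is a minimal illustration only: the method bodies are placeholders, not the actual AIDA services.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the retrieval -> protein discovery -> relation extraction pipeline.
// All method bodies are hypothetical stand-ins for the AIDA web services.
public class PipelineSketch {
    static List<String> retrieveDocuments(String query) {
        // 1. Lucene-based document retrieval from the Medline index
        return Arrays.asList("doc about EZH2 and SUZ12");
    }
    static List<String> discoverProteins(List<String> docs) {
        // 2. named-entity recognition, validated against UniProt symbols
        return Arrays.asList("EZH2", "SUZ12");
    }
    static List<String> extractRelations(List<String> docs, List<String> proteins) {
        // 3. protein-protein relation extraction
        return Arrays.asList("EZH2 interacts_with SUZ12");
    }
    public static void main(String[] args) {
        List<String> docs = retrieveDocuments("\"EZH2\" AND chromatin");
        List<String> proteins = discoverProteins(docs);
        System.out.println(extractRelations(docs, proteins));
    }
}
```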
Acknowledgements:
Synonyms and UniProt services: Martijn Scheumie, BioSemantics Group, University of Rotterdam, The Netherlands (BioRange project)
false
17000000
-log likelihood ratio for finding query q and discovery d together, computed as -log((QD_exp / N) / (QD_obs / N)), where QD_exp = (Q * D) / N; Q is the frequency of documents containing query q, D is the frequency of documents containing discovery d, and QD_obs is the frequency of documents containing both q and d. QD_exp is the expected frequency of documents containing q and d assuming independence of q and d (H0). This score is a measure of how 'special' it is to find q and d together.
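The score description above can be sketched in a few lines; this is an illustrative reimplementation, not the workflow's own service code.

```java
// Sketch of the -log likelihood ratio described above:
// -log(QD_exp / QD_obs), with QD_exp = (Q * D) / N under independence (H0).
public class LikelihoodScore {
    static double score(long q, long d, long qdObs, long n) {
        double qdExp = ((double) q * d) / n;  // expected co-occurrence count under H0
        return -Math.log(qdExp / qdObs);      // higher = more surprising co-occurrence
    }
    public static void main(String[] args) {
        // e.g. q in 100 docs, d in 200 docs, 50 co-occurrences, corpus of 100000 docs
        System.out.println(score(100, 200, 50, 100000));
    }
}
```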
true
true
org.embl.ebi.escience.scuflworkers.java.FailIfTrue
// Invert a boolean-like string: "true"/"yes" in any casing yields "false"; anything else yields "true".
String input = true_or_false;
String output = "true";
if (input.equalsIgnoreCase("true") || input.equalsIgnoreCase("yes")) {
    output = "false";
}
false_or_true = output;
true_or_false
false_or_true
count = list.size();
list
count
import java.util.*;
// Produce a list containing copy_number copies of the input value.
int n = Integer.parseInt(copy_number.toString());
List newlist = new ArrayList();
for (int i = 0; i < n; i++) {
    newlist.add(input);
}
clones = newlist;
copy_number
input
clones
import java.util.*;
// Produce a list containing copy_number copies of the input value.
int n = Integer.parseInt(copy_number.toString());
List newlist = new ArrayList();
for (int i = 0; i < n; i++) {
    newlist.add(input);
}
clones = newlist;
copy_number
input
clones
import java.util.*;
// Produce a list containing copy_number copies of the input value.
int n = Integer.parseInt(copy_number.toString());
List newlist = new ArrayList();
for (int i = 0; i < n; i++) {
    newlist.add(input);
}
clones = newlist;
copy_number
input
clones
/*
provides current date and time in various formats
no inputs
output:
now_RFC822
now_short
now_ISO8601
*/
import java.util.Date;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
DateFormat dateFormatRFC822 = new SimpleDateFormat("EEE', 'dd' 'MMM' 'yyyy' 'HH':'mm':'ss' 'z");
DateFormat dateFormatISO8601 = new SimpleDateFormat("yyyy'-'MM'-'dd'T'HH':'mm':'ssZ");
DateFormat dateFormatShort = new SimpleDateFormat("yyMMdd_HHmmss");
Date date = new Date();
String tmp_ISO8601 = (String) dateFormatISO8601.format(date);
// SimpleDateFormat's 'Z' emits the timezone offset as e.g. "+0200"; ISO 8601 requires "+02:00",
// so insert a colon before the last two digits.
tmp_ISO8601 = tmp_ISO8601.substring(0,tmp_ISO8601.length()-2) + ":" + tmp_ISO8601.substring(tmp_ISO8601.length()-2);
now_RFC822=dateFormatRFC822.format(date);
now_ISO8601 = tmp_ISO8601;
now_short=dateFormatShort.format(date);
now_RFC822
now_short
now_ISO8601
Add ranking score for discovered protein terms to the semantic model.
Add Likelihood Score to Semantic model with Sesame service cf example Score
LikelihoodDiscoveryScore_
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#DiscoveryScore
yes
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_discovery_score
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovery_score_value
http://www.w3.org/2001/XMLSchema#double
/*
N-triple beanshell: Instance of Type including label and comment
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Enzyme> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#label> "an enzyme"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#comment> "e.g. the enzyme referred to by as 'EZH2'"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_BioModel> <http://purl.org/dc/elements/1.1/date> "1999-05-31T13:20:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
*/
/*
input:
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
output:
NTriple_InstanceOf_statement
instance_uri
*/
instance_uri = instance_ontology_url + "#" + instance_name;
NTriple_InstanceOf_statement = "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1/date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
NTriple_InstanceOf_statement
instance_uri
/*
Concatenates List of Strings.
Use 'merge all data' on its input to concatenate different inputs.
*/
// Join the strings in stringlist with the given delimiter; yields "" for an empty list.
String s = "";
Iterator iter = stringlist.iterator();
if (iter.hasNext()) s = iter.next().toString();
while (iter.hasNext()) {
    s = s + delimiter + iter.next();
}
output = s;
stringlist
delimiter
output
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
org.embl.ebi.escience.scuflworkers.java.StringConcat
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
property_uri
property_string
output:
NTriple_PropertyOfInstance_statement
*/
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ property_uri + "> \"" + property_string + "\"^^<" + property_type + "> .\n";
instance_uri
property_uri
property_string
property_type
NTriple_PropertyOfInstance_statement
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
Description of how the discovery score was calculated.
Add Query to Semantic model with Sesame service cf example Biological Query
Add original (user provided) query to Semantic model.
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query
DocQry:
input query for document retrieval
original query
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/MappingBioTextMining.owl#partially_represents
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#is_user_original
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#DocumentSearchQuery
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
property_uri
property_string
output:
NTriple_PropertyOfInstance_statement
*/
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ property_uri + "> \"" + property_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
property_uri
property_string
instance_uri
NTriple_PropertyOfInstance_statement
true
yes
/*
N-triple beanshell: Instance of Type including label, comment, and date
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Enzyme> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#label> "an enzyme"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#comment> "e.g. the enzyme referred to by as 'EZH2'"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_BioModel> <http://purl.org/dc/elements/1.1/date> "2008-08-14T14:37:29+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
*/
/*
input:
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime // must match the format 2008-08-14T14:37:29+02:00 (yyyy-MM-dd'T'HH:mm:ssZ); NB with Java's SimpleDateFormat, 'Z' renders the timezone offset without the colon that ISO 8601 requires, so the colon must be inserted separately.
output:
NTriple_InstanceOf_statement
instance_uri
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
instance_uri = instance_ontology_url + "#" + instance_name;
NTriple_InstanceOf_statement = "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1/date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
NTriple_InstanceOf_statement
instance_uri
/*
Concatenates List of Strings.
Use 'merge all data' on its input to concatenate different inputs.
*/
// Join the strings in stringlist with the given delimiter; yields "" for an empty list.
String s = "";
Iterator iter = stringlist.iterator();
if (iter.hasNext()) s = iter.next().toString();
while (iter.hasNext()) {
    s = s + delimiter + iter.next();
}
output = s;
stringlist
delimiter
output
/*
replace characters that Protégé does not like for names
*/
import java.util.regex.*;
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
String tmpstring = input;
/*
Pattern p = Pattern.compile("#");
Matcher m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&hash;");
p = Pattern.compile("\\^");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&#8257;");
p = Pattern.compile("<");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&lt;");
p = Pattern.compile(">");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&gt;");
p = Pattern.compile("\\{");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&lcurly;");
p = Pattern.compile("\\}");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&rcurly;");
p = Pattern.compile("%");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&perc;");
p = Pattern.compile("_");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&ndash;");
p = Pattern.compile("\"");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&quot;");
p = Pattern.compile("\\s");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("_");
*/
output = URLEncoder.encode(tmpstring, "UTF-8");
input
output
"
\\\"
import java.util.regex.*;
Pattern p = Pattern.compile(findstring);
Matcher m = p.matcher(input);
output = (String) m.replaceAll(replacestring);
input
findstring
replacestring
output
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
property_uri
property_string
output:
NTriple_PropertyOfInstance_statement
*/
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ property_uri + "> \"" + property_string + "\"^^<http://www.w3.org/2001/XMLSchema#boolean> .\n";
instance_uri
property_uri
property_string
NTriple_PropertyOfInstance_statement
org.embl.ebi.escience.scuflworkers.java.StringConcat
no
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
Reference to file for RDF output.
E.g.
http://aida.science.uva.nl:9999/aida_public/rdf-output/tmp-rdf-out.rdf
Instance ontology URL. E.g. http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/BioAID_Instances.owl
e.g.
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_BioModel
conform this format
2008-08-14T14:37:29+02:00
This workflow applies the search web service from the AIDA toolbox.
Comments:
This search service uses Lucene defaults; it may be necessary to optimize the query string to adapt the behaviour to what is most relevant in a particular domain (e.g. for Medline, prioritizing by publication date is useful). Lucene favours shorter sentences, which may be bad for subsequent information extraction.
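One way to adapt the behaviour is to decorate the user query before submitting it. The sketch below biases ranking toward recent records by appending a date-range clause; the field name "date" and the year range are assumptions, so check the actual index schema before using anything like this.

```java
// Hypothetical query tuning: wrap the user query and append a
// publication-date clause (field name "date" is an assumption).
public class QueryTuning {
    static String biasToRecent(String userQuery) {
        return "(" + userQuery + ") AND date:[2005 TO 2008]";
    }
    public static void main(String[] args) {
        System.out.println(biasToRecent("\"EZH2\" AND chromatin"));
    }
}
```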
MedLine
content
This workflow applies the search web service from the AIDA toolbox.
Comments:
This search service uses Lucene defaults; it may be necessary to optimize the query string to adapt the behaviour to what is most relevant in a particular domain (e.g. for Medline, prioritizing by publication date is useful). Lucene favours shorter sentences, which may be bad for subsequent information extraction.
/aid:result/doc/field[@name='title']/value
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=PubMed&list_uids=
/aid:result/doc/field[@name='content']/value
/aid:result/doc/field[@name='PMID']/value
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
XPath returns a list, while concatenate doesn't; this synchronizes the list level of all outputs.
org.embl.ebi.escience.scuflworkers.java.StringConcat
XPath returns a list, while concatenate doesn't; this synchronizes the list level of all outputs.
org.embl.ebi.escience.scuflworkers.java.StringConcat
XPath returns a list, while concatenate doesn't; this synchronizes the list level of all outputs.
org.embl.ebi.escience.scuflworkers.java.StringConcat
org.embl.ebi.escience.scuflworkers.java.StringConcat
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
org.embl.ebi.escience.scuflworkers.java.StringConcat
http://aida.science.uva.nl:8888/axis/services/SearcherWS?wsdl
search
Lucene query for search. Simple AND and OR queries will work. For advanced queries see http://lucene.apache.org for more information.
e.g. MedLine will give access to a weekly update index of the medline corpus.
e.g. 'content' will search abstract and title; 'abstract' just the abstract; 'title' just the title.
Limits the maximum number of hits the search will produce. In Taverna 1, '100' works well, while 1000 and above is likely to halt Taverna 1 due to memory problems. This also depends on the memory settings of the client's Java virtual machine (usually your local Taverna).
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#computation_component_of_run
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#TextMiningProcessRun
yes
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/AIDA_Instances.owl#AIDA_CombinedProteinTermExtractionProcess
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#ComputationRun
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#run_of_process
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/AIDA_Instances.owl#AIDA_CRFNamedEntityRecognitionService
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/MappingBioTextMining.owl#process_run_performed_by_computation_run
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#computation_run_of
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
N-triple beanshell: Instance of Process and Service runs including label and comment and their interrelationship
*/
/*
input:
instance_ontology_url
executed_process_instance_uri
executed_service_instance_uri
process_service_relation_uri
run_of_process_property_uri
run_of_service_property_uri
process_run_type_uri
service_run_type_uri
datetime
timestamp_shortstring
input
output:
NTriple_statements
process_run_instance_uri;
service_run_instance_uri;
*/
String[] tmp;
String executed_instance_uri;
String executed_name;
String instance_name;
String type_uri;
NTriple_statements = "";
int i=0;
while (i<=1) {
if (i==0) {
executed_instance_uri = executed_process_instance_uri;
run_of_property_uri = run_of_process_property_uri;
type_uri = process_run_type_uri;
} else {
executed_instance_uri = executed_service_instance_uri;
run_of_property_uri = run_of_service_property_uri;
type_uri = service_run_type_uri;
}
tmp = executed_instance_uri.split("#", 2);
executed_name = tmp[1].toString();
instance_name = executed_name + "_run_on_" + input + "_at_" + timestamp_shortstring;
instance_uri = instance_ontology_url + "#" + instance_name;
if (i==0) { process_run_instance_uri = instance_uri; } else { service_run_instance_uri = instance_uri; }
String comment_string = "run of " + executed_instance_uri + " on " + input + " dd. " + datetime + ".";
String label_string = "run of " + executed_name + " on " + input + " at " + timestamp_shortstring;
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1/date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <" + run_of_property_uri + "> <" + executed_instance_uri + "> .\n";
i=i+1;
}
NTriple_statements = NTriple_statements + "<" + process_run_instance_uri + "> <" + process_service_relation_uri + "> <" + service_run_instance_uri + "> .\n";
instance_ontology_url
datetime
timestamp_shortstring
executed_service_instance_uri
executed_process_instance_uri
process_service_relation_uri
run_of_process_property_uri
run_of_service_property_uri
process_run_type_uri
service_run_type_uri
input
NTriple_statements
process_run_instance_uri
service_run_instance_uri
/*
Concatenates List of Strings.
Use 'merge all data' on its input to concatenate different inputs.
*/
// Join the strings in stringlist with the given delimiter; yields "" for an empty list.
String s = "";
Iterator iter = stringlist.iterator();
if (iter.hasNext()) s = iter.next().toString();
while (iter.hasNext()) {
    s = s + delimiter + iter.next();
}
output = s;
stringlist
delimiter
output
no
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/MappingBioTextMining.owl#process_run_performed_by_computation_run
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#run_of_process
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#TextMiningProcessRun
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#computation_run_of
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/AIDA_Instances.owl#AIDA_LuceneBasedMedLineDocumentRetrievalProcess
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#computation_component_of_run
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/AIDA_Instances.owl#AIDA_DocumentRetrievalService
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#ComputationRun
yes
/*
Concatenates List of Strings.
Use 'merge all data' on its input to concatenate different inputs.
*/
// Join the strings in stringlist with the given delimiter; yields "" for an empty list.
String s = "";
Iterator iter = stringlist.iterator();
if (iter.hasNext()) s = iter.next().toString();
while (iter.hasNext()) {
    s = s + delimiter + iter.next();
}
output = s;
stringlist
delimiter
output
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
N-triple beanshell: Instance of Process and Service runs including label and comment and their interrelationship
*/
/*
input:
instance_ontology_url
executed_process_instance_uri
executed_service_instance_uri
process_service_relation_uri
run_of_process_property_uri
run_of_service_property_uri
process_run_type_uri
service_run_type_uri
datetime
timestamp_shortstring
input
output:
NTriple_statements
process_run_instance_uri;
service_run_instance_uri;
*/
String[] tmp;
String executed_instance_uri;
String executed_name;
String instance_name;
String type_uri;
NTriple_statements = "";
int i=0;
while (i<=1) {
if (i==0) {
executed_instance_uri = executed_process_instance_uri;
run_of_property_uri = run_of_process_property_uri;
type_uri = process_run_type_uri;
} else {
executed_instance_uri = executed_service_instance_uri;
run_of_property_uri = run_of_service_property_uri;
type_uri = service_run_type_uri;
}
tmp = executed_instance_uri.split("#", 2);
executed_name = tmp[1].toString();
instance_name = executed_name + "_run_on_" + input + "_at_" + timestamp_shortstring;
instance_uri = instance_ontology_url + "#" + instance_name;
if (i==0) { process_run_instance_uri = instance_uri; } else { service_run_instance_uri = instance_uri; }
String comment_string = "run of " + executed_instance_uri + " dd. " + datetime + ".";
String label_string = "run of " + executed_name + " on " + timestamp_shortstring;
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1/date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <" + run_of_property_uri + "> <" + executed_instance_uri + "> .\n";
i=i+1;
}
NTriple_statements = NTriple_statements + "<" + process_run_instance_uri + "> <" + process_service_relation_uri + "> <" + service_run_instance_uri + "> .\n";
instance_ontology_url
datetime
timestamp_shortstring
executed_service_instance_uri
executed_process_instance_uri
process_service_relation_uri
run_of_process_property_uri
run_of_service_property_uri
process_run_type_uri
service_run_type_uri
input
NTriple_statements
process_run_instance_uri
service_run_instance_uri
no
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
The run of a computation component that the document retrieval service runs are component parts of.
Input that triggered the service and process to run (a unique identifier for each run)
Add Protein to Semantic model with Sesame service cf example Discovered Proteins
Add Protein to Semantic model with Sesame service cf example Discovered Proteins
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/seeAlso_iHopQueryURI
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/seeAlso_ExpasyUniProtURI
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/seeAlso_EntrezUniProtURI
yes
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#isModelComponentOf
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
annotation_property_uri
seeAlso_url
output:
NTriple_PropertyOfInstance_statement
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
seeAlso_url = URLEncoder.encode(seeAlso_url, "UTF-8");
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ annotation_property_uri + "> \"" + seeAlso_url + "\"^^<http://www.w3.org/2001/XMLSchema#anyURI> .\n";
instance_uri
annotation_property_uri
seeAlso_url
NTriple_PropertyOfInstance_statement
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
annotation_property_uri
seeAlso_url
output:
NTriple_PropertyOfInstance_statement
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
seeAlso_url = URLEncoder.encode(seeAlso_url, "UTF-8");
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ annotation_property_uri + "> \"" + seeAlso_url + "\"^^<http://www.w3.org/2001/XMLSchema#anyURI> .\n";
instance_uri
annotation_property_uri
seeAlso_url
NTriple_PropertyOfInstance_statement
/*
N-triple beanshell: Instance of Type including label and comment
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Enzyme> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#label> "an enzyme"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#comment> "e.g. the enzyme referred to by as 'EZH2'"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_BioModel> <http://purl.org/dc/elements/1.1/date> "1999-05-31T13:20:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
*/
/*
input:
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
output:
NTriple_InstanceOf_statement
instance_uri
*/
instance_uri = instance_ontology_url + "#" + instance_name;
NTriple_InstanceOf_statement = "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1/date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
NTriple_InstanceOf_statement
instance_uri
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
annotation_property_uri
seeAlso_url
output:
NTriple_PropertyOfInstance_statement
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
seeAlso_url = URLEncoder.encode(seeAlso_url, "UTF-8");
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ annotation_property_uri + "> \"" + seeAlso_url + "\"^^<http://www.w3.org/2001/XMLSchema#anyURI> .\n";
instance_uri
annotation_property_uri
seeAlso_url
NTriple_PropertyOfInstance_statement
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
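The RelationByURI Beanshell reduces to a single string join; a minimal Java sketch (the URIs in the check are illustrative):

```java
public class RelationTriple {
    // Joins a domain instance URI, a relation URI, and a range instance URI
    // into one N-Triples statement, as the RelationByURI Beanshell does.
    static String build(String domainUri, String relationUri, String rangeUri) {
        return "<" + domainUri + "> <" + relationUri + "> <" + rangeUri + "> .\n";
    }
    public static void main(String[] args) {
        String t = build("http://e.org#A", "http://e.org#r", "http://e.org#B");
        if (!t.equals("<http://e.org#A> <http://e.org#r> <http://e.org#B> .\n"))
            throw new AssertionError(t);
    }
}
```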
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
annotation_property_uri
seeAlso_url
output:
NTriple_PropertyOfInstance_statement
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
seeAlso_url = URLEncoder.encode(seeAlso_url, "UTF-8");
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ annotation_property_uri + "> \"" + seeAlso_url + "\"^^<http://www.w3.org/2001/XMLSchema#anyURI> .\n";
instance_uri
annotation_property_uri
seeAlso_url
NTriple_PropertyOfInstance_statement
/*
Concatenates List of Strings.
Use 'merge all data' on its input to concatenate different inputs.
*/
String s = ""; // initialised so an empty list yields an empty output instead of an unset variable
Iterator iter = (Iterator) stringlist.iterator();
if (iter.hasNext()) s = iter.next();
while (iter.hasNext()) {
s = s + delimiter + iter.next();
}
output = s;
stringlist
delimiter
output
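The concatenation shell above behaves like a standard delimiter join; a standalone Java sketch (class name illustrative):

```java
import java.util.*;

public class JoinStrings {
    // Concatenates a list of strings with a delimiter between elements;
    // an empty list yields the empty string.
    static String join(List<String> strings, String delimiter) {
        StringBuilder s = new StringBuilder();
        for (Iterator<String> it = strings.iterator(); it.hasNext(); ) {
            s.append(it.next());
            if (it.hasNext()) s.append(delimiter);
        }
        return s.toString();
    }
    public static void main(String[] args) {
        if (!join(Arrays.asList("a", "b", "c"), ", ").equals("a, b, c")) throw new AssertionError();
        if (!join(Collections.<String>emptyList(), ", ").equals("")) throw new AssertionError();
    }
}
```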
/*
N-triple beanshell: Instance of Type including label and comment
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Enzyme> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#label> "an enzyme"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#comment> "e.g. the enzyme referred to by as 'EZH2'"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_BioModel> <http://purl.org/dc/elements/1.1/date> "1999-05-31T13:20:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
*/
/*
input:
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
output:
NTriple_InstanceOf_statement
instance_uri
*/
instance_uri = instance_ontology_url + "#" + instance_name;
NTriple_InstanceOf_statement = "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1#date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
NTriple_InstanceOf_statement
instance_uri
/*
input:
protein_name
uniprot_id
output:
protein_label
protein_comment
protein_term_label
protein_term_comment
*/
if(uniprot_id.length()>0) {
protein_label = protein_name + "_" + uniprot_id;
protein_comment = "protein referred to as " + protein_name + ", with UniProt ID: " + uniprot_id;
protein_term_comment = "protein term " + protein_name + ", identified as the name of a protein with UniProt ID: " + uniprot_id;
} else {
protein_label = protein_name; // flawed assumption that proteins without a uniprot id can be identified by their name
protein_comment = "protein referred to as " + protein_name + ", without a rat, mouse, or human UniProt ID";
protein_term_comment = "protein term " + protein_name + " as the name of a protein without a rat, mouse, or human UniProt identifier";
}
protein_term_label = protein_label;
protein_name
uniprot_id
protein_label
protein_comment
protein_term_label
protein_term_comment
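The labelling rule above can be checked in isolation; a minimal Java sketch using the EZH2/Q15910 example from this workflow:

```java
public class ProteinLabel {
    // Mirrors the Beanshell rule: append the UniProt ID when one was found,
    // otherwise fall back to the bare protein name (a flawed but explicit
    // assumption, as the original comments note).
    static String label(String proteinName, String uniprotId) {
        return uniprotId.length() > 0 ? proteinName + "_" + uniprotId : proteinName;
    }
    public static void main(String[] args) {
        if (!label("EZH2", "Q15910").equals("EZH2_Q15910")) throw new AssertionError();
        if (!label("EZH2", "").equals("EZH2")) throw new AssertionError();
    }
}
```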
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Text.owl#ProteinTerm
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Protein
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/seeAlso_iHopUniProtURI
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/MappingBioTextMining.owl#references
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
E.g.
EZH2
e.g.
Q15910
http://www.ncbi.nlm.nih.gov/pubmed/
http://www.expasy.ch/
E.g.
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/BioAID_Instances.owl
E.g.
2008-08-14T14:37:29+02:00
E.g.
http://aida.science.uva.nl:9999/aida_public/rdf-output/tmp-rdf-out.rdf
Adds URL cross references to various protein information resources.
Adds URL cross references to various protein information resources.
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=
org.embl.ebi.escience.scuflworkers.java.StringConcat
http://www.ihop-net.org/UniPub/iHOP/gismo/
org.embl.ebi.escience.scuflworkers.java.StringConcat
org.embl.ebi.escience.scuflworkers.java.StringConcat
.html?ORGANISM_ID=1
http://expasy.org/uniprot/
http://www.ihop-net.org/UniPub/iHOP/?field=UNIPROT__AC&ncbi_tax_id=9606&organism_syn=&search=
<iHOPguessedSymbolId query=\".+\" xmlns=\"http://www.pdg.cnb.uam.es/UniPub/iHOP/xml\">(.+)</iHOPguessedSymbolId>
1
org.embl.ebi.escience.scuflworkers.java.RegularExpressionStringList
http://www.ihop-net.org/UniPub/iHOP/gismo/
.html?ORGANISM_ID=1
org.embl.ebi.escience.scuflworkers.java.StringConcat
org.embl.ebi.escience.scuflworkers.java.StringConcat
It takes a biological database reference as input and guesses the iHOP ID that best matches the input.
http://ubio.bioinfo.cnio.es/biotools/iHOP/iHOP-SOAP.wsdl
guessSymbolIdFromReference
UniProt ID (for iHop a Human protein is expected)
E.g. Q15190
text/xml
Add automatically expanded query to Semantic model.
Add automatically expanded query to Semantic model.
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query
DocQry:
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#DocumentSearchQuery
false
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/MappingBioTextMining.owl#partially_represents
input query for document retrieval
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#expanded_query_of
expanded query
yes
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#is_user_original
/*
replace characters that Protégé does not like for names
*/
import java.util.regex.*;
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
String tmpstring = input;
/*
Pattern p = Pattern.compile("#");
Matcher m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&hash;");
p = Pattern.compile("\\^");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("⁁");
p = Pattern.compile("<");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&lt;");
p = Pattern.compile(">");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&gt;");
p = Pattern.compile("\\{");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&lcurly;");
p = Pattern.compile("\\}");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&rcurly;");
p = Pattern.compile("%");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&perc;");
p = Pattern.compile("_");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("–");
p = Pattern.compile("\"");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&quot;");
p = Pattern.compile("\\s");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("_");
*/
output = URLEncoder.encode(tmpstring, "UTF-8");
input
output
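The active part of the shell above is the `URLEncoder.encode` call; a minimal sketch showing what it does to a name Protégé would reject (input value illustrative):

```java
import java.net.URLEncoder;

public class SafeName {
    // URLEncoder percent-escapes characters Protégé does not accept in
    // names (e.g. '#' and quotes) and maps spaces to '+'.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (java.io.UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }
    public static void main(String[] args) {
        if (!encode("EZH2 #1").equals("EZH2+%231")) throw new AssertionError();
    }
}
```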
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
Concatenates List of Strings.
Use 'merge all data' on its input to concatenate different inputs.
*/
String s;
Iterator iter = (Iterator) stringlist.iterator();
if (iter.hasNext()) s = iter.next();
while (iter.hasNext()) {
s = s + delimiter + iter.next();
}
output = s;
stringlist
delimiter
output
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
property_uri
property_string
output:
NTriple_PropertyOfInstance_statement
*/
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ property_uri + "> \"" + property_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
property_uri
property_string
instance_uri
NTriple_PropertyOfInstance_statement
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
"
\\\"
import java.util.regex.*;
Pattern p = Pattern.compile(findstring);
Matcher m = p.matcher(input);
output = (String) m.replaceAll(replacestring);
input
findstring
replacestring
output
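This find/replace shell is wired with findstring `"` and replacestring `\\\"` (see the two parameter values above), which escapes double quotes so they can be embedded in N-Triples literals. A standalone Java sketch of that configuration:

```java
import java.util.regex.*;

public class EscapeQuotes {
    // Generic regex find/replace, as in the Beanshell above.
    static String replace(String input, String findstring, String replacestring) {
        return Pattern.compile(findstring).matcher(input).replaceAll(replacestring);
    }
    public static void main(String[] args) {
        // "\\\\\"" is the Java-source form of the replacement \" used here:
        // replaceAll treats \\ as a literal backslash.
        String out = replace("say \"hi\"", "\"", "\\\\\"");
        if (!out.equals("say \\\"hi\\\"")) throw new AssertionError(out);
    }
}
```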
/*
N-triple beanshell: Instance of Type including label, comment, and date
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Enzyme> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#label> "an enzyme"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#comment> "e.g. the enzyme referred to by as 'EZH2'"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_BioModel> <http://purl.org/dc/elements/1.1/date> "2008-08-14T14:37:29+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
*/
/*
input:
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime // must be in the right format, cf. 2008-08-14T14:37:29+02:00 (yyyy-MM-ddTHH:mm:ssZ); NB Java's SimpleDateFormat 'Z' pattern omits the colon in the timezone offset, which is required here.
output:
NTriple_InstanceOf_statement
instance_uri
*/
instance_uri = instance_ontology_url + "#" + instance_name;
NTriple_InstanceOf_statement = "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1/date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
NTriple_InstanceOf_statement
instance_uri
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
property_uri
property_string
output:
NTriple_PropertyOfInstance_statement
*/
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ property_uri + "> \"" + property_string + "\"^^<http://www.w3.org/2001/XMLSchema#boolean> .\n";
instance_uri
property_uri
property_string
NTriple_PropertyOfInstance_statement
org.embl.ebi.escience.scuflworkers.java.StringConcat
no
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
Reference to file for RDF output.
E.g.
http://aida.science.uva.nl:9999/aida_public/rdf-output/tmp-rdf-out.rdf
Instance ontology URL. E.g. http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/BioAID_Instances.owl
e.g.
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_BioModel
conform this format
2008-08-14T14:37:29+02:00
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#computation_run_of
yes
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#run_of_process
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#ComputationRun
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/MappingBioTextMining.owl#process_run_performed_by_computation_run
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/AIDA_Instances.owl#AIDA_ProteinTermColocationExtractionProcess
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#TextMiningProcessRun
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/AIDA_Instances.owl#BioAID_BioModellingSupportByProteinExtractionWorkflow
/*
Concatenates List of Strings.
Use 'merge all data' on its input to concatenate different inputs.
*/
String s = ""; // initialised so an empty list yields an empty output instead of an unset variable
Iterator iter = (Iterator) stringlist.iterator();
if (iter.hasNext()) s = iter.next();
while (iter.hasNext()) {
s = s + delimiter + iter.next();
}
output = s;
stringlist
delimiter
output
/*
N-triple beanshell: Instance of Process and Service runs including label and comment and their interrelationship
*/
/*
input:
instance_ontology_url
executed_process_instance_uri
executed_service_instance_uri
process_service_relation_uri
run_of_process_property_uri
run_of_service_property_uri
process_run_type_uri
service_run_type_uri
datetime
timestamp_shortstring
input
output:
NTriple_statements
process_run_instance_uri;
service_run_instance_uri;
*/
String[] tmp;
String executed_instance_uri;
String executed_name;
String instance_name;
String type_uri;
NTriple_statements = "";
int i=0;
while (i<=1) {
if (i==0) {
executed_instance_uri = executed_process_instance_uri;
run_of_property_uri = run_of_process_property_uri;
type_uri = process_run_type_uri;
} else {
executed_instance_uri = executed_service_instance_uri;
run_of_property_uri = run_of_service_property_uri;
type_uri = service_run_type_uri;
}
tmp = executed_instance_uri.split("#", 2);
executed_name = tmp[1].toString();
instance_name = executed_name + "_run_on_" + input + "_at_" + timestamp_shortstring;
instance_uri = instance_ontology_url + "#" + instance_name;
if (i==0) { process_run_instance_uri = instance_uri; } else { service_run_instance_uri = instance_uri; }
String comment_string = "run of " + executed_instance_uri + " on " + input + " dd. " + datetime + ".";
String label_string = "run of " + executed_name + " on input " + input + " on " + timestamp_shortstring;
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1/date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
NTriple_statements = NTriple_statements + "<" + instance_uri + "> <" + run_of_property_uri + "> <" + executed_instance_uri + "> .\n";
i=i+1;
}
NTriple_statements = NTriple_statements + "<" + process_run_instance_uri + "> <" + process_service_relation_uri + "> <" + service_run_instance_uri + "> .\n";
instance_ontology_url
datetime
timestamp_shortstring
executed_service_instance_uri
executed_process_instance_uri
process_service_relation_uri
run_of_process_property_uri
run_of_service_property_uri
process_run_type_uri
service_run_type_uri
input
NTriple_statements
process_run_instance_uri
service_run_instance_uri
no
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
This workflow calculates a min-log-likelihood score for the combination of a discovered protein and a protein of interest (the query protein). Note that at the moment the total count of Medline papers, which is part of the formula, is hard-coded and not exact. Given its size this should not matter much, and certainly not in comparison with other likelihoods calculated using the same value.
This workflow calculates a min-log-likelihood score for the combination of a discovered protein and a protein of interest (the query protein). Note that at the moment the total count of Medline papers, which is part of the formula, is hard-coded and not exact. Given its size this should not matter much, and certainly not in comparison with other likelihoods calculated using the same value.
Computation:
-log likelihood ratio for finding query q and discovery d together computed by -log((QD_exp / N) / (QD_obs / N)), QD_exp = (Q/D)/N, where Q is the frequency of documents containing query q, D is the frequency of documents containing the discovery d, and QD the frequency of documents containing both q and d; QD_exp is the expected frequency of documents containing q and d assuming independence of q and d (H0). This score is a measure of 'specialness' of finding q and d together.
edit me!
org.embl.ebi.escience.scuflworkers.java.FlattenList
org.embl.ebi.escience.scuflworkers.java.FlattenList
int count_b = 0;
int count_q = 0;
int count_p = 0;
validated_query = query;
String s=query;
count_b = (s.length() - s.replaceAll("\\(", "").length());
count_b = count_b - (s.length() - s.replaceAll("\\)", "").length());
count_q = (s.length() - s.replaceAll("\"", "").length()) % 2;
count_p = (s.length() - s.replaceAll("'", "").length()) % 2;
count_brackets = count_b.toString(); // surplus of unmatched opening brackets
while (count_b>0) {
validated_query = validated_query + ")";
count_b--;
}
while (count_q != 0) {
validated_query = validated_query + "\"";
count_q--;
}
while (count_p != 0) {
validated_query = validated_query + "'";
count_p--;
}
while (count_p<0) {
validated_query = "'" + validated_query;
count_p++;
}
validated_query = "(" + validated_query + ")" + " OR voetbalstadion";
query
validated_query
count_brackets
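The parenthesis/double-quote part of the validation above can be sketched as standalone Java (it omits the single-quote handling and the sentinel term appended at the end; class name illustrative):

```java
public class BalanceQuery {
    // Appends the closing ')' and '"' characters a Lucene query is missing,
    // following the counting logic above: a surplus of open parentheses and
    // an odd number of double quotes each get a closing character appended.
    static String balance(String query) {
        String balanced = query;
        int openParens = query.length() - query.replace("(", "").length();
        int closeParens = query.length() - query.replace(")", "").length();
        int quotes = (query.length() - query.replace("\"", "").length()) % 2;
        for (int i = closeParens; i < openParens; i++) balanced += ")";
        if (quotes != 0) balanced += "\"";
        return balanced;
    }
    public static void main(String[] args) {
        if (!balance("(\"EZH2").equals("(\"EZH2)\"")) throw new AssertionError();
        if (!balance("(a AND b)").equals("(a AND b)")) throw new AssertionError();
    }
}
```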
import java.util.*;
List newlist = new ArrayList();
for (int i=0; i<((int) Integer.parseInt(copy_number.toString())); i++) {
newlist.add(input);
}
clones=newlist;
copy_number
input
clones
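The cloning shell above simply replicates one value `copy_number` times so it can be paired with each element of a list downstream; a typed Java sketch:

```java
import java.util.*;

public class CloneInput {
    // Builds a list containing copyNumber copies of the input value,
    // as the Beanshell above does.
    static List<String> clones(String input, int copyNumber) {
        List<String> list = new ArrayList<String>();
        for (int i = 0; i < copyNumber; i++) list.add(input);
        return list;
    }
    public static void main(String[] args) {
        List<String> c = clones("EZH2", 3);
        if (c.size() != 3 || !c.get(2).equals("EZH2")) throw new AssertionError();
    }
}
```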
org.embl.ebi.escience.scuflworkers.java.FlattenList
org.embl.ebi.escience.scuflworkers.java.FlattenList
/*
Proposal: a -log likelihood ratio:
-log ( (#QD_expected / #N) / (#QD / #N) )
where
#QD : the observed number of documents containing the query protein Q and the discovered protein D
#QD_expected = (#Q*#D)/#N : the expected number of documents containing Q and D, based on the observed counts #Q and #D
Edgar: the measure you describe closely resembles PMI (pointwise mutual information), which may be of use here.
*/
/* variables
query_frequency (#Q)
discovered_frequency (#D)
query_discovered_frequency (#QD)
total_frequency (#N)
return
minloglikelihood
*/
import java.lang.Math;
// import edu.uah.math.distributions;
Double mll;
if (query_discovered_frequency.equals("0")) {
mll = (Double) Double.MAX_VALUE; // hack, have to find a more standard or sophisticated score
} else {
double q = (double) Integer.parseInt( query_frequency );
double d = (double) Integer.parseInt( discovered_frequency );
double qd = (double) Integer.parseInt( query_discovered_frequency );
double n = (double) Integer.parseInt( total_frequency );
double qd_expected = (double) ((q*d)/n);
mll = (Double) new Double((double) -( ((double) Math.log(qd_expected/n)) - ((double) Math.log(qd/n))));
}
minloglikelihood = mll.toString();
// minloglikelihood = (String) "test";
query_frequency
discovered_frequency
query_discovered_frequency
total_frequency
minloglikelihood
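Since -log((QD_expected/N) / (QD/N)) = log(QD / QD_expected), the score can be checked with a small worked example (the counts below are hypothetical, not from Medline):

```java
public class MinLogLikelihood {
    // -log((QD_expected/N) / (QD/N)) with QD_expected = Q*D/N,
    // written exactly as in the Beanshell above (natural log).
    static double score(int q, int d, int qd, int n) {
        double qdExpected = ((double) q * d) / n;
        return -(Math.log(qdExpected / n) - Math.log((double) qd / n));
    }
    public static void main(String[] args) {
        // Hypothetical counts: Q=100, D=200, QD=50 in a corpus of 1,000,000.
        // QD_expected = 100*200/1e6 = 0.02, so the score is log(50/0.02) = log(2500).
        double mll = score(100, 200, 50, 1000000);
        if (Math.abs(mll - Math.log(2500.0)) > 1e-9) throw new AssertionError(mll);
    }
}
```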
org.embl.ebi.escience.scuflworkers.java.FlattenList
org.embl.ebi.escience.scuflworkers.java.FlattenList
count = list.size();
list
count
./aid:result/@total
MedLine
org.embl.ebi.escience.scuflworkers.java.FlattenList
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
poi_and_dp_query = "(" + poi_query + ") AND (" + dp_query + ")";
poi_query
dp_query
poi_and_dp_query
1
content
http://aida.science.uva.nl:8888/axis/services/SearcherWS?wsdl
search
./aid:result/@total
MedLine
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
org.embl.ebi.escience.scuflworkers.java.FlattenList
17000000
/* variables
poi_count_in_corpus
corpus_total
return
relative_frequency
*/
import java.lang.Math;
Double rf = new Double(-Math.log((double)(Integer.parseInt( poi_count_in_corpus ) ) / ((double) (Integer.parseInt( corpus_total )))));
relative_frequency = rf.toString();
corpus_total
poi_count_in_corpus
relative_frequency
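This relative-frequency shell computes -log of the term's document frequency over the corpus size; a worked sketch (the counts are hypothetical, chosen to match the hard-coded corpus size of 17,000,000 used above):

```java
public class RelativeFrequency {
    // -log of a term's document count relative to the corpus total,
    // as in the Beanshell above (natural log via Math.log).
    static double score(int countInCorpus, int corpusTotal) {
        return -Math.log((double) countInCorpus / corpusTotal);
    }
    public static void main(String[] args) {
        // Hypothetical: 170,000 hits in 17,000,000 documents -> -log(0.01) = log(100).
        double rf = score(170000, 17000000);
        if (Math.abs(rf - Math.log(100.0)) > 1e-9) throw new AssertionError(rf);
    }
}
```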
1
content
http://aida.science.uva.nl:8888/axis/services/SearcherWS?wsdl
search
E.g. EZH2
E.g. HDAC1
Completed
MinLogLikelihood
Ready
Scheduled
Running
true
org.embl.ebi.escience.scuflworkers.java.FailIfTrue
http://aida.science.uva.nl:8888/axis/AidaSemanticStorage.jws?wsdl
password_bioaid
http://aida.science.uva.nl:8888/axis/AidaSemanticStorage.jws?wsdl
aida_semantic_server_url
http://aida.science.uva.nl:8888/axis/AidaSemanticStorage.jws?wsdl
repository_bioaid
http://aida.science.uva.nl:8888/axis/AidaSemanticStorage.jws?wsdl
username_bioaid
http://aida.science.uva.nl:8888/axis/services/RepositoryWS?wsdl
addRdfFile
e.g.
http://aida.science.uva.nl:9999/aida_public/rdf-output/tmp-rdf-out.rdf
true to prevent adding rdf statements to the repository
Completed
Fail_if_true_Prevent_add
addRdfFile
Scheduled
Running
MedLine
./aid:result/@total
org.embl.ebi.escience.scuflworkers.java.FlattenList
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
17000000
/* variables
poi_count_in_corpus
corpus_total
return
relative_frequency
*/
import java.lang.Math;
Double rf = new Double(-Math.log((double)(Integer.parseInt( poi_count_in_corpus ) ) / ((double) (Integer.parseInt( corpus_total )))));
relative_frequency = rf.toString();
corpus_total
poi_count_in_corpus
relative_frequency
1
content
http://aida.science.uva.nl:9999/axis/services/SearcherWS?wsdl
search
Add Query to Semantic model with Sesame service cf example Biological Query
Add Query to Semantic model with Sesame service cf example Biological Query
yes
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#BiologicalModel
/*
N-triple beanshell: Instance of Type including label and comment
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Enzyme> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#label> "an enzyme"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#comment> "e.g. the enzyme referred to by as 'EZH2'"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
input:
instance_ontology_url
instance_name
type_uri
label_string
comment_string
output:
NTriple_InstanceOf_statement
instance_uri
*/
instance_uri = instance_ontology_url + "#" + instance_name;
NTriple_InstanceOf_statement = "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
instance_ontology_url
instance_name
type_uri
label_string
comment_string
NTriple_InstanceOf_statement
instance_uri
/*
replace characters that Protégé does not like for names
*/
import java.util.regex.*;
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
String tmpstring = input;
/*
Pattern p = Pattern.compile("#");
Matcher m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&hash;");
p = Pattern.compile("\\^");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("⁁");
p = Pattern.compile("<");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&lt;");
p = Pattern.compile(">");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&gt;");
p = Pattern.compile("\\{");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&lcurly;");
p = Pattern.compile("\\}");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&rcurly;");
p = Pattern.compile("%");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&perc;");
p = Pattern.compile("_");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("–");
p = Pattern.compile("\"");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("&quot;");
p = Pattern.compile("\\s");
m = p.matcher(tmpstring);
tmpstring = (String) m.replaceAll("_");
*/
tmpstring = "BioModel_" + URLEncoder.encode(tmpstring, "UTF-8");
output = tmpstring;
input
output
no
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
Unique name identifying the Biological model; it is used as the model's unique identifier. Characters that OWL does not allow in names are replaced.
Reference to file for RDF output.
E.g.
http://aida.science.uva.nl:9999/aida_public/rdf-output/tmp-rdf-out.rdf
Instance ontology URL. E.g. http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/BioAID_Instances.owl
false
false
false
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioAnnotations.owl
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/AIDA_Instances.owl
no
'rdfxml' or 'turtle' or 'n3'
n3
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/MappingBioTextMining.owl
rdfxml
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/BioAID_Instances.owl
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Text.owl
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl
false
org.embl.ebi.escience.scuflworkers.java.FailIfTrue
false
org.embl.ebi.escience.scuflworkers.java.FailIfTrue
bioaid_repository_url = sesame_url.substring(0,sesame_url.length()-6) + "workbench/repositories/" + repository;
sesame_url
repository
bioaid_repository_url
/*
insert the URL-encoded query into the output filename, just before its extension
*/
import java.util.regex.*;
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
new_filename = input_filename.substring(0, input_filename.length()-4) + URLEncoder.encode(input_query, "UTF-8") + input_filename.substring(input_filename.length()-4);
input_query
input_filename
new_filename
org.embl.ebi.escience.scuflworkers.java.FailIfTrue
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
http://aida.science.uva.nl:8888/axis/AidaSemanticStorage.jws?wsdl
def_rdf_output_doc_url_ts
http://aida.science.uva.nl:8888/axis/AidaSemanticStorage.jws?wsdl
password_bioaid_sandbox
http://aida.science.uva.nl:8888/axis/AidaSemanticStorage.jws?wsdl
aida_semantic_sandbox_server_url
http://aida.science.uva.nl:8888/axis/services/RepositoryWS?wsdl
addRdfFile
http://aida.science.uva.nl:8888/axis/AidaSemanticStorage.jws?wsdl
username_bioaid_sandbox
http://aida.science.uva.nl:8888/axis/AidaSemanticStorage.jws?wsdl
repository_bioaid_sandbox
http://aida.science.uva.nl:8888/axis/services/RepositoryWS?wsdl
clear
Set to true to prevent the repository from being cleared before adding semantic models.
Set to true to prevent the ontologies from being added to the repository.
Set to true to prevent the temporary RDF output file for intermediate RDF output from being overwritten with an empty document.
Completed
clear
addRdfFile
Scheduled
Running
Completed
Fail_if_true_Prevent_clear
clear
Scheduled
Running
Completed
Fail_if_true_Prevent_add
addRdfFile
Scheduled
Running
Completed
Fail_if_true_Prevent_init
save_as
Scheduled
Running
Adds URL cross references to various protein information resources.
Adds URL cross references to various protein information resources.
http://www.uniprot.org/uniprot/
http://www.ihop-net.org/UniPub/iHOP/gismo/
.html?ORGANISM_ID=1
.rdf
http://expasy.org/uniprot/
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=
http://www.ihop-net.org/UniPub/iHOP/?field=UNIPROT__AC&ncbi_tax_id=9606&organism_syn=&search=
http://www.ihop-net.org/UniPub/iHOP/gismo/SKIPPED.html?ORGANISM_ID=1
foo
bar
org.embl.ebi.escience.scuflworkers.java.TestAlwaysFailingProcessor
http://www.ihop-net.org/UniPub/iHOP/gismo/
org.embl.ebi.escience.scuflworkers.java.StringConcat
org.embl.ebi.escience.scuflworkers.java.StringConcat
org.embl.ebi.escience.scuflworkers.java.StringConcat
<iHOPguessedSymbolId query=\".+\" xmlns=\"http://www.pdg.cnb.uam.es/UniPub/iHOP/xml\">(.+)</iHOPguessedSymbolId>
1
org.embl.ebi.escience.scuflworkers.java.RegularExpressionStringList
.html?ORGANISM_ID=1
org.embl.ebi.escience.scuflworkers.java.StringConcat
org.embl.ebi.escience.scuflworkers.java.StringConcat
org.embl.ebi.escience.scuflworkers.java.StringConcat
org.embl.ebi.escience.scuflworkers.java.StringConcat
It takes a biological database reference as input and guesses the iHOP ID that best matches the input.
http://ubio.bioinfo.cnio.es/biotools/iHOP/iHOP-SOAP.wsdl
guessSymbolIdFromReference
UniProt ID (for iHOP, a human protein is expected)
E.g. Q15190
text/xml
Completed
RemoveToUndoBYPASS
guessSymbolIdFromReference
Scheduled
Running
Workflow to extract protein-protein interactions from text, followed by filtering the protein names against known human protein names. The protein-protein interaction service takes its input in 'IOB' format from applyCRF, which annotates proteins as such in text.
[Work in progress and temporarily out of service]
Workflow to extract protein-protein interactions from text, followed by filtering the protein names against known human protein names. The protein-protein interaction service takes its input in 'IOB' format from applyCRF, which annotates proteins as such in text.
ToDo:
Add Relation extraction subworkflow for real.
(.+)\s+(.+)
We O have O identified O a O transcriptional O repressor, O Nrg1, O in O a O genetic O screen O designed O to O reveal O negative O factors O involved O in O the O expression O of O STA1, B-PROTEIN which O encodes O a O glucoamylase. O The O NRG1 B-PROTEIN gene I-PROTEIN encodes O a O 25-kDa O C2H2 B-PROTEIN zinc I-PROTEIN finger I-PROTEIN protein I-PROTEIN which O specifically O binds O to O two O regions O in O the O upstream O activation O sequence O of O the O STA1 B-PROTEIN gene, O as O judged O by O gel O retardation O and O DNase B-PROTEIN I I-PROTEIN footprinting O analyses. O Disruption O of O the O NRG1 B-PROTEIN gene I-PROTEIN causes O a O fivefold O increase O in O the O level O of O the O STA1 B-PROTEIN transcript O in O the O presence O of O glucose. O
1
org.embl.ebi.escience.scuflworkers.java.RegularExpressionStringList
2
org.embl.ebi.escience.scuflworkers.java.RegularExpressionStringList
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
org.embl.ebi.escience.scuflworkers.java.FlattenList
org.embl.ebi.escience.scuflworkers.java.FlattenList
org.embl.ebi.escience.scuflworkers.java.FlattenList
org.embl.ebi.escience.scuflworkers.java.FlattenList
interaction_doc = "P300 TP53\r\nIRF3 CITED1\r\nCITED1 STAT5A\r\nTP73 P300\r\nP300 ZAC1\r\nP300 mSRC-1\r\nSigK T4\r\nGerE ykvP\r\nSpoIIE sigma(F)\r\nrsfA sigma(F)\r\nsigma(K) cwlH\r\nsigma(K) gerE\r\ncwlH gerE\r\nphrC sigmaH\r\nphrC CSF\r\nsigmaH CSF\r\nyfhS E_sigma_E\r\nsigmaK Spo0A\r\nsigmaK sigE\r\nSpo0A sigE\r\nGerE sigK\r\nGerE sigmaK\r\nsigK sigmaK\r\nsigmaK gerE\r\nsigmaK GerE\r\nsigmaK sigmaK\r\ngerE GerE\r\ngerE sigmaK\r\nGerE sigmaK\r\nGerE cotD\r\nGerE sigmaK\r\nGerE GerE\r\nGerE cotD\r\ncotD sigmaK\r\ncotD GerE\r\ncotD cotD\r\nsigmaK GerE\r\nsigmaK cotD\r\nGerE cotD\r\n";
interaction_doc
unspecified relation
relation = protein_name1 + " - " + interaction_term + " - " + protein_name2;
id_relation = uniprot_id1 + ", " + interaction_term + ", " + uniprot_id2;
protein_name1
protein_name2
interaction_term
uniprot_id1
uniprot_id2
relation
id_relation
(\r\n)|(\n)
This workflow filters protein_molecule-labeled term pairs from two input strings (lists). The result is a pair of tagged lists of proteins. It uses a UniProt service by Martijn Schuemie (BioSemantics group, Rotterdam) that knows about human proteins.
This workflow filters protein_molecule-labeled terms from an input string (list). The result is a tagged list of proteins (disregarding false positives in the input).
Internal information:
This workflow is a copy of 'filter_protein_molecule_MR3' used for the NBIC poster (now in Archive).
.+
// use equals(): != on Strings compares object references, not values
if (!uniprot1.equals("False") && !uniprot2.equals("False")) {
true_protein1=protein1;
true_uniprot1=uniprot1;
true_protein2=protein2;
true_uniprot2=uniprot2;
}
protein1
uniprot1
protein2
uniprot2
true_protein1
true_uniprot1
true_protein2
true_uniprot2
.+
org.embl.ebi.escience.scuflworkers.java.FilterStringList
if (uniprotIDlist.isEmpty() ) {
uniprotID_or_False = "False";
} else {
uniprotID_or_False = (String) uniprotIDlist.iterator().next().toString();
}
uniprotIDlist
uniprotID_or_False
org.embl.ebi.escience.scuflworkers.java.FilterStringList
.+
org.embl.ebi.escience.scuflworkers.java.FilterStringList
org.embl.ebi.escience.scuflworkers.java.FilterStringList
if (uniprotIDlist.isEmpty() ) {
uniprotID_or_False = "False";
} else {
uniprotID_or_False = (String) uniprotIDlist.iterator().next().toString();
}
uniprotIDlist
uniprotID_or_False
http://biosemantics.org:8080/axis/services/SynsetServer/SynsetServer.jws?wsdl
getUniprotID
http://biosemantics.org:8080/axis/services/SynsetServer/SynsetServer.jws?wsdl
getUniprotID
text/plain
Plain text to extract proteins from.
e.g.
We have identified a transcriptional repressor, Nrg1, in a genetic screen designed to reveal negative factors involved in the expression of STA1, which encodes a glucoamylase. The NRG1 gene encodes a 25-kDa C2H2 zinc finger protein which specifically binds to two regions in the upstream activation sequence of the STA1 gene, as judged by gel retardation and DNase I footprinting analyses. Disruption of the NRG1 gene causes a fivefold increase in the level of the STA1 transcript in the presence of glucose.
Add Protein to Semantic model with Sesame service cf example Discovered Proteins
Add Protein to Semantic model with Sesame service cf example Discovered Proteins
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#isModelComponentOf
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Text.owl#ProteinTerm
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Text.owl#is_Content_Component_Of
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#has_input
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#has_output
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Protein
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/seeAlso_EntrezUniProtURI
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/MappingBioTextMining.owl#references
yes
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/seeAlso_ExpasyUniProtURI
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/seeAlso_iHopUniProtURI
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/seeAlso_iHopQueryURI
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
annotation_property_uri
seeAlso_url
output:
NTriple_PropertyOfInstance_statement
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
seeAlso_url = URLEncoder.encode(seeAlso_url, "UTF-8");
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ annotation_property_uri + "> \"" + seeAlso_url + "\"^^<http://www.w3.org/2001/XMLSchema#anyURI> .\n";
instance_uri
annotation_property_uri
seeAlso_url
NTriple_PropertyOfInstance_statement
/*
N-triple beanshell: Instance of Type including label and comment
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Enzyme> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#label> "an enzyme"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#comment> "e.g. the enzyme referred to by as 'EZH2'"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_BioModel> <http://purl.org/dc/elements/1.1/date> "1999-05-31T13:20:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
*/
/*
input:
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
output:
NTriple_InstanceOf_statement
instance_uri
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
instance_uri = instance_ontology_url + "#" + URLEncoder.encode(instance_name, "UTF-8");
NTriple_InstanceOf_statement = "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1/date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
NTriple_InstanceOf_statement
instance_uri
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
annotation_property_uri
seeAlso_url
output:
NTriple_PropertyOfInstance_statement
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
seeAlso_url = URLEncoder.encode(seeAlso_url, "UTF-8");
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ annotation_property_uri + "> \"" + seeAlso_url + "\"^^<http://www.w3.org/2001/XMLSchema#anyURI> .\n";
instance_uri
annotation_property_uri
seeAlso_url
NTriple_PropertyOfInstance_statement
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
annotation_property_uri
seeAlso_url
output:
NTriple_PropertyOfInstance_statement
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
seeAlso_url = URLEncoder.encode(seeAlso_url, "UTF-8");
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ annotation_property_uri + "> \"" + seeAlso_url + "\"^^<http://www.w3.org/2001/XMLSchema#anyURI> .\n";
instance_uri
annotation_property_uri
seeAlso_url
NTriple_PropertyOfInstance_statement
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
input:
protein_name
uniprot_id
output:
protein_label
protein_comment
protein_term_label
protein_term_comment
*/
if (uniprot_id.length()>0) {
protein_label = protein_name + "_" + uniprot_id;
protein_comment = "protein referred to as " + protein_name + ", with UniProt ID: " + uniprot_id;
protein_term_comment = "protein term " + protein_name + ", identified as the name of the protein with UniProt ID: " + uniprot_id;
} else {
protein_label = protein_name; // flawed assumption that a protein without a UniProt ID can be identified by its name
protein_comment = "protein referred to as " + protein_name + ", without a rat, mouse, or human UniProt ID";
protein_term_comment = "protein term " + protein_name + " as the name of a protein without a rat, mouse, or human UniProt identifier";
}
protein_term_label = protein_label;
protein_name
uniprot_id
protein_label
protein_comment
protein_term_label
protein_term_comment
/*
N-triple beanshell: Instance of Type including label and comment
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Enzyme> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#label> "an enzyme"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#comment> "e.g. the enzyme referred to by as 'EZH2'"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_BioModel> <http://purl.org/dc/elements/1.1/date> "1999-05-31T13:20:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
*/
/*
input:
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
output:
NTriple_InstanceOf_statement
instance_uri
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
instance_uri = instance_ontology_url + "#" + URLEncoder.encode(instance_name, "UTF-8");
NTriple_InstanceOf_statement = "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1/date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
NTriple_InstanceOf_statement
instance_uri
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
Concatenates List of Strings.
Use 'merge all data' on its input to concatenate different inputs.
*/
String s = ""; // initialize so an empty input list yields an empty string, not an unset variable
Iterator iter = stringlist.iterator();
if (iter.hasNext()) s = iter.next().toString();
while (iter.hasNext()) {
s = s + delimiter + iter.next().toString();
}
output = s;
stringlist
delimiter
output
/*
N-triple beanshell: Property of Instance
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_DocumentSearchQuery> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#has_lucene_query> "\"EZH2\" AND chromatin"^^<http://www.w3.org/2001/XMLSchema#string> .
*/
/*
variables, input:
instance_uri
annotation_property_uri
seeAlso_url
output:
NTriple_PropertyOfInstance_statement
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
seeAlso_url = URLEncoder.encode(seeAlso_url, "UTF-8");
NTriple_PropertyOfInstance_statement = "<" + instance_uri + "> <"+ annotation_property_uri + "> \"" + seeAlso_url + "\"^^<http://www.w3.org/2001/XMLSchema#anyURI> .\n";
instance_uri
annotation_property_uri
seeAlso_url
NTriple_PropertyOfInstance_statement
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
E.g.
EZH2
e.g.
Q15910
http://www.ncbi.nlm.nih.gov/pubmed/
http://www.expasy.ch/
E.g.
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/BioAID_Instances.owl
E.g.
2008-08-14T14:37:29+02:00
E.g.
http://aida.science.uva.nl:9999/aida_public/rdf-output/tmp-rdf-out.rdf
Add Document to Semantic model with Sesame service cf example discovered document
Add Document to Semantic model with Sesame service cf example discovered document
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Text.owl#Document
Retrieved PubMed Document
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#has_input
Retrieved document, PubMed URL:
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/Workflow.owl#has_output
yes
http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
org.embl.ebi.escience.scuflworkers.java.StringConcat
org.embl.ebi.escience.scuflworkers.java.StringConcat
/*
Concatenates List of Strings.
Use 'merge all data' on its input to concatenate different inputs.
*/
String s = ""; // initialize so an empty input list yields an empty string, not an unset variable
Iterator iter = stringlist.iterator();
if (iter.hasNext()) s = iter.next().toString();
while (iter.hasNext()) {
s = s + delimiter + iter.next().toString();
}
output = s;
stringlist
delimiter
output
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables,
input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
domain_instance_uri
range_instance_uri
/*
N-triple beanshell: RelationByURI
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_InteractionTerm> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/TextMining.owl#discovered_by> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_RelationDiscoveryProcess> .
*/
/*
variables, input:
domain_instance_uri
relation_uri
range_instance_uri
output:
NTriple_Relation_statement
*/
NTriple_Relation_statement = "<" + domain_instance_uri + "> <" + relation_uri + "> <" + range_instance_uri + "> .\n";
domain_instance_uri
relation_uri
range_instance_uri
NTriple_Relation_statement
/*
N-triple beanshell: Instance of Type including label and comment
*/
/*
conform:
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Proto-ontology/BioModel.owl#Enzyme> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#label> "an enzyme"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_Enzyme> <http://www.w3.org/2000/01/rdf-schema#comment> "e.g. the enzyme referred to by as 'EZH2'"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/ExampleInstances.owl#ExampleInstance_BioModel> <http://purl.org/dc/elements/1.1/date> "1999-05-31T13:20:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
*/
/*
input:
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
output:
NTriple_InstanceOf_statement
instance_uri
*/
import java.net.URLEncoder;
// e.g. output=URLEncoder.encode(input, "UTF-8");
instance_uri = instance_ontology_url + "#" + URLEncoder.encode(instance_name, "UTF-8");
NTriple_InstanceOf_statement = "<" + instance_uri + "> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <" + type_uri + "> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#label> \"" + label_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://www.w3.org/2000/01/rdf-schema#comment> \"" + comment_string + "\"^^<http://www.w3.org/2001/XMLSchema#string> .\n";
NTriple_InstanceOf_statement = NTriple_InstanceOf_statement + "<" + instance_uri + "> <http://purl.org/dc/elements/1.1/date> \"" + datetime + "\"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n";
instance_ontology_url
instance_name
type_uri
label_string
comment_string
datetime
NTriple_InstanceOf_statement
instance_uri
http://aida.science.uva.nl:8888/axis/AidaFiler.jws?wsdl
save_as
e.g. http://rdf.adaptivedisclosure.org/owl/BioAID/myModel/Enriched-ontology/BioAID_Instances.owl
e.g.
http://aida.science.uva.nl:9999/aida_public/rdf-output/tmp-rdf-out.rdf
conform
2008-08-14T14:19:00+02:00
Workflow to optimize a Lucene document retrieval query to:
1. increase the priority of recent years (in decreasing order from 2009 down to 2002)
2. limit a subsequent search to a specific organism using a MeSH organism tag
Workflow to optimize a Lucene document retrieval query to
1. increase the priority of recent years (in decreasing order from 2009 down to 2002)
2. replace protein names with protein synonym strings.
3. provide protein names and UniProt IDs for bona fide human protein terms in the input query
(see subworkflow for details)
year:(2009^10 2008^9 2007^8 2006^7 2005^6 2004^5 2003^4 2002^3)
StringBuffer temp=new StringBuffer();
temp.append("+(");
temp.append(query_string);
temp.append(") ");
/* comment out if temporarily out of order: */
temp.append(" +");
temp.append(priority_string);
String lucene_query = temp.toString();
query_string
priority_string
lucene_query
This workflow creates a query string from the query term using Martijn Schuemie's synonym service. The service is limited to proteins, enzymes, and genes. An input query that is a boolean string will be split and processed. Until I find a smarter regular expression, only terms within double quotes will be replaced by synonym strings.
This workflow tries to expand an input query with human/rat/mouse protein synonyms using Martijn Schuemie's UniProt-based synonym service. The service is limited to proteins, enzymes, and genes. Terms that are bona fide human/rat/mouse protein names are also produced.
Warning note:
Synonym expansion may add ambiguous terms to a query.
This workflow is a fork from ProteinSynonymsToQuery.xml.
Known issues:
* see the query string input metadata
* the Lucene-based tokenizer removes stop-words from multiword tokens, which likely renders the token unrecognizable for the synonym service (e.g. 'Enhancer of Zeste' becomes "enhancer zeste").
* replacement is iterative, so nested replacements (creating very long strings) may occur.
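The stop-word issue above can be illustrated with a small sketch. This is a hypothetical tokenizer, not the actual Lucene analyzer used by the service (the stop-word list here is an assumption): lower-casing and dropping stop-words from a multiword protein name leaves a token the synonym service will no longer recognize.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical tokenizer sketch, for illustration only.
public class StopWordIssue {
    // Assumed stop-word list; the real analyzer's list may differ.
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("of", "the", "a", "an", "and", "or"));

    public static String tokenize(String name) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.toLowerCase().split("\\s+")) {
            if (STOP_WORDS.contains(word)) continue; // stop-word dropped here
            if (sb.length() > 0) sb.append(' ');
            sb.append(word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Enhancer of Zeste" degrades to "enhancer zeste"
        System.out.println(tokenize("Enhancer of Zeste"));
    }
}
```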
org.embl.ebi.escience.scuflworkers.java.FlattenList
org.embl.ebi.escience.scuflworkers.java.FlattenList
if(input_list.size()==1) {
output = input_list.iterator().next().toString();
} else {
output = input_list;
}
input_list
output
List unitmp = new ArrayList();
List protmp = new ArrayList();
List unilisttmp = new ArrayList();
Iterator val_iter = validated.iterator();
Iterator uni_iter = uniProtIDlist.iterator();
Iterator prot_iter = proteinNameList.iterator();
Iterator unilist_iter = uniProtIDlistList.iterator();
String val;
while (val_iter.hasNext() && uni_iter.hasNext() && prot_iter.hasNext() && unilist_iter.hasNext()) {
val = val_iter.next().toString();
// advance all iterators in lock-step so the four lists stay aligned,
// and compare strings with equals(), not == (reference comparison)
Object uni = uni_iter.next();
Object prot = prot_iter.next();
Object unilist = unilist_iter.next();
if (val.equals("True") || val.equals("true")) {
unitmp.add(uni);
protmp.add(prot);
unilisttmp.add(unilist);
}
}
cleanUniProtIDlist = unitmp;
cleanProteinNamelist = protmp;
cleanUniProtIDlistList = unilisttmp;
validated
uniProtIDlist
proteinNameList
uniProtIDlistList
cleanUniProtIDlist
cleanProteinNamelist
cleanUniProtIDlistList
import java.util.*;
String synstring=query_term;
String syn;
Iterator iterator = synonymlist.iterator();
while ( iterator.hasNext() )
{
syn = ((String) iterator.next());
synstring = synstring + " OR " + "\"" + syn + "\"";
}
new_query = synstring;
synonymlist
query_term
new_query
// replace words in the input query by synonym strings plus the original
// input: the input query
// findstringlist: list of words to be replaced
// replacestringlist: list of synonym string to replace items in the findstringlist
// find and replaces string lists form pairs (should be of equal length)
import java.util.regex.*;
String tmp=(String) input;
String findstring;
Iterator find_iterator = findstringlist.iterator();
Iterator replace_iterator = replacestringlist.iterator();
int i=0;
/* iteration 1: replace with placeholders to prevent replacing replacements */
while (find_iterator.hasNext())
{
findstring = ((String) find_iterator.next().toString());
replacestring = "§" + i + "¡";
i=i+1;
Pattern p = Pattern.compile("[\"]*"+findstring+"[\"]*");
Matcher m = p.matcher(tmp);
tmp = (String) m.replaceAll(replacestring);
}
i=0;
while (replace_iterator.hasNext())
{
findstring = "§" + i + "¡";
i=i+1;
replacestring = ((String) replace_iterator.next().toString());
Pattern p = Pattern.compile("[\"]*"+findstring+"[\"]*");
Matcher m = p.matcher(tmp);
tmp = (String) m.replaceAll("("+replacestring+")");
}
output = tmp;
input
findstringlist
replacestringlist
output
/* If the input list is empty, replace it with a list containing one empty list.
Tested on 1-deep lists.
in:
listOfLists_in
out:
listOfLists_out
*/
List newListOfLists = new ArrayList();
if (listOfLists_in.size()<=0) {
List newList = new ArrayList();
newListOfLists.add((List) newList);
} else {
newListOfLists = listOfLists_in;
}
listOfLists_out = (List) newListOfLists;
listOfLists_in
listOfLists_out
This workflow filters protein_molecule-labeled terms from an input string (or list of strings). The result is a tagged list of proteins (disregarding false positives in the input).
Internal information:
This workflow is a copy of 'filter_protein_molecule_MR3' used for the NBIC poster (now in Archive).
This workflow filters protein_molecule-labeled terms from an input string (or list of strings). The result is a tagged list of proteins (disregarding false positives in the input).
Internal information:
This workflow is a copy of 'filter_protein_molecule_MR3' used for the NBIC poster (now in Archive).
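The filtering step described above can be sketched as follows. This is an illustrative guess, not the workflow's actual implementation: the tag name `protein_molecule` is taken from the description, but the exact markup emitted by the tagger (assumed here to be SGML-style open/close tags) and the class and method names are assumptions.

```java
import java.util.*;
import java.util.regex.*;

// Sketch only: collect the text inside <protein_molecule>...</protein_molecule>
// tags. The SGML tag format is an assumption based on the workflow description.
public class ProteinMoleculeFilter {
    private static final Pattern TAG =
        Pattern.compile("<protein_molecule>(.*?)</protein_molecule>");

    public static List<String> filter(String taggedText) {
        List<String> proteins = new ArrayList<>();
        Matcher m = TAG.matcher(taggedText);
        while (m.find()) proteins.add(m.group(1));
        return proteins;
    }
}
```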
// Validate a potential protein name: if the UniProt ID lookup returned any IDs,
// accept the name; otherwise mark it with a NO_UNIPROT_ID_FOR_ placeholder.
List tmp_list = new ArrayList();
String tmp_id;
String tmp_name;
if (!(uniProtIDlist.isEmpty())) {
tmp_list=uniProtIDlist;
tmp_name = potentialProteinName;
Iterator iter = uniProtIDlist.iterator();
if (uniProtIDlist.size()>1) {
tmp_id = "OneOf";
while (iter.hasNext()) {
tmp_id=tmp_id+"_"+iter.next().toString();
}
} else {
tmp_id = iter.next().toString();
}
validated="True";
} else {
tmp_list.add("NO_UNIPROT_ID_FOR_"+potentialProteinName);
tmp_id = "NO_UNIPROT_ID_FOR_"+potentialProteinName;
tmp_name = "NO_UNIPROT_ID_FOR_"+potentialProteinName;
validated="False";
}
validatedUniProtIDlist = tmp_list;
validatedUniProtID = tmp_id;
validatedProteinName = tmp_name;
uniProtIDlist
potentialProteinName
validatedUniProtIDlist
validatedUniProtID
validatedProteinName
validated
http://biosemantics.org:8080/axis/services/SynsetServer/SynsetServer.jws?wsdl
getUniprotID
http://biosemantics.org:8080/axis/services/SynsetServer/SynsetServer.jws?wsdl
getSynsets
Splits an input query string into its parts. Works for queries that contain search terms and search phrases between double quotes, connected by AND or OR. Behaviour is undetermined when other characters such as +, -, or brackets are used. Should now work for well-formed patterns with bracketed substrings separated by AND/OR/AND NOT/OR NOT, e.g. (Topic1) AND NOT (Topic2), but this has not been extensively tested.
Extracts query terms or phrases from a (Lucene) boolean query.
Known issues:
The Lucene tokenizer is used, which has some side effects. The lower-casing of single-word terms is reversed by this workflow, but the dropping of stop words from multi-word tokens is not (e.g. 'Enhancer of Zeste' will become 'enhancer zeste').
org.embl.ebi.escience.scuflworkers.java.StringStripDuplicates
Use this method to tokenize a Lucene Query into terms.
http://aida.science.uva.nl:8888/axis/services/tokenize?wsdl
queryToArray
Queries that contain search terms, search phrases between double quotes, possibly connected by AND or OR. Behaviour undetermined when other characters such as +, -, or brackets are used.
E.g.
EZH2 +(HDAC1 AND SMC1) OR "Enhancer of Zeste Drosophila Homologue 2" AND ('P53' or 'ftsQ')
(Boolean) query. Protein/gene/enzyme names registered as human, rat, or mouse proteins at UniProt will be replaced by a string of synonyms including the original name. Other types of names are left as they are.
Example inputs:
"EZH2"
"Enhancer of Zeste" AND "HDAC1" AND "chromatin"
EZH2 +(HDAC1 AND SMC1) OR "Enhancer of Zeste Drosophila Homologue 2" AND ('P53' or 'ftsQ')
Known issues:
* It is assumed that each item in the query is a separate biological term; including synonyms that the BioSemantics synonym/UniProt service already knows about will lead to unnecessary redundancy in the new query.
Lucene query string
Lucene query based on the input query with the addition of:
1. A Lucene string to give recent years higher priority (in decreasing order from 2009 down to 2002)
2. A mesh organism term to limit subsequent searches
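One way to build the recency clause described in point 1 is with Lucene's `^` boost operator, weighting years in decreasing order from 2009 down to 2002. This is a sketch under that assumption: the exact clause string and the combination with the MeSH organism term produced by the workflow are not specified here, and all names below are illustrative.

```java
// Sketch (assumption): build a Lucene clause that boosts recent years,
// e.g. yearClause(2009, 2002) -> "(2009^8 OR 2008^7 OR ... OR 2002^1)".
public class YearBoost {
    public static String yearClause(int newest, int oldest) {
        StringBuilder sb = new StringBuilder("(");
        int boost = newest - oldest + 1;
        for (int year = newest; year >= oldest; year--, boost--) {
            if (year != newest) sb.append(" OR ");
            sb.append(year).append("^").append(boost);
        }
        return sb.append(")").toString();
    }

    // combine the original query with the year clause and an organism limit
    public static String withRecency(String query, String organismTerm) {
        return "(" + query + ") AND " + organismTerm + " " + yearClause(2009, 2002);
    }
}
```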
Workflow to extract proteins from text, followed by filtering protein names known as human protein names.
Workflow to extract proteins from text, followed by filtering protein names known as human protein names.
This workflow filters protein_molecule-labeled terms from an input string (or list of strings). The result is a tagged list of proteins (disregarding false positives in the input).
Internal information:
This workflow is a copy of 'filter_protein_molecule_MR3' used for the NBIC poster (now in Archive).
This workflow filters protein_molecule-labeled terms from an input string (or list of strings). The result is a tagged list of proteins (disregarding false positives in the input).
Internal information:
This workflow is a copy of 'filter_protein_molecule_MR3' used for the NBIC poster (now in Archive).
// Validate a potential protein name: if the UniProt ID lookup returned any IDs,
// accept the name; otherwise mark it with a NO_UNIPROT_ID_FOR_ placeholder.
List tmp_list = new ArrayList();
String tmp_id;
String tmp_name;
if (!(uniProtIDlist.isEmpty())) {
tmp_list=uniProtIDlist;
tmp_name = potentialProteinName;
Iterator iter = uniProtIDlist.iterator();
if (uniProtIDlist.size()>1) {
tmp_id = "OneOf";
while (iter.hasNext()) {
tmp_id=tmp_id+"_"+iter.next().toString();
}
} else {
tmp_id = iter.next().toString();
}
validated="True";
} else {
tmp_list.add("NO_UNIPROT_ID_FOR_"+potentialProteinName);
tmp_id = "NO_UNIPROT_ID_FOR_"+potentialProteinName;
tmp_name = "NO_UNIPROT_ID_FOR_"+potentialProteinName;
validated="False";
}
validatedUniProtIDlist = tmp_list;
validatedUniProtID = tmp_id;
validatedProteinName = tmp_name;
uniProtIDlist
potentialProteinName
validatedUniProtIDlist
validatedUniProtID
validatedProteinName
validated
http://biosemantics.org:8080/axis/services/SynsetServer/SynsetServer.jws?wsdl
getUniprotID
Workflow to extract proteins from text
Two methods are combined:
Named entity recognition using LingPipe (NERecognize)
Named entity recognition using Conditional Random Fields (applyCRF)
Both are based on machine learning methods.
Default inputs:
Model: 1 = BioCreative I
OutputMode: 1 = IOB format; 2 = SGML format; 3 = a list of entities; 4 = ABNER format
Tokenization: 1 = activate tokenization (makes no difference in practice)
testFile is expected to be a piece of text (string)
Workflow to extract proteins from text
Two methods are combined:
Named entity recognition using LingPipe (NERecognize)
Named entity recognition using Conditional Random Fields (applyCRF)
Both are based on machine learning methods.
Default inputs:
Model: 1 = BioCreative I
OutputMode: 1 = IOB format; 2 = SGML format; 3 = a list of entities; 4 = ABNER format
Tokenization: 1 = activate tokenization (makes no difference in practice)
testFile is expected to be a piece of text (string)
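The results of the two recognizers are merged into one list. The workflow does this with Taverna's built-in StringSetUnion local worker; an equivalent sketch in Java (class and method names are illustrative) is:

```java
import java.util.*;

// Illustrative equivalent of Taverna's StringSetUnion local worker:
// merge the entity lists from the two recognizers, dropping duplicates
// while preserving first-seen order.
public class EntityUnion {
    public static List<String> union(List<String> lingpipe, List<String> crf) {
        Set<String> seen = new LinkedHashSet<>();
        seen.addAll(lingpipe);
        seen.addAll(crf);
        return new ArrayList<>(seen);
    }
}
```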
PROTEIN \[(.+)\]
3
\n
1
1
org.embl.ebi.escience.scuflworkers.java.StringStripDuplicates
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
org.embl.ebi.escience.scuflworkers.java.StringSetUnion
1
org.embl.ebi.escience.scuflworkers.java.RegularExpressionStringList
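The configuration above applies the regular expression `PROTEIN \[(.+)\]` to tagger output that has been split on `\n`. A sketch of that step (the line-oriented input format is inferred from the regex and the `\n` splitter; class and method names are invented):

```java
import java.util.*;
import java.util.regex.*;

// Sketch: apply the workflow's regular expression (PROTEIN \[(.+)\]) to each
// line of tagger output and collect the captured protein names.
public class ProteinLineFilter {
    private static final Pattern P = Pattern.compile("PROTEIN \\[(.+)\\]");

    public static List<String> extract(String taggedOutput) {
        List<String> names = new ArrayList<>();
        for (String line : taggedOutput.split("\n")) {
            Matcher m = P.matcher(line);
            if (m.find()) names.add(m.group(1));
        }
        return names;
    }
}
```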
import java.util.*;
String tmp = input;
// strip a trailing " protein" or " gene" qualifier from the entity name
if (input.endsWith(" protein")) {
tmp = input.substring(0, input.length() - " protein".length());
}
if (input.endsWith(" gene")) {
tmp = input.substring(0, input.length() - " gene".length());
}
output = tmp;
input
output
http://aida.science.uva.nl:8888/axis/services/CRFapply?wsdl
apply
http://aida.science.uva.nl:8888/axis/services/CRFapply?wsdl
apply
http://aida.science.uva.nl:9999/axis/services/CRFapply?wsdl
apply
Example:
We have identified a transcriptional repressor, Nrg1, in a genetic screen designed to reveal negative factors involved in the expression of STA1, which encodes a glucoamylase. The NRG1 gene encodes a 25-kDa C2H2 zinc finger protein which specifically binds to two regions in the upstream activation sequence of the STA1 gene, as judged by gel retardation and DNase I footprinting analyses. Disruption of the NRG1 gene causes a fivefold increase in the level of the STA1 transcript in the presence of glucose.
Plain text to extract proteins from.
e.g.
We have identified a transcriptional repressor, Nrg1, in a genetic screen designed to reveal negative factors involved in the expression of STA1, which encodes a glucoamylase. The NRG1 gene encodes a 25-kDa C2H2 zinc finger protein which specifically binds to two regions in the upstream activation sequence of the STA1 gene, as judged by gel retardation and DNase I footprinting analyses. Disruption of the NRG1 gene causes a fivefold increase in the level of the STA1 transcript in the presence of glucose.
text/xml
Biological query, e.g. a protein of interest. See Lucene documentation for advanced queries (http://lucene.apache.org/)
Synonyms for protein names will be searched and added for terms within double quotes.
Limits the maximum number of hits the search will produce. In Taverna 1, '100' works well, while 1000 and above is likely to halt Taverna 1 due to memory problems. This also depends on the client's memory setting for the Java virtual machine (usually your local Taverna).
A magic word is required to make use of the AIDA semantic repository for BioAID workflows. Please ask Scott Marshall (marshall@science.uva.nl) or Marco Roos (M.Roos1@uva.nl) for the magic word. NB: this semantic repository is for temporary data only. You should expect the repository to be cleared often and without warning.
true
if you would like the knowledge base (triple store) to be cleared and the proto-ontologies reloaded
false
otherwise
Completed
s07_AddScoreToSemanticModel
s08_AddRdfToRepository
Scheduled
Running
Completed
Fail
s06_AddProteinRelationToSemanticModel
Scheduled
Running
Completed
Fail
04_ExtractProteinRelations_HomoSapiens
Scheduled
Running
Completed
05_ScoreExtractedProteins
06_UniProtXrefURLs_iHopBYPASS
Scheduled
Running
Completed
s00_InitializeSemanticStorage
01_ProcessQuery
Scheduled
Running
Completed
s03_AddExpandedQueryToSemanticModel
s03b_AddQueryProteinsToSemanticModel
Scheduled
Running
Completed
p01_AddWorkflowToSemanticModel
s03_AddExpandedQueryToSemanticModel
Scheduled
Running
Completed
s03b_AddQueryProteinsToSemanticModel
p02_AddDocumentDiscoveryToSemanticModel
Scheduled
Running