This workflow takes a raw cDNA microarray file and a normalisation method and returns a series of images/graphs equivalent to the output obtained using R and Bioconductor. The workflow also returns a list of the top differentially expressed genes (its size depends on the number given as the input geneNumber), which is then used to find the candidate pathways that may be influencing the observed changes in the microarray data. Identifying the candidate pathways gives more detailed insight into the gene expression data. These pathways are subsequently used to obtain a corpus of published abstracts (from the PubMed database) relating to each biological pathway identified.
It also generates a pie chart that indicates the number of genes in a dataset that are regulated by a known transcriptional regulator, or by a combination of regulators, and can suggest previously unknown regulatory interactions. The information for each regulon comes from files that are created manually from the EcoCyc database.
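The regulon counts behind the pie chart can be sketched as plain set intersections. The following is a minimal illustration, not the workflow's own code: the class name, method names and gene identifiers (g1, g2, ...) are hypothetical, and the "unknown" count simply mirrors the formula used in the workflow's R script (total minus the Fnr, ArcA and PdhR hits, plus the Fnr and ArcA overlap).

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the regulon overlap counts; not the workflow's code.
final class RegulonOverlapSketch {

    // Genes present in both collections, in the order they appear in 'a'.
    static Set<String> intersect(Collection<String> a, Collection<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Same arithmetic as the R script: only the Fnr/ArcA overlap is added back.
    static int unknownCount(int total, int fnr, int arcA, int pdhR, int fnrAndArcA) {
        return total - fnr - arcA - pdhR + fnrAndArcA;
    }

    // Share of the gene list, as a percentage.
    static double percent(int part, int total) {
        return (part * 100.0) / total;
    }

    public static void main(String[] args) {
        // Illustrative identifiers only; real lists come from the regulon file.
        List<String> geneList = List.of("g1", "g2", "g3", "g4", "g5", "g6");
        Set<String> fnr = intersect(geneList, List.of("g1", "g2", "g9"));
        Set<String> arcA = intersect(geneList, List.of("g2", "g3"));
        Set<String> pdhR = intersect(geneList, List.of("g4"));
        Set<String> both = intersect(fnr, arcA);

        int unknown = unknownCount(geneList.size(), fnr.size(), arcA.size(),
                                   pdhR.size(), both.size());
        System.out.println("Fnr %: " + percent(fnr.size(), geneList.size()));
        System.out.println("Other genes: " + unknown);
    }
}
```

Because a gene counted under both Fnr and ArcA also appears in the combined slice, the slices are not mutually exclusive; the sketch keeps that behaviour rather than correcting it.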
NOTE - You will also need to install R and Rserve on your machine and install the libraries required by the R script into your R library directory.
The example inputs for this workflow are as follows:
geneNumber = the number of differentially expressed genes to be returned that pass the given p-value cut-off, e.g. 20
path = the direct path to the raw data file location, e.g. C:/Microarray_Data/FILES/ - note the forward slashes
p-value = the p-value cut-off value for the array data, e.g. 0.05
foldChange = the fold change value for the microarray data, e.g. 1 (means greater than 1 or less than -1)
regulonDir = the direct path to the directory containing the regulon file (regulon_2.txt).
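How geneNumber, p-value and foldChange interact can be sketched as follows. This is a hedged illustration of the selection rule as it appears in the workflow's R script (take the top geneNumber genes, then keep those with a p-value below the cut-off and a log fold change beyond plus or minus foldChange). The Gene class, the method name and the sample values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the gene selection rule; names and data are illustrative.
final class GeneFilterSketch {

    static final class Gene {
        final String id;
        final double logFC;
        final double pValue;

        Gene(String id, double logFC, double pValue) {
            this.id = id;
            this.logFC = logFC;
            this.pValue = pValue;
        }
    }

    // Look at the first geneNumber entries of an already ranked list and keep
    // those with pValue < pValueCutoff and |logFC| > foldChange.
    static List<Gene> select(List<Gene> ranked, int geneNumber,
                             double pValueCutoff, double foldChange) {
        List<Gene> kept = new ArrayList<>();
        int limit = Math.min(geneNumber, ranked.size());
        for (Gene g : ranked.subList(0, limit)) {
            if (g.pValue < pValueCutoff && Math.abs(g.logFC) > foldChange) {
                kept.add(g);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Gene> ranked = List.of(
            new Gene("geneA", 2.4, 0.001),
            new Gene("geneB", -1.7, 0.020),
            new Gene("geneC", 0.6, 0.010),   // fold change too small
            new Gene("geneD", 3.1, 0.200));  // p-value too large
        for (Gene g : select(ranked, 20, 0.05, 1.0)) {
            System.out.println(g.id);
        }
    }
}
```

Note that geneNumber is an upper bound: fewer genes are returned when some of the top-ranked genes fail the p-value or fold-change test.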
\n
org.embl.ebi.escience.scuflworkers.java.StringListMerge
org.embl.ebi.escience.scuflworkers.java.StringListMerge
org.embl.ebi.escience.scuflworkers.java.StringListMerge
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
org.embl.ebi.escience.scuflworkers.java.StringListMerge
String[] split = input.split("\n");
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++){
if (!(split[i].equals("")))
{
nonEmpty.add(split[i].trim());
}
}
String[] non_empty = new String[nonEmpty.size()];
for (int i = 0; i < non_empty.length; i ++)
{
non_empty[i] = nonEmpty.elementAt(i);
}
String output = "";
for (int i = 0; i < non_empty.length; i++)
{
output = output + (String) (non_empty[i] + "\n");
}
input
output
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
org.embl.ebi.escience.scuflworkers.java.StringListMerge
org.embl.ebi.escience.scuflworkers.java.StringListMerge
String[] split = input.split("\n");
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++)
{
String mytext = split[i].substring((split[i].indexOf("eco:") +4));
nonEmpty.add(mytext);
}
String output = "";
for (int i = 0; i < nonEmpty.size(); i++)
{
output = output + (String) (nonEmpty.elementAt(i));
}
input
output
org.embl.ebi.escience.scuflworkers.java.StringListMerge
\n
String[] split = abstracts.split("\n");
String pathway_name = pathway;
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++)
{
String trimmed = split[i].trim();
nonEmpty.add(trimmed);
}
String output = ">> " + pathway_name + "\n";
for (int i = 0; i < nonEmpty.size(); i++)
{
output = output + (String) (nonEmpty.elementAt(i) + "\n");
}
pathway
abstracts
output
String[] split = input.split(" ");
Vector nonEmpty = new Vector();
String trimmed = split[0].trim();
String output = trimmed + "\n";
input
output
String[] split = input.split("\n");
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++)
{
String mytext = split[i].substring(split[i].indexOf(" "), split[i].indexOf(" - "));
nonEmpty.add(mytext);
}
String output = "";
for (int i = 0; i < nonEmpty.size(); i++)
{
output = output + (String) (nonEmpty.elementAt(i) + "\n");
}
input
output
String[] split = input.split("\n");
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++){
if (!(split[i].equals("")))
{
nonEmpty.add(split[i].trim());
}
}
String[] non_empty = new String[nonEmpty.size()];
for (int i = 0; i < non_empty.length; i ++)
{
non_empty[i] = nonEmpty.elementAt(i);
}
String output = "";
for (int i = 0; i < non_empty.length; i++)
{
output = output + (String) (non_empty[i] + "\t");
}
input
output
String[] split = input.split("\n");
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++)
{
String trimmed = split[i].trim();
nonEmpty.add(trimmed);
}
String output = "";
String output2 = "";
for (int i = 0; i < nonEmpty.size(); i++)
{
output = output + "eco:" + (String) (nonEmpty.elementAt(i) + "\n");
output2 = output2 + (String) (nonEmpty.elementAt(i) + "\n");
}
input
output
output2
String[] split = input.split("\n");
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++){
if (!(split[i].equals("")))
{
nonEmpty.add(split[i].trim());
}
}
String[] non_empty = new String[nonEmpty.size()];
for (int i = 0; i < non_empty.length; i ++)
{
non_empty[i] = nonEmpty.elementAt(i);
}
String output = "";
for (int i = 0; i < non_empty.length; i++)
{
output = output + (String) (non_empty[i] + "\n");
}
input
output
String[] split = input.split("\n");
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++)
{
String trimmed = split[i].trim();
nonEmpty.add(trimmed);
}
String output = "";
for (int i = 0; i < nonEmpty.size(); i++)
{
output = output + (String) (nonEmpty.elementAt(i) + " AND \"Metabolic Networks and Pathways\"[MeSH Terms]" + " AND \"Escherichia coli\"" + "\n");
}
input
output
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
String pathway_id_input = pathway_ids.trim();
String gene_id_input = gene_ids.trim();
String output = "";
if(!(pathway_id_input.equals("")))
{
output = gene_id_input + "\t" + pathway_id_input;
}
pathway_ids
gene_ids
output
org.embl.ebi.escience.scuflworkers.java.StringListMerge
String[] split = input.split("\n");
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++){
if (!(split[i].equals("")))
{
nonEmpty.add(split[i].trim());
}
}
String[] non_empty = new String[nonEmpty.size()];
for (int i = 0; i < non_empty.length; i ++)
{
non_empty[i] = nonEmpty.elementAt(i);
}
String output = "";
for (int i = 0; i < non_empty.length; i++)
{
output = output + (String) (non_empty[i] + "\n");
}
input
output
String[] split = input.split("\n");
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++){
if (!(split[i].equals("")))
{
nonEmpty.add(split[i].trim());
}
}
String[] non_empty = new String[nonEmpty.size()];
for (int i = 0; i < non_empty.length; i ++)
{
non_empty[i] = nonEmpty.elementAt(i);
}
String output = "";
for (int i = 0; i < non_empty.length; i++)
{
output = output + (String) (non_empty[i] + "\n");
}
input
output
String[] split = input.split("\n");
Vector nonEmpty = new Vector();
for (int i = 0; i < split.length; i++){
if (!(split[i].equals("")))
{
nonEmpty.add(split[i].trim());
}
}
String[] non_empty = new String[nonEmpty.size()];
for (int i = 0; i < non_empty.length; i ++)
{
non_empty[i] = nonEmpty.elementAt(i);
}
String output = "";
for (int i = 0; i < non_empty.length; i++)
{
output = output + (String) (non_empty[i] + "\n");
}
input
output
library(Biobase)
library(vsn)
library(limma)
library(genefilter)
library(arrayMagic)
###### dirPath comes as an input dirPath
#gNum=100
#FCVal=1
#pVal=0.05
arrayType=NULL
#setwd("C:/SUMO biological data/ecoli matt exp 1/Timeseries")
setwd(dirPath)
fileName <- "expDetail.txt"
description <- readpDataSlides(fileName)
files = description[,c("FileNameCy5","FileNameCy3")]
########loading data based on description info file
RG = read.imagene(files = files)
#read.imagene(files = files, path = path, ext = ext, names = names, columns = columns, wt.fun = wt.fun,
########Normalization
RG.norm=backgroundCorrect(RG)
RG.normlg=normalizeWithinArrays(RG.norm,method="loess")
RG.norml=normalizeWithinArrays(RG.norm)
summary(RG$R)
MA = normalizeWithinArrays(RG, method="none")
rawImagePlot = 'rawImagePlot.png'
png(rawImagePlot)
par(mfrow=c(1,2))
imageplot(MA$M[,1], RG$printer, zlim=c(-3,3))
plotMA(MA)
dev.off()
#------------- create print tip plot
printTipPlot = 'printTipPlot.png'
png(printTipPlot)
plotPrintTipLoess(MA)
dev.off()
MA.l<- normalizeBetweenArrays(MA)
# create box plot
boxNormPlot = 'boxNormPlot.png'
png(boxNormPlot)
par(mfrow=c(1,2))
boxplot(MA$M,main="Before lowess",col=rainbow(30))
boxplot(MA.l$M,main="After lowess",col=rainbow(30))
dev.off()
imageplot(log2(RG$Rb[,1]), RG$printer, low="white", high="red")
imageplot(MA$M[,1], RG$printer, zlim=c(-3,3))
write.table(cbind(MA$genes,MA$M,MA$A), file = 'resultNorm.csv', sep = ',',row.names = FALSE)
MA=MA.l
## limma test
group1=description[,c("FileNameCy5","FileNameCy3")]
cl = c(rep(1, dim(group1)[1])) #====number of rows
fit = lmFit(MA,design = cl)
efit <- eBayes(fit)
tops = topTable(efit,coef=1,adjust='fdr',sort.by='B',number = 50000)
ID = substr(tops$Gene.ID,1,10)
P = tops$P.Value
T = tops$t
M = tops$logFC
A= tops$AveExpr
B = tops$B
# result file name should be same as workflow output
result2 = 'testResult.txt'
write.table(cbind(ID,A,M,T,P,B), file = "resultTest.csv",sep = ',',row.names = FALSE)
# filtering
#tops = topTable(efit,coef=1,adjust="fdr",sort.by="B",number=gNum)
tops = topTable(efit,coef=1,adjust="fdr",sort.by="M",number=gNum)
filter = tops[tops$P.Value <pVal & (tops$logFC > FCVal| tops$logFC < (-1*FCVal) ) ,]
write.table(filter, file = "resultFilter.csv",sep = ',',row.names = FALSE)
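# head() avoids the NA rows that filter[1:gNum,] yields when fewer than gNum genes pass the filter
affyID1 = head(filter, gNum)$Gene.ID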
affyID1 = substr(affyID1,1,9)
############## sorting Normalized file based on M value
#Awork=MA$A
#Mwork=MA$M
#GeneID=MA$genes
#annot=hu10ken[select,]
ng=20000
#ng=min(ng,dim(Mwork)[1]) #
#sd.order=order(Mwork,decreasing=T)
#Awork=Awork[sd.order[1:ng]]
#Mwork=Mwork[sd.order[1:ng]]
#annot=annot[sd.order[1:ng]]
#GeneID=GeneID[sd.order[1:ng],]
#ta = cbind(GeneID,Mwork,Awork)
#write.table(ta, file = 'resultNormOrdered3.csv', sep = ',',row.names = FALSE)
########## getting the list of gene : based on gene number
#x = ta[,6]
#x = substr(x,1,8)
#gNum = 10 ###### comes as an input gNum
#geneList = x[1:gNum]
#affyID1 = geneList
dirPath
gNum
pVal
FCVal
boxNormPlot
affyID1
rawImagePlot
printTipPlot
library(tools)
library( splines)
library(survival)
library(preprocessCore)
library(Biobase)
library(affyio)
library(affy)
library(limma)
library(vsn)
library(genefilter)
library(arrayMagic)
library(scatterplot3d)
library(ade4)
library(made4)
########## Calculate size function
PieList = trimWhiteSpace(GeneList)
cal_size <- function(data) {
size = 0
n=length(data)
for (i in 1:n) { if (!data[i]=="") size=size +1 }
return (size)
}
remove_null <- function(data1) {
newData = NULL
n=length(data1)
for (i in 1:n) { if (!data1[i]=="") newData = c(newData,data1[i]) }
return (newData)
}
###### dirPath comes as an input dirPath
#setwd("C:/SUMO biological data/ecoli matt exp 1/Regulon")
setwd(regulonDir)
fileName <- "regulon_2.txt"
description <- readpDataSlides(fileName)
#GeneList= as.character(description[,"GeneIDAll"])
#GeneList= as.character(description[,"GeneID"])
#GeneList <- NULL
#m=length(PieList)
#for (i in 1:m) { GeneList = c(GeneList, PieList[i])}
#GeneList= as.character(description[,"GeneID"])
Size_GeneList= (cal_size(GeneList))
# get list from external file
Fnr_reg = remove_null(as.character(description[,"Fnr"]))
ArcA_reg = remove_null(as.character(description[,"ArcA"]))
PdhR_reg = remove_null(as.character(description[,"PdhR"]))
#calculate intersections
Gene_Fnr_reg_Inter = comparelists(GeneList,Fnr_reg)$intersect
Size_Gene_Fnr_reg = length(Gene_Fnr_reg_Inter)
Gene_ArcA_reg_Inter = comparelists(GeneList,ArcA_reg)$intersect
Size_Gene_ArcA_reg = length(Gene_ArcA_reg_Inter)
Gene_PdhR_reg_Inter = comparelists(GeneList,PdhR_reg)$intersect
Size_Gene_PdhR_reg = length(Gene_PdhR_reg_Inter)
Gene_Fnr_ArcA_reg_Inter = comparelists(Gene_Fnr_reg_Inter,Gene_ArcA_reg_Inter)$intersect
Size_Gene_Fnr_ArcA_reg = length(Gene_Fnr_ArcA_reg_Inter)
Size_Unknown = (Size_GeneList) - (Size_Gene_Fnr_reg) - (Size_Gene_ArcA_reg)- (Size_Gene_PdhR_reg) + (Size_Gene_Fnr_ArcA_reg)
################### Pie chart
# calculate %
Size_Gene_Fnr_reg_per = (Size_Gene_Fnr_reg *100)/Size_GeneList
#Size_Gene_Fnr_reg_per = (Size_GeneList*100)/Size_Gene_Fnr_reg
Size_Gene_ArcA_reg_per = (Size_Gene_ArcA_reg *100)/Size_GeneList
#Size_Gene_ArcA_reg_per = (Size_Gene_ArcA_reg *100)/Size_Gene_ArcA_reg
Size_Gene_PdhR_reg_per = (Size_Gene_PdhR_reg *100)/Size_GeneList
#Size_Gene_PdhR_reg_per = (Size_GeneList *100)/Size_Gene_PdhR_reg
Size_Gene_Fnr_ArcA_reg_per = (Size_Gene_Fnr_ArcA_reg*100)/Size_GeneList
#Size_Gene_Fnr_ArcA_reg_per = (Size_GeneList*100)/Size_Gene_Fnr_ArcA_reg
Size_Unknown_per = (Size_Unknown *100)/Size_GeneList
#Size_Unknown_per = (Size_GeneList *100)/Size_Unknown
pieChart = 'pieChart.png'
png(pieChart)
pie.reg <- c(Size_Gene_Fnr_reg_per,Size_Gene_ArcA_reg_per,Size_Gene_Fnr_ArcA_reg_per,Size_Gene_PdhR_reg_per,Size_Unknown_per)
names(pie.reg) <- c("Fnr", "ArcA","Fnr&ArcA", "PdhR", "Other")
pie(pie.reg,col=c("purple","violetred1","green3","cornsilk","cyan"))
title(main="Regulon", cex.main=1.8, font.main=1)
#title(xlab="(Don't try this at home kids)", cex.lab=0.8, font.lab=3)
dev.off()
regulonDir
GeneList
pieChart
PieList
GeneList
Size_Gene_ArcA_reg
Size_GeneList
Size_Gene_Fnr_reg
http://soap.genome.jp/KEGG.wsdl
btit
http://soap.genome.jp/KEGG.wsdl
get_pathways_by_genes
http://soap.genome.jp/KEGG.wsdl
btit
This workflow takes in a list of KEGG pathway descriptions and a stop_list of KEGG pathway descriptions. The two lists are compared, and the pathways in the stop list are removed. The workflow then extracts the pathway process name from each remaining KEGG-formatted description, returning a list of pathways without the KEGG pathway identifier or the species from which it came (e.g. - mus musculus (mouse) ). These are passed to the eSearch function and searched for in PubMed. The abstracts found are returned to the user.
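The stop-list filtering and name extraction described here can be sketched as below. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, the sample descriptions are only meant to resemble KEGG-formatted output, and the extraction rule (text between the first space and the first " - ") follows the Beanshell script in this workflow, with an added guard for descriptions that do not match that shape.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the pathway clean-up step; not the workflow's code.
final class PathwayNameSketch {

    // Drop the leading identifier and the trailing " - species" part.
    static String processName(String description) {
        int start = description.indexOf(' ');
        int end = description.indexOf(" - ");
        if (start < 0 || end <= start) {
            return description.trim(); // unexpected shape: leave as-is
        }
        return description.substring(start, end).trim();
    }

    // Remove descriptions found in the stop list, then extract the names.
    static List<String> clean(List<String> descriptions, Collection<String> stopList) {
        Set<String> stop = new HashSet<>(stopList);
        List<String> names = new ArrayList<>();
        for (String d : descriptions) {
            if (!stop.contains(d)) {
                names.add(processName(d));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Illustrative inputs only.
        List<String> descriptions = List.of(
            "path:eco00010 Glycolysis / Gluconeogenesis - Escherichia coli K-12 MG1655",
            "path:eco01100 Metabolic pathways - Escherichia coli K-12 MG1655");
        List<String> stopList = List.of(
            "path:eco01100 Metabolic pathways - Escherichia coli K-12 MG1655");
        for (String name : clean(descriptions, stopList)) {
            System.out.println(name);
        }
    }
}
```

The stop list is matched on the full description string here; matching on the pathway identifier alone would be a reasonable alternative if the descriptions vary between KEGG releases.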
/*[local-name(.)='eSearchResult']/*[local-name(.)='IdList']/*[local-name(.)='Id']
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
org.embl.ebi.escience.scuflworkers.java.StringListMerge
500
String id = id.trim();
String abstract_text = abstract_text.trim();
String date_text = date_text.trim();
String output = "";
output = id + "\t" + date_text + "\t" + abstract_text;
id
abstract_text
date_text
output
pubmed
org.embl.ebi.escience.scuflworkers.java.StringListMerge
org.embl.ebi.escience.scuflworkers.java.StringListMerge
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/soap/eutils.wsdl
run_eSearch
This workflow takes in a number of search terms (as used in the normal PubMed interface) and returns the matching PubMed IDs as a list.
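Pulling the PubMed IDs out of an eSearch response can be sketched with the JDK's XPath API, using the same local-name() expression that appears in this workflow. The class name, the sample XML, its namespace URI and the ID values are all hypothetical; a real response comes from the run_eSearch service.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of the ID extraction step; not the workflow's code.
final class ESearchIdSketch {

    // Same expression as the workflow's XPath step; local-name() ignores namespaces.
    static final String ID_PATH =
        "/*[local-name(.)='eSearchResult']/*[local-name(.)='IdList']/*[local-name(.)='Id']";

    static List<String> extractIds(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for local-name() to be reliable
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(ID_PATH, doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent().trim());
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read eSearch result", e);
        }
    }

    public static void main(String[] args) {
        // Made-up response; namespace URI and IDs are placeholders.
        String xml = "<eSearchResult xmlns=\"http://example.org/hypothetical\">"
                   + "<Count>2</Count><IdList><Id>111</Id><Id>222</Id></IdList>"
                   + "</eSearchResult>";
        for (String id : extractIds(xml)) {
            System.out.println(id);
        }
    }
}
```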
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
pubmed
/*[local-name(.)='eFetchResult']/*[local-name(.)='PubmedArticleSet']/*[local-name(.)='PubmedArticle']/*[local-name(.)='MedlineCitation']/*[local-name(.)='Article']/*[local-name(.)='Abstract']/*[local-name(.)='AbstractText']
net.sourceforge.taverna.scuflworkers.xml.XPathTextWorker
/*[local-name(.)='eFetchResult']/*[local-name(.)='PubmedArticleSet']/*[local-name(.)='PubmedArticle']/*[local-name(.)='MedlineCitation']/*[local-name(.)='DateCreated']/*[local-name(.)='Year']
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/soap/eutils.wsdl
run_eFetch
Completed
cDNADataAnalysis
pieChart
Scheduled
Running