Mapping OligoNucleotides to an assembly

Created: 2009-02-13 09:05:35 Last updated: 2009-02-13 09:08:20

Version info

The former version of the workflow assumed that BioMart only reports a transcript when the query (the probe, in our case) is entirely contained in an exon of that transcript. However, the BioMart service also returns transcripts when the query overlaps an exon only partially, or not at all, within the stretch of the assembly on which the transcript is defined. As a result, too many oligos were classified as having multiple transcripts or multiple genes.
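
The containment test that this version depends on can be sketched as follows. This is only an illustration in Python, not the workflow's own Beanshell code: the function name and the assumption of inclusive coordinates are hypothetical, and the parameters are modelled on the inputs of the checkIsInExon Beanshell listed under Workflow Components.

```python
def is_in_exon(d_start: int, d_stop: int, exon_start: int, exon_stop: int) -> bool:
    """Return True only if the hit lies entirely inside the exon.

    Coordinates are assumed to be inclusive positions on the same
    chromosome region (an assumption, not stated by the workflow page).
    """
    # A hit or exon may be given with start > stop (reverse strand),
    # so normalise both intervals before comparing.
    hit_low, hit_high = sorted((d_start, d_stop))
    exon_low, exon_high = sorted((exon_start, exon_stop))
    return exon_low <= hit_low and hit_high <= exon_high


if __name__ == "__main__":
    print(is_in_exon(100, 159, 50, 200))   # hit fully inside the exon
    print(is_in_exon(190, 249, 50, 200))   # hit only partially overlapping
```

A partially overlapping hit is rejected here, which is the behaviour the revision comment describes as missing from the former version.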

Workflow description

We used RShell in the design process of a zebrafish microarray (supp. info Figure S1 and Figure S2). A microarray with 15k probes of 60-mer oligonucleotides was designed on gene sequences from Vega (http://vega.sanger.ac.uk/Danio_rerio) and Ensembl (http://www.ensembl.org/Danio_rerio/) that are also known in the Zebrafish Information Network (http://zfin.org); for zebrafish, the Vega set is not a subset of the Ensembl set. [...] of the genome DNA-sequence assemblies and to judge the agreement that exists between the different assembly annotations, we mapped the Vega-designed probes onto the Ensembl assembly.

The workflow first performs an alignment using the BioMoby Blat and Blast services provided by WUR (www.bioinformatics.nl). Next, for each hit, it tries to find the corresponding transcripts and genes using a BioMart web service. The final task is an analysis using RShell, which determines for each oligo the class to which it belongs:

0 no hit
1 single hit, single transcript, single gene
2 multiple hits, single transcript, single gene, intron spanning
3 multiple hits, single transcript, single gene, possible intron spanning *
4 multiple hits, single transcript, single gene, no intron spanning
5 multiple hits, multiple transcripts, single gene, intron spanning
6 multiple hits, multiple transcripts, single gene, possible intron spanning *
7 multiple hits, multiple transcripts, single gene, no intron spanning
8 single hit, does not meet additional criteria **
9 multiple hits, single transcript, do not meet additional criteria **
10 multiple hits, multiple transcripts, do not meet additional criteria **
11 multiple hits, multiple genes
12 no transcript found but hit(s) meet additional criteria **
13 no transcript found and hit(s) do not meet additional criteria **
14 multiple hits, single transcript, single gene plus hit without transcript found and hits meet additional criteria **
* Oligo below e-value cut-off 1e-12, but also intron spanning criteria met.
** Additional criteria: either e-value below 1e-12 or intron spanning.
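
The "additional criteria" footnote can be read as a simple predicate on each hit. The sketch below is an assumption-laden illustration in Python (the workflow itself does this analysis in R via RShell): the function name is hypothetical, "below" is interpreted as a strict comparison, and how the intron-spanning flag is derived is not shown because the page does not describe it.

```python
E_VALUE_CUTOFF = 1e-12  # cut-off quoted in the footnotes above


def meets_additional_criteria(e_value: float, intron_spanning: bool,
                              cutoff: float = E_VALUE_CUTOFF) -> bool:
    """Return True if a hit has an e-value below the cut-off or is intron spanning."""
    return e_value < cutoff or intron_spanning


if __name__ == "__main__":
    # Hypothetical hits as (e-value, intron-spanning flag) pairs.
    hits = [(1e-30, False), (1e-5, True), (1e-5, False)]
    print([meets_additional_criteria(e, spanning) for e, spanning in hits])
```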

To run this workflow, a certificate to access www.bioinformatics.nl needs to be installed (some services use an SSL connection). The link below explains how to install this certificate.

http://www.myexperiment.org/files/148

The myExperiment pack http://www.myexperiment.org/packs/45 contains the workflow, the full input set and a test input set. The full input set is large: a run takes about 6 hours on a 3 GHz Linux PC with 24 GB of RAM. The test input set can be run on almost any computer with Taverna and R installed and takes approximately 10 minutes.

Run

Run this Workflow in the Taverna Workbench...

Option 1:

Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/603/download?version=7

Workflow Components

Inputs (3)

Name	Description
DataBaseName	Database name (for example Danio_rerio_Genome)
Sequences	Sequences in fasta format, for example >ENSDART00000061775 TTGTTTCCTCATCAACACAGCAGATCGAATCATTCGAGTTTACGACGGTCGAGAGATCCT >ENSDART00000100022 TGCTGTTCAGTGGTTATGTTGTTGTTTGAATAAATGTTAAGAGCCAGTGGATGGCACAAA >OTTDART00000006800 ATCTCTTAGCACTCTGCTGACTCACAACTTCTTCAGAAATGACTTTTTGGATATCATGAA >OTTDART00000002447 AAGACTGTACGACAAGACAGTGCAAATGGCACCATAGTAAATTCAACCGCTCACCAGGAA >ENSDART00000047499 CTCTATGACGTATATTGCTATGTGGAGAACATTCATGGGGAGGTTTTTCATGGCTCAACC >ENSDART00000093312 CAGAGGGTTGCAACCTCTTCATCTATCATCTACCACAAGAGTTTGGTGACAATGAGCTTA >OTTDART00000002445 AATGTTGCTGGTATCAGTGACCCCTTTCTGCAGGTGCGCATTCTTAGATTGCTAAGGATT >ENSDART00000093311 CGGGCCTTTCTGGAGAAACGCAAACCTGTGTGGAGCAACACAGACGACTGCATTCACTGA >ENSDART00000085701 AGAATGACAATGACTGTGGAGCTTTTGTTTTGGAGTACTGTAAGTGCCTGGCCTTCATGA >OTTDART00000002443 AAACCATCACGCTTTAATTAGTTTCCCCTGTTAACCATTGTCCCACAAGTCTTATGTGGA >OTTDART00000002442 CTGAAAGGCACTTGAGTTAATCAAATCCGCTTCTATGTAAGTGTTTTGTAAGAGCAGGCT
Restart	Whether the workflow should start from scratch. If false, the workflow will continue from the previous run. This is useful if Taverna has crashed (which sometimes happens when the data set is large).
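
As an illustration of how the Sequences input might be prepared for processing, the sketch below splits FASTA text into records and groups them into chunks of a fixed size. It is written in Python under stated assumptions: the function names are hypothetical and only loosely mirror the Split_sequences and Create_Sequence_Chunks Beanshells, whose actual code is not shown on this page.

```python
from typing import List


def split_fasta(text: str) -> List[str]:
    """Split FASTA text into one string per record (header plus sequence lines)."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # A new header closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def chunk_records(records: List[str], chunk_size: int) -> List[List[str]]:
    """Group records into consecutive chunks of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


if __name__ == "__main__":
    fasta = (
        ">ENSDART00000061775\n"
        "TTGTTTCCTCATCAACACAGCAGATCGAATCATTCGAGTTTACGACGGTCGAGAGATCCT\n"
        ">ENSDART00000100022\n"
        "TGCTGTTCAGTGGTTATGTTGTTGTTTGAATAAATGTTAAGAGCCAGTGGATGGCACAAA\n"
        ">OTTDART00000006800\n"
        "ATCTCTTAGCACTCTGCTGACTCACAACTTCTTCAGAAATGACTTTTTGGATATCATGAA\n"
    )
    for chunk in chunk_records(split_fasta(fasta), 2):
        print(len(chunk), "record(s)")
```

Sending the sequences in chunks keeps each service call small, which presumably is why the workflow has a Chunk_Size constant.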

Processors (20)

Name	Type	Description
BlastHitsFinished_Filename	stringconstant
BioMartReport_Filename	stringconstant
OligosNotFound_Filename	stringconstant
BlastReport_Filename	stringconstant
GeneratePlots	rshell
Read_BlastReport	local
Create_Header_BlastReport	beanshell	Processor to add content to an (existing) file. The content is appended to the end of the file. Inputs: Filename: the name of the file; if the file does not exist, a new file is created. Content: the string to append. NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time).
Create_Semaphore	beanshell
Read_OligosNotFound	local
Chunk_Size	stringconstant
Create_Header_BioMartReport	beanshell	Processor to add content to an (existing) file. The content is appended to the end of the file. Inputs: Filename: the name of the file; if the file does not exist, a new file is created. Content: the string to append. NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time).
Create_Sequence_Chunks	beanshell
Touch_BlastHitsFinished_File	beanshell
Split_Blast_Report	beanshell
Read_BioMartReport	local
Touch_OligosNotFound_File	beanshell	Processor to add content to an (existing) file. The content is appended to the end of the file. Inputs: Filename: the name of the file; if the file does not exist, a new file is created. Content: the string to append. NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time).
Filter_Sequences_For_Blast	beanshell
Split_sequences	beanshell
DoBioMart	workflow
BlatOrBlast	workflow	This workflow combines the blat and blast workflows. It takes as input a database name (for example, Danio_rerio_Genome for zebrafish) and a set of FASTA sequences. It first tries to perform a blat (at www.bioinformatics.nl). When this service returns nothing, a blast is done (also at www.bioinformatics.nl). The resulting reports are combined.
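
The blat-then-blast fallback described for BlatOrBlast might look like the sketch below. It is a Python illustration under assumptions, not the nested workflow itself: the service calls are passed in as plain functions (`run_blat` and `run_blast` are hypothetical stand-ins for the BioMoby services), and the fallback is applied per sequence, which is one reading of the description.

```python
from typing import Callable, List, Sequence

# A search function takes (database name, one FASTA record) and returns hit lines.
SearchFn = Callable[[str, str], List[str]]


def blat_or_blast(database: str, sequences: Sequence[str],
                  run_blat: SearchFn, run_blast: SearchFn) -> List[str]:
    """Try blat for each sequence, fall back to blast when blat returns nothing."""
    combined: List[str] = []
    for sequence in sequences:
        hits = run_blat(database, sequence)
        if not hits:
            # Blat found nothing for this sequence, so try blast instead.
            hits = run_blast(database, sequence)
        combined.extend(hits)
    return combined


if __name__ == "__main__":
    # Stub services, for illustration only.
    def fake_blat(db: str, seq: str) -> List[str]:
        return ["blat hit for " + seq] if seq.startswith("A") else []

    def fake_blast(db: str, seq: str) -> List[str]:
        return ["blast hit for " + seq]

    print(blat_or_blast("Danio_rerio_Genome", ["ACGT", "TTGA"], fake_blat, fake_blast))
```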

Beanshells (28)

Name	Description	Inputs	Outputs
Create_BioMart_Record		geneId transcriptId transcriptStart transcriptEnd	record
EmptyList			list
checkIsInExon		DStart DStop ExonStart ExonStop	isInExon
Append_To_BioMartReport	Processor to add content to an (existing) file. The content is appended to the end of the file. Inputs: Filename: the name of the file; if the file does not exist, a new file is created. Content: the string to append. NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time).	Filename Content NewLine
Split_Blast_Record		blastRecord	oligoId blastIndex chromosomeRegion dstart dstop
Add_OligoID_BlastIndex_Prefix		BioMartRecord OligoID BlastIndex	FullRecord
Append_To_BlastHitsFinished_File		Filename Content NewLine
Correct_Moby_Object		inputXML	outputXML
isRunning		status	isRunning
DownloadURLWithBasicAuth	This Beanshell downloads a file to disk. The standard local Java download widgets do not handle URLs with HTTP(S) Basic Authentication, but this Beanshell can. When a web server uses Basic Authentication, a login and password can be coded as part of the URL using the following syntax: http(s)://login:password@www.some.website/my/great/tool/result.xml. This Beanshell extracts the login and password from the URL and supplies them automatically to the web server. This prevents Taverna from showing popup dialogs requesting the login and password from the user, which would be problematic for large workflows. Please note that the path where the downloaded file will be stored must be an absolute path to a folder ending with a slash (a backslash on Windows or a forward slash on Linux/Unix/Mac OS X). The filename for the result is automatically extracted from the URL.	theURL	blastResults
Download_Report_and_Filter	This Beanshell downloads a file to disk. The standard local Java download widgets do not handle URLs with HTTP(S) Basic Authentication, but this Beanshell can. When a web server uses Basic Authentication, a login and password can be coded as part of the URL using the following syntax: http(s)://login:password@www.some.website/my/great/tool/result.xml. This Beanshell extracts the login and password from the URL and supplies them automatically to the web server. This prevents Taverna from showing popup dialogs requesting the login and password from the user, which would be problematic for large workflows. Please note that the path where the downloaded file will be stored must be an absolute path to a folder ending with a slash (a backslash on Windows or a forward slash on Linux/Unix/Mac OS X). The filename for the result is automatically extracted from the URL.	URL eValue	blatResults
isEmpty		string	isEmpty
Filter_Sequences_Not_Found		Report Sequences	SequencesNotFound
Filter_Sequences_For_Blast		sequences blatResult	sequencesForBlast
Append_To_BlastReport	Processor to add content to an (existing) file. The content is appended to the end of the file. Inputs: Filename: the name of the file; if the file does not exist, a new file is created. Content: the string to append. NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time).	Filename Content NewLine
Append_To_OligosNotFound	Processor to add content to an (existing) file. The content is appended to the end of the file. Inputs: Filename: the name of the file; if the file does not exist, a new file is created. Content: the string to append. NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time).	Filename Content NewLine
Join_Blat_Blast_Results		list1 list2	outputList
EmptyList			list
Add_Indices_To_BlastReport		record	record_with_index
Create_Header_BlastReport	Processor to add content to an (existing) file. The content is appended to the end of the file. Inputs: Filename: the name of the file; if the file does not exist, a new file is created. Content: the string to append. NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time).	Filename Content NewLine Restart
Create_Semaphore
Create_Header_BioMartReport	Processor to add content to an (existing) file. The content is appended to the end of the file. Inputs: Filename: the name of the file; if the file does not exist, a new file is created. Content: the string to append. NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time).	Filename Content NewLine Restart
Create_Sequence_Chunks		sequences chunkSize	chunks
Touch_BlastHitsFinished_File		Filename Content NewLine Restart
Split_Blast_Report		BlastReport filename	Records
Touch_OligosNotFound_File	Processor to add content to an (existing) file. The content is appended to the end of the file. Inputs: Filename: the name of the file; if the file does not exist, a new file is created. Content: the string to append. NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time).	Filename Content NewLine
Filter_Sequences_For_Blast		sequences filename	sequencesToDo
Split_sequences		sequenceText	sequences
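
Two of the Beanshell descriptions in the table above can be sketched in a few lines. The code below is an illustrative Python equivalent, not the Beanshell source: the function names are hypothetical, the append helper follows the Filename/Content/NewLine description, and the URL helper only shows how a login and password embedded in a URL could be separated from it (the actual download step is omitted).

```python
import posixpath
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def append_to_file(filename: str, content: str, new_line: bool = True) -> None:
    """Append content to a file, creating the file if it does not exist."""
    with open(filename, "a", encoding="utf-8") as handle:
        handle.write(content)
        if new_line:
            handle.write("\n")


def split_basic_auth_url(url: str) -> Tuple[str, Optional[str], Optional[str], str]:
    """Return (URL without credentials, login, password, result filename)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    # The filename is taken from the last segment of the URL path.
    filename = posixpath.basename(parts.path)
    return clean_url, parts.username, parts.password, filename


if __name__ == "__main__":
    example = "https://login:password@www.some.website/my/great/tool/result.xml"
    print(split_basic_auth_url(example))
```

The separated login and password could then be sent in an Authorization header, so that no interactive prompt is needed during a long run.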

Outputs (6)

Name	Description
BlastReport	The blast hits
BioMartReport	The biomart genes and transcripts
SequencesNotFound	The sequences not found by blast and blat
BarPlot	A bar plot of the oligos per class
Classes	The classes, same as bar plot
Report	The total R report

Links (33)

Source	Sink
DataBaseName	BlatOrBlast:DataBaseName
BioMartReport_Filename:value	Create_Header_BioMartReport:Filename
BioMartReport_Filename:value	DoBioMart:BioMartReport_Filename
BioMartReport_Filename:value	Read_BioMartReport:fileurl
BlastReport_Filename:value	BlatOrBlast:BlastReport_Filename
BlastReport_Filename:value	Create_Header_BlastReport:Filename
BlastReport_Filename:value	Read_BlastReport:fileurl
Chunk_Size:value	Create_Sequence_Chunks:chunkSize
Create_Sequence_Chunks:chunks	BlatOrBlast:Sequences
OligosNotFound_Filename:value	BlatOrBlast:OligosNotFound_Filename
OligosNotFound_Filename:value	Read_OligosNotFound:fileurl
OligosNotFound_Filename:value	Touch_OligosNotFound_File:Filename
Read_BioMartReport:filecontents	GeneratePlots:biomartfile
Read_BlastReport:filecontents	GeneratePlots:blastresultfile
Read_BlastReport:filecontents	Split_Blast_Report:BlastReport
Read_OligosNotFound:filecontents	GeneratePlots:oligosNotFound
Restart	Create_Header_BioMartReport:Restart
Restart	Create_Header_BlastReport:Restart
Sequences	Filter_Sequences_For_Blast:sequences
BlastHitsFinished_Filename:value	Touch_BlastHitsFinished_File:Filename
BlastReport_Filename:value	Filter_Sequences_For_Blast:filename
Filter_Sequences_For_Blast:sequencesToDo	Split_sequences:sequenceText
Restart	Touch_BlastHitsFinished_File:Restart
BlastHitsFinished_Filename:value	DoBioMart:BlastHitsFinished_FileName
BlastHitsFinished_Filename:value	Split_Blast_Report:filename
GeneratePlots:BarPlot	BarPlot
GeneratePlots:Classes	Classes
GeneratePlots:Report	Report
Read_BioMartReport:filecontents	BioMartReport
Read_BlastReport:filecontents	BlastReport
Read_OligosNotFound:filecontents	SequencesNotFound
Split_Blast_Report:Records	DoBioMart:blastRecord
Split_sequences:sequences	Create_Sequence_Chunks:sequences

Coordinations (11)

Controller	Target
Create_Header_BlastReport	BlatOrBlast
BlatOrBlast	Read_BlastReport
BlatOrBlast	Read_OligosNotFound
DoBioMart	Read_BioMartReport
Create_Header_BioMartReport	DoBioMart
Create_Header_BlastReport	Filter_Sequences_For_Blast
Touch_OligosNotFound_File	BlatOrBlast
Create_Semaphore	Create_Header_BlastReport
Create_Semaphore	Touch_OligosNotFound_File
Create_Semaphore	Create_Header_BioMartReport
Create_Semaphore	Touch_BlastHitsFinished_File

Workflow Type

Taverna 1

Uploader

Wassinki

License

All versions of this Workflow are licensed under:

Version 7 (latest) (of 7)

Credits (2)

(People/Groups)

Attributions (6)

(Workflows/Files)

Tags (7)

Uploader tags

biomart | BLAST | blat | ensembl | microarray | r | rshell

Shared with Groups (0)

None

Featured In Packs (2)

Attributed By (0)

(Workflows/Files)

None

Favourited By (1)

Katy Wolstencroft

Statistics

11024 viewings

4067 downloads

Citations (0)

None

Version History

In chronological order:

Mapping oligonucleotides to an assembly

Created by Wassinki on Thursday 11 December 2008 12:11:59 (UTC)

Last edited by Wassinki on Thursday 11 December 2008 12:25:55 (UTC)

Mapping OligoNucleotides to an assembly

Created by Wassinki on Thursday 11 December 2008 12:26:50 (UTC)

Last edited by Wassinki on Thursday 11 December 2008 12:32:04 (UTC)

Mapping OligoNucleotides to an assembly

Created by Wassinki on Friday 19 December 2008 09:41:13 (UTC)

Revision comment:

----------------------------------------

The newest version only takes into account the probes that have blast hits that map on exons. The BioMart sub workflow has been modified to do this by adding an extra BioMart processor and a beanshell processor to filter those blast hits that map on exons.

----------------------------------------

This workflow maps the input oligo set to an assembly.

It first performs an alignment using the BioMoby Blat and Blast services provided by WUR (www.bioinformatics.nl). Next, for each hit, it tries to find the corresponding transcripts and genes using a BioMart web service. The final task is an analysis using RShell, which determines for each oligo the class to which it belongs:

1 single hit
2-4 multiple hits single transcript
5-7 multiple hits multiple transcripts
8 single hit, discarded
9 multiple hits single transcript, discarded
10 multiple transcripts, discarded*
11 multi gene, discarded
12 no transcript
13 no transcript, discarded
* classified on the criteria intron spanning only, possible intron spanning and no intron spanning.
* hit(s) do not meet high stringency threshold
* no transcript found but hit(s) meet high stringency threshold.

To run this workflow, a certificate to access www.bioinformatics.nl needs to be installed (some services use an SSL connection). The link below explains how to install this certificate.

http://www.myexperiment.org/files/148

The myExperiment pack http://www.myexperiment.org/packs/45 contains the workflow, the full input set and a test input set. The full input set is large: a run takes about 6 hours on a 3 GHz Linux PC with 24 GB of RAM. The test input set can be run on almost any computer with Taverna and R installed and takes approximately 10 minutes.

Mapping OligoNucleotides to an assembly

Created by Wassinki on Tuesday 03 February 2009 15:27:18 (UTC)

Last edited by Wassinki on Tuesday 03 February 2009 15:33:01 (UTC)

Revision comment:

The former version of the workflow assumed that BioMart only reports a transcript when the query (the probe, in our case) is entirely contained in an exon of that transcript. However, the BioMart service also returns transcripts when the query overlaps an exon only partially, or not at all, within the stretch of the assembly on which the transcript is defined. As a result, too many oligos were classified as having multiple transcripts or multiple genes.

Mapping OligoNucleotides to an assembly

Created by Wassinki on Wednesday 04 February 2009 08:25:36 (UTC)

Last edited by Wassinki on Wednesday 04 February 2009 08:26:04 (UTC)

Analysing workflows

Created by Wassinki on Friday 13 February 2009 09:03:37 (UTC)

Mapping OligoNucleotides to an assembly

Created by Wassinki on Friday 13 February 2009 09:05:35 (UTC)

Last edited by Wassinki on Friday 13 February 2009 09:08:23 (UTC)
