Mapping OligoNucleotides to an assembly

Created: 2009-02-13 09:05:35      Last updated: 2009-02-13 09:08:20

Version info

The former version of the workflow expected results from BioMart to report a transcript only when the query (the probe, in our case) is entirely encapsulated in an exon of that transcript. However, the BioMart service also returns transcripts when the query does not overlap, or only partially overlaps, with an exon in the stretch of the assembly on which the transcript is defined. This resulted in too many oligos being classified as having multiple transcripts or multiple genes.
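
The containment requirement described above can be illustrated with a small sketch. This is not the workflow's own code (the workflow runs as Taverna processors, Beanshell scripts and RShell); the function names, the inclusive (start, end) coordinate convention and the dictionary layout are assumptions made for illustration only. It keeps a transcript only when the hit lies entirely inside one of that transcript's exons.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed convention: (start, end) with both ends inclusive, all coordinates
# on the same chromosome and strand. This layout is hypothetical.
Interval = Tuple[int, int]


def hit_within_exon(hit: Interval, exons: Iterable[Interval]) -> bool:
    """Return True if the hit is entirely encapsulated in at least one exon."""
    hit_start, hit_end = hit
    return any(ex_start <= hit_start and hit_end <= ex_end
               for ex_start, ex_end in exons)


def filter_transcripts(hit: Interval,
                       transcripts: Dict[str, List[Interval]]) -> List[str]:
    """Keep only transcript IDs for which the hit falls inside a single exon.

    Transcripts that merely span the hit region (hit in an intron, or hit
    overlapping an exon boundary) are dropped.
    """
    return [tid for tid, exons in transcripts.items()
            if hit_within_exon(hit, exons)]


# Hypothetical transcripts with made-up exon coordinates.
example_transcripts = {
    "T1": [(100, 200), (300, 400)],
    "T2": [(150, 180)],
}
kept = filter_transcripts((120, 179), example_transcripts)
```

In this example only "T1" is kept: the hit starts before the single exon of "T2", so "T2" overlaps the hit only partially and is discarded.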

Workflow description

We used RShell in the design process of a Zebrafish microarray
(supp. info Figure S1 and Figure S2). A microarray with 15k probes
of 60-mer oligonucleotides was designed on gene sequences from
Vega and Ensembl that are also known in the Zebrafish Information
Network (for zebrafish, the Vega set is not a subset of the Ensembl
set). [...] of the genome DNA-sequence assemblies and to judge the
agreement that exists between the different assembly annotations, we
mapped the Vega-designed probes onto the Ensembl assembly.

The workflow first performs an alignment using the BioMoby Blat and Blast service provided by WUR. Next, for each hit, it tries to find the corresponding transcripts and genes using a BioMart web service. The final task is an analysis task using RShell, which determines for each oligo the class to which it belongs:

0 no hit
1 single hit, single transcript, single gene
2 multiple hits, single transcript, single gene, intron spanning
3 multiple hits, single transcript, single gene, possible intron spanning *
4 multiple hits, single transcript, single gene, no intron spanning
5 multiple hits, multiple transcripts, single gene, intron spanning
6 multiple hits, multiple transcripts, single gene, possible intron spanning *
7 multiple hits, multiple transcripts, single gene, no intron spanning
8 single hit, does not meet additional criteria **
9 multiple hits, single transcript, do not meet additional criteria **
10 multiple hits, multiple transcripts, do not meet additional criteria **
11 multiple hits, multiple genes
12 no transcript found but hit(s) meet additional criteria **
13 no transcript found and hit(s) do not meet additional criteria **
14 multiple hits, single transcript, single gene plus hit without transcript found and hits meet additional criteria **
* Oligo below e-value cut-off 1e-12, but also intron spanning criteria met.
** Additional criteria: either e-value below 1e-12 or intron spanning.
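
The class table above can be read as a decision procedure. The sketch below is one possible reading and not the RShell script used in the workflow. It assumes per-oligo summary inputs (counts of hits, transcripts and genes, a best e-value and an intron-spanning flag), all of which are hypothetical names. It also assumes that "intron spanning" means the intron criterion is met without the e-value criterion, that "possible intron spanning" means both are met, and that "no intron spanning" means only the e-value criterion is met. Combinations the table does not list, and class 14 (which needs per-hit information), return None.

```python
from typing import Optional

E_VALUE_CUTOFF = 1e-12  # cut-off quoted in the footnotes of the class table


def classify_oligo(n_hits: int, n_transcripts: int, n_genes: int,
                   best_evalue: Optional[float],
                   intron_spanning: bool) -> Optional[int]:
    """Assign a class number under the assumptions stated in the lead-in."""
    if n_hits == 0:
        return 0

    below_cutoff = best_evalue is not None and best_evalue < E_VALUE_CUTOFF
    # Additional criteria (**): e-value below the cut-off OR intron spanning.
    meets_criteria = below_cutoff or intron_spanning

    if n_transcripts == 0:
        return 12 if meets_criteria else 13

    if n_hits == 1:
        if not meets_criteria:
            return 8
        if n_transcripts == 1 and n_genes == 1:
            return 1
        return None  # combination not listed in the table

    # Multiple hits from here on. Checking genes first is an assumption.
    if n_genes > 1:
        return 11

    single_transcript = n_transcripts == 1
    if not meets_criteria:
        return 9 if single_transcript else 10
    if intron_spanning and below_cutoff:
        return 3 if single_transcript else 6  # possible intron spanning (*)
    if intron_spanning:
        return 2 if single_transcript else 5  # intron spanning
    return 4 if single_transcript else 7      # no intron spanning
```

For example, under these assumptions `classify_oligo(3, 1, 1, 1e-3, True)` yields class 2 and `classify_oligo(2, 0, 0, 1.0, False)` yields class 13.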

To run this workflow, an access certificate needs to be installed (some services use an SSL connection). See the link below for how to install this certificate.

The myExperiment pack contains the workflow, the full input set and a test input set. The full input set is large: a run takes about 6 hours on a 3 GHz Linux PC with 24 GB of RAM. The test input set can be run on almost any computer with Taverna and R installed and takes approximately 10 minutes.

Workflow Components

Inputs (3)
Processors (20)
Beanshells (28)
Outputs (6)
Links (33)
Coordinations (11)

Workflow Type

Taverna 1

Version 7 (latest) (of 7)
