jMOTU & Taxonerator
jMOTU: Current version 1.0.7 released on 10th March 2011
Taxonerator: Current version 0.9 released on 10th June 2010
The easiest way to run jMOTU, Taxonerator, and WebTax (a web app version of Taxonerator) is to download the virtual machine. The virtual machine is in OVA format and contains all the dependencies needed to run jMOTU and Taxonerator.
jMOTU
jMOTU is a software package for clustering barcode DNA sequence data into molecular operational taxonomic units (MOTU). If you are not sure what a MOTU is, please see the DNA Barcoding pages on our website.
jMOTU does the following:
- reads input sequences in FASTA or NEXUS format
- calculates distances between pairs of sequences using a combination of BLAST and the Needleman-Wunsch exact global alignment algorithm
- clusters input sequences into MOTU using various cutoff measures
jMOTU can process large datasets and uses a series of filtering steps to minimise the number of exact global alignments that must be performed. When calculating pairwise distances, jMOTU ignores gaps and counts only nucleotide mismatches, making it relatively robust to sequencing errors caused by homopolymer runs. Clustering is carried out using a greedy algorithm that is not dependent on input sequence order. jMOTU has been tested on raw datasets of around 50,000 input sequences, and can cluster larger datasets by preclustering subsets of the data before combining them for a global analysis.
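The gap-ignoring distance and cutoff-based clustering described above can be illustrated with a minimal Python sketch. This is not jMOTU's own code: the function names `mismatch_distance` and `cluster_by_cutoff` are hypothetical, the sequences are assumed to be already aligned to a common length, and simple single-linkage clustering is swapped in as a stand-in for jMOTU's greedy algorithm because it is also independent of input order.

```python
from __future__ import annotations

from itertools import combinations

GAP_CHARS = {"-", "."}  # assumption: these characters mark alignment gaps


def mismatch_distance(a: str, b: str) -> int:
    """Count nucleotide mismatches between two aligned sequences.

    Columns where either sequence has a gap are skipped, so indels
    (for example from homopolymer errors) do not add to the distance.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x not in GAP_CHARS and y not in GAP_CHARS and x != y
    )


def cluster_by_cutoff(aligned: dict[str, str], cutoff: int) -> list[list[str]]:
    """Single-linkage clustering: join any pair whose distance is <= cutoff.

    Uses a small union-find structure; the result is sorted so that it
    does not depend on the order of the input sequences.
    """
    parent = {name: name for name in aligned}

    def find(n: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(aligned, 2):
        if mismatch_distance(aligned[a], aligned[b]) <= cutoff:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in aligned:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Made-up example sequences, for illustration only.
example = {
    "s1": "ACGT-ACGT",
    "s2": "ACGTAACGT",
    "s3": "ACGTAACGA",
    "s4": "TTTTATTTT",
}
clusters = cluster_by_cutoff(example, cutoff=1)
```

In this example s1 and s2 differ only by a gap, so their distance is 0, and s3 is one mismatch away from both; with a cutoff of 1 the three fall into one cluster while s4 stays on its own. The sketch compares every pair directly, whereas jMOTU uses its filtering steps to avoid most of those comparisons.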
Taxonerator
Taxonerator is a software package for carrying out simple similarity-based annotation of clustered barcode sequence data produced by jMOTU. It generates taxonomic annotation for MOTU by BLASTing representative sequences against a preformatted database of sequences from known organisms (available from this website). Taxonerator is useful for initial taxonomic investigation of environmental barcode datasets. For each database sequence showing significant similarity to a MOTU, taxonomic information is stored at all taxonomic levels. This allows the end-user to investigate the taxonomic distribution of the sample sequences at multiple levels. Because data are stored in a relational database, they can be queried using structured query language (SQL) in a very flexible manner, or written to output files for further analysis.
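To show why storing annotation at every taxonomic level is useful, here is a minimal Python sketch that tallies MOTU per taxon at a chosen rank. It does not use Taxonerator's database or output format: the `annotations` dictionary, the MOTU identifiers, and the function `count_by_rank` are all hypothetical and made up for illustration.

```python
from collections import Counter

# Hypothetical annotation: one lineage per MOTU, keyed by rank.
# The MOTU names and assignments are invented for this example.
annotations = {
    "MOTU_1": {"phylum": "Nematoda", "class": "Chromadorea", "order": "Rhabditida"},
    "MOTU_2": {"phylum": "Nematoda", "class": "Chromadorea", "order": "Rhabditida"},
    "MOTU_3": {"phylum": "Tardigrada", "class": "Eutardigrada", "order": "Parachela"},
    "MOTU_4": {"phylum": "Nematoda", "class": "Enoplea", "order": "Dorylaimida"},
}


def count_by_rank(lineages, rank):
    """Count how many MOTU fall into each taxon at the given rank.

    MOTU with no name recorded at that rank are left out of the tally.
    """
    return Counter(
        lineage[rank] for lineage in lineages.values() if rank in lineage
    )


phylum_counts = count_by_rank(annotations, "phylum")
order_counts = count_by_rank(annotations, "order")
```

The same annotation can be summarised coarsely (three nematode MOTU and one tardigrade MOTU at phylum level) or more finely (two Rhabditida, one Parachela, one Dorylaimida at order level) without re-running any searches.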
Data Files
Taxonerator uses reference data files to assign taxonomic names to MOTU clusters. You can build these yourself, but we offer links to precompiled BLAST databases of the SILVA eukaryotic small subunit ribosomal RNA database (May 2010) and a cytochrome oxidase 1 database generated from GenBank/EMBL/DDBJ (May 2010). We also provide formatssu, a command-line Java program for formatting the SILVA (and other) databases for Taxonerator.
Taxonerator also uses the NCBI taxonomy database. We have archived a copy here (May 2010), but you should download a fresh copy from NCBI as described in the User Guide.
SQL
Taxonerator and jMOTU produce data that can be stored in an SQL (PostgreSQL) database. Here are some commands for extracting summary data from these databases.