Data

Please note that a newer genome assembly (braLan3) is available; see Brasó-Vives et al., Genome Biol 2022.

Datasets

The genome of the European amphioxus Branchiostoma lanceolatum was assembled from Illumina reads at 150x coverage, sequenced by Genoscope. In the Bl71nemr assembly, haplotypes were reconciled using HaploMerger.

This assembly spans 495.3 Mb (N50: 1.29 Mb) split into 10,247 scaffolds, with 4% residual gaps. Masked regions are represented with lowercase characters (soft-masking); gaps in the assembly are represented with Ns.
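The conventions described above (lowercase for soft-masked bases, N for gaps) make it straightforward to recompute basic statistics from the FASTA file. The following is a minimal sketch in plain Python, not the method used to produce the figures quoted above; the function names are illustrative, and it assumes that both "N" and "n" count as gap characters.

```python
def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths):
    """Length of the scaffold at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(handle):
    """Summarise scaffold count, total length, N50, gap and soft-masked bases."""
    lengths, gap_bases, masked_bases = [], 0, 0
    for _, seq in read_fasta(handle):
        lengths.append(len(seq))
        # Assumption: both "N" and "n" are treated as gap characters.
        gap_bases += seq.count("N") + seq.count("n")
        # Soft-masked bases are lowercase characters other than gaps.
        masked_bases += sum(1 for base in seq if base.islower() and base != "n")
    return {
        "scaffolds": len(lengths),
        "total_length": sum(lengths),
        "n50": n50(lengths),
        "gap_bases": gap_bases,
        "masked_bases": masked_bases,
    }
```

With the assembly downloaded locally, this could be called as `assembly_stats(open("Bl71nemr.fa"))`; dividing `gap_bases` by `total_length` gives the gap fraction.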

The genome was annotated using both the EVM/PASA pipeline and Cufflinks/TransDecoder. Each annotation has its strengths: Cufflinks provides a better representation of transcript diversity, transcript boundaries, and TSS positions, whereas EVM generates more robust predicted proteins, especially with regard to frameshifts and similar issues. A final annotation with unified IDs (BL00000) was subsequently built by merging the two annotations, with a suffix indicating the origin of each model (BL00000_evm0 or BL00000_cuff0). All three annotations are available as GTFs.
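The suffix convention makes it possible to recover the origin of each model from its identifier. Below is a minimal sketch that splits such identifiers; the regular expression is inferred from the two examples given above (BL00000_evm0 and BL00000_cuff0), and the interpretation of the trailing number as a per-locus index is an assumption, not something the source states.

```python
import re
from collections import Counter

# Pattern inferred from the examples BL00000_evm0 / BL00000_cuff0.
# Assumption: the trailing digits are an index of the model within its locus.
ID_PATTERN = re.compile(r"^(?P<locus>BL\d+)_(?P<origin>evm|cuff)(?P<index>\d+)$")


def split_id(identifier):
    """Split a merged-annotation ID into (locus, origin, index)."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    return match["locus"], match["origin"], int(match["index"])


def count_origins(identifiers):
    """Count how many identifiers come from each annotation source."""
    return Counter(split_id(identifier)[1] for identifier in identifiers)
```

For example, `split_id("BL00000_cuff0")` separates the unified locus ID from the `cuff` origin tag, and `count_origins` can tally how many models in a list derive from each pipeline.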

File                        Dataset                                         Format
Bl71nemr.fa                 Genome assembly                                 FASTA
Bla_annot_final.gtf         Final merged annotation (EVM/PASA+Cufflinks)    GTF
Bla_annot_evm.gtf           EVM/PASA annotation                             GTF
Bla_annot_cuff.gtf          Cufflinks annotation                            GTF
Bla_annot_final_v4.fa       All full transcripts (coding and non-coding)    FASTA
Bla_annot_v4_best_Aac.fa    Reference proteins*                             FASTA
Bla_annot_v4_best_Cds.fa    Reference CDS*                                  FASTA
Bla_annot_v4_best_Tra.fa    Reference transcripts*                          FASTA

(*) Most highly expressed transcript for each coding locus (35,961 loci).
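The reference sets keep one transcript per coding locus, chosen by expression. The selection rule can be sketched as follows; this is an illustration of the principle only, not the pipeline actually used. The record layout (locus, transcript ID, expression value) and the tie-breaking rule (first transcript seen wins) are assumptions.

```python
def pick_reference_transcripts(records):
    """Return {locus: transcript_id} keeping the most highly expressed
    transcript per locus.

    `records` is an iterable of (locus, transcript_id, expression) tuples.
    Assumption: on ties, the first transcript encountered is kept.
    """
    best = {}
    for locus, transcript_id, expression in records:
        current = best.get(locus)
        if current is None or expression > current[1]:
            best[locus] = (transcript_id, expression)
    return {locus: transcript_id for locus, (transcript_id, _) in best.items()}
```

The expression values would have to come from a separate quantification step; their source and units are not specified here.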

Blast server

A BLAST server made it possible to search the assembly, the reference transcripts and proteins. (The BLAST server is no longer active.)

Genome browser

The European amphioxus genome assembly and annotation are available as a UCSC track hub. The assembly should be searchable using BLAT. To enable access, you can either: