EMBOSS: cai

cai

Function

CAI codon adaptation index

Description

cai calculates the Codon Adaptation Index. This is a simple, effective measure of synonymous codon usage bias.

The CAI uses a reference set of highly expressed genes from a species to assess the relative adaptiveness of each codon, that is, how often the codon is used compared with the most frequently used synonymous codon. The score for a gene is then calculated from the relative adaptiveness values of all the codons in that gene. The index measures the extent to which selection has been effective in moulding the pattern of codon usage. It is therefore useful for predicting the expression level of a gene, for assessing the adaptation of viral genes to their hosts, and for comparing codon usage in different organisms. The index may also give an approximate indication of the likely success of heterologous gene expression.
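The calculation can be sketched in Python. This is a minimal illustration of the Sharp and Li definition, not the cai source code: each codon's relative adaptiveness w is its reference count divided by the count of the most-used synonymous codon, and the CAI of a gene is the geometric mean of w over its scored codons. The function names, the plain dict of reference counts, the floor count of 0.5 for codons absent from the reference set, and the choice to skip Met, Trp and stop codons are assumptions made for this sketch; cai may handle these cases differently.

```python
import math
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, ..., GGG
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

# Assumption: single-codon amino acids (Met, Trp) and stops are not scored
SKIP = {"M", "W", "*"}


def relative_adaptiveness(ref_counts, floor=0.5):
    """w(codon) = count(codon) / count(most-used synonymous codon).

    ref_counts is a hypothetical {codon: count} dict from highly expressed
    genes; zero or missing counts are raised to `floor` (assumption).
    """
    by_aa = {}
    for codon, aa in CODE.items():
        if aa not in SKIP:
            by_aa.setdefault(aa, []).append(codon)
    w = {}
    for codons in by_aa.values():
        counts = {c: max(ref_counts.get(c, 0), floor) for c in codons}
        best = max(counts.values())
        for c in codons:
            w[c] = counts[c] / best
    return w


def cai(seq, w):
    """Geometric mean of w over the scored codons of seq, read in frame 1."""
    seq = seq.upper().replace("U", "T")
    logs = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in w:  # skips Met, Trp, stops and codons with ambiguity codes
            logs.append(math.log(w[codon]))
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference counts, invented for illustration only
ref = {"GCT": 90, "GCC": 10, "GCA": 0, "GCG": 0, "AAA": 20, "AAG": 80}
w = relative_adaptiveness(ref)
print(round(cai("GCTAAG", w), 3))  # 1.0 (both codons are the preferred ones)
print(round(cai("GCCAAA", w), 3))  # 0.167 = sqrt(10/90 * 20/80)
```

Working in log space avoids underflow when multiplying hundreds of w values for a long gene.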

Usage

Here is a sample session with cai:

% cai TEMBL:AB009602
CAI codon adaptation index
Codon usage file [Eyeast_cai.cut]:
Output file [ab009602.cai]:


Command line arguments

   Standard (Mandatory) qualifiers:
  [-seqall]            seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
   -cfile              codon      [Eyeast_cai.cut] Codon usage table name
  [-outfile]           outfile    [*.cai] Output file name

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-seqall" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-cfile" associated qualifiers
   -format             string     Data format

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Standard (Mandatory) qualifiers	Allowed values	Default
[-seqall] (Parameter 1): Nucleotide sequence(s) filename and optional format, or reference (input USA)	Readable sequence(s)	Required
-cfile: Codon usage table name	Codon usage file in EMBOSS data path	Eyeast_cai.cut
[-outfile] (Parameter 2): Output file name	Output file	<*>.cai

Additional (Optional) qualifiers	Allowed values	Default
(none)

Advanced (Unprompted) qualifiers	Allowed values	Default
(none)

Input file format

cai reads one or more nucleic acid sequences (-seqall), each normally the coding sequence of a gene.

Input files for usage example

Database entry: TEMBL:AB009602

ID   AB009602; SV 1; linear; mRNA; STD; FUN; 561 BP.
XX
AC   AB009602;
XX
DT   15-DEC-1997 (Rel. 53, Created)
DT   14-APR-2005 (Rel. 83, Last updated, Version 2)
XX
DE   Schizosaccharomyces pombe mRNA for MET1 homolog, partial cds.
XX
KW   MET1 homolog.
XX
OS   Schizosaccharomyces pombe (fission yeast)
OC   Eukaryota; Fungi; Ascomycota; Schizosaccharomycetes;
OC   Schizosaccharomycetales; Schizosaccharomycetaceae; Schizosaccharomyces.
XX
RN   [1]
RP   1-561
RA   Kawamukai M.;
RT   ;
RL   Submitted (07-DEC-1997) to the EMBL/GenBank/DDBJ databases.
RL   Makoto Kawamukai, Shimane University, Life and Environmental Science; 1060
RL   Nishikawatsu, Matsue, Shimane 690, Japan
RL   (E-mail:kawamuka@life.shimane-u.ac.jp, Tel:0852-32-6587, Fax:0852-32-6499)
XX
RN   [2]
RP   1-561
RA   Kawamukai M.;
RT   "S.pmbe MET1 homolog";
RL   Unpublished.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..561
FT                   /organism="Schizosaccharomyces pombe"
FT                   /mol_type="mRNA"
FT                   /clone_lib="pGAD GH"
FT                   /db_xref="taxon:4896"
FT   CDS             <1..275
FT                   /codon_start=3
FT                   /transl_table=1
FT                   /product="MET1 homolog"
FT                   /db_xref="GENEDB:SPCC1739.06c"
FT                   /db_xref="GOA:O74468"
FT                   /db_xref="UniProtKB/Swiss-Prot:O74468"
FT                   /protein_id="BAA23999.1"
FT                   /translation="SMPKIPSFVPTQTTVFLMALHRLEILVQALIESGWPRVLPVCIAE
FT                   RVSCPDQRFIFSTLEDVVEEYNKYESLPPGLLITGYSCNTLRNTA"
XX
SQ   Sequence 561 BP; 135 A; 106 C; 98 G; 222 T; 0 other;
     gttcgatgcc taaaatacct tcttttgtcc ctacacagac cacagttttc ctaatggctt        60
     tacaccgact agaaattctt gtgcaagcac taattgaaag cggttggcct agagtgttac       120
     cggtttgtat agctgagcgc gtctcttgcc ctgatcaaag gttcattttc tctactttgg       180
     aagacgttgt ggaagaatac aacaagtacg agtctctccc ccctggtttg ctgattactg       240
     gatacagttg taataccctt cgcaacaccg cgtaactatc tatatgaatt attttccctt       300
     tattatatgt agtaggttcg tctttaatct tcctttagca agtcttttac tgttttcgac       360
     ctcaatgttc atgttcttag gttgttttgg ataatatgcg gtcagtttaa tcttcgttgt       420
     ttcttcttaa aatatttatt catggtttaa tttttggttt gtacttgttc aggggccagt       480
     tcattattta ctctgtttgt atacagcagt tcttttattt ttagtatgat tttaatttaa       540
     aacaattcta atggtcaaaa a                                                 561
//

Output file format

cai writes the Codon Adaptation Index to the output file.

Output files for usage example

File: ab009602.cai

Sequence: AB009602 CAI: 0.188
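For the embedded-script use mentioned under Target users, the result can be read back with a small parser. This sketch assumes only the line layout shown in the example output above, with one "Sequence: NAME CAI: VALUE" line per input sequence; the function name parse_cai_output is hypothetical.

```python
import re

# Matches lines such as "Sequence: AB009602 CAI: 0.188"
LINE = re.compile(r"^Sequence:\s+(\S+)\s+CAI:\s+([0-9.]+)\s*$")


def parse_cai_output(text):
    """Return a list of (sequence name, CAI) pairs from cai output text."""
    results = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:  # ignore blank or unrecognised lines
            results.append((m.group(1), float(m.group(2))))
    return results


print(parse_cai_output("Sequence: AB009602 CAI: 0.188"))
# [('AB009602', 0.188)]
```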

Data files

cai reads a reference codon usage table prepared from a set of genes which are known to be highly expressed.

The default codon usage table, 'Eyeast_cai.cut', is the standard set of codon frequencies for highly expressed Saccharomyces cerevisiae genes. Another table, 'Eschpo_cai.cut', was prepared from a set of Schizosaccharomyces pombe genes by Peter Rice for the S. pombe sequencing team at the Sanger Centre.

You should prepare your own codon usage table for your organism of interest.
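The tallying step behind such a table can be sketched as below. It only counts in-frame codons over a set of reference coding sequences; it does not write a file in the EMBOSS codon usage table format, which in EMBOSS is the job of cusp (Create a codon usage table). The function name count_codons and the toy sequences are hypothetical, and each sequence is assumed to start at the first base of its first codon.

```python
from collections import Counter


def count_codons(cds_list):
    """Tally in-frame codons (reading from the first base) over coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip codons with ambiguity codes
                counts[codon] += 1
    return counts


# Toy input only; a real reference set is a collection of highly expressed genes
highly_expressed = ["ATGGCTGCTAAGTAA", "ATGGCCAAGAAGTGA"]
counts = count_codons(highly_expressed)
print(counts["AAG"], counts["GCT"], counts["GCC"])  # 3 2 1
```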

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:


% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

The directories are searched in the following order:

. (your current directory)
.embossdata (under your current directory)
~/ (your home directory)
~/.embossdata

Notes

None.

References

Sharp PM, Li W-H. "The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications." Nucleic Acids Research 1987; 15:1281-1295.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name	Description
chips	Codon usage statistics
codcmp	Codon usage table comparison
cusp	Create a codon usage table
syco	Synonymous codon usage Gribskov statistic plot

Author(s)

Alan Bleasby (ajb © ebi.ac.uk)
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

History

Written (March 2001) - Alan Bleasby.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None.
