degapseq
degapseq removes gap characters from sequences. In fact, it does more than just this: it removes ANY non-alphabetic character from the input sequence, so as well as removing the gap characters, it also removes characters such as the '*' that marks the position of a translated STOP codon in protein sequences.
There are many different formats for storing sequences in files. Some sequence formats can store aligned sequences, including information on where gaps have been introduced to make the sequences align properly. A gap is indicated by a special character at that position, and different sequence formats use different characters for this. Some formats use more than one gap character to distinguish different types of gap (e.g. gaps at the ends of the sequences, internal gaps, or gaps introduced by a program or by a person editing the alignment). Typical gap characters include '.', '-' and '~'.
When EMBOSS programs read in a sequence that contains gap characters, all gap characters are converted internally to '-'. EMBOSS has only one type of gap character, so any distinction a format makes between different types of gap is lost when the sequence is read.
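The conversion to a single gap character can be pictured as a simple character mapping. The Python sketch below is illustrative only: it assumes the gap set is exactly the example characters named above ('.', '-' and '~'), which may not match the full list EMBOSS recognises, and `normalise_gaps` is a hypothetical helper, not part of EMBOSS.

```python
# Hypothetical gap set: the example characters named in the text.
# '-' already is the target character, so only '.' and '~' need mapping.
# The set EMBOSS actually recognises may differ.
_GAP_TO_DASH = str.maketrans({".": "-", "~": "-"})


def normalise_gaps(seq: str) -> str:
    """Map every recognised gap character to the single gap character '-'."""
    return seq.translate(_GAP_TO_DASH)


if __name__ == "__main__":
    print(normalise_gaps("AC..GT~~A-C"))  # AC--GT--A-C
```

Once every gap is '-', a '.' end gap and a '~' internal gap can no longer be told apart.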
degapseq removes every non-alphabetic character from the sequence; in practice, this means that gap characters and '*' characters are removed. The resulting sequence is then written out.
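The filtering rule can be sketched in a few lines of Python. This is a minimal sketch, not the EMBOSS implementation: it assumes "alphabetic" means the ASCII letters A-Z and a-z, keeps the original letter case, and `degap` is a hypothetical name.

```python
import string

# Assumption: "alphabetic" means the ASCII letters A-Z and a-z.
_LETTERS = frozenset(string.ascii_letters)


def degap(seq: str) -> str:
    """Return seq with every non-alphabetic character removed.

    This drops gap characters ('-', '.', '~', ...) as well as the '*'
    used for translated STOP codons, and keeps the case of the letters.
    """
    return "".join(ch for ch in seq if ch in _LETTERS)


if __name__ == "__main__":
    print(degap("ACGT....ACGT"))  # ACGTACGT
    print(degap("MKV*LL-A*"))     # MKVLLA
```

Because the rule is "keep letters" rather than "drop known gap characters", any other punctuation or digits in the sequence would be removed too.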
% degapseq dnagap.fasta nogaps.seq
Removes gap characters from sequences
Standard (Mandatory) qualifiers:
  [-sequence]  seqall     (Gapped) sequence(s) filename and optional format, or reference (input USA)
  [-outseq]    seqoutall  Sequence set(s) filename and optional format (output USA)
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Standard (Mandatory) qualifiers | | | |
[-sequence] (Parameter 1) | (Gapped) sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
[-outseq] (Parameter 2) | Sequence set(s) filename and optional format (output USA) | Writeable sequence(s) | <*>.format |
Additional (Optional) qualifiers | | | |
(none) | | | |
Advanced (Unprompted) qualifiers | | | |
(none) | | | |
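When degapseq is driven from a script rather than typed at a prompt, the two mandatory qualifiers from the table can be passed explicitly by name. The Python sketch below assumes the EMBOSS binaries are on `PATH`; the helper names `degapseq_command` and `run_degapseq` are hypothetical. It also adds `-auto`, the general EMBOSS qualifier that turns off prompting.

```python
import shlex
import subprocess
from typing import List


def degapseq_command(sequence: str, outseq: str) -> List[str]:
    """Build the argument list using the two mandatory qualifiers."""
    # -auto is the general EMBOSS qualifier that suppresses prompts.
    return ["degapseq", "-sequence", sequence, "-outseq", outseq, "-auto"]


def run_degapseq(sequence: str, outseq: str) -> None:
    """Run degapseq (requires EMBOSS); raises CalledProcessError on failure."""
    subprocess.run(degapseq_command(sequence, outseq), check=True)


if __name__ == "__main__":
    # Show the command line without running it.
    print(shlex.join(degapseq_command("dnagap.fasta", "nogaps.seq")))
```

Passing an argument list (rather than a single shell string) to `subprocess.run` avoids quoting problems with unusual filenames.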
The input sequence can be nucleic or protein.
The input sequence can be gapped or ungapped.
File: dnagap.fasta

>FASTA F10002 FASTA FORMAT DNA SEQUENCE
ACGT....ACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGT
File: nogaps.seq

>FASTA F10002 FASTA FORMAT DNA SEQUENCE
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGT
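The example input and output can be reproduced end to end with a small Python sketch: read FASTA records, drop non-alphabetic characters, and rewrap the sequence. The 60-column line width is an assumption taken from the example output above, not from the EMBOSS source, and `read_fasta` and `degap_fasta` are hypothetical helpers.

```python
import string
from typing import Iterator, List, Tuple

_LETTERS = frozenset(string.ascii_letters)  # assumption: ASCII letters only


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def degap_fasta(text: str, width: int = 60) -> str:
    """Degap every record and rewrap it at `width` residues per line."""
    out: List[str] = []
    for header, seq in read_fasta(text):
        clean = "".join(ch for ch in seq if ch in _LETTERS)
        out.append(">" + header)  # header is copied unchanged
        out.extend(clean[i:i + width] for i in range(0, len(clean), width))
    return "\n".join(out) + "\n" if out else ""


EXAMPLE = """>FASTA F10002 FASTA FORMAT DNA SEQUENCE
ACGT....ACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACGTACGT
"""

if __name__ == "__main__":
    print(degap_fasta(EXAMPLE), end="")
```

The four dots are removed, leaving 96 bases, which are rewrapped as one 60-base line and one 36-base line, matching the shape of nogaps.seq.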
See also

Program name | Description |
---|---|
biosed | Replace or delete sequence sections |
codcopy | Reads and writes a codon usage table |
cutseq | Removes a specified section from a sequence |
descseq | Alter the name or description of a sequence |
entret | Reads and writes (returns) flatfile entries |
extractalign | Extract regions from a sequence alignment |
extractfeat | Extract features from a sequence |
extractseq | Extract regions from a sequence |
listor | Write a list file of the logical OR of two sets of sequences |
makenucseq | Creates random nucleotide sequences |
makeprotseq | Creates random protein sequences |
maskfeat | Mask off features of a sequence |
maskseq | Mask off regions of a sequence |
newseq | Type in a short new sequence |
noreturn | Removes carriage return from ASCII files |
notseq | Exclude a set of sequences and write out the remaining ones |
nthseq | Writes one sequence from a multiple set of sequences |
pasteseq | Insert one sequence into another |
revseq | Reverse and complement a sequence |
seqret | Reads and writes (returns) sequences |
seqretsplit | Reads and writes (returns) sequences in individual files |
skipseq | Reads and writes (returns) sequences, skipping first few |
splitter | Split a sequence into (overlapping) smaller sequences |
trimest | Trim poly-A tails off EST sequences |
trimseq | Trim ambiguous bits off the ends of sequences |
union | Reads sequence fragments and builds one sequence |
vectorstrip | Strips out DNA between a pair of vector sequences |
yank | Reads a sequence range, appends the full USA to a list file |