trimseq
Specifically, it:
It then optionally trims off poor quality regions from the ends of the sequence. A window is moved along the sequence from each end, and a region is trimmed when the percentage of unwanted characters in the window reaches a threshold. The unwanted characters are X's, N's (in nucleic sequences), optionally *'s, and optionally IUPAC ambiguity codes.
The program stops trimming the ends when the percentage of unwanted characters in the moving window drops below the threshold percentage.
Thus, if the window size is set to 1 and the percentage threshold is 100, no further poor quality regions will be removed. If the window size is set to 5 and the percentage threshold is 40, then the sequence AAGCTNNNNATT will be trimmed to AAGCT, while AAGCTNATT and AAGCTNNNNATTT will not be trimmed, as less than 40% of their last 5 characters are N's.
After trimming these poor quality regions, it again trims off any dangling gap characters from the ends.
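The window rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a transcription of the trimseq source: the function names are hypothetical, the gap symbols are taken to be "-" and ".", unwanted characters sitting directly at the end are stripped before the window is applied, and when a window reaches the threshold the sequence is cut back through the innermost unwanted character in that window. These assumptions were chosen because they reproduce the worked examples in the text.

```python
GAPS = "-."        # assumption: gap symbols treated as dangling gaps
UNWANTED = "nNxX"  # default unwanted characters for nucleic sequences


def percent_unwanted(chars, unwanted=UNWANTED):
    """Return the percentage of characters in `chars` that are unwanted."""
    return 100.0 * sum(ch in unwanted for ch in chars) / len(chars)


def trim_right(seq, window=1, percent=100.0, unwanted=UNWANTED, gaps=GAPS):
    """Trim the right-hand end of `seq` (a sketch, not trimseq's own code)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # Strip dangling gaps, then unwanted characters sitting right at the end.
    seq = seq.rstrip(gaps).rstrip(unwanted)
    # Keep trimming while the window at the end reaches the threshold.
    while len(seq) >= window:
        tail = seq[-window:]
        bad = [i for i, ch in enumerate(tail) if ch in unwanted]
        if not bad or percent_unwanted(tail, unwanted) < percent:
            break
        # Assumption: cut back through the innermost unwanted character.
        seq = seq[: len(seq) - window + bad[0]]
    # Finally strip any gap characters left dangling at the end.
    return seq.rstrip(gaps)


def trim_left(seq, **options):
    """Trim the left-hand end by reversing, trimming the right end, reversing."""
    return trim_right(seq[::-1], **options)[::-1]
```

Under these assumptions, `trim_right("AAGCTNNNNATT", window=5, percent=40)` returns `"AAGCT"`, while `"AAGCTNATT"` and `"AAGCTNNNNATTT"` come back unchanged, matching the cases discussed above.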
Example 1
% trimseq untrimmed.seq trim1.seq -window 1 -percent 100
Trim ambiguous bits off the ends of sequences
Example 2
% trimseq untrimmed.seq trim2.seq -window 5 -percent 40
Trim ambiguous bits off the ends of sequences
Example 3
% trimseq untrimmed.seq trim3.seq -window 5 -percent 50
Trim ambiguous bits off the ends of sequences
Example 4
% trimseq untrimmed.seq trim4.seq -window 5 -percent 50 -strict
Trim ambiguous bits off the ends of sequences
Example 5
% trimseq untrimmed.seq trim5.seq -window 5 -percent 50 -strict -noright
Trim ambiguous bits off the ends of sequences
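The example invocations differ only in their qualifiers, so a script that runs several of them may find it convenient to assemble the command line programmatically. The helper below is a hypothetical sketch (the name `trimseq_argv` is not part of EMBOSS); it uses only the qualifiers shown in the examples and the documented `-[no]left` / `-[no]right` switches, and it only builds the argument list without running anything.

```python
def trimseq_argv(infile, outfile, window=1, percent=100.0,
                 strict=False, left=True, right=True):
    """Build an argument list for trimseq (hypothetical helper)."""
    argv = [
        "trimseq", infile, outfile,
        "-window", str(window),
        "-percent", format(percent, "g"),  # 50.0 is written as "50"
    ]
    if strict:
        argv.append("-strict")
    if not left:
        argv.append("-noleft")    # do not trim at the start
    if not right:
        argv.append("-noright")   # do not trim at the end
    return argv
```

If EMBOSS is installed, the resulting list could be handed to `subprocess.run(argv, check=True)`; that step is left out here so the sketch stays runnable without it.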
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Standard (Mandatory) qualifiers | | | |
[-sequence] (Parameter 1) | (Gapped) sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
[-outseq] (Parameter 2) | Sequence set(s) filename and optional format (output USA) | Writeable sequence(s) | <*>.format |
Additional (Optional) qualifiers | | | |
-window | This determines the size of the region that is considered when deciding whether the percentage of ambiguity is greater than the threshold. A value of 5 means that a region of 5 letters in the sequence is shifted along the sequence from the ends, and trimming is done only if the percentage of ambiguity in that region is greater than or equal to the threshold percentage. | Any integer value | 1 |
-percent | This is the threshold of the percentage ambiguity in the window required in order to trim a sequence. | Any numeric value | 100.0 |
-strict | In nucleic sequences, trim off not only N's and X's, but also the nucleotide IUPAC ambiguity codes M, R, W, S, Y, K, V, H, D and B. In protein sequences, trim off not only X's but also B and Z. | Boolean value Yes/No | No |
-star | In protein sequences, trim off not only X's, but also the *'s | Boolean value Yes/No | No |
Advanced (Unprompted) qualifiers | | | |
-[no]left | Trim at the start | Boolean value Yes/No | Yes |
-[no]right | Trim at the end | Boolean value Yes/No | Yes |
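The effect of -strict and -star on which characters count as unwanted can be summarised in a few lines of Python. This is a sketch of the rules as stated in the table, with a hypothetical function name; matching is assumed to be case-insensitive, since the lower-case n's in the example sequence are trimmed.

```python
def unwanted_characters(is_nucleic, strict=False, star=False):
    """Return the upper-case characters treated as unwanted (a sketch)."""
    chars = {"X"}                          # X is unwanted in both sequence types
    if is_nucleic:
        chars.add("N")
        if strict:
            chars.update("MRWSYKVHDB")     # nucleotide IUPAC ambiguity codes
    else:
        if strict:
            chars.update("BZ")             # ambiguous protein codes
        if star:
            chars.add("*")                 # -star is described for proteins
    return chars


def is_unwanted(ch, chars):
    """Case-insensitive membership test (assumption: case is ignored)."""
    return ch.upper() in chars
```

For instance, `is_unwanted("y", unwanted_characters(True, strict=True))` is true, whereas without `strict` the same character is kept, which is why the leading `ttyyy` survives in the non-strict examples.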
Input file: untrimmed.seq
>myseq
...ttyyyctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttca.gnntcynnnnnn

Output file: trim1.seq (Example 1)
>myseq
ttyyyctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttca-gnntcy

Output file: trim2.seq (Example 2)
>myseq
ttyyyctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttca-g

Output file: trim3.seq (Example 3)
>myseq
ttyyyctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttca-gnntcy

Output file: trim4.seq (Example 4)
>myseq
ctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgcagctc
tttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcgcccag
atcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgctcctg
gcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccctgact
accctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggcccgtgct
ggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaagaagaca
ggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgcccaccttt
ggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttctctaa
taaaaaagccacttagttca-gnntc

Output file: trim5.seq (Example 5)
>myseq
ctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgcagctc
tttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcgcccag
atcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgctcctg
gcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccctgact
accctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggcccgtgct
ggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaagaagaca
ggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgcccaccttt
ggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttctctaa
taaaaaagccacttagttca-gnntcynnnnnn
Program name | Description |
---|---|
biosed | Replace or delete sequence sections |
codcopy | Reads and writes a codon usage table |
cutseq | Removes a specified section from a sequence |
degapseq | Removes gap characters from sequences |
descseq | Alter the name or description of a sequence |
entret | Reads and writes (returns) flatfile entries |
extractalign | Extract regions from a sequence alignment |
extractfeat | Extract features from a sequence |
extractseq | Extract regions from a sequence |
listor | Write a list file of the logical OR of two sets of sequences |
makenucseq | Creates random nucleotide sequences |
makeprotseq | Creates random protein sequences |
maskfeat | Mask off features of a sequence |
maskseq | Mask off regions of a sequence |
newseq | Type in a short new sequence |
noreturn | Removes carriage return from ASCII files |
notseq | Exclude a set of sequences and write out the remaining ones |
nthseq | Writes one sequence from a multiple set of sequences |
pasteseq | Insert one sequence into another |
revseq | Reverse and complement a sequence |
seqret | Reads and writes (returns) sequences |
seqretsplit | Reads and writes (returns) sequences in individual files |
skipseq | Reads and writes (returns) sequences, skipping first few |
splitter | Split a sequence into (overlapping) smaller sequences |
trimest | Trim poly-A tails off EST sequences |
union | Reads sequence fragments and builds one sequence |
vectorstrip | Strips out DNA between a pair of vector sequences |
yank | Reads a sequence range, appends the full USA to a list file |