splitter
There should be little requirement to split sequences into smaller sub-sequences in EMBOSS, but there may be circumstances where memory usage becomes restrictive when dealing with truly large sequences. In this case, memory usage may be reduced by repeating the analysis several times on split sub-sequences.
If you need to split a large sequence into smaller subsequences so that a non-EMBOSS program can analyse the smaller sequence, it may also be useful to write the sub-sequences into separate files instead of the default EMBOSS behaviour of concatenating them together into one file.
To write the output sequences to separate files, use the command-line switch '-ossingle'.
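If the sub-sequences have already been written to a single concatenated file, they can also be separated afterwards. The following is a minimal Python sketch, not part of EMBOSS, and it assumes the output is in FASTA format. The function name `write_records_separately` and the `<id>.fasta` file naming are illustrative assumptions; they do not reproduce the file names that '-ossingle' generates.

```python
from pathlib import Path


def write_records_separately(multi_fasta, out_dir):
    """Write each FASTA record of `multi_fasta` to its own file in `out_dir`.

    Files are named after the first word of each header line (an assumption
    made for this sketch). Returns the list of paths written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    with open(multi_fasta) as src:
        for line in src:
            if line.startswith(">"):
                # A new record starts: close the previous file, open the next.
                if handle is not None:
                    handle.close()
                fields = line[1:].split()
                record_id = fields[0] if fields else f"record_{len(written) + 1}"
                path = out_dir / f"{record_id}.fasta"
                handle = open(path, "w")
                written.append(path)
            if handle is not None:
                handle.write(line)
    if handle is not None:
        handle.close()
    return written
```

For example, `write_records_separately("ba000025.split", "chunks")` would create one file per sub-sequence in a `chunks` directory (a hypothetical directory name).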
Example 1
Split a sequence into sub-sequences of 10,000 bases (the default size) with no overlap between the sub-sequences:
% splitter tembl:BA000025 ba000025.split
Split a sequence into (overlapping) smaller sequences
Example 2
Split a sequence into sub-sequences of 50,000 bases with an overlap of 3,000 bases on each sub-sequence:
% splitter tembl:BA000025 ba000025.split -size=50000 -over=3000
Split a sequence into (overlapping) smaller sequences
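The coordinates implied by a given size and overlap can be sketched as follows. This is an illustrative Python sketch, not the splitter source code. It assumes that each window starts `size - overlap` bases after the previous one and that the last window is truncated at the end of the sequence. The function name `split_coordinates` is hypothetical, and the effect of '-addoverlap' is not modelled.

```python
def split_coordinates(length, size=10000, overlap=0):
    """Return 1-based inclusive (start, end) pairs covering a sequence."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if size < 1:
        raise ValueError("size must be at least 1")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")

    # Assumption of this sketch: windows advance by size minus overlap.
    step = size - overlap
    windows = []
    start = 1
    while True:
        end = min(start + size - 1, length)  # truncate the final window
        windows.append((start, end))
        if end == length:
            break
        start += step
    return windows


# Print the first few windows for the example above
# (BA000025 is 2,229,817 bases long).
for start, end in split_coordinates(2229817, size=50000, overlap=3000)[:3]:
    print(f"BA000025_{start}-{end}")
```

Under these assumptions the first window is 1-50000, which agrees with the first record of the example output. The positions of later windows depend on the stepping assumption and should be checked against real splitter output.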
Standard (Mandatory) qualifiers:
  [-sequence] seqall Sequence(s) filename and optional format, or reference (input USA)
  [-outseq] seqoutall [
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Standard (Mandatory) qualifiers | | | |
[-sequence] (Parameter 1) | Sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
[-outseq] (Parameter 2) | Sequence set(s) filename and optional format (output USA) | Writeable sequence(s) | <*>.format |
Additional (Optional) qualifiers | | | |
-size | Size to split at | Integer 1 or more | 10000 |
-overlap | Overlap between split sequences | Integer 0 or more | 0 |
-source | Split using source features with /origid qualifiers | Boolean value Yes/No | No |
-multifile | Split sequence into multiple files | Boolean value Yes/No | No |
Advanced (Unprompted) qualifiers | | | |
-feature | Use feature information | Boolean value Yes/No | No |
-addoverlap | Add overlap to size | Boolean value Yes/No | No |
ID   BA000025; SV 2; linear; genomic DNA; STD; HUM; 2229817 BP.
XX
AC   BA000025; AP000502-AP000521;
XX
DT   09-DEC-2004 (Rel. 82, Created)
DT   14-NOV-2006 (Rel. 89, Last updated, Version 4)
XX
DE   Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC   Homo.
XX
RN   [1]
RP   1-2229817
RA   Hirakawa M., Yamaguchi H., Imai K., Shimada J.;
RT   ;
RL   Submitted (21-AUG-2001) to the EMBL/GenBank/DDBJ databases.
RL   Mika Hirakawa, Japan Science and Technology Corporation (JST), Advanced
RL   Databases Department; 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
RL   (E-mail:mika@tokyo.jst.go.jp, URL:http://www-alis.tokyo.jst.go.jp/,
RL   Tel:81-3-5214-8491, Fax:81-3-5214-8470)
XX
RN   [2]
RA   Shiina S., Tamiya G., Oka A., Inoko H.;
RT   "Homo sapiens 2,229,817bp genomic DNA of 6p21.3 HLA class I region";
RL   Unpublished.
XX
DR   EPD; EP11158; HS_TNF.
DR   EPD; EP11159; HS_LTA.
DR   EPD; EP73522; HS_HLA-B.
DR   EPD; EP73908; HS_GTF2H4.
DR   EPD; EP73940; HS_NEU1.
DR   EPD; EP74013; HS_VARS2.
DR   EPD; EP74203; HS_MRPS18B.
DR   EPD; EP74346; HS_HLA-E.
DR   EPD; EP74389; HS_BAT1.
DR   EPD; EP74485; HS_IER3.
DR   GDB; 11515913.
DR   GOA; P59942.
DR   IMGT/HLA; HLA02629; J*01010102.
DR   RFAM; RF00017.
DR   RFAM; RF00019.
DR   RFAM; RF00026.
DR   RFAM; RF00100.
DR   RFAM; RF00137.
DR   RFAM; RF00276.
[Part of this file has been deleted for brevity] ttggccccac cccagcatgt ctccaggttc ctctcagccc tggttccttt tggccctgca 2226900 gtcacaatgg gcaacactgt gacgcaccct gtcctgtgtc acagtgtcat acactcaggc 2226960 tcacattgcc cctaggccac ttgccagcca agggacatgg ccacattttg tgtcttctgc 2227020 acctcagcct tgctttcaag tgcaggtgat gatggcaccc acgcagaaca aatgttattt 2227080 gctatcttcg tcgagtttag tcatccaatt ttccaaccct cactgggcaa ggaagagtgt 2227140 ggtttccacc aagaaggcag gatgtcagca gtcacagggg caaccaacag ggaaagccgc 2227200 cggaaaatag accccacagg aagcacaggt gtccagtgga gatgggaacc ctgcagattt 2227260 gaccgtcttt aagcagatta gagagattac cgttactaac aacttagcca taaaagttta 2227320 ttagctattt tcaaaaagca taaaattatg taatataatt ttttttaaat ttccatcaat 2227380 acaaaactaa tctgggcact gcaacttccg gtgggcaact gggataggcg gcatcatcag 2227440 gaaggcgagc cctgccgtgc cccatgtgcc agtgccccag atggcggcag cctccccaga 2227500 agcaccttgt atctcccctg cacagggcca gggtcccagc ttcccataca ccttctcctg 2227560 ctttttcttt tctgtccttt cctttttcaa taaaccacct gcaaaaaggg aaaaccattc 2227620 tgaggacaag aaacatgtca atgggaaata cacagttgcc agagggtaaa aggccctgtt 2227680 cattctcatt gaaaagctca ggtatttctg ttaaagtctc tccttttact ttaggatgct 2227740 gactcctgcg tccatctcaa cctgggcatc gtgccaccac cttcaagaag agaaaaacta 2227800 agtagtgctt tgcaaagggg cagcagcatt tctcatttct gaccatgtca ggcacatggc 2227860 catgcagatg agcaggtggg ggacacaggt gagtctccag acctgctctc ctcccacagt 2227920 acattcttga gtctttttaa acagttgtga aaatgccaca gatgcaagca cctgtgggcc 2227980 actcccatgg ggaccgttgc acaaggcagt gccactcatt ctcagaacct cctaccatgg 2228040 gctatgctta gtgacccgag gccaagccaa ggaagacgcc agccacaggg tgccatcctc 2228100 aggggcatgc tgccagcagg ggcaaagtta tccctagcaa caagatacag aaagaaagaa 2228160 aaaaggaagg aaatgtagcc aatgggccgg ttcaggttct tgactttgcc acacaaaaga 2228220 atttgagagc aagtccaaag taaaagtcag caagagaatt tattgcaaag tgaaagtaca 2228280 ctctgacagc tgatcagagc agctgctcaa aagagagaca gtaccctccc ctcacgggag 2228340 tcttacatga ttattcatga ataggtggga aggggtattg ttttaagcat gttctgtggt 2228400 ctcttgaacg tgcatgcact 
gtggttgtac atatcagcac acacatctta cgtctcatta 2228460 gcatcttaac ttccctctca gagttgtgtt tgctactatt gtaatgagca taggtcagcc 2228520 caaggacact attcatgggt ttctgggctt cctcagatgt ggggatgcct cccttggctc 2228580 ttctacctct ttgctgcagg atgttctaac cacaagccca ggatatggtt tgcgcactgt 2228640 cgaacagctt gttctctcca tcaacctgac aagtctcttg tttcctttca agggaggctg 2228700 tgaacaccct atctcactga cctcagaagg acagtacagc agtagccacc atgaccaaaa 2228760 agatgattcc agaagtgcag gacaactccc tacccagagg ctgtggctgt gcagtaacac 2228820 accaagaggg gagtccagct ggctctcagg gtgctcacta ccctcatctg ggggcctgga 2228880 ggacgtcaat tcctgagaac gccacgttct agtgagtaga atgaactgag agatacacag 2228940 caaagctcca catacttttc cttttctttg tgcccgcagt gttcttcatc agtgtgctct 2229000 cgcttttcag ctactactgt tggctggctg gaaaaaatag aacaatagta aaaattagag 2229060 accagtcttt ggtgatgaag agaaatattg gctacttcca gtattttcta gctttggtta 2229120 tggttgcagt tttccagctc accttgtggg gatgaattca gaaaaaagtt acaaattgaa 2229180 atgaacatgc cagaagtatt ggctcaaatc aacgttgtcc tattaagcca cttagtgaat 2229240 caaaagaccg cttgttggac tgttaatctc ggtggccaga gaaaggagct gaagaaggtg 2229300 ttgccagatc aggaacaaat aattacagcg gcaatagaaa atggaagacc acttgttcat 2229360 aaccatttga ataagggcaa ggtgtatgga aacacattat gaactgatat tttcagtttt 2229420 gtttgcaaga aaatgattaa taaggtgaaa taattgaagt atcacggaag atacattaaa 2229480 aaaaaaaaaa gcctttgtac agtttgctgg agccacagat gtcctactcc agagcagaac 2229540 aatgcctgaa tcttcagggt ccatttctgc cgcattcact agcaaccaca aatgtgactt 2229600 aattttactt tggaaataat gcttacccat tgtgagatgc tgtaatatga accatcatta 2229660 catgttaaca tggcacatgg aattttgagt gtctaagtta catttttaga gttgtttctt 2229720 agtagccatg tgagtttcca ctccaaaaac acaagctaaa aacttgtttt gagtgaagga 2229780 catctagggc aaatggtggc tgaaagtgaa tgagatc 2229817 // |
>BA000025_1-10000 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region. gatctccagagcactcttccctgcagggcaccctcccatcccagactccaggcacctggc atgggtggacatctttactttctgggccagcttcagcagagctatgtcatcaccatagaa ctccaggattccctggttctttttggcaaagacatcaaaccctggggagatcaccgcctt ctcaataaggaattctttgccccactgggatttggggtctcctggaaatgatacactaga ttaggctagaccagggctcctgcaggggccagaggctgggtgaggtggtaggatctgtgg cttcaggatcaggaggctggtgcatcccctgccttacccacattgaccctccacagggag tggtcgttgccatcgcggaagcaatgagctgctgtcaggacccattggtcggagatgagg gccccccggcaggtctcttggctcttgggctgcaggggaacaggtgattttcagagattg cagtatgtctggcccatggccgcttttacctctggaatccaagccctgcccctccttcct ggtaccttaatagtgacatgccagggtgtcctctcctggtcagaggcgtttgctgacatg ttccccaccccgcagatggtgtctgtgagcttggagacatctgtgggtgtgaggatcaga tggggaaggaggcaagtgaggggcactgtgtccaggttcccaacacgggcctctggcggg ctcctcaccatcctccccacaccaaggagggcaaagctcactcacccagcatatgttcaa agacctggtgcagagcctttgtgtcctgcagaatgaaggcatgcctctcaccatccttct tggaccctagctcattcagttctctccagtccacatccagcttgcccaccccgatggcat agatgtctggtggggaagagggaaatcaccagactcctgtggctttggggctaccccatg agacaggaggctgtcatctgaaactcactgtgtccaatcaagacctacatgagctggacc cctgcgtcctccccactgctacctgtctgccttcatttcctgccactccctgcccttcac tctcctgcagcacacagcctctttgaagttcctcaaatccataggcatggtcacacctca ggccctttgcccagctgtgcctctgcctagttcactcctcccccccagacttccacatgg ctcactttcgtacctttttaagtcttggctcaaatgtcaccttctcagtgaggccttccc tggtcttcctgtctaaaactgcaatgccccagacaaactttcatccccactttgggaggc aaggtgggaggatcccttgaagccagaagtttgagaccagcctgggcaacatggcaacac cccttagcttgtgtcacctaccacctgctgggttctatggttttcttatcctgtttattc cctgtaatggtggaattgtgtcccccagaaagatgtgttcgagtcctaatccccagtatc tgtgactttatttggaaaaagggtctttgcagatgtaatcaagttaagattaagtcatac tagattagggtgagctctaatccaatgactgaggtccttataagaagaggtaagccagag ccaggcgtggtggctcacacctgtaatcaccaggaggcggtggttgtggtgagccaagat cgcgccattgcactccagcctgggcaacaagagcaaaaccccgtctcaaaaaaaaaaaaa gaagaggtgagccgggcacggtggctcacacctgtaatcccagcactctgggaggctgag gcgggcagatcacgaggtcaggaattcaagaccagcctgaccaacatggtgaaaccctgt 
ctctactaaaaatacaaaaattagccagacatgctggcacacacctgtaatcccagctac tcaggaggctgaggcaggagaatcgcttgaaccgggaggcggatgttgcagtgagccgag attgcaccactgcactccagcctgggcaacagagcaagactccatctcaaaaaaaaaaaa aaaaaaaaaaagtgaactggctgggcatggtggtgactcatgcctgtaatcccggcagtt tttttgaggcgaaggcaggcagatcgccttgaggccaggagtttaagaccagcctagcca acatggcgagaccatgtctctactaaaaatacaaaaatttgccgggcatggtggcacatg cctgtaatcccagcttcttgggagactgaggcacgagaatcacctgaacccaggaggcag aggttacagtgagccgggatcccgccactgcactgcagcctgggcttctgggtgacagag cgagactctgtctcaaacaaatgaacagaaaaagaagaaaggaatttggacacaaagaca caggtagtgggtctcctatctatataagagaacagcatgtaatgacacagaggcacacac agaaaagaaggcgagttgaagacagaggcagagaatgggtttatgctgccgcaagccaag gttggagctgccggcagccggaaaaggcaggaaagaattcttcccaagagccttctgagg aagcacggccctgccaacaccttgatttcagacttctaacctccagaactgtaagaaaaa gaaattctgtgttctaagccacccaggtttgtggtagtttggtaagtacttttaaatgac tgaatgaatagaaagaactcagaacacaacatggaaactaaacctcagatctggtcttcc tctgtaaaaggtagcatctgggagaagggcctaaagccacgttttcccactggaggccct ggacccacacaacaggccgcgcctgtcctccgactgtggtgccagtcagaactgccctca gacagaccacagagtctactcctctcccagcctttgcaccccttgtggcccatttttgtt [Part of this file has been deleted for brevity] cctcggtctgtctccaccaggccctgtgagggtgggtggaggctctctccaagccctcgt ttggccccaccccagcatgtctccaggttcctctcagccctggttccttttggccctgca gtcacaatgggcaacactgtgacgcaccctgtcctgtgtcacagtgtcatacactcaggc tcacattgcccctaggccacttgccagccaagggacatggccacattttgtgtcttctgc acctcagccttgctttcaagtgcaggtgatgatggcacccacgcagaacaaatgttattt gctatcttcgtcgagtttagtcatccaattttccaaccctcactgggcaaggaagagtgt ggtttccaccaagaaggcaggatgtcagcagtcacaggggcaaccaacagggaaagccgc cggaaaatagaccccacaggaagcacaggtgtccagtggagatgggaaccctgcagattt gaccgtctttaagcagattagagagattaccgttactaacaacttagccataaaagttta ttagctattttcaaaaagcataaaattatgtaatataattttttttaaatttccatcaat acaaaactaatctgggcactgcaacttccggtgggcaactgggataggcggcatcatcag gaaggcgagccctgccgtgccccatgtgccagtgccccagatggcggcagcctccccaga agcaccttgtatctcccctgcacagggccagggtcccagcttcccatacaccttctcctg 
ctttttcttttctgtcctttcctttttcaataaaccacctgcaaaaagggaaaaccattc tgaggacaagaaacatgtcaatgggaaatacacagttgccagagggtaaaaggccctgtt cattctcattgaaaagctcaggtatttctgttaaagtctctccttttactttaggatgct gactcctgcgtccatctcaacctgggcatcgtgccaccaccttcaagaagagaaaaacta agtagtgctttgcaaaggggcagcagcatttctcatttctgaccatgtcaggcacatggc catgcagatgagcaggtgggggacacaggtgagtctccagacctgctctcctcccacagt acattcttgagtctttttaaacagttgtgaaaatgccacagatgcaagcacctgtgggcc actcccatggggaccgttgcacaaggcagtgccactcattctcagaacctcctaccatgg gctatgcttagtgacccgaggccaagccaaggaagacgccagccacagggtgccatcctc aggggcatgctgccagcaggggcaaagttatccctagcaacaagatacagaaagaaagaa aaaaggaaggaaatgtagccaatgggccggttcaggttcttgactttgccacacaaaaga atttgagagcaagtccaaagtaaaagtcagcaagagaatttattgcaaagtgaaagtaca ctctgacagctgatcagagcagctgctcaaaagagagacagtaccctcccctcacgggag tcttacatgattattcatgaataggtgggaaggggtattgttttaagcatgttctgtggt ctcttgaacgtgcatgcactgtggttgtacatatcagcacacacatcttacgtctcatta gcatcttaacttccctctcagagttgtgtttgctactattgtaatgagcataggtcagcc caaggacactattcatgggtttctgggcttcctcagatgtggggatgcctcccttggctc ttctacctctttgctgcaggatgttctaaccacaagcccaggatatggtttgcgcactgt cgaacagcttgttctctccatcaacctgacaagtctcttgtttcctttcaagggaggctg tgaacaccctatctcactgacctcagaaggacagtacagcagtagccaccatgaccaaaa agatgattccagaagtgcaggacaactccctacccagaggctgtggctgtgcagtaacac accaagaggggagtccagctggctctcagggtgctcactaccctcatctgggggcctgga ggacgtcaattcctgagaacgccacgttctagtgagtagaatgaactgagagatacacag caaagctccacatacttttccttttctttgtgcccgcagtgttcttcatcagtgtgctct cgcttttcagctactactgttggctggctggaaaaaatagaacaatagtaaaaattagag accagtctttggtgatgaagagaaatattggctacttccagtattttctagctttggtta tggttgcagttttccagctcaccttgtggggatgaattcagaaaaaagttacaaattgaa atgaacatgccagaagtattggctcaaatcaacgttgtcctattaagccacttagtgaat caaaagaccgcttgttggactgttaatctcggtggccagagaaaggagctgaagaaggtg ttgccagatcaggaacaaataattacagcggcaatagaaaatggaagaccacttgttcat aaccatttgaataagggcaaggtgtatggaaacacattatgaactgatattttcagtttt gtttgcaagaaaatgattaataaggtgaaataattgaagtatcacggaagatacattaaa 
aaaaaaaaaagcctttgtacagtttgctggagccacagatgtcctactccagagcagaac aatgcctgaatcttcagggtccatttctgccgcattcactagcaaccacaaatgtgactt aattttactttggaaataatgcttacccattgtgagatgctgtaatatgaaccatcatta catgttaacatggcacatggaattttgagtgtctaagttacatttttagagttgtttctt agtagccatgtgagtttccactccaaaaacacaagctaaaaacttgttttgagtgaagga catctagggcaaatggtggctgaaagtgaatgagatc |
>BA000025_1-50000 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region. gatctccagagcactcttccctgcagggcaccctcccatcccagactccaggcacctggc atgggtggacatctttactttctgggccagcttcagcagagctatgtcatcaccatagaa ctccaggattccctggttctttttggcaaagacatcaaaccctggggagatcaccgcctt ctcaataaggaattctttgccccactgggatttggggtctcctggaaatgatacactaga ttaggctagaccagggctcctgcaggggccagaggctgggtgaggtggtaggatctgtgg cttcaggatcaggaggctggtgcatcccctgccttacccacattgaccctccacagggag tggtcgttgccatcgcggaagcaatgagctgctgtcaggacccattggtcggagatgagg gccccccggcaggtctcttggctcttgggctgcaggggaacaggtgattttcagagattg cagtatgtctggcccatggccgcttttacctctggaatccaagccctgcccctccttcct ggtaccttaatagtgacatgccagggtgtcctctcctggtcagaggcgtttgctgacatg ttccccaccccgcagatggtgtctgtgagcttggagacatctgtgggtgtgaggatcaga tggggaaggaggcaagtgaggggcactgtgtccaggttcccaacacgggcctctggcggg ctcctcaccatcctccccacaccaaggagggcaaagctcactcacccagcatatgttcaa agacctggtgcagagcctttgtgtcctgcagaatgaaggcatgcctctcaccatccttct tggaccctagctcattcagttctctccagtccacatccagcttgcccaccccgatggcat agatgtctggtggggaagagggaaatcaccagactcctgtggctttggggctaccccatg agacaggaggctgtcatctgaaactcactgtgtccaatcaagacctacatgagctggacc cctgcgtcctccccactgctacctgtctgccttcatttcctgccactccctgcccttcac tctcctgcagcacacagcctctttgaagttcctcaaatccataggcatggtcacacctca ggccctttgcccagctgtgcctctgcctagttcactcctcccccccagacttccacatgg ctcactttcgtacctttttaagtcttggctcaaatgtcaccttctcagtgaggccttccc tggtcttcctgtctaaaactgcaatgccccagacaaactttcatccccactttgggaggc aaggtgggaggatcccttgaagccagaagtttgagaccagcctgggcaacatggcaacac cccttagcttgtgtcacctaccacctgctgggttctatggttttcttatcctgtttattc cctgtaatggtggaattgtgtcccccagaaagatgtgttcgagtcctaatccccagtatc tgtgactttatttggaaaaagggtctttgcagatgtaatcaagttaagattaagtcatac tagattagggtgagctctaatccaatgactgaggtccttataagaagaggtaagccagag ccaggcgtggtggctcacacctgtaatcaccaggaggcggtggttgtggtgagccaagat cgcgccattgcactccagcctgggcaacaagagcaaaaccccgtctcaaaaaaaaaaaaa gaagaggtgagccgggcacggtggctcacacctgtaatcccagcactctgggaggctgag gcgggcagatcacgaggtcaggaattcaagaccagcctgaccaacatggtgaaaccctgt 
ctctactaaaaatacaaaaattagccagacatgctggcacacacctgtaatcccagctac tcaggaggctgaggcaggagaatcgcttgaaccgggaggcggatgttgcagtgagccgag attgcaccactgcactccagcctgggcaacagagcaagactccatctcaaaaaaaaaaaa aaaaaaaaaaagtgaactggctgggcatggtggtgactcatgcctgtaatcccggcagtt tttttgaggcgaaggcaggcagatcgccttgaggccaggagtttaagaccagcctagcca acatggcgagaccatgtctctactaaaaatacaaaaatttgccgggcatggtggcacatg cctgtaatcccagcttcttgggagactgaggcacgagaatcacctgaacccaggaggcag aggttacagtgagccgggatcccgccactgcactgcagcctgggcttctgggtgacagag cgagactctgtctcaaacaaatgaacagaaaaagaagaaaggaatttggacacaaagaca caggtagtgggtctcctatctatataagagaacagcatgtaatgacacagaggcacacac agaaaagaaggcgagttgaagacagaggcagagaatgggtttatgctgccgcaagccaag gttggagctgccggcagccggaaaaggcaggaaagaattcttcccaagagccttctgagg aagcacggccctgccaacaccttgatttcagacttctaacctccagaactgtaagaaaaa gaaattctgtgttctaagccacccaggtttgtggtagtttggtaagtacttttaaatgac tgaatgaatagaaagaactcagaacacaacatggaaactaaacctcagatctggtcttcc tctgtaaaaggtagcatctgggagaagggcctaaagccacgttttcccactggaggccct ggacccacacaacaggccgcgcctgtcctccgactgtggtgccagtcagaactgccctca gacagaccacagagtctactcctctcccagcctttgcaccccttgtggcccatttttgtt [Part of this file has been deleted for brevity] ggagaggggcaggtgcccctcctcggtctgtctccaccaggccctgtgagggtgggtgga ggctctctccaagccctcgtttggccccaccccagcatgtctccaggttcctctcagccc tggttccttttggccctgcagtcacaatgggcaacactgtgacgcaccctgtcctgtgtc acagtgtcatacactcaggctcacattgcccctaggccacttgccagccaagggacatgg ccacattttgtgtcttctgcacctcagccttgctttcaagtgcaggtgatgatggcaccc acgcagaacaaatgttatttgctatcttcgtcgagtttagtcatccaattttccaaccct cactgggcaaggaagagtgtggtttccaccaagaaggcaggatgtcagcagtcacagggg caaccaacagggaaagccgccggaaaatagaccccacaggaagcacaggtgtccagtgga gatgggaaccctgcagatttgaccgtctttaagcagattagagagattaccgttactaac aacttagccataaaagtttattagctattttcaaaaagcataaaattatgtaatataatt ttttttaaatttccatcaatacaaaactaatctgggcactgcaacttccggtgggcaact gggataggcggcatcatcaggaaggcgagccctgccgtgccccatgtgccagtgccccag atggcggcagcctccccagaagcaccttgtatctcccctgcacagggccagggtcccagc 
ttcccatacaccttctcctgctttttcttttctgtcctttcctttttcaataaaccacct gcaaaaagggaaaaccattctgaggacaagaaacatgtcaatgggaaatacacagttgcc agagggtaaaaggccctgttcattctcattgaaaagctcaggtatttctgttaaagtctc tccttttactttaggatgctgactcctgcgtccatctcaacctgggcatcgtgccaccac cttcaagaagagaaaaactaagtagtgctttgcaaaggggcagcagcatttctcatttct gaccatgtcaggcacatggccatgcagatgagcaggtgggggacacaggtgagtctccag acctgctctcctcccacagtacattcttgagtctttttaaacagttgtgaaaatgccaca gatgcaagcacctgtgggccactcccatggggaccgttgcacaaggcagtgccactcatt ctcagaacctcctaccatgggctatgcttagtgacccgaggccaagccaaggaagacgcc agccacagggtgccatcctcaggggcatgctgccagcaggggcaaagttatccctagcaa caagatacagaaagaaagaaaaaaggaaggaaatgtagccaatgggccggttcaggttct tgactttgccacacaaaagaatttgagagcaagtccaaagtaaaagtcagcaagagaatt tattgcaaagtgaaagtacactctgacagctgatcagagcagctgctcaaaagagagaca gtaccctcccctcacgggagtcttacatgattattcatgaataggtgggaaggggtattg ttttaagcatgttctgtggtctcttgaacgtgcatgcactgtggttgtacatatcagcac acacatcttacgtctcattagcatcttaacttccctctcagagttgtgtttgctactatt gtaatgagcataggtcagcccaaggacactattcatgggtttctgggcttcctcagatgt ggggatgcctcccttggctcttctacctctttgctgcaggatgttctaaccacaagccca ggatatggtttgcgcactgtcgaacagcttgttctctccatcaacctgacaagtctcttg tttcctttcaagggaggctgtgaacaccctatctcactgacctcagaaggacagtacagc agtagccaccatgaccaaaaagatgattccagaagtgcaggacaactccctacccagagg ctgtggctgtgcagtaacacaccaagaggggagtccagctggctctcagggtgctcacta ccctcatctgggggcctggaggacgtcaattcctgagaacgccacgttctagtgagtaga atgaactgagagatacacagcaaagctccacatacttttccttttctttgtgcccgcagt gttcttcatcagtgtgctctcgcttttcagctactactgttggctggctggaaaaaatag aacaatagtaaaaattagagaccagtctttggtgatgaagagaaatattggctacttcca gtattttctagctttggttatggttgcagttttccagctcaccttgtggggatgaattca gaaaaaagttacaaattgaaatgaacatgccagaagtattggctcaaatcaacgttgtcc tattaagccacttagtgaatcaaaagaccgcttgttggactgttaatctcggtggccaga gaaaggagctgaagaaggtgttgccagatcaggaacaaataattacagcggcaatagaaa atggaagaccacttgttcataaccatttgaataagggcaaggtgtatggaaacacattat gaactgatattttcagttttgtttgcaagaaaatgattaataaggtgaaataattgaagt 
atcacggaagatacattaaaaaaaaaaaaagcctttgtacagtttgctggagccacagat gtcctactccagagcagaacaatgcctgaatcttcagggtccatttctgccgcattcact agcaaccacaaatgtgacttaattttactttggaaataatgcttacccattgtgagatgc tgtaatatgaaccatcattacatgttaacatggcacatggaattttgagtgtctaagtta catttttagagttgtttcttagtagccatgtgagtttccactccaaaaacacaagctaaa aacttgttttgagtgaaggacatctagggcaaatggtggctgaaagtgaatgagatc |
Each output sequence takes the name of the original sequence with '_start-end' appended, where 'start' and 'end' are the start and end positions of the sub-sequence within the original sequence. For example, if a sequence named HSHBB were split at a size of 50000 with no overlap, the sub-sequences would be named HSHBB_1-50000 and HSHBB_50001-73308.
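Because the coordinates are embedded in the name, a position reported by another program on a sub-sequence can be mapped back to the original sequence. The following Python sketch illustrates this under the naming scheme described above. The function names `parse_subsequence_name` and `to_original_position` are hypothetical, and the sketch assumes 1-based positions.

```python
import re

# Matches "<name>_<start>-<end>", e.g. "HSHBB_50001-73308".
_SUBSEQ_NAME = re.compile(r"^(?P<name>.+)_(?P<start>\d+)-(?P<end>\d+)$")


def parse_subsequence_name(sub_name):
    """Split a sub-sequence name into (original_name, start, end)."""
    match = _SUBSEQ_NAME.match(sub_name)
    if match is None:
        raise ValueError(f"not a '_start-end' style name: {sub_name!r}")
    return match["name"], int(match["start"]), int(match["end"])


def to_original_position(sub_name, local_pos):
    """Convert a 1-based position within a sub-sequence to the original."""
    _, start, end = parse_subsequence_name(sub_name)
    if not 1 <= local_pos <= end - start + 1:
        raise ValueError("position lies outside the sub-sequence")
    return start + local_pos - 1
```

For instance, position 1 of HSHBB_50001-73308 corresponds to position 50001 of HSHBB.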
Program name | Description |
---|---|
biosed | Replace or delete sequence sections |
codcopy | Reads and writes a codon usage table |
cutseq | Removes a specified section from a sequence |
degapseq | Removes gap characters from sequences |
descseq | Alter the name or description of a sequence |
entret | Reads and writes (returns) flatfile entries |
extractalign | Extract regions from a sequence alignment |
extractfeat | Extract features from a sequence |
extractseq | Extract regions from a sequence |
listor | Write a list file of the logical OR of two sets of sequences |
makenucseq | Creates random nucleotide sequences |
makeprotseq | Creates random protein sequences |
maskfeat | Mask off features of a sequence |
maskseq | Mask off regions of a sequence |
newseq | Type in a short new sequence |
noreturn | Removes carriage return from ASCII files |
notseq | Exclude a set of sequences and write out the remaining ones |
nthseq | Writes one sequence from a multiple set of sequences |
pasteseq | Insert one sequence into another |
revseq | Reverse and complement a sequence |
seqret | Reads and writes (returns) sequences |
seqretsplit | Reads and writes (returns) sequences in individual files |
skipseq | Reads and writes (returns) sequences, skipping first few |
trimest | Trim poly-A tails off EST sequences |
trimseq | Trim ambiguous bits off the ends of sequences |
union | Reads sequence fragments and builds one sequence |
vectorstrip | Strips out DNA between a pair of vector sequences |
yank | Reads a sequence range, appends the full USA to a list file |