pepwindow
% pepwindow tsw:hba_human
Displays protein hydropathy
Graph type [x11]: cps
Created pepwindow.ps
Standard (Mandatory) qualifiers:
  [-sequence]   sequence   Protein sequence filename and optional format, or reference (input USA)
   -graph       xygraph    [$EMBOSS_GRAPHICS value, or x11] Graph type (ps, hpgl, hp7470, hp7580, meta, cps, x11, tekt, tek, none, data, xterm, png, gif)

Additional (Optional) qualifiers:
   -datafile    datafile   [Enakai.dat] AAINDEX entry data file
   -length      integer    [7] Window size (Integer from 1 to 200)

Advanced (Unprompted) qualifiers: (none)

Associated qualifiers:

  "-sequence" associated qualifiers
   -sbegin1       integer   Start of the sequence to be used
   -send1         integer   End of the sequence to be used
   -sreverse1     boolean   Reverse (if DNA)
   -sask1         boolean   Ask for begin/end/reverse
   -snucleotide1  boolean   Sequence is nucleotide
   -sprotein1     boolean   Sequence is protein
   -slower1       boolean   Make lower case
   -supper1       boolean   Make upper case
   -sformat1      string    Input sequence format
   -sdbname1      string    Database name
   -sid1          string    Entryname
   -ufo1          string    UFO features
   -fformat1      string    Features format
   -fopenfile1    string    Features file name

  "-graph" associated qualifiers
   -gprompt       boolean   Graph prompting
   -gdesc         string    Graph description
   -gtitle        string    Graph title
   -gsubtitle     string    Graph subtitle
   -gxtitle       string    Graph x axis title
   -gytitle       string    Graph y axis title
   -goutfile      string    Output file for non interactive displays
   -gdirectory    string    Output directory

General qualifiers:
   -auto          boolean   Turn off prompts
   -stdout        boolean   Write standard output
   -filter        boolean   Read standard input, write standard output
   -options       boolean   Prompt for standard and additional values
   -debug         boolean   Write debug output to program.dbg
   -verbose       boolean   Report some/full command line options
   -help          boolean   Report command line options. More information on associated and general qualifiers can be found with -help -verbose
   -warning       boolean   Report warnings
   -error         boolean   Report errors
   -fatal         boolean   Report fatal errors
   -die           boolean   Report dying program messages
Qualifier | Description | Allowed values | Default |
---|---|---|---|
Standard (Mandatory) qualifiers | | | |
[-sequence] (Parameter 1) | Protein sequence filename and optional format, or reference (input USA) | Readable sequence | Required |
-graph | Graph type | EMBOSS has a list of known devices, including ps, hpgl, hp7470, hp7580, meta, cps, x11, tekt, tek, none, data, xterm, png, gif | EMBOSS_GRAPHICS value, or x11 |
Additional (Optional) qualifiers | | | |
-datafile | AAINDEX entry data file | Data file | Enakai.dat |
-length | Window size | Integer from 1 to 200 | 7 |
Advanced (Unprompted) qualifiers | | | |
(none) | | | |
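The -length qualifier sets the size of the window that is moved along the sequence. The following Python sketch shows one way such a windowed hydropathy profile can be computed. It assumes that each plotted point is the plain mean of the Kyte-Doolittle values inside the window; this is an assumption made for illustration, not a description of the pepwindow source code. The function name `window_hydropathy` is hypothetical. The scale values and the sequence are those shown elsewhere in this document.

```python
# Kyte-Doolittle hydropathy values, as listed in the Enakai.dat entry.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# HBA_HUMAN sequence from the example input entry (142 residues).
HBA_HUMAN = (
    "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG"
    "KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP"
    "AVHASLDKFLASVSTVLTSKYR"
)


def window_hydropathy(sequence, length=7, scale=KYTE_DOOLITTLE):
    """Return the mean hydropathy of every full window along the sequence.

    Assumption: a plain (unweighted) mean per window. Residues that are
    missing from the scale raise KeyError.
    """
    if not 1 <= length <= 200:
        raise ValueError("length must be an integer from 1 to 200")
    values = [scale[residue] for residue in sequence.upper()]
    # One value per window start; empty if the sequence is shorter than the window.
    return [
        sum(values[start:start + length]) / length
        for start in range(len(values) - length + 1)
    ]


if __name__ == "__main__":
    profile = window_hydropathy(HBA_HUMAN, length=7)
    print(len(profile), "windows; first value", round(profile[0], 3))
```

A larger window smooths the profile and yields fewer points, because only full windows are reported in this sketch.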
ID   HBA_HUMAN      Reviewed;   142 AA.
AC   P69905; P01922; Q96KF1; Q9NYR7;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 2.
DT   03-APR-2007, entry version 41.
DE   Hemoglobin subunit alpha (Hemoglobin alpha chain) (Alpha-globin).
GN   Name=HBA1;
GN   and
GN   Name=HBA2;
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA] (HBA1).
RX   MEDLINE=81088339; PubMed=7448866; DOI=10.1016/0092-8674(80)90347-5;
RA   Michelson A.M., Orkin S.H.;
RT   "The 3' untranslated regions of the duplicated human alpha-globin
RT   genes are unexpectedly divergent.";
RL   Cell 22:371-377(1980).
RN   [2]
RP   NUCLEOTIDE SEQUENCE [MRNA] (HBA2).
RX   MEDLINE=80137531; PubMed=6244294;
RA   Wilson J.T., Wilson L.B., Reddy V.B., Cavallesco C., Ghosh P.K.,
RA   Deriel J.K., Forget B.G., Weissman S.M.;
RT   "Nucleotide sequence of the coding portion of human alpha globin
RT   messenger RNA.";
RL   J. Biol. Chem. 255:2807-2815(1980).
RN   [3]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA] (HBA2).
RX   MEDLINE=81175088; PubMed=6452630;
RA   Liebhaber S.A., Goossens M.J., Kan Y.W.;
RT   "Cloning and complete nucleotide sequence of human 5'-alpha-globin
RT   gene.";
RL   Proc. Natl. Acad. Sci. U.S.A. 77:7054-7058(1980).
RN   [4]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RX   PubMed=6946451;
RA   Orkin S.H., Goff S.C., Hechtman R.L.;
RT   "Mutation in an intervening sequence splice junction in man.";
RL   Proc. Natl. Acad. Sci. U.S.A. 78:5041-5045(1981).
RN   [5]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA], AND VARIANT LYS-32.
RX   MEDLINE=21303311; PubMed=11410421;
RA   Zhao Y., Xu X.;
RT   "Alpha2(CD31 AGG-->AAG, Arg-->Lys) causing non-deletional alpha-
RT   thalassemia in a Chinese family with HbH disease.";
RL   Haematologica 86:541-542(2001).
RN   [6]

  [Part of this file has been deleted for brevity]

FT                                /FTId=VAR_002840.
FT   VARIANT     131    131       A -> D (in Yuda; O(2) affinity down).
FT                                /FTId=VAR_002842.
FT   VARIANT     131    131       A -> P (in Sun Prairie; unstable).
FT                                /FTId=VAR_002841.
FT   VARIANT     132    132       S -> P (in Questembert; highly unstable;
FT                                causes alpha-thalassemia).
FT                                /FTId=VAR_002843.
FT   VARIANT     134    134       S -> R (in Val de Marne; O(2) affinity
FT                                up).
FT                                /FTId=VAR_002844.
FT   VARIANT     136    136       V -> E (in Pavie).
FT                                /FTId=VAR_002845.
FT   VARIANT     137    137       L -> M (in Chicago).
FT                                /FTId=VAR_002846.
FT   VARIANT     137    137       L -> P (in Bibba; unstable; causes alpha-
FT                                thalassemia).
FT                                /FTId=VAR_002847.
FT   VARIANT     139    139       S -> P (in Attleboro; O(2) affinity up).
FT                                /FTId=VAR_002848.
FT   VARIANT     140    140       K -> E (in Hanamaki; O(2) affinity up).
FT                                /FTId=VAR_002849.
FT   VARIANT     140    140       K -> T (in Tokoname; O(2) affinity up).
FT                                /FTId=VAR_002850.
FT   VARIANT     141    141       Y -> H (in Rouen; O(2) affinity up).
FT                                /FTId=VAR_002851.
FT   VARIANT     142    142       R -> C (in Nunobiki; O(2) affinity up).
FT                                /FTId=VAR_002852.
FT   VARIANT     142    142       R -> H (in Suresnes; O(2) affinity up).
FT                                /FTId=VAR_002854.
FT   VARIANT     142    142       R -> L (in Legnano; O(2) affinity up).
FT                                /FTId=VAR_002853.
FT   VARIANT     142    142       R -> P (in Singapore).
FT                                /FTId=VAR_002855.
FT   HELIX         4     15
FT   HELIX        16     20
FT   HELIX        21     35
FT   HELIX        37     42
FT   HELIX        53     71
FT   HELIX        73     75
FT   HELIX        76     79
FT   HELIX        81     89
FT   HELIX        96    112
FT   TURN        114    116
FT   HELIX       119    136
FT   TURN        137    139
SQ   SEQUENCE   142 AA;  15258 MW;  15E13666573BBBAE CRC64;
     MVLSPADKTN VKAAWGKVGA HAGEYGAEAL ERMFLSFPTT KTYFPHFDLS HGSAQVKGHG
     KKVADALTNA VAHVDDMPNA LSALSDLHAH KLRVDPVNFK LLSHCLLVTL AAHLPAEFTP
     AVHASLDKFL ASVSTVLTSK YR
//
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".
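The directories are searched in the following order:
1. The current directory (.)
2. .embossdata (under the current directory)
3. The user's home directory (~/)
4. ~/.embossdata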
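As an illustration of how a user-supplied data file could be located, the following Python sketch looks in the current directory, then ".embossdata" beneath it, then the home directory, then ".embossdata" beneath that. This order is an assumption taken from the paragraph above. The function names are hypothetical, and the sketch does not model the final fallback to the standard EMBOSS data directory.

```python
from pathlib import Path


def candidate_directories(cwd=None, home=None):
    """List the user-level directories to search, in the assumed order."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    return [cwd, cwd / ".embossdata", home, home / ".embossdata"]


def find_data_file(name, cwd=None, home=None):
    """Return the first existing copy of a data file, or None if not found."""
    for directory in candidate_directories(cwd, home):
        path = directory / name
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    # 'Enakai.dat' is the default pepwindow data file named in this document.
    print(find_data_file("Enakai.dat"))
```

Under this order, a file placed in the current directory takes precedence over one with the same name in the home directory.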
pepwindow reads the Kyte-Doolittle hydropathy data from the file 'Enakai.dat'.
The EMBOSS data file 'Enakai.dat' contains:
D Hydropathy index (Kyte-Doolittle, 1982)
R 0807099
A Kyte, J. and Doolittle, R.F.
T A simple method for displaying the hydropathic character of a protein
J J. Mol. Biol. 157, 105-132 (1982)
C CHOC760103    0.964  JANJ780102    0.922  DESM900102    0.898  EISD860103    0.897
  CHOC760104    0.889  WOLR810101    0.885  RADA880101    0.884  MANP780101    0.881
  EISD840101    0.878  PONP800103    0.870  NAKH920108    0.868  JANJ790101    0.867
  JANJ790102    0.866  PONP800102    0.861  MEIH800103    0.856  PONP800101    0.851
  PONP800108    0.850  WARP780101    0.845  RADA880108    0.842  ROSG850102    0.841
  DESM900101    0.837  BIOV880101    0.829  RADA880107    0.828  LIFS790102    0.824
  KANM800104    0.824  CIDH920104    0.824  MIYS850101    0.821  RADA880104    0.819
  NAKH900111    0.817  NISK800101    0.812  FAUJ830101    0.811  ARGP820103    0.806
  NAKH920105    0.803  ARGP820102    0.803  KRIW790101   -0.805  CHOC760102   -0.838
  GUYH850101   -0.843  RACS770102   -0.844  JANJ780103   -0.845  ROSM880101   -0.845
  PRAM900101   -0.850  JANJ780101   -0.852  GRAR740102   -0.859  MEIH800102   -0.871
  ROSM880102   -0.878  OOBM770101   -0.899
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
     1.8    -4.5    -3.5    -3.5     2.5    -3.5    -3.5    -0.4    -3.2     4.5
     3.8    -3.9     1.9     2.8    -1.6    -0.8    -0.7    -0.9    -1.3     4.2
//
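The "I" block at the end of the entry pairs two residues per column: the first residue of each pair takes its value from the first row of numbers and the second residue from the second row. The Python sketch below reads that block into a dictionary. The function name `parse_index_block` is hypothetical, and the sketch assumes every value is numeric and that the two rows of numbers directly follow the "I" header line.

```python
# The "I" block of the Enakai.dat entry, as shown above.
ENTRY_TAIL = """\
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
     1.8    -4.5    -3.5    -3.5     2.5    -3.5    -3.5    -0.4    -3.2     4.5
     3.8    -3.9     1.9     2.8    -1.6    -0.8    -0.7    -0.9    -1.3     4.2
//
"""


def parse_index_block(text):
    """Map each residue letter to its value from an AAINDEX-style "I" block."""
    lines = text.splitlines()
    # Locate the header line whose first field is the line code "I".
    start = next(i for i, line in enumerate(lines) if line.split()[:1] == ["I"])
    pairs = lines[start].split()[1:]                      # e.g. "A/L", "R/K"
    first_row = [float(x) for x in lines[start + 1].split()]
    second_row = [float(x) for x in lines[start + 2].split()]

    values = {}
    for pair, top, bottom in zip(pairs, first_row, second_row):
        first_residue, second_residue = pair.split("/")
        values[first_residue] = top
        values[second_residue] = bottom
    return values


if __name__ == "__main__":
    scale = parse_index_block(ENTRY_TAIL)
    for residue in sorted(scale):
        print(residue, scale[residue])
```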
pepwindow can use any entry from the "Nakai et al." database of amino acid parameters. These entries were formerly held in a database called "NAKAI" and are now in one called "AAINDEX". The EMBOSS program aaindexextract takes data from this database and makes it available to pepwindow.
1. FTP the AAINDEX database from Japan:
ftp://ftp.genome.ad.jp/pub/db/genomenet/aaindex/aaindex1
2. Run aaindexextract with the aaindex1 file as input (or ask whoever installs EMBOSS to run it).
3. Run pepwindow with -datafile specifying the name of whatever "AAINDEX" datafile you wish to use. (Use embossdata -showall to see your available "AAINDEX" data file names.)
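For scripted use, step 3 can be wrapped in a small helper. The Python sketch below only builds the argument list, using qualifiers listed in the command line help above (-sequence, -graph, -datafile, -length, -auto). The helper name `build_pepwindow_command` is hypothetical, and 'Exxx.dat' is the same placeholder data file name used earlier in this document. Running the command is left to the caller, since it requires an EMBOSS installation.

```python
import shlex


def build_pepwindow_command(sequence, datafile="Enakai.dat", length=7, graph="cps"):
    """Build a non-interactive pepwindow command line as an argument list."""
    if not 1 <= length <= 200:
        raise ValueError("length must be an integer from 1 to 200")
    return [
        "pepwindow",
        "-sequence", sequence,
        "-graph", graph,
        "-datafile", datafile,
        "-length", str(length),
        "-auto",  # turn off prompts
    ]


if __name__ == "__main__":
    command = build_pepwindow_command("tsw:hba_human", datafile="Exxx.dat", length=11)
    # Print a shell-safe rendering; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems if the sequence reference or file name contains special characters.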
Kyte, J. and Doolittle, R.F. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132 (1982).
Program name | Description |
---|---|
backtranambig | Back translate a protein sequence to ambiguous codons |
backtranseq | Back translate a protein sequence |
charge | Protein charge plot |
checktrans | Reports STOP codons and ORF statistics of a protein |
compseq | Count composition of dimer/trimer/etc words in a sequence |
emowse | Protein identification by mass spectrometry |
freak | Residue/base frequency table or plot |
iep | Calculates the isoelectric point of a protein |
mwcontam | Shows molwts that match across a set of files |
mwfilter | Filter noisy molwts from mass spec output |
octanol | Displays protein hydropathy |
pepinfo | Plots simple amino acid properties in parallel |
pepstats | Protein statistics |
pepwindowall | Displays protein hydropathy of a set of sequences |
Based on the original program by Jack Kyte and Russell F. Doolittle.