genseq.pl

Usage

usage:   genseq.pl [options] [-pdb | -monsster | -one] [file]
options: [-out [one][sec] | monsster]
         [-sel from:to]
         [-s inx:sequence[=inx:sequence]]
         [-2ndpred file[:file...]]
         [-2ndone file]
         [-dssp] [-dsspfull] [-dsspnum]
         [-fill]

Description

The main purpose of this script is to generate a MONSSTER amino acid sequence file from a PDB file or a sequence string, but it can also perform some related functions.

The input format is selected with the -pdb (PDB file, default), -one (string of one-letter amino acid codes), or -monsster (MONSSTER sequence file) options. If a file name is specified, input is read from that file; otherwise it is read from standard input.
The output format is either a MONSSTER sequence file (default) or a string of one-letter amino acid codes, selected with -out monsster or -out one, respectively. With -out onesec, secondary structure abbreviations are printed along with the sequence. These options make it possible to generate a sequence file from a PDB file, and also to obtain an abbreviated sequence string from a sequence file, as shown in the examples below.
If the PDB file lacks part of the complete structure, as in typical loop modeling applications, the sequence for the missing parts can be given with the -s option. Its argument consists of the residue index at which the supplied sequence starts and an abbreviated sequence string, separated by a colon. As the usage line shows, several such index:sequence pairs can be joined with =.

Further options are available to include secondary structure information in the sequence file. If -dssp is specified, the secondary structure is taken from DSSP output. This requires a compiled version of DSSP in the MMTSB binary directory. With the option -2ndpred, a list of files containing output from common automated secondary structure prediction programs can be provided. If more than one file is given, a consensus prediction is determined from all predictions. Currently, prediction outputs from the following programs are recognized: PHD, SSpro, PSIpred, Jpred2, PSSP, PROF, Pred2ary. Finally, the secondary structure can be set from a file containing an abbreviated string with the option -2ndone.
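
How the consensus is determined is not described here. Purely as an illustration of the idea, the following Python sketch takes a per-residue majority vote over several prediction strings of equal length. The function name consensus, the use of E for strand, and the rule that ties fall back to U (coil/unknown) are all assumptions made for this sketch, not the documented behavior of genseq.pl.

```python
from collections import Counter


def consensus(predictions, fallback="U"):
    """Per-residue majority vote over equal-length prediction strings.

    Hypothetical helper: the voting and tie-breaking rules are assumptions.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        # A state wins only if it strictly outnumbers the runner-up;
        # otherwise the position is reported as the fallback state.
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            result.append(counts[0][0])
        else:
            result.append(fallback)
    return "".join(result)


if __name__ == "__main__":
    print(consensus(["UHHHU", "UHHUU", "HHHUU"]))
```

With a single prediction, this sketch simply returns that prediction unchanged.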

Options

-help: usage information
-pdb: read input in PDB format
-monsster: read input in MONSSTER sequence format
-one: read input as one-letter amino acid abbreviations
-out one|onesec|sec|monsster: specify output format
-sel from:to: output only limited residue range
-s inx:sequence[=...]: provide sequence at specific index (useful for filling in missing sequence from input)
-2ndpred file[:file...]: read secondary structure predictions from files
-2ndone file: read one-letter secondary structure code from file
-dssp: determine secondary structure with DSSP program

Examples

genseq.pl 1vii.exp.pdb
generates a MONSSTER sequence file from a PDB file

    1   MET    1    1
    2   LEU    1    1
    3   SER    1    1
    4   ASP    1    1
    5   GLU    1    1
    6   ASP    1    1
    7   PHE    1    1
    8   LYS    1    1
    9   ALA    1    1
   10   VAL    1    1

...

genseq.pl -out one -monsster 1vii.seq
generates an abbreviated sequence string from a MONSSTER sequence file

MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF
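
The conversion in this example can be approximated outside of genseq.pl. The Python sketch below assumes that each line of a MONSSTER sequence file is whitespace-separated with the three-letter residue name in the second column, as in the sample output shown in these examples. The function name monsster_to_one and the choice to map unrecognized residue names to X are assumptions of the sketch.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def monsster_to_one(text):
    """Return a one-letter sequence string from MONSSTER-style lines.

    Assumes the residue name is the second whitespace-separated field.
    """
    letters = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Unknown residue names become "X" (an assumption of this sketch).
        letters.append(THREE_TO_ONE.get(fields[1].upper(), "X"))
    return "".join(letters)


if __name__ == "__main__":
    sample = "    1   MET    1    1\n    2   LEU    1    1\n    3   SER    1    1\n"
    print(monsster_to_one(sample))
```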

genseq.pl -dssp 1vii.exp.pdb
generates a MONSSTER sequence file with secondary structure assignments calculated by DSSP (alpha: 2, beta: 4, coil/unknown: 1)

    1   MET    1    1
    2   LEU    1    1
    3   SER    1    1
    4   ASP    2    1
    5   GLU    2    1
    6   ASP    2    1
    7   PHE    2    1
    8   LYS    2    1
    9   ALA    1    1
   10   VAL    1    1

...
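
The numeric codes above (alpha: 2, beta: 4, coil/unknown: 1) can be illustrated with a short Python sketch that turns a one-letter secondary structure string into lines laid out like the sample output. The mapping of H to alpha and E to beta, the column widths, and the helper names are assumptions of the sketch. The trailing 1 in each line is copied from the sample output; its meaning is not described here.

```python
# Assumed letter-to-code mapping: H = alpha (2), E = beta (4),
# anything else = coil/unknown (1).
SS_TO_CODE = {"H": 2, "E": 4}


def ss_codes(ss_string):
    """Translate one-letter secondary structure states into numeric codes."""
    return [SS_TO_CODE.get(ch, 1) for ch in ss_string.upper()]


def format_lines(residue_names, ss_string):
    """Build lines shaped like the sample MONSSTER output (hypothetical helper)."""
    if len(residue_names) != len(ss_string):
        raise ValueError("sequence and secondary structure lengths differ")
    lines = []
    for index, (name, code) in enumerate(
        zip(residue_names, ss_codes(ss_string)), start=1
    ):
        # The final column is fixed at 1, as in the sample output.
        lines.append("%5d   %-3s%5d%5d" % (index, name, code, 1))
    return lines


if __name__ == "__main__":
    for line in format_lines(["MET", "LEU", "SER", "ASP"], "UUUH"):
        print(line)
```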

genseq.pl -out onesec -2ndpred phd.out:jpred2.out 1vii.exp.pdb
generates abbreviated sequence information from a PDB file and secondary structure information from the output of the PHD and Jpred2 secondary structure prediction servers

MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF
UUUUHHHHHHHHHHHHHHHHHHHHHHHHHHHHUUUU

genseq.pl -s 10:VFGMTRSAFANL 1vii.exp.x10:21.pdb
generates a MONSSTER sequence file from the given PDB file. The sequence for the missing residues starting at index 10 is inserted from the sequence string.

    1   MET    1    1
    2   LEU    1    1
    3   SER    1    1
    4   ASP    1    1
    5   GLU    1    1
    6   ASP    1    1
    7   PHE    1    1
    8   LYS    1    1
    9   ALA    1    1
   10   VAL    1    1

...
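
The effect of the -s option can be pictured with the following Python sketch, which parses an argument of the form inx:sequence[=inx:sequence] and merges the supplied residues into the residues already known from the structure. The helper names parse_segments and fill_missing are hypothetical, and the rule that residues present in the PDB file take precedence over supplied ones is an assumption, not documented behavior of genseq.pl.

```python
def parse_segments(arg):
    """Split 'inx:sequence[=inx:sequence]' into (start_index, sequence) pairs."""
    segments = []
    for part in arg.split("="):
        index, _, sequence = part.partition(":")
        segments.append((int(index), sequence.upper()))
    return segments


def fill_missing(known, segments):
    """Merge supplied segments into a {residue_index: one_letter_code} dict.

    Assumption: residues already present in the structure are kept as-is.
    """
    merged = dict(known)
    for start, sequence in segments:
        for offset, letter in enumerate(sequence):
            merged.setdefault(start + offset, letter)
    return merged


if __name__ == "__main__":
    # Residues 1-9 known from the structure; 10-21 supplied as in the example.
    known = {i + 1: c for i, c in enumerate("MLSDEDFKA")}
    merged = fill_missing(known, parse_segments("10:VFGMTRSAFANL"))
    print("".join(merged[i] for i in sorted(merged)))
```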