cluster.pl
Usage
usage:   cluster.pl [options] [files]
options: [-jclust] [-kclust]
         [-maxnum value] [-minsize value] [-maxlevel value]
         [-radius value] [-[no]iterate]
         [-mode rmsd|contact|phi|psi|phipsi|mix]
         [-contmaxdist value] [-mixfactor value]
         [-pdb | -sicho]
         [-selmode ca|cb|cab|heavy|all]
         [-l min:max[=min:max ...]]
         [-fitxl] [-[no]lsqfit]
         [-centroid] [-centout template]
         [-log file]
Description
This script applies a clustering algorithm to a set
of files. The list of files can be passed either as the last arguments
on the command line or through standard input. The clustering result
is written to standard output.
Two clustering methods are available. Hierarchical clustering
with automatic stopping criteria is used by default
or if the option -jclust is given.
With this method the ideal number of
clusters is determined automatically, up to a maximum number of clusters.
The default maximum is 4 clusters; it may be changed with -maxnum.
For each of these initially identified clusters the clustering procedure is
then reapplied to determine subclusters if the number
of elements in a cluster is larger than a given threshold. This
threshold is set to 400 structures by default and can be changed
with -minsize. The procedure is repeated recursively as long as there
are large enough subclusters or until a maximum recursion level is
reached. The level is set to 999 by default, which in practice means
no limit; it can be changed with -maxlevel.
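The recursion described above can be sketched in Python as follows. This is only an illustration of the control flow, not the script's actual implementation: the function names, the nested dictionary layout, and the toy halve splitter are all hypothetical, and the reading that -maxlevel counts clustering levels starting at 1 is an assumption.

```python
def recursive_cluster(items, split, maxnum=4, minsize=400, maxlevel=999, level=1):
    """Build a nested cluster tree (illustrative sketch, not cluster.pl's code).

    split(items, maxnum) is a caller-supplied, hypothetical function that
    returns at most maxnum groups. The defaults mirror the documented
    defaults of -maxnum, -minsize and -maxlevel.
    """
    node = {"members": list(items), "children": []}
    # Assumption: the top level is always clustered; deeper levels are only
    # clustered when they hold more than minsize elements.
    if level > 1 and len(node["members"]) <= minsize:
        return node
    # Assumption: maxlevel counts clustering levels, starting at 1.
    if level > maxlevel:
        return node
    parts = split(node["members"], maxnum)
    if len(parts) < 2:
        return node  # nothing to subdivide
    node["children"] = [
        recursive_cluster(part, split, maxnum, minsize, maxlevel, level + 1)
        for part in parts
    ]
    return node


def halve(items, maxnum):
    """Toy splitter for demonstration only: cut the list into two halves."""
    mid = len(items) // 2
    return [items[:mid], items[mid:]] if mid else [items]


if __name__ == "__main__":
    tree = recursive_cluster(list(range(8)), halve, minsize=2)
    print(len(tree["children"]), "top-level clusters")
```

A real splitter would group structures by their pairwise distances; the toy one only demonstrates how the size threshold and the level limit end the recursion.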
An alternative clustering method employs a fixed cluster radius and
is selected with -kclust. Subclusters are not generated in this
case, and the resulting number of clusters cannot be limited with
-maxnum. The fixed cluster radius is given with -radius.
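To illustrate what clustering with a fixed radius means, here is a minimal Python sketch of a simple leader-style scheme. It is an assumption made for illustration; the algorithm actually used by -kclust is not described here and may differ. The function name and the distance callback are hypothetical.

```python
def fixed_radius_cluster(items, distance, radius):
    """Group items so each member lies within radius of its cluster's first
    member (illustrative leader-style scheme, not necessarily what -kclust does).
    """
    clusters = []  # list of (representative, members) pairs
    for item in items:
        for representative, members in clusters:
            if distance(item, representative) <= radius:
                members.append(item)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append((item, [item]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    values = [0.0, 1.0, 2.5, 10.0, 11.0]
    print(fixed_radius_cluster(values, lambda a, b: abs(a - b), 3.0))
```

Because the radius is fixed, the number of clusters follows from the data rather than from a limit such as -maxnum, which matches the behavior described above.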
Depending on the cluster method, different cluster modes that
use different criteria for measuring distances between the
input structures are available. The cluster mode is selected with
-mode. Both cluster methods support clustering based on
Cartesian coordinate RMSD between structures. With the hierarchical
clustering algorithm it is also possible to cluster based on similarities
in the contact map (contact).
In this case the option -contmaxdist determines the maximum distance
between residues for being counted in the comparison. The default
value is 12.0 A.
The fixed radius clustering method supports dihedral based clustering
using RMSD values for phi angles (phi), psi angles (psi),
or both (phipsi). A mixing mode (mix) is also supported, in which the
distance measure is the sum of two terms: the sum of the phi and psi RMSD
values, divided by 20 and multiplied by a mixing factor, and the Cartesian
space RMSD multiplied by 1 minus the mixing factor.
The mixing factor can be set with -mixfactor.
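The mixed distance measure can be written out as a short Python sketch. The formula follows one reading of the description above (the two weighted terms are added); the function names are hypothetical, and the wrap-around handling of angle differences in dihedral_rmsd is an assumption, since the treatment of periodicity is not stated.

```python
import math


def dihedral_rmsd(angles_a, angles_b):
    """RMSD in degrees between two equal-length lists of dihedral angles.

    Assumption: differences are wrapped into [-180, 180) so that 170 and
    -170 degrees count as 20 degrees apart.
    """
    if len(angles_a) != len(angles_b) or not angles_a:
        raise ValueError("angle lists must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(angles_a, angles_b):
        diff = (a - b + 180.0) % 360.0 - 180.0
        total += diff * diff
    return math.sqrt(total / len(angles_a))


def mix_distance(phi_rmsd, psi_rmsd, cartesian_rmsd, mixfactor):
    """Combine dihedral and Cartesian RMSD values as described for mode mix."""
    dihedral_term = mixfactor * (phi_rmsd + psi_rmsd) / 20.0
    cartesian_term = (1.0 - mixfactor) * cartesian_rmsd
    return dihedral_term + cartesian_term


if __name__ == "__main__":
    print(mix_distance(30.0, 50.0, 2.0, 0.5))
```

With a mixing factor of 0 the measure reduces to the Cartesian RMSD, and with a factor of 1 it depends only on the dihedral angles. The division by 20 brings the summed dihedral RMSD values, in degrees, closer to the scale of the Cartesian RMSD.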
In the default mode of operation the input structures are expected
to be in PDB format (option: -pdb). Alternatively, SICHO lattice
chains can be used as input if -sicho is given.
For loop/fragment modeling it is possible to restrict the comparison
to a range of residues specified with the -l option. In this case
the fit between different structures before an RMSD value is calculated
may either be based on the loop/fragment residues (default) or
on the rest of the protein excluding the loop/fragment if -fitxl is
specified.
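The range syntax of -l and the effect of -fitxl on the residues used for the fit can be illustrated with a small Python sketch. The helper names are hypothetical, and treating each min:max range as inclusive is an assumption based on the usage string and the examples; this is not the script's own parsing code.

```python
def parse_ranges(spec):
    """Parse a string such as '10:21=30:35' into [(10, 21), (30, 35)]."""
    ranges = []
    for part in spec.split("="):
        low_text, high_text = part.split(":")
        low, high = int(low_text), int(high_text)
        if low > high:
            raise ValueError(f"invalid range {part!r}: min exceeds max")
        ranges.append((low, high))
    return ranges


def in_ranges(residue, ranges):
    """True if the residue number falls inside any (inclusive) range."""
    return any(low <= residue <= high for low, high in ranges)


def fit_residues(all_residues, ranges, fitxl=False):
    """Residues used for the fit: the selected ranges by default,
    or everything outside them when fitxl is True."""
    return [r for r in all_residues if in_ranges(r, ranges) != fitxl]


if __name__ == "__main__":
    loop = parse_ranges("10:21")
    print(fit_residues(range(1, 37), loop, fitxl=True))
```

In both cases the RMSD itself is calculated over the residues selected with -l; only the residues used for the superposition change.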
A log file can be requested if the option -log is used.
It is also possible to request output of the centroids of each cluster
with -centroid. The option -centout is used to provide
a template for the centroid file names. The centroids are written in PDB
format if the clustering is based on RMSD comparison. For contact map based
clustering the centroids are RGB maps with the contact map represented
as a bitmap.
Options
- -help: usage information
Examples
cluster.pl 1vii.sample.*.pdb
performs hierarchical clustering of the input files
1vii.sample.*.pdb based on mutual RMSD.
# cluster file
# automatically generated on: Tue Sep 25 14:08:51 2001
# mode: rmsd, filetype: pdb, lsqfit: 1, selmode: cab
@cluster t has 16 elements, 3 subclusters
1 1vii.sample.1.pdb
2 1vii.sample.10.pdb
3 1vii.sample.11.pdb
4 1vii.sample.12.pdb
5 1vii.sample.13.pdb
6 1vii.sample.14.pdb
...
cluster.pl -maxnum 3 -mode contact -contmaxdist 8.0 1vii.sample.*.pdb
performs hierarchical clustering of the input files
1vii.sample.*.pdb based on differences
in the residue contact maps allowing a maximum of 3 clusters.
Residue separations of more
than 8 A are ignored in the comparison.
# cluster file
# automatically generated on: Tue Sep 25 14:09:23 2001
# mode: contact, filetype: pdb, lsqfit: 1, selmode: cab
@cluster t has 16 elements, 2 subclusters
1 1vii.sample.1.pdb
2 1vii.sample.10.pdb
3 1vii.sample.11.pdb
4 1vii.sample.12.pdb
5 1vii.sample.13.pdb
6 1vii.sample.14.pdb
...
cluster.pl -maxnum 3 -minsize 5 -l 10:21 -fitxl -log cluster.log 1vii.sample.*.pdb
performs hierarchical clustering of the input files
1vii.sample.*.pdb based on mutual RMSD
for residues 10 through 21. Before each RMSD value
is calculated, the two structures being compared are
superimposed with a least squares fit of the rest of the protein,
excluding residues 10 through 21.
At each clustering level
a maximum of three clusters are selected and
subclusters are identified for clusters containing
5 or more structures.
A log file cluster.log is produced.
# cluster file
# automatically generated on: Tue Sep 25 14:10:13 2001
# mode: rmsd, filetype: pdb, lsqfit: 1, selmode: cab
@cluster t has 16 elements, 2 subclusters
1 1vii.sample.1.pdb
2 1vii.sample.10.pdb
3 1vii.sample.11.pdb
4 1vii.sample.12.pdb
5 1vii.sample.13.pdb
6 1vii.sample.14.pdb
...
cluster.pl -maxnum 3 -minsize 10 -centroid -centout sample 1vii.sample.*.pdb
performs hierarchical clustering of the input files
1vii.sample.*.pdb based on mutual RMSD.
At each clustering level
a maximum of three clusters are selected and
subclusters are identified for clusters containing
10 or more structures.
In addition the centroids of each cluster are written
out in PDB format to files beginning with sample.
# cluster file
# automatically generated on: Mon Jun 18 10:09:08 2001
# mode: rmsd, filetype: pdb, lsqfit: 1, selmode: cab
@cluster t has 16 elements, 3 subclusters
1 1vii.sample.1.pdb
2 1vii.sample.10.pdb
3 1vii.sample.11.pdb
4 1vii.sample.12.pdb
5 1vii.sample.13.pdb
6 1vii.sample.14.pdb
...
cluster.pl -kclust -radius 3 1vii.sample.*.pdb
performs fixed radius clustering of the input files
1vii.sample.*.pdb based on mutual RMSD.
The radius is set to 3 A.
# cluster file
# automatically generated on: Tue Sep 25 14:10:58 2001
# mode: rmsd, filetype: pdb, lsqfit: 1, selmode: cab
@cluster t has 16 elements, 8 subclusters
1 1vii.sample.1.pdb
2 1vii.sample.10.pdb
3 1vii.sample.11.pdb
4 1vii.sample.12.pdb
5 1vii.sample.13.pdb
6 1vii.sample.14.pdb
...
cluster.pl -kclust -mode phi -radius 30 1vii.sample.*.pdb
performs fixed radius clustering of the input files
1vii.sample.*.pdb based on phi dihedral RMSD.
The radius is set to 30 degrees.
# cluster file
# automatically generated on: Tue Sep 25 14:12:19 2001
# mode: phi, filetype: pdb, lsqfit: 1, selmode: cab
@cluster t has 16 elements, 7 subclusters
1 1vii.sample.1.pdb
2 1vii.sample.10.pdb
3 1vii.sample.11.pdb
4 1vii.sample.12.pdb
5 1vii.sample.13.pdb
6 1vii.sample.14.pdb
...