Constraint-based Protein Structure Prediction (CPSP)
CPSP is a collection of tools for predicting optimal structures in simple
lattice-protein models such as the HP-model. Both the standard backbone models
and side chain models are supported.
The project is part of our research field Simplified Protein Models.
Constraint solving relies on the Gecode library, which is required to build
and run the tools.
Available tools
- HPstruct - Optimal structure prediction in 3D-lattices
- HPrep - Equivalence class representatives in 3D-lattices
- HPdeg - Degeneracy of HP-sequences
- HPoptdeg - Search for low-degeneracy HP-sequences
- HPdesign - HP-sequence design for given structure
- HPnnet - Neutral nets of HP-sequences
- HPrand - Random HP-sequence generator
- HPcompress - HP-sequence (de-)compression
- HPconvert - Lattice structure representation conversion
- HPview - HP-model lattice structure viewer
- HPseq - Converts amino acid into HP sequences
Main Publications
- Martin Mann, Sebastian Will, and Rolf Backofen.
CPSP-tools - Exact and Complete Algorithms for High-throughput 3D Lattice Protein Studies.
In BMC Bioinformatics, 9, 230, 2008.
- Martin Mann, Rolf Backofen, and Sebastian Will.
Equivalence classes of optimal structures in HP protein models including side chains.
In Proceedings of the Fifth Workshop on Constraint Based Methods for
Bioinformatics (WCB09), 2009.
- Martin Mann, Cameron Smith, Mohamad Rabbath, Marlien Edwards, Sebastian Will,
and Rolf Backofen.
CPSP-web-tool: a server for 3D lattice protein studies.
In Bioinformatics, 25 no. 5 pp. 676-677, 2009.
- Sebastian Will and Martin Mann.
Counting protein structures by DFS with dynamic decomposition.
In Proc. of the Workshop on Constraint Based Methods for
Bioinformatics, page 6, 2006.
- Rolf Backofen and Sebastian Will.
A constraint-based approach to fast and exact structure prediction in
three-dimensional protein models.
In Journal of Constraints, 11 no. 1 pp. 5-30, January 2006.
- Sebastian Will.
Exact, Constraint-Based Structure Prediction in Simple Protein Models.
PhD thesis, Friedrich-Schiller-Universität Jena, April 2005.
Documentation
- We provide an extensive Frequently Asked Questions (FAQ) section on our
CPSP-web-tools server.
- The HTML documentation is available online.
- Run 'make doc' to create a local copy (requires an installed doxygen).
- First steps:
- download, make and install the source
- download the H-core database (or use the included database 'CoreDB')
- run 'HPstruct -dbPath=... -seq=...' for structure prediction
- run 'HPstruct -help' for a list of all program parameters
Dependencies
Downloads
- (1/2) the CPSP-tool-library source code including configure scripts
- cpsp-2.4.6.tar.gz
- 2012-02-29 - BIU(2.3.5), Gecode(1.3.0)
Bundle: cpsp-bundle-2.4.6.tar.gz
- 2012-02-29 - includes [adjusted] BIU and Gecode libraries and respective compiler settings
- cpsp-2.4.5.tar.gz
- 2011-01-19 - BIU(2.3.5), Gecode(1.3.0)
Bundle: cpsp-bundle-2.4.5.tar.gz
- 2011-01-19 - includes BIU and Gecode library
- cpsp-2.4.4.tar.gz
- 2010-06-25 - BIU(1.3.0), Gecode(1.3.0)
Bundle: cpsp-bundle-2.4.4.tar.gz
- 2010-06-25 - includes BIU and Gecode library
- cpsp-2.4.2.tar.gz
- 2009-08-18 - BIU(1.3.0), Gecode(1.3.0)
Bundle: cpsp-bundle-2.4.2.tar.gz
- 2009-08-18 (update 2010-06-23) - includes BIU and Gecode library
- Bundle: cpsp-bundle-2.3.0.tar.gz
- 2008-02-25 - includes BIU and Gecode library
- cpsp-2.2.4.tar.gz
- 2008-02-10 - BIU(1.3.0), Gecode(1.3.0)
- cpsp-2.1.2.tar.gz
- 2007-11-14 - BIU(1.3.0), Gecode(1.3.1)
- cpsp-2.1.0.tar.gz
- 2007-10-22 - BIU(1.3.0), Gecode(1.3.1)
Note: Gecode-1.3.1 crashes on 64-bit systems. There, you can use Gecode-1.3.0 instead.
Bundle: full cpsp-bundle-2.1.0.tar.gz
- includes BIU and Gecode library
- cpsp-2.0.0.tar.gz
- 2007-07-11 - BIU(1.2.2), Gecode(1.3.1), boost(1.33.x)
Bundle: cpsp-bundle-2.0.0.tar.gz
- includes BIU and Gecode library
- cpsp-1.4.0.tar.gz
- 2007-06-18 - BIU(1.1), ELL(0.3.2), Gecode(1.3.1), boost(1.33.x)
- cpsp-1.3.1.tar.gz
- 2007-03-29 - BIU(1.1), ELL(0.3.2), Gecode(1.3.1)
- cpsp-1.3.tar.gz
- 2007-03-15 - BIU(1.1), ELL(0.3.2), Gecode(1.3.1)
- cpsp-1.2.tar.gz
- 2007-03-14 - BIU(1.0), ELL(0.2), Gecode(1.3.0)
- cpsp-1.1.tar.gz
- 2006-11-20
- optimal and suboptimal H-cores of size 3-10 are included (needed for
HPstruct); for further cores, please use the links below or mail me
- (2/2) the optimal and suboptimal H-core data needed for most CPSP-tools
Contributing group members
HPstruct - Optimal structure prediction and counting
HPstruct predicts optimal structures for simple 3D-lattice proteins (HP-model).
It implements the final step of the CPSP-approach of Rolf Backofen and
Sebastian Will.
For a given HP-sequence, HPstruct computes a list of optimal structures
(given as absolute moves on the lattice) or counts them.
With the v2.2.* extension of the CPSP-package, the prediction of optimal
structures in the HP side chain model is supported as well.
HPstruct can generate a single optimal structure, all optimal structures, or
all available structures (limited by the size of the H-core database).
H-core files for sizes 3-10 are included; for further H-core files, please use
the download links provided or mail me.
To obtain a representative sample for highly degenerate sequences, the
solution structures can be constrained to differ from each other in at least
x positions of the absolute move string or in at least x lattice positions.
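The two distance notions can be illustrated with a minimal sketch. How HPstruct enforces the minimal distance internally is not described here, so the greedy post-filter below is only an assumption for illustration; the function names are hypothetical and not part of the CPSP tools.

```python
def move_distance(a, b):
    """Number of positions at which two equal-length absolute move strings differ."""
    if len(a) != len(b):
        raise ValueError("move strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def position_distance(p, q):
    """Number of monomers that sit on different lattice points in two structures."""
    if len(p) != len(q):
        raise ValueError("structures must have the same number of monomers")
    return sum(u != v for u, v in zip(p, q))


def filter_diverse(move_strings, min_diff):
    """Greedy filter (illustrative only): keep a structure if it differs from
    every structure kept so far in at least min_diff move positions."""
    kept = []
    for candidate in move_strings:
        if all(move_distance(candidate, k) >= min_diff for k in kept):
            kept.append(candidate)
    return kept
```

For example, `filter_diverse(["FLU", "FRU", "BRD"], 2)` drops "FRU" because it differs from "FLU" in only one move.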
To see the full parameter list run the tool using '-help'.
Current status
- Support of side chain models
- Minimal distance for generated structures can be constrained
in terms of minimal differences in absolute positions or
moves (see documentation)
- Symmetry breaking - no generation or counting of symmetric
structures
- Support of cubic and face centered cubic lattice
- Binary neighboring constraints on the lattice
- Global Alldifferent constraint
- Minimal domain initialisation (hulls, P-singlet positions)
- H-core access via file based database
- H-core skipping due to insufficient P-singlet positions
- Cubic H-core skipping due to wrong even/odd position ratio
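The even/odd criterion in the last item can be sketched as follows. This is our reading of the criterion for backbone models, not the actual CPSP code: in the cubic lattice every move flips the parity of x+y+z, so all H-monomers at even sequence indices share one parity class and all at odd indices the other. An H-core whose even/odd point counts do not match the sequence (in either assignment) cannot be threaded. The function names are hypothetical.

```python
def sequence_parity_counts(seq):
    """Sorted counts of H-monomers at even and at odd sequence indices."""
    even = sum(1 for i, c in enumerate(seq) if c == "H" and i % 2 == 0)
    odd = sum(1 for i, c in enumerate(seq) if c == "H" and i % 2 == 1)
    return sorted((even, odd))


def core_parity_counts(points):
    """Sorted counts of core points with even and with odd coordinate sum."""
    even = sum(1 for (x, y, z) in points if (x + y + z) % 2 == 0)
    return sorted((even, len(points) - even))


def core_may_fit(seq, points):
    """Necessary (not sufficient) condition for an H-core in the cubic lattice."""
    return sequence_parity_counts(seq) == core_parity_counts(points)
```

The counts are sorted because the structure may start on either parity class, so only the unordered pair of counts matters.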
HPrep - Equivalence class representatives with minimal energy
HPrep enables the enumeration of equivalence class representatives of
optimal structures. It implements the definitions and methods introduced in
Equivalence classes of optimal structures in HP protein models including side chains.
Here, two structures are defined to be equivalent if they do not differ
in their H-monomer placement. Thus, the equivalence definition follows the
HP energy function that does not constrain P-monomers.
HPrep enumerates one representative structure for each equivalence class
among all optimal structures for a given HP-sequence. The maximal number
of structures to enumerate can be restricted.
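The equivalence definition above can be sketched for backbone models as follows. The sketch assumes that "H-monomer placement" means the lattice position of each individual H-monomer and that structures are given as position lists; HPrep itself enumerates representatives directly rather than filtering a list, and the function names are hypothetical.

```python
def h_placement(seq, positions):
    """Positions of all H-monomers, in sequence order."""
    return tuple(pos for c, pos in zip(seq, positions) if c == "H")


def class_representatives(seq, structures):
    """Keep the first structure seen for each distinct H-monomer placement."""
    representatives = {}
    for structure in structures:
        representatives.setdefault(h_placement(seq, structure), structure)
    return list(representatives.values())
```

Two structures that differ only in where a P-monomer is placed end up in the same class and yield a single representative.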
To see the full parameter list run the tool using '-help'.
HPdeg - Degeneracy of HP-sequences
HPdeg calculates the degeneracy of a given HP-sequence. This is the number
of optimal structures the sequence can adopt in a specific lattice.
For the calculation, the final step of the CPSP-approach of Rolf Backofen and
Sebastian Will is used, as in HPstruct.
To handle highly degenerate sequences and to allow testing against a
maximal degeneracy, the count can be limited by an upper bound.
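The definition of degeneracy can be made concrete with a brute-force sketch for very short sequences in the cubic lattice. This is not the CPSP approach (which uses H-cores and constraint solving and scales far better); it simply enumerates all self-avoiding walks starting at the origin. Unlike the CPSP tools, it does not exclude symmetric structures, so its counts include rotations and reflections. All names are hypothetical.

```python
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def energy(seq, pos):
    """HP energy: -1 for each pair of non-consecutive H-monomers on adjacent points."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H":
                if sum(abs(a - b) for a, b in zip(pos[i], pos[j])) == 1:
                    e -= 1
    return e


def degeneracy(seq):
    """Return (minimal energy, number of structures with that energy)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    best, count = 0, 0  # energies are never positive

    def extend(pos, occupied):
        nonlocal best, count
        if len(pos) == len(seq):
            e = energy(seq, pos)
            if e < best:
                best, count = e, 1
            elif e == best:
                count += 1
            return
        x, y, z = pos[-1]
        for dx, dy, dz in NEIGHBORS:
            nxt = (x + dx, y + dy, z + dz)
            if nxt not in occupied:  # self-avoidance
                occupied.add(nxt)
                pos.append(nxt)
                extend(pos, occupied)
                pos.pop()
                occupied.remove(nxt)

    extend([(0, 0, 0)], {(0, 0, 0)})
    return best, count
```

The running time grows exponentially with the sequence length, which is why this sketch is only useful for sequences of a handful of monomers.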
To see the full parameter list run the tool using '-help'.
HPoptdeg - Search for low-degeneracy HP-sequences
The degeneracy of HP-sequences forms funnel-like structures in the sequence
space. Local search algorithms are therefore a suitable way to find local
minima.
HPoptdeg performs a Monte-Carlo search in the sequence space to find
HP-sequences with low degeneracy.
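A generic Monte-Carlo search of this kind might look like the sketch below. The move set (flipping a single position), the Metropolis acceptance rule, and the parameters `steps`, `temperature`, and `seed` are assumptions made for illustration; the details of the search used by HPoptdeg are not given here. The `degeneracy` argument is a hypothetical callback that returns the degeneracy of a sequence.

```python
import math
import random


def monte_carlo_search(start, degeneracy, steps=1000, temperature=1.0, seed=0):
    """Search the sequence space for a sequence of low degeneracy."""
    rng = random.Random(seed)
    current, cur_deg = start, degeneracy(start)
    best, best_deg = current, cur_deg
    for _ in range(steps):
        # Propose a neighbor: flip one randomly chosen position H <-> P.
        i = rng.randrange(len(current))
        flipped = "P" if current[i] == "H" else "H"
        candidate = current[:i] + flipped + current[i + 1:]
        cand_deg = degeneracy(candidate)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        accept = cand_deg <= cur_deg or rng.random() < math.exp(
            (cur_deg - cand_deg) / temperature
        )
        if accept:
            current, cur_deg = candidate, cand_deg
            if cur_deg < best_deg:
                best, best_deg = current, cur_deg
    return best, best_deg
```

Accepting occasional worse moves lets the search leave shallow local minima, at the cost of more degeneracy evaluations.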
To see the full parameter list run the tool using '-help'.
HPdesign - HP-sequence design for given structure
HPdesign addresses the design of HP-sequences that adopt a given structure as
an optimal one and have a degeneracy below a given upper bound.
The approach first uses a precalculated database of H-cores to detect
sequences that can adopt the structure as an optimal one. Afterwards,
the degeneracy of the sequences is checked using the CPSP approach of
R. Backofen and S. Will.
The level of suboptimal H-cores taken into account can be restricted to
speed up the search. If no sequence is found, increase this level so that
more candidate sequences are tested.
Additionally, the H-content of the sequence can be constrained in order to
restrict the enumerated sequences.
To see the full parameter list run the tool using '-help'.
HPnnet - Neutral nets of HP-sequences
A neutral net for a given sequence S and its only optimal structure X
contains all sequences S' that also adopt X as their only optimal structure.
In addition, every such S' has to be a direct or indirect neighbor of S.
Two sequences are neighbors if they differ in exactly one sequence position.
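The neighborhood relation and the resulting net can be sketched as a breadth-first search. The predicate `folds_uniquely_into_x` is a hypothetical placeholder for the degeneracy and structure check; it is an assumption of this sketch and not a function of the CPSP tools.

```python
from collections import deque


def neighbors(seq):
    """All sequences that differ from seq in exactly one position."""
    for i, c in enumerate(seq):
        yield seq[:i] + ("P" if c == "H" else "H") + seq[i + 1:]


def neutral_net(start, folds_uniquely_into_x):
    """Collect all sequences reachable from start via accepted neighbors."""
    net = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for candidate in neighbors(current):
            if candidate not in net and folds_uniquely_into_x(candidate):
                net.add(candidate)
                queue.append(candidate)
    return net
```

Because only accepted sequences are expanded further, every member of the result is connected to the start sequence by a chain of single-position changes.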
HPnnet uses the CPSP approach of R. Backofen and S. Will to check the
degeneracy of a sequence neighbor and, if the degeneracy is 1, to compare its
optimal structure to X. By default, symmetric structures are excluded, but
they can be included on demand.
To relax the degeneracy criterion, the maximal degeneracy allowed can be
increased.
To see the full parameter list run the tool using '-help'.
HPrand - Random HP-sequence generator
HPrand generates random HP-sequences of a given length that can be
constrained in terms of H-monomer content.
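A minimal sketch of such a generator is shown below. It assumes the H-monomer content is given as an exact count; how HPrand expresses this constraint is not stated here, and the function name and parameters are hypothetical.

```python
import random


def random_hp_sequence(length, h_count, rng=None):
    """Random HP-sequence of the given length with exactly h_count H-monomers."""
    if not 0 <= h_count <= length:
        raise ValueError("h_count must be between 0 and length")
    rng = rng or random.Random()
    # Choose the H positions uniformly without replacement.
    h_positions = set(rng.sample(range(length), h_count))
    return "".join("H" if i in h_positions else "P" for i in range(length))
```

Passing a seeded `random.Random` instance makes the output reproducible.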
To see the full parameter list run the tool using '-help'.
HPview - HP-model lattice structure viewer
HPview writes a sequence/structure in CML-file format that can be viewed
with molecule viewers like Chime or Jmol. HPview can also call such an
external viewer directly.
The structure is NOT validated (i.e. not checked for being connected and
self-avoiding). For invalid structures, normal execution cannot be guaranteed.
The move string representation follows the encoding:
- F/B = +- x
- L/R = +- y
- U/D = +- z
Currently supported viewers for direct visualization are:
To see the full parameter list run the tool using '-help'.
HPcompress - HP-sequence (de-)compression
HPcompress allows the conversion of HP-sequences between normal/expanded
representation and a compressed one.
e.g. HHHHPPPPPH <--> 4H5PH
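The conversion in the example above amounts to run-length encoding. The sketch below reproduces that example; details beyond it (such as how other input is handled) are assumptions, and the function names are hypothetical.

```python
import re
from itertools import groupby


def compress(seq):
    """Run-length encode an HP-sequence; single monomers carry no count."""
    parts = []
    for symbol, run in groupby(seq):
        n = len(list(run))
        parts.append(f"{n}{symbol}" if n > 1 else symbol)
    return "".join(parts)


def decompress(packed):
    """Expand a run-length encoded HP-sequence; a missing count means 1."""
    return "".join(
        symbol * int(count or 1)
        for count, symbol in re.findall(r"(\d*)([HP])", packed)
    )
```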
To see the full parameter list run the tool using '-help'.
HPconvert - Lattice structure representation conversion
HPconvert converts lattice structures between different formats.
Currently supported representation formats are:
- Absolute move string
- Relative move string
- Absolute monomer positions given in XYZ-file format
The move string representation follows the encoding:
- F/B = +- x
- L/R = +- y
- U/D = +- z
The given structure is not validated (i.e. not checked for being connected
and self-avoiding). For invalid structures, normal tool execution cannot be
guaranteed.
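Since the tool does not validate its input, a structure can be checked beforehand. The sketch below converts an absolute move string to lattice positions and tests the two properties for the cubic lattice. It assumes that the first letter of each pair in the encoding is the positive direction (F = +x, B = -x, and so on) and that the structure starts at the origin; the function names are hypothetical.

```python
MOVES = {
    "F": (1, 0, 0), "B": (-1, 0, 0),
    "L": (0, 1, 0), "R": (0, -1, 0),
    "U": (0, 0, 1), "D": (0, 0, -1),
}


def moves_to_positions(moves, start=(0, 0, 0)):
    """Absolute monomer positions for an absolute move string."""
    positions = [start]
    for move in moves:
        dx, dy, dz = MOVES[move]
        x, y, z = positions[-1]
        positions.append((x + dx, y + dy, z + dz))
    return positions


def is_connected(positions):
    """Consecutive monomers occupy neighboring points of the cubic lattice."""
    return all(
        sum(abs(a - b) for a, b in zip(p, q)) == 1
        for p, q in zip(positions, positions[1:])
    )


def is_self_avoiding(positions):
    """No lattice point is occupied by more than one monomer."""
    return len(set(positions)) == len(positions)
```

A structure built from a move string is connected by construction, so `is_connected` is mainly useful for position lists read from an XYZ file.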
The XYZ-file format looks like this:
# Beginning with '#' marks a comment line
# The lattice positions x,y and z of each point are given
# in integer coding e.g.
0 1 0
1 1 0
1 1 -1
# EOF #
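A reader for this format can be sketched in a few lines. It assumes whitespace-separated integer coordinates, one point per line, as in the example above; skipping blank lines is an additional assumption, and the function name is hypothetical.

```python
def parse_xyz(text):
    """Read lattice positions from XYZ-format text, ignoring '#' comment lines."""
    positions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        x, y, z = (int(value) for value in line.split())
        positions.append((x, y, z))
    return positions
```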
To see the full parameter list run the tool using '-help'.
HPseq - Converts amino acid into HP sequences
HPseq implements the method introduced by Kyte and Doolittle (1982) to
derive an HP sequence from a 20 letter code amino acid sequence using
hydrophobicity tables. At each position the average hydrophobicity over
a given span is calculated. Based on a given threshold the position
is then classified as (H)ydrophobic or (P)olar.
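The windowed classification can be sketched with the Kyte-Doolittle hydropathy scale. The default span, the threshold of 0, the centered window, and the truncation of the window at the sequence ends are assumptions of this sketch and may differ from what HPseq does; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_hp(protein, span=5, threshold=0.0):
    """Classify each residue as H or P by its windowed average hydropathy."""
    half = span // 2
    result = []
    for i in range(len(protein)):
        # Centered window, truncated at the sequence ends (assumption).
        window = protein[max(0, i - half): i + half + 1]
        average = sum(KYTE_DOOLITTLE[aa] for aa in window) / len(window)
        result.append("H" if average > threshold else "P")
    return "".join(result)
```

The input is expected in upper-case one-letter code; other characters raise a `KeyError`.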
Several hydrophobicity tables are implemented, as listed by CLC bio.