Bioinformatics
Institute of Computer Science
University Freiburg

Data

Sequence Data - HP sequence classification via folding properties

We used a classification procedure based on thermodynamic and kinetic features to identify protein-like sequences in the 3D-cubic HP-model. The following properties are tested:

These properties ensure a thermodynamically stable native structure (the unique mfe) and the ability to fold into this functional conformation within a short time interval, as required by the short life cycles of biomolecules. Furthermore, the sequential assembly of proteins is considered. There is evidence for co-translational folding during elongation, which should restrict the accessible folding space. Thus we are only interested in sequences that are able to form their native structure via sequential folding without high energy barriers in the traversed energy landscape.
A sequence fulfilling all criteria is called protein-like. If the ground state is not reachable sequentially but is reachable via global folding at a high rate, the sequence is classified as a good folder. Bad folders are not able to adopt the native structure in a short time interval. All checked sequences are non-degenerate, i.e. they have a unique ground state.
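The three classes described above can be read as a simple decision rule. The following is only a minimal sketch of that rule, not the procedure used to generate the data: the names `FoldingClass` and `classify` and the three boolean inputs are hypothetical, and how each property is actually measured (folding simulations, barrier estimates, time limits) is left out entirely.

```python
from enum import Enum


class FoldingClass(Enum):
    PROTEIN_LIKE = "protein-like"
    GOOD_FOLDER = "good folder"
    BAD_FOLDER = "bad folder"


def classify(unique_ground_state: bool,
             reachable_sequentially: bool,
             reachable_globally_fast: bool) -> FoldingClass:
    """Map three hypothetical boolean test results to a folding class.

    unique_ground_state:     the sequence is non-degenerate (unique mfe structure)
    reachable_sequentially:  the native structure is formed quickly via sequential
                             folding without high energy barriers
    reachable_globally_fast: the native structure is reached at a high rate via
                             global folding of the full-length chain
    """
    if not unique_ground_state:
        # Only non-degenerate sequences are part of the data set.
        raise ValueError("degenerate sequences are not classified")
    if reachable_sequentially:
        return FoldingClass.PROTEIN_LIKE
    if reachable_globally_fast:
        return FoldingClass.GOOD_FOLDER
    return FoldingClass.BAD_FOLDER


if __name__ == "__main__":
    # Not reachable sequentially, but fast via global folding.
    print(classify(True, False, True).value)
```

The sequential test is checked first because a protein-like sequence has to fulfil all criteria, while a good folder only misses the sequential one.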

Main Publications

HP in unrestricted 3D-cubic

Benchmark set for Protein Chain Lattice Fitting (PCLF) Problem

This is the benchmark set of high-resolution protein structures used for benchmarking tools that solve the Protein Chain Lattice Fitting (PCLF) problem (see publication below).
The test set was taken from the PISCES web server (Wang and Dunbrack, 2005). We enforced a 40% sequence identity cutoff, a chain length of 50–300, an R-factor ≤ 0.3, and a resolution ≤ 1.5 Å to derive a high-quality set of proteins to model. Given our requirement for side chains, Cα-only chains were ignored. The resulting benchmark set contains 1198 proteins with a mean length of 160.
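The per-chain selection criteria listed above can be expressed as a short filter. This is a sketch under assumptions: the `ChainRecord` type and its field names are hypothetical and do not correspond to any PISCES file format, and the 40% sequence identity culling, which PISCES performs across the whole set, is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ChainRecord:
    """Hypothetical per-chain metadata; field names are illustrative only."""
    chain_id: str
    length: int          # number of residues
    resolution: float    # in Angstrom
    r_factor: float
    ca_only: bool        # True if only C-alpha atoms are resolved


def passes_benchmark_filter(rec: ChainRecord) -> bool:
    """Apply the per-chain thresholds stated in the text."""
    return (
        50 <= rec.length <= 300
        and rec.r_factor <= 0.3
        and rec.resolution <= 1.5
        and not rec.ca_only      # side chains are required
    )


if __name__ == "__main__":
    # Made-up example records, not entries of the benchmark set.
    candidates = [
        ChainRecord("example_A", 160, 1.2, 0.18, False),
        ChainRecord("example_B", 420, 1.1, 0.20, False),  # too long
        ChainRecord("example_C", 120, 1.4, 0.25, True),   # C-alpha only
    ]
    selected = [r.chain_id for r in candidates if passes_benchmark_filter(r)]
    print(selected)
```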

Main Publications

Contact

In case of questions, comments, or contributions to this page, please contact Martin Mann.