Amino acid distances provides a statistical overview of pairwise distances between the 20 amino acids in protein chains.
The dataset used for the statistics contains 940 chains of 933 proteins with mutual sequence similarity of less than 25%
(obtained by PDBselect). The list of protein chains is available in the
PDB list.
The 3D structures of all proteins in the dataset were determined by X-ray crystallography with a resolution of up to 3 Å
and an R-factor of up to 0.3.
A secondary structure was assigned to each amino acid in the protein chains using DSSP. Each amino acid in a chain was labeled with the appropriate secondary structure type:
Table 1 shows secondary structure statistics for the dataset.
Type of secondary structure | Number of secondary structures | Percentage of all secondary structures (%) |
H | 4255 | 9.362 |
B | 1547 | 3.404 |
E | 6286 | 13.831 |
G | 1690 | 3.718 |
I | 1 | 0.002 |
T | 8077 | 17.772 |
S | 7925 | 17.437 |
- | 15668 | 34.474 |
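As a quick consistency check, the percentages in Table 1 can be recomputed from the counts. The sketch below is only illustrative: the counts are copied from the table, while the function name `percentages` and the rounding to three decimals are assumptions made for this example.

```python
# Counts of secondary structures per DSSP label, copied from Table 1.
counts = {"H": 4255, "B": 1547, "E": 6286, "G": 1690,
          "I": 1, "T": 8077, "S": 7925, "-": 15668}


def percentages(counts):
    """Return each count as a percentage of the total, rounded to 3 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 3) for label, n in counts.items()}


table1 = percentages(counts)
```

With these counts the total is 45449 secondary structures, and the recomputed values agree with the percentages listed in the table.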
Using the SCOP database, a class was assigned to each protein in the dataset. The dataset contains proteins of the following classes:
Table 2 shows the statistics on SCOP classes in the dataset.
SCOP class | Number of proteins | Percentage of all proteins (%) |
a | 202 | 21.489 |
b | 226 | 24.043 |
c | 192 | 20.426 |
d | 310 | 32.979 |
e | 10 | 1.064 |
Amino acid distances can be statistically analysed taking into account:
To shorten access time to the descriptive statistics for pairwise amino acids, groups of distances with pre-defined properties have been prepared for analysis. The criteria used to define the pre-defined groups were:
The following descriptive statistics are shown for the selected group of distances:
Statistics for pairwise amino acids in pre-defined groups are displayed as 20x20 tables. Amino acids in the tables are sorted by the Kyte and Doolittle hydrophobicity scale. Names of hydrophilic amino acids are highlighted in blue and names of hydrophobic amino acids in red. Despite being hydrophobic, Pro is listed among the hydrophilic amino acids, since its conformation usually leaves it exposed to water. Conversely, the polar reduced Cys residue is seldom present in proteins, while the oxidized Cys pair shows hydrophobic behavior, so the Cys pair is considered hydrophobic.
The distribution of calculated amino acid distances for a selected group is shown by a box plot and by a histogram, with additional statistical information in the form of a table.
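The ordering of rows and columns can be illustrated with a small sketch. The hydropathy values are those published by Kyte and Doolittle; the function name `sort_by_hydropathy` and the choice of listing the most hydrophobic residue first are assumptions for illustration, since the direction used in the tables is not stated here.

```python
# Kyte and Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}


def sort_by_hydropathy(scale):
    """Return amino acid names ordered from most to least hydrophobic."""
    # sorted() is stable, so residues with equal values keep dictionary order.
    return sorted(scale, key=scale.get, reverse=True)


order = sort_by_hydropathy(KYTE_DOOLITTLE)
```

Note that the scale alone places Pro (-1.6) between Tyr and His; the blue and red highlighting described above is a separate classification and is not reproduced by this sketch.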
On the histogram, distances in Å are split into bins, with each bin representing a fixed interval in Å, starting at the minimal distance for the pair. Each bin contains the number of amino acid pairs whose rounded distance equals the value assigned to the bin. Only bins containing at least one amino acid pair are shown.
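The binning described above can be sketched as follows. This is a minimal illustration, not the site's implementation: it assumes 1 Å bins labelled by the distance rounded to the nearest integer, and the function name `distance_histogram` and the sample distances are made up for the example.

```python
from collections import Counter


def distance_histogram(distances):
    """Count pairs per rounded distance; bins with no pairs never appear."""
    # Assumption: 1 Å bins, labelled by the distance rounded half up.
    # int(d + 0.5) rounds half up for the non-negative values used here.
    counts = Counter(int(d + 0.5) for d in distances)
    return dict(sorted(counts.items()))


# Hypothetical distances in Å for one amino acid pair.
hist = distance_histogram([3.8, 4.1, 4.4, 6.2, 6.7, 9.9])
```

Because only observed values are counted, empty bins (5, 8 and 9 Å in this sample) are absent from the result, which mirrors the rule that only bins with at least one pair are shown.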
In the additional statistical information, the following are shown for each bin of the histogram:
The distribution of distances for the selected group can also be presented as a box plot.
For easier comparison of the distance distributions between amino acid pairs, a graphic with box plots showing the distribution of distances for each pair of amino acids was prepared for every pre-defined group without a distance limit in Å.
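A box plot summarizes a distribution by a handful of values. The sketch below computes a five-number summary with Python's standard `statistics` module. It is an assumption that the box plots on this site use this convention (quartiles by the inclusive method, whiskers at the minimum and maximum); the function name `five_number_summary` and the sample data are hypothetical.

```python
import statistics


def five_number_summary(distances):
    """Return (min, Q1, median, Q3, max) for a list of distances in Å."""
    # Assumption: quartiles computed with the 'inclusive' method,
    # which treats the data as the whole population of interest.
    q1, median, q3 = statistics.quantiles(distances, n=4, method="inclusive")
    return min(distances), q1, median, q3, max(distances)


# Hypothetical distances in Å for one amino acid pair.
summary = five_number_summary([4.0, 5.0, 6.0, 7.0, 8.0])
```

Computing one such summary per amino acid pair gives the data needed to draw the box plots side by side for comparison.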
Distances between amino acids can be calculated using one of three methods:
An arbitrary threshold can be selected for geometric distances in user-defined groups. For pre-defined groups, allowed thresholds are
Statistics can be displayed for pairs of amino acids which belong to the same secondary structure or to different secondary structures, i.e. intra-secondary structure pairs or inter-secondary structure pairs.
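The distinction between intra- and inter-secondary structure pairs can be sketched as below. The sketch assumes that one secondary structure is a maximal run of identical DSSP labels along the chain; this definition, the function names `segment_ids` and `pair_kind`, and the example string are assumptions for illustration only.

```python
from itertools import groupby


def segment_ids(dssp):
    """Give each residue the index of the run of identical labels it lies in."""
    ids = []
    for seg, (_, run) in enumerate(groupby(dssp)):
        ids.extend([seg] * len(list(run)))
    return ids


def pair_kind(dssp, i, j):
    """Classify residues i and j (0-based) as an 'intra' or 'inter' pair."""
    ids = segment_ids(dssp)
    return "intra" if ids[i] == ids[j] else "inter"


# Hypothetical DSSP assignment: a helix, a short coil, then a strand.
dssp = "HHHH--EEE"
```

Under this assumption, residues 0 and 3 fall in the same helix and form an intra-secondary structure pair, while residues 0 and 6 form an inter-secondary structure pair.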
For groups with multiple proteins, statistics can be displayed for amino acids in proteins of a specified SCOP class or for proteins of all SCOP classes.
For pairs of amino acids in different secondary structures, the type of secondary structure of the first amino acid in a pair and
the type of secondary structure of the second amino acid in a pair can be specified.
Likewise, for pairs of amino acids in the same secondary structure, the type of secondary structure of the amino acids can be specified.
For groups with multiple proteins, a resolution threshold can be specified; it is taken into account when calculating the descriptive statistics and graphics. Proteins in the dataset have a resolution of up to 3 Å.
The minimum and maximum number of amino acids lying in the chain between the two amino acids of a pair can be specified.
Descriptive statistics and graphics can be calculated for all pairs of amino acids or only for pairs formed by two amino acids selected by the user.
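The sequence-separation limits can be illustrated with a short sketch. It assumes that both residues lie in the same chain, that positions are consecutive integer indices, and that both limits are inclusive; the function names `residues_between` and `filter_by_separation` and the sample pairs are hypothetical.

```python
def residues_between(i, j):
    """Number of residues lying strictly between chain positions i and j."""
    return abs(i - j) - 1


def filter_by_separation(pairs, min_between, max_between):
    """Keep pairs whose separation lies within the inclusive limits."""
    return [(i, j) for i, j in pairs
            if min_between <= residues_between(i, j) <= max_between]


# Hypothetical residue index pairs from one chain.
pairs = [(1, 2), (1, 5), (3, 10), (2, 40)]
kept = filter_by_separation(pairs, 3, 10)
```

Here adjacent residues (1, 2) have zero residues between them and the pair (2, 40) has 37, so only (1, 5) and (3, 10) pass limits of 3 to 10.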