Amino acid distances provides a statistical overview of pairwise distances between the 20 amino acids in protein chains.

Description of dataset

The dataset used for the statistics contains 940 chains of 933 proteins with mutual sequence similarity of less than 25% (obtained with PDBselect). The list of protein chains is available at PDB list.
The 3D structures of all proteins in the dataset were determined by X-ray crystallography, with a resolution of up to 3 Å and an R-factor of up to 0.3.

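Secondary structure was assigned to each amino acid in the protein chains using DSSP. Each amino acid in a chain was labeled with the corresponding secondary structure type:

H: α-helix
B: residue in an isolated β-bridge
E: extended strand in a β-ladder
G: 3-10 helix
I: π-helix
T: hydrogen-bonded turn
S: bend
-: none of the above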

Table 1 shows statistics on the secondary structures in the dataset.

Type of secondary structure    Number of secondary structures    Percentage of all secondary structures (%)
H                              4255                              9.362
B                              1547                              3.404
E                              6286                              13.831
G                              1690                              3.718
I                              1                                 0.002
T                              8077                              17.772
S                              7925                              17.437
-                              15668                             34.474
Table 1: Statistics on secondary structures
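
The percentages in Table 1 follow directly from the counts. As a minimal sketch (the variable names are illustrative and not part of the original tool), they can be recomputed like this:

```python
# Counts of secondary structures per DSSP label, copied from Table 1.
counts = {"H": 4255, "B": 1547, "E": 6286, "G": 1690,
          "I": 1, "T": 8077, "S": 7925, "-": 15668}

# Total number of secondary structures in the dataset.
total = sum(counts.values())

# Share of each label among all secondary structures, in percent,
# rounded to three decimals as in the table.
percentages = {label: round(100 * n / total, 3) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}\t{counts[label]}\t{pct:.3f}")
```

The same calculation applies to any table of counts with a percentage column.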

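Using the SCOP database, a class was assigned to each protein in the dataset. The dataset contains proteins of the following classes:

a: all-α proteins
b: all-β proteins
c: α/β proteins
d: α+β proteins
e: multi-domain proteins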

Table 2 shows statistics on the SCOP classes in the dataset.

SCOP class    Number of proteins    Percentage of all proteins (%)
a             202                   21.489
b             226                   24.043
c             192                   20.426
d             310                   32.979
e             10                    1.064
Table 2: Statistics on SCOP classes of proteins in the dataset

Groups of amino acid distances

Amino acid distances can be statistically analysed taking into account:

A group of calculated amino acid distances is formed based on the selected properties. Descriptive statistics and graphics of the distances are then calculated and displayed for the selected group.

Pre-defined groups of distances of amino acid pairs

In order to optimize access time to the descriptive statistics for pairwise amino acids, groups of distances for analysis with pre-defined properties have been created. Criteria that were used for the definition of pre-defined groups were:

Each pre-defined group contains the distances whose properties match one combination of the possible criterion values. Descriptive statistics and graphics of distances were calculated for pairwise amino acids in each resulting group.
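
The idea of one group per combination of criterion values can be sketched as a Cartesian product. The criteria names and values below are hypothetical placeholders chosen for illustration; they are not the actual criteria used to build the pre-defined groups.

```python
from itertools import product

# Hypothetical criteria and values, for illustration only.
criteria = {
    "distance_method": ["method_1", "method_2", "method_3"],
    "secondary_structure_relation": ["same", "different"],
    "scop_class": ["all", "a", "b", "c", "d", "e"],
}

# One group definition per combination of criterion values.
groups = [dict(zip(criteria, combo)) for combo in product(*criteria.values())]

print(len(groups))   # 3 * 2 * 6 = 36 combinations
print(groups[0])
```

Precomputing statistics for every such combination trades storage for faster lookups, which matches the stated goal of optimizing access time.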

Description of statistics

The following descriptive statistics are shown for the selected group of distances:

Statistics for pairwise amino acids in pre-defined groups are displayed as 20x20 tables. Amino acids in the tables are sorted according to the Kyte-Doolittle hydrophobicity scale. Names of hydrophilic amino acids are highlighted in blue and names of hydrophobic amino acids are highlighted in red. Despite being hydrophobic, Pro is listed among the hydrophilic amino acids, since its conformation usually leaves it exposed to water. Conversely, the polar reduced Cys residue is seldom present in proteins, while the oxidized Cys pair shows hydrophobic behavior, so the Cys pair is considered hydrophobic.
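
The ordering of rows and columns can be illustrated with the published Kyte-Doolittle hydropathy values. This is a sketch under two assumptions that the text does not state: amino acids are listed from most to least hydrophobic, and the hydrophobic/hydrophilic split is taken at a hydropathy value of zero.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}

# Assumed ordering: most hydrophobic first. Ties keep their listed order.
order = sorted(KYTE_DOOLITTLE, key=KYTE_DOOLITTLE.get, reverse=True)

# Assumed split: positive hydropathy counts as hydrophobic.
hydrophobic = [aa for aa in order if KYTE_DOOLITTLE[aa] > 0]
hydrophilic = [aa for aa in order if KYTE_DOOLITTLE[aa] <= 0]

print("hydrophobic:", hydrophobic)
print("hydrophilic:", hydrophilic)
```

Under these assumptions Cys falls in the hydrophobic group and Pro in the hydrophilic group, which agrees with the grouping described above.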

Description of graphics

The distribution of calculated amino acid distances for a selected group is shown as a box plot and as a histogram, with additional statistical information in the form of a table.

On the histogram, distances in Å are split into bins, with each bin representing Å, starting at the minimal distance for the pair. Each bin contains the number of amino acid pairs whose rounded distance equals the value assigned to the bin. Only bins containing at least one amino acid pair are shown.
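
The binning described above can be sketched as follows. The function name and the `bin_width` parameter are hypothetical, and the default width of 1 Å is an assumption made only so the example runs; the actual bin width is not given here.

```python
import math
from collections import Counter


def histogram_bins(distances, bin_width=1.0):
    """Count distances per bin, keyed by the rounded distance value.

    bin_width is an assumed parameter; 1.0 is used for illustration.
    """
    counts = Counter()
    for d in distances:
        # Round half up to the nearest multiple of bin_width.
        bin_value = math.floor(d / bin_width + 0.5) * bin_width
        counts[bin_value] += 1
    # Only bins that received at least one distance exist in the result.
    return dict(sorted(counts.items()))


sample = [3.8, 4.1, 4.4, 5.6, 5.9, 9.2]
print(histogram_bins(sample))
```

Because only observed bin values are stored, empty bins never appear in the output, and the first bin is the one that holds the minimal distance.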

The additional statistical information shows, for each bin value of the histogram:

The distribution of distances for the selected group can also be presented as a box plot.

To make it easier to compare the distance distributions of different amino acid pairs, a graphic with box plots showing the distribution of distances for each pair of amino acids was created for every pre-defined group that has no distance limit in Å.
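
The quantities that a box plot summarizes can be computed with the standard library. This is a sketch only: the quartile method and the 1.5 x IQR whisker rule are common conventions assumed here, not necessarily the ones used for the graphics described above.

```python
import statistics


def box_plot_summary(distances):
    """Return the values a box plot typically displays for one distribution."""
    q1, median, q3 = statistics.quantiles(distances, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: whiskers reach the most extreme points
    # that lie within 1.5 * IQR of the box.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [d for d in distances if low_fence <= d <= high_fence]
    return {
        "min": min(distances),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(distances),
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
    }


summary = box_plot_summary([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
print(summary)
```

Computing one such summary per amino acid pair gives the data needed to draw the box plots side by side for comparison.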

Options for statistics/graphics

Distance method

Distances between amino acids can be calculated using one of three methods:
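
The three methods are not named in this text. Purely as an illustration, the sketch below implements three definitions of residue-residue distance that are commonly used in structural analysis (Cα to Cα, centroid to centroid, and minimum atom to atom). They are assumptions and should not be read as the methods offered here. A residue is represented by a hypothetical dictionary that maps atom names to (x, y, z) coordinates in Å.

```python
import math
from itertools import product


def ca_distance(res_a, res_b):
    """Distance between the C-alpha atoms of two residues."""
    return math.dist(res_a["CA"], res_b["CA"])


def centroid(residue):
    """Mean position of all atoms in a residue."""
    coords = list(residue.values())
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))


def centroid_distance(res_a, res_b):
    """Distance between the centroids of two residues."""
    return math.dist(centroid(res_a), centroid(res_b))


def min_atom_distance(res_a, res_b):
    """Smallest distance between any atom of one residue and any of the other."""
    return min(math.dist(p, q) for p, q in product(res_a.values(), res_b.values()))


# Toy coordinates, not taken from a real structure.
res_a = {"CA": (0.0, 0.0, 0.0), "CB": (1.0, 0.0, 0.0)}
res_b = {"CA": (4.0, 0.0, 0.0), "CB": (3.0, 0.0, 0.0)}

print(ca_distance(res_a, res_b))
print(centroid_distance(res_a, res_b))
print(min_atom_distance(res_a, res_b))
```

The example shows that the same pair of residues can yield noticeably different distances depending on the definition, which is why the method is a selectable option.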

Distance threshold

An arbitrary threshold can be selected for geometric distances in user-defined groups. For pre-defined groups, allowed thresholds are

Pairs from same or different secondary structures

Statistics can be displayed for pairs of amino acids which belong to the same secondary structure or to different secondary structures, i.e. intra-secondary structure pairs or inter-secondary structure pairs.
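
The split into intra- and inter-secondary structure pairs can be sketched from a per-residue string of DSSP labels. The sketch assumes that "the same secondary structure" means the same contiguous run of identical labels; the text does not define this precisely, and the function names are illustrative.

```python
from itertools import combinations, groupby


def segment_ids(dssp_labels):
    """Give every run of identical consecutive DSSP labels its own segment id."""
    ids = []
    for seg, (_, run) in enumerate(groupby(dssp_labels)):
        ids.extend(seg for _ in run)
    return ids


def split_pairs(dssp_labels):
    """Split all residue index pairs into intra- and inter-segment pairs."""
    ids = segment_ids(dssp_labels)
    intra, inter = [], []
    for i, j in combinations(range(len(ids)), 2):
        (intra if ids[i] == ids[j] else inter).append((i, j))
    return intra, inter


# Six residues: a three-residue helix, one turn residue, a two-residue strand.
intra, inter = split_pairs("HHHTEE")
print(intra)
print(len(inter))
```

Under this definition two helices separated by a loop count as different secondary structures, even though both carry the label H.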

SCOP class of proteins

For groups with multiple proteins, statistics can be displayed for amino acids in proteins of a specified SCOP class or in proteins of all SCOP classes.

Type of secondary structures

For pairs of amino acids in different secondary structures, the secondary structure type of the first amino acid and the secondary structure type of the second amino acid in the pair can each be specified.
Likewise, for pairs of amino acids in the same secondary structure, the type of that secondary structure can be specified.

Protein resolution threshold

For groups with multiple proteins, a resolution threshold can be specified; only proteins that meet it are taken into account when calculating the descriptive statistics and graphics. Proteins in the dataset have a resolution of up to 3 Å.

Lower and upper limit of distance in positions

The minimum and maximum number of amino acids that may lie in the chain between the two amino acids of a pair can be specified.
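
As a minimal sketch of this filter, assuming residues are identified by their index in the chain and that both limits are inclusive (the function and parameter names are hypothetical):

```python
def within_position_limits(i, j, lower, upper):
    """Check whether the number of residues between positions i and j is in range.

    i and j are distinct chain indices; lower and upper are inclusive limits
    on the number of residues lying strictly between them.
    """
    between = abs(i - j) - 1
    return lower <= between <= upper


pairs = [(0, 1), (0, 3), (2, 10)]
kept = [p for p in pairs if within_position_limits(*p, lower=1, upper=5)]
print(kept)
```

Setting the lower limit above zero excludes direct chain neighbors, whose distances are dominated by the covalent backbone geometry.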

Names of amino acids in pairs

Descriptive statistics and graphics can be calculated for all pairs of amino acids or only for pairs formed by two amino acids whose names are selected by the user.