Motivated by the ability of a simple threading approach to
predict MHC-peptide binding, we developed a new and improved structure-based model
for which parameters can be estimated from additional sources of data about
MHC-peptide binding. In addition to the known 3D structures of a small number
of MHC-peptide complexes that were used in the original threading approach, we
included three other sources of information on peptide-MHC binding: (1) MHC
class I sequences; (2) known binding energies for a large number of MHC-peptide
complexes; and (3) an even larger binary dataset that contains information
about strong binders (epitopes) and non-binders (peptides that have a low
affinity for a particular MHC molecule).
Our model significantly outperforms the standard threading approach in binding
energy prediction. We used the resulting binding energy predictor to study viral
infections in 246 HIV patients from the West Australian cohort, and in over 1000
HIV clade B sequences from the Los Alamos National
Laboratory database, capturing the course of HIV evolution over the last 20
years. Finally, we illustrate short-, medium-, and long-term adaptation of HIV
to the human immune system.
Additional Information:
Some results for the adaptive double
threading approach used as an epitope classifier.
ROC for binders/non-binders
For these MHC types, both the 3D structure and experimental binding energy
values for a set of peptides are available. The bilinear model was trained on
the peptides for which experimental binding energies are available and then
tested on classifying peptides from the Syfpeithi database.
For these MHC types, the 3D structure is available but no experimental binding
energy values are, so we combined the binary-labeled peptides (binders and
non-binders) from the Syfpeithi and MHCBN databases to run our experiments. We
randomly partitioned the data 10 times into 70% training and 30% testing sets.
Results are reported as the average performance and its standard deviation.
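The repeated random partitioning described above can be sketched as follows. This is a minimal illustration rather than the code used for the reported results: it reads "10 partitions" as 10 independent random 70%/30% splits, and the function name `repeated_holdout` and the `evaluate` callback (which would train a model on `train` and return a score such as AUC on `test`) are hypothetical.

```python
import random
import statistics


def repeated_holdout(items, evaluate, n_repeats=10, train_fraction=0.7, seed=0):
    """Score `evaluate(train, test)` on repeated random train/test splits.

    `items` is any sequence of labeled examples, for instance
    (peptide, is_binder) pairs. `evaluate` is a caller-supplied function
    (an assumption of this sketch) that trains on `train` and returns a
    numeric score measured on `test`.
    Returns the mean and the sample standard deviation of the scores.
    """
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_train = round(train_fraction * len(items))
    scores = []
    for _ in range(n_repeats):
        shuffled = list(items)
        rng.shuffle(shuffled)  # a fresh random permutation for every split
        train, test = shuffled[:n_train], shuffled[n_train:]
        scores.append(evaluate(train, test))
    return statistics.mean(scores), statistics.stdev(scores)
```

Each split is drawn independently, so a peptide can appear in the test set of several splits; the standard deviation therefore reflects split-to-split variability rather than the spread over disjoint folds.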
For these MHC types, neither a 3D structure nor experimental binding energy
values are available, so we combined the binary-labeled peptides (binders and
non-binders) from the Syfpeithi and MHCBN databases to run our experiments. We
randomly partitioned the data 10 times into 70% training and 30% testing sets.
Results are reported as the average performance and its standard deviation.
The 3D structural data is estimated as described in the paper.
For MHC types for which no binding data is available (not even binary data),
the bilinear model cannot be trained. However, simply transferring or
estimating the 3D structure from one HLA type to another can greatly improve
the binding classification results obtained with threading. This is equivalent
to setting the HLA-dependent weights wmj to unity while still using the learned
global pairwise potentials fsi. Although binding data does exist for these MHC
molecules (otherwise we could not compute the ROC curves), we pretend it is
unavailable and therefore set their wmj weights to unity. The plots present
results obtained with the correct 3D structure for each MHC molecule and
results obtained with a suboptimal 3D structure (not the closest one by
sequence). The plots also show the improvement in performance when the
appropriate amino acid substitutions are made in the binding groove of the
“borrowing” MHC molecule.
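To make the role of unit weights concrete, here is a minimal sketch of a threading-style energy. The exact functional form and the indexing of wmj and fsi are defined in the paper and are not reproduced here; this sketch simply assumes that the energy is a sum of pairwise potentials over peptide-groove contacts, with one optional weight per groove position. The function name, argument names, and data layout are all hypothetical.

```python
def threading_energy(peptide, groove, contacts, pair_potential, weights=None):
    """Sum (optionally weighted) pairwise potentials over contacts.

    peptide, groove : amino acid sequences (strings)
    contacts        : iterable of (i, j) pairs, meaning peptide position i
                      is in contact with groove position j (taken from a
                      known or borrowed 3D structure)
    pair_potential  : dict mapping (peptide_residue, groove_residue) to a
                      learned global potential value
    weights         : dict mapping groove position j to an HLA-specific
                      weight; None means every weight is 1 (assumed layout)
    """
    total = 0.0
    for i, j in contacts:
        # With no binding data for this HLA type there is nothing to fit,
        # so the weight falls back to unity and only the global potentials
        # and the contact map contribute.
        w = 1.0 if weights is None else weights.get(j, 1.0)
        total += w * pair_potential[(peptide[i], groove[j])]
    return total
```

Called without `weights`, the function depends only on the contact map and the global potentials, which is the situation described above for HLA types that borrow a 3D structure from another type.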
Overview of prediction performance, measured as AUC values on the 5-fold cross-validation sets of the
HLA-peptide binding energy database presented in [1].
ADT: the adaptive double threading approach.
ANN: the approach described in [2], found to be the best predictor in [1].
Best Online: the AUC of the best online predictor tested in [1] that was not developed by the authors of [1].
Tool: the identity of that best online predictor.
| HLA | Number of Peptides | AUC ADT | AUC ANN | Best Online | Tool |
|---|---|---|---|---|---|
| A_0101 | 1158 | 0.9657 | 0.9798 | 0.955 | hla ligand |
| A_0201 | 3090 | 0.9521 | 0.9564 | 0.922 | hla_a2_smm |
| A_0202 | 1448 | 0.9033 | 0.8988 | 0.793 | multipredann |
| A_0203 | 1444 | 0.9141 | 0.9203 | 0.788 | multipredann |
| A_0206 | 1438 | 0.9191 | 0.9261 | 0.735 | multipredann |
| A_0301 | 2095 | 0.9298 | 0.9366 | 0.851 | multipredann |
| A_1101 | 1986 | 0.9442 | 0.9511 | 0.869 | multipredann |
| A_2301 | 105 | 0.8044 | 0.8514 | | multipredann |
| A_2402 | 198 | 0.7852 | 0.822 | 0.77 | syfphethi |
| A_2403 | 255 | 0.8784 | 0.9175 | | |
| A_2601 | 673 | 0.9224 | 0.9552 | 0.736 | pepdist |
| A_2902 | 161 | 0.8866 | 0.9317 | 0.597 | rankpep |
| A_3001 | 670 | 0.941 | 0.945 | | |
| A_3002 | 93 | 0.7633 | 0.744 | | |
| A_3101 | 1870 | 0.9313 | 0.9274 | 0.829 | bimas |
| A_3301 | 1141 | 0.9363 | 0.9141 | 0.807 | pepdist |
| A_6801 | 1142 | 0.8847 | 0.8823 | 0.772 | syfphethi |
| A_6802 | 1435 | 0.8963 | 0.8986 | 0.643 | mhcpred |
| A_6901 | 834 | 0.8902 | 0.8803 | | |
| B_0702 | 1263 | 0.9573 | 0.9636 | 0.942 | hlaligand |
| B_0801 | 709 | 0.854 | 0.9533 | 0.766 | pepdist |
| B_1501 | 979 | 0.9075 | 0.942 | 0.816 | rankpep |
| B_1801 | 119 | 0.8687 | 0.838 | 0.779 | pepdist |
| B_2705 | 970 | 0.9217 | 0.9371 | 0.926 | bimas |
| B_3501 | 737 | 0.8691 | 0.8739 | 0.792 | bimas |
| B_4001 | 1079 | 0.8933 | 0.9155 | | |
| B_4002 | 119 | 0.8186 | 0.7524 | 0.775 | rankpep |
| B_4402 | 120 | 0.6775 | 0.7785 | 0.783 | syfphethi |
| B_4403 | 120 | 0.6239 | 0.7634 | 0.628 | rankpep |
| B_4501 | 115 | 0.8015 | 0.8609 | | |
| B_5101 | 245 | 0.8474 | 0.8856 | 0.82 | pepdist |
| B_5301 | 255 | 0.8934 | 0.8974 | 0.861 | rankpep |
| B_5401 | 256 | 0.8457 | 0.9025 | 0.799 | svmhc |
| B_5701 | 60 | 0.832 | 0.8246 | 0.767 | pepdist |
| B_5801 | 989 | 0.94 | 0.96 | 0.899 | bimas |
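For readers unfamiliar with the metric, the AUC values in the table can be read as the probability that a randomly chosen binder is ranked above a randomly chosen non-binder. The sketch below computes AUC directly from that definition. It is an illustration only, not the evaluation code behind the table; the function name `auc` is hypothetical, and it assumes that a higher score means stronger predicted binding (for a predicted binding energy, where lower means stronger, one would pass the negated energy).

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores : predicted values, higher meaning more likely to bind
             (an assumption of this sketch)
    labels : 1 for binders, 0 for non-binders
    Counts, over all (binder, non-binder) pairs, how often the binder has
    the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The double loop is quadratic in the number of peptides, which is acceptable for datasets of the sizes listed in the table; a rank-based formulation gives the same value more efficiently.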
References:
[1] B. Peters, HH Bui, S. Frankild, M. Nielsen, C. Lundegaard, et al.,
“A Community Resource Benchmarking Predictions of Peptide Binding to MHC-I Molecules,” PLoS Computational Biology (2006) In press. DOI 10.1371/journal.pcbi.0020065.eor
[2] M. Nielsen, C. Lundegaard, P. Worning, SL Lauemoller, K. Lamberth, et al.
“Reliable prediction of T-cell epitopes using neural networks with novel sequence