A Class of Edit Kernels for SVMs to Predict Translation
Initiation Sites in Eukaryotic mRNAs
Dr Tao Jiang
Dept of Computer Science
UC Riverside
Room 610, CBI
2:00PM, Tuesday, 6 July 2004
The prediction of translation initiation sites (TISs)
in eukaryotic mRNAs has been a challenging problem in computational
molecular biology. In this paper, we present a new algorithm to
recognize TISs with very high accuracy. Our algorithm includes
two novel ideas. First, we introduce a class of new sequence-similarity
kernels based on string editing, called the edit kernels, for use with
support vector machines (SVMs) in a discriminative approach to predict
TISs. SVMs are among
the most powerful and popular machine learning techniques. The edit
kernels are simple and have significant biological and probabilistic
interpretations. Second, we convert the region of an input mRNA
sequence downstream of a putative TIS into an amino acid sequence
before applying SVMs to avoid the high redundancy in the genetic
code. The algorithm has been implemented and tested on previously
published data. Our experimental results on real mRNA data show
that both ideas improve the prediction accuracy greatly and our
method performs significantly better than those based on neural
networks and SVMs with polynomial kernels or the Salzberg kernel.
This is joint work with Haifeng Li.
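
The two ideas in the abstract can be illustrated with a rough sketch. It is
not the speakers' implementation. The kernel form exp(-gamma * d), the gamma
value, the unit-cost edit distance, and the function names translate,
edit_distance and edit_kernel are all assumptions made for illustration; the
abstract does not give the exact definition of the edit kernels. The codon
table is the standard genetic code.

```python
import math

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna):
    """Translate an mRNA region codon by codon ('*' = stop, 'X' = unknown)."""
    seq = mrna.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def edit_distance(x, y):
    """Unit-cost edit (Levenshtein) distance via row-by-row dynamic programming."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def edit_kernel(x, y, gamma=0.5):
    """Assumed similarity: decays exponentially with edit distance."""
    return math.exp(-gamma * edit_distance(x, y))


if __name__ == "__main__":
    # Two hypothetical downstream regions that differ only in synonymous codons.
    region_a = "AUGGCUAAGUGA"
    region_b = "AUGGCCAAAUGA"
    print(edit_distance(region_a, region_b))
    print(translate(region_a), translate(region_b))
    print(edit_kernel(translate(region_a), translate(region_b)))
```

In this sketch the two nucleotide regions differ at two positions but
translate to the same amino acid sequence, which is the redundancy that the
conversion step is meant to remove. A matrix of such kernel values could be
supplied to an SVM library that accepts precomputed kernels; a similarity of
this form may not be positive semidefinite for every gamma, so that would need
checking in practice.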