Learning theory applied to Sigmoid network classification of protein biological function using primary protein structure

Pages: 898 - 904, Issue Special, July 2003

D. Warren - College of Information Technology, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, United States
K. Najarian - College of Information Technology, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, United States

Abstract: Recently, Valiant’s Probably Approximately Correct (PAC) learning theory has been extended to learning from m-dependent data. With this extension, training set sizes for sigmoid neural networks have been bounded without assumptions about the underlying distribution of the training data. These extensions allow learning theory to be applied to training sets that are definitely not independent samples of a complete input space. In our work, we are developing length-independent measures as training data for protein classification. This paper applies these learning theory methods to the problem of training a sigmoid neural network to recognize protein biological activity classes as a function of protein primary structure. Specifically, we explore the theoretical training set sizes for classifiers that use the full amino acid sequence of the protein as training data and for classifiers that use length-independent measures as training data. The results give bounds on training set size under protein size limits for the full-sequence input, compared with bounds for input that is independent of sequence length.
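
The contrast between the two kinds of input can be illustrated with a minimal sketch. The abstract does not say which encodings or length-independent measures the authors use, so everything below is an assumption made for illustration: a padded one-hot encoding stands in for the "full sequence" input, and amino acid composition stands in for a "length-independent measure". The function names and the size limit `max_len` are hypothetical. The sketch only shows that the number of network inputs grows with the protein size limit in the first case and stays fixed in the second.

```python
# Illustrative sketch only; the encodings are assumptions, not the paper's method.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_full_sequence(seq, max_len):
    """Encode the whole sequence as a flat one-hot vector padded to max_len.

    The vector has 20 * max_len entries, so the input size depends on the
    protein size limit (hypothetical parameter max_len).
    """
    if len(seq) > max_len:
        raise ValueError("sequence exceeds the size limit")
    vec = [0.0] * (len(AMINO_ACIDS) * max_len)
    for pos, residue in enumerate(seq):
        # str.index raises ValueError for a residue outside the alphabet.
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)] = 1.0
    return vec


def composition(seq):
    """Amino acid composition: the fraction of each residue in the sequence.

    Always 20 values, whatever the sequence length (assumed example of a
    length-independent measure).
    """
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(residue) / len(seq) for residue in AMINO_ACIDS]


short_protein = "MKV"
longer_protein = "MKVLAAGIVK"

full_input = one_hot_full_sequence(short_protein, max_len=10)
short_features = composition(short_protein)
longer_features = composition(longer_protein)
```

Under these assumptions, `full_input` has 200 entries for a size limit of 10 residues, while `short_features` and `longer_features` both have 20 entries regardless of sequence length.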

Keywords:  Neural Networks, Protein Sequencing, DNA Sequencing, Protein Function Prediction, Gene Function Prediction, Learning Theory
Mathematics Subject Classification:  Primary: 58F15, 58F17, 58F11; Secondary: 53C35.

Received: September 2002; Revised: March 2003; Published: April 2003.