Folding Pattern Recognition in Proteins Using ... - Semantic Scholar

Genome Informatics 13: 163–172 (2002)


Folding Pattern Recognition in Proteins Using Spectral Analysis Methods

Carlos A. Del Carpio-Muñoz

Julio Cesar Carbajal L.

[email protected]

[email protected]

Laboratory for Bioinformatics, Department of Ecological Engineering, Toyohashi University of Technology, Tempaku, Toyohashi 441-8580, Japan

Abstract

Divergence in sequence through evolution prevents homology methods for protein folding prediction that rely on sequence alignment from detecting structural and folding similarities between distantly related proteins. Homolog coverage of current databases also plays a critical role in the performance of those methods. Both limitations become conspicuous in the so-called twilight zone of sequence homology, where proteins with a high degree of similarity in both biological function and structure share an amino acid sequence homology ranging from about 20% to less than 30%. In contrast to these methods, we propose a strategy based on a different concept of sequence homology, derived from a periodicity analysis of the physicochemical properties of the residues constituting protein primary structures. The analysis uses a front-end processing technique from automatic speech recognition: the cepstrum (a measure of the periodic wiggliness of a frequency response) is computed, leading to a spectral envelope that depicts the subtle periodicity in the physicochemical characteristics of the sequence. Homology between sequences is then derived by aligning spectral envelopes. Proteins that share common folding patterns and biological function but have low sequence homology can then be detected by their similarity in the spectral dimension. Applied to protein folding recognition, the methodology in many cases outperforms other methodologies in the twilight zone.

Keywords: protein structure determination, spectral analysis, cepstrum, proteomics, protein folding

1 Introduction

Incomplete homolog coverage is the major factor limiting the success of long-established methods for protein structure modeling based on sequence alignments [23] and of other methodologies [2, 24, 27, 28, 18, 13, 1, 10, 9, 15, 17, 14, 22]; divergence in sequence through evolution is the natural cause. The problems that remain, particularly with widely used sequence alignment techniques, are associated with the length of the amino acid sequence, the unambiguous differentiation of domains within the sequence, the increased number of gaps needed to optimize the matching of residues, and other factors. These problems become conspicuous in the so-called twilight zone of protein structure prediction, in which proteins with a high degree of similarity in both structure and biological function have poor similarity in amino acid sequence, the homology among them generally ranging from about 20% to less than 30% [8]. Recently, in contrast to those sequence alignment based methodologies, a strategy based on periodicity analysis of the physicochemical properties of the residues constituting the sequences has been proposed by the authors [7] and by other researchers [5, 3, 12], and has been applied to protein secondary structure [11] and function elucidation [5, 12, 3, 4]. The common denominator of these methodologies is a spectral analysis of the profiles of physicochemical properties of the residues in sequences. Here we describe this new methodology, present some results obtained with it that were submitted to the CASP4 [30] contest and the ongoing CASP5 [31] competition, and discuss the capabilities of the methodology as well as its potential as a tool for genome-wide analysis and proteomics. The underlying characteristic of the methodology is that it relates the intrinsic, though subtle, periodicity of the physicochemical characteristics of the sequence of residues mentioned above to the folding characteristics of its three-dimensional structure. We use the classification of protein primary structures into families and super-families in the SCOP database [20]. We have found that common folding patterns exist for proteins possessing similar spectral characteristics over a set of physicochemical parameters, which we call the dominant coefficients of a particular super-family. The methodology is effective at recognizing folding characteristics of protein structures in the twilight zone [8]. Using the sets of dominant coefficients for each type of folding, an unknown sequence can be assigned automatically, and with high probability, to its putative class, family, and super-family. This constitutes the basis of a new fully automated system for protein folding recognition; a detailed description of the methodology and of the results obtained is given in what follows [6].

2 Methodology

2.1 Spectral Representation of Protein Primary Structures

We adopted a well-known front-end processing technique from robust automatic speech recognition (ASR), whose objective is to preserve critical linguistic information while suppressing irrelevant information such as speaker-specific characteristics, channel characteristics, and noise [21]. This analysis-synthesis technique is based on the transformation of a signal into its cepstrum, which is a measure of the periodic wiggliness of a frequency response plot. The cepstrum is calculated from the logarithm of the power spectrum of a signal, that is, from a logarithmic periodogram, whose spectral envelope is obtained as a smooth curve connecting the main local peaks of the fine structure of the frequency spectrum. Applied to the profile of physicochemical features of a protein sequence, the technique extracts information in the form of the spectral envelope, which is used to model the relationship between the primary and tertiary structures of a protein. Comparison of two sequences is then reduced to an alignment of the spectral envelopes representing their primary structures. Once the profile of physicochemical characteristics has been obtained, it is converted to the frequency domain by applying a Fourier transform. For a sequence of $N$ amino acids whose physicochemical profile is represented by $x_n$, the discrete Fourier transform (DFT) $X_k$ is computed by:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi k n / N}, \qquad 0 \le k \le N-1$$
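The pipeline described in this subsection (physicochemical profile, DFT, logarithmic power spectrum, cepstrum, smooth envelope) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the Kyte-Doolittle hydropathy scale stands in for the unspecified physicochemical property, the envelope is approximated by low-quefrency liftering of the cepstrum (a common speech-processing shortcut that gives a smoothed log spectrum, not a curve through the local peaks), and the names `profile`, `spectral_envelope`, `n_keep` and `floor` are hypothetical.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values, used only as an example property scale
# (assumption: the text does not name the property at this point).
KD = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}


def profile(seq):
    """Map a one-letter amino acid sequence to a numeric profile x_n."""
    return [KD[aa] for aa in seq]


def dft(x):
    """Direct O(N^2) DFT: X_k = sum_n x_n exp(-j 2 pi k n / N), 0 <= k < N."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(spectrum):
    """Inverse DFT with the 1/N normalization."""
    n_pts = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_pts)
            for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


def spectral_envelope(x, n_keep=4, floor=1e-12):
    """Cepstrally smoothed log power spectrum of the profile x.

    n_keep: number of low-quefrency cepstral coefficients retained
            (hypothetical tuning parameter).
    floor:  small constant that keeps log() finite for empty bins.
    """
    log_power = [math.log(abs(v) ** 2 + floor) for v in dft(x)]
    # Real cepstrum: inverse DFT of the log power spectrum.
    cepstrum = [c.real for c in idft(log_power)]
    n_pts = len(cepstrum)
    # Keep coefficients 0..n_keep-1 and their mirror images so that the
    # liftered cepstrum stays symmetric and the envelope stays real.
    liftered = [
        c if (q < n_keep or q > n_pts - n_keep) else 0.0
        for q, c in enumerate(cepstrum)
    ]
    return [v.real for v in dft(liftered)]


if __name__ == "__main__":
    x = profile("MKVLAAGIVGLLLA")  # hypothetical toy sequence
    print([round(v, 3) for v in spectral_envelope(x)])
```

A smaller `n_keep` gives a smoother envelope; with `n_keep=1` the envelope collapses to the mean of the log power spectrum. The direct DFT is quadratic in the sequence length, which is acceptable for single proteins, though an FFT library would be the usual choice for larger data sets.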
