Discrete and Continuous Models and Applied Computational Science
ISSN 2658-4670, 2658-7149
Peoples' Friendship University of Russia named after Patrice Lumumba (RUDN University)

Extraction of Data Features for Neuro-Classifier Input

G. A. Ososkov, Laboratory of Information Technologies, Joint Institute for Nuclear Research (ososkov@jinr.ru)
D. A. Baranov, Laboratory of Information Technologies, Joint Institute for Nuclear Research

Copyright © 2010

The problem of substantially compressing the data fed to an ANN classifier without losing significant information is considered using the example of a demanding practical task: genetic analysis of protein structure, which is important for genetic biology research in radiobiology and especially in agriculture. Such analysis is usually carried out by studying ElectroPhoretic Spectra (EPS) of gliadin (an alcohol-soluble protein) of the inspected grain cultivar. Digitizing an EPS produces a densitogram of about 4,000 counts, from which the most informative features must be extracted as ANN input. In addition, these data require special preprocessing: densitogram smoothing and pedestal elimination, as well as compensation for defects of the digitization process such as signal noise, variability of spectrum borders and illumination, and non-linear stretches caused by the nonstationarity of electrophoresis.
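The abstract names the preprocessing steps but not the algorithms behind them. As an illustration only, the following minimal NumPy sketch assumes a centred moving average for smoothing, a running minimum as the pedestal (baseline) estimate, and min-max rescaling as a crude illumination correction. These choices, the window sizes, and the function names are hypothetical and are not taken from the paper; the synthetic densitogram is invented test data, not a real EPS.

```python
import numpy as np


def _centred_windows(signal: np.ndarray, window: int) -> np.ndarray:
    """One centred window per sample; ends are padded by repeating edge values."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(signal, half, mode="edge")
    return np.lib.stride_tricks.sliding_window_view(padded, window)


def smooth(signal, window=11):
    """Moving-average smoothing (assumed smoother, not necessarily the authors')."""
    return _centred_windows(np.asarray(signal, float), window).mean(axis=1)


def remove_pedestal(signal, window=201):
    """Subtract a baseline estimated as a running minimum (assumed method)."""
    signal = np.asarray(signal, float)
    baseline = _centred_windows(signal, window).min(axis=1)
    return signal - baseline  # non-negative: each window contains its own sample


def normalize(signal):
    """Rescale to [0, 1] to reduce illumination differences between scans."""
    lo, hi = signal.min(), signal.max()
    if hi == lo:
        return np.zeros_like(signal)
    return (signal - lo) / (hi - lo)


def preprocess(raw, smooth_window=11, pedestal_window=201):
    """Hypothetical pipeline: smooth, remove pedestal, normalize."""
    return normalize(remove_pedestal(smooth(raw, smooth_window), pedestal_window))


# Synthetic densitogram: three Gaussian "bands" on a sloping pedestal plus noise.
rng = np.random.default_rng(0)
x = np.arange(4000)
pedestal = 0.5 + 1e-4 * x
bands = [(1.0, 800), (0.6, 1900), (0.8, 3100)]  # (height, position), invented
peaks = sum(h * np.exp(-0.5 * ((x - c) / 15.0) ** 2) for h, c in bands)
raw = pedestal + peaks + 0.02 * rng.standard_normal(x.size)

clean = preprocess(raw)
print(clean.shape, int(np.argmax(clean)))
```

The running minimum only preserves bands that are narrower than the pedestal window, so the window would have to be tuned to real band widths. Compensating non-linear stretches of the spectrum would need an additional alignment step that this sketch does not attempt.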
Several alternative approaches to feature extraction were studied: (1) coarsening the densitogram into 200 averaged measurements; (2) principal component analysis; (3) recognition of all well-pronounced peaks in order to evaluate their parameters as ANN input; (4) and (5) data compression by the discrete Fourier transform (DFT) and the discrete wavelet transform (DWT), respectively. These methods were applied to extract features from samples of 30 different sorts prepared by experts. The extracted features were then used to train an ANN of the three-layer perceptron type. A comparison of recognition efficiency for data compressed by these methods shows that it is highly sensitive to the number of sorts to be classified. Only the DFT and DWT approaches kept the efficiency at 95-97% for up to 20 sorts.
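To make four of the compression approaches concrete, here is a minimal NumPy sketch of methods (1), (2), (4) and (5); peak recognition (3) is omitted. It assumes a 4,000-count densitogram, keeps DFT magnitudes of the lowest frequencies, and uses a hand-written Haar wavelet as a stand-in for whichever DWT basis the authors used. The numbers of retained coefficients and all function names are hypothetical, and the random matrix is placeholder data, not real densitograms.

```python
import numpy as np

N_COUNTS = 4000  # densitogram length quoted in the abstract


def coarsen(d, n_bins=200):
    """Method (1): average consecutive counts into n_bins equal groups."""
    d = np.asarray(d, float)
    if d.size % n_bins:
        raise ValueError("length must be divisible by n_bins")
    return d.reshape(n_bins, -1).mean(axis=1)


def pca_fit(X, n_components=20):
    """Method (2): principal axes of a training matrix (rows = densitograms)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project centred densitograms onto the retained principal axes."""
    return (X - mean) @ components.T


def dft_features(d, n_coeffs=100):
    """Method (4) sketch: magnitudes of the lowest-frequency DFT coefficients.

    Dropping the phase is an assumption; it makes features less shift-sensitive.
    """
    return np.abs(np.fft.rfft(np.asarray(d, float))[:n_coeffs])


def haar_approx(d, levels=4):
    """Method (5) sketch: orthonormal Haar DWT, keeping approximation coefficients."""
    a = np.asarray(d, float)
    for _ in range(levels):
        if a.size % 2:
            a = np.append(a, a[-1])  # pad odd lengths by repeating the last sample
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a


# Placeholder data standing in for 30 expert-prepared densitograms.
rng = np.random.default_rng(1)
X = rng.random((30, N_COUNTS))
d = X[0]
mean, comps = pca_fit(X, n_components=20)
features = {
    "coarse": coarsen(d),
    "pca": pca_transform(d[None, :], mean, comps)[0],
    "dft": dft_features(d),
    "dwt": haar_approx(d),
}
for name, f in features.items():
    print(name, f.shape)
```

Each function maps a 4,000-count vector to a few hundred or fewer numbers that could serve as perceptron inputs. Four Haar levels reduce 4,000 counts to 250 coefficients. In practice, the number of retained coefficients would be chosen by validating recognition efficiency, as the comparison above does.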
Further development of feature extraction methods and a study of whether a hierarchy of classifying ANNs can be built are planned.

Keywords: artificial neural networks, classification, genetic analysis, electrophoretic spectrum, data compression, fast Fourier transform, principal component analysis, discrete wavelet transform