PARALLEL METHOD OF BIG DATA REDUCTION BASED ON STOCHASTIC PROGRAMMING APPROACH

Authors

  • A. Oliinyk, Zaporizhzhia National Technical University, Zaporizhzhia, Ukraine
  • S. Subbotin, Zaporizhzhia National Technical University, Zaporizhzhia, Ukraine
  • V. Lovkin, Zaporizhzhia National Technical University, Zaporizhzhia, Ukraine
  • M. Ilyashenko, Zaporizhzhia National Technical University, Zaporizhzhia, Ukraine
  • O. Blagodariov, Zaporizhzhia National Technical University, Zaporizhzhia, Ukraine

DOI:

https://doi.org/10.15588/1607-3274-2018-2-7

Keywords:

data sample, pattern recognition, feature selection, parallel computing, informativeness criterion, stochastic programming approach.

Abstract

Context. The problem of automating big data reduction in diagnostics and pattern recognition tasks is addressed. The object of the research is the process of big data reduction. The subject of the research is the methods of big data reduction.
Objective. The research objective is to develop a parallel method of big data reduction based on stochastic calculations.
Method. A parallel method of big data reduction is proposed. The method is based on a proposed system of criteria that makes it possible to estimate the concentration of control points around local extrema. The solution concentration estimates in this system are calculated from the spatial location of control points in the current solution set. The criteria system can be used in stochastic search methods to detect excessive concentration of solutions in the areas of local optima and, as a consequence, to increase the diversity of the solution set in the current population and to cover the search space with control points more uniformly during the optimization process.
Results. Software has been developed that implements the proposed parallel method of big data reduction and makes it possible to select informative features and to reduce big data for the synthesis of recognition models from given data samples.
Conclusions. The conducted experiments have confirmed the operability of the proposed parallel method of big data reduction and allow it to be recommended for processing data sets for pattern recognition in practice. Prospects for further research may include modifying known feature selection methods and developing new ones based on the proposed system of criteria for estimating control point concentration.
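
Illustration. The idea in the Method paragraph, monitoring how tightly candidate solutions cluster so that a stochastic search can restore diversity, can be sketched roughly as follows. The abstract does not give the formulas of the criteria system, so everything here is an assumption made for illustration: control points are modelled as binary feature masks, closeness is measured by Hamming distance, and the names `hamming`, `concentration`, `crowded_points`, `radius` and `max_neighbours` are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions where two equal-length binary masks differ."""
    return sum(x != y for x, y in zip(a, b))


def concentration(population, radius):
    """Fraction of point pairs lying within `radius` of each other.

    0.0 means no two control points are close; 1.0 means every pair is.
    Illustrative measure only, not the criteria system from the paper.
    """
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    close = sum(hamming(a, b) <= radius for a, b in pairs)
    return close / len(pairs)


def crowded_points(population, radius, max_neighbours):
    """Indices of points with more than `max_neighbours` neighbours within `radius`."""
    crowded = []
    for i, p in enumerate(population):
        neighbours = sum(
            1
            for j, q in enumerate(population)
            if j != i and hamming(p, q) <= radius
        )
        if neighbours > max_neighbours:
            crowded.append(i)
    return crowded


# Hypothetical population: each tuple marks which of five features are selected.
population = [
    (1, 1, 0, 0, 1),
    (1, 1, 0, 0, 0),
    (1, 1, 0, 1, 1),
    (0, 0, 1, 1, 0),
]

print(concentration(population, radius=2))
print(crowded_points(population, radius=1, max_neighbours=1))
```

In a search loop, a high `concentration` value or a non-empty `crowded_points` list could trigger replacing or perturbing some of the crowded solutions. That policy is likewise an assumption, not something the abstract specifies.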

How to Cite

Oliinyk, A., Subbotin, S., Lovkin, V., Ilyashenko, M., & Blagodariov, O. (2018). PARALLEL METHOD OF BIG DATA REDUCTION BASED ON STOCHASTIC PROGRAMMING APPROACH. Radio Electronics, Computer Science, Control, (2). https://doi.org/10.15588/1607-3274-2018-2-7

Section

Neuroinformatics and intelligent systems