DOI: https://doi.org/10.15588/1607-3274-2018-2-7

PARALLEL METHOD OF BIG DATA REDUCTION BASED ON STOCHASTIC PROGRAMMING APPROACH

A. Oliinyk, S. Subbotin, V. Lovkin, M. Ilyashenko, O. Blagodariov

Abstract


Context. The problem of automating big data reduction in diagnostics and pattern recognition tasks is addressed. The object of the research is the process of big data reduction. The subject of the research is the set of methods for big data reduction.
Objective. The research objective is to develop a parallel method of big data reduction based on stochastic computations.
Method. A parallel method of big data reduction is proposed. The method is based on a proposed system of criteria that makes it possible to estimate the concentration of control points around local extrema. The concentration estimates in this system are calculated from the spatial location of the control points in the current solution set. The criteria system can be used in stochastic search methods to detect excessive concentration of solutions in the regions of local optima and, as a consequence, to increase the diversity of the solution set in the current population and to cover the search space with control points more uniformly during optimization.
Results. Software has been developed that implements the proposed parallel method of big data reduction; it selects informative features and reduces big data for the synthesis of recognition models from given data samples.
Conclusions. The experiments conducted confirmed that the proposed parallel method of big data reduction is operable and allow it to be recommended for practical processing of data sets for pattern recognition. Further research may include modifying known feature selection methods and developing new ones based on the proposed system of criteria for estimating control point concentration.
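
The abstract does not give the formulas of the criteria system, so the following Python sketch only illustrates the general idea of estimating how concentrated a set of control points is and diversifying it when the estimate is too high. It is not the authors' method. It assumes that each control point is a binary feature mask, that concentration is measured as one minus the mean normalised pairwise Hamming distance, and that the names `hamming`, `concentration`, `diversify` and the `threshold` parameter are hypothetical.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions where two equal-length feature masks differ."""
    return sum(x != y for x, y in zip(a, b))


def concentration(points):
    """Illustrative concentration estimate in [0, 1] (an assumption, not
    the paper's criterion): 1 - mean normalised pairwise Hamming distance.
    A value of 1.0 means that all control points coincide."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    n_features = len(points[0])
    mean_dist = sum(hamming(a, b) for a, b in pairs) / (len(pairs) * n_features)
    return 1.0 - mean_dist


def diversify(points, threshold, rng):
    """If the set is more concentrated than `threshold`, replace exact
    duplicates of earlier points with random masks; otherwise return a copy."""
    if concentration(points) <= threshold:
        return [list(p) for p in points]
    seen = set()
    result = []
    for p in points:
        if tuple(p) in seen:
            # Duplicate control point: resample it at random.
            p = [rng.randint(0, 1) for _ in p]
        seen.add(tuple(p))
        result.append(list(p))
    return result


if __name__ == "__main__":
    clustered = [[1, 0, 1, 0]] * 4  # every point sits on the same solution
    print(concentration(clustered))
    print(diversify(clustered, threshold=0.8, rng=random.Random(0)))
```

In a stochastic search loop, such a check would typically be run once per iteration on the current population; the pairwise distance computation is independent for each pair and could therefore be distributed across parallel workers.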

Keywords


data sample; pattern recognition; feature selection; parallel computing; informativeness criterion; stochastic programming approach.



Copyright (c) 2018 A. Oliinyk, S. Subbotin, V. Lovkin, M. Ilyashenko, O. Blagodariov

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Address of the journal editorial office:
Editorial office of the journal «Radio Electronics, Computer Science, Control»,
Zaporizhzhya National Technical University, 
Zhukovskiy street, 64, Zaporizhzhya, 69063, Ukraine. 
Telephone: +38-061-769-82-96 – the Editing and Publishing Department.
E-mail: rvv@zntu.edu.ua

The reference to the journal is obligatory in the cases of complete or partial use of its materials.