SCOPING ADVERSARIAL ATTACK FOR IMPROVING ITS QUALITY
Keywords: adversarial attacks, fast adversarial attack algorithm, logistic regression, neural network vulnerabilities.
Abstract
Context. This paper examines adversarial attacks, their types, and the reasons they arise. It presents a simplified, fast, and effective attack algorithm for logistic regression. The work is relevant because a critical vulnerability of neural networks, the so-called adversarial examples, has yet to be explored in depth. By exploiting this vulnerability, an attacker can force a network to produce a deliberately chosen output and thereby break the defenses of neural-network-based safety systems.
Objective. The purpose of this work is to develop algorithms for different kinds of attacks on a trained neural network based on a preliminary analysis of the network’s weights, to estimate the quality loss of attacked images, and to compare the developed algorithms with other adversarial attacks of a similar type.
Method. A fast and fairly efficient attack algorithm is presented that can use either the whole image or only certain regions of it. The algorithm and its modifications were analyzed with the structural similarity (SSIM) image metric and compared with earlier gradient-based attack methods.
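To make the role of SSIM concrete, the following is a minimal sketch of the SSIM formula from Wang et al. (2004) evaluated over a single global window. This is a simplification assumed here for brevity: the standard metric averages the same expression over local sliding windows, and the abstract does not state which variant the authors used. The function name `global_ssim` and the `data_range` parameter are illustrative.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """SSIM computed over the whole image as one window (assumption:
    the standard metric uses local windows; this is a simplified sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Stabilizing constants from the original SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

A value of 1 means the attacked image is identical to the original; lower values indicate a larger structural change, so an attack that keeps SSIM close to 1 is harder for a human to notice.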
Results. Simplified targeted and non-targeted attack algorithms were built for a single-layer neural network trained to classify handwritten digits from the MNIST dataset. A visual and semantic interpretation of the weights as the “importance” of each pixel for recognizing an image as one class or another is given. Using the SSIM index, the image quality loss was analyzed for images attacked by the proposed algorithms over the whole test dataset. This analysis revealed the classes most vulnerable to an adversarial attack, as well as images whose class can be changed by adding noise imperceptible to a human.
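The idea of reading weights as pixel importance and restricting the attack to a region can be illustrated with a small sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes a linear classifier with logits `W @ x + b`, for which the difference between two weight rows shows how each pixel shifts the decision from one class to another, and it takes small signed steps along that difference only inside a hypothetical boolean `mask`. The names `targeted_linear_attack`, `mask`, `step`, and `max_steps` are illustrative.

```python
import numpy as np

def targeted_linear_attack(x, W, b, target, mask=None, step=0.01, max_steps=1000):
    """Push a flat image x (values in [0, 1]) toward class `target` for a
    linear classifier with logits W @ x + b. Only pixels where `mask` is
    True may change (hypothetical region restriction)."""
    x_adv = np.asarray(x, dtype=float).copy()
    if mask is None:
        mask = np.ones(x_adv.shape, dtype=bool)  # attack the whole image
    for _ in range(max_steps):
        pred = int(np.argmax(W @ x_adv + b))
        if pred == target:
            return x_adv, True
        # Pixels with a positive weight difference favor the target class
        # over the current prediction; negative ones favor the prediction.
        direction = np.sign(W[target] - W[pred])
        x_adv = np.clip(x_adv + step * direction * mask, 0.0, 1.0)
    return x_adv, int(np.argmax(W @ x_adv + b)) == target
```

The function returns the modified image and a flag that tells whether the target class was reached; because pixel values are clipped to [0, 1] and the region may be small, success is not guaranteed. A non-targeted variant could stop as soon as the prediction differs from the original class.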
Adversarial examples built with the developed algorithm were transferred to a 5-layer network of unknown architecture. In many cases, images that were difficult to attack on the original network showed higher transfer rates than images that needed only minor changes.
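One common way to quantify such transfer, given here only as an assumed definition since the abstract does not state the one used, is the share of adversarial examples that fool the source network and also fool the second network. The function name and argument names below are illustrative.

```python
import numpy as np

def transfer_rate(true_labels, source_adv_preds, target_adv_preds):
    """Fraction of examples that fool the source model and also fool the
    target model (assumed definition of transfer rate)."""
    true_labels = np.asarray(true_labels)
    fooled_source = np.asarray(source_adv_preds) != true_labels
    fooled_target = np.asarray(target_adv_preds) != true_labels
    if not fooled_source.any():
        return 0.0  # nothing fooled the source model, so nothing to transfer
    return float((fooled_source & fooled_target).sum() / fooled_source.sum())
```

Both prediction arrays are the models' outputs on the same adversarial images, so the rate can be computed per class or per SSIM range to compare easy and hard examples.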
Conclusions. The adversarial attack scoping idea and the input data analysis method used to build adversarial examples generalize readily to other image recognition problems, which makes them applicable to a wide range of practical tasks. The work thus presents another way of analyzing the safety of neural networks (logistic regression included) against input data attacks.
Copyright (c) 2019 K. S. Khabarlak, L. S. Koriashkina
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.