The pervasive brittleness of deep neural networks has attracted significant attention in recent years. A particularly interesting finding is the existence of adversarial examples: imperceptibly perturbed natural inputs that induce erroneous predictions in state-of-the-art neural models. In this paper, we study a different type of adversarial example specific to code models, called discrete adversarial examples, which are created through program transformations that preserve the semantics of the original inputs. In particular, we propose a novel, general method that is highly effective in attacking a broad range of code models. From the defense perspective, our primary contribution is a theoretical foundation for applying adversarial training, the most successful algorithm for training robust classifiers, to defending code models against discrete adversarial attacks. Motivated by the theoretical results, we present a simple realization of adversarial training that substantially improves the robustness of code models against adversarial attacks in practice.
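To make the notion of a semantics-preserving program transformation concrete, the sketch below applies one of the simplest such transformations, renaming a local variable, to a Python snippet. This is a minimal illustration under our own assumptions (the class and identifier names are ours), not the paper's implementation.

```python
# Minimal sketch (assumed names, not the paper's implementation): a
# semantics-preserving transformation that renames a local variable.
# The transformed program computes exactly the same function, yet its
# surface form differs, which is the kind of change a discrete
# adversarial attack can exploit against a code model.
import ast

class RenameVariable(ast.NodeTransformer):
    """Rename every occurrence of identifier `old` to `new`."""

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node

source = """
def total(xs):
    acc = 0
    for x in xs:
        acc += x
    return acc
"""

tree = ast.parse(source)
RenameVariable("acc", "tmp_0").visit(tree)
print(ast.unparse(tree))  # same behavior, different tokens (Python 3.9+)
```

An attack would search over many such candidate edits (renamings and other semantics-preserving transformations) for one that flips the model's prediction; adversarial training, in its standard formulation, defends by mixing such transformed programs into the training data.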
We conduct a comprehensive evaluation of both our attack and defense methods. The results show that our discrete attack is significantly more effective than the state of the art, whether or not defense mechanisms are in place to help models resist attacks. In addition, our realization of adversarial training improves the robustness of all evaluated models by the widest margin against state-of-the-art adversarial attacks as well as our own.
Tue 20 Jun. Displayed time zone: Eastern Time (US & Canada).
09:00 - 11:00

09:00 (20m, Talk) | Obtaining Information Leakage Bounds via Approximate Model Counting | PLDI Research Papers | Seemanta Saha (University of California Santa Barbara), Surendra Ghentiyala (University of California Santa Barbara), Shihua Lu (University of California Santa Barbara), Lucas Bang (Harvey Mudd College), Tevfik Bultan (University of California at Santa Barbara) | DOI
09:20 (20m, Talk) | CommCSL: Proving Information Flow Security for Concurrent Programs using Abstract Commutativity | PLDI Research Papers | DOI
09:40 (20m, Talk) | Discrete Adversarial Attack to Models of Code | PLDI Research Papers | Fengjuan Gao (Nanjing University of Science and Technology), Yu Wang (Nanjing University), Ke Wang (Visa Research) | DOI
10:00 (20m, Talk) | Generalized Policy-Based Noninterference for Efficient Confidentiality-Preservation | PLDI Research Papers | Shamiek Mangipudi (Università della Svizzera italiana (USI)), Pavel Chuprikov (USI Lugano), Patrick Eugster (USI Lugano; Purdue University), Malte Viering (TU Darmstadt), Savvas Savvides (Purdue University) | DOI
10:20 (20m, Talk) | Taype: A Policy-Agnostic Language for Oblivious Computation | PLDI Research Papers | DOI
10:40 (20m, Talk) | Automated Detection of Under-Constrained Circuits in Zero-Knowledge Proofs | PLDI Research Papers | Shankara Pailoor (University of Texas at Austin), Yanju Chen (University of California at Santa Barbara), Franklyn Wang (Harvard University, 0xparc), Clara Rodríguez-Núñez (Complutense University of Madrid), Jacob Van Geffen (Veridise Inc.), Jason Morton (ZKonduit), Michael Chu (0xparc), Brian Gu (0xparc), Yu Feng (University of California at Santa Barbara), Işıl Dillig (University of Texas at Austin) | DOI