2018-10-25 research

Frost

I'm working on some cool research right now and am in the process of writing it up nicely into a paper to be submitted to a conference, so here is a little taste of the abstract and intro.

Abstract

With the recent and continuing boom in computer science enrollments, the need for effective automatic plagiarism detection tools has also grown. By far the most popular such tool is Moss, which is widely used to detect code plagiarism. This research is the first to demonstrate that Moss can be effectively defeated without resorting to obfuscation. Our system, FROST, constitutes a surprisingly simple and entirely automated attack on Moss that takes a program and alters it so that Moss will not indicate that it has been plagiarized. FROST automatically performs a series of mutation operations that retain the semantics and most of the original structure of code while directly undermining Moss's plagiarism detection algorithm. We show that, in a matter of minutes, FROST can produce a non-obfuscated variant of an original program that yields unsuspiciously low scores typical of non-plagiarized code.

We present a simple countermeasure to FROST that effectively undoes these and other source-level transformations, leveraging the fact that optimizing compilers are effective canonicalizers of code. For languages amenable to optimizing compilation (including C, C++, Java, and JavaScript), comparing the assembly code of FROST-transformed programs to that of their originals results in high scores that are clear indicators of plagiarism.

Introduction

In the past decade, there has been a large boom in computer science enrollments. The current boom has so far exceeded previous surges that the Computing Research Association created a committee solely dedicated to investigating it. In 2015, the CRA Enrollment Committee reported a nearly 185% increase in computer science undergraduate program enrollments since 2006. With enrollments increasing, larger class sizes have become inevitable, and established classroom management tools and techniques have been forced to adapt accordingly. One such technique is manual grading, which simply does not scale to large class sizes.

To keep up with the larger class sizes, instructors and teaching assistants have become increasingly dependent on automatic grading tools. Automatic grading allows instructors to grade large batches of submissions much faster than manual grading would allow, but it also removes the requirement that instructors actually view student code. In many cases this can be a good thing, but it also makes it easier for students to plagiarize their code.

In parallel with the surge in computer science enrollments, publicly available code resources have also been growing. Sites like Stack Overflow and Stack Exchange offer students abundant resources for coding assistance and tips, but they also fall prey to students looking to cheat on assignments. Using these sites, it is possible to find fully prepared assignment solutions and code. Code-hosting sites like GitHub likewise fall prey to students looking to cheat, as it is possible to find assignment solutions hosted there through simple keyword matching. There also exist sites like Chegg that are specifically meant to host assignment solutions. The growing availability of these resources, combined with the trend toward automatic grading tools, has made it easier for students to commit software plagiarism.

In recent years, universities including Yale, Brown, UC Berkeley, Stanford, and Harvard have reported that between 10% and 70% of their students have cheated on coding assignments. To combat software plagiarism in coding courses, many instructors and assistants have been using automatic plagiarism detectors. The most commonly used tool is Moss (Measure of Software Similarity), from Alex Aiken of Stanford. Though it was developed in 1994, the only demonstrated attacks on Moss have resorted to source code obfuscation. While these attacks are effective at defeating Moss, obfuscation is infeasible as a method to thwart Moss in the classroom setting, as obfuscated code is atypical of course assignments and could raise suspicion if sampled and viewed by course staff.

Our research introduces FROST, an entirely automated attack on Moss that takes an input program and creates a non-obfuscated variant such that, when compared with the original, Moss will produce similarity scores akin to those of non-plagiarized code. For any source language amenable to optimizing compilation, users provide a source program and a target number of mutations (or a target Moss score), and FROST will perform a series of mutation operations to meet the target while retaining the program's semantics. FROST then automatically compares the generated variant and the original source using Moss and returns the Moss scores.
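To make the mutation idea concrete, below is a minimal sketch (in Python, for illustration) of one kind of semantics-preserving, non-obfuscating rewrite such a tool might apply. This is a hypothetical operator, not FROST's actual mutation set: expanding an augmented assignment changes the token stream while leaving the computed value unchanged for numeric operands.

    import re

    # Hypothetical mutation operator: rewrite "x += e" as "x = x + (e)".
    # Equivalent for numeric operands; a real tool must check side
    # conditions (e.g., "+=" extends a Python list in place, while "+"
    # builds a new one).
    AUG_ASSIGN = re.compile(r"^(\s*)([A-Za-z_]\w*)\s*\+=\s*(.+)$")

    def expand_aug_assign(line):
        m = AUG_ASSIGN.match(line)
        if m is None:
            return line
        indent, name, expr = m.groups()
        return f"{indent}{name} = {name} + ({expr})"

    def mutate(source, budget):
        """Apply at most `budget` mutations, scanning line by line."""
        out, used = [], 0
        for line in source.splitlines():
            new = expand_aug_assign(line) if used < budget else line
            used += int(new != line)
            out.append(new)
        return "\n".join(out)

    print(mutate("total = 0\nfor n in nums:\n    total += n", budget=3))

Each rewrite of this kind changes the tokens at the edit site, so every k-length window overlapping the edit hashes differently, which is exactly what the attack needs.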

FROST effectively hides software plagiarism without obfuscation by directly undermining Moss's plagiarism detection algorithm. FROST takes advantage of the fact that Moss essentially makes comparisons at a coarse granularity. After the input programs are tokenized, Moss compares the hashes of every k-length window of tokens, rather than comparing every token. Across all of the hashes that Moss produces over all of the input files, Moss detects similarity wherever hashes match. Directly attacking this methodology, FROST simply creates a small mutation within enough of the k-length windows of tokens. One small mutation changes the hash of every window containing it, resulting in mismatched hashes across files and therefore a lower Moss similarity score. With enough small mutations, Moss will produce an unsuspiciously low similarity score akin to those of non-plagiarized programs.
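For intuition, here is a toy model of the window-hashing scheme in Python. It is a sketch under simplifying assumptions rather than Moss's implementation (Moss's tokenizer is language-aware, it winnows its k-gram hashes instead of keeping all of them, and its score is not a plain set overlap), but the failure mode it shows is the same.

    import hashlib

    def tokens(source):
        return source.split()  # whitespace split as a stand-in for a real lexer

    def kgram_hashes(toks, k):
        """Hash every contiguous window of k tokens."""
        return {
            hashlib.md5(" ".join(toks[i:i + k]).encode()).hexdigest()
            for i in range(len(toks) - k + 1)
        }

    def similarity(a, b, k=5):
        ha, hb = kgram_hashes(tokens(a), k), kgram_hashes(tokens(b), k)
        return len(ha & hb) / max(len(ha | hb), 1)

    orig = "int s = 0 ; for ( int i = 0 ; i < n ; i ++ ) s += a [ i ] ;"
    # One tiny, semantics-preserving edit: "i < n" becomes "n > i".
    mut  = "int s = 0 ; for ( int i = 0 ; n > i ; i ++ ) s += a [ i ] ;"
    print(similarity(orig, orig))  # 1.0: every window hash matches
    print(similarity(orig, mut))   # much lower: the edit invalidates
                                   # every window that overlaps it

Spreading a handful of such edits across a file drives the shared-hash count, and hence the reported score, toward that of unrelated programs.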

To retain the effectiveness of Moss in light of our attack, we present a simple countermeasure to FROST that essentially canonicalizes the programs by undoing the transformations FROST produces. Since FROST creates variants that retain the original algorithm, semantics, and general structure of the source file, we leverage compiler optimizations to normalize the programs. Comparing the resulting assembly files using Moss yields high similarity scores indicative of plagiarized code.
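As a sketch of what this countermeasure looks like in practice, assuming gcc and C sources (the file names below are placeholders), the canonicalization step is just optimizing compilation to assembly; the resulting .s files are then handed to Moss in place of the sources.

    import subprocess

    def to_assembly(src, out, opt="-O2"):
        """Compile a C file to optimized assembly with gcc -S. Optimization
        collapses many source-level differences (dead code, reordered or
        expanded statements, renamed locals) back into one canonical form."""
        subprocess.run(["gcc", opt, "-S", src, "-o", out], check=True)

    # Canonicalize both files, then compare the .s outputs with Moss.
    to_assembly("original.c", "original.s")
    to_assembly("frost_variant.c", "frost_variant.s")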