Five Research Articles on Fuzzing Methods

Since my last update, I’ve made some progress on deciding what I want to explore as part of my research project on fuzzing. In particular, I’ve decided to look into using alternative metrics (not coverage) to guide a greybox or blackbox fuzzer.

With that idea in mind, I started looking at research papers that explore other kinds of feedback for guiding fuzzers. I also read less directly related papers that came up in my search if they seemed relevant. Here are five of the articles and papers I read in the last few days.

1. PTFuzz

Zhang, G., Zhou, X., Luo, Y., Wu, X., & Min, E. (2018). Ptfuzz: Guided fuzzing with processor trace feedback. IEEE Access, 6, 37302-37313.

Standard coverage-guided greybox fuzzing generally requires that instrumentation is inserted at compile time. Although there are binary emulation methods that can be used to fuzz binaries without instrumentation, these techniques are almost always very slow. This paper presents a method using hardware functionality (Intel Processor Trace) to approximate compiler instrumentation without recompiling or modifying the target binary. The experimental results shown later in the article make it seem as though this technique is pretty effective.

Although the fuzzer introduced in this article effectively still uses coverage-guided fuzzing, it’s an interesting technique nonetheless. Something similar could be used to fuzz closed-source operating system kernels on actual hardware at full speed since no recompiling is necessary. However, the setup required to make that work would likely differ significantly from the setup used in this article.

2. ParmeSan

Ă–sterlund, S., Razavi, K., Bos, H., & Giuffrida, C. (2020). ParmeSan: Sanitizer-guided Greybox Fuzzing. In 29th {USENIX} Security Symposium ({USENIX} Security 20).

The authors identify two issues with existing types of fuzzers:

Instead of optimizing for code coverage, the authors of this paper attempt to optimize for bug coverage using an approach they call sanitizer-guided fuzzing. Although existing fuzzing techniques make use of sanitizers to aid in bug discovery (such as using libFuzzer together with AddressSanitizer to catch more bugs), none use sanitizer information to guide the actual fuzzing process.

In both the abstract and the results section later on, the authors claim significant improvements in bug time-to-exposure (TTE) compared to state-of-the-art coverage-guided fuzzers (in this case, Angora) and extreme improvements in TTE compared to directed fuzzers (AFLGo). Reading about Angora in this paper led me to explore it as my next article.

Another interesting aspect of this research was the use of extremely specific fuzzers (like TySan for type confusion), which only uncover specific kinds of bugs but do so in a very efficient manner compared to directed greybox fuzzing.

The section on accurate dynamic control flow graph (CFG) generation mostly went over my head, but I’m sure that I’ll be able to refer back to that in the future.

Based on what I read in this article, I think that this would be an interesting path to continue exploring. Since sanitizer-guided fuzzing appears to be a very new concept, there should be significant room for further study.

3. Angora

Chen, P., & Chen, H. (2018, May). Angora: Efficient fuzzing by principled search. In 2018 IEEE Symposium on Security and Privacy (SP) (pp. 711-725). IEEE.

The Angora paper starts by discussing the disadvantages of symbolic execution-based fuzzers (they’re slow) and those that employ random mutation (they have trouble producing good-quality inputs for the program being tested). Angora uses a variety of new techniques (including mutating only the bytes that affect program state, inferring type information, and exploring input length) that allow it to trigger more program states than other fuzzers tested.

Although there is plenty of interesting information in the Angora paper (particularly the taint tracking strategy), it is still a coverage-guided fuzzer in every way that is relevant to my project. Since some of the tricks used in Angora could be applied to other kinds of fuzzing, I’ll keep it around for future reference. I’m interested in whether I will be able to reproduce the ParmeSan authors' result that sanitizer-guided fuzzing was significantly more effective than state-of-the-art coverage-guided fuzzers like Angora.

4. FuZZan

Jeon, Y., Han, W., Burow, N., & Payer, M. (2020). FuZZan: Efficient Sanitizer Metadata Design for Fuzzing. In 2020 {USENIX} Annual Technical Conference ({USENIX}{ATC} 20) (pp. 249-263).

First, this paper explains the rationale for using sanitizers in conjunction with regular coverage-guided fuzzing. Fuzzing without a sanitizer will find crashes and hangs but will almost certainly miss other issues—like memory corruption—that do not result in a crash or cause a crash much later on in program execution. For that reason, most security researchers and organizations with fuzzing programs always make sure to use appropriate LLVM sanitizers (ASan, MSan, UBSan, LeakSan, TSan, etc.) so that crashes with useful information are far more likely when bugs are triggered.

The issue with sanitizers is that they slow down the execution of short-lived programs (like the program being tested by a fuzzer) by a significant amount. While sanitizers are intended to be run for a long period of time on a program, and they are efficient in this configuration, they can be much slower when being used with programs that only run for a tiny fraction of a second.

The FuZZan paper presents an effective method for improving the speed of sanitizers on these kinds of tasks. The techniques used here could be useful in conjunction with sanitizer-guided fuzzing to improve efficiency.

5: Fuzzing With Grammar

Godefroid, P., Kiezun, A., & Levin, M. Y. (2008, June). Grammar-based whitebox fuzzing. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 206-215).

This paper presents methods for fuzzing targets that expose more code paths when provided high-quality input instead of large quantities of low-quality input. Compilers, for instance, have a lot of early stages with a lot of potential code paths. Although mutation-based fuzzing can work well for finding bugs in those very first parts of a complex program like a compiler, it has a harder time getting deeper into the program because early stages reject input that does not match syntax expectations.

While I don’t think that I’ll adopt the technique mentioned in this paper because it is slow for targets that can handle higher levels of input randomness, it might still come in handy in the future.

My Ideas: Time-Guided Fuzzing? Memory Usage-Guided Fuzzing? Explore Sanitizer-Guided Fuzzing Further?

Now that I’ve explored a lot of the literature surrounding this particular aspect of my research topic, I’m narrowing in on my exact topic.

Of the topics presented in the articles I read, sanitizer-guided fuzzing is the most interesting to me. It’s cutting-edge research—the paper was published this year—with huge potential for improving fuzzing efficiency. With the large (and growing) number of LLVM sanitizers, this approach has the potential to be used to detect all sorts of issues that regular fuzzing has trouble finding.

That said, I’ve also been thinking of other ways to guide fuzzers that I haven’t yet seen in any paper. For instance, to effectively find hangs and accidentally quadratic code paths, execution time-guided fuzzing might be useful. It would probably also be really slow, so I’m not sure how effective it would be.

Another thought I had was guiding a fuzzer using memory usage to find leaks and program states that allocate tons of RAM. Again, this is somewhat specialized and could be very inefficient. I haven’t read any papers that incorporate it.

Based on what I’ve read and thought about, I think that sanitizer-guided fuzzing is the most promising approach continue studying. I originally presumed that I would write my own fuzzer as part of this project; I’m undecided as to whether or not I will end up doing that given the significant complexity associated with an advanced fuzzer like this. However, I think that producing a half-completed project from scratch would be a better learning experience than modifying an existing project.