In the last few months, I’ve done quite a few things to advance my research project. I wrote a formal proposal, connected with a couple of people in my field, and studied existing fuzzers and fuzzing infrastructure.
At this point, I have a fairly good idea of what I want to do with my time. I plan to integrate ParmeSan-style sanitizer-guided fuzzing with in-process fuzzing. I hypothesize that this combination will be more efficient than regular coverage-guided fuzzing while being easy to use with larger projects.
Over the last couple of weeks, I’ve been exploring possible ways to combine sanitizer-guided and in-process fuzzing. This article contains all of the important takeaways I’ve picked up from my deep dive.
What approaches did I initially consider?
In my research proposal, I laid out a few possibilities for how my project would work:
Develop a strategy for integrating sanitizer guidance into in-process fuzzing.
There are multiple possible ways to make use of sanitizer guidance in in-process fuzzing. For example, an existing fuzzer like AFL could be retrofitted to include sanitizer guidance. AFL already includes integrations with the LLVM compiler infrastructure that allow it to be used in-process. Another possibility is to integrate ParmeSan, which includes sanitizer-guided fuzzing, into LLVM in the same way as AFL. Finally, the most educational option would be to build a custom fuzzer with compiler integration. None of these three options have been attempted in the past as far as the researcher is aware.
While each of these options looked interesting, I had done no more than a cursory search when I wrote this paragraph in my proposal.
Now, I’ve done a little bit more research into each of these possibilities. Based on that research, here’s what the implementation process would look like for each one:
- Modifying an existing fuzzer with in-process or persistent mode support to add sanitizer guidance. This seems achievable, although I would certainly need to write some code.
  - First, I would `git diff` ParmeSan’s changes against the version of Angora that it was forked from. That would show me how sanitizer guidance was added and whether it would be reasonable to continue using this approach.
  - Based on this information, I would port these changes to either AFL or libFuzzer, depending on which was more similar to Angora/ParmeSan. I’m pretty sure that AFL is much closer, so I would probably end up using that as a base.
  - Afterwards, I could fuzz with AFL through the AFL driver in LLVM while taking advantage of my new sanitizer guidance.
- Integrating ParmeSan into LLVM. Compared to the previous option, this possibility should require fewer modifications on the fuzzer side. However, as a trade-off, it would require a new driver on the LLVM side.
  - Depending on how similar ParmeSan and AFL are, I could potentially reuse a large part of the existing AFL driver in LLVM.
  - ParmeSan has the same `llvm-mode` directory in its source code repository as AFL. Its code also includes a few mentions of the word “persistent”, although I don’t know if this is a vestige or simply undocumented functionality. If it works, then I have a lot less work to do.
  - There might be other ways to integrate a different fuzzer’s output with libFuzzer, so I shouldn’t assume that it will be integrated just like AFL.
- Building a fuzzer from scratch. At this point, this option seems like a poor use of time—even if it’s the most educational. I would like to finish this project by the end of the school year. Writing my own fuzzer and LLVM integration from scratch would likely take quite a bit longer, now that I’ve seen that fuzzing engines aren’t exactly simple.
Of those three options, the second seems the most plausible. Not only does it work in a well-established way, but I may also be able to reuse code intended to work with AFL.
Given my lack of knowledge about LLVM’s fuzzing infrastructure and how it connects to third-party fuzzing engines, I read through its documentation.
How does LLVM integrate third-party fuzzers?
LLVM includes its own in-process fuzzing engine, libFuzzer, which plenty of large projects use. libFuzzer is actually linked with the program under test, unlike AFL and other non-in-process fuzzers. As a result, libFuzzer can find bugs in “deeper” parts of programs, especially complex ones like web browsers.
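To make "linked with the program under test" concrete, here is a minimal sketch of a libFuzzer fuzz target. `LLVMFuzzerTestOneInput` is the entry point that libFuzzer calls in-process for every input; `PayloadLength` is a hypothetical stand-in for real project code, not something from ParmeSan or LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical code under test: the first byte claims a payload length, and
// the function returns how many payload bytes are really available.
static size_t PayloadLength(const uint8_t *data, size_t size) {
  if (size == 0) return 0;
  size_t claimed = data[0];
  size_t available = size - 1;
  return claimed < available ? claimed : available;
}

// libFuzzer calls this function directly, in the same process, once per input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  size_t n = PayloadLength(data, size);
  // A violated invariant aborts the process, which the fuzzer reports as a crash.
  assert(n <= size);
  return 0;  // 0 tells libFuzzer the input was processed normally.
}
```

A target like this is typically built with something like `clang++ -g -fsanitize=fuzzer,address target.cpp`, which links in libFuzzer's own `main`. Because the target is just a function, it can call code deep inside a larger program without going through that program's usual entry point.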
The libFuzzer authors recognize that users might want to try other fuzzing engines. As a result, you can integrate other fuzzing strategies with libFuzzer in one of three ways:
- Running both on the same corpus. This approach is easy to use but requires that both fuzzers can interface directly with the program. For testing specific code deep within programs that can’t be reached from the outside, this doesn’t work so well. The external fuzzer would be testing a different part of the code than libFuzzer.
- Adding a driver (or glue code) directly into the LLVM source tree. This has already been done for AFL. However, it requires code on the AFL side as well. It may or may not be feasible to add glue code for this to both LLVM and ParmeSan. I don’t know how much of the AFL glue code can be reused; I also don’t know how well Angora (and therefore ParmeSan by extension) works with persistent mode like AFL does.
- Writing a custom mutator for libFuzzer. This technique can handle compressed or encrypted input through an additional `LLVMFuzzerCustomMutator` function. However, this strategy would require implementing all of the sanitizer guidance logic within the fuzz target unless the fuzz target is somehow linked to ParmeSan.
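The custom mutator option in the list above can be sketched roughly as follows. The `LLVMFuzzerCustomMutator` signature is the one libFuzzer looks for; everything else is an assumption for illustration. The XOR "encryption" and its key are hypothetical, and the single bit flip stands in for real mutation logic (a real mutator would usually hand the decoded bytes to libFuzzer's `LLVMFuzzerMutate`, which I've left out so the sketch compiles without libFuzzer).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical stand-in for an encrypted input format: XOR with a fixed key.
static const uint8_t kKey = 0x5A;

static void XorInPlace(uint8_t *data, size_t size) {
  for (size_t i = 0; i < size; ++i) data[i] ^= kKey;
}

// libFuzzer calls this instead of its built-in mutators when it is defined.
// It must mutate `data` in place and return the new size (at most max_size).
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *data, size_t size,
                                          size_t max_size, unsigned int seed) {
  assert(size <= max_size);
  if (max_size == 0) return 0;

  std::minstd_rand rng(seed);  // Deterministic for a given seed.
  XorInPlace(data, size);      // Decode so the mutation hits the plaintext.

  if (size == 0) {
    // Nothing to mutate yet: start with one random byte.
    data[0] = static_cast<uint8_t>(rng() & 0xFF);
    size = 1;
  } else {
    // Flip one random bit in one random byte of the plaintext.
    size_t pos = rng() % size;
    data[pos] = static_cast<uint8_t>(data[pos] ^ (1u << (rng() % 8)));
  }

  XorInPlace(data, size);  // Re-encode so the target can decode it again.
  return size;
}
```

The mutator lives inside the fuzz target binary, which is why any sanitizer guidance would also have to live there under this approach.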
It appears that my original assumption (that connecting ParmeSan to LLVM in the same way AFL is already connected would be the most straightforward approach) has held true. Of course, the real test will be to give it a shot.
After finishing this phase of my research, I’m moving to the implementation phase. I’ll attempt to integrate ParmeSan and LLVM by modifying the existing driver for AFL in LLVM’s source code. After that, I’ll run my fuzzer on other people’s fuzz targets, starting with simple tests and moving up to real-world software.
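As a rough picture of what "driver" means here, the sketch below shows the core job of such glue code: take one test case from an external fuzzer and pass it to the same `LLVMFuzzerTestOneInput` function that libFuzzer would call. This is my own simplified illustration, not LLVM's actual AFL driver, which also handles things like persistent mode. `RunOneInput`, `g_last_size`, and the placeholder target are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Records the size of the most recent input so the sketch has visible output.
static size_t g_last_size = 0;

// Placeholder fuzz target so the sketch builds on its own. In a real build,
// this function comes from the project being fuzzed.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  assert(data != nullptr || size == 0);
  g_last_size = size;
  return 0;
}

// Core of a driver: read one whole test case and hand it to the fuzz target.
// An external fuzzer would supply the bytes, for example through stdin.
int RunOneInput(std::istream &in) {
  std::vector<uint8_t> buf((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
  return LLVMFuzzerTestOneInput(buf.data(), buf.size());
}
```

A complete driver would wrap this in a `main` that calls something like `RunOneInput(std::cin)`, possibly in a loop so that many inputs are processed per process. How much of that would carry over to ParmeSan is exactly what I still need to find out.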
And of course, I’ll continue writing updates here, especially if I start finding bugs.