When I mentioned fuzzing a data structure, I was asked some questions like "how does one actually do that?", and "how does fuzzing work?"
This article intends to bring a very brief introduction to explain what is going on and how to use it. Making sophisticated use of it, is up to you, once you have started, but after having read this you should be able to make your first fuzz test.
Fuzzing is to programmatically exercise your code under test through rules that are controlled by guided randomness. You write the rules with your fuzz test, and the fuzzing library provides the guided randomness.
default constructed stack is empty PASS!
Each push grows size by one PASS!
Pop returns the pushed elements in reverse order PASS!
pop on empty throws PASS!
Let's write a fuzz test and see if it can find the problems.
This is super naïve, but when compiled with sanitizers, it may still provide useful information about things like memory access errors or undefined behavior.
When possible, I usually try to mimic the behavior of my API through another implementation, so that I can compare the actual results with expected results. For things like data structures, I find that populating them with std::unique_ptr<> is a good way to catch many errors (the sanitizers are very good at picking them up).
I also try to populate my data structure with data from the input, so that I can distinguish between values. For this a small helper is useful:
A source instance is created from the random data, and you can get any trivial type from it, as long as there's sufficient data remaining.
A fuzz test can then look like this:
Here a std::optional<> is used to hold the stack, because that makes it very easy to test destruction.
I've chosen std::vector<> as the reference to compare with, and here I use my knowledge of when it should throw. I've chosen to not test empty(), size() and capacity() as separate operations, but they are all exercised in the tests.
A more sophisticated test should also include copy and move construction/assignment, but this.
Here comes the fuzz!
Building this with -fsanitize=address,undefined,fuzzer --coverage and running the program yields this output:
So the fuzzer found a problem. Line 16 shows an important bit. The crash data. This file holds the byte sequence that caused the problem.
If you run the fuzzer with that file name as argument, it will run exactly that sequence. You can run it through a debugger. If I run it in a debugger and just run until it crashes, I get this result:
A memory access error in unique_ptr<>::operator= for when doing push.
Looking at the member data when in the push() member shows:
And the source code for push() says:
And here it is revealed, the logic for accessing the data_ member uses 1-based indexing, and the data_ array uses 0-based indexing, so when size_ was 7, it is incremented to 8 and accesses data past the end.
Fixing that error, to ensure 0-based indexing in push(), pop() and back() makes it pass this set of tests.
This was a very simple example to get you going. More complex APIs can be considerably more difficult to fuzz.
For data-structure like types, I almost always try to fuzz them, because they're reasonably easy and it tends to pay off very well.
For APIs that in their turn communicates with other APIs that you need to have test doubles for, it can be very tricky indeed.
One last thing. As shown the fuzzer runs in one thread. Start your fuzzer with -fork=X, where X is the number of threads you want it to spawn if you want to parallelize and accelerate the finding of problems. In my experience, you probably want to start single threaded because the first found bug tends to come embarrassingly early, when it starts to look good, run over night with as many threads as you have CPU cores.