linux slab poisoning 101

During development of my “PCI core learns hotplug” patch series, Vegard Nossum decided he wanted to subject it to kmemcheck, which is a really cool tool he’s been working on.

One of the things he found was a use-after-free bug, and as we all learned in Programming 101 (at least when 101 was taught in a manly language like C as opposed to a modern-day metro language like python who wears pants that are too tight and throws whiny emo temper tantrums aka exceptions), you really really shouldn’t be using a pointer after you free it, especially if you’re attempting to write a non-sucky operating system.

Since I’m not as smart as Vegard, I didn’t use kmemcheck, but luckily for simpletons like me, the Linux kernel comes with several other nifty built-in tools that get you pretty close, namely slab poisoning. Unfortunately, there aren’t tons of documents that explain how to actually interpret slab poisoning data, maybe because all the great coders come from other planets or Europe where they apparently teach this stuff in preschool along with naptime and “punching a girl is not an acceptable method of indicating to her that she has slightly fewer cooties than the other cootie-infested harlots”.

What follows is a very cursory introduction to reading some slab poisoning information that might get you started down the right path, but only if you’re smarter than me. Then again, I drool in my sleep so that’s a pretty low bar we’re talking about.

After you turn on slab debugging, reboot into your kernel, and reproduce the error, you’ll see some scary output that looks like this:

Keep in mind that this is one of the world’s easiest-to-debug slab corruptions because that’s about what I can handle. Anything harder than this and I’ll be reading your blog entry.

Our first clue is this line:

and it’s an important one. It tells us that this is a use-after-free, because we were expecting 0x6b and we got something different. When slab debugging is turned on, every time you kfree an object, the allocator fills the old memory with 0x6b. By way of contrast, memory that has been allocated but not yet written to is filled with 0x5a bytes, so if you see that pattern show up in your debug output, you know that you’re using an object before you’ve initialized it. Bad penguin! No mackerel for you!
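If you want to see the idea in miniature, here is a userspace sketch of poison-and-check. It assumes only the two byte patterns mentioned above; the function names are made up for illustration, and the real allocator does a good deal more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_INUSE 0x5a /* allocated but not yet initialized */
#define POISON_FREE  0x6b /* freed */

/* Fill an object with the "freed" pattern, roughly what happens on kfree. */
void poison_on_free(unsigned char *obj, size_t size)
{
	assert(obj != NULL);
	memset(obj, POISON_FREE, size);
}

/*
 * Return the offset of the first byte that no longer holds the "freed"
 * pattern, or -1 if the poison is still intact.
 */
long first_corrupt_offset(const unsigned char *obj, size_t size)
{
	assert(obj != NULL);
	for (size_t i = 0; i < size; i++)
		if (obj[i] != POISON_FREE)
			return (long)i;
	return -1;
}
```

Anything that scribbles on the object after `poison_on_free` shows up as a nonnegative offset from `first_corrupt_offset`, which is the same kind of hint the kernel is handing us.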

Now that we know that it’s a use-after-free bug, we can safely ignore the stack trace at the end, because that trace is telling us where we detected the bug, not where we created it. Mmmkay?

Ok, let’s try and figure out where our bug is.

The fact that the byte it’s complaining about is a 0x6a instead of a 0x6b means something got decremented. But… what? And that brings us to our next lines, which are:
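Why would a decrement leave exactly one 0x6a behind? A rough sketch, assuming the thing being decremented is a 32-bit counter sitting in poisoned memory (the helper names here are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b

/*
 * Decrement a 32-bit counter living at byte offset `off` inside an
 * already-poisoned buffer. memcpy sidesteps any alignment trouble.
 */
void stale_decrement(unsigned char *obj, size_t off)
{
	uint32_t counter;

	assert(obj != NULL);
	memcpy(&counter, obj + off, sizeof(counter));
	counter--;                     /* 0x6b6b6b6b becomes 0x6b6b6b6a */
	memcpy(obj + off, &counter, sizeof(counter));
}

/* Count bytes that differ from the poison; *seen gets the last one found. */
size_t count_damage(const unsigned char *obj, size_t size, unsigned char *seen)
{
	size_t damaged = 0;

	assert(obj != NULL && seen != NULL);
	for (size_t i = 0; i < size; i++) {
		if (obj[i] != POISON_FREE) {
			damaged++;
			*seen = obj[i];
		}
	}
	return damaged;
}
```

One decrement of 0x6b6b6b6b touches only the low-order byte, so the whole object still looks poisoned except for a lone 0x6a.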

At this point, a little familiarity with your code is in order. Let’s see… what could possibly be allocated by alloc_pci_dev and freed in pci_release_dev. Hm…. what could it be…. what could it be….

Give up?

It’s a … struct pci_dev *! Ta daaa!

So we’re touching a struct pci_dev * that we really shouldn’t be touching, and in fact, we’re decrementing something in that structure. Let’s go back to our debug output:

Alrighty you pinball wizards, do the math! Me? I’m lazy, so I’ll just fire up gdb and have it tell me the answer:

Ah ha, we’re decrementing something that’s offset 216 bytes in our struct pci_dev *. Now at this point, you’re pretty much done, but calculating offsets by hand is a tedious task reserved for struggling undergraduates and possibly people who abuse small animals. I’ve already taken all my required tests and got my piece of parchment that says I paid too much for that hazy crazy 6-year party, so I get to cheat and use a tool that someone else has written.
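For the record, the math gdb did is nothing fancier than a subtraction: address of the corrupted byte minus address where the object starts. A tiny sketch (the function name is mine, and the addresses in the note below are invented, not the ones from my debug output):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Offset of a corrupted byte within its object: the corrupted address
 * minus the address where the object starts.
 */
size_t offset_in_object(uintptr_t object_start, uintptr_t bad_byte)
{
	assert(bad_byte >= object_start);
	return (size_t)(bad_byte - object_start);
}
```

With a made-up object at 0xc1000000 and a bad byte at 0xc10000d8, this returns 0xd8, which is 216 in decimal.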

Enter pahole.

The original intent of the tool was to help developers write tighter structures and fit them into cachelines (note to self, at this point, explain to current CS undergraduates that a cacheline has nothing to do with web 2.0 and has nothing to do with bling blau) but we can use it to quickly figure out what’s at byte 216.

You can run it on any object file that references a struct pci_dev, as long as the file was built with debug info.

Let’s peek inside our output file and see what we got…

Ah, so byte 216 is somewhere inside the struct device embedded in our struct pci_dev, which starts at offset 92.

Almost, but not quite there. 92 + 96 is 188, but 92 + 132 is 224, and 216 falls between the two, so we’re somewhere inside that struct kobject.

Bingo! 92 + 96 is the start of our struct kobject and right at offset 28 is a struct kref. 92 + 96 + 28 = 216.
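If you’d rather have the compiler do the adding, `offsetof` will chase nested members for you. The structs below are mock stand-ins padded to reproduce the offsets quoted above; they are an assumption for illustration and are not the real kernel layouts, which vary by version and config.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins padded to match the offsets quoted above; NOT the real layouts. */
struct mock_kref {
	int refcount;                  /* plain int stand-in for the counter */
};

struct mock_kobject {
	char before_kref[28];
	struct mock_kref kref;         /* offset 28 */
};

struct mock_device {
	char before_kobj[96];
	struct mock_kobject kobj;      /* offset 96 */
};

struct mock_pci_dev {
	char before_dev[92];
	struct mock_device dev;        /* offset 92 */
};

/* Add up the per-level offsets and cross-check against the nested form. */
size_t refcount_offset(void)
{
	size_t off = offsetof(struct mock_pci_dev, dev)
		   + offsetof(struct mock_device, kobj)
		   + offsetof(struct mock_kobject, kref)
		   + offsetof(struct mock_kref, refcount);

	assert(off == offsetof(struct mock_pci_dev, dev.kobj.kref.refcount));
	return off;
}
```

`refcount_offset()` comes out to 216 for these mock structs, the same 92 + 96 + 28 we just did by squinting at pahole output.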

Let’s see what’s in that struct kref:

And that matches up fo’ shizzle with what the slab poisoning info was telling us — a byte was getting decremented at offset 216. That byte happened to be a refcount, which makes perfect sense — someone is trying to decrement a refcount after this object was kfreed.

Now you know what to go look for. Someone is calling pci_dev_put (or similar) after pci_release_dev has already been called.
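Here is a toy model of that bug, with a hypothetical `mock_dev` standing in for the pci_dev and a flag standing in for the release hook. In the real kernel the memory is already gone by the time the extra put happens, so there is no flag to check; this only shows the shape of the mistake.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as a pci_dev. */
struct mock_dev {
	int refcount;
	bool released;                 /* set once the release hook has run */
};

void mock_dev_init(struct mock_dev *d)
{
	assert(d != NULL);
	d->refcount = 1;               /* the creator holds the first reference */
	d->released = false;
}

void mock_dev_get(struct mock_dev *d)
{
	assert(d != NULL && !d->released);
	d->refcount++;
}

/* Returns false when called on an already-released object (the bug). */
bool mock_dev_put(struct mock_dev *d)
{
	assert(d != NULL);
	if (d->released)
		return false;          /* one put too many */
	if (--d->refcount == 0)
		d->released = true;    /* stand-in for the release hook */
	return true;
}
```

One init, one get, and two puts balance out fine; the third put is the one that would have scribbled on poisoned memory.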

If you’re lucky, you were writing a patch that was messing with refcounts and you simply put one too many in. At least you know where to start sprinkling printks.

If you’re unlucky and your patch has nothing to do with pci_dev refcounts, well, maybe it’s time to go to confessional (yes it’s been a while, hasn’t it?), say 8 Hail Marys, 2 Our Fathers, grab a cup of coffee, and go fix some other idiot’s code (perhaps some idiot that has a Web 6.0 blog).