Sniffing packets using BPF
I'm insatiably curious. It's hard for me to not wonder how something works. If I see even a hint of something interesting, I will find out how it works.
One of the few pieces of software I keep running 24/7 is a Perl script called MySQL query sniffer. It watches the network interface of your choice and dumps out the query from any packet containing a MySQL query. This is a very handy trick for debugging database issues when your software says it's executing a query but you want to know exactly what that query looks like to the database. I find it much more convenient than query logging for a few reasons:
- Figuring out which queries are mine on a shared dev database is challenging
- I may not have permission to turn on query logging
- Query logging slows down the database
- Query logging can take up quite a lot of space
- Sometimes the MySQL server restart required to turn on query logging isn't an option
I usually leave this script running in the background on my MacBook Pro at work all day long. Starting the script without sudo, though, gives the following error:
durandal:~ ttrueman$ ./mysqlsniff-0.10.pl en0
(no devices found) /dev/bpf0: Permission denied
I thought I told mysqlsniff to listen on en0. What the hell is this bpf0 device? Curiosity got the better of me, so you're going to hear from me just what this bpf0 really is.
The Berkeley Packet Filter (BPF) is an abstraction that sits between the raw network interface and application software. It lets an application read everything on the raw interface if it wants to, or see only the packets it cares about. The real win is that the filtering is fast and happens before packets are handed to the application. The benefits are twofold: lower CPU overhead, because there are fewer packets to handle, and fewer packets in the device buffer, which means the buffer is less likely to fill up and drop packets. Wikipedia has a short but helpful article on it.
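To make "just see relevant packets" a bit more concrete, here is a rough sketch of the kind of test a filter for MySQL traffic performs on each frame. It is plain userspace Python, not real BPF bytecode, and it is not taken from mysqlsniff. The function names, the helper that builds fake frames, and the use of port 3306 (the MySQL default) are my own assumptions for illustration. It also ignores details a real filter handles, such as IP fragments and IPv6.

```python
import struct

MYSQL_PORT = 3306      # MySQL's default port; assumed for this sketch
ETH_HEADER_LEN = 14    # destination MAC + source MAC + EtherType
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def is_mysql_packet(frame: bytes, port: int = MYSQL_PORT) -> bool:
    """Return True if an Ethernet frame carries IPv4/TCP to or from `port`."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return False                      # too short for Ethernet + IPv4
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return False                      # not IPv4
    ip_header_len = (frame[ETH_HEADER_LEN] & 0x0F) * 4
    if ip_header_len < 20:
        return False                      # malformed IPv4 header
    if frame[ETH_HEADER_LEN + 9] != IPPROTO_TCP:
        return False                      # not TCP
    tcp_start = ETH_HEADER_LEN + ip_header_len
    if len(frame) < tcp_start + 4:
        return False                      # TCP ports are missing
    src_port, dst_port = struct.unpack("!HH", frame[tcp_start:tcp_start + 4])
    return port in (src_port, dst_port)


def build_frame(protocol: int, src_port: int, dst_port: int) -> bytes:
    """Hypothetical helper: build a minimal fake frame for trying the filter."""
    ethernet = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ipv4 = (bytes([0x45, 0]) + struct.pack("!H", 40) + bytes(4)
            + bytes([64, protocol]) + bytes(2) + bytes(4) + bytes(4))
    transport = struct.pack("!HH", src_port, dst_port) + bytes(16)
    return ethernet + ipv4 + transport


if __name__ == "__main__":
    print(is_mysql_packet(build_frame(IPPROTO_TCP, 50000, 3306)))
    print(is_mysql_packet(build_frame(IPPROTO_TCP, 50000, 80)))
```

The point of BPF is that a check like this runs in the kernel, so frames that fail it are never copied up to the application at all.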
If you're really curious about the benefits of using the Berkeley Packet Filter, this relatively old but not too long research paper does a good job of explaining just how expensive it is to process packets with a CPU. Just imagine a few dozen instructions per packet multiplied by the packet rate of gigabit Ethernet and try not to cringe.
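For a rough sense of scale, here is some back-of-the-envelope arithmetic. The numbers are my own assumptions, not figures from the paper: worst-case minimum-size Ethernet frames (64 bytes plus 8 bytes of preamble and a 12-byte inter-frame gap on the wire), and 36 instructions standing in for "a few dozen".

```python
LINK_BITS_PER_SECOND = 1_000_000_000   # gigabit Ethernet
MIN_FRAME_BYTES = 64                   # smallest legal Ethernet frame
PREAMBLE_BYTES = 8                     # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12              # mandatory idle time between frames


def max_packets_per_second(link_bps: int = LINK_BITS_PER_SECOND) -> int:
    """Worst-case packet rate: the link saturated with minimum-size frames."""
    wire_bits = (MIN_FRAME_BYTES + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps // wire_bits


def instructions_per_second(instructions_per_packet: int) -> int:
    """CPU instructions spent per second at the worst-case packet rate."""
    return max_packets_per_second() * instructions_per_packet


if __name__ == "__main__":
    # 36 is an assumed stand-in for "a few dozen" instructions per packet.
    print(max_packets_per_second())
    print(instructions_per_second(36))
```

Under those assumptions that is roughly 1.49 million packets per second, or tens of millions of instructions every second spent just deciding which packets matter.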
What kinds of things have you just had to figure out how they worked?
