I live in Silicon Valley. I'm really into machine learning and Formula 1. I'm 23 years old. More about me…

Subscribe in a reader

Article archives

Feel free to email me or IM me.

I've decided to try out this writing thing, which wasn't exactly my favorite part of college, but it's growing on me. I'm going to focus on writing about all things tech. I tend to write once a week, publishing early Monday morning.

Top Posts:
So I went to Startup School
Explaining my addiction
A quick foray into linear algebra and Python: tf-idf
Waking from a 64-bit nightmare
Kickin' ass: Firefox 3

Top Commentors:
Paul Stamatiou (4)
Gordon (3)
Arthur (2)
Anton Lindstrom (2)
m13 (2)

This page is valid XHTML 1.0 Strict

© 2008 Tim Trueman

The importance of being abstract

March 11, 2008 Link

powerbook

As an undergrad, I didn’t spend much time sleeping before a project was due. Then again I hadn’t spent the last two weeks working on it. Now it was 24 hours before it was due and I was hadn’t even thought about how I was going to code an mathematical expression interpreter.

The project was pretty open-ended on how it was implemented but it had to be able to parse and interpret a mathematical expression and evaluate it. The more features your software had the higher your grade, for instance having order of operations, square roots, or variables were all good ways to push your grade into the A range.

Googling turned up an interesting approach I had never heard of before, the recursive descent parser. It was a beautiful solution to the problem, but let me tell you it was hard to wrap my mind around. It’s abstract simplicity was what made it so hard to grasp how it was actually working. I had never thought of using recursion that way; layering function calls into four different levels according to the order of operation. It took hours for me to really fully understand how it worked but finally the “AH-HAH!” moment came to me. I submitted my program, drearily sleep-walked through the snow to my dorm room like a zombie, and fell asleep.

It turns out that recursive descent parser I wrote in a night let me blaze three weeks ahead of schedule in a ten week project to build a Modula-2 compiler in C++.

fluent cluster

Almost a year after graduation I find I still don’t get much sleep, although I guess that’s my own fault. I blame it on the absolute beauty of abstraction. When I started studying computer science it never dawned on me the sheer importance of abstraction. As we abstracted from assemblers to memory-managed languages, we increased the ease, manageability, and speed of development, allowing much more advanced software to be written as fast as Saturn V at escape velocity.

Anyways I sidetracked, let me get to the point I was really trying to share with you. It’s a pair of abstractions called map and reduce that go together like peanut butter and jelly.

So let’s imagine you’ve got the hankering to sum up the first ten perfect numbers. Normally you’d write something like this (in Python, as usual):

sum, array = 0, [2,3,5,7,11,13,17,19,23,29]

for num in array:
  sum = sum + (pow(2,(num-1)) * (pow(2,num) - 1))

What sucks about this is that it’s not very abstract. Why would that matter? Just bear with me and I’ll explain just how awesome this concept actually is. Let’s start by abstracting out the perfect number formula into a separate function:

array = [2,3,5,7,11,13,17,19,23,29]

def perfect(num):
  return (pow(2,(num-1)) * (pow(2,num) - 1))

perfectNums = map(perfect,array)

What’s happened here is we’ve defined a function for calculating perfect numbers, and now we’re mapping that function onto our data set, producing a resulting array of the first ten perfect numbers. But what’s the best way to sum up that array? This is where reduce comes into play:

array = [2,3,5,7,11,13,17,19,23,29,31]

def perfect(num):
  return (pow(2,(num-1)) * (pow(2,num) - 1))

def sum(x, y):
  return x + y

reduce(sum,map(perfect,array))

google datacenterNow we’ve mapped our reduce function onto our previous results, and we finally have the sum of the first ten perfect numbers. Awesome. But wait, isn’t that more code, that’s possibly even more difficult to understand than Lock, Stock and Two Smoking Barrels? Ahh, yes. Why would someone use the map/reduce abstractions? Look no further than Google, who refactored their search algorithm for map/reduce because it offered a leviathan leap forward in speed. Can you guess why?

Parallelization (what a tough word to say!). The beauty of map/reduce is that mapping a function to a data set can be broken up into many tasks. It doesn’t matter if you calculate the first perfect number then the second, or if you split it into two halves being simultaneously executed in parallel. Now imagine your data set is the web and your function is the PageRank algorithm. Apply that to your massively parallel cluster or commodity hardware and you have what is the heart of Google.

There you have it, functional programming’s gift to man: map and reduce. If you’re still confused or have questions, leave a comment!

What class did you have to write a mathematical expression interpreter for?

Gordon on March 15, 2008 at 11:29 PM

Object-oriented programming, and it was fun because we were just told to write it in C# which none of us knew. Good times.

Tim Trueman on March 15, 2008 at 11:31 PM

Have your say

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Safari hates me