Monday, January 25, 2016

Humans Rock, Computers are Dumb

People Are Good at Some Things

Humans are very well adapted to solving complex problems quickly. So quickly, in fact, that often we don't even realize we are solving problems.

Consider motion detection - how do you know when something has moved? Think about a leaf blowing across the lawn. You see the leaf, and you see the lawn. You see the leaf blowing across the lawn. But how do you know the leaf moved? You just know.

You probably can't come up with an explanation other than "it just moved". But how do you know it moved? Your brain has a vast understanding of the world. You know what a leaf is. You know the properties of a leaf - light, crisp, blowable. You know the properties of a lawn: firm, stays on the ground and not blowable. You also have an understanding of things that are not visible - the wind, for example. You know that wind exists, that it can change direction and speed and that it can blow light objects (like leaves, which are blowable). When you put all these (somewhat complex) concepts together, you form an understanding that the leaf is blowable, the lawn is not blowable and wind can blow the leaf across the lawn.

Yet, that still is not sufficient: you still need to realize that the leaf is not where it used to be. You can see the leaf, you can observe where the leaf resides in individual points in time. And you can reason that if the leaf appears to change position from one moment to the next, then the leaf is probably being blown by the wind. And all this happens in a fraction of a second, without you thinking about it.

It is beautiful, really - your brain can process and solve very complex problems without even consciously thinking about it. Even more incredible - your brain can actually predict where the leaf will go next. By using contextual clues such as the current path of the leaf, how hard the wind is blowing, any obstacles that might be in the way and past experiences with blowing leaves, your brain puts together a flight path of the leaf. And your brain is so good at this that you don't even realize you just solved a very complicated calculus problem in less time than it takes to blink.

But Computers are Dumb at Those Same Things

Even though you are smart, your computer is dumb. Very dumb. Ever try to have a discussion with your computer about a leaf? Probably not - most computers aren't built like that, and most computers have no concept of leaves, wind or even how to communicate with a human. Or what a human is. Or that the notion of communication exists. Short story: computers don't know how to figure out that a leaf is blowing across the lawn. The computer simply has no knowledge of any of the concepts that your brain was able to resolve in a fraction of a second in order to solve the same problem.

Computers are a far cry from Skynet. If a computer can't figure out what wind is, how can your computer know to launch nuclear missiles at humans in order to ensure its own survival?

Motion Detection is Super Easy... for Humans

Motion detection is easy. Very easy. At least, motion detection is very easy to do very poorly. Not to be confused with motion tracking (following an object as it moves), which is a bit more complicated.

Keep in mind that in order to detect motion, computers do not need to know what is moving (computers do not need to have a notion of a leaf, or a lawn or the wind). Which is good, because computers are dumb (remember?). But, computers are fast at simple tasks. Incredibly fast. Like calculating the differences in images. As it turns out, being able to find the differences between two consecutive images of the same thing is an easy way to figure out if something has moved.

Our "algorithm" is a set of instructions that are followed in order; at the end of this algorithm, we will have determined if a motion event has occurred, based on two images.

For each image:
  1. Convert the image into black and white
  2. Divide the image into grid boxes
  3. Count how many black pixels are in each grid box
Then, comparing two images:
  4. Compare the number of black pixels in each grid box to the corresponding grid box in the other image
  5. If the number of black pixels between two corresponding grid boxes is very different, mark that grid box as experiencing a motion event
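The steps above can be sketched in code. This is a minimal illustration, not a production implementation: the function names are my own, each image is assumed to already be a black-and-white 2D list (1 = black, 0 = white), and the image dimensions are assumed to divide evenly by the box size.

```python
def black_counts(image, box_size):
    """Count black pixels in each box of the grid.

    `image` is a 2D list of 0s and 1s whose dimensions are assumed
    to be evenly divisible by `box_size`.
    """
    rows = len(image) // box_size
    cols = len(image[0]) // box_size
    counts = [[0] * cols for _ in range(rows)]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # Each pixel contributes to the grid box it falls inside.
            counts[y // box_size][x // box_size] += pixel
    return counts


def motion_boxes(before, after, box_size, threshold):
    """Return (row, col) of every grid box whose black-pixel count
    changed by more than `threshold` pixels between the two images."""
    a = black_counts(before, box_size)
    b = black_counts(after, box_size)
    return [(r, c)
            for r in range(len(a))
            for c in range(len(a[0]))
            if abs(a[r][c] - b[r][c]) > threshold]
```

Feeding it two images and a threshold gives back exactly the grid boxes that "experienced a motion event" in step 5.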
Please forgive this crude example. Below, I have created a 10x10 grid:

Why do we use black and white? Because it is easy (and, as it turns out, a bit more tolerant*)! If we had to track the number of pixels in each grid box for each possible color, we would be dealing with 64 bits for each pixel. This translates to roughly 80 megabytes of memory for each 10 megapixel image. That is a lot of data to process, and keeping track of those colors is confusing - keep in mind that you can express about 18,000,000,000,000,000,000 (18 billion billion) different colors with each 64 bit pixel. By my calculations, that is in the neighborhood of "a crap ton". Why should we use a complex data structure to store all that data? Instead, I can use 2 numbers: one number for the black pixel count and one number for the white pixel count. Now my storage requirement is simply 2n (where n is the number of grid boxes). Even if there are 10,000 grid boxes, 20,000 (2n, or 2 x 10,000) is a far cry from a "crap ton". We can actually skate by with just n, since either count can be calculated from the other: the white count is simply the size of the grid box minus the black count.
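The storage arithmetic is easy to check. The numbers below follow the hypothetical figures used in this post (a 10 megapixel image at 64 bits per pixel, and 10,000 grid boxes):

```python
# Full-color storage: every pixel carries its own 64-bit value.
pixels = 10_000_000
bits_per_pixel = 64
full_color_bytes = pixels * bits_per_pixel // 8  # 80,000,000 bytes (~80 MB)

# Black-and-white grid storage: two counts per grid box.
grid_boxes = 10_000
counts_stored = 2 * grid_boxes  # 20,000 numbers, a far cry from "a crap ton"

print(full_color_bytes, counts_stored)
```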

Black and White is not Grey

Just to clarify, when I say black and white, I mean black and white. Not Casablanca black and white. Each pixel is either black or white - there is no gray. Images look pretty crappy when converted this way, but remember that they are much easier for our algorithm to "understand".

Another crude example - the same image (a color rainbow) in color, grey-scale and black and white:
Color Rainbow
Grey-scale Rainbow
Black and White Rainbow
Converting an image to black and white is pretty simple: if a pixel is bright enough, we consider it white. Otherwise, we consider the pixel black.

There are several ways to determine the "brightness" (called "luminosity" in nerd-speak) of a pixel, but the methods are generally simple and involve multiplying each of the component colors (red, green, blue) by some constant and adding the values together, then comparing the result to a cutoff. Something like this random formula I found on the Internet just now:

(0.299 * red) + (0.587 * green) + (0.114 * blue)

Notice how boring that is. No magic here. Computers are great at calculations like these.
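In code, the whole conversion is one small function. A sketch, assuming 8-bit (0-255) color components; the cutoff of 128 is just the midpoint I picked, not a standard:

```python
def to_black_or_white(pixel, cutoff=128):
    """Classify an (red, green, blue) pixel as black (1) or white (0).

    Components are assumed to be in the 0-255 range. The cutoff of
    128 is an arbitrary midpoint, not a standard value.
    """
    red, green, blue = pixel
    # The luminosity formula quoted above.
    luminosity = 0.299 * red + 0.587 * green + 0.114 * blue
    return 1 if luminosity < cutoff else 0
```

Run it over every pixel and you have your black-and-white image.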

Grids and Boxes

Remember that grid box from up above? This is what it looks like with a creepily-proportioned stick figure:

Original Image
I have cleverly created the image so that each grid box contains pixels of only one color. This will make the example easier to follow; the image has also already been converted to black and white.

Multiple Images

Motion detection, even poor motion detection, requires multiple images to compare. Simply put, we compare the black counts in each grid box from one image to the next. If a count changes by more than a certain amount, we can say that some motion has occurred.

Now, let's add an element of motion (notice how his right (your left) hand has moved - B3 has moved to B5):
Image After Motion

Following along with the "algorithm", the black pixel totals for the grid boxes in the original image:
0000000000
0000110000
0100110000
0111111110
0001111010
0001111000
0001111000
0001001000
0001001000
0001001000

The black pixel counts for the grid boxes in the image after motion - the changed counts are at B3 and B5:
0000000000
0000110000
0000110000
0111111110
0101111010
0001111000
0001111000
0001001000
0001001000
0001001000

We can clearly see that two grid boxes indicate a motion event (grid boxes B3 and B5).
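The comparison itself is trivial for a computer. Here is the count data from the two grids above fed through a quick check (columns labeled A-J, rows 1-10, matching the grid notation used here):

```python
# Black-pixel counts per grid box, copied from the two grids above.
before = ["0000000000", "0000110000", "0100110000", "0111111110",
          "0001111010", "0001111000", "0001111000", "0001001000",
          "0001001000", "0001001000"]
after  = ["0000000000", "0000110000", "0000110000", "0111111110",
          "0101111010", "0001111000", "0001111000", "0001001000",
          "0001001000", "0001001000"]

# Flag every grid box whose count differs, labeled like B3 (column, row).
changed = [(chr(ord('A') + col), row + 1)
           for row in range(10)
           for col in range(10)
           if before[row][col] != after[row][col]]
print(changed)
```

The output names exactly the two grid boxes where the hand moved from and to.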

And just like that, we have motion detection!

Betterifying It

Cool - we just detected motion! And it was simple. But can we do better? Certainly! We could figure out a good threshold for the differences in the grid box comparison values (is 5% change enough?).

We could decrease the size of each grid box (which would provide more resolution).

Perhaps we could require that some percentage of all the grid boxes have to detect motion before we consider a valid motion event. Even better - only consider changes in neighboring grid boxes.
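One way to sketch that neighboring-boxes idea: only keep a flagged grid box if at least one of its eight neighbors was also flagged. The function name and approach here are my own illustration, not a standard algorithm:

```python
def filter_isolated(changed):
    """Keep only flagged grid boxes that have at least one flagged
    neighbor (of the 8 surrounding boxes). `changed` is an iterable
    of (row, col) tuples; isolated boxes are treated as noise."""
    changed = set(changed)

    def has_flagged_neighbor(r, c):
        return any((r + dr, c + dc) in changed
                   for dr in (-1, 0, 1)
                   for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0))

    return {(r, c) for r, c in changed if has_flagged_neighbor(r, c)}
```

A lone flickering box (a shadow, a sensor glitch) gets discarded, while a cluster of changed boxes - an actual moving object - survives.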

My Application

While these are all valid techniques, please keep in mind that this is a gross oversimplification of a complex problem. While valid and accepted, it is not the best solution for mission critical applications. In other words: please don't sue me if you use this for anything. Ever.

That being said, I am using this concept in one of my pet projects. The project is a home alarm system, and while I have implemented a slightly more complex motion detection algorithm, the results are promising!

The End

That about wraps it up. Motion detection is simple for both humans and computers. Except humans and computers detect motion in vastly different ways. Also, humans are way better at motion detection across a variety of conditions (light, weather, wind, etc.).

*Black and white is more tolerant for this application because the binary classification of each pixel (only black or white) makes it unlikely that subtle variances in lighting, shadows or inexpensive hardware will result in a false positive. Trust me, please.