Wednesday, March 23, 2016

Fringe in Practice: How Does Sleep Affect Performance?

Does Sleep Affect Performance?

This is a great question! Conventional wisdom in the running world holds that it is not the sleep the night before that affects performance - the sleep two nights prior is key.

I decided to test this.

As of today, I have been recording a number of data points daily for exactly one year. It is apropos, then, to investigate these numbers. The recorded data points contain features such as:
  • Hours of sleep
  • Training volume (hours)
  • Calories in
  • Calories out
  • Fatigue (1-10, 1 being the least fatigued)
  • Freshness (1-10, 1 being the least fresh)
  • RPE (rating of perceived effort, 1-10, 1 being the least taxing level)
  • RPP (rating of perceived performance, 1-10, 1 being the poorest performance)
I also keep track of a number of other features (such as ATL, CTL, TSB, etc.), but these are not relevant for this discussion, as you will see shortly.

It is worth noting that I am not a fanatic about recording data - there are plenty of holes in the data which reflect times when it was not feasible to record data. For instance, ATL and CTL rely on heart rate or power data to be accurate - but I do not always use a heart rate monitor or power meter for my workouts. If I am travelling, I may not have access to the measurement tools (such as a scale).

It is also worth noting that I use specialized software to track workouts and caloric intake (in the form of macronutrients). While not a perfect representation of reality, the tool does model things reasonably well. I recently compared my actual weight differential over the past year with the estimated weight differential from the software, and the difference between the two values was about 1%. I probably got lucky, because hydration levels alone could easily account for a few pounds of difference - but the point is that tracking does provide some degree of accuracy.

With that out of the way, let's get to the fringe. Using R, I quickly created a correlation plot of some of the features I suspected might affect RPP (my perception of performance):

Correlation Plot of Features Thought to Affect Performance
This plot does not tell us a whole lot. The red dots indicate that TSB is strongly and inversely related to ATL, which is basically the definition of TSB. Consequently, we should remove ATL and TSB for our next attempt - same for ATL and RPE, and for CTL and RPE. These are all expected; the correlation plot just validates that thought.

Sleep seems to be somewhat related to TSS and TSS is strongly related to RPE, calories consumed, IF and ATL. Again, these are all expected - the longer you work out per day, the harder the effort and the more food you need.

Let's try again, this time removing the strong (but obvious) correlations in order to reveal any correlations which may have been overshadowed by them.

Take 2: Remove obvious correlations
Now we are getting somewhere! Beyond the direct correlation between calories in (KCal.in) and RPE, we see that sleep is correlated with RPE.

Time to investigate this further. I am performing these next calculations in Google Sheets (a very capable online spreadsheet). Armed with what I just learned from the correlation plots, I will look at the covariance and correlation between some of these interesting features.

The covariance value tells us how much two values are related to each other, but that number is not scaled for comparison. The correlation is sort of like the normalized covariance and is represented in a number between -1 and 1. For both covariance and correlation, a negative value indicates an inverse relationship and a positive value indicates a direct relationship.

For example, suppose a and b have a covariance of 0.29, while c and d have a covariance of 0.98. We cannot claim that a and b are less related than c and d, because the two pairs may be on vastly different scales. However, once we convert each covariance into a correlation, the values become comparable. Continuing the example, let's say that a and b have a correlation of 0.62 and c and d have a correlation of 0.22. We can now assert with more confidence that the relationship between a and b is stronger than that between c and d.
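To make the distinction concrete, here is a quick sketch in Python (with made-up numbers, not my actual training log): correlation is just covariance rescaled by the standard deviations of both variables, which is why it always lands between -1 and 1 and can be compared across pairs.

```python
# Covariance vs. correlation, with invented sample data.
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Rescale the covariance by both standard deviations.
    sx = covariance(xs, xs) ** 0.5
    sy = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sx * sy)

sleep = [6.5, 8.0, 7.0, 5.5, 9.0]   # hours of sleep (invented)
rpe   = [7, 4, 6, 8, 3]             # perceived effort (invented)

print(covariance(sleep, rpe))   # scale-dependent; hard to compare across pairs
print(correlation(sleep, rpe))  # always between -1 and 1
```

In this made-up data, more sleep lines up with lower effort, so both numbers come out negative - but only the correlation tells you how strong that relationship is relative to other pairs.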

My end-game is to figure out what contributes to optimal performance, so I need to start defining what "optimal performance" is. For now, let's start with "low RPE and high RPP". The actual optimization will be a topic for another day, but for now, I am using a combining function which examines both RPE and RPP in order to derive a value, QWS (quality workout score). The higher QWS is, the better.
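The combining function itself is not shown in this post, but as a purely hypothetical example, a QWS that rewards high RPP and low RPE (both on their 1-10 scales) might look like this sketch:

```python
# Hypothetical QWS combining function - NOT the author's actual formula.
# RPE and RPP each run 1-10; invert RPE so "easier" scores higher,
# then average the two components to land back on a 1-10 scale.
def qws(rpe, rpp):
    return ((11 - rpe) + rpp) / 2

# An easy workout that felt great scores high...
print(qws(rpe=2, rpp=9))  # 9.0
# ...while a grinding workout that felt poor scores low.
print(qws(rpe=9, rpp=3))  # 2.5
```

Any monotone function that goes up with RPP and down with RPE would serve the same purpose; the average is just the simplest choice.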


Considering the amount of sleep the night before, these are my findings:



Sleep The Night Prior
              RPE     RPP     QWS
Covariance    0.29    0.24    0.02
Correlation   0.13    0.11    0.12

The most important number is the bottom-right number: the "correlation" of sleep vs. QWS. The correlation is somewhat low, but perhaps still significant. 

For the next two charts, keep in mind that the slope indicates the correlation - the steeper the slope, the stronger the correlation. Plotting the values of sleep the night prior vs. QWS, we get this chart:


Sleep Two Nights Prior

              RPE     RPP     QWS
Covariance    0.20    0.18    0.00
Correlation   0.10    0.08    0.02

In this case, the sleep vs. QWS correlation is extremely low and probably not significant at all. Plotting this data we end up with a chart that looks like this:


My data and analysis show that the sleep two nights prior is far less important than the sleep the night prior. However, the sleep the night prior is not a terribly good indicator of QWS. Perhaps we would have to look at multiple days of sleep leading up to a workout in order to get a better idea of how sleep plays into performance, or consider more than one feature at a time (e.g. CTL, ATL and sleep). Also, using data from higher-volume days only might yield better results, because most shorter workouts tend to be easy, steady-state efforts that I am not likely to consider tough.

This is certainly a fascinating topic and I would love to explore this further.

Takeaway

The sleep the night before does not appear to have a big effect on the QWS of a workout, and the sleep two nights prior is essentially a non-player. Your mileage may vary.

One Last Point

NOTE: I looked at the covariance and correlation between weight and QWS, as well as between carry-over balance (basically eating a lot of "extra" food the day before a long workout) and QWS, and these are my findings:
  • The heavier I am (within my normal range), the fresher I feel; QWS is slightly negatively correlated. I think the physiological advantage gained by being a few pounds heavier outweighs the negative effect on performance. It might be useful to figure out the optimal weight, as well.
  • Carrying extra calories into a workout has an extremely small effect on QWS. Consequently, there is little need to consume extra calories the day before a big workout or event.

I also looked at how training volume correlates with fatigue and found a significant inverse correlation. Simply put, the more time I work out in any one day, the less fatigued I feel. As well, more sleep leads to less fatigue.


Tuesday, March 22, 2016

Contextual Hints And Using Them For Personal Gain

Contextual Hints

These are "things that tell us more information about something". In technical speak, we call this "adding context". But you probably know this as "Facebook Tags". As in:

This image comes from a TechCrunch article

Here, the image has a "contextual hint", or additional information about the contents of the image. In these cases, we are told who is in the image. I am sure that friends of Josh and Zack know who Josh and Zack are - but I did not know. That is, until I saw the contextual hint.

Contextual hints can sometimes provide information not captured within the original item - an example may be "recommended movies" from Netflix. The recommended items for a particular movie might give us information about the current movie we are viewing - perhaps the genre of the movie may become apparent. Here is an example:

Netflix Recommended Movies Example

You stumble upon a movie called "The Bad Day" (a movie that I made up for this example). The recommended movies immediately visible are:

  • Failure to Launch
  • How to Lose a Guy in 10 Days
  • Fool's Gold


Based on those recommendations, you may conclude that "The Bad Day" is a great rom-com, likely starring Matthew McConaughey.


But what if the recommended movies for "The Bad Day" were:
  • Saw
  • Hostel
  • The Bad Day 7 (in 3D)
Well, now you may consider that "The Bad Day" is a horror movie.


In both these examples, we can make inferences about the original item based on the information we have about related items.

Other Familiar Examples


Some other forms of contextual hints:
  • Location data encoded in photos
  • "Similar Items" recommendations from Amazon
  • Accessibility hints (notice the pop-up hint that helps explain what the image is)
  • Info Cards on movies
This image is from the Official Android Blog
  • The Chrome Omnibar (bringing awareness to justice beavers across the world)

A good contextual hint is not necessary, but should be helpful. In other words, we should be able to figure out what something is in the absence of contextual hints, but having the extra information can aid in recognition and understanding.

How This Relates to Me

Imagine that I have a lot of digital images of my friends, and I want to sort them based on who is in the image. With a small amount of code and a good image processing library, I could build an individualized facial recognition model for each person I expect to find in my collection of photos. But the problem is that I would first need to sort the photos in order to train the models - so the solution to my problem is the very problem I am trying to solve. In computer science, we call this a circular dependency. In the normal world, we call this the chicken and egg problem.

But, if we are clever, we may be able to use existing contextual information to do a lot of the work for us. Within Facebook, people may be tagged in images; generally, the face of the person is outlined in an overlay box and the Facebook profile for the user is linked in a popup hint. This is a great form of contextual information - it is also a great way for me to extract the face of the user so that I can use it as training data. In fact, Facebook provides all the information we need - the face and a link to the associated user's Facebook profile.

Warning, this next part is a little techy...

Let's figure out a general algorithm. Assume that we can use the Facebook Graph API to query for users and images (for the time being, ignore any OAuth scope or permission problems).

For each image for a specific user
  • Step 1: Save the image and crop down to the overlay box, resulting in a small image of the user's face
  • Step 2: Get the Facebook user id (or the username) from the URL provided in the contextual hint
  • Step 3: Optionally query the graph API for other images which contain tags of the user
  • Step 4: Repeat with other images until enough training data is present
During these steps, we might also save the full image and other contextual information, such as:
  • Any location data in the image (GPS or place information)
  • Who else is tagged in the image
  • The time and date of the photo (cross referenced with weather data and GPS data, we may be able to use this additional information to make inferences about the image - I am getting ahead of myself, but hopefully you get the point)

// Get images and metadata from Facebook using the Graph API
//
// Get the profile object for the Facebook user (given the username)
var fbProfile = fb.queryGraphForProfile(fbUsername);
// Get the Facebook user ID from the profile
var fbID = fbProfile.fbUserId;
// Get all images in which the user is tagged
//  NOTE: The graph call will actually return a series of pages with results, but
//        for this example, we have reduced the stream to an array
var imageIDs = fb.queryGraphForTaggedImages(fbID);

// Go through all the images in the array
forEach(imageId in imageIDs) {
    // Get the image from the Image ID
    var image = fb.queryGraphForImage(imageId);

    // Crop the image based on tag information
    //  NOTE: This is a bit more involved because we need to query to find the crop coordinates
    //
    // It might be worthwhile to also save the original image and any metadata about the image
    // for future reference.
    //
    // Use fbProfile to associate the image with the cropped image and the Facebook profile
    saveImageToStorage(getAndCropByMetadata(image), fbProfile, image);
}

I have just written pseudo-code to automatically crowd-source Facebook image tags in order to build a database of tagged facial images custom tailored to specific people. We can now use an image processing library to build a facial recognition model for every person in my digital images. Hopefully we have enough data so we can build a decent model and validate that model with a good degree of accuracy and precision.

// Load the images from storage (each image is associated with a Facebook profile)
var allImages = getImagesFromStorage();

// Get 90% of the images
var trainingImages = sample(allImages, .90);

// The other 10% is used for validation
var validationImages = Sets.difference(allImages, trainingImages);

// Build a facial recognition model from the training data (remember, each image
// is associated with a Facebook profile; let's assume that the model associates the
// image with the Facebook user ID)
var model = FancyImageLib.buildFacialRecognitionModel(trainingImages);

// Validate the model
//  NOTE: The validation is a bit more complex and we should use images from other people
//        in the validation routine in order to measure false positives
var validationResults = Validator.validate(model, validationImages);

// Hope that our model is good!
output(validationResults);

Since this is pseudo-code, we can assume that the pseudo-results are quick and accurate. This leaves us with a facial recognition model that I can use to automatically sort my digital images based on faces.

At this point, we can (hopefully) reliably sort my entire digital archive based on people in the photos. But the cool part is that the facial recognition model was built entirely from contextual hints provided by Facebook.

Takeaway

The important point to take away from this discussion is that contextual information can be very valuable. What makes my particular application exciting is that we can leverage otherwise boring contextual information (photo tags) in order to automatically perform some higher-order functionality (building and using facial recognition).

One Last Point

This is a blog about theory. While the ability to build individualized facial recognition models based on Facebook image tags is possible, I am unaware of any actual implementation in the wild. This does not mean such an implementation does not exist, rather that if it does exist, I have not seen it. 

Friday, March 18, 2016

AI is Just A, Corn Mazes and Training Your Model

AI

Artificial Intelligence seems to be pretty popular these days. We see it in movies, we hear about it in popular culture (Watson, the computer that kicked butt in Jeopardy) and we read about it in books.

For decades, people have been prophesying that artificial intelligence will rise up. Machines will become self-aware (2001: A Space Odyssey, War Games, Terminator, etc.). This makes for a great story, and movies with a lot of explosions and Schwarzenegger, but the concept is not really grounded in fact.

The truth is, artificial intelligence really has little intelligence - insofar as actually being able to reason. Computer Scientists say that computers "learn" (a topic I hope to cover in depth later), and this is true to an extent - I suppose it depends on your definition of "learn". But Computer Scientists pretty much agree that "reasoning" is substantially harder for a computer.

Learning

When a computer "learns", the computer is looking for patterns and correlations within a set of data. If your memory were super fast and your recall immediate, you could easily process thousands of images and draw some conclusions about them. Consider a case where all the images have some sort of plant in them. If you come across an image of a plant which you have never seen, you likely will be able to recognize the image as an image of a plant. How your brain stores, processes and predicts what is likely to be a plant is extraordinarily complicated.

Yet, we can "train" a computer to recognize a plant, too. It is important to understand that the computer "learns" what a plant is in a very different way. One approach might be to tell the computer that it is about to look at images of plants. For each image, the computer might evaluate the colors in the image, the shapes within the image, identify contours to better understand the relationships of the shapes and so on. After looking at thousands of images, the computer can then generate a "model" which roughly estimates what features are common across most of the images. For example, most of the images are likely to be rich in green and have somewhat rounded shapes (leaves) with some straight lines connecting the round shapes (branches).

But wait! What if we trick the computer and show it a black and white image? Based on the model that the computer has previously generated, it is very unlikely that the computer will be able to recognize the image as a plant. The failure in this case represents a fault in the model: perhaps we depended too much on color images. An improvement may be to train the computer with both color and black and white images in order to build some variation into the model.

The "how" portion of finding correlations in this example is somewhat complex, but you can conceptually think of the computer looking for "things that are the same" and assigning percentage values to these features - when presented with a new image, the features are identified and the percentages are processed. Most plants have roundish leaves, so "round things" should have a very high percentage associated with it. On the other hand, "spots" probably ranks low because a good number of plants are not spotted. And so it goes with other interesting features that the computer is programmed to look for. The key is programmed. Without instructions of what to look for, the computer simply will not be able to recognize a plant.

Fortunately, we do not have to be terribly specific when we tell the computer what to look for. By way of example, we can simply tell the computer what "shapes" are and how to identify them. Then we can instruct the computer to identify all the shapes in the images and figure out what the most common shape is. As long as the computer knows how to identify a rectangle or circle, it should be able to figure out how common these shapes are in the image.
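The feature-percentage idea above can be sketched in a few lines of Python. The features and weights here are invented purely for illustration: each feature the computer is programmed to detect carries a weight, and an image is scored by the weights of the plant-like features it exhibits.

```python
# Toy plant "model": weights (in percent) for programmed features.
# These features and numbers are made up for illustration.
PLANT_FEATURE_WEIGHTS = {
    "mostly_green":   50,  # most plants are rich in green
    "round_shapes":   35,  # leaves
    "straight_lines": 10,  # branches connecting the round shapes
    "spots":           5,  # few plants are spotted, so this counts little
}

def plant_score(detected_features):
    # Sum the weights of the features actually found in the image.
    return sum(w for f, w in PLANT_FEATURE_WEIGHTS.items()
               if f in detected_features)

# A green, leafy image scores high...
print(plant_score({"mostly_green", "round_shapes"}))    # 85
# ...but a black and white photo of the same plant loses the color
# feature, which is exactly the failure mode described above.
print(plant_score({"round_shapes", "straight_lines"}))  # 45
```

Notice how the black and white photo drops from 85 to 45 without the plant changing at all - a model that leans too heavily on one feature fails exactly as described above.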

The Vroomin Koomen And Corn Mazes (Spoiler Alert)

When I was doing my undergraduate work, I was involved with a team project that involved building and programming a robot to navigate a maze. At first, this seemed like a pretty hard task - but once we researched maze theory and built a proof-of-concept robot, we found that it is not hard at all - there is little "intelligence" when navigating a maze. At least, some mazes. You see, if the maze has no moving parts (unlike Maze Runner) and no cycles (endless loops), then one simply applies the "right hand rule" (or "left hand rule" if one is a southpaw). The "rule" is just a simple algorithm:
  1. If you can go right, go right
  2. If you cannot go right, go straight
  3. If you cannot go straight, go left
That is it - I have given you the solution to every corn maze in the nation! While this approach is guaranteed to generate a correct result (caveat - the maze entrance and exit must be on an edge of the maze), the result may not be optimal. Moreover, corn mazes tend to have "checkpoints" within the maze. These checkpoints add some level of adventure to the maze, but also serve to draw players in from the edges - rendering the "rule" useless. A number of corn mazes also have high bridges that allow players to climb up and see the maze from a higher perspective. This may help players see parts of the maze from an advantageous position, but often times the bridges introduce "cycles" in the maze; essentially allowing for infinite loops. Think of it as trying to find your way out of a maze when you are next to a tree - if you apply the right hand rule to the tree, you will circle the tree forever.
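The rule above is easy to turn into code. Here is a sketch in Python (the maze, entrance and exit are invented for the example): a "wall follower" that, at each step, tries to turn right, then go straight, then turn left, then turn around.

```python
# Right hand rule ("wall follower") on a made-up grid maze.
# '#' is a wall, '.' is a path; entrance on the top edge, exit on the bottom.
MAZE = [
    "#.###",
    "#...#",
    "###.#",
    "#...#",
    "###.#",
]

def solve_right_hand(maze, start, goal):
    # Directions in clockwise order: up, right, down, left.
    dirs = [(-1, 0), (0, 1), (1, 0), (0, -1)]
    pos, facing = start, 2          # enter the maze heading down
    path = [pos]
    while pos != goal:
        # Prefer right, then straight, then left, then turn around.
        for turn in (1, 0, 3, 2):
            d = (facing + turn) % 4
            r, c = pos[0] + dirs[d][0], pos[1] + dirs[d][1]
            in_bounds = 0 <= r < len(maze) and 0 <= c < len(maze[0])
            if in_bounds and maze[r][c] == ".":
                pos, facing = (r, c), d
                path.append(pos)
                break
    return path

path = solve_right_hand(MAZE, start=(0, 1), goal=(4, 3))
print(path)  # includes a detour into the dead end at (3, 1) and back out
```

As the caveats above warn, this only terminates when the entrance and exit are on the maze's edge and the maze has no cycles - on a maze with a loop, the `while` condition would never be met.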

Because bridges can introduce infinite loops, I think a good idea for a corn maze would be to have a bridge connected to an edge which must be traversed - this would introduce the possibility of an endless loop which cannot be avoided.

Once we established the algorithm, all that needed to be done was to create software to drive the robot while following the "rule". This introduced some real-world problems (such as over-steering, processing real-time inputs and accounting for terrain imperfections), but overall it was an achievable goal. We named the robot "Vroomin Koomen" in honor of one of the university's professors whom we all held in high regard.

Reasoning

What is the difference between reasoning and learning? I like to think of reasoning as "the next level" of learning. When we reason, we recall all of the pieces of information (evidence) we have which may be relevant to the topic at hand. The sources are varied and may not be tangible, like memories or past experiences. We consider what conclusions are possible, as well as the likelihood of each conclusion based on the evidence. We assimilate several classifications of data, filter out non-relevant data and maybe make some seemingly complex connections based on intuition or "gut instinct". We put all this together and then draw higher-level conclusions about the data.

Takeaway

Despite going off topic a bit, the most important point I am trying to make is that we need not fear artificial intelligence. The machine apocalypse is a long time from now. Think about this: we can't even get machines to predict the weather any better than by flipping a coin. But we are finding ways to use machine learning to process large amounts of data and drive decisions in a number of industries - marketing, manufacturing, banking, pharmaceuticals and pretty much everywhere else.

Sunday, February 28, 2016

Quantification of Successful Concept Predictors

The 5 Core Qualities

As we discuss concepts (for the sake of this conversation, consider a "concept" to be an idea, a software project, or some tangible product), it becomes important to define the potential of success for a concept. I argue that there are 5 core qualities which can be used as predictors for the success of a concept. The classification of these core qualities is not perfect and there is overlap between some of them. However, it is a good starting point and will allow concepts to be compared alongside each other. This is an important theme moving forward. Recall, a good deal of the content of this blog is theoretical in nature; consequently, there are often no actual results with which to gauge success. It is therefore imperative that we have a means to estimate how successful an idea may become. In particular, this estimator can be used to decide which projects should be afforded more research and which should be tabled.

I'd like to introduce the core qualities here, but I hope to expand on them and construct more precise definitions at some point.

Value

Does the concept have inherent value or can the concept bring some value? Value is perhaps the most nebulous of the core qualities. For example, it is difficult to quantify the value obtained from a piece of software and even more challenging to compare that value against value derived from a tangible good. Nevertheless, we must strive to quantify the value so that we can predict success. Most people will probably see "value" as the most important core quality, though I suspect that it is the hardest core quality to assess.

Accessibility

Is the concept accessible? Consider cost and required resources in order to execute the concept.

This core quality is assessed with respect to the average person, not the intended concept consumer; this approach leads to less biased comparisons across concepts. Consider a domain-specific assessment for like-minded concepts.

For example, if a concept describes a manufacturing process for piston rings which will not wear over time, what are the execution requirements? Almost certainly there is a need for precision manufacturing equipment and specialized raw materials. Accessibility for the average person, then, would be low. Conversely, a concept describing a new way to tie a shoe would have a very high accessibility score because nearly everyone in the world has the equipment and ability to execute the concept.

Usability

How easy is it to execute the concept? Here, the definition of "usability" changes from concept to concept. For software, usability might describe the user interface. For a car, usability may describe steering, acceleration and braking. Some people may think of usability as a means to describe simplicity or intuitiveness.

Classic example: if the software interface is not intuitive and the instruction manual is written in Klingon, then the concept scores low with respect to usability because most people cannot easily use the software.

As a note, when I am involved with designing software, I often apply the "mother rule" to common or complicated functionality. I ask whether or not my mother could use the functionality without needing someone to explain it to her. This turns out to be a good litmus test for other developers on my team - perhaps because other developers can relate to the situation. To wit, most developers have fixed their mother's computer, fielded calls about email attachments or configured a new computer.

Awareness

How are others made aware of the concept? It does not matter how awesome my concept is or how easy it is to implement or execute - if no one knows about my concept, it may as well not exist.

Bringing awareness to a concept can sometimes be challenging. It is not my intent to discuss how to generate awareness; rather, my intent is to simply state that awareness of a concept is fundamental in predicting success. More to the point, having the ability to generate awareness is needed.

Reliability

Is the concept reliable? Another nebulous determinant, but still important. For a car manufacturer, the car needs to run well with little maintenance. For iPhone applications, the app needs to not crash frequently (or at all). How we measure reliability will differ between concepts, but we should be able to capture some value of reliability.

Now What?

We need to identify a way to capture all the core qualities at once. How can we easily convey the potential of success for a concept? I think that a good visualization should be able to convey its message with no words (legends, keys, tips, etc.). Radar charts are our friend here. These charts allow us to render multivariate data in an easy-to-understand manner. At a glance, we can see how a concept's core qualities rank with respect to each other.

If you are not familiar with a radar chart, I think it is easier to see an imperfect concept before an ideal (or perfect) one. Take a look at the chart below - hopefully you are able to infer that the concept lacks in reliability, is average with respect to value and awareness, and scores well on usability and accessibility.
Poor Reliability
Below, I illustrate the "perfect concept". This radar chart shows what a theoretically perfect concept would look like. I find this quite boring when compared to an imperfect concept.
Perfect Concept
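The geometry behind these charts is simple enough to sketch by hand. Here is a small Python example (the quality scores are invented, and most plotting libraries will do this for you): each quality gets one axis, the axes are spread evenly around a circle, and the score sets how far out the vertex sits.

```python
import math

# Map core-quality scores (0-10) onto the vertices of a radar-chart
# polygon, one axis per quality, starting at 12 o'clock and going
# clockwise. The scores below are invented for illustration.
def radar_vertices(scores):
    n = len(scores)
    points = []
    for i, score in enumerate(scores):
        angle = math.pi / 2 - 2 * math.pi * i / n  # clockwise from the top
        points.append((score * math.cos(angle), score * math.sin(angle)))
    return points

qualities = {"Value": 5, "Accessibility": 9, "Usability": 8,
             "Awareness": 5, "Reliability": 2}
vertices = radar_vertices(list(qualities.values()))

# The first axis points straight up, so the first vertex sits at (0, 5);
# connecting the vertices in order (and back to the start) draws the chart.
print(vertices[0])
```

A "perfect concept" is simply the case where every score is 10, which is why its polygon is the full, boring pentagon.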

Comparing Concepts

Comparing concepts against each other is an art. We can look at a number of concepts simultaneously:
Comparisons
While somewhat helpful, this is not ideal. It is difficult to rank these concepts from best to worst (most-likely to succeed and least-likely to succeed). My brain cannot easily quantify and rank these concepts (with the notable exception of the ideal concept).

I have used various strategies with limited success when comparing concepts. From assigning "point values" and success coefficients to coloring the points to calculating volume, I have not seen anything that works better than intuition derived from some form of visual representation. And I think there is inherent value in gut feelings.

Multiple Concepts

Sometimes I am evaluating multiple concepts where the usability does not matter much - this happens often with software projects I write for myself (I simply do not care if anyone other than me can use the software). I guess what I am trying to say (poorly) is that I need to evaluate groups of concepts with unique constraints on the set (i.e. "usability does not matter" or "awareness is inconsequential"). Regardless of the constraints, if I know what I am looking for, it becomes easier to process the radar charts.

Takeaway

The main message to take away is that we need a way to quantify the potential for success of a concept. This is important because I will be talking about a wide variety of concepts and it is valuable to have a sense of which projects I should be pursuing. By defining and understanding the five core qualities, I can assign unbiased values to them and represent the assignments as a radar chart. This permits me to be "roughly right" in the "resource assignment" phase of the concept. I simply cannot afford to invest in concepts which predict little chance of success.

Wednesday, February 17, 2016

Checksums and Bugs

Bugs Are a Joke

Those familiar with the history of computers know all too well that bugs are a joke. Literally. Like "Walt Disney on Ice" literally. The first well-known defect (the term we use professionally to describe a bug, or "undesired/unexpected behavior") was a moth. That's right - not a typo. You see, back in the days of Shannon (remember him?), computers had a lot of moving parts. And sometimes, bugs would fly into the moving parts. And sometimes the bug might have an unfortunate experience with the moving parts. True story - a moth flew into a relay back in 1947 and Grace Hopper found the moth (essentially finding the first "bug"). Incidentally, Grace Hopper is totally awesome and was a true pioneer in the field of computer science.

Checksums

A checksum is a fancy term we use for making sure computers didn't miss something. Not surprisingly, checksums are used in Information Theory (again, remember Shannon?). Bugs notwithstanding, the computer world is not perfect. Stuff happens, and that stuff can be out of our control. Signals are susceptible to interference. In the computer world, all communication boils down to 1's and 0's. But nature doesn't always play nicely and sometimes all those pretty copper or fiber lines that carry your Internet signals get mixed signals. If too many mixed signals go unnoticed, bad things tend to happen. Happily, pioneers of Information Theory understood this (keep in mind that this dates back to WWII, where radio communication was expensive but extremely important). They came up with the notion of a checksum. There are a ton of really good uses, but this is a trivial example:

For every 7 bits I send, I will add an 8th bit which will make the number of 1's in the 8 bit string even; when you receive my message, you can check that the number of 1's is even.

Example:
I send the 7 bit value:
0110011

There are 4 1's in the string - an even number; consequently, I do NOT need to add another 1, so I add a 0 instead.
01100110

Now, when you receive the value, you will remove the last 0 from the byte (a byte is 8 bits), resulting in the original message:
0110011

You will count the number of 1's and see that there are 4 - which is even!

But what happens when something goes awry during the transmission? Let's take a look. I send the original message (with the "0" appended):
01100110

But because of transmission errors, you receive:
01000110

You count the number of 1's in the message and see that the value is odd, not even. You have just detected an error (because of checksums)! All that you can deduce is that there was an error in the transmission - you do not know where the error occurred - just that there is an error.

This is a simple example, but hopefully you get the gist of checksums. Be aware that the particular example I illustrated will only detect one error. Multiple errors may go undetected. Fear not - science has provided more complex checksum methods to account for that; as well, some methods can offer error correction. That is to say that not only can errors be detected, but in some cases, the errors can be corrected - wild!
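The parity scheme above can be sketched in a few lines of Python (a toy illustration, not production error detection):

```python
def add_parity_bit(bits):
    """Append an even-parity bit so the total count of 1's is even."""
    parity = "1" if bits.count("1") % 2 == 1 else "0"
    return bits + parity

def check_parity(byte):
    """Return True if the 8 bit string has an even number of 1's."""
    return byte.count("1") % 2 == 0

sent = add_parity_bit("0110011")      # four 1's already, so append a 0
print(sent)                            # 01100110
print(check_parity(sent))              # True - no error detected
print(check_parity("01000110"))        # False - a bit flipped in transit
```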

OK, But What About The Real World?

Great question! You see checksums all the time. UPC symbols have them - so any time you buy something that is scanned, you are seeing checksums at play (technically it is a check digit, but the concept is the same).

Checksums are also used in ISBN numbers. If you are old enough to remember what a real book looked like, you might recall that each book has its own ISBN. Boring, but it might help on Final Jeopardy.

Try It At Home

Need a weekend project? Try this on for size: credit card authorization did not always happen instantly. Merchants would batch up all the transactions over a set amount of time and then process them with the credit card company. And sometimes the processing used unreliable communication links (sound familiar yet?). In order to help identify which credit card numbers were transmitted in error, credit cards have a checksum built in. This is still used today. Take your favorite credit card and do the following:

  1. Multiply the first number by 1
  2. Multiply the second number by 2
  3. Multiply the third number by 1
  4. Multiply the fourth number by 2
  5. ...
  6. Repeat for all numbers except for the last number
  7. Sum up all the products
  8. Add the last number to the sum of the products
  9. If the total sum is evenly divisible by 10 (modulo 10 if you are mathy), the number is valid
For example, given the 16 digit credit card number and the multipliers:
  1234 1234 1234 123?
x 1212 1212 1212 121 

Multiplying digit by digit, we get the products, which sum to 56:
      1438 1438 1438 143 = 56

The check digit needs to be a value such that, when added to 56, the total is divisible by 10. We need a 4. The complete credit card number is then:
1234 1234 1234 1234
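Here is a rough Python sketch of the weighting scheme described above. (A caveat: the real-world Luhn algorithm weights digits starting from the rightmost and sums the digits of any product over 9 - but for the small digits in this example, the simplified version works out the same way.)

```python
def check_digit(first_15_digits):
    """Compute the check digit using the 1, 2, 1, 2... weighting
    described in the steps above: weight the digits, sum the products,
    and pick the digit that makes the total divisible by 10."""
    total = sum(d * (2 if i % 2 else 1)
                for i, d in enumerate(first_15_digits))
    return (10 - total % 10) % 10

digits = [1, 2, 3, 4] * 3 + [1, 2, 3]   # 1234 1234 1234 123
print(check_digit(digits))               # 4 - matching the example above
```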

And How Does This Relate To Bugs?

Fair question. This is where the boring part comes into play, but it is important to understand. Remember when I was talking about motion detection? When I was writing the detection algorithm, I noticed that the edges of the image did not look quite right. When I looked into it a bit further, I found that the image was being improperly handled: my algorithm was cropping part of the image (to focus on the section of the image which I was interested in), but the code I used to crop the image was not properly handling decompression/compression. As a consequence, the compressed image still carried data describing the parts that had been cropped off. Think about it like this:

The compressed image has localized data: basically a count of the number of pixels of each color within each quadrant of the image, along with a checksum of that value. But when I removed part of the image, the counts were not updated and the checksums no longer validated.

By way of example, let me illustrate. Remember this dude?
The Guy
If we want to compress the image, we might break up the image into quadrants like this:
Quadrants
Now, count the number of black squares in each quadrant and add a check digit to that value. For each of the top two quadrants, we have 11 full black squares. In the lower quadrants, we have 7 black squares. Using a modulo 10 check digit, our values for the top quadrants are 11 and 9 (11 is the count, 9 is the check digit) and the lower quadrants are 7 and 3. We can represent the values as tuples, and they look like this: (11, 9) and (7, 3).

Let's say that I crop the image, like so:
Cropped for profile pic


But whoops! I did not update the compression data with checksums! The top two quadrants will still validate (11, 9), but the two lower quadrants are in error. The lower quadrants should be (4, 6) but are still (7, 3). By virtue of not recalculating the compression values, the data we have about the image describes part of the image which is no longer present.
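To make the quadrant bookkeeping concrete, here is a small Python sketch (the function names are my own invention, not from any real image format):

```python
def check_digit(count):
    """Check digit that makes count + digit divisible by 10."""
    return (10 - count % 10) % 10

def validate(stored_count, stored_digit, actual_count):
    """A quadrant validates only if the stored (count, digit) pair
    matches a fresh recount of the black squares in that quadrant."""
    return (stored_count == actual_count
            and stored_digit == check_digit(actual_count))

# Top quadrants: stored metadata (11, 9) still matches the 11 black squares
print(validate(11, 9, 11))   # True
# Bottom quadrants: metadata still says (7, 3) but only 4 black squares remain
print(validate(7, 3, 4))     # False - should have been updated to (4, 6)
print(check_digit(4))        # 6
```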

Is This A Problem?

No, not at all. This does not affect how we perceive the image - the image will continue to look as it should because the compression error has no visual manifestation. But software that looks at the checksum will notice that something does not quite add up (pun intended).

Endgame

What is the takeaway? Hopefully you have a slightly better understanding of what a checksum is and how it works. Additionally, you now have an anecdote to tell the next time someone says that they found a bug in some software.

Also, for the purpose of this post, I use the term "checksum" to be interchangeable with the term "check digit". While the two terms are related, they are not exactly the same. Checksums are typically more complex than what I describe, but the concept is the same. But checksums are more widely known, so we'll stick with that term.

Thursday, February 4, 2016

Shannon's Balls

Information Theory and Pornography

In 1948, a very important paper was published by Claude Shannon. The paper was entitled "A Mathematical Theory of Communication". In it, Shannon basically invented information theory. While very technical and boring sounding, information theory is crazy important in today's world because it tells us exactly how much porn we can download.

That's right - Shannon drops a knowledge bomb on us so large that the aftershocks are still realized today. Shannon relates entropy, negative logarithms and way nerdier concepts into a model that is the basis of nearly every bit of electronic communication today. From error detection to error correction to encryption to compression, all electronic communication is rooted in Shannon's work.

I mentioned porn because it is relevant (inasmuch as it is a compressible form of information), but also it helps to draw the reader in. Also, it increases my SEO score.


Entropy - Why Does It Matter?

Great question! Let me feed you, baby bird... If we can calculate the entropy of a message, we can determine exactly how "compressed" the message can be. This is important because all Internet traffic is metered at some point. Someone is paying for each and every bit that ventures onto the Internet. Bandwidth costs are real and measurable. Maybe not noticeably for the home consumer paying $50/month for a cable modem connection - but certainly for large data centers that handle significant portions of Internet traffic.

According to this outdated Wikipedia article, the Internet saw nearly 1 petabyte of traffic every day in 2012. If we extrapolate along just past a linear slope, we are approaching 2 petabytes of traffic per day in 2016. To put things in perspective, a petabyte is a 1 with fifteen 0's after it. And petabytes are counting bytes, not bits. A byte is 8 bits. So multiply by 8 for a more accurate number. The Internet sees

1,000,000,000,000,000 bytes per day


That is a lot. Huge, in fact. Any basic calculator more than a few years old will not allow numbers that big. Let me put this in perspective for you. If you had one thousand dollar bills, you would have a stack of dollar bills that is 4.3 inches high. If you had 1,000,000,000,000,000 dollar bills, you would have a stack of dollar bills that would go from here to the moon and back almost 150 times over (roughly 70 million miles). If you look at the traffic generated from your cable modem in one day (with an aggressive estimate of 8GB), you are looking at a stack of dollar bills around 500 miles high. Which still isn't even a rounding error of a rounding error of the petabyte stack.

All this to say, if everyone can save even 1% of bytes when sending traffic across the Internet, we will save more bandwidth in one day than the entire Internet saw from January 1st of 1990 through December 31st of 2005. So yes, compression is important.

Don't Forget The Balls

Let's relate entropy to balls. Specifically, shipping balls in a box. Consider a scenario where I have to ship hundreds of balls across the country.

Free Balls

All the boxes I have are fairly standard shipping boxes, and it will take a number of boxes to fit all the balls. Each box costs $12 to send via USPS, so it is in my best interest to reduce the number of boxes that I want to send.

Packaged Balls
A simple and fast solution might be to haphazardly throw balls into a box until no more fit, seal the box and start over with the next empty box. This will work and it will be quick for me - but the speed comes at a financial cost (more boxes).

Alternatively, I could place each ball in the box very carefully, lining up all the balls and optimizing the space in each box. It is likely that I would use fewer boxes - this is a long solution, but it has a lower financial cost.

Using Shannon's work, we can calculate the entropy of the balls in the box - a number that tells us the maximum number of balls we can place in the box. With that in mind, I can figure out how best to spend my time and money: I might choose to spend more money and package the balls quickly into a lot of boxes; conversely, I may choose to spend a lot of time to package the balls into as few boxes as possible. What is most likely is that I will choose a happy medium of cost vs. time. But it is important to recognize that Shannon's work tells me what I am capable of.

A Bad Example To Follow That Bad Analogy

This is a terrible example of entropy and how it can be used for compression. Forgive me.

Let's imagine that we want to send the lyrics for the Bryan Adams classic hit, "Have You Ever Really Loved A Woman?". Our goal is to send the smallest amount of information. For this example, assume that any key we create is available to the recipient of the message.

If you want to play along at home, the lyrics I am using can be found here. There are 328 words in the song, counting duplicates (95 unique words). Incidentally, the ratio of unique words to all words is an indicator of how well we might be able to compress the message.
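If you want to go one step nerdier, Shannon's entropy puts an exact number on compressibility: the theoretical minimum average number of bits needed per word. Here is a quick Python sketch (the lyric line below is just an illustrative fragment, not the full song):

```python
import math
from collections import Counter

def entropy_bits_per_word(words):
    """Shannon entropy H = -sum(p * log2(p)) over the word distribution:
    the theoretical minimum average number of bits needed per word."""
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

lyric = "have you ever really really really ever loved a woman".split()
print(round(entropy_bits_per_word(lyric), 2))   # 2.65
```

The more lopsided the word frequencies, the lower the entropy - and the more a frequency-based key can compress the message.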

After some processing, it turns out that the most common word is "you", which shows up 26 times (nearly 8% of the words in the song). In second place is "her" with 22 occurrences. The word "really" shows up 20 times. Here is the breakdown of the top 10 words:

Percent  Occurrences  Word
   8%        26        you
   7%        22        her
   6%        20        really
   5%        17        woman
   5%        16        a
   4%        14        tenderness
   4%        12        loved
   3%        10        ever
   3%        8         you
   2%        6         yeah

The 10 most common words in this song make up 46% of the total words in the lyrics. We can assign a number to each word in the list.

Percent  Occurrences  Word          Number
   8%        26        you            0
   7%        22        her            1
   6%        20        really         2
   5%        17        woman          3
   5%        16        a              4
   4%        14        tenderness     5      
   4%        12        loved          6
   3%        10        ever           7
   3%        8         you            8
   2%        6         yeah           9

We can pair the number with the word and refer to that as our "key". Our key, then, looks like this:
0   you
1   her
2   really
... and so on

If we were to use these numbers to represent the lyric "really really ever loved a woman", we would write "2 2 7 6 4 3". The original phrase is 32 characters (think of a "character" as a keystroke), whereas the "compressed" phrase is 11 characters long - nearly 1/3rd of the original message!

We can assign a number (or symbol) for each word or phrase in the song. When the recipient gets the compressed message, they can use our key to figure out what the original message was. For example, when the recipient sees the phrase "2 2 7 6 4 3" and applies the key, the recipient can recover the original phrase ("really really ever loved a woman").

Note that all is not perfect! When we assign a number to a word that appears only 2 times, we will be assigning a 2 digit number (because all the 1 digit numbers have been used). This means that for at least some words, the compressed representation is longer than the uncompressed representation!

The Takeaway

Please keep in mind that this exercise is simply to help show the basic concepts of entropy and how it might be useful with compression.

This is a simplification of sorts, and there are a lot of considerations with compression and entropy. For example, the "key" needs to be sent along with the message; consequently, in order to be useful, the "compressed" message plus the key needs to be smaller than the original message. As well, I cheated and used spaces in the compressed message. In real life, word delineation would happen a different way - in fact, a lot of complicated things happen. But the concept is sound and hopefully you have a better understanding of compression.
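For the curious, the whole assign-numbers-to-frequent-words scheme fits in a few lines of Python. This is a toy sketch of the idea above (real compressors are far more clever, and as noted, I am cheating by using spaces as delimiters):

```python
from collections import Counter

def build_key(words):
    """Assign the shortest numbers to the most frequent words."""
    ranked = [word for word, _ in Counter(words).most_common()]
    return {word: str(i) for i, word in enumerate(ranked)}

def compress(words, key):
    """Replace each word with its number from the key."""
    return " ".join(key[word] for word in words)

def decompress(message, key):
    """Invert the key to recover the original words."""
    reverse = {number: word for word, number in key.items()}
    return " ".join(reverse[n] for n in message.split())

words = "really really ever loved a woman really ever you you you".split()
key = build_key(words)
packed = compress(words, key)
print(len(packed), "vs", len(" ".join(words)))   # compressed is shorter
print(decompress(packed, key) == " ".join(words))  # True - lossless round trip
```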



Nerd Spoiler Alert

If you are really interested in some of the complexity, think about breaking the lyrics into "patterns" that repeat, not "words". For example, "really really really" is a pattern that shows up. It would be extremely beneficial to represent that pattern as "1" - we have compressed that phrase to 1/20th of its original size. But it might be the case that the best pattern is really every 6 characters, regardless of word boundary - it all depends on a lot of things!

But we also need to store the data which represents the decompression key - all in less space than the original message. And we can use clever tricks such as changing keys based on stream input, for starters.

Keep in mind that the strategy used to compress is also based on compute power. And the type of data (and the related entropy). There is no single right answer.


Monday, January 25, 2016

Humans Rock, Computers are Dumb

People Are Good at Some Things

Humans are very well adapted to solve complex problems quickly. So quickly, in fact, that oftentimes we don't realize that we are solving problems.

Consider motion detection - how do you know when something has moved? Think about a leaf blowing across the lawn. You see the leaf, and you see the lawn. You see the leaf blowing across the lawn. But how do you know the leaf moved? You just know.

You probably can't come up with an explanation other than "it just moved". But how do you know it moved? Your brain has a vast understanding of the world. You know what a leaf is. You know the properties of a leaf - light, crisp, blowable. You know the properties of a lawn: firm, stays on the ground and not blowable. You also have an understanding of things that are not visible - the wind, for example. You know that wind exists, that it can change direction and speed and that it can blow light objects (like leaves, which are blowable). When you put all these (somewhat complex) concepts together, you form an understanding that the leaf is blowable, the lawn is not blowable and wind can blow the leaf across the lawn.

Yet, that still is not sufficient: you still need to realize that the leaf is not where it used to be. You can see the leaf, you can observe where the leaf resides in individual points in time. And you can reason that if the leaf appears to change position from one moment to the next, then the leaf is probably being blown by the wind. And all this happens in a fraction of a second, without you thinking about it.

It is beautiful, really - your brain can process and solve very complex problems without even consciously thinking about it. Even more incredible - your brain can actually predict where the leaf will go next. By using contextual clues such as the current path of the leaf, how hard the wind is blowing, any obstacles that might be in the way and past experiences with blowing leaves, your brain puts together a flight path of the leaf. And your brain is so good at this that you don't even realize that you just solved a very complicated calculus problem less time than it takes to blink.

But Computers are Dumb at Those Same Things

Even though you are smart, your computer is dumb. Very dumb. Ever try to have a discussion with your computer about a leaf? Probably not - most computers aren't built like that, and most computers have no concept of leaves, wind or even how to communicate with a human. Or what a human is. Or that the notion of communication exists. Short story: computers don't know how to figure out that a leaf is blowing across the lawn. The computer simply has no knowledge of any of the concepts that your brain was able to resolve in a fraction of a second in order to solve the same problem.

Computers are a far cry from Skynet. If a computer can't figure out what wind is, how can your computer know to launch nuclear missiles at humans in order to ensure its own survival?

Motion Detection is Super Easy... for Humans

Motion detection is easy. Very easy. At least, motion detection is very easy to do very poorly. Not to be confused with motion tracking (following an object as it moves), which is a bit more complicated.

Keep in mind that in order to detect motion, computers do not need to know what is moving (computers do not need to have a notion of a leaf, or a lawn or the wind). Which is good, because computers are dumb (remember?). But, computers are fast at simple tasks. Incredibly fast. Like calculating the differences in images. As it turns out, being able to find the differences between two consecutive images of the same thing is an easy way to figure out if something has moved.

Our "algorithm" is a set of instructions that are followed in order; at the end of this algorithm, we will have determined if a motion event has occurred, based on two images.

For each image
  1. Convert the image into black and white
  2. Divide the image into grid boxes
  3. Count how many black pixels are in each grid box
  4. Compare the number of black pixels in any grid box to the corresponding grid box in some other image
  5. If the number of black pixels between two corresponding grid boxes is very different, mark that grid box as experiencing a motion event
Please forgive this crude example. Below, I have created a 10x10 grid:

Why do we use black and white? Because it is easy (and, as it turns out, a bit more tolerant*)! If we had to track the number of pixels in each grid box for each possible color, we would be dealing with 64 bits for each pixel. This translates to roughly 80 megabytes of memory for each 10 megapixel image. That is a lot of data to process, and keeping track of those colors is confusing - keep in mind that you can express about 10,000,000,000,000,000,000 (10 billion billion) different colors with each 64 bit pixel. By my calculations, that is in the neighborhood of "a crap ton". Why should we use a complex data structure to store all that data? Instead, I can use 2 numbers per grid box: one for the black pixel count and one for the white pixel count. Now my storage requirement is simply 2n (where n is the number of grid boxes). Even if there are 10,000 grid boxes, 20,000 (2 x 10,000) is a far cry from a "crap ton". In fact, we can skate by with just n values, since the white pixel count for any grid box is simply the size of the grid box minus the black pixel count.

Black and White is not Grey

Just to clarify, when I say black and white, I mean black and white. Not Casablanca black and white. Each pixel is either black or white - there is no gray. Images will look pretty crappy when converted to black and white, but remember that these images are much easier for our algorithm to "understand".

Another crude example - the same image (a color rainbow) in color, grey-scale and black and white:
Color Rainbow
Grey-scale Rainbow
Black and White Rainbow
How we convert an image to black and white is pretty simple: classify each pixel as either black or white. If the pixel is bright enough, we consider it white. Otherwise, we consider the pixel black.

There are several ways to determine the "brightness" (called "luminosity" in nerd-speak) of a pixel, but the methods are generally simple and involve multiplying each of the component colors (red, green, blue) by some constants and adding the values together before rounding up or down. Something like this random formula I found on the Internet just now:

(0.299 * red) + (0.587 * green) + (0.114 * blue)

Notice how boring that is. No magic here. Computers are great at calculations like these.
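Here is what that formula looks like in code - a minimal Python sketch, with an assumed brightness cutoff of 128 (half of the 0-255 channel range; any real system would tune this threshold):

```python
def to_black_or_white(red, green, blue, threshold=128):
    """Convert an RGB pixel (0-255 per channel) to pure black or white
    using the luminosity formula above."""
    luminosity = (0.299 * red) + (0.587 * green) + (0.114 * blue)
    return "white" if luminosity >= threshold else "black"

print(to_black_or_white(255, 255, 0))   # bright yellow -> white
print(to_black_or_white(40, 20, 60))    # dark purple   -> black
```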

Grids and Boxes

Remember that grid box from up above? This is what it looks like with a creepily-proportioned stick figure:

Original Image
I have cleverly created the image so each grid has all the pixels of the same color. This will make the example easier to follow; also the image has already been converted to black and white.

Multiple Images

Motion detection, even poor motion detection, requires multiple images to compare. But simply put, we just compare the black counts in each grid box from one image to the next. If the value changes by more than a certain amount, we can say that some motion has occurred.

Now, adding an element of motion (notice how his right (your left) hand has moved - the black square at B3 has moved to B5):
Image After Motion

Following along with the "algorithm", the black pixel totals for the grid boxes in the original image:
0000000000
0000110000
0100110000
0111111110
0001111010
0001111000
0001111000
0001001000
0001001000
0001001000

The black pixel counts for the grid boxes in the image after motion - the changed counts are in column B, rows 3 and 5:
0000000000
0000110000
0000110000
0111111110
0101111010
0001111000
0001111000
0001001000
0001001000
0001001000

We can clearly see that two grid boxes indicate a motion event (grid boxes B3 and B5).

And just like that, we have motion detection!
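The whole "algorithm" fits comfortably in a page of Python. In this sketch each grid box is boiled down to a single-digit black pixel count, exactly like the grids above (a real implementation would count the pixels inside each box first):

```python
def grid_counts(image):
    """Turn rows of digit strings into rows of per-grid-box black pixel
    counts. Here each grid box is summarized by a single digit, matching
    the grids in the post."""
    return [[int(ch) for ch in row] for row in image]

def detect_motion(before, after, threshold=1):
    """Steps 4 and 5 of the algorithm: compare corresponding grid boxes
    and flag any whose black pixel counts differ by at least threshold."""
    events = []
    for r, (row_before, row_after) in enumerate(zip(before, after)):
        for c, (n_before, n_after) in enumerate(zip(row_before, row_after)):
            if abs(n_before - n_after) >= threshold:
                events.append((r, c))
    return events

original = ["0000000000", "0000110000", "0100110000", "0111111110",
            "0001111010", "0001111000", "0001111000", "0001001000",
            "0001001000", "0001001000"]
moved    = ["0000000000", "0000110000", "0000110000", "0111111110",
            "0101111010", "0001111000", "0001111000", "0001001000",
            "0001001000", "0001001000"]

print(detect_motion(grid_counts(original), grid_counts(moved)))
# [(2, 1), (4, 1)] - column B, rows 3 and 5: grid boxes B3 and B5
```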

Betterifying It

Cool - we just detected motion! And it was simple. But can we do better? Certainly! We could figure out a good threshold for the differences in the grid box comparison values (is 5% change enough?).

We could decrease the size of each grid box (which would provide more resolution).

Perhaps we could require that some percentage of all the grid boxes have to detect motion before we consider a valid motion event. Even better - only consider changes in neighboring grid boxes.

My Application

While these are all valid techniques, please keep in mind that this is a gross oversimplification of a complex problem. While valid and accepted, it is not the best solution for mission critical applications. In other words: please don't sue me if you use this for anything. Ever.

That being said, I am using this concept in one of my pet projects. The project is a home alarm system and while I have implemented a slightly more complex motion detection algorithm, the results are promising!

The End

That about wraps it up. Motion detection is simple for both humans and computers. Except humans and computers detect motion in vastly different ways. Also, humans are way better at motion detection across a variety of conditions (light, weather, wind, etc.).

*Black and white is more tolerant for this application because the binary classification of each pixel (only black or white) makes it unlikely that subtle variances in lighting, shadows or inexpensive hardware will result in a false positive. Trust me, please.