Tuesday, March 22, 2016

Contextual Hints And Using Them For Personal Gain

Contextual Hints

These are "things that tell us more information about something". In technical speak, we call this "adding context". But you probably know this as "Facebook Tags". As in:

This image comes from a TechCrunch article

Here, the image has a "contextual hint", or additional information about the contents of the image. In these cases, we are told who is in the image. I am sure that friends of Josh and Zack know who Josh and Zack are - but I did not know. That is, until I saw the contextual hint.

Contextual hints can sometimes provide information not captured within the original item - an example may be "recommended movies" from Netflix. The recommended items for a particular movie might give us information about the current movie we are viewing - perhaps the genre of the movie may become apparent. Here is an example:

Netflix Recommended Movies Example

You stumble upon a movie called "The Bad Day" (a movie that I made up for this example). The recommended movies immediately visible are:

  • Failure to Launch
  • How to Lose a Guy in 10 Days
  • Fool's Gold


Based on those recommendations, you may conclude that "The Bad Day" is a great rom-com, likely starring Matthew McConaughey.


But what if the recommended movies for "The Bad Day" were:
  • Saw
  • Hostel
  • The Bad Day 7 (in 3D)
Well, now you may consider that "The Bad Day" is a horror movie.


In both these examples, we can make inferences about the original item based on the information we have about related items.
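
As a minimal sketch of this kind of inference, assume each recommended title already carries a genre label. The `inferGenre` function, the `genre` field and the sample data are all made up for this illustration; the unknown movie simply takes whichever genre appears most often among its recommendations:

```javascript
// Hypothetical sketch: guess an item's genre from the genres of its related items.
// The data shape ({ title, genre }) is an assumption made for this example.
function inferGenre(recommendations) {
  // Count how often each genre appears among the related items.
  const counts = new Map();
  for (const { genre } of recommendations) {
    counts.set(genre, (counts.get(genre) || 0) + 1);
  }

  // Pick the most frequent genre (the first one seen wins a tie).
  let best = null;
  let bestCount = 0;
  for (const [genre, count] of counts) {
    if (count > bestCount) {
      best = genre;
      bestCount = count;
    }
  }
  return best; // null when there are no recommendations
}

const recommendedForTheBadDay = [
  { title: "Saw", genre: "horror" },
  { title: "Hostel", genre: "horror" },
  { title: "The Bad Day 7 (in 3D)", genre: "horror" },
];

console.log(inferGenre(recommendedForTheBadDay)); // prints "horror"
```

A majority vote is about the simplest rule possible; it is only meant to show that the label comes from the neighbors, not from the item itself.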

Other Familiar Examples


Some other forms of contextual hints:
  • Location data encoded in photos
  • "Similar Items" recommendations from Amazon
  • Accessibility hints (notice the pop-up hint that helps explain what the image is)
  • Info Cards on movies
This image is from the Official Android Blog
  • The Chrome Omnibox (bringing awareness to justice beavers across the world)

A good contextual hint is not necessary, but it should be helpful. In other words, we should be able to figure out what something is in the absence of contextual hints, but having the extra information can aid in recognition and understanding.

How This Relates to Me

Imagine that I have a lot of digital images of my friends, and I want to sort them based on who is in the image. With a small amount of code and a good image processing library, I could build an individualized facial recognition model for each person I expect to find in my collection of photos. But the problem is that I would first need to sort the photos in order to train the models - so the solution to my problem depends on the very problem I am trying to solve. In computer science, we call this a circular dependency (or a bootstrapping problem). In the normal world, we call this the chicken and egg problem.

But, if we are clever, we may be able to use existing contextual information to do a lot of the work for us. Within Facebook, people may be tagged in images; generally, the face of the person is outlined in an overlay box and the Facebook profile for the user is linked in a popup hint. This is a great form of contextual information - it is also a great way for me to extract the face of the user so that I can use it as training data. In fact, Facebook provides all the information we need - the face and a link to the associated user's Facebook profile.

Warning, this next part is a little techy...

Let's figure out a general algorithm. Assume that we can use the Facebook Graph API to query for users and images (for the time being, ignore any OAuth scope or permission problems).

For each image for a specific user:
  • Step 1: Save the image and crop it down to the overlay box, resulting in a small image of the user's face
  • Step 2: Get the Facebook user id (or the username) from the URL provided in the contextual hint
  • Step 3: Optionally query the graph API for other images which contain tags of the user
  • Step 4: Repeat with other images until enough training data is present
During these steps, we might also save the full image and other contextual information, such as:
  • Any location data in the image (GPS or place information)
  • Who else is tagged in the image
  • The time and date of the photo (cross referenced with weather data and GPS data, we may be able to use this additional information to make inferences about the image - I am getting ahead of myself, but hopefully you get the point)
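
Step 1 depends on turning a tag into a pixel rectangle. The sketch below assumes the tag gives the center of the face as percentages (0 to 100) of the image width and height, and that it carries no box size, so the box is taken as a fraction of the shorter side. The function name, the `boxFraction` default and the tag shape are assumptions for illustration and should be checked against what the API actually returns:

```javascript
// Hypothetical sketch: convert a tag position into a square pixel crop rectangle.
// Assumes tag.x and tag.y are percentages (0-100) of the image width and height
// marking the face center; boxFraction is an arbitrary choice for this example.
function cropBoxFromTag(imageWidth, imageHeight, tag, boxFraction = 0.25) {
  const side = Math.round(Math.min(imageWidth, imageHeight) * boxFraction);
  const centerX = (tag.x / 100) * imageWidth;
  const centerY = (tag.y / 100) * imageHeight;

  // Clamp so the square stays inside the image even for tags near an edge.
  const clamp = (value, min, max) => Math.min(Math.max(value, min), max);
  const left = clamp(Math.round(centerX - side / 2), 0, imageWidth - side);
  const top = clamp(Math.round(centerY - side / 2), 0, imageHeight - side);

  return { left, top, width: side, height: side };
}

// A tag in the middle of a 1000x800 image yields a 200x200 box at (400, 300).
console.log(cropBoxFromTag(1000, 800, { x: 50, y: 50 }));
```

The resulting rectangle can then be handed to whatever crop function the chosen image library provides.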

// Get images and metadata from Facebook using the Graph API
//  NOTE: fb, saveImageToStorage and getAndCropByMetadata are placeholders for
//        this example, not real library calls; fbUsername is supplied by the caller
//
// Get the profile object for the Facebook user (given the username)
const fbProfile = fb.queryGraphForProfile(fbUsername);
// Get the Facebook User ID from the profile
const fbID = fbProfile.fbUserId;
// Get all images in which the user is tagged
//  NOTE: The graph call will actually return a series of pages with results, but
//        for this example, we have reduced the stream to an array
const imageIDs = fb.queryGraphForTaggedImages(fbID);

// Go through all the images in the array
for (const imageId of imageIDs) {
    // Get the image from the Image ID
    const image = fb.queryGraphForImage(imageId);

    // Crop the image based on tag information
    //  NOTE: This is a bit more involved because we need to query to find the crop coordinates
    //
    // It might be worthwhile to also save the original image and any metadata about the image
    // for future reference.
    //
    // Store the cropped face together with the Facebook profile and the original image
    saveImageToStorage(getAndCropByMetadata(image), fbProfile, image);
}
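
The NOTE in the pseudo-code above glosses over the fact that results arrive in pages. A minimal sketch of flattening pages into one array could look like the following. The `fetchPage` callback and the `{ data, next }` page shape are assumptions standing in for a real API call, and the fake data source exists only so that the example runs without a network:

```javascript
// Hypothetical sketch: flatten a paged result set into one array.
// fetchPage(cursor) is a stand-in for a real API call; it is assumed to
// resolve to { data: [...], next: <cursor or null> }.
async function collectAllPages(fetchPage, maxItems = Infinity) {
  const items = [];
  let cursor = null;
  do {
    const page = await fetchPage(cursor);
    items.push(...page.data);
    cursor = page.next;
    // Stop early once enough items have been gathered.
  } while (cursor !== null && items.length < maxItems);
  return items.slice(0, maxItems);
}

// Fake three-page data source so the sketch runs without a network.
const fakePages = {
  start: { data: ["img1", "img2"], next: "p2" },
  p2: { data: ["img3", "img4"], next: "p3" },
  p3: { data: ["img5"], next: null },
};
const fakeFetchPage = async (cursor) => fakePages[cursor === null ? "start" : cursor];

collectAllPages(fakeFetchPage).then((ids) => console.log(ids.length)); // prints 5
```

The `maxItems` cutoff mirrors Step 4: there is no need to keep paging once enough training images have been collected.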

I have just written pseudo-code to automatically crowd-source Facebook image tags in order to build a database of tagged facial images custom-tailored to specific people. We can now use an image processing library to build a facial recognition model for every person in my digital images. Hopefully we have enough data to build a decent model and validate that model with a good degree of accuracy and precision.

// Load the images from storage (each image is associated with a Facebook profile)
//  NOTE: getImagesFromStorage, sample, FancyImageLib and Validator are placeholders
const allImages = getImagesFromStorage();

// Randomly pick 90% of the images for training
const trainingImages = sample(allImages, 0.90);

// The other 10% is used for validation
const trainingSet = new Set(trainingImages);
const validationImages = allImages.filter((image) => !trainingSet.has(image));

// Build a facial recognition model from the training data (remember, each image
// is associated with a Facebook profile; let's assume that the model associates the
// image with the Facebook UserID)
const model = FancyImageLib.buildFacialRecognitionModel(trainingImages);

// Validate the model
//  NOTE: The validation is a bit more complex and we should use images from other people
//        in the validation routine in order to measure false positives
const validationResults = Validator.validate(model, validationImages);

// Hope that our model is good!
console.log(validationResults);

Since this is pseudo-code, we can assume that the pseudo-results are quick and accurate. This leaves us with a facial recognition model that I can use to automatically sort my digital images based on faces.
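
The 90/10 split and the false-positive note in the pseudo-code above can be made a little more concrete. The sketch below is one possible reading, not the only way to validate a model: `splitHoldout` shuffles and cuts the data, and `precisionRecall` scores the results for one person. Both function names and the `{ predicted, actual }` result shape are assumptions for this example:

```javascript
// Hypothetical sketch: hold-out split plus precision/recall for one person.
// All names and the { predicted, actual } shape are assumptions for this example.

// Fisher-Yates shuffle on a copy, then cut at the requested fraction.
function splitHoldout(items, trainFraction, random = Math.random) {
  const shuffled = [...items];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.round(shuffled.length * trainFraction);
  return { training: shuffled.slice(0, cut), validation: shuffled.slice(cut) };
}

// Precision: of the faces labeled as userId, how many really were that user.
// Recall: of the faces that really were that user, how many were found.
function precisionRecall(results, userId) {
  let truePositives = 0;
  let falsePositives = 0;
  let falseNegatives = 0;
  for (const { predicted, actual } of results) {
    if (predicted === userId && actual === userId) truePositives++;
    else if (predicted === userId) falsePositives++;
    else if (actual === userId) falseNegatives++;
  }
  const ratio = (hits, total) => (total === 0 ? 0 : hits / total);
  return {
    precision: ratio(truePositives, truePositives + falsePositives),
    recall: ratio(truePositives, truePositives + falseNegatives),
  };
}

const { training, validation } = splitHoldout([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0.9);
console.log(training.length, validation.length); // prints "9 1"
```

Measuring false positives only works if the validation results include faces of other people, which is why the `actual` label is compared against `userId` rather than assumed.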

At this point, we can (hopefully) reliably sort my entire digital archive based on people in the photos. But the cool part is that the facial recognition model was built entirely from contextual hints provided by Facebook.
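
As a rough sketch of that final sorting step, assume the model returns one prediction per photo with a user ID and a confidence score. The prediction shape, the `groupByPerson` name and the 0.8 threshold are all assumptions for illustration, not output from a real library:

```javascript
// Hypothetical sketch: bucket photos by the person the model thinks is in them.
// The prediction shape ({ photo, userId, confidence }) and the 0.8 threshold
// are assumptions for this example.
function groupByPerson(predictions, minConfidence = 0.8) {
  const albums = new Map();
  for (const { photo, userId, confidence } of predictions) {
    // Low-confidence matches go to a catch-all bucket for manual review.
    const key = confidence >= minConfidence ? userId : "unsorted";
    if (!albums.has(key)) albums.set(key, []);
    albums.get(key).push(photo);
  }
  return albums;
}

const albums = groupByPerson([
  { photo: "beach.jpg", userId: "josh", confidence: 0.95 },
  { photo: "party.jpg", userId: "zack", confidence: 0.91 },
  { photo: "blurry.jpg", userId: "josh", confidence: 0.42 },
]);

console.log(albums.get("josh")); // prints [ 'beach.jpg' ]
```

Keeping an "unsorted" bucket is a design choice: it is safer to review a doubtful match by hand than to file a photo under the wrong person.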

Takeaway

The important point to take away from this discussion is that contextual information can be very valuable. What makes my particular application exciting is that we can leverage otherwise boring contextual information (photo tags) in order to automatically perform some higher-order functionality (build and use facial recognition).

One Last Point

This is a blog about theory. While building individualized facial recognition models from Facebook image tags is possible, I am unaware of any actual implementation in the wild. This does not mean that such an implementation does not exist, only that if it does, I have not seen it.
