Enhancing GIPHY Search with Google Cloud ML Tools

December 8, 2017 by Nick Hasty

Editor’s Note: Earlier this week our Director of Engineering, Nick Hasty, was a guest author on the Google Cloud Big Data and ML blog. Nick touched on the use of Google Cloud Machine Learning to analyze and tag our GIFs, ultimately making it easier to find the perfect GIF on GIPHY. The following blog post is a condensed and modified version of the post published on Google Cloud Platform blog.


Here at GIPHY, we serve over three billion GIFs a day to over 300 million daily active users, and are constantly looking for ways to improve the results of their GIF searches. Recently, we’ve integrated Google’s Cloud Vision and Cloud NLP APIs into our GIF processing pipeline, which has helped us collect more metadata about our GIFs and enhance the core GIPHY search engine. Specifically, using Web Entities data from Cloud Vision helped us boost our search performance and using syntax data from Cloud Natural Language and the Knowledge Graph API let us create rich titles for our GIFs.

With entertainment and culture at the heart of our content, we need to be able to identify specific instances of celebrities, animals, movies, sports, video games, and more. The incredible pop-culture wizards in our in-house editorial team crawl through our catalog identifying and annotating our content, but at our current size, they need help. While we wanted to leverage machine learning to supplement our editors, our need for specificity presented a challenge.

The depth of this problem grew with every GIF crawled or uploaded to our site. We knew it was possible to train custom ML models in-house, and we do have a large amount of labeled data, but we needed a solution fast. After some research, we believed Google offered the most mature service and the widest array of quality models. After a few exploratory tests with positive results, we decided to go big and processed over 10 million GIFs using the full array of Google Cloud Vision services. Here’s an overview of what we did.

In search of subtitles (and other text)

Our first integration of the metadata generated from Cloud Vision into our search engine was to use its optical character recognition (OCR) endpoint, which evaluates images for the presence of text, notably subtitles.

OCR detects text that is integrated into the actual pixel data of an image, and because this text can change over frames, like with subtitles, we wanted to make sure we captured as much text data as possible.

To begin evaluating a GIF for text, we sent off an initial frame to see if the API detected any text. If it did, we sent another frame and computed the difference in text values to see if the text found in the GIF was static or dynamic across frames. If the text differed enough across frames, like in the case of dialog subtitles, then we repeated this process across a percentage of total frames until we had sufficient coverage to perform textual analysis. You can check out our sample Python code and try it for yourself; all you need is a captioned GIF to test with and your Google Cloud credentials.
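The sampling loop above can be sketched roughly as follows. Here `detect_text` stands in for whatever wraps the Cloud Vision OCR call, and the sample rate and similarity threshold are illustrative values, not our production settings:

```python
from difflib import SequenceMatcher

def gather_gif_text(frames, detect_text, sample_rate=0.2, threshold=0.9):
    """Collect OCR text from a GIF's frames.

    `frames` is a sequence of frame images and `detect_text` is any
    callable returning the text found in one frame (e.g. by Cloud
    Vision's OCR endpoint). Returns the distinct text runs detected.
    """
    if not frames:
        return []
    first = detect_text(frames[0])
    if not first:                      # no text detected at all
        return []
    second = detect_text(frames[len(frames) // 2])
    # If the two samples are near-identical, treat the text as static.
    if SequenceMatcher(None, first, second).ratio() >= threshold:
        return [first]
    # Dynamic text (e.g. subtitles): sample a percentage of all frames.
    step = max(1, int(1 / sample_rate))
    texts = []
    for frame in frames[::step]:
        text = detect_text(frame)
        if text and (not texts or text != texts[-1]):
            texts.append(text)
    return texts
```

In a real pipeline, `detect_text` might wrap `ImageAnnotatorClient.text_detection` from the `google-cloud-vision` library; a stub stands in here so the sampling logic itself is clear.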

Contextual labels with Web Entities

After our success with OCR, we focused on integrating Google’s label data, specifically Web Entities, which proved great for discovering additional specifics about an image.

Cloud Vision provides two types of image labels: “Label Detection”, which provides labels for objects that Google’s machine learning models are trained to detect, and “Web Entities”, which yields labels derived from the context in which the image was crawled and indexed. Since Web Entity labels take into account the data embedded around an image, like the surrounding text or captions, they tend to be very specific and can even provide proper nouns. For example, while Cloud Vision doesn’t have a model trained to recognize a particular celebrity, if it discovers a person’s image across multiple websites, and in each case that image is displayed with a caption containing a proper noun, then the Web Entities endpoint labels that image with that proper noun.
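One way to turn a Web Entities response into search labels is to keep only the confident, named entities. This sketch assumes dict-shaped entities mirroring the `description` and `score` fields of the API’s `WebDetection` message; the 0.5 cutoff and the sample entities are illustrative:

```python
def web_entity_labels(web_entities, min_score=0.5):
    """Keep Web Entity descriptions that clear a confidence score.

    `web_entities` mirrors the `web_entities` field of Cloud Vision's
    WebDetection response: items with a textual `description` and a
    float `score`. Returns labels sorted best-first, dropping entities
    with no description.
    """
    scored = [
        (e["score"], e["description"])
        for e in web_entities
        if e.get("description") and e.get("score", 0.0) >= min_score
    ]
    return [desc for _, desc in sorted(scored, reverse=True)]
```

The real client library returns protobuf objects accessed by attribute rather than dicts; the filtering idea is the same either way.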

Toward better GIF titles

A parallel project to using Web Entities was to improve the algorithm for generating titles for our GIFs. GIF titles are most noticeable on our GIF detail pages, right above the GIF itself. Initially, creating these titles was fairly simple and involved choosing the most popular tag for that GIF. This worked for a while, but with an ever-growing catalog we were creating many duplicate titles that lacked specificity.

For our new title schema, we wanted to be as descriptive as possible, with special emphasis on including the names of famous people, fictional characters, or TV shows, as well as actions, emotions, or reactions. This schema required metadata about, say, a tag’s part of speech and whether or not the tag was a proper noun. There are many solid open-source libraries for determining syntax, but we ended up using Google Cloud Natural Language, as it provides very precise syntax detection, and its Entity Recognition offers a great way to identify proper nouns. Entity Recognition identifies known objects in a block of text and provides Google Knowledge Graph IDs when available. This tight coupling between APIs helped us identify tags that refer to specific people, places, and things, and gather highly specific metadata about those objects.
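A toy sketch of how annotated tags might then feed a title, assuming each tag carries the part-of-speech tag from Cloud Natural Language syntax analysis and any Knowledge Graph ID from Entity Recognition. The field names, the preference order, and the sample tags are all illustrative, not our production logic:

```python
def build_title(tags):
    """Compose a GIF title from annotated tags.

    Each tag is a dict with the tag text, its dominant part of speech
    ("NOUN", "VERB", ...), and whether Entity Recognition linked it to
    a Knowledge Graph entry (a strong proper-noun signal). Prefer one
    named entity plus one action, falling back to the top tag.
    """
    entities = [t for t in tags if t.get("knowledge_graph_id")]
    actions = [t for t in tags if t["pos"] == "VERB"]
    parts = []
    if entities:
        parts.append(entities[0]["text"].title())
    if actions:
        parts.append(actions[0]["text"].lower())
    if not parts and tags:               # fall back to the top tag
        parts.append(tags[0]["text"].title())
    return " ".join(parts) + " GIF" if parts else ""
```

The point is the shape of the decision, not the weights: proper nouns backed by a Knowledge Graph ID outrank generic tags, and an action verb rounds out the title.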

After processing our tag catalog, we began putting the data to use and generating new GIF titles. So far we’ve been thrilled with the results.

Check out some of our before and after GIF titles:

In Conclusion

Historically, taking advantage of the latest developments in machine learning and computer vision required in-house specialists and lots of time, two things that growth-stage startups like GIPHY don’t always have.

Although these technologies are becoming more accessible, easier to use, and faster to implement, their ultimate success depends heavily on both the quality and quantity of data you have available for training. Google Cloud’s Machine Learning APIs provide tremendously powerful, cutting-edge models that are just a request away. To learn more about the technical details, visit the Google Cloud Platform blog.

–Nick Hasty, Director of Engineering

Original featured GIF by Leroy Patterson, GIPHY Studios Artist