GIPHY <3 Analytics

October 10, 2017 by Niger Little-Poole

Data science and analytics are an important part of what makes GIPHY the best place to find GIFs. We recently teamed up with Interana to help them spread the message of good data analytics practices during their “Summer Data Love Roadshow”. I answered questions from New York area data scientists, product managers, CEOs, and others, along with experts from Comcast and Facebook. The following are some of the key insights we shared.

Never Delete Your Data

We encountered a question from a CEO wondering how his location-based startup could manage its data, given that the volume was too expensive to store. He was considering just putting rollups in a data warehouse. If you find yourself in this situation and worst comes to worst, sample your data rather than discarding it. You never want to give up the power of the raw data. The rollups and views you create now might be sufficient, but you never know when you’ll need to look at the data along a different set of dimensions. In some cases, like during an audit or for compliance, this could make or break your business. At GIPHY, every GIF, Sticker, API call, etc. we’ve ever served has a corresponding log entry backed up to S3 in Parquet format. While we use aggregations to query things more quickly, we always have the raw data accessible if we need to build new views.
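To make the sampling idea concrete, here is a minimal sketch of one common approach: deterministic, hash-based sampling. This is an illustration rather than a description of GIPHY's pipeline. The `user_id` field, the function names, and the 10% rate are all assumptions made up for the example. Hashing a key instead of drawing random numbers means the same user is always either in or out of the sample, so per-user histories stay complete.

```python
import hashlib


def keep_event(key: str, sample_rate: float) -> bool:
    """Return True if events with this key belong to the sample.

    The decision depends only on the key, so it is stable across runs
    and across machines.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the key if it lands in the lowest `sample_rate` share of buckets.
    return bucket < sample_rate * 2**64


def sample_events(events, sample_rate, key_field="user_id"):
    """Keep only the raw events whose key falls inside the sample.

    `key_field` is a hypothetical field name; use whatever identifier
    your own log entries carry.
    """
    return [e for e in events if keep_event(str(e[key_field]), sample_rate)]
```

Calling `sample_events(events, 0.1)` keeps the full raw records for roughly one user in ten, which preserves the ability to slice along new dimensions later at a fraction of the storage cost.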

Data is for Everybody

Everyone today wants to be “data driven”, but many organizations don’t know what that means and aren’t structurally ready for such a shift. For data to truly be for everyone, it needs to be presented in a way that complements existing workflows, like Calendar Analytics does, instead of operating orthogonally. At GIPHY, we ensure that our tools either integrate with the Admin version of our platform or have the capability to render relevant content directly, so that our editorial team doesn’t have to add steps to connect insights to content. Rendering GIFs may seem like a small feature for any tool, but it makes a substantial difference in the adoption of tools within our organization.

We’re working on some cool projects at GIPHY that utilize computer vision, NLP, and machine learning. We have models that can detect if a GIF is viral, if it’s animated, if it has a caption (and the caption text), and even if it features certain celebrities. However, the key to “Data is for Everybody” isn’t models. It’s really about learning lessons from UX and Human Factors to present information in the most effective way. We try to avoid spreadsheets and dashboards and focus more on automated or interactive tools that reduce cognitive burden. For example, we’ve built Slack bots that update relevant teams in concise, easy-to-understand English, avoiding jargon.

Keep It Simple, Stupid

When organizations attempt to introduce more data to their process, they sometimes go a bit overboard. Organizations will chase a “golden goose insight” that will magically improve their business. It doesn’t help that media reporting focuses on these kinds of data stories. For most organizations this insight doesn’t materialize, and “data burnout” can occur. It’s much better to focus on one metric at a time and slowly introduce more data. At GIPHY, our content team uses click-through rate as their golden metric. The impact of new content partnerships, artist collaborations, and even work from our Studio is judged at a high level by this one simple-to-understand metric. Having a simple, high-level metric like this offers context and makes it easier to ask more detailed follow-up questions like “how” or “why”.
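Part of the appeal of click-through rate is how little machinery it needs: clicks divided by impressions. The sketch below shows that arithmetic and a simple ranking by it. The function names and the content-source labels are hypothetical, chosen only for illustration, and the numbers in the usage note are invented.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Clicks divided by impressions; 0.0 when nothing was shown."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


def rank_by_ctr(counts):
    """Rank content sources by click-through rate, highest first.

    `counts` maps a source name to a (clicks, impressions) pair.
    """
    rates = [
        (name, click_through_rate(clicks, impressions))
        for name, (clicks, impressions) in counts.items()
    ]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_by_ctr({"partner_a": (50, 1000), "studio": (30, 200)})` puts `studio` first, even though `partner_a` has more raw clicks. A ranking like that is exactly the kind of result that prompts a follow-up “why”.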

Automated Insight

Lastly, a big focus for us at GIPHY is the concept of “Automated Insight”. We believe it’s unrealistic to expect everyone to remember to check a dashboard every single day and also notice changes in that data. Plenty of research in the field of Human Factors has demonstrated the limits of human memory and perception. People become numb to data over time if it’s constantly the same. The key is to present data to people only when new information is available. For example, our content team is updated whenever we see a high volume of searches but a low volume of clicks. This indicates that a lot of people are searching for something but not finding it. This automation allows us to react quickly to real-time events.
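The search-versus-click rule above can be sketched in a few lines. This is a hedged illustration, not GIPHY's actual alerting code: the `TermStats` type, the function names, and both thresholds (`min_searches`, `max_ctr`) are assumptions invented for the example, and real thresholds would need tuning against your own traffic.

```python
from dataclasses import dataclass


@dataclass
class TermStats:
    term: str
    searches: int
    clicks: int


def underserved_terms(stats, min_searches=1000, max_ctr=0.05):
    """Return terms that many people search for but few click on.

    Both thresholds are illustrative placeholders.
    """
    flagged = [
        s for s in stats
        if s.searches > 0
        and s.searches >= min_searches
        and s.clicks / s.searches <= max_ctr
    ]
    # Most-searched terms first, so the biggest gaps lead the update.
    return sorted(flagged, key=lambda s: s.searches, reverse=True)


def format_alert(s: TermStats) -> str:
    """Build a plain-English message suitable for a chat notification."""
    return (
        f'"{s.term}" was searched {s.searches:,} times '
        f"but clicked only {s.clicks:,} times."
    )
```

Because `underserved_terms` returns an empty list when nothing crosses the thresholds, a scheduled job built on it stays silent on quiet days and speaks up only when there is something new to act on.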

Conclusion

In conclusion, we had a really great conversation on data. Interana was a great host for this event, as their platform really encapsulates the takeaways we discussed. They are helping organizations harness the power of raw data, make it easily presentable to users, and provide an API to build further tooling. Stay tuned for more talk on data on the GIPHY engineering blog.

– Niger Little-Poole, Data Scientist