What’s that Spike?

August 13, 2019 by Alex Anderson

Weekly Search Trends at GIPHY

The Product Analytics team at GIPHY sits at the interface of technical data engineering teams and the rest of our business. We’re tasked with making data easily accessible across the org and uncovering insights to enhance our product and sales strategies. This can take the form of building data visualizations, automating routine pulls, and, most importantly, performing deep dives into search behavior and content. Ultimately, our goal is to craft stories out of data that can be easily understood by all.

When I started at GIPHY as a Product Analyst, many of the first data analyses I completed were related to enhancing our knowledge of what our users were searching for. Understanding users’ desires impacts nearly all teams across the organization. While there is already a robust set of internal tools and products that make our Search algorithm first-class, it’s not always easy for less technical teams to sift through the data and extract insights without getting in the weeds with SQL. Supplying teams like editorial and marketing with easily digestible, summarized information about search trends was a high priority, and we quickly learned that nearly all teams wanted it. To mitigate one-offs and equip the entire GIPHY org with useful data, Product Analytics decided to craft a “Weekly Search Trends” newsletter built on a more automated reporting framework. This allows us to push information to all stakeholders, rather than being reactive to requests or asking them to monitor existing dashboards. The newsletter presents high-level categories of what users searched for over the previous week, as well as keywords that spiked.

What are users searching for?

Every time a user makes a request to our API, from any integration we partner with, that request is logged into our databases. We’re able to extract multiple pieces of information about each of these requests, but for this exercise, we limited the data to search requests coming from the U.S. across our API network. The U.S. is our largest market, and these searches are generally in English, which makes it simpler to assess trends.

Once we have this set of data, we apply some functions to cleanse it for easier readability. Sanitizing the queries involves a number of methods, including an existing remapping framework for semantically similar and misspelled phrases (e.g., “happybirthda” maps to “happy birthday”; “laugh out loud” maps to “lol”). Finally, we keep only the searches that rank among the top terms by volume per day, so that we focus on terms with enough volume to be meaningful.
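To make the cleansing step concrete, here is a minimal Python sketch. It is an illustration under assumptions, not our production code: the REMAP table, the sanitize and top_terms_for_day helpers, and the top_n cutoff are all hypothetical names, and only the two remapping examples come from the text above.

```python
from collections import Counter

# Hypothetical remap table. The real remapping framework is much larger;
# these two entries are the examples mentioned in the post.
REMAP = {
    "happybirthda": "happy birthday",
    "laugh out loud": "lol",
}


def sanitize(query: str) -> str:
    """Lowercase, collapse whitespace, then apply the remap table."""
    cleaned = " ".join(query.lower().split())
    return REMAP.get(cleaned, cleaned)


def top_terms_for_day(queries, top_n):
    """Return the top_n sanitized terms for one day's queries, by volume.

    top_n is an assumed parameter; the actual cutoff is not stated here.
    """
    counts = Counter(sanitize(q) for q in queries)
    return [term for term, _ in counts.most_common(top_n)]
```

Calling top_terms_for_day on a single day's raw queries returns that day's cleaned top terms, which is the input the trend logic in the next section works from.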

How do we determine what constitutes a trend?

Although we get millions of unique queries per language per day, certain terms perennially appear in the list of top terms, such as “love,” “happy,” and “sad.” Upon further analysis, we also noticed that there are over 500 terms that appear multiple times in the top searches of the day over the course of a given month. To establish a benchmark of what constitutes an average top term, we crafted a “high volume score.” To calculate this, we take four weeks’ worth of data of the top terms for each day. A keyword earns one point for every day it appears in the top terms, for a max score of 28. We then look at these points by day of the week to determine whether the term should be included in the high volume keywords list. For example, “tbt” is almost always a top search on Thursdays, so it would only have a score of 4 in a given four-week span, but it is something we want to include as a benchmark. Once we have this top terms benchmark list, we compare it with our most recent weekly search dataset. This makes it simple to filter and find terms that are “new” to the top terms list.
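The scoring described above can be sketched roughly as follows. This is a simplified illustration, not our exact implementation: the function names, the min_total threshold of 14, and the rule that a term qualifies if it shows up on the same weekday in all four weeks are assumptions made for the example.

```python
from collections import Counter, defaultdict


def high_volume_scores(daily_top_terms):
    """Score terms over a four-week window.

    daily_top_terms maps a datetime.date to that day's top terms.
    Returns (total, by_weekday): total[term] is the number of days the term
    appeared (max 28 for 28 days of data), and by_weekday[term][weekday]
    counts appearances per day of the week (Monday is 0).
    """
    total = Counter()
    by_weekday = defaultdict(Counter)
    for day, terms in daily_top_terms.items():
        for term in set(terms):  # one point per day, even if listed twice
            total[term] += 1
            by_weekday[term][day.weekday()] += 1
    return total, by_weekday


def benchmark_terms(daily_top_terms, min_total=14, weeks=4):
    """Build the high volume keyword list.

    Assumed rule: keep a term if its total score reaches min_total, or if it
    appeared on the same weekday in every one of the weeks (the "tbt" case).
    """
    total, by_weekday = high_volume_scores(daily_top_terms)
    keep = set()
    for term, score in total.items():
        recurring = any(n == weeks for n in by_weekday[term].values())
        if score >= min_total or recurring:
            keep.add(term)
    return keep


def new_top_terms(this_week_terms, benchmark):
    """Terms in this week's top list that are not in the benchmark."""
    return set(this_week_terms) - benchmark
```

With this shape, a term like “tbt” that scores only 4 still lands in the benchmark because all four points fall on the same weekday, while a one-day wonder does not.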

Aside from finding “new” top terms, we also want to see if any of the benchmark high volume terms spiked in a given week. As a point of comparison, we take the volume of each term for the previous week and divide it by the total search volume for that week, then compare the result with the same ratio for the current week to come up with an index that represents a week-over-week delta. Though uncomplicated, this has proven to be an effective way to spot anomalies without the need for a more complex algorithm for this simple newsletter.
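One way to read that index is as a ratio of volume shares, sketched below. The function names and the min_ratio threshold of 1.5 are assumptions for illustration; the post does not state the exact cutoff we use.

```python
def week_over_week_index(prev_counts, curr_counts, prev_total, curr_total):
    """Ratio of a term's share of total search volume, this week vs. last.

    prev_counts and curr_counts map term -> weekly volume; the totals are
    the overall search volume for each week. A value of 1.0 means the
    term's share did not change. Terms with no prior volume are skipped
    here (they would surface as "new" terms instead).
    """
    index = {}
    if prev_total == 0 or curr_total == 0:
        return index
    for term, curr in curr_counts.items():
        prev = prev_counts.get(term, 0)
        if prev == 0:
            continue
        index[term] = (curr / curr_total) / (prev / prev_total)
    return index


def spiking_terms(index, min_ratio=1.5):
    """Terms whose index meets the (assumed) threshold, largest first."""
    hits = [term for term, ratio in index.items() if ratio >= min_ratio]
    return sorted(hits, key=lambda term: index[term], reverse=True)
```

Dividing by total volume keeps a generally busy week from making every term look like it spiked; only terms that grew faster than search overall rise above 1.0.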

Once we have a list of terms that spiked, the real work begins! At GIPHY, we believe that combining machine learning algorithms with culturally aware (and diverse!) humans yields the most powerful and effective insights. Our algorithm will export a list of trending terms for the week, but our cultural insight about the latest events and internet memes can provide more context about the intent of the user. The next step after identifying terms that spiked is to bucket them into categories or trends. Finally, we package it up into a quick, easy-to-read newsletter that gets sent out company-wide.

Concluding thoughts

While the newsletter has only been distributed for a few months, it has already been helpful to nearly all GIPHY teams. It’s also been a great way to introduce our fairly new Product Analytics team to the rest of GIPHY and to show examples of the types of analyses we can create. Some of the fun trends and categories we’ve identified users searching for include entertainment releases (Game of Thrones was particularly popular), major holidays (think Mother’s Day, Easter, and Cinco de Mayo), sporting events (soccer nearly always appears in top searches), and more. Moreover, by sharing this simple newsletter, we’ve empowered our editorial team to take action and improve content around popular themes. Lastly, we’ve reduced the number of ad hoc requests by proactively pushing out information, instead of reacting to one-off requests asking if specific events are showing up in search.

Going forward, there are a few ways I can foresee this newsletter evolving to take initial feedback into consideration and add value to the company. Right now, the emails are sent out manually and insights are not catalogued. In the future, I’d love to make a simple web application to visualize this data and add the ability to look into historical reports by various content verticals. Teams could see which holidays, sporting events, seasons, or other categories of keywords users were searching for in the past to inform their strategy.

Additionally, our method of determining spikes, while effective, remains simple and has the potential to become more sophisticated. We could test out time-series models to see if they improve the number of events we’re able to spot, and potentially expose more niche spikes. The most important thing is that the newsletter is simple to create and spreads knowledge to teams that were previously not empowered with easily accessible insights, and that can now understand the story we extract from the numbers.

— Alex Anderson, Product Analyst