The Making of a GIF Music Video

December 1, 2017 by Bjorn Roche

When singer/songwriter Cassandra Kubinski recently visited the GIPHY office, she was inspired to create a music video made entirely of GIFs. We weren’t sure if this had ever been done before, but it didn’t really matter – it sounded too fun not to try ourselves. Cassandra has a great newly released Christmas EP called Holiday Magic, and we both agreed that “It Doesn’t feel Like December” would be a perfect track for GIFs. While all the songs on the album are fantastic, this track’s silliness (it even has a kazoo!) really sets the mood for fun GIFs. I suggested we could do it automatically using the song’s lyrics and GIPHY’s search, and showed Cassandra my first pass. While it was no masterpiece, it was fun enough to keep going. It took a little more work, but much less than I expected, and the results were so good, and so positively silly, that Cassandra decided to release the video on December 1st, in honor of the song’s title.

The tools I used to build the video were nothing more than the GIPHY API, some open source tools including the fantastic Aubio and FFmpeg libraries, and a few simple shell scripts. I thought the shell scripts wouldn’t be useful for anything past proof of concept, but it turned out that they were good enough and I didn’t need anything fancier.

The general approach was to:

  1. Split the song into beats using Aubio
  2. Assign a term or phrase to each beat (using lyrics if there were any, and words that fit the theme of the song if not)
  3. Search the GIPHY API for a GIF that matched each term
  4. Edit the GIFs together
  5. Watch the finished video and tweak

When I started, I wondered how much of the process could be automated. For the most part, I found that the only manual editing I needed was choosing the right thematic words, editing lyrics to remove words that resulted in incongruous searches, and then simply deleting GIFs I didn’t like or didn’t think fit. (Some of this process was entertaining in the way that only GIFs are: for example, I originally included the word “freeze” in my thematic words, forgetting that it means not just “very cold” but also “stop moving.”)

Splitting the song and assigning search terms

Splitting the song into beats was easily accomplished with the “aubiocut” script that comes with Aubio, and the results were saved in a CSV file:

aubiocut -i in.mp3 -b | sed  '/[^[:digit:].+-]/d' | sed 's/$/,/' > times.csv

That CSV file was edited manually by adding search terms to each line.
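
Each clip then has to last from its beat until the next one. As a rough illustration of that bookkeeping (not the script used for the video), here is a minimal Python sketch. It assumes each row is `time,term` with the time in seconds and that the song length is known; the function name `clip_durations` and the sample rows are made up.

```python
import csv
import io


def clip_durations(rows, song_length):
    """Return (start, duration, term) for each beat row.

    rows: iterable of (time_in_seconds, search_term) pairs, sorted by time.
    song_length: total length of the song in seconds (assumed known).
    """
    rows = [(float(t), term.strip()) for t, term in rows]
    clips = []
    for index, (start, term) in enumerate(rows):
        # A clip runs until the next beat, or until the end of the song.
        end = rows[index + 1][0] if index + 1 < len(rows) else song_length
        clips.append((start, round(end - start, 6), term))
    return clips


# Made-up sample in the "time,term" layout described above.
sample = io.StringIO("0.5,snow\n1.0,december\n1.75,kazoo\n")
clips = clip_durations(csv.reader(sample), song_length=2.5)
for start, duration, term in clips:
    print(f"{start:.2f}s  {duration:.2f}s  {term}")
```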

Searching the GIPHY API

Making API calls and parsing JSON in shell requires a little help from curl and Python. I used the translate endpoint, though search and random could also be used. You’ll need your own API key if you want to try this, but the URL is going to be something like this:

"api.giphy.com/v1/gifs/translate?api_key=${apikey}&rating=g&s=$tag"
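
Search terms taken from lyrics often contain spaces or apostrophes, which need to be URL-encoded before they go into the query string. A minimal Python sketch of building the same URL, assuming the `https://` scheme and using a made-up helper name `translate_url`:

```python
from urllib.parse import urlencode


def translate_url(api_key, term, rating="g"):
    """Build a translate request URL with the search term URL-encoded."""
    # urlencode escapes spaces, apostrophes and other reserved characters.
    query = urlencode({"api_key": api_key, "rating": rating, "s": term})
    # Assumption: the API is reached over https at this host and path.
    return "https://api.giphy.com/v1/gifs/translate?" + query


print(translate_url("YOUR_API_KEY", "jingle bells"))
```

The resulting string can be passed to curl in place of the hand-built URL.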

You’ll then want to parse the response to select the “rendition” you want. I decided to take the “original_mp4” rendition, which is usually smaller than a GIF, and higher quality in some cases. You can do that with something like this:

python -c 'import sys, json; print(json.load(sys.stdin)["data"]["images"]["original_mp4"]["mp4"])'

In my code, I saved the results to another CSV.
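
The one-liner will raise an error if a term has no match or the rendition is missing, which matters when looping over hundreds of terms. Here is a slightly more defensive Python sketch of the same lookup. The helper name `mp4_url` is made up, the sample response is a heavily trimmed stand-in, and the idea that a failed match yields a non-object `data` value is an assumption.

```python
import json


def mp4_url(response_text):
    """Return the original_mp4 URL from a translate response, or None."""
    payload = json.loads(response_text)
    data = payload.get("data")
    # Assumption: a term with no match yields an empty or non-object "data".
    if not isinstance(data, dict):
        return None
    return data.get("images", {}).get("original_mp4", {}).get("mp4")


# Hypothetical, heavily trimmed response for illustration only.
sample = '{"data": {"images": {"original_mp4": {"mp4": "https://example.com/a.mp4"}}}}'
print(mp4_url(sample))
print(mp4_url('{"data": []}'))
```

Rows that come back as `None` can then be flagged for a different search term instead of stopping the whole run.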

Editing the GIFs together

Once the GIFs listed in the new CSV have been fetched, they have to be edited together. This is a bit tricky because the GIFs may be too short or too long, and may have different frame rates and sizes. I decided to standardize on 30 FPS and letterbox all GIFs to 1280×720. The code to do this and trim each clip to the required length is a bit of an eyesore, but it can all be done in one line with FFmpeg:

ffmpeg -i gifs/$i.mp4 -ss '00:00:00.0' -t 00:00:"$d" -r 30 -vf "scale=(iw*sar)*min(1280/(iw*sar)\,720/ih):ih*min(1280/(iw*sar)\,720/ih), pad=1280:720:(1280-iw*min(1280/iw\,720/ih))/2:(720-ih*min(1280/iw\,720/ih))/2" -y gifs/$i-trimmed.mp4
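
The arithmetic inside that filter is easier to follow outside of FFmpeg’s expression syntax. Here is a minimal Python sketch of the same letterboxing idea, assuming square pixels (so the `sar` term drops out). The function name `letterbox` is made up, and the rounding may differ slightly from what FFmpeg does internally.

```python
def letterbox(width, height, target_w=1280, target_h=720):
    """Return (scaled_w, scaled_h, pad_x, pad_y) for a letterboxed frame."""
    # Scale by the smaller ratio so the whole frame fits inside the target.
    factor = min(target_w / width, target_h / height)
    scaled_w = round(width * factor)
    scaled_h = round(height * factor)
    # Center the scaled frame; the leftover space becomes the bars.
    pad_x = (target_w - scaled_w) // 2
    pad_y = (target_h - scaled_h) // 2
    return scaled_w, scaled_h, pad_x, pad_y


print(letterbox(480, 480))  # a square GIF gets bars on the left and right
print(letterbox(500, 200))  # a wide GIF gets bars on the top and bottom
```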

Once all the files are conformed to the same format, they can be easily combined using FFmpeg’s concat demuxer:

ffmpeg -f concat -i concat-list.txt -i in.mp3 -c:v libx264 -c:a aac -movflags faststart -y output.mp4
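
The list file passed to `-f concat` contains one `file '...'` line per clip, in playback order. A minimal Python sketch of generating that text, assuming clips named after the `gifs/$i-trimmed.mp4` pattern above; the helper name `concat_list` and the clip count are made up.

```python
def concat_list(clip_paths):
    """Return the text of a concat list, one `file` line per clip."""
    lines = []
    for path in clip_paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


# Hypothetical clip names following the gifs/$i-trimmed.mp4 pattern.
paths = [f"gifs/{i}-trimmed.mp4" for i in range(3)]
print(concat_list(paths), end="")
```

Writing the returned string to concat-list.txt gives FFmpeg the edit order.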

This approach isn’t perfect because rounding errors due to differences between the requested length and the nearest frame will accumulate, but I found it worked extremely well for “It Doesn’t feel Like December,” which is less than 4 minutes long and had fewer than 300 edits in total.
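
To get a feel for how that drift adds up, and for one possible way to contain it, here is a small Python sketch. It is not what was done for the video: carrying each clip’s rounding error into the next clip is a swapped-in technique, and the clip lengths are made-up numbers.

```python
FPS = 30


def frame_counts(durations, carry_error=False):
    """Convert clip durations (seconds) to whole frame counts at FPS."""
    counts = []
    error = 0.0  # seconds owed to (or borrowed from) later clips
    for duration in durations:
        wanted = duration + (error if carry_error else 0.0)
        frames = max(1, round(wanted * FPS))
        counts.append(frames)
        if carry_error:
            # Remember how far this clip landed from its requested length.
            error = wanted - frames / FPS
    return counts


# Made-up example: 300 clips that each ask for 0.71 seconds.
durations = [0.71] * 300
for carry in (False, True):
    total = sum(frame_counts(durations, carry)) / FPS
    print(f"carry_error={carry}: drift = {total - sum(durations):+.3f} s")
```

With the error carried forward, the total stays within half a frame of the requested length no matter how many edits there are.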

Watching and Tweaking

From here, the video is finished and can be played in any video player that can play MP4s. I found it useful to go back to the first step a few times and tweak the search terms until things looked pretty good, and then to go through the individual edit decisions, removing GIFs and occasionally moving them around until everything looked just right.

As a final tweak, I did a few manual searches on Giphy.com and found the perfect GIFs and pasted them in. Sometimes there’s just no substitute for that perfect find. Of course, for credits and a few custom GIFs of Cassandra, we used GIPHY’s Web GIF Maker.

Future Directions

It would be interesting to see if tweaking the search words could be automated using Natural Language Processing (NLP) to improve the semantic context of the lyrics and automatically determine thematic words. However, I would love to first expand these scripts with a simple user interface so anyone could use them.

— Bjorn Roche, Sr. Media Pipeline Engineer