Generating Image-Based Song Recommendations with Google Gemini and Spotify APIs

Posted: May 05, 2024

As a fun exploration, I challenged myself to build a GenAI-enabled application. In this project, I created a basic NextJS application that integrates Google Gemini’s image recognition with Spotify’s music library and search functionality. This article walks through the key components of the project. All code is available on my GitHub at tjm165/gemini-spotify-recommendation.

Demo Screenshot

[Image: demo screenshot]

Technology Choices

There are two noteworthy choices in the technology that I selected.

API Tokens

As stated in the title, this project integrates with the Google Gemini and Spotify APIs. At the time of writing this article, both these APIs are free to use.
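Since both APIs need credentials, one way to keep them out of the source code is to read them from environment variables and fail early when one is missing. The sketch below is only an illustration: `requireEnv`, `loadApiConfig`, and the `GEMINI_API_KEY` variable name are hypothetical, while the two Spotify variable names are the ones used in the Spotify snippet in this article.

```javascript
// Read a required environment variable, failing fast with a clear message.
// (Hypothetical helper, not taken from the project repository.)
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Collect the credentials the app needs in one place.
// GEMINI_API_KEY is an assumed name; the Spotify names match this article's snippet.
function loadApiConfig(env = process.env) {
  return {
    geminiApiKey: requireEnv("GEMINI_API_KEY", env),
    spotifyClientId: requireEnv("SPOTIFY_CLIENT_ID", env),
    spotifyClientSecret: requireEnv("SPOTIFY_CLIENT_SECRET", env),
  };
}
```

Calling `loadApiConfig()` once at startup surfaces a missing key immediately instead of at the first API request.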

Google Gemini Integration

The core AI functionality works by sending a prompt and an image to Google Gemini. I needed Gemini to respond with song recommendations in an agreed-upon JSON format. After some experimentation, I found that the following prompt accomplished my goal:

You are an assistant that generates JSON.
You always return JSON with no additional text. 
Please Generate a list of 5 songs in JSON format. 
The songs should relate to this image. 
Use the format like this example 
Example: {"recommendations": ["Song - Artist", "Song - Artist", ...]}.
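To make the request concrete, here is a minimal sketch of how the prompt and an image might be sent together. It assumes a `model` object that exposes `generateContent(parts)` and returns `result.response.text()`, which is how the `@google/generative-ai` SDK's model objects behave; the helper names `buildGeminiParts` and `askGeminiForSongs` are hypothetical and not taken from the repository.

```javascript
// The prompt from this article, joined into a single string.
const PROMPT = [
  "You are an assistant that generates JSON.",
  "You always return JSON with no additional text.",
  "Please Generate a list of 5 songs in JSON format.",
  "The songs should relate to this image.",
  "Use the format like this example",
  'Example: {"recommendations": ["Song - Artist", "Song - Artist", ...]}.',
].join("\n");

// Build the multimodal request: the text prompt, then the image as base64 inline data.
function buildGeminiParts(prompt, imageBytes, mimeType) {
  return [
    prompt,
    { inlineData: { data: Buffer.from(imageBytes).toString("base64"), mimeType } },
  ];
}

// `model` is assumed to come from the @google/generative-ai SDK, for example via
// new GoogleGenerativeAI(apiKey).getGenerativeModel({ model: "<a vision-capable model>" }).
async function askGeminiForSongs(model, imageBytes, mimeType) {
  const result = await model.generateContent(
    buildGeminiParts(PROMPT, imageBytes, mimeType)
  );
  // Raw text from the model; it may still need cleanup before JSON.parse.
  return result.response.text();
}
```

Passing the model in as a parameter keeps the function easy to exercise with a stub, without calling the real API.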

It is worth noting that no matter how explicitly I asked for plain JSON, Gemini wrapped the response in a Markdown code fence (three backticks). While it would have been nice to solve this in the prompt, I ultimately resorted to trimming the string in JavaScript.
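The trimming step could look roughly like the sketch below. It is not the exact code from the repository; `stripCodeFence` and `parseRecommendations` are hypothetical helpers that remove an optional surrounding fence (with or without a language tag such as `json`) and then validate the shape described in the prompt.

```javascript
// Three backticks, built with repeat() so this snippet is easy to embed in Markdown.
const FENCE = "`".repeat(3);

// Remove an optional Markdown code fence wrapped around the model's reply.
function stripCodeFence(text) {
  let body = text.trim();
  const fenced =
    body.length >= FENCE.length * 2 &&
    body.startsWith(FENCE) &&
    body.endsWith(FENCE);
  if (fenced) {
    body = body.slice(FENCE.length, -FENCE.length);
    // Drop an optional language tag (e.g. "json") on the opening fence line.
    body = body.replace(/^[a-zA-Z]*\s*\n/, "");
  }
  return body.trim();
}

// Parse the cleaned reply and return the list of "Song - Artist" strings.
function parseRecommendations(rawText) {
  const parsed = JSON.parse(stripCodeFence(rawText));
  if (!Array.isArray(parsed?.recommendations)) {
    throw new Error('Expected a "recommendations" array in the Gemini response');
  }
  return parsed.recommendations.filter((entry) => typeof entry === "string");
}
```

Because the fence is treated as optional, the same function also accepts a reply that is already plain JSON.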

Spotify Integration

I used Spotify’s search API to find the song that most closely matched each Gemini recommendation.

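  // Assumes the @spotify/web-api-ts-sdk package, imported as a namespace.
  import * as SpotifyWebApi from "@spotify/web-api-ts-sdk";

  // Authenticate with the client credentials flow (no user login required).
  const sdk = SpotifyWebApi.SpotifyApi.withClientCredentials(
    process.env.SPOTIFY_CLIENT_ID,
    process.env.SPOTIFY_CLIENT_SECRET
  );

  // Search for tracks matching a "Song - Artist" string returned by Gemini.
  const item = await sdk.search("Here Comes the Sun - The Beatles", ["track"]);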

The search functionality made it very convenient to look up a song without requiring Gemini’s response to exactly match the title listed in Spotify.
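As a rough illustration of how the search results might be turned into a list of tracks, the sketch below takes the first item of each response, assuming the response carries its matches under `tracks.items` as in Spotify's search results. `pickTopTrack` and `findTracks` are hypothetical helpers, and `sdk` is assumed to expose `search(query, types)` like the SDK instance shown above.

```javascript
// Return the top-ranked track from a search response, or null when nothing matched.
function pickTopTrack(searchResult) {
  const items = searchResult?.tracks?.items ?? [];
  return items.length > 0 ? items[0] : null;
}

// Look up every "Song - Artist" string and keep only the ones Spotify could match.
// `sdk` is assumed to behave like the Spotify SDK instance created earlier.
async function findTracks(sdk, recommendations) {
  const results = await Promise.all(
    recommendations.map((query) => sdk.search(query, ["track"]))
  );
  return results.map(pickTopTrack).filter((track) => track !== null);
}
```

Dropping unmatched entries means a recommendation that Spotify cannot find simply does not appear in the final list, rather than breaking the whole request.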

V0 + NextJS Frontend

I felt at home building the frontend, as I’ve been familiar with React/NextJS for quite some time. I was excited to use this as an opportunity to test out Vercel’s v0.dev, and it was neat to generate frontend components through GenAI prompts. In my experience, v0 handled roughly 80% of the work, which let me focus on polishing the final 20%. I did have to write all of the state logic myself, as v0 did not generate any code for it.