Best way to store similar music

3

I have millions of songs, each with a unique Song ID. For each Song ID I have attributes such as song name, artist name, album name, and year.

I have implemented a mechanism that computes a similarity ratio between two songs. It gives me a value between 0 and 100.

I need to show similar music to users, and this cannot be computed at run time, so I need to precompute the similarity values between every pair of songs.

Hence, if I create a table with three attributes,

song1, song2, similarity

I will have n*n records, where n is the number of songs.
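
To put numbers on that, here is a rough back-of-the-envelope sketch. The function name pair_rows is made up for illustration; the second figure assumes each pair is stored only once and a song is never paired with itself.

```python
def pair_rows(n: int) -> dict:
    """Row counts for a song-to-song similarity table holding n songs."""
    return {
        # Every (song1, song2) combination, i.e. the n*n figure above.
        "all_ordered_pairs": n * n,
        # Each unordered pair once, no self-pairs (an assumption).
        "unordered_pairs": n * (n - 1) // 2,
    }


for n in (1_000, 1_000_000):
    print(n, pair_rows(n))
```

Even with each pair stored once, a million songs still gives on the order of 5 * 10^11 rows.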

And whenever I want to fetch similar music, I need to execute this query (song_similarity is just a placeholder for the table name):

SELECT song2 FROM song_similarity WHERE song1 = x AND similarity > 80 ORDER BY similarity DESC;

Please suggest a good way to store and maintain this information.

Thanks.

asked June 25, 2011 10:38 am CDT
posted via StackOverflow

3 Answers

1
 

What you are proposing will work. However, you can roughly halve the number of rows by storing each pair only once and then modifying your query to match the song ID in either song1 or song2.

Something like this (MySQL syntax, with song_similarity as a placeholder table name):

SELECT IF(song1 = ?, song2, song1) AS similar FROM song_similarity WHERE (song1 = ? OR song2 = ?) AND similarity > 80 ORDER BY similarity DESC;
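
For anyone who wants to try the idea, here is a minimal runnable sketch using Python's built-in sqlite3 module. The table name song_similarity, the index name, and the helpers add_pair and similar_to are all hypothetical. SQLite has no IF() function, so the query uses CASE instead, and the song1 < song2 convention for storing each pair once is my own assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE song_similarity (
        song1 INTEGER NOT NULL,
        song2 INTEGER NOT NULL,
        similarity INTEGER NOT NULL,
        PRIMARY KEY (song1, song2),
        CHECK (song1 < song2)  -- each unordered pair is stored exactly once
    )
""")
# The primary key covers lookups by song1; this index covers lookups by song2.
conn.execute("CREATE INDEX idx_similarity_song2 ON song_similarity (song2)")


def add_pair(a, b, similarity):
    """Store a pair with the smaller ID always in song1."""
    low, high = sorted((a, b))
    conn.execute(
        "INSERT INTO song_similarity VALUES (?, ?, ?)", (low, high, similarity)
    )


def similar_to(song_id, threshold=80):
    """Return IDs of songs above the threshold, most similar first."""
    rows = conn.execute(
        """
        SELECT CASE WHEN song1 = ? THEN song2 ELSE song1 END AS similar
        FROM song_similarity
        WHERE (song1 = ? OR song2 = ?) AND similarity > ?
        ORDER BY similarity DESC
        """,
        (song_id, song_id, song_id, threshold),
    ).fetchall()
    return [row[0] for row in rows]


add_pair(1, 2, 95)
add_pair(3, 1, 85)  # stored as (1, 3)
add_pair(1, 4, 60)  # below the threshold of 80
add_pair(2, 3, 90)
print(similar_to(1))
```

The OR condition means the database has to check both columns, which is why the sketch adds a second index on song2.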

answered June 25, 2011 11:23 am CDT
1
 

Maintaining and accessing the similarity information this way seems to require massive computation power. For example, if you have already processed 2000 songs, you still need to run the similarity analysis 2000 times for the next new song. This is likely to cause scalability problems, and the data schema could make the database slow within a short time.

I recommend finding some patterns and tagging each song instead. For example, you can analyze the songs for patterns like "blues", "rock", or "90's" and give them tags. If you want to find songs similar to a given song, you can just query on the tags that song has, e.g. "New age", "Slow" and "techno".
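
A minimal sketch of the tag lookup, assuming the tags have already been assigned. The song IDs, the sample tags, and the function similar_by_tags are made up for illustration, and ranking by number of shared tags is just one possible choice.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: song ID -> set of tags.
song_tags = {
    101: {"new age", "slow", "techno"},
    102: {"new age", "slow"},
    103: {"blues", "90's"},
    104: {"techno"},
}

# Inverted index: tag -> set of song IDs carrying that tag.
tag_index = defaultdict(set)
for song, tags in song_tags.items():
    for tag in tags:
        tag_index[tag].add(song)


def similar_by_tags(song_id, min_shared=1):
    """Return other songs sharing at least min_shared tags, most shared first."""
    shared = Counter()
    for tag in song_tags[song_id]:
        for other in tag_index[tag]:
            if other != song_id:
                shared[other] += 1
    # Sort by shared-tag count (descending), then by ID for a stable order.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [song for song, count in ranked if count >= min_shared]


print(similar_by_tags(101))
```

Storage grows with the number of (song, tag) assignments rather than with the number of song pairs, and adding a new song only touches its own tags.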

answered June 25, 2011 11:23 am CDT
1
 

I think you'd be better off comparing each song to a "prototypical" song or classification. Devise a fingerprint mechanism that combines the song's metadata with whatever audio analysis you use to judge similarity. Place each song into one (or more) categories and score the song within each category: how closely does its fingerprint match the prototype for that category? Note that you could have hundreds or thousands of categories, i.e., they're not the typical categories that you think of when you think of music.

Once you have this done, you can maintain indexes by category. When finding similar songs, devise a weight for each category based on the similarity measures within it, say by giving greater weight to the category in which the current song is closest to the prototype. For each category, multiply the weight by the square of the difference between the candidate song's and the current song's scores against that category's prototype. Sum these weighted values over, say, the top 3 categories; lower totals mean more similar songs.
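
Here is one way the scoring described above could be sketched. This is an interpretation under several assumptions, not a definitive formula: each song is a dict mapping category name to its distance from that category's prototype (smaller is closer), the weights are simply k, k-1, ..., 1, and a candidate that is not in a category gets an assumed distance of 1.0. All names are hypothetical.

```python
def top_categories(scores, k=3):
    """Categories where the song sits closest to the prototype."""
    return sorted(scores, key=scores.get)[:k]


def dissimilarity(current, candidate, k=3, missing=1.0):
    """Weighted sum of squared score differences; lower means more similar."""
    total = 0.0
    for rank, category in enumerate(top_categories(current, k)):
        # Assumed weighting: the current song's closest category counts most.
        weight = k - rank
        # Assumed fallback distance when the candidate lacks the category.
        diff = candidate.get(category, missing) - current[category]
        total += weight * diff ** 2
    return total


current = {"a": 0.1, "b": 0.3, "c": 0.5, "d": 0.9}
close = {"a": 0.2, "b": 0.3, "c": 0.4}
far = {"a": 0.9, "d": 0.1}

print(dissimilarity(current, close), dissimilarity(current, far))
```

Only the per-song category scores are stored, so comparing two songs costs a handful of dictionary lookups instead of a row in a pair table.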

This way you only need to store a few items of metadata for each song rather than keeping relationships between pairs of songs. If the main algorithm runs too slowly, you could keep cached pair-wise data for the most common songs and fall back to the algorithmic comparison when a song isn't in your cached data set.
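
The cache-with-fallback idea might look roughly like this. The names make_lookup, cache, and compute are hypothetical, the cache is a plain dict keyed by the ordered pair, and slow_compare is only a stand-in for the real comparison.

```python
def make_lookup(cache, compute):
    """Build a similarity function that prefers precomputed pair values."""

    def similarity(a, b):
        # Normalize the key so (a, b) and (b, a) hit the same cache entry.
        key = (a, b) if a <= b else (b, a)
        if key in cache:
            return cache[key]
        # Not cached: fall back to the (slower) algorithmic comparison.
        return compute(a, b)

    return similarity


calls = []


def slow_compare(a, b):
    """Stand-in for the real comparison; records each call and returns 50."""
    calls.append((a, b))
    return 50


similarity = make_lookup({(1, 2): 95}, slow_compare)
print(similarity(2, 1), similarity(1, 3), calls)
```

Which pairs go into the cache is a separate decision; the answer suggests the most commonly requested songs.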

answered June 25, 2011 11:23 am CDT

Similar questions
Similar Items in MongoDb
February 3, 2011