Movie Recommendation Based on Pearson Correlation


In this article we will learn how to build a movie recommendation system using Pearson correlation.

Starting from a movie the user has previously seen, we try to recommend movies that are similar in terms of ratings. Note two things:

  1. We are not recommending movies based on genre.
  2. The movie that the user previously watched must be present in our dataset for this algorithm to work.

This recommendation system recommends movies based on the ratings of a movie that the user previously liked. To achieve this, I use the concept of correlation. If you are unfamiliar with correlation, it is worth reviewing before continuing. The correlation coefficient is always between -1 and 1: the closer it is to 1, the more strongly two items are positively related; values near -1 indicate an inverse relationship, and values near 0 indicate no linear relationship.
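To make the range of the coefficient concrete, here is a minimal sketch using made-up ratings (the arrays `movie_a`, `movie_b` and `movie_c` are hypothetical and not taken from the dataset). It uses NumPy's `np.corrcoef` to show one pair of movies that is perfectly positively correlated and one that is perfectly negatively correlated.

```python
import numpy as np

# Hypothetical ratings given by the same five users to three made-up movies.
movie_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
movie_b = np.array([0.5, 1.0, 1.5, 2.0, 2.5])  # rises in step with movie_a
movie_c = np.array([5.0, 4.0, 3.0, 2.0, 1.0])  # falls as movie_a rises

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the coefficient.
r_ab = np.corrcoef(movie_a, movie_b)[0, 1]
r_ac = np.corrcoef(movie_a, movie_c)[0, 1]

print(round(r_ab, 6))  # 1.0
print(round(r_ac, 6))  # -1.0
```

Real rating data almost never reaches these extremes, so in practice we look for the largest positive values.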

In short, we take the ratings of the previously watched movie and calculate their correlation with the ratings of every other movie in the dataset.

import numpy as np
import pandas as pd

movielense = 'ml-latest-small/'

Load the Movie and Rating Datasets

#Loading movies
movies = pd.read_csv(movielense+"movies.csv")
movies.head()
   movieId  title                               genres
0  1        Toy Story (1995)                    Adventure|Animation|Children|Comedy|Fantasy
1  2        Jumanji (1995)                      Adventure|Children|Fantasy
2  3        Grumpier Old Men (1995)             Comedy|Romance
3  4        Waiting to Exhale (1995)            Comedy|Drama|Romance
4  5        Father of the Bride Part II (1995)  Comedy
#Loading Rating
ratings = pd.read_csv(movielense+"ratings.csv")
ratings.drop(["timestamp"], axis=1, inplace=True)
ratings.head()
   userId  movieId  rating
0  1       31       2.5
1  1       1029     3.0
2  1       1061     3.0
3  1       1129     2.0
4  1       1172     4.0

Join Movies and Ratings

#Replace each movieId in the ratings dataset with the movie title (a join on movieId)
id_to_title = movies.set_index("movieId")["title"]
ratings.movieId = ratings.movieId.map(id_to_title)
ratings.head()
   userId  movieId                                         rating
0  1       Dangerous Minds (1995)                          2.5
1  1       Dumbo (1941)                                    3.0
2  1       Sleepers (1996)                                 3.0
3  1       Escape from New York (1981)                     2.0
4  1       Cinema Paradiso (Nuovo cinema Paradiso) (1989)  4.0
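Another way to attach titles to ratings is a regular join with `DataFrame.merge`. The sketch below is only an illustration: `movies_demo`, `ratings_demo` and `joined` are hypothetical names, and the three rows are copied from the outputs above rather than read from the CSV files.

```python
import pandas as pd

# Tiny stand-ins for movies.csv and ratings.csv (rows copied from the outputs above).
movies_demo = pd.DataFrame({
    "movieId": [31, 1029, 1061],
    "title": ["Dangerous Minds (1995)", "Dumbo (1941)", "Sleepers (1996)"],
})
ratings_demo = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [31, 1029, 1061],
    "rating": [2.5, 3.0, 3.0],
})

# A left join keeps every rating row and attaches the matching title.
joined = ratings_demo.merge(movies_demo, on="movieId", how="left")
print(joined[["userId", "title", "rating"]])
```

Unlike the approach in the article, this keeps the numeric `movieId` column and adds the title as a separate `title` column.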

Create a Pivot Table of userId and movieId

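We will compute the Pearson correlation between columns of a matrix. Our dataset is row-based (one row per rating), so we convert it to a user-by-movie matrix with the pandas pivot_table function. Movies that a user has not rated become NaN in the matrix.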

M = ratings.pivot_table(index=['userId'],columns=['movieId'],values='rating')
M.shape
#M
(671, 9064)
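To see what the pivot does, here is a small sketch on made-up data (`ratings_demo` and `M_demo` are hypothetical names, and the five ratings are invented for illustration). Each user becomes a row, each movie becomes a column, and every missing user/movie combination shows up as NaN.

```python
import pandas as pd

# Five invented ratings in the same row-based layout as the ratings dataset.
ratings_demo = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": ["Movie A", "Movie B", "Movie A", "Movie C", "Movie B"],
    "rating":  [4.0, 3.0, 5.0, 2.0, 1.0],
})

# Same call as in the article: users as rows, movies as columns.
M_demo = ratings_demo.pivot_table(index=["userId"], columns=["movieId"], values="rating")

print(M_demo)
# Count the cells with no rating.
print(int(M_demo.isna().sum().sum()))
```

With 3 users and 3 movies but only 5 ratings, 4 of the 9 cells are empty. The real matrix of shape (671, 9064) is far sparser, because most users rate only a small fraction of the movies.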

Pearson Correlation

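The Pearson correlation coefficient measures the strength of the linear relationship between two sets of ratings x and y. It is the sum of the products of their mean-centered values, divided by the square root of the product of their summed squared deviations: r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)).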

def pearsone(s1,s2):
    #Take two pd.Series objects and return their correlation; NaN (missing ratings) are skipped by mean() and np.sum()
    s1_c = s1 - s1.mean()  #center each series on its own mean
    s2_c = s2 - s2.mean()
    return np.sum(s1_c * s2_c)/np.sqrt(np.sum(s1_c**2)*np.sum(s2_c**2))

pearsone(M["Clerks (1994)"], M["Mallrats (1995)"])
0.26948637162292466

The result of about 0.27 lies between -1 and 1, as every correlation coefficient does. Being positive but well below 1, it indicates a weak positive relationship between the ratings of the two movies.
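One detail worth knowing: with missing ratings, the function above centers each movie on the mean of all its own ratings and sums the squared deviations over all of its raters, while the numerator only covers users who rated both movies. The built-in `Series.corr` instead restricts everything to users who rated both. The sketch below compares the two on invented data (`a` and `b` are hypothetical series, and `pearsone` is copied from the article so the block runs on its own).

```python
import numpy as np
import pandas as pd

def pearsone(s1, s2):
    # Copy of the article's function, repeated so this block is self-contained.
    s1_c = s1 - s1.mean()
    s2_c = s2 - s2.mean()
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c**2) * np.sum(s2_c**2))

# Invented ratings from five users; each movie has one missing rating.
a = pd.Series([5.0, 4.0, 1.0, np.nan, 2.0])
b = pd.Series([4.0, 5.0, np.nan, 3.0, 1.0])

custom = pearsone(a, b)   # the article's variant
pairwise = a.corr(b)      # pandas: Pearson over users who rated both movies

print(custom, pairwise)
```

The two values differ whenever ratings are missing, so numbers produced by `pearsone` in this article are not directly comparable to those from `Series.corr`. When neither series has missing values, both approaches compute the same coefficient.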

Recommendation Based on Pearson Correlation

We calculate the correlation of the previously watched movie with every other movie in the dataset, sort the results in descending order, and return the top n recommendations, where n is entered by the user.

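def get_recs(movie_name,M,num):
    #movie_name : title of the movie the user liked
    #M : user-by-movie rating matrix
    #num : number of recommendations
    reviews = []

    for title in M.columns:
        #skip the movie itself
        if title == movie_name:
            continue
        cor = pearsone(M[movie_name], M[title])
        #skip undefined correlations (e.g. a movie whose ratings have zero variance)
        if np.isnan(cor):
            continue
        reviews.append((title,cor))

    #highest correlation first
    reviews.sort(key = lambda tup: tup[1], reverse = True)
    return reviews[:num]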
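The loop above can also be written more compactly with the pandas method `DataFrame.corrwith`. This is only a sketch under some assumptions: `get_recs_corrwith` and `M_demo` are hypothetical names, the ratings are invented, and `corrwith` computes the Pearson correlation over users who rated both movies, so its values will generally not match those of `pearsone` when ratings are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical user-by-movie matrix with one missing rating (NaN).
M_demo = pd.DataFrame({
    "Movie A": [5.0, 4.0, 1.0, 2.0, 3.0],
    "Movie B": [4.0, 5.0, 2.0, 1.0, 3.0],
    "Movie C": [1.0, 2.0, 5.0, 4.0, np.nan],
}, index=pd.Index([1, 2, 3, 4, 5], name="userId"))

def get_recs_corrwith(movie_name, M, num):
    # Correlate every column of M with the chosen movie in one call.
    cors = M.corrwith(M[movie_name])
    # Drop the movie itself and any undefined correlations, then rank.
    cors = cors.drop(movie_name).dropna()
    return cors.sort_values(ascending=False).head(num)

recs = get_recs_corrwith("Movie A", M_demo, 2)
print(recs)
```

Here "Movie B" is rated similarly to "Movie A" by the same users and ranks first, while "Movie C" is rated in the opposite way and ends up with a negative correlation.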

Recommended Movies to Watch Based on "Toy Story (1995)"

n = int(input("Enter the required no. of recommendations:")) #e.g. 10
mov = input("Enter the name of a movie that you liked:") #e.g. Toy Story (1995)

lst = get_recs(mov,M,n)
print("Recommendations for you:")
lst
Enter the required no. of recommendations:10
Enter the name of a movie that you liked:Toy Story (1995)


/home/hendra_herviawan/bin/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:5: RuntimeWarning: invalid value encountered in double_scalars
  """

Recommendations for you:

[('Toy Story 2 (1999)', 0.38751909607714241),
 ("Bug's Life, A (1998)", 0.29185057100728379),
 ('Lion King, The (1994)', 0.26797063953540978),
 ('Aladdin (1992)', 0.24245219707788779),
 ('Monsters, Inc. (2001)', 0.23609185999396132),
 ('Shrek (2001)', 0.23379248872321592),
 ('Dark Knight, The (2008)', 0.22285149745651747),
 ('Spider-Man (2002)', 0.22253962836326829),
 ('Basic Instinct (1992)', 0.2213613466749314),
 ('Incredibles, The (2004)', 0.22061205440681195)]

The recommendations are good but not perfect: for "Toy Story (1995)", Pearson correlation mostly recommends Disney and animated movies. More research is needed to filter out unrelated titles such as "Basic Instinct (1992)" that appear in the recommendations.
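One possible direction, offered only as an untested idea for this dataset, is to ignore movie pairs that have very few raters in common, since a correlation computed from a handful of users is unreliable. The sketch below uses the `min_periods` argument of the pandas `Series.corr` method; `get_recs_min_overlap`, `min_common` and `M_demo` are hypothetical names, the ratings are invented, and `Series.corr` computes the correlation only over users who rated both movies, so its values differ from those of `pearsone`.

```python
import numpy as np
import pandas as pd

def get_recs_min_overlap(movie_name, M, num, min_common=2):
    # Hypothetical variant of get_recs: ignore pairs with too few common raters.
    target = M[movie_name]
    recs = []
    for title in M.columns:
        if title == movie_name:
            continue
        # Series.corr returns NaN when fewer than min_periods users rated both movies.
        cor = target.corr(M[title], min_periods=min_common)
        if not np.isnan(cor):
            recs.append((title, cor))
    recs.sort(key=lambda tup: tup[1], reverse=True)
    return recs[:num]

# Invented matrix: "Movie C" shares only two raters with "Movie A".
M_demo = pd.DataFrame({
    "Movie A": [5.0, 4.0, 1.0, 2.0, 3.0],
    "Movie B": [4.0, 5.0, 2.0, 1.0, 3.0],
    "Movie C": [5.0, 4.0, np.nan, np.nan, np.nan],
}, index=pd.Index([1, 2, 3, 4, 5], name="userId"))

loose = get_recs_min_overlap("Movie A", M_demo, num=5, min_common=2)
strict = get_recs_min_overlap("Movie A", M_demo, num=5, min_common=3)
print(loose)
print(strict)
```

With `min_common=2`, "Movie C" ranks first on the strength of just two matching ratings. Requiring three common raters removes it and leaves only "Movie B". Whether such a threshold would remove "Basic Instinct (1992)" from the list above, and which value works best, would have to be checked on the real data.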