- Wed 01 February 2017
- Marketing Analytic
- M Hendra Herviawan
- #Recommendation
In this article we will learn how to build a movie recommendation system using Pearson correlation.
Using the ratings of a movie the user has already seen, we try to recommend movies that were rated in a similar way. Note two things:
- We are not recommending movies based on genre.
- The movie the user previously watched must be present in our dataset for this algorithm to work.
This recommendation system recommends movies based on the ratings of a movie that the user previously liked. To achieve this, I use the concept of correlation. If you are not familiar with correlation, it is worth reading an introduction to it first. The correlation coefficient is always between -1 and 1. The closer it is to 1, the more related two items are.
So basically what we are doing here is taking the ratings of the previously watched movie and calculating their correlation with the ratings of every other movie in the dataset.
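As a small illustration of the idea, the sketch below computes the correlation between rating vectors with NumPy's `np.corrcoef`. The three movies and their ratings are invented for the example and are not taken from the MovieLens data.

```python
import numpy as np

# Hypothetical ratings given by the same five users to three movies
movie_a = np.array([5.0, 4.0, 1.0, 2.0, 4.5])
movie_b = np.array([4.5, 4.0, 1.5, 2.0, 5.0])  # rated much like movie_a
movie_c = np.array([1.0, 2.0, 5.0, 4.0, 1.5])  # rated the opposite way

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient
corr_ab = np.corrcoef(movie_a, movie_b)[0, 1]
corr_ac = np.corrcoef(movie_a, movie_c)[0, 1]

print(round(corr_ab, 2), round(corr_ac, 2))
```

Movies rated alike end up with a coefficient close to 1, and movies rated in opposite ways end up close to -1.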
import numpy as np
import pandas as pd
movielense = 'ml-latest-small/'
Load Movie and Rating Dataset
#Loading movies
movies = pd.read_csv(movielense+"movies.csv")
movies.head()
| | movieId | title | genres |
---|---|---|---|
0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
4 | 5 | Father of the Bride Part II (1995) | Comedy |
#Loading Rating
ratings = pd.read_csv(movielense+"ratings.csv")
ratings.drop(["timestamp"], axis=1, inplace=True)
ratings.head()
| | userId | movieId | rating |
---|---|---|---|
0 | 1 | 31 | 2.5 |
1 | 1 | 1029 | 3.0 |
2 | 1 | 1061 | 3.0 |
3 | 1 | 1129 | 2.0 |
4 | 1 | 1172 | 4.0 |
Join Movies and Ratings
#Replace movieId in the ratings dataset with the movie title (join)
def replace_name(x):
    return movies[movies["movieId"]==x].title.values[0]
ratings.movieId = ratings.movieId.map(replace_name)
ratings.head()
| | userId | movieId | rating |
---|---|---|---|
0 | 1 | Dangerous Minds (1995) | 2.5 |
1 | 1 | Dumbo (1941) | 3.0 |
2 | 1 | Sleepers (1996) | 3.0 |
3 | 1 | Escape from New York (1981) | 2.0 |
4 | 1 | Cinema Paradiso (Nuovo cinema Paradiso) (1989) | 4.0 |
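Mapping `replace_name` over every row rescans `movies` once per rating, which can be slow on larger data. A possible alternative, sketched here on a few rows copied from the tables above, is a single `merge` on `movieId`. The names `movies_small`, `ratings_small` and `joined` are hypothetical and only used for this example.

```python
import pandas as pd

# A few rows copied from the outputs above (not the full dataset)
movies_small = pd.DataFrame({
    "movieId": [31, 1029, 1061],
    "title": ["Dangerous Minds (1995)", "Dumbo (1941)", "Sleepers (1996)"],
})
ratings_small = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [31, 1029, 1061],
    "rating": [2.5, 3.0, 3.0],
})

# Left join keeps every rating and attaches the matching title
joined = ratings_small.merge(movies_small, on="movieId", how="left")
print(joined[["userId", "title", "rating"]])
```

With a join like this the ratings frame gains a separate `title` column instead of overwriting `movieId`, so a later pivot would use `columns='title'`.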
Create a pivot table of userId and movieId
To correlate movies with each other we need their ratings as columns of a matrix. Our dataset is row based (one row per rating), so we convert it to a user-by-movie matrix with the pandas pivot_table function.
M = ratings.pivot_table(index=['userId'],columns=['movieId'],values='rating')
M.shape
#M
(671, 9064)
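To see what the pivot produces, here is a minimal sketch on a made-up ratings frame (the users and values are invented). Each user becomes a row, each movie a column, and any movie a user has not rated shows up as `NaN`.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) pair
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": ["Clerks (1994)", "Mallrats (1995)",
                "Clerks (1994)", "Toy Story (1995)",
                "Mallrats (1995)"],
    "rating":  [4.0, 3.5, 5.0, 4.0, 2.0],
})

# Same call as above: users as rows, movies as columns
toy_M = toy.pivot_table(index=["userId"], columns=["movieId"], values="rating")
print(toy_M.shape)         # 3 users x 3 movies
print(toy_M.isna().sum())  # missing ratings per movie
```

These missing values matter later, because any correlation function applied to the columns has to decide how to handle users who rated only one of the two movies.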
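Pearson Correlation
Pearson correlation measures the strength of the linear relationship between two variables. It is the covariance of the two variables divided by the product of their standard deviations. The function below centers each series by subtracting its mean, then divides the sum of the products of the centered values by the square root of the product of their sums of squares.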
def pearsone(s1,s2):
    #take 2 pd.Series objects and return the correlation between them
    s1_c = s1 - s1.mean()
    s2_c = s2 - s2.mean()
    return np.sum(s1_c * s2_c)/np.sqrt(np.sum(s1_c**2)*np.sum(s2_c**2))
pearsone(M["Clerks (1994)"], M["Mallrats (1995)"])
0.26948637162292466
The correlation coefficient is always between -1 and 1. The closer it is to 1, the more related two items are; values near 0 indicate little linear relation.
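One detail worth checking is how `pearsone` treats missing ratings. As far as I can tell, the means and the squared sums in the denominator use every rating each movie has, while the numerator only covers users who rated both movies. pandas' built-in `Series.corr` instead restricts the whole calculation to users who rated both, so the two can give different numbers. A small sketch with invented ratings:

```python
import numpy as np
import pandas as pd

def pearsone(s1, s2):
    # Same body as the pearsone function above
    s1_c = s1 - s1.mean()
    s2_c = s2 - s2.mean()
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c**2) * np.sum(s2_c**2))

# Hypothetical ratings from five users; NaN means "not rated"
s1 = pd.Series([5.0, 3.0, np.nan, 1.0, 4.0])
s2 = pd.Series([4.0, np.nan, 2.0, 1.0, 5.0])

custom = pearsone(s1, s2)   # denominator uses all ratings of each movie
pairwise = s1.corr(s2)      # pandas: only users who rated both movies

print(custom, pairwise)     # the two values differ on this data
```

Neither choice is wrong as such, but it is useful to know which one is in play when comparing results with other tools.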
Recommendation Based on Pearson Correlation
We calculate the correlation of the previously watched movie with every other movie in the dataset, sort the results in descending order, and return the first n recommendations, where n is entered by the user.
def get_recs(movie_name,M,num):
    #num : number of recommendations
    reviews = []
    for title in M.columns:
        #skip the movie itself
        if title == movie_name:
            continue
        cor = pearsone(M[movie_name], M[title])
        #skip movies whose correlation is undefined
        if np.isnan(cor):
            continue
        else:
            reviews.append((title,cor))
    #highest correlation first
    reviews.sort(key = lambda tup: tup[1], reverse = True)
    return reviews[:num]
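The explicit loop can also be written with pandas' `DataFrame.corrwith`, which correlates every column with a given Series in one call. The function `get_recs_corrwith` below is a hypothetical variant, not the code used for the results in this article, and because pandas only uses users who rated both movies, its numbers need not match those of `pearsone`. The small matrix is invented for the example.

```python
import numpy as np
import pandas as pd

def get_recs_corrwith(movie_name, M, num):
    # Hypothetical variant of get_recs using pandas' pairwise Pearson correlation
    cors = M.corrwith(M[movie_name])   # one coefficient per movie column
    cors = cors.drop(movie_name)       # do not recommend the movie itself
    cors = cors.dropna()               # skip movies with undefined correlation
    return cors.sort_values(ascending=False).head(num)

# Invented user x movie matrix with one missing rating
toy_M = pd.DataFrame({
    "A": [5.0, 4.0, 1.0, 2.0],
    "B": [4.5, 4.0, 1.5, np.nan],
    "C": [1.0, 2.0, 5.0, 4.0],
})

recs = get_recs_corrwith("A", toy_M, 2)
print(recs)
```

The result is a Series indexed by movie title rather than a list of tuples, which is convenient for further filtering.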
Recommended movies to watch based on "Toy Story (1995)"
n = int(input("Enter the required no. of recommendations:")) #10
mov = input("Enter the name of a movie that you liked:") #Toy Story (1995)
lst = get_recs(mov,M,n)
print ("Recommendations for you:")
lst
Enter the required no. of recommendations:10
Enter the name of a movie that you liked:Toy Story (1995)
/home/hendra_herviawan/bin/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:5: RuntimeWarning: invalid value encountered in double_scalars
"""
Recommendations for you:
[('Toy Story 2 (1999)', 0.38751909607714241),
("Bug's Life, A (1998)", 0.29185057100728379),
('Lion King, The (1994)', 0.26797063953540978),
('Aladdin (1992)', 0.24245219707788779),
('Monsters, Inc. (2001)', 0.23609185999396132),
('Shrek (2001)', 0.23379248872321592),
('Dark Knight, The (2008)', 0.22285149745651747),
('Spider-Man (2002)', 0.22253962836326829),
('Basic Instinct (1992)', 0.2213613466749314),
('Incredibles, The (2004)', 0.22061205440681195)]
The recommendations are good but not perfect: for "Toy Story (1995)", Pearson correlation mostly recommends Disney and other animated movies. More research is still needed to filter out results such as "Basic Instinct (1992)" that appear in our recommendations.
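One direction that may be worth trying (not evaluated here) is to drop candidates that share only a few raters with the chosen movie, since a correlation based on a handful of users is unreliable. The helper `filter_by_overlap`, the threshold `min_common` and the toy data below are all assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

def filter_by_overlap(recs, M, movie_name, min_common):
    # recs: list of (title, correlation) tuples as returned by get_recs
    rated = M.notna()
    # For each movie, count users who also rated movie_name
    common = rated.loc[rated[movie_name]].sum()
    return [(t, c) for t, c in recs if common[t] >= min_common]

# Invented user x movie matrix: "B" shares only two raters with "A"
toy_M = pd.DataFrame({
    "A": [5.0, 4.0, 1.0, 2.0],
    "B": [4.5, np.nan, np.nan, 2.0],
    "C": [4.0, 5.0, 2.0, 1.0],
})

# Hypothetical recommendation list with made-up correlation values
toy_recs = [("B", 0.9), ("C", 0.5)]
kept = filter_by_overlap(toy_recs, toy_M, "A", min_common=3)
print(kept)
```

Whether this would remove "Basic Instinct (1992)" from the list above depends on how many users rated both movies, which would have to be checked on the real data.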