PROBLEM STATEMENT

Netflix provided a large set of anonymized rating data and set a prediction accuracy bar that is 10% better than what Cinematch achieves on the same training data set. (Accuracy measures how closely predicted movie ratings match subsequent actual ratings.)

ALGORITHMS

XGBoost, Matrix Factorization, Support Vector, and KNN Baseline models.

OBJECTIVES

DATA FORMAT

TYPE OF MACHINE LEARNING PROBLEM

For a given movie and user, we need to predict the rating the user would give to the movie. This is a recommendation problem; it can also be framed as a regression problem.

PERFORMANCE METRIC
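
The results reported later in this document use RMSE and MAPE. The exact code used to compute them is not shown here, so the following is only a minimal sketch under the usual definitions: RMSE as the square root of the mean squared error, and MAPE as the mean absolute error relative to the actual rating, expressed in percent. The function names and the sample ratings are illustrative assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual rating is non-zero, which holds for 1-5 star ratings.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical ratings, for illustration only.
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]
print("RMSE :", rmse(actual, predicted))
print("MAPE :", mape(actual, predicted))
```

Because MAPE divides by the actual rating, an error of one star on a 1-star rating weighs five times as much as the same error on a 5-star rating, so the two metrics can rank models differently.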

MACHINE LEARNING OBJECTIVE and CONSTRAINTS

EXPLORATORY DATA ANALYSIS

Preprocessing
Training the model
Machine Learning Steps
Featurizing train and test data

This problem can be solved using the following machine learning techniques.

RECOMMENDATION MODEL

Transforming data for Surprise models:

Transforming train data:
Transforming test data:

Applying Machine Learning model.

XGBoost with initial 13 features

TEST DATA

RMSE : 1.0890322448240302

MAPE : 35.13968692492444

Surprise BaselineModel

TEST DATA

RMSE : 1.0865215481719563

MAPE : 34.9957270093008
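
For context, a baseline predictor of this kind estimates a rating as the global mean plus a user bias and a movie bias. The sketch below is not the project's code and not the Surprise implementation. It is a small pure-Python illustration that fits the biases with alternating regularized averages. The function names, the regularization values, and the toy ratings are assumptions made for the example.

```python
from collections import defaultdict


def fit_baseline(ratings, n_epochs=10, reg_u=15.0, reg_i=10.0):
    """Fit r_hat = mu + b_u + b_i on a list of (user, movie, rating) tuples."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    b_u, b_i = defaultdict(float), defaultdict(float)
    for _ in range(n_epochs):
        # Update each movie bias with the user biases held fixed ...
        for i, rows in by_item.items():
            b_i[i] = sum(r - mu - b_u[u] for u, r in rows) / (reg_i + len(rows))
        # ... then each user bias with the movie biases held fixed.
        for u, rows in by_user.items():
            b_u[u] = sum(r - mu - b_i[i] for i, r in rows) / (reg_u + len(rows))
    return mu, dict(b_u), dict(b_i)


def predict_baseline(model, user, movie):
    """Predict a rating; unseen users or movies fall back to a zero bias."""
    mu, b_u, b_i = model
    estimate = mu + b_u.get(user, 0.0) + b_i.get(movie, 0.0)
    return min(5.0, max(1.0, estimate))  # clip to the 1-5 star scale


# Toy data, for illustration only.
toy = [("u1", "m1", 5), ("u1", "m2", 4), ("u2", "m1", 3),
       ("u2", "m2", 1), ("u3", "m1", 4)]
baseline_model = fit_baseline(toy)
print(predict_baseline(baseline_model, "u1", "m2"))
```

The regularization terms shrink the biases of users and movies with few ratings toward zero, which matters on sparse data such as this one.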

XGBoost with initial 13 features + Surprise Baseline predictor

TEST DATA

RMSE : 1.0891181427027241

MAPE : 35.13135164276489

Surprise KNNBaseline with user-user similarities

TEST DATA

RMSE : 1.0865005562678032

MAPE : 35.02325234274119

Surprise KNNBaseline with movie-movie similarities

TEST DATA

RMSE : 1.0868914468761874

MAPE : 35.02725521759712

XGBoost with initial 13 features + Surprise Baseline predictor + KNNBaseline predictor

TEST DATA

RMSE : 1.088749005744821

MAPE : 35.188974153659295

REGRESSION MODEL

Matrix Factorization Techniques

SVD Matrix Factorization on user-movie interactions

TEST DATA

RMSE : 1.0860031195730506

MAPE : 34.94819349312387
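
As an illustration of what an SVD-style matrix factorization model learns, the sketch below predicts a rating as the global mean, plus user and movie biases, plus the dot product of a user factor vector and a movie factor vector, and fits all of them with stochastic gradient descent. It is a pure-Python toy, not the project's code or the Surprise implementation. The function names, hyperparameters, and sample ratings are assumptions chosen for the example.

```python
import math
import random


def fit_svd(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat = mu + b_u + b_i + q_i . p_u by SGD on (user, movie, rating)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})   # sorted for reproducible init
    items = sorted({i for _, i, _ in ratings})
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
            err = r - (mu + b_u[u] + b_i[i] + dot)
            # Gradient steps with L2 regularization on every parameter.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                pf, qf = p[u][f], q[i][f]
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return mu, b_u, b_i, p, q


def predict_svd(model, user, movie):
    """Predict a rating; unseen users or movies contribute no bias or factors."""
    mu, b_u, b_i, p, q = model
    estimate = mu + b_u.get(user, 0.0) + b_i.get(movie, 0.0)
    if user in p and movie in q:
        estimate += sum(pf * qf for pf, qf in zip(p[user], q[movie]))
    return min(5.0, max(1.0, estimate))  # clip to the 1-5 star scale


# Toy data, for illustration only.
toy = [("u1", "m1", 5), ("u1", "m2", 4), ("u2", "m1", 3),
       ("u2", "m2", 1), ("u3", "m1", 4)]
svd_model = fit_svd(toy)
train_rmse = math.sqrt(
    sum((r - predict_svd(svd_model, u, i)) ** 2 for u, i, r in toy) / len(toy)
)
print("train RMSE :", train_rmse)
```

On real data the number of factors, the learning rate, and the regularization strength would be tuned on held-out ratings rather than fixed as here.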

SVD Matrix Factorization with implicit feedback from the user (user-rated movies)

TEST DATA

RMSE : 1.0862780572420558

MAPE : 34.909882014758175

XGBoost with 13 features + Surprise Baseline + Surprise KNNBaseline + MF Techniques

TEST DATA

RMSE : 1.0891599523508655

MAPE : 35.12646240961147

XGBoost with Surprise Baseline + Surprise KNNBaseline + MF Techniques

TEST DATA

RMSE : 1.095123189648495

MAPE : 35.54329712868095

Model Performance