Netflix Prize

Updated Jun 24, 2026

The Netflix Prize was an open machine learning competition, run by Netflix from October 2, 2006 to September 21, 2009, that offered US$1,000,000 to the first team that could improve the accuracy of Netflix's in-house movie recommendation system, Cinematch, by at least 10 percent. Accuracy was measured as root mean squared error (RMSE) on a held-out set of customer ratings, and the grand prize was won by the team BellKor's Pragmatic Chaos, which achieved a 10.06 percent improvement (test RMSE 0.8567 versus Cinematch's 0.9525). ^[1]^[2] The contest released a public dataset of 100,480,507 ratings from 480,189 anonymous users across 17,770 movies, attracted tens of thousands of teams from more than 180 countries, and is widely credited with popularizing matrix factorization and ensemble methods for collaborative filtering. ^[3]^[4]

The Netflix Prize also became a defining cautionary tale about data privacy. In 2008, Arvind Narayanan and Vitaly Shmatikov of the University of Texas at Austin showed that the supposedly anonymous ratings could be re-identified using only a small amount of public auxiliary information from the Internet Movie Database (IMDb). ^[5] A class-action privacy lawsuit (Doe v. Netflix) and a Federal Trade Commission inquiry followed in 2009, and in March 2010 Netflix cancelled a planned sequel competition, the "Netflix Prize 2", in response. ^[6]^[7]

What was the Netflix Prize?

The Netflix Prize was announced on October 2, 2006 at a launch event in New York. ^[1] The rules, posted at netflixprize.com, were simple. Netflix released a training dataset of customer ratings and asked competitors to predict ratings on a withheld qualifying set. Predictions were scored by RMSE, and any team that beat Cinematch by 10 percent on the private test portion of the qualifying set would win one million dollars. Two interim Progress Prizes of US$50,000 each were available at the end of 2007 and 2008 for the team in the lead, provided they had improved by at least 1 percent over the previous year's best. ^[4]

A crucial design choice made the leaderboard hard to game: Netflix never told entrants which of their submitted predictions counted toward the public quiz score and which toward the private test score, so repeated probing of the leaderboard could not be used to overfit the test set. ^[4] There was no cap on the total number of submissions, but each team could submit only once per day.

Netflix had several motivations. Cinematch, deployed since 2000, was a competent neighborhood-based collaborative filter that powered the company's DVD-by-mail recommendations, but internal efforts to improve it were yielding only small RMSE gains. James Bennett and Stan Lanning, who designed the contest, hoped that exposing the problem to outside researchers and offering a large enough prize would draw in academic and hobbyist talent that Netflix could never have hired outright. ^[1] The format borrowed from earlier challenges such as the KDD Cup, but the size of the dataset and the size of the prize were both unprecedented.

How big was the Netflix Prize dataset?

The published training set contained 100,480,507 ratings of 17,770 movies given by 480,189 anonymous customers between October 1998 and December 2005. ^[4] Ratings were integers from 1 to 5 stars, and each record was a quadruplet of user ID, movie ID, date of rating, and star grade. ^[4] To let teams validate models locally, Netflix also released a probe set of 1,408,395 ratings drawn from the training data, with statistical properties matching the held-out evaluation set. A team that withheld the probe ratings during fitting could use them to estimate its leaderboard score offline. ^[4]

The qualifying set, used for scoring, contained 2,817,131 user-movie pairs without ratings. ^[4] Netflix split this internally into a quiz set of 1,408,342 ratings (used for the public leaderboard) and a test set of 1,408,789 ratings (used for the private score that decided the prize). The qualifying and probe sets were both derived from the 9 most recent ratings of each user. ^[4] Only Netflix's judges knew which prediction belonged to which subset, and quiz scores were rounded to four decimal places to limit information leakage.

Subset	Ratings	Purpose
Training set	100,480,507	Model fitting
Probe set	1,408,395	Local validation, drawn from training data
Quiz set	1,408,342	Public leaderboard score
Test set	1,408,789	Private score that decided the grand prize

The data were heavily imbalanced. A handful of users had rated more than 10,000 of the 17,770 movies, while the typical user had rated far fewer (the mean works out to about 209 ratings per user); popular films had hundreds of thousands of ratings, while obscure films had a handful. The most-rated film in the dataset was Miss Congeniality (2000). ^[4]
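The counts in this section can be cross-checked with a few lines of arithmetic. The following Python sketch uses only the figures quoted above; the variable names are illustrative.

```python
# Counts quoted in this section.
n_ratings = 100_480_507
n_users = 480_189
n_movies = 17_770
quiz, test, qualifying = 1_408_342, 1_408_789, 2_817_131

# The quiz and test sets together make up the qualifying set.
split_is_consistent = (quiz + test == qualifying)

# Fraction of the user-by-movie matrix that holds an observed rating.
density = n_ratings / (n_users * n_movies)

mean_per_user = n_ratings / n_users
mean_per_movie = n_ratings / n_movies

print(f"quiz + test == qualifying: {split_is_consistent}")
print(f"matrix density: {density:.2%}")
print(f"mean ratings per user: {mean_per_user:.1f}")
print(f"mean ratings per movie: {mean_per_movie:.1f}")
```

The density comes out a little under 1.2 percent, which is consistent with the "about 99 percent missing" description of the rating matrix used later in the article.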

How was the Netflix Prize scored? RMSE and the 10 percent target

The metric was the root mean squared error (RMSE) between predicted and actual ratings; lower is better. Cinematch's RMSE on the quiz set at the start of the contest was 0.9514, and Netflix reported the test-set baseline at 0.9525, which is the figure used to compute the 10 percent target. ^[4] A trivial predictor that returned each movie's average rating from the training data scored about 1.0540 on the quiz set.
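As a minimal illustration of the metric, the sketch below computes RMSE in plain Python. The function name and the four toy ratings are invented for the example and are not taken from the contest data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy data (invented): four true ratings and a predictor that always
# guesses their mean, 3.25.
actual = [5, 3, 4, 1]
constant_guess = [3.25] * len(actual)

toy_rmse = rmse(constant_guess, actual)
print(f"RMSE of the constant predictor on the toy data: {toy_rmse:.4f}")
```

Because the errors are squared before averaging, a few large misses hurt the score more than many small ones, which is one reason the metric rewards well-calibrated predictions rather than bold ones.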

The grand prize threshold was a 10 percent improvement on the test set, which works out to an RMSE of 0.8572 (corresponding to a quiz RMSE of about 0.8563). ^[4] The winning submission scored 0.8567, an improvement of 10.06 percent. ^[2] To put that gap in context, Cinematch itself improved on the trivial movie-average predictor by roughly 10 percent (quiz RMSE 0.9514 versus 1.0540), and three years of intense competition were required to gain roughly the same amount again.
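The threshold and the winning margin follow directly from the two test-set figures. A short Python check, with constant and function names chosen for this example:

```python
CINEMATCH_TEST_RMSE = 0.9525   # test-set baseline quoted above
WINNING_TEST_RMSE = 0.8567     # winning submission's test RMSE

def improvement(baseline, score):
    """Relative RMSE reduction versus a baseline, as a fraction."""
    return (baseline - score) / baseline

# RMSE that corresponds to exactly a 10 percent improvement.
target = CINEMATCH_TEST_RMSE * (1 - 0.10)

gain = improvement(CINEMATCH_TEST_RMSE, WINNING_TEST_RMSE)

print(f"target RMSE: {target:.5f}")
print(f"winning improvement: {gain:.2%}")
```

The target evaluates to roughly 0.85725, quoted above as 0.8572, and the winning improvement rounds to 10.06 percent, so the margin over the threshold was well under a thousandth of a star.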

When did each milestone happen? Timeline

Date	Event
October 2, 2006	Netflix Prize launched in New York
October 8, 2006	First team beats Cinematch on the quiz leaderboard
November 13, 2007	First Progress Prize ($50,000) awarded to KorBell (later renamed BellKor) at RMSE 0.8712, an 8.43 percent improvement
2008	Second Progress Prize awarded to BellKor in BigChaos at RMSE 0.8616
June 26, 2009	BellKor's Pragmatic Chaos crosses the 10 percent threshold (Quiz RMSE 0.8558) and triggers the 30-day "last call" period
July 25-26, 2009	The Ensemble, a 30-plus-team coalition, edges past BellKor's Pragmatic Chaos on the quiz leaderboard (Quiz RMSE 0.8553 versus 0.8554); submissions close on July 26, with BellKor's Pragmatic Chaos's final entry submitted about 20 minutes before The Ensemble's
August 2009	Netflix announces a sequel competition, "Netflix Prize 2"
September 18, 2009	Netflix announces BellKor's Pragmatic Chaos as the winner; both teams score Test RMSE 0.8567, and the tie is broken by submission time
September 21, 2009	Grand Prize awarded at a ceremony in New York
December 17, 2009	Doe v. Netflix class-action lawsuit filed
March 12, 2010	Netflix cancels Netflix Prize 2, citing the FTC inquiry and the Doe lawsuit; settles with the FTC and plaintiffs shortly afterward

What algorithmic ideas emerged from the Netflix Prize?

The Netflix Prize is widely credited with popularising several recommender-system techniques that had existed in research but were not yet standard industrial practice. The most important was the latent-factor approach based on matrix factorization. As Koren, Bell, and Volinsky put it in their 2009 IEEE Computer survey, "As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest-neighbor techniques for producing product recommendations." ^[3]

Matrix factorization

The core idea, developed in the contest by Simon Funk (Brandyn Webb) in late 2006 and refined into industrial form by Yehuda Koren, Robert Bell, and Chris Volinsky, was to model the user-item rating matrix R as the product of two low-rank matrices: a user-factor matrix P and an item-factor matrix Q. Each user and each movie is represented by a vector of, say, 50 to 200 latent dimensions, and the predicted rating is the dot product of the relevant user and item vectors plus per-user and per-item biases and a global mean. The factors are learned by stochastic gradient descent on the observed ratings with L2 regularisation. ^[3]

This is closely related to a truncated singular value decomposition, but classical SVD requires a fully populated matrix and the Netflix matrix was about 99 percent missing. Funk's contribution was to fit the factorization only on the observed entries, sidestepping the missing-data problem. Koren, Bell, and Volinsky's 2009 IEEE Computer article "Matrix Factorization Techniques for Recommender Systems" became the canonical reference for the approach and is one of the most cited papers in the recommender-system literature. ^[3]
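The idea of fitting factors only on observed entries can be sketched in a few dozen lines. The Python code below is a toy illustration under stated assumptions, not Funk's or BellKor's actual code: the function names, hyperparameters (learning rate, regularisation strength, number of factors and epochs), and the nine toy ratings are all invented for the example.

```python
import numpy as np

def fit_mf(ratings, n_users, n_items, n_factors=2, lr=0.02, reg=0.02,
           n_epochs=300, seed=0):
    """Biased matrix factorization fitted by SGD on observed triples only.

    ratings is a list of (user, item, rating) tuples; missing cells of the
    user-item matrix are simply never visited.
    """
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in ratings]))   # global mean
    bu = np.zeros(n_users)                            # per-user biases
    bi = np.zeros(n_items)                            # per-item biases
    P = rng.normal(0.0, 0.1, (n_users, n_factors))    # user factors
    Q = rng.normal(0.0, 0.1, (n_items, n_factors))    # item factors
    for _ in range(n_epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            # Gradient steps with L2 regularisation on every parameter.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu_old - reg * Q[i])
    return mu, bu, bi, P, Q

def predict(model, u, i):
    """Predicted rating: global mean + biases + dot product of factors."""
    mu, bu, bi, P, Q = model
    return float(mu + bu[u] + bi[i] + P[u] @ Q[i])

# Toy data (invented): 4 users, 3 movies, 9 of the 12 cells observed.
toy = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 5),
       (2, 2, 2), (3, 0, 2), (3, 1, 3), (3, 2, 1)]
model = fit_mf(toy, n_users=4, n_items=3)

mean_rating = np.mean([r for _, _, r in toy])
baseline_rmse = float(np.sqrt(np.mean([(r - mean_rating) ** 2 for _, _, r in toy])))
train_rmse = float(np.sqrt(np.mean([(r - predict(model, u, i)) ** 2 for u, i, r in toy])))

print(f"global-mean RMSE on toy data: {baseline_rmse:.3f}")
print(f"fitted-model training RMSE:   {train_rmse:.3f}")
print(f"prediction for unseen cell (user 0, movie 2): {predict(model, 0, 2):.2f}")
```

The inner loop never touches the three unobserved cells, which is the point of the approach. On real data the training RMSE would say little by itself, and contestants tuned such models against held-out ratings such as the probe set.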

SVD++ and asymmetric SVD

At KDD 2008, Yehuda Koren introduced two extensions in the paper "Factorization Meets the Neighborhood". ^[8] Asymmetric SVD replaces the per-user factor vector with a sum over factors of the items the user has rated, so a new user can be scored without retraining the model. SVD++ keeps an explicit user vector and augments it with a second set of item factors that capture implicit feedback: the bare fact that a user chose to rate an item, regardless of the score. Both ideas exploit information that pure rating prediction ignores, and both contributed materially to the final solution.
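A sketch of an SVD++ style prediction rule may make the role of implicit feedback concrete. The Python function below follows the commonly cited form of the model, in which the user vector is augmented by a normalised sum of per-item implicit-feedback vectors; the function name, argument names, and numbers are illustrative assumptions, and training is omitted.

```python
import numpy as np

def svdpp_predict(mu, b_u, b_i, p_u, q_i, Y, rated_items):
    """SVD++ style prediction for one (user, item) pair.

    Y holds one implicit-feedback vector per item. rated_items lists the
    indices of the items the user has rated, regardless of the scores given.
    """
    if len(rated_items) > 0:
        # Sum the implicit vectors and damp by the square root of the count.
        implicit = Y[rated_items].sum(axis=0) / np.sqrt(len(rated_items))
    else:
        implicit = np.zeros_like(p_u)
    return float(mu + b_u + b_i + q_i @ (p_u + implicit))

# Toy parameters (invented, not learned from data).
Y = np.array([[0.2, 0.0], [0.0, 0.2], [0.2, 0.2], [1.0, 1.0]])
pred = svdpp_predict(mu=3.6, b_u=0.1, b_i=-0.2,
                     p_u=np.array([0.5, 0.0]), q_i=np.array([1.0, 1.0]),
                     Y=Y, rated_items=[0, 1, 2, 3])
print(f"toy SVD++ prediction: {pred:.2f}")
```

With no rated items the implicit term vanishes and the rule reduces to the plain biased factor model.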

Neighborhood models with learned weights

Classical collaborative filtering computed similarity between users or items using fixed formulas (Pearson correlation, cosine similarity) and predicted ratings as similarity-weighted averages of neighbors' ratings. Bell and Koren's 2007 paper "Improved Neighborhood-based Collaborative Filtering" replaced the heuristic similarities with interpolation weights learned by least squares, producing a substantially better neighborhood model. ^[9] The neighborhood family was important because it captured local effects (a user who likes one obscure film tends to like a closely related one) that latent-factor models smooth out.
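For contrast with the learned-weights approach, the classical scheme described above can be sketched as follows. This Python example implements only the heuristic baseline (cosine similarity over co-rated entries, then a similarity-weighted average); it is not Bell and Koren's method, and the function names and toy matrix are invented.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity over users who rated both items (0 marks 'unrated')."""
    both = (a > 0) & (b > 0)
    if not both.any():
        return 0.0
    return float(a[both] @ b[both] /
                 (np.linalg.norm(a[both]) * np.linalg.norm(b[both])))

def knn_predict(R, user, item, k=2):
    """Similarity-weighted average of the user's ratings on the k nearest items."""
    candidates = [j for j in range(R.shape[1]) if j != item and R[user, j] > 0]
    sims = sorted(((cosine(R[:, item], R[:, j]), j) for j in candidates),
                  reverse=True)[:k]
    total = sum(s for s, _ in sims)
    if total == 0:
        return float(R[R > 0].mean())   # fall back to the global mean
    return float(sum(s * R[user, j] for s, j in sims) / total)

# Toy rating matrix (invented): rows are users, columns are movies, 0 = unrated.
R = np.array([[5, 4, 0],
              [4, 0, 1],
              [0, 5, 2],
              [2, 3, 1]], dtype=float)

knn_pred = knn_predict(R, user=0, item=2)
print(f"item-kNN prediction for user 0, movie 2: {knn_pred:.2f}")
```

Because the result is a convex combination of the user's own ratings, it can never fall outside their range. Here the prediction lands between 4 and 5 even though every other user rated that movie 1 or 2, which hints at why fixed similarity formulas left room for improvement.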

Restricted Boltzmann Machines

In 2007 Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton showed at ICML that Restricted Boltzmann Machines, a class of two-layer stochastic neural networks, could be applied to the rating-prediction problem and slightly outperformed carefully tuned SVD models. ^[10] Their RBM-based predictors were combined with SVD models in many of the top entries and ended up as a key ingredient of the BellKor's Pragmatic Chaos blend.

Temporal dynamics

Koren's 2009 KDD paper "Collaborative Filtering with Temporal Dynamics" introduced time-aware variants of the SVD++ and neighborhood models. ^[11] The data spanned more than seven years, and during that period the rating scale itself drifted: average ratings crept upward in early 2004 (probably because Netflix changed the wording on its rating widget) and individual users' baselines drifted with mood, age, or context. The temporal models captured these effects with time-dependent biases and time-dependent factor vectors and were responsible for some of the largest single-model gains during the final year.
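A time-dependent user bias can be sketched as a static offset plus a drift term that grows with the distance from the user's average rating date. The Python functions below use a damped-power form with exponent 0.4; both the functional form and that value are assumptions to be checked against Koren's paper ^[11], and all names and numbers are illustrative.

```python
import math

def dev(t, t_mean, beta=0.4):
    """Signed, damped distance in days between a rating date and the
    user's mean rating date."""
    d = t - t_mean
    return math.copysign(abs(d) ** beta, d)

def user_bias(t, b_u, alpha_u, t_mean, beta=0.4):
    """Time-dependent user bias: static part plus a per-user drift term."""
    return b_u + alpha_u * dev(t, t_mean, beta)

# Toy user (invented): mean rating date is day 10, slight upward drift.
for day in (-22, 10, 42):
    print(day, round(user_bias(day, b_u=0.1, alpha_u=0.05, t_mean=10), 3))
```

An exponent below 1 makes the drift flatten out over long gaps, so a user's bias changes quickly near their typical rating date and more slowly far from it.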

Blending and stacking

No single model ever crossed the 10 percent line. Even the 2007 Progress Prize solution was a blend of 107 predictors, and the winning 2009 submission combined hundreds, at first by linear regression and later by gradient-boosted decision trees. ^[2]^[13] The lesson, repeated by every top team, was that an ensemble of diverse, individually mediocre models almost always beat any single carefully tuned model. This finding shaped the conventional wisdom on ensemble methods in machine-learning competitions for the next decade.
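The simplest form of blending, linear regression over the outputs of several predictors, can be sketched with NumPy. The example below is a minimal illustration on synthetic data: the two "predictors" are just the true ratings plus independent noise, and the function names are invented. The gradient-boosted blenders mentioned above are not shown.

```python
import numpy as np

def fit_blend(preds, y):
    """Least-squares weights (with intercept) for combining predictor columns."""
    X = np.column_stack([np.ones(len(y)), preds])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def blend(preds, w):
    """Apply blending weights to a matrix of predictor outputs."""
    return np.column_stack([np.ones(preds.shape[0]), preds]) @ w

def rmse(p, y):
    return float(np.sqrt(np.mean((p - y) ** 2)))

# Synthetic data (invented): two noisy predictors with independent errors.
rng = np.random.default_rng(0)
y = rng.integers(1, 6, size=2000).astype(float)
preds = np.column_stack([y + rng.normal(0.0, 1.0, y.size),
                         y + rng.normal(0.0, 1.0, y.size)])

w = fit_blend(preds, y)
blended_rmse = rmse(blend(preds, w), y)
single_rmse = [rmse(preds[:, k], y) for k in range(preds.shape[1])]

print("single-model RMSEs:", [round(s, 3) for s in single_rmse])
print("blended RMSE:      ", round(blended_rmse, 3))
```

On the data used for fitting, the blend cannot do worse than either input, because each input on its own is one of the combinations least squares considers. The gain comes from the errors being independent. In practice the weights would be fitted on held-out ratings, such as the probe set, rather than on the training data.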

Technique	Originator	Year	Role in the winning solution
Funk SVD (matrix factorization on observed entries)	Simon Funk	2006	Core latent-factor predictor
Improved neighborhood model	Bell, Koren	2007	Local effects, complement to factor models
RBM for collaborative filtering	Salakhutdinov, Mnih, Hinton	2007	Diverse predictor for the blend
Asymmetric SVD, SVD++	Koren	2008	Implicit feedback, no retraining for new users
Temporal dynamics (timeSVD++, time-aware neighborhood)	Koren	2009	Modeled rating-scale drift over the seven-year span
Gradient-boosted decision tree blending	BellKor's Pragmatic Chaos	2009	Combined hundreds of predictors into the final submission

Who won the Netflix Prize? The teams

Three research groups dominated the final years and eventually merged into the winning entry. The seven-person winning team consisted of Robert Bell, Martin Chabbert, Michael Jahrer, Yehuda Koren, Martin Piotte, Andreas Töscher, and Chris Volinsky. ^[2]^[12]

Team	Members	Affiliation
BellKor (originally KorBell)	Yehuda Koren, Robert Bell, Chris Volinsky	AT&T Labs (later Yahoo Research for Koren)
BigChaos	Andreas Töscher, Michael Jahrer	Commendo Research and Consulting, Austria
Pragmatic Theory	Martin Piotte, Martin Chabbert	Independent engineers, Quebec
BellKor's Pragmatic Chaos	The seven names above	Joint team that won the Grand Prize
The Ensemble	A coalition of more than 30 individual teams	Open consortium that tied at the deadline

BellKor won the 2007 Progress Prize as a three-person AT&T team. For the 2008 Progress Prize they joined forces with BigChaos as "BellKor in BigChaos" and reached RMSE 0.8616. By spring 2009 the gains from individual models had largely been exhausted, so in the final months several leading teams combined. BellKor in BigChaos absorbed Pragmatic Theory to form BellKor's Pragmatic Chaos. A separate consortium, The Ensemble, formed by aggregating dozens of mid-leaderboard teams whose predictions were diverse enough to blend usefully.

The finish was extraordinarily close. On June 26, 2009 BellKor's Pragmatic Chaos posted a quiz RMSE of 0.8558 (about 10.05 percent improvement), which under the rules opened a 30-day "last call" window. ^[4] By the close of submissions on July 26, The Ensemble had edged ahead on the public quiz leaderboard, with a quiz RMSE of 0.8553 against 0.8554 for BellKor's Pragmatic Chaos. ^[4] The prize, however, was decided on the private test set, which Netflix scored in the weeks that followed. There the two teams tied at an RMSE of 0.8567 to four decimal places, so the tie-breaking rule, the submission timestamp, applied. ^[2] BellKor's Pragmatic Chaos's final entry was time-stamped July 26, 2009 at 18:18:28 UTC, about 20 minutes before The Ensemble's, and BellKor's Pragmatic Chaos was declared the winner. ^[4]^[12]

What was the Netflix Prize privacy controversy?

Netflix released the training data with user IDs replaced by random integers and with no demographic information attached, and described the data as anonymous. In 2008 Arvind Narayanan and Vitaly Shmatikov of the University of Texas at Austin published "Robust De-anonymization of Large Sparse Datasets" at the IEEE Symposium on Security and Privacy. ^[5] They showed that the high dimensionality and sparsity of the rating vectors made each user's rating history nearly unique, so a small amount of side information was enough to identify them. In the paper's own words, "We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information." ^[5]

The paper became a landmark in privacy research because it generalised: the same attack could in principle work on any high-dimensional sparse behavioural dataset, not just movie ratings. It is frequently cited as evidence that simple removal of names and addresses is not sufficient anonymisation for rich behavioural data, and influenced subsequent work on differential privacy. ^[5]
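Why sparsity makes records nearly unique can be illustrated with a deliberately simplified matching rule on made-up data. The Python sketch below is not the algorithm from the paper: it merely counts how many known (movie, rating) pairs agree with each pseudonymous record and accepts the best match only if it clearly beats the runner-up. All names, thresholds, and records are invented.

```python
def match_score(aux, record, tol=1):
    """Count auxiliary (movie, rating) pairs matched by the record within tol stars."""
    return sum(1 for movie, rating in aux.items()
               if movie in record and abs(record[movie] - rating) <= tol)

def best_match(aux, dataset, min_gap=2):
    """Return the id of the best-scoring record, or None if the lead over
    the runner-up is smaller than min_gap (an ambiguous match)."""
    scored = sorted(((match_score(aux, rec), uid) for uid, rec in dataset.items()),
                    reverse=True)
    if len(scored) > 1 and scored[0][0] - scored[1][0] < min_gap:
        return None
    return scored[0][1]

# Made-up pseudonymous records: id -> {movie: stars}.
dataset = {
    "u101": {"A": 5, "B": 3, "C": 4, "D": 1},
    "u102": {"A": 4, "E": 2, "F": 5},
    "u103": {"B": 1, "C": 2, "G": 5},
}

# Side information about one person, slightly noisy (D is off by one star).
aux = {"A": 5, "C": 4, "D": 2}

print(best_match(aux, dataset))        # three approximate matches single out one record
print(best_match({"A": 5}, dataset))   # one popular movie is not enough
```

Even in this toy setting, three approximately known ratings isolate a single record while one does not. In a real dataset with thousands of rarely rated titles, each extra known rating narrows the candidates much faster.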

In August 2009 Netflix announced a sequel competition that would have released additional, richer data. Four anonymous Netflix subscribers filed Doe v. Netflix in December 2009 alleging violations of the Video Privacy Protection Act, and the Federal Trade Commission opened an inquiry into how the new release would affect customer privacy. ^[6]^[7] On March 12, 2010 Netflix announced that it was cancelling Netflix Prize 2 and reached a settlement with the plaintiffs and the FTC shortly afterward. ^[6] The original Netflix Prize dataset remained available for several years afterward and was widely used in academic research, but Netflix never released a successor.

Did Netflix ever use the winning algorithm?

Netflix never put the full BellKor's Pragmatic Chaos solution into production. In a 2012 post on the Netflix Tech Blog, Xavier Amatriain and Justin Basilico explained that two algorithms developed during the contest, both from the 2007 Progress Prize solution, were folded into Netflix's recommendation pipeline and stayed there for years, but the final Grand Prize blend of hundreds of predictors was not. They wrote that "the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment." ^[13] The two algorithms they did keep were a matrix factorization (SVD) model and a Restricted Boltzmann Machine, which, in their words, "we put into production, where they are still used as part of our recommendation engine." ^[13]

More importantly, Netflix's product changed underneath the prize. By 2009 streaming had begun to overtake DVD-by-mail, and the recommendation problem moved from "predict what star rating this user would give this movie if they watched it" to "choose what to put on the home page so the user starts watching now". ^[13] The latter is a top-N ranking problem, often without any explicit ratings at all, and has different evaluation metrics, different inputs (clicks, dwell time, completion rates), and different optimisation targets than RMSE on five-star ratings. The Netflix Prize had pushed the rating-prediction problem about as far as it could be pushed at the time, which is partly why Netflix moved on.

Why does the Netflix Prize still matter? Legacy

For recommender-system research, the Netflix Prize moved matrix factorization from a niche technique to the default starting point. The 2009 IEEE Computer paper by Koren, Bell, and Volinsky has been cited tens of thousands of times and is taught in nearly every graduate-level course on recommender systems. ^[3] SVD++, time-aware models, and the recipe of "learn many models, then blend" all became standard practice in industry.

For competitive data science, the prize set the template that Kaggle and similar platforms later inherited: a public dataset, a held-out leaderboard, a private test set to discourage overfitting, and a clear single-number evaluation metric. The dual-leaderboard idea, with quiz scores public and test scores hidden, became a staple of subsequent competitions. The contest also demonstrated that small, distributed teams of strangers could outperform large internal research groups on focused problems, an observation that informed the rise of crowdsourced ML throughout the 2010s.

For data privacy, the Narayanan and Shmatikov result and the subsequent legal action helped move the conversation away from informal anonymisation and toward formal privacy guarantees, particularly differential privacy. It also made companies considerably more cautious about releasing real user data for academic challenges. The contrast is visible in later contests: most large-scale industry-sponsored competitions since 2010 use synthetic data, heavily aggregated data, or data that has been pre-screened by privacy reviewers.

For Netflix itself, the prize is remembered as a marketing and research success more than a product success. The company spent about a million dollars in cash and a great deal of engineering time and got hundreds of papers, dozens of new techniques, and an enduring association with cutting-edge machine learning. The fact that the winning algorithm was never fully deployed has become a standard teaching example of the gap between offline benchmark improvements and real product impact. ^[13]

References

1. Bennett, J. and Lanning, S. (2007). The Netflix Prize. Proceedings of KDD Cup and Workshop 2007. https://www.cs.uic.edu/~liub/KDD-cup-2007/proceedings/The-Netflix-Prize-Bennett.pdf
2. Koren, Y. (2009). The BellKor Solution to the Netflix Grand Prize. Technical report. https://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
3. Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. IEEE Computer, 42(8), 30-37. https://ieeexplore.ieee.org/document/5197422
4. Wikipedia contributors. Netflix Prize. https://en.wikipedia.org/wiki/Netflix_Prize
5. Narayanan, A. and Shmatikov, V. (2008). Robust De-anonymization of Large Sparse Datasets. IEEE Symposium on Security and Privacy. https://www.cs.cornell.edu/~shmat/shmat_oak08netflix.pdf
6. Electronic Privacy Information Center (2010). Netflix Cancels Contest over Privacy Concerns. https://archive.epic.org/2010/03/netflix-cancels-contest-over-p.html
7. Center for Information Technology Policy, Princeton (2010). Netflix Cancels the Netflix Prize 2. https://blog.citp.princeton.edu/2010/03/12/netflix-cancels-netflix-prize-2/
8. Koren, Y. (2008). Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. KDD 2008. https://people.engr.tamu.edu/huangrh/Spring16/papers_course/matrix_factorization.pdf
9. Bell, R. and Koren, Y. (2007). Improved Neighborhood-based Collaborative Filtering. KDD Cup and Workshop 2007.
10. Salakhutdinov, R., Mnih, A., and Hinton, G. (2007). Restricted Boltzmann Machines for Collaborative Filtering. ICML 2007. https://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
11. Koren, Y. (2009). Collaborative Filtering with Temporal Dynamics. KDD 2009. https://cseweb.ucsd.edu/classes/fa17/cse291-b/reading/p447-koren.pdf
12. Volinsky, C. BellKor's Pragmatic Chaos. AT&T Research. http://stats.research.att.com/volinsky/bpc.html
13. Amatriain, X. and Basilico, J. (2012). Netflix Recommendations: Beyond the 5 Stars (Part 1). Netflix Tech Blog. https://netflixtechblog.com/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429
14. IEEE Spectrum (2009). Million-Dollar Netflix Prize Won. https://spectrum.ieee.org/million-dollar-netflix-prize-won
15. Netflix Prize official site (2009). Grand Prize awarded to team BellKor's Pragmatic Chaos. https://www.netflixprize.com/community/topic_1537.html
