# Item matrix

> Source: https://aiwiki.ai/wiki/item_matrix
> Updated: 2026-07-11
> Categories: Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

In [collaborative filtering](/wiki/collaborative_filtering) and [matrix factorization](/wiki/matrix_factorization) recommender systems, the **item matrix** (commonly written V, sometimes Q or H) is the matrix of latent-factor vectors for items: each row holds one item's low-dimensional [embedding](/wiki/embeddings) and each column is one latent factor learned during training. Multiplied against the [user matrix](/wiki/user_matrix) U, it reconstructs the large, sparse user-item interaction matrix R that records which users have rated, clicked, viewed, or bought which items, so the predicted score for user u on item i is the dot product $$U_u \cdot V_i$$. On the Netflix Prize data, R was a 480,189 by 17,770 matrix that was roughly 99% empty, and matrix factorization over its few filled cells produced the item matrices behind the $1,000,000 winning system [4][24].

The item matrix is the item-side half of the basic matrix factorization model that powered most production [recommendation system](/wiki/recommender_system) work from the mid-2000s onward. In modern neural retrieval architectures, the same lookup table survives as the item-tower output or the item-embedding bank that approximate nearest neighbor (ANN) indexes serve at inference time.

## What is the item matrix in matrix factorization?

Let there be N users and M items, and let R be the N by M interaction matrix whose entry $$r_{u,i}$$ is user u's rating, click, view, or purchase of item i (or missing if no interaction is recorded). Matrix factorization picks a small latent dimension k (typically 16 to 256) and learns two matrices:

- The user matrix U with shape N by k. Row U_u is the latent vector for user u.
- The item matrix V with shape M by k. Row V_i is the latent vector for item i.

The interaction matrix is approximated by the product

$$
R \approx U V^\top
$$

so the predicted score for user u on item i is the dot product

$$
\hat{r}_{u,i} = U_u \cdot V_i = \sum_{f=1}^{k} U_{u,f} V_{i,f}
$$

A more accurate version adds bias terms:

$$
\hat{r}_{u,i} = \mu + b_u + b_i + U_u \cdot V_i
$$

where $$\mu$$ is the global mean rating, $$b_u$$ is a user bias, and $$b_i$$ is the **item bias**. The item bias is one of the most useful pieces of the model in practice: it captures an item's inherent popularity or quality, so V_i does not have to absorb "everyone tends to rate this thing high." Koren, Bell, and Volinsky presented this baseline-plus-bias formulation in their 2009 IEEE Computer survey and concluded that "matrix factorization models are superior to classic nearest-neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels" [4]. The biased form is the default in libraries such as Surprise, whose `SVD` algorithm fits $$\mu$$, $$b_u$$, and $$b_i$$ alongside the latent factors [21]. Spark MLlib's ALS, by contrast, fits U and V without explicit bias terms [20].
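To make the two scoring rules concrete, here is a minimal NumPy sketch that evaluates them for a toy model. It assumes U, V, $$\mu$$, $$b_u$$, and $$b_i$$ have already been trained; the function names `predict` and `predict_all` and all numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def predict(mu, b_u, b_i, U, V, u, i):
    # Biased matrix-factorization score: global mean + user bias + item bias + dot product.
    return mu + b_u[u] + b_i[i] + U[u] @ V[i]

def predict_all(mu, b_u, b_i, U, V):
    # Same rule for every (user, item) pair at once; the result is a dense N x M matrix,
    # so this is only practical for toy sizes.
    return mu + b_u[:, None] + b_i[None, :] + U @ V.T

# Toy model: N = 2 users, M = 3 items, k = 2 latent factors (made-up values).
mu = 3.5
b_u = np.array([0.1, -0.2])
b_i = np.array([0.3, 0.0, -0.5])
U = np.array([[1.0, 0.0],
              [0.0, 1.0]])
V = np.array([[0.5, 0.1],
              [0.2, 0.3],
              [-0.4, 0.6]])

print(predict(mu, b_u, b_i, U, V, u=0, i=0))  # ~4.4 (3.5 + 0.1 + 0.3 + 0.5)
print(predict_all(mu, b_u, b_i, U, V).shape)  # (2, 3)
```

Dropping the three bias terms from `predict` recovers the plain $$U_u \cdot V_i$$ score.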

The columns of V have no fixed meaning. After training they may loosely correspond to genres, price tiers, audience demographics, or production quality, but the optimizer is free to use the dimensions however it wants. This is the central trade-off of model-based collaborative filtering: instead of relying on hand-engineered item features, V is whatever low-rank summary of the interaction data minimizes prediction error.

## How sparse is the interaction matrix the item matrix factorizes?

Real interaction matrices are almost entirely empty, which is the practical reason a dense low-rank item matrix is worth learning in the first place. The Netflix Prize matrix had 480,189 users and 17,770 movies, giving roughly 8.5 billion possible cells, but only 100,480,507 ratings were observed, so about 99% of the matrix was missing and only about 1% was filled [24]. The MovieLens 20M benchmark, with 138,493 users and 27,278 movies, is reported at roughly 99.46% sparsity, meaning only about 0.54% of its 3.7 billion possible cells carry a rating [25]. Storing R densely is therefore wasteful and often impossible; the item matrix V replaces each item's N-entry, mostly empty column of R with a short dense k-dimensional row, which is both smaller and far cheaper to compare at query time.
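The Netflix Prize figures above can be checked with a few lines of arithmetic. The sketch below also compares dense storage of R with storage of U and V, assuming float32 values (4 bytes) and a latent dimension of k = 100; both are illustrative assumptions, not figures reported for the competition.

```python
N_USERS, N_ITEMS, N_RATINGS = 480_189, 17_770, 100_480_507  # Netflix Prize figures
K = 100      # assumed latent dimension, inside the 50 to 200 range used on this data
BYTES = 4    # assumed float32 storage per value

cells = N_USERS * N_ITEMS                          # every possible (user, movie) pair
density = N_RATINGS / cells                        # fraction of cells with a rating
dense_r_gb = cells * BYTES / 1e9                   # R stored as a full dense matrix
factors_gb = (N_USERS + N_ITEMS) * K * BYTES / 1e9 # U and V stored densely

print(f"{cells:,} cells, {density:.2%} observed")
print(f"dense R: {dense_r_gb:.1f} GB; U and V: {factors_gb:.2f} GB")
```

Under these assumptions the two factor matrices are more than a hundred times smaller than a dense R.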

## How does the item matrix differ from other recommender matrices?

The item matrix is sometimes confused with the broader item feature matrix used in [content-based filtering](/wiki/content-based_filtering). They are different objects.

| matrix | shape | what each row represents | how columns are defined |
| --- | --- | --- | --- |
| item matrix V (matrix factorization) | M by k | latent factor vector for one item | learned during training; columns have no fixed meaning |
| item feature matrix (content-based) | M by d | hand-engineered or pretrained features for one item | columns are explicit attributes (genre, price, TF-IDF tokens, image embedding dimensions) |
| rating / interaction matrix R | N by M | one user's interactions across all items | columns are items (one column per item in the catalog) |
| item-item similarity matrix | M by M | similarities between item i and every other item | columns are items; cells are cosine or Pearson similarities |

The content-based item feature matrix and the matrix-factorization item matrix can be combined. Hybrid models concatenate them, or train a neural item tower whose inputs are the content features and whose output replaces V_i.

## How are item embeddings learned?

There is no single algorithm. Different families of methods produce item matrices with very different properties.

| method | year and reference | how V is produced |
| --- | --- | --- |
| Funk SVD | Funk, 2006 ("Try This At Home" blog) [1] | jointly learn U and V by stochastic gradient descent on observed ratings, one latent factor at a time |
| regularized SVD with bias | Paterek, 2007 KDD Cup | add global mean and item bias b_i; V is regularized with L2 |
| [ALS](/wiki/als) for matrix factorization | Zhou et al., 2008; Hu, Koren, Volinsky, ICDM 2008 [5] | hold V fixed and solve for U by least squares; swap and solve for V; closed-form per row, parallelizable |
| SVD++ | Koren, KDD 2008 | adds a per-item implicit factor y_j; the user vector becomes U_u plus a normalized sum of y_j over interacted items |
| probabilistic matrix factorization | Salakhutdinov and Mnih, NIPS 2007 [6] | MAP estimation with Gaussian priors on U and V |
| non-negative matrix factorization ([NMF](/wiki/nmf)) | Lee and Seung, Nature 1999 [7] | both U and V are constrained non-negative, often producing more interpretable item factors |
| Item2Vec | Barkan and Koenigstein, IEEE MLSP 2016 [8] | apply [word2vec](/wiki/word2vec) skip-gram with negative sampling to sequences of items co-purchased or co-consumed by the same user |
| Prod2Vec | Grbovic et al., KDD 2015 [9] | word2vec on sequences of products from email purchase receipts at Yahoo |
| Meta-Prod2Vec | Vasile et al., RecSys 2016 [10] | Prod2Vec regularized with item metadata, addressing cold start |
| Neural Collaborative Filtering | He et al., WWW 2017 [15] | item embedding fed through an MLP that also takes the user embedding |
| [two-tower model](/wiki/two-tower_model) | Yi et al., RecSys 2019 (YouTube) [14] | item tower is a neural net mapping item content features to V_i; user tower mirrors it |
| PinSage | Ying et al., KDD 2018 (Pinterest) [13] | graph convolutional network over the user-pin-board graph; V_i aggregates neighbor information |

Funk SVD, ALS, and SVD++ produce V as a free lookup table: every item in the catalog gets its own row of parameters, fit to that item's interactions. Item2Vec and Prod2Vec still learn one free row per item, but fit it to item co-occurrence statistics rather than jointly with a user matrix. Meta-Prod2Vec and content-fed two-tower models go further and make V_i partly or wholly a function of item metadata or content, which is what makes them workable for catalogs with constant churn.
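As a concrete instance of the SGD family in the table, here is a minimal sketch of regularized, biased matrix factorization in the style of the Paterek and Koren, Bell, and Volinsky formulations. It is an illustration under stated assumptions, not Funk's code: it updates all k factors for each rating at once instead of one factor at a time, and the learning rate, regularization strength, initialization scale, epoch count, and the name `train_biased_mf` are hypothetical choices.

```python
import numpy as np

def train_biased_mf(ratings, n_users, n_items, k=8, lr=0.05, reg=0.05, epochs=200, seed=0):
    """Fit mu + b_u + b_i + U_u . V_i to observed (user, item, rating) triples by SGD.

    Hypothetical helper: all k factors are updated together for each rating,
    unlike Funk's original one-factor-at-a-time schedule.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(ratings, dtype=float)
    mu = data[:, 2].mean()                       # global mean, kept fixed
    b_u = np.zeros(n_users)                      # user biases
    b_i = np.zeros(n_items)                      # item biases
    U = rng.normal(0.0, 0.1, size=(n_users, k))  # user matrix
    V = rng.normal(0.0, 0.1, size=(n_items, k))  # item matrix
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):   # visit observed ratings in random order
            u, i, r = int(data[idx, 0]), int(data[idx, 1]), data[idx, 2]
            err = r - (mu + b_u[u] + b_i[i] + U[u] @ V[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            u_row = U[u].copy()                  # pre-update user row for the V step
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_row - reg * V[i])
    return mu, b_u, b_i, U, V

# Toy explicit ratings (user, item, rating); the values are made up.
ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 3), (1, 0, 3),
           (1, 1, 2), (1, 2, 1), (2, 0, 4), (2, 2, 2)]
mu, b_u, b_i, U, V = train_biased_mf(ratings, n_users=3, n_items=3)
pred = mu + b_u[:, None] + b_i[None, :] + U @ V.T  # also fills the unobserved cell (2, 1)
print(np.round(pred, 2))
```

Only observed triples ever produce a gradient, which is why the method copes with a 99%-empty R: missing cells are predicted, never fit.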

## What is Item2Vec and how does it relate to word2vec?

Item2Vec is the most influential reframing of the item matrix. Barkan and Koenigstein's 2016 paper observed that the user-item interaction log has the same shape as a text corpus: each user's session or basket is a "sentence," and each item is a "word." Running [word2vec](/wiki/word2vec) skip-gram with negative sampling (SGNS) on those sequences yields an item matrix whose rows behave like word vectors [8]. Two rows of V end up with high cosine similarity when the items tend to appear in the same user histories, or alongside the same other items, and the geometry supports analogy-style queries. The authors state plainly that "the method is capable of inferring item-item relations even when user information is not available" [8].

The attractive properties carried over from word embeddings:

- The item matrix can be trained without any user identity. The model only needs item co-occurrence inside sessions, which is convenient when user IDs are unstable or unavailable.
- It scales to very large catalogs by leveraging negative sampling, which makes the per-step cost independent of M.
- The original paper reports that the resulting V is competitive with an SVD baseline [8], and, as Meta-Prod2Vec later showed, the framework extends naturally to side information [10].

Prod2Vec, published a year earlier by Grbovic et al. at Yahoo, applied the same idea to email purchase receipts: the Yahoo Mail team treated each user's receipts as a sequence of products and trained product vectors with skip-gram [9]. Meta-Prod2Vec, a follow-up by Vasile, Smirnova, and Conneau, regularized the embeddings with product metadata so that new SKUs with no purchase history could still be placed in the embedding space [10].

The practical lesson, repeated by Airbnb's listing embeddings (Grbovic and Cheng, KDD 2018) [11], Spotify's track embeddings, and Alibaba's product embeddings [12], is that the item matrix does not have to be fit jointly with a user matrix. Treating sessions as sentences and running a skip-gram-style objective is often enough.
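The sessions-as-sentences idea can be sketched in plain NumPy. The sketch below is a minimal, assumption-laden SGNS trainer, not the Item2Vec authors' code: every other item in the same session counts as context (Item2Vec treats a basket as an unordered set), negatives are drawn from item frequency raised to the 3/4 power as in word2vec, subsampling is omitted, and the function name and hyperparameters are hypothetical. For real workloads, a library such as Gensim's `Word2Vec` is the practical choice.

```python
import numpy as np

def item2vec_sgns(sessions, n_items, k=16, negatives=5, lr=0.025, epochs=20, seed=0):
    """Skip-gram with negative sampling over sessions; returns the input vectors as V."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5 / k, 0.5 / k, size=(n_items, k))  # rows become the item matrix
    W_out = np.zeros((n_items, k))                             # context ("output") vectors
    # Noise distribution for negatives: item frequency ** 0.75, as in word2vec.
    counts = np.bincount([i for s in sessions for i in s], minlength=n_items).astype(float)
    noise = counts ** 0.75
    noise /= noise.sum()
    # Every ordered pair of distinct items in a session is a (target, context) example.
    pairs = [(a, b) for s in sessions for a in s for b in s if a != b]
    for _ in range(epochs):
        for idx in rng.permutation(len(pairs)):
            target, context = pairs[idx]
            negs = rng.choice(n_items, size=negatives, p=noise)
            ids = np.concatenate(([context], negs))
            labels = np.zeros(len(ids))
            labels[0] = 1.0                                    # true context = 1, negatives = 0
            logits = W_out[ids] @ W_in[target]
            g = labels - 1.0 / (1.0 + np.exp(-logits))         # d(log-likelihood)/d(logits)
            grad_in = g @ W_out[ids]                           # uses pre-update W_out
            np.add.at(W_out, ids, lr * np.outer(g, W_in[target]))
            W_in[target] += lr * grad_in
    return W_in

sessions = [[0, 1, 2], [0, 1, 2], [3, 4, 5], [3, 4, 5]]  # toy baskets of item ids
V = item2vec_sgns(sessions, n_items=6)
unit = V / np.linalg.norm(V, axis=1, keepdims=True)
print(np.round(unit @ unit.T, 2))  # item-item cosine similarities
```

No user identifier appears anywhere in the training loop, which is the property the Item2Vec paper highlights.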

## Why does the item bias absorb item popularity?

The item bias term b_i in the rating equation deserves its own attention because it absorbs a large share of the variance in real data. Koren, Bell, and Volinsky point out that much of the observed variation in ratings comes from effects tied to users or to items alone, independent of any user-item interaction, which is why the leading Netflix Prize models included such baseline predictors alongside the latent factors [4]. The reason is intuitive: a Pixar movie is rated higher than a direct-to-DVD release by almost every user, and that effect has nothing to do with personalization.

Separating popularity into b_i has two practical benefits. First, the latent factors V_i are freed up to encode taste differences instead of being burned on "this movie is good in general." Second, the bias terms can be served and updated independently of the heavier latent matrices, which is convenient in production systems that update popularity counts every few minutes but only retrain V nightly or weekly.
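Because the bias terms can be handled separately from U and V, they can also be estimated on their own. One common approach is to shrink each item's average deviation from the global mean toward zero, so items with only a few ratings do not get extreme biases, then estimate user biases from what the item biases leave unexplained. The sketch below follows that idea; the shrinkage constants `lam_item` and `lam_user` and the function name are illustrative assumptions that would normally be tuned on held-out data.

```python
import numpy as np

def baseline_biases(ratings, n_users, n_items, lam_item=25.0, lam_user=10.0):
    """Shrunken estimates of mu, b_u and b_i from (user, item, rating) triples."""
    r = np.asarray(ratings, dtype=float)
    users, items, vals = r[:, 0].astype(int), r[:, 1].astype(int), r[:, 2]
    mu = vals.mean()
    # Item bias first: summed deviation from mu, divided by (count + lam_item),
    # so an item with few ratings is pulled toward a bias of zero.
    item_sum = np.bincount(items, weights=vals - mu, minlength=n_items)
    item_cnt = np.bincount(items, minlength=n_items)
    b_i = item_sum / (lam_item + item_cnt)
    # User bias on the residual the item bias leaves unexplained.
    resid = vals - mu - b_i[items]
    user_sum = np.bincount(users, weights=resid, minlength=n_users)
    user_cnt = np.bincount(users, minlength=n_users)
    b_u = user_sum / (lam_user + user_cnt)
    return mu, b_u, b_i

ratings = [(0, 0, 5), (1, 0, 3), (2, 0, 4), (0, 1, 2)]  # made-up (user, item, rating)
mu, b_u, b_i = baseline_biases(ratings, n_users=3, n_items=2)
print(mu, b_i, b_u)
```

These cheap per-item scalars are exactly the kind of state that can be refreshed on a faster schedule than V.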

A closely related concern is **popularity bias** in recommendations. Because head items appear far more often in training data than long-tail items, both classical matrix factorization and neural retrieval tend to over-represent the head at inference time. A 2024 survey by Klimashevskaia et al. in *User Modeling and User-Adapted Interaction* reviews how the bias arises, how it is measured, and how it is mitigated [19]. Mitigations include:

- explicit item-side regularizers such as PBiLoss (Sahbi et al., 2025), which add a popularity penalty to the training loss
- item loss equalization, which normalizes per-item loss values during training to give long-tail items a comparable gradient signal
- post-hoc reranking for diversity or exposure fairness on the served list
- inverse-propensity weighting in the implicit-feedback objective
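One simple way to turn inverse-propensity weighting into code is to give each item a loss weight inversely proportional to an estimated exposure propensity. The sketch assumes, as a modeling choice rather than a standard, that propensity grows with popularity as (count / max count) raised to a power `gamma`; `gamma`, the clipping threshold, and the function name are hypothetical knobs.

```python
import numpy as np

def inverse_propensity_weights(item_counts, gamma=0.5, clip=10.0):
    """Per-item loss weights that up-weight long-tail items (hypothetical helper)."""
    counts = np.asarray(item_counts, dtype=float)
    # Assumed propensity model: popular items are more likely to have been exposed.
    propensity = (counts / counts.max()) ** gamma
    weights = 1.0 / np.maximum(propensity, 1e-12)
    weights = np.minimum(weights, clip)   # clipping keeps very rare items from dominating
    return weights / weights.mean()       # rescale so the average weight is 1

# Interaction counts for four items, from head to tail (made-up numbers).
w = inverse_propensity_weights([1000, 100, 10, 1])
print(np.round(w, 3))
```

In training, each observed interaction's loss term for item i would be multiplied by `w[i]`, so tail items contribute a larger gradient than their raw frequency alone would give them.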

Exposure fairness is a related but distinct concern, especially on platforms where item providers depend on impressions for revenue. Researchers have proposed $$(\alpha, \beta)$$-fairness constraints that require similar items to receive similar coverage, embedded into the matrix factorization objective.

## How is the item matrix used to find similar items?

Once V is trained, the item matrix is most often used not for predicting ratings but for finding **similar items**. Two items i and j are considered similar if their rows have a high cosine similarity:

$$
\mathrm{sim}(i, j) = \frac{V_i \cdot V_j}{\lVert V_i \rVert \, \lVert V_j \rVert}
$$

This is the engine behind "customers who bought this also bought" carousels, "more like this" panels, related-video sidebars, and content-discovery feeds. Item-item retrieval was popularized by Amazon's 2003 paper [2].

Greg Linden, Brent Smith, and Jeremy York's *Amazon.com Recommendations: Item-to-Item Collaborative Filtering* in IEEE Internet Computing (vol. 7, no. 1, 2003, pp. 76 to 80) showed that comparing items rather than users scales much better; they reported that the online portion of the algorithm "scales independently of the number of customers and number of items in the product catalog" [2]. With M items and N customers, the offline similar-items computation is $$O(M^2 N)$$ in the worst case and closer to $$O(MN)$$ in practice, because most customers buy only a handful of items and most item pairs share no common buyers. The online cost per recommendation depends only on how many items the customer has bought or rated, because the similar-items table is computed offline for the whole catalog and serving reduces to table lookups [2]. In 2017, IEEE Internet Computing's editorial board picked this paper as the single most influential article from the journal's first 20 years [3].

The item matrix V is a learned drop-in replacement for the explicit item-similarity matrix in Amazon-style item-to-item CF. Instead of computing cosine similarity over very long sparse interaction columns, the system computes it over short dense V_i rows. This is far cheaper at query time and supports approximate nearest neighbor search.
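The cosine formula above translates directly into a few lines of NumPy over the dense rows of V. This is a minimal exact-search sketch, assuming V fits in memory; the function name `similar_items` and the toy matrix are hypothetical, and production systems replace the full scan with an approximate index.

```python
import numpy as np

def similar_items(V, item, n=3):
    """Top-n items by cosine similarity to `item`, excluding the item itself (n < M)."""
    norms = np.linalg.norm(V, axis=1)
    norms[norms == 0] = 1.0                    # avoid division by zero for all-zero rows
    V_unit = V / norms[:, None]                # L2-normalize every row
    sims = V_unit @ V_unit[item]               # cosine similarity to every item
    sims[item] = -np.inf                       # never return the query item
    top = np.argpartition(-sims, n)[:n]        # unordered top-n in O(M)
    return top[np.argsort(-sims[top])]         # sort only those n

V = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])  # toy item matrix, k = 2
print(similar_items(V, item=0, n=2))  # [1 2]
```

Normalizing the rows once and caching `V_unit` turns every later cosine query into a single matrix-vector product.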

## How is the item matrix served at scale?

Production systems with millions of items cannot exhaustively compute $$U_u \cdot V_i$$ for the whole catalog at every request. The item matrix is therefore loaded into a vector index that supports approximate nearest neighbor search.

| index / library | developed by | indexing strategy | typical use |
| --- | --- | --- | --- |
| FAISS | Meta (Facebook AI Research), open source 2017 [17] | IVF, HNSW, product quantization, GPU support | item retrieval at Meta, many open-source recommenders |
| ScaNN | Google, 2020 [18] | anisotropic vector quantization plus tree search | YouTube two-tower retrieval, Google Cloud Vertex AI |
| HNSWlib | Malkov and Yashunin, 2016 | hierarchical navigable small world (HNSW) graphs | reference HNSW implementation; Spotify's Voyager library builds on it, and vector databases such as Weaviate and Qdrant implement the same graph algorithm |
| Annoy | Spotify, 2013 | random-projection forests | Spotify radio, smaller-scale services |
| Pinecone, Weaviate, Milvus, Qdrant | various | managed vector databases wrapping HNSW or IVF | hosted item retrieval and semantic search |

These indexes are what make a billion-row item matrix searchable in milliseconds. The FAISS paper (Johnson, Douze, and Jégou, IEEE Transactions on Big Data, 2019) demonstrated billion-scale similarity search on GPUs, using product quantization to compress each item vector to about 8 bytes while keeping distance estimates meaningful [17]. A modern recommender typically stores the item matrix as flat vectors in one of these indexes, recomputes it nightly or hourly during retraining, and serves nearest-neighbor queries with sub-100ms latency over indexes with hundreds of millions of items. The two-tower YouTube system described by Yi et al. (2019) is an explicit example: the item tower output is dumped into a ScaNN index that handles tens of millions of videos, and the served user tower vector is matched against that index in real time [14].
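The inverted-file (IVF) strategy listed for FAISS can be illustrated with a toy NumPy version: cluster the rows of V, keep one list of item ids per cluster, and at query time score only the items in the few clusters whose centroids best match the query. This is a sketch of the idea, not FAISS's API or algorithm in detail; the function names, cluster count, and `n_probe` are illustrative, no product quantization is applied, and clustering by Euclidean distance while probing by inner product is a simplifying heuristic.

```python
import numpy as np

def build_ivf(V, n_lists, n_iter=10, seed=0):
    """Toy IVF index: k-means over item rows, one id list per cluster."""
    rng = np.random.default_rng(seed)
    centroids = V[rng.choice(len(V), size=n_lists, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every item to its nearest centroid (squared Euclidean distance) ...
        dist = ((V[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        # ... then move each non-empty centroid to the mean of its members.
        for c in range(n_lists):
            members = V[assign == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    dist = ((V[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = dist.argmin(axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(n_lists)]
    return centroids, lists

def ivf_search(V, centroids, lists, q, n=5, n_probe=2):
    """Approximate top-n inner-product search that scans only n_probe clusters."""
    probe = np.argsort(-(centroids @ q))[:n_probe]      # most promising clusters
    cand = np.concatenate([lists[c] for c in probe])     # candidate item ids
    scores = V[cand] @ q                                 # exact scores for candidates only
    return cand[np.argsort(-scores)[:n]]

# Usage: 5,000 random 16-d item vectors, 64 clusters, probe 8 of them.
rng = np.random.default_rng(1)
V = rng.normal(size=(5_000, 16))
q = rng.normal(size=16)
centroids, lists = build_ivf(V, n_lists=64)
exact = np.argsort(-(V @ q))[:10]
approx = ivf_search(V, centroids, lists, q, n=10, n_probe=8)
print("recall@10:", len(set(exact) & set(approx)) / 10)
```

`n_probe` is the speed/recall dial: probing every cluster reproduces exact search, while probing a few scans only a small fraction of the catalog.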

## What is the item cold start problem?

Every row of V corresponds to an item with at least some interaction history. A brand-new item has no history, so there is nothing to fit V_i to and the row would be set to the regularizer's prior (close to zero). This is the item-side **[cold start](/wiki/cold_start)** problem, and it is generally more painful than its user-side counterpart because most platforms add new items continuously while users churn more slowly.

The usual mitigations:

- Hybrid models that combine collaborative latent factors with content features (text, image, metadata). [Hybrid recommender](/wiki/hybrid_recommender) systems concatenate or sum these representations.
- Two-tower models whose item tower takes content features as input. New items get an embedding immediately by passing their text and image through the trained tower, with no interaction data required. This is a common strategy for newly uploaded content on large video platforms.
- Side-information regularization, as in Meta-Prod2Vec, where item metadata pulls related items together in V even before they have shared co-occurrences [10].
- LLM-generated item descriptions. Some recent e-commerce systems use a frozen LLM to expand sparse product titles into richer descriptions, which are then embedded by a sentence encoder to seed V_i.
- Multimodal embeddings such as CLIP for visual products. The item matrix can be initialized from CLIP image-and-text embeddings, then fine-tuned with collaborative signal once interactions accumulate.

In practice most large platforms run a hybrid: a content-only embedding handles the first hours or days of an item's life, then the system blends in collaborative signal as views and clicks accumulate, and eventually the collaborative signal dominates.
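The hand-off from content signal to collaborative signal can be sketched as a weighted blend whose collaborative weight grows with the item's interaction count. The schedule n / (n + tau), the constant `tau`, and the function name are hypothetical; the sketch also assumes the content vector and the collaborative vector already live in the same embedding space (for example, because a content tower was trained to predict collaborative rows).

```python
import numpy as np

def blended_item_vector(v_content, v_collab, n_interactions, tau=50.0):
    """Mix content and collaborative vectors; weight on collaborative is n / (n + tau)."""
    w = n_interactions / (n_interactions + tau)   # 0 for a brand-new item, toward 1 later
    v = (1.0 - w) * np.asarray(v_content, dtype=float) + w * np.asarray(v_collab, dtype=float)
    return v / np.linalg.norm(v)                  # unit length, ready for cosine search

v_content = np.array([1.0, 0.0])   # e.g. output of a content tower (made-up)
v_collab = np.array([0.0, 1.0])    # learned collaborative row (made-up)
for n in [0, 10, 50, 500]:
    print(n, np.round(blended_item_vector(v_content, v_collab, n), 3))
```

A smooth schedule avoids a sudden jump in the item's position when it crosses a fixed interaction threshold.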

## How does the item matrix differ from content-based filtering?

It is worth being explicit about how the matrix-factorization item matrix differs from the item representation in pure content-based filtering.

| dimension | matrix factorization V | [content-based filtering](/wiki/content-based_filtering) item vector |
| --- | --- | --- |
| source of features | learned from user-item interactions | derived from item attributes (text, image, metadata) |
| handles new items? | no, cold start | yes, immediately |
| handles new users? | no without bootstrap | yes if user profile features exist |
| captures taste signals invisible in content? | yes (e.g., a movie loved by a particular cluster of viewers) | no, limited to content similarity |
| catalog growth cost | retraining required as items are added | embedding can be computed for a single new item in isolation |
| typical use today | ID-based retrieval, similar-item carousels | item tower in two-tower models, cold-start fallback |

[Hybrid recommender](/wiki/hybrid_recommender) architectures combine both. The item tower in a modern two-tower model takes content features as input but is trained on user-item interaction data, so V_i ends up encoding both content similarity and collaborative signal in the same vector.

## Where is the item matrix used in production?

The item matrix shows up by name or by direct analogy in almost every published large-scale recommender architecture.

| system | reference | role of the item matrix |
| --- | --- | --- |
| Amazon item-to-item CF | Linden, Smith, York, IEEE Internet Computing 2003 [2] | precomputed item-item similarity table; later replaced by learned V in Amazon's neural retrieval |
| Netflix Prize matrix factorization | Koren, Bell, Volinsky, IEEE Computer 2009 [4] | learned V with bias term b_i; central to BellKor's Pragmatic Chaos winning blend |
| YouTube two-tower retrieval | Yi et al., RecSys 2019 [14] | item tower output is V_i; loaded into a ScaNN index for serving tens of millions of videos on [YouTube](/wiki/youtube) |
| Pinterest PinSage | Ying et al., KDD 2018 [13] | V_i aggregated from a graph convolutional network over 3 billion pins and boards |
| Airbnb listing embeddings | Grbovic and Cheng, KDD 2018 [11] | listing-as-word skip-gram, including dwell-time and conversion as positive signals; powers "similar listings" |
| Spotify track embeddings | Spotify engineering blog, multiple posts | item matrix learned from user listening sessions; used for radio and Discover Weekly |
| Alibaba EGES | Wang et al., KDD 2018 [12] | enhanced graph embedding with side information for Taobao items |
| Meta DLRM | Naumov et al., 2019 [16] | item embedding tables combined with dense features and feature crosses |
| Twitter / X SimClusters | Twitter engineering 2020 | sparse item embeddings over learned community memberships |
| TikTok recommendation | various engineering posts | two-tower retrieval over short-video item embeddings, refreshed continuously |

Many of these systems do not call V the "item matrix" anymore. They call it the item embedding table, the item tower, or simply "the index." Architecturally it is the same thing.

## How is the latent dimension k chosen?

The latent dimension k controls capacity. Common ranges:

| dataset / system | typical k | notes |
| --- | --- | --- |
| MovieLens 100k / 1M (research benchmarks) | 10 to 50 | enough for the small catalog; larger k overfits |
| Netflix Prize | 50 to 200 | the BellKor team used several k values in their blend |
| medium e-commerce (~10^5 SKUs) | 32 to 128 | balances offline RMSE and serving cost |
| YouTube and TikTok scale (10^8+ items) | 64 to 256 | bottlenecked by ANN index size and serving latency |
| word2vec-style item embeddings | 100 to 300 | inherits the word2vec convention |

Larger k captures more nuance per item but increases memory, slows ANN search, and demands more regularization to avoid overfitting on items with little data. The L2 regularizer $$\lambda \lVert V_i \rVert^2$$ in the matrix factorization loss shrinks rows with few interactions toward zero, which is one of the reasons cold-start items end up indistinguishable until enough signal accumulates.
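The shrinkage effect of the L2 term is easiest to see in the closed-form per-row step that ALS alternates over: with U held fixed, the best V_i is a ridge-regression solution over the users who rated item i. The sketch below solves that step for one item seen by 2 users and one seen by 500 users; the function name, k, $$\lambda$$, and the synthetic noiseless ratings are all illustrative assumptions.

```python
import numpy as np

def solve_item_row(U_rated, r, lam):
    """Ridge update for one row V_i with U fixed (the per-row ALS step).

    Minimizes sum_u (r_ui - U_u . V_i)^2 + lam * ||V_i||^2 over the users who rated item i.
    """
    k = U_rated.shape[1]
    A = U_rated.T @ U_rated + lam * np.eye(k)
    return np.linalg.solve(A, U_rated.T @ r)

rng = np.random.default_rng(0)
k, lam = 8, 5.0
v_true = rng.normal(size=k)                          # hypothetical "true" item vector
U_all = rng.normal(size=(500, k))                    # fixed user factors
r_all = U_all @ v_true                               # noiseless ratings, for clarity
v_few = solve_item_row(U_all[:2], r_all[:2], lam)    # item seen by only 2 users
v_many = solve_item_row(U_all, r_all, lam)           # item seen by 500 users
print(np.linalg.norm(v_true), np.linalg.norm(v_few), np.linalg.norm(v_many))
```

With two ratings the solution is confined to a two-dimensional subspace and shrunk toward zero, while with 500 ratings the data term dominates $$\lambda$$ and the row lands close to the generating vector.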

## What are the limitations of the item matrix?

- One row per item. A catalog with billions of items (TikTok-scale short videos, Amazon-scale SKUs) requires a billion-row item matrix, which dominates memory and ANN index size.
- Static. New behavior does not change V_i until retraining.
- Cold start. New items get the prior; until they accumulate interactions, the system has to fall back to content features.
- Popularity bias. Without explicit countermeasures, V over-represents the head and under-represents the tail.
- Linear in the dot product. Plain $$U_u \cdot V_i$$ misses interaction effects that NCF [15], two-tower MLPs, transformers, and graph models capture.
- Uninterpretable. Unlike NMF, the columns of standard SVD-style V have no guaranteed meaning, which makes debugging and explanation difficult [7].
- Catastrophic update on retrain. Because the dimensions of V have arbitrary orientation, a fresh training run can produce a rotated embedding space that is incompatible with previously cached user vectors. Production systems mitigate this with anchor-item alignment or by warm-starting V from the previous checkpoint.
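The rotation problem follows from the model itself: for any orthogonal k by k matrix Q, $$(UQ)(VQ)^\top = UV^\top$$, so training has no reason to prefer one orientation. One way to implement anchor-item alignment, assumed here rather than taken from any specific production system, is the orthogonal Procrustes solution: find the rotation that best maps the new anchor rows onto the old ones, then apply it to the whole new V. Real retrains differ by more than a pure rotation, so in practice the alignment is only approximate; the toy example simulates the pure-rotation case, and all names are hypothetical.

```python
import numpy as np

def align_to_previous(V_new, V_old, anchors):
    """Rotate V_new so its anchor rows best match V_old (orthogonal Procrustes).

    Solves min_Q ||V_new[anchors] @ Q - V_old[anchors]||_F over orthogonal Q.
    """
    M = V_new[anchors].T @ V_old[anchors]
    W, _, Zt = np.linalg.svd(M)
    Q = W @ Zt                       # optimal orthogonal map
    return V_new @ Q, Q

rng = np.random.default_rng(0)
k = 8
U_old = rng.normal(size=(20, k))
V_old = rng.normal(size=(50, k))
R, _ = np.linalg.qr(rng.normal(size=(k, k)))   # random orthogonal matrix
V_new = V_old @ R                               # a "retrain" that only rotated the space
anchors = np.arange(10)                         # hypothetical anchor items shared by both runs
V_aligned, Q = align_to_previous(V_new, V_old, anchors)
print(np.abs(V_aligned - V_old).max())          # near machine precision
```

After alignment, cached user vectors from the previous run can keep scoring against the new item matrix.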

## Which libraries expose the item matrix?

| library | language | item-matrix support |
| --- | --- | --- |
| Surprise [21] | Python | SVD, SVD++, NMF, KNN baselines; V exposed as `qi` after training |
| Spark MLlib `ALS` [20] | Scala / Python / Java | distributed ALS for explicit and implicit feedback; item factors fetched via `model.itemFactors` |
| `implicit` (Ben Frederickson) [22] | Python with C extensions | fast ALS, BPR, and logistic MF; `model.item_factors` is V |
| LightFM [23] | Python | hybrid model that combines latent and content features in V; supports cold start |
| LibFM | C++ | factorization machines (Steffen Rendle); generalizes V to arbitrary feature interactions |
| TensorFlow Recommenders (TFRS) | Python | modern factorization, retrieval, and two-tower architectures with explicit item tower API |
| Gensim Word2Vec | Python | trains item2vec / prod2vec by treating sessions as sentences |
| PyTorch BigGraph | Python / C++ | knowledge graph and item embedding training at billions-of-nodes scale (Meta) |
| FAISS, ScaNN, HNSWlib | C++ / Python | not training, but serving: store V in an ANN index for sub-100ms retrieval |

## How has the item matrix evolved with deep learning?

The item matrix did not disappear with the deep learning wave; it generalized. In a modern two-tower retrieval system, the item tower replaces the static lookup row with a function of item features, but the output is still a vector that lives in the same space as the user vector and is retrieved by dot product.

Three trends are visible in 2024 to 2026 production systems:

- **Multimodal item embeddings.** Items with rich content (videos, products, listings) are embedded with frozen vision-language models such as CLIP or SigLIP, then fine-tuned with collaborative signal. This greatly reduces the cold-start problem because every new item already has a meaningful vector before any user has clicked.
- **LLM-augmented item representations.** Sparse product titles, song metadata, or movie descriptions are expanded with an LLM into richer text, which is then encoded. The item matrix becomes the output of a description encoder rather than a free lookup.
- **Vector-database-native serving.** Item matrices are increasingly stored in managed vector databases (Pinecone, Weaviate, Milvus, Qdrant) rather than in custom in-process indexes. The item matrix and the ANN index are conceptually one object.

Despite all these changes, the underlying math is the same as Funk's 2006 blog post: a low-rank latent vector for each item, scored against a user vector by dot product [1].

## How big was the item matrix that won the Netflix Prize?

The Netflix Prize is the canonical demonstration of the item matrix at work. Netflix offered $1,000,000 for any team that could beat its in-house Cinematch system by 10% on root mean square error. The winning entry, BellKor's Pragmatic Chaos, was declared on 21 September 2009 with a test RMSE of 0.8567, a 10.06% improvement over Cinematch, and matrix factorization with item and user latent factors plus bias terms was the backbone of the winning blend [4][24]. Because R held only about 100 million ratings out of roughly 8.5 billion possible cells, the factor models in the blend had to reconstruct the missing entries from item matrices V (17,770 rows) and user matrices U (480,189 rows), with latent dimensions typically in the 50 to 200 range. The competition, more than any single paper, established the user-item interaction matrix, and the latent item matrix that factorizes it, as the standard mental model for recommendation.

## Explain like I'm 5

Imagine a giant grid where every row is a person at the theater and every column is a movie. Most squares are empty because most people have not seen most movies. The item matrix is a much smaller grid that gives each movie its own short list of secret numbers describing what kind of movie it is (scary, funny, long, has lots of action, and so on). Every person has their own secret list of numbers too. To guess whether a person will like a movie, you multiply each of their numbers by the matching movie number and add up the results. The item matrix is the part of the system that holds every movie's list of secret numbers.

## References

1. Funk, S. (2006). *Netflix Update: Try This At Home*. https://sifter.org/simon/journal/20061211.html
2. Linden, G., Smith, B., and York, J. (2003). *Amazon.com Recommendations: Item-to-Item Collaborative Filtering*. IEEE Internet Computing, 7(1), 76 to 80. https://ieeexplore.ieee.org/document/1167344
3. Smith, B., and Linden, G. (2017). *Two Decades of Recommender Systems at Amazon.com*. IEEE Internet Computing, 21(3), 12 to 18. https://assets.amazon.science/76/9e/7eac89c14a838746e91dde0a5e9f/two-decades-of-recommender-systems-at-amazon.pdf
4. Koren, Y., Bell, R., and Volinsky, C. (2009). *Matrix Factorization Techniques for Recommender Systems*. IEEE Computer, 42(8), 30 to 37. https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf
5. Hu, Y., Koren, Y., and Volinsky, C. (2008). *Collaborative Filtering for Implicit Feedback Datasets*. Proceedings of the 2008 IEEE International Conference on Data Mining (ICDM), 263 to 272. http://yifanhu.net/PUB/cf.pdf
6. Salakhutdinov, R., and Mnih, A. (2008). *Probabilistic Matrix Factorization*. Advances in Neural Information Processing Systems 20 (NIPS 2007). https://papers.nips.cc/paper/3208-probabilistic-matrix-factorization
7. Lee, D. D., and Seung, H. S. (1999). *Learning the parts of objects by non-negative matrix factorization*. Nature, 401, 788 to 791. https://www.nature.com/articles/44565
8. Barkan, O., and Koenigstein, N. (2016). *Item2Vec: Neural Item Embedding for Collaborative Filtering*. 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). https://arxiv.org/abs/1603.04259
9. Grbovic, M., Radosavljevic, V., Djuric, N., Bhamidipati, N., Savla, J., Bhagwan, V., and Sharp, D. (2015). *E-commerce in Your Inbox: Product Recommendations at Scale*. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1809 to 1818.
10. Vasile, F., Smirnova, E., and Conneau, A. (2016). *Meta-Prod2Vec: Product Embeddings Using Side-Information for Recommendation*. Proceedings of the 10th ACM Conference on Recommender Systems, 225 to 232. https://arxiv.org/abs/1607.07326
11. Grbovic, M., and Cheng, H. (2018). *Real-time Personalization using Embeddings for Search Ranking at Airbnb*. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 311 to 320.
12. Wang, J., Huang, P., Zhao, H., Zhang, Z., Zhao, B., and Lee, D. L. (2018). *Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (EGES)*. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 839 to 848. https://arxiv.org/abs/1803.02349
13. Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. (2018). *Graph Convolutional Neural Networks for Web-Scale Recommender Systems (PinSage)*. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 974 to 983. https://arxiv.org/abs/1806.01973
14. Yi, X., Yang, J., Hong, L., Cheng, D. Z., Heldt, L., Kumthekar, A., Zhao, Z., Wei, L., and Chi, E. (2019). *Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations*. Proceedings of the 13th ACM Conference on Recommender Systems, 269 to 277. https://research.google/pubs/sampling-bias-corrected-neural-modeling-for-large-corpus-item-recommendations/
15. He, X., Liao, L., Zhang, H., Nie, L., Hu, X., and Chua, T. S. (2017). *Neural Collaborative Filtering*. Proceedings of the 26th International Conference on World Wide Web, 173 to 182. https://arxiv.org/abs/1708.05031
16. Naumov, M., et al. (2019). *Deep Learning Recommendation Model for Personalization and Recommendation Systems (DLRM)*. arXiv preprint. https://arxiv.org/abs/1906.00091
17. Johnson, J., Douze, M., and Jégou, H. (2019). *Billion-scale similarity search with GPUs (FAISS)*. IEEE Transactions on Big Data, 7(3), 535 to 547. https://github.com/facebookresearch/faiss
18. Guo, R., Sun, P., Lindgren, E., Geng, Q., Simcha, D., Chern, F., and Kumar, S. (2020). *Accelerating Large-Scale Inference with Anisotropic Vector Quantization (ScaNN)*. Proceedings of the 37th International Conference on Machine Learning. https://arxiv.org/abs/1908.10396
19. Klimashevskaia, A., Jannach, D., Elahi, M., and Trattner, C. (2024). *A survey on popularity bias in recommender systems*. User Modeling and User-Adapted Interaction, 34. https://link.springer.com/article/10.1007/s11257-024-09406-0
20. Apache Spark documentation. *Collaborative Filtering (MLlib ALS)*. https://spark.apache.org/docs/latest/ml-collaborative-filtering.html
21. Hug, N. (2020). *Surprise: A Python library for recommender systems*. Journal of Open Source Software, 5(52), 2174. https://surprise.readthedocs.io/
22. Frederickson, B. *implicit: Fast Python Collaborative Filtering for Implicit Feedback Datasets*. https://github.com/benfred/implicit
23. Kula, M. (2015). *Metadata Embeddings for User and Item Cold-start Recommendations (LightFM)*. Proceedings of the 2nd Workshop on New Trends in Content-based Recommender Systems. https://arxiv.org/abs/1507.08439
24. Bennett, J., and Lanning, S. (2007). *The Netflix Prize*. Proceedings of KDD Cup and Workshop 2007. Dataset: 480,189 users, 17,770 movies, 100,480,507 ratings; Grand Prize awarded 2009 to BellKor's Pragmatic Chaos (test RMSE 0.8567, 10.06% over Cinematch). https://en.wikipedia.org/wiki/Netflix_Prize
25. Harper, F. M., and Konstan, J. A. (2015). *The MovieLens Datasets: History and Context*. ACM Transactions on Interactive Intelligent Systems, 5(4). MovieLens 20M: 138,493 users, 27,278 movies, about 20 million ratings (roughly 99.46% sparse). https://grouplens.org/datasets/movielens/20m/