Items

Machine Learning

Updated Jul 7, 2026

In recommendation systems, items are the entities that the system recommends to users. Google's recommendation systems course defines an item, also called a document, as "the entities a system recommends. For the Google Play store, the items are apps to install. For YouTube, the items are videos." ^[13] The word covers anything a service might surface to a person browsing a feed or a catalog: movies on Netflix, products on Amazon, songs on Spotify, videos on YouTube, articles on a news site, restaurants on Yelp, profiles on a dating app, ride routes on Uber, friends on Facebook, or jobs on LinkedIn. Items and users are the two foundational entities in any recommender, and most of the math in the field is about learning how the two relate. The full set of items a system can choose from is called the corpus, and at web scale that corpus can hold billions of candidates. ^[10]

The term sounds generic on purpose. A recommender works the same way whether it is ranking 50 million products or 200 podcasts, so the literature uses "item" as a placeholder for whatever is being recommended. The first paper to formalize item-based collaborative filtering, published by Sarwar and colleagues at the University of Minnesota in 2001, deliberately framed everything in terms of items and users so the algorithm would carry across domains. ^[1] That paper, presented at the 10th International World Wide Web Conference in Hong Kong (pages 285 to 295), has been cited tens of thousands of times and remains the standard reference for item-based methods. ^[1]^[4]

What is the difference between an item and a user?

A recommender system has two sides. On one side there are users, the people asking for suggestions, and on the other there are items, the things that can be suggested. Almost every dataset in the field follows this structure: a user identifier, an item identifier, and some signal that links them, such as a rating, a click, a purchase, a watch time, or a like.

These records are usually arranged as a single user-item interaction matrix in which rows are users and columns are items. Each cell holds the interaction between one user and one item, if any exists. Most cells are empty because any given user has touched only a tiny fraction of the catalog, which is the well-known sparsity problem in recommender systems. ^[6]
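A minimal sketch of that layout, assuming a handful of made-up (user, item, rating) triples. The identifiers and the `build_matrix` and `sparsity` helpers are illustrative and not taken from any dataset or library. Only observed cells are stored, which is how sparse interaction data is usually kept in practice.

```python
from collections import defaultdict

# Hypothetical interaction log: (user_id, item_id, rating).
interactions = [
    ("u1", "i1", 5.0),
    ("u1", "i3", 3.0),
    ("u2", "i2", 4.0),
    ("u3", "i1", 2.0),
    ("u3", "i4", 4.5),
]

def build_matrix(rows):
    """Store only observed cells: matrix[user][item] = signal."""
    matrix = defaultdict(dict)
    for user, item, value in rows:
        matrix[user][item] = value
    return matrix

def sparsity(matrix):
    """Fraction of user-item cells that hold no observed interaction."""
    users = set(matrix)
    items = {item for row in matrix.values() for item in row}
    observed = sum(len(row) for row in matrix.values())
    return 1.0 - observed / (len(users) * len(items))

matrix = build_matrix(interactions)
print(f"sparsity = {sparsity(matrix):.2f}")
```

Even in this toy example more than half of the cells are empty; real catalogs are far sparser.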

The asymmetry matters for design choices. In most consumer applications the number of users grows faster than the number of items, and item profiles change less often than user profiles. A movie does not develop new tastes overnight; a viewer might. That stability is one of the reasons Amazon moved from user based to item based collaborative filtering in 1998, since item similarity scores could be precomputed offline and reused for many users. ^[2]

What is an item identifier and the catalog?

Every item in a recommender starts with a unique item ID. This is the primary key that ties together every signal the system observes about that item, including ratings, clicks, view counts, inventory, price, category, and any text or image describing it. Catalog management is the unglamorous foundation of any production recommender. If item IDs are inconsistent, duplicated, or assigned sloppily, the model cannot learn coherent representations.

Most real catalogs are messier than the tidy MovieLens or Netflix datasets used in papers. An e-commerce site might have item masters with many variants such as a shirt that comes in different colors and sizes, each with its own SKU but sharing parent metadata. Microsoft's Intelligent Recommendations system, for example, distinguishes between standalone items and item masters with variants, and lets operators attach filter values such as category, color, or size to control what is eligible for recommendation. ^[12] Items also have availability windows: an out of stock product or an expired listing usually should not be recommended even if the model would otherwise pick it.
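As a rough illustration of variants, filter values, and availability windows, the sketch below uses a hypothetical `CatalogItem` record and `is_eligible` check. The field names are assumptions made for this example and do not reproduce the schema of Microsoft's Intelligent Recommendations or any other product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional

@dataclass
class CatalogItem:
    item_id: str                         # unique primary key, e.g. a SKU
    title: str
    parent_id: Optional[str] = None      # item master for a variant, None if standalone
    filters: Dict[str, str] = field(default_factory=dict)
    in_stock: bool = True
    available_until: Optional[date] = None

def is_eligible(item, today, required=None):
    """Return True if the item may be recommended today under the given filters."""
    if not item.in_stock:
        return False
    if item.available_until is not None and today > item.available_until:
        return False
    for key, value in (required or {}).items():
        if item.filters.get(key) != value:
            return False
    return True

catalog = [
    CatalogItem("shirt-red-m", "Oxford shirt", parent_id="shirt",
                filters={"color": "red", "size": "M"}),
    CatalogItem("shirt-blue-m", "Oxford shirt", parent_id="shirt",
                filters={"color": "blue", "size": "M"}, in_stock=False),
    CatalogItem("promo-2024", "Holiday bundle", available_until=date(2024, 12, 31)),
]

today = date(2025, 1, 15)
eligible = [item.item_id for item in catalog if is_eligible(item, today)]
print(eligible)
```

The eligibility check runs after the model has scored candidates, so an out-of-stock or expired item is dropped regardless of how highly it was ranked.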

Alongside IDs, items carry features. These come from several places:

Source	Examples
Structured metadata	category, brand, price, release year, director, language, duration
Text	title, description, tags, user reviews, transcripts
Media	product photos, album art, video frames, audio waveforms
Behavior	aggregate click rate, purchase rate, average rating, dwell time

A recommender can use any subset of these depending on the algorithm.

How are items represented?

The heart of modern recommendation is learning a good representation of each item. The early approaches were simple. In content based filtering, each item is a sparse vector of its features such as a TF-IDF vector over its description, and the system recommends items similar to ones the user already liked. The advantage is that brand new items can be scored from day one because their features exist before any user has interacted with them. The disadvantage is that recommendations are only as good as the metadata.
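The content-based approach can be sketched in a few lines, assuming three invented movie descriptions and a plain TF-IDF weighting (term frequency times log of inverse document frequency). The helper names are illustrative; production systems typically rely on a library vectorizer and richer features.

```python
import math
from collections import Counter

# Hypothetical item descriptions keyed by item ID.
descriptions = {
    "m1": "space adventure with robots",
    "m2": "robots explore deep space",
    "m3": "romantic comedy in paris",
}

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each item."""
    tokenized = {item_id: text.lower().split() for item_id, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    vectors = {}
    for item_id, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[item_id] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = tfidf_vectors(descriptions)
liked = "m1"
ranked = sorted(
    (other for other in vectors if other != liked),
    key=lambda other: cosine(vectors[liked], vectors[other]),
    reverse=True,
)
print(ranked)
```

No interaction data is used here, which is why a brand new item can be scored immediately, and also why the result is only as good as the descriptions.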

Collaborative filtering takes the opposite approach. It treats items as anonymous IDs and learns about them only through the pattern of interactions they receive. ^[8] Two early variants dominated the field:

User based collaborative filtering finds users who behave like you and recommends what they liked.
Item based collaborative filtering, used at Amazon from 1998 and formalized by Sarwar, Karypis, Konstan, and Riedl in 2001, finds items similar to ones you already liked. ^[1]^[2] Item similarity is computed from the pattern of users who rated both items, typically using cosine similarity or Pearson correlation. Amazon's Linden, Smith, and York described the production version in IEEE Internet Computing in 2003, summarizing the result this way: "By comparing similar items rather than similar customers, item-to-item collaborative filtering scales to very large data sets and produces high-quality recommendations." ^[2] The technique scaled because item to item similarities are stable and can be precomputed offline.
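A small sketch of the item-based variant, assuming a toy ratings dictionary and cosine similarity restricted to users who rated both items. The data and function names are made up for illustration, and the sketch omits refinements such as mean-centering or minimum co-rating counts.

```python
import math

# Hypothetical explicit ratings: ratings[user][item] = stars.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5},
    "carol": {"A": 1, "C": 5},
    "dave": {"B": 2, "C": 4},
}

def item_vectors(ratings):
    """Flip the user -> item table into item -> {user: rating}."""
    vectors = {}
    for user, row in ratings.items():
        for item, stars in row.items():
            vectors.setdefault(item, {})[user] = stars
    return vectors

def cosine_similarity(a, b):
    """Cosine similarity over the users who rated both items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def precompute_neighbors(ratings, top_n=2):
    """Offline step: for every item, store its most similar items."""
    vectors = item_vectors(ratings)
    table = {}
    for item, vec in vectors.items():
        scored = [
            (cosine_similarity(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != item
        ]
        scored.sort(reverse=True)
        table[item] = [other for _, other in scored[:top_n]]
    return table

neighbors = precompute_neighbors(ratings)
print(neighbors["A"])
```

The `neighbors` table depends only on items, so it can be rebuilt in a batch job and then looked up at request time for any user.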

What are item embeddings?

The deeper move, and the one that defines current practice, is to compress each item into a dense low dimensional vector called an item embedding. Instead of storing one row per item with thousands of sparse features, the system learns a vector with maybe 32 to 512 numbers that captures the item's position in some latent taste space.

Matrix factorization made this idea famous during the Netflix Prize competition that ran from 2006 to 2009. Simon Funk published a blog post in 2006 describing a simple stochastic gradient descent algorithm that decomposed the Netflix rating matrix into two smaller matrices: one for users and one for items. ^[5] Each item ended up as a vector of latent factors, and the predicted rating for a user item pair was just the dot product of their two vectors. Funk's approach, often called Funk SVD, became the backbone of the winning Netflix Prize entries and the model template for most of the recommender literature that followed. ^[5]^[9]
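The idea can be sketched as follows, assuming six invented ratings and arbitrary hyperparameters. This is a simplified variant that updates all latent factors together with L2 regularization; it is written in the spirit of the approach described above and is not a reproduction of Funk's original code.

```python
import random

# Hypothetical training data: (user, item, rating).
ratings = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0),
    ("u2", "i1", 4.0), ("u2", "i3", 1.0),
    ("u3", "i2", 2.0), ("u3", "i3", 5.0),
]

def predict(P, Q, user, item):
    """Predicted rating is the dot product of user and item factors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))

def train_funk_svd(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q by stochastic gradient descent."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for user, item, rating in ratings:
            err = rating - predict(P, Q, user, item)
            for f in range(n_factors):
                p_uf, q_if = P[user][f], Q[item][f]
                # Gradient step on squared error with L2 regularization.
                P[user][f] += lr * (err * q_if - reg * p_uf)
                Q[item][f] += lr * (err * p_uf - reg * q_if)
    return P, Q

P, Q = train_funk_svd(ratings)
for user, item, rating in ratings:
    print(user, item, rating, round(predict(P, Q, user, item), 2))
```

After training, each entry of `Q` is the item's embedding: a short list of latent factors learned purely from who rated what.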

The number of latent dimensions is a tuning knob. Too few and the model cannot capture subtle differences between items. Too many and the model overfits and starts memorizing noise. Typical values for production systems sit in the low hundreds. ^[9]

What is a two-tower model and how does it embed items?

Classical matrix factorization has a hard limitation: it only knows about item IDs. If a brand new movie appears in the catalog tomorrow, matrix factorization has no row for it because no user has rated it yet. The fix is to let a neural network compute item embeddings from features instead of looking them up from a static table. That is the core idea of the two-tower model, now standard at YouTube, Google, Pinterest, Spotify, Meta, and many other platforms. ^[10]

In a two tower architecture, one neural network (the item tower) reads in raw item features such as ID, category, description, image features, and tags, then outputs a fixed length item embedding. A second neural network (the user tower) does the same for user features. The score for a user item pair is the dot product of the two embeddings. Because the item tower can use any features, a brand new item can be embedded the moment it is added to the catalog. This is one of the cleanest fixes for the item cold start problem.
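The forward pass can be illustrated with a deliberately tiny sketch. It assumes each tower simply averages per-feature vectors, uses random untrained weights, and invents all feature names; a real tower would be a trained neural network, so the scores printed here carry no meaning beyond showing the shape of the computation.

```python
import random

DIM = 4
rng = random.Random(0)

def make_table(vocab):
    """One vector per feature token (random here, learned in a real model)."""
    return {token: [rng.gauss(0, 1) for _ in range(DIM)] for token in vocab}

# Hypothetical feature vocabularies for the two towers.
ITEM_FEATURES = make_table(["genre:scifi", "genre:comedy", "lang:en", "lang:fr"])
USER_FEATURES = make_table(["likes:scifi", "likes:comedy", "country:us", "country:fr"])

def tower(features, table):
    """Map a list of feature tokens to one fixed-length embedding."""
    vectors = [table[f] for f in features if f in table]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def score(user_features, item_features):
    """Dot product of the user embedding and the item embedding."""
    u = tower(user_features, USER_FEATURES)
    v = tower(item_features, ITEM_FEATURES)
    return sum(a * b for a, b in zip(u, v))

# A brand new item with no interactions still gets an embedding from its features.
new_item = ["genre:scifi", "lang:fr"]
print(tower(new_item, ITEM_FEATURES))
print(score(["likes:scifi", "country:fr"], new_item))
```

The point of the design is visible in the last lines: nothing about `new_item` had to be seen during training for the item tower to place it in the embedding space.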

At serving time, the item embeddings for the entire catalog (often hundreds of millions of items in industrial systems) are precomputed offline and stored in a vector index. When a user request arrives, the user tower computes a single user embedding on the fly, and the system uses approximate nearest neighbor search to find the top items in the embedding space. ^[10] This two-stage design of cheap retrieval from a corpus of billions followed by an expensive ranking pass was popularized by the 2016 Deep Neural Networks for YouTube Recommendations paper, and it is what makes two tower retrieval scale to massive catalogs without computing a score for every candidate. ^[10]
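The serving path can be sketched with a hand-written index of four item embeddings. The sketch below scans every item exhaustively, which is only a stand-in: a production system would replace the scan with an approximate nearest neighbor index. All names and vectors are invented for illustration.

```python
import heapq

# Hypothetical precomputed item embeddings (built offline).
item_index = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.6],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def retrieve(user_embedding, index, k):
    """Return the k item IDs with the highest dot-product score."""
    return heapq.nlargest(
        k, index, key=lambda item_id: dot(user_embedding, index[item_id])
    )

# User embedding computed on the fly by the user tower at request time.
user_embedding = [0.9, 0.1]
candidates = retrieve(user_embedding, item_index, k=2)
print(candidates)
```

The retrieved candidates would then be handed to the more expensive ranking model, which only has to score a few hundred items instead of the whole corpus.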

What is the item cold start problem?

Cold start refers to the difficulty of recommending an item that has no interaction history. A pure collaborative filtering algorithm cannot rank an item that nobody has rated yet, because its latent factors are undefined. ^[7] New items can sit invisible in the catalog for weeks before they collect enough signal to compete with established titles, which discourages catalog growth and creates a feedback loop where popular items keep getting more popular.

There are several common mitigations:

Content based features. If the item has metadata such as genre, brand, description, or thumbnail, a content based or hybrid model can score it from day one based on similarity to known items. ^[7]
Feature based embeddings. Two tower and other neural retrieval models predict an item embedding directly from features, so new items inherit the embedding space without needing interactions. ^[10]
Exploration. Many systems intentionally show new items to a small fraction of users to gather signal. Techniques range from epsilon greedy exploration to multi armed bandits and Thompson sampling.
Popularity priors. When the model is uncertain, defaulting to overall popularity or trending lists gives a reasonable fallback while signal accumulates.
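Of the mitigations above, exploration is the easiest to show in code. The sketch below is a bare epsilon-greedy rule with a hypothetical `pick_slot` function: with probability epsilon a recommendation slot goes to a random new item, otherwise to the model's top pick. Bandit methods such as Thompson sampling refine this by exploring in proportion to uncertainty.

```python
import random

def pick_slot(ranked_items, new_items, epsilon, rng):
    """Fill one slot: explore a new item with probability epsilon, else exploit."""
    if new_items and rng.random() < epsilon:
        return rng.choice(new_items)
    return ranked_items[0]

rng = random.Random(42)
ranked = ["hit_movie", "popular_show"]     # model's ranking of established items
fresh = ["new_release_1", "new_release_2"]  # items with no interaction history

shown = [pick_slot(ranked, fresh, epsilon=0.1, rng=rng) for _ in range(1000)]
explored = sum(1 for item in shown if item in fresh)
print(f"{explored} of 1000 impressions went to new items")
```

The share of impressions given to new items is controlled directly by `epsilon`, which makes the cost of exploration easy to reason about.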

Cold start also hits older, unpopular items in a subtler way. An old item with only a handful of ratings behaves a lot like a new item: there is not enough data to estimate its quality, so the model tends to under-recommend it, which keeps the data sparse, which keeps it under-recommended. This is sometimes called the long tail problem and is an active research area in recommender systems. ^[7]

What is the difference between implicit and explicit item signals?

Not all item interactions are equal. The recommender literature splits feedback into two big buckets:

Explicit feedback such as star ratings, thumbs up or down, and survey responses. These are clean signals about preference but they are rare because most users do not bother to rate.
Implicit feedback such as clicks, watch time, scroll depth, add to cart, purchase, and dwell time. These are abundant and noisy, since a click does not always mean love and a non click does not always mean dislike.

Most modern production systems are built around implicit feedback because there is so much more of it. Hu, Koren, and Volinsky's 2008 paper on collaborative filtering for implicit feedback is the canonical reference for handling the noise and confidence levels that come with these signals; it splits each observation into a binary preference and a confidence weight, and it won the 2017 IEEE ICDM 10-Year Highest-Impact Paper Award. ^[11] For items specifically, this means each item has many kinds of interaction data attached to it, and the model has to weigh them by reliability and intent.
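The split into preference and confidence can be written down directly. In the formulation of Hu, Koren, and Volinsky, an observation r (for example a play count) becomes a binary preference p = 1 if r > 0 else 0 and a confidence c = 1 + alpha * r. The sketch below assumes that form; the function names and the default `alpha` are placeholders to be tuned, and only the per-cell loss term is shown, not the full alternating least squares solver.

```python
def preference_and_confidence(observation, alpha=40.0):
    """Turn a raw implicit observation into (binary preference, confidence)."""
    preference = 1 if observation > 0 else 0
    confidence = 1.0 + alpha * observation
    return preference, confidence

def weighted_error(observation, predicted, alpha=40.0):
    """Confidence-weighted squared error for one user-item cell."""
    preference, confidence = preference_and_confidence(observation, alpha)
    return confidence * (preference - predicted) ** 2

print(preference_and_confidence(0))   # never interacted: weak evidence of dislike
print(preference_and_confidence(3))   # three plays: strong evidence of preference
print(weighted_error(3, predicted=0.5))
```

An unobserved cell still contributes to the loss, but with the minimum confidence of 1, which is how the model treats "no click" as weak rather than conclusive negative evidence.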

What are item to item recommendations?

A related but distinct task is recommending items given an item rather than given a user. This is what powers "people who viewed this also viewed" widgets on product pages, "more like this" sections at the bottom of articles, and end of video autoplay on YouTube. The job is to find items whose embeddings are close to the seed item, often with extra signals such as recent popularity or category constraints layered on top.
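A "more like this" lookup can be sketched as a nearest-neighbor search around the seed item with an optional category constraint layered on top. The catalog, embeddings, and the `more_like_this` helper below are invented for illustration.

```python
import math

# Hypothetical items with a category and a precomputed embedding.
items = {
    "tent": {"category": "camping", "vec": [1.0, 0.1]},
    "sleeping_bag": {"category": "camping", "vec": [0.9, 0.3]},
    "stove": {"category": "camping", "vec": [0.5, 0.8]},
    "novel": {"category": "books", "vec": [0.95, 0.15]},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def more_like_this(seed_id, k, same_category=True):
    """Rank other items by embedding similarity to the seed item."""
    seed = items[seed_id]
    candidates = [
        item_id for item_id, item in items.items()
        if item_id != seed_id
        and (not same_category or item["category"] == seed["category"])
    ]
    candidates.sort(key=lambda i: cosine(seed["vec"], items[i]["vec"]), reverse=True)
    return candidates[:k]

print(more_like_this("tent", k=2))
print(more_like_this("tent", k=2, same_category=False))
```

Without the constraint the closest embedding happens to belong to a different category, which is the kind of result the extra business rules are there to catch.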

Item to item recommendation is technically easier than personalized recommendation because the system does not need a user model. Amazon's original 1998 algorithm was an item to item algorithm, and Amazon's own 2003 paper credits this design with letting recommendations "scale to very large data sets." ^[2] Pinterest's related pins, YouTube's up next suggestions, and Spotify's song radio are all variations on the same pattern, although modern versions personalize the item to item results based on the viewer.

How are items used in large language model recommenders?

More recent work has explored using large language models as recommenders, where items are described in text and the model is asked to rank them in context. In these systems each item becomes a small description rather than just an ID, and the model reasons over candidate items in natural language. This blurs the line between content based and collaborative filtering, since the LLM uses its general knowledge of the world (which is implicit collaborative signal from training data) plus the item description (which is content) to score recommendations. Research papers from 2023 onward have explored this direction, and at least some streaming and shopping platforms are experimenting with hybrid systems that combine a classical retrieval model with an LLM reranker.
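To make the "items as text" idea concrete, the sketch below only assembles a reranking prompt from item descriptions. The prompt wording and the `build_rerank_prompt` helper are assumptions for illustration; no particular model, API, or published prompt format is implied, and the step of calling a model and parsing its answer is left out.

```python
def build_rerank_prompt(user_history, candidates):
    """Describe the user's history and the candidate items in plain text."""
    lines = ["A user recently enjoyed:"]
    lines += [f"- {title}" for title in user_history]
    lines.append("Rank the following candidates from most to least relevant.")
    lines.append("Answer with the candidate IDs only, comma separated.")
    for item_id, description in candidates.items():
        lines.append(f"[{item_id}] {description}")
    return "\n".join(lines)

# Hypothetical candidates, e.g. the output of a classical retrieval model.
candidates = {
    "m1": "Sci-fi thriller about a stranded astronaut",
    "m2": "Romantic comedy set in Paris",
    "m3": "Documentary on deep-sea exploration",
}
prompt = build_rerank_prompt(["The Martian", "Interstellar"], candidates)
print(prompt)
```

Keeping stable item IDs in the prompt matters in a hybrid setup, because the reranked answer has to be mapped back to catalog entries.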

Why do catalog quality and item governance matter?

None of this works if the catalog is broken. Practitioners spend a surprising amount of time on item governance: deduplicating products that have multiple SKUs for what is really the same thing, handling translations and locale variants, cleaning up bad images and missing descriptions, and deciding which items should be eligible for recommendation in the first place. ^[12] Adult content, age restricted items, region locked titles, and items violating policy all need to be flagged at the item level so the recommender can filter them.

Item level metadata also drives diversity and fairness adjustments. If the model would otherwise pile up recommendations from a single brand, category, or creator, post processing logic often reranks the top results to spread coverage across more items. This requires that the catalog carry the structural information (category trees, brand IDs, creator IDs) needed for the reranker to know what to balance.
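One simple form of such post-processing is a greedy cap on how many results may share a brand. The sketch below assumes a ranked list and a brand lookup, both invented, and uses a hypothetical `rerank_with_cap` helper; real systems often use softer trade-offs between relevance and diversity.

```python
def rerank_with_cap(ranked, brand_of, max_per_brand, k):
    """Walk the ranking in order, deferring items whose brand is already at the cap."""
    counts = {}
    picked, deferred = [], []
    for item in ranked:
        brand = brand_of[item]
        if counts.get(brand, 0) < max_per_brand:
            picked.append(item)
            counts[brand] = counts.get(brand, 0) + 1
        else:
            deferred.append(item)
        if len(picked) == k:
            break
    # Backfill with deferred items if the cap left too few results.
    return (picked + deferred)[:k]

ranked = ["a1", "a2", "a3", "b1", "c1"]   # model output, best first
brand_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "c1": "C"}

print(rerank_with_cap(ranked, brand_of, max_per_brand=1, k=3))
```

The reranker can only do this because every item carries a brand ID, which is the dependency on catalog structure described above.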

References

1. Sarwar, B., Karypis, G., Konstan, J., Riedl, J. (2001). Item-Based Collaborative Filtering Recommendation Algorithms. Proceedings of the 10th International Conference on World Wide Web, 285 to 295. https://files.grouplens.org/papers/www10_sarwar.pdf
2. Linden, G., Smith, B., York, J. (2003). Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1), 76 to 80. https://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf
3. Smith, B., Linden, G. (2017). Two Decades of Recommender Systems at Amazon.com. IEEE Internet Computing. https://assets.amazon.science/76/9e/7eac89c14a838746e91dde0a5e9f/two-decades-of-recommender-systems-at-amazon.pdf
4. Wikipedia: Item-item collaborative filtering. https://en.wikipedia.org/wiki/Item-item_collaborative_filtering
5. Wikipedia: Matrix factorization (recommender systems). https://en.wikipedia.org/wiki/Matrix_factorization_(recommender_systems)
6. Wikipedia: Recommender system. https://en.wikipedia.org/wiki/Recommender_system
7. Wikipedia: Cold start (recommender systems). https://en.wikipedia.org/wiki/Cold_start_(recommender_systems)
8. Google Machine Learning Crash Course: Collaborative filtering. https://developers.google.com/machine-learning/recommendation/collaborative/basics
9. Google Machine Learning Crash Course: Matrix factorization. https://developers.google.com/machine-learning/recommendation/collaborative/matrix
10. Google Cloud: Scaling deep retrieval with TensorFlow two towers architecture. https://cloud.google.com/blog/products/ai-machine-learning/scaling-deep-retrieval-tensorflow-two-towers-architecture
11. Hu, Y., Koren, Y., Volinsky, C. (2008). Collaborative Filtering for Implicit Feedback Datasets. IEEE International Conference on Data Mining, 263 to 272. http://yifanhu.net/PUB/cf.pdf
12. Microsoft Learn: Catalog data entities for Intelligent Recommendations. https://learn.microsoft.com/en-us/industry/retail/intelligent-recommendations/catalog-data-entity
13. Google Machine Learning: Recommendation systems terminology. https://developers.google.com/machine-learning/recommendation/overview/terminology
