# Items

> Source: https://aiwiki.ai/wiki/items
> Updated: 2026-05-11
> Categories: Machine Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [Machine learning terms](/wiki/machine_learning_terms), [Machine learning terms/Recommendation Systems](/wiki/machine_learning_terms_recommendation_systems)*

In [recommendation systems](/wiki/recommender_system), **items** are the entities that the system recommends to users. The word covers anything a service might surface to a person browsing a feed or a catalog: movies on Netflix, products on Amazon, songs on Spotify, videos on YouTube, articles on a news site, restaurants on Yelp, profiles on a dating app, ride routes on Uber, friends on Facebook, or jobs on LinkedIn. Items and [users](/wiki/users) are the two foundational entities in any recommender, and most of the math in the field is about learning how the two relate.

The term sounds generic on purpose. A recommender works the same way whether it is ranking 50 million products or 200 podcasts, so the literature uses "item" as a placeholder for whatever is being recommended. The first paper to formalize item-based collaborative filtering, by Sarwar and colleagues at the University of Minnesota in 2001, deliberately framed everything in terms of items and users so the algorithm would carry across domains.

## Items vs users

A recommender system has two sides. On one side there are users, the people asking for suggestions, and on the other there are items, the things that can be suggested. Almost every dataset in the field follows this structure: a user identifier, an item identifier, and some signal that links them, such as a rating, a click, a purchase, a watch time, or a like.

This is usually arranged as a [user matrix](/wiki/user_matrix) and an [item matrix](/wiki/item_matrix), or together as a single user-item interaction matrix where rows are users and columns are items. Each cell holds the interaction between one user and one item, if any exists. Most cells are empty because any given user has only ever touched a tiny fraction of the catalog, which is the well-known sparsity problem in recommender systems.
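As a minimal sketch of that structure, the snippet below builds a sparse user-item mapping from a made-up interaction log and measures how empty it is. The IDs, ratings, and function names are illustrative assumptions, not part of any particular dataset or library.

```python
from collections import defaultdict

# Hypothetical interaction log: (user_id, item_id, signal), here a star rating.
interactions = [
    ("u1", "i1", 5.0),
    ("u1", "i3", 3.0),
    ("u2", "i2", 4.0),
    ("u3", "i1", 2.0),
    ("u3", "i4", 5.0),
]

def build_interaction_matrix(rows):
    """Return a sparse user -> {item -> signal} mapping plus sorted ID lists."""
    matrix = defaultdict(dict)
    for user_id, item_id, signal in rows:
        matrix[user_id][item_id] = signal
    users = sorted(matrix)
    items = sorted({item for row in matrix.values() for item in row})
    return dict(matrix), users, items

def sparsity(matrix, users, items):
    """Fraction of user-item cells that hold no interaction."""
    filled = sum(len(row) for row in matrix.values())
    return 1.0 - filled / (len(users) * len(items))

matrix, users, items = build_interaction_matrix(interactions)
print(f"{len(users)} users x {len(items)} items, "
      f"sparsity = {sparsity(matrix, users, items):.2f}")
```

Only the observed cells are stored, which is how real systems cope with matrices where the vast majority of cells are empty.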

The asymmetry matters for design choices. In most consumer applications the number of users grows faster than the number of items, and item profiles change less often than user profiles. A movie does not develop new tastes overnight; a viewer might. That stability is one of the reasons Amazon moved from user-based to item-based collaborative filtering in 1998, since item similarity scores could be precomputed offline and reused for many users.

## Item identifiers and the catalog

Every item in a recommender starts with a unique item ID. This is the primary key that ties together every signal the system observes about that item, including ratings, clicks, view counts, inventory, price, category, and any text or image describing it. Catalog management is the unglamorous foundation of any production recommender. If item IDs are inconsistent, duplicated, or assigned sloppily, the model cannot learn coherent representations.

Most real catalogs are messier than the tidy MovieLens or Netflix datasets used in papers. An e-commerce site might have item masters with many variants, such as a shirt that comes in different colors and sizes, each with its own SKU but sharing parent metadata. Microsoft's Intelligent Recommendations system, for example, distinguishes between standalone items and item masters with variants, and lets operators attach filter values such as category, color, or size to control what is eligible for recommendation. Items also have availability windows: an out-of-stock product or an expired listing usually should not be recommended even if the model would otherwise pick it.
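One way to picture these catalog rules is a small record type with a parent link, filter values, and an availability window. The sketch below is an assumption made for illustration: the `CatalogItem` fields and the `is_eligible` helper are hypothetical and do not reproduce the schema of Microsoft's system or any other product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class CatalogItem:
    item_id: str                            # unique SKU-level identifier
    title: str
    parent_id: Optional[str] = None         # item master shared by variants
    filters: dict = field(default_factory=dict)  # e.g. category, color, size
    in_stock: bool = True
    available_from: Optional[date] = None
    available_until: Optional[date] = None

def is_eligible(item, today, required_filters=None):
    """Return True if the item may be recommended on the given day."""
    if not item.in_stock:
        return False
    if item.available_from and today < item.available_from:
        return False
    if item.available_until and today > item.available_until:
        return False
    # Every requested filter value must match the item's own value.
    for key, value in (required_filters or {}).items():
        if item.filters.get(key) != value:
            return False
    return True

catalog = [
    CatalogItem("sku-red-m", "Shirt, red, M", parent_id="shirt-1",
                filters={"color": "red", "size": "M"}),
    CatalogItem("sku-blue-m", "Shirt, blue, M", parent_id="shirt-1",
                filters={"color": "blue", "size": "M"}, in_stock=False),
    CatalogItem("promo-1", "Holiday bundle",
                available_until=date(2025, 12, 31)),
]

today = date(2026, 1, 15)
eligible_ids = [item.item_id for item in catalog if is_eligible(item, today)]
print(eligible_ids)
```

Running the eligibility check after the model has scored candidates keeps business rules out of the learned representation, so a stock change does not require retraining.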

Alongside IDs, items carry features. These come from several places:

| Source | Examples |
|---|---|
| Structured metadata | category, brand, price, release year, director, language, duration |
| Text | title, description, tags, user reviews, transcripts |
| Media | product photos, album art, video frames, audio waveforms |
| Behavior | aggregate click rate, purchase rate, average rating, dwell time |

A recommender can use any subset of these depending on the algorithm.

## Item representations

The heart of modern recommendation is learning a good representation of each item. The early approaches were simple. In [content-based filtering](/wiki/content_based_filtering), each item is a sparse vector of its features, such as a [TF-IDF](/wiki/tf-idf) vector over its description, and the system recommends items similar to ones the user already liked. The advantage is that brand-new items can be scored from day one because their features exist before any user has interacted with them. The disadvantage is that recommendations are only as good as the metadata.
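A minimal sketch of the content-based idea, using only the standard library: each description becomes a sparse TF-IDF vector, and similarity between items is the cosine of their vectors. The three descriptions are made up, tokenization is a plain whitespace split, and the weighting is the textbook `tf * log(N / df)` form, which is one of several common variants.

```python
import math
from collections import Counter

# Hypothetical item descriptions keyed by item ID.
descriptions = {
    "m1": "space adventure with robots",
    "m2": "robots fight in space",
    "m3": "romantic comedy in paris",
}

def tfidf_vectors(docs):
    """Map each item ID to a sparse {term: weight} TF-IDF vector."""
    tokenized = {item_id: text.lower().split() for item_id, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    vectors = {}
    for item_id, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[item_id] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = tfidf_vectors(descriptions)
for other in ("m2", "m3"):
    print("m1 vs", other, round(cosine(vectors["m1"], vectors[other]), 3))
```

Because the vectors depend only on the text, a new description can be scored against the catalog immediately, with no interaction history.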

[Collaborative filtering](/wiki/collaborative_filtering) takes the opposite approach. It treats items as anonymous IDs and learns about them only through the pattern of interactions they receive. Two early variants dominated the field:

- User-based collaborative filtering finds users who behave like you and recommends what they liked.
- Item-based collaborative filtering, invented at Amazon in 1998 and formalized by Sarwar, Karypis, Konstan, and Riedl in 2001, finds items similar to ones you already liked. Item similarity is computed from the pattern of users who rated both items, typically using [cosine similarity](/wiki/cosine_similarity) or Pearson correlation. Amazon's Linden, Smith, and York described the production version in IEEE Internet Computing in 2003. The technique scaled because item-to-item similarities are stable and can be precomputed offline.
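The item-based variant can be sketched in a few lines: turn the ratings into one vector per item, indexed by user, and compare items with cosine similarity. The ratings below are invented, and missing ratings are treated as zero, which is the plainest form of the cosine measure; adjusted cosine and Pearson correlation are common alternatives.

```python
import math

# Hypothetical ratings: user -> {item -> rating}.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob":   {"A": 4, "B": 5, "C": 1},
    "carol": {"B": 2, "C": 5},
}

def item_vectors(user_ratings):
    """Invert user -> item ratings into item -> {user -> rating} vectors."""
    vectors = {}
    for user, row in user_ratings.items():
        for item, rating in row.items():
            vectors.setdefault(item, {})[user] = float(rating)
    return vectors

def cosine(a, b):
    """Cosine similarity; users missing from a vector count as zero."""
    dot = sum(r * b.get(user, 0.0) for user, r in a.items())
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_items(vectors, seed, k=2):
    """Rank every other item by similarity to the seed item."""
    scores = [(other, cosine(vectors[seed], vec))
              for other, vec in vectors.items() if other != seed]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

vectors = item_vectors(ratings)
print(similar_items(vectors, "A"))
```

In a production setting this similarity table would be computed offline for the whole catalog and looked up at request time.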

## Item embeddings

The deeper move, and the one that defines current practice, is to compress each item into a dense low-dimensional vector called an item [embedding](/wiki/embeddings). Instead of storing one row per item with thousands of sparse features, the system learns a vector with maybe 32 to 512 numbers that captures the item's position in some latent taste space.

[Matrix factorization](/wiki/matrix_factorization) made this idea famous during the [Netflix Prize](/wiki/netflix_prize) competition that ran from 2006 to 2009. Simon Funk published a blog post in 2006 describing a simple stochastic gradient descent algorithm that decomposed the Netflix rating matrix into two smaller matrices: one for users and one for items. Each item ended up as a vector of latent factors, and the predicted rating for a user-item pair was just the dot product of their two vectors. Funk's approach, often called Funk SVD, became the backbone of the winning Netflix Prize entries and the model template for most of the recommender literature that followed.
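The following is a simplified sketch in the spirit of that approach, not a reproduction of Funk's code: it learns user and item factor vectors by stochastic gradient descent on the squared error of the dot-product prediction, with L2 regularization. The ratings, hyperparameters, and function names are assumptions, all factors are updated together for brevity, and bias terms are omitted.

```python
import math
import random

# Hypothetical (user, item, rating) triples.
ratings = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0),
    ("u2", "i1", 4.0), ("u2", "i3", 1.0),
    ("u3", "i2", 2.0), ("u3", "i3", 5.0),
]

def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))

def train(data, n_factors=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q with plain SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in data})
    items = sorted({i for _, i, _ in data})
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for user, item, rating in data:
            err = rating - predict(P, Q, user, item)
            for f in range(n_factors):
                pu, qi = P[user][f], Q[item][f]
                # Gradient step on squared error with L2 regularization.
                P[user][f] += lr * (err * qi - reg * pu)
                Q[item][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(data, P, Q):
    """Root mean squared error of the predictions on the given ratings."""
    errors = [(r - predict(P, Q, u, i)) ** 2 for u, i, r in data]
    return math.sqrt(sum(errors) / len(errors))

P, Q = train(ratings)
print("training RMSE:", round(rmse(ratings, P, Q), 3))
print("item embedding for i1:", [round(x, 2) for x in Q["i1"]])
```

After training, each entry of `Q` is the item's embedding, and `n_factors` is the number of latent dimensions.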

The number of latent dimensions is a tuning knob. Too few and the model cannot capture subtle differences between items. Too many and the model overfits and starts memorizing noise. Typical values for production systems sit in the low hundreds.

## Two-tower models

Classical matrix factorization has a hard limitation: it only knows about item IDs. If a brand-new movie appears in the catalog tomorrow, matrix factorization has no row for it because no user has rated it yet. The fix is to let a neural network compute item embeddings from features instead of looking them up from a static table. That is the core idea of the [two-tower model](/wiki/two-tower_model), now standard at YouTube, Google, Pinterest, Spotify, Meta, and many other platforms.

In a two-tower architecture, one neural network (the item tower) reads in raw item features such as ID, category, description, image features, and tags, then outputs a fixed-length item embedding. A second neural network (the user tower) does the same for user features. The score for a user-item pair is the dot product of the two embeddings. Because the item tower can use any features, a brand-new item can be embedded the moment it is added to the catalog. This is one of the cleanest fixes for the item [cold start](/wiki/cold_start) problem.
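The sketch below shows only the shape of that computation, under heavy simplifying assumptions: each "tower" is a single untrained layer that hashes `name=value` features into a table of random vectors and averages them. Real towers are multi-layer networks trained on interaction data; the feature names, table size, and embedding width here are all hypothetical.

```python
import random
import zlib

DIM = 8           # embedding width (assumed)
N_BUCKETS = 1000  # size of the hashed feature table (assumed)

def make_tower(seed):
    """An untrained one-layer tower: a table of random feature vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_BUCKETS)]

def embed(tower, features):
    """Average the table rows selected by hashing each 'name=value' feature."""
    vec = [0.0] * DIM
    if not features:
        return vec
    for name, value in features.items():
        # crc32 is stable across runs, unlike Python's built-in hash().
        bucket = zlib.crc32(f"{name}={value}".encode()) % N_BUCKETS
        for d in range(DIM):
            vec[d] += tower[bucket][d]
    return [x / len(features) for x in vec]

def score(user_vec, item_vec):
    """Two-tower score: dot product of the user and item embeddings."""
    return sum(u * i for u, i in zip(user_vec, item_vec))

item_tower = make_tower(seed=1)
user_tower = make_tower(seed=2)

# A brand-new item: no interactions yet, but its features already exist.
new_item = {"category": "sci-fi", "language": "en", "release_year": 2026}
user = {"country": "DE", "favorite_category": "sci-fi"}

item_vec = embed(item_tower, new_item)
user_vec = embed(user_tower, user)
print("score for the new item:", round(score(user_vec, item_vec), 3))
```

The point of the sketch is that `embed` never consults an interaction history, so an item can be placed in the embedding space as soon as its features are known.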

At serving time, the item embeddings for the entire catalog (often hundreds of millions of items in industrial systems) are precomputed offline and stored in a vector index. When a user request arrives, the user tower computes a single user embedding on the fly, and the system uses approximate nearest neighbor search to find the top items in the embedding space. This is what makes two-tower retrieval scale to massive catalogs without computing a score for every candidate.
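To make the serving path concrete, here is a sketch with a precomputed index of random item embeddings and an exact top-k scan. The exact scan is a stand-in: it scores every item, which is precisely the cost that an approximate nearest neighbor index avoids, but it returns the answer an approximate index tries to match. All sizes and names are assumptions.

```python
import heapq
import random

DIM = 4
rng = random.Random(0)

# Stand-in for the offline step: one precomputed embedding per catalog item.
item_index = {
    f"item_{n}": [rng.uniform(-1, 1) for _ in range(DIM)]
    for n in range(1000)
}

def retrieve(user_embedding, index, k=5):
    """Exact top-k by dot product, returned as (score, item_id) pairs."""
    def dot(vec):
        return sum(u * v for u, v in zip(user_embedding, vec))
    return heapq.nlargest(k, ((dot(vec), item_id) for item_id, vec in index.items()))

# Stand-in for the online step: the user tower output for one request.
user_embedding = [0.5, -0.2, 0.1, 0.9]

for item_score, item_id in retrieve(user_embedding, item_index):
    print(item_id, round(item_score, 3))
```

Swapping `retrieve` for a lookup in an approximate index changes the cost per request without changing the interface: one user vector in, a short list of item IDs out.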

## The item cold start problem

Cold start refers to the difficulty of recommending an item that has no interaction history. A pure collaborative filtering algorithm cannot rank an item that nobody has rated yet, because its latent factors are undefined. New items can sit invisible in the catalog for weeks before they collect enough signal to compete with established titles, which discourages catalog growth and creates a feedback loop where popular items keep getting more popular.

There are several common mitigations:

- Content-based features. If the item has metadata such as genre, brand, description, or thumbnail, a content-based or hybrid model can score it from day one based on similarity to known items.
- Feature-based embeddings. Two-tower and other neural retrieval models predict an item embedding directly from features, so new items inherit the embedding space without needing interactions.
- Exploration. Many systems intentionally show new items to a small fraction of users to gather signal. Techniques range from epsilon-greedy exploration to multi-armed bandits and Thompson sampling.
- Popularity priors. When the model is uncertain, defaulting to overall popularity or trending lists gives a reasonable fallback while signal accumulates.
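Of the mitigations above, exploration is the easiest to sketch. The function below fills a slate slot by slot: with probability `epsilon` a slot goes to a randomly chosen new item, otherwise to the next item in the model's ranking. The function name, the slate size, and the 10% default are illustrative assumptions, not values taken from any production system.

```python
import random

def pick_slate(ranked_items, new_items, slate_size=5, epsilon=0.1, rng=random):
    """Epsilon-greedy slate: mostly exploit the ranking, sometimes explore."""
    slate = []
    ranked = iter(ranked_items)
    pool = list(new_items)
    while len(slate) < slate_size:
        if pool and rng.random() < epsilon:
            # Explore: give this slot to a random cold-start item.
            choice = pool.pop(rng.randrange(len(pool)))
        else:
            # Exploit: take the next best item according to the model.
            choice = next(ranked, None)
            if choice is None:
                if not pool:
                    break  # nothing left to show
                choice = pool.pop(rng.randrange(len(pool)))
        if choice not in slate:
            slate.append(choice)
    return slate

ranked_items = ["hit_1", "hit_2", "hit_3", "hit_4", "hit_5", "hit_6"]
new_items = ["new_a", "new_b"]
print(pick_slate(ranked_items, new_items, rng=random.Random(42)))
```

Bandit methods such as Thompson sampling replace the fixed `epsilon` with a probability that shrinks as evidence about each new item accumulates.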

Cold start also hits older, unpopular items, in a subtler way. An old item with only a handful of ratings behaves a lot like a new item: there is not enough data to estimate its quality, and the model tends to under-recommend it, which keeps the data sparse, which keeps it under-recommended. This is sometimes called the long tail problem and is one of the active research areas in recommender systems.

## Implicit and explicit item signals

Not all item interactions are equal. The recommender literature splits feedback into two big buckets:

- Explicit feedback such as star ratings, thumbs up or down, and survey responses. These are clean signals about preference but they are rare because most users do not bother to rate.
- Implicit feedback such as clicks, watch time, scroll depth, add-to-cart events, purchases, and dwell time. These are abundant and noisy, since a click does not always mean love and a non-click does not always mean dislike.

Most modern production systems are built around implicit feedback because there is so much more of it. Hu, Koren, and Volinsky's 2008 paper on collaborative filtering for implicit feedback is the canonical reference for handling the noise and confidence levels that come with these signals. For items specifically, this means each item has many kinds of interaction data attached to it, and the model has to weigh them by reliability and intent.
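The core idea of that paper can be sketched briefly: a raw observation count `r` is split into a binary preference (1 if `r > 0`, else 0) and a confidence `c = 1 + alpha * r`, so that repeated interactions raise certainty rather than the preference itself. The paper reports `alpha = 40` working well in its experiments. The per-event weights used below to combine different signal types are a made-up assumption and are not from the paper.

```python
# Hypothetical weights for turning mixed events into one raw strength value.
EVENT_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 5.0}

def interaction_strength(event_counts):
    """Combine event counts for one user-item pair into a raw value r."""
    return sum(EVENT_WEIGHTS[event] * count for event, count in event_counts.items())

def preference_and_confidence(raw_strength, alpha=40.0):
    """Binary preference p and confidence c = 1 + alpha * r."""
    preference = 1.0 if raw_strength > 0 else 0.0
    confidence = 1.0 + alpha * raw_strength
    return preference, confidence

observed = {"click": 2, "purchase": 1}
r = interaction_strength(observed)
print("r =", r, "-> (p, c) =", preference_and_confidence(r))
print("no interaction -> (p, c) =", preference_and_confidence(0.0))
```

An unobserved pair still gets the minimum confidence of 1, which lets the model treat "no interaction" as weak evidence of no preference instead of ignoring it.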

## Item-to-item recommendations

A related but distinct task is recommending items given an item rather than given a user. This is what powers "people who viewed this also viewed" widgets on product pages, "more like this" sections at the bottom of articles, and end-of-video autoplay on YouTube. The job is to find items whose embeddings are close to the seed item, often with extra signals such as recent popularity or category constraints layered on top.
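A minimal sketch of a "more like this" lookup, assuming item embeddings already exist: rank other items by cosine similarity to the seed and optionally keep only those in the seed's category. The two-dimensional embeddings and category labels are invented for illustration.

```python
import math

# Hypothetical precomputed item embeddings and categories.
embeddings = {
    "tent":         [0.9, 0.1],
    "sleeping_bag": [0.8, 0.3],
    "stove":        [0.7, 0.2],
    "novel":        [0.1, 0.9],
}
category = {
    "tent": "camping", "sleeping_bag": "camping",
    "stove": "camping", "novel": "books",
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def more_like_this(seed, k=2, same_category=True):
    """Return the k items closest to the seed, excluding the seed itself."""
    candidates = [
        item for item in embeddings
        if item != seed
        and (not same_category or category[item] == category[seed])
    ]
    candidates.sort(key=lambda item: cosine(embeddings[seed], embeddings[item]),
                    reverse=True)
    return candidates[:k]

print(more_like_this("tent"))
print(more_like_this("tent", k=3, same_category=False))
```

No user information enters the function, which is why the results for a given seed can be cached and served to every visitor.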

Item-to-item recommendation is technically easier than personalized recommendation because the system does not need a user model. Amazon's original 1998 algorithm was an item-to-item algorithm. Pinterest's related pins, YouTube's up-next suggestions, and Spotify's song radio are all variations on the same pattern, although modern versions personalize the item-to-item results based on the viewer.

## Items in large language model recommenders

More recent work has explored using [large language models](/wiki/large_language_model) as recommenders, where items are described in text and the model is asked to rank them in context. In these systems each item becomes a small description rather than just an ID, and the model reasons over candidate items in natural language. This blurs the line between content-based and collaborative filtering, since the LLM uses its general knowledge of the world (which is implicit collaborative signal from training data) plus the item description (which is content) to score recommendations. Research papers from 2023 onward have explored this direction, and at least some streaming and shopping platforms are experimenting with hybrid systems that combine a classical retrieval model with an LLM reranker.

## Catalog quality and item governance

None of this works if the catalog is broken. Practitioners spend a surprising amount of time on item governance: deduplicating products that have multiple SKUs for what is really the same thing, handling translations and locale variants, cleaning up bad images and missing descriptions, and deciding which items should be eligible for recommendation in the first place. Adult content, age-restricted items, region-locked titles, and items violating policy all need to be flagged at the item level so the recommender can filter them.

Item-level metadata also drives diversity and fairness adjustments. If the model would otherwise pile up recommendations from a single brand, category, or creator, post-processing logic often reranks the top results to spread coverage across more items. This requires that the catalog carry the structural information (category trees, brand IDs, creator IDs) needed for the reranker to know what to balance.
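One simple form of such post-processing is a per-group cap: walk the ranked list in order, keep at most a fixed number of items from any one brand, and backfill with the skipped items only if the slate would otherwise be short. This is a sketch of one possible rule, with hypothetical names and limits; production rerankers often use more elaborate objectives.

```python
from collections import Counter

def rerank_with_cap(ranked_item_ids, group_of, max_per_group=2, k=5):
    """Greedy rerank that limits how many items one group may contribute."""
    counts = Counter()
    selected, deferred = [], []
    for item_id in ranked_item_ids:
        group = group_of[item_id]
        if counts[group] < max_per_group:
            selected.append(item_id)
            counts[group] += 1
        else:
            deferred.append(item_id)  # over the cap, keep as a fallback
        if len(selected) == k:
            break
    # Backfill with capped items if there were too few diverse candidates.
    selected.extend(deferred[: k - len(selected)])
    return selected

# Hypothetical ranking and brand lookup taken from catalog metadata.
ranked = ["a1", "a2", "a3", "b1", "a4", "c1"]
brand = {"a1": "A", "a2": "A", "a3": "A", "a4": "A", "b1": "B", "c1": "C"}

print(rerank_with_cap(ranked, brand, max_per_group=2, k=4))
```

The `group_of` lookup is where the catalog's structural metadata enters: without a reliable brand, category, or creator ID per item, the reranker has nothing to balance on.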

## References

1. Sarwar, B., Karypis, G., Konstan, J., Riedl, J. (2001). Item-Based Collaborative Filtering Recommendation Algorithms. Proceedings of the 10th International Conference on World Wide Web, 285-295. https://files.grouplens.org/papers/www10_sarwar.pdf
2. Linden, G., Smith, B., York, J. (2003). Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1), 76-80. https://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf
3. Smith, B., Linden, G. (2017). Two Decades of Recommender Systems at Amazon.com. IEEE Internet Computing. https://assets.amazon.science/76/9e/7eac89c14a838746e91dde0a5e9f/two-decades-of-recommender-systems-at-amazon.pdf
4. Wikipedia: Item-item collaborative filtering. https://en.wikipedia.org/wiki/Item-item_collaborative_filtering
5. Wikipedia: Matrix factorization (recommender systems). https://en.wikipedia.org/wiki/Matrix_factorization_(recommender_systems)
6. Wikipedia: Recommender system. https://en.wikipedia.org/wiki/Recommender_system
7. Wikipedia: Cold start (recommender systems). https://en.wikipedia.org/wiki/Cold_start_(recommender_systems)
8. Google Machine Learning Crash Course: Collaborative filtering. https://developers.google.com/machine-learning/recommendation/collaborative/basics
9. Google Machine Learning Crash Course: Matrix factorization. https://developers.google.com/machine-learning/recommendation/collaborative/matrix
10. Google Cloud: Scaling deep retrieval with TensorFlow two towers architecture. https://cloud.google.com/blog/products/ai-machine-learning/scaling-deep-retrieval-tensorflow-two-towers-architecture
11. Hu, Y., Koren, Y., Volinsky, C. (2008). Collaborative Filtering for Implicit Feedback Datasets. IEEE International Conference on Data Mining.
12. Microsoft Learn: Catalog data entities for Intelligent Recommendations. https://learn.microsoft.com/en-us/industry/retail/intelligent-recommendations/catalog-data-entity

