Items
Last reviewed
May 11, 2026
Sources
12 citations
Review status
Source-backed
Revision
v2 · 2,323 words
See also: Machine learning terms, Machine learning terms/Recommendation Systems
In recommendation systems, items are the entities that the system recommends to users. The word covers anything a service might surface to a person browsing a feed or a catalog: movies on Netflix, products on Amazon, songs on Spotify, videos on YouTube, articles on a news site, restaurants on Yelp, profiles on a dating app, ride routes on Uber, friends on Facebook, or jobs on LinkedIn. Items and users are the two foundational entities in any recommender, and most of the math in the field is about learning how the two relate.
The term is deliberately generic. A recommender works the same way whether it is ranking 50 million products or 200 podcasts, so the literature uses "item" as a placeholder for whatever is being recommended. Sarwar and colleagues at the University of Minnesota, whose 2001 paper was one of the first to formalize item-based collaborative filtering, framed everything in terms of users and items so the algorithm would carry across domains.
A recommender system has two sides. On one side there are users, the people asking for suggestions, and on the other there are items, the things that can be suggested. Almost every dataset in the field follows this structure: a user identifier, an item identifier, and some signal that links them, such as a rating, a click, a purchase, a watch time, or a like.
This data is often kept as separate user and item tables plus an interaction log, or combined into a single user-item interaction matrix in which rows are users and columns are items. Each cell holds the interaction between one user and one item, if any exists. Most cells are empty because any given user has touched only a tiny fraction of the catalog; this is the well-known sparsity problem in recommender systems.
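Because most cells are empty, systems rarely materialize the full matrix. The sketch below is a minimal illustration, assuming a hypothetical log of `(user_id, item_id, signal)` triples: it stores only observed cells in a dict of dicts and computes the fraction of the grid that is empty. Real catalogs are far sparser than this toy example.

```python
from collections import defaultdict

# Hypothetical interaction log: (user_id, item_id, signal) triples.
interactions = [
    ("u1", "movie_a", 5.0),
    ("u1", "movie_b", 3.0),
    ("u2", "movie_a", 4.0),
    ("u3", "movie_c", 1.0),
]


def build_matrix(triples):
    """Store only observed cells as user -> {item: signal}; empty cells cost nothing."""
    matrix = defaultdict(dict)
    for user, item, signal in triples:
        matrix[user][item] = signal
    return dict(matrix)


def sparsity(matrix):
    """Fraction of the full users x items grid that holds no interaction."""
    n_users = len(matrix)
    n_items = len({item for row in matrix.values() for item in row})
    observed = sum(len(row) for row in matrix.values())
    return 1.0 - observed / (n_users * n_items)


matrix = build_matrix(interactions)
print(f"sparsity = {sparsity(matrix):.3f}")  # 4 of 9 cells observed -> 0.556
```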
The asymmetry matters for design choices. In most consumer applications the number of users grows faster than the number of items, and item profiles change less often than user profiles. A movie does not develop new tastes overnight; a viewer might. That stability is one of the reasons Amazon moved from user-based to item-based collaborative filtering in 1998: item similarity scores could be precomputed offline and reused across many users.
Every item in a recommender starts with a unique item ID. This is the primary key that ties together every signal the system observes about that item, including ratings, clicks, view counts, inventory, price, category, and any text or image describing it. Catalog management is the unglamorous foundation of any production recommender. If item IDs are inconsistent, duplicated, or assigned sloppily, the model cannot learn coherent representations.
Most real catalogs are messier than the tidy MovieLens or Netflix datasets used in papers. An e-commerce site might have item masters with many variants, such as a shirt that comes in different colors and sizes, each with its own SKU but sharing parent metadata. Microsoft's Intelligent Recommendations system, for example, distinguishes between standalone items and item masters with variants, and lets operators attach filter values such as category, color, or size to control what is eligible for recommendation. Items also have availability windows: an out-of-stock product or an expired listing usually should not be recommended even if the model would otherwise pick it.
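Eligibility is usually checked as a hard filter before or after scoring. The sketch below assumes a hypothetical `CatalogItem` record; its field names are illustrative and are not the schema of Intelligent Recommendations or any other product. It combines the three checks the paragraph describes: stock, availability window, and operator filter values.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CatalogItem:
    """Hypothetical catalog record; field names are illustrative only."""
    item_id: str
    in_stock: bool
    available_from: datetime
    available_until: datetime
    attributes: dict = field(default_factory=dict)  # e.g. {"color": "red", "size": "M"}


def is_eligible(item, now, required=None):
    """Return True if the item may be recommended at time `now`."""
    if not item.in_stock:
        return False  # out-of-stock products are excluded
    if not (item.available_from <= now < item.available_until):
        return False  # outside the listing's availability window
    # Every operator-supplied filter value (category, color, size, ...) must match.
    return all(item.attributes.get(k) == v for k, v in (required or {}).items())


now = datetime(2026, 5, 1)
shirt = CatalogItem("shirt-red-m", True, datetime(2026, 1, 1), datetime(2027, 1, 1),
                    {"category": "shirts", "color": "red", "size": "M"})
expired = CatalogItem("promo-bundle", True, datetime(2025, 1, 1), datetime(2025, 6, 1))
print(is_eligible(shirt, now, {"color": "red"}))   # True
print(is_eligible(shirt, now, {"color": "blue"}))  # False
print(is_eligible(expired, now))                   # False
```

Keeping eligibility separate from the model means a stock change takes effect immediately, without retraining.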
Alongside IDs, items carry features. These come from several places:
| Source | Examples |
|---|---|
| Structured metadata | category, brand, price, release year, director, language, duration |
| Text | title, description, tags, user reviews, transcripts |
| Media | product photos, album art, video frames, audio waveforms |
| Behavior | aggregate click rate, purchase rate, average rating, dwell time |
A recommender can use any subset of these depending on the algorithm.
The heart of modern recommendation is learning a good representation of each item. Early approaches were simple. In content-based filtering, each item is represented as a sparse feature vector, such as a TF-IDF vector over its description, and the system recommends items similar to ones the user already liked. The advantage is that brand-new items can be scored from day one, because their features exist before any user has interacted with them. The disadvantage is that recommendations are only as good as the metadata.
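A minimal content-based sketch, assuming hypothetical item descriptions and plain whitespace tokenization: it builds sparse TF-IDF vectors as `{term: weight}` dicts, then ranks other items by cosine similarity to one the user liked. Production systems would use a proper tokenizer and a library vectorizer, whose IDF smoothing may differ from the unsmoothed `log(N / df)` used here.

```python
import math
from collections import Counter

# Hypothetical item descriptions.
descriptions = {
    "m1": "space opera with starships and aliens",
    "m2": "starships battle aliens in deep space",
    "m3": "romantic comedy set in paris",
}


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: item_id -> {term: weight}."""
    tokenized = {item: text.lower().split() for item, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for item, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[item] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_similar(liked_item, vectors, k=2):
    """Rank the other items by content similarity to an item the user liked."""
    scores = {item: cosine(vectors[liked_item], vec)
              for item, vec in vectors.items() if item != liked_item}
    return sorted(scores, key=scores.get, reverse=True)[:k]


vectors = tfidf_vectors(descriptions)
print(recommend_similar("m1", vectors))  # ['m2', 'm3']
```

No interaction data appears anywhere, which is why a new item can be scored as soon as it has a description.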
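Collaborative filtering takes the opposite approach. It treats items as anonymous IDs and learns about them only through the pattern of interactions they receive. Two early variants dominated the field: user-based collaborative filtering, which finds users whose histories resemble the target user's and recommends items those neighbors liked, and item-based collaborative filtering, which scores items by how similar their interaction patterns are to those of items the user has already engaged with.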
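Item-based collaborative filtering splits into an offline step, computing item-item similarities, and a cheap online step, scoring unseen items against a user's history. The sketch below assumes hypothetical binary purchase data and uses cosine similarity over each item's set of buyers. It illustrates the general pattern and is not Amazon's production algorithm. The all-pairs loop is fine for a toy catalog; real systems only visit item pairs that actually co-occur.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: user -> set of items bought.
purchases = {
    "alice": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove"},
    "carol": {"stove", "novel"},
    "dave": {"novel", "cookbook"},
}


def item_similarities(purchases):
    """Offline: cosine similarity between items' sets of buyers."""
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)
    sims = defaultdict(dict)
    for i, j in combinations(sorted(buyers), 2):
        overlap = len(buyers[i] & buyers[j])
        if overlap:  # only store pairs that share at least one buyer
            s = overlap / math.sqrt(len(buyers[i]) * len(buyers[j]))
            sims[i][j] = s
            sims[j][i] = s
    return sims


def recommend(user, purchases, sims, k=3):
    """Online: score unseen items by summed similarity to the user's items."""
    owned = purchases[user]
    scores = defaultdict(float)
    for item in owned:
        for other, s in sims[item].items():
            if other not in owned:
                scores[other] += s
    return sorted(scores, key=scores.get, reverse=True)[:k]


sims = item_similarities(purchases)
print(recommend("bob", purchases, sims))  # ['lantern', 'novel']
```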
The deeper move, and the one that defines current practice, is to compress each item into a dense, low-dimensional vector called an item embedding. Instead of storing one row per item with thousands of sparse features, the system learns a vector of typically 32 to 512 numbers that captures the item's position in a latent taste space.
Matrix factorization made this idea famous during the Netflix Prize competition that ran from 2006 to 2009. Simon Funk published a blog post in 2006 describing a simple stochastic gradient descent algorithm that decomposed the Netflix rating matrix into two smaller matrices: one for users and one for items. Each item ended up as a vector of latent factors, and the predicted rating for a user-item pair was simply the dot product of the user and item vectors. Funk's approach, often called Funk SVD, became a core component of the winning Netflix Prize entries and the model template for most of the recommender literature that followed.
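The sketch below is a simplified matrix factorization trained by SGD on observed ratings only. It updates all factors jointly, whereas Funk's post trained one factor at a time and added further refinements, so treat it as an illustration of the idea rather than his exact method. The ratings and hyperparameters (`n_factors`, `lr`, `reg`, `epochs`) are hypothetical.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating = dot product of the user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def train_funk_svd(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn user and item factor vectors by SGD over the observed ratings only."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical (user, item, rating) triples; cells not listed were never rated.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 3),
    ("u2", "m1", 4), ("u2", "m3", 1),
    ("u3", "m2", 4), ("u3", "m3", 2),
]
P, Q = train_funk_svd(ratings)
rmse = math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))
print(f"training RMSE: {rmse:.3f}")
print(f"predicted rating, u1 -> m3 (never rated): {predict(P, Q, 'u1', 'm3'):.2f}")
```

The last line is the point of the method: once every user and item has a vector, any empty cell can be filled with a dot product.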
The number of latent dimensions is a tuning knob. Too few and the model cannot capture subtle differences between items. Too many and the model overfits and starts memorizing noise. Typical values for production systems sit in the low hundreds.
Classical matrix factorization has a hard limitation: it only knows about item IDs. If a brand new movie appears in the catalog tomorrow, matrix factorization has no row for it because no user has rated it yet. The fix is to let a neural network compute item embeddings from features instead of looking them up from a static table. That is the core idea of the two-tower model, now standard at YouTube, Google, Pinterest, Spotify, Meta, and many other platforms.
In a two-tower architecture, one neural network (the item tower) reads raw item features such as ID, category, description, image features, and tags, and outputs a fixed-length item embedding. A second network (the user tower) does the same for user features. The score for a user-item pair is the dot product of the two embeddings. Because the item tower can use any features, a brand-new item can be embedded the moment it is added to the catalog, which makes this one of the cleanest fixes for the item cold start problem.
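The toy sketch below shows only the structure of a two-tower model, not how it is trained. Each hypothetical `Tower` sums one weight vector per active feature, applies `tanh`, and L2-normalizes. The weights are random rather than learned, so the printed score is meaningless. What the sketch does show is that an item with zero interactions still gets an embedding from its features.

```python
import math
import random


class Tower:
    """Toy encoder: sum of per-feature weight vectors, tanh, then L2-normalize.
    Weights are random here; a production tower is a trained deep network."""

    def __init__(self, vocab, dim=4, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Sorting makes the random initialization independent of set iteration order.
        self.weights = {f: [rng.gauss(0, 1) for _ in range(dim)] for f in sorted(vocab)}

    def embed(self, features):
        hidden = [0.0] * self.dim
        for f in features:
            for d, w in enumerate(self.weights.get(f, [0.0] * self.dim)):
                hidden[d] += w  # unknown features contribute nothing
        hidden = [math.tanh(h) for h in hidden]
        norm = math.sqrt(sum(h * h for h in hidden)) or 1.0
        return [h / norm for h in hidden]


def score(user_vec, item_vec):
    """Two-tower score: dot product of the user and item embeddings."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Hypothetical feature vocabularies for each tower.
item_tower = Tower({"genre:scifi", "genre:romance", "lang:en", "lang:fr"}, seed=1)
user_tower = Tower({"likes:scifi", "likes:romance", "country:US"}, seed=2)

# A brand-new item: no interactions yet, but its features are known.
new_item = item_tower.embed(["genre:scifi", "lang:en"])
user = user_tower.embed(["likes:scifi", "country:US"])
print(round(score(user, new_item), 3))
```

Normalizing both outputs turns the dot product into cosine similarity, which keeps scores bounded. Many systems make this choice, though it is not required.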
At serving time, the item embeddings for the entire catalog (often hundreds of millions of items in industrial systems) are precomputed offline and stored in a vector index. When a user request arrives, the user tower computes a single user embedding on the fly, and the system uses approximate nearest neighbor search to find the top items in the embedding space. This is what makes two tower retrieval scale to massive catalogs without computing a score for every candidate.
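The serving split can be sketched as follows, assuming hypothetical two-dimensional item embeddings. Item vectors are frozen offline into an index. Online, one user vector is scored against that index and the top `k` are kept. This sketch does an exact brute-force scan with `heapq.nlargest` as a stand-in; at industrial scale that scan is what an approximate nearest neighbor library such as FAISS or ScaNN replaces.

```python
import heapq


def build_index(item_embeddings):
    """Offline: freeze item vectors. A real system would load them into an
    approximate nearest neighbor index; here the 'index' is a plain list."""
    return list(item_embeddings.items())


def retrieve(user_vec, index, k=2):
    """Online: exact top-k items by dot product with the user embedding."""
    scored = ((item, sum(u * v for u, v in zip(user_vec, vec))) for item, vec in index)
    return [item for item, _ in heapq.nlargest(k, scored, key=lambda pair: pair[1])]


# Hypothetical precomputed item embeddings.
items = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
index = build_index(items)
print(retrieve([0.8, 0.6], index, k=2))  # ['b', 'a']
```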
Cold start refers to the difficulty of recommending an item that has no interaction history. A pure collaborative filtering algorithm cannot rank an item that nobody has rated yet, because its latent factors are undefined. New items can sit invisible in the catalog for weeks before they collect enough signal to compete with established titles, which discourages catalog growth and creates a feedback loop where popular items keep getting more popular.
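There are several common mitigations. A hybrid system can fall back on content-based scoring, since an item's metadata exists before anyone interacts with it, and a feature-based model such as the two-tower architecture can embed a new item from its features alone.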
Cold start also affects established items, in a subtler way. An old item with only a handful of ratings behaves much like a new item: there is not enough data to estimate its quality, so the model tends to under-recommend it, which keeps its data sparse, which keeps it under-recommended. This is sometimes called the long tail problem and remains an active research area in recommender systems.
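One common way to handle the quality-estimation part of this problem is shrinkage, sometimes called a damped or Bayesian average. It is offered here as an illustration, not as the method this article describes. An item's mean rating is pulled toward the catalog-wide mean, and the pull weakens as the item collects more ratings. The `prior_weight` value and the ratings below are hypothetical.

```python
def damped_mean(ratings, global_mean, prior_weight=10):
    """Shrink an item's average rating toward the catalog-wide mean.
    prior_weight acts like that many pseudo-ratings equal to global_mean."""
    return (sum(ratings) + prior_weight * global_mean) / (len(ratings) + prior_weight)


global_mean = 3.5
niche = [5, 5]                   # two perfect ratings
established = [4] * 150 + [5] * 50  # 200 ratings averaging 4.25
print(round(damped_mean(niche, global_mean), 2))        # 3.75
print(round(damped_mean(established, global_mean), 2))  # 4.21
```

The niche item's raw average of 5.0 is discounted to 3.75 because two ratings carry little evidence. This is sensible statistically, but it also shows why sparse items tend to be scored conservatively until more data arrives.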
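Not all item interactions are equal. The recommender literature splits feedback into two big buckets: explicit feedback, where users state an opinion directly through ratings, thumbs up or down, or reviews, and implicit feedback, which is inferred from behavior such as clicks, purchases, watch time, or skips.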
Most modern production systems are built around implicit feedback because there is so much more of it. Hu, Koren, and Volinsky's 2008 paper on collaborative filtering for implicit feedback is the canonical reference for handling the noise and confidence levels that come with these signals. For items specifically, this means each item has many kinds of interaction data attached to it, and the model has to weigh them by reliability and intent.
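The central transformation in Hu, Koren, and Volinsky's approach splits each raw implicit count `r_ui` (plays, minutes watched) into a binary preference `p_ui` and a confidence weight `c_ui = 1 + alpha * r_ui`. The paper reports `alpha = 40` working well in its experiments; treat it as a starting point. Pairs with no interaction still enter the loss, as preference 0 with low confidence. The paper optimizes this loss over all user-item pairs with alternating least squares, which is not shown here. The play counts below are hypothetical.

```python
def preference_and_confidence(counts, alpha=40.0):
    """Map raw implicit counts r_ui to (p_ui, c_ui):
    p_ui = 1 if r_ui > 0 else 0  (did the user interact at all?)
    c_ui = 1 + alpha * r_ui      (how much to trust that preference)"""
    return {pair: (1 if r > 0 else 0, 1.0 + alpha * r) for pair, r in counts.items()}


def weighted_squared_error(p, c, x_u, y_i):
    """One term of the confidence-weighted loss: c_ui * (p_ui - x_u . y_i)^2."""
    pred = sum(a * b for a, b in zip(x_u, y_i))
    return c * (p - pred) ** 2


# Hypothetical play counts for (user, song) pairs; 0 means never played.
counts = {("u1", "song_a"): 3, ("u1", "song_b"): 0}
print(preference_and_confidence(counts))
# {('u1', 'song_a'): (1, 121.0), ('u1', 'song_b'): (0, 1.0)}
```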
A related but distinct task is recommending items given an item rather than given a user. This is what powers "people who viewed this also viewed" widgets on product pages, "more like this" sections at the bottom of articles, and end-of-video autoplay on YouTube. The job is to find items whose embeddings are close to the seed item, often with extra signals such as recent popularity or category constraints layered on top.
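A minimal "more like this" sketch, assuming hypothetical item embeddings and category labels. It ranks other items by cosine similarity to the seed item and can optionally require a matching category, one example of the constraints the paragraph mentions. Popularity boosts and personalization are left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def more_like_this(seed, embeddings, categories, k=2, same_category=True):
    """Items closest to the seed in embedding space, excluding the seed itself."""
    seed_vec = embeddings[seed]
    candidates = [
        item for item in embeddings
        if item != seed and (not same_category or categories[item] == categories[seed])
    ]
    return sorted(candidates, key=lambda item: cosine(seed_vec, embeddings[item]),
                  reverse=True)[:k]


# Hypothetical embeddings and categories.
embeddings = {
    "tent": [0.9, 0.1, 0.0],
    "sleeping_bag": [0.8, 0.3, 0.1],
    "stove": [0.7, 0.2, 0.4],
    "novel": [0.1, 0.9, 0.2],
}
categories = {"tent": "outdoor", "sleeping_bag": "outdoor", "stove": "outdoor", "novel": "books"}
print(more_like_this("tent", embeddings, categories))  # ['sleeping_bag', 'stove']
```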
Item-to-item recommendation is technically easier than personalized recommendation because the system does not need a user model. Amazon's original 1998 algorithm was an item-to-item algorithm. Pinterest's related pins, YouTube's up next suggestions, and Spotify's song radio are all variations on the same pattern, although modern versions often personalize the item-to-item results for the viewer.
More recent work uses large language models as recommenders: items are described in text, and the model is asked to rank them in context. In these systems each item becomes a short description rather than just an ID, and the model reasons over candidate items in natural language. This blurs the line between content-based and collaborative filtering, since the LLM combines its general knowledge of the world, which implicitly encodes collaborative signal from its training data, with the item description, which is content. Research papers from 2023 onward have explored this direction, and at least some streaming and shopping platforms are experimenting with hybrid systems that pair a classical retrieval model with an LLM reranker.
None of this works if the catalog is broken. Practitioners spend a surprising amount of time on item governance: deduplicating products that have multiple SKUs for what is really the same thing, handling translations and locale variants, cleaning up bad images and missing descriptions, and deciding which items should be eligible for recommendation in the first place. Adult content, age-restricted items, region-locked titles, and items that violate policy all need to be flagged at the item level so the recommender can filter them out.
Item-level metadata also drives diversity and fairness adjustments. If the model would otherwise pile up recommendations from a single brand, category, or creator, post-processing logic often reranks the top results to spread coverage across more items. This requires the catalog to carry the structural information (category trees, brand IDs, creator IDs) that the reranker needs to know what to balance.
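One simple post-processing scheme is a per-brand cap applied greedily over the relevance-ordered list. It is shown here as an illustration; rerankers such as maximal marginal relevance trade relevance against redundancy more smoothly. The item IDs and brand mapping below are hypothetical. Items skipped by the cap are kept as backfill, so the list still fills up when the catalog lacks enough distinct brands.

```python
from collections import Counter


def rerank_with_cap(ranked, brand_of, max_per_brand=2, k=5):
    """Greedy pass over a relevance-ordered list: keep an item only while its
    brand is under the cap; capped items are used to backfill if needed."""
    picked, backfill = [], []
    counts = Counter()
    for item in ranked:
        brand = brand_of[item]
        if counts[brand] < max_per_brand:
            picked.append(item)
            counts[brand] += 1
        else:
            backfill.append(item)
        if len(picked) == k:
            return picked
    return (picked + backfill)[:k]


# Hypothetical model output, best first, and brand metadata from the catalog.
ranked = ["a1", "a2", "a3", "b1", "a4", "c1"]
brand_of = {"a1": "A", "a2": "A", "a3": "A", "a4": "A", "b1": "B", "c1": "C"}
print(rerank_with_cap(ranked, brand_of, k=4))  # ['a1', 'a2', 'b1', 'c1']
print(rerank_with_cap(ranked, brand_of, k=5))  # ['a1', 'a2', 'b1', 'c1', 'a3']
```

The reranker relies entirely on `brand_of`, which is why the catalog must carry that structural metadata.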