# SPLADE

> Source: https://aiwiki.ai/wiki/splade
> Updated: 2026-06-24
> Categories: Information Retrieval, Natural Language Processing
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**SPLADE** (Sparse Lexical and Expansion model) is a learned sparse retrieval model that encodes a query or document as a weighted, vocabulary-sized sparse vector over BERT's roughly 30,000-term WordPiece vocabulary, so that neural-ranking quality can be served directly on a classical inverted index. Introduced by Thibault Formal, Benjamin Piwowarski, and Stéphane Clinchant of Naver Labs Europe at SIGIR 2021, SPLADE reuses the masked language modeling (MLM) head of a [BERT](/wiki/bert)-style encoder to assign nonzero weights to both the literal tokens of the text and a set of expansion terms, scoring documents through dot products on standard posting lists.[^1] It is trained end-to-end with a ranking objective plus an explicit FLOPS regularizer that penalizes the expected per-dimension cost of posting-list traversal, producing very sparse outputs that remain competitive with dense bi-encoders while preserving the efficiency and interpretability of bag-of-words [information retrieval](/wiki/information_retrieval).[^2][^1]

In practical terms, SPLADE occupies a middle ground between unsupervised lexical scoring such as [BM25](/wiki/bm25) and dense or late-interaction neural retrievers such as [ColBERT](/wiki/colbert): it keeps explicit term matching and the inverted-index pipeline, but learns the term weights from data and can match terms that never literally appear in the text.[^7][^13]

Successive releases (SPLADE v2, SPLADE++, Efficient SPLADE, and SPLADE-v3) extended the design with knowledge distillation, hard-negative mining, separated query and document encoders, and improved training recipes, with the SPLADE-v3 base model exceeding 40 MRR@10 on MS MARCO and reaching an average nDCG@10 of 51.7 on BEIR-13.[^3][^4][^5][^6][^21] SPLADE is widely adopted as the canonical learned sparse retriever and is integrated into open-source toolkits such as Pyserini and managed services such as Pinecone's sparse-dense indexes and Vespa's WAND operators.[^7][^8][^9]

## Infobox

| Property | Value |
|---|---|
| Original paper | "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking"[^1] |
| Venue | SIGIR 2021 (44th ACM SIGIR Conference), Virtual Event, Canada[^10] |
| Original authors | Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant (Naver Labs Europe)[^1][^10] |
| First arXiv version | 2107.05720, 12 July 2021[^1] |
| Successor papers | SPLADE v2 (arXiv:2109.10086, 21 Sep 2021); SPLADE++ (arXiv:2205.04733, 10 May 2022); Efficient SPLADE (arXiv:2207.03834, SIGIR 2022); SPLADE-v3 (arXiv:2403.06789, 11 Mar 2024)[^2][^4][^5][^6] |
| Reference implementation | github.com/naver/splade (995 GitHub stars)[^3] |
| License (code) | Creative Commons Attribution-NonCommercial-ShareAlike 4.0[^3] |
| Representation size | 30,522 sparse dimensions (BERT WordPiece vocabulary)[^11] |
| Pooling | max over MLM logits with log(1+ReLU) saturation[^11] |
| Regularizer | FLOPS (squared mean term weight per batch)[^11][^12] |
| Best MS MARCO MRR@10 | 40.2 (SPLADE-v3 base)[^21] |

## When was SPLADE released, and what came before it?

### Lexical retrieval and the vocabulary mismatch problem

For decades, first-stage [information retrieval](/wiki/information_retrieval) over large document collections relied on bag-of-words term-matching scores such as TF-IDF and [BM25](/wiki/bm25), evaluated efficiently through inverted indexes that store posting lists keyed by vocabulary terms.[^7] Such scores exhibit two well-known weaknesses: they fail when relevant documents use different surface forms than the query (the vocabulary mismatch problem), and they assign weights using unsupervised heuristics rather than task-specific signal. Two strands of work attempted to close that gap. Dense bi-encoders, exemplified by Dense Passage Retrieval (see [DPR](/wiki/dense_passage_retrieval)), embed queries and documents into a low-dimensional Euclidean space and retrieve with approximate nearest-neighbor search, but they discard the interpretability of explicit term matching and require specialized vector indexes.[^13] Learned sparse retrieval kept the inverted-index pipeline and instead trained neural models to predict per-term weights, with DeepCT, doc2query, uniCOIL, DeepImpact, and SparTerm as early examples.[^14][^13]

### SparTerm as the immediate antecedent

SparTerm proposed predicting term importance directly from the logits of BERT's MLM head over the WordPiece vocabulary, multiplied by a learned binary gating mask that controls expansion.[^15] The resulting representation reuses the inverted-index machinery while allowing the model to assign nonzero weight to vocabulary terms not literally present in the input, effectively learning a per-document expansion.[^15][^11] SparTerm's training did not directly optimize for sparsity beyond the gating mask, and its end-to-end behavior was difficult to tune.[^1]

### SPLADE (SIGIR 2021)

Thibault Formal, Benjamin Piwowarski, and Stéphane Clinchant introduced SPLADE in a short paper at SIGIR 2021, "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking", published in the proceedings of the 44th International ACM SIGIR Conference and posted to arXiv on 12 July 2021.[^1][^10] The paper replaced SparTerm's binary gate with a log-saturation activation, `w_j = sum over input tokens i of log(1 + ReLU(s_{ij}))`, where `s_{ij}` is the MLM logit for vocabulary term `j` at input position `i`, and trained the model end-to-end with a ranking loss plus an explicit FLOPS regularizer on the sparsity of the representation.[^11][^1] As the authors put it, the approach yields "highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods," trained "end-to-end in a single stage."[^1] The combination allowed SPLADE to outperform BM25, doc2query-T5, and previous sparse neural baselines on MS MARCO Passage Ranking while remaining indexable with standard inverted-file engines.[^1][^11]

### SPLADE v2 (September 2021)

Two months after SIGIR, Formal, Carlos Lassance, Piwowarski, and Clinchant posted a longer technical report on arXiv as SPLADE v2 (arXiv:2109.10086, 21 September 2021).[^2][^16] The v2 paper changed the pooling strategy from sum to max, introduced a document-only expansion variant (SPLADE-doc) that drops query expansion to lower query-time cost, and replaced the original cross-entropy training with a distillation objective using a cross-encoder teacher.[^2][^16] On MS MARCO TREC DL 2019, v2 reported "more than 9% gains on NDCG@10" over the SIGIR baseline and posted state-of-the-art zero-shot scores on the [MTEB](/wiki/mteb)-adjacent BEIR benchmark of 18 datasets.[^2]

### SPLADE++ (SIGIR 2022)

At SIGIR 2022, the same group published "From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective" (arXiv:2205.04733, 10 May 2022).[^4][^17] The paper is the canonical reference for the SPLADE++ checkpoints, which combine [knowledge distillation](/wiki/knowledge_distillation) from a cross-encoder, hard-negative mining over a larger negative pool, and a co-condenser pre-trained language model initialization.[^4][^17] The released checkpoints `naver/splade-cocondenser-selfdistil` and `naver/splade-cocondenser-ensembledistil` reached MRR@10 of 37.6 and 38.3 respectively on the MS MARCO development set, and the EnsembleDistil variant achieved an average nDCG@10 of 50.7 on the BEIR-13 subset reported by the authors.[^3][^18]

### Efficient SPLADE (SIGIR 2022)

Also at SIGIR 2022, Carlos Lassance and Stéphane Clinchant published "An Efficiency Study for SPLADE Models" (arXiv:2207.03834, 8 July 2022), focused on production latency rather than effectiveness ceilings.[^5][^19] The paper proposes L1 regularization specifically on the query side, separation of the query and document encoders so each can be tuned independently, a FLOPS-regularized middle-training stage, and faster query-side architectures.[^5][^19] The released Efficient SPLADE models, `naver/efficient-splade-V-large-doc` plus its query encoder, reach MRR@10 of 38.8 on MS MARCO dev while serving queries within a few milliseconds of BM25 on the same hardware.[^3][^5]

### SPLADE-v3 (March 2024)

Carlos Lassance, Hervé Déjean, Thibault Formal, and Stéphane Clinchant released "SPLADE-v3: New baselines for SPLADE" on arXiv on 11 March 2024 (arXiv:2403.06789), accompanying a refreshed release of the official library and Hugging Face checkpoints.[^6][^20] SPLADE-v3 keeps the architecture of SPLADE++ but changes the training mix: warm-starting from the SPLADE++SelfDistil checkpoint, sampling 8 negatives per query from SPLADE++SelfDistil for the base model, and combining a KL-divergence loss with a MarginMSE loss weighted 0.05.[^21][^20] A more aggressive ensemble configuration in the same paper instead samples 100 negatives per query, 50 from the top-50 and 50 chosen at random from the top-1000.[^20] The base model reaches MRR@10 of 40.2 on MS MARCO and an average nDCG@10 of 51.7 on BEIR-13, improving the out-of-domain BEIR score by one point over the 50.7 of SPLADE++ EnsembleDistil.[^21][^20] The release also includes variant models: SPLADE-v3-DistilBERT for lower-footprint deployment, SPLADE-v3-Lexical that drops query expansion, and SPLADE-v3-Doc that drops query-side computation entirely.[^20]
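
The combined objective described above can be illustrated with a minimal, pure-Python sketch. It is not the authors' training code: the function names, the convention that index 0 holds the positive passage, the teacher-to-student direction of the KL term, and the toy scores are all assumptions made for illustration. Only the 0.05 weight on MarginMSE comes from the text.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_scores, student_scores):
    """KL(teacher || student) over the candidate passages of one query (assumed direction)."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def margin_mse(teacher_scores, student_scores):
    """Mean squared error between teacher and student (positive - negative) margins."""
    t_pos, s_pos = teacher_scores[0], student_scores[0]  # index 0 = positive (assumption)
    errors = [
        ((s_pos - s_neg) - (t_pos - t_neg)) ** 2
        for t_neg, s_neg in zip(teacher_scores[1:], student_scores[1:])
    ]
    return sum(errors) / len(errors)

def combined_loss(teacher_scores, student_scores, mse_weight=0.05):
    """KL-divergence term plus MarginMSE scaled by mse_weight."""
    return kl_divergence(teacher_scores, student_scores) + mse_weight * margin_mse(
        teacher_scores, student_scores
    )

# Hypothetical scores for one query: one positive followed by three negatives.
teacher = [9.0, 4.0, 2.0, 1.0]   # e.g. cross-encoder scores
student = [7.0, 5.0, 2.5, 0.5]   # e.g. SPLADE dot products

loss_mismatch = combined_loss(teacher, student)
loss_perfect = combined_loss(teacher, teacher)
```

A student that reproduces the teacher's scores exactly incurs zero loss under both terms; any disagreement in ranking distribution or in margins makes the loss positive.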

## How does SPLADE work?

### MLM-logit representation

Given a sequence of input WordPiece tokens, SPLADE first encodes the sequence with a BERT (or [DistilBERT](/wiki/distilbert)) backbone and then runs the language modeling head, which produces a logit `s_{ij}` for every vocabulary term `j` at every input position `i`.[^11][^22] These are the same logits that BERT uses for masked-token prediction during pre-training; SPLADE repurposes them as a per-position relevance signal over the 30,522-entry vocabulary.[^22][^11] The MLM head does not need to be trained from scratch: fine-tuning adapts the existing weights so that the logits encode retrieval relevance rather than masked-token likelihood.[^11]

### Pooling and saturation

A nonlinearity converts the logits into nonnegative term weights before they are pooled across positions. SPLADE uses the saturation function `log(1 + ReLU(s_{ij}))`: [ReLU](/wiki/relu) zeros out negative logits, which is what allows a term's weight to be exactly zero, and the logarithm dampens the impact of any single dominant logit, mimicking the saturating term-frequency curves used in BM25.[^11][^22] The v1 paper aggregated across input positions with a sum pool; v2 changed this to a max pool, which empirically improved both in-domain and zero-shot effectiveness and remains the default for SPLADE++ and SPLADE-v3.[^2][^21]
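
As a rough illustration of the saturation and pooling step, the sketch below applies `log(1 + ReLU(s))` to a toy logit matrix and pools it with either max or sum. The function names and the 3-position, 4-term matrix are made up for the example; a real model would produce a matrix with 30,522 columns.

```python
import math

def saturate(logit: float) -> float:
    """log(1 + ReLU(s)): negative logits become exactly 0, large ones are dampened."""
    return math.log1p(max(logit, 0.0))

def pool_term_weights(logits, mode="max"):
    """Collapse a [positions x vocab] logit matrix into one weight per vocabulary term."""
    if mode not in ("max", "sum"):
        raise ValueError("mode must be 'max' or 'sum'")
    reduce = max if mode == "max" else sum
    vocab_size = len(logits[0])
    return [reduce(saturate(row[j]) for row in logits) for j in range(vocab_size)]

# Toy example: 3 input positions (rows), 4 vocabulary terms (columns).
toy_logits = [
    [2.0, -1.0, 0.5, -3.0],
    [1.0, -0.5, 0.0, -2.0],
    [-4.0, -2.0, 3.0, -1.0],
]

max_weights = pool_term_weights(toy_logits, mode="max")  # v2-style pooling
sum_weights = pool_term_weights(toy_logits, mode="sum")  # v1-style pooling
```

Terms 1 and 3 have no positive logit at any position, so both pooling modes give them a weight of exactly zero. For the remaining terms, sum pooling accumulates evidence from every position while max pooling keeps only the strongest one.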

### FLOPS regularizer

End-to-end sparsity is enforced by an auxiliary loss term known as the FLOPS regularizer.[^12][^11] For a batch of `B` representations, the regularizer is the sum over vocabulary dimensions `j` of the squared mean activation, `sum_j (1/B sum_b w_{b,j})^2`.[^12] Because the squared mean penalizes a dimension that is activated consistently across the batch more heavily than one that is activated only occasionally, it pushes the model toward representations whose nonzero entries are spread evenly across vocabulary dimensions instead of concentrating on a few frequently used ones. This approximates the expected cost of scoring a query against an inverted index: a posting list is short when only a few documents activate that vocabulary entry.[^12][^11] Separate regularization weights `lambda_q` and `lambda_d` are applied to queries and documents, with a warmup schedule that ramps the coefficients up over the first thousand training steps so the model first learns to retrieve and then learns to be sparse.[^11][^16]
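
The formula above is small enough to compute by hand. The following sketch, with a hypothetical `flops_regularizer` helper and two made-up batches over a 4-term vocabulary, shows why the penalty differs from a plain L1 norm: both batches carry the same total weight, but the one that reuses a single dimension is penalized more.

```python
def flops_regularizer(batch):
    """sum_j (mean_b w[b][j])^2 for a batch of nonnegative term-weight vectors."""
    batch_size = len(batch)
    vocab_size = len(batch[0])
    total = 0.0
    for j in range(vocab_size):
        mean_activation = sum(vec[j] for vec in batch) / batch_size
        total += mean_activation ** 2
    return total

# Every document activates the same term: one long posting list.
shared = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]

# Each document activates a different term: four short posting lists.
spread = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

penalty_shared = flops_regularizer(shared)
penalty_spread = flops_regularizer(spread)
```

In this toy case the shared batch scores 1.0 and the spread batch 0.25, although an L1 penalty would treat them identically. In training, the value would be multiplied by `lambda_q` or `lambda_d` and added to the ranking loss.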

### Bag of words and expansion duality

Because SPLADE outputs a 30,522-dimensional sparse vector whose dimensions correspond directly to vocabulary tokens, every nonzero entry is human-readable.[^7][^22] A SPLADE encoding of a document therefore decomposes into two intuitive components: the in-document terms, which behave like a learned variant of TF weights, and the expansion terms, which are vocabulary entries that did not appear in the document but received nonzero weight because the MLM head deemed them semantically relevant.[^7][^22] The same decomposition holds for queries: a query SPLADE vector is itself a bag of literal query words plus expansion words, so the standard inverted-index scoring formula, the sum over shared vocabulary entries `j` of `w_{q,j} * w_{d,j}`, gives the SPLADE relevance score.[^7][^11] This makes SPLADE results inspectable in a way that dense bi-encoders are not: failures can be traced to specific over- or under-weighted vocabulary entries.[^22]
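
A small sketch may help make the decomposition and the scoring formula concrete. The vectors below are hand-written stand-ins for model output (the terms and weights are invented, not produced by any SPLADE checkpoint), stored as `{term: weight}` dictionaries that hold only the nonzero entries.

```python
def sparse_dot(query_vec, doc_vec):
    """Sum of w_q * w_d over the vocabulary entries the two sparse vectors share."""
    small, large = sorted((query_vec, doc_vec), key=len)
    return sum(weight * large[term] for term, weight in small.items() if term in large)

def split_literal_and_expansion(term_weights, input_tokens):
    """Separate terms that occur in the input from terms added by expansion."""
    present = set(input_tokens)
    literal = {t: w for t, w in term_weights.items() if t in present}
    expansion = {t: w for t, w in term_weights.items() if t not in present}
    return literal, expansion

# Hypothetical document and its (invented) sparse encoding.
doc_tokens = ["the", "movie", "was", "great"]
doc_vec = {"movie": 1.5, "great": 1.0, "film": 1.0, "cinema": 0.5}

# Hypothetical query encoding; "film" never appears literally in the document.
query_vec = {"film": 2.0, "review": 1.0}

literal, expansion = split_literal_and_expansion(doc_vec, doc_tokens)
score = sparse_dot(query_vec, doc_vec)
```

Here the only shared entry is `film`, an expansion term on the document side, so the score comes entirely from a term the document never contained. Inspecting `literal` and `expansion` in this way is the kind of debugging the paragraph above describes.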

### Why can SPLADE run on a normal inverted index?

A SPLADE document representation is, by construction, a sparse weighted multiset over the BERT vocabulary, which fits exactly the data structure that classical search engines use for postings.[^7][^11] At indexing time, documents are encoded once with the document encoder, their nonzero entries written into Lucene-compatible impact-weighted postings, and the resulting index supports the same disjunctive top-k traversal that classical BM25 indexes use, including WAND and BlockMaxWAND skipping for sublinear retrieval.[^7][^8] At query time, only the query encoder runs (a smaller, separate model in Efficient SPLADE), the query SPLADE vector is treated as a weighted disjunction of terms, and the standard impact-ranking machinery returns the top-k.[^8][^7] This compatibility is the main practical reason SPLADE has been adopted: existing search infrastructure based on Lucene, Tantivy, or commercial inverted-file engines can serve SPLADE without redesign.[^8][^7]
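
The indexing and query flow can be sketched with an in-memory toy index. This is an exhaustive term-at-a-time traversal, not WAND or BlockMaxWAND, and it is not how Lucene stores postings; the function names, document IDs, and weights are all hypothetical.

```python
import heapq
from collections import defaultdict

def build_impact_index(doc_vectors):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index

def search(index, query_vec, k=10):
    """Score documents term-at-a-time and return the k best (doc_id, score) pairs."""
    scores = defaultdict(float)
    for term, q_weight in query_vec.items():
        # Only posting lists for terms in the query are ever touched.
        for doc_id, d_weight in index.get(term, ()):
            scores[doc_id] += q_weight * d_weight
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Invented sparse document encodings (nonzero entries only).
docs = {
    "d1": {"movie": 1.5, "film": 1.0, "great": 1.0},
    "d2": {"bank": 2.0, "river": 1.0},
    "d3": {"film": 2.0, "festival": 1.0},
}
index = build_impact_index(docs)

# Invented query encoding, treated as a weighted disjunction of terms.
results = search(index, {"film": 2.0, "review": 1.0}, k=2)
```

Document `d2` shares no term with the query and is never scored, which is the property that dynamic-pruning algorithms such as WAND exploit further by skipping postings that cannot reach the current top-k threshold.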

## Variants and checkpoints

The naver/splade GitHub repository and the corresponding Hugging Face organization host multiple checkpoints; the table below records the headline scores documented in the official README and the per-model Hugging Face cards.

| Model | Backbone | MRR@10 (MS MARCO dev) | Notes |
|---|---|---|---|
| splade_v2_max | DistilBERT | 34.0 | v2 max-pool checkpoint[^3] |
| splade_v2_distil (DistilSPLADE-max) | DistilBERT | 36.8 | v2 with distillation[^3] |
| splade-cocondenser-selfdistil (SPLADE++ SD) | co-condenser | 37.6 | SIGIR 2022 self-distillation[^3] |
| splade-cocondenser-ensembledistil (SPLADE++ ED) | co-condenser | 38.3 | Ensemble teacher distillation[^3] |
| efficient-splade-V-large (doc + query) | BERT | 38.8 | Separated encoders, L1 query reg[^3] |
| efficient-splade-VI-BT-large (doc + query) | BERT | 38.0 | Lower-latency configuration[^3] |
| splade-v3 | BERT (warm from ++SD) | 40.2 | KL-Div + MarginMSE, 8 negatives[^21] |
| splade-v3-distilbert | DistilBERT | 38.7 | Distilled variant of v3[^20] |
| splade-v3-lexical | BERT | 40.0 | No query expansion[^20] |
| splade-v3-doc | BERT | 37.8 | No query-side neural model[^20] |

The repository is released under CC-BY-NC-SA 4.0, and per the README has accumulated 995 stars and is actively maintained.[^3]

## How well does SPLADE do on BEIR?

BEIR is a heterogeneous zero-shot retrieval benchmark spanning 18 datasets, of which the 13 publicly available ones are typically used for evaluation. SPLADE has consistently set strong baselines on this benchmark: the SPLADE v2 paper introduced BEIR evaluation for sparse models and reported then-state-of-the-art zero-shot scores; SPLADE++ EnsembleDistil reached an average nDCG@10 of 50.7 on the BEIR-13 evaluation reported by Formal and colleagues; and SPLADE-v3 raised the BEIR-13 average to 51.7 nDCG@10 while exceeding 40 MRR@10 on MS MARCO dev.[^2][^4][^21] The SPLADE-v3 paper conducts a meta-analysis across more than 40 query sets, including MS MARCO, BEIR, LoTTE and TREC collections, and reports that the v3 base model is "statistically significantly more effective than both BM25 and SPLADE++, while comparing well to cross-encoder re-rankers" on those collections.[^20]

## How is SPLADE deployed in practice?

### Reference library (naver/splade)

The official repository at `github.com/naver/splade` provides training, indexing, and evaluation pipelines, integrated with Hugging Face Transformers and PyTorch.[^3] The README documents training recipes for SPLADE, SPLADE++, Efficient SPLADE, and SPLADE-v3, end-to-end indexing via Pyserini-compatible JSON output, and reproduction scripts for MS MARCO and BEIR.[^3]

### Pyserini

Pyserini, the Lucene/Anserini-based Python toolkit for reproducible IR research from the University of Waterloo, supports SPLADE through its `LuceneImpactSearcher` class.[^7][^23] The toolkit ships pre-tokenized SPLADE indexes for MS MARCO and provides documentation that reproduces SPLADE v2's reported MRR@10 of ~0.368 on the development set, either by encoding queries on the fly with the `distill-splade-max` model or by using pre-computed query impacts.[^7][^23] Pyserini's SPLADE integration uses the `--impact --pretokenized` flags to bypass BM25 score normalization and treat the SPLADE-emitted weights as direct posting-list impacts.[^7]

### Pinecone sparse-dense

Pinecone announced general support for sparse-dense vectors on 23 February 2023, allowing users to upload SPLADE document vectors alongside dense [embeddings](/wiki/embeddings) in the same vector store and to retrieve with a tunable `alpha` blend between the two scores.[^9] The Pinecone Text Client library ships a `SpladeEncoder` class that wraps the `naver/splade-cocondenser-ensembledistil` checkpoint and emits the sparse component in the format expected by Pinecone's sparse-dense index, enabling hybrid lexical-semantic retrieval without operating a separate sparse engine.[^24][^9]
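
The `alpha` blend can be read as a convex combination of the dense and sparse dot products. The sketch below is plain Python rather than Pinecone client code, and it assumes the convention that `alpha = 1.0` means purely dense and `alpha = 0.0` purely sparse; the helper names and toy vectors are hypothetical.

```python
def blend_scores(dense_score, sparse_score, alpha):
    """Convex combination of a dense and a sparse score (assumed convention)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * dense_score + (1.0 - alpha) * sparse_score

def scale_query(dense_query, sparse_query, alpha):
    """Apply the same blend by scaling the query vectors before a dot-product search."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scaled_dense = [alpha * x for x in dense_query]
    scaled_sparse = {term: (1.0 - alpha) * w for term, w in sparse_query.items()}
    return scaled_dense, scaled_sparse

def dense_dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sparse_dot(query_vec, doc_vec):
    return sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())

# Invented toy vectors for one query and one document.
dense_query, dense_doc = [0.6, 0.8], [0.8, 0.6]
sparse_query = {"film": 2.0, "review": 1.0}
sparse_doc = {"film": 1.0, "movie": 1.5}

alpha = 0.25
blended = blend_scores(
    dense_dot(dense_query, dense_doc), sparse_dot(sparse_query, sparse_doc), alpha
)

# Scaling the query first gives the same result because dot products are linear.
scaled_dense, scaled_sparse = scale_query(dense_query, sparse_query, alpha)
via_scaling = dense_dot(scaled_dense, dense_doc) + sparse_dot(scaled_sparse, sparse_doc)
```

Because the blend is linear, it can be applied to the query vectors once instead of to every candidate score, so a single combined dot-product search is enough.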

### Vespa

The Vespa search engine supports SPLADE via its `weightedset<int>` document field and the `wand` query operator, which implements top-k disjunctive retrieval over weighted posting lists.[^8] Because Vespa's WAND operator requires integer term identifiers, the recommended pattern is to map the 30,522 BERT WordPiece IDs into the Vespa field directly, allowing dense and sparse retrieval to be combined within a single query expression that mixes `wand` with Vespa's `nearestNeighbor` operator for dense [vector embeddings](/wiki/vector_embedding).[^8]

### Other integrations

Beyond the headline integrations, the SPLADE checkpoints are usable from Sentence Transformers via its `SparseEncoder` API, allowing SPLADE document and query encoding from the same library that handles dense bi-encoders.[^18] Downstream production users include open-source [vector database](/wiki/vector_database) stacks that surface sparse-dense hybrid retrieval, and academic systems that compare learned sparse retrieval against dense retrieval baselines across MS MARCO and BEIR.[^13][^7]

## What is SPLADE used for, and why does it matter?

SPLADE established a viable middle path between BM25-style lexical retrieval and dense bi-encoder retrieval.[^13][^7] Its three principal practical advantages are: compatibility with existing inverted-file infrastructure, which removes the operational cost of running a separate vector index for first-stage retrieval; inherent interpretability of the per-dimension vocabulary weights, which simplifies debugging and explanation in production retrieval systems; and competitive zero-shot generalization on out-of-domain benchmarks such as BEIR, where dense bi-encoders historically degraded sharply outside their training distribution.[^7][^22][^2] These properties have made SPLADE a frequent first-stage choice in [semantic search](/wiki/semantic_search) pipelines, retrieval-augmented question answering, and as the sparse component of sparse-dense hybrid retrieval stacks.[^9][^7]

## Limitations and criticisms

### Index inflation from expansion terms

A SPLADE-encoded document can have many more nonzero vocabulary entries than its surface tokens, which increases inverted-index size and the average length of posting lists relative to BM25.[^12][^7] Aggressive FLOPS regularization controls this at training time but costs effectiveness. That trade-off is the subject of the Efficient SPLADE paper and of subsequent work on DF-FLOPS, which specifically penalizes the use of high document-frequency terms.[^12][^5]

### Query latency

Naive SPLADE serving runs a full transformer pass at query time before the inverted-index lookup, which adds latency relative to BM25's keyword-only processing path.[^5] The Efficient SPLADE work narrowed the gap to under four milliseconds on standard hardware by using smaller query-side encoders and stronger L1 regularization on the query representation, but a residual cost relative to BM25 remains, especially under high concurrency.[^5][^19]

### Vocabulary lock-in

Because the representation space is fixed to the BERT WordPiece vocabulary, SPLADE inherits its tokenizer's strengths and weaknesses. Out-of-vocabulary terms must be subword-tokenized, and the WordPiece units may not match the natural language properties of non-English collections without retraining or multilingual variants.[^11][^22]

### License

The reference SPLADE code and weights are released under CC-BY-NC-SA 4.0, a noncommercial license, which restricts certain production uses without a separate commercial agreement.[^3]

## How does SPLADE differ from BM25, dense retrieval, and ColBERT?

SPLADE keeps the sparse, term-indexed representation and inverted-index serving of [BM25](/wiki/bm25), but learns the term weights and adds expansion terms instead of relying on unsupervised statistics. Dense bi-encoders such as [DPR](/wiki/dense_passage_retrieval) replace term matching entirely with a single low-dimensional embedding scored by approximate nearest-neighbor search, trading interpretability and inverted-index reuse for compactness. [ColBERT](/wiki/colbert) takes a third route, late interaction: it stores one dense vector per token and scores a query-document pair with a MaxSim sum over token vectors, which excels at precision but requires a multi-vector index rather than a posting-list engine. Work such as SPLATE (SIGIR 2024) explicitly bridges late interaction and learned sparse retrieval by mapping ColBERTv2's frozen token embeddings into a SPLADE-style sparse vocabulary space, reporting that it "nearly matches ColBERTv2 performance while significantly lowering retrieval latency."[^25]

| Approach | Representation | Index | Interpretability | Notes |
|---|---|---|---|---|
| [BM25](/wiki/bm25) | Unsupervised TF/IDF, sparse | Inverted | High | Strong unsupervised baseline[^7] |
| DeepCT, doc2query, uniCOIL, DeepImpact | Learned per-term weights, sparse | Inverted | High | Earlier learned sparse retrievers[^13][^14] |
| SparTerm | MLM logits with binary gate, sparse | Inverted | High | Direct antecedent of SPLADE[^15] |
| SPLADE family | MLM logits with FLOPS, sparse | Inverted | High | Subject of this article[^1][^2][^4][^5][^6] |
| [DPR](/wiki/dense_passage_retrieval) and successor dense bi-encoders | Dense vector, low-d | ANN/HNSW | Low | Strong in-domain, can degrade zero-shot[^13] |
| [ColBERT](/wiki/colbert) late interaction | One dense vector per token | Multi-vector | Medium | MaxSim scoring; SPLATE maps it into sparse space[^25] |
| Cross-encoder rerankers | Pairwise transformer score | None (rerank) | Medium | Used as teachers in SPLADE distillation[^4][^20] |

## See also

- [BERT](/wiki/bert)
- [DistilBERT](/wiki/distilbert)
- [BM25](/wiki/bm25)
- [Information retrieval](/wiki/information_retrieval)
- [Dense Passage Retrieval](/wiki/dense_passage_retrieval)
- [ColBERT](/wiki/colbert)
- [Knowledge distillation](/wiki/knowledge_distillation)
- [Semantic search](/wiki/semantic_search)
- [Sparse representation](/wiki/sparse_representation)
- [Sparse vector](/wiki/sparse_vector)
- [Vector database](/wiki/vector_database)
- [MTEB](/wiki/mteb)
- [Pinecone](/wiki/pinecone)
- [Hugging Face](/wiki/hugging_face)
- [Naver Labs](/wiki/naver_labs)

## References

[^1]: Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant, "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking", arXiv preprint, 2021-07-12. https://arxiv.org/abs/2107.05720. Accessed 2026-05-21.
[^2]: Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant, "SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval", arXiv preprint, 2021-09-21. https://arxiv.org/abs/2109.10086. Accessed 2026-05-21.
[^3]: Naver Labs Europe, "naver/splade: SPLADE: sparse neural search (SIGIR21, SIGIR22)", GitHub, 2024. https://github.com/naver/splade. Accessed 2026-05-21.
[^4]: Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant, "From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective", arXiv preprint, 2022-05-10. https://arxiv.org/abs/2205.04733. Accessed 2026-05-21.
[^5]: Carlos Lassance, Stéphane Clinchant, "An Efficiency Study for SPLADE Models", arXiv preprint, 2022-07-08. https://arxiv.org/abs/2207.03834. Accessed 2026-05-21.
[^6]: Carlos Lassance, Hervé Déjean, Thibault Formal, Stéphane Clinchant, "SPLADE-v3: New baselines for SPLADE", arXiv preprint, 2024-03-11. https://arxiv.org/abs/2403.06789. Accessed 2026-05-21.
[^7]: Castorini, "Pyserini: experiments-spladev2.md", GitHub documentation, 2022. https://github.com/castorini/pyserini/blob/master/docs/experiments-spladev2.md. Accessed 2026-05-21.
[^8]: Vespa Engine, "Redefining Hybrid Search Possibilities with Vespa", Vespa Blog, 2023. https://blog.vespa.ai/redefining-hybrid-search-possibilities-with-vespa/. Accessed 2026-05-21.
[^9]: Gareth Jones, "Introducing support for sparse-dense embeddings for better search results", Pinecone Blog, 2023-02-23. https://www.pinecone.io/blog/sparse-dense/. Accessed 2026-05-21.
[^10]: ACM, "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking", Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021-07-11. https://dl.acm.org/doi/10.1145/3404835.3463098. Accessed 2026-05-21.
[^11]: James Briggs, "SPLADE for Sparse Vector Search Explained", Pinecone Learning Center, 2022. https://www.pinecone.io/learn/splade/. Accessed 2026-05-21.
[^12]: EmergentMind, "FLOPS Loss: Sparsity and Efficiency in Models", emergentmind.com topics page, 2025. https://www.emergentmind.com/topics/flops-loss. Accessed 2026-05-21.
[^13]: Wikipedia contributors, "Learned sparse retrieval", Wikipedia, 2025. https://en.wikipedia.org/wiki/Learned_sparse_retrieval. Accessed 2026-05-21.
[^14]: Jimmy Lin, Xueguang Ma, "A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques", arXiv preprint, 2021-06-28. https://arxiv.org/abs/2106.14807. Accessed 2026-05-21.
[^15]: Yang Bai, Xiaoguang Li, Gang Wang, Chaoliang Zhang, Lifeng Shang, Jun Xu, Zhaowei Wang, Fangshan Wang, Qun Liu, "SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval", arXiv preprint, 2020-10-02. https://arxiv.org/abs/2010.00768. Accessed 2026-05-21.
[^16]: Naver Labs Europe, "SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval", Naver Labs Europe Publications, 2021. https://europe.naverlabs.com/research/publications/splade-v2/. Accessed 2026-05-21.
[^17]: ACM, "From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective", Proceedings of the 45th International ACM SIGIR Conference, 2022-07-11. https://dl.acm.org/doi/10.1145/3477495.3531857. Accessed 2026-05-21.
[^18]: Hugging Face, "naver/splade-cocondenser-ensembledistil and naver/splade-v3", model cards, 2024. https://huggingface.co/naver/splade-v3. Accessed 2026-05-21.
[^19]: ACM, "An Efficiency Study for SPLADE Models", Proceedings of the 45th International ACM SIGIR Conference, 2022-07-11. https://dl.acm.org/doi/abs/10.1145/3477495.3531833. Accessed 2026-05-21.
[^20]: Carlos Lassance, Hervé Déjean, Thibault Formal, Stéphane Clinchant, "SPLADE-v3: New baselines for SPLADE", arXiv HTML version, 2024-03-11. https://arxiv.org/html/2403.06789v1. Accessed 2026-05-21.
[^21]: Hugging Face, "naver/splade-v3 Model Card", Hugging Face, 2024. https://huggingface.co/naver/splade-v3. Accessed 2026-05-21.
[^22]: Santhosh Kammari, "SPLADE: A Technical Deep Dive into Sparse Neural Information Retrieval", Medium, 2024. https://medium.com/@santhoshkammari/splade-a-technical-deep-dive-into-sparse-neural-information-retrieval-3a5daef18313. Accessed 2026-05-21.
[^23]: Castorini, "Pyserini: Python toolkit for reproducible information retrieval research with sparse and dense representations", GitHub, 2024. https://github.com/castorini/pyserini. Accessed 2026-05-21.
[^24]: Pinecone, "Pinecone Text Client: SpladeEncoder", GitHub, 2024. https://github.com/pinecone-io/pinecone-text. Accessed 2026-05-21.
[^25]: Thibault Formal, Stéphane Clinchant, Hervé Déjean, Carlos Lassance, "SPLATE: Sparse Late Interaction Retrieval", arXiv preprint, 2024-04-22. https://arxiv.org/abs/2404.13950. Accessed 2026-06-24.

