Importance sampling

Importance sampling (often abbreviated IS) is a Monte Carlo variance-reduction technique for estimating expectations under one probability distribution by drawing samples from a different proposal distribution and reweighting each sample by a likelihood ratio. The technique is one of the oldest tools in computational statistics, with roots in 1940s rare-event simulation work at Los Alamos and a textbook treatment in Hammersley and Handscomb's 1964 monograph Monte Carlo Methods. It now sits at the centre of off-policy reinforcement learning, variational inference, particle filtering, simulation-based Bayesian inference, and counterfactual evaluation in industrial recommender systems.

Given a target density $p$, a function $f$ whose expectation under $p$ we want, and a proposal density $q$ with $q(x)>0$ wherever $p(x)f(x)\neq 0$, the basic identity is

$$\mathbb{E}_p[f(X)] = \int f(x)\,p(x)\,dx = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx = \mathbb{E}_q\!\left[f(X)\,w(X)\right],$$

where the importance weight $w(x)=p(x)/q(x)$ corrects for the mismatch between sampling distribution and target. The standard estimator is

$$\hat\mu_{\text{IS}} = \frac{1}{N}\sum_{i=1}^N f(x_i)\,w(x_i), \qquad x_i \sim q.$$

When this support condition holds the estimator is unbiased; in practice the choice of $q$ controls almost everything that can go right or wrong with the method.
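A minimal sketch of the estimator, assuming a standard normal target and the rare-event integrand $f(x)=\mathbf 1\{x>4\}$. The shifted proposal $q=\mathcal N(4,1)$ and the helper names are illustrative choices, not part of the method; only the Python standard library is used.

```python
import math
import random


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def naive_tail_estimate(threshold, n, rng):
    """Plain Monte Carlo: fraction of N(0, 1) draws that exceed the threshold."""
    return sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n)) / n


def is_tail_estimate(threshold, n, rng):
    """Importance sampling with the shifted proposal q = N(threshold, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)                 # x_i ~ q
        if x > threshold:                             # f(x) = 1{x > threshold}
            total += normal_pdf(x) / normal_pdf(x, mean=threshold)  # w(x) = p(x)/q(x)
    return total / n


rng = random.Random(0)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))        # P(X > 4) for X ~ N(0, 1)
naive = naive_tail_estimate(4.0, 10_000, rng)
weighted = is_tail_estimate(4.0, 10_000, rng)
print(f"exact={exact:.3e}  naive={naive:.3e}  IS={weighted:.3e}")
```

With 10,000 draws the plain Monte Carlo estimate is usually exactly zero, because only about one standard normal draw in 30,000 lands beyond 4, while the weighted estimate is typically within a few percent of the exact value.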

Why the technique matters

Three forces keep importance sampling at the centre of modern probabilistic computing:

Sampling from $p$ may be impossible or expensive. $p$ might be an unnormalised posterior, the distribution induced by a target reinforcement learning policy, or the conditional path measure of a stochastic differential equation. Drawing from a tractable $q$ and reweighting is often the only feasible option.
Variance reduction. When $f$ concentrates in a region where $p$ puts little mass (rare-event probabilities, tail integrals, expected loss under adversarial inputs), naive Monte Carlo wastes nearly all of its samples. A well-chosen $q$ shifts samples into the region that matters.
It is the mathematical glue under modern algorithms. PPO clipping, V-trace, Retrace, IWAE bounds, particle filters, annealed importance sampling for marginal likelihoods, and counterfactual estimators in advertising and recommendation systems are all dressed-up importance samplers.

Different fields rediscovered the same equation under different names. Statisticians call $w$ the importance weight. Causal inference calls it the inverse propensity score. Survey methodologists call it a sampling weight. Reinforcement learning calls $\rho_t = \pi(a|s)/\mu(a|s)$ the importance ratio. The arithmetic is identical.

Mathematical foundations

Unbiasedness and variance

When $q$ dominates $|f|p$ (that is, $q(x)>0$ wherever $f(x)p(x)\neq 0$), $\hat\mu_{\text{IS}}$ is unbiased for $\mathbb{E}_p[f(X)]$. Its variance is

$$\operatorname{Var}_q\!\left[f(X)\,w(X)\right] = \mathbb{E}_q\!\left[f(X)^2 w(X)^2\right] - \bigl(\mathbb{E}_p[f(X)]\bigr)^2 = \int \frac{f(x)^2 p(x)^2}{q(x)}\,dx - \mu^2, \qquad \mu = \mathbb{E}_p[f(X)].$$

The second moment can be infinite even when $\mathbb{E}_p[f]$ is finite. Owen's Monte Carlo theory, methods and examples (Chapter 9) discusses conditions for finite variance. A simple sufficient condition is that the weight is bounded, $\sup_x p(x)/q(x) < \infty$, and that $\mathbb{E}_p[f(X)^2] < \infty$, because then $\int f^2p^2/q\,dx = \mathbb{E}_p[f(X)^2 w(X)] \le \sup_x w(x)\,\mathbb{E}_p[f(X)^2]$. When the second moment is infinite the estimator is still unbiased, but its variance is infinite, the central limit theorem does not apply, and confidence intervals built from the sample variance are unreliable.
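A worked case shows how the second moment can blow up, under the illustrative assumptions $p=\mathcal N(0,1)$, $q=\mathcal N(0,\sigma^2)$ and $f\equiv 1$. Completing the square gives

$$\mathbb{E}_q\!\left[w(X)^2\right] = \int \frac{p(x)^2}{q(x)}\,dx = \frac{\sigma}{\sqrt{2\pi}}\int \exp\!\left(-\frac{(2\sigma^2-1)\,x^2}{2\sigma^2}\right)dx = \frac{\sigma^2}{\sqrt{2\sigma^2-1}}, \qquad \sigma^2>\tfrac12,$$

so the per-sample variance is $\sigma^2/\sqrt{2\sigma^2-1}-1$, which is zero at $\sigma=1$. For $\sigma^2\le\tfrac12$ the integrand does not decay and the second moment is infinite. For $\tfrac12<\sigma^2<1$ the weights $p/q$ are unbounded in the tails yet the variance is finite, so bounded weights are sufficient but not necessary.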

The optimal proposal

Minimising the second moment over $q$ subject to $\int q = 1$ gives the optimal proposal

$$q^*(x) \propto |f(x)|\,p(x).$$

For non-negative $f$ this $q^*$ achieves zero variance; a single sample suffices. The catch is that the normalising constant of $q^*$ is exactly $\mathbb{E}_p[|f(X)|]$, the quantity we wanted in the first place. The optimal proposal is therefore unattainable, but it is a useful target: a good $q$ should look roughly like $|f|p$, putting mass where $f$ is large and $p$ is non-negligible.
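The zero-variance property can be checked on a toy case where $q^*$ happens to be easy to sample. Assume $p$ is the Exponential(1) density and $f(x)=x$, so $\mathbb{E}_p[f(X)]=1$ and $q^*(x)=x e^{-x}$ is a Gamma(2, 1) density. The example is contrived precisely because writing down $q^*$ required knowing the answer.

```python
import math
import random


def p(x):
    """Target density: Exponential(1) on x > 0."""
    return math.exp(-x)


def f(x):
    """Integrand with E_p[f(X)] = 1 (the mean of an Exponential(1))."""
    return x


def q_star(x):
    """Optimal proposal |f(x)| p(x) / E_p[|f|] = x exp(-x): a Gamma(2, 1) density."""
    return x * math.exp(-x)


rng = random.Random(1)
draws = [rng.gammavariate(2.0, 1.0) for _ in range(5)]   # x_i ~ q* (shape 2, scale 1)
terms = [f(x) * p(x) / q_star(x) for x in draws]          # f(x_i) w(x_i)
print(terms)  # every term equals 1 up to rounding, so the average has zero variance
```

Whatever values the draws take, each term $f(x_i)\,w(x_i)$ is identically $\mathbb{E}_p[f(X)]$, which is what "zero variance" means here.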

Effective sample size

The most common diagnostic for an importance sample is the effective sample size (ESS):

$$n_{\text{eff}} = \frac{\left(\sum_{i=1}^N w_i\right)^2}{\sum_{i=1}^N w_i^2}.$$

Its value lies between $1$ (one weight dominates) and $N$ (uniform weights). As a rule of thumb, an ESS of $350$ from $N=1000$ samples means that the weighted estimator is about as precise as an average of $350$ independent draws from $p$. ESS is widely used in sequential Monte Carlo to decide when to resample particles. It is a necessary diagnostic but not a sufficient one: a high ESS can still hide catastrophic tail behaviour if $q$ misses an important mode entirely, because the weights of the samples that were actually drawn can then look well balanced.
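The ESS formula is a one-liner. The sketch below also includes a log-domain variant, since weights are usually stored as log-weights in practice to avoid overflow; the function names are illustrative.

```python
import math


def effective_sample_size(weights):
    """ESS = (sum_i w_i)^2 / sum_i w_i^2 for non-negative importance weights."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0.0:
        raise ValueError("all weights are zero")
    return total * total / total_sq


def ess_from_log_weights(log_weights):
    """Same quantity from log-weights; subtracting the max avoids overflow and
    does not change the result because ESS is invariant to rescaling the weights."""
    m = max(log_weights)
    return effective_sample_size([math.exp(lw - m) for lw in log_weights])


print(effective_sample_size([1.0] * 1000))               # uniform weights: ESS = N
print(effective_sample_size([1.0] + [0.0] * 999))        # one weight carries everything: ESS = 1
print(effective_sample_size([1.0, 1.0, 1.0, 1.0, 4.0]))  # 8^2 / 20 = 3.2
```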

Self-normalised importance sampling

When $p$ is known only up to a normalising constant (the usual situation in Bayesian inference where $p(x)\propto \tilde p(x)$), the unnormalised weights $\tilde w(x)=\tilde p(x)/q(x)$ are still computable but $\hat\mu_{\text{IS}}$ is not. The standard fix is the self-normalised estimator (SNIS):

$$\bar w_i = \frac{\tilde w(x_i)}{\sum_{j=1}^N \tilde w(x_j)} = \frac{\tilde p(x_i)/q(x_i)}{\sum_{j=1}^N \tilde p(x_j)/q(x_j)}, \qquad \hat\mu_{\text{SNIS}} = \sum_{i=1}^N \bar w_i\,f(x_i).$$

SNIS is biased, because it is a ratio of two random variables, but it is consistent, and it often has lower mean-squared error than the vanilla estimator because fluctuations in the numerator and denominator partly cancel. The bias is $O(1/N)$ while the standard deviation is $O(1/\sqrt N)$, so the squared bias becomes negligible relative to the variance as $N$ grows. Robert and Casella's Monte Carlo Statistical Methods (Chapter 3) treats SNIS as the default importance sampler in Bayesian practice. Recent work such as Cardoso et al.'s 2022 BR-SNIS uses iterated sampling-importance-resampling to reduce the bias further at essentially the same cost.
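A minimal SNIS sketch, assuming the target is only available as an unnormalised log-density. Here $\tilde p(x)=\exp(-(x-2)^2/2)$ is a normal with mean 2 whose constant we pretend not to know, and the proposal is $\mathcal N(0, 3^2)$; both are illustrative choices. Subtracting the maximum log-weight is a standard numerical safeguard and does not change the normalised weights, because the shift cancels in the ratio.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """log N(x; mean, sd^2)."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def snis_estimate(f, log_p_tilde, sample_q, log_q, n, rng):
    """Self-normalised IS estimate of E_p[f] when only log p_tilde = log p + const is known."""
    xs = [sample_q(rng) for _ in range(n)]
    log_w = [log_p_tilde(x) - log_q(x) for x in xs]   # unnormalised log-weights
    m = max(log_w)                                      # shift cancels after normalisation
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return sum(wi * f(x) for wi, x in zip(w, xs)) / total


def log_p_tilde(x):
    """Unnormalised N(2, 1): the constant 1/sqrt(2 pi) is deliberately dropped."""
    return -0.5 * (x - 2.0) ** 2


def sample_q(rng):
    return rng.gauss(0.0, 3.0)                          # proposal q = N(0, 3^2)


def log_q(x):
    return log_normal_pdf(x, 0.0, 3.0)


def identity(x):
    return x


rng = random.Random(2)
mean_est = snis_estimate(identity, log_p_tilde, sample_q, log_q, 20_000, rng)
print(mean_est)  # should be close to the true target mean of 2
```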

Variance reduction techniques

Vanilla importance sampling is only one of a family of variance reduction tools, and it is often combined with the others.

Technique	Idea	Typical use
Antithetic variables	Pair each $x_i$ with $-x_i$ (or another negatively correlated draw)	Symmetric integrands, simulation studies
Control variates	Subtract a known-mean function correlated with $f$	Pricing financial derivatives, RL baselines
Stratified sampling	Partition the domain and sample within each stratum	Survey statistics, low-discrepancy QMC
Multiple importance sampling (Veach 1995)	Combine samples from several proposals via the balance heuristic	Path-traced rendering, bidirectional path tracing
Adaptive importance sampling	Refine $q$ from past samples (population Monte Carlo, AMIS)	Bayesian model evidence, simulation-based inference
Annealed importance sampling (Neal 2001)	Move along a tempered sequence $p_0,\ldots,p_n$ via MCMC kernels and accumulate weights	Marginal likelihood estimation, hard posteriors

Multiple importance sampling deserves a closer look. Eric Veach and Leonidas Guibas introduced it at SIGGRAPH 1995 in Optimally Combining Sampling Techniques for Monte Carlo Rendering, and it remains the workhorse of production renderers. Their balance heuristic gives a sample $x$ drawn from proposal $q_i$ the weight $n_i q_i(x)/\sum_k n_k q_k(x)$, where $n_k$ is the number of samples taken from proposal $q_k$, and they proved that it is close to optimal among such weighting schemes.
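A toy multi-sample MIS estimator with the balance heuristic, assuming target $p=\mathcal N(0,1)$, the two-tailed integrand $f(x)=\mathbf 1\{|x|>3\}$, and two proposals $\mathcal N(3,1)$ and $\mathcal N(-3,1)$ that each cover only one tail. Neither proposal alone is adequate, since each puts negligible mass on the other tail, but the balance heuristic combines them smoothly. In a renderer the proposals would instead be, for example, light sampling and BSDF sampling; the names here are illustrative.

```python
import math
import random


def normal_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mis_balance_estimate(f, p, proposals, n_per_proposal, rng):
    """Multi-sample MIS estimate of E_p[f] using the balance heuristic.

    proposals is a list of (sampler, density) pairs; every proposal gets the
    same number of samples here, so n_i = n_per_proposal for all i.
    """
    n = n_per_proposal
    total = 0.0
    for sampler, q_i in proposals:
        for _ in range(n):
            x = sampler(rng)
            denom = sum(n * q_k(x) for _, q_k in proposals)
            weight = n * q_i(x) / denom                   # balance heuristic weight
            total += weight * f(x) * p(x) / (n * q_i(x))  # (1/n_i) * weight * f p / q_i
    return total


def target(x):
    return normal_pdf(x)


def two_tails(x):
    return 1.0 if abs(x) > 3.0 else 0.0


proposals = [
    (lambda r: r.gauss(3.0, 1.0), lambda x: normal_pdf(x, 3.0)),    # covers the right tail
    (lambda r: r.gauss(-3.0, 1.0), lambda x: normal_pdf(x, -3.0)),  # covers the left tail
]
rng = random.Random(3)
estimate = mis_balance_estimate(two_tails, target, proposals, 20_000, rng)
exact = math.erfc(3.0 / math.sqrt(2.0))                # P(|X| > 3) for X ~ N(0, 1)
print(estimate, exact)
```

With balance-heuristic weights each sample's contribution simplifies to $f(x)p(x)/\sum_k n_k q_k(x)$, which is why the heuristic behaves like sampling from the mixture of all proposals.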

Off-policy reinforcement learning

Importance sampling is the standard correction in off-policy reinforcement learning, where a behaviour policy $\mu$ generates trajectories and a different target policy $\pi$ is evaluated or improved. The setting is unavoidable in three places: replay buffers in deep RL, observational data in healthcare and recommendation systems, and parallel actor-learner architectures where the actors run a stale policy by the time the learner uses their data.

The trajectory ratio

For a single full trajectory $\tau = (s_0, a_0, r_1, s_1, a_1, \ldots, s_T)$ generated by $\mu$, the trajectory importance ratio is

$$\rho_{0:T-1} = \prod_{k=0}^{T-1} \frac{\pi(a_k\mid s_k)}{\mu(a_k\mid s_k)}.$$

Given $N$ episodes generated by $\mu$ from a start state $s$ of an episodic Markov decision process, an unbiased estimator of the value $V_\pi(s)$ is

$$\hat V_\pi(s) = \frac{1}{N}\sum_{i=1}^N \rho^{(i)}_{0:T-1}\,G^{(i)},$$

where $G^{(i)}$ is the discounted return of the $i$-th trajectory. Sutton and Barto's Reinforcement Learning: An Introduction (Chapter 5) calls this ordinary importance sampling, develops the corresponding weighted importance sampling estimator (which divides by $\sum_i \rho^{(i)}_{0:T-1}$ instead of $N$), and works out their bias/variance trade-off: the ordinary estimator is unbiased but can have extreme variance, while the weighted estimator is biased but consistent and usually has far lower variance. The trouble is the product: a long trajectory multiplies many ratios, the variance of $\rho_{0:T-1}$ can grow exponentially in $T$, and a single lucky episode can swing the average by orders of magnitude. The product also vanishes as soon as the behaviour policy takes an action the target policy would never take, which is why Sutton and Barto warn that off-policy Monte Carlo control with a greedy target policy "learns only from the tails of episodes."
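A toy simulation of the ordinary (trajectory-ratio) estimator and its weighted counterpart, assuming a single-state problem with two actions, reward 1 for action 0 and 0 otherwise, a uniform behaviour policy, and a target policy that picks action 0 with probability 0.9. With horizon 3 and $\gamma=1$ the true value is $0.9\times 3 = 2.7$. The problem and all names are hypothetical illustrations.

```python
import random

PI = (0.9, 0.1)   # target policy pi(a) over actions {0, 1}; single-state toy problem
MU = (0.5, 0.5)   # behaviour policy mu(a)


def run_episode(horizon, rng):
    """Roll out the behaviour policy; action 0 pays reward 1, action 1 pays 0."""
    actions = [0 if rng.random() < MU[0] else 1 for _ in range(horizon)]
    rewards = [1.0 if a == 0 else 0.0 for a in actions]
    return actions, rewards


def off_policy_estimates(n_episodes, horizon, gamma, rng):
    """Return (ordinary IS, weighted IS) estimates of the target policy's value."""
    products, ratios = [], []
    for _ in range(n_episodes):
        actions, rewards = run_episode(horizon, rng)
        rho = 1.0
        for a in actions:                               # rho_{0:T-1} = prod_k pi(a_k) / mu(a_k)
            rho *= PI[a] / MU[a]
        g = sum(gamma ** k * r for k, r in enumerate(rewards))  # discounted return G
        products.append(rho * g)
        ratios.append(rho)
    ordinary = sum(products) / n_episodes               # divide by N: unbiased
    weighted = sum(products) / sum(ratios)              # divide by sum of ratios: biased, consistent
    return ordinary, weighted


rng = random.Random(4)
ordinary, weighted = off_policy_estimates(20_000, horizon=3, gamma=1.0, rng=rng)
print(ordinary, weighted)  # true value is 0.9 * 3 = 2.7
```

Increasing the horizon in this sketch is an easy way to watch the ordinary estimate become erratic as the product of ratios grows.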

Per-decision importance sampling

Doina Precup, Richard Sutton and Satinder Singh introduced per-decision importance sampling in their ICML 2000 paper Eligibility Traces for Off-Policy Policy Evaluation. The trick is that the reward $r_{t+1}$ depends only on the actions $a_0,\ldots,a_t$ taken before it, so its correction needs only the partial product $\rho_{0:t}$ rather than the full trajectory ratio, giving the return estimate $\sum_{t=0}^{T-1}\gamma^t \rho_{0:t}\, r_{t+1}$. Dropping the later, irrelevant factors typically reduces variance considerably, and the idea underlies most modern off-policy return estimators. The same paper analyses five eligibility-trace algorithms that combine importance sampling with TD learning and proves their consistency.
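The per-decision return $\sum_{t}\gamma^t \rho_{0:t}\, r_{t+1}$ can be computed in one pass by accumulating the partial product. The sketch below, with illustrative function names, contrasts it with the full-trajectory estimate on hand-made two-step episodes, assuming the per-step ratios $\pi(a_t\mid s_t)/\mu(a_t\mid s_t)$ are already known.

```python
def per_decision_return(ratios, rewards, gamma):
    """Per-decision IS return: sum_t gamma^t * rho_{0:t} * r_{t+1}.

    ratios[t] = pi(a_t|s_t) / mu(a_t|s_t); rewards[t] is the reward r_{t+1} that follows a_t.
    """
    total, rho = 0.0, 1.0
    for t, (ratio, reward) in enumerate(zip(ratios, rewards)):
        rho *= ratio                       # partial product rho_{0:t}
        total += gamma ** t * rho * reward
    return total


def full_trajectory_return(ratios, rewards, gamma):
    """Ordinary trajectory-level IS: rho_{0:T-1} * G."""
    rho = 1.0
    for ratio in ratios:
        rho *= ratio
    g = sum(gamma ** t * r for t, r in enumerate(rewards))
    return rho * g


print(per_decision_return([2.0, 0.5], [1.0, 1.0], gamma=1.0))     # 2*1 + (2*0.5)*1 = 3.0
print(full_trajectory_return([2.0, 0.5], [1.0, 1.0], gamma=1.0))  # (2*0.5)*(1+1) = 2.0
```

If the second ratio were 0, the per-decision estimate would still credit the first reward (value 2) whereas the trajectory estimate would be 0. Both estimators are unbiased under $\mu$; they differ in variance, not in expectation.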

Truncated and clipped ratios

More recent work tames the variance by truncating the ratios at the cost of some bias.

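Tree backup (Precup, Sutton and Singh 2000). Replaces the importance ratio with the target-policy probability $\pi(a_t\mid s_t)$, so the behaviour policy never enters the update and contributes no ratio variance; the cost is that traces are cut quickly even when $\pi$ and $\mu$ are close, which slows learning in near on-policy settings.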
Retrace($\lambda$) (Munos et al., NeurIPS 2016). Uses the truncated trace coefficient $c_t = \lambda\min\bigl(1, \pi(a_t\mid s_t)/\mu(a_t\mid s_t)\bigr)$ in a return-based off-policy update. Munos and colleagues proved that Retrace is the first return-based off-policy control algorithm that converges almost surely to $Q^*$ without a Greedy-in-the-Limit-with-Infinite-Exploration (GLIE) assumption and, as a corollary, settled the long-standing open question of the convergence of Watkins's $Q(\lambda)$.
V-trace (Espeholt et al., IMPALA, ICML 2018). Clips the per-step ratios with two constants: $\bar\rho$ truncates the ratio that weights each temporal-difference error, and $\bar c$ truncates the trace coefficients that propagate those errors backwards. This lets a distributed actor-learner architecture absorb the policy lag of many asynchronous actors. The IMPALA paper reports throughput of about 250,000 frames per second, and the architecture has been widely used for large-scale distributed RL training.
PPO clipping (Schulman et al. 2017). Proximal policy optimization's flagship objective is

$$L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\hat A_t,\;\operatorname{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat A_t\right)\right],$$

where $r_t(\theta) = \pi_\theta(a_t\mid s_t)/\pi_{\theta_{\text{old}}}(a_t\mid s_t)$ is a one-step importance ratio. Clipping the ratio to $[1-\epsilon, 1+\epsilon]$ (typically $\epsilon=0.2$) and taking the minimum with the unclipped term removes any incentive to push the ratio outside that interval, so the surrogate stops rewarding large departures from the old policy that would otherwise destabilise gradient updates. PPO is now a default policy-gradient method in many deep-RL libraries.
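The truncated corrections in the list above each take only a few lines. The sketch below assumes the per-step ratios $\pi/\mu$ have already been computed. It implements the Retrace coefficient $c_t=\lambda\min(1,\pi/\mu)$, the V-trace value targets through the backward recursion $v_s - V(x_s) = \delta_s + \gamma c_s\,(v_{s+1} - V(x_{s+1}))$ with $\delta_s = \rho_s\,(r_s + \gamma V(x_{s+1}) - V(x_s))$, $\rho_s=\min(\bar\rho,\pi/\mu)$ and $c_s=\min(\bar c,\pi/\mu)$, and the per-sample PPO clipped term. For simplicity it assumes no episode ends inside the segment and omits V-trace's optional $\lambda$ factor; the function names are illustrative, not any library's API.

```python
def retrace_coefficient(pi_prob, mu_prob, lam):
    """Retrace trace coefficient c_t = lam * min(1, pi(a_t|s_t) / mu(a_t|s_t))."""
    return lam * min(1.0, pi_prob / mu_prob)


def vtrace_targets(rewards, values, bootstrap_value, ratios, gamma,
                   rho_bar=1.0, c_bar=1.0):
    """V-trace value targets for one trajectory segment (no episode ends inside it).

    values[s] = V(x_s), bootstrap_value = V(x_n), ratios[s] = pi(a_s|x_s) / mu(a_s|x_s).
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    acc = 0.0                                    # v_{s+1} - V(x_{s+1}); zero past the segment
    for s in reversed(range(n)):
        rho_s = min(rho_bar, ratios[s])          # truncated ratio in the TD error
        c_s = min(c_bar, ratios[s])              # truncated trace coefficient
        delta = rho_s * (rewards[s] + gamma * next_values[s] - values[s])
        acc = delta + gamma * c_s * acc
        targets[s] = values[s] + acc
    return targets


def ppo_clipped_term(ratio, advantage, epsilon=0.2):
    """Per-sample PPO surrogate min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped * advantage)


print(vtrace_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, [1.0, 1.0, 1.0], gamma=1.0))
print(ppo_clipped_term(1.5, 1.0), ppo_clipped_term(1.5, -1.0))
```

With all ratios equal to 1 the V-trace targets reduce to ordinary $n$-step bootstrapped returns (rewards `[1, 1, 1]` with zero values give `[3.0, 2.0, 1.0]`), and for a positive advantage the PPO term stops growing once the ratio passes $1+\epsilon$, while for a negative advantage the unclipped, more pessimistic value is kept.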

Variational inference and generative models

Kingma and Welling's 2013 Variational Autoencoder maximises the evidence lower bound (ELBO) on $\log p(x)$ using a single sample from the recognition network $q_\phi(z\mid x)$. The ELBO can be tightened with importance weighting.

Burda, Grosse and Salakhutdinov's Importance Weighted Autoencoders (ICLR 2016) introduced the IWAE bound

$$\mathcal{L}_K^{\text{IWAE}}(x) = \mathbb{E}_{z_1,\ldots,z_K\sim q_\phi(\cdot\mid x)}\!\left[\log \frac{1}{K}\sum_{k=1}^K \frac{p_\theta(x, z_k)}{q_\phi(z_k\mid x)}\right].$$

By Jensen's inequality this is a lower bound on $\log p_\theta(x)$ for every $K\ge 1$, with $\mathcal{L}_1^{\text{IWAE}} = \text{ELBO}$. Burda et al. show that the bound is non-decreasing in $K$ and converges to $\log p_\theta(x)$ as $K\to\infty$ when the importance weights are bounded. The trick is the position of the logarithm: the VAE takes the average of logs, the IWAE takes the log of an average of importance-weighted likelihood ratios. IWAE-style $K$-sample estimates with large $K$ are now a standard way to report held-out log-likelihoods of VAEs and other latent-variable generative models.
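A toy check of the bound's behaviour, assuming the conjugate model $z\sim\mathcal N(0,1)$, $x\mid z\sim\mathcal N(z,1)$, for which $\log p(x)$ is known in closed form ($x\sim\mathcal N(0,2)$), and a deliberately crude proposal $q(z\mid x)=\mathcal N(0,1)$ equal to the prior. The log-mean-exp helper is a numerically stable way to take the logarithm of an average of weights; all names are illustrative.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """log N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_mean_exp(values):
    """Numerically stable log((1/K) * sum_k exp(values[k]))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def iwae_bound(x, k, n_reps, rng):
    """Monte Carlo estimate of L_K for z ~ N(0,1), x | z ~ N(z,1), proposal q(z|x) = N(0,1)."""
    total = 0.0
    for _ in range(n_reps):
        log_w = []
        for _ in range(k):
            z = rng.gauss(0.0, 1.0)                                  # z_k ~ q(. | x)
            log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
            log_w.append(log_joint - log_normal_pdf(z, 0.0, 1.0))    # log p(x, z_k) - log q(z_k | x)
        total += log_mean_exp(log_w)                                 # log of the average weight
    return total / n_reps


rng = random.Random(5)
x = 1.0
exact = log_normal_pdf(x, 0.0, 2.0)            # marginal likelihood: x ~ N(0, 2)
bounds = {k: iwae_bound(x, k, n_reps=2000, rng=rng) for k in (1, 10, 100)}
print(bounds, exact)
```

For $x=1$ the $K=1$ estimate should sit near the ELBO of about $-1.92$, while the $K=100$ estimate should be close to $\log p(x)\approx -1.516$.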

Closely related techniques include Reweighted Wake-Sleep (Bornschein and Bengio, ICLR 2015), which uses importance-weighted wake-phase updates to train discrete generative models, and Annealed Importance Sampling (Neal 2001), which constructs a chain of intermediate distributions $p_0,\ldots,p_n$ between a tractable proposal and the target and accumulates importance weights along an MCMC trajectory. AIS is a standard estimator of partition functions and marginal likelihoods for restricted Boltzmann machines and deep belief networks, and has also been used to evaluate the log-likelihoods of other deep generative models.
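A minimal AIS sketch, assuming a geometric path $\log p_\beta = (1-\beta)\log p_0 + \beta\log\tilde p$ between a normalised $p_0=\mathcal N(0,1)$ and the unnormalised target $\tilde p(x)=\exp(-(x-2)^2/(2\cdot 0.5^2))$, whose true normalising constant $0.5\sqrt{2\pi}$ lets us check the answer. Random-walk Metropolis moves serve as the MCMC kernels; the number of temperatures, Metropolis steps and the step size are illustrative tuning choices.

```python
import math
import random


def log_p0(x):
    """Normalised starting density N(0, 1), so Z_0 = 1."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def log_target(x):
    """Unnormalised target exp(-(x - 2)^2 / (2 * 0.5^2)); its true Z is 0.5 * sqrt(2 pi)."""
    return -0.5 * ((x - 2.0) / 0.5) ** 2


def log_annealed(x, beta):
    """Geometric path: log p_beta = (1 - beta) log p0 + beta log target."""
    return (1.0 - beta) * log_p0(x) + beta * log_target(x)


def ais_log_z(n_particles, n_temps, mh_steps, step_size, rng):
    """AIS estimate of log(Z_target / Z_0) with random-walk Metropolis transitions."""
    betas = [t / n_temps for t in range(n_temps + 1)]
    log_weights = []
    for _ in range(n_particles):
        x = rng.gauss(0.0, 1.0)                                    # exact draw from p_0
        log_w = 0.0
        for b_prev, b in zip(betas[:-1], betas[1:]):
            log_w += log_annealed(x, b) - log_annealed(x, b_prev)  # weight update at current x
            for _ in range(mh_steps):                              # moves that leave p_b invariant
                prop = x + step_size * rng.gauss(0.0, 1.0)
                log_accept = log_annealed(prop, b) - log_annealed(x, b)
                if math.log(1.0 - rng.random()) < log_accept:      # 1 - U lies in (0, 1]
                    x = prop
        log_weights.append(log_w)
    m = max(log_weights)
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights) / n_particles)


rng = random.Random(6)
estimate = ais_log_z(n_particles=500, n_temps=200, mh_steps=3, step_size=0.8, rng=rng)
print(estimate, math.log(0.5 * math.sqrt(2.0 * math.pi)))
```

The estimate of $\log Z$ should land close to $\log(0.5\sqrt{2\pi})\approx 0.226$. The average of the weights (not of the log-weights) is the unbiased estimator of $Z$; the log is taken only at the end.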

Particle filters and sequential Monte Carlo

In state-space models the target is a sequence of posteriors $p(x_{0:t}\mid y_{1:t})$ that grows in dimension over time. Sequential importance sampling (SIS) extends an importance sample one time step at a time, multiplying weights by $w_t \propto p(y_t\mid x_t)\,p(x_t\mid x_{t-1})/q(x_t\mid x_{t-1}, y_t)$.

SIS suffers from weight degeneracy: after a few time steps almost all of the weight concentrates on a single particle. The fix is resampling: at each step (or whenever the ESS falls below a threshold), draw a new particle population with probabilities proportional to the current weights and reset the weights to uniform. This sampling-importance-resampling (SIR) scheme is the bootstrap filter that Gordon, Salmond and Smith introduced in their 1993 IEE paper Novel approach to nonlinear/non-Gaussian Bayesian state estimation, which popularised particle filtering. The bootstrap filter and its descendants underpin robot localisation and SLAM (FastSLAM), object and radar target tracking, epidemic forecasting, and inference in probabilistic programming languages such as Anglican and Pyro.
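A compact bootstrap filter, assuming a one-dimensional linear-Gaussian model $x_t = 0.9\,x_{t-1} + \varepsilon_t$, $y_t = x_t + \eta_t$ with unit-variance noise, chosen because the Kalman filter gives the exact filtering means to compare against. With the transition density as the proposal, the incremental weight in the SIS formula reduces to the likelihood $p(y_t\mid x_t)$. Multinomial resampling via `random.Random.choices` and the $N/2$ ESS threshold are common choices but not the only ones (systematic resampling is often preferred); the model and names are illustrative.

```python
import math
import random

A, Q, R = 0.9, 1.0, 1.0   # x_t = A x_{t-1} + N(0, Q),  y_t = x_t + N(0, R),  x_0 ~ N(0, 1)


def simulate(T, rng):
    x, ys = rng.gauss(0.0, 1.0), []
    for _ in range(T):
        x = A * x + rng.gauss(0.0, math.sqrt(Q))
        ys.append(x + rng.gauss(0.0, math.sqrt(R)))
    return ys


def kalman_means(ys):
    """Exact filtering means E[x_t | y_1:t] for this linear-Gaussian model."""
    m, P, means = 0.0, 1.0, []
    for y in ys:
        m, P = A * m, A * A * P + Q                  # predict
        K = P / (P + R)                              # Kalman gain
        m, P = m + K * (y - m), (1.0 - K) * P        # update
        means.append(m)
    return means


def bootstrap_filter_means(ys, n, rng):
    """Bootstrap (SIR) filter: propose from the transition, weight by the likelihood,
    and resample multinomially whenever the ESS falls below n / 2."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    log_w = [0.0] * n
    means = []
    for y in ys:
        particles = [A * x + rng.gauss(0.0, math.sqrt(Q)) for x in particles]
        # Incremental weight p(y_t | x_t); the Gaussian constant cancels on normalisation.
        log_w = [lw - 0.5 * (y - x) ** 2 / R for lw, x in zip(log_w, particles)]
        m = max(log_w)
        log_w = [lw - m for lw in log_w]            # keep log-weights bounded
        w = [math.exp(lw) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        if 1.0 / sum(wi * wi for wi in w) < n / 2:   # ESS of the normalised weights
            particles = rng.choices(particles, weights=w, k=n)
            log_w = [0.0] * n
    return means


rng = random.Random(7)
ys = simulate(20, rng)
exact = kalman_means(ys)
approx = bootstrap_filter_means(ys, 2000, rng)
print(max(abs(a - b) for a, b in zip(exact, approx)))
```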

Use cases across machine learning

Application area	How importance sampling is used	Representative reference
Off-policy value estimation	Reweight returns by trajectory or per-decision ratios	Precup, Sutton and Singh 2000
Distributed deep RL	V-trace clipping in actor-learner architectures	Espeholt et al. (IMPALA) 2018
Policy optimisation	PPO clipped surrogate, TRPO surrogate	Schulman et al. 2017
Variational inference	IWAE tighter ELBO, Reweighted Wake-Sleep	Burda et al. 2016
Marginal likelihood	Annealed importance sampling along tempered chain	Neal 2001
Bayesian model comparison	PSIS-LOO leave-one-out cross-validation	Vehtari, Gelman, Gabry 2017
Particle filters	SIR/bootstrap filter for state-space models	Gordon, Salmond and Smith 1993
Computer graphics rendering	Multiple importance sampling for path tracing	Veach and Guibas 1995
Counterfactual policy evaluation	Inverse propensity scoring for ad/recommendation systems	Bottou et al. 2013
Causal inference	Inverse probability of treatment weighting	Horvitz and Thompson 1952
Survey statistics	Sampling weights to correct stratified surveys	Hansen and Hurwitz 1943
Likelihood-free inference (ABC)	Reweight simulations by approximate likelihood	Sisson et al. 2007
Diffusion model evaluation	IWAE-style upper bound on negative log-likelihood	Song et al. 2021
Neural rendering and inverse graphics	Multiple importance sampling within differentiable renderers	Müller et al. 2019

Pitfalls and limitations

Importance sampling is unforgiving when the proposal is wrong. The classical failure modes are:

Heavy-tailed weight distributions. When $p/q$ has a heavy right tail, a handful of samples dominate the average and the estimator behaves erratically. The Pareto-shape parameter of the weights is the standard diagnostic; values above $0.7$ are typically considered unreliable.
Mismatched supports. If $q(x)=0$ at some $x$ where $p(x)f(x)\neq 0$, the estimator is biased and the bias may be silently catastrophic. This is the most common bug in implementations: a softmax target policy that puts mass on actions that the behaviour policy excluded outright.
Infinite variance. Whenever $\int f(x)^2 p(x)^2/q(x)\,dx = \infty$, the variance estimate from a finite sample is misleading and the central limit theorem does not apply. Owen's Chapter 9 discusses sufficient conditions for finite variance.
Curse of dimensionality. The variance of the weight $w(X)$ in dimension $d$ typically grows exponentially with $d$ unless $q$ matches $p$ along most coordinates. As a result, vanilla importance sampling is rarely effective in moderate-to-high-dimensional problems without structural help (sequential structure, annealing, normalising-flow proposals, or learned proposals).
Counterfactual brittleness. In off-policy evaluation for recommender systems and ads, a target policy that puts probability $0.1$ on an item that the logging policy showed with probability $0.001$ produces a $100\times$ weight on every observed click of that item. A single click can dominate the entire estimate.
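A small experiment illustrating the curse of dimensionality, assuming $p=\mathcal N(0,I_d)$ and $q=\mathcal N(0,\sigma^2 I_d)$ with $\sigma=1.5$, a proposal that is only mildly too wide in each coordinate. For this product case $\mathbb{E}_q[w^2]=\bigl(\sigma^2/\sqrt{2\sigma^2-1}\bigr)^d$, so ESS$/N$ behaves roughly like $0.83^d$. The function name and settings are illustrative.

```python
import math
import random


def ess_fraction(dim, sigma, n, rng):
    """ESS / n for target p = N(0, I_d) and proposal q = N(0, sigma^2 I_d)."""
    log_w = []
    for _ in range(n):
        x = [rng.gauss(0.0, sigma) for _ in range(dim)]        # x ~ q
        # log p(x) - log q(x), summed over independent coordinates (2*pi terms cancel).
        log_w.append(sum(-0.5 * xi * xi + 0.5 * (xi / sigma) ** 2 + math.log(sigma)
                         for xi in x))
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    return sum(w) ** 2 / sum(wi * wi for wi in w) / n


rng = random.Random(8)
fractions = {d: ess_fraction(d, sigma=1.5, n=5000, rng=rng) for d in (1, 10, 50)}
print(fractions)
```

Expect an ESS fraction near 0.83 in one dimension, a much smaller one at $d=10$, and only a handful of effective samples out of 5,000 at $d=50$, even though $q$ covers $p$ everywhere.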

Diagnostics

Three diagnostics are standard practice:

Effective sample size. Computed continuously during a run; small ESS indicates a poor proposal or weight degeneracy.
Pareto-smoothed importance sampling (PSIS). Vehtari, Simpson, Gelman, Yao and Gabry's 2024 JMLR paper Pareto Smoothed Importance Sampling fits a generalised Pareto distribution to the upper tail of the importance ratios, smooths the largest weights with the fitted quantiles, and reports the shape parameter $\hat k$ as a finite-sample convergence diagnostic. The recommended threshold depends on sample size: $\hat k < \min(1 - 1/\log_{10} S,\; 0.7)$ for $S$ samples. PSIS is implemented in the R package loo and the Python package arviz.
Visual inspection of weight histograms. A long, sparse right tail signals a problem that ESS may not reveal.

Implementations

Importance sampling is built into most probabilistic programming languages and reinforcement learning frameworks:

System	Where importance sampling appears
PyMC	SMC sampler, variational SVGD, PSIS-LOO
Pyro / NumPyro	`infer.SMCFilter`, `infer.Importance`, IWAE example
TensorFlow Probability	`tfp.mcmc` (annealed importance sampling), `tfp.experimental.mcmc` (particle filters), importance-weighted ELBO
Edward2	Importance-weighted training and evaluation
Stan + `loo`	PSIS leave-one-out cross-validation
Stable-Baselines3	PPO clipped objective
RLlib	IMPALA V-trace, PPO clipping
Tianshou	PPO and off-policy actor-critic algorithms with IS corrections
Open-source renderers (Mitsuba, PBRT)	Multiple importance sampling in path tracers

Recent developments

Research interest in importance sampling has stayed steady because the technique sits at the intersection of so many fields. A few directions stand out in the past decade.

Pareto-smoothed importance sampling. Vehtari et al.'s PSIS gave the field a practical, sample-size-aware diagnostic and is now the default for Bayesian leave-one-out cross-validation.

Differentiable Monte Carlo. Treating importance-weighted estimators as differentiable graphs (with the reparameterisation trick where possible) lets practitioners back-propagate through Monte Carlo objectives. This is the technical core of the IWAE bound, of differentiable rendering, and of recent work on differentiable annealed importance sampling.

Off-policy evaluation in offline reinforcement learning. The shift from online RL to offline RL on logged data made off-policy evaluation a first-class problem. Self-normalised importance sampling, doubly robust estimators, and clipped IPS are routine in the offline-RL literature.

LLM evaluation and inference-time scaling. Importance sampling is increasingly used to evaluate large language models on rare prompts, to re-weight samples from proposal LLMs in order to estimate behaviour under a target policy (for example in distillation and red-teaming), and to re-score beam search candidates with auxiliary models.

Neural proposals. Normalising-flow and energy-based models are now used as learned importance-sampling proposals in physics simulation and lattice quantum chromodynamics, where a hand-designed $q$ would be hopeless.

Comparison with other Monte Carlo methods

Method	Assumptions on target	When to use	Cost per sample	Output
Vanilla Monte Carlo	Can sample directly from $p$	$p$ is easy to draw from and integrand is well-behaved	Very low	i.i.d. samples from $p$, unbiased $\hat\mu$
Importance sampling	Can evaluate $p$ (up to constant) and have a covering proposal $q$	Sampling from $p$ is hard, or $f$ concentrates in a rare region	Low	Weighted samples, unbiased (or consistent) $\hat\mu$
Markov chain Monte Carlo (Metropolis–Hastings, Gibbs)	Can evaluate $\tilde p$ up to constant	Posterior sampling, unimodal or moderately multimodal targets	Moderate, correlated samples	Approximate samples after burn-in
Hamiltonian Monte Carlo	$\nabla\log\tilde p$ available	Continuous high-dimensional posteriors	Higher per step, much better mixing	Approximate samples with low autocorrelation
Sequential Monte Carlo / particle filters	State-space structure or sequence of intermediate targets	Nonlinear, non-Gaussian filtering, evidence estimation	Moderate, parallel	Particle approximation of $p_t$, evidence estimate
Variational inference	Choose a tractable family $q_\phi$	Speed-critical Bayesian inference, large-scale latent-variable models	Optimisation cost	Parametric approximation $q_\phi$
Annealed importance sampling	Tempered chain $p_0,\ldots,p_n$	Marginal likelihood / evidence estimation	Higher (chain of MCMC steps)	Unbiased weight, evidence estimate

A short history

While precursors of importance sampling appear in 1949 statistical-physics work by Kahn and Marshall (and in Hansen and Hurwitz's 1943 stratified survey weights), the explicit formulation as a variance-reduction technique is generally credited to the early Los Alamos Monte Carlo group. Hammersley and Handscomb's 1964 Monte Carlo Methods (Methuen Monographs on Applied Statistics and Probability) gave the first textbook treatment, and is still cited as the canonical reference for the classical results. Teun Kloek and Herman van Dijk's 1978 Econometrica paper introduced importance sampling to Bayesian econometrics. Rubinstein's 1981 Simulation and the Monte Carlo Method and Robert and Casella's 2004 Monte Carlo Statistical Methods (especially Chapter 3) are the standard graduate references in statistics; Bishop's Pattern Recognition and Machine Learning (2006) and Murphy's Machine Learning: A Probabilistic Perspective (2012) and Probabilistic Machine Learning (2022) cover the technique for ML audiences. Owen's online textbook Monte Carlo theory, methods and examples (2013, in progress) gives the most complete modern variance-reduction treatment, with Chapter 9 devoted entirely to importance sampling. The reinforcement-learning treatment is concentrated in Sutton and Barto's Reinforcement Learning: An Introduction (2018, 2nd edition), Chapters 5 and 7.

References

Hammersley, J. M. and Handscomb, D. C. (1964). *Monte Carlo Methods*. Methuen, London. Monographs on Applied Statistics and Probability.
Kloek, T. and van Dijk, H. K. (1978). Bayesian Estimates of Equation System Parameters: An Application of Integration by Monte Carlo. *Econometrica* 46(1):1–19.
Geweke, J. (1989). Bayesian Inference in Econometric Models Using Monte Carlo Integration. *Econometrica* 57(6):1317–1339.
Gordon, N. J., Salmond, D. J., and Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. *IEE Proceedings F (Radar and Signal Processing)* 140(2):107–113.
Veach, E. and Guibas, L. J. (1995). Optimally Combining Sampling Techniques for Monte Carlo Rendering. *SIGGRAPH '95*.
Precup, D., Sutton, R. S., and Singh, S. (2000). Eligibility Traces for Off-Policy Policy Evaluation. *ICML 2000*.
Neal, R. M. (2001). Annealed importance sampling. *Statistics and Computing* 11:125–139.
Robert, C. P. and Casella, G. (2004). *Monte Carlo Statistical Methods* (2nd ed.). Springer Texts in Statistics. Chapter 3 covers importance sampling.
Bishop, C. M. (2006). *Pattern Recognition and Machine Learning*. Springer.
Murphy, K. P. (2012). *Machine Learning: A Probabilistic Perspective*. MIT Press.
Owen, A. B. (2013). *Monte Carlo theory, methods and examples*. Online textbook. Chapter 9: Importance Sampling.
Kingma, D. P. and Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv:1312.6114.
Burda, Y., Grosse, R. and Salakhutdinov, R. (2016). Importance Weighted Autoencoders. *ICLR 2016*. arXiv:1509.00519.
Munos, R., Stepleton, T., Harutyunyan, A. and Bellemare, M. G. (2016). Safe and Efficient Off-Policy Reinforcement Learning. *NeurIPS 2016*.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. and Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv:1707.06347.
Vehtari, A., Gelman, A. and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. *Statistics and Computing* 27:1413–1432.
Espeholt, L., Soyer, H., Munos, R., Simonyan, K., et al. (2018). IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. *ICML 2018*.
Sutton, R. S. and Barto, A. G. (2018). *Reinforcement Learning: An Introduction* (2nd ed.). MIT Press. Chapters 5 (off-policy MC) and 7 (off-policy n-step).
Cardoso, G., Idrissi, Y. J. E., Le Corff, S. and Moulines, E. (2022). BR-SNIS: Bias Reduced Self-Normalized Importance Sampling. *NeurIPS 2022*. arXiv:2207.06364.
Murphy, K. P. (2022). *Probabilistic Machine Learning: An Introduction*. MIT Press.
Vehtari, A., Simpson, D., Gelman, A., Yao, Y. and Gabry, J. (2024). Pareto Smoothed Importance Sampling. *Journal of Machine Learning Research* 25:1–58.
Wikipedia contributors. *Importance sampling*. Wikipedia.

