Bayesian network

A Bayesian network (also called a belief network, Bayes net, directed graphical model, or probabilistic causal network) is a probabilistic graphical model that represents a set of random variables and their conditional dependencies through a directed acyclic graph (DAG). Each node of the graph is a random variable, and each directed edge represents a direct probabilistic dependency, typically read as the parent influencing the child. The graph encodes a joint probability distribution as a product of local conditional distributions: P(X_1, ..., X_n) = ∏ P(X_i | parents(X_i)). This factorisation, together with the conditional-independence assumptions implied by the graph, lets a model with potentially exponential joint complexity be specified, stored, and queried in space and time that scale with the local structure rather than the full state space.^[1]^[2]

The modern formulation of Bayesian networks was developed in the 1980s by Judea Pearl and collaborators at UCLA, who unified prior work on probabilistic expert systems, influence diagrams, and tree-based propagation into the framework laid out in Pearl's 1988 textbook Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.^[1] Together with the junction-tree algorithm of Lauritzen and Spiegelhalter (1988),^[3] the Bayesian-network formalism became the standard tool for reasoning about uncertainty in artificial intelligence throughout the 1990s. It later provided the substrate on which Pearl built his theory of causal inference and the do-calculus, presented in Causality: Models, Reasoning, and Inference (2nd ed., 2009).^[4] The canonical reference for the broader field of probabilistic graphical models is Koller and Friedman's 2009 textbook Probabilistic Graphical Models: Principles and Techniques.^[2]

Bayesian networks are closely related to, and in many cases generalise, classical models such as naive Bayes classifiers, Hidden Markov Models, and Kalman filters. They remain in active use for medical diagnosis, fault diagnosis, gene regulatory network inference, risk assessment, decision support, and any setting where structured uncertainty, expert knowledge, and interpretability matter more than raw predictive accuracy.

Definition and joint factorisation

A Bayesian network over variables X_1, ..., X_n consists of two parts. The first is a directed acyclic graph G whose nodes are the variables. The second is, for each node X_i, a conditional probability distribution P(X_i | parents(X_i)) that quantifies how the variable depends on its parents in G. For discrete variables, this distribution is usually represented as a conditional probability table (CPT). For continuous variables, it is a conditional density, often a linear-Gaussian or a noisy-OR model.^[1]^[2]

The central claim of the formalism is that the joint distribution factorises as

P(X_1, ..., X_n) = ∏_{i=1..n} P(X_i | parents(X_i))

This factorisation is a direct consequence of the chain rule of probability combined with the local Markov assumption: each variable is conditionally independent of its non-descendants given its parents. The factorisation has two practical consequences. First, the model is parameter-frugal: a node with k binary parents needs 2^k probabilities, not 2^n. Second, queries that respect the graph structure can often be answered by message-passing on the graph rather than by enumerating the full joint table.^[2]

History

The roots of Bayesian networks lie in 1960s and 1970s work on probabilistic reasoning in expert systems, which struggled with the combinatorial explosion of joint distributions over many variables. Several lines of research converged in the early 1980s. In statistics, contingency-table analysis and the theory of graphical log-linear models, developed by Darroch, Lauritzen, and Speed, established that conditional independence could be read off from a graph. In artificial intelligence, the limitations of certainty factors and ad-hoc rule-based reasoning under uncertainty were becoming apparent, and Pearl, Spiegelhalter, Cooper, Henrion, and others were searching for a coherent probabilistic alternative.^[1]

Judea Pearl's 1986 paper introduced belief propagation as an exact inference algorithm for poly-trees, and his 1988 textbook Probabilistic Reasoning in Intelligent Systems consolidated the field, defining d-separation, message passing, and the semantics of belief networks.^[1] In parallel, Lauritzen and Spiegelhalter's 1988 paper in the Journal of the Royal Statistical Society introduced the junction-tree (also called clique-tree or join-tree) algorithm, which extended exact inference to general DAGs by triangulating the moralised graph.^[3] The 1990s saw a rapid expansion: Cooper proved in 1990 that exact inference is NP-hard,^[5] which motivated the development of approximate inference methods including Markov chain Monte Carlo, variational approximations, and loopy belief propagation. Software tools such as Hugin (a commercialisation of the Lauritzen-Spiegelhalter framework), Netica, and Pearl's own Bayes Net Toolbox emerged in this period.

In the 2000s the focus shifted toward learning. Heckerman's 1995 Microsoft Research tutorial gave a Bayesian treatment of parameter and structure learning that influenced a generation of researchers,^[6] and the Friedman, Geiger, and Goldszmidt 1997 paper introduced Tree-Augmented Naive Bayes (TAN) as a competitive classifier.^[7] Pearl's 2000/2009 Causality recast Bayesian networks as a language for causal reasoning, introducing the do-operator and the calculus of interventions. The 2010s and 2020s have seen Bayesian networks subsumed into the broader frameworks of probabilistic programming (Stan, PyMC, Pyro, NumPyro), causal discovery, and neuro-symbolic AI.^[2]^[4]

Foundational concepts

Conditional probability tables and densities

For each node, a Bayesian network specifies a conditional distribution given its parents. With discrete variables this is a CPT whose rows are parent configurations and whose columns are the possible values of the child. For continuous variables, common parametric forms include linear-Gaussian (the child is a linear function of its parents plus Gaussian noise) and conditional-Gaussian for hybrid networks that mix discrete and continuous variables.^[2]

D-separation

D-separation is a graph-theoretic criterion, introduced by Pearl, that determines which conditional independencies hold in a Bayesian network purely from its graph structure. A path between two nodes is blocked by a conditioning set Z if it contains a chain or fork node that is in Z, or a collider (a node with two converging arrows) that is neither in Z nor has a descendant in Z. If every path between X and Y is blocked by Z, then X and Y are d-separated by Z, and the network entails X ⊥ Y | Z.^[1] D-separation is sound (every d-separated pair is conditionally independent in every distribution that factorises over the graph) and is a complete graphical characterisation of the independencies entailed by the structure alone.

Markov blanket

The Markov blanket of a node X consists of its parents, its children, and the other parents of those children (sometimes called spouses or co-parents). Conditional on its Markov blanket, X is independent of every other node in the network. This locality property is what makes Gibbs sampling and other local update schemes practical: each variable can be resampled using only the values of its blanket.^[1]^[2]

Chain rule and factorisation

The joint factorisation P(X_1, ..., X_n) = ∏ P(X_i | parents(X_i)) follows from the chain rule of probability under any topological ordering, combined with the local Markov property that says a node is independent of its non-descendants given its parents. Choosing a different ordering can yield a different graph, and minimal Bayesian networks (those with the fewest edges that respect a given independence structure) generally depend on ordering, although they are unique up to a class of equivalent DAGs sharing the same skeleton and v-structures.

Inference

Given a Bayesian network, the canonical inference problem is to compute the posterior distribution of one set of variables (queries) given values for another set (evidence), marginalising over the rest. Several families of algorithms exist; the choice depends on graph structure, the number of variables, and accuracy requirements.

Method	Type	Origin	Notes
Belief propagation	Exact on trees	Pearl 1986; Pearl 1988	Exact for poly-trees; "loopy BP" is an approximation on general graphs
Variable elimination	Exact	Zhang and Poole 1994; Dechter 1996	Sums out non-query variables one at a time; complexity exponential in induced width
Junction tree (clique tree)	Exact	Lauritzen and Spiegelhalter 1988^[3]	Polynomial in the tree-width of the moralised, triangulated graph
Cutset conditioning	Exact	Pearl 1988	Conditions on a small set to break loops, then runs poly-tree BP on each instantiation
Gibbs sampling	Approximate, sampling	Geman and Geman 1984; Pearl 1987	Uses Markov-blanket conditionals; basis of WinBUGS
Metropolis-Hastings	Approximate, sampling	Metropolis et al. 1953; Hastings 1970	General-purpose MCMC
Importance sampling, likelihood weighting	Approximate, sampling	Fung and Chang 1989; Shachter and Peot 1989	Useful when evidence is unlikely under the prior
Mean-field variational inference	Approximate, deterministic	Saul, Jaakkola, Jordan 1996	Posits a factorised approximation; minimises KL divergence
Structured variational methods	Approximate, deterministic	Saul and Jordan 1996; Wainwright and Jordan 2008	Trades accuracy for tractability through structured families
Loopy belief propagation	Approximate	Murphy, Weiss, Jordan 1999	Belief propagation applied to general graphs; often empirically accurate

Computational complexity

Gregory Cooper proved in 1990 that exact inference in general Bayesian networks is NP-hard, even for restricted topologies.^[5] Dagum and Luby (1993) later showed that approximating posterior probabilities to within a constant factor is also NP-hard in the worst case. The intractability is not merely theoretical: real medical-diagnosis networks such as QMR-DT (Quick Medical Reference, Decision-Theoretic) have hundreds of disease nodes and thousands of finding nodes, making exact inference infeasible and motivating the use of variational and sampling approximations.

For structured graphs the picture is much better. Trees admit linear-time exact inference through belief propagation. Graphs with bounded tree-width admit polynomial-time exact inference through the junction-tree algorithm, with complexity O(n · 2^w) for binary variables, where w is the tree-width.^[3] In practice this means that networks with many variables but limited induced connectivity remain tractable.

Learning Bayesian networks

A full learning problem combines parameter learning (filling in the CPTs) with structure learning (choosing the DAG itself). When the structure is known and data is complete, parameter learning reduces to counting; otherwise it can become very hard.

Parameter learning

With complete data and a fixed structure, maximum-likelihood estimation of CPT entries reduces to normalised counts of joint occurrences in the data. The Bayesian alternative places Dirichlet priors on the rows of each CPT and yields posterior CPT entries as posterior expectations of the corresponding multinomial parameters; the conjugate-prior structure makes this update a closed-form add-and-normalise operation. With incomplete data or hidden variables, the standard tool is the expectation-maximisation (EM) algorithm, applied in this setting by Lauritzen (1995) and others. Structural EM (Friedman 1997) extends EM to alternate between parameter and structure updates when there are latent variables.^[6]

Structure learning

Learning the DAG itself is harder. Chickering (1996) showed that finding the network structure that maximises a Bayesian score is NP-hard. Three main families of algorithms exist.

Family	Representative algorithms	Idea
Score-based	Greedy hill-climbing, tabu search, simulated annealing, GES (Greedy Equivalence Search)	Optimise a goodness-of-fit score (BIC, AIC, BDe, MDL) over the space of DAGs or equivalence classes
Constraint-based	PC algorithm (Spirtes and Glymour 1991); FCI (Fast Causal Inference)	Use conditional-independence tests to discover the skeleton, then orient edges using v-structures and orientation rules^[8]
Hybrid	MMHC (Tsamardinos, Brown, Aliferis 2006)^[9]; Sparse Candidate (Friedman et al. 1999)	Use independence tests to restrict candidate parents, then score-optimise within that restricted space

Score-based methods rely on decomposable scores such as the Bayesian Dirichlet equivalent (BDe) score introduced by Heckerman, Geiger, and Chickering (1995),^[6] BIC (Bayesian information criterion), or AIC (Akaike information criterion). Bayesian model averaging integrates predictions over the posterior distribution on graphs rather than picking a single best graph, which can be useful when many structures fit the data nearly equally well.

Constraint-based methods like the PC algorithm^[8] start from a complete undirected graph and remove edges between variables found to be conditionally independent given some subset of the others. Surviving edges are then oriented using v-structures (colliders) and Meek's rules. The Fast Causal Inference (FCI) algorithm extends PC to handle latent confounders and selection bias, returning a partial ancestral graph rather than a single DAG.

The MMHC algorithm^[9] combines a constraint-based local skeleton-discovery phase (Max-Min Parents and Children) with a hill-climbing search over DAGs restricted to that skeleton. Empirical comparisons in Tsamardinos, Brown, and Aliferis (2006) found it to outperform PC, GES, and several other algorithms across many benchmarks.

Special cases and variants

Naive Bayes

The simplest Bayesian network of practical interest is the naive Bayes classifier, in which a single class node C is the parent of every feature node X_i, and the features are assumed conditionally independent given the class. Despite its strong independence assumption, naive Bayes is often competitive with much more sophisticated classifiers, especially for text classification.^[7]

Tree-Augmented Naive Bayes (TAN)

Friedman, Geiger, and Goldszmidt (1997) introduced TAN as a relaxation of naive Bayes that allows each feature to depend on at most one other feature in addition to the class.^[7] The optimal tree-shaped augmentation can be found in polynomial time using a maximum-spanning-tree algorithm on the conditional mutual information between features. TAN typically improves classification accuracy over naive Bayes while preserving most of its computational simplicity.

Dynamic Bayesian networks

Dynamic Bayesian networks (DBNs) extend the framework to time-series data. Introduced by Dean and Kanazawa in their 1989 paper A Model for Reasoning about Persistence and Causation in Computational Intelligence,^[10] a DBN consists of a prior network B_1 over the variables at time 1 and a transition network B_→ that defines the conditional distribution of the variables at time t+1 given those at time t. Standard inference tasks in DBNs include filtering (computing the posterior over the current state given past evidence), smoothing (over a past state given all evidence), prediction (over a future state), and most-likely-explanation. Hidden Markov Models and Kalman filters are special cases of DBNs.

Influence diagrams

Influence diagrams extend Bayesian networks with two additional node types: decision nodes representing controllable choices and utility nodes representing the agent's payoff. The resulting model supports decision-theoretic reasoning under uncertainty and is the basis of normative decision-support systems in medicine, business, and engineering.

Markov random fields and factor graphs

Markov random fields (MRFs) are the undirected counterpart of Bayesian networks. They factorise the joint distribution as a product of non-negative potential functions over cliques rather than as a product of conditional distributions over node-parent pairs. Factor graphs provide a unifying bipartite representation in which both Bayesian networks and MRFs are special cases, and the sum-product and max-product algorithms operate on factor graphs.^[2]

Causal Bayesian networks

A causal Bayesian network is a Bayesian network whose edges are interpreted as direct causal relationships rather than mere statistical associations. Pearl developed the framework in the 1990s and consolidated it in Causality: Models, Reasoning, and Inference (2nd ed., 2009).^[4]

The key new operation is the do-operator. P(Y | do(X = x)) denotes the distribution of Y after an external intervention sets X to x, which is generally different from the conditional distribution P(Y | X = x). Graphically, do(X = x) corresponds to a mutilated network in which all incoming edges to X are removed and X is set to x. The do-calculus, a set of three rules introduced by Pearl, gives sufficient conditions for rewriting interventional queries in terms of observational ones. Shpitser and Pearl (2006) and Tian and Pearl (2002) gave a complete identifiability algorithm: an interventional distribution is identifiable from observational data and a causal DAG if and only if their algorithm returns a do-free expression.

Causal Bayesian networks underpin much of modern causal inference and are connected to Bayesian-network structure learning through algorithms such as PC and FCI, which try to recover the causal graph (or its Markov equivalence class) from data alone under faithfulness assumptions.^[8]

Real-world applications

Bayesian networks are most useful in domains that involve structured uncertainty, expert knowledge that needs to be encoded explicitly, and a need for interpretability. The following table lists historically and currently important applications.

Application	System / domain	Notes
Medical diagnosis	MUNIN (Andreassen et al., 1989)	Causal probabilistic network for interpretation of electromyographic findings; one of the first large medical Bayesian networks
Medical diagnosis	Pathfinder (Heckerman, 1992)	Diagnosis of lymph-node diseases; over 100 features and 60+ diseases; commercialised as Intellipath
Medical diagnosis	QMR-DT	Decision-theoretic version of Quick Medical Reference; bipartite network with hundreds of diseases and thousands of findings
User modelling	Lumiere (Microsoft Research, 1993-1998)	Bayesian user-modelling system that became the basis for the Office Assistant ("Clippy") in Microsoft Office 97
Email filtering	Bayesian spam filters (e.g., SpamAssassin, Mozilla)	Naive-Bayes classifiers over word-level features; widely deployed before transformer-based filters
Defence and intelligence	Pearl Harbor reconstruction, missile-defence systems	Bayesian fusion of uncertain sensor data
Bioinformatics	Gene regulatory network inference	Dynamic Bayesian networks applied to time-series gene-expression data; models genetic regulatory networks
Risk and reliability	Industrial fault diagnosis, nuclear safety	Encodes failure modes and diagnostic findings in interpretable form
Speech recognition	HMM-based ASR (pre-deep-learning)	Hidden Markov Models, special-case DBNs, dominated speech recognition through the 1990s and 2000s
Tracking	Kalman-filter-based vehicle and pedestrian tracking	Linear-Gaussian DBNs applied to sensor-fusion problems
Computer vision	Pre-deep-learning image segmentation, object recognition	Bayesian networks and MRFs were used for stereo, segmentation, and parts-based recognition
Forensics	DNA mixture interpretation	Networks like those in TrueAllele encode the relationship between genotypes, observed allele peaks, and evidence

Software

Tool	Language / platform	Notes
Bayes Net Toolbox (BNT)	MATLAB	Written by Kevin Murphy from 1997 onwards; long-standing reference implementation
Hugin	Commercial; APIs in many languages	Direct descendant of the Lauritzen-Spiegelhalter junction-tree work
Netica	Commercial (Norsys)	Widely used in industry and education
GeNIe / SMILE	Free GUI / commercial library; from BayesFusion (formerly U. Pittsburgh)	Influence diagrams and Bayesian networks
SamIam	Java GUI; UCLA Automated Reasoning Group	Research-oriented, exact inference and sensitivity analysis
pgmpy	Python	Open-source library for discrete and continuous Bayesian networks
bnlearn	R	Implements PC, GS, Hill-Climbing, MMHC, and other learners
daft	Python	A drawing library for plate-notation graphical models
Stan, PyMC, Pyro, NumPyro	Probabilistic programming languages	Generalise Bayesian networks; support continuous variables, MCMC, and variational inference

Model	Directed?	Factorisation	Inference complexity	Typical use
Bayesian network	Yes (DAG)	Product of conditionals P(X_i \| parents)	Polynomial in tree-width; NP-hard in general^[5]	Diagnosis, causal reasoning, expert systems
Markov random field (MRF)	No	Product of clique potentials	Polynomial in tree-width; NP-hard in general	Image segmentation, statistical physics
Factor graph	Bipartite	Product of factors over variable subsets	Same as underlying graph	Generic message-passing, including LDPC codes
Hidden Markov Model (HMM)	Yes (chain DAG)	P(X_t \| X_{t-1}) P(O_t \| X_t)	Linear in chain length (forward-backward)	Speech recognition, bioinformatics
Dynamic Bayesian network	Yes	Time-extended Bayesian network^[10]	Polynomial in tree-width per slice	Tracking, time-series modelling
Conditional random field (CRF)	Undirected	Conditional log-linear model P(Y \| X)	Polynomial in tree-width	Sequence labelling (POS tagging, NER)
Influence diagram	Mixed (DAG + decision and utility nodes)	Bayesian network + decision and utility	Polynomial in tree-width for fixed strategies	Decision support

Bayesian inference in Bayesian networks

A recurring source of confusion is the relationship between Bayesian networks and Bayesian inference. The two ideas are related but distinct. Bayesian inference is a general philosophy of statistical reasoning in which beliefs are represented as probability distributions and updated via Bayes' theorem; it can be applied to any model, Bayesian network or not. A Bayesian network is a particular representation for a joint probability distribution, named after Bayes' theorem rather than committed to Bayesian-versus-frequentist methodology. CPTs in a Bayesian network can be estimated by maximum likelihood (a frequentist procedure), and inference of the posterior over query variables given evidence can be performed by purely probabilistic message-passing without invoking any prior. Conversely, Bayesian methods can place priors over the CPT entries, the structure, or both, and combine them through the same network machinery; this is the route that Heckerman, Geiger, and Chickering took with the BDe score.^[6]

Why Bayesian networks fell partly out of fashion

Through the 2010s, deep neural networks came to dominate most large-scale perception and language tasks: image classification, speech recognition, machine translation, and language modelling. Bayesian networks were ill-suited to those problems for several reasons. They scale badly with the number of variables once the graph is dense. They typically require expert knowledge or careful structure learning, both of which are awkward for raw pixels or characters. Their generative semantics, while a strength for diagnosis, is a weakness for discriminative tasks where end-to-end optimisation of conditional likelihood is what matters.

At the same time, Bayesian networks did not disappear. They remain the preferred tool for problems with the following characteristics:

Structured uncertainty over a moderate number of variables, with explicit conditional dependencies.
Heterogeneous evidence and missing data, where the same model needs to support inference in many directions.
Causal questions: "what if we intervened?", which require an interventional rather than a purely associational model.
High-stakes decision support where interpretability and traceability of inferences is more important than headline accuracy.
Domains where expert knowledge is plentiful and labelled data is scarce, such as parts of clinical medicine, reliability engineering, and forensic science.

Probabilistic programming languages (Stan, PyMC, Pyro, NumPyro) generalise Bayesian networks to arbitrary stochastic computations and provide automatic inference engines based on Hamiltonian Monte Carlo and variational inference. They are best understood as the modern descendants of the BNT and Hugin tradition rather than as replacements for graphical models.

Modern resurgence

Several contemporary research currents have brought Bayesian networks back into the spotlight, often under different names. Causal discovery, building on Spirtes-Glymour, Pearl, and others, has become a substantial subfield with applications in genomics, economics, and reinforcement learning. Probabilistic programming systems treat Bayesian networks as a special case of generative models written in code. Neuro-symbolic AI combines neural networks with discrete graphical models, often using neural networks to parameterise the conditional distributions of a Bayesian network or to amortise variational posteriors. And explainable-AI work in high-stakes domains continues to value the transparent, decomposable structure of Bayesian networks over the opaque computations of large neural models.

References

Pearl, J. (1988). *Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference*. Morgan Kaufmann, San Mateo.
Koller, D. and Friedman, N. (2009). *Probabilistic Graphical Models: Principles and Techniques*. MIT Press, Cambridge, MA.
Lauritzen, S. L. and Spiegelhalter, D. J. (1988). "Local computations with probabilities on graphical structures and their application to expert systems". *Journal of the Royal Statistical Society, Series B (Methodological)*, 50(2): 157-224.
Pearl, J. (2009). *Causality: Models, Reasoning, and Inference* (2nd ed.). Cambridge University Press.
Cooper, G. F. (1990). "The computational complexity of probabilistic inference using Bayesian belief networks". *Artificial Intelligence*, 42(2-3): 393-405.
Heckerman, D. (1995). "A Tutorial on Learning With Bayesian Networks". Microsoft Research Technical Report MSR-TR-95-06. Reprinted in M. I. Jordan (ed.), *Learning in Graphical Models*, MIT Press, 1998.
Friedman, N., Geiger, D., and Goldszmidt, M. (1997). "Bayesian Network Classifiers". *Machine Learning*, 29(2-3): 131-163.
Spirtes, P. and Glymour, C. (1991). "An algorithm for fast recovery of sparse causal graphs". *Social Science Computer Review*, 9(1): 62-72.
Tsamardinos, I., Brown, L. E., and Aliferis, C. F. (2006). "The max-min hill-climbing Bayesian network structure learning algorithm". *Machine Learning*, 65(1): 31-78.
Dean, T. and Kanazawa, K. (1989). "A model for reasoning about persistence and causation". *Computational Intelligence*, 5(2): 142-150.
Murphy, K. P. (2001). "The Bayes Net Toolbox for MATLAB". *Computing Science and Statistics*, 33: 1024-1034.
Horvitz, E., Breese, J., Heckerman, D., Hovel, D., and Rommelse, K. (1998). "The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users". *Proceedings of UAI-98*, pp. 256-265.
Heckerman, D. E. and Nathwani, B. N. (1992). "An evaluation of the diagnostic accuracy of Pathfinder". *Computers and Biomedical Research*, 25(1): 56-74.
Andreassen, S., Woldbye, M., Falck, B., and Andersen, S. K. (1987). "MUNIN: A causal probabilistic network for interpretation of electromyographic findings". *Proceedings of IJCAI-87*, pp. 366-372.
Dagum, P. and Luby, M. (1993). "Approximating probabilistic inference in Bayesian belief networks is NP-hard". *Artificial Intelligence*, 60(1): 141-153.
Heckerman, D., Geiger, D., and Chickering, D. M. (1995). "Learning Bayesian networks: The combination of knowledge and statistical data". *Machine Learning*, 20(3): 197-243.

Definition and joint factorisation

History

Foundational concepts

Conditional probability tables and densities

D-separation

Markov blanket

Chain rule and factorisation

Inference

Computational complexity

Learning Bayesian networks

Parameter learning

Structure learning

Special cases and variants

Naive Bayes

Tree-Augmented Naive Bayes (TAN)

Dynamic Bayesian networks

Influence diagrams

Markov random fields and factor graphs

Causal Bayesian networks

Real-world applications

Software

Comparison with related models

Bayesian inference in Bayesian networks

Why Bayesian networks fell partly out of fashion

Modern resurgence

See also

References

Improve this article

Related Articles

ARC-AGI 2

AUC-ROC

Machine learning terms/Clustering

Machine learning terms/Decision Forests

Machine learning terms/Fairness

Machine learning terms/Fundamentals

Definition and joint factorisation

History

Foundational concepts

Conditional probability tables and densities

D-separation

Markov blanket

Chain rule and factorisation

Inference

Computational complexity

Learning Bayesian networks

Parameter learning

Structure learning

Special cases and variants

Naive Bayes

Tree-Augmented Naive Bayes (TAN)

Dynamic Bayesian networks

Influence diagrams

Markov random fields and factor graphs

Causal Bayesian networks

Real-world applications

Software

Comparison with related models

Bayesian inference in Bayesian networks

Why Bayesian networks fell partly out of fashion

Modern resurgence

See also

References

Related Articles

ARC-AGI 2

AUC-ROC

Machine learning terms/Clustering

Machine learning terms/Decision Forests

Machine learning terms/Fairness

Machine learning terms/Fundamentals