Data Visualization
Last reviewed
May 13, 2026
Sources
45 citations
Review status
Source-backed
Revision
v2 · 4,203 words
See also: Data Visualization ChatGPT Plugins
Data visualization is the graphical representation of information and data. The use of artificial intelligence in data visualization and business intelligence (BI) has grown rapidly since 2018, when Tableau previewed Ask Data, and accelerated sharply after the release of ChatGPT in late 2022. By 2024 every major BI vendor, including Microsoft Power BI, Tableau, ThoughtSpot, Google Looker, Snowflake, and Databricks, had shipped a generative AI assistant that converted natural language questions into charts, dashboards, and SQL. A parallel research literature on chart understanding, generative dashboards, and natural language to visualization (NL2VIS) sits behind these products.
AI in data visualization spans several adjacent areas. Natural language to chart systems translate questions like "sales by region last quarter" into a Vega-Lite specification or a SQL query joined to a chart. Generative dashboards build a full report from a short prompt and a semantic model. AI BI copilots layer chat, summarization, and code generation on top of an existing BI tool. Automated insight detection scans datasets for outliers, trends, and segment-level drivers without an explicit query. Chart understanding research focuses on the inverse problem: given a chart image, extract its values, summarize it, or answer questions about it. Each area shares an underlying claim that large language models can lower the barrier between business users and structured data.
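The natural language to chart step can be illustrated with a deliberately small sketch. The function name `question_to_vegalite`, the column catalog, and the file name `orders.csv` are assumptions made for illustration. Production systems use an LLM or a full parser rather than keyword lookup, and this sketch ignores the "last quarter" time filter entirely.

```python
import json

# Hypothetical column catalog; a real system would read this from a semantic model.
DIMENSIONS = {"region": "nominal", "product": "nominal"}
MEASURES = {"sales": "sum", "profit": "sum"}


def question_to_vegalite(question: str, data_url: str) -> dict:
    """Map a question shaped like '<measure> by <dimension>' to a Vega-Lite bar chart."""
    words = question.lower().split()
    measure = next((w for w in words if w in MEASURES), None)
    dimension = next((w for w in words if w in DIMENSIONS), None)
    if measure is None or dimension is None:
        raise ValueError(f"cannot resolve a measure and a dimension in: {question!r}")
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": data_url},
        "mark": "bar",
        "encoding": {
            "x": {"field": dimension, "type": DIMENSIONS[dimension]},
            "y": {
                "field": measure,
                "aggregate": MEASURES[measure],
                "type": "quantitative",
            },
        },
    }


# Time phrases such as "last quarter" are not handled in this sketch.
spec = question_to_vegalite("sales by region last quarter", "orders.csv")
print(json.dumps(spec, indent=2))
```

The resulting dictionary is a declarative chart specification, which is why Vega-Lite is a common target: the model only has to emit structured JSON, and rendering is left to the charting library.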
The quality of the output depends heavily on the semantic layer that sits between the model and the warehouse. Tools that expose a curated set of metrics, dimensions, and joins to the model (Tableau Pulse, Looker, Snowflake Cortex Analyst, Databricks Genie) tend to produce more reliable answers than tools that ask the model to write raw SQL against a schema dump. The cost of a wrong number is high in BI, which is why vendors emphasize semantic models, certification, and trust scoring rather than raw model accuracy.
Automatic chart selection predates the deep learning boom by decades. Jock Mackinlay's 1986 PhD thesis at Stanford described APT, an automatic presentation tool that picked chart types based on data type and the task being supported. The ideas were folded into Tableau as the Show Me feature, added to Tableau 1.5 in 2005 and described in a 2007 paper by Mackinlay, Hanrahan, and Stolte at IEEE VIS. Show Me used heuristics over the VizQL grammar to recommend mark types when the user dropped fields onto a view.
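Heuristic chart selection of this kind can be sketched as a short rule table keyed on field types. The rules and the function name `recommend_mark` below are illustrative assumptions, not the rules APT or Show Me actually apply.

```python
def recommend_mark(field_types: list[str]) -> str:
    """Pick a mark type from the kinds of fields placed in the view.

    Each entry of field_types is "temporal", "nominal", or "quantitative".
    """
    n_quant = field_types.count("quantitative")
    has_time = "temporal" in field_types
    has_nominal = "nominal" in field_types

    if has_time and n_quant >= 1:
        return "line"       # a measure over time reads best as a line
    if n_quant >= 2:
        return "point"      # two measures suggest a scatter plot
    if has_nominal and n_quant == 1:
        return "bar"        # one measure split by a category
    if n_quant == 1:
        return "histogram"  # a single measure: show its distribution
    return "text"           # fall back to a text table


print(recommend_mark(["nominal", "quantitative"]))
```

The rule order matters: the first matching rule wins, so more specific combinations are checked before general fallbacks.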
Natural language interfaces reached both products and research prototypes from the mid-2010s onward. Microsoft shipped Power BI Q&A in 2015, which used tokenization, named entity recognition, and a hand-tuned synonym list to translate English questions into DAX queries against a tabular model. NL4DV, a Python toolkit from the Georgia Tech Visualization Lab released in 2020, translated questions into Vega-Lite specifications using rule-based parsing and a fixed task taxonomy of ten operations.
Tableau previewed Ask Data at its annual conference in October 2018 and released it in Tableau 2019.1 in February 2019. Ask Data accepted colloquial queries, applied synonyms from a semantic model, and used Show Me to pick the visualization type. The same year Tableau shipped Explain Data in version 2019.3 (September 2019), which used Bayesian models to suggest explanations for individual data points in a view, including dimensions not on the chart.
Microsoft introduced Power BI Quick Insights earlier in the decade. Quick Insights ran a set of advanced analytical algorithms developed with Microsoft Research over a semantic model to find correlations, outliers, trends, seasonality, and major factors. SAS Visual Analytics and SAP Analytics Cloud added similar automated insight features. Most of these tools used classical statistics rather than neural networks.
The public release of ChatGPT on November 30, 2022, reset expectations about what a BI chat interface could do. Within six months several major BI vendors, among them ThoughtSpot, Microsoft, and Salesforce, had announced generative AI features.
ThoughtSpot was first. The company announced Sage, a GPT-3 integration for natural language search, on March 9, 2023, and demonstrated it at its Beyond conference on May 9 and 10, 2023. Microsoft introduced Copilot for Power BI at Microsoft Build on May 23, 2023, as part of the Microsoft Fabric launch. Salesforce unveiled Tableau GPT and Tableau Pulse at Tableau Conference in May 2023, framing them as the successor to Ask Data and Explain Data. Hex Magic entered public beta on May 4, 2023, bringing AI-generated SQL, Python, and chart cells to Hex notebooks.
The second wave shipped over the following twelve months. Power BI Copilot entered public preview in November 2023 and reached general availability in June 2024. Tableau Pulse became generally available on February 22, 2024, alongside the Tableau 2024.1 release. Einstein Copilot for Tableau entered limited beta in April 2024. Google announced Gemini in Looker at Cloud Next on April 9, 2024, including a Conversational Analytics workspace that went into preview in September 2024. Snowflake launched Cortex Analyst in public preview on August 14, 2024. Databricks announced AI/BI and the Genie conversational interface on June 12, 2024. ThoughtSpot replaced Sage with Spotter, an agentic successor, in 2024.
The table below lists the main generative AI copilots shipped by enterprise BI vendors. Dates refer to the first public announcement and general availability where known.
| Product | Vendor | First announced | General availability | Notes |
|---|---|---|---|---|
| Copilot for Power BI | Microsoft | May 23, 2023 (Build) | June 2024 (Fabric workload) | Built on Azure OpenAI Service. Generates report pages, summaries, DAX, and natural language Q&A. Requires Power BI Premium P1 or Fabric F64 capacity. |
| Tableau Pulse | Salesforce / Tableau | May 2023 (TC23, as Tableau GPT) | February 22, 2024 | Generative AI insight feed over a metrics layer. Sends summaries to Slack, Teams, and email. Free with Tableau Cloud. |
| Einstein Copilot for Tableau (now Tableau Agent) | Salesforce / Tableau | February 2024 | Limited beta April 2024, broader rollout 2024 to 2025 | Conversational assistant inside Tableau Cloud authoring. Built on the Einstein Trust Layer. |
| Spotter (replaces Sage) | ThoughtSpot | Sage March 9, 2023; Spotter 2024 | Spotter for ThoughtSpot Analytics GA 2024 | Agentic BI assistant. Sage was never made GA. Spotter introduced SpotterModel, SpotterViz, and SpotterCode sub-agents. |
| Mode AI / Helix | Mode (acquired by ThoughtSpot July 19, 2023) | 2023 | Folded into ThoughtSpot Analyst Studio in 2025 | Mode contributed a SQL IDE and Python and R notebooks to ThoughtSpot. The Helix data engine powers Mode's interactive visual analytics. |
| Gemini in Looker / Conversational Analytics | Google Cloud | April 9, 2024 (Next 24) | Conversational Analytics preview September 2024 | Includes LookML Assistant, Visualization Assistant, and slide generation. Uses Gemini over the Looker semantic model. |
| Looker Studio Gemini | Google Cloud | July 2024 | Preview for Looker Studio Pro | Brings Gemini chart creation and summaries to Looker Studio, Google's self-service dashboard tool. The preview is limited to the paid Pro tier. |
| Hex Magic | Hex Technologies | May 4, 2023 (public beta) | 2023 | Generates SQL, Python, and chart cells in notebooks. The Notebook Agent, released in August 2024, chains multiple cells. |
| Sigma AI / Sigma Copilot | Sigma Computing | Spring 2024 | Fall 2024 launch added Explain Viz and Formula Assistant | Uses customer-supplied LLM credentials. Spreadsheet-style BI tool. |
| Databricks AI/BI Genie | Databricks | June 12, 2024 (Data + AI Summit) | Genie GA 2025, AI/BI Dashboards GA 2024 | Compound AI system with planning, SQL, visualization, and certification agents. Tightly coupled to Unity Catalog. |
| Snowflake Cortex Analyst | Snowflake | August 14, 2024 (public preview) | 2025 | Built on Llama and Mistral models. Uses a user-defined semantic model. Snowflake reports about 90% text-to-SQL accuracy on internal benchmarks. |
| Qlik Answers and Qlik Predict (formerly AutoML) | Qlik | Qlik Answers GA July 2024; AutoML enhancements September 2024 | 2024 | Qlik Answers is a RAG knowledge assistant over unstructured data. Qlik Predict is the renamed AutoML product. |
| Domo.AI / Agent Catalyst | Domo | Domo.AI announced 2023; Agent Catalyst March 2024 | 2024 | Toolkit for building agents over Domo data. Supports text-to-SQL and text-to-Beastmode formulas. Pluggable LLM backend. |
| Deepnote AI | Deepnote | 2023 | 2024 | AI-first notebook with auto-notebook generation and an autonomous Auto AI mode. |
| Databricks Assistant | Databricks | 2023 | Autocomplete GA 2024 | Inline AI suggestions for SQL and Python in notebooks, queries, and AI/BI Dashboards. |
| SAP Just Ask in SAP Analytics Cloud (Joule integration) | SAP | 2023 | Controlled release Q1 2025 | Natural language interface to SAC charts and tables. Joule is SAP's broader generative AI copilot. |
The copilots converge on a similar set of features. All of them accept a natural language question, attempt to generate SQL or a chart specification, and return a result with a short summary. They diverge on three axes.
The first is the semantic layer. Power BI Copilot uses the tabular model. Tableau Pulse uses a new Metrics Layer added in 2024.1. Looker uses LookML. Snowflake Cortex Analyst requires a user-defined semantic model written in YAML. Databricks Genie reads metadata from Unity Catalog. Without a semantic layer the model has to guess at table joins from schema names, which is the main source of errors.
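A minimal sketch shows why a semantic layer removes the join-guessing step. The classes, table names, and the `compile_query` function are hypothetical and do not follow any vendor's format (LookML, Snowflake's YAML, or Unity Catalog metadata). The idea is that the model only chooses among curated names, and the SQL itself comes from definitions a data team has already vetted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # vetted SQL aggregate, e.g. "SUM(o.amount)"


@dataclass(frozen=True)
class Dimension:
    name: str
    column: str  # qualified column, e.g. "c.region"


# Hypothetical curated model: the join is written once, by people who know the schema.
FROM_CLAUSE = "orders AS o JOIN customers AS c ON o.customer_id = c.id"
METRICS = {"revenue": Metric("revenue", "SUM(o.amount)")}
DIMENSIONS = {"region": Dimension("region", "c.region")}


def compile_query(metric: str, dimension: str) -> str:
    """Build SQL from curated definitions; unknown names raise KeyError."""
    m = METRICS[metric]
    d = DIMENSIONS[dimension]
    return (
        f"SELECT {d.column} AS {d.name}, {m.expression} AS {m.name} "
        f"FROM {FROM_CLAUSE} GROUP BY {d.column}"
    )


print(compile_query("revenue", "region"))
```

In this arrangement a language model would only have to map "revenue by region" to the pair `("revenue", "region")`. A request for an undefined metric fails loudly instead of producing a plausible but wrong join.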
The second axis is the model backend. Power BI Copilot and Tableau Pulse use Azure OpenAI. Looker uses Gemini. Snowflake Cortex Analyst runs Llama and Mistral models inside Snowflake to keep data inside the security perimeter. Databricks Genie uses an ensemble of agents on the underlying Databricks Mosaic AI runtime. Domo and Sigma let customers plug in their own LLM credentials.
The third axis is the integration point. Tableau Pulse pushes summaries to Slack and email rather than asking users to open the BI tool. Power BI Copilot lives inside the report authoring canvas. ThoughtSpot Spotter is the primary interface to ThoughtSpot Analytics rather than a side panel.
Outside the enterprise BI suites a separate class of tools targets analysts and data scientists.
Julius AI, launched in 2023, is a conversational data analysis tool that accepts CSVs, Excel files, PDFs, and database connections, then runs Python, R, or SQL code in response to natural language prompts. It can handle larger files than the ChatGPT data analyst plugin and exposes the generated code for inspection.
Vizly, founded in 2023 by McGill University graduates Sami Sahnoune and Ali Shobeiri and incubated at Y Combinator, offers a chat-style interface that produces charts and statistical analyses from uploaded files.
Polymer turns spreadsheets into interactive dashboards and supports natural language queries against the resulting database. Akkio combines no-code machine learning with a GPT-4 powered natural language layer for cleaning, predicting, and charting.
Hex Magic is the AI layer inside Hex notebooks. The May 2023 public beta included Generate, Edit, Explain, Fix, and Explode tools. A March 2024 update let Magic generate multiple cells at once, chaining SQL, Python, and Chart cells. Hex released a Notebook Agent that operates across cells in August 2024, and a November 2024 update added Ask Magic AI for data questions.
Deepnote ships an AI assistant that generates entire notebooks from a prompt, with an autonomous Auto AI mode that executes the generated code and self-corrects.
Jupyter AI is an open source extension for JupyterLab maintained by the Project Jupyter team. It exposes a chat panel and inline magic commands that call out to OpenAI, Anthropic, Cohere, Hugging Face, and other providers. Version 2 of Jupyter AI targets JupyterLab 4 and added agent-style features.
Databricks Assistant is the inline AI feature inside Databricks notebooks, queries, and AI/BI Dashboards. Databricks Assistant Autocomplete reached general availability in 2024 with personalized AI suggestions as the user types.
Canva Flourish is the data visualization product Canva acquired in 2022. Flourish itself is a template-driven no-code tool rather than a generative AI product, but it integrates with Canva's broader AI features.
Automated insight detection predates the LLM wave but has been rebuilt around it. The category covers tools that surface trends, outliers, and segment-level drivers without an explicit user query.
Tableau Explain Data, released in Tableau 2019.3, uses Bayesian methods to fit dozens of statistical models in real time and suggest explanations for individual data points. It can pick up on dimensions not currently on the chart.
Tableau Pulse uses generative AI to summarize automated insight detection in natural language. It flags changes, detects drivers, trends, and outliers, and pushes summaries to Slack, Teams, and email.
Power BI Quick Insights runs algorithms developed with Microsoft Research to find correlations, outliers, trends, seasonality, change points in trends, and major factors. The system can produce up to ten different types of insight cards. Microsoft has announced that Power BI Q&A will be retired in December 2026 and replaced by Copilot.
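Outlier detection of the classical statistical kind can be sketched with Tukey's interquartile-range rule. The choice of this rule, the function name `find_outliers`, and the sample data are assumptions for illustration; the source does not say which algorithms Quick Insights actually runs.

```python
import statistics


def find_outliers(values: list[float], k: float = 1.5) -> list[int]:
    """Return indexes of values outside Tukey's fences (quartiles -/+ k * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical weekly order counts with one spike.
weekly_orders = [100, 102, 98, 101, 99, 103, 250, 100]
print(find_outliers(weekly_orders))  # [6], the position of the 250 spike
```

A quartile-based rule is used here instead of a mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself in short series.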
SAS Visual Analytics integrates SAS Viya machine learning to produce automated charts, forecasts, and decision tree explanations.
SAP Analytics Cloud Just Ask lets users type questions and receive charts or tables. SAP is folding Just Ask into Joule, SAP's broader generative AI copilot, with a controlled release planned for Q1 2025.
Domo.AI Agent Catalyst, launched in March 2024, lets customers configure agents that combine an LLM with their Domo dataset and a tool catalog. Domo's text-to-Beastmode capability translates natural language into the Beastmode calculation language.
A research community focuses on the inverse problem of going from a chart image to a structured representation, a summary, or an answer to a question about the chart.
ChartQA, presented by Ahmed Masry and colleagues at ACL 2022, is the most widely used benchmark for chart question answering. It contains 9,608 human-written questions and 23,111 questions generated from human-written chart summaries, for 32,719 questions in total. The frequently quoted figure of 28,299 question-answer pairs over 18,317 chart images describes the training split, not the whole benchmark. ChartQA emphasizes complex reasoning that mixes visual features and arithmetic operations.
PlotQA by Nitesh Methani, Pritha Ganguly, and colleagues at WACV 2020 contains 28.9 million question-answer pairs grounded over 224,377 scientific plots. PlotQA included real-valued answers and richer label variability than earlier synthetic datasets like FigureQA and DVQA.
ChartLlama is a multimodal LLM for chart understanding and generation, introduced in arXiv paper 2311.16483 in November 2023. The authors built an instruction tuning dataset using GPT-4 in a multi-step pipeline and reported state-of-the-art results on ChartQA, Chart-to-text, and chart extraction benchmarks.
ChartMimic at arXiv 2406.09961 (June 2024) evaluates LMMs on the chart-to-code task. The benchmark contains 4,800 human-curated triplets of (figure, instruction, code) spanning 18 regular chart types and 4 advanced types. GPT-4o scored 83.2 on ChartMimic but still struggled on realistic chart-to-code generation.
MatPlotAgent, published in Findings of ACL 2024, is an LLM-based agentic system for scientific data visualization that combines code generation with visual feedback loops.
ChartX and ChartVLM at arXiv 2402.12185 provide a versatile benchmark and foundation model for chart understanding across 18 chart types.
A recurring finding from this literature is that vision-language models fabricate numbers when reading dense or low-contrast charts. The 2024 paper "Are LLMs ready for Visualization?" and follow-on work documented that GPT-4 misreads color legends and hallucinates legend entries, especially when hues are close.
Several research projects from Microsoft Research and academic groups treat dashboard generation as a structured generation problem.
LIDA, a Microsoft Research project led by Victor Dibia and presented at ACL 2023, generates grammar-agnostic visualizations and infographics using large language models. LIDA has four modules: a Summarizer that converts a dataset into a compact natural language summary, a Goal Explorer that enumerates visualization goals, a VisGenerator that produces and refines visualization code in any grammar (matplotlib, seaborn, Altair, D3.js), and an Infographer that uses image generation models to render stylized output. LIDA is open source on GitHub at microsoft/lida.
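The role of a summarization stage can be sketched without any model call: the dataset is reduced to a compact description small enough to fit in a prompt. The function below is an illustrative assumption, not LIDA's implementation, and it emits a structured summary where LIDA's Summarizer is described as producing natural language.

```python
import csv
import io


def summarize(csv_text: str, max_samples: int = 3) -> dict:
    """Describe each column by inferred type, distinct count, and a few sample values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {"row_count": len(rows), "fields": []}
    for name in (rows[0].keys() if rows else []):
        values = [r[name] for r in rows]
        try:
            numbers = [float(v) for v in values]
            field = {"name": name, "type": "number", "min": min(numbers), "max": max(numbers)}
        except ValueError:
            # Any value that does not parse as a number makes the column a string column.
            field = {"name": name, "type": "string"}
        field["distinct"] = len(set(values))
        field["samples"] = list(dict.fromkeys(values))[:max_samples]
        summary["fields"].append(field)
    return summary


print(summarize("region,sales\neast,10\nwest,7\neast,5\n"))
```

Later stages would receive this summary instead of the raw rows, which keeps prompt size bounded regardless of how large the dataset is.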
Data Formulator, also from Microsoft Research, takes a concept-driven approach to chart authoring. The initial release in late 2023 separated chart configuration from data transformation, letting users define visualization concepts through example-based UI interactions or natural language. The AI backend generates the data-reshaping code and the Vega-Lite specification. Data Formulator 2, published in 2024, added thread memory, conversational refinement, and persistent connectors to databases.
NL4DV is the older Python toolkit from the Georgia Tech Visualization Lab, with subsequent contributions from UNC Charlotte and HKUST. It maps natural language queries to a JSON specification that includes data attributes, analytic tasks from the Amar et al. taxonomy, and a list of Vega-Lite charts.
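The shape of such an output can be sketched with a toy rule-based parser. Everything here is hypothetical: the synonym list, the keyword-to-task rules, and the output keys are chosen for illustration and are not NL4DV's actual API or schema.

```python
import re

# Hypothetical synonym list and keyword-to-task rules; real toolkits use richer parsing.
SYNONYMS = {"revenue": "sales", "turnover": "sales", "area": "region", "territory": "region"}
COLUMNS = {"sales": "quantitative", "region": "nominal"}
TASK_KEYWORDS = {
    "highest": "find_extremum",
    "lowest": "find_extremum",
    "relationship": "correlate",
    "distribution": "characterize_distribution",
}


def parse_query(query: str) -> dict:
    """Return detected attributes, analytic tasks, and a candidate mark type."""
    # Tokenize, then replace known synonyms with canonical column names.
    tokens = [SYNONYMS.get(t, t) for t in re.findall(r"[a-z]+", query.lower())]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    attributes = [t for t in dict.fromkeys(tokens) if t in COLUMNS]
    tasks = sorted({TASK_KEYWORDS[t] for t in tokens if t in TASK_KEYWORDS})
    mark = "bar" if any(COLUMNS[a] == "nominal" for a in attributes) else "tick"
    return {"attributes": attributes, "tasks": tasks, "charts": [{"mark": mark}]}


print(parse_query("Which territory has the highest revenue?"))
```

Keeping attributes, tasks, and charts as separate fields lets a caller inspect what the parser understood before trusting the chart it proposes.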
VisAct, published in the Journal of Visualization in 2020 by researchers at Tongji University, is a visualization design system based on semantic actions. It guides users through chart construction step by step, with an action history panel and a high-level grammar of semantic actions. A user study compared VisAct with the Vega-Lite based Polestar system and found fewer completion steps.
The rapid spread of AI in BI has generated several concerns that vendors and researchers acknowledge.
Hallucinated numbers. When a model summarizes a chart or explains a metric, it can fabricate values that look plausible but do not appear in the underlying data. The risk is highest in long natural language summaries where the user does not cross-check every figure. The 2024 paper "Are LLMs ready for Visualization?" documented number hallucinations in GPT-4 when asked to read complex charts.
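One simple guard is to extract every number from a generated summary and check it against values computed by trusted code. The sketch below assumes a hypothetical `allowed` set that contains the source figures plus any derived values, such as percentage changes. It is a coarse filter, not a method the source attributes to any vendor.

```python
import re


def unsupported_numbers(summary: str, allowed: set[float], tol: float = 0.005) -> list[float]:
    """Return numbers in the summary that match no allowed value within a relative tolerance."""
    tokens = re.findall(r"\d[\d,]*\.?\d*", summary)
    found = [float(tok.replace(",", "")) for tok in tokens]
    return [
        x for x in found
        if not any(abs(x - a) <= tol * max(abs(a), 1.0) for a in allowed)
    ]


# Hypothetical trusted values: two revenue figures and their computed percent change.
allowed = {1200.0, 1350.0, 12.5}
summary = "Revenue rose from 1,200 to 1,350, a 12.5% gain, led by 40 new accounts."
print(unsupported_numbers(summary, allowed))  # [40.0]
```

The check only flags figures with no supporting value. It cannot tell whether a supported number is attached to the right label, so it complements human review and does not replace it.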
Misleading auto-summarization. Tableau Pulse, Power BI Copilot, and similar tools generate one-sentence headlines like "Revenue is up 12% week over week." If the underlying metric is noisy or seasonal, the headline can mislead. Tableau and Microsoft have responded with calibration warnings and trust scores, but no vendor currently publishes a hallucination rate for production summaries.
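Whether a headline change stands out from normal variation can be estimated by comparing it with the spread of earlier week-over-week changes. The function name, the threshold of 2.0, and the sample series are assumptions for illustration, and the sketch assumes strictly positive values. It does not describe how any vendor computes its warnings.

```python
import statistics


def headline_is_notable(series: list[float], z_threshold: float = 2.0) -> bool:
    """True when the latest week-over-week change is large relative to past changes."""
    changes = [(b - a) / a for a, b in zip(series, series[1:])]
    history, latest = changes[:-1], changes[-1]
    if len(history) < 2:
        return False  # not enough history to judge
    spread = statistics.stdev(history)
    if spread == 0:
        return latest != statistics.mean(history)
    return abs(latest - statistics.mean(history)) / spread >= z_threshold


# Both hypothetical series end with a 12% weekly increase.
noisy = [100, 115, 98, 112, 95, 106.4]
steady = [100, 101, 100, 101, 100, 112]
print(headline_is_notable(noisy), headline_is_notable(steady))  # False True
```

The same 12% increase is unremarkable in the noisy series and striking in the steady one, which is the distinction a one-sentence headline loses.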
Bias in metric definitions. Generative BI tools rely on a semantic layer to know what "revenue" means. If two teams maintain slightly different revenue definitions and the model picks one without surfacing the ambiguity, downstream decisions inherit that bias. ThoughtSpot Spotter Semantics and Snowflake's semantic models try to push these definitions into a single source of truth.
Multi-step query failures. Independent evaluations of Hex Magic and similar tools in 2024 showed that single-step questions ("top ten customers by revenue") work well, but multi-step questions that require intermediate aggregations and joins fail more often. Hex addressed this with the Notebook Agent that explicitly plans cells in advance.
Privacy and data residency. Pushing a prompt to OpenAI sends column names and sometimes sample values out of the customer's cloud. Snowflake Cortex Analyst keeps the model inside the Snowflake security perimeter to address this. SAP, Domo, and Databricks offer similar customer-tenant model options.
Evaluation gap. There is no widely adopted benchmark for the end-to-end BI copilot task. Vendors report internal metrics like Snowflake's claimed 90% text-to-SQL accuracy on Cortex Analyst, but these are not directly comparable. Research benchmarks like BIRD and Spider 2.0 focus on text-to-SQL in isolation, not on the broader question of whether the chart and summary the user sees are correct.
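Text-to-SQL benchmarks are typically scored by executing the predicted query and comparing its result with that of a reference query. A minimal sketch of that comparison using SQLite follows. The table, the queries, and the function name `execution_match` are illustrative assumptions, and real benchmark harnesses handle more cases such as column order and timeouts.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Compare two queries by their result rows, ignoring row order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('east', 10), ('east', 5), ('west', 7);
    """
)
gold = "SELECT region, SUM(amount) FROM orders GROUP BY region"
good = "SELECT region, TOTAL(amount) FROM orders GROUP BY region ORDER BY region DESC"
bad = "SELECT region, COUNT(*) FROM orders GROUP BY region"
print(execution_match(conn, good, gold), execution_match(conn, bad, gold))  # True False
```

A passing score here says only that the rows match. It says nothing about whether the chart type, axis labels, or generated summary built on top of those rows are right, which is the gap described above.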
The table below summarizes the main benchmarks used to evaluate AI in data visualization and BI.
| Benchmark | Year | Task | Size | Notes |
|---|---|---|---|---|
| ChartQA (Masry et al.) | 2022 | Chart question answering | 32,719 questions in total (9,608 human-written, 23,111 generated); training split has 28,299 question-answer pairs over 18,317 charts | Human-written and generated questions. Focus on visual and logical reasoning. |
| PlotQA (Methani et al.) | 2020 | Scientific plot question answering | 28.9 million QA pairs over 224,377 plots | Real-valued answers, out-of-vocabulary handling. |
| ChartMimic | 2024 | Chart to code | 4,800 (figure, instruction, code) triplets | 18 regular and 4 advanced chart types. Evaluates cross-modal reasoning. |
| ChartX / ChartVLM | 2024 | Chart understanding | 18 chart types across 7 tasks | Includes a foundation model for chart tasks. |
| BIRD-bench | 2023 | Text-to-SQL | 12,751 question-SQL pairs over 95 databases (33.4 GB) | Realistic dirty data and external knowledge. Introduced Valid Efficiency Score. |
| Spider | 2018 | Text-to-SQL | 10,181 questions, 5,693 SQL queries, 200 databases | Original cross-domain semantic parsing benchmark. |
| Spider 2.0 | November 2024 | Enterprise text-to-SQL | 632 real-world workflow problems | BigQuery and Snowflake dialects. Long contexts and multi-query workflows. State-of-the-art models solved 21.3% in initial evaluations. |
| HallusionBench | 2024 | Vision-language hallucination | Includes chart and table prompts | Diagnoses visual illusion and language hallucination in multimodal models. |
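| MatPlotBench (introduced with MatPlotAgent) | 2024 | Scientific plotting | 100 human-verified test cases | Released with the MatPlotAgent paper (Findings of ACL 2024), which describes an LLM agent with visual feedback. |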