See also: Machine learning terms
TensorFlow is an open-source machine learning framework developed by the Google Brain team for research and production use. Released in November 2015, it has become one of the most widely adopted tools for building, training, and deploying neural network models. TensorFlow uses dataflow graphs for numerical computation, where nodes represent mathematical operations and edges carry multidimensional data arrays called tensors. The framework supports a range of hardware platforms, from mobile phones and embedded devices to large-scale GPU clusters and TPU pods.
TensorFlow is licensed under the Apache 2.0 open-source license. Its GitHub repository has accumulated over 195,000 stars, making it one of the most popular open-source projects in the artificial intelligence space. The framework provides APIs in Python, C++, Java, JavaScript, and several community-maintained languages including Rust, Julia, R, and Scala.
TensorFlow's roots trace back to 2011, when the Google Brain team built DistBelief, a proprietary deep learning system for internal research and production. DistBelief powered several Google products, including improvements to Google Search (through RankBrain), Google Photos, and speech recognition systems. Earlier, in 2009, Geoffrey Hinton's team at Google had achieved a roughly 25% reduction in speech recognition errors using deep neural networks, a result that helped catalyze Google's investment in deep learning research.
However, DistBelief had significant limitations. It was tightly coupled to Google's internal infrastructure, making it difficult to adapt for different hardware or smaller-scale use cases. Maintaining separate systems for large distributed training and smaller on-device workloads created engineering overhead. The system also lacked the flexibility needed to accommodate machine learning approaches beyond deep neural networks.
Google assigned several computer scientists, including Jeff Dean, to design a successor. The team aimed to create a framework that was more flexible, could target heterogeneous hardware (CPUs, GPUs, and custom accelerators), and supported both research experimentation and production deployment. The result was TensorFlow.
Jeff Dean advocated for open-sourcing the new framework, arguing that broad community adoption would accelerate progress. On November 9, 2015, Google released TensorFlow under the Apache 2.0 license. The release generated immediate interest: by 2016, Jeff Dean reported that over 1,500 GitHub repositories referenced TensorFlow, only five of which belonged to Google. This rapid community adoption helped TensorFlow quickly overtake earlier frameworks like Theano and Caffe in popularity.
Version 1.0.0 arrived on February 11, 2017, bringing a stable Python API and production readiness guarantees. During the 1.x era, TensorFlow operated primarily through a "define-then-run" paradigm: users built a static computation graph in Python, then executed it inside a tf.Session. This two-step process offered performance advantages because the framework could optimize the entire graph before running it, but it made debugging difficult and created a steep learning curve compared to more Pythonic alternatives.
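A minimal sketch of the historical define-then-run workflow (this uses the TensorFlow 1.x API, which survives in modern releases only as tf.compat.v1):

```python
import tensorflow as tf  # TensorFlow 1.x API

# Step 1: define a static graph. Nothing is computed yet.
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
c = a * b

# Step 2: execute the graph inside a session, feeding in concrete values.
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))  # 6.0
```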
The 1.x series also introduced several tools and extensions. In May 2017, Google announced TensorFlow Lite for mobile and embedded inference. TensorFlow.js followed in March 2018, enabling model execution in web browsers. TensorFlow Serving provided a production serving system for deploying trained models behind APIs.
During this period, PyTorch (released by Facebook in 2016) gained rapid adoption in research due to its eager execution model and more intuitive Python interface. By 2018, the growing popularity of PyTorch in academic settings put pressure on Google to modernize TensorFlow's programming model.
Google released TensorFlow 2.0 in September 2019. This was a substantial overhaul that addressed many of the complaints from the 1.x era:
- Eager execution enabled by default, replacing the define-then-run Session workflow.
- tf.keras adopted as the recommended approach for model building and training.
- The tf.contrib module was sunsetted. Some submodules were promoted into core TensorFlow; others were spun off into separate projects like TF Addons and TF I/O.
- The tf.function decorator. To recover the performance benefits of graph execution, TensorFlow 2.0 introduced tf.function, which traces Python code into an optimized graph at runtime. This gave users the readability of eager mode with the speed of graph mode when needed.

After the 2.0 release, TensorFlow continued iterating through the 2.x series:
| Version | Release date | Notable changes |
|---|---|---|
| 2.0 | September 2019 | Eager execution by default, Keras integration, tf.function |
| 2.4 | December 2020 | Mixed precision support improvements, CUDA 11 support |
| 2.6 | August 2021 | Keras moved to a separate package (keras 2.6.0) |
| 2.9 | May 2022 | Deterministic GPU operations, oneDNN optimizations |
| 2.12 | March 2023 | Default float dtype changed to float32, new Keras optimizers |
| 2.14 | September 2023 | Improved NumPy API coverage, JAX-compatible random state |
| 2.16 | March 2024 | Keras 3 as default, Python 3.12 support |
| 2.17 | July 2024 | Performance improvements, updated Keras 3 |
| 2.20 | August 2025 | Continued performance improvements |
| 2.21 | March 2026 | LiteRT graduates to full production status, enhanced INT2/INT4 low-precision support, increased maintenance focus on security and critical bugs |
At its core, TensorFlow models computation as a directed graph. Each node in the graph represents a mathematical operation (matrix multiplication, convolution, element-wise addition, and so on), and the edges carry tensors, which are multidimensional arrays of data. The name "TensorFlow" comes from this concept: tensors flowing through a graph of operations.
In TensorFlow 1.x, the graph was constructed statically before execution. TensorFlow 2.x changed this to eager execution by default, where operations run immediately. However, the graph abstraction remains available through tf.function, which traces Python functions into optimized graph representations for better performance in production settings.
The core of TensorFlow is a C++ runtime library that handles graph execution, memory management, and device placement. This low-level layer supports operations commonly used in machine learning: matrix multiplication, convolution, pooling, activation functions (ReLU, Softmax, Sigmoid), loss functions (mean squared error, cross-entropy), and optimizers (Adam, Adagrad, stochastic gradient descent).
The C++ runtime also handles automatic differentiation. TensorFlow can compute gradients for model parameters automatically, which is necessary for backpropagation-based training. In TensorFlow 2.x, this is done through tf.GradientTape, which records operations on the forward pass and replays them in reverse to compute gradients.
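A minimal sketch of gradient computation with tf.GradientTape (the quadratic loss here is an arbitrary illustration):

```python
import tensorflow as tf

w = tf.Variable(3.0)

with tf.GradientTape() as tape:
    loss = w * w          # forward pass is recorded on the tape

grad = tape.gradient(loss, w)  # d(w^2)/dw = 2w = 6.0
```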
The primary frontend is the Python API, which most users interact with. The Python layer provides high-level model building and training through tf.keras, input pipelines through tf.data, distributed training through tf.distribute, and direct access to tensors, variables, and automatic differentiation.
Beyond Python, TensorFlow provides official APIs in C++ and JavaScript (through TensorFlow.js). Community-maintained bindings exist for Java, Go (archived), Rust, Julia, R, C#, Haskell, OCaml, and Crystal.
XLA (Accelerated Linear Algebra) is TensorFlow's domain-specific compiler for linear algebra operations. It transforms computation graphs into optimized machine code for specific hardware targets (CPUs, GPUs, TPUs).
XLA's primary optimization technique is operation fusion. Instead of executing each operation separately (which requires writing intermediate results to memory), XLA fuses multiple operations into a single kernel. For example, a sequence of addition, multiplication, and reduction can be compiled into one GPU kernel that keeps intermediate values in registers or cache. Since memory bandwidth is often the bottleneck on hardware accelerators, this fusion can yield substantial speedups.
To use XLA in TensorFlow 2.x, users pass jit_compile=True to tf.function. The first call incurs a compilation delay, but subsequent calls benefit from the optimized code. XLA is especially effective on TPUs, where it serves as the required compilation path.
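A small sketch of enabling XLA compilation; the function and tensor shapes are arbitrary illustrations:

```python
import tensorflow as tf

@tf.function(jit_compile=True)   # compile the traced graph with XLA
def fused_op(x, y):
    # Multiply, add, and reduce can fuse into a single kernel under XLA.
    return tf.reduce_sum(x * y + x)

result = fused_op(tf.ones([1024]), tf.ones([1024]))  # first call compiles; later calls reuse the code
```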
XLA was later spun out into the OpenXLA project, making it available as a shared compiler infrastructure used by both TensorFlow and JAX.
TensorFlow supports NVIDIA GPUs through CUDA and cuDNN, and also offers experimental support for AMD GPUs via ROCm. GPU acceleration is important for training neural networks, where the parallel architecture of GPUs can speed up matrix operations by orders of magnitude compared to CPUs.
TensorFlow automatically places operations on available GPUs when possible. Users can also manually control device placement with tf.device context managers. For multi-GPU setups on a single machine, tf.distribute.MirroredStrategy implements synchronous training by replicating the model across all GPUs and using NVIDIA's NCCL library for efficient gradient aggregation.
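A brief sketch of manual device placement, assuming a machine with at least one visible GPU:

```python
import tensorflow as tf

# Explicit placement; TensorFlow otherwise places operations automatically.
with tf.device("/GPU:0"):
    a = tf.random.normal([1024, 1024])
    b = tf.matmul(a, a)   # runs on the first GPU if one is visible
```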
TPUs are custom ASICs (application-specific integrated circuits) designed by Google specifically for machine learning workloads. The first TPU was announced in May 2016. Subsequent generations have increased in capability:
| Generation | Announced | Peak performance | Notes |
|---|---|---|---|
| TPU v1 | May 2016 | 92 TOPS (int8) | Inference only |
| TPU v2 | May 2017 | 180 TFLOPS | 64-TPU pods: 11.5 PFLOPS |
| TPU v3 | May 2018 | 420 TFLOPS | 128 GB HBM, 100+ PFLOPS pods |
| Edge TPU | July 2018 | 4 TOPS | For mobile and embedded devices |
| TPU v4 | May 2021 | 275 TFLOPS | 4,096-chip pods |
| TPU v5e | August 2023 | Cost-optimized | Targeted at inference and smaller training |
| TPU v5p | December 2023 | 459 TFLOPS | 8,960-chip pods |
TPUs have native hardware support for the bfloat16 format, which provides the same dynamic range as float32 but uses half the memory. TensorFlow's tf.distribute.TPUStrategy handles model replication and data sharding across TPU cores. TPUs are available through Google Cloud and Google Colab.
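A typical TPU initialization sequence, sketched under the assumption of a Cloud TPU or Colab runtime (the resolver arguments depend on the environment):

```python
import tensorflow as tf

# Locate and initialize the TPU system attached to this runtime.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():  # variables created here are replicated across TPU cores
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
```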
TensorFlow's tf.distribute.Strategy API provides several strategies for distributing training across multiple devices and machines:

- MirroredStrategy: synchronous data-parallel training across multiple GPUs on a single machine.
- MultiWorkerMirroredStrategy: synchronous training across multiple machines.
- TPUStrategy: training on TPUs and TPU pods.
- ParameterServerStrategy: asynchronous training using dedicated parameter servers.
These strategies allow users to scale training with minimal code changes. In many cases, wrapping the model creation and training code inside a strategy's scope is sufficient to enable distributed execution.
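As an illustration, a minimal MirroredStrategy setup might look like the following (layer sizes are arbitrary):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # one replica per visible GPU

with strategy.scope():                        # variables created here are mirrored
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")

# model.fit(...) now performs synchronous data-parallel training across GPUs.
```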
TensorFlow is more than a single library. Google has built an extensive set of tools around the core framework for different deployment targets and production workflows.
TensorFlow Lite (TFLite) is a lightweight runtime for deploying models on mobile devices (Android, iOS), embedded Linux systems, and microcontrollers. In September 2024, Google rebranded TFLite as LiteRT (Lite Runtime), reflecting the fact that it now supports models from PyTorch, JAX, and Keras in addition to TensorFlow. With TensorFlow 2.21 (March 2026), LiteRT graduated from preview to full production status.
LiteRT uses the FlatBuffers serialization format (with the .tflite extension) to store compressed, optimized models. It supports quantization, which reduces model precision from float32 to int8 or float16 for faster inference and smaller model size. Hardware acceleration is available through GPU delegates (for Adreno, Mali, and Apple GPUs), the Android Neural Networks API (NNAPI), and specialized Edge TPU hardware.
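A sketch of converting a SavedModel to the .tflite format with post-training quantization; the input path is a placeholder:

```python
import tensorflow as tf

# "export/my_model" stands in for a real SavedModel directory.
converter = tf.lite.TFLiteConverter.from_saved_model("export/my_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable post-training quantization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```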
LiteRT powers over 100,000 applications running on approximately 2.7 billion devices worldwide. Major apps like Google Photos and Snapchat rely on it for on-device AI. The production release delivers 1.4x faster GPU performance compared to the original TensorFlow Lite implementation and introduces state-of-the-art NPU (Neural Processing Unit) acceleration. TensorFlow 2.21 also added enhanced support for low-precision data types, including INT2 and INT4 formats for fully connected layers and slice operations.
The rebranding to LiteRT did not break backward compatibility. Existing apps using TFLite, including those accessing it through Google Play Services, continue to work without changes.
TensorFlow.js brings machine learning to web browsers and Node.js environments. It can run pre-trained TensorFlow, Keras, and TFLite models directly in the browser using WebGL or WebGPU for hardware acceleration, and it also supports training models from scratch in JavaScript.
Use cases for TensorFlow.js include client-side inference (where data never leaves the user's device), interactive demos and educational tools, and server-side inference in Node.js environments.
TensorFlow Serving is a C++ serving system designed for production deployment of machine learning models. It handles model versioning, allowing multiple versions of a model to be served simultaneously for A/B testing or canary deployments. TensorFlow Serving exposes models through gRPC and REST APIs, and it is optimized for high throughput and low latency.
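For illustration, a REST prediction request might look like this, assuming a local TensorFlow Serving instance on the default REST port serving a model registered as my_model (both are placeholders):

```python
import json
import requests

payload = {"instances": [[1.0, 2.0, 3.0, 4.0]]}
response = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    data=json.dumps(payload),
)
print(response.json())   # e.g. {"predictions": [...]}
```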
TFX is Google's end-to-end MLOps platform built on TensorFlow. It provides a set of modular components for building production machine learning pipelines:

- ExampleGen: ingests and splits input data.
- StatisticsGen and SchemaGen: compute dataset statistics and infer a data schema (backed by TensorFlow Data Validation).
- Transform: performs feature engineering with TensorFlow Transform.
- Trainer: trains the model.
- Evaluator: assesses model quality with TensorFlow Model Analysis.
- Pusher: deploys validated models to serving infrastructure.
TFX pipelines can run on Apache Airflow, Apache Beam, or Kubeflow Pipelines, and they integrate with Google Cloud's Vertex AI platform.
TensorFlow Hub is a repository of pre-trained models for transfer learning. It hosts models in TensorFlow, TFLite, and TF.js formats covering tasks like image classification, text embeddings, object detection, and style transfer. Models can be loaded with a few lines of code and fine-tuned on new datasets.
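A minimal sketch of loading a Hub model; the URL shown is one published text-embedding model, used here only as an example:

```python
import tensorflow_hub as hub

# Load a pre-trained text-embedding model from TensorFlow Hub.
embed = hub.load("https://tfhub.dev/google/nnlm-en-dim50/2")
vectors = embed(["TensorFlow is an ML framework"])  # shape: (1, 50)
```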
The TensorFlow ecosystem includes several additional libraries:
| Library | Purpose |
|---|---|
| TensorFlow Probability | Probabilistic modeling and statistical inference |
| TensorFlow Recommenders | Building recommendation systems |
| TensorFlow Graphics | 3D graphics and differentiable rendering |
| TensorFlow Model Optimization | Pruning, quantization, and clustering for model compression |
| TensorFlow Quantum | Hybrid quantum-classical machine learning |
| TensorFlow Decision Forests | Gradient-boosted trees and random forests |
| TensorFlow Addons | Community-maintained extensions not in core |
A major shift in the TensorFlow ecosystem came with the release of Keras 3.0 in November 2023. Keras, originally created by François Chollet as a standalone deep learning API, had been tightly integrated with TensorFlow since the 2.0 release. Keras 3 is a complete rewrite that makes Keras backend-agnostic: the same Keras model code can run on TensorFlow, PyTorch, or JAX by changing a single configuration setting.
Starting with TensorFlow 2.16 (March 2024), Keras 3 became the default Keras version bundled with TensorFlow. This means:

- tf.keras now resolves to Keras 3 rather than the legacy Keras 2 codebase.
- Projects that depend on Keras 2 behavior must install the separate tf-keras compatibility package.
- Models written against the Keras 3 API can run on TensorFlow, PyTorch, or JAX without code changes.
Keras 3 also introduced a new keras.distribution API for data and model parallelism, initially implemented for the JAX backend. This multi-backend approach could reduce the significance of framework choice in practice, since developers can prototype in one backend and deploy in another.
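A sketch of switching backends via the KERAS_BACKEND environment variable, which must be set before keras is imported:

```python
import os
os.environ["KERAS_BACKEND"] = "jax"   # or "tensorflow" or "torch"

import keras

# The same model definition runs on whichever backend is selected above.
model = keras.Sequential([keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
```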
The TensorFlow-PyTorch comparison has defined the deep learning framework landscape since 2016. The two frameworks have converged in many ways, but meaningful differences remain.
| Aspect | TensorFlow | PyTorch |
|---|---|---|
| Execution model | Eager by default (since 2.0), with tf.function for graph mode | Eager by default, with torch.compile for optimization |
| Primary API | tf.keras | torch.nn |
| Deployment tools | TF Serving, TFLite/LiteRT, TF.js, TFX | TorchServe, ExecuTorch (mobile), torch.export |
| Compiler | XLA (OpenXLA) | TorchDynamo + Inductor |
| Research adoption | ~4% of new ML papers (2024) | ~85% of deep learning papers (2024) |
| Industry deployment | Widely used in production, especially at Google scale | Growing production adoption, especially with torch.compile |
| Hardware | CPUs, GPUs (NVIDIA/AMD), TPUs | CPUs, GPUs (NVIDIA/AMD), limited TPU support |
| Mobile/edge | LiteRT (mature) | ExecuTorch (newer) |
PyTorch dominates academic research. By 2023, roughly 80% of papers at venues like NeurIPS used PyTorch, while TensorFlow's share in new research implementations fell to single digits. This shift began around 2018 when researchers gravitated toward PyTorch's more natural Python interface and easier debugging.
In production and industry settings, TensorFlow retains a larger installed base. As of 2025, TensorFlow commands approximately 37-38% of the overall market share with over 25,000 companies using it globally, compared to PyTorch's roughly 26% with about 17,000 companies. Its mature serving infrastructure, mobile runtime, and Google Cloud integration make it a common choice for deploying models at scale. A 2025 survey found that over 40% of ML teams use both frameworks, prototyping in PyTorch and deploying in TensorFlow.
The performance gap between the two frameworks has largely closed. PyTorch 2.x introduced torch.compile(), delivering significant optimization gains, while TensorFlow's XLA compiler remains competitive for large-scale, long-running training jobs with 15-20% speed improvements on certain workloads. Neither framework can claim universal performance superiority.
JAX, also developed at Google (by the Google Research team), represents a different design philosophy. JAX is a functional transformation library built on top of XLA, providing composable transformations like grad (automatic differentiation), jit (JIT compilation), vmap (automatic vectorization), and pmap (parallel execution across devices).
JAX is not a full deep learning framework in itself; it requires separate libraries like Flax or Haiku for neural network layers. However, its functional approach and XLA integration make it especially fast for research that requires custom training loops, novel architectures, or large-scale distributed training.
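A minimal illustration of composing JAX transformations; the loss function here is an arbitrary example:

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Mean squared error of a linear model.
    return jnp.mean((x @ w - y) ** 2)

grad_fn = jax.jit(jax.grad(loss))       # compose differentiation with JIT compilation
double = jax.vmap(lambda v: v * 2.0)    # vectorize a function over a leading axis

w, x, y = jnp.zeros(3), jnp.ones((8, 3)), jnp.ones(8)
g = grad_fn(w, x, y)                    # gradient of the loss with respect to w
```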
Within Google, DeepMind and Google Research have increasingly adopted JAX and Flax for their internal research. TensorFlow remains the primary framework for many production Google services, but JAX's growing role has raised questions about TensorFlow's long-term position within Google's AI strategy. Notably, the TensorFlow 2.21 release announcement (March 2026) recommended that developers explore Keras 3, JAX, and PyTorch for new generative AI projects, while positioning TensorFlow as the stable, maintained choice for existing production workloads.
TensorFlow has been used in a wide range of real-world applications across industries:
Google products. TensorFlow powers many Google services internally. RankBrain, deployed in October 2015, uses TensorFlow for search ranking. Google Photos uses it for image classification and search. Google Translate, Gmail Smart Reply, and Google Assistant have all incorporated TensorFlow-based models.
Healthcare. GE Healthcare trained neural networks with TensorFlow to identify anatomical structures in brain MRI scans, aiming to improve scan speed and reliability. Google's DermAssist mobile app used TensorFlow for dermatological image analysis. Sinovation Ventures applied TensorFlow to classify diseases from OCT (optical coherence tomography) retinal scans.
Game AI. AlphaGo, DeepMind's Go-playing system that defeated world champion Lee Sedol in 2016, trained its neural networks using TensorFlow with 64 GPU workers and 19 CPU parameter servers. The open-source MiniGo project reimplemented the AlphaGo Zero algorithm in TensorFlow.
Social media and e-commerce. Twitter used TensorFlow to build its ranked timeline feature for surfacing relevant tweets. Airbnb applied it to image classification for property listings. NAVER Shopping used TensorFlow to automatically categorize over 20 million newly listed products per day into roughly 5,000 categories.
Telecommunications. China Mobile built a deep learning system with TensorFlow for network anomaly detection and maintenance automation, supporting the relocation of hundreds of millions of IoT device records.
Finance. Banks and financial institutions use TensorFlow for fraud detection, risk assessment, and algorithmic trading systems. The framework's production deployment tools make it suitable for latency-sensitive financial applications.
TensorFlow's position in academic research has shifted considerably since its release. In its first few years (2015-2018), TensorFlow was the dominant framework in both industry and academia. The introduction of PyTorch in 2016, with its eager execution and more Pythonic design, began drawing researchers away.
The trend accelerated through 2019-2023. Even after TensorFlow 2.0 adopted eager execution, researchers who had already built workflows around PyTorch saw little reason to switch back. By 2024, studies of ML paper repositories showed that roughly 70% of new implementations used PyTorch, while only about 4% used TensorFlow (down from 11% the previous year). At top-tier conferences, PyTorch's share was even higher.
Several factors contributed to this shift:

- PyTorch's eager execution and more natural Python interface made experimentation and debugging easier.
- The breaking changes between TensorFlow 1.x and 2.x invalidated existing tutorials and code, prompting some users to re-evaluate their framework choice.
- Network effects: as more reference implementations and pre-trained research models appeared in PyTorch, reproducing and extending prior work became easier there.
- TensorFlow's larger API surface and the subtleties of tf.function tracing added friction for rapid iteration.
TensorFlow remains heavily used in production environments, where its serving infrastructure, mobile deployment tools, and Google Cloud integration provide advantages that matter more than the rapid iteration speed valued in research.
As of early 2026, TensorFlow occupies a distinctive position in the framework landscape. It is simultaneously the most widely deployed machine learning framework in production (by company count) and a declining choice for new research projects.
The TensorFlow 2.21 release (March 2026) provided important signals about the project's direction. Google announced increased focus on maintenance: security vulnerability fixes, critical bug patches, and minor/patch version releases across ten ecosystem projects including TensorFlow Serving, TFX, TensorBoard, and TensorFlow Data Validation. The release promoted LiteRT to full production status and added low-precision data type support (INT2, INT4), but the announcement notably recommended Keras 3, JAX, and PyTorch for new generative AI development.
The multi-framework convergence trend is reshaping the landscape in several ways:

- Keras 3 lets the same model code run on TensorFlow, PyTorch, or JAX backends.
- OpenXLA provides shared compiler infrastructure across TensorFlow and JAX.
- LiteRT accepts models from PyTorch, JAX, and Keras, not just TensorFlow.
- Surveys indicate that many teams already prototype in one framework and deploy in another.
These developments suggest that the rigid "TensorFlow vs. PyTorch" framing is becoming less relevant. The practical question for teams in 2026 is less about which single framework to adopt and more about which combination of tools best fits their specific requirements for research iteration, production deployment, and hardware targets.
In TensorFlow 2.x, code runs eagerly by default. This means that calling a TensorFlow operation like tf.matmul immediately computes and returns a result, just as a NumPy operation would. Eager mode simplifies development and debugging.
For production performance, the tf.function decorator converts a Python function into a TensorFlow graph. The first time the function is called, TensorFlow traces the Python code and builds an optimized graph; subsequent calls execute the pre-compiled graph directly. This tracing can yield substantial speedups, especially when combined with XLA compilation.
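A small sketch contrasting the two modes:

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# Eager mode: executes immediately, like NumPy.
print(tf.matmul(x, x))

# Graph mode: traced once on the first call, then reused.
@tf.function
def matmul_square(t):
    return tf.matmul(t, t)

print(matmul_square(x))  # later calls with the same input signature skip retracing
```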
The tf.data API provides tools for building efficient input pipelines. Common operations include:

- map: apply a preprocessing function to each element;
- batch: group elements into mini-batches;
- shuffle: randomize example order using a sampling buffer;
- cache: keep a dataset in memory or on disk after the first pass;
- prefetch: overlap data preparation with model execution.
These pipelines are designed to keep GPUs and TPUs fed with data, preventing hardware from sitting idle while waiting for the next batch.
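A representative pipeline, sketched with toy data and an illustrative preprocess function:

```python
import tensorflow as tf

# Toy tensors standing in for a real dataset.
features = tf.random.normal([1000, 8])
labels = tf.random.uniform([1000], maxval=2, dtype=tf.int32)

def preprocess(x, y):
    # An arbitrary per-example transformation.
    return tf.nn.l2_normalize(x, axis=-1), y

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=1000)                           # randomize example order
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)                                           # group into mini-batches
    .prefetch(tf.data.AUTOTUNE)                          # overlap with model execution
)
```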
SavedModel is TensorFlow's standard serialization format for trained models. A SavedModel directory contains:

- saved_model.pb: the serialized program, including the computation graph and its input/output signatures;
- a variables/ directory holding trained weights as checkpoint files;
- an optional assets/ directory for auxiliary files such as vocabularies.
SavedModel is the format consumed by TensorFlow Serving, TensorFlow Lite (via conversion), and TensorFlow.js (via conversion). It provides a language-neutral way to share models between training and serving environments.
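A minimal sketch of the save/load round trip, using a tiny tf.Module (the module, path, and signature are arbitrary illustrations):

```python
import tensorflow as tf

class Scale(tf.Module):
    """A tiny module with one trainable-style variable."""
    def __init__(self):
        self.w = tf.Variable(2.0)

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        return self.w * x

module = Scale()
tf.saved_model.save(module, "export/my_model")    # writes saved_model.pb + variables/
restored = tf.saved_model.load("export/my_model")
print(restored(tf.constant([1.0, 3.0])))          # [2.0, 6.0], via the saved signature
```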
TensorFlow (through Keras) offers three levels of abstraction for building models:
- Sequential API: a linear stack of layers via tf.keras.Sequential, the simplest option for straightforward architectures.
- Functional API: layers composed as an explicit graph, supporting multiple inputs, multiple outputs, and shared layers.
- Model subclassing: users subclass tf.keras.Model and implement the forward pass in a call method. This approach is similar to PyTorch's nn.Module pattern.

TensorFlow supports a wide range of deep learning and traditional machine learning tasks, including image classification, object detection, text embedding, recommendation, probabilistic modeling, and decision forests.
TensorFlow maintains close compatibility with NumPy, the standard numerical computing library for Python. NumPy ndarrays are automatically converted to TensorFlow tensors in TF operations, and the reverse conversion is equally seamless. This means that code mixing NumPy and TensorFlow operations works without manual type casting in most cases.
TensorFlow also provides a tf.experimental.numpy module that implements a large subset of the NumPy API using TensorFlow tensors. Operations in this module run on GPUs and TPUs, giving users familiar NumPy syntax with hardware acceleration. This is useful for migrating existing NumPy-based scientific computing code to GPU or TPU execution without a complete rewrite.
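A short sketch of the module in use (exact type-promotion behavior depends on the opt-in call shown):

```python
import tensorflow.experimental.numpy as tnp

tnp.experimental_enable_numpy_behavior()   # opt in to NumPy-style type promotion

x = tnp.reshape(tnp.arange(6.0), (2, 3))   # executed as TensorFlow ops under the hood
y = tnp.sum(x ** 2, axis=1)                # can run on GPU or TPU if one is available
```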
Google Colab (Colaboratory) is a free Jupyter notebook environment that comes with TensorFlow pre-installed. Colab provides free access to GPUs (NVIDIA T4) and TPUs, making it one of the most accessible ways to experiment with deep learning. Notebooks stored on Google Drive can be shared and run by anyone with a Google account. This low barrier to entry has made Colab a common starting point for TensorFlow tutorials, courses, and prototyping.
Despite its broad adoption, TensorFlow has faced several recurring criticisms:
API instability across major versions. The transition from TensorFlow 1.x to 2.x broke backward compatibility, requiring significant code migration. Many tutorials, books, and Stack Overflow answers written for TF 1.x became obsolete, creating confusion for newcomers.
Complexity compared to alternatives. Even after the TF 2.0 simplifications, TensorFlow's codebase and API surface remain large. Concepts like tf.function tracing, graph retracing, and the distinction between eager and graph modes can trip up intermediate users. PyTorch's simpler execution model has been cited as a reason for its research popularity.
Debugging challenges with graph mode. While eager mode in TF 2.x improved the debugging experience, code decorated with tf.function still compiles into a graph, and errors inside traced functions can produce confusing stack traces that point to the graph construction rather than the original Python source.
Competition within Google. The rise of JAX within Google's own research labs has created ambiguity about TensorFlow's long-term direction. Some developers have expressed concern that Google's investment in TensorFlow may decrease as JAX gains more internal traction.
Memory consumption. TensorFlow's graph-based execution model and the overhead of the Python-to-C++ bridge can result in higher memory usage compared to more lightweight alternatives for simpler tasks.