Keras

Keras is an open-source, high-level neural network API written in Python, designed to simplify the process of building, training, and deploying deep learning models. Created by Francois Chollet and first released on March 27, 2015, Keras provides a user-friendly interface that abstracts away much of the complexity involved in constructing neural networks. It is licensed under the Apache 2.0 license and has grown into one of the most widely adopted deep learning libraries in the world, with over 2.5 million developers using the framework as of 2024.

Keras acts as a frontend for lower-level computational frameworks. In its current iteration (Keras 3), it supports TensorFlow, JAX, PyTorch, and OpenVINO as backends, allowing developers to write code once and run it across multiple frameworks without modification.

History and Evolution

The development of Keras is closely tied to the career of its creator, Francois Chollet, a French software engineer and AI researcher.

Origins (2015)

Chollet created Keras while working on research involving recurrent neural networks (RNNs). At the time, there was no good reusable open-source implementation of RNNs and LSTMs. The available options were limited: Caffe was popular in computer vision but only worked for narrow use cases and was not very extensible, while Torch 7 required coding in Lua, which lacked the advantages of the Python data science ecosystem. Chollet decided to build his own library, and that effort became Keras.

The name "Keras" comes from the Ancient Greek word keras (meaning "horn"), a reference to the literary image of the "Gate of Horn" from Homer's Odyssey, through which true visions pass to mortals. Chollet released the first version of Keras on March 27, 2015, and joined Google shortly afterward.

Keras 1 and 2: The Multi-Backend Era (2015 to 2019)

In its early versions, Keras supported multiple backends, including Theano, TensorFlow, and Microsoft's Cognitive Toolkit (CNTK). This backend-agnostic design was one of Keras's defining features: users could write model code once and switch between backends by changing a single configuration setting. Keras 2, released in 2017, stabilized the API and brought improvements to the layer system, model saving, and preprocessing utilities.

tf.keras: Integration into TensorFlow (2019 to 2023)

When TensorFlow 2.0 launched in September 2019, Keras was integrated as TensorFlow's official high-level API under the tf.keras namespace. This integration gave Keras access to TensorFlow's full ecosystem, including TensorFlow Serving, TensorFlow Lite, and TensorFlow.js for deployment across servers, mobile devices, and web browsers. During this period (Keras 2.4 through 2.15), TensorFlow was the only supported backend. Development of the standalone multi-backend Keras package was discontinued in favor of tf.keras.

Keras 3: Return to Multi-Backend (2023 to Present)

In 2023, Chollet announced Keras 3, a full rewrite of the library that restored multi-backend support. The project was developed under the name "Keras Core", with an initial development phase from April to July 2023 followed by a public beta from July to September 2023. Keras 3.0 was officially released in late 2023.

Keras 3 supports four backends:

Backend	Use Case	Notes
JAX	High-performance training and inference	Typically delivers the best performance on GPU, TPU, and CPU
TensorFlow	Production deployment, mobile/web	Access to TF Serving, TF Lite, TF.js ecosystem
PyTorch	Research, integration with PyTorch ecosystem	Keras layers function as native PyTorch Modules
OpenVINO	Inference-only optimization	Added in Keras 3.8 for optimized CPU inference

As of April 2026, the latest stable version is Keras 3.14.0. Starting with version 3.13.0, Keras requires Python 3.11 or higher. TensorFlow 2.16 and later versions ship with Keras 3 as the default Keras implementation, while the legacy Keras 2 remains available through the tf_keras maintenance package.

Francois Chollet

Francois Chollet earned a Master of Engineering degree from ENSTA Paris (part of the Polytechnic Institute of Paris) in 2012. He created Keras in 2015 and joined Google the same year, where he served as a Senior Staff Engineer for over nine years before departing in November 2024.

Beyond Keras, Chollet has made several significant contributions to the AI field:

He authored Deep Learning with Python (2017), which sold over 100,000 copies, and co-authored Deep Learning with R (2018).
He published the Xception architecture paper (Xception: Deep Learning with Depthwise Separable Convolutions), which ranks among the top ten most cited CVPR papers with over 18,000 citations.
In 2019, he created the Abstraction and Reasoning Corpus (ARC-AGI), a benchmark designed to measure AI systems' ability to solve novel reasoning problems.
In 2024, he launched the ARC Prize, a $1 million competition to solve the ARC-AGI benchmark, and in early 2025 expanded it into a nonprofit foundation focused on artificial general intelligence research.
After leaving Google, Chollet co-founded a startup with Zapier co-founder Mike Knoop focused on program synthesis for AGI development.

Model-Building APIs

Keras offers three distinct APIs for building neural network models, each suited to different levels of complexity and customization.

Sequential API

The Sequential API is the simplest way to build a model in Keras. It allows users to create models by stacking layers in a linear sequence, one after another. This API is ideal for straightforward architectures where data flows through each layer in order without branching or merging.

import keras
from keras import layers

# Keras 3 prefers an explicit Input object over the legacy input_shape argument.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

The Sequential API is best suited for beginners or for building simple models like basic classifiers and regressors. Its limitation is that it only supports single-input, single-output stacks of layers.
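
To make "stacking layers in a linear sequence" concrete, here is a minimal pure-Python sketch. It is not Keras code: TinyDense and TinySequential are hypothetical stand-ins invented for this illustration, and the weights are arbitrary. The sketch only shows that a sequential model amounts to function composition, where each layer's output becomes the next layer's input.

```python
class TinyDense:
    """Hypothetical stand-in for a fully connected layer (not keras.layers.Dense)."""

    def __init__(self, weights, biases, activation=None):
        self.weights = weights        # one row of input weights per output unit
        self.biases = biases          # one bias per output unit
        self.activation = activation  # optional element-wise function

    def __call__(self, inputs):
        outputs = []
        for row, bias in zip(self.weights, self.biases):
            value = sum(w * x for w, x in zip(row, inputs)) + bias
            outputs.append(self.activation(value) if self.activation else value)
        return outputs


class TinySequential:
    """Hypothetical stand-in for a linear stack of layers."""

    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:  # data flows through the layers in order
            x = layer(x)
        return x


def relu(value):
    return max(0.0, value)


# Two inputs -> two hidden units (ReLU) -> one output unit.
model = TinySequential([
    TinyDense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], activation=relu),
    TinyDense([[1.0, 1.0]], [0.1]),
])
print(model([2.0, 1.0]))
```

Because the stack is a plain loop over layers, there is no place to express a second input or a branch, which is the limitation described above.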

Functional API

The Functional API provides greater flexibility by allowing users to define models as directed acyclic graphs of layers. This API supports multiple inputs and outputs, shared layers, and non-linear topologies such as skip connections and residual blocks.

inputs = keras.Input(shape=(784,))
x = layers.Dense(128, activation='relu')(inputs)
x = layers.Dropout(0.3)(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)

model = keras.Model(inputs=inputs, outputs=outputs)

The Functional API strikes a balance between ease of use and flexibility. It is the recommended approach for most use cases, including architectures with branching (such as Inception-style networks) and models that require multiple input or output tensors.

Model Subclassing API

The Subclassing API gives users full control over the model by defining a custom class that inherits from keras.Model. Users implement the __init__ method to define layers and the call method to specify the forward pass logic.

class MyModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = layers.Dense(128, activation='relu')
        self.dropout = layers.Dropout(0.3)
        self.dense2 = layers.Dense(64, activation='relu')
        self.out = layers.Dense(10, activation='softmax')

    def call(self, inputs):
        x = self.dense1(inputs)
        x = self.dropout(x)
        x = self.dense2(x)
        return self.out(x)

This approach is suited for advanced research or highly customized models that require conditional logic, loops, or other dynamic behaviors during the forward pass. It offers maximum flexibility but requires a deeper understanding of the framework.

API Comparison

Feature	Sequential	Functional	Subclassing
Ease of use	Very easy	Moderate	Advanced
Multiple inputs/outputs	No	Yes	Yes
Shared layers	No	Yes	Yes
Non-linear topology	No	Yes	Yes
Dynamic forward pass	No	No	Yes
Model visualization	Yes	Yes	Limited
Best for	Beginners, simple models	Most use cases	Research, custom architectures

Key Features

User-Friendly Design

Keras follows the principle of "progressive disclosure of complexity." Simple tasks require minimal code, while advanced customization is available when needed. The API is consistent and uses clear naming conventions, which reduces the cognitive load on developers.

Modularity

Keras treats models as compositions of standalone, configurable modules. Layers, optimizers, loss functions, metrics, and callbacks can be combined in various ways. Users can also create custom versions of these components with relative ease, supporting experimentation with novel architectures.

Backend Flexibility

With Keras 3, users can switch between JAX, TensorFlow, and PyTorch for training and inference, and use OpenVINO for inference only, without rewriting model code. A model saved in one backend can be loaded and run in another (provided custom components use keras.ops instead of backend-specific operations). This flexibility allows teams to choose the best backend for each stage of their workflow.
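
As a rough illustration of how a backend is chosen in practice, Keras 3 reads the KERAS_BACKEND environment variable when the package is first imported. The sketch below uses only the standard library; select_backend is a hypothetical helper written for this example, not part of Keras, and the list of names assumes the four backends discussed in this article.

```python
import os

# Backend identifiers as Keras 3 spells them ("torch" selects PyTorch).
SUPPORTED_BACKENDS = ("jax", "tensorflow", "torch", "openvino")


def select_backend(name):
    """Hypothetical helper: set KERAS_BACKEND before the first `import keras`."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    os.environ["KERAS_BACKEND"] = name
    return name


select_backend("jax")
# import keras  # would now load with the JAX backend, if Keras and JAX are installed
```

The variable has to be set before Keras is imported; changing it afterwards does not switch the backend of an already-loaded session.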

The keras.ops Namespace

Keras 3 introduced the keras.ops namespace, which provides a unified set of operations that work identically across all backends. This includes a full NumPy-compatible API (e.g., ops.matmul, ops.sum, ops.stack, ops.einsum) and neural-network-specific functions (e.g., ops.softmax, ops.binary_crossentropy, ops.conv). Any custom layer, loss, metric, or optimizer written with keras.ops will run on all supported backends.

Pre-trained Models

Keras provides access to over 40 pre-trained model architectures through Keras Applications, including popular networks like VGG, ResNet, Inception, EfficientNet, and MobileNet. These models come with pre-trained weights (typically trained on ImageNet) and can be used for transfer learning, feature extraction, or fine-tuning on domain-specific tasks.

Common Layers and Modules

Keras organizes its layer library into 16 categories. The table below lists the most commonly used layers.

Layer	Category	Description
`Dense`	Core	Fully connected layer; each neuron connects to every neuron in the previous layer
`Conv2D`	Convolution	2D convolution layer for processing image data
`LSTM`	Recurrent	Long Short-Term Memory layer for sequential data; its gating mitigates the vanishing gradient problem
`GRU`	Recurrent	Gated Recurrent Unit; a simpler alternative to LSTM with comparable performance
`Embedding`	Core	Maps integer indices (e.g., word IDs) to dense vectors; used in NLP models
`Dropout`	Regularization	Randomly sets a fraction of input units to zero during training to prevent overfitting
`BatchNormalization`	Normalization	Normalizes layer inputs toward zero mean and unit variance using batch statistics, stabilizing training
`LayerNormalization`	Normalization	Normalizes across features rather than the batch dimension; common in transformers
`MultiHeadAttention`	Attention	Implements multi-head attention mechanism used in transformer architectures
`Flatten`	Reshaping	Flattens a multi-dimensional input into a 1D vector
`MaxPooling2D`	Pooling	Downsamples spatial dimensions by taking the maximum value in each pooling window
`Concatenate`	Merging	Concatenates a list of inputs along a specified axis

In addition to these, Keras provides preprocessing layers for text, image, and audio data; activation layers (ReLU, Softmax, GELU, Swish); weight initializers (GlorotNormal, HeNormal); weight regularizers (L1, L2); and backend-specific layers for interoperability with PyTorch Modules, TensorFlow SavedModels, and JAX/Flax layers.
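
The Dropout row in the table above can be illustrated with a small pure-Python sketch. This is not the Keras implementation: the dropout function below is a hypothetical stand-in that operates on a plain list. It assumes the common "inverted dropout" convention, in which surviving values are scaled by 1 / (1 - rate) during training so that the expected sum is unchanged, and the layer does nothing at inference time.

```python
import random


def dropout(inputs, rate, training, rng):
    """Inverted dropout over a list of floats (illustrative, not Keras code)."""
    if not training or rate == 0.0:
        return list(inputs)  # inference: values pass through unchanged
    keep_scale = 1.0 / (1.0 - rate)
    # Each value is zeroed with probability `rate`, otherwise scaled up.
    return [0.0 if rng.random() < rate else x * keep_scale for x in inputs]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10
train_out = dropout(activations, rate=0.3, training=True, rng=rng)
infer_out = dropout(activations, rate=0.3, training=False, rng=rng)
print(train_out)
print(infer_out)
```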

Training Workflow

Keras provides a streamlined workflow for training and evaluating models, centered around three methods: compile, fit, and evaluate/predict.

Step 1: Compile the Model

Before training, the model must be compiled with an optimizer, a loss function, and (optionally) metrics to monitor.

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
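
The loss name used above, sparse_categorical_crossentropy, refers to a loss that takes an integer class index as the label and a vector of class probabilities as the prediction. The following pure-Python sketch shows the per-sample arithmetic under that assumption; the helper functions are written for this example and omit details a real implementation handles, such as clipping probabilities away from zero and averaging over a batch.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample: negative log-probability of the true class index."""
    return -math.log(probabilities[label])


probs = softmax([2.0, 1.0, 0.1])
loss_correct = sparse_categorical_crossentropy(0, probs)  # true class is the top score
loss_wrong = sparse_categorical_crossentropy(2, probs)    # true class is the lowest score
print(loss_correct, loss_wrong)
```

The loss is small when the model assigns high probability to the true class and grows as that probability shrinks, which is what the optimizer minimizes during training.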

Step 2: Train with model.fit()

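The model.fit() method is the primary training function. In each epoch, it splits the training data into batches, repeats the forward pass, loss, gradient, and update steps for every batch, and finishes with optional validation: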

Splits the data into batches.
Performs a forward pass to generate predictions.
Calculates the loss between predictions and true labels.
Computes gradients of the loss with respect to the model's trainable weights.
Updates the weights using the optimizer.
Calculates the specified metrics.
Optionally evaluates on a separate validation dataset at the end of each epoch.
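
The steps above can be sketched in plain Python for a one-feature linear model. This is an illustration of the general mini-batch gradient descent pattern, not the internals of model.fit(): the train function, the squared-error loss, the plain SGD update, and the toy data are all assumptions made for this example, and the validation step is left out.

```python
import random


def train(xs, ys, epochs, batch_size, learning_rate, seed=0):
    """Fit y = w * x + b with mini-batch SGD; returns weights and per-epoch loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                                       # trainable weights
    history = []
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        epoch_loss = 0.0
        for start in range(0, len(order), batch_size):    # 1. split into batches
            batch = order[start:start + batch_size]
            grad_w = grad_b = batch_loss = 0.0
            for i in batch:
                pred = w * xs[i] + b                      # 2. forward pass
                error = pred - ys[i]
                batch_loss += error ** 2                  # 3. loss (squared error)
                grad_w += 2 * error * xs[i]               # 4. gradients
                grad_b += 2 * error
            n = len(batch)
            w -= learning_rate * grad_w / n               # 5. weight update (plain SGD)
            b -= learning_rate * grad_b / n
            epoch_loss += batch_loss
        history.append(epoch_loss / len(xs))              # 6. metric: mean squared error
    return w, b, history


xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]  # noiseless data generated from w=3, b=1
w, b, history = train(xs, ys, epochs=200, batch_size=4, learning_rate=0.05)
print(round(w, 2), round(b, 2), history[0], history[-1])
```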

history = model.fit(
    x_train, y_train,
    epochs=20,
    batch_size=32,
    validation_split=0.2,
    callbacks=[...]
)
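
The validation_split=0.2 argument in the call above holds out a fraction of the training arrays for validation. According to the Keras documentation, the held-out samples are taken from the end of the arrays, before any shuffling. The sketch below mimics that behavior on plain lists; split_validation is a hypothetical helper for illustration, not a Keras function.

```python
import math


def split_validation(x, y, validation_split):
    """Hold out the last fraction of samples as validation data (illustrative)."""
    split_at = int(math.floor(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


x = list(range(10))
y = [value % 2 for value in x]
(train_x, train_y), (val_x, val_y) = split_validation(x, y, validation_split=0.2)
print(train_x, val_x)
```

One practical consequence is that data sorted by class or by time should usually be shuffled before calling fit() with validation_split, since otherwise the validation set may not resemble the training set.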

Keras 3 supports multiple data pipeline formats, including NumPy arrays, tf.data.Dataset, torch.utils.data.DataLoader, Pandas DataFrames, and keras.utils.PyDataset.

Step 3: Evaluate and Predict

model.evaluate() calculates the loss and metrics on a test dataset, providing a measure of the model's performance on unseen data.

test_loss, test_accuracy = model.evaluate(x_test, y_test)

model.predict() generates output predictions for new input data without computing loss or metrics.

predictions = model.predict(new_data)

Custom Training Loops

For more advanced use cases, users can override the train_step() method to customize the training logic while still using model.fit(). Alternatively, Keras components can be used in fully custom training loops written in the native syntax of JAX, TensorFlow, or PyTorch.

Callbacks System

Callbacks are objects passed to model.fit() that can perform actions at various stages of training, such as at the start or end of an epoch, or before or after processing a batch. Keras includes several built-in callbacks for common tasks.

Callback	Purpose
`EarlyStopping`	Stops training when a monitored metric (e.g., validation loss) has stopped improving for a specified number of epochs (patience). Can restore weights from the best epoch.
`ModelCheckpoint`	Saves the model or its weights periodically or whenever performance on a monitored metric improves.
`ReduceLROnPlateau`	Reduces the learning rate when a monitored metric has stopped improving, helping the model escape plateaus.
`TensorBoard`	Logs training metrics, model graphs, and histograms for visualization in TensorBoard.
`LearningRateScheduler`	Adjusts the learning rate according to a user-defined schedule function at each epoch.
`CSVLogger`	Streams epoch results (loss, metrics) to a CSV file.
`ProgbarLogger`	Displays a progress bar during training.

Users can also create custom callbacks by subclassing keras.callbacks.Callback and overriding methods like on_epoch_end, on_batch_begin, and on_train_end.
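
The patience logic described for EarlyStopping in the table above can be sketched without Keras. The function below is a hypothetical illustration that replays a list of per-epoch validation losses; it assumes lower is better and zero-indexed epochs, and it is not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:   # improvement: remember it, reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:           # no improvement for `patience` epochs
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # ran to the end without stopping early


losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.64, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch, best_epoch)
```

In this run training stops before the later improvements at epochs 5 and 6 are seen, which shows the trade-off in choosing patience: a small value saves time but can stop during a temporary plateau. Restoring the best weights corresponds to going back to best_epoch.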

Keras Ecosystem

KerasHub (formerly KerasCV and KerasNLP)

Originally, the Keras ecosystem included two separate domain-specific libraries: KerasCV for computer vision and KerasNLP for natural language processing. As AI models increasingly became multimodal (for example, chat-based large language models with image inputs, or vision tasks that leverage text encoders), maintaining separate domain libraries became impractical.

In 2024, KerasCV and KerasNLP were consolidated into a single unified library called KerasHub. KerasHub is a pretrained modeling library that provides Keras 3 implementations of popular model architectures paired with pretrained checkpoints available on Kaggle Models. Models work across all backends (TensorFlow, JAX, PyTorch) for both training and inference.

KerasHub includes implementations of models such as:

Language models: Llama 3, Gemma, BERT, T5, GPT-2, OPT
Vision models: Stable Diffusion, Segment Anything (SAM), YOLOv8
Audio models: Whisper

Key features of KerasHub include LoRA fine-tuning for resource-efficient model adaptation, quantization for optimized performance, model publishing and sharing, and multi-host distributed training.

Existing code using keras_nlp imports continues to work; migration only requires updating import statements from keras_nlp to keras_hub.

KerasTuner

KerasTuner is a hyperparameter tuning library for Keras that automates the process of finding optimal hyperparameter configurations. It supports several search algorithms, including random search, Bayesian optimization, and Hyperband.
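
Of the search algorithms mentioned, random search is the simplest to illustrate. The sketch below shows the idea in plain Python and does not use the KerasTuner API: random_search, toy_objective, and the search space are hypothetical, and toy_objective stands in for the expensive step of training a model and returning its validation loss.

```python
import random


def random_search(objective, search_space, max_trials, seed=0):
    """Sample configurations at random and keep the one with the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(max_trials):
        params = {name: rng.choice(choices) for name, choices in search_space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in for 'train a model and return validation loss'."""
    return (params["units"] - 64) ** 2 / 1000 + abs(params["learning_rate"] - 0.01)


space = {"units": [16, 32, 64, 128], "learning_rate": [0.1, 0.01, 0.001]}
best_params, best_score = random_search(toy_objective, space, max_trials=30)
print(best_params, best_score)
```

Bayesian optimization and Hyperband follow the same outer loop but choose the next configuration, or the training budget per configuration, more selectively.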

AutoKeras

AutoKeras is an automated machine learning (AutoML) library built on Keras. It automatically searches for the best model architecture and hyperparameters for a given dataset, making deep learning accessible to users without extensive expertise in model design.

Distribution API

Keras 3 includes a distribution API (keras.distribution) that simplifies data parallelism and model parallelism. The API allows users to distribute training across multiple GPUs or TPUs with minimal code changes:

distribution = keras.distribution.DataParallel(
    devices=keras.distribution.list_devices()
)
keras.distribution.set_distribution(distribution)

The distribution API keeps model definition, training logic, and sharding configuration entirely separate, making it easy to scale training without restructuring code.
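
The idea behind data parallelism can be shown with a small numeric sketch. It is a conceptual illustration only and says nothing about how keras.distribution is implemented: the "devices" are list slices, the model is y = w * x, and both helper functions are hypothetical. It assumes the batch divides evenly across devices.

```python
def mse_gradient(w, xs, ys):
    """Gradient of mean squared error with respect to w for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_gradient(w, xs, ys, num_devices):
    """Give each 'device' an equal shard, then average the per-shard gradients."""
    shard_size = len(xs) // num_devices
    shard_grads = []
    for d in range(num_devices):
        lo, hi = d * shard_size, (d + 1) * shard_size
        shard_grads.append(mse_gradient(w, xs[lo:hi], ys[lo:hi]))
    return sum(shard_grads) / num_devices


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
full = mse_gradient(0.5, xs, ys)
sharded = data_parallel_gradient(0.5, xs, ys, num_devices=4)
print(full, sharded)
```

With equal shards, the averaged per-device gradient matches the gradient of the whole batch, so every replica can apply the same update and the copies of the model stay in sync.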

Applications

Keras is used across a wide range of deep learning applications:

Domain	Examples	Common Layers/Models
Image classification	Object detection, face recognition, medical imaging	Conv2D, ResNet, EfficientNet
Natural language processing	Text classification, sentiment analysis, machine translation	Embedding, LSTM, Transformer, BERT
Generative AI	Image synthesis, text generation, data augmentation	GANs, VAEs, Stable Diffusion
Speech and audio	Speech recognition, audio classification	Conv1D, Whisper
Time series	Forecasting, anomaly detection	LSTM, GRU, Conv1D
Reinforcement learning	Game playing, robot control	Dense, custom training loops

Keras powers major production systems, including the Waymo self-driving fleet and the YouTube recommendation engine.

Keras vs. PyTorch

Keras and PyTorch are two of the most widely used frameworks for deep learning, but they take different approaches.

Aspect	Keras	PyTorch
Abstraction level	High-level API	Lower-level framework
Ease of use	Very beginner-friendly; minimal boilerplate	Requires more code but feels Pythonic
Debugging	Relies on backend tools; can be less transparent	Excellent debugging with standard Python tools
Training loop	Built-in `model.fit()` handles most cases	Manual training loops offer full control
Research adoption	Common in applied ML and industry	Dominant in academic research
Cutting-edge models	Available through KerasHub	Most new state-of-the-art models appear first in PyTorch
Deployment	Strong TensorFlow ecosystem (TF Serving, TF Lite, TF.js)	TorchServe, ONNX export
Backend flexibility	Multi-backend (JAX, TF, PyTorch, OpenVINO)	PyTorch only
Performance	Can leverage JAX for best GPU/TPU performance	Optimized for GPU; strong CUDA support

When to use Keras: Keras is best suited for rapid prototyping, educational purposes, and small to medium-scale production projects. It is also a strong choice when backend flexibility is important or when deploying to mobile and edge devices through TensorFlow Lite. Teams that want to use JAX's performance advantages without learning JAX's functional programming model can use Keras as a familiar interface.

When to use PyTorch: PyTorch is preferred for cutting-edge research, when you need fine-grained control over training dynamics, or when working with models that are primarily published in the PyTorch ecosystem. It is also the standard in most academic labs.

Hybrid approach: Many teams use both frameworks. Keras 3's multi-backend support means that a model written in Keras can run on PyTorch as its backend, bridging the gap between the two ecosystems.

Explain Like I'm 5 (ELI5)

Imagine you want to build something out of LEGO blocks. You could try to make every single tiny brick yourself from scratch, which would take forever. Or you could use a LEGO kit that already has all the special pieces sorted and labeled, with instructions showing you how to snap them together.

Keras is like that LEGO kit, but for building smart computer programs. Deep learning programs are made of building blocks called "layers" that each do a small job, like looking at pictures or reading words. Keras gives you all these building blocks pre-made, so you just pick the ones you need and snap them together in the right order.

Once you put your blocks together, you "train" your creation by showing it lots of examples (like thousands of pictures of cats and dogs) so it learns to tell them apart. Keras handles all the complicated math behind the scenes. You just say "learn from these examples" and it does the rest.

The best part is that Keras works with several different "engines" underneath (called backends), so it is like having one set of LEGO instructions that works with different brands of building blocks.
