Keras is an open-source, high-level neural network API written in Python, designed to simplify the process of building, training, and deploying deep learning models. Created by Francois Chollet and first released on March 27, 2015, Keras provides a user-friendly interface that abstracts away much of the complexity involved in constructing neural networks. It is licensed under the Apache 2.0 license and has grown into one of the most widely adopted deep learning libraries in the world, with over 2.5 million developers using the framework as of 2024.
Keras acts as a frontend for lower-level computational frameworks. In its current iteration (Keras 3), it supports TensorFlow, JAX, PyTorch, and OpenVINO as backends, allowing developers to write code once and run it across multiple frameworks without modification.
The development of Keras is closely tied to the career of its creator, Francois Chollet, a French software engineer and AI researcher.
Chollet created Keras while working on research involving recurrent neural networks (RNNs). At the time, there was no good reusable open-source implementation of RNNs and LSTMs. The available options were limited: Caffe was popular in computer vision but only worked for narrow use cases and was not very extensible, while Torch 7 required coding in Lua, which lacked the advantages of the Python data science ecosystem. Chollet decided to build his own library, and that effort became Keras.
The name "Keras" comes from the Ancient Greek word keras (meaning "horn"), a reference to the literary image of the "Gate of Horn" from Homer's Odyssey, through which true visions pass to mortals. Chollet released the first version of Keras on March 27, 2015, and joined Google shortly afterward.
In its early versions, Keras supported multiple backends, including Theano, TensorFlow, and Microsoft's Cognitive Toolkit (CNTK). This backend-agnostic design was one of Keras's defining features: users could write model code once and switch between backends by changing a single configuration setting. Keras 2, released in 2017, stabilized the API and brought improvements to the layer system, model saving, and preprocessing utilities.
When TensorFlow 2.0 launched in September 2019, Keras was integrated as TensorFlow's official high-level API under the tf.keras namespace. This integration gave Keras access to TensorFlow's full ecosystem, including TensorFlow Serving, TensorFlow Lite, and TensorFlow.js for deployment across servers, mobile devices, and web browsers. During this period (Keras 2.4 through 2.15), TensorFlow was the only supported backend. The standalone multi-backend Keras package was no longer maintained in favor of tf.keras.
In 2023, Chollet announced Keras 3, a full rewrite of the library that restored multi-backend support. The project was developed under the codename "Keras Core" during its initial development phase (April to July 2023) and a public beta test (July to September 2023). Keras 3.0 was officially released in late 2023.
Keras 3 supports four backends:
| Backend | Use Case | Notes |
|---|---|---|
| JAX | High-performance training and inference | Typically delivers the best performance on GPU, TPU, and CPU |
| TensorFlow | Production deployment, mobile/web | Access to TF Serving, TF Lite, TF.js ecosystem |
| PyTorch | Research, integration with PyTorch ecosystem | Keras layers function as native PyTorch Modules |
| OpenVINO | Inference-only optimization | Added in Keras 3.8 for optimized CPU inference |
As of April 2026, the latest stable version is Keras 3.14.0. Starting with version 3.13.0, Keras requires Python 3.11 or higher. TensorFlow 2.16 and later versions ship with Keras 3 as the default Keras implementation, while the legacy Keras 2 remains available through the tf_keras maintenance package.
Francois Chollet earned a Master of Engineering degree from ENSTA Paris (part of the Polytechnic Institute of Paris) in 2012. He created Keras in 2015 and joined Google the same year, where he served as a Senior Staff Engineer for over nine years before departing in November 2024.
Beyond Keras, Chollet has made several other significant contributions to the AI field, including the widely used textbook *Deep Learning with Python*, the 2019 paper "On the Measure of Intelligence," and the ARC (Abstraction and Reasoning Corpus) benchmark for evaluating generalization in AI systems.
Keras offers three distinct APIs for building neural network models, each suited to different levels of complexity and customization.
The Sequential API is the simplest way to build a model in Keras. It allows users to create models by stacking layers in a linear sequence, one after another. This API is ideal for straightforward architectures where data flows through each layer in order without branching or merging.
```python
import keras
from keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),          # Keras 3 idiom: declare the input shape explicitly
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
```
The Sequential API is best suited for beginners or for building simple models like basic classifiers and regressors. Its limitation is that it only supports single-input, single-output stacks of layers.
The Functional API provides greater flexibility by allowing users to define models as directed acyclic graphs of layers. This API supports multiple inputs and outputs, shared layers, and non-linear topologies such as skip connections and residual blocks.
```python
inputs = keras.Input(shape=(784,))
x = layers.Dense(128, activation='relu')(inputs)
x = layers.Dropout(0.3)(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
```
The Functional API strikes a balance between ease of use and flexibility. It is the recommended approach for most use cases, including architectures with branching (such as Inception-style networks) and models that require multiple input or output tensors.
The Subclassing API gives users full control over the model by defining a custom class that inherits from keras.Model. Users implement the __init__ method to define layers and the call method to specify the forward pass logic.
```python
class MyModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = layers.Dense(128, activation='relu')
        self.dropout = layers.Dropout(0.3)
        self.dense2 = layers.Dense(64, activation='relu')
        self.out = layers.Dense(10, activation='softmax')

    def call(self, inputs):
        x = self.dense1(inputs)
        x = self.dropout(x)
        x = self.dense2(x)
        return self.out(x)
```
This approach is suited for advanced research or highly customized models that require conditional logic, loops, or other dynamic behaviors during the forward pass. It offers maximum flexibility but requires a deeper understanding of the framework.
| Feature | Sequential | Functional | Subclassing |
|---|---|---|---|
| Ease of use | Very easy | Moderate | Advanced |
| Multiple inputs/outputs | No | Yes | Yes |
| Shared layers | No | Yes | Yes |
| Non-linear topology | No | Yes | Yes |
| Dynamic forward pass | No | No | Yes |
| Model visualization | Yes | Yes | Limited |
| Best for | Beginners, simple models | Most use cases | Research, custom architectures |
Keras follows the principle of "progressive disclosure of complexity." Simple tasks require minimal code, while advanced customization is available when needed. The API is consistent and uses clear naming conventions, which reduces the cognitive load on developers.
Keras treats models as compositions of standalone, configurable modules. Layers, optimizers, loss functions, metrics, and callbacks can be combined in various ways. Users can also create custom versions of these components with relative ease, supporting experimentation with novel architectures.
With Keras 3, users can switch between JAX, TensorFlow, PyTorch, and OpenVINO without rewriting model code. A model saved in one backend can be loaded and run in another (provided custom components use keras.ops instead of backend-specific operations). This flexibility allows teams to choose the best backend for each stage of their workflow.
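In practice, the backend is selected before Keras is first imported, either in the keras.json configuration file or via the KERAS_BACKEND environment variable. A minimal sketch of the environment-variable route:

```python
import os

# Must be set before the first `import keras`; recognized values include
# "jax", "tensorflow", "torch", and "openvino" (inference only).
os.environ["KERAS_BACKEND"] = "jax"

import keras  # Keras now runs on the JAX backend
```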
Keras 3 introduced the keras.ops namespace, which provides a unified set of operations that work identically across all backends. This includes a full NumPy-compatible API (e.g., ops.matmul, ops.sum, ops.stack, ops.einsum) and neural-network-specific functions (e.g., ops.softmax, ops.binary_crossentropy, ops.conv). Any custom layer, loss, metric, or optimizer written with keras.ops will run on all supported backends.
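As an illustration, here is a minimal sketch of a backend-agnostic custom layer written purely with keras.ops; the RMS-normalization layer itself is just an example, not a claim about the built-in layer catalog:

```python
import keras
from keras import ops

class RMSNorm(keras.layers.Layer):
    """Illustrative custom layer: RMS normalization over the last axis."""

    def __init__(self, epsilon=1e-6, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon

    def call(self, x):
        # Mean of squares along the feature axis, then rescale the inputs.
        # Because only keras.ops is used, this runs on any backend.
        mean_square = ops.mean(ops.square(x), axis=-1, keepdims=True)
        return x / ops.sqrt(mean_square + self.epsilon)
```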
Keras provides access to over 40 pre-trained model architectures through Keras Applications, including popular networks like VGG, ResNet, Inception, EfficientNet, and MobileNet. These models come with pre-trained weights (typically trained on ImageNet) and can be used for transfer learning, feature extraction, or fine-tuning on domain-specific tasks.
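A brief sketch of the transfer-learning pattern with one of these models; the input size and the five-class head are illustrative choices:

```python
import keras
from keras import layers

# Load a ResNet50 backbone pretrained on ImageNet, without its classifier head.
base = keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3),
)
base.trainable = False  # freeze the pretrained weights for feature extraction

inputs = keras.Input(shape=(224, 224, 3))
x = keras.applications.resnet.preprocess_input(inputs)
x = base(x, training=False)
outputs = layers.Dense(5, activation="softmax")(x)  # e.g. 5 target classes
model = keras.Model(inputs, outputs)
```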
Keras organizes its layer library into 16 categories. The table below lists the most commonly used layers.
| Layer | Category | Description |
|---|---|---|
| Dense | Core | Fully connected layer; each neuron connects to every neuron in the previous layer |
| Conv2D | Convolution | 2D convolution layer for processing image data |
| LSTM | Recurrent | Long Short-Term Memory layer for sequential data; handles the vanishing gradient problem |
| GRU | Recurrent | Gated Recurrent Unit; a simpler alternative to LSTM with comparable performance |
| Embedding | Core | Maps integer indices (e.g., word IDs) to dense vectors; used in NLP models |
| Dropout | Regularization | Randomly sets a fraction of input units to zero during training to prevent overfitting |
| BatchNormalization | Normalization | Normalizes layer inputs to have zero mean and unit variance, stabilizing training |
| LayerNormalization | Normalization | Normalizes across features rather than the batch dimension; common in transformers |
| MultiHeadAttention | Attention | Implements the multi-head attention mechanism used in transformer architectures |
| Flatten | Reshaping | Flattens a multi-dimensional input into a 1D vector |
| MaxPooling2D | Pooling | Downsamples spatial dimensions by taking the maximum value in each pooling window |
| Concatenate | Merging | Concatenates a list of inputs along a specified axis |
In addition to these, Keras provides preprocessing layers for text, image, and audio data; activation layers (ReLU, Softmax, GELU, Swish); weight initializers (GlorotNormal, HeNormal); weight regularizers (L1, L2); and backend-specific layers for interoperability with PyTorch Modules, TensorFlow SavedModels, and JAX/Flax layers.
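For example, an initializer and a regularizer can be attached to a layer directly; the values below are illustrative:

```python
from keras import layers, initializers, regularizers

# A Dense layer configured with He initialization and L2 weight decay;
# the coefficient 1e-4 is a placeholder, not a recommended default.
dense = layers.Dense(
    64,
    activation="relu",
    kernel_initializer=initializers.HeNormal(),
    kernel_regularizer=regularizers.L2(1e-4),
)
```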
Keras provides a streamlined workflow for training and evaluating models, centered around three methods: compile, fit, and evaluate/predict.
Before training, the model must be compiled with an optimizer, a loss function, and (optionally) metrics to monitor.
```python
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
```
The model.fit() method is the primary training function. For each epoch, it iterates over the training data in batches, runs the forward pass, computes the loss, backpropagates gradients and updates the weights via the optimizer, and updates any monitored metrics; if validation data is provided, the model is evaluated on it at the end of the epoch.
```python
history = model.fit(
    x_train, y_train,
    epochs=20,
    batch_size=32,
    validation_split=0.2,
    callbacks=[...]
)
```
Keras 3 supports multiple data pipeline formats, including NumPy arrays, tf.data.Dataset, torch.utils.data.DataLoader, Pandas DataFrames, and keras.utils.PyDataset.
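For instance, a PyTorch DataLoader can be passed straight to model.fit() regardless of the active backend; a sketch with dummy data for illustration:

```python
import numpy as np
import torch

# Dummy data standing in for a real dataset.
x_train = np.random.rand(1000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

# Wrap the arrays in a PyTorch DataLoader and hand it to fit().
dataset = torch.utils.data.TensorDataset(
    torch.from_numpy(x_train), torch.from_numpy(y_train)
)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
model.fit(loader, epochs=5)  # `model` is the compiled model from above
```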
model.evaluate() calculates the loss and metrics on a test dataset, providing a measure of the model's performance on unseen data.
```python
test_loss, test_accuracy = model.evaluate(x_test, y_test)
```
model.predict() generates output predictions for new input data without computing loss or metrics.
```python
predictions = model.predict(new_data)
```
For more advanced use cases, users can override the train_step() method to customize the training logic while still using model.fit(). Alternatively, Keras components can be used in fully custom training loops written in the native syntax of JAX, TensorFlow, or PyTorch.
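A sketch of the train_step() pattern, assuming the TensorFlow backend (the GradientTape-based body follows the approach shown in the Keras documentation):

```python
import keras
import tensorflow as tf

class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)              # forward pass
            loss = self.compute_loss(y=y, y_pred=y_pred)
        # Backpropagate and apply the gradients.
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        # Update the metrics configured in compile().
        for metric in self.metrics:
            if metric.name == "loss":
                metric.update_state(loss)
            else:
                metric.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
```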
Callbacks are objects passed to model.fit() that can perform actions at various stages of training, such as at the start or end of an epoch, or before or after processing a batch. Keras includes several built-in callbacks for common tasks.
| Callback | Purpose |
|---|---|
| EarlyStopping | Stops training when a monitored metric (e.g., validation loss) has stopped improving for a specified number of epochs (patience). Can restore weights from the best epoch. |
| ModelCheckpoint | Saves the model or its weights periodically or whenever performance on a monitored metric improves. |
| ReduceLROnPlateau | Reduces the learning rate when a monitored metric has stopped improving, helping the model escape plateaus. |
| TensorBoard | Logs training metrics, model graphs, and histograms for visualization in TensorBoard. |
| LearningRateScheduler | Adjusts the learning rate according to a user-defined schedule function at each epoch. |
| CSVLogger | Streams epoch results (loss, metrics) to a CSV file. |
| ProgbarLogger | Displays a progress bar during training. |
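A typical combination of these built-in callbacks might look like the following; the checkpoint filename and patience values are placeholders:

```python
import keras

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                  restore_best_weights=True),
    keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True),
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]
# model.fit(x_train, y_train, validation_split=0.2, callbacks=callbacks)
```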
Users can also create custom callbacks by subclassing keras.callbacks.Callback and overriding methods like on_epoch_end, on_batch_begin, and on_train_end.
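A minimal sketch of such a custom callback (the class name and logged metric are illustrative):

```python
import keras

class ValLossLogger(keras.callbacks.Callback):
    """Prints the validation loss at the end of every epoch."""

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print(f"Epoch {epoch + 1}: val_loss = {logs.get('val_loss')}")
```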
Originally, the Keras ecosystem included two separate domain-specific libraries: KerasCV for computer vision and KerasNLP for natural language processing. As AI models increasingly became multimodal (for example, chat-based large language models with image inputs, or vision tasks that leverage text encoders), maintaining separate domain libraries became impractical.
In 2024, KerasCV and KerasNLP were consolidated into a single unified library called KerasHub. KerasHub is a pretrained modeling library that provides Keras 3 implementations of popular model architectures paired with pretrained checkpoints available on Kaggle Models. Models work across all backends (TensorFlow, JAX, PyTorch) for both training and inference.
KerasHub includes implementations of popular architectures such as BERT, GPT-2, Llama, Gemma, Mistral, Stable Diffusion, ResNet, and ViT.
Key features of KerasHub include LoRA fine-tuning for resource-efficient model adaptation, quantization for optimized performance, model publishing and sharing, and multi-host distributed training.
Existing code using keras_nlp imports continues to work; migration only requires updating import statements from keras_nlp to keras_hub.
KerasTuner is a hyperparameter tuning library for Keras that automates the process of finding optimal hyperparameter configurations. It supports several search algorithms, including random search, Bayesian optimization, and Hyperband.
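A hedged sketch of a KerasTuner random search over a single hyperparameter (the width of a Dense layer); the function name build_model is illustrative:

```python
import keras
import keras_tuner

def build_model(hp):
    # hp.Int samples an integer hyperparameter from the given range.
    model = keras.Sequential([
        keras.Input(shape=(784,)),
        keras.layers.Dense(
            hp.Int("units", min_value=32, max_value=256, step=32),
            activation="relu",
        ),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = keras_tuner.RandomSearch(build_model,
                                 objective="val_accuracy",
                                 max_trials=10)
# tuner.search(x_train, y_train, epochs=5, validation_split=0.2)
# best_model = tuner.get_best_models(num_models=1)[0]
```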
AutoKeras is an automated machine learning (AutoML) library built on Keras. It automatically searches for the best model architecture and hyperparameters for a given dataset, making deep learning accessible to users without extensive expertise in model design.
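A hedged sketch of AutoKeras's high-level API, with dummy data standing in for a real image dataset:

```python
import numpy as np
import autokeras as ak

# Dummy grayscale images and labels for illustration only.
x_train = np.random.rand(100, 28, 28, 1).astype("float32")
y_train = np.random.randint(0, 10, size=(100,))

# ImageClassifier searches over architectures and hyperparameters;
# max_trials bounds the number of candidate models tried.
clf = ak.ImageClassifier(max_trials=3)
clf.fit(x_train, y_train, epochs=10)
predictions = clf.predict(x_train)
```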
Keras 3 includes a distribution API (keras.distribution) that simplifies data parallelism and model parallelism. The API allows users to distribute training across multiple GPUs or TPUs with minimal code changes:
```python
distribution = keras.distribution.DataParallel(
    devices=keras.distribution.list_devices()
)
keras.distribution.set_distribution(distribution)
```
The distribution API keeps model definition, training logic, and sharding configuration entirely separate, making it easy to scale training without restructuring code.
Keras is used across a wide range of deep learning applications:
| Domain | Examples | Common Layers/Models |
|---|---|---|
| Image classification | Object detection, face recognition, medical imaging | Conv2D, ResNet, EfficientNet |
| Natural language processing | Text classification, sentiment analysis, machine translation | Embedding, LSTM, Transformer, BERT |
| Generative AI | Image synthesis, text generation, data augmentation | GANs, VAEs, Stable Diffusion |
| Speech and audio | Speech recognition, audio classification | Conv1D, Whisper |
| Time series | Forecasting, anomaly detection | LSTM, GRU, Conv1D |
| Reinforcement learning | Game playing, robot control | Dense, custom training loops |
Keras powers major production systems, including the Waymo self-driving fleet and the YouTube recommendation engine.
Keras and PyTorch are two of the most widely used frameworks for deep learning, but they take different approaches.
| Aspect | Keras | PyTorch |
|---|---|---|
| Abstraction level | High-level API | Lower-level framework |
| Ease of use | Very beginner-friendly; minimal boilerplate | Requires more code but feels Pythonic |
| Debugging | Relies on backend tools; can be less transparent | Excellent debugging with standard Python tools |
| Training loop | Built-in model.fit() handles most cases | Manual training loops offer full control |
| Research adoption | Common in applied ML and industry | Dominant in academic research |
| Cutting-edge models | Available through KerasHub | Most new state-of-the-art models appear first in PyTorch |
| Deployment | Strong TensorFlow ecosystem (TF Serving, TF Lite, TF.js) | TorchServe, ONNX export |
| Backend flexibility | Multi-backend (JAX, TF, PyTorch, OpenVINO) | PyTorch only |
| Performance | Can leverage JAX for best GPU/TPU performance | Optimized for GPU; strong CUDA support |
When to use Keras: Keras is best suited for rapid prototyping, educational purposes, and small to medium-scale production projects. It is also a strong choice when backend flexibility is important or when deploying to mobile and edge devices through TensorFlow Lite. Teams that want to use JAX's performance advantages without learning JAX's functional programming model can use Keras as a familiar interface.
When to use PyTorch: PyTorch is preferred for cutting-edge research, when you need fine-grained control over training dynamics, or when working with models that are primarily published in the PyTorch ecosystem. It is also the standard in most academic labs.
Hybrid approach: Many teams use both frameworks. Keras 3's multi-backend support means that a model written in Keras can run on PyTorch as its backend, bridging the gap between the two ecosystems.
Imagine you want to build something out of LEGO blocks. You could try to make every single tiny brick yourself from scratch, which would take forever. Or you could use a LEGO kit that already has all the special pieces sorted and labeled, with instructions showing you how to snap them together.
Keras is like that LEGO kit, but for building smart computer programs. Deep learning programs are made of building blocks called "layers" that each do a small job, like looking at pictures or reading words. Keras gives you all these building blocks pre-made, so you just pick the ones you need and snap them together in the right order.
Once you put your blocks together, you "train" your creation by showing it lots of examples (like thousands of pictures of cats and dogs) so it learns to tell them apart. Keras handles all the complicated math behind the scenes. You just say "learn from these examples" and it does the rest.
The best part is that Keras works with several different "engines" underneath (called backends), so it is like having one set of LEGO instructions that works with different brands of building blocks.