Summary

In TensorFlow, a summary is a piece of data written to disk during training so that it can later be visualized in TensorBoard. The tf.summary module is the API used to record those values. A scalar loss, the distribution of weights in a layer, a batch of generated images, a snippet of synthesized audio, or a block of model configuration text can all be turned into summaries and streamed to an event file as training proceeds. TensorBoard then reads that event file and renders the data as interactive charts, image grids, audio players, histograms, and graphs.

The word "summary" in this context is narrow and technical. It does not refer to text summarization or to a printout of a model with model.summary(). It refers to a serialized record, written by tf.summary.scalar, tf.summary.histogram, tf.summary.image, tf.summary.audio, tf.summary.text, or related functions, that TensorBoard knows how to display.

Purpose

Training a neural network can take hours or days, and most of the interesting behavior lives inside the loop: how the loss curves bend, whether gradients explode, what the model is generating on the validation set, how learning rates decay. Reading numbers from print calls is fine for toy problems and miserable for anything else. Summaries solve this by giving you a structured, time stamped log that a separate visualization tool can render while the run is still in progress.

A few of the things teams use tf.summary for in practice:

Tracking training and validation loss, accuracy, and other scalar metrics over steps or epochs.
Watching weight, bias, gradient, and activation distributions evolve, which makes vanishing or exploding gradients visible early.
Inspecting predicted images from a generative model side by side with real samples after each epoch.
Logging audio samples produced by a text to speech or music model so a human can listen to progress.
Recording model configuration, hyperparameters, and run notes as markdown so the run is self documenting.
Profiling computation graphs and op level execution to find slow kernels and device transfers.
Visualising attention maps, segmentation masks, or feature heatmaps alongside the underlying input so model behaviour can be inspected qualitatively.
Storing 3D point clouds, meshes, and embedding projections that go beyond what a single line chart can communicate.

The broader pattern that tf.summary belongs to is sometimes called experiment instrumentation. The training script emits structured events, and a separate viewer process reads them. Coupling is loose: the writer never has to know about the dashboard, and the dashboard never has to interrupt training. That separation is what lets the workflow scale to long jobs on remote machines.

Core API

The tf.summary module in TensorFlow 2 exposes a small set of writer functions. Each one accepts a name, the data to record, and an integer step that places the value in time. The exact signature of the scalar writer is tf.summary.scalar(name, data, step=None, description=None), and the other writers follow the same shape.

Function	Signature	What it writes
`tf.summary.scalar`	`(name, data, step=None, description=None)`	A single floating point number per call. Used for loss, accuracy, learning rate, gradient norm.
`tf.summary.histogram`	`(name, data, step=None, buckets=None, description=None)`	A tensor binned into a histogram for distribution analysis of weights, biases, gradients, activations.
`tf.summary.image`	`(name, data, step=None, max_outputs=3, description=None)`	One or more images, shaped `[k, h, w, c]` with `c` of 1, 3, or 4.
`tf.summary.audio`	`(name, data, sample_rate, step=None, max_outputs=3, encoding=None, description=None)`	One or more audio clips shaped `[k, t, c]` with float values in `[-1, 1]`, plus a sample rate in Hz. `encoding` currently supports only `wav`.
`tf.summary.text`	`(name, data, step=None, description=None)`	A string or string tensor, rendered as markdown in TensorBoard.
`tf.summary.write`	`(tag, tensor, step=None, metadata=None, name=None)`	Low level escape hatch for writing an arbitrary tensor with optional SummaryMetadata.
`tf.summary.create_file_writer`	`(logdir, max_queue=None, flush_millis=None, filename_suffix=None, name=None, experimental_trackable=False)`	Returns a `SummaryWriter` bound to a log directory.
`tf.summary.create_noop_writer`	`()`	Returns a writer that drops everything, useful for disabling logging from a single replica.
`tf.summary.flush`	`(writer=None, name=None)`	Forces buffered events to disk.
`tf.summary.record_if`	`(condition)`	Context manager that gates whether subsequent ops record.
`tf.summary.should_record_summaries`	`()`	Returns the current value of the recording condition as a `tf.bool` tensor.
`tf.summary.trace_on` / `trace_off` / `trace_export`	`trace_on(graph=True, profiler=False)` and friends	Captures the computational graph and profiling trace from a single `tf.function` call.

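Each writer function returns a scalar boolean tensor. The value is True if the summary was actually recorded and False otherwise, for example when no default writer is active or a record_if condition is false. In eager mode you can read it with bool(...); inside a tf.function it is an ordinary graph tensor. The boolean lets you wire conditional logic into a training loop without checking the active writer state by hand.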
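Here is a minimal sketch of both behaviours in eager mode, assuming TensorFlow 2. The first call runs with no active writer and reports False. The loop then gates recording with tf.summary.record_if so that only every tenth step is written. The temporary log directory and the every-10-steps schedule are arbitrary choices for illustration.

```python
import tempfile

import tensorflow as tf

# No default writer is active yet, so nothing is recorded.
no_writer_result = bool(tf.summary.scalar("loss", 0.5, step=0))

logdir = tempfile.mkdtemp()  # throwaway directory for this sketch
writer = tf.summary.create_file_writer(logdir)
step = tf.Variable(0, dtype=tf.int64)

recorded = []
# record_if accepts a bool, a bool tensor, or a callable returning one;
# a callable is re-evaluated on every summary call.
with writer.as_default(), tf.summary.record_if(lambda: step % 10 == 0):
    for _ in range(25):
        recorded.append(bool(tf.summary.scalar("loss", 0.5, step=step)))
        step.assign_add(1)
writer.flush()

print(no_writer_result)                             # False
print([i for i, ok in enumerate(recorded) if ok])   # [0, 10, 20]
```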

There are also helpers under tf.summary.experimental for setting a default step so that you do not have to thread it through every call, plus tf.summary.experimental.write_raw_pb for shipping a pre-built protobuf blob.

Implicit step handling

Threading a step argument through every summary call gets noisy. The experimental step helpers exist to handle that:

tf.summary.experimental.set_step(step)

Once set, subsequent calls to tf.summary.scalar, histogram, image, and friends use the stored step whenever step is left as None. Inside a tf.function, the stored step is captured when the function is traced, so a plain Python int stays frozen at its trace-time value. Pass a tf.Variable (for example optimizer.iterations) if the step should keep advancing between calls without retracing.
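Here is a sketch of an implicit step inside a tf.function, assuming TensorFlow 2. It stores a tf.Variable as the default step, so the value advances between calls of the already traced function. log_loss is a hypothetical helper name, and the log directory is a throwaway temp directory.

```python
import tempfile

import tensorflow as tf

logdir = tempfile.mkdtemp()  # throwaway directory for this sketch
writer = tf.summary.create_file_writer(logdir)

# A tf.Variable step is read on every call; a Python int would be
# baked into the graph when the function is first traced.
global_step = tf.Variable(0, dtype=tf.int64)
tf.summary.experimental.set_step(global_step)

@tf.function
def log_loss(loss):
    with writer.as_default():
        # step is omitted, so the default set above is used.
        return tf.summary.scalar("loss", loss)

results = []
for value in [0.9, 0.7, 0.4]:
    results.append(bool(log_loss(tf.constant(value))))
    global_step.assign_add(1)
writer.flush()
print(results, int(global_step))  # [True, True, True] 3
```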

Writers and event files

Nothing is written to disk until you create a SummaryWriter. tf.summary.create_file_writer(logdir) returns a writer that opens an append only tfevents file inside logdir. Summary ops do not take a writer argument. Instead, they write to whichever writer is currently the default, which you set with the as_default() context manager (or writer.set_as_default()). The pattern looks like this:

import tensorflow as tf

# model, num_steps, train_one_batch() and generate_samples() stand for
# objects defined elsewhere in the training script.
writer = tf.summary.create_file_writer("logs/run-01")

with writer.as_default():
    for step in range(num_steps):
        loss = train_one_batch()
        tf.summary.scalar("loss", loss, step=step)  # cheap, so every step
        if step % 100 == 0:  # heavier payloads on a slower schedule
            tf.summary.histogram("layer1/weights", model.layers[0].kernel, step=step)
            tf.summary.image("samples", generate_samples(), step=step, max_outputs=4)
writer.flush()

The writer batches events in memory and flushes them to the file periodically. The default flush interval is 120 seconds, controlled by the flush_millis argument. The max_queue argument controls how many events can pile up in memory between flushes (default 10), and filename_suffix lets you add a tag to the event file name so multiple runs in the same directory can be distinguished by inspection. Calling writer.flush() or tf.summary.flush() forces an immediate write, which is useful at the end of training, around checkpoints, or before exiting after an exception.

Different runs should go to different log directories so that TensorBoard can show them as separate experiments and let you toggle them on and off. A common convention is logs/<experiment-name>/<timestamp>. Splitting training and validation into sibling directories such as logs/run-01/train and logs/run-01/val lets TensorBoard overlay the two curves on the same chart.
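One way to apply that convention is sketched below under a few assumptions. make_run_writers is a hypothetical helper that builds logs/<experiment-name>/<timestamp> with sibling train and val writers. The flush_millis value of 10 seconds is an arbitrary choice that trades a little extra I/O for fresher dashboards.

```python
import datetime
import os
import tempfile

import tensorflow as tf

def make_run_writers(root, experiment):
    """Return (run_dir, train_writer, val_writer) under root/experiment/<timestamp>/."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    run_dir = os.path.join(root, experiment, stamp)
    # Flush every 10 s instead of the 120 s default (arbitrary for this sketch).
    train = tf.summary.create_file_writer(os.path.join(run_dir, "train"), flush_millis=10_000)
    val = tf.summary.create_file_writer(os.path.join(run_dir, "val"), flush_millis=10_000)
    return run_dir, train, val

root = tempfile.mkdtemp()  # stands in for "logs"
run_dir, train_writer, val_writer = make_run_writers(root, "baseline")
with train_writer.as_default():
    tf.summary.scalar("loss", 0.8, step=0)
with val_writer.as_default():
    tf.summary.scalar("loss", 0.9, step=0)
for writer in (train_writer, val_writer):
    writer.flush()
print(sorted(os.listdir(run_dir)))  # ['train', 'val']
```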

Event file format on disk

The file written by create_file_writer uses TensorFlow's TFRecord framing. Each record is a length prefixed, CRC checked, serialized Event protobuf, defined in tensorflow/core/util/event.proto. The first record in every file is a file_version event with the value "brain.Event:2". Each subsequent record carries a wall time and a step plus one payload. The payload can be a Summary protobuf with one or more Value entries, a serialized GraphDef, a LogMessage, a SessionLog, or a TaggedRunMetadata for profiling.
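The framing can be read with the standard library alone. This sketch assumes the usual TFRecord layout:

- an 8-byte little-endian payload length,
- a 4-byte masked CRC32C of that length,
- the payload,
- a 4-byte masked CRC32C of the payload.

It skips CRC verification. iter_records and frame are hypothetical helpers, and the payloads below are stand-ins rather than real serialized Event protos.

```python
import io
import struct

def iter_records(stream):
    """Yield raw record payloads from a TFRecord-framed binary stream.

    CRC fields are skipped, not verified, in this sketch.
    """
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated length header")
        (length,) = struct.unpack("<Q", header)
        stream.read(4)  # masked CRC32C of the length (ignored here)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        stream.read(4)  # masked CRC32C of the payload (ignored here)
        yield payload

def frame(payload):
    """Frame a payload with zeroed CRC fields; only for exercising the reader."""
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4

# Stand-in payloads, not real serialized Event protos.
fake_file = io.BytesIO(frame(b"brain.Event:2") + frame(b"second record"))
records = list(iter_records(fake_file))
print(len(records), records[0])  # 2 b'brain.Event:2'
```

On a real file, the same generator works on open(path, "rb"). Decoding each payload into an Event still needs the generated protobuf class, which TensorFlow ships as tensorflow.core.util.event_pb2.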

The filename itself follows the pattern events.out.tfevents.<unix-seconds>.<host>.<pid>.<id>.v2 in recent releases. The trailing .v2 is the default filename_suffix that create_file_writer applies, not a marker of a different record format. The pieces are useful for ad hoc debugging: the unix timestamp makes it easy to sort runs by start time, and the hostname and process id identify which machine and process wrote the file in a distributed job.

Files are append only. The writer does not rewrite earlier events, which is what allows TensorBoard to tail a file safely while training is still running. The drawback is that a long run with very high frequency logging can produce gigabyte sized event files, so most teams cap the sampling rate of heavyweight payloads such as images and audio.

Reading events programmatically

The same event files can be parsed without launching TensorBoard. The tf.compat.v1.train.summary_iterator(path) generator walks a file and yields Event protos, which is enough to extract scalars, images, and tensors into pandas frames or numpy arrays. TensorBoard 2.3 added tensorboard.data.experimental.ExperimentFromDev. It returned the scalars of an experiment uploaded to TensorBoard.dev as a pandas DataFrame with run, tag, step, and value columns, and pivot=True gave a wide form frame with tags as columns. That API only worked against TensorBoard.dev, which has since been shut down. Local logs are read with summary_iterator or with TensorBoard's own EventAccumulator class in tensorboard.backend.event_processing.event_accumulator.
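Here is a minimal sketch of the summary_iterator route, assuming TensorFlow 2 writers. TF 2 stores scalars in each value's tensor field rather than the TF 1 simple_value field, so the sketch handles both. read_scalars is a hypothetical helper that returns {tag: [(step, value), ...]}.

```python
import glob
import os
import tempfile

import tensorflow as tf

def read_scalars(logdir):
    """Collect every scalar in logdir's event files as {tag: [(step, value), ...]}."""
    out = {}
    for path in sorted(glob.glob(os.path.join(logdir, "events.out.tfevents.*"))):
        for event in tf.compat.v1.train.summary_iterator(path):
            for value in event.summary.value:
                if value.HasField("tensor"):  # TF 2 style
                    array = tf.make_ndarray(value.tensor)
                    if array.size != 1 or array.dtype.kind != "f":
                        continue  # skip histograms, images, text, ...
                    number = float(array)
                elif value.HasField("simple_value"):  # TF 1 style
                    number = value.simple_value
                else:
                    continue
                out.setdefault(value.tag, []).append((event.step, number))
    return out

logdir = tempfile.mkdtemp()  # throwaway directory for this sketch
writer = tf.summary.create_file_writer(logdir)
with writer.as_default():
    for step in range(3):
        tf.summary.scalar("loss", 1.0 / (step + 1), step=step)
writer.flush()
scalars = read_scalars(logdir)
print([s for s, _ in scalars["loss"]])  # [0, 1, 2]
```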

TF 1 versus TF 2

tf.summary was redesigned for TensorFlow 2. In TensorFlow 1, summary ops produced protocol buffer tensors that had to be fetched through Session.run, aggregated with tf.summary.merge_all, and written manually with a separate FileWriter. That two stage flow assumed a static graph and did not fit eager execution. In TensorFlow 2, the writer is part of the execution context, summary ops write directly when they run, and the global step is passed explicitly to every call instead of being managed by a hidden collection.

The second positional argument was also renamed from tensor to data, and the collections and family keyword arguments were removed. Old TensorFlow 1 code can still run by importing tf.compat.v1.summary, but the official guidance is to migrate. Code written for TensorFlow 2 (released in 2019), whether it uses Keras or a custom training loop, generally follows the new style.

The table below summarises the practical differences a code reviewer will run into when porting a script.

Aspect	TF 1.x	TF 2.x
Writer class	`tf.summary.FileWriter`	`tf.summary.SummaryWriter` via `create_file_writer`
Activation	Manual `FileWriter.add_summary(buffer, step)`	Implicit via `with writer.as_default():` context
Aggregation	`tf.summary.merge_all()` over graph collections	Removed, no longer needed
Second positional arg	`tensor`	`data`
Global step	Implicit via `tf.train.get_or_create_global_step`	Explicit `step` argument on each call
Conditional logging	`if condition: ...` outside the graph	`with tf.summary.record_if(condition):`
Return value of op	A serialized `Summary` proto tensor	`True` if recorded, `False` otherwise
Removed keywords	n/a	`family`, `collections` deleted
Backwards compatibility	n/a	`tf.compat.v1.summary` retains the old API

TensorFlow shipped an automated migration script, tf_upgrade_v2, that rewrites a v1 codebase to either the v2 API or the tf.compat.v1.summary shim. Most non trivial training scripts still need a manual pass because the v1 pattern of fetching merged summary ops through a session does not map one to one onto the v2 writer model.

Using summaries from Keras

If you train with model.fit, the easiest way to get summaries is the built in tf.keras.callbacks.TensorBoard callback. Adding it to the callback list opens a writer and records the loss and metrics for each epoch. It can optionally log weight histograms with histogram_freq and capture profiling data and embedding projections.

tb = tf.keras.callbacks.TensorBoard(
    log_dir="logs/fit",
    histogram_freq=1,
    write_graph=True,
    profile_batch="500,520",
)
model.fit(x_train, y_train, epochs=10, callbacks=[tb])

The callback writes training events to a train subdirectory of log_dir. When validation data is supplied, it writes validation events to a validation subdirectory. To add custom metrics, image grids, or other ad hoc summaries, write a custom callback that opens a writer in on_train_begin and calls tf.summary.scalar or tf.summary.image in on_epoch_end.

Callback parameters in full

The Keras callback exposes more knobs than the typical example shows. The table below lists its parameters and their defaults in recent TensorFlow 2 releases. Some defaults, notably profile_batch, have changed between versions.

Parameter	Default	Effect
`log_dir`	`'logs'`	Directory where event files are written. Pass a timestamped subdirectory per run.
`histogram_freq`	`0`	Compute weight histograms for the model's layers every N epochs. `0` disables them.
`write_graph`	`True`	Write the model graph to the event file so it appears in the Graphs dashboard. The event file can grow large when this is enabled.
`write_images`	`False`	Render model weights as image tiles. Useful for visualising convolutional filters.
`write_steps_per_second`	`False`	Emit a scalar of training throughput in steps per second.
`update_freq`	`'epoch'`	Either `'epoch'`, `'batch'`, or an integer batch count. Controls how often scalars are written.
`profile_batch`	`0`	Batch or range of batches to profile, for example `'500,520'`. `0` disables profiling. Older releases defaulted to `2`.
`embeddings_freq`	`0`	Frequency (in epochs) at which embedding layers are exported for the Projector dashboard.
`embeddings_metadata`	`None`	Optional mapping from embedding layer name to a metadata file (usually a TSV of labels).

For custom logging alongside the built in callback, a common pattern is to call tf.summary.create_file_writer once, store the writer on self, and emit values from inside on_epoch_end or on_batch_end. Point the custom writer at its own subdirectory of the callback's log_dir (for example log_dir/lr). That keeps its event file separate, and TensorBoard shows it as an additional run next to train and validation.
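Here is a sketch of that pattern, assuming TensorFlow 2 with its bundled Keras. LearningRateLogger is a hypothetical callback that logs the optimizer's learning rate into a sibling lr subdirectory. The tiny model and random data exist only to drive fit for a couple of epochs.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

class LearningRateLogger(tf.keras.callbacks.Callback):
    """Hypothetical callback: logs the optimizer's learning rate once per epoch."""

    def __init__(self, logdir):
        super().__init__()
        self.logdir = logdir
        self.writer = None

    def on_train_begin(self, logs=None):
        self.writer = tf.summary.create_file_writer(self.logdir)

    def on_epoch_end(self, epoch, logs=None):
        lr = float(self.model.optimizer.learning_rate.numpy())
        with self.writer.as_default():
            tf.summary.scalar("learning_rate", lr, step=epoch)

    def on_train_end(self, logs=None):
        self.writer.flush()

log_dir = tempfile.mkdtemp()  # stands in for a per-run log directory
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")
x, y = np.random.rand(32, 4), np.random.rand(32, 1)
model.fit(
    x, y, epochs=2, verbose=0,
    callbacks=[
        tf.keras.callbacks.TensorBoard(log_dir=log_dir),       # writes log_dir/train
        LearningRateLogger(os.path.join(log_dir, "lr")),        # writes log_dir/lr
    ],
)
print(sorted(os.listdir(log_dir)))
```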

A custom training loop with gradient tape

Outside Keras, a typical training loop with tf.GradientTape looks like the snippet below. The pattern is the same as before: open a writer, enter its default context, log scalars per step and heavier payloads on a slower schedule.

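import tensorflow as tf

# model, loss_fn, optimizer, train_ds, val_ds, epochs and evaluate() are assumed
# to be defined elsewhere; evaluate() returns a scalar validation loss.
train_writer = tf.summary.create_file_writer("logs/run-01/train")
val_writer = tf.summary.create_file_writer("logs/run-01/val")

for epoch in range(epochs):
    for x, y in train_ds:
        with tf.GradientTape() as tape:
            preds = model(x, training=True)
            loss = loss_fn(y, preds)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        # Per-batch training loss, stepped by the optimizer's batch counter.
        with train_writer.as_default():
            tf.summary.scalar("loss", loss, step=optimizer.iterations)

    # Same tag and same step axis, so both curves line up on one chart.
    val_loss = evaluate(model, val_ds)
    with val_writer.as_default():
        tf.summary.scalar("loss", val_loss, step=optimizer.iterations)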

TensorBoard overlays training and validation loss on one chart automatically when the two writers live in sibling directories and log under the same tag and step axis.

TensorBoard

TensorBoard is the companion tool that reads event files and renders them. Starting it is a one line command:

tensorboard --logdir logs

It then serves a local web app, by default on port 6006, that scans the directory tree for tfevents files and groups them by run. The dashboards mirror the writer functions: scalars and time series for tf.summary.scalar, distributions and histograms for tf.summary.histogram, an image gallery for tf.summary.image, an audio player for tf.summary.audio, a markdown view for tf.summary.text, a graph viewer for traced tf.function calls, and a profiler for runs that captured profile data.

TensorBoard re-reads the log directory periodically, so summaries written during a long training run appear in the UI shortly after the writer flushes them to disk. Multiple runs in the same parent directory are rendered as overlaid lines on every scalar chart, which is how you compare hyperparameter sweeps visually.

The table below summarises the dashboards a vanilla TensorBoard install exposes.

Dashboard	Backed by	What you see
Time Series / Scalars	`tf.summary.scalar`	Line charts of metric vs step or epoch, with per run colours.
Histograms	`tf.summary.histogram`	Overlapping distribution curves stacked along the step axis.
Distributions	`tf.summary.histogram`	Percentile bands of the same data, easier to read than raw histograms.
Images	`tf.summary.image`	Grid of recorded image tensors with a step slider.
Audio	`tf.summary.audio`	Embedded HTML audio players.
Text	`tf.summary.text`	Rendered markdown blocks per step.
Graphs	`tf.summary.trace_on` / `trace_export`, Keras `write_graph`	Op graph viewer with namespace collapse.
Projector	`embeddings_freq` callback or `projector.visualize_embeddings`	2D / 3D embedding viewer with PCA, t-SNE, UMAP.
HParams	`hp.hparams_config`, `hp.hparams`	Table, parallel coordinates, and scatter views of hyperparameter sweeps.
Profile	`tf.summary.trace_on(profiler=True)` or Keras `profile_batch`	Step time graph, trace viewer, input pipeline analyzer.
Mesh	`summary_v2.mesh` (TF 2) or `summary.op` (TF 1) from `tensorboard.plugins.mesh`	Interactive 3D point clouds and triangle meshes.

For Jupyter and Colab users, TensorBoard exposes line magics: %load_ext tensorboard followed by %tensorboard --logdir logs opens the dashboard inside a notebook cell.

Tracing graphs

The Graphs dashboard is not populated automatically for code that runs inside tf.function. To record a function's graph, trace it explicitly:

@tf.function
def step(x, y):
    return loss_fn(y, model(x))

tf.summary.trace_on(graph=True, profiler=True)
step(sample_x, sample_y)
with writer.as_default():
    tf.summary.trace_export(name="step", step=0, profiler_outdir="logs/profile")

One constraint to be aware of: ideally exactly one tf.function call happens between trace_on and trace_export. Otherwise the exported graph is a mash-up of every function that ran in between. trace_export stops the trace as it exports it. trace_off stops a trace without exporting anything, which is useful for discarding one. A simple way to keep traces clean is to do tracing in a one off script outside the main training loop.

Hyperparameter sweeps and the hparams plugin

The tensorboard.plugins.hparams.api module sits on top of tf.summary and adds first class support for hyperparameter sweeps. It defines three building blocks: HParam for a tunable parameter, domain helpers such as Discrete, RealInterval, and IntInterval, and Metric for an outcome to optimise. A sweep looks like this:

from tensorboard.plugins.hparams import api as hp

HP_UNITS = hp.HParam("num_units", hp.Discrete([16, 32]))
HP_LR = hp.HParam("learning_rate", hp.RealInterval(1e-4, 1e-2))
METRIC_ACC = "accuracy"

with tf.summary.create_file_writer("logs/sweep").as_default():
    hp.hparams_config(
        hparams=[HP_UNITS, HP_LR],
        metrics=[hp.Metric(METRIC_ACC, display_name="Accuracy")],
    )

def trial(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)
        acc = train_eval(hparams)
        tf.summary.scalar(METRIC_ACC, acc, step=1)

TensorBoard then renders the sweep in three views: a sortable table, a parallel coordinates plot for spotting clusters, and a scatter view for correlating any hyperparameter with any metric.
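The snippet above defines a trial but not the loop that drives it. The sketch below runs a small grid sweep, assuming each trial writes to its own subdirectory of the sweep directory. It iterates over the Discrete domain values and the two RealInterval endpoints. train_eval is a hypothetical stand-in that returns a placeholder score instead of training anything.

```python
import os
import tempfile

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

HP_UNITS = hp.HParam("num_units", hp.Discrete([16, 32]))
HP_LR = hp.HParam("learning_rate", hp.RealInterval(1e-4, 1e-2))
METRIC_ACC = "accuracy"

def train_eval(hparams):
    # Hypothetical stand-in for real training; returns a placeholder score.
    return 0.0

def trial(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record this trial's hyperparameter values
        tf.summary.scalar(METRIC_ACC, train_eval(hparams), step=1)

root = tempfile.mkdtemp()  # stands in for "logs/sweep"
with tf.summary.create_file_writer(root).as_default():
    hp.hparams_config(
        hparams=[HP_UNITS, HP_LR],
        metrics=[hp.Metric(METRIC_ACC, display_name="Accuracy")],
    )

# Grid over the discrete values and the two interval endpoints.
run_dirs = []
for units in HP_UNITS.domain.values:
    for lr in (HP_LR.domain.min_value, HP_LR.domain.max_value):
        run_dir = os.path.join(root, f"run-{len(run_dirs)}")
        trial(run_dir, {HP_UNITS: units, HP_LR: lr})
        run_dirs.append(run_dir)
print(len(run_dirs))  # 4
```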

Profiler integration

The TensorFlow Profiler is itself a TensorBoard plugin. Its traces are written into the same log directory as the summaries, under plugins/profile, rather than into the tfevents records. There are two main entry points. The Keras callback's profile_batch argument captures a range of batches automatically. The lower level tf.summary.trace_on(profiler=True) and tf.summary.trace_export(profiler_outdir=...) API captures a single trace from a tf.function call.

The Profile tab adds several focused dashboards on top of the data:

Overview Page, with a step time breakdown into compute, input, and idle time. The header bar flags whether the step is compute bound or input bound.
Trace Viewer, a Chrome trace style timeline showing host CPU threads, device streams, kernels, and memory copies. Keyboard shortcuts W and S zoom, A and D pan, M measures intervals.
Input Pipeline Analyzer, which highlights tf.data stages that are bottlenecks.
TensorFlow Stats, an op by op table of cumulative time and self time.
Memory Profile, a timeline of allocator activity per device.

The canonical use of the profiler is to confirm whether a slow training loop is GPU bound, CPU bound, or input bound. Adding cache() and prefetch() to the tf.data pipeline is the typical fix once the trace viewer shows long idle gaps on the device stream.
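Here is a sketch of that fix in a tf.data pipeline, under a few assumptions. make_pipeline is a hypothetical helper, and the division by 255 is a placeholder preprocessing step. tf.data.AUTOTUNE lets the runtime pick parallelism and prefetch depth. Caching after the map means the preprocessing runs only in the first epoch.

```python
import tensorflow as tf

def make_pipeline(features, labels, batch_size=32):
    """Input pipeline with the cache()/prefetch() fix applied."""
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    ds = ds.map(lambda x, y: (tf.cast(x, tf.float32) / 255.0, y),
                num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.cache()  # reuse preprocessed elements after the first epoch
    ds = ds.shuffle(1024).batch(batch_size)
    return ds.prefetch(tf.data.AUTOTUNE)  # overlap input work with the train step

features = tf.random.uniform((100, 8), maxval=256, dtype=tf.int32)
labels = tf.zeros((100,), dtype=tf.int32)
batches = list(make_pipeline(features, labels))
print(len(batches), tuple(batches[0][0].shape))  # 4 (32, 8)
```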

Embeddings projector

The Embeddings Projector visualises high dimensional vectors in 2D or 3D. There are two ways to feed it. The first is the Keras callback's embeddings_freq and embeddings_metadata parameters, which export the weights of any tf.keras.layers.Embedding layer along with an optional TSV of labels. The lower level path has three steps:

1. Save the embedding tensor in a checkpoint inside the log directory.
2. Describe it in a projector.ProjectorConfig.
3. Call projector.visualize_embeddings(logdir, config) from tensorboard.plugins.projector, which writes a projector_config.pbtxt next to the event files.

Either way, the projector offers PCA, t-SNE, and UMAP projections, a search box for label lookup, and a nearest neighbour view that highlights similar vectors. It is especially useful for inspecting learned word embeddings, contrastive learning representations, and the latent space of an autoencoder.
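Here is a sketch of the lower level path, following the pattern in TensorBoard's projector guide. The words, the random vectors, and the file names are illustrative assumptions. The tensor_name is the key under which tf.train.Checkpoint saves a variable attached as embedding.

```python
import os
import tempfile

import tensorflow as tf
from tensorboard.plugins import projector

log_dir = tempfile.mkdtemp()
words = ["cat", "dog", "car"]                  # hypothetical labels
vectors = tf.random.normal((len(words), 16))   # stand-in for learned embeddings

# One label per line, in the same order as the embedding rows.
with open(os.path.join(log_dir, "metadata.tsv"), "w") as f:
    for word in words:
        f.write(f"{word}\n")

# Save the tensor in a checkpoint the projector can read.
embedding_var = tf.Variable(vectors)
checkpoint = tf.train.Checkpoint(embedding=embedding_var)
checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"  # checkpoint key
embedding.metadata_path = "metadata.tsv"
projector.visualize_embeddings(log_dir, config)  # writes projector_config.pbtxt

print(os.path.exists(os.path.join(log_dir, "projector_config.pbtxt")))  # True
```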

Mesh and 3D summaries

The mesh plugin ships with TensorBoard and accepts 3D data as tensors of vertex coordinates, optional per vertex colours, and optional triangle face indices. In TensorFlow 2 code, the summary is created with mesh from tensorboard.plugins.mesh.summary_v2. The older mesh_summary.op in tensorboard.plugins.mesh.summary targets TensorFlow 1 graphs. Either way, the dashboard interprets the payload as a renderable scene. A config_dict argument is forwarded to THREE.js so the viewer can be customised with camera, lighting, and material settings. The plugin is useful in 3D reconstruction and point cloud segmentation work, where comparing a predicted mesh to the ground truth is far more informative than any scalar metric.

Distributed training caveats

Distributed training adds wrinkles that the single GPU examples do not surface. The first is per replica writers. Under tf.distribute.MirroredStrategy, every replica runs the model code. By default, recent TensorFlow 2 releases record summaries issued inside strategy.run only from the first replica (replica_id_in_sync_group == 0). Logging code outside that mechanism can still write N copies of the same value at every step, which double counts in TensorBoard. Across processes, the usual pattern is to give non chief workers tf.summary.create_noop_writer() so that only the chief writes. The Keras callback handles this automatically.

The second is summary calls inside tf.function. The default writer is read when the function is traced, not on every call. A function first traced outside any as_default() block therefore keeps recording nothing, even if a writer is activated later. Calling with writer.as_default() inside the function is safer than relying on an outer context. tf.summary.experimental.set_step with a tf.Variable makes an advancing step available inside the trace.

The third is TPU constraints. TPU runs can surface summaries later than GPU runs because of how the host coordinates with the device. Explicit writer.flush() calls at the end of each epoch help scalars reach TensorBoard promptly instead of sitting in the writer's buffer.

The fourth is multi worker logging. With MultiWorkerMirroredStrategy, each worker is a separate process, usually writing to a shared filesystem. Two processes should never append to the same event file. The standard advice is to let only the chief write, or to give each worker its own subdirectory keyed on its task index, which TensorBoard then shows as a separate run.
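Here is a sketch of chief-only writing with per worker subdirectories. It assumes the cluster is described by the TF_CONFIG environment variable that TensorFlow's multi worker strategies read. make_worker_writer and task_info are hypothetical helpers. The chief rule (an explicit chief task, otherwise worker 0) mirrors the usual convention, and no strategy is created here.

```python
import json
import os
import tempfile

import tensorflow as tf

def task_info():
    """Return (task_type, task_index, has_chief) from TF_CONFIG; single-process default."""
    config = json.loads(os.environ.get("TF_CONFIG", "{}"))
    task = config.get("task", {})
    has_chief = "chief" in config.get("cluster", {})
    return task.get("type", "worker"), int(task.get("index", 0)), has_chief

def make_worker_writer(logdir, chief_only=True):
    task_type, task_index, has_chief = task_info()
    is_chief = task_type == "chief" or (
        not has_chief and task_type == "worker" and task_index == 0)
    if chief_only and not is_chief:
        return tf.summary.create_noop_writer()  # silently drop this worker's summaries
    # Each writing worker gets its own subdirectory, never a shared event file.
    return tf.summary.create_file_writer(os.path.join(logdir, f"{task_type}_{task_index}"))

# Simulate being worker 1 of a two-worker cluster without a chief task.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host0:2222", "host1:2222"]},
    "task": {"type": "worker", "index": 1},
})
logdir = tempfile.mkdtemp()
with make_worker_writer(logdir).as_default():
    wrote = bool(tf.summary.scalar("loss", 0.5, step=0))
print(wrote)  # False: worker 1 is not the chief
```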

Using summaries from Keras callbacks for ad hoc metrics

Not every metric fits neatly into compile(metrics=[...]). Confusion matrices, per class precision and recall curves, and qualitative samples often need a custom callback. The pattern is the same as before. Open a writer in on_train_begin, write scalars or images in on_epoch_end, and flush in on_train_end.

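import tensorflow as tf

class SampleImageCallback(tf.keras.callbacks.Callback):
    """Logs the model's outputs on a fixed batch as images after every epoch."""

    def __init__(self, logdir, sample_inputs):
        super().__init__()  # let Keras initialise the base callback state
        self.logdir = logdir
        self.sample_inputs = sample_inputs
        self.writer = None

    def on_train_begin(self, logs=None):
        self.writer = tf.summary.create_file_writer(self.logdir)

    def on_epoch_end(self, epoch, logs=None):
        # Assumes the model outputs images shaped [k, h, w, c] with values in [0, 1].
        preds = self.model(self.sample_inputs, training=False)
        with self.writer.as_default():
            tf.summary.image("preds", preds, step=epoch, max_outputs=4)

    def on_train_end(self, logs=None):
        self.writer.flush()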

This is also the recommended way to log matplotlib figures. The standard helper from the documentation renders a figure to an in memory PNG, decodes it with tf.image.decode_png, and adds a batch dimension before calling tf.summary.image. Confusion matrices, ROC curves, and learning rate schedules are typically delivered this way.
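Here is a sketch of such a helper, modelled on the pattern just described. plot_to_image is a hypothetical name, and the Agg backend is an assumption for headless training machines. The figure is closed after rendering so that logging every epoch does not accumulate open figures.

```python
import io
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for machines without a display
import matplotlib.pyplot as plt
import tensorflow as tf

def plot_to_image(figure):
    """Render a matplotlib figure into a [1, h, w, 4] uint8 tensor for tf.summary.image."""
    buf = io.BytesIO()
    figure.savefig(buf, format="png")
    plt.close(figure)  # free the figure after rendering
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    return tf.expand_dims(image, 0)  # add the batch dimension

fig, ax = plt.subplots(figsize=(3, 2))
ax.plot([0, 1, 2], [0.9, 0.5, 0.3])
ax.set_title("loss")
image = plot_to_image(fig)

writer = tf.summary.create_file_writer(tempfile.mkdtemp())
with writer.as_default():
    logged = bool(tf.summary.image("loss_curve", image, step=0))
writer.flush()
```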

PyTorch and the same idea

The same logging pattern exists outside TensorFlow. PyTorch ships torch.utils.tensorboard.SummaryWriter, which writes the same tfevents format that TensorBoard understands. The method names are slightly different but the mental model matches:

PyTorch method	TensorFlow analogue
`add_scalar`	`tf.summary.scalar`
`add_scalars`	Multiple `tf.summary.scalar` calls under one tag
`add_histogram`	`tf.summary.histogram`
`add_image`, `add_images`	`tf.summary.image`
`add_audio`	`tf.summary.audio`
`add_text`	`tf.summary.text`
`add_graph`	`tf.summary.trace_on` and `trace_export`
`add_embedding`	Embedding projector (`tensorboard.plugins.projector`)
`add_pr_curve`, `add_hparams`, `add_mesh`, `add_figure`, `add_video`	TensorBoard plugins

A typical PyTorch training loop with logging looks like this:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("logs/run-01")
for step, (x, y) in enumerate(loader):
    loss = train_step(x, y)
    writer.add_scalar("loss", loss.item(), step)
    if step % 100 == 0:
        writer.add_histogram("layer1/weights", model.layer1.weight, step)
writer.close()

The writer flushes asynchronously, the same tensorboard --logdir logs command picks up the events, and the dashboards behave the same. Teams that mix frameworks often standardize on TensorBoard precisely because both PyTorch and TensorFlow can target it.

Alternatives

tf.summary and TensorBoard are not the only way to log experiments. The range of experiment tracking tools has expanded substantially since 2018, and many teams running large training jobs combine a deep learning specific viewer with a higher level run manager. The most commonly cited alternatives are listed below.

Tool	Type	What it adds on top of `tf.summary`
Weights & Biases	Hosted SaaS	`wandb.log({...})` API, automatic system metrics, hosted dashboards, hyperparameter sweeps, model registry, reports. Can auto sync existing `tfevents` files with `wandb.init(sync_tensorboard=True)`.
MLflow	Open source, optional cloud	`log_param`, `log_metric`, `log_artifact`, model registry, project packaging, autolog for popular libraries.
Neptune	Hosted SaaS	Run metadata at scale, organisation features, code and dataset versioning, integration with the popular DL frameworks.
Aim	Open source, self hosted	High performance metric store, pythonic search SDK, dashboard with run comparison and aggregations.
ClearML	Open source plus hosted tier	Experiment tracking, agent based orchestration, dataset management, hyperparameter tuning, model registry.
Comet	Hosted SaaS	Experiment tracking, model production monitoring, LLM specific dashboards, automatic source code logging.
Sacred + Omniboard	Open source	Lightweight Python decorator API plus MongoDB backed UI. Popular in academic labs.
Guild AI	Open source CLI	File based experiment tracking, hyperparameter search, no server.
DVC + Iterative Studio	Open source plus hosted	Data and model versioning with optional metric dashboards layered on git.

MLflow is a framework agnostic experiment tracker. Calls such as mlflow.log_param, mlflow.log_metric, and mlflow.log_artifact record hyperparameters, scalar metrics, and arbitrary files inside a run. MLflow leans toward run management, model registry, and reproducibility rather than rich visualization. It can store TensorBoard event files as artifacts when you want both. Autolog support exists for scikit-learn, XGBoost, PyTorch Lightning, TensorFlow and Keras, Spark, and several other libraries, so a single call to mlflow.autolog() captures parameters and metrics without explicit logging code.

Weights & Biases (wandb) is a hosted experiment tracking platform with a similar logging API:

- wandb.log({"loss": loss, "accuracy": acc}) for scalars,
- wandb.Image for images,
- wandb.Histogram for distributions,
- wandb.watch(model) to log gradient and parameter histograms of a PyTorch model automatically.

It adds collaboration features such as shared dashboards, sweep automation, and report writing on top of the basic logging. The TensorBoard integration needs no extra logging code. Passing sync_tensorboard=True to wandb.init makes the W&B client mirror event file writes into the W&B cloud, where the same scalars, histograms, and images appear alongside W&B native logs.

Aim focuses on speed and self hosting. Its UI is built to handle large numbers of metric sequences, and the SDK exposes a pythonic query language for filtering runs without leaving the notebook. Teams that already have an internal artifact store and just want a metric dashboard sometimes prefer Aim to a hosted SaaS because the data stays on their infrastructure.

Neptune sits between MLflow and W&B in feature surface. It targets organisations that need access controls, project level governance, and integration with both deep learning and classical ML pipelines. Its Python API covers the same metric and artifact logging, and an integration similar to the W&B bridge brings tfevents data into Neptune.

ClearML is open source with a hosted tier. Its strongest feature is the agent based orchestration on top of experiment tracking. A user submits a training script to a remote agent, the agent reproduces the environment, runs the job, and ships back metrics and artifacts. The metric API itself looks much like the others.

Many teams use more than one of these together. A common pattern is to use tf.summary or the PyTorch SummaryWriter for low level deep learning visualization, MLflow for run cataloging and model registry, and wandb for collaboration and hyperparameter sweeps. The data being logged is largely the same; the differences are storage, UI, and what surrounds the raw metrics.

Practical notes

A few things worth knowing once you start using tf.summary in real training jobs:

Summaries written inside a tf.function use whichever default writer was active when the function was traced. Enter the writer's as_default() context inside the function rather than relying on an outer one.
Not everything needs to be logged every step. Scalars are usually cheap enough to log every step; histograms and images are better written every N steps or once per epoch.
Large images, videos, and audio inflate event files quickly. The max_outputs argument on tf.summary.image and tf.summary.audio caps how many examples per call are written.
If a process crashes before a flush, any summaries buffered since the last flush may be lost. With the default flush_millis, that can be up to two minutes of data. Wrapping training in a try block that calls writer.flush() in the finally clause is cheap insurance.
TensorBoard scales to many runs but slows down once a single run logs millions of scalar points. Downsampling or aggregating before logging keeps the UI responsive.
Run directory names matter a lot for usability. Use a timestamp plus a short human readable label, for example logs/2026-05-16_baseline_resnet50. Random hashes or numeric ids are hard to scan in a sweep.
When comparing experiments, keep tag names consistent across runs. TensorBoard groups identical tags into one chart, so renaming loss to train_loss halfway through a sweep splits the curve across two panels.
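For multi GPU runs under tf.distribute.MirroredStrategy, make sure only one replica logs. In recent releases, summaries inside strategy.run are gated to replica 0 by default. Logging elsewhere may need an explicit tf.distribute.get_replica_context().replica_id_in_sync_group == 0 check, or you can rely on the Keras callback to do it for you. Otherwise N replicas log N copies of the same value.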
On very large models, disable graph writing by passing write_graph=False to the Keras TensorBoard callback. The serialized graph can dominate event file size and slow down TensorBoard's initial load.
The Profile tab needs a separate Python plugin. Installing it with pip install -U tensorboard-plugin-profile makes the tab appear once a profile has been captured.
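Several of these notes can be combined in one custom training loop. The sketch below is a minimal illustration rather than a reference implementation. It assumes TensorFlow 2.x and uses a toy linear model. The helper make_run_dir, the HISTOGRAM_EVERY cadence, and the tag names are hypothetical choices. The loop enters the writer inside the tf.function, logs the loss every step, and gates the weight histogram with tf.summary.record_if. It also flushes the writer in a finally clause.

```python
import datetime
import os
import tempfile

import tensorflow as tf


def make_run_dir(root, label, now=None):
    # Hypothetical helper: <root>/<YYYY-MM-DD>_<label>, the naming scheme above.
    now = now or datetime.datetime.now()
    return os.path.join(root, f"{now:%Y-%m-%d}_{label}")


HISTOGRAM_EVERY = 100  # illustrative cadence for the heavier summaries

logdir = make_run_dir(tempfile.mkdtemp(), "baseline_linear")
writer = tf.summary.create_file_writer(logdir)

# Toy linear model y = x @ w + b, trained by plain gradient descent.
w = tf.Variable(tf.zeros((4, 1)))
b = tf.Variable(tf.zeros((1,)))


@tf.function
def train_step(x, y, step):
    # Enter the writer inside the traced function so the summary ops are
    # bound to it when the function is traced.
    with writer.as_default():
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(x @ w + b - y))
        dw, db = tape.gradient(loss, [w, b])
        w.assign_sub(0.1 * dw)
        b.assign_sub(0.1 * db)

        # Scalars are cheap enough to write on every step.
        tf.summary.scalar("train/loss", loss, step=step)

        # Histograms are heavier, so only record them every N steps.
        with tf.summary.record_if(tf.equal(step % HISTOGRAM_EVERY, 0)):
            tf.summary.histogram("weights/w", w, step=step)
    return loss


x = tf.random.stateless_normal((64, 4), seed=(1, 2))
y = tf.reduce_sum(x, axis=1, keepdims=True)  # true weights are all ones

try:
    for step in range(300):
        # A tf.int64 tensor step avoids retracing on every new Python int.
        loss = train_step(x, y, tf.constant(step, dtype=tf.int64))
finally:
    # Flush even if an exception escapes the loop; this cannot help if the
    # process is killed outright.
    writer.flush()
```

Pointing tensorboard --logdir at the parent directory would then list the run under its date plus label name.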

Common pitfalls

A short list of bugs that beginners run into:

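No writer activated. Calling tf.summary.scalar with no default writer, that is, outside a with writer.as_default(): block and without a prior writer.set_as_default(), silently returns False and writes nothing. Checking the return value is the easiest way to catch this.
Forgetting the step. Pass step=optimizer.iterations or set a default with tf.summary.experimental.set_step. With neither, TF 2 raises a ValueError once a writer is active. The quieter version of this bug is passing a constant such as step=0 on every call, which stacks every value at the same x position.
Python numbers vs tensors. tf.summary.scalar converts Python numbers to tensors, so both work in eager mode and inside tf.function. The catch is at the tf.function boundary: every new Python value passed as an argument, such as an integer step, triggers a retrace. Pass tensors like tf.constant(step, dtype=tf.int64) instead.
Histograms on tensors with NaN or Inf. Histogram buckets are derived from the tensor's minimum and maximum, so a single non-finite value corrupts the range. The resulting chart is typically empty or meaningless. Drop non-finite entries, for example with tf.boolean_mask and tf.math.is_finite, before logging. Clipping handles Inf but is not a reliable fix for NaN.
Out of range image values. tf.summary.image expects either uint8 in [0, 255] or floating point values in [0, 1], and clips floats outside that range. Logging unnormalized activations as images therefore often produces saturated white tiles.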
Logging from a tf.data map function. map runs on a separate thread that does not inherit the default writer. Move logging into the training step.
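A few of these pitfalls can be reproduced directly in eager mode. This is a sketch assuming TensorFlow 2.x. The functions finite_only and to_unit_range are hypothetical helper names. Per-batch min-max scaling is only one simple way to bring activations into [0, 1].

```python
import tempfile

import tensorflow as tf

# Pitfall: with no default writer, the call writes nothing and returns False.
orphan = tf.summary.scalar("orphan", 1.0, step=0)
print("written without a writer:", bool(orphan))


def finite_only(t):
    # Hypothetical helper: flatten and drop NaN/Inf entries so they cannot
    # corrupt the min/max range the histogram buckets are built from.
    t = tf.reshape(t, [-1])
    return tf.boolean_mask(t, tf.math.is_finite(t))


def to_unit_range(x):
    # Hypothetical helper: per-batch min-max scaling into [0, 1], one simple
    # way to keep tf.summary.image from clipping everything to white.
    lo = tf.reduce_min(x)
    hi = tf.reduce_max(x)
    return (x - lo) / tf.maximum(hi - lo, 1e-12)


weights = tf.constant([0.5, float("nan"), -1.0, float("inf")])
activations = 50.0 * tf.random.stateless_normal((8, 16, 16, 1), seed=(3, 4))

writer = tf.summary.create_file_writer(tempfile.mkdtemp())
step = tf.Variable(0, dtype=tf.int64)

with writer.as_default():
    # Pitfall: give the summaries a step. A default step (here a Variable
    # advanced once per loop) avoids repeating step= on every call; with no
    # step at all, TF 2 raises ValueError.
    tf.summary.experimental.set_step(step)
    for _ in range(3):
        ok_hist = tf.summary.histogram("weights", finite_only(weights))
        # max_outputs caps how many of the 8 images are stored per call.
        ok_img = tf.summary.image(
            "activations", to_unit_range(activations), max_outputs=4
        )
        step.assign_add(1)
    tf.summary.experimental.set_step(None)  # unset the global default step

writer.flush()
print("histogram written:", bool(ok_hist), "image written:", bool(ok_img))
```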

Other meanings of "summary" in machine learning

The word also appears in unrelated contexts. model.summary() in Keras prints a human readable table of layers, output shapes, and parameter counts. Text summarization is the task of producing a short version of a document, handled by models such as BART, T5, and Pegasus. "Data summary" can mean descriptive statistics over a dataset. The word is also used loosely for broader techniques that condense data or models. These include dimensionality reduction with PCA or t-SNE; model compression by quantization, pruning, and knowledge distillation; and ensemble methods like bagging and boosting. Those are separate concepts that share a name; in TensorFlow code, "summary" almost always means a tf.summary event.

History

tf.summary predates the rest of TensorBoard. In early TensorFlow 0.x releases, summaries were tied to a SummaryWriter that lived in the same process as the training loop and shared a Session. The mental model assumed a single graph, a single session, and explicit merge_all calls. That model was awkward in research code that wanted to log custom values on the fly, and it broke entirely when eager execution arrived.

The redesign for TensorFlow 2 was tracked through a public RFC process on GitHub during 2018 and 2019. The shipping API removed merge_all, replaced FileWriter with create_file_writer, made the global step explicit, and renamed the second positional argument from tensor to data. The migration script tf_upgrade_v2 was released alongside TensorFlow 2.0 in September 2019 to ease porting. The compat layer tf.compat.v1.summary retains the v1 surface for code that has not been ported.

Around the same time, the broader experiment tracking ecosystem grew. Weights & Biases launched its public beta in early 2018. MLflow was announced as an alpha in mid 2018 and reached 1.0 in mid 2019. Neptune.ai expanded its product around the same period, and Aim and ClearML appeared shortly after. Most of them shipped ways to import or sync TensorBoard logs, partly because TensorBoard was already the de facto local viewer and partly because the tfevents format was easy to ingest. The result is that tf.summary today sits at the bottom of a layered stack: the raw events on disk feed both TensorBoard and any number of higher level platforms.
