Queue
Last reviewed
May 11, 2026
Sources
9 citations
Review status
Source-backed
Revision
v2 · 2,199 words
See also: Machine learning terms
A queue, in the context of machine learning, refers to the use of a data structure that follows the First-In-First-Out (FIFO) principle to store and manage data during the processing of machine learning tasks. Elements are removed from the queue in the order they were inserted. Queues can be utilized in various stages of the machine learning process, such as during data preprocessing, model training, or serving predictions.
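Here is a minimal illustration of FIFO order. It uses Python's standard collections.deque as a generic data-structure example and is not tied to any machine learning framework.

```python
from collections import deque

# A FIFO queue: enqueue at the right end, dequeue from the left end.
queue = deque()
for example_id in ["img_0", "img_1", "img_2"]:
    queue.append(example_id)        # enqueue

first = queue.popleft()             # dequeue: the oldest element comes out first
print(first)                        # img_0
print(list(queue))                  # ['img_1', 'img_2']
```

A deque supports constant-time appends and pops at both ends. That makes it a better fit for a queue than list.pop(0), which shifts every remaining element.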
In deep learning frameworks, queues historically served as the backbone of asynchronous input pipelines. Producer threads write training examples into a shared buffer while the consumer, the training loop, reads mini-batches out. This decoupling lets the GPU stay busy while the CPU reads files from disk and applies augmentations. Modern frameworks have moved away from explicit queue objects in user code, but the underlying idea of a thread-safe FIFO buffer between an I/O stage and a compute stage still drives TensorFlow's tf.data and PyTorch's DataLoader under the hood.
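The producer-consumer pattern described above can be sketched with Python's thread-safe queue.Queue. This is a simplified, framework-free illustration, not how tf.data or DataLoader are implemented. The squared integers stand in for decoded training examples, and the producer and consumer functions are hypothetical names.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream

def producer(buffer: queue.Queue, num_examples: int) -> None:
    """Stands in for the I/O stage: 'reads' examples and puts them in the buffer."""
    for i in range(num_examples):
        example = i * i               # placeholder for decoding/augmenting a real example
        buffer.put(example)           # blocks while the buffer is full (backpressure)
    buffer.put(SENTINEL)

def consumer(buffer: queue.Queue, batch_size: int) -> list:
    """Stands in for the training loop: pulls mini-batches out of the buffer."""
    batches, batch = [], []
    while True:
        item = buffer.get()           # blocks until an item is available
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:                         # keep the smaller final batch
        batches.append(batch)
    return batches

buffer = queue.Queue(maxsize=8)       # bounded, thread-safe FIFO buffer
thread = threading.Thread(target=producer, args=(buffer, 10))
thread.start()
batches = consumer(buffer, batch_size=4)
thread.join()
print(batches)  # [[0, 1, 4, 9], [16, 25, 36, 49], [64, 81]]
```

The bounded maxsize is a deliberate choice. It caps memory use, and it makes a fast producer wait for a slow consumer instead of filling RAM.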
In data preprocessing, queues are used to manage the flow of data as it is transformed and prepared for training or testing a machine learning model. Data is often fed to a machine learning algorithm in batches, and using a queue helps ensure that the data is presented in a consistent and orderly manner. For example, a queue can be used to manage the order in which data is loaded and preprocessed for data augmentation, a technique used to increase the amount and diversity of training data.
During model training, queues can be employed to manage the flow of mini-batches of data to the algorithm, as well as to manage the processing of multiple threads or processes in parallel. Queues help balance the workload between different hardware resources, such as CPUs or GPUs, and can be especially useful in distributed machine learning systems, where multiple nodes work together to train a model. The use of queues in model training can lead to improved performance and reduced training time by minimizing the idle time of hardware resources.
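One way a queue balances work is as a shared task list. Idle workers pull the next task instead of being handed a fixed share in advance. The sketch below is a single-process illustration with threads. In a distributed system the queue would typically be a network service and the workers separate processes or machines, which this does not model. The worker function and the shard workload are hypothetical.

```python
import queue
import threading

def worker(tasks: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """Pull the next shard until none are left, so faster workers naturally take more."""
    while True:
        try:
            shard_id = tasks.get_nowait()
        except queue.Empty:
            return                                   # queue drained: this worker is done
        value = sum(range(shard_id * 1000))          # placeholder for real per-shard work
        with lock:
            results[shard_id] = value

tasks = queue.Queue()
for shard_id in range(8):
    tasks.put(shard_id)

results, lock = {}, threading.Lock()
threads = [threading.Thread(target=worker, args=(tasks, results, lock)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2, 3, 4, 5, 6, 7]
```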
In the context of serving predictions, queues can be utilized to manage incoming requests for predictions from a deployed machine learning model. When a model is deployed as a service or an API, it is common for multiple requests to be sent simultaneously. Queues can help maintain the order of incoming requests, manage the load on the model, and ensure that responses are sent back to the users in a timely manner. This is particularly important for applications where response time and throughput are critical, such as in real-time decision making or high-traffic web services.
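A common serving pattern, often called dynamic batching or micro-batching, drains the request queue into small batches. It takes up to a maximum size, or whatever has arrived within a short wait, so that the model scores several requests in one call. The sketch below assumes requests are plain numbers. collect_batch and fake_model are hypothetical names, not the API of any particular serving system.

```python
import queue
import time

def collect_batch(requests: queue.Queue, max_batch_size: int, max_wait_s: float) -> list:
    """Gather up to max_batch_size requests, waiting at most max_wait_s after the first."""
    batch = [requests.get()]                  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break                             # nothing else arrived in time
    return batch

def fake_model(inputs: list) -> list:
    """Placeholder for a model that scores a whole batch in one call."""
    return [x * 2 for x in inputs]

requests = queue.Queue()
for x in [1, 2, 3, 4, 5]:
    requests.put(x)

first = collect_batch(requests, max_batch_size=4, max_wait_s=0.01)
second = collect_batch(requests, max_batch_size=4, max_wait_s=0.01)
print(first, fake_model(first))    # [1, 2, 3, 4] [2, 4, 6, 8]
print(second, fake_model(second))  # [5] [10]
```

The wait limit trades a little latency for throughput. A longer wait fills bigger batches but delays the first request in each batch.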
TensorFlow exposes an explicit queue API in the tf.queue module. In TensorFlow 1.x this was the primary mechanism for building input pipelines; in TensorFlow 2.x the same classes are still available but they live alongside tf.compat.v1.train.QueueRunner, which has been deprecated with a notice telling users to switch to tf.data for input pipelines.
A queue in TensorFlow is itself a node in the computation graph. It is a stateful node, much like a variable: other nodes in the graph can enqueue new items into the queue or dequeue existing items from it. Because the queue lives in the graph rather than in Python, enqueue and dequeue ops can run on different devices, including remote workers in a distributed setup.
The tf.queue module provides four queue classes, each with a different ordering policy.
| Class | Behavior |
|---|---|
| tf.queue.FIFOQueue | Dequeues elements in first-in, first-out order. The simplest case, useful when input ordering should be preserved. |
| tf.queue.PaddingFIFOQueue | A FIFOQueue that supports variable-sized tensors by zero-padding shorter elements up to the longest in the batch when dequeue_many is called. Designed for mini-batch training on inputs of different lengths, like sentences or audio clips. |
| tf.queue.PriorityQueue | Dequeues elements according to a 64-bit integer priority value supplied with each enqueue. Lower priority values come out first. |
| tf.queue.RandomShuffleQueue | Dequeues a random element from the buffer. Used to shuffle training examples without holding the entire dataset in memory. It requires a min_after_dequeue parameter so that there is enough material in the buffer for randomness to be meaningful. |
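The priority ordering in the table, where lower values come out first, matches a min-heap. Below is a pure-Python analogue built on the standard heapq module. The PriorityBuffer class is hypothetical, not TensorFlow code. Using sequence length as the priority is only an illustrative choice. The insertion counter that breaks ties in FIFO order is this sketch's own design decision; it does not describe TensorFlow's tie handling.

```python
import heapq
import itertools

class PriorityBuffer:
    """Pure-Python analogue of priority-ordered dequeue: lowest priority value first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker: equal priorities leave in FIFO order

    def enqueue(self, priority: int, item) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def dequeue(self):
        priority, _, item = heapq.heappop(self._heap)
        return priority, item

buf = PriorityBuffer()
buf.enqueue(5, "long_sequence")
buf.enqueue(1, "short_sequence")
buf.enqueue(3, "medium_sequence")
order = [buf.dequeue() for _ in range(3)]
print(order)
# [(1, 'short_sequence'), (3, 'medium_sequence'), (5, 'long_sequence')]
```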
Each queue is constructed with a fixed capacity and one or more dtypes. Items go in with enqueue or enqueue_many and come out with dequeue or dequeue_many. Closing a queue with close() signals no more items will arrive, after which pending dequeues raise OutOfRangeError when the buffer drains.
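The capacity and close() semantics described above can be mimicked in plain Python with a condition variable. The ClosableQueue class and the local OutOfRangeError are hypothetical stand-ins, not TensorFlow's implementation. In this sketch an enqueue on a closed queue raises RuntimeError. It does not model how TensorFlow treats enqueues that are already pending when the queue closes.

```python
import collections
import threading

class OutOfRangeError(Exception):
    """Hypothetical stand-in for TensorFlow's tf.errors.OutOfRangeError."""

class ClosableQueue:
    """Bounded FIFO with close(): dequeues keep working until a closed queue drains."""

    def __init__(self, capacity: int):
        self._items = collections.deque()
        self._capacity = capacity
        self._closed = False
        self._cond = threading.Condition()

    def enqueue(self, item) -> None:
        with self._cond:
            while len(self._items) >= self._capacity and not self._closed:
                self._cond.wait()                      # block while full
            if self._closed:
                raise RuntimeError("enqueue on a closed queue")  # this sketch's choice of error
            self._items.append(item)
            self._cond.notify_all()                    # wake consumers waiting on an empty queue

    def dequeue(self):
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()                      # block while empty but still open
            if not self._items:
                raise OutOfRangeError("queue is closed and empty")
            item = self._items.popleft()
            self._cond.notify_all()                    # wake producers waiting on a full queue
            return item

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify_all()                    # release every blocked caller

q = ClosableQueue(capacity=4)
for record in ["a", "b"]:
    q.enqueue(record)
q.close()

received = []
while True:
    try:
        received.append(q.dequeue())
    except OutOfRangeError:
        break                                          # buffer drained after close()
print(received)  # ['a', 'b']
```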
TensorFlow 1.x paired these queues with two helper classes that managed the threads filling them.
The QueueRunner class created a group of threads that repeatedly ran a given enqueue operation. A typical training graph would build a RandomShuffleQueue, define an enqueue op that reads a record from disk and decodes it, attach a QueueRunner with a few threads, and then call tf.train.start_queue_runners(sess) to launch the threads.
The Coordinator class was a small synchronization primitive that let those threads stop together and propagate exceptions. A queue runner also ran a closer thread that automatically closed the queue if the coordinator reported an exception, so that the dequeue side would not block forever.
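The QueueRunner and Coordinator interplay can be approximated with standard threading tools. The sketch below is loosely modeled on that pattern but does not use TensorFlow's classes. Coordinator here is a hypothetical re-implementation with a shared stop flag and first-error propagation. enqueue_loop plays the role of a queue-runner thread, and read_record is a hypothetical reader. The closer thread is not modeled.

```python
import queue
import threading

class Coordinator:
    """Simplified stand-in: a shared stop flag plus the first exception any thread reports."""

    def __init__(self):
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._error = None

    def should_stop(self) -> bool:
        return self._stop.is_set()

    def request_stop(self, error=None) -> None:
        with self._lock:
            if error is not None and self._error is None:
                self._error = error                # keep only the first failure
        self._stop.set()

    def join(self, threads) -> None:
        for t in threads:
            t.join()
        if self._error is not None:
            raise self._error                      # re-raise the producer's error in the caller

def enqueue_loop(coord, buffer, read_record):
    """What a queue-runner thread does: repeat the 'enqueue op' until told to stop."""
    try:
        while not coord.should_stop():
            record = read_record()
            while True:
                try:
                    buffer.put(record, timeout=0.05)
                    break
                except queue.Full:
                    if coord.should_stop():       # don't block forever on a full buffer
                        return
    except Exception as exc:
        coord.request_stop(exc)                    # one failure stops every thread

records = iter(range(5))
records_lock = threading.Lock()

def read_record():
    """Hypothetical reader: next record, or EOFError once the source is exhausted."""
    with records_lock:
        record = next(records, None)
    if record is None:
        raise EOFError("no more records")
    return record

coord = Coordinator()
buffer = queue.Queue(maxsize=16)
threads = [threading.Thread(target=enqueue_loop, args=(coord, buffer, read_record)) for _ in range(2)]
for t in threads:
    t.start()
stop_reason = None
try:
    coord.join(threads)
except EOFError as err:
    stop_reason = str(err)                         # 'no more records'
drained = sorted(buffer.get_nowait() for _ in range(buffer.qsize()))
print(stop_reason, drained)  # no more records [0, 1, 2, 3, 4]
```

The timeout on put is what keeps a stopped producer from hanging on a full buffer. The original pattern used a closer thread to keep the dequeue side from hanging in the same way.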
This architecture had pain points that drove its deprecation. Lock contention on the queue and the Python Global Interpreter Lock limited multi-threaded producer throughput. Exceptions in producer threads surfaced through the coordinator in ways that were hard to debug. If a user forgot to call tf.train.start_queue_runners, the training loop would hang indefinitely on the first dequeue.
TensorFlow introduced the tf.data API in TensorFlow 1.4 as a clean-sheet redesign of the input pipeline, and made it the default recommended approach in TensorFlow 2.0. A tf.data.Dataset represents a sequence of elements where each element consists of one or more tensor components. Pipelines are built by chaining transformations rather than wiring queues and threads by hand.
A basic image training pipeline in tf.data looks like this:
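```python
import tensorflow as tf

def load_and_preprocess(path, label):  # illustrative helper, not a TensorFlow API
    image = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.resize(image, [224, 224]) / 255.0, label

dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))  # lists defined elsewhere
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.batch(32)
dataset = dataset.prefetch(tf.data.AUTOTUNE)
model.fit(dataset, epochs=10)  # model: a compiled tf.keras model defined elsewhere
```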
This short snippet replaces what would have been dozens of lines of queue, queue runner, and coordinator code in TensorFlow 1.x.
A dataset can be constructed from Dataset.from_tensor_slices (slicing an in-memory tensor along its first dimension), Dataset.from_generator (wrapping a Python generator), tf.data.TFRecordDataset (reading TFRecord protocol buffer files, the format Google recommends for large datasets), tf.data.TextLineDataset (text files line by line), or tf.data.experimental.make_csv_dataset for CSV.
| Method | What it does |
|---|---|
| map(fn) | Applies fn to each element. Used for decoding images, parsing TFRecords, or running augmentation. Pass num_parallel_calls=tf.data.AUTOTUNE to run the function on multiple CPU cores. |
| batch(n) | Stacks n consecutive elements into a single batched element. Pass drop_remainder=True to discard a smaller final batch so that shapes stay static. |
| padded_batch(n, padded_shapes) | A version of batch that pads variable-length elements, replacing the role of PaddingFIFOQueue. |
| shuffle(buffer_size) | Pulls items into a buffer of the given size and dequeues randomly. Larger buffers approximate true random shuffling more closely but use more memory. Replaces RandomShuffleQueue. |
| repeat(count) | Cycles the dataset, infinitely if no count is given. |
| prefetch(n) | Overlaps the work of producing elements with the work of consuming them, using a background thread and an internal buffer of size n. |
| interleave(fn, cycle_length, num_parallel_calls) | Reads from several files in parallel and interleaves their elements, useful for sharded TFRecords. |
| cache() | Caches the dataset in memory or on local disk so that expensive preprocessing only runs in the first epoch. |
| filter(predicate) | Keeps only elements for which the predicate returns true. |
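The shuffle(buffer_size) row describes a sliding-window shuffle. Below is a pure-Python sketch of that idea. It is not tf.data's actual implementation, and the function name buffered_shuffle is hypothetical. The test checks the property that limits shuffle quality: an element can only appear at output position i if it was among the first i + buffer_size input elements.

```python
import random

def buffered_shuffle(elements, buffer_size: int, seed: int = 0):
    """Sliding-window shuffle: emit a random buffered element, refill from the input."""
    rng = random.Random(seed)
    buffer = []
    for element in elements:
        if len(buffer) < buffer_size:
            buffer.append(element)          # fill phase
            continue
        index = rng.randrange(buffer_size)
        yield buffer[index]                 # emit a random element from the window
        buffer[index] = element             # replace it with the next input element
    rng.shuffle(buffer)                     # input exhausted: drain the rest in random order
    yield from buffer

out = list(buffered_shuffle(range(100), buffer_size=10))
print(out[:5])  # first outputs can only come from the first few inputs
```

With buffer_size at least as large as the input, the loop only fills the buffer, and the final rng.shuffle produces a full random permutation. A smaller buffer uses less memory, but elements can only move a limited distance from their original positions.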
Two transformations carry most of the performance weight in a tf.data pipeline.
The prefetch transformation decouples the time when data is produced from the time when data is consumed. The TensorFlow documentation describes it this way: while the model executes training step s, the input pipeline reads data for step s+1. The training step time becomes the maximum of model time and input pipeline time, instead of their sum. This is the single biggest win in most input pipelines, and prefetch(tf.data.AUTOTUNE) should usually be the final transformation in the chain.
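The overlap that prefetch provides can be reproduced with a background thread and a bounded queue. This sketch only illustrates the idea; tf.data does this inside its C++ runtime. The prefetch function and the sleep-based timings are hypothetical stand-ins for real input and training steps.

```python
import queue
import threading
import time

_END = object()  # sentinel marking the end of the input stream

def prefetch(iterable, buffer_size: int):
    """Produce elements on a background thread so they are ready before they are requested."""
    buffer = queue.Queue(maxsize=buffer_size)

    def fill():
        for element in iterable:
            buffer.put(element)          # blocks once buffer_size elements are waiting
        buffer.put(_END)

    threading.Thread(target=fill, daemon=True).start()
    while True:
        element = buffer.get()
        if element is _END:
            return
        yield element

def slow_input(n, delay):
    for i in range(n):
        time.sleep(delay)                # stands in for reading and decoding a batch
        yield i

start = time.monotonic()
for element in prefetch(slow_input(5, 0.05), buffer_size=2):
    time.sleep(0.05)                     # stands in for a training step
elapsed = time.monotonic() - start
print(f"{elapsed:.2f}s")
```

With equal 0.05 s input and step times, the loop should take roughly 0.30 s on an idle machine. Running the two stages back to back would take at least 0.50 s. This matches the "maximum instead of sum" behavior described above. The daemon flag keeps an abandoned producer thread from blocking interpreter exit if the consumer stops early.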
The map transformation accepts a num_parallel_calls argument that runs the user function on multiple CPU cores. For image pipelines that decode JPEGs and apply augmentations, this is often necessary to keep up with a GPU.
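Parallel mapping of a per-element function can be illustrated with the standard concurrent.futures module. This is only an analogue of num_parallel_calls. decode_and_augment is a hypothetical function whose sleep stands in for I/O-bound decoding. Threads help here because sleeping releases Python's GIL. CPU-bound pure-Python work would need a ProcessPoolExecutor instead.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def decode_and_augment(x):
    """Hypothetical per-element preprocessing; the sleep stands in for decoding work."""
    time.sleep(0.02)
    return x * 10

inputs = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(decode_and_augment, inputs))   # results return in input order

print(outputs)  # [0, 10, 20, 30, 40, 50, 60, 70]
```

Executor.map returns results in input order, even though the calls run concurrently. That keeps element order deterministic.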
A common optimization is to apply expensive functions after batch rather than before, so the function runs once per batch instead of once per element. The official TensorFlow performance guide reports a roughly five-fold drop in execution time from this change in their example pipeline.
tf.data.AUTOTUNE is a sentinel value that tells the tf.data runtime to choose values dynamically based on hardware and pipeline characteristics. It is supported by prefetch, map, and interleave. The runtime adjusts buffer sizes and thread counts during execution, trying to use the minimum buffer needed to keep the accelerator fed while respecting memory limits. In most cases it removes the need for manual tuning.
Queue-based pipelines were difficult to compose, hard to debug, and dependent on Python threading with its GIL contention. Each queue and queue runner was an opaque stateful object that the graph optimizer could not rewrite. The tf.data API replaces those threads with C++ backed iterators and exposes the pipeline as a sequence of transformations the runtime can fuse and parallelize. TensorFlow GitHub issue 23067 from 2018 documents edge cases where tf.data did not initially cover every queue use case.
PyTorch never had a queue API exposed at the user level in the way TensorFlow 1.x did. Instead it splits the work between two classes: torch.utils.data.Dataset and torch.utils.data.DataLoader.
| PyTorch class | Role |
|---|---|
| Dataset | Defines what the data is. Implements __len__ and __getitem__. A custom subclass typically reads one example from disk or memory and returns a tensor. |
| DataLoader | Defines how to iterate. Wraps a Dataset and handles batching, shuffling, and parallel loading through worker processes. |
The DataLoader uses worker subprocesses rather than threads to sidestep the GIL. num_workers controls how many subprocesses load data in the background: num_workers=0 keeps everything in the main process, and positive values start that many worker processes that load samples in parallel. pin_memory=True copies output tensors into pinned (page-locked) host memory so that transfers to the GPU can use asynchronous DMA. collate_fn lets users customize how a list of samples is merged into a batch. Padding variable-length inputs there plays the role that padded_batch plays in tf.data.
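A padding collate function can be sketched in plain Python so that it runs without PyTorch installed. pad_collate is a hypothetical name. In real PyTorch code the function would usually build tensors, for example with torch.nn.utils.rnn.pad_sequence, and be passed as DataLoader(..., collate_fn=pad_collate). Like PaddingFIFOQueue and padded_batch, it pads each batch only up to its longest element.

```python
def pad_collate(samples, pad_value=0):
    """Merge variable-length sequences into one rectangular batch by right-padding.

    Also returns the original lengths, which models often use to mask out padding.
    """
    lengths = [len(s) for s in samples]
    max_len = max(lengths)                      # pad to the longest sequence in this batch
    batch = [list(s) + [pad_value] * (max_len - len(s)) for s in samples]
    return batch, lengths

batch, lengths = pad_collate([[3, 1, 4], [1, 5], [9, 2, 6, 5]])
print(batch)    # [[3, 1, 4, 0], [1, 5, 0, 0], [9, 2, 6, 5]]
print(lengths)  # [3, 2, 4]
```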
The analogues are clear:
- DataLoader(dataset, batch_size=32) is roughly dataset.batch(32).
- DataLoader(..., shuffle=True) is roughly dataset.shuffle(...) with full per-epoch shuffling.
- DataLoader(..., num_workers=4) is roughly dataset.map(..., num_parallel_calls=4) plus prefetch.
- collate_fn is roughly padded_batch(..., padded_shapes=...).

A practical difference is shuffling semantics. PyTorch's DataLoader with shuffle=True shuffles all indices before each epoch, a true random permutation. TensorFlow's Dataset.shuffle(buffer_size) is a sliding-window shuffle whose quality depends on buffer size. A buffer at least as large as the dataset gives a full shuffle, but it holds the entire dataset in memory.
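For contrast with the buffer-based shuffle, here is a sketch of the per-epoch permutation approach. It loosely mirrors what DataLoader(shuffle=True, batch_size=...) does conceptually and is not PyTorch's sampler code. epoch_batches is a hypothetical function. Its drop_last argument imitates the DataLoader parameter of the same name.

```python
import random

def epoch_batches(num_examples: int, batch_size: int, seed: int, drop_last: bool = False):
    """Shuffle all indices once per epoch, then cut the permutation into batches."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)     # full random permutation for this epoch
    stop = num_examples - num_examples % batch_size if drop_last else num_examples
    return [indices[i:i + batch_size] for i in range(0, stop, batch_size)]

batches = epoch_batches(num_examples=10, batch_size=4, seed=1)
print([len(b) for b in batches])  # [4, 4, 2]
```

Because the whole index list is permuted, any example can land in any position. The cost is only a list of indices, not the examples themselves.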
| Year | Event |
|---|---|
| 2015 | TensorFlow 0.x ships with FIFOQueue, RandomShuffleQueue, and QueueRunner. |
| 2017 | TensorFlow 1.4 introduces the tf.data API as the new official input pipeline. |
| 2018 | QueueRunner is officially deprecated, directing users to tf.data. |
| 2019 | TensorFlow 2.0 ships with tf.data as default; queue and queue runner APIs move to tf.compat.v1. |
Imagine you are at a playground, and there is a slide that everyone wants to use. The kids form a line, waiting for their turn. The first kid in the line goes down the slide, and then the next kid in line takes their turn. This line of kids is similar to a queue in machine learning.
In machine learning, queues help to organize and manage data or tasks that need to be processed. They make sure everything happens in the right order and that nothing gets missed. Queues are used in different parts of machine learning, like when preparing data, training a model, or giving answers to questions.
In modern TensorFlow you usually do not see the queue itself. You write something like dataset.shuffle(10000).batch(32).prefetch(AUTOTUNE), and the library builds the queue for you. The line of kids is still there; you just do not have to point at each one and tell them when to move.