# Queue

> Source: https://aiwiki.ai/wiki/queue
> Updated: 2026-06-29
> Categories: Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

## Queue in machine learning

A queue in machine learning is a First-In-First-Out (FIFO) data structure that stages and buffers data between an input/output stage and a compute stage so that elements are processed in the order they were inserted. Queues decouple a producer (code that reads files from disk, decodes them, and applies augmentations) from a consumer (the training loop or a deployed model), which lets the two run concurrently and keeps an accelerator such as a [GPU](/wiki/gpu) busy instead of stalling on data. In deep learning, queues historically powered asynchronous input pipelines: [TensorFlow](/wiki/tensorflow) 1.x exposed them directly through the `tf.queue` module and `QueueRunner`, while modern frameworks hide the same FIFO buffer inside [TensorFlow](/wiki/tensorflow)'s `tf.data` and [PyTorch](/wiki/pytorch)'s `DataLoader` [3][7].

Queues appear at several stages of the machine learning process: data preprocessing, model training, and serving predictions. In a training input pipeline, producer threads write training examples into a shared buffer while the consumer, the training loop, reads mini-batches out. Modern frameworks have moved away from explicit queue objects in user code, but the underlying idea of a thread-safe FIFO buffer between an I/O stage and a compute stage still drives the input pipeline under the hood.
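
The producer and consumer roles can be sketched with Python's standard `queue.Queue` and a background thread. This is a minimal illustration, not how any framework implements its pipeline: the names `producer`, `consume_all`, and `SENTINEL` are hypothetical, and squaring an integer stands in for real file reading and augmentation.

```python
import queue
import threading

SENTINEL = object()  # hypothetical marker meaning "no more data"


def producer(buffer, n_items):
    """Stand-in for the I/O stage: 'load' items and push them in order."""
    for i in range(n_items):
        buffer.put(i * i)  # blocks while the bounded buffer is full
    buffer.put(SENTINEL)


def consume_all(n_items, capacity=2):
    """Stand-in for the compute stage: read items in FIFO order."""
    buffer = queue.Queue(maxsize=capacity)  # thread-safe bounded FIFO
    thread = threading.Thread(target=producer, args=(buffer, n_items), daemon=True)
    thread.start()

    results = []
    while True:
        item = buffer.get()  # blocks while the buffer is empty
        if item is SENTINEL:
            break
        results.append(item)
    thread.join()
    return results
```

Calling `consume_all(6)` returns the items in insertion order. The bounded `capacity` is what keeps a fast producer from running arbitrarily far ahead of the consumer.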

### Data preprocessing

In [data preprocessing](/wiki/data_preprocessing), queues are used to manage the flow of data as it is transformed and prepared for training or testing a machine learning model. Data is often fed to a machine learning algorithm in batches, and using a queue helps ensure that the data is presented in a consistent and orderly manner. For example, a queue can be used to manage the order in which data is loaded and preprocessed for [data augmentation](/wiki/data_augmentation), a technique used to increase the amount and diversity of training data.

### Model training

During model training, queues carry [mini-batches](/wiki/mini_batch) of data to the training loop and coordinate the loader threads or processes that run in parallel to fill them. They help balance the workload between hardware resources such as CPUs and GPUs, and are especially useful in distributed machine learning systems, where multiple nodes work together to train a model. By minimizing the idle time of hardware resources, queues can improve throughput and reduce training time.

### Serving predictions

In the context of serving predictions, queues can be utilized to manage incoming requests for predictions from a deployed machine learning model. When a model is deployed as a service or an API, it is common for multiple requests to be sent simultaneously. Queues can help maintain the order of incoming requests, manage the load on the model, and ensure that responses are sent back to the users in a timely manner. This is particularly important for applications where response time and throughput are critical, such as in real-time decision making or high-traffic web services. In modern inference servers the request queue is paired with dynamic batching, which holds requests for a short window and groups them so the accelerator processes several at once, trading a small amount of latency for much higher throughput.
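
The dynamic batching idea can be sketched with the standard library alone. The function below is a simplified illustration, not the algorithm of any particular inference server; `collect_batch`, `max_batch_size`, and `max_wait_s` are hypothetical names chosen for this example.

```python
import queue
import time


def collect_batch(request_queue, max_batch_size, max_wait_s):
    """Group queued requests into one batch.

    Blocks until at least one request arrives, then keeps collecting
    until the batch is full or the waiting window has elapsed.
    """
    batch = [request_queue.get()]  # wait for the first request
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # window closed: send what we have
        try:
            batch.append(request_queue.get(timeout=remaining))
        except queue.Empty:
            break  # no further requests arrived within the window
    return batch
```

A serving loop would call `collect_batch` repeatedly and run the model once per returned batch. A longer `max_wait_s` yields fuller batches at the cost of extra latency for the first request in each batch.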

## What problem does a queue solve in a training pipeline?

Reading and decoding data is usually a CPU- and disk-bound task, while the forward and backward passes are accelerator-bound. If the two happen sequentially, the per-step time is the sum of the two and the expensive GPU sits idle while the CPU works. A queue placed between them turns that sum into an overlap: the producer fills the buffer for the next step while the consumer trains on the current step. This is the mechanism behind every modern input pipeline. A missing or undersized buffer is a common cause of an input-bound training job, where accelerator utilization sits well below 100 percent, and adding or enlarging the buffer is the usual cure [4].
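
The difference between the sum and the overlap can be worked through with a small, deterministic model. It is a sketch under simplifying assumptions: one producer, an unbounded buffer, and constant per-step costs in milliseconds. The function names are hypothetical.

```python
def sequential_time(n_steps, load_ms, train_ms):
    """Total time when each step loads its data and then trains."""
    return n_steps * (load_ms + train_ms)


def overlapped_time(n_steps, load_ms, train_ms):
    """Total time when a producer loads ahead while the consumer trains."""
    producer_done = 0  # time at which the latest batch became available
    consumer_done = 0  # time at which the latest training step finished
    for _ in range(n_steps):
        producer_done += load_ms
        # A step starts once its batch is ready and the previous step is done.
        start = max(producer_done, consumer_done)
        consumer_done = start + train_ms
    return consumer_done
```

With 100 steps, 30 ms of loading, and 50 ms of training per step, `sequential_time` gives 8000 ms while `overlapped_time` gives 5030 ms: after the first batch is loaded, the slower stage alone sets the pace.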

## Queues in TensorFlow (tf.queue)

TensorFlow exposes an explicit queue API in the `tf.queue` module. In TensorFlow 1.x this was the primary mechanism for building input pipelines. In TensorFlow 2.x the queue classes are still available, but the thread helper that fed them survives only as `tf.compat.v1.train.QueueRunner`, whose constructor carries a deprecation notice telling users to switch to `tf.data` for input pipelines [2].

A queue in TensorFlow is itself a node in the computation graph. It is a stateful node, much like a variable: other nodes in the graph can enqueue new items into the queue or dequeue existing items from it. Because the queue lives in the graph rather than in Python, enqueue and dequeue ops can run on different devices, including remote workers in a distributed setup.

### Queue types in tf.queue

The `tf.queue` module provides four concrete queue classes, each with a different ordering policy [1].

| Class | Behavior |
|---|---|
| `tf.queue.FIFOQueue` | Dequeues elements in first-in, first-out order. The simplest case, useful when input ordering should be preserved. |
| `tf.queue.PaddingFIFOQueue` | A FIFOQueue that supports variable-sized tensors by zero-padding shorter elements up to the longest in the batch when `dequeue_many` is called. Designed for mini-batch training on inputs of different lengths, like sentences or audio clips. |
| `tf.queue.PriorityQueue` | Dequeues elements according to a 64-bit integer priority value supplied with each enqueue. Lower priority values come out first. |
| `tf.queue.RandomShuffleQueue` | Dequeues a random element from the buffer. Used to shuffle training examples without holding the entire dataset in memory. It requires a `min_after_dequeue` parameter so that there is enough material in the buffer for randomness to be meaningful. |

Each queue is constructed with a fixed `capacity` and one or more `dtypes`. Items go in with `enqueue` or `enqueue_many` and come out with `dequeue` or `dequeue_many`. Closing a queue with `close()` signals no more items will arrive, after which pending dequeues raise `OutOfRangeError` when the buffer drains.
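
The enqueue, dequeue, and close calls can be tried directly. The sketch below assumes TensorFlow 2.x with eager execution enabled (the default), where the queue ops run immediately instead of being added to a graph; the values are arbitrary.

```python
import tensorflow as tf

# Assumption: TensorFlow 2.x, eager execution (the default).
# One int32 scalar per element; capacity is fixed at construction.
q = tf.queue.FIFOQueue(capacity=3, dtypes=[tf.int32], shapes=[()])

# enqueue_many splits the tensor along its first dimension: three elements.
q.enqueue_many(tf.constant([10, 20, 30], dtype=tf.int32))

first = q.dequeue()       # the oldest element
rest = q.dequeue_many(2)  # the next two elements, stacked into one tensor

q.close()  # signal that no more items will arrive
try:
    q.dequeue()  # the queue is closed and empty
    drained = False
except tf.errors.OutOfRangeError:
    drained = True
```

In TensorFlow 1.x graph mode the same calls would return ops to be run in a session rather than concrete values.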

### Queue runners and coordinators

TensorFlow 1.x paired these queues with two helper classes that managed the threads filling them.

The `QueueRunner` class created a group of threads that repeatedly ran a given enqueue operation. A typical training graph would build a `RandomShuffleQueue`, define an enqueue op that reads a record from disk and decodes it, attach a `QueueRunner` with a few threads, and then call `tf.train.start_queue_runners(sess)` to launch the threads.

The `Coordinator` class was a small synchronization primitive that let those threads stop together and propagate exceptions. A queue runner also ran a closer thread that automatically closed the queue if the coordinator reported an exception, so that the dequeue side would not block forever.

This architecture had pain points that drove its deprecation. Lock contention on the queue and the [Python](/wiki/python) Global Interpreter Lock limited multi-threaded producer throughput. Exceptions in producer threads surfaced through the coordinator in ways that were hard to debug. If a user forgot to call `tf.train.start_queue_runners`, the training loop would hang indefinitely on the first dequeue.

## tf.data.Dataset, the modern replacement

The `tf.data` API began as `tf.contrib.data` and graduated into the core package in TensorFlow 1.4, released on November 7, 2017, as a redesign of the input pipeline [3][8]. It became the default recommended approach in TensorFlow 2.0, released on September 30, 2019 [9]. A `tf.data.Dataset` represents a sequence of elements where each element consists of one or more tensor components. Pipelines are built by chaining transformations rather than wiring queues and threads by hand.

A basic image training pipeline in tf.data looks like this:

```python
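import tensorflow as tf

# image_paths, labels, load_and_preprocess, and model are assumed to be defined elsewhere.
dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))
dataset = dataset.shuffle(buffer_size=10000)  # shuffle cheap paths before decoding
dataset = dataset.map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.batch(32)
dataset = dataset.prefetch(tf.data.AUTOTUNE)  # overlap input work with training
model.fit(dataset, epochs=10)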
```

This short snippet replaces what would have been dozens of lines of queue, queue runner, and coordinator code in TensorFlow 1.x.

### Common dataset sources

A dataset can be constructed from `Dataset.from_tensor_slices` (slicing an in-memory tensor along its first dimension), `Dataset.from_generator` (wrapping a Python generator), `tf.data.TFRecordDataset` (reading TFRecord protocol buffer files, the format Google recommends for large datasets), `tf.data.TextLineDataset` (text files line by line), or `tf.data.experimental.make_csv_dataset` for CSV.

### Core transformations

| Method | What it does |
|---|---|
| `map(fn)` | Applies `fn` to each element. Used for decoding images, parsing TFRecords, or running augmentation. Pass `num_parallel_calls=tf.data.AUTOTUNE` to run the function on multiple CPU cores. |
| `batch(n)` | Stacks `n` consecutive elements into a single batched element. Pass `drop_remainder=True` to discard a smaller final batch so that shapes stay static. |
| `padded_batch(n, padded_shapes)` | A version of `batch` that pads variable-length elements, replacing the role of `PaddingFIFOQueue`. |
| `shuffle(buffer_size)` | Pulls items into a buffer of the given size and dequeues randomly. Larger buffers approximate true random shuffling more closely but use more memory. Replaces `RandomShuffleQueue`. |
| `repeat(count)` | Cycles the dataset, infinitely if no count is given. |
| `prefetch(n)` | Overlaps the work of producing elements with the work of consuming them, using a background thread and an internal buffer of size `n`. |
| `interleave(fn, cycle_length, num_parallel_calls)` | Reads from several files in parallel and interleaves their elements, useful for sharded TFRecords. |
| `cache()` | Caches the dataset in memory or on local disk so that expensive preprocessing only runs in the first epoch. |
| `filter(predicate)` | Keeps only elements for which the predicate returns true. |
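
Several of these transformations can be chained on a toy dataset to see what each one contributes. The sketch below uses `tf.data.Dataset.range` in place of real data; the variable names are illustrative, and calling `padded_batch` without `padded_shapes` assumes a recent TensorFlow 2.x release (2.2 or later).

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)                # 0, 1, ..., 9
ds = ds.filter(lambda x: x % 2 == 0)          # keep even values
ds = ds.map(lambda x: x * x, num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.batch(2, drop_remainder=True)         # drop the short final batch
ds = ds.prefetch(tf.data.AUTOTUNE)            # overlap producer and consumer
batches = [b.numpy().tolist() for b in ds]

# Variable-length elements: [1], [2, 2], [3, 3, 3]
ragged = tf.data.Dataset.range(1, 4).map(lambda n: tf.fill([n], n))
# padded_batch zero-pads each element to the longest one in the batch.
padded = [b.numpy().tolist() for b in ragged.padded_batch(3)]
```

Here `batches` holds `[[0, 4], [16, 36]]` (the fifth square, 64, is dropped by `drop_remainder=True`), and `padded` holds one batch, `[[1, 0, 0], [2, 2, 0], [3, 3, 3]]`.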

### prefetch and parallel mapping

Two transformations carry most of the performance weight in a tf.data pipeline.

The `prefetch` transformation decouples the time when data is produced from the time when data is consumed. The TensorFlow performance guide describes it this way: "While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`. Doing so reduces the step time to the maximum (as opposed to the sum) of the training and the time it takes to extract the data." [4] The guide adds that prefetch "provides benefits any time there is an opportunity to overlap the work of a 'producer' with a 'consumer'" [4]. This is the single biggest win in most input pipelines, and `prefetch(tf.data.AUTOTUNE)` should usually be the final transformation in the chain.

The `map` transformation accepts a `num_parallel_calls` argument that runs the user function on multiple CPU cores. For image pipelines that decode JPEGs and apply augmentations, this is often necessary to keep up with a GPU.

A common optimization is to vectorize the mapped function and apply it after `batch` rather than before, so the function is invoked once per batch instead of once per element and its per-call overhead is amortized. In the official TensorFlow performance guide example, vectorizing the mapped function in this way cuts execution time from about 0.240 seconds to about 0.050 seconds, an almost fivefold speedup [4].
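
The reordering looks like this on a toy dataset. The sketch only shows that the two orderings produce the same batches when the function works elementwise on tensors; it does not measure the speedup, and `increment` is a hypothetical stand-in for a real preprocessing function.

```python
import tensorflow as tf


def increment(x):
    # Works on a scalar and, unchanged, on a whole batch tensor.
    return x + 1


# map before batch: increment is invoked once per element (8 calls).
per_element = tf.data.Dataset.range(8).map(increment).batch(4)

# batch before map: increment is invoked once per batch (2 calls).
vectorized = tf.data.Dataset.range(8).batch(4).map(increment)

per_element_out = [b.numpy().tolist() for b in per_element]
vectorized_out = [b.numpy().tolist() for b in vectorized]
```

The swap is only valid when the function gives the same result on a batch as on its individual elements, which holds for elementwise arithmetic but not, for example, for code that assumes a fixed rank per example.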

### AUTOTUNE

`tf.data.AUTOTUNE` is a sentinel value that, in the words of the TensorFlow documentation, "will prompt the tf.data runtime to tune the value dynamically at runtime" based on hardware and pipeline characteristics [4]. It is supported by `prefetch`, `map`, and `interleave`. The runtime adjusts buffer sizes and thread counts during execution, trying to use the minimum buffer needed to keep the accelerator fed while respecting memory limits. In most cases it removes the need for manual tuning.

### Why tf.data replaced queues

Queue-based pipelines were difficult to compose, hard to debug, and dependent on Python threading with its GIL contention. Each queue and queue runner was an opaque stateful object that the graph optimizer could not rewrite. The tf.data API replaces those threads with C++ backed iterators and exposes the pipeline as a sequence of transformations the runtime can fuse and parallelize [3]. TensorFlow GitHub issue 23067 from 2018 documents edge cases where tf.data did not initially cover every queue use case [5].

## How does PyTorch handle queues?

[PyTorch](/wiki/pytorch) never had a queue API exposed at the user level in the way TensorFlow 1.x did. Instead it splits the work between two classes: `torch.utils.data.Dataset` and `torch.utils.data.DataLoader` [7].

| PyTorch class | Role |
|---|---|
| `Dataset` | Defines what the data is. Implements `__len__` and `__getitem__`. A custom subclass typically reads one example from disk or memory and returns a tensor. |
| `DataLoader` | Defines how to iterate. Wraps a Dataset and handles batching, shuffling, and parallel loading through worker processes. |

The DataLoader uses subprocess workers rather than threads to sidestep the GIL. `num_workers` controls how many subprocesses load data in the background; it defaults to 0, which means the data is loaded in the main process, and a positive value starts that many worker processes, which pull samples in parallel [7]. `pin_memory=True` places the returned tensors in pinned (page-locked) host memory so that transfers to the GPU can use asynchronous DMA, which is faster than copying from ordinary pageable memory [7]. `collate_fn` lets users customize how a list of samples is merged into a batch; for variable-length inputs it plays the role that `padded_batch` plays in tf.data.
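
A minimal sketch of the two classes working together, with a custom `collate_fn` that pads variable-length samples. `ToySequences` and `pad_collate` are hypothetical names, and `num_workers=0` keeps loading in the main process so the example stays simple and deterministic.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class ToySequences(Dataset):
    """Hypothetical dataset of integer sequences with lengths 1 to 4."""

    def __init__(self):
        self.items = [torch.arange(1, n + 1) for n in (1, 2, 3, 4)]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


def pad_collate(samples):
    # Zero-pad every sample to the longest sequence in this batch.
    return pad_sequence(samples, batch_first=True, padding_value=0)


loader = DataLoader(
    ToySequences(),
    batch_size=2,
    shuffle=False,
    num_workers=0,  # load in the main process
    collate_fn=pad_collate,
)
batches = [batch.tolist() for batch in loader]
```

The first batch is `[[1, 0], [1, 2]]` and the second is `[[1, 2, 3, 0], [1, 2, 3, 4]]`: padding is decided per batch, not across the whole dataset.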

The rough analogues are:

- `DataLoader(dataset, batch_size=32)` is roughly `dataset.batch(32)`.
- `DataLoader(..., shuffle=True)` is roughly `dataset.shuffle(...)` with full per-epoch shuffling.
- `DataLoader(..., num_workers=4)` is roughly `dataset.map(..., num_parallel_calls=4)` plus `prefetch`.
- `collate_fn` is roughly `padded_batch(..., padded_shapes=...)`.

A practical difference is shuffling semantics. PyTorch's DataLoader with `shuffle=True` shuffles all indices before each epoch, a true random permutation. TensorFlow's `Dataset.shuffle(buffer_size)` draws randomly from a fixed-size buffer that is refilled as elements are consumed, so its quality depends on the buffer size; a buffer at least as large as the dataset gives a full uniform shuffle but costs memory.
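
The effect of the buffer size can be seen in a pure-Python model of buffer-based shuffling. This is a simplified sketch of the general technique, not TensorFlow's implementation; `buffered_shuffle` and its parameters are hypothetical.

```python
import random


def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a buffer-limited random order."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # initial fill
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]        # emit a random buffered item
        buffer[idx] = item       # refill the slot with the new item
    rng.shuffle(buffer)          # input exhausted: drain the rest
    yield from buffer
```

With `buffer_size=1` the output equals the input order, and with a buffer at least as large as the input the result is a full random permutation. In between, an element can move forward by at most `buffer_size - 1` positions, which is why a small buffer shuffles a sorted dataset poorly.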

## When did queue-based input pipelines fall out of use?

| Year | Event |
|---|---|
| 2015 | TensorFlow 0.x ships with `FIFOQueue`, `RandomShuffleQueue`, and `QueueRunner`. |
| 2017 | TensorFlow 1.4 (November 7, 2017) introduces the `tf.data` API as the new official input pipeline [3]. |
| 2018 | `QueueRunner` is officially deprecated, directing users to `tf.data` [2]. |
| 2019 | TensorFlow 2.0 (September 30, 2019) ships with `tf.data` as default; the queue runner APIs move to `tf.compat.v1.train`, while the queue classes remain available under `tf.queue` [3]. |

## Explain like I'm 5 (ELI5)

Imagine you are at a playground, and there is a slide that everyone wants to use. The kids form a line, waiting for their turn. The first kid in the line goes down the slide, and then the next kid in line takes their turn. This line of kids is similar to a queue in machine learning.

In machine learning, queues help to organize and manage data or tasks that need to be processed. They make sure everything happens in the right order and that nothing gets missed. Queues are used in different parts of machine learning, like when preparing data, training a model, or giving answers to questions.

In modern TensorFlow you usually do not see the queue itself. You write something like `dataset.shuffle(10000).batch(32).prefetch(AUTOTUNE)` and the library builds the queue for you. The line of kids is still there, you just do not have to point at each one and tell them when to move.

## References

1. TensorFlow API docs, `tf.queue.FIFOQueue`, `tf.queue.PaddingFIFOQueue`, `tf.queue.PriorityQueue`, `tf.queue.RandomShuffleQueue`. https://www.tensorflow.org/api_docs/python/tf/queue
2. TensorFlow API docs, `tf.compat.v1.train.QueueRunner`. https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/QueueRunner
3. TensorFlow Core guide, "tf.data: Build TensorFlow input pipelines". https://www.tensorflow.org/guide/data
4. TensorFlow Core guide, "Better performance with the tf.data API". https://www.tensorflow.org/guide/data_performance
5. TensorFlow GitHub issue 23067, "QueueRunner going towards deprecation, but tf.data does not replace all usecases?" https://github.com/tensorflow/tensorflow/issues/23067
6. TensorFlow GitHub issue 7951, "Redesigning TensorFlow's input pipelines". https://github.com/tensorflow/tensorflow/issues/7951
7. PyTorch documentation, `torch.utils.data.DataLoader`. https://pytorch.org/docs/stable/data.html
8. "Announcing TensorFlow r1.4", Google Developers Blog, November 7, 2017. https://developers.googleblog.com/en/announcing-tensorflow-r14/
9. "TensorFlow 2.0 is now available!", The TensorFlow Blog, September 30, 2019. https://blog.tensorflow.org/2019/09/tensorflow-20-is-now-available.html