# Queue

> Source: https://aiwiki.ai/wiki/queue
> Updated: 2026-05-11
> Categories: Machine Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

## Queue in machine learning

A queue, in the context of machine learning, is a data structure that follows the First-In-First-Out (FIFO) principle and is used to store and manage data while machine learning tasks are processed. Elements are removed from the queue in the order they were inserted. Queues appear at several stages of the machine learning process, such as data preprocessing, model training, and serving predictions.

In deep learning frameworks, queues historically served as the backbone of asynchronous input pipelines. Producer threads write training examples into a shared buffer while the consumer, the training loop, reads mini-batches out. This decoupling lets the [GPU](/wiki/gpu) stay busy while the CPU reads files from disk and applies augmentations. Modern frameworks have moved away from explicit queue objects in user code, but the underlying idea of a thread-safe FIFO buffer between an I/O stage and a compute stage still drives [TensorFlow](/wiki/tensorflow)'s tf.data and [PyTorch](/wiki/pytorch)'s DataLoader under the hood.
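
The producer and consumer arrangement described above can be sketched with Python's standard library alone. This is a minimal illustration, not how any framework implements its pipeline; the `SENTINEL` end-of-stream marker and the function names are assumptions made for the example.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (an assumption of this sketch)

def producer(buf, examples):
    """Simulates the I/O stage: pushes examples into the shared buffer."""
    for example in examples:
        buf.put(example)       # blocks while the buffer is full
    buf.put(SENTINEL)

def consume_batches(buf, batch_size):
    """Simulates the training loop: pulls examples and groups them into mini-batches."""
    batches, batch = [], []
    while True:
        item = buf.get()       # blocks while the buffer is empty
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)  # final, smaller batch
    return batches

buf = queue.Queue(maxsize=4)   # bounded, thread-safe FIFO buffer
worker = threading.Thread(target=producer, args=(buf, range(10)))
worker.start()
batches = consume_batches(buf, batch_size=4)
worker.join()
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The bounded `maxsize` provides backpressure: a fast producer pauses instead of filling memory, and a fast consumer waits instead of reading stale data.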

### Data preprocessing

In [data preprocessing](/wiki/data_preprocessing), queues are used to manage the flow of data as it is transformed and prepared for training or testing a machine learning model. Data is often fed to a machine learning algorithm in batches, and using a queue helps ensure that the data is presented in a consistent and orderly manner. For example, a queue can be used to manage the order in which data is loaded and preprocessed for [data augmentation](/wiki/data_augmentation), a technique used to increase the amount and diversity of training data.

### Model training

During model training, queues can be employed to manage the flow of [mini-batches](/wiki/mini_batch) of data to the algorithm, as well as to manage the processing of multiple threads or processes in parallel. Queues help balance the workload between different hardware resources, such as CPUs or GPUs, and can be especially useful in distributed machine learning systems, where multiple nodes work together to train a model. The use of queues in model training can lead to improved performance and reduced training time by minimizing the idle time of hardware resources.

### Serving predictions

In the context of serving predictions, queues can be utilized to manage incoming requests for predictions from a deployed machine learning model. When a model is deployed as a service or an API, it is common for multiple requests to be sent simultaneously. Queues can help maintain the order of incoming requests, manage the load on the model, and ensure that responses are sent back to the users in a timely manner. This is particularly important for applications where response time and throughput are critical, such as in real-time decision making or high-traffic web services.
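
As a rough sketch of the idea, a bounded request queue can preserve arrival order and reject work when the service is saturated. The `submit`, `drain`, and `toy_model` names below are hypothetical and stand in for whatever a real serving stack provides.

```python
import queue

def submit(requests, request):
    """Try to enqueue a request; return False if the queue is full."""
    try:
        requests.put_nowait(request)
        return True
    except queue.Full:
        return False

def drain(requests, predict):
    """Serve queued requests in arrival order and return the responses."""
    responses = []
    while True:
        try:
            request = requests.get_nowait()
        except queue.Empty:
            return responses
        responses.append((request["id"], predict(request["x"])))

def toy_model(x):
    """Hypothetical stand-in for a deployed model."""
    return 2 * x

pending = queue.Queue(maxsize=2)  # at most two requests may wait
accepted = [submit(pending, {"id": i, "x": i}) for i in range(3)]
responses = drain(pending, toy_model)
print(accepted)   # [True, True, False]
print(responses)  # [(0, 0), (1, 2)]
```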

## Queues in TensorFlow (tf.queue)

TensorFlow exposes an explicit queue API in the `tf.queue` module. In TensorFlow 1.x, queues were the primary mechanism for building input pipelines. In TensorFlow 2.x the queue classes are still available, but the helper that filled them, `tf.compat.v1.train.QueueRunner`, is deprecated with a notice telling users to switch to `tf.data` for input pipelines.

A queue in TensorFlow is itself a node in the computation graph. It is a stateful node, much like a variable: other nodes in the graph can enqueue new items into the queue or dequeue existing items from it. Because the queue lives in the graph rather than in Python, enqueue and dequeue ops can run on different devices, including remote workers in a distributed setup.

### Queue types in tf.queue

The `tf.queue` module provides four queue classes, each with a different ordering policy.

| Class | Behavior |
|---|---|
| `tf.queue.FIFOQueue` | Dequeues elements in first-in, first-out order. The simplest case, useful when input ordering should be preserved. |
| `tf.queue.PaddingFIFOQueue` | A FIFOQueue that supports variable-sized tensors by zero-padding shorter elements up to the longest in the batch when `dequeue_many` is called. Designed for mini-batch training on inputs of different lengths, like sentences or audio clips. |
| `tf.queue.PriorityQueue` | Dequeues elements according to a 64-bit integer priority value supplied with each enqueue. Lower priority values come out first. |
| `tf.queue.RandomShuffleQueue` | Dequeues a random element from the buffer. Used to shuffle training examples without holding the entire dataset in memory. It requires a `min_after_dequeue` parameter so that there is enough material in the buffer for randomness to be meaningful. |

Each queue is constructed with a fixed `capacity` and one or more `dtypes`. Items go in with `enqueue` or `enqueue_many` and come out with `dequeue` or `dequeue_many`. Closing a queue with `close()` signals no more items will arrive, after which pending dequeues raise `OutOfRangeError` when the buffer drains.
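
A minimal sketch of this lifecycle, assuming TensorFlow 2.x with eager execution enabled (the values are arbitrary):

```python
import tensorflow as tf

# A FIFO queue that holds at most 4 scalar int32 elements.
q = tf.queue.FIFOQueue(capacity=4, dtypes=[tf.int32], shapes=[()])

# Enqueue three elements at once; the leading dimension indexes the elements.
q.enqueue_many(tf.constant([10, 20, 30], dtype=tf.int32))

first = q.dequeue()        # the oldest element leaves first
rest = q.dequeue_many(2)   # the next two, stacked into one tensor

q.close()                  # signal that no more items will arrive
try:
    q.dequeue()            # the queue is closed and empty
    drained = False
except tf.errors.OutOfRangeError:
    drained = True

print(first.numpy(), rest.numpy().tolist(), drained)
```

Note that dequeuing from an empty queue that has not been closed blocks, so the `close()` call is what turns the final dequeue into an error rather than a hang.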

### Queue runners and coordinators

TensorFlow 1.x paired these queues with two helper classes that managed the threads filling them.

The `QueueRunner` class created a group of threads that repeatedly ran a given enqueue operation. A typical training graph would build a `RandomShuffleQueue`, define an enqueue op that reads a record from disk and decodes it, attach a `QueueRunner` with a few threads, and then call `tf.train.start_queue_runners(sess)` to launch the threads.

The `Coordinator` class was a small synchronization primitive that let those threads stop together and propagate exceptions. A queue runner also ran a closer thread that automatically closed the queue if the coordinator reported an exception, so that the dequeue side would not block forever.
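
The stop-together and close-on-error behaviour can be illustrated with plain Python threads. The sketch below is an analogue of the pattern, not TensorFlow's implementation; `MiniCoordinator`, `enqueue_loop`, and the `CLOSED` sentinel are hypothetical names introduced for the example.

```python
import queue
import threading

CLOSED = object()  # stands in for "the queue was closed" (an assumption of this sketch)

class MiniCoordinator:
    """Toy coordinator: a shared stop flag plus the first reported error."""

    def __init__(self):
        self._stop = threading.Event()
        self.error = None

    def should_stop(self):
        return self._stop.is_set()

    def request_stop(self, error=None):
        if error is not None and self.error is None:
            self.error = error
        self._stop.set()

def enqueue_loop(coord, buf, read_record):
    """Repeatedly runs one 'enqueue op' until told to stop or the reader fails."""
    try:
        while not coord.should_stop():
            buf.put(read_record())
    except Exception as exc:
        coord.request_stop(exc)  # report the failure instead of dying silently
    finally:
        buf.put(CLOSED)          # "close" the queue so the consumer cannot block forever

records = [1, 2, 3]

def read_record():
    return records.pop(0)        # raises IndexError once the list is empty

coord = MiniCoordinator()
buf = queue.Queue()
thread = threading.Thread(target=enqueue_loop, args=(coord, buf, read_record))
thread.start()

consumed = []
while True:
    item = buf.get()
    if item is CLOSED:
        break
    consumed.append(item)
thread.join()
print(consumed, type(coord.error).__name__)  # [1, 2, 3] IndexError
```

Without the `finally` clause the consumer would wait on `buf.get()` indefinitely after the producer failed, which is the hang the closer thread was meant to prevent.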

This architecture had pain points that drove its deprecation. Lock contention on the queue and the [Python](/wiki/python) Global Interpreter Lock limited multi-threaded producer throughput. Exceptions in producer threads surfaced through the coordinator in ways that were hard to debug. If a user forgot to call `tf.train.start_queue_runners`, the training loop would hang indefinitely on the first dequeue.

## tf.data.Dataset, the modern replacement

TensorFlow introduced the `tf.data` API in TensorFlow 1.4 as a clean-sheet redesign of the input pipeline, and made it the default recommended approach in TensorFlow 2.0. A `tf.data.Dataset` represents a sequence of elements where each element consists of one or more tensor components. Pipelines are built by chaining transformations rather than wiring queues and threads by hand.

A basic image training pipeline in tf.data looks like this:

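```python
import tensorflow as tf

# image_paths, labels, load_and_preprocess, and model are assumed to be defined elsewhere.
dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))
dataset = dataset.shuffle(buffer_size=10000)  # shuffle the cheap (path, label) pairs
dataset = dataset.map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)  # decode in parallel
dataset = dataset.batch(32)
dataset = dataset.prefetch(tf.data.AUTOTUNE)  # overlap input work with training
model.fit(dataset, epochs=10)
```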

This short snippet replaces what would have been dozens of lines of queue, queue runner, and coordinator code in TensorFlow 1.x.

### Common dataset sources

A dataset can be constructed from `Dataset.from_tensor_slices` (slicing an in-memory tensor along its first dimension), `Dataset.from_generator` (wrapping a Python generator), `tf.data.TFRecordDataset` (reading TFRecord protocol buffer files, the format Google recommends for large datasets), `tf.data.TextLineDataset` (text files line by line), or `tf.data.experimental.make_csv_dataset` for CSV.
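
A small sketch of the two in-memory sources, assuming TensorFlow 2.x; the file-based sources work the same way but need files on disk. The `count_up` generator and the toy values are made up for illustration.

```python
import tensorflow as tf

# In-memory data sliced along the first dimension: one (feature, label) pair per element.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
labels = [0, 1, 0]
from_memory = tf.data.Dataset.from_tensor_slices((features, labels))

def count_up():
    """Hypothetical generator used only for illustration."""
    for i in range(3):
        yield i

# output_signature declares the dtype and shape of each generated element.
from_gen = tf.data.Dataset.from_generator(
    count_up, output_signature=tf.TensorSpec(shape=(), dtype=tf.int32))

first_feature, first_label = next(iter(from_memory))
generated = [int(x) for x in from_gen.as_numpy_iterator()]
print(first_feature.numpy().tolist(), int(first_label.numpy()))  # [1.0, 2.0] 0
print(generated)  # [0, 1, 2]
```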

### Core transformations

| Method | What it does |
|---|---|
| `map(fn)` | Applies `fn` to each element. Used for decoding images, parsing TFRecords, or running augmentation. Pass `num_parallel_calls=tf.data.AUTOTUNE` to run the function on multiple CPU cores. |
| `batch(n)` | Stacks `n` consecutive elements into a single batched element. Pass `drop_remainder=True` to discard a smaller final batch so that shapes stay static. |
| `padded_batch(n, padded_shapes)` | A version of `batch` that pads variable-length elements, replacing the role of `PaddingFIFOQueue`. |
| `shuffle(buffer_size)` | Pulls items into a buffer of the given size and dequeues randomly. Larger buffers approximate true random shuffling more closely but use more memory. Replaces `RandomShuffleQueue`. |
| `repeat(count)` | Cycles the dataset, infinitely if no count is given. |
| `prefetch(n)` | Overlaps the work of producing elements with the work of consuming them, using a background thread and an internal buffer of size `n`. |
| `interleave(fn, cycle_length, num_parallel_calls)` | Reads from several files in parallel and interleaves their elements, useful for sharded TFRecords. |
| `cache()` | Caches the dataset in memory or on local disk so that expensive preprocessing only runs in the first epoch. |
| `filter(predicate)` | Keeps only elements for which the predicate returns true. |
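
Several of these transformations can be chained on a toy dataset. The following is a sketch assuming TensorFlow 2.x; the numbers carry no meaning beyond making the effect of each step visible.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)           # 0, 1, ..., 9
ds = ds.filter(lambda x: x % 2 == 0)     # keep even numbers: 0, 2, 4, 6, 8
ds = ds.map(lambda x: x * x)             # square each element
ds = ds.batch(2, drop_remainder=True)    # pairs; the leftover element is dropped
batches = [b.tolist() for b in ds.as_numpy_iterator()]
print(batches)  # [[0, 4], [16, 36]]

# Variable-length sequences, padded with zeros to the longest one in each batch.
seqs = tf.data.Dataset.from_generator(
    lambda: iter([[1], [2, 3], [4, 5, 6]]),
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32))
padded = [b.tolist() for b in seqs.padded_batch(2, padded_shapes=[None]).as_numpy_iterator()]
print(padded)  # [[[1, 0], [2, 3]], [[4, 5, 6]]]
```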

### prefetch and parallel mapping

Two transformations carry most of the performance weight in a tf.data pipeline.

The `prefetch` transformation decouples the time when data is produced from the time when data is consumed. The TensorFlow documentation describes it this way: while the model executes training step `s`, the input pipeline reads data for step `s+1`. The training step time becomes the maximum of model time and input pipeline time, instead of their sum. This is often the single biggest win in an input pipeline, and `prefetch(tf.data.AUTOTUNE)` should usually be the final transformation in the chain.
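
The max-versus-sum claim can be checked with a little arithmetic. The function below is an idealized model (constant per-step times, no scheduling overhead), and the millisecond figures are invented for the example rather than measured.

```python
def total_time(steps, input_time, model_time, prefetch):
    """Idealized wall-clock time for `steps` training steps."""
    if not prefetch:
        # Sequential: every step waits for its input, then computes.
        return steps * (input_time + model_time)
    # Overlapped: the first batch must be produced up front, after which the
    # slower of the two stages sets the pace; the last step still has to compute.
    return input_time + (steps - 1) * max(input_time, model_time) + model_time

# Hypothetical timings in milliseconds: 30 ms to load a batch, 50 ms to train on it.
sequential = total_time(100, input_time=30, model_time=50, prefetch=False)
overlapped = total_time(100, input_time=30, model_time=50, prefetch=True)
print(sequential, overlapped)  # 8000 5030
```

In this model the per-step cost drops from 80 ms to roughly 50 ms, the larger of the two stage times.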

The `map` transformation accepts a `num_parallel_calls` argument that runs the user function on multiple CPU cores. For image pipelines that decode JPEGs and apply augmentations, this is often necessary to keep up with a GPU.

A common optimization is to vectorize the mapped function and apply it after `batch` rather than before, so that it runs once per batch instead of once per element. This helps most when the function itself is cheap and the per-call overhead dominates. The official TensorFlow performance guide reports a roughly five-fold drop in execution time from this change in their example pipeline.
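
A sketch of the reordering, assuming TensorFlow 2.x and a function that works equally well on a single element and on a whole batch (the `increment` function is a placeholder):

```python
import tensorflow as tf

def increment(x):
    """Element-wise, so it accepts a scalar or a whole batch."""
    return x + 1

ds = tf.data.Dataset.range(8)
per_element = ds.map(increment).batch(4)  # increment is invoked for each of the 8 elements
per_batch = ds.batch(4).map(increment)    # increment is invoked for each of the 2 batches

a = [b.tolist() for b in per_element.as_numpy_iterator()]
b = [b.tolist() for b in per_batch.as_numpy_iterator()]
print(a == b, b)  # True [[1, 2, 3, 4], [5, 6, 7, 8]]
```

The two pipelines produce identical batches; only the number of function invocations differs. The reordering is valid only when the function gives the same result on a batch as on its individual elements.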

### AUTOTUNE

`tf.data.AUTOTUNE` is a sentinel value that tells the tf.data runtime to choose values dynamically based on hardware and pipeline characteristics. It is supported by `prefetch`, `map`, and `interleave`. The runtime adjusts buffer sizes and thread counts during execution, trying to use the minimum buffer needed to keep the accelerator fed while respecting memory limits. In most cases it removes the need for manual tuning.

### Why tf.data replaced queues

Queue-based pipelines were difficult to compose, hard to debug, and dependent on Python threading with its GIL contention. Each queue and queue runner was an opaque stateful object that the graph optimizer could not rewrite. The tf.data API replaces those threads with C++-backed iterators and exposes the pipeline as a sequence of transformations that the runtime can fuse and parallelize. TensorFlow GitHub issue 23067 from 2018 documents edge cases where tf.data did not initially cover every queue use case.

## PyTorch equivalents

[PyTorch](/wiki/pytorch) never had a queue API exposed at the user level in the way TensorFlow 1.x did. Instead it splits the work between two classes: `torch.utils.data.Dataset` and `torch.utils.data.DataLoader`.

| PyTorch class | Role |
|---|---|
| `Dataset` | Defines what the data is. A map-style dataset implements `__len__` and `__getitem__`. A custom subclass typically reads one example from disk or memory and returns a tensor. |
| `DataLoader` | Defines how to iterate. Wraps a Dataset and handles batching, shuffling, and parallel loading through worker processes. |

The DataLoader uses subprocess workers rather than threads to sidestep the GIL. `num_workers` controls how many subprocesses load data in the background; `num_workers=0` keeps everything in the main process, and positive values start that many worker processes, which pull samples in parallel. `pin_memory=True` allocates output tensors in pinned host memory so that transfers to the GPU can use asynchronous DMA. `collate_fn` lets users customize how a list of samples is merged into a batch, which covers the role that `padded_batch` plays for variable-length inputs.
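
A minimal sketch of a custom `Dataset` with a padding `collate_fn`. The `ToySequences` class and `pad_collate` function are hypothetical, and `num_workers=0` is used so the example runs without spawning subprocesses.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

class ToySequences(Dataset):
    """Hypothetical map-style dataset of variable-length integer sequences."""

    def __init__(self):
        self.data = [[1], [2, 3], [4, 5, 6], [7, 8]]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return torch.tensor(self.data[idx])

def pad_collate(samples):
    """Merge a list of 1-D tensors into one zero-padded 2-D batch."""
    return pad_sequence(samples, batch_first=True, padding_value=0)

loader = DataLoader(ToySequences(), batch_size=2, shuffle=False,
                    num_workers=0, collate_fn=pad_collate)
batches = [batch.tolist() for batch in loader]
print(batches)  # [[[1, 0], [2, 3]], [[4, 5, 6], [7, 8, 0]]]
```

With a positive `num_workers`, the same code should be placed under an `if __name__ == "__main__":` guard on platforms that start worker processes by spawning.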

The analogues are clear:

- `DataLoader(dataset, batch_size=32)` is roughly `dataset.batch(32)`.
- `DataLoader(..., shuffle=True)` is roughly `dataset.shuffle(...)` with full per-epoch shuffling.
- `DataLoader(..., num_workers=4)` is roughly `dataset.map(..., num_parallel_calls=4)` plus `prefetch`.
- `collate_fn` is roughly `padded_batch(..., padded_shapes=...)`.

A practical difference is shuffling semantics. PyTorch's DataLoader with `shuffle=True` shuffles all indices before each epoch, a true random permutation. TensorFlow's `Dataset.shuffle(buffer_size)` is a sliding-window shuffle whose quality depends on buffer size: a small buffer only mixes nearby elements, while a buffer at least as large as the dataset gives a full uniform shuffle at the cost of memory.
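
The effect of the buffer size can be seen in a pure-Python sketch of a shuffle buffer. This illustrates the general technique under the assumption of a simple fill-then-replace buffer; it is not TensorFlow's implementation, and `buffer_shuffle` is a name made up for the example.

```python
import random

def buffer_shuffle(items, buffer_size, seed=None):
    """Yield items in the order a fixed-size shuffle buffer would emit them."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: emit a random occupant to make room for the next item.
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:
        # Input is exhausted: drain what is left, still in random order.
        yield buffer.pop(rng.randrange(len(buffer)))

in_order = list(buffer_shuffle(range(10), buffer_size=1))
local = list(buffer_shuffle(range(10), buffer_size=3, seed=0))
full = list(buffer_shuffle(range(10), buffer_size=10, seed=0))
print(in_order)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With `buffer_size=1` the order is unchanged. With `buffer_size=3` an element can move at most two positions earlier than where it started, so a dataset sorted by label stays largely sorted. Only a buffer covering the whole dataset lets any element land anywhere.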

## A short timeline

| Year | Event |
|---|---|
| 2015 | TensorFlow 0.x ships with `FIFOQueue`, `RandomShuffleQueue`, and `QueueRunner`. |
| 2017 | TensorFlow 1.4 introduces the `tf.data` API as the new official input pipeline. |
| 2018 | `QueueRunner` is officially deprecated, directing users to `tf.data`. |
| 2019 | TensorFlow 2.0 ships with `tf.data` as default; the queue classes remain under `tf.queue`, and the queue runner APIs move to `tf.compat.v1`. |

## Explain like I'm 5 (ELI5)

Imagine you are at a playground, and there is a slide that everyone wants to use. The kids form a line, waiting for their turn. The first kid in the line goes down the slide, and then the next kid in line takes their turn. This line of kids is similar to a queue in machine learning.

In machine learning, queues help to organize and manage data or tasks that need to be processed. They make sure everything happens in the right order and that nothing gets missed. Queues are used in different parts of machine learning, like when preparing data, training a model, or giving answers to questions.

In modern TensorFlow you usually do not see the queue itself. You write something like `dataset.shuffle(10000).batch(32).prefetch(AUTOTUNE)` and the library builds the queue for you. The line of kids is still there, you just do not have to point at each one and tell them when to move.

## References

1. TensorFlow API docs, `tf.queue.FIFOQueue`, `tf.queue.PaddingFIFOQueue`, `tf.queue.PriorityQueue`, `tf.queue.RandomShuffleQueue`. https://www.tensorflow.org/api_docs/python/tf/queue
2. TensorFlow API docs, `tf.compat.v1.train.QueueRunner`. https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/QueueRunner
3. TensorFlow Core guide, "tf.data: Build TensorFlow input pipelines". https://www.tensorflow.org/guide/data
4. TensorFlow Core guide, "Better performance with the tf.data API". https://www.tensorflow.org/guide/data_performance
5. TensorFlow GitHub issue 23067, "QueueRunner going towards deprecation, but tf.data does not replace all usecases?" https://github.com/tensorflow/tensorflow/issues/23067
6. TensorFlow GitHub issue 7951, "Redesigning TensorFlow's input pipelines". https://github.com/tensorflow/tensorflow/issues/7951
7. PyTorch documentation, `torch.utils.data.DataLoader`. https://pytorch.org/docs/stable/data.html
8. "TensorFlow and Queues", plan space, March 27, 2017. https://planspace.org/20170327-tensorflow_and_queues/
9. "TensorFlow's QueueRunner", plan space, April 30, 2017. https://planspace.org/20170430-tensorflows_queuerunner/
