Inference



In the field of artificial intelligence (AI), inference refers to the process of using a trained neural network model to make a prediction or draw a conclusion from new, previously unseen data.[1][2] It is the operational or "doing" phase of the AI lifecycle, where the model applies the knowledge and patterns it learned during AI training to produce real-world results.[3] If training is analogous to teaching an AI a new skill, inference is the AI actually using that skill to perform a task.[1] This process is fundamental to nearly all practical applications of AI, from identifying objects in photos and translating languages to powering generative AI systems.[4]

Inference is distinct from the training phase, which is a computationally intensive process focused on building an accurate model. While training represents a largely one-time computational investment, inference runs continuously, accounting for an estimated 80-90% of AI's operational costs and delivering most of its lifetime value.[5] Individual inference operations are typically much faster than training, but they must often be executed at massive scale and with very low latency to be useful in real-time applications.[3] The efficiency and performance of the inference process are therefore critical to the successful deployment of AI systems and have driven a vast field of optimization work in both software and hardware.

History

The concept of inference in AI traces its roots to early efforts in formal reasoning and symbolic manipulation. Philosophers like Aristotle developed syllogistic logic, while later thinkers such as Ramon Llull (1232–1315) and Gottfried Leibniz envisioned mechanical systems for logical deduction.[6] In the 20th century, breakthroughs in mathematical logic by figures like Alan Turing, Kurt Gödel, and Alonzo Church laid the groundwork for mechanized reasoning, culminating in the Church-Turing thesis, which suggested that any mathematical deduction could be performed by a machine.[6]

Modern AI inference began in the 1950s with symbolic AI. The Logic Theorist, developed by Allen Newell and Herbert A. Simon in 1955, was one of the first programs to perform automated theorem proving, using heuristic search to infer proofs from axioms.[6][7] Presented at the 1956 Dartmouth Workshop—the birthplace of AI—this program demonstrated inference through step-by-step deduction, proving 38 theorems from Russell and Whitehead's Principia Mathematica.[6]

In the 1960s, inference expanded with neural networks. Walter Pitts and Warren McCulloch's 1943 model of artificial neurons influenced early work, but Frank Rosenblatt's Perceptron (1958) introduced pattern-based inference for classification tasks.[6] Systems like ADALINE (1960) and MADALINE (1962) by Bernard Widrow advanced adaptive inference.[6] However, Marvin Minsky and Seymour Papert's 1969 book Perceptrons highlighted limitations, leading to a decline in neural network funding during the first AI winter.[6]

The 1970s saw the rise of expert systems, where an inference engine applied rules to a knowledge base for domain-specific reasoning.[8] DENDRAL (1965–1983), developed at Stanford, was the first expert system, using inference to analyze mass spectrometry data for organic chemistry.[7][9] MYCIN (1972), another Stanford project, inferred bacterial infection diagnoses using backward chaining.[10][11] Edward Feigenbaum championed expert systems, emphasizing knowledge engineering.[12]

The 1980s brought commercial expert systems, with inference engines like EMYCIN (from MYCIN) enabling reusable frameworks.[8] Systems such as XCON (for Digital Equipment Corporation) used forward chaining for configuration tasks.[13] However, maintenance challenges and the "knowledge acquisition bottleneck" led to their decline by the late 1980s.[14]

Neural networks revived in the 1980s with backpropagation, popularized by Geoffrey Hinton and David Rumelhart in 1986, enabling multi-layer networks for more complex inference.[6] Yann LeCun's convolutional neural networks (1990) applied inference to handwriting recognition.[6]

The 2010s marked the era of deep learning inference, fueled by big data and GPUs. AlexNet (2012) demonstrated superior image inference.[6] The Transformer architecture (2017) revolutionized natural language processing inference.[6] Models like GPT-3 (2020) and ChatGPT (2022) showcased generative inference, while OpenAI's o1 (2024) advanced reasoning inference.[6]

Definition and Role in the AI Lifecycle

Inference is the execution phase in the lifecycle of an AI model, where it moves from a state of learning to a state of practical application. It is the point at which a model, having been trained on a large dataset to recognize patterns and relationships, is deployed to draw conclusions from new information it has not previously encountered.[4][15] This capability to generalize from training data to new inputs is the core function of inference and is where AI delivers its primary business value.[3]

The entire process of bringing an AI model into production involves several key stages, with inference being the final operational step. A typical workflow includes preparing data, selecting and training a model, monitoring its outputs for accuracy and bias, and finally deploying it for inference.[4] This deployment is often managed through a process called AI serving, which involves packaging the model and exposing it via an API to handle live requests.[3]

Distinction from Training, Fine-Tuning, and Serving

Understanding the role of inference requires distinguishing it from other critical stages in the AI model lifecycle: AI training, fine-tuning, and AI serving. Each stage has a different objective, process, data requirement, and business focus.[3]

  • AI Training is the foundational learning phase. It is a highly resource-intensive process where a model is built from scratch by iteratively analyzing a massive, historical dataset to learn patterns. The primary goal of training is to create a model that is accurate and capable. This phase can take anywhere from hours to weeks and requires powerful hardware accelerators like GPUs.[3]
  • AI Fine-Tuning is an optimization of the training process. Instead of building a model from scratch, it takes a powerful, pre-trained model and adapts it for a more specific task. This is achieved by continuing the training process on a smaller, specialized dataset. Fine-tuning saves significant time, computational resources, and cost compared to full training.[3]
  • AI Inference is the execution phase. It uses the fully trained and fine-tuned model to make fast predictions on new, "unseen" data. Each individual prediction is far less computationally demanding than a training iteration, but delivering millions of predictions in real-time requires a highly optimized and scalable infrastructure. The business focus shifts from model accuracy to operational metrics like speed (latency), scale, and cost-efficiency.[3]
  • AI Serving is the operational infrastructure that makes inference possible at scale. It involves deploying and managing the model, typically by packaging it, setting up an API endpoint, and managing the underlying infrastructure to handle incoming requests reliably and efficiently.[3]

The fundamental dichotomy between the computational profiles of training and inference is a primary driver for nearly all specialized fields related to AI deployment. Training is a large-scale, offline, parallel process optimized for throughput over long periods, whereas inference is often an online, real-time process optimized for the lowest possible latency on a single input. This difference in objectives necessitates entirely different approaches to hardware and software. The need for low-latency inference, for example, has directly led to the development of specialized hardware and ASICs like TPUs, which are designed to accelerate the specific mathematical operations used in a forward pass. Similarly, the need to deploy models on resource-constrained edge devices has spurred the creation of model compression techniques like quantization and pruning, which are applied post-training to create a smaller, faster model suitable for an efficient inference environment. This distinction is not merely definitional; it is the root cause that gives rise to the entire ecosystem of technologies and practices surrounding AI deployment, including the field of MLOps.

Comparison of AI Lifecycle Stages[3]
Stage | Objective | Process | Data | Business Focus
Training | Build a new model from scratch. | Iteratively learns from a large dataset. | Large, historical, labeled datasets. | Model accuracy and capability.
Fine-Tuning | Adapt a pre-trained model for a specific task. | Refines an existing model with a smaller dataset. | Smaller, task-specific datasets. | Efficiency and customization.
Inference | Use a trained model to make predictions. | A single, fast "forward pass" of new data. | Live, real-world, unlabeled data. | Speed (latency), scale, and cost-efficiency.
Serving | Deploy and manage the model to handle inference requests. | Package the model and expose it as an API. | N/A | Reliability, scalability, and manageability.

The Mechanics of Inference

At a technical level, inference in deep learning models is executed through a process known as forward propagation or a forward pass.[16] This is the mechanism by which a neural network takes an input and processes it through its layers to produce an output.[17] During inference, the model's learned parameters—its weights and biases—are frozen. The forward pass is therefore a "read-only" operation where the model applies its fixed knowledge without any learning or parameter updates occurring.[3] This is in direct contrast to the training process, which involves both a forward pass to generate a prediction and a backward pass (backpropagation) to calculate the error and update the model's weights.[17]

The sequential and computationally deterministic nature of the forward pass is what makes inference a prime target for optimization. Because the sequence of mathematical operations is fixed once a model is trained, it becomes a predictable computational graph. This predictability allows for the creation of highly specialized compilers and runtimes, such as NVIDIA TensorRT, which can analyze this graph and apply optimizations like fusing multiple layers into a single operation, selecting the most efficient mathematical kernels for the target hardware, and converting model weights to lower-precision formats.[18] Furthermore, while the layers are processed sequentially, the computations within each layer, such as matrix multiplications, are massively parallel. This inherent parallelism is why hardware like GPUs, with their thousands of cores, are exceptionally well-suited for accelerating inference workloads.[19] The efficiency of modern AI inference is therefore a result of co-designing software and hardware to perfectly execute this fixed, parallelizable sequence of operations defined by the forward pass.
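
As a minimal illustration of this "read-only" forward pass, the sketch below (assuming PyTorch and a purely hypothetical two-layer network) disables gradient tracking and switches the model to evaluation mode, so the pass applies fixed weights without any parameter updates:

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for an already-trained network.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 3),
)
model.eval()  # switch layers such as dropout/batch-norm to inference behavior

x = torch.randn(1, 4)            # one new, "unseen" input
with torch.no_grad():            # no gradients are tracked: weights stay frozen
    logits = model(x)            # the forward pass
    probs = torch.softmax(logits, dim=-1)
print(probs)                     # a probability distribution over 3 classes
```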

Step-by-Step Breakdown

The inference process can be broken down into three main steps:[3]

  1. Input Data Preparation: Before a model can process new data, that data must be converted into a format it understands. This preprocessing step ensures the input matches the format the model was trained on. For an image classification model, this might involve resizing an image to specific dimensions (for example 224x224 pixels) and normalizing its pixel values.[3] For a Large Language Model (LLM), this involves tokenization, where a text prompt is broken down into a sequence of numerical tokens that the model can interpret.[2]
  2. Model Execution (Forward Pass): The preprocessed input data is fed into the first layer of the neural network. The data then flows sequentially through each subsequent layer.
    • At each neuron in a layer, a linear operation is performed: a weighted sum of the inputs from the previous layer is calculated, and a bias term is added. This result is often called the "pre-activation" or "logit".[17]
    • This linear result is then passed through a non-linear activation function (such as ReLU, Sigmoid, or tanh). This non-linearity is crucial, as it allows the network to learn and represent complex, non-linear patterns in the data.[17][20]
    • The output of the activation function in one layer becomes the input for the next layer, and this process continues until the data reaches the final layer of the network.[21]
  3. Output Generation: The final layer of the network produces the model's output. The form of this output depends on the task. For a classification task, a Softmax activation function is often used in the final layer to convert the logits into a probability distribution across all possible classes.[22] For example, an image classifier might output a probability score, such as a 95% chance that an image contains a "dog".[3] For a generative model, the output might be the next token in a sentence or a newly generated image. This final result is then sent to the end-user application.[2]
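
To make the three steps concrete, here is a minimal NumPy sketch (the weights, shapes, and preprocessing choices are hypothetical, chosen only for illustration) that normalizes an input, applies two layers of weighted sums, biases, and ReLU activations, and converts the final logits into probabilities with a softmax:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: input preparation -- scale a raw feature vector to the range the
# (hypothetical) model was trained on.
raw = np.array([12.0, 3.0, 7.5, 0.2])
x = (raw - raw.mean()) / (raw.std() + 1e-8)

# Frozen, already-trained parameters (random here, purely illustrative).
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

# Step 2: forward pass -- weighted sum plus bias, then a non-linear activation.
h = np.maximum(0.0, x @ W1 + b1)      # ReLU hidden layer
logits = h @ W2 + b2                  # pre-activations of the output layer

# Step 3: output generation -- softmax turns logits into class probabilities.
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()
print(probs)
```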

There are three primary modes for serving inference requests:

  • Real-time (Online) Inference: Processes individual requests as they arrive, providing an immediate response, often within milliseconds. This is essential for interactive applications like chatbots, recommendation engines, and fraud detection systems.[3]
  • Batch (Offline) Inference: Processes a large volume of data all at once when immediate responses are not required. This method is more cost-effective and is used for tasks like periodic data analysis, report generation, or pre-calculating recommendations.[3]
  • Streaming Inference: Processes continuous streams of data in real-time, such as from sensors, IoT devices, or live video feeds. This mode is used for ongoing anomaly detection or live analytics.[23]

Theoretical Foundations: Paradigms of Reasoning in AI

At a fundamental level, AI inference is the process of deriving new conclusions from existing information, a task that emulates core aspects of human reasoning.[16] This process can be understood through several paradigms of reasoning, which form the theoretical basis for how AI systems operate. These paradigms can be broadly categorized into logical reasoning, which follows formal rules, and statistical reasoning, which deals with uncertainty and probability.[16]

Modern deep learning represents a significant shift toward inductive reasoning during the training phase, where models generalize patterns from vast amounts of specific data. However, the application of these trained models during inference functionally blends paradigms that were central to classical, symbolic AI. When a trained model receives a new input, it applies its complex set of learned weights—which act as a vast system of general rules—to produce a specific output. This mirrors the general-to-specific flow of deductive reasoning. Furthermore, when a generative model like an LLM produces a sequence of text, it is not deriving a logically certain outcome but is instead predicting the most plausible continuation. This task of finding the "best explanation" for the preceding context is the essence of abductive reasoning. This reveals that modern connectionist AI has not abandoned the principles of logical reasoning but has instead developed a high-dimensional, statistical analogue to them. The recent emergence of fields like Causal AI represents a more explicit step in this direction, aiming to move beyond the correlational patterns of induction to understand the underlying cause-and-effect relationships that govern data.[24]

Primary Forms of Reasoning

Three primary forms of reasoning are central to both human cognition and artificial intelligence: deductive, inductive, and abductive reasoning.[25]

  • Deductive reasoning: This is a top-down approach that starts with general premises or widely accepted facts and moves to a specific, logically certain conclusion. If the initial premises are true, the conclusion is guaranteed to be true.[26] The classic example is a syllogism: "All men are mortal. Socrates is a man. Therefore, Socrates is mortal." In AI, deductive reasoning is the foundation of early expert systems and rule-based systems, where a set of predefined rules is applied to specific data to reach a conclusion.[27]
  • Inductive reasoning: This is a bottom-up approach that involves forming a generalized conclusion from specific observations or instances.[26] It is the cornerstone of modern machine learning.[2] An AI model is trained on a large dataset of specific examples (for example thousands of images labeled "cat") and learns to induce a general pattern or set of rules for what constitutes a cat. When presented with a new image, it uses this generalized knowledge to infer whether the new image is also a cat. Unlike deduction, the conclusions of inductive reasoning are probabilistic, not guaranteed to be true.[27]
  • Abductive reasoning: This form of reasoning seeks to find the most plausible explanation for an incomplete set of observations. It is often described as "inference to the best explanation."[26] For example, if you find a half-eaten sandwich on the counter, you might abduce that your son was late for work and left in a hurry.[26] In AI, abductive reasoning is crucial for tasks like medical diagnosis, where a system must infer the most likely disease given a set of symptoms, or in reinforcement learning, where an agent must choose the best action based on incomplete information about its environment.[25]

Other Important Reasoning Paradigms

Beyond the primary three, several other reasoning frameworks are important in AI:

  • Probabilistic Reasoning: This paradigm explicitly handles uncertainty by using the principles of probability theory. The most prominent example is Bayesian inference, which uses Bayes' theorem to update the probability of a hypothesis as more evidence becomes available.[16] Instead of providing a definitive answer, it provides a degree of belief, which is essential for applications in dynamic environments like risk assessment or recommendation systems.[27]
  • Analogical Reasoning: This involves solving a new problem by identifying and applying the solution from a similar, previously solved problem. An AI system might use this to adapt a route-planning algorithm designed for autonomous cars to navigate delivery drones.[27]
  • Causal Inference: A more advanced form of reasoning that aims to understand cause-and-effect relationships, distinguishing them from mere correlations. While traditional machine learning models are excellent at finding correlations (for example ice cream sales and drowning incidents are correlated), they do not understand that both are caused by a third factor (hot weather). Causal AI attempts to model these underlying causal structures to create more robust, fair, and explainable models.[24]
  • Monotonic vs. Non-monotonic Reasoning: This distinction relates to how an AI system handles new information. In monotonic reasoning, once a conclusion is made, it is never retracted, even if new information is introduced. In non-monotonic reasoning, conclusions are provisional and can be revised in light of new evidence that contradicts them. Non-monotonic reasoning is essential for AI systems operating in the real world, where information is often incomplete and subject to change.[27]
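
As a concrete illustration of the probabilistic-reasoning paradigm above, the short sketch below applies Bayes' theorem to a made-up diagnostic example; the prior, sensitivity, and false-positive rate are hypothetical numbers chosen only to show how evidence updates a degree of belief:

```python
# Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)
prior = 0.01           # P(disease): assumed base rate
sensitivity = 0.95     # P(positive test | disease)
false_positive = 0.05  # P(positive test | no disease)

evidence = sensitivity * prior + false_positive * (1 - prior)  # P(positive test)
posterior = sensitivity * prior / evidence                     # P(disease | positive test)
print(f"P(disease | positive test) = {posterior:.3f}")         # ~0.161
```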

Inference Across Key AI Architectures

While the core concept of inference as a forward pass is universal in deep learning, its specific implementation and computational characteristics vary significantly across different model architectures. The unique structure of each architecture creates distinct performance bottlenecks, which in turn drives the development of specialized optimization techniques.

For instance, the inference process in CNNs is dominated by a series of highly parallelizable convolution and matrix multiplication operations, making it a compute-bound task. Optimization efforts for CNNs therefore focus on accelerating these specific computations through specialized hardware like GPUs and TPUs, which feature thousands of cores or systolic arrays, and through the use of efficient mathematical kernels.[20][28]

In contrast, the generative inference process in LLMs is autoregressive and sequential. While the initial processing of the prompt (the "prefill" stage) is compute-bound, the generation of each subsequent token (the "decode" stage) is severely memory-bandwidth-bound. This is because the entire model's large weight matrices must be loaded from memory to perform a relatively small amount of computation for each new token.[29] This specific bottleneck has led to the development of Transformer-specific optimizations like the KV cache, PagedAttention, and Multi-Query Attention, all of which are designed to reduce the memory footprint and bandwidth requirements of the attention mechanism.[30][31]

Probabilistic models like Bayesian networks present yet another type of challenge. Here, the computational complexity is not determined by floating-point operations but by the combinatorial problem of summing over variables in a graph. The performance bottleneck is directly related to the graph's treewidth.[32] Consequently, optimization strategies focus on either transforming the graph's structure to make it more tractable (as in the Junction Tree algorithm) or abandoning exact computation entirely in favor of approximation algorithms.[32][33] This demonstrates that "inference optimization" is not a monolithic field; the optimal strategy is fundamentally dictated by the model's underlying mathematical structure and the specific computational bottleneck it creates.

Convolutional Neural Networks (CNNs)

Inference in a CNN involves passing an input, typically an image, through a sequence of specialized layers designed to extract features of increasing complexity.[16][34] This forward pass transforms the raw pixel data into a final classification. The key layers involved are:

  • Convolutional Layer: This is the core building block of a CNN. It applies a set of learnable filters (or kernels) across the input image. Each filter is a small matrix of weights that is specialized to detect a specific feature, such as an edge, a corner, or a color patch. The filter slides (convolves) over the input, computing the dot product at each location to produce a 2D feature map. This map indicates where in the input the specific feature was detected.[20][35]
  • Activation Function: After each convolution operation, the feature map is passed through a non-linear activation function, most commonly the Rectified Linear Unit (ReLU). This introduces non-linearity, allowing the network to learn more complex patterns than simple linear combinations of features.[20][36]
  • Pooling Layer: The pooling layer is used to reduce the spatial dimensions (width and height) of the feature maps, a process known as downsampling or subsampling.[20] The most common form is max pooling, which takes a small window of the feature map and outputs only the maximum value. This makes the network more computationally efficient and provides a degree of translation invariance, meaning the network can recognize a feature regardless of its exact position in the image.[20]
  • Fully Connected Layer: After several convolutional and pooling layers have extracted high-level features, the final feature maps are flattened into a one-dimensional vector. This vector is then fed into one or more fully connected layers, which are identical to the layers in a standard neural network. These layers learn to combine the high-level features to make a final classification.[37][36]
  • Output Layer: The final fully connected layer outputs the raw scores (logits) for each class. These are typically passed through a Softmax activation function, which converts the scores into a probability distribution, indicating the model's confidence for each class.[36]
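
The layer sequence above can be sketched as a small PyTorch module; the architecture and sizes below are hypothetical, chosen only to show how convolution, activation, pooling, flattening, and a fully connected classifier compose during a single forward pass:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                    # non-linear activation
            nn.MaxPool2d(2),                              # pooling (downsampling)
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # fully connected layer

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)        # flatten feature maps into a vector
        return self.classifier(x)      # raw class scores (logits)

model = TinyCNN().eval()
with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)           # preprocessed 224x224 RGB image
    probs = torch.softmax(model(image), dim=-1)   # probability over the classes
```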

Transformer Models

Inference with Transformer models, particularly for generative tasks like machine translation or text generation, is an autoregressive process. This means the model generates its output one token (word or sub-word) at a time, with each new token being conditioned on the previously generated ones.[38]

The process differs significantly from training, where the model has access to the entire output sequence at once. During inference, the decoder must build the sequence step-by-step:[38]

  1. Encoder Processing: For models with an encoder-decoder architecture (like those used for translation), the encoder first processes the entire input sequence (for example an English sentence). Using its self-attention mechanism, it builds a set of rich, contextual representations (context vectors) of the input. This step is identical in both training and inference.[38]
  2. Decoder Initialization: The decoder begins with a special "start-of-sequence" (SOS) token as its initial input.[38]
  3. Decoder Loop (Token-by-Token Generation):
    • Masked Self-Attention: The decoder applies a masked self-attention mechanism to its current input sequence (which at the first step is just the SOS token). The mask ensures that when predicting a token, the model can only attend to the tokens that came before it, preventing it from "seeing into the future."[38]
    • Cross-Attention: The decoder then uses a cross-attention mechanism. Here, the queries come from the decoder's state, while the keys and values come from the encoder's context vectors. This allows the decoder to focus on the most relevant parts of the original input sentence when generating the next output token.[38]
    • Feed-Forward and Prediction: The output of the cross-attention layer is passed through a feed-forward network. The final output vector is then fed into a linear layer followed by a Softmax function, which produces a probability distribution over the entire vocabulary.[38]
    • Sampling: A token is selected from this distribution. While simply picking the token with the highest probability (greedy sampling) is an option, more sophisticated sampling methods are often used to generate more diverse and natural-sounding text.[39]
    • Append and Repeat: The newly generated token is appended to the decoder's input sequence, and the entire loop is repeated to generate the next token. This continues until the model generates a special "end-of-sequence" (EOS) token.[38]

A crucial optimization for this process is the KV cache. During each decoding step, the key (K) and value (V) vectors computed in the attention layers for all previous tokens are stored in memory. In the next step, the model only needs to compute the K and V vectors for the newest token and can reuse the cached values for all prior tokens. This prevents a massive amount of redundant computation and is essential for achieving practical inference speeds.[39][30]
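
A minimal sketch of this decode loop is shown below. `forward_step` is a hypothetical stand-in for the model's forward pass: it receives only the newest token plus the cached key/value state and returns next-token logits along with the updated cache, mirroring the KV-cache optimization described above. Greedy sampling is used for simplicity.

```python
import numpy as np

VOCAB_SIZE, SOS, EOS, MAX_TOKENS = 1000, 1, 2, 20

def forward_step(token_id, kv_cache):
    """Hypothetical single-step decoder forward pass.

    A real implementation would embed the token, run masked self-attention
    against the cached keys/values, apply cross-attention and feed-forward
    layers, and return vocabulary logits plus the extended cache.
    """
    new_cache = kv_cache + [token_id]   # placeholder for the cached K/V tensors
    logits = np.random.default_rng(len(new_cache)).normal(size=VOCAB_SIZE)
    return logits, new_cache

tokens, kv_cache = [SOS], []
for _ in range(MAX_TOKENS):
    logits, kv_cache = forward_step(tokens[-1], kv_cache)  # only the newest token is processed
    next_token = int(np.argmax(logits))                    # greedy sampling; other strategies exist
    tokens.append(next_token)
    if next_token == EOS:
        break
print(tokens)
```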

Probabilistic Graphical Models (Bayesian Networks)

Inference in a Bayesian Network is fundamentally a task of probabilistic querying. It involves calculating the posterior probability distribution for a set of "query" variables, given that another set of "evidence" variables has been observed.[32][40] This allows the network to logically update its beliefs in response to new information. Because exact inference in general graphs is NP-hard, a variety of algorithms have been developed, which fall into two main categories: exact and approximate inference.[41]

Exact Inference Algorithms

These algorithms compute the precise posterior probabilities.

  • Variable Elimination: This is an intuitive algorithm that answers a specific query, such as $P(X|E=e)$, by systematically eliminating all other "hidden" variables from the joint probability distribution one by one.[42] Instead of calculating the full joint distribution (which is computationally prohibitive), it leverages the network's structure to "push" summation operations inward, performing them on smaller products of factors (the conditional probability tables).[43] While efficient for single queries, the entire process must be re-run for each new query.[44]
  • Junction Tree Algorithm (or Clique Tree Propagation): This is a more general and often more efficient method for exact inference, especially when multiple queries are needed. The algorithm compiles the original graph into a data structure called a junction tree, where the nodes are "cliques" (subsets of fully connected variables) from a triangulated version of the original graph.[33][44] Once this tree is constructed, a two-phase message-passing protocol (known as belief propagation) is executed. In the first phase, messages are passed from the leaves of the tree to an arbitrary root, and in the second phase, they are passed back out from the root to the leaves. After this process, the marginal probability for every variable in the network is available, allowing many different queries to be answered without re-computation.[40]
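
For intuition, the sketch below answers a single query on a toy three-variable network (Rain → Sprinkler, with Rain and Sprinkler both parents of WetGrass; all probability values are made up) by summing out the hidden variable, in the spirit of variable elimination:

```python
from itertools import product

# Hypothetical conditional probability tables for the toy network.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},           # P(Sprinkler | Rain)
               False: {True: 0.40, False: 0.60}}
P_wet_true = {(True, True): 0.99, (True, False): 0.90,    # P(WetGrass=True | Sprinkler, Rain)
              (False, True): 0.80, (False, False): 0.00}

def joint_wet(rain, sprinkler):
    """P(Rain=rain, Sprinkler=sprinkler, WetGrass=True)."""
    return P_rain[rain] * P_sprinkler[rain][sprinkler] * P_wet_true[(sprinkler, rain)]

# Query P(Rain=True | WetGrass=True): sum out the hidden variable Sprinkler,
# then normalize by the probability of the evidence.
numerator = sum(joint_wet(True, s) for s in (True, False))
evidence = sum(joint_wet(r, s) for r, s in product((True, False), repeat=2))
print(numerator / evidence)   # ~0.358
```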

Approximate Inference Algorithms

When the treewidth of a network is too large, exact inference becomes computationally intractable. In such cases, approximate methods are used.[32]

  • Stochastic Sampling Methods: These methods, such as Markov chain Monte Carlo (MCMC), generate a large number of random samples from the probability distribution defined by the network. The desired probabilities are then estimated based on the frequencies of events in these samples.
  • Variational Inference: This method reframes the inference problem as an optimization problem. It seeks to find a simpler, tractable probability distribution that is as close as possible (in terms of KL divergence) to the true, complex posterior distribution.
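
A correspondingly minimal sketch of the sampling approach estimates the same kind of posterior by drawing random "worlds" from the network and keeping only those consistent with the evidence. This is plain rejection sampling, far less efficient than MCMC or variational methods on realistic networks, and the probability values are again hypothetical:

```python
import random

random.seed(0)

def sample_world():
    """Draw one joint sample from the toy Rain/Sprinkler/WetGrass network."""
    rain = random.random() < 0.2
    sprinkler = random.random() < (0.01 if rain else 0.40)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.80, (False, False): 0.00}[(sprinkler, rain)]
    wet = random.random() < p_wet
    return rain, wet

accepted, rain_and_wet = 0, 0
for _ in range(200_000):
    rain, wet = sample_world()
    if wet:                      # keep only samples consistent with the evidence
        accepted += 1
        rain_and_wet += rain
print(rain_and_wet / accepted)   # approaches the exact ~0.358 as the sample count grows
```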

Evaluating Inference: Performance and Quality Metrics

Evaluating an AI model's inference capabilities is a multifaceted process that requires assessing two distinct but interconnected dimensions: performance and quality.[45] Performance metrics quantify the operational efficiency of the inference process—how fast it runs and how many resources it consumes. Quality metrics, on the other hand, measure the accuracy and utility of the model's outputs.[46] These two dimensions are often in a trade-off; techniques that improve performance, such as aggressive model quantization, can sometimes lead to a degradation in quality.[47]

The evolution of these evaluation metrics reflects the maturation of AI from a purely academic field to a robust engineering discipline. Early machine learning research focused primarily on quality metrics like classification accuracy.[48] However, as models began to be deployed in real-world applications, operational performance became a critical concern, leading to the standardization of metrics for latency and throughput.[45][29] With the recent explosion in the scale of AI services, the economic and environmental costs of inference have become significant. This has driven the emergence of efficiency metrics like cost per million tokens, Model Bandwidth Utilization (MBU), and tokens per watt as first-class concerns for system designers.[49]

Performance Metrics: Speed, Scale, and Efficiency

Performance metrics are crucial for assessing the viability of deploying an AI model, especially for interactive and large-scale applications.

Latency

Latency measures the time delay for a single inference request and is a critical factor for user-facing applications where responsiveness is key.[50] For LLMs, latency is typically broken down into several components:

  • Time to First Token (TTFT): The time elapsed from when a user sends a prompt to when the first token of the response is generated. A low TTFT is crucial for making an application feel responsive. For example, some benchmarks show NVIDIA H100 GPUs achieving a TTFT of 46ms for an MPT-7B model.[51][52]
  • End-to-End Latency (E2EL): The total time from the start of the request to the receipt of the final token. This metric represents the total time a user waits for the complete response.[51]
  • Time Per Output Token (TPOT): The average time it takes to generate each token after the first one. This metric determines the "streaming" speed of the response. A lower TPOT results in a smoother, faster-feeling generation process.[51][29]
  • Inter-Token Latency (ITL): The precise time gap between each consecutive pair of tokens. While the average ITL across a single request is equivalent to TPOT, the calculation can differ when averaged across multiple requests.[51][45]
Key Inference Performance Metrics
Metric | Definition | Primary Use Case
Time to First Token (TTFT) | Time to process the prompt and generate the first output token.[51] | Measures the perceived responsiveness of interactive applications (for example chatbots).
End-to-End Latency (E2EL) | Total time from sending the request to receiving the final token.[51] | Measures the total user waiting time for a complete response.
Time Per Output Token (TPOT) | Average time to generate each token after the first.[51] | Measures the streaming speed of the generated output.
Inter-Token Latency (ITL) | The time gap between consecutive tokens.[51] | A more granular measure of generation speed; average ITL is often equivalent to TPOT.
Throughput (TPS/RPS) | Total number of output tokens (TPS) or requests (RPS) processed by the system per second.[45] | Measures the overall capacity and cost-efficiency of the inference server.
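
Given per-token arrival timestamps for one streamed response, these latency metrics can be computed directly, as in the sketch below (the timestamps are hypothetical, measured in seconds from the moment the request was sent):

```python
# Hypothetical arrival times (seconds after the request was sent) of each
# output token in a single streamed response.
token_times = [0.25, 0.29, 0.33, 0.38, 0.42, 0.47]

ttft = token_times[0]                               # Time to First Token
e2el = token_times[-1]                              # End-to-End Latency
tpot = (e2el - ttft) / (len(token_times) - 1)       # Time Per Output Token
itl = [b - a for a, b in zip(token_times, token_times[1:])]  # Inter-Token Latencies

print(f"TTFT={ttft:.3f}s  E2EL={e2el:.3f}s  TPOT={tpot:.3f}s  "
      f"mean ITL={sum(itl) / len(itl):.3f}s")       # mean ITL equals TPOT for one request
```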

Throughput

Throughput measures the total processing capacity of an inference system over a period of time.[53] It is typically measured in tokens per second (TPS) or requests per second (RPS).

  • System Throughput: The total number of tokens per second generated across all concurrent users. This metric reflects the raw processing power of the deployed infrastructure.[45]
  • User Throughput: The effective tokens per second experienced by a single user. As system load increases, user throughput typically decreases because resources are shared.[45]
  • Model Bandwidth Utilization (MBU): Measures what fraction of peak memory bandwidth the workload achieves. For LLMs, this is often the primary bottleneck, with MBU at batch size 1 typically achieving only 50-60% of theoretical bandwidth.[29]

There is a fundamental latency-throughput trade-off. To maximize throughput, systems often batch multiple requests together to better utilize the parallel processing capabilities of GPUs. However, this batching increases the latency for each individual request, as it may have to wait for other requests to arrive and be processed.[29][54] The optimal balance depends on the application: interactive services prioritize low latency, while offline analytics prioritize high throughput.[51]
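
Because the decode stage is memory-bandwidth-bound, a back-of-the-envelope estimate of Model Bandwidth Utilization simply compares the bytes of weights that must be streamed per generated token against the hardware's peak bandwidth. All numbers in the sketch below are hypothetical round figures, not measured results:

```python
# Rough MBU estimate for single-batch decoding (illustrative numbers only).
params = 7e9                  # model size: 7B parameters
bytes_per_param = 2           # FP16 weights
tokens_per_second = 55        # observed decode speed at batch size 1
peak_bandwidth = 2.0e12       # accelerator peak memory bandwidth: 2 TB/s

achieved_bandwidth = params * bytes_per_param * tokens_per_second  # bytes/s actually streamed
mbu = achieved_bandwidth / peak_bandwidth
print(f"MBU ≈ {mbu:.0%}")     # about 38% in this made-up example
```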

Power and Economic Metrics

With the increasing scale of AI, efficiency has become a critical concern.

  • Tokens per Watt: Measures the energy efficiency of the hardware and software stack.[49]
  • Cost per Million Tokens: A standard economic metric used by cloud providers to price inference services, reflecting the financial cost of running the model.[49]

Quality and Accuracy Metrics

Quality metrics assess whether the model's outputs are correct, relevant, and useful. The choice of metric depends heavily on the type of task.

For Classification Tasks

These metrics are used for tasks where the model must assign an input to one of a predefined set of categories.[48]

  • Confusion matrix: A table that summarizes classification performance by showing the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).[46][48]
  • Accuracy: The proportion of correct predictions out of the total number of predictions. While intuitive, it can be misleading on datasets with imbalanced classes.[48]
  • Precision: Measures the accuracy of the positive predictions ($TP / (TP + FP)$). A high precision means that when the model predicts a positive class, it is very likely to be correct.[46][48]
  • Recall (or Sensitivity): Measures the proportion of actual positives that were correctly identified ($TP / (TP + FN)$). A high recall means the model is good at finding all the positive instances.[46][48]
  • F-score: The harmonic mean of precision and recall, providing a single score that balances both metrics.[48]
  • AUC-ROC and AUC-PR: The Area Under the Receiver Operating Characteristic curve and the Area Under the Precision-Recall curve, respectively. These metrics evaluate the model's ability to distinguish between classes across all possible classification thresholds.[46]
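
A short sketch computing these metrics from raw counts (the confusion-matrix values are hypothetical and deliberately imbalanced):

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 80, 10, 20, 890

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# With heavily imbalanced classes, accuracy (0.97) looks far better than F1 (~0.84),
# which is why precision- and recall-based metrics are preferred in such cases.
```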

For Generative Models (LLMs)

Evaluating generative models is more complex, as there is often no single "correct" answer.

  • Computation-based Metrics: These metrics algorithmically compare the generated text to one or more reference texts. Examples include BLEU, commonly used for machine translation, and ROUGE, used for text summarization.[55]
  • Rubric-based Metrics (LLM-as-a-judge): A modern approach where a powerful, separate LLM is used to evaluate the output of the model being tested. The judge LLM scores the output based on a predefined rubric that might include criteria like fluency, coherence, factuality (grounding), and safety.[55][56]
  • Task-Specific Benchmarks: As AI becomes more integrated into professional workflows, new benchmarks are being developed to measure performance on realistic, economically valuable tasks. An example is GDPval, which evaluates models on tasks drawn from real-world knowledge work, using reference files and expecting complex deliverables.[57]

Industry Benchmarks

To standardize the evaluation of AI systems, several industry-wide benchmarks have been established.

  • MLPerf: Maintained by the non-profit consortium MLCommons, MLPerf is the most widely recognized industry benchmark for AI performance. The MLPerf Inference suite measures the latency and throughput of systems across a range of representative AI tasks, including image classification, object detection, and language processing.[58][59] The suite is regularly updated to include state-of-the-art models (such as Llama 3.1 and DeepSeek-R1 in v5.1).[58] It also includes an optional power measurement component to evaluate energy efficiency.[60]
  • Other Benchmarks: Specialized benchmarks are also emerging. The AI Energy Score initiative aims to standardize the evaluation of energy efficiency for AI model inference.[60] LLM-Inference-Bench is a suite designed to provide detailed hardware performance evaluations specifically for LLMs across various AI accelerators.[61]

Optimizing Inference for Efficiency

The substantial computational and memory requirements of modern AI models, particularly LLMs, present significant challenges for deployment. Inference optimization encompasses a wide range of techniques designed to make models smaller, faster, and more cost-effective to run, without a significant loss in accuracy.[62]

Model Compression Techniques

These methods aim to reduce the size of the model, which in turn lowers memory requirements and can accelerate computation.[47]

  • Quantization: This is one of the most effective optimization techniques. It involves reducing the numerical precision of the model's parameters (weights) and intermediate calculations (activations). Models are typically trained using 32-bit floating-point numbers (FP32). Quantization converts these to lower-precision formats, such as 16-bit floating-point (FP16), 8-bit integers (INT8), or even 4-bit integers (INT4).[63][64] This reduces the model's memory footprint by up to 8x (for INT4) and can significantly speed up inference on hardware that has specialized support for low-precision arithmetic.[63]
    • Post-Training Quantization (PTQ): This method is applied to an already trained model. It is relatively easy to implement but can sometimes lead to a noticeable drop in model accuracy.[65]
    • Quantization-Aware Training (QAT): This method simulates the effects of quantization during the training process itself. The model learns to be robust to the lower precision, which typically results in higher accuracy after quantization compared to PTQ.[63]
  • Pruning: This technique involves identifying and removing redundant or unimportant parameters from a neural network.[66]
    • Unstructured Pruning: Individual weights with low magnitude are set to zero, creating a sparse weight matrix. This can significantly reduce the model's storage size, but it often does not lead to faster inference unless run on specialized hardware that can efficiently process sparse matrices.[67]
    • Structured Pruning: Entire groups of parameters, such as neurons, convolutional filters, or attention heads, are removed. This results in a smaller, dense model that can be executed faster on standard hardware like GPUs without any special handling.[68]
  • Knowledge distillation: This method involves training a smaller "student" model to mimic the behavior of a larger, more powerful "teacher" model.[69] Instead of training the student model only on the ground-truth labels, it is also trained to match the output probability distribution (the "soft targets") of the teacher model. This allows the student to learn the more nuanced relationships between classes that the teacher has captured, often achieving performance close to the teacher with a fraction of the parameters.[69][70] A well-known example is DistilBERT, which is 40% smaller than the original BERT model but retains 97% of its performance.[30][70]
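
As a minimal illustration of the post-training quantization technique described above, the sketch below maps an FP32 weight matrix to INT8 with a single symmetric scale factor and then dequantizes it. Real PTQ pipelines additionally use calibration data, per-channel scales, and activation quantization; all values here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.normal(scale=0.05, size=(256, 256)).astype(np.float32)

# Symmetric post-training quantization to INT8: one scale for the whole tensor.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# At inference time the kernel operates on the INT8 values (4x smaller than FP32)
# and folds the scale back in when producing outputs.
dequantized = weights_int8.astype(np.float32) * scale
error = np.abs(weights_fp32 - dequantized).max()
print(f"memory: {weights_fp32.nbytes} -> {weights_int8.nbytes} bytes, max error {error:.5f}")
```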

Runtime and System-Level Optimizations

These techniques focus on improving the efficiency of the inference execution without modifying the model's parameters.

  • Batching: Grouping multiple inference requests together and processing them in a single pass through the model. This dramatically improves GPU utilization and overall throughput by amortizing the overhead of kernel launches and memory transfers.[30][71]
  • Continuous Batching: A state-of-the-art technique for LLM inference where the server dynamically batches requests at the iteration level. Instead of waiting for all sequences in a batch to finish, it can immediately start processing new requests as soon as individual sequences are complete. This can lead to throughput improvements of over 20x compared to simpler batching methods.[29][72]
  • KV Cache Optimization: For Transformer models, the KV cache can consume a large amount of GPU memory. PagedAttention, a technique inspired by virtual memory in operating systems, allows the KV cache to be stored in non-contiguous memory blocks. This significantly reduces memory fragmentation and waste, enabling larger batch sizes and support for longer contexts.[30][31]
  • Attention Variants: The standard self-attention mechanism is computationally expensive.
    • FlashAttention is a hardware-aware attention algorithm that reorders the computation to minimize slow read/write operations to the GPU's high-bandwidth memory (HBM), resulting in significant speedups.[31]
    • Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) reduce the size of the KV cache by having multiple query heads share the same key and value heads. This is particularly effective for improving inference speed for long sequences.[31]
  • Speculative Decoding: This technique uses a smaller, faster "draft" model to generate a candidate sequence of several tokens. The larger, more accurate "verifier" model then processes this entire candidate sequence in a single forward pass to check its correctness. If the draft is accepted, the model can generate multiple tokens in the time it would normally take to generate one, leading to 2-3x latency reductions in production.[30][31][73]
  • Model Parallelism: When a model is too large to fit on a single accelerator, it must be distributed across multiple devices. Different strategies exist to split the model, including tensor parallelism (splitting individual weight matrices), pipeline parallelism (assigning different layers to different devices), and sequence parallelism (splitting the input sequence across devices).[30][31]
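
The sketch below illustrates the accept/reject flow of speculative decoding in its simplest greedy form: a hypothetical `draft_next` function proposes several tokens cheaply, a hypothetical `verify_greedy` call stands in for the large model's forward pass over the drafted positions, and the longest prefix on which the two agree is accepted. Production systems use probabilistic acceptance rules rather than exact greedy matching.

```python
import random

random.seed(0)

def verify_greedy(context):
    """Hypothetical large model's greedy next-token choice (deterministic toy rule)."""
    return (sum(context) * 31 + 7) % 50

def draft_next(context):
    """Hypothetical cheap draft model: usually agrees with the verifier, sometimes not."""
    guess = verify_greedy(context)
    return guess if random.random() < 0.8 else (guess + 1) % 50

def speculative_step(context, k=4):
    # 1) The draft model proposes k tokens autoregressively (cheap).
    draft = []
    for _ in range(k):
        draft.append(draft_next(context + draft))
    # 2) The verifier checks all k positions in one (conceptual) forward pass,
    #    accepting the longest prefix that matches its own greedy choices.
    accepted = []
    for token in draft:
        target = verify_greedy(context + accepted)
        accepted.append(target)          # the verifier's token is always kept
        if token != target:              # the first mismatch ends the accepted run
            break
    return context + accepted

sequence = [1]
for _ in range(5):
    sequence = speculative_step(sequence)
print(sequence)   # several tokens are produced per verifier pass when drafts are accepted
```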

Inference Serving Frameworks

Specialized software is required to efficiently deploy and serve models in production. These frameworks bridge the gap between a trained model file and a scalable, low-latency web service.

  • NVIDIA TensorRT: An SDK for high-performance deep learning inference. It optimizes models by performing graph optimizations, layer fusion, and precision calibration (for example to INT8 or FP8), tuning the model for the specific target NVIDIA GPU.[74]
  • vLLM: An open-source LLM inference and serving engine that revolutionized serving speeds by implementing PagedAttention and continuous batching. It can achieve up to 24x higher throughput compared to standard HuggingFace Transformers.[75]
  • NVIDIA Triton Inference Server: A comprehensive serving solution that can deploy models from multiple frameworks (including TensorRT, TensorFlow, PyTorch, and ONNX). It supports dynamic batching, multi-model serving, and scaling for production environments.[76]
  • ONNX Runtime: A cross-platform inference and training accelerator that supports models from many frameworks. It is designed to enable high-performance inference on diverse hardware, from cloud CPUs and GPUs to edge devices.[77]
  • TensorFlow Lite: A set of tools for on-device inference on mobile, embedded, and IoT devices. It focuses on small binary size and low-latency inference, using techniques like quantization and delegation to device-specific accelerators.[78]
Inference Framework Comparison
Framework | Primary Use Case | Key Features | Hardware Support
TensorRT | NVIDIA GPU optimization | Graph fusion, quantization | NVIDIA GPUs only
vLLM | LLM serving | PagedAttention, continuous batching | NVIDIA/AMD GPUs
ONNX Runtime | Cross-platform | Multi-backend support | CPU, GPU, NPU
TensorFlow Lite | Mobile/edge | Aggressive compression | Mobile GPUs/NPUs
Triton Server | Production serving | Multi-model, auto-scaling | Various
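
For a sense of how such a runtime is used in practice, the sketch below runs a model with ONNX Runtime's Python API. The model file name and its input shape are hypothetical; the calls shown (InferenceSession, get_inputs, run) are the library's standard entry points:

```python
import numpy as np
import onnxruntime as ort

# Load an exported model; the file name here is hypothetical.
session = ort.InferenceSession("image_classifier.onnx",
                               providers=["CPUExecutionProvider"])

input_name = session.get_inputs()[0].name                    # name of the model's input tensor
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)    # preprocessed image batch

outputs = session.run(None, {input_name: batch})             # one forward pass
logits = outputs[0]
print(logits.shape)
```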

The Hardware Ecosystem for Inference

The performance, cost, and power consumption of AI inference are fundamentally determined by the underlying hardware. The choice of hardware is a critical decision in designing an AI system, with a wide spectrum of options ranging from general-purpose processors to highly specialized custom chips.[19]

Deployment Environments: Cloud vs. Edge

AI inference can be deployed in two primary environments, each with distinct cost structures and performance characteristics.

  • Cloud Inference: This involves running inference on powerful servers located in centralized data centers operated by cloud service providers like AWS, Google Cloud, and Microsoft Azure. It leverages virtually unlimited computational resources and is ideal for training large models and serving complex workloads that are not critically sensitive to latency.[79] The cost model is typically based on operational expenditure (OpEx), with a pay-as-you-go structure.[80][81]
  • Edge Inference: This involves performing inference directly on a local device, such as a smartphone, an industrial sensor, a smart camera, or a vehicle. This approach is essential for applications that require very low latency, the ability to function without a network connection, and enhanced data privacy, as sensitive data is processed locally.[82] The cost model is dominated by capital expenditure (CapEx), as it requires an upfront investment in hardware.[81]

Comparative Analysis of Hardware Accelerators

Different types of processors are optimized for different computational workloads.[19]

Comparative Analysis of Hardware for AI Inference
Hardware | Key Architectural Feature | Flexibility/Programmability | Performance Profile | Power Efficiency | Ideal Use Cases
CPU | Few powerful cores optimized for sequential, serial processing.[19] | Very High (General-purpose) | Low for parallel AI workloads. Suitable for small models or tasks with high latency tolerance.[83] | High (relative to GPU for low-intensity tasks).[19] | Prototyping, controlling other accelerators, running very small models.[84]
GPU | Thousands of simpler cores designed for massive parallel processing (SIMT).[85] | High (Programmable via frameworks like CUDA) | Very high for parallel tasks like matrix multiplication. The dominant hardware for AI training and inference.[84] | Low (High power consumption and heat generation).[19] | General-purpose AI training and high-throughput inference in data centers.[83]
FPGA | Reconfigurable hardware fabric; logic gates can be reprogrammed after manufacturing.[19] | Medium (Programmable at the hardware level) | High performance with very low and deterministic latency.[19] | Medium (More efficient than GPUs as hardware can be fine-tuned to the application).[19] | Real-time, ultra-low-latency applications (for example industrial robotics, aerospace) where algorithms may evolve.[83]
ASIC / NPU | Custom silicon hardwired for a specific task (for example neural network operations). TPUs use a systolic array for matrix multiplication.[28][86] | Very Low (Fixed function) | Highest possible performance for its specific task.[86] | Very High (Highest performance per watt).[84] | Large-scale, high-volume inference where efficiency is paramount (for example Google's services, autonomous vehicles, mobile phones).[86]

Specialized Accelerators

  • GPUs: NVIDIA's GPUs dominate high-performance inference. The H100 GPU can deliver up to 30x faster LLM inference than its predecessor, the A100. Other competitors like AMD's MI300X also target this market.[87]
  • TPUs: Google's Tensor Processing Unit (TPU) is a prominent ASIC. Its core innovation is the systolic array, a grid of multiply-accumulators that performs massive matrix multiplications with minimal memory access, achieving high throughput and power efficiency.[28][88] Google's Ironwood (TPU v7) is the first TPU specifically designed for inference.[89]
  • Other ASICs/NPUs: A growing market of specialized chips includes AWS's Inferentia chips, Groq's Language Processing Units (LPUs), and Cerebras's Wafer Scale Engines, all designed to offer advantages in speed, cost, or power efficiency for inference.[90] On edge devices, NPUs like the Apple Neural Engine and Qualcomm Hexagon provide high-performance, low-power inference directly on consumer devices.[91]
GPU Performance Comparison for LLM Inference (Approx. Benchmarks)
GPU Model | Architecture | Memory | TDP | Tokens/sec (Llama 70B)
NVIDIA H100 | Hopper | 80GB HBM3 | 700W | ~2,400
NVIDIA A100 | Ampere | 80GB HBM2e | 400W | ~1,200
NVIDIA L4 | Ada Lovelace | 24GB GDDR6 | 72W | ~450
AMD MI300X | CDNA 3 | 192GB HBM3 | 750W | ~2,200

Applications of AI Inference in Practice

AI inference is the engine that powers a vast and growing range of applications across nearly every industry, transforming raw data streams into actionable insights.

The AI inference market is projected to grow from $97 billion in 2024 to $1.3 trillion by 2032, reflecting its central role in AI adoption.[92]

AI Inference Market Projections
Year | Market Size (USD) | Growth Rate
2024 | $97 billion | -
2026 | $180 billion | 36.3%
2028 | $405 billion | 39.8%
2030 | $720 billion | 33.5%
2032 | $1.3 trillion | 34.4%

Healthcare

  • Medical Imaging and Diagnostics: CNNs are used to analyze medical scans such as X-rays, CT scans, and MRIs. These systems can infer the presence of anomalies, such as tumors or signs of diabetic retinopathy, assisting healthcare professionals by highlighting areas of concern.[93][94]
  • Personalized Medicine: By analyzing a patient's genetic information, lifestyle data, and clinical history, AI models can infer the most effective treatment plan for an individual.[94]
  • Predictive Health Monitoring: Wearable devices and smart sensors use inference to continuously monitor vital signs and predict the likelihood of adverse events, such as a heart attack or a fall, enabling early intervention.[95]

Financial Services

  • Fraud Detection and Risk Management: AI inference engines analyze millions of transactions in real-time, inferring patterns of fraudulent behavior from data such as transaction amount, location, and user history. Suspicious transactions can be automatically blocked in milliseconds.[3][96][97]
  • Algorithmic Trading: Predictive models analyze market data, news sentiment, and economic indicators to infer future price movements and execute trades automatically.[97]

Retail and E-commerce

  • Recommendation Engines: E-commerce and streaming platforms analyze a user's browsing history, past purchases, and real-time interactions to infer their preferences. This knowledge is used to generate hyper-personalized recommendations for products, movies, or music.[3]
  • Supply Chain and Inventory Optimization: AI models analyze historical sales data and external factors to infer future product demand, allowing retailers to optimize inventory levels and reduce waste.[3]

Autonomous Systems

  • Autonomous car: A self-driving car's perception system continuously runs inference on data from a suite of sensors (cameras, Lidar, radar). It infers the presence and classification of objects (for example other cars, pedestrians, traffic signs) and makes critical driving decisions in milliseconds.[98][99]
  • Robotics and Manufacturing: In smart factories, AI-powered cameras perform inference for automated quality control, detecting defects on an assembly line. Robots use inference for object recognition and manipulation.[3]

Other Applications

  • Cybersecurity: Inference models analyze network traffic in real-time to detect anomalies and infer patterns that signal a cyberattack or threat.[23]
  • Natural Language Processing: Powers applications like text classification, translation, and conversational AI chatbots, which rely on inference to interpret new text and generate responses.[100]

Current Challenges and Future Outlook

While AI inference has enabled transformative applications, the field faces significant challenges as models grow in scale and complexity.

Current Challenges in AI Inference

  • Model collapse: A significant emerging threat where new generations of AI models, trained on synthetic data generated by previous models, begin to lose diversity, compound errors, and degrade in quality.[101]
  • Model drift: A common problem where a model's performance degrades over time as the real-world data it encounters during inference "drifts" or changes from the data it was trained on. 91% of production ML models suffer from this drift.[102]
  • Explainability and Trust: Many deep learning models function as "black boxes," making it difficult to understand the reasoning behind their outputs. This lack of transparency is a major obstacle to adoption in high-stakes domains like healthcare and law.[24]
  • Security and Privacy: Inference surfaces novel attack vectors, including model inversion attacks to extract training data, membership inference to violate privacy, and adversarial attacks to bypass safety guardrails.[23]
  • Efficiency, Cost, and Environmental Impact: The scale of modern LLMs has led to immense computational and energy costs for inference, which can account for 70-80% of a model's total lifetime energy consumption.[103][104]

Future Directions and Emerging Trends

  • Reasoning Models: The frontier of AI is moving toward models like OpenAI's o1 and DeepSeek's R1, which shift computational effort from training to inference. These "reasoning models" apply substantial compute during query processing to perform complex, multi-step reasoning tasks.[105]
  • Causal AI: There is a growing movement to develop AI systems that can understand and reason about cause-and-effect relationships, rather than just identifying statistical correlations. Causal AI promises to create models that are more robust, less prone to bias, and inherently more explainable.[24]
  • Hardware-Software Co-design: Achieving the next level of inference efficiency will require tighter integration between model architecture design, software compilers, and hardware accelerators. This involves designing neural networks that are optimized for the strengths of specific hardware and, conversely, designing hardware that is tailored to emerging model architectures.[62]
  • Agentic Workflows: Future AI systems will increasingly function as "agents" that can decompose a complex problem into smaller steps, use external tools (like APIs), and interact with an environment to achieve a goal. This shift dramatically increases the number of inference calls required per user query, making inference efficiency even more critical.[49][58]

See Also

References

  1. 1.0 1.1 "What is AI inference? How it works and examples". Google Cloud. https://cloud.google.com/discover/what-is-ai-inference.
  2. 2.0 2.1 2.2 2.3 "Explore NVIDIA AI Inference Tools and Technologies". NVIDIA Developer. https://developer.nvidia.com/topics/ai/ai-inference.
  3. 3.00 3.01 3.02 3.03 3.04 3.05 3.06 3.07 3.08 3.09 3.10 3.11 3.12 3.13 3.14 3.15 3.16 3.17 3.18 3.19 "What is AI inference?". Google Cloud. https://cloud.google.com/discover/what-is-ai-inference.
  4. 4.0 4.1 4.2 "What is AI inference?". IBM. https://www.ibm.com/think/topics/ai-inference.
  5. "LLM Inference Benchmarking: How Much Does Your LLM Inference Cost?". NVIDIA Developer. https://developer.nvidia.com/blog/llm-inference-benchmarking-how-much-does-your-llm-inference-cost/. Retrieved 2025-01-21.
  6. 6.00 6.01 6.02 6.03 6.04 6.05 6.06 6.07 6.08 6.09 6.10 6.11 "History of artificial intelligence". Wikipedia. https://en.wikipedia.org/wiki/History_of_artificial_intelligence.
  7. 7.0 7.1 "The History of Artificial Intelligence". IBM. https://www.ibm.com/think/topics/history-of-artificial-intelligence.
  8. 8.0 8.1 "Inference engine". Wikipedia. https://en.wikipedia.org/wiki/Inference_engine.
  9. "History Of AI In 33 Breakthroughs: The First Expert System". Forbes. https://www.forbes.com/sites/gilpress/2022/10/29/history-of-ai-in-33-breakthroughs-the-first-expert-system/.
  10. "Expert system". A History of Artificial Intelligence. https://ahistoryofai.com/expert-system/.
  11. "Expert system". Wikipedia. https://en.wikipedia.org/wiki/Expert_system.
  12. "A Short History of Artificial Intelligence". HPE Community. https://community.hpe.com/t5/ai-unlocked/a-short-history-of-artificial-intelligence/ba-p/7172315.
  13. "An Overview of the Rise and Fall of Expert Systems". Medium. https://medium.com/version-1/an-overview-of-the-rise-and-fall-of-expert-systems-14e26005e70e.
  14. "Why did expert systems fall?". Retrocomputing Stack Exchange. https://retrocomputing.stackexchange.com/questions/6456/why-did-expert-systems-fall.
  15. "Machine learning". IBM. https://www.ibm.com/think/topics/machine-learning.
  16. 16.0 16.1 16.2 16.3 16.4 "Introduction to Inference in Artificial Intelligence". Pass4Sure. https://www.pass4sure.com/blog/introduction-to-inference-in-artificial-intelligence/.
  17. 17.0 17.1 17.2 17.3 "What is Forward Propagation in Neural Networks?". DataCamp. https://www.datacamp.com/tutorial/forward-propagation-neural-networks.
  18. "NVIDIA TensorRT". NVIDIA. https://developer.nvidia.com/tensorrt.
  19. 19.0 19.1 19.2 19.3 19.4 19.5 19.6 19.7 19.8 "FPGA vs GPU vs CPU: Hardware Options for AI Applications". Avnet. https://my.avnet.com/silica/resources/article/fpga-vs-gpu-vs-cpu-hardware-options-for-ai-applications/.
  20. 20.0 20.1 20.2 20.3 20.4 20.5 20.6 "Convolutional Neural Network (CNN)". NVIDIA. https://developer.nvidia.com/discover/convolutional-neural-network.
  21. "Forward propagation". Telnyx. https://telnyx.com/learn-ai/forward-propogation-ai.
  22. "How to Implement the Forward Method for a CNN in PyTorch". YouTube. https://www.youtube.com/watch?v=MasG7tZj-hw.
  23. 23.0 23.1 23.2 "AI Inference: Guide and Best Practices". Mirantis. https://www.mirantis.com/blog/what-is-ai-inference-a-guide-and-best-practices/.
  24. 24.0 24.1 24.2 24.3 "An explanation of causal AI". TechTarget. https://www.techtarget.com/whatis/video/An-explanation-of-causal-AI.
  25. 25.0 25.1 "On AI and Types of Reasoning". Towards Data Science. https://towardsdatascience.com/on-ai-and-types-of-reasoning-fc6980295158.
  26. 26.0 26.1 26.2 26.3 "Deduction vs. Induction vs. Abduction". Merriam-Webster. https://www.merriam-webster.com/grammar/deduction-vs-induction-vs-abduction.
  27. 27.0 27.1 27.2 27.3 27.4 "What Are the Different Types of Reasoning in AI?". Milvus. https://milvus.io/ai-quick-reference/what-are-the-different-types-of-reasoning-in-ai.
  28. 28.0 28.1 28.2 "TPU system architecture". Google Cloud. https://cloud.google.com/tpu/docs/system-architecture-tpu-vm.
  29. 29.0 29.1 29.2 29.3 29.4 29.5 "LLM Inference Performance Engineering: Best Practices". Databricks. https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices.
  30. 30.0 30.1 30.2 30.3 30.4 30.5 30.6 "LLM Inference Optimization". Clarifai. https://www.clarifai.com/blog/llm-inference-optimization/.
  31. 31.0 31.1 31.2 31.3 31.4 31.5 "LLM Inference Optimization 101". DigitalOcean. https://www.digitalocean.com/community/tutorials/llm-inference-optimization.
  32. 32.0 32.1 32.2 32.3 "Bayesian network". Wikipedia. https://en.wikipedia.org/wiki/Bayesian_network.
  33. 33.0 33.1 "Junction tree algorithm". Wikipedia. https://en.wikipedia.org/wiki/Junction_tree_algorithm.
  34. "A Comprehensive Review of Convolutional Neural Networks". MDPI. https://www.mdpi.com/2079-3197/11/3/52.
  35. "Convolutional neural network". Wikipedia. https://en.wikipedia.org/wiki/Convolutional_neural_network.
  36. 36.0 36.1 36.2 "What is a CNN?". HPE. https://www.hpe.com/us/en/what-is/convolutional-neural-network.html.
  37. "What is a Convolution Neural Network?". HPE. https://www.hpe.com/us/en/what-is/convolutional-neural-network.html#:~:text=CNNs%20consist%20of%20layers%20that,features%20to%20the%20final%20output..
  38. 38.0 38.1 38.2 38.3 38.4 38.5 38.6 38.7 "How Inference is Done in Transformer". Medium. https://medium.com/@sachinsoni600517/how-inference-is-done-in-transformer-3a1fd1a8bfea.
  39. 39.0 39.1 "The Basics of Transformer Inference". JAX-ML. https://jax-ml.github.io/scaling-book/inference/.
  40. 40.0 40.1 40.2 "Exact Inference in Bayesian Networks". GeeksforGeeks. https://www.geeksforgeeks.org/artificial-intelligence/exact-inference-in-bayesian-networks/.
  41. "Inference in Graphical Probability Models". George Mason University. https://mason.gmu.edu/~klaskey/GraphicalModels/GraphicalModels_Unit4_JTInference.pdf.
  42. "Exact Inference in Graphical Models". pgmpy. https://pgmpy.org/detailed_notebooks/5.%20Exact%20Inference%20in%20Graphical%20Models.html.
  43. "Exact Inference: Variable Elimination". Carnegie Mellon University. https://www.cs.cmu.edu/~epxing/Class/10708-14/scribe_notes/scribe_note_lecture4.pdf.
  44. 44.0 44.1 "From Variable Elimination to the Junction Tree Algorithms". Stanford AI Lab. https://ai.stanford.edu/~paskin/gm-short-course/lec3.pdf.
  45. 45.0 45.1 45.2 45.3 45.4 45.5 "Understand LLM latency and throughput metrics". Anyscale. https://docs.anyscale.com/llm/serving/benchmarking/metrics.
  46. 46.0 46.1 46.2 46.3 46.4 46.5 "Introduction to model evaluation". Vertex AI. https://cloud.google.com/vertex-ai/docs/evaluation/introduction.
  47. 47.0 47.1 "LLM Inference Optimization Techniques: A Comprehensive Analysis". Medium. https://medium.com/@sahin.samia/llm-inference-optimization-techniques-a-comprehensive-analysis-1c434e85ba7c.
  48. 48.0 48.1 48.2 48.3 48.4 48.5 48.6 "Performance Metrics in Machine Learning: The Complete Guide". Neptune.ai. https://neptune.ai/blog/performance-metrics-in-machine-learning-complete-guide.
  49. 49.0 49.1 49.2 49.3 49.4 "AI Inference Performance". NVIDIA. https://developer.nvidia.com/deep-learning-performance-training-inference/ai-inference.
  50. "What's the difference between throughput and latency?". AWS. https://aws.amazon.com/compare/the-difference-between-throughput-and-latency/.
  51. 51.0 51.1 51.2 51.3 51.4 51.5 51.6 51.7 51.8 "LLM Inference Metrics: A Comprehensive Guide". BentoML. https://bentoml.com/llm/inference-optimization/llm-inference-metrics.
  52. "Optimizing AI responsiveness: A practical guide to Amazon Bedrock latency-optimized inference". AWS. https://aws.amazon.com/blogs/machine-learning/optimizing-ai-responsiveness-a-practical-guide-to-amazon-bedrock-latency-optimized-inference/.
  53. "Artificial Intelligence: Understanding Training & Inference". ViaPhoton. https://viaphoton.com/artificial-intelligence-understanding-training-inference/.
  54. "Throughput-Latency Tradeoff in LLM Inference". Medium. https://medium.com/better-ml/throughput-latency-tradeoff-in-llm-inference-5a9e0d1d2c14.
  55. 55.0 55.1 "Define your evaluation metrics". Google Cloud. https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval.
  56. "LLM Evaluation Metrics: Everything You Need for LLM Evaluation". Confident AI. https://www.confident-ai.com/blog/llm-evaluation-metrics-everything-you-need-for-llm-evaluation.
  57. "Measuring the performance of our models on real-world tasks". OpenAI. https://openai.com/index/gdpval/.
  58. 58.0 58.1 58.2 "MLPerf Inference v5.1 Results". MLCommons. https://mlcommons.org/2025/09/mlperf-inference-v5-1-results/.
  59. "MLPerf Inference: Datcenter". MLCommons. https://mlcommons.org/benchmarks/inference-datacenter/. Cite error: Invalid <ref> tag; name "[47]" defined multiple times with different content
  60. 60.0 60.1 "AI Energy Score". Hugging Face. https://huggingface.github.io/AIEnergyScore/.
  61. "LLM-Inference-Bench: A Comprehensive Benchmarking Suite for LLM Inference". arXiv. https://arxiv.org/html/2411.00136v1.
  62. 62.0 62.1 "A Survey on Efficient LLM Inference". arXiv. https://arxiv.org/pdf/2404.14294.
  63. 63.0 63.1 63.2 "Quantization in Deep Learning". GeeksforGeeks. https://www.geeksforgeeks.org/deep-learning/quantization-in-deep-learning/.
  64. "What is quantization in machine learning?". Cloudflare. https://www.cloudflare.com/learning/ai/what-is-quantization/.
  65. "An Introduction to Model Quantization with Large Language Models". DigitalOcean. https://www.digitalocean.com/community/tutorials/model-quantization-large-language-models.
  66. "A Comprehensive Guide to Neural Network Model Pruning". Datature. https://datature.io/blog/a-comprehensive-guide-to-neural-network-model-pruning.
  67. "Introduction to Pruning in Deep Learning". Medium. https://medium.com/@anhtuan_40207/introduction-to-pruning-4d60ea4e81e9.
  68. "Neural Network Pruning in Deep Learning". GeeksforGeeks. https://www.geeksforgeeks.org/deep-learning/neural-network-pruning-in-deep-learning/.
  69. 69.0 69.1 "What is Knowledge Distillation?". Lightly AI. https://www.lightly.ai/blog/knowledge-distillation.
  70. 70.0 70.1 "Knowledge Distillation with Teacher Assistant for Model Compression". The Daily Dose of Data Science. https://www.dailydoseofds.com/p/knowledge-distillation-with-teacher-assistant-for-model-compression/.
  71. "AI Inference Optimization Techniques and Solutions". Nebius. https://nebius.com/blog/posts/inference-optimization-techniques-solutions.
  72. "Achieve 23x LLM Inference Throughput & Reduce p50 Latency". Anyscale. https://www.anyscale.com/blog/continuous-batching-llm-inference. Retrieved 2025-01-21.
  73. "Looking back at speculative decoding". Google Research. https://research.google/blog/looking-back-at-speculative-decoding/. Retrieved 2025-01-21.
  74. "Optimizing and Serving Models with NVIDIA TensorRT and NVIDIA Triton". NVIDIA. https://developer.nvidia.com/blog/optimizing-and-serving-models-with-nvidia-tensorrt-and-nvidia-triton/. Retrieved 2025-01-21.
  75. "Open Source Inference at Full Throttle: Exploring TGI and vLLM". Kelk. https://kelk.ai/blog/inference-engines. Retrieved 2025-01-21.
  76. "Triton Inference Server". GitHub. https://github.com/triton-inference-server/server. Retrieved 2025-01-21.
  77. "ONNX Runtime". Microsoft. https://onnxruntime.ai/.
  78. "TensorFlow Lite". TensorFlow. https://www.tensorflow.org/lite.
  79. "Edge AI vs. cloud AI: What's the difference?". IBM. https://www.ibm.com/think/topics/edge-vs-cloud-ai.
  80. "Edge AI Cameras vs. Cloud: Balancing Latency, Cost & Reach". Medium. https://medium.com/@API4AI/edge-ai-cameras-vs-cloud-balancing-latency-cost-reach-7e660131977f.
  81. 81.0 81.1 "The AI Edge Computing Cost: Local Processing vs. Cloud Pricing". Monetizely. https://www.getmonetizely.com/articles/the-ai-edge-computing-cost-local-processing-vs-cloud-pricing.
  82. "Cloud vs. Edge: Where Should AI Training Really Happen?". Datacenters.com. https://www.datacenters.com/news/cloud-vs-edge-where-should-ai-training-really-happen.
  83. 83.0 83.1 83.2 "AI Deep Learning Inference Acceleration". RidgeRun. https://www.ridgerun.com/post/ai-deep-learning-inference-acceleration.
  84. 84.0 84.1 84.2 "GPU vs FPGA vs ASIC vs CPU: Which Chip is Best for AI?". PCBONLINE. https://www.pcbonline.com/blog/gpu-vs-fpga-vs-asic-vs-cpu.html.
  85. "Inside Google's TPU and GPU Comparisons". SkyMod. https://skymod.tech/inside-googles-tpu-and-gpu-comparisons/.
  86. 86.0 86.1 86.2 "Custom ASICs for Real-Time Inference at the Edge". Geeta University. https://blog.geetauniversity.edu.in/custom-asics-for-real-time-inference-at-the-edge/.
  87. "Comparing NVIDIA H100 vs A100 GPUs for AI Workloads". OpenMetal. https://openmetal.io/resources/blog/nvidia-h100-vs-a100-gpu-comparison/.
  88. "Tensor Processing Unit". Wikipedia. https://en.wikipedia.org/wiki/Tensor_Processing_Unit.
  89. "Ironwood: The first Google TPU for the age of inference". Google. https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/.
  90. "AI Chip - Amazon Inferentia". AWS. https://aws.amazon.com/ai/machine-learning/inferentia/.
  91. "What Is Apple's Neural Engine and How Does It Work?". MakeUseOf. https://www.makeuseof.com/what-is-a-neural-engine-how-does-it-work/.
  92. "AI Inference Market Size, Share & Growth, 2025 To 2030". MarketsandMarkets. https://www.marketsandmarkets.com/Market-Reports/ai-inference-market-189921964.html. Retrieved 2025-01-21.
  93. "What is AI inference?". Red Hat. https://www.redhat.com/en/topics/ai/what-is-ai-inference.
  94. 94.0 94.1 "12 Real Life AI in Healthcare Examples". Keragon. https://www.keragon.com/blog/ai-in-healthcare-examples.
  95. "Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices". FDA. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-software-medical-device.
  96. "What is AI inference?". Red Hat. https://www.redhat.com/en/topics/ai/what-is-ai-inference#:~:text=Finance%3A%20After%20being%20trained%20on,privacy%2C%20and%20improve%20brand%20reputation..
  97. 97.0 97.1 "AI in Finance". Google Cloud. https://cloud.google.com/discover/finance-ai.
  98. "An introduction to AI". TechTarget. https://www.techtarget.com/whatis/video/An-introduction-to-AI.
  99. "Inference in AI". GeeksforGeeks. https://www.geeksforgeeks.org/artificial-intelligence/inference-in-ai/.
  100. "What is Reasoning in AI?". IBM Think. https://www.ibm.com/think/topics/ai-reasoning.
  101. "An explanation of AI model collapse". TechTarget. https://www.techtarget.com/whatis/video/An-explanation-of-AI-model-collapse.
  102. "A Guide to LLM Inference Performance Monitoring". Symbl.ai. https://symbl.ai/developers/blog/a-guide-to-llm-inference-performance-monitoring/.
  103. "Power Consumption Benchmark for Embedded AI Inference". ResearchGate. https://www.researchgate.net/publication/385300510_Power_Consumption_Benchmark_for_Embedded_AI_Inference.
  104. "Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute". Microsoft Research. https://www.microsoft.com/en-us/research/publication/energy-use-of-ai-inference-efficiency-pathways-and-test-time-compute/. Cite error: Invalid <ref> tag; name "[80]" defined multiple times with different content
  105. "The Economics of AI Training and Inference: How DeepSeek Broke the Cost Curve". Adyog. https://blog.adyog.com/2025/02/09/the-economics-of-ai-training-and-inference-how-deepseek-broke-the-cost-curve/.