Edge computing
Last reviewed
Apr 28, 2026
Sources
51 citations
Review status
Source-backed
Revision
v1 · 4,653 words
Edge computing is a distributed computing paradigm that brings computation and data storage closer to the sources of data, often called the "edge" of the network, rather than relying on a centralized cloud computing data center. The motivations include reducing latency, saving bandwidth, addressing data sovereignty and privacy, enabling offline operation, and supporting real time AI inference on resource constrained devices [1] [2]. It covers a spectrum of architectures, from server clusters at base stations to neural network accelerators inside smartphones, vehicles, drones, cameras, appliances, and industrial sensors.
Edge computing grew out of three converging trends: the explosion of Internet of Things (IoT) sensors generating data faster than backbone networks could absorb, the rollout of 5G cellular networks with single digit millisecond targets, and the commoditization of neural processing units (NPUs) inside consumer hardware. By the mid 2020s, on device AI moved from speech recognition and camera filters into running compact large language models locally, with examples such as Apple Intelligence, Gemini Nano, and Microsoft's Phi-3 showing that several billion parameter models could run on a phone or laptop without contacting a remote server [3] [4] [5].
| Edge computing | |
|---|---|
| Type | Distributed computing paradigm |
| Coined | "Fog computing" by Cisco, January 2014 |
| Key precursors | Akamai CDN (1998); cloudlet concept (CMU, 2009) |
| Standards | ETSI MEC; LF Edge; OpenFog (merged into the Industrial Internet Consortium) |
| Adjacent terms | Fog computing; cloudlet; on device computing; near edge; far edge |
| Notable platforms | AWS IoT Greengrass; Azure IoT Edge; Google Distributed Cloud Edge; Cloudflare Workers |
| Notable hardware | NVIDIA Jetson; Apple Neural Engine; Google Coral; Hailo; Qualcomm Hexagon |
The field has accumulated overlapping vocabulary, sometimes due to vendor branding and sometimes for legitimate technical distinctions.
Edge computing in its general sense refers to any computation that takes place at or near the source of data, rather than in a centralized cloud region. The idea predates the cloud era and is closely related to content delivery networks and on premises servers [1]. Fog computing is a related term coined by Cisco engineer Flavio Bonomi and colleagues in a January 2014 paper titled "Fog Computing and Its Role in the Internet of Things." In Cisco's framing, the fog is a layer of compute between the device and the cloud, often implemented in network gateways and routers, that handles aggregation, filtering, and time sensitive processing for many devices at once [6] [7]. The OpenFog Consortium, founded in 2015 by Cisco, Intel, Microsoft, ARM, Dell, and Princeton University, formalized this view before merging into the Industrial Internet Consortium in early 2019 [8].
Multi-access Edge Computing (MEC), formerly Mobile Edge Computing, is a standardized architecture defined by the European Telecommunications Standards Institute (ETSI). MEC describes how compute and storage are integrated with cellular base stations and aggregation points so that mobile applications can offload work to servers within one wireless hop. ETSI's first MEC group was established in 2014 [9] [10]. Cloudlet is an academic term from a 2009 paper by Mahadev Satyanarayanan, Paramvir Bahl, Ramon Caceres, and Nigel Davies titled "The Case for VM-Based Cloudlets in Mobile Computing," published in IEEE Pervasive Computing. A cloudlet is a small data center that sits one wireless hop from a mobile device, and the ideas later appeared in commercial MEC and telco edge offerings [2] [11].
On device computing describes computation that runs entirely on the end device with no network round trip. Near edge versus far edge is a vendor distinction: the near edge sits inside operator data centers or regional points of presence; the far edge is closer to the device, often on customer premises or inside a base station, with very low latency to a small number of users [12].
Early content delivery networks (CDNs) such as Akamai, founded in 1998 out of MIT, cached static web objects on servers spread around the world, reducing the round trip time and bandwidth cost of serving images and video. CDNs are widely viewed as the first commercially significant edge computing systems, although the focus was on caching content rather than running arbitrary code [13]. In 2009, the Carnegie Mellon paper "The Case for VM-Based Cloudlets in Mobile Computing" framed a more general problem: smartphones lacked the compute and energy needed for tasks such as augmented reality, and the wide area network round trip to a cloud data center was too slow. Satyanarayanan and his coauthors proposed putting small clusters of servers running virtual machines one hop away from mobile users [2]. Cisco's 2014 fog computing paper made an industrial case for the same concept tied to the Internet of Things, arguing that the volume of sensor data from connected vehicles, smart grids, and industrial machinery would overwhelm naive cloud only architectures [6]. Cisco co-founded the OpenFog Consortium in 2015 [8].
In parallel, smartphones gained dedicated neural processing units. Apple's A11 Bionic, shipped in the iPhone 8 and iPhone X in September 2017, was the first Apple chip with a dedicated Neural Engine, capable of 600 billion operations per second. Google's Pixel 2, released the same fall, included the Pixel Visual Core, and Huawei's Kirin 970 introduced a Cambricon designed NPU. Within three years, NPUs were standard across mid range and flagship phones [14] [15]. The rollout of 5G cellular networks starting in 2019 brought renewed focus on telco edge: the 5G specification supports ultra reliable low latency communication (URLLC) with single digit millisecond targets, which only makes sense if applications can be hosted within the radio access network [10] [16].
A more recent inflection arrived in 2024 with the on device large language model. In June 2024, Apple announced Apple Intelligence at WWDC, describing a roughly 3 billion parameter on device foundation model running on iPhone 15 Pro, iPhone 16, iPad with M series silicon, and Mac, with harder queries handled by Private Cloud Compute [3]. Google had already shipped Gemini Nano on the Pixel 8 Pro in December 2023, with expanded availability on the Pixel 9 family in 2024 [4]. Microsoft released Phi-3-mini, a 3.8 billion parameter model, in April 2024, explicitly framed as suitable for phones [5]. In September 2024, Meta released Llama 3.2 1B and 3B variants positioned for on device deployment [17].
Edge computing rarely replaces cloud computing. The dominant architectures are hierarchical, with tiers handling different workloads.
| Tier | Typical location | Latency to device | Example workloads |
|---|---|---|---|
| Cloud (deep) | Hyperscale region, hundreds to thousands of kilometers away | 30 to 200 ms | Model training, batch analytics, archive storage |
| Regional or near edge | Carrier data center, metro point of presence | 10 to 50 ms | CDN dynamic logic, regulatory data residency, large model inference |
| Far edge or fog | Telco base station, factory, branch office | 1 to 10 ms | Industrial control, AR rendering, video analytics |
| Device or on device | Phone, vehicle, sensor, robot, wearable | 0 ms (no network hop) | Wake word detection, perception, on device LLMs, biometrics |
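As a rough illustration of how the tiers in the table might guide workload placement, the sketch below picks the most centralized tier whose worst case latency still fits a round trip budget. The tier names and latency ranges are copied from the table; the function name `pick_tier` and the "prefer the most centralized tier that fits" rule are assumptions made for illustration, not an established placement algorithm.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    min_ms: float  # best case latency to the device
    max_ms: float  # worst case latency to the device


# Latency ranges copied from the table, ordered from most to least centralized.
TIERS = [
    Tier("Cloud (deep)", 30, 200),
    Tier("Regional or near edge", 10, 50),
    Tier("Far edge or fog", 1, 10),
    Tier("Device or on device", 0, 0),
]


def pick_tier(budget_ms: float) -> Optional[Tier]:
    """Return the most centralized tier whose worst case latency fits the budget.

    Hypothetical rule: centralized tiers offer more compute, so prefer them
    whenever their worst case round trip still meets the deadline.
    """
    for tier in TIERS:
        if tier.max_ms <= budget_ms:
            return tier
    return None  # only reachable for a negative budget


if __name__ == "__main__":
    for budget in (250, 60, 15, 5):
        print(budget, "ms ->", pick_tier(budget).name)
```

A real placement decision would also weigh compute capacity, data residency, and cost, which this sketch deliberately ignores.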
Edge gateways and routers sit between local devices and an upstream network. Software platforms such as AWS IoT Greengrass, Azure IoT Edge, and Google Distributed Cloud Edge provide containerized application hosting, message routing, local model serving, and offline operation [18] [19] [20]. Telco edge integrates compute with cellular networks: AWS Wavelength, launched in late 2019, places AWS compute inside the networks of carriers such as Verizon, Vodafone, and KDDI. Azure Edge Zones and Google Cloud Edge follow similar patterns [9] [16].
CDN as edge is the model where a content delivery network grows into a programmable runtime. Cloudflare Workers, launched in September 2017, runs JavaScript and WebAssembly in V8 isolates across Cloudflare's roughly 330 city footprint with cold start times under five milliseconds [21]. Fastly Compute@Edge, generally available in 2020, uses Lucet and later Wasmtime [22]. AWS Lambda@Edge runs Lambda functions at Amazon CloudFront edge locations.
On device runtimes push the architecture into the endpoint: a modern smartphone may run a wake word model on a digital signal processor, face recognition on an NPU, speech to text on the GPU, and an on device LLM on a combination of NPU and CPU, all without leaving the device.
The case for placing compute at the edge rests on several recurring arguments.
Latency. The round trip time from a consumer device to a hyperscale cloud region is typically 30 to 200 milliseconds. For augmented reality, robotic teleoperation, autonomous driving, and high frequency industrial control, that is too slow. Edge architectures can deliver round trips under 10 milliseconds, and sometimes under 1 millisecond when compute is colocated with the device [9] [10].
Bandwidth. A single 4K video camera generates roughly 25 megabits per second of compressed video, which adds up to about 8 terabytes per camera per month. A factory with a hundred such cameras cannot reasonably push that traffic to the cloud. Running detection models locally and forwarding only events or metadata reduces upstream bandwidth by orders of magnitude [1] [12].
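The bandwidth argument can be checked with simple arithmetic. The sketch below converts the 25 megabit per second figure into a monthly volume and compares it with a metadata only stream. The event rate and event size (10 events per minute, 500 bytes each) are assumed values chosen for illustration and do not come from the cited sources.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_terabytes(megabits_per_second: float, days: int = 30) -> float:
    """Convert a sustained bitrate into decimal terabytes transferred per month."""
    bytes_per_second = megabits_per_second * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_DAY * days / 1e12


def event_stream_mbps(events_per_minute: float, bytes_per_event: int) -> float:
    """Bitrate in Mbit/s of a metadata only stream (assumed event rate and size)."""
    return events_per_minute / 60 * bytes_per_event * 8 / 1_000_000


raw_tb = monthly_terabytes(25)            # one 4K camera at 25 Mbit/s
fleet_tb = 100 * raw_tb                   # the hundred camera factory
events_mbps = event_stream_mbps(10, 500)  # assumption: 10 events/min, 500 bytes each
events_tb = 100 * monthly_terabytes(events_mbps)

if __name__ == "__main__":
    print(f"one camera, raw video:   {raw_tb:.1f} TB/month")
    print(f"100 cameras, raw video:  {fleet_tb:.0f} TB/month")
    print(f"100 cameras, events only: {events_tb * 1000:.1f} GB/month")
```

Under these assumptions one camera produces about 8.1 TB per month and the hundred camera fleet about 810 TB, while the events only stream comes to roughly 21.6 GB for the whole fleet, a reduction of more than four orders of magnitude.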
Privacy and data sovereignty. Regulations including the European Union's GDPR, the EU Data Act, the California Consumer Privacy Act, and HIPAA in the United States make it expensive or impossible to send certain data to a foreign jurisdiction. Edge processing keeps data on the device or within a country's borders, simplifying compliance [23].
Offline operation, cost, and real time control. Vehicles, ships, drones, and remote industrial sites often operate without continuous connectivity, so edge systems must buffer and process locally. Cloud egress fees and transit capacity are real line items, and running anomaly detection at the edge and shipping only flagged events is materially cheaper than streaming raw telemetry [12]. Industrial automation, robotic surgery, and autonomous driving also have hard timing requirements: the functional safety case usually requires that critical loops run locally [10].
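The pattern of detecting anomalies locally, buffering while offline, and shipping only flagged events can be sketched in a few lines. This is a minimal illustration under stated assumptions: the class name `EdgeFilter`, the rolling z-score rule, the window size, and the `send` callback are all hypothetical, and a production system would use a detector tuned to its sensors and persistent storage for the outbox.

```python
from collections import deque
from statistics import mean, pstdev


class EdgeFilter:
    """Keep a rolling window of readings and flag outliers locally.

    Hypothetical sketch: a reading is flagged when it lies more than
    `threshold` standard deviations from the mean of the recent window.
    Flagged events wait in a bounded outbox until an uplink is available.
    """

    def __init__(self, window: int = 50, threshold: float = 3.0, outbox_size: int = 1000):
        self.recent = deque(maxlen=window)
        self.outbox = deque(maxlen=outbox_size)  # oldest events drop first when full
        self.threshold = threshold

    def observe(self, timestamp: float, value: float) -> bool:
        """Process one reading on the device; return True if it was flagged."""
        flagged = False
        if len(self.recent) >= 10:  # wait for a minimal baseline
            mu, sigma = mean(self.recent), pstdev(self.recent)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                flagged = True
                self.outbox.append({"t": timestamp, "value": value, "mean": mu})
        if not flagged:
            self.recent.append(value)  # flagged readings stay out of the baseline
        return flagged

    def flush(self, send) -> int:
        """Drain buffered events through `send` once connectivity returns."""
        sent = 0
        while self.outbox:
            send(self.outbox[0])   # if send raises, the event stays queued
            self.outbox.popleft()
            sent += 1
        return sent
```

Only the entries in `outbox` ever cross the network; the raw readings are consumed and discarded on the device.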
A major share of contemporary edge computing investment is driven by AI inference. Models trained in the cloud are compressed, quantized, and deployed onto end devices or edge servers so that perception, generation, and decisioning can happen in real time without a network round trip.
A wide range of chips compete for the AI inference workload at the edge. Most use a mix of dense matrix multiply units, on chip SRAM, and toolchains that convert trained models into the chip's native instruction set.
| Chip family | Vendor | Notes |
|---|---|---|
| Apple Neural Engine | Apple | First shipped in A11 Bionic (September 2017) at 0.6 TOPS; M4 Neural Engine reaches 38 TOPS [14] [24] |
| NVIDIA Jetson | NVIDIA | Jetson AGX Orin delivers 275 INT8 TOPS; Jetson Thor announced 2024 for humanoid robotics [25] [26] |
| Coral Edge TPU | Google | 4 INT8 TOPS at roughly 2 W; targeted at small embedded vision [27] |
| Hexagon NPU | Qualcomm | Snapdragon 8 Gen 3 advertises 45 INT8 TOPS for on device generative AI [28] |
| Hailo-8 / Hailo-10 | Hailo | Hailo-8 reaches 26 TOPS at about 2.5 W; Hailo-10 announced 2024 for on device LLM inference [29] |
| Mythic AMP | Mythic AI | Analog matrix processor performing multiplications in flash memory; targeted at low power vision |
| Movidius Myriad / Gaudi | Intel | Movidius Myriad X reaches 4 TOPS; Gaudi targets data center training and inference [30] |
| Grayskull / Wormhole / Blackhole | Tenstorrent | RISC-V based AI accelerators from the company led by Jim Keller |
| Ryzen AI XDNA | AMD | First shipped in Ryzen 7040 (2023); Ryzen AI 300 series reaches 50 TOPS [31] |
| FSD chip | Tesla | Custom inference SoC deployed as a dual redundant pair; roughly 72 TOPS per chip, 144 TOPS for the two chip computer [32] |
| EyeQ | Mobileye | EyeQ6H delivers 34 TOPS at about 33 W; widely deployed in production vehicles [33] |
The diversity of accelerators is a strength for cost and power optimization but creates real friction for developers who must support several toolchains.
Until 2023, on device AI was dominated by perception tasks: speech recognition, computer vision, sensor fusion. The arrival of small but capable language models has expanded the on device frontier into generative AI.
Apple Intelligence, announced at WWDC 2024, includes a roughly 3 billion parameter on device foundation model that runs on the Apple A17 Pro, A18, or M series silicon. The model is fine tuned with task specific adapters for writing tools, summarization, and notification triage. Heavier queries route to Private Cloud Compute [3] [34]. Gemini Nano is Google's smallest Gemini variant and shipped first on the Pixel 8 Pro in December 2023, with the Pixel 9 family extending availability in 2024. It powers Recorder summarization, Smart Reply in Gboard, and on device descriptions for accessibility, and Google has exposed it through AICore and the Chrome Built-in AI APIs [4] [35].
Phi-3-mini, released by Microsoft in April 2024, is a 3.8 billion parameter dense Transformer trained on 3.3 trillion tokens of "textbook quality" data. Quantized to 4 bits, the model occupies roughly 1.8 GB of memory and reaches interactive token rates on a phone [5]. The Phi family expanded with Phi-3-small, Phi-3-medium, Phi-3.5-mini, and Phi-3.5-vision before Phi-4 in late 2024. Llama 3.2 added 1 billion and 3 billion parameter text variants in September 2024 for on device use, with Qualcomm and MediaTek announcing day one support on Snapdragon and Dimensity NPUs [17].
The key enabler for on device LLMs is quantization. Models trained in 16 or 32 bit floating point are converted to lower precision integer formats, typically INT8 or INT4 for weights. Combined with pruning and weight sharing, quantization brings a 7 billion parameter model from roughly 14 GB at FP16 down to under 4 GB at INT4, fitting in mobile RAM [36]. Methods such as GPTQ, AWQ, and Apple's group quantization are now standard.
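The memory figures above follow directly from parameter count times bits per weight, and the basic idea of integer quantization fits in a few lines. The sketch below uses plain round to nearest symmetric quantization with a single scale per tensor. That is the simplest scheme and is an assumption for illustration: GPTQ, AWQ, and group quantization refine it with calibration data and per group scales. The function names are hypothetical.

```python
def model_size_gb(params: float, bits_per_weight: int) -> float:
    """Weight storage in decimal gigabytes, ignoring scales and activations."""
    return params * bits_per_weight / 8 / 1e9


def quantize(weights, bits=8):
    """Symmetric per tensor quantization to signed integers.

    Returns the integer codes and the scale needed to reconstruct them.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    largest = max(abs(w) for w in weights)
    scale = largest / qmax if largest else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating point weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    print(model_size_gb(7e9, 16), "GB at FP16")
    print(model_size_gb(7e9, 4), "GB at INT4")
    weights = [0.42, -1.0, 0.07, 0.0]
    codes, scale = quantize(weights, bits=4)
    print(codes, dequantize(codes, scale))
```

For a 7 billion parameter model this gives 14 GB at 16 bits and 3.5 GB at 4 bits, matching the figures in the text. The reconstruction error of each weight is bounded by half the scale, which is why lower bit widths trade memory for accuracy.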
Deploying a trained model to edge hardware requires a runtime that maps operators onto accelerators.
TensorFlow Lite, announced in 2017, is Google's mobile and embedded inference runtime. In 2024, Google rebranded it to LiteRT to reflect support for PyTorch and JAX models as well as TensorFlow. LiteRT runs on Android, iOS, embedded Linux, and microcontrollers with delegates for GPU, NPU, and Coral Edge TPU [37]. Core ML, introduced at WWDC 2017, is Apple's on device framework: it compiles models into a unified format that the operating system schedules across CPU, GPU, and Apple Neural Engine, with conversion from PyTorch and TensorFlow via coremltools [24]. ONNX Runtime Mobile is Microsoft's cross platform runtime, supporting DirectML on Windows, CoreML on Apple, NNAPI on Android, and Hexagon on Qualcomm devices, and is the runtime behind much of Phi-3 on device deployment [38].
ExecuTorch, announced by Meta in October 2023, is a successor to PyTorch Mobile that uses the PyTorch 2.x export pipeline and backends for XNNPACK, Vulkan, Apple Core ML, MediaTek APU, and Qualcomm Hexagon, reaching 1.0 in late 2024 [39]. MediaPipe, originally a Google internal framework for streaming media perception, became open source in 2019 and now includes the MediaPipe LLM Inference API for on device language models [40]. MLC-LLM from CMU, UW, and Shanghai Jiao Tong University compiles language models for native execution on iPhones, Android phones, AMD GPUs, and WebGPU [41]. llama.cpp is Georgi Gerganov's open source project, started in March 2023, that demonstrated running quantized Llama models on consumer CPUs using a custom tensor library called ggml. It popularized the GGUF model format and underlies LM Studio, Ollama, and Jan [42] [43].
Edge computing is most visible in industries with strict latency, bandwidth, privacy, or reliability requirements.
Autonomous vehicles are the canonical edge AI workload. Tesla's FSD chip, introduced in 2019 with Hardware 3.0, runs full self driving inference on a dual redundant computer whose two SoCs together deliver 144 TOPS. NVIDIA's Drive Orin platform, launched in 2022, delivers up to 254 TOPS, and its Drive Thor successor, announced for production in 2025, targets over 1,000 TOPS. Mobileye's EyeQ family is widely used in ADAS by BMW, Volkswagen, Ford, and Geely. All three vendors run perception, prediction, and planning models on the vehicle, with cloud services used for fleet learning and map updates rather than real time control [32] [33] [44].
Industrial IoT and Industry 4.0 deployments use edge computing for predictive maintenance, vision quality control, and process optimization. Siemens, Bosch, Schneider Electric, ABB, and Rockwell Automation all ship edge platforms that run vibration, acoustic, thermal, and visual models on factory floors [45]. Smart cities deploy edge AI in traffic cameras for pedestrian and vehicle counting and license plate recognition, and in energy systems. Cisco, Hikvision, Dahua, Axis, and Verkada sell cameras with on device inference, and privacy regulations have nudged many municipalities toward edge processing that returns aggregated counts rather than identities.
Retail uses edge computing for cashier free stores including Amazon Just Walk Out, Standard AI, Trigo, and AiFi, with ceiling mounted cameras and weight sensors feeding on premises inference servers [46]. Healthcare applications include bedside monitoring, surgical robotics, point of care diagnostics, and at home cardiac monitors. Apple's ECG and atrial fibrillation features, fall detection, and crash detection on iPhone and Apple Watch are large scale consumer examples [47]. Wearables including Apple Watch, Garmin, Fitbit, Whoop, Oura, and Polar all run on device models for heart rate, blood oxygen, sleep staging, and step counting.
AR and VR. Apple Vision Pro, Meta Quest, and ByteDance's Pico headsets run hand tracking, eye tracking, foveated rendering, and avatar rendering on device. Latency targets in the low tens of milliseconds make cloud only deployment unworkable [48]. Drones including Skydio, Parrot Anafi, and DJI's enterprise series use on device tracking, obstacle avoidance, and visual inertial odometry; Skydio's X10 emphasizes fully on board autonomy [49]. Telecom has adopted edge for network function virtualization and 5G core deployment in software. Gaming services such as NVIDIA GeForce NOW and Xbox Cloud Gaming stream rendered frames from edge GPU servers, while clients run upscaling models such as DLSS, FSR, and XeSS locally. Google's Stadia shut down in January 2023 in part because the latency budget for input to display was difficult to meet outside dense urban areas.
Edge and cloud architectures are complementary rather than competitive in most systems.
| Dimension | Edge | Cloud |
|---|---|---|
| Latency | Low to ultra low (under 10 ms common, under 1 ms possible) | Higher (30 to 200 ms typical) |
| Compute capacity | Limited by power and form factor | Effectively unbounded |
| Energy budget | Tight (battery, fanless, embedded) | Loose (data center power) |
| Bandwidth efficiency | High (only ship results) | Low (must ship raw data) |
| Privacy and sovereignty | Strong by default | Depends on regional offerings |
| Offline operation | Yes | No |
| Update and patching | Hard at scale across heterogeneous fleets | Easy (single environment) |
| Hardware homogeneity | Fragmented (many SoCs, OSes, accelerators) | Uniform within a provider |
| Best workload examples | Perception, real time control, biometrics, on device LLM | Training, batch analytics, large model inference |
Hybrid architectures dominate in practice. Models are trained in the cloud, compressed, then deployed at the edge. Sensitive inputs are processed on device, with anonymized features or harder queries escalated to a cloud or near edge tier. Apple's combination of on device foundation models with Private Cloud Compute is an explicit articulation of this design [3] [34].
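The escalation pattern described above can be sketched as a small router that tries a local model first and forwards only a redacted request when the local answer is not confident enough. Everything here is a hypothetical placeholder: the `HybridRouter` class, the confidence threshold, and the `redact` callback are assumptions for illustration and do not describe how Apple's or any other vendor's system is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class HybridRouter:
    """Route a request to a local model first and escalate only when needed.

    `local_model` returns (answer, confidence); `cloud_model` returns an answer.
    Both callables and the confidence threshold are hypothetical placeholders.
    """
    local_model: Callable[[str], Tuple[str, float]]
    cloud_model: Callable[[str], str]
    redact: Callable[[str], str]
    min_confidence: float = 0.8

    def handle(self, request: str, online: bool = True) -> Tuple[str, str]:
        """Return (tier, answer), where tier is "device" or "cloud"."""
        answer, confidence = self.local_model(request)
        if confidence >= self.min_confidence or not online:
            return "device", answer  # the request never leaves the device
        # Escalate a redacted version of the request to the larger tier.
        return "cloud", self.cloud_model(self.redact(request))


if __name__ == "__main__":
    router = HybridRouter(
        local_model=lambda q: ("ok", 0.9) if len(q) < 20 else ("unsure", 0.3),
        cloud_model=lambda q: "cloud answer for: " + q,
        redact=lambda q: q.replace("Alice", "[name]"),
    )
    print(router.handle("set a timer"))
    print(router.handle("Summarize the letter Alice sent yesterday"))
```

When the device is offline the router falls back to the local answer even at low confidence, which reflects the offline operation property listed in the comparison table.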
AWS IoT Greengrass, launched in June 2017, runs a Lambda compatible container runtime on edge devices, with synchronization to AWS IoT Core, local model inference via SageMaker Edge Manager, and offline operation against a local message broker [18]. Azure IoT Edge became generally available in June 2018; it packages workloads as containers and deploys them through Azure IoT Hub, often paired with Azure Stack Edge hardware [19]. Google Distributed Cloud Edge brings Google's Anthos stack onto customer hardware with Vertex AI model serving at the edge, and Google operates Google Distributed Cloud Hosted for air gapped sovereign environments [20].
Cloudflare Workers AI, launched in September 2023, runs a curated catalog of models including Llama, Mistral, and Stable Diffusion variants on GPU equipped servers across a subset of Cloudflare's network, callable from Workers code [50]. Fastly Compute@Edge is a programmable runtime built on the Lucet and later Wasmtime WebAssembly engines [22]. Akamai Connected Cloud combines Akamai's CDN footprint with Linode's compute regions and the EdgeWorkers serverless platform following Akamai's acquisition of Linode in 2022.
ETSI MEC (Multi-access Edge Computing) is the main international standard for telco edge. The ETSI MEC Industry Specification Group has published more than thirty specifications since 2014 covering reference architecture, application enablement, location services, and 5G integration [9]. LF Edge, a Linux Foundation umbrella formed in January 2019, brings together open source edge projects including EdgeX Foundry (industrial IoT), Akraino Edge Stack (telco blueprints), Project EVE (hypervisor for edge devices), and Open Horizon (workload management). The OpenFog Consortium, by contrast, merged into the Industrial Internet Consortium in early 2019 [8] [51]. 3GPP standards drive the cellular edge through 5G specifications including the user plane function (UPF) and slicing features that enable per application edge breakout [16].
The privacy and sovereignty argument for edge computing has grown sharper as regulators and consumers focus on AI data flows.
Private Cloud Compute, announced by Apple at WWDC 2024, extends the privacy properties of on device computing to a server side AI service. Apple servers built on Apple silicon run a hardened operating system, with code transparency through verifiable build images, no persistent storage of user data, and cryptographic attestation that a request is being handled by a verified node. The design is intended to let security researchers confirm that the deployed software matches the published source [3] [34]. GDPR and the EU Data Act push organizations to keep personal data within the European Union and to give data subjects rights of access, correction, and erasure. Edge processing, where personal data never leaves the device or local network, simplifies compliance compared to centralized cloud processing [23].
Sovereign cloud and on premises AI offerings have become mainstream: Microsoft, AWS, Google, and Oracle each ship sovereign cloud regions, and vendors including NVIDIA, Cisco, Dell, HPE, and Supermicro sell on premises AI factory configurations to governments, defense organizations, and regulated industries. Confidential computing technologies such as Intel TDX, AMD SEV-SNP, NVIDIA's H100 confidential computing mode, and Apple's Secure Enclave provide hardware backed isolation used both at the edge and inside near edge data centers.
Edge computing is not a panacea, and the literature catalogs a number of recurring difficulties.
Hardware fragmentation. Unlike cloud environments, where x86 and Arm servers form a largely homogeneous fleet, the edge involves dozens of SoC vendors, multiple operating systems (Linux, Android, RTOS variants, bare metal), and at least a dozen incompatible AI accelerator toolchains.
Update and patching. Updating millions of heterogeneous devices is harder than redeploying a service in a cloud region; failed updates can brick devices, and long lived devices outlive the support windows of the chips and runtimes they ship with.
Security. Edge devices are physically reachable, which broadens the threat model. Side channel attacks, supply chain attacks, and key extraction have all been demonstrated.
Energy and thermal constraints. Training class workloads are not feasible at the edge in any near term horizon. Even for inference, NPU power budgets are tightly bounded and thermal management is a constraint in fanless designs. Quantization, sparsity, and speculative decoding are responses to this pressure.
Deployment complexity. Sustaining a fleet of edge sites with consistent monitoring, observability, and capacity planning is operationally heavier than the cloud equivalent.
Marketing dilution. "Edge" has become a marketing label applied to almost any compute that is not in a hyperscale region. Industry observers including Gartner and IDC have noted that vendor claims often refer to architectures that are technically near edge or regional cloud, blurring the distinction with the deep cloud [12].