Edge computing

Updated Jun 22, 2026

Edge computing is a distributed computing paradigm that runs computation and data storage close to where data is generated, at the "edge" of the network, instead of sending everything to a centralized cloud computing data center. Its purpose is to cut latency, save bandwidth, keep sensitive data local for privacy and data sovereignty, allow offline operation, and run real time AI inference on resource constrained devices ^[1] ^[2]. The research firm Gartner forecast in 2018 that 75% of enterprise generated data would be created and processed outside a traditional centralized data center or cloud by 2025, up from about 10% at the time, a shift that has made edge a strategic computing tier ^[12]. STL Partners projects the total addressable market for edge computing will reach roughly 424 billion US dollars by 2030, growing at about a 32% compound annual rate ^[52].

Edge computing spans a spectrum of architectures, from server clusters at cellular base stations to neural network accelerators inside smartphones, vehicles, drones, cameras, appliances, and industrial sensors. The closely related term edge AI refers specifically to running machine learning inference on these edge devices; this article treats edge computing as the broader infrastructure paradigm of which edge AI is the dominant modern workload.

Edge computing grew out of three converging trends: the explosion of Internet of Things (IoT) sensors generating data faster than backbone networks could absorb, the rollout of 5G cellular networks with single digit millisecond targets, and the commoditization of neural processing units (NPUs) inside consumer hardware. By the mid 2020s, on device AI moved from speech recognition and camera filters into running compact large language models locally, with examples such as Apple Intelligence, Gemini Nano, and Microsoft's Phi-3 showing that several billion parameter models could run on a phone or laptop without contacting a remote server ^[3] ^[4] ^[5].

Edge computing
Type	Distributed computing paradigm
Coined	"Fog computing" by Cisco researchers, 2012 ^[6]
Key precursors	Akamai CDN (1998); cloudlet concept (CMU, 2009)
Standards	ETSI MEC; LF Edge; OpenFog (merged into the Industrial Internet Consortium)
Adjacent terms	Edge AI; fog computing; cloudlet; on device computing; near edge; far edge
Notable platforms	AWS IoT Greengrass; Azure IoT Edge; Google Distributed Cloud Edge; Cloudflare Workers
Notable hardware	NVIDIA Jetson; Apple Neural Engine; Google Coral; Hailo; Qualcomm Hexagon
Market size	About 24 billion US dollars in 2024; roughly 424 billion US dollars projected for 2030 (STL Partners) ^[52]

What is edge computing, and how do the terms differ?

The field has accumulated overlapping vocabulary, sometimes due to vendor branding and sometimes for legitimate technical distinctions.

Edge computing in its general sense refers to any computation that takes place at or near the source of data, rather than in a centralized cloud region. The idea predates the cloud era and is closely related to content delivery networks and on premises servers ^[1]. Fog computing is a related term coined by Cisco engineer Flavio Bonomi and colleagues in a 2012 paper titled "Fog Computing and Its Role in the Internet of Things." In Cisco's framing, the fog is a layer of compute between the device and the cloud, often implemented in network gateways and routers, that handles aggregation, filtering, and time sensitive processing for many devices at once ^[6] ^[7]. The OpenFog Consortium, founded in 2015 by Cisco, Intel, Microsoft, ARM, Dell, and Princeton University, formalized this view before merging into the Industrial Internet Consortium in a combination announced in December 2018 ^[8].

Multi-access Edge Computing (MEC), formerly Mobile Edge Computing, is a standardized architecture defined by the European Telecommunications Standards Institute (ETSI). MEC describes how compute and storage are integrated with cellular base stations and aggregation points so that mobile applications can offload work to servers within one wireless hop. ETSI's first MEC group was established in 2014 ^[9] ^[10]. Cloudlet is an academic term from a 2009 paper by Mahadev Satyanarayanan, Paramvir Bahl, Ramon Caceres, and Nigel Davies titled "The Case for VM-Based Cloudlets in Mobile Computing," published in IEEE Pervasive Computing. A cloudlet is a small data center that sits one wireless hop from a mobile device, and the ideas later appeared in commercial MEC and telco edge offerings ^[2] ^[11].

On device computing describes computation that runs entirely on the end device with no network round trip. Near edge versus far edge is a vendor distinction: the near edge sits inside operator data centers or regional points of presence; the far edge is closer to the device, often on customer premises or inside a base station, with very low latency to a small number of users ^[12].

When did edge computing emerge?

Early content delivery networks (CDNs) such as Akamai, founded in 1998 out of MIT, cached static web objects on servers spread around the world, reducing the round trip time and bandwidth cost of serving images and video. CDNs are widely viewed as the first commercially significant edge computing systems, although the focus was on caching content rather than running arbitrary code ^[13]. In 2009, the Carnegie Mellon paper "The Case for VM-Based Cloudlets in Mobile Computing" framed a more general problem: smartphones lacked the compute and energy needed for tasks such as augmented reality, and the wide area network round trip to a cloud data center was too slow. Satyanarayanan and his co authors proposed putting small clusters of servers running virtual machines one hop away from mobile users ^[2]. The 2012 fog computing paper by Cisco researchers made an industrial case for the same concept tied to the Internet of Things, arguing that the volume of sensor data from connected vehicles, smart grids, and industrial machinery would overwhelm naive cloud only architectures ^[6]. Cisco co-founded the OpenFog Consortium in 2015 ^[8].

In parallel, smartphones gained dedicated neural processing units. Apple's A11 Bionic, shipped in the iPhone 8 and iPhone X in September 2017, was Apple's first chip with a dedicated Neural Engine and one of the first widely deployed NPUs in a consumer phone, capable of 600 billion operations per second. Google's Pixel 2, released the same fall, included the Pixel Visual Core, and Huawei's Kirin 970 introduced a Cambricon NPU. Within three years, NPUs were standard across mid range and flagship phones ^[14] ^[15]. The rollout of 5G cellular networks starting in 2019 brought renewed focus on telco edge: the 5G specification supports ultra reliable low latency communication (URLLC) with single digit millisecond targets, which only makes sense if applications can be hosted within or next to the radio access network ^[10] ^[16].

A more recent inflection arrived in 2024 with the on device large language model. In June 2024, Apple announced Apple Intelligence at WWDC, describing a roughly 3 billion parameter on device foundation model running on iPhone 15 Pro, iPhone 16, iPad with M series silicon, and Mac, with harder queries handled by Private Cloud Compute ^[3]. Google had already shipped Gemini Nano on the Pixel 8 Pro, a phone launched in October 2023 that received the model in a December 2023 update, with expanded availability on the Pixel 9 family in 2024 ^[4]. Microsoft released Phi-3-mini, a 3.8 billion parameter model, in April 2024 explicitly framed as suitable for phones ^[5]. In September 2024, Meta released Llama 3.2 1B and 3B variants positioned for on device deployment; Meta described the release as "revolutionizing edge AI and vision with open, customizable models" ^[17].

How is edge computing architected?

Edge computing rarely replaces cloud computing. The dominant architectures are hierarchical, with tiers handling different workloads.

Tier	Typical location	Latency to device	Example workloads
Cloud (deep)	Hyperscale region, hundreds to thousands of kilometers away	30 to 200 ms	Model training, batch analytics, archive storage
Regional or near edge	Carrier data center, metro point of presence	10 to 50 ms	CDN dynamic logic, regulatory data residency, large model inference
Far edge or fog	Telco base station, factory, branch office	1 to 10 ms	Industrial control, AR rendering, video analytics
Device or on device	Phone, vehicle, sensor, robot, wearable	0 ms (no network hop)	Wake word detection, perception, on device LLMs, biometrics
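
As a rough illustration of how the latency column drives placement, the sketch below picks the most centralized tier whose worst case round trip still fits an application's latency budget. The latency ranges are copied from the table; the `Tier` class and the `most_central_tier` function are hypothetical names introduced for this example, not part of any platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_latency_ms: float  # best-case round trip quoted in the table
    max_latency_ms: float  # worst-case round trip quoted in the table


# Ordered from most centralized to most local.
TIERS = [
    Tier("cloud", 30.0, 200.0),
    Tier("near edge", 10.0, 50.0),
    Tier("far edge", 1.0, 10.0),
    Tier("device", 0.0, 0.0),
]


def most_central_tier(latency_budget_ms: float) -> Tier:
    """Return the most centralized tier whose worst-case latency fits the budget."""
    for tier in TIERS:
        if tier.max_latency_ms <= latency_budget_ms:
            return tier
    # Nothing fits (only possible for a negative budget): fall back to the device.
    return TIERS[-1]


for budget in (250, 60, 20, 5):
    print(budget, "ms ->", most_central_tier(budget).name)
```

Preferring the most centralized tier that meets the budget reflects the hierarchical design: capacity is cheapest and most plentiful in the cloud, so work moves outward only when latency forces it.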

Edge gateways and routers sit between local devices and an upstream network. Software platforms such as AWS IoT Greengrass, Azure IoT Edge, and Google Distributed Cloud Edge provide containerized application hosting, message routing, local model serving, and offline operation ^[18] ^[19] ^[20]. Telco edge integrates compute with cellular networks: AWS Wavelength, announced in late 2019, places AWS compute and storage inside the 5G networks of carriers such as Verizon, Vodafone, and KDDI. Azure Edge Zones and Google Cloud Edge follow similar patterns ^[9] ^[16].

CDN as edge is the model where a content delivery network grows into a programmable runtime. Cloudflare Workers, launched in September 2017, runs JavaScript and WebAssembly in V8 isolates across Cloudflare's roughly 330 city footprint with cold start times under five milliseconds ^[21]. Fastly Compute@Edge, generally available in 2020, uses Lucet and later Wasmtime ^[22]. AWS Lambda@Edge runs Lambda functions at Amazon CloudFront edge locations.

On device runtimes push the architecture into the endpoint: a modern smartphone may run a wake word model on a digital signal processor, face recognition on an NPU, speech to text on the GPU, and an on device LLM on a combination of NPU and CPU, all without leaving the device.

Why use edge computing instead of the cloud?

The case for placing compute at the edge rests on several recurring arguments.

Latency. The round trip time from a consumer device to a hyperscale cloud region is typically 30 to 200 milliseconds. For augmented reality, robotic teleoperation, autonomous driving, and high frequency industrial control, that is too slow. Edge architectures can deliver under 10 millisecond round trips and sometimes under 1 millisecond when compute is colocated with the device ^[9] ^[10].

Bandwidth. A single 4K video camera generates roughly 25 megabits per second of compressed video, which adds up to about 8 terabytes per camera per month. A factory with a hundred such cameras would produce on the order of 800 terabytes a month, more than it can reasonably push to the cloud. Running detection models locally and only forwarding events or metadata reduces upstream bandwidth by orders of magnitude ^[1] ^[12].
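
The bandwidth figures can be checked with a few lines of arithmetic. The sketch below assumes a constant 25 megabit per second stream, a 30 day month, and decimal terabytes; the event rate and event size used for the "metadata only" comparison are hypothetical values chosen purely for illustration.

```python
SECONDS_PER_DAY = 86_400


def monthly_terabytes(bitrate_mbps: float, days: int = 30) -> float:
    """Data volume in decimal terabytes for a constant-bitrate stream."""
    bits = bitrate_mbps * 1e6 * SECONDS_PER_DAY * days
    return bits / 8 / 1e12


def monthly_event_terabytes(events_per_day: int, bytes_per_event: int, days: int = 30) -> float:
    """Data volume in decimal terabytes when only event metadata is uploaded."""
    return events_per_day * bytes_per_event * days / 1e12


per_camera_tb = monthly_terabytes(25)   # one 4K camera: about 8.1 TB per month
factory_tb = 100 * per_camera_tb        # a hundred cameras: about 810 TB per month

# Hypothetical edge deployment: 1,000 events per camera per day, 2 KB of metadata each.
events_tb = monthly_event_terabytes(events_per_day=1_000, bytes_per_event=2_000)
reduction = per_camera_tb / events_tb   # ratio of raw video to metadata volume

print(f"raw video per camera: {per_camera_tb:.1f} TB/month")
print(f"raw video per factory: {factory_tb:.0f} TB/month")
print(f"metadata only per camera: {events_tb * 1e6:.0f} MB/month ({reduction:,.0f}x less)")
```

Under these assumptions the metadata stream is roughly five orders of magnitude smaller than the raw video, which is the gap the bandwidth argument relies on.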

Privacy and data sovereignty. Regulations including the European Union's GDPR, the EU Data Act, the California Consumer Privacy Act, and HIPAA in the United States make it expensive or impossible to send certain data to a foreign jurisdiction. Edge processing keeps data on the device or within a country's borders, simplifying compliance ^[23].

Offline operation, cost, and real time control. Vehicles, ships, drones, and remote industrial sites often operate without continuous connectivity, so edge systems must buffer and process locally. Cloud egress fees and transit pipes are real line items, and running anomaly detection at the edge and only shipping flagged events is materially cheaper than streaming raw telemetry ^[12]. Industrial automation, robotic surgery, and autonomous driving also have hard timing requirements: the functional safety case usually requires that critical loops run locally ^[10].
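
The "ship only flagged events" pattern can be sketched with a rolling z-score filter running on the edge device. This is a minimal illustration, not a production detector: the class name `EdgeAnomalyFilter`, the window size, the warm-up length, and the threshold of three standard deviations are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev


class EdgeAnomalyFilter:
    """Keep a local baseline of recent readings and flag only the outliers."""

    def __init__(self, window: int = 50, threshold: float = 3.0, warmup: int = 10):
        self.history = deque(maxlen=window)  # recent normal readings
        self.threshold = threshold           # flag beyond this many standard deviations
        self.warmup = warmup                 # readings needed before flagging starts

    def observe(self, value: float) -> bool:
        """Return True if the reading should be forwarded upstream."""
        flagged = False
        if len(self.history) >= self.warmup:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            flagged = sigma > 0 and abs(value - mu) > self.threshold * sigma
        if not flagged:
            # Only normal readings update the baseline, so an outlier cannot mask the next one.
            self.history.append(value)
        return flagged


# Simulated telemetry: a steady signal with one injected spike at index 60.
readings = [20.0 + 0.1 * (i % 5) for i in range(100)]
readings[60] = 35.0

edge_filter = EdgeAnomalyFilter()
flagged_indices = [i for i, r in enumerate(readings) if edge_filter.observe(r)]
print(flagged_indices)  # indices of readings that would be sent to the cloud
```

In this simulated run only the injected spike is forwarded; the other 99 readings never leave the device, which is where the bandwidth and egress savings come from.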

How does AI inference run at the edge?

A major share of contemporary edge computing investment is driven by AI inference. Models trained in the cloud are compressed, quantized, and deployed onto end devices or edge servers so that perception, generation, and decisioning can happen in real time without a network round trip. This is the domain of edge AI.

What hardware accelerators power edge AI?

A wide range of chips compete for the AI inference workload at the edge. Most use a mix of dense matrix multiply units, on chip SRAM, and toolchains that convert trained models into the chip's native instruction set.

Chip family	Vendor	Notes
Apple Neural Engine	Apple	First shipped in A11 Bionic (September 2017) at 0.6 TOPS; M4 Neural Engine reaches 38 TOPS ^[14] ^[24]
NVIDIA Jetson	NVIDIA	Jetson AGX Orin delivers 275 INT8 TOPS; Jetson Thor, built on NVIDIA Blackwell and generally available August 2025, delivers up to 2,070 FP4 teraflops with 128 GB of memory for humanoid robotics ^[25] ^[26] ^[53]
Coral Edge TPU	Google	4 INT8 TOPS at roughly 2 W; targeted at small embedded vision ^[27]
Hexagon NPU	Qualcomm	Snapdragon 8 Gen 3 advertises 45 INT8 TOPS for on device generative AI ^[28]
Hailo-8 / Hailo-10	Hailo	Hailo-8 reaches 26 TOPS at about 2.5 W; Hailo-10 announced 2024 for on device LLM inference ^[29]
Mythic AMP	Mythic AI	Analog matrix processor performing multiplications in flash memory; targeted at low power vision
Movidius Myriad / Gaudi	Intel	Movidius Myriad X reaches 4 TOPS; Gaudi targets data center training and inference ^[30]
Grayskull / Wormhole / Blackhole	Tenstorrent	RISC-V based AI accelerators from a company led by chip architect Jim Keller
Ryzen AI XDNA	AMD	First shipped in Ryzen 7040 (2023); Ryzen AI 300 series reaches 50 TOPS ^[31]
FSD chip	Tesla	Custom inference SoC deployed as a dual redundant pair; the two chip FSD computer delivers 144 TOPS, about 72 TOPS per chip ^[32]
EyeQ	Mobileye	EyeQ6H delivers 34 TOPS at about 33 W; widely deployed in production vehicles ^[33]

NVIDIA frames Jetson Thor as "the ultimate platform for physical AI," claiming 7.5 times the AI compute and 3.5 times the energy efficiency of the previous Jetson Orin generation ^[53]. The diversity of accelerators is a strength for cost and power optimization but creates real friction for developers who must support several toolchains.
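
Where the table quotes both a throughput and a power figure, a back of the envelope TOPS per watt comparison is straightforward. The sketch below uses only the numbers listed in the table; these are vendor peak figures at differing precisions and workloads, so the ranking is an illustration of the calculation rather than a benchmark.

```python
# (peak TOPS, approximate watts) as quoted in the table above.
ACCELERATORS = {
    "Coral Edge TPU": (4.0, 2.0),
    "Hailo-8": (26.0, 2.5),
    "Mobileye EyeQ6H": (34.0, 33.0),
}


def tops_per_watt(tops: float, watts: float) -> float:
    """Peak throughput divided by the quoted power draw."""
    return tops / watts


# Sort from most to least efficient on paper.
ranked = sorted(
    ((name, tops_per_watt(tops, watts)) for name, (tops, watts) in ACCELERATORS.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, efficiency in ranked:
    print(f"{name}: {efficiency:.1f} TOPS/W")
```

The spread of roughly ten to one between the parts listed here is one reason a single toolchain rarely fits every deployment: chips are tuned for very different power envelopes.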

Can large language models run on device?

Until 2023, on device AI was dominated by perception tasks: speech recognition, computer vision, sensor fusion. The arrival of the small language model, a compact but capable LLM, has expanded the on device frontier into generative AI.

Apple Intelligence, announced at WWDC 2024, includes a roughly 3 billion parameter on device foundation model that runs on the Apple A17 Pro, A18, or M series silicon. The model is fine tuned with task specific adapters for writing tools, summarization, and notification triage. Heavier queries route to Private Cloud Compute ^[3] ^[34]. Gemini Nano is Google's smallest Gemini variant and shipped first on the Pixel 8 Pro, arriving in a December 2023 update to the phone launched that October, with the Pixel 9 family extending availability in 2024. It powers Recorder summarization, Smart Reply in Gboard, and on device descriptions for accessibility, and Google has exposed it through AICore and the Chrome Built-in AI APIs ^[4] ^[35].

Phi-3-mini, released by Microsoft in April 2024, is a 3.8 billion parameter dense Transformer trained on 3.3 trillion tokens of "textbook quality" data. Quantized variants run at roughly 1.8 GB on Apple silicon and reach interactive token rates on a phone ^[5]. The Phi family expanded with Phi-3-small, Phi-3-medium, Phi-3.5-mini, and Phi-3.5-vision before Phi-4 in late 2024. Llama 3.2 added 1 billion and 3 billion parameter text variants in September 2024 for on device use, with Qualcomm and MediaTek announcing day one support on Snapdragon and Dimensity NPUs ^[17].

The key enabler for on device LLMs is quantization. Models trained in 16 or 32 bit floating point are converted to lower precision integer formats, typically INT8 or INT4 for weights. Combined with pruning and weight sharing, quantization brings a 7 billion parameter model from roughly 14 GB at FP16 down to under 4 GB at INT4, fitting in mobile RAM ^[36]. Methods such as GPTQ, AWQ, and Apple's group quantization are now standard.
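
The memory arithmetic behind these figures, together with the basic idea of weight quantization, can be sketched as follows. The quantizer shown is plain symmetric round to nearest INT8 with a single scale per tensor, which is much simpler than GPTQ, AWQ, or group quantization and is included only to show the principle; the function names are hypothetical.

```python
def weight_gigabytes(params: float, bits_per_weight: int) -> float:
    """Storage for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


print(weight_gigabytes(7e9, 16))  # FP16: 14.0 GB
print(weight_gigabytes(7e9, 8))   # INT8: 7.0 GB
print(weight_gigabytes(7e9, 4))   # INT4: 3.5 GB


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric round-to-nearest quantization with one scale for the whole tensor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale for all-zero input
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate floating point weights."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.031]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"max error {max_error:.5f}")
```

The size calculation counts weights only; activations, the key value cache, and runtime overhead add to the real footprint, and practical INT4 formats store extra per group scales, so deployed models are somewhat larger than the raw figure.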

What frameworks deploy models to the edge?

Deploying a trained model to edge hardware requires a runtime that maps operators onto accelerators.

TensorFlow Lite, announced in 2017, is Google's mobile and embedded inference runtime. In 2024, Google rebranded it to LiteRT to reflect support for PyTorch and JAX models as well as TensorFlow. LiteRT runs on Android, iOS, embedded Linux, and microcontrollers with delegates for GPU, NPU, and Coral Edge TPU ^[37]. Core ML, introduced at WWDC 2017, is Apple's on device framework: it compiles models into a unified format that the operating system schedules across CPU, GPU, and Apple Neural Engine, with conversion from PyTorch and TensorFlow via coremltools ^[24]. ONNX Runtime Mobile is Microsoft's cross platform runtime, supporting DirectML on Windows, CoreML on Apple, NNAPI on Android, and Hexagon on Qualcomm devices, and is the runtime behind much of Phi-3 on device deployment ^[38].

ExecuTorch, announced by Meta in October 2023, is a successor to PyTorch Mobile that uses the PyTorch 2.x export pipeline and backends for XNNPACK, Vulkan, Apple Core ML, MediaTek APU, and Qualcomm Hexagon; it reached beta in late 2024 and a 1.0 release in October 2025 ^[39]. MediaPipe, originally a Google internal framework for streaming media perception, became open source in 2019 and now includes the MediaPipe LLM Inference API for on device language models ^[40]. MLC-LLM from CMU, UW, and Shanghai Jiao Tong University compiles language models for native execution on iPhones, Android phones, AMD GPUs, and WebGPU ^[41]. llama.cpp is Georgi Gerganov's open source project, started in March 2023, that demonstrated running quantized Llama models on consumer CPUs using a custom tensor library called ggml. It popularized the GGUF model format and underlies LM Studio, Ollama, and Jan ^[42] ^[43].

What industries use edge computing?

Edge computing is most visible in industries with strict latency, bandwidth, privacy, or reliability requirements.

Autonomous vehicles are the canonical edge AI workload. Tesla's FSD chip, introduced in 2019 with Hardware 3.0, runs full self driving inference on a dual redundant computer whose two SoCs together deliver 144 TOPS. NVIDIA's Drive Orin platform, launched in 2022, delivers 254 TOPS, and its Drive Thor successor, announced for production in 2025, scales to over 1000 TOPS. Mobileye's EyeQ family is widely used in ADAS systems by BMW, Volkswagen, Ford, and Geely. All three vendors run perception, prediction, and planning models on the vehicle, with cloud services used for fleet learning and map updates rather than real time control ^[32] ^[33] ^[44].

Industrial IoT and Industry 4.0 deployments use edge computing for predictive maintenance, vision quality control, and process optimization. Siemens, Bosch, Schneider Electric, ABB, and Rockwell Automation all ship edge platforms that run vibration, acoustic, thermal, and visual models on factory floors ^[45]. Smart cities deploy edge AI in traffic cameras for pedestrian and vehicle counting and license plate recognition, and in energy systems. Cisco, Hikvision, Dahua, Axis, and Verkada sell cameras with on device inference, and privacy regulations have nudged many municipalities toward edge processing that returns aggregated counts rather than identities.

Retail uses edge computing for cashier free stores including Amazon Just Walk Out, Standard AI, Trigo, and AiFi, with ceiling mounted cameras and weight sensors feeding on premises inference servers ^[46]. Healthcare applications include bedside monitoring, surgical robotics, point of care diagnostics, and at home cardiac monitors. Apple's ECG and atrial fibrillation features, fall detection, and crash detection on iPhone and Apple Watch are large scale consumer examples ^[47]. Wearables including Apple Watch, Garmin, Fitbit, Whoop, Oura, and Polar all run on device models for heart rate, blood oxygen, sleep staging, and step counting.

AR and VR headsets including Apple Vision Pro, Meta Quest, and ByteDance's Pico run hand tracking, eye tracking, foveated rendering, and avatar rendering on device. Latency targets in the low tens of milliseconds make cloud only deployment unworkable ^[48]. Drones including Skydio, Parrot Anafi, and DJI's enterprise series use on device tracking, obstacle avoidance, and visual inertial odometry; Skydio's X10 emphasizes fully on board autonomy ^[49].

Telecom operators have adopted edge computing for network function virtualization and for running the 5G core in software. Gaming services such as NVIDIA GeForce NOW and Xbox Cloud Gaming stream rendered frames from edge GPU servers, while clients run upscaling models such as DLSS, FSR, and XeSS locally. Google's Stadia shut down in January 2023 in part because the latency budget for input to display was difficult to meet outside dense urban areas.

How does edge compare to the cloud?

Edge and cloud architectures are complementary rather than competitive in most systems.

Dimension	Edge	Cloud
Latency	Low to ultra low (under 10 ms common, under 1 ms possible)	Higher (30 to 200 ms typical)
Compute capacity	Limited by power and form factor	Effectively unbounded
Energy budget	Tight (battery, fanless, embedded)	Loose (data center power)
Bandwidth efficiency	High (only ship results)	Low (must ship raw data)
Privacy and sovereignty	Strong by default	Depends on regional offerings
Offline operation	Yes	No
Update and patching	Hard at scale across heterogeneous fleets	Easy (single environment)
Hardware homogeneity	Fragmented (many SoCs, OSes, accelerators)	Uniform within a provider
Best workload examples	Perception, real time control, biometrics, on device LLM	Training, batch analytics, large model inference

Hybrid architectures dominate in practice. Models are trained in the cloud, compressed, then deployed at the edge. Sensitive inputs are processed on device, with anonymized features or harder queries escalated to a cloud or near edge tier. Apple's combination of on device foundation models with Private Cloud Compute is an explicit articulation of this design ^[3] ^[34].
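
A hybrid policy of this kind can be sketched as a small routing function. Everything here is hypothetical: the `Request` fields, the difficulty score, the cutoff, and the tier names are invented for illustration and do not describe Apple's or any other vendor's actual routing logic.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    contains_sensitive_data: bool
    estimated_difficulty: float  # hypothetical score from 0.0 (trivial) to 1.0 (hard)


def route(request: Request, online: bool = True, difficulty_cutoff: float = 0.6) -> str:
    """Decide where a request runs under a simple hybrid edge and cloud policy."""
    if not online:
        return "on_device"   # offline: degrade gracefully to the local model
    if request.contains_sensitive_data:
        return "on_device"   # sensitive inputs never leave the device
    if request.estimated_difficulty > difficulty_cutoff:
        return "cloud"       # harder queries escalate to a larger model
    return "on_device"       # default to the cheapest, lowest-latency tier


print(route(Request("summarize this note", False, 0.2)))            # on_device
print(route(Request("draft a long report", False, 0.9)))            # cloud
print(route(Request("explain my lab results", True, 0.9)))          # on_device
print(route(Request("draft a long report", False, 0.9), online=False))  # on_device
```

A real system would typically add a third option between the two, such as escalating anonymized features or using an attested server tier, rather than forcing every sensitive request onto the local model.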

What are the major edge platforms?

AWS IoT Greengrass, launched in June 2017, runs a Lambda compatible container runtime on edge devices, with synchronization to AWS IoT Core, local model inference via SageMaker Edge Manager, and offline operation against a local message broker ^[18]. Azure IoT Edge became generally available in June 2018; it packages workloads as containers and deploys them through Azure IoT Hub, often paired with Azure Stack Edge hardware ^[19]. Google Distributed Cloud Edge brings Google's Anthos stack onto customer hardware with Vertex AI model serving at the edge, and Google operates Google Distributed Cloud Hosted for air gapped sovereign environments ^[20].

Cloudflare Workers AI, launched in September 2023, runs a curated catalog of models including Llama, Mistral, and Stable Diffusion variants on GPU equipped servers across a subset of Cloudflare's network, callable from Workers code ^[50]. Fastly Compute@Edge is a programmable runtime built on the Lucet and later Wasmtime WebAssembly engines ^[22]. Akamai Connected Cloud combines Akamai's CDN footprint with Linode's compute regions and the EdgeWorkers serverless platform following Akamai's acquisition of Linode in 2022.

What standards govern edge computing?

ETSI MEC (Multi-access Edge Computing) is the main international standard for telco edge. The ETSI MEC Industry Specification Group has published more than thirty specifications since 2014 covering reference architecture, application enablement, location services, and 5G integration ^[9]. LF Edge, a Linux Foundation umbrella formed in January 2019, brings together open source edge projects including EdgeX Foundry (industrial IoT), Akraino Edge Stack (telco blueprints), Project EVE (hypervisor for edge devices), and Open Horizon (workload management) ^[51]. The OpenFog Consortium followed a separate path, merging into the Industrial Internet Consortium in a combination announced in December 2018 ^[8]. 3GPP standards drive the cellular edge through 5G specifications including the user plane function (UPF) and slicing features that enable per application edge breakout ^[16].

How does edge computing protect privacy and sovereignty?

The privacy and sovereignty argument for edge computing has grown sharper as regulators and consumers focus on AI data flows.

Private Cloud Compute, announced by Apple at WWDC 2024, extends the privacy properties of on device computing to a server side AI service. Apple servers built on Apple silicon run a hardened operating system, with code transparency through verifiable build images, no persistent storage of user data, and cryptographic attestation that a request is being handled by a verified node. Apple's security team describes Private Cloud Compute as "the most advanced security architecture ever deployed for cloud AI compute at scale," and the design is intended to let security researchers confirm that the software running in production matches what Apple has published ^[3] ^[34].

GDPR and the EU Data Act push organizations to keep personal data within the European Union and to give data subjects rights of access, correction, and erasure. Edge processing, where personal data never leaves the device or local network, simplifies compliance compared to centralized cloud processing ^[23].

Sovereign cloud and on premises AI offerings have become mainstream: Microsoft, AWS, Google, and Oracle each ship sovereign cloud regions, and vendors including NVIDIA, Cisco, Dell, HPE, and Supermicro sell on premises AI factory configurations to governments, defense organizations, and regulated industries. Confidential computing technologies such as Intel TDX, AMD SEV-SNP, NVIDIA's H100 confidential computing mode, and Apple's Secure Enclave provide hardware backed isolation used both at the edge and inside near edge data centers.

What are the limitations of edge computing?

Edge computing is not a panacea, and the literature catalogs a number of recurring difficulties.

Hardware fragmentation. Unlike cloud environments where x86 and Arm dominate a homogeneous fleet, the edge involves dozens of SoC vendors, multiple operating systems (Linux, Android, RTOS variants, bare metal), and at least a dozen incompatible AI accelerator toolchains. Update and patching at scale across millions of heterogeneous devices is harder than redeploying a service in a cloud region; failed updates can brick devices, and long lived devices outlive support windows for the chips and runtimes they ship with. Security is harder because edge devices are physically reachable, broadening the threat model. Side channel attacks, supply chain attacks, and key extraction have all been demonstrated.

Energy and thermal constraints. Training class workloads are not feasible at the edge in any near term horizon. Even for inference, NPU power budgets are tightly bounded and thermal management is a constraint in fanless designs. Quantization, sparsity, and speculative decoding are responses to this pressure. Deployment complexity also bites: sustaining a fleet of edge sites with consistent monitoring, observability, and capacity planning is operationally heavier than the cloud equivalent.

Marketing dilution. "Edge" has become a marketing label applied to almost any compute that is not in a hyperscale region. Industry observers including Gartner and IDC have noted that vendor claims often refer to architectures that are technically near edge or regional cloud, blurring the distinction with the deep cloud ^[12].

References

1. Wikipedia. "Edge computing." https://en.wikipedia.org/wiki/Edge_computing
2. Satyanarayanan, M., Bahl, P., Caceres, R., and Davies, N. (2009). "The Case for VM-Based Cloudlets in Mobile Computing." *IEEE Pervasive Computing*. https://www.cs.cmu.edu/~satya/docdir/satya-ieeepvc-cloudlets-2009.pdf
3. Apple. "Introducing Apple Intelligence for iPhone, iPad, and Mac." Apple Newsroom, June 10, 2024. https://www.apple.com/newsroom/2024/06/introducing-apple-intelligence-for-iphone-ipad-and-mac/
4. Google. "Pixel 8 Pro: The first smartphone with AI built in." Google Blog, October 4, 2023. https://blog.google/products/pixel/google-pixel-8-pro/
5. Microsoft. "Tiny but mighty: The Phi-3 small language models with big potential." Microsoft AI Blog, April 23, 2024. https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/
6. Bonomi, F., Milito, R., Zhu, J., and Addepalli, S. (2012). "Fog Computing and Its Role in the Internet of Things." *Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing*. https://dl.acm.org/doi/10.1145/2342509.2342513
7. Cisco. "Fog Computing and the Internet of Things: Extend the Cloud to Where the Things Are." White paper, 2015. https://www.cisco.com/c/dam/en_us/solutions/trends/iot/docs/computing-overview.pdf
8. Wikipedia. "OpenFog Consortium." https://en.wikipedia.org/wiki/OpenFog_Consortium
9. ETSI. "Multi-access Edge Computing (MEC)." https://www.etsi.org/technologies/multi-access-edge-computing
10. Hu, Y. C., Patel, M., Sabella, D., Sprecher, N., and Young, V. (2015). "Mobile Edge Computing: A key technology towards 5G." ETSI White Paper No. 11. https://www.etsi.org/images/files/ETSIWhitePapers/etsi_wp11_mec_a_key_technology_towards_5g.pdf
11. Carnegie Mellon University. "Living Edge Lab." https://cmu-living-edge-lab.github.io/
12. Gartner. "What Edge Computing Means for Infrastructure and Operations Leaders." https://www.gartner.com/smarterwithgartner/what-edge-computing-means-for-infrastructure-and-operations-leaders
13. Wikipedia. "Akamai Technologies." https://en.wikipedia.org/wiki/Akamai_Technologies
14. Apple. "The future is here: iPhone X." Apple Newsroom, September 12, 2017. https://www.apple.com/newsroom/2017/09/the-future-is-here-iphone-x/
15. Wikipedia. "AI accelerator." https://en.wikipedia.org/wiki/AI_accelerator
16. 3GPP. "5G specifications." https://www.3gpp.org/specifications-technologies/releases
17. Meta. "Llama 3.2: Revolutionizing edge AI and vision with open, customizable models." Meta AI Blog, September 25, 2024. https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/
18. AWS. "AWS IoT Greengrass." https://aws.amazon.com/greengrass/
19. Microsoft. "Azure IoT Edge." https://azure.microsoft.com/en-us/products/iot-edge
20. Google Cloud. "Distributed Cloud." https://cloud.google.com/distributed-cloud
21. Cloudflare. "Welcome to Cloudflare Workers." https://blog.cloudflare.com/cloudflare-workers-unleashed/
22. Fastly. "Compute@Edge." https://www.fastly.com/products/edge-compute
23. European Commission. "General Data Protection Regulation." https://gdpr-info.eu/
24. Apple. "Core ML." https://developer.apple.com/documentation/coreml
25. NVIDIA. "Jetson AGX Orin Series Technical Brief." https://www.nvidia.com/content/dam/en-zz/Solutions/gtcf21/jetson-orin/nvidia-jetson-agx-orin-technical-brief.pdf
26. NVIDIA. "Jetson Thor for the Age of Physical AI." NVIDIA Developer Blog. https://developer.nvidia.com/blog/introducing-nvidia-jetson-thor-the-ultimate-platform-for-physical-ai/
27. Google. "Coral: Edge TPU performance benchmarks." https://coral.ai/docs/edgetpu/benchmarks/
28. Qualcomm. "Snapdragon 8 Gen 3 Mobile Platform." https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-8-gen-3-mobile-platform
29. Hailo. "Hailo-10 Generative AI Acceleration Module." https://hailo.ai/products/ai-accelerators/hailo-10h-m2-generative-ai-acceleration-module/
30. Intel. "Intel Movidius Myriad X VPU." https://www.intel.com/content/www/us/en/products/details/processors/movidius-vpu/movidius-myriad-x.html
31. AMD. "AMD Ryzen AI." https://www.amd.com/en/products/processors/consumer/ryzen-ai.html
32. Tesla. "Tesla Autonomy Day 2019: Full Self-Driving Hardware." April 22, 2019. https://www.tesla.com/blog/tesla-autonomy-investor-day
33. Mobileye. "EyeQ Family." https://www.mobileye.com/our-technology/eyeq-chip/
34. Apple Security Engineering. "Private Cloud Compute: A new frontier for AI privacy in the cloud." June 2024. https://security.apple.com/blog/private-cloud-compute/
35. Google. "Built-in AI in Chrome." https://developer.chrome.com/docs/ai/built-in
36. Frantar, E., Ashkboos, S., Hoefler, T., and Alistarh, D. (2022). "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers." arXiv:2210.17323. https://arxiv.org/abs/2210.17323
37. Google. "LiteRT (formerly TensorFlow Lite)." https://ai.google.dev/edge/litert
38. Microsoft. "ONNX Runtime: cross-platform, high performance ML inferencing." https://onnxruntime.ai/
39. Meta. "ExecuTorch: A new tool for deploying PyTorch models on the edge." PyTorch Blog, October 2023. https://pytorch.org/executorch/
40. Google. "MediaPipe LLM Inference API." https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference
41. MLC team. "MLC LLM: Universal LLM deployment engine with ML compilation." https://llm.mlc.ai/
42. Gerganov, G. "llama.cpp." https://github.com/ggerganov/llama.cpp
43. Gerganov, G. "ggml." https://github.com/ggerganov/ggml
44. NVIDIA. "NVIDIA Drive Thor." https://www.nvidia.com/en-us/self-driving-cars/thor/
45. Siemens. "Industrial Edge." https://www.siemens.com/global/en/products/automation/topic-areas/industrial-edge.html
46. Amazon. "Just Walk Out technology." https://www.justwalkout.com/
47. Apple. "Apple Watch Series 9." https://www.apple.com/apple-watch-series-9/
48. Apple. "Apple Vision Pro." https://www.apple.com/apple-vision-pro/
49. Skydio. "Skydio X10." https://www.skydio.com/skydio-x10
50. Cloudflare. "Workers AI: serverless GPU-powered inference on Cloudflare's global network." https://blog.cloudflare.com/workers-ai/
51. LF Edge. "LF Edge: An Umbrella Organization for Edge Computing." https://www.lfedge.org/
52. STL Partners. "Edge computing market sizing forecast." https://stlpartners.com/research/edge-computing-market-sizing-2030/
53. NVIDIA. "NVIDIA Blackwell-Powered Jetson Thor Now Available, Accelerating the Age of General Robotics." NVIDIA Newsroom, August 25, 2025. https://nvidianews.nvidia.com/news/nvidia-blackwell-powered-jetson-thor-now-available-accelerating-the-age-of-general-robotics
