fal.ai
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,373 words
fal.ai (legally Features and Labels, Inc., often stylized fal) is a San Francisco-based artificial intelligence company that operates a specialized inference platform for generative-media models, including image, video, audio, voice, and 3D models. Founded in 2021 by Burkay Gur and Gorkem Yurtseven, the company pivoted from data infrastructure tooling in 2022 to focus on serving diffusion and other generative-media workloads, a category it argues hyperscale and general-purpose inference providers underserve.[^1][^2] By the end of 2025 fal had raised approximately $337 million across seed, Series A, Series B, Series C and Series D rounds, with its Series D in December 2025 valuing the company at $4.5 billion.[^3][^4]
The company is best known as one of the principal hosted-inference partners for Black Forest Labs' FLUX family of text-to-image models. It is also known for hosting an unusually wide catalogue of generative-media models, including Stable Diffusion variants, Recraft, Ideogram, Veo (where commercially available), Kling, Hunyuan Video, Wan 2.x, MiniMax Hailuo, and ElevenLabs voice models. fal positions itself against general-purpose inference platforms such as Replicate, Together AI, Modal, Baseten, DeepInfra, and RunPod. It argues that its custom CUDA and Triton kernels, real-time WebSocket streaming, and vertically integrated optimization stack deliver lower latency and lower per-image cost on flagship media models than commodity GPU rental or generic inference clouds.[^5][^6]
By February 2025 the company reported more than one million developer signups, fifty-plus enterprise customers including Quora, Canva, Perplexity, Adobe, and Shopify, more than 100 million daily inference requests, and 99.99% uptime.[^7] By late 2025, founders publicly cited annualized revenue above $200 million.[^4]
Burkay Gur and Gorkem Yurtseven are both originally from Turkey and first met in San Francisco in the early 2010s. Gur began his career at Oracle before joining Coinbase, where he served as that company's first machine-learning engineer and built out its internal ML infrastructure platform. Yurtseven spent the bulk of the same period at Amazon Web Services, where he worked on developer-facing tooling including parts of Amazon SageMaker.[^1][^7]
During the COVID-19 pandemic in 2020 the two co-founders rented a house together in Palm Springs, California. As Yurtseven later recalled in a First Round Review podcast, the long stretch of shared isolation and shop talk in Palm Springs eventually convinced them to leave their jobs and start a company. They incorporated the business in 2021 and named it "fal", a backronym for the machine-learning concept of "features and labels" that also doubles as the Turkish word for fortune-telling.[^1][^8]
fal's first product had nothing to do with generative media. The founders initially built a Python-native data-transformation and orchestration toolchain aimed at data and ML teams, following what Yurtseven called "the Databricks and Snowflake playbook" of selling infrastructure for data preparation and modeling.[^8] An early open-source project, "fal-serverless," and a related dbt extension targeted data engineers who wanted to call ML models from inside their data pipelines without operating their own GPU fleets.
fal closed a seed round from First Round Capital in 2022 on the strength of this thesis. The company acquired a small set of paying enterprise customers but found the market slow-moving. The arrival of DALL-E 2 in April 2022, then Stable Diffusion in August 2022, and ChatGPT in November 2022 fundamentally reset the founders' worldview. Yurtseven described the shift in the First Round podcast: "It was first DALL-E 2, and then Stable Diffusion, and then ChatGPT, and then Llama. The whole AI world kind of came backwards" relative to the data-tooling thesis.[^8]
Rather than continue with two products, fal ran the original data-tooling business and a new inference offering in parallel for two to three months in early 2023, then committed entirely to generative-media inference. Yurtseven later said the decision was emotionally hard because the legacy product had paying customers, but the inference business was already growing many times faster.[^8]
A second deliberate choice was to skip large language models. While competitors such as Together AI, Fireworks AI, Anyscale, DeepInfra, Baseten, Modal, and Replicate were racing to serve open LLMs, Yurtseven argued in interviews that LLM inference was effectively a bet against Google search and the hyperscalers, making it a poor fit for an early-stage company. Generative media, by contrast, was a "greenfield" market where the dominant interaction pattern would be developer APIs and real-time creative tools rather than chat.[^8]
The pivot produced an early breakout demo: a real-time image-to-image pipeline that streamed frames from a webcam, applied a fine-tuned Stable Diffusion model, and returned a transformed video feed at multiple frames per second. To reach the latency target, the team spent weeks tuning custom GPU kernels. The team included fal's vice president of engineering Batuhan Taskaya, who came from a compiler-engineering background, and an engineer specializing in Triton kernels. The demo went viral on Twitter and drew a wave of indie developers and small studios to the platform.[^6][^8]
In August 2023, fal reported roughly $2 million in annualized revenue. By August 2024, the company said annualized revenue exceeded $100 million, a roughly 50x year-over-year increase that the founders attributed to two factors: the explosion of generative-media use cases, and the launch of Black Forest Labs' FLUX models.[^8]
In September 2024, fal announced a $23 million combined round, comprising a previously unannounced $9 million seed round led by Andreessen Horowitz and a $14 million Series A led by Kindred Ventures, at an $80 million post-money valuation. Notable angel investors included Black Forest Labs co-founder Robin Rombach, Perplexity CEO Aravind Srinivas, Vercel founder Guillermo Rauch, Balaji Srinivasan, and Hugging Face CTO Julien Chaumond. At the time of the Series A, the company reported around 500,000 developer signups and 50 million daily generated images, videos, and audio files.[^2][^9]
On February 12, 2025, fal announced a $49 million Series B led by Notable Capital and Andreessen Horowitz, with participation from Bessemer Venture Partners, Kindred Ventures, and First Round Capital. Jennifer Li of a16z and Glenn Solomon of Notable Capital joined the board. Total raised to date stood at approximately $72 million.[^7][^10]
At the Series B announcement, fal disclosed more than 1 million developer signups and more than 100 million daily inference requests. It also reported 99.99% uptime and 50-plus enterprise customers, including Canva, Perplexity, Black Forest Labs, PlayAI, and Quora. For Quora, fal said it powered roughly 40% of the image-generation bots on the Poe platform. The round was characterized publicly as funding for the build-out of "tens of thousands of Blackwell GPUs" to replace its existing Hopper fleet.[^7][^10]
On July 31, 2025, fal announced a $125 million Series C led by Meritech Capital Partners, with participation from Salesforce Ventures, Shopify Ventures, and Google AI Futures Fund alongside existing investors. The round valued fal at $1.5 billion, making it the first specialist generative-media inference platform to cross the $1 billion private-valuation threshold. Total funding to date rose to approximately $197 million.[^11][^12]
The participation of Salesforce Ventures and Shopify Ventures was widely interpreted as a signal that generative-media inference was moving from a developer-toolchain category into a default building block for enterprise commerce, retail, and creative workflows.[^11]
On December 9, 2025, fal announced a $140 million Series D led by Sequoia Capital. Kleiner Perkins participated heavily, Alkeon Capital and NVentures (NVIDIA's venture capital arm) invested for the first time, and existing investors also joined. The round valued the company at $4.5 billion, roughly tripling the Series C valuation set about four months earlier.[^3][^4]
In coverage of the Series D, TechCrunch and Bloomberg reported that fal's annualized revenue had crossed $200 million by October 2025, that the company had roughly 45 employees, and that a separate secondary transaction of up to $110 million ran alongside the primary $140 million raise, bringing total transaction size to approximately $250 million.[^4][^13]
fal's commercial pitch rests on the claim that purpose-built inference for diffusion and other generative-media architectures can deliver materially lower latency and lower per-image cost than generic, container-based inference on commodity GPU clouds. The platform's stack reflects three deliberate engineering choices.
fal maintains a proprietary inference engine built around more than one hundred custom CUDA and Triton kernels, layered with parallelization utilities, diffusion caching strategies, and quantization techniques. The company has stated publicly that this engine is generic across "roughly 70 to 80 percent" of modern diffusion-transformer models, so a single optimization stack accelerates the long tail of community fine-tunes as well as flagship checkpoints. fal's own benchmarks claim the engine runs FLUX up to roughly 2x faster than eager PyTorch and up to 4x faster than competing managed services on the same hardware.[^5][^6]
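fal has not published how its diffusion caching works. The sketch below is a generic illustration of one published family of step-caching heuristics for diffusion transformers, not fal's implementation. On every denoising step a cheap first block always runs. If that block's output has moved less than a tolerance since the last full evaluation, the expensive remaining blocks are skipped and their last residual is reused. Everything here is a hypothetical toy. `CachedDenoiser`, the 0.05 threshold, and the list-of-floats "activations" stand in for real tensors and transformer blocks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector, int], Vector]


def relative_l1(a: Vector, b: Vector) -> float:
    """Relative L1 distance of `a` from the reference `b`."""
    denom = sum(abs(y) for y in b)
    return sum(abs(x - y) for x, y in zip(a, b)) / max(denom, 1e-12)


@dataclass
class CachedDenoiser:
    """Toy step cache (hypothetical): skip the expensive blocks when the cheap
    first block's output has barely moved since the last full evaluation."""
    first_block: Block        # cheap: always executed
    remaining_blocks: Block   # expensive: executed only on a cache miss
    threshold: float = 0.05   # hypothetical tolerance; real systems tune this
    full_evals: int = 0
    _ref_first: Optional[Vector] = None
    _cached_residual: Optional[Vector] = None

    def __call__(self, x: Vector, step: int) -> Vector:
        h = self.first_block(x, step)
        if (self._ref_first is not None
                and relative_l1(h, self._ref_first) < self.threshold):
            # Cache hit: approximate the expensive blocks with their last residual.
            return [hi + ri for hi, ri in zip(h, self._cached_residual)]
        # Cache miss: run the expensive blocks and remember their residual.
        out = self.remaining_blocks(h, step)
        self._cached_residual = [o - hi for o, hi in zip(out, h)]
        self._ref_first = h
        self.full_evals += 1
        return out


# Usage with stand-in blocks: the inputs drift slowly, then jump at step 3.
calls = []

def first(x: Vector, step: int) -> Vector:
    return list(x)  # identity stand-in for the cheap first block

def rest(h: Vector, step: int) -> Vector:
    calls.append(step)
    return [2.0 * v for v in h]  # stand-in for the costly remaining blocks

den = CachedDenoiser(first, rest)
inputs = [[1.0, 1.0], [1.01, 1.0], [1.02, 1.0], [2.0, 2.0]]
outputs = [den(x, step) for step, x in enumerate(inputs)]
print(den.full_evals, calls)  # 2 [0, 3]
```

In this sketch, skipping blocks trades exactness for speed. A cache hit returns an approximation, so the tolerance controls how far outputs may drift from a full evaluation.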
fal exposes two distinct interaction patterns beyond standard request and response REST. Streaming endpoints push progressive output from a single request to the client as it becomes available, typical for image refinement or long-form video. Real-time endpoints use persistent bidirectional WebSocket connections that bypass the queue entirely and connect a client directly to a kept-warm GPU runner. The runner stays resident across many requests, which both eliminates cold-start latency and permits applications such as live drawing tools and webcam-driven stylization to operate at 3 to 5 frames per second on Stable Diffusion-class models, according to fal's documentation.[^5]
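This is a minimal in-process sketch of the real-time pattern and assumes nothing about fal's actual client library or wire protocol. A kept-warm runner loads its weights once and then serves a stream of frames over a persistent, bidirectional channel. `WarmRunner`, `realtime_session`, and `frame_budget_ms` are hypothetical names. The weight load and the image-to-image pass are placeholders, and two `asyncio.Queue` objects stand in for the two directions of a WebSocket. The last helper converts the documented 3 to 5 frames per second into a per-frame latency budget of roughly 333 to 200 ms.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class WarmRunner:
    """Hypothetical stand-in for a kept-warm GPU runner: weights load once."""
    loads: int = 0

    async def transform(self, frame: str) -> str:
        if self.loads == 0:
            await asyncio.sleep(0)  # placeholder for expensive weight loading
            self.loads += 1
        await asyncio.sleep(0)      # placeholder for one image-to-image pass
        return frame.upper()


async def realtime_session(runner: WarmRunner, frames: list) -> list:
    """Persistent bidirectional session; two queues stand in for a WebSocket."""
    to_runner: asyncio.Queue = asyncio.Queue()
    to_client: asyncio.Queue = asyncio.Queue()

    async def serve() -> None:
        # The runner stays resident, handling frames until the client closes.
        while (frame := await to_runner.get()) is not None:
            await to_client.put(await runner.transform(frame))

    server = asyncio.create_task(serve())
    results = []
    for frame in frames:
        await to_runner.put(frame)             # client -> runner over the open channel
        results.append(await to_client.get())  # runner -> client
    await to_runner.put(None)                  # client closes the session
    await server
    return results


def frame_budget_ms(fps: float) -> float:
    """Per-frame latency budget implied by a target frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    runner = WarmRunner()
    print(asyncio.run(realtime_session(runner, ["f1", "f2", "f3"])))  # ['F1', 'F2', 'F3']
    print(runner.loads)  # 1: one load amortized over every frame in the session
    print(frame_budget_ms(5), round(frame_budget_ms(3), 1))  # 200.0 333.3
```

In this model, a queued request path that provisioned a fresh runner per request would pay the load cost on every frame. The persistent session pays it once.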
By the Series B announcement in February 2025, fal said its production fleet was "in the thousands" of NVIDIA H100 (Hopper) GPUs, with plans to scale to "tens of thousands" of NVIDIA B100/B200 (Blackwell) GPUs by year end. The participation of NVIDIA's venture arm in the Series D in December 2025 was widely interpreted as both a strategic and financial alignment, although fal has not described any exclusive supply arrangement.[^7][^3] fal has also partnered with object-storage provider Tigris to back asset delivery and content storage for outputs.[^14]
fal advertises an API surface covering more than 1,000 generative-media models as of 2026, spanning image, video, audio, voice, and 3D modalities. Notable categories include the following.[^15]
The composition of the catalogue changes frequently and many models are added on or near their public release date through pre-arranged partnerships with the model providers.
| Round | Date | Amount | Lead investor(s) | Other notable participants | Valuation |
|---|---|---|---|---|---|
| Seed | 2022 | undisclosed | First Round Capital | none disclosed | undisclosed |
| Seed (extension) | 2024 (announced Sep 2024) | $9 million | Andreessen Horowitz | First Round Capital, angels | undisclosed |
| Series A | September 18, 2024 | $14 million | Kindred Ventures | a16z, Robin Rombach, Aravind Srinivas, Guillermo Rauch, Balaji Srinivasan, Julien Chaumond | $80 million post-money |
| Series B | February 12, 2025 | $49 million | Notable Capital, Andreessen Horowitz | Bessemer Venture Partners, Kindred Ventures, First Round Capital | undisclosed |
| Series C | July 31, 2025 | $125 million | Meritech Capital Partners | Salesforce Ventures, Shopify Ventures, Google AI Futures Fund | $1.5 billion |
| Series D | December 9, 2025 | $140 million | Sequoia Capital | Kleiner Perkins, Alkeon Capital, NVentures (NVIDIA), a16z | $4.5 billion |
[^2][^7][^9][^10][^11][^12][^3][^4]
Aggregate disclosed primary capital raised through the Series D stood at approximately $337 million. This figure excludes the undisclosed 2022 seed. A secondary transaction of up to $110 million was arranged alongside the Series D.[^4]
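The running totals quoted in this article follow from the disclosed round sizes: about $72 million after the Series B, $197 million after the Series C, and $337 million after the Series D. The short Python check below uses only amounts from the funding table. The dictionary layout and function names are illustrative. The undisclosed 2022 First Round Capital seed is necessarily excluded, so the totals understate capital raised by that amount.

```python
from itertools import accumulate

# Disclosed primary rounds (US$ millions) from the funding table, in order.
# The 2022 First Round Capital seed is omitted: its amount was not disclosed.
ROUNDS = {
    "Seed (a16z)": 9,
    "Series A": 14,
    "Series B": 49,
    "Series C": 125,
    "Series D": 140,
}
SERIES_D_SECONDARY_MAX = 110  # reported as "up to", so this is an upper bound


def cumulative_totals(rounds: dict) -> dict:
    """Running total of disclosed primary capital after each round."""
    return dict(zip(rounds, accumulate(rounds.values())))


def series_d_transaction_max() -> int:
    """Primary Series D plus the maximum reported secondary sale."""
    return ROUNDS["Series D"] + SERIES_D_SECONDARY_MAX


if __name__ == "__main__":
    for name, total in cumulative_totals(ROUNDS).items():
        print(f"{name}: ${total}M raised to date")
    print(series_d_transaction_max())  # 250
    print(4.5e9 / 1.5e9)               # Series C -> D valuation step-up: 3.0
```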
fal's relationship with Black Forest Labs, the German image-generation lab founded by former Stability AI researchers including Robin Rombach, is the company's most prominent partnership. fal supported FLUX.1 [schnell] and FLUX.1 [dev] on the day of the models' public release on August 1, 2024. The company describes that launch as enabled by pre-existing relationships between the two teams. FLUX models subsequently became central to fal's growth narrative, and Rombach invested personally in fal's Series A. fal is one of several distribution partners for FLUX, alongside Replicate, Together AI, Cloudflare Workers AI, DeepInfra, Runware, and Microsoft Foundry. Even so, it markets itself as the highest-performance inference partner for the model family. It has continued to ship same-day support for new releases, including FLUX.1 Tools, FLUX.1 Kontext, and FLUX.2.[^16][^17]
fal maintains hosted endpoints, often shipped on or near release day, for models from Black Forest Labs, Stability AI, Kuaishou (Kling), Tencent (Hunyuan), Alibaba (Wan), MiniMax (Hailuo), Recraft, Ideogram, ElevenLabs, Krea, and others. In several cases, including the FLUX.1 Krea [dev] partner fine-tune released in 2025, fal has also shipped training endpoints, allowing developers to LoRA-train derivative models.[^17]
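The source does not describe how fal's training endpoints are implemented. For reference, the generic LoRA (low-rank adaptation) technique freezes each adapted weight matrix $W_0$ and learns a low-rank update. In the form given in the original LoRA paper, with rank $r$ and scaling factor $\alpha$:

$$
W' = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

Only $A$ and $B$ are trained, so each adapted matrix contributes $r(d + k)$ trainable parameters instead of $dk$. As an illustrative assumption (not a documented fal setting), consider a $3072 \times 3072$ projection adapted at rank $r = 16$. It trains only a small fraction of the parameters that a full fine-tune of that matrix would update:

$$
\frac{r(d + k)}{dk} = \frac{16 \cdot 6144}{3072 \cdot 3072} = \frac{98{,}304}{9{,}437{,}184} \approx 1.04\%
$$

This is why LoRA weights for derivative models are small relative to the base checkpoint.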
In 2025, fal disclosed that it uses Tigris Data for global object storage to back asset delivery for generated media. NVIDIA's venture arm, NVentures, joined the Series D in December 2025. Observers read the investment as strengthening fal's position as a customer for Blackwell-generation GPU supply, although no exclusive supply arrangement has been described.[^3][^14]
fal competes with several overlapping categories of company:
The investor case repeated across the Series B, C, and D announcements is that fal occupies a defensible position as the dedicated inference layer for generative media, sitting between model providers (who do not want to operate inference fleets) and AI-native application developers (who do not want to maintain their own kernels).[^7][^10]
fal's published customer references include:
fal has also publicized non-enterprise use cases including museum installations, live DJ performances driven by real-time image-to-image generation, and game studios that use fal endpoints for asset generation.[^9]
fal is headquartered in San Francisco with a globally distributed engineering team. At the time of its Series D announcement in December 2025 the company employed roughly 45 people, of whom about 34 were engineers.[^4][^8] Yurtseven has publicly described several deliberate operating choices:
Burkay Gur serves as CEO and Gorkem Yurtseven as CTO. Some early external coverage described Yurtseven as CEO; the company's About page and Series B and later press releases identify Gur as CEO and Yurtseven as CTO.[^1][^7]
fal received broad coverage across the technology and business press during 2024 and 2025. TechCrunch covered the seed-plus-Series A in September 2024, Fortune ran an exclusive on the Series B in February 2025, BusinessWire and Yahoo Finance carried the Series C announcement in July 2025, and Bloomberg, TechCrunch, and BusinessWire all covered the Series D in December 2025. The First Round Review podcast published a detailed interview with Yurtseven in 2025 covering the pivot and operational philosophy, and the Latent Space podcast ran a long technical interview with Yurtseven and engineering leader Batuhan Taskaya.[^2][^7][^11][^4][^13][^8][^6]
By the end of 2025, several industry analysts had described fal as the highest-revenue specialist generative-media inference provider in the market and as one of the highest-growth AI infrastructure startups of the 2024-2025 cycle.[^11][^4]