fal.ai

fal.ai (legally Features and Labels, Inc., often stylized fal) is a San Francisco-based artificial intelligence company that operates a specialized inference platform for generative-media models, including image, video, audio, voice, and 3D models. Founded in 2021 by Burkay Gur and Gorkem Yurtseven, the company pivoted from data infrastructure tooling in 2022 to focus on serving diffusion and other generative-media workloads, a category it argues hyperscale and general-purpose inference providers underserve.^[1]^[2] By the end of 2025 fal had raised approximately $337 million across seed, Series A, Series B, Series C and Series D rounds, with its Series D in December 2025 valuing the company at $4.5 billion.^[3]^[4]

The company is best known as one of the principal hosted-inference partners for Black Forest Labs' FLUX family of text-to-image models, as well as for hosting an unusually wide catalogue of generative-media models, including Stable Diffusion variants, Recraft, Ideogram, Veo (where commercially available), Kling, Hunyuan Video, Wan 2.x, MiniMax Hailuo, and ElevenLabs voice models. fal positions itself in opposition to general-purpose inference platforms such as Replicate, Together AI, Modal, Baseten, DeepInfra, and RunPod, arguing that its custom CUDA and Triton kernels, real-time WebSocket streaming, and vertically integrated optimization stack produce lower latency and lower per-image cost on flagship media models than commodity GPU rental or generic inference clouds.^[5]^[6]

By February 2025 the company reported more than one million developer signups, more than fifty enterprise customers including Quora, Canva, and Perplexity, more than 100 million daily inference requests, and 99.99% uptime.^[7] Later announcements named Adobe and Shopify as customers, and by late 2025 the founders publicly cited annualized revenue above $200 million.^[4]^[11]

History

Founders and pre-fal background

Burkay Gur and Gorkem Yurtseven are both originally from Turkey and first met in San Francisco in the early 2010s. Gur began his career at Oracle before joining Coinbase, where he served as that company's first machine-learning engineer and built out its internal ML infrastructure platform. Yurtseven spent the bulk of the same period at Amazon Web Services, where he worked on developer-facing tooling including parts of Amazon SageMaker.^[1]^[7]

During the COVID-19 pandemic in 2020 the two co-founders rented a house together in Palm Springs, California. As Yurtseven later recalled in a First Round Review podcast, the long stretch of shared isolation and shop talk in Palm Springs eventually convinced them to leave their jobs and start a company. They incorporated the business in 2021 and used the name "fal", a backronym for the machine-learning concept features and labels that also doubles as the Turkish word for fortune-telling.^[1]^[8]

Original product: data and ML tooling (2021 to 2022)

fal's first product had nothing to do with generative media. The founders initially built a Python-native data-transformation and orchestration toolchain aimed at data and ML teams, following what Yurtseven called "the Databricks and Snowflake playbook" of selling infrastructure for data preparation and modeling.^[8] An early open-source project, "fal-serverless," and a related dbt extension targeted data engineers who wanted to call ML models from inside their data pipelines without operating their own GPU fleets.

fal closed a seed round from First Round Capital in 2022 on the strength of this thesis. The company acquired a small set of paying enterprise customers but found the market slow-moving. The arrival of DALL-E 2 in April 2022, then Stable Diffusion in August 2022, and ChatGPT in November 2022 fundamentally reset the founders' worldview. Yurtseven described the shift in the First Round podcast: "It was first DALL-E 2, and then Stable Diffusion, and then ChatGPT, and then Llama. The whole AI world kind of came backwards" relative to the data-tooling thesis.^[8]

The pivot to generative-media inference (2023)

fal ran the original data-tooling business and a new inference offering in parallel for two to three months in early 2023, then dropped the legacy product and committed entirely to generative-media inference. Yurtseven later said the decision was emotionally hard because the legacy product had paying customers, but the inference business was already growing many times faster.^[8]

A second deliberate choice was to skip large language models. While competitors such as Together AI, Fireworks AI, Anyscale, DeepInfra, Baseten, Modal, and Replicate were racing to serve open LLMs, Yurtseven argued in interviews that LLM inference was effectively a bet against Google search and the hyperscalers, making it a poor fit for an early-stage company. Generative media, by contrast, was a "greenfield" market where the dominant interaction pattern would be developer APIs and real-time creative tools rather than chat.^[8]

The pivot produced an early breakout demo: a real-time image-to-image pipeline that streamed frames from a webcam, applied a fine-tuned Stable Diffusion model, and returned a transformed video feed at multiple frames per second. The team spent weeks tuning custom GPU kernels to reach the latency target; it included fal's vice president of engineering, Batuhan Taskaya, who came from a compiler-engineering background, and an engineer specializing in Triton kernels. The demo went viral on Twitter and brought a wave of indie developers and small studios to the platform.^[6]^[8]

Series A and rapid growth (2024)

In August 2023, fal reported roughly $2 million in annualized revenue. By August 2024, the company said annualized revenue exceeded $100 million, a roughly 50x year-over-year increase that the founders attributed to two factors: the explosion of generative-media use cases, and the launch of Black Forest Labs' FLUX models.^[8]

In September 2024, fal announced a $23 million combined round, comprising a previously unannounced $9 million seed round led by Andreessen Horowitz and a $14 million Series A led by Kindred Ventures, at an $80 million post-money valuation. Notable angel investors included Black Forest Labs co-founder Robin Rombach, Perplexity CEO Aravind Srinivas, Vercel founder Guillermo Rauch, Balaji Srinivasan, and Hugging Face CTO Julien Chaumond. At the time of the Series A, the company reported around 500,000 developer signups and 50 million daily generated images, videos, and audio files.^[2]^[9]

Series B (February 2025)

On February 12, 2025, fal announced a $49 million Series B led by Notable Capital and Andreessen Horowitz, with participation from Bessemer Venture Partners, Kindred Ventures, and First Round Capital. Jennifer Li of a16z and Glenn Solomon of Notable Capital joined the board. Total raised to date stood at approximately $72 million.^[7]^[10]

At the Series B announcement, fal disclosed more than 1 million developer signups, 50-plus enterprise customers including Quora (which the company said powered roughly 40% of the image-generation bots on Poe), Canva, Perplexity, Black Forest Labs, and PlayAI, more than 100 million daily inference requests, and 99.99% uptime. The round was characterized publicly as funding for the build-out of "tens of thousands of Blackwell GPUs" to replace its existing Hopper fleet.^[7]^[10]
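As a back-of-the-envelope reading of these figures, the sketch below converts 100 million daily requests into an average per-second rate and 99.99% uptime into an annual downtime allowance. It assumes requests are spread evenly over the day and that uptime is measured over a 365-day year; neither assumption comes from fal, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_requests_per_second(daily_requests: float) -> float:
    """Average request rate, assuming traffic is spread evenly over the day."""
    return daily_requests / SECONDS_PER_DAY


def allowed_downtime_minutes(uptime_fraction: float, period_days: int = 365) -> float:
    """Downtime permitted by an uptime target over the given period (assumed 365 days)."""
    minutes_in_period = period_days * 24 * 60
    return (1.0 - uptime_fraction) * minutes_in_period


if __name__ == "__main__":
    rps = average_requests_per_second(100_000_000)
    downtime = allowed_downtime_minutes(0.9999)
    print(f"average rate: {rps:,.0f} requests/s")
    print(f"downtime allowance: {downtime:.2f} minutes/year")
```

Under these assumptions the disclosed figures work out to roughly 1,157 requests per second on average and about 52.6 minutes of permitted downtime per year. Peak load would be higher than the average.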

Series C (July 2025) and unicorn status

On July 31, 2025, fal announced a $125 million Series C led by Meritech Capital Partners, with participation from Salesforce Ventures, Shopify Ventures, and Google AI Futures Fund alongside existing investors. The round valued fal at $1.5 billion, making it the first specialist generative-media inference platform to cross the $1 billion private-valuation threshold. Total funding to date rose to approximately $197 million.^[11]^[12]

The participation of Salesforce Ventures and Shopify Ventures was widely interpreted as a signal that generative-media inference was moving from a developer-toolchain category into a default building block for enterprise commerce, retail, and creative workflows.^[11]

Series D (December 2025)

On December 9, 2025, fal announced a $140 million Series D led by Sequoia Capital, with major participation from Kleiner Perkins and new investment from Alkeon Capital and NVentures (NVIDIA's venture capital arm), alongside existing investors. The round valued the company at $4.5 billion, tripling the Series C valuation from just over four months earlier.^[3]^[4]

In coverage of the Series D, TechCrunch and Bloomberg reported that fal's annualized revenue had crossed $200 million by October 2025, that the company had roughly 45 employees, and that a separate secondary transaction of up to $110 million ran alongside the primary $140 million raise, bringing total transaction size to approximately $250 million.^[4]^[13]

Technical platform

fal's commercial pitch rests on the claim that purpose-built inference for diffusion and other generative-media architectures can deliver materially lower latency and lower per-image cost than generic, container-based inference on commodity GPU clouds. The platform's stack reflects three deliberate engineering choices.

Custom CUDA and Triton kernels

fal maintains a proprietary inference engine built around more than one hundred custom CUDA and Triton kernels, layered with parallelization utilities, diffusion caching strategies, and quantization techniques. The company has stated publicly that this engine is generic across "roughly 70 to 80 percent" of modern diffusion-transformer models, so a single optimization stack accelerates the long tail of community fine-tunes as well as flagship checkpoints. fal's own benchmarks claim the engine runs FLUX up to roughly 2x faster than eager PyTorch and up to 4x faster than competing managed services on the same hardware.^[5]^[6]

Real-time inference: WebSocket streaming and persistent runners

fal exposes two interaction patterns beyond standard request-response REST calls. Streaming endpoints push progressive output from a single request to the client as it becomes available, which is typical for image refinement or long-form video. Real-time endpoints use persistent bidirectional WebSocket connections that bypass the queue entirely and connect a client directly to a kept-warm GPU runner. Because the runner stays resident across many requests, cold-start latency is eliminated, and applications such as live drawing tools and webcam-driven stylization can operate at 3 to 5 frames per second on Stable Diffusion-class models, according to fal's documentation.^[5]
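As a rough illustration of why a kept-warm runner matters for real-time use, the sketch below converts the 3 to 5 frames-per-second figure into a per-frame latency budget and compares a queued path that pays a cold start with a persistent connection. The inference, queue-wait, and cold-start times are made-up placeholder values, not fal measurements, and the function names are hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def queued_latency_ms(n_frames: int, inference_ms: float,
                      queue_wait_ms: float, cold_start_ms: float) -> float:
    """Toy model of the queued path: one cold start, then every frame
    waits in the queue before its inference runs."""
    return cold_start_ms + n_frames * (queue_wait_ms + inference_ms)


def realtime_latency_ms(n_frames: int, inference_ms: float) -> float:
    """Toy model of the persistent path: the runner is already warm and
    there is no queue, so only inference time is paid per frame."""
    return n_frames * inference_ms


if __name__ == "__main__":
    # Placeholder numbers chosen only for illustration.
    frames, inference, queue_wait, cold_start = 10, 200.0, 100.0, 5000.0

    for fps in (3, 5):
        print(f"{fps} fps -> {frame_budget_ms(fps):.0f} ms per frame")

    queued = queued_latency_ms(frames, inference, queue_wait, cold_start)
    realtime = realtime_latency_ms(frames, inference)
    print(f"queued:    {queued:.0f} ms ({frames / (queued / 1000):.2f} fps)")
    print(f"real-time: {realtime:.0f} ms ({frames / (realtime / 1000):.2f} fps)")
```

With these placeholder values, the persistent path stays inside the 200 ms budget that 5 frames per second implies, while the queued path falls well short of it. The real gap depends on actual queue depth, model, and hardware.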

Custom GPU fleet and infrastructure

By the Series B announcement in February 2025, fal said its production fleet was "in the thousands" of NVIDIA H100 (Hopper) GPUs, with plans to scale to "tens of thousands" of NVIDIA B100/B200 (Blackwell) GPUs by year end. The participation of NVIDIA's venture arm in the Series D in December 2025 was widely interpreted as both a strategic and financial alignment, although fal has not described any exclusive supply arrangement.^[7]^[3] fal has also partnered with object-storage provider Tigris to back asset delivery and content storage for outputs.^[14]

Models hosted

fal advertises an API surface covering more than 1,000 generative-media models as of 2026, spanning image, video, audio, voice, and 3D modalities. Notable categories include the following.^[15]

Image generation

Black Forest Labs FLUX family. FLUX.1 [schnell] (Apache-2.0 distilled), FLUX.1 [dev] (open weights, non-commercial), FLUX.1 [pro], FLUX.1.1 [pro] including Ultra and Raw modes, FLUX.1 Kontext for image editing, FLUX.2 [dev] and FLUX.2 [pro], plus partner fine-tunes such as FLUX.1 Krea [dev].^[16]^[17]
Stable Diffusion family. Stable Diffusion XL, SD3, SDXL Lightning, and many community fine-tunes.^[9]
Recraft. Recraft V3 and Recraft V4, design-oriented models.^[15]
Ideogram. Text-rendering specialist image model.^[15]

Video generation

Kling. Multiple Kling generations through Kling 3.0 and Kling Native 4K from Kuaishou.^[15]
Hunyuan Video. Tencent's open-weights video model.^[15]
Wan. Alibaba's Wan 2.1 and Wan 2.6 video models.^[15]
MiniMax Hailuo. Including the Hailuo-02 image-to-video API.^[15]
Google Veo. Veo 3.1 and Veo 3.1 Lite, available through fal as a hosted partner.^[15]

Audio and voice

ElevenLabs. Text-to-speech APIs including Turbo v2.5.^[15]
Whisper. OpenAI's speech-to-text model, on which fal has claimed industry-leading inference speed.^[1]
Various music-generation and sound-effect models from third-party providers.^[15]

The catalogue changes frequently, and many models are added on or near their public release date through pre-arranged partnerships with the model providers.

Funding history

Round	Date	Amount	Lead investor(s)	Other notable participants	Valuation
Seed	2022	undisclosed	First Round Capital	none disclosed	undisclosed
Seed (extension)	2024 (announced Sep 2024)	$9 million	Andreessen Horowitz	First Round Capital, angels	undisclosed
Series A	September 18, 2024	$14 million	Kindred Ventures	a16z, Robin Rombach, Aravind Srinivas, Guillermo Rauch, Balaji Srinivasan, Julien Chaumond	$80 million post-money
Series B	February 12, 2025	$49 million	Notable Capital, Andreessen Horowitz	Bessemer Venture Partners, Kindred Ventures, First Round Capital	undisclosed
Series C	July 31, 2025	$125 million	Meritech Capital Partners	Salesforce Ventures, Shopify Ventures, Google AI Futures Fund	$1.5 billion
Series D	December 9, 2025	$140 million	Sequoia Capital	Kleiner Perkins, Alkeon Capital, NVentures (NVIDIA), a16z	$4.5 billion

^[2]^[7]^[9]^[10]^[11]^[12]^[3]^[4]

Aggregate primary capital raised through the Series D stood at approximately $337 million, supplemented by a secondary transaction of up to $110 million arranged alongside the Series D.^[4]
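The cumulative totals quoted in this article can be checked directly from the table. The following sketch sums the disclosed primary rounds (the 2022 seed is left out because its amount is undisclosed) and derives the Series D transaction size and valuation step-up; the variable and function names are illustrative.

```python
# Primary round sizes in millions of US dollars, taken from the funding table.
# The 2022 seed is excluded because its amount is undisclosed.
ROUNDS = [
    ("Seed (extension)", 9),
    ("Series A", 14),
    ("Series B", 49),
    ("Series C", 125),
    ("Series D", 140),
]


def running_totals(rounds):
    """Return (round name, cumulative primary capital) pairs in order."""
    total = 0
    result = []
    for name, amount in rounds:
        total += amount
        result.append((name, total))
    return result


SECONDARY_MAX = 110  # upper bound of the secondary sale, in $ millions
series_d_transaction = 140 + SECONDARY_MAX  # primary plus secondary
valuation_step_up = 4500 / 1500  # Series D valuation over Series C valuation

if __name__ == "__main__":
    for name, total in running_totals(ROUNDS):
        print(f"{name:<18} cumulative ${total}M")
    print(f"Series D transaction size: up to ${series_d_transaction}M")
    print(f"Series C to Series D valuation multiple: {valuation_step_up:.1f}x")
```

The running totals come out to $9M, $23M, $72M, $197M, and $337M, matching the combined-round, Series B, Series C, and Series D figures cited in the History section.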

Partnerships

Black Forest Labs

fal's relationship with Black Forest Labs, the German image-generation lab founded by former Stability AI researchers including Robin Rombach, is the company's most prominent. fal supported FLUX.1 [schnell] and FLUX.1 [dev] on the day of the models' public release on August 1, 2024, in a launch the company describes as enabled by pre-existing relationships between the two teams. FLUX models subsequently became central to fal's growth narrative, and Robin Rombach invested personally in fal's Series A. Although fal is one of several distribution partners for FLUX, along with Replicate, Together AI, Cloudflare Workers AI, DeepInfra, Runware, and Microsoft Foundry, fal markets itself as the highest-performance inference partner for the model family and has continued to ship same-day support for new releases including FLUX.1 Kontext, FLUX.1 Tools, and FLUX.2.^[16]^[17]

Model partners

fal maintains hosted endpoints, often shipped on or near release day, for models from Black Forest Labs, Stability AI, Kuaishou (Kling), Tencent (Hunyuan), Alibaba (Wan), MiniMax (Hailuo), Recraft, Ideogram, ElevenLabs, Krea, and others. In several cases, including the FLUX.1 Krea [dev] partner fine-tune released in 2025, fal has also shipped training endpoints, allowing developers to LoRA-train derivative models.^[17]

Cloud and storage partners

In 2025, fal disclosed that it uses Tigris Data for global object storage to back asset delivery for generated media. NVIDIA's venture arm joined the Series D in December 2025, an investment read by some observers as strengthening fal's position as a customer for Blackwell-generation GPU supply, although fal has not described any exclusive supply arrangement.^[3]^[14]

Competitive landscape

fal competes with several overlapping categories of company:

General-purpose inference platforms such as Replicate, Together AI, Fireworks AI, Anyscale, Baseten, Modal, DeepInfra, and Lepton AI (now part of NVIDIA). Several of these started in the LLM and diffusion-as-a-service category, but most have prioritized text models, where fal, which skipped LLMs after its pivot from data tooling, explicitly declines to compete at depth.^[5]^[8]
GPU rental and managed Kubernetes providers such as RunPod, Lambda, CoreWeave, and Crusoe. fal argues that bare GPU rental requires customers to assemble their own kernels, schedulers, autoscalers, and queueing, which is impractical for the typical creative-tools developer.^[5]
Hyperscaler model gardens such as AWS Bedrock, Azure AI Foundry, and Google Vertex AI, all of which now offer FLUX, Veo, and other media models. fal argues that hyperscalers' model gardens prioritize breadth over kernel-level optimization for diffusion.^[5]^[7]
Vertical AI product companies that ship generative-media products directly to creators, such as Midjourney, Runway, Luma, Pika, and Krea. These are partners as often as competitors, and several of them either host on fal or use it as a fallback.^[17]

The investor case repeated across the Series B, C, and D announcements is that fal occupies a defensible position as the dedicated inference layer for generative media, sitting between model providers (who do not want to operate inference fleets) and AI-native application developers (who do not want to maintain their own kernels).^[7]^[10]

Customers and use cases

fal's published customer references include:

Quora, which has said fal powers a substantial share of image-generation bots on its Poe platform.^[7]
Canva, which uses fal as part of its generative-image and generative-video pipelines.^[7]^[4]
Perplexity, which has used FLUX models hosted on fal for in-product image generation; CEO Aravind Srinivas is also an angel investor.^[2]^[7]
Adobe, named at the Series D announcement as an enterprise customer.^[4]
Shopify, whose corporate venture arm participated in the Series C and whose platform integrates fal-hosted models for merchant tooling.^[11]^[4]
Photoroom, Freepik, and PlayAI / PlayHT, named in earlier rounds as platform customers.^[2]^[7]
Black Forest Labs, whose first-party API at api.bfl.ai uses fal as a backend partner.^[7]^[16]

fal has also publicized non-enterprise use cases including museum installations, live DJ performances driven by real-time image-to-image generation, and game studios that use fal endpoints for asset generation.^[9]

Organization and culture

fal is headquartered in San Francisco with a globally distributed engineering team. At the time of its Series D announcement in December 2025 the company employed roughly 45 people, of whom about 34 were engineers.^[4]^[8] Yurtseven has publicly described several deliberate operating choices:

The engineering organization deliberately avoids dedicated engineering managers and instead organizes around tech leads.^[8]
One-on-one meetings are largely replaced by small group discussions mixing tenured and newer engineers, which the founders argue spreads context faster.^[8]
Marketing relies heavily on community engagement (Discord, X, in-person hackathons) and self-deprecating merchandise (notably the "GPU Rich" and "GPU Poor" hat campaign) rather than traditional brand advertising.^[8]
From its earliest days the company has emphasized hard revenue metrics, in particular annualized recurring revenue from yearly enterprise commitments, over vanity engagement metrics.^[8]

Burkay Gur serves as CEO and Gorkem Yurtseven as CTO. Some early external coverage described Yurtseven as CEO; the company's About page and Series B and later press releases identify Gur as CEO and Yurtseven as CTO.^[1]^[7]

Reception and notable coverage

fal received broad coverage across the technology and business press during 2024 and 2025. TechCrunch covered the seed-plus-Series A in September 2024, Fortune ran an exclusive on the Series B in February 2025, BusinessWire and Yahoo Finance carried the Series C announcement in July 2025, and Bloomberg, TechCrunch, and BusinessWire all covered the Series D in December 2025. The First Round Review podcast published a detailed interview with Yurtseven in 2025 covering the pivot and operational philosophy, and the Latent Space podcast ran a long technical interview with Yurtseven and engineering leader Batuhan Taskaya.^[2]^[7]^[11]^[4]^[13]^[8]^[6]

By the end of 2025, several industry analysts had described fal as the highest-revenue specialist generative-media inference provider in the market and as one of the highest-growth AI infrastructure startups of the 2024-2025 cycle.^[11]^[4]

References

[1] "About fal.ai." fal.ai. https://fal.ai/about. Accessed 2026-05-20.
[2] Wiggers, Kyle. "Fal.ai, which hosts media-generating AI models, raises $23M from a16z and others." TechCrunch. September 18, 2024. https://techcrunch.com/2024/09/18/fal-ai-which-hosts-media-generating-ai-models-raises-23m-from-a16z-and-others/. Accessed 2026-05-20.
[3] "fal Raises $140M in Series D Led by Sequoia, with Major Participation from Kleiner Perkins and New Investment from Alkeon Capital and NVentures (NVIDIA's venture capital arm) to Accelerate the Future of Real-Time Generative Media." BusinessWire. December 9, 2025. https://www.businesswire.com/news/home/20251209532649/en/fal-Raises-$140M-in-Series-D-Led-by-Sequoia-with-Major-Participation-from-Kleiner-Perkins-and-New-Investment-from-Alkeon-Capital-and-NVentures-NVIDIAs-venture-capital-arm-to-Accelerate-the-Future-of-Real-Time-Generative-Media. Accessed 2026-05-20.
[4] Wiggers, Kyle. "Fal nabs $140M in fresh funding led by Sequoia, tripling valuation to $4.5B." TechCrunch. December 9, 2025. https://techcrunch.com/2025/12/09/fal-nabs-140m-in-fresh-funding-led-by-sequoia-tripling-valuation-to-4-5b/. Accessed 2026-05-20.
[5] "Real-Time Inference." fal.ai documentation. https://fal.ai/docs/documentation/model-apis/inference/real-time. Accessed 2026-05-20.
[6] "A Technical History of Generative Media with Gorkem and Batuhan from Fal.ai." Latent Space podcast. https://www.latent.space/p/fal. Accessed 2026-05-20.
[7] "fal Raises $49M Series B to Power the Future of AI Video." fal.ai blog. February 12, 2025. https://blog.fal.ai/fal-raises-49m-series-b-to-power-the-future-of-ai-video/. Accessed 2026-05-20.
[8] "The pivot that paid off: How fal found explosive growth in generative media." First Round Review podcast with Gorkem Yurtseven. https://review.firstround.com/podcast/the-pivot-that-paid-off-how-fal-found-explosive-growth-in-generative-media-gorkem-yurtseven-co-founder-and-ceo/. Accessed 2026-05-20.
[9] "Fal: an AI inference platform for generative media." Kindred Ventures. September 2024. https://kindredventures.com/announcement/our-investment-in-fal-ai-inference-platform-for-generative-media/. Accessed 2026-05-20.
[10] "Why We Invested In fal.ai." Notable Capital. February 2025. https://www.notablecap.com/blog/why-we-invested-in-fal-ai. Accessed 2026-05-20.
[11] "fal Raises $125M in Series C Led by Meritech, with Salesforce Ventures, Shopify Ventures and Google AI Futures Fund Joining to Power the Next Decade of Generative Media." BusinessWire. July 31, 2025. https://www.businesswire.com/news/home/20250731234742/en/fal-Raises-$125M-in-Series-C-Led-by-Meritech-with-Salesforce-Ventures-Shopify-Ventures-and-Google-AI-Futures-Fund-Joining-to-Power-the-Next-Decade-of-Generative-Media. Accessed 2026-05-20.
[12] "Tripling Down: Fal raises a $125M Series C at a $1.5B valuation." Kindred Ventures. July 31, 2025. https://kindredventures.com/announcement/tripling-down-fal-raises-125m-series-c-at-a-1-5-billion-valuation/. Accessed 2026-05-20.
[13] "Sequoia-Led Funding Vaults AI Startup Fal to $4.5 Billion Valuation." Bloomberg. December 9, 2025. https://www.bloomberg.com/news/articles/2025-12-09/sequoia-led-funding-vaults-ai-startup-fal-to-4-5-billion-valuation. Accessed 2026-05-20.
[14] "How fal.ai offers the fastest generative ai in the world." Tigris Data case study. https://www.tigrisdata.com/blog/case-study-falai/. Accessed 2026-05-20.
[15] "Explore AI Models." fal.ai. https://fal.ai/explore/models. Accessed 2026-05-20.
[16] "Announcing Flux by Black Forest Labs: The Next Leap in Text-to-Image Models." fal.ai blog. August 1, 2024. https://blog.fal.ai/flux-the-largest-open-sourced-text2img-model-now-available-on-fal/. Accessed 2026-05-20.
[17] "Introducing the Flux.1 Krea [dev] On fal." fal.ai blog. https://blog.fal.ai/introducing-the-flux-1-krea-dev-on-fal/. Accessed 2026-05-20.
