Mem0 is an open-source memory layer for AI agents and applications that gives large language models persistent, evolving memory across sessions. Built as infrastructure rather than a product feature, it lets developers add, retrieve, and manage structured memory with a few lines of code. The project has accumulated over 41,000 GitHub stars and 14 million Python package downloads since its January 2024 launch, and its managed cloud platform serves more than 90,000 developers. AWS selected Mem0 as the memory provider for its Strands Agents SDK, integrating it into the Amazon Bedrock ecosystem.
The core problem Mem0 addresses is that LLMs are stateless by design. Each conversation starts with no knowledge of prior interactions unless that context is explicitly passed into the prompt. Solving this by stuffing entire conversation histories into the context window is expensive, slow, and does not scale; the cost grows linearly with history length and many relevant details get buried. Mem0 takes a different approach: extract and store discrete facts from conversations, update those facts when things change, and retrieve only the most relevant subset when needed. The result is a system that can compress months of interactions into a few hundred tokens of useful context.
Mem0 began as Embedchain, an open-source retrieval-augmented generation (RAG) framework started by Taranjeet Singh in June 2023. Singh had quit his engineering role at Khatabook, a Y Combinator-backed fintech startup, in late 2022 as the ChatGPT wave was building. He then built one of the first GPT-powered app stores, which reached over a million users. The experience gave him a close view of what LLM applications lacked.
Embedchain addressed one of those gaps: making it easier for developers to index and retrieve unstructured data for use in LLM prompts. The project quickly earned 8,000-plus GitHub stars and demonstrated real demand for memory tooling in the developer community.
Singh reconnected with Deshraj Yadav, a longtime collaborator who had led the AI Platform at Tesla Autopilot, handling model training and monitoring infrastructure for Tesla's self-driving system. The two had previously co-created EvalAI, an open-source machine learning evaluation platform that served as Yadav's master's thesis and grew to over 1,600 GitHub stars. Yadav had also published research at top computer vision and AI conferences including CVPR, ECCV, and AAAI.
While building with Embedchain, Singh and Yadav launched a meditation app inspired by Indian yogi Sadhguru. The app went viral in India. Users liked it, but the feedback they kept hearing was that the app "doesn't remember" their meditative journey. Each session started from scratch. The AI had no idea who the user was, what they had worked on before, or where they were in their practice.
This feedback crystallized the problem. Embedchain helped developers build RAG pipelines for retrieving static documents. But real AI applications needed something different: a system that accumulates knowledge about users over time, updates that knowledge as things change, and retrieves the right context when needed. That was not a RAG problem. It was a memory problem.
In July 2024, Singh announced the change on social media: "We are changing our name from Embedchain to Mem0." The announcement described the change as reflecting the company's growth, learnings, and sharpened focus over the preceding year. The Embedchain codebase remains in the Mem0 GitHub repository as a subdirectory, preserving continuity for existing users.
Mem0 launched publicly in January 2024 and went through Y Combinator's Summer 2024 (S24) batch, which contained 255 companies, 67% of them categorized as AI. Mem0 was among the most-cited companies in analyst reviews of the batch.
Mem0 has raised a total of $24 million across two rounds.
Kindred Ventures led a $3.9 million seed round that was not publicly disclosed until the Series A announcement. The seed funded the build-out of the core open-source library and the launch of the managed platform.
In October 2025, Basis Set Ventures led a $20 million Series A. Other participants in the Series A included Kindred Ventures, Y Combinator, Peak XV Partners, and the GitHub Fund.
The round also drew a notable list of angel investors: Dharmesh Shah (co-founder of HubSpot), Scott Belsky (former CPO of Adobe), Olivier Pomel (CEO of Datadog), Thomas Dohmke (former CEO of GitHub), Paul Copplestone (CEO of Supabase), James Hawkins (CEO of PostHog), Lukas Biewald (co-founder of Weights and Biases), Brian Balfour (founder of Reforge), Philip Rathle (VP at Neo4j), and Jennifer Taylor (former president of Plaid). The participation of infrastructure founders and product executives from across the developer tooling ecosystem reflects the broad interest in memory as a foundational layer for AI applications.
At the time of the Series A, the founding team consisted of four people.
Singh has described the broader ambition as building "Plaid for memory." Just as Plaid became neutral infrastructure connecting financial apps to bank accounts, Mem0 aims to become the portable memory layer that travels with users across different AI applications and platforms. The argument is that model providers like OpenAI and Anthropic have no incentive to make memory portable; they benefit from users staying within their ecosystems. An open, neutral layer can serve developers and users regardless of which model or platform they use.
Mem0's core design is a two-phase pipeline: extraction and update. When new conversation turns arrive, the system extracts salient information using LLM-based function calling on a sliding window of recent messages and conversation summaries. The update phase then compares extracted facts against existing memories and chooses among four operations: ADD (create a new memory when no semantically equivalent entry exists), UPDATE (augment an existing memory with new complementary information), DELETE (remove a memory that has been contradicted), or NOOP (leave existing memories unchanged when the new information is redundant).
The system uses a configurable context window of the last ten messages for extraction, and compares each extracted fact against the ten most semantically similar existing memories during the update phase.
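To make the four operations concrete, here is a toy stand-in for the update phase. In Mem0 itself an LLM tool call chooses the operation after comparing each extracted fact with its ten nearest memories; this sketch (names and structure hypothetical, not Mem0's internal code) only shows each operation's effect on the store:

```python
from dataclasses import dataclass, field

@dataclass
class ToyMemoryStore:
    """Illustrative in-memory store; Mem0's real stores are described below."""
    memories: dict = field(default_factory=dict)
    next_id: int = 0

    def apply(self, op: str, fact: str | None = None, target_id: int | None = None) -> None:
        if op == "ADD":            # no semantically equivalent entry exists
            self.memories[self.next_id] = fact
            self.next_id += 1
        elif op == "UPDATE":       # merge in complementary information
            self.memories[target_id] = fact
        elif op == "DELETE":       # the stored fact has been contradicted
            self.memories.pop(target_id, None)
        # "NOOP": redundant information -- the store is left unchanged

store = ToyMemoryStore()
store.apply("ADD", "User prefers window seats")
store.apply("UPDATE", "User prefers window seats, aisle on red-eyes", target_id=0)
store.apply("NOOP")
```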
Mem0 stores memories across three complementary datastores:
Vector store: Dense embeddings enable semantic similarity search. When a query arrives, the system embeds it into a 1,536-dimensional vector and retrieves the top candidates by cosine similarity. This handles open-domain and preference-related queries well. The default vector store is Qdrant, though the open-source version supports over twenty alternatives including Chroma, Pinecone, Milvus, PGVector, MongoDB Atlas, Weaviate, FAISS, Redis Stack, Valkey, Elasticsearch, OpenSearch, Supabase, Upstash, Cloudflare Vectorize, and AWS S3 Vectors.
Graph store: The graph-enhanced variant, referred to as Mem0g in the research paper, represents entities as nodes and relationships as directed, labeled edges. An LLM module analyzes entities and their context to identify semantically significant connections and classifies each with a relationship label, producing triplets that form the graph's edges. This structure improves performance on temporal queries and multi-hop reasoning tasks where the agent must connect a chain of facts about a person or topic. The graph component uses Neo4j or AWS Neptune Analytics as the backing store.
Key-value store: Structured, frequently accessed metadata such as user preferences and explicit settings gets stored in a key-value store. This is the fastest retrieval path for specific, well-defined lookups that do not require semantic search.
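In the open-source library, these stores are wired up through a single config dict. A minimal sketch pairing the default Qdrant vector store with a Neo4j graph store; the field names follow the documented schema, but exact keys can vary by version, so check the Mem0 docs before copying:

```python
from mem0 import Memory

config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {"host": "localhost", "port": 6333},
    },
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "secret",  # placeholder credential
        },
    },
}

m = Memory.from_config(config)
```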
The retrieval pipeline scores candidates by combining relevance, recency, and a type-specific weight. Semantic memories receive a weight of 0.6, episodic memories 0.3, and procedural memories 0.1. The top five results, totaling under 200 tokens, are injected into the prompt context.
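A back-of-the-envelope sketch of that scoring: the type weights come from the description above, but the multiplicative combination is an assumption, since the exact formula is internal to Mem0:

```python
TYPE_WEIGHTS = {"semantic": 0.6, "episodic": 0.3, "procedural": 0.1}

def score(relevance: float, recency: float, memory_type: str) -> float:
    """relevance and recency are assumed normalized to [0, 1]."""
    return relevance * recency * TYPE_WEIGHTS[memory_type]

candidates = [
    ("prefers window seats", 0.92, 0.80, "semantic"),
    ("asked about Lisbon flights last week", 0.85, 0.95, "episodic"),
    ("formats reports as summary-then-metrics", 0.40, 0.30, "procedural"),
]
ranked = sorted(candidates, key=lambda c: score(c[1], c[2], c[3]), reverse=True)
top_five = ranked[:5]   # injected into the prompt, capped at ~200 tokens
```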
One of Mem0's architectural advantages over full-context approaches is token efficiency. The paper published in April 2025 reported that Mem0 consumes approximately 7,000 tokens per conversation for its memory representation, compared to 26,031 tokens for full-context prompting. The graph-enhanced variant uses roughly 14,000 tokens. Zep, by comparison, exceeded 600,000 tokens in the same analysis due to its approach of caching redundant summaries and edge facts.
Mem0 organizes memory into four layers that correspond to different time horizons and use cases:
Conversation memory contains the in-flight messages within a single turn. These exist only for the duration of the current response cycle and do not persist.
Session memory holds short-lived context lasting from minutes to a few hours. It is appropriate for multi-step workflows that should reset when the task completes. Developers bind session memories to a run_id parameter, and the system expires them automatically when the session ends.
User memory is the persistent layer, tied to a user_id and spanning weeks to indefinitely. This is where facts like preferences, past interactions, and personal details accumulate over time. It is the primary mechanism for lasting personalization.
Organizational memory is shared context that is globally accessible to multiple agents or teams. It allows enterprise deployments to maintain shared knowledge across different users and workflows.
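In the open-source library, these scopes map onto identifier parameters on the add and search calls. A minimal sketch, assuming a default configuration with an LLM key (such as OPENAI_API_KEY) already exported:

```python
from mem0 import Memory

m = Memory()  # default config; expects an LLM/embedding provider key in the environment

# User memory: persistent, keyed by user_id
m.add("Prefers boutique hotels and window seats", user_id="alice")

# Session memory: short-lived, additionally scoped by run_id
m.add("Currently booking the Lisbon trip", user_id="alice", run_id="trip-0425")

# Retrieval is scoped by the same identifiers
results = m.search("hotel preferences", user_id="alice")
```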
Within user memory, the system distinguishes between three cognitive types borrowed from cognitive psychology:
Semantic memory stores factual knowledge about a user: preferences, constraints, and stable attributes. A sales CRM agent might store "prefers email outreach over calls" or "budget cap is $50,000" as semantic memories.
Episodic memory records what happened in specific interactions, logged with sufficient context to be useful in later sessions. An AI therapist could use episodic memory to recall that a user described a difficult week in a session three months prior.
Procedural memory stores how to do things rather than facts or events. This covers learned workflows, custom tool-use patterns, and process knowledge the agent should apply consistently. An agent that learns a user's preferred report format or a custom code review checklist would store that in procedural memory.
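For illustration, an application could label memories with these types through the metadata parameter on add. Mem0 classifies memories itself, so explicit tags like these are optional, and the "type" key below is a hypothetical convention rather than a reserved field:

```python
from mem0 import Memory

m = Memory()  # assumes a configured LLM/embedder, e.g. OPENAI_API_KEY set

m.add("Budget cap is $50,000", user_id="acme-rep",
      metadata={"type": "semantic"})
m.add("Described a difficult week in the March 3 session", user_id="client-7",
      metadata={"type": "episodic"})
m.add("Weekly report: summary first, then metrics table, then risks",
      user_id="alice", metadata={"type": "procedural"})
```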
Mem0 is available both as an Apache 2.0-licensed open-source library and as a fully managed cloud platform.
The open-source version (installable via pip install mem0ai) supports the complete set of vector databases, LLM providers, and embedding models listed in the project's documentation. It requires developers to provision their own vector database, configure their LLM provider, and manage infrastructure. Setup takes 15 to 30 minutes. It supports graph memory and multimodal features, and can be deployed on any cloud or on-premises environment. There is no licensing cost, but operators pay separately for their infrastructure and LLM API usage. For teams at scale, self-hosting can be significantly cheaper than the managed tiers, particularly if they are already running vector databases for other purposes.
The managed Mem0 Platform handles infrastructure automatically, including auto-scaling and high availability. It adds webhooks, memory export, a web dashboard with analytics, custom memory categories, and memory filtering through metadata. Data is hosted in the US by default, with expansion options for data residency requirements. The platform provides a unified API that abstracts over the underlying storage layer, so teams do not need to choose or operate a vector database.
The two versions share core capabilities: user and agent memory storage, intelligent deduplication, semantic search, and the ADD/UPDATE/DELETE/NOOP update operations. Both offer Python and JavaScript SDKs with identical interfaces and support the same major agent frameworks. Enterprise customers can deploy the platform on-premises or in a private cloud, with SSO, audit logging, and custom integrations.
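On the platform side, the equivalent calls go through the hosted client rather than a local Memory instance. A sketch, assuming an API key issued from the Mem0 dashboard:

```python
from mem0 import MemoryClient

client = MemoryClient(api_key="m0-...")  # placeholder key from the dashboard

client.add(
    [
        {"role": "user", "content": "I'm vegetarian and allergic to nuts"},
        {"role": "assistant", "content": "Noted -- I'll keep both in mind."},
    ],
    user_id="alice",
)
print(client.search("dietary restrictions", user_id="alice"))
```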
A key decision point is data control. Self-hosted deployments give teams complete ownership of their memory data, which matters for regulated industries like healthcare and finance. The managed platform offers SOC 2 and HIPAA compliance certifications and a zero-trust architecture, but the data still transits Mem0's infrastructure. Teams with strict data sovereignty requirements tend to choose self-hosting.
In 2025, AWS and Mem0 announced a partnership integrating Mem0's memory layer into the Strands Agents SDK. Strands is AWS's lightweight, production-ready framework for building AI agents on Amazon Bedrock, offering built-in tools, multi-agent coordination, full observability, and deployment on AWS Lambda, Fargate, and EC2. The framework is designed to be model-agnostic and integrates directly with Bedrock's managed model inference.
The integration gives Strands-built agents persistent, contextual memory that carries over across sessions. Mem0 handles semantic retrieval via vector embeddings, relationship modeling via graph memory, and automated update operations. When a user interacts with a Strands agent, the agent queries Mem0 for relevant memories before generating a response and posts the new interaction to Mem0 after the turn completes. This makes the agent progressively more personalized without any additional engineering work from the developer.
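The per-turn pattern itself is framework-agnostic. A sketch of the loop, with generate_reply standing in for the Strands agent call (the name is illustrative, not a Strands API, and the "memory" result key follows current Mem0 SDK responses but may differ by API version):

```python
from mem0 import MemoryClient

mem = MemoryClient(api_key="m0-...")  # placeholder key

def handle_turn(user_id: str, user_message: str, generate_reply) -> str:
    # Before responding: pull relevant memories for this user
    memories = mem.search(user_message, user_id=user_id)
    context = "\n".join(hit["memory"] for hit in memories)

    reply = generate_reply(user_message, context)  # the agent/LLM call

    # After the turn: write the new interaction back to Mem0
    mem.add(
        [{"role": "user", "content": user_message},
         {"role": "assistant", "content": reply}],
        user_id=user_id,
    )
    return reply
```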
The Mem0 Series A press release described AWS as having "selected Mem0 as the exclusive memory provider for its Agent SDK," pointing to the depth of the integration. Mem0 also reports handling more memory operations than any other provider across the Bedrock ecosystem.
Separately, AWS published detailed integration guides for combining Mem0 open source with Amazon ElastiCache for Valkey (as the key-value store) and Amazon Neptune Analytics (as the graph store), offering a complete reference architecture for production deployments on AWS infrastructure. This lets teams running on AWS use native managed services for each storage tier rather than operating third-party databases.
The Mem0 documentation also covers using Amazon Bedrock directly as an LLM provider and embeddings backend within the open-source library, making it possible to run the full Mem0 stack entirely within an AWS account.
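A sketch of such an all-AWS configuration, using the aws_bedrock provider names from the Mem0 docs; the model IDs are examples only and depend on what the account has enabled in the Bedrock console:

```python
from mem0 import Memory

# All-AWS sketch: Bedrock supplies both the LLM and the embeddings.
config = {
    "llm": {
        "provider": "aws_bedrock",
        "config": {"model": "anthropic.claude-3-5-sonnet-20240620-v1:0"},
    },
    "embedder": {
        "provider": "aws_bedrock",
        "config": {"model": "amazon.titan-embed-text-v2:0"},
    },
}

m = Memory.from_config(config)
```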
Mem0 publishes first-party integrations with a broad range of AI frameworks and developer tools.
Agent frameworks: LangChain, LangGraph, LlamaIndex, CrewAI, Microsoft AutoGen, Agno, Camel AI, the OpenAI Agents SDK, Google AI ADK, Mastra, and Vercel AI SDK.
LLM providers: OpenAI, Anthropic (Claude), Azure OpenAI, AWS Bedrock, Google AI (Gemini), Groq, DeepSeek, Mistral, MiniMax, xAI (Grok), Together, Ollama, LM Studio, LiteLLM, and vLLM.
Embedding models: OpenAI Embeddings, Azure OpenAI, AWS Bedrock, Google AI, Vertex AI, Hugging Face, Ollama, LM Studio, and Together.
Vector databases: Qdrant (default), Chroma, PGVector, Milvus, Pinecone, MongoDB Atlas Vector Search, Azure AI Search, Redis Stack, Valkey, Elasticsearch, OpenSearch, Supabase, Upstash Vector, Cloudflare Vectorize, Vertex AI Vector Search, Weaviate, FAISS, Cassandra, AWS S3 Vectors, AWS Neptune Analytics, Databricks Delta Lake, and Turbopuffer.
Developer platforms: Dify, Flowise, and Langflow (all with native integrations), plus AgentOps, Keywords AI, and Raycast.
Voice and real-time: LiveKit, Pipecat, and ElevenLabs.
AI coding tools: Claude Code, Cursor, and Codex via MCP.
Mem0 also publishes a Model Context Protocol (MCP) server that allows Claude Code and other MCP-compatible tools to use Mem0 memory natively.
Several projects address persistent memory for AI agents. The main alternatives to Mem0 are Letta (formerly MemGPT), Zep, and Cognee.
| Feature | Mem0 | Letta | Zep | Cognee |
|---|---|---|---|---|
| Core approach | Fact extraction | OS-style tiered memory | Temporal knowledge graph | GraphRAG |
| Setup complexity | Low (minutes on the managed platform) | High | Medium | Medium |
| Memory overhead | ~7K tokens | Variable | 600K+ tokens | Variable |
| Latency model | Synchronous extraction | Agent-directed | Asynchronous (no latency hit) | Moderate |
| Best fit | Chatbots, personal assistants | Autonomous long-running agents | Enterprise SaaS with temporal queries | Deep knowledge retrieval |
| Licensing | Apache 2.0 | MIT | Source available | Apache 2.0 |
| Managed cloud | Yes | Yes | Yes | Limited |
| Graph memory | Yes (Mem0g) | No | Yes (Graphiti) | Yes |
Letta (formerly MemGPT) uses an architecture inspired by operating system memory management. It divides memory into core memory (a small, fixed-size buffer always present in the context window, analogous to RAM), archival memory (a searchable vector store for long-term storage, analogous to disk), and recall memory (conversation history). Agents in Letta decide autonomously what to move between tiers, which gives them maximum flexibility but also introduces high latency on each operation. Letta is the preferred choice for autonomous agents that need to operate independently for extended periods.
Zep runs as a standalone server and processes memory asynchronously, so adding memories does not slow down the agent's response cycle. Its Graphiti engine uses a temporal knowledge graph that stores timestamps on every node and edge, enabling accurate answers to questions about what an agent knew or believed at a specific point in time. This makes Zep well-suited for enterprise SaaS applications where the history of state changes matters. Zep scored 63.8% on temporal reasoning in LongMemEval evaluations, compared to Mem0's lower scores on that specific task type.
Cognee transforms raw text and documents into structured knowledge graphs, excelling at multi-document reasoning where the agent needs to trace relationships across complex datasets. It is better suited to knowledge base applications than to personal assistant use cases.
The April 2025 research paper published by the Mem0 team evaluated the system on the LOCOMO benchmark, a public dataset designed to stress-test long-term memory systems. LOCOMO uses 10 extended conversations averaging 26,000 tokens each, with approximately 600 dialogue turns per conversation. Questions cover single-hop retrieval (a single recalled fact), multi-hop reasoning (connecting multiple facts), temporal queries (facts that changed over time), and open-domain questions.
| Question type | Mem0 (J-Score) | Mem0g (J-Score) | Notes |
|---|---|---|---|
| Single-hop | 67.13 | 65.71 | Standard variant leads |
| Multi-hop | 51.15 | 47.19 | Both trail full-context |
| Temporal | 55.51 | 58.13 | Graph variant improves on time-sensitive queries |
| Open-domain | 72.93 | 75.71 | Competitive with Zep (76.60) |
Compared to RAG approaches using chunk sizes from 128 to 8,192 tokens, both Mem0 variants delivered a relative improvement of roughly 10 to 12 percent in LLM-as-Judge scoring. The improvement comes from the structured extraction process: RAG retrieves chunks of raw text, while Mem0 retrieves distilled facts, which are shorter and more directly relevant to the query.
The efficiency improvements were more substantial than the accuracy gains. Mem0 achieved a p95 retrieval latency of 1.44 seconds, a 91% reduction from the 17.12-second p95 latency of full-context processing. The median (p50) search latency was 0.148 seconds, the lowest among all methods evaluated. Context tokens supplied to the model per query dropped from 26,031 with full-context processing to 1,764 with Mem0, a 93% reduction. These efficiency gains compound at scale: a production agent handling thousands of conversations per day sees order-of-magnitude reductions in LLM API costs.
A separate evaluation by analyst Deepak Gupta, also using the LOCOMO benchmark, found that Mem0 scored 26% higher than OpenAI's built-in memory system on response accuracy overall. OpenAI's system performed adequately on simple preference recall but missed multi-hop details. Mem0's graph-enhanced variant (Mem0g) performed best on tasks requiring complex, long-term context maintenance.
On multi-hop questions, Mem0's standard variant scored 51.15, the weakest category in the evaluation. Vector similarity retrieval identifies semantically similar memories but does not inherently reason across chains of connected facts. The graph-enhanced variant (47.19 on multi-hop) actually performed worse than the standard variant here, which the paper attributes to the graph structure introducing noise when the query does not map cleanly onto graph traversal patterns.
Mem0 is used across several categories of AI applications:
Personal assistants and companions: AI assistants that remember a user's preferences, habits, and history across sessions can offer continuity that stateless LLMs cannot. A fitness coaching app can recall previous workouts and dietary restrictions. A travel agent can remember that a user prefers window seats and boutique hotels.
Customer support: Support bots that recognize returning customers and recall their account history, previous tickets, and stated preferences can resolve issues faster and with less repetition. The system can store the outcome of prior support interactions so agents do not start from scratch.
Sales and CRM agents: Sales tools integrated with Mem0 can recall previous conversations with prospects, remember expressed budget constraints, and note which communication channels each contact prefers.
Healthcare: A patient care assistant can remember chronic conditions, medication preferences, and previous conversations about symptoms. Therapy progress trackers can build a longitudinal picture of a patient's state over multiple sessions. Mem0 offers HIPAA-compliant configurations for healthcare deployments.
Education: An AI tutor using Mem0 can track a student's demonstrated weaknesses, prior explanations that worked or did not work, and areas already covered, enabling adaptive instruction across sessions.
Developer tooling: Mem0 is used within AI coding tools including Claude Code and Cursor, where it can store project-specific preferences, coding patterns, and remembered context about a codebase.
Enterprise knowledge management: Organizational memory can capture shared context across multiple agents and users, allowing enterprise AI deployments to accumulate institutional knowledge over time.
Mem0 Platform offers four tiers:
| Plan | Price | Monthly add requests | Monthly retrieval requests | Notes |
|---|---|---|---|---|
| Hobby | Free | 10,000 | 1,000 | Community support |
| Starter | $19/month | 50,000 | 5,000 | Community support |
| Pro | $249/month | 500,000 | 50,000 | Private Slack, advanced analytics, multi-project |
| Enterprise | Custom | Unlimited | Unlimited | On-prem, SSO, audit logs, SLA, custom integrations |
The platform also offers usage-based pricing for custom volumes and a startup program that provides three months of Pro access at no charge for companies with under $5 million in funding.
The open-source library is free under the Apache 2.0 license. Infrastructure and LLM API costs are the operator's responsibility.
Mem0 reports that thousands of teams, from startups to Fortune 500 companies, run the system in production. The agent frameworks CrewAI, Flowise, and Langflow have native Mem0 integrations, meaning any developer building on those platforms can add Mem0 memory without additional integration work.
The managed platform has over 90,000 registered developers as of late 2025. API call volume grew from 35 million calls in Q1 2025 to 186 million in Q3 2025, roughly 30% month-over-month growth over the intervening six months. The Python package has been downloaded more than 14 million times.
The project's growth trajectory mirrors the broader shift toward AI agents that need to maintain state across sessions. As LLM applications mature from one-off chatbots toward persistent AI companions, sales assistants, and autonomous agents, the demand for external memory infrastructure has grown proportionally. Mem0's position as the most widely adopted open-source option in this space, combined with its AWS partnership, has made it something of a default choice for teams starting new agentic projects.
The GitHub Fund's participation in the Series A is notable: it suggests GitHub itself views agent memory as a significant part of the developer tooling ecosystem rather than a niche add-on.
Temporal reasoning accuracy: Mem0's reliance on vector similarity for retrieval gives it a structural weakness on queries about when something was true. Zep's Graphiti engine, which stores timestamps on nodes and edges, scores higher on temporal recall benchmarks. Mem0 partially addresses this through the graph-enhanced Mem0g variant, but the base system does not track time as a first-class property.
Context extraction nuance: Automated fact extraction can miss nuance. A user who says "I hate mushrooms on pizza" might have that simplified to a generic dislike of mushrooms, which could affect subsequent responses about mushroom-based dishes in other contexts. The system extracts explicit facts more reliably than it infers behavioral patterns or implicit preferences.
Multi-hop penalty: On LOCOMO benchmark multi-hop questions, Mem0 scored 51.15, reflecting a non-trivial accuracy gap when the agent needs to chain multiple facts together. Full-context approaches outperform Mem0 on these questions, at the cost of much higher token consumption.
Self-hosted complexity: The managed platform is the product Mem0 has optimized most heavily. Self-hosted deployments require provisioning and operating vector databases, managing LLM API integrations, and handling infrastructure maintenance, without the analytics and observability features of the cloud platform.
Foundation model competition: OpenAI, Anthropic, and Google are all building native memory capabilities into their own products and APIs. As these improve, third-party memory infrastructure providers face the risk that the problem they solve gets absorbed into the base layer. Mem0's strategic counter-argument, articulated by CEO Taranjeet Singh, is that model providers have no incentive to make memory portable or interoperable across platforms, and an open, neutral memory layer fills a gap those providers will leave.