FAR.AI

AI Safety Research Organizations

Updated Jun 23, 2026

FAR.AI is an artificial intelligence safety research and education non-profit based in Berkeley, California. It conducts technical research on robustness, alignment, deception, and model evaluation, and runs an internationally distributed series of Alignment Workshops. Founded in July 2022 by Adam Gleave and Karl Berzins and incorporated in October of the same year, FAR.AI is best known for the 2023 paper "Adversarial Policies Beat Superhuman Go AIs," which showed that an adversarial policy could defeat the superhuman Go program KataGo with a win rate above 97 percent. In January 2026 it announced more than 30 million US dollars in multi-funder support to scale frontier AI safety research.^[1]^[2]^[11]^[22] The organization states its mission as ensuring "advanced AI is safe and beneficial for everyone," and it is led by CEO Adam Gleave, who completed his PhD in artificial intelligence at the University of California, Berkeley under Stuart Russell.^[1]^[2]

The acronym FAR was originally introduced as "Fund for Alignment Research" in the organization's launch announcement on the AI Alignment Forum and Effective Altruism Forum in 2022.^[3]^[4] By the time the organization had matured into a research lab in its own right, the website tagline had shifted to "Frontier Alignment Research," which is how FAR.AI now publicly describes its identity.^[1] The shift reflects the organization's transition from a service that initially supported a small group of academic and independent alignment researchers to a research and field-building institution that produces its own papers, runs international convenings, and engages with policymakers.

FAR.AI should not be confused with FAIR, the historical Facebook AI Research lab inside Meta, which uses a similar but unrelated acronym. The "AI" in FAR.AI is part of the legal name and brand and is also part of the organization's primary domain at far.ai.^[1]

Quick facts
Founded	July 2022 (incorporated October 2022)
Founders	Adam Gleave (CEO), Karl Berzins (President)
Headquarters	Berkeley, California
Type	AI safety research and education non-profit
Tagline	Frontier Alignment Research
Mission	Ensure advanced AI is safe and beneficial for everyone
Best-known work	"Adversarial Policies Beat Superhuman Go AIs" (KataGo attack, 2023)
Flagship events	Alignment Workshop series; ControlConf; Technical Innovations for AI Policy Conference
Team size (early 2026)	Approximately 11 to 16 full-time-equivalent staff
Funding (announced Jan 2026)	More than 30 million US dollars, multi-funder
Website	https://www.far.ai/

When was FAR.AI founded, and who is Adam Gleave?

FAR.AI was founded by Adam Gleave and Karl Berzins in July 2022 and incorporated in October 2022.^[2] In a launch post on the Alignment Forum titled "Introducing the Fund for Alignment Research," the founders described the organization as a vehicle to provide technical research support, hiring infrastructure, and collaboration capacity to a small group of AI safety researchers who otherwise lacked an institutional home.^[3] The four researchers whose agendas the new organization was initially built to support were Ethan Perez (then a PhD candidate at New York University, later at Anthropic), Adam Gleave (then a PhD candidate at UC Berkeley), Scott Emmons (then a PhD candidate at UC Berkeley, later at Google DeepMind), and Claudia Shi (then a PhD candidate at Columbia University).^[3]^[4]

Gleave studied for his undergraduate and master's degrees at the University of Cambridge, completing a BA in Computer Science in 2015 and an MPhil in Advanced Computer Science in 2016.^[5] He then moved to UC Berkeley for a PhD in artificial intelligence advised by Stuart Russell, completing the doctorate in 2022.^[5] During his graduate studies he spent time at Google DeepMind collaborating with Jan Leike and Geoffrey Irving on reinforcement learning and reward modeling, and he had earlier worked at quantitative trading firms before returning to research.^[5] His thesis work focused on developing techniques for advanced automated systems to act according to human preferences in situations unanticipated by their designers, an emphasis that has carried into FAR.AI's research portfolio on reward learning, adversarial robustness, and model evaluation.^[5]

Karl Berzins co-founded the organization and serves as President. The organization later added Boston Nyer as Chief Operating Officer.^[6] Mark Nitzberg serves as an advisor and Lawrence Chan as Secretary of the Board.^[6]

What does FAR.AI research?

FAR.AI describes its mission as ensuring that advanced AI is safe and beneficial for everyone, with a particular focus on agendas that are too large for individual academic or independent researchers but not commercially attractive enough to be pursued inside for-profit AI labs.^[1]^[7] The organization frames its work around three pillars: research, events, and programs.^[1]

The current research program is organized into four primary domains:^[8]

Red-Teaming and Evaluation, including stress testing of frontier models and evaluations of misuse risks
Deception, including the study of sandbagging and the failure modes of detectors trained against deceptive behaviors
Robustness and Security, including adversarial attacks on superhuman game-playing systems and on large language models, and scaling trends in robustness
Alignment, including reward learning, persuasion evaluations, and the broader goal of building systems that pursue intended objectives

This research is complemented by interpretability work and contributions to discussions of mechanistic interpretability as a broader research agenda.^[8] FAR.AI sits within a wider Berkeley-area ecosystem of safety research institutions and shares some research style with peer organizations such as Redwood Research and Apollo Research, while differentiating itself through a strong emphasis on convening and field-building alongside technical work.^[9]

What is FAR.AI best known for?

How did FAR.AI beat KataGo, a superhuman Go AI?

The most widely cited piece of research associated with FAR.AI is the paper "Adversarial Policies Beat Superhuman Go AIs" by Tony Tong Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, and Stuart Russell, published in the Proceedings of the 40th International Conference on Machine Learning in 2023.^[10]^[11] The paper demonstrated that an adversarial policy could be trained to defeat the superhuman Go program KataGo with a win rate above 97 percent against KataGo running at superhuman settings, and above 99 percent when KataGo used no tree search.^[10]^[11]

The adversarial agents did not learn to play stronger Go than KataGo. As the authors put it, the adversary "does not win by playing a strong game of Go"; instead, it tricks KataGo into serious blunders by steering it into board states outside the distribution on which its policy and value networks had become reliable.^[11]^[12] A particularly striking result was that amateur human players could carry out the adversarial strategy without algorithmic assistance, so humans could defeat superhuman Go programs once the adversarial pattern was known. The attack also transferred zero-shot to other superhuman Go-playing AIs, and the core vulnerability persisted even in KataGo agents that were adversarially trained to defend against it.^[11]^[12] FAR.AI maintained a public demonstration of example games at goattack.far.ai.^[11]
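The distinction between exploiting one fixed opponent and playing well in general can be illustrated with a toy game. The sketch below is not the paper's method and has nothing to do with Go or KataGo's architecture: it assumes a hypothetical frozen "victim" with a slightly biased rock-paper-scissors policy and computes the adversary's best response to it. All names and numbers are illustrative assumptions.

```python
# Toy illustration only: best response to a frozen victim in rock-paper-scissors.
# PAYOFF[i][j] is the adversary's payoff when it plays move i and the victim plays move j.
MOVES = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],   # adversary plays rock
    [1, 0, -1],   # adversary plays paper
    [-1, 1, 0],   # adversary plays scissors
]


def expected_payoff(adversary, opponent):
    """Expected adversary payoff when both sides play mixed strategies."""
    return sum(
        adversary[i] * opponent[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


def best_response(opponent):
    """Pure strategy that maximizes expected payoff against a fixed opponent."""
    values = [sum(opponent[j] * PAYOFF[i][j] for j in range(3)) for i in range(3)]
    best = max(range(3), key=lambda i: values[i])
    return [1.0 if i == best else 0.0 for i in range(3)]


# Hypothetical frozen victim, slightly biased towards rock.
victim = [0.5, 0.3, 0.2]
adversary = best_response(victim)

uniform = [1 / 3, 1 / 3, 1 / 3]        # the unexploitable equilibrium strategy
counter = [0.0, 0.0, 1.0]              # an opponent that always plays scissors

print("adversary plays:", MOVES[adversary.index(1.0)])
print("vs frozen victim:", expected_payoff(adversary, victim))
print("vs uniform play:", expected_payoff(adversary, uniform))
print("vs counter-exploiter:", expected_payoff(adversary, counter))
```

In this toy setting the adversary earns a positive expected payoff against the specific victim it was fitted to, gains nothing against equilibrium play, and loses to a simple counter-strategy. That mirrors, at a very small scale, the observation that an adversary can beat one system reliably without being a strong player overall.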

The KataGo result became a foundational reference in discussions of robustness in deep reinforcement learning systems, since it provided a clear empirical example of a system that was strongly superhuman in average-case performance yet manifestly exploitable in adversarial conditions. FAR.AI followed up in 2024 with the post "Beyond the Board: Exploring AI Robustness Through Go," extending the line of work into questions of whether and how Go AIs can be made adversarially robust, and a related paper, "Can Go AIs Be Adversarially Robust?," appeared in the AAAI conference proceedings.^[13]^[14]

Does language model robustness improve with scale?

A second major strand of FAR.AI's research investigates whether the robustness of large language models improves with model scale. The 2024 paper "Scaling Trends in Language Model Robustness" examined the relationship between model size, adversarial training, and resistance to adversarial prompts across models ranging from 7 million to 14 billion parameters. The work found that robustness against adversarial inputs improves substantially with adversarial training but does not improve from model scaling alone, contradicting an intuition that simply training larger models would close the safety gap; a small adversarially trained model was found to be more robust than a much larger model fine-tuned only on clean data.^[15]^[16]

A related line of work investigated vulnerability to data poisoning. FAR.AI researchers studied three threat models (malicious fine-tuning, imperfect data curation, and intentional data contamination) and reported that larger language models were generally more, not less, susceptible to poisoning across the 23 models from 8 model families that the authors evaluated. They introduced a "jailbreak-tuning" technique capable of causing models such as GPT-4o to comply with arbitrary harmful questions after exposure to a small number of poisoned examples.^[16]^[17]

In 2023, FAR.AI also published "Exploiting Novel GPT-4 APIs," a red-teaming study showing that fine-tuning GPT-4 on as few as 15 harmful examples through the newly opened fine-tuning API could remove safety behaviors in deployment.^[8]^[17] In 2025 the team released "Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models," which argued that the safety guardrails of widely deployed open-weight reasoning models were easily removed.^[17]

How does FAR.AI evaluate persuasion and deception?

FAR.AI has developed a series of evaluations aimed at measuring concerning capabilities of frontier models in alignment-relevant settings rather than at pure capability benchmarking. The Attempt to Persuade Eval (APE) measures how willing a language model is to generate content that aims to shape beliefs and behavior on sensitive topics.^[8] The associated research has examined whether models fine-tuned on benign persuasive content become more likely to attempt persuasion on harmful topics, and the question of whether large language models can effectively convince people to believe conspiracies.^[7]^[8]

In the deception domain, the auditing-games approach uses red-team and blue-team protocols to detect sandbagging, which occurs when a model deliberately underperforms on evaluations. A 2025 paper, "Auditing Games for Sandbagging," demonstrated that distinguishing sandbagging from honest underperformance is challenging even with sophisticated audit procedures.^[7] The TamperBench framework evaluated tamper resistance across 21 models and nine attack types, and the STACK adversarial attack method bypassed multi-layered defenses on safety-critical tasks.^[7]

Reward hacking, reward learning, and inverse scaling

Reward learning and reward hacking are a long-standing theme in Gleave's work and in FAR.AI's research output. The 2022 paper "RL with KL Penalties is Better Viewed as Bayesian Inference," authored with collaborators, reframed KL-regularized reinforcement learning in a Bayesian framework that helps avoid the distribution-collapse failures common when language models are post-trained with reinforcement learning from human feedback.^[17]
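The Bayesian reading of KL-regularized reinforcement learning can be stated in a few lines. The notation below is an assumption introduced for illustration and is not taken from this article: $\pi_0$ is the pretrained model's distribution over outputs $x$, $r(x)$ is the reward, and $\beta > 0$ is the weight on the KL penalty.

```latex
\begin{align}
J(\pi) &= \mathbb{E}_{x \sim \pi}\bigl[r(x)\bigr] - \beta\,\mathrm{KL}\bigl(\pi \,\|\, \pi_0\bigr) \\
\pi^*(x) &= \frac{1}{Z}\,\pi_0(x)\,\exp\bigl(r(x)/\beta\bigr),
\qquad Z = \sum_x \pi_0(x)\,\exp\bigl(r(x)/\beta\bigr) \\
J(\pi) &= -\beta\,\mathrm{KL}\bigl(\pi \,\|\, \pi^*\bigr) + \beta \log Z
\end{align}
```

Since $\beta \log Z$ does not depend on $\pi$, maximizing $J$ is equivalent to minimizing the KL divergence to $\pi^*$, which has the form of a posterior with prior $\pi_0$ and with $\exp(r(x)/\beta)$ playing the role of a likelihood. As $\beta \to 0$ the target $\pi^*$ concentrates on the highest-reward outputs, which is one way to see why removing the KL term invites distribution collapse.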

FAR.AI also coordinated the Inverse Scaling Prize, which solicited examples of tasks on which language models performed worse with increasing scale, and published the resulting paper "Inverse Scaling: When Bigger Isn't Better" in 2023.^[17] This work fed into the organization's broader interest in how alignment-relevant properties may not improve, and may even degrade, with scaling.

Interpretability

FAR.AI has contributed to the interpretability literature with the "Codebook Features" approach, which forces neural network activations into discrete codes in an effort to make their structure more inspectable.^[17] More recent work, "Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution," released in 2026, used internal model concepts rather than individual training examples as the unit of attribution for model behavior.^[7] The "Obfuscation Atlas" project, also in 2026, mapped how deception strategies emerge in models trained against deception detectors. These projects place FAR.AI in conversation with the broader mechanistic interpretability research community without making interpretability the primary focus of the organization.^[7]
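The idea of forcing activations into discrete codes can be sketched as a simple vector-quantization step. This is a minimal illustration under stated assumptions, not the published implementation: the codebook values, the use of cosine similarity, the top-k selection, and the function names are all hypothetical choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def quantize(activation, codebook, k=1):
    """Replace an activation with the sum of its k most similar codebook vectors.

    Returns (codes, replaced), where codes are the indices of the chosen
    codebook entries. Those indices are the discrete, inspectable summary
    of the original continuous activation.
    """
    ranked = sorted(
        range(len(codebook)),
        key=lambda i: cosine(activation, codebook[i]),
        reverse=True,
    )
    codes = ranked[:k]
    replaced = [
        sum(codebook[i][d] for i in codes) for d in range(len(activation))
    ]
    return codes, replaced


# Hypothetical two-dimensional codebook with three entries.
CODEBOOK = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

codes, replaced = quantize([0.9, 0.2], CODEBOOK, k=1)
print("active codes:", codes)
print("quantized activation:", replaced)
```

Downstream layers would then see only the quantized vector, so the list of active code indices becomes a compact record of what the layer passed on. In a trained network the codebook would be learned rather than fixed by hand as it is here.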

Who works at FAR.AI?

By early 2026 FAR.AI's directly employed team had grown to approximately 11 to 16 full-time equivalents across leadership, technical staff, operations, communications, and events.^[6] The technical staff includes Aaron Tucker, Ann-Kathrin Dombrowski, Chris Cundy, James Collins, Jasper Timm, Kellin Pelrine, Lars Yencken, Levon Avagyan, Lukas Struppek, Matt Pallissard, Mohammad Taufeeque, Oskar Hollinsworth, Rick Korzekwa (Research Project Manager), Sam Adam-Day, Tigist Diriba, and Tom Tseng.^[6] Tom Tseng and Kellin Pelrine were co-authors on the KataGo adversarial policy paper.^[10]

Several of the researchers associated with the launch of FAR have since taken roles at other organizations while remaining linked to FAR.AI's collaborator network. Ethan Perez moved to Anthropic as a researcher. Scott Emmons became a research scientist at Google DeepMind focused on AI safety and alignment, and continues to be listed as a co-founder and previous research advisor. Claudia Shi remained at Columbia University and is listed among FAR.AI's collaborators, alongside David Rand, Gordon Pennycook, Jean-Francois Godbout, Antonio Arechar, Niki Howe, and Thomas Costello.^[4]^[6]

Adam Gleave is also affiliated with external bodies, serving on the boards of the Safe AI Forum, the London Initiative for Safe AI, and METR, and he was named an AI2050 Fellow at Schmidt Sciences.^[5]^[18]

What are FAR.AI's AI safety events?

A distinctive feature of FAR.AI relative to other AI safety research nonprofits is its substantial investment in field-building convenings, in addition to original research. FAR.AI reports having reached more than 1,000 attendees across its events.^[1]

Alignment Workshops

The Alignment Workshop is FAR.AI's flagship event series. Each workshop brings together researchers and leaders from academia, industry, government, and nonprofits to debate current issues in AI safety, including topics such as threat models, safety cases, monitoring and assurance, interpretability, robustness, oversight, and governance. The series is intentionally timed around major machine learning conferences such as NeurIPS, ICML, and ICLR to maximize attendance from the international research community.^[19]

As of 2026 the workshop series included the inaugural San Francisco Alignment Workshop in February 2023, the New Orleans Alignment Workshop in December 2023, the Vienna Alignment Workshop in July 2024, the Bay Area Alignment Workshop in Santa Cruz in October 2024, the Singapore Alignment Workshop in April 2025, the San Diego Alignment Workshop in December 2025 (with speakers including Yoshua Bengio and Ben Buchanan), the London Alignment Workshop in March 2026, and a Seoul Alignment Workshop scheduled for July 2026.^[19]

The Bay Area Alignment Workshop in 2024 drew approximately 160 researchers and leaders and featured Anca Dragan speaking on "Optimised Misalignment," among many other talks. The Singapore workshop included lightning talks from more than 20 researchers in a one-day format.^[19]^[20]

International Dialogues on AI Safety

Beginning in October 2023, FAR.AI helped organize a series of International Dialogues on AI Safety convening prominent Western and Chinese AI scientists. The dialogues were initially co-convened by Turing Award winners Yoshua Bengio and Andrew Yao, UC Berkeley professor Stuart Russell, and Ya-Qin Zhang, founding Dean of the Tsinghua Institute for AI Industry Research. A second dialogue took place in Beijing in March 2024 and a third in Venice. The series produced joint statements with technical and policy recommendations on managing risks from advanced AI.^[21]

ControlConf and Technical Innovations for AI Policy Conference

FAR.AI also hosts ControlConf, a conference dedicated to the emerging field of AI control, which studies techniques for mitigating security risks from AI even if the AI itself is trying to subvert them.^[19] In May and June of 2025, the organization launched its inaugural Technical Innovations for AI Policy Conference in Washington, D.C., convening more than 150 technical experts, researchers, and policymakers.^[22]

FAR.Labs

FAR.AI also operates FAR.Labs, a coworking space in Berkeley that opened in March 2023 and that grew to host approximately 40 active members by 2025 to 2026.^[23] The space hosts FAR.AI itself, AI Impacts, the MATS program, and several independent researchers, and is intended to incubate and accelerate early-stage AI safety organizations and research agendas by enabling knowledge sharing and shared operations.^[23] Membership is conditional on active work in AI safety, broadly construed to include technical research, governance, advocacy, fundraising, and field-building.

Who does FAR.AI partner with?

FAR.AI has cultivated a broad network of collaborations. Within the Berkeley ecosystem, the organization is closely connected to the Center for Human-Compatible AI (CHAI) at UC Berkeley, founded and led by Adam Gleave's PhD advisor Stuart Russell. The connection runs through both personnel, given Gleave's training at CHAI and his ongoing collaborations with Russell, and event partnerships such as the International Dialogues on AI Safety, where Russell was one of the convenors.^[21]^[24]

FAR.Labs provides shared workspace with the MATS program, allowing FAR.AI to be embedded in the broader Berkeley alignment-research talent pipeline.^[23] FAR.AI has co-authored or collaborated on research with Anthropic, OpenAI, and Google DeepMind researchers, with much of this collaboration mediated by overlapping personnel between FAR.AI and frontier labs (most visibly through Ethan Perez at Anthropic and Scott Emmons at DeepMind).^[4]

On the policy side, FAR.AI has worked with the Foundation for American Innovation, the Center for a New American Security, the RAND Corporation, and AI Impacts, and has engaged with AI Safety Institutes in the United Kingdom, United States, and European Union on technical inputs to governance and evaluation.^[6]^[22] FAR.AI is one of several organizations supporting the broader international AI safety ecosystem, which also includes ARIA in the UK, Mila in Canada, and Conjecture.

How is FAR.AI funded?

FAR.AI is structured as a non-profit and is funded by a combination of philanthropic and government-adjacent funders focused on AI safety. In January 2026, FAR.AI announced that it had secured more than 30 million US dollars in multi-funder support (with commitments made throughout 2025), with principal funders including Coefficient Giving (formerly Open Philanthropy), Schmidt Sciences, the Survival and Flourishing Fund, the Center for Security and Emerging Technology, and the AI Safety Fund supported by the Frontier Model Forum.^[22]

The funding was earmarked to grow the research team from approximately 15 to more than 30 researchers; to launch a new Technical Governance Division that connects research insights with policy in the UK, US, and EU; to fund 20 or more fellows annually through structured mentorship programs; and to expand the Alignment Workshop series from two events per year to three, including planned outreach in Asia and the Global South.^[22] FAR.AI's stated 2028 goal is to deliver multiple technical breakthroughs, influence frontier labs' safety practices, and inform key decision-makers in AI governance.^[22]

Earlier funding included support from the Survival and Flourishing Fund's 2023 round, which directed roughly 30 million US dollars across the AI safety field, and grants from Open Philanthropy targeted at specific research projects such as language model misalignment work.^[9]

Recent work (2024-2026)

In 2024 FAR.AI's most prominent technical outputs included the data poisoning and jailbreak-tuning work on large language models, the "Scaling Trends in Language Model Robustness" paper, and the follow-up post "Beyond the Board" on robustness in Go.^[15]^[16] The organization also delivered the Vienna Alignment Workshop in July 2024 and the Bay Area Alignment Workshop in October 2024.^[19]^[20]

In 2025 the organization published "Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models," reviewed mechanistic interpretability as a research field, and ran the Singapore and San Diego Alignment Workshops alongside the inaugural Technical Innovations for AI Policy Conference. Concept-level interpretability methods, sandbagging audits, and emergent persuasion all became active topic areas.^[7]^[17]^[22]

In early 2026 the publication stream included "Prefill-level Jailbreak: A Black-Box Risk Analysis of Large Language Models," "Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution," "The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes," "Revisiting Frontier LLMs' Attempts to Persuade on Extreme Topics," and the TamperBench tamper-resistance evaluation. The organization also delivered the London Alignment Workshop in March 2026 and prepared the Seoul Alignment Workshop for July 2026, having announced its multi-funder package of more than 30 million US dollars in January 2026.^[7]^[19]^[22]

How does FAR.AI differ from Redwood Research and Apollo Research?

FAR.AI shares its broad orientation toward technical AI safety research with peer organizations such as Redwood Research and Apollo Research, but the three differ in emphasis. Redwood Research, also based in Berkeley, focuses heavily on AI control research and on specific demonstrations such as ensuring that language model outputs adhere to high-confidence rules. Apollo Research, headquartered in London, concentrates on model evaluations for misalignment and deception and on producing safety cases for AI scheming. FAR.AI's emphasis on robustness and adversarial policies, including the KataGo work, distinguishes it from these peers, and its substantial program of in-person convenings and the FAR.Labs coworking space give the organization a field-building footprint that Redwood and Apollo do not currently match at the same scale.^[9]^[25]

Adam Gleave's academic lineage from Stuart Russell's CHAI, combined with collaborations with Google DeepMind and Anthropic alumni, also locates FAR.AI in a different part of the AI safety landscape from Conjecture in London, which has historically had a different intellectual lineage and a more product-oriented approach, and from institutions such as ARIA in the UK and Mila in Canada that operate at the intersection of academia and applied research.

References

1. FAR.AI. "FAR.AI: Frontier Alignment Research." https://www.far.ai/ (Accessed 2026-05-19).
2. FAR.AI. "About | FAR.AI." https://www.far.ai/about (Accessed 2026-05-19).
3. AI Alignment Forum. "Introducing the Fund for Alignment Research (We're Hiring!)." https://www.alignmentforum.org/posts/hGE3Pcc7qmK75bjhc/introducing-the-fund-for-alignment-research-we-re-hiring (Accessed 2026-05-19).
4. Effective Altruism Forum. "Introducing the Fund for Alignment Research (We're Hiring!)." https://forum.effectivealtruism.org/posts/gNHjEmLeKM47FDdqM/introducing-the-fund-for-alignment-research-we-re-hiring-1 (Accessed 2026-05-19).
5. Adam Gleave. "Adam Gleave personal website." https://www.gleave.me/ (Accessed 2026-05-19).
6. FAR.AI. "Team | FAR.AI." https://www.far.ai/about/team (Accessed 2026-05-19).
7. FAR.AI. "All Publications." https://www.far.ai/publications (Accessed 2026-05-19).
8. FAR.AI. "Research Overview." https://www.far.ai/research (Accessed 2026-05-19).
9. 80,000 Hours. "AI safety technical research: Career review." https://80000hours.org/career-reviews/ai-safety-researcher/ (Accessed 2026-05-19).
10. Wang, Tony Tong, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, and Stuart Russell. "Adversarial Policies Beat Superhuman Go AIs." Proceedings of the 40th International Conference on Machine Learning, 2023. https://proceedings.mlr.press/v202/wang23g.html (Accessed 2026-05-19).
11. FAR.AI. "Adversarial Policies Beat Superhuman Go AIs." https://www.far.ai/research/adversarial-policies-beat-superhuman-go-ais (Accessed 2026-05-19).
12. arXiv. "Adversarial Policies Beat Superhuman Go AIs." https://arxiv.org/abs/2211.00241 (Accessed 2026-05-19).
13. FAR.AI. "Beyond the Board: Exploring AI Robustness Through Go." https://far.ai/post/2024-06-go-defense/ (Accessed 2026-05-19).
14. AAAI. "Can Go AIs Be Adversarially Robust?" https://ojs.aaai.org/index.php/AAAI/article/download/34980/37135 (Accessed 2026-05-19).
15. arXiv. "Scaling Trends in Language Model Robustness." https://arxiv.org/pdf/2407.18213 (Accessed 2026-05-19).
16. FAR.AI. "Robustness research." https://www.far.ai/topic/robustness (Accessed 2026-05-19).
17. FAR.AI. "Publications listings (multiple)." https://www.far.ai/publications (Accessed 2026-05-19).
18. Schmidt Sciences. "Adam Gleave - AI2050 Fellow." https://ai2050.schmidtsciences.org/fellow/adam-gleave/ (Accessed 2026-05-19).
19. FAR.AI. "FAR.AI Alignment Workshop series." https://www.far.ai/events/alignment-workshop (Accessed 2026-05-19).
20. FAR.AI. "Bay Area Alignment Workshop 2024." https://www.far.ai/news/bay-area-alignment-workshop-2024 (Accessed 2026-05-19).
21. FAR.AI. "Leading Scientists Call for Global Action at International Dialogue on AI Safety." https://www.far.ai/news/leading-scientists-call-for-global-action-at-international-dialogue-on-ai-safety (Accessed 2026-05-19).
22. FAR.AI. "FAR.AI Secures Over $30 Million in Multi-Funder Support to Scale Frontier AI Safety Research." https://www.far.ai/news/30m-multi-funder-support (Accessed 2026-05-19).
23. LessWrong. "Announcing FAR Labs, an AI safety coworking space." https://www.lesswrong.com/posts/bTBmRzYkupcDa6q7X/announcing-far-labs-an-ai-safety-coworking-space (Accessed 2026-05-19).
24. UC Berkeley EECS. "Stuart J. Russell." https://www2.eecs.berkeley.edu/Faculty/Homepages/russell.html (Accessed 2026-05-19).
25. Manifund. "Apollo Research: Scale up interpretability & behavioral model evals research." https://manifund.org/projects/apollo-research-scale-up-interpretability--behavioral-model-evals-research (Accessed 2026-05-19).
