Category: AI Safety

89 articles

80,000 Hours

AI Existential Risk, Career Advice, Effective Altruism

AI 2027

AI forecasting, AI scenarios

AI Alignment

Ethics, Machine Learning

AI Parasite

Artificial intelligence terms, Terms

AI Safety Institutes

AI Governance, AI Regulation, Artificial Intelligence

AI Safety Summit

Artificial Intelligence

AI bias

AI ethics, Artificial Intelligence

AI consciousness

AI Ethics, Artificial Intelligence, Philosophy

AI deception

Artificial Intelligence

AI ethics

AI ethics, Artificial Intelligence

AI governance

AI Regulation, Artificial Intelligence, Policy

AI regulation

AI ethics, Artificial Intelligence

AI safety

AI ethics, Artificial Intelligence

AI watermarking

Artificial Intelligence

AdvBench

AI Benchmarks, Adversarial Attacks, Large Language Models

Adversarial attack

Adversarial ML

Algorithmic fairness

AI Ethics, AI Fairness, Machine Learning

Alignment Research Center

AI Organizations, AI Research

Anthropic

AI Companies, Artificial Intelligence, Large Language Models

Apollo Research

AI Alignment, AI Companies, AI Research

Artificial General Intelligence

Artificial Intelligence

BBQ (Bias Benchmark for QA)

AI Benchmarks, Bias and Fairness, Natural Language Processing

Backdooring LLMs

AI security, Artificial intelligence terms, Terms

Bletchley Declaration

AI Governance, AI Policy

California Senate Bill 53

AI Governance, AI Policy, United States

Cari Tuna

Effective Altruism, People

Center for AI Safety

AI Organizations, AI Research

Confirmation Bias

AI Ethics, Cognitive Bias, Data Science

Constitutional AI

Alignment, Anthropic

Constitutional Classifiers

AI Alignment, AI Techniques, Anthropic

Dan Hendrycks

AI Researchers, People

Dario Amodei

AI Researchers, Anthropic

Data poisoning

Machine Learning

EU AI Act

AI ethics, Artificial Intelligence

Effective Altruism

AI Ethics, AI Existential Risk, Ethics

Eliezer Yudkowsky

People

Emergent abilities

Large Language Models, Machine Learning

Executive Order on AI

Artificial Intelligence

Existential risk from AI

Artificial Intelligence

Existential risk from AI

AI Ethics, Artificial Intelligence, Existential Risk

Frontier Model Forum

AI Alignment, AI Industry, AI Standards

Frontier models

AI Policy, Artificial Intelligence, Large Language Models

Future of Humanity Institute (FHI)

Effective Altruism, Existential Risk, Research Institutions

GPQA Diamond

AI Benchmarks, Natural Language Processing

Grounding (artificial intelligence)

Artificial Intelligence, Large Language Models, Natural Language Processing

Guardrails (AI)

Large Language Models

Hallucination

Machine Learning, Natural Language Processing

HaluEval

AI Benchmarks, Hallucination, Large Language Models

HarmBench

AI Benchmarks, Large Language Models, Red Teaming

Holden Karnofsky

Effective Altruism, People

Humanity's Last Exam

AI Benchmarks, AI Evaluations

Ilya Sutskever

AI Researchers, Deep Learning, OpenAI

International AI Safety Report

AI Governance, AI Policy

Jailbreak (artificial intelligence)

Artificial Intelligence, Large Language Models

JailbreakBench

AI Benchmarks, Large Language Models, Red Teaming

LessWrong

AI Existential Risk, Online Communities, Rationality

MACHIAVELLI (benchmark)

AI Alignment, AI Benchmarks, AI Ethics

METR (Model Evaluation and Threat Research)

AI Evaluations, Nonprofits

Machine Intelligence Research Institute

AI Organizations, AI Research

Mechanistic interpretability

Interpretability