Arthur Conmy

I'm a 25-year-old AI researcher based in London. My main motivation is making the future of AI go well. In the past, I did foundational early work in mechanistic interpretability, but since then I have worked more on post-training.

I am joining Anthropic to work on aligning upcoming models as they are trained.

Announcement
Research Scholar
Next role: Member of Technical Staff, Anthropic
Focus: Alignment during training
Previous: Google DeepMind, 2023-2026

Member of Technical Staff, Anthropic

I will work on aligning upcoming models as they are trained.

What I mean

Triaging signs of misalignment in training, then looking for root-cause fixes rather than whack-a-mole patches.

For a public example of the direction, see Anthropic's Teaching Claude Why.

Mentorship

I have been a MATS mentor since MATS 6.0. MATS is the main way I mentor people; if you want to work with me, apply through MATS.

MATS usually runs winter and summer programs, so applications may not always be open.

Past work

2023-2026

Senior Research Engineer, Google DeepMind

Worked on post-training for Gemini and on interpretability tools that are closer to production model work: probes, reward-model bias discovery, reasoning behavior, Gemma Scope, sparse autoencoders, model diffing, and steering.

2022-2023

Early mechanistic interpretability

Worked on circuits and automated circuit discovery, including IOI in GPT-2 Small and ACDC, before the later wave of large-scale SAE work.

Earlier

Redwood, Meta, Cambridge

Redwood Research in 2022-2023; Meta software engineering internship in 2021; mathematics at Trinity College, Cambridge, upper first class honours.