Hello! I’m Arthur Conmy. I’m a Research Engineer at Google DeepMind, on the Language Model Interpretability team. I work on making interpretability useful, and at the moment I’m also thinking about thinking models and unfaithful Chain-of-Thought.

I also try to read things sometimes (sadly not updated in some time).

Research

The summary below is highly outdated; much more recent work, from GDM and elsewhere, is on my Google Scholar.

  1. Copy Suppression: Comprehensively Understanding an Attention Head

    Callum McDougall*, Arthur Conmy*, Cody Rushing*, Thomas McGrath, Neel Nanda (* denotes equal contribution). arXiv preprint arXiv:2310.04625 (2023).

  2. Towards Automated Circuit Discovery for Mechanistic Interpretability

    Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso. NeurIPS 2023 (Spotlight). arXiv:2304.14997 (2023).

  3. Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

    Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt. ICLR 2023. arXiv:2211.00593 (2022).

  4. StyleGAN-induced data-driven regularization for inverse problems

    Arthur Conmy, Subhadip Mukherjee, Carola-Bibiane Schönlieb. ICASSP 2022. arXiv:2110.03814 (2021).

Experience

2023 - present: Research Engineer at Google DeepMind, on the Language Model Interpretability team.

2023: SERI MATS and independent research.

2022 - 2023: Redwood Research.

2021: Software Engineering Intern at Meta.

2019 - 2022: Undergraduate Mathematics at Trinity College, Cambridge. Upper first class honours.

Outside the Tower of London, July 2021.

For the future: 0ee063d506d9319ca159f53a7dd3879e65465e28926a02a35f9c6348ec00f1bf