About
Hello! I’m Arthur Conmy. I’m a Research Engineer at Google DeepMind, on the Language Model Interpretability team. My research interests are AI interpretability, safety, and alignment, in that order.
I try and read things sometimes too.
Research
For an up-to-date summary, see my Google Scholar.
- Copy Suppression: Comprehensively Understanding an Attention Head
  Callum McDougall*, Arthur Conmy*, Cody Rushing*, Thomas McGrath, Neel Nanda (* denotes equal contribution)
  arXiv preprint arXiv:2310.04625 (2023)
- Towards Automated Circuit Discovery for Mechanistic Interpretability
  Arthur Conmy, A. N. Mavor-Parker, A. Lynch, S. Heimersheim, A. Garriga-Alonso
  NeurIPS 2023 (Spotlight), arXiv:2304.14997 (2023)
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  K. Wang, A. Variengien, Arthur Conmy, B. Shlegeris, J. Steinhardt
  Proceedings of ICLR 2023, arXiv:2211.00593
- StyleGAN-induced data-driven regularization for inverse problems
  Arthur Conmy, S. Mukherjee, C. B. Schönlieb
  Proceedings of ICASSP 2022, arXiv:2111.14215
Experience
2023 - 2023: SERI MATS, ‘Independent’ research, Conjecture internship
2022 - 2023: Redwood Research
2021 - 2021: Meta, Software Engineering Intern
2019 - 2022: Trinity College, Cambridge. Undergraduate Mathematics, upper first class honours.
Outside the Tower of London, July 2021.
For the future: 0ee063d506d9319ca159f53a7dd3879e65465e28926a02a35f9c6348ec00f1bf