Yuanzhe (Richard) Pang

Hello. I am a research scientist on the Llama research team at Meta, where I work on language model post-training (e.g., alignment). I am currently located in New York.

I completed my Ph.D. in computer science at the Courant Institute of Mathematical Sciences at New York University. At NYU, I was a member of the Machine Learning for Language (ML²) group (a subset of the CILVR group), and I was advised by Profs. He He and Kyunghyun Cho. In recent years, my research has focused on natural language processing and machine learning; specifically, my interests include text generation (learning from reward, learning from feedback, machine translation, dialogue, decoding, etc.), reasoning, and human-machine collaboration in general. I have also been learning more about multi-modal NLP.

Prior to my current position, from October 2022 to July 2024, I was (on-and-off) a Visiting Researcher on the Fundamental AI Research (FAIR) team at Meta, mentored by Dr. Jason Weston. I also spent two summers (2020, 2021) at Google. Prior to my Ph.D. studies, I graduated from the University of Chicago (B.S. in mathematics and B.S. in computer science). At Toyota Technological Institute at Chicago (TTIC) and the University of Chicago, I worked on text generation and structured prediction with Prof. Kevin Gimpel. I also hope to advance access to cutting-edge NLP: I co-organized and co-instructed the NYU AI School, mainly targeting NYC-area undergrads (2022, 2023), and I co-instructed the Deep Learning for NLP course at the African Masters of Machine Intelligence (2022).

[google scholar] [dblp] [linkedin] [mail] [twitter] [abbreviations]

Research

OVERVIEW OF SELECTED RESEARCH DIRECTIONS

Learning from rewards in text generation: GOLD (offline RL), amortized noisy channel NMT (off-policy RL & knowledge distillation), reward gaming (on-policy RL), AgreeSum (on-policy RL for multi-doc summarization), ENGINE for non-autoregressive NMT ("soft" knowledge distillation), implicit feedback in dialogue (extracting implicit reward from deployment data), self-rewarding LLM, iterative reasoning preference optimization (IRPO), etc.
Reasoning: PrOntoQA-OOD (deductive reasoning); IRPO above is also related to reasoning
Human-machine collaboration (benchmarks and methods): QuALITY (long-document QA; related: SQuALITY long-document summarization), GPQA (graduate-level Google-proof QA); I'm working on developing methods as well

Publications and preprints (2024-)

Iterative Reasoning Preference Optimization
Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston
To appear in Proceedings of NeurIPS 2024
[paper] [abstract] [bibtex] | by others: [notable use cases (ESM3, Llama 3)]

Self-Taught Evaluators
Tianlu Wang, Ilia Kulikov, Olga Golovneva, Ping Yu, Weizhe Yuan, Jane Dwivedi-Yu, Richard Yuanzhe Pang, Maryam Fazel-Zarandi, Jason Weston, Xian Li
Preprint, August 2024
[paper] [abstract] [code] [model] [synthetic data] [bibtex]

Self-Rewarding Language Models
Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston
In Proceedings of ICML 2024
[paper] [abstract] [bibtex] | by others: [press and other mentions]

GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
In Proceedings of Conference on Language Modeling (COLM) 2024; spotlight (2% of submissions)
[paper] [abstract] [data & code] [openreview] [bibtex] | by others: [press, mentions, and notable use cases]

An Introduction to Vision-Language Modeling
Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, Pietro Astolfi, Reyhane Askari Hemmat, Jun Chen, Kushal Tirumala, Rim Assouel, Mazda Moayeri, Arjang Talattof, Kamalika Chaudhuri, Zechun Liu, Xilun Chen, Quentin Garrido, Karen Ullrich, Aishwarya Agrawal, Kate Saenko, Asli Celikyilmaz, Vikas Chandra
Preprint, May 2024
[paper] [abstract] [bibtex]

Leveraging Implicit Feedback from Deployment Data in Dialogue
Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston
In Proceedings of EACL 2024
[paper] [abstract] [bibtex]

Publications (2021-2023) — main focus: text generation (learning from rewards, RL), long-document understanding (question answering, summarization), reasoning

Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Seyed Mehran Kazemi, Najoung Kim*, He He*
In Proceedings of NeurIPS 2023
[paper] [abstract] [poster at ICML 2023 Knowledge and Logical Reasoning Workshop] [bibtex]

Extrapolative Controlled Sequence Generation via Iterative Refinement
Vishakh Padmakumar, Richard Yuanzhe Pang, He He, Ankur P. Parikh
In Proceedings of ICML 2023
[paper] [abstract] [bibtex]

Reward Gaming in Conditional Text Generation
Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P. Parikh, He He
In Proceedings of ACL 2023
[paper] [abstract] [15-min talk] [slides] [bibtex]

What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman
In Proceedings of ACL 2023
[paper] [abstract] [website] [bibtex] | by others: [press]

SQuALITY: Building a Long-Document Summarization Dataset the Hard Way
Alex Wang, Richard Yuanzhe Pang, Angelica Chen, Jason Phang, Samuel R. Bowman
In Proceedings of EMNLP 2022
[paper] [abstract] [data] [code] [bibtex] | by others: [zeroscrolls]

Amortized Noisy Channel Neural Machine Translation
Richard Yuanzhe Pang, He He, Kyunghyun Cho
In Proceedings of INLG 2022; best presentation award
tl;dr: amortizing the inference cost of "beam search and rerank" – learning to rerank without explicitly reranking
[paper] [abstract] [talk] [poster] [bibtex]

QuALITY: Question Answering with Long Input Texts, Yes!
Richard Yuanzhe Pang*, Alicia Parrish*, Nitish Joshi*, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, Samuel R. Bowman
In Proceedings of NAACL 2022
[paper] [abstract] [data] [code] [leaderboard] [15-min live talk] [slides] [bibtex] | by others: [tfds] [forecast] [press mention by Science] [scrolls] [zeroscrolls]

Token Dropping for Efficient BERT Pretraining
Le Hou*, Richard Yuanzhe Pang*, Tianyi Zhou, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou
In Proceedings of ACL 2022
[paper] [abstract] [code] [talk] [bibtex] | by others: [press] [improvement]

AgreeSum: Agreement-Oriented Multi-Document Summarization
Richard Yuanzhe Pang*, Adam D. Lelkes*, Vinh Q. Tran*, Cong Yu
In Findings of ACL 2021
[paper] [abstract] [data] [short talk] [bibtex]

Comparing Test Sets with Item Response Theory
Clara Vania*, Phu Mon Htut*, William Huang*, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman
In Proceedings of ACL 2021
[paper] [abstract] [code] [bibtex]

Text Generation by Learning from Demonstrations
Richard Yuanzhe Pang, He He
In Proceedings of ICLR 2021
tl;dr: a high-precision-generation training objective to address the train/test objective mismatch and history mismatch
[paper] [abstract] [openreview] [poster] [slides] [code] [discussion] [bibtex] | by others: [ICLR blog by other authors] [GOLD in AlphaCode, Science] [GOLD as the main learning objective in AlphaCode 2, Dec 2023]

Publications (2019-2020) — main focus: text generation (textual style transfer, non-autoregressive translation, decoding), energy-based networks in NLP

Improving Joint Training of Inference Networks and Structured Prediction Energy Networks
Lifu Tu, Richard Yuanzhe Pang, Kevin Gimpel
In Proceedings of EMNLP 2020 Workshop on Structured Prediction for NLP (SPNLP); spotlight paper
tl;dr: improving fast approximate+amortized inference for energy-based models in NLP structured prediction
[paper] [abstract] [my slides] [bibtex]

Consistency of a Recurrent Language Model With Respect to Incomplete Decoding
Sean Welleck*, Ilia Kulikov*, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho
In Proceedings of EMNLP 2020
Also appearing in the non-archival DeepMath 2020
[paper] [abstract] [code] [bibtex]

ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation
Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman, Kevin Gimpel
In Proceedings of ACL 2020
tl;dr: a "soft" form of knowledge distillation for non-autoregressive MT
[paper] [abstract] [code] [bibtex]

Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?
Yada Pruksachatkun*, Jason Phang*, Haokun Liu*, Phu Mon Htut*, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman
In Proceedings of ACL 2020
[paper] [abstract] [bibtex]

Unsupervised Evaluation Metrics and Learning Criteria for Non-Parallel Textual Transfer
Richard Yuanzhe Pang, Kevin Gimpel
In Proceedings of EMNLP 2019 Workshop on Neural Generation and Translation (WNGT)
tl;dr: proposing more dimensions for textual transfer evaluation metrics, and losses that target them
[paper] [supplementals] [abstract] [poster] [bibtex]

The Daunting Task of Real-World Textual Style Transfer Auto-Evaluation
Richard Yuanzhe Pang
Extended abstract in EMNLP 2019 Workshop on Neural Generation and Translation (WNGT); abstract in Proceedings of the Workshop on Noisy User-generated Text (W-NUT)
tl;dr: an opinion piece arguing that research on textual style transfer and its evaluation is going astray
[paper] [abstract] [poster] [bibtex]

More info: [google scholar] [semantic scholar] [dblp] [abbreviations] [twitter/x]

Other writings

Learning from Rewards in Text Generation [pdf] [table of contents]
Ph.D. dissertation
September 2024

Discussion of GOLD [pdf] [integrated in dissertation page ~55]
June 2022
tl;dr: GOLD does not maximize the expected reward. It maximizes the expected reward of training examples only.

More research activities

As an area chair / action editor

ACL 2023 (summarization), ACL Rolling Review (02,04,06/2024; areas: generation, language modeling, QA, dialogue, resource & eval)

As a reviewer / program committee member

Top ML/NLP venues: AAAI (2023), ACL Rolling Review (10,11/2021; 01,02,12/2022; 02,06,08,10,12/2023), ACL (2021), Conference on Language Modeling (COLM; 2024), EMNLP (2021, 2022), ICLR (2022, 2023, 2024), ICLR blog post track (2022), ICML (2022, 2023, 2024), NeurIPS (2021 — top 8% reviewer, 2022, 2023 — top 10% reviewer), Transactions on Machine Learning Research (TMLR; 2022, 2023, 2024) [abbreviations]
Workshops: Novel Ideas in Learning-to-Learn through Interaction (NILLI at EMNLP 2021 and EMNLP 2022), Efficient Benchmarking in NLP (NLP Power at ACL 2022), Interactive Learning for NLP (InterNLP at NeurIPS 2022), Multilingual Representation Learning (MRL at EMNLP 2023), Mathematical and Empirical Understanding of Foundation Models (ME-FoMo at ICLR 2024)
Other events: Mid-Atlantic Student Colloquium on Speech, Language, and Learning (2022, 2023), Inverse Scaling Prize (2023; github, report)

Teaching

External

May 2022, Teaching Assistant / Lab Instructor (virtual), African Masters of Machine Intelligence (course: Deep Learning for NLP by Prof. Kyunghyun Cho and Prof. Duygu Ataman) [AMMI site]

At New York University

May 2023, Co-organizer, NYU AI School 2023 (in-person) [site]
Spring 2022, Section Leader / Teaching Assistant (in-person), DS-GA 1012 / LING-GA 1012: Natural Language Understanding and Computational Semantics (Bowman; graduate-level) [syllabus]
January 2022, Co-instructor / Co-organizer, NYU AI School 2022 (virtual) [site]
Fall 2020, Section Leader (in-person), DS-GA 1008: Deep Learning (Cho, LeCun; graduate-level) [syllabus] [Cho's QA blog post]

At the University of Chicago

Spring 2017, Course Assistant, MATH 15910: Intro to Proofs in Analysis
Winter 2017, Course Assistant, MATH 15910: Intro to Proofs in Analysis [sol]
Winter 2017, Grader, CMSC 15200: Intro to Computer Science II
Autumn 2016, Teaching Assistant, MATH 15300: Calculus III

Presentations

Selected presentations

Talk on learning from rewards in text generation & long-context benchmarks; Nvidia; January 2024
Talk on Leveraging Implicit Feedback from Deployment Data in Dialogue; Meta reading group; August 2023
Talk on QuALITY: Question Answering with Long Input Texts, Yes! and SQuALITY: Building a Long-Document Summarization Dataset the Hard Way; Meta AI reading group in New York; October 2022
Talk titled QuALITY: Question Answering with Long Input Texts, Yes!; NAACL 2022 in Seattle; July 2022 [live talk]
Talk on QuALITY: Question Answering with Long Input Texts, Yes! in NYU's undergraduate course LING-UA 52 / DS-UA 203 ML for Language Understanding; March 2022
Talk on RL in text generation and Text Generation by Learning from Demonstrations in NYU's graduate course DS/LING-GA 1012 Natural Language Understanding; March 2022
Talk on question answering data collection (QuALITY, SQuALITY, etc.); Apple; December 2021
Talk titled Text Generation by Learning from Demonstrations; Samsung workshop; June 2021 [based on this slide deck]
Talk on structured prediction (specifically, inference networks and structured prediction energy networks); Bank of New York Mellon; September 2020 [based on this slide deck]
Talk titled Text Generation by Offline RL; Google Research New York; July 2020
Poster presentation on Learning Criteria and Evaluation Metrics for Textual Transfer between Non-Parallel Corpora; NAACL 2019 NeuralGen workshop in Minneapolis; June 2019
Talk titled Learning Approximate Inference Networks and Structured Prediction Energy Networks; Midwest Speech and Language Days (MSLD) 2019 in Chicago; May 2019
Poster presentation on Learning Criteria and Evaluation Metrics for Textual Transfer between Non-Parallel Corpora; UChicago STEM Research Symposium in Chicago; October 2018

Other conference presentations with associated proceeding papers

Please email for full CV.

News

June 2024: I defended my Ph.D. dissertation titled Learning from Rewards in Text Generation! The committee members are Drs. He He, Kyunghyun Cho, Jason Weston, Mengye Ren, and Ankur P. Parikh. Thank you, committee members, collaborators, and other colleagues!

Last updated: September 25, 2024. Contact: My NYU office is at 60 5th Ave. Get in touch at yzpang at _ dot edu (where _ is nyu)!