Publications

You can also find my articles on my Google Scholar profile.

Under Review 2026

ReplicatorBench: Benchmarking LLM Agents for Replicability in Social and Behavioral Sciences

Authors: Bang Nguyen, Dominik Soós, Qian Ma, Rochana R. Obadage, Zack Ranjan, Sai Koneru, Timothy M. Errington, Shakhlo Nematova, Sarah Rajtmajer, Jian Wu, Meng Jiang

Rerunning code is easy, but replicating a scientific claim with new data is where AI agents currently hit a wall. Our new benchmark, ReplicatorBench, exposes the gap between an agent's ability to run experiments and its ability to actually retrieve the resources necessary for real-world scientific validation.

View Paper

ACL Main 2025

QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation

Authors: Bang Nguyen, Tingting Du, Mengxia Yu, Lawrence Angrave, Meng Jiang

Many tools generate quiz questions, but don’t check if they’re actually good for learning. This paper introduces a new way to test question quality using simulated students, improving how we evaluate educational questions.

View Paper

EMNLP Findings 2024

Reference-based Metrics Disprove Themselves in Question Generation

Authors: Bang Nguyen, Mengxia Yu, Yun Huang, Meng Jiang

A second human-written reference can differ more from the original reference than the generated text does! You need a better metric, and we have it.

View Paper

CODI 2023

Embedding Mental Health Discourse for Community Recommendation

Authors: Hy Dang*, Bang Nguyen*, Noah Ziems, Meng Jiang

People seek support online, but with so many communities, where do they go? We model both how people communicate within a community (via discourse embeddings) and what communities similar users prefer (via collaborative filtering). Our system combines both to recommend the right mental health space for each user.

View Paper

Bang Nguyen (Nguyễn Văn Bàng)
