Publications
ReplicatorBench: Benchmarking LLM Agents for Replicability in Social and Behavioral Sciences
Authors: Bang Nguyen, Dominik Soós, Qian Ma, Rochana R. Obadage, Zack Ranjan, Sai Koneru, Timothy M. Errington, Shakhlo Nematova, Sarah Rajtmajer, Jian Wu, Meng Jiang
Rerunning code is easy, but replicating a scientific claim with new data is where AI agents currently hit a wall. Our new benchmark, ReplicatorBench, exposes the gap between an agent's ability to run experiments and its ability to actually retrieve the resources necessary for real-world scientific validation.
QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation
Authors: Bang Nguyen, Tingting Du, Mengxia Yu, Lawrence Angrave, Meng Jiang
Many tools generate quiz questions but don't check whether those questions are actually good for learning. This paper introduces a way to assess question quality using simulated students, improving how we evaluate educational questions.
Reference-based Metrics Disprove Themselves in Question Generation
Authors: Bang Nguyen, Mengxia Yu, Yun Huang, Meng Jiang
A new human-written reference can differ more from the original reference than the generated text does! You need a better metric, and we have it.
Embedding Mental Health Discourse for Community Recommendation
Authors: Hy Dang*, Bang Nguyen*, Noah Ziems, Meng Jiang
People seek support online, but with so many communities, where do they go? We model both how people communicate within a community (via discourse embeddings) and which communities similar users prefer (via collaborative filtering). Our system combines both signals to recommend the right mental health space for each user.
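To make the idea of combining the two signals concrete, here is a minimal sketch, not the paper's implementation. It assumes each user and community already has an embedding vector, and that collaborative filtering is approximated by the share of similar users who belong to each community. The function names, the `alpha` weight, and the toy data are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(user_vec, community_vecs, neighbor_memberships, alpha=0.5):
    """Rank communities by a weighted mix of two scores.

    content score: similarity between the user's embedding and the community's embedding.
    cf score: fraction of similar users (neighbors) who belong to the community.
    alpha is an assumed mixing weight, not a value from the paper.
    """
    n = len(neighbor_memberships)
    scores = {}
    for name, vec in community_vecs.items():
        content = cosine(user_vec, vec)
        cf = sum(name in m for m in neighbor_memberships) / n if n else 0.0
        scores[name] = alpha * content + (1 - alpha) * cf
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up data for illustration only.
community_vecs = {"anxiety": [1.0, 0.0], "sleep": [0.0, 1.0], "general": [1.0, 1.0]}
user_vec = [1.0, 0.0]
neighbor_memberships = [{"anxiety", "general"}, {"general"}, {"sleep"}]

ranking = recommend(user_vec, community_vecs, neighbor_memberships, alpha=0.5)
print(ranking)
```

Setting `alpha=1.0` ranks by embedding similarity alone, while `alpha=0.0` ranks purely by what similar users prefer; values in between trade off the two signals.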