ICML 2024: Open-Domain Text Evaluation via Contrastive Distribution Methods

Related Works

InsNet-v2: The GPT Moment for Insertion-based Language models
ICML 2024: NADOv2: Improved Training and Low-Rank Adaptation of Neurally-Decomposed Oracles for Controlling Language Models
ICML 2024: Open-Domain Text Evaluation via Contrastive Distribution Methods
NeurIPS 2022: Controllable Text Generation with Neurally-Decomposed Oracle
NeurIPS 2022: InsNet: An Efficient, Flexible, and Performant Insertion-based Text Generation Model
ICML 2019: Neurally-Guided Structure Inference
ICML 2019: CoT: Cooperative Training for Generative Modeling of Discrete Data
IJCAI-2018: Neural Text Generation: Past, Present and Beyond
SIGIR-2018: Texygen: A Benchmarking Platform for Text Generation Models
AAAI-2018: Long Text Generation via Adversarial Training with Leaked Information


Abstract

Open-domain text generation, driven by large pre-trained language models (LLMs), has recently achieved remarkable performance. However, assessing these models' generation quality remains a challenge. In this paper, we introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods (CDM). Leveraging the observation that LLM performance improves as the number of model parameters grows, CDM creates a mapping from the contrast of two probability distributions (one known to be superior to the other) to quality measures. We investigate CDM for open-domain text generation evaluation under two paradigms: 1) Generative CDM, which harnesses the contrast of two language models' distributions to generate synthetic examples for training discriminator-based metrics; 2) Discriminative CDM, which directly uses distribution disparities between two language models for evaluation. Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate that CDM correlates better with human judgment than existing automatic evaluation metrics, highlighting the strong performance and generalizability of our approach.
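To make the Discriminative CDM idea more concrete, the following is a minimal sketch of one way a contrast between a stronger and a weaker language model could be turned into a score. The abstract does not give the exact formula, so the mean per-token log-probability gap used here is an assumption for illustration only; the function name `contrast_score` and the numeric inputs are hypothetical, and in practice the log-probabilities would come from two real language models of different sizes.

```python
from typing import Sequence


def contrast_score(
    strong_logprobs: Sequence[float],
    weak_logprobs: Sequence[float],
) -> float:
    """Mean per-token log-probability gap (stronger model minus weaker model).

    Assumption: a larger gap means the stronger model prefers the text more
    than the weaker one does, which is read here as a sign of higher quality.
    This is an illustrative stand-in, not the paper's exact metric.
    """
    if len(strong_logprobs) != len(weak_logprobs):
        raise ValueError("both models must score the same token sequence")
    if not strong_logprobs:
        raise ValueError("cannot score an empty sequence")
    gaps = [s - w for s, w in zip(strong_logprobs, weak_logprobs)]
    return sum(gaps) / len(gaps)


# Hypothetical per-token log-probabilities for two candidate responses.
coherent = contrast_score([-1.0, -0.5, -0.75], [-2.0, -1.5, -1.25])
incoherent = contrast_score([-3.0, -2.5], [-2.5, -2.0])

print(f"coherent response score:   {coherent:.3f}")
print(f"incoherent response score: {incoherent:.3f}")
```

Averaging over tokens keeps scores comparable across responses of different lengths; a sum, or a learned mapping from the gap to a quality measure, would be equally plausible choices under the same assumption.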

Paper Link

TBD

Status

Accepted to ICML 2024