Sidi Lu

01 Apr 2024

InsNet-v2: The GPT Moment for Insertion-based Language models

Abstract

TBD

Paper Link

TBD

Status

Under Submission

Details
01 Mar 2024

ICML 2024: NADOv2: Improved Training and Low-Rank Adaptation of Neurally-Decomposed Oracles for Controlling Language Models

Abstract

NeurAlly-Decomposed Oracle (NADO) is a powerful approach for controllable generation with large language models. It is designed to avoid catastrophic forgetting while achieving guaranteed convergence to an entropy-maximized closed-form optimal solution with reasonable modeling capacity. Despite existing success, several challenges arise when NADO is applied to less ideal scenarios. Vanilla NADO suffers from gradient vanishing for low-probability control signals and is highly reliant on the forward-consistency regularization. In addition, the vanilla implementation of NADO, which introduces a few additional transformer layers, suffers from limited capacity, especially compared to other fine-tuning-based model adaptation methods like LoRA. In this paper, we present an improved version of the NADO algorithm, namely NADOv2, in both parameterization and training process. We discuss how this improved version can significantly improve the effectiveness of the algorithm and allow NADO to be combined with LoRA, achieving better model capacity and algorithmic flexibility. Experimental results on the lexically constrained generation task CommonGen justify the significance of the improvements.

Paper Link

TBD

Status

Accepted to ICML 2024

Details
01 Dec 2023

ICML 2024: Open-Domain Text Evaluation via Contrastive Distribution Methods

Abstract

Recent advancements in open-domain text generation, driven by the power of large pre-trained language models (LLMs), have demonstrated remarkable performance. However, assessing these models’ generation quality remains a challenge. In this paper, we introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods (CDM). Leveraging the connection between increasing model parameters and enhanced LLM performance, CDM creates a mapping from the contrast of two probabilistic distributions – one known to be superior to the other – to quality measures. We investigate CDM for open-domain text generation evaluation under two paradigms: 1) Generative CDM, which harnesses the contrast of two language models’ distributions to generate synthetic examples for training discriminator-based metrics; 2) Discriminative CDM, which directly uses distribution disparities between two language models for evaluation. Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate that CDM correlates better with human judgment than existing automatic evaluation metrics, highlighting the strong performance and generalizability of our approach.
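
As a rough illustration of the Discriminative CDM idea, the sketch below scores a text by the gap between its token log-probabilities under a stronger and a weaker language model. This is only one plausible reading of "distribution disparities": the function name `contrast_score`, the use of a summed log-likelihood ratio, and the toy numbers are assumptions made here, not details taken from the paper.

```python
def contrast_score(logprobs_strong, logprobs_weak):
    """Sum of per-token log-probability gaps between two models.

    Both arguments are lists of log p(token | prefix) for the same token
    sequence, one list per model. A higher score means the stronger model
    prefers the text more than the weaker model does.
    (Hypothetical helper; the scoring rule is an assumption.)
    """
    if len(logprobs_strong) != len(logprobs_weak):
        raise ValueError("both models must score the same token sequence")
    return sum(s - w for s, w in zip(logprobs_strong, logprobs_weak))


# Toy per-token log-probabilities for a two-token text (made-up values).
score = contrast_score([-1.0, -2.0], [-1.5, -3.0])
print(score)
```

In practice the two lists would come from scoring the same tokenized text with two real language models of different sizes.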

Paper Link

TBD

Status

Accepted to ICML 2024

Details
01 Mar 2022

NeurIPS 2022: Controllable Text Generation with Neurally-Decomposed Oracle

Abstract

We propose a general and efficient framework to control auto-regressive generation models with NeurAlly-Decomposed Oracle (NADO). Given a pre-trained base language model and a sequence-level boolean oracle function, we aim to decompose the oracle function into token-level guidance to steer the base model in text generation. Specifically, the token-level guidance is provided by NADO, a neural model trained with examples sampled from the base model, demanding no additional auxiliary labeled data. Based on posterior regularization, we present the closed-form optimal solution to incorporate the decomposed token-level guidance into the base model for controllable generation. We further discuss how the neural approximation affects the quality of the solution. Experiments conducted on two different applications, (1) text generation with lexical constraints and (2) machine translation with formality control, demonstrate that our framework efficiently guides the base model towards the given oracle while keeping high generation quality.
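
To make "token-level guidance" concrete, here is a minimal sketch of how a base model's next-token distribution could be reweighted by a per-token guidance value and then renormalized. The function name `reweight_next_token`, the dictionary representation, and the multiply-then-normalize rule are assumptions for illustration; the abstract does not spell out the exact form of the closed-form solution.

```python
def reweight_next_token(base_probs, guidance):
    """Reweight a next-token distribution by token-level guidance.

    base_probs: dict mapping token -> p(token | prefix) under the base model.
    guidance:   dict mapping token -> estimated probability that the
                sequence-level oracle is eventually satisfied if this
                token is appended (hypothetical output of the guidance model).
    Returns a normalized dict over the same tokens.
    """
    unnormalized = {tok: p * guidance[tok] for tok, p in base_probs.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("guidance assigns zero mass to every token")
    return {tok: v / total for tok, v in unnormalized.items()}


# Toy example: the base model is indifferent, the guidance favors "a".
steered = reweight_next_token({"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1})
print(steered)
```

The base model itself is left untouched in this sketch; only its output distribution is modified at decoding time.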

Paper Link

https://proceedings.neurips.cc/paper_files/paper/2022/hash/b40d5797756800c97f3d525c2e4c8357-Abstract-Conference.html

Status

Accepted (Oral/Spotlight)

Details
21 Feb 2021

NeurIPS 2022: InsNet: An Efficient, Flexible, and Performant Insertion-based Text Generation Model

Abstract

We propose InsNet, an expressive insertion-based text generator with efficient training and flexible decoding (parallel or sequential). Unlike most existing insertion-based text generation works that require re-encoding of the (decoding) context after each insertion operation and thus are inefficient to train, InsNet only requires one pass of context encoding for the entire insertion sequence during training by using a novel insertion-oriented position encoding to enable computation sharing. Furthermore, InsNet provides a controllable switch between parallel and sequential decoding, making it flexible to handle more parallelizable tasks such as machine translation to support efficient decoding, or less parallelizable tasks such as lexically constrained text generation to guarantee high-quality outputs. Experiments on two unsupervised lexically constrained text generation datasets and three machine translation datasets demonstrate InsNet’s advantages over previous insertion-based methods in terms of training speed, inference efficiency, and generation quality.

Paper Link

https://proceedings.neurips.cc/paper_files/paper/2022/hash/2e32d3a10985fc94c7e11ee6ea165cca-Abstract-Conference.html

Status

Accepted (Poster)

Details
24 May 2019

ICML 2019: Neurally-Guided Structure Inference

Abstract

Most structure inference methods either rely on exhaustive search or are purely data-driven. Exhaustive search robustly infers the structure of arbitrarily complex data, but it is slow. Data-driven methods allow efficient inference, but do not generalize when test data have more complex structures than training data. In this paper, we propose a hybrid inference algorithm, the Neurally-Guided Structure Inference (NG-SI), keeping the advantages of both search-based and data-driven methods. The key idea of NG-SI is to use a neural network to guide the hierarchical, layer-wise search over the compositional space of structures. We evaluate our algorithm on two representative structure inference tasks: probabilistic matrix decomposition and symbolic program parsing. It outperforms data-driven and search-based alternatives on both tasks.

Paper Link

http://proceedings.mlr.press/v97/lu19b/lu19b.pdf

Status

Accepted (Short Oral)

Details
24 May 2019

ICML 2019: CoT: Cooperative Training for Generative Modeling of Discrete Data

Abstract

In this paper, we study the generative models of sequential discrete data. To tackle the exposure bias problem inherent in maximum likelihood estimation (MLE), generative adversarial networks (GANs) are introduced to penalize the unrealistic generated samples. To exploit the supervision signal from the discriminator, most previous models leverage REINFORCE to address the non-differentiable problem of sequential discrete data. However, because of the unstable property of the training signal during the dynamic process of adversarial training, the effectiveness of REINFORCE, in this case, is hardly guaranteed. To deal with such a problem, we propose a novel approach called Cooperative Training (CoT) to improve the training of sequence generative models. CoT transforms the min-max game of GANs into a joint maximization framework and manages to explicitly estimate and optimize Jensen-Shannon divergence. Moreover, CoT works without the necessity of pre-training via MLE, which is crucial to the success of previous methods. In the experiments, compared to existing state-of-the-art methods, CoT shows superior or at least competitive performance on sample quality, diversity, as well as training stability.
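
For reference, the Jensen-Shannon divergence mentioned above can be computed exactly for small discrete distributions. The sketch below uses the standard definition, JSD(P, G) = 0.5 KL(P || M) + 0.5 KL(G || M) with M = (P + G) / 2. The function names are hypothetical, and the code works on explicit probability lists rather than reproducing any estimation procedure from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as lists.

    Terms with p_i == 0 contribute zero by convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, g):
    """Jensen-Shannon divergence between two discrete distributions."""
    # Mixture of the two distributions; never zero where p or g is nonzero.
    m = [(pi + gi) / 2 for pi, gi in zip(p, g)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(g, m)


# Identical distributions have zero divergence; disjoint ones reach log(2).
print(js_divergence([0.5, 0.5], [0.5, 0.5]))
print(js_divergence([1.0, 0.0], [0.0, 1.0]))
```

Unlike the KL divergence, this quantity is symmetric in its two arguments and bounded above by log 2.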

Paper Link

http://proceedings.mlr.press/v97/lu19d/lu19d.pdf

Status

Accepted (Short Oral)

Details
10 Mar 2018

IJCAI-2018: Neural Text Generation: Past, Present and Beyond

Abstract

We introduce Texygen, a benchmarking platform to support research on open-domain text generation models. Texygen has not only implemented a majority of text generation models, but also covered a set of metrics that evaluate the diversity, the quality and the consistency of the generated texts. The Texygen platform could help standardize the research on text generation and facilitate the sharing of fine-tuned open-source implementations among researchers for their work. As a consequence, this would help in improving the reproducibility and reliability of future research work in text generation.

Arxiv Link

https://arxiv.org/abs/1803.07133

Status

Rejected

Details
30 Jan 2018

SIGIR-2018: Texygen: A Benchmarking Platform for Text Generation Models

Abstract

We introduce Texygen, a benchmarking platform to support research on open-domain text generation models. Texygen has not only implemented a majority of text generation models, but also covered a set of metrics that evaluate the diversity, the quality and the consistency of the generated texts. The Texygen platform could help standardize the research on text generation and facilitate the sharing of fine-tuned open-source implementations among researchers for their work. As a consequence, this would help in improving the reproducibility and reliability of future research work in text generation.

Arxiv Link

https://arxiv.org/abs/1802.01886

Status

Accepted as a Conference Short Paper

Details
30 Sep 2017

AAAI-2018: Long Text Generation via Adversarial Training with Leaked Information

Abstract

Automatically generating coherent and semantically meaningful text has many applications in machine translation, dialogue systems, image captioning, etc. Recently, by combining with policy gradient, Generative Adversarial Nets (GAN) that use a discriminative model to guide the training of the generative model as a reinforcement learning policy have shown promising results in text generation. However, the scalar guiding signal is only available after the entire text has been generated and lacks intermediate information about text structure during the generative process. As such, it limits its success when the generated text samples are long (more than 20 words). In this paper, we propose a new framework, called LeakGAN, to address the problem for long text generation. We allow the discriminative net to leak its own high-level extracted features to the generative net to further help the guidance. The generator incorporates such informative signals into all generation steps through an additional Manager module, which takes the extracted features of current generated words and outputs a latent vector to guide the Worker module for next-word generation. Our extensive experiments on synthetic data and various real-world tasks with a Turing test demonstrate that LeakGAN is highly effective in long text generation and also improves the performance in short text generation scenarios. More importantly, without any supervision, LeakGAN is able to implicitly learn sentence structures only through the interaction between Manager and Worker.

Arxiv Link

https://arxiv.org/abs/1709.08624

Status

Accepted as a Conference Poster Paper

Details
