Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Editors: Mingxuan Wang, Imed Zitouni
Anthology ID: 2023.emnlp-industry
Month: December
Year: 2023
Address: Singapore
Venue: EMNLP
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2023.emnlp-industry/
PDF: https://aclanthology.org/2023.emnlp-industry.pdf

BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis
Tingfeng Cao | Chengyu Wang | Bingyan Liu | Ziheng Wu | Jinhui Zhu | Jun Huang
Recently, diffusion-based deep generative models (e.g., Stable Diffusion) have shown impressive results in text-to-image synthesis. However, current text-to-image models often require multiple passes of prompt engineering by humans to produce satisfactory results for real-world applications. We propose BeautifulPrompt, a deep generative model that produces high-quality prompts from very simple raw descriptions, enabling diffusion-based models to generate more beautiful images. In our work, we first fine-tune the BeautifulPrompt model on collected pairs of low-quality and high-quality prompts. Then, to ensure that our generated prompts lead to more beautiful images, we further propose a Reinforcement Learning with Visual AI Feedback technique that fine-tunes the model to maximize the reward values of the generated prompts, where the reward values are calculated from PickScore and aesthetic scores. Our results demonstrate that learning from visual AI feedback can significantly improve the quality of generated prompts and images. We further showcase the integration of BeautifulPrompt into a cloud-native AI platform to provide a better text-to-image generation service in the cloud.

Enhancing Language Model with Unit Test Techniques for Efficient Regular Expression Generation
Chenhui Mao | Xiexiong Lin | Xin Jin | Xin Zhang
Recent research has investigated the use of generative language models to produce regular expressions with semantic-based approaches. However, these approaches have shown shortcomings in practical applications, particularly in terms of functional correctness: the ability to reproduce the function intended by the user. To address this issue, we present a novel method called Unit-Test Driven Reinforcement Learning (UTD-RL). Our approach differs from previous methods by taking into account the crucial aspect of functional correctness and transforming it into differentiable gradient feedback using policy gradient techniques. Functional correctness is evaluated through unit tests, a testing method that ensures a regular expression meets its design and performs as intended. Experiments conducted on three public datasets demonstrate the effectiveness of the proposed method in generating regular expressions. The method has been employed in a regulatory scenario where regular expressions are used to ensure that all online content is free from non-compliant elements, thereby significantly reducing the workload of the relevant personnel.

A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models
Takuma Udagawa | Aashka Trivedi | Michele Merler | Bishwaranjan Bhattacharjee
Large language models have become a vital component in modern NLP, achieving state-of-the-art performance in a variety of tasks. However, they are often inefficient for real-world deployment due to their expensive inference costs. Knowledge distillation is a promising technique to improve their efficiency while retaining most of their effectiveness. In this paper, we reproduce, compare and analyze several representative methods for task-agnostic (general-purpose) distillation of Transformer language models. The methods we study include Output Distribution (OD) transfer, Hidden State (HS) transfer with various layer mapping strategies, and Multi-Head Attention (MHA) transfer based on MiniLMv2. Through extensive experiments, we study the effectiveness of each method for various student architectures in both monolingual (English) and multilingual settings. Overall, we show that MHA transfer based on MiniLMv2 is generally the best option for distillation and explain the potential reasons behind its success. Moreover, we show that HS transfer remains a competitive baseline, especially under a sophisticated layer mapping strategy, while OD transfer consistently lags behind the other approaches. Findings from this study helped us deploy efficient yet effective student models for latency-critical applications.

Towards Effective Automatic Debt Collection with Persona Awareness
Tong Zhang | Junhong Liu | Chen Huang | Jia Liu | Hongru Liang | Zujie Wen | Wenqiang Lei
Understanding debtor personas is crucial for collectors to empathize with debtors and develop more effective collection strategies. In this paper, we take the first step towards comprehensively investigating the significance of debtor personas and present a successful commercial practice with automatic debt collection agents. Specifically, we organize the debtor personas into a taxonomy and construct a persona-aware conversation dataset. Building upon it, we implement a simple yet effective persona-aware agent called PAD. After two months of online testing, PAD increased the recovery rate by 3.31% and collected an additional ~100K RMB. Our commercial practice brings inspiration to the debt collection industry by providing an effective automatic solution.

Gatekeeper to Save COGS and Improve Efficiency of Text Prediction
Nidhi Tiwari | Sneha Kola | Milos Milunovic | Si-qing Chen | Marjan Slavkovski
The text prediction (TP) workflow calls a large language model (LLM) after almost every character to get the subsequent sequence of characters, until the user accepts a suggestion. The confidence score of the prediction is commonly used to filter the results so that only correct predictions are shown to the user. As LLMs require massive amounts of computation and storage, such an approach incurs high network and execution costs. We therefore propose a model gatekeeper (GK) that stops, at the client application level, the LLM calls that would result in incorrect predictions. In this way, the GK saves the cost of model inference and improves the user experience by not showing incorrect predictions. We demonstrate that use of a model gatekeeper saved approximately 46.6% of COGS for TP, at the cost of an approximately 4.5% loss in character savings. Use of the GK also improved the efficiency (suggestion rate) of the TP model by 73%.

Efficient Transformer Knowledge Distillation: A Performance Review
Nathan Brown | Ashton Williamson | Tahj Anderson | Logan Lawrence
As pretrained transformer language models continue to achieve state-of-the-art performance, the Natural Language Processing community has pushed for advances in model compression and efficient attention mechanisms to address high computational requirements and limited input sequence lengths. Despite these separate efforts, no investigation has been done into the intersection of the two fields. In this work, we provide an evaluation of model compression via knowledge distillation on efficient attention transformers. We provide cost-performance trade-offs for the compression of state-of-the-art efficient attention architectures and the gains made in performance in comparison to their full attention counterparts. Furthermore, we introduce a new long-context Named Entity Recognition dataset, GONERD, to train and test the performance of NER models on long sequences. We find that distilled efficient attention transformers can preserve a significant amount of original model performance: up to 98.6% across short-context tasks (GLUE, SQuAD, CoNLL-2003), up to 94.6% across long-context question-answering tasks (HotpotQA, TriviaQA), and up to 98.8% on long-context Named Entity Recognition (GONERD), while decreasing inference times by up to 57.8%. We find that, for most models on most tasks, performing knowledge distillation is an effective method to yield high-performing efficient attention models with low costs.

CDD: A Large Scale Dataset for Legal Intelligence Research
Changzhen Ji | Yating Zhang | Adam Jatowt | Haipang Wu
As an important application of Artificial Intelligence, legal intelligence has recently attracted the attention of many researchers. Previous works investigated diverse issues like predicting crimes, predicting outcomes of judicial debates, or extracting information/knowledge from various kinds of legal documents. Although many advances have been made, research on supporting the prediction of court judgments remains relatively scarce, and the lack of large-scale data resources limits the development of this line of research. In this paper, we present a novel, large-scale Court Debate Dataset (CDD), which includes 30,481 court cases, totaling 1,144,425 utterances. CDD contains real-world conversations involving judges, plaintiffs and defendants in court trials. To construct this dataset, we invited experienced judges to design appropriate labels for the data records. We then asked law school students to provide annotations based on the defined labels. The dataset can be applied to several downstream tasks, such as text summarization, dialogue generation, and text classification. We introduce the details of the different tasks in the rapidly developing field of legal intelligence, the research on which can be fostered thanks to our dataset, and we provide the corresponding benchmark performance.

MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning
Noé Tits
In this paper, we present a methodology for linguistic feature extraction, focusing particularly on automatically syllabifying words in multiple languages, designed to be compatible with a forced-alignment tool, the Montreal Forced Aligner (MFA). In both the textual and phonetic domains, our method focuses on the extraction of phonetic transcriptions from text, stress marks, and a unified automatic syllabification (in text and phonetic domains). The system was built with open-source components and resources. Through an ablation study, we demonstrate the efficacy of our approach in automatically syllabifying words from several languages (English, French and Spanish). Additionally, we apply the technique to the transcriptions of the CMU ARCTIC dataset, generating valuable annotations available online (https://github.com/noetits/MUST_P-SRL) that are ideal for speech representation learning, speech unit discovery, and disentanglement of speech factors in several speech-related fields.

Personalized Dense Retrieval on Global Index for Voice-enabled Conversational Systems
Masha Belyi | Charlotte Dzialo | Chaitanya Dwivedi | Prajit Muppidi | Kanna Shimizu
Voice-controlled AI dialogue systems are susceptible to noise from phonetic variations and failure to resolve ambiguous entities. Typically, personalized entity resolution (ER) and/or query rewrites (QR) are deployed to recover from these error modes. Previous work in this field achieves personalization by constraining the retrieval search space to personalized indices built from a user's historical interactions with the device. While constrained retrieval achieves high precision, predictions are limited to entities in recent user history, which offers low coverage of future requests. Further, maintaining individual indices for millions of users is memory intensive and difficult to scale. In this work, we propose a personalized entity retrieval system that is robust to phonetic noise and ambiguity but is not limited to a personalized index. We achieve this by embedding user listening preferences into a contextual query embedding used in retrieval. We demonstrate our model's ability to correct multiple error modes and show a 91% improvement over the baseline on the entity retrieval task. Finally, we optimize the end-to-end approach to fit within online latency constraints while maintaining gains in performance.

Text2Topic: Multi-Label Text Classification System for Efficient Topic Detection in User Generated Content with Zero-Shot Capabilities
Fengjun Wang | Moran Beladev | Ofri Kleinfeld | Elina Frayerman | Tal Shachar | Eran Fainman | Karen Lastmann Assaraf | Sarai Mizrachi | Benjamin Wang
Multi-label text classification is a critical task in industry, helping to extract structured information from large amounts of textual data. We propose Text to Topic (Text2Topic), which achieves high multi-label classification performance by employing a bi-encoder Transformer architecture that utilizes concatenation, subtraction, and multiplication of embeddings on both text and topic. Text2Topic also supports zero-shot predictions, produces domain-specific text embeddings, and enables production-scale batch inference with high throughput. The final model achieves accurate and comprehensive results compared to state-of-the-art baselines, including large language models (LLMs). In this study, a total of 239 topics are defined, and around 1.6 million annotated text-topic pairs (of which 200K are positive) are collected on approximately 120K texts from 3 main data sources on Booking.com. The data is collected with optimized smart sampling and partial labeling. The final Text2Topic model is deployed on a real-world stream processing platform, and it outperforms other models, with a 92.9% micro mAP and a 75.8% macro mAP score. We summarize the modeling choices, which are extensively tested through ablation studies, and share detailed in-production decision-making steps.

Deep Metric Learning to Hierarchically Rank - An Application in Product Retrieval
Kee Kiat Koo | Ashutosh Joshi | Nishaanth Reddy | Karim Bouyarmane | Ismail Tutar | Vaclav Petricek | Changhe Yuan
Most e-commerce search engines use customer behavior signals to augment lexical matching and improve search relevance. Many e-commerce companies such as Amazon, Alibaba and eBay operate in multiple countries with country-specific stores. However, customer behavior data is sparse in newer stores. To compensate for the sparsity of behavioral data in low-traffic stores, search engines often use cross-listed products in some form. However, cross-listing across stores is not uniform and in many cases is itself sparse. In this paper, we develop a model to identify duplicate and near-duplicate products across stores. Such a model can be used to unify product catalogs worldwide, improve product meta-data or as in our
Executive Summary
The Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track presents cutting-edge research in natural language processing (NLP) with a focus on industry applications. Two notable papers are highlighted: one introducing BeautifulPrompt, a deep generative model for automatic prompt engineering in text-to-image synthesis, and another proposing Unit-Test Driven Reinforcement Learning (UTD-RL) for generating functionally correct regular expressions. Both papers use reinforcement learning driven by automated feedback signals (visual AI feedback in the first case, unit-test results in the second) to enhance the performance and practicality of NLP models.
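As a rough illustration of the visual-AI-feedback reward described for BeautifulPrompt, the sketch below blends two image-quality signals into a single scalar and uses it to rank candidate prompts. The 0-1 normalization, the equal default weighting, and all function names are assumptions for illustration; the paper only states that rewards are computed from PickScore and aesthetic scores.

```python
def combined_reward(pick_score: float, aesthetic_score: float,
                    pick_weight: float = 0.5) -> float:
    """Blend two image-quality signals into one scalar reward.

    Both inputs are assumed to be pre-normalized to [0, 1]; the actual
    scoring models and weighting used by BeautifulPrompt may differ.
    """
    return pick_weight * pick_score + (1.0 - pick_weight) * aesthetic_score


def best_prompt(scored_prompts: dict[str, tuple[float, float]]) -> str:
    """Pick the candidate prompt whose (pick_score, aesthetic_score)
    pair yields the highest combined reward."""
    return max(scored_prompts,
               key=lambda p: combined_reward(*scored_prompts[p]))
```

In the paper's RL loop, this scalar would score prompts sampled from the generator rather than rank a fixed candidate set.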
Key Points
- ▸ Introduction of BeautifulPrompt for automatic prompt engineering in text-to-image synthesis.
- ▸ Proposal of UTD-RL for generating functionally correct regular expressions.
- ▸ Use of reinforcement learning and visual AI feedback to improve model performance.
- ▸ Integration of models into cloud-native AI platforms for practical applications.
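The unit-test-driven reward at the core of UTD-RL, mentioned above, can be sketched as a pass-rate over positive and negative examples. The function name, signature, and test format here are illustrative assumptions, not the paper's actual interface.

```python
import re


def regex_reward(pattern: str, should_match: list[str],
                 should_not_match: list[str]) -> float:
    """Fraction of unit tests a candidate regex passes.

    A test passes when `pattern` fully matches a positive example or
    fails to match a negative one. An uncompilable pattern scores 0.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0
    passed = sum(1 for s in should_match if compiled.fullmatch(s))
    passed += sum(1 for s in should_not_match if not compiled.fullmatch(s))
    total = len(should_match) + len(should_not_match)
    return passed / total if total else 0.0
```

A reward of this shape can then drive a policy-gradient update on the generating language model, as the paper describes.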
Merits
Innovative Techniques
Both papers introduce novel methods that address significant challenges in NLP, such as the need for automatic prompt engineering and the generation of functionally correct regular expressions.
Practical Applications
The research presented has clear applications in real-world scenarios, particularly in enhancing the quality of text-to-image synthesis and improving the accuracy of regular expressions.
Advanced Methodologies
The use of reinforcement learning and visual AI feedback demonstrates a sophisticated approach to solving complex problems in NLP.
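Both lines of work rely on policy-gradient updates driven by an automated reward. A toy REINFORCE step on a categorical policy, shown below, illustrates the core mechanic; it is a generic sketch under simplified assumptions, not either paper's implementation.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_step(logits, rewards, lr=0.5, rng=random):
    """One REINFORCE update on a toy categorical policy.

    `rewards[i]` is the (assumed known) reward of action i; we sample
    an action, then nudge the logits along reward * grad(log pi(a)).
    """
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    r = rewards[action]
    # d/d(logit_j) of log pi(action) is 1[j == action] - probs[j]
    new_logits = [l + lr * r * ((1.0 if j == action else 0.0) - probs[j])
                  for j, l in enumerate(logits)]
    return new_logits, action
```

Iterating this step concentrates probability mass on higher-reward actions; in the papers, the "actions" are token sequences (prompts or regexes) and the reward comes from visual scorers or unit tests.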
Demerits
Limited Scope
The focus on specific applications may limit the generalizability of the findings to other areas of NLP.
Complexity
The advanced techniques employed may require significant computational resources and expertise, potentially limiting their immediate adoption.
Data Dependency
The effectiveness of the models is highly dependent on the quality and diversity of the training data, which may not always be readily available.
Expert Commentary
The Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track showcases the ongoing advancements in NLP research, particularly in text-to-image synthesis and regular expression generation. The introduction of BeautifulPrompt and UTD-RL represents a significant step forward in addressing the challenges of automatic prompt engineering and functional correctness. Reinforcement learning guided by automated feedback, whether visual scores or unit-test results, proves an effective way to improve model performance, with clear implications for practical deployment. However, the complexity and data dependency of these methods may pose challenges to their immediate adoption. Overall, the research presented in these proceedings highlights the potential of advanced NLP techniques to enhance the quality and reliability of AI applications, paving the way for further innovation in the field.
Recommendations
- ✓ Further research should explore the generalizability of the proposed methods to other areas of NLP.
- ✓ Efforts should be made to simplify the implementation of these advanced techniques to facilitate their adoption in industry settings.
- ✓ The development of standardized datasets and benchmarks for evaluating the performance of NLP models would be beneficial for future research.