Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina (Editors)

Anthology ID: 2024.emnlp-industry
Month: November
Year: 2024
Address: Miami, Florida, US
Venue: EMNLP
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2024.emnlp-industry/
DOI: 10.18653/v1/2024.emnlp-industry
PDF: https://aclanthology.org/2024.emnlp-industry.pdf

Optimizing Entity Resolution in Voice Interfaces: An ASR-Aware Entity Reference Expansion Approach
Jiangning Chen | Ziyun Zhang | Qianli Hu
This paper tackles the challenges presented by Automatic Speech Recognition (ASR) errors in voice-based dialog systems, specifically their adverse impact on Entity Resolution (ER) as a downstream task. Balancing accuracy against the speed requirements of online retrieval proves challenging, particularly when limited data links failed mentions to resolved entities. We propose an entity reference expansion system that injects pairs of failed mentions and resolved entity names into the knowledge graph, enhancing its awareness of unresolved mentions. To address data scarcity, we introduce a synthetic data generation approach aligned with observed noise patterns. This, combined with an ASR-Error-Aware loss function, facilitates the training of a RoBERTa model that filters failed mentions and extracts entity pairs for knowledge graph expansion. These designs confront obstacles related to ASR noise, data limitations, and online entity retrieval.

Two-tiered Encoder-based Hallucination Detection for Retrieval-Augmented Generation in the Wild
Ilana Zimmerman | Jadin Tredup | Ethan Selfridge | Joseph Bradley
Detecting hallucinations, where Large Language Models (LLMs) are not factually consistent with a Knowledge Base (KB), is a challenge for Retrieval-Augmented Generation (RAG) systems. Current solutions rely on public datasets to develop prompts or fine-tune a Natural Language Inference (NLI) model. However, these approaches are not focused on developing an enterprise RAG system: they do not consider latency, do not train or evaluate on production data, and do not handle non-verifiable statements such as small talk or questions. To address this, we leverage the customer service conversation data of four large brands to evaluate existing solutions and propose a set of small encoder models trained on a new dataset. We find the proposed models outperform existing methods and highlight the value of combining a small amount of in-domain data with public datasets.
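At the core of this kind of two-tiered detector is an entailment check between retrieved KB passages and generated statements. Below is a minimal sketch of that check, assuming the public roberta-large-mnli checkpoint as a stand-in for the authors' in-domain encoders; the threshold and model choice are illustrative, not the paper's configuration.

```python
# Minimal sketch of encoder-based consistency checking for RAG output.
# "roberta-large-mnli" is a public stand-in for the paper's in-domain
# encoders; the 0.5 threshold is an illustrative assumption.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def is_supported(kb_passage: str, statement: str, threshold: float = 0.5) -> bool:
    """Return True if the KB passage entails the generated statement."""
    inputs = tokenizer(kb_passage, statement, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze()
    # Label order for roberta-large-mnli: 0=contradiction, 1=neutral, 2=entailment
    return probs[2].item() >= threshold
```

A first tier would route non-verifiable inputs such as small talk and questions away from this check entirely, which is where much of the latency saving in such a design comes from.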
The Program Testing Ability of Large Language Models for Code
Weimin Xiong | Yiwen Guo | Hao Chen
Recent development of large language models (LLMs) for code, such as CodeX and CodeT5+, shows promise for code intelligence. Their ability to synthesize programs for pre-defined algorithmic coding tasks has been intensively tested and verified on datasets including HumanEval and MBPP. Yet, evaluating these LLMs from more perspectives than program synthesis alone is also warranted, given their broad scope of applications. In this paper, we explore their ability to generate test cases automatically. We present intriguing observations and show how the quality of their generated test cases can be improved. Following recent work that uses generated test cases to enhance program synthesis, we further leverage our findings to improve the quality of the synthesized programs, showing +11.77% and +4.22% higher code pass rates on HumanEval+ compared with the GPT-3.5-turbo baseline and the recent state of the art, respectively. Our code is publicly available at https://github.com/asdasxzxcq/TestCaseGen.

Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization
Lei Xu | Mohammed Asad Karim | Saket Dingliwal | Aparna Elangovan
Large language models (LLMs) can generate fluent summaries across domains using prompting techniques, reducing the effort required to build summarization applications. However, crafting effective prompts that guide LLMs to generate summaries with the appropriate level of detail and writing style remains a challenge. In this paper, we explore the use of salient information extracted from the source document to enhance summarization prompts. We show that adding keyphrases to prompts can improve ROUGE F1 and recall, making the generated summaries more similar to the reference and more complete. The number of keyphrases controls the precision-recall trade-off. Furthermore, our analysis reveals that incorporating phrase-level salient information is superior to word- or sentence-level information. However, the impact on summary faithfulness is not universally positive across LLMs. To enable this approach, we introduce the Keyphrase Signal Extractor (SigExt), a lightweight model that can be fine-tuned to extract salient keyphrases. Using SigExt, we achieve consistent ROUGE improvements across datasets and LLMs without any LLM customization. Our findings provide insights into leveraging salient information in building prompt-based summarization systems.
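At inference time, this approach amounts to extracting keyphrases and splicing them into the summarization prompt. The sketch below uses a naive frequency-based extractor as a placeholder for the fine-tuned SigExt model; the template wording is an assumption, not the authors' exact prompt.

```python
import re
from collections import Counter

def extract_keyphrases(document: str, k: int = 8) -> list[str]:
    # Naive frequency-based stand-in for the fine-tuned SigExt extractor.
    words = re.findall(r"[a-zA-Z]{4,}", document.lower())
    return [w for w, _ in Counter(words).most_common(k)]

def build_summary_prompt(document: str, keyphrases: list[str]) -> str:
    # Template wording is illustrative, not the authors' exact prompt.
    return (
        "Summarize the following document. Ensure the summary covers these "
        f"key phrases: {', '.join(keyphrases)}.\n\n"
        f"Document:\n{document}\n\nSummary:"
    )
```

Raising or lowering k moves along the precision-recall trade-off described in the abstract: fewer phrases favor precision, more phrases favor recall.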
Predicting Entity Salience in Extremely Short Documents
Benjamin Bullough | Harrison Lundberg | Chen Hu | Weihang Xiao
A frequent challenge in applications that use entities extracted from text documents is selecting the most salient entities when only a small number can be used by the application (e.g., displayed to a user). Solving this challenge is particularly difficult for extremely short documents, such as the response from a digital assistant, where traditional signals of salience such as position and frequency are less useful. In this paper, we propose a lightweight and data-efficient approach for entity salience detection on short text documents. Our experiments show that our approach achieves competitive performance with respect to complex state-of-the-art models, such as GPT-4, with a significant advantage in latency and cost. In limited-data settings, we show that a semi-supervised fine-tuning process can improve performance further. Furthermore, we introduce a novel human-labeled dataset for evaluating entity salience on short question-answer pair documents.

Don't Shoot The Breeze: Topic Continuity Model Using Nonlinear Naive Bayes With Attention
Shu-Ting Pi | Pradeep Bagavan | Yejia Li | Disha | Qun Liu
Utilizing Large Language Models (LLMs) as chatbots in diverse business scenarios often presents the challenge of maintaining topic continuity. Abrupt topic shifts can lead to poor user experiences and inefficient use of computational resources. In this paper, we present a topic continuity model that assesses whether a response aligns with the initial conversation topic. Our model is built by expanding the corresponding natural language understanding (NLU) model into quantifiable terms using a Naive Bayes approach. We then introduce an attention mechanism and a logarithmic nonlinearity to enhance its capability to capture topic continuity. This approach allows us to convert the NLU model into an interpretable analytical formula. In contrast to many NLU models constrained by token limits, our proposed model can seamlessly handle conversations of any length with linear time complexity. Furthermore, the attention mechanism significantly improves the model's ability to identify topic continuity in complex conversations. In our experiments, the model consistently outperforms traditional methods, particularly on lengthy and intricate conversations. This capability offers an opportunity to ensure the responsible and interpretable use of LLMs.
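The abstract describes an interpretable analytical formula: a Naive Bayes topic score with a logarithmic nonlinearity and attention weights. One plausible instantiation, written as a hypothetical attention-weighted log-likelihood ratio (the paper's exact parameterization may differ):

```python
import math

def topic_continuity_score(tokens, p_on_topic, p_off_topic, attention):
    """Hypothetical attention-weighted log-likelihood-ratio score: one
    plausible reading of a Naive Bayes topic model with a logarithmic
    nonlinearity. Positive scores favor topic continuity.

    p_on_topic / p_off_topic map token -> probability under each class;
    attention maps token -> weight (e.g., emphasizing content words).
    """
    eps = 1e-9  # smoothing so unseen tokens do not produce log(0)
    return sum(
        attention.get(tok, 1.0)
        * math.log(p_on_topic.get(tok, eps) / p_off_topic.get(tok, eps))
        for tok in tokens
    )
```

Because the score is a single weighted sum over tokens, it runs in linear time and has no token limit, matching the properties claimed in the abstract.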
Retrieval Augmented Spelling Correction for E-Commerce Applications
Xuan Guo | Rohit Patki | Dante Everaert | Christopher Potts
The rapid introduction of new brand names into everyday language poses a unique challenge for e-commerce spelling correction services, which must distinguish genuine misspellings from novel brand names that use unconventional spelling. We address this challenge via Retrieval Augmented Generation (RAG): product names are retrieved from a catalog and incorporated into the context used by a large language model (LLM) that has been fine-tuned to do contextual spelling correction. Through quantitative evaluation and qualitative error analyses, we find that the RAG framework improves spelling correction beyond a stand-alone LLM. We also demonstrate the value of additional fine-tuning of the LLM to incorporate retrieved context.

Scaling Parameter-Constrained Language Models with Quality Data
Ernie Chang | Matteo Paltenghi | Yang Li | Pin-Jie Lin | Changsheng Zhao | Patrick Huber | Zechun Liu | Rastislav Rabatin | Yangyang Shi | Vikas Chandra
Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization. In this paper, we extend the conventional understanding of scaling laws by offering a microscopic view of data quality within the original formulation, via effective training tokens, which we posit to be a critical determinant of performance for parameter-constrained language models. Specifically, we formulate effective training tokens as a combination of two readily computed indicators of text: (i) text diversity and (ii) syntheticity as measured by a teacher model. We pretrained over 200 models of 25M to 1.5B parameters on a diverse set of sampled, synthetic data and estimated the constants that relate text quality, model size, training tokens, and accuracy scores on eight reasoning tasks. We demonstrate that the estimated constants yield a +0.83 Pearson correlation with true accuracies, and we analyze the formulation in scenarios involving widely used data techniques, such as data sampling and synthesis, that aim to improve data quality.
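The abstract defines effective training tokens as a combination of text diversity and teacher-measured syntheticity but does not give the functional form here. The sketch below is a schematic placeholder showing how such a quality-adjusted token count could enter a scaling-law fit; the multiplicative form and the exponents are assumptions, not the paper's fitted constants.

```python
def effective_tokens(raw_tokens: float, diversity: float, syntheticity: float,
                     a: float = 1.0, b: float = 1.0) -> float:
    """Schematic quality-adjusted token count. The multiplicative form and
    exponents a, b are placeholder assumptions; the paper fits its own
    constants relating quality, model size, tokens, and task accuracy.

    diversity in (0, 1] rewards varied text; syntheticity in [0, 1) is a
    teacher model's estimate, penalizing synthetic-looking text.
    """
    return raw_tokens * (diversity ** a) * ((1.0 - syntheticity) ** b)

# Example: 1B raw tokens of fairly diverse, partly synthetic text.
d_eff = effective_tokens(1e9, diversity=0.8, syntheticity=0.3)
```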
INDUS: Effective and Efficient Language Models for Scientific Applications
Bishwaranjan Bhattacharjee | Aashka Trivedi | Masayasu Muraoka | Muthukumaran Ramasubramanian | Takuma Udagawa | Iksha Gurung | Nishan Pantha | Rong Zhang | Bharath Dandala | Rahul Ramachandran | Manil Maskey | Kaylin Bugbee | Michael M. Little | Elizabeth Fancher | Irina Gerasimov | Armin Mehrabian | Lauren Sanders | Sylvain V. Costes | Sergi Blanco-Cuaresma | Kelly Lockhart | Thomas Allen | Felix Grezes | Megan Ansdell | Alberto Accomazzi | Yousef El-Kurdi | Davis Wertheimer | Birgit Pfitzmann | Cesar Berrospi Ramis | Michele Dolfi | Rafael Teixeira De Lima | Panagiotis Vagenas | S. Karthik Mukkavilli | Peter W. J. Staar | Sanaz Vahidinia | Ryan McGranaghan | Tsengdar J. Lee
Large language models (LLMs) trained on general-domain corpora have shown remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated that LLMs trained on domain-focused corpora perform better on specialized tasks. Inspired by this insight, we developed INDUS, a comprehensive suite of LLMs tailored for the closely related domains of Earth science, biology, physics, heliophysics, planetary sciences, and astrophysics, trained using curated scientific corpora drawn from diverse data sources. The suite includes: (1) an encoder model trained using domain-specific vocabulary and corpora to address NLP tasks, (2) a contrastive-learning-based text embedding model trained on a diverse set of datasets to address information retrieval tasks, and (3) smaller versions of these models created using knowledge distillation for applications with latency or resource constraints. We also created three new scientific benchmark datasets, Climate-Change NER (entity recognition), NASA-QA (extractive QA), and NASA-IR (IR), to accelerate research in these multi-disciplinary fields. We show that our models outperform both general-purpose (RoBERTa) and domain-specific (SciBERT) encoders on these new tasks as well as existing tasks in the domains of interest. Furthermore, we demonstrate the use of these models in two industrial settings: as a retrieval model for large-scale vector search applications and in automatic content-tagging systems.

DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models
Wenjing Ke | Zhe Li | Dong Li | Lu Tian | Emad Barsoum
Improving the efficiency of inference in Large Language Models (LLMs) is a critical area of research. Post-training Quantization (PTQ) is a popular technique, but it often faces challenges at low bit widths, particularly on downstream tasks. Quantization-Aware Training (QAT) can alleviate this problem, but it requires significantly more computational resources. To tackle this, we introduce Weight-Decomposed Low-Rank Quantization-Aware Training (DL-QAT), which retains the advantages of QAT while training less than 1% of the total parameters. Specifically, we introduce a group-specific quantization magnitude to adjust the overall scale of each quantization group. Within each quantization group, we use LoRA matrices to update the weight size and direction in the quantization space. We validated the effectiveness of our method on the LLaMA and LLaMA2 model families. The results show significant improvements over our baseline method across different quantization granularities; for instance, our approach outperforms the previous state-of-the-art method by 4.2% on MMLU for 3-bit LLaMA-7B. Additionally, our quantization results on pre-trained models also surpass previous QAT methods, demonstrating the superior performance and efficiency of our approach.

Hybrid-RACA: Hybrid Retrieval-Augmented Composition Assistance for Real-time Text Prediction
Menglin Xia | Xuchao Zhang | Camille Couturier | Guoqing Zheng | Saravan Rajmohan | Victor Rühle
Large language models (LLMs) enhanced with retrieval augmentation have shown great performance in many applications. However, the computational demands of these models pose a challenge when applying them to real-time tasks, such as composition assistance. To address this, we propose Hybrid Retrieval-Augmented Composition Assistance (Hybrid-RACA), a novel system for real-time text prediction that efficiently combines a cloud-based LLM with a smaller client-side model through retrieval-augmented memory. This integration enables the client model to generate better responses, benefiting from the LLM's capabilities and cloud-based data. Meanwhile, via a novel a…
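The Hybrid-RACA entry above describes a cloud-based LLM feeding retrieval-augmented memory to a smaller client-side model. A schematic of that division of labor follows, with toy stand-ins for the retriever and both models; all names are illustrative, and the abstract does not specify the actual memory format or synchronization protocol.

```python
from difflib import SequenceMatcher

def retrieve(corpus: list[str], query: str, k: int = 3) -> list[str]:
    # Toy lexical retriever standing in for the real retrieval component.
    ranked = sorted(corpus,
                    key=lambda doc: SequenceMatcher(None, doc, query).ratio(),
                    reverse=True)
    return ranked[:k]

def cloud_build_memory(corpus: list[str], context: str) -> list[str]:
    # Cloud side: the large LLM would distill retrieved documents into a
    # compact memory; truncation here is a placeholder for that step.
    return [doc[:200] for doc in retrieve(corpus, context)]

def client_predict(small_model_generate, context: str, memory: list[str]) -> str:
    # Client side: a small local model conditions on the cached memory for
    # low-latency next-text prediction.
    return small_model_generate("\n".join(memory) + "\n" + context)
```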
Executive Summary
The proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track present solutions to practical challenges in natural language processing, including entity resolution in voice interfaces, hallucination detection in retrieval-augmented generation, prompt-based summarization, and efficient model quantization. The papers propose novel approaches, such as ASR-aware entity reference expansion and two-tiered encoder-based hallucination detection, and demonstrate the significance of industry-academia collaboration in advancing NLP research and its applications.
Key Points
- Entity resolution in voice interfaces using ASR-aware entity reference expansion
- Hallucination detection in retrieval-augmented generation using two-tiered encoder-based models
- Importance of industry-academia collaboration in advancing NLP research
Merits
Innovative Solutions
The proceedings present innovative solutions to pressing challenges in NLP, demonstrating how industry-academia collaboration can drive progress in the field.
Demerits
Data Scarcity
Some papers highlight the challenge of data scarcity, particularly in cases where limited data links failed mentions to resolved entities, which can hinder the development and training of effective NLP models.
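One common mitigation, used in the entity-resolution paper above, is to synthesize failed mentions by injecting ASR-like noise into clean entity names. A toy illustration follows; a real system would align the perturbations with observed ASR error patterns rather than the arbitrary edits used here.

```python
import random

def synthesize_failed_mention(entity_name: str, rng: random.Random) -> str:
    # Toy ASR-like corruption: drop, duplicate, or swap characters.
    # A real pipeline would sample substitutions from observed ASR
    # confusion statistics rather than uniformly.
    chars = list(entity_name.lower())
    i = rng.randrange(len(chars))
    op = rng.choice(["drop", "dup", "swap"])
    if op == "drop" and len(chars) > 1:
        del chars[i]
    elif op == "dup":
        chars.insert(i, chars[i])
    elif op == "swap" and i + 1 < len(chars):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

rng = random.Random(0)
pairs = [(synthesize_failed_mention("spotify", rng), "spotify")]
```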
Expert Commentary
The proceedings demonstrate the growing importance of industry-academia collaboration in advancing NLP research and its applications. The solutions presented span domains from customer service to voice interfaces and e-commerce, and they show how production constraints such as latency, cost, and limited in-domain data shape model design. At the same time, recurring challenges, including data scarcity and the need for explainability and transparency, underscore the need for continued research. As NLP systems move into production at scale, addressing these challenges is essential to realizing their benefits while minimizing their risks.
Recommendations
- Further research is needed to address the challenges highlighted in the papers, including data scarcity and the need for explainability and transparency in NLP models.
- Industry-academia collaboration should be encouraged to drive progress in NLP research and its applications, and to ensure that the benefits of NLP are realized while minimizing its risks.