
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing - ACL Anthology


Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang (Editors)

Anthology ID: 2022.emnlp-main
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Venue: EMNLP
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2022.emnlp-main/
DOI: 10.18653/v1/2022.emnlp-main
PDF: https://aclanthology.org/2022.emnlp-main.pdf

Generative Knowledge Graph Construction: A Review
Hongbin Ye | Ningyu Zhang | Hui Chen | Huajun Chen
Generative Knowledge Graph Construction (KGC) refers to those methods that leverage the sequence-to-sequence framework for building knowledge graphs, which is flexible and can be adapted to a wide range of tasks. In this study, we summarize the recent compelling progress in generative knowledge graph construction. We present the advantages and weaknesses of each paradigm in terms of different generation targets and provide theoretical insight and empirical analysis. Based on the review, we suggest promising research directions for the future. Our contributions are threefold: (1) We present a detailed, complete taxonomy for the generative KGC methods; (2) We provide a theoretical and empirical analysis of the generative KGC methods; (3) We propose several research directions that can be developed in the future.

CDConv: A Benchmark for Contradiction Detection in Chinese Conversations
Chujie Zheng | Jinfeng Zhou | Yinhe Zheng | Libiao Peng | Zhen Guo | Wenquan Wu | Zheng-Yu Niu | Hua Wu | Minlie Huang
Dialogue contradiction is a critical issue in open-domain dialogue systems. The contextual nature of conversations makes dialogue contradiction detection rather challenging. In this work, we propose a benchmark for Contradiction Detection in Chinese Conversations, namely CDConv. It contains 12K multi-turn conversations annotated with three typical contradiction categories: Intra-sentence Contradiction, Role Confusion, and History Contradiction. To efficiently construct the CDConv conversations, we devise a series of methods for automatic conversation generation, which simulate common user behaviors that trigger chatbots to make contradictions. We conduct careful manual quality screening of the constructed conversations and show that state-of-the-art Chinese chatbots can be easily goaded into making contradictions. Experiments on CDConv show that properly modeling contextual information is critical for dialogue contradiction detection, but there are still unresolved challenges that require future research.

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Mor Geva | Avi Caciularu | Kevin Wang | Yoav Goldberg
Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood. In this work, we make a substantial step towards unveiling this underlying prediction process, by reverse-engineering the operation of the feed-forward network (FFN) layers, one of the building blocks of transformer models. We view the token representation as a changing distribution over the vocabulary, and the output from each FFN layer as an additive update to that distribution. Then, we analyze the FFN updates in the vocabulary space, showing that each update can be decomposed into sub-updates corresponding to single FFN parameter vectors, each promoting concepts that are often human-interpretable. We then leverage these findings for controlling LM predictions, where we reduce the toxicity of GPT2 by almost 50%, and for improving computation efficiency with a simple early exit rule, saving 20% of computation on average.

Learning to Generate Question by Asking Question: A Primal-Dual Approach with Uncommon Word Generation
Qifan Wang | Li Yang | Xiaojun Quan | Fuli Feng | Dongfang Liu | Zenglin Xu | Sinong Wang | Hao Ma
Automatic question generation (AQG) is the task of generating a question from a given passage and an answer. Most existing AQG methods aim at encoding the passage and the answer to generate the question. However, limited work has focused on modeling the correlation between the target answer and the generated question. Moreover, unseen or rare word generation has not been studied in previous works. In this paper, we propose a novel approach which incorporates question generation with its dual problem, question answering, into a unified primal-dual framework. Specifically, the question generation component consists of an encoder that jointly encodes the answer with the passage, and a decoder that produces the question. The question answering component then re-asks the generated question on the passage to ensure that the target answer is obtained. We further introduce a knowledge distillation module to improve the model generalization ability. We conduct an extensive set of experiments on SQuAD and HotpotQA benchmarks. Experimental results demonstrate the superior performance of the proposed approach over several state-of-the-art methods.

Graph-based Model Generation for Few-Shot Relation Extraction
Wanli Li | Tieyun Qian
Few-shot relation extraction (FSRE) has been a challenging problem since it only has a handful of training instances. Existing models follow a ‘one-for-all’ scheme where one general large model performs all individual N-way-K-shot tasks in FSRE, which prevents the model from achieving the optimal point on each task. In view of this, we propose a model generation framework that consists of one general model for all tasks and many tiny task-specific models for each individual task. The general model generates and passes the universal knowledge to the tiny models which will be further fine-tuned when performing specific tasks. In this way, we decouple the complexity of the entire task space from that of all individual tasks while absorbing the universal knowledge. Extensive experimental results on two public datasets demonstrate that our framework reaches a new state-of-the-art performance for FSRE tasks. Our code is available at: https://github.com/NLPWM-WHU/GM_GEN.

Backdoor Attacks in Federated Learning by Rare Embeddings and Gradient Ensembling
Ki Yoon Yoo | Nojun Kwak
Recent advances in federated learning have demonstrated its promising capability to learn on decentralized datasets. However, a considerable amount of work has raised concerns due to the potential risks of adversaries participating in the framework to poison the global model for an adversarial purpose. This paper investigates the feasibility of model poisoning for backdoor attacks through rare word embeddings of NLP models. In text classification, less than 1% of adversary clients suffices to manipulate the model output without any drop in the performance of clean sentences. For a less complex dataset, a mere 0.1% of adversary clients is enough to poison the global model effectively. We also propose a technique specialized in the federated learning scheme called gradient ensemble, which enhances the backdoor performance in all experimental settings.
Generating Natural Language Proofs with Verifier-Guided Search
Kaiyu Yang | Jia Deng | Danqi Chen
Reasoning over natural language is a challenging problem in NLP. In this work, we focus on proof generation: Given a hypothesis and a set of supporting facts, the model generates a proof tree indicating how to derive the hypothesis from supporting facts. Compared to generating the entire proof in one shot, stepwise generation can better exploit the compositionality and generalize to longer proofs but has achieved limited success on real-world data. Existing stepwise methods struggle to generate proof steps that are both logically valid and relevant to the hypothesis. Instead, they tend to hallucinate invalid steps given the hypothesis. In this paper, we present a novel stepwise method, NLProofS (Natural Language Proof Search), which learns to generate relevant steps conditioning on the hypothesis. At the core of our approach, we train an independent verifier to check the validity of the proof steps to prevent hallucination. Instead of generating steps greedily, we search for proofs maximizing a global proof score judged by the verifier. NLProofS achieves state-of-the-art performance on EntailmentBank and RuleTaker. Specifically, it improves the correctness of predicted proofs from 27.7% to 33.3% in the distractor setting of EntailmentBank, demonstrating the effectiveness of NLProofS in generating challenging human-authored proofs.

Toward Unifying Text Segmentation and Long Document Summarization
Sangwoo Cho | Kaiqiang Song | Xiaoyang Wang | Fei Liu | Dong Yu
Text segmentation is important for signaling a document’s structure. Without segmenting a long document into topically coherent sections, it is difficult for readers to comprehend the text, let alone find important information. The problem is only exacerbated by a lack of segmentation in transcripts of audio/video recordings. In this paper, we explore the role that section segmentation plays in extractive summarization of written and spoken documents. Our approach learns robust sentence representations by performing summarization and segmentation simultaneously, which is further enhanced by an optimization-based regularizer to promote selection of diverse summary sentences. We conduct experiments on multiple datasets ranging from scientific articles to spoken transcripts to evaluate the model’s performance. Our findings suggest that the model can not only achieve state-of-the-art performance on publicly available benchmarks, but demonstrate better cross-genre transferability when equipped with text segmentation. We perform a series of analyses to quantify the impact of section segmentation on summarizing written and spoken documents of substantial length and complexity.

The Geometry of Multilingual Language Model Representations
Tyler A. Chang | Zhuowen Tu | Benjamin K. Bergen
We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language. Using XLM-R as a case study, we show that languages occupy similar linear subspaces after mean-centering, evaluated based on causal effects on language modeling performance and direct comparisons between subspaces for 88 languages. The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies. Shifting representations by language means is sufficient to induce token predictions in different languages. However, we also identify stable language-neutral axes that encode information such as token positions and part-of-speech. We visualize representations projected onto language-sensitive and language-neutral axes, identifying language family and part-of-speech clusters, along with spirals, toruses, and curves representing token position information. These results demonstrate that multilingual language models encode information along orthogonal language-sensitive and language-neutral axes, allowing the models to extract a variety of features for downstream tasks and cross-lingual transfer learning.

Improving Complex Knowledge Base Question Answering via Question-to-Action and Question-to-Question Alignment
Yechun Tang | Xiaoxia Cheng | Weiming Lu
Complex knowledge base question answering can be achieved by converting questions into sequences of predefined actions. However, there is a significant semantic and structural gap between natural language and action sequences, which makes this conversion difficult. In this paper, we introduce an alignment-enhanced complex question answering framework, called ALCQA, which mitigates this gap through question-to-action alignment and question-to-question alignment. We train a question rewriting model to align the question and each action, and utilize a pretrained language model to implicitly align the question and KG artifacts. Moreover, considering that similar questions correspond to similar action sequences, we retrieve top-k similar question-answer pairs at the inference stage through question-to-question alignment and propose a novel reward-guided action sequence selection strategy to select from candidate action sequences. We conduct experiments on CQA and WQSP datasets, and the results show that our approach outperforms state-of-the-art methods and obtains a 9.88% improvement in the F1 metric on the CQA dataset. Our source code is available at https://github.com/TTTTTTTTy/ALCQA.
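ALCQA's question-to-question alignment at inference time is, at its core, a nearest-neighbour lookup over encoded questions. The following is a minimal sketch of that retrieval step, assuming questions have already been encoded as vectors; the function name and the toy 2-d encodings are illustrative, not the paper's actual encoder.

```python
import numpy as np

def top_k_similar(query_vec, stored_vecs, k=2):
    """Indices of the k stored question vectors most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = stored_vecs / np.linalg.norm(stored_vecs, axis=1, keepdims=True)
    return np.argsort(-(m @ q))[:k].tolist()

# Toy "question encodings": indices 0 and 2 point roughly the same way as the query.
stored = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
hits = top_k_similar(np.array([1.0, 0.0]), stored, k=2)  # -> [0, 2]
```

The retrieved question-answer pairs would then seed the paper's reward-guided selection over candidate action sequences.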
PAIR: Prompt-Aware margIn Ranking for Counselor Reflection Scoring in Motivational Interviewing
Do June Min | Verónica Pérez-Rosas | Kenneth Resnicow | Rada Mihalcea
Counselor reflection is a core verbal skill used by mental health counselors to express understanding and affirmation of the client’s experience and concerns. In this paper, we propose a system for the analysis of counselor reflections. Specifically, our system takes as input one dialog turn containing a client prompt and a counselor response, and outputs a score indicating the level of reflection in the counselor response. We compile a dataset consisting of different levels of reflective listening skills, and propose the Prompt-Aware margIn Ranking (PAIR) framework that contrasts positive and negative prompt and response pairs using specially designed multi-gap and prompt-aware margin ranking losses. Through empirical evaluations and deployment of our system in a real-life educational environment, we show that our analysis model outperforms several baselines on different metrics, and can be used to provide useful feedback to counseling trainees.
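The multi-gap idea behind PAIR's ranking losses can be sketched as a hinge loss whose required margin scales with how far apart two responses sit on the reflection-level scale. This is a hedged sketch of the general idea only; the paper's exact loss and margin schedule may differ, and the linear schedule below is an assumption.

```python
def multi_gap_margin_loss(score_high, score_low, level_gap, base_margin=0.1):
    """Hinge-style ranking loss whose required margin grows with the gap
    between the two responses' reflection levels: responses two levels
    apart must be separated more widely than responses one level apart."""
    return max(0.0, base_margin * level_gap - (score_high - score_low))

well_separated = multi_gap_margin_loss(0.9, 0.2, level_gap=2)   # loss is 0
too_close = multi_gap_margin_loss(0.30, 0.25, level_gap=2)      # positive loss
```

Training against such a loss pushes the scorer to order responses by reflection quality rather than merely classify them.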
Co-guiding Net: Achieving Mutual Guidances between Multiple Intent Detection and Slot Filling via Heterogeneous Semantics-Label Graphs
Bowen Xing | Ivor Tsang
Recent graph-based models for joint multiple intent detection and slot filling have obtained promising results through modeling the guidance from the prediction of intents to the decoding of slot filling. However, existing methods (1) only model the unidirectional guidance from intent to slot; (2) adopt homogeneous graphs to model the interactions between the slot semantics nodes and intent label nodes, which limit the performance. In this paper, we propose a novel model termed Co-guiding Net, which implements a two-stage framework achieving the mutual guidances between the two tasks. In the first stage, the initial estimated labels of both tasks are produced.

Executive Summary

The Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP) present a collection of cutting-edge research in the field of natural language processing (NLP). The conference, held in Abu Dhabi, United Arab Emirates, features contributions from leading researchers and academics. Notable articles include a review on generative knowledge graph construction, a benchmark for contradiction detection in Chinese conversations, and an analysis of transformer feed-forward layers. The proceedings highlight the advancements and challenges in NLP, offering insights into future research directions.

Key Points

  • Generative Knowledge Graph Construction: A Review provides a comprehensive taxonomy and analysis of methods leveraging sequence-to-sequence frameworks for building knowledge graphs.
  • CDConv: A Benchmark for Contradiction Detection in Chinese Conversations introduces a new dataset and methods for detecting contradictions in multi-turn conversations, highlighting the challenges in open-domain dialogue systems.
  • Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space explores the mechanisms of transformer feed-forward layers in NLP models.

Merits

Comprehensive Review

The review on generative knowledge graph construction offers a detailed taxonomy and empirical analysis, providing a solid foundation for future research in this area.
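A common generation target surveyed in generative KGC work is a linearized triple sequence: the knowledge graph fragment is serialized into a string the seq2seq model learns to emit, and the string is parsed back into triples afterwards. The round-trip sketch below is illustrative only; the `<h>`/`<r>`/`<t>` marker tokens are an assumption, not any specific paper's scheme.

```python
def linearize(triples):
    """Serialize (head, relation, tail) triples into a single target string
    that a sequence-to-sequence model could be trained to generate."""
    return " ".join(f"<h> {h} <r> {r} <t> {t}" for h, r, t in triples)

def parse(text):
    """Recover the triples from a linearized output string."""
    triples = []
    for chunk in text.split("<h>")[1:]:
        head, rest = chunk.split("<r>")
        relation, tail = rest.split("<t>")
        triples.append((head.strip(), relation.strip(), tail.strip()))
    return triples

facts = [("Marie Curie", "born in", "Warsaw"), ("Warsaw", "capital of", "Poland")]
```

The choice of linearization is exactly the kind of design axis the review's taxonomy organizes: different generation targets trade off flexibility against parseability.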

Innovative Benchmark

The CDConv benchmark introduces a novel dataset and methods for contradiction detection, addressing a critical issue in open-domain dialogue systems.

Insightful Analysis

The analysis of transformer feed-forward layers provides valuable insights into the mechanisms of NLP models, contributing to the understanding of their predictive capabilities.
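The paper's sub-update analysis amounts to projecting each FFN "value" parameter vector through the model's output embedding matrix and reading off the top-scoring vocabulary tokens. The toy numpy sketch below uses random weights purely to show the mechanics; a real analysis would use a trained LM's parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, d_model = 50, 16
E = rng.normal(size=(vocab_size, d_model))  # output (unembedding) matrix
value_vec = rng.normal(size=d_model)        # one FFN "value" parameter vector

# Projecting the value vector through the output embedding yields one score
# per vocabulary token; the top-scoring tokens are the "concepts" this
# single sub-update promotes in the vocabulary space.
logits = E @ value_vec
promoted = np.argsort(-logits)[:5]
```

With trained weights, inspecting `promoted` for each value vector is what surfaces the human-interpretable concepts the paper reports.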

Demerits

Limited Scope

The papers summarized here concentrate on a few areas of NLP, so this discussion does not cover the broader spectrum of challenges and advancements represented across the full proceedings.

Data Quality Concerns

The automatic generation of conversations for the CDConv benchmark raises questions about the quality and representativeness of the data, which could impact the reliability of the findings.

Theoretical Limitations

The analysis of transformer feed-forward layers, while insightful, is grounded in particular models (the toxicity experiments, for instance, use GPT2), so its conclusions about interpretable sub-updates may not generalize to other architectures or scales.

Expert Commentary

The Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing present a valuable collection of research that advances the field of NLP. The review on generative knowledge graph construction supplies a comprehensive taxonomy and empirical analysis, offering a solid foundation for future work. The CDConv benchmark addresses a critical issue in open-domain dialogue systems with a new dataset and construction methods, and the analysis of transformer feed-forward layers deepens our understanding of how these models build predictions.

The proceedings are not without limitations. The areas highlighted here cover only part of the field's challenges and advancements. The automatic generation of the CDConv conversations raises questions about data quality and representativeness that could affect the reliability of findings built on the benchmark, and the feed-forward-layer analysis may not generalize beyond the models it studies. Despite these limitations, the proceedings offer valuable insights and directions for future research, making them a significant contribution to the field of NLP.

Recommendations

  • Future research should expand the scope of NLP studies to cover a broader range of challenges and advancements, ensuring a more comprehensive understanding of the field.
  • Efforts should be made to improve the quality and representativeness of datasets used in NLP research, ensuring the reliability and validity of the findings.
