Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Anthology ID: 2021.emnlp-main | November 2021 | Online and Punta Cana, Dominican Republic | Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2021.emnlp-main/ | PDF: https://aclanthology.org/2021.emnlp-main.pdf

AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate (Jongyoon Song, Sungwon Kim, Sungroh Yoon)
Non-autoregressive neural machine translation (NART) models suffer from the multi-modality problem, which causes translation inconsistencies such as token repetition. Most recent approaches have attempted to solve this problem by implicitly modeling dependencies between outputs. In this paper, we introduce AligNART, which leverages full alignment information to explicitly reduce the modality of the target distribution. AligNART divides the machine translation task into (i) alignment estimation and (ii) translation with aligned decoder inputs, guiding the decoder to focus on simplified one-to-one translation. To alleviate the alignment estimation problem, we further propose a novel alignment decomposition method. Our experiments show that AligNART outperforms previous non-iterative NART models that focus on explicit modality reduction on WMT14 En↔De and WMT16 Ro→En. Furthermore, AligNART achieves BLEU scores comparable to those of the state-of-the-art connectionist temporal classification based models on WMT14 En↔De. We also observe that AligNART effectively addresses the token repetition problem even without sequence-level knowledge distillation.

Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders (Guanhua Chen, Shuming Ma, Yun Chen, Li Dong, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei)
Previous work mainly focuses on improving cross-lingual transfer for NLU tasks with a multilingual pretrained encoder (MPE), or on improving supervised machine translation with BERT. However, whether the MPE can help facilitate the cross-lingual transferability of an NMT model remains under-explored. In this paper, we focus on a zero-shot cross-lingual transfer task in NMT: the NMT model is trained with a parallel dataset of only one language pair and an off-the-shelf MPE, then directly tested on zero-shot language pairs. We propose SixT, a simple yet effective model for this task. SixT leverages the MPE with a two-stage training schedule and gains further improvement from a position-disentangled encoder and a capacity-enhanced decoder. Using this method, SixT significantly outperforms mBART, a pretrained multilingual encoder-decoder model explicitly designed for NMT, with an average improvement of 7.1 BLEU on zero-shot any-to-English test sets across 14 source languages. Furthermore, with much less training computation cost and training data, our model achieves better performance on 15 any-to-English test sets than CRISS and m2m-100, two strong multilingual NMT baselines.

ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora (Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang)
Recent studies have demonstrated that pre-trained cross-lingual models achieve impressive performance in downstream cross-lingual tasks. This improvement benefits from learning from large amounts of monolingual and parallel corpora. Although it is generally acknowledged that parallel corpora are critical for improving model performance, existing methods are often constrained by the size of parallel corpora, especially for low-resource languages. In this paper, we propose ERNIE-M, a new training method that encourages the model to align the representations of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on model performance. Our key insight is to integrate back-translation into the pre-training process. We generate pseudo-parallel sentence pairs on a monolingual corpus to enable the learning of semantic alignments between different languages, thereby enhancing the semantic modeling of cross-lingual models. Experimental results show that ERNIE-M outperforms existing cross-lingual models and delivers new state-of-the-art results on various cross-lingual downstream tasks. The code and pre-trained models will be made publicly available.

Cross Attention Augmented Transducer Networks for Simultaneous Translation (Dan Liu, Mengge Du, Xiaoxi Li, Ya Li, Enhong Chen)
This paper proposes a novel architecture, Cross Attention Augmented Transducer (CAAT), for simultaneous translation. The framework aims to jointly optimize the policy and translation models. To effectively consider all possible READ-WRITE simultaneous translation action paths, we adapt the online automatic speech recognition (ASR) model RNN-T but remove the strong monotonic constraint, which is critical for the translation task to handle reordering. To make CAAT work, we introduce a novel latency loss whose expectation can be optimized by a forward-backward algorithm. We implement CAAT with the Transformer, while the general CAAT architecture can also be implemented with other attention-based encoder-decoder frameworks. Experiments on both speech-to-text (S2T) and text-to-text (T2T) simultaneous translation tasks show that CAAT achieves significantly better latency-quality trade-offs than state-of-the-art simultaneous translation approaches.

Translating Headers of Tabular Data: A Pilot Study of Schema Translation (Kunrui Zhu, Yan Gao, Jiaqi Guo, Jian-Guang Lou)
Schema translation is the task of automatically translating headers of tabular data from one language to another. High-quality schema translation plays an important role in cross-lingual table searching, understanding, and analysis. Despite its importance, schema translation is not well studied in the community, and state-of-the-art neural machine translation models cannot work well on this task because of two intrinsic differences between plain text and tabular data: morphological difference and context difference. To facilitate research, we construct the first parallel dataset for schema translation, which consists of 3,158 tables with 11,979 headers written in 6 different languages, including English, Chinese, French, German, Spanish, and Japanese. We also propose the first schema translation model, CAST, a header-to-header neural machine translation model augmented with schema context. Specifically, we model a target header and its context as a directed graph to represent their entity types and relations. CAST then encodes the graph with a relation-aware transformer and uses another transformer to decode the header in the target language. Experiments on our dataset demonstrate that CAST significantly outperforms state-of-the-art neural machine translation models. Our dataset will be released at https://github.com/microsoft/ContextualSP.
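As a rough sketch of the kind of schema context CAST consumes, the toy below represents a target header and its table context as a directed graph, with entity types on nodes and relations on edges, then flattens it into a token sequence. The node names, relation labels, and linearization here are invented for illustration; CAST itself encodes the graph with a relation-aware transformer rather than a flat sequence.

```python
# Hypothetical schema-context graph for a target header (all names invented).
context_graph = {
    "nodes": {"revenue": "NUMBER", "quarter": "DATETIME", "company": "TEXT"},
    "edges": [
        ("revenue", "same_table", "quarter"),
        ("revenue", "same_table", "company"),
    ],
}

def linearize(graph, target):
    """Flatten the graph into a token sequence a seq2seq model could encode."""
    parts = [f"<target> {target} : {graph['nodes'][target]}"]
    for head, relation, tail in graph["edges"]:
        parts.append(f"<edge> {head} {relation} {tail} : {graph['nodes'][tail]}")
    return " ".join(parts)

print(linearize(context_graph, "revenue"))
# → <target> revenue : NUMBER <edge> revenue same_table quarter : DATETIME <edge> revenue same_table company : TEXT
```

The point of the structure is that a bare header like "revenue" is ambiguous in isolation; its entity type and neighboring headers supply the disambiguating context a plain-text NMT model never sees.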
Towards Making the Most of Dialogue Characteristics for Neural Chat Translation (Yunlong Liang, Chulun Zhou, Fandong Meng, Jinan Xu, Yufeng Chen, Jinsong Su, Jie Zhou)
Neural Chat Translation (NCT) aims to translate conversational text between speakers of different languages. Despite the promising performance of sentence-level and context-aware neural machine translation models, current NCT models remain limited because the inherent dialogue characteristics of chat, such as dialogue coherence and speaker personality, are neglected. In this paper, we propose to improve chat translation by introducing the modeling of dialogue characteristics into the NCT model. To this end, we design four auxiliary tasks: monolingual response generation, cross-lingual response generation, next-utterance discrimination, and speaker identification. Together with the main chat translation task, we optimize the enhanced NCT model through the training objectives of all these tasks. In this way, the NCT model captures the inherent dialogue characteristics and thus generates more coherent and speaker-relevant translations. Comprehensive experiments on four language directions (English↔German and English↔Chinese) verify the effectiveness and superiority of the proposed approach.

Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining (Yicheng Zou, Bolin Zhu, Xingwu Hu, Tao Gui, Qi Zhang)
With the rapid increase in the volume of dialogue data from daily life, there is a growing demand for dialogue summarization. Unfortunately, training a large summarization model is generally infeasible due to the inadequacy of dialogue data with annotated summaries. Most existing works on low-resource dialogue summarization directly pretrain models in other domains, e.g., the news domain, but they generally neglect the large difference between dialogues and conventional articles. To bridge the gap between out-of-domain pretraining and in-domain fine-tuning, we propose a multi-source pretraining paradigm to better leverage external summary data. Specifically, we exploit large-scale in-domain non-summary data to separately pretrain the dialogue encoder and the summary decoder. The combined encoder-decoder model is then pretrained on out-of-domain summary data using adversarial critics, aiming to facilitate domain-agnostic summarization. Experimental results on two public datasets show that, with only limited training data, our approach achieves competitive performance and generalizes well across different dialogue scenarios.

Controllable Neural Dialogue Summarization with Personal Named Entity Planning (Zhengyuan Liu, Nancy Chen)
In this paper, we propose a controllable neural generation framework that can flexibly guide dialogue summarization with personal named entity planning. The conditional sequences are modulated to decide what types of information, or what perspective, to focus on when forming summaries, tackling the under-constrained nature of summarization tasks. The framework supports two types of use cases: (1) Comprehensive Perspective, a general-purpose case with no user preference specified, which considers summary points from all conversational interlocutors and all mentioned persons; and (2) Focus Perspective, which positions the summary around a user-specified personal named entity, either one of the interlocutors or one of the persons mentioned in the conversation. During training, we exploit occurrence planning of personal named entities and coreference information to improve temporal coherence and to minimize hallucination in neural generation. Experimental results, based on both objective metrics and human evaluations, show that the proposed framework generates fluent and factually consistent summaries under various planning controls.
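The two perspectives described above can be caricatured in a few lines: a "plan" of personal named entities determines which content, and in what order, enters the summary. This is only an illustration of the planning idea with invented dialogue content, not the paper's neural generation model.

```python
# Toy dialogue: (speaker, content) pairs. All names and text are invented.
utterances = [
    ("Alice", "Alice booked the venue for Friday."),
    ("Bob",   "Bob will send the invitations."),
    ("Alice", "Alice also ordered the catering."),
]

def plan_summary(utterances, entity_plan):
    """Keep, in plan order, the content involving each planned entity."""
    summary = []
    for entity in entity_plan:
        summary += [text for speaker, text in utterances if speaker == entity]
    return " ".join(summary)

# Focus Perspective: summary positioned on one named entity.
print(plan_summary(utterances, ["Alice"]))
# → Alice booked the venue for Friday. Alice also ordered the catering.

# Comprehensive Perspective: all interlocutors covered.
print(plan_summary(utterances, ["Alice", "Bob"]))
```

In the actual framework the plan conditions a neural decoder rather than filtering sentences, but the controllable input/output contract is the same: change the entity plan, change the summary.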
Fine-grained Factual Consistency Assessment for Abstractive Summarization Models (Sen Zhang, Jianwei Niu, Chuyuan Wei)
Factual inconsistencies between the output of abstractive summarization models and the original documents are frequently present. Fact consistency assessment requires the reasoning capability to find subtle clues that identify whether a model-generated summary is consistent with the original document. This paper proposes a fine-grained, two-stage Fact Consistency assessment framework for Summarization models (SumFC). Given a document and a summary sentence, in the first stage SumFC selects the top-K sentences from the document that are most relevant to the summary sentence. In the second stage, the model performs fine-grained consistency reasoning at the sentence level and then aggregates all sentences' consistency scores to obtain the final assessment result. We obtain training data pairs by data synthesis and adopt a contrastive loss over data pairs to help the model identify subtle cues. Experimental results show that SumFC achieves a significant improvement over previous state-of-the-art methods. Our experiments also indicate that SumFC distinguishes detailed differences better.

Decision-Focused Summarization (Chao-Chun Hsu, Chenhao Tan)
Relevance in summarization is typically defined based on textual information alone, without incorporating insights about a particular decision. As a result, to support risk analysis of pancreatic cancer, summaries of medical notes may include irrelevant information such as a knee injury. We propose a novel problem, decision-focused summarization, where the goal is to summarize relevant information for a decision. We leverage a predictive model that makes the decision based on the full text to provide valuable insights into how a decision can be inferred from text. To build a summary, we then select representative sentences that lead to similar model decisions as the full text while accounting for textual non-redundancy. To evaluate our method (DecSum), we build a testbed where the task is to summarize the first ten reviews of a restaurant in support of predicting its future rating on Yelp. DecSum substantially outperforms text-only summarization methods and model-based explanation methods in decision faithfulness and representativeness. We further demonstrate that DecSum is the only method that enables humans to outperform random chance in predicting which restaurant will be better rated in the future.

Multiplex Graph Neural Network for Extractive Text Summarization (Baoyu Jing, Zeyu You, Tao Yang, Wei Fan, Hanghang Tong)
Extractive text summarization aims at extracting the most representative sentences from a given document as its summary. To extract a good summary from a long text document, sentence embedding plays an important role. Recent studies have leveraged graph neural networks to capture the inter-sentential relationships (e.g., the discourse graph) within documents to learn contextual sentence embeddings. However, those approaches neither consider multiple types of inter-sentential relationships (e.g., semantic similarity and natural connection relationships), nor model intra-sentential relationships (e.g., semantic similarity and syntactic relationships among words). To address these problems, we propose a novel Multiplex Graph Convolutional Network (Multi-GCN) to jointly model different types of relationships among sentences and words. Based on Multi-
Executive Summary
The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) showcased significant advances in natural language processing (NLP). Two notable papers from the proceedings are 'AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate' and 'Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders'. The former introduces a non-autoregressive neural machine translation (NART) approach that uses explicit alignment information to address the multi-modality problem and the token repetition it causes, while the latter shows that a multilingual pretrained encoder (MPE) can give a neural machine translation (NMT) model zero-shot cross-lingual transferability. Both papers reflect ongoing efforts to make machine translation systems more efficient and more accurate.
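The multi-modality problem AligNART targets can be reproduced in miniature: when a non-autoregressive decoder picks each target token independently from its position-wise marginal distribution, probability mass spread across several valid translations can yield an incoherent mixture of them. A minimal sketch with a made-up toy distribution:

```python
from collections import defaultdict

# Toy target distribution (invented for illustration): three plausible
# translations of one source sentence, i.e. three "modes".
modes = [
    (0.40, ["thank", "you", "so", "much"]),
    (0.35, ["thanks", "so", "much"]),
    (0.25, ["thanks", "a", "lot"]),
]
EOS = "</s>"
max_len = max(len(t) for _, t in modes)

def marginal(position):
    """Position-wise token marginal, as seen by a purely parallel decoder."""
    dist = defaultdict(float)
    for p, tokens in modes:
        token = tokens[position] if position < len(tokens) else EOS
        dist[token] += p
    return dist

# Independent argmax per position: no conditioning on previously chosen tokens.
decoded = []
for i in range(max_len):
    dist = marginal(i)
    token = max(dist, key=dist.get)
    if token == EOS:
        break
    decoded.append(token)

print(decoded)  # → ['thanks', 'you', 'so'] — an incoherent mix of the modes
```

An autoregressive decoder avoids this by conditioning each token on the previous ones; AligNART instead reduces the number of modes directly by feeding the decoder alignment-arranged inputs, so each position faces a near one-to-one translation target.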
Key Points
- ▸ Introduction of AligNART for non-autoregressive neural machine translation.
- ▸ Use of multilingual pretrained encoders for zero-shot cross-lingual transfer in NMT.
- ▸ Reported gains in translation quality and efficiency, including an average improvement of 7.1 BLEU over mBART on zero-shot any-to-English test sets.
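The intuition behind the second key point: if a pretrained encoder maps every language into one shared representation space, a decoder trained on a single language pair can be reused for source languages it never saw. The toy below hard-codes such a "shared space" as a dictionary of concept IDs; it illustrates the intuition only and is not SixT's architecture (all words and mappings are invented).

```python
# Stand-in for a multilingual encoder: words from any language map into a
# shared semantic space (here, concept IDs). Invented for illustration.
ENCODER = {
    "hund": "DOG", "chien": "DOG",   # German / French for the same concept
    "katze": "CAT", "chat": "CAT",
}

def encode(sentence):
    return [ENCODER[w] for w in sentence]

# "Train" a decoder on German→English pairs only.
decoder = {}
for de, en in [(["hund"], ["dog"]), (["katze"], ["cat"])]:
    for concept, word in zip(encode(de), en):
        decoder[concept] = word

def translate(sentence):
    return [decoder[c] for c in encode(sentence)]

# Zero-shot: French was never seen during "training".
print(translate(["chien", "chat"]))  # → ['dog', 'cat']
```

The real difficulty, which SixT's two-stage schedule and position-disentangled encoder address, is that an actual MPE's representations are only approximately language-agnostic, so naive reuse of the decoder degrades without careful training.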
Merits
Innovative Approaches
Both papers present innovative methods that address critical challenges in NLP, particularly in machine translation.
Empirical Validation
The studies provide empirical evidence supporting the effectiveness of their proposed methods, enhancing the credibility of their findings.
Demerits
Limited Scope
The studies focus on specific aspects of NLP and may not be directly applicable to other areas within the field.
Complexity
The methods proposed are complex and may require significant computational resources and expertise to implement.
Expert Commentary
The 2021 EMNLP proceedings demonstrate the ongoing commitment of the NLP community to pushing the boundaries of machine translation. The introduction of AligNART represents a significant step forward in addressing the multi-modality problem in non-autoregressive neural machine translation. By leveraging full alignment information, the method not only improves translation consistency but also achieves comparable performance to state-of-the-art models. Similarly, the exploration of zero-shot cross-lingual transfer using multilingual pretrained encoders opens new avenues for enhancing the transferability of NMT models. The practical implications of these advancements are substantial, as they pave the way for more efficient and accurate machine translation systems. However, the complexity of these methods and their limited scope highlight the need for further research to ensure broader applicability and accessibility. Policymakers and industry leaders should take note of these advancements and consider the ethical implications of deploying such technologies on a larger scale.
Recommendations
- ✓ Further research to simplify and broaden the applicability of the proposed methods.
- ✓ Increased collaboration between academia and industry to accelerate the development and deployment of advanced NLP technologies.