Evolution of Representations in the Transformer

This is a post for the EMNLP 2019 paper The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives.

We look at the evolution of representations of individual tokens in Transformers trained with different training objectives (MT, LM, and MLM, i.e., BERT-style) from the Information Bottleneck perspective and show that:
  • LMs gradually forget the past while forming the future;
  • for MLMs, the evolution proceeds in two stages: context encoding and token reconstruction (see the sketch after this list);
  • MT representations get refined with context, but less processing happens overall.
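
To make the MLM two-stage picture concrete, here is a minimal illustrative sketch (a simple proxy, not the paper's analysis): it tracks how far a token's representation drifts from its input embedding across the layers of a BERT-style model. The model name, example sentence, and token position are assumptions chosen for illustration.

    # A simple illustrative proxy, not the paper's analysis: track how far a
    # token's representation drifts from its input embedding across the layers
    # of a BERT-style MLM. Under the two-stage picture, similarity to the input
    # token should first drop (context encoding) and then partially recover
    # (token reconstruction).
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_name = "bert-base-uncased"  # assumed model; any BERT-style MLM works
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()

    inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states  # (num_layers + 1) x (1, seq, dim)

    token_idx = 3  # "sat" (position 0 is [CLS]); depends on the tokenizer
    emb = hidden[0][0, token_idx]  # embedding-layer representation of the token
    for layer, h in enumerate(hidden[1:], start=1):
        sim = torch.cosine_similarity(h[0, token_idx], emb, dim=0).item()
        print(f"layer {layer:2d}: cosine similarity to input embedding = {sim:.3f}")
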
September 2019


When a Good Translation is Wrong in Context

This is a post for the ACL 2019 paper When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion.

From this post, you will learn:
  • which phenomena cause context-agnostic translations to be inconsistent with each other
  • how we create test sets addressing the most frequent phenomena
  • about a novel set-up for context-aware NMT with a large amount of sentence-level data and a much smaller amount of document-level data
  • about a new model for this set-up (Context-Aware Decoder, aka CADec), a two-pass MT model that first produces a draft translation of the current sentence, then corrects it using context (sketched below).
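
The two-pass structure is easy to picture in code. Below is a schematic sketch of the decoding loop; the function names and interfaces are hypothetical stand-ins for the trained models, not the authors' implementation, and treating the previously produced translations as the context is a simplification made for this sketch.

    # Schematic two-pass decoding in the spirit of CADec. `base_translate` and
    # `cadec_correct` are hypothetical stand-ins, not the authors' code.

    def base_translate(source: str) -> str:
        """Stand-in for the context-agnostic, sentence-level NMT model."""
        return f"<draft of: {source}>"

    def cadec_correct(source: str, draft: str, context: list[str]) -> str:
        """Stand-in for the second pass: rewrite the draft given context
        (e.g., fixing pronouns for deixis, elided words, lexical choice)."""
        return f"<corrected: {draft} | context size: {len(context)}>"

    def translate_document(sentences: list[str]) -> list[str]:
        translations: list[str] = []
        for sent in sentences:
            draft = base_translate(sent)                      # pass 1: draft
            final = cadec_correct(sent, draft, translations)  # pass 2: correct
            translations.append(final)
        return translations

    print(translate_document(["I saw her.", "She was reading."]))
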
July 2019


The Story of Heads

This is a post for the ACL 2019 paper Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned.

From this post, you will learn:
  • how we evaluate the importance of attention heads in the Transformer
  • which functions the most important encoder heads perform
  • how we prune the vast majority of attention heads in the Transformer without seriously affecting quality (see the sketch after this list)
  • which types of model attention are most sensitive to the number of attention heads and on which layers
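
As a hands-on illustration of the pruning idea, here is a minimal sketch using the prune_heads API from HuggingFace Transformers. Note that the paper itself learns a stochastic gate per head with a relaxed L0 penalty rather than removing a hand-picked set; the model name and head choices below are assumptions for illustration.

    # A minimal sketch of structural head pruning with HuggingFace Transformers.
    # The paper instead learns per-head gates with an L0 penalty; here we simply
    # remove a hand-picked (hypothetical) set of heads for illustration.
    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-uncased")  # assumed model

    # Hypothetical outcome of an importance analysis:
    # layer index -> indices of heads to remove.
    heads_to_prune = {0: [2, 5, 7], 1: [0, 1], 11: [3, 4, 8, 9]}
    model.prune_heads(heads_to_prune)

    # The attention projections are physically shrunk; the pruned model is
    # used exactly like the original one.
    print(model.config.pruned_heads)
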
June 2019