SFU Neural MT class: Syllabus

Cho et al. (EMNLP 2014) describe a Recurrent Neural Network (RNN) Encoder-Decoder for machine translation.

Syllabus

Course Introduction

Lecture notes

Statistical Machine Translation

Links (=optional)

Deep Learning. Ian Goodfellow, Yoshua Bengio and Aaron Courville.
Neural MT implementations.
Decoding for Statistical MT. Philipp Koehn.

Introduction to Neural MT

Lecture notes

Neural Machine Translation (Thang Luong, Kyunghyun Cho and Chris Manning)
Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig)

Links (=optional)

Neural Machine Translation: Breaking the Performance Plateau. Rico Sennrich.
Several Studies on Natural Language and Back-propagation. Robert B. Allen.
Learning recursive distributed representations for holistic computation. Lonnie Chrisman.
A connectionist approach to machine translation. M. A. Castano and F. Casacuberta. EUROSPEECH. 1997.
Asynchronous translations with recurrent neural nets. R. Neco and M. Forcada. International Conference on Neural Networks, vol 4, pages 2535-2540.

Neural Network Models and Training

Lecture notes

Natural Language Understanding with Distributed Representation (Kyunghyun Cho)
A Primer on Neural Networks for NLP (Yoav Goldberg)

Links (=optional)

Deep Learning for MT Winter School.
Visualizing Optimization Algorithms. Alec Radford.
Learning representations by back-propagating errors. Rumelhart, Hinton and Williams. Nature, 1986.

Gated Recurrent Units and LSTMs

Lecture notes

On the difficulty of training Recurrent Neural Networks (Razvan Pascanu, Tomas Mikolov, Yoshua Bengio)
Learning long-term dependencies with gradient descent is difficult (Bengio et al.)

Links (=optional)

Visualizing and Understanding Recurrent Networks. Andrej Karpathy, Justin Johnson, Fei-Fei Li. ICLR 2016.
Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. Hendrik Strobelt, Sebastian Gehrmann, Bernd Huber, Hanspeter Pfister, Alexander M. Rush.
Understanding LSTMs. Chris Olah.

Sequence to Sequence RNNs

Lecture notes

Sequence to Sequence Learning with Neural Networks (Sutskever et al.)
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al.)
Grammar as a Foreign Language (Vinyals et al.)

Links (=optional)

Incorporating Copying Mechanism in Sequence-to-Sequence Learning. Gu et al. ACL 2016.

Attention

Lecture notes

Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al.)
Effective Approaches to Attention-based Neural Machine Translation (Luong et al.)
On Using Very Large Target Vocabulary for Neural Machine Translation (Jean et al.)
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation (Zhou et al.)

Links (=optional)

Grammar as a Foreign Language. Vinyals et al. arXiv (this version has attention).

Multilingual NMT (Oct 11: Nishant K.)

Lecture notes

Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism (Firat, Cho and Bengio)
Multi-source neural translation (Zoph et al.)
Multi-task learning for multiple language translation (Dong et al.)

Links (=optional)

Code for the NAACL 2016 and CSL journal papers. Firat et al.
Multi-task Sequence to Sequence Learning. Luong et al. ICLR 2016.
Multi-way, multilingual neural machine translation. Firat et al. Computer Speech and Language.

Sequence level training of RNNs (Oct 16: Hassan S.)

Lecture notes

Sequence Level Training with Recurrent Neural Networks (Ranzato, Chopra, Auli, Zaremba)
Reinforcement Learning Neural Turing Machines - Revised (Zaremba, Sutskever)

Actor-critic training of RNNs (Oct 18: Payam M.)

Lecture notes

An Actor-Critic Algorithm for Sequence Prediction (Bahdanau et al.)

Convolutional sequence to sequence learning (Oct 23: P. Gujjar)

Lecture notes

Convolutional sequence to sequence learning (Gehring, Auli, Grangier, Yarats, Dauphin)
Recurrent continuous translation models (Kalchbrenner, Blunsom)

Links (=optional)

A Neural Attention Model for Abstractive Sentence Summarization. Rush, Chopra, Weston. EMNLP 2015.
DeepL Translator.
DEEPL Machine Translation Vs our Challenge Set. Pierre Isabelle.
A Challenge Set Approach to Evaluating Machine Translation. Isabelle, Cherry, Foster. EMNLP 2017.

Multi-lingual word embedding (Oct 25: R. Gunasekaran)

Lecture notes

A survey of cross-lingual embedding models (S. Ruder)
Cross-lingual Models of Word Embeddings: An Empirical Comparison (Upadhyay, Faruqui, Dyer, Roth)

Links (=optional)

A survey of cross-lingual embedding models. S. Ruder.

Neural Paraphrasing (Oct 30: M. Kierans)

Lecture notes

Learning to Paraphrase for Question Answering (Dong, Mallinson, Reddy, Lapata)
Neural Paraphrase Generation with Stacked Residual LSTM Networks (Prakash, Hasan, Lee, Datla, Qadir, Liu, Farri)

Links (=optional)

Improving Statistical Machine Translation with a Multilingual Paraphrase Database. Seraj, Siahbani, Sarkar. EMNLP 2015.

Stack LSTMs and NMT (Tue Oct 31 9:30am: J. Gu)

Lecture notes

Make-up class (different time and place): Tue, Oct 31, 9:30am in TASC1 9408. No class on Wed, Nov 1.
Greedy Transition-Based Dependency Parsing with Stack LSTMs (Dyer, Ballesteros, Ling, Matthews, Smith)
Recurrent Neural Network Grammars (Dyer, Kuncoro, Ballesteros, Smith)
Learning to Parse and Translate Improves Neural Network Translation (Eriguchi, Tsuruoka, Cho)

Understanding attention (Nov 6: P. Batta)

Lecture notes

Frustratingly Short Attention Spans in Neural Language Modeling. (Daniluk et al.)
Using fast weights to attend to the recent past (Ba, Hinton, Mnih, Leibo, Ionescu)

Links (=optional)

Key-Value Memory Networks for Directly Reading Documents. Miller et al. EMNLP 2016.

GPU Training for Encoder-Decoder networks (Nov 8: N. Vedula)

Lecture notes

On-the-fly Operation Batching in Dynamic Computation Graphs (G. Neubig, Y. Goldberg, C. Dyer)
Friends don’t let friends write batching code

Sentence representations (Nov 15: T. Elganainy)

Lecture notes

Learning Distributed Representations of Sentences from Unlabelled Data. (Hill, Cho, Korhonen)
Learning Generic Sentence Representations Using Convolutional Neural Networks. (Gan et al.)
Skip-thought vectors. (Kiros et al.)

Links (=optional)

Aetherial Symbols. G. Hinton.

Positional encoding

Lecture notes

Attention is all you need. (Vaswani et al.)
One model to learn them all. (Kaiser et al.)

Links (=optional)

Tensor2Tensor.

Open vocabulary

Lecture notes

Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling. (Kawakami, Dyer, Blunsom)
Reference-Aware Language Models. (Yang, Blunsom, Dyer, Ling)

Pointer networks

Lecture notes

Pointer Networks. (Vinyals, Fortunato, Jaitly)
Pointer Sentinel Mixture Models. (Merity, Xiong, Bradbury, Socher)

Unsupervised NMT

Lecture notes

Unsupervised Machine Translation Using Monolingual Corpora Only (Lample, Denoyer, Ranzato)