- Neural Machine Translation (Thang Luong, Kyunghyun Cho and Chris Manning) Tutorial at ACL 2016.

- Natural Language Understanding with Distributed Representation (Kyunghyun Cho)
- A Primer on Neural Network Models for Natural Language Processing (Yoav Goldberg)

- A Neural Probabilistic Language Model (Bengio et al) JMLR 2003.
- Decoding with large-scale neural language models improves translation (Vaswani et al) EMNLP 2013.
- A Scalable Hierarchical Distributed Language Model (Mnih and Hinton) NIPS 2009.
- A fast and simple algorithm for training neural probabilistic language models (Mnih and Teh) ICML 2012.
- Fast and robust neural network joint models for statistical machine translation (Devlin et al) ACL 2014.

- Recurrent neural network based language model (Mikolov et al) InterSpeech 2010.
- Statistical Language Models based on Neural Networks (Tomas Mikolov) PhD thesis.
- Extensions of recurrent neural network language model (Mikolov et al) ICASSP 2011.

- Sequence to Sequence Learning with Neural Networks (Sutskever et al) NIPS 2014.
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al) EMNLP 2014.
- Grammar as a foreign language (Vinyals et al) ICLR 2015 **submission** (this version has no attention model).

- Learning long-term dependencies with gradient descent is difficult (Bengio et al) IEEE Trans on Neural Networks 1994.
- Visualizing and Understanding Recurrent Networks (Andrej Karpathy, Justin Johnson, Fei-Fei Li) ICLR 2016.
- Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks (Hendrik Strobelt, Sebastian Gehrmann, Bernd Huber, Hanspeter Pfister, Alexander M. Rush)

- Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al) ICLR 2015.
- Effective Approaches to Attention-based Neural Machine Translation (Luong et al) EMNLP 2015.
- On using very large target vocabulary for neural machine translation (Jean et al) ACL 2015.
- Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation (Zhou et al) TACL 2016.

- Recurrent Models of Visual Attention (Mnih et al) NIPS 2014.
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (Xu et al) ICML 2015.
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description (Donahue et al) CVPR 2015.
- DRAW: A recurrent neural network for image generation (Gregor et al) ICML 2015.

- Supervised Sequence Labelling with Recurrent Neural Networks (Alex Graves) PhD thesis.
- Generating Sequences With Recurrent Neural Networks (Alex Graves) arxiv.

- Modeling Coverage for Neural Machine Translation (Tu et al) ACL 2016.
- Incorporating Structural Alignment Biases into an Attentional Neural Translation Model (Cohn et al) NAACL 2016.
- Context Gates for Neural Machine Translation (Tu et al) arxiv.

- Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism (Firat, Cho and Bengio) NAACL 2016.
- Multi-source neural translation (Zoph and Knight) NAACL 2016.
- Multi-task learning for multiple language translation (Dong et al) ACL 2015.
- Multi-task Sequence to Sequence Learning (Luong et al) ICLR 2016.

- Recurrent Continuous Translation Models (Kalchbrenner and Blunsom) EMNLP 2013.
- A Convolutional Encoder Model for Neural Machine Translation (Gehring et al) arxiv 2016.
- Convolutional Encoders for Neural Machine Translation (Lamb and Xie) unpublished manuscript 2016.
- Context-Dependent Translation Selection Using Convolutional Neural Network (Tu et al) ACL-IJCNLP 2015.
- Encoding Source Language with Convolutional Neural Network for Machine Translation (Meng et al) ACL-IJCNLP 2015.

- Fully Character-Level Neural Machine Translation without Explicit Segmentation (Lee et al) arxiv 2016.
- Character-based neural machine translation (Costa-jussà and Fonollosa) ACL 2016.
- A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation (Chung et al) ACL 2016.
- Neural Machine Translation of Rare Words with Subword Units (Sennrich et al) ACL 2016.

- A Fast and Accurate Dependency Parser using Neural Networks (Chen and Manning) EMNLP 2014.
- Transition-based dependency parsing with stack long short-term memory (Dyer et al) ACL 2015.
- Easy-First Dependency Parsing with Hierarchical Tree LSTMs (Kiperwasser and Goldberg) arxiv 2016.
- Recurrent Neural Network Grammars (Dyer et al) NAACL 2016.