Also see References from the ACL 2016 Tutorial by Luong, Cho and Manning.

Recurrent Neural Networks

  1. Hochreiter & Schmidhuber, 1997. Long Short-Term Memory.
  2. Bengio et al., IEEE Trans. on Neural Networks, 1994. Learning long-term dependencies with gradient descent is difficult.
  3. Mikael Bodén, 2002. A guide to recurrent neural networks and backpropagation.
  4. Pascanu et al., 2013. On the difficulty of training Recurrent Neural Networks.
  5. Graves et al., ICML 2006. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks.
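
Most of the papers above study variants of the same vanilla (Elman) recurrence, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). A minimal sketch (toy sizes, random weights, and all variable names are illustrative, not taken from any of the papers):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy setup: input size 2, hidden size 3, small random weights.
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((3, 2)) * 0.1
W_hh = rng.standard_normal((3, 3)) * 0.1
b_h = np.zeros(3)

# Unroll over a two-step input sequence, carrying the hidden state forward.
h = np.zeros(3)
for x_t in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```

The repeated multiplication by W_hh during backpropagation through this unrolled loop is exactly what makes gradients vanish or explode (Bengio et al. 1994; Pascanu et al. 2013), which the LSTM (Hochreiter & Schmidhuber 1997) was designed to mitigate.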

RNN Language Models

  1. Mikolov et al., Interspeech 2010. Recurrent neural network based language model.
  2. Mikolov, PhD thesis, 2012. Statistical Language Models based on Neural Networks.
  3. Mikolov et al., ICASSP 2011. Extensions of recurrent neural network language model.
  4. Zoph, Vaswani, May, Knight, NAACL’16. Simple, Fast Noise Contrastive Estimation for Large RNN Vocabularies.
  5. Ji, Vishwanathan, Satish, Anderson, Dubey, ICLR’16. BlackOut: Speeding up Recurrent Neural Network Language Models with very Large Vocabularies.
  6. Merity et al, 2016. Pointer Sentinel Mixture Models.
  7. Sundermeyer, Ney, and Schlüter, 2015. From Feedforward to Recurrent LSTM Neural Networks for Language Modeling.
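
An RNN language model, as in Mikolov et al. 2010, feeds each word embedding into the recurrence and reads out a softmax over the vocabulary at every step. A hedged toy sketch (vocabulary of 4 word ids, hidden size 3; all names and sizes are made up for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

V, H = 4, 3  # toy vocabulary size and hidden size
rng = np.random.default_rng(1)
E = rng.standard_normal((V, H)) * 0.1     # input word embeddings
W_hh = rng.standard_normal((H, H)) * 0.1  # recurrent weights
W_hy = rng.standard_normal((V, H)) * 0.1  # output projection

def lm_step(word_id, h_prev):
    """Consume one word; return the new state and the next-word distribution."""
    h = np.tanh(E[word_id] + W_hh @ h_prev)
    return h, softmax(W_hy @ h)

# Score a toy sequence by accumulating log P(next word | history).
h, log_prob = np.zeros(H), 0.0
for w, w_next in [(0, 2), (2, 1)]:
    h, p = lm_step(w, h)
    log_prob += np.log(p[w_next])
```

The full softmax over V is the expensive part at realistic vocabulary sizes, which is what the NCE and BlackOut papers above (Zoph et al.; Ji et al.) are approximating.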

n-gram Neural Language Models

  1. Bengio, Ducharme, Vincent, Jauvin, JMLR’03. A Neural Probabilistic Language Model.
  2. Morin & Bengio, AISTATS’05. Hierarchical Probabilistic Neural Network Language Model.
  3. Mnih & Hinton, NIPS’09. A Scalable Hierarchical Distributed Language Model.
  4. Mnih & Teh, ICML’12. A fast and simple algorithm for training neural probabilistic language models.
  5. Vaswani, Zhao, Fossum, Chiang, EMNLP’13. Decoding with Large-Scale Neural Language Models Improves Translation.
  6. Kim, Jernite, Sontag, Rush, AAAI’16. Character-Aware Neural Language Models.
  7. Ji, Haffari, Eisenstein, NAACL’16. A Latent Variable Recurrent Neural Network for Discourse-Driven Language Models.
  8. Wang, Cho, ACL’16. Larger-Context Language Modelling with Recurrent Neural Network.
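
The feedforward n-gram neural LM of Bengio et al. 2003 conditions on a fixed window: it concatenates the embeddings of the previous n-1 words and maps them to a distribution over the next word. A minimal sketch under toy assumptions (vocab 5, embedding size 3, bigram context; the direct output layer here omits the hidden layer of the original model for brevity):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

V, D, N = 5, 3, 2  # toy vocab size, embedding size, context length (n-1)
rng = np.random.default_rng(3)
C = rng.standard_normal((V, D)) * 0.1      # shared embedding table
W = rng.standard_normal((V, N * D)) * 0.1  # output layer over concatenated context
b = np.zeros(V)

def ngram_lm(context_ids):
    """P(next word | previous n-1 words) from concatenated embeddings."""
    x = np.concatenate([C[i] for i in context_ids])
    return softmax(W @ x + b)

p = ngram_lm([1, 3])  # distribution over the 5-word toy vocabulary
```

The hierarchical-softmax papers above (Morin & Bengio; Mnih & Hinton) replace the flat softmax output with a tree to cut the per-word cost from O(V) to O(log V).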

Neural Machine Translation

  1. Bahdanau et al., ICLR’15. Neural Machine Translation by Jointly Learning to Align and Translate.
  2. Chung, Cho, Bengio, ACL’16. A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation.
  3. Cohn, Hoang, Vymolova, Yao, Dyer, Haffari, NAACL’16. Incorporating Structural Alignment Biases into an Attentional Neural Translation Model.
  4. Gu, Lu, Li, Li, ACL’16. Incorporating Copying Mechanism in Sequence-to-Sequence Learning.
  5. Gulcehre, Ahn, Nallapati, Zhou, Bengio, ACL’16. Pointing the Unknown Words.
  6. Ling, Luís, Marujo, Astudillo, Amir, Dyer, Black, Trancoso, EMNLP’15. Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation.
  7. Luong et al., ACL’15a. Addressing the Rare Word Problem in Neural Machine Translation.
  8. Luong et al., ACL’15b. Effective Approaches to Attention-based Neural Machine Translation.
  9. Luong & Manning, IWSLT’15. Stanford Neural Machine Translation Systems for Spoken Language Domain.
  10. Sennrich, Haddow, Birch, ACL’16a. Improving Neural Machine Translation Models with Monolingual Data.
  11. Sennrich, Haddow, Birch, ACL’16b. Neural Machine Translation of Rare Words with Subword Units.
  12. Tu, Lu, Liu, Liu, Li, ACL’16. Modeling Coverage for Neural Machine Translation.

Encoder-Decoder Neural Networks

  1. Cho et al., EMNLP 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation.
  2. Mnih et al., NIPS’14. Recurrent Models of Visual Attention.
  3. Sutskever et al., NIPS’14. Sequence to Sequence Learning with Neural Networks.
  4. Xu, Ba, Kiros, Cho, Courville, Salakhutdinov, Zemel, Bengio, ICML’15. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.
  5. Jia, Liang, ACL’16. Data Recombination for Neural Semantic Parsing.
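
At each decoding step, the attention mechanism shared by Bahdanau et al. and Xu et al. above weights the encoder states by their relevance to the current decoder state and sums them into a context vector. A hedged sketch of the simplest dot-product scoring variant (Luong et al., ACL’15b); sizes and names are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_context(dec_state, enc_states):
    """Dot-product attention: score each encoder state against the decoder
    state, normalize the scores, and return the weighted context vector."""
    scores = enc_states @ dec_state       # (T,) alignment scores
    weights = softmax(scores)             # (T,) attention weights, sum to 1
    return weights, weights @ enc_states  # (H,) context vector

rng = np.random.default_rng(2)
enc_states = rng.standard_normal((5, 4))  # T=5 source positions, H=4
dec_state = rng.standard_normal(4)
weights, context = attention_context(dec_state, enc_states)
```

Bahdanau et al. instead score with a small feedforward network over the pair of states, but the normalize-and-average structure is the same.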

Encoder-Decoder Plus Reinforce

  1. Zaremba and Sutskever, ICLR 2016. Reinforcement Learning Neural Turing Machines - Revisited.
  2. Ranzato et al., ICLR 2016. Sequence Level Training with Recurrent Neural Networks.
  3. Bahdanau et al., arXiv 2017. An Actor-Critic Algorithm for Sequence Prediction.

Multi-lingual Neural MT

  1. Zoph, Knight, NAACL’16. Multi-source neural translation.
  2. Dong, Wu, He, Yu, Wang, ACL’15. Multi-task learning for multiple language translation.
  3. Firat, Cho, Bengio, NAACL’16. Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism.

Neural Word Alignment

  1. Yang et al., ACL 2013. Word Alignment Modeling with Context Dependent Deep Neural Network.
  2. Tamura, Watanabe, and Sumita, ACL 2014. Recurrent Neural Networks for Word Alignment Model.

n-gram NMT

  1. Le et al., NAACL 2012. Continuous Space Translation Models with Neural Networks.


  1. Yang et al., EMNLP 2016. Toward Socially-Infused Information Extraction: Embedding Authors, Mentions, and Entities.
  2. Qu et al., 2016. Named Entity Recognition for Novel Types by Transfer Learning.
  3. McDonald et al., EMNLP 2005. Flexible Text Segmentation with Structured Multilabel Classification.