Anoop Sarkar SFU Computer Science

Anahita passes her PhD depth exam

Anahita Mansouri successfully passed her PhD depth exam on December 18, 2015.

Her survey paper was titled “Word alignment for Statistical Machine Translation using Hidden Markov Models”.

Abstract

Statistical machine translation (SMT) relies on large parallel corpora in the source and target languages. Word alignment is a crucial early step in training most SMT systems. The objective of the word alignment task is to discover the word-to-word correspondences in a sentence pair. The classic IBM Models 1-5 and the Hidden Markov Model (HMM) have underpinned most SMT systems to date.
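To make the word alignment task concrete, here is a minimal Python sketch of EM training for IBM Model 1, the simplest of the IBM models. It assumes tokenized sentence pairs, adds a hypothetical `<null>` token so that a target word may align to no source word, and uses a three-sentence toy corpus. The names `train_ibm1` and `align` are illustrative only and do not come from any toolkit.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical token for the empty source word

def train_ibm1(corpus, iterations=10):
    """EM estimation of IBM Model 1 lexical probabilities t[(f, e)] = t(f | e).

    corpus: list of (source_tokens, target_tokens) pairs. Each target word f is
    assumed to be generated by exactly one source word e, or by NULL.
    """
    target_vocab = {f for _, tgt in corpus for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization of t(f | e)
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e generating any word
        for src, tgt in corpus:
            src = [NULL] + src
            for f in tgt:
                # E-step: posterior probability that f was generated by each e
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: t(f | e) = c(f, e) / c(e)
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def align(src, tgt, t):
    """For each target word, return the best source position (0 means NULL)."""
    src = [NULL] + src
    return [max(range(len(src)), key=lambda i: t[(f, src[i])]) for f in tgt]

corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_ibm1(corpus)
print(align(["das", "haus"], ["the", "house"], t))
```

Each EM iteration spreads every target word's count over the source words in its sentence in proportion to the current t(f | e), then renormalizes per source word. On this toy corpus, "the" becomes concentrated on "das" because they co-occur in two sentences, which in turn lets "house" attach to "haus".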

HMMs have been applied to numerous problems in NLP. Their key attraction is the existence of well-known tractable algorithms for parameter estimation with EM (the Baum-Welch algorithm) and for finding the most probable hidden-state sequence (the Viterbi algorithm). HMMs have also been applied to the word alignment problem. An improved HMM word alignment model performs comparably to IBM Model 4, which is arguably the most widely used model for word alignment. Compared to IBM Model 4, the HMM model is much easier to implement and modify, and it is faster to train. This report summarizes the key papers that use HMM-based word alignment models for SMT.
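The Viterbi step can be sketched for a first-order HMM alignment model, in which the hidden states are source positions, the emissions are lexical probabilities t(f | e), and the transition probabilities depend only on the jump width between consecutive aligned source positions. This is a minimal Python sketch under simplifying assumptions: there is no NULL state, the initial distribution is uniform, and the lexical table `LEX` and the `jump_weight` score are hand-set toy values rather than parameters estimated with Baum-Welch. All function names are hypothetical.

```python
import math

def viterbi_align(src, tgt, lex_prob, jump_weight):
    """Most probable HMM alignment: one source position per target word.

    lex_prob(f, e): emission probability t(f | e) of target word f given source word e.
    jump_weight(d): unnormalized score for a jump of d = i - i_prev source positions.
    Assumes no NULL state and a uniform initial distribution over source positions.
    """
    I, J = len(src), len(tgt)
    # log p(i | i_prev): jump scores normalized over i for each previous position
    log_trans = []
    for ip in range(I):
        weights = [jump_weight(i - ip) for i in range(I)]
        z = sum(weights)
        log_trans.append([math.log(w / z) for w in weights])
    # delta[i]: best log-probability of aligning tgt[:j + 1] with tgt[j] -> src[i]
    delta = [math.log(1.0 / I) + math.log(lex_prob(tgt[0], src[i])) for i in range(I)]
    backptrs = []
    for j in range(1, J):
        new_delta, bp = [], []
        for i in range(I):
            best = max(range(I), key=lambda ip: delta[ip] + log_trans[ip][i])
            new_delta.append(delta[best] + log_trans[best][i] + math.log(lex_prob(tgt[j], src[i])))
            bp.append(best)
        delta = new_delta
        backptrs.append(bp)
    # Backtrace from the best final position
    i = max(range(I), key=lambda k: delta[k])
    alignment = [i]
    for bp in reversed(backptrs):
        i = bp[i]
        alignment.append(i)
    return alignment[::-1]

# Toy parameters (made-up values for illustration, not trained with Baum-Welch)
LEX = {("the", "das"): 0.7, ("house", "haus"): 0.8, ("is", "ist"): 0.8, ("small", "klein"): 0.8}

def lex_prob(f, e):
    return LEX.get((f, e), 0.01)  # small floor so log() never sees zero

def jump_weight(d):
    return math.exp(-abs(d - 1))  # highest for a monotone jump of +1

print(viterbi_align(["das", "haus", "ist", "klein"], ["the", "house", "is", "small"], lex_prob, jump_weight))
# prints [0, 1, 2, 3]
```

Because transitions depend on jump width rather than on absolute positions, the model captures the tendency of aligned words to stay close to one another, which IBM Model 1 ignores, while decoding still takes only O(J·I²) time for J target and I source words.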