Consider the following sentence \(s\), where <bs> and <es> are padding tokens:
<bs> <bs> I would like to go West . <es>
Using the vocabulary \({\cal V} = \{\) crazy, killer, clown \(\}\), define a language model satisfying a particular constraint: \(p(x_1, \ldots, x_n) = \gamma_n \times 0.5^n\) for any \(x_1, \ldots, x_n\) such that \(x_i \in {\cal V}\) for \(i = 1, \ldots, n-1\) and \(x_n =\) STOP.
Here \(\gamma_n\) is an expression that may depend on \(n\).
Choose one of the following definitions of \(\gamma_n\) so that \(p(x_1, \ldots, x_n)\) is a valid language model.
1. \(3^{n-1}\)
2. \(3^n\)
3. \(1\)
4. \(\frac{1}{3^n}\)
5. \(\frac{1}{3^{n-1}}\)
Hint: \(\sum_{n=1}^\infty 0.5^n = 1\).
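It can help to check a candidate numerically before committing to an answer. Below is a minimal Python sketch (mine, not part of the original exercise) that sums the total probability mass for any candidate \(\gamma_n\); the vocabulary size 3 and the STOP convention come straight from the problem statement.

```python
def total_probability(gamma, max_n=60):
    """Sum p(x_1, ..., x_n) over all sequences up to length max_n.

    There are 3^(n-1) distinct sequences of length n (positions 1..n-1
    each drawn from the 3-word vocabulary, x_n = STOP), and each is
    assigned probability gamma(n) * 0.5^n, so a valid language model
    must make the total come out to 1.
    """
    return sum(3 ** (n - 1) * gamma(n) * 0.5 ** n for n in range(1, max_n + 1))

# Example: candidate 3, gamma_n = 1.  A valid model would print ~1.0;
# plug in each of the five candidates and see which one does.
print(total_probability(lambda n: 1))
```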
The perplexity of a language model on a test corpus is defined as
\[2^{-\ell},\]
where
\[\ell = \frac{1}{M} \sum_{i=1}^m \log_2 p(x^{(i)}),\]
\(m\) is the number of sentences in the corpus, \(M\) is the total number of words in the corpus, \(\log_2\) is the base-2 logarithm, and \(x^{(i)}\) is the \(i\)-th sentence in the corpus.
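As a quick illustration of the definition, here is a small Python sketch (with hypothetical numbers) that computes perplexity from per-sentence log probabilities:

```python
def perplexity(sentence_log2_probs, total_words):
    """2^(-l), where l = (1/M) * sum of log2 p(x^(i)) over sentences.

    sentence_log2_probs: one log2 probability per test sentence.
    total_words: M, the total word count of the test corpus.
    """
    l = sum(sentence_log2_probs) / total_words
    return 2 ** (-l)

# Hypothetical example: 2 sentences, 10 words in total, each sentence
# assigned probability 2^-20 by the model -> perplexity 2^(40/10) = 16.
print(perplexity([-20.0, -20.0], total_words=10))  # 16.0
```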
Assume we have a bigram language model where
\[p(w_1, \ldots, w_n) = \prod_{i=1}^n q(w_i \mid w_{i-1}),\]
where \(w_0 = \ast\) and \(w_n =\) STOP. Each parameter \(q(w \mid v)\) is calculated as follows:
\[q(w \mid v) = \frac{C(v,w)}{C(v)},\]
where \(C(\cdot)\) is the count of that item in the training data. Write down a training corpus and a test corpus such that the perplexity of the model trained on the training corpus takes the maximum possible value on the test corpus.
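To experiment with this question, the sketch below (my own helpers, assuming whitespace-tokenised sentences and the \(\ast\)/STOP padding above) trains the bigram MLE model and evaluates perplexity on a test corpus; note what happens to the value when a test bigram never appears in training.

```python
import math
from collections import defaultdict

def train_bigram(corpus):
    """MLE estimates q(w | v) = C(v, w) / C(v) from tokenised sentences."""
    count_v, count_vw = defaultdict(int), defaultdict(int)
    for sentence in corpus:
        words = ["*"] + sentence + ["STOP"]
        for v, w in zip(words, words[1:]):
            count_v[v] += 1
            count_vw[(v, w)] += 1
    return lambda w, v: count_vw[(v, w)] / count_v[v] if count_v[v] else 0.0

def bigram_perplexity(q, corpus):
    """2^(-l), with l the average log2 sentence probability per word."""
    log2_prob, M = 0.0, 0
    for sentence in corpus:
        words = ["*"] + sentence + ["STOP"]
        M += len(sentence) + 1  # STOP is counted as a word here
        for v, w in zip(words, words[1:]):
            p = q(w, v)
            log2_prob += math.log2(p) if p > 0 else float("-inf")
    return 2 ** (-log2_prob / M)

# Hypothetical corpora: identical train and test give perplexity 1.0;
# try a test corpus containing a bigram unseen in training.
train = [["the", "dog", "barks"]]
test = [["the", "dog", "barks"]]
print(bigram_perplexity(train_bigram(train), test))  # 1.0
```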
We define a trigram language model as follows. Take \(C(w), C(v, w)\) and \(C(u, v, w)\) to be unigram, bigram and trigram counts taken from a training corpus (here \(w\) is a single word, \(v, w\) is a bigram, and \(u, v, w\) is a trigram). Take \(N\) to be the total number of words seen in the corpus. Then the unigram, bigram and trigram maximum-likelihood estimates are:
\[q(w) = \frac{C(w)}{N}\]
\[q(w \mid v) = \frac{C(v,w)}{C(v)}\]
\[q(w \mid u, v) = \frac{C(u,v,w)}{C(u,v)}\]
The final estimate of the trigram probability is:
\[p(w \mid u, v) = \alpha \times q(w \mid u, v) + (1 - \alpha) \times \left( \beta \times q(w \mid v) + (1 - \beta) \times q(w) \right)\]
Assume that \(\alpha = \beta = 0.5\). Show that the above model is equivalent to the following model by providing values for \(\lambda_1, \lambda_2, \lambda_3\):
\[p(w \mid u, v) = \lambda_1 \times q(w \mid u, v) + \lambda_2 \times q(w \mid v) + \lambda_3 \times q(w)\]
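One way to sanity-check candidate \(\lambda\) values (a sketch of mine, not required by the exercise): both forms are linear in the three estimates, so plugging random values for \(q(w \mid u, v)\), \(q(w \mid v)\) and \(q(w)\) into each form and comparing the results is a quick test.

```python
import random

ALPHA = BETA = 0.5

def nested(q_uvw, q_vw, q_w):
    """Original form: alpha * trigram + (1 - alpha) * (beta * bigram + (1 - beta) * unigram)."""
    return ALPHA * q_uvw + (1 - ALPHA) * (BETA * q_vw + (1 - BETA) * q_w)

def flat(q_uvw, q_vw, q_w, lam1, lam2, lam3):
    """Flat form: lambda_1 * trigram + lambda_2 * bigram + lambda_3 * unigram."""
    return lam1 * q_uvw + lam2 * q_vw + lam3 * q_w

# Placeholder guesses (not the answer): with the correct lambdas the two
# columns agree for every random draw of the three estimates.
lam1, lam2, lam3 = 0.5, 0.3, 0.2
for _ in range(3):
    qs = [random.random() for _ in range(3)]
    print(nested(*qs), flat(*qs, lam1, lam2, lam3))
```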
Consider the following corpus of sentences, where <bs> and <es> are padding tokens:
<bs> I am Sam <es>
<bs> Sam I am <es>
<bs> I do not like green eggs and ham <es>
Fill in the following table:
P(I | <bs>) =
P(Sam | <bs>) =
P(am | I) =
P(<es> | Sam) =
P(Sam | am) =
P(do | I) =
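The table entries can be verified mechanically. Here is a short sketch that counts bigrams directly over the padded sentences above and prints each requested estimate:

```python
from collections import defaultdict

corpus = [
    "<bs> I am Sam <es>",
    "<bs> Sam I am <es>",
    "<bs> I do not like green eggs and ham <es>",
]

# Bigram MLE: q(w | v) = C(v, w) / C(v), with <bs>/<es> as padding.
count_v, count_vw = defaultdict(int), defaultdict(int)
for line in corpus:
    words = line.split()
    for v, w in zip(words, words[1:]):
        count_v[v] += 1
        count_vw[(v, w)] += 1

queries = [("I", "<bs>"), ("Sam", "<bs>"), ("am", "I"),
           ("<es>", "Sam"), ("Sam", "am"), ("do", "I")]
for w, v in queries:
    print(f"P({w} | {v}) = {count_vw[(v, w)]}/{count_v[v]}")
```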
Some of these questions are modified versions of questions from the Columbia University course COMS 4705 by Michael Collins.