NLP 2019

Assignments for intensive NLP course, University of Helsinki

Day 5: Evaluation

Carry out all the exercises below and submit your answers on Moodle. Also submit a single Python file containing your full implementation.

Exercise 1: Basics

Consider an information retrieval system that returns a set of 15 documents, stored in the list retrieved below. Each document in retrieved is labelled as relevant ('R') or non-relevant ('N'):

total_docs = 100
total_relevant = 10

retrieved = ['R', 'N', 'N', 'R', 'R', 'N', 'N', 'N',
             'R', 'N', 'R', 'N', 'N', 'R', 'R']

Exercise 1.1

Continuing the snippet given above, compute the numbers of true positives, false positives, true negatives, and false negatives. Then, compute precision, recall, F1 score, and accuracy (round the values to two decimal places).
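For example, a minimal sketch continuing the snippet (it assumes every relevant document not in retrieved counts as a false negative):

tp = retrieved.count('R')        # relevant and retrieved
fp = retrieved.count('N')        # non-relevant but retrieved
fn = total_relevant - tp         # relevant but not retrieved
tn = total_docs - tp - fp - fn   # non-relevant and not retrieved

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total_docs

print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 2))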

Exercise 1.2

Consider the following scenario: a database contains 10,000 documents in total, of which only 10 are relevant.
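Note how heavily imbalanced this collection is: a system that retrieves every document achieves perfect recall (10/10 = 1.00) but near-zero precision (10/10,000 = 0.001), while a system that retrieves nothing at all still reaches 99.9% accuracy (9,990/10,000). Keep this in mind when interpreting the metrics in such settings.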

Exercise 2: Evaluation of a POS tagger

In exercises 2.1–2.3, we evaluate the POS tagger based on a hidden Markov model (HMM) that you implemented on Day 3.

Today, we will again use the Penn Treebank corpus from yesterday. You should already have downloaded it using:

import nltk
nltk.download('treebank')

We use 80% of the sentences for training and the remaining 20% for testing.
The following code splits the corpus into training and test sentences, and collects the test tokens and their correct tags into separate lists.

Download ass5utils.py into the same directory as your source code, then train the HMM with training_sents, as in exercise 2 of Day 3.

from nltk.corpus import treebank
from nltk.tag.hmm import HiddenMarkovModelTagger
from ass5utils import split_corpus

# Use 80% of the sentences for training and the rest for testing.
training_sents, test_sents = split_corpus(treebank, 0.8)

# Flatten the test sentences into parallel lists of tokens and gold tags.
test_tokens = [t[0] for s in test_sents for t in s]
correct_tags = [t[1] for s in test_sents for t in s]

# Train the HMM tagger on the training sentences.
hmm_tagger = HiddenMarkovModelTagger.train(training_sents)

Exercise 2.1: Confusion matrix

Use the HMM to predict the tags for test_tokens. (If you’ve forgotten how to do this, refer back to your code from Day 3.)

Then, compute the confusion matrix between the predicted tags and correct_tags.
You can use the nltk.metrics.ConfusionMatrix class for this exercise.

(In the confusion matrix, rows are the correct tags and columns are the predicted tags. That is, an entry cm[correct_tag, predicted_tag] is the number of times a token with true tag correct_tag was tagged with predicted_tag.)
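As a minimal sketch, continuing the code above:

from nltk.metrics import ConfusionMatrix

# Tag the flattened test tokens and keep only the predicted tags.
predicted_tags = [tag for _, tag in hmm_tagger.tag(test_tokens)]

# Rows are the reference (correct) tags, columns the predicted tags.
cm = ConfusionMatrix(correct_tags, predicted_tags)
print(cm.pretty_format(sort_by_count=True, truncate=10))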

Exercise 2.2: Comparison with baselines

We would like to know whether the HMM tagger is any good compared to naive baselines.

Now, implement the following two baseline taggers:

Compute the overall accuracy of both baselines, and compare the values with the HMM.
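As an illustration only (the exercise's own baseline definitions take precedence), one common naive baseline tags every token with the single most frequent tag in the training data; majority_tag_baseline below is a hypothetical name:

from collections import Counter

def majority_tag_baseline(training_sents, tokens):
    # Find the most frequent tag in the training data and assign it to
    # every token. (A hypothetical naive baseline, for illustration.)
    tag_counts = Counter(tag for sent in training_sents for _, tag in sent)
    most_common_tag = tag_counts.most_common(1)[0][0]
    return [most_common_tag for _ in tokens]

baseline_tags = majority_tag_baseline(training_sents, test_tokens)
baseline_acc = sum(p == c for p, c in zip(baseline_tags, correct_tags)) / len(correct_tags)
print(round(baseline_acc, 2))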

Exercise 2.3: Evaluation of HMM language model

Recall exercise 5 on Day 3, where you used the HMM as a language model.

Again, use the log_probability() method of the HMM to compute the total log-probability of the test tokens. (The input should be given as (token, None) pairs.)
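One possible sketch, summing per-sentence log-probabilities (assuming the same per-sentence setup as on Day 3):

# With the tags set to None, NLTK uses the forward algorithm to sum
# over all possible tag sequences for each sentence.
total_logprob = 0.0
for sent in test_sents:
    unlabelled = [(token, None) for token, _ in sent]
    total_logprob += hmm_tagger.log_probability(unlabelled)
print(round(total_logprob, 2))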

Exercise 3: Text annotation

Consider the following sentences from the Penn Treebank corpus:

s1 = ['So', 'far', 'Mr.', 'Hahn', 'is', 'trying', 'to', 'entice', 'Nekoosa', 'into', 'negotiating', 'a',
      'friendly', 'surrender', 'while', 'talking', 'tough']
s2 = ['Despite', 'the', 'economic', 'slowdown', 'there', 'are', 'few', 'clear', 'signs', 'that', 'growth',
      'is', 'coming', 'to', 'a', 'halt']
s3 = ['The', 'real', 'battle', 'is', 'over', 'who', 'will', 'control', 'that', 'market', 'and', 'reap',
      'its', 'huge', 'rewards']

Exercise 3.1

Annotate the sentences with appropriate POS tags. The tagset is described in the Penn Treebank POS tagging guidelines.

(It is not the aim of the exercise to annotate exactly according to the guidelines, so simply make your best guess at the correct tag for each token.)

Exercise 3.2

The corresponding gold-standard tags of the sentences are below:

tags1 = ['IN', 'RB', 'NNP', 'NNP', 'VBZ', 'VBG', 'TO', 'VB', 'NNP', 'IN', 'VBG', 'DT', 'JJ', 'NN', 'IN', 'VBG', 'JJ']
tags2 = ['IN', 'DT', 'JJ', 'NN', 'EX', 'VBP', 'JJ', 'JJ', 'NNS', 'IN', 'NN', 'VBZ', 'VBG', 'TO', 'DT', 'NN']
tags3 = ['DT', 'JJ', 'NN', 'VBZ', 'IN', 'WP', 'MD', 'VB', 'DT', 'NN', 'CC', 'VB', 'PRP$', 'JJ', 'NNS']
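If you want to check your exercise 3.1 annotation against these tags, a simple agreement computation could look like this (my_tags1 is a hypothetical name for your own annotation of s1):

def agreement(my_tags, gold_tags):
    # Fraction of positions where the two tag sequences agree.
    matches = sum(mine == gold for mine, gold in zip(my_tags, gold_tags))
    return round(matches / len(gold_tags), 2)

# e.g. print(agreement(my_tags1, tags1))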