NLP 2020

Assignments for NLP course, University of Helsinki

Final project: extending a system

In the final project, you should implement an extension to one of the systems that you produced in the assignments. Wherever appropriate, the assignment instructions contain a list of ideas for possible extensions.

You can choose what you build. You do not need to choose one of the suggested extensions. If you do, you don’t need to follow the instructions exactly. The main criterion is that you display an understanding of:


You should submit a short report (2-3 pages) containing the following:

The submission is due one week after the end of the course: 6.3.2020.

Also submit your code as a single Python file. Don’t worry too much about cleaning it up or submitting production-quality code! As with the assignments, we will not mark the code itself – it is only for reference when marking your report.

We will not be grading your submission on the basis of the success of your system. The main purpose of this assignment is for you to put into practice what you've learned and to gain a better understanding of the challenges of building successful, real-world NLP systems.

System ideas

Here are some ideas for NLP systems that could be built using the kind of components we have seen or developed. You could use these as the basis for your extension, or as inspiration: e.g. implement an extension to a tool that would make it more suitable for use in one of the systems described here.

We do not expect you to produce a full working system for any of these ideas.

Temporal information extraction

Here we can continue working on the Temporal Information Extraction problem, which we started in week 7. It is assumed that you have finished the W7 assignments and implemented two time-expression annotators: a regex-based one and a spaCy-based one.

You have the training and development datasets and the scorer, and know how the annotators perform in terms of recall, precision and F1-score. You have made a comparison of the annotators’ outputs and know their main strengths and weaknesses.

Now try to find a method that combines the advantages of both annotators.
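One simple way to combine the two annotators is to take the union of their outputs and, where spans overlap, keep the longer one. The sketch below assumes both annotators return character-offset spans `(start, end)` over the same text; the function name `combine_spans` and this span representation are assumptions, not part of the W7 code.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def combine_spans(regex_spans: List[Span], spacy_spans: List[Span]) -> List[Span]:
    """Union of two annotators' spans; on overlap, the longest span wins."""
    # Consider longer spans first so they win over overlapping shorter ones
    candidates = sorted(set(regex_spans) | set(spacy_spans),
                        key=lambda s: (-(s[1] - s[0]), s[0]))
    chosen: List[Span] = []
    for start, end in candidates:
        # Two half-open spans are disjoint if one ends before the other starts
        if all(end <= s or start >= e for s, e in chosen):
            chosen.append((start, end))
    return sorted(chosen)
```

This is only one possible strategy; preferring one annotator for particular kinds of expressions, for example, may score better, and the scorer should decide.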

The following instructions will take you through some further ways to develop your system. You are welcome to choose a different approach.

The best ever temporal expression annotator

You may use any other technology, but try to reuse the code you wrote for Exercises 1 and 2.

Repeat this process until you are satisfied with your scores.

Does it perform better than the other two? If not, try to identify the problem and modify your annotator.


By this point you have seen a number of the scorer outputs.

You will most probably need to modify a function called update_scores.
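The course scorer's update_scores is not reproduced here, so the following is only a standalone sketch of what a second, more lenient scorer could compute: a span counts as correct if it overlaps a span on the other side, instead of requiring an exact match. The function names and the `(start, end)` span representation are assumptions.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def prf(gold: Iterable[Span], pred: Iterable[Span], lenient: bool = False):
    """Return (precision, recall, F1) under exact or overlap-based matching."""
    gold, pred = list(gold), list(pred)
    match = overlaps if lenient else (lambda a, b: a == b)
    # Precision: fraction of predicted spans that match some gold span
    tp_pred = sum(any(match(p, g) for g in gold) for p in pred)
    # Recall: fraction of gold spans matched by some predicted span
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Comparing the strict and lenient numbers for the same annotator shows how many of its errors are boundary errors rather than missed or spurious expressions.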

Results and evaluation

Now you have three annotators and two scorers. Let’s test them on unseen data.

The data used in this assignment are taken from the TempEval-3 Temporal Annotation Shared Task. The shared task used a much more elaborate annotation schema and consisted of several sub-tasks. More details on the tasks and the results can be found in the organizers' paper.

Metaphor Generation System

The goal of this task is to use knowledge bases of nouns along with their stereotypical adjectival properties (i.e. adjectives that are strongly associated with the nouns) to generate metaphorical expressions.

You can use the knowledge bases provided in Prosecco Network's GitHub repository. Read the README file for descriptions of each file. Alternatively, you can use other resources (e.g. word embedding models) to obtain similar relations.

In the resources, the file expanded_weights.txt contains inferred stereotypical relations, which makes it more extensive (see this site for an interactive interface for expanding a seed list of properties). To download the knowledge base, execute the following command:


You can read and parse the knowledge base using the code below:

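```python
import io
import re
from collections import defaultdict


def parse_members_rex_weights(file):
    """Parse the `expanded_weights.txt` file and return a dictionary of the form
    {NOUN: {PROPERTY: WEIGHT, ...}, ...}.
    """
    weights = defaultdict(lambda: defaultdict(float))
    # The regex expects lines like: "<rank>. <noun> [<property>(<weight>), <property>(<weight>), ...]"
    rg_ptrn = re.compile(r"^(\d+)\. ([\d\w_\.\-\"\']+) \[(.*)\]$", re.UNICODE | re.IGNORECASE)
    with io.open(file, 'r', encoding='utf-8') as inp_file:
        for line in inp_file:
            match = rg_ptrn.match(line)
            if match is None:  # skip lines that do not follow the expected format
                continue
            matched = match.groups()
            member = matched[1].replace('_', ' ')
            # "prop(weight),prop(weight)" -> [["prop", "weight"], ["prop", "weight"]]
            properties = map(lambda p: p.replace(')', '').split('('), matched[2].replace(' ', '').split(','))
            # Drop the last "_"-separated part of each property name and convert the weight to float
            properties = map(lambda x: (' '.join(x[0].split('_')[:-1]), float(x[1])), properties)
            weights[member] = dict(properties)
    return weights


stereotypes = parse_members_rex_weights('expanded_weights.txt')
```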

Now you can use such a knowledge base to create metaphors, similes and analogies. For instance, to produce a metaphor that highlights that someone is very brave, you need to find a noun that is well known to be brave. A simple way to do that is:

```python
brave_nouns = [(noun, properties['brave']) for noun, properties in stereotypes.items() if 'brave' in properties]  # get brave nouns
brave_nouns = sorted(brave_nouns, key=lambda k: k[1], reverse=True)  # sort them by strength of association
print(brave_nouns)  # [('Hero', 1778.0), ('Warrior', 1491.0), ...]
```

With this knowledge, you can construct some metaphorical templates (e.g. “X is Y”) and fill them dynamically (refer to Day 6 - Exercise 2: Very simple NLG) with the knowledge you have depending on the context. Following the earlier example, you can produce metaphorical expressions like “X is a hero”, “X is as brave as a hero”, “Like a hero, X rescued the dog.” and so on.
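A minimal sketch of filling two of the templates above, assuming `stereotypes` has the `{noun: {property: weight}}` shape returned by the parser. The names `strongest_noun` and `fill_templates` are hypothetical, and the a/an rule is a crude assumption (it looks only at the first letter, so it gets words like "hour" wrong).

```python
from typing import Dict, List, Optional

KB = Dict[str, Dict[str, float]]  # noun -> {property: weight}


def strongest_noun(prop: str, stereotypes: KB) -> Optional[str]:
    """Return the noun most strongly associated with `prop`, or None."""
    candidates = [(noun, props[prop]) for noun, props in stereotypes.items() if prop in props]
    return max(candidates, key=lambda c: c[1])[0] if candidates else None


def fill_templates(x: str, prop: str, stereotypes: KB) -> List[str]:
    """Fill "X is a Y" and "X is as PROP as a Y" with the strongest stereotype Y."""
    noun = strongest_noun(prop, stereotypes)
    if noun is None:
        return []
    y = noun.lower()  # the knowledge base capitalizes nouns, e.g. "Hero"
    article = "an" if y[0] in "aeiou" else "a"
    return [f"{x} is {article} {y}.", f"{x} is as {prop} as {article} {y}."]
```

Lower-casing the noun suits common nouns such as "Hero" but would be wrong for proper names in the knowledge base.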

Write a function that accepts a text and performs some analyses (e.g. POS tagging, dependency parsing, named entity recognition). If a sentence contains a noun and an adjective modified by an adverb (dependency relation advmod), the function should remove the adverb and then inject a metaphorical expression that fits the context, for example after the noun or at the beginning or end of the sentence. Don't overcomplicate the system; just build a very simple proof of concept.
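The proof of concept could look roughly like the following sketch. To keep it independent of a particular parser, it works on a list of `Tok` records (a hypothetical type holding the token text, coarse POS tag, dependency label and head index); with spaCy these would correspond to `token.text`, `token.pos_`, `token.dep_` and `token.head.i`, although the exact labels depend on the model. It removes the first advmod adverb attached to an adjective and puts a simile at the start of the sentence.

```python
from typing import Dict, List, NamedTuple, Optional


class Tok(NamedTuple):
    text: str
    pos: str   # coarse POS tag, e.g. "NOUN", "ADJ", "ADV"
    dep: str   # dependency label, e.g. "advmod"
    head: int  # index of the syntactic head within the token list


def inject_metaphor(tokens: List[Tok], stereotypes: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Drop an adverb modifying an adjective and prepend "Like a <noun>, ..."."""
    if not any(t.pos in ("NOUN", "PROPN") for t in tokens):
        return None
    for i, tok in enumerate(tokens):
        head = tokens[tok.head]
        if tok.dep != "advmod" or head.pos != "ADJ":
            continue
        adjective = head.text.lower()
        candidates = [(n, p[adjective]) for n, p in stereotypes.items() if adjective in p]
        if not candidates:
            continue
        vehicle = max(candidates, key=lambda c: c[1])[0].lower()
        kept = [t for j, t in enumerate(tokens) if j != i]  # sentence without the adverb
        words = [t.text for t in kept]
        if kept[0].pos != "PROPN":
            words[0] = words[0].lower()  # it no longer starts the sentence
        end = words.pop() if kept[-1].pos == "PUNCT" else ""
        article = "an" if vehicle[0] in "aeiou" else "a"
        return f"Like {article} {vehicle}, {' '.join(words)}{end}"
    return None
```

For "The firefighter was very brave." this yields "Like a hero, the firefighter was brave." given the brave nouns listed above. Detokenization is naive (tokens are joined with spaces), which is acceptable for a proof of concept.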


Pun Generation System

The goal of this exercise is to extend the pun generator you have implemented during day 2 (feel free to use the model solution). Below are some ideas for improving it (you are free to improve it differently as long as you motivate your choices):

Text summarization

Text summarization accepts as input an arbitrary text (e.g. news article, conversation) and produces its short summary.

Standard NLP components:

Additionally, we can add speech recognition as the first step.

Additional components:

Potential problems:

Challenge: How to choose what is important in the text?

Of course, this is a big task and will take a long time to do well. But it could be very interesting to see how far you can get with just a few analysis tools and simple strategies for choosing important phrases, constructing the summary, etc.
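As a starting point, here is a sketch of one classic simple strategy, frequency-based extractive summarization: score each sentence by the average frequency of its non-stopword tokens in the whole text, and keep the top-scoring sentences in their original order. It assumes sentence splitting has already been done; the tiny stopword list and the regex tokenizer are placeholders for the tokenizer and lemmatizer in your own pipeline.

```python
import re
from collections import Counter
from typing import List

# Placeholder stopword list; a real system would use a fuller one
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "was", "it", "that", "for"}


def tokenize(sentence: str) -> List[str]:
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(sentences: List[str], n: int = 2) -> List[str]:
    """Return the n highest-scoring sentences, in their original order."""
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(s: str) -> float:
        words = tokenize(s)
        # Average rather than total frequency, so long sentences are not automatically favoured
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]
```

Replacing the regex tokenizer with lemmas, or only counting nouns and named entities, are natural ways to use the standard components listed above.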

Language generation

A system for generating text.

  1. A large corpus is split into sentences.
  2. Tokenization
  3. Lemmatization
  4. Build a generative language model (e.g. a Markov chain)

The generative model will probably output ungrammatical language and nonsense. The quality of the model depends on the corpus: a more complex model and a domain-specific corpus are needed for passable results.
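Step 4 can be as simple as a bigram Markov chain. The sketch below assumes steps 1 to 3 have already produced a list of token (or lemma) lists, one per sentence; the names `train_chain` and `generate` are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List

START, END = "<s>", "</s>"  # sentence boundary markers


def train_chain(sentences: List[List[str]]) -> Dict[str, List[str]]:
    """Map each token to the list of tokens observed right after it."""
    chain: Dict[str, List[str]] = defaultdict(list)
    for tokens in sentences:
        padded = [START] + tokens + [END]
        for prev, nxt in zip(padded, padded[1:]):
            chain[prev].append(nxt)  # duplicates kept, so frequent successors are sampled more often
    return chain


def generate(chain: Dict[str, List[str]], rng: random.Random, max_len: int = 20) -> List[str]:
    """Sample a sentence by walking the chain from START until END or max_len tokens."""
    words: List[str] = []
    current = START
    while len(words) < max_len:
        current = rng.choice(chain[current])
        if current == END:
            break
        words.append(current)
    return words


corpus = [["the", "cat", "sat", "on", "the", "mat"], ["the", "dog", "sat", "down"]]
print(" ".join(generate(train_chain(corpus), random.Random(0))))
```

Conditioning on the previous two or three tokens instead of one is the obvious next step, at the cost of needing a much larger corpus.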

Interesting things you could explore/learn from this:

News tagging

The system gives tags to news articles.

  1. Sentence splitting
  2. Tokenization
  3. Lemmatization
  4. POS
  5. NER
  6. Topic modelling?

From NER we can simply take some of the named entities as tags. Topic modelling could also be used.
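A sketch of taking named entities as tags: count the entities of selected types and keep the most frequent ones. It assumes the NER step yields `(text, label)` pairs; with spaCy these would be `(ent.text, ent.label_)` for each `ent` in `doc.ents`. The set of labels kept is an assumption to tune for your data.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Entity types assumed to make useful tags (spaCy-style labels)
TAG_LABELS = {"PERSON", "ORG", "GPE", "NORP", "EVENT"}


def entity_tags(entities: Iterable[Tuple[str, str]], k: int = 5) -> List[str]:
    """Return up to k of the most frequent entity strings whose label is in TAG_LABELS."""
    counts = Counter(text.strip() for text, label in entities if label in TAG_LABELS)
    return [tag for tag, _ in counts.most_common(k)]
```

Variants of the same entity (e.g. inflected forms) are counted separately here; lemmatizing entity strings first is one way to merge them.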

If lemmatization or NER goes wrong (for example, if NER fails to recognize named entities), we have problems: we might get useless tags. We need some way to figure out which tags are important.

You could perform some manual error analysis of the output and assess where the system is going wrong and how the tagging algorithm could produce more useful output.

If you have a dataset with manually assigned tags, you could instead train a classifier to assign tags, using features from your pipeline. You will probably find some suitable datasets in one of the lists below.

There are many possible datasets you could try this out on. Here are some ideas:

Recognizing politicians' standpoints through their Twitter messages


  1. Extract English data from the Twitter messages of politicians running for the European Parliament
  2. Tokenization
  3. Lemmatization
  4. POS tagging
  5. NER (for example EU, political parties)
  6. Parsing
  7. Sentence-level semantics
  8. Semantic role labeling (finding the relation, i.e. the standpoint: for example, for or against same-sex marriage?)
  9. Analyzing the level of agreement / multi-level classification (for example: very much against, against, neutral, okay with it, for it)

Potential problems:

Some further problems for this assignment:

  1. The above pipeline is not easy to implement quickly (or at all…?)
  2. Labelled data may be possible to retrieve (e.g. from Vaalikone), but preparing the data may take a lot of time.

Regarding 1: Think about simpler, shallower methods you could try. E.g. some features from lower-level processing could be fed into a classifier, instead of relying on more abstract analysis.
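As an example of a shallow method, a multinomial Naive Bayes classifier over bag-of-lemma features needs nothing beyond the lower-level pipeline. This is a plain-Python sketch with add-one smoothing; the class name and the labels in the example are hypothetical, and in practice a library implementation such as scikit-learn's MultinomialNB would be the usual choice.

```python
import math
from collections import Counter, defaultdict
from typing import List


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over token lists."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        return self

    def predict(self, doc: List[str]) -> str:
        n_docs = sum(self.label_counts.values())

        def log_prob(label: str) -> float:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of log P(token | label); tokens unseen in training are skipped
            return math.log(self.label_counts[label] / n_docs) + sum(
                math.log((counts[w] + 1) / denom) for w in doc if w in self.vocab)

        return max(self.label_counts, key=log_prob)
```

The input documents would be the lemmatized tweets; adding POS-filtered lemmas or entity types as extra "tokens" is a cheap way to feed in other lower-level features.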

Alternatively, you could try some ready-made systems for abstract analysis of English (e.g. SRL) and look for relatively reliable signals of stance in their output.

Regarding 2: Perhaps there’s a related task for which data is more easily available, which could be seen as a test case or proof of concept for the task above. Take a look at this list, for example.

Automatic medical diagnosis system

A user calls a medical hotline and describes their symptoms. The system tries to guess the nature of the illness based on the description.

Pipeline components:

For the purposes of this assignment, you probably want to drop the speech-related components at the start and end.

Think carefully about what components are necessary and how you will use their output further down the pipeline.

A crucial factor in this system will be the knowledge resources that supply the medical information. Here are a couple you could consider:

However, avoid spending all your time scraping websites! An easily available, poor-quality knowledge base will be most useful as a proof of concept. You can develop your system to use better medical databases later, before releasing it for use by the medical profession or selling it to make millions.
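With even a crude knowledge base, the core of the system can be a ranking of conditions by how well their listed symptoms overlap with the symptoms extracted from the caller's description. The sketch below uses Jaccard similarity over symptom sets; the knowledge-base format (condition name mapped to a set of symptom strings) and the function name are assumptions, and extracting normalized symptom strings from the description is left to the earlier pipeline components.

```python
from typing import Dict, List, Set, Tuple


def rank_conditions(symptoms: Set[str], kb: Dict[str, Set[str]], k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k conditions whose symptom sets best match `symptoms` (Jaccard similarity)."""
    scores = []
    for condition, known in kb.items():
        union = symptoms | known
        score = len(symptoms & known) / len(union) if union else 0.0
        if score > 0:
            scores.append((condition, score))
    # Highest similarity first; ties broken alphabetically for reproducible output
    scores.sort(key=lambda cs: (-cs[1], cs[0]))
    return scores[:k]
```

Exact string matching is the weak point here: "high temperature" and "fever" will not match unless the pipeline maps both to the same normalized symptom.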