Each of these is expanded below.
In this step, we'll use the nltk.metrics.agreement module, which is partly documented here. This module takes in data in the form of a list of triples, where each triple contains an annotator label, an item label and a tag. For example, the following is a snippet of what our data could look like:
[['1', 5723, 'ORG'], ['2', 5723, 'ORG'],
 ['1', 55829, 'LOC'], ['2', 55829, 'LOC'],
 ['1', 259742, 'PER'], ['2', 259742, 'LOC'],
 ['1', 269340, 'PER'], ['2', 269340, 'LOC']]
Here we have four items, each labeled by two different annotators. In two cases, the annotators agree. In two cases they don't.
Using the Python interpreter and the nltk metrics package, calculate inter-annotator agreement (both kappa and alpha) for this example. Note that AnnotationTask is a type of object, with methods kappa() and alpha(). When you call nltk.metrics.AnnotationTask() it returns an object of that type, which in the example below is stored in the variable task.
import nltk
toy_data = [['1', 5723, 'ORG'], ['2', 5723, 'ORG'],
            ['1', 55829, 'LOC'], ['2', 55829, 'LOC'],
            ['1', 259742, 'PER'], ['2', 259742, 'LOC'],
            ['1', 269340, 'PER'], ['2', 269340, 'LOC']]
task = nltk.metrics.agreement.AnnotationTask(data=toy_data)
task.kappa()
task.alpha()
The nltk metrics package also supports calculating and printing confusion matrices, a way of displaying which labels were 'mistaken' for which other ones. Unfortunately, this functionality requires a different format for the input: it wants two parallel lists of labels, one per annotator, in the same item order.
import nltk   # Don't need to do this twice in the same Python session
toy1 = ['ORG', 'LOC', 'PER', 'PER']
toy2 = ['ORG', 'LOC', 'LOC', 'LOC']
cm = nltk.metrics.ConfusionMatrix(toy1, toy2)
print(cm)
The rest of the lab consists of writing a script that takes the two .eaf files, extracts the annotations from them, formats them as nltk.metrics.agreement expects, and calculates the two measures. In addition, this script will print out the points of disagreement so you can examine them by hand.
Trade .eaf files with your practicum group partner, so that you have two. Open these up with a text editor, and look at their structure. Note what kind of information is where in the file, and what happened with words that didn't get MUC annotations.
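If it helps while you are looking at the file structure, here is a minimal sketch of how xml.etree can pull labels out of one tier of an .eaf file. This is not the starter script: the tier name 'MUC', the reliance on REF_ANNOTATION elements, and the file name are assumptions, and your files may organize things differently.

import xml.etree.ElementTree as ET

# Sketch only: assumes a tier called 'MUC' whose annotations are
# REF_ANNOTATION elements pointing back at word annotations.
tree = ET.parse('annotator1.eaf')
root = tree.getroot()

labels = {}
for tier in root.iter('TIER'):
    if tier.get('TIER_ID') != 'MUC':
        continue
    for ann in tier.iter('REF_ANNOTATION'):
        item_id = ann.get('ANNOTATION_REF')                  # ID of the word this tag attaches to
        labels[item_id] = ann.findtext('ANNOTATION_VALUE')   # the MUC label, e.g. 'ORG'

print(labels)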
Download the starter script (calculate-iaa.py), and read through the comments and existing code to get a sense of what the script is doing, and what it is asking you to do. As before, the symbol
# ~*~
indicates a place where you need to fill in code to implement what's in the comment above.
The main subtasks of the script are as follows:
- Extract the annotations from each of the two .eaf files.
- Format the extracted annotations the way nltk.metrics.agreement (and the confusion matrix) expects.
- Calculate kappa and alpha and print the confusion matrix.
- Print out the points of disagreement so they can be examined by hand.
Since we need to extract the information from two files the same way, and since we need to format the information for three different tasks, the first two subtasks above are conceptualized as subroutines in the model script. There are a couple of other subroutines to define at the top of the script. You can wait to define those until you've reached the portion of the main body of the script that uses them.
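As an illustration of the formatting step, a subroutine along these lines would do the job. This is only a sketch, assuming the extraction step gives you one dictionary per annotator mapping item IDs to labels (like the labels dictionary sketched above); the names format_for_nltk, labels1, and labels2 are made up and need not match the starter script.

def format_for_nltk(labels1, labels2):
    """Turn two {item_id: label} dicts into the [annotator, item, label]
    triples that nltk.metrics.agreement.AnnotationTask expects."""
    data = []
    for item in sorted(labels1.keys()):
        if item in labels2:                    # keep only items both annotators labeled
            data = data + [['1', item, labels1[item]]]
            data = data + [['2', item, labels2[item]]]
    return data

# e.g. format_for_nltk({5723: 'ORG'}, {5723: 'ORG'})
# -> [['1', 5723, 'ORG'], ['2', 5723, 'ORG']]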
This script makes heavy use of xml.etree for handling XML, but the assignment does not ask you to modify that part of the code.
It also uses nltk.metrics. In addition to the notes above, the final thing you need to know is how to get the confusion matrix to print with the write() method on the output file object. Confusion matrix objects have a pp() method ("pretty print") which works together with the write method as follows:
outfile.write(cm.pp())
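Put together, the output portion of the script can look roughly like this (outfile, task, and cm are illustrative names, not necessarily the ones used in the starter script):

outfile = open('iaa-results.txt', 'w')
outfile.write('kappa: ' + str(task.kappa()) + '\n')
outfile.write('alpha: ' + str(task.alpha()) + '\n')
outfile.write(cm.pp())
outfile.close()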
Appending to a list: You can concatenate two lists like so:
list1 = [1, 2, 3]
list2 = [4, 5, 6]
list1 = list1 + list2
To append a single element to a list, make a list of it first:
list1 = list1 + [7]
Getting the keys to a dictionary, sorted: The sorted() function is helpful here:
dict1 = {1: 'a', 5: 'b', 2: 'c', 9: 'a'}
sorted(dict1.keys())
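This idiom is also handy for printing the points of disagreement in item order. A minimal sketch, reusing the hypothetical labels1, labels2, and outfile names from the sketches above:

for item in sorted(labels1.keys()):
    if item in labels2 and labels1[item] != labels2[item]:
        outfile.write(str(item) + ': ' + labels1[item] + ' vs. ' + labels2[item] + '\n')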