University of Washington: Linguistics: Ling 573: Spring 2017: Deliverable #2
Ling 573 - Natural Language Processing Systems and Applications
Spring 2017
Deliverable #2: Base End-to-End Summarization System
Code and Results due: April 23, 2017: 23:59
Updated Project Report due: April 25, 2017: 09:00
Goals
In this deliverable, you will implement a base end-to-end summarization
system. You will:
- Create an end-to-end summarization system going from topic-oriented document clusters to short summaries.
- Implement a content selection component, based on techniques discussed in class and in the readings to identify salient content.
- Identify the resources - software and corpus - that can support this task.
Base end-to-end system
For this deliverable, you will need to implement a base end-to-end system. You should build on approaches presented in class and readings.
You may implement any effective strategy, but you are
encouraged to implement an extractive summarization strategy.
Your system should include:
- Content selection, to determine which information to include in the summary;
- Information ordering, to organize the selected content; and
- Content realization, to create your output summary.
Since this is an initial system, it is not expected that your system will be
as elaborate as those presented in class. You should concentrate on "connectivity first": get the system to work end-to-end first, and then work on refinements.
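As one possible starting point for the content-selection component (not a required design), a minimal frequency-based extractive selector in the spirit of SumBasic might look like the sketch below. All function and variable names here are illustrative and not part of any provided code.

```python
import re
from collections import Counter

def select_sentences(sentences, max_words=100):
    """Greedy frequency-based extractive content selection (SumBasic-style).

    Scores each sentence by the average probability of its words,
    picks the best, discounts the chosen words to reduce redundancy,
    and repeats until the 100-word budget is exhausted.
    """
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    summary, used = [], 0
    remaining = list(range(len(sentences)))
    while remaining:
        # Average word probability, so short sentences are not unduly favored.
        scores = {i: sum(prob[w] for w in tokenized[i]) / max(len(tokenized[i]), 1)
                  for i in remaining}
        best = max(scores, key=scores.get)
        n_words = len(sentences[best].split())
        if used + n_words <= max_words:
            summary.append(sentences[best])
            used += n_words
            # Squaring the probabilities of used words discourages redundancy.
            for w in tokenized[best]:
                prob[w] = prob[w] ** 2
        remaining.remove(best)
    return summary
```

A selector like this gives you "connectivity first": it produces a legal summary immediately, and the scoring function can be swapped out later without touching the rest of the pipeline.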
High-level System Behavior
We will be focusing on the TAC (Text Analysis Conference) summarization shared task.
Your system will:
- Perform multi-document summarization:
- Over newswire document sets, where each set is associated with a topic, as specified below.
- For each document set:
- Produce one high quality text summary.
- Evaluate the summaries output by your system with respect to human model summaries, using the standard ROUGE metrics.
Document Sets
Document sets to be summarized are provided in NIST standard XML files, identifying a set of topics consisting of:
- topic title,
- (in some files) topic narrative,
- docsetA, and
- docsetB, where
- docsets provide sets of document ids, referring to documents in the AQUAINT and AQUAINT-2 corpora, available on patas.
- Document IDs specify:
- publication source: e.g. APW, NYT
- publication date: as YYYYMMDD
- detail specifier: a digit sequence
NOTE: You should only evaluate your system on
the 'A' document sets. The 'B' sets were designed to evaluate so-called
"update" summaries.
NOTE: The format of the document IDs and the organization of the two corpora exhibit some differences. You may include/hardcode information
about this structure in your system directly, or in a configuration
file as you choose.
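As a sketch of the kind of ID handling you might hardcode, the parser below assumes two illustrative ID shapes: an AQUAINT-style ID such as APW19980601.0007 and an AQUAINT-2-style ID such as APW_ENG_20041001.0001. Verify the exact patterns against the corpora on patas before relying on them.

```python
import re

# Assumed ID shapes (check against the actual corpora):
#   AQUAINT:   APW19980601.0007
#   AQUAINT-2: APW_ENG_20041001.0001
DOC_ID_RE = re.compile(
    r"^(?P<source>[A-Z]+)(?:_(?P<lang>[A-Z]+))?_?(?P<date>\d{8})\.(?P<detail>\d+)$"
)

def parse_doc_id(doc_id):
    """Split a document ID into publication source, YYYYMMDD date, and detail specifier."""
    m = DOC_ID_RE.match(doc_id)
    if m is None:
        raise ValueError(f"unrecognized document ID: {doc_id}")
    return m.group("source"), m.group("date"), m.group("detail")
```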
Summary Outputs
Your system should produce one summary output file per document set (topic),
structured as described below:
- Each summary can be no longer than 100 words (whitespace-delimited tokens). Summaries over the size limit will be truncated.
- Each summary should be well-organized, in English, using complete sentences. It should have one sentence per line. (Other formats can be used, but require modifications to the scoring configuration.)
A blank line may be used to separate paragraphs, but no other formatting is allowed (such as bulleted points, tables, bold-face type, etc.).
- Summaries should be based only on the 'A' group of documents for each
of the topics in the specification file.
- All processing of documents and generation of summaries must be automatic.
- Please include a file for each summary, even if the file is empty.
- Each file will be read and assessed as a plain text file, so no special characters or markups are allowed.
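A minimal writer that respects the constraints above (plain text, one sentence per line, at most 100 whitespace-delimited tokens) might look like this; the function name is illustrative.

```python
def write_summary(path, sentences, max_words=100):
    """Write one summary file: plain text, one sentence per line.

    Drops trailing sentences that would push past the 100-token limit,
    since anything over the limit is truncated at evaluation time anyway.
    """
    kept, used = [], 0
    for sent in sentences:
        n = len(sent.split())
        if used + n > max_words:
            break
        kept.append(sent)
        used += n
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(kept) + ("\n" if kept else ""))
```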
Evaluation
You will employ the standard automatic ROUGE method to evaluate
the results from your base end-to-end summarization system.
- You should provide results for ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-4, which have, in aggregate, been shown to correlate well with human assessments of responsiveness. This can be done with the "-n 4" switch in ROUGE.
Code implementing the ROUGE metric
is provided in /dropbox/16-17/573/code/ROUGE/ROUGE-1.5.5.pl. Example
configuration files are given. You will need to modify the configuration
file to reference your own system's summary output.
- You will need to change the "PEER-ROOT" to point to your own outputs.
- You will also need to adjust the "PEERS" filenames to handle differences in file naming.
- If you choose to develop on an alternative data set, you will need to make similar changes to the "MODEL" specifications.
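For orientation, a single evaluation entry in a ROUGE-1.5.5 configuration file typically has the shape sketched below; the element names should be checked against the provided rouge_run_ex.xml, and all paths and file names here are placeholders for your own.

```xml
<ROUGE-EVAL version="1.0">
  <EVAL ID="D0901-A">
    <PEER-ROOT>/path/to/your/outputs/D2</PEER-ROOT>
    <MODEL-ROOT>/dropbox/16-17/573/Data/models/devtest</MODEL-ROOT>
    <INPUT-FORMAT TYPE="SPL" />
    <PEERS>
      <P ID="1">D0901-A.M.100.A.1</P>
    </PEERS>
    <MODELS>
      <M ID="A">D0901-A.M.100.A.A</M>
    </MODELS>
  </EVAL>
</ROUGE-EVAL>
```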
- You should use the following flag settings for your official evaluation runs:
-e ROUGE_DATA_DIR -a -n 4 -x -m -c 95 -r 1000 -f A -p 0.5 -t 0 -l 100 -s -d CONFIG_FILE_WITH_PATH
- where, ROUGE_DATA_DIR is /dropbox/16-17/573/code/ROUGE/data
- CONFIG_FILE_WITH_PATH is the location of your revised configuration file
- Output is written to standard output by default.
- Further usage information can be found using the -H flag or invoking ROUGE with
no parameters.
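Putting the pieces together, an official evaluation run with the flag settings above could be invoked as follows; the configuration-file path and redirect target are placeholders for your own locations.

```sh
/dropbox/16-17/573/code/ROUGE/ROUGE-1.5.5.pl \
    -e /dropbox/16-17/573/code/ROUGE/data \
    -a -n 4 -x -m -c 95 -r 1000 -f A -p 0.5 -t 0 -l 100 -s \
    -d /path/to/your/rouge_config.xml \
    > results/D2_rouge_scores.out
```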
Files
Training, Development Test, and Example Files
We will use
one year's data as development test data (devtest) for most of the term, and then use a new unseen
year's data as final evaltest in the last deliverable.
Primary Document Collections
The AQUAINT and AQUAINT-2 Corpora have been employed as the document
collections for the summarization task for a number of years,
and will form the basis of summarization for this deliverable.
The collections can be found on patas in
- /corpora/LDC/LDC02T31/ (AQUAINT, 1996-2000) and
- /corpora/LDC/LDC08T25/ (AQUAINT-2, 2004-2006).
Core Files
The core training, development test, and evaluation files can be found in /dropbox/16-17/573/ on the CL cluster.
- /dropbox/16-17/573/Data/Documents/<file_set_type>/*.xml: All document set specification files
- /dropbox/16-17/573/Data/models/<file_set_type>/*: All human-created gold-standard model summary files. The training data sets are further placed in subdirectories by year.
- /dropbox/16-17/573/Data/peers/<file_set_type>/*: All automatically created official submission and baseline system summary files for the corresponding Shared Task event.
<file_set_type> ranges over:
- training
- devtest
- evaltest (later)
Training Data
You may use any of the DUC or TAC summarization data through 2009
for training and developing your system. For previous years, there are prepared document sets and model summaries
to allow you to train and tune your
summarization system.
Development Test Data
For Deliverables 2 and 3, you should evaluate on the TAC-2010 topic-oriented document sets and their corresponding model summaries. You should only evaluate your system on
the 'A' sets. Development test data appears in the devtest subdirectories.
Evaluation Example Files
A variety of example files are provided to help you familiarize yourself
with the ROUGE evaluation software.
- /dropbox/16-17/573/code/ROUGE/ROUGE-1.5.5.pl: Script implementing the ROUGE evaluation measure.
- /dropbox/16-17/573/code/ROUGE/rouge_run_ex.xml: Example configuration file to be used with the ROUGE script. The directories and file names for the model files are set correctly for the
TAC 2010 evaluation. You should point them to alternative files/directories
if you wish to use other data, such as the 2009 data.
- /dropbox/16-17/573/code/ROUGE/rouge_example.out: Output of ROUGE evaluation script using example configuration on example summaries specified below.
- /dropbox/16-17/573/Data/mydata/*: Example summary files for practice runs of the evaluation scripts.
Specific Submission Files
In addition to your source code and resources needed to support your system,
your repository should include the following:
- DX.cmd: Top-level Condor file, where X is the number of the deliverable, here "2" (i.e., D2.cmd).
- README: File explaining anything we'll need to know
to be able to run and review your system.
Your System Generated Summarization Files
- .../outputs/D2/: directory containing the summaries based on running your base summarization system on
the devtest data files.
You should name your output files as:
- Given a topic ID, e.g. D0901A
- Split into:
- id_part1 = D0901, and
- id_part2 = A
- Output file name should be:
[id_part1]-A.M.100.[id_part2].[some_unique_alphanum]
The names must match the peer file names in your ROUGE configuration file.
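The naming scheme above can be sketched as a small helper; the function name and the default unique suffix are illustrative (the unique alphanumeric tag is your team's choice).

```python
def output_filename(topic_id, unique_id="1"):
    """Build the required output file name from a topic ID like 'D0901A'.

    Splits the ID into its leading part ('D0901') and trailing letter ('A'),
    then assembles [id_part1]-A.M.100.[id_part2].[some_unique_alphanum].
    """
    id_part1, id_part2 = topic_id[:-1], topic_id[-1]
    return f"{id_part1}-A.M.100.{id_part2}.{unique_id}"
```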
- .../results/D2_rouge_scores.out: file containing scores
from running ROUGE evaluation on the summaries from your base summarization system on
the devtest data files.
Extended Project Report
.../doc/D2.pdf: This extended version should include content for all the sections of the
report (no more lorem ipsums), though some of it will not be very
detailed yet. You should specifically focus on the following:
- System overview
- System architecture (NOTE: We expect an architecture diagram similar to
that from the course notes, with more detail in the subcomponents. )
- Approach
- Content Selection
- Information Ordering
- Content Realization
- Results
- Base results: this subsection should describe the results of your base system.
- You should include a table presenting the ROUGE-1, ROUGE-2, ROUGE-3, and
ROUGE-4 scores of your system. You should present ROUGE recall values: the "R" column in the ROUGE evaluation output.
- Discussion
- You should also present some error analysis to help motivate future improvements.
Presentation
../doc/D2_presentation.{pdf|pptx|etc}: Your presentation may be prepared in any computer-projectable format,
including HTML, PDF, PPT, and Word. Your presentation should take
about 10 minutes to cover your main content, including:
- System architecture
- Content selection
- Information Ordering
- Content Realization
- Issues and successes
- Related reading which influenced your approach
Your presentation should be deposited in your doc directory,
but it is not due until the actual presentation time. You may continue
working on it after the main deliverable is due.
Summary
- Finish coding and document all code.
- Verify that all code runs successfully on patas using Condor.
- Add any specific execution or other notes to the README.
- Create your D2.pdf and add it to the doc directory.
- Verify that all components have been added and any changes checked in.
- If using GIT, remember to tag your deliverable: D2, for the code/implementation, and D2.1 when you add your document.