Ling 573 - Natural Language Processing Systems and Applications
Spring 2015
Deliverable #4: Final Summarization Systems
Code and Results due: May 29, 2015: 23:59
Final Report due: June 9, 2015: 17:00
Goals
In this deliverable, you will complete development of your summarization
system. You will
- Refine and finalize your end-to-end summarization system.
- Improve content realization, for improved content or enhanced readability.
- Exploit information from any source to improve your overall system.
- Perform final evaluation on a held-out test set.
System Enhancement
This final deliverable must include substantive enhancements beyond your baseline system and further extensions over your D3 system.
Content Realization
For this deliverable, one focus will be on improving your
systems through enhanced content realization.
Content realization can address:
- sentence compression to remove extraneous content, either before or
after content selection, or
- sentence reformulation focusing on enhancing readability.
You may build on techniques presented in class, described in the reading
list, and proposed in other research articles.
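As one purely illustrative starting point for compression, a rule-based pass can strip parenthetical asides and sentence-initial discourse markers before or after selection. This sketch is not a recommended method; a serious system would operate over a parse tree rather than raw strings:

```python
import re

def compress(sentence):
    """Illustrative heuristic compression: drop parenthetical asides and
    a few sentence-initial discourse markers, then normalize whitespace."""
    s = re.sub(r"\s*\([^)]*\)", "", sentence)  # remove (parenthetical asides)
    s = re.sub(r"^(However|Moreover|Meanwhile|In addition),\s+", "", s)
    return re.sub(r"\s{2,}", " ", s).strip()
```

Rules like these trade recall of content for readability, so their effect shows up differently in ROUGE than in the manual readability evaluation.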
We will also be conducting a manual readability evaluation
in addition to the ROUGE content scoring, to give improvements in this
area due credit.
Data
We will be focusing on the TAC summarization shared task. Since this is the
final deliverable, you will evaluate not only on the 2010 devtest data you
have used all term, but also on held-out test data.
Document Collection
Devtest Corpus
The AQUAINT and AQUAINT-2 corpora were used for the summarization task for
a number of years and form the devtest corpus.
The collections can be found on patas in /corpora/LDC/LDC02T31/ (AQUAINT, 1996-2000) and /corpora/LDC/LDC08T25/ (AQUAINT-2, 2004-2006).
Evaluation Corpus
The held-out document sets for the final evaluation are drawn from the
English Gigaword corpus, from years 2007 and 2008. This collection may be found on patas in
/corpora/LDC/LDC11T07. (Note: Given the size of this corpus, it's
still fine if you use the main corpus as your background corpus.)
Training Data
You may use any of the DUC or TAC summarization data through 2009
for training and developing your system. For previous years, there are prepared document sets and model summaries
to allow you to train and tune your
summarization system.
All model files appear in /dropbox/14-15/573/Data/models.
All document specification files appear in /dropbox/14-15/573/Data/Documents.
Training data appear in the training subdirectories and devtest data in the devtest directory.
Development Test Data
You should evaluate on the TAC-2010 topic-oriented document sets and their corresponding model summaries. You should only evaluate your system on
the 'A' sets. Development test data appears in the devtest subdirectories.
Held-out Evaluation Test Data
You should also evaluate on the TAC-2011 topic-oriented document sets and
their corresponding model summaries, again only on the 'A' sets. This
evaluation test data appears in the evaltest subdirectories.
Scoring
You will employ the standard automatic ROUGE method to evaluate
the results from your summarization system.
- Evaluation results should be stored in your results directory.
- You will have two results files:
- The devtest results file should be named D4.devtest.results.
- The evaluation test results file should be named D4.evaltest.results.
- You should provide results for ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-4, which have, in aggregate, been shown to correlate well with human assessments of responsiveness. This can be done with the "-n 4" switch in ROUGE.
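For intuition, ROUGE-N is n-gram recall against the model (reference) summaries. The single-reference sketch below shows the core computation only; the provided ROUGE-1.5.5 script additionally handles multiple models, stemming, stopword removal, and length truncation:

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(peer, model, n):
    """Clipped n-gram matches divided by the n-gram count of the model summary."""
    peer_ngrams = ngram_counts(peer.split(), n)
    model_ngrams = ngram_counts(model.split(), n)
    matches = sum(min(count, peer_ngrams[g]) for g, count in model_ngrams.items())
    total = sum(model_ngrams.values())
    return matches / total if total else 0.0
```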
Code implementing the ROUGE metric
is provided in /dropbox/14-15/573/code/ROUGE/ROUGE-1.5.5.pl. Example
configuration files are given.
- rouge_run_ex.xml gives a model configuration file covering the 2010 data.
- rouge_run_ex_2011.xml gives a model configuration file covering the 2011 data.
- You will need to change the "PEER-ROOT" to point to your own outputs.
- You should also adjust the "PEERS" filenames to handle differences in file naming.
- The directories and model files are set correctly for the
TAC 2010/2011 evaluations, respectively. You would need to point them to alternative files/directories
if you wish to use other data, such as the 2009 data.
- To be safe, please call with perl5.10.0.
- You should use the following flag settings for your official evaluation runs:
-e ROUGE_DATA_DIR -a -n 4 -x -m -c 95 -r 1000 -f A -p 0.5 -t 0 -l 100 -s -d CONFIG_FILE_WITH_PATH
- where, ROUGE_DATA_DIR is /dropbox/14-15/573/code/ROUGE/data
- CONFIG_FILE_WITH_PATH is the location of your revised configuration file
- Output is written to standard output by default.
- Further usage information can be found using the -H flag or invoking ROUGE with
no parameters.
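Putting the flag settings together, the helper below assembles the official command line. The script and data-directory paths are taken from this document; the configuration-file name is a placeholder for your own revised file:

```python
import shlex

# Paths below come from the assignment; the config-file name is hypothetical.
ROUGE_SCRIPT = "/dropbox/14-15/573/code/ROUGE/ROUGE-1.5.5.pl"
ROUGE_DATA_DIR = "/dropbox/14-15/573/code/ROUGE/data"
CONFIG_FILE = "rouge_run_D4_devtest.xml"  # your revised configuration file

cmd = ["perl5.10.0", ROUGE_SCRIPT,
       "-e", ROUGE_DATA_DIR,
       "-a", "-n", "4", "-x", "-m",
       "-c", "95", "-r", "1000",
       "-f", "A", "-p", "0.5", "-t", "0",
       "-l", "100", "-s",
       "-d", CONFIG_FILE]
# Redirect standard output to results/D4.devtest.results (or
# results/D4.evaltest.results) when you run the command.
print(shlex.join(cmd))
```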
Outputs
Create two directories under the outputs directory containing the summaries produced by running your final summarization system, as
follows:
- Directory D4.devtest should have the summaries
resulting from the TAC 2010 devtest document sets.
- Directory D4.evaltest should have the summaries
resulting from the TAC 2011 evaluation test document sets.
You should do this as follows:
- Summary output
- Each summary should be well-organized, in English, using complete sentences. It should have one sentence per line. (Other formats can be used, but require modifications to the scoring configuration.) A blank line may be used to separate paragraphs, but no other formatting is allowed (such as bulleted points, tables, bold-face type, etc.). Each summary can be no longer than 100 words (whitespace-delimited tokens). Summaries over the size limit will be truncated.
- Summaries should be based only on the 'A' group of documents for each
of the topics in the specification file. All processing of documents and generation of summaries must be automatic.
- Submission format: A run will comprise exactly one file per topic summary, where the name of each summary file is the ID of its document set. Please include a file for each summary, even if the file is empty. Each file will be read and assessed as a plain text file, so no special characters or markups are allowed.
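The output spec above can be enforced mechanically. This small helper truncates to the 100-word cap (whitespace-delimited tokens) and writes one sentence per line; the sentence split is a naive one on sentence-final punctuation, which suffices for well-formed extractive output:

```python
import re

def to_submission_text(summary, max_words=100):
    """Truncate to max_words whitespace-delimited tokens, then place one
    sentence per line (naive split on sentence-final punctuation)."""
    words = summary.split()[:max_words]
    text = " ".join(words)
    return "\n".join(re.split(r"(?<=[.!?])\s+", text))
```

ROUGE's -l 100 flag truncates over-length summaries anyway, but writing within the cap keeps your files and your scores consistent.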
Completing the project report
This final version should include all required sections, as well as a complete system architecture description and proper bibliography including all and only the papers you have actually referenced. See this document for full details.
The project report must also include a substantive error analysis. Please name your report D4.pdf.
Presentation
Your presentation may be prepared in any computer-projectable format,
including HTML, PDF, PPT, and Word. Your presentation should take
about 10 minutes to cover your main content, including:
- Your overall system, emphasizing recent improvements
- Discussion of error analysis
- Issues and successes
- Related reading which influenced your approach
Your presentation should be deposited in your doc directory,
but it is not due until the actual presentation time. You may continue
working on it after the main deliverable is due.
Summary
- Finish coding and document all code.
- Verify that all code runs effectively on patas using Condor.
- Add any specific execution or other notes to a README.
- Create your D4.pdf and add it to the doc directory.
- Verify that all components have been added and any changes checked in.