CMPSCI 691L: Statistical Natural Language Processing
Spring 2002
Syllabus
1. Monday, Feb 4
Linguistic Essentials, Corpus-Based Work
This session will present the
basic linguistic concepts needed for appreciation of the technology to be
studied and the important role of language corpora in modern approaches to
natural language processing. (Based on chapters 3 & 4 of
text.)
Christy Doran and Lisa Ferro
2. Monday, Feb 11
Mathematical Foundations &
Collocations
A brief
introduction to probability, information theory and statistical inference, with
emphasis on the terminology and notation that are used in the text. The problem of recognizing collocations, expressions that correspond to a conventional way of
saying things, will serve as a basis for the presentation of some of the key
concepts. (Based on chapter 2 of text.)
Warren Greiff
3. TUESDAY, Feb 19
Statistical Inference - N-gram Models over Sparse Data
N-gram
models are simplistic, but surprisingly useful, probabilistic models of the generation
of natural language. This session
presents these models, the problem presented by sparse data, and the techniques
of statistical estimation that have been developed. (Based on
chapter 6 of text.)
Alex Morgan
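As an illustration of the kind of estimation problem session 3 covers, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing, one of the simplest remedies for sparse data. The function names and the toy corpus are hypothetical and are not taken from the text or from course materials.

```python
from collections import Counter


def train_bigram(tokens):
    """Count bigrams and their left contexts in a token sequence."""
    contexts = Counter(tokens[:-1])             # how often each word starts a bigram
    bigrams = Counter(zip(tokens, tokens[1:]))  # counts of adjacent word pairs
    vocab = set(tokens)
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab, k=1.0):
    """Add-k smoothed estimate of P(word | prev); k=1 is add-one smoothing."""
    return (bigrams[(prev, word)] + k) / (contexts[prev] + k * len(vocab))


# Toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat".split()
contexts, bigrams, vocab = train_bigram(corpus)

seen = bigram_prob("the", "cat", contexts, bigrams, vocab)    # observed bigram
unseen = bigram_prob("the", "sat", contexts, bigrams, vocab)  # never observed
print(seen, unseen)
```

Without smoothing, the unseen bigram would receive probability zero; add-one smoothing reserves some probability mass for it at the cost of discounting observed events.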
4. Monday, Feb 25
Markov Models
Markov Models are sophisticated
probabilistic models that are particularly well suited for the modeling of
sequential phenomena. Initially applied to speech recognition, they are now
used for a variety of purposes in statistical natural language processing. (Based on chapter 9 of text.)
John Burger
5. Monday, Mar 4
Word Sense Disambiguation and
Lexical Acquisition
In natural language, many words
have multiple meanings. This session
will discuss the problem of word sense disambiguation. Also covered in this session is the issue of
lexical acquisition – the automatic induction of syntactic and semantic
properties of words. (Based
on chapters 7 & 8 of text.)
Marc Light
6. Monday, Mar 11
Probabilistic Context Free
Grammars and Probabilistic Part of Speech Tagging
Context free grammars are well
suited to modeling the essential recursive nature of language. Probabilistic grammars extend this
mathematical formalism. With this
extension, models can be described that capture the notion that some utterances
are more likely to occur than others. (Based on chapters 10 & 11 of text.)
John Henderson
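To make concrete the idea that a probabilistic grammar assigns likelihoods to utterances, here is a minimal sketch that scores a parse tree as the product of the probabilities of the rules used in it. The toy grammar, the tree encoding as nested tuples, and the function name are assumptions made for illustration only.

```python
def tree_prob(tree, rules):
    """Probability of a parse tree: the product of its rule probabilities.

    A tree is a tuple (label, child, ...); a child is either a subtree
    or a string (a word).
    """
    label, *children = tree
    # The right-hand side lists child labels for subtrees and words for leaves.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rules[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child, rules)
    return p


# Hypothetical toy grammar; probabilities for each left-hand side sum to 1.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("dogs",)): 0.6,
    ("NP", ("cats",)): 0.4,
    ("VP", ("bark",)): 0.7,
    ("VP", ("sleep",)): 0.3,
}

likely = tree_prob(("S", ("NP", "dogs"), ("VP", "bark")), rules)
unlikely = tree_prob(("S", ("NP", "cats"), ("VP", "sleep")), rules)
print(likely, unlikely)
```

Under this toy grammar, "dogs bark" is scored as more likely than "cats sleep", which is the kind of graded judgment that a non-probabilistic context free grammar cannot express.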
** March 18 (SPRING BREAK)
7. Monday, Mar 25
Probabilistic Parsing
Deterministic approaches to
finding the syntactic structure of language have two limitations. First, they need to be extraordinarily complex to
serve as a basis for natural language.
Second, they cannot account for the relative likelihood of one utterance
compared to another. In this session, we
will review probabilistic approaches to the problem of inferring syntactic
structure. (Based on chapter 12 of text.)
John Burger
8. Monday, Apr 1
Statistical Alignment and Machine Translation
Automated machine translation has
been something of a Holy Grail since the inception of Artificial Intelligence
a half century ago. In this session, we
will review statistical approaches to the development of this technology. (Based on chapter 13 of text.)
John Henderson
9. Monday, Apr 8
Speech
Speech recognition is now a mature
technology. This session will cover the
basic principles of modern speech technology. (Not covered in text)
David Palmer
10. WEDNESDAY Apr 17
Machine Learning and NLP
In this session, we shall briefly
review the main trends in the application of Machine Learning techniques to the
problems of natural language processing. (Based on chapters
14 & 16 of text.)
David Palmer and Randy Fish
11. Monday, Apr 22
Information Extraction
Information extraction is the task
of uncovering information within natural language text and presenting it in a
structured format. This session will
present the history of this area of investigation, the basic problems involved,
and the techniques that have been applied. (Not covered in text)
David Day and Marc Vilain
12. Monday, Apr 29
Question Answering
While research into answering
questions has a long history in natural language processing, the last few years
have seen a resurgence of interest in practical approaches to this
problem. In this session, we will
present a review of the current state of the art in this field. (Not covered in
text)
John Burger
13. Monday, May 6
Technology Evaluation
Modern research into natural
language processing is driven to a very large degree by quantitative evaluation
metrics. In this session, we will
discuss the importance of this paradigm to recent advances in NLP technology,
the problems of developing appropriate evaluation procedures, and the
approaches that have been taken in the various areas of NLP that have been
covered in the course.
Lynette Hirschman
14. Monday, May 13
Topics in Information Retrieval
This session will summarize the
essential principles of the theory and practice of modern information retrieval
systems. (Based on chapter 15 of text.)
Jay Ponte
15. Monday, May 20 (exam period)
left open for now
CMPSCI 691L: Statistical Natural Language Processing
Student Evaluation
Grading:
25% Assignments (to be handed in at the following
class, unless specified otherwise)
60% Term project
15% Class participation
Project Guidelines:
Guided by the course
syllabus, students will select a topic in the area of Statistical NLP for a
term project. Students are encouraged to connect the project they
choose to their individual areas of research.
It is expected that a typical project will involve the implementation of
some statistical language processing algorithm; application of the algorithm
to a carefully circumscribed problem; and evaluation of aspects of either the
effectiveness of the approach to the solution of the application problem or the
computational properties of the implementation or both. Students should not, however, feel
constrained by this particular format; creativity, risk-taking and thinking
out-of-the-box will be welcome.
Students will be expected to
formulate a project and identify a member of the teaching team to serve as an
advisor. For the purposes of monitoring
progress, students will be expected to comply with the following schedule:
Feb 19
Submission of a one-paragraph (at most two-paragraph) description
of the project. The description should
identify the specific topic area, lay out the principal objectives, and give
the name of the person who has agreed to serve as advisor.
Mar 4
Submission of
two-page project description. This document will describe at a greater
level of detail the specific objectives and principal activities of the
project. It will state tangible results
that are expected to be achieved upon completion of the project. Where appropriate, descriptions should
include: hypotheses to be tested, effects to be demonstrated, and characteristics to be measured. Appended to the description should be a calendar
of concrete milestones for the duration of the project.
Apr 1
Submission of
first 1-page progress report. Progress reports will give an account of the
current status of the project, including difficulties encountered and changes,
if any, to the original plan. Appended
to the report should be a copy of the original calendar with any changes
clearly indicated.
Apr 29
Submission of
second progress report (same format).
May 20
Submission of
final project report. The final
report can be modeled along the lines of a research conference article. Once again, students need not feel overly
constrained to a particular format.