CMPSCI 691L: Statistical Natural Language Processing
Spring 2002

Syllabus

 

   1.  Monday, Feb 4   

Linguistic Essentials, Corpus-Based Work

This session will present the basic linguistic concepts needed to appreciate the technology to be studied, and the important role of language corpora in modern approaches to natural language processing. (Based on chapters 3 & 4 of text.)

Christy Doran and Lisa Ferro

   2.  Monday, Feb 11  

Mathematical Foundations & Collocations

A brief introduction to probability, information theory and statistical inference, with emphasis on the terminology and notation used in the text. The problem of recognizing collocations, expressions that correspond to a conventional way of saying things, will serve as a basis for the presentation of some of the key concepts. (Based on chapter 2 of text.)

Warren Greiff

   3.  TUESDAY, Feb 19 

Statistical Inference: N-gram Models over Sparse Data

N-gram models are simplistic, but surprisingly useful, probabilistic models of the generation of natural language. This session presents these models, the problem posed by sparse data, and the techniques of statistical estimation that have been developed to address it. (Based on chapter 6 of text.)

Alex Morgan

   4.  Monday, Feb 25  

Markov Models

Markov Models are sophisticated probabilistic models that are particularly well suited for the modeling of sequential phenomena. Initially applied to speech recognition, they are now used for a variety of purposes in statistical natural language processing. (Based on chapter 9 of text.)

John Burger

   5.  Monday, Mar 4

Word Sense Disambiguation and Lexical Acquisition

In natural language, many words have multiple meanings. This session will discuss the problem of word sense disambiguation. It will also cover lexical acquisition – the automatic induction of syntactic and semantic properties of words. (Based on chapters 7 & 8 of text.)

Marc Light

   6.  Monday, Mar 11  

Probabilistic Context Free Grammars and Probabilistic Part of Speech Tagging

Context free grammars are well suited to modeling the essential recursive nature of language. Probabilistic grammars extend this mathematical formalism. With this extension, models can be described that capture the notion that some utterances are more likely to occur than others. (Based on chapters 10 & 11 of text.)

John Henderson

  **  March 18 (SPRING BREAK)

   7.  Monday, Mar 25  

Probabilistic Parsing

Deterministic approaches to finding the syntactic structure of language have two limitations. First, they need to be extraordinarily complex to serve as a basis for natural language. Second, they cannot account for the relative likelihood of one utterance compared to another. In this session, we will review probabilistic approaches to the problem of inferring syntactic structure. (Based on chapter 12 of text.)

John Burger

   8.  Monday, Apr 1   

Statistical Alignment and Machine Translation

Automated machine translation has been something of a Holy Grail since the inception of Artificial Intelligence a half century ago. In this session, we will review statistical approaches to the development of this technology. (Based on chapter 13 of text.)

John Henderson

   9.  Monday, Apr 8   

Speech

Speech recognition is now a mature technology. This session will cover the basic principles of modern speech technology. (Not covered in text.)

David Palmer

10.  WEDNESDAY, Apr 17

Machine Learning and NLP

In this session, we shall briefly review the main trends in the application of Machine Learning techniques to the problems of natural language processing. (Based on chapters 14 & 16 of text.)

David Palmer and Randy Fish

11.  Monday, Apr 22  

Information Extraction

Information extraction is the task of uncovering information within natural language text and presenting it in a structured format. This session will present the history of this area of investigation, the basic problems involved, and the techniques that have been applied. (Not covered in text.)

David Day and Marc Vilain

12.  Monday, Apr 29  

Question Answering

While research into answering questions has a long history in natural language processing, the last few years have seen a resurgence of interest in practical approaches to this problem. In this session, we will present a review of the current state of the art in this field. (Not covered in text.)

John Burger

13.  Monday, May 6

Technology Evaluation

Modern research into natural language processing is driven to a very large degree by quantitative evaluation metrics.  In this session, we will discuss the importance of this paradigm to recent advances in NLP technology, the problems of developing appropriate evaluation procedures, and the approaches that have been taken in the various areas of NLP that have been covered in the course.

Lynette Hirschman

14.  Monday, May 13  

Topics in Information Retrieval

This session will summarize the essential principles of the theory and practice of modern information retrieval systems. (Based on chapter 15 of text.)

Jay Ponte

15.  Monday, May 20   (exam period)

left open for now



Student Evaluation

Grading:

25%     Assignments (to be handed in at the following class, unless specified otherwise)

60%     Term project

15%     Class participation

 

    

Project Guidelines:

 

Guided by the course syllabus, students will select a topic in the area of Statistical NLP for a term project. Students are encouraged to connect the project they choose to their individual areas of research. It is expected that a typical project will involve the implementation of some statistical language processing algorithm; application of the algorithm to a carefully circumscribed problem; and evaluation of the effectiveness of the approach in solving that problem, the computational properties of the implementation, or both. Students should not, however, feel constrained by this particular format; creativity, risk-taking and out-of-the-box thinking will be welcome.

 

Students will be expected to formulate a project and identify a member of the teaching team to serve as an advisor.  For the purposes of monitoring progress, students will be expected to comply with the following schedule:

Feb 19 

Submission of a one-paragraph (maximum two-paragraph) description of the project. The description should identify the specific topic area, lay out the principal objectives, and give the name of the person who has agreed to serve as advisor.

Mar 4 

Submission of a two-page project description. This document will describe in greater detail the specific objectives and principal activities of the project. It will state the tangible results that are expected to be achieved upon completion of the project. Where appropriate, descriptions should include: hypotheses to be tested, effects to be demonstrated, characteristics to be measured. Appended to the description should be a calendar of concrete milestones for the duration of the project.

Apr 1

Submission of the first one-page progress report. Progress reports will give an account of the current status of the project, including difficulties encountered and changes, if any, to the original plan. Appended to the report should be a copy of the original calendar with any changes clearly indicated.

Apr 29

Submission of the second progress report (same format).

May 20  

Submission of the final project report. The final report can be modeled along the lines of a research conference article. Once again, students need not feel overly constrained by a particular format.