COMP 4420/5420 Natural Language Processing - UMass Lowell Spring 2022
Course Description:
Text is everywhere, and the overload of digital information in our century makes automated text processing imperative, whether you are searching for information on the web, trying to understand patterns of interaction in social media and digital communities, translating news from another language, or evaluating different treatment options for a particular medical condition. In this course, we will learn the principles and theory behind contemporary language technology, covering the many applications of automatic text processing, from information retrieval to machine translation. You will have a chance to build a practical system for the application of your choice. We will emphasize developing statistical models for various tasks using large collections of text, with a particular focus on deep learning techniques.
Prerequisite: COMP 4220 / COMP 5220 Machine Learning or equivalent (with permission of instructor).
List of Topics:
- Why use machine learning for language processing
- Sparse and dense word vector representations
- Logistic regression, fully-connected neural networks
- Text pre-processing, tokenization, subword tokenizers
- Language models
- Attention, transformer networks
- Machine translation, sequence-to-sequence neural networks
- Pre-training for NLP: GPT, BERT
- Text-to-text NLP: T5
- Scaling effects of language models, unsupervised task acquisition
- Tasks / Applications
  - Text classification
  - Machine translation
  - Natural language inference
  - Question answering
  - Summarization
  - Search
  - Text generation
The course schedule can be found here (subject to change).
Please join our Slack group; the link can be found on the class Blackboard page.
All class announcements will be posted on Slack.
Class meets
Mon 3:30 - 6:20 pm (with a 15-minute break)
Room: Falmouth 309
Class format
Each class will consist of
- Quiz review and Q&A (10 minutes)
- Lecture (80 min)
- Practicum/Lab (60 min)
There will be a 15-minute break after the lecture.
During the Practicum/Lab segment of the class, we will focus on technical (coding) skills and provide homework guidance. You will be expected to work on your homework during this time.
Staff
| Role | Name | Contact | Office | Office hours |
|---|---|---|---|---|
| Instructor | Anna Rumshisky | arumshisky@gmail.com | Dandeneau 318 | TBA |
| TA | Vlad Lialin | vlialin@cs.uml.edu | Dandeneau 415 | Mon 2pm - 4pm |
| TA | Namrata Shivagunde | Namrata_Shivagunde@student.uml.edu | Dandeneau 415 | TBA |
COVID Safety
- Since some of us have high-risk family members at home, we ask that you wear KN95 masks in class.
- If you do not have a KN95 mask, you may request two masks here.
- If you have been in contact with someone who tested positive for COVID, PLEASE DO NOT COME TO CLASS.
- To join remotely, please contact us and we will send you a Zoom link (also posted on Blackboard).
Note that it might be difficult to ask questions and participate interactively if you join the class via Zoom.
Remote learning
Class recordings will be available on Echo (you need to log in with your University credentials).
Textbook
Reading materials will be made available in electronic form via the course website.
We may use selections from the following NLP textbooks:
- Jurafsky & Martin 2021. Speech and Language Processing.
- Eisenstein 2018. Natural Language Processing.
Cheat sheets:
- Machine learning cheat sheet
- Review: Math preliminaries
- Python cheat sheet
- NumPy cheat sheet
- Review: Mathematics for deep learning
- GitHub for beginners
- GitHub help
Grading
| Component | Weight |
|---|---|
| Homeworks & quizzes | 60% |
| Research Paper Presentations | 10% |
| Final Project | 30% |
There will be no final or midterm.
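The final course score is a weighted average of the three components in the grading table. The sketch below shows that arithmetic, assuming each component score is reported on a 0-100 scale; the component keys (`homeworks_and_quizzes`, `paper_presentation`, `final_project`) and the function name `final_grade` are hypothetical labels chosen for illustration, not names used by the course.

```python
from __future__ import annotations

# Weights from the grading table; component keys are hypothetical labels.
WEIGHTS = {
    "homeworks_and_quizzes": 0.60,
    "paper_presentation": 0.10,
    "final_project": 0.30,
}


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted course score, assuming each component is on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Weighted sum: 0.6 * homeworks/quizzes + 0.1 * presentation + 0.3 * project.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"homeworks_and_quizzes": 90.0, "paper_presentation": 80.0, "final_project": 85.0}
    # 0.6 * 90 + 0.1 * 80 + 0.3 * 85 = 54 + 8 + 25.5 = 87.5
    print(round(final_grade(example), 2))
```

Because the weights sum to 1, a student scoring 100 on every component ends up with 100 overall; how the course maps this number to a letter grade is not specified here.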
Homeworks
- We will have 6-7 homeworks.
- Homeworks are due at midnight on the day before the next lecture.
- Homeworks will be posted on the course website and linked from the course schedule.
- Homeworks must be submitted via Blackboard; you must submit a PDF of your homework and a link to a GitHub repository with your code.
Quizzes
- We will have take-home quizzes after every lecture.
- Quizzes are due immediately before the next lecture.
- Quizzes will be posted on the course website.
Research Paper Presentations
- Each student will be required to present a research paper assigned as readings for the class.
- More information about the research paper presentations can be found here.
Final Projects
- You will have several alternatives for the final project.
- Projects will be done in groups of 2.
Late Policy:
- Quizzes cannot be submitted late.
- Homeworks will be accepted up to 2 (two) days after the original due date.
- Homeworks submitted up to 1 full day late will be graded at a 10% reduction.
- Homeworks submitted up to 2 full days late will be graded at a 20% reduction.
- After 2 days, homeworks will not be accepted.
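The late policy can be sketched as a small function. This is a minimal illustration under two assumptions the policy does not state: any lateness within the first 24 hours counts as "up to 1 full day" (so lateness is rounded up to whole days), and the 10% / 20% reduction is applied to the score the homework earned. The function name `late_adjusted_score` and the `PENALTY_BY_DAYS_LATE` table are hypothetical.

```python
import math
from typing import Optional

# Hypothetical mapping from whole days late to the fractional reduction in the policy.
PENALTY_BY_DAYS_LATE = {0: 0.0, 1: 0.10, 2: 0.20}


def late_adjusted_score(score: float, hours_late: float) -> Optional[float]:
    """Return the penalized score, or None if the homework is no longer accepted.

    Assumes partial days are rounded up (e.g. 5 hours late counts as 1 day late)
    and that the reduction is a percentage of the earned score.
    """
    days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
    if days_late not in PENALTY_BY_DAYS_LATE:
        return None  # more than 2 days late: not accepted
    return score * (1 - PENALTY_BY_DAYS_LATE[days_late])


if __name__ == "__main__":
    for hours in (0, 5, 30, 49):
        print(hours, late_adjusted_score(100.0, hours))
```

Under these assumptions, a homework worth 100 points submitted 30 hours late falls in the second day and receives about 80 points, while one submitted 49 hours late is not accepted.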
Collaboration Policy:
- Homeworks and quizzes must be done individually.
- Projects can be done in groups of two.
For the work submitted by a group, please also include a description of what was done by each group member.
Violating the collaboration policy by copying other people’s work, as well as any other instance of cheating, including copying solutions from existing sources, carries the following penalties: (1) a first violation results in zero credit for the submitted assignment; (2) a second violation results in failing the course.