PeAR WPI

Welcome to the RBE474X/595-A01-SP: Deep Learning for Perception course at WPI. This course is taught by Prof. Nitin J. Sanket. It is offered as both an undergraduate course (RBE474X) and a graduate course (RBE595-A01-SP), with the same course content but slightly different assignments. If you want to get graduate credit for the course, be sure to enroll in the RBE595-A01-SP course and NOT RBE474X. Please sign up for Piazza if you haven't done so already.

All class announcements will be made through Piazza. Please use Piazza to contact the TAs. Please do NOT contact the TAs or the Instructor via email unless it's an emergency, and do NOT contact the TAs on any social media platform such as Facebook or WhatsApp (please respect their privacy) regarding course content. If you want to have a chat about our research, feel free to reach out to Prof. Sanket after class.

All student reports will be released publicly online to enable a better learning experience for others. We will also announce the top three submissions for each assignment on the website. All projects and homeworks are to be submitted using Canvas. If you find any errors or typos on the course GitHub website, please edit the markdown '.md' file and send a 'pull request'. If you don't know how to use a pull request, please check out this tutorial, or alternatively post on Piazza.

Project 0 (P0) was released on Aug 22, 2024 and is due on Aug 24, 2024. The knowledge from P0 is a prerequisite for the class.

What is this course about?

This course is about learning the mathematical foundations of deep learning applied to images. Perception stacks in state-of-the-art robots are rapidly adopting the latest advancements in deep learning due to their efficacy and high accuracy. These deep learning-based methods can also be accelerated using parallelized hardware such as GPUs, which enables low-latency operation of complex tasks such as real-time scene segmentation. You will learn to formulate, develop and implement deep learning solutions for common computer vision problems in the context of robot perception. The course will cover advanced and state-of-the-art topics such as sim2real, adversarial attacks on neural networks, vision transformers and diffusion models. Additional topics explored in this course include image formation, linear classifiers, neural networks and backpropagation, Convolutional Neural Networks (CNNs), CNN architectures, data generation for sim2real, and black- and white-box attacks on neural networks, as applied to building a state-of-the-art robotic stack. You will gain knowledge about the considerations required to equip a robotic system with the state-of-the-art deep learning toolkit. A unique aspect of this course is that it is designed to balance theory with applications through projects.

Team

Prof. Nitin J. Sanket

Instructor
he/him/his

Manoj Velmurugan

Teaching Assistant
he/him/his

Philip Brush

Grader
he/him/his

Prerequisites

Programming proficiency (preferably in Python 3), experience with Linux and completion of P0 are hard prerequisites. We assume that students are proficient in basic linear algebra (MA2071/2072), calculus (MA1024) and probability (MA2621/2631). Experience working with images programmatically is a big plus.

Class Timings and Location

Class Timings: Tuesdays and Fridays 10:00AM to 11:50AM.
Class Location: Atwater Kent 232 (AK232).

Office Hours

Manoj Velmurugan: Thursdays, 2:00PM to 3:50PM, UH243 Curtain Space.
Philip Brush: Mondays and Wednesdays, 12:00PM to 1:00PM, UH243 Curtain Space.
Prof. Nitin J. Sanket: ONLY for serious issues, by appointment, UH250E.

Expectations

This course is fast-paced and has high expectations of time and effort commitment from students. Students are expected to complete projects on time, since the concepts build on top of each other and it is very easy to fall behind in this course. Please understand the concepts well, ask questions, use Piazza heavily and come to office hours. Help your peers out, but do not plagiarize.

Software Environment

We will use Python 3 as the programming platform throughout this course, along with packages such as OpenCV, PyTorch, TensorFlow, NumPy, Scikit and Matplotlib.

Assignments

All projects are to be done in groups of THREE. We encourage you to discuss the material with your peers, but do not cheat. For clarifications, make a private post on Piazza. For further details, read the Collaboration Policy and Honor Code below. We are pro-ChatGPT (and other LLMs) in this course, as long as the chat prompts are included in the report and no more than 30% of the assignment is produced by LLMs.

Assignment Name With Link	Deadline	Grade Percentage	Student Outputs
P0: Alohomora!	Sunday, Aug 25, 2024 at 11:59:59 PM (Individual Submissions)	5	NA
P1: Nifty Neural Networks!	Monday, Sept 02, 2024 at 11:59:59 PM (Group Submissions)	18	TBD
P2: Dramatic Data!	Thursday, Sept 12, 2024 at 11:59:59 PM (Group Submissions)	17	TBD
Midterm Exam 1 (In-class)	Friday, Sept 13, 2024 (Individual Submissions)	8	NA
P3: Neural Nemesis!	Saturday, Sept 28, 2024 at 11:59:59 PM (Group Submissions)	20	TBD
P4: Dreaming Data!	Friday, Oct 11, 2024 at 9:30:00 AM (Group Submissions)	20	TBD
Midterm Exam 2 (In-class)	Friday, Oct 11, 2024 (Individual Submissions)	12	NA
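
As a rough illustration of how the grade percentages above add up, here is a minimal Python sketch. It assumes each component is scored from 0 to 100 and that the components are combined linearly by their listed percentages; this page does not spell out that rule, so treat it as an assumption. The names `WEIGHTS` and `course_total` are hypothetical helpers, not part of any course tooling.

```python
# Grade percentages copied from the assignments table above.
WEIGHTS = {
    "P0": 5,
    "P1": 18,
    "P2": 17,
    "Midterm 1": 8,
    "P3": 20,
    "P4": 20,
    "Midterm 2": 12,
}


def course_total(scores: dict) -> float:
    """Combine per-component scores (each 0-100) into a course total (0-100).

    ASSUMPTION: a simple weighted sum. Components missing from `scores`
    count as zero.
    """
    return sum(WEIGHTS[name] * scores.get(name, 0.0) / 100.0 for name in WEIGHTS)


if __name__ == "__main__":
    # The listed percentages should cover the whole grade.
    print(sum(WEIGHTS.values()))
    # Example: full marks on everything except a 50 on P3.
    example = {name: 100.0 for name in WEIGHTS}
    example["P3"] = 50.0
    print(course_total(example))
```

Note that the two 20% projects (P3 and P4) together carry as much weight as both midterms combined.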

Class Slides

Class Number	Class Date	Slides/Resources	Class Topic(s)
1	Aug 23 2024	Slides, Video:	Introduction, Logistics And Sensors
2	Aug 27 2024	Slides, Video:	Multi Layer Perceptrons And Backpropagation
3	Aug 30 2024	Slides, Video:	NN Tuning, Image Filtering And Convolutional Neural Networks
4	Sept 03 2024	Slides, Video:	Advanced CNN Architectures And Image Warping
5	Sept 06 2024	Slides, Video:	Simulation for Data Generation And Sim2Real
6	Sept 10 2024	Slides, Video:	Object Detection And Segmentation
7	Sept 13 2024	Slides, Video:	Learned Depth: Monocular + Stereo And Midterm Exam 1 (In-class)
8	Sept 17 2024	Slides, Video:	Vision Transformers, Can We Trust Neural Networks?
9	Sept 20 2024	Slides, Video:	Single Pixel Attacks, Patch Based Attacks
10	Sept 24 2024	Slides, Video:	Generative Models: VAEs, GANs, Attacking GANs
11	Oct 01 2024	Slides, Video:	Advanced Generative Models: Diffusion Models
12	Oct 04 2024	Slides, Video:	Advanced Generative Models++: Multi-modal Generative Deep Learning
13	Oct 08 2024		Deep Learning Is Not Enough!
14	Oct 11 2024		Summary, Recap, Conclusions And Midterm Exam 2 (In-class)

Syllabus

Rigid body transformations, attitude estimation, Bayesian filters, linear and unscented Kalman filters, camera models, Gaussian mixture models, image processing, visual feature detection and tracking, projective geometry, optical flow, stereopsis, quadrotor dynamics and controls, and structure from motion/SLAM.

Piazza and Canvas

If you haven't done so already, register yourself on Piazza with your WPI email. We'll be using Piazza for all announcements and discussions. Please use Piazza to contact the instructor/TAs (feel free to use private posts on Piazza to contact me/TAs). Please do NOT contact the instructor/TAs via email unless it's an emergency, and do NOT contact the instructor/TAs on any social media platform such as Facebook or WhatsApp (please respect our privacy) regarding course content. If you want to have a chat about research, talk to Prof. Sanket after class.

All assignments will be released on this website.

All assignments are to be submitted using Canvas. If you find any errors or typos on the course website, please post on Piazza.

Submission Policy

If you find a discrepancy in due dates between the website and Canvas, please post on Piazza as soon as possible. If no such posts are found, the earlier deadline will be used as the correct deadline. Submissions are made through Canvas (unless otherwise specified) with the name pk_DirID.zip for individual projects, where k is the assignment number (e.g., for project 0, this would be p0_DirID.zip). Here, DirID is your directory ID, i.e., the first part of your WPI email address. For example, if your WPI email address is ABCD@wpi.edu, then your DirID is ABCD. For group projects, the name would be pk_groupGROUPNUMBER.zip, where GROUPNUMBER is the number of your group. For example, if you are submitting for project 2 and group number 4, then the name would be p2_group4.zip. Refer to the submission guidelines of the respective assignment for more details. Keep your submissions professional and grammatically correct, without spelling mistakes. Do not use slang or chat shorthand in your submissions. You'll get a 25% grade penalty for not following the submission guidelines.
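
The naming convention above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the respective assignment's submission guidelines remain the authoritative source.

```python
def dir_id_from_email(email: str) -> str:
    """Return the part of the email address before the '@' (the DirID)."""
    return email.split("@", 1)[0]


def individual_submission_name(project_number: int, dir_id: str) -> str:
    """Build the zip name for an individual submission: pk_DirID.zip."""
    return f"p{project_number}_{dir_id}.zip"


def group_submission_name(project_number: int, group_number: int) -> str:
    """Build the zip name for a group submission: pk_groupGROUPNUMBER.zip."""
    return f"p{project_number}_group{group_number}.zip"


if __name__ == "__main__":
    # Individual example from the text: project 0, email ABCD@wpi.edu.
    print(individual_submission_name(0, dir_id_from_email("ABCD@wpi.edu")))
    # Group example from the text: project 2, group 4.
    print(group_submission_name(2, 4))
```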

Late Submission Policy

This course moves quickly, and concepts will build on top of each other. Therefore, it's very important to keep up with the material. To encourage this, late assignments are penalized 25% per day after the due date. But life is unpredictable; we all need a break sometimes. So, we allow you four late days to spend on any assignment(s) except the midterms and the final project (P4). You may submit an assignment late (after the due date) using a late day without any penalty. Think of a late day as pushing the deadline back by a day. So, to get full credit on a 2-days-late assignment, you'd need to use two late days. Late days can only be spent as full days (i.e., you can't use only half a late day for an assignment you submit 12 hrs late). If you are using a late day, mention it in the title of your submission as "USING X LATE DAY(S)" and post a comment on Canvas about the usage of a late day. We expect you to keep track of the number of late days you have remaining and to notify us when you use one or more late days. We will default to applying the penalty if we don't see the late day usage mentioned as a comment on Canvas for that particular assignment. Again, if you are using a late day, mention it in the title of your submission as "USING X LATE DAY(S)".
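
The late-day arithmetic above can be illustrated with a small Python sketch. It makes two assumptions that the policy does not state explicitly: lateness is rounded up to whole days (consistent with late days being spendable only as full days), and the total penalty is capped at 100%. The function names are hypothetical, and the sketch does not model the restriction that late days cannot be used on the midterms or P4.

```python
import math

PENALTY_PER_DAY = 0.25  # 25% per day late, from the policy above
TOTAL_LATE_DAYS = 4     # late days available per student, from the policy above


def days_late(hours_late: float) -> int:
    """Round lateness up to whole days (ASSUMPTION: any part of a day counts as a day)."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def late_penalty(hours_late: float, late_days_used: int) -> float:
    """Return the penalty as a fraction in [0, 1] after applying late days.

    ASSUMPTION: the penalty is capped at 1.0 (100%).
    """
    if not 0 <= late_days_used <= TOTAL_LATE_DAYS:
        raise ValueError("late_days_used must be between 0 and 4")
    unpaid_days = max(days_late(hours_late) - late_days_used, 0)
    return min(unpaid_days * PENALTY_PER_DAY, 1.0)


if __name__ == "__main__":
    print(late_penalty(12, 0))  # 12 hrs late, no late day used
    print(late_penalty(12, 1))  # 12 hrs late, one full late day used
    print(late_penalty(48, 2))  # 2 days late, two late days used
```

Under these assumptions, a submission that is 12 hours late still consumes one full late day to avoid the 25% penalty.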

Collaboration Policy and Honor Code

Collaboration is HIGHLY encouraged, but one should know the difference between collaboration and cheating. Cheating is prohibited and will carry serious consequences. Cheating may be defined as using or attempting to use unauthorized assistance, material, or study aids in academic work or examinations. Some examples of cheating are: collaborating on a take-home exam or homework unless explicitly allowed; copying homework; handing in someone else's work as your own; and plagiarism. You are welcome to collaborate with your peers on Piazza and in person. However, it's important that the work you submit is an expression of your understanding, and not merely something you copied from a peer. So, we place strict limits on collaboration. Firstly, you must clearly cite your collaborators by name at the top of your report. This includes references to Piazza posts. You may not share or copy each other's code. You can discuss how your code works, and the concepts it implements, but you can't just show someone your code. You may use free and publicly available sources, such as books, journal and conference publications, and web pages, as research material for your answers. (You will not lose points for using external sources.) You may not use any service that involves payment, and you must clearly and explicitly cite all outside sources and materials that you made use of. We consider the use of uncited external sources as portraying someone else's work as your own, and as such it is a violation of the University's policies on academic dishonesty. Instances will be dealt with harshly and typically result in a failing course grade. Unless otherwise specified, you should assume that the WPI Code of Academic Integrity applies.

Reference Books

All concepts will be covered in the class lectures and in the lecture notes. However, we also recommend the following books as good references:

Deep Learning Book, MIT Press, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016.
Computer Vision: Algorithms and Applications, Springer, Richard Szeliski, 2010.
Digital Image Processing, Prentice Hall, Rafael Gonzalez, and Richard Woods, 2008.

Furthermore, refer to the slides from the courses in the Acknowledgements section.

Acknowledgements

This course was developed by adapting some of the best parts of courses at multiple universities. These resources are linked below, and you are encouraged to look at their content to learn from them as well. The goal of this course is to be the best undergraduate deep learning course in the world.