Data-X: Data, Signals, and Systems
IEOR 190D / 290-003
Spring 2017
Instructor: Ikhlaq Sidhu
Department of Industrial Engineering & Operations Research
Offered Spring 2017, 3 Units, Lecture and Lab:
- Undergraduate Section: 190D, Class Number 33036
- Graduate Section: INDENG 290-003, Class Number 33258
Prerequisite: Interested students should have working knowledge of Python in advance of the class, and also should have completed a fundamental probability or statistics course.
Location: Barrows 60, Time: 5:10 pm-7:59 pm
Teaching Team:
- GSI: Kevin Bozhe Li, [email protected]
- Visiting Scholar: Alexander Fred-Ojala, [email protected]
- TensorFlow Lead: Nathan Cheng, [email protected]
Description
This course surveys a variety of key concepts that are useful for designing and building applications that process data signals. The course also introduces modern open-source programming tools and libraries that can be used to implement these applications. These concepts include filtering, prediction, classification, decision-making, Markov chains, LTI systems, spectral analysis, and frameworks for learning from data. After reviewing each concept, we explore implementing it within sample applications in Python, using libraries for math array functions (NumPy), manipulation of tables (Pandas), long-term storage (SQL, JSON, CSV files), natural language (NLTK), and ML frameworks (scikit-learn, TensorFlow). The course includes a team-based data application project.
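As a rough illustration of the filtering and LTI-system concepts named above, the sketch below smooths a short signal with a moving-average filter using NumPy. It is a minimal example and not course material: the function name `moving_average` and the sample `readings` are made up for illustration.

```python
import numpy as np


def moving_average(signal, window):
    """Smooth a 1-D signal with a length-`window` moving-average (FIR) filter."""
    # The impulse response of the filter: `window` equal weights summing to 1.
    kernel = np.ones(window) / window
    # mode="valid" keeps only outputs where the window fully overlaps the signal.
    return np.convolve(signal, kernel, mode="valid")


# Hypothetical readings, invented for this example.
readings = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
smoothed = moving_average(readings, window=3)
print(smoothed)
```

Because the filter is linear and time-invariant, its output is the convolution of the input with the kernel, which is why a single `np.convolve` call is enough here.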
The skill set learned in this class can be applied to a broad range of industry sectors such as finance, health, engineering, transportation, energy, and many others. The lab section of the course meets in parallel with the lecture. In the lab, the first 4 weeks are used to generate a story and low-tech demo for a real-world project that performs actions on data, and the following 8 weeks are used for code development, with a demonstration of working project code by the end of the class.
TEXTS AND REQUIRED SUPPLIES
- Handouts in class
- Anaconda Python Environment on personal computer
HOMEWORK, GRADING & ATTENDANCE
Class attendance and participation are expected, and sign-ins for sessions are tracked. Absences for unavoidable reasons should be preapproved whenever possible via an email to the GSI.
Grading:
- Homework: 35%
- Attendance: 15%
- Low Tech Validated Solution: 15%
- Final Project Demo, depth: 35%
OFFICE HOURS & COMMUNICATIONS
TBD
SCHEDULE (Subject to Change)
Lec # | Topic | Tools | Cookbook Examples | HW DUE | Lab |
0 | Introduction: Overview of Frameworks for obtaining insights from data (Slides); Slides: Python and Math/Probability Prerequisites | Anaconda, Python | Setting up Anaconda Environment | HW 1 Assigned | |
1 | Notebook: Python NumPy Notebook; Slides: Data Structure Outline; Slides: NumPy Review | Python, NumPy, Pandas, JSON-formatted files | Earthquake Data live query; Example with JSON file | Bring 3 ideas to next class; HW 1 Due | Form Teams |
2 | Data signals in Tables. Slides: Pandas Overview; Notebook: Pandas Intro; Notebook: Pandas and Stock Market | Pandas, NumPy, SciPy, Matplotlib | Stock market live download to Pandas DataFrame. Quant trading algorithm | HW 2 Due | Form Teams |
3 | Scoring, Linear Prediction and Max Likelihood Prediction. Extending to multiple variables | NumPy, SciPy, Matplotlib | Code samples: 2-variable and multi-variable Linear Prediction | HW 3 Due | Validate and Adjust |
4 | Classification. Logistic Regression, SVM, Nonlinear mapping | Scikit-learn, Seaborn Visualization | Classification example with Iris Database: Logistic, SVM | TBA | Low Tech Demo and Validation Results |
5 | Classification II: KNN; Data storage with SQL | Scikit-learn, Seaborn Visualization | Classification example with Iris Database: KNN; SQL/Pandas Data Exchange example with Call Center Complaint data | TBA | Agile Sprint with feedback and reflection |
6 | NLTK Introduction, Markov Processes Introduction (Discrete Time) | NLTK | Next Word Predictor, Spell Checking | TBA | Agile Sprint with feedback and reflection |
7 | Markov Continued. Bayesian Bag of Words Model? | NLTK | Corpora Access: Tweets, Gutenberg, Shakespeare?; Grammar Checking Web Crawler to increase training set | TBA | Agile Sprint with feedback and reflection |
8 | Neural Nets/Deep Learning Introduction | TensorFlow | Classification using TensorFlow MNIST Handwriting example | TBA | Agile Sprint with feedback and reflection |
9 | Neural Nets/Deep Learning | TensorFlow | TensorFlow Example: Spam Filter for email | | Agile Sprint with feedback and reflection |
10 | Data as a signal I, LTI Systems, Convolution, Filters, Correlations | Python/NumPy/SciPy | A Control Feedback Example and/or multivariate correlation | TBA | Agile Sprint with feedback and reflection |
11 | Data as a signal II, Transforms, spectral information matrix based features | Python/NumPy/SciPy | TBD, possibly with Kalman Filter; Stock market or weather frequency information | TBA | Agile Sprint with feedback and reflection |
12 | Image Classification, focus on feature selection | Python/OpenCV | Image Classify. Python/Unix Shell usage | TBA | Agile Sprint with feedback and reflection |
13 | Prep for Final Projects | TBA | Demo Day | | |
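To give a feel for the kind of cookbook example listed for Lecture 6 (Next Word Predictor with a discrete-time Markov process), here is a minimal sketch using only the Python standard library. It is an assumption about what such an example might look like, not the course's actual notebook: the helper names `build_bigram_model` and `predict_next` and the toy `corpus` are hypothetical.

```python
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count, for each word, how often each other word follows it."""
    words = text.lower().split()
    transitions = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        transitions[current][following] += 1
    return transitions


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for this example.
corpus = "the cat sat on the mat and the cat slept"
model = build_bigram_model(corpus)
print(predict_next(model, "the"))
```

Each word is treated as a state, and the follower counts are unnormalized transition frequencies; dividing each row by its total would give the transition probabilities of a first-order Markov chain.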
COURSE MODEL ILLUSTRATION: