Syllabus: Data-X

Data-X: Data, Signals, and Systems
IEOR 190D / 290-003
Spring 2017

Instructor: Ikhlaq Sidhu
Department of Industrial Engineering & Operations Research

Offered Spring 2017, 3 Units, Lecture and Lab:

  • Undergraduate Section: 190D, Class Number 33036
  • Graduate Section: INDENG 290-003, Class Number 33258

Prerequisite: Interested students should have working knowledge of Python in advance of the class, and also should have completed a fundamental probability or statistics course.

Location: Barrows 60, Time: 5:10 pm-7:59 pm

Teaching Team:

Description

This course surveys key concepts that are useful for designing and building applications that process data signals.  It also introduces modern open-source programming tools and libraries that can be used to implement these applications.  The concepts include filtering, prediction, classification, decision-making, Markov chains, LTI systems, spectral analysis, and frameworks for learning from data.  After reviewing each concept, we implement it within sample applications in Python, using libraries for array math (NumPy), manipulation of tables (Pandas), long-term storage (SQL, JSON, CSV files), natural language (NLTK), and ML frameworks (scikit-learn, TensorFlow).  The course includes a team-based data application project.

The skill set learned in this class can be applied to a broad range of industry sectors such as finance, health, engineering, transportation, energy, and many others.  The lab section of the course meets in parallel with the lecture.  In the lab, the first 4 weeks are used to generate a story and low-tech demo for a real-world project that performs actions on data, and the following 8 weeks will include code development, with a demonstration of working project code by the end of the class.
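As a small taste of the "prediction" concept named in the description, the sketch below fits a straight line to data by least squares and uses it to predict a new value. It is an illustrative sketch only, not course material: the function names `fit_line` and `predict` and the sample data are made up here, and it uses only the Python standard library, whereas the course itself works with NumPy.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Made-up data lying exactly on y = 2x + 1, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
model = fit_line(xs, ys)
print(predict(model, 4.0))
```

With real, noisy data the fitted line would only approximate the points; the same idea extends to several input variables, as covered in the linear prediction lecture.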

TEXTS AND REQUIRED SUPPLIES

  • Handouts in class
  • Anaconda Python Environment on personal computer

HOMEWORK, GRADING & ATTENDANCE

Class attendance and participation are expected, and sign-ins for sessions are tracked.  Absences for unavoidable reasons should be preapproved whenever possible via an email to the GSI.

Grading:

  • Homework: 35%
  • Attendance: 15%
  • Low Tech Validated Solution: 15%
  • Final Project Demo, depth: 35%

OFFICE HOURS & COMMUNICATIONS

TBD

SCHEDULE (Subject to Change)

Lecture 0
  • Topic: Introduction: Overview of Frameworks for obtaining insights from data (Slides).  Slides: Python and Math/Probability Prerequisites
  • Tools: Anaconda, Python
  • Cookbook Examples: Setting up Anaconda Environment
  • HW Due: HW 1 Assigned

Lecture 1
  • Topic: Notebook: Python NumPy Notebook.  Slides: Data Structure Outline.  Slides: NumPy Review
  • Tools: Python, NumPy, Pandas, JSON formatted files
  • Cookbook Examples: Earthquake Data live query.  Example with JSON file
  • HW Due: Bring 3 ideas to next class.  HW 1 Due
  • Lab: Form Teams

Lecture 2
  • Topic: Data signals in Tables.  Slides: Pandas Overview.  Notebook: Pandas Intro.  Notebook: Pandas and Stock Market
  • Tools: Pandas, NumPy, SciPy, Matplotlib
  • Cookbook Examples: Stock market live download to Pandas DataFrame.  Quant trading algorithm
  • HW Due: HW 2 Due
  • Lab: Form Teams

Lecture 3
  • Topic: Scoring, Linear Prediction and Max Likelihood Prediction.  Extending to multiple variables
  • Tools: NumPy, SciPy, Matplotlib
  • Cookbook Examples: Code samples: 2-variable and multi-variable Linear Prediction
  • HW Due: HW 3 Due
  • Lab: Validate and Adjust

Lecture 4
  • Topic: Classification.  Logistic Regression, SVM, Nonlinear mapping
  • Tools: scikit-learn, Seaborn Visualization
  • Cookbook Examples: Classification example with Iris Database: Logistic, SVM
  • HW Due: TBA
  • Lab: Low Tech Demo and Validation Results

Lecture 5
  • Topic: Classification II: KNN.  Data storage with SQL
  • Tools: scikit-learn, Seaborn Visualization
  • Cookbook Examples: Classification example with Iris Database: KNN.  SQL/Pandas Data Exchange example with Call Center Complaint data
  • HW Due: TBA
  • Lab: Agile Sprint with feedback and reflection

Lecture 6
  • Topic: NLTK Introduction, Markov Processes Introduction (Discrete Time)
  • Tools: NLTK
  • Cookbook Examples: Next Word Predictor, Spell Checking
  • HW Due: TBA
  • Lab: Agile Sprint with feedback and reflection

Lecture 7
  • Topic: Markov Continued.  Bayesian.  Bag of Words Model?
  • Tools: NLTK
  • Cookbook Examples: Corpora Access: Tweets, Gutenberg, Shakespeare?  Grammar Checking.  Web Crawler to increase training set
  • HW Due: TBA
  • Lab: Agile Sprint with feedback and reflection

Lecture 8
  • Topic: Neural Nets/Deep Learning Introduction
  • Tools: TensorFlow
  • Cookbook Examples: Classification using TensorFlow: MNIST Handwriting example
  • HW Due: TBA
  • Lab: Agile Sprint with feedback and reflection

Lecture 9
  • Topic: Neural Nets/Deep Learning
  • Tools: TensorFlow
  • Cookbook Examples: TensorFlow Example: Spam Filter for email
  • Lab: Agile Sprint with feedback and reflection

Lecture 10
  • Topic: Data as a signal I: LTI Systems, Convolution, Filters, Correlations
  • Tools: Python/NumPy/SciPy
  • Cookbook Examples: A Control Feedback Example and/or multivariate correlation
  • HW Due: TBA
  • Lab: Agile Sprint with feedback and reflection

Lecture 11
  • Topic: Data as a signal II: Transforms, spectral information, matrix-based features
  • Tools: Python/NumPy/SciPy
  • Cookbook Examples: TBD, possibly with Kalman Filter.  Stock market or weather frequency information
  • HW Due: TBA
  • Lab: Agile Sprint with feedback and reflection

Lecture 12
  • Topic: Image Classification, focus on feature selection
  • Tools: Python/OpenCV
  • Cookbook Examples: Image Classify.  Python/Unix Shell usage
  • HW Due: TBA
  • Lab: Agile Sprint with feedback and reflection

Lecture 13
  • Topic: Prep for Final Projects
  • HW Due: TBA
  • Lab: Demo Day
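
The Lecture 6 cookbook entry, a next-word predictor built on a discrete-time Markov process, could look roughly like the sketch below. It is not the course's actual code: the function names `train_bigram_model` and `predict_next` and the toy corpus are assumptions made here for illustration. It uses only the Python standard library, whereas the course would draw text from NLTK corpora.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it (first-order Markov chain)."""
    words = text.lower().split()
    transitions = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        transitions[current][following] += 1
    return transitions


def predict_next(transitions, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = transitions.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus, made up for illustration only.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
print(predict_next(model, "the"))
```

Each word is treated as a state, and the follower counts act as unnormalized transition probabilities. A larger training corpus would make the predictions more useful.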
 


COURSE MODEL ILLUSTRATION:

[Figure: course model illustration]