Applied Data Science with Venture Applications
IEOR 135/ 290
Instructor: Ikhlaq Sidhu
Department of Industrial Engineering & Operations Research
3 Units, Lecture and Lab
Prerequisite: Interested students should have working knowledge of Python in advance of the class, and also should have completed a fundamental probability or statistics course.
Teaching Team:
 Lead: Ikhlaq Sidhu, IEOR, sidhu@berkeley.edu
 Visiting Scholar: Alexander FredOjala, afo@berkeley.edu
 GSI: Sana Iqbal: sana_iqbal@berkeley.edu
 Kevin Bozhe Li, kbl4ew@berkeley.edu
 Blockchain: Nadir Akhtar, nadir@blockchain.berkeley.edu
 Blockchain: Ali Mousa, alimousa@berkeley.edu
Office Hours: Tuesdays 11am – 12pm, Etcheverry Hall 4176B (Breakout room)
Description
Course Description:
This course is designed primarily for upperlevel undergraduate engineering and technical students. Graduate students at a mezzanine level can also take a colocated section of the course. The course material offers an understanding at the intersection of foundational math mathematical concepts and current computer science tools, with applications of real world problems. Math concepts include filtering, prediction, classification, decisionmaking, entropy as part of information theory, LTI systems, spectral analysis, and frameworks for learning from data. Computer science tools for this course include open source tools such as Python with Numpy, Scipy, Pandas, SQL, NLTK, Tensor Flow, and Spark. The course includes a team based data application project.
The lectures present alternating and related topics between mathematical frameworks and the same concept within code examples. One goal is that students who understand math concepts can bring them to life with scalable CS tools. And, students who are comfortable with computer software code can create systems by understanding selected, structured mathematical frameworks. This course is designed to be more applied than a traditional ML algorithms course as it includes a systems view and covers implementation concepts.
Applications of this course are broad. They include industry sectors such as finance, health, engineering, transportation, energy, and many others. The lab section of the course meets in parallel with the lecture. In the lab, the first 4 weeks are used to generate a story and lowtech demo for a realworld project that performs actions on data, and the following 8 weeks will be an agile sprint, with a demonstration of working project code by the end of the class. The skill set learned in this class can be applied to a broad range of industry sectors such as finance, health, engineering, transportation, energy, and many others.
Find our amazing projects from Fall2017 here.
TEXTS AND REQUIRED SUPPLIES
 HandsOn Machine Learning with ScikitLearn and TensorFlow By Aurélien Géron
 General Information
 Github for Code and Slides
 Anaconda Python Environment on personal computer
HOMEWORK, GRADING & ATTENDANCE
Class attendance and participation are expected, and signins for sessions are tracked. Absences for unavoidable reasons should be preapproved whenever possible via an email to the GSI
Grading: (Required to be taken on Letter Grade only)
 Homework: 35%
 Attendance: 20%
 Low Tech Validated Solution 10%
 Final Project Demo, depth: 35%
Piazza
piazza.com/class/j6o5l788o874i
TENTATIVE SCHEDULE FOR SPRING 2018
 On a weekly basis, class sessions may start with a “meet a mentor” and/or “application model case study” section.*
 All slides and notebook samples will be updated at this site.
Topic 1:  Introduction Theory: Overview of Frameworks for obtaining insights from data (Slides). Tools: Python Review 
Code  1. Introduction to GitHub 2. Setting up Anaconda Environment 3. Coding with Python Review 
DUE  Homework 1 assigned. 
Project  Office Hours Session that week for Environment Set Up 
Topic 2:  Tools: NumPy, Pandas, Matplotlib 
Code 

DUE  HW 1 Due 
Project  Bring 3 ideas to class. Mixer: Form teams for the final project. 
Topic 3:  Theory: Data as a Signal with Correlation Tools: Webscraping – crawling and API use 
Code  Coding with BeautifulSoup and other python scraping libraries 
DUE  Homework 2 Due 
Project  Form Teams Part II 
Topic 4:  Theory: Prediction Algorithms Primer Tools: Scikit Learn for Classification and Regression 
Code  Coding with Scikit Learn 
DUE  Homework 3 Due 
Project  Validate and Adjust 
Topic 5:  Theory: 1. Introduction to Theory of Machine Learning 2. Regularization in ML – Overview Tools: Data Wrangling and Prediction in Python 
Code  Coding with Pandas, Scikit Learn, Matplotlib on Titanic dataset 
DUE  Homework 4 Due 
Project  Low Tech Demo and Validation Results 
Topic 6:  Theory: Introduction to Neural Networks ANN, CNN, RNN Tools: Tensorflow 
Code  Coding with Tensorflow for image classification 
DUE  Homework 5 Due 
Project  Agile sprint with reflection 
Topic 7:  Theory: 1. Introduction to Natural Language Processing – NLTK overview and Word2vec 2. Sentiment Analysis Tools: NLTK, Gensim, Tensorflow 
Code  Coding with NLTK, Gensim, Tensorflow 
DUE  Homework 6 Due 
Project  Agile sprint with reflection 
Topic 8:  Theory: 1. Loss versus Risk 2. Theory of Decision Trees 
Code  Coding with python 
DUE  Homework 7 Due 
Project  Agile sprint with reflection 
Topic 9:  Theory: 1. Introduction to database 2. Introduction to SQL 3. Introduction to Block Chain as a database 4. Big Data Analysis with Spark Tools: SQL libraries in python, Solidity 
Code  Coding with python for SQL and Spark 
DUE  Homework 8 due. 
Project  Agile sprint with reflection 
Topic 10:  Theory: Spectral Signals, LTI Fundamentals and Applications Tools: Temporal and Spatial Signal processing 
Code  Coding with python for signal processing 
DUE  Homework 9 Due 
Project  Agile sprint with reflection 
Topic 11:  Theory: Reinforcement Learning primer 
Code  TBD 
DUE  Homework 10 Due 
Project  Agile sprint with reflection 
Topic 12:  Project Presentations – Demo Day(s) 
Code  Presentation including running code and code samples 
DUE  Includes preparation time in last week 
Project  Final Presentations 
 To include, if possible tool: Connecting Pandas to SQL for Longterm storage. AWS / SQL / Parallelization.
 Example application topics may include examples such as recommendation engines, digital mirror, customer journey, bloom filters, fuzzy join applications.
COURSE MODEL ILLUSTRATION: