Berkeley Syllabus

Applied Data Science with Venture Applications
IEOR 135/290

Instructor: Ikhlaq Sidhu
Department of Industrial Engineering & Operations Research

3 Units, Lecture and Lab

Prerequisite: Students should have a working knowledge of Python before the class begins and should have completed an introductory probability or statistics course.

Teaching Team:

  • Ikhlaq Sidhu, IEOR, sidhu@berkeley.edu
  • Alexander Fred-Ojala, afo@berkeley.edu (lectures, notebooks, HW)
  • Sana Iqbal, sana_iqbal@berkeley.edu (class communication, attendance, HW)
  • Quinn Tran, ssquinntran@berkeley.edu

Extended Team:

  • Kevin Bozhe Li, kbl4ew@berkeley.edu
  • Blockchain: Nadir Akhtar, nadir@blockchain.berkeley.edu
  • Blockchain: Ali Mousa, alimousa@berkeley.edu

Office Hours:
Sana (HW, project): Thursdays, 12:00-1:30 pm, Etcheverry 4176 A
Alex (project, tech support): Fridays, 1:00-2:00 pm, SCET / Stadium
Ikhlaq Sidhu: Mondays, 10:30 am-12:30 pm, SCET / Stadium, by appointment (via m.glass@berkeley.edu)

Description

Course Description:

This course is designed primarily for upper-level undergraduate engineering and technical students. Graduate students at a mezzanine level can also take a co-located section of the course. The course builds understanding at the intersection of foundational mathematical concepts and current computer science tools, applied to real-world problems. Math concepts include filtering, prediction, classification, decision-making, entropy as part of information theory, LTI systems, spectral analysis, and frameworks for learning from data. Computer science tools for this course include open-source tools such as Python with NumPy, SciPy, Pandas, SQL, NLTK, TensorFlow, and Spark. The course includes a team-based data application project.

The lectures alternate between mathematical frameworks and the same concepts implemented in code. One goal is that students who understand math concepts can bring them to life with scalable CS tools, and that students who are comfortable with software can build systems by understanding selected, structured mathematical frameworks. The course is designed to be more applied than a traditional ML algorithms course because it takes a systems view and covers implementation concepts.

Applications of this course are broad. They include industry sectors such as finance, health, engineering, transportation, energy, and many others. The lab section of the course meets in parallel with the lecture. In the lab, the first 4 weeks are used to develop a story and a low-tech demo for a real-world project that performs actions on data, and the following 8 weeks are an agile sprint, culminating in a demonstration of working project code at the end of the class.

Find our amazing projects from Fall 2017 here.

TEXTS AND REQUIRED SUPPLIES

HOMEWORK, GRADING & ATTENDANCE

Class attendance and participation are expected, and sign-ins for sessions are tracked. Absences for unavoidable reasons should be approved in advance whenever possible by emailing the GSI.

Grading (the course must be taken for a letter grade):

  • Homework: 60%
  • Attendance: 20%
  • Low-Tech Validated Solution: 20%
  • Final Project Demo: can raise or lower the overall grade by up to one full letter grade.

Piazza

piazza.com/class/jb3z144fynr5yx

TENTATIVE SCHEDULE FOR SPRING 2018

  • On a weekly basis, class sessions may start with a “meet a mentor” and/or “application model case study” section.
  • All slides and notebook samples will be posted on this site.

Topic 1: Introduction
Theory: Overview of Frameworks for obtaining insights from data (Slides).
Tools: Python Review
Code:
  1. Introduction to GitHub
  2. Setting up the Anaconda Environment
  3. Coding with Python Review
Project: Office Hours session that week for environment setup
Topic 2: Tools: NumPy, Pandas, Matplotlib
Code:
  1. Coding with NumPy
  2. Coding with Pandas
  3. Coding with Matplotlib
Reading: DataCamp, Tutorialspoint, Textbook Chapter 1, pages 3-13
Project: Bring three ideas to class.
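As a small illustration of the Topic 2 tools, the sketch below uses NumPy for vectorized arithmetic and Pandas for a group-by summary. The prices, the `region`/`units` columns, and their values are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic is applied to the whole array at once (no explicit loop)
prices = np.array([10.0, 12.0, 9.0, 15.0])
discounted = prices * 0.9          # element-wise multiplication
avg_discounted = discounted.mean() # mean of the discounted prices

# Pandas: a small, hypothetical sales table
df = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "units": [3, 5, 2, 7],
})

# Total units per region (returns a Series indexed by region)
totals = df.groupby("region")["units"].sum()
print(avg_discounted)
print(totals)
```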
Topic 3: Project Mixer
Theory: Data as a Signal with Correlation
Code: None
Reading: Textbook Chapter 2, pages 33-45
Project: Share ideas and finalize projects.
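Topic 3 treats data as a signal and uses correlation to relate two signals. One common measure, assumed here for illustration, is the Pearson correlation coefficient r = cov(x, y) / (σx σy). The sketch below computes it directly with NumPy and cross-checks it against `np.corrcoef`; the helper name `pearson` and the toy signals are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length 1-D sequences (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center each signal
    yc = y - y.mean()
    # Normalized inner product of the centered signals
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

t = np.arange(10)
signal = 2.0 * t + 1.0          # perfectly linear in t, so r should be 1
r = pearson(t, signal)

# Cross-check against NumPy's built-in correlation matrix
builtin_r = np.corrcoef(t, signal)[0, 1]
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship, though a nonlinear dependence may still exist.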
Topic 4: Tools: Web scraping / crawling
Theory: Prediction Algorithms Primer, Linear Regression
Code: Requests and BeautifulSoup
Reading: Textbook Chapter 4, pages 105-110
Project: Mixer: form teams for the final project.
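For the linear regression part of Topic 4, the sketch below fits a line by ordinary least squares with `np.linalg.lstsq`. The synthetic data (roughly y = 3x + 2 plus Gaussian noise) and the random seed are assumptions made only so the example is reproducible.

```python
import numpy as np

# Hypothetical data: y is approximately 3x + 2 with small Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix: a column of ones (intercept) next to the feature column
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: choose w to minimize ||X w - y||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = w

# Predict the response for a new input value
y_new = intercept + slope * 12.0
```

The recovered slope and intercept should land close to 3 and 2; how close depends on the noise level and the number of samples.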
Topic 5: Theory:
  1. Classification and Logistic Regression
  2. Gradient Descent, Polynomial Regression, Overfitting / Underfitting, Regularization in ML: Overview
Tools: scikit-learn for Classification and Regression
Code: Coding with Pandas, scikit-learn, and Matplotlib on the Titanic dataset
Reading: Textbook Chapter 3, pages 81-95
Project: Develop an insightful story and brainstorm solutions.
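scikit-learn handles the optimization internally, so to show what gradient descent is doing, the sketch below fits logistic regression from scratch with NumPy by minimizing the mean cross-entropy loss, with an optional L2 penalty. This is an illustrative implementation, not scikit-learn's: its `LogisticRegression` uses different solvers and applies L2 regularization by default. The helper names, learning rate, iteration count, and toy data are all hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=0.0):
    """Batch gradient descent on the mean cross-entropy loss.

    X: (n, d) features; y: (n,) labels in {0, 1}.
    l2: optional L2 penalty strength (the bias is not penalized).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)          # predicted probabilities
        err = p - y                     # gradient of the loss w.r.t. each logit
        grad_w = X.T @ err / n + l2 * w
        grad_b = err.mean()
        w -= lr * grad_w                # step against the gradient
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    # Threshold the predicted probability at 0.5
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Toy, linearly separable data (hypothetical): class 1 when x0 + x1 > 1
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(float)

w, b = fit_logistic(X, y, lr=1.0, n_iter=3000)
accuracy = (predict(X, w, b) == y).mean()
```

Raising `l2` shrinks the weights toward zero, which is one way regularization counters overfitting.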
Topic 6: Theory: Introduction to Neural Networks: ANN, CNN, RNN
Tools: TensorFlow
Code: Coding with TensorFlow for image classification
Reading: Textbook Chapter 10, pages 256-272; Chapter 13, pages 357-359
Project: Low-Tech Demo and Validation Results
Topic 7: Theory:
  1. Introduction to Natural Language Processing: NLTK overview and Word2vec
  2. Sentiment Analysis
Tools: NLTK, Gensim, TensorFlow
Code: Coding with NLTK, Gensim, TensorFlow
Reading: Links
Project: Agile sprint with reflection
Topic 8: Theory:
  1. Loss versus Risk
  2. Theory of Decision Trees
Code: Coding with Python
Reading: Textbook Chapters 6 and 7
Project: Agile sprint with reflection
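The course description lists entropy as part of information theory, and one common way decision trees choose a split is information gain: the parent node's entropy minus the size-weighted entropy of the child nodes. This criterion is assumed here for illustration; scikit-learn's `DecisionTreeClassifier`, for instance, defaults to Gini impurity and offers `criterion="entropy"` as an option. The helper names and labels below are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(parent, children):
    """Entropy reduction from splitting `parent` into the `children` subsets."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Hypothetical labels before and after a candidate split
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

h_parent = entropy(parent)                   # 1.0 bit for a 50/50 mix
gain = information_gain(parent, [left, right])
```

A tree builder would evaluate candidate splits this way and keep the one with the largest gain.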
Topic 9: Theory:
  1. Introduction to Databases
  2. Introduction to SQL
  3. Introduction to Blockchain as a Database
  4. Big Data Analysis with Spark
Tools: SQL libraries in Python, Solidity
Code: Coding with Python for SQL and Spark
Reading: Textbook
Project: Agile sprint with reflection
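As one example of an SQL library in Python, the sketch below uses the standard-library `sqlite3` module with an in-memory database to create a table, insert rows with a parameterized query, and aggregate with `GROUP BY`. The `scores` table and its contents are hypothetical.

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of sprint scores per team
cur.execute("CREATE TABLE scores (team TEXT, sprint INTEGER, points REAL)")
rows = [("alpha", 1, 7.5), ("alpha", 2, 8.0), ("beta", 1, 6.0), ("beta", 2, 9.0)]
# Parameterized insert: values are bound by the driver, not pasted into the SQL string
cur.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
conn.commit()

# Average points per team, highest first
cur.execute(
    "SELECT team, AVG(points) AS avg_points FROM scores "
    "GROUP BY team ORDER BY avg_points DESC"
)
result = cur.fetchall()
conn.close()
```

To load a query result straight into a DataFrame instead, `pandas.read_sql_query(query, conn)` accepts a `sqlite3` connection.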
Topic 10: Theory: Spectral Signals, LTI Systems: Fundamentals and Applications
Tools: Temporal and spatial signal processing
Code: Coding with Python for signal processing
Reading: Textbook
Project: Agile sprint with reflection
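For Topic 10, the sketch below shows two basic ideas with NumPy: a spectral view of a signal through the FFT (`np.fft.rfft` with `np.fft.rfftfreq`), and an LTI system applied as a convolution with its impulse response. The sampling rate, the 5 Hz and 12 Hz test tones, and the 5-tap moving-average filter are all assumptions chosen for illustration.

```python
import numpy as np

fs = 100.0                      # sampling rate in Hz (assumed)
n = 200                         # 2 seconds of samples
t = np.arange(n) / fs
# Hypothetical signal: a 5 Hz tone plus a weaker 12 Hz tone
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# One-sided magnitude spectrum of a real-valued signal
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)   # bin spacing is fs / n = 0.5 Hz

# Frequencies of the two strongest components, in ascending order
top2 = np.sort(freqs[np.argsort(spectrum)[-2:]])

# LTI example: a 5-tap moving average is convolution with its impulse response h
h = np.ones(5) / 5
y = np.convolve(x, h, mode="same")       # output has the same length as x
```

Because the moving average is linear and time-invariant, its effect is fully described by `h`; here it acts as a low-pass filter, attenuating the 12 Hz tone more than the 5 Hz tone.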
Topic 11: Theory: Reinforcement Learning Primer
Code: TBD
Reading: Textbook Chapter 16, pages 443-450
Project: Agile sprint with reflection
Topic 12: Project Presentations: Demo Day(s)
Code: Presentation including running code and code samples
Due: Includes preparation time in the last week
Project: Final Presentations
  • To include, if possible: tools for connecting Pandas to SQL for long-term storage; AWS / SQL / parallelization.
  • Example application topics may include recommendation engines, digital mirror, customer journey, bloom filters, and fuzzy join applications.

COURSE MODEL ILLUSTRATION:

[Figure: course model illustration (dx-project)]