Berkeley Syllabus

Applied Data Science with Venture Applications
IEOR 135/290

Instructors: Ikhlaq Sidhu & Arash Nourian
Department of Industrial Engineering & Operations Research

3 Units, Lecture and Lab

Link to current schedule of topics and homeworks

Prerequisites: Students should have a working knowledge of Python before the class begins and should have completed a fundamental probability or statistics course.

Teaching Team:

  • Ikhlaq Sidhu, sidhu@berkeley.edu (Instructor)
  • Arash Nourian, nourian@berkeley.edu (Instructor)
  • Harika Kalluri, harikakalluri@berkeley.edu (Project coordinator)
  • Lillian Dong, lilliandong@berkeley.edu (GSI)
  • Zhi Li, zhili@berkeley.edu (GSI)
  • Deirdre Quillen, deirdrequillen@berkeley.edu (GSI)
  • Ishaan Malhi, ishaan.malhi@berkeley.edu (GSI)

Extended Team:

  • Alexander Fred-Ojala, afo@berkeley.edu

Office Hours:

  • Mondays 1-2pm in Etcheverry Hall, Room 4176b
  • Fridays 2-3pm in Etcheverry Hall, IEOR Conference Room

Arash Nourian: by appointment via email
Ikhlaq Sidhu: by appointment via Melissa Glass, m.glass@berkeley.edu

Description

Course Description:

This course is designed primarily for upper-level undergraduate engineering and technical students. Graduate students at a mezzanine level can also take a co-located section of the course. The course material offers an understanding at the intersection of foundational mathematical concepts and current computer science tools, with applications to real-world problems. Math concepts include filtering, prediction, classification, decision-making, entropy as part of information theory, LTI systems, spectral analysis, and frameworks for learning from data. Computer science tools for this course include open source tools such as Python with NumPy, SciPy, Pandas, SQL, NLTK, TensorFlow, and Spark. The course includes a team-based data application project.

The lectures alternate between mathematical frameworks and the same concepts expressed in code examples. One goal is that students who understand math concepts can bring them to life with scalable CS tools. Likewise, students who are comfortable with software code can create systems by understanding selected, structured mathematical frameworks. This course is designed to be more applied than a traditional ML algorithms course, as it includes a systems view and covers implementation concepts.

Applications of this course are broad. They include industry sectors such as finance, health, engineering, transportation, energy, and many others. The lab section of the course meets in parallel with the lecture. In the lab, the first 4 weeks are used to generate a story and low-tech demo for a real-world project that performs actions on data, and the following 8 weeks are an agile sprint, with a demonstration of working project code by the end of the class.

Find our amazing projects from previous semesters here.

TEXTS AND REQUIRED SUPPLIES

HOMEWORK, GRADING & ATTENDANCE

Class attendance and participation are expected, and sign-ins for sessions are tracked. Absences for unavoidable reasons should be preapproved whenever possible via an email to the GSI.

Grading: (Required to be taken on Letter Grade only)
The class will be graded according to the categories below. At the end of the class there will be a poster presentation + live demo during reading week where invited judges will provide assessment of each project.

  • Homework: 35%
  • Quizzes: 15%
  • Low Tech Validated Solution (Demo + MVP): 20%
  • Final Project + Write up + Code Review: 30%
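
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each category is scored on a 0 to 100 scale and that the course total is a simple weighted average; neither assumption is stated in this syllabus, and the names `WEIGHTS` and `weighted_total` are hypothetical.

```python
# Category weights taken from the grading list above (they sum to 100%).
WEIGHTS = {
    "homework": 0.35,
    "quizzes": 0.15,
    "low_tech_demo": 0.20,
    "final_project": 0.30,
}


def weighted_total(scores):
    """Return the weighted course total for per-category scores.

    Assumption: each score is on a 0-100 scale and every category is present.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores for one student.
    example = {"homework": 90, "quizzes": 80, "low_tech_demo": 85, "final_project": 95}
    print(round(weighted_total(example), 2))
```

A total computed this way is only an input to the final grade: the letter grade depends on where it falls relative to the rest of the class, since the syllabus uses percentile thresholds and not fixed cutoffs.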

Based on our previous experience in the course, we have decided to use the following percentile thresholds for final grades: A (top 30%), A- (next 30%), B+ (next 25%), with case-by-case grading for the rest. We reserve the right to increase or decrease these thresholds based on the performance of the class.

Student Accommodations: Students with disabilities who need accommodations in order to have equal access to this course will be accommodated. If you have not done so already, please contact DSP and apply for services. If you are already eligible for services, please be sure to request your accommodation letters for this class. You are welcome to visit me in office hours or to schedule an individual appointment with me via email to review your accommodations.

SCHEDULE OF TOPICS

  • On a weekly basis, class sessions may start with a “meet a mentor” and/or “application model case study” section.*
  • All slides and notebook samples will be updated at this site.
Topic 1: Introduction
Theory: Overview of Frameworks for obtaining insights from data (Slides).
Tools: Python Review
Code 1. Introduction to GitHub
2. Setting up Anaconda Environment
3. Coding with Python Review
Homework HW1

HW2

 Project Module 1: Project Introduction
Topic 2: Tools: Linear Regression, Data as a Signal with Correlation
Code
Reading
Project Module 2: Team Formation 1
Topic 3: Theory: Regression - ML
Code Coding with NumPy
Reading DataCamp, tutorialspoint
Project Module 3: Team Formation 2
Topic 4: Theory: Classification and Logistic Regression
Code Coding with Pandas
Reading
 Project Develop insightful story and brainstorm solutions
Topic 5: Theory: Correlation
Code
Reading Correlation Reading
 Project Team break out discussions
Topic 6: Theory: Prediction & Intro to Scikit-Learn
Code Coding with Scikit-Learn
Reading Prediction Slides
 Project
Topic 7: Theory: Matplotlib / Data Visualization
Code Coding with Matplotlib
Reading
 Project
Topic 8: Theory: Low Tech Demo Presentations
Code
Reading
 Project Module 4: Low Tech Demo
Topic 9: Theory: Classification & Prediction
Code Reference Titanic Notebook (part1-4)
Reading
 Project Module 5
Topic 10: Theory: Machine Learning & Cross Validation
Code Coding with Python for ML deployment with Flask
Reading Machine Learning Reading
 Project Module 5
Topic 10: Theory: Decision Trees, Information Theory, Random Forest
Code Reference Titanic Notebook (Part 4)
Reading Slides
 Project Module 5
Topic 11: Tools: Web Scraping  & Web Crawling
Code Web Scraping Notebook, Breakout
Reading
 Project Module 6
Topic 12: Theory:
1. Introduction to Natural Language Processing – NLTK overview and Word2vec
2. Sentiment Analysis
Tools: NLTK, Gensim, TensorFlow
Code Coding with NLTK, Gensim, TensorFlow
Reading Links, Slides
 Project Module 6
Topic 13: Theory: Polynomial Regression, Bias Variance Tradeoff, Regularization
Code Regularization Notebook
Reading Slides
 Project Module 7
Topic 14: Theory: Introduction to Neural Networks: ANN, CNN, RNN
Tools: TensorFlow
Code Coding with TensorFlow for image classification
Reading Slides
 Project Module 8
Topic 15: Theory:
1. Introduction to databases
2. Introduction to SQL
3. Introduction to Blockchain as a database
4. Big Data Analysis with Spark
Tools: SQL libraries in Python, Solidity
Code Coding with Python for SQL
Reading Text Book
 Project Module 9
Topic 16: Theory: Spectral Signals, LTI: Fundamentals and Applications
Tools: Temporal and Spatial Signal Processing
Code Coding with Python for Spark
Reading
 Project Module 10
Topic 17: Theory: GAN/Reinforcement Learning
Code TBD
Reading
Project Module 11-12
Topic 18: Project Presentations – Demo Day(s)
Code Presentation including running code and code samples
Due Includes preparation time in the last week
 Project Final Presentations
  • To include, if possible, as tools: connecting Pandas to SQL for long-term storage; AWS / SQL / parallelization.
  • Example application topics may include recommendation engines, digital mirror, customer journey, Bloom filters, and fuzzy join applications.
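
For the Pandas-to-SQL item above, the round trip might look like the following minimal sketch. It assumes SQLite through Python's built-in `sqlite3` module as the database; the syllabus does not name a specific engine, and the table name `scores` and the sample data are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; any DataFrame works the same way.
df = pd.DataFrame({"user_id": [1, 2, 3], "score": [0.9, 0.4, 0.7]})

# An in-memory SQLite database keeps the sketch self-contained;
# pass a file path instead for storage that persists between runs.
con = sqlite3.connect(":memory:")

# Write the DataFrame to a SQL table, replacing the table if it exists.
df.to_sql("scores", con, if_exists="replace", index=False)

# Read a filtered subset back into a new DataFrame.
query = "SELECT user_id, score FROM scores WHERE score > 0.5 ORDER BY user_id"
high = pd.read_sql_query(query, con)
con.close()

print(high)
```

Writing with `index=False` leaves the DataFrame index out of the table, so the stored columns match the ones defined in the DataFrame.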

COURSE MODEL ILLUSTRATION:

[Image: dx-project]