**Applied Data Science with Venture Applications**

IEOR 135 / 290-002

Instructor: Ikhlaq Sidhu

Department of Industrial Engineering & Operations Research

3 Units, Lecture and Lab

Prerequisites: Students should have a working knowledge of Python before the class begins and should have completed an introductory probability or statistics course.

Teaching Team:

- GSI: Sana Iqbal, sana_iqbal@berkeley.edu
- Visiting Scholar: Alexander Fred-Ojala, afo@berkeley.edu
- Kevin Bozhe Li, kbl4ew@berkeley.edu
- TensorFlow Lead: Nathan Cheng, ncheng@berkeley.edu
- NLTK Lead: Sam Choi, sam.choi@berkeley.edu
- Blockchain: Nadir Akhtar, nadir@blockchain.berkeley.edu
- Blockchain: Ali Mousa, alimousa@berkeley.edu

**Office Hours:** Tuesdays 11am – 12pm, Etcheverry Hall 4176B (Breakout room)

__Description__

**Course Description:**

This course is designed primarily for upper-level undergraduate engineering and technical students. Graduate students at a mezzanine level can also take a co-located section of the course. The course sits at the intersection of foundational mathematical concepts and current computer science tools, applied to real-world problems. Math concepts include filtering, prediction, classification, decision-making, Markov chains, LTI (linear time-invariant) systems, spectral analysis, and frameworks for learning from data. Computer science tools for this course are open source and include Python with NumPy, SciPy, and Pandas, as well as SQL, NLTK, TensorFlow, and Spark. The course includes a team-based data application project.

The lectures alternate between a mathematical framework and the same concept implemented in code. One goal is for students who understand the math to bring it to life with scalable CS tools, and for students who are comfortable writing software to build systems grounded in selected, structured mathematical frameworks. The course is designed to be more applied than a traditional ML algorithms course because it takes a systems view and covers implementation concepts.

Applications of this course are broad, spanning industry sectors such as finance, health, engineering, transportation, and energy, among many others. The lab section of the course meets in parallel with the lecture. In the lab, the first 4 weeks are used to develop a story and a low-tech demo for a real-world project that acts on data. The following 8 weeks are an agile sprint, culminating in a demonstration of working project code at the end of the class.

__TEXTS AND REQUIRED SUPPLIES__

- Handouts in class
- General Information: https://data-x.blog/
- Github for Code and Slides: https://github.com/ikhlaqsidhu/data-x
- Anaconda Python environment installed on a personal computer

__HOMEWORK, GRADING & ATTENDANCE__

Class attendance and participation are expected, and sign-ins for sessions are tracked. Absences for unavoidable reasons should be preapproved whenever possible via an email to the GSI.

**Grading:**

- Homework: 35%
- Attendance: 20%
- Low-Tech Validated Solution: 10%
- Final Project Demo, depth: 35%
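The breakdown above can be read as a weighted sum of component scores. The sketch below shows that arithmetic under stated assumptions: each component is scored on a 0-100 scale, the final grade is a plain weighted sum, and the component keys and example scores are hypothetical, not the course's official grade calculation.

```python
# Weights taken from the grading breakdown above (35% + 20% + 10% + 35% = 100%).
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "homework": 0.35,
    "attendance": 0.20,
    "low_tech_validated_solution": 0.10,
    "final_project_demo": 0.35,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine component scores (assumed 0-100 each) into one weighted total."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "homework": 90,
    "attendance": 100,
    "low_tech_validated_solution": 80,
    "final_project_demo": 85,
}
print(round(weighted_score(example), 2))
```

Because homework and the final project demo each carry 35%, together they account for 70% of the total, while attendance alone contributes as much as the low-tech solution twice over.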

__Piazza__

piazza.com/class/j6o5l788o874i

__SCHEDULE (Subject to Change)__

- On a weekly basis, class sessions may start with a “meet a mentor” and/or “application model case study” section.
- All slides and notebook samples will be updated at this site: https://github.com/ikhlaqsidhu/data-x

| Topic | Lecture (theory and tools) | Code | Due | Project |
|---|---|---|---|---|
| 1 | Introduction: Overview of frameworks for obtaining insights from data (slides). Python review with notebook | Anaconda, Python. Setting up the Anaconda environment. Python review BKHW | HW 1 assigned | Office hours session that week for environment setup |
| 2 | Tools: NumPy with notebook. Theory: Prediction | NumPy BKHW, including a prediction example | Bring 3 ideas to next class. HW 1 due | Form teams |
| 3 | Tools: Data signals in tables. Slides: Pandas overview. Theory: Data as a signal with correlation | Pandas BKHW. Live stock market download to a Pandas DataFrame, quant trading algorithm, correlation exercise | HW 2 due | Form teams, part II |
| 4 | Theory: ML overview, algorithm comparison. Tools: SKL with classification and regression | SKL classification and regression notebook | HW 3 due | Validate and adjust |
| 5 | Tools: Trees and classification. Theory: Spectral signals, LTI fundamentals | Decision tree BKHW, example with the Titanic data set | HW 4 due | Low-tech demo and validation results |
| 6 | Theory: Classification, loss, reward, logistic regression, regularization. Tools: Image processing/classification toolset | BKHW: Logistic classification, image fundamentals | HW 5 due | Agile sprint with reflection |
| 7 | Theory: TBD. Tools: Web scraping | BKHW: Web scraping and web crawling | HW 6 due | Agile sprint with reflection |
| 8 | Theory: TBD, Word2Vec. Tools: NLTK, natural language processing | BKHW: TBD | HW 7 due | Agile sprint with reflection |
| 9 | Theory: Neural networks and CNNs. Tools: TensorFlow | BKHW: Cats and Dogs | HW 8 due | Agile sprint with reflection |
| 10 | Theory: Blockchain. Tools: Blockchain as a database | BKHW: TBD (probably programming in Solidity) | HW 9 due | Agile sprint with reflection |
| 11 | Theory: Data and control systems. Tools: Spark | BKHW: TBD | HW 10 due | Agile sprint with reflection |
| 12 | Project presentations: Demo Day(s) | Presentation including running code and code samples | Includes preparation time in the last week | Final presentations |

- Tools to include if possible: connecting Pandas to SQL for long-term storage; AWS, SQL, and parallelization.
- Example application topics may include recommendation engines, digital mirror, customer journey, Bloom filters, and fuzzy join applications.

__COURSE MODEL ILLUSTRATION__