The 21st century has witnessed the birth of data analytics in baseball. Almost every professional organization has an in-house data analytics team dedicated to providing player insights, scouting reports, and game predictions. Each team has access to large amounts of private data, which it uses to gain as much of an advantage as possible, both in preparation and in real time during games. The Cal baseball team also has an array of data available, but it is unstructured and disorganized. Because college teams generate less revenue than professional teams, their budgets are much smaller, and they cannot simply hire an analyst on staff. As a result, few if any insights are currently being drawn from these data sources.

The Clubhouse team in UC Berkeley’s Data-X course set out to tackle this issue. Using machine learning and data analytics, the team made use of the vast amount of data available and developed three models, each of which provides insights that the Cal team has never had before. The models can all be accessed through a user-friendly UI. Because of this product, Cal’s pitchers and rotations have improved, resulting in more wins for the program. The team’s future work will focus on gaining access to opposing-team data to further improve the models and boost their accuracy.