[Team photo]

[Figure 1: project photo]

[Figure 2: project photo]

Team 32

Team Members

John Gauthier
Samarth Kasbawala
Michael Mannella
Ryan Miller

Faculty Advisor

Wei Wei



Sponsored by: No Sponsor

The sports industry has always relied on statistics and past data to make inferences about future trends and performance. This sector has become so lucrative that the sports analytics market is projected to reach nearly $5 billion by 2021. Major League Baseball is one of the largest consumers and industry drivers of these analytics. Statistical analysis is nothing new in baseball: it has been used since the sport's inception to give teams an edge over their competition. Our team set out to create a predictive model that lets users select a matchup between two baseball teams. Users choose the away team, the home team, the away starting pitcher, and the home starting pitcher as input to the model, and are presented with the win probability for each of the selected teams. To accomplish this, the group used machine learning and comprehensive data from Retrosheet to train various models. For the matchup predictions, Random Forest Classifiers gave the best results. The team also used Ridge Regression to predict the end-of-season standings for 2021. An accompanying web app was built and is publicly accessible on the project website. On the site, users can interact with the predictor and find links to the source code. The data processing and machine learning were done in Python in the form of Jupyter Notebooks, while the web app used a combination of HTML, CSS, and Python in conjunction with the Flask microframework.
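To make the matchup predictor more concrete, here is a minimal sketch of how a Random Forest Classifier can turn matchup features into a win probability for each team. It is not the team's actual pipeline: it assumes scikit-learn and NumPy, the two features (win-percentage difference and starting-pitcher ERA difference) are hypothetical, and the training data is synthetic rather than drawn from Retrosheet.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical per-game features (home minus away). These are assumptions
# for illustration, not the features used in the project.
n_games = 500
win_pct_diff = rng.normal(0.0, 0.08, n_games)  # team win percentage difference
era_diff = rng.normal(0.0, 1.0, n_games)       # starting pitcher ERA difference
X = np.column_stack([win_pct_diff, era_diff])

# Synthetic labels: 1 if the home team won, 0 if the away team won.
logit = 0.2 + 6.0 * win_pct_diff - 0.5 * era_diff
home_won = (rng.random(n_games) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, home_won)


def win_probabilities(win_pct_diff, era_diff):
    """Return the estimated win probability for the away and home team."""
    proba = model.predict_proba([[win_pct_diff, era_diff]])[0]
    classes = list(model.classes_)
    # Look up columns by class label instead of assuming their order.
    return {
        "away": float(proba[classes.index(0)]),
        "home": float(proba[classes.index(1)]),
    }


# Example matchup: the home team has the better record and the better starter.
probs = win_probabilities(0.05, -0.75)
print(probs)
```

The two probabilities are the fractions of the forest's averaged tree votes for each outcome, so they always sum to 1. In a real setting, the selected teams and starting pitchers would first be mapped to numeric features derived from historical game logs.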