The Lifecycle Of A Machine Learning Project: Sea Turtle Conservation And Predictive Modeling

Abstract: In this tutorial, I will walk through the important steps of a machine learning project, from problem definition and data collection to an interpretable predictive model and actionable scientific insights. I will use an academic project to illustrate important concepts, including:
- how to incorporate external datasets,
- how to generate features from time series data,
- how to explore and visualize the data,
- why proper cross-validation approaches matter,
- how to improve the interpretability of supervised machine learning models using XGBoost and SHAP values.

We will analyze a rich dataset on the basking behavior of green sea turtles. This is a collaboration between data scientists at the Center for Computation and Visualization at Brown University and the Hawaii Wildlife Fund, a non-profit wildlife conservation organization. Green sea turtles are endangered marine animals that bask, or rest, on beaches. Maui’s Ho’okipa beach hosts one of the largest and densest basking aggregations in the state of Hawaii. Volunteers from the Hawaii Wildlife Fund are stationed at the beach from approximately 2:30 to 7:30 pm every day of the year, except during severe weather conditions (hurricane-force winds, etc.). They have been recording counts of human visitors and basking turtles on the beach for years.

The goals of the collaboration are to better understand the basking behavior of the turtles and to use predictive modeling to inform the Hawaii Wildlife Fund’s management and policy decisions.

Bio: Andras Zsom is a Lead Data Scientist at the Center for Computation and Visualization at Brown University. He manages a small but dedicated team of data scientists whose mission is to help high-level university administrators make better data-driven decisions through data analysis and predictive modeling. The team also collaborates with faculty members on various data-intensive academic projects and trains data science interns.

Andras is passionate about using machine learning and predictive modeling for good. He is an astrophysicist by training and has been fascinated by all fields of the natural and life sciences since childhood. He was a postdoctoral researcher at MIT for 3.5 years before coming to Brown. He obtained his PhD from the Max Planck Institute for Astronomy in Heidelberg, Germany, and he was born and raised in Hungary.