Predicting Pet Life Expectancy: An AI Hackathon Challenge
MARS hosts an internal data science/AI hackathon every year, and for 2022 the challenge was to create a model that could predict pet life expectancy based on medical history.
I decided to compete this year to take a break from my daily deep learning work and exercise some AI muscles I hadn’t used in a while (regression problems and tabular data 😅).
The model I submitted scored an RMSE of 0.695 years (a little over 8 months).
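For anyone who hasn't worked with RMSE (root mean squared error) in a while, here is a minimal sketch of how the metric is computed and how a score in years converts to months. The ages and predictions below are made up for illustration and are not hackathon data; the helper names `rmse` and `years_to_months` are my own.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def years_to_months(years):
    """Convert a duration in years to months."""
    return years * 12


# Hypothetical ages at death (in years) and model predictions.
actual = [10.0, 12.0, 8.0, 15.0]
predicted = [11.0, 11.0, 9.0, 14.0]

# Every prediction is off by exactly one year, so the RMSE is one year.
toy_score = rmse(actual, predicted)

# The submitted score from this post, expressed in months.
submitted_months = years_to_months(0.695)

print(f"toy RMSE: {toy_score:.3f} years")
print(f"0.695 years is about {submitted_months:.2f} months")
```

Because the errors are squared before averaging, a few large misses hurt the score more than many small ones.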
The training data consisted of the medical history of 2000 pets (canine and feline), including the age at which they passed away (e.g. 10.856 years).
The test data contained the medical history of 1000 pets.
The basic workflow was:
Exploratory Data Analysis to understand the data and remove any obvious bad data (e.g. pets with age in negative years)
Basic feature selection (about 15 columns of data; e.g. number of vet visits over lifetime)
Feature Engineering; creating new features from our understanding of the data (over 3k features/columns)
Final Feature Selection; Using feature selection packages to determine which features had the biggest impact on our RMSE. (down to ~500 columns)
Baseline model testing (XGBoost, CatBoost, LightGBM). LightGBM had the best RMSE scores and was the fastest to train.
Hyperparameter Tuning; Used Optuna and LightGBMTuner to find the best Hyperparameters for our model.
Final round of training and testing before model submission!
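To make the first step and the scoring loop concrete, here is a rough, library-free sketch: drop obviously bad rows (negative ages), hold out part of the data, and score a naive "predict the mean age" baseline with RMSE. This is not the code I submitted. The column names `vet_visits` and `age_at_death`, the toy records, and the mean baseline are all assumptions for illustration; in the real workflow a gradient-boosted model would take the place of the baseline.

```python
import math
import random


def drop_bad_rows(rows):
    """Keep only rows with a positive age at death (negative ages are bad data)."""
    return [row for row in rows if row["age_at_death"] > 0]


def split_holdout(rows, holdout_fraction=0.25, seed=0):
    """Shuffle reproducibly and split into (train, holdout) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fit_mean_baseline(train_rows):
    """Return a predictor that ignores its input and predicts the mean training age."""
    mean_age = sum(row["age_at_death"] for row in train_rows) / len(train_rows)
    return lambda row: mean_age


# Hypothetical toy records; the real data had far more columns.
pets = [
    {"vet_visits": 12, "age_at_death": 10.856},
    {"vet_visits": 30, "age_at_death": 14.2},
    {"vet_visits": 5, "age_at_death": -1.0},  # obviously bad row
    {"vet_visits": 18, "age_at_death": 12.4},
    {"vet_visits": 7, "age_at_death": 6.3},
    {"vet_visits": 22, "age_at_death": 15.1},
    {"vet_visits": 9, "age_at_death": 9.7},
    {"vet_visits": 14, "age_at_death": 11.0},
    {"vet_visits": 3, "age_at_death": 4.5},
]

clean = drop_bad_rows(pets)
train, holdout = split_holdout(clean)
predict = fit_mean_baseline(train)

baseline_rmse = rmse(
    [row["age_at_death"] for row in holdout],
    [predict(row) for row in holdout],
)
print(f"kept {len(clean)} of {len(pets)} rows")
print(f"mean-baseline RMSE on holdout: {baseline_rmse:.3f} years")
```

A naive baseline like this gives a reference point: any real model should beat it on the same holdout set before it is worth tuning.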
Technology Used:
Python 🐍
LightGBM ⚡
Optuna 📊
Honorable Mentions
CatBoost
XGBoost
Third place had an RMSE of 0.687 and first place had 0.661, a very close competition!