Datathons
Checklist
- Learn more about the problem. Search for similar Kaggle competitions. Check the task in Papers with Code.
- Do a basic data exploration. Try to understand the problem and get a sense of what might be important.
- Get a baseline model working.
- Design an evaluation method as close as possible to the final evaluation. Plot local evaluation metrics against the public ones (correlation) to check how well your validation strategy works.
- Try different approaches for preprocessing (encodings, Deep Feature Synthesis, lags, aggregations, imputers, …). If you're working as a group, split preprocessing and feature generation across files.
- Plot learning curves (sklearn or external tools) to spot overfitting.
- Plot the real and predicted target distributions to see how well your model understands the underlying distribution. Apply any postprocessing that might fix small discrepancies.
- Tune hyper-parameters once you've settled on a specific approach (hyperopt, optuna).
- Plot and visualize the predictions (histograms, random predictions, …) to make sure they behave as expected. Explain the predictions with SHAP.
- Think about what postprocessing heuristics can be done to improve or correct predictions.
- Stack classifiers (example).
- Try AutoML models. For tabular data: TPOT, AutoSklearn, AutoGluon, Google AI Platform, PyCaret, Fast.ai, Alex. For time series: AtsPy, DeepAR.
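The baseline and local evaluation steps from the checklist can be sketched as follows. This is a minimal illustration, assuming a binary classification task scored with accuracy; the data is synthetic (`make_classification`) and the choice of logistic regression is an assumption, not a recommendation from this page. The point is to fix the folds once and score every later experiment against the same trivial baseline.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the competition data (assumption, not a real dataset).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Fixed folds so every experiment is scored on exactly the same splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Trivial baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")
baseline_scores = cross_val_score(baseline, X, y, cv=cv, scoring="accuracy")

# First real model: scaling + logistic regression inside one pipeline,
# so the scaler is fit only on each training fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"baseline: {baseline_scores.mean():.3f} +/- {baseline_scores.std():.3f}")
print(f"model:    {model_scores.mean():.3f} +/- {model_scores.std():.3f}")
```

Swap `scoring` for the competition metric where possible, and keep the per-experiment local scores so they can later be compared against the public ones.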
Preprocessing Resources
Exploratory Data Analysis Resources
Scikit Learn Compatible Transformers
Other Compatible Tools
Time Series Resources
- Quick Tutorials
- Tsfresh
- Fold
- Neural Prophet or TimesFM
- Darts
- Functime
- Pytimetk
- Sktime / Aeon
- Awesome Collection
- Video with great ideas
- Tutorial Kaggle Notebook
- Think about adding external datasets like related Google Trends searches, PyPI package downloads, Statista, weather, …
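For time series, the lags and aggregations mentioned in the checklist can be sketched with plain pandas before reaching for any of the libraries above. This is a minimal example on a synthetic daily series; the column names (`y`, `lag_1`, `lag_7`, `roll_mean_7`) and window sizes are hypothetical. Features are built only from past values, and validation folds are time-ordered so the model is never scored on data that precedes its training set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Synthetic daily series (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=60, freq="D")
df = pd.DataFrame({"date": dates, "y": rng.normal(size=60).cumsum()})

# Lag features: the target as it was 1 and 7 days earlier.
df["lag_1"] = df["y"].shift(1)
df["lag_7"] = df["y"].shift(7)

# Rolling aggregation over the previous 7 days; shifting first keeps
# the current day's value out of its own feature (no leakage).
df["roll_mean_7"] = df["y"].shift(1).rolling(window=7).mean()

# Drop the warm-up rows that have no complete history yet.
features = df.dropna().reset_index(drop=True)

# Time-ordered splits: each validation fold comes strictly after its training fold.
splitter = TimeSeriesSplit(n_splits=3)
folds = list(splitter.split(features))
for train_idx, valid_idx in folds:
    first_valid_day = features["date"].iloc[valid_idx[0]].date()
    print(f"train rows: {len(train_idx)}, valid rows: {len(valid_idx)}, "
          f"validation starts: {first_valid_day}")
```

External signals such as weather or search trends can be joined on the date column and lagged in the same way, as long as only values that would have been known at prediction time are used.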