Heart Failure Survival Analysis

Project Overview

Dataset

299 heart failure patient records with 13 clinical features. Binary outcome: survival or death during follow-up period.

Key Finding

Two features (serum creatinine & ejection fraction) alone achieve strong predictive performance comparable to models using all 13 features.

Learning Path

Nine weeks of hands-on lessons covering data exploration, statistics, machine learning, ensemble methods, deep learning, and model interpretation, all using real clinical data.

Curriculum

Week 1 Data Exploration

Introduction to the dataset, exploratory data analysis, and visualization techniques using Pandas, Seaborn, and Matplotlib.

Notebook

Week 2 Statistical Analysis

Hypothesis testing (t-test, Mann-Whitney U), correlation analysis, false discovery rate (FDR) correction for multiple comparisons, and the variance inflation factor (VIF) for detecting multicollinearity.
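As a rough illustration of the FDR correction step, the sketch below implements the Benjamini-Hochberg procedure in plain Python. The choice of Benjamini-Hochberg is an assumption (the overview does not say which FDR method the notebook uses), and the p-values are hypothetical, not results from the dataset.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per feature tested (illustration only).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adj]
print(significant)
```

Note that the third and fourth p-values fall below 0.05 before correction but not after it, which is the kind of borderline result the correction is meant to catch.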

Notebook

Week 3 Unsupervised Learning

Dimensionality reduction with PCA, K-Means clustering, hierarchical clustering, silhouette analysis, and the elbow method.

Notebook

Week 4 Supervised Learning

Classification algorithms including Logistic Regression, Random Forest, SVM, and KNN. Also covers stratified train/test splitting and model evaluation metrics.
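To show what stratification means in practice, here is a minimal standard-library sketch that splits indices so each class keeps roughly the same proportion in the training and test sets. The function name `stratified_split`, the 20% test fraction, and the toy labels are all assumptions for illustration; a library routine would normally be used in the notebook itself.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 1 = death, 0 = survival, with a 30/70 imbalance.
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
test_deaths = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_deaths)
```

Without stratification, a small test set drawn from an imbalanced outcome can end up with too few positive cases for the evaluation metrics to be stable.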

Notebook

Week 5 Hyperparameter Optimization

Advanced tuning techniques: grid search with GridSearchCV, random search, and Bayesian optimization with Optuna for searching the hyperparameter space efficiently.
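The difference between grid search and random search can be sketched without any ML library. In the example below, `validation_score` is a hypothetical stand-in for a cross-validated model score, and the parameter names and ranges are assumptions chosen for illustration. Bayesian optimization is not shown here.

```python
import itertools
import random


def validation_score(max_depth, learning_rate):
    """Hypothetical stand-in for a cross-validated score.

    Peaks at max_depth=4 and learning_rate=0.1.
    """
    return 1.0 - 0.01 * (max_depth - 4) ** 2 - 5.0 * (learning_rate - 0.1) ** 2


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}

# Grid search: evaluate every combination (3 x 3 = 9 evaluations).
grid_best = max(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda params: validation_score(**params),
)

# Random search: spend the same budget on points sampled from ranges.
rng = random.Random(0)
samples = [
    {"max_depth": rng.randint(2, 6), "learning_rate": rng.uniform(0.01, 0.3)}
    for _ in range(9)
]
random_best = max(samples, key=lambda params: validation_score(**params))

print(grid_best)
print(random_best)
```

Grid search only ever tries the listed values, while random search can land on values between them, which tends to matter more as the number of hyperparameters grows.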

Notebook

Week 6 Ensemble Methods & Boosting

From Random Forest to Gradient Boosting to LightGBM. Includes hyperparameter tuning with Optuna, evaluation with four metrics, and hands-on exercises.

Notebook Slides

Week 7 Feature Selection Methods

Why fewer features often beat more: Lasso (L1), Elastic Net (L1+L2), and MRMR filter method. Learn to identify which clinical features truly drive survival prediction.

Notebook

Week 8 Deep Learning for Medical Data

Introduction to deep learning with Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs). Learn neural network fundamentals, backpropagation, activation functions, and how to apply deep learning to clinical prediction tasks.

Notebook

Week 9 AI Model Interpretation (PLS-DA & SHAP)

Interpretability and explainability in machine learning. Partial Least Squares Discriminant Analysis (PLS-DA) for supervised dimensionality reduction, and SHAP (SHapley Additive exPlanations) for understanding model predictions and feature importance in black-box models.

Notebook Slides

Getting Started Tutorials

Interactive Jupyter notebooks to help you set up and get started

Git Tutorial

Learn Git version control: installation, core concepts, essential commands, branching, and workflows.

Open Tutorial

Virtual Environment Tutorial

Master Python virtual environments (venv): setup, activation, package management, and troubleshooting.

Open Tutorial

Research Background

This project is based on the paper by Chicco & Jurman (2020), "Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone."

Key Findings

Serum creatinine and ejection fraction alone achieve predictive performance competitive with models that use all 13 features
Random Forest achieves the best results, with a Matthews correlation coefficient (MCC) of 0.418
Clinical relevance: Both biomarkers are routinely measured and can guide decision-making
Demonstrates the power of feature selection and ensemble methods in medical ML
Dataset: 299 heart failure patients with 13 clinical features
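For readers unfamiliar with the Matthews correlation coefficient cited above, the sketch below computes it from the four cells of a binary confusion matrix. The counts are hypothetical and are not taken from the paper; returning 0 when the denominator is zero is a common convention assumed here.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical confusion-matrix counts (illustration only).
score = mcc(tp=15, tn=35, fp=5, fn=5)
print(round(score, 3))
```

MCC ranges from -1 to +1, with 0 corresponding to chance-level prediction. Because it uses all four cells of the confusion matrix, it is less easily inflated by class imbalance than accuracy.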

Read Full Paper