Heart Failure Survival Analysis

Winter 2026 | Michigan Data Science Team

Project Overview

Dataset

299 heart failure patient records with 13 clinical features. Binary outcome: survival or death during follow-up period.

Key Finding

Two features alone, serum creatinine and ejection fraction, achieve predictive performance comparable to models using all 13 features.

Learning Path

Nine weeks of hands-on lessons covering data exploration, statistics, machine learning, ensemble methods, deep learning, and model interpretation with real clinical data.

Curriculum

Week 1 Data Exploration

Introduction to the dataset, exploratory data analysis, and visualization techniques using Pandas, Seaborn, and Matplotlib.

Week 2 Statistical Analysis

Hypothesis testing (t-test, Mann-Whitney U), correlation analysis, false discovery rate (FDR) correction for multiple comparisons, and variance inflation factor (VIF) for multicollinearity detection.
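
As an illustration of the FDR correction step, here is a minimal pure-Python sketch of the Benjamini-Hochberg procedure. It assumes that "FDR correction" refers to Benjamini-Hochberg (the lessons may use a different variant or a library routine), and the p-values are made-up numbers, not results from the heart failure data.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from five feature-wise tests (illustrative only).
p_values = [0.012, 0.2, 0.028, 0.001, 0.039]
print(benjamini_hochberg(p_values))
```

With these made-up values, a Bonferroni threshold of 0.05 / 5 = 0.01 would keep only the smallest p-value, while the step-up rule above is less conservative.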

Week 3 Unsupervised Learning

Dimensionality reduction with PCA, K-Means clustering, hierarchical clustering, silhouette analysis, and the elbow method.
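
The workflow named above (PCA, then K-Means, then choosing k by inertia and silhouette score) might look roughly like the following scikit-learn sketch. The data is a synthetic stand-in with three planted groups, not the heart failure records, and the candidate range of k is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT patient records): three well-separated
# groups of 50 points each in 5 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0, 0], [8, 8, 0, 0, 0], [0, 8, 8, 0, 0]], dtype=float)
X = np.vstack([c + rng.normal(scale=1.0, size=(50, 5)) for c in centers])

# Standardize the features, then project onto the first two principal components.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# For each candidate k, record inertia (for an elbow plot) and the silhouette score.
scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_2d)
    scores[k] = {
        "inertia": km.inertia_,
        "silhouette": silhouette_score(X_2d, km.labels_),
    }

# Pick the k with the highest silhouette score.
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print(best_k)
```

Plotting `scores[k]["inertia"]` against k gives the elbow curve; on real clinical data the elbow and the silhouette peak are usually far less clear-cut than on this planted example.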

Week 4 Supervised Learning

Classification algorithms including Logistic Regression, Random Forest, SVM, and KNN. Train/test splitting with stratification and model evaluation metrics.
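
To show what stratification does during a train/test split, here is a small pure-Python sketch. The function name `stratified_split` and the toy labels are hypothetical. In practice scikit-learn's `train_test_split` with its `stratify` argument does this work; the sketch only illustrates the idea.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), splitting each class in about the same proportion."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (not real patient data): 20 survivors (0) and 10 deaths (1).
labels = [0] * 20 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

# The share of the minority class is the same in both parts as in the full set.
test_share = sum(labels[i] for i in test_idx) / len(test_idx)
train_share = sum(labels[i] for i in train_idx) / len(train_idx)
print(len(train_idx), len(test_idx), train_share, test_share)
```

Without stratification, a small test set drawn from an imbalanced outcome can end up with too few positive cases to evaluate the model reliably.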

Week 5 Hyperparameter Optimization

Advanced tuning techniques: GridSearchCV, Random Search, and Bayesian Optimization with Optuna for efficient hyperparameter space search.
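
As a rough sketch of the grid search part of this week, the example below tunes a Random Forest with `GridSearchCV`. The synthetic data from `make_classification`, the parameter grid, and the choice of MCC as the scoring metric are all assumptions made for illustration, not the course's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for a small, imbalanced clinical table (NOT the real dataset).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=3,
    weights=[0.68, 0.32], random_state=0,
)

# Hypothetical search space: 2 x 3 = 6 candidate configurations.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

# Stratified folds keep the class ratio similar in every validation split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring=make_scorer(matthews_corrcoef),
    cv=cv,
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added parameter; that is the motivation for the random and Bayesian search methods listed above.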

Week 6 Ensemble Methods & Boosting

From Random Forest to Gradient Boosting to LightGBM, with Optuna-based hyperparameter tuning, evaluation on four metrics, and hands-on exercises.

Week 7 Feature Selection Methods

Why fewer features often beat more: Lasso (L1), Elastic Net (L1+L2), and MRMR filter method. Learn to identify which clinical features truly drive survival prediction.

Week 8 Deep Learning for Medical Data

Introduction to deep learning with Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs). Learn neural network fundamentals, backpropagation, activation functions, and how to apply deep learning to clinical prediction tasks.

Week 9 AI Model Interpretation (PLS-DA & SHAP)

Interpretability and explainability in machine learning. Partial Least Squares Discriminant Analysis (PLS-DA) for supervised dimensionality reduction, and SHAP (SHapley Additive exPlanations) for understanding model predictions and feature importance in black-box models.

Getting Started Tutorials

Interactive Jupyter notebooks to help you set up and get started

Git Tutorial

Learn Git version control: installation, core concepts, essential commands, branching, and workflows.


Virtual Environment Tutorial

Master Python virtual environments (venv): setup, activation, package management, and troubleshooting.


Research Background

This project is based on the paper by Chicco & Jurman (2020), "Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone."

Key Findings

  • Serum creatinine and ejection fraction alone achieve predictive performance competitive with models that use all 13 features
  • Random Forest achieves the best results, with a Matthews Correlation Coefficient (MCC) of 0.418
  • Clinical relevance: Both biomarkers are routinely measured and can guide decision-making
  • Demonstrates the power of feature selection and ensemble methods in medical ML
  • Dataset: 299 heart failure patients with 13 clinical features
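
For readers unfamiliar with the Matthews Correlation Coefficient cited in the findings above, the following sketch computes it from confusion-matrix counts. The function and the example counts are illustrative assumptions (a hypothetical 200 survivors and 100 deaths), not figures from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; ranges from -1 (inverse) to +1 (perfect)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Hypothetical classifier that predicts "survived" for everyone,
# evaluated on 200 survivors (negatives) and 100 deaths (positives).
tp, tn, fp, fn = 0, 200, 0, 100
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(accuracy, mcc)
```

In this made-up case accuracy is about 0.67 even though the classifier never identifies a death, while MCC is 0. This shows why MCC is a useful headline metric for imbalanced outcomes.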