Michigan Data Science Team | Winter 2026

Heart Failure
Survival Analysis

Identifying key clinical factors that distinguish patient survival from mortality using statistical analysis and machine learning techniques.

299
Patients Analyzed
13
Clinical Features
2
Key Predictors Found
68%
Survival Rate

Two Features Are All You Need

Our analysis confirms the central finding of Chicco & Jurman (2020): machine learning models can predict heart failure survival from just two clinical features, serum creatinine and ejection fraction, performing comparably to models trained on all 13 features.

Serum Creatinine

Kidney function indicator

Ejection Fraction

Heart pumping efficiency

Model Performance

Random Forest (2 features)

MCC: 0.418 | Accuracy: 82%

Random Forest (All 13 features)

MCC: 0.384 | Accuracy: 85%

Simpler models achieve comparable performance
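MCC (Matthews correlation coefficient) is reported next to accuracy because about two thirds of the patients survive, so a model can reach a respectable accuracy while learning very little. The sketch below applies the standard MCC formula to a "predict everyone survives" baseline. The 203/96 split is an assumption consistent with the 299 patients and 68% survival rate above, and the function names are illustrative, not taken from the project code.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, the usual convention for a
    model that only ever predicts one class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Baseline that predicts "survived" for everyone, with "died" as the
# positive class. Assumed counts: 203 survivors, 96 deaths.
baseline = dict(tp=0, tn=203, fp=0, fn=96)
print(f"accuracy = {accuracy(**baseline):.3f}")  # about 0.68
print(f"MCC      = {mcc(**baseline):.3f}")       # 0: no better than chance
```

This is why a model with lower accuracy can still have the higher MCC: MCC rewards getting both classes right, not just the majority class.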

Research Background

Based on Published Research

This project replicates and extends the findings from peer-reviewed medical informatics research.

Original Study

Chicco & Jurman (2020), published in BMC Medical Informatics and Decision Making, demonstrated that machine learning can predict heart failure survival.

ML Classifiers

Random Forest, Gradient Boosting, SVM, and other classifiers were compared for predictive performance.

Clinical Impact

Both serum creatinine and ejection fraction are routinely measured, enabling practical clinical application.

Project Schedule

Weekly Learning Path

A structured approach to learning data science through hands-on medical data analysis.

Week 1

Exploratory Data Analysis

Load the dataset, understand features, create visualizations, and identify patterns in heart failure data.

Pandas, Seaborn, Matplotlib
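A first EDA step might compare each feature between survivors and non-survivors. The sketch below is one possible way to do that with pandas. The column names (`DEATH_EVENT`, `serum_creatinine`, `ejection_fraction`) are assumed to match the dataset's CSV header, the helper name is hypothetical, and the four rows are made-up values that only demonstrate the call.

```python
import pandas as pd

def summarize_by_outcome(df, target="DEATH_EVENT"):
    """Mean and median of every numeric feature, split by outcome."""
    features = list(df.select_dtypes("number").columns.drop(target))
    return df.groupby(target)[features].agg(["mean", "median"])

# Toy rows with made-up values, NOT project data. In the notebook, load
# the real table with pd.read_csv(...) and pass it in instead.
toy = pd.DataFrame({
    "serum_creatinine": [1.0, 1.2, 2.5, 3.0],
    "ejection_fraction": [45, 50, 25, 20],
    "DEATH_EVENT": [0, 0, 1, 1],
})

summary = summarize_by_outcome(toy)
print(summary)
```

The result has one row per outcome and a (feature, statistic) column pair for each summary, which makes large gaps between the two groups easy to spot before any formal testing.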
Week 2

Statistical Analysis

Apply hypothesis testing, correlation analysis, feature importance, and multicollinearity detection.

T-test, Mann-Whitney, VIF, Random Forest
Week 3

Unsupervised Learning

Dimensionality reduction with PCA and clustering to find natural groupings in patient data.

PCA, K-Means, Hierarchical
Week 4+

Predictive Modeling

Build and evaluate machine learning models to predict patient survival outcomes.

Classification, Cross-Validation, Model Evaluation
Methodology

Analysis Techniques

A comprehensive toolkit for medical data analysis and machine learning.

Statistical Testing

Rigorous hypothesis testing to identify significant differences between patient groups.

  • Welch's T-test for mean comparison
  • Mann-Whitney U for non-parametric testing
  • Benjamini-Hochberg FDR correction
  • Pearson correlation analysis
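The tests listed above can be combined into a small per-feature pipeline: run Welch's t-test and Mann-Whitney U on each feature, then adjust the p-values with Benjamini-Hochberg. The following is a sketch under stated assumptions: the data is synthetic (one shifted feature, one pure-noise feature), the helper names are hypothetical, and the FDR step is written out by hand in NumPy rather than taken from a library.

```python
import numpy as np
from scipy import stats

def compare_groups(survived, died):
    """Welch's t-test and Mann-Whitney U p-values for one feature."""
    welch_p = stats.ttest_ind(survived, died, equal_var=False).pvalue
    mwu_p = stats.mannwhitneyu(survived, died, alternative="two-sided").pvalue
    return welch_p, mwu_p

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

rng = np.random.default_rng(0)
# Synthetic stand-ins, NOT project data: (survived, died) samples per feature.
features = {
    "shifted": (rng.normal(1.0, 0.3, 200), rng.normal(1.8, 0.6, 100)),
    "noise": (rng.normal(0.0, 1.0, 200), rng.normal(0.0, 1.0, 100)),
}

welch_pvals = [compare_groups(a, b)[0] for a, b in features.values()]
adjusted = benjamini_hochberg(welch_pvals)
for name, p_adj in zip(features, adjusted):
    print(f"{name}: BH-adjusted Welch p = {p_adj:.3g}")
```

Correcting across all features at once matters here because testing many features one by one inflates the chance that at least one looks significant by accident.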

Feature Importance

Machine learning and statistical techniques to rank the predictive power of clinical features and flag redundant ones.

  • Random Forest Gini importance
  • Permutation importance
  • Variance Inflation Factor (VIF)
  • Coefficient of Variation
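Unlike the two importance scores, VIF does not measure predictive power. It measures how well each feature can be reconstructed from the others, which flags multicollinearity. A minimal sketch of the textbook definition, VIF_j = 1 / (1 - R_j^2), using plain NumPy least squares; the function name and the synthetic columns are assumptions for illustration only.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    Each column is regressed on all other columns (plus an intercept),
    and VIF is 1 / (1 - R^2) of that regression.
    """
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        r_squared = 1.0 - residuals.var() / y.var()
        factors.append(1.0 / (1.0 - r_squared))
    return np.array(factors)

rng = np.random.default_rng(0)
# Synthetic columns, NOT project data: c is almost a copy of a.
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)

vifs = vif(np.column_stack([a, b, c]))
for name, value in zip("abc", vifs):
    print(f"VIF({name}) = {value:.1f}")
```

A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity; in this toy example the near-duplicate columns `a` and `c` land far above that while `b` stays close to 1.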

Unsupervised Learning

Discover hidden patterns and natural groupings in patient data.

  • Principal Component Analysis
  • K-Means clustering
  • Hierarchical clustering
  • Silhouette analysis
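One way these pieces fit together is to standardize the features, project them with PCA, and use silhouette scores to compare K-Means solutions for several cluster counts. The sketch below uses scikit-learn on synthetic blobs, not the patient data; the cluster centers, the two-component projection, and the range of k are all illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data, NOT project data: three well-separated groups in 6-D.
centers = np.array([
    [0, 0, 0, 0, 0, 0],
    [8, 8, 8, 0, 0, 0],
    [0, 0, 0, 8, 8, 8],
])
X, _ = make_blobs(n_samples=300, centers=centers, random_state=0)

# Standardize first so no single feature dominates the principal components.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Score each candidate number of clusters with the mean silhouette.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_2d)
    scores[k] = silhouette_score(X_2d, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette = {score:.3f}")
print("best k:", best_k)
```

On real clinical data the silhouette curve is usually much flatter than on blobs like these, so a weak maximum should be read as "no strong natural grouping" rather than as evidence for a particular k.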

Predictive Modeling

Build robust classifiers to predict patient survival outcomes.

  • Random Forest classifier
  • Logistic Regression
  • Cross-validation strategies
  • ROC-AUC, MCC evaluation
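The comparison between a two-feature model and a full-feature model can be set up with stratified cross-validation and an MCC scorer. This is a sketch, not the project's actual pipeline: the data comes from `make_classification` (with `shuffle=False` so the two informative columns come first), the column count is arbitrary, and the scores it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in, NOT project data: 299 rows, roughly 68/32 class split,
# two informative columns (indices 0 and 1) followed by noise columns.
X, y = make_classification(
    n_samples=299,
    n_features=12,
    n_informative=2,
    n_redundant=0,
    weights=[0.68, 0.32],
    shuffle=False,
    random_state=0,
)

def cv_mcc(features):
    """Mean MCC of a Random Forest over 5 stratified folds."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(
        model, features, y, cv=cv, scoring=make_scorer(matthews_corrcoef)
    )
    return scores.mean()

results = {
    "two features": cv_mcc(X[:, :2]),
    "all features": cv_mcc(X),
}
for name, score in results.items():
    print(f"{name}: mean MCC = {score:.3f}")
```

Stratified folds keep the survival ratio similar in every split, which matters with only 299 patients and an imbalanced outcome.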
Get Started

Quick Setup

Clone the repository and start exploring in minutes.

# Clone the repository
git clone https://github.com/MichiganDataScienceTeam/W26-MDST-Project_Heart-Failure-Survival-Analysis.git
cd W26-MDST-Project_Heart-Failure-Survival-Analysis

# Set up virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies and launch
pip install -r requirements.txt
jupyter notebook
Our Team

Project Leads

Guiding the project with expertise in data science and machine learning.

SB

Sina Bonakdar

Project Lead

TZ

Terry Zhang

Project Lead

Learn More

Resources

Documentation, tutorials, and references to deepen your understanding.