Age Prediction Task

Task: Predict a Person’s Age from Brain Image Data

This repository contains the code and dataset for Task 1 of the Advanced Machine Learning course at ETH. The goal of this task is to predict a person’s age based on brain MRI data.

Introduction

Magnetic Resonance Imaging (MRI) is a key technology in medical imaging. It uses a magnetic field and radio waves to produce 3D detailed anatomical images. MRI is non-invasive and non-radiative, making it a preferred method for investigating sensitive organs like the brain.

Why Brain Age Estimation is Interesting

Aging does not affect everyone uniformly. Individual rates of aging are influenced by environmental, genetic, and epigenetic factors. Brain MRI studies have shown a relationship between accelerated aging and brain atrophy. Predicting brain age can improve early diagnosis and risk assessments for neurodegenerative and neuropsychiatric diseases such as Alzheimer’s, Parkinson’s, and Huntington’s diseases.

Dataset Specifications

Features Size: Approximately 830 features
Dataset Size: Approximately 1200 samples

MRI feature extraction for this project uses around 800 anatomical features derived from image data using FreeSurfer. The dataset includes:

X_train.csv: Training features
y_train.csv: Training labels (ages)
X_test.csv: Testing features
sample.csv: Sample submission file

Task Specifications

Subtask 1: Filling Missing Values

Background: The dataset contains missing values represented as NaNs.
Requirement: Fill the missing values in the training and test sets using suitable imputation methods (mean, median, most frequent, etc.).

Subtask 2: Outlier Detection

Background: The training set contains outliers.
Requirement: Build an outlier detection model to classify samples in the training set as outliers or non-outliers.

Subtask 3: Feature Selection

Background: The dataset includes additional manual features that require selection.
Requirement: Use feature selection methods to label features as selected or unselected (irrelevant and redundant features).

Main Task: Age Prediction

Background: After preprocessing and dimensionality reduction, the main task is regression-based age prediction.
Requirement: Use suitable regression methods to predict the age from brain MRI features.

Evaluation and Rules

The main evaluation metric is the Coefficient of Determination (R²). It measures the proportion of variance in the dependent variable that is predictable from the independent variable, ranging from 1 (best) to negative infinity.

from sklearn.metrics import r2_score
score = r2_score(y_true, y_pred)

Files Description

X_train.csv: Training features
y_train.csv: Training labels (ages)
X_test.csv: Testing features
sample.csv: Sample submission file
task1.ipynb: Main task solution file
requirements.txt: File with the requirements on the environment to run the notebook

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Age Prediction Task

Task: Predict a Person’s Age from Brain Image Data

Table of Contents

Introduction

Why Brain Age Estimation is Interesting

Dataset Specifications

Task Specifications

Subtask 1: Filling Missing Values

Subtask 2: Outlier Detection

Subtask 3: Feature Selection

Main Task: Age Prediction

Evaluation and Rules

Files Description

Files

README.md

Latest commit

History

README.md

File metadata and controls

Age Prediction Task

Task: Predict a Person’s Age from Brain Image Data

Table of Contents

Introduction

Why Brain Age Estimation is Interesting

Dataset Specifications

Task Specifications

Subtask 1: Filling Missing Values

Subtask 2: Outlier Detection

Subtask 3: Feature Selection

Main Task: Age Prediction

Evaluation and Rules

Files Description