Machine Learning Across The Frontiers - SSI 2023 Projects

Each HEP frontier presents its own Big Data challenges, inviting the use of AI/ML to tackle them. Here we choose three specific challenges, one from each of the Energy, Intensity, and Cosmic Frontiers, that can be tackled during the school by small project teams.

Each challenge has an associated dataset, which can either be downloaded to your local (or remote) computing resource or imported into Google Colab. Your team might then pick one of the approaches described in the lectures and try to apply it. We provide a number of tutorial notebooks below that introduce the datasets and offer some possible starting points.

On the last Thursday of the school, we will hear very short presentations from each project team in a common slide deck, and award various small prizes.

For maximum community value, project teams should plan to submit their project notebook back to this repo via a pull request, so everyone can benefit from their hard work. Fork this repo and get to work!

Have a look at the Getting Started slides to get started with GitHub and Google Colab.

The Challenges

Energy Frontier: here, the challenge is to develop ML models for LHC jets. These could be for classification or for generative modeling. We provide a dataset of various boosted jets that includes high-level jet features, jet images, and per-particle features. Many thanks to SSI lecturer Jennifer Ngadiuba, from whose recent course the materials for this challenge are drawn!

Cosmic Frontier: here, the challenge is to develop methods for mapping Dark Matter in the Universe from weak lensing data, after exploring some related inverse problems using LSST-like imaging data. We provide suitable weak lensing datasets. Many thanks to SSI lecturer François Lanusse for the materials for this challenge, which are based on those used at the Quarks2Cosmos conference!

Intensity Frontier: here, the challenge is to... Many thanks to SSI Organizer Kazu Terao for the materials for this challenge!

SSI2023 Project Prerequisites

Prerequisites for the course include basic knowledge of GitHub, Colab, and Python. Before the course, you are therefore required to go through these slides and the following two Python basics notebooks:

  • python_intro_part1.ipynb
    • Quickstart
    • Indentation
    • Comments
    • Variables
    • Conditions and if statements
    • Arrays
    • Strings
    • Loops: while and for
    • Dictionaries
  • python_intro_part2.ipynb
    • Functions
    • Classes/Objects
    • Inheritance
    • Modules
    • JSON data format
    • Exception Handling
    • File Handling
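
As a quick self-check on the topics listed above, the short sketch below combines a class, a function, JSON handling, exception handling, and a loop. It is not taken from the notebooks: the `Jet` class, the `load_jet` helper, and the numbers are hypothetical and only serve as an illustration.

```python
import json


class Jet:
    """Hypothetical container for one jet; not part of the course datasets."""

    def __init__(self, pt, eta):
        self.pt = pt
        self.eta = eta

    def to_json(self):
        # Serialize the jet to a JSON string.
        return json.dumps({"pt": self.pt, "eta": self.eta})


def load_jet(text):
    """Parse a JSON string into a Jet, raising ValueError on bad input."""
    try:
        record = json.loads(text)
        return Jet(record["pt"], record["eta"])
    except (json.JSONDecodeError, KeyError) as err:
        raise ValueError(f"not a valid jet record: {err}") from err


# Made-up values, used only to exercise the code.
jets = [Jet(120.5, 0.3), Jet(45.0, -1.2)]
round_trip = [load_jet(j.to_json()) for j in jets]

# Keep the transverse momentum of jets above an arbitrary threshold.
hard_jets = [j.pt for j in round_trip if j.pt > 100]
print(hard_jets)
```

If every line of this snippet is clear to you, the two notebooks should be a quick review.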

Tutorials

The tutorial notebooks below start with some general tutorials that you may find helpful and are then grouped by Frontier. Note that your project might well benefit from techniques you pick up from tutorials in the other Frontiers.

General: Advanced Python

General: Introduction to PyTorch

General: PyTorch Geometric (PyG)

Energy Frontier: Basic NN with Keras for LHC jet tagging task

Energy Frontier: RNN, GNN and Transformer implementations for LHC jet tagging task

Energy Frontier: Anomaly Detection for LHC jets

Cosmic Frontier: Differentiable Forward Models, Generative Models, And Variational Inference

  • 1.PartI-DifferentiableForwardModel.ipynb
    • How to write a probabilistic forward model for galaxy images with Jax + TensorFlow Probability
    • How to optimize parameters of a Jax model
    • Write a forward model of ground-based galaxy images
  • 2.PartII-GenerativeModels.ipynb
    • Write an Auto-Encoder in Jax+Haiku
    • Build a Normalizing Flow in Jax+Haiku+TensorFlow Probability
    • Bonus: Learn a prior by Denoising Score Matching
    • Build a generative model of galaxy morphology from Space-Based images
  • 3.PartIII-VariationalInference.ipynb
    • Solve inverse problem by MAP
    • Learn how to sample from the posterior using Variational Inference
    • Bonus: Learn to sample with SDE
    • Recover high-resolution posterior images for HSC galaxies
    • Propose an inpainting model for masked regions in HSC galaxies
    • Bonus: Demonstrate single band deblending!
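
To give a feel for what "solve inverse problem by MAP" means, here is a minimal sketch on a toy scalar problem. It assumes a linear forward model `y = a * x + noise` with Gaussian noise of width `sigma` and a zero-mean Gaussian prior of width `tau` on `x`; all names and numbers are hypothetical. The notebooks use JAX and automatic differentiation, whereas this sketch writes the gradient by hand so that it runs with plain Python.

```python
def neg_log_posterior_grad(x, y, a, sigma, tau):
    """Gradient in x of (y - a*x)^2 / (2 sigma^2) + x^2 / (2 tau^2)."""
    return -a * (y - a * x) / sigma**2 + x / tau**2


def map_estimate(y, a, sigma, tau, lr=0.1, steps=200):
    """Plain gradient descent on the negative log posterior, starting at 0."""
    x = 0.0
    for _ in range(steps):
        x -= lr * neg_log_posterior_grad(x, y, a, sigma, tau)
    return x


# Toy values chosen only for illustration.
y, a, sigma, tau = 3.0, 2.0, 1.0, 1.0
x_map = map_estimate(y, a, sigma, tau)

# For this linear-Gaussian case the MAP solution is also known in closed form.
x_closed = (a * y / sigma**2) / (a**2 / sigma**2 + 1 / tau**2)
print(x_map, x_closed)
```

The two printed values should agree closely. For the image problems in the notebooks no closed form is available, which is why an iterative, gradient-based approach is used.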

Cosmic Frontier: Dark Matter Mass-Mapping using Real HSC Weak Gravitational Lensing Data

  • Open challenge: 4.MappingDarkMatterDataChallenge.ipynb
    • Use Jax to write a differentiable model for weak gravitational lensing
    • Use an analytic Gaussian prior to solve the inverse problem (Wiener Filtering)
    • Use Denoising Score Matching to learn the score of a prior distribution
    • Use Stochastic Differential Equations for sampling from the posterior
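
The idea behind Wiener filtering with an analytic Gaussian prior can be sketched in its simplest, diagonal form. The assumptions here are that each data element is `d = x + n`, with independent signal `x ~ N(0, signal_var)` and noise `n ~ N(0, noise_var)`; the function name and values are hypothetical. The mass-mapping notebook works with a lensing forward model and is more involved, so treat this only as a toy illustration of the weighting.

```python
def wiener_filter(data, signal_var, noise_var):
    """Per-element posterior mean and variance for d = x + n.

    Assumes x ~ N(0, signal_var) and n ~ N(0, noise_var), independent
    across elements (a diagonal toy case).
    """
    means, variances = [], []
    for d, s, n in zip(data, signal_var, noise_var):
        weight = s / (s + n)           # Wiener weight, between 0 and 1
        means.append(weight * d)       # posterior mean shrinks data toward 0
        variances.append(weight * n)   # posterior variance s * n / (s + n)
    return means, variances


# Made-up numbers for illustration.
data = [2.0, -1.0, 0.5]
signal_var = [1.0, 1.0, 4.0]
noise_var = [1.0, 3.0, 0.0]

means, variances = wiener_filter(data, signal_var, noise_var)
print(means)      # [1.0, -0.25, 0.5]
print(variances)  # [0.5, 0.75, 0.0]
```

Elements with noise that is large relative to the prior variance are pulled strongly toward zero, while a noise-free element is returned unchanged.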

Other Resources

  • Pattern Recognition and Machine Learning, Bishop (2006)
  • Deep Learning, Goodfellow et al. (2016) -- link
  • Introduction to machine learning, Murray (2010) -- video lectures
  • Stanford ML courses -- link