Commit: tweaks
jpivarski committed Aug 21, 2024
1 parent 5f62137 commit a87959e
Showing 2 changed files with 21 additions and 20 deletions.
10 changes: 5 additions & 5 deletions deep-learning-intro-for-hep/intro.md
# Deep learning for particle physicists

This book is an introduction to modern neural networks (deep learning), intended for particle physicists. Most particle physicists need to use machine learning for data analysis or detector studies, and their unique combination of mathematical and statistical knowledge puts them in a good position to understand the topic deeply. However, most introductions to deep learning can't assume that their readers have this background, and advanced courses assume specialized knowledge that physics audiences may not have.

This book is "introductory" because it emphasizes the foundations of what neural networks are, how they work, _why_ they work, and provides practical steps to train neural networks of any topology. It does not get into the (changing) world of network topologies or designing new kinds of machine learning algorithms to fit new problems.
This book is "introductory" because it emphasizes the foundations of what neural networks are, how they work, _why_ they work, and it provides practical steps to train neural networks of any topology. It does not get into the (changing) world of network topologies or designing new kinds of machine learning algorithms to fit new problems.

The material in this book was first presented at [CoDaS-HEP](https://codas-hep.org/) in 2024: [jpivarski-talks/2024-07-24-codas-hep-ml](https://github.com/jpivarski-talks/2024-07-24-codas-hep-ml). It also has roots in [jpivarski-talks/2024-07-08-scipy-teen-track](https://github.com/jpivarski-talks/2024-07-08-scipy-teen-track) (though that was for an entirely different audience). I am writing it in book format, rather than simply depositing my slide PDFs and Jupyter notebooks in [https://hsf-training.org/](https://hsf-training.org/), because the original format assumes that I'll verbally fill in the gaps. This format is good for two purposes:


The course materials include some inline problems, intended for active learning during a lecture, and a large project designed for students to work on for about two hours. (In practice, experienced students finished it in an hour and beginners could have used a little more time.)

This course uses [Scikit-Learn](https://scikit-learn.org/) and [PyTorch](https://pytorch.org/) for examples and problem sets. [TensorFlow](https://www.tensorflow.org/) is also a popular machine learning library, but its functionality mostly duplicates PyTorch, and I didn't want to hide the concepts behind incidental differences in software interfaces. (I _did_ include Scikit-Learn because its interface is much simpler than PyTorch's. When I want to emphasize issues that surround fitting in general, I'll use Scikit-Learn, because the fit itself is just two lines of code; when I want to emphasize the details of the machine learning model, I'll use PyTorch, which expands the fit into tens of lines of code and allows for more control of this part.)
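To give a feel for the difference, here is a minimal, hypothetical sketch with made-up data (not an excerpt from the course materials): the Scikit-Learn fit itself really is just two lines.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# made-up data: y is approximately 2x + 1 with Gaussian noise
rng = np.random.default_rng(12345)
x = rng.uniform(0, 1, size=(100, 1))
y = 2 * x[:, 0] + 1 + rng.normal(0, 0.1, size=100)

# the fit itself is two lines: construct the model, then fit it
model = LinearRegression()
model.fit(x, y)

print(model.coef_, model.intercept_)  # approximately [2.0] and 1.0
```

The PyTorch counterpart expands the fit into an explicit training loop; a sketch in that style appears at the end of the overview chapter below.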

I didn't take the choice of PyTorch over TensorFlow lightly (since I'm a newcomer to both). I verified that PyTorch is about as popular as TensorFlow among CMS physicists using the plot below (derived using the methodology in [this GitHub repo](https://github.com/jpivarski-talks/2023-05-09-chep23-analysis-of-physicists) and [this talk](https://indico.jlab.org/event/459/contributions/11547/)). Other libraries, such as [JAX](https://jax.readthedocs.io/), would be poor choices because they are not yet widely used in the field: a reader trained only on them would not be prepared to collaborate on machine learning as it is currently practiced in particle physics.

![](img/github-ml-package-cmsswseed.svg){. width="100%"}

Moreover, judging by usage outside of particle physics, PyTorch seems to be a more future-proof choice than TensorFlow. Trends in <a href="https://trends.google.com/trends/explore?q=%2Fm%2F0h97pvq,%2Fg%2F11bwp1s2k3,%2Fg%2F11gd3905v1&date=2014-08-14%202024-08-14">Google search volume</a> show an increase in interest in PyTorch at the expense of TensorFlow ("JAX" is a common word with meanings beyond machine learning, making it impossible to compare), and PyTorch has been much more frequently used by [machine learning competition winners](https://mlcontests.com/state-of-competitive-machine-learning-2023/#deep-learning) in the past few years.

```{tableofcontents}
```
31 changes: 16 additions & 15 deletions deep-learning-intro-for-hep/overview.md
The upshot of the machine learning revolution is that we now have two ways to make…

![](img/craftsmanship.jpg){. width="49%"} ![](img/farming.jpg){. width="49%"}

Machine learning will not make hand-written programs obsolete any more than farming made manual tool-building obsolete: the two methods of development have different strengths and the most appropriate applications of each don't entirely overlap.

Programming by hand allows for more precise control of the finished product, but the complexity of hand-written programs is fundamentally limited by a human mind or a team's ability to communicate. Encapsulation, specifications, and protocols help ever-larger teams work together on shared programming projects, but they do so by simplifying the interfaces, and there are limits to how simply some problems can be cast.

Machine learning, on the other hand, allows for extremely nuanced solutions. Machine learning algorithms are developed by allowing enormous numbers of parameters to fluctuate, biased toward configurations that solve the problem at hand. By analogy, living systems randomly sample the space of possible configurations, biased toward configurations that survive (natural evolution) or are selected by human farmers (cultivation), and thus the anatomy of a plant is more intricate than any human could ever invent. Although we steer this process toward preferred bulk properties, we don't control it in detail.

Thus, simple or moderately complex problems that need to be controlled with precision are best solved by hand-written programs, while machine learning is best for extremely complex problems or problems that can't be solved (or solved as accurately) any other way. It is entirely reasonable for hand-written and machine learning algorithms to coexist in the same workflow—in fact, it's common for machine learning algorithms to be encapsulated in conventional frameworks, which deliver the machine learning outputs where they need to go, and perhaps adjust for unexpected outputs when the machine learning algorithm goes awry.

Some authors make distinctions between the terms
* Artificial Intelligence (AI) and
* Machine Learning (ML).

I have not seen practitioners of AI/ML distinguish these terms consistently, so I treat them as synonymous. Moreover, these terms and

* data mining
* MultiVariate Analysis (MVA)

are all applied to many-parameter fits of large datasets. "Machine learning," "data mining," and "multivariate analysis" were all introduced during the decades when "artificial intelligence" was out of favor as a funded research topic—as a way to continue the research under different names. As data analysis techniques, all of these terms may be considered synonymous, but "data mining" and "multivariate analysis" wouldn't be used to describe _generative_ techniques, such as simulating chat text, images, or physics collision events, the way that "artificial intelligence" and "machine learning" are.

All of these terms describe the following general procedure:

1. write an algorithm (a parameterized "model") that generates output that depends on a huge number of internal parameters;
2. vary those parameters ("train the model") until the algorithm returns expected results on a labeled dataset ("supervised learning") or until it finds patterns according to some desired metric ("unsupervised learning");
3. either apply the trained model to new data, to describe the new data in the same terms as the training dataset ("predictive"), or use the model to generate new data that is plausibly similar to the training dataset ("generative"; AI and ML only). The sketch after this list maps these steps onto code.
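As promised above, here is how the three steps map onto code: a minimal, hypothetical Scikit-Learn sketch with made-up data (the model class is an arbitrary choice for illustration).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# a labeled dataset for "supervised learning": features and known labels
rng = np.random.default_rng(12345)
features = rng.normal(size=(1000, 5))
labels = (features.sum(axis=1) > 0).astype(int)

# step 1: a model whose output depends on many internal parameters
model = RandomForestClassifier(n_estimators=100)

# step 2: vary those parameters until the model reproduces the labels
model.fit(features, labels)

# step 3 ("predictive"): apply the trained model to new data
new_features = rng.normal(size=(10, 5))
print(model.predict(new_features))
```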

Apart from the word "huge," this procedure also describes curve-fitting, a ubiquitous analysis technique that most experimental physicists use on a daily basis. Consider a dataset with two observables (called "features" in ML), $x$ and $y$, and suppose that they have an approximate, but not exact, linear relationship. There is [an exact algorithm](https://en.wikipedia.org/wiki/Linear_regression#Formulation) to compute the best fit of $y$ as a function of $x$, and this linear fit is a model with two parameters: the slope and intercept of the line. If $x$ and $y$ have a non-linear relationship expressed by $N$ parameters, a non-deterministic optimizer like [MINUIT](https://en.wikipedia.org/wiki/MINUIT) can be used to search for the best fit.
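For concreteness, the exact two-parameter case can be written in a few lines of NumPy (a hypothetical sketch with made-up data): minimizing the squared error gives closed-form expressions for the slope and intercept.

```python
import numpy as np

# made-up data with an approximate, but not exact, linear relationship
rng = np.random.default_rng(12345)
x = rng.uniform(0, 10, size=1000)
y = 3 * x + 7 + rng.normal(0, 2, size=1000)

# closed-form least-squares solution for the two parameters
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(slope, intercept)  # approximately 3 and 7
```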

ML fits differ from curve-fitting in the number of parameters used and their interpretation—or rather, their lack of interpretation. In curve fitting, the values of the parameters and their uncertainties are regarded as the final product, often quoted in a table as the result of the data analysis. In ML, the parameters are too numerous to present this way and wouldn't be useful if they were, since the calculation of predicted values from these parameters is complex. Instead, the ML model is used as a machine to predict $y$ for new $x$ values (prediction) or to randomly generate new $x$, $y$ pairs with the same distribution as the training set (generation). In fact, most ML models don't even have a unique minimum in parameter space—different combinations of parameters would result in the same predictions.

Today, the most accurate and versatile class of ML models is the "deep" Neural Network (NN), where "deep" means having a large number of internal layers. I will describe these models in much more detail, since this course focuses exclusively on them. However, it's worth pointing out that NNs are just one type of ML model; others include:

* [Naive Bayes classifiers](https://en.wikipedia.org/wiki/Naive_Bayes_classifier),
* [k-Nearest Neighbors (kNN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm),
* [Support Vector Machines (SVMs)](https://en.wikipedia.org/wiki/Support_vector_machine),
* [Hidden Markov Models (HMMs)](https://en.wikipedia.org/wiki/Hidden_Markov_model),

and many more. Boosted random forests were particularly popular in particle physics before the deep learning revolution (around 2015), and they're still widely used (through [XGBoost](https://xgboost.readthedocs.io/) and ROOT's [TMVA](https://root.cern/manual/tmva/)). Most of the above algorithms are still relevant in some domains, particularly if available datasets are too small to train a deep NN.
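As a point of reference, the XGBoost interface mentioned above looks like this in a minimal, hypothetical sketch with made-up data (the parameter values are arbitrary):

```python
import numpy as np
from xgboost import XGBClassifier  # gradient-boosted decision trees

# made-up binary classification data: think "signal" versus "background"
rng = np.random.default_rng(12345)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, size=1000) > 0).astype(int)

# train a boosted ensemble of shallow trees
model = XGBClassifier(n_estimators=100, max_depth=3)
model.fit(X, y)

print(model.predict_proba(X[:5]))  # per-event class probabilities
```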

This course focuses on NNs for several reasons.

1. At heart, an NN is a simple algorithm, a generalization of a linear fit (see the sketch after this list).
2. NNs are applicable to a broad range of problems, when large enough training datasets and computational resources are available to train them.
3. They're open to experimentation with different NN topologies.
4. At the time of writing, we are in the midst of an ML/AI revolution, almost entirely due to advances in deep NNs.
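To make reason 1 concrete, here is the promised sketch: a minimal, hypothetical PyTorch example with made-up data. A linear fit would be a single `nn.Linear(1, 1)`; a "deep" NN generalizes it by stacking layers with nonlinearities between them.

```python
import torch
from torch import nn

# made-up data: y is approximately a nonlinear function of x
torch.manual_seed(12345)
x = torch.rand(1000, 1) * 6 - 3
y = torch.sin(x) + 0.1 * torch.randn(1000, 1)

# a linear fit would be nn.Linear(1, 1); stacking layers with
# nonlinearities between them turns it into a "deep" NN
model = nn.Sequential(
    nn.Linear(1, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_function = nn.MSELoss()

# the explicit training loop that Scikit-Learn hides inside model.fit
for epoch in range(1000):
    optimizer.zero_grad()
    loss = loss_function(model(x), y)
    loss.backward()
    optimizer.step()

print(loss.item())  # mean squared error after training
```

This is also the verbosity trade-off mentioned in the intro: the loop is tens of lines instead of two, but every step of the fit is exposed for inspection and control.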

## Goals of this course
