diff --git a/.github/workflows/website.yml b/.github/workflows/website.yml index 72e3d14..9663aef 100644 --- a/.github/workflows/website.yml +++ b/.github/workflows/website.yml @@ -17,9 +17,9 @@ jobs: shell: bash steps: - name: Set up Ruby - uses: actions/setup-ruby@v1 + uses: ruby/setup-ruby@v1 with: - ruby-version: '2.7' + ruby-version: '3.3' - name: Set up Python uses: actions/setup-python@v2 diff --git a/_episodes/01-introduction.md b/_episodes/01-introduction.md index 7a7fa02..816d1c3 100644 --- a/_episodes/01-introduction.md +++ b/_episodes/01-introduction.md @@ -3,112 +3,124 @@ title: "Introduction" teaching: 30 exercises: 10 questions: -- What is machine learning? +- "What is machine learning?" +- "What are some useful machine learning techniques?" objectives: -- "Gain an overview of what machine learning is." +- "Gain an overview of what machine learning is and the techniques available." - "Understand how machine learning and artificial intelligence differ." -- "Understand some common examples of machine learning being used in our daily lives" +- "Be aware of some caveats when using Machine Learning." + keypoints: -- "Machine learning is a set of tools and techniques to find patterns in data." -- "Some machine learning techniques are useful for predicting something given some input data." -- "Some machine learning techniques are useful for classifying input data and working out which class it belongs to." -- "Artificial Intelligence is a broader term that refers to making computers show human like intelligence." -- "Some people say Artificial Intelligence to mean machine learning" -- "All machine learning systems have some kinds of limitations" +- "Machine learning is a set of tools and techniques that use data to make predictions." +- "Artificial intelligence is a broader term that refers to making computers show human-like intelligence." +- "Deep learning is a subset of machine learning." +- "All machine learning systems have limitations to be aware of." 
--- # What is machine learning? -Machine learning is a set of of tools and techniques which let us find patterns in data. This lesson will introduce you to a few of these techniques, but there are many more which we simply don't have time to cover here. +Machine learning is a set of techniques that enable computers to use data to improve their performance in a given task. This is similar in concept to how humans learn to make predictions based upon previous experience and knowledge. Machine learning encompasses a wide range of activities, but broadly speaking it can be used to: find trends in a dataset, classify data into groups or categories, make predictions based upon data, and even "learn" how to interact with an environment when provided with goals to achieve. -The techniques breakdown into two broad categories, predictors and classifiers. Predictors are used to predict a value (or set of value) given a set of inputs, for example trying to predict the cost of something given the economic conditions and the cost of raw materials or predicting a country's GDP given its life expectancy. Classifiers try to classify data into different categories, for example deciding what characters are visible in a picture of some writing or if a message is spam or not. +### Artificial intelligence vs machine learning +The term machine learning (ML) is often mentioned alongside artificial intelligence (AI) and deep learning (DL). Deep learning is a subset of machine learning, and machine learning is a subset of artificial intelligence. -## Training Data +AI is increasingly being used as a catch-all term to describe things that encompass ML and DL systems - from simple email spam filters, to more complex image recognition systems, to large language models such as ChatGPT. 
The more specific term "Artificial General Intelligence" (AGI) is used to describe a system possessing a "general intelligence" that can be applied to solve a diverse range of problems, often mimicking the behaviour of intelligent biological systems. Modern attempts at AGI are getting close to fooling humans, but while there have been great advances in AI research, human-like intelligence is only possible in a few specialist areas. -Many (but not all) machine learning systems "learn" by taking a series of input data and output data and using it to form a model. The maths behind the machine learning doesn't care what the data is as long as it can represented numerically or categorised. Some examples might include: +ML refers to techniques where a computer can "learn" patterns in data, usually by being shown many training examples. While ML algorithms can learn to solve specific problems, or multiple similar problems, they are not considered to possess a general intelligence. ML algorithms often need hundreds or thousands of examples to learn a task and are confined to activities such as simple classifications. A human-like system could learn much quicker than this, and potentially learn from a single example by using its knowledge of many other problems. -* predicting a person's weight based on their height -* predicting commute times given traffic conditions -* predicting house prices given stock market prices -* classifying if an email is spam or not -* classifying what if an image contains a person or not +DL is a particular field of machine learning where algorithms called neural networks are used to create highly complex systems. Large collections of neural networks are able to learn from vast quantities of data. Deep learning can be used to solve a wide range of problems, but it can also require huge amounts of input data and computational resources to train.
+The image below shows the relationships between artificial intelligence, machine learning and deep learning. -Typically we will need to train our models with hundreds, thousands or even millions of examples before they work well enough to do any useful predictions or classifications with them. +![An infographic showing some of the relationships between AI, ML, and DL](../fig/introduction/AI_ML_DL_differences.png) +The image above is by Tukijaaliwa, CC BY-SA 4.0, via Wikimedia Commons, original source -Some systems will do training as a one shot process which produces a model. Others might try to continuosly refine their training through the real use of the system and human feedback to it. For example every time you mark an email as spam or not spam you are probably contributing to further training of your spam filter's model. -### Types of output +### Machine learning in our daily lives -Predictors will usually involve a continuos scale of outputs, such as the price of something. Classifiers will tell you which class (or classes) are present in the data. For example a system to recognise hand writing from an input image will need to classify the output into one of a set of potential characters. +Machine learning has quickly become an important technology and is now frequently used to perform services we encounter in our daily lives. 
Here are just a few examples: +* Banks look for trends in transaction data to detect outliers that may be fraudulent +* Email inboxes use text to decide whether an email is spam or not, and adjust their rules based upon how we flag emails +* Travel apps use live and historic data to estimate traffic, travel times, and journey routes +* Retail companies and streaming services use data to recommend new content we might like based upon our demographic and historical preferences +* Image, object, and pattern recognition is used to identify humans and vehicles, capture text, generate subtitles, and much more +* Self-driving cars and robots use object detection and performance feedback to improve their interaction with the world -## Machine learning vs Artificial Intelligence +> ## Where else have you encountered machine learning already? +> Now that we have explored machine learning in a bit more detail, discuss with the person next to you: +> 1. Where else have I seen machine learning in use? +> 2. What kind of input data does that machine learning system use to make predictions/classifications? +> 3. Is there any evidence that your interaction with the system contributes to further training? +> 4. Do you have any examples of the system failing? +{: .challenge} -Artificial Intelligence often means a system with general intelligence, able to solve any problem. AI is a very broad term. ML systems are usually trained to work on a particular problem. But they can appear to "learn" but isn't a general intelligence that can solve anything a human could. They often need hundreds or thousands of examples to learn and are confined to relatively simple classifications. A human like system could learn from a single example. -Another definition of Artificial Intelligence dates back to the 1950s and Alan Turing's "Immitation Game". 
This said that we could consider a system intelligent when it could fool a human into thinking they were talking to another human when they were actually talking to a computer. Modern attempts at this are getting close to fooling humans, but we are still a very long way from a machine which has full human like intelligence. +### Limitations of machine learning -### Over Hyping of Artificial Intelligence and Machine Learning +Like any other system, machine learning has limitations, caveats, and "gotchas" to be aware of that may impact the accuracy and performance of a machine learning system. -There is a lot of hype around machine learning and artificial intelligence right now, while many real advances have been made a lot of people are overstating what can be achieved. Recent advances in computer hardware and machine learning algorithms have made it a lot more useful, but its been around over 50 years. +#### Garbage in = garbage out -The [Gartner Hype Cycle](https://www.gartner.com/en/research/methodologies/gartner-hype-cycle) looks at which technologies are being over-hyped. In the August 2018 analysis AI Platform as a service, Deep Learning chips, Deep learning neural networks, Conversational AI and Self Driving Cars are all shown near the "Peak of inflated expectations". +There is a classic expression in computer science: "garbage in = garbage out". This means that if the input data we use is garbage then the output will be too. If, for example, we try to use a machine learning system to find a link between two unlinked variables then it may well manage to produce a model attempting this, but the output will be meaningless.
-![The Gartner Hype Cycle curve](https://upload.wikimedia.org/wikipedia/commons/9/94/Gartner_Hype_Cycle.svg) -[Image from Jeremy Kemp via Wikimedia](https://en.wikipedia.org/wiki/File:Gartner_Hype_Cycle.svg) +#### Biases due to training data -# Applications of machine learning +The performance of a ML system depends on the breadth and quality of input data used to train it. If the input data contains biases or blind spots then these will be reflected in the ML system. For example, if we collect data on public transport use from only high socioeconomic areas, the resulting input data may be biased due to a range of factors that may increase the likelihood of people from those areas using private transport vs public options. -## Machine learning in our daily lives +#### Extrapolation - * [Image recognition](https://www.youtube.com/watch?v=eve8DkkVdhI) - * [Object classification](https://www.youtube.com/watch?v=VOC3huqHrss) - * [Character recognition](https://www.youtube.com/watch?v=ocB8uDYXtt0) - * [Insurance payout predictions](https://www.youtube.com/watch?v=Q3vknDOy6Bs) - * [Crime prediction](https://www.youtube.com/watch?v=7Ly7yAzLDjA) +We can only make reliable predictions about data which is in the same range as our training data. If we try to extrapolate beyond the boundaries of the training data we cannot be confident in our results. As we shall see some algorithms are better suited (or less suited) to extrapolation than others. +#### Over fitting -## Example of machine learning in research - * [Classifying remote sensing images to find water.](https://pure.aber.ac.uk/portal/files/29140808/remotesensing_11_00593.pdf) - * [Looking for breast cancer in medical images](https://pure.aber.ac.uk/portal/files/28421096/08003418.pdf) - * [Predicting what cows are doing from GPS data](https://pure.aber.ac.uk/portal/files/6707587/JDS_DairyModel_Revised_2.docx) +Sometimes ML algorithms become over-trained and subsequently don't perform well when presented with real data. 
It's important to consider how many rounds of training a ML system has received and whether or not it may have become over-trained. +#### Inability to explain answers +Machine learning techniques will return an answer based on the input data and model parameters even if that answer is wrong. Most systems are unable to explain the logic used to arrive at that answer. This can make detecting and diagnosing problems difficult. -# Limitations of Machine Learning -## Garbage In = Garbage Out +# Getting started with Scikit-Learn -There is a classic expression in Computer Science, "Garbage In = Garbage Out". This means that if the input data we use is garbage then the ouput will be too. If for instance we try to get a machine learning system to find a link between two unlinked variables then it might still come up with a model that attempts this, but the output will be meaningless. +### About Scikit-Learn -## Bias or lacking training data +[Scikit-Learn](http://github.com/scikit-learn/scikit-learn) is a Python package designed to give access to well-known machine learning algorithms within Python code, through a clean application programming interface (API). It has been built by hundreds of contributors from around the world, and is used across industry and academia. -Input data may also be lacking enough diversity to cover all examples. Due to how the data was obtained there might be biases in it that are then reflected in the ML system. For example if we collect data on crime reporting it could be biased towards wealthier areas where crimes are more likely to be reported. Historical data might not cover enough history. +Scikit-Learn is built upon Python's [NumPy (Numerical Python)](http://numpy.org) and [SciPy (Scientific Python)](http://scipy.org) libraries, which enable efficient in-core numerical and scientific computation within Python.
As such, Scikit-Learn is not specifically designed for extremely large datasets, though there is [some work](https://github.com/ogrisel/parallel_ml_tutorial) in this area. For this introduction to ML we are going to stick to processing small to medium datasets with Scikit-Learn, without the need for a graphical processing unit (GPU). -## Extrapolation +Like any other Python package, we can import Scikit-Learn and check the package version using the following Python commands: -We can only make reliable predictions about data which is in the same range as our training data. If we try to extrapolate beyond what was covered in the training data we'll probably get wrong answers. +~~~ +import sklearn +print('scikit-learn:', sklearn.__version__) +~~~ +{: .language-python} -## Over fitting +### Representation of Data in Scikit-learn -Sometimes ML algorithms become over trained to their training data and struggle to work when presented with real data. In some cases it best not to train too many times. +Machine learning is about creating models from data: for that reason, we'll start by discussing how data can be represented in order to be understood by the computer. -## Inability to explain answers +Most machine learning algorithms implemented in scikit-learn expect data to be stored in a two-dimensional array or matrix. The arrays can be either numpy arrays, or in some cases scipy.sparse matrices. The size of the array is expected to be [n_samples, n_features] -Many machine learning techniques will give us an answer given some input data even if that answer is wrong. Most are unable to explain any kind of logic in arriving at that answer. This can make diagnosing and even detecting problems with them difficult. +We typically have a "Features Matrix" (usually referred to as the code variable `X`) which are the "features" data we wish to train on. -> ## Where have you encountered machine learning already? -> -> Discuss with the person next to you: -> -> 1. 
Where have I seen machine learning in use? -> 2. What kind of input data does that machine learning system use to make predictions/classifications? -> 3. Is there any evidence that your interaction with the system contributes to further training? -> 4. Do you have any examples of the system failing? -> -> Write your answers into the etherpad. -{: .challenge} +* n_samples: The number of samples. A sample can be a document, a picture, a sound, a video, an astronomical object, a row in database or CSV file, or whatever you can describe with a fixed set of quantitative traits. +* n_features: The number of features (variables) that can be used to describe each item in a quantitative manner. Features are generally real-valued, but may be boolean or discrete-valued in some cases. + +If we want our ML models to make predictions or classifications, we also provide "labels" as our expected "answers/results". The model will then be trained on the input features to try and match our provided labels. This is done by providing a "Target Array" (usually referred to as the code variable `y`) which contains the "labels or values" that we wish to predict using the features data. + +![Types of Machine Learning](../fig/introduction/sklearn_input.png) +Figure from the [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook) + +# What will we cover today? + +This lesson will introduce you to some of the key concepts and sub-domains of ML such as supervised learning, unsupervised learning, and neural networks. + +The figure below provides a nice overview of some of the sub-domains of ML and the techniques used within each sub-domain. We recommend checking out the Scikit-Learn [webpage](https://scikit-learn.org/stable/index.html) for additional examples of the topics we will cover in this lesson. 
We will cover topics highlighted in blue: classical learning techniques such as regression, classification, clustering, and dimension reduction, as well as ensemble methods and a brief introduction to neural networks using perceptrons. + +![Types of Machine Learning](../fig/introduction/ML_summary.png) +[Image from Vasily Zubarev via their blog](https://vas3k.com/blog/machine_learning/) with modifications in blue to denote lesson content. {% include links.md %} diff --git a/_episodes/02-regression.md b/_episodes/02-regression.md index 55ecd5b..7fa0fd9 100644 --- a/_episodes/02-regression.md +++ b/_episodes/02-regression.md @@ -1,512 +1,422 @@ --- -title: "Regression" -teaching: 45 +title: "Supervised methods - Regression" +teaching: 90 exercises: 30 questions: -- "How can I make linear regression models from data?" -- "How can I use logarithmic regression to work with non-linear data?" +- "What is supervised learning?" +- "What is regression?" +- "How can I model data and make predictions using regression methods?" objectives: -- "Learn how to use linear regression to produce a model from data." -- "Learn how to model non-linear data using a logarithmic." -- "Learn how to measure the error between the original data and a linear model." +- "Apply linear regression with Scikit-Learn to create a model." +- "Measure the error between a regression model and input data." +- "Analyse and assess the accuracy of a linear model using Scikit-Learn's metrics library." +- "Understand how more complex models can be built with non-linear equations." +- "Apply polynomial modelling to non-linear data using Scikit-Learn." keypoints: -- "We can model linear data using a linear or least squares regression." -- "A linear regression model can be used to predict future values." -- "We should split up our training dataset and use part of it to test the model." -- "For non-linear data we can use logarithms to make the data linear." 
+- "Scikit-Learn is a Python library with lots of useful machine learning functions." +- "Scikit-Learn includes a linear regression function." +- "Scikit-Learn can perform polynomial regressions to model non-linear data." --- -# Linear regression +# Supervised learning -If we take two variable and graph them against each other we can look for relationships between them. Once this relationship is established we can use that to produce a model which will help us predict future values of one variable given the other. +Classical machine learning is often divided into two categories – supervised and unsupervised learning. -If the two variables form a linear relationship (a straight line can be drawn to link them) then we can create a linear equation to link them. This will be of the form y = m * x + c, where x is the variable we know, y is the variable we're calculating, m is the slope of the line linking them and c is the point at which the line crosses the y axis (where x = 0). +For the case of supervised learning we act as a "supervisor" or "teacher" for our ML algorithms by providing the algorithm with "labelled data" that contains example answers of what we wish the algorithm to achieve. -Using the Gapminder website we can graph all sorts of data about the development of different countries. Lets have a look at the change in [life expectancy over time in the United Kingdom](https://www.gapminder.org/tools/#$state$time$value=2018&showForecast:true&delay:100;&entities$filter$;&dim=geo;&marker$select@$geo=gbr&trailStartTime=1800;;&axis_x$which=time&domainMin:null&domainMax:null&zoomedMin=1800&zoomedMax=2018&scaleType=time&spaceRef:null;&axis_y$domainMin:null&domainMax:null&zoomedMin:1&zoomedMax:84.17&spaceRef:null;&size$domainMin:null&domainMax:null&extent@:0.022083333333333333&:0.4083333333333333;;&color$which=world_6region;;;&chart-type=bubbles). 
+For instance, if we wish to train our algorithm to distinguish between images of cats and dogs, we would provide our algorithm with images that have already been labelled as "cat" or "dog" so that it can learn from these examples. If we wished to train our algorithm to predict house prices over time we would provide our algorithm with example data of datetime values that are "labelled" with house prices. -Since around 1950 life expectancy appears to be increasing with a pretty straight line in other words a linear relationship. We can use this data to try and calculate a line of best fit that will attempt to draw a perfectly straight line through this data. One method we can use is called [linear regression or least square regression](https://www.mathsisfun.com/data/least-squares-regression.html). The linear regression will create a linear equation that minimises the average distance from the line of best fit to each point in the graph. It will calculate the values of m and c for a linear equation for us. We could do this manually, but lets use Python to do it for us. +Supervised learning is split up into two further categories: classification and regression. For classification the labelled data is discrete, such as the "cat" or "dog" example, whereas for regression the labelled data is continuous, such as the house price example. +In this episode we will explore how we can use regression to build a "model" that can be used to make predictions. -## Coding a linear regression with Python -This code will calculate a least squares or linear regression for us. 
-~~~ -def least_squares(data): - x_sum = 0 - y_sum = 0 - x_sq_sum = 0 - xy_sum = 0 - - # the list of data should have two equal length columns - assert len(data) == 2 - assert len(data[0]) == len(data[1]) - - n = len(data[0]) - # least squares regression calculation - for i in range(0, n): - x = int(data[0][i]) - y = data[1][i] - x_sum = x_sum + x - y_sum = y_sum + y - x_sq_sum = x_sq_sum + (x**2) - xy_sum = xy_sum + (x*y) - - m = ((n * xy_sum) - (x_sum * y_sum)) - m = m / ((n * x_sq_sum) - (x_sum ** 2)) - c = (y_sum - m * x_sum) / n - - print("Results of linear regression:") - print("x_sum=", x_sum, "y_sum=", y_sum, "x_sq_sum=", x_sq_sum, "xy_sum=", - xy_sum) - print("m=", m, "c=", c) - - return m, c -~~~ -{: .language-python} - -Lets test our code by using the example data from the mathsisfun link above. +# Regression -~~~ -x_data = [2,3,5,7,9] -y_data = [4,5,7,10,15] -least_squares([x_data,y_data]) -~~~ -{: .language-python} +Regression is a statistical technique that relates a dependent variable (a label in ML terms) to one or more independent variables (features in ML terms). A regression model attempts to describe this relation by fitting the data as closely as possible according to mathematical criteria. This model can then be used to predict new labelled values by inputting the independent variables into it. For example, if we create a house price model we can then feed in any datetime value we wish, and get a new house price value prediction. -We should get the following results: +Regression can be as simple as drawing a "line of best fit" through data points, known as linear regression, or more complex models such as polynomial regression, and is used routinely around the world in both industry and research. You may have already used regression in the past without knowing that it is also considered a machine learning technique! 
-~~~ -Results of linear regression: -x_sum= 26 y_sum= 41 x_sq_sum= 168 xy_sum= 263 -m= 1.5182926829268293 c= 0.30487804878048763 -~~~ -{: .output} +![Example of linear and polynomial regressions](../fig/regression_example.png) -### Testing the accuracy of a linear regression model +### Linear regression using Scikit-Learn -We now have a simple linear model for some data. It would be useful to test how accurate that model is. We can do this by computing the y value for every x value used in our original data and comparing the model's y value with the original. We can turn this into a single overall error number by calculating the root mean square (RMS), this squares each comparison, takes the sum of all of them, divides this by the number of items and finally takes the square root of that value. By squaring and square rooting the values we prevent negative errors from cancelling out positive ones. The RMS gives us an overall error number which we can then use to measure our model's accuracy with. The following code calculates RMS in Python. +We've had a lot of theory so time to start some actual coding! Let's create regression models for a small bundle of datasets known as [Anscombe's Quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet). These datasets are available through the Python plotting library [Seaborn](https://seaborn.pydata.org/). 
Let's define our bundle of datasets, extract out the first dataset, and inspect its contents: ~~~ -import math -def measure_error(data1, data2): +import seaborn as sns + +# Anscombe's Quartet consists of 4 sets of data +data = sns.load_dataset("anscombe") +print(data.head()) - assert len(data1) == len(data2) - err_total = 0 - for i in range(0, len(data1)): - err_total = err_total + (data1[i] - data2[i]) ** 2 +# Split out the 1st dataset from the total +data_1 = data[data["dataset"]=="I"] +data_1 = data_1.sort_values("x") - err = math.sqrt(err_total / len(data1)) - return err +# Inspect the data +print(data_1.head()) ~~~ {: .language-python} +We see that the dataset bundle has the 3 columns `dataset`, `x`, and `y`. We have already used the `dataset` column to extract out Dataset I ready for our regression task. Let's visually inspect the data: -To calculate the RMS for the test data we just used we need to calculate the y coordinate for every x coordinate (2,3,5,7,9) that we had in the original data. +~~~ +import matplotlib.pyplot as plt +plt.scatter(data_1["x"], data_1["y"]) +plt.xlabel("x") +plt.ylabel("y") +plt.show() ~~~ -# get the m and c values from the least_squares function -m, c = least_squares([x_data, y_data]) +{: .language-python} -# create an empty list for the model y data -linear_data = [] +![Inspection of our dataset](../fig/regression_inspect.png) -for x in x_data: - y = m * x + c - # add the result to the linear_data list - linear_data.append(y) -# calculate the error -print(measure_error(y_data,linear_data)) -~~~ -{: .language-python} +In this regression example we will create a Linear Regression model that will try to predict `y` values based upon `x` values. -This will output an error of 0.7986268703523449, which means that on average the difference between our model and the real values is 0.7986268703523449. The less linear the data is the bigger this number will be. If the model perfectly matches the data then the value will be zero.
+In machine learning terminology: we will use our `x` feature (variable) and `y` labels ("answers") to train our Linear Regression model to predict `y` values when provided with `x` values. +The mathematical equation for a linear fit is `y = mx + c` where `y` is our label data, `x` is our input feature(s), `m` represents the gradient of the linear fit, and `c` represents the intercept with the y-axis. -### Graphing the data +A typical ML workflow is as follows: +* Define the model (also known as an estimator) +* Tweak your data into the required format for your model +* Train your model on the input data +* Predict some values using the trained model +* Check the accuracy of the prediction, and visualise the result -To compare our model and data lets graph both of them using matplotlib. +We'll define functions for each of these steps so that we can quickly perform linear regressions on our data. First we'll define a function to pre-process our data into a format that Scikit-Learn can use. +~~~ +import numpy as np +def pre_process_linear(x, y): + # sklearn requires a 2D array, so let's reshape our 1D arrays. + x_data = np.array(x).reshape(-1, 1) + y_data = np.array(y).reshape(-1, 1) + return x_data, y_data ~~~ -import matplotlib.pyplot as plt +{: .language-python} -def calculate_linear(x_data, m, c): - linear_data = [] - for x in x_data: - y = m * x + c - #add the result to the linear_data list - linear_data.append(y) - return(linear_data) +Next we'll define a model, and train it on the pre-processed data.
We'll also inspect the trained model parameters `m` and `c`: +~~~ +from sklearn.linear_model import LinearRegression -def make_graph(x_data, y_data, linear_data): - plt.plot(x_data, y_data, label="Original Data") - plt.plot(x_data, linear_data, label="Line of best fit") +def fit_a_linear_model(x_data, y_data): + # Define our estimator/model + model = LinearRegression(fit_intercept=True) - plt.grid() - plt.legend() + # train our estimator/model using our data + lin_regress = model.fit(x_data,y_data) - plt.show() + # inspect the trained estimator/model parameters + m = lin_regress.coef_ + c = lin_regress.intercept_ + print("linear coefs=",m, c) -x_data = [2,3,5,7,9] -y_data = [4,5,7,10,15] + return lin_regress +~~~ +{: .language-python} -m, c = least_squares([x_data, y_data]) -linear_data = calculate_linear(x_data, m, c) -make_graph(x_data, y_data, calculate_linear(x_data, m, c)) +Then we'll define a function to make predictions using our trained model, and calculate the Root Mean Squared Error (RMSE) of our predictions: +~~~ +import math +from sklearn.metrics import mean_squared_error +def predict_linear_model(lin_regress, x_data, y_data): + # predict some values using our trained estimator/model + # (in this case we predict our input data!) + linear_data = lin_regress.predict(x_data) + + # calculate the RMS error as a quality-of-fit metric + error = math.sqrt(mean_squared_error(y_data, linear_data)) + print("linear error=",error) + + # return our predictions so that we can use them later + return linear_data ~~~ {: .language-python} -![graph of the test regression data](../fig/regression_test_graph.png) - +Finally, we'll define a function to plot our input data, our linear fit, and our predictions: +~~~ +def plot_linear_model(x_data, y_data, predicted_data): + # visualise!
+ # Don't call .show() here so that we can add extra stuff to the figure later + plt.scatter(x_data, y_data, label="input") + plt.plot(x_data, predicted_data, "-", label="fit") + plt.plot(x_data, predicted_data, "rx", label="predictions") + plt.xlabel("x") + plt.ylabel("y") + plt.legend() +~~~ +{: .language-python} -### Predicting life expectancy +We will be training a few Linear Regression models in this episode, so let's define a handy function to combine input data processing, model creation, training our model, inspecting the trained model parameters `m` and `c`, make some predictions, and finally visualise our data. +~~~ +def fit_predict_plot_linear(x, y): + x_data, y_data = pre_process_linear(x, y) + lin_regress = fit_a_linear_model(x_data, y_data) + linear_data = predict_linear_model(lin_regress, x_data, y_data) + plot_linear_model(x_data, y_data, linear_data) -Now lets try and model some real data with linear regression. We'll use the [Gapminder Foundation's](http://www.gapminder.org) life expectancy data for this. Click [here](../data/gapminder-life-expectancy.csv) to download it. + return lin_regress +~~~ +{: .language-python} +Now we have defined our generic function to fit a linear regression we can call the function to train it on some data, and show the plot that was generated: ~~~ -# put this line at the top of the file -import pandas as pd +# just call the function here rather than assign. 
+# We don't need to reuse the trained model yet +fit_predict_plot_linear(data_1["x"], data_1["y"]) -def process_life_expectancy_data(filename, country, min_date, max_date): - df = pd.read_csv(filename, index_col="Life expectancy") +plt.show() +~~~ +{: .language-python} - # get the life expectancy for the specified country/dates - # we have to convert the dates to strings as pandas treats them that way - life_expectancy = df.loc[country, str(min_date):str(max_date)] - # create a list with the numerical range of min_date to max_date - # we could use the index of life_expectancy but it will be a string - # we need numerical data - x_data = list(range(min_date, max_date + 1)) +![Linear regression of dataset I](../fig/regress_linear.png) - # calculate line of best fit - m, c = least_squares([x_data, life_expectancy]) - linear_data = calculate_linear(x_data, m, c) +This looks like a reasonable linear fit to our first dataset. Thanks to our function we can quickly perform more linear regressions on other datasets. - error = measure_error(life_expectancy, linear_data) - print("error is ", error) +Let's quickly perform a new linear fit on the 2nd Anscombe dataset: - make_graph(x_data, life_expectancy, linear_data) +~~~ +data_2 = data[data["dataset"]=="II"] +fit_predict_plot_linear(data_2["x"],data_2["y"]) -process_life_expectancy_data("../data/gapminder-life-expectancy.csv", - "United Kingdom", 1950, 2010) +plt.show() ~~~ {: .language-python} +![Linear regression of dataset II](../fig/regress_linear_2nd.png) -> ## Modelling Life Expectancy -> -> Combine all the code above into a single Python file, save it into a directory called code. 
-> -> In the parent directory create another directory called data -> -> Download the file [https://scw-aberystwyth.github.io/machine-learning-novice/data/gapminder-life-expectancy.csv](https://scw-aberystwyth.github.io/machine-learning-novice/data/gapminder-life-expectancy.csv) into the data directory -> The full code from above is also available to download from [https://scw-aberystwyth.github.io/machine-learning-novice/code/linear_regression.py](https://scw-aberystwyth.github.io/machine-learning-novice/code/linear_regression.py) -> -> If you're using a Unix or Unix like environment the following commands will do this in your home directory: -> -> ~~~ -> cd ~ -> mkdir code -> mkdir data -> cd data -> wget https://scw-aberystwyth.github.io/machine-learning-novice/data/gapminder-life-expectancy.csv -> ~~~ -> {: .language-bash} -> -> Adjust the program to calculate the life expectancy for Germany between 1950 and 2000. What are the values (m and c) of linear equation linking date and life expectancy? -> > ## Solution -> > ~~~ -> > process_life_expectancy_data("../data/gapminder-life-expectancy.csv", "Germany", 1950, 2000) -> > ~~~ -> > {: .language-python} -> > -> > m= 0.212219909502 c= -346.784909502 -> {: .solution} -{: .challenge} - +It looks like our linear fit on Dataset II produces a nearly identical fit to the linear fit on Dataset I. Although our errors look to be almost identical, our visual inspection tells us that Dataset II is probably not a linear correlation and we should try to make a different model. -> ## Predicting Life Expectancy -> Use the linear equation you've just created to predict life expectancy in Germany for every year between 2001 and 2016. How accurate are your answers? -> If you worked for a pension scheme would you trust your answers to predict the future costs for paying pensioners? +> ## Exercise: Repeat the linear regression exercise for Datasets III and IV.
+> Adjust your code to repeat the linear regression for the other datasets. What can you say about the similarities and/or differences between the linear regressions on the 4 datasets? > > ## Solution > > ~~~ -> > for x in range(2001,2017): -> > print(x,0.212219909502 * x - 346.784909502) -> > ~~~ -> > {: .language-python} +> > # Repeat the following and adjust for dataset IV +> > data_3 = data[data["dataset"]=="III"] > > -> > Predicted answers: -> > ~~~ -> > 2001 77.86712941150199 -> > 2002 78.07934932100403 -> > 2003 78.29156923050601 -> > 2004 78.503789140008 -> > 2005 78.71600904951003 -> > 2006 78.92822895901202 -> > 2007 79.140448868514 -> > 2008 79.35266877801604 -> > 2009 79.56488868751802 -> > 2010 79.77710859702 -> > 2011 79.98932850652199 -> > 2012 80.20154841602402 -> > 2013 80.41376832552601 -> > 2014 80.62598823502799 -> > 2015 80.83820814453003 -> > 2016 81.05042805403201 -> > ~~~ -> > {: .output} +> > fit_predict_plot_linear(data_3["x"],data_3["y"]) > > -> > Compare with the real values: -> > ~~~ -> > df = pd.read_csv('../data/gapminder-life-expectancy.csv',index_col="Life expectancy") -> > for x in range(2001,2017): -> > y = 0.215621719457 * x - 351.935837103 -> > real = df.loc['Germany', str(x)] -> > print(x, "Predicted", y, "Real", real, "Difference", y-real) +> > plt.show() > > ~~~ > > {: .language-python} > > -> > ~~~ -> > 2001 Predicted 77.86712941150199 Real 78.4 Difference -0.532870588498 -> > 2002 Predicted 78.07934932100403 Real 78.6 Difference -0.520650678996 -> > 2003 Predicted 78.29156923050601 Real 78.8 Difference -0.508430769494 -> > 2004 Predicted 78.503789140008 Real 79.2 Difference -0.696210859992 -> > 2005 Predicted 78.71600904951003 Real 79.4 Difference -0.68399095049 -> > 2006 Predicted 78.92822895901202 Real 79.7 Difference -0.771771040988 -> > 2007 Predicted 79.140448868514 Real 79.9 Difference -0.759551131486 -> > 2008 Predicted 79.35266877801604 Real 80.0 Difference -0.647331221984 -> > 2009 Predicted 79.56488868751802 Real 
80.1 Difference -0.535111312482 -> > 2010 Predicted 79.77710859702 Real 80.3 Difference -0.52289140298 -> > 2011 Predicted 79.98932850652199 Real 80.5 Difference -0.510671493478 -> > 2012 Predicted 80.20154841602402 Real 80.6 Difference -0.398451583976 -> > 2013 Predicted 80.41376832552601 Real 80.7 Difference -0.286231674474 -> > 2014 Predicted 80.62598823502799 Real 80.7 Difference -0.074011764972 -> > 2015 Predicted 80.83820814453003 Real 80.8 Difference 0.03820814453 -> > 2016 Predicted 81.05042805403201 Real 80.9 Difference 0.150428054032 -> > ~~~ -> > {: .output} -> > Answers are between 0.15 years over and 0.77 years under the reality. -> > If this was being used in a pension scheme it might lead to a slight under prediction of life expectancy and cost the pension scheme a little more than expected. -> {: .solution} -{: .challenge} - -> ## Predicting Historical Life Expectancy -> -> Now change your program to measure life expectancy in Canada between 1890 and 1914. Use the resulting m and c values to predict life expectancy in 1918. How accurate is your answer? -> If your answer was inaccurate, why was it inaccurate? What does this tell you about extrapolating models like this? -> -> > ## Solution -> > ~~~ -> > process_life_expectancy_data("../data/gapminder-life-expectancy.csv", "Canada", 1890, 1914) -> > ~~~ -> > {: .language-python} +> > ![Linear regression of dataset III](../fig/regress_linear_3rd.png) +> > ![Linear regression of dataset IV](../fig/regress_linear_4th.png) +> > The 4 datasets all produce very similar linear regression fit parameters (`m` and `c`) and RMSEs despite visual differences in the 4 datasets. > > -> > ~~~ -> > m = 0.369807692308 c = -654.215830769 -> > ~~~ -> > {: .output} +> > This is intentional as the Anscombe Quartet is designed to produce near identical basic statistical values such as means and standard deviations. 
> > -> > ~~~ -> > print(1918 * 0.369807692308 -654.215830769) -> > ~~~ -> > {: .language-python} -> > The predicted age is 55.0753 but the actual age is 47.17. This is inaccurate due to WW1 and the subsequent flu epidemic. Major events can produce trends that we've not seen before (or not for a long time), our models struggle to take account of things they've never seen. -> > Even if we look back to 1800, the earliest date we have data for we never see a sudden drop in life expectancy like the 1918 one. +> > While the trained model parameters and errors are near identical, our visual inspection tells us that a linear fit might not be the best way of modelling all of these datasets. > {: .solution} {: .challenge} -# Logarithmic Regression -We've now seen how we can use linear regression to make a simple model and use that to predict values, but what do we do when the relationship between the data isn't linear? +## Polynomial regression using Scikit-Learn -As an example lets take the relationship between income (GDP per Capita) and life expectancy. The gapminder website will [graph](https://www.gapminder.org/tools/#$state$time$value=2017&showForecast:true&delay:206.4516129032258;&entities$filter$;&dim=geo;&marker$axis_x$which=life_expectancy_years&domainMin:null&domainMax:null&zoomedMin:45&zoomedMax:84.17&scaleType=linear&spaceRef:null;&axis_y$which=gdppercapita_us_inflation_adjusted&domainMin:null&domainMax:null&zoomedMin:115.79&zoomedMax:144246.37&spaceRef:null;&size$domainMin:null&domainMax:null&extent@:0.022083333333333333&:0.4083333333333333;;&color$which=world_6region;;;&chart-type=bubbles) this for us. +Now that we have learnt how to do a linear regression it's time to look into polynomial regressions. Polynomial functions are non-linear functions that are commonly used to model data. Mathematically they have `N` degrees of freedom and they take the following form `y = a + bx + cx^2 + dx^3 ...
+ mx^N` -> ## Logarithms Introduction -> Logarithms are the inverse of an exponent (raising a number by a power). -> ``` -> log b(a) = c -> b^c = a -> ``` -> For example: -> ``` -> 2^5 = 32 -> log 2(32) = 5 -> ``` -> If you need more help on logarithms see the [Khan Academy's page](https://www.khanacademy.org/math/algebra2/exponential-and-logarithmic-functions/introduction-to-logarithms/a/intro-to-logarithms) -{: .callout} +If we have a polynomial of degree N=1 we once again return to a linear equation `y = a + bx` or as it is more commonly written `y = mx + c`. Let's create a polynomial regression using N=2. +In Scikit-Learn this is done in two steps. First we pre-process our input data `x_data` into a polynomial representation using the `PolynomialFeatures` function. Then we can create our polynomial regressions using the `LinearRegression().fit()` function, but this time using the polynomial representation of our `x_data`. -The relationship between these two variables clearly isn't linear. But there is a trick we can do to make the data appear to be linear, we can take the logarithm of the Y axis (the GDP) by clicking on the arrow on the left next to GDP/capita and choosing log. [This graph](https://www.gapminder.org/tools/#$state$time$value=2017&showForecast:true&delay:206.4516129032258;&entities$filter$;&dim=geo;&marker$axis_x$which=life_expectancy_years&domainMin:null&domainMax:null&zoomedMin:45&zoomedMax:84.17&scaleType=linear&spaceRef:null;&axis_y$which=gdppercapita_us_inflation_adjusted&domainMin:null&domainMax:null&zoomedMin:115.79&zoomedMax:144246.37&scaleType=log&spaceRef:null;&size$domainMin:null&domainMax:null&extent@:0.022083333333333333&:0.4083333333333333;;&color$which=world_6region;;;&chart-type=bubbles) now appears to be linear. +~~~ +from sklearn.preprocessing import PolynomialFeatures +def pre_process_poly(x, y): + # sklearn requires a 2D array, so lets reshape our 1D arrays. 
+ x_data = np.array(x).reshape(-1, 1) + y_data = np.array(y).reshape(-1, 1) -## Coding a logarithmic regression + # create a polynomial representation of our data + poly_features = PolynomialFeatures(degree=2) + x_poly = poly_features.fit_transform(x_data) -### Downloading the data + return x_poly, x_data, y_data -Download the GDP data from [http://scw-aberystwyth.github.io/machine-learning-novice/data/worldbank-gdp.csv](http://scw-aberystwyth.github.io/machine-learning-novice/data/worldbank-gdp.csv) -### Loading the data +def fit_poly_model(x_poly, y_data): + # Define our estimator/model(s) + poly_regress = LinearRegression() -We need to modify our code a little to work with this example. Firstly the data is now stored in two different files so we'll have to read both of them and combine them together. The two datasets don't quite have an identical list of countries, the life expectancy data is from gapminder themselves and includes French Overseas Departments and British Overseas Territories as seperate entities, it also includes Taiwan. The GDP data is from the World Bank and doesn't differentiate many of the overseas territories/departments and doesn't include Taiwan. Some countries are also lacking GDP data, life expectancy or both. When we load the data we'll have to discard any country which doesn't have valid data in both datasets. Missing data is marked as an NaN (not a number), when loading it we'll have to check for NaN's using the `math.isnan()` function. + # define and train our model + poly_regress.fit(x_poly,y_data) -To match the analysis we just did on the gapminder website we only want to focus on a single year, so we'll filter the data down to a single year which the user can specify. 
+ # inspect trained model parameters + poly_m = poly_regress.coef_ + poly_c = poly_regress.intercept_ + print("poly_coefs",poly_m, poly_c) -Finally the data is sorted in the files by country name, but to help with graphing it later on we need to sort it by life expectancy instead. For this we can use Pandas `sort_values()` function to do this. + return poly_regress -~~~ -def read_data(gdp_file, life_expectancy_file, year): - df_gdp = pd.read_csv(gdp_file, index_col="Country Name") - - gdp = df_gdp.loc[:, year] - - df_life_expt = pd.read_csv(life_expectancy_file, - index_col="Life expectancy") - - # get the life expectancy for the specified country/dates - # we have to convert the dates to strings as pandas treats them that way - life_expectancy = df_life_expt.loc[:, year] - - data = [] - for country in life_expectancy.index: - if country in gdp.index: - # exclude any country where data is unknown - if (math.isnan(life_expectancy[country]) is False) and \ - (math.isnan(gdp[country]) is False): - data.append((country, life_expectancy[country], - gdp[country])) - else: - print("Excluding ", country, ",NaN in data (life_exp = ", - life_expectancy[country], "gdp=", gdp[country], ")") - else: - print(country, "is not in the GDP country data") - - combined = pd.DataFrame.from_records(data, columns=("Country", - "Life Expectancy", "GDP")) - combined = combined.set_index("Country") - # we'll need sorted data for graphing properly later on - combined = combined.sort_values("Life Expectancy") - return combined + +def predict_poly_model(poly_regress, x_poly, y_data): + # predict some values using our trained estimator/model + # (in this case - our input data) + poly_data = poly_regress.predict(x_poly) + + poly_error = math.sqrt(mean_squared_error(y_data, poly_data)) + print("poly error=", poly_error) + + return poly_data + + +def plot_poly_model(x_data, poly_data): + # visualise! 
+ plt.plot(x_data, poly_data, label="poly fit") + plt.legend() + + +def fit_predict_plot_poly(x, y): + # Combine all of the steps + x_poly, x_data, y_data = pre_process_poly(x, y) + poly_regress = fit_poly_model(x_poly, y_data) + poly_data = predict_poly_model(poly_regress, x_poly, y_data) + plot_poly_model(x_data, poly_data) + + return poly_regress ~~~ {: .language-python} -### Processing the data +Lets plot our input dataset II, linear model, and polynomial model together, as well as compare the errors of the linear and polynomial fits. -Once the data is loaded we'll need to convert the GDP data to its logarithmic form by using the `math.log()` function. Pandas has a special function called `apply` which can apply an operation to every item in a column, by using the statement `data["GDP"].apply(math.log)` it will calculate the logarithmic form of every value in the GDP column and turn it into a new dataframe. We'll convert the data into two lists to simplify working with it, these can be used by the least_squares, make_graph and measure_error functions. +~~~ +# Sort our data in order of our x (feature) values +data_2 = data[data["dataset"]=="II"] +data_2 = data_2.sort_values("x") -Once we've calculated the line of best fit with the least_squares function we can graph it. But now we have two choices on how to do the graphing, we can either leave the data in its logarithmic form and draw a straight line of best fit. Or we could convert it back to its original form with the `math.exp()` function and graph the curved line of best fit. To allow us to do either we'll calculate both forms of the line of best fit and store them in the lists linear_data and log_data. 
+fit_predict_plot_linear(data_2["x"],data_2["y"]) +fit_predict_plot_poly(data_2["x"],data_2["y"]) +plt.show() ~~~ -def process_data(gdp_file, life_expectancy_file, year): - data = read_data(gdp_file, life_expectancy_file, year) +{: .language-python} - gdp = data["GDP"].tolist() - gdp_log = data["GDP"].apply(math.log).tolist() - life_exp = data["Life Expectancy"].tolist() +![Comparison of the regressions of our dataset](../fig/regress_both.png) - m, c = least_squares([life_exp, gdp_log]) +Comparing the plots and errors it seems like a polynomial regression of N=2 is a far superior fit to Dataset II than a linear fit. In fact, it looks like our polynomial fit almost perfectly fits Dataset II... which is because Dataset II is created from a N=2 polynomial equation! + +> ## Exercise: Perform and compare linear and polynomial fits for Datasets I, III, and IV. +> Which performs better for each dataset? Modify your polynomial regression function to take `N` as an input parameter to your regression model. How does changing the degree of polynomial fit affect each dataset? +> > ## Solution +> > ~~~ +> > for ds in ["I","II","III","IV"]: +> > # Sort our data in order of our x (feature) values +> > data_ds = data[data["dataset"]==ds] +> > data_ds = data_ds.sort_values("x") +> > fit_predict_plot_linear(data_ds["x"],data_ds["y"]) +> > fit_predict_plot_poly(data_ds["x"],data_ds["y"]) +> > +> > plt.show() +> > ~~~ +> > {: .language-python} +> > +> > The `N=2` polynomial fit is far better for Dataset II. According to the RMSE the polynomial is a slightly better fit for Datasets I and III, however it could be argued that a linear fit is good enough. +> > Dataset III looks like a linear relation that has a single outlier, rather than a truly non-linear relation. The polynomial and linear fits perform just as well (or poorly) on Dataset IV. +> > For Dataset IV it looks like `y` may be a better estimator of `x`, than `x` is at estimating `y`. 
+> > ~~~ +> > def fit_poly_model(x_poly, y_data, N): +> > # Define our estimator/model(s) +> > poly_features = PolynomialFeatures(degree=N) +> > # ... +> > ~~~ +> > {: .language-python} +> > +> > and +> > ~~~ +> > for ds in ["I","II","III","IV"]: +> > # Sort our data in order of our x (feature) values +> > data_ds = data[data["dataset"]==ds] +> > data_ds = data_ds.sort_values("x") +> > fit_predict_plot_linear(data_ds["x"],data_ds["y"]) +> > for N in range(2,11): +> > print("Polynomial degree =",N) +> > fit_predict_plot_poly(data_ds["x"],data_ds["y"],N) +> > plt.show() +> > ~~~ +> > {: .language-python} +> > +> > With a large enough polynomial you can fit through every point with a unique `x` value. +> > Datasets II and IV remain unchanged beyond `N=2` as the polynomial has converged (Dataset II) or cannot model the data (Dataset IV). +> > Datasets I and III slowly decrease their RMSE as N is increased, but it is likely that these more complex models are overfitting the data. Overfitting is discussed later in the lesson. +> {: .solution} +{: .challenge} - # list for logarithmic version - log_data = [] - # list for raw version - linear_data = [] - for x in life_exp: - y_log = m * x + c - log_data.append(y_log) - y = math.exp(y_log) - linear_data.append(y) +## Let's explore a more realistic scenario - # uncomment for log version, further changes needed in make_graph too - # make_graph(life_exp, gdp_log, log_data) - make_graph(life_exp, gdp, linear_data) +Now that we have some convenient Python functions to perform quick regressions on data it's time to explore a more realistic regression modelling scenario. - err = measure_error(linear_data, gdp) - print("error=", err) +Let's start by loading in and examining a new dataset from Seaborn: a penguin dataset containing a few hundred samples and a number of features and labels.
+~~~ +dataset = sns.load_dataset("penguins") +dataset.head() ~~~ {: .language-python} - -A small change to the least_squares function is needed to handle this data. Previously we were working with dates on the x-axis and these were all strings which the least_squares function converted into integers. Now we have life expectancy on the x-axis and that data is already floats, so we need to remove the conversion to integers. Lets change the line ```x = int(data[0][1]``` in our least_squares function to ```x = data[0][1]```. - +We can see that we have seven columns in total: 4 continuous (numerical) columns named `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, and `body_mass_g`; and 3 discrete (categorical) columns named `species`, `island`, and `sex`. We can also see from a quick inspection of the first 5 samples that we have some missing data in the form of `NaN` values. Let's go ahead and remove any rows that contain `NaN` values: ~~~ -def least_squares(data): - x_sum = 0 - y_sum = 0 - x_sq_sum = 0 - xy_sum = 0 - - # the list of data should have two equal length columns - assert len(data) == 2 - assert len(data[0]) == len(data[1]) - - n = len(data[0]) - # least squares regression calculation - for i in range(0, n): - x = data[0][i] - y = data[1][i] - x_sum = x_sum + x - y_sum = y_sum + y - x_sq_sum = x_sq_sum + (x**2) - xy_sum = xy_sum + (x*y) - - m = ((n * xy_sum) - (x_sum * y_sum)) - m = m / ((n * x_sq_sum) - (x_sum ** 2)) - c = (y_sum - m * x_sum) / n - - print("Results of linear regression:") - print("x_sum=", x_sum, "y_sum=", y_sum, "x_sq_sum=", x_sq_sum, "xy_sum=", - xy_sum) - print("m=", m, "c=", c) - - return m, c +dataset.dropna(inplace=True) +dataset.head() ~~~ {: .language-python} -Finally to run everything we need to call the process_data function, this takes three parameters, the GDP filename, the life expectancy filename and the year we want to process as a string. 
+Now that we have cleaned our data we can try and predict a penguin's bill depth using its body mass. In this scenario we will train a linear regression model using `body_mass_g` as our feature data and `bill_depth_mm` as our label data. We will train our model on a subset of the data by slicing the first 146 samples of our cleaned data. We will then use our regression function to train and plot our model. ~~~ -process_data("../data/worldbank-gdp.csv", - "../data/gapminder-life-expectancy.csv", "1980") +dataset_1 = dataset[:146] + +x_data = dataset_1["body_mass_g"] +y_data = dataset_1["bill_depth_mm"] + +trained_model = fit_predict_plot_linear(x_data, y_data) + +plt.xlabel("mass g") +plt.ylabel("depth mm") +plt.show() ~~~ {: .language-python} +![Comparison of the regressions of our dataset](../fig/regress_penguin_lin.png) -### Graphing the data +Congratulations! We've taken our linear regression function and quickly created and trained a new linear regression model on a brand new dataset. Note that this time we have returned our model from the regression function and assigned it to the variable `trained_model`. We can now use this model to predict `bill_depth_mm` values for any given `body_mass_g` values that we pass it. -Previously we drew a line graph showing life expectancy over time. This made sense as a line as it was tracking a single variable over time. But now we are plotting two variables against each other and need to use a scatter graph instead, so we'll change the first `plt.plot` call to `plt.scatter`. +Let's provide the model with all of the penguin samples and visually inspect how the linear regression model performs.
~~~ -def make_graph(x_data, y_data, linear_data): - plt.scatter(x_data, y_data, label="Original Data") - plt.plot(x_data, linear_data, color="orange", label="Line of best fit") +x_data_all, y_data_all = pre_process_linear(dataset["body_mass_g"], dataset["bill_depth_mm"]) - plt.grid() - plt.legend() +y_predictions = predict_linear_model(trained_model, x_data_all, y_data_all) - plt.show() +plt.scatter(x_data_all, y_data_all, label="all data") +plt.scatter(x_data, y_data, label="training data") + +plt.plot(x_data_all, y_predictions, label="fit") +plt.plot(x_data_all, y_predictions, "rx", label="predictions") + +plt.xlabel("mass g") +plt.ylabel("depth mm") +plt.legend() +plt.show() ~~~ {: .language-python} -The process_data function gave us a choice of plotting either the logarithmic or non-logarithmic version of the data depending on which data we pass to make_graph. If we uncomment the line `# make_graph(life_exp, gdp_log, log_data)` and comment the line `make_graph(life_exp, gdp, linear_data)` then we can switch to showing the logarithmic version. - +![Comparison of the regressions of our dataset](../fig/regress_penguin_lin_tot.png) -> ## Comparing the logarithmic and non-logarithmic graphs -> -> Convert the code above to plot the logarithmic version of the graph. -> Save the graph. -> Now change back to the non-logarithmic version. -> Compare the two graphs, which one do you think is easier to read? -{: .challenge} +Oh dear. It looks like our linear regression fits okay for our subset of the penguin data, and a few additional samples, but there appears to be a cluster of points that are poorly predicted by our model. +> ## This is a classic Machine Learning scenario known as over-fitting +> We have trained our model on a specific set of data, and our model has learnt to reproduce those specific answers at the expense of creating a more generally-applicable model. 
+> Over-fitting is the ML equivalent of learning an exam paper's mark scheme off by heart, rather than understanding and answering the questions. +{: .callout} -> ## Removing outliers from the data -> The correlation of GDP and life expectancy has a few big outliers that are probably increasing the error rate on this model. These are typically countries with very high GDP and sometimes not very high life expectancy. These tend to be either small countries with artificially high GDPs such as Monaco and Luxemborg or oil rich countries such as Qatar or Brunei. Kuwait, Qatar and Brunei have already been removed from this data set, but are available in the file worldbank-gdp-outliers.csv. Try experimenting with adding and removing some of these high income countries to see what effect it has on your model's error rate. -> Do you think its a good idea to remove these outliers from your model? -> How might you do this automatically? -{: .challenge} +Perhaps our model is too simple? Perhaps our data is more complex than we thought? Perhaps our question/goal needs adjusting? Let's explore the penguin dataset in more depth in the next section! {% include links.md %} diff --git a/_episodes/03-classification.md b/_episodes/03-classification.md new file mode 100644 index 0000000..f5ae9c3 --- /dev/null +++ b/_episodes/03-classification.md @@ -0,0 +1,313 @@ +--- +title: "Supervised methods - Classification" +teaching: 60 +exercises: 0 +questions: +- "How can I classify data into known categories?" +objectives: +- "Use two different supervised methods to classify data." +- "Learn about the concept of hyper-parameters." +- "Learn to validate and ?cross-validate? models" +keypoints: +- "Classification requires labelled data (is supervised)" +--- + +# Classification + +Classification is a supervised method to recognise and group data objects into pre-determined categories.
Where regression uses labelled observations to predict a continuous numerical value, classification predicts which discrete category (class) an observation belongs to. Classification in ML leverages a wide range of algorithms to classify a set of data/datasets into their respective categories. + +In this episode we are going to introduce the concept of supervised classification by classifying penguin data into different species of penguins using Scikit-Learn. + +## The penguins dataset +We're going to be using the penguins dataset of Allison Horst, published [here](https://github.com/allisonhorst/palmerpenguins). The dataset contains 344 size measurements for three penguin species (Chinstrap, Gentoo and Adélie) observed on three islands in the Palmer Archipelago, Antarctica. + +![*Artwork by @allison_horst*](../fig/palmer_penguins.png) + +The physical attributes measured are flipper length, bill length, bill depth, body mass, and sex. +![*Artwork by @allison_horst*](../fig/culmen_depth.png) + +In other words, the dataset contains 344 rows with 7 features, i.e. 5 physical attributes, the species, and the island where the observations were made. + +~~~ +import seaborn as sns + +dataset = sns.load_dataset('penguins') +dataset.head() +~~~ +{: .language-python} + +Our aim is to develop a classification model that will predict the species of a penguin based upon measurements of those variables. + +As a rule of thumb for ML/DL modelling, it is best to start with a simple model and progressively add complexity in order to meet our desired classification performance. + +For this lesson we will limit our dataset to only numerical values such as bill_length, bill_depth, flipper_length, and body_mass while we attempt to classify species. + +The above table contains multiple categorical objects such as species. If we attempt to include the other categorical fields, island and sex, we might hinder classification performance due to the complexity of the data.
+ +### Preprocessing our data + +Let's do some pre-processing on our dataset and specify our `X` features and `y` labels: + +~~~ +# Extract the data we need +feature_names = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g'] +dataset.dropna(subset=feature_names, inplace=True) + +class_names = dataset['species'].unique() + +X = dataset[feature_names] +y = dataset['species'] +~~~ +{: .language-python} + +Having extracted our features `X` and labels `y`, we can now split the data using the `train_test_split` function. + +## Training-testing split +When undertaking any machine learning project, it's important to be able to evaluate how well your model works. + +Rather than evaluating this manually we can instead set aside some of our data, usually around 20%, and use it as a testing dataset. We then train on the remaining 80% and use the testing dataset to evaluate the accuracy of our trained model. + +We lose a bit of training data in the process, but we can now easily evaluate the performance of our model. With more advanced test-train split techniques we can even recover this lost training data! + +> ## Why do we do this? +> It's important to do this early, and to do all of your work with the training dataset - this avoids any risk of you introducing bias to the model based on your own manual observations of data in the testing set (after all, we want the model to make the decisions about parameters!). This can also highlight when you are over-fitting on your training data. +{: .callout} + +How we split the data into training and testing sets is also extremely important. We need to make sure that our training data is representative of both our test data and actual data. + +For classification problems this means we should ensure that each class of interest is represented proportionately in both training and testing sets.
For regression problems we should ensure that our training and test sets cover the range of feature values that we wish to predict. + +In the previous regression episode we created the penguin training data by taking the first 146 samples of the dataset. Unfortunately the penguin data is sorted by species and so our training data only considered one type of penguin and thus was not representative of the actual data we tried to fit. We could have avoided this issue by randomly shuffling our penguin samples before splitting the data. + +> ## When not to shuffle your data +> Sometimes your data is dependent on its ordering, such as time-series data where past values influence future predictions. Creating train-test splits for this can be tricky at first glance, but fortunately there are existing techniques to tackle this (such as splits that respect the time ordering): See [Scikit-Learn](https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation-iterators) for more information. +{: .callout} + + We specify the fraction of data to use as test data, and the function randomly shuffles our data prior to splitting: + +~~~ +from sklearn.model_selection import train_test_split + +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) +~~~ +{: .language-python} + +We'll use `X_train` and `y_train` to develop our model, and only look at `X_test` and `y_test` when it's time to evaluate its performance. + +### Visualising the data +In order to better understand how a model might classify this data, we can first take a look at the data visually, to see what patterns we might identify. 
+ +~~~ +import matplotlib.pyplot as plt + +fig01 = sns.scatterplot(X_train, x=feature_names[0], y=feature_names[1], hue=y_train) +plt.show() +~~~ +{: .language-python} + +![Visualising the penguins dataset](../fig/e3_penguins_vis.png) + +As there are four measurements for each penguin, we need quite a few plots to visualise all four dimensions against each other. Here is a handy Seaborn function to do so: + +~~~ +sns.pairplot(dataset, hue="species") +plt.show() +~~~ +{: .language-python} + +![Visualising the penguins dataset](../fig/pairplot.png) + +We can see that penguins from each species form fairly distinct spatial clusters in these plots, so that you could draw lines between those clusters to delineate each species. This is effectively what many classification algorithms do. They use the training data to delineate the observation space, in this case the 4 measurement dimensions, into classes. When given a new observation, the model finds which of those class areas the new observation falls into. + + +## Classification using a decision tree +We'll first apply a decision tree classifier to the data. Decision trees are conceptually similar to flow diagrams (or more precisely for the biologists: dichotomous keys). They split the classification problem into a binary tree of comparisons, at each step comparing a measurement to a value, and moving left or right down the tree until a classification is reached. + +![Decision tree for classifying penguins](../fig/decision_tree_example.png) + + +Training and using a decision tree in Scikit-Learn is straightforward: +~~~ +from sklearn.tree import DecisionTreeClassifier, plot_tree + +clf = DecisionTreeClassifier(max_depth=2) +clf.fit(X_train, y_train) + +clf.predict(X_test) +~~~ +{: .language-python} + +> ## Hyper-parameters: parameters that tune a model +> 'Max Depth' is an example of a *hyper-parameter* for the decision tree model. 
Where models use the parameters of an observation to predict a result, hyper-parameters are used to tune how a model works. Each model you encounter will have its own set of hyper-parameters, each of which affects model behaviour and performance in a different way. The process of adjusting hyper-parameters in order to improve model performance is called hyper-parameter tuning. +{: .callout} + +We can conveniently check how our model did with the `.score()` method, which will make predictions and report what proportion of them were accurate: + +~~~ +clf_score = clf.score(X_test, y_test) +print(clf_score) +~~~ +{: .language-python} + +Our model reports an accuracy of ~98% on the test data! We can also look at the decision tree that was generated: + +~~~ +fig = plt.figure(figsize=(12, 10)) +plot_tree(clf, class_names=class_names, feature_names=feature_names, filled=True, ax=fig.gca()) +plt.show() +~~~ +{: .language-python} + +![Decision tree for classifying penguins](../fig/e3_dt_2.png) + +The first question (`depth=1`) splits the training data into "Adelie" and "Gentoo" categories using the criteria `flipper_length_mm <= 206.5`, and the next two questions (`depth=2`) split the "Adelie" and "Gentoo" categories into "Adelie & Chinstrap" and "Gentoo & Chinstrap" predictions. 
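The questions asked by a fitted tree can also be read as plain text, which is handy when a figure is not available. The sketch below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` rather than the penguins, and the feature names `f0` to `f3` are hypothetical placeholders. `export_text` is scikit-learn's text counterpart to `plot_tree`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic 3-class data with 4 features (an assumption, standing in for real measurements)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
demo_names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names

# A shallow tree: at most two questions are asked before a class is assigned
demo_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
demo_tree.fit(X_demo, y_demo)

# Each indented line is one threshold comparison; the leaves report the predicted class
rules = export_text(demo_tree, feature_names=demo_names)
print(rules)
print("depth:", demo_tree.get_depth(), "leaves:", demo_tree.get_n_leaves())
```

With `max_depth=2` the tree can have at most four leaves, so the printed rules stay short enough to check by hand.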
+ + + + + +### Visualising the classification space +We can visualise the classification space (decision tree boundaries) to get a more intuitive feel for what it is doing. Note that our 2D plot can only show two parameters at a time, so we will quickly visualise by training a new model on only 2 features: + +~~~ +from sklearn.inspection import DecisionBoundaryDisplay + +f1 = feature_names[0] +f2 = feature_names[3] + +clf = DecisionTreeClassifier(max_depth=2) +clf.fit(X_train[[f1, f2]], y_train) + +d = DecisionBoundaryDisplay.from_estimator(clf, X_train[[f1, f2]]) + +sns.scatterplot(X_train, x=f1, y=f2, hue=y_train, palette="husl") +plt.show() +~~~ +{: .language-python} + +![Classification space for our decision tree](../fig/e3_dt_space_2.png) + +## Tuning the `max_depth` hyperparameter + +Our decision tree using a `max_depth=2` is fairly simple and there are still some incorrect predictions in our final classifications. Let's try varying the `max_depth` hyperparameter to see if we can improve our model predictions. + + + +~~~ +import pandas as pd + +max_depths = [1, 2, 3, 4, 5] + +accuracy = [] +for d in max_depths: + clf = DecisionTreeClassifier(max_depth=d) + clf.fit(X_train, y_train) + acc = clf.score(X_test, y_test) + + accuracy.append((d, acc)) + +acc_df = pd.DataFrame(accuracy, columns=['depth', 'accuracy']) + +sns.lineplot(acc_df, x='depth', y='accuracy') +plt.xlabel('Tree depth') +plt.ylabel('Accuracy') +plt.show() +~~~ +{: .language-python} + +![Performance of decision trees of various depths](../fig/e3_dt_overfit.png) + +Here we can see that a `max_depth=2` performs slightly better on the test data than those with `max_depth > 2`. This can seem counterintuitive, as surely more questions should be able to better split up our categories and thus give better predictions? 
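One way to probe this question is to score each tree on the training data as well as the test data. The sketch below is a hedged illustration, not the lesson's own analysis: it assumes synthetic data with some label noise (`flip_y=0.1`) from `make_classification`, and all variable names are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy 3-class data (an assumption for illustration)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, flip_y=0.1, random_state=0,
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# None means "no depth limit": the tree keeps splitting until every leaf is pure
depths = [1, 2, 3, 5, None]
results = []
for depth in depths:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(Xd_train, yd_train)
    train_acc = model.score(Xd_train, yd_train)
    test_acc = model.score(Xd_test, yd_test)
    results.append((depth, train_acc, test_acc))

for depth, train_acc, test_acc in results:
    print(f"max_depth={depth}: train={train_acc:.3f} test={test_acc:.3f}")
```

A widening gap between training and test accuracy as the depth grows is the usual sign that the extra questions are memorising the training samples rather than learning something general.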
+ +Let's reuse our fitting and plotting code from above to inspect a decision tree that has `max_depth=5`: + +~~~ +clf = DecisionTreeClassifier(max_depth=5) +clf.fit(X_train, y_train) + +fig = plt.figure(figsize=(12, 10)) +plot_tree(clf, class_names=class_names, feature_names=feature_names, filled=True, ax=fig.gca()) +plt.show() +~~~ +{: .language-python} + +![Simplified decision tree](../fig/e3_dt_6.png) + +It looks like our decision tree has split up the training data into the correct penguin categories and more accurately than the `max_depth=2` model did. However, it used some very specific questions to split up the penguins into the correct categories. Let's try visualising the classification space for a more intuitive understanding: +~~~ +f1 = feature_names[0] +f2 = feature_names[3] + +clf = DecisionTreeClassifier(max_depth=5) +clf.fit(X_train[[f1, f2]], y_train) + +d = DecisionBoundaryDisplay.from_estimator(clf, X_train[[f1, f2]]) + +sns.scatterplot(X_train, x=f1, y=f2, hue=y_train, palette='husl') +plt.show() +~~~ +{: .language-python} + +![Classification space of the simplified decision tree](../fig/e3_dt_space_6.png) + +Earlier we saw that the `max_depth=2` model split the data into 3 simple bounding boxes, whereas for `max_depth=5` we see the model has created some very specific classification boundaries to correctly classify every point in the training data. + +This is a classic case of over-fitting - our model has produced extremely specific parameters that work for the training data but are not representative of our test data. Sometimes simplicity is better! + + +## Classification using support vector machines +Next, we'll look at another commonly used classification algorithm, and see how it compares. Support Vector Machines (SVM) work in a way that is conceptually similar to your own intuition when first looking at the data. 
They devise a set of hyperplanes that delineate the parameter space, such that each region contains ideally only observations from one class, and the boundaries fall between classes. + +### Normalising data +Unlike decision trees, SVMs require an additional pre-processing step for our data. We need to normalise it. Our raw data has parameters with different magnitudes such as bill length measured in tens of millimetres, whereas body mass is measured in thousands of grams. If we trained an SVM directly on this data, it would be dominated by the parameter with the greatest variance (body mass). + +Normalising maps each parameter to a new range so that it has a mean of 0 and a standard deviation of 1. + +~~~ +from sklearn import preprocessing +import pandas as pd + +scaler = preprocessing.StandardScaler() +scaler.fit(X_train) +X_train_scaled = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns, index=X_train.index) +X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index) +~~~ +{: .language-python} + +Note that we fit the scaler to our training data - we then use this same pre-trained scaler to transform our testing data. + +With this scaled data, training the models works exactly the same as before. 
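Before training, it can be reassuring to confirm what the normalisation actually did. The following is a minimal sketch under stated assumptions: the two columns are randomly generated stand-ins for a small-magnitude and a large-magnitude measurement (they are not the penguin data), and the names `demo_train`, `demo_test` and `demo_scaler` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two made-up features on very different scales (tens vs. thousands)
demo_train = np.column_stack([rng.normal(45, 5, 200), rng.normal(4200, 800, 200)])
demo_test = np.column_stack([rng.normal(45, 5, 50), rng.normal(4200, 800, 50)])

# Fit on the training data only, then reuse the same fitted transform for the test data
demo_scaler = StandardScaler().fit(demo_train)
train_scaled = demo_scaler.transform(demo_train)
test_scaled = demo_scaler.transform(demo_test)

print("train mean:", train_scaled.mean(axis=0).round(3))
print("train std :", train_scaled.std(axis=0).round(3))
print("test mean :", test_scaled.mean(axis=0).round(3))
print("test std  :", test_scaled.std(axis=0).round(3))
```

The training columns come out with mean 0 and standard deviation 1 by construction. The test columns are only approximately so, because they are transformed with statistics learned from the training data; that small mismatch is expected and is the price of keeping the test set untouched.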
+ +~~~ +from sklearn import svm + +SVM = svm.SVC(kernel='poly', degree=3, C=1.5) +SVM.fit(X_train_scaled, y_train) + +svm_score = SVM.score(X_test_scaled, y_test) +print("Decision tree score is ", clf_score) +print("SVM score is ", svm_score) +~~~ +{: .language-python} + +We can again visualise the decision space produced, also using only two parameters: + +~~~ +x2 = X_train_scaled[[feature_names[0], feature_names[1]]] + +SVM = svm.SVC(kernel='poly', degree=3, C=1.5) +SVM.fit(x2, y_train) + +DecisionBoundaryDisplay.from_estimator(SVM, x2) +sns.scatterplot(x2, x=feature_names[0], y=feature_names[1], hue=y_train) +plt.show() +~~~ +{: .language-python} + +![Classification space generated by the SVM model](../fig/e3_svc_space.png) + +While this SVM model performs slightly worse than our decision tree (95.6% vs. 98.5%), it's likely that the non-linear boundaries will perform better when exposed to more and more real data, as decision trees are prone to overfitting and require complex linear models to reproduce simple non-linear boundaries. It's important to pick a model that is appropriate for your problem and data trends! \ No newline at end of file diff --git a/_episodes/03-introducing-sklearn.md b/_episodes/03-introducing-sklearn.md deleted file mode 100644 index e1ae741..0000000 --- a/_episodes/03-introducing-sklearn.md +++ /dev/null @@ -1,277 +0,0 @@ ---- -title: "Introducing Scikit Learn" -teaching: 15 -exercises: 20 -questions: -- "How can I use scikit-learn to process data?" -objectives: -- "Recall that scikit-learn has built in linear regression functions." -- "Measure the error between a regression model and real data." -- "Apply scikit-learn's linear regression to create a model." -- "Analyse and assess the accuracy of a linear model using scikit-learn's metrics library." -- "Understand that more complex models can be built with non-linear equations." -- "Apply scikit-learn's polynomial modelling to non-linear data." 
-keypoints: -- "Scikit Learn is a Python library with lots of useful machine learning functions." -- "Scikit Learn includes a linear regression function." -- "It also includes a polynomial modelling function which is useful for modelling non-linear data." ---- - - -SciKit Learn (also known as sklearn) is an open source machine learning library for Python which has a very wide range of machine learning algorithms. It makes it very easy for a Python programmer to use machine learning techniques without having to implement them. - -## Linear Regression with scikit-learn - -Lets adapt our linear regression program to use scikit-learn instead of our own regression function. We can go and remove the least_squares and measure_error functions from our code. We'll save this under a different filename to the original linear regression code so that we can compare the answers of the two, they should be identical. - -First lets add the import for sklearn, we're also going to need the numpy library so we'll import that too: - -~~~ -import numpy as np -import sklearn.linear_model as skl_lin -~~~ -{: .language-python} - - -Now lets replace the calculation with our own least_squares function with the one from scikit-learn. The scikit-learn regression function is much more capable than the simple one we wrote earlier and is designed for datasets where multiple parameters are used, its expecting to be given multi-demnsional arrays data. To get it to accept single dimension data such as we have we need to convert the array to a numpy one and use numpy's reshape function. The resulting data is also designed to show us multiple coefficients and intercepts, so these values will be arrays, since we've just got one parameter we can just grab the first item from each of these arrays. Instead of manually calculating the results we can now use scikit-learn's predict function. Finally lets calculate the error. 
scikit-learn doesn't provide a root mean squared error function, but it does provide a mean squared error function. We can calculate the root mean squared error simply by taking the square root of the output of this function. The mean_squared_error function is part of the scikit-learn metrics module, so we'll have to add that to our imports at the top of the file: - -~~~ -import sklearn.metrics as skl_metrics -~~~ -{: .language-python} - - -Lets go ahead and change the process_data function for life expectancy to use scikit-learn's LinearRegression function instead of our own version. - -~~~ -import pandas as pd -import math -def process_life_expectancy_data(filename, country, min_date, max_date): - df = pd.read_csv(filename, index_col="Life expectancy") - - # get the life expectancy for the specified country/dates - # we have to convert the dates to strings as pandas treats them that way - life_expectancy = df.loc[country, str(min_date):str(max_date)] - x_data = list(range(min_date, max_date + 1)) - - x_data_arr = np.array(x_data).reshape(-1, 1) - life_exp_arr = np.array(life_expectancy).reshape(-1, 1) - - regression = skl_lin.LinearRegression().fit(x_data_arr, life_exp_arr) - - m = regression.coef_[0][0] - c = regression.intercept_[0] - - # old manual version - #linear_data = calculate_linear(x_data, m, c) - - # new scikit learn version - linear_data = regression.predict(x_data_arr) - - # old manual version - #error = measure_error(life_expectancy, linear_data) - - # new scikit learn version - error = math.sqrt(skl_metrics.mean_squared_error(life_exp_arr, linear_data)) - print("error=", error) - - # uncomment to make the graph - #make_graph(life_exp, gdp, linear_data) - -process_life_expectancy_data("../data/gapminder-life-expectancy.csv", - "United Kingdom", 1950, 2016) -~~~ -{: .language-python} - - -Now if we go ahead and run the new program we should get the same answers and same graph as before. 
- - -> ## Comparing the Scikit learn and our own linear regression implementations. -> Adjust both the original program (using our own linear regression implementation) and the sklearn version to calculate the life expectancy for Germany between 1950 and 2000. What are the values (m and c) of linear equation -> linking date and life expectancy? Are they the same in both? -> > ## Solution -> > ~~~ -> > process_life_expectancy_data("../data/gapminder-life-expectancy.csv", "Germany", 1950, 2000) -> > ~~~ -> > {: .language-python} -> > -> > m= 0.212219909502 c= -346.784909502 -> > They should be identical -> {: .solution} -{: .challenge} - - -> ## Predicting Life Expectancy -> Use the linear equation you've just created to predict life expectancy in Germany for every year between 2001 and 2016. How accurate are your answers? -> If you worked for a pension scheme would you trust your answers to predict the future costs for paying pensioners? -> > ## Solution -> > ~~~ -> > for x in range(2001,2017): -> > print(x,0.212219909502 * x - 346.784909502) -> > ~~~ -> > {: .language-python} -> > -> > Predicted answers: -> > ~~~ -> > 2001 77.86712941150199 -> > 2002 78.07934932100403 -> > 2003 78.29156923050601 -> > 2004 78.503789140008 -> > 2005 78.71600904951003 -> > 2006 78.92822895901202 -> > 2007 79.140448868514 -> > 2008 79.35266877801604 -> > 2009 79.56488868751802 -> > 2010 79.77710859702 -> > 2011 79.98932850652199 -> > 2012 80.20154841602402 -> > 2013 80.41376832552601 -> > 2014 80.62598823502799 -> > 2015 80.83820814453003 -> > 2016 81.05042805403201 -> > ~~~ -> > Compare with the real values: -> > ~~~ -> > df = pd.read_csv('../data/gapminder-life-expectancy.csv',index_col="Life expectancy") -> > for x in range(2001,2017): -> > y = 0.215621719457 * x - 351.935837103 -> > real = df.loc['Germany', str(x)] -> > print(x, "Predicted", y, "Real", real, "Difference", y-real) -> > ~~~ -> > {: .language-python} -> > -> > ~~~ -> > 2001 Predicted 77.86712941150199 Real 78.4 
Difference -0.532870588498 -> > 2002 Predicted 78.07934932100403 Real 78.6 Difference -0.520650678996 -> > 2003 Predicted 78.29156923050601 Real 78.8 Difference -0.508430769494 -> > 2004 Predicted 78.503789140008 Real 79.2 Difference -0.696210859992 -> > 2005 Predicted 78.71600904951003 Real 79.4 Difference -0.68399095049 -> > 2006 Predicted 78.92822895901202 Real 79.7 Difference -0.771771040988 -> > 2007 Predicted 79.140448868514 Real 79.9 Difference -0.759551131486 -> > 2008 Predicted 79.35266877801604 Real 80.0 Difference -0.647331221984 -> > 2009 Predicted 79.56488868751802 Real 80.1 Difference -0.535111312482 -> > 2010 Predicted 79.77710859702 Real 80.3 Difference -0.52289140298 -> > 2011 Predicted 79.98932850652199 Real 80.5 Difference -0.510671493478 -> > 2012 Predicted 80.20154841602402 Real 80.6 Difference -0.398451583976 -> > 2013 Predicted 80.41376832552601 Real 80.7 Difference -0.286231674474 -> > 2014 Predicted 80.62598823502799 Real 80.7 Difference -0.074011764972 -> > 2015 Predicted 80.83820814453003 Real 80.8 Difference 0.03820814453 -> > 2016 Predicted 81.05042805403201 Real 80.9 Difference 0.150428054032 -> > ~~~ -> {: .solution} -{: .challenge} - - -## Other types of regression - -Linear regression obviously has its limits for working with data that isn't linear. Scikit-learn has a number of other regression techniques -which can be used on non-linear data. Some of these (such as isotonic regression) will only interpolate data in the range of the training -data and can't extrapolate beyond it. One non-linear technique that works with many types of data is polynomial regression. This creates a polynomial -equation of the form y = a + bx + cx^2 + dx^3 etc. The more terms we add to the polynomial the more accurately we can model a system. - -Scikit-learn includes a polynomial modelling tool as part of its pre-processing library which we'll need to add to our list of imports. 
- -~~~ -import sklearn.preprocessing as skl_pre -~~~ -{: .language-python} - - -Now lets modify the `process_life_expectancy_data` function to calculate the polynomial. This takes two parts, the first is to pre-process the data into polynomial form. We first call the PolynomialFeatures function with the parameter degree. The degree parameter controls how many components the polynomial will have, a polynomial of the form y = a + bx + cx^2 + dx^3 has 4 degrees. Typically a value between 5 and 10 is sufficient. We must then process the numpy array that we used for the X axis in the linear regression to convert it into a set of polynomial features. - -This only gets us halfway to being able to create a model that we can use for predictions. To form the complete model we actually have to perform a linear regression on the polynomial model, but we'll use the polynomial features as the X axis instead of the numpy array. The Y axis will still be the life expectancy numpy array that we used before. The resulting model can now be used to make some predictions like we did before using the predict function. - -If we want to draw the line of best fit we can pass the polynomial features in as a parameter to predict() and this will generate the y values for the full range of our data. This can be plotted by passing it to make_graph in place of the linear data. - - -Finally we can make some predictions of future data. Lets create a list containing the date range we'd like to predict, as with other lists/arrays we've used we'll have to reshape it to make scikit-learn work with it. -Now lets use this list of dates to predict life expectancy using both our linear and polynomial models. 
- -~~~ -def process_life_expectancy_data_poly(filename, country, min_date, max_date): - df = pd.read_csv(filename, index_col="Life expectancy") - - # get the life expectancy for the specified country/dates - # we have to convert the dates to strings as pandas treats them that way - life_expectancy = df.loc[country, str(min_date):str(max_date)] - x_data = list(range(min_date, max_date + 1)) - - x_data_arr = np.array(x_data).reshape(-1, 1) - life_exp_arr = np.array(life_expectancy).reshape(-1, 1) - - polynomial_features = skl_pre.PolynomialFeatures(degree=5) - x_poly = polynomial_features.fit_transform(x_data_arr) - - polynomial_model = skl_lin.LinearRegression().fit(x_poly, life_exp_arr) - - polynomial_data = polynomial_model.predict(x_poly) - - #make_graph(x_data, life_expectancy, polynomial_data) - - # make some predictions - predictions_x = list(range(2011,2025)) - predictions_x_arr = np.array(predictions_x).reshape(-1, 1) - - predictions_polynomial = polynomial_model.predict(polynomial_features.fit_transform(predictions_x_arr)) - plt.plot(x_data, life_expectancy, label="Original Data") - plt.plot(predictions_x, predictions_polynomial, label="Polynomial Prediction") - plt.grid() - plt.legend() - plt.show() -~~~ -{: .language-python} - - -To measure the error lets calculate the RMS error on both the linear and polynomial data. 
- -~~~ -def process_life_expectancy_data_poly(filename, country, min_date, max_date): - df = pd.read_csv(filename, index_col="Life expectancy") - - # get the life expectancy for the specified country/dates - # we have to convert the dates to strings as pandas treats them that way - life_expectancy = df.loc[country, str(min_date):str(max_date)] - x_data = list(range(min_date, max_date + 1)) - - x_data_arr = np.array(x_data).reshape(-1, 1) - life_exp_arr = np.array(life_expectancy).reshape(-1, 1) - - polynomial_features = skl_pre.PolynomialFeatures(degree=5) - x_poly = polynomial_features.fit_transform(x_data_arr) - - polynomial_model = skl_lin.LinearRegression().fit(x_poly, life_exp_arr) - - polynomial_data = polynomial_model.predict(x_poly) - - polynomial_error = math.sqrt( - skl_metrics.mean_squared_error(life_exp_arr, polynomial_data)) - print("polynomial error is", polynomial_error) - -process_life_expectancy_data_poly("../data/gapminder-life-expectancy.csv", - "United Kingdom", 1950, 2016) - -process_life_expectancy_data("../data/gapminder-life-expectancy.csv", - "United Kingdom", 1950, 2016) -~~~ -{: .language-python} - - -> ## Exercise: Comparing linear and polynomial models -> Train a linear and polynomial model on life expectancy data from China between 1960 and 2000. Then predict life expectancy from 2001 to 2016 using both methods. Compare their root mean squared errors, which is more accurate? Why do you think this model is the more accurate one? -> > ## Solution -> > modify the call to the process_life_expectancy_data -> > ~~~ -> > process_life_expectancy_data_poly("../data/gapminder-life-expectancy.csv", "China", 1960, 2000) -> > ~~~ -> > {: .language-python} -> > -> > linear prediction error is 5.385162846665607 -> > polynomial prediction error is 28.169167771983528 -> > The linear model is more accurate, polynomial models often become wildly inaccurate beyond the range they were trained on. 
Look at the predicted life expectancies, the polynomial model predicts a life expectancy of 131 by 2016! -> > ![China 1960-2000](../fig/polynomial_china_training.png) -> > ![China 2001-2016 predictions](../fig/polynomial_china_overprediction.png) -> {: .solution} -{: .challenge} - -{% include links.md %} diff --git a/_episodes/04-ensemble-methods.md b/_episodes/04-ensemble-methods.md new file mode 100644 index 0000000..10631aa --- /dev/null +++ b/_episodes/04-ensemble-methods.md @@ -0,0 +1,356 @@ +--- +title: "Ensemble methods" +teaching: 90 +exercises: 30 +questions: +- "What are ensemble methods?" +- "What are random forests?" +- "How can we stack estimators in scikit-learn?" +objectives: +- "Learn about applying ensemble methods in scikit-learn." +- "Understand why ensemble methods are useful." +keypoints: +- "Ensemble methods can be used to reduce under/over fitting training data." +--- + +# Ensemble methods + +What's better than one decision tree? Perhaps two? Or three? How about enough trees to make up a forest? +Ensemble methods bundle individual models together and use each of their outputs to contribute towards a final consensus for a given problem. Ensemble methods are based on the mantra that the whole is greater than the sum of the parts. + +Thinking back to the classification episode with decision trees we quickly stumbled into the problem of overfitting our training data. If we combine predictions from a series of over/under fitting estimators then we can often produce a better final prediction than using a single reliable model - in the same way that humans often hear multiple opinions on a scenario before deciding a final outcome. Decision trees and regressions are often very sensitive to training outliers and so are well suited to be a part of an ensemble. + +Ensemble methods are used for a variety of applications including, but not limited to, search systems and object detection. 
We can use any model/estimator available in scikit-learn to create an ensemble. There are three main approaches to creating ensembles: + +* Stacking +* Bagging +* Boosting + +Let's explore them in a bit more depth. + +### Stacking + +This is where we train a series of different models/estimators on the same input data in parallel. We then take the output of each model and pass them into a final decision algorithm/model that makes the final prediction. + +If we trained the same model multiple times on the same data we would expect very similar answers, and so the emphasis with stacking is to choose different models that can be used to build up a reliable consensus. Regression is then typically a good choice for the final decision-making model. + +![Stacking](../fig/stacking.jpeg) + +[Image from Vasily Zubarev via their blog](https://vas3k.com/blog/machine_learning/) + +### Bagging (a.k.a [Bootstrap AGGregatING](https://en.wikipedia.org/wiki/Bootstrap_aggregating) ) + +This is where we use the same model/estimator and fit it on different subsets of the training data. We can then average the results from each model to produce a final prediction. The subsets are random and may even repeat themselves. + +The most common example is known as the Random Forest algorithm, which we'll take a look at later on. Random Forests are typically used as a faster, computationally cheaper alternative to Neural Networks, which is ideal for real-time applications like camera face detection prompts. + +![Bagging](../fig/bagging.jpeg) + +[Image from Vasily Zubarev via their blog](https://vas3k.com/blog/machine_learning/) + +### Boosting + +This is where we train a single type of model/estimator on an initial dataset, test its accuracy, and then subsequently train the same type of models on poorly predicted samples i.e. each new model pays most attention to data that were incorrectly predicted by the last one. 
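The idea of each new model concentrating on earlier mistakes can be sketched with AdaBoost, one specific boosting algorithm available in scikit-learn. This is an illustrative assumption rather than the method this lesson goes on to use: the data are synthetic (`make_classification`), the variable names are made up, and `AdaBoostClassifier` is left on its default weak learner, which is a depth-1 decision tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption for illustration)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# A single weak learner: a decision tree that asks only one question
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(Xd_train, yd_train)
stump_score = stump.score(Xd_test, yd_test)

# Up to 50 such weak learners trained in series; after each round the
# misclassified training samples are given more weight for the next learner
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
boosted.fit(Xd_train, yd_train)
boosted_score = boosted.score(Xd_test, yd_test)

print("single stump accuracy :", round(stump_score, 3))
print("boosted stumps accuracy:", round(boosted_score, 3))
print("learners actually trained:", len(boosted.estimators_))
```

Because each round depends on the errors of the previous one, the learners have to be trained one after another, which is why boosting does not parallelise as readily as bagging or stacking.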
+ +Just like for bagging, boosting is trained mostly on subsets, however in this case these subsets are not randomly generated but are instead built using poorly estimated predictions. Boosting can produce some very high accuracies by learning from its mistakes, but due to the iterative nature of these improvements it doesn't parallelize well unlike the other ensemble methods. Despite this it can still be a faster and computationally cheaper alternative to Neural Networks. + +![Boosting](../fig/boosting.jpeg) + +[Image from Vasily Zubarev via their blog](https://vas3k.com/blog/machine_learning/) + +### Ensemble summary + +Machine learning jargon can often be hard to remember, so here is a quick summary of the 3 ensemble methods: + +* Stacking - same dataset, different models, trained in parallel +* Bagging - different subsets, same models, trained in parallel +* Boosting - subsets of bad estimates, same models, trained in series + +## Using Bagging (Random Forests) for a classification problem + +In this session we'll take another look at the penguins data and apply one of the most common bagging approaches, random forests, to try and solve our species classification problem. First we'll load in the dataset and define a train and test split. 
+ +~~~ +# import libraries +import numpy as np +import pandas as pd +import seaborn as sns +from sklearn.model_selection import train_test_split + +# load penguins data +penguins = sns.load_dataset('penguins') + +# prepare and define our data and targets +feature_names = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g'] +penguins.dropna(subset=feature_names, inplace=True) + +species_names = penguins['species'].unique() + +X = penguins[feature_names] +y = penguins.species + +# Split data in training and test set +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5) + +print("train size:", X_train.shape) +print("test size", X_test.shape) +~~~ +{: .language-python} + +We'll now take a look at how we can use ensemble methods to perform a classification task such as identifying penguin species! We're going to use a random forest classifier available in scikit-learn, which is a widely used example of a bagging approach. + +Random forests are built on decision trees and can provide another way to address over-fitting. Rather than classifying based on one single decision tree (which could overfit the data), an average of results of many trees can be derived for more robust/accurate estimates compared against single trees used in the ensemble. + +![Random Forests](../fig/randomforest.png) + +[Image from Venkatak Jagannath](https://commons.wikimedia.org/wiki/File:Random_forest_diagram_complete.png) + +We can now define a random forest estimator and train it using the penguin training data. We have a similar set of attributes to the DecisionTreeClassifier but with an extra parameter called n_estimators which is the number of trees in the forest. 
+ +~~~ +from sklearn.ensemble import RandomForestClassifier +from sklearn.tree import plot_tree + +# Define our model +# extra parameter called n_estimators which is number of trees in the forest +# a leaf is a class label at the end of the decision tree +forest = RandomForestClassifier(n_estimators=100, max_depth=7, min_samples_leaf=1) + +# train our model +forest.fit(X_train, y_train) + +# Score our model +print(forest.score(X_test, y_test)) +~~~ +{: .language-python} + +You might notice that we have a different value (hopefully increased) compared with the decision tree classifier used above on the same training data. Let's plot the first 5 trees in the forest to get an idea of how this model differs from a single decision tree. + +~~~ +import matplotlib.pyplot as plt + +fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(12,6)) + +# plot first 5 trees in forest +for index in range(0, 5): + plot_tree(forest.estimators_[index], + class_names=species_names, + feature_names=feature_names, + filled=True, + ax=axes[index]) + + axes[index].set_title(f'Tree: {index}') + +plt.show() +~~~ +{: .language-python} + +![random forest trees](../fig/rf_5_trees.png) + +We can see the first 5 (of 100) trees that were fitted as part of the forest. + +If we train the random forest estimator using the same two parameters used to plot the classification space for the decision tree classifier what do we think the plot will look like? 
+ +~~~ +# let's train a random forest for only two features (body mass and bill length) +from sklearn.inspection import DecisionBoundaryDisplay +f1 = feature_names[0] +f2 = feature_names[3] + +# plot classification space for body mass and bill length with random forest +forest_2d = RandomForestClassifier(n_estimators=100, max_depth=7, min_samples_leaf=1, random_state=5) +forest_2d.fit(X_train[[f1, f2]], y_train) + +# Let's plot the decision boundaries made by the model for the two trained features +d = DecisionBoundaryDisplay.from_estimator(forest_2d, X_train[[f1, f2]]) + +sns.scatterplot(data=X_train, x=f1, y=f2, hue=y_train, palette="husl") +plt.show() +~~~ +{: .language-python} + +![random forest clf space](../fig/EM_rf_clf_space.png) + +There is still some overfitting, indicated by the regions that contain only single points, but using the same hyper-parameter settings used to fit the decision tree classifier we can see that overfitting is reduced. + +## Stacking a regression problem + +We've had a look at a bagging approach, but we'll now take a look at a stacking approach and apply it to a regression problem. We'll also introduce a new dataset to play around with. + +### California house price prediction +The California housing dataset for regression problems contains 8 training features, such as Median Income, House Age, Average Rooms and Average Bedrooms, for 20,640 districts (census block groups). The target variable is the median house value for each of those 20,640 districts. Note that all prices are in units of $100,000. This toy dataset is available as part of the [Scikit-Learn library](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html). We'll start by loading the dataset to very briefly inspect the attributes by printing them out.
+ +~~~ +import sklearn +from sklearn.datasets import fetch_california_housing + +# load the dataset +X, y = fetch_california_housing(return_X_y=True, as_frame=True) + +## All price variables are in units of $100,000 +print(X.shape) +print(X.head()) + +print("Housing price as the target: ") + +## Target is in units of $100,000 +print(y.head()) +print(y.shape) +~~~ +{: .language-python} + +For the purposes of learning how to create and use ensemble methods, and since it is a toy dataset, we will blindly use this dataset without inspecting, cleaning or pre-processing it further. + +> ## Exercise: Investigate and visualise the dataset +> For this episode we simply want to learn how to build and use an Ensemble rather than actually solve a regression problem. To build up your skills as an ML practitioner, investigate and visualise this dataset. What can you say about the dataset itself, and what can you summarise about any potential relationships or prediction problems? +{: .challenge} + +Let's start by splitting the dataset into training and testing subsets: + +~~~ +# split into train and test sets. We are selecting an 80%-20% train-test split. +from sklearn.model_selection import train_test_split + +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5) + +print(f'train size: {X_train.shape}') +print(f'test size: {X_test.shape}') +~~~ +{: .language-python} + +Let's stack a series of regression models. In the same way the RandomForest classifier derives a result from a series of trees, we will combine the results from a series of different models in our stack. This is done using an ensemble meta-estimator called a VotingRegressor. + +We'll apply a voting regressor to a random forest, gradient boosting and linear regressor.
+ +> ## But wait, aren't random forests/decision trees for classification problems? +> Yes they are, but quite often in machine learning various models can be used to solve both regression and classification problems. +> +> Decision trees in particular can be used to "predict" specific numerical values instead of categories, essentially by binning a group of values into a single value. +> +> This works well for periodic/repeating numerical data. These trees are extremely sensitive to the data they are trained on, which makes them a very good model to use as a Random Forest. +{: .callout} + +> ## But wait again, isn't a random forest (and a gradient boosting model) an ensemble method instead of a regression model? +> Yes they are, but they can be thought of as one big complex model used like any other model. The awesome thing about ensemble methods, and the generalisation of Scikit-Learn models, is that you can put an ensemble in an ensemble! +{: .callout} + +A VotingRegressor trains several base estimators on the whole dataset, then takes the average of their individual predictions to form a final prediction.
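Before building the real stack, the averaging behaviour can be checked on a small example. The sketch below is only an illustration under stated assumptions: it uses a synthetic dataset from `make_regression` rather than the housing data, and the names `X_demo`, `y_demo` and `demo_voting` are made up for this aside. It fits a two-model voting regressor and compares its prediction with the plain mean of the predictions from its fitted base estimators.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# small synthetic regression problem (an assumption, used only for illustration)
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=5)

# a two-model voting regressor with default (equal) weights
demo_voting = VotingRegressor([
    ("rf", RandomForestRegressor(n_estimators=10, random_state=5)),
    ("lr", LinearRegression()),
])
demo_voting.fit(X_demo, y_demo)

# estimators_ holds the fitted copies of the base estimators;
# stack their predictions column by column and average across models
individual_preds = np.column_stack(
    [est.predict(X_demo[:5]) for est in demo_voting.estimators_]
)
manual_average = individual_preds.mean(axis=1)

# prediction from the voting regressor itself
voting_preds = demo_voting.predict(X_demo[:5])

# check whether the two agree
print(np.allclose(manual_average, voting_preds))
```

With the default equal weights, the voting regressor's output should match the manual average of its members, which is why it can be treated as "just another model" once fitted.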
+ +~~~ +from sklearn.ensemble import ( + GradientBoostingRegressor, + RandomForestRegressor, + VotingRegressor, +) +from sklearn.linear_model import LinearRegression + +# Initialize estimators +rf_reg = RandomForestRegressor(random_state=5) +gb_reg = GradientBoostingRegressor(random_state=5) +linear_reg = LinearRegression() +voting_reg = VotingRegressor([("rf", rf_reg), ("gb", gb_reg), ("lr", linear_reg)]) + +# fit/train voting estimator +voting_reg.fit(X_train, y_train) + +# let's also fit/train the individual models for comparison +rf_reg.fit(X_train, y_train) +gb_reg.fit(X_train, y_train) +linear_reg.fit(X_train, y_train) +~~~ +{: .language-python} + +We fit the voting regressor in the same way we would fit a single model. When the voting regressor is instantiated we pass it a list of (name, estimator) tuples containing the estimators we wish to stack: in this case the random forest, gradient boosting and linear regressors. To get a sense of what this is doing, let's predict the first 20 samples in the test portion of the data and plot the results.
+ +~~~ +import matplotlib.pyplot as plt + +# make predictions +X_test_20 = X_test[:20] # first 20 for visualisation + +rf_pred = rf_reg.predict(X_test_20) +gb_pred = gb_reg.predict(X_test_20) +linear_pred = linear_reg.predict(X_test_20) +voting_pred = voting_reg.predict(X_test_20) + +plt.figure() +plt.plot(gb_pred, "o", color="black", label="GradientBoostingRegressor") +plt.plot(rf_pred, "o", color="blue", label="RandomForestRegressor") +plt.plot(linear_pred, "o", color="green", label="LinearRegression") +plt.plot(voting_pred, "x", color="red", ms=10, label="VotingRegressor") + +plt.tick_params(axis="x", which="both", bottom=False, top=False, labelbottom=False) +plt.ylabel("predicted") +plt.xlabel("test samples") +plt.legend(loc="best") +plt.title("Regressor predictions and their average") + +plt.show() +~~~ +{: .language-python} + +![Regressor predictions and average from stack](../fig/house_price_voting_regressor.svg) + +Finally, let's see how the average compares against each single estimator in the stack. + +~~~ +print(f'random forest: {rf_reg.score(X_test, y_test)}') + +print(f'gradient boost: {gb_reg.score(X_test, y_test)}') + +print(f'linear regression: {linear_reg.score(X_test, y_test)}') + +print(f'voting regressor: {voting_reg.score(X_test, y_test)}') +~~~ +{: .language-python} + +Each of our models scores between 0.61 and 0.82 (for regressors, `score` returns the coefficient of determination, R²), which at the high end is good, but at the low end is a fairly poor score. Do note that the toy datasets are not representative of real world data. However, what we can see is that the voting regressor fits different sub-models and then averages the individual predictions to form a final prediction. The benefit of this approach is that it can reduce overfitting and improve generalizability.
Of course, we could try to improve our score by tweaking our individual model hyperparameters, using more advanced boosted models or adjusting our training data features and train-test split. + +> ## Exercise: Stacking a classification problem. +> Scikit-Learn also has a method for stacking ensemble classifiers, ```sklearn.ensemble.VotingClassifier```. Do you think you could apply a stack to the penguins dataset using a random forest, SVM and decision tree classifier, or a selection of any other classifier estimators available in Scikit-Learn? +> +> ~~~ +> import seaborn as sns +> +> penguins = sns.load_dataset('penguins') +> +> feature_names = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g'] +> penguins.dropna(subset=feature_names, inplace=True) +> +> species_names = penguins['species'].unique() +> +> # Define data and targets +> X = penguins[feature_names] +> +> y = penguins.species +> +> # Split data into training and test sets +> from sklearn.model_selection import train_test_split +> +> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5) +> +> print(f'train size: {X_train.shape}') +> print(f'test size: {X_test.shape}') +> ~~~ +> {: .language-python} +> +> The code above loads the penguins data and splits it into test and training portions. Have a play around with stacking some classifiers with ```sklearn.ensemble.VotingClassifier```, using the code comments below as a guide.
+> +> ~~~ +> # import classifiers +> +> # instantiate classifiers +> +> # fit classifiers +> +> # instantiate voting classifier and fit data +> +> # make predictions +> +> # compare scores +> ~~~ +> {: .language-python} +> +{: .challenge} diff --git a/_episodes/04-clustering.md b/_episodes/05-clustering.md similarity index 50% rename from _episodes/04-clustering.md rename to _episodes/05-clustering.md index f473464..4d548d7 100644 --- a/_episodes/04-clustering.md +++ b/_episodes/05-clustering.md @@ -1,65 +1,111 @@ --- -title: "Clustering with Scikit Learn" -teaching: 15 -exercises: 20 +title: "Unsupervised methods - Clustering" +teaching: 30 +exercises: 30 questions: +- "What is unsupervised learning?" - "How can we use clustering to find data points with similar attributes?" objectives: +- "Understand the difference between supervised and unsupervised learning." - "Identify clusters in data using k-means clustering." -- "See the limitations of k-means when clusters overlap." +- "Understand the limitations of k-means when clusters overlap." - "Use spectral clustering to overcome the limitations of k-means." keypoints: -- "Clustering is a form of unsupervised learning" -- "Unsupervised learning algorithms don't need training" +- "Clustering is a form of unsupervised learning." +- "Unsupervised learning algorithms don't need training." - "Kmeans is a popular clustering algorithm." -- "Kmeans struggles where one cluster exists within another, such as concentric circles." -- "Spectral clustering is another technique which can overcome some of the limitations of Kmeans." +- "Kmeans is less useful when one cluster exists within another, such as concentric circles." +- "Spectral clustering can overcome some of the limitations of Kmeans." - "Spectral clustering is much slower than Kmeans." -- "As well as providing machine learning algorithms scikit learn also has functions to make example data" +- "Scikit-Learn has functions to create example data."
--- +# Unsupervised learning + +In episode 2 we learnt about supervised learning. Now it is time to explore unsupervised learning. + +Sometimes we do not have the luxury of using labelled data. This could be for a number of reasons: + +* We have labelled data, but not enough to accurately train our model +* Our existing labelled data is low-quality or inaccurate +* It is too time-consuming to (manually) label more data +* We have data, but no idea what correlations might exist that we could model! + +In this case we need to use unsupervised learning. As the name suggests, this time we do not "supervise" the ML algorithm by providing it with labels, but instead we let it try to find its own patterns in the data and report back on any correlations that it might find. You can think of unsupervised learning as a way to discover labels from the data itself. + # Clustering Clustering is the grouping of data points which are similar to each other. It can be a powerful technique for identifying patterns in data. -Clustering analysis does not usually require any training and is known as an unsupervised learning technique. The lack of a need for training -means it can be applied quickly. +Clustering analysis does not usually require any training and is therefore known as an unsupervised learning technique. Clustering can be applied quickly due to this lack of training. -## Applications of Clustering +## Applications of clustering * Looking for trends in data -* Data compression, all data clustering around a point can be reduced to just that point. For example, reducing colour depth of an image. +* Reducing the data around a point to just that point (e.g. reducing colour depth in an image) * Pattern recognition -## K-means Clustering +## K-means clustering -The K-means clustering algorithm is a simple clustering algorithm that tries to identify the centre of each cluster.
+The k-means clustering algorithm is a simple clustering algorithm that tries to identify the centre of each cluster. It does this by searching for a point which minimises the distance between the centre and all the points in the cluster. -The algorithm needs to be told how many clusters to look for, but a common technique is to try different numbers of clusters and combine -it with other tests to decide on the best combination. +The algorithm needs to be told how many clusters (k) to look for, but a common technique is to try different numbers of clusters and combine +it with other tests to decide on the best combination. -### K-means with Scikit Learn +> ## Hyper-parameters again +> 'K' is also an example of a *hyper-parameter* for the k-means clustering technique. Another example of a hyper-parameter is the N-degrees of freedom for polynomial regression. Keep an eye out for others throughout the lesson! +{: .callout} + +### K-means with Scikit-Learn -To perform a k-means clustering with Scikit learn we first need to import the sklearn.cluster module. +To perform a k-means clustering with Scikit-Learn we first need to import the sklearn.cluster module. ~~~ import sklearn.cluster as skl_cluster ~~~ {: .language-python} -For this example, we're going to use scikit learn's built in random data blob generator instead of using an external dataset. For this we'll also need the `sklearn.datasets.samples_generator` module. +For this example, we're going to use Scikit-Learn's built-in 'random data blob generator' instead of using an external dataset. Therefore we'll need the `sklearn.datasets` module. ~~~ import sklearn.datasets as skl_datasets ~~~ {: .language-python} -Now let's create some random blobs using the make_blobs function. The `n_samples` argument sets how many points we want to use in all of our blobs. `cluster_std` sets the standard deviation of the points, the smaller this value the closer together they will be.
`centers` sets how many clusters we'd like. `random_state` is the initial state of the random number generator, by specifying this we'll get the same results every time we run the program. If we don't specify a random state then we'll get different points every time we run. This function returns two things, an array of data points and a list of which cluster each point belongs to. +Now lets create some random blobs using the `make_blobs` function. The `n_samples` argument sets how many points we want to use in all of our blobs while `cluster_std` sets the standard deviation of the points. The smaller this value the closer together they will be. `centers` sets how many clusters we'd like. `random_state` is the initial state of the random number generator. By specifying this value we'll get the same results every time we run the program. If we don't specify a random state then we'll get different points every time we run. This function returns two things: an array of data points and a list of which cluster each point belongs to. + +~~~ +import matplotlib.pyplot as plt + +#Lets define some functions here to avoid repetitive code +def plots_labels(data, labels): + tx = data[:, 0] + ty = data[:, 1] + + fig = plt.figure(1, figsize=(4, 4)) + plt.scatter(tx, ty, edgecolor='k', c=labels) + plt.show() + +def plot_clusters(data, clusters, Kmean): + tx = data[:, 0] + ty = data[:, 1] + fig = plt.figure(1, figsize=(4, 4)) + plt.scatter(tx, ty, s=5, linewidth=0, c=clusters) + for cluster_x, cluster_y in Kmean.cluster_centers_: + plt.scatter(cluster_x, cluster_y, s=100, c='r', marker='x') + plt.show() +~~~ +{: .language-python} + +Lets create the clusters. ~~~ data, cluster_id = skl_datasets.make_blobs(n_samples=400, cluster_std=0.75, centers=4, random_state=1) +plots_labels(data, cluster_id) ~~~ {: .language-python} -Now that we have some data we can go ahead and try to identify the clusters using K-means. 
First, we need to initialise the KMeans module and tell it how many clusters to look for. Next, we supply it some data via the fit function, in much the same we did with the regression functions earlier on. Finally, we run the predict function to find the clusters. +![Plot of the random clusters](../fig/random_clusters.png) + +Now that we have some data we can try to identify the clusters using k-means. First, we need to initialise the KMeans module and tell it how many clusters to look for. Next, we supply it with some data via the `fit` function, in much the same way we did with the regression functions earlier on. Finally, we run the predict function to find the clusters. ~~~ Kmean = skl_cluster.KMeans(n_clusters=4) @@ -68,18 +114,16 @@ clusters = Kmean.predict(data) ~~~ {: .language-python} -The data can now be plotted to show all the points we randomly generated. To make it clearer which cluster points have been classified to we can set the colours (the c parameter) to use the `clusters` list that was returned -by the predict function. The Kmeans algorithm also lets us know where it identified the centre of each cluster as. These are stored as a list called `cluster_centers_` inside the `Kmean` object. Let's go ahead and plot the points from the clusters, colouring them by the output from the K-means algorithm, and also plot the centres of each cluster as a red X. +The data can now be plotted to show all the points we randomly generated. To make it clearer which cluster points have been classified we can set the colours (the c parameter) to use the `clusters` list that was returned by the `predict` function. The Kmeans algorithm also lets us know where it identified the centre of each cluster. These are stored as a list called 'cluster_centers_' inside the `Kmean` object. Let's plot the points from the clusters, colouring them by the output from the K-means algorithm, and also plot the centres of each cluster as a red X. 
~~~ -import matplotlib.pyplot as plt -plt.scatter(data[:, 0], data[:, 1], s=5, linewidth=0, c=clusters) -for cluster_x, cluster_y in Kmean.cluster_centers_: - plt.scatter(cluster_x, cluster_y, s=100, c='r', marker='x') -plt.show() +plot_clusters(data, clusters, Kmean) ~~~ {: .language-python} +![Plot of the fitted random clusters](../fig/random_clusters_centre.png) + +Here is the code all in a single block. ~~~ import sklearn.cluster as skl_cluster @@ -92,50 +136,48 @@ Kmean = skl_cluster.KMeans(n_clusters=4) Kmean.fit(data) clusters = Kmean.predict(data) -plt.scatter(data[:, 0], data[:, 1], s=5, linewidth=0, c=clusters) -for cluster_x, cluster_y in Kmean.cluster_centers_: - plt.scatter(cluster_x, cluster_y, s=100, c='r', marker='x') -plt.show() +plot_clusters(data, clusters, Kmean) ~~~ {: .language-python} > ## Working in multiple dimensions -> Although this example shows two dimensions the kmeans algorithm can work in more than two, it just becomes very difficult to show this visually +> Although this example shows two dimensions, the kmeans algorithm can work in more than two. It becomes very difficult to show this visually > once we get beyond 3 dimensions. Its very common in machine learning to be working with multiple variables and so our classifiers are working in > multi-dimensional spaces. {: .callout} -### Limitations of K-Means +### Limitations of k-means * Requires number of clusters to be known in advance * Struggles when clusters have irregular shapes -* Will always produce an answer finding the required number of clusters even if the data isn't clustered (or clustered in that many clusters). 
+* Will always produce an answer finding the required number of clusters even if the data isn't clustered (or clustered in that many clusters) * Requires linear cluster boundaries ![An example of kmeans failing on non-linear cluster boundaries](../fig/kmeans_concentric_circle.png) -### Advantages of K-Means +### Advantages of k-means -* Simple algorithm, fast to compute. A good choice as the first thing to try when attempting to cluster data. -* Suitable for large datasets due to its low memory and computing requirements. +* Simple algorithm and fast to compute +* A good choice as the first thing to try when attempting to cluster data +* Suitable for large datasets due to its low memory and computing requirements > ## Exercise: K-Means with overlapping clusters > Adjust the program above to increase the standard deviation of the blobs (the cluster_std parameter to make_blobs) and increase the number of samples (n_samples) to 4000. > You should start to see the clusters overlapping. > Do the clusters that are identified make sense? -> Is there any strange behaviour from this? +> Is there any strange behaviour? > > > ## Solution -> > The resulting image from increasing n_samples to 4000 and cluster_std to 3.0 looks like this: +> > Increasing n_samples to 4000 and cluster_std to 3.0 looks like this: > > ![Kmeans attempting to classify overlapping clusters](../fig/kmeans_overlapping_clusters.png) > > The straight line boundaries between clusters look a bit strange. > {: .solution} {: .challenge} > ## Exercise: How many clusters should we look for? -> As K-Means requires us to specify the number of clusters to expect a common strategy to get around this is to vary the number of clusters we are looking for. +> Using k-means requires us to specify the number of clusters to expect. A common strategy to get around this is to vary the number of clusters we are looking for. > Modify the program to loop through searching for between 2 and 10 clusters. 
Which (if any) of the results look more sensible? What criteria might you use to select the best one? > > ## Solution > > ~~~ @@ -152,41 +194,42 @@ plt.show() > > ~~~ > > {: .language-python} > > -> > None of these look very sensible clusterings because all the points really form one large cluster. -> > We might look at a measure of similarity of the cluster to test if its really multiple clusters. A simple standard deviation or interquartile range might be a good starting point. +> > None of these look like very sensible clusterings because all of the points form one large cluster. +> > We might look at a measure of similarity to test if this single cluster is actually multiple clusters. A simple standard deviation or interquartile range might be a good starting point. > {: .solution} {: .challenge} -## Spectral Clustering +## Spectral clustering Spectral clustering is a technique that attempts to overcome the linear boundary problem of k-means clustering. -It works by treating clustering as a graph partitioning problem, its looking for nodes in a graph with a small distance between them. See [this](http://www.cvl.isy.liu.se:82/education/graduate/spectral-clustering/SC_course_part1.pdf) introduction to Spectral Clustering if you are interested in more details about how spectral clustering works. +It works by treating clustering as a graph partitioning problem and looks for nodes in a graph with a small distance between them. See [this](https://www.cvl.isy.liu.se/education/graduate/spectral-clustering/SC_course_part1.pdf) introduction to spectral clustering if you are interested in more details about how spectral clustering works. -Here is an example of using spectral clustering on two concentric circles +Here is an example of spectral clustering on two concentric circles: ![Spectral clustering on two concentric circles](../fig/spectral_concentric_circle.png) -Spectral clustering uses something called a kernel trick to introduce additional dimensions to the data. 
+Spectral clustering uses something called a 'kernel trick' to introduce additional dimensions to the data. A common example of this is trying to cluster one circle within another (concentric circles). -A K-means classifier will fail to do this and will end up effectively drawing a line which crosses the circles. -Spectral clustering will introduce an additional dimension that effectively moves one of the circles away from the other in the -additional dimension. This has the downside of being more computationally expensive than k-means clustering. +A k-means classifier will fail to do this and will end up effectively drawing a line which crosses the circles. +However spectral clustering will introduce an additional dimension that effectively moves one of the circles away from the other in the +additional dimension. This does have the downside of being more computationally expensive than k-means clustering. ![Spectral clustering viewed with an extra dimension](../fig/spectral_concentric_3d.png) -### Spectral Clustering with Scikit Learn +### Spectral clustering with Scikit-Learn -Lets try out using Scikit Learn's spectral clustering. To make the concentric circles in the above example we need to use the make_circles function in the sklearn.datasets module. This works in a very similar way to the make_blobs function we used earlier on. +Lets try out using Scikit-Learn's spectral clustering. To make the concentric circles in the above example we need to use the `make_circles` function in the sklearn.datasets module. This works in a very similar way to the make_blobs function we used earlier on. 
~~~ import sklearn.datasets as skl_data circles, circles_clusters = skl_data.make_circles(n_samples=400, noise=.01, random_state=0) +plots_labels(circles, circles_clusters) ~~~ {: .language-python} -The code for calculating the SpectralClustering is very similar to the kmeans clustering, instead of using the sklearn.cluster.KMeans class we use the sklearn.cluster.SpectralClustering class. +The code for calculating the SpectralClustering is very similar to the kmeans clustering, but instead of using the sklearn.cluster.KMeans class we use the `sklearn.cluster.SpectralClustering` class. ~~~ model = skl_cluster.SpectralClustering(n_clusters=2, affinity='nearest_neighbors', assign_labels='kmeans') ~~~ @@ -196,10 +239,11 @@ The SpectralClustering class combines the fit and predict functions into a singl ~~~ labels = model.fit_predict(circles) +plots_labels(circles, labels) ~~~ {: .language-python} -Here is the whole program combined with the kmeans clustering for comparison. Note that this produces two figures, to view both of them use the "Inline" graphics terminal inside the Python console instead of the "Automatic" method which will open a window and only show you one of the graphs. +Here is the whole program combined with the kmeans clustering for comparison. Note that this produces two figures. To view both of them use the "Inline" graphics terminal inside the Python console instead of the "Automatic" method which will open a window and only show you one of the graphs. 
~~~ import sklearn.cluster as skl_cluster @@ -213,28 +257,26 @@ Kmean.fit(circles) clusters = Kmean.predict(circles) # plot the data, colouring it by cluster -plt.scatter(circles[:, 0], circles[:, 1], s=15, linewidth=0.1, c=clusters,cmap='flag') -plt.show() +plot_clusters(circles, clusters, Kmean) # cluster with spectral clustering model = skl_cluster.SpectralClustering(n_clusters=2, affinity='nearest_neighbors', assign_labels='kmeans') labels = model.fit_predict(circles) -plt.scatter(circles[:, 0], circles[:, 1], s=15, linewidth=0, c=labels, cmap='flag') -plt.show() +plots_labels(circles, labels) ~~~ {: .language-python} > ## Comparing k-means and spectral clustering performance -> Modify the program we wrote in the previous exercise to use spectral clustering instead of k-means, save it as a new file. -> Time how long both programs take to run. Add the line `import time` at the top of both files, as the first line in the file get the start time with `start_time = time.time()`. +> Modify the program we wrote in the previous exercise to use spectral clustering instead of k-means and save it as a new file. +> Time how long both programs take to run. Add the line `import time` at the top of both files as the first line, and get the start time with `start_time = time.time()`. > End the program by getting the time again and subtracting the start time from it to get the total run time. Add `end_time = time.time()` and `print("Elapsed time:",end_time-start_time,"seconds")` to the end of both files. > Compare how long both programs take to run generating 4,000 samples and testing them for between 2 and 10 clusters. > How much did your run times differ? > How much do they differ if you increase the number of samples to 8,000? > How long do you think it would take to compute 800,000 samples (estimate this, it might take a while to run for real)? 
> > ## Solution -> > KMeans version, runtime around 4 seconds (your computer might be faster/slower) +> > KMeans version: runtime around 4 seconds (your computer might be faster/slower) > > ~~~ > > import matplotlib.pyplot as plt > > import sklearn.cluster as skl_cluster @@ -260,7 +302,7 @@ plt.show() > > ~~~ > > {: .language-python} > > -> > Spectral version, runtime around 9 seconds (your computer might be faster/slower) +> > Spectral version: runtime around 9 seconds (your computer might be faster/slower) > > ~~~ > > import matplotlib.pyplot as plt > > import sklearn.cluster as skl_cluster @@ -286,11 +328,11 @@ plt.show() > > {: .language-python} > > > > When the number of points increases to 8000 the runtimes are 24 seconds for the spectral version and 5.6 seconds for kmeans. -> > The runtime numbers will differ depending on the speed of your computer, but the relative different should be similar. -> > For 4000 points kmeans took 4 seconds, spectral 9 seconds, 2.25 fold difference. -> > For 8000 points kmeans took 5.6 seconds, spectral took 24 seconds. 4.28 fold difference. Kmeans 1.4 times slower for double the data, spectral 2.6 times slower. -> > The realative difference is diverging. Its double by doubling the amount of data. If we use 100 times more data we might expect a 100 fold divergence in execution times. -> > Kmeans might take a few minutes, spectral will take hours. +> > The runtime numbers will differ depending on the speed of your computer, but the relative difference should be similar. +> > For 4000 points kmeans took 4 seconds, while spectral took 9 seconds. A 2.25 fold difference. +> > For 8000 points kmeans took 5.6 seconds, while spectral took 24 seconds. A 4.28 fold difference. Kmeans is 1.4 times slower for double the data, while spectral is 2.6 times slower. +> > The relative difference is diverging. If we used 100 times more data we might expect a 100 fold divergence in execution times.
+> > Kmeans might take a few minutes while spectral will take hours. > {: .solution} {: .challenge} diff --git a/_episodes/05-dimensionality-reduction.md b/_episodes/05-dimensionality-reduction.md deleted file mode 100644 index 66dd8dc..0000000 --- a/_episodes/05-dimensionality-reduction.md +++ /dev/null @@ -1,159 +0,0 @@ ---- -title: "Dimensionality Reduction" -teaching: 0 -exercises: 0 -questions: -- "How can we perform unsupervised learning with dimensionality reduction techniques such as Principle Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE)?" -objectives: -- "Recall that most data is inherently multidimensional" -- "Understand that reducing the number of dimensions can simplify modelling and allow classifications to be performed." -- "Recall that PCA is a popular technique for dimensionality reduction." -- "Recall that t-SNE is another technique for dimensionality reduction." -- "Apply PCA and t-SNE with Scikit Learn to an example dataset." -- "Evaluate the relative peformance of PCA and t-SNE." -keypoints: -- "PCA is a linear dimensionality reduction technique for tabular data" -- "t-SNE is another dimensionality reduction technique for tabular data that is more general than PCA" ---- - -# Dimensionality Reduction - -Dimensionality reduction is the process of using a subset of the coordinates, -which may be transformed, of the dataset to capture the variation in features -of the data set. It can be a helpful pre-processing step before doing other -operations on the data, such as classification, regression or visualization. - -## Dimensionality Reduction with Scikit-learn - -First setup our environment and load the MNIST digits dataset which will be used -as our initial example. 
- -~~~ -import numpy as np -import matplotlib.pyplot as plt - -from sklearn import decomposition -from sklearn import datasets -from sklearn import manifold - -digits = datasets.load_digits() - -# Examine the dataset -print(digits.data) -print(digits.target) - -X = digits.data -y = digits.target -~~~ -{: .language-python} - -### Principle Component Analysis (PCA) - -PCA is a technique that does rotations of data in a two dimensional -array to decompose the array into combinations vectors that are orthogonal -and can be ordered according to the amount of information they carry. - -~~~ -# PCA -pca = decomposition.PCA(n_components=2) -pca.fit(X) -X_pca = pca.transform(X) - -fig = plt.figure(1, figsize=(4, 4)) -plt.clf() -plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap=plt.cm.nipy_spectral, - edgecolor='k',label=y) -plt.colorbar(boundaries=np.arange(11)-0.5).set_ticks(np.arange(10)) -plt.savefig("pca.svg") -~~~ -{: .language-python} - -![Reduction using PCA](../fig/pca.svg) - -### t-distributed Stochastic Neighbor Embedding (t-SNE) - -~~~ -# t-SNE embedding -tsne = manifold.TSNE(n_components=2, init='pca', - random_state = 0) -X_tsne = tsne.fit_transform(X) -fig = plt.figure(1, figsize=(4, 4)) -plt.clf() -plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap=plt.cm.nipy_spectral, - edgecolor='k',label=y) -plt.colorbar(boundaries=np.arange(11)-0.5).set_ticks(np.arange(10)) -plt.savefig("tsne.svg") -~~~ -{: .language-python} - -![Reduction using t-SNE](../fig/tsne.svg) - - - -> ## Exercise: Working in three dimensions -> The above example has considered only two dimensions since humans -> can visualize two dimensions very well. However, there can be cases -> where a dataset requires more than two dimensions to be appropriately -> decomposed. Modify the above programs to use three dimensions and -> create appropriate plots. -> Do three dimensions allow one to better distinguish between the digits? 
-> -> > ## Solution -> > ~~~ -> > from mpl_toolkits.mplot3d import Axes3D -> > # PCA -> > pca = decomposition.PCA(n_components=3) -> > pca.fit(X) -> > X_pca = pca.transform(X) -> > fig = plt.figure(1, figsize=(4, 4)) -> > plt.clf() -> > ax = fig.add_subplot(projection='3d') -> > ax.scatter(X_pca[:, 0], X_pca[:, 1], X_pca[:, 2], c=y, -> > cmap=plt.cm.nipy_spectral, s=9, lw=0) -> > plt.savefig("pca_3d.svg") -> > ~~~ -> > {: .language-python} -> > -> > ![Reduction to 3 components using pca](../fig/pca_3d.svg) -> > -> > ~~~ -> > # t-SNE embedding -> > tsne = manifold.TSNE(n_components=3, init='pca', -> > random_state = 0) -> > X_tsne = tsne.fit_transform(X) -> > fig = plt.figure(1, figsize=(4, 4)) -> > plt.clf() -> > ax = fig.add_subplot(projection='3d') -> > ax.scatter(X_tsne[:, 0], X_tsne[:, 1], X_tsne[:, 2], c=y, -> > cmap=plt.cm.nipy_spectral, s=9, lw=0) -> > plt.savefig("tsne_3d.svg") -> > ~~~ -> > {: .language-python} -> > -> > ![Reduction to 3 components using tsne](../fig/tsne_3d.svg) -> > -> > -> {: .solution} -{: .challenge} - -> ## Exercise: Parameters -> -> Look up parameters that can be changed in PCA and t-SNE, -> and experiment with these. How do they change your resulting -> plots? Might the choice of parameters lead you to make different -> conclusions about your data? -{: .challenge} - -> ## Exercise: Other Algorithms -> -> There are other algorithms that can be used for doing dimensionality -> reduction, for example the Higher Order Singular Value Decomposition (HOSVD) -> Do an internet search for some of these and -> examine the example data that they are used on. Are there cases where they do -> poorly? What level of care might you need to use before applying such methods -> for automation in critical scenarios? What about for interactive data -> exploration? 
-{: .challenge} - -{% include links.md %} - diff --git a/_episodes/06-dimensionality-reduction.md b/_episodes/06-dimensionality-reduction.md new file mode 100644 index 0000000..ab57dcd --- /dev/null +++ b/_episodes/06-dimensionality-reduction.md @@ -0,0 +1,300 @@ +--- +title: "Unsupervised methods - Dimensionality reduction" +teaching: 30 +exercises: 30 +questions: +- How do we apply machine learning techniques to data with higher dimensions? +objectives: +- "Recall that most data is inherently multidimensional." +- "Understand that reducing the number of dimensions can simplify modelling and allow classifications to be performed." +- "Apply Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) to reduce the dimensions of data." +- "Evaluate the relative performance of PCA and t-SNE in reducing data dimensionality." +keypoints: +- "PCA is a linear dimensionality reduction technique for tabular data" +- "t-SNE is another dimensionality reduction technique for tabular data that is more general than PCA" +--- + +# Dimensionality reduction + +As seen in the last episode, general clustering algorithms work well with low-dimensional data. In this episode we see how higher-dimensional data, such as images of handwritten text or numbers, can be processed with dimensionality reduction techniques to make the datasets more accessible for other modelling techniques. The dataset we will be using is the Scikit-Learn subset of the Modified National Institute of Standards and Technology (MNIST) dataset. + +![MNIST example illustrating all the classes in the dataset](../fig/MnistExamples.png) + + +The MNIST dataset contains 70,000 images of handwritten numbers, each labelled from 0-9 with the number that the image contains. Each image is greyscale and 28x28 pixels in size for a total of 784 pixels per image. Each pixel can take a value between 0-255 (8 bits). 
When dealing with a series of images in machine learning we consider each pixel to be a feature that varies according to each of the sample images. Our previous penguin dataset had no more than 7 features to train with, however even a small 28x28 MNIST image has as many as 784 features (pixels) to work with. + +![MNIST example of a single image](../fig/mnist_30000-letter.png) + +To make this episode a bit less computationally intensive, the Scikit-Learn example that we will work with is a smaller sample of 1797 images. Each image is 8x8 in size for a total of 64 pixels per image, resulting in 64 features for us to work with. The pixels can take a value between 0-15 (4 bits). Let's retrieve and inspect the Scikit-Learn dataset with the following code: + +~~~ +import numpy as np +import matplotlib.pyplot as plt +import sklearn.cluster as skl_cluster +from sklearn import manifold, decomposition, datasets + +# Let's define these here to avoid repetitive code +def plots_labels(data, labels): + tx = data[:, 0] + ty = data[:, 1] + + fig = plt.figure(1, figsize=(4, 4)) + plt.scatter(tx, ty, edgecolor='k', c=labels) + plt.show() + +def plot_clusters(data, clusters, Kmean): + tx = data[:, 0] + ty = data[:, 1] + fig = plt.figure(1, figsize=(4, 4)) + plt.scatter(tx, ty, s=5, linewidth=0, c=clusters) + for cluster_x, cluster_y in Kmean.cluster_centers_: + plt.scatter(cluster_x, cluster_y, s=100, c='r', marker='x') + plt.show() + +def plot_clusters_labels(data, labels): + tx = data[:, 0] + ty = data[:, 1] + + # with labels + fig = plt.figure(1, figsize=(5, 4)) + plt.scatter(tx, ty, c=labels, cmap="nipy_spectral", + edgecolor='k', label=labels) + plt.colorbar(boundaries=np.arange(11)-0.5).set_ticks(np.arange(10)) + plt.show() +~~~ +{: .language-python} + +Next let's load in the digits dataset: +~~~ +# load in dataset as a Pandas Dataframe, return X and Y +features, labels = datasets.load_digits(return_X_y=True, as_frame=True) + +print(features.shape, labels.shape) 
+print(labels) +features.head() +~~~ +{: .language-python} + +## Our goal: using dimensionality-reduction to help with machine learning + +As humans we are pretty good at object and pattern recognition. We can look at the images above, inspect the intensity and position of pixels relative to other pixels, and pretty quickly make an accurate guess at what the image shows. As humans we spend much of our younger lives learning these spatial relations, and so it stands to reason that computers can also extract these relations. Let's see if it is possible to use unsupervised clustering techniques to pull out relations in our MNIST dataset of number images. + + +> ## Exercise: Try to visually inspect the dataset and features for correlations +> As we did for previous datasets, let's visually inspect relationships between our features/pixels. Try and investigate the following pixels for relations (written "row_column"): 0_4, 1_4, 2_4, and 3_4. +> +> > ## Solution +> > ~~~ +> > +> > print(features.iloc[0]) +> > image_1D = features.iloc[0] +> > image_2D = np.array(image_1D).reshape(-1,8) +> > +> > plt.imshow(image_2D,cmap="gray_r") +> > # these points are the pixels we will investigate +> > # pixels 0,1,2,3 of row 4 of the image +> > plt.plot([0,1,2,3],[4,4,4,4],"rx") +> > plt.show() +> > ~~~ +> > {: .language-python} +> > +> > ![SKLearn image with highlighted pixels](../fig/mnist_pairplot_pixels.png) +> > +> > ~~~ +> > import seaborn as sns +> > +> > # make a temporary copy of data for plotting here only +> > seaborn_data = features.copy() +> > +> > # add labels for pairplot color coding +> > seaborn_data["labels"] = labels +> > +> > # make a short list of N features for plotting N*N figures +> > # 4**2 = 16 plots, whereas 64**2 is over 4000! 
+> > feature_subset = [] +> > for i in range(4): +> > feature_subset.append("pixel_"+str(i)+"_4") +> > +> > sns.pairplot(seaborn_data, vars=feature_subset, hue="labels", +> > palette=sns.mpl_palette("Spectral", n_colors=10)) +> > ~~~ +> > {: .language-python} +> > +> > ![SKLearn image with highlighted pixels](../fig/mnist_pairplot.png) +> > +> > As we can see the dataset relations are far more complex than our previous examples. The histograms show that some numbers appear in those pixel positions more than others, but the `feature_vs_feature` plots are quite messy to try and decipher. There are gaps and patches of colour suggesting that there is some kind of structure there, but it's far harder to inspect than the penguin data. We can't easily see definitive clusters in our 2D representations, and we know our clustering algorithms will take a long time to try and crunch 64 dimensions at once, so let's see if we can represent our 64D data in fewer dimensions. +> > +> {: .solution} +{: .challenge} + +# Dimensionality reduction with Scikit-Learn +We will look at two commonly used techniques for dimensionality reduction: Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE). Both of these techniques are supported by Scikit-Learn. + +### Principal Component Analysis (PCA) + +PCA allows us to replace our 64 features with a smaller number of dimensional representations that retain as much of our variance/relational data as possible. Scikit-Learn lets us apply PCA in a relatively simple way. 
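As a rough illustration of how much variance a handful of components can retain, the sketch below fits PCA once on the digits data and prints the cumulative explained variance for a few component counts. It reloads the dataset so that it runs on its own; the variable names (`digits_X`, `pca_full`, `cumulative`) and the component counts tried are illustrative choices rather than part of the lesson.

```python
import numpy as np
from sklearn import datasets, decomposition

# reload the digits so this sketch is self-contained
digits_X, _ = datasets.load_digits(return_X_y=True)

# fit PCA once with all 64 components, then accumulate the variance ratios
pca_full = decomposition.PCA(n_components=64)
pca_full.fit(digits_X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# report how much of the total variance the first n components retain
for n in (2, 5, 10, 20, 64):
    print(n, "components retain", round(float(cumulative[n - 1]), 3), "of the variance")
```

If the first two components retain only a modest share of the variance, some overlap between digits in a 2D plot is to be expected.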
+ +For more in-depth explanations of PCA please see the following links: +* [https://builtin.com/data-science/step-step-explanation-principal-component-analysis](https://builtin.com/data-science/step-step-explanation-principal-component-analysis) +* [https://scikit-learn.org/stable/modules/decomposition.html#pca](https://scikit-learn.org/stable/modules/decomposition.html#pca) + +Let's apply PCA to the MNIST dataset and retain the two most significant components: + +~~~ +# PCA with 2 components +pca = decomposition.PCA(n_components=2) +x_pca = pca.fit_transform(features) + +print(x_pca.shape) +~~~ +{: .language-python} + +This returns us an array of 1797x2 where the 2 remaining columns (our new "features" or "dimensions") contain the projection of each image onto the first principal component (column 0) and the second principal component (column 1). We can plot these two new features against each other: + +~~~ +# We are passing None because it is an unlabelled plot +plots_labels(x_pca, None) +~~~ +{: .language-python} + +![Reduction using PCA](../fig/pca_unlabelled.png) + +We now have a 2D representation of our 64D dataset that we can work with instead. Let's try some quick K-means clustering on our 2D representation of the data. Because we already have some knowledge about our data we can set `k=10` for the 10 digits present in the dataset. + +~~~ +Kmean = skl_cluster.KMeans(n_clusters=10) +Kmean.fit(x_pca) +clusters = Kmean.predict(x_pca) +plot_clusters(x_pca, clusters, Kmean) +~~~ +{: .language-python} + +![Reduction using PCA](../fig/pca_clustered.png) + +And now we can compare how these clusters look against our actual image labels by colour coding our first scatter plot: + +~~~ +plot_clusters_labels(x_pca, labels) +~~~ +{: .language-python} + +![Reduction using PCA](../fig/pca_labelled.png) + +PCA has made a valiant effort to reduce the dimensionality of our problem from 64D to 2D while still retaining some of our key structural information. 
We can see that the digits `0`,`1`,`4`, and `6` cluster up reasonably well even using a simple k-means test. However it does look like there is still quite a bit of overlap between the remaining digits, especially for the digits `5` and `8`. The clustering is far from perfect in the largest "blob", but not a bad effort from PCA given the substantial dimensionality reduction. + +It's worth noting that PCA does not handle outlier data well primarily due to global preservation of structural information, and so we will now look at a more complex form of learning that we can apply to this problem. + +### t-distributed Stochastic Neighbor Embedding (t-SNE) + +t-SNE is a powerful example of manifold learning - a non-deterministic non-linear approach to dimensionality reduction. Manifold learning tasks are based on the idea that the dimension of many datasets is artificially high. This is likely the case for our MNIST dataset, as the corner pixels of our images are unlikely to contain digit data, and thus those dimensions are almost negligible compared with others. + +The versatility of the algorithm in transforming the underlying structural information into lower-order projections makes t-SNE applicable to a wide range of research domains. + +For more in-depth explanations of t-SNE and manifold learning please see the following links which also contain some very nice visual examples of manifold learning in action: +* [https://thedatafrog.com/en/articles/visualizing-datasets/](https://thedatafrog.com/en/articles/visualizing-datasets/) +* [https://scikit-learn.org/stable/modules/manifold.html](https://scikit-learn.org/stable/modules/manifold.html) + +Scikit-Learn allows us to apply t-SNE in a relatively simple way. 
Let's code and apply t-SNE to the MNIST dataset in the same manner that we did for the PCA example, and reduce the data down from 64D to 2D again: + +~~~ +# t-SNE embedding +# initialising with "pca" explicitly preserves global structure +tsne = manifold.TSNE(n_components=2, init='pca', random_state = 0) +x_tsne = tsne.fit_transform(features) + +plots_labels(x_tsne, None) +~~~ +{: .language-python} + +![Reduction using t-SNE](../fig/tsne_unlabelled.png) + +It looks like t-SNE has done a much better job of splitting our data up into clusters using only a 2D representation of the data. Once again, let's run a simple k-means clustering on this new 2D representation, and compare with the actual color-labelled data: + +~~~ +Kmean = skl_cluster.KMeans(n_clusters=10) + +Kmean.fit(x_tsne) +clusters = Kmean.predict(x_tsne) + +plot_clusters(x_tsne, clusters, Kmean) +plot_clusters_labels(x_tsne, labels) +~~~ +{: .language-python} + +![Reduction using t-SNE](../fig/tsne_clustered.png)![Reduction using t-SNE](../fig/tsne_labelled.png) + + +It looks like t-SNE has successfully separated out our digits into accurate clusters using as little as a 2D representation and a simple k-means clustering algorithm. It has worked so well that you can clearly see several clusters which can be modelled, whereas for our PCA representation we needed to rely heavily on the knowledge that we had 10 types of digits to cluster. + +Additionally, if we had run k-means on all 64 dimensions this would likely still be computing away, whereas we have already broken down our dataset into accurate clusters, with only a handful of outliers and potential misidentifications (remember, a good ML model isn't a perfect model!). + +The major drawback of applying t-SNE to datasets is the large computational requirement. Furthermore, hyper-parameter tuning of t-SNE usually requires some trial and error to perfect. 
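One way to approach that trial and error is to loop over a few values of a single hyper-parameter and compare the results. The sketch below varies `perplexity` (a documented `TSNE` parameter) on a 300-image subset of the digits to keep the runtime down. The subset size, the perplexity values, and the names `subset_X` and `embeddings` are arbitrary choices for illustration, not recommendations.

```python
import time
from sklearn import datasets, manifold

# reload a small subset so the sketch runs quickly on its own
digits_X, digits_y = datasets.load_digits(return_X_y=True)
subset_X = digits_X[:300]

# embed the same data with several perplexity values and time each run
embeddings = {}
for perplexity in (5, 30, 50):
    start = time.perf_counter()
    tsne = manifold.TSNE(n_components=2, init='pca',
                         perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(subset_X)
    elapsed = time.perf_counter() - start
    print("perplexity", perplexity, "took", round(elapsed, 2), "seconds")
```

Each array in `embeddings` could then be passed to the `plots_labels` helper defined earlier (together with `digits_y[:300]`) to compare the layouts by eye.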
+ +Our example here is still a relatively simple example of 8x8 images and not very typical of the modern problems that can now be solved in the field of ML and DL. To account for even higher-order input data, neural networks were developed to more accurately extract feature information. + + +> ## Exercise: Working in three dimensions +> The above example has considered only two dimensions since humans +> can visualize two dimensions very well. However, there can be cases +> where a dataset requires more than two dimensions to be appropriately +> decomposed. Modify the above programs to use three dimensions and +> create appropriate plots. +> Do three dimensions allow one to better distinguish between the digits? +> +> > ## Solution +> > ~~~ +> > from mpl_toolkits.mplot3d import Axes3D +> > # PCA +> > pca = decomposition.PCA(n_components=3) +> > pca.fit(features) +> > x_pca = pca.transform(features) +> > fig = plt.figure(1, figsize=(4, 4)) +> > ax = fig.add_subplot(projection='3d') +> > ax.scatter(x_pca[:, 0], x_pca[:, 1], x_pca[:, 2], c=labels, +> > cmap=plt.cm.nipy_spectral, s=9, lw=0) +> > plt.show() +> > ~~~ +> > {: .language-python} +> > +> > ![Reduction to 3 components using pca](../fig/pca_3d.svg) +> > +> > ~~~ +> > # t-SNE embedding +> > tsne = manifold.TSNE(n_components=3, init='pca', +> > random_state = 0) +> > x_tsne = tsne.fit_transform(features) +> > fig = plt.figure(1, figsize=(4, 4)) +> > ax = fig.add_subplot(projection='3d') +> > ax.scatter(x_tsne[:, 0], x_tsne[:, 1], x_tsne[:, 2], c=labels, +> > cmap=plt.cm.nipy_spectral, s=9, lw=0) +> > plt.show() +> > ~~~ +> > {: .language-python} +> > +> > ![Reduction to 3 components using tsne](../fig/tsne_3d.svg) +> > +> > +> {: .solution} +{: .challenge} + +> ## Exercise: Parameters +> +> Look up parameters that can be changed in PCA and t-SNE, +> and experiment with these. How do they change your resulting +> plots? Might the choice of parameters lead you to make different +> conclusions about your data? 
+{: .challenge} + +> ## Exercise: Other algorithms +> +> There are other algorithms that can be used for doing dimensionality +> reduction (for example the Higher Order Singular Value Decomposition (HOSVD)). +> Do an internet search for some of these and +> examine the example data that they are used on. Are there cases where they do +> poorly? What level of care might you need to use before applying such methods +> for automation in critical scenarios? What about for interactive data +> exploration? +{: .challenge} + +{% include links.md %} + diff --git a/_episodes/07-ethics.md b/_episodes/07-ethics.md deleted file mode 100644 index ddd5c0b..0000000 --- a/_episodes/07-ethics.md +++ /dev/null @@ -1,74 +0,0 @@ ---- -title: "Ethics and Implications of Machine Learning" -teaching: 10 -exercises: 5 -questions: -- "What are the ethical implications of using machine learning in research?" -objectives: -- "To think about the ethical implications of machine learning." -- "To think about any ethical implications for using machine learning in research." -keypoints: -- "Machine learning is often thought of as unbiased and impartial. But if the training data is biased the machine learning will be." -- "Many machine learning algorithms can't explain how they arrived at a decision." -- "There is a lot of concern about how machine learning can be used for unethical purposes." -- "No machine learning system is 100% accurate, think about the implications of false positives and false negatives." ---- - -# Ethics and Machine Learning - -There are increasing worries about the ethics of using machine learning. 
-In recent year's we've seen a number of worrying problems from machine learning entering all kinds of aspects of daily life and the economy: - -* The first death from an autonomous car which failed to brake for a pedestrian.[\[1\]](https://www.forbes.com/sites/meriameberboucha/2018/05/28/uber-self-driving-car-crash-what-really-happened/) -* Highly targetted advertising based around social media and internet usage. [\[2\]](https://www.wired.com/story/big-tech-can-use-ai-to-extract-many-more-ad-dollars-from-our-clicks/) -* The outcomes of elections and referendums being influenced by highly targetted social media posts . This is compunded by the data being obtained without the users's consent. [\[3\]](https://www.vox.com/policy-and-politics/2018/3/23/17151916/facebook-cambridge-analytica-trump-diagram) -* The mass deploymeny of facial recognition technologies. [\[4\]](https://www.bbc.co.uk/news/technology-44089161) -* The possible first use of autonomous military robots making a decision to kill in battle. [\[5\]](https://www.theverge.com/2021/6/3/22462840/killer-robot-autonomous-drone-attack-libya-un-report-context) - -## Problems with bias - -Machine learning systems are often presented as more impartial and consistent ways to make decisions. For example sentencing criminals or -deciding if somebody should be granted bail. There have been a number of examples recently where machine learning systems have been shown to -be biased because the data they were trained on was already biased. This can occur due to the training data being unrepresentative and -under representing certain groups. For example if you were trying to automatically screen job candidates and used a sample of people the -same company had previously decided to employ then any biases in their past employment processes would be reflected in the machine learning. - -## Problems with explaining decisions - -Many machine learning systems (e.g. neural networks) can't really explain their decisions. 
Although the input and output are known trying to -explain why the training caused the network to behave in a certain way can be very difficult. If a decision is questioned by a human its -difficult to provide any rationale as to how a decision was arrived at. - -## Problems with accuracy - -No machine learning system is ever 100% accurate. Getting into the high 90s is usually considered good. -But when we're evaluating millions of data items this can translate into 100s of thousands of mis-identifications. -If the implications of these incorrect decisions are serious then it will cause major problems. For instance if it results in somebody -being imprisoned or even investigated for a crime or maybe just being denied insurance or a credit card. - -## Energy Usage - -Many machine learning systems (especially deep learning) need vast amounts of computational power which in turn can consume vast amounts of energy. Depending on the source of that energy this might account for significant amounts of fossil fuels being burned. It is not uncommon for a modern GPU accelerated computer to use several kilowatts of power, running this for one hour could easily use as much energy a typical home would use in an entire day. This can be particularly bad when models are constantly being retrained or when "parameter sweeps" are done to find the best set of parameters to train with. - -# Ethics of machine learning in research - -Not all research using machine learning will have major ethical implications. -Many research projects don't directly affect the lives of other people, but this isn't always the case. - -Some questions you might want to ask yourself (and which an ethics committee might also ask you): - - * Will anything your machine learning system does make a decision that somehow affects a person's life? - * Will anything your machine learning system does make a decision that somehow affects an animial's life? - * Will you be using any people to create your training data? 
Will they have to look at any disturbing or traumatic material during the training process? - * Are there any inherent biases in the dataset(s) you're using for training? - * How much energy will this computation use? Are there more efficient ways to get the same answer? - - -> ## Exercise: Ethical implications of your own research -> Split into pairs or groups of three. -> Think of a use case for machine learning in your research areas. -> What ethical implications (if any) might there be from using machine learning in your research? -> Write down your group's answers in the etherpad. -{: .challenge} - -{% include links.md %} diff --git a/_episodes/06-neural-networks.md b/_episodes/07-neural-networks.md similarity index 62% rename from _episodes/06-neural-networks.md rename to _episodes/07-neural-networks.md index 4f3a5b2..d13d26c 100644 --- a/_episodes/06-neural-networks.md +++ b/_episodes/07-neural-networks.md @@ -3,13 +3,13 @@ title: "Neural Networks" teaching: 20 exercises: 30 questions: +- "What are Neural Networks?" - "How can we classify images using a neural network?" objectives: -- "Explain the basic architecture of a perceptron." -- "Create a perceptron to encode a simple function." -- "Understand that a single perceptron cannot solve a problem requiring non-linear separability." +- "Understand the basic architecture of a perceptron." +- "Be able to create a perceptron to encode a simple function." - "Understand that layers of perceptrons allow non-linear separable problems to be solved." -- "Train a multi-layer perceptron using scikit-learn." +- "Train a multi-layer perceptron using Scikit-Learn." - "Evaluate the accuracy of a multi-layer perceptron using real input data." - "Understand that cross validation allows the entire data set to be used in the training process." keypoints: @@ -19,17 +19,16 @@ keypoints: - "Multiple perceptrons can be combined to form a neural network which can solve functions that aren't linearly separable." 
- "We can train a whole neural network with the back propagation algorithm. Scikit-learn includes an implementation of this algorithm." - "Training a neural network requires some training data to show the network examples of what to learn." -- "To validate our training we split the the training data into a training set and a test set." +- "To validate our training we split the training data into a training set and a test set." - "To ensure the whole dataset can be used in training and testing we can train multiple times with different subsets of the data acting as training/testing data. This is called cross validation." -- "Deep learning neural networks are a very powerful modern technique. Scikit learn does not support these but other libraries like Tensorflow do." +- "Deep learning neural networks are a very powerful modern machine learning technique. Scikit-Learn does not support these but other libraries like Tensorflow do." - "Several companies now offer cloud APIs where we can train neural networks on powerful computers." --- -# Introduction - -Neural networks are a machine learning method inspired by how the human brain works. They are particularly good at doing pattern recognition and classification tasks, often using images as inputs. They are a well-established machine learning technique that has been around since the 1950s but have gone through several iterations since that have overcome fundamental limitations of the previous one. The current state-of-the-art neural networks is often referred to as deep learning. +# Neural networks +Neural networks are a machine learning method inspired by how the human brain works. They are particularly good at pattern recognition and classification tasks, often using images as inputs. They are a well-established machine learning technique, having been around since the 1950s, but they've gone through several iterations to overcome limitations in previous generations. 
Using state-of-the-art neural networks is often referred to as 'deep learning'. ## Perceptrons @@ -39,9 +38,9 @@ Perceptrons are the building blocks of neural networks. They are an artificial v ### Coding a perceptron -Below is an example of a perceptron written as a Python function. The function takes three parameters: Inputs is a list of input values, Weights is a list of weight values and Threshold is the activation threshold. +Below is an example of a perceptron written as a Python function. The function takes three parameters: `Inputs` is a list of input values, `Weights` is a list of weight values and `Threshold` is the activation threshold. -First let us multiply each input by the corresponding weight. To do this quickly and concisely, we will use the numpy multiply function which can multiply each item in a list by a corresponding item in another list. +First we multiply each input by the corresponding weight. To do this quickly and concisely, we will use the numpy multiply function which can multiply each item in a list by a corresponding item in another list. We then take the sum of all the inputs multiplied by their weights. Finally, if this value is less than the activation threshold, we output zero, otherwise we output a one. @@ -121,7 +120,7 @@ for input in inputs: NOT: -The NOT function only has a single input but to make it work in the perceptron, we need to introduce a bias term which is always the same value. In this example, it is the second input. It has a weight of 1.0, the weight on the real input is -1.0. +The NOT function only has a single input. To make it work in the perceptron we need to introduce a bias term which is always the same value. In this example it is the second input. It has a weight of 1.0 while the weight on the real input is -1.0. 
~~~ inputs = [[0.0,1.0],[1.0,1.0]] for input in inputs: @@ -129,7 +128,7 @@ for input in inputs: ~~~ {: .language-python} -A perceptron can be trained to compute any function which has linear separability. A simple training algorithm called the perceptron learning algorithm can be used to do this and scikit-learn has its own implementation of it. We are going to skip over the perceptron learning algorithm and move straight onto more powerful techniques. If you want to learn more about it see [this page](https://computing.dcu.ie/~humphrys/Notes/Neural/single.neural.html) from Dublin City University. +A perceptron can be trained to compute any function which has linear separability. A simple training algorithm called the perceptron learning algorithm can be used to do this and Scikit-Learn has its own implementation of it. We are going to skip over the perceptron learning algorithm and move straight onto more powerful techniques. If you want to learn more about it see [this page](https://computing.dcu.ie/~humphrys/Notes/Neural/single.neural.html) from Dublin City University. @@ -146,24 +145,24 @@ A single perceptron cannot solve any function that is not linearly separable, me (Make a graph of this) -This function outputs a zero both when all its inputs are one or zero and its not possible to separate with a straight line. This is known as linear separability, when this limitation was discovered in the 1960s it effectively halted development of neural networks for over a decade in a period known as the "AI Winter". +This function outputs a zero when its inputs are all one or all zero, and it's not possible to separate its outputs with a straight line. This is known as linear separability. When this limitation was discovered in the 1960s it effectively halted development of neural networks for over a decade in a period known as the "AI Winter". -## Multi-layer Perceptrons +## Multi-layer perceptrons -A single perceptron cannot be used to solve a non-linearly separable function.
For that, we need to use multiple perceptrons and typically multiple layers of perceptrons. They are formed of networks of artificial neurons which each take one or more inputs and typically have a single output. The neurons are connected together in large networks typically of 10s to 1000s of neurons. Typically, networks are connected in layers with an input layer, middle or hidden layer (or layers) and finally an output layer. +A single perceptron cannot be used to solve a non-linearly separable function. For that, we need to use multiple perceptrons and typically multiple layers of perceptrons. They are formed of networks of artificial neurons which each take one or more inputs and typically have a single output. The neurons are connected together in networks of 10s to 1000s of neurons. Typically, networks are connected in layers with an input layer, middle or hidden layer (or layers), and finally an output layer. ![A multi-layer perceptron](../fig/multilayer_perceptron.svg) -### Training Multi-layer perceptrons +### Training multi-layer perceptrons -Multi-layer perceptrons need to be trained by showing them a set of training data and measuring the error between the network's predicted output and the true value. Training takes an iterative approach that improves the network a little each time a new training example is presented. There are a number of training algorithms available for a neural network today, but we are going to use one of the best established and well known, the backpropagation algorithm. The algorithm is called back propagation because it takes the error calculated between an output of the network and the true value and takes it back through the network to update the weights. If you want to read more about back propagation, please see [this chapter](http://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf) from the book "Neural Networks - A Systematic Introduction". 
+Multi-layer perceptrons need to be trained by showing them a set of training data and measuring the error between the network's predicted output and the true value. Training takes an iterative approach that improves the network a little each time a new training example is presented. There are a number of training algorithms available for a neural network today, but we are going to use one of the best established and well known, the backpropagation algorithm. This algorithm is called back propagation because it takes the error calculated between an output of the network and the true value and takes it back through the network to update the weights. If you want to read more about back propagation, please see [this chapter](http://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf) from the book "Neural Networks - A Systematic Introduction". -### Multi-layer perceptrons in scikit-learn +### Multi-layer perceptrons in Scikit-Learn -We are going to build a multi-layer perceptron for recognising handwriting from images. Scikit Learn includes some example handwriting data from the [MNIST data set](http://yann.lecun.com/exdb/mnist/), this consists of 70,000 images of hand written digits. Each image is 28x28 pixels in size (784 pixels in total) and is represented in grayscale with values between zero for fully black and 255 for fully white. This means we will need 784 perceptrons in our input layer, each taking the input of one pixel and 10 perceptrons in our output layer to represent each digit we might classify. If trained correctly then only the perceptron in the output layer to "fire" will be on the one representing the in the image (this is a massive oversimplification!). +We are going to build a multi-layer perceptron for recognising handwriting from images. Scikit-Learn includes some example handwriting data from the [MNIST data set](http://yann.lecun.com/exdb/mnist/), which is a dataset containing 70,000 images of hand-written digits. 
Each image is 28x28 pixels in size (784 pixels in total) and is represented in grayscale with values between zero for fully black and 255 for fully white. This means we will need 784 perceptrons in our input layer, each taking the input of one pixel and 10 perceptrons in our output layer to represent each digit we might classify. If trained correctly, only the perceptron in the output layer will "fire" to represent the contents of the image (but this is a massive oversimplification!). -We can import this dataset from `sklearn.datasets` with then load it into memory by calling the `fetch_openml` function. +We can import this dataset from `sklearn.datasets` then load it into memory by calling the `fetch_openml` function. ~~~ import sklearn.datasets as skl_data @@ -180,11 +179,11 @@ data = data / 255.0 ~~~ {: .language-python} -instead of writing a loop ourselves to divide every pixel by 255. Although the final result is the same and will take about the same (possibly a little less, it might do some clever optimisations) amount of computation. +This is instead of writing a loop ourselves to divide every pixel by 255. Although the final result is the same and will take about the same amount of computation (possibly a little less, it might do some clever optimisations). -Now we need to initialise a neural network, scikit learn has an entire library `sklearn.neural_network` for this and the `MLPClassifier` class handles multi-layer perceptrons. This network takes a few parameters including the size of the hidden layer, the maximum number of training iterations we're going to allow, the exact algorithm to use, if we'd like verbose output about what the training is doing and the initial state of the random number generator. +Now we need to initialise a neural network. Scikit-Learn has an entire library for this (`sklearn.neural_network`) and the `MLPClassifier` class handles multi-layer perceptrons. 
This network takes a few parameters including the size of the hidden layer, the maximum number of training iterations we're going to allow, the exact algorithm to use, whether or not we'd like verbose output about what the training is doing, and the initial state of the random number generator. -In this example we specify a multi-layer perceptron with 50 hidden nodes, we allow a maximum of 50 iterations to train it, we turn on verbose output to see what's happening and initialise the random state to 1 so that we always get the same behaviour. +In this example we specify a multi-layer perceptron with 50 hidden nodes, we allow a maximum of 50 iterations to train it, we turn on verbose output to see what's happening, and initialise the random state to 1 so that we always get the same behaviour. ~~~ import sklearn.neural_network as skl_nn @@ -192,7 +191,7 @@ mlp = skl_nn.MLPClassifier(hidden_layer_sizes=(50,), max_iter=50, verbose=1, ran ~~~ {: .language-python} -We now have a neural network but we have not done any training of it yet. Before training, let us split our dataset into two parts, a training set which we will use to train the classifier and a test set which we will use to see how well the training is working. By using different data for the two, we can help show, we have not only trained a network which works just with the data it was trained on, this is known as over-fitting and can end up creating models which do not "generalise" or work with data other than their training data. +We now have a neural network but we have not trained it yet. Before training, we will split our dataset into two parts: a training set which we will use to train the classifier and a test set which we will use to see how well the training is working. By using different data for the two, we can avoid 'over-fitting', which is the creation of models which do not "generalise" or work with data other than their training data. 
Typically, 10 to 20% of the data will be used as test data. Let us see how big our dataset is to decide how many samples we want to train with. The `describe` attribute in Pandas will tell us how many rows our data has: @@ -224,14 +223,14 @@ This tells us we have 70,000 rows in the dataset. Let us take 90% of the data for training and 10% for testing, so we will use the first 63,000 samples in the dataset as the training data and the last 7,000 as the test data. We can split these using a slice operator. ~~~ -data_train = data[0:63000] -labels_train = labels[0:63000] -data_test = data[63001:] -labels_test = labels[63001:] +data_train = data[0:63000].values +labels_train = labels[0:63000].values +data_test = data[63000:].values +labels_test = labels[63000:].values ~~~ {: .language-python} -Now let us go ahead and train the network. This line will take about one minute to run. We do this by calling the `fit` function inside the `mlp` class instance. This needs two arguments the data itself and the labels showing what class each item should be classified to. +Now let's train the network. This line will take about one minute to run. We do this by calling the `fit` function inside the `mlp` class instance. This needs two arguments: the data itself, and the labels showing what class each item should be classified to. ~~~ @@ -239,7 +238,7 @@ mlp.fit(data_train,labels_train) ~~~ {: .language-python} -Finally, let us score the accuracy of our network against both the original training data and the test data. If the training had converged to the point where each iteration of training was not improving the accuracy, then the accuracy of the training data should be 1.0 (100%). +Finally, we will score the accuracy of our network against both the original training data and the test data. If the training had converged to the point where each iteration of training was not improving the accuracy, then the accuracy of the training data should be 1.0 (100%). 
~~~ print("Training set score", mlp.score(data_train, labels_train)) @@ -261,11 +260,11 @@ data = data / 255.0 mlp = skl_nn.MLPClassifier(hidden_layer_sizes=(50,), max_iter=50, verbose=1, random_state=1) -data_train = data[0:63000] -labels_train = labels[0:63000] +data_train = data[0:63000].values +labels_train = labels[0:63000].values -data_test = data[63001:] -labels_test = labels[63001:] +data_test = data[63000:].values +labels_test = labels[63000:].values mlp.fit(data_train, labels_train) print("Training set score", mlp.score(data_train, labels_train)) @@ -276,22 +275,22 @@ print("Testing set score", mlp.score(data_test, labels_test)) ### Prediction using a multi-layer perceptron -Now that we have trained a multi-layer perceptron, we can give it some input data and ask it to perform a prediction. In this case, our input data is a 28x28 pixel image, which can also be represented as a 784-element list of data. The output will be a number between 0 and 9 telling us which digit the network thinks we have supplied. The `predict` function in the `MLPClassifier` class can be used to make a prediction. Let us try using the first digit from our test set as an example. +Now that we have trained a multi-layer perceptron, we can give it some input data and ask it to perform a prediction. In this case, our input data is a 28x28 pixel image, which can also be represented as a 784-element list of data. The output will be a number between 0 and 9 telling us which digit the network thinks we have supplied. The `predict` function in the `MLPClassifier` class can be used to make a prediction. Let's use the first digit from our test set as an example. -Before we can pass it to the predictor, we have to extract one of the digits from the test set. We can use `iloc` on the dataframe to get hold of the first element in the test set. In order to present it to the predictor, we have to turn it into a numpy array which has the dimensions of 1x784 instead of 28x28. 
We can then call the `predict` function with this array as our parameter. This will return an array of predictions (as it could have been given multiple inputs), the first element of this will be the predicted digit. You may get a warning stating "X does not have valid feature names", this is because we didn't encode feature names into our X (digit images) data. +Before we can pass it to the predictor, we need to extract one of the digits from the test set. We can use `iloc` on the dataframe to get hold of the first element in the test set. In order to present it to the predictor, we have to turn it into a numpy array which has the dimensions of 1x784 instead of 28x28. We can then call the `predict` function with this array as our parameter. This will return an array of predictions (as it could have been given multiple inputs), the first element of this will be the predicted digit. You may get a warning stating "X does not have valid feature names", this is because we didn't encode feature names into our X (digit images) data. ~~~ -test_digit = data_test.iloc[0].to_numpy().reshape(1,784) -test_digit_prediciton = mlp.predict(test_digit)[0] -print("Predicted value",test_digit_prediciton) +test_digit = data_test[0].reshape(1,784) +test_digit_prediction = mlp.predict(test_digit)[0] +print("Predicted value",test_digit_prediction) ~~~ -{: .langugage-python} +{: .language-python} We can now verify if the prediction is correct by looking at the corresponding item in the `labels_test` array. ~~~ -print("Actual value",labels_test.iloc[0]) +print("Actual value",labels_test[0]) ~~~ {: .language-python} @@ -299,14 +298,14 @@ This should be the same value which is being predicted. > ## Changing the learning parameters -> There are several parameters which control the training of the data. One of these is called the learning rate, increasing this can reduce how many learning iterations we need. But make it too large and we will end up overshooting. 
-> Try tweaking this parameter by adding the parameter `learning_rate_init`, the default value of this is 0.001. Try increasing it to around 0.1. +> There are several parameters which control the training of the data. One of these is called the learning rate. Increasing this can reduce how many learning iterations we need. But if this is too large you can end up overshooting. +> Try tweaking this parameter by adding the parameter `learning_rate_init` with a default value of 0.001. Try increasing it to around 0.1. {: .challenge} > ## Using your own handwriting > Create an image using Microsoft Paint, the GNU Image Manipulation Project (GIMP) or [jspaint](https://jspaint.app/). The image needs to be grayscale and 28 x 28 pixels. > -> Try and draw a digit (0-9) in the image and save it into your code directory. +> Try to draw a digit (0-9) in the image and save it into your code directory. > > The code below loads the image (called digit.png, change to whatever your file is called) using the OpenCV library. Some Anaconda installations need this installed either through the package manager or by running the command: `conda install -c conda-forge opencv ` from the anaconda terminal. > @@ -332,19 +331,19 @@ This should be the same value which is being predicted. > {: .language-python} {: .challenge} -## Measuring Neural Network performance +## Measuring neural network performance -We have now trained a neural network and tested prediction on a few images. This might have given us a feel for how well our network is performing, but it would be much more useful to have a more objective measure. Since recognising digits is a classification problem, we can measure how many predictions were correct in a set of test data. As we already have a test set of data with 7,000 images let us use that and see how many predictions the neural network has got right. 
We will loop through every image in the test set, run it through our predictor and compare the result with the label for that image. We will also keep a tally of how many images we got right and see what percentage were correct. +We have now trained a neural network and tested prediction on a few images. This might have given us a feel for how well our network is performing, but it would be much more useful to have a more objective measure. Since recognising digits is a classification problem, we can measure how many predictions were correct in a set of test data. As we already have a test set of data with 7,000 images we can use that and see how many predictions the neural network has gotten right. We will loop through every image in the test set, run it through our predictor and compare the result with the label for that image. We will also keep a tally of how many images we got right and see what percentage were correct. ~~~ correct=0 -for row in data_test.iterrows(): +for idx, row in enumerate(data_test): # image contains a tuple of the row number and image data - image = row[1].to_numpy().reshape(1,784) + image = row.reshape(1,784) prediction = mlp.predict(image)[0] - actual = labels_test[row[0]] + actual = labels_test[idx] if prediction == actual: correct = correct + 1 @@ -353,18 +352,18 @@ print((correct/len(data_test))*100) ~~~ {: .language-python} -### Confusion Matrix +### Confusion matrix -We now know what percentage of images were correctly classified, but we don't know anything about the distribution of that across our different classes (the digits 0 to 9 in this case). A more powerful technique is known as a confusion matrix. Here we draw a grid with each class along both the x and y axis. The x axis is the actual number of items in each class and the y axis is the predicted number. 
In a perfect classifier there will be a diagonal line of values across the grid moving from the top left to bottom right corresponding to the number in each class and all other cells will be zero. If any cell outside of the diagonal is non-zero then it indicates a miss-classification. Scikit Learn has a function called `confusion_matrix` in the `sklearn.metrics` class which can display a confusion matrix for us. It will need two inputs, an array showing how many items were in each class for both the real data and the classifications. We already have the real data in the labels_test array, but we need to build it for the classifications by classifying each image (in the same order as the real data) and storing the result in another array. +We now know what percentage of images were correctly classified, but we don't know anything about the distribution of correct predictions across our different classes (the digits 0 to 9 in this case). A more powerful technique is known as a confusion matrix. Here we draw a grid with each class along both the x and y axes. The x axis is the actual number of items in each class and the y axis is the predicted number. In a perfect classifier, there will be a diagonal line of values across the grid moving from the top left to bottom right corresponding to the number in each class, and all other cells will be zero. If any cell outside of the diagonal is non-zero then it indicates a misclassification. Scikit-Learn has a function called `confusion_matrix` in the `sklearn.metrics` module which can display a confusion matrix for us. It will need two inputs: an array giving the class of each item in the real data, and an array giving the class predicted for each item. We already have the real data in the `labels_test` array, but we need to build it for the classifications by classifying each image (in the same order as the real data) and storing the result in another array. 
~~~ from sklearn.metrics import confusion_matrix predictions = [] -for image in data_test.iterrows(): +for image in data_test: # image contains a tuple of the row number and image data - image = image[1].to_numpy().reshape(1,784) - predictions.append(mlp.predict(image)[0]) + image = image.reshape(1,784) + predictions.append(mlp.predict(image)) confusion_matrix(labels_test,predictions) ~~~ @@ -372,23 +371,23 @@ confusion_matrix(labels_test,predictions) > ## A better way to plot a confusion matrix > The `ConfusionMatrixDisplay` class in the `sklearn.metrics` package can create a graphical representation of a confusion matrix with colour coding to highlight how many items are in each cell. This colour coding can be useful when working with very large numbers of classes. -> Try and use the `from_predictions()` method in the `ConfusionMatrixDisplay` class to display a graphical confusion matrix. +> Try to use the `from_predictions()` method in the `ConfusionMatrixDisplay` class to display a graphical confusion matrix. > > > ## Solution > > ~~~ > > from sklearn.metrics import ConfusionMatrixDisplay -> > ConfusionMatrixDisplay.from_predictions(labels_test,predictions) +> > ConfusionMatrixDisplay.from_predictions(labels_test,np.array(predictions)) > > ~~~ > > {: .language-python} > {: .solution} {: .challenge} -## Cross Validation +## Cross-validation -Previously we split the data into training and test sets. But what happens if the test set includes important features we want to train on that happen to be missing in the training set? We are having to throw away part of our data to use in the testing set. +Previously we split the data into training and test sets. But what if the test set includes important features we want to train on that happen to be missing in the training set? We are throwing away part of our data to use it in the testing set. -Cross validation runs the training/testing multiple times but splits the data in a different way each time. 
This way all of the data gets used both for training and testing. We can use multiple iterations of training with different data in each set to eventually include the entire dataset. +Cross-validation runs the training/testing multiple times but splits the data in a different way each time. This means all of the data gets used both for training and testing. We can use multiple iterations of training with different data in each set to eventually include the entire dataset. example list @@ -408,18 +407,18 @@ test = 1,2 (generate an image of this) -### Cross Validation code example +### Cross-validation code example -The `sklearn.model_selection` module provides support for doing k fold cross validation in scikit-learn. It can automatically partition our data for cross validation. +The `sklearn.model_selection` module provides support for doing k-fold cross validation in Scikit-Learn. It can automatically partition our data for cross validation. -Let us import this and call it `skl_msel` +Import this and call it `skl_msel` ~~~ import sklearn.model_selection as skl_msel ~~~ {: .language-python} -Now we can choose how many ways we would like to split our data, three or four are common choices. +Now we can choose how many ways we would like to split our data (three or four are common choices). ~~~ kfold = skl_msel.KFold(4) @@ -434,7 +433,7 @@ for (train, test) in kfold.split(data): ~~~ {: .language-python} -Now inside the loop, we can select the data by doing `data_train = data.iloc[train]` and `labels_train = labels.iloc[train]`. In some versions of Python/Pandas/Scikit Learn, you might be able to do `data_train = data[train]` and `labels_train = labels[train]`. This is a useful Python shorthand which will use the list of indices from `train` to select which items from `data` and `labels` we use. We can repeat this process with the test set. +Now inside the loop, we can select the data with `data_train = data.iloc[train]` and `labels_train = labels.iloc[train]`. 
In some versions of Python/Pandas/Scikit-Learn, you might be able to use `data_train = data[train]` and `labels_train = labels[train]`. This is a useful Python shorthand which will use the list of indices from `train` to select which items from `data` and `labels` we use. We can repeat this process with the test set. ~~~ data_train = data.iloc[train] @@ -455,9 +454,9 @@ Finally, we need to train the classifier with the selected training data and the {: .language-python} - Once we have established that the cross validation was ok, we can go ahead and train using the entire dataset by doing `mlp.fit(data,labels)`. +Once we have established that the cross validation was ok, we can go ahead and train using the entire dataset by doing `mlp.fit(data,labels)`. - Here is the entire example program: +Here is the entire example program: ~~~ import matplotlib.pyplot as plt @@ -485,19 +484,19 @@ mlp.fit(data,labels) ~~~ {: .language-python} -## Deep Learning +## Deep learning -Deep learning usually refers to newer neural network architectures which use a special type of network known as a convolutional network. Typically, these have many layers and thousands of neurons. They are very good at tasks such as image recognition but take a long time to train and run. They are often used with GPU (Graphical Processing Units) which are good at executing multiple operations simultaneously. It is very common to use cloud computing or HPC systems with multiple GPUs attached. +Deep learning usually refers to newer neural network architectures which use a special type of network known as a 'convolutional network'. Typically, these have many layers and thousands of neurons. They are very good at tasks such as image recognition but take a long time to train and run. They are often used with GPUs (Graphical Processing Units) which are good at executing multiple operations simultaneously. 
It is very common to use cloud computing or high performance computing systems with multiple GPUs attached. -Scikit learn is not really setup for Deep Learning. We will have to rely on other libraries. Common choices include Google's TensorFlow, Keras, (Py)Torch or Darknet. There is however an interface layer between sklearn and tensorflow called skflow. A short example of doing this can be found at [https://www.kdnuggets.com/2016/02/scikit-flow-easy-deep-learning-tensorflow-scikit-learn.html](https://www.kdnuggets.com/2016/02/scikit-flow-easy-deep-learning-tensorflow-scikit-learn.html). +Scikit-Learn is not really set up for deep learning. We will have to rely on other libraries. Common choices include Google's TensorFlow, Keras, (Py)Torch or Darknet. There is, however, an interface layer between sklearn and tensorflow called skflow. A short example of using this layer can be found at [https://www.kdnuggets.com/2016/02/scikit-flow-easy-deep-learning-tensorflow-scikit-learn.html](https://www.kdnuggets.com/2016/02/scikit-flow-easy-deep-learning-tensorflow-scikit-learn.html). ### Cloud APIs -Google, Microsoft, Amazon, and many others now have Cloud based Application Programming Interfaces (APIs) where you can upload an image and have them return you the result. Most of these services rely on a large pre-trained (and often proprietary) neural network. +Google, Microsoft, Amazon, and many other companies now have cloud-based Application Programming Interfaces (APIs) where you can upload an image and have them return you the result. Most of these services rely on a large pre-trained (and often proprietary) neural network. > ## Exercise: Try cloud image classification > Take a photo with your phone camera or find an image online of a common daily scene. -> Upload it Google's Vision AI example at https://cloud.google.com/vision/ +> Upload it to Google's Vision AI at https://cloud.google.com/vision/ > How many objects has it correctly classified? 
How many did it incorrectly classify? > Try the same image with Microsoft's Computer Vision API at https://azure.microsoft.com/en-gb/services/cognitive-services/computer-vision/ > Does it do any better/worse than Google? diff --git a/_episodes/08-ethics.md b/_episodes/08-ethics.md new file mode 100644 index 0000000..04fa3b6 --- /dev/null +++ b/_episodes/08-ethics.md @@ -0,0 +1,67 @@ +--- +title: "Ethics and the Implications of Machine Learning" +teaching: 10 +exercises: 5 +questions: +- "What are the ethical implications of using machine learning in research?" +objectives: +- "Consider the ethical implications of machine learning, in general, and in research." +keypoints: +- "The results of machine learning reflect biases in the training and input data." +- "Many machine learning algorithms can't explain how they arrived at a decision." +- "Machine learning can be used for unethical purposes." +- "Consider the implications of false positives and false negatives." +--- + +# Ethics and machine learning + +As machine learning has risen in visibility, so too have concerns around the ethics of using the technology to make predictions and decisions that will affect people in everyday life. For example: + +* The first death from a driverless car which failed to brake for a pedestrian. [\[1\]](https://www.forbes.com/sites/meriameberboucha/2018/05/28/uber-self-driving-car-crash-what-really-happened/) +* Highly targeted advertising based around social media and internet usage. [\[2\]](https://www.wired.com/story/big-tech-can-use-ai-to-extract-many-more-ad-dollars-from-our-clicks/) +* The outcomes of elections and referenda being influenced by highly targeted social media posts. This is compounded by data being obtained without the user's consent. [\[3\]](https://www.vox.com/policy-and-politics/2018/3/23/17151916/facebook-cambridge-analytica-trump-diagram) +* The widespread use of facial recognition technologies. 
[\[4\]](https://www.bbc.co.uk/news/technology-44089161) +* The potential for autonomous military robots to be deployed in combat. [\[5\]](https://www.theverge.com/2021/6/3/22462840/killer-robot-autonomous-drone-attack-libya-un-report-context) + +## Problems with bias + +Machine learning systems are often argued to be fairer and more impartial in their decision-making than human beings, who are argued to be more emotional and biased, for example, when sentencing criminals or deciding if someone should be granted bail. But there are an increasing number of examples where machine learning systems have been exposed as biased due to the data they were trained on. This can occur due to the training data being unrepresentative or just under-representing certain cases or groups. For example, if you were trying to automatically screen job candidates and your training data consisted only of people who were previously hired by the company, then any biases in employment processes would be reflected in the results of the machine learning. + +## Problems with explaining decisions + +Many machine learning systems (e.g. neural networks) can't really explain their decisions. Although the input and output are known, trying to +explain why the training caused the network to behave in a certain way can be very difficult. When decisions are questioned by a human it's +difficult to provide any rationale for how a decision was arrived at. + +## Problems with accuracy + +No machine learning system is ever 100% accurate. Getting into the high 90s is usually considered good. +But when we're evaluating millions of data items this can translate into 100s of thousands of misidentifications. +This would be an unacceptable margin of error if the results were going to have major implications for people, such as criminal sentencing decisions or structuring debt repayments. 
+ +## Energy use + +Many machine learning systems (especially deep learning) need vast amounts of computational power which in turn can consume vast amounts of energy. Depending on the source of that energy this might account for significant amounts of fossil fuels being burned. It is not uncommon for a modern GPU-accelerated computer to use several kilowatts of power. Running this system for one hour could easily use as much energy as a typical home in the OECD would use in an entire day. Energy use can be particularly high when models are constantly being retrained or when "parameter sweeps" are done to find the best set of parameters to train with. + +# Ethics of machine learning in research + +Not all research using machine learning will have major ethical implications. +Many research projects don't directly affect the lives of other people, but this isn't always the case. + +Some questions you might want to ask yourself (and which an ethics committee might also ask you): + + * Will the results of your machine learning influence a decision that will have a significant effect on a person's life? + * Will the results of your machine learning influence a decision that will have a significant effect on an animal's life? + * Will you be using any people to create your training data, and if so, will they have to look at any disturbing or traumatic material during the training process? + * Are there any inherent biases in the dataset(s) you're using for training? + * How much energy will this computation use? Are there more efficient ways to get the same answer? + + +> ## Exercise: Ethical implications of your own research +> Split into pairs or groups of three. +> Think of a use case for machine learning in your research areas. +> What ethical implications (if any) might there be from using machine learning in your research? +> Write down your group's answers in the etherpad. 
+{: .challenge}
+
+{% include links.md %}
diff --git a/_episodes/08-learn-more.md b/_episodes/09-learn-more.md
similarity index 56%
rename from _episodes/08-learn-more.md
rename to _episodes/09-learn-more.md
index d185887..2b4bfde 100644
--- a/_episodes/08-learn-more.md
+++ b/_episodes/09-learn-more.md
@@ -5,31 +5,23 @@ exercises: 0
 questions:
 - "Where can you find out more about machine learning?"
 objectives:
-- "To learn more about machine learning"
+- "Know where to go to learn more about machine learning"
 keypoints:
-- "This course has only touched on a few areas of machine learning."
-- "Machine learning is a large and growing field."
-- "This course is designed to teach you just enough to do something useful."
-- "Machine learning is a rapidly developing field and new tools and techniques are constantly appearing."
+- "This course has only touched on a few areas of machine learning and is designed to teach you just enough to do something useful."
+- "Machine learning is a rapidly evolving field and new tools and techniques are constantly appearing."
 ---
 
 # Other algorithms
 
-There are many other machine learning algorithms that might be suitable for helping to answer your research questions.
+There are many other machine learning algorithms that might be suitable for helping you to answer your research questions.
 
-The Scikit Learn [webpage](https://scikit-learn.org/stable/index.html) has a good overview of all the features available in the library.
+The Scikit-Learn [webpage](https://scikit-learn.org/stable/index.html) has a good overview of all the features available in the library.
 
-## Ensemble Learning
+## Genetic algorithms
 
-Ensemble Learning is a technique which combines multiple machine learning algorithms together to improve results. A popular ensemble technique
-is Random Forest which creates a "forest" of decision trees and then tries to prune it down to the most effective ones. Its a flexible algorithm
-that can work both as a regression and a classification system. See the article [Random Forest Simple Explanation](https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d) for more information.
-
-## Genetic Algorithms
-
-Genetic algorithms are a technique which tries to mimic biological evolution. They will learn to solve a problem through a gradual process
-of simulated evolution. Each generation is mutated slightly and then evaluated with a fitness function, the fittest "genes" will then be selected
-for the next generation. Sometimes this is combined with neural networks to change the network's size structure.
+Genetic algorithms are techniques which try to mimic biological evolution. They will learn to solve a problem through a gradual process
+of simulated evolution. Each generation is mutated slightly and then evaluated with a fitness function. The fittest "genes" will then be selected
+for the next generation. Sometimes this is combined with neural networks to change the network's size and structure.
 
 This [video](https://www.youtube.com/watch?v=qv6UVOQ0F44) shows a genetic algorithm evolving neural networks to play a video game.
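The mutate, evaluate, select loop described in the genetic algorithms paragraph can be sketched in a few lines of plain Python. This is only an illustrative sketch and not part of the lesson: the function names (`fitness`, `mutate`, `evolve`), the bit-counting fitness function and all parameter values are assumptions chosen to keep the example small. It also keeps the selected parents unchanged in the next generation, and uses mutation only (no crossover), which are simplifications.

```python
import random


def fitness(genes):
    """Toy fitness function (an assumption): the number of 1s in the gene list."""
    return sum(genes)


def mutate(genes, rate, rng):
    """Return a copy of `genes` with each bit flipped with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genes]


def evolve(n_genes=20, pop_size=30, n_keep=10, generations=40, rate=0.05, seed=0):
    """Evolve bit lists towards all 1s; returns the fittest genes and the best score per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Evaluate: rank the current generation by fitness, best first
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Select: only the fittest individuals become parents
        parents = population[:n_keep]
        # Mutate: fill the rest of the next generation with slightly changed copies
        children = [mutate(rng.choice(parents), rate, rng) for _ in range(pop_size - n_keep)]
        population = parents + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(f"best fitness per generation: {history}")
print(f"fittest genes found: {best}")
```

Because the parents are carried over unchanged, the best fitness can never decrease from one generation to the next. Real problems would replace `fitness` with a domain-specific score, for example the validation accuracy of a candidate neural network.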
diff --git a/_episodes/ensemble_classification.md b/_episodes/ensemble_classification.md
new file mode 100644
index 0000000..1499a4d
--- /dev/null
+++ b/_episodes/ensemble_classification.md
@@ -0,0 +1,89 @@
+## Stacking: classification
+import seaborn as sns
+penguins = sns.load_dataset('penguins')
+
+feature_names = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
+penguins.dropna(subset=feature_names, inplace=True)
+
+species_names = penguins['species'].unique()
+
+# Define data and targets
+X = penguins[feature_names]
+
+y = penguins.species
+
+# Split data into training and test sets
+from sklearn.model_selection import train_test_split
+
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
+
+print(f'train size: {X_train.shape}')
+print(f'test size: {X_test.shape}')
+
+from sklearn.ensemble import (
+    GradientBoostingClassifier,
+    RandomForestClassifier,
+    VotingClassifier,
+)
+from sklearn.gaussian_process import GaussianProcessClassifier
+from sklearn.gaussian_process.kernels import RBF
+from sklearn.tree import DecisionTreeClassifier
+
+# define the estimators
+rf_clf = RandomForestClassifier(n_estimators=100, max_depth=7, min_samples_leaf=1, random_state=5)
+gb_clf = GradientBoostingClassifier(random_state=5)
+gp_clf = GaussianProcessClassifier(1.0 * RBF(1.0), random_state=5)
+dt_clf = DecisionTreeClassifier(max_depth=5, random_state=5)
+
+voting_clf = VotingClassifier([("rf", rf_clf), ("gb", gb_clf), ("gp", gp_clf), ("dt", dt_clf)])
+
+# fit voting estimator
+voting_clf.fit(X_train, y_train)
+
+# let's also train the individual models for comparison
+rf_clf.fit(X_train, y_train)
+gb_clf.fit(X_train, y_train)
+gp_clf.fit(X_train, y_train)
+dt_clf.fit(X_train, y_train)
+
+import matplotlib.pyplot as plt
+
+# make predictions
+X_test_20 = X_test[:20] # first 20 for visualisation
+
+rf_pred = rf_clf.predict(X_test_20)
+gb_pred = gb_clf.predict(X_test_20)
+gp_pred = gp_clf.predict(X_test_20)
+dt_pred = dt_clf.predict(X_test_20)
+voting_pred = voting_clf.predict(X_test_20)
+
+print(rf_pred)
+print(gb_pred)
+print(gp_pred)
+print(dt_pred)
+print(voting_pred)
+
+plt.figure()
+plt.plot(gb_pred, "o", color="green", label="GradientBoostingClassifier")
+plt.plot(rf_pred, "o", color="blue", label="RandomForestClassifier")
+plt.plot(gp_pred, "o", color="darkblue", label="GaussianProcessClassifier")
+plt.plot(dt_pred, "o", color="lightblue", label="DecisionTreeClassifier")
+plt.plot(voting_pred, "x", color="red", ms=10, label="VotingClassifier")
+
+plt.tick_params(axis="x", which="both", bottom=False, top=False, labelbottom=False)
+plt.ylabel("predicted")
+plt.xlabel("test samples")
+plt.legend(loc="best")
+plt.title("Classifier predictions and the majority vote")
+
+plt.show()
+
+print(f'random forest: {rf_clf.score(X_test, y_test)}')
+
+print(f'gradient boost: {gb_clf.score(X_test, y_test)}')
+
+print(f'gaussian process: {gp_clf.score(X_test, y_test)}')
+
+print(f'decision tree: {dt_clf.score(X_test, y_test)}')
+
+print(f'voting classifier: {voting_clf.score(X_test, y_test)}')
\ No newline at end of file
diff --git a/_episodes_rmd/.gitkeep b/_episodes_rmd/.gitkeep
deleted file mode 100644
index e69de29..0000000
diff --git a/_episodes_rmd/data/.gitkeep b/_episodes_rmd/data/.gitkeep
deleted file mode 100644
index e69de29..0000000
diff --git a/code/Classification.ipynb b/code/Classification.ipynb
new file mode 100644
index 0000000..7429888
--- /dev/null
+++ b/code/Classification.ipynb
@@ -0,0 +1,984 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "c82a3270-a3a5-4a4d-af15-da2a8fe22e17",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np \n",
+    "import pandas as pd\n",
+    "import seaborn as sns"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f3e361ed-e348-4bf4-9e7e-906fe175821c",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## Loading the penguins dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "7f0293df-38f1-4e5a-baaa-9368677e9582",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "dataset = sns.load_dataset('penguins')\n",
+    "\n",
+    "feature_names = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']\n",
+    "dataset.dropna(subset=feature_names, inplace=True)\n",
+    "\n",
+    "class_names = dataset['species'].unique()\n",
+    "\n",
+    "X = dataset[feature_names]\n",
+    "\n",
+    "# Y, class_names = dataset['species'].factorize()\n",
+    "Y = dataset['species']\n",
+    "class_names = dataset['species'].unique()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 72,
+   "id": "6113c8f7-c0e9-49f5-b55b-0d161cf7f6ee",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "
[HTML rendering of the 342-row, 7-column penguins DataFrame omitted; the text/plain output that follows shows the same rows]"
+      ],
+      "text/plain": [
+       " species island bill_length_mm bill_depth_mm flipper_length_mm \\\n",
+       "0 Adelie Torgersen 39.1 18.7 181.0 \n",
+       "1 Adelie Torgersen 39.5 17.4 186.0 \n",
+       "2 Adelie Torgersen 40.3 18.0 195.0 \n",
+       "4 Adelie Torgersen 36.7 19.3 193.0 \n",
+       "5 Adelie Torgersen 39.3 20.6 190.0 \n",
+       ".. ... ... ... ... ... \n",
+       "338 Gentoo Biscoe 47.2 13.7 214.0 \n",
+       "340 Gentoo Biscoe 46.8 14.3 215.0 \n",
+       "341 Gentoo Biscoe 50.4 15.7 222.0 \n",
+       "342 Gentoo Biscoe 45.2 14.8 212.0 \n",
+       "343 Gentoo Biscoe 49.9 16.1 213.0 \n",
+       "\n",
+       " body_mass_g sex \n",
+       "0 3750.0 Male \n",
+       "1 3800.0 Female \n",
+       "2 3250.0 Female \n",
+       "4 3450.0 Female \n",
+       "5 3650.0 Male \n",
+       ".. ... ... \n",
+       "338 4925.0 Female \n",
+       "340 4850.0 Female \n",
+       "341 5750.0 Male \n",
+       "342 5200.0 Female \n",
+       "343 5400.0 Male \n",
+       "\n",
+       "[342 rows x 7 columns]"
+      ]
+     },
+     "execution_count": 72,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "53b11b03-1216-4875-ba25-58564c6fa71f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Split data into a training and testing set\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "\n",
+    "X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fbf3b554-a187-4732-b1cf-3181a4fb834d",
+   "metadata": {},
+   "source": [
+    "### Visualising the penguins dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "id": "fd0b8c2d-c2c7-4322-89c6-f3dacd4d990b",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import seaborn as sns"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "id": "79f48d94-abf9-486e-9224-d763f9657726",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']"
+      ]
+     },
+     "execution_count": 34,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "feature_names"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 74,
+   "id": "17543ad5-10dd-45fb-9558-eae8cb1e863f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       ""
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "# Vary feature indices to show the data from different perspectives\n",
+    "x = feature_names[0]\n",
+    "y = feature_names[1]\n",
+    "\n",
+    "fig = sns.scatterplot(X_train, x=x, y=y, hue=dataset['species'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "id": "d766a032-ce19-4509-811c-42bcc2f663d9",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       ""
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "# Vary feature indices to show the data from different perspectives\n",
+    "x = feature_names[3]\n",
+    "y = feature_names[0]\n",
+    "\n",
+    "fig = sns.scatterplot(X_train, x=x, y=y, hue=dataset['species'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "03a5a355-747c-4516-ace0-e9301c063c50",
+   "metadata": {},
+   "source": [
+    "## Decision tree classifier"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "id": "e65bdaad-ea41-4e9b-bc81-c65d63e83fe2",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "
DecisionTreeClassifier(max_depth=7)"
+      ],
+      "text/plain": [
+       "DecisionTreeClassifier(max_depth=7)"
+      ]
+     },
+     "execution_count": 37,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from sklearn.tree import DecisionTreeClassifier, plot_tree\n",
+    "\n",
+    "clf = DecisionTreeClassifier(max_depth=7, min_samples_leaf=1)\n",
+    "clf.fit(X_train, y_train)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "36b7e843-7770-4541-ba0b-4d17201d2b67",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0.9710144927536232"
+      ]
+     },
+     "execution_count": 38,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "clf.score(X_test, y_test)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "id": "f3f04b11-f7be-4447-b637-dc03b1f4e7a4",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "image/png": "",
+      "text/plain": [
+       ""
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "fig = plt.figure(figsize=(12, 10))\n",
+    "plot_tree(clf, class_names=class_names, feature_names=feature_names, filled=True, ax=fig.gca())\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "id": "71cb6e2b-281a-4600-b82f-bf94c3f7fe5e",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "image/png": "[base64-encoded PNG image data omitted]
6tufzu0dTQs3VuM1aPBHaA1gdrcEtMPzd5CmfU+D8tvYPV4lGqGU8wcYIlFLAolJi2ki2/iq6kVNP9zj5yIRIkiRJavIETootawg4oyVi86HKjQool/cgx7i/TvfIMWYSd9lriHcWV+0p6tqCEtsGwsU9BO3qhvHrWsTf+4GZQ4NpfsMMjtieRPSJgjX+UFRS+fyQIIxOAZQ6/6o2BoupPbHuGfDxWkTWovLH0zTChozEr1dv0p3/V6dnBI0463uYFpYi1i1B/P2c/i2i8R/9KUnum9CNzDre4+Qkt+6QJEmSTgqZzhdxDfdHubwPhAaD2YTSpjnqpAvIsn9Q53pELvchMoPeQb1raPlWFiYNQoJQLu2N+xI7ucZsAtPPwvhpXeXNUXMLMN5ZQjPlSVKNe1DuHIjStz1YLWCzlu8ddvtZpOp31xhDtPoc4p2liKxjNljVdcT8zfgfaIXN3LNOzxhsuRLTCjfir72Vkj6RmAYfraeZ6ak6Xf9kJnuIJEmSpJOEmxTHbdja9yK0/Q1oSivKxHZy9ZfQ3fWziq7IPZdSy2pCrhqHH93QRR65zKCsbB3NzM/CHzs9n+h0oezOR+sYzGHnFQQNuoygQReCMChQ5lHo/IWaijuaTS1RkwxEiefNVY35Owi77RZSqTmx8iZEjEKsXuuxTWTkYCk+A8Xi12T3G/MlmRBJkiRJJ5Uy1waO4Lu9z3Qjm2zn/6q8blZaIDI2ez1PSSrG1CWWMjZT4JxDAXNO6L4mLRZSqy75r5BXiEbnE7pmlRidZnBXUxwypwQtNgS3LhMiSZIkSao1BStBlssIEkMQuMlXfqTYuZCmtG+ISYshVLsRi2iLW0kj15h1XMNtLpIxRYQivEyaFs0DcOtptY7LrR+B2CDvB9gD0anbpGdhcaNoWuXinccK80c38lCVYOzm0fiL3ugUkscXlLnW1eneTZ2cQyRJkiTVC4vWlgTz94QtHoTp9YOY30ol8q9rSbB+i6aGNXZ4AISYbyG+6B0CvgzDNGMvfp8oxB55nmaWqTWem2t8hHJRd8+NZhOiUxhlzo21js3lPogRbwI/q8d2ZWhncvio1tcHyFO+LZ/f5On6kaE4A1Lw086khfgK+y+dMM3Yh+29fKJ33UOc9SMULHW6f1MmEyJJkiSpHpiI1V5DvLEMsXYPlDqgqASxaBvKB1uINb3W2AHiZ+5LaPJwjJmLEIePgMOJOJKF+OxP/NY1J8Rc/fYmLv0Iakw02tB+5ROu/2EPxHzjpeimojrHmGY8jnr7IJQw+9EXNRVlSDfK2iXXuZemwPkN7vNsKD3bwDH7vCqxUXDzmWQZb9LM8QjitfmI7YfK36PcAsRPGzB9n02U5dk63b8pk0NmkiRJUp0FWS6GFUngYUKwyMxFO6xhadEBp2t3I0RXLpyJGD96nnskFm/D3vtq8vjU6/kh5uvQ561GUSyYb7y0fC6OpiJKynDPXYZpZCdMIdF1GjZzuneRbJ5I5IQHsTj6gEsgAnRymEOB44taX/coneSy8YRdeDtB51+IUqqAVaHUvI0s942EmSYgftoOetUhTrE7Gb/iwaiWQAxR9+SvqZEJkSRJklRnAWIgYnuy13ZlYzr+LfvipPESIs1hh2Ivk4UNgZLjQgkOQIhij4cEiAGInTsQLjfG5qrPoW7KxHZhD4pKf6tTnC73IVK5q7wHxwLUe9VxNznON8u3ObFQXuX774KWNnEG4pCX7UsAdmVh7d2pxnpKJyOZEEmSJEl1JnCgWMxV9l2taLeaEIqjQWOqQlOqbzebAO8rsATO8mNcXpbP2zSEKM8sFMUfm6UHYFDm3HwSLWM3QFM99hABYDOVvw+nIDmHSJIkSaqzfL6Hs1t6P6B/c4qcCxosHk8c5j0ozcI9N9os6MHFCOG5BhBAnvIdSt82XttF72aUONYSZXmGBNeXNFtzC83+mkCCPodIy6NUmrTTRBUoc1F6tPZ+QMdwypxbGy6gBiQTIkmSJKn
OylwbcHdUoGWzKm1Kn7aUhuxAN7zs8dVAsvTXUEb3Btu/VkqpKuoNA8gyXq32/GLnAoy+4eAhqVIGdabIupRoy3P4/x6CeH0RYvE2xMKtiBkLCVjWgmYnwYTkAte3iKEJKKHBVdqUS3qTp86hKZVQqE9yyEySJEmqFynO24m59mUsWe1RNqSDSUWcGUNJ4GYynE83dni49WRSzQ8Tfe9/UHbmoxwqQkTZoEcUmcoMSt2eKzgfZZDsuoWYm6ZjTusAm7LAT4OzYiiyLSNffEvzlJcQm1ZVOVOs2oNf13PRgiOb9F5hQpSR7L6V2NumoyWpKNtyEMEm6BNDgeVn8lyfNHaIPiMTIkmSJKleCFFCqvNOTKEx+A3vgxBuShx/YrjyGzu0Cg73Ng5zGbZOPTF3SUAXmZQ4VnG8vR6GkUeK82bMkS2wjeiJEGUUO5YjXCWEW+6HZYe9n7wikeDLRpLrqFstIV/TjXSSnNdjad4Wa0IXDIopcaxAuLwPJ54KZEIkSZIk1Su3foTC0p8bO4xqlTk3UkYdiijqibhKEyu9polgKK0maShxohrVVKJuYpzufXXeMPdkIhMiSZIkqcFYTG2JUO7F4oxHAG5LJlm8jsO15bjOV9UQwk0T8Xf3A5eBsLnIVWZT6PzRt4EfhxJlLf6dLocML9trdImiRPkes6kVEco9WFytwTDQrblk8dYpvzVGUycTIkmSJKlB+JkGEF08BfHlX4ic/QCYggOJueoZsiNmUej+odrzNTWSeNPHMGcH4uBiABSzifBBlxPQawBpzod9/ATVK3L+TkTfu2DNfij7V4mBQH9Et1CE4SSu7A3El+sQmeXPoAX6E33Fo+TGfEe+6/NGiFwCucpMkiRJahAazXgM453FiJxj5hQVFCE+XkqE8zZUJbDaK0Sb/gMf/IU4mHL0RZcbMX8zth0x+FvO9VHsx0snVdyPctcglG6tQFFAVVF6tkG5cyCpxn00U57GeGcJIjP36GlFJYhPVxBaPBpN9VIWQPI5mRBJkiRJPhdgOR/WpnjeZV2AWLyXYPPVXs9XFTvm/EhEVp7HdrFwO2HipvoJtg6c7l0kGteQf/F2jIf7YDzYi7wL13HYPQqTGoOyOROcXkpP/7GHENOYhg1YqiCHzCRJkiSfs9IWEqtZbZaSi9Vo67XZZIqGtALv55eUobqr72FqKIYoJNf5Lrm8W+l1i9Ia5XCh92reR7KxCO/vgeRbMiGSJEk6iVhM7bAr16IJO6XqRgpc358U20K4lFSI7AYHvRwQHoRLPeD1fF3PgvDyhEeJDEXt2QnFz4ZIy0LfuBMAYXL5YN+v2lAIsAwhSFyAwKBQmUeJcwVujiCi/GGXl7PC7LiVIw0bqlRBDplJkiSdFDRirK8Rm/IsAbP9sb6bT+jCs2mpfo+fqW9jB1ejIufvMKCF990rhrQl3/2V1/N1Ixs9ogzTVRegndsHY8cB3EvWIkrLMN90GerIfuQpX/om+BNg0qJJsH5P5Prrsb1fit+HTqK2jKeF9VvKXFvhzFhQvbwJQ9uSq3/asAFLFWRCJEmSdBKINE/B+juIz1cjUjIgvwixbh/GjAVEO59CUyMbO8RqCVFKrnkW6piBYDEfbdA0lMv6UBi8EN3IqvYaDvYhUtNxfzsfkZwG+UUYW/bg+uA7tNhoyvTG32OruekteGcDYsFWRE4+IisP8etm1A+3E2t6lWztbdRx54L1mO1DVBVleA+KI9fh1pMbL/jTnBwykyRJauIUxUaAoy9i48KqjS434udthF19C5nOlxo+uBOQ75qDKyaDiPsmohZYwTAQIQrZfEyhq6Y6Qib83b3RV3nYIFY3cH+3lLCxt5KuP+6T2I+Hn3kAyuZ8RG7VuU4iIwdtv8DZ+jBpEf8l4r670Qr8wa0jQjVymE2Bc04jRC39QyZEkiRJTZzV3AF2ee89EQdS8TP6N2BEtVfiXkIiS1D8bYCCcB3f/CeLqSUc8FLwEBCpGVjdA+o
nyFoKYghsTPF+wPpUAtqcS47rTZL4E8VmBUVDuEoaLkjJKzlkJkmS1MQJYYBWzY9rRUF4m5vTRAlRdkKTwQU6aFr1Bx3zFpm0WGyWHg06lChwVxujomnlzwGAisXcGoupDYpia5gA/0VTw7BZzsCsxTfK/ZuaRk2Inn76aRRFqfTVsWPHivaysjImTZpEeHg4gYGBXHXVVaSnp1e6RmJiIiNGjMDf35+oqCgeeugh3G53pWOWLFlCr169sFqttG3blo8//rghHk+SJKleOFw7oL33gn1KlwSKlEUNGFHDc7kPQkJwebFDD5S2cZSoa7CYOhBv+ZK4jFeJ3nIP8XkzibPMwqQ193mMBfwC/bwnF6J/HEX6r4SYbqCl9hMxe6cQs+MBEvSviTQ/DtSQ8NUTTY2gueU94gvfJ3rLvTRPe5kW5m+wmXo2yP2bqkYfMuvSpQsLFhwdEzaZjoY0efJk5s6dy9dff43dbueuu+7iyiuvZOXKlQDous6IESOIjo7mzz//5MiRI9x4442YzWaef/55AA4ePMiIESO44447mD17NgsXLuTWW28lJiaGYcOGNezDSpIk1YpOvvoD9osGIH7bXLkp0B+Gtyff9UTjhNaAyswHCbhkEO6fFldu8Ldhvvx8ipV7ae6ajvHuMkTJ0U1WNXsgcbe9S5I6rsaJ23XhcG3F3c5Aa9EMEiv/8q60b46zeTZBxnCCd56N+LnyXKiAHq0wXzSDVMddPosPQFWCiDN9hPLBBkTW9qPxWczETHiOVP9Hcbgbf3J6Y1CEEN5qRPnc008/zQ8//MCmTZuqtOXn5xMZGcnnn3/OqFGjANi1axedOnVi1apV9OvXj19//ZWRI0eSmppKs2bNAJg5cyaPPPIImZmZWCwWHnnkEebOncu2bdsqrn3dddeRl5fHb7/9dtyxFhQUYLfbGRp1KybVUvMJkiQdN2fHOJKH2HC0cDKhz3JGBW9o7JCapHDzZIJKBqP8mQyFDkTncETHYFKN+3C5DzV2eD7X1rIOY9VO1DYt0DfvhsJilIQY1FZx6HsOoZzdAv2tPyC/qMq5Snw0JaNLyXA949MYFSWAWPOrmI/YUdYeAVVF9I3BGZVGmvsJWrg/QczwMDkeUK7ty5GEqThcO30WX7jlHoJ+bInYlVi10c+KcXc3klw3+Oz+jaGo0KBPl3Ty8/MJDg72elyjzyHau3cvsbGxtG7dmjFjxpCYWP6XtH79elwuF0OHDq04tmPHjrRo0YJVq1YBsGrVKrp161aRDAEMGzaMgoICtm/fXnHMsdf455h/ruGNw+GgoKCg0pckSVJjynZNJ9EyhqwLF5J75WbSOszgsPPy0yIZspp7IpJz0JetxzX7FzAMlIhQjP3JuN77BmPFRtRim8dkCEAkpeHv7uXzOIUoJsU5gZSoh8i9fD05l64hOfxeUp134W85G1ZVM+l66X7sjPZpfIHGIMRuD8kQQKkDLc8fVfGeNJzKGnXIrG/fvnz88cd06NCBI0eO8Mwzz3DOOeewbds20tLSsFgshISEVDqnWbNmpKWlAZCWllYpGfqn/Z+26o4pKCigtLQUPz8/j7G98MILPPOMb3+TkCRJOlGGKKTQ8W1jh9HgTFokFP09Cdvpwtj0r3LPqgJlzuov4hYNNU0Hl36YPP2TSq9pIgSl0Ol9646iEjSifRyYwGsAAMUO1OAADP306wRo1IRo+PDhFX/u3r07ffv2JSEhgTlz5nhNVBrKlClTuP/++yu+LygoID5ezsSXJEmqC5PWnHBtEjZXBwCc5iSyjTdwuvdVe16pcyNKixjvBwT4oYQElVfC9vSBbzZh2JzHtbWHv+UcwsR4NGcgwqxQpC4izzULQ3jufTpeZcZWRKfhsNPzFiVKm2hKlY11ukdNdEshmr8NjpljVUlUIG4906cxNFWNPmR2rJCQENq3b8++ffuIjo7G6XSSl5dX6Zj09HSio8sz6Ojo6Cqrzv75vqZjgoODq026rFY
rwcHBlb4kSZKk2rOZehHveg+/TwX8bzn8bznWD/JpXjidQNMF1Z5rUu2AgdIy1nP7+f3QzQ6UPu08titDupKrfFZjjJGWKUTtnYj2xi54dQXKy8sJ/rkd8ebP0dSwGs+vjsO1BaOdHwQFVG3UVJTz25Hv8m1xxhzlfZTh3T03dm5BsWUN4PbcfoqrVQ/Ra6+95vF1RVGw2Wy0bduWc889F62mmhH/UlRUxP79+xk7diy9e/fGbDazcOFCrrrqKgB2795NYmIi/fuXFyDr378///nPf8jIyCAqKgqA+fPnExwcTOfOnSuOmTdvXqX7zJ8/v+IakiRJUkNQiWYqxtuLwXm0m0Zk5SLeXUzkffdTrKzwWptIYODOPIhpSF+M7fvRN+wAlxslPARtSF9EWibuViXo55sxh56BWL4LSh0QFIByQWdK2x2i0PlTtRHazL0JOHgG4rvVx94YseMQSlYBzW5+nlTnHXV6F47o99P8jrfg172I7YdBCGgRjXpZN9K15xHu4jpdvyYlzhUUtRlA4HX9Eb9tg7xCsFpQzm6Pu5+NrLJHfXr/pqxWCdH06dPJzMykpKSE0NBQAHJzc/H39ycwMJCMjAxat27N4sWLqx1mevDBB7nkkktISEggNTWVp556Ck3TGD16NHa7nfHjx3P//fcTFhZGcHAwd999N/3796dfv34AXHjhhXTu3JmxY8cybdo00tLSePzxx5k0aRJWqxWAO+64gzfeeIOHH36YW265hUWLFjFnzhzmzp1bm0eXJEmSasHfch78lVYpGapgGIjF+wi+cBT5Ts+bm7rcByE+CNf/fkTt1h7zdReDqiCKStBXbgCbmVItm3z9O2L6vIq5WxdwGQizgtOWTLbj9RpjDGNCeZLggcjIwZLfDTUgBMPIO5FHr/wc+mESleuwjxhD0MWDQSiUqdvJ0SfgdqfW+ronIsv1EkUJPQi7bQImoxuG6iBP+YKist8Ao0FiaIpqlRA9//zzvPvuu7z//vu0adMGgH379nH77bdz2223MWDAAK677jomT57MN9984/U6ycnJjB49muzsbCIjIxk4cCCrV68mMrK8suj06dNRVZWrrroKh8PBsGHDeOuttyrO1zSNX375hYkTJ9K/f38CAgIYN24cU6dOrTimVatWzJ07l8mTJzNjxgzi4uJ4//33ZQ0iSZKkBmSjExzM835AYjY2oyP5Xg8Q5PElIRddgDF3Y+VJ1TYLyp2DKBD/R6z7NcTby3EVHd0OQwuzE3frBySJseiG9+0/TEYYFOzy2k5yHubOcTjqkBBB+cT4XOdMcplZp+vURZlzE6lMKv9Gr/7Y00Wt6hC1adOGb7/9lh49elR6fePGjVx11VUcOHCAP//8k6uuuoojR47UV6yNStYhkiTfkXWITn3B1qsI++N8xIY9HtuV1s0pvDqZbOeMaq8Tbr6HoOKhsCIRCsqgbRiiZxRp4hEi1LsxvZsKHjZXJSGGkmuLyHQ9i6qGEml7DJveFpeWRobjBdz6YeLNs1Ff3eS5FwtQbjyb5KjJuPVqls5LTc7x1iGqVQ/RkSNHqmyPAeB2uyuWu8fGxlJYWFiby0uSJEmnmELnr4QNvBm8JEQMaUO+/kqN18l2vUau9WOChl+MZoRRpvxEiXM5oGDW4xC5Xnp4Dh8hQB+CsD5KiHMU+q/rEanbsIbZaTnoC0rt28kzviK870jE8h1Vz7daMKLB7ZLJ0KmqVqvMBg8ezO23387GjUeXB27cuJGJEycyZMgQALZu3UqrVq3qJ0pJkiTppCZECXnWb1FGnVV5o1pFQbmgO8Xh63Dracd1LUMUkO/4khzXW5Q4lwECRbHWWIdIcdsIyRqO69XZGBt2ItKyMHbsx/XWHGy7mmFVzkAfGIDSMa7yif421AmDSBf/OcGnlk4mteoh+uCDDyqtBIPy3qHzzz+fDz74AIDAwEBeeaXmbF+SJEk6dZhNrQlXJmLRW6EoCiXaenLc76EbmeS5P0J
vlU3YA7egpDtBNxCxAeTxNXnOT2q+eDWEKAV7NdMZzCY0WxSuWd+CUXWmiPvnpYS0v579jrOJvPRR/Iefj0gvRAm0oYeUkGY8jMPtecJ1fVOVQELMNxJoDAEd3OZMcsRMylybaz5ZqrVaJUTR0dHMnz+f3bt3s3v3bgA6dOhAhw4dKo4ZPHhw/UQoSZIknRQCTRcQUXQPfLsRkbECgICEGAKumsUR0wM43DspdP9EIT+hRUQAGrorvfqLnoBCdT5BvdoiNuyv0qac1xkQ5cvMPXHrkF2EYg/4e78zM6bYSAyjCMPVcFWbNTWKONMHKD/vR+xcAQLMIUFEj3yC/JjfyHW/32CxnG7qVJixQ4cOXHrppVx66aWVkqF/BAcHc+CA54qckiRJUtOkKH6YTa1PqBChqgQS4b4f8c4iRMbRlVzi8BHEW0uJVl+qdLxuZKEb3pMhkxaN2dQSMHts19RwzKZWKMrRArvZzjdxDwtAGdwVrH/3FvnbUEb2pqx3NjhrKDhYqXaeC7eeiiG8JUMmzKaWmLRqqmfXQoxpGsp76xA7Dh2tuJ1XiPhsBSHZI7CY2tbr/aSjfLp1Ry0WsEmSJEmNRFUCiTJPxVbSFhLzwe6PHuEgw/gPDvf2as+1m6+BX/d6HI6izIGyNQe/bn0pda6p9joBpsFEKPegpDkRpS6U+BCKzSvJdE4DdKymM4hSp6BlauWrzGLtlPntIt31NEKUkFx2K4FnXUTomdej6FYMrZAc5T1KnMtoa92EEhqM8LQKzWxCCQnGcObW8C5pRJgfJNB9DuJQHorNhIixkSXeoNi9oIZzq2fSojFlBCJyPBcfEHO3ETZuAmk8Uqf7SJ416l5mkiRJUtOgKDbiLJ+gfL4Lkbio4nXVz0rsrdNItT1cbVJkE10QhzO8X39vLtZunSjFe0IUaLqYyIxbMWavQLjLi+MIIOCMlliGv022/hYxJf/B+GAp4pgJ1LZWMcRfN4uksjEInBQ551HE3zsUHFNnUHdnYRo5CNfnv4BeuQChafg56MXZqJZADOF9hXSs9U0sfzgRGxZUxIemEXn9RNRofwrd1VfDro7Z1BoOek/IRHo2ZvrW+vpS9ZrUXmaSJElS47Cbr0P9IxUS/5XUlDowPlhOlPpYtee7yUIJDvTaLkL9cCvZ1VxBIUJMxPh0efl8nmPP3XwI8w4b0ZYXMN5fWmU1mTh4BHVRFkGWK6qN0aAIfc0WzOOvRDuzK0pcNGr39phvvgKRno2hFGF42ToEwGbuiWV3IGLDv6aC6DrisxWEi9uoy8eqYeRCWDUbm/tZEZR4b5fqRCZEkiRJEsHGxYhNXuZ8lpShZVurnVOUJ76EwW283+DsOIqd3oeU/Cx9YHuW5yE3QCzdg1YSAg7PS+vFun3YxeXe7w8UaysRQsf1wfcIpwu1fQKK1YLr87noew7h8k+luo1NQxmLWOKljpIQsDkDP2vte3Acrp2ItvbKZQmOoQzsSK7yea2vL1XPpwmRoii+vLwkSZJUTxS3Aob3faxEfgmq6r3Kr8t9kNLYfSgDO1VuUBWUUWeRZ/7W68atAKoaCtnV1BEqKEIR1Xxk6TqKu/oNxXNc7yCubI8SEYKxeTf6ojXof20Dixn15v5k6M9Xe76mhEOB981XlWwHmhpS7TVqkiVeRr1lEJj/NaOlUzzuPhrFzkUez5PqTk6qliRJOo0EWM4nVIxF1QMQWhm5yucUOX9DNxeiBfihRIaSO7A/Zf4BWIRO8MatmDZuQ4kOxq2nYzF3IFyZiNndHDRBoTKfPNdshCgh3TGFsLPvJLjfxZCUDyYV0dyfbPExhe7vq43L6d4H7UJgred2JT4aYapm062gAHRTHnjedQMoL+iY7B5Hs5uexVLYDTIKISwAd0gBqWISLvfhamN0sIOA+GaIQ543YRXt7Thd+6q9Rk2K3UtJC3ETOXkyaoaBKHaiNLdTYllHZtkUji49k+pbvSREuq6zdetWEhISCA0NrXj
9119/pXnz5vVxC0mSJKlOFGKsM7ButiOW7IBSB4rVQsQ5V2PvcwU54iMCb/4vq4qDeWePQXapwKKaGd6iHzf274XNfzHBxuWEZV2D+GkzIusQqAr2rt0IHv4Vye5b0I1MclxvkcNMzPFxCPTj3urC5T6AHmeg2gMhv6hq9Bd3ptS8HXNcJCRnVm0f3p0cXq7xPrqRQ6pzEootAFOrKHQjF8OVd1wx5rg/JnDEB4i3UqvmJUEBGAkaTufe47pWdUrdK0lkJVp4FGqkP253KsJVfRVuqe5qNWR23333VVSk1nWd8847j169ehEfH8+SJUsqjhs4cCBWq7VeApUk6dQQHm0nNMr70IvkG3bLaKyr/BC/boRSR/mLDidiwRbMi9z4my5iuTuY5zfrZJeWf9o7DfjxkM7UvRYKLCGEFYzG+HApIiuv/HxDILYcQHl/PTGm/x5zNwOXnnjCm6Ae0R9Eub0/SocW8PeMCyXMjnLzeeQEfUGa4wG4oQtKz7ag/v3xFeiPenU/ilpuoNT113HfS4hiXO6DGCewc71upJMTMAv1lkEo4SEVryvt41HuGMAR/cHjvtbx3S8Dl/sQApkMNYRa9RB988033HDDDQD8/PPPHDx4kF27dvHpp5/y2GOPsXLlynoNUpKkk9+FN53FOaO7kKYnoika4aI5f7y7jpU/bG3s0E4LIWIUYsUKj23ir724hk3g3c2e5xBtztDJc/Un6rcfPZ+fnYcpIxBTePM67QTv1lNINK4n9PJxBBjngi5wmdPJFo/hcJVvuJroHI39wmsJHnox6Aq6KZ8cZXqN9Y3qS77ra0rDtxJ+652YXV1AUyhSV5DrfvaEkiup6alVQpSVlUV0dDQA8+bN4+qrr6Z9+/bccsstzJgxo14DlCTp5DfmiQtwnXmYFw7dgfF3YRhNMXHdbXdhj+zPD8uTGjnCU59SqiF0L5OmhaDEpZHn8D5H52CeSvti75Oi2ZeDpVnbOiVEAIbIJ9v5Gtm8Vv7CvzpHhCglz/kxeXxc/kI1c4Z8xenaxRHuKf+mhuLX0smjVkNmzZo1Y8eOHei6zm+//cYFF1wAQElJCZpW/Sx/SZJOL+HRdmLP8eOnI7MqkiEAXbiZnfQqZ17bGpvN8/YMUj0yV//j3qSpaNUsDA6xKgiqO8CGIfIACPQbSXO/D4n1fwc/y1m1CLZuFMWPYOsVhFkmEWgdjqxBLB2PWiVEN998M9dccw1du3ZFURSGDh0KwJo1a+jYsWO9BihJ0slt0OgeLMz72mv7iqKfGXie3J/J10rNm1GaR3lsUyJCMUQyA+M8fyQEmBXM5hLKerT2fHFVgY6h6EYRbWxrabb7dsxf5WL9uoTYI8/T2rYUFe9FG+uT3Xw1CeIbwhZeSPBXLYlYcTkttZ/wNw1qkPtLJ69apc1PP/00Xbt2JSkpiauvvrpi4rSmaTz66KP1GqAkSSc3e3QAOc6qq4L+kePKIC7MCvW36bnkQZZ7BvHXz4L3VlXe8T0oAMadya+5/+OSjk+SXOjP/ryjPXn+ZoVHz1J5fv3PPNu9Jx2S4lF2HzPEqamoNwwkkzdJMM3B/eE8RHp5RWoBGHsOo3ZIIOHKXzhYNsinz+hvOpfQI9ciPpt/dBXYoRRYtZtmEx4gJTgNp2uXT2OQTl617kccNWpUpe/z8vIYN25cnQOSJOnUkrQti9btO5FW7HmeUCtrZw4cyIeABg7sNKMbmSQrE4i+/VlMuSGQXgiRgbjDi0nXJzLEdpifc/K4svtj+BnNOZSvEO4HwdZSpm9dSKH/Vn4r/h5x8c10vmgIHM4FfysizkqGeAOLpS3GX/srkqFjGbsPY0rrgTXyDByuzT57xgh1EuLrtagdW6P16gSahnA40f/ahvHVX0Tcdg+p3Omz+0snt1olRC+99BItW7bk2muvBeCaa67h22+/JSYmhnnz5tG9e/d6DVKSpJPXsm828cTN17MmcxG6qDwD1arZ6O0/mDl//QGDZIk
OX3PrKSTrt6AGhWIKaYauZ6I7jyYwlwQu55uC/2Nl5hk0s4VSWFhClqMA/OCC6F1cEbQBWMthzYa5fQLCKMHlKk90I7WHMDas9npvY802Qq8cR5rrfh89nRm10IbpiqEYaVm4vl0AZQ4ICsA0sBeYNBSHQnXToKTTW63mEM2cOZP4+HgA5s+fz/z58/n111+56KKLePDB+q3DIEnSyUNRFOzhgfgH2ipec5Q6+eGF1dzXZhqxAS0qXm8Z3I7JrV/h00cXYHjZv0ryDcMoQtezMIyqu7qPCt7A9DafcEbIJs4M28wF0bu4IHoXo4I3VBwjRBlO125c+rG9fkqVHeQrEQJfTm5WUFBtYei7D6IvXlueDAEUFuP+dTmiqATFXJ/1rxQ0NQxV8T43SlVDURV7Pd5T8qVa/etMS0urSIh++eUXrrnmGi688EJatmxJ376139hOkqSTk6IoXDppIH0ubUumOwWbyR9zYRDfv7ScHWsPsW7+LtIO5nLJPZOIaB0AKKRuz+Wth+aRkZwDHeMa+xFOC5oaRqTpcWylbSG7FOx+OP0Pke5+Ft3IRFFsRJofwd/Zh4llJRBowxWURYbxH1w1LC8v1P7AfkYv9CWeiyOqvTuR73rKB09VTuBENfnh2rjTY7u+YgOms0Z5bDsxCmHm2wgWIyGzFCwm9FAnWcarlLrLe8iCTVcRqtyIkukERUFEaGSL9yhyz6uH+0u+UquEKDQ0lKSkJOLj4/ntt9947rnngPK9y3S9mr1mJEk6JU1640r2xy7n+YP/Rfw9m9XPFMBtLzyJ7T9WNizaTfK+dN6+54fGDfQ0pip24kyz4NPNiNSjG4RaosKIv+ljkhhPrGUG2rdJiD0LK9pNwYHEjX+TFO1unLr3fbqyS2YQ2m8t+qZdlSdt8/c+ZPEBlJb9Wf8P9s89FD/0vCzvW3253BhluXXe0jza8jK2FQGIPxdW3Eu1WogeO4WM0NexKV0J2tkbMXcp4p/NcjWNyCvHYmrVjDzXR3ULQPKZWv3TuPLKK7n++uu54IILyM7OZvjw4QBs3LiRtm3l8llJOp20P6MFZa1SWJzxY0UyBFDqLubN/Y9xxSNnN2J00j/CzXejzNkJqVmVXhcZOfDZJmJsM9AW5SD2JFc+saAI473lRGlP1nAHN0nGeMwTr0Ab2g8lKgwlNhLT5YPRxg7hkPPyen2efxPCgWqrfmm/YqrbzH2ruSu2g7GIlbsqJ14OJ8aHS4lUHyCoYDDi5/VgHDN8qOsYX68hpOxKFEWuHmiqapUQTZ8+nbvuuovOnTszf/58AgPL/xEeOXKEO++UM/gl6XQydEIvfs/60mObW7jZ5VxHx14tGzYoqYoAd1/EQS+7tKdmYnO2Qazz0gNUVIIpNxBVDan2Hg7XRvaV9SG37wrEzT0xbuxMZpfP2F96FoaRUccnqIkAmxVCgjy2KvHRKOa6FQANZRws2u250TBQMzVYfMD7BZYfJsg8sk4xSL5TqyEzs9nscfL05MmT6xyQJEknF3tkIFl5aV7bs0QK9ojmDRiR5JGzmgnPAC4dqpnyIHKL0ewhx7Ffl0FuyRvk8sYJh1gXimLDKM7EPOpCXJ/PhZKyo40hQZiGn4O7LLtOq8xMShQi13vZAMWlQm7VieoVcooxi+jaByD5VJ2m/O/YsYPExESczsqbzVx66aV1CkqSToTJrHH2Jd04+5ouaBaFtL15zHt7DUcOeS8GKNWfI/uyadm6PQcLPP/mnGDqzM5D3ueeSA1D2ABF+Xu1lwdWDfysUOrw2Kw0C8at172XR1UCsZuvJ1AMBhRKlDXkuWehGznl7QQR6f8Uge5zUISCS0sn3XiKMueGaq8rRCnCZuD+dinmq4chikoQufkokWEoZhOub/9A3NK9TnuPOcReAmKjEEmefwEQfgLiwyEjx/MFWoTgUJbWPgDJp2qVEB04cIArrriCrVu3oigK4u//YIpSnnrLidVSQ/EPsvHI7OtZa8zjray
PcOplJLRox00f3M6f7+5n6VebGjvEU95v76zlxg9v5rWCqlXqgy0hxLjakrjH8y7rUsMpUH/G3qcX4q89VdqUri0pMv2JbUgXxFwPiUdMBM6AwwhnSZ1iMJta05w3YO4exPa1IARB7eIJvmQ2R0xT0CkkwfQN+u/rcG+cA7qOEh9N3KVvkhf0A1mOl6q9frFpJQEhkbhm/QghQSiBAYi126CoBOXsjuSpP9Qp/lzjIwKHv4V410NC5G/DHVqEel4r2LgP/l1KwqRB33iKnH/UKQbJd2o1h+jee++lVatWZGRk4O/vz/bt21m2bBl9+vRhyZIl9RyiJHk34ZWRfFn0CgvTv8epl3eRHy7Yy6v7HuLs29oQkxDRyBGe+tISs9n+TSbjW/4fdktoxettQzpzT4tpvH//qbfUuLz2zMm1IW2u8yPc5/uhnN0R/tmEW1VR+rRHvySS9JIHKet6BOWiHmC1lLcroHROgHHdSHPXNKm6MkXxR1FslV6LVacj3lqO2HaooqdK7E3CeGMR0bxAC9Ns3O//jLFue8XwnUhKw/X2HEKKr8Bial/tPbNc09GvaI7Sqy0UFCGS06DMgXJOZ1znauQ7Z5/QM/ybW08hN+RH1DEDIND/6LPGNUO541zSjIfI0l5FnTAYJexo/SElMgz19sGkK89Rpy4qyacUIbz1n3oXERHBokWL6N69O3a7nbVr19KhQwcWLVrEAw88wMaNG30Ra6MqKCjAbrczNOpWTKqlscORgOCwACZ+cSEzDj3ssb1lcDsGJI7jg0fmNnBkp6cufVtz8V1nYQ1X0RQTB9dm8PObf5KbUVDjuc6OcSQPseFo4WRCn+WVigA2HRrh5kkEiQshpwz8LLgDssgwXsTprtrr0jRp2C3XYDeuRHFpCLNBgfozec7PARcAgZaLCRVjUZ1WMEGRtogc10cIUXxcdwg2jyKUG1ByDdBUjGAHWeINhOqk2baJiLmbPJ6njOyHFhaJ+5OfPbe3iUe/No6k0murvb+CFbvleoKNkSguBWFxk6d8Q4HzG6CGeVTHyWbuQwQT0ZyhoCmUapvI1meiG+Ub8lnMHYjgLsyuOFDAYTpAtngTl7uaCdeSzxQVGvTpkk5+fj7Bwd6Lc9ZqyEzXdYKCymfyR0REkJqaSocOHUhISGD3bi8z8CWpnsW3jWa/Y6vX9kMFe7mqXajXdql+bV9zgO1rTtUf+ArNrW9jXuBCrFtQ8aoWFEDzm18l1foIDrf3f4tNh06+8wvy+aJ8crGHzooi5zyKmFferv/9dZwizA8RuL0b4rdliL+rVitWC82um0xpbDLszPJ6rmYOQOw66LVdHEjGYgysMQaBgzznR+TxUfkzuI4//uNV5lpHMuPLr29QJc9yunaTyt3/BOSTGKT6V6shs65du7J5c/lM+759+zJt2jRWrlzJ1KlTad26db0GKEnelBSVEah6T3gsmg29ppU1TUhoVDCDr+7D+deeSWTzqs8VaPfjvMt7MXT0WcS3a9YIEZ6+/C3nYd5kRazbW7mhsBjj3SU0U05sOOlUZNJiCcw5u3wO0rFbeDicGJ8sx995BgT7eT1fGDoEVlOjx9+GwOm9vZ7ZLGdgt44m0HoxiuI9bunUUaseoscff5zi4vLu06lTpzJy5EjOOeccwsPD+eqrr+o1QEny5uCOFG6xDENVNAxR9dfYcyIuZuWsHY0Q2YkxW0zc+vJIAtrrrHMswIXBmJuHQFIgb9/zI84yF2OeuIAWA+2sLf2DMlHCyLFnE1p0AW9O/IH87KLGfoRTXpgYi1jmeUsIypyoKTrmmBa49MSGDawJCVGvhwV7PTcKgbFmD8r5nRGbPB9jBLkwtW2PvmiNx3at/xnkap7rXdUns6klMeorqHtKUHblIuxWIvpOJN/6I7mu931+f6nx1CohGjZsWMWf27Zty65du8jJySE0NLRipZkkNYS5r61lwr2P8d7B/1RKiloFdeQsdTjP/jKrEaM7PnfMuIy
14d+x+fDRbQ3WsoT29jO4971bSdqRTm6vbXxz8NuK9nUsIzagBQ9+8iRPXfoRRnWbakp1phoBleva/FtmMVpc5GmdEJmJQeTmem1X0otwWVMwDe6CWLy9clvv1jjiUigTiQRcdT7u7xZWqgSttGqOclYbckuu91X4QPn2Js15C/HmivJl+/80rN6J/fJBGO2KyHf7PimTGke9bT0cFhZWX5eSpOO2+pdtAEy5+232u7ZRpOfS1q87pYdUXrr9c9yupl0CIiouDLVVUaVk6B978jdzdkIyHS9M4IXdz1VpTy1OZF3QfPpe1IVVc0+G+SsnL7eajdkeCPleeuOa23G5kz23NTA/S3/CxDhUQnCRSI54D6fbS89NPXIo+7DGtEN4eY9Egp0c4z1sZ3YjuOdFsCuzfGitYxTF1jVkOp4HDKLbvUjgg2Mxdh6GkjLUjgno9jIOOi6kviZFexNivgF+3AVFVcsLiB/XEfrgDeTzFd43TJNOZrVKiMrKynj99ddZvHgxGRkZGEblf6QbNjTFFSLSqWr1L9tY/cs2WnVujl+Ald/3/U5h7vGtiGlsfUd2ZlWx92XpGephjpR4n6i8InseN171jEyIfCyHD4ge9jBizuqqjfZA3FGl6M70hg+sEpVY61tYtvkjluyCol3YIsNofvGLFDRbSrbrNZ/ePc/9JfYLPkPsTqyaL5hN0KsZJY7llLCUHN7B1q0LoOFw7UC4SisOTXM8CvwfQd0vQ1UDKS5dhLssxaex/yPQOA+x28sGtAI4mIelVRucbllo9FRUq4Ro/Pjx/PHHH4waNYqzzjpLDpNJTcLBHQ3zQ7M+aRYVl1HdRFGB0yj12urSnZj96q2jV/KizLWBklY78R/ZGzF/KzjK/86UFjFwbTfS9AkneMW/6wCdyBKuGoSb78GywI1Yd/QXUpGZg5i1nOBrBlDSch2lrhPZbd7MiSyPMoxccvxmETbuOoxv10Nh+S8lSkQoynV9SBPPcLSHx0WZc1N1V6Ow9PsTiLWeCKX6zh+XjqLIsiunqlr9JP3ll1+YN28eAwYMqO94JOm0sn3pIc4Zdh47czZ5bA/Sw+kY2IfF/OKxvVf4QLbOPeS7AKUKGc5nCOg8jLDON6GWWcGsUmzaQI57LLqRfVzXsFiGoFgeILcsHIBQWzY4/4fDubCO0SkEGUMrlQQ4ljF3E+GTbiOZmhIiE+GWSQTpQ6HIDVYTDttBMvWXcOs1/8KR75qDI2ov4XfehaksDFQFh+UQWcbdJ0UNHoe2E7/mzRApXnr7WofjdMneoVNVrRKi5s2bV9QhkiSp9vZuTuQ6x2CiA+JJK06q1BZmi6St6EXW7nza27uzJ39LpXY/UwAXhFzHs1/UrfpuUxF0WAAW5sd1bOxQqpEJ/PdfryX8/VW9zkEj0ApH8fo6ldK/6//4mQK4q/f/MKzfsKOwagHR4y1QqalhkFHNMHFxKZozuIaNTTXibB9g+jUfselogmaNDCVu3PukaLfh0pOqOb9cmWsjKYw/WtSl4VbK11m2/i5xV8yEtzLhX1NBlF6tKbasRDhPogeSTkitEqJXXnmFRx55hJkzZ5KQUPMPAkmSvHv9tu+Y/PGTbA1exuq8+RhC56yQIfSxXchr478jP7uIBz6+g6Tm21ie9zOl7mLOsPdnUPAVfHDvb5SVeN6M82R1KDmS+Y0dRD3z16ycG3o1U5ZVzkhK3fDfNSovDLmalZnJlOhV/y6PJykyRHnl7GqZlGpH6IItl2Fa4UBsqtyTIzJz4f0/aXb7MyTrt9QYy8nMrSeT6f8Kkfc+DAv3IA6kowT6w3ltcLTKIrOGvdSkk1utEqI+ffpQVlZG69at8ff3x2yuvKdPTo6XnX4lSaoiL6uQpy/5iF6DOzLysntRNZUNX+3jyd8/rFhO//y1n9KpTysuGH0HFj8zO39L4qnvZ+FynBr7Ill2JWMnDrACFg4R2dgh1atr2vbg1z0mvGUkv+4x0c18Hl8drrz
t0bGJYXWJkRDF6PZSVJsVyqomVUrLWIq1tdUmRHZxNWL1es+NeYWY8kJRA4IxRM1bsZzMil2LKVX+wj78OmziDHRyyeMZnI5djR2a5GO1SohGjx5NSkoKzz//PM2aNZOTqiWpjoQQrF+0k/WLvBT/A3auO8jOdd63NjiVWBNPrYmrzVuGs7GaEa3UIugZFFbluQ9R3lt2QXTNH8aZ4r/E3DwV4/2l4DomUQ4OhKu7kuO+odrzVZep8nn/InKK0ILDMNyndkIEYIgicp2yCOPpplYJ0Z9//smqVas444wz6jWYF198kSlTpnDvvffy6quvAjBo0CCWLl1a6bjbb7+dmTNnVnyfmJjIxIkTWbx4MYGBgYwbN44XXngBk+no4y1ZsoT777+f7du3Ex8fz+OPP85NN91Ur/FLkuSdzd/K4NG9OOOC1ghD8NdPe1j+/aZTpperOsm5ObQMh31e6hYmBENKbt161svcG0kLfJ6oyY+i7C+EjFJoZUePcZOm345uVH99w+xEtVoqVtD9mxIRhFvPRFPDCTGNw5/eGJSQxxyKnQvxdY0gSfK1Wu1l1rFjR0pLvS8Fro2//vqLd955h+7du1dpmzBhAkeOHKn4mjZtWkWbruuMGDECp9PJn3/+yaxZs/j444958smjewsdPHiQESNGMHjwYDZt2sR9993Hrbfeyu+//16vzyBJkmcJHaJ54ucbKLl4E28U3MfbxQ9iuvYAT/98E1Fxp35R1z927mZoC4HmoTNdU+CCBMHvO+q+MXapexWHXZdxpM1/yBjwCSlRD5DkvP64JkPnKrNRzu3ksU2JDMUZnIa/1pcW7k8J+jYWddpfmF7bR+Rf19LC9hWqEljn+CWpMdUqIXrxxRd54IEHWLJkCdnZ2RQUFFT6OlFFRUWMGTOG9957j9DQqpta+vv7Ex0dXfEVHBxc0fbHH3+wY8cOPvvsM3r06MHw4cN59tlnefPNN3H+vRpg5syZtGrVildeeYVOnTpx1113MWrUKKZPn16bx5ck6QSomsptb47kf4cnsyZrEW7hxmk4WJYxlzePPMqkty9r7BB9rszt5os1f/LIWRrhfkezonA/hUfO0vhy7Z+Uueuvp8zh2kmJY/lxJUL/KHL+irN3Cco5XUA7+tGgtIyFW/qQZbxKlOMBjNcXIPYll9frKXUgFm1D+Wwv0WY54Vg6udUqIbroootYtWoV559/PlFRUYSGhhIaGkpISIjHhKYmkyZNYsSIEQwdOtRj++zZs4mIiKBr165MmTKFkpKjZdVXrVpFt27daNbs6O7fw4YNo6CggO3bt1cc8+9rDxs2jFWrVp1wrJIknZh+w7uwpvh3il2FVdpyyjI5oGymY++WDR9YA1uwew8fLJ3HLR3ymXo2TD0bbulYwAdL5zF/157GDg8QpDruIvfsRYgHBqDcPQjlgcEUX1tAovsGgtVLEHN3VFmODkBSBpa8WDQ1vOHDlqR6Uqs5RIsXL663AL788ks2bNjAX3/95bH9+uuvJyEhgdjYWLZs2cIjjzzC7t27+e677wBIS0urlAwBFd+npaVVe0xBQQGlpaX4+flVua/D4cDhOLpaozY9X5IkQYcBcSwt/M1r+9bSVXToP5QtC079CePbjqTz8Pc/NHYY1RDkO78gny/KPx0MKuoI+dELsd/zTvQA7MjCOqALJY5lDRGoJNW7WiVE55133nEdd+eddzJ16lQiIiI8ticlJXHvvfcyf/58bDabx2Nuu+22ij9369aNmJgYzj//fPbv30+bNm1OPPjj9MILL/DMM8/47PqSdLpwFLvwMwV4bffTAnAUHf8WEVJNFPwsZ6Jp4TjdB3G6Tny5uMXUFou5HbqRS6ljLWAgcILFDE4vf1cBJoSo37mlktSQajVkdrw+++yzantW1q9fT0ZGBr169cJkMmEymVi6dCmvvfYaJpMJXa9aNKNv374A7NtXXj49Ojqa9PTKZdb/+T46OrraY4KDgz32DgFMmTK
F/Pz8iq+kpOMfi5ck6agVc7ZzbrD3eUJnB4xg9S/bGzCiU1eA6Xxamn6m2ZaJRP5+KbEHn6CF+RssprbHdb5Za0ELyxxik54l8vdLabb+NlpqPxFkuoR8vkHpW811ukRS6pQbe0snL5/uCilEdbvkwfnnn8/WrZV36b755pvp2LEjjzzyCJqmVTln06ZNAMTExADQv39//vOf/5CRkUFUVBQA8+fPJzg4mM6dO1ccM29e5R3F58+fT//+/b3GZrVasVqt1T+gJEk1StxzBGvKOfQIPZtNuZX30jo3ciQ5G93kZRViySo8pjgjFCbI+mbH+qagV7XFGf1M/YjKvQfjw0VgGOV7lK4HJcCP2DteJ0m9Cd3wskcXoKqhNFdmIt5aiSgsPrrH6QKFiLE3kdHsHYwBkSh7MyCt8t5tyuV9yFO/AqP+NquVpIamiJqyljoICgpi8+bNtG7d+rjPGTRoED169ODVV19l//79fP7551x88cWEh4ezZcsWJk+eTFxcXEVtIl3X6dGjB7GxsUybNo20tDTGjh3LrbfeyvPPPw+UL7vv2rUrkyZN4pZbbmHRokXcc889zJ07l2HDhh1XXAUFBdjtdoZG3YpJPbWKxkmSr5nMGje/cDGR3W1sKVuJhkZ3v4EcXJ7N7GfnV/rlydkxrhEjbVqSh9hwtHDSMi6TC6J3VZsQxVu+RH11g8c6QkpcNCVjSslwep8GEG65n6DvmiP2eugN1zTEAwNIdt9MjPllzFlhsCMLAszQPYo809fkuWbV6hklydeKCg36dEknPz+/0ir1f/NpD1FdWSwWFixYwKuvvkpxcTHx8fFcddVVPP744xXHaJrGL7/8wsSJE+nfvz8BAQGMGzeOqVOnVhzTqlUr5s6dy+TJk5kxYwZxcXG8//77x50MSZJUN26XznsP/kyg3Y+OvVth6Aa/rf0WR2nVD2/LruRGiLBpCmrTBrBADTmiqgSi5VoQXooqiuQ0/NzVz/0MMM5G7F3huVHXUdNcKJE2UpwTMIVGYx3UBUOUUOr4q9oK15J0smhyCdGSJUsq/hwfH1+lSrUnCQkJVYbE/m3QoEFs3Lix2mMk6VTkqx6X2iQuRfmlrFu0wwfRnJrs+x0UJtg4lBwJ1WzfoSgWcNQwMV0X1e92X9NoV5kLVbGhA249DbeeVsMJknRyaXIJkSRJ9cfZMY78NnWfC6cocMk5ZzC4bzdcigmzIkjdkcyPT35Jbkb5won+I7pywYReGP5ONDRyDpTy7UvLST2YeVz3aNU5lisfHkhArAmBQM/XmPf6GjYt21vn+E91upELkf7lCY+nSRBBAeiWXBS3jTDz7QTq56OUCYRNoVhbTrbrTdymI1jC7IicfM83iQ/BpafUOVarqRuRyn2YyiJAgO5XSJZ4i1J3+fwys6kVkcr/s3fe4VFUaxx+Z7am90JIQgm9hN6L9C5IsaECKiooqICKBfWKAmJBUBHsDRABUaT33nsJBEjvvZftc/+IBGJ2k5BCc9775Lky35wz32w2O7895yvTURvqgFnCYqcng5/IM5b9pVdGpqrUqCB6/PHHy9yvk5GRqTmuiaHqCE7+YPgwIgq8mHXCjPmfeJ8GrnV4edUrLH58If0mtEPZJZWFidMwmHUA+DjU5rkf3uTXl/YQdq7s1aTWvRox5O1gfoydQ2ZkGgD2SkceeWsKtRp5sPm7I1W+h3sbiRxxE85dWiEdKr2SJI5oQ4rwKQGaXxDXJyBd2FWsmxwbB2I/chkp0lxqj5yP6fv1pce3bUq+6gRUse+co7I/XllTsKw8hpRblFAj2mnwHT2dDL966KRz+BnmI/16FCkjHABBpcRz4KPYNW9FqmFela4vI1MWlUq7r1u3LrNnzyYmJqbM85YsWWKzBpGMjEzNUZ1iqGeDeqSavFgfbsZ8w+pDWJaZ+SEiz339HF49RVbFLSkWQwDJBfEsjHiNJ+b1L3N+USEyZlZ3Po94nUx9WvHxAlMeP0R9SLt
HA3H1dKryfdzrZBiXou9ZgPBQJwQPV1CICAG+iM/2JtNvE/Z0QNyQiHQhqsQ46XIM4poI3MXnsBTqUI0fUdSuQyGCmzPKwT0QmzdAJdWuoocqPC3TsHy/F3Lzrx8u1GNZdgD3wkfxFedgWbqn5CqV0YS04SQOCe1QKxtW0QcZGdtUShC9/PLLrF27lvr169O/f39WrlxZoqqzjIzMvcMDrdvyd5j1TubJBRZc/RRsTLWeYVRoyieaEOo3tx3H1Pa+xpwq2I3JYj0GZkvGCvo83vbmHf/PIZGon05i3bnonnbA/Eor8h/NJ9b5ebKMP+Bo7ot0wXo1cCksDntDKyxr9mD8cydiwzqoHh2CsndHzCFhmH5djyrHq0oNXB01/eFIrPXWHwCn0xDDDFaz5ADYehF34elKX19GpjwqLYjOnDnDsWPHaNq0KVOnTqVWrVpMmTKFU6fkwlwyMvcSapWGPKPt6hySaCY+L8qmPd4chldt2z0Oveq5Em8Ktz0+PxLv+i4V8lUG9MbzJBlfJc4wnlTj+5j+ifsR9FiPL7pGoQFMJsjJw7z9MMZlGzCt3YEUkwiAlJaLQlH5FX+1pS4klO5ndw3BJEFspk27lJqBSgio9PVlZMqjSpWq27Zty+eff05CQgLvvvsu3333HR06dKB169b88MMP5RZmlJGRufMxmgzYq2xvvQkWkVoOth9UtcR6pCdl2bSnRefgq6hre7x9IGnRth+kMiVRKxvgrZpNbfXXeKpnoBC9AJA05Wyf2qnBSjHcawgejpjN6Tbt5WEQYsHX9gqTpBCgtqvt63u6YZSqHtQtI2OLKgkio9HIqlWrGD58ODNmzKB9+/Z89913jB49mjfffJPHHnusuvyUkZG5Taw/e5ph9ax/VHjYCWQnS9xfa4JVu0ahpYGqNWHnbLe+ObXnEh0c+6EQrOd4DHQfy85fT9603/9FfNRz8UuYi/0vIspF4Tiu8SWw8AdcVGPJU+xBaFrH6jihXi0K1RcRuja2PrGbM0aXDCxS5YVpvmErdAksSlm0gtTaA0tDu6J+adYY0JQMfqj09WVkyqNSWWanTp3ixx9/5LfffkMURcaNG8dnn31GkyZNis8ZOXIkHTp0qDZHZWRkKo46NK64DUZVA6t3XwlnYLPmDKjrwY5oM5Z/Fn4DnEVebmzhl+d/ZvqKB3mA8WyIWFEcC+Su9WJy8Cx0V8vOTDKbLKybf5gXXvuAH2M/JNeQBRSJqTGBL3D+oI5kd2dwv70Zq3d6wUg31dPYHfVC2nO9PYoUHo/0VTzu48eQ6PshDiNmIprNSFeu34tQrxbSw41J1I/Hr8tnKPMbIJ0MK95eE3w8YFw7ks3PVMk/CQNpisV4TpiI9PtRKPgnAF+jRhjRjmyHvyiUTlPr2fex/HoYsvOK7AoFQt8WFPhfxGC4+Ua1MjIVpVKtOxQKBf379+fpp5/mgQceQKUqrejz8/OZMmUKP/74Y7U4eruRW3fI3I1UVx0iURAY3bcd3do2xYACtQipYUn8OWsFXUY0J73PUey0dvSoPQiTxYBSUJFrzGZd+K884PEs3z2+m8xU242eARq1CWT4O73RektYJDOSwZ4/d4Sz/2x0lf2vLlzC9bdUGBma+Be373im/f4yW3fUVa5H+mSn9TghZ0dMkwJJML2Mp+olHEzdQGcBrUCB8gRpxgX/rP4ocFc/i5NlMEK+BBoRnfoyaeZPqq0Qo1bVAU9hKsoCN5AkzA4FpEvfUGDcBYBa2QgvcQYqXW0wWZAczGSynBzjmmq5vsx/jxpt3REREUGdOtaXXq/h4OBwz4ghGZm7lesrRVVnW9hBtnEQQQDVpeuioFmPQL7I+AyzZOJA/FYEBKQbnsoXHA/RsFUgx3ZcKHP+C4UWDn53gNw6Qsn6gndUk1cNLvjfcatFCtEdUgttB03n5KE0eCIJBaQa5pEKoBbAIkGJpC4zGYYlZLAENP/8FmwkfVUWnfE4cYz75+kjgKGk0wb
TFeJ5riigQy1AGQH9MjLVSaUEUXliSEbmv4iLhyN1m/hRmK8n7FwMFsud8UFe0w9vo8GMRqGlwFS0xSH966mswQ7TTfa6qslXrr6HG95OjiTm5BKdkVWDV7KOxk5Fw1ZFn6FXz0ajLyy75YZTtIQ+sOw5JckIqnI+zhUClMh4t/0qKxV+qJX1MVsy0RtDyp7XCqLgiEbdEiQzOsMZJJuqqrzf9J3xNyTz36DCgsjNzQ3BRjDcv8nIyKi0QzIydxv2TlqeXXA/qkADV/VncBCceUI7gR3fnmbvqjO3270a5/DqS3R7ZjDbk1Zbtbey68mmo7/fYq9K09TXi5f69COhQE1CvshgRwveWj2fbN9GeFrNf2YJgsCjs/rSoKc3F3XHQIJRdg8Svi+VFR/sKDMrVxOjZrt/E5t2gKfc1LipVWAoLbCE2t7syC+kgV3ZPipEH2opP0KZYg8RWeCuRWroTJr0GfmmPVbHrMm5XiNKFJR0dX8BtdScE7FqVAqJjn4Gkgx7OZ29ouyLQ5lbgjIyNU2FBdHChQuL/zs9PZ0PPviAgQMH0qVLFwAOHz7M1q1befvtt6vdSRmZOxWFUmTm8rH8lvcxUVFXio8LfMuEp18DWt/zoujI5gu8N/lJLjoeL1WPaIjvWEK2xmLQldN4tIap5+HG9P7DmHPUQr5R4lonU2e1hreGjuDtv/4gPrvsGKeq8szHw4isf5C1ERuLj23gV3q0GcIzn9zPNzP+LjXmxuD4RGqzvavt+YPUGYwcOwjplw1w4+qkVoP48AB+23WAA9JLNmORRMGJAOX38O3xkpWiFSLeT04jyc1Ioelg8eE1OW3ZnlRSpM1s/BTrL/pyJOHa9QWWXdAyuskgfFz8WBazoczXCGRRJHP7qLAgGj9+fPF/jx49mtmzZzNlypTiYy+++CJffvklO3bsYNq0adXrpYzMHUqXoS05IW0lKvdKieMSEj9FfcSbE79m3+qz93RNLovZwsdPrOT5xa+QXyeZ84aD2OFIB7t+XN6RzO+fFAXLigqRToOa02pgfSSLxPG/rnB67+Vb8to8270Hi05J/4ih6+QYJL48I/FUt268v2lzjV3fq7YbTi0t7I/aWMq2P3UTLVt0wau2G6nxtgsTAkVd723QuU0Q34Z9z+gXHsLxXAKajELyA13IaeDKuuQlTG/1KKFfxPEtPaB9aeHhphoP66+Wbu5qtmD5eT9e06cRQ5EguiaGbvSnmZs36dk3iqHr/BEq8W73YDKSz5BjtN3VYPsN/y0LI5lbTaViiLZu3cr8+fNLHR80aBCvv/56lZ2Skblb6PZwc5amvmrVJiERqj9Jw+AArpwtu+/f3U5ORj4fProcv3peNGrTjOxCIx/tXYuuoOjh5+3vzovfj+S4bjvrsxYgCgq6TB/IiBlPsmDCKrLT82rUP3cnN5ILrLeMiM2x4N/SttCoDro+0IJ9eets2vfn/U23Ub3564v9Zc6jibGe4ephb0+2PpmdSZvYnbyVlt7tcavtQXxBDFfPFgWzD+/yBC7hekBjVRQ5WHohXTpg/cJGE2KyBYWHF79nBZQQQ9d8Gt6gDdsiRa6tvpW6xxgFA+ybsP78JZv3F4UX24H+vnJ6vcytp1KCyMPDg3Xr1jFjxowSx9etW4eHh0e1OCYjczeg1CjQ5+ps2vMt2Wgdqp72freQEJlKQmRqiWOCIDD125EsTizZvPXvwp844bCbKUtfZ86Dv9SYTwJgtNE+6xqmcuxVxc5FTZ7B9pZcriEbrVPly3m42mnINxQVTbRIZs6mHi11jtFcdrqYYJbKjGGW8g2InvY27Q4aDbkG2xPkGASc1f+dvwWZu49KCaL33nuPiRMnsmfPHjp16gTA0aNH2bJlC99++221OigjcycTeyGVBk2aEZZ10aq9kbYNuy/vKHeegIY+PDC9G2717JCAvAQDfy84RPiF6mlVoLFTMXhiZ1oPro9JMCAa1RxedYmdK47XeDZc296NOWv
cW0IMXSMhP5oU9zDqN/cn1PrCQpWRALVgQiEoMFu5VbUCRGo2xunqkXiad+5IZM5lq/YWDp0IO5pQ6fmjM7Pxc+pm0+6kdkVt1pY5h1GZhsrF8XpBxH8h+LtgMicC1r/0no+PpZVXLXbYKBvV2svM+uOJZfogI3M7qVTrjgkTJnDw4EGcnZ1Zu3Yta9euxdnZmQMHDjBhwoRqdlFG5s5l09dHGOn1LKKVP6W6To3RRyvL3Q5qfV9Dxi3tyd+OC/koegofR09hleJDRi9sR5f7W1TZR629hjd+f4K03keZHzuZT2JeZEHSFNQPxjDjx0cQxZqt89OyX11O59reCjpVsIcW99WtUR+2hpxneAPrH3djGilYd+Z0jV7/1O7LtNH2wklVukmto8qZNtpenNxV+W0ik8VCbKqBXrWGWLU/0fAFtn5b9j1msBRheGvrxuZ1yFcfLyN9HjaHhDKwjoTWSjs0L3uRuo56LiQml+mDjMztpNK9zDp16sTy5cs5deoUp06dYvny5cWrRTIy/xXSErLYuTCEGQ0X0MC5OQBahR39fcbwiOMrfDu97KwahVJkzDs9WBQxk+SC66tB6boUvgx/i8HT2qG1r9o2wyNv9WFd4RJOpO4trhFkkkzsTPqTENdd9B1bsy12TAYLKtFGfypALWow6Wtoeegf/jhzDm9FIs8GK/CyL/rY83EQeaG1ApUhmi0XazZmRZIkvp6ynpfrfEoHz16IggIRkfae9zGtzgK+nrK+ysHlz6/ZwfCAiYxrOAUXtRsAfg6BvNZqPi5xgfz59Z4yx+uMZ8n23Yz49H1F7ToA7LUIA1phHu5OqmFemeMNZjOfbt/Ce11FOvgqEAClCH0CFbzazsx7G8vPMJORuZ1UassMwGw289dff3HpUlGAXPPmzRk+fDiKMroly8jcixz6+wJXTsQx5LmHGNHCC6POxIHvQnh/y0+YywlO6TSoOUfythT3/7oRCxZ2Za2h55hWbPvlWKV8EwSB+p28WRluPWNnX/IGpj/0BduXVW7+inBk7SV6zhtKVM5Cq/bODoNZtuUwODnUmA8AH2zeSuvatXi0XTvc7B1Iz8tj9ZETt2zVIuZKMnNGLKf32HZMG/AQIHF2WyRzli+nIM92HFpFMVgsPDhrLU+1acG7w5ai0IoUZhr47Z29HN12vkJzZJq+J899D+7jJ6KmMxYhj0x+okC3j4oUSTyfkMT01SsZ3aY173QMxCxJ7Lt6mckrLqIz3VxxThmZW02lBFFYWBhDhw4lLi6Oxo2LuiPPmzePgIAANm7cSFBQULU6KSNzp5OWkMUv72656XG1GrkTpj9i0x5TcJUejXtU2i87Rw1ZptKxO9cwSSYs6pqNnwk7F8uY7Pto4NyCsJyS7Ttau3XFEKYtSjdvUrOCCOBMfCJn4m/fSkVBno6N3xxi4zeHyj+5kvyx5iTrP7Cd0VYeRlM4ybxR6fFZhTq+P3QEsP2+lpG5E6mUIHrxxRepX78+hw8fxt3dHSgq1vj444/z4osvsnFj6VobMjIypcmIz8NL7cdlzlm1e2n9yIyvfEq6rsCAk8IVKOo+39N/CG4aD5Ly49kXv4kCYx5KS1F2k6OrPX3GtsW3gRvpcbnsWnaKzJTqKVa4cOJqJn/xNP3q5HBStxMRJR3t+pN3WeTrGZV/eMtUP6LoiY/2f2hMjTCLWaRZPqHQUHMriDIydwqVEkR79+7lyJEjxWIIilLxP/zwQ7p1s53pICMjU5LD6y/w+rMPciDZ+upSX9cxLP795leermExW0i7UsgTLV/Ew86bbVFrSSlMINApiBdbv0d0djhn/4yi9yNtuO/ZZmzN/I1T+eF4+/vx7OBHCNuSyR+f7qn09a+hK9Dz2dOr8PZ3J7hHGyxmC9/t3kNmas1Wh5a5Odw103A3PIZ5/TEssQdQODvg12seZj89UTrrAdsyMvcKlRJEGo2G3NzcUsfz8vJQqytfS0NG5r+GrkDPgZ9CeerxN1g
W+xkGc1EsiVJUMab2JC6uT6ly0cKIk4kEtBBYeGpW8bG0wiROpxzilTYfkZSdTOunfZh/dWpx0HVaYRIXM0/xSN8pdAsL5uA66ytYN0tKXAY7fpNXG+5E7DQ9cM8eg/HrFWApin2TMnMw/bweRY+21O72HfH6ibfZSxmZmqNSWWbDhg3j2Wef5ejRo0iShCRJHDlyhEmTJjF8+PDq9lFG5p5mx68nOPZJEi97fsZL9T5iat15vFLrS65+beCvRfuqPH/n0Y1ZcXlxqeMSEt9dnM+gqe1YEbewVJd6gD/iv6H/M21LHZe59/DhHUxrdxWLoRsx7z+FnbkNVcjDkZG546nUu/vzzz9n/PjxdOnSBZWqKJ3WaDQyYsQIFi1aVK0Oysj8FzixPZQT20PR2KkRRYHCfNv9nm4GjZ2aPGUmJsl6hk+mPg2TSk+2wXoPLaPFQKE6G5VGiVEvZwnl1hEAzT8NV28N2UGaf65bsygNzhiT023apagk7Oq3BeT3gcy9SaUEkaurK+vWrSMsLIyLF4sq9DZr1owGDRpUq3MyMv819IVlt1cIaOiDWqsmLiwJfWH52WGSJKEQyi6FIQplLxQrBCVSBatZa+zU+DfwwaAzEHu1dDq71l5Nx34tMJstHN16DlNN98yoAa6Jolt7vVuAWM6GgVKBJIshmXuYSq9/fv/993z22WdcvXoVgIYNG/Lyyy8zcaK8xywjU910HxnMwEntiDaGUmjJJ8iuB3EnsvnlnS2YjLaLGhp0RlT5jmgVdujMhaXsvvb+WPIUeNnVIrWwdFsFO6UDQo62zGsAqNRKxr0/CL+2TkQUhmAnOhCoHMKWJSc4tO48oijy5u9P49XEjnPpJ1CISh6d/xLhR9L4bOKvN/+C3GZumUi5hRhU8YiBtZBirLTXEAWEAC90hlOAvIUqc29SKUH0zjvvsGDBAqZOnUqXLl0AOHz4MNOmTSMmJobZs2dXq5MyMv9lej/ShiZPOjEvahIWrq+otArqwowfH2P+48vLHL9+4WHGz3qNbyJml4gTUosanqj9Citm7WL8rJksCn8No+X6CpWIyITA1/j7vfJr5sz48WF2qn5lecT1pqIiIhOefw2Vtg1DJ/VkT8Eadh1aU2LcqKDneHvNM7w/Ru6BeLtJMs0icPRPGJesAl3JlUrl8F7kiJtvk2cyMreGSgmiJUuW8O233/Loo48WHxs+fDjBwcFMnTpVFkQyMtWEQinSZ2Ir5oQ/V8p2NvMwDWu3onmn+oQcjbA5x5m9V3H2cuCNyUs4kreFFGMcgerGtLPvzYq3dhNyOAK9Xs/M9xZzMn8XscareKv86ew4iG2LT3P+QHiZPgZ3a0isy1nOx5fssG7Bwo9R83l/6s+kq+PYdWVNqbFrw7/m3Y4d8AlwJ7aCr4lM9bA9qQljnK9XMDeYLpCsmY3Py+9hOXkFKTIeXBxRdGtFgf1pUvSzWJNTcnVIEyNnFcvcO1RKEBmNRtq3b1/qeLt27TDJ5dllZKqNFp0acLbwgE37rvQ/GDXhtTIFEcC+NWc4vP4CHQc0x6dWEDFhmfy154fiTvfnD4YTMjCStvc1xqfBfWTG5/H+9uUYDeX/PfceH8zvadb7XElI5GnT2BTxi83xW2NX8tCM4Xy61HYDWJnqQR0aB0FBaGLUROHFGt9/b3/FAc/QsdVw6rZtRDZZ7M+YSUF+DtCW7UlNAIiK85LFkMw9R6UE0RNPPMGSJUtYsGBBiePffPMNjz32WLU4JiMjAw7OdmRL0TbtOfpMHFy0FZrLqDdxcP1Zm3aL2cKJXZdg1835aO9iR06W9Sw1ADMmm1lsANn6dBw87W7uojKVxiVcT1FQuJrt/k2snrM9KRS41vDW75+fIiEE8sqQzL1JhQXR9OnTi/9bEAS+++47tm3bRufOnQE4evQoMTExjBs3rvq9lJG5w2naoR5DXuiA1lOJAiUXd8ey5bsj5GWXDmS+GaJCExipast+rMdvNHINJvJ
EEnYOGgY82ZFWA+thFowYs2HrkhOcPXC1StevkI9nk2jUKpjQrDNW7Q640My9A0n51jfFmnt0IXRVZA16WH20C6jNIx06Yqe2RylKHIsMZ9Wp0+Tpy84OvFVUpCzAjWn81wTOjfT0q8dDQZ1Qi1pE0cK+hEusCjuNzmy6S4SQEhf1g7hYHkAwKbCoDGQJK8g1bKQiDWpl/rtUWBCdPn26xL/btWsHQHh4UXyBp6cnnp6ehISEVKN7MjJ3PsOndCfwfjUrEueQGV/USLVl5468Meg5Phu3hrTErErPnRCZimtebatZYCIiwz2f4vs/d/DmmsfZUvALnyR8ggULTmpXRrz1JM0P1mHFBzuqcnvlsuW7Y0xd+RRXsl4uEfQN4G3vR26khWHNxnIgfiMGS8n6Sg4qJ7r79GfS0o+gya2r7VMZJnbrgr93E765aCFLLyEAHWo144uHG/HqH3+Qll9wu10EikRRbp2KrRr+W+C80q8vojaQxScs5BokFAJ0q92Ob7o2Z9qaNeRxZwg/Wwio8df+hGJ3DtLxY2A2I6pVePQYhVOHISToX0AWRTK2ECRJkt8dFSAnJwcXFxf6eU9EKd4N35JkbgUBDX146Kv2LI6YVcrmZVeLx1Rv8eGjZWeBlYeHrwvTf3mQrXnLOZG2F4tkJtCpAQ/6PM/uRaF0Gt6E9XZfEJdXepVlQp3X2PNWLKEno6rkQ3l0e6Al901tzF+p32Iw6zFZTNSxb8gAp8f49InVdBrUggGvtOanSx9zObNo2y7YszPjmkznl2k7OLbtPIYm/resCOHN0tTXiye6D+Ozk6XLD9R2FHm8YTavrP3zNnhWGn2ggbr+qTbttuJ/utarQ88Wffj+Qul7bOwm0tc3mfc21Wym2TXf+/uGlgj4riie6ldx3Fgb6UJUKZvQozkZXXeSY1hZDZ7K3E3k5Vpo3zyZ7OxsnJ2dbZ4n12GXkakCgyd34q+Ur63aUgsT0Xmn4+nnSlpCVqWvkZ6UzfsP/Erfx9oxfcgYECSSrmbxwxu7yc3Kp+eURsRFWt9y+jv5Rx5+/g1Cn46q9PUrwtGNFwm+L4ines4kIv8iDkonfKnHnx8dICM5m80/H+TyyWjGz5uBe2N7AJJCs5kzdBmJUbYf3ncKj7TvyO+XrX93jM+zIKic8XCwJ/02rRLpA4tWbioiJtb4toX2RVlmN4qj0W3b8dUF64UyL2daeKypN3YqJYXGOzdxxtHcC+nCdqs26VAorl0eJAdZEMlYRxZEMjJVwLuuC/GpUTbtUYZL1K7nXSVBBEVNYDd+e4iN35asCRTUwp9YfZjNcRm6VBx8a35Fc/r3D3PQfjU/nrjee00hKHn6+TdBCObAn+eIuBDHu/cvqXFfagJ3BycS8mxX1Y7IFghwdbltgggqJoYAxjifKpE+rw80oIlRo1FpyTHY3jCIzQUvRwdiMrOrzefqRBDsIKeMljdmM6JOAWUXbpf5D1Op5q4yMjJFFOYacFK52LS7iz5kZ1StW31ZZGfk4a70tmlXK7SYq6ctmk1adA4i0f0iJ9NLNqI1Sya+i/yAgc+3QxTvjG0wB7WaMW2CmdrrPka1aon9P70Yy8NoNmKvsn0PnnYSWYVVC6C/3YiCBWUZTwQPLeToavjNVAUkSQ92ZYt/SX1nvA9l7kxkQSQjUwV2/3iWvt5jrNqUooo6iuZEXUqoseunJWThpvfDTulg1X6fzzAOrqzZRIfeT7ZmR9pqqzYLFi4UHqZph/o16kNFeKBVMJ89NBbJsT3Hs+ujdGnP5488xqBm1lPPb2TDuTPcX9/6grpGAbXtjURlZFWzx7eWXaEX6RVgffnERSOgpICsQt0t9upmsGDQRiJ4ulm1CkG1yRcP3mKfZO4mZEEkI1MFTu+9TJ2sNrR371XiuFZhx9Sgufwxt+aLDcaFpPNqu49wUDmVON7SswPDgh7l4pGoGr2+o6sd2foMm/YsKRVHF/sa9aE8OtcNpEODdrx10MK
+OBPhWWZ2x5p566CFvi260Lq2X5njI9LSeaCxPZ1qlVxRclAJvN/DCW1ZSyt3CX+fC6G3n44WniXvxVUj8HpHkcV7brJAVRX5d1XsipBi/hAmdATXkn8Lgq8n0pgmZJiWVpd7MvcgcgyRjEwVkCSJTyb8xiNv9GXAfY8Qr4vEUeWMfYE7f7yxj4tHo2r0+qJCpG5HT5aFfsbk4LcwWPTkGLKo5RBIXG4E34TMY8BTQ1jxgfVA0+og5kIyDZq34Er2Oav2BupWrL1yvsauXxEe69SZT06XjgGySPDNOQuTu3ThzJo/bI5/pX8/Fh4voKW3klGNtcTmmHHSCNgrBX48X8i09i7Yq9UUGO7stPSyMJjNvLx6DS/16cUjjfyIz5NwsxPAnM/8zbu5mpp+y3zZntSE/r6h5Z/4L0zmeOIUz+Dz3LuocrwgMx88HTE4RJNsGo9FyqkBb2XuFWRBJCNTRcwmC8vf345inoi7jwu6AgO5mfm35NoBDXyI0IUQkR3KJydfx0HlhL3SkUx9GiaLEYAhHZ6qUR+2fnecScuf4tPsaSWaxwJ42vnimlebxKhbu7rwbxQKOwqM1gOGs/QSdhrHMscHuntx7Fg+RxONKEXw0IoUmCRy/wlCPppoZEDjIP46f6nafa9pPPeruF6bx8CXYdtQKxW4O9uTV2ggr7AobsjJ5gzViZrEmNoAfBvoZbOSdvnsxk6hwVllT1Z0HnqLERhUbV7K3F2Y8vXAl+WeJwsiGZlqwmyykBpvu0XFrSDfmEu+MfeWXjMtMYv9X19h6jPzWJP0FQn5MYiItPboylCXJ/lsfOmmrrcaodxY2oqXYzNZILng36tNtzZY193eDgeNmqScPIzm0nWDyuPGlHuX8NJxQRqNEjcPDao8I4qyMreqGZfwokraAE7RKhLr1K6GWW/vdq3M7cesq1js2x0liD788EPeeOMNXnrpJRYuXAiATqdjxowZrFy5Er1ez8CBA/nqq6/w8fEpHhcTE8PkyZPZvXs3jo6OjB8/nnnz5qFUXr+9PXv2MH36dEJCQggICGDWrFlMmDDhFt+hjEz1Ens1iSCt7W++Td3aEHYk0aa9utj7+2kiTycydMoLeNR1QJQUnNkawZxfl1GQe/sDcU2mQhxUduRbWSVy1Qjk68oWkdHpqXSo5cKxRKNVe2c/JYs21nyblNa1/XimR0/yzRqy9FDHGa4mxfFJ2LYKz3FjbI5TtFTU8PUf7Bw0jJ8zCK/m9iTqo3BR1UGT48pv/9tFZEjNJQfciNc/O2WGJv64hN+SS8rc45hMOiryVrpjBNHx48f5+uuvCQ4OLnF82rRpbNy4kdWrV+Pi4sKUKVMYNWoUBw8WZQuYzWaGDh2Kr68vhw4dIjExkXHjxqFSqZg7dy4AkZGRDB06lEmTJrF8+XJ27tzJxIkTqVWrFgMHDrzl9yojU11YLBLH/rzCkIFj2ZS0ooRNq7BnpOdzLPh27S3xJeZKEkte/KtKc1xrPFrd1aqXHTnCsx36svCkucRakCjApFYiP+8/ZHMswMfbt/Plo49zMc1E3r9E1YiGGuIykik01WzBwnYB/jzVoz/zj5spNF33oa1PAJ/3eIiPIsrfEriRf68OqbUqXl/5GKvzFxEWcaH4uKPKmSlfzuPnKbdOFAElhJqMTFUQLRWL7bsjWnfk5eXRtm1bvvrqKz744ANat27NwoULyc7OxsvLixUrVjBmTFFqc2hoKE2bNuXw4cN07tyZzZs3M2zYMBISEopXjZYuXcrMmTNJTU1FrVYzc+ZMNm7cyIUL1//IH3nkEbKystiyZUuFfJRbd8jcyTz+zgBqd3dgX+5fZBszaWTXig4O/flpxlaunrXeVPVO5VoLj8rgZK+hb4d6uDqriYjN4eC5aMyWoo+4QV1bMKhnB7bGCcQXQKAj9K8tsW77IXaduFzu3L3bNWDSw4PYEmnkVLIRF43IqEZq1IZcnp/3a6X8LY8bheE3j43lvaNKDFZ
2yB5oKJKp/Rt75epyCzOuyWnL9qQmJB6qjf8uXbHwGPRkZwzDL3IoZWupMY4qZ551msfcB5dV7YZkZG4DJouBHSnf3R2tO1544QWGDh1Kv379+OCDD4qPnzx5EqPRSL9+/YqPNWnShMDAwGJBdPjwYVq2bFliC23gwIFMnjyZkJAQ2rRpw+HDh0vMce2cl19+2aZPer0evf763nlOjpydIHPnsmz2Nlw9neg2qguB7lqiTifz9o4fsJhtV1e+U1GHxpXZrd0WDz7emubdndifuZYYQyoNa7fksQGD+OnzU1wMSeRw+ElOrz5Hjz7N6errRsq5dObsuYheb8J2ac3rnAoP4fm1l3j4sa5MCK6HLr+Qn/+3l4iI1AqNrxxFwtCjjRtx+SoMZuvfX7dGWnitRzeOZVmvB1UROoxoxIKUL6za8ow56J2zcHZ3ICfj1iQMyMjcam67IFq5ciWnTp3i+PHjpWxJSUmo1WpcXV1LHPfx8SEpKan4nBvF0DX7NVtZ5+Tk5FBYWIidnV2pa8+bN4/33nuv0vclI3OryUrLZeM3ZW/93C3c7HZJn8fa4dwmjgWXr7cGuZoVwt7kv5j2wqcsfiKM1PhMLMDeMyX7vt3seu/qN2KqNP5muCYM/eprydAJ2Ar+LjSBSrg5T/SBBrKDNMUxOxaFCbNke9sv25iOg7O9LIhk7lluqyCKjY3lpZdeYvv27Wi12tvpSineeOMNpk+fXvzvnJwcAgICbqNHdy5BLfwZMqUjLrXswCxydO1l9q45jUFnPQD1VqNQinS9P5juY5ujUENBhpGtS08QcjTilvng7e/OsBe64NfMDQGBi3ti2PrjcfKyinpfuXg4MvjZTjTs7AcCRJ1KYdPSI6Qn3Zl9o/6NocnNr+j8m6rEjPQeF8y8qEmljuvNOlalfMGw5x/jp1mbad+/Jb3G90bpoMWQW8juH3ZwavfFqrhdrTg429Fv3H007xuMJAikXI1n3eazxKVmMdrFdnSDn6NIpjEDAREn9XBcpYcRzGrMiiwyhO8oNBwtPneM8ym2JxWls+fWEXD553enM6pwUruSa8iyeg1fhyCSXBIxNKncdqaMzO3CZNJBSvnn3VZBdPLkSVJSUmjb9nrWg9lsZt++fXz55Zds3boVg8FAVlZWiVWi5ORkfH19AfD19eXYsWMl5k1OTi62Xfv/a8duPMfZ2dnq6hCARqNBo5H/8Mtj2OSuNBjlxB+JC0hNSkQpKOk8uj+zHnmC+WN/Iz/n9vZ3UmmUvPbrWM5pdrI49RUMZh1uGk+G/m8cbY825Nf/lY6XqG7a9mnEsFltWZ38FZFJlxEQaNm1E68Pe5olz2xAbafkqS8GsDbta/5KKor/aNCkBVNXTOb31w9w6VhUjftYWa4JocrG/NzItdWQmxVGXrXdSDBHYsH69mBEdihjWnnw0rfPEeZWm/mxkJ8g4aR2ZOTUR+k8OoolU3/gdodT+gR48ML3z7MyScPyMAsSUN+1KU/NasLfuw5gNmYT4OxKbE7p+3y4KWxN3subAbPwOJSKdPQMGIwoHe3xGfASBQ2vkGL4X/H5/X1D2Q5E4VX8u/vjYBSjhk/k59BPSs3f0LUlSekCKf4K5O6oMncbZoMEB8o/77YKor59+3L+fMkKtk8++SRNmjRh5syZBAQEoFKp2LlzJ6NHjwbg8uXLxMTE0KVLFwC6dOnCnDlzSElJwdu7qMnl9u3bcXZ2plmzZsXnbNq0qcR1tm/fXjyHTOWo29SPhqOcWRLxbvExk2TiQMpmYh3DePrjSXz+3O2tQTP2rX5s52fOJR0pPpapT2NZzAIe6TiVtr2acGrPzVfErSj2jlpGvd2VeVdfKN6OkJA4l3GEqLxQXvjiYxQqgQWRL1Ngut4ENiz7AgtypzNz3mL+N/hnjIaazWCqDNeCn6svI6zoweyC/02JIkEQoBwx4+rhy85cZ7ZEXD8v1yDxSziMrFOXPo92ZeeK29vnauKip/jgipr
0wuuCJyLLzP9OwZxB3Xl/y1+8O+x+/opQciTBjEUCDzuBsU1EQnMu0N+lGV4bo5AuRl2fNK8Aae0x7O9vh0OzfuQbdpS6bloPI5oYNdmeEODUl0cbK9gQ+TW5hiwUgpJOvgPoX+d5QmKyqj37T0bmVmDWVex9e1sFkZOTEy1atChxzMHBAQ8Pj+LjTz/9NNOnT8fd3R1nZ2emTp1Kly5d6Ny5MwADBgygWbNmPPHEE3z00UckJSUxa9YsXnjhheIVnkmTJvHll1/y2muv8dRTT7Fr1y5WrVrFxo0bb+0N32MMeaEjfyZZD8KMzruKfT0JR1f74m2hW41CKVK/ize/Rxyxal+f9BPPTJpbo4Ko1yNt2Ja50mpsRo4hixjledRKTQkxdA2DRc+B3PV0vb8le/84XWM+3ilcf9hqbkoUpcZn4qeqj4BQqlI2QB3nhqB0Y1u8dVG5PlZi7iM9bqsgqtvUjxjRkfTC0v6bJfgjUqB348ZMWbmSUa1b8b9ODTFLAnn6An4/dpTjihj29XgA4aL194m0/TzuTZ8kn9KC6BoPd+jI/w6qqO86gCeadEOjKEAQNByOc2D6ToG3O/miUSrQm26+EKSMzN3AbQ+qLo/PPvsMURQZPXp0icKM11AoFGzYsIHJkyfTpUsXHBwcGD9+PLNnzy4+p169emzcuJFp06axaNEi/P39+e677+QaRFXE3d+RpGTbD60o3SX86nhyJSvG5jk1iaunE8kG2ynn+cZclDXcj6BuGx/+yDlr036p4ARN3FvbtF/JP0v3tm3Za7vN1j1HkTDSQFBQhcdsPZvM0BYT2BDzY4njSlHFmMCXSSqUsNhYRDJZQOfsQOrQil+vumnbsQkXdBrAer2UkHQLPZv5km8w8uuxE/x67EQJu2sjO5TpZWTC6gyIBrsyC2rbq+3I1EmcTJI4mWQHlAwniMkFHydHYjLvjrg2GZmb5Y4TRHv27Cnxb61Wy+LFi1m8eLHNMXXq1Cm1JfZvevXqxenT9/637Mqg0ijpNKgFHgFOJIdncnz7Rcym8tO1zQYJtajBYLFe2t9F4UFeTmp1u2sVJzcHOg9tjp2LmoiTiVw4Ek5hnh4npavNMQIColSzfwL5GTpc3T1sdoN3VXkhSkUxGb72/rTy7oyIyPm048TlReKidicv7fbGYV1DEARadW9IYLA3+ek69l3N5N+PxmA/X1r5+6MzGtl95Spp+ZVbHbS1NeNmb0efRg1wUGu5kBDPqbiiQoG/RF5hSp3WvNDyU/YnriZTn0I9pxZ08x3Fl7sv8XT3xmVeT6GWbut2UIqTnkYa239zbhqBnDLaDxSYjEgO5SSmKAUoY3FHFCQUQtGKlFUftJCrv3ub18rIlMcdJ4hkbi3dRwYzYEobDudtIsYQi3//IN57ZQJr5xzg1K4rZY49tOoi3R8fzK7kv0rZVKIaP4JIiNxfQ55fZ+zb/ajT05WDeRtJN2URPKg9DyufYukLGxAy7W1mzrTy6ML57dE16tvuX88ydNFofsz50Kq9k/1AJMnMC63ewSQZOZK4C4skMbDuaJzVblhylHz/+54a9bEi1G3mx8TPBnPecJAruh24KD2Y6fYAx0PyWBISgq+zI+8Nu5/wHA2nU0XsHeCt+1uTmB7PJzt2YqmGgOVpfXpT1zuQ3XEKEozQp1VzJt9n4L2NG4nLyubLPWfwdLBnaMuRNHNQcTU2l6e27cFoNjOkRSr+Tt7E5ZYWHQ1cFYSn3N6qyIcjYxnXxcy6MOv2IfVg3THbX+gKTUaStCZcHe0hr7QIFRoFkKvYW6YgOhB2mW61W7MvrvTWooNKwE7UkVlwZ4hzGZma4I6oVH03cC9Wqm7RtT4D3mvM0oj/lcjQUQpKpgZ9yMqph4m8aLtUv1KlYNbqcawp/JywnOtVwNWihsn1Z7P5fxc4t9/GJ3w18cBLPRH6xLA56bcSx13UbrwY8DG/vr6TRz7qxueRr5eI06ntWJcnPd5m7phlFOT
VbK+tF74cRWitnRxMu14VXUDg4YAppK5TMWp6Txafm83FjJIVhuu7NOGVtvN5qpF1MXWrcPFw5NXVY1gQOaNUrNOoupO4GufPoOYd+fSkolTT036BIt6KKBbt2lMlH57r0Q2dqiGbIkvO72EnMLO9xOQVK9CV0TrDz8WJD0aMZv5xiQzd9Y88b3uRV9tLvPrHGlLzbm99nVGtg2letx3fnDOX2N7rXluko0cGb6z72+ZYfaCB3o0ULPQdjOa7HVB4fdVW8HKDp9oRYxyLRSraVrtWrToqzgsoauNhp1LyxcMP8+0FFRHZ119nOyW80VHBV7s2cz4xqZrvWkam5jHrdITPfbPcStWyIKog96IgemvNEyzJfs1qQK+H1puHzK/z2VNlV77V2mt48sPBeDTVEq0LxVnpjpcpkD/m7+f8gZrtzKhUKXhn8+PMDS9dfwagg2cvXLa159KhKB59ry+5dsmkG5Pw1zZAHyPyw8zNZKeXvvfqRhQFHprZh2Z9/QkvvIBK1FBH2ZhdP51FNCnp8KoXC0/Psjr2qeYzyP7TheWfbq5xP23x6Bv9CG2zkUuZpVtCCAi812k568J82RJpfflhdleBV1avIN9QubpUGqWCzx95nFkHrX9U9QlUYM45yZrT58qcp467K9P69sMoOJCYB7WdQDDl8smO7SRkl93c9VYxuHlTxrRtT2SOiN4MDV0hJD6KL/fsw1zGR7U+0EBd/1Qeq6PnMacHUSSLkFEAfi4YXVJIMr2J2ZJefL41QQTgrNXw2oD+uDh4EJkNHnbgrtbz1d49nE+QxZDM3UlFBZG8ZfYfRRQFBFcDBTYEQbouBce6qnLn0RXoWfLiX9g5aPCt40l+Tgwpcbuq212rBLUI4FLhCZv2k2n7mDbgQTZ8fZAPRv2Ch68Lzu6OpMZfIS/71i39WywSK+ftRPmJAv8gH4xGE/HhRRWlP9k1hd9jF9ocuyt2Pc8//N5tFUQNOtfizyTr/bEkJGLzLxGW6WPVDnA8WaRtgB/7wyu3PRns58uZVNtVmvfFmXm9XeNyBVF0RhYvr16Dh4M9no72pOTm33FbQJtDLrE55BJ13V1RKxVEpWdhMFc8qytFH0ms5hGUnr4ovN0xmhOw2Ci0aI0cnZ5Zf2/ASaPBz9WJ7EIdSTk1/6VBRuZOQBZE/1UEoQKF6Cq+eFiYryfyYnzVfLpJBFEos9WABUtRjZp/SE/Kvq2Vn01GM1GhJbcgBRHMFtsPPItkRhBvb+2XG1PZRUQc1M7oTYXFwfRmyUhZ6UtmSUAUxEpfXyGKNgN9ASwSiELFX6P0/ALSKxnsfauIysiq0niTOQmTufIrOrl6PZeTrSdLyMjcq8iC6D+KxWxBUWCHWqHFYC4dQ+OkdkWfdhscuwkiL8bxsN1D/M1PVu0t3Tty+dDtDZYtjwO/XaTHhEGl4oeu0dWvPxe2Vu0eqtpWIyZOT0PPlrTy6k8Dlx4k5ytw1oBFimdt2EfUcWxBgBOEZVkf387bzLpDiZW+/vmEZMZ1lfjzqnV7x1oKjkdGWjfKyMjIVBBZEP2H2bLkOA+99DzLYhaUOC4g8Jj/y2x4y3pBwzsFfaGRiINpdG7RnyPp20vYtAo7Rng+zUc/VL77963gz6938d301wh0CiImt2TMlY99bbr7DWTiWx9Vev5r1aSrwopjYXz+2ucsPmNiycnrxz3sGvJut285HXWZIfV8OZEskGsouZTTwUdBSlYCWYU3H7juFH1tLj1xkXF0rRXAocSSQdX2KoFRQRam/m671pOMjIxMRZAF0X+YY1su4lO/GzNGLWB75u8k6+Lwtw+iv9tD7P32EheP3fnfupfP3sbkz0fSskFn9mSvJdeQTVOndvRwup+fpm+/Kzpzzxuzgjf/+JTDSTvZG7cRi2Shm19/7qs9lM+f+rNSc94ohKpaX6d7s9r8cQWO/CvhML1Q4q19Ai+3CuCDTRt4Z8hQ9seLnEwGB7VAv0AJezKY9bft6sj
/xsvRgf5u/igVIleuRhMXW1S/afmHm3nx9Qfo0NyLLYkiuQaJth4Cfb1NfPbTZoRYPchtJWRkZKqAnGVWQe7FLLNruHg40uvRNv8UZsxi7++nb2nQcXVQu743PR8JRuusIuJ4EofWn78j+3+VxZNvj6DtiLogwqVt8Xz1euX7wFVnn7FFDz3Ix6fs0NsIdXq1vcBHm1aTWVBI78YNaBNQB53RwJaQEK6mplsf9C+Uosibgzrg7VTA8cyNmCUjbTS9cMioxeLn/yxuEhzQ0IduD3ZD46Ql/MgVDm88RWEDP4Bq7qt2d3Ety6y/byhjnK1vv96IrSwzGZl7ETnLTKbCZKfnse7Lmi+gWJPER6Tw29yKr0Tcifz4/jp+fP92e1EahUJlUwwBpBSAu4MdafkF7Ai9yo5QG8E+ZfDW4I5E6FezJnRP8bFTHKKOY0Om/zSD90f9DEDs1WRWzl1bYqw6NK7KcVIyMjIysiCSqTJ+9bwYPLkTvg1cMerMHFgRwpHNF7CYy2//cafQpF1dBj7XDkdPO/Iz9Gz/5iQhxyKK7V0GBTPqjW7YuaqwGOHQilBWLdyBxVI992jvqKXv4+0J7lcXBLi0P5ZtPx0vboyr0ijpOao1HUc2RqEUiTqbzOavj95U1tyEzm25v1U7lKISnVHPsiOHWH+h/Ma2BfpCXDXOZOmtLyYHOFGl1GwvRwc8XQpYHbKnlC067yoRLqdp0TmIC0fKrmslCHBfg3rcH9wGjUpFck42K08cJ6yCq1QAjX08eaR9RzwdnSg0Glh35hSHIqJvIt9SRkbmbkUWRDJVou9j7ejwVB3WJn9NTGoYWoUd9z0znLefHsf8x35DV3Dnp+4+8/H9SC1SWZ0yn4zUVNw0ngz+32P0uBLM0pf/4vXlj6FolMeSy7NICI3BQeXEwKFj+PKJaUzv9gW6gqr1d6od5M3z39zP5qxf+CztUyQkgrt35vXhT/LjS9tIS8jm1eWPsK9wLV+kLcVkMdKwaUum/vYc6+ec4OSOy2XOrxRFVj3zFCeSFby+V0eWXo+vg8j4Fr0Y0aoNE5f/Vub4P06eYHRwP76/UHqZKMBJpFCXSY6u8r/n7kH+nEjbYNN+KGsTfUY9W6YgUqkUzJ0yhot6ZxZfsJBnlAhw9uW53sM5F32BX48eL9ePZ7t3pa5vY36/LJGQZ8FZrWVo496MbJ3FzD/XlVkYUUZG5u6n8sVBZP7zBDT0od2T/iwMe5WY3KIWHTpzIVuTfmet7kuemj/4NntYPr0eakNWk1CWxXxGhq6oEW2mPo0VsYtIrneGl754GGP9FD499ToJ+TEA5BtzWRv2IyuiFvHWqnFV9mHS4vtZGPMKJ9L2YcGChMTZ9MN8GjmNpxcO5vnFD/B92v/Yl7IRk6Wo2vPV7PN8fPVlRrzVEWd3hzLn/3jUCP4Kg2/PFhav8iTlW5h/tIDoAmde6NGlzPFHo2Ox6GJ4vJkCuxu+QrXxUTCllZmPtm2r0v2rFOI/tYysY7QYET2cMDTxt/nz4IwBbE51ZvUVM3nGonuMzbHw0XEzjf1bEOznW6YPnesG4OPRmM9OmknIK1r1yzFI/BZq4UiqKxO7d63SPcrIyNz5yIJIptIMeb4TfyQvtWq7mnMet8Zq7J3K6cB9m7nvieBSfdCusT15Nc37B7D88mKr9hMp+3Guo0ZrX/mA1OBuDQkxHrbafLbQlM+R/C3YB0Bifmwpu1kysT7tJ/pP6FDmNRp4+7Elwvoq1i8hOga2aFWunx9t38nZq/t4pY2O/3W2MKerhbrKy7y0ahUZVaz2fCwqkWD3fjbtbT16czAqnewgjdWfgsb2BDUK4FiS9UCn5ZckxnbsVKYPD7XvyIpQ69ufe+MsdKwbdFPFH2VkZO4+5C0zmUrjVc+Z+OQom/YIXQi163tz9WzMrXPqJjFr9MWrLv/GJJnQWQrI0tuOQYnKvUy9ZrW5dKJyJQoadPDjUuEmm/aQ3OO08G1r034p8xS
92z1u0+7t6EBSvsVmDEyBUUJnrtj3ol1Xwtl1pfr700VlZOGr7Uxd50ZE5VwpYXPVeNA7cBRfbFqP2UYGWYCbEzFltCJLK7TgZFf2KppGZUeOwfaWWGK+gJu93R1f4VpGRqbyyIJIptJIJlCKKpuCwlHhQmF+Ubnr2vW9adw+EKPezOk9oXdMWr+inD8BlahGQCjRvuJGnNVu5GRUPqC4MNuAo9LFpt1J7YJoUZRhd0WXZzuGKVenx0ldtuDRKouEhgC0D6xNgJsjKbmFHI6Mrfa4mVrOTnSo449FkjgcGUN6fgEqUcSCL6MbLCAqZweHEv/AZDHR1rs/7X1Gk5jrjKejA8m5eTioVXSrH4iDRsWFhFSupqZTYDDirLbd60wAFOUs7ijFsu/TSQ06Y+Wa08rIyNwdyIJIptIcXRtKlwf7sz+l9AqHQlASIDYmO/0sr694jDy3JM4ZDqAV7HnxhWHEHsph2XvbKtBPrWZJvZKHn2tgcXzQjfjY18aSJxLs1ZmzqYdL2bUKe7xV/sRHpFb6+of+Ps+kR+/nZKr1sgc9nR7ASe9lc3wvzxHs/dB2U1NNpAk3rYibViBTV/q1buKhQC1KtKrtw7R+LbmcfZD4gkPUqxPIMz378tOhMHZfqfoKn51KydtDBqNUu3M0WYVCkJjVohPp2UkcjQhnX5zA+ggNwV7D6RHQD6UocSrJjt8uSDT1kHigdSskKYcO9Rw5lb6ZAlM2TzXqjquyFe+uP4qzUo9WoUZnZdesYy0FhyPKzqY7FxdDsFcDzqWWnsBJLYClgHyDLIhkZO5l5BgimUqz948z9FSPIsAhqMRxUVDwXL132LDoKK/88jCrLZ/yXcwHHEvaw/7ETSyInE5a8GnGzrIdN3KrWPPhPp6q/RZOqpKrNI4qZ572n8VXz6/jySbT8bGvXcKuEtW82u4j1n96tErXz07PI+lIIf19xpSydfUciPGyPbt+PsvYwJdK2Zu6tKF+QXvOHihd90cdGodLuB6fBIm8wnze76EpERAN4GEn8HpnDbpCPdN6N2DR+edYF/kNJ5L3sTl6GZ+ceYYH27nQXeWDU7RU4udmmTtiOFvjPfj4hMS+WAO7Y4zMPSpxNrsWYzt2ILGg6KPoXKqZxac0LDqhZX+chFmCxHwLXWt74uJ6kc/OPc/e+PUcT97Hr1fm8nvUW3w8uis/Hz7Eqx0UqP71iebnKDKqgYk1p86U6d/yY8d5vIkZH/uSE2gU8Ep7kW/27bvpe5aRkbm7kFeIZCqNUW9i/mO/8eyCqSjr6QnTncNZ9CBI1YL1C44imeGstIf4/NLxNbtT1/FKzz7YOWgozL99qfkpcRl889xmJs2fT6Y2jgRDBLVUdfEwBvL9pK1EX07io4dW8eovC0gxxxKadRYf+9o0c2nL5i9PsOWX0itHN8vP72zh0df7Mn/0CswaHRKg0Ks5szWSr99Zh8Ui0c/Qnlnjv+WS7gQ6Sz5N7NuRccnAx68VBYQLgkDb3o3pNb4VWkc1cRdT2fz1MVIvx2OvyeH7ix/wfs9pxOQ4k5CrpL6rETe7VH65uIRnms/i80tz0P+rya9ZMvHDldk81+8z3tsa/S+vK14VupmvN6kGZ6JzJEY10tLGR4lFgmOJRnZE6RnV0J56LhZO2GjO3tBFxNG+gI2XfiplSylI4HDyGnyc27Pi8E5md+9BZK6K9EKBIFcLCnMOr/2xhXxD2aURsgp1vPnXn7w+aCCFFgciswW87SHA0ciSvTsISUqu0L3KyMjcvciCSKZK5GUVsOCp33HxcCSgoS/5OQn8dLGo5stL34xmZfocm2NPFu6iVffGHNlqe8vnVhBzJZn3R/6Mb6AHnn5uHE48T2L07mJ7xIU4prRdSIOWATTtUJ9jcRl8se2zart+rTqetOhbl79if+R42m4sSLT17M7gro8T2MiXqNBEdiw7wa4VJwlqGYBaq2LLpfXF7SxUGiWv/vwoYQ5H+Tn1HfLScqjfsCk
Tf57I8Z+jyciL52LGcS5mjMXPoQ6uGg9CohJIKyxSIDpzDklWstgACkx5iNo87MMTMRmvbye54A9UrFdav6bNuJqlYHYPLX9c1vHuAR2iAD391cy9z4mdkXqGNVCyLsyMwcqW17j6cDxxl835Dydv4bn6w5m5aA8To2Jo6OWBi52WtRlZpORVvJddYk4uL61ag4+TIwFuLmQUFBKRllHh8TIyMnc3siCSqRay0/PIzYooUZ1aoVJgsNj+Zm6QClH+e4+jhlFrlRh01nucJcWkkxRjO6Ms7HwskZfiMZuqtwL35CXD+Tz+NbL1GQj//O9E6l4uZp5kxheLeHvQD1jMFiwWiatnYxAVYonXefz7g9gq/EhIwvXigxHZl1iQPYOXxs/HZLq+xZWQH01CfsnVHqOl7J5vRoUFU9MADDf0hruZvmFqpYJRDbS8sTeP/H9qBFkk2BVj4GyqkXe7OXIqKpS3Owey5KxEYp4FQQBntcCzTQQuHL+CvoHtVUSTxYikFYr9qWj/NFsk5+aRnFv5QHkZGZm7E1kQyVQJURQY8mxXOo1sRIGQg1q0IytSx6oP9nBhTzRtRnTjUKr1wn2tND355eTBGvexXrPaTF4yAYWLC3kmC25qkZTLMXz65LcU5pW/XadUKRjxYg9aD6xHATloRHuSL+ayau7um2qdYY0WnYO4bD6Ov2Ndnm05k6KcKAlRULA9ei3HCrbRaVBzkqLSGfN6T+xqiRgtBrQmJ/b8co5D685Tq40TyyKsV2JenfgV01p9gkJQYpZKCx8XtRsqHHBSuZBrLH0vSkGJVulBaoACuJ7tdjNNVI0GPbuiDcVi6EbSCyVC0kxcSk7kTFwc7/boicGiwmQBZ5WFLXtP8tfBs7zbtjNb463Xi2rp2YkzsfJKjoyMTNWQBZFMlXhx6YNc9t7DvKgFWChatfC29+P5n97hpxd38OJLT3Im8xAFppLfuBu4NMch34u0xKwa9a9BywBeXvEi/ztsIDL7eoxMzwBfPtn/LjO6/a/M1huiQuTVXx7lkHotcyLnFR/396jH9OVvsXDcWlLjMyvtX8NOtcE+lZ6eQ1h85v3i10mj0DK2yfPkGnKo/0BTHIMEvo2ZTVZ40eqHUlAy9PEnmNzzASILL9qcPyE/mmxDAUPqjmd95Pel7KODXmL58UjGNHyJHy/NLmUfWvcp/giJqlIXeZVKy4kk2xlah+ON9PAPxMWpFq/sMZBrKBKpagWMa9aCga4i8TnQ3L0DIRklhZ9a1DA4cCIvraxacLuMjIyMnGUmU2na9m5Cil8Iu5PXFYshKAp0/TzqdcbNG0B8phfT2/5Mj9ojcFG74W3vx6gGLzG8/nxUTgEINVz99/lvnubNA3ois0sGp+yLtfB7hMQzHz5U5vj7xrThnMNOjqbtLHE8Li+SpfFv88QH/avmoFkg2KsjS8/NKSEa9WYdP4YsoJFbC+q3rcWi8JklCkSaJBPr4n9E01SHs8bV5vSioCCn0IQhpwXPNptDkEtTnFQuNHdvz7RWX3D0soLfTl7g+FUVLwd/TlP3tjipXAhyacpzzeeRn92E9ecrXozR39WFwc0a0bthfexURd+3Cgx6HNW2f89OaoHW/gHMP2Ym94biiAYzfHfeTJu6Tfnh0CW6ejzHg0EvUcshACe1K519BzCj9dcs3H6ZrEKdzfllZGRkKoK8QiRTafo81YqfUt6xass1ZFHolsm6CAXnzjjTp84URgY9jdEisDvaga9TzDzeUKTh6E5cuBBXoeupQyt23jWc3OwpVNsRb2NbbGuUmTF9m5c5R7eHm7Io+RurttTCRDT1LFXKlAto4MPeuI02Cz9ui/6D/v6jMFisz/9Hwje82GQuimjrW2IdfHqz50oyK09cIsDNhdFtnqFLbS3R6fm88ccF0v6pvLzq5BV2X3ZgdJtHaVfbgeRcHR9viiAuq2Jbgq52Wt4ZMgSd4MSZVCW+SomHO5o5EXWFLSEXmHBfEKE2Qns
G1oWd0WbMNrL514ULDGnegmlr9hLs58P9rV7FXq3kbGwGz+7aS6Gx7BgoGRkZmYogCyKZSqN1UpGXnGPTnqSLwWDugMECWyJhS6TdP5ai1Zp4vRL7Fm5kF1assGFRZlPFhZFvoCdJBdb7WwGYLKCXylmhUlswlhEYnm5IwsnNodKCyKmWlvi8f6e0XychLwaDZHv1IzE/lrxUHc/Um8U3EbNLrNT5OgTQy3cck3YVZczFZmazcNdpm3Ol5uWzdP/NZ/wpBIGPR43i6wtqonMsQNH22IYIGNOoMb0aSYimNLr6eXIooWRA+uB6IpIph7BsZ5vzx+dZ6FKnqE7UuYRkziXIKfAyMjLVjyyIZCpNQaYBF4072XrrAa1+9vWL20JYI9DRxHZV7k3EpxSlebvgXyFRlBiVRm0HBWB9BUGtAI1QdpFBSS+iUWhxVrvRJ2ActR0bkWPIYF/cr1zJOoeX2o/s9GPl+mLvqKXXo91o3rsFFqOZo38e4fCGUySFZhLYMYjLmdaFSKBTfTTY25zX36keV0/FcelQPG+8uJRQw0lyzOkEObRBMHozfc3B4hWU5r7ejG7bDk9HR2IzM1l18gTRGVnl+n4NbycHHmrXjkbePuTodKw7c4rjMfH0ahTE0RQN0TmlxeeaKxbmdGvMtFUreP6++xjQpTZnUkUUArTxsnAmNowN51JoVLeXzTijei4KTOay6wjJyMjIVBVZEMlUmu3fnmLwrLGsjP2ylM1N44kiy41+fhaOJZYea6eEpu4mPom3UY2vGsjLLkBZkEc9F22pGCKA+xsouLDllM3xhib+7NyRwMTHP0BUNGV5iB1XMsx42tflgYYtGFo/mpwrZnLreJfpR4NGvjw9YwRrY0X+SDSjEqHP2Ad4Z8oQvpi7lvfGPMSu2PWltrwEBEbUH09hogU7pQOFptI1dYZ5jue3N4+REJnK0Y0h1B3RHkfH+mwkljDn6+0q3h48EEFTi7/CICnfQj1XR6b2DyAk5iI/Hi4/IHlYi+YMadWR3y/D6hgzblp7hrTszyMdsrFYJL4KsV2K4HiySKvatfh4+w4c1Gpa+vlgkSR+3JWI3mTmgeDmdAtQseayDoOVaUY31hKVLH9UycjI1CxyULVMpTl/KAxNqD/D/J5AJaqLj9d1bsiUwHl8+9xaog9dYFILBfaq66tAfo4ib7cX+Xb59lItIcr6cQnX4xKuv6lYopDdp/igp5pgr+tvdYUAw4IUPNhI5MDaE1bHGZr4kx2kIUwoBDrxzn41lzPMSEBqgYVvzyrZE9WERIOG7CDbP4WN7Xlq5gjeOiVxINGMyQKFJtgYbeGTq2omvH4/Zp2Kl9rOxk3jWXx9Z7Urk1u9jcbswA+vbeHl+h/j6xBQbNcq7Bkb+CIx2wpIiCzactQ3rs0xYwa7MhMIc74unh5p15YEYy2WnLWQmG9BAiKyzMw/Zsbfpxmd6waW+RrW93SnX4uO/O+QmYvpZixSUbr8rxct7E5woY6nB4Um2ytterOAWlmUsp9vMHAkKpZj0XHoTUUiVaVQciDWwNvdHEu0znBSC0zr4MDZFBMSNRt8LyMjIyN/7ZKpEl9PX0evB9sy7YnPMWv0KFERdzaTT1/+g4zkbP6etZJHPpb4amAb8s0CCgFEg45li3cRdzQa233eS3OzQdWiKFCvkxfzjj/MxFZzcFLVo9AEjmqJ6Jwj/O/IQh55fiYXj5VsLXJNDOXWEXhgRBcWn7G+XbMtysi87nUoqCva7Ao/IrgpW+JFCk2lV6gS8iyY7Z05nrWGk6k7eLzpFOxVjkiShMGiZ3PkKiI1V/HwcWbpk1sY9eqLeDawxySZEArUbP3sBMe3HS/l87/p36w5bx60voKz/JKFqR06ciTKdgPXxzt24peL1sO+jyZZeKqlmg61LBxJsB6v1crLzOZjKTbnPxMXR7O6bfjhnIkJwXY4qwUsEpglWHdVh5+DRFhilM3xMjIyMtWBLIhkqsye1afYs9r
61tPDr/dB1ewybx2dWZwp5a714qlxb6G/FM3FY1E15perlzPJpliSC+KZc2yC1XPs/Ev+CfxbWLjYO5FaYHs7KCZXwMfZkYTsXKv2Vv51WBVle3yB2cD5jP1EZIfyxZn/lbLnO+bSrft4jm4NYfELa63OUZYYclCryNSLWGwItlyDhFppZ9V2DW9nV2Jzbd/DyWQDDzUSOZ0M+n9poibuInpdOqlltNC4mpqOpyofk+TA/CMlz3NSCzzWWGLi7itl+igjIyNTVeQtM5kao1ZdL2r30bA6bkmJtPEMXSqfR8zk0ff71Oj1DXojWkXRw15AoIl7Kzr63oe/Y73ic8Qbqi/fyLWO7opygq4dVKArI+1bbzKW2C78NwIidgoHoEgoPtToGR5pPAkf+9oA2Ckd0OXbLmpYHgazBe0Nmq+5p5Ie/iqCXK/ft6KcTwFRKPoB8HcS6e6vorW3sviYSpD45fB+Pugm0qO2AjsleNqJPNpE5NGGBczZvKVcP9/dsIHnWxoY1VDEVSPgoBLoV0fBO51h9sYNGC3V2y5FRkZG5t/IK0QyNcaAie3YlP6rVZvBoifUeJym7etx6USk1XOqSl5WAfYF7nTy6cWAuqMJST9Jui6FvoEjqOUQwN7YLYQfKRnUrQ6NK07vBw0XQ6Jp5dWIs6mlt4M0CnBWGcgoKLTpw+YL5xncrjY/XLBudxMFung8wPCgx1CISg7Eb8MimXku+E1Uopro+Cj+XG1jcAUwms2YjPl0q+3MyEZ2nE81kZhvpn9dDc+6KtgQVkhkqvXGrtc4Eh7GgLot6eavJb3QwsV0E/5OCh5rbsfmcD21HQvZHnqVQxFR3B/ckudbBKI3Gtl84RyfR5U99zUyCgp5dvlv3NegPuOaNkUlKjgaGcFzBy4WxxrJyMjI1CSyIJKpMbzruJJgo4s6QKIlAi+/xlyiZgQRwJnNEfR67n7mHptWnMW1l024qN14t+MSPnz191JjrsUqueDPju8P8fr8+szNE0kvvL5apBBgWjsFPx/eU+b1zyUk8WTXHII9nTmXVnKV4/5AkYjjYfTu34A1UV9zIGFrsW1HzF+09erGk01e4ctw253eK8Lu0IuM7tCL1/fmYix2wYCTWmBBH0fe+bvs2kNbLl7kpwkdeW13Psk3bB/+fknHG10cOBtTlM2WbzCy8sQpVp6wnblXFhZJYvfVcHZfrXhlbBkZGZnqQt4yk6kxzAYBPwfbGUz1nZuTlV6zXcXbDAli4em3SqW0Zxsy+frCXHo+EmxzrDo0Dt2xqywZ9zmv1MrmpUYWBtVR8HgDkXmdJHZs3ceRSNvByNeY8/k6BgiJvNsahtZRMLKuyIfBZrxOHufkjztJ1EeWEEPXOJV6kNDsM3ToZ7uadlnxQ9fo3aQZ/zuYf4MYKiLXIDHvSAFDWth+DQAGNW/G0tOFJcQQgATMP5JP68A6ZY6XkZGRuRuQV4hkagxR6c7QepNZfHZaKZtaoaWJRy9ONzqGIb2gQvPdbJaZo6s9+Xbp6M3WKz1fzjrHiE6Typ0nOTaduaM/oXaQN/5BPlzJyGNNvpGs+hqoQFHJND8zM49tJSBdSyc7T4wmM3PPx6LXm3j93QFsi/nR5thtMWsYOeNVDsZZb6FRnhhSKRQoVY7kWivwQ1H6/ZPNfMv0v3P9Brx31HqclFmC2DwlAW4uxGZWrM2HjIyMzJ2ILIhkagyzBeKym/Bwo1f4M/xLDP8IEw+tN082n8+hWDdMgXZkp2gqNF9FK1RfQ61RoTPbju8BsFDx+JT48BTiw4vSx6Um/uWcXZpYDx1Z0f9sIforAAUKdwWFObYzsApNBSicRLKDrL9G5VX5VitEdGXUCIKi31NZWKSiH1sUmECrrP6PkmuB7f91Kl7JXUZGpirIgkimyvR+qB09n2jxTx0iNQnnMvnjk31cvJpAosYPQRjE1FbdgSxEQUmWzoWlp+wZ2cDESmNyuR/4QZ7uPNO9KV52gMmMJV/N1k8
Pcmr3ZQA8/VwZ/WpPfJu7YMaIqNOw5+ezHPjzLL4q21t2bhpPClPLbwwqigIDJ3Si85gmGJU6VGgIu5zP9wcukcvN9TD7971ezTHQzrsvoRlnrZ7f1rsXifmVfyjmG4y4aSyIgmBV1DipBYymAlSiyGOdmtGjoRcmqRCVaMfJqEx+OnyR5JwsApx9iM2xrpwaukJ0RhYv3teFAc1bkG8CpShgMRXy6Y7dHIm6yaa8/xThvNWIosD9I1vSqU9tjEIBKkHL1XNZrP71LHl5tvvJ1TxlrwLKyMhUD7IgkqkSkxY+QGr983yaNBWTpSg9PNCvAdOWz2TJJ6eZOLQdbx2VOBDnADgUj6vrDGopl7gs281hAdoF1GJS70B+vfI/0gqLMsLslA48OHMy/s28OLnlCpO/HcJPiR8SF1kUnK0S1Qx+cixPdRrG6U0R9OwxjH2pG0rN/ZDf82yYWXbbCkEQmPb9w1x03838uM+xSEUrSg29W/Dhi6/x8l8HizvGV4ZMnYmOLr3w0C4jXVeyeKGT2pU23iM4GpZZ6fkBtl28wP1BrVkXVno17IlmIqtPHmPhQ704mbmCBWd3cK0EYwuPjnz5yBQ+3X6KZ3oNYc4RShVn7FxL5FJCNF8+MoK4Qlcmb9Oj++cytR0VzBo0lB/372FjyOUK+XpNDN3s9mhVERUir/78KKcdNvNR2ObiJrmNfVvxzpypfPToSnIybK/k1SRFWY+yKJKRqWnkoGqZShPcrSGFDWPYmLisWAwBxOSGsTj2DZ4Y34xlC9bzQSeRLn4KlGLRisQDDUSeaaFn9saNZc4vAFP7NGPx+RnFYgig0JTPL9Gf0GykN88tHM4X0a8Tl3c9U81oMfB3/E/QMo1L+2NonNyLxwJexsuuFqKgIMilKdMafMyVVTlcPm270zxA9wdaEeF5hN1JfxWLIYCrWRf4KfJtpvdrXXzM29GBHkF1aBfgh0Ko2MNLbzKy/qoLk4K/4T7/UWgVdqhFDd38hvFS6x9YfcmFQmPVGpv+fvI0PsoEXmitoLZjUWPVBm4K3uioIDrxIvU8HTmd9TtHkrZzYz3qC+nHWBf9CUNaBrD13BFmd1PQwlOBQiiqMzS+mUivWtmciYvGKLqz5LSpWAxBUZf6Gbt0vND7vgr5ebvEEEDvh9pywWkX+1I2FoshgMtZZ/k1Yz6Pv9f/lvskIyNza5FXiGQqTf9n2rAsZbZVW6Y+Db1vJpkHQpkXlUaPJzrwRrM6GIwmdh+9wGvnwrFYJJzKmL9dY39CMw+UKOp4I1sylvFo/SlkJ1hfQdmU8isPPvc6i55ZQ6PWdRj29Mu4eDmQcDmd71/ZQ0pcRrn32PPxFnyZPMOqLSE/Go8gEz5ODkzv1x+FyoWQDBFHlcTkXhb+Pnuav8+dL3P+XZevMGNQMK/ucqBX4GSebDYBUYDjCXa8thsmtrSw7MClcv0sC6doic+XbqFxgBcje7fHvZ4jCSkZfPvTSeJSslg0sw8LLm2zOvZK1jlGtnBg6XfHuXQwmgd6t2NUCx9y83Vs3n6KM1fj+fqtR/nsvPVYrAKjxKlkI6O8GrL9hO1q09e2yG6HGALo+nAzPktZatUWnXMVz6b2qNRKjIbyt1hlZGTuTmRBJFNp7FzVZCfb3s5JNETh6etCxslwdp0M58ZqOu4VmL+RszuJOtsP0bi8SCwK21WcM3SpOHppAbhyJporU8teDbKGoLHYzFIDSNcl8PGokXxxVklUtgX+CdJeCUwKbo8kSaw/b7uwYlJOHjm5SXSu5cv2KAvbo6630WjrLWJHJmGp6Tft9zVuXHXJIJyflx4pYfcCFNM6l1gV+Td6XQa1d8aiLzSwbkXJmkVegPP7dkRn2w5Ov5whEKxTcGbjnVtfSFIbS6xy/psMQzKOLvZkppa9xSsjI3P3clu3zJYsWUJwcDDOzs44OzvTpUsXNm/eXGzv1asXgiCU+Jk0qWSadExMDEOHDsXe3h5vb29effVVTKa
S3+L27NlD27Zt0Wg0NGjQgJ9++ulW3N49jy7XiKPK2abdRx1ARkrlHyBpsTn4KGzXuKnlEIBosq3pXTTuFGRWMTjXIKIS1TbNfo4B7E9Q/SOGSrL0nJkHWrdFFARUCgX3t2zInOFd+N+wTnSrH1Dcv33ulq00sIvjnc4Cg+qq6F9XyVudBDp7JPP236Vjn2zRzNebmQP6M2f4cMZ17IBfqrrEFpRfPS/GzR7IS9+O4aHX+uDmXfS7c3P1QCijm7yPhx8GnW2xUJCdj7+T7Y+S+i4SsVeTUKoU9BjZmilLR/LC4pF06N8MUbwz4mJEkwqFYPu95KbyIi+n8rFiMjIydz63dYXI39+fDz/8kIYNGyJJEj///DMjRozg9OnTNG9eVIzumWeeYfbs69sy9vb2xf9tNpsZOnQovr6+HDp0iMTERMaNG4dKpWLu3LkAREZGMnToUCZNmsTy5cvZuXMnEydOpFatWgwcOPDW3vA9xq7vzzLglYdYG/9dKZuT2hXHPG/SErMqPf+5Q1cYo32SLcJvmKTSWxUD3R7DlKbEQeVEvrF0c9VBXo+y44PTlb4+wMGVl+j52DB2JpVurOpt74e71o/NkbZXRy6ki/RvEsTjnetxMGkN6xP3oRLVdGs+lKfb9+Ptr/aTk69jyXfbcbLX0LaRP6IosOBqPJm5hWiAsooS5NYRUAgCs+8fik7wZFMUZOokmnt48MHLzdnw/U7OhMYx7r1BuHWW2JbxGymFCQS2b8DzA5/g+LJoJLMz7X36cDx5Z6n567s0QZRcEUQByWw9Df7P344ybtwA5hwuLT61Cujsq+b9E5G8t/FJDhVsYHXmSkRBQZcpAxj+8lN8On4VWWnWm+PeKo7+cZluDwxkX0rpuDY/hzpkhRkw6uXtMhmZexlBkmy0wb5NuLu78/HHH/P000/Tq1cvWrduzcKFC62eu3nzZoYNG0ZCQgI+Pj4ALF26lJkzZ5KamoparWbmzJls3LiRCxeub1s88sgjZGVlsWVL+U0nr5GTk4OLiwv9vCeiLGPF4L/GS18/yFWffexJ/rt428XHvjYTa7/NN89tIfZqcpXmb9O7EQPfasEPsXPIMWQBoBY1jPB7CsM+d46uu8RTS/rzbexsUgsTARAFBX19RhEY25HFU6x3iK8ookLktV/HckT7F0dTr2dgBTjV54m671CIN28eVNkcP7qRSJ/AZD49PZlcY8nChQGOQYx0eJ0PXrcev1MRsoM0PDm+F3HmeuyPK7lKpRBgdjuRK2t3oByQyN8JP5Ua/0zdWdTSdKfAwcT2mA85m3ag2Bbk0pyHG80lN0XL5yPnoC8svUpkaOKPqYUTP81+hk0RJlZf1mH6xw0PO4HXOjmgKMhAaUljcfR0svQlt/98HQJ4wv4t3h/1c6Vfg+pAqVIwc/lj7FWs5ETa3uLjdZ0b8rjnTD59bDWZVVjtrAoVqUauDzRQ1z+V/r6hjHEuv3XKmpy2bE9qQlScFwCaGPkzTebexazTET73TbKzs3F2tr2rccfEEJnNZlavXk1+fj5dunQpPr58+XKWLVuGr68v999/P2+//XbxKtHhw4dp2bJlsRgCGDhwIJMnTyYkJIQ2bdpw+PBh+vXrV+JaAwcO5OWXX74l93Wv8/mkNQx5pguvj1qKTpGHWtCSHpbP4gkbSIqpfOzLNU7vvkJeRiETXp2NNkDAIplRFNqx/euTHPq7KCrpt1f389KCj1C4WtCbC3FSunF+SxSLX62aGAKwmC18PG4FI6b04I3BY9CJ+WgEO+JjDLz95WGGj+xAU4/6XEq3vkrU019iX9yaUmIIIDYvnHz3KOoZDcUFH28WT1Udgn3r8tvR0lt2Zgl+DBd4a8J9vH5yqNXxvyd8ydut2vLOQWdGNHyLwfXy0Juy0SicCM9y5L0DGt5uqLMphrKDNIwY1oolZwpxUCmY09MJvVlCKQrkGiwsPV3I1LZ2nE7dWUoMASTlx5LodpmglgGEn69YI9i
awGQ089HjK3jgxR68OWAsOiEfrWBP3NkMPpl6+1ewZGRkap7bLojOnz9Ply5d0Ol0ODo68ueff9KsWTMAxo4dS506dfDz8+PcuXPMnDmTy5cvs3Zt0YMuKSmphBgCiv+dlJRU5jk5OTkUFhZiZ2eHNfR6PXr99S2AnBw5mNIakiSx8ZtDbPzmEKIoYCmrpHEluXo2lo8e/w2g1DVq1fHk8Y9680PCHCIuhSIiolZoGNZsHOPfH8zPb2+2NW2FMZssrF24l7UL9yIqRCxmS7EYWHniJO+NqM//DhUJkBtp7CaiEjI5nb7P5txndHtp1rldpQVRPZORKxm2X/OwTDMmpapEyYAbyTFkkZWbwphAZ749rwRcUQiuxfcyPFDiyJrDZfrQrk4dPjplxmQxsz3KgCiUrGxdYDZwJn2vzfGnC/fSvGff2yqIAIwGE6s/2c3qTyj+PcvIyPx3uO2CqHHjxpw5c4bs7GzWrFnD+PHj2bt3L82aNePZZ58tPq9ly5bUqlWLvn37Eh4eTlBQUI36NW/ePN57770avUZ1IYoCDVoGoLFXExWaSG7m7SkgVxNiqLxrPPXJYL6MeYNMfVqRHQs6cyFr4r5mQofXaNS6DlfO3Hx2mc3r/+shmZybx+/H9vNe1x78fhnOp5lxUAkMqCvQ1qOQ8wlJaBRam/OpBS0mfZFYcfF0ZMzkfohKkb++2U1qfPkFGU1GMxqFbbsA5dZEMhWasRw6wWv3teO3WJHYHAve9iKj/S04RkWw9Pvr+YFu3s74B/mQn1NI6D8ay2g2o1YImP753fz7bSD8I1JtoRa0mHR3lviQxZCMzH+P2y6I1Go1DRo0AKBdu3YcP36cRYsW8fXXX5c6t1OnTgCEhYURFBSEr68vx44dK3FOcnJRzIqvr2/x/187duM5zs7ONleHAN544w2mT59e/O+cnBwCAgIqcYc1S+9H2tDn6daE6k9SaMllqHYQ+eHw7YwN6ApuffuDW4lXbTfyHJPJzEizat+Q8gsjJ83gyqTqE0TW2H0ljAsJiTzUri0jO9ZCbzSy8fw5vtseQduAWgxqM5y1UV9ZHdvJfiBf7drKgoMzULgaOZa8F4vFzJujx6IsdGRGt08wGGwHbceFJdPA3oIoiFZbc3TyVWDKzsZe6UiBKa+U3d+xHsmXc1k1fx2Bfx7jwef641nXk+yUHLa/u52r/4hJFw9Hnv1sGBbvPCL0F3ARfZng2JZft11hx6WL9A3sxvpw6346KLR08xpBVM7HVu1dHIfwy6aDNu9RRkZG5lZw2wXRv7FYLCW2qm7kzJkzANSqVQuALl26MGfOHFJSUvD29gZg+/btODs7F2+7denShU2bNpWYZ/v27SXilKyh0WjQaCrWdPR20eexdgQ9rmFOxPWVtE2soIFrc15b9iIfjPn5lqza3C68/d2JM9iubZNWmISzt71Ne3WSmpfP4r37Sx0/EZPAsx36UN+pKRG5JQssdvMcROopPXO3T2Fbykq2XVhZbPsz/Fu61hrCF6dm8lyLuWVee8e6Y0wc1JVvQkquarhrBR4KMLHj+4O8OHU2H518tcTWmVZhz/Mt3+Hr8VsBiLmSyLczfik1v8ZOzavLH+GHtPdIiI4pPq4QlDw3+AN+P55Nt1oGzqSqSvU7e6CByN4roQS7NaaxS2suZ58pYW/vfh+6K6oqZSPey1zbmpWRkal5bqsgeuONNxg8eDCBgYHk5uayYsUK9uzZw9atWwkPD2fFihUMGTIEDw8Pzp07x7Rp0+jZsyfBwcEADBgwgGbNmvHEE0/w0UcfkZSUxKxZs3jhhReKxcykSZP48ssvee2113jqqafYtWsXq1atYmM5bSPudESFSO8ng5kT/mwpW1hOCBdq7aNd36Yc337xNnh3c4iiQLs+TWgztAFIcGL9VU7vuUx5CZAZyTl4q5rZtDurXSnMrlrbi2s4ONvRe2xbAlt4kZNSwJYDiWRTsbo073y1n7cefgmpTjJnDHt
RoaG9th9JJwo4uv4K9QZ5si1mZalxhxI30alWb3qNbM+eP0/YnH/3tnP0CrBnbvdWHEoUydKLNHUzU0+t4/PZf/H00y05mLCJtzou5HTKoaK0e6cGNHVvzeaYVXQZ3aXMFiZ9H2vH9vzlJOTHlDhulkx8c/ltXur6La/8sZZ3hw4l1+zI6VQF9kroWsvE+bgwlhw4hGeigrfGPEO/ulmc1O1CREEHbT8yQ8x8/drfpa5paOJfodf2Xqe87DIZGZnq47YKopSUFMaNG0diYiIuLi4EBwezdetW+vfvT2xsLDt27GDhwoXk5+cTEBDA6NGjmTVrVvF4hULBhg0bmDx5Ml26dMHBwYHx48eXqFtUr149Nm7cyLRp01i0aBH+/v589913d30Noqbt6hFSeMSmfU/qOp56Ys4dL4g8fF14+YcxnDHtYXPWFwiCQKdp/XhgxpN89uSaMisDJ0al4m2qZ3M7aKD3w+ycV7U6RACdh7VgyIy2bMlYwcncS7h7ejG256NkJLsy++Dxcsfn6wzMf2cn/nkFNO/UDJ3Rwhf7NpGXVcCHm15ig43tNIBNkb8y4dV3yxREAOv2nGZVzHm61a+Ds1bLn1dTiTuagkeKEbW3hf0RWziYuJ2Wnh1w1XhwIf0Ea65+D8DMdveXOXfbYQ1YkLLIqs1kMZKmj8BepeLl1Wuo5+FGSz9fUoxGXj0YTb6hKDtNbzDx6ft78MvOo0WXppjMEksPbCc7veTv7ZoQkldFipDFkIzMreO2CqLvv//epi0gIIC9e21nplyjTp06pbbE/k2vXr04fbrqD8Y7CTtHDbmS7aDbAlMeGmfb9XHuFF78bjTfpL1dXEMIYH3BLxy338XUb99m9gM/lTl+2awdvLRoPkuj3y0OrBYQuM/7fjziG3P2wOoq+RfYyJdeLzfiwysvFNdZytSnEZ79Dv1rP8ITHYP59VjFeo2lxGWU6p+mclSQm5Rlc0yeMQeFumIPRYPZzO6rEcX/dgLUaiWF5qKVLItk5mxqaRFtppyCg4LFZpYaQL4pC3t10XstMj2TyHTb78u0xCz2rD1p3f8K1NuRkZGRqSnuuBii/xpBLf0ZPq0LjrXUCAikheXx14KDJESmljku8mI892l6sBPrtXaaurUh4nSiVdudQssuDbgsHS8hhq6RVBBHjNt5mrStQ+gp29s54efi+P657Yx98y0cghQUmgtwxp3j666yaMkaALT2GoY825ngAXUxCQYEvYp9y86z748z5W7L3f9iF1YmfW6119eO+N95tfVAlh27RFmzFD3gNbjgX6p56ZX9MbTq2pPLmeesjm3p2ZXE8+l41nLl/heH4BdcD4sExqxcNn+5idNZujJFREGBATfRCwGhRCf7a9gpHbDklZGmBmTFF+LjVJvkgnir9kDHZsRmHi1zjluBo0bN2PbtaFc3CJNFAMnA+nNn2HLxcoXGC8DQFs0Y0jIYCRVK0cKRiDB+P3GKAqPt1iUyMjL3BrIguo10HxVM58mBrEiYR0ZMkQDycwtk4vcz+OONo4QcjbQ5Nj0pGxId8HeoT1x+RAmbiMhwz6f4/NuK98G6HbTsW5dTeb/btJ/M30OLPsPLFEQAMVeS+XTCKkRRQKlWlui75eBsx8zfHmWz7ic2x85HQkItaug3bjRT+43mi8l/WBVF11YrnFu4knTZen0cCYmkwnB8nZ1IzCm7cN81UcS/ykV8cOg0fz34EFuilhVX4r6GndKBIXUf4s11e5m8+hW+CRUIu1i0UuOq0TJu9pN4hZzjh6vWV1yucWZzBH37P8CO2D9L2R4Keo4dc8uubLzxy6OMWTSJxRFvl7I1d+tARIoRnen2trVwtdPy6ZgxrAlT8eehotdIq9AwPKgr79Stx+xNZVelF4APRtxPVIEnc46ZMVhAQEEnv2YseiiIaWvWkKevnng0GRmZO5Pb2tz1v4yDsx39p7RiccQsMnTXV4MS8mNYGP4aj37Qu9zGl1+/vJ5xrm/
Q2+cB1P/Uumno0pJXGn7G5o/P3PGduU0Gc5mNU1WiurhGT0WwWKRSTUifmD2AldkLOJ12sHiFxGDRsylxBQm1z9BteHCpeW7cupHUZf+JKEU1JkvFatbk1hFK/RTWU5BV4MrU1t/T0acvoqBAQKCNVw+mtf2RHJ0bzz4+kPdPSoRlXn8tsvQSn58106p9MP6utkvRAzh6aGnn042HGj2Di8Yd+Ke9SovX8LDzxtnLoczxkRfjOb8ijReDPiTQqUjQ2SsdGVL7cfq6Pc8n28sWZLeC6f368s0FFSeSrr9GOjOsumImh1rc16B+meMHNW9CTKEnf4UViSEACTiSYOaXUDVTe/eqMd9lZGTuDGRBdJvo9UgbtmeutLqNYbDoOVW4h7a9mpQ5R15WAbNH/kz+L7V43vETZnh+RatLo/jqsS0c23xnB1MDHPv7Ml2dhti0d3UcwvGNoZWeX6VW4t3coVS6+zW2J6/mvnEtSxz7dxzL5cQ8Grhaz2RTCkrcVYGk5lW+EOZ9DeuzMdzEG3ucUQuv8WKrtbzc5i/cVG/xzj53dsdIxBdoyTVY35RbfQXGtG1rc35BgPqdfZh/4hVC0k/yRNOpvNr+I0YEPcHOmHV8fnYWXR+2nal3jW0/HmPZM4foEfcUr3h9xUTNXKLW1mbmol0YzBUXrTWBvUqFh5MnEVnW/fgrzMLINrZfI4BhLVuzMcL6+NAMC3U9aqES5Y9LGZl7GXnL7Dbh19idc/lXbdqjDaH4NeoDu2yeAhRVKt639gz71p6pXgdvAVGhCbik3UcT1zaEZpUMem/u2gFVggfxEZVraQFFxQTTDLbjqPRmHYK27NWdX4+F8tHo6Xx5/uVSmWyPNnqVFcfDKu0fQF0PT8JzBApN8OcVkT+v3Fg3ScIgiVzOLLs1x+h6Hjbt9vYaskxFweYh6acISS+9PWZRV2wrKCEyle9fu74Na2jijyVIQ9GG0+3D09Ge+NJJhsXkGyVUynKy1gQlZS1GphSCi52WtPyKlVqQkZG5+5AF0W0iMyEfz4BaVgOKAbyUfmQmlPEpf4+w6Lk1TP78CXrVGclJ3Q6QBDrY98cYqWXxy6VjXm6G3KwCXFWeNu2ioEA0l/0nkJKbz7zNF3h7yNfYqVWIohOSpMdszmfl8Qh2hF6xOdYp+rqQEQWBLi0Dad/CC5NJYs/xOEIik8mrnYtPkESIjTmUSPg72BZEvg4i6fm23yeFhQaclK427QICSsvd3ek8q1CHVxn1N1UiUEaWHIBCsKAQxFL96K7hroVcGwVjZWRk7g3kNeDbxO7lp+jv9pBVm4BAV6chHNtm6zF576AvNLLwmdWsmHgU7eo2aFa15pcJB/li0h+l4oFufm4D+gQRTztfq/bOXv04/pdtQXONPo2a4KgN4vdQH97dr+bLUy5k6OsxrGUHm2OcoiVcwvW4hOtpnK/lqxn9aNEtmkPSp5xWfcHgoUYWPNeLixsuMcDb9ipVc3sjwQ5GlDb+Ukc3EFlzynZQtMUikXalAD+HOlbtbTy7cW5blM3xdwM5Oj2SKQ8PO+srVX0CFWy/eKHMOfZdDaV7bevZdr4OIrkFWehNt3drUEZGpmaRBdFtIj0pm+iduYzxfw6FcH2VQq3Q8nTdN9n97YUqC4K7iZS4DLYvP8aO345VaxuHZW/vYJL/bLzt/Uocb+7WgZ7CaHYsL7vgYc+gunRt1IpJW3PZHWMgMd/C+VQTs/bnczrdgXkjShY1dIqW8N+lwyVcjzo0Du3VBGa80YWvomewKX4ZSfmxxOZG8HvM5/ydt4BJz7Xh6E87eLEJqG94HitFeLohhK0/jD4pmv91BwfV9Qe+KMADDSXa+5hJO1OyttG/WTl7F0/7vo2fQ2CJ441cghmkmcCmb2wX+Lxb+HzXTl5rL+BtX/IjrYOvgi4++Wy8UHZM3R+nzzIgQEcrr5KiyM9RZHpbic93l7N3LSMjc9cjb5ndRn7/cBe9Itswc8JXZAupKEQ
", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from sklearn.inspection import DecisionBoundaryDisplay\n", + "\n", + "f1 = feature_names[2]\n", + "f2 = feature_names[3]\n", + "\n", + "clf = DecisionTreeClassifier()\n", + "clf.fit(X_train[[f1, f2]], y_train)\n", + "\n", + "d = DecisionBoundaryDisplay.from_estimator(clf, X_train[[f1, f2]])\n", + "\n", + "# labels = [class_names[i] for i in y_train]\n", + "sns.scatterplot(X_train, x=f1, y=f2, hue=y_train, palette='husl')\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "175ce92d-b61d-425c-a2dc-480394f26074", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from sklearn import svm\n", + "from sklearn import preprocessing\n", + "\n", + "# For SVM - it's important that we normalise input, as features have different magnitudes.\n", + "# If we didn't do this, the model would only look at body mass (measured in 1000's of grams) and ignore bill length/depth (measured in 10's mm's)\n", + "\n", + "scalar = preprocessing.StandardScaler()\n", + "scalar.fit(X_train)\n", + "X_train_scaled = pd.DataFrame(scalar.transform(X_train), columns=X_train.columns, index=X_train.index)\n", + "X_test_scaled = pd.DataFrame(scalar.transform(X_test), columns=X_test.columns, index=X_test.index)" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "id": "a4ed72f5-4050-470b-a687-39d502a41070", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.9565217391304348" + ] + }, + "execution_count": 77, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn import svm\n", + "\n", + "SVM = svm.SVC(kernel='poly', degree=3, C=1.5)\n", + "SVM.fit(X_train_scaled, y_train)\n", + "\n", + "SVM.score(X_test_scaled, y_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "id": "d2def741-ebae-4438-913c-7b2b73ca5cd9", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + 
"image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "x2 = X_train_scaled[[feature_names[0], feature_names[1]]]\n", + "\n", + "SVM = svm.SVC(kernel='poly', degree=3, C=1.5)\n", + "SVM.fit(x2, y_train)\n", + "\n", + "DecisionBoundaryDisplay.from_estimator(SVM, x2) #, ax=ax)\n", + "sns.scatterplot(x2, x=feature_names[0], y=feature_names[1], hue=dataset['species'])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "e098f962-0c27-4725-bbd1-d8bd092e4c83", + "metadata": {}, + "outputs": [], + "source": [ + "# Decision trees tend to overfit on data with a large number of features. Getting the right ratio of samples to number of features is important, since a tree with few samples in high dimensional space is very likely to overfit.\n", + "# TODO - cross validation vs. feature count" + ] + }, + { + "cell_type": "markdown", + "id": "3aa1bc5f-ef9c-4e88-b889-2f15d26057f9", + "metadata": {}, + "source": [ + "## Tuning hyper-parameters" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "id": "08f8d620-9319-4fc0-8a60-8ccb329fb029", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "max_depths = [1, 2, 3, 4, 5] # Would need more samples to go past 5\n", + "min_leafs = [1, 3, 5, 7, 9]\n", + "\n", + "accuracy = []\n", + "for i, d in enumerate(max_depths):\n", + " for j, m in enumerate(min_leafs):\n", + " clf = DecisionTreeClassifier(max_depth=d, min_samples_leaf=m)\n", + " clf.fit(X_train, y_train)\n", + " acc = clf.score(X_test, y_test)\n", + "\n", + " accuracy.append((d, m, acc))\n", + "\n", + "acc_df = pd.DataFrame(accuracy, columns=['depth', 'leaf_min', 'accuracy'])" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "id": "c0a5a956-3535-4b8c-a705-27b6eb552b83", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "max_depths = [1, 2, 3, 4, 5] # Would need more samples to go past 5\n", + "\n", + "accuracy = []\n", + "for i, d in 
enumerate(max_depths):\n", + " clf = DecisionTreeClassifier(max_depth=d)\n", + " clf.fit(X_train, y_train)\n", + " acc = clf.score(X_test, y_test)\n", + "\n", + " accuracy.append((d, acc))\n", + "\n", + "acc_df = pd.DataFrame(accuracy, columns=['depth', 'accuracy'])" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "fbf8ef63-a5f8-4880-9049-e9f645853bf4", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0, 0.5, 'Accuracy')" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "sns.lineplot(acc_df, x='depth', y='accuracy')\n", + "plt.xlabel('Tree depth')\n", + "plt.ylabel('Accuracy')" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "73bd3e01-aa97-420c-8f84-91d511d3f33a", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
DecisionTreeClassifier(max_depth=5)
" + ], + "text/plain": [ + "DecisionTreeClassifier(max_depth=5)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_exp, X_val, y_exp, y_val = train_test_split(X, Y, test_size=0.2, random_state=0)\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(X_exp, y_exp, test_size=0.2, random_state=0)\n", + "\n", + "clf = DecisionTreeClassifier(max_depth=5, min_samples_leaf=1)\n", + "clf.fit(X_train, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "8b5eb8da-022a-4469-a34c-bfecfa0d5444", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import cross_val_score\n", + "\n", + "clf = DecisionTreeClassifier(max_depth=5, min_samples_leaf=1)\n", + "scores = cross_val_score(clf, X, Y, cv=5)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "3776bdaf-41a6-4e8c-b413-fb62325686cb", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.9618499573742542" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "scores.mean()" + ] + }, + { + "cell_type": "markdown", + "id": "3fb7c148-b926-40be-bb3e-71ff4ee5cf27", + "metadata": {}, + "source": [ + "# For September\n", + "- Tune SVM hyper params as below - from this we'd pick D=3, C=1\n", + "- Use SVM to introduce cross-validation, use figure from https://scikit-learn.org/stable/modules/cross_validation.html in text\n", + "- Use validation split to test performance of SVM(3,1) vs. 
DC(2) from above to pick the best one\n", + "\n", + "Note that the current code breaks the below if run in order - i.e.: y_train is the result of t/t/v splitting, not the original" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "cc367f34-21cf-4dc1-9bcc-d959ea3992e3", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "## SVM hyper-parameter tuning" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "a48b4ea7-ef9a-4216-8808-2c003c893841", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "164 Chinstrap\n", + "189 Chinstrap\n", + "44 Adelie\n", + "234 Gentoo\n", + "1 Adelie\n", + " ... \n", + "166 Chinstrap\n", + "96 Adelie\n", + "271 Gentoo\n", + "312 Gentoo\n", + "188 Chinstrap\n", + "Name: species, Length: 218, dtype: object" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_train" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "de538b42-a005-43f2-b535-90bb9f7c6a96", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
SVC(C=1.2, kernel='poly')
" + ], + "text/plain": [ + "SVC(C=1.2, kernel='poly')" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn import svm\n", + "\n", + "x2 = X_train_scaled[[feature_names[1], feature_names[3]]]\n", + "\n", + "SVM = svm.SVC(kernel='poly', degree=3, C=1.2)\n", + "SVM.fit(x2, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "id": "a545fd49-683d-4dc8-9d2b-3056bf99db32", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "Cs = np.linspace(0.2, 5.0, 20)\n", + "Degrees = [2, 3, 4, 5, 6]\n", + "\n", + "scores = []\n", + "for C in Cs:\n", + " for D in Degrees:\n", + " clf = svm.SVC(kernel='poly', degree=D, C=C)\n", + " # x1, x2, y1, y2 = train_test_split(X_train, y_train, test_size=0.2)\n", + " # clf.fit(x1, y1)\n", + " # scores.append((C, D, clf.score(x2, y2)))\n", + " cv_scores = cross_val_score(clf, X_train_scaled, y_train)\n", + " scores.append((C, D, cv_scores.mean()))\n", + " \n", + "df = pd.DataFrame(scores, columns=['C', 'degree', 'accuracy'])" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "id": "a489ae8a-fffc-4a9f-b21d-4a4b48eae494", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "sns.lineplot(df, x='C', y='accuracy', hue='degree', dashes=[4, 2], palette='Set1')" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "298ff2fe-d4da-42b7-ac09-74b8e12f162d", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0.98181818, 1. , 0.98181818, 0.94444444, 1. ])" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "clf = svm.SVC(kernel='poly', degree=3, C=1.0)\n", + "cross_val_score(clf, X_train_scaled, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "id": "3ef6bd6f-ec29-4b87-9bef-c12907cbe13c", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.8695652173913043" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "clf = svm.SVC(kernel='poly', degree=2, C=1)\n", + "clf.fit(X_train_scaled, y_train)\n", + "clf.score(X_test_scaled, y_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "id": "57bde67f-2591-4d37-b177-2ec4ca44416a", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from sklearn import svm\n", + "\n", + "x2 = X_train_scaled[[feature_names[0], feature_names[1]]]\n", + "\n", + "SVM = svm.SVC(kernel='poly', degree=3, C=1)\n", + "SVM.fit(x2, y_train)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.gca()\n", + "DecisionBoundaryDisplay.from_estimator(SVM, x2, ax=ax)\n", + "sns.scatterplot(x2, x=feature_names[0], y=feature_names[1], hue=y_train)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "03e011f4-919a-4097-8839-fd1f6906ec48", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/code/Ensemble-methods-random-forests.ipynb b/code/Ensemble-methods-random-forests.ipynb new file mode 100644 index 0000000..0e700cc --- /dev/null +++ b/code/Ensemble-methods-random-forests.ipynb @@ -0,0 +1,905 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 79, + "id": "165d67d2-a6e4-4143-a7b2-e40386af4442", + "metadata": {}, + "outputs": [], + "source": [ + "# import libraries\n", + "import seaborn as sns\n", + "import pandas as pd\n", + "import numpy as np\n" + ] + }, + { + "cell_type": "markdown", + "id": "58a0e2bc-94b0-4814-a72f-333f4417334d", + "metadata": {}, + "source": [ + "## Load Penguins dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "id": "e6186217-2de4-48eb-b448-e2b4a35a4025", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g
0              39.1           18.7              181.0       3750.0
1              39.5           17.4              186.0       3800.0
2              40.3           18.0              195.0       3250.0
4              36.7           19.3              193.0       3450.0
5              39.3           20.6              190.0       3650.0
..              ...            ...                ...          ...
338            47.2           13.7              214.0       4925.0
340            46.8           14.3              215.0       4850.0
341            50.4           15.7              222.0       5750.0
342            45.2           14.8              212.0       5200.0
343            49.9           16.1              213.0       5400.0

342 rows × 4 columns
" + ], + "text/plain": [ + " bill_length_mm bill_depth_mm flipper_length_mm body_mass_g\n", + "0 39.1 18.7 181.0 3750.0\n", + "1 39.5 17.4 186.0 3800.0\n", + "2 40.3 18.0 195.0 3250.0\n", + "4 36.7 19.3 193.0 3450.0\n", + "5 39.3 20.6 190.0 3650.0\n", + ".. ... ... ... ...\n", + "338 47.2 13.7 214.0 4925.0\n", + "340 46.8 14.3 215.0 4850.0\n", + "341 50.4 15.7 222.0 5750.0\n", + "342 45.2 14.8 212.0 5200.0\n", + "343 49.9 16.1 213.0 5400.0\n", + "\n", + "[342 rows x 4 columns]" + ] + }, + "execution_count": 80, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "penguins = sns.load_dataset('penguins')\n", + "\n", + "feature_names = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']\n", + "penguins.dropna(subset=feature_names, inplace=True)\n", + "\n", + "species_names = penguins['species'].unique()\n", + "\n", + "# Define data and targets\n", + "X = penguins[feature_names]\n", + "\n", + "y = penguins.species\n", + "\n", + "X" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "id": "44c058d0-58ab-4c29-9003-9affa4f32d94", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "train size: (273, 4)\n", + "test size: (69, 4)\n" + ] + } + ], + "source": [ + "# Split data in training and test set\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)\n", + "\n", + "print(f'train size: {X_train.shape}')\n", + "print(f'test size: {X_test.shape}')" + ] + }, + { + "cell_type": "markdown", + "id": "74bad068-b077-4106-a396-d2baff903ec7", + "metadata": {}, + "source": [ + "## Generate a decision tree estimator for comparison" + ] + }, + { + "cell_type": "code", + "execution_count": 101, + "id": "e42b6455-d07b-47a2-bb0d-53d40d82b927", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.9420289855072463\n" + ] + } + ], + 
"source": [ + "from sklearn.tree import DecisionTreeClassifier\n", + "\n", + "\n", + "clf = DecisionTreeClassifier()\n", + "\n", + "clf.fit(X_train, y_train)\n", + "\n", + "clf.predict(X_test)\n", + "\n", + "print(clf.score(X_test, y_test))" + ] + }, + { + "cell_type": "code", + "execution_count": 138, + "id": "9fd08221-7712-452f-9039-49614d3995e6", + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.inspection import DecisionBoundaryDisplay\n", + "f1 = feature_names[0]\n", + "f2 = feature_names[3]\n", + "\n", + "\n", + "clf = DecisionTreeClassifier(max_depth=7, min_samples_leaf=1, random_state=5)\n", + "clf.fit(X_train[[f1, f2]], y_train)\n", + "\n", + "d = DecisionBoundaryDisplay.from_estimator(clf, X_train[[f1, f2]])\n", + "\n", + "sns.scatterplot(X_train, x=f1, y=f2, hue=y_train, palette=\"husl\")\n", + "plt.show()\n", + "\n", + "## Using Random Forest for classification \n", + "\n", + "We'll now take a look how we can use ensemble methods to perform a classification task such as identifying penguin species! We're going to use a Random forest classifier available in scikit-learn. Random forests are built on decision trees and can provide another way to address over-fitting. Rather than classifying based on one single decision tree (which could overfit the data), an average of results of many trees can be derived for more robust/accurate estimates compared against single trees used in the ensemble. " + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "id": "4502fc0a-1f37-4988-9ed9-30f78f0d98b1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
RandomForestClassifier(max_depth=7)
" + ], + "text/plain": [ + "RandomForestClassifier(max_depth=7)" + ] + }, + "execution_count": 103, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.tree import plot_tree\n", + "\n", + "# n_estimators is an extra parameter that sets the number of trees in the forest\n", + "clf = RandomForestClassifier(n_estimators=100, max_depth=7, min_samples_leaf=1)\n", + "\n", + "clf.fit(X_train, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "id": "2c7b824e-6621-4107-8957-bf83652a279d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.9855072463768116" + ] + }, + "execution_count": 104, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "clf.score(X_test, y_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 137, + "id": "8a267659-9a78-410c-8f4e-a03f9f0b1b64", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(12, 6))\n", + "\n", + "# plot first 5 trees in forest\n", + "for index in range(0, 5):\n", + "    plot_tree(clf.estimators_[index],\n", + "              class_names=species_names,\n", + "              feature_names=feature_names,\n", + "              filled=True,\n", + "              ax=axes[index])\n", + "\n", + "    axes[index].set_title(f'Tree: {index}')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "413a3c33-f53e-44ae-92b1-504c6fb6d73f", + "metadata": {}, + "source": [ + "## Potential questions\n", + "* What does the parameter n_estimators control?\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 139, + "id": "74fb702a-e3cc-4ce5-9d9c-7df863215fa4", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# plot classification space for body mass and bill length with random forest\n", + "clf = RandomForestClassifier(n_estimators=100, max_depth=7, min_samples_leaf=1, random_state=5)\n", + "clf.fit(X_train[[f1, f2]], y_train)\n", + "\n", + "d = DecisionBoundaryDisplay.from_estimator(clf, X_train[[f1, f2]])\n", + "\n", + "fig = sns.scatterplot(X_train, x=f1, y=f2, hue=y_train, palette=\"husl\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6e52280b-b7a7-4dd4-b791-d7dfc5cc2614", + "metadata": {}, + "source": [ + "## Random forests for regression problems (continuous data) ***MAYBE INCLUDE THIS?***\n", + "\n", + "Ensemble methods can also be applied to continuous data to solve regression problems. We'll look at how random forests can be used for regression with scikit-learn." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "74497146-3c9b-4822-a41f-b5952cf357c2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "stock = sns.load_dataset('dowjones')\n", + "\n", + "# convert date to unix time \n", + "stock['date_int'] = stock.Date.apply(lambda x: x.timestamp())\n", + "\n", + "# define X and y \n", + "X = stock.date_int.to_numpy()\n", + "y = stock.Price.to_numpy()\n", + "\n", + "# define X as a 2d array\n", + "X = X[:,None]\n", + "\n", + "plt.plot(X, y, 'o')" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "7663135b-80fb-40eb-a6a8-9abee3385ed3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# define regressor \n", + "from sklearn.ensemble import RandomForestRegressor\n", + "\n", + "clf = RandomForestRegressor()\n", + "\n", + "# fit \n", + "clf.fit(X, y)\n", + "pred = clf.predict(X)\n", + "\n", + "# plot subset \n", + "plt.plot(X[:50], y[:50], 'o')\n", + "plt.plot(X[:50], pred[:50], '-r')" + ] + }, + { + "cell_type": "markdown", + "id": "5bcb52ba-c44b-46a1-a771-6f3c52436560", + "metadata": {}, + "source": [ + "## Stacking up some regression problems\n", + "\n", + "The diabetes dataset available in scikit-learn contains 10 baseline variables (features) measured on 442 diabetes patients, and a quantitative measure of disease progression one year after baseline as the target (include weblink). We can apply ensemble methods to this regression problem." + ] + }, + { + "cell_type": "code", + "execution_count": 143, + "id": "f9980656-1fc4-4355-981f-ac18fb71e1d4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'data': array([[ 0.03807591, 0.05068012, 0.06169621, ..., -0.00259226,\n", + " 0.01990749, -0.01764613],\n", + " [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,\n", + " -0.06833155, -0.09220405],\n", + " [ 0.08529891, 0.05068012, 0.04445121, ..., -0.00259226,\n", + " 0.00286131, -0.02593034],\n", + " ...,\n", + " [ 0.04170844, 0.05068012, -0.01590626, ..., -0.01107952,\n", + " -0.04688253, 0.01549073],\n", + " [-0.04547248, -0.04464164, 0.03906215, ..., 0.02655962,\n", + " 0.04452873, -0.02593034],\n", + " [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,\n", + " -0.00422151, 0.00306441]]), 'target': array([151., 75., 141., 206., 135., 97., 138., 63., 110., 310., 101.,\n", + " 69., 179., 185., 118., 171., 166., 144., 97., 168., 68., 49.,\n", + " 68., 245., 184., 202., 137., 85., 131., 283., 129., 59., 341.,\n", + " 87., 65., 102., 265., 276., 252., 90., 100., 55., 61., 92.,\n", + " 259., 53., 190., 142., 75., 142., 
155., 225., 59., 104., 182.,\n", + " 128., 52., 37., 170., 170., 61., 144., 52., 128., 71., 163.,\n", + " 150., 97., 160., 178., 48., 270., 202., 111., 85., 42., 170.,\n", + " 200., 252., 113., 143., 51., 52., 210., 65., 141., 55., 134.,\n", + " 42., 111., 98., 164., 48., 96., 90., 162., 150., 279., 92.,\n", + " 83., 128., 102., 302., 198., 95., 53., 134., 144., 232., 81.,\n", + " 104., 59., 246., 297., 258., 229., 275., 281., 179., 200., 200.,\n", + " 173., 180., 84., 121., 161., 99., 109., 115., 268., 274., 158.,\n", + " 107., 83., 103., 272., 85., 280., 336., 281., 118., 317., 235.,\n", + " 60., 174., 259., 178., 128., 96., 126., 288., 88., 292., 71.,\n", + " 197., 186., 25., 84., 96., 195., 53., 217., 172., 131., 214.,\n", + " 59., 70., 220., 268., 152., 47., 74., 295., 101., 151., 127.,\n", + " 237., 225., 81., 151., 107., 64., 138., 185., 265., 101., 137.,\n", + " 143., 141., 79., 292., 178., 91., 116., 86., 122., 72., 129.,\n", + " 142., 90., 158., 39., 196., 222., 277., 99., 196., 202., 155.,\n", + " 77., 191., 70., 73., 49., 65., 263., 248., 296., 214., 185.,\n", + " 78., 93., 252., 150., 77., 208., 77., 108., 160., 53., 220.,\n", + " 154., 259., 90., 246., 124., 67., 72., 257., 262., 275., 177.,\n", + " 71., 47., 187., 125., 78., 51., 258., 215., 303., 243., 91.,\n", + " 150., 310., 153., 346., 63., 89., 50., 39., 103., 308., 116.,\n", + " 145., 74., 45., 115., 264., 87., 202., 127., 182., 241., 66.,\n", + " 94., 283., 64., 102., 200., 265., 94., 230., 181., 156., 233.,\n", + " 60., 219., 80., 68., 332., 248., 84., 200., 55., 85., 89.,\n", + " 31., 129., 83., 275., 65., 198., 236., 253., 124., 44., 172.,\n", + " 114., 142., 109., 180., 144., 163., 147., 97., 220., 190., 109.,\n", + " 191., 122., 230., 242., 248., 249., 192., 131., 237., 78., 135.,\n", + " 244., 199., 270., 164., 72., 96., 306., 91., 214., 95., 216.,\n", + " 263., 178., 113., 200., 139., 139., 88., 148., 88., 243., 71.,\n", + " 77., 109., 272., 60., 54., 221., 90., 311., 281., 182., 
321.,\n", + " 58., 262., 206., 233., 242., 123., 167., 63., 197., 71., 168.,\n", + " 140., 217., 121., 235., 245., 40., 52., 104., 132., 88., 69.,\n", + " 219., 72., 201., 110., 51., 277., 63., 118., 69., 273., 258.,\n", + " 43., 198., 242., 232., 175., 93., 168., 275., 293., 281., 72.,\n", + " 140., 189., 181., 209., 136., 261., 113., 131., 174., 257., 55.,\n", + " 84., 42., 146., 212., 233., 91., 111., 152., 120., 67., 310.,\n", + " 94., 183., 66., 173., 72., 49., 64., 48., 178., 104., 132.,\n", + " 220., 57.]), 'frame': None, 'DESCR': '.. _diabetes_dataset:\\n\\nDiabetes dataset\\n----------------\\n\\nTen baseline variables, age, sex, body mass index, average blood\\npressure, and six blood serum measurements were obtained for each of n =\\n442 diabetes patients, as well as the response of interest, a\\nquantitative measure of disease progression one year after baseline.\\n\\n**Data Set Characteristics:**\\n\\n :Number of Instances: 442\\n\\n :Number of Attributes: First 10 columns are numeric predictive values\\n\\n :Target: Column 11 is a quantitative measure of disease progression one year after baseline\\n\\n :Attribute Information:\\n - age age in years\\n - sex\\n - bmi body mass index\\n - bp average blood pressure\\n - s1 tc, total serum cholesterol\\n - s2 ldl, low-density lipoproteins\\n - s3 hdl, high-density lipoproteins\\n - s4 tch, total cholesterol / HDL\\n - s5 ltg, possibly log of serum triglycerides level\\n - s6 glu, blood sugar level\\n\\nNote: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times the square root of `n_samples` (i.e. 
the sum of squares of each column totals 1).\\n\\nSource URL:\\nhttps://www4.stat.ncsu.edu/~boos/var.select/diabetes.html\\n\\nFor more information see:\\nBradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) \"Least Angle Regression,\" Annals of Statistics (with discussion), 407-499.\\n(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)\\n', 'feature_names': ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6'], 'data_filename': 'diabetes_data_raw.csv.gz', 'target_filename': 'diabetes_target.csv.gz', 'data_module': 'sklearn.datasets.data'}\n" + ] + } + ], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "\n", + "print(load_diabetes())" + ] + }, + { + "cell_type": "code", + "execution_count": 144, + "id": "a3b574d7-4202-4629-aa03-4e5bb2ac4d25", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "train size: (353, 10)\n", + "test size: (89, 10)\n" + ] + } + ], + "source": [ + "X, y = load_diabetes(return_X_y=True)\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)\n", + "\n", + "print(f'train size: {X_train.shape}')\n", + "print(f'test size: {X_test.shape}')" + ] + }, + { + "cell_type": "code", + "execution_count": 145, + "id": "e27c9e7c-067d-43b3-bae7-2bd6c42453e6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.5260997590686997" + ] + }, + "execution_count": 145, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# define regressor \n", + "from sklearn.ensemble import RandomForestRegressor\n", + "\n", + "clf = RandomForestRegressor()\n", + "\n", + "clf.fit(X_train, y_train)\n", + "\n", + "pred = clf.predict(X_test)\n", + "\n", + "# for regressors, score returns the R^2 of the predictions\n", + "clf.score(X_test, y_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 146, + "id": "261c727c-6e31-4fad-876c-d13f492e3079", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0.5, 1.0, 
'Regressor predictions')" + ] + }, + "execution_count": 146, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure()\n", + "\n", + "plt.plot(pred[:20], \"gd\", label=\"RandomForestRegressor\") # plot first 20 predictions to make figure clearer.\n", + "\n", + "plt.tick_params(axis=\"x\", which=\"both\", bottom=False, top=False, labelbottom=False)\n", + "plt.ylabel(\"predicted\")\n", + "plt.xlabel(\"test samples\")\n", + "plt.legend(loc=\"best\")\n", + "plt.title(\"Regressor predictions\")" + ] + }, + { + "cell_type": "markdown", + "id": "0f52dd97-e858-4f25-b33b-90496119886d", + "metadata": {}, + "source": [ + "We can also take this one step further. In the same way that a random forest estimator averages the results of a series of trees, we can combine the results from a series of different estimators. This is done using an ensemble meta-estimator called VotingRegressor. We'll apply a VotingRegressor to a random forest, a gradient boosting and a linear regressor. VotingRegressor fits several base estimators, each on the whole dataset, then takes the average of the individual predictions to form a final prediction." + ] + }, + { + "cell_type": "code", + "execution_count": 147, + "id": "5094098a-7cc7-4d40-801d-535794d674ff", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
VotingRegressor(estimators=[('rf', RandomForestRegressor(random_state=5)),\n",
+       "                            ('gb', GradientBoostingRegressor(random_state=5)),\n",
+       "                            ('lr', LinearRegression())])
" + ], + "text/plain": [ + "VotingRegressor(estimators=[('rf', RandomForestRegressor(random_state=5)),\n", + " ('gb', GradientBoostingRegressor(random_state=5)),\n", + " ('lr', LinearRegression())])" + ] + }, + "execution_count": 147, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.ensemble import (\n", + "    GradientBoostingRegressor,\n", + "    RandomForestRegressor,\n", + "    VotingRegressor,\n", + ")\n", + "from sklearn.linear_model import LinearRegression\n", + "\n", + "# instantiate estimators \n", + "rf_reg = RandomForestRegressor(random_state=5)\n", + "gb_reg = GradientBoostingRegressor(random_state=5)\n", + "linear_reg = LinearRegression()\n", + "\n", + "# fit estimators\n", + "rf_reg.fit(X_train, y_train)\n", + "gb_reg.fit(X_train, y_train)\n", + "linear_reg.fit(X_train, y_train)\n", + "\n", + "# combine the three estimators; each name matches its estimator\n", + "voting_reg = VotingRegressor([(\"rf\", rf_reg), (\"gb\", gb_reg), (\"lr\", linear_reg)])\n", + "voting_reg.fit(X_train, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 148, + "id": "5a7b25b7-acf7-48e9-b902-6a33a577935b", + "metadata": {}, + "outputs": [], + "source": [ + "# make predictions\n", + "X_test_20 = X_test[:20] # first 20 for visualisation\n", + "\n", + "rf_pred = rf_reg.predict(X_test_20)\n", + "gb_pred = gb_reg.predict(X_test_20)\n", + "linear_pred = linear_reg.predict(X_test_20)\n", + "voting_pred = voting_reg.predict(X_test_20)" + ] + }, + { + "cell_type": "code", + "execution_count": 149, + "id": "b6fa5c41-3a2b-4e97-aad7-5cba9823cc12", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure()\n", + "plt.plot(rf_pred, \"gd\", label=\"RandomForestRegressor\")\n", + "plt.plot(gb_pred, \"b^\", label=\"GradientBoostingRegressor\")\n", + "plt.plot(linear_pred, \"ys\", label=\"LinearRegression\")\n", + "plt.plot(voting_pred, \"r*\", ms=10, label=\"VotingRegressor\")\n", + "\n", + "plt.tick_params(axis=\"x\", which=\"both\", bottom=False, top=False, labelbottom=False)\n", + "plt.ylabel(\"predicted\")\n", + "plt.xlabel(\"test samples\")\n", + "plt.legend(loc=\"best\")\n", + "plt.title(\"Regressor predictions and their average\")\n", + "\n", + "plt.show()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 150, + "id": "cf52c80a-c6dc-4cc6-887e-acb4d9a673fa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "random forest: 0.526627803806025\n", + "gradient boost: 0.5290702255768158\n", + "linear regression: 0.5271558947230808\n", + "voting regressor: 0.5520305568906223\n" + ] + } + ], + "source": [ + "print(f'random forest: {rf_reg.score(X_test, y_test)}')\n", + "\n", + "print(f'gradient boost: {gb_reg.score(X_test, y_test)}')\n", + "\n", + "print(f'linear regression: {linear_reg.score(X_test, y_test)}')\n", + "\n", + "print(f'voting regressor: {voting_reg.score(X_test, y_test)}')" + ] + }, + { + "cell_type": "markdown", + "id": "b8d0bec7-753e-48b2-ac59-8705344a152a", + "metadata": {}, + "source": [ + "When the random forest regressor and the voting regressor (the average of all three estimators) are both scored against the test data, the voting regressor does slightly better: an R² of about 0.55 compared with about 0.53 for the random forest alone. Note that for regressors `score` returns R² rather than accuracy.
" + ] + }, + { + "cell_type": "markdown", + "id": "2fcc7a38-2a86-445d-8e10-958d39b3c9e9", + "metadata": {}, + "source": [ + "## Exercise \n", + "\n", + "\n", + "Scikit-learn also has a method for combining classifiers in an ensemble, ```sklearn.ensemble.VotingClassifier```. Can you apply it to the penguins dataset using a random forest, SVM and decision tree classifier, or any other selection of classifier estimators available in scikit-learn?" + ] + }, + { + "cell_type": "code", + "execution_count": 151, + "id": "ccd03fa9-0bbe-40f8-aee8-a37693572e7b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "train size: (273, 4)\n", + "test size: (69, 4)\n" + ] + } + ], + "source": [ + "penguins = sns.load_dataset('penguins')\n", + "\n", + "feature_names = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']\n", + "penguins.dropna(subset=feature_names, inplace=True)\n", + "\n", + "species_names = penguins['species'].unique()\n", + "\n", + "# Define data and targets\n", + "X = penguins[feature_names]\n", + "\n", + "y = penguins.species\n", + "\n", + "# Split data into training and test sets\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)\n", + "\n", + "print(f'train size: {X_train.shape}')\n", + "print(f'test size: {X_test.shape}')" + ] + }, + { + "cell_type": "code", + "execution_count": 152, + "id": "441bccac-7231-482f-a974-ffe06845259e", + "metadata": {}, + "outputs": [], + "source": [ + "# import classifiers \n", + "\n", + "# instantiate classifiers \n", + "\n", + "# fit classifiers\n", + "\n", + "# instantiate voting classifier and fit data\n", + "\n", + "# make predictions\n", + "\n", + "# compare scores" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a617947f-d7b5-4be6-92bf-c387b20fd534", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + 
"metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/fig/EM_dt_clf_space.png b/fig/EM_dt_clf_space.png new file mode 100644 index 0000000..b5675f9 Binary files /dev/null and b/fig/EM_dt_clf_space.png differ diff --git a/fig/EM_rf_clf_space.png b/fig/EM_rf_clf_space.png new file mode 100644 index 0000000..607e276 Binary files /dev/null and b/fig/EM_rf_clf_space.png differ diff --git a/fig/EM_rf_reg_prediction.png b/fig/EM_rf_reg_prediction.png new file mode 100644 index 0000000..001a698 Binary files /dev/null and b/fig/EM_rf_reg_prediction.png differ diff --git a/fig/EM_stacked_plot.png b/fig/EM_stacked_plot.png new file mode 100644 index 0000000..076a442 Binary files /dev/null and b/fig/EM_stacked_plot.png differ diff --git a/fig/ML_summary.xcf b/fig/ML_summary.xcf new file mode 100644 index 0000000..1024ee1 Binary files /dev/null and b/fig/ML_summary.xcf differ diff --git a/fig/MnistExamples.png b/fig/MnistExamples.png new file mode 100644 index 0000000..30cdec9 Binary files /dev/null and b/fig/MnistExamples.png differ diff --git a/fig/bagging.jpeg b/fig/bagging.jpeg new file mode 100644 index 0000000..2a9a137 Binary files /dev/null and b/fig/bagging.jpeg differ diff --git a/fig/boosting.jpeg b/fig/boosting.jpeg new file mode 100644 index 0000000..910603b Binary files /dev/null and b/fig/boosting.jpeg differ diff --git a/fig/culmen_depth.png b/fig/culmen_depth.png new file mode 100644 index 0000000..3fe2147 Binary files /dev/null and b/fig/culmen_depth.png differ diff --git a/fig/decision_tree_example.png b/fig/decision_tree_example.png new file mode 100644 index 0000000..74c51f6 Binary 
files /dev/null and b/fig/decision_tree_example.png differ
diff --git a/fig/e3_dt_2.png b/fig/e3_dt_2.png new file mode 100644 index 0000000..363985c Binary files /dev/null and b/fig/e3_dt_2.png differ
diff --git a/fig/e3_dt_6.png b/fig/e3_dt_6.png new file mode 100644 index 0000000..043f125 Binary files /dev/null and b/fig/e3_dt_6.png differ
diff --git a/fig/e3_dt_overfit.png b/fig/e3_dt_overfit.png new file mode 100644 index 0000000..26bcf37 Binary files /dev/null and b/fig/e3_dt_overfit.png differ
diff --git a/fig/e3_dt_space_2.png b/fig/e3_dt_space_2.png new file mode 100644 index 0000000..dd4da2b Binary files /dev/null and b/fig/e3_dt_space_2.png differ
diff --git a/fig/e3_dt_space_6.png b/fig/e3_dt_space_6.png new file mode 100644 index 0000000..244025a Binary files /dev/null and b/fig/e3_dt_space_6.png differ
diff --git a/fig/e3_penguins_vis.png b/fig/e3_penguins_vis.png new file mode 100644 index 0000000..a16d62a Binary files /dev/null and b/fig/e3_penguins_vis.png differ
diff --git a/fig/e3_svc_space.png b/fig/e3_svc_space.png new file mode 100644 index 0000000..908fbfa Binary files /dev/null and b/fig/e3_svc_space.png differ
diff --git a/fig/house_price_voting_regressor.svg b/fig/house_price_voting_regressor.svg new file mode 100644 index 0000000..753cde5 --- /dev/null +++ b/fig/house_price_voting_regressor.svg @@ -0,0 +1,1194 @@ [new SVG figure, markup omitted; metadata: 2024-07-24T13:20:26.580306, image/svg+xml, Matplotlib v3.7.1, https://matplotlib.org]
diff --git a/fig/introduction/AI_ML_DL_differences.png b/fig/introduction/AI_ML_DL_differences.png new file mode 100644 index 0000000..7cd5909 Binary files /dev/null and b/fig/introduction/AI_ML_DL_differences.png differ
diff --git a/fig/introduction/ML_summary.png b/fig/introduction/ML_summary.png new file mode 100644 index 0000000..287be10 Binary files /dev/null and b/fig/introduction/ML_summary.png differ
diff --git a/fig/introduction/sklearn_input.png b/fig/introduction/sklearn_input.png new file mode 100644 index 0000000..c37e3e6 Binary files /dev/null and b/fig/introduction/sklearn_input.png differ
diff --git a/fig/kmeans_concentric_circle.png b/fig/kmeans_concentric_circle.png index 85a0077..8f3b061 100644 Binary files a/fig/kmeans_concentric_circle.png and b/fig/kmeans_concentric_circle.png differ
diff --git a/fig/kmeans_overlapping_clusters.png b/fig/kmeans_overlapping_clusters.png index 945a0c5..516f628 100644 Binary files a/fig/kmeans_overlapping_clusters.png and b/fig/kmeans_overlapping_clusters.png differ
diff --git a/fig/mnist_30000-letter.png b/fig/mnist_30000-letter.png new file mode 100644 index 0000000..e84b25e Binary files /dev/null and b/fig/mnist_30000-letter.png differ
diff --git a/fig/mnist_pairplot.png b/fig/mnist_pairplot.png new file mode 100644 index 0000000..3eca33a Binary files /dev/null and b/fig/mnist_pairplot.png differ
diff --git a/fig/mnist_pairplot_pixels.png b/fig/mnist_pairplot_pixels.png new file mode 100644 index 0000000..b0d37f4 Binary files /dev/null and b/fig/mnist_pairplot_pixels.png differ
diff --git a/fig/pairplot.png b/fig/pairplot.png new file mode 100644 index 0000000..8079335 Binary files /dev/null and b/fig/pairplot.png differ
diff --git a/fig/palmer_penguins.png b/fig/palmer_penguins.png new file mode 100644 index 0000000..736ae89 Binary files /dev/null and b/fig/palmer_penguins.png differ
diff --git a/fig/pca.svg b/fig/pca.svg index 8404b97..599eeb7 100644 --- a/fig/pca.svg +++ b/fig/pca.svg [SVG markup diff omitted; regenerated figure metadata: 2023-03-03T11:46:14.106318, image/svg+xml, Matplotlib v3.7.0, https://matplotlib.org/]
diff --git a/fig/pca_2d.gif b/fig/pca_2d.gif new file mode 100644 index 0000000..ede4e45 Binary files /dev/null and b/fig/pca_2d.gif differ
diff --git a/fig/pca_clustered.png b/fig/pca_clustered.png new file mode 100644 index 0000000..0444bcc Binary files /dev/null and b/fig/pca_clustered.png differ
diff --git a/fig/pca_labelled.png b/fig/pca_labelled.png new file mode 100644 index 0000000..04b807d Binary files /dev/null and b/fig/pca_labelled.png differ
diff --git a/fig/pca_unlabelled.png b/fig/pca_unlabelled.png new file mode 100644 index 0000000..0588768 Binary files /dev/null and b/fig/pca_unlabelled.png differ
diff --git a/fig/random_clusters.png b/fig/random_clusters.png new file mode 100644 index 0000000..2fabe32 Binary files /dev/null and b/fig/random_clusters.png differ
diff --git a/fig/random_clusters_centre.png b/fig/random_clusters_centre.png new file mode 100644 index 0000000..d777d26 Binary files /dev/null and b/fig/random_clusters_centre.png differ
diff --git a/fig/randomforest.png b/fig/randomforest.png new file mode 100644 index 0000000..01f1356 Binary files /dev/null and b/fig/randomforest.png differ
diff --git a/fig/regress_both.png b/fig/regress_both.png new file mode 100644 index 0000000..38e587c Binary files /dev/null and b/fig/regress_both.png differ
diff --git a/fig/regress_linear.png b/fig/regress_linear.png new file mode 100644 index 0000000..8acc4b5 Binary files /dev/null and b/fig/regress_linear.png differ
diff --git a/fig/regress_linear_2nd.png b/fig/regress_linear_2nd.png new file mode 100644 index 0000000..ef471d0 Binary files /dev/null and b/fig/regress_linear_2nd.png differ
diff --git a/fig/regress_linear_3rd.png b/fig/regress_linear_3rd.png new file mode 100644 index 0000000..4208799 Binary files /dev/null and b/fig/regress_linear_3rd.png differ
diff --git a/fig/regress_linear_4th.png b/fig/regress_linear_4th.png new file mode 100644 index 0000000..82fba85 Binary files /dev/null and b/fig/regress_linear_4th.png differ
diff --git a/fig/regress_penguin_lin.png b/fig/regress_penguin_lin.png new file mode 100644 index 0000000..8385526 Binary files /dev/null and b/fig/regress_penguin_lin.png differ
diff --git a/fig/regress_penguin_lin_tot.png b/fig/regress_penguin_lin_tot.png new file mode 100644 index 0000000..f9d21fb Binary files /dev/null and b/fig/regress_penguin_lin_tot.png differ
diff --git a/fig/regression_both.png b/fig/regression_both.png new file mode 100644 index 0000000..563b78a Binary files /dev/null and b/fig/regression_both.png differ
diff --git a/fig/regression_example.png b/fig/regression_example.png new file mode 100644 index 0000000..8f38730 Binary files /dev/null and b/fig/regression_example.png differ
diff --git a/fig/regression_inspect.png b/fig/regression_inspect.png new file mode 100644 index 0000000..a945542 Binary files /dev/null and b/fig/regression_inspect.png differ
diff --git a/fig/regression_linear.png b/fig/regression_linear.png new file mode 100644 index 0000000..1764847 Binary files /dev/null and b/fig/regression_linear.png differ
diff --git a/fig/regression_new_data.png b/fig/regression_new_data.png new file mode 100644 index 0000000..d113cf4 Binary files /dev/null and b/fig/regression_new_data.png differ
diff --git a/fig/rf_5_trees.png b/fig/rf_5_trees.png new file mode 100644 index 0000000..b0fcbd1 Binary files /dev/null and b/fig/rf_5_trees.png differ
diff --git a/fig/stacking.jpeg b/fig/stacking.jpeg new file mode 100644 index 0000000..db65db1 Binary files /dev/null and b/fig/stacking.jpeg differ
diff --git a/fig/tsne.svg b/fig/tsne.svg index 2d8e7fa..f8397ec 100644 --- a/fig/tsne.svg +++ b/fig/tsne.svg [SVG markup diff omitted; regenerated figure metadata: 2023-03-03T12:05:32.173340, image/svg+xml, Matplotlib v3.7.0, https://matplotlib.org/]
diff --git a/fig/tsne_clustered.png b/fig/tsne_clustered.png new file mode 100644 index 0000000..11a5b05 Binary files /dev/null and b/fig/tsne_clustered.png differ
diff --git a/fig/tsne_labelled.png b/fig/tsne_labelled.png new file mode 100644 index 0000000..dbc4fa7 Binary files /dev/null and b/fig/tsne_labelled.png differ
diff --git a/fig/tsne_unlabelled.png b/fig/tsne_unlabelled.png new file mode 100644 index 0000000..f4621a6 Binary files /dev/null and b/fig/tsne_unlabelled.png differ