diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 3b2dfd14..72b0e00d 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -29,6 +29,10 @@ jobs: run: pytest docs/source/.codespell/test_notebook_to_markdown.py - name: Run tests run: pytest --cov-report=xml --no-cov-on-fail + - name: Check codespell for notebooks + run: | + python ./docs/source/.codespell/notebook_to_markdown.py --tempdir tmp_markdown + codespell - name: Upload coverage to Codecov uses: codecov/codecov-action@v4 with: diff --git a/docs/source/knowledgebase/quasi_dags.ipynb b/docs/source/knowledgebase/quasi_dags.ipynb index 5eeec27e..2c780c40 100644 --- a/docs/source/knowledgebase/quasi_dags.ipynb +++ b/docs/source/knowledgebase/quasi_dags.ipynb @@ -104,7 +104,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This leads us to Randomized Controlled Trials (RCTs) which are considered the gold standard for estimating causal effects. One reason for this is that we (as experimenters) intervene in the system by assigning units to treatment by {term}`random assignment`. Because of this intervention, any causal influence of the confounders upon the treatment $\\mathbf{X} \\rightarrow Z$ is broken - treamtent is now soley determined by the randomisation process, $R \\rightarrow T$. The following causal DAG illustrates the structure of an RCT." + "This leads us to Randomized Controlled Trials (RCTs) which are considered the gold standard for estimating causal effects. One reason for this is that we (as experimenters) intervene in the system by assigning units to treatment by {term}`random assignment`. Because of this intervention, any causal influence of the confounders upon the treatment $\\mathbf{X} \\rightarrow Z$ is broken - treatment is now solely determined by the randomisation process, $R \\rightarrow T$. The following causal DAG illustrates the structure of an RCT." 
] }, { diff --git a/docs/source/notebooks/ancova_pymc.ipynb b/docs/source/notebooks/ancova_pymc.ipynb index 50faf453..a2daa024 100644 --- a/docs/source/notebooks/ancova_pymc.ipynb +++ b/docs/source/notebooks/ancova_pymc.ipynb @@ -222,7 +222,7 @@ "## Run the analysis\n", "\n", ":::{note}\n", - "The `random_seed` keyword argument for the PyMC sampler is not neccessary. We use it here so that the results are reproducible.\n", + "The `random_seed` keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.\n", ":::" ] }, diff --git a/docs/source/notebooks/did_pymc.ipynb b/docs/source/notebooks/did_pymc.ipynb index 72364638..e44c6117 100644 --- a/docs/source/notebooks/did_pymc.ipynb +++ b/docs/source/notebooks/did_pymc.ipynb @@ -148,7 +148,7 @@ "## Run the analysis\n", "\n", ":::{note}\n", - "The `random_seed` keyword argument for the PyMC sampler is not neccessary. We use it here so that the results are reproducible.\n", + "The `random_seed` keyword argument for the PyMC sampler is not necessary. 
We use it here so that the results are reproducible.\n", ":::" ] }, diff --git a/docs/source/notebooks/did_pymc_banks.ipynb b/docs/source/notebooks/did_pymc_banks.ipynb index 6c8fab8e..82e802b9 100644 --- a/docs/source/notebooks/did_pymc_banks.ipynb +++ b/docs/source/notebooks/did_pymc_banks.ipynb @@ -329,7 +329,7 @@ "* $\\mu_i$ is the expected value of the outcome (number of banks in business) for the $i^{th}$ observation.\n", "* $\\beta_0$ is an intercept term to capture the basiline number of banks in business of the control group, in the pre-intervention period.\n", "* `district` is a dummy variable, so $\\beta_{d}$ will represent a main effect of district, that is any offset of the treatment group relative to the control group.\n", - "* `post_treatment` is also a dummy variable which captures any shift in the outcome after the treatment time, regardless of the recieving treatment or not.\n", + "* `post_treatment` is also a dummy variable which captures any shift in the outcome after the treatment time, regardless of receiving the treatment or not.\n", "* the interaction of the two dummary variables `district:post_treatment` will only take on values of 1 for the treatment group after the intervention. Therefore $\\beta_{\\Delta}$ will represent our estimated causal effect." ] }, @@ -515,7 +515,7 @@ "source": [ "## Analysis 2 - DiD with multiple pre/post observations\n", "\n", - "Now we'll do a difference in differences analysis of the full dataset. This approach has similarities to {term}`CITS` (Comparative Interrupted Time-Series) with a single control over time. Although slightly abitrary, we distinguish between the two techniques on whether there is enough time series data for CITS to capture the time series patterns." + "Now we'll do a difference in differences analysis of the full dataset. This approach has similarities to {term}`CITS` (Comparative Interrupted Time-Series) with a single control over time. 
Although slightly arbitrary, we distinguish between the two techniques based on whether there is enough time series data for CITS to capture the time series patterns." ] }, { diff --git a/docs/source/notebooks/geolift1.ipynb b/docs/source/notebooks/geolift1.ipynb index a41c6d1f..caf187f1 100644 --- a/docs/source/notebooks/geolift1.ipynb +++ b/docs/source/notebooks/geolift1.ipynb @@ -269,7 +269,7 @@ "We can use `CausalPy`'s API to run this procedure, but using Bayesian inference methods as follows:\n", "\n", ":::{note}\n", - "The `random_seed` keyword argument for the PyMC sampler is not neccessary. We use it here so that the results are reproducible.\n", + "The `random_seed` keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.\n", ":::" ] }, diff --git a/docs/source/notebooks/inv_prop_pymc.ipynb b/docs/source/notebooks/inv_prop_pymc.ipynb index 76f06887..844c1b6a 100644 --- a/docs/source/notebooks/inv_prop_pymc.ipynb +++ b/docs/source/notebooks/inv_prop_pymc.ipynb @@ -22,9 +22,9 @@ "\n", "In this notebook we will briefly demonstrate how to use propensity score weighting schemes to recover treatment effects in the analysis of observational data. We will first showcase the method with a simulated data example drawn from Lucy D’Agostino McGowan's [excellent blog](https://livefreeordichotomize.com/posts/2019-01-17-understanding-propensity-score-weighting/) on inverse propensity score weighting. Then we shall apply the same techniques to NHEFS data set discussed in Miguel Hernan and Robins' _Causal Inference: What if_ [book](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/). This data set measures the effect of quitting smoking between the period of 1971 and 1982. 
At each of these two points in time the participant's weight was recorded, and we seek to estimate the effect of quitting in the intervening years on the weight recorded in 1982.\n", "\n", - "We will use inverse propensity score weighting techniques to estimate the average treatment effect. There are a range of weighting techniques available: we have implemented `raw`, `robust`, `doubly robust` and `overlap` weighting schemes all of which aim to estimate the average treatment effect. The idea of a propensity score (very broadly) is to derive a one-number summary of individual's probability of adopting a particular treatment. This score is typically calculated by fitting a predictive logit model on all an individual's observed attributes predicting whether or not the those attributes drive the indivdual towards the treatment status. In the case of the NHEFS data we want a model to measure the propensity for each individual to quit smoking. \n", + "We will use inverse propensity score weighting techniques to estimate the average treatment effect. There are a range of weighting techniques available: we have implemented `raw`, `robust`, `doubly robust` and `overlap` weighting schemes all of which aim to estimate the average treatment effect. The idea of a propensity score (very broadly) is to derive a one-number summary of an individual's probability of adopting a particular treatment. This score is typically calculated by fitting a predictive logit model on all of an individual's observed attributes predicting whether or not those attributes drive the individual towards the treatment status. In the case of the NHEFS data we want a model to measure the propensity for each individual to quit smoking. \n", "\n", - "The reason we want this propensity score is because with observed data we often have a kind of imbalance in our covariate profiles across treatment groups. Meaning our data might be unrepresentative in some crucial aspect. 
This prevents us cleanly reading off treatment effects by looking at simple group differences. These \"imbalances\" can be driven by selection effects into the treatment status so that if we want to estimate the average treatment effect in the population as a whole we need to be wary that our sample might not give us generalisable insight into the treatment differences. Using propensity scores as a measure of the prevalance to adopt the treatment status in the population, we can cleverly weight the observed data to privilege observations of \"rare\" occurence in each group. For example, if smoking is the treatment status and regular running is generally not common among the group of smokers, then on the occasion we see a smoker marathon runner we should heavily weight their outcome measure to overcome their low prevalence in the treated group but real presence in the unmeasured population. Inverse propensity weighting tries to define weighting schemes are inversely proportional to an individual's propensity score so as to better recover an estimate which mitigates (somewhat) the risk of selection effect bias. For more details and illustration of these themes see the PyMC examples [write up](https://www.pymc.io/projects/examples/en/latest/causal_inference/bayesian_nonparametric_causal.html) on Non-Parametric Bayesian methods. {cite:p}`forde2024nonparam`\n" + "The reason we want this propensity score is because with observed data we often have a kind of imbalance in our covariate profiles across treatment groups. Meaning our data might be unrepresentative in some crucial aspect. This prevents us cleanly reading off treatment effects by looking at simple group differences. These \"imbalances\" can be driven by selection effects into the treatment status so that if we want to estimate the average treatment effect in the population as a whole we need to be wary that our sample might not give us generalisable insight into the treatment differences. 
Using propensity scores as a measure of the prevalence to adopt the treatment status in the population, we can cleverly weight the observed data to privilege observations of \"rare\" occurrence in each group. For example, if smoking is the treatment status and regular running is generally not common among the group of smokers, then on the occasion we see a smoker marathon runner we should heavily weight their outcome measure to overcome their low prevalence in the treated group but real presence in the unmeasured population. Inverse propensity weighting tries to define weighting schemes that are inversely proportional to an individual's propensity score so as to better recover an estimate which mitigates (somewhat) the risk of selection effect bias. For more details and illustration of these themes see the PyMC examples [write up](https://www.pymc.io/projects/examples/en/latest/causal_inference/bayesian_nonparametric_causal.html) on Non-Parametric Bayesian methods. {cite:p}`forde2024nonparam`\n" ] }, { @@ -832,7 +832,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We see here how the particular weighting scheme was able to recover the true treatment effect by defining a contrast in a different pseudo population. This is a useful reminder in that, while propensity score weighting methods are aids to inference in observational data, not all weighting schemes are created equal and we need to be careful in our assessment of when each is applied appropriately. Fundamentally the weighting scheme of choice should be tied to the question of what are you trying to estimate. Aronow and Miller's _Foundations of Agnostic Statistics_ {cite:p}`aronowFoundations` has a good explantion of the differences between the `raw`, `robust` and `doubly robust` weighting schemes. In some sense these offer an escalating series of refined estimators each trying to improve the variance in the ATE estimate. 
The `doubly robust` approach also tries to offer some guarantees against model misspecification. The `overlap` estimator represents an attempt to calculate the ATE among the population with the overlapping propensity scores. This can be used to guard against poor inference in cases where propensity score distributions have large non-overlapping regions." + "We see here how the particular weighting scheme was able to recover the true treatment effect by defining a contrast in a different pseudo population. This is a useful reminder that, while propensity score weighting methods are aids to inference in observational data, not all weighting schemes are created equal and we need to be careful in our assessment of when each is applied appropriately. Fundamentally the weighting scheme of choice should be tied to the question of what you are trying to estimate. Aronow and Miller's _Foundations of Agnostic Statistics_ {cite:p}`aronowFoundations` has a good explanation of the differences between the `raw`, `robust` and `doubly robust` weighting schemes. In some sense these offer an escalating series of refined estimators each trying to improve the variance in the ATE estimate. The `doubly robust` approach also tries to offer some guarantees against model misspecification. The `overlap` estimator represents an attempt to calculate the ATE among the population with the overlapping propensity scores. This can be used to guard against poor inference in cases where propensity score distributions have large non-overlapping regions." ] }, { diff --git a/docs/source/notebooks/its_covid.ipynb b/docs/source/notebooks/its_covid.ipynb index d12ec10b..dbba8e81 100644 --- a/docs/source/notebooks/its_covid.ipynb +++ b/docs/source/notebooks/its_covid.ipynb @@ -167,7 +167,7 @@ "\n", "* `date` + `year`: self explanatory\n", "* `month`: month, numerically encoded. 
Needs to be treated as a categorical variable\n", - "* `temp`: average UK temperature (Celcius)\n", + "* `temp`: average UK temperature (Celsius)\n", "* `t`: time\n", "* `pre`: boolean flag indicating pre or post intervention" ] @@ -182,7 +182,7 @@ "In this example we are going to standardize the data. So we have to be careful in how we interpret the inferred regression coefficients, and the posterior predictions will be in this standardized space.\n", "\n", ":::{note}\n", - "The `random_seed` keyword argument for the PyMC sampler is not neccessary. We use it here so that the results are reproducible.\n", + "The `random_seed` keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.\n", ":::" ] }, diff --git a/docs/source/notebooks/its_pymc.ipynb b/docs/source/notebooks/its_pymc.ipynb index 4ba95219..bc9d8dc3 100644 --- a/docs/source/notebooks/its_pymc.ipynb +++ b/docs/source/notebooks/its_pymc.ipynb @@ -163,7 +163,7 @@ "Run the analysis\n", "\n", ":::{note}\n", - "The `random_seed` keyword argument for the PyMC sampler is not neccessary. We use it here so that the results are reproducible.\n", + "The `random_seed` keyword argument for the PyMC sampler is not necessary. 
We use it here so that the results are reproducible.\n", ":::" ] }, @@ -304,7 +304,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As well as the model coefficients, we might be interested in the avarage causal impact and average cumulative causal impact.\n", + "As well as the model coefficients, we might be interested in the average causal impact and average cumulative causal impact.\n", "\n", ":::{note}\n", "Better output for the summary statistics are in progress!\n", diff --git a/docs/source/notebooks/iv_weak_instruments.ipynb b/docs/source/notebooks/iv_weak_instruments.ipynb index 7ebbdd06..b03f46cf 100644 --- a/docs/source/notebooks/iv_weak_instruments.ipynb +++ b/docs/source/notebooks/iv_weak_instruments.ipynb @@ -155,7 +155,7 @@ "source": [ "#### Digression: Sampling Multivariate Normals\n", "\n", - "How can we measure this correlation between instrument and treatment? How much correlation should we expect? In the `CausalPy` implementation of instrumental variable regression we model this correlation explicity using an LKJ Cholesky prior on a Multivariate Normal distribution. It's worth a small digression here to show how sampling from this distribution under different priors can shape the correlations of the joint-distribution. We'll show below how this offers us a mechanism to impose constraints on our beliefs about the relationships between our instruments. " + "How can we measure this correlation between instrument and treatment? How much correlation should we expect? In the `CausalPy` implementation of instrumental variable regression we model this correlation explicitly using an LKJ Cholesky prior on a Multivariate Normal distribution. It's worth a small digression here to show how sampling from this distribution under different priors can shape the correlations of the joint-distribution. We'll show below how this offers us a mechanism to impose constraints on our beliefs about the relationships between our instruments. 
" ] }, { @@ -241,7 +241,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In the above series of plots we have sampled from two different parameterisations of an LKJ prior probability distribution. This distribution is a prior on the expected values and covariance structure of a multivariate normal probability distribution. We have shown different parameterisations of the prior lead to more or less correlated realisations of the bivariate normal we're sampling. Here we can see that increasing the `eta` parameter on the LKJ prior shrinks the range of admissable correlations parameters. By default the `CausalPy` implementation of the `InstrumentalVariable` class samples from a bivariate normal distribution of the treatment and the outcome with a parameter setting of `eta=2`. This is worth knowing if your model makes such potential correlations very unlikely. We will show below how you can apply priors to these parameters in the instrumental variable context. " + "In the above series of plots we have sampled from two different parameterisations of an LKJ prior probability distribution. This distribution is a prior on the expected values and covariance structure of a multivariate normal probability distribution. We have shown that different parameterisations of the prior lead to more or less correlated realisations of the bivariate normal we're sampling. Here we can see that increasing the `eta` parameter on the LKJ prior shrinks the range of admissible correlation parameters. By default the `CausalPy` implementation of the `InstrumentalVariable` class samples from a bivariate normal distribution of the treatment and the outcome with a parameter setting of `eta=2`. This is worth knowing if your model makes such potential correlations very unlikely. We will show below how you can apply priors to these parameters in the instrumental variable context. 
" ] }, { @@ -349,7 +349,7 @@ "df = cp.load_data(\"schoolReturns\")\n", "\n", "\n", - "def poly(x, p): # replicate R's poly decompostion function\n", + "def poly(x, p): # replicate R's poly decomposition function\n", " X = np.transpose(np.vstack([x**k for k in range(p + 1)]))\n", " return np.linalg.qr(X)[0][:, 1:]\n", "\n", @@ -715,7 +715,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "There is at least some indication here that the proximity to college has an impact on educational attainment, and that education seems to drive upward the achieved wages. Typical approaches to instrumental variable regressions try to quantify the strength of the instrument formally. But even visually here the evidence that proximity to college would induce increased levels of education seems compelling. For instance, we see a clear upward sloping trend relationship between educational attainment and wage-acquisition. But additionally, in the first plot we can see a sparse presence of individuals who achieved the maximum educational outcomes among those with poor proximity to 4 year colleges i.e. fewer red dots than blue in the ellipses. A similar story plays out in the second plot but focused on the maximal educational attainment assuming the proxmity of a 2 year college. These observations should tilt our view of the conditional expectations for wage growth based on proximty to college. " + "There is at least some indication here that the proximity to college has an impact on educational attainment, and that education seems to drive upward the achieved wages. Typical approaches to instrumental variable regressions try to quantify the strength of the instrument formally. But even visually here the evidence that proximity to college would induce increased levels of education seems compelling. For instance, we see a clear upward sloping trend relationship between educational attainment and wage-acquisition. 
But additionally, in the first plot we can see a sparse presence of individuals who achieved the maximum educational outcomes among those with poor proximity to 4 year colleges i.e. fewer red dots than blue in the ellipses. A similar story plays out in the second plot but focused on the maximal educational attainment assuming the proximity of a 2 year college. These observations should tilt our view of the conditional expectations for wage growth based on proximity to college. " ] }, { @@ -724,7 +724,7 @@ "source": [ "### Justificatory Models\n", "\n", - "We start with the simple regression context. This serves two purposes: (i) we can explore how the effect of `education` is measured in a simple regression and we can (ii) benchmark the efficacy of our instrument `nearcollege_indcator` in the context of trying to predict `education`. These regressions are effectively diagnostic tests of our instrument. In what follows we'll look seperately at\n", + "We start with the simple regression context. This serves two purposes: (i) we can explore how the effect of `education` is measured in a simple regression and we can (ii) benchmark the efficacy of our instrument `nearcollege_indcator` in the context of trying to predict `education`. These regressions are effectively diagnostic tests of our instrument. In what follows we'll look separately at\n", "\n", "- (i) the first stage regression of the LATE estimate, \n", "- (ii) the reduced form regression and \n", @@ -3155,7 +3155,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The uncertainty in the correlation implied in the last model kind of undermines this model specification. If our argument about the instrument is to be compelling we would expect __relevance__ to hold. A model specification which degrades the relevance by means of reduced correlation is perhaps too extreme. We have in effect degraded the relevance of our instrument and still recover a strong positive effect for `beta_z[education]`. 
The point here is not to argue about the parameter settings, just to show that multiple models need to be considered and some sensetivity testing is always warranted when justifying an IV design. " + "The uncertainty in the correlation implied in the last model kind of undermines this model specification. If our argument about the instrument is to be compelling we would expect __relevance__ to hold. A model specification which degrades the relevance by means of reduced correlation is perhaps too extreme. We have in effect degraded the relevance of our instrument and still recover a strong positive effect for `beta_z[education]`. The point here is not to argue about the parameter settings, just to show that multiple models need to be considered and some sensitivity testing is always warranted when justifying an IV design. " ] }, { diff --git a/docs/source/notebooks/multi_cell_geolift.ipynb b/docs/source/notebooks/multi_cell_geolift.ipynb index 4c1fbc86..ebba88e9 100644 --- a/docs/source/notebooks/multi_cell_geolift.ipynb +++ b/docs/source/notebooks/multi_cell_geolift.ipynb @@ -10,7 +10,7 @@ "\n", "This may be a particularly common use case in marketing, where a company may want to understand the impact of a marketing campaign in multiple regions. But these methods are not restricted to marketing of course - the methods shown here are general. Another concrete use case may be in public health, where a public health intervention may be rolled out in multiple regions.\n", "\n", - "This notebook focusses on the situation where the treatment has already taken place, and now we want to understand the causal effects of the treatments that were executed. 
Much work likely preceeded this analysis, such as asking yourself questions like \"which geos should I run the treatment in?\", \"what should the treatment be?\" But these pre-treatment questions are not the focus of this notebook.\n", + "This notebook focusses on the situation where the treatment has already taken place, and now we want to understand the causal effects of the treatments that were executed. Much work likely preceded this analysis, such as asking yourself questions like \"which geos should I run the treatment in?\", \"what should the treatment be?\" But these pre-treatment questions are not the focus of this notebook.\n", "\n", "We can imagine two scenarios (there may be more), and show how we can tailor our analysis to each:\n", "\n", @@ -310,7 +310,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Always vizualise the data before starting the analysis. Our rather uncreative naming scheme uses `u` to represent untreated geos, and `t` to represent treated geos. The number after the `u` or `t` represents the geo number." + "Always visualise the data before starting the analysis. Our rather uncreative naming scheme uses `u` to represent untreated geos, and `t` to represent treated geos. The number after the `u` or `t` represents the geo number." ] }, { @@ -559,7 +559,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's vizualise this aggregated geo and compare it to the individual treated geo's." + "Let's visualise this aggregated geo and compare it to the individual treated geos." ] }, { @@ -1120,7 +1120,7 @@ "source": [ "Now let's plot the weightings of the untreated geos for each treated geo. 
Note that `sigma` is the model's estimate of the standard deviation of the observation noise.\n", "\n", - "If we wanted to produce seperate plots for each target geo, we could do so like this:\n", + "If we wanted to produce separate plots for each target geo, we could do so like this:\n", "\n", "```python\n", "fig, axs = plt.subplots(len(treated), 1, figsize=(8, 4 * len(treated)), sharex=True)\n", diff --git a/docs/source/notebooks/rd_pymc.ipynb b/docs/source/notebooks/rd_pymc.ipynb index 348c89a7..be23d3c4 100644 --- a/docs/source/notebooks/rd_pymc.ipynb +++ b/docs/source/notebooks/rd_pymc.ipynb @@ -51,7 +51,7 @@ "metadata": {}, "source": [ ":::{note}\n", - "The `random_seed` keyword argument for the PyMC sampler is not neccessary. We use it here so that the results are reproducible.\n", + "The `random_seed` keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.\n", ":::" ] }, diff --git a/docs/source/notebooks/rd_pymc_drinking.ipynb b/docs/source/notebooks/rd_pymc_drinking.ipynb index 03af17b0..4ffca0db 100644 --- a/docs/source/notebooks/rd_pymc_drinking.ipynb +++ b/docs/source/notebooks/rd_pymc_drinking.ipynb @@ -89,7 +89,7 @@ "metadata": {}, "source": [ ":::{note}\n", - "The `random_seed` keyword argument for the PyMC sampler is not neccessary. We use it here so that the results are reproducible.\n", + "The `random_seed` keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.\n", ":::" ] }, diff --git a/docs/source/notebooks/rkink_pymc.ipynb b/docs/source/notebooks/rkink_pymc.ipynb index 0c147d07..2c1d004c 100644 --- a/docs/source/notebooks/rkink_pymc.ipynb +++ b/docs/source/notebooks/rkink_pymc.ipynb @@ -599,7 +599,7 @@ "source": [ "## Example 3 - basis spline model\n", "\n", - "As a final example to demonstrate that we need not be constrained to polynomial functions, we can use a basis spline model. 
This takes advantage of the capability of `patsy` to generate design matricies with basis splines. Note that we will use the same simulated dataset as the previous example." + "As a final example to demonstrate that we need not be constrained to polynomial functions, we can use a basis spline model. This takes advantage of the capability of `patsy` to generate design matrices with basis splines. Note that we will use the same simulated dataset as the previous example." ] }, { diff --git a/docs/source/notebooks/sc_pymc.ipynb b/docs/source/notebooks/sc_pymc.ipynb index 93621c2b..f9260913 100644 --- a/docs/source/notebooks/sc_pymc.ipynb +++ b/docs/source/notebooks/sc_pymc.ipynb @@ -57,7 +57,7 @@ "## Run the analysis\n", "\n", ":::{note}\n", - "The `random_seed` keyword argument for the PyMC sampler is not neccessary. We use it here so that the results are reproducible.\n", + "The `random_seed` keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.\n", ":::" ] }, @@ -195,7 +195,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As well as the model coefficients, we might be interested in the avarage causal impact and average cumulative causal impact." + "As well as the model coefficients, we might be interested in the average causal impact and average cumulative causal impact." ] }, { diff --git a/docs/source/notebooks/sc_pymc_brexit.ipynb b/docs/source/notebooks/sc_pymc_brexit.ipynb index 359b274f..1a8f39cd 100644 --- a/docs/source/notebooks/sc_pymc_brexit.ipynb +++ b/docs/source/notebooks/sc_pymc_brexit.ipynb @@ -7,7 +7,7 @@ "source": [ "# The effects of Brexit\n", "\n", - "The aim of this notebook is to estimate the causal impact of Brexit upon the UK's GDP. This will be done using the {term}`synthetic control` approch. As such, it is similar to the policy brief \"What can we know about the cost of Brexit so far?\" {cite:p}`brexit2022policybrief` from the Center for European Reform. 
That approach did not use Bayesian estimation methods however.\n", + "The aim of this notebook is to estimate the causal impact of Brexit upon the UK's GDP. This will be done using the {term}`synthetic control` approach. As such, it is similar to the policy brief \"What can we know about the cost of Brexit so far?\" {cite:p}`brexit2022policybrief` from the Center for European Reform. That approach did not use Bayesian estimation methods however.\n", "\n", "I did not use the GDP data from the above report however as it had been scaled in some way that was hard for me to understand how it related to the absolute GDP figures. Instead, GDP data was obtained courtesy of Prof. Dooruj Rambaccussing. Raw data is in units of billions of USD." ] @@ -225,7 +225,7 @@ "metadata": {}, "source": [ ":::{note}\n", - "The `random_seed` keyword argument for the PyMC sampler is not neccessary. We use it here so that the results are reproducible.\n", + "The `random_seed` keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.\n", ":::" ] }, diff --git a/pyproject.toml b/pyproject.toml index 54d9f342..2b635c13 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -49,7 +49,7 @@ dependencies = [ # # Similar to `dependencies` above, these must be valid existing projects. [project.optional-dependencies] -dev = ["pathlib", "pre-commit", "twine", "interrogate", "codespell"] +dev = ["pathlib", "pre-commit", "twine", "interrogate", "codespell", "nbformat", "nbconvert"] docs = [ "ipykernel", "daft", @@ -69,7 +69,7 @@ docs = [ "sphinx-design", ] lint = ["interrogate", "pre-commit", "ruff"] -test = ["pytest", "pytest-cov"] +test = ["pytest", "pytest-cov", "codespell", "nbformat", "nbconvert"] [metadata] description-file = 'README.md'