Update docs
richfitz committed Jun 7, 2024
1 parent a793636 commit 65fb162
Showing 1 changed file (`vignettes/stan.Rmd`) with 5 additions and 21 deletions.
@@ -28,22 +28,16 @@ The `CmdStan` interface compiles stan programs into standalone executables, and

## Installation and versions

The version of `cmdstanr` that you use on your own machine is not important to these instructions (though it may matter to you, depending on which model you are compiling). On the cluster, however, you need a very recent version of `cmdstanr`: 0.8.1 or newer, released on 6 June 2024.

Your `pkgdepends.txt` should contain:

```
repo::https://mc-stan.org/r-packages
cmdstanr
```
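To fail early with a clear message when a task picks up an older `cmdstanr` than the one required above, you could add a check like the following at the top of your task. This is only a sketch: `require_min_version()` is a hypothetical helper, not part of `cmdstanr` or hipercow, and it relies only on base R's `requireNamespace()` and `utils::packageVersion()`.

```r
# Hypothetical helper: stop with a clear message unless `pkg` is installed
# at version `min_version` or newer. Returns the installed version invisibly.
require_min_version <- function(pkg = "cmdstanr", min_version = "0.8.1") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("'%s' is not installed; check your pkgdepends.txt", pkg))
  }
  installed <- utils::packageVersion(pkg)
  # package_version objects compare correctly against version strings
  if (installed < min_version) {
    stop(sprintf("'%s' %s is installed, but %s or newer is required",
                 pkg, as.character(installed), min_version))
  }
  invisible(installed)
}
```

Calling `require_min_version()` with its defaults as the first line of a task turns a stale package library into an immediate, readable error rather than a confusing failure later on.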

Only one version of `CmdStan` works, and it is automatically available to you on the cluster (version `2.35.0` at the time of writing). We would like to support multiple versions at once, but the internal gymnastics in `cmdstanr` make that quite hard to do. Please let us know if you need a specific version (later than `2.35.0` but different from whatever we have installed).
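If you are unsure which `CmdStan` the cluster currently provides (for example, before asking us for a different one), a small function like this can be run inside a task and its result inspected. It is a sketch: `cmdstan_info()` is a hypothetical name, while `cmdstanr::cmdstan_version()` is the `cmdstanr` function that reports the version of the `CmdStan` installation it is configured to use; with `error_on_NA = FALSE` it returns `NULL` instead of erroring when none is found.

```r
# Hypothetical helper: report the cmdstanr and CmdStan versions visible to
# this R process, using NA for anything that cannot be found.
cmdstan_info <- function() {
  if (!requireNamespace("cmdstanr", quietly = TRUE)) {
    return(list(cmdstanr = NA_character_, cmdstan = NA_character_))
  }
  cmdstan <- cmdstanr::cmdstan_version(error_on_NA = FALSE)
  list(
    cmdstanr = as.character(utils::packageVersion("cmdstanr")),
    cmdstan = if (is.null(cmdstan)) NA_character_ else cmdstan
  )
}
```

One way to run it is `hipercow::task_create_expr(cmdstan_info())` followed by `hipercow::task_result()` on the returned id, assuming the function is defined in a file that your hipercow environment sources.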

It is easy to make a mess on the cluster by following instructions from the stan developers, which assume that you are the only user of a machine and that stan is the main piece of software you are interested in running. This is especially the case if you follow instructions found on Stack Overflow, as these may refer to previous versions. The fallout should be restricted to your own working space, though.

@@ -68,14 +62,14 @@ you can use

```r
mod <- cmdstanr::cmdstan_model("model.stan", dir = tempdir())
```

Using `tempdir()` guarantees that every process that starts up will use a different directory, so you won't have to think about this very much.
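The per-process guarantee comes from R itself: each R session creates its own temporary directory when it starts. The base-R snippet below checks this by asking a freshly started R process for its `tempdir()` and comparing it with the current session's. Assuming, as the text does, that each task runs in its own R process, the same holds across tasks. `child_tempdir()` is a hypothetical helper written for this illustration.

```r
# Start a fresh R process and capture the session temporary directory it reports.
child_tempdir <- function() {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript, c("-e", shQuote("writeLines(tempdir())")), stdout = TRUE)
}

here <- tempdir()
there <- child_tempdir()
print(c(this_process = here, child_process = there[1]))
# FALSE: the two sessions use different directories
identical(here, there[1])
```

R also deletes the session temporary directory when the session exits normally, so anything compiled there disappears with the task, which is why each task has to recompile.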

The two issues being mitigated here are:

* if the toolchain used to compile stan models differs between your laptop and the cluster (this is true if you are using a different operating system, minor version of R, version of Rtools or version of cmdstanr), then the model that you have compiled on your machine will not work on the cluster, possibly with difficult-to-diagnose errors.
* if you launch two or more tasks at once on the cluster, then every task will try to compile the model at the same time and then write to the same file. At most one of these tasks will succeed; the others will fail with permissions errors (Windows cannot write to a file that is open).

The (potentially large) downside of using `tempdir()` is that every time you start a task that uses a stan program it will have to compile it. Even for a simple program this is quite slow (say up to two minutes for a "hello world" type problem). Before deciding if this is going to cause you issues, think about how long it will take to *run* your code. If it's going to run for an hour, then this delay may be tolerable.
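One way to make that judgement concrete is to compute the share of a task's wall time spent compiling. This is simple arithmetic rather than anything stan-specific; the two-minute compile time is the rough figure quoted above for a trivial model, and larger models may take longer. `compile_overhead()` is a hypothetical helper.

```r
# Fraction of a task's total wall time spent compiling the model.
# Both times must be in the same units (minutes here).
compile_overhead <- function(compile_time, run_time) {
  compile_time / (compile_time + run_time)
}

compile_overhead(compile_time = 2, run_time = 60)   # about 0.03: tolerable
compile_overhead(compile_time = 2, run_time = 0.5)  # 0.8: compiling dominates
```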

However, if your program is very fast once compiled, this might become a significant issue. If you want to reuse the models then you need to do so in a way that (a) only hipercow tasks reuse the compiled models and (b) no task tries to recompile the model after it is created. You might explore something like first submitting a task that runs

@@ -102,13 +96,3 @@

```r
cmdstan_model_but_dont_recompile <- function(path, dir = ".", ...) {
```

which you can use in place of `cmdstanr::cmdstan_model`. Good luck.

# Parallelism

Stan supports compiling models and running chains in parallel, and you should exploit this wherever you can, as it will typically reduce the total time you spend waiting for your task to complete. Before starting, be sure to read `vignette("parallel")`, which covers general advice about running tasks that use more than one core. All you should need to do is request more than one core via `resources`, and stan should pick up on and respect this.
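As a sketch of what "pick up on" means in practice: `cmdstanr`'s `$sample()` method defaults its `parallel_chains` argument to `getOption("mc.cores", 1)`, so the number of chains run at once follows that option. We assume here that the option reflects the cores you requested (for example with `hipercow::hipercow_resources(cores = 4)`); `vignette("parallel")` is the place to confirm how hipercow exposes the core count. The hypothetical helper below just makes the choice explicit, capping it at the number of chains.

```r
# Hypothetical helper: how many chains to run simultaneously. Never more
# than the chains requested, never more than the cores available, at least 1.
# Assumes getOption("mc.cores") holds the number of cores given to this task.
chains_in_parallel <- function(chains = 4L, cores = getOption("mc.cores", 1L)) {
  max(1L, min(as.integer(chains), as.integer(cores)))
}

chains_in_parallel(chains = 4, cores = 2)  # 2
chains_in_parallel(chains = 4, cores = 8)  # 4
```

Inside a task this might be used as `mod$sample(chains = 4, parallel_chains = chains_in_parallel(4))`, where `mod` is a model returned by `cmdstanr::cmdstan_model()`.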

## Within-chain parallelisation

Chain-based parallelism is an easy win. Each chain takes about the same amount of time and runs completely independently, so running 2 chains in parallel usually halves the total wall time of a program.
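The "halves the wall time" rule of thumb generalises: if every chain takes roughly the same time and chains run independently, they are processed in rounds of at most one chain per core. The hypothetical helper below is a back-of-the-envelope estimate under exactly those assumptions; it ignores compilation time and any start-up overhead.

```r
# Approximate wall time for `chains` chains on `cores` cores, assuming each
# chain takes `chain_time` and chains run fully independently in rounds.
chain_wall_time <- function(chains, cores, chain_time) {
  ceiling(chains / cores) * chain_time
}

chain_wall_time(chains = 2, cores = 1, chain_time = 10)  # 20: sequential
chain_wall_time(chains = 2, cores = 2, chain_time = 10)  # 10: halved
chain_wall_time(chains = 4, cores = 3, chain_time = 10)  # 20: fourth chain needs a second round
```

The last case shows why requesting a number of cores that divides the number of chains is usually the efficient choice.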

Some programs can also use multi-threading within a single chain; for details see [the stan guide](https://mc-stan.org/docs/cmdstan-guide/parallelization.html). Before exploring this too far, bear in mind that you will need to rewrite parts of your program and that the overall speedup will be less impressive than with chain-based parallelism. The interface for this in stan appears to be in flux, and we will document how to get the expected behaviour here once it stabilises.
