Decompose model into ensemble of simpler models #247

Closed
phelps-sg opened this issue May 13, 2020 · 10 comments

Comments

@phelps-sg
Contributor

phelps-sg commented May 13, 2020

The model incorporates many features and many parameters. Depending on the purpose of the model (#246), it may be desirable to simplify the model. For example, if the goal of the model is to produce accurate forecasts, there is little point in making the model more complicated by implementing #242, #240, #210 unless these features significantly improve forecasting accuracy, as introducing additional complexity is likely to increase the number of parameters and correspondingly the variance of the forecasts.

It might be useful to design a framework allowing the modeler to disable/enable certain features of the model in order to assess their impact on the overall quality (#248) of the outputs produced by the model (this will be more straightforward to implement once #208 has been resolved).
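A minimal sketch of what such an enable/disable framework could look like, assuming a model that can be run with a set of boolean switches and scored against observed data. Everything here (`Features`, `toy_model`, `forecast_error`, the numbers) is hypothetical and only illustrates the idea; none of it is taken from the actual codebase.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Features:
    """Hypothetical feature switches; these are not the real model's parameters."""
    household_transmission: bool = True
    school_closure: bool = True
    spatial_mixing: bool = True


def toy_model(features):
    """Stand-in for a full model run; returns one forecast value (made-up numbers)."""
    forecast = 100.0
    if features.household_transmission:
        forecast += 30.0
    if features.school_closure:
        forecast -= 20.0
    if features.spatial_mixing:
        forecast += 5.0
    return forecast


def forecast_error(features, observed):
    """Absolute error of the toy forecast against an observed value."""
    return abs(toy_model(features) - observed)


def ablation(baseline, observed):
    """Switch off one feature at a time; report the change in forecast error."""
    reference = forecast_error(baseline, observed)
    return {
        f.name: forecast_error(replace(baseline, **{f.name: False}), observed) - reference
        for f in fields(baseline)
    }
```

Calling `ablation(Features(), observed=112.0)` returns one number per feature. A positive value means the forecast got worse when the feature was disabled, so the feature is pulling its weight; a value near zero or below suggests the feature adds complexity without improving accuracy on that data.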

@bbolker

bbolker commented May 13, 2020

With respect ... the suggestions you've been making are all good ones, but I suspect that they're not that useful to the development team, who, as has been mentioned before on various threads here, are up to their eyeballs in urgent pandemic-related tasks. For better or worse, they simply don't have time themselves for significant refactoring or enhancements of the model. For that reason, the suggestions probably won't be acted on.

Many of your issues have been of the form "it would be good if the model did XXX". You're definitely right, but each of these takes significant development effort, especially to do without breaking the current structure. The most useful thing you can do (in my opinion) would be (1) to use the issues list to suggest relatively straightforward improvements or enhancements that can either be made by the development team or (preferably) in separate forks by independent developers, then submitted as pull requests; (2) use forks to work on larger-scale architectural changes that can be considered in the slightly longer term.

I think @zebmason and @Feynstein are good examples of developers who are actually getting in and making useful changes.

I apologize for the meta-comments here, and any objectionable tone. Ideas are useful too, but actual development is the much more limiting factor at this point ...

Some enabling/disabling can probably already be done at this point by setting certain parameters to null values (e.g. setting within-household transmission to zero?), although I agree it might be better to have flags or switches for disabling the features entirely.

@Feynstein

@bbolker thanks :p I was starting to think people were looking down on me because I was trying too hard.

On that note though, I recently found out that they are using this repo as the live version for their modelling, hence the difficulty the community has in getting substantial improvements into the code. I tried raising this matter in #231, and I also tried sending an email to @weshinsley. I have yet to receive an answer, but I think the fact that they use it live and can't respond that often shows how far in over their heads they are with work. I feel very bad now, because if I had known that I wouldn't have pushed that hard.

Basically they are doomed to make incremental changes until the end of the pandemic or until someone can finish the huge task of refactoring it all. I will still try to work on my fork, but I have started my day job again. That's why I tried to start an experimental branch movement: my work involves code, but as a lead scientist it also involves making tasks and dealing with issues, which is much easier for me. I also have to get tested at my MD's request because I've been having neurological symptoms and it might be that damn virus... That might mean I won't be able to do anything for a while...

So yeah, basically my wall of text meant nothing except that we somehow need to start an experimental branch and get it moving somewhere. And make sure the code stands up against validated results.

@bbolker

bbolker commented May 14, 2020

Your fork could easily become the experimental branch, especially if you

  • set up a pipeline to pull in recent commits from the main repository (this is easy, ask me if you want);
  • set up a testing pipeline on your fork that matches (or improves on) the testing pipeline here;
  • publicize it/make a point of inviting some of the more productive contributors to this repo (those who appear to have time and interest in making more architectural improvements)

@robscovell-ts

robscovell-ts commented May 17, 2020

@bbolker I don't think that is a fair response to @phelps-sg to be honest. I agree with @phelps-sg's suggestions because they relate to making this a more realistic model that can be used with confidence to inform public policy.

The scenario suggestions could be generalised to a question of viral load in different situations. As I understand it (and please correct me if I am wrong) the model as-is assumes that each individual interaction gives rise to an equal probability of transmission, or at least the same probability distribution for each individual interaction if the interactions are modelled stochastically. A better model would use different probability distributions for stochastic interaction models in different scenarios.

The suggestions for refactoring are fundamentally about confidence in the output of the model. I am very much looking forward to (and perhaps contributing to) @Feynstein's refactor.

@bbolker

bbolker commented May 17, 2020

My point is less about the specific goal (design a framework ...), which is fine. I'm saying that people can propose enhancements and changes until they're blue in the face, but that just saying "why don't you do XXX", especially when XXX is a large-scale/architectural change to the model, is not likely to lead to a useful outcome (because the development team is super-busy) ...

@zebmason
Contributor

I should, perhaps, state that I'm just doing what I'm interested in, which is killing the boredom of lockdown. The other day I did look into getting a flight to Stockholm so I could go to a bar, but it looks like I'm stuck here, so I thought that I might as well help out. As to that, I'm just doing what I think is useful and so far don't seem to have duplicated anyone else's work.

@robscovell-ts

@bbolker Thank you for clarifying. As I understand it, the code has been open-sourced here for the sake of public scrutiny by professionals in the software development and mathematical modelling communities. There are three possible levels of scope for professional scrutiny:

1. the choice of mathematical model(s) used
2. the architecture of the implementation of the model(s)
3. the implementation details, i.e. the actual code.

At the moment, the only contributions that are welcome are at level 3, due to resource constraints in your team.

Will there be any future processes in place to allow contributions at levels 1 and 2?

This is very much a first: I don't know of the open-sourcing of any other model of such importance for public policy guidance. I admire your team's courage in putting it out there and I can imagine that managing the responses must add extra strain to already-stretched resources.

@bbolker

bbolker commented May 18, 2020

To clarify, I am not a member of the modelling team. I'm an academic; one component of my research is epidemic modelling (although not generally at this level of realism/complexity).

I can't speak for the modelling team's long-term goals. For what it's worth:

  • there are a variety of other agent-based models for COVID available: see the agent-based sheet at https://tinyurl.com/covid19-models
  • I would distinguish between contributions and suggestions. A "contribution" is "I've implemented XXX in a fork, it's well tested, it won't break your current workflow, here's a pull request". A "suggestion" is "gee, you guys ought to do XXX"

@robscovell-ts

robscovell-ts commented May 19, 2020

Thank you for the clarification, @bbolker . I am a software engineer with a background education in mathematical modelling. I have worked in the pharmaceutical industry and academia as a developer of statistical modelling software before becoming a consultant in general software development.

That's an incredibly useful and interesting sheet of models that you shared.

@weshinsley
Collaborator

Just catching up/cleaning up some old discussions that have probably run their course now, with a brief summary.

The model might be better thought of as an "explorative" platform rather than a forecasting tool, although those things sometimes overlap a bit in application. The primary use is "what-if" scenario modelling rather than forecasting in the meteorological style. Many different behaviours can be switched on and off in the parameter file, and you can incrementally switch things off all the way down to basic spatial behaviour if you like, leaving you with something that behaves like a compartmental SIR model. (A very atypical use case that might not have been tested recently, but in principle...)
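For readers unfamiliar with the term, a compartmental SIR model tracks only three population counts (susceptible, infectious, recovered) with no households, places or geography. The following is a generic textbook-style sketch using forward-Euler steps, not code from this repository; the function names and the choice of integrator are assumptions made for illustration.

```python
def sir_step(s, i, r, beta, gamma, dt):
    """Advance the S, I, R compartments by one forward-Euler step of size dt."""
    n = s + i + r
    new_infections = beta * s * i / n * dt  # flow from S to I
    new_recoveries = gamma * i * dt         # flow from I to R
    return (
        s - new_infections,
        i + new_infections - new_recoveries,
        r + new_recoveries,
    )


def run_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations for `days` and return the final (S, I, R)."""
    s, i, r = float(s0), float(i0), float(r0)
    for _ in range(round(days / dt)):
        s, i, r = sir_step(s, i, r, beta, gamma, dt)
    return s, i, r
```

Here `beta` is the transmission rate and `gamma` the recovery rate, both per day. Every individual in a compartment is treated identically, which is exactly the structure an individual-based simulation is built to move beyond.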

To answer the top post, I'd therefore argue that this codebase already is a framework supporting switching on/off all the features of the model, and having one platform you can parameterise by switching on the behaviours you want (and that you have data for...), feels more appealing to me than a suite of different models with this-or-that feature appearing in each version. The matrix of possibilities would be pretty explosive.

One other correction: it's very much not the case that individual interactions give rise to equal probability of transmission, and that's rather the point of this kind of individual-based spatial simulation. Individual transmission probabilities will depend on the circumstances and modelled behaviour of the infectee, including the properties of their spatial area: their household, their age, the properties of those living around them, the interventions applied to them, etc.
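As a rough illustration of the point about non-uniform transmission, here is a sketch in which the per-contact probability depends on the setting, the infectee's susceptibility and an intervention multiplier. The settings, the numeric values and the multiplicative form are all assumptions chosen for the example; they are not the model's actual parameters or equations.

```python
import random

# Hypothetical baseline per-contact transmission probabilities by setting.
SETTING_BASE = {"household": 0.10, "school": 0.05, "community": 0.01}


def transmission_probability(setting, susceptibility, intervention_factor):
    """Per-contact probability, scaled by infectee and intervention properties."""
    p = SETTING_BASE[setting] * susceptibility * intervention_factor
    return min(max(p, 0.0), 1.0)  # clamp to a valid probability


def simulate_contacts(contacts, rng):
    """Count infections over (setting, susceptibility, intervention) contacts."""
    return sum(rng.random() < transmission_probability(*c) for c in contacts)
```

For example, `simulate_contacts([("household", 1.0, 1.0), ("community", 0.5, 0.7)], random.Random(1))` draws each contact independently with its own probability, so two contacts in different circumstances are not treated as equally risky.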
