Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Memory allocation #59

Open
wants to merge 13 commits into
base: dev
Choose a base branch
from
Open

WIP: Memory allocation #59

wants to merge 13 commits into from

Conversation

wytbella
Copy link

@wytbella wytbella commented Jun 24, 2020

To look into issue: #51

I tracked the time/memory use in update!, here are some initial results on Scotland_run.jl for single thread (see the table below)

 ──────────────────────────────────────────────────────────────────────────────
                                       Time                   Allocations
                               ──────────────────────   ───────────────────────
       Tot / % measured:             597s / 53.7%           38.0GiB / 68.4%

 Section               ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────
 virusupdate!              90     227s  70.7%   2.52s   25.4GiB  97.5%   289MiB
   firstloop            2.70k     172s  53.7%  63.7ms   22.7GiB  87.4%  8.62MiB
     birth draw          146M    71.2s  22.2%   487ns   10.9GiB  41.9%    80.0B
     virusmove          2.69M    35.0s  10.9%  13.0μs     0.00B  0.00%    0.00B
     death draw          146M    25.0s  7.80%   171ns   3.06GiB  11.8%    22.4B
   rest virus update       90    668ms  0.21%  7.42ms    112MiB  0.42%  1.24MiB
 classupdate!              90    90.2s  28.2%   1.00s    586MiB  2.20%  6.51MiB
 invalidatecaches!         90    3.07s  0.96%  34.1ms     0.00B  0.00%    0.00B
 habitatupdate!            90    544ms  0.17%  6.05ms   67.4MiB  0.25%   767KiB
 applycontrols!            90   39.7μs  0.00%   441ns     0.00B  0.00%    0.00B
 ──────────────────────────────────────────────────────────────────────────────

@claireh93 @richardreeve , the tool (TimerOutputs.jl) I used doesn't seem to support multi-threading well, so I might have to use other tools for further investigation. Just want to ask, does the result (the table above) on a single thread give you any insights? Any suggestions on what's next?

Edit: some results from 1 or 2 threads (TimerOutputs could work if I don't track in details

1 threads
 ──────────────────────────────────────────────────────────────────────────────
                                       Time                   Allocations
                               ──────────────────────   ───────────────────────
       Tot / % measured:             158s / 64.8%           16.0GiB / 54.3%

 Section               ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────
 virusupdate!              30    69.6s  68.0%   2.32s   8.41GiB  96.9%   287MiB
   rest virus update       30    396ms  0.39%  13.2ms   51.6MiB  0.58%  1.72MiB
 classupdate!              30    31.1s  30.4%   1.04s    211MiB  2.37%  7.03MiB
 invalidatecaches!         30    1.12s  1.09%  37.2ms     0.00B  0.00%    0.00B
 habitatupdate!            30    583ms  0.57%  19.4ms   67.4MiB  0.76%  2.25MiB
 applycontrols!            30   33.5μs  0.00%  1.12μs     0.00B  0.00%    0.00B
 ──────────────────────────────────────────────────────────────────────────────

2 threads
 ──────────────────────────────────────────────────────────────────────────────
                                       Time                   Allocations
                               ──────────────────────   ───────────────────────
       Tot / % measured:             125s / 48.7%           16.0GiB / 54.2%

 Section               ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────
 virusupdate!              30    40.7s  67.0%   1.36s   8.40GiB  96.9%   287MiB
   rest virus update       30    624ms  1.03%  20.8ms   51.6MiB  0.58%  1.72MiB
 classupdate!              30    17.9s  29.5%   597ms    211MiB  2.37%  7.03MiB
 invalidatecaches!         30    1.44s  2.38%  48.1ms     0.00B  0.00%    0.00B
 habitatupdate!            30    650ms  1.07%  21.7ms   67.4MiB  0.76%  2.25MiB
 applycontrols!            30   14.9μs  0.00%   498ns     0.00B  0.00%    0.00B
 ──────────────────────────────────────────────────────────────────────────────

Update to new dev:

  ────────────────────────────────────────────────────────────────────────────────
                                         Time                   Allocations      
                                 ──────────────────────   ───────────────────────
        Tot / % measured:              252s / 36.5%           54.1GiB / 44.3%    

 Section                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────
 virusupdate!                60    83.8s  90.9%   1.40s   19.5GiB  81.3%   333MiB
   secondloop             31.5M    69.1s  74.9%  2.19μs   15.4GiB  64.2%     524B
     loop                 31.5M    18.3s  19.8%   581ns   4.59GiB  19.1%     156B
       Poisson            4.04M    3.10s  3.36%   767ns    185MiB  0.75%    48.0B
         Poisson rand     4.04M    1.38s  1.49%   341ns   61.7MiB  0.25%    16.0B
         Poisson crea...  4.04M    855ms  0.93%   212ns    123MiB  0.50%    32.0B
     binomial2            31.5M    8.58s  9.31%   272ns   1.88GiB  7.83%    64.0B
     binomial             31.5M    6.67s  7.23%   212ns   0.94GiB  3.92%    32.0B
     alloc                31.5M    3.24s  3.52%   103ns   2.82GiB  11.8%    96.0B
     traitfn              31.5M    2.59s  2.81%  82.4ns     0.00B  0.00%    0.00B
     .assign              31.5M    2.17s  2.35%  68.9ns     0.00B  0.00%    0.00B
   firstloop                180    7.62s  8.26%  42.3ms   1.94GiB  8.08%  11.0MiB
     birth draw           14.4M    5.09s  5.52%   352ns   1.08GiB  4.50%    80.1B
     virusmove            1.13M    171ms  0.19%   152ns     0.00B  0.00%    0.00B
   rest virus update         60    436ms  0.47%  7.27ms    140MiB  0.57%  2.33MiB
 classupdate!                60    7.50s  8.13%   125ms   4.41GiB  18.4%  75.3MiB
 habitatupdate!              60    735ms  0.80%  12.3ms   62.0MiB  0.25%  1.03MiB
 invalidatecaches!           60    122ms  0.13%  2.04ms     0.00B  0.00%    0.00B
 applycontrols!              60   18.1μs  0.00%   302ns     0.00B  0.00%    0.00B
 ────────────────────────────────────────────────────────────────────────────────

@wytbella wytbella changed the base branch from master to dev June 24, 2020 09:47
@claireh93
Copy link

Hi @wytbella, now #55 has been merged there has been a fairly substantial change to virusupdate! - it would be good to re-run this analysis if possible and see if/how it has changed memory allocations. I would probably be inclined to do this on Scotland_run.jl as it works both single and multithreaded and includes all the age classes, which slows everything down - but in the meantime, I'll take a look at what is going wrong with multithreads on SEI3HRD_example.jl

@jwscook
Copy link

jwscook commented Jul 23, 2020

We've got what we needed from this - thanks very much @wytbella. We'll keep the branch alive and can merge in these changes in the future on an ad-hoc basis to rerun the memory profiling.

@jwscook jwscook closed this Jul 23, 2020
@claireh93 claireh93 reopened this Jul 29, 2020
@wytbella
Copy link
Author

Hi @claireh93 , sorry about the delay. I meant to include the latest development on dev and claireh93/mixing for an updated profiling, but struggle to precompile Simulation due to the newly introduced python dependency.

Before I spend more time to solve the python issue, just want to ask how important it is to profile the latest changes? As mentioned in the meeting, I think this PR provides more of a tooling/reference rather than a concrete conclusion (sorry about that!). It includes examples to use TimerOutputs to monitor time and memory use for different parts of the code. I'm not sure it's stable for multi-threading diagnosis though.

@claireh93
Copy link

Hi @wytbella, yes I think you are right - it's more important that we have the tools on this branch so that we can run them to explore the code as it changes, so I wouldn't worry about running it if there are python problems! As Richard said, it would be great to chat about how these things work at some point in the future, but there is no rush :)

* dev: (150 commits)
  Revert rand MedianGenerator Binomial removal.
  changes from Richard's PR review
  Apply suggestions from code review
  Add codecov reports
  Remove old unused EpiList constructor.
  Fix testing and examples to new EpiList API.
  Switch to new API for EpiList so that human_abun and virus_abun store all information about human and virus states in a dataframe so disease_classes is no longer needed. Introduce DiseaseState enum to define Susceptible, Infectious, etc. Other small fixes: - first Susceptible age group is now Susceptible1 for consistency - virus needs initialisation with a vector like humans when there are age categories
  rand call didnt have rng
  Update SEI3HRD_example to new movement kernel format - number grid cells instead of classes
  things missed from merge
  lower case kwarg
  needed type anntations for ::Type{R} where R<:Random.AbstractRNG & VecRNGType
  Remove redundant parameter setting
  Fix commit problem
  Fix data frames so they are easier to read.
  work in progress
  Tidying and clarification
  Fixing documentation and tidying
  starting work to merge in from dev
  Use dot product instead of allocating
  ...

# Conflicts:
#	Project.toml
#	examples/Epidemiology/SEI3HRD_example.jl
#	examples/Epidemiology/Scotland_run.jl
#	src/Epidemiology/EpiGenerate.jl
#	src/Simulation.jl
@richardreeve
Copy link
Member

@claireh93 - I've updated this to the latest version of dev. There's still a lot of work to do here (happily?) when someone has time.

@codecov
Copy link

codecov bot commented Aug 24, 2020

Codecov Report

Merging #59 into dev will increase coverage by 0.22%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev      #59      +/-   ##
==========================================
+ Coverage   61.97%   62.20%   +0.22%     
==========================================
  Files          46       46              
  Lines        2735     2741       +6     
==========================================
+ Hits         1695     1705      +10     
+ Misses       1040     1036       -4     
Impacted Files Coverage Δ
src/Simulation.jl 60.00% <ø> (ø)
src/Epidemiology/EpiGenerate.jl 83.33% <100.00%> (+3.33%) ⬆️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants