Fortran backend for 1D EM forward modelling and Numpy broadcasting #22
Conversation
…ble. i.e. using x = np.empty() must be followed by x[:, :] in order to take advantage of the allocated memory.
…r 1 pulse. Fast diff uses broadcasting for 2 pulses. Piecewise Pulse fast keeps the Gauss-Legendre coefficients, so there's no need to keep asking for them.
Fixed a bug where j was not assigned in certain cases.
… glue to the Fortran codes. Memory is preallocated before calling the Fortran codes.
…o update the Fortran compile flags using the extra_f90_compile_args option. I had to add them to the link_args instead. Bit hacky, but it should work fine.
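The "keeps the Gauss-Legendre coefficients" change above can be sketched like this (a minimal illustration, not the PR's actual helper; the function name is hypothetical): cache the quadrature nodes and weights once per order instead of recomputing them on every forward model.

```python
import numpy as np
from functools import lru_cache

@lru_cache(maxsize=None)
def gauss_legendre(n):
    # Compute nodes and weights once per order and reuse them,
    # instead of asking for them on every call.
    return np.polynomial.legendre.leggauss(n)

x1, w1 = gauss_legendre(20)
x2, w2 = gauss_legendre(20)
# Second call hits the cache: the very same arrays come back.
```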
Hi @leonfoks, I'll review your pull request here!
Awesome, let me know if I need to do anything!
Hi @leonfoks, what am I supposed to do here? I used to use from distutils.core import setup, but it seems you are using from numpy.distutils.core import setup. Can you explain a bit more what I'm supposed to do to make this work on my machine?
@sgkang I navigate to the simpegEM1D root folder and use "pip install ." to install the package. Unfortunately the distutils.core setup cannot handle Fortran extensions. The numpy version calls f2py on the Fortran code and then compiles the shared library. f2py comes packaged with Numpy, so there's no need to install anything.
@leonfoks No worries... it is working on both my Mac and Linux machines.
Hi @leonfoks, this looks great, and actually I learned quite a bit from your modifications!
If you can answer a couple of questions that I left, that would be great.
Then we can merge this into the master branch.
@@ -29,24 +28,41 @@
with open('README.md') as f:
    LONG_DESCRIPTION = ''.join(f.readlines())

fExt = [Extension(name='simpegEM1D.m_rTE_Fortran',  # Name of the package to import
                  sources=['simpegEM1D/Fortran/m_rTE_Fortran.f90'],
                  # extra_f90_compile_args=['-ffree-line-length-none',
@leonfoks what are these commented lines?
Normally you put the Fortran compile flags in the extra_f90_compile_args attribute, but it was not being recognized for some reason, so I put them in extra_link_args instead. You can delete these; I left them commented as a reminder that this was the case.
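A sketch of the workaround being described (abridged from the diff above; the only flag shown in the commented-out line is '-ffree-line-length-none', so that is the one assumed here):

    # setup.py fragment: plain distutils cannot build Fortran extensions,
    # so numpy.distutils is used, and the compile flag is smuggled in
    # through the linker arguments because extra_f90_compile_args was
    # being ignored on macOS.
    from numpy.distutils.core import setup, Extension

    fExt = [Extension(name='simpegEM1D.m_rTE_Fortran',
                      sources=['simpegEM1D/Fortran/m_rTE_Fortran.f90'],
                      extra_link_args=['-ffree-line-length-none'])]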
Sounds good! I'll remove them.
rTE = np.empty((n_frequency, n_layer), dtype=complex)
rTE = rTEfunfwd(
    n_layer, f, lamda, sig, chi, depth, self.survey.half_switch
rTE = np.empty(
@leonfoks Is it still okay to generate an empty array here, or do I need to change this to a zeros array as you did elsewhere?
This one is essential because the order of the array is 'F' and I want to ensure that the external function puts values in the correct memory locations.
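A quick way to see the ordering difference being discussed (illustrative only; rTEfunfwd itself is the Fortran routine from this PR):

```python
import numpy as np

n_frequency, n_layer = 4, 3

# Allocated in Fortran order: f2py can fill it in place, no copy.
rTE = np.empty((n_frequency, n_layer), dtype=complex, order='F')
# rTE.flags['F_CONTIGUOUS'] is True

# The numpy default is C order; passing this to a Fortran routine
# triggers a hidden copy to reorder the memory.
c_arr = np.empty((n_frequency, n_layer), dtype=complex)
# c_arr.flags['F_CONTIGUOUS'] is False
```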
@@ -233,6 +235,92 @@ def piecewise_ramp(step_func, t_off, t_currents, currents, n=20, eps=1e-10):
    ) * const
@leonfoks Did you get rid of all the loops here in piecewise_ramp_fast?
Yes, there are no loops in the _fast methods. Note that _diff_fast replaces the original two calls to piecewise_ramp. Rather than calling the same function twice and recomputing the same quantities, I unrolled the two calls, factored out the common work, and created this new function.
@@ -0,0 +1,503 @@
module rTE_Fortran
@leonfoks Great! This is the core function, and I was thinking about using Cython, but you've already transformed this into Fortran!
I did the Fortran part because it was easier for me at this time... If you ever do rewrite it in Cython, it would be easier from a maintenance standpoint. I'd be interested in what the timings look like too.
@leonfoks I see. It seems both Numba and Cython are potential options. We can explore this later!
resp, _ = ffht(
    u.flatten()*factor, self.time,
    self.frequency, self.ftarg
)
# Compute EM sensitivities
else:
    resp = np.zeros(
        (self.n_time, self.n_layer), dtype=float, order='F'
    )
    resp_i = np.empty(self.n_time, dtype=float)
I see. So it is unnecessary to form an empty array.
It depends. Say I have a function that returns an array:

def func():
    X = np.random.randn(10, 10)
    return X

If I call func, a new array X is created inside the function, and after it returns, the variable y is bound to that memory:

y = func()

If I first create an empty array (or zeros) and then call the function, there are two ways I could do it. The first:

y = np.zeros([10, 10])  # or np.empty([10, 10])
y = func()

This creates memory for y, enters the function, creates NEW memory X, and then rebinds the variable (or label) y to that newly created memory; the first np.zeros allocation is garbage collected. So there was no point creating it in the first place.

If instead I do

y = np.zeros([10, 10])
y[:, :] = func()

y is created, we enter the function, NEW memory X is created, but then those values are copied into the space of the existing array y. The memory of X that was created inside func is garbage collected instead.

By default, numpy arrays are ordered with order='C', not order='F', and we want order='F' because we are using a Fortran backend. We could still use order='C', but when we pass those arrays through to Fortran there is a hidden memory copy to map them to the correct ordering, something we don't want happening every time we call the Fortran codes.

So, rather than editing all the functions that return an array and adding order='F', I just edited the instantiation of that memory and copy the results of the functions into those placeholders. Since we re-use those arrays, doing this once at the beginning does not incur any real cost. Hence why I keep np.empty only where we have order='F' in the memory instantiation.

Clear as mud?
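The two behaviours can be checked directly with np.shares_memory, using the func example above:

```python
import numpy as np

def func():
    return np.random.randn(10, 10)

y = np.zeros([10, 10])
buf = y                      # keep a handle on the original allocation
y = func()                   # rebinds y; the zeros array is abandoned
# np.shares_memory(y, buf) is now False

y = np.zeros([10, 10])
buf = y
y[:, :] = func()             # in-place copy into the existing buffer
# y is still the very same array object as buf
```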
Also there is a minuscule difference between np.zeros and np.empty.
np.empty just creates memory, if you were to print those values, they might be weird and random.
np.zeros essentially calls np.empty and then fills the memory with 0.
So if you know that the entire array will be filled with numbers, there is no need to use np.zeros, just use np.empty. If there is any chance that some array elements will not be filled, I would always use np.zeros. In general it is best practice to always initialize memory with a value. It makes bugs easier to track.
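A tiny demonstration of the difference (the values in an np.empty array are whatever bytes happened to be in that memory, so we only read it after filling it):

```python
import numpy as np

z = np.zeros(5)    # allocates AND fills with 0.0
e = np.empty(5)    # allocates only; contents are undefined until written

# Safe only because every element is filled before being read:
e[:] = np.arange(5)
```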
I see, now it's crystal clear.
For the case below,

y = np.zeros([10, 10])
y[:, :] = func()

when copying the output from func() to y, are we using double the amount of memory?
Interested though if @leonfoks agrees or not...
(The pre-allocation is not what costs you time; it is the later indexing, filling the new values into the existing array. If you just create the array and then overwrite it you don't lose much, but it is still unnecessary. See http://nbviewer.jupyter.org/gist/prisae/af1b40ef4a2e16e130b39f6d7957be50 for an updated version.)
Got it. It seems pretty clear to me now who's the big kid on the block!
simpegEM1D/Survey.py
Outdated
)

resp_int_i = np.empty(self.time_int.size, dtype=float)
    dtype=np.float64, order='F')

# TODO: remove for loop
@leonfoks, I am thinking about merging this branch into master today; let me know if you have any comments or suggestions!
Reviewing pull request: #19
From @leonfoks
I've updated the 1D forward modeller to use numpy broadcasting and Fortran for a nice speed up (see below for timings)
For timings I used the default 19 layer model from the test example scripts.
In the setup.py file I've added the Fortran extension. On my Mac, numpy.distutils would not add the extra compile flags from the extra_f90_compile_args argument, so I've put them in the link args. This shouldn't be a problem.
Timing
Forward Modelling
EM1D - Survey
Two major bottlenecks
Solutions
Had a double loop in Python in piecewise_ramp, with lots of redundantly recalculated arrays inside the loop from an inline Python list comprehension.
By first pre-allocating memory in C order and utilizing broadcasting, the call time for projectFields() alone over 1000 forward models was reduced from 20.6 s to 1.27 s. This could be made faster by dropping down to Fortran; I would just need to write the 1D interpolator in Fortran. Not necessary just yet.
Because of the recursion relation, there's not much you can do other than use numpy broadcasting inside a Python loop, which is what was already done in RTEfun_vec.
However, this can be sped up by dropping down to Fortran (or C); I chose the former.
After I did this, the call time for the forward() function over 1000 forward models went from 24.6 s to about 10-11 s. I was not expecting a great speed-up for this one function, given that the only slowdown was the Python loop. In the Fortran version I used loops, so there is no need for all the extra memory that broadcasting requires.
A single forward model is about 16 ms instead of ~45 ms, which is a lot of time for our parallel code once you factor in the number of iterations and the number of data.
Sensitivity i.e. getJ_sigma()
Quite a few issues here: redundant, repeated code, if statements inside loops, and multiple appends to Python lists.
Solutions
Since there are two calls to piecewise_ramp the new fast function reduced the time from 46.3s to 1.76s for 100 sensitivity calculations
Used loops in Fortran to remove the memory overhead that broadcasting requires
100 calls to forward() went from 16.8s to 6.66s
Some general comments on speed
Avoid inline loops and list building, e.g.

np.array([fixed_quad(step_func, t, t+t0, n=n)[0] for t in time])

This is extremely slow compared to vectorized numpy operations.
Don't create empty arrays assigned to a variable and then immediately reassign that variable. e.g.
rTE = np.empty((n_frequency, n_layer), dtype=complex)
rTE = rTEfunfwd(
n_layer, f, lamda, sig, chi, depth, self.survey.half_switch
)
rTE gets reassigned to the output of rTEfunfwd, so the np.empty statement just allocated space for no reason. The overhead is minor, but it's unnecessary. If you want to assign the values to the memory of the first empty call, use rTE[:, :] = function(...).
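A minimal demonstration of the correct pattern, with a stand-in for the real routine (rtefunfwd_stub is hypothetical; the actual rTEfunfwd lives in the Fortran module):

```python
import numpy as np

def rtefunfwd_stub(n_frequency, n_layer):
    # Hypothetical stand-in for the Fortran rTEfunfwd call.
    return np.ones((n_frequency, n_layer), dtype=complex)

rTE = np.empty((4, 3), dtype=complex, order='F')
buf = rTE
rTE[:, :] = rtefunfwd_stub(4, 3)   # fill the pre-allocated buffer in place
# rTE is still the same object as buf, and stays Fortran-ordered.
```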