This repository has been archived by the owner on Jul 12, 2023. It is now read-only.

Fortran backend for 1D EM forward modelling and Numpy broadcasting #22

Merged
merged 36 commits into master on Oct 24, 2018

Conversation

@sgkang (Contributor) commented Sep 26, 2018

Reviewing pull request: #19

From @leonfoks

I've updated the 1D forward modeller to use numpy broadcasting and Fortran for a nice speed-up (see below for timings).

For timings I used the default 19-layer model from the test example scripts.

In the setup.py file I've added the Fortran extension. On my Mac, numpy.distutils would not add the extra compile flags needed from the extra_f90_compile_args argument, so I've stuck them in the link args. This shouldn't be a problem.

Timing

Forward Modelling

EM1D - Survey
Two major bottlenecks

  1. projectFields → piecewise pulse → piecewise ramp → Scipy.fixed_quad
  2. forward → hz_kernel_circular_loop → rTEfunfwd/RTEfun_vec

Solutions

  1. piecewise_ramp had a double loop in Python, with arrays redundantly recalculated inside the loop via inline list comprehensions.
    By pre-allocating memory and letting numpy broadcasting push the loops down into C, the call time for projectFields() alone over 1000 forward models dropped from 20.6 s to 1.27 s. This could be made faster still by moving it to Fortran, but I would first need to write the 1D interpolator in Fortran. Not necessary just yet. (A minimal sketch of the broadcasting idea follows this list.)

  2. Because of the recursion relation, there's not much you can do other than use numpy broadcasting inside a Python loop, which is what RTEfun_vec already did.
    However, this can be sped up by dropping down to Fortran (or C); I chose the former.
    After I did this, the call time for forward() over 1000 forward models went from 24.6 s to about 10-11 s. I was not expecting a huge speed-up for this one function, given that the only slowdown was the Python loop itself. In the Fortran version I also used plain loops, so there is no need for the extra memory that broadcasting requires.
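
For illustration, a minimal sketch of the broadcasting idea behind the projectFields() speed-up. The step response, times, and pulse width below are invented for the example; the real implementation lives in the piecewise_*_fast functions:

import numpy as np
from scipy.integrate import fixed_quad

def step_func(t):
    # Stand-in step response, for illustration only.
    return np.exp(-t)

time = np.linspace(1e-4, 1e-2, 1000)  # observation times
t0 = 5e-3                             # pulse width (made up)
n = 20                                # quadrature order

# Slow: one fixed_quad call, with fresh node arrays, per time gate.
slow = np.array([fixed_quad(step_func, t, t + t0, n=n)[0] for t in time])

# Fast: fetch the Gauss-Legendre nodes/weights once, broadcast over all gates.
x, w = np.polynomial.legendre.leggauss(n)             # nodes/weights on [-1, 1]
pts = 0.5 * t0 * (x[None, :] + 1.0) + time[:, None]   # shape (n_time, n)
fast = 0.5 * t0 * (step_func(pts) * w).sum(axis=1)

assert np.allclose(slow, fast)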

A single forward model now takes about 16 ms instead of ~45 ms, which adds up to a lot of time in our parallel code once you factor in the number of iterations and the number of data.

Sensitivity, i.e. getJ_sigma()

  1. ProjectFields - same as above.
  2. forward - hz_kernel_circular_loop - rTEfunjac
    Quite a few issues here: redundant, repeated code, if statements inside loops, and multiple appends to pythonic lists.

Solutions

  1. Used the same functions as above; instead of piecewisePulse I wrote piecewise_pulse_fast(), which also keeps the Gauss-Legendre coefficients around (see the caching sketch after this list).
    Since there are two calls to piecewise_ramp, the new fast function reduced the time from 46.3 s to 1.76 s for 100 sensitivity calculations.
  2. Recoded the matmul operations where I could take advantage of symmetry and negation of the components to minimize flops.
    Used loops in Fortran to remove the memory overhead that broadcasting requires.
    100 calls to forward() went from 16.8 s to 6.66 s.
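
A hedged sketch of the coefficient-caching idea mentioned above: compute the Gauss-Legendre nodes and weights once instead of letting fixed_quad regenerate them on every call. The function name and cache below are illustrative, not the actual simpegEM1D code:

import numpy as np

_leggauss_cache = {}

def gauss_legendre(n):
    # Compute the order-n nodes/weights once; later calls are dictionary lookups.
    if n not in _leggauss_cache:
        _leggauss_cache[n] = np.polynomial.legendre.leggauss(n)
    return _leggauss_cache[n]

x, w = gauss_legendre(20)  # first call computes and stores
x, w = gauss_legendre(20)  # subsequent calls reuse the stored arrays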

Some general comments on speed

  1. Avoid inline loops and list building, e.g.
    np.array([fixed_quad(step_func, t, t+t0, n=n)[0] for t in time])
    This is extremely slow compared to operating on numpy arrays directly.

  2. Don't create an empty array assigned to a variable and then immediately reassign that variable, e.g.
    rTE = np.empty((n_frequency, n_layer), dtype=complex)
    rTE = rTEfunfwd(
        n_layer, f, lamda, sig, chi, depth, self.survey.half_switch
    )
    rTE gets rebound to the output of rTEfunfwd, so the np.empty call allocated space for no reason. The overhead is minor, but it's unnecessary. If you want to write the values into the memory from the np.empty call, use rTE[:, :] = function(...). (A short demonstration follows this list.)
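
A small self-contained demonstration of the rebinding-versus-copying difference (illustrative only):

import numpy as np

def func():
    return np.random.randn(4, 4)

y = np.empty((4, 4))
before = id(y)
y = func()               # rebinds the name; the empty array is discarded
print(id(y) == before)   # False: y now points at new memory

y = np.empty((4, 4))
before = id(y)
y[:, :] = func()         # copies values into the existing allocation
print(id(y) == before)   # True: same memory, values overwritten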

Leon Foks and others added 12 commits August 30, 2018 12:09
…ble. i.e. using x = np.empty() must be followed by x[:, :] in order to take advantage of the allocated memory.
…r 1 pulse. Fast diff uses broadcasting for 2 pulses. Piecewise Pulse fast keeps the gauss legendre coefficients so no need to keep asking for them.
Fixed a bug where j was not assigned in certain cases.
… glue to the Fortran codes. Memory is preallocated before calling the Fortran codes.
…o update the Fortran compile flags using the extra_f90_compile_args option. I had to add them to the link_args instead. Bit hacky, but it should work fine.
Fortran backend for 1D EM forward modelling and Numpy broadcasting
@sgkang (Contributor Author) commented Sep 26, 2018

Hi @leonfoks, I'll review your pull request here!

@leonfoks commented

Awesome, let me know if I need to do anything!

@sgkang (Contributor Author) commented Oct 2, 2018

Hi @leonfoks, am I supposed to run python setup.py install to compile m_rTE_Fortran.f90?
It doesn't seem to be working...

I used to use

from distutils.core import setup

but it seems you are using

from numpy.distutils.core import setup

Can you explain a bit more what I am supposed to do to make this work on my machine?

@leonfoks commented Oct 2, 2018

@sgkang I navigate to the simpegEM1D root folder and use "pip install ." to install the package.

Unfortunately the distutils.core setup cannot handle Fortran extensions. The numpy version calls f2py on the Fortran code and then compiles the shared library. f2py comes packaged with numpy, so there is no need to install anything extra.
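
For reference, the build pattern looks roughly like this; a sketch condensed from the setup.py changes in this PR, not the full file:

from numpy.distutils.core import setup, Extension

# numpy.distutils runs f2py on the .f90 source and compiles the shared library.
fExt = [Extension(name='simpegEM1D.m_rTE_Fortran',
                  sources=['simpegEM1D/Fortran/m_rTE_Fortran.f90'])]

setup(
    name='simpegEM1D',
    ext_modules=fExt,
    # ... remaining package metadata unchanged ...
)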

@sgkang (Contributor Author) commented Oct 9, 2018

@leonfoks No worries... it is working on both my Mac and Linux machines.
Not sure why it's failing on Travis, but I will figure it out.

@coveralls commented Oct 10, 2018

Pull Request Test Coverage Report for Build 182

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at ?%

Totals (Coverage Status)
  Change from base Build 125: 0.0%
  Covered Lines:
  Relevant Lines: 0

💛 - Coveralls

@sgkang (Contributor Author) left a review comment:

Hi @leonfoks, this looks great, and I actually learned quite a bit from your modifications!
If you can answer the couple of questions I left, that would be great.

Then we can merge this into the master branch.

@@ -29,24 +28,41 @@
with open('README.md') as f:
    LONG_DESCRIPTION = ''.join(f.readlines())

fExt = [Extension(name='simpegEM1D.m_rTE_Fortran',  # Name of the package to import
                  sources=['simpegEM1D/Fortran/m_rTE_Fortran.f90'],
                  # extra_f90_compile_args=['-ffree-line-length-none',
@sgkang (Contributor Author):
@leonfoks what are these commented lines?

@leonfoks:

Normally you put the Fortran compile flags in the extra_f90_compile_args attribute, but it was not being recognized for some reason, so I put them in the extra_link_args instead. You can delete these; I left them commented as a reminder that this was the case.

@sgkang (Contributor Author):

Sounds good! I'll remove them.

rTE = np.empty((n_frequency, n_layer), dtype=complex)
rTE = rTEfunfwd(
    n_layer, f, lamda, sig, chi, depth, self.survey.half_switch
rTE = np.empty(
@sgkang (Contributor Author):

@leonfoks Is it still okay to generate an empty array here, or do I need to change it to a zeros array as you did elsewhere?

@leonfoks:

This one is essential because the order of the array is 'F', and I want to ensure that the external Fortran function puts values in the correct memory locations.

@@ -233,6 +235,92 @@ def piecewise_ramp(step_func, t_off, t_currents, currents, n=20, eps=1e-10):
) * const
@sgkang (Contributor Author):

@leonfoks Did you get rid of all the loops here in piecewise_ramp_fast?

@leonfoks:

Yes, there are no loops in the _fast methods. Note that _diff_fast replaces the original pattern of calling piecewise_ramp twice. Rather than calling the same function twice and recomputing the same quantities, I unrolled the two calls, figured out the common work, and created this new function.

@@ -0,0 +1,503 @@
module rTE_Fortran
@sgkang (Contributor Author):

@leonfoks Great! This is the core function. I was thinking about using Cython, but you've already transformed it into Fortran!

@leonfoks:

I did the Fortran part because it was easier for me at the time... If you ever do rewrite it in Cython, that would be easier from a maintenance standpoint. I'd be interested in what the times look like too.

@sgkang (Contributor Author):

@leonfoks I see. It seems both Numba and Cython are potential options. We can explore this later!

resp, _ = ffht(
    u.flatten()*factor, self.time,
    self.frequency, self.ftarg
)
# Compute EM sensitivities
else:
    resp = np.zeros(
        (self.n_time, self.n_layer), dtype=float, order='F'
    )
    resp_i = np.empty(self.n_time, dtype=float)
@sgkang (Contributor Author):

I see. So it is unnecessary to form the empty array here.

@leonfoks:

It depends.
Let's say I have a function that returns an array:

def func():
    X = np.random.randn(10, 10)
    return X

If I call func, a new array X is created inside the function, and the variable y is bound to that memory after the function finishes:

y = func()

If I first create an empty array (or zeros) and then call the function, there are two ways I could do it:

y = np.zeros([10, 10])  # or np.empty([10, 10])
y = func()

This would create memory for y, enter the function, create NEW memory X, and then rebind the variable (or label) y to that newly created memory; the first np.zeros allocation is then garbage collected. So there was no point in creating it in the first place.

If instead I do

y = np.zeros([10, 10])
y[:, :] = func()

y is created, we enter the function, NEW memory X is created, but then those values are copied into the space of the existing variable y. The memory of X that was created inside func is garbage collected instead.

By default, numpy arrays are ordered with order='C' not order='F', and we want order='F' because we are using a Fortran backend. We could still use order='C' but when we pass those arrays through to Fortran there is a hidden copy of memory that occurs in order to map it to the correct ordering, something we don't want to happen every time we call the Fortran codes.

So, rather than me editing all the functions that return an array and adding order='F', I just edited the instantiation of that memory, and copy the results of functions into those placeholders. Since we re-use those arrays, doing this once at the beginning does not incur any real cost to the code. Hence why I keep np.empty only when we have order='F' in the memory instantiation.

Clear as mud?
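
A quick way to see the hidden ordering copy (a sketch; np.asfortranarray stands in here for the copy f2py makes when handed a C-ordered array):

import numpy as np

c = np.zeros((1000, 1000), order='C')
f = np.zeros((1000, 1000), order='F')

fc = np.asfortranarray(c)   # C-ordered input: a full copy is made
ff = np.asfortranarray(f)   # already Fortran-ordered: returned as-is

print(np.shares_memory(c, fc))  # False -> a copy happened
print(np.shares_memory(f, ff))  # True  -> no copy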

@leonfoks commented Oct 18, 2018:

Also, there is a minuscule difference between np.zeros and np.empty.
np.empty just allocates memory; if you were to print those values, they might be weird and random.
np.zeros essentially calls np.empty and then fills the memory with 0.

So if you know that the entire array will be filled with numbers, there is no need to use np.zeros, just use np.empty. If there is any chance that some array elements will not be filled, I would always use np.zeros. In general it is best practice to always initialize memory with a value. It makes bugs easier to track.
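
A tiny illustration of that difference:

import numpy as np

a = np.empty(5)   # uninitialized memory: may print arbitrary leftover values
b = np.zeros(5)   # same allocation, then explicitly filled with 0.0
print(a, b)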

@sgkang (Contributor Author):

I see, now it's crystal clear.

For the case below,

y = np.zeros([10, 10])
y[:, :] = func()

when copying the output from func() into y, are we using double the amount of memory?

@prisae (Member):

I don't think it uses more memory, but the pre-allocation costs you unnecessary time.
[screenshot: timing comparison of the two approaches]

@prisae (Member):

Here are the first two cells. I cannot attach .py or .ipynb files, apparently.

[screenshot: the first two notebook cells]

@prisae (Member):

Interested, though, whether @leonfoks agrees or not...

@prisae (Member):

(The pre-allocation is not what costs you lots of time; it is the later indexing, filling the new values into the existing array. If you just create it and then overwrite it you don't lose much, but it is still unnecessary. See http://nbviewer.jupyter.org/gist/prisae/af1b40ef4a2e16e130b39f6d7957be50 for an updated version.)
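
A sketch of the kind of comparison in those notebook cells (array size and iteration count assumed; run it to get numbers for your own machine):

import timeit

setup = "import numpy as np\ndef func():\n    return np.random.randn(500, 500)"

# Direct assignment: the name is simply bound to the returned array.
print(timeit.timeit("y = func()", setup=setup, number=1000))

# Pre-allocate then fill: the extra cost is the element-wise copy on assignment.
print(timeit.timeit("y = np.zeros((500, 500)); y[:, :] = func()",
                    setup=setup, number=1000))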

@sgkang (Contributor Author):

Got it. It's pretty clear to me now who the big kid on the block is!

)

resp_int_i = np.empty(self.time_int.size, dtype=float)
dtype=np.float64, order='F')

# TODO: remove for loop
@sgkang (Contributor Author):

Hi @prisae and @leonfoks, this is the loop that I can remove, related to #14.

@sgkang (Contributor Author) commented Oct 22, 2018

@leonfoks, I am thinking about merging this branch into master today; let me know if you have any comments or suggestions!

@sgkang (Contributor Author) commented Oct 24, 2018

Thanks a lot, particularly to @leonfoks: this improvement is huge!
Thank you @prisae for the reviews and constructive comments. I learned a lot from you guys through this pull request.

@sgkang sgkang closed this Oct 24, 2018
@sgkang sgkang reopened this Oct 24, 2018
@sgkang sgkang merged commit 4802b67 into master Oct 24, 2018