This repository has been archived by the owner on Jul 12, 2023. It is now read-only.

Fortran backend for 1D EM forward modelling and Numpy broadcasting #22

Merged
merged 36 commits into master on Oct 24, 2018

Conversation

@sgkang (Contributor) commented Sep 26, 2018

Reviewing pull request: #19

From @leonfoks

I've updated the 1D forward modeller to use numpy broadcasting and Fortran for a nice speed-up (see below for timings).

For timings I used the default 19-layer model from the test example scripts.

In the setup.py file I've added the Fortran extension. On my Mac, numpy.distutils would not add the extra compile flags needed from the extra_f90_compile_args argument, so I've stuck them in the link args. This shouldn't be a problem.

Timing

Forward Modelling

EM1D - Survey
Two major bottlenecks

  1. projectFields → piecewise pulse → piecewise ramp → Scipy.fixed_quad
  2. forward → hz_kernel_circular_loop → rTEfunfwd/RTEfun_vec

Solutions

  1. piecewise_ramp had a double loop in Python, with arrays redundantly recalculated inside the loop via inline list comprehensions.
    By pre-allocating memory and letting numpy broadcasting push the loops down into C, the call time for projectFields() alone over 1000 forward models dropped from 20.6 s to 1.27 s. This could be made faster still by moving it to Fortran, but I would first need to write the 1D interpolator in Fortran. Not necessary just yet. (A minimal sketch of the broadcasting idea follows this list.)

  2. Because of the recursion relation, there's not much you can do other than use numpy broadcasting inside a Python loop, which is what RTEfun_vec already did.
    However, this can be sped up by dropping down to Fortran (or C); I chose the former.
    After I did this, the call time for forward() over 1000 forward models went from 24.6 s to about 10-11 s. I was not expecting a huge speed-up for this one function, given that the only slowdown was the Python loop itself. In the Fortran version I also used plain loops, so there is no need for the extra memory that broadcasting requires.
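
For illustration, a minimal sketch of the broadcasting idea behind the projectFields() speed-up. The step response, times, and pulse width below are invented for the example; the real implementation lives in the piecewise_*_fast functions:

import numpy as np
from scipy.integrate import fixed_quad

def step_func(t):
    # Stand-in step response, for illustration only.
    return np.exp(-t)

time = np.linspace(1e-4, 1e-2, 1000)  # observation times
t0 = 5e-3                             # pulse width (made up)
n = 20                                # quadrature order

# Slow: one fixed_quad call, with fresh node arrays, per time gate.
slow = np.array([fixed_quad(step_func, t, t + t0, n=n)[0] for t in time])

# Fast: fetch the Gauss-Legendre nodes/weights once, broadcast over all gates.
x, w = np.polynomial.legendre.leggauss(n)             # nodes/weights on [-1, 1]
pts = 0.5 * t0 * (x[None, :] + 1.0) + time[:, None]   # shape (n_time, n)
fast = 0.5 * t0 * (step_func(pts) * w).sum(axis=1)

assert np.allclose(slow, fast)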

A single forward model now takes about 16 ms instead of ~45 ms, which adds up to a lot of time in our parallel code once you factor in the number of iterations and the number of data.

Sensitivity, i.e. getJ_sigma()

  1. ProjectFields - same as above.
  2. forward - hz_kernel_circular_loop - rTEfunjac
    Quite a few issues here: redundant, repeated code, if statements inside loops, and multiple appends to pythonic lists.

Solutions

  1. Used the same functions as above; instead of piecewisePulse I wrote piecewise_pulse_fast(), which also keeps the Gauss-Legendre coefficients around (see the caching sketch after this list).
    Since there are two calls to piecewise_ramp, the new fast function reduced the time from 46.3 s to 1.76 s for 100 sensitivity calculations.
  2. Recoded the matmul operations where I could take advantage of symmetry and negation of the components to minimize flops.
    Used loops in Fortran to remove the memory overhead that broadcasting requires.
    100 calls to forward() went from 16.8 s to 6.66 s.
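
A hedged sketch of the coefficient-caching idea mentioned above: compute the Gauss-Legendre nodes and weights once instead of letting fixed_quad regenerate them on every call. The function name and cache below are illustrative, not the actual simpegEM1D code:

import numpy as np

_leggauss_cache = {}

def gauss_legendre(n):
    # Compute the order-n nodes/weights once; later calls are dictionary lookups.
    if n not in _leggauss_cache:
        _leggauss_cache[n] = np.polynomial.legendre.leggauss(n)
    return _leggauss_cache[n]

x, w = gauss_legendre(20)  # first call computes and stores
x, w = gauss_legendre(20)  # subsequent calls reuse the stored arrays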

Some general comments on speed

  1. Avoid inline loops and list building, e.g.
    np.array([fixed_quad(step_func, t, t+t0, n=n)[0] for t in time])
    This is extremely slow compared to operating on numpy arrays directly.

  2. Don't create an empty array assigned to a variable and then immediately reassign that variable, e.g.
    rTE = np.empty((n_frequency, n_layer), dtype=complex)
    rTE = rTEfunfwd(
        n_layer, f, lamda, sig, chi, depth, self.survey.half_switch
    )
    rTE gets rebound to the output of rTEfunfwd, so the np.empty call allocated space for no reason. The overhead is minor, but it's unnecessary. If you want to write the values into the memory from the np.empty call, use rTE[:, :] = function(...). (A short demonstration follows this list.)
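
A small self-contained demonstration of the rebinding-versus-copying difference (illustrative only):

import numpy as np

def func():
    return np.random.randn(4, 4)

y = np.empty((4, 4))
before = id(y)
y = func()               # rebinds the name; the empty array is discarded
print(id(y) == before)   # False: y now points at new memory

y = np.empty((4, 4))
before = id(y)
y[:, :] = func()         # copies values into the existing allocation
print(id(y) == before)   # True: same memory, values overwritten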

Leon Foks and others added 12 commits August 30, 2018 12:09
…ble. i.e. using x = np.empty() must be followed by x[:, :] in order to take advantage of the allocated memory.
…r 1 pulse. Fast diff uses broadcasting for 2 pulses. Piecewise Pulse fast keeps the gauss legendre coefficients so no need to keep asking for them.
Fixed a bug where j was not assigned in certain cases.
… glue to the Fortran codes. Memory is preallocated before calling the Fortran codes.
…o update the Fortran compile flags using the extra_f90_compile_args option. I had to add them to the link_args instead. Bit hacky, but it should work fine.
Fortran backend for 1D EM forward modelling and Numpy broadcasting
@sgkang (Contributor Author) commented Sep 26, 2018

Hi @leonfoks, I'll review your pull request here!

@leonfoks commented

Awesome, let me know if I need to do anything!

@sgkang (Contributor Author) commented Oct 2, 2018

Hi @leonfoks, am I supposed to run python setup.py install to compile m_rTE_Fortran.f90?
It doesn't seem to be working...

I used to use

from distutils.core import setup

but it seems you are using

from numpy.distutils.core import setup

Can you explain a bit more what I am supposed to do to make this work on my machine?

@leonfoks commented Oct 2, 2018

@sgkang I navigate to the simpegEM1D root folder and use "pip install ." to install the package.

Unfortunately the distutils.core setup cannot handle Fortran extensions. The numpy version calls f2py on the Fortran code and then compiles the shared library. f2py comes packaged with numpy, so there is no need to install anything extra.
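
For reference, the build pattern looks roughly like this; a sketch condensed from the setup.py changes in this PR, not the full file:

from numpy.distutils.core import setup, Extension

# numpy.distutils runs f2py on the .f90 source and compiles the shared library.
fExt = [Extension(name='simpegEM1D.m_rTE_Fortran',
                  sources=['simpegEM1D/Fortran/m_rTE_Fortran.f90'])]

setup(
    name='simpegEM1D',
    ext_modules=fExt,
    # ... remaining package metadata unchanged ...
)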

@sgkang (Contributor Author) commented Oct 9, 2018

@leonfoks No worries... it is working on both my Mac and Linux machines.
Not sure why it's failing on Travis, but I will figure it out.

@coveralls commented Oct 10, 2018

Pull Request Test Coverage Report for Build 182

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at ?%

Totals (Coverage Status)
  Change from base Build 125: 0.0%
  Covered Lines:
  Relevant Lines: 0

💛 - Coveralls

@sgkang (Contributor Author) left a review comment:

Hi @leonfoks, this looks great, and I actually learned quite a bit from your modifications!
If you can answer the couple of questions I left, that would be great.

Then we can merge this into the master branch.

@@ -29,24 +28,41 @@
with open('README.md') as f:
    LONG_DESCRIPTION = ''.join(f.readlines())

fExt = [Extension(name='simpegEM1D.m_rTE_Fortran',  # Name of the package to import
                  sources=['simpegEM1D/Fortran/m_rTE_Fortran.f90'],
                  # extra_f90_compile_args=['-ffree-line-length-none',
@sgkang (Contributor Author):
@leonfoks what are these commented lines?

@leonfoks:

Normally you put the Fortran compile flags in the extra_f90_compile_args attribute, but it was not being recognized for some reason, so I put them in the extra_link_args instead. You can delete these; I left them commented as a reminder that this was the case.

@sgkang (Contributor Author):

Sounds good! I'll remove them.

rTE = np.empty((n_frequency, n_layer), dtype=complex)
rTE = rTEfunfwd(
    n_layer, f, lamda, sig, chi, depth, self.survey.half_switch
rTE = np.empty(
@sgkang (Contributor Author):

@leonfoks Is it still okay to generate an empty array here, or do I need to change it to a zeros array as you did elsewhere?

@leonfoks:

This one is essential because the order of the array is 'F', and I want to ensure that the external Fortran function puts values in the correct memory locations.

@@ -233,6 +235,92 @@ def piecewise_ramp(step_func, t_off, t_currents, currents, n=20, eps=1e-10):
) * const
@sgkang (Contributor Author):

@leonfoks Did you get rid of all the loops here in piecewise_ramp_fast?

@leonfoks:

Yes, there are no loops in the _fast methods. Note that _diff_fast replaces the original pattern of calling piecewise_ramp twice. Rather than calling the same function twice and recomputing the same quantities, I unrolled the two calls, figured out the common work, and created this new function.

@@ -0,0 +1,503 @@
module rTE_Fortran
@sgkang (Contributor Author):

@leonfoks Great! This is the core function. I was thinking about using Cython, but you've already transformed it into Fortran!

@leonfoks:

I did the Fortran part because it was easier for me at the time... If you ever do rewrite it in Cython, that would be easier from a maintenance standpoint. I'd be interested in what the times look like too.

@sgkang (Contributor Author):

@leonfoks I see. It seems both Numba and Cython are potential options. We can explore this later!

resp, _ = ffht(
    u.flatten()*factor, self.time,
    self.frequency, self.ftarg
)
# Compute EM sensitivities
else:
    resp = np.zeros(
        (self.n_time, self.n_layer), dtype=float, order='F'
    )
    resp_i = np.empty(self.n_time, dtype=float)
@sgkang (Contributor Author):

I see. So it is unnecessary to form the empty array here.

@leonfoks:

It depends.
Let's say I have a function that returns an array:

def func():
    X = np.random.randn(10, 10)
    return X

If I call func, a new array X is created inside the function, and the variable y is bound to that memory after the function finishes:

y = func()

If I first create an empty array (or zeros) and then call the function, there are two ways I could do it:

y = np.zeros([10, 10])  # or np.empty([10, 10])
y = func()

This would create memory for y, enter the function, create NEW memory X, and then rebind the variable (or label) y to that newly created memory; the first np.zeros allocation is then garbage collected. So there was no point in creating it in the first place.

If instead I do

y = np.zeros([10, 10])
y[:, :] = func()

y is created, we enter the function, NEW memory X is created, but then those values are copied into the space of the existing variable y. The memory of X that was created inside func is garbage collected instead.

By default, numpy arrays are ordered with order='C' not order='F', and we want order='F' because we are using a Fortran backend. We could still use order='C' but when we pass those arrays through to Fortran there is a hidden copy of memory that occurs in order to map it to the correct ordering, something we don't want to happen every time we call the Fortran codes.

So, rather than me editing all the functions that return an array and adding order='F', I just edited the instantiation of that memory, and copy the results of functions into those placeholders. Since we re-use those arrays, doing this once at the beginning does not incur any real cost to the code. Hence why I keep np.empty only when we have order='F' in the memory instantiation.

Clear as mud?
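
A quick way to see the hidden ordering copy (a sketch; np.asfortranarray stands in here for the copy f2py makes when handed a C-ordered array):

import numpy as np

c = np.zeros((1000, 1000), order='C')
f = np.zeros((1000, 1000), order='F')

fc = np.asfortranarray(c)   # C-ordered input: a full copy is made
ff = np.asfortranarray(f)   # already Fortran-ordered: returned as-is

print(np.shares_memory(c, fc))  # False -> a copy happened
print(np.shares_memory(f, ff))  # True  -> no copy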

@leonfoks commented Oct 18, 2018:

Also, there is a minuscule difference between np.zeros and np.empty.
np.empty just allocates memory; if you were to print those values, they might be weird and random.
np.zeros essentially calls np.empty and then fills the memory with 0.

So if you know that the entire array will be filled with numbers, there is no need to use np.zeros, just use np.empty. If there is any chance that some array elements will not be filled, I would always use np.zeros. In general it is best practice to always initialize memory with a value. It makes bugs easier to track.
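
A tiny illustration of that difference:

import numpy as np

a = np.empty(5)   # uninitialized memory: may print arbitrary leftover values
b = np.zeros(5)   # same allocation, then explicitly filled with 0.0
print(a, b)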

@sgkang (Contributor Author):

I see, now it's crystal clear.

For the case below,

y = np.zeros([10, 10])
y[:, :] = func()

when copying the output from func() into y, are we using double the amount of memory?

@prisae (Member):

I don't think it uses more memory, but the pre-allocation costs you unnecessary time.
[screenshot: timing comparison of the two approaches]

@prisae (Member):

Here are the first two cells. I cannot attach .py or .ipynb files, apparently.

[screenshot: the first two notebook cells]

@prisae (Member):

Interested, though, whether @leonfoks agrees or not...

@prisae (Member):

(The pre-allocation is not what costs you lots of time; it is the later indexing, filling the new values into the existing array. If you just create it and then overwrite it you don't lose much, but it is still unnecessary. See http://nbviewer.jupyter.org/gist/prisae/af1b40ef4a2e16e130b39f6d7957be50 for an updated version.)
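
A sketch of the kind of comparison in those notebook cells (array size and iteration count assumed; run it to get numbers for your own machine):

import timeit

setup = "import numpy as np\ndef func():\n    return np.random.randn(500, 500)"

# Direct assignment: the name is simply bound to the returned array.
print(timeit.timeit("y = func()", setup=setup, number=1000))

# Pre-allocate then fill: the extra cost is the element-wise copy on assignment.
print(timeit.timeit("y = np.zeros((500, 500)); y[:, :] = func()",
                    setup=setup, number=1000))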

@sgkang (Contributor Author):

Got it. It's pretty clear to me now who the big kid on the block is!

)

resp_int_i = np.empty(self.time_int.size, dtype=float)
dtype=np.float64, order='F')

# TODO: remove for loop
@sgkang (Contributor Author):

Hi @prisae and @leonfoks, this is the loop that I can remove, related to #14.

@sgkang (Contributor Author) commented Oct 22, 2018

@leonfoks, I am thinking about merging this branch into master today; let me know if you have any comments or suggestions!

@sgkang (Contributor Author) commented Oct 24, 2018

Thanks a lot, particularly to @leonfoks: this improvement is huge!
Thank you @prisae for the reviews and constructive comments. I learned a lot from you guys through this pull request.

@sgkang sgkang closed this Oct 24, 2018
@sgkang sgkang reopened this Oct 24, 2018
@sgkang sgkang merged commit 4802b67 into master Oct 24, 2018