Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump to xDSL supporting inplace bufferization #59

Merged
merged 13 commits into from
Aug 30, 2024
Merged

Conversation

PapyChacal
Copy link
Collaborator

@PapyChacal PapyChacal commented Feb 25, 2024

Just bump to xDSL supporting inplace bufferization and support inplace

Stacked on:

@PapyChacal PapyChacal self-assigned this Feb 25, 2024
@georgebisbas georgebisbas changed the title Adapt lowering to hqndle inplace update equations Adapt lowering to handle inplace update equations Feb 26, 2024
@PapyChacal PapyChacal changed the base branch from emilien/snip-iet to emilien/coming-soon August 21, 2024 13:33
@PapyChacal PapyChacal changed the title Adapt lowering to handle inplace update equations Bump to xDSL supporting inplace bufferization Aug 21, 2024
Comment on lines 134 to 136
def test_inplace():
# Define a simple Devito Operator
grid = Grid(shape=(3, 3))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to check correct computations here then 🤔

@PapyChacal
Copy link
Collaborator Author

Self-note: There's an issue with an unlowered memef.alloc
Which might mean the bufferization is not acting as expected, as I wouldn't expect an allocation in the in place case

@PapyChacal
Copy link
Collaborator Author

Okay so, Devito here is doing that (turning on OpenMP for clarity):

for (int x = x_M; x >= x_m; x -= 1)
    {
      #pragma omp parallel num_threads(nthreads)
      {
        #pragma omp for schedule(static,1)
        for (int y = y_m; y <= y_M; y += 1)
        {
          u[t0][x + 1][y + 1] = r0*(-u[t0][x + 1][y + 1]) + r0*u[t0][x + 2][y + 1];
        }
      }
    }

So it sees there's a dependency only on y and expresses the computation to be sequential over x so that it can indeed do this one in place.
This is just not something the stencil dialect can express at this point 🥲
@georgebisbas There are multiple ways from here onwards, let's discuss it when you have time. But indeed I don't see how to manage this without an extra allocation in xDSL in the time I have on my side

@georgebisbas
Copy link
Collaborator

Okay so, Devito here is doing that (turning on OpenMP for clarity):

for (int x = x_M; x >= x_m; x -= 1)
    {
      #pragma omp parallel num_threads(nthreads)
      {
        #pragma omp for schedule(static,1)
        for (int y = y_m; y <= y_M; y += 1)
        {
          u[t0][x + 1][y + 1] = r0*(-u[t0][x + 1][y + 1]) + r0*u[t0][x + 2][y + 1];
        }
      }
    }

So it sees there's a dependency only on y and expresses the computation to be sequential over x so that it can indeed do this one in place. This is just not something the stencil dialect can express at this point 🥲 @georgebisbas There are multiple ways from here onwards, let's discuss it when you have time. But indeed I don't see how to manage this without an extra allocation in xDSL in the time I have on my side

Thanks for that @PapyChacal , does xdsl produce one level of parallelism + correct results?
Let's discuss this yes.


xdsl_op = Operator([eq0], opt="xdsl")
xdsl_op.apply(time_M=5, dt=0.1)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add tests remider

@pytest.mark.xfail(reason="Cannot store to a field that is loaded from")
def test_inplace():
def test_inplace_II():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably no meaning for this

@PapyChacal
Copy link
Collaborator Author

Okay so, Devito here is doing that (turning on OpenMP for clarity):

for (int x = x_M; x >= x_m; x -= 1)
    {
      #pragma omp parallel num_threads(nthreads)
      {
        #pragma omp for schedule(static,1)
        for (int y = y_m; y <= y_M; y += 1)
        {
          u[t0][x + 1][y + 1] = r0*(-u[t0][x + 1][y + 1]) + r0*u[t0][x + 2][y + 1];
        }
      }
    }

So it sees there's a dependency only on y and expresses the computation to be sequential over x so that it can indeed do this one in place. This is just not something the stencil dialect can express at this point 🥲 @georgebisbas There are multiple ways from here onwards, let's discuss it when you have time. But indeed I don't see how to manage this without an extra allocation in xDSL in the time I have on my side

Thanks for that @PapyChacal , does xdsl produce one level of parallelism + correct results? Let's discuss this yes.

For the record, this does not seem to happen in TTI

@PapyChacal PapyChacal changed the base branch from emilien/coming-soon to master August 30, 2024 14:17
@georgebisbas georgebisbas merged commit 9f76499 into master Aug 30, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants