Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PythonMutator: propagate source locations #1783

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

kanterov
Copy link
Contributor

@kanterov kanterov commented Sep 20, 2024

Changes

Add a mechanism to load Python source locations in the Python mutator. Previously, locations pointed to generated YAML. Now, they point to Python sources instead. Python process outputs "locations.json" containing locations of bundle paths, examples:

{"path": "resources.jobs.job_0", "file": "src/examples/job_0.py", "line": 3, "column": 5}
{"path": "resources.jobs.job_0.tasks[0].task_key", "file": "src/examples/job_0.py", "line": 10, "column": 5}
{"path": "resources.jobs.job_1", "file": "src/examples/job_1.py", "line": 5, "column": 7}

Such locations form a tree, and we assign locations of the closest ancestor to each dyn.Value based on its path. For example, resources.jobs.job_0.tasks[0].task_key is located at job_0.py:10:5 and resources.jobs.job_0.tasks[0].email_notifications is located at job_0.py:3:5, because we use the location of the job as the most precise approximation.

This feature is not yet enabled by default because PyDABs don't yet ignore unknown arguments, and this will be only supported with 0.6.0.

experimental:
  pydabs:
    enabled: true
    import: [...]
    load_locations: true

Note: for now, we don't update locations with relative paths, because it has a side effect in changing how these paths are resolved

Example

% databricks bundle validate

Warning: job_cluster_key abc is not defined
  at resources.jobs.examples.tasks[0].job_cluster_key
  in src/examples/example.py:10:1

Tests

Unit tests and manually

Stacked on top of #1782


// LoadLocations is a flag to enable loading Python source locations from the PyDABs.
//
// Locations are only supported since PyDABs 0.6.0, and because of that,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it just try to load locations and fail gracefully?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will enable it by default when we move out of the experimental section. It will be extra difficult to handle the case when the subprocess fails with an unknown argument error and not another error because we will need to parse stderr for that. Not sure it's worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding flags in the future?

It's not great if we have to break out a flag every time. Not a problem when using env vars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newer versions will ignore and warn about unknown flags. We should have done it from the beginning.

return nil, fmt.Errorf("failed to open locations file: %w", err)
}

defer locationsFile.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the file going to be closed before parse function succeeds?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just read the whole file and pass the bytes to parsePythonLocations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure of the exact order. I think it works because defer is executed only on exit. I will rewrite the code in a way which doesn't require such knowledge :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All defer statements are executed upon exiting the frame after the return value is computed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

The line protocol in the locations file looks fine.


// LoadLocations is a flag to enable loading Python source locations from the PyDABs.
//
// Locations are only supported since PyDABs 0.6.0, and because of that,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding flags in the future?

It's not great if we have to break out a flag every time. Not a problem when using env vars.

exists bool
}

type pythonLocationEntry struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is used only for serialization/protocol, but good to have a comment explaining this (or move it to a separate file, together with the loader). If this context is missing it's weird to see the definition of pythonLocations above and see it not use the pythonLocationEntry.

bundle/config/mutator/python/python_locations.go Outdated Show resolved Hide resolved
bundle/config/mutator/python/python_locations.go Outdated Show resolved Hide resolved
bundle/config/mutator/python/python_locations.go Outdated Show resolved Hide resolved

currentNode.location = location
currentNode.exists = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this can be a member function of the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into changing but it doesn't make it obviously better. putPythonLocation/findPythonLocation is not intended to be used outside of merging code, so if we could limit visibility, we would only export mergePythonLocations and parsePythonLocations

func loadLocationsFile(locationsPath string) (*pythonLocations, error) {
if _, err := os.Stat(locationsPath); os.IsNotExist(err) {
return newPythonLocations(), nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check for this error as part of os.Open. Saves a syscall.

Also make sure to use errors.Is(err, fs.ErrNotExist) -- os.IsNotExists is marked deprecated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants