
Detect Python/package version mismatches #231

Open
JoshKarpel opened this issue Aug 31, 2020 · 1 comment
Labels
enhancement New feature or request


@JoshKarpel
Contributor

@JoshKarpel - can you open a separate ticket to provide a Loud Warning when there's a version mismatch for Python? Seems like something we could detect easily.

Originally posted by @bbockelm in #229 (comment)

JoshKarpel added the enhancement (New feature or request) label Aug 31, 2020
@JoshKarpel
Contributor Author

The answer is yes, but it's not easy with the current way we do IO.

Right now, all IO from the job is handled with (cloud)pickles. This was easy to implement, and so far we haven't needed any richer communication, so it's been fine. But the longer HTMap is around, the more incompatibilities we're going to pick up between Python versions, particularly whenever the pickle protocol gets a bump (this is the problem the user hit in #229, trying to go from 3.7 to 3.8). It has also occasionally been a problem when cloudpickle or user packages aren't installed execute-side, e.g. #194.
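Because #229 came down to a pickle protocol bump, it's worth noting that even with the current pickle-only IO, a pickle written with protocol 2 or later starts with the PROTO opcode (`0x80`) followed by the protocol number, so the protocol can be read before attempting to unpickle. A minimal sketch using the stdlib `pickle` module (cloudpickle output is a standard pickle stream, loaded with `pickle.loads`); `pickle_protocol` and `loads_with_protocol_check` are hypothetical helpers, not part of HTMap:

```python
import pickle
import platform
from typing import Any, Optional

# Pickles written with protocol 2 or later begin with the PROTO opcode (0x80)
# followed by a single byte holding the protocol number.
PROTO_OPCODE = 0x80


def pickle_protocol(data: bytes) -> Optional[int]:
    """Return the protocol a pickle was written with, or None for protocols 0/1."""
    if len(data) >= 2 and data[0] == PROTO_OPCODE:
        return data[1]
    return None  # protocols 0 and 1 have no PROTO header


def loads_with_protocol_check(data: bytes) -> Any:
    """Unpickle `data`, but fail with a readable message if the protocol is too new.

    Hypothetical helper for illustration; not part of HTMap.
    """
    protocol = pickle_protocol(data)
    if protocol is not None and protocol > pickle.HIGHEST_PROTOCOL:
        raise RuntimeError(
            f"this pickle uses protocol {protocol}, but Python "
            f"{platform.python_version()} only supports up to protocol "
            f"{pickle.HIGHEST_PROTOCOL}; the submit and execute sides are "
            f"probably running different Python versions"
        )
    return pickle.loads(data)
```

This only catches the protocol bump. A missing module (as in #194) still surfaces as an error from `pickle.loads` itself, and nothing here tells us which Python version the other side was actually running, which is what the structured format below is for.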

The solution is to redo the input and output formats so that we can send arbitrary, structured data and metadata back and forth. I recommend JSON with (cloud)pickled objects for function inputs and outputs. Since JSON has no bytes type, the pickled bytes would have to be stored as text (e.g. base64-encoded strings). Python and package versions could easily be stored as plain text inside the JSON, along with whatever other metadata we want to add (we could do our own runtime tracking, for example). JSON is readable with the Python standard library and its format is not versioned, so we shouldn't hit any compatibility issues when loading it from mismatched versions.

For example, the input JSON might look like

{
  "args": "<base64-encoded pickled args tuple>",
  "kwargs": "<base64-encoded pickled kwargs dict>",
  "python_version": "3.8.0",
  "package_versions": {"numpy": "1.18.1", "scipy": "1.0.1"}
}
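Building that input document on the submit side could look roughly like this. It's a sketch under a few assumptions: stdlib `pickle` stands in for cloudpickle (which would be needed for lambdas and closures), the pickled bytes are base64-encoded because JSON can't hold raw bytes, and versions are looked up with `importlib.metadata` (Python 3.8+). `make_input_json` and the other helper names are hypothetical.

```python
import base64
import json
import pickle
import platform
from importlib import metadata
from typing import Any, Dict, Iterable, Mapping, Optional


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed versions; None marks a package that isn't installed here."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def encode_pickle(obj: Any) -> str:
    """Pickle an object and base64-encode it so it can live in a JSON string."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode_pickle(text: str) -> Any:
    """Inverse of encode_pickle."""
    return pickle.loads(base64.b64decode(text))


def make_input_json(args: Iterable[Any], kwargs: Mapping[str, Any], packages: Iterable[str]) -> str:
    """Hypothetical builder for the input document sketched in the issue."""
    payload = {
        "args": encode_pickle(tuple(args)),
        "kwargs": encode_pickle(dict(kwargs)),
        "python_version": platform.python_version(),  # e.g. "3.8.0"
        "package_versions": package_versions(packages),
    }
    return json.dumps(payload)
```

The metadata fields stay plain JSON strings, so a mismatched interpreter can always read them even when it can't unpickle `args` or `kwargs`.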

And the output JSON might look like

{
  "output": "<base64-encoded pickled return value>",
  "python_version": "3.7.6",
  "package_versions": {"numpy": "1.18.1", "scipy": "1.0.1"}
}

A version mismatch warning could then be generated when loading output by comparing the local copy of the input file and the output file we got back from the job.
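The comparison itself only needs the plain-text metadata, never the pickles. A minimal sketch, assuming both documents use the `python_version` and `package_versions` fields shown in the issue; `find_version_mismatches` and `warn_on_mismatch` are hypothetical names:

```python
import json
import warnings
from typing import Any, Dict, List


def find_version_mismatches(input_doc: Dict[str, Any], output_doc: Dict[str, Any]) -> List[str]:
    """List human-readable differences between submit-side and execute-side versions."""
    problems: List[str] = []
    local_py = input_doc.get("python_version")
    remote_py = output_doc.get("python_version")
    if local_py != remote_py:
        problems.append(f"Python: submit side {local_py}, execute side {remote_py}")

    local_pkgs = input_doc.get("package_versions", {})
    remote_pkgs = output_doc.get("package_versions", {})
    # Check every package mentioned on either side; a missing entry shows up as None.
    for name in sorted(set(local_pkgs) | set(remote_pkgs)):
        local, remote = local_pkgs.get(name), remote_pkgs.get(name)
        if local != remote:
            problems.append(f"{name}: submit side {local}, execute side {remote}")
    return problems


def warn_on_mismatch(input_json: str, output_json: str) -> List[str]:
    """Emit one loud RuntimeWarning summarizing all mismatches, if there are any."""
    problems = find_version_mismatches(json.loads(input_json), json.loads(output_json))
    if problems:
        warnings.warn(
            "Version mismatch between submit and execute side:\n  " + "\n  ".join(problems),
            RuntimeWarning,
        )
    return problems
```

With the two example documents above, this reports only `Python: submit side 3.8.0, execute side 3.7.6`, since numpy and scipy match. Comparing full version strings is the strictest choice; comparing only major.minor would ignore patch releases, which normally don't change the pickle protocol.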

I didn't include the function in the input JSON above, since storing the pickled function on disk only once saves a small amount of disk space. The tradeoff is that we have to transfer two files to the job, one of them very small. Consider packing the function into the input JSON as well.
