
Finish review on initial implementation #15

Closed
schneems opened this issue Mar 7, 2023 · 3 comments

schneems commented Mar 7, 2023

I wasn't actually done with my review on #3. I threw out my back and wasn't able to get to it on Friday. Here's the rest of it:

I would like to see the command logged to the end user. Above it's hard coded to "Running pip install". I would like to see the command be exactly what we run for the user so if they copy and paste (and of course modify any paths) they would get the exact same result.

Since it also looks like the results rely on the PYTHONUSERBASE env var, I would like to see that in the output as well.

In addition to local debugging, it can also help buildpack maintainers spot accidental discrepancies.

Same as above. We're streaming this command, but not announcing the (exact) command that's being streamed. If we don't want to announce this specific command (since it seems it's a helper command rather than something a user might expect to be run against their code) then perhaps we move to only emitting the command string in the error message.

I want the exact command run to be in the logs or the error (or both).
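For illustration only, here is a minimal sketch (not the buildpack's actual helper; the program, arguments, env var value, and helper name are made up) of how the exact command could be rendered into the log before it is run:

```rust
use std::process::Command;

// Hypothetical helper (illustrative only): render a command roughly as a user
// could copy and paste it, so the build log shows exactly what was run.
fn describe_command(command: &Command) -> String {
    let args: Vec<String> = command
        .get_args()
        .map(|arg| arg.to_string_lossy().into_owned())
        .collect();
    format!("{} {}", command.get_program().to_string_lossy(), args.join(" "))
}

fn main() {
    // The arguments and PYTHONUSERBASE value here are made up for the sketch.
    let mut pip_install = Command::new("pip");
    pip_install.args(["install", "--user", "--requirement", "requirements.txt"]);
    pip_install.env("PYTHONUSERBASE", "/layers/python/pip");

    println!(
        "Running: PYTHONUSERBASE=/layers/python/pip {}",
        describe_command(&pip_install)
    );
    // The command could then be executed (and streamed) as usual.
}
```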

Function name is log_io_error. Function body says "unexpected error" which might not always be true. Or rather if we're saying "unexpected error" I'm reading that as synonymous with "not my fault" if I'm a customer reading it.

I imagine someone copying and pasting that function without looking too closely, thinking it's for handling all IO errors. For example, a new file format is added and reading it generates an std::io::Error due to a permissions problem or a bad symlink the customer checked in. In that case it wouldn't be so unexpected. Rename for clarity? unexpected_io_error?

Mentioned in Colin's PR. I would like to avoid early returns when there's only two branches. We can if/else this and eliminate the early return.
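Purely to illustrate the style point (this is not the actual determine_python_version code, and the default value is made up), the two shapes being compared look roughly like this:

```rust
// Early-return shape:
fn version_with_early_return(requested: Option<&str>) -> String {
    if let Some(version) = requested {
        return version.to_string();
    }
    String::from("3.11") // hypothetical default
}

// Equivalent if/else shape with a single exit point:
fn version_with_if_else(requested: Option<&str>) -> String {
    if let Some(version) = requested {
        version.to_string()
    } else {
        String::from("3.11") // hypothetical default
    }
}
```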

I think we should log_warning instead of log_info.

In a bit of a surprise even to me, I'm going to advocate for less testing. I think testing build logic inside of main.rs should be enough here. One fewer Docker boot at CI test time.

Also we're effectively testing the output of pack here which is subject to change. If you do want to keep this test, I would scope it to only the strings you control.

Same as above. This doesn't need to be a Docker test.

Regarding the error message (it's very good), when we say the user is missing files: It would be nice for us to give them an ls of what files we see in that directory. So at a glance they could see in one window we're looking for "requirements.txt" but they have "REQUIREMENTS.txt" (or something).
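A rough sketch of the kind of directory listing being suggested, assuming the error path has access to the app directory (the helper name and the message wording are hypothetical, not the buildpack's actual output):

```rust
use std::fs;
use std::path::Path;

// Hypothetical helper: list the top-level entries of the app directory so an
// error message can show what was actually found next to the missing file.
fn list_app_dir(app_dir: &Path) -> String {
    match fs::read_dir(app_dir) {
        Ok(entries) => entries
            .filter_map(Result::ok)
            .map(|entry| entry.file_name().to_string_lossy().into_owned())
            .collect::<Vec<_>>()
            .join("\n"),
        Err(error) => format!("(could not read {}: {error})", app_dir.display()),
    }
}

fn main() {
    // Example error output; the wording is made up for the sketch.
    eprintln!(
        "No 'requirements.txt' found. Files present in the app directory:\n{}",
        list_app_dir(Path::new("."))
    );
}
```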

Comment: You could manually combine the streams yourself: let output = format!("{}{}", context.pack_stdout, context.pack_stderr). I'm assuming the goal is to get all of the pack output on test failure.

Also I think you could get rid of this test. Essentially it's testing that a bad requirements.txt file triggers a non-zero exit from pip install. The git test above it seems more useful as an integration test.

Testing caching 👍. We can make these less brittle by asserting only the individual lines (or even just parts of lines). I think asserting for "Using cached Python" and "Using cached pip" without the version numbers would be enough to convince me. Maybe a "Using cached typing_extensions" for good measure. All the other values and numbers will cause churn on this file and possibly failures on otherwise unrelated changes (if libherokubuildpack updates its logging style, for example).
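For example, looser assertions along these lines (plain string checks; the real test harness calls may differ) would survive version-number churn:

```rust
// Illustrative only: assert on output fragments the buildpack controls,
// rather than on exact lines containing version numbers.
fn assert_cache_was_reused(build_output: &str) {
    assert!(build_output.contains("Using cached Python"));
    assert!(build_output.contains("Using cached pip"));
    assert!(build_output.contains("Using cached typing_extensions"));
}
```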

That comment applies to all the integration tests. They're really nice and easy to review in this format, but I don't want to have to update 10 files every time I add an Oxford comma (for example).

If we know that one change invalidates things, I would bet multiple changes would as well. I think this is covered in your unit tests ✂️

Unit test should be okay.

Comment: This is a good idea

Unit test should be fine.

Time to test

Not that it's a race, but time to run CI only ever goes up, and integration tests are historically one of the last things that developers are willing to delete. Right now Python is ~6min for integration tests while Ruby is ~3min.

Ideally I would like to aim for <5min for CI completion with a maximum of around 10 min. Once you hit 10 min and a random glitch causes the tests to need to be re-run then you're pushing nearly half an hour for a single change and it absolutely kills (my) productivity.

I think we should be aggressive in testing and safety. I also think we should consider pruning some of these tests now, as this is otherwise the fastest this CI suite will ever run.


edmorley commented Mar 7, 2023

I would like to see the command logged to the end user. Above it's hard coded to "Running pip install". I would like to see the command be exactly what we run for the user so if they copy and paste (and of course modify any paths) they would get the exact same result.
Since it also looks like the results rely on the PYTHONUSERBASE env var, I would like to see that in the output as well.
In addition to local debugging, it can also help buildpack maintainers spot accidental discrepancies.

So I agree that transparency about what is being run is definitely something to consider, but the problem is that many of the pip options used just won't work locally (and are not actually needed locally; the options fall into quite different categories from those you use with bundler).

And for several options, we make it so that people don't even need them if running from a run image either. (You mention PYTHONUSERBASE, but that's set in the launch image too, so people don't need to set it themselves.)

Lastly, I think in many of the potential failure modes, what needs to be clearest to the end user is the action they need to take, and in some cases their action is "try again, since this isn't your fault" - and in those cases including the whole command run actually makes the error message more verbose and potentially dilutes the intended message.

As such, I really think this has to be decided on a case by case basis.

I'll continue to think about this, however.

The function streams output to the user but is named run_command. Maybe stream_command?

None of my subprocess commands have their output captured (intentionally). The helper does more than just stream; it checks the exit code, etc. So I think the generic name of "run" seems to capture all the things it does, whereas "stream" might sound like it only streamed? (If I had two helpers, and one streamed and the other didn't, then yeah.)

However, you mentioning this made me realise the rustdocs for run_command don't mention it streams, so I'll update the rustdocs for it now.
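For illustration, the kind of rustdoc note being discussed might look like this; the signature and body below are a guess for sketch purposes, not the buildpack's actual helper:

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Runs the command, streaming its stdout/stderr to the build log as it runs
/// (rather than capturing it), and returns the exit status.
///
/// (Hypothetical signature shown only to illustrate mentioning the streaming
/// behaviour in the rustdocs; this is not the actual run_command helper.)
fn run_command(command: &mut Command) -> io::Result<ExitStatus> {
    // By default stdout/stderr are inherited, so the output streams directly.
    command.status()
}
```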

Function name is log_io_error. Function body says "unexpected error" which might not always be true. Or rather if we're saying "unexpected error" I'm reading that as synonymous with "not my fault" if I'm a customer reading it.

This is similar to the discussion we had on the PR. At the moment it seems that lifecycle shields us from most/all permissions based IO errors, meaning that if any occur then they really are unexpected, since they are an issue with the platform etc.

I would like to dig more into this, which is why there is a ~TODO here:

// by issues in the application. In the future, perhaps we should try and split these out?

I've also now filed GUS-W-12650236 to track.

Mentioned in Colin's PR. I would like to avoid early returns when there's only two branches. We can if/else this and eliminate the early return.

Unfortunately, the GitHub diff links aren't jumping to the correct place in the PR (it seems every URL is identical?), so I'm having to guess which lines were being referenced. (I believe it's possible to leave a comments-only review on a closed PR, so next time could you leave it on the PR rather than in an issue?)

If this was in reference to the early return in determine_python_version, then that's intentional since there are going to be more package managers added in the future.

Also, I'm not convinced this type of suggestion belongs in a PR review, since which to use here comes down to personal style or a decision made on a case-by-case basis (and is not a unilateral best practice), and really doesn't affect readability at all. If this were a style enforced by Clippy or rustfmt, then I'd be more than happy to go with it; however, as-is it feels like someone else's personal style guide is being enforced on other repos without us having agreed on it as a team.

I think we should log_warning instead of log_info.

Is this about the log_info in detect? (I'm having to guess due to the broken links)

If so, good callout - although, looking now I think it might still be best as info, since:
- the detect output is shown when using --verbose when building other languages, so it could be seen when there's not technically an issue, and so perhaps shouldn't be on stderr
- using log_warning has a mandatory heading that I don't think makes sense here

Also we're effectively testing the output of pack here which is subject to change. If you do want to keep this test, I would scope it to only the strings you control.

I'm guessing this is about the detect test?
https://github.com/heroku/buildpacks-python/blob/main/tests/integration/detect.rs

If so, I agree the test assertion contains parts of lifecycle (not pack) output, which is subject to change. However, this was an intentional decision/tradeoff so I could test the UX of the message as a whole.

when we say the user is missing files: It would be nice for us to give them an ls of what files we see in that directory. So at a glance they could see in one window we're looking for "requirements.txt" but they have "REQUIREMENTS.txt" (or something).

I agree - I've filed GUS-W-12650345.

Comment: You could manually combine the streams yourself: let output = format!("{}{}", context.pack_stdout, context.pack_stderr). I'm assuming the goal is to get all of the pack output on test failure.

This wouldn't correctly interleave the stdout and stderr though. (i.e. the line ordering would be different unless all the errors happened to be output after the stdout.)
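A minimal sketch of the concatenation being discussed, with the interleaving caveat spelled out in a comment (the struct here only stands in for the real test context fields mentioned above):

```rust
// Stand-in for the test context fields referenced in the comment above.
struct TestContext {
    pack_stdout: String,
    pack_stderr: String,
}

// Note: this appends all of stderr after all of stdout, so the original
// interleaving of the two streams during the build is lost.
fn combined_output(context: &TestContext) -> String {
    format!("{}{}", context.pack_stdout, context.pack_stderr)
}
```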

Also I think you could get rid of this test. Essentially it's testing that a bad requirements.txt file triggers a non-zero exit from pip install. The git test above it seems more useful as an integration test.

The git test above is testing the success path. The bad requirements test is testing the non-zero exit status path. Removing the latter would mean the non-zero exit status (and resultant error message) was not tested anywhere in the buildpack.

Testing caching 👍. We can make these less brittle by asserting only the individual lines (or even just parts of lines).

This is an intentional decision to ensure the UX of the full output can be seen/reviewed/is tested. Yes, the tests will need updating more often, but that is a price I'll happily pay. I'm not going to enforce that style on other languages' repos if they don't want it, but it's something I want to do for Python.

This is also something that came up in one of our team meetings in the last 2-3 weeks - maybe you were away then? I gave several classic buildpack examples then of why this kind of thing can be beneficial.

if libherokubuildpack updates its logging style for example

This is exactly the type of change I want to catch! If the logging style changes (for example the "Error:" prefix in the error headings changes), we should check our messages make sense with the new style/helper wordings etc.

If we know that one change invalidates things, I would bet multiple changes would as well. I think this is covered in your unit tests ✂️

I'm aware there's overlap. The point of the multiple-cases test is to test the UX of the more complicated error message, since it's multi-line and concatenated in a different way.

Unit test should be okay.

I have no idea what test these are referring to, given all the links that were pasted are the same URL.

In general though, I put a lot of thought (multiple days) into exactly which integration tests are run, to get as much coverage as possible with as little overlap as possible. I've definitely erred slightly more on the side of coverage than no-overlap; however, this is not by accident.

Time to test

I agree short CI end to end times are important, which is why I've put a lot of time into making them quick for both the classic buildpack and CNBs.

This is why my classic buildpack integration tests are currently the fastest of any of the classic language buildpacks (heroku-buildpack-python's CI is 3x faster than heroku-buildpack-ruby for example).

Right now Python is ~6min for integration tests while Ruby is ~3min.

I think you might not be taking into account cold/warm cache, and also GitHub Actions queueing time.

For example, this Ruby CNB run is not the ~3 mins you mention, but instead ~6.5 mins since cold cache:
https://github.com/heroku/buildpacks-ruby/actions/runs/4303282807/jobs/7502823991

And this Python CNB run is not the ~6 mins you mention, but instead ~3 mins since warm cache:
https://github.com/heroku/buildpacks-python/actions/runs/4343644369/jobs/7585944610

So picking random runs and comparing them isn't going to work - it has to be like for like. (Note: Caches are discarded whenever the Rust version changes, and also after 7 days of inactivity.)

If we want to be fair, we should (a) pick the fastest of several runs with warm cache, (b) look at the individual job time (not the workflow time, since that includes queuing time spent waiting for a worker to be available). Doing that I found:

So the Python CNB integration tests are 3 seconds slower (2%) than the Ruby CNB's. However, the Python CNB runs 20 integration tests in that time, rather than the 3 for Ruby - so Python has much better coverage for basically the same CI duration.

I see that Ruby doesn't test each major Ruby version (how do you know your S3 archives work, or that bundler works with each major you support?), along with several other features that aren't tested at all in integration tests (eg cache invalidation).

As such, I'd say the Python integration tests are doing more than ok, and maybe it's the Ruby CNB integration tests that need some work for increasing coverage :-P


edmorley commented Mar 7, 2023

the rustdocs for run_command don't mention it streams, so I'll update the rustdocs for it now.

Updated in #16.


schneems commented Mar 7, 2023

As such, I really think this has to be decided on a case by case basis. I'll continue to think about this, however.

Thanks for the consideration. I think this could be an area where a "debug logging" or "give me the firehose of information" might be useful, though it's not supported by the CNB spec. We could informally add something like that to our buildpacks by convention in the future.

One of the debugging techniques I use if someone says "this used to work, but now it fails and nothing changed" is to "heroku run bash" into their machine (after getting permission) and re-run the exact command from the last deploy.

it feels like someone else's personal style guide is being enforced on other repos without us having agreed on it as a team.

While you're the primary maintainer of the Python buildpack, it's code the team maintains as well. Not commenting feels to me like having to accept someone else's style.

We don't really have these conversations anywhere else, and I honestly don't want to have a meeting for it (at least not yet), but I would like a place to have them. I proposed merging this in heroku/libcnb.rs#533; we could add a chapter on styles there and have discussions on PRs to that chapter.

However, you mentioning this made me realise the rustdocs for run_command don't mention it streams, so I'll update the rustdocs for it now.

Thanks!

Unfortunately, the GitHub diff links aren't jumping to the correct place in the PR (it seems every URL is identical?), so I'm having to guess which lines were being referenced. (I believe it's possible to leave a comments-only review on a closed PR, so next time could you leave it on the PR rather than in an issue?)

So sorry. Though to be quite honest you're very good at divining my intentions. I now see the UI you're talking about, which I couldn't find last night. Sorry about that.

Is this about the log_info in detect? (I'm having to guess due to the broken links)

It's in determine_python_version which is called in build.

If so, I agree the test assertion contains parts of lifecycle (not pack) output, which is subject to change. However, this was an intentional decision/tradeoff so I could test the UX of the message as a whole.

I understand the desire. In syntax_suggest I have a flag I can run on my tests to output the formatting. Another one will emit timing and profiling: https://github.com/ruby/syntax_suggest/#development. I'm wondering if there's another way to get this information/feedback without churn/conflicts in testing. GitHub Actions gives us the ability to generate artifacts: https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts. I don't know what options we have for adding them to the PR view, but that could be a good option/compromise.

In my ideal world, I would get a warning to investigate differences in output format that can be dismissed without needing a code change. The more rigid the tests, the more toil it is to make minor changes. That being said, you're probably the most toil-resistant developer I've met.

I see that Ruby doesn't test each major Ruby version (how do you know your S3 archives work, or that bundler works with each major you support?), along with several other features that aren't tested at all in integration tests (eg cache invalidation).

I don't believe the buildpacks should be testing binaries; they didn't make them. I run a Hatchet test once when they're created: https://github.com/heroku/docker-heroku-ruby-builder/blob/efecb7ebdd850dab93bcdb44674ee6c40e5eae28/Rakefile#L126-L153. Then I trust they will continue to work. My tooling prevents me from overwriting an existing binary.

So the Python CNB integration tests are 3 seconds slower (2%) than the Ruby CNB's. However, the Python CNB runs 20 integration tests in that time, rather than the 3 for Ruby - so Python has much better coverage for basically the same CI duration.

Happy to be wrong here. Thanks for investigating/clarifying. I pulled the last CI runs out of convenience, not malice :)

As such, I'd say the Python integration tests are doing more than ok, and maybe it's the Ruby CNB integration tests that need some work for increasing coverage :-P

I'm not sure what common cases should be tested. Happy to hear opinions. In general I'm most wary of caching and the provide/require interface; that's what I've focused on. I've got some GUS backlog work items (if you can believe it) for some tests in classic that aren't represented in the CNB, but they're all edge cases that I'm not sure need to be tested, given the guarantees both Rust and the CNB spec give us.

schneems closed this as completed Mar 7, 2023