-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cloud init schema failures #1954
Cloud init schema failures #1954
Conversation
a7f4f26
to
62d34b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment for now, more review tomorrow.
62d34b5
to
077d49b
Compare
077d49b
to
8af0d5d
Compare
@blackboxsw could you please take a look at this as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chris, I'm really glad to see this work done. I think it's going to help reduce user confusion. I have some items to follow-up on.
8af0d5d
to
3c4bb85
Compare
Thanks for the review @dbungert. I've gone ahead and implemented your suggestions. I added an extra test case to capture the scenario in which autoinstall keys are present in the cloud-config and the Also, by removing the dependence on the recoverable errors I was able to test this successfully on a 20.04.6 image which has cloud-init version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! A few minor comments but nothing blocking
3c4bb85
to
4c47763
Compare
Thanks @ogayot I've implemented all of your suggestions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One tweak please then LGTM.
4c47763
to
96e7720
Compare
Thanks! Since I changed CloudInitSchemaValidationError to a NonReportableException, I also added some lines to the client code so the error overlay works for those error types. Could you give that a look over too @dbungert ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping on this for our awareness.
While I think consuming the CLI and reading stderr is potentially fragile, I get that subiquity may not want to tightly couple to internal cloudinit python libraries and functions to process and iterate over SchemaValidationErrors
raised by cloudinit.config.schema.validate_cloudconfig_schema or cloudnit.config.schema.validate_cloudconfig_file.
To use these functions, subiquity would have to likely have to do something like the following:
from cloudinit.config.schema import validate_cloudconfig_file, get_schema, SchemaValidationError
cloudconfig_schema = get_schema()
try:
validate_cloudconfig_file('/var/lib/cloud/instance/user-data.txt', schema=cloudconfig_schema)
except SchemaValidationError as e:
return [schema_problem.path for schema_problem in e.schema_errors if "unexpected" in schema_problem.message]
Note that even the structured SchemaValidationError will not set a 'path' attribute if multiple unexpected keys exist on the object. So, you'd still need your pattern matching to schema_problem.message
to extract the separate keys so it's nearly the same logic you have for parsing the stderr of cloud-init schema --system
.
The fragility in CLI approach in this PR will be due to cloud-init relying on jsonschema
for that error string as cloud-init just presents that error message directly without modification. If jsonschema
module changes their error messaging format this could break subiquity parsing.
That said, I don't see this error output format from jsonschema being any different between jammy and noble and I think cloud-init would like to work on a machine-readable representation of cloud-init schema --system --format=yaml
per your feature request canonical/cloud-init#5100 that will make this easier to process in the future.
Minor changes requested that you can take or leave as you see fit
- better regex pattern match, splitting of the parsed jsonschema error messages
- dropping unused functions/tests
We'll make sure we keep you informed when we start tackling canonical/cloud-init#5100. So this code can consume more friendly structured content when available.
subiquity/cloudinit.py
Outdated
# Matches: | ||
# ('some-key' was unexpected) | ||
# ('some-key', 'another-key' were unexpected) | ||
pattern = r"\((?P<args>'[\w\-]+'(,\s'[\w\-]+')*)+ (?:was|were) unexpected\)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of thoughts on this regex:
- Pattern should probably be more flexible as these keys can really have any characters in them. and could include underscores, periods etc. So, let's match each offensive key on
[^']+
- Also we can push the leading and trailing single-quotes outside the P? match so we don't have to strip them later
- We can drop the trailing
+
outside the(P?<args>...)+
as your greedy matching and*
should take care of all listed unexpected key matches.
pattern = r"\((?P<args>'[\w\-]+'(,\s'[\w\-]+')*)+ (?:was|were) unexpected\)" | |
pattern = r"\('(?P<args>[^']+(,\s'[^']+)*)' (?:was|were) unexpected\)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review on this especially (Regex is hard!). I think we can definitely (1) replace [\w\-]
with [^']
to be more flexible and (3) remove the trailing +
, but (2) isolating the key names without the quotes is difficult. The suggested regex doesn't quite work and my attempts thus far have been insufficient.
I think I'm okay with another round of processing to strip the quotes and this is still an improvement on the parsing.
args_list: list[str] = search_result.group("args").split(", ") | ||
no_quotes: list[str] = [arg.strip("'") for arg in args_list] | ||
|
||
return no_quotes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the above pattern suggestion is acceptable. Then you can avoid having to strip single quotes and adapt your split instead.
args_list: list[str] = search_result.group("args").split(", ") | |
no_quotes: list[str] = [arg.strip("'") for arg in args_list] | |
return no_quotes | |
args_list: list[str] = search_result.group("args").split("', '") | |
return args_list |
async def get_schema_failure_sources() -> list[str]: | ||
"""Retrieve the keys causing schema failure.""" | ||
|
||
cmd: list[str] = ["cloud-init", "schema", "--system"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that providing --system
does report schema errors with any network-config or vendor-data as well as user-data provided to the instance. This is probably ok, and you probably want subiquity to raise any errors with any network, user-data or vendor-data provided to the instance. What you will not have with this approach is visibility to the file that generated the schema errors because you are processing stderr which only emits the specific error message from jsonschema for a given key, the file name (user-data.txt or network-config.txt) is represented only in the stdout at the moment. I don't think subiquity currently provides network-config to cloud-init, but users could provide this config via kernel commandline params ds=nocloud-net;http://some_url/
What you may want to do is specifically perform schema validation of only the user-data if it exists.
cmd: list[str] = ["cloud-init", "schema", "--system"] | |
if not os.path.exists("/var/lib/cloud/instance/user-data.txt"): | |
log.debug("No processed cloud-init user-data present") | |
return [] | |
cmd: list[str] = ["cloud-init", "schema", "-c", "/var/lib/cloud/instance/user-data.txt"] |
or network-config:
if not os.path.exists("/var/lib/cloud/instance/network-config.json"):
log.debug("No processed cloud-init network-config present")
return []
cmd: list[str] = ["cloud-init", "schema", "-c", "/var/lib/cloud/instance/network-config.json", "--schema-type", "network-config"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in general we want to capture errors from all sources since we read the combined config when extracting the autoinstall config (e.g. is it possible someone sends autoinstall config in their network-config?), but capturing the file source would improve our error messaging for sure. What do you think about leaving this to future improvements?
96e7720
to
67bcfa3
Compare
Users attempting to do autoinstall may incorrectly send autoinstall directives as cloud-config, which will result in cloud-init schema validation errors. When loading autoinstall from cloud-config, we now check to see if there are any cloud-init schema validation errors and warn the user. Additionally, if the source of the error is from a known autoinstall error, we inform the user and halt the installation with a nonreportable AutoinstallError.
67bcfa3
to
1b2c6be
Compare
@blackboxsw Thanks a lot for the extensive review! I went ahead and implemented most of your suggested changes (all but two, I left my reasoning in the comments and kept them unresolved). Based on your analysis I think we're okay with the potential fragility in the error parsing until we move to the structured output when it's available. I look forward to seeing more on canonical/cloud-init#5100! |
OK to merge, despite Noble test failures (unrelated archive problems) |
Users attempting to do autoinstall may incorrectly send autoinstall directives as cloud-config, which will result in cloud-init schema validation errors. When loading autoinstall from cloud-config, we now check to see if there are any cloud-init schema validation errors and warn the user. Additionally, if the source of the error is from a known autoinstall error, we inform the user and halt the installation with a nonreportable AutoinstallError.
Requires #1945 and #1947.