You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The error handling in the scheme parsing has an unexpected amount of complexity. This proposal suggests several simplifications.
As a case in point, some time ago we encountered an error during error handling that occur during scheme parsing (Slack post about it here). To illustrate the issues in the error handling and to guide our considerations below, here is the analysis I made at the time on exactly what went wrong there (slightly more detailed than that Slack post).
IRMA apps that downloaded this inconsistent scheme noticed the inconsistency and attempted to restore it from their assets, but due to the bug solved by this commit, that restored copy of the scheme was written incorrectly to the configuration path, under $configuration_path/pbdf-requestors/pbdf-requestors.
Since the introduction of requestor schemes, irmago no longer allows folders within the configuration path that do not contain either a description.xml or a description.json. Because of the double scheme name in the path, this caused the error from that Slack post.
That incident by itself, as well as the difficulty I had at the time in identifying this chain of events, made it clear that there is too much complexity here.
Current behaviour
Scheme parsing is done by the following two methods of the Configuration struct.
ParseFolder()
ParseFolder() iterates over all folders within the configuration path, parsing them as schemes (using the ParseSchemeFolder() method). If it encounters any error, it treats the error in one of following two ways.
If the error is not associated to any one scheme (e.g., the entire configuration path is unreadable), then that error is returned directly.
If the error occured during parsing of a particular scheme, then the following happens:
The error is wrapped into a SchemeManagerError and then stored in Configuration.DisabledSchemeManagers or Configuration.DisabledRequestorSchemes schemes (two maps tracking errors per scheme).
The Status field on the SchemeManager/RequestorScheme is set to some state other than Valid describing the error.
The function does not return directly, but instead it continues to parse the next scheme folder. The function may thus encounter multiple SchemeManagerErrors along the way, but only the last one is returned (although all of them are stored in those two maps).
ParseOrRestoreFolder()
This method invokes ParseFolder(). If an error is returned of type other than SchemeManagerError, then that error is returned directly. Otherwise, the scheme that had an error during parsing is reinstalled first from its online remote, or if that fails from the local assets. If reinstalling worked, then the error that was set in the DisabledSchemeManagers or DisabledRequestorSchemes maps is removed. If and only if everything works (either directly or after restoring any faulty schemes), both of those maps are empty and no error is returned.
Notes
Note 1: If ParseFolder() encountered a SchemeManagerError, then that error occured somewhere halfway during the parsing of a scheme. Consequentially, the contents of that scheme may or may not be partially present in the various maps of the Configuration struct (SchemeManagers, Issuers, CredentialTypes, etc). The idea is that the user of these structs must, when using their contents, always explicitly check that there were no errors during parsing, through the Status field on the SchemeManager/RequestorScheme structs. See e.g. this comment.
Note 2: Any folder that does not contain a description.xml or description.json results in an error (not of type SchemeManagerError, i.e., not recoverable by ParseOrRestoreFolder()). With the exception of folders whose name starts with a dot, ., which are ignored, if Fix leftover temp folder in scheme folder if scheme updating is aborted #269 is merged.
Analysis
I think the behaviour of ParseOrRestoreFolder() is reasonable, although the error bookkeeping in the DisabledSchemeManagers/DisabledRequestorSchemes maps seems unnecessarily complex.
If we want to keep the behaviour of ParseOrRestoreFolder(), i.e., the attempted restoration of faulty schemes, then that means that one aspect of ParseFolder() must stay as it is: if it encounters an error during parsing of some scheme, it must not immediately give up but instead first try to parse the other schemes within the configuration path. So we can keep using the SchemeManagerError type as errors from which ParseOrRestoreFolder() can try to recover.
As to Note 1, I find myself disagreeing with my past self who wrote this; this behaviour seems just wrong. Requiring all readers of the various maps within a Configuration to not use their contents in case a particular status field is set on the scheme is an invitation for bugs that happen because the user forgets to check that status field. Instead, it makes much more sense to not touch the contents of the Configuration at all if an error occured during parsing of a scheme. That way, the presence of a scheme or its contents within a Configuration implies that parsing was succesful and complete so that all of the Configurations contents can be trusted. This can be implemented easily by first parsing the scheme in a temporary second Configuration instance; in case of success the contents of that temporary Configuration instance can be put into the first one using Configuration.Join() method, which was written for use in the scheme updating mechanism in exactly the same fashion.
As to Note 2, this behaviour forbids the existence of any non-scheme within the configuration path (excluding dotted folders if #269 is merged). Before requestor schemes were introduced in irmago, this was different; then non-scheme dirs were simply ignored. Indeed, in the Slack post linked to above, this change in behaviour is what caused the faulty error handling to become visible to end users, in the form of IRMA apps that threw errors and refused to start. Note that essentially there was no reason to bother end users with this issue; in this case, the app could have safely continued operating by just ignoring the non-scheme directory in the configuration path.
Proposed changes
I suggest to change the following:
For both ParseFolder() and ParseOrRestoreFolder() it holds that if they encounter more than one error, then only the final error is returned, discarding all previous errors. Instead, we should keep track of all errors that occur. We can do this using https://github.com/hashicorp/go-multierror, which is already a dependency of irmago. That way, errors and error reports will be more helpful.
The DisabledSchemeManagers/DisabledRequestorSchemes fields of the Configuration struct are removed. Instead, if ParseFolder() encounters any errors it just returns them (all of them, using go-multierror). ParseOrRestoreFolder() then tries to restore the relevant schemes, returning no error if that worked for all problematic schemes, or else returning all errors encountered along the way.
During parsing of a scheme, its contents are written to the Configuration only after the parsing of the scheme was entirely successful. Thus, the Configuration instance exclusively contains schemes which were parsed entirely succesfully. This obsoletes the Status fields on SchemeManager/RequestorScheme so that they can be removed.
In case of the irmaclient, I suggest we make ParseFolder() ignore any folder that does not appear to be a scheme (i.e., it does not contain a description.xml/description.json) - the way it was before the introduction of requestor schemes. This will make the app more tolerant to any buggy code writing wrong things to the configuration path, as happened in the issue described at the start of this post, as well as in Irma configuration folder corrupts when exiting irma server during update process #139. We should, however, report to Sentry any such non-scheme folders, since certainly in case of the irmaclient they shouldn't be there. As to the IRMA server, here things are I think different, since for them the contents of the configuration path influences the attributes that the verifier will and will not trust. Therefore, for the IRMA server it makes more sense than in the irmaclient to crash when the server encounters non-scheme folders, so that the IRMA server maintainer has more certainty about what exactly is present in that folder. Note also that while IRMA app users have no manual control over the contents of the configuration path, IRMA server maintainers do, so it is more reasonable for us to expect IRMA server maintainers to cleanup trash in there. Therefore, I suggest adding a new parameter to the ConfigurationOptions struct called AllowNonSchemeDirs or something, set to true in the irmaclient and false elsewhere.
Alternative
These proposals keep intact the functionality of trying to restore faulty schemes. Since that functionality has had issues by itself in the past, we might also instead consider (on top of the above suggestions) to do away with it entirely -- that is, completely remove ParseOrRestoreFolder(), and just crash (and Sentry-report) whenever we encounter a faulty scheme in ParseFolder(). This would completely remove all of the complexity in the error handling, by reducing it to just simple if err != nil { return err } blocks everywhere. However, while we have seen problems in the restoring-code, we don't know how often that functionality has succesfully restored faulty schemes without us knowing about it. I think we should keep this, but I'm happy to hear other opinions.
The text was updated successfully, but these errors were encountered:
@bobhageman noticed that the schemes.go file has some linter warnings and remarks that can be fixed. That's something to pay attention too when refactoring.
The error handling in the scheme parsing has an unexpected amount of complexity. This proposal suggests several simplifications.
As a case in point, some time ago we encountered an error during error handling that occur during scheme parsing (Slack post about it here). To illustrate the issues in the error handling and to guide our considerations below, here is the analysis I made at the time on exactly what went wrong there (slightly more detailed than that Slack post).
pbdf-requestors
scheme at https://privacybydesign.foundation/schememanager was in an inconsistent state for some time.$configuration_path/pbdf-requestors/pbdf-requestors
.irmago
no longer allows folders within the configuration path that do not contain either adescription.xml
or adescription.json
. Because of the double scheme name in the path, this caused the error from that Slack post.That incident by itself, as well as the difficulty I had at the time in identifying this chain of events, made it clear that there is too much complexity here.
Current behaviour
Scheme parsing is done by the following two methods of the
Configuration
struct.ParseFolder()
ParseFolder()
iterates over all folders within the configuration path, parsing them as schemes (using theParseSchemeFolder()
method). If it encounters any error, it treats the error in one of following two ways.SchemeManagerError
and then stored inConfiguration.DisabledSchemeManagers
orConfiguration.DisabledRequestorSchemes
schemes (two maps tracking errors per scheme).Status
field on theSchemeManager
/RequestorScheme
is set to some state other thanValid
describing the error.SchemeManagerError
s along the way, but only the last one is returned (although all of them are stored in those two maps).ParseOrRestoreFolder()
This method invokes
ParseFolder()
. If an error is returned of type other thanSchemeManagerError
, then that error is returned directly. Otherwise, the scheme that had an error during parsing is reinstalled first from its online remote, or if that fails from the local assets. If reinstalling worked, then the error that was set in theDisabledSchemeManagers
orDisabledRequestorSchemes
maps is removed. If and only if everything works (either directly or after restoring any faulty schemes), both of those maps are empty and no error is returned.Notes
ParseFolder()
encountered aSchemeManagerError
, then that error occured somewhere halfway during the parsing of a scheme. Consequentially, the contents of that scheme may or may not be partially present in the various maps of theConfiguration
struct (SchemeManagers
,Issuers
,CredentialTypes
, etc). The idea is that the user of these structs must, when using their contents, always explicitly check that there were no errors during parsing, through theStatus
field on theSchemeManager
/RequestorScheme
structs. See e.g. this comment.description.xml
ordescription.json
results in an error (not of typeSchemeManagerError
, i.e., not recoverable byParseOrRestoreFolder()
). With the exception of folders whose name starts with a dot,.
, which are ignored, if Fix leftover temp folder in scheme folder if scheme updating is aborted #269 is merged.Analysis
I think the behaviour of
ParseOrRestoreFolder()
is reasonable, although the error bookkeeping in theDisabledSchemeManagers
/DisabledRequestorSchemes
maps seems unnecessarily complex.If we want to keep the behaviour of
ParseOrRestoreFolder()
, i.e., the attempted restoration of faulty schemes, then that means that one aspect ofParseFolder()
must stay as it is: if it encounters an error during parsing of some scheme, it must not immediately give up but instead first try to parse the other schemes within the configuration path. So we can keep using theSchemeManagerError
type as errors from whichParseOrRestoreFolder()
can try to recover.As to Note 1, I find myself disagreeing with my past self who wrote this; this behaviour seems just wrong. Requiring all readers of the various maps within a
Configuration
to not use their contents in case a particular status field is set on the scheme is an invitation for bugs that happen because the user forgets to check that status field. Instead, it makes much more sense to not touch the contents of theConfiguration
at all if an error occured during parsing of a scheme. That way, the presence of a scheme or its contents within aConfiguration
implies that parsing was succesful and complete so that all of theConfiguration
s contents can be trusted. This can be implemented easily by first parsing the scheme in a temporary secondConfiguration
instance; in case of success the contents of that temporaryConfiguration
instance can be put into the first one usingConfiguration.Join()
method, which was written for use in the scheme updating mechanism in exactly the same fashion.As to Note 2, this behaviour forbids the existence of any non-scheme within the configuration path (excluding dotted folders if #269 is merged). Before requestor schemes were introduced in
irmago
, this was different; then non-scheme dirs were simply ignored. Indeed, in the Slack post linked to above, this change in behaviour is what caused the faulty error handling to become visible to end users, in the form of IRMA apps that threw errors and refused to start. Note that essentially there was no reason to bother end users with this issue; in this case, the app could have safely continued operating by just ignoring the non-scheme directory in the configuration path.Proposed changes
I suggest to change the following:
ParseFolder()
andParseOrRestoreFolder()
it holds that if they encounter more than one error, then only the final error is returned, discarding all previous errors. Instead, we should keep track of all errors that occur. We can do this using https://github.com/hashicorp/go-multierror, which is already a dependency ofirmago
. That way, errors and error reports will be more helpful.DisabledSchemeManagers
/DisabledRequestorSchemes
fields of theConfiguration
struct are removed. Instead, ifParseFolder()
encounters any errors it just returns them (all of them, usinggo-multierror
).ParseOrRestoreFolder()
then tries to restore the relevant schemes, returning no error if that worked for all problematic schemes, or else returning all errors encountered along the way.Configuration
only after the parsing of the scheme was entirely successful. Thus, theConfiguration
instance exclusively contains schemes which were parsed entirely succesfully. This obsoletes theStatus
fields onSchemeManager
/RequestorScheme
so that they can be removed.irmaclient
, I suggest we makeParseFolder()
ignore any folder that does not appear to be a scheme (i.e., it does not contain adescription.xml
/description.json
) - the way it was before the introduction of requestor schemes. This will make the app more tolerant to any buggy code writing wrong things to the configuration path, as happened in the issue described at the start of this post, as well as in Irma configuration folder corrupts when exiting irma server during update process #139. We should, however, report to Sentry any such non-scheme folders, since certainly in case of theirmaclient
they shouldn't be there. As to the IRMA server, here things are I think different, since for them the contents of the configuration path influences the attributes that the verifier will and will not trust. Therefore, for the IRMA server it makes more sense than in theirmaclient
to crash when the server encounters non-scheme folders, so that the IRMA server maintainer has more certainty about what exactly is present in that folder. Note also that while IRMA app users have no manual control over the contents of the configuration path, IRMA server maintainers do, so it is more reasonable for us to expect IRMA server maintainers to cleanup trash in there. Therefore, I suggest adding a new parameter to theConfigurationOptions
struct calledAllowNonSchemeDirs
or something, set totrue
in theirmaclient
andfalse
elsewhere.Alternative
These proposals keep intact the functionality of trying to restore faulty schemes. Since that functionality has had issues by itself in the past, we might also instead consider (on top of the above suggestions) to do away with it entirely -- that is, completely remove
ParseOrRestoreFolder()
, and just crash (and Sentry-report) whenever we encounter a faulty scheme inParseFolder()
. This would completely remove all of the complexity in the error handling, by reducing it to just simpleif err != nil { return err }
blocks everywhere. However, while we have seen problems in the restoring-code, we don't know how often that functionality has succesfully restored faulty schemes without us knowing about it. I think we should keep this, but I'm happy to hear other opinions.The text was updated successfully, but these errors were encountered: