[DISCUSSION] Permissive or strict parser? #111

Open
fabianegli opened this issue Apr 4, 2022 · 0 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@fabianegli
Collaborator

There is a good discussion to be had about how permissive or strict a parser for a file standard should be and, if it is permissive, which format errors it should tolerate and which it should not. To me, the answer to this question is not yet clear for SDRF files, and I would welcome a discussion about it from contributors and users of sdrf-pipelines. Here is a list of questions (by no means comprehensive):

  1. What are permissible errors?
    1. Can a trailing whitespace always be stripped? Or can a trailing whitespace have meaning?
    2. Can an empty line be tolerated? At the beginning? At the end? In the middle?
  2. Can we make valid assumptions about strings? Is the encoding UTF-8? Are they supposed to be composed of only a limited character set? This applies to:
    1. Filenames?
    2. Column names?
    3. Fields in the SDRF table?
  3. How thoroughly is the content checked?
    1. Are empty fields allowed? Or filled with some value?
    2. Do we need the same number of values X and Y in a column?
    3. Is invalid content detected? e.g. labelling information in a fraction column?
  4. How severe is each detected issue?
    1. What do they affect?
    2. Should they be handled silently, trigger a warning, or raise an error?
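
To make the trade-off concrete, here is a minimal sketch of a checker that can run in either permissive or strict mode. It is not how sdrf-pipelines works today: `check_sdrf_text`, `Issue` and `Severity` are hypothetical names, and the choice of which problems count as tolerable (stray whitespace, empty lines) and which are always errors (a wrong number of fields) is only an assumption for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


@dataclass
class Issue:
    line: int  # 1-based line number in the input text
    severity: Severity
    message: str


def check_sdrf_text(text, strict=False):
    """Parse tab-separated text and collect issues instead of failing early.

    Hypothetical helper. In permissive mode, stray whitespace and empty
    lines are repaired and reported as warnings; in strict mode the same
    findings are reported as errors.
    """
    tolerable = Severity.ERROR if strict else Severity.WARNING
    issues = []
    rows = []
    header_len = None
    for line_no, raw in enumerate(text.splitlines(), start=1):
        # Lines that contain only whitespace (including tabs) count as empty.
        if not raw.strip():
            issues.append(Issue(line_no, tolerable, "empty line"))
            continue
        fields = raw.split("\t")
        stripped = [field.strip() for field in fields]
        if stripped != fields:
            issues.append(
                Issue(line_no, tolerable, "leading or trailing whitespace in a field")
            )
        if header_len is None:
            # The first non-empty line is treated as the header.
            header_len = len(fields)
        elif len(fields) != header_len:
            # Assumption: a wrong field count cannot be repaired safely,
            # so it is an error in both modes and the row is dropped.
            issues.append(
                Issue(
                    line_no,
                    Severity.ERROR,
                    f"expected {header_len} fields, found {len(fields)}",
                )
            )
            continue
        rows.append(stripped)
    return rows, issues
```

A caller could then decide what to do with the collected issues, for example log the warnings and abort only if any issue has `Severity.ERROR`. Collecting issues rather than raising on the first one keeps the question of severity separate from the question of detection.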

Some of these questions have clear answers, others not so much. I would very much welcome a discussion about all of them.

These questions might also have different answers for different use cases. The SDRF is a tool expected to be applied in a broad range of environments and use cases. Discussing these questions will help us anticipate the requirements better and help in the design and implementation of the next iteration of the sdrf-pipelines package.
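
One way different use cases could get different answers is a per-profile policy table. The sketch below is purely illustrative: the profile names (`interactive`, `pipeline`, `submission`), the issue codes and the chosen actions are all assumptions, not existing sdrf-pipelines settings.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"
    WARN = "warn"
    RAISE = "raise"


# Hypothetical profiles and issue codes, chosen only for illustration.
POLICIES = {
    "interactive": {
        "trailing_whitespace": Action.WARN,
        "empty_line": Action.WARN,
        "empty_field": Action.RAISE,
    },
    "pipeline": {
        "trailing_whitespace": Action.IGNORE,
        "empty_line": Action.IGNORE,
        "empty_field": Action.RAISE,
    },
    "submission": {
        "trailing_whitespace": Action.RAISE,
        "empty_line": Action.RAISE,
        "empty_field": Action.RAISE,
    },
}


def action_for(profile, issue_code):
    """Look up how a profile handles an issue.

    Unknown issue codes fall back to RAISE so that a newly added check
    is never skipped silently.
    """
    return POLICIES[profile].get(issue_code, Action.RAISE)
```

With a table like this, the parser only has to detect and name issues, and the decision between silent handling, a warning and an error lives in one place that can be discussed and changed per use case.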

Since I am new to this project, such a discussion will also help me get going with contributions. Or in other words, keep me from straying into territories that are better left uncharted.

@fabianegli fabianegli added help wanted Extra attention is needed question Further information is requested labels Apr 5, 2022