
Discussion points

Hannes Hauswedell edited this page Feb 17, 2022 · 7 revisions

A list of things we should discuss before a 1.0 release (create individual pages with details if more than a paragraph is needed):

  1. Should all bio::field enum entries be distinct for simplicity?
  2. "Deep records":
  • Shallow records are default and recommended.
  • Deep records are required sometimes, e.g. in combination with `views::async_input_buffer`.
  • Currently, one can select deep records via template parameter and the options. This means all formats need to implement it (not that much work actually) and that the options and dynamic_type are more complicated (a little annoying).
  • An alternative design would be to have the formats always output shallow records and offer a generic .make_deep_copy() on the record that returns a self-contained record (this would automatically turn views into vectors...).
  • PRO: the overall design becomes easier to understand; a little less work for format input handlers
  • CON: you cannot specify the specific "deep" types anymore, the record always picks e.g. vector for views; certain optimisations are no longer possible (e.g. deep FASTA reading into std::string can currently avoid a copy by swapping buffers with output strings; this wouldn't be possible in the changed design)
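The alternative design could look roughly like the sketch below. `shallow_record`, `deep_record` and `read_one()` are hypothetical names invented for illustration, not types from the library; only the idea of a `.make_deep_copy()` member comes from the point above. The record is reduced to two string fields to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical owning record: every field stores its own data.
struct deep_record
{
    std::string id;
    std::string seq;
};

// Hypothetical shallow record: the fields only point into the reader's buffer.
struct shallow_record
{
    std::string_view id;
    std::string_view seq;

    // Copies every borrowed field into an owning type chosen by the record.
    deep_record make_deep_copy() const
    {
        return deep_record{std::string{id}, std::string{seq}};
    }
};

// Shows that the deep copy stays valid after the underlying buffer changes.
inline deep_record read_one()
{
    std::string buffer = "seq1\tACGT"; // stands in for the reader's internal buffer
    std::string_view whole{buffer};

    shallow_record shallow{whole.substr(0, 4), whole.substr(5, 4)};
    deep_record    deep = shallow.make_deep_copy();

    buffer.clear(); // `shallow` must not be used any more; `deep` is unaffected
    assert(deep.seq == "ACGT");
    return deep;
}
```

Note that the target types are fixed by the record (here always `std::string`), which is the CON listed above: the caller can no longer choose them, and the copy cannot be replaced by a buffer swap.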
  3. Should all concepts that constrain public interfaces be public? The field_types in particular have lots of requirements, and I am afraid this will clutter the documentation significantly.
  4. What happens when a writer receives no record?
  • Currently, nothing happens. But it should write the header to create a valid "empty" file. → The destructor should write the header. This is now implemented for VCF and BCF.
  • What happens when there is no header, because e.g. you stream reader | std::views::filter(foo) | writer; and the filter removes all records? → The assignment/pipe operator should "unpeel" the input range and try to find out whether the most-underlying range is a reader and, if yes, get that reader's header and set it. Not yet implemented.
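The "destructor writes the header" behaviour can be illustrated with a minimal sketch. `toy_writer` and `write_nothing()` are hypothetical and only model the control flow; the actual VCF and BCF writers are considerably more involved, and the single header line used here is just a placeholder for a full header.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical minimal writer that guarantees the header is written exactly once.
class toy_writer
{
public:
    toy_writer(std::ostream & out, std::string header) : out_{&out}, header_{std::move(header)} {}

    toy_writer(toy_writer const &)             = delete;
    toy_writer & operator=(toy_writer const &) = delete;

    // Writing the first record triggers writing the header.
    void push_back(std::string const & record)
    {
        write_header_once();
        *out_ << record << '\n';
    }

    // If no record was ever written, the header is written here instead.
    ~toy_writer()
    {
        try
        {
            write_header_once();
        }
        catch (...)
        {
            // Destructors should not throw; a real writer would need a policy for this case.
        }
    }

private:
    void write_header_once()
    {
        if (!header_written_)
        {
            *out_ << header_ << '\n';
            header_written_ = true;
        }
    }

    std::ostream * out_;
    std::string    header_;
    bool           header_written_ = false;
};

// Creates a writer, writes no records, and returns what ended up in the stream.
inline std::string write_nothing()
{
    std::ostringstream out;
    {
        toy_writer writer{out, "##fileformat=VCFv4.3"};
        assert(out.str().empty()); // nothing is written on construction
    } // the destructor writes the header here
    return out.str();
}
```

This only covers the case where the writer already has a header. The second case above, where the header would have to be retrieved from the underlying reader, is not addressed by the sketch.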
  5. Do we want to gracefully accept char const * as a string type? I am trying to do this everywhere, but it is a real hassle, because the type is not a range.
  6. What to do about non-IO exceptions thrown within IO, e.g. failure to convert a string to a number? Currently, this results in an error without context. Proper solution: catch the exception and rethrow with context. But we don't want to catch inside the library. Alternative solution: check in the destructor of the respective format_input_handler whether the stack is unwinding and a non-IO exception is being thrown; if yes, print context to stderr.
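A rough sketch of the destructor-based alternative, using `std::uncaught_exceptions()` (C++17) to detect that the stack is unwinding. `context_guard` and `parse_with_context()` are hypothetical names, and the guard is a stand-alone class here rather than part of a format_input_handler. The log target is passed in as a `std::ostream` so that the behaviour can be checked; in the library this would presumably be `std::cerr`.

```cpp
#include <cassert>
#include <exception>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical guard: prints its context if it is destroyed because of an exception.
class context_guard
{
public:
    context_guard(std::ostream & log, std::string context) :
      log_{&log}, context_{std::move(context)}, exceptions_on_entry_{std::uncaught_exceptions()}
    {}

    context_guard(context_guard const &)             = delete;
    context_guard & operator=(context_guard const &) = delete;

    ~context_guard()
    {
        // More uncaught exceptions than on construction: we are being unwound.
        if (std::uncaught_exceptions() > exceptions_on_entry_)
            *log_ << "[context] " << context_ << '\n';
    }

private:
    std::ostream * log_;
    std::string    context_;
    int            exceptions_on_entry_;
};

// Tries to convert `field` to a number and returns whatever the guard logged.
inline std::string parse_with_context(std::string const & field)
{
    std::ostringstream log;
    try
    {
        context_guard guard{log, "while parsing field '" + field + "'"};
        assert(log.str().empty());  // nothing is logged while the guard is alive
        (void)std::stoi(field);     // throws std::invalid_argument for non-numeric input
    }
    catch (std::exception const &)
    {
        // Stands in for user code far away from the library that handles the error.
    }
    return log.str();
}
```

For the input `"abc"` the guard logs one context line; for `"42"` it logs nothing. The sketch reports on any exception; telling IO and non-IO exceptions apart would need additional bookkeeping that is not shown here.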
  7. Raw reading/writing:
  • We officially support the field type std::span<std::byte const> for "raw" reading/writing.
  • This is currently poorly tested and it definitely can't work for some fields, e.g. BCF writing needs to know certain things, it cannot just "write bytes".
  • It also breaks abstraction, i.e. what do we do when we read VCF and write BCF and some field is "raw"? When/where is this detected? I am OK with creating output files that contain nonsensical values, but I am not OK with creating invalid files (broken format).