Problem solved by the enhancement

The Read returned by JsonReader::next_string_reader will always provide valid UTF-8 bytes (assuming there is no bug in the JsonReader implementation). It might therefore be convenient if it were possible to obtain str pieces from it directly. Given that the data should be valid UTF-8, this could also allow unchecked UTF-8 conversion to further improve performance.
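For illustration, this is roughly what callers have to do today (a minimal sketch using only the standard library; the helper name is made up, and string_reader stands for the Read returned by JsonReader::next_string_reader): the complete value is buffered and then validated as UTF-8 a second time, even though the reader already guarantees valid UTF-8 data.

```rust
use std::io::Read;

// Hypothetical helper reading a complete JSON string value from the `Read`
// returned by `JsonReader::next_string_reader`
fn read_whole_string(mut string_reader: impl Read) -> Result<String, Box<dyn std::error::Error>> {
    let mut bytes = Vec::new();
    string_reader.read_to_end(&mut bytes)?;
    // Redundant work: `from_utf8` validates data which is already known to be valid UTF-8
    Ok(String::from_utf8(bytes)?)
}
```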
Enhancement description
📢 This is only a proposal and not a planned feature yet. If you are interested in this, add a 👍 or feel free to comment (also with any suggestions and feedback).
Add a new StringValueReader trait (named similarly to the existing StringValueWriter trait), and let JsonReader::next_string_reader return it.
Add a read_str method to the trait which takes a &mut [u8] buffer as input and returns a &str backed by it as output. Since StringValueReader should already provide valid UTF-8 data through its read method, this new read_str method could have the following default implementation:
```rust
// TODO doc: Returned str might be shorter than `buf` if EOF has been reached (possibly even empty str),
// or if the last UTF-8 char's bytes did not fit completely into `buf`
// TODO: This implementation should probably retry reading in case of `ErrorKind::Interrupted`,
// and then document this behavior, similar to `StringValueWriter::write_str`
fn read_str<'a>(&mut self, buf: &'a mut [u8]) -> Result<&'a str, IoError> {
    let buf_len = buf.len();
    if buf_len < utf8::MAX_BYTES_PER_CHAR {
        panic!("Buffer is too small");
    }
    // Reserve space at the end of `buf` to make sure no incomplete UTF-8 data is read, and
    // all missing continuation bytes of the last char (if any) still have enough space
    let fill_buf = &mut buf[..(buf_len - (utf8::MAX_BYTES_PER_CHAR - 1))];
    debug_assert!(!fill_buf.is_empty());

    // Read once to determine if a previous `read` left incomplete UTF-8 data in the buffer; in that case fail fast
    let mut pos = self.read(fill_buf)?;
    if pos == 0 {
        // Reached EOF, return empty string
        return Ok("");
    } else if utf8::is_continuation(fill_buf[0]) {
        return Err(IoError::new(
            ErrorKind::InvalidData,
            "previous read left incomplete UTF-8 data",
        ));
    }

    // Fill the buffer as far as possible
    while pos < fill_buf.len() {
        let read = self.read(&mut fill_buf[pos..])?;
        if read == 0 {
            break;
        }
        pos += read;
    }

    // Find the position where the UTF-8 bytes of the last char start, walking back from the last byte read
    let mut last_char_start_pos = pos - 1;
    while utf8::is_continuation(buf[last_char_start_pos]) {
        // This should be overflow safe, because StringValueReader should always provide valid UTF-8 bytes,
        // and due to the `is_continuation` check at the beginning of the method
        last_char_start_pos -= 1;
    }

    let last_char_start_byte = buf[last_char_start_pos];
    let mut missing_bytes = if utf8::is_1byte(last_char_start_byte) {
        0
    } else if utf8::is_2byte_start(last_char_start_byte) {
        1
    } else if utf8::is_3byte_start(last_char_start_byte) {
        2
    } else if utf8::is_4byte_start(last_char_start_byte) {
        3
    } else {
        // StringValueReader should always provide valid UTF-8 bytes
        panic!("Unexpected: Malformed UTF-8 bytes in buffer: {buf:02X?}");
    };
    // Subtract the number of continuation bytes of the last char which are already in the buffer
    missing_bytes = usize::checked_sub(missing_bytes, (pos - 1) - last_char_start_pos)
        // StringValueReader should always provide valid UTF-8 bytes, but apparently it provided too many continuation bytes
        .expect("Unexpected: Malformed UTF-8 bytes in buffer");

    // If necessary, read the remaining bytes of the last char
    if missing_bytes > 0 {
        self.read_exact(&mut buf[pos..(pos + missing_bytes)])?;
        pos += missing_bytes;
    }

    // This should be safe because StringValueReader should only return valid UTF-8 bytes
    // TODO: Maybe this is too risky for custom JsonReader implementations, where it cannot be guaranteed
    // that they are bug free and only return valid UTF-8 data
    let string = utf8::to_str_unchecked(&buf[..pos]);
    Ok(string)
}
```
⚠️ This code has not been tested yet!
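As a usage sketch (assuming the proposed StringValueReader trait and the behavior described above, where an empty piece indicates the end of the string value; function name and buffer size are arbitrary), a caller could then process a string value in pieces without buffering the complete value:

```rust
use std::io::Error as IoError;

fn print_string_pieces(value_reader: &mut impl StringValueReader) -> Result<(), IoError> {
    // Buffer must be at least `utf8::MAX_BYTES_PER_CHAR` bytes large; the size is otherwise arbitrary
    let mut buf = [0_u8; 1024];
    loop {
        let piece = value_reader.read_str(&mut buf)?;
        if piece.is_empty() {
            // Based on the proposed behavior, an empty piece means the end of the string value was reached
            return Ok(());
        }
        print!("{piece}");
    }
}
```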
It might then also be possible to override Read::read_to_string to build the string from read_str pieces and avoid the UTF-8 validation overhead; though this might not be worth it, since the default read_to_string implementation already appears to be highly optimized.
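A rough sketch of such an override, assuming the proposed read_str method (the buffer size and the surrounding impl block are omitted details):

```rust
use std::io::Error as IoError;

// Would live inside the `impl Read for ...` block of a StringValueReader implementation
fn read_to_string(&mut self, out: &mut String) -> Result<usize, IoError> {
    let mut buf = [0_u8; 1024];
    let mut total = 0;
    loop {
        // No additional UTF-8 validation needed; `read_str` already guarantees valid data
        let piece = self.read_str(&mut buf)?;
        if piece.is_empty() {
            return Ok(total);
        }
        total += piece.len();
        out.push_str(piece);
    }
}
```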
Alternatives / workarounds
- Don't add any new method and let the user handle the UTF-8 conversion themselves
  - at the risk that users would have to re-implement this logic and might have bugs in their implementation
  - requiring users to perform the unchecked UTF-8 conversion in their own code, if desired