-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return char
instead of u8
from InvalidCharError::invalid_char
#100
Comments
We had a similar bug in rust-bech32 in our new Agreed with your strategy. Ideas for the new method name
|
|
I had a play with this which led me to the view that
I hacked up a /// Validates that input `string` contains valid hex characters.
pub fn validate_hex_string(s: &str) -> Result<(), NonHexDigitError> {
for c in s.chars() {
if !c.is_ascii_hexdigit() {
return Err(NonHexDigitError { invalid: c});
}
}
Ok(())
} Perhaps a middle ground would be to add this to crate docs under some sort of |
UTF-8 is designed so that as soon as you hit a non-ASCII byte you know that it's the start of a non-ASCII character. So we shouldn't need to use the slow |
That sounds feasible, I'll have a play with it. Thanks |
Another possible approach that doesn't require explicitly tracking the position and relies on iterators only: // We're not using `.bytes()` because it doesn't have the `as_slice` method.
let mut bytes = s.as_bytes().iter();
// clone() is actually a copy and it allows peeking behavior without losing access to the `as_slice` method on the iterator.
while let Some(byte) = bytes.clone().next() {
match decode_hex_digit(byte) {
Some(value) => { /* do something with the value here */ },
None => {
// SAFETY: all bytes up to this one were ACII which implies valid UTF-8 and the input was valid UTF-8 to begin with so splitting the string here is sound. We're also making sure we split it at correct position by cloning the iterator.
let remaining_str = unsafe { core::str::from_utf8_unchecked(bytes.as_slice()) };
return Err(InvalidChar { c: remaining_str.chars().next().unwrap(), pos: s.len() - remaining_str.len() });
},
}
bytes.next()
} |
👀 can you really call |
LOl, I forgot to write |
Ah! The |
FTR I was also considering having |
What do you mean by this? |
Basically if we ever add functions that take |
Ah. yeah, that's probably best. |
If the string contains a multi-byte UTF-8 char then the error returns confusing value unsuitable for the user.
This change is kinda breaking but we can phase it slowly: add
invalid_char_human(&self) -> char
method (please invent a better name!), implementchar
->u8
conversion insideinvalid_char
by encoding the first byte of the char and deprecate it.I suspect this may need a bunch of changes to be able to get the entire char though.
The text was updated successfully, but these errors were encountered: