I have an NSString that was converted from little-endian Unicode NSData. The first character of the string is the Unicode byte-order marker (BOM), which in little endian is 0xFF 0xFE.
The first time _loadMoreIfNecessary calls initWithBytes:length:encoding:, the BOM is in the buffer and the buffer is read correctly. However, when the second buffer is converted there is no BOM, and the data is treated as big-endian. This means that the second and all subsequent buffers of data are corrupted.
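Since Objective-C isn't easy to run standalone, here is a Python analogue of the failure mode (a sketch, not the original code): decoding each buffer independently loses the decoder's context, so a buffer boundary that falls mid-sequence, or a later buffer that lacks the BOM, cannot be decoded correctly on its own.

```python
import codecs

# Build little-endian UTF-16 data with a leading BOM (0xFF 0xFE),
# as described in the report.
text = "héllo"
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Simulate reading fixed-size buffers; the split point (5 bytes)
# deliberately lands in the middle of a 2-byte code unit.
chunk1, chunk2 = data[:5], data[5:]

try:
    # Stateless per-buffer decoding, analogous to calling
    # initWithBytes:length:encoding: on each buffer independently.
    chunk1.decode("utf-16")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("independent decode failed:", exc)
```

Even when the split happens to fall on a code-unit boundary, the second buffer has no BOM, which is exactly the ambiguity that corrupts the subsequent buffers in the report.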
In one sense, the bug is that _loadMoreIfNecessary is converting each buffer of text independently, rather than maintaining conversion context from one buffer to the next. In general, text encodings require context to handle multi-byte characters, byte order markers and such. A more robust version of this function would use the lower-level Text Encoding Converter, which maintains context from one buffer to the next.
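The context-preserving approach can be sketched with Python's incremental decoder standing in for the Text Encoding Converter (an analogue, not Apple's API): it consumes the BOM once, remembers the detected endianness, and buffers any partial code unit across buffer boundaries.

```python
import codecs

text = "héllo"
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
chunk1, chunk2 = data[:5], data[5:]  # boundary splits a code unit

# One decoder instance carries state from one buffer to the next,
# which is the behavior _loadMoreIfNecessary would need.
decoder = codecs.getincrementaldecoder("utf-16")()
result = decoder.decode(chunk1) + decoder.decode(chunk2, final=True)
print(result)  # the BOM is consumed once; both buffers decode correctly
```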
But an easier fix might be to change initWithCSVString: to use a fixed encoding like NSUTF16BigEndianStringEncoding rather than calling [csv fastestEncoding], which evaluates to NSUnicodeStringEncoding, whose byte order is ambiguous. I believe that using an unambiguous encoding would prevent the error, even if it's not as general a solution as using Text Encoding Converter.
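The fixed-encoding fix can also be sketched in Python (again an analogue, assuming buffers split on 2-byte code-unit boundaries): with an explicit-endianness encoding, a BOM-less second buffer can never be misread as big-endian, and the BOM itself is just a U+FEFF character that can be stripped.

```python
import codecs

text = "héllo"
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
chunk1, chunk2 = data[:6], data[6:]  # even split, on a code-unit boundary

# Decode every buffer with a fixed, unambiguous encoding
# (utf-16-le here, matching the 0xFF 0xFE BOM in the report).
decoded = "".join(c.decode("utf-16-le") for c in (chunk1, chunk2))

# Under a fixed encoding the BOM decodes to U+FEFF; strip it.
print(decoded.lstrip("\ufeff"))
```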
The convenience initializers now use a fixed encoding (NSUTF8StringEncoding), but this would still be an issue for NSInputStreams provided to the designated initializer.