Add encoding argument to scripts #10

Open: wants to merge 4 commits into base: main

BazookaMusic

Issue
On Windows 11, the measurement creation script (createMeasurements.py) failed on my machine: polars complained that the output file was not UTF-8 encoded.

Solution
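Open the output file with an explicit UTF-8 encoding. polars writes CSV only as UTF-8 and rejects a file object opened with any other encoding (on Windows, Python's default text encoding is typically a legacy code page such as cp1252).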
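A minimal sketch of this fix, assuming createMeasurements.py opens the output file with `open()` and then passes the file object to `write_csv` (the traceback shows `data.write_csv(f, ...)`). The helper name `open_output_file` is hypothetical, not code from this PR, and the polars call appears only as a comment so the sketch runs with the standard library alone. polars appears to inspect the `encoding` of the text-mode file object it receives, which is why the explicit argument matters.

```python
import os
import tempfile
from typing import Optional, TextIO


def open_output_file(path: str, encoding: Optional[str] = "utf-8") -> TextIO:
    """Hypothetical helper: open the measurement file with an explicit encoding.

    If encoding is None, open() falls back to the locale's default encoding
    (often cp1252 on Windows), which is what polars rejected in the traceback.
    """
    return open(path, "w", encoding=encoding)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "measurements.txt")
    with open_output_file(path) as f:
        # In createMeasurements.py this is where the DataFrame would be written:
        #   data.write_csv(f, separator=sep, float_precision=1, include_header=False)
        file_encoding = f.encoding  # the encoding a consumer of f will see
        f.write("São Paulo;25.1\n")
    # Read the raw bytes back to confirm the non-ASCII name was stored as UTF-8.
    with open(path, "rb") as f:
        raw = f.read()

print(file_encoding)  # utf-8
print(raw[:10])
```

The explicit argument makes the result independent of the machine's locale, which is the failure mode reported here.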

Error message:

```
C:\Users\_\Documents\1brc\createMeasurements.py:426: DataOrientationWarning: Row orientation inferred during DataFrame construction. Explicitly specify the orientation by passing `orient="row"` to silence this warning.
  stations = pl.DataFrame(STATIONS, ("names", "means"))
Creating measurement file 'myfile.res' with 1,000,000,000 measurements...
  0%|                                        | 0/100 [00:00<?, ?it/s]
C:\Users\_\Documents\1brc\createMeasurements.py:463: UserWarning: Polars found a filename. Ensure you pass a path to the file instead of a python file object when possible for best performance.
  data.write_csv(f, separator=sep, float_precision=1, include_header=False)
  0%|                                        | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\_\Documents\1brc\createMeasurements.py", line 507, in <module>
    measurement.generate_measurement_file(
  File "C:\Users\_\Documents\1brc\createMeasurements.py", line 463, in generate_measurement_file
    data.write_csv(f, separator=sep, float_precision=1, include_header=False)
  File "C:\Users\_\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\polars\dataframe\frame.py", line 2696, in write_csv
    self._df.write_csv(
polars.exceptions.InvalidOperationError: file encoding is not UTF-8
```

@BazookaMusic
Author

Let me know if we would rather respect the system's default encoding; in that case, the fix would have to be a change to polars' CSV-writing code.

@ifnesi
Owner

ifnesi commented Jul 8, 2024

Hi @BazookaMusic, thank you. @Askill made a similar suggestion. However, to make sure this does not break anything for other users, how about we add a new argparse argument (--encoding) whose default is None, giving other users the chance to set whatever encoding they prefer? The other scripts would then need argparse arguments such as --input (the input file, default measurements.txt) and --encoding (default None).
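A sketch of how the proposed arguments could look in one of the reading scripts. The flag names and defaults come from the comment above. `build_parser` and `parse` are hypothetical helper names. Leaving `--encoding` at None means a text-mode `open(args.input, encoding=args.encoding)` falls back to the platform default, so nothing changes for users who don't pass the flag.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser following the --input/--encoding proposal."""
    parser = argparse.ArgumentParser(description="Process a 1BRC measurements file")
    parser.add_argument(
        "--input",
        default="measurements.txt",
        help="path to the input file (default: measurements.txt)",
    )
    parser.add_argument(
        "--encoding",
        default=None,
        help="file encoding; when omitted, open() uses the platform default",
    )
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None would read sys.argv; explicit lists make the example reproducible.
    return build_parser().parse_args(argv)


defaults = parse([])
custom = parse(["--input", "myfile.txt", "--encoding", "utf-8"])
print(defaults.input, defaults.encoding)  # measurements.txt None
print(custom.input, custom.encoding)      # myfile.txt utf-8
```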

@BazookaMusic
Author

@ifnesi Done and done

@BazookaMusic changed the title from "Windows 11 - Add utf-8 to file creation" to "Add encoding argument to scripts" on Jul 13, 2024
@BazookaMusic
Author

However, it doesn't make sense to add an encoding flag to the other scripts, since they already assume the file is UTF-8. They read the file in binary mode, and binary mode does not accept an encoding parameter. The code also does comparisons like:

`return f.read(1) == b"\n"`

which assume the input file is UTF-8 (or at least ASCII-compatible), so that a newline is the single byte `\n`.
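Both points above can be checked with the standard library alone. In this sketch the file name and contents are made up. It shows that `open()` rejects an `encoding` argument in binary mode, and that a one-byte newline check only works for encodings such as UTF-8, where `"\n"` is the single byte 0x0A.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "measurements.txt")
    with open(path, "wb") as f:
        f.write("Hamburg;12.0\n".encode("utf-8"))

    # 1) Binary mode has no notion of text encoding, so open() refuses the argument.
    try:
        open(path, "rb", encoding="utf-8")
        binary_with_encoding_ok = True
        binary_error = ""
    except ValueError as exc:
        binary_with_encoding_ok = False
        binary_error = str(exc)

    # 2) A one-byte check for b"\n" works because UTF-8 encodes newline as 0x0A.
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        last_is_newline = f.read(1) == b"\n"

# In UTF-16 (little-endian) the same newline takes two bytes, so reading one
# byte at a time would not line up with the character boundaries.
utf16_newline = "\n".encode("utf-16-le")
print(binary_error)
print(last_is_newline, utf16_newline)  # True b'\n\x00'
```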

See also: `f"{location.decode('utf-8')}={measurements[0]:.1f}/{(measurements[2] / measurements[3]) if measurements[3] !=0 else 0:.1f}/{measurements[1]:.1f}",`
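To illustrate why that line also ties the scripts to UTF-8, here is a standalone version of the quoted f-string. The function name `format_station`, the sample numbers, and the `[min, max, sum, count]` layout of `measurements` are assumptions inferred from the f-string, not taken from the repository.

```python
from typing import List


def format_station(location: bytes, measurements: List[float]) -> str:
    """Hypothetical wrapper around the quoted f-string.

    Assumed layout (inferred from the f-string): [min, max, sum, count].
    """
    mean = (measurements[2] / measurements[3]) if measurements[3] != 0 else 0
    return f"{location.decode('utf-8')}={measurements[0]:.1f}/{mean:.1f}/{measurements[1]:.1f}"


stats = [-3.2, 31.4, 250.0, 20]
utf8_line = format_station("São Paulo".encode("utf-8"), stats)
print(utf8_line)  # São Paulo=-3.2/12.5/31.4

# The same station name written with a Windows code page is not valid UTF-8,
# so decode('utf-8') raises before any formatting happens.
try:
    format_station("São Paulo".encode("cp1252"), stats)
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False
print(cp1252_ok)  # False
```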

Labels: None yet
Projects: None yet

2 participants