Write your own WC - Python
Though I might have used wc a million times, implementing it gave me a chance to take a detour and understand UTF-8 encoding, multi-byte characters, the encoding of emoji with zero-width joiners, and the POSIX definition of a text file.
The code is structured under wordcount and uses Pipenv as the package manager. I chose it over Poetry to avoid the overhead of learning a new package/dependency manager.
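To make the difference between bytes and characters concrete, here is a minimal sketch. It is not the code from wc.py; the `counts` helper is a hypothetical name, and it assumes the input is valid UTF-8 and that splitting on Unicode whitespace is close enough to what wc does in a UTF-8 locale.

```python
# A family emoji is three emoji code points glued together by two
# zero-width joiners (U+200D), so one visible glyph is several characters.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

code_points = len(family)                 # 5 code points
utf8_bytes = len(family.encode("utf-8"))  # 4 + 3 + 4 + 3 + 4 = 18 bytes

# Accented and currency characters are multi-byte too.
e_acute_bytes = len("\u00e9".encode("utf-8"))  # 2 bytes, 1 character
euro_bytes = len("\u20ac".encode("utf-8"))     # 3 bytes, 1 character


def counts(data: bytes):
    """Return (lines, words, chars, bytes) for UTF-8 encoded data.

    Lines are counted as newline characters, which follows the POSIX idea
    that every line of a text file ends with a newline.
    """
    text = data.decode("utf-8")
    return data.count(b"\n"), len(text.split()), len(text), len(data)
```

With this helper, `counts(b"a\nb")` reports one line rather than two, because the last line has no terminating newline.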
./wc.py --help
usage: wc.py [-h] [-m] [-c] [-l] [-w] [-L] [--version] [-v] [infiles ...]
A program to count words (wc)
positional arguments:
infiles The name of the input files
options:
-h, --help show this help message and exit
-m, --chars Count the characters
-c, --bytes Count the bytes
-l, --lines Count the lines
-w, --word Count the words
-L, --max-line-length
print the maximum display width
--version output the version and exit
-v, --verbose Increase the verbosity
Checking the input text mentioned in the challenge step zero
The output from wc (GNU Coreutils) 8.32 on Ubuntu with (LC_NAME=sv_SE.UTF-8)
wc tests/testdata/test.txt
7145 58164 342147 tests/testdata/test.txt
The output from wc.py for the same test.txt:
./wc.py tests/testdata/test.txt
7145 58164 342147 tests/testdata/test.txt
The output for the various options is the same as that of the wc tool, and this is verified with both unit tests (pytest) and functional tests.
A sample run for stdin:
echo -n "TwoLines\n\n" | ./wc.py
2 1 10 <stdin>
There is only a single module, wc.py. It uses the argparse module to handle arguments, processes the file content, and prints the counts selected by the input arguments. The output function attempts to align with wc's layout, but that is mostly me playing with f-string formatting.
To run the tests, make sure the pipenv shell is activated, because the tests depend on pytest.
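The structure described above can be sketched roughly as follows. This is an illustration only and not the actual contents of wc.py: `build_parser` and `format_row` are hypothetical names, only a subset of the flags from the help text is shown, and the column width is an assumed default.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a subset of the flags shown in --help."""
    parser = argparse.ArgumentParser(
        prog="wc.py", description="A program to count words (wc)"
    )
    parser.add_argument("-m", "--chars", action="store_true", help="Count the characters")
    parser.add_argument("-c", "--bytes", action="store_true", help="Count the bytes")
    parser.add_argument("-l", "--lines", action="store_true", help="Count the lines")
    parser.add_argument("-w", "--word", action="store_true", help="Count the words")
    parser.add_argument("infiles", nargs="*", help="The name of the input files")
    return parser


def format_row(values, name, width=7):
    """Right-align each count in a fixed-width column, then append the name.

    The nested replacement field {width} lets the column width be chosen
    at runtime inside the f-string format spec.
    """
    cells = " ".join(f"{value:>{width}}" for value in values)
    return f"{cells} {name}"
```

For example, `build_parser().parse_args(["-l", "notes.txt"])` sets `lines` to True and `infiles` to `["notes.txt"]`, and `format_row([2, 1, 10], "<stdin>", width=2)` produces a row whose counts are right-aligned in two-character columns.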
pytest -vv
Check out the test cases to see the various inputs they use.
To verify more inputs, run test.sh, which uses the files in testdata to compare the results of wc.py with those of wc.
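The same kind of comparison can be sketched in Python. This is not what test.sh does internally; `parse_counts` and `same_counts` are hypothetical helpers, and the sketch assumes both tools are run without options and print `lines words bytes name` on a single line.

```python
import subprocess


def parse_counts(line: str):
    """Split one line of wc-style output into (lines, words, bytes, name)."""
    lines, words, size, name = line.split(None, 3)
    return int(lines), int(words), int(size), name.rstrip("\n")


def same_counts(path, ours="./wc.py", reference="wc"):
    """Run both tools on the same file and compare the three counts.

    Only the numbers are compared, so differences in column padding
    between the two tools are ignored.
    """
    results = []
    for tool in (reference, ours):
        completed = subprocess.run(
            [tool, path], capture_output=True, text=True, check=True
        )
        results.append(parse_counts(completed.stdout)[:3])
    return results[0] == results[1]
```

Calling `same_counts("tests/testdata/test.txt")` from the repository root would then return True when both tools agree on the counts.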