Files versus string literals - I guess I just don't get it #337

sbeitzel · 2024-02-01T23:17:58Z

I first started using this library to have fun with some Advent of Code problems, and I noticed something weird. As I worked on my solutions, I'd first write a test case so I could debug my solution, and I'd provide the test data as a multiline string in the test case. For example:

let sampleData = """
1000
2000
3000

4000

5000
6000

7000
8000
9000

10000


"""

Then, once my code worked properly, I'd download the real data to a local file and invoke my solver. The solver would open the local file, read it into memory, and then fail. I found this quite frustrating, and ultimately worked around the problem by reading the local file from STDIN a line at a time: processing a single line consistently worked no matter what the source, but processing multiple lines did not.

Now, I'm working on a project where I'm trying to import a database dump and once again, I'm encountering this problem. I've managed to parse fields out of a line, and multiple lines out of a string literal, but the parsing fails spectacularly when reading from a file. It's as if the line endings are being consumed by the operating system, except it's even weirder than that.

In my test case, I've got code that loads the file from disk:

        guard let playersURL = Bundle(for: ArtooParseTests.self)
            .url(forResource: "players", withExtension: "txt")
        else {
            XCTFail("Unable to find players.txt")
            return
        }
        let playersData = try String(contentsOf: playersURL, encoding: .isoLatin1)
        let dataLines = playersData.split(separator: "\r\n", omittingEmptySubsequences: true)

Subsequently, I iterate over all the lines and run my "line parser" on each one. This works.

Then, if I pass the same data to a parser defined to process a bunch of lines, it fails. I can make it fail in a variety of ways, too. In the most straightforward case, instead of 47k lines I get only one line with...well, a lot of values. Beyond that naive approach, I get either infinite loops or a failure to consume input, depending on whether I use literals, just a OneOf, Many, or an Optionally for the line-ending separator.

import Parsing

// Line-ending parser: tries CRLF first, then a bare CR, then a bare LF.
let r2Eol = OneOf {
    "\r\n"
    "\r"
    "\n"
}

public struct R2Parser: Parser {
    public var body: some Parser<Substring, [[String]]> {
        Many {
            R2Line()
        } separator: {
            r2Eol
        } terminator: {
            End()
        }
    }
}

The most frustrating thing is trying to convert from a linguistic expression to a code expression. Clearly and provably, my line parser is working. So what I want is, "Zero or more lines, where each line is terminated by either an end of line or end of file."
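To make that sentence concrete without relying on any particular combinator, here is a minimal standard-library sketch of "zero or more lines, each terminated by end of line or end of input." The function name `consumeLines` is hypothetical and is not part of the parsing library. It uses `Character.isNewline`, which matches "\n", "\r", and "\r\n" as well as a few other Unicode line breaks, so it may be broader than what a real data format needs.

```swift
// Hypothetical helper (not from any library): consumes zero or more lines from
// the front of `input`. Each line ends at a newline Character or at end of input.
func consumeLines(_ input: inout Substring) -> [Substring] {
    var lines: [Substring] = []
    while !input.isEmpty {
        // Take everything up to (not including) the next newline Character.
        let line = input.prefix(while: { !$0.isNewline })
        input = input[line.endIndex...]
        lines.append(line)
        // Consume exactly one line terminator if present; otherwise we hit end of input.
        if let next = input.first, next.isNewline {
            input.removeFirst()
        }
    }
    return lines
}

var unixInput: Substring = "a,b\nc,d\n"   // LF endings, trailing newline
var dosInput: Substring = "a,b\r\nc,d"    // CRLF endings, no trailing newline
let unixLines = consumeLines(&unixInput)
let dosLines = consumeLines(&dosInput)
```

Every pass through the loop consumes at least one Character, so it cannot spin forever on non-empty input. Empty lines are kept as empty elements rather than skipped.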

Okay, that's the second most frustrating thing, because the most frustrating thing is that when I provide all the data as a multiline string literal, my parsers work as written. Which suggests that the problem is something to do with the invisible line ending sequence -- the difference between \r\n and \n.
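One detail that may be relevant to that suspicion: in Swift, "\r\n" is a single `Character` (one grapheme cluster), and it does not compare equal to "\n". The sketch below uses only the standard library to show the difference between text that came from a literal and text with CRLF endings. Whether this is what goes wrong here is an assumption, since `R2Line` is not shown; it would matter if the line parser stops on a test like `$0 != "\n"`.

```swift
// CRLF is one Character made of two Unicode scalars.
let crlf: Character = "\r\n"
let lf: Character = "\n"
let crlfCount = "\r\n".count                    // number of Characters
let scalarCount = "\r\n".unicodeScalars.count   // number of Unicode scalars
let equalsLF = (crlf == lf)                     // CRLF is not equal to LF
let crlfIsNewline = crlf.isNewline              // but it is still a newline

// A multiline literal has "\n" between lines; a CRLF file decoded into a
// String has "\r\n" between lines.
let fromLiteral = "1000\n2000"
let fromFile = "1000\r\n2000"

// Splitting on the LF Character only finds a separator in the literal.
let literalParts = fromLiteral.split(separator: lf)
let fileParts = fromFile.split(separator: lf)

// Splitting on any newline Character works for both.
let fileLines = fromFile.split(whereSeparator: \.isNewline)
```

Here `literalParts` has two elements, `fileParts` has one (the whole string, CRLF included), and `fileLines` has two. That matches the "one line with a lot of values" symptom, at least for a line parser that only looks for "\n".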
