Is there any workaround for split? #85

Open
i-am-the-slime opened this issue Aug 12, 2024 · 2 comments

Comments

@i-am-the-slime

Thanks for this nice library!

I'm using this library from another language that can compile to Golang.
I've now finally hit the case where I use a library that needs to split on a regex. You mention in the README that you're still working on this. Do you happen to have a draft or other unfinished code that can do some splitting (maybe slow, maybe wrong in edge cases)?

@dlclark
Owner

dlclark commented Aug 12, 2024

I had written a split function (based on C#) for the code-gen version of the library. I suspect it'll work with the main version as well, but there are probably edge cases:

// Split splits the given input string using the pattern and returns
// a slice of the parts. Count limits the number of matches to process.
// If Count is -1, then it will process the input fully.
// If Count is 0, returns nil. If Count is 1, returns the original input.
// The only expected error is a Timeout, if it's set.
//
// If capturing parentheses are used in the pattern, any captured
// text is included in the resulting slice.
// For example, with a pattern of "-", Split("a-b", -1) returns ["a", "b"],
// but with a pattern of "(-)", Split("a-b", -1) returns ["a", "-", "b"].
func (re *Regexp) Split(input string, count int) ([]string, error) {
	if count < -1 {
		return nil, errors.New("count too small")
	}
	if count == 0 {
		return nil, nil
	}
	if count == 1 {
		return []string{input}, nil
	}
	if count == -1 {
		// no limit; math.MaxInt (Go 1.17+) also compiles where int is 32 bits
		count = math.MaxInt
	}

	// iterate through the matches
	priorIndex := 0
	var retVal []string
	var txt []rune

	m, err := re.FindStringMatch(input)

	for ; m != nil && count > 0; m, err = re.FindNextMatch(m) {
		txt = m.text
		// if we have an m, we don't have an err
		// append the text between the previous match and this one
		retVal = append(retVal, string(txt[priorIndex:m.Index]))
		// append any capture groups, skipping group 0
		gs := m.Groups()
		for i := 1; i < len(gs); i++ {
			retVal = append(retVal, gs[i].String())
		}
		priorIndex = m.Index + m.Length
		count--
	}

	if err != nil {
		return nil, err
	}

	if txt == nil {
		// we never matched, return the original string
		return []string{input}, nil
	}

	// append our remainder
	retVal = append(retVal, string(txt[priorIndex:]))

	return retVal, nil
}

It uses the private m.text field, but I'm sure it could be written without it for your purposes. Let me know if you run into any issues. I could look at adding this to the main library version.
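As a rough sketch of how the private field could be avoided: the function above only uses it as the rune slice that m.Index and m.Length index into, so converting the input with []rune(input) should give the same slice. The version below is stdlib-only so it can run on its own. matchSpan and splitBySpans are hypothetical names, not part of the library. They stand in for values that would be collected from the public m.Index, m.Length and m.Groups() inside the FindStringMatch / FindNextMatch loop.

```go
package main

// matchSpan is a hypothetical stand-in for the public data on one match:
// Index and Length are rune offsets (as used by the Split above), and Groups
// holds the text of capture groups 1..n, i.e. gs[i].String() for i >= 1.
type matchSpan struct {
	Index  int
	Length int
	Groups []string
}

// splitBySpans mirrors the loop in the Split above, but slices a rune copy
// of the input instead of reading the private text field of the match.
func splitBySpans(input string, spans []matchSpan) []string {
	if len(spans) == 0 {
		// never matched: return the original string
		return []string{input}
	}

	runes := []rune(input) // offsets are in runes, not bytes
	parts := make([]string, 0, len(spans)+1)
	prior := 0

	for _, s := range spans {
		// text between the previous match and this one
		parts = append(parts, string(runes[prior:s.Index]))
		// captured text, if the pattern has capture groups
		parts = append(parts, s.Groups...)
		prior = s.Index + s.Length
	}

	// remainder after the last match
	return append(parts, string(runes[prior:]))
}
```

Slicing the string directly with these offsets would be wrong for non-ASCII input, since Go string indexes are byte offsets. That is the reason for the []rune conversion.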

@dlclark
Owner

dlclark commented Aug 15, 2024

@i-am-the-slime did this help?
