[DRAFT] - Test using re2 regex pkg in place of std regex #1542

Closed
wants to merge 7 commits into from


ahrav
Collaborator

ahrav commented Jul 24, 2023

Some benchmarks comparing the Go std lib regexp package vs. re2 vs. oniguruma.

re2 - https://github.com/wasilibs/go-re2
oniguruma - https://github.com/go-enry/go-oniguruma

old.txt -> go std lib
new.txt -> oniguruma
newest.txt -> re2 (too brain dead to name it better lol)
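
For readers who want to reproduce this kind of comparison, here is a minimal sketch of timing one regex engine on a synthetic input. It only covers the std lib `regexp` side; the token format, `keyPattern`, `sampleInput`, and `countMatches` are made-up names for illustration, not the benchmarks from this PR. Output files named like old.txt and new.txt are the sort of thing usually fed to a comparison tool such as benchstat, though the thread does not say which tool produced the screenshots.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// keyPattern is a hypothetical detector pattern, not one taken from this PR.
var keyPattern = regexp.MustCompile(`tok_[0-9]{16}`)

// sampleInput builds a synthetic haystack with exactly one token per line.
func sampleInput(lines int) string {
	var sb strings.Builder
	for i := 0; i < lines; i++ {
		fmt.Fprintf(&sb, "line %d: key=tok_%016d trailing text\n", i, i)
	}
	return sb.String()
}

// countMatches is the unit of work being timed.
func countMatches(re *regexp.Regexp, input string) int {
	return len(re.FindAllStringIndex(input, -1))
}

func main() {
	input := sampleInput(1000)

	// testing.Benchmark lets a plain program run a benchmark function
	// without a _test.go file.
	result := testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(len(input))) // report throughput in MB/s
		for i := 0; i < b.N; i++ {
			countMatches(keyPattern, input)
		}
	})
	fmt.Println(result.String(), result.MemString())
}
```

Swapping the engine under test would mean changing only how `keyPattern` is compiled, assuming the alternative package exposes the same method set.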

go vs oniguruma
Screenshot 2023-08-17 at 7 38 55 PM

go vs re2
Screenshot 2023-08-17 at 7 41 55 PM

oniguruma vs re2
Screenshot 2023-08-17 at 7 42 19 PM

@zricethezav
Collaborator

@ahrav have you seen https://github.com/go-enry/go-oniguruma? I played around with introducing this into gitleaks once upon a time and the results were very impressive. This was before keywords were introduced into the detection algorithm, so the results were more dramatic. Still, go-oniguruma might be worth investigating as it has better performance than re2 in most cases and can be dropped in, iirc.

@ahrav
Collaborator Author

ahrav commented Aug 17, 2023

Ohhhh, wow. I did not see this at all. Yeah, I can definitely give this a spin. Thanks for pointing it out.

@ahrav
Collaborator Author

ahrav commented Aug 18, 2023

@zricethezav added some benchmarks after I did some testing w/ oniguruma and re2.

@zricethezav
Collaborator

> @zricethezav added some benchmarks after I did some testing w/ oniguruma and re2.

@ahrav Based on what I'm seeing, it looks like re2 wins out, right? That's good news, since oniguruma would require some cgo flags if I'm not mistaken.

@ahrav
Collaborator Author

ahrav commented Aug 22, 2023

> > @zricethezav added some benchmarks after I did some testing w/ oniguruma and re2.
>
> @ahrav Based on what I'm seeing, it looks like re2 wins out, right? That's good news, since oniguruma would require some cgo flags if I'm not mistaken.

Yep, that's right. I had to set the CGO flags when testing oniguruma as well, which wasn't great. The re2 change is a pretty nice and easy drop-in replacement.

@dustin-decker
Contributor

Since it's API-compatible w/ Go's regexp, I think we could just consider using a go mod replace directive.

@ahrav
Collaborator Author

ahrav commented Aug 23, 2023

> Since it's API-compatible w/ Go's regexp, I think we could just consider using a go mod replace directive.

I don't think you can go mod replace a std lib pkg. I could be wrong though.
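
If a replace directive does turn out not to work for a std lib package, one alternative is to route every compile through a single function, so the engine is chosen in one place. The sketch below is only an illustration of that idea using the std lib; `Matcher`, `compile`, and `findSecrets` are hypothetical names, not code from this PR. It assumes the replacement package's compiled type offers the same two methods, which the API-compatibility claim above suggests but this snippet does not verify.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matcher is a hypothetical interface listing only the methods a detector needs.
// *regexp.Regexp satisfies it; an API-compatible engine is assumed to as well.
type Matcher interface {
	MatchString(s string) bool
	FindAllString(s string, n int) []string
}

// compile is the single place that picks the regex engine.
// Switching engines would mean changing only the call inside this function.
func compile(pattern string) (Matcher, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re, nil
}

// findSecrets depends on the interface, not on a concrete engine.
func findSecrets(m Matcher, text string) []string {
	return m.FindAllString(text, -1)
}

func main() {
	m, err := compile(`tok_[0-9]{4}`) // made-up token format
	if err != nil {
		panic(err)
	}
	fmt.Println(findSecrets(m, "a tok_1234 b tok_99 c tok_5678"))
}
```

An even smaller change, if the package really mirrors the std lib API, would be to alias the import in each file; that is left out here because it needs the third-party module to build.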

@ahrav
Collaborator Author

ahrav commented Aug 23, 2023

I do wonder: with this more efficient regex engine, could we explore a larger chunk size?
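
To make the chunk-size question concrete, here is a rough sketch of how chunk size affects both the number of regex calls and matches that straddle a chunk boundary. It is a toy model with no overlap between chunks; `splitChunks`, `scanChunks`, `tokenPattern`, and the sizes used are hypothetical and are not the project's actual chunking code or values.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenPattern is a made-up detector pattern for illustration.
var tokenPattern = regexp.MustCompile(`tok_[0-9]{4}`)

// splitChunks cuts data into pieces of at most size bytes, with no overlap.
func splitChunks(data string, size int) []string {
	if size <= 0 {
		return nil
	}
	var chunks []string
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

// scanChunks runs the regex once per chunk and returns the number of
// chunks scanned and the total number of matches found.
func scanChunks(re *regexp.Regexp, data string, size int) (chunks, matches int) {
	for _, c := range splitChunks(data, size) {
		chunks++
		matches += len(re.FindAllStringIndex(c, -1))
	}
	return chunks, matches
}

func main() {
	data := "aaaa tok_1234 bbbb"
	for _, size := range []int{8, 32} {
		c, m := scanChunks(tokenPattern, data, size)
		fmt.Printf("size=%d chunks=%d matches=%d\n", size, c, m)
	}
}
```

With the small size the token is split across two chunks and missed, while the larger size scans it in one call. Whether a larger size pays off in practice would depend on memory use and how the engine scales with input length, which would need measuring.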

@dustin-decker
Contributor

Closing because #2324

@dustin-decker dustin-decker deleted the use-re2-regex branch January 26, 2024 06:29