Make it possible to use RE2J instead of java.util.regexp #78

Open
alebastrov opened this issue Feb 9, 2023 · 5 comments

Comments

@alebastrov

As far as I can see, it works if we only change the imports, so we need to create a factory for Pattern/Matcher plus adapters.
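A minimal sketch of what such a factory and adapter layer might look like. All type names here (`RegexFactory`, `RegexPattern`, `RegexMatcher`, `JdkRegexFactory`) are hypothetical and do not exist in uap-java. Only the `java.util.regex` adapter is shown so that the example runs without extra dependencies; an RE2J adapter would presumably wrap `com.google.re2j.Pattern` and `com.google.re2j.Matcher` in the same way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical facade types; none of these names exist in uap-java.
interface RegexMatcher {
    boolean find();
    int groupCount();
    String group(int index);
}

interface RegexPattern {
    RegexMatcher matcher(CharSequence input);
}

interface RegexFactory {
    RegexPattern compile(String regex);
}

// Adapter backed by java.util.regex. An RE2J-backed factory would have the
// same shape, delegating to the RE2J Pattern and Matcher classes instead.
final class JdkRegexFactory implements RegexFactory {
    @Override
    public RegexPattern compile(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return input -> {
            Matcher m = pattern.matcher(input);
            return new RegexMatcher() {
                @Override public boolean find() { return m.find(); }
                @Override public int groupCount() { return m.groupCount(); }
                @Override public String group(int index) { return m.group(index); }
            };
        };
    }
}

class RegexFacadeDemo {
    public static void main(String[] args) {
        RegexFactory factory = new JdkRegexFactory();
        RegexMatcher m = factory.compile("Firefox/(\\d+)\\.(\\d+)")
                .matcher("Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0");
        if (m.find()) {
            // Prints the major and minor version captured by the two groups.
            System.out.println(m.group(1) + "." + m.group(2));
        }
    }
}
```

With this shape, the parser code would depend only on the three interfaces, and the choice of engine would be confined to whichever `RegexFactory` is constructed.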

@alebastrov
Author

Maven coordinates: com.google.re2j:re2j:1.7 (scope: runtime)

@alebastrov alebastrov changed the title Make ability to use RE2J instead of java.util.regexp Make it possible to use RE2J instead of java.util.regexp Feb 9, 2023
@bpossolo
Contributor

bpossolo commented Feb 26, 2023

can you provide more information about the purpose of this feature request?

are you concerned about the runtime performance of matching user-agent strings?
or is it about reducing start up time?
or does RE2J support regexp patterns that java.util.regex doesn't support?

I'm a little hesitant to make uap-java have a hard dependency on re2j since it would require all users to pull in another lib (which may have its own transitive dependencies... although honestly I haven't looked that deep to see if re2j depends on anything).

how would you envision this working?
would it be like a java service provider/implementation... whereby the user adds re2j to the classpath and the regexp engine is specified by name at runtime? that might get a little complicated because it would likely require a wrapper around re2j that follows the java service provider spec so it could be plugged in.
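One way the service-provider idea could look, as a rough sketch only. The `RegexEngine` interface, the engine names, and the `RegexEngines.byName` helper are all hypothetical. The sketch assumes that a separate wrapper jar would register an RE2J-backed `RegexEngine` under `META-INF/services`, and it falls back to `java.util.regex` when no provider with the requested name is found.

```java
import java.util.ServiceLoader;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical SPI. A provider jar would list its implementation class in
// META-INF/services/<fully qualified name of RegexEngine>.
interface RegexEngine {
    String name();

    // Simplified on purpose: returns a predicate that reports whether the
    // compiled pattern is found anywhere in the input.
    Predicate<String> compile(String regex);
}

// Default engine backed by java.util.regex, always available.
final class JdkRegexEngine implements RegexEngine {
    @Override public String name() { return "jdk"; }

    @Override public Predicate<String> compile(String regex) {
        return Pattern.compile(regex).asPredicate();
    }
}

final class RegexEngines {
    private RegexEngines() {}

    // Returns the provider registered under the requested name, or the JDK
    // engine when no such provider is on the classpath.
    static RegexEngine byName(String requested) {
        for (RegexEngine engine : ServiceLoader.load(RegexEngine.class)) {
            if (engine.name().equalsIgnoreCase(requested)) {
                return engine;
            }
        }
        return new JdkRegexEngine();
    }
}

class RegexEngineDemo {
    public static void main(String[] args) {
        // No RE2J provider is registered in this sketch, so this falls back to "jdk".
        RegexEngine engine = RegexEngines.byName("re2j");
        Predicate<String> chrome = engine.compile("Chrome/\\d+");
        System.out.println(engine.name() + " -> " + chrome.test("Mozilla/5.0 Chrome/110.0.0.0 Safari/537.36"));
    }
}
```

This keeps re2j an optional dependency: users who do not add a provider jar get the current behaviour unchanged.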

@alebastrov
Author

Hi
I'm concerned about the runtime performance of matching user-agent strings. The regular expression syntax accepted by RE2 is a subset of that accepted by PCRE, and I believe your regexes do not use any features that RE2 lacks. Unlike PCRE, RE2 has O(n) validation/search time (i.e. each symbol is checked only once). I think a facade interface over PCRE and RE2 will be enough.

The page https://swtch.com/~rsc/regexp/regexp3.html#caveats describes the features that are not supported:
lookahead and lookbehind assertions, backreferences, the atomic grouping operator (?>...), and the possessive quantifier ++.
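The constructs listed above can be looked for mechanically before switching engines. The following is a heuristic sketch, not a regex parser: the `Re2Compatibility` class is hypothetical, it scans the raw pattern text, and it does not understand character classes, so a class such as `[*+]` would be reported as a possessive quantifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; not part of uap-java or RE2J.
final class Re2Compatibility {
    // A backslash followed by any character, i.e. one escape sequence.
    private static final Pattern ESCAPE = Pattern.compile("\\\\.");
    // An odd-length run of backslashes followed by a digit 1-9 or "k<".
    private static final Pattern BACKREFERENCE =
            Pattern.compile("(?<!\\\\)(?:\\\\\\\\)*\\\\(?:[1-9]|k<)");
    private static final Pattern LOOKBEHIND = Pattern.compile("\\(\\?<[!=]");
    private static final Pattern LOOKAHEAD = Pattern.compile("\\(\\?[!=]");
    private static final Pattern ATOMIC_GROUP = Pattern.compile("\\(\\?>");
    private static final Pattern POSSESSIVE = Pattern.compile("[*+?}]\\+");

    private Re2Compatibility() {}

    // Returns the names of the unsupported constructs that appear in the regex.
    static List<String> unsupportedFeatures(String regex) {
        List<String> found = new ArrayList<>();
        if (BACKREFERENCE.matcher(regex).find()) {
            found.add("backreference");
        }
        // Replace escape sequences with a placeholder so that escaped
        // metacharacters such as \( or \+ are not mistaken for syntax.
        String unescaped = ESCAPE.matcher(regex).replaceAll("_");
        if (LOOKBEHIND.matcher(unescaped).find()) found.add("lookbehind");
        if (LOOKAHEAD.matcher(unescaped).find()) found.add("lookahead");
        if (ATOMIC_GROUP.matcher(unescaped).find()) found.add("atomic group");
        if (POSSESSIVE.matcher(unescaped).find()) found.add("possessive quantifier");
        return found;
    }
}

class Re2CompatibilityDemo {
    public static void main(String[] args) {
        System.out.println(Re2Compatibility.unsupportedFeatures("Chrome/(\\d+)\\.(\\d+)"));
        System.out.println(Re2Compatibility.unsupportedFeatures("(?<=Version/)\\d+"));
    }
}
```

Running a check like this over every pattern in the regexes file would give a quick indication of whether the existing patterns stay within the RE2 subset.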

The main motivation is that RE2 provides stronger guarantees on execution time than backtracking engines do, and enables high-level analyses that would be difficult or impossible with ad hoc implementations.

@alebastrov
Author

Hm
I see

Object.keys(regexes).forEach(function (parser) {
  suite(`no reverse lookup in ${parser}`, function () {
    regexes[parser].forEach(function (item) {
      test(item.regex, function () {
        if (/\(\?<[!=]/.test(item.regex)) {
          assert.ok(false, 'go parser does not support regex lookbehind. See https://github.com/google/re2/wiki/Syntax')
        }
        if (/\(\?[!=]/.test(item.regex)) {
          assert.ok(false, 'go parser does not support regex lookahead. See https://github.com/google/re2/wiki/Syntax')
        }
      })
    })
  })
})

Does it mean that RE2 is already implemented?

@bpossolo
Contributor

bpossolo commented Mar 8, 2023

Does it mean that RE2 is already implemented?

the code you referenced is a JavaScript unit test in the other repo, uap-core. I don’t know why they’re checking for entries that are unsupported by the Go runtime.
