Skip to content

Commit

Permalink
i/prompting: implement path pattern precedence (canonical#13868)
Browse files Browse the repository at this point in the history
* i/prompting: implement path pattern precedence

When a path matches multiple patterns, the most specific of those
patterns must be selected so that the associated rule can take
precedence.

When comparing the precedence of path patterns, those patterns cannot
contain groups, since it is necessary to know which variants of the
pattern were matched against the path in question. Thus, path pattern
precedence must only be computed between patterns which have already
been rendered.

In general, at the first character where two patterns differ, the one
which is more restrictive has precedence (i.e. '/' over '?', '?' over
'*'). Less restrictive patterns are repeatedly pruned until a single
most restrictive pattern remains. That pattern has precedence.

One property which emerges from this precedence system is that the
pattern which most specifically restricts the location of matching paths
takes precedence. For example, the following patterns all match
`/foo/bar/baz/myfile.txt` (though this is not an exhaustive list) and
are sorted from highest to lowest precedence:

1.  /foo/bar/baz/myfile.txt
2.  /foo/bar/baz/m?file.*
3.  /foo/bar/baz/m*file.txt
4.  /foo/bar/baz/m*file*
5.  /foo/bar/baz/*
6.  /foo/bar/*/myfile.txt
7.  /foo/bar/*/myfile*
8.  /foo/bar/*/*
9.  /foo/bar/**/baz/myfile.txt
10. /foo/bar/**/myfile.txt
11. /foo/bar/**/baz/*.txt
12. /foo/bar/**/baz/*
13. /foo/bar/**
14. /foo/b?r/baz/myfile.txt
15. /foo/?*/baz/myfile.txt
16. /foo/?*/**
17. /foo/*?/baz/myfile.txt
18. /foo/*/baz/myfile.txt
19. /foo/*/baz/*
20. /foo/*/*/*
21. /foo/*/**/baz/myfile.txt
22. /foo/**/baz/myfile.txt
23. /foo/**/baz/**
24. /**/myfile.txt
25. /**

Signed-off-by: Oliver Calder <[email protected]>

i/prompting: add tests for precedence between example patterns

Signed-off-by: Oliver Calder <[email protected]>

* i/prompting/patterns: add pattern variant type for precedence comparison

After a path pattern is rendered into variants, parse each variant into
its constituent components so that precedence between any variants can
be decided based on the order and occurrence of those components.

For example, `/foo/*/bar/b?z/**` has precedence over `/foo/*/b?r/baz/**`.

There are several types of scenarios which make it difficult to compute
precedence in all cases using a simple mechanism:

- `/foo/` > `/foo` > `/foo/**`
- `/foo/**/bar` > `/foo/**/` > `/foo/**`
- `/foo/bar/` > `/foo/b*r/`
- `/foo/**/x/b*r` > `/foo/**/bar/`
- `/foo/bar/**` > `/foo/b?r/**/baz` > `/foo/b*r/baz`

The current implementation fails to address all these scenarios, namely
because it relies on the length of literals (to catch `/foo/bar/` vs
`/foo/b*r/`, for example), but this falls apart for literals which occur
after a previous `/**/`, since a shorter literal may match an earlier
component of a hypothetical path.

Additionally, we cannot always compute precedence between two variants:
Both `/foo/**/bar/**` and `/foo/**/baz/**` match `/foo/bar/baz`, but
nothing about the order or length of components can tell us which should
have precedence. In this case, we would likely say the former has
precedence, since `bar` matches earlier in the path, but if the path
were `/foo/baz/bar`, we would likely say the latter has precedence.

Thus, we must use the path against which the matches were made when
computing precedence between pattern variants.

Signed-off-by: Oliver Calder <[email protected]>

* i/prompting/patterns: compute precedence based on matching path

With knowledge of the path which matched every pattern variant we're
comparing, we can compute which of those variants has the highest
precedence.

For example, the precedence of the following pattern variants cannot be
computed a priori:

- /foo/**/fizz/**
- /foo/**/buzz/**

However, with the matching path "/foo/bar/fizz/buzz", we can say that
"/foo/**/fizz/**" has precedence, since "fizz" matches earlier in the
path than "buzz". If instead the path were "/foo/bar/buzz/fizz", then
"/foo/**/buzz/**" would have precedence, for the same reason.

To compute this, we add a regular expression snippet for every
component, concatenate and compile them into a single regex for the
pattern variant, and match it against the path when computing
precedence, keeping track of submatches. As we iterate through the
components of the pattern variant, we also iterate through the
submatches, and we are thus able to tell more about how many bytes each
component matched. For variable-length component types (e.g. globstars
and doublestars), matching fewer bytes yields higher precedence.
Conversely, for literals, matching more bytes yields higher precedence,
as this implies that the literal itself is longer and thus more
specific, as literals may be followed by variable-length components.

This also helps reliable compute less-pathological cases, such as the
following:

- /foo/**/myfile*/**
- /foo/**/bar/myfile.txt

We may be able to tell that the latter has precedence, but a
deterministic algorithm to compute this is difficult to write and reason
about. By including a matching path, such as "/foo/bar/myfile.txt", we
are able to see that the "/**/" in the former pattern matches more
characters than that of the latter, and thus is a less precise match,
and has lower precedence.

Signed-off-by: Oliver Calder <[email protected]>

* i/prompting/patterns: use component submatch instead of storing length

Signed-off-by: Oliver Calder <[email protected]>

* i/prompting/patterns: fix bugs around handling doublestars and escaped runes

Signed-off-by: Oliver Calder <[email protected]>

* i/prompting/patterns: stack allocate pattern variants and improve documentation

Signed-off-by: Oliver Calder <[email protected]>

* i/prompting/patterns: add test case and improve robustness

Signed-off-by: Oliver Calder <[email protected]>

* i/prompting/patterns: fix path pattern json marshalling

Signed-off-by: Oliver Calder <[email protected]>

* i/prompting/patterns: improve documentation and unexport parsePatternVariant

Signed-off-by: Oliver Calder <[email protected]>

* i/prompting/patterns: add documentation and test cases

Signed-off-by: Oliver Calder <[email protected]>

---------

Signed-off-by: Oliver Calder <[email protected]>
  • Loading branch information
olivercalder authored Aug 5, 2024
1 parent 17c48f7 commit fbfd725
Show file tree
Hide file tree
Showing 7 changed files with 1,794 additions and 50 deletions.
22 changes: 22 additions & 0 deletions interfaces/prompting/patterns/export_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
// -*- Mode: Go; indent-tabs-mode: t -*-

/*
* Copyright (C) 2024 Canonical Ltd
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 3 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
*/

package patterns

var ParsePatternVariant = parsePatternVariant
74 changes: 35 additions & 39 deletions interfaces/prompting/patterns/patterns.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,13 +21,15 @@ package patterns

import (
"encoding/json"
"errors"
"fmt"
"regexp"
"strings"

doublestar "github.com/bmatcuk/doublestar/v4"
)

var ErrNoPatterns = errors.New("cannot establish precedence: no patterns given")

// Limit the number of expanded path patterns for a particular pattern.
// When fully expanded, the number of patterns for a given unexpanded pattern
// may not exceed this limit.
Expand Down Expand Up @@ -76,7 +78,7 @@ func (p *PathPattern) Match(path string) (bool, error) {

// MarshalJSON implements json.Marshaller for PathPattern.
func (p *PathPattern) MarshalJSON() ([]byte, error) {
return []byte(p.original), nil
return json.Marshal(p.original)
}

// UnmarshalJSON implements json.Unmarshaller for PathPattern.
Expand All @@ -95,46 +97,13 @@ func (p *PathPattern) NumVariants() int {
}

// RenderAllVariants enumerates every alternative for each group in the path
// pattern and renders each one into a string which is then passed into the
// given observe closure, along with the index of the variant.
// pattern and renders each one into a PatternVariant which is then passed into
// the given observe closure, along with the index of the variant.
//
// The given observe closure should perform some action with the rendered
// variant, such as adding it to a data structure.
func (p *PathPattern) RenderAllVariants(observe func(index int, variant string)) {
cleanThenObserve := func(i int, variant string) {
cleaned := cleanPattern(variant)
observe(i, cleaned)
}
renderAllVariants(p.renderTree, cleanThenObserve)
}

var (
duplicateSlashes = regexp.MustCompile(`(^|[^\\])/+`)
charsDoublestar = regexp.MustCompile(`([^/\\])\*\*+`)
doublestarChars = regexp.MustCompile(`([^\\])\*\*+([^/])`)
duplicateDoublestar = regexp.MustCompile(`/\*\*(/\*\*)+`) // relies on charsDoublestar running first
starsAnyMaybeStars = regexp.MustCompile(`([^\\])\*+(\?\**)+`)
)

func cleanPattern(pattern string) string {
pattern = duplicateSlashes.ReplaceAllString(pattern, `${1}/`)
pattern = charsDoublestar.ReplaceAllString(pattern, `${1}*`)
pattern = doublestarChars.ReplaceAllString(pattern, `${1}*${2}`)
pattern = duplicateDoublestar.ReplaceAllString(pattern, `/**`)
pattern = starsAnyMaybeStars.ReplaceAllStringFunc(pattern, func(s string) string {
deleteStars := func(r rune) rune {
if r == '*' {
return -1
}
return r
}
return strings.Map(deleteStars, s) + "*"
})
if strings.HasSuffix(pattern, "/**/*") {
// Strip trailing "/*" from suffix
return pattern[:len(pattern)-len("/*")]
}
return pattern
func (p *PathPattern) RenderAllVariants(observe func(index int, variant PatternVariant)) {
renderAllVariants(p.renderTree, observe)
}

// PathPatternMatches returns true if the given pattern matches the given path.
Expand Down Expand Up @@ -173,3 +142,30 @@ func PathPatternMatches(pattern string, path string) (bool, error) {
// match paths like `/foo/`.
return doublestar.Match(pattern+"/", path)
}

// HighestPrecedencePattern determines which of the given path patterns is the
// most specific, and thus has the highest precedence.
//
// Precedence is only defined between patterns which match the same path.
// Precedence is determined according to which pattern places the earliest and
// greatest restriction on the path.
func HighestPrecedencePattern(patterns []PatternVariant, matchingPath string) (PatternVariant, error) {
switch len(patterns) {
case 0:
return PatternVariant{}, ErrNoPatterns
case 1:
return patterns[0], nil
}

currHighest := patterns[0]
for _, contender := range patterns[1:] {
result, err := currHighest.Compare(contender, matchingPath)
if err != nil {
return PatternVariant{}, err
}
if result < 0 {
currHighest = contender
}
}
return currHighest, nil
}
Loading

0 comments on commit fbfd725

Please sign in to comment.