Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in >= v0.10.0 for spatial joins #359

Open
H-Plus-Time opened this issue Jul 5, 2024 · 3 comments
Open

Performance regression in >= v0.10.0 for spatial joins #359

H-Plus-Time opened this issue Jul 5, 2024 · 3 comments

Comments

@H-Plus-Time
Copy link

NB: The binding errors present in 0.9.0 -> 0.9.2 prevent this from being checked for those versions, so exactly when this started is a little fuzzy.

For a query along the lines of (note: for v0.8.1, the geom column was wkb_geometry::GEOMETRY):

SELECT l.id FROM locations l JOIN catchment_geojson gj
ON ST_CONTAINS(gj.geom, l.point);

where locations is ~1k points, and catchment_geojson is a single-row table of multipolygon geometries, execution time jumps from 0.02s to 22.83s.

With the addition of SET disabled_optimizers = 'extension';, execution time drops back down to that observed in v0.8.1 (with an identical physical plan). Semi-joins (appropriate in scenarios like this) also produce the same BNL-centric physical plan (as opposed to the NL Join + Filter plan). An inner join against an exploded (~280 polygons) version of the multipolygon brings total execution time down to 1.6s.

By the looks of the Range Join rewriting logic at https://github.com/duckdb/duckdb_spatial/blob/v1.0.0/spatial/src/spatial/core/optimizer_rules.cpp#L33 , the factors that produce this kind of perf degradation are:

  • <122k rows (duckdb's row group size, the minimum threshold for evaluating a filter predicate in parallel).
  • High false-positive for the bounding box range query (multi-polygons, highly concave polygons (e.g. a cross, which intersects with a tiny proportion of its bounding box)).

Profiling

Before (v0.8.1):

┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││        Total Time: 0.0148s        ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│      EXPLAIN_ANALYZE      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             0             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             id            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            202            │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│     BLOCKWISE_NL_JOIN     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │
│      st_contains(CAST     │
│(wkb_geometry AS GEOME...  ├──────────────┐
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │
│            202            │              │
│          (0.01s)          │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│         locations         ││     catchment_geojson     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           point           ││        wkb_geometry       │
│             id            ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││           EC: 0           │
│           EC: 0           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││             1             │
│            1000           ││          (0.00s)          │
│          (0.00s)          ││                           │
└───────────────────────────┘└───────────────────────────┘

After (v0.10.0):

┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE SELECT l.id   FROM locations l JOIN catchment_geojson gj ON ST_Contains(gj.geom, l.point);
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││         Total Time: 22.83s        ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│      RESULT_COLLECTOR     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             0             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             0             │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             id            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            202            │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│  ST_Contains(geom, point) │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          EC: 1000         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            202            │
│          (22.83s)         │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      NESTED_LOOP_JOIN     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │
│ ST_YMax(point) >= ST_YMin │
│           (geom)          │
│ ST_YMin(point) <= ST_YMax │
│           (geom)          │
│ ST_XMax(point) >= ST_XMin │
│           (geom)          ├──────────────┐
│ ST_XMin(point) <= ST_XMax │              │
│           (geom)          │              │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │
│          EC: 1000         │              │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │
│            1000           │              │
│          (0.00s)          │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│         locations         ││     catchment_geojson     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           point           ││            geom           │
│             id            ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││           EC: 1           │
│          EC: 1000         ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││             1             │
│            1000           ││          (0.00s)          │
│          (0.00s)          ││                           │
└───────────────────────────┘└───────────────────────────┘

Query plan diagnostics

Before (v0.8.1):

┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Unoptimized Logical Plan  ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             id            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│          ANY_JOIN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│      st_contains(CAST     ├──────────────┐
│(wkb_geometry AS GEOME...  │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││          SEQ_SCAN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│     catchment_geojson     ││         locations         │
└───────────────────────────┘└───────────────────────────┘

┌─────────────────────────────┐
│┌───────────────────────────┐│
││  Optimized Logical Plan   ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             id            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│          ANY_JOIN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│      st_contains(CAST     ├──────────────┐
│(wkb_geometry AS GEOME...  │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││          SEQ_SCAN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│     catchment_geojson     ││         locations         │
└───────────────────────────┘└───────────────────────────┘

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             id            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│     BLOCKWISE_NL_JOIN     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           ├──────────────┐
│      st_contains(CAST     │              │
│(wkb_geometry AS GEOME...  │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│     catchment_geojson     ││         locations         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        wkb_geometry       ││           point           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││             id            │
│           EC: 0           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││           EC: 0           │
└───────────────────────────┘└───────────────────────────┘

After (v0.10.0):

┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Unoptimized Logical Plan  ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             id            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│          ANY_JOIN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ├──────────────┐
│  ST_Contains(geom, point) │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││          SEQ_SCAN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│     catchment_geojson     ││         locations         │
└───────────────────────────┘└───────────────────────────┘

┌─────────────────────────────┐
│┌───────────────────────────┐│
││  Optimized Logical Plan   ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             id            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│  ST_Contains(geom, point) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      COMPARISON_JOIN      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │
│(ST_XMin(ST_Extent(point)) │
│<= ST_XMax(ST_Extent(geom))│
│             )             │
│(ST_XMax(ST_Extent(point)) │
│>= ST_XMin(ST_Extent(geom))├──────────────┐
│             )             │              │
│(ST_YMin(ST_Extent(point)) │              │
│<= ST_YMax(ST_Extent(geom))│              │
│             )             │              │
│(ST_YMax(ST_Extent(point)) │              │
│>= ST_YMin(ST_Extent(geom))│              │
│             )             │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││          SEQ_SCAN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│         locations         ││     catchment_geojson     │
└───────────────────────────┘└───────────────────────────┘

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             id            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│  ST_Contains(geom, point) │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│         EC: 1000          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      NESTED_LOOP_JOIN     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │
│ST_YMax(ST_Extent(point)) >│
│ = ST_YMin(ST_Extent(geom))│
│ST_YMin(ST_Extent(point)) <│
│ = ST_YMax(ST_Extent(geom))├──────────────┐
│ST_XMax(ST_Extent(point)) >│              │
│ = ST_XMin(ST_Extent(geom))│              │
│ST_XMin(ST_Extent(point)) <│              │
│ = ST_XMax(ST_Extent(geom))│              │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │
│         EC: 1000          │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│         locations         ││     catchment_geojson     │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           point           ││            geom           │
│             id            ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││           EC: 1           │
│         EC: 1000          ││                           │
└───────────────────────────┘└───────────────────────────┘
@Maxxen
Copy link
Member

Maxxen commented Jul 5, 2024

Hi! Thanks for reporting this issue!

The problem seem to be that the range rewrite does not produce a IE-Join like intended but still remains a NL-join. It could be that something changed duckdb side that made the planning differ in the latest version.

Ill investigate.

@Maxxen
Copy link
Member

Maxxen commented Jul 5, 2024

What d you observe if you set "SET prefer_range_joins=true"?

@H-Plus-Time
Copy link
Author

Mm, I noticed joins involving Polygons (still stored as Geometry, but presumably those are auto-cast to duckdb Polygon_2D internally) were correctly using IE-Joins (of course, much better hit rate given their bounding boxes were a fair bit closer to their convex hull), though still paired with a Filter(ST_Contains(...)).

No luck with SET prefer_range_joins=true, I still get an NL-join.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants