
PostGIS 3, smarter download-osm, performance improvements

@nyurik released this 20 May 22:28

Summary

  • Migrated to PostGIS 3.0.1 and osml10n v2.5.8; added the GZIP extension
  • Improved download-osm tool
  • Docs and a script to quickly set up a Google Cloud testing environment
  • profile-pg-func, a new profiling tool to evaluate PostgreSQL function performance
  • SQL performance improvements
  • Removed unused Natural Earth tables

download-osm Improvements

Many download-osm tool improvements (#255) - @nyurik

  • Added an --output <file> parameter to set the location and name of the saved file. Use --force to overwrite an existing file.
  • Allow downloading an area or a URL without explicitly specifying the source:
      download-osm toronto      # searches Geofabrik, BBBike, and osm.fr
      download-osm https://...  # downloads from the URL
  • Auto-detection logic:

    • if the name starts with http:// or https://, try it as a URL
    • look through the Geofabrik catalog, matching the last portion of the name (e.g. michigan matches us/michigan); this is the same logic as before
    • look through the BBBike catalog (case-insensitive comparison, must be a full match)
    • check whether osm.fr returns HTTP 200 rather than 404
    • fail otherwise
  • Added a cacheable BBBike catalog (scraped from a single HTML page), allowing non-exact searches (download-osm bbbike toronto instead of Toronto) and catalog listing (download-osm list bbbike)
  • Better handling of mirrors that host only the latest planet file
  • Added a few mirrors
  • CHANGE: --make-dc now sets OSM_AREA_NAME to the full ID of the area, e.g. for Geofabrik it will be europe/monaco instead of monaco
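The auto-detection order described above can be illustrated with a short Python sketch. It is a hypothetical illustration, not download-osm's actual code: the function name detect_source, its parameters, passing the catalogs in as plain lists, supplying the osm.fr check as a callable that returns True on HTTP 200, and rejecting ambiguous Geofabrik matches are all assumptions made so the lookup order can run offline.

```python
from typing import Callable, Iterable, Tuple


def detect_source(
    name: str,
    geofabrik_ids: Iterable[str],
    bbbike_names: Iterable[str],
    osmfr_exists: Callable[[str], bool],
) -> Tuple[str, str]:
    """Hypothetical sketch: resolve a name or URL to (source, area_id)."""
    # 1. Anything starting with http:// or https:// is treated as a URL.
    if name.startswith(("http://", "https://")):
        return "url", name

    # 2. Geofabrik: compare against the last portion of each catalog ID,
    #    so "michigan" matches ".../us/michigan".
    matches = [gid for gid in geofabrik_ids if gid.rsplit("/", 1)[-1] == name]
    if len(matches) > 1:
        # Assumption: an ambiguous name is rejected rather than guessed.
        raise ValueError(f"{name!r} is ambiguous: {matches}")
    if matches:
        # The full ID is returned, in line with the --make-dc change above.
        return "geofabrik", matches[0]

    # 3. BBBike: case-insensitive, but the whole name must match.
    for bb_name in bbbike_names:
        if bb_name.casefold() == name.casefold():
            return "bbbike", bb_name

    # 4. osm.fr: accepted if the probe reports HTTP 200 (not 404).
    if osmfr_exists(name):
        return "osm.fr", name

    # 5. Nothing matched.
    raise ValueError(f"Could not find area {name!r} in any source")
```

Keeping the catalogs and the HTTP probe outside the function makes the lookup order easy to test without network access.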

New profile-pg-func benchmarking utility

Add profile-pg-func to evaluate PG function speed (#263) - @nyurik

profile-pg-func is a micro-benchmarking tool that calls each given function thousands of times over several runs, discarding the slowest and fastest results. Use it to compare the relative performance of alternative function implementations.
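The "discard the slowest and fastest results" step amounts to a trimmed mean. The Python helper below is a sketch under stated assumptions: the name trimmed_mean, the reading of --trim as "drop this many runs from each end", and averaging whatever remains are assumptions, not taken from profile-pg-func's source.

```python
from statistics import mean
from typing import Sequence


def trimmed_mean(run_times: Sequence[float], trim: int) -> float:
    """Hypothetical sketch: average run times after dropping `trim` runs from each end."""
    if trim < 0:
        raise ValueError("trim must be non-negative")
    if len(run_times) <= 2 * trim:
        raise ValueError("need more runs than 2 * trim")
    ordered = sorted(run_times)
    # Slice with an explicit end index so trim == 0 keeps every run.
    kept = ordered[trim:len(ordered) - trim]
    return mean(kept)
```

With five runs of 1.30, 1.10, 1.20, 5.00 and 0.40 seconds and trim=1, the 0.40 and 5.00 outliers are dropped and the result is about 1.2 seconds.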

profile-pg-func [--file <file>]... <func>... [--raw]
                [--calls <calls>] [--runs <runs>] [--trim <trim>]
                [--verbose] [--pghost=<host>] [--pgport=<port>] [--dbname=<db>]
                [--user=<user>] [--password=<password>]

<func>        A SQL function call that should be tested, e.g. 'random()'.
              The func code can use the call_idx column; it will be set to 1..{calls}
              If --raw is used, the <func> must be a complete query.
              These magical values will be substituted in <func> before running:
              * {calls} will be replaced with --calls value
              * {run_idx} will be replaced with the current RUN number (1..{runs})
              * {random_geopoint} will be replaced with a function call to generate
                a random geo point.
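The placeholder substitution described in the usage text can be sketched in Python as follows. This is hypothetical: the name build_query, its parameters, and especially the way a non-raw <func> is wrapped (one call per call_idx produced by PostgreSQL's generate_series, counted in an outer SELECT) are assumptions for illustration, not the query profile-pg-func actually generates.

```python
def build_query(
    func: str,
    calls: int,
    run_idx: int,
    random_geopoint_sql: str,
    raw: bool = False,
) -> str:
    """Hypothetical sketch: substitute the documented placeholders for one run."""
    sql = (
        func.replace("{calls}", str(calls))
        .replace("{run_idx}", str(run_idx))
        .replace("{random_geopoint}", random_geopoint_sql)
    )
    if raw:
        # With --raw, <func> is already a complete query.
        return sql
    # Assumption: evaluate the call once per call_idx in 1..calls.
    return (
        "SELECT count(*) FROM ("
        f"SELECT {sql} FROM generate_series(1, {calls}) AS call_idx"
        ") AS t"
    )
```

For example, build_query("random()", 1000, 1, "...") yields a query that evaluates random() 1000 times. A random_geopoint_sql value such as ST_SetSRID(ST_MakePoint(random() * 360 - 180, random() * 170 - 85), 4326) would be one hypothetical way to produce a random point.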

Remove unused Natural Earth tables

  • Shrank the import-data and postgis-preloaded images by removing all Natural Earth tables that OMT does not use (#262) - @nyurik
    • ne_10m_admin_0_boundary_lines_disputed_areas
    • ne_10m_rivers_europe
    • ne_10m_rivers_north_america
    • ne_10m_roads
    • ne_10m_roads_north_america
    • ne_10m_urban_areas
    • ne_50m_admin_0_breakaway_disputed_areas_scale_rank
    • ne_50m_admin_1_states_provinces
  • Removed the unneeded clean-natural-earth.sh script from the resulting Docker images

Other Improvements

Bug Fixes

  • Download the state file from Geofabrik whether or not the name matches exactly (#251) - @frodrigo