Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: gdalwarp 3.4.3 takes seconds, gdalwarp 3.5.0 and later takes 30+ minutes #10809

Open
pjonsson opened this issue Sep 16, 2024 · 14 comments · May be fixed by #10812
Open

perf: gdalwarp 3.4.3 takes seconds, gdalwarp 3.5.0 and later takes 30+ minutes #10809

pjonsson opened this issue Sep 16, 2024 · 14 comments · May be fixed by #10812
Assignees

Comments

@pjonsson
Copy link
Contributor

pjonsson commented Sep 16, 2024

What is the bug?

The command

 gdalwarp -q -geoloc -r cubic -of COG -co COMPRESS=DEFLATE -co NUM_THREADS=1 --config GDAL_NUM_THREADS 1 /tmp/tmp_xnrpustp/oa02.vrt /tmp/output_a2odmztn/oa02.tif

took seconds with GDAL 3.4.3, but takes 38+ CPU minutes with GDAL 3.9.2 when using the GDAL ubuntu-small Docker image. I don't have exact numbers for GDAL 3.5.x, but that was also measured in minutes rather than seconds. Around 15 months ago I bisected GDAL, and landed on #5520 where the performance changed. The PR mentions
"goes from 1.26 s to 3.52 s" when testing on a Sentinel 5P product, so I understand some slowdown was expected.

Performance numbers are on a Ryzen 5900x boosting to 4.55-4.65 GHz. System is not overheating, the fans are not audibly louder than when the system is idle. IO is to a fast SSD, but there is no IO on the system when the CPU is going at full speed.

The commands have the environment variable GDAL_CACHEMAX=256, and that value is tuned for the workload using GDAL 3.4.3.

The example is using files from the product S3B_OL_2_WFR____20230309T115352_20230309T115652_20230310T233800_0180_077_066_1800_MAR_O_NT_003.SEN3, but from what I have observed there is nothing unique with the mentioned product, the GDAL performance is similar for other OL_2_WFR___ products.

Steps to reproduce the issue

The VRT is created with gdal_translate -q -ot Float32 -unscale NETCDF:Oa02_reflectance.nc:Oa02_reflectance -co NUM_THREADS=1 --config GDAL_NUM_THREADS 1 /tmp/tmp_xnrpustp/oa02.vrt, and then modified based on https://gis.stackexchange.com/questions/103116/map-project-a-raster-having-separate-latitude-and-longitude-raster-bands.

The people who initially wrote the code that constructed the GDAL commands are gone, so I don't have any more information than that. Attaching the modified VRT, and the latitude/longitude VRTs that it uses.

vrts.zip

Versions and provenance

GDAL ubuntu-small Docker image. 3.4.3 is quick, 3.5, 3.6, 3.8 and 3.9 are slow.

Additional context

As a sidenote, the resulting GeoTiffs with 3.4.3 for the product mentioned are 3.5-5.5M, and with 3.9.2 they are 128K.

If there is a faster way to get GeoTiffs projected as EPSG:4326 for OL_2_WFR___ products, I'm happy to use that instead.

@jratike80
Copy link
Collaborator

jratike80 commented Sep 16, 2024

I might have a try if I knew where and how to get the data "S3B_OL_2_WFR____20230309T115352_20230309T115652_20230310T233800_0180_077_066_1800_MAR_O_NT_003.SEN3" or something similar.

@pjonsson
Copy link
Contributor Author

The data comes from EUMETSAT. A lot of things on their web pages require a login, but I'm pretty sure you can register for free at their login portal: https://eoportal.eumetsat.int/cas/login

If you have a EUMETSAT user already, you can download the data with their Python client (EUMDAC): https://user.eumetsat.int/resources/user-guides/eumetsat-data-access-client-eumdac-guide

The product is also available from Copernicus, but I think that also requires a user (but you can register for free). If you have a user, go to their browser, click on the half-invisible tab called "Search" in the upper left corner, enter the product name in the box and press the green search button at the bottom left side (you might need to scroll down). You should get a result like this:
image
and then you can click on the rightmost icon to download.

@jratike80
Copy link
Collaborator

I succeeded in downloading and running your command. Next time you could test all the steps yourself for making it as easy as possible for others to re-produce. I had to edit away some fixed paths from the VRT files before I could run the command, from lines like these:

<SourceFilename relativeToVRT="0">NETCDF:/tmp/tmpfs/local_fzg3hhra/S3B...
<mdi key="X_DATASET">/tmp/tmpfs/tmp_xnrpustp/longitude.vrt</mdi>

When I run you command I can confirm that it is slow. The worst is that the oa02.tif file seems to be empty with no valid pixels. That's why it is so small. I used --debug on and and I can see some suspicious lines on my screen, like:

GEOLOC: Ending backmap generation
WARP: At least one point failed after revert transform
WARP: Recompute out extent with CHECK_WITH_INVERT_PROJ=TRUE
WARP: At least one point failed after revert transform

The whole debug log as an attachment. Without knowing hardly anything about NetCDF I believe that the slowness is not the main problem, but the the process does not actually work at all.

debug_log.txt

@pjonsson
Copy link
Contributor Author

If "does not actually work at all" is taken at face value, I would have expected an error code to be returned from one of the GDAL commands. Am I misinterpreting what "does not actually work at all" means?

Something I just realized is that I'm using a local security-patched ubuntu-small image, and that has the same netCDF support that is in the ubuntu-full image. I think using the ubuntu-full image will be sufficient for reproducing this issue.

@jratike80
Copy link
Collaborator

Sorry, I meant that the command seems to work but the result is useless. It is an image with reasonable spatial bounds but it is totally filled by nodata (STATISTICS_VALID_PERCENT=0). Obviously it used to return meaningful data with old GDAL versions, so something must have been changed. Feels like GDAL tries now hard to find something that it never finds.

Corner Coordinates:
Upper Left  ( -32.0000000,  74.0000000) ( 32d 0' 0.00"W, 74d 0' 0.00"N)
Lower Left  ( -32.0000000,  59.0026822) ( 32d 0' 0.00"W, 59d 0' 9.66"N)
Upper Right (   5.9975298,  74.0000000) (  5d59'51.11"E, 74d 0' 0.00"N)
Lower Right (   5.9975298,  59.0026822) (  5d59'51.11"E, 59d 0' 9.66"N)
Center      ( -13.0012351,  66.5013411) ( 13d 0' 4.45"W, 66d30' 4.83"N)
Band 1 Block=512x512 Type=Float32, ColorInterp=Gray
  NoData Value=65535
  Overviews: 3221x1271, 1610x635, 805x317, 402x158
  Metadata:
    STATISTICS_VALID_PERCENT=0

@jratike80
Copy link
Collaborator

Let's check that I have modified your files correctly. Do you get the same output then I from gdalinfo? Is it the same with 3.5.0 and newer versions?

gdalinfo oa02.vrt -stats
Driver: VRT/Virtual Raster
Files: oa02.vrt
       NETCDF:S3B_OL_2_WFR____20230309T115352_20230309T115652_20230310T233800_0180_077_066_1800_MAR_O_NT_003.SEN3/Oa02_reflectance.nc:Oa02_reflectance
Size is 4865, 4090
Geolocation:
  LINE_OFFSET=0
  LINE_STEP=1
  PIXEL_OFFSET=0
  PIXEL_STEP=1
  SRS=GEOGCS["WGS 84",
                DATUM["WGS_1984",
                    SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],
                    AUTHORITY["EPSG","6326"]],
                PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],
                UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],
                AUTHORITY["EPSG","4326"]]

  X_BAND=1
  X_DATASET=longitude.vrt
  Y_BAND=1
  Y_DATASET=latitude.vrt
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0, 4090.0)
Upper Right ( 4865.0,    0.0)
Lower Right ( 4865.0, 4090.0)
Center      ( 2432.5, 2045.0)
Band 1 Block=1217x1023 Type=Float32, ColorInterp=Undefined
  Minimum=-0.192, Maximum=0.589, Mean=0.061, StdDev=0.068
  NoData Value=65535
  Metadata:
    STATISTICS_MINIMUM=-0.19229102134705
    STATISTICS_MAXIMUM=0.58924502134323
    STATISTICS_MEAN=0.061422025804079
    STATISTICS_STDDEV=0.067506731456606
    STATISTICS_VALID_PERCENT=12.43

@jratike80
Copy link
Collaborator

I installed GDAL 3.4.3 and it creates a TIFF with the same statistics than the source vrt. Debug log from this successful run attached.

debug_log_gdal_343.txt

@pjonsson
Copy link
Contributor Author

Do you get the same output then I from gdalinfo? Is it the same with 3.5.0 and newer versions?

I get identical values to you with both 3.4.3 and 3.9.2. The only difference is that my Metadata field is sorted alphabetically, so STATISTICS_MINIMUM is the third field rather than the first.

And compared to 3.4.3 and your output, my 3.9.2 has an additional "Min=.. Max" in its output:

Band 1 Block=1217x1023 Type=Float32, ColorInterp=Undefined
  Min=-0.192 Max=0.589
  Minimum=-0.192, Maximum=0.589, Mean=0.061, StdDev=0.068

@jratike80
Copy link
Collaborator

I fear I have done all that I can. The issue is real, it is not in the speed but it is something more fundamental. I hope that someone else is capable in finding what goes wrong (and edit the title to reflect that).

@pjonsson
Copy link
Contributor Author

Thank you for your help.

Based on your analysis, I agree that it's not a performance issue, but I have no idea what is going wrong, so I will leave the title as is for now.

I don't know if it matters, but this is the output of gdalinfo -stats oa02.tif processed with 3.4.3:

Driver: GTiff/GeoTIFF
Files: oa02.tif
Size is 6260, 2280
Coordinate System is:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-31.560815999999999,73.373125000000002)
Pixel Size = (0.005873524956157,-0.005873524956157)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=DEFLATE
  INTERLEAVE=BAND
  LAYOUT=COG
Corner Coordinates:
Upper Left  ( -31.5608160,  73.3731250) ( 31d33'38.94"W, 73d22'23.25"N)
Lower Left  ( -31.5608160,  59.9814881) ( 31d33'38.94"W, 59d58'53.36"N)
Upper Right (   5.2074502,  73.3731250) (  5d12'26.82"E, 73d22'23.25"N)
Lower Right (   5.2074502,  59.9814881) (  5d12'26.82"E, 59d58'53.36"N)
Center      ( -13.1766829,  66.6773065) ( 13d10'36.06"W, 66d40'38.30"N)
Band 1 Block=512x512 Type=Float32, ColorInterp=Gray
  Minimum=-0.211, Maximum=1.163, Mean=0.063, StdDev=0.064
  NoData Value=65535
  Overviews: 3130x1140, 1565x570, 782x285, 391x142
  Metadata:
    STATISTICS_MAXIMUM=1.1632422208786
    STATISTICS_MEAN=0.063151587373428
    STATISTICS_MINIMUM=-0.21090789139271
    STATISTICS_STDDEV=0.064036442691412
    STATISTICS_VALID_PERCENT=7.576

and this is with gdalinfo 3.4.3 on the result processed with 3.9.2:

Driver: GTiff/GeoTIFF
Files: oa02.tif
       oa02.tif.aux.xml
Size is 6443, 2543
Coordinate System is:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-32.000000000000000,74.000000000000000)
Pixel Size = (0.005897490274103,-0.005897490274103)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=DEFLATE
  INTERLEAVE=BAND
  LAYOUT=COG
Corner Coordinates:
Upper Left  ( -32.0000000,  74.0000000) ( 32d 0' 0.00"W, 74d 0' 0.00"N)
Lower Left  ( -32.0000000,  59.0026822) ( 32d 0' 0.00"W, 59d 0' 9.66"N)
Upper Right (   5.9975298,  74.0000000) (  5d59'51.11"E, 74d 0' 0.00"N)
Lower Right (   5.9975298,  59.0026822) (  5d59'51.11"E, 59d 0' 9.66"N)
Center      ( -13.0012351,  66.5013411) ( 13d 0' 4.45"W, 66d30' 4.83"N)
Band 1 Block=512x512 Type=Float32, ColorInterp=Gray
ERROR 1: oa02.tif, band 1: Failed to compute statistics, no valid pixels found in sampling.
  NoData Value=65535
  Overviews: 3221x1271, 1610x635, 805x317, 402x158
  Metadata:
    STATISTICS_VALID_PERCENT=0

@rouault rouault self-assigned this Sep 16, 2024
rouault added a commit to rouault/gdal that referenced this issue Sep 16, 2024
rouault added a commit to rouault/gdal that referenced this issue Sep 16, 2024
rouault added a commit to rouault/gdal that referenced this issue Sep 16, 2024
rouault added a commit to rouault/gdal that referenced this issue Sep 16, 2024
@rouault
Copy link
Member

rouault commented Sep 16, 2024

Fix(es) in #10812
Several things:

  • latitude.vrt and longitude.vrt are wrong. They use <VRTRasterBand dataType="Int32" must be changed to <VRTRasterBand dataType="Float32", otherwise the latitude and longitudes will be integer values...
  • you can workaround the issue by setting the GDAL_GEOLOC_USE_TEMP_DATASETS configuration option to NO. So by adding --config GDAL_GEOLOC_USE_TEMP_DATASETS NO on the gdalwarp command line.

rouault added a commit to rouault/gdal that referenced this issue Sep 16, 2024
rouault added a commit to rouault/gdal that referenced this issue Sep 16, 2024
rouault added a commit to rouault/gdal that referenced this issue Sep 16, 2024
rouault added a commit to rouault/gdal that referenced this issue Sep 16, 2024
@pjonsson
Copy link
Contributor Author

pjonsson commented Sep 16, 2024

you can workaround the issue by setting the GDAL_GEOLOC_USE_TEMP_DATASETS configuration option to NO. So by adding --config GDAL_GEOLOC_USE_TEMP_DATASETS NO on the gdalwarp command line.

Are you saying that if I upgrade to GDAL 3.10 when that is released, I can mask the issue of the misconfigured Int32 type for latitude/longitude by keeping things in memory/not using temp datasets?

I can update the generation of latitude/longitude to use Float32 going forward. I still need to understand the ramifications of the past though, the flag you mention was introduced in GDAL 3.5, did 3.4 always keep things in memory and therefore masked the issue?

@rouault
Copy link
Member

rouault commented Sep 16, 2024

Are you saying that if I upgrade to GDAL 3.10 when that is released, I can mask the issue of the misconfigured Int32 type for latitude/longitude by keeping things in memory/not using temp datasets?

no, the wrong declare VRT data type needs to fix it for any GDAL versions. Past versions could be more tolerant about a wrong data type for the declared VRT band data type if for example doing the following chain: Int32 (pixel acquision) -> Float64 (intermediate working type in VRT when applying the scaling) -> Int32 (VRT band data type) -> Float64 (data type used by GDAL geolocation mechanism). Past versions would skip the down cast to Int32, but this was wrong, and has been fixed in later versions. I don't expect any no bad things from declaring the correct data type in past GDAL versions.

the flag you mention was introduced in GDAL 3.5, did 3.4 always keep things in memory and therefore masked the issue?

Yes 3.4 and below always ingested all the geolocation array and its generated inverse backmap in memory. This was an issue for very large datasets when this didn't fit in RAM and prevented use of geolocation arrays in those cases.

@pjonsson
Copy link
Contributor Author

I noticed the 3.9-backport tag on the PR, so I'm looking forward to leaving GDAL 3.4.3 behind when GDAL 3.9.3 is released. Thank you for the incredibly quick fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants