Skip to content

Diâmetro de referência e simplificação das geometrias de jurisdições

Peter edited this page Oct 2, 2022 · 7 revisions

Diâmetro de referência

Pensando que a relação entre perímetro da forma e sua "diagonal" é da ordem de 3 ou mais, a query abaixo permite avaliar qual o fator que melhor aproxima a raiz da área com o fator do perímetro:

SELECT count(*) n FROM optim.jurisdiction_geom where isolabel_ext like '%-%-%'; -- 8781

-- Find perim3_km factor that minimizes d_median, d_avg and d_dev; and avoid negative d_min or extreme d_max
SELECT round(PERCENTILE_DISC(0.5) within group (order by abs(d)) ) as d_median,
       round(avg(abs(d))) as d_avg, round(stddev_pop(abs(d))) as d_dev,
       max(abs(d)) as d_max, min(abs(d)) as d_min, min(d) as signed_min,
       count(*) filter (where d<0) as signed_n
FROM (
 SELECT 200.0*(perim3_km-km2_to_km)/(perim3_km+km2_to_km) AS d -- percentual difference
 FROM (
   select *, round(sqrt(km2),2)::float as km2_to_km  -- query for visual check
   from (
     select osm_id, isolabel_ext,  st_area(geom,true)/1000000.0 as km2,
            st_perimeter(geom,true)/3500.0 as perim3_km  -- tested:  3000, 3500, 4000 (ok), 4500 (generated negative d_min and more d_dev)
     from optim.jurisdiction_geom where isolabel_ext like '%-%-%'
   ) t1
 ) t2
)t3;  -- AVG of tested permeters: 66.3, 52.4, 28.6

Parece então razoável, depois dessa aproximação de ordem de grandeza, fazer a média entre os dois, quando se tratando de geometrias de jurisdições:
diam_ref = (SQRT(area)+perimetro/3.5)/2.0

Resultados, usando 8781 geometrias de municípios de optim.jurisdiction_geom. Valores das diferenças percentuais execeto pelo fator:

perim_factor d_median d_avg d_dev d_max d_min signed_min
3.0 65 66 17 167.80 16.66 16.66
3.5 51 52 18 162.93 1.32 1.32
4.0 38 40 19 158.19 0.0294 -12.05
4.5 27 29 18 153.57 0.0101 -23.70

A quantidade de negativos foi alta, 439, no fator 4.5. Como são esperadas mais geometrias e o é desvio alto, o ajuste é grosseiro, quantizado em 0.5.

Simplificação da geometria para uso em interface SVG

A simplificação é realizada em duas etapas, a primeira mais forte com ST_SimplifyPreserveTopology e a segunda "grudando dobras ilhas na escala menor", através de ST_Buffer(+ depois -), depois ST_SimplifyVW para remover redundâncias dentro da mesma escala.

Nota: segundo referências 1 e 2 a suavisação por buffer não seria necessária quando usando algorítmo de Visvalingam–Whyatt. Sugere-se a realização de testes e comparações antes de eliminar o buffer.

-- DROP FUNCTION ST_viz_simplify_noise;
CREATE or replace FUNCTION ST_viz_simplify_noise(  -- simplify by buffer-eroding 
    p_geom      geometry,             -- input polygon
    p_radius    float,                -- buffer radius
    p_VWfactor float   DEFAULT 0.1,    -- reduce buffer radius by a factor, to use as tolerance in ST_SimplifyVW
    p_mitre    boolean DEFAULT true    -- false for rounded, or true for not. 
) RETURNS geometry AS $f$
   SELECT ST_SimplifyVW(
      ST_Buffer(ST_Buffer(p_geom,p_radius,mitre),-p_radius,mitre),
      p_radius*p_VWfactor
   ) FROM (SELECT CASE WHEN p_mitre THEN 'join=mitre' ELSE '' END) t(mitre)
$f$ LANGUAGE SQL IMMUTABLE;

CREATE or replace FUNCTION ST_viz_simplify_noisebefore(  -- simplify for VIsualiZation 
    p_geom      geometry,               -- input polygon
    p_both        boolean DEFAULT true,   -- 3state flag: NULL= only noise, true = both, false= only main
    p_factor      float   DEFAULT 0.1,       -- 1 = no reduction, 0.5 is 50% to tolerance control (100% default)
    p_noiseFactor float   DEFAULT 0.01    -- second factor, to smooth "noise", can be 100%, 10% or 1%.
) RETURNS geometry AS $f$
  WITH   diam_ref AS (
    SELECT *, d*fract AS tolerance 
    -- area metric of the SRID, not meters!
    FROM ( SELECT ( SQRT(ST_Area(p_geom)) + ST_Perimeter(p_geom)/3.5 )/2.0  AS  d,
           p_factor*0.1 AS fract -- fraction metric
    ) t0
  )
  SELECT CASE WHEN p_both IS NOT NULL THEN ST_SimplifyPreserveTopology(geom,tolerance) ELSE geom END
  FROM (
    SELECT CASE   -- noise reducer:
       WHEN  p_both IS NULL OR p_both THEN ST_viz_simplify_noise(p_geom, tolerance*p_noiseFactor, 0.1)
       ELSE p_geom          -- no noise reduction
     END as geom,
     tolerance
    FROM diam_ref
  ) t1
$f$ LANGUAGE SQL IMMUTABLE;

CREATE or replace FUNCTION ST_viz_simplify_noiseafter(  -- simplify for VIsualiZation 
    p_geom        geometry,               -- input polygon
    p_both        boolean DEFAULT true,   -- 3state flag: NULL= only noise, true = both, false= only main
    p_factor      float   DEFAULT 0.1,       -- 1 = no reduction, 0.5 is 50% to tolerance control (100% default)
    p_noiseFactor float   DEFAULT 0.01    -- second factor, to smooth "noise", can be 100%, 10% or 1%.
) RETURNS geometry AS $f$
  WITH   diam_ref AS (
    SELECT *, d*fract AS tolerance 
    -- area metric of the SRID, not meters!
    FROM ( SELECT ( SQRT(ST_Area(p_geom)) + ST_Perimeter(p_geom)/3.5 )/2.0  AS  d,
           p_factor*0.1 AS fract -- fraction metric
    ) t0
  )
  SELECT CASE   -- noise reducer:
     WHEN p_both IS NULL OR p_both THEN ST_viz_simplify_noise(geom, tolerance*p_noiseFactor, 0.1)
     ELSE geom  -- no noise reduction
    END
  FROM (
    SELECT CASE WHEN p_both IS NOT NULL THEN ST_SimplifyPreserveTopology(p_geom,tolerance) ELSE p_geom END as geom,
           tolerance
    FROM diam_ref
  ) t1
$f$ LANGUAGE SQL IMMUTABLE;

---------------------------------
CREATE TABLE lix AS
  SELECT * FROM optim.jurisdiction_geom TABLESAMPLE BERNOULLI(0.8) where isolabel_ext like '%-%-%'
;
SELECT type, PERCENTILE_DISC(0.5) within group (order by abs(dif)) AS dif_absmdn,
       round(AVG(abs(dif)),4) as dif_absavg, min(abs(dif)) as dif_absmin, Max(abs(dif)) as dif_absmax,
       min(dif) as dif_min, Max(dif) as dif_max, count(*) as n,
       round(AVG(n_pts)) as n_pts, round(AVG(n_newpts)) as n_newpts, round(AVG(100*n_newpts/n_pts)) as perc_redux_avg
FROM (
 select  type, round(2*(km2-simple_km2)/(km2+simple_km2), 4) as dif, n_newpts, n_pts
 from (
  SELECT 'before' AS type,
         isolabel_ext, round(st_area(geom,true)/1000000.0,2) as km2,
         round(st_area(newgeom,true)/1000000.0,2) as simple_km2,
         ST_NPoints(newgeom) as n_newpts, ST_NPoints(geom) as n_pts 
  FROM (SELECT *,ST_viz_simplify_noiseafter(geom,true,0.5) newgeom FROM lix) l1
  UNION ALL
  SELECT 'after' AS type,
         isolabel_ext, round(st_area(geom,true)/1000000.0,2) as km2,
         round(st_area(newgeom,true)/1000000.0,2) as simple_km2,
         ST_NPoints(newgeom) as n_newpts, ST_NPoints(geom) as n_pts 
  FROM (SELECT *,ST_viz_simplify_noiseafter(geom,true,0.5) newgeom FROM lix) l2
  UNION ALL
  SELECT 'main' AS type,
         isolabel_ext, round(st_area(geom,true)/1000000.0,2) as km2,
         round(st_area(newgeom,true)/1000000.0,2) as simple_km2,
         ST_NPoints(newgeom) as n_newpts, ST_NPoints(geom) as n_pts 
  FROM (SELECT *,ST_viz_simplify_noiseafter(geom,false,0.5) newgeom FROM lix) l3
  UNION ALL
  SELECT 'noise' AS type,
         isolabel_ext, round(st_area(geom,true)/1000000.0,2) as km2,
         round(st_area(newgeom,true)/1000000.0,2) as simple_km2,
         ST_NPoints(newgeom) as n_newpts, ST_NPoints(geom) as n_pts 
  FROM (SELECT *,ST_viz_simplify_noiseafter(geom,NULL,0.5) newgeom FROM lix) l4
 ) t
) t2 GROUP BY type ORDER BY type;

With p_factor=0.1, the default, result in minor area changes and reduction of points to 15% only:

type dif_absmdn dif_absavg dif_absmin dif_absmax dif_min dif_max n n_pts n_newpts perc_red
after 0.0015 0.0022 0.0000 0.0153 -0.0080 0.0153 62 1904 60 14
before 0.0015 0.0022 0.0000 0.0153 -0.0080 0.0153 62 1904 60 14
main 0.0016 0.0022 0.0000 0.0154 -0.0079 0.0154 62 1904 61 14
noise 0.0001 0.0002 0.0000 0.0011 -0.0011 0.0007 62 1904 192 27

With p_factor=0.01, no area changes but only 40% reduction of points:

type dif_absmdn dif_absavg dif_absmin dif_absmax dif_min dif_max n n_pts n_newpts perc_red
after 0.0001 0.0002 0.0000 0.0032 -0.0032 0.0013 62 1904 324 39
before 0.0001 0.0002 0.0000 0.0032 -0.0032 0.0013 62 1904 324 39
main 0.0001 0.0002 0.0000 0.0032 -0.0032 0.0013 62 1904 333 40
noise 0.0000 0.0001 0.0000 0.0032 -0.0032 0.0002 62 1904 468 47

With p_factor=0.5, minor area changes and remains 5% of points:

type dif_absmdn dif_absavg dif_absmin dif_absmax dif_min dif_max n n_pts n_newpts perc_red
after 0.0164 0.0208 0.0001 0.0681 -0.0681 0.0645 62 1904 14 5
before 0.0164 0.0208 0.0001 0.0681 -0.0681 0.0645 62 1904 14 5
main 0.0165 0.0210 0.0003 0.0680 -0.0680 0.0645 62 1904 14 5
noise 0.0006 0.0010 0.0000 0.0116 -0.0116 0.0023 62 1904 95 18

Conclusion, the best is with only noise reducer and p_factor=0.05. Something more as :

  -- listing a sample of tests:
  SELECT * , round( 2*(km2-simple_km2)/(km2+simple_km2) ,4) AS km2_dif_ratio,
         round(simple_km2/n_newpts,3) AS km_per_newpt
  FROM (
    SELECT isolabel_ext,ST_NPoints(geom) as n_pts, round(st_area(geom,true)/1000000.0,2) as km2,
         round(st_area(newgeom,true)/1000000.0,2) as simple_km2,
         ST_NPoints(newgeom) as n_newpts
    FROM (SELECT *, ST_viz_simplify_noiseafter(geom,NULL,0.05) as newgeom FROM lix) t0
  ) t ORDER BY ABS( 2*(km2-simple_km2)/(km2+simple_km2) ) DESC, 1;
isolabel_ext n_pts km2 simple_km2 n_newpts km2_dif_ratio km_per_newpt
UY-MA-LaBarra 191 3.13 3.14 63 -0.0032 0.050
BR-MG-Bicas 1841 140.08 140.15 204 -0.0005 0.687
CO-CUN-Mosquera 1055 105.48 105.44 175 0.0004 0.603
CL-LL-Osorno 2106 953.89 954.25 418 -0.0004 2.283
BR-GO-Edealina 1434 604.46 604.26 343 0.0003 1.762
CO-CAL-Marulanda 1036 378.09 378.19 276 -0.0003 1.370
BR-PR-SantaIzabelOeste 805 319.82 319.74 305 0.0003 1.048
BR-SP-CapaoBonito 7994 1642.61 1642.21 442 0.0002 3.715
BR-PR-Manfrinopolis 1423 216.37 216.32 243 0.0002 0.890
BR-RS-Sobradinho 939 130.57 130.54 189 0.0002 0.691
BR-PE-VitoriaSantoAntao 271 371.83 371.91 105 -0.0002 3.542
CL-LR-RioBueno 2091 2179.95 2179.60 464 0.0002 4.697
... ... ... .. .. ..
CO-VAC-SanPedro 877 199.63 199.63 187 0.0000 1.068
UY-CA-VillaSanJose 35 0.53 0.53 14 0.0000 0.038
UY-CL-LaPedrera 15 0.16 0.16 13 0.0000 0.012
UY-CO-LosPinos 53 0.86 0.86 26 0.0000 0.033
UY-RV-Masoller 26 0.29 0.29 22 0.0000 0.013

Algumas heurísticas para resolver os raros casos de distorção de área:

  • aplicar apenas a redução de ruído, ST_viz_simplify_noise(), quando a geometria em si já possui poucos pontos, como no caso de Masoller.
  • usar fator mais baixo quando o "fator de alongamento" for alto (jurisdiction.info->'elongation_factor'>=4), novamente como no caso de Masoller.

Avaliando complexidade pelo numero de pontos

select ptsmin, round(avg(spp)) as spp_avg, round(stddev(spp)) as spp_dev from (
  select *, size/n_points as spp FROM (
    SELECT ptsmin, round(sqrt(st_area(geom,true))) as size, ST_NPoints(geom) n_points
    FROM optim.jurisdiction_geom,  (select 1000 as ptsmin union all select 5000 union all select 10000 ) t0
    WHERE isolabel_ext like '%-%-%'
  ) t1
  where n_points<=ptsmin   -- > or <=
) t2 group by ptsmin;
n_points spp_avg spp_dev
<1000 98 144
<5000 71 122
<10000 69 121
>1000 19 16 < 19
>5000 10 10
>10000 6 5 < 6

Usando o critério n_points>1000 temos uma chance razoável de medir a complexidade por "size per point", spp. Como a média é ~20 metros, podemos supor que 10 metros seja muito baixo, justificando uma simplificação maior por ST_SimplifyPreserveTopology.

Colocando as heuristicas em uma só função:

DROP FUNCTION ST_viz_simplify_noisebefore;
DROP FUNCTION ST_viz_simplify_noiseafter;

CREATE or replace FUNCTION ST_viz_simplify(
    -- simplify for VIsualiZation 
    p_geom        geometry,               -- input polygon
    p_both        boolean DEFAULT true,   -- 3state flag: NULL= only noise, true = both, false= only main
    p_factor      float   DEFAULT 0.1,       -- 1 = no reduction, 0.5 is 50% to tolerance control (100% default)
    p_noiseFactor float   DEFAULT 0.01    -- second factor, to smooth "noise", can be 100%, 10% or 1%.
) RETURNS geometry AS $f$
  WITH   diam_ref AS (
    SELECT *, d*fract AS tolerance 
    -- area metric of the SRID, not meters!
    FROM ( SELECT ( SQRT(ST_Area(p_geom)) + ST_Perimeter(p_geom)/3.5 )/2.0  AS  d,
             ST_NPoints(p_geom) AS n_points,
             p_factor*0.1 AS fract -- fraction metric
    ) t0
  )
  SELECT CASE   -- noise reducer:
     WHEN n_points>100 AND p_both IS NULL OR p_both THEN ST_viz_simplify_noise(geom, tolerance*p_noiseFactor, 0.1)
     ELSE geom  -- no noise reduction
    END
  FROM (
    SELECT CASE WHEN n_points>10 AND p_both IS NOT NULL THEN ST_SimplifyPreserveTopology(p_geom,tolerance) ELSE p_geom END as geom,
           tolerance, n_points
    FROM diam_ref
  ) t1
$f$ LANGUAGE SQL IMMUTABLE;

Redução de mosaico

Supondo que as funções de redução são locais e usam "mitre", não há maior risco de se incompatibilizar fronteiras ao colocar todos os polígonos simplificados lado a lado, para formar o mosaico. Por exemplo o Estado de SP como mosaico dos seus municípios.

PROBLEMA: se cada um usa uma escala diferente de simplificação, aí sim, as fronteiras ficam incompativeis.

Obter a média dentro do estado (jurisdição superior) e assumir essa média como métrica de referência para todos. Se o município for pequeno demais, não simplifica... Avaliar.

Outra opção é simplificar por topologia, ver TopoJSON e cia.


Lembretes