-
Notifications
You must be signed in to change notification settings - Fork 0
Diâmetro de referência e simplificação das geometrias de jurisdições
Pensando que a relação entre perímetro da forma e sua "diagonal" é da ordem de 3 ou mais, a query abaixo permite avaliar qual o fator que melhor aproxima a raiz da área com o fator do perímetro:
SELECT count(*) n FROM optim.jurisdiction_geom where isolabel_ext like '%-%-%'; -- 8781
-- Find perim3_km factor that minimizes d_median, d_avg and d_dev; and avoid negative d_min or extreme d_max
SELECT round(PERCENTILE_DISC(0.5) within group (order by abs(d)) ) as d_median,
round(avg(abs(d))) as d_avg, round(stddev_pop(abs(d))) as d_dev,
max(abs(d)) as d_max, min(abs(d)) as d_min, min(d) as signed_min,
count(*) filter (where d<0) as signed_n
FROM (
SELECT 200.0*(perim3_km-km2_to_km)/(perim3_km+km2_to_km) AS d -- percentual difference
FROM (
select *, round(sqrt(km2),2)::float as km2_to_km -- query for visual check
from (
select osm_id, isolabel_ext, st_area(geom,true)/1000000.0 as km2,
st_perimeter(geom,true)/3500.0 as perim3_km -- tested: 3000, 3500, 4000 (ok), 4500 (generated negative d_min and more d_dev)
from optim.jurisdiction_geom where isolabel_ext like '%-%-%'
) t1
) t2
)t3; -- AVG of tested permeters: 66.3, 52.4, 28.6
Parece então razoável, depois dessa aproximação de ordem de grandeza, fazer a média entre os dois, quando se tratando de geometrias de jurisdições:
diam_ref = (SQRT(area)+perimetro/3.5)/2.0
Resultados, usando 8781 geometrias de municípios de optim.jurisdiction_geom
. Valores das diferenças percentuais execeto pelo fator:
perim_factor | d_median | d_avg | d_dev | d_max | d_min | signed_min |
---|---|---|---|---|---|---|
3.0 | 65 | 66 | 17 | 167.80 | 16.66 | 16.66 |
3.5 | 51 | 52 | 18 | 162.93 | 1.32 | 1.32 |
4.0 | 38 | 40 | 19 | 158.19 | 0.0294 | -12.05 |
4.5 | 27 | 29 | 18 | 153.57 | 0.0101 | -23.70 |
A quantidade de negativos foi alta, 439, no fator 4.5. Como são esperadas mais geometrias e o é desvio alto, o ajuste é grosseiro, quantizado em 0.5.
A simplificação é realizada em duas etapas, a primeira mais forte com ST_SimplifyPreserveTopology e a segunda "grudando dobras ilhas na escala menor", através de ST_Buffer(+ depois -), depois ST_SimplifyVW para remover redundâncias dentro da mesma escala.
Nota: segundo referências 1 e 2 a suavisação por buffer não seria necessária quando usando algorítmo de Visvalingam–Whyatt. Sugere-se a realização de testes e comparações antes de eliminar o buffer.
-- DROP FUNCTION ST_viz_simplify_noise;
CREATE or replace FUNCTION ST_viz_simplify_noise( -- simplify by buffer-eroding
p_geom geometry, -- input polygon
p_radius float, -- buffer radius
p_VWfactor float DEFAULT 0.1, -- reduce buffer radius by a factor, to use as tolerance in ST_SimplifyVW
p_mitre boolean DEFAULT true -- false for rounded, or true for not.
) RETURNS geometry AS $f$
SELECT ST_SimplifyVW(
ST_Buffer(ST_Buffer(p_geom,p_radius,mitre),-p_radius,mitre),
p_radius*p_VWfactor
) FROM (SELECT CASE WHEN p_mitre THEN 'join=mitre' ELSE '' END) t(mitre)
$f$ LANGUAGE SQL IMMUTABLE;
CREATE or replace FUNCTION ST_viz_simplify_noisebefore( -- simplify for VIsualiZation
p_geom geometry, -- input polygon
p_both boolean DEFAULT true, -- 3state flag: NULL= only noise, true = both, false= only main
p_factor float DEFAULT 0.1, -- 1 = no reduction, 0.5 is 50% to tolerance control (100% default)
p_noiseFactor float DEFAULT 0.01 -- second factor, to smooth "noise", can be 100%, 10% or 1%.
) RETURNS geometry AS $f$
WITH diam_ref AS (
SELECT *, d*fract AS tolerance
-- area metric of the SRID, not meters!
FROM ( SELECT ( SQRT(ST_Area(p_geom)) + ST_Perimeter(p_geom)/3.5 )/2.0 AS d,
p_factor*0.1 AS fract -- fraction metric
) t0
)
SELECT CASE WHEN p_both IS NOT NULL THEN ST_SimplifyPreserveTopology(geom,tolerance) ELSE geom END
FROM (
SELECT CASE -- noise reducer:
WHEN p_both IS NULL OR p_both THEN ST_viz_simplify_noise(p_geom, tolerance*p_noiseFactor, 0.1)
ELSE p_geom -- no noise reduction
END as geom,
tolerance
FROM diam_ref
) t1
$f$ LANGUAGE SQL IMMUTABLE;
CREATE or replace FUNCTION ST_viz_simplify_noiseafter( -- simplify for VIsualiZation
p_geom geometry, -- input polygon
p_both boolean DEFAULT true, -- 3state flag: NULL= only noise, true = both, false= only main
p_factor float DEFAULT 0.1, -- 1 = no reduction, 0.5 is 50% to tolerance control (100% default)
p_noiseFactor float DEFAULT 0.01 -- second factor, to smooth "noise", can be 100%, 10% or 1%.
) RETURNS geometry AS $f$
WITH diam_ref AS (
SELECT *, d*fract AS tolerance
-- area metric of the SRID, not meters!
FROM ( SELECT ( SQRT(ST_Area(p_geom)) + ST_Perimeter(p_geom)/3.5 )/2.0 AS d,
p_factor*0.1 AS fract -- fraction metric
) t0
)
SELECT CASE -- noise reducer:
WHEN p_both IS NULL OR p_both THEN ST_viz_simplify_noise(geom, tolerance*p_noiseFactor, 0.1)
ELSE geom -- no noise reduction
END
FROM (
SELECT CASE WHEN p_both IS NOT NULL THEN ST_SimplifyPreserveTopology(p_geom,tolerance) ELSE p_geom END as geom,
tolerance
FROM diam_ref
) t1
$f$ LANGUAGE SQL IMMUTABLE;
---------------------------------
CREATE TABLE lix AS
SELECT * FROM optim.jurisdiction_geom TABLESAMPLE BERNOULLI(0.8) where isolabel_ext like '%-%-%'
;
SELECT type, PERCENTILE_DISC(0.5) within group (order by abs(dif)) AS dif_absmdn,
round(AVG(abs(dif)),4) as dif_absavg, min(abs(dif)) as dif_absmin, Max(abs(dif)) as dif_absmax,
min(dif) as dif_min, Max(dif) as dif_max, count(*) as n,
round(AVG(n_pts)) as n_pts, round(AVG(n_newpts)) as n_newpts, round(AVG(100*n_newpts/n_pts)) as perc_redux_avg
FROM (
select type, round(2*(km2-simple_km2)/(km2+simple_km2), 4) as dif, n_newpts, n_pts
from (
SELECT 'before' AS type,
isolabel_ext, round(st_area(geom,true)/1000000.0,2) as km2,
round(st_area(newgeom,true)/1000000.0,2) as simple_km2,
ST_NPoints(newgeom) as n_newpts, ST_NPoints(geom) as n_pts
FROM (SELECT *,ST_viz_simplify_noiseafter(geom,true,0.5) newgeom FROM lix) l1
UNION ALL
SELECT 'after' AS type,
isolabel_ext, round(st_area(geom,true)/1000000.0,2) as km2,
round(st_area(newgeom,true)/1000000.0,2) as simple_km2,
ST_NPoints(newgeom) as n_newpts, ST_NPoints(geom) as n_pts
FROM (SELECT *,ST_viz_simplify_noiseafter(geom,true,0.5) newgeom FROM lix) l2
UNION ALL
SELECT 'main' AS type,
isolabel_ext, round(st_area(geom,true)/1000000.0,2) as km2,
round(st_area(newgeom,true)/1000000.0,2) as simple_km2,
ST_NPoints(newgeom) as n_newpts, ST_NPoints(geom) as n_pts
FROM (SELECT *,ST_viz_simplify_noiseafter(geom,false,0.5) newgeom FROM lix) l3
UNION ALL
SELECT 'noise' AS type,
isolabel_ext, round(st_area(geom,true)/1000000.0,2) as km2,
round(st_area(newgeom,true)/1000000.0,2) as simple_km2,
ST_NPoints(newgeom) as n_newpts, ST_NPoints(geom) as n_pts
FROM (SELECT *,ST_viz_simplify_noiseafter(geom,NULL,0.5) newgeom FROM lix) l4
) t
) t2 GROUP BY type ORDER BY type;
With p_factor=0.1, the default, result in minor area changes and reduction of points to 15% only:
type | dif_absmdn | dif_absavg | dif_absmin | dif_absmax | dif_min | dif_max | n | n_pts | n_newpts | perc_red |
---|---|---|---|---|---|---|---|---|---|---|
after | 0.0015 | 0.0022 | 0.0000 | 0.0153 | -0.0080 | 0.0153 | 62 | 1904 | 60 | 14 |
before | 0.0015 | 0.0022 | 0.0000 | 0.0153 | -0.0080 | 0.0153 | 62 | 1904 | 60 | 14 |
main | 0.0016 | 0.0022 | 0.0000 | 0.0154 | -0.0079 | 0.0154 | 62 | 1904 | 61 | 14 |
noise | 0.0001 | 0.0002 | 0.0000 | 0.0011 | -0.0011 | 0.0007 | 62 | 1904 | 192 | 27 |
With p_factor=0.01, no area changes but only 40% reduction of points:
type | dif_absmdn | dif_absavg | dif_absmin | dif_absmax | dif_min | dif_max | n | n_pts | n_newpts | perc_red |
---|---|---|---|---|---|---|---|---|---|---|
after | 0.0001 | 0.0002 | 0.0000 | 0.0032 | -0.0032 | 0.0013 | 62 | 1904 | 324 | 39 |
before | 0.0001 | 0.0002 | 0.0000 | 0.0032 | -0.0032 | 0.0013 | 62 | 1904 | 324 | 39 |
main | 0.0001 | 0.0002 | 0.0000 | 0.0032 | -0.0032 | 0.0013 | 62 | 1904 | 333 | 40 |
noise | 0.0000 | 0.0001 | 0.0000 | 0.0032 | -0.0032 | 0.0002 | 62 | 1904 | 468 | 47 |
With p_factor=0.5, minor area changes and remains 5% of points:
type | dif_absmdn | dif_absavg | dif_absmin | dif_absmax | dif_min | dif_max | n | n_pts | n_newpts | perc_red |
---|---|---|---|---|---|---|---|---|---|---|
after | 0.0164 | 0.0208 | 0.0001 | 0.0681 | -0.0681 | 0.0645 | 62 | 1904 | 14 | 5 |
before | 0.0164 | 0.0208 | 0.0001 | 0.0681 | -0.0681 | 0.0645 | 62 | 1904 | 14 | 5 |
main | 0.0165 | 0.0210 | 0.0003 | 0.0680 | -0.0680 | 0.0645 | 62 | 1904 | 14 | 5 |
noise | 0.0006 | 0.0010 | 0.0000 | 0.0116 | -0.0116 | 0.0023 | 62 | 1904 | 95 | 18 |
Conclusion, the best is with only noise reducer and p_factor=0.05. Something more as :
-- listing a sample of tests:
SELECT * , round( 2*(km2-simple_km2)/(km2+simple_km2) ,4) AS km2_dif_ratio,
round(simple_km2/n_newpts,3) AS km_per_newpt
FROM (
SELECT isolabel_ext,ST_NPoints(geom) as n_pts, round(st_area(geom,true)/1000000.0,2) as km2,
round(st_area(newgeom,true)/1000000.0,2) as simple_km2,
ST_NPoints(newgeom) as n_newpts
FROM (SELECT *, ST_viz_simplify_noiseafter(geom,NULL,0.05) as newgeom FROM lix) t0
) t ORDER BY ABS( 2*(km2-simple_km2)/(km2+simple_km2) ) DESC, 1;
isolabel_ext | n_pts | km2 | simple_km2 | n_newpts | km2_dif_ratio | km_per_newpt |
---|---|---|---|---|---|---|
UY-MA-LaBarra | 191 | 3.13 | 3.14 | 63 | -0.0032 | 0.050 |
BR-MG-Bicas | 1841 | 140.08 | 140.15 | 204 | -0.0005 | 0.687 |
CO-CUN-Mosquera | 1055 | 105.48 | 105.44 | 175 | 0.0004 | 0.603 |
CL-LL-Osorno | 2106 | 953.89 | 954.25 | 418 | -0.0004 | 2.283 |
BR-GO-Edealina | 1434 | 604.46 | 604.26 | 343 | 0.0003 | 1.762 |
CO-CAL-Marulanda | 1036 | 378.09 | 378.19 | 276 | -0.0003 | 1.370 |
BR-PR-SantaIzabelOeste | 805 | 319.82 | 319.74 | 305 | 0.0003 | 1.048 |
BR-SP-CapaoBonito | 7994 | 1642.61 | 1642.21 | 442 | 0.0002 | 3.715 |
BR-PR-Manfrinopolis | 1423 | 216.37 | 216.32 | 243 | 0.0002 | 0.890 |
BR-RS-Sobradinho | 939 | 130.57 | 130.54 | 189 | 0.0002 | 0.691 |
BR-PE-VitoriaSantoAntao | 271 | 371.83 | 371.91 | 105 | -0.0002 | 3.542 |
CL-LR-RioBueno | 2091 | 2179.95 | 2179.60 | 464 | 0.0002 | 4.697 |
... | ... | ... | .. | .. | .. | |
CO-VAC-SanPedro | 877 | 199.63 | 199.63 | 187 | 0.0000 | 1.068 |
UY-CA-VillaSanJose | 35 | 0.53 | 0.53 | 14 | 0.0000 | 0.038 |
UY-CL-LaPedrera | 15 | 0.16 | 0.16 | 13 | 0.0000 | 0.012 |
UY-CO-LosPinos | 53 | 0.86 | 0.86 | 26 | 0.0000 | 0.033 |
UY-RV-Masoller | 26 | 0.29 | 0.29 | 22 | 0.0000 | 0.013 |
Algumas heurísticas para resolver os raros casos de distorção de área:
- aplicar apenas a redução de ruído,
ST_viz_simplify_noise()
, quando a geometria em si já possui poucos pontos, como no caso de Masoller. - usar fator mais baixo quando o "fator de alongamento" for alto (
jurisdiction.info->'elongation_factor'>=4
), novamente como no caso de Masoller.
Avaliando complexidade pelo numero de pontos
select ptsmin, round(avg(spp)) as spp_avg, round(stddev(spp)) as spp_dev from (
select *, size/n_points as spp FROM (
SELECT ptsmin, round(sqrt(st_area(geom,true))) as size, ST_NPoints(geom) n_points
FROM optim.jurisdiction_geom, (select 1000 as ptsmin union all select 5000 union all select 10000 ) t0
WHERE isolabel_ext like '%-%-%'
) t1
where n_points<=ptsmin -- > or <=
) t2 group by ptsmin;
n_points | spp_avg | spp_dev |
---|---|---|
<1000 | 98 | 144 |
<5000 | 71 | 122 |
<10000 | 69 | 121 |
>1000 | 19 | 16 < 19 |
>5000 | 10 | 10 |
>10000 | 6 | 5 < 6 |
Usando o critério n_points>1000
temos uma chance razoável de medir a complexidade por "size per point", spp
. Como a média é ~20 metros, podemos supor que 10 metros seja muito baixo, justificando uma simplificação maior por ST_SimplifyPreserveTopology.
Colocando as heuristicas em uma só função:
DROP FUNCTION ST_viz_simplify_noisebefore;
DROP FUNCTION ST_viz_simplify_noiseafter;
CREATE or replace FUNCTION ST_viz_simplify(
-- simplify for VIsualiZation
p_geom geometry, -- input polygon
p_both boolean DEFAULT true, -- 3state flag: NULL= only noise, true = both, false= only main
p_factor float DEFAULT 0.1, -- 1 = no reduction, 0.5 is 50% to tolerance control (100% default)
p_noiseFactor float DEFAULT 0.01 -- second factor, to smooth "noise", can be 100%, 10% or 1%.
) RETURNS geometry AS $f$
WITH diam_ref AS (
SELECT *, d*fract AS tolerance
-- area metric of the SRID, not meters!
FROM ( SELECT ( SQRT(ST_Area(p_geom)) + ST_Perimeter(p_geom)/3.5 )/2.0 AS d,
ST_NPoints(p_geom) AS n_points,
p_factor*0.1 AS fract -- fraction metric
) t0
)
SELECT CASE -- noise reducer:
WHEN n_points>100 AND p_both IS NULL OR p_both THEN ST_viz_simplify_noise(geom, tolerance*p_noiseFactor, 0.1)
ELSE geom -- no noise reduction
END
FROM (
SELECT CASE WHEN n_points>10 AND p_both IS NOT NULL THEN ST_SimplifyPreserveTopology(p_geom,tolerance) ELSE p_geom END as geom,
tolerance, n_points
FROM diam_ref
) t1
$f$ LANGUAGE SQL IMMUTABLE;
Supondo que as funções de redução são locais e usam "mitre", não há maior risco de se incompatibilizar fronteiras ao colocar todos os polígonos simplificados lado a lado, para formar o mosaico. Por exemplo o Estado de SP como mosaico dos seus municípios.
PROBLEMA: se cada um usa uma escala diferente de simplificação, aí sim, as fronteiras ficam incompativeis.
Obter a média dentro do estado (jurisdição superior) e assumir essa média como métrica de referência para todos. Se o município for pequeno demais, não simplifica... Avaliar.
Outra opção é simplificar por topologia, ver TopoJSON e cia.
Lembretes
- revisar e levar para a documentação, removendo da Wiki em seguida
- https://github.com/AddressForAll/site-v2/issues/44