Tilesets: creating them on-the-fly #132

TangoYankee · 2024-01-05T16:41:43Z · Maintainer

Description

As an alternative to using pre-generated tiles, we can explore rendering tiles on demand.

Motivation

There have been limitations when using pre-rendered tilesets. These limitations include long generation times, rigid data structures, and unwieldy storage and access. Generating tilesets directly from the database may alleviate these problems. Two main approaches were explored: Martin and Drizzle. After exploring both approaches, Drizzle is the recommended approach to on-demand tilesets.

Characteristics of on-demand tilesets

Benefits of on-demand tilesets

On-demand tilesets address each of the key limitations we are experiencing with pre-rendered tilesets. First, each tile is created only when it is needed. In most cases, it can be generated in a few tenths of a second. Once it is generated, it can be cached for future requests, though robust caching is out of scope for the initial implementation.

Second, the structure of the returned data can be changed simply by changing the SQL query that generates the tile. Every subsequent request for the tileset will get the updated structure. In contrast, pre-rendered tilesets need to be regenerated for each change, which slows down the development and release process.

Finally, each pre-rendered tile needs to be stored on a separate file storage server; in our case, DigitalOcean. Because CORS policies prevent redirects from the zoning-api to DigitalOcean (#84), the frontend has to get its data and its tilesets from different APIs. With on-demand tilesets, the tilesets come from the same database and API as the data. This also means the tilesets do not need to be regenerated every time the underlying data is updated; they pull from the new data automatically. Consequently, both frontend development and data management are simplified.

Drawbacks of on-demand tilesets

With on-demand tilesets, there are two key drawbacks. First, large and detailed tilesets may take several seconds to generate; in these cases, pre-generated tilesets would be advantageous. Second, the database works harder when it generates tiles than it would otherwise. Caching can alleviate this, but database performance should be monitored closely.

Limitations that exist in either approach

Two problems will exist regardless of whether we use pre-rendered or on-demand tilesets. First, some larger or more detailed files may be 1 or 2 MB. When dozens of these files are requested, rendering times will slow down. Second, the client needs to draw the shapes held within the files, and under-powered clients will see performance hits with large tiles. When choosing between on-demand and pre-rendered tilesets, we should remember that these two choke points exist either way. For example, we may be limited to a minZoom of 15 for tax lots because the data transfer at lower zooms would be too slow. So while pre-rendered tilesets would be faster at sending tiles at lower zoom levels, we probably wouldn't want to send tiles at those zoom levels anyway.

On-demand tileset approaches

Martin

Martin is a Rust-based tile server. It can generate tilesets from a PostGIS database. When connected to a database, it implicitly discovers tables with geometry columns, as well as functions, that can serve tilesets. Martin was our first attempt at on-demand generation, as seen in the branch exp/martin-2. However, we found that we would need to manage both the function definitions within the database and a whole additional server environment. We then realized that the function definitions alone were sufficient to produce MVTs; we did not need Martin to create them.

Drizzle

The flexibility of Drizzle allowed us to create the MVTs using SQL queries. We could then return the resulting data through the Nest API. The simplicity and flexibility of this approach made it a clear winner over Martin. The next section breaks down the components of its implementation.

Drizzle implementation

We created tilesets for tax lots and zoning districts, generating a label tileset and a fill tileset for each of these domains. Tax lots were colored by land use and labeled by BBL. Zoning districts were colored by class and labeled by their labels.

Code examples

Backend draft PR 132 on-demand tiles #133
Backend branch with expanded history of changes 132/exp/drizzle-tilesets
Frontend branch leveraging API updates 132/exp/drizzle-tiles

Database Schema

Tilesets are ultimately encoded using the Web Mercator projected coordinate system (EPSG:3857). A dedicated geometry column was created for each tileset, for several reasons. First, it decouples the tileset geometries from the geography data column. Second, it saves run-time calculations: the transformation from EPSG:4326 to EPSG:3857 and the calculation of the point where each label is placed. Finally, it lets us create an index specific to the tileset geometries.

The tax lot mercator fill column received an index because the large volume of rows and the complexity of polygon intersection calculations made its queries noticeably slower. The other spatial columns did not receive indexes because they did not show the same performance limitations. The zoning district columns contain only a few thousand rows, so a spatial index would likely yield only marginal improvements. The tax lot label column contains point geometries, which are relatively cheap to test for spatial intersection.
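To make the precomputation concrete, here is a TypeScript sketch of the spherical Web Mercator math behind ST_Transform(..., 3857) and PostGIS's ST_TileEnvelope(z, x, y). It is for intuition only: the project does this work in PostGIS, and the constant and function names below are hypothetical.

```typescript
// Sketch of EPSG:4326 -> EPSG:3857 projection and XYZ tile bounds.
// Assumes spherical Web Mercator with the WGS84 semi-major axis as the sphere radius.
const EARTH_RADIUS_M = 6378137;
// Half the width of the projected world: x and y both run from -ORIGIN_SHIFT to +ORIGIN_SHIFT.
const ORIGIN_SHIFT = Math.PI * EARTH_RADIUS_M; // about 20037508.34 metres

/** Project longitude/latitude in degrees (EPSG:4326) to Web Mercator metres (EPSG:3857). */
function lonLatToMercator(lon: number, lat: number): [number, number] {
  const x = ((lon * Math.PI) / 180) * EARTH_RADIUS_M;
  const latRad = (lat * Math.PI) / 180;
  const y = Math.log(Math.tan(Math.PI / 4 + latRad / 2)) * EARTH_RADIUS_M;
  return [x, y];
}

interface Envelope {
  xmin: number;
  ymin: number;
  xmax: number;
  ymax: number;
}

/** Web Mercator bounds of XYZ tile (z, x, y); tile rows are counted down from the north edge. */
function tileEnvelope(z: number, x: number, y: number): Envelope {
  const tileSize = (2 * ORIGIN_SHIFT) / 2 ** z; // zoom z splits each axis into 2^z tiles
  const xmin = -ORIGIN_SHIFT + x * tileSize;
  const ymax = ORIGIN_SHIFT - y * tileSize;
  return { xmin, ymin: ymax - tileSize, xmax: xmin + tileSize, ymax };
}
```

Storing the already-projected geometry means lonLatToMercator-style work happens once at load time rather than on every tile request; only the cheap envelope comparison remains per request.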

tax lot schema example: src/schema/tax-lot.ts

The data for these columns is generated by UPDATE table commands in drizzle-management/load.sql. In the future, we may ask data engineering to integrate these columns into the source CSV files, making them part of the COPY table commands.

OpenAPI documentation

Because these will be endpoints within the API, they need to be documented. All four endpoints follow the same general structure: they accept zoom, x, and y path parameters. These parameters are typed as strings with a number-like regex check because URL values are always strings before they are coerced to numbers, so we must first check that they are "number-like". (Zod does have 'coerce'; however, I couldn't find documentation showing that Kubb generation supports it.)
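The number-like check can be sketched in plain TypeScript as below. The regex (non-negative integers only), the MAX_ZOOM cap, and the InvalidTileParamsError type are assumptions for illustration, not the project's actual OpenAPI or Zod schema. The x/y range check follows from zoom z having 2^z tiles along each axis.

```typescript
// Hypothetical validator for the zoom/x/y path parameters.
const NUMBER_LIKE = /^\d+$/; // assumption: only non-negative integers are accepted
const MAX_ZOOM = 22; // hypothetical upper bound

interface TileParams {
  z: number;
  x: number;
  y: number;
}

class InvalidTileParamsError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidTileParamsError";
  }
}

function parseTileParams(raw: { zoom: string; x: string; y: string }): TileParams {
  const checks: Array<[string, string]> = [["zoom", raw.zoom], ["x", raw.x], ["y", raw.y]];
  for (const [name, value] of checks) {
    if (!NUMBER_LIKE.test(value)) {
      throw new InvalidTileParamsError(`${name} must be number-like, got "${value}"`);
    }
  }
  // Coerce only after every string is known to be number-like.
  const z = Number(raw.zoom);
  const x = Number(raw.x);
  const y = Number(raw.y);
  if (z > MAX_ZOOM) {
    throw new InvalidTileParamsError(`zoom ${z} exceeds ${MAX_ZOOM}`);
  }
  const tilesPerAxis = 2 ** z;
  if (x >= tilesPerAxis || y >= tilesPerAxis) {
    throw new InvalidTileParamsError(`tile ${x}/${y} does not exist at zoom ${z}`);
  }
  return { z, x, y };
}
```

Any InvalidTileParamsError maps naturally onto the documented 400 response.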

A 200 response returns a protobuf file. The routes may also fail with 400 and 500 responses, returning the standard JSON error responses.
We may also want to return 404s in two scenarios. First, users may request tiles that we know are outside the NYC extent. In that case, we could check preemptively and return a 404; otherwise, the API returns an empty file. Second, users may request a zoom level that is expensive to calculate. For example, tax lot zooms lower than 14 are expensive and not generally helpful to view. We may want to preempt these requests as well (maybe we want to make this a 403?).
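Both scenarios could be handled by a cheap pre-check before any query runs. This sketch converts the requested tile to longitude/latitude bounds with the standard XYZ tile formulas and compares them with a box around New York City. NYC_BOUNDS is a rough approximation chosen for illustration, TAX_LOT_MIN_ZOOM uses the zoom 14 threshold mentioned above, and returning 403 for low zooms is only one of the options being weighed.

```typescript
// Hypothetical pre-check run before the tile query.
// Approximate lon/lat bounding box around New York City (illustrative, not project data).
const NYC_BOUNDS = { west: -74.26, south: 40.47, east: -73.69, north: 40.92 };
const TAX_LOT_MIN_ZOOM = 14;

/** Longitude of the west edge of tile column x at zoom z. */
function tileToLon(x: number, z: number): number {
  return (x / 2 ** z) * 360 - 180;
}

/** Latitude of the north edge of tile row y at zoom z (inverse Web Mercator). */
function tileToLat(y: number, z: number): number {
  const n = Math.PI - (2 * Math.PI * y) / 2 ** z;
  return (180 / Math.PI) * Math.atan(Math.sinh(n));
}

/** XYZ tile containing a lon/lat point at zoom z. */
function lonLatToTile(lon: number, lat: number, z: number): { x: number; y: number } {
  const n = 2 ** z;
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n);
  return { x, y };
}

type PrecheckStatus = 200 | 403 | 404;

function precheckTaxLotTile(z: number, x: number, y: number): PrecheckStatus {
  if (z < TAX_LOT_MIN_ZOOM) return 403; // too expensive to generate at this zoom
  const west = tileToLon(x, z);
  const east = tileToLon(x + 1, z);
  const north = tileToLat(y, z);
  const south = tileToLat(y + 1, z);
  const intersectsNyc =
    west <= NYC_BOUNDS.east &&
    east >= NYC_BOUNDS.west &&
    south <= NYC_BOUNDS.north &&
    north >= NYC_BOUNDS.south;
  return intersectsNyc ? 200 : 404; // 200 here means "go ahead and query"
}
```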

openapi/openapi.yaml

Controller

When a tile is generated successfully, the controllers send it with a protobuf header. If there is an error, the standard JSON error responses are sent.
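A framework-agnostic sketch of that controller behavior follows. It deliberately leaves out NestJS decorators. The TileHttpResponse shape, the BadTileRequestError type, and the application/x-protobuf content type value are assumptions; application/vnd.mapbox-vector-tile is another common choice for vector tiles.

```typescript
// Framework-agnostic sketch; the project's real controller uses NestJS.
// Assumptions: tiles arrive as Uint8Array bytes, and "application/x-protobuf" is the content type.
class BadTileRequestError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BadTileRequestError";
    Object.setPrototypeOf(this, BadTileRequestError.prototype); // keeps instanceof working on ES5 targets
  }
}

type TileHttpResponse =
  | { status: 200; headers: Record<string, string>; body: Uint8Array }
  | { status: 400 | 500; headers: Record<string, string>; body: string };

async function respondWithTile(getTile: () => Promise<Uint8Array>): Promise<TileHttpResponse> {
  try {
    const tile = await getTile();
    // Success: send the raw protobuf bytes with a protobuf content type.
    return { status: 200, headers: { "Content-Type": "application/x-protobuf" }, body: tile };
  } catch (err) {
    // Failure: JSON error body, 400 for bad input and 500 for everything else.
    const status = err instanceof BadTileRequestError ? 400 : 500;
    const message = err instanceof BadTileRequestError ? err.message : "Internal server error";
    return {
      status,
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ statusCode: status, message }),
    };
  }
}
```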

tax lot controller example: src/tax-lot/tax-lot.controller.ts

Service

The services extract the byte data from the query result and pass it to the controllers.
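A minimal sketch of that extraction follows. It assumes the repository query returns rows shaped like { mvt }, where mvt is the ST_AsMVT output. The row shape and the injected FindTaxLotTile function are hypothetical. With node-postgres, bytea columns arrive as a Buffer, which is a Uint8Array subclass, so the same code accepts them.

```typescript
// Assumed row shape: the tile query selects ST_AsMVT(...) AS mvt and returns at most one row.
interface MvtRow {
  mvt: Uint8Array | null;
}

// Hypothetical stand-in for the repository method (see src/tax-lot/tax-lot.repository.ts).
type FindTaxLotTile = (z: number, x: number, y: number) => Promise<MvtRow[]>;

async function getTaxLotTile(findTile: FindTaxLotTile, z: number, x: number, y: number): Promise<Uint8Array> {
  const rows = await findTile(z, x, y);
  // Defensively treat a missing row or a NULL value as an empty tile.
  return rows[0]?.mvt ?? new Uint8Array(0);
}
```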

tax lot service example: src/tax-lot/tax-lot.service.ts

Repository

Each tile is generated with an SQL query in the repository. This gives us the flexibility to format the properties in whatever way best serves us. In several instances, we manually JSON-stringify arrays of data, and frontends then explicitly parse this data back into lists. This is helpful in a couple of situations. First, we convert the label colors from hex to RGBA for easier rendering: src/tax-lot/tax-lot.repository.ts. Second, we list the multiple classes and categories that a single zoning district label can belong to: src/zoning-district/zoning-district.repository.ts. We could have applied the same treatment to zoning district fills. However, there is a benefit to instead having a distinct row per class and category for each zoning district fill: it allows us to blend the colors of multiple fills to show the mixed nature of a zoning district.
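The JSON-stringify round trip can be illustrated in TypeScript. In the project, the hex-to-RGBA conversion happens inside the SQL query; this version only mirrors the idea. The 0-255 alpha value, the labelColor property name, and the parseRgba helper are assumptions. Stringifying is needed because vector tile feature properties can only hold scalar values (strings, numbers, booleans), not arrays.

```typescript
// Illustration only: the project performs this conversion in SQL inside the repository query.
// Assumes "#rrggbb" input and a 0-255 alpha channel.
function hexToRgba(hex: string, alpha = 255): [number, number, number, number] {
  const match = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`expected #rrggbb, got "${hex}"`);
  return [parseInt(match[1], 16), parseInt(match[2], 16), parseInt(match[3], 16), alpha];
}

// Backend side: tile properties must be scalars, so the array travels as a JSON string.
// "labelColor" is a hypothetical property name.
const labelProperties = { labelColor: JSON.stringify(hexToRgba("#ff8800")) };

// Frontend side: explicitly parse the string back into a list and validate its contents.
function parseRgba(property: string): number[] {
  const value: unknown = JSON.parse(property);
  if (!Array.isArray(value) || !value.every((n) => typeof n === "number")) {
    throw new Error(`expected a numeric array, got ${property}`);
  }
  return value;
}
```

The same pattern would carry the list of classes and categories for a zoning district label, parsed on the frontend as an array of strings instead of numbers.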

Notes

Ignore the TilesetRetrievalException. This was a misadventure and was removed from the PR version.

The tilesets/*.sql files were experimental. They were used to test the raw queries that were eventually converted into the Drizzle queries. They are saved on the experimental branch for reference; however, they were removed from the PR branch.

TylerMatteo · 2024-01-11T15:35:17Z · Maintainer

Thanks for putting together this write-up, really great stuff. I have some thoughts that I'll try to organize along your headings, but overall I agree with your conclusions:

Benefits of on-demand tiles

Piggybacking off this point:

Second, the structure of the returned data can be changed simply by changing the sql request to generate the tile. Each subsequent request for the tilesets will get this update structure. In contrast, pre-rendered tilesets need to be regenerated for each change, slowing down the development and release process.

This could in theory make data orchestration for updates easier as well. If the JSON data and the tiled data are both coming from the database, we don't have to worry about them being out of sync, or about an update to one going live slightly before the other, etc.

On the topic of caching, I think it's reasonable to consider it out of scope for this investigation but here are a few things to consider for when/if we do tackle it:

Do we need to cache at the API layer? It's my understanding that Postgres itself implements some caching, so it may cache, for example, the results of some of the SQL functions we're using here and pull results from the cache when it sees those functions called again with the same arguments. If it is doing that, that may meet our needs.
If we do cache at the server level, there are a few ways to go about it. Nest actually has some nice docs on caching. I did a very rough implementation of in-memory tile caching in my old mikro-orm POC project, but I doubt in-memory caching would work for us in practice. The next obvious solution would probably be something like Redis.
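For reference when caching is picked up: PostgreSQL's shared buffers hold table and index pages, not the finished output of a query, so a repeated tile request still reruns the full MVT query even when the pages are warm. The sketch below shows the shape of a per-process in-memory tile cache keyed by layer/z/x/y, with a TTL and least-recently-used eviction. The class name, key format, and limits are hypothetical, and because each API instance would hold its own copy, a shared store such as Redis remains the natural next step.

```typescript
// Hypothetical per-process tile cache with a TTL and least-recently-used eviction.
interface CacheEntry {
  tile: Uint8Array;
  expiresAt: number;
}

class TileCache {
  private readonly entries = new Map<string, CacheEntry>();

  constructor(
    private readonly maxEntries: number,
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now, // injectable clock for testing
  ) {}

  static key(layer: string, z: number, x: number, y: number): string {
    return `${layer}/${z}/${x}/${y}`;
  }

  get(key: string): Uint8Array | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert so Map iteration order tracks recency (least recently used first).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.tile;
  }

  set(key: string, tile: Uint8Array): void {
    this.entries.delete(key);
    this.entries.set(key, { tile, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used entry.
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
  }
}
```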

Drawbacks of on-demand tiles

I think one thing to be aware of here is the added complexity of having to write the SQL that generates these tiles. If we went this route, we would of course abstract some of that query building into general-purpose code, and there are plenty of upsides to outweigh this. I'm just calling it out because some folks won't be as comfortable writing these sorts of SQL queries, and it's something we don't really have to think about when we're given pre-generated tiles.
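One way that query building could be abstracted is a small builder that emits the usual PostGIS MVT pattern: ST_TileEnvelope (PostGIS 3.0 or later) for the tile bounds, ST_AsMVTGeom to clip geometries and convert them to tile coordinates, and ST_AsMVT to encode the layer. This is a hypothetical sketch that returns a raw SQL string; the TileLayerSpec shape and the table and column names used with it are assumptions, and the project itself expresses these queries through Drizzle. Identifiers are quoted and z/x/y are bind parameters, so request values are never spliced into the SQL.

```typescript
// Hypothetical general-purpose MVT query builder. Identifiers come from code, never from requests.
interface TileLayerSpec {
  layer: string; // MVT layer name, e.g. "tax-lot-fill"
  table: string; // source table
  geometryColumn: string; // EPSG:3857 geometry column
  properties: string[]; // columns copied into each feature's properties
}

const quoteIdent = (name: string): string => `"${name.replace(/"/g, '""')}"`;
const quoteLiteral = (value: string): string => `'${value.replace(/'/g, "''")}'`;

/** Builds a query whose bind parameters $1, $2, $3 are z, x, y. */
function buildTileQuery(spec: TileLayerSpec): string {
  const geom = `src.${quoteIdent(spec.geometryColumn)}`;
  const props = spec.properties.map((p) => `, src.${quoteIdent(p)}`).join("");
  return [
    "WITH bounds AS (",
    "  SELECT ST_TileEnvelope($1::integer, $2::integer, $3::integer) AS envelope",
    "), mvt_geom AS (",
    `  SELECT ST_AsMVTGeom(${geom}, bounds.envelope) AS geom${props}`,
    `  FROM ${quoteIdent(spec.table)} AS src, bounds`,
    `  WHERE ${geom} && bounds.envelope`, // bounding-box filter that a spatial index can serve
    ")",
    `SELECT ST_AsMVT(mvt_geom.*, ${quoteLiteral(spec.layer)}) AS mvt FROM mvt_geom`,
  ].join("\n");
}
```

With node-postgres, for example, the result could be run as pool.query(buildTileQuery(spec), [z, x, y]), reading the mvt column from the single returned row.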

Martin vs Drizzle

No real notes here. Favoring building the tiles from our APIs with Drizzle and "raw" PostGIS instead of using Martin seems like a no-brainer to me.
