Tilesets: creating them on-the-fly #132
Replies: 1 comment
Thanks for putting together this write-up, really great stuff. I have some thoughts that I'll try to organize along your headings, but overall I agree with your conclusions:
Benefits of on-demand tiles
Piggybacking off this point:
This could in theory make data orchestration for updates easier as well. If the JSON data and tiled data are both coming from the database, we don't have to worry about them being out of sync or an update to one going live slightly before the other, etc.
On the topic of caching, I think it's reasonable to consider it out of scope for this investigation, but here are a few things to consider for when/if we do tackle it:
Drawbacks of on-demand tiles
I think one thing to just be aware of here is the added complexity of having to write the SQL to generate these tiles. If we went this route, we would of course abstract some of that query building into general-purpose code, and there are plenty of upsides to outweigh this. I'm just calling it out because some folks aren't going to be as comfortable writing these sorts of SQL queries, and it's something we don't really have to think about when we're given pre-generated tiles.
Martin vs Drizzle
No real notes here. Favoring building the tiles from our APIs with Drizzle and "raw" PostGIS instead of using Martin seems like a no-brainer to me.
Description
As an alternative to using pre-generated tiles, we can explore on-demand rendering.
Motivation
We have run into limitations when using pre-rendered tilesets. These limitations include long generation times, rigid data structures, and unwieldy storage and access. Generating tilesets directly from the database may alleviate these problems. Two main approaches were explored: Martin and Drizzle. After exploring both, Drizzle is the recommended approach to on-demand tilesets.
Characteristics of on-demand tilesets
Benefits of on-demand tilesets
On-demand tilesets address each of the key limitations we are experiencing with pre-rendered tilesets. First, each tile is only created when it is needed. In most cases, it can be generated in only a few tenths of a second. Once it is generated, it can be cached for future requests, though robust caching is out of scope for the initial implementation.
Second, the structure of the returned data can be changed simply by changing the SQL query that generates the tile. Each subsequent request for the tilesets will get the updated structure. In contrast, pre-rendered tilesets need to be regenerated for each change, slowing down the development and release process.
Finally, each pre-rendered tile needs to be stored on a separate file storage server. In our case, we were storing them on DigitalOcean. Due to CORS policies preventing redirects from the zoning-api to DigitalOcean (#84), the frontend needs to get its data and tilesets from different APIs. With on-demand tilesets, the tilesets come from the same database and API as the data. This also means the tilesets do not need to be regenerated every time there is an update to the underlying data. Instead, they will pull from the new data automatically. Consequently, both frontend development and data management are simplified.
Drawbacks of on-demand tilesets
With on-demand tilesets, there are two key limitations. First, large and detailed tilesets may take several seconds to generate. In these cases, pre-generated tilesets would be advantageous. Second, the database will be working harder when it generates tiles than it would otherwise. This can be alleviated with caching. However, database performance should be monitored closely.
Limitations that exist in either approach
Two problems will exist regardless of whether we're using pre-rendered or on-demand tilesets. First, some larger or more detailed tiles may be 1 or 2 MB each. When a client requests dozens of these files, the transfer slows rendering. Second, the client needs to draw the shapes held within the files. Under-powered clients will see performance hits with large tiles. When choosing between on-demand and pre-rendered tilesets, we should remember these two choke points will exist regardless. For example, we may be limited to a minZoom of 15 for tax lots because the data transfer will be too slow. So while pre-rendered tilesets would be faster at sending tiles at lower zoom levels, we probably wouldn't want to send tiles at lower zoom levels anyway.
On demand tileset approaches
Martin
Martin is a Rust-based tileset server. It is capable of generating tilesets from a PostGIS database. When connected to a database, it implicitly discovers tables and functions with geometry columns that can serve tilesets. Martin was the first approach attempted for on-demand generation, as seen in branch exp/martin-2. However, we soon found that we would need to manage both the function definitions within the database and a whole additional server environment. We then realized the function definitions were sufficient to produce MVTs; we did not need Martin to create them.
Drizzle
The flexibility of Drizzle allowed us to create the MVTs using SQL queries. We could then return the resulting data through the Nest API. The simplicity and flexibility of this approach made it a clear winner over Martin. The next section breaks down the components of its implementation.
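For orientation, a tile query of this kind can be sketched with three standard PostGIS functions: ST_TileEnvelope, ST_AsMVTGeom, and ST_AsMVT. The builder below is only an illustration under stated assumptions. The table, column, and layer names are hypothetical placeholders, and it returns a plain parameterized SQL string; it is not the Drizzle query used in the repository.

```typescript
// A parameterized query plus its bound values, in the shape most Postgres
// clients accept. This type is an assumption made for the sketch.
interface TileQuery {
  text: string;
  values: [number, number, number];
}

// Builds a query that returns one MVT blob for tile z/x/y.
// `table`, `geomColumn`, and `layer` are hypothetical names. They are
// interpolated as identifiers, so they must be trusted constants and
// never user input.
function buildTileQuery(
  table: string,
  geomColumn: string,
  layer: string,
  z: number,
  x: number,
  y: number,
): TileQuery {
  const text = `
    WITH bounds AS (
      -- Tile extent in EPSG:3857
      SELECT ST_TileEnvelope($1::integer, $2::integer, $3::integer) AS geom
    ),
    tile AS (
      -- Clip each geometry to the tile and convert it to tile coordinates
      SELECT ST_AsMVTGeom(t.${geomColumn}, bounds.geom) AS geom
      FROM ${table} AS t, bounds
      WHERE t.${geomColumn} && bounds.geom
    )
    -- Encode all rows of the tile as a single protobuf value
    SELECT ST_AsMVT(tile, '${layer}') AS mvt FROM tile`;
  return { text, values: [z, x, y] };
}

const exampleQuery = buildTileQuery("tax_lot", "mercator_fill", "tax_lot_fill", 15, 9647, 12319);
console.log(exampleQuery.text);
```

Any feature properties wanted in the tile would be added as extra columns in the inner SELECT. The geometry column is assumed to already be stored in EPSG:3857, since ST_AsMVTGeom expects geometries in the same coordinate system as the tile bounds.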
Drizzle implementation
Tilesets for tax lots and zoning districts were created. We generated label and fill tilesets for each of these domains. Tax lots were colored based on their land uses and labeled with their BBLs. Zoning districts were colored based on their classes and labeled with their labels.
Code examples
Database Schema
Tilesets are ultimately encoded using a Web Mercator-projected coordinate system (EPSG:3857). Dedicated geometry columns were created for each tileset for several reasons. First, this helps decouple the tileset geometries from the geography data column. Second, it saves run-time calculations. These include the transformation from EPSG:4326 to EPSG:3857 and the calculation of the point where we want to place each label. Finally, we can create an index specific to the tileset geometries. The tax lot mercator fill column received an index because the large volume of rows and the complex nature of polygon intersection calculations made queries against it noticeably slower. The other spatial columns did not receive indexes because they did not exhibit the same performance limitations. The zoning district columns only contain a few thousand rows, so a spatial index would likely show only marginal improvements. The tax lot label column contains point geometries, which are relatively simple to query for spatial intersections.
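To make the saved run-time calculation concrete, here is the spherical Web Mercator formula applied to a single longitude/latitude pair. In the database this work would be done by PostGIS (for example with ST_Transform), so the function below is purely illustrative and its name is made up for this sketch.

```typescript
// Sphere radius used by EPSG:3857 (the WGS 84 semi-major axis), in meters.
const EARTH_RADIUS_M = 6378137;

// Converts an EPSG:4326 coordinate in degrees to EPSG:3857 meters.
// Hypothetical helper for illustration; not part of the project's code.
function lonLatToWebMercator(lon: number, lat: number): [number, number] {
  const x = (EARTH_RADIUS_M * lon * Math.PI) / 180;
  const y = EARTH_RADIUS_M * Math.log(Math.tan(Math.PI / 4 + (lat * Math.PI) / 360));
  return [x, y];
}

// Example: a point in lower Manhattan.
console.log(lonLatToWebMercator(-74.006, 40.7128));
```

Running this per vertex on every tile request is the kind of cost that storing pre-transformed columns avoids.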
tax lot schema example:
src/schema/tax-lot.ts
The data for these columns are generated from UPDATE table commands in drizzle-management/load.sql. In the future, we may ask data engineering to integrate them into the source CSV files, making them part of the COPY table commands.
OpenAPI documentation
As these will be endpoints within the API, they need to be documented. All four endpoints follow the same general structure. They accept zoom, x, and y path parameters. These parameters are of type string with a number-like regex check because, as URL values, they are always strings before they are coerced to numbers; we must first check that they are "number-like". (Zod does have a 'coerce' option; however, I couldn't find documentation for Kubb generation supporting it.)
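The "number-like" check could look roughly like the following. The pattern and function names are assumptions made for this sketch; the regex in openapi.yaml may differ.

```typescript
// Hypothetical pattern: one or more digits, so no signs, decimals, or blanks.
const NUMBER_LIKE = /^\d+$/;

// Returns the numeric value of a path parameter, or null when the raw
// string is not number-like (the case that should become a 400 response).
function parseTileParam(raw: string): number | null {
  return NUMBER_LIKE.test(raw) ? Number(raw) : null;
}

interface TileCoords {
  z: number;
  x: number;
  y: number;
}

// Validates all three path parameters together.
function parseTileCoords(z: string, x: string, y: string): TileCoords | null {
  const parsed = [z, x, y].map(parseTileParam);
  const [pz, px, py] = parsed;
  if (pz == null || px == null || py == null) return null;
  return { z: pz, x: px, y: py };
}

console.log(parseTileCoords("15", "9647", "12319"));
console.log(parseTileCoords("15", "abc", "12319"));
```

Checking the string before coercing matters because Number("") evaluates to 0 and Number("1e3") to 1000, neither of which should pass as a tile coordinate.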
The 200 path returns a protobuf file. The routes may error with 400 and 500 responses, returning the standard JSON responses.
We may also want to return 404s under two scenarios. First, users may request tiles that we know are outside the NYC extent. In that case, we could preemptively check and return a 404; otherwise, the API returns an empty file. Second, users may request a zoom level that is expensive to calculate. For example, tax lot zooms lower than 14 are expensive and not generally helpful to view. We may want to preempt these requests as well (maybe we want to make this a 403?).
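A pre-check for both scenarios might be sketched as below, using the standard XYZ tile arithmetic to find a tile's longitude/latitude bounds. The NYC bounding box values are rough assumptions for illustration, as are all of the names; only the zoom threshold of 14 comes from the text above.

```typescript
// Approximate NYC bounding box in degrees (assumed values, illustration only).
const NYC_EXTENT = { west: -74.26, south: 40.49, east: -73.7, north: 40.92 };
// Tax lot zooms lower than this are described above as too expensive.
const TAX_LOT_MIN_ZOOM = 14;

interface LonLatBounds {
  west: number;
  south: number;
  east: number;
  north: number;
}

// Latitude in degrees of the top edge of tile row `row` at zoom `z`.
function tileRowToLat(row: number, z: number): number {
  const n = Math.PI * (1 - (2 * row) / Math.pow(2, z));
  return (Math.atan(Math.sinh(n)) * 180) / Math.PI;
}

// Longitude/latitude bounds of an XYZ tile.
function tileBounds(z: number, x: number, y: number): LonLatBounds {
  const tilesPerSide = Math.pow(2, z);
  return {
    west: (x / tilesPerSide) * 360 - 180,
    east: ((x + 1) / tilesPerSide) * 360 - 180,
    north: tileRowToLat(y, z),
    south: tileRowToLat(y + 1, z),
  };
}

type TileCheck = "ok" | "zoom-too-low" | "outside-extent";

// Decides whether a tax lot tile is worth querying the database for.
function checkTaxLotTile(z: number, x: number, y: number): TileCheck {
  if (z < TAX_LOT_MIN_ZOOM) return "zoom-too-low";
  const b = tileBounds(z, x, y);
  const intersects =
    b.west < NYC_EXTENT.east &&
    b.east > NYC_EXTENT.west &&
    b.south < NYC_EXTENT.north &&
    b.north > NYC_EXTENT.south;
  return intersects ? "ok" : "outside-extent";
}

console.log(checkTaxLotTile(15, 9647, 12319));
```

Returning a reason rather than a status code leaves the 404 versus 403 question open for the controller to decide.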
openapi/openapi.yaml
Controller
When a tile is successfully generated, the controller sends it with a protobuf header. If there is an error, the standard JSON error responses are sent.
tax lot controller example
src/tax-lot/tax-lot.controller.ts
Service
The services extract the byte data from the query and pass it to the controller.
tax lot service example
src/tax-lot/tax-lot.service.ts
Repository
Each tile is generated with a SQL query in the repository. This gives us flexibility to format the properties in a way that best serves us. In several instances, we manually JSON-stringify arrays of data. Frontends can then explicitly parse this data into lists. This is helpful in a couple of situations. First, we convert the label colors from hex to RGBA for easier rendering:
src/tax-lot/tax-lot.repository.ts
Second, we list the multiple classes and categories a single zoning district label can belong to:
src/zoning-district/zoning-district.repository.ts
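The stringify-then-parse round trip for array properties can be illustrated as follows. MVT feature properties hold scalar values, so an array has to travel as a string. In the repository the conversion happens inside the SQL query; this TypeScript version only shows the shape of the data, with hypothetical function names and an assumed 0 to 255 alpha scale.

```typescript
type Rgba = [number, number, number, number];

// Converts a six-digit hex color such as "#ff8800" to an RGBA array.
// The 0-255 alpha default is an assumption made for this sketch.
function hexToRgba(hex: string, alpha: number = 255): Rgba {
  const digits = /^#?([0-9a-f]{6})$/i.exec(hex)?.[1];
  if (digits === undefined) throw new Error(`Unsupported hex color: ${hex}`);
  const value = parseInt(digits, 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff, alpha];
}

// Server side: the array is serialized so it fits in a string property.
function encodeArrayProperty(values: unknown[]): string {
  return JSON.stringify(values);
}

// Client side: the string property is parsed back into a list.
function parseArrayProperty<T>(raw: string): T[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("Expected a JSON array");
  return parsed as T[];
}

const encoded = encodeArrayProperty(hexToRgba("#ff8800"));
console.log(encoded, parseArrayProperty<number>(encoded));
```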
We could've also applied the same treatment to zoning-district fills. However, there is a benefit to instead having a distinct row for class and category for each zoning-district fill: it allows us to blend the colors of multiple fills to show the mixed nature of a zoning district.
Notes
Ignore the TilesetRetrievalException. This was a misadventure and was removed from the PR version.
The tilesets/*.sql files were experimental. They were used to test the raw queries that were eventually converted into the Drizzle queries. They are saved on the experimental branch for reference. However, they are removed in the PR branch.