
Product: FacDB

Damon McCullough edited this page Aug 31, 2023 · 2 revisions

Home

Facilities database (FacDB) 🏭 🏥 🏨


I. Overview

The City Planning Facilities Database aggregates more than 35,000 records from 52 different public data sources provided by City, State, and Federal agencies.

While each source agency classifies its facilities according to its own naming system, we have grouped all facilities and program sites into the following seven domains to help planners navigate the data more easily:

  • Health and Human Services
  • Education, Child Welfare, and Youth
  • Parks, Gardens, and Historical Sites
  • Libraries and Cultural Programs
  • Public Safety, Emergency Services, and Administration of Justice
  • Core Infrastructure and Transportation
  • Administration of Government

Within each of these domains, each record is further categorized into a set of facility groups, subgroups, and types that are intended to make the data easy to navigate and more useful for specific planning purposes. Facility types and names appear as they do in source datasets, wherever possible. A full listing of the facility categories is provided in the data dictionary.

General information

Dataset Name: Facilities Database (FacDB)
Agency Name: Department of City Planning
Update Frequency: Quarterly
Dataset Description: Facilities and program sites that are owned, operated, funded, licensed, or certified by a City, State, or Federal agency
Dataset Keywords: Facilities, Education, Child Welfare, Parks, Gardens, Historical Sites, Libraries, Cultural Programs, Public Safety, Emergency Services, Administration of Justice, Health Services, Human Services, Infrastructure, Transportation, Government Administration
Dataset Category: City Government
Additional Information: The Department of City Planning aggregates information about 33,000+ facilities and program sites that are owned, operated, funded, licensed, or certified by a City, State, or Federal agency in the City of New York into a central database called the City Planning Facilities Database (FacDB). These facilities generally help to shape quality of life in the city’s neighborhoods, and this dataset is the basis for a series of planning activities. This public data resource allows all New Yorkers to understand the breadth of government resources in their neighborhoods.

Each record in FacDB represents a facility site.

FacDB is the most comprehensive spatial data resource available for facilities run by public and non-public entities in NYC, but it does not claim to capture every facility within the specified domains. Some facilities are deliberately excluded from the data that source agencies provide in order to protect the safety and privacy of their clients. Also, many records could not be geocoded.

There are known cases in which the address provided in the source data is for a headquarters office rather than the facility site location. Unfortunately, these could not be systematically verified. For more detailed information on a specific facility, reach out to the respective oversight agency.

II. Common uses

Fair Share analysis, neighborhood studies, facilities planning

III. Watch-outs

Analysis Limitations. Because of the data limitations and inconsistencies listed below, users should be careful in their use of this database to avoid developing unreliable analyses. For example, a comparison of the density or accessibility of facilities across neighborhoods should recognize that some of the included facilities are organizational headquarters rather than service sites, and that this database is not authoritatively comprehensive. In addition, we rely on source data from other agencies to populate the database, and some of these sources may fall out of date. Users can find the date of each source dataset’s latest update in the source data dictionary.

Missing Records. Currently, FacDB is the most comprehensive spatial data resource available for facilities run by public and non-public entities in NYC, but it does not claim to capture every facility within the specified domains. Some facilities are deliberately excluded from the data that source agencies provide in order to protect the safety and privacy of their clients. Also, many records could not be geocoded. To learn more about how the data are processed, see Section IV, Data Sources and Compilation Process.

Duplicates. Please be aware that this dataset may include duplicate records for the same facility, because several source datasets have overlapping content.

Administrative Addresses. There are known cases in which the address provided in the source data is for a headquarters office rather than the facility site location. Unfortunately, these could not be systematically verified. For more detailed information on a specific facility, reach out to the respective oversight agency.

Public Accessibility of Sites. DCP is unable to verify the public accessibility of all sites. For example, some playgrounds or playing fields may only be accessible to participants in certain programs.

IV. Data Sources and Compilation Process

Because the facility records are aggregated from many datasets designed for different purposes, the data are transformed in several stages to reach their final state. The stages are described below, and all the scripts used are available on the NYC Planning GitHub page.

Data loading. Because the source datasets are maintained by various agencies and updated at different frequencies, they are loaded into Amazon S3, which serves as a centralized data hub for downstream processing. The list of data sources can be found here.

Geoprocessing. When records have address information, spatial data is assigned by taking the centroid of the building footprint (from the DoITT building footprints dataset) that matches the BIN returned by Geosupport. If a BIN is not available, the latitude and longitude returned by Geosupport are used to create the geometry for the record. If these fields are not available from Geosupport but the source data has spatial information (i.e., coordinates), the geometry is created from the source data; if the source data consists of polygon geometries, the centroid of the polygon is used. In some cases, coordinates from the source data fall in the roadbed rather than inside a BBL boundary because of the geocoding technique used by the source. Lastly, if a geometry cannot be assigned from the BIN, the Geosupport latitude/longitude, or the source data, the centroid of the BBL from the clipped MapPLUTO is used. Other geographic information, such as the community district, is taken from Geosupport when a value is returned; otherwise, administrative districts are assigned via spatial joins where the record has a geometry.
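The geometry priority described above can be sketched as a simple fallback chain. This is a minimal illustration, not the FacDB implementation: the `GeoInputs` fields and the origin labels are hypothetical, geometries are plain coordinate tuples rather than PostGIS or Shapely objects, and the centroid uses the planar shoelace formula, which is only an approximation when applied to longitude/latitude degrees.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def polygon_centroid(ring: Sequence[Point]) -> Point:
    """Area-weighted centroid of a simple polygon ring (planar shoelace formula)."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return (cx / (3 * area2), cy / (3 * area2))


@dataclass
class GeoInputs:
    """Hypothetical container for the candidate geometry sources of one record."""
    bin_footprint: Optional[Sequence[Point]] = None   # footprint matching the Geosupport BIN
    geosupport_point: Optional[Point] = None          # lon/lat returned by Geosupport
    source_point: Optional[Point] = None              # coordinates from the source data
    source_polygon: Optional[Sequence[Point]] = None  # polygon geometry from the source data
    bbl_lot: Optional[Sequence[Point]] = None         # lot polygon from clipped MapPLUTO


def assign_geometry(g: GeoInputs) -> Tuple[Optional[Point], str]:
    """Return (point, origin) using the first available source in priority order."""
    if g.bin_footprint is not None:
        return polygon_centroid(g.bin_footprint), "bin_footprint"
    if g.geosupport_point is not None:
        return g.geosupport_point, "geosupport"
    if g.source_point is not None:
        return g.source_point, "source_point"
    if g.source_polygon is not None:
        return polygon_centroid(g.source_polygon), "source_polygon"
    if g.bbl_lot is not None:
        return polygon_centroid(g.bbl_lot), "mapluto_bbl"
    return None, "none"  # record stays without a geometry
```

Returning the origin alongside the point is a design choice that makes it easy to audit how many records fell back to MapPLUTO lot centroids.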

Duplicate Record Removal. Several of the source datasets have overlapping content. Duplicate records were identified by querying for all records that fall within the same BIN or BBL and share the same Facility Subgroup or Type, the same Facility Name, or the same Oversight Agency. Where duplicates were identified, all but the primary record were removed from the database.
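A minimal sketch of one reading of the duplicate rule, assuming records are dicts keyed by FacDB field names. Records that share a location (BIN, falling back to BBL) and match on a chosen set of fields are treated as duplicates, and the first record encountered is kept as the primary one. How FacDB actually selects the primary record is not described here, so that choice, and the function names, are assumptions.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def location_key(rec: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Prefer BIN, fall back to BBL; None if the record has neither."""
    if rec.get("bin"):
        return ("bin", rec["bin"])
    if rec.get("bbl"):
        return ("bbl", rec["bbl"])
    return None


def normalize(value: Optional[str]) -> str:
    """Loose comparison: ignore case and surrounding whitespace."""
    return (value or "").strip().lower()


def drop_duplicates(records: Iterable[Dict[str, str]],
                    match_fields: Sequence[str]) -> List[Dict[str, str]]:
    """Keep the first record for each (location, match_fields) combination."""
    seen = set()
    kept = []
    for rec in records:
        loc = location_key(rec)
        if loc is None:
            kept.append(rec)  # without a BIN or BBL the record cannot be matched
            continue
        key = (loc, tuple(normalize(rec.get(f)) for f in match_fields))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept
```

Calling `drop_duplicates` once per match rule (for example `["facsubgrp", "facname"]`, then `["overagency"]`) is one way to approximate the "or" in the description; whether FacDB applies the rules that way is not stated. Note that this sketch does not match a BIN-keyed record against a BBL-only record on the same lot.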

V. General Constraints: Use Limitations

The facilities database is being provided by the Department of City Planning (DCP) for informational purposes only. DCP does not warrant the completeness, accuracy, content, or fitness for any particular purpose or use of the dataset, nor are any such warranties to be implied or inferred with respect to the dataset as furnished on the website.

VI. Legal Constraints: Use Limitations

DCP and the City are not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of the dataset, or of applications utilizing the dataset, provided by any third party. The City Planning Facilities Database (FacDB) is only as good as the source data it aggregates, and the Department of City Planning cannot verify the accuracy of all records. Please read more about specific data and analysis limitations before using this data.


Data Loading Instructions

Data Loading

Datasets that are updated on every build through a scraper:

OpenData updates – check date against what’s in recipes:

Manually check data for updates:

Manually download and load via recipe app:

  • dcp_pops

    • Source: Download from the POPs app, available on DCP Commons. Be sure to take only the public version.
    • Be sure to load this source last, as it needs to be in sync with the OpenData release of POPs.
  • doe_lcgms

  • dot_bridgehouses

    • Source: Will receive via email or FTP
  • dot_ferryterminals

    • Source: Will receive via email or FTP
  • dot_mannedfacilities

    • Source: Will receive via email or FTP
  • dot_publicparking

    • Source: Will receive via email or FTP
  • dot_pedplazas

    • Source: Will receive via email or FTP
  • foodbankny_foodbanks

    • Source: Foodbank NYC
    • Source url: http://www.foodbanknyc.org/get-help/
    • Go to the expanded view of the Google map. Click “Download KML” under the options menu (three dots). Instead of “Entire Map,” select “Food Bank For NYC Open Sites.” Select “Keep data up to date with network link KML (only usable online).” Go to https://mygeodata.cloud/converter/kmz-to-csv to convert the KMZ to CSV, then use the recipe app to load the CSV.
  • nysed_activeinstitutions

  • nysed_nonpublicenrollment

  • nysoasas_programs

  • usnps_parks
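For the foodbankny_foodbanks step above, the KMZ-to-CSV conversion can also be done locally with the Python standard library instead of the online converter. A KMZ is a zip archive containing a KML file, and each Placemark's Point holds `longitude,latitude[,altitude]` coordinates. This is a minimal sketch; the function names and CSV columns are hypothetical, and, as the March 2020 notes for foodbankny_foodbanks point out, the downloaded KML may lack spatial information, in which case this yields no rows.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, List, TextIO

KML = "{http://www.opengis.net/kml/2.2}"  # KML 2.2 namespace prefix for ElementTree
FIELDS = ["name", "longitude", "latitude"]


def kmz_to_rows(kmz_bytes: bytes) -> List[Dict[str, object]]:
    """Extract name and point coordinates from every Placemark in a KMZ archive."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as zf:
        kml_name = next((n for n in zf.namelist() if n.lower().endswith(".kml")), None)
        if kml_name is None:
            raise ValueError("no .kml file inside the KMZ")
        root = ET.fromstring(zf.read(kml_name))
    rows = []
    for placemark in root.iter(KML + "Placemark"):
        coords = placemark.findtext(f".//{KML}Point/{KML}coordinates", default="").strip()
        if not coords:
            continue  # skip placemarks without point geometry
        lon, lat = coords.split(",")[:2]  # KML order is longitude,latitude[,altitude]
        name = placemark.findtext(KML + "name", default="").strip()
        rows.append({"name": name, "longitude": float(lon), "latitude": float(lat)})
    return rows


def write_csv(rows: List[Dict[str, object]], out: TextIO) -> None:
    """Write the extracted rows as CSV for loading through the recipe app."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Typical use: read the downloaded file in binary mode, pass its bytes to `kmz_to_rows`, and write the result with `write_csv` to a file opened with `newline=""`.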

Used, but not updated

  • dep_wwtc

Need more information


Guidelines for bringing a dataset into FacDB

Python

  1. Ingest dataset using custom function
    • Filter the dataset if the filters are straightforward; for example, filtering state data by county.
    • Clean any fields as much as is necessary to use them as inputs in geocoding functions
  2. Use decorators to geocode as much as possible
    • If BBL exists, use function BL
    • If BIN exists, use function BN
    • If house number, street name, borough and/or zipcode exist, pass into 1B directly
    • If address, borough and/or zipcode exist, parse the address, then pass it into 1B
    • If no address, BIN, or BBL information exists, pass the record through without geocoding
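The routing rules in step 2 can be sketched as follows. The decorator, the function names, the record field names (`house_number`, `street_name`, `borough`, `zipcode`, `address`), the address regex, and the order in which BBL is tried before BIN are all assumptions for illustration. The sketch only picks a Geosupport function name (BL, BN, or 1B) and its inputs; it does not call Geosupport itself.

```python
import functools
import re
from typing import Callable, Dict, List, Optional, Tuple

# House numbers such as "120", "120A", or Queens-style hyphenated "123-45"
HOUSE_NUMBER_RE = re.compile(r"^\s*(\d+(?:-\d+)?[A-Za-z]?)\s+(.+?)\s*$")


def parse_address(address: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split a one-line address into (house number, street name), or (None, None)."""
    match = HOUSE_NUMBER_RE.match(address or "")
    if match is None:
        return None, None
    return match.group(1), match.group(2)


def choose_function(rec: Dict[str, str]) -> Tuple[Optional[str], Dict[str, str]]:
    """Pick a Geosupport function name and its inputs for one record."""
    if rec.get("bbl"):
        return "BL", {"bbl": rec["bbl"]}
    if rec.get("bin"):
        return "BN", {"bin": rec["bin"]}
    # 1B needs a borough and/or a zip code to locate the street
    locality = {k: rec[k] for k in ("borough", "zipcode") if rec.get(k)}
    if locality:
        hnum, street = rec.get("house_number"), rec.get("street_name")
        if not (hnum and street):
            hnum, street = parse_address(rec.get("address"))
        if hnum and street:
            return "1B", {"house_number": hnum, "street_name": street, **locality}
    return None, {}  # no usable location info: pass through without geocoding


def route_geocoding(ingest: Callable[..., List[dict]]) -> Callable[..., List[dict]]:
    """Decorator: tag each record returned by an ingest function with its geocoding route."""
    @functools.wraps(ingest)
    def wrapper(*args, **kwargs):
        records = ingest(*args, **kwargs)
        for rec in records:
            rec["geo_function"], rec["geo_inputs"] = choose_function(rec)
        return records
    return wrapper


@route_geocoding
def ingest_example() -> List[dict]:
    """Hypothetical ingest function returning already-cleaned records."""
    return [
        {"facname": "Site A", "bbl": "1000010001"},
        {"facname": "Site B", "address": "123-45 Queens Blvd", "borough": "Queens"},
        {"facname": "Site C"},
    ]
```

Keeping the routing in a decorator lets each source-specific ingest function stay focused on loading and light cleaning, while the geocoding decision is applied uniformly.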

SQL

  1. Do source-specific manipulations in SQL to create the table _{dataset}
    • Map source data fields directly to facdb fields
      • facname
      • factype
      • datasource
      • facsubgrp
      • opname
      • optype
      • overagency
      • capacity
      • captype
      • proptype
    • Include any filtering that is not straightforward in Python
  2. Combine records from _{dataset} tables into a single table.
    • Standardize across datasets
      • opname
      • overagency
    • Assign values by taking from cleaned Geosupport inputs or from lookup tables
      • boro
      • addressnum
      • streetname
      • address
      • city
      • zipcode
      • bin
      • bbl
      • facgroup
      • facdomain
      • servarea
      • opabbrev - lookup with opname (opname should be standardized)
      • overabbrev - lookup with overagency (overagency should be standardized, need to decide if we should take Green Book standard)
      • overlevel
    • Assign geographic attributes and check that values are consistent across a record (need to decide logic about when we take from Geosupport versus source, and which Geosupport functions have priority)
      • addressnum
      • streetname
      • address
      • city
      • zipcode
      • bin
      • bbl
      • latitude
      • longitude
      • xcoord
      • ycoord
      • commboard
      • nta
      • council
      • censtract
      • geom
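A minimal sketch of the SQL combination step, run against an in-memory SQLite database from Python so that it is self-contained. The per-source table names follow the _{dataset} convention above, but the reduced column set, the lookup tables (`lookup_agency`, `lookup_classification`), and all values are hypothetical and illustrative only; the real FacDB build and its lookup contents may differ. The sketch standardizes overagency by trimming and upper-casing before the lookup join, then assigns overabbrev, overlevel, facgroup, and facdomain from the lookups.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical per-source tables and lookups; all values are illustrative only.
SETUP_SQL = """
CREATE TABLE _dcp_pops (facname TEXT, datasource TEXT, facsubgrp TEXT, overagency TEXT);
CREATE TABLE _dpr_parksproperties (facname TEXT, datasource TEXT, facsubgrp TEXT, overagency TEXT);
INSERT INTO _dcp_pops VALUES
  ('Example Plaza', 'dcp_pops', 'Privately Owned Public Space', 'NYC Department of City Planning');
INSERT INTO _dpr_parksproperties VALUES
  ('Example Park', 'dpr_parksproperties', 'Parks', '  nyc department of parks and recreation ');

CREATE TABLE lookup_agency (agency_key TEXT PRIMARY KEY, overagency TEXT,
                            overabbrev TEXT, overlevel TEXT);
INSERT INTO lookup_agency VALUES
  ('NYC DEPARTMENT OF CITY PLANNING', 'NYC Department of City Planning', 'NYCDCP', 'City'),
  ('NYC DEPARTMENT OF PARKS AND RECREATION', 'NYC Department of Parks and Recreation', 'NYCDPR', 'City');

CREATE TABLE lookup_classification (facsubgrp TEXT PRIMARY KEY, facgroup TEXT, facdomain TEXT);
INSERT INTO lookup_classification VALUES
  ('Privately Owned Public Space', 'Parks and Plazas', 'Parks, Gardens, and Historical Sites'),
  ('Parks', 'Parks and Plazas', 'Parks, Gardens, and Historical Sites');
"""

# Union the per-source tables, then standardize and enrich via lookup joins.
COMBINE_SQL = """
CREATE TABLE facdb AS
WITH combined AS (
  SELECT facname, datasource, facsubgrp, overagency FROM _dcp_pops
  UNION ALL
  SELECT facname, datasource, facsubgrp, overagency FROM _dpr_parksproperties
)
SELECT c.facname, c.datasource, c.facsubgrp,
       COALESCE(a.overagency, TRIM(c.overagency)) AS overagency,  -- standardized name
       a.overabbrev, a.overlevel, k.facgroup, k.facdomain
FROM combined AS c
LEFT JOIN lookup_agency AS a ON a.agency_key = UPPER(TRIM(c.overagency))
LEFT JOIN lookup_classification AS k ON k.facsubgrp = c.facsubgrp;
"""


def build_facdb() -> List[Tuple]:
    """Run the setup and combine steps and return a few columns of the result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL + COMBINE_SQL)
    rows = conn.execute(
        "SELECT datasource, overagency, overabbrev, facdomain FROM facdb ORDER BY datasource"
    ).fetchall()
    conn.close()
    return rows
```

The LEFT JOINs keep records whose agency or subgroup is missing from a lookup, leaving the assigned fields NULL so gaps in the lookups are easy to find.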

Data Dictionary

uid

  • Longform Name: ID
  • Description: Unique ID of the record

facname

  • Longform Name: Facility name
  • Description: Name of the facility in proper case as received from the source data

factype

  • Longform Name: Type
  • Description: Value representing the specific type of facility, which is the most granular category of facilities. This value is often taken directly from the source data

facsubgrp

  • Longform Name: Subgroup
  • Description: Value identifying the subgroup the facility belongs to based on the facility type. Subgroup values are assigned by DCP

facgroup

  • Longform Name: Group
  • Description: Value identifying the group the facility belongs to based on the subgroup

facdomain

  • Longform Name: Domain
  • Description: Value identifying the domain the facility belongs to based on the group. Domain is the broadest categorical grouping

servarea

  • Longform Name: Service area
  • Description: Value identifying whether the extent of the area the facility serves is local or regional

opname

  • Longform Name: Operator name
  • Description: Name of the operating entity

opabbrev

  • Longform Name: Operator acronym
  • Description: Abbreviation for the operating entity

optype

  • Longform Name: Operator type
  • Description: Indicates whether the operating entity is public or non-public

overagency

  • Longform Name: Oversight agency name
  • Description: Name of the agency that oversees the facility

overabbrev

  • Longform Name: Oversight agency acronym
  • Description: Abbreviation for the oversight agency

overlevel

  • Longform Name: Oversight level
  • Description: The level of government of the oversight agency: City, State, City-State, Federal, or Non-public Oversight

capacity

  • Longform Name: Capacity
  • Description: Number of units of the capacity type (see captype) that the facility is intended to hold

captype

  • Longform Name: Capacity type
  • Description: Value representing the unit type of capacity, such as beds, visitors, seats, etc.

proptype

  • removed from dataset
  • Longform Name: Property type
  • Description: x

addressnum

  • Longform Name: House number
  • Description: House number of the facility's address, according to Geosupport

streetname

  • Longform Name: Street name
  • Description: Street name where the facility is located, according to Geosupport

address

  • Longform Name: Address
  • Description: Concatenated value of addressnum and streetname of where the facility is located

city

  • Longform Name: City
  • Description: City name where the facility is located, according to Geosupport

zipcode

  • Longform Name: Zipcode
  • Description: Zip code of the address from Geosupport

boro

  • Longform Name: Borough
  • Description: Full name of the borough the facility is within

borocode

  • Longform Name: Borough Code
  • Description: One-digit code of the borough the facility is within

bin

  • Longform Name: BIN
  • Description: BIN of the building the facility is located in. If the facility spans multiple buildings, only one BIN is reported

bbl

  • Longform Name: BBL
  • Description: BBL of the tax lot the facility is located on. If the facility spans multiple lots, only one BBL is reported
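A BBL is a 10-digit identifier made of a one-digit borough code (the same values used in borocode: 1 Manhattan, 2 Bronx, 3 Brooklyn, 4 Queens, 5 Staten Island), a five-digit tax block, and a four-digit tax lot. The minimal sketch below splits a BBL and derives borocode and boro from it; the class and function names are hypothetical and not part of the FacDB code.

```python
from typing import NamedTuple, Union

BOROUGHS = {1: "Manhattan", 2: "Bronx", 3: "Brooklyn", 4: "Queens", 5: "Staten Island"}


class BBL(NamedTuple):
    borocode: int
    block: int
    lot: int

    @property
    def boro(self) -> str:
        """Full borough name for the one-digit borough code."""
        return BOROUGHS[self.borocode]


def parse_bbl(value: Union[str, int, float]) -> BBL:
    """Split a 10-digit BBL into borough code, tax block, and tax lot."""
    text = str(value).strip()
    if text.endswith(".0"):  # BBLs read as floats (e.g. from CSV) look like "1000010001.0"
        text = text[:-2]
    if len(text) != 10 or not text.isdigit():
        raise ValueError(f"not a 10-digit BBL: {value!r}")
    bbl = BBL(borocode=int(text[0]), block=int(text[1:6]), lot=int(text[6:]))
    if bbl.borocode not in BOROUGHS:
        raise ValueError(f"unknown borough code in BBL: {value!r}")
    return bbl
```

For example, `parse_bbl("3012340056")` gives borough code 3 (Brooklyn), block 1234, and lot 56.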

latitude

  • Longform Name: Latitude
  • Description: Latitude of the location as returned by Geosupport, or calculated using the coordinates in or geometry from the source data

longitude

  • Longform Name: Longitude
  • Description: Longitude of the location as returned by Geosupport, or calculated using the coordinates in or geometry from the source data

xcoord

  • Longform Name: X coord
  • Description: X Coordinate of the location as returned by Geosupport, or calculated using the coordinates in or geometry from the source data

ycoord

  • Longform Name: Y coord
  • Description: Y Coordinate of the location as returned by Geosupport, or calculated using the coordinates in or geometry from the source data

cd

  • Longform Name: Community district
  • Description: Community District the facility is within according to Geosupport

nta

  • Longform Name: NTA code
  • Description: Code of the NTA the facility is within according to Geosupport

council

  • Longform Name: Council district
  • Description: Council district the facility is within according to Geosupport

schooldist

  • Longform Name: School district
  • Description: School district the facility is within according to Geosupport

policeprect

  • Longform Name: Police precinct
  • Description: Police precinct the facility is within according to Geosupport

censtract

  • Longform Name: Census tract
  • Description: Census tract the facility is within according to Geosupport

datasource

  • Longform Name: Source dataset
  • Description: Name of the dataset the record came from

geom

  • Longform Name: Geometry
  • Description: Spatial data component

March 2020 updates

acs_daycareheadstart

+ status: NA
+ comments: discontinued, we are no longer using this data source

bpl_libraries

+ status: updated

dca_operatingbusinesses

+ status: updated

dcas_colp

+ status: NA
+ comments: no new version of COLP released yet (using the November 2018 version on Bytes)

dcla_culturalinstitutions

+ status: updated

dcp_pops

+ status: updated
+ comments: downloaded from the POPS app

dep_wwtc

+ status: NA
+ comments: this dataset doesn't need updates

dfta_contracts

+ status: updated

doe_busroutesgarages

+ status: updated

doe_lcgms

+ status: updated
+ comments: this dataset is updated for CEQR

sca_enrollment_capacity

+ status: updated

doe_universalprek

+ status: updated

dohmh_daycare

+ status: updated

dot_bridgehouses

+ status: NA 
+ comments: might need refresh from FTP

dot_ferryterminals

+ status: NA 
+ comments: might need refresh from FTP

dot_mannedfacilities

+ status: NA 
+ comments: might need refresh from FTP

dot_pedplazas

+ status: NA 
+ comments: might need refresh from FTP

dot_publicparking

+ status: NA 
+ comments: might need refresh from FTP

dpr_parksproperties

+ status: updated

dsny_mtsgaragemaintenance

+ status: NA
+ comments: might need refresh from FTP

dycd_afterschoolprograms

+ status: updated

fbop_corrections

+ status: NA
+ comments: doesn't need update, no new facilities added

fdny_firehouses

+ status: updated

foodbankny_foodbanks

+ status: NA
+ comments: need to scrape data from Google Maps; the downloaded KML does not have spatial information

hhc_hospitals

+ status: updated

hra_centers

+ status: updated
+ comments: new data source https://data.cityofnewyork.us/City-Government/Community-Health-Centers/b2sp-asbg/data

moeo_socialservicesiteloactions

+ status: NA
+ comments: received by email

nycdoc_corrections

+ status: NA
+ comments: no update needed, hand checked no new facilities added

nycha_communitycenters

+ status: updated

nycha_policeservice

+ status: updated

nycourts_courts

+ status: NA
+ comments: hand checked, no update needed

nypl_libraries

+ status: updated
+ comments: not sure whether new libraries were added, but the scraper worked

nysdec_lands

+ status: updated
+ comments: for some reason GDAL won't read the link, so I had to update it manually. Not sure whether new records were added.

nysdec_solidwaste

+ status: updated

nysdoccs_corrections

+ status: updated
+ comments: 1 facility in Queens, 1 facility in Manhattan, 0 in the other 3 boroughs.

nysdoh_healthfacilities

+ status: updated

nysdoh_nursinghomes

+ status: updated

nysed_activeinstitutions

+ status: updated
+ comments: manually downloaded the selected table “All Institutions: Active Institutions with GIS coordinates and OITS Accuracy Code - Select by County” as a CSV from the website (https://eservices.nysed.gov/sedreports/list?id=1) and loaded it into S3

nysed_nonpublicenrollment

+ status: updated
+ comments: this data set was not previously included in the list for some reason. 

nysoasas_programs

+ status: updated
+ comments: original link no longer works; switched to https://edm-recipes.nyc3.digitaloceanspaces.com/2020-03-23/Treatment_Providers_OASAS_Directory_Search_23-Mar-20.csv

nysomh_mentalhealth

+ status: updated

nysopwdd_providers

+ status: updated

nysparks_historicplaces

+ status: updated

nysparks_parks

+ status: updated

qpl_libraries

+ status: updated

sbs_workforce1

+ status: updated

uscourts_courts

+ status: updated
+ comments: scraper ran smoothly; not sure whether new facilities were added

usdot_airports

+ status: updated
+ comments: the ArcGIS site says it was updated 2020/02/17; no URL change

usdot_ports

+ status: updated
+ comments: URL changed to https://data-usdot.opendata.arcgis.com/datasets/major-ports-1; data is as of 2019/12/17

usnps_parks

+ status: updated
+ comments: manually downloaded from the URL and loaded into S3