Skip to content
This repository has been archived by the owner on Jul 13, 2023. It is now read-only.

Refactor scratch pad #283

Open
SPTKL opened this issue Sep 29, 2021 · 4 comments
Open

Refactor scratch pad #283

SPTKL opened this issue Sep 29, 2021 · 4 comments

Comments

@SPTKL
Copy link
Contributor

SPTKL commented Sep 29, 2021

This issue will be used as a scratch pad to document potential refactors

Terminologies:

  • bbl (the actual bbl)
  • billing_bbl (the actual bbl for non-condo lots, 75 condo bbl for condo lots)
  • prime_bbl (the same as billing bbl, this is the intended join key for all DOF related tables)

Note that the bbl field in pluto is actually the prime_bbl, which is a combo of bbl and billing_bbl for condo lots

cama_bsmtcode.sql

replacing create_cama_primebbl.sql and cama_bsmttype.sql

SELECT 
	DISTINCT 
	x.prime_bbl,
	COALESCE(b.bsmtcode, '5') as bsmtcode
FROM (
        SELECT
        	COALESCE(dof_condo.condo_bill, pluto_input_cama_dof.bbl) AS prime_bbl, 
        	bsmnt_type,
        	bsmntgradient,
        	ROW_NUMBER() OVER (
        		PARTITION BY LEFT(bbl,10) 
        		ORDER BY bsmnt_type DESC, bsmntgradient DESC
        	) AS row_number
        FROM pluto_input_cama_dof LEFT JOIN dof_condo 
        	ON pluto_input_cama_dof.bbl = dof_condo.condo_base
        WHERE bsmnt_type <> '0' AND bldgnum = '1'
) x
LEFT JOIN pluto_input_bsmtcode b
	ON x.bsmnt_type = b.bsmnt_type 
	AND x.bsmntgradient = b.bsmntgradient
WHERE x.row_number = 1;
@SPTKL
Copy link
Contributor Author

SPTKL commented Sep 30, 2021

Tables that has BBL or BillingBBL or PrimeBBL

  • dcp_colp
    • bbl
    • mapbbl
  • dof_condo
    • condo_base
    • condo_bill
  • dof_dtm
    • bbl

dof_dtm contains all the physical bbls and doesn't have billing bbls

select bbl from dof_dtm
where right(bbl, 4) like '75%'
-- 0 record

dof_condo condo_base represent the base (physical bbl), and is a subset of all bbls in dof_dtm
dof_condo represent the comprehensive lookup between base (physical bbl) and condo bbl (billing bbl)

SELECT bbl from dof_dtm WHERE bbl not in (select condo_base from dof_condo)
  • pluto_input_cama_dof
    • bbl
-- the longest bbl is 13, so we would have to do left(bbl, 10)
select max(length(bbl)) from pluto_input_cama_dof
  • pluto_input_condolot_descriptiveattributes
    • parid
  • pluto_input_geocodes
    • billingbbl
    • bbl
  • pluto_input_numbldgs
    • bbl
  • pluto_pts
    • parid
    • boro, block, lot

not that for pluto_pts, only the first 10 digit of parid is bbl
image

--- boro||block||lot = LEFT(parid, 10)
select boro||block||lot as bbl, parid
from pluto_pts
where boro||block||lot <> LEFT(parid, 10)

pluto_pts contains condo bbls (billing bbl)

select left(parid, 10) from pluto_pts 
where right(left(parid, 10), 4) like '75%'

condo bbl, which is a type of billing bbl, is characterized by e.g. RIGHT(bbl, 4) LIKE '75%', note that the base (physical) bbl would never have '75%' as lot number.

SELECT 
	LEFT(parid, 10), 
	dof_condo.condo_base, 
	dof_condo.condo_bill 
FROM pluto_pts 
JOIN dof_condo ON LEFT(parid, 10) = dof_condo.condo_base
WHERE RIGHT(LEFT(parid, 10), 4) LIKE '75%'
-- 0 records returned

Comparing PTS vs DTM

SELECT bbl, left(parid, 10) 
FROM dof_dtm FULL OUTER JOIN pluto_pts
ON bbl = left(parid, 10) 
WHERE bbl is null or left(parid, 10) is null;

There are bbls in DTM not in PTS and bbls in PTS but not in DTM (most of them have lot number starting with 1), so the final list of bbls should be a combination of the two

Comparing CAMA vs PTS

select LEFT(bbl, 10) as cama_bbl, left(parid, 10) as pts_bbl
from pluto_input_cama_dof FULL OUTER JOIN pluto_pts
ON LEFT(bbl, 10)  = left(parid, 10) 
WHERE LEFT(bbl, 10)  is null or left(parid, 10) is null;

It seems like CAMA mostly align with PTS, however, there were two bbls not in PTS, and PTS has condo bbls not in CAMA

select LEFT(bbl, 10) from pluto_input_cama_dof
where right(LEFT(bbl, 10), 4) like '75%';
-- 0 records

CAMA only has actual base (physical) bbls not billing bbls

We have very few condo lots that have a record in pluto_input_cama_dof

157 condo lots have a record in CAMA, and there are 10349 condo lots in total

SELECT LEFT(bbl, 10) AS cama_bbl 
from pluto_input_cama_dof
WHERE LEFT(bbl, 10) in (SELECT condo_base from dof_condo);

@SPTKL
Copy link
Contributor Author

SPTKL commented Sep 30, 2021

For condo unit records in PTS

e.g.
base_bbl = 4015070001
billing_bbl = 4015077502
unit_bbl = 4015071101 ~ 4015071140
image

the values would have to come from aggregation of condo units
image
but land area and gross sqft would come from the main record?
image

@SPTKL
Copy link
Contributor Author

SPTKL commented Oct 4, 2021

Issues that I noticed:

There are 3 kinds of BBLs

  • bbl the physical lot bbl, it's the same as billing bbl for non-condos
    • for condos, we call it condo_base_bbl
  • condo_billing_bbl the imaginary billing bbl for condos (think of condos as lots), they nest within a condo_base_bbl
  • unit_bbl each unit in a condo has a unit_bbl that nest on top of a condo_base_bbl

the same condo_base_bbl can correspond to multiple condo_billing_bbl
select * from dof_condo where condo_base = '4076210001'
image
the dof_condo table also has duplicated entries
image
in dof_condo there are new condo buildings that don't have a condo_billing_bbl assigned yet. but if they do, they will always follow the 75 convention

SELECT * FROM dof_condo WHERE RIGHT(condo_bill, 4) NOT like '75%' or condo_bill is null;

In this case, these lots are just treated as normal lots, instead of condo lots

SELECT * FROM pluto_pts WHERE left(parid, 10) in (SELECT condo_base from dof_condo where condo_bill is null);

@SPTKL
Copy link
Contributor Author

SPTKL commented Oct 15, 2021

SELECT 
	pts.prime_bbl,
	coalesce(cama.residarea, pts.residarea) as residarea,
	coalesce(cama.officearea, pts.officearea) as officearea,
	coalesce(cama.retailarea, pts.retailarea) as retailarea,
	coalesce(cama.garagearea, pts.garagearea) as garagearea,
	coalesce(cama.storagearea, pts.storagearea) as storagearea,
	coalesce(cama.factoryarea, pts.factoryarea) as factoryarea,
	coalesce(cama.otherarea, pts.otherarea) as otherarea,
	coalesce(cama.commercialarea, pts.commercialarea) as commercialarea
FROM (
	SELECT 
		coalesce(
			coalesce(dof_condo.condo_bill,dof_condo.condo_base), 
			LEFT(parid,10)
		) as prime_bbl,
		NULLIF(SUM( 
	        office_area_gross::numeric+
	       	retail_area_gross::numeric+
	        garage_area::numeric+
	        storage_area_gross::numeric+
	        factory_area_gross::numeric+
	        other_area_gross::numeric
	    ), 0) as commercialarea,
		NULLIF(SUM(residential_area_gross::numeric), 0) as residarea,
	    NULLIF(SUM(office_area_gross::numeric), 0) as officearea,
	    NULLIF(SUM(retail_area_gross::numeric), 0) as retailarea,
	    NULLIF(SUM(garage_area::numeric), 0) as garagearea,
	    NULLIF(SUM(storage_area_gross::numeric), 0) as storagearea,
	    NULLIF(SUM(factory_area_gross::numeric), 0) as factoryarea,
	    NULLIF(SUM(other_area_gross::numeric), 0) as otherarea
	FROM pluto_pts LEFT JOIN dof_condo 
	ON appt_boro||appt_block||appt_lot = dof_condo.condo_base
	WHERE RIGHT(LEFT(parid,10), 2) not like '75'
	GROUP BY prime_bbl
) pts LEFT JOIN (
	SELECT 
		COALESCE(
			coalesce(dof_condo.condo_bill,dof_condo.condo_base), 
			cama.bbl
		) as prime_bbl,
		NULLIF(SUM(commercialarea::numeric), 0) as commercialarea,
	    NULLIF(SUM(residarea::numeric), 0) as residarea,
	    NULLIF(SUM(officearea::numeric), 0) as officearea,
	    NULLIF(SUM(retailarea::numeric), 0) as retailarea,
	    NULLIF(SUM(garagearea::numeric), 0) as garagearea,
	    NULLIF(SUM(storagearea::numeric), 0) as storagearea,
	    NULLIF(SUM(factoryarea::numeric), 0) as factoryarea,
	    NULLIF(SUM(otherarea::numeric), 0) as otherarea
	FROM (
		SELECT * FROM pluto_input_cama_dof 
		WHERE bldgnum = '1' AND LENGTH(bbl) = 10
	) cama LEFT JOIN dof_condo 
	ON cama.bbl = dof_condo.condo_base
	GROUP BY prime_bbl
) cama on pts.prime_bbl = cama.prime_bbl;

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant