nycbuildxwalks downloads official New York City geographic boundary data and generates comprehensive crosswalk tables showing how administrative and spatial boundaries overlap. It produces both wide-format and long-format CSV crosswalks with intersection area and percentage calculations.
This is an R port of the Python NYC geography crosswalks tool by Nathan Storey at MODA-NYC, which builds on earlier work by BetaNYC.
Supported geographies
| id | dataset_name |
|---|---|
| cd | Community Districts |
| pp | Police Precincts |
| dsny | Sanitation Districts |
| fb | Fire Battalions |
| sd | School Districts |
| hc | Health Center Districts |
| cc | City Council Districts |
| nycongress | Congressional Districts |
| sa | State Assembly Districts |
| ss | State Senate Districts |
| bid | Business Improvement Districts |
| nta | Neighborhood Tabulation Areas |
| zipcode | Modified ZIP Code Tabulation Areas |
| hd | Historic Districts |
| ibz | Industrial Business Zones |
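The ids in this table are the values passed to arguments such as primary_only in make_run(). As a minimal sketch, the following checks a set of ids against the table before starting a long run. The check_ids() helper is hypothetical and not part of the package, and whether make_run() performs its own validation is not stated here.

```r
# IDs copied from the table above.
supported_ids <- c("cd", "pp", "dsny", "fb", "sd", "hc", "cc", "nycongress",
                   "sa", "ss", "bid", "nta", "zipcode", "hd", "ibz")

# Hypothetical helper (not part of nycbuildxwalks): stop early on unknown ids.
check_ids <- function(ids, valid = supported_ids) {
  unknown <- setdiff(ids, valid)
  if (length(unknown) > 0) {
    stop("Unknown geography id(s): ", paste(unknown, collapse = ", "))
  }
  invisible(ids)
}

# Passes silently because both ids appear in the table.
check_ids(c("cd", "pp"))
```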
Installation
You can install the development version of nycbuildxwalks from GitHub with:
# install.packages("pak")
pak::pak("kjhealy/nycbuildxwalks")

Alternatively, install this package from my r-universe:
install.packages(
  "nycbuildxwalks",
  repos = c("https://kjhealy.r-universe.dev", "https://cloud.r-project.org")
)

Including https://cloud.r-project.org ensures dependencies on CRAN are resolved automatically.
Usage
The main entry point is make_run(), which orchestrates the full pipeline:
library(nycbuildxwalks)
# Full run: download boundaries and build all crosswalks
run_dir <- make_run()
# With ZIP archives
run_dir <- make_run(zip_artifacts = TRUE)
# Test with limited features
run_dir <- make_run(max_primaries = 2, primary_only = c("cd", "pp"))

You can also run individual steps:
# Step 1: Download and combine boundaries
run_dir <- download_all_boundaries()
# Step 2: Build crosswalks from existing boundaries
build_crosswalks(
  boundaries_path = file.path(run_dir, "all_boundaries.geojson"),
  run_dir = run_dir
)

Methodology
ZIP codes are MODZCTAs: The zipcode layer uses DOHMH MODZCTA polygons. Results may differ from USPS ZIP definitions.
Geometry validity: After combining inputs, invalid geometries are repaired with sf::st_make_valid().
Intersection de-noising: A small negative buffer (-50 ft by default) is applied during intersection to reduce line-touching artifacts. Candidates are not filtered by this buffer. After intersection, a minimum area threshold (100 sq ft) suppresses noise.
Multipart primaries: Primary features are dissolved by name so multipart geometries are treated as a single unit.
Area calculations: All spatial operations use EPSG:2263 (NY State Plane, feet) for accurate area computation.
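The steps above can be illustrated on toy data with sf. This is a sketch under stated assumptions, not the package's internal code. The square() helper, the coordinates, and the column names are made up. It also assumes that the negative buffer is applied to the primary geometry before intersecting, and that percentages are taken relative to the dissolved primary's area. Neither detail is confirmed by the text above.

```r
library(sf)

# Helper for toy data: an axis-aligned square; coordinates are made up (feet).
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# A two-part primary named "A" and two other features, all in EPSG:2263.
parts <- st_sf(
  name = c("A", "A"),
  geometry = st_sfc(square(0, 0, 1000), square(5000, 0, 1000), crs = 2263)
)
others <- st_sf(
  other = c("B", "C"),
  geometry = st_sfc(square(500, 0, 1000), square(1000, 0, 1000), crs = 2263)
)

# Repair invalid geometries (a no-op for these clean squares).
parts <- st_make_valid(parts)
others <- st_make_valid(others)

# Dissolve primary parts by name so each name is one (multi)polygon.
primary <- do.call(rbind, unname(lapply(split(parts, parts$name), function(g) {
  st_sf(name = g$name[1], geometry = st_union(g))
})))

# Values taken from the text: -50 ft buffer, 100 sq ft minimum area.
buffer_ft <- -50
min_area_sqft <- 100

# Shrink the primary, then intersect. "C" only touches "A" along an edge,
# so it no longer intersects once "A" is shrunk.
shrunk <- st_buffer(primary, dist = buffer_ft)
hits <- suppressWarnings(st_intersection(shrunk, others))

# Areas in square feet; the percentage denominator is an assumption.
hits$area_sqft <- as.numeric(st_area(hits))
hits$pct_of_primary <- 100 * hits$area_sqft / as.numeric(st_area(primary))[1]

# Drop slivers below the minimum area threshold.
hits <- hits[hits$area_sqft >= min_area_sqft, ]
```

Because EPSG:2263 is measured in feet, st_area() returns square feet directly, so the buffer and the threshold can be written in the same units as the text.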
Outputs
Each run produces a timestamped directory under outputs/ containing:
- all_boundaries.geojson: combined boundary file
- run_meta.json: source URLs, cycle resolutions, git SHA
- longform/: one CSV per primary geography with pairwise intersection details
- wide/: one CSV per primary geography with semicolon-separated overlapping feature names
- crosswalks_meta.json: buffer, thresholds, and IDs used
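Since the wide files pack several overlapping features into one semicolon-separated cell, a common first step is to expand them back to one row per pair. The sketch below does this in base R on a small hand-made data frame. The column names (cd, pp) and the values are illustrative assumptions, so check the header of an actual file before adapting it.

```r
# Hypothetical wide-format rows; real column names and values may differ.
wide <- data.frame(
  cd = c("BK01", "BK02"),
  pp = c("90;94", "84"),
  stringsAsFactors = FALSE
)

# Split each semicolon-separated cell into a character vector.
pieces <- strsplit(wide$pp, ";", fixed = TRUE)

# Expand to one row per (primary, overlapping feature) pair.
long <- data.frame(
  cd = rep(wide$cd, lengths(pieces)),
  pp = trimws(unlist(pieces)),
  stringsAsFactors = FALSE
)
```

For a real run, the same steps would start from read.csv() on a file found with list.files(file.path(run_dir, "wide"), full.names = TRUE).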
Acknowledgements
This package is based on the Python tool by Nathan Storey at MODA-NYC, which builds on the concepts and data aggregation methods originally implemented in the BetaNYC NYC Boundaries Map repository.
Hexlogo Photo: Detail from “Brooklyn Paving 5-25-16”, NYC DOT.
