In this session, we will discuss the problems that Data Engineering at the Department of City Planning encountered in managing datasets and introduce the open-source tooling that we’ve built to manage metadata, generate documentation, enforce data quality, and automate distribution of data to platforms, with a specific focus on NYC Open Data.

This talk is aimed primarily at those who have an interest in automating some or all of the above. We will walk through how we, at City Planning, catalog our dataset metadata; how that metadata is used to generate READMEs, data dictionaries, and other metadata files; how to leverage metadata for automated QA; how to automate distribution of data to destinations like the Tyler/Socrata open data platform, databases, FTP servers, and data lakes; and finally how interested developers can make use of our framework and potentially contribute code of their own.
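
To make the idea of metadata-driven documentation and QA concrete, here is a minimal Python sketch. It is not the Department of City Planning's framework: the `Column` and `DatasetMetadata` classes, the `render_data_dictionary` and `find_missing_required` functions, and the example dataset are all hypothetical names invented for illustration. It only shows how a single metadata record could feed both a generated data dictionary and a simple automated check.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """One column of a dataset (hypothetical structure, for illustration only)."""
    name: str
    data_type: str
    description: str
    required: bool = False


@dataclass
class DatasetMetadata:
    """A hypothetical metadata record describing one dataset."""
    name: str
    summary: str
    columns: list[Column] = field(default_factory=list)


def render_data_dictionary(meta: DatasetMetadata) -> str:
    """Render a plain-text data dictionary from the metadata record."""
    lines = [meta.name, "=" * len(meta.name), "", meta.summary, ""]
    for col in meta.columns:
        flag = "required" if col.required else "optional"
        lines.append(f"{col.name} ({col.data_type}, {flag}): {col.description}")
    return "\n".join(lines)


def find_missing_required(meta: DatasetMetadata, row: dict) -> list[str]:
    """Return the required column names that are absent or empty in a row."""
    return [
        col.name
        for col in meta.columns
        if col.required and row.get(col.name) in (None, "")
    ]


# A made-up dataset used only to exercise the two functions above.
example = DatasetMetadata(
    name="Example Dataset",
    summary="A made-up dataset used only to illustrate the idea.",
    columns=[
        Column("id", "text", "Unique record identifier.", required=True),
        Column("borough", "text", "Borough name."),
    ],
)

print(render_data_dictionary(example))
print(find_missing_required(example, {"id": "", "borough": "Brooklyn"}))
```

Because the documentation and the check both read from the same record, a change to the metadata updates both at once.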

This presentation is part of the Open Data @ NYC Planning event series.

Click here to RSVP for virtual attendance.

Click the blue “Going” button below to RSVP for in-person attendance at the Department of City Planning’s offices (120 Broadway, New York, NY 10271).

As data analysts and engineers know, quality source data is crucial to sound analyses and healthy pipelines. It can also take a lot of time, effort, and resources to wrangle. Data Engineering at the Department of City Planning has written a new tool (a Python module and CLI) to manage data extraction and archival.

In this session, we will show potential users how they might simplify and automate extracting data from external sources like NYC Open Data and ArcGIS Online. We will touch on some of the built-in features of the tool as well as where we’re going: simple data cleaning, automatic geocoding, data validation, and more.
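
For readers unfamiliar with extracting data from NYC Open Data, the following Python sketch shows one common approach: requesting rows from the Socrata (SODA) JSON endpoint with the standard library. It is not the tool described in this session. The function names and the dataset ID `abcd-1234` are placeholders chosen for illustration, and the sketch assumes the dataset is public and needs no app token.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: datasets are served through the SODA API on this domain.
OPEN_DATA_DOMAIN = "data.cityofnewyork.us"


def build_soda_url(dataset_id: str, limit: int = 1000, where: Optional[str] = None) -> str:
    """Build a SODA JSON endpoint URL for a dataset ID such as 'abcd-1234'."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    # Keep '$' unescaped so the query parameters stay readable.
    query = urlencode(params, safe="$")
    return f"https://{OPEN_DATA_DOMAIN}/resource/{dataset_id}.json?{query}"


def fetch_rows(dataset_id: str, limit: int = 1000, where: Optional[str] = None) -> list:
    """Download rows as a list of dicts (requires network access)."""
    with urlopen(build_soda_url(dataset_id, limit, where)) as response:
        return json.load(response)


# 'abcd-1234' is a placeholder, not a real dataset ID.
print(build_soda_url("abcd-1234", limit=5))
```

Calling `fetch_rows` with a real dataset ID returns the rows as dictionaries, which can then be archived or passed on to cleaning and validation steps.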

NYC Planning has developed the Fast Tracker app to allow users to determine whether planned housing projects are eligible for the new citywide Green Fast Track rule. The rule streamlines housing production by allowing projects of a specific typology to simplify the environmental review process, while satisfying state and city environmental standards. The app integrates Esri Experience Builder with Survey123 and Microsoft Power Automate so users can enter project criteria and determine eligibility under the rule. We will discuss developing the app in tandem with the rule, under a strict timeline and scope. This presentation will focus on how the app was developed and how data was collected, reviewed, and processed for inclusion in the app. We will also talk about how the app and data pipelines are maintained.

This presentation, hosted by the NYC Office of Management and Budget, will take participants through the journey of building a centralized data pipeline on a generic cloud platform to deliver accurate, consistent, and timely insights automatically! We will walk step by step through the entire lifecycle of data management, from ingesting and cleaning raw data to transforming it into a processed, standardized dataset that serves as the foundation for consistent and accurate reporting across the organization.

We will also focus on the general motivations behind the automation process and the critical data decisions made during the cleaning process. Moving from the general to the specific, we will show how we have set up a data workflow that allows us to run automatic reporting, both in the form of a dashboard and regular emails, using the City’s 311 data. In addition, we will talk about cloud data storage, cloud computing, and other modern digital tools we used.
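
The ingest, clean, standardize, and report lifecycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Office of Management and Budget's pipeline: the sample rows are invented, the field names `created_date` and `complaint_type` are assumed, and the function names are hypothetical. The cleaning decisions shown (drop rows with unparseable dates or empty complaint types, normalize capitalization) are examples of the kind of choices such a pipeline has to make.

```python
from collections import Counter
from datetime import datetime

# Invented sample rows standing in for raw 311 records.
raw_rows = [
    {"created_date": "2025-01-03T09:15:00", "complaint_type": " noise - residential "},
    {"created_date": "2025-01-03T11:40:00", "complaint_type": "NOISE - RESIDENTIAL"},
    {"created_date": "not a date", "complaint_type": "Illegal Parking"},
    {"created_date": "2025-01-04T08:00:00", "complaint_type": ""},
    {"created_date": "2025-01-04T10:30:00", "complaint_type": "Illegal Parking"},
]


def clean_row(row):
    """Return a standardized (date, complaint_type) pair, or None if unusable."""
    try:
        created = datetime.fromisoformat(row["created_date"])
    except (KeyError, TypeError, ValueError):
        return None  # Cleaning decision: drop rows without a parseable date.
    complaint = (row.get("complaint_type") or "").strip().title()
    if not complaint:
        return None  # Cleaning decision: drop rows without a complaint type.
    return created.date(), complaint


def summarize(rows):
    """Count cleaned records per (day, complaint type)."""
    cleaned = (clean_row(row) for row in rows)
    return Counter(pair for pair in cleaned if pair is not None)


def render_report(counts):
    """Format the counts as the plain-text body of a recurring report."""
    lines = ["Daily complaint counts"]
    for (day, complaint), n in sorted(counts.items()):
        lines.append(f"{day.isoformat()}  {complaint}: {n}")
    return "\n".join(lines)


print(render_report(summarize(raw_rows)))
```

In a scheduled workflow, the same summary could feed both a dashboard and a regular email.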

Join this event to learn from NYC Emergency Management (NYCEM) about their role coordinating citywide emergency planning and response for all types and scales of emergencies, and how they use 311 data in both the response and recovery cycles of a disaster.
This virtual presentation will explore:
– The types of reporting products produced at NYCEM for different disaster cycles.
– A technical overview of NYCEM’s data collection process and data pipeline.
– The important role 311 data plays in different cycles of an emergency.

NYC School of Data is a community conference that demystifies the policies and practices around open data, technology, and service design. This year’s conference helps conclude NYC Open Data Week and features 30+ sessions organized by NYC’s civic technology, data, and design community! Our conversations and workshops will feed your mind and inspire you to improve your neighborhood.

To attend, you need to purchase tickets. The venue is accessible, and the content is all-ages friendly! If you have accessibility questions or needs, please email us at schoolofdata@beta.nyc.

Thank you to Reinvent Albany and Esri for helping to cover conference costs and making it possible to meet in 2025.

And if you can’t join us in person, tune into the main stage live stream provided by the Internet Society New York Chapter. Follow the conversation at #nycsodata on Bluesky.

Purchase your tickets here.

Open source is transforming the data engineering space. By combining tools like Parquet, Polars, DuckDB, and Dagster, data product creation can achieve a collective 1000x improvement in cost, performance, and simplicity. Plus, thanks to LLMs, it has never been easier to quickly learn how to build with these tools!

Join Christian Casazza, Data Engineer, as he speaks about the Open Data Stack. He’ll show you how to use open-source tools to ingest and store any dataset from the Open Data API, run a SQL transformation pipeline, and visualize the results as a live web app (all for free from your computer!). Then, learn how to work with an LLM like ChatGPT to write ETL code, build SQL queries, and create frontend apps.
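
As a rough illustration of the store-then-transform-with-SQL pattern mentioned above, here is a Python sketch. It deliberately swaps in the standard library's `sqlite3` module as a stand-in for DuckDB so that it runs without installing anything, so it is not the stack presented in the talk. The table names, the `run_pipeline` function, and the sample rows are hypothetical.

```python
import sqlite3

# Invented rows standing in for records ingested from an open data API.
sample_rows = [
    ("Brooklyn", "Noise"),
    ("Brooklyn", "Noise"),
    ("Queens", "Illegal Parking"),
    ("Brooklyn", "Illegal Parking"),
]


def run_pipeline(records):
    """Load raw rows, build a derived table with SQL, and return its contents."""
    con = sqlite3.connect(":memory:")
    try:
        # Step 1: store the raw data as-is.
        con.execute("CREATE TABLE raw_requests (borough TEXT, complaint_type TEXT)")
        con.executemany("INSERT INTO raw_requests VALUES (?, ?)", records)
        # Step 2: transform with SQL into a reporting table.
        con.execute(
            """
            CREATE TABLE requests_by_borough AS
            SELECT borough, COUNT(*) AS request_count
            FROM raw_requests
            GROUP BY borough
            """
        )
        # Step 3: read the result, ready to hand to a chart or web app.
        return con.execute(
            "SELECT borough, request_count FROM requests_by_borough "
            "ORDER BY request_count DESC, borough"
        ).fetchall()
    finally:
        con.close()


print(run_pipeline(sample_rows))
```

Keeping the raw table separate from the derived table means the transformation can be rerun or revised without ingesting the data again.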

Power Query is perhaps the most useful but also the least known of Microsoft programs. Ryan Yeung, Director of Performance Evaluation and Analytics, and Lori Lam, Data Analyst, from the Department of Citywide Administrative Services (DCAS) will demonstrate how to connect Power Query to NYC Open Data and how to automate tedious Extract, Transform, Load (ETL) processes for use in Microsoft Excel or Microsoft Power BI.