Dirty Data Part 1: Finding and Cleaning Anomalies

March 24 @ 1:00 pm - 2:00 pm

Virtual Event
Free

This virtual presentation illustrates how to undertake a data cleansing effort. It is Part I of a two-part presentation on data quality, presented by David Tussey, formerly of the NYC Department of Information Technology and Telecommunications, and Jun Yan, professor of statistics at the University of Connecticut.

This part will describe the steps of a data cleansing effort and illustrate them with live, real-world examples that examine the NYC 311 Service Request (SR) dataset for anomalies. It is intended as an instructional session reflecting lessons learned from our previous data cleansing efforts, with real-time code execution providing an example of each step. By the end, we hope attendees will have a basic understanding of how to go about “cleansing” their own datasets.
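To give a flavor of what an anomaly check can look like, here is a minimal sketch in Python. It is not the presenters' code. The rows are invented for illustration, and the field names (`unique_key`, `created_date`, `closed_date`) and the date format are assumptions rather than the dataset's confirmed schema. The two checks shown, duplicate keys and a closed date earlier than the created date, are only examples of the kind of rule a cleansing effort might apply.

```python
from datetime import datetime

# Hypothetical rows for illustration only; these are not real 311 records,
# and the field names are assumed rather than taken from the dataset.
SAMPLE_ROWS = [
    {"unique_key": "1001", "created_date": "2024-03-01 09:15:00", "closed_date": "2024-03-02 10:00:00"},
    {"unique_key": "1002", "created_date": "2024-03-05 14:30:00", "closed_date": "2024-03-04 08:00:00"},
    {"unique_key": "1003", "created_date": "2024-03-06 11:00:00", "closed_date": ""},
    {"unique_key": "1001", "created_date": "2024-03-07 16:45:00", "closed_date": "2024-03-08 09:00:00"},
]

# Assumed timestamp layout; adjust to match the actual export.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def find_anomalies(rows):
    """Return (unique_key, reason) pairs for rows that look suspicious."""
    anomalies = []
    seen_keys = set()
    for row in rows:
        key = row["unique_key"]

        # Check 1: the same key appearing more than once.
        if key in seen_keys:
            anomalies.append((key, "duplicate key"))
        seen_keys.add(key)

        # An empty closed date is treated here as a still-open request,
        # not as an anomaly in itself.
        if not row["closed_date"]:
            continue

        # Check 2: a request cannot be closed before it was created.
        created = datetime.strptime(row["created_date"], DATE_FORMAT)
        closed = datetime.strptime(row["closed_date"], DATE_FORMAT)
        if closed < created:
            anomalies.append((key, "closed before created"))
    return anomalies


if __name__ == "__main__":
    for key, reason in find_anomalies(SAMPLE_ROWS):
        print(key, reason)
```

Collecting flagged rows with a reason, rather than deleting them outright, keeps the original data intact so each anomaly can be reviewed before deciding how to handle it.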

Part II, presented on Thursday, March 27, will deal with evaluating data quality over time in an attempt to answer the question “Is the data getting better?”

Details

Date:
March 24
Time:
1:00 pm - 2:00 pm
Cost:
Free

Organizers

David Tussey
Jun Yan

Other

Prerequisites
Basic experience with procuring and using datasets from third parties. Some knowledge of the NYC 311 organization, its role, and its mission would be useful.
Public Dataset(s)
311 Service Requests
Event Materials
https://github.com/jun-yan/nyc311clean/tree/main/data_anomalies

General Admission

Click "Going" to register for this event. If you are signing up for a virtual event, you will receive a Zoom link in an email confirmation.
