
Dirty Data Part 1: Finding and Cleaning Anomalies

Start: 2025-03-24T13:00:00-04:00
End: 2025-03-24T14:00:00-04:00

March 24 @ 1:00 pm - 2:00 pm

Free

This virtual presentation illustrates how to undertake a data cleansing effort. It is Part I of a two-part presentation on data quality, presented by David Tussey, formerly of the NYC Department of Information Technology and Telecommunications, and Jun Yan, professor of statistics at the University of Connecticut.

This part describes the steps of a data cleansing effort and illustrates them with live, real-world examples using data from the NYC 311 Service Request (SR) dataset, which we will examine for anomalies. It is intended as an instructional session reflecting lessons learned from our previous data cleansing efforts, with real-time code execution providing examples of each step. By the end, we hope attendees will have a basic understanding of how to go about “cleansing” their own datasets.
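
As a rough illustration of the kind of anomaly checks such a session might cover, here is a minimal sketch in Python. It is not taken from the presentation: the sample rows, the column names (`unique_key`, `created_date`, `closed_date`, `borough`), and the helper `find_anomalies` are all hypothetical, chosen only to resemble a service request table.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows. Column names and values are assumptions made
# for illustration; they are not taken from the presentation.
SAMPLE = """unique_key,created_date,closed_date,borough
1001,2025-01-05 09:00,2025-01-06 10:30,BROOKLYN
1002,2025-01-07 14:00,2025-01-06 08:00,QUEENS
1003,2025-01-08 11:15,,
1001,2025-01-05 09:00,2025-01-06 10:30,BROOKLYN
"""

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format for the sample rows


def find_anomalies(rows):
    """Return (row_index, description) pairs for three simple checks."""
    seen = set()
    issues = []
    for i, row in enumerate(rows):
        # Check 1: the identifier should be unique across rows.
        key = row["unique_key"]
        if key in seen:
            issues.append((i, "duplicate unique_key"))
        seen.add(key)

        # Check 2: a required categorical field should not be blank.
        if not row["borough"].strip():
            issues.append((i, "missing borough"))

        # Check 3: a request cannot be closed before it was created.
        closed = row["closed_date"].strip()
        if closed:
            created_dt = datetime.strptime(row["created_date"], FMT)
            closed_dt = datetime.strptime(closed, FMT)
            if closed_dt < created_dt:
                issues.append((i, "closed before created"))
    return issues


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
issues = find_anomalies(rows)
for index, problem in issues:
    print(index, problem)
```

Each check only flags a row rather than altering it. Whether a flagged row should be dropped, corrected, or kept depends on the dataset and on the question being asked.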

Part II, presented on Thursday, 3/27, will deal with evaluating data quality over time in an attempt to answer the question “Is the data getting better?”

Details

Date: March 24
Time: 1:00 pm - 2:00 pm
Cost: Free
Event Category: Presentation or Talk
Event Tags: Data Governance, Data Science

Organizers

David Tussey
Jun Yan