This is a virtual presentation illustrating how to undertake a data cleansing effort. It is Part I of a two-part presentation on data quality, presented by David Tussey, formerly of the NYC Department of Information Technology and Telecommunications, and Jun Yan, professor of statistics at the University of Connecticut.
This part will describe the steps of a data cleansing effort and illustrate them with live, real-world examples using data from the NYC 311 Service Request (SR) dataset, which we will examine for anomalies. It is intended as an instructional session reflecting lessons learned from our previous data cleansing efforts, augmented by real-time code execution that provides an example of each step. By the end, we hope attendees will have a basic understanding of how to go about “cleansing” their own data sets.
Part II, presented on Thursday, 3/27, will address evaluating data quality over time in an attempt to answer the question “Is the data getting better?”
Click "Going" to register for this event. If you are signing up for a virtual event, you will receive a Zoom link in an email confirmation.