Unlocking Insights from Public Data: A case study with COVID-19 exposure data

Map of Potential COVID-19 Exposures on UC Davis Main Campus; Updated Twice Daily

The Power of Data Visualization

Throughout the pandemic we have seen a proliferation of public data reporting and sharing, which for many may have been their first exposure to dashboards and real-time data visualization. Data visualizations are arguably the most powerful tool we have to enable and share insights from our data. Good data visualizations strive to accurately and faithfully display data, allowing us to quickly and efficiently detect patterns, make connections, and draw conclusions. Good data visualizations make data accessible. But what goes into turning public data into an impactful visual?

UC Davis has made available and visualized a plethora of data on its COVID-19 dashboard, where we can explore our university’s high vaccination rates and low positivity rates. To help our community further engage with these public data and to illustrate how data science workflows and tools can be combined to create similar impactful visualizations, the DataLab team focused on a dataset that has not yet been visualized – the AB 685 COVID-19 exposure data. All the code from this project is available, so you too can learn to turn public tabular data into your own interactive data visualization.

About AB 685

UC Davis makes all known potential worksite exposures to COVID-19 on campus publicly available through the Potential Worksite Exposure Reporting (AB 685) webportal. These data report only known potential exposures, and do not directly address exposure risk. While the exposure data tables in the webportal are comprehensive, it can be difficult to interpret these data beyond identifying potential exposures for specific buildings on specific days. Visualizing these data by layering the exposures onto a campus map can help those unfamiliar with campus geography (including our first- and second-year students who are attending in person for the first time this fall) better assess which potential exposures may be more relevant to them. Adding an interactive time series component makes it easier to quickly assess the extent of that relevance and to explore other patterns in the data. Making the visualization dynamic allows for real-time public access to these insights.

The Making of an Interactive and Dynamic Visualization

To create this data visualization, the UC Davis DataLab wrote computer scripts to collect the exposure data from the public AB 685 webportal, and automated these scripts to run twice daily to ensure near real-time consistency with the campus’ online dataset. The potential exposure worksites are then paired with known building names on campus, and the potential exposure dates are converted into date ranges. These spatial and temporal components are then combined with a map of the campus, and shown on a timeline. Campus buildings on this map are shown in gold during the time frames of potential exposures, helping students and the broader campus community understand where and when a COVID-19 exposure may have occurred.
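The date-range step described above can be sketched in a few lines of Python. This is only an illustration, not the DataLab code (see the project repository for that): the row layout, the `YYYY-MM-DD to YYYY-MM-DD` text format, the sample rows, and the function names `parse_range` and `active_buildings` are all assumptions made for the example.

```python
from datetime import date

# Hypothetical, made-up rows: (building name, potential exposure dates as text).
# The real webportal's column layout and date format may differ.
SAMPLE_ROWS = [
    ("Activities and Recreation Center", "2021-09-20 to 2021-09-22"),
    ("Example Hall", "2021-09-21"),
]


def parse_range(text):
    """Convert 'YYYY-MM-DD' or 'YYYY-MM-DD to YYYY-MM-DD' into (start, end) dates."""
    parts = [part.strip() for part in text.split(" to ")]
    start = date.fromisoformat(parts[0])
    end = date.fromisoformat(parts[-1])  # a single date is a one-day range
    if end < start:
        raise ValueError(f"range ends before it starts: {text!r}")
    return start, end


def active_buildings(rows, day):
    """Return the sorted building names whose date range includes `day`."""
    active = set()
    for name, text in rows:
        start, end = parse_range(text)
        if start <= day <= end:
            active.add(name)
    return sorted(active)


if __name__ == "__main__":
    # Buildings that a map would highlight for one position of the time slider.
    print(active_buildings(SAMPLE_ROWS, date(2021, 9, 21)))
```

A map layer could call something like `active_buildings` once per position of the timeline to decide which buildings to draw in gold.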

As with nearly all data science projects, a majority of the effort in creating the final data visualization centered on “data munging”, the process of cleaning and transforming data from its “raw” form into formats that can be analyzed and displayed. For example, a lack of standardization in worksite names from the AB 685 webportal meant writing additional cleaning scripts to unify the dataset with the underlying spatial geography. While “Activities and Recreation Center” is the official name of the campus fitness center, the webportal exposure dataset includes more colloquial and variable names for this building, including “ARC” and “Activities and Recreation Center (ARC)”. The ARC example was common enough that we could build a dictionary to recognize it, but a large number of worksites in the dataset had to be checked by hand. As data scientists we don’t always get to design our database, and so we develop clever techniques for parsing and aligning data. Future work to automate the entire visualization workflow would include leveraging additional text mining techniques for name matching.
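To make the dictionary idea concrete, here is a minimal Python sketch. It is not the project's cleaning script: the alias table contains only the ARC variants mentioned above, “Example Hall” is a made-up building, and the function name `match_worksite` is hypothetical. The fuzzy fallback uses `difflib` from the Python standard library as one simple stand-in for the text mining techniques mentioned as future work, and its suggestions would still need review by hand.

```python
import difflib
import re

# Official building names; "Example Hall" is a made-up placeholder.
OFFICIAL_NAMES = ["Activities and Recreation Center", "Example Hall"]

# Known variants mapped to official names (only the ARC forms from the post).
ALIASES = {
    "arc": "Activities and Recreation Center",
    "activities and recreation center (arc)": "Activities and Recreation Center",
}


def _key(name):
    """Collapse whitespace and lowercase so trivial differences do not matter."""
    return re.sub(r"\s+", " ", name).strip().lower()


_BY_KEY = {_key(name): name for name in OFFICIAL_NAMES}


def match_worksite(raw):
    """Return (official_name_or_None, how_it_was_matched) for a raw worksite name."""
    key = _key(raw)
    if key in _BY_KEY:
        return _BY_KEY[key], "exact"
    if key in ALIASES:
        return ALIASES[key], "alias"
    # Fuzzy fallback: a suggestion only, to be confirmed by a person.
    close = difflib.get_close_matches(key, list(_BY_KEY), n=1, cutoff=0.85)
    if close:
        return _BY_KEY[close[0]], "fuzzy (review by hand)"
    return None, "unmatched"


if __name__ == "__main__":
    for raw in ["ARC", "Activities and Recreation Center (ARC)", "Unknown Annex"]:
        print(raw, "->", match_worksite(raw))
```

Returning how each name was matched keeps the hand-checking step visible: anything labeled fuzzy or unmatched can be routed to a person rather than silently placed on the map.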

The UC Davis DataLab hopes the community finds this visualization useful as an exploration tool and as an example of the power of a good data visualization. If you would like to learn more about the code we used to make this visualization, see the publicly available project repository. Want to learn how you can use these same tools to make your own interactive data visualization? Keep an eye out for our upcoming workshops and check out the DataLab training archives for recordings and learner guides from our past workshops.

Stay safe, Aggies, and remember to follow campus COVID-19 policies, keep an eye on campus announcements, and sign up for CA Notify to get personalized COVID-19 alerts as we start this fall quarter back on campus.

Resources

For more about how DataLab is helping UC Davis fight COVID-19, see the project description and blog post for Predicting COVID-19 hospital admissions at UC Davis Medical Center.

For more about how DataLab is helping teach students data visualization, see our training archive, specifically our recent Data Visualization Principles and Critical Approach to Data Visualization workshops.

This post was written by graduate student Jared Joseph and Dr. Pamela Reynolds with contributions from Dr. Michele Tobias and Jessica Nusbaum. The data visualization project was initiated and led by Dr. Tobias, DataLab’s geospatial data specialist, with significant contributions from undergraduates Elijah Stockwell and Sebastian Lopez, and feedback from the wider DataLab team. Questions can be directed to datascience@ucdavis.edu.
