Making data science connections
Whether the research involves satirical French cartoons, mutating tumors or ancient Greek tombstones, three Carolina faculty say using datasets opens doors to discovery.
They work in distinctly different disciplines — art history, genetics, classical archeology — but the Carolina faculty members all improve their research with the same key tool: data science.
The three talked about their data work and the collaborators who helped them during a Jan. 26 webinar, “Usual and Unusual Suspects in Data Science.” The event illustrated the growing use of data science on campus, especially across disciplines, and promoted resources for faculty and researchers.
The webinar was the first in the Carolina Data Science Now series, through which the Renaissance Computing Institute will help data science practitioners connect, while educating and inspiring the data-curious. The webinar also previewed the interdisciplinary approach of the School of Data Science and Society, which the University will launch in fall 2022.
Future presentations are scheduled for Feb. 24, March 24, April 28 and May 26. Check Data Science Now for information.
A bridge to a solution
Kathryn Desplanque, assistant professor of 18th- and 19th-century European art in the College of Arts & Sciences’ art & art history department, described how she used data science to solve a large problem.
Desplanque took photos of 530 pieces of art satirizing Parisian artistic life published between 1750 and 1850. She wanted to analyze the works and represent them as one body without falling into what she called “the typical art historical track of pulling out exceptional images and focusing exclusively on them.”
Desplanque’s 20-second video flashed through all 530 images to emphasize the project’s size. At the outset, her notes and the images were stored in different places, with nothing relating them to each other.
“It was a mess, and it became clear that I needed a solution. I found it completely daunting to embark upon this project and make meaning of these images in a precise and analytical way. I was terrified that the images would become marginalized, fall to the wayside,” Desplanque said.
But she found a person who served as a “bridge” to data science, helping her determine her goals. She then used the qualitative data analysis software NVivo to link metadata to each image, eliminating the manual creation of links for data points.
Desplanque was able to store the photographs and link them to bibliographic information, cataloging them by size, medium, artist and call number for easier retrieval. She annotated them and associated them with iconography she defined and with notes about how they approached joke making.
She urged participants to be open to ways such methodologies can change one’s thinking and research approach. “Humanists are doing data science, too. We’re usually doing it in wild and scrappy ways because we’re working on our own, outside of fields that have developed methodologies.”
Inspiration from nature
Corbin Jones, a professor in the College of Arts & Sciences’ biology department, the UNC School of Medicine’s genetics department and the Integrative Program for Biological & Genome Sciences, said that genomic datasets for humans and other species continue to grow at an incredible pace but are hard for biologists to access and use.
“We are trying to leverage analogies, mechanisms, processes, paradigms and methods used in disparate fields of biology to try and analyze different types of datasets that are emerging as a result of novel and new technologies,” Jones said.
Jones and collaborators in the College of Arts & Sciences’ statistics and operations research department and the UNC Eshelman School of Pharmacy are using an “old-school evolutionary theory” called load theory to better understand how to assess cancer treatment effectiveness. Load theory predicts the number of mutations at which a population can no longer survive. The team assessed the mutation load in tumor cells from cancer patients receiving two different drug treatments, looking for insights into the recurrence of cells that are resistant to treatment. Their analysis indicates that mutational load can predict the disease’s outcome.
Like the other speakers, Jones talked about spatial data. He belongs to a group of Carolina researchers who are applying the analytic tools of spatial ecology to spatial transcriptomics, the measurement of gene expression across a tissue or organ. Instead of starting with a “lump of tissue,” they collect a smaller sample of cells as a snapshot of the tissue’s or organ’s complexity. They then use spatial ecology, which helps explain natural patterns such as animal migration, to relate the gene expression data to the tissue’s structure.
Questions and lines of research
Tim Shea, assistant professor of classical archeology in the College of Arts & Sciences’ classics department, described his methods in researching cemeteries of ancient Athens. He used a geographic information system called ArcGIS Pro to map tombstones in an excavation in the city center.
The points on Shea’s map represent a substantial database with his notes, a bibliography and information from tombstone inscriptions, such as the person’s name, who dedicated the tombstone, whether the deceased was an immigrant, the sculptor’s signature, the kind of stone and its dimensions.
Employing a large data range helped Shea notice patterns that he would have otherwise overlooked. The most interesting, he said, is that immigrant communities were buried in groups based on their city of origin. He also realized that immigrants from the same Mediterranean regions were buried near one another.
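For readers curious how a pattern like this can surface from spatially organized records, here is a minimal Python sketch. It is not Shea's method or his data: the records, the placeholder origin names and the coordinates are all invented for illustration, and the "mean distance to centroid" measure is just one simple assumption about how to gauge clustering.

```python
from collections import defaultdict
from math import hypot

# Hypothetical records: (id, city_of_origin, x, y).
# Coordinates are made-up positions in meters, not real excavation data.
tombstones = [
    ("t1", "Origin A", 10.0, 12.0),
    ("t2", "Origin A", 11.0, 13.0),
    ("t3", "Origin B", 80.0, 75.0),
    ("t4", "Origin B", 82.0, 77.0),
    ("t5", "Origin A", 9.0, 11.0),
]

def group_by_origin(records):
    """Collect the (x, y) positions of tombstones for each city of origin."""
    groups = defaultdict(list)
    for _id, origin, x, y in records:
        groups[origin].append((x, y))
    return dict(groups)

def centroid(points):
    """Return the average position of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def mean_distance_to_centroid(points):
    """Average distance from each point to the group's centroid."""
    cx, cy = centroid(points)
    return sum(hypot(x - cx, y - cy) for x, y in points) / len(points)

groups = group_by_origin(tombstones)

# Spread within each origin group versus spread of the whole cemetery.
spread = {origin: mean_distance_to_centroid(pts) for origin, pts in groups.items()}
overall = mean_distance_to_centroid([(x, y) for _, _, x, y in tombstones])

for origin, value in spread.items():
    print(f"{origin}: within-group spread {value:.2f} m (overall {overall:.2f} m)")
```

When each group's spread is much smaller than the overall spread, as in this toy data, the records suggest that people from the same origin were buried near one another. A GIS such as ArcGIS Pro offers far richer tools for the same kind of question.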
Shea created another database that visually connects originating points of Athenian immigrants to their tombstone locations. “I began thinking about the immigrant experience in ancient Athens, that groups from the same regions of the Mediterranean and Black Sea were probably living in neighborhoods together and being buried together in the same cemeteries.
“These are questions and lines of research that I could not have asked unless I had organized my database spatially from the beginning,” Shea said.
Shea trains undergraduates in GIS, and many of them go on to use it in jobs with environmental law firms, public health organizations, nonprofits and tech startups. He said that a proposed Spatial Antiquity Lab for campus, focused on the study of ancient cities and urbanism, will foster research and teaching in spatial humanities by researchers from many disciplines.
“These webinars are informal ways of sharing our research in data science across different disciplines, across our campus,” said Jay Aikat, RENCI chief operating officer and research associate professor in the College of Arts & Sciences’ computer science department. “We’re already a really collaborative campus, so we hope that this will kick off even further collaborations and, hopefully, among people who wouldn’t have otherwise come together.”
Participants also discussed data storage solutions, funding for student training, and opportunities for students such as the College’s data science minor and a certificate program in applied data science through the School of Information and Library Science.