University of Maryland

SoDa Symposium: Rehabilitation of open-ends: Creating a codebook for open-ends using machine learning techniques and human intervention that then can be used to drive action
Presentations followed by a Q&A

12:00 – 1:00 PM EST December 13, 2022.
Abstract
Open-ends are a well-known problem in survey research. Free-text responses can be extremely rich, sometimes surfacing aspects of a question or issue that the researcher might not have known to look for, but analyzing text is costly and labor-intensive. As a result, open-ends tend to be included as an afterthought, used minimally, or avoided altogether. Computational methods can potentially help, but they often raise concerns about whether their results are as trustworthy and actionable as those from other kinds of responses.
We will talk about approaches we’ve been taking to the analysis of open-ends, which combine automation with human intervention in order to navigate the trade-off between automation and trustworthiness. Two experiments were run independently on the same set of 16,648 Reddit responses to a question asking people who had considered suicide why they did not end up killing themselves. The first experiment introduced human intervention at the start: a machine learning process that included word clouds and TF-IDF techniques helped human coders develop an actionable codebook. The second experiment used topic modeling, an unsupervised machine learning approach, to pull out latent categories from the open-ends; these categories then guided a step-by-step content analysis protocol in which subject matter experts identified category labels and descriptions. At the symposium we will compare and contrast our results and, more generally, discuss the potential for techniques of this kind to bring open-ends out of the shadows in survey research.
Presenters


Carol Haney
Head of Research and Data Science
Qualtrics

At Qualtrics, Carol Haney is head of research and data science. Her principal research area is online quantitative research, specifically focusing on best practices around sampling, Total Survey Error, and advanced analytics.
Carol currently works with multiple commercial clients, mostly in the financial, health, and tech sectors. She has experience running large survey programs covering customer experience, segmentation, and performance measurement. In 2015, Qualtrics named Carol its Most Valuable Player.
Before joining Qualtrics, Carol held executive positions at Toluna, Harris Interactive, TNS, SPSS, and the National Opinion Research Center at the University of Chicago. For the past five years, Carol has led all the formative research for the CDC’s anti-smoking ads, a campaign that has contributed in part to the five-year decline in the U.S. adult smoking rate from 23% to 14%.

Philip Resnik
Professor, Institute for Advanced Computer Studies and Department of Linguistics
University of Maryland

Philip Resnik holds a joint appointment as Professor in the University of Maryland Institute for Advanced Computer Studies and the Department of Linguistics, as well as an Affiliate Professor appointment in Computer Science. He earned his bachelor’s degree in Computer Science at Harvard in 1987 and his Ph.D. in Computer and Information Science at the University of Pennsylvania in 1993, and he joined the University of Maryland faculty in 1996. Before entering academia, he worked in R&D at Bolt Beranek and Newman, the IBM T.J. Watson Research Center, and Sun Microsystems Laboratories. Resnik’s research focuses on computational modeling of language that brings together linguistic knowledge, domain expertise, and data-driven machine learning methods. His work emphasizes applications in computational social science, draws on experience in multilingual text analysis and machine translation, and extends to scientific interests in computational cognitive neuroscience. He holds two patents and has authored or co-authored more than 100 peer-reviewed articles and conference papers. At various times his work has been highlighted in Newsweek, The Economist, New Scientist, and on National Public Radio, and he has been a repeat organizer and panelist at SXSW Interactive. Outside academia, Resnik was a technical co-founder of CodeRyte (clinical natural language processing, acquired in 2012 by 3M Health Information Systems), and he is an advisor to Converseon (social strategy and analytics), FiscalNote (machine learning and analytics for government relations), and SoloSegment (website search and content optimization).

Frauke Kreuter
Co-Director, Social Data Science Center (SoDa)
Professor, Joint Program in Survey Methodology
University of Maryland
Chair of Statistics and Data Science in Social Sciences and the Humanities
Ludwig-Maximilians-University of Munich