User Communities
Attitudes Towards Pervasive Data Research
Lead: Jessica Vitak & Matthew Bietz with Casey Fiesler & Michael Zimmer
A series of studies will investigate how those who create pervasive data (users of social media, fitness trackers, and similar services) feel about their data being used in research. For example, in preliminary work in this area, PI Fiesler conducted a survey of Twitter users about their level of comfort with certain uses of their tweets in research. Findings from this work revealed that participants are largely unaware that researchers are permitted to use public data, and that their attitudes about the practice are highly context-dependent. Findings also pointed to the potential for systems design, such as a tool that keeps track of users whose data has been collected and automatically shares results at the conclusion of the work. Similarly, PI Bietz has conducted surveys, focus groups, and interviews to examine individuals' attitudes toward the privacy of personal health data and the use of that data for health research.
This work reveals that people's feelings about pervasive data research are contingent on the kind of research being done and on whether it is done for commercial purposes or for the public good. To move work on this topic forward, we plan to analyze the interviews with user communities and the representative survey of social media users to examine whether there are differences based on the type of community being studied: Is pervasive data research more acceptable to users on some platforms than on others? Does acceptability depend on, for example, the size of the community or the subject matter of the data? Additionally, we will build a rich dataset focused on how social media users react to the idea of researchers using their data. We will adapt the survey method from the preliminary work described above, a quasi-experimental qualitative approach that presents respondents with hypothetical research situations, to understand the public's concerns (illustrated in the sketch below). This method will allow us to better understand how the contingencies identified in prior work affect attitudes toward pervasive data research. Altogether, understanding how pervasive data research is perceived by, and could affect, the people creating these data is critical to designing the best ethical practices.
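To make the vignette approach concrete, the following sketch shows one way such hypothetical research situations could be generated and randomly assigned in a factorial design. The factor names, levels, and question wording are illustrative assumptions, not the project's actual instrument.

```python
# Sketch: generating and assigning vignette conditions for a
# quasi-experimental survey on pervasive data research attitudes.
# All factors and levels below are hypothetical examples.
import itertools
import random

FACTORS = {
    "purpose": ["academic research for the public good",
                "commercial product research"],
    "platform": ["Twitter", "Facebook", "a fitness-tracking app"],
    "community_size": ["a small niche community", "a large general audience"],
    "data_subject": ["everyday posts", "health-related posts"],
}

# Full factorial design: one condition per combination of factor levels.
CONDITIONS = [dict(zip(FACTORS, levels))
              for levels in itertools.product(*FACTORS.values())]

TEMPLATE = ("Imagine that researchers doing {purpose} collect "
            "{data_subject} shared publicly on {platform} by "
            "{community_size}. How comfortable would you be if your "
            "posts were included?")

def assign_vignette(rng: random.Random) -> str:
    """Randomly assign one hypothetical research situation to a respondent."""
    return TEMPLATE.format(**rng.choice(CONDITIONS))

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducible assignment
    for i in range(3):
        print(f"Respondent {i + 1}: {assign_vignette(rng)}\n")
```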
Representation of and Reactions to Pervasive Data Research in the Media
Lead: Casey Fiesler with Michael Zimmer
In recent years, the ethics of social computing research has become an increasingly mainstream topic. Controversies such as the Facebook emotional contagion study and the OKCupid public dataset have been covered extensively in the news. Public reaction to these studies has provided a great deal of anecdotal evidence about how the general public feels about this type of research, and has also fueled more conversation about ethics within our research community. After assembling a large dataset of news articles related to social computing research, as well as the comments on these articles, the PIs will conduct a content analysis focused on revealing attitudes and perceptions across media types (one reliability step is sketched below). A better understanding of how this research is both presented and perceived is critical to thinking through best practices, both for conducting research and for presenting it to the world.
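A standard step in such a content analysis is checking agreement between independent coders. As a minimal sketch, the following computes Cohen's kappa for two coders; the attitude codes shown are illustrative assumptions, not the study's codebook.

```python
# Sketch: chance-corrected intercoder agreement (Cohen's kappa) for a
# content analysis of news articles and comments. Codes are hypothetical.
from collections import Counter

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert coder_a and len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Expected chance agreement from each coder's marginal code frequencies.
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(coder_a) | set(coder_b))
    return (observed - expected) / (1 - expected)

# Example: two coders label ten comments with hypothetical attitude codes.
coder_a = ["supportive", "concerned", "concerned", "neutral", "supportive",
           "concerned", "neutral", "supportive", "concerned", "neutral"]
coder_b = ["supportive", "concerned", "neutral", "neutral", "supportive",
           "concerned", "concerned", "supportive", "concerned", "neutral"]
print(f"Cohen's kappa: {cohens_kappa(coder_a, coder_b):.2f}")  # ~0.70
```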
Methods and Tools Towards Personalized Privacy and Data Ethics
Lead: Arvind Narayanan with Katie Shilton & Casey Fiesler
Appropriate use of data is notoriously hard to define and even harder to operationalize. Participants have different views on questions such as: Which data attributes or inferences are sensitive? How should notice and consent be implemented, and what are their limits? For example, is it acceptable to continue to use data after it has been withdrawn from the public sphere (e.g., deleted tweets)? Are some uses of data problematic in themselves, or only if they can lead to concrete harms? We are developing a modeling language for expressing privacy characterizations elicited via surveys of users, extending work that specifies laws and contextual integrity norms in a formal language. We are also building software that helps researchers incorporate and respect participant-specific data-use restrictions during data analysis. For example, we are developing a method and tool to simulate how social science research conclusions would be affected if a small fraction of users or participants were to withdraw their data from the public sphere; a tool that uses natural-language-processing techniques to semi-automatically scrub or modify sensitive information in free-form textual data, according to a privacy specification; and a method to automatically infer "context" and "roles" (key elements of contextual integrity) so that data analysis may respect the norms of the implied context. Sketches of the first two ideas follow.
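The withdrawal-simulation idea can be illustrated with a minimal sketch: repeatedly drop a small random fraction of participants and recompute a statistic to see how stable a conclusion is. The dataset, the statistic (a simple mean), and the 5% withdrawal rate are illustrative assumptions; the actual tool targets richer social science analyses.

```python
# Sketch: simulating how a research conclusion shifts when a small
# fraction of participants withdraw their data. All inputs are synthetic.
import random
import statistics

def simulate_withdrawal(values, withdraw_frac=0.05, trials=1000, seed=0):
    """Distribution of the sample mean when a random `withdraw_frac`
    of participants withdraw their data, over repeated trials."""
    rng = random.Random(seed)
    n_keep = round(len(values) * (1 - withdraw_frac))
    return [statistics.mean(rng.sample(values, n_keep)) for _ in range(trials)]

if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(5.0, 2.0) for _ in range(500)]  # synthetic responses
    means = simulate_withdrawal(data)
    print(f"Full-sample mean: {statistics.mean(data):.3f}")
    print(f"Range under 5% withdrawal over 1000 trials: "
          f"[{min(means):.3f}, {max(means):.3f}]")
```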
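The scrubbing tool can likewise be sketched in miniature. A production version would rely on NLP techniques such as named-entity recognition; this toy stand-in uses regular expressions, and the specification format (a mapping from sensitivity category to pattern) is an illustrative assumption.

```python
# Sketch: redacting sensitive spans from free text according to a
# privacy specification. Patterns and categories are hypothetical.
import re

# Privacy specification: sensitivity category -> pattern to scrub.
# Order matters: emails are scrubbed before handles so the @-part of
# an address is not half-matched by the handle pattern.
PRIVACY_SPEC = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "handle": re.compile(r"@\w+"),
}

def scrub(text: str, spec=PRIVACY_SPEC) -> str:
    """Replace each sensitive span with a category placeholder."""
    for category, pattern in spec.items():
        text = pattern.sub(f"[{category.upper()} REMOVED]", text)
    return text

print(scrub("Contact @jdoe at jdoe@example.com or 555-123-4567."))
# -> Contact [HANDLE REMOVED] at [EMAIL REMOVED] or [PHONE REMOVED].
```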