Using large language models to generate trolling at scale

This is Jessica. Large language models have been getting a lot of negative attention, for being brittle and in need of human curation and for generating socially undesirable outputs, sometimes even on this blog. So I figured I’d highlight a recent application that cleverly exploits their ability to generate toxic word vomit: using them to speculate about possible implications of new designs for social computing platforms.

In a new paper on using LLMs to generate “social simulacra,” Joon Sung Park et al. write: 

Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the system falls prey to such challenges? We introduce social simulacra, a prototyping technique that generates a breadth of realistic social interactions that may emerge when a social computing system is populated. Social simulacra take as input the designer’s description of a community’s design — goal, rules, and member personas — and produce as output an instance of that design with simulated behavior, including posts, replies, and anti-social behaviors. We demonstrate that social simulacra shift the behaviors that they generate appropriately in response to design changes, and that they enable exploration of “what if?” scenarios where community members or moderators intervene. To power social simulacra, we contribute techniques for prompting a large language model to generate thousands of distinct community members and their social interactions with each other; these techniques are enabled by the observation that large language models’ training data already includes a wide variety of positive and negative behavior on social media platforms. In evaluations, we show that participants are often unable to distinguish social simulacra from actual community behavior and that social computing designers successfully refine their social computing designs when using social simulacra.

The idea is a clever solution to sampling issues that currently limit researchers’ ability to foresee how a new social platform might be used: it’s difficult to recruit enough test users for a prototype for certain behaviors to manifest, and it’s hard to match the sample makeup to the target deployment population. With the social simulacra approach, the designer still has to supply some notion of the target users, in the form of “seed personas” passed as input to the LLM along with a short description of the system goal (“social commentary and politics”) and any hard or soft rules (“be kind,” “no posting of advertisements”), but the prototype platform is then populated with text attributed to a large number of hypothetical users.
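To make the shape of the input and output concrete, here is a minimal sketch in Python of what the designer-facing step might look like. This is my illustration under assumed names, not the authors’ code: CommunityDesign, build_post_prompt, populate, and call_llm are all hypothetical, and the paper’s actual prompting pipeline (which first expands the seed personas into thousands of distinct members) is considerably more elaborate.

from dataclasses import dataclass

@dataclass
class CommunityDesign:
    goal: str                 # e.g. "social commentary and politics"
    rules: list[str]          # e.g. ["be kind", "no posting of advertisements"]
    seed_personas: list[str]  # short free-text descriptions of target users

def build_post_prompt(design: CommunityDesign, persona: str) -> str:
    # Turn the designer's description into a generation prompt for one post.
    rules = "\n".join("- " + r for r in design.rules)
    return (
        "A new online community about " + design.goal + " has these rules:\n"
        + rules + "\n"
        + "A member described as '" + persona + "' writes the following post:\n"
    )

def populate(design: CommunityDesign, call_llm, n_posts: int = 100) -> list[str]:
    # Generate n_posts hypothetical top-level posts, cycling through the seed
    # personas; call_llm is a stand-in for whatever completion API you use.
    return [
        call_llm(build_post_prompt(design, design.seed_personas[i % len(design.seed_personas)]))
        for i in range(n_posts)
    ]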

I like how the alignment between the type of data LLMs ingest and the context for using the approach makes the often-criticized potential of LLMs to push more racist, sexist slurs into the world a feature rather than a bug. If you want to see how your plan for a new platform could be completely derailed by trolls, who better to ask than an LLM?

It also provides a way to test interventions at scale: “social simulacra can surface a larger space of possible outcomes and enable the designer to explore how design changes might shift them. Likewise, social simulacra allow a designer to explore ‘what if?’ scenarios where they probe how a thread might react if they engaged in a moderation action or replied to one of the comments.” The designer can intervene at the conversation level to see what happens, and can get a sense of uncertainty in the generating process by using a “multiverse” function to generate many instantiations of an outcome.
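Continuing the hypothetical call_llm interface from the sketch above, the two probes might be wired up roughly like this; the paper’s actual “multiverse” feature presumably works differently under the hood, so treat this as an illustration of the idea rather than the authors’ implementation.

def continue_thread(thread: list[str], call_llm, n_replies: int = 3) -> list[str]:
    # Sequentially ask the model for the next few replies to an existing thread.
    extended = list(thread)
    for _ in range(n_replies):
        prompt = "Discussion thread:\n" + "\n".join(extended) + "\nNext reply:\n"
        extended.append(call_llm(prompt))
    return extended

def multiverse(thread: list[str], call_llm, n_worlds: int = 20) -> list[list[str]]:
    # Re-roll the same thread many times to see the spread of possible continuations.
    return [continue_thread(thread, call_llm) for _ in range(n_worlds)]

def what_if_moderated(thread: list[str], mod_reply: str, call_llm, n_worlds: int = 20):
    # "What if?" probe: splice in a moderator reply, then regenerate the multiverse.
    return multiverse(thread + ["MODERATOR: " + mod_reply], call_llm, n_worlds)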

What makes the idea philosophically intriguing is the suggestion that we can treat LLM-generated social simulacra as predictive of human behavior. When I first learned about this project while visiting Stanford late last year, my first question was, What do the ethicists think about doing that? The authors make clear that they aren’t necessarily claiming faithfulness to real-world behavior:

Social simulacra do not aim to predict what is absolutely going to happen in the future – like many early prototyping techniques, perfect resemblance to reality is not the goal. […] However, social simulacra offer designers a tool for testing their intuitions about the breadth of possible social behaviors that may populate their community, from model citizen behaviors to various edge cases that ultimately become the cracks that collapse a community. In so doing, social simulacra, such as those that we have explored here, expand the role of experience prototypes for social computing systems and the set of tools available for designing social interactions, which inspired the original conceptualization of wicked problems [64].

By sidestepping the question, they’re asking for something of a leap of faith, but they provide a few evaluations suggesting the generated output is useful. They show 50 people pairs of subreddit conversations, one real and one LLM-generated, and find that on average people can identify the real example only slightly better than chance. They also find that potential designers of social computing systems consider them helpful for iterating on designs and imagining possible futures. While checking whether humans can discriminate between real and generated social behavior is obviously relevant, it would be nice to see more attempts at characterizing the statistical properties of the generated text relative to real-world behavior. For example, I’d be curious to see, both for thread-level and community-level dynamics, where the observed behavior on real social systems falls in the larger space of generated possibilities for the same input. Is it more or less extreme in any ways? Naturally there will be many degrees of freedom in defining such comparisons, but maybe one could observe biases in the generated text that a human might miss when reviewing a small set of examples. The paper does summarize some weaknesses observed in the generated text, and mentions using a qualitative analysis to compare real and generated subreddits, but more of this kind of comparison would be welcome.
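One crude way to operationalize that kind of comparison: pick a thread- or community-level statistic, compute its distribution over many generated instantiations of the same design input, and ask where the observed real-world value lands. The sketch below is mine, not anything from the paper; the metric (thread length, a toxicity-classifier score, reply depth) is the analyst’s choice.

import numpy as np

def percentile_of_observed(observed_value: float, generated_values) -> float:
    # Percentile rank of the observed statistic within the generated distribution.
    generated_values = np.asarray(generated_values, dtype=float)
    return 100.0 * float(np.mean(generated_values <= observed_value))

# For example, with thread length as a (very crude) engagement proxy,
# reusing the multiverse() sketch above:
# worlds = multiverse(seed_thread, call_llm, n_worlds=200)
# percentile_of_observed(len(real_thread), [len(t) for t in worlds])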

I haven’t tried it, but you can play with generating simulacra here: http://social-simulacra.herokuapp.com/
