publications | Fabio Pernisi

2024

Monica2024

MoniCA: Monitoring Coverage, Attitudes and Accessibility of Italian Measures in Response to COVID-19

Giuseppe Attanasio, Fabio Pernisi, and Debora Nozza

2024

Modern social media have been long observed as a mirror for public discourse and opinions. Especially in the face of exceptional events, computational language tools are valuable for understanding public sentiment and reacting quickly. During the 2019 coronavirus pandemic, the Italian government issued a series of financial measures, each unique in target, requirements, and benefits. However, despite the many recipients, how such measures were perceived and whether they eventually hit their goal have yet to be understood. In this resource paper, we document the collection and release of MoniCA, a new social media dataset for MONItoring Coverage, Attitudes, and accessibility to such measures. Data include approximately ten thousand posts discussing a variety of measures in ten months. For each post, we collected annotations for sentiment, emotion, and contextual aspects. We conducted an extensive analysis using computational models to learn these aspects from text. We release a compliant version of the dataset to foster future research on computational approaches for understanding public opinion about government measures.
SafetyPrompts

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

Paul Röttger, Fabio Pernisi, Bertie Vidgen, and 1 more author

2024

The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and practitioners have met these concerns by introducing an abundance of new datasets for evaluating and improving LLM safety. However, much of this work has happened in parallel, and with very different goals in mind, ranging from the mitigation of near-term risks around bias and toxic content generation to the assessment of longer-term catastrophic risk potential. This makes it difficult for researchers and practitioners to find the most relevant datasets for a given use case, and to identify gaps in dataset coverage that future work may fill. To remedy these issues, we conduct a first systematic review of open datasets for evaluating and improving LLM safety. We review 102 datasets, which we identified through an iterative and community-driven process over the course of several months. We highlight patterns and trends, such as a trend towards fully synthetic datasets, as well as gaps in dataset coverage, such as a clear lack of non-English datasets. We also examine how LLM safety datasets are used in practice – in LLM release publications and popular LLM benchmarks – finding that current evaluation practices are highly idiosyncratic and make use of only a small fraction of available datasets. Our contributions are based on this http URL, a living catalogue of open datasets for LLM safety, which we commit to updating continuously as the field of LLM safety develops.
Compromesso! Italian Many-Shot Jailbreaks undermine the safety of Large Language Models

Fabio Pernisi, Dirk Hovy, and Paul Röttger

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), Aug 2024

As diverse linguistic communities and users adopt Large Language Models (LLMs), assessing their safety across languages becomes critical. Despite ongoing efforts to align these models with safe and ethical guidelines, they can still be induced into unsafe behavior with jailbreaking, a technique in which models are prompted to act outside their operational guidelines. What research has been conducted on these vulnerabilities was predominantly on English, limiting the understanding of LLM behavior in other languages. We address this gap by investigating Many-Shot Jailbreaking (MSJ) in Italian, underscoring the importance of understanding LLM behavior in different languages. We base our analysis on a newly created Italian dataset to identify unique safety vulnerabilities in 4 families of open-source LLMs. We find that the models exhibit unsafe behaviors even with minimal exposure to harmful prompts, and, more alarmingly, this tendency rapidly escalates with more demonstrations.