Report of the 1st Workshop on Generative AI and Law [blog]

Authors: A. Feder Cooper*, Katherine Lee*, James Grimmelmann*, Daphne Ippolito*, and 31 other authors across 25+ institutions. *Equal contribution. All of the listed authors contributed to the workshop upon which the report and this associated blog post are based, but they and their organizations do not necessarily endorse all of the specific claims in this report. Correspondence: [email protected]. We thank Matthew Jagielski, Andreas Terzis, and Jonathan Zittrain for feedback on this report.

The Report of the 1st Workshop on Generative AI and Law reflects the synthesis of ideas from our Day 2 roundtables. The report begins with a brief framing note on the impact of Generative AI on law, then presents useful components of a shared knowledge base, an outline of the ways Generative AI is unique (and the ways it isn't), a preliminary taxonomy of legal issues, and a concrete research agenda at the intersection of Generative AI and law.
The report closes with some brief takeaways about this emerging field.

The Impact of Generative AI on Law

Section 2

We begin the report with some background that helps situate why Generative AI is going to have such an impact on law. It’s true that Generative AI is “generative” because it generates text, images, audio, or other types of output. But it is also “generative” in the sense of Jonathan Zittrain’s theory of generative technologies: it has the “capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences” (Zittrain 2008). It provides enormous leverage across a wide range of tasks, is readily built on by a huge range of users, and facilitates rapid iterative improvement as those users share their innovations with each other. As a result, generative-AI systems will be immensely societally significant (too significant for governments to ignore or to delay dealing with) and will present an immensely broad range of legal issues.

Developing a Shared Knowledge Base

Section 3

Crafting Glossaries and Metaphors

At GenLaw, it quickly became apparent that the two communities may share words, but those words may not share meanings. We therefore developed a glossary and a list of metaphors to make definitions of terms concrete, so that we can be sure we’re talking about the same things (and talking about them precisely). Drawing from the report, we use “harm” as an example of an overloaded term, one that has a colloquial understanding that can be mistaken for a term of art in the law:

Many technologists were not aware of the importance of harms as a specific and consequential concept in law, rather than a general, non-specific notion of unfavorable outcomes. We found our way to common understandings only over the course of our conversations, and often only after many false starts.

We also suggest that metaphors can be a useful abstraction for communication between communities, since they play a central role in how both machine-learning experts and lawyers communicate among themselves. Lawyers use metaphors as rational frameworks for thinking through the relevant similarities and differences between cases. In the machine-learning community, experts routinely use metaphors to give intuitions for technical processes. For example, generative models are said to “learn” or to “make collages” of training data. This imaginative naming is often intentional; technical processes are often named for the human behaviors or science-fiction tropes that inspired them.

However, we caution that metaphors can also simplify and distort (as is the case with the collage metaphor). For better communication across fields, it can nevertheless be instructive to understand where a metaphor is apt and where it falls short.

Understanding Evolving Business Models

Section 3.3

Generative AI is not a single entity or business model. There are many different types of Generative AI, built by a diverse set of actors, sometimes in partnership. To better understand the array of generative-AI systems and the ways they are produced, we can look at existing and emerging business models:

  1. Business-to-consumer (B2C) hosted services, including direct-to-consumer applications (e.g., OpenAI’s ChatGPT, Google’s Bard) and application programming interfaces (APIs);
  2. Business-to-business (B2B) integration with hosted services, via direct partnership/integration or through the use of APIs;
  3. Products derived from open-source software, models, and datasets (e.g., some versions of Stable Diffusion, offered by Stability AI, are open-sourced); and
  4. Companies that operate at specific points in the generative-AI supply chain (Lee, Cooper, and Grimmelmann 2023), e.g., companies that work specifically on datasets, training diagnostics, or training and deployment.

We go into more detail on each of these in the report.

Pinpointing Unique Aspects of GenAI

Section 4

With so many different types of generative-AI systems, some of the lawyers in the room asked the machine-learning experts to clarify the commonalities that make up the “magic” of Generative AI. We discussed three aspects:

  1. Open-ended tasks: Generative-AI models are trained with open-ended tasks in mind, rather than narrowly defined ones. This means that the same model can be used for translating between languages as for answering questions.
  2. Multi-stage pipelines: In part because they are trained for open-ended tasks, models are trained in a multi-stage pipeline with stages like pre-training, fine-tuning, and alignment (e.g., RLHF). The delineations between these stages are flexible, but the result is a base model that encodes a “base” of knowledge about the world. This training pipeline is part of a larger supply chain, which further contributes to novel dynamics in the production and use of generative-AI systems (Lee, Cooper, and Grimmelmann 2023).
  3. Scale: Finally, arguably the most discussed element of Generative AI was the role of scale: scale of datasets, of models, of the number of generations, and of compute.
Pre-training
One of the aha moments we had at the workshop came when we realized that technologists and legal scholars understood the term pre-training to mean very different things. Technologists use it to refer to an early, general-purpose phase of the model training process, whereas legal scholars assumed it referred to a data-preparation stage prior to, and independent of, training. Clearing up that confusion made the importance of pre-training much more apparent.
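
To make the stage ordering concrete, here is a minimal, schematic sketch of the multi-stage pipeline described above. It is purely illustrative: the `Model` class and stage functions are hypothetical placeholders that merely record which stages have run, not real training code.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    # Illustrative stand-in for a model; records the stages it has been through.
    stages: list[str] = field(default_factory=list)

def pretrain(corpus: list[str]) -> Model:
    # Stage 1 (pre-training): self-supervised training on a broad, web-scale
    # corpus. Note that this is a *training* phase, not data preparation,
    # and it produces a general-purpose "base" model.
    return Model(stages=["pre-training"])

def fine_tune(model: Model, task_data: list[str]) -> Model:
    # Stage 2 (fine-tuning): adapt the base model using curated,
    # task-oriented examples (e.g., instruction-following data).
    model.stages.append("fine-tuning")
    return model

def align(model: Model, preferences: list[tuple[str, str]]) -> Model:
    # Stage 3 (alignment, e.g., RLHF): steer outputs toward human
    # preferences, expressed here as (preferred, rejected) response pairs.
    model.stages.append("alignment")
    return model

base = pretrain(["web-scale text ..."])
assistant = align(fine_tune(base, ["instruction data ..."]),
                  [("preferred response", "rejected response")])
print(assistant.stages)  # ['pre-training', 'fine-tuning', 'alignment']
```

In practice the delineations are flexible (some pipelines interleave, repeat, or skip stages), but the ordering above matches common usage of the terms.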

A Preliminary Taxonomy of Legal Issues

Section 5

We also outlined some of the legal issues Generative AI raises. We focus on the workshop’s intended scope of privacy and intellectual property (IP) issues. This is by no means a comprehensive taxonomy of legal issues nor of harms (legally cognizable or otherwise). Other reports have made significant attempts to catalog such harms from Generative AI, for example Fergusson et al. (2023).

Not all of the issues that Generative AI raises are new. Generative AI can be used to perform many tasks for which other AI/ML technology has already been commonly used. For example, instead of using a purpose-built sentiment-analysis model, one might simply prompt an LLM with labeled examples of text and ask it to classify new text of interest (a sketch of this pattern appears below); similarly, one could use a trained LLM to answer questions with “yes” or “no” answers (i.e., to perform classification).
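
To illustrate, here is a minimal sketch of few-shot prompting for sentiment classification. `call_llm` is a hypothetical stand-in for whichever hosted API or local model one actually uses, and the example reviews and labels are our own invented illustrations.

```python
# Few-shot sentiment classification with a general-purpose LLM.
# `call_llm` is a hypothetical placeholder, not a real library function.

FEW_SHOT_PROMPT = """Classify the sentiment of each review as Positive or Negative.

Review: "The plot dragged and the acting was wooden."
Sentiment: Negative

Review: "A beautifully shot, deeply moving film."
Sentiment: Positive

Review: "{review}"
Sentiment:"""

def call_llm(prompt: str) -> str:
    # Send the prompt to an LLM (hosted API or local model) and return
    # its text completion. Left unimplemented on purpose.
    raise NotImplementedError("plug in your LLM of choice")

def classify_sentiment(review: str) -> str:
    # The model completes the pattern established by the labeled examples,
    # so a general-purpose generative model ends up acting as a classifier.
    return call_llm(FEW_SHOT_PROMPT.format(review=review)).strip()
```

The point is that no purpose-built classifier is trained here; the same general-purpose model could be re-prompted for translation, question answering, or other tasks.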

In the report, we focus on four areas where Generative AI raises novel challenges for the law: volition, privacy, misinformation and disinformation, and intellectual property.

For more detail on each of these, please see the report.

Volition
Human volition plays an important and subtle role in defining IP infringement. For example, copyright infringement normally requires that a human intentionally made a copy of a protected work, but not that the human was consciously aware that they were infringing. Generative-AI systems may occasionally produce outputs that look like duplicates of the training data. Some participants at GenLaw were concerned that it may be too easy to obscure the role of human design choices by making those choices seem “internal” to the system (when, in fact, they are typically neither foregone conclusions nor strict technical requirements).

Toward a Long-Term Research Agenda

Section 6

Through our discussions, we elicited several important and promising research directions. Each of these directions brings forth challenges that require engagement from law and machine-learning experts, and likely many other disciplines as well.

  1. Centralization versus Decentralization: First, who will build the components of future generative-AI systems? How centralized or decentralized will these actors be? This is as much a technical question as it is a business and legal question. Technical constraints, such as the design of a dataset, inform the logistics of the supply chain: who builds the component, what gets built, and how. Improvements in synthetic data may enable well-resourced actors to generate their own training data. Existing and emerging business models create incentives for particular modes of interaction among players. Finally, every important potential bottleneck in Generative AI – from datasets to compute to models and beyond – will be the focus of close scrutiny. These questions cannot be discussed intelligently without contributions from both technical and legal scholars.

  2. Rules, Standards, Reasonableness, and Best Practices: In some cases, we have standards of care. For example, HIPAA strictly regulates which kinds of data are treated as personally identifying and subject to stringent security standards. In other cases, we rely on reasonable standards of practice, but what is reasonable is often both context-dependent and evolving; it will also depend on technical advancements and constraints. This is an area where we feel that collaboration among legal experts, technical experts, and policymakers is urgently needed.

  3. Notice and Takedown ≠ Machine Unlearning: Notice and takedown requests are particularly challenging for generative-AI models. The impact of each example in the training data is dispersed throughout the model once it is trained and cannot be easily traced. There are entire subfields of machine learning devoted to problems like these, such as machine unlearning and attribution. However, both machine unlearning and attribution are very young fields, and their strategies are (for the most part) not yet computationally feasible to implement in practice for deployed generative-AI systems. There is, of course, intense (and growing) investment in this area.

  4. Evaluation Metrics: Effective ways to evaluate generative-AI systems currently remain elusive. System capabilities and harms are not readily quantifiable; designing useful metrics will be an important, related area of research for Generative AI and law (and will also, in turn, influence what we understand to be reasonable system behaviors).

What’s next for GenLaw?

As is clear from the diversity of issues discussed within these topics, it is difficult to pithily sum up the main takeaways of GenLaw. (Nevertheless, we attempt to do so in the report.)

To close, we just want to say that we’re thrilled to have hosted the first GenLaw, to have convened this discussion, and to see the community that has grown up around it. We hope that you refer to and share our report, glossary, metaphors, resources, law review article, and explainers series (written for and with workshop participants) as reference material for your own learning, teaching, and research.

Right now, we’re growing GenLaw into a nonprofit, which will be a home for research, education, and interdisciplinary discussion. We will continue to create resources for both general audiences and subject-matter experts. So far, we’ve brought together experts in Generative AI, law, policy, and other computer-science disciplines from 25 different institutions, and we are excited to continue engaging with experts across industry, academia, and government. While our first event and materials have had a U.S.-based orientation, we are actively working to expand our engagement globally.

Stay tuned for more from us. You can subscribe to updates here.

And, of course, a big thank you to our sponsors: Google, Microsoft, Schmidt Futures, OpenAI, Anthropic, Cornell Law School, and ML Collective.

References

Brown, Hannah, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, and Florian Tramèr. 2022. “What Does It Mean for a Language Model to Preserve Privacy?” In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2280–92. FAccT ’22. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3531146.3534642.
Callison-Burch, Chris. 2023. “Understanding Generative Artificial Intelligence and Its Relationship to Copyright.” University of Pennsylvania, School of Engineering and Applied Science, Department of Computer and Information Science; testimony before the U.S. House of Representatives Judiciary Committee, Subcommittee on Courts, Intellectual Property, and the Internet.
Carlini, Nicholas, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, and Eric Wallace. 2023. “Extracting Training Data from Diffusion Models.” https://arxiv.org/abs/2301.13188.
Fergusson, Grant, Caitriona Fitzgerald, Chris Frascella, Megan Iorio, Tom McBrien, Calli Schroeder, Ben Winters, and Enid Zhou. 2023. “Generating Harms: Generative AI’s Impact & Paths Forward.” Electronic Privacy Information Center.
Lee, Katherine, A. Feder Cooper, and James Grimmelmann. 2023. “Talkin’ ’Bout AI Generation: Copyright and the Generative-AI Supply Chain.” arXiv preprint arXiv:2309.08133.
Lipton, Zachary. 2023. “My Statement to the US Senate AI Insight Forum on Privacy and Liability.” https://www.abridge.com/blog/ai-policy-conversation.
Sag, Matthew. 2023. “Copyright Safety for Generative AI.” Houston Law Review.
Samuelson, Pamela. 2023. “Generative AI Meets Copyright.” Science 381 (6654): 158–61. https://doi.org/10.1126/science.adi0656.
Vyas, Nikhil, Sham Kakade, and Boaz Barak. 2023. “On Provable Copyright Protection for Generative Models.” https://arxiv.org/abs/2302.10870.
Zittrain, Jonathan. 2008. The Future of the Internet–and How to Stop It. USA: Yale University Press.