Generative AI and Contextual Confidence: Strategies to Promote Contextual Confidence

8 Feb 2024

This paper is available on arXiv under CC 4.0 license.

Authors:

(1) Shrey Jain, Microsoft Research Special Projects;

(2) Zoë Hitzig, Harvard Society of Fellows & OpenAI;

(3) Pamela Mishkin, OpenAI.

Table of Links

Abstract & Introduction

Challenges to Contextual Confidence from Generative AI

Strategies to Promote Contextual Confidence

Discussion, Acknowledgements and References

3 Strategies to Promote Contextual Confidence

One possible response to the threats outlined in section 2 is to retreat from digital communication. Indeed, a shift back toward in-person communications can mitigate many of generative AI’s threats to contextual confidence. However, retreat, as a strategy, misses a momentous opportunity: through retreat, we’d fail to view the challenges we’ve outlined as an occasion to develop new tools and policies to promote contextual confidence in all forms of communication. As such, the remainder of this section centers on more ambitious strategies for mitigating the challenges outlined in section 2. The strategies that we discuss here fall into two categories.

First, there are containment strategies. These are strategies that aim to protect and identify context in settings where it is currently threatened. Containment strategies are, to some extent, reactive – they aim to reassert the importance of context in digital communications, countering the expectations that the internet gave us. Containment technologies help to override norms and expectations that were established somewhat arbitrarily in the early days of the internet.[16] For instance, tools for tracking the provenance of content are containment strategies because they aim to override a norm governing information exchange that has been set in place – the norm that content on the internet need not be traceable to its origin. The containment strategies we discuss below are largely strategies that already exist or are in development.

The second set of strategies falls under the heading of mobilization. Where containment strategies are reactive in the face of challenges to contextual confidence, mobilization strategies are proactive. They work to establish context in novel settings that have arisen or will arise with generative AI. In doing so, these strategies respond to an opportunity to establish new norms and expectations of information exchange. For an example of a mobilization technology, consider watermarking – a tool that flags content (image, text or video) as having been generated by AI. This tool mobilizes confidence in that it offers an opportunity to set a better norm in a new form of information exchange, embedding context in generative AI-enabled communications. In contrast to containment strategies, many of which are already in discussion and development as responses to issues that existed before generative AI, many of the mobilization strategies we discuss here have not been developed or deployed at a meaningful scale.

In the remainder of this section, we discuss a range of specific strategies (technologies and policies) that help to contain and pre-empt AI’s challenges to contextual confidence. The second and third rows of Table 1 enumerate the specific containment and mobilization strategies we discuss.

In what follows, we discuss these strategies in turn. Our discussion is not intended to be exhaustive. In fact, our hope is that by offering a framework for understanding the strategies that were apparent to us, we make it easier for others to build out and situate other strategies that we have not discussed. In addition, the value of these strategies requires further evaluation in unique combinations, as their blind use may even undermine their intended function [38, 39].

3.1 Containment Strategies to Identify Context

3.1.1 Content provenance

Content provenance tools trace information from its origins through its lifecycle. The leading standard for content provenance is the Coalition for Content Provenance and Authenticity (C2PA) [40] standard. This standard grew out of Project Origin [41] and the Content Authenticity Initiative [42] provenance projects. The origins of content and a history of its alterations are tracked through a “manifest,” a file that is digitally signed by the cryptographic keys of each entity that modifies the content. Content provenance solutions rely heavily on digital identity attestations (as discussed in section 3.1.3 and section 3.1.4) to “tag” the originator of content. But content provenance goes further, providing not just a “tag” on content indicating from whom the information originated, but also verifiably tracing content along with all of its alterations as it moves from party to party.
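The manifest idea can be sketched as a chain of signed records, each binding an actor, an action, a hash of the resulting content, and the previous signature. This is a minimal illustration, not the C2PA format: the record fields and function names are hypothetical, and HMAC with per-actor keys stands in for the public-key signatures a real system would use.

```python
import hashlib
import hmac
import json

def sign(key: bytes, payload: bytes) -> str:
    # Assumption: HMAC stands in for a real digital signature scheme.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def add_entry(manifest: list, actor: str, key: bytes, action: str, content: bytes) -> list:
    """Return a new manifest with a signed record appended for this modification."""
    record = {
        "actor": actor,
        "action": action,
        "content_hash": hashlib.sha256(content).hexdigest(),
        # Chaining to the previous signature makes the history tamper-evident.
        "prev_signature": manifest[-1]["signature"] if manifest else "",
    }
    payload = json.dumps(record, sort_keys=True).encode()
    return manifest + [{**record, "signature": sign(key, payload)}]

def verify(manifest: list, keys: dict, final_content: bytes) -> bool:
    """Check every signature, the chain order, and the hash of the final content."""
    prev_signature = ""
    for entry in manifest:
        record = {k: v for k, v in entry.items() if k != "signature"}
        key = keys.get(entry["actor"])
        if key is None or record["prev_signature"] != prev_signature:
            return False
        payload = json.dumps(record, sort_keys=True).encode()
        if not hmac.compare_digest(sign(key, payload), entry["signature"]):
            return False
        prev_signature = entry["signature"]
    final_hash = hashlib.sha256(final_content).hexdigest()
    return bool(manifest) and manifest[-1]["content_hash"] == final_hash

# Hypothetical actors: a capture device and an editor.
keys = {"camera": b"camera-key", "editor": b"editor-key"}
manifest = add_entry([], "camera", keys["camera"], "capture", b"raw pixels")
manifest = add_entry(manifest, "editor", keys["editor"], "crop", b"cropped pixels")
```

Changing any record, reordering the history, or swapping the final content causes `verify` to fail, which is the property that lets a recipient trace content along with its alterations.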

Existing implementations of the C2PA standard, as seen with Project Origin, require media companies to trace the origin of content from media capture on a hardware device through to publication online. One current bottleneck to wider adoption of C2PA is its dependency on centralized media distributors, such as news platforms or social media sites like YouTube, Facebook, or X (formerly Twitter), to adopt content provenance standards. Decentralized provenance tools are also emerging, as demonstrated by the work of Cross Platform Origin of Content (XPOC) on verifying the origin of content [43].

Throughout the history of the internet, content has existed largely detached from its original context, and content provenance tools can, accordingly, contain challenges to contextual confidence. The detachment of content from its source and modifiers undermines a participant’s confidence in identifying the authentic context within which they are communicating.

Challenges addressed: impersonate the identity of an individual without their consent; create mimetic models; imitate members of specific social or cultural groups.

3.1.2 Community Notes

Community Notes is an open-source collaborative platform, currently implemented at X (formerly Twitter) that allows communities to add crowd-sourced context to specific pieces of digital information. It has been an active feature on X worldwide since December 2022, and its success has leaders in technology asking whether algorithms of a similar sort can be adopted in other settings [44].

Community Notes is designed to counteract the prevailing, context-free norms of the internet, where information is often presented out of context. It does this by enabling a subset of X users to actively participate in fact-checking and context provision. This reinforces the idea that every piece of content should be subjected to questioning and verification through broad consensus.

Challenges addressed: disseminate content in a scalable, automated, and targeted way; fail to accurately represent the origin of content; misrepresent members of specific social or cultural groups; leak or infer information about specific individuals or groups without their consent.

3.1.3 Centralized digital identities

Centralized digital identities are digital representations or attestations of a participant’s identity, issued by a centralized authority. Two common issuers of centralized digital identities are government actors and private companies.

Digital government attestations can include digital birth certificates, health cards, driver’s licenses, passports, and voter cards. Many countries currently use government-issued digital identities to access various government services (Estonia [45], India [46] and Singapore [47] are prominent examples). These digital identity solutions, such as verifiable credentials or mobile driver’s licences [51, 52, 53], rely on cryptographic primitives (e.g. selective disclosure JWT [48], U-Prove [49], BBS+ signatures [50]) that ensure one attestation cannot be associated with or traced back to another.

Private companies have also become a primary issuer of digital identities. For example, it is common to sign into various third party sites with Google, Microsoft, or Facebook credentials as a form of authentication. The number of non-government issued digital identity solutions is growing – teams at Apple [54], LinkedIn [55], and Worldcoin [56], for example, have each launched their own digital identity solutions in the last year.

When participants use centralized digital identities to establish their identity online, their communication partners can verifiably identify displayed attributes of the person with whom they are communicating. Thus, centrally issued digital identities help to identify context – they could be used to “tag” outgoing communications from a particular person or collective, whether on email, phone, text, social media or the internet. Centralized digital identities aim to contain a challenge to contextual confidence in that they override the norms of anonymity or pseudonymity that typically reign on the internet.

The perspective of contextual confidence also helps to illuminate the downsides of centralized digital identity solutions. The centralization of power, either in a government or a corporation, raises concerns about the extent to which information can be used by the centralized actor outside the intended context. Consider, for instance, India’s national digital identity system, Aadhaar. Signing up for Aadhaar was presented as optional, but it was also made to be a prerequisite for filing taxes. Given the way in which Aadhaar tracks individuals’ activities across a wide range of contexts, many have voiced concerns about surveillance and the vulnerability of this database to attacks [57].

Or, in the non-government case, consider Worldcoin – a company whose mission is to be a global issuer of digital identity. Worldcoin’s governance structure reflects significant centralization: a small group controls a large portion of the decision-making power [58]. When checks on the power of a centralized issuer of identity are limited, and when the issuer is operating supranationally, the work that standards bodies have done to enhance individual control of information (e.g. GDPR and restrictions on third-party cookies) could be undermined.

Challenges addressed: impersonate the identity of an individual without their consent; create mimetic models; imitate members of specific social or cultural groups; falsely represent grassroots support or consensus (astroturfing).

3.1.4 Identity as a social intersection

To avoid the centralization of power in a single issuer of digital identity, many have proposed digital identity solutions that harness social networks as a source of verification. Verifying identity as a social intersection is the process of identifying and authenticating participants through a set of attestations. The set of attestations does not have to be rooted in centralized authorities, but can derive from any context an actor belongs to, including academic institutions, workplaces, social circles or friendships, for example. This idea is anchored in the premise that human identities are inherently social [59].

The most common example of verifying through a social intersection is the Open Authorization (OAuth) protocol [60]. OAuth is the most popular open authentication system for federated identity, and allows users to choose from a wide range of providers in order to access third-party applications. By “signing-in” with their Google or X (formerly Twitter) account, users can bootstrap their identity without providing any new information about themselves. Gitcoin Passport is another early example of an identity aggregator [61], along with other decentralized identity projects like Ethereum Name Service [62] or Spruce ID [63], or “proof of personhood” protocols like Proof of Humanity [64] or WorldID [56]. Many identity protocols have their own authentication strategies. These strategies are surveyed in [65] and [66].

By approaching identity verification as a social intersection, we can adapt both the authentication process and the methods used to “tag” communication outputs according to the specific context. This method contains challenges to contextual confidence, by superseding the prevailing culture of pseudonymity on the internet.

There are many open questions about how to implement a social identity verification system, which the contextual confidence perspective throws into relief. For example, should all social attestations be equally valuable? In a communication landscape where AI is prevalent, it may become easier to subvert these social attestation systems through AI-generated attestations or AI-enabled collusion. We discuss collusion-resistant identity systems in section 3.2.4.

Challenges addressed: impersonate the identity of an individual without their consent; create mimetic models; imitate members of specific social or cultural groups; falsely represent grassroots support or consensus.

3.2 Mobilization Strategies to Identify Context

3.2.1 Watermarking

Watermarking is a technique intended to allow for disclosure and detection of the use of an AI model. It works by embedding a hidden pattern or “watermark” into digital content, such as text, images, or videos, that is typically imperceptible to humans but can be algorithmically detected.

There are many different proposed approaches to watermarking for AI models.[17] The goal of watermarking is to enable receivers to identify whether some content has been generated by a specific model. The hope is that such techniques will help to prevent various forms of deceptive misuse of AI models. There are many obstacles in the way of successful and reliable watermarking for AI models.[18] Nonetheless, it is a promising technology that may help to mobilize contextual confidence and set norms around disclosure and detection of AI model use [77].
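To make the embed-and-detect idea concrete, here is a toy sketch of one family of text watermarks from the literature, in which a keyed hash of the previous token marks part of the vocabulary as “green” and generation favors green tokens. It is an assumption chosen for illustration, not the scheme of any particular model: the vocabulary, key, and uniform sampler standing in for a language model are all hypothetical.

```python
import hashlib
import math
import random

# Toy vocabulary and secret key: both are assumptions for illustration only.
VOCAB = [f"w{i}" for i in range(50)]
KEY = "demo-key"

def is_green(prev_token: str, token: str) -> bool:
    """A keyed hash of (previous token, token) marks about half the vocabulary green."""
    digest = hashlib.sha256(f"{KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def generate(n: int, watermark: bool, seed: int = 0) -> list[str]:
    """Stand-in for a model: uniform sampling, restricted to green tokens if watermarking."""
    rng = random.Random(seed)
    tokens, prev = [], "<s>"
    for _ in range(n):
        green = [t for t in VOCAB if is_green(prev, t)]
        pool = green if (watermark and green) else VOCAB
        prev = rng.choice(pool)
        tokens.append(prev)
    return tokens

def z_score(tokens: list[str]) -> float:
    """Deviation of the green-token count from the 50% expected by chance."""
    hits, prev = 0, "<s>"
    for token in tokens:
        hits += is_green(prev, token)
        prev = token
    n = len(tokens)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)

marked_z = z_score(generate(200, watermark=True))
plain_z = z_score(generate(200, watermark=False))
```

A detector holding the key would flag text whose score is far above what chance allows. The pattern is invisible to a reader without the key, but paraphrasing or editing the text weakens the signal, which is one of the obstacles noted above.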

Challenges addressed: impersonate the identity of an individual without their consent; create mimetic models; imitate another AI model.

3.2.2 Model verification

As generative AI models continue to proliferate – each uniquely characterized in terms of base weights – it will become important for users of these models to verify they are using the intended model, and not an imitation. When we discuss “verifiable models,” we are primarily focused on two areas of verification: (i) model weights, and (ii) model inference. We also discuss data verification in section 3.4.2.

For proprietary models, model weights are protected within the context of the AI lab that developed the model. Although these weights cannot be released, we often want to know that the same weights are being used across time and that we are dealing with the intended model and not an imitation. This fact can be proven to users by adding weight-based artifacts to an execution of a model (i.e., a model provider can include the hash of the model weights with each execution). Weight-based artifacts can help AI model developers to protect users from imitation models while still protecting the model’s sensitive information. This approach works well when model providers are trusted to faithfully report model weight derivatives (i.e., model signifiers). However, in contexts where vendors want true verifiability of model usage, we can draw on techniques in (ii).
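A weight-based artifact of this kind can be sketched as a hash over the serialized weights that is attached to every response. The function names and the placeholder inference below are hypothetical, and, as the text notes, the scheme only helps when the provider is trusted to report the hash honestly.

```python
import hashlib

def weight_fingerprint(weights: list[bytes]) -> str:
    """SHA-256 over serialized weight tensors, without revealing the weights themselves."""
    h = hashlib.sha256()
    for tensor in weights:
        # Length-prefix each tensor so different splits of the same bytes hash differently.
        h.update(len(tensor).to_bytes(8, "big"))
        h.update(tensor)
    return h.hexdigest()

def run_model(weights: list[bytes], prompt: str) -> dict:
    """Hypothetical provider endpoint: returns the output plus the weight fingerprint."""
    output = f"echo: {prompt}"  # placeholder for real inference
    return {"output": output, "weights_sha256": weight_fingerprint(weights)}

def same_model(response: dict, published_fingerprint: str) -> bool:
    """User-side check against a fingerprint the developer published earlier."""
    return response["weights_sha256"] == published_fingerprint

weights = [b"\x00\x01", b"\x02"]
published = weight_fingerprint(weights)
response = run_model(weights, "hello")
```

Comparing `response["weights_sha256"]` with the published value across time shows that the reported model has not changed, while the weights stay private.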

Proving model inference relies on a set of zero-knowledge machine-learning (ZK-ML) tools [78, 79, 80] that enable the verification of provable claims such as “I ran this publicly available neural network on some private data and it produced this output” or “I ran my private neural network on some public data and it produced this output” or “I correctly ran this publicly available network on some public data and it produced this output”.[19] These claims are cryptographically secure proofs of inferences that can be cheaply verified by anyone without accessing original model weights. Currently, proving model inference using ZK-ML tools is possible, but still slow for large models. Researchers are actively working to improve the performance of ZK-ML tools for large models [92, 93].

Model verification aims to mobilize contextual confidence by proactively addressing the absence of established norms for human-AI interactions. By enabling verifiable identification of intended AI models and inferences, model verification promotes contextual confidence in these novel interactions.

Challenges addressed: imitate another AI model.

3.2.3 Relational passwords

As generative AI-enabled social engineering attacks proliferate, it will be important to have tools to bilaterally authenticate social relationships. Sophisticated scams that target close relationships – e.g. when a parent receives an urgent call from a child in trouble, where the child’s voice is cloned by an AI with high fidelity – demand new forms of social authentication. What we call “relational passwords” are passwords – words or phrases or call-and-answer routines – formed between two people who frequently communicate. When there is any suspicion that one might be speaking to an impersonator rather than the purported individual, the suspicious party can ask for the relational password. For instance, the parent could ask their child the name of their third grade teacher. These passwords can be used to thwart a wide range of social engineering attacks.

The paradox of good passwords is that they need to be hard to guess but easy to remember. Relational passwords have the benefit of emerging out of a specific context (by being relational). A good relational password further takes advantage of context, drawing on the relationship itself to come up with something that is hard to guess yet easy to remember, such as a shared memory like “The place we first met in the summer of 2002.” For more intimate relationships, it is easier to come up with good relational passwords – they draw on shared history and experience. For more distant relationships (e.g. within and between organizations), it can be harder to come up with something that members know but a hacker cannot learn.

Digital authentication manifests in different forms as either something you know, something you are, or something you have. The norms regarding “something you know” have traditionally been demonstrated through often “contextless” compositions of characters and numbers or the frequent use of a single contextual “recovery” password (e.g. mother’s maiden name, elementary school, childhood street). Relational passwords aim to set higher expectations for the norms of “something you know” by embedding context into what each of these “passwords” unlocks, mobilizing contextual confidence in a communication landscape where generative AI is prevalent.

Challenges addressed: impersonate the identity of an individual without their consent; create mimetic models; imitate members of specific social or cultural groups.

3.2.4 Collusion-resistant digital identities

As mentioned in section 3.1.4, social verification of identity raises many opportunities for malicious influence, especially in a communication landscape where AI is prevalent. Generative AI can help dishonest actors subvert the system either by creating false attestations, or by convincing authentic participants to give inauthentic accounts false attestations. Unlike other identity systems, it is not just the volume of attestations that matters but rather the breadth of their social origin. Multiple attestations from a singular social circle might indicate potential collusion, whereas diverse sources of attestations often contribute to a more robust authentication. Collusion-resistant digital identity schemes aim to calibrate an actor’s influence within a given social context. Collusion-resistant identity solutions thus address the biases that can arise when all attestations are treated equally. This approach discounts influence in situations where there is a high degree of correlation within a context to mitigate collusion [94]. Early implementations of collusion-resistant identities are illustrated in the context of voting [95] and communication channels [96].
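One simple way to picture such discounting, offered only as an assumed example and not as the mechanism of [94], is to give each social circle a contribution that grows with the square root of its attestation count, so that many attestations from one circle count for less than the same number spread across circles. The circle labels and function name are hypothetical.

```python
import math
from collections import Counter

def discounted_score(attestations: list[tuple[str, str]]) -> float:
    """Score (attester_id, circle) pairs with diminishing returns inside each circle."""
    unique = set(attestations)  # a repeated attestation from the same attester counts once
    per_circle = Counter(circle for _, circle in unique)
    # Assumption: square-root discounting within a circle, additive across circles.
    return sum(math.sqrt(count) for count in per_circle.values())

clustered = [(f"friend{i}", "chess-club") for i in range(4)]
diverse = [("ana", "chess-club"), ("bo", "workplace"), ("cy", "university"), ("di", "family")]

clustered_score = discounted_score(clustered)  # four attestations, one circle
diverse_score = discounted_score(diverse)      # four attestations, four circles
```

Both accounts hold four attestations, yet the diverse one scores twice as high, reflecting the idea that breadth of social origin matters more than volume.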

There are many open questions about how to discount influence. While a collusion-resistant digital identity system’s design aims for robust authentication through diverse attestations, the precise mechanisms for discounting influence must remain undisclosed to deter technically sophisticated actors from circumventing the protocols. Collusion-resistant identities aim to mobilize contextual confidence by establishing new strategies that incorporate the quantity of social intersections as part of the verification process, anticipating new challenges that generative AI mounts against social identity solutions.

Challenges addressed: falsely represent grassroots support or consensus (astroturfing); imitate members of specific social or cultural groups; impersonate the identity of an individual without their consent.

3.3 Containment Strategies to Protect Context

3.3.1 Usage and content policies

Usage and content policies can range from defining which users can access a specific model (who, where, when) or system, to describing and enforcing “appropriate” or “allowed” uses of that model or system (how). Specialized models, such as those developed for sectors like the military or healthcare, might have unique access prerequisites. These prerequisites could include specific logins, clearances, or tailored classifiers that guide the model’s application. Additionally, access might be influenced by users’ expectations regarding how their interaction with the model will be monitored.[20] In AI deployment, model access is sometimes restricted based on a user’s IP address [101] or subscription licenses [102]. Other model deployments require professional credentials, such as clinician status to access a medical model [103, 104]. Usage policies can lead to equity concerns that must be weighed against the benefits of safe, restricted access.[21]

Usage policies may contain challenges to contextual confidence by bringing offline tools – identity checks, knowledge restrictions, content ratings – to AI systems. Historically, a significant portion of the internet was openly accessible, allowing universal access to tools, platforms, and websites. However, such unrestricted exploration occasionally enabled misuse.

Challenges addressed: grant access to a context-specific model without context-specific restrictions; misrepresent members of specific social or cultural groups.

3.3.2 Rate-limiting communication

Rate-limiting policies, typically deployed by a platform or other digital service provider, impose a cost on information flowing to or from a user. These costs may be measured in money or time, depending on what is more feasible or effective in a given context. Rate-limiting defends against high volumes of content being shared from a single source. Examples include transaction fees on blockchain networks [106], viewing limits on social media platforms [107], and spam prevention protocols in email [108].
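A time-based cost of this kind is often implemented with a token bucket, sketched below. The class and parameter names are illustrative assumptions, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allow short bursts up to `capacity`, then at most `refill_per_second` messages per second."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start with a full bucket
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the sender is within their limit."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per sender: two messages at once are fine, a third must wait.
bucket = TokenBucket(capacity=2, refill_per_second=1.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

The `capacity` and `refill_per_second` values are the calibration knobs: set too low they hinder genuine communication, set too high they fail to deter automated, high-volume senders.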

When improperly calibrated, rate limits can hinder genuine communication or favor actors with abundant resources, inadvertently deterring participation and obstructing the open exchange of information traditionally associated with the internet. When properly tuned to the setting, rate-limiting can contain challenges to contextual confidence initiated by the early internet culture’s dogmatic pursuit of the “freedom of information.” While the pursuit of freedom of information heightened our ability to communicate freely and openly across vast distances, it also led to unintended setbacks in our ability to communicate. It established the norm that communication on the internet is not protected from reuse and recombination outside its intended context, a norm which has in turn, and paradoxically, diminished the willingness to speak openly and freely on the internet.

Challenges addressed: disseminate content in a scalable, automated, and targeted way; pollute the data commons.

3.3.3 Prompt protection and interface design

Many applications of generative AI will continue to feature prompts from a user fed directly into a model. There are at least three types of design choices around the prompting process that can be used to protect contextual confidence.

The first set of design choices concern data loss prevention (DLP) techniques. DLP techniques can be enforced within prompts to protect information from being misused outside its intended context. For example, it would be desirable and fairly straightforward to impose policies on general purpose models that restrict sharing of Social Security Numbers or API keys [109].
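A prompt-level DLP filter can be as simple as pattern matching before the prompt reaches the model. The sketch below is illustrative only: the Social Security Number pattern covers the common `NNN-NN-NNNN` layout, and the API key pattern (an `sk-` prefix followed by at least 20 alphanumerics) is a hypothetical format, since real key formats vary by provider.

```python
import re

# Assumed patterns for illustration; a production DLP policy would be far broader.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),  # hypothetical key format
}

def redact(prompt: str) -> tuple[str, list[str]]:
    """Replace sensitive substrings and report which kinds were found."""
    found = []
    for label, pattern in PATTERNS.items():
        prompt, count = pattern.subn(f"[REDACTED {label}]", prompt)
        if count:
            found.append(label)
    return prompt, found

clean, kinds = redact("My SSN is 123-45-6789 and my key is sk-" + "a" * 24)
```

The list of detected kinds can also drive the interface, for example by warning the user before anything is sent.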

Second, the organization deploying the model can carefully design interface instructions to remind users of context within a given prompt [110]. Prompt examples and guided templates can be helpful in this regard. In addition, notifications that periodically remind users of key features of the terms of service are important – i.e. users should be frequently reminded about how the information they provide to the model will or will not be used or shared by the organization deploying the model or any third party.

Lastly, clear interface design is key to helping users understand a generative AI model’s capabilities [111]. Custom instructions and prompt templates are just two ways in which users can be guided to clarify context, enabling the model to respond more effectively. Additionally, when models proactively seek context or clarification, it not only refines the response but also reminds users they’re interacting with a tool, not a human.

These strategies for prompt protection and interface design aim to contain challenges to contextual confidence by superseding the internet norm of unrestricted prompt, message or search interfaces, and sometimes obscured terms of service.

Challenges addressed: misrepresent members of specific social or cultural groups; leak or infer information about specific individuals or groups without their consent.

3.3.4 Deniable and disappearing messages

Deniable messaging is a cryptographic technique that allows for messages to be shared in a way that, if taken out of context, their authenticity cannot be definitively proven. Disappearing messages are messages that automatically erase after a set period of time, limiting the period of time over which a particular claim can be verified.

Deniable messages have two cryptographic properties [112]: (i) unforgeability ensures the receiver is confident that the message genuinely originated from the designated sender, and (ii) deniability guarantees that even if the receiver is certain about the origin of a message, they cannot later prove to others that it was sent by the purported sender. The power of deniable messages depends crucially on defining a set of “designated verifiers” for each message.
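The two properties can be illustrated with a shared-key message authentication code, which is a simpler stand-in for the designated verifier signatures discussed in [112] rather than an implementation of them. The receiver can check that a message came from the sender, but because the receiver holds the same key and could have produced the tag, a transcript proves nothing to a third party. Function names are hypothetical.

```python
import hashlib
import hmac
import secrets

def authenticate(shared_key: bytes, message: bytes) -> bytes:
    """Sender attaches a tag that only holders of the shared key can produce."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()

def accepts(shared_key: bytes, message: bytes, tag: bytes) -> bool:
    """Designated verifier checks the tag in constant time."""
    return hmac.compare_digest(authenticate(shared_key, message), tag)

shared_key = secrets.token_bytes(32)  # known only to sender and receiver

# (i) Unforgeability: the receiver is convinced the sender wrote this.
message = b"meet at noon"
tag = authenticate(shared_key, message)
receiver_convinced = accepts(shared_key, message, tag)

# (ii) Deniability: the receiver can produce an equally valid tag for any text,
# so showing (message, tag) to an outsider does not prove who wrote it.
forged_by_receiver = authenticate(shared_key, b"words the sender never said")
outsider_cannot_tell = accepts(shared_key, b"words the sender never said", forged_by_receiver)
```

Here the set of designated verifiers is just the two key holders; public-key constructions generalize this to verifiers chosen per message.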

As persuasive machine-generated content of questionable authenticity proliferates, individuals will need to rely more on verification to provide credibility in order to take action on information. In addition, deniable messages are valuable within the data collection and model outputs phases of AI development. As more of the open internet continues to become flooded with AI generated content, it is critical for AI labs to protect their models from model collapse and train only on certain types of content of verifiable origin [32]. Deniable messages can also be a tool for the enforcement of AI developers’ usage policies – the usage policy can dictate not only access but also restrictions on who can verify model outputs in particular contexts.

Additionally, addressing social media’s dilemmas of trust, content overflow, and harmful influence, some have suggested that user-formed groups can be a solution [113]. Incorporating designated verifier signatures would enforce that only individuals within these groups can authenticate messages, preventing outsiders from validating content out of context, further protecting context in the interaction. Deniable messages thereby contain challenges to contextual confidence by overriding the internet norm of information having inconsistent verifiability standards.

Challenges addressed: leak or infer information about specific individuals or groups without their consent; pollute the data commons; grant access to a context-specific model without context-specific restrictions.

3.4 Mobilization Strategies to Protect Context

3.4.1 Contextual training

Generative AI models typically undergo two main training phases: pre-training and fine-tuning. During the pre-training phase, the model is exposed to a diverse set of data, but this data lacks task-specific, subject-specific, or context-specific nuances. The fine-tuning phase refines this base model, introducing more specific details and nuances, often tailored to a particular application or context. One prominent technique in this phase is Reinforcement Learning from Human Feedback (RLHF) [114, 115, 116, 117].

Contextual training is centered on exploring whether task-specific training is adequate or if a deeper, context-specific approach is required. Consider a query such as “Are these symptoms COVID?” A task-specific model might be hard-coded to provide a set answer, regardless of the user’s intent. However, the ideal response could diverge dramatically based on the user’s context. For a scriptwriter crafting a screenplay, they might be seeking dramatic or fictional symptoms for plot purposes. In contrast, an individual inquiring for medical advice may require evidence-based information. With task-specific training, the AI is essentially limited to a narrow distribution of predefined responses, whereas context-specific training – facilitated by custom instructions, system messages, or even real-world user feedback – enables the model to gauge and respond to a broader array of nuances, leading to richer and more relevant outputs.

Contextual training tools help to mitigate challenges to contextual confidence that result from the training process of an AI model – when the AI model does not appropriately account for context in its inputs, it will struggle to appropriately account for context in its outputs.[22] In order to assess whether a model can be appropriately deployed in a given context, it is important to understand how it was trained. However, unlike pre-training, where there may be more transparency of data mixtures in model cards, annotations used for supervised fine-tuning are closely guarded by AI developers.[23] Combining increased transparency around training with techniques like content filtering or “unlearning” that limit model outputs on areas where the model has less context could vastly improve contextual confidence in AI-enabled communications [125].

Norms for how we engage various stakeholders in the development process of AI technologies, and how we incorporate the resulting data prior to deployment, remain immature. Contextual training aims to mobilize contextual confidence by establishing new, higher expectations for protecting context through the model development process.

Challenges addressed: fail to accurately represent the origin of content; misrepresent members of specific social or cultural groups.

3.4.2 Data verification

Data verification is the process of verifying a data mixture claimed to have been used at a given stage of training, ensuring that the data used was legitimately sourced from the stated providers. Despite the progress being made in ZK-ML as discussed in section 3.2.2, research on verifiable training proofs (i.e. confirming that specific training data was used in training a model) is nascent [126]. Currently, model developers could in theory make verifiable attestations about the data mixtures used throughout model development, showing that the data come from licensed sources. However, model developers typically do not use these verification tools – instead, in model cards, the developers tend to make general and unverifiable claims about the data used in training, especially in fine-tuning, reward modeling, and reinforcement learning.
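One building block for such attestations, offered here as an assumption and not as a method from the paper, is a Merkle commitment: the developer publishes a single root hash over the identifiers of its data sources and can later prove that a given source was part of the committed mixture. This shows only inclusion in the commitment, not that the data was actually used in training, which is the harder problem noted above. The source labels are hypothetical.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _next_level(level: list[bytes]) -> list[bytes]:
    # Pair up nodes; the 0x01 prefix separates inner nodes from leaves (0x00).
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves: list[bytes]) -> bytes:
    """Commit to a non-empty list of data-source identifiers with one hash."""
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # toy padding: duplicate the last node
        level = _next_level(level)
    return level[0]

def inclusion_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each flagged True if the sibling is on the left."""
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof

def verify_inclusion(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        node = _h(b"\x01" + sibling + node) if sibling_is_left else _h(b"\x01" + node + sibling)
    return node == root

sources = [b"licensed-corpus-A", b"licensed-corpus-B", b"clinical-guidelines-C"]
root = merkle_root(sources)
proof = inclusion_proof(sources, 2)
```

A clinician or educational board holding the published `root` could check a proof for the source they care about without seeing the rest of the mixture.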

Consider the implications: should a clinician trust a model if they are unsure about whether the model was trained with credible clinical content? If an educational board is uncertain about the authenticity of a model’s training material, would they introduce it as an AI tutor? Data verification serves as a tool to instill confidence that an AI model possesses the necessary expertise or cultural comprehension for its intended deployment [127].

Data verification mobilizes contextual confidence by illustrating exactly how information that originated in a particular context is repurposed by an AI model in a new context.

Challenges addressed: fail to accurately represent the origin of content; misrepresent members of specific social or cultural groups.

3.4.3 Data cooperatives

Data cooperatives are member-governed entities that aggregate subgroups’ data and negotiate its usage guidelines, ensuring that the benefits derived from this shared resource are returned to the subgroups. The biggest obstacle to such approaches is figuring out how to attribute value to subgroups. Recently developed data valuation methods based on influence functions [128] and Shapley values [129] are promising, and could serve as the basis through which data cooperatives pass value back to their members.[24]
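As a rough illustration of how Shapley values attribute value to contributors, the sketch below computes exact Shapley values for three hypothetical subgroups. The utility table (how useful a model is when trained on each combination of subgroups) is invented for illustration, and the function name `shapley_values` is ours. Exact computation enumerates every ordering of contributors, so it is only feasible for a handful of them; practical data valuation relies on approximations.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal gain from adding p to the contributors already present.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical utility of a model trained on each subset of subgroups A, B, C.
UTILITY = {
    frozenset(): 0,
    frozenset("A"): 10,
    frozenset("B"): 20,
    frozenset("C"): 30,
    frozenset("AB"): 40,
    frozenset("AC"): 50,
    frozenset("BC"): 60,
    frozenset("ABC"): 90,
}

payouts = shapley_values(["A", "B", "C"], lambda s: UTILITY[frozenset(s)])
```

The payouts always sum to the utility of the full coalition, which is what lets a cooperative divide a fixed pool of value among its members without a remainder.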

Data cooperatives can mobilize contextual confidence, seeing in the rise of generative AI an opportunity to set higher standards for data attribution, responsibility and monetization.

Challenges addressed: fail to accurately represent the origin of content; misrepresent members of specific social or cultural groups.

3.4.4 Secure data sharing mechanisms

Many of the strategies for protecting context discussed above by definition inhibit data sharing across contexts. Nonetheless, there are many settings in which it is important to share data across contexts, revealing only those pieces of information necessary to the task at hand. Secure data sharing mechanisms will be an important tool for making sure that relevant information can still be shared while protecting context through the other strategies discussed above.

Secure data sharing mechanisms include cryptographic and other privacy-enhancing techniques like traditional encryption schemes, secure multiparty computation, differential privacy, homomorphic encryption, secret sharing, trusted execution environments, as well as decentralized and federated training of models.

Consider a network of hospitals. Even if each hospital has the tools to protect its patient data with various other strategies discussed in this paper, there is a remaining challenge: How does each hospital share data with another hospital or AI model developer without undoing the protections it has applied? Secure data sharing mechanisms can facilitate the sharing and networking of contexts while ensuring high contextual confidence.
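One of the techniques listed above, secret sharing, can be sketched for the hospital example. In the simulation below, each hospital splits a private count into random additive shares so that only the total across hospitals is ever reconstructed. This is a single-process toy under stated assumptions: the names `share` and `secure_sum`, the modulus, and the counts are all hypothetical, and a real deployment would also need authenticated channels and protection against dishonest participants.

```python
import secrets

MODULUS = 2**61 - 1  # assumed to be larger than any total we expect to compute


def share(value: int, n_parties: int) -> list:
    """Split value into n additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values: list) -> int:
    """Simulate parties jointly computing a sum without revealing their inputs."""
    n = len(private_values)
    # Party i splits its value into n shares; share j is sent to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partials = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # The published partial sums reveal the total and nothing else.
    return sum(partials) % MODULUS


# Hypothetical per-hospital patient counts for some condition.
total = secure_sum([120, 45, 310])
```

Any proper subset of a hospital's shares is uniformly random, so a recipient learns nothing about that hospital's count from the shares it holds.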

While regulations such as HIPAA address data protection within the healthcare domain, there remains significant ambiguity around broader norms for secure data sharing across contexts in the presence of generative AI. By proactively implementing secure data sharing mechanisms, we can mobilize contextual confidence across interconnected contexts.

Challenges addressed: Leak or infer information about specific individuals or groups without their consent.

3.5 Implementation of Strategies to Promote Contextual Confidence

We have now discussed a number of strategies that promote contextual confidence in communications. In this subsection, we clarify how these strategies relate to each other, and highlight the roles of different actors in their development.

What follows is a rough outline of the chain of implementation of the strategies considered above, focusing on what delivers contextual confidence in communication technologies. We stress again that there are many important strategies for promoting contextual confidence not discussed here, and that there are considerations (around hardware, for example) that lie outside the scope of the present discussion. We also discuss only a subset of actors, mainly those who can develop and deploy these technologies in the near term, leaving aside the question of how these strategies can be strengthened through broad coordination, e.g., government regulation, international ethics committees, and industry-wide standards organizations.[25]

Communication technologies, which form the foundational layer, are systems used to send and receive information. These include email and messaging services, social media platforms, news platforms, and other communication platforms for cultural production and consumption. In order to ensure contextual confidence in this root layer, there must be identity protocols and model development policies in place beforehand. Through identity protocols, users gain access to communication technologies, and the details of the identity protocol determine norms of communication.[26] As generative AI becomes integral to communication, choices made at the AI model development layer also crucially feed into the communication layer. As messages flow between users, data management protocols that sit on top of the communication protocol can ensure context is protected and appropriately shared back to AI model developers.

So, there are four key layers of implementation: AI model development, identity protocols, messaging protocols, and data management protocols. It is at the layer of AI model development where usage policies, model verification, watermarking, data verification and contextual training are most likely to be effectively implemented. Meanwhile, the identity protocol layer is where digital identity solutions and relational passwords should be pursued. The communication technologies themselves can pursue strategies such as rate limits, content provenance, Community Notes, deniable messages and prompt design. Finally, data management protocols such as data cooperatives and secure data-sharing mechanisms protect contextual confidence on the communication layer, and protect context as information feeds back into AI model development.
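The assignment of strategies to layers described above can be written down as a simple lookup table. The sketch below only transcribes that paragraph; the dictionary layout and the helper `layer_for` are our own illustrative choices, and the key "communication technologies" stands for the layer the paragraph also calls messaging protocols.

```python
# Layer -> strategies, transcribed from the preceding paragraph.
LAYERS = {
    "AI model development": [
        "usage policies",
        "model verification",
        "watermarking",
        "data verification",
        "contextual training",
    ],
    "identity protocols": [
        "digital identity",
        "relational passwords",
    ],
    "communication technologies": [
        "rate limits",
        "content provenance",
        "Community Notes",
        "deniable messages",
        "prompt design",
    ],
    "data management protocols": [
        "data cooperatives",
        "secure data sharing mechanisms",
    ],
}


def layer_for(strategy: str):
    """Return the layer at which a strategy is most likely implemented, or None."""
    wanted = strategy.lower()
    for layer, strategies in LAYERS.items():
        if wanted in (s.lower() for s in strategies):
            return layer
    return None


watermark_layer = layer_for("watermarking")
```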

Figure 1: Summary of party interactions in pursuit of strategies discussed in this report.

3.6 Illustrating the Value of the Contextual Confidence Perspective

In this section, we show how the contextual confidence framework helps to guide action in a specific, concrete setting. We contrast the approach suggested by contextual confidence with the approaches suggested by perspectives focused on privacy and information integrity.

Suppose a company’s CEO is interested in developing a mimetic model of herself. The purpose of this model is to maximize the CEO’s ability to field concerns raised by employees, to serve as a “proxy CEO” by attending meetings on the CEO’s behalf, and to gather information in order to provide advisory support to the human CEO. In this discussion, we will compare the recommendations that might emerge from a conventional privacy or information integrity review of such a tool with a review focused on securing contextual confidence. We hope this case illustrates that, compared to privacy and information integrity perspectives, contextual confidence suggests an approach that mitigates risks of deploying this proxy CEO model, without sacrificing too much of its usefulness.

We first consider a “privacy” perspective, and in particular a perspective that sees privacy as secrecy about and control over one’s personal data.[27] From this type of privacy perspective, the recommendations would likely focus on the removal of sensitive data from both the training data and outputs of the mimetic model. This could involve employing data loss prevention or de-identification techniques to either transform or completely remove sensitive information. This privacy-as-control approach certainly gets some things right. However, at the same time, it is hard to know where to draw the line on what constitutes sensitive information, and there is a risk that an overly conservative approach might strip away the very data that makes the mimetic model effective. There is a deep tension between privacy and functionality here, and the perspective of privacy as secrecy of personal data does not offer much useful guidance as to how to strike the right balance.

On the other hand, an information integrity perspective would advocate for a pursuit of “truth,” striving to, in some sense, “fact-check” model outputs. That is, this approach might lead to the development of a high-frequency review process for the mimetic model’s outputs, with a goal of ensuring that the model’s responses are an accurate proxy for the CEO’s putative responses. Like the privacy-as-control approach, the information integrity approach serves as a good start. However, in a large company, the review process may not scale and may be overly burdensome, especially for the CEO, who created the model in the first place to improve efficiency in her communications.

How does contextual confidence guide action in this scenario? Let us first evaluate the challenges to the identification and protection of context in this scenario. The CEO model introduces uncertainty into employees’ ability to accurately identify whether they are interacting with the authentic CEO, the authentic CEO model, or a possibly malicious imitation of that model. Additionally, the norms governing the interaction, and how the information exchanged might subsequently be utilized by the company, the CEO, or other employees, remain ambiguous. To navigate these complexities, a framework grounded in contextual confidence would recommend at least four specific actions:

1. Usage policies: Require employees to pass comprehension tests to engage with CEO model.

2. Model verification: Develop proofs that can be widely used to regularly verify CEO model.

3. Interface design: Disclose model use in all outbound communications.

4. Deniable messages: Deploy designated verified signatures for off-the-record conversations.

Through a hypothetical example of a company’s CEO developing a mimetic model to enhance efficiency, this section aimed to concretely illustrate the value of the contextual confidence perspective. While more conventional privacy (as control over personal information) and information integrity perspectives offer valuable insights, they may not always strike the most effective balance between protection and functionality. The contextual confidence framework, however, offers a nuanced approach that emphasizes the importance of recognizing and safeguarding context. The framework suggests a set of specific actionable recommendations, from setting usage policies and model verification strategies to refining interface designs and ensuring that there are deniable messaging protocols to enable genuinely off-the-record conversations. These recommendations underscore the potential of the contextual confidence approach in offering a balanced, effective, and comprehensive strategy for navigating emerging complexities around generative AI model use.


[16] Many have written about the competing visions that dictated the development of the internet, and how it led to the norms and expectations we have today, see for example [37].

[17] For a few recent proposals, see [67, 68, 69, 70, 71, 72, 73].

[18] The main concerns around watermarking are its robustness and susceptibility to evasion [74, 75, 76]. In addition, even if watermarking technologies were to become robust and implemented in highly regulated and controlled models, the watermarking requirement could push malicious actors to substitute toward non-regulated and more harmful models.

[19] These techniques potentially serve as valuable tools for evaluations, model cards, and audits, enabling labs to provide verifiable proof of these actions before releasing a model [81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91].

[20] For a few recent approaches to AI licensing and structured access of AI models, see [97, 98, 99] and [100].

[21] For a critical discussion of usage policies in response to [99], see [105].

[22] To understand the importance of context in training, consider the release of Llama2, which demonstrated the importance of the quality rather than the quantity of data annotations in the fine-tuning of transformer-based AI models [118].

[23] Fine-tuning base models via supervised learning has become a common trend leading to the development of a suite of very powerful models such as Phi [119], WizardLM [120], XGEN [121], Vicuna [122], Falcon [123], and Alpaca [124].

[24] For agenda-setting treatments of data cooperatives, see [130], [131] and [132].

[25] For further discussion of the roles different actors play in responsible AI deployment, see [133, 134, 135, 136] and [137].

[26] For example, if a web application requires only a valid email address and allows for pseudonymous user names, the norms of communication will be different from an organization-wide Slack accessible only to verified members of the organization.

[27] Of course there are many possible conceptions of privacy not captured by this one, but we take this one as an influential example.