Delphi Knows—But Won’t Tell
AI agents need discretion as much as planning, memory, or tool use.
At the bleeding edge of Generative AI (GenAI) there is a consensus that agents will quickly take up roles akin to colleagues in the workplace. They’ll have identity, pursue goals, retain memory, acquire & possess knowledge, formulate plans, use tools to execute those plans, and reflect upon the results of their choices. But what they’ll also need to master is discretion. I haven’t seen much written about this and, based on work done at Transporeon to build & deploy agents, I consider the topic of significant importance. This article summarises the problem and some of my thinking about what might constitute optimal solutions.
The Problem
An example to begin. A colleague asks when your team will complete a key project so they can make their own timeline commitment. Your project depends on a third colleague who will be dismissed today, but the dismissal is confidential. The dismissal is material to an accurate timeline estimate, and yet the fact of the dismissal must not be shared. How do you proceed¹?
For GenAI agents, the discretion problem refers to the careful control of what information is revealed, to whom, and when. In everyday working communication, discretion is normal. In sensitive contexts it may also be legally required. The hallmarks of this problem for GenAI agents include:
The presence of counterparties with distinct information sets
Goals or constraints that make revealing new information a consequential choice
GenAI agents that either hold their own distinct information set, or adopt the information set of one human while dealing with other humans
I’ll briefly explain why each hallmark matters. First, discretion would be unneeded if everyone the agent engaged with had the same information set (i.e. “knew the same things”); there would be nothing to reveal. Second, an agent needs a strategy only if revealing information would lead to different outcomes in some important dimension. This can come from constraints (e.g. a commitment to always protect confidential information) or from instrumental behaviour in pursuit of goals (e.g. revealing information may hinder or advance whatever the agent wanted out of the interaction). Finally, agents enter this problem either when they know things the single human they communicate with does not (such as having broader data rights), or when they share one human’s information but must then speak to another counterparty (for example, a personal assistant agent asked by someone else why its principal is in the office today).
This topic has received little attention from people designing agents. And yet, for reasons that seem obvious to me, discretion matters a great deal when designing GenAI agents as colleagues, especially when their role is disproportionately about information gathering, processing, and dispersion. While this is a sub-domain of planning, I believe it is a kind of planning that merits special attention because any interaction at all with human social groups will force GenAI agents to have a robust strategy of discretion. Even a single-answer AI, with no need for multi-step planning or tool use, would face discretion as a problem.
The anatomy of being discreet
Imagine now a game called “office gossip” in which each turn has two phases. In the first phase one must diagnose what counts as novel information to a counterparty (does she know that the project is cancelled?). In the second, one must decide how to behave given that assumption (I’ll emphasise the riskiness of the funding and see if she says she heard it’s cancelled).
The modelling of a counterparty’s knowledge is the first step. There are several meta-assumptions one could use, which I sort in ascending order of their value for achieving goals in interactions (a minimal code sketch of the last two modes follows the list).
1. No secrets: assume the counterparty knows everything.
2. Take no chances: assume counterparties have only public or company-wide information, or information they have previously disclosed.
3. Federated clearance: ask another source of truth to verify what the counterparty knows².
4. Constructed worldview: make a probabilistic estimate of the novelty of any specific fact in the eyes of the counterparty.
5. Gamesmanship: make a probabilistic estimate of the novelty of any specific fact in the eyes of the counterparty, but also assume they behave strategically in how they acquire & use information: they probe for & bluff about a non-zero share of the facts in the interaction.
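To make the last two modes concrete, here is a minimal sketch. The CounterpartyModel class, its fields, and all the numbers are my own illustrative assumptions, not anything prescribed by this article: the point is only that mode #4 reduces to a probability of novelty per fact, and mode #5 adds an allowance for probing and bluffing.

```python
from dataclasses import dataclass, field

@dataclass
class CounterpartyModel:
    """Hypothetical model of what a counterparty already knows (modes 4 and 5)."""
    # P(counterparty already knows fact) for each fact the agent holds
    p_knows: dict[str, float] = field(default_factory=dict)
    # Mode 5 only: chance the counterparty probes for, or bluffs about, a fact
    probe_rate: float = 0.0
    bluff_rate: float = 0.0

    def novelty(self, fact: str) -> float:
        """Estimated probability that revealing `fact` would be new information."""
        return 1.0 - self.p_knows.get(fact, 0.0)  # unknown facts default to fully novel

    def discount_claimed_knowledge(self, fact: str) -> float:
        """Mode 5: if the counterparty claims to know `fact`, discount by the bluff rate."""
        return max(0.0, self.p_knows.get(fact, 0.0) - self.bluff_rate)

# Illustrative use: the colleague has probably not heard about the dismissal
colleague = CounterpartyModel(p_knows={"dismissal": 0.1}, probe_rate=0.2, bluff_rate=0.1)
print(colleague.novelty("dismissal"))  # 0.9 -> revealing it is a consequential choice
```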
Now, let’s say that one has constructed an assumption about the counterparty’s information and concluded there is something they lack which could be revealed. If we are clever this assumption is in mode #5, so it includes strategic aspects such as “how likely are they to mislead me” or “of what value is this information to them”. Now, we must decide how to behave. Again, here are several meta-strategies, which I sort by their utility for achieving goals (a toy scoring sketch follows the list).
1. Self-delusion: seeing the counterparty lacks information, remove that information from one’s own considerations³.
2. Indiscretion: simply behave without regard to the possibility of exposing novel information⁴.
3. Probe rather than disclose: construct statements that enable the counterparty to reveal whether they know key information.
4. Revealed secrecy: reveal that novel information exists but do not reveal the information⁵.
5. Negotiate a disclosure: signal a willingness to reveal information but then negotiate the terms for revealing it⁶.
6. Manipulate for good: without revealing the information, behave in ways that lead the counterparty to a planned outcome that they would feel is logical and yet would make even more sense if they knew what has been withheld.
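As a rough illustration of how an agent might score these choices, here is a toy sketch. The Behaviour fields, the weights, and every number are assumptions of mine, chosen only to show the shape of the trade-off between goal progress, the risk of leaking the withheld fact, and the reputational cost if the behaviour is later discovered.

```python
from dataclasses import dataclass

@dataclass
class Behaviour:
    name: str
    goal_value: float       # expected progress toward the agent's goal this turn
    leak_risk: float        # probability the withheld fact is effectively revealed
    reputation_cost: float  # cost to the agent if the behaviour is later discovered

def expected_utility(b: Behaviour, novelty: float, p_discovery: float) -> float:
    """Toy scoring: value of acting, minus expected costs of leaking and of discovery."""
    leak_penalty = novelty * b.leak_risk             # leaking only matters if the fact is novel
    discovery_penalty = p_discovery * b.reputation_cost  # manipulation hurts if found out
    return b.goal_value - leak_penalty - discovery_penalty

behaviours = [
    Behaviour("probe rather than disclose", goal_value=0.4, leak_risk=0.1, reputation_cost=0.1),
    Behaviour("revealed secrecy",           goal_value=0.5, leak_risk=0.3, reputation_cost=0.2),
    Behaviour("negotiate a disclosure",     goal_value=0.8, leak_risk=0.9, reputation_cost=0.3),
    Behaviour("manipulate for good",        goal_value=0.9, leak_risk=0.0, reputation_cost=0.9),
]

# Pick the behaviour with the best score for a highly novel fact and a 50% chance of discovery
best = max(behaviours, key=lambda b: expected_utility(b, novelty=0.9, p_discovery=0.5))
print(best.name)
```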
Humans adopt behaviours 3 to 6 regularly. Examples:
Probing: “do you think Joe really would quit over this?” … knowing Joe has already resigned.
Revealed secrecy: “the promotion list is finalised but I’m not allowed to share it, I’m sorry.”
Negotiated disclosure: “if I tell you this you have to swear it stays between us.”
Manipulate for good: “there is always a chance the funding is cancelled, let’s play it safe” … knowing full well that the funding is already cancelled.
Humans do a poor job of quantitatively assessing these behaviours, but we have three advantages to counteract that handicap. First, we have evolved brains that are constantly attentive to social relationships in which discretion is a key influence. Second, we have a different relationship with our social groups: they give us humans more leeway than they will give an AI agent, and we can also change the social group (e.g. quit our jobs and start elsewhere). Third, until now we’ve been behaving in a world devoid of ground truth. Imagine human comportment if your every interaction, perhaps even your internal train of thought, were under constant monitoring and every choice recorded. Humans can be indiscreet largely because we lack oversight.
Teaching discretion
For agents, both the assumption about counterparty knowledge and the choice between behaviours form an extremely challenging optimisation problem. On the assumption side, anything short of level 5 would handicap the agent with either caution or naivety. For the behaviours, estimating the utility of each requires fixing a time horizon and making judgements about future interactions with this counterparty, and about how all counterparties will update their view of the agent’s reputation if the behaviour is discovered. Handling the time horizon is of incredible importance, since most forms of manipulation or indiscretion are detected only after the conclusion of an interaction. Likewise, the agent’s reflection on past behaviour is pivotal to constructing correct worldviews of its counterparties.
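To illustrate why the time horizon dominates, here is a small sketch under assumptions of my own: a discount factor, a per-turn detection probability, and a lasting reputation loss once detected, all with made-up values. The longer the horizon over which the agent expects to keep meeting this counterparty, the worse a risky-but-productive behaviour looks next to a safer one.

```python
def horizon_utility(per_turn_gain: float, p_detect: float, reputation_loss: float,
                    horizon: int, discount: float = 0.95) -> float:
    """Toy model: a deceptive behaviour pays off each turn until it is detected,
    after which the agent eats a lasting reputation loss for the rest of the horizon."""
    utility, p_still_undetected = 0.0, 1.0
    for t in range(horizon):
        utility += (discount ** t) * p_still_undetected * per_turn_gain
        utility -= (discount ** t) * (1 - p_still_undetected) * reputation_loss
        p_still_undetected *= (1 - p_detect)
    return utility

# Illustrative: manipulation looks attractive over 1 turn, ruinous over 20 turns
print(round(horizon_utility(1.0, p_detect=0.3, reputation_loss=2.0, horizon=1), 2))
print(round(horizon_utility(1.0, p_detect=0.3, reputation_loss=2.0, horizon=20), 2))
```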
Without memory and reflection, agents would fail to account for factual knowledge and would miss telltales of motivation or character, all of which are needed if one is to negotiate for information to be revealed. We’ve seen deep counterparty modelling like this in games like Poker⁷ and Diplomacy⁸. But every dimension of these AIs is more tightly constrained than a typical office gossip exchange would require. Games have fixed time horizons, a small set of discrete options per turn, a single score to optimise, and so forth.
Okay, but is this really necessary?
I realise readers may be tempted to treat this discussion as philosophical rather than of practical concern. As of March 2025, every AI agent in the workplace that I know of operates with assumption #2 and behaviour #2: they take no chances and they are indiscreet. Humans continuously operate with assumption #5 and adopt behaviours 3 to 6 to best achieve their goals: they are strategic about information. This discrepancy has huge implications for AI colleagues joining human teams.
It is also no solution to simply constrain the agents to “not reveal information”⁹. First, all interaction reveals information. It is very often the goal of the interaction to inform the counterparty of something they didn’t know beforehand. Second, most judgements of discretion will involve information that is new but not restricted. Even the decision to remind a counterparty of a previous choice is an act of discretion: one is assuming that the reminder is in fact framing their decision in light of new information and not just wasting everyone’s time by repeating a statement. Finally, and not least in importance, discretion is a crucial aspect of human interactions. To lack this skill might not hinder an AI’s intelligence, but it would limit its effectiveness when working with humans. And for the next few years, working with humans is sort of their mission. Just as we attend to an agent’s ability to plan, remember, use tools, behave with a consistent profile, reflect and learn from its history, and follow policies; so too should we endeavour to build discretion into their architectures.
Footnotes
Discretion problems range from mundane to hardcore ethical ones. I tried to choose hairy but not moralistic examples.
I can imagine an abstracted service like this within a company for many key facts, such as “does Jonah have access to our full-company profit & loss data prior to public reporting”. But I struggle to see how this can handle all information that would need to be evaluated in the act of discretion.
I believe humans are incapable of behaviour #1 (self-delusion to match the counterparty’s information) though agents could do it easily.
We abhor behaviour #2 (indiscretion) not just because it is a base behaviour that identifies poor character. It is also simply unwise in a strictly utilitarian analysis. A reputation for indiscretion limits future options drastically.
But use with caution: “Whoever wishes to keep a secret must hide the fact that he possesses one.”
– Johann Wolfgang von Goethe, Maxims and Reflections (1833)
“Three may keep a secret, if two of them are dead.”
– Benjamin Franklin, Poor Richard’s Almanack
See for example: Cicero
And discretion conflicts with honesty. If a user pointedly asks an AI something that the AI “knows” but shouldn’t reveal, withholding information would undermine trust, yet telling the truth violates confidentiality.