Designing Human-AI Workflows for Synergy

A sobering meta-analysis from MIT reveals a counterintuitive truth: most human-AI collaborations actually underperform the better of the human or the AI working alone. Consider a study on fake hotel review detection: the AI achieved 73% accuracy, the human 55%, yet the combined system managed only 69%.

This raises a crucial question: How do we architect human-AI collaborations that truly elevate performance?

If you’re leading an AI rollout, that question is more than academic. It determines whether your investment produces step-change performance or an expensive stalemate. Simply placing people and AI in the same workflow does not guarantee better results. What matters is what they do together, and how intentionally you design the collaboration.

Synergy vs. Augmentation

The researchers in the study above investigated two desirable outcomes: synergy and augmentation. Synergy represents the ideal state, where the combined human-AI performance surpasses both the human alone and the AI alone, mirroring “strong synergy” found in purely human groups. Augmentation is a more modest goal and simply means the human-AI system performs better than the human alone.

A common implicit assumption is that the combined system must be better than either component, but the reality is often complicated by human behavioral pitfalls. 

Humans frequently struggle to find the right balance of trust: they either over-rely on AI and blindly accept its suggestions, or under-rely and prematurely dismiss valuable AI input. For example, in the fake hotel review study, because the humans were worse at the task than the AI, they made poor judges of the AI’s recommendations, leading to a sub-par outcome. So while the AI augmented human accuracy (55% -> 69%), the combined system was still less effective than the AI alone (73%).

On the other hand, in a study on bird image classification, the AI was only 73% accurate, compared to expert human performance of 81%. But the human-AI collaboration reached 90% accuracy, better than either the human or the AI alone. This is an example of human-AI synergy, which results from expert humans being better able to decide when to trust their own judgement versus the algorithm’s, thus improving the overall system performance.
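
To make these two outcomes concrete, here is a minimal sketch in Python that classifies a human-AI pairing using the accuracy figures reported in the two studies above; the function name and labels are just for illustration.

```python
def classify_collaboration(human_acc: float, ai_acc: float, combined_acc: float) -> str:
    """Label a human-AI pairing using the definitions above."""
    if combined_acc > max(human_acc, ai_acc):
        return "synergy"          # combined beats both the human and the AI alone
    if combined_acc > human_acc:
        return "augmentation"     # combined beats the human alone, but not the AI
    return "underperformance"     # combined beats neither

# Fake hotel review detection: human 55%, AI 73%, combined 69%
print(classify_collaboration(0.55, 0.73, 0.69))  # -> augmentation

# Bird image classification: expert human 81%, AI 73%, combined 90%
print(classify_collaboration(0.81, 0.73, 0.90))  # -> synergy
```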

Complementarity

Another view of synergy in human-AI collaboration comes from research on complementarity, a practical way to ensure that what the human brings and what the AI brings are meaningfully different and mutually enhancing.

According to the authors, it is useful to think of the distinct ways humans and AI approach decision-making, which result from two key asymmetries:

  1. Information Asymmetry: Often, AI and humans operate with different inputs. AI relies on a vast collection of digitized data. Humans, however, draw on a broader, richer context that includes non-digitized real-world knowledge. For example, an AI might accurately diagnose from a scan, but a human doctor also factors in the patient’s demeanor, additional symptoms, or prior history. This holistic view gives the human a distinct informational advantage in complex situations.
  2. Capability Asymmetry: Even given the exact same information, the processing methods differ. AI models infer patterns from vast datasets, while humans use more flexible mental models to build an intuitive understanding of the world. This allows humans to learn rapidly, often after only a few trials, and to accumulate lifelong experience. AI, on the other hand, can instantly digest massive amounts of information and detect tiny, subtle variations in data that would be imperceptible to a human. These differences give each side unique capabilities.

Where teams stumble is when these asymmetries are flattened. If your process gives humans and models the same inputs and asks them to do the same step, one of them is redundant. If, instead, you assign different roles and design a clean way for their contributions to combine, the whole becomes greater than the sum of its parts.

When, and When Not, To Use AI

Rethinking the architecture of modern work to integrate human and artificial intelligence demands a careful, nuanced approach. The success of this collaboration hinges on a thoughtful consideration of two critical factors: the inherent nature of the task and the complementary strengths of the human and AI partners.

Task Type

The type of work at hand fundamentally dictates the potential for synergy. For example, the MIT meta-analysis study found that Innovative Tasks are the “sweet spot” for maximum impact. These are characterized by open-ended goals and constraints that evolve in real-time, through iteration and exploration. Here, humans have the natural ability to think in non-linear ways, associating unrelated concepts to forge novel and meaningful connections. For such tasks, AI can tap into its vast informational landscape from which these connections can be drawn, leading to synergy.

However, for Decision Tasks that primarily require evaluation or judgement, the story is not as straightforward. In some cases, the human-AI collaboration can perform worse than either the human or the AI alone. It depends on what the task is and how it is split between the AI and the human, as discussed next.

Task Separation

Underlying all successful partnerships in general is the ability to leverage distinct and complementary strengths. Human-AI collaboration is no different in that way.

It sounds counterintuitive, but if the AI’s initial performance is overwhelmingly superior, the overall human-AI system may actually perform worse than the AI alone. The thoughtful approach, therefore, is to restrict the AI’s role to precisely those sub-tasks where it has a clear advantage.

Conversely, when the human is the stronger initial decision-maker, the partnership tends toward greater success. As the expert, the human is better positioned to critically assess the AI’s input and selectively integrate it into the process, in a synergistic fashion.
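
One way to operationalize this kind of task separation is a simple routing policy: hand a sub-task entirely to the party with a clear, measured advantage, and otherwise let the stronger initial decision-maker review the other’s input. The sketch below is hypothetical; the sub-task names, accuracy estimates, and margin are illustrative assumptions, not figures from the studies discussed above.

```python
# Hypothetical routing sketch: assign each sub-task to the party with a clear advantage,
# and let the stronger party review the other's output when the gap is small.

SUBTASK_ACCURACY = {
    # sub-task: (human_accuracy, ai_accuracy), estimated on a validation set
    "scan_for_anomalies": (0.70, 0.92),       # AI has a clear edge: assign to AI
    "assess_patient_context": (0.88, 0.60),   # human has a clear edge: assign to human
    "final_diagnosis": (0.85, 0.80),          # close call: human decides with AI as input
}

ADVANTAGE_MARGIN = 0.10  # minimum gap before handing a sub-task entirely to one party

def assign_subtask(name: str) -> str:
    human_acc, ai_acc = SUBTASK_ACCURACY[name]
    if ai_acc - human_acc >= ADVANTAGE_MARGIN:
        return "AI alone"
    if human_acc - ai_acc >= ADVANTAGE_MARGIN:
        return "human alone"
    # When neither side dominates, the stronger initial decision-maker reviews the other.
    return "human decides, AI advises" if human_acc >= ai_acc else "AI decides, human advises"

for task in SUBTASK_ACCURACY:
    print(f"{task}: {assign_subtask(task)}")
```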

Ultimately, effective human-AI collaboration is not about replacing one with the other; it is a delicate exercise in defining boundaries, recognizing unique excellence, and ensuring that the final output is greater than the sum of its different parts.

Conclusion

Simply introducing AI into a workflow is not a prescription for performance improvement and can, in fact, lead to an expensive underperformance. Achieving true synergy—where the human-AI system surpasses both components working alone—requires intentional design built on complementarity. 

A crucial lesson is that human expertise matters. Experts are better positioned to critically assess and leverage AI input, transforming a simple augmentation into a synergistic gain. This is particularly relevant in the ‘sweet spot’ of Innovative Tasks, where human creative thinking and an AI’s vast informational landscape can combine to produce surprising creative breakthroughs.

Ultimately, the future of work hinges on recognizing and embracing these boundaries, and on accepting that AI may not be suitable for every kind of task. Instead of replacing humans, the goal is to define complementary roles that exploit the inherent asymmetries in information and capability. By expertly differentiating tasks, organizations can move past the common trap of underperformance toward a future of genuine human-AI synergy.

Boosting AI’s Intelligence with Metacognitive Primitives

Over the past year or so, AI experts, like Ilya Sutskever in his NeurIPS 2024 talk, have been raising concerns that AI reasoning might be hitting a wall. It seems that simply throwing more data and computing power at the problem is giving us less and less in return, and models are struggling with complex thinking tasks. Maybe it’s time to explore other facets of human reasoning and intelligence, rather than just relying on sheer computational force.

At its core, a key part of human intelligence is our ability to pick out just the right information from our memories to help us solve the problem at hand. For instance, imagine a toddler seeing a puppy in a park. If they’ve never encountered a puppy before, they might feel a bit scared or unsure. But if they’ve seen their friend playing with their puppy, or watched their neighbors’ dogs, they can draw on those experiences and decide to go ahead and pet the new puppy. As we get older, we start doing this for much more intricate situations – we take ideas from one area and apply them to another when the patterns fit. In essence, we have a vast collection of knowledge (made up of information and experiences), and to solve a problem, we first need to identify the useful subset of that knowledge.

Think of current large language models (LLMs) as having absorbed the entire knowledge base of human-created artifacts: text, images, code, and even elements of audio and video through transcripts. Because they are essentially predictive engines trained to forecast the next word or “token,” they exhibit a basic level of reasoning that emerges from the statistical structure of the data rather than from deliberate thought. What has been truly remarkable is how much of this basic reasoning the extensive “knowledge layer” can deliver through statistical prediction alone.

Beyond this statistical stage of reasoning, prompting techniques, like assigning a specific role to the LLM, improve reasoning abilities even more. Intuitively speaking, they work because they help the LLM focus on the more relevant parts of its network or data, which in turn enhances the quality of the information it uses. More advanced strategies, such as Chain-of-Thought or Tree-of-Thoughts prompting, mirror human reasoning by guiding the LLM to use a more structured, multi-step approach to traverse its knowledge bank in more efficient ways. One way to think about these strategies is as higher-level approaches that dictate how to proceed. A fitting name for this level might be the Executive Strategy Layer – this is where the planning, exploration, self-checking, and control policies reside, much like the executive network in human brains.

However, it seems current research might be missing another layer: a middle layer of metacognitive primitives. Think of these as simple, reusable patterns of thought that can be called upon and combined to boost reasoning, no matter the topic. You could imagine it this way: while the executive strategy layer helps an AI break down a task into smaller steps, the metacognitive primitive layer makes sure each of those mini-steps is solved in the smartest way possible. This layer might involve asking the AI to find similarities or differences between two ideas, move between different levels of abstraction, connect distant concepts, or even look for counter-examples. These strategies go beyond just statistical prediction and offer new ways of thinking that act as building blocks for more complex reasoning. It’s quite likely that building this layer of thinking will significantly improve what the Executive Strategy Layer can achieve.
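
To make the layering concrete, here is a speculative sketch of how such a middle layer might sit between the knowledge layer (the model itself) and the Executive Strategy Layer. The primitive wordings, the plan format, and the call_llm stub are illustrative assumptions, not an existing API or a tested design.

```python
# Speculative sketch of the three layers described above. `call_llm` is a stand-in
# for whatever model API you use; the primitive wordings are illustrative only.

def call_llm(prompt: str) -> str:
    # Stub so the sketch runs; replace with a real model call.
    return f"[model response to: {prompt[:60]}...]"

# Metacognitive-primitives layer: small, topic-agnostic patterns of thought.
PRIMITIVES = {
    "compare_contrast": "Compare and contrast the options, noting surprising similarities or differences.",
    "change_abstraction": "Restate the problem one level more abstractly, then one level more concretely.",
    "find_analogy": "Borrow an analogy from an unrelated domain and map its structure onto this problem.",
    "counter_example": "Actively look for a counter-example to the current conclusion.",
}

def apply_primitive(name: str, context: str) -> str:
    """Knowledge layer + one primitive: run a single cognitive move over the working context."""
    return call_llm(f"{PRIMITIVES[name]}\n\nContext:\n{context}")

# Executive Strategy Layer: plans the steps and decides which primitive each step needs.
def solve(task: str, plan: list[tuple[str, str]]) -> str:
    context = task
    for step, primitive in plan:  # e.g. [("weigh the two designs", "compare_contrast"), ...]
        context += "\n\n" + apply_primitive(primitive, f"{step}\n{context}")
    return call_llm(f"Synthesize a final answer from the work below.\n\n{context}")

print(solve("Choose a caching strategy for a read-heavy service.",
            [("weigh the candidate strategies", "compare_contrast"),
             ("stress-test the leading choice", "counter_example")]))
```

The point of the sketch is the division of labor: the executive layer only decides which move to make next, while each move itself is a reusable, topic-agnostic primitive.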

To understand what these core metacognitive ideas might look like, it’s helpful to consider how we teach human intelligence. In schools, we don’t just teach facts; we also help students develop ways of thinking that they can use across many different subjects. For instance, Bloom’s revised taxonomy outlines levels of thinking, from simply remembering and understanding, all the way up to analyzing, evaluating, and creating. Similarly, Sternberg’s theory of successful intelligence combines analytical, creative, and practical abilities. Within each of these categories, there are simpler thought patterns. For example, smaller cognitive actions like “compare and contrast,” “change the level of abstraction,” or “find an analogy” play an important role in analytical and creative thinking.

The exact position of these thought patterns in a taxonomy is less important than making sure learners acquire these modes of thinking and can combine them in adaptable ways.

As an example, one primitive that is central to creative thinking is associative thinking — connecting two distant or unrelated concepts. In a study last year, we showed that by simply asking an LLM to incorporate a random concept, we could measurably increase the originality of its outputs across tasks like product design, storytelling, and marketing. In other words, by turning on a single primitive, we can actually change the kinds of ideas the model explores and make it more creative. We can make a similar argument for compare–contrast as a primitive that works across different subjects: by looking at important aspects and finding “surprising similarities or differences,” we might get better, more reasoned responses. As we standardize these kinds of primitives, we can combine them within higher-order strategies to achieve reasoning that is both more reliable and easier to understand.
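
As a concrete illustration of a single primitive being “turned on,” here is a minimal sketch of the incorporate-a-random-concept move; the seed-concept list and prompt wording are illustrative choices, not the study’s actual materials.

```python
# Minimal sketch of the associative-thinking primitive: nudge the model to connect
# the task with a randomly chosen, unrelated concept before answering.

import random

SEED_CONCEPTS = ["lighthouse", "origami", "coral reef", "metronome", "compost", "suspension bridge"]

def associative_prompt(task: str) -> str:
    concept = random.choice(SEED_CONCEPTS)
    return (
        f"{task}\n\n"
        f"Before answering, deliberately incorporate the unrelated concept '{concept}': "
        f"list two ways it connects to the task, then use the most promising connection in your answer."
    )

print(associative_prompt("Propose a new product idea for reusable water bottles."))
```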

In summary, giving today’s AI systems a metacognitive-primitives layer—positioned between the knowledge base and the Executive Strategy Layer—might provide a practical way to achieve stronger reasoning. The knowledge layer provides the content; the primitives layer supplies the cognitive moves; and the executive layer plans, sequences, and monitors those moves. This three-part structure mirrors how human expertise develops: it’s not just about knowing more, or only planning better, but about having the right units of thought to analyze, evaluate, and create across various situations. If we give LLMs explicit access to these units, we can expect improvements in their ability to generalize, self-correct, be creative, and be more transparent, moving them from simply predicting text toward truly adaptive intelligence.

Labels and Fables: How Our Brains Learn

One of the most remarkable capabilities of the human brain is its ability to categorize objects, even those that have little visual resemblance to one another. Seeing that visually similar objects, like different trees, fit into a category is easier, and it is a skill that non-human animals also possess. For example, dogs behave differently in the presence of other dogs than they do around humans, demonstrating that they can differentiate the two even if they don’t have names for them.

A fascinating study explored whether infants are able to form categories for different-looking objects. Researchers presented ten-month-old infants with a variety of dissimilar objects, ranging from animal-like toys to cylinders adorned with colorful beads and rectangles covered in foam flowers, each accompanied by a unique, made-up name like “wug” or “dak.” Despite the objects’ visual diversity, the infants demonstrated an ability to discern patterns. When presented with objects sharing the same made-up name, regardless of their appearance, infants expected a consistent sound. Conversely, objects with different names were expected to produce different sounds. This remarkable cognitive feat in infants highlights the ability of our brains to use words as labels to categorize objects and concepts beyond visual cues.

Our ability to use words as labels comes in very handy for progressively building more abstract concepts. We know that our brains look for certain patterns (ones that mimic a story structure) when deciding what information is useful to store in memory. Imagine that the brain is like a database table where each row captures a unique experience (let’s call it a fable). By adding labels to each row, we make the database more powerful.

As an example, let’s suppose that you read a story to your toddler every night before bed. This time you are reading “The Little Red Hen.” As you read, your child’s cortisol level rises a bit as she imagines the challenges the Little Red Hen faces when no one helps her; as the situation resolves, she feels a sense of relief. This makes it an ideal learning unit to store in her database for future reference. The story ends with the morals of working hard and helping others, so she is now able to add these labels to this row in her database. As she reads more stories, she starts labeling more rows with words like “honesty” or “courage,” abstract concepts that have no basis in physical reality. Over time, with a sufficient number of examples in her database for each concept, she has an “understanding” of what that particular concept means. A few days later, when you are having a conversation with her at breakfast and the concept of “helping others” comes up, she can proudly rattle off the anecdote from The Little Red Hen.

In other words, attaching labels not only allowed her to build a sense of an abstract concept, it also made it more efficient for her brain to search the database for relevant examples. The figure above shows a conceptual view, as a database table, of how we store useful information in our brains. The rows correspond to a unit of learning (a fable) that captures how a problem was solved in the past, through direct experience or vicariously. A problem doesn’t even have to be big: a simple gap in existing knowledge can trigger a feeling of discomfort that the brain then tries to plug. The columns in the table capture all the data that might be relevant to the situation, including context, internal states and, of course, labels.
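
For readers who like the database metaphor spelled out, here is an illustrative sketch of a “labels and fables” table; the field names and the retrieval function are shorthand for the metaphor, not a claim about how memory is actually implemented.

```python
# Illustrative sketch of the "labels and fables" metaphor: each row is a fable
# (a stored experience), and labels make it searchable by abstract concept.

from dataclasses import dataclass, field

@dataclass
class Fable:
    summary: str          # what happened and how it resolved
    context: str          # where/when it was experienced (directly or vicariously)
    emotional_arc: str    # e.g. "tension -> relief", the signal that made it worth storing
    labels: set[str] = field(default_factory=set)  # abstract concepts attached later

memory: list[Fable] = [
    Fable("The Little Red Hen did all the work when no one helped, then kept the bread.",
          "bedtime story", "tension -> relief", {"hard work", "helping others"}),
    Fable("A neighbor shoveled our driveway after the storm.",
          "direct experience", "worry -> gratitude", {"helping others", "kindness"}),
]

def recall(label: str) -> list[Fable]:
    """Retrieve every fable tagged with a given abstract concept."""
    return [f for f in memory if label in f.labels]

for fable in recall("helping others"):
    print(fable.summary)
```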

Labels also play a role in emotional regulation. When children are taught more nuanced emotional words, like “annoyed” or “irritated” instead of just “angry,” they regulate their emotional responses better. Research shows that adolescents with low emotional granularity are more prone to mental health issues like depression. One possible reason is that when your rows are accurately labeled, you are able to choose actions that are more appropriate for the situation. If you only have the single label “anger,” your brain might choose an action out of proportion to a situation that is merely annoying.

At a fundamental level, barring any disability, we are very similar to each other: we have the same types of sensors, the same circuitry that allows us to predict incoming information, and the same mechanisms for creating entries in the table. What makes us different from each other is simply our unique set of labels and fables.

The Science Behind Storytelling: Why Our Brains Crave Narratives

“Once upon a time…” These four words have captivated audiences for centuries, signaling the start of a story. But what is it about stories that so powerfully captures our attention and leaves a lasting impression? The answer may lie in the way our brains learn and process information.

How Our Brains Learn: A Baby’s Perspective

A baby constantly faces an influx of sensory information that her underdeveloped brain cannot yet handle. So how does she make sense of it all? She relies on her adult caretakers to help her understand what is important and what is not. An example can clarify how this learning process works.

  • Say you are going on a walk with your toddler and you see the neighbor’s cat. 
  • You excitedly point to the cat and, in the high-pitched and exaggerated voice that only parents use, say, “Oh look, a kitty cat!”
  • The high-pitched voice stands out from all the other sounds the baby is hearing. At the same time, her body releases chemicals like dopamine (to put her in an alert state) and noradrenaline (to focus her attention).
  • You might then tell her how cute the cat looks, and the cheery tone of your voice tells her that the cat is a “good” thing and not something to be afraid of. Simultaneously, her body releases a bit of dopamine that signals relief.

Her brain then captures all of the information related to this event, including context like the neighborhood, the name, the image, and the emotional state, and stores it as a “searchable rule.” The next time she walks by the neighbor’s house, her brain pulls up this knowledge about the cat, and she gets excited to pet it. Suppose that at another time you happen to be on a hike and see a different cat. Now the knowledge your toddler has about cats doesn’t match perfectly: it’s a different location and a different type of cat. Depending on other existing bits of information (e.g., knowledge about aggressive animals in the wild), her brain might pick a different rule and suggest a more cautious approach.

The Story-Learning Connection

This learning process has striking similarities to how artificial intelligence (AI) is trained. Both require labeled data and multiple examples to generalize information. However, human brains have a unique ability to learn continuously by integrating discrete “units” of information into our existing knowledge base. Given what we now know about how our brains work, it seems likely that this unit of information corresponds to what lies between the cortisol and dopamine waves. The presence of this emotional signature tells the brain to take a snapshot of the moment and store it with additional metadata. This metadata, like the labels we assign to this information (e.g., “cat,” “neighbor”), helps in searching this database of knowledge at a later time.

This also helps explain why we find stories so compelling. Stories are packaged in exactly the form our brain needs to process a learning unit. “Once upon a time…” and “…and they lived happily ever after,” which map to the rise and fall of cortisol and dopamine, provide the ideal bookends for this learning unit.

Our affinity for the narrative form explains a lot about learning and how we make meaning. Here are three ways stories play a role for us in society:

  • Bedtime Stories: Bedtime stories, a tradition for many generations, are an ideal medium for communicating cultural values. Most folk tales don’t just tell a story but also explicitly call out a moral value, which is essentially a label for an abstract concept, at the end. When children hear different stories for the same moral they are able to build a deeper understanding of the moral concept and the different ways it can manifest. 
  • Pretend Play: When toddlers engage in pretend play they simulate novel scenarios with all the features of a story – setting, conflict, resolution. The simulation allows the child to vividly experience the emotions in the story and thereby learn from it. Engaging in pretend play with children is a great way for parents to recognize what learning their child is taking away from the situation and reframe it for them if needed.
  • Conspiracy Theories: Unfortunately, our learning mechanism can also be hacked in unhealthy ways. The narrative structure explains why conspiracy theories, even though untrue and easily disproven, are so effective. Most conspiracies start with an outrageous claim to grab attention, label the story with a moral value, and suggest an action to resolve the situation. When delivered by someone you trust, which is how we started learning in the first place, the conspiracy is easily accepted and integrated into our knowledge base.

Conclusion: The Enduring Power of Storytelling

Stories are not just a form of entertainment; they are fundamental to how we learn, make sense of the world, and connect with others. We are not certain why stories are so powerful, but one possible explanation is that the narrative structure is recognized by our brain as a unit of learning, allowing it to be integrated well into existing knowledge structures. By understanding the science behind storytelling, we can harness its power for education, communication, and personal growth. So, the next time you hear “Once upon a time…,” remember that you’re not just embarking on a journey of imagination, but also engaging in a deeply ingrained learning process that has shaped humanity for millennia.

Can AI Have Ethics?

Imagine finding yourself marooned on a deserted island with no other human beings around. You’re not struggling for survival—there’s plenty of food, water, and shelter. Your basic needs are met, and you are, in a sense, free to live out the rest of your days in comfort. Once you settle down and get comfortable, you start to think about all that you have learned since childhood about living a good, principled life. You think about moral values like “one should not steal” or “one should not lie to others” and then it suddenly dawns on you that these principles no longer make sense. What role do morals and ethics play when there is no one else around? 

This thought experiment reveals a profound truth: our moral values are, at bottom, social constructs designed to facilitate cooperation among individuals. Without the presence of others, the very fabric of ethical behavior begins to unravel.

This scenario leads us to a critical question in the debate on artificial intelligence: can AI have ethics?

Ethics as a Solution to Cooperation Problems

Human ethics have evolved primarily to solve the problem of cooperation within groups. When people live together, they need a system to guide their interactions to prevent conflicts and promote mutual benefit. This is where ethics come into play. Psychologists like Joshua Greene and Jonathan Haidt have extensively studied how ethical principles have emerged as solutions to the problems that arise from living in a society.

In his book Moral Tribes, Joshua Greene proposes that morality developed as a solution to the “Tragedy of the Commons,” a dilemma faced by all groups. Consider a tribe where people sustain themselves by gathering nuts, berries, and fish. If one person hoards more food than necessary, their family will thrive, even during harsh winters. However, food is a finite resource. The more one person takes, the less remains for others, potentially leading to the tribe’s collapse as members starve. Even if the hoarder’s family survives, the tribe members are likely to react negatively to such selfish behavior, resulting in serious consequences for the hoarder. This example illustrates the fundamental role of morality in ensuring the survival and well-being of the group.

Our innate ability to recognize and respond to certain behaviors forms the bedrock of morality. Haidt defines morality as “a set of psychological adaptations that allow otherwise selfish individuals to reap the benefits of cooperation.” This perspective helps explain why diverse cultures, despite differences in geography and customs, have evolved strikingly similar core moral values. Principles like fairness, loyalty, and respect for authority are universally recognized, underscoring the fundamental role of cooperation in shaping human morality.

The Evolution of Moral Intuitions

Neuroscience has begun to uncover the biological mechanisms underlying our moral intuitions. These mechanisms are the result of evolutionary processes that have equipped us with the ability to navigate complex social environments. For instance, research has shown that humans are wired to find violence repulsive, a trait that discourages unnecessary harm to others. This aversion to violence is not just a social construct but a deeply ingrained biological response that has helped our species survive by fostering cooperation rather than conflict.

Similarly, humans are naturally inclined to appreciate generosity and fairness. Studies have shown that witnessing acts of generosity activates the reward centers in our brains, reinforcing behaviors that promote social bonds. Fairness, too, is something we are biologically attuned to; when we perceive fairness, our brains release chemicals like oxytocin that enhance trust and cooperation. These responses have been crucial in creating societies where individuals can work together for the common good.

The Limits of AI in Understanding Morality

Now, let’s contrast this with artificial intelligence. AI, by its very nature, does not face the same cooperation problems that humans do. It does not live in a society, it does not have evolutionary pressures, and it does not have a biological basis for moral intuition. AI can be programmed to recognize patterns in data that resemble ethical behavior, but it cannot “understand” morality in the way humans do.

To ask whether AI can have ethics is to misunderstand the nature of ethics itself. Ethics, for humans, is deeply rooted in our evolutionary history, our biology, and our need to cooperate. AI, on the other hand, is a tool, an extremely powerful one, but it does not possess a moral compass. It knows about human moral values only as information; it is unlikely to ever create these concepts internally on its own, simply because it has no need to cooperate with others.

The Implications of AI in Moral Decision-Making

The fact that AI cannot possess ethics in the same way humans do has profound implications for its use in solving human problems, especially those that involve moral issues. When we deploy AI in areas like criminal justice, healthcare, or autonomous driving, we are essentially asking a tool to make decisions that could have significant ethical consequences.

This does not imply that AI should be excluded from these domains. However, we must acknowledge AI’s limitations in moral decision-making. While AI can contribute to more consistent and data-driven decisions, it lacks the nuanced understanding inherent in human morality. It can inadvertently perpetuate existing biases present in training datasets, leading to outcomes that are less than ethical. Moreover, an overreliance on AI for ethical decision-making can hinder our own moral development. Morality is not static; it evolves within individuals and societies.  Without individuals actively challenging prevailing norms and beliefs, many of the freedoms we cherish today would not have been realized.

Conclusion

Ultimately, the question of whether AI can have ethics is not merely hard to answer; it is the wrong question to ask. AI does not have the capacity for moral reasoning because it does not share the evolutionary, biological, and social foundations that underlie human ethics. Instead of asking if AI can be ethical, we should focus on how we can design and use AI in ways that align with human values.

As we continue to integrate AI into various aspects of society, the role of humans in guiding its development becomes more critical. We must ensure that AI is used to complement human judgment rather than replace it, especially in areas where ethical considerations are paramount. By doing so, we can harness the power of AI while maintaining the moral integrity that defines us as human beings.