GenAI lacks the creativity to make scientific discoveries [Q&A]

Generative AI is often praised for its ability to analyze data, summarize research and even propose scientific ideas, but new findings suggest its creative limits are more substantial than many of the exciting headlines imply.
Research led by Professor Amy Wenxuan Ding of emlyon business school and Professor Shibo Li of Indiana University found that while GenAI can imitate the process of science, it cannot yet produce the imaginative leaps that drive true discovery. In their study, which you can read here, a computer-simulated experiment challenged ChatGPT-4 to solve a real genetics puzzle, asking it to propose hypotheses, design lab experiments and revise its thinking as results unfolded.
The model handled the mechanics of scientific reasoning, offering ideas and planning experiments, yet its breakthroughs were modest and its responses often carried misplaced confidence.
I spoke with Professor Ding about what today’s models can and can’t do, why curiosity remains out of reach for machines, how she designed an experiment to test genuine scientific reasoning and what her findings tell us about the future relationship between human and machine creativity in science.
BN: What inspired you to test generative AI’s ability to make scientific discoveries?
AD: My research has been driven by a fascination with the origin of intelligence in both humans and machines. Inspired by Isaac Newton’s Philosophiæ Naturalis Principia Mathematica (Mathematical Principles of Natural Philosophy), we focused on using mathematics to describe human cognitive models. Our goal was to understand the creative thought process and develop computational models that allow machines to establish similar intelligence.
More than twenty years ago, we began studying how human scientists make discoveries and developed computational production systems that could replicate this process in specific fields like biology and physics. We later expanded this into groups of intelligent 'doers' -- agents with specific domain knowledge working collaboratively to solve complex tasks in business areas.
However, while these production systems worked well in specific domains, we faced a significant challenge: machines lacked true 'understanding.' The advent of Generative AI marks the first time in history that machines can genuinely grasp the meaning of human language and prompts. Since scientific discovery represents one of the highest forms of human reasoning and creativity, we wanted to test whether a model that understands language so well can also navigate the complex process of scientific discovery.
BN: Do you think people have been overestimating AI’s current scientific ability?
AD: I don't think it's a matter of overestimation, but rather a misunderstanding of how AI works. It depends entirely on the specific domain. Since machine intelligence is fundamentally achieved through computation, AI’s scientific ability is strictly limited by whether domain knowledge can be translated into a digital or symbolic format without losing its inherent meaning. We denote this requirement as having a 'computable representation.'
Currently, GenAI performs exceptionally well in fields like mathematics, physics, biology, and finance because these domains have established symbolic structures that are easily digitized. However, we risk overestimating AI when we expect it to replicate human faculties that lack a clear computable representation -- such as imagination, curiosity, or deep cultural intuition. Until we can mathematically represent those traits, AI will remain a powerful tool for processing data, rather than a source of true independent thought.
BN: What expectations/hopes (if any) did you have when beginning the experiment?
AD: Our primary expectation was to determine whether the 'logic of discovery' is something that can emerge from language modeling alone. We wanted to test whether GenAI possesses an implicit understanding of the process of scientific discovery -- observation, hypothesis, experimentation, and conclusion -- without being explicitly programmed to follow it. Moreover, we went into this with a bold question: can a machine possess 'scientific intuition'? We didn't just want to know if GenAI could solve a math problem; we wanted to see if it understood how to solve a mystery.
BN: How did you design the experiment to simulate a real scientific process, and why did you choose genetics?
AD: The Design: Our experimental design was centered on minimizing instructional bias. We know that GenAI generates answers by following chains of thought triggered by prompt keywords (analogy, proximity, reflection). Therefore, to test true discovery rather than just instruction following, we had to strip the prompts of concrete guidance.
We strictly avoided prompts like 'Design an experiment for Gene X,' as that provides the roadmap. Instead, we provided raw context and observed if the system could autonomously trigger the scientific loop: Hypothesis Generation → Experiment Design → Result Interpretation → Revision. We wanted to see if the process of science would emerge without being explicitly commanded.
Why Genetics: We chose genetics because it is the fundamental code of life. It represents a domain that is both highly structured (logic-based) yet incredibly complex (high-dimensional). If GenAI can navigate this complexity to make discoveries that improve human life, it validates the utility of the model in high-stakes science.
BN: Could you briefly explain what the AI was asked to do in the experiment, in as non-technical terms as possible?
AD: The AI essentially played the role of a scientist in a virtual lab. It was given a Nobel-level biological task and asked to investigate it. We examined whether it could propose hypotheses, design experiments to test those hypotheses, notice unusual phenomena in the experimental outcomes and revise unsupported hypotheses.
BN: What were the ‘modest discoveries’ that the AI made?
AD: We found that current AI can make only incremental discoveries.
BN: Did the AI show any signs of learning or improving throughout the experiment?
AD: No.
BN: What does ‘curiosity’ mean in a scientific context, and why can’t AI currently replicate it?
AD: In a scientific context, curiosity is not just the desire to know; it’s the instinct to chase an anomaly, to wonder and ask why, and to imagine possibilities outside established knowledge. Currently, AI cannot fully replicate this because it operates on extrinsic reward functions. An AI explores because we tell it that exploring will maximize its score (e.g., finding a better chemical structure). A human explores because the unknown itself creates a state of cognitive tension that must be resolved.
Until we can create a 'computable representation of interestingness' -- a mathematical way to value a question simply because it hasn't been asked yet -- AI will remain an optimizer of known goals, rather than a seeker of new ones.
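The distinction Professor Ding draws between extrinsic rewards and a "computable representation of interestingness" can be illustrated with a toy sketch. The code below is not from the study; it contrasts an extrinsic reward, which scores an agent only against a pre-defined goal, with a simple count-based novelty bonus of the kind used in curiosity-driven reinforcement learning. All function and variable names are illustrative assumptions.

```python
# Toy sketch (assumption, not from the study): extrinsic reward vs.
# a count-based novelty bonus as a crude proxy for "interestingness".
from collections import Counter

visit_counts = Counter()  # how often each state has been observed


def extrinsic_reward(state, goal):
    # Rewards only progress toward a known, pre-defined target.
    return 1.0 if state == goal else 0.0


def curiosity_bonus(state):
    # Count-based novelty: rarely seen states earn a larger bonus,
    # so an unexplored anomaly is valuable simply because it is new.
    visit_counts[state] += 1
    return 1.0 / visit_counts[state] ** 0.5


# A state seen for the first time is worth more than a familiar one.
first = curiosity_bonus("anomaly")   # 1.0
again = curiosity_bonus("anomaly")   # ~0.71
```

The gap the interview points to is that the novelty bonus is still a hand-written formula an engineer chose, not a drive the system generated for itself; the machine "explores" only because the score tells it to.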
BN: How do you see the relationship between human and machine creativity evolving in science? Collaborative or combative?
AD: I see the relationship as fundamentally collaborative, not combative, but with a clear hierarchy of function. We view the evolution of science as moving toward a 'Human-Orchestrated, Machine-Executed' model. In our framework, human creativity defines the search space -- we ask the questions and define the value of the answers. The machine's 'creativity' is the ability to traverse that space at speeds and dimensions we cannot match.
Think of it as a symbiotic loop: the human provides the 'high-level intent' (the prompt/direction) and the domain intuition, while GenAI acts as the 'high-velocity explorer,' returning with patterns we wouldn't have found alone. It is not a competition; it is the difference between the architect who designs the skyscraper and the engineers who calculate the load-bearing physics. You need both to build.
BN: Are you optimistic that AI could one day be curious in the same way humans are?
AD: I’m cautiously optimistic. Curiosity in humans arises from emotion, cognition, and lived experience, none of which AI currently possesses. But future models may develop mechanisms that mimic curiosity well enough to generate novel ideas. Whether that becomes true curiosity or simply a sophisticated imitation is a philosophical question, but from a practical perspective, even a partial imitation could meaningfully enhance scientific discovery.
