Can large language models (LLMs) actually produce novel, expert-level research ideas? In the grand tapestry of progress, we are witnessing LLMs like OpenAI's o1 demonstrate remarkable capabilities in knowledge and reasoning: solving challenging mathematical problems, assisting scientists in writing proofs, retrieving related work, generating code, and discovering patterns. These feats hint at a future where AI doesn't just follow human instructions but contributes creatively to human endeavours.
The Promise and the Question
A growing number of researchers propose autonomous agents that can generate and validate new ideas independently.
- CrewAI is a framework designed to build autonomous multi-agent systems that can generate and validate ideas without human intervention.
- SciMON is a framework that utilises large language models (LLMs) to generate innovative research questions.

Initial explorations suggest that LLMs can produce ideas judged as novel, sometimes even more so than those generated by humans. However, they may lag slightly in feasibility, indicating that while agent frameworks can think outside the box, they might not always account for practical constraints.
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
**Key Findings from the Study:** The study found that while LLMs can generate ideas that are perceived as more novel, they may lack practical feasibility compared to human-generated ideas. Assessing novelty is inherently difficult, even for experts. The study highlights the complexity of evaluating AI-generated ideas and suggests that further research is needed to refine these methods.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Sakana.ai aims to develop AI systems capable of autonomously generating and testing scientific hypotheses. Sakana.ai envisions AI not just as a passive tool but as an active participant in the scientific process. By leveraging advanced algorithms and vast datasets, their AI Scientist is designed to explore uncharted territories, formulating theories and conducting experiments without human intervention.
The Debate: Evolution of Roles in the Age of AI
In both cases, LLMs exceed human expectations for novelty, brainstorming potential research ideas beyond existing paradigms. The first study, a large-scale human evaluation with over 100 NLP researchers, finds that while LLMs can generate more novel ideas than humans, they lack the depth of feasibility required for real-world applications. They are creative, even novel, but often disconnected from the pragmatic constraints of engineering or science. Machines can dream big, but that's the easy part. Turning those dreams into something operational remains elusive.
Sakana.ai attempts to close this gap by pushing LLMs to not just ideate but execute the research process in full. The AI Scientist automates everything, from idea generation to coding, experimentation, and even peer review. It performs entire research loops at shockingly low costs, generating a scientific paper for as little as $15. In a week, it can output dozens of research papers across machine learning domains like diffusion modelling and transformers.
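To make the shape of such a loop concrete, here is a minimal, illustrative sketch in Python of an ideate → code → experiment → review pipeline. Every stage function is a hypothetical stand-in (in the real AI Scientist, each stage would involve LLM calls and actual experiment execution); this is not Sakana.ai's implementation, just the control flow the paragraph above describes.

```python
from dataclasses import dataclass

@dataclass
class ResearchArtifact:
    """Everything one pass of the loop produces."""
    idea: str
    code: str = ""
    results: str = ""
    review_score: float = 0.0
    accepted: bool = False

# --- Hypothetical stand-ins for LLM-driven stages ---

def generate_idea(topic: str) -> str:
    # In a real system: an LLM ideation prompt over the topic.
    return f"Study the effect of {topic} on model convergence"

def write_code(idea: str) -> str:
    # In a real system: LLM-generated experiment code.
    return f"# experiment implementing: {idea}"

def run_experiment(code: str) -> str:
    # In a real system: execute the generated code and collect metrics.
    return "metric improved by 1.2%"

def peer_review(artifact: ResearchArtifact) -> float:
    # In a real system: an automated LLM reviewer scoring the write-up.
    return 6.5 if artifact.results else 0.0

def research_loop(topic: str, accept_threshold: float = 6.0) -> ResearchArtifact:
    """One full pass: ideate, implement, experiment, review."""
    artifact = ResearchArtifact(idea=generate_idea(topic))
    artifact.code = write_code(artifact.idea)
    artifact.results = run_experiment(artifact.code)
    artifact.review_score = peer_review(artifact)
    artifact.accepted = artifact.review_score >= accept_threshold
    return artifact

paper = research_loop("learning-rate warmup")
print(paper.accepted)  # True with these stubbed stages
```

The point of the sketch is the threshold gate at the end: automating the loop is easy once each stage is stubbed out; the hard, unsolved part is making `run_experiment` and `peer_review` reliable enough that the gate means something.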
While the AI Scientist may churn out papers that exceed the acceptance threshold of top machine learning conferences, it’s still bound by the limits of its training data and coding errors. It sometimes hallucinates results or misinterprets the scope of its own experiments, generating flawed, albeit intriguing, conclusions.
What do these two studies tell us? LLMs are creative catalysts in the scientific process, capable of innovation beyond the human bottleneck. But they still rely heavily on human correction to transform raw creativity into applied knowledge. LLMs, like all breakthrough technologies, operate within a delicate feedback loop. The more they iterate, the more they improve—but supervision, for now, remains key.
The cost of ideation, experimentation, and publication is dropping rapidly. LLMs like The AI Scientist could make research a volume game, where quantity drives incremental advances faster than traditional methods could. At scale, this will push science forward at speeds unimaginable even a decade ago and may contribute to democratising research.

The application space is wide. **Machine-driven discovery could reshape fields from biotech to climate science, as automated research enters new domains.** However, AI scientists will need guardrails. Ethical considerations loom large when machines autonomously generate research without full comprehension of their outcomes. Applications in synthetic biology, for example, could risk safety without stringent oversight. I think these considerations are obvious.

In essence, we're on the cusp of a new era where machines take the role of junior scientists, generating ideas, executing experiments, and even drafting conclusions. They won't replace researchers, but they will force us to rethink how we contribute.