Blog: Automated Researcher and Beyond: The Evolution of Artificial Intelligence
You will soon be able to explain a research question to a machine and get an answer in return.
Welcome to STUDIO… a loading bar makes its way across the screen… the system scans your face… a voice with a slight East German accent says:
“Good morning, what would you like to do today?”
Since 2025, similar solutions have taken the research world by storm. The promise is dead simple: explain the research problem to your machine (a laptop or a smartphone, for example), and the machine will find a way to answer it.
You start by explaining that you’re interested in understanding factors related to mortality in advanced cervical cancer patients.
“Would you like me to start with a literature review?”
Hundreds of new papers are published on cervical cancer every month. Before the emergence of automated research systems, researchers had given up on comprehensive literature reviews for all but the most niche topics, where the number of newly published articles still allowed it. Papers that mention “Cervical Cancer” in their title total roughly 10,000 for the past 12 months, and that’s just a small fraction of the potentially relevant articles. Around the turn of the millennium, we had lost track of the collective wisdom contained within the body of work covering a given topic.
You decide to go with the literature review.
“Shall I provide comprehensive conclusions or will highlights do? Conclusions will move ETA from roughly 3 minutes to several days with your device. For a cloud computing budget of $300, we can reduce the ETA for conclusions to roughly one hour.”
You decide to start with highlights; that way you’ll have something to look at in just a few minutes.
“Roger that, shall we commence?”
You confirm, and the machine goes to work.
Automated Literature Review
The machine finds several million potentially relevant scientific articles. Some documents investigate and discuss the research topic directly. Others have a meaningful connection with the topic through citations, authors, and other factors. Some signals would not be meaningful for humans to interpret; they are latent propensities hidden deep in the way the content of the articles, as well as other aspects, forms subtle connections with other articles. Sometimes the strongest connections come from surprisingly different topics. Advanced machine intelligence is a balancing act between usefulness and surprise. The results take a few minutes to come in, even though everything is happening on a sub-$1,000 tablet. You could be on your smartphone, and it would not take any longer.
“The results are available on your device.”
You lift your device, and before you opens an easy-to-read interactive summary of the findings of the literature review. What had become infeasible for humans to do by the year 2000 can now be done by anyone in the time it takes to get a cup of coffee.
The Parametrization of Machine Intelligence
By the early years of the 2020s, researchers had entirely parametrized the fundamental aspects of computer-aided research. Very roughly speaking, a decision-making system based on machine intelligence involves collecting and preparing data; selecting, optimizing, and testing the model; and validating and governing the resulting solution. All the individual activities that made up the stages of using data to answer research questions had been meticulously mapped out and parametrized to a degree that allowed the full automation of all of those processes.
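To make the idea of parametrization a little more concrete, here is a minimal sketch of how the stages above could be written down as parameters and enumerated. The stage names and options in `PARAMETER_SPACE` are hypothetical and chosen only for illustration; they do not come from any particular library or from the systems described in this post.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical parameter space: each stage of the workflow is reduced to a
# small set of named options. These names are illustrative assumptions.
PARAMETER_SPACE = {
    "missing_values": ["drop_rows", "fill_mean"],
    "scaling": ["none", "standardize"],
    "model": ["logistic_regression", "gradient_boosting"],
    "validation": ["holdout", "k_fold"],
}


@dataclass(frozen=True)
class WorkflowConfig:
    """One fully specified workflow: a single choice for every stage."""
    missing_values: str
    scaling: str
    model: str
    validation: str


def enumerate_workflows(space):
    """Yield every combination of stage options as a WorkflowConfig."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield WorkflowConfig(**dict(zip(keys, values)))


workflows = list(enumerate_workflows(PARAMETER_SPACE))
print(len(workflows))  # 2 * 2 * 2 * 2 = 16 candidate workflows
```

Once every stage is a named parameter, "automating the workflow" becomes a search over these combinations. A real system would sample or prune the space rather than list it exhaustively, since the number of combinations grows multiplicatively with every added option.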
Whereas until recent years data scientists regularly spent half of their time on so-called ETL activities (extracting, transforming, and loading data), nobody did data preparation anymore. The machine would do it in ways inconceivable to people, going through a remarkable number of different variations before deciding what optimal input data would look like. The machine might also decide to use relevant public datasets in an effort to get the best possible result for the research problem.
Once all the involved processes had been clearly articulated in terms of the parameters they consist of (which was merely a question of time rather than feasibility, as some had thought), it became a straightforward engineering problem to automate them. There was nothing new to build; everything was available through established open-source packages. One merely needed to combine those functionalities in ways that created new, entirely automated workflows. Once the workflows were automated, several AI research groups understood that to better capitalize on the promise of machine intelligence, it was critically important to take a more balanced approach to system development. Whereas capability development, such as new neural network architectures, had received the great majority of attention from researchers and developers for several decades, interaction and governance had largely been ignored.
Back in 2020, interaction and governance still involved many critically important unsolved problems, such as reducing the cognitive overhead involved in using general machine intelligence APIs, as well as AI security and ethics. This shift in balance between the three pillars of capability, interaction, and governance set in motion the race to create the Automated Researcher.
Moving Beyond Generality
The big problem of AI, generality, had already been solved by solutions such as TensorFlow and PyTorch in the years leading up to 2020. In fact, generality had been solved by 2015; it was just that nobody clearly understood that it had happened. Platforms such as TensorFlow allowed a computer-savvy researcher to create solutions for virtually any research problem. On the coattails of the success of these platforms came the first machine intelligence Solution APIs, for example Keras and Fast.ai. Solution APIs radically simplified the way in which a researcher had to explain the solution they were looking for. Deep Learning Solution APIs reduced hundreds of lines of archaic computer code to a process resembling the way a child plays with Lego blocks.
The issue that Solution APIs did not resolve has to do with the way humans have communicated for hundreds of thousands of years. We’re much better at explaining a research problem in words than at describing the solution in computer code. It followed that the obvious progression involved moving to what was to become the most important paradigm shift in the way humans interact with information systems since the Mother of All Demos in 1968. That shift was the Problem API.
The Problem API
The big breakthrough in the field of machine intelligence didn’t come from new deep learning model architectures, or from anything else the field of research had focused on. Instead, it was a silent and painstakingly slow progression with a focus on the way humans interact with machines. The result was the first Problem API. Today, in 2030, there are still many Solution APIs, but researchers rarely use them directly. Instead, Solution APIs are the building blocks for Problem APIs in the same way differentiation engines were building blocks for the Solution APIs. The idea of the Problem API was very simple: for thousands of years, the research process had focused on asking questions (stating problems) and then, through a rigorous process, finding answers. In retrospect, it became obvious that the future of research would be to follow the same proven paradigm. Moreover, the ability to articulate problems in human language dramatically expanded the number of researchers, and other people, benefiting from developments in machine intelligence.
By 2020 more than a million researchers were using Solution APIs across virtually all imaginable fields of science and industry. After more than 50 years of neural network research and development, AI was finally taking off.
Still, the machine was left dependent on a human to articulate the solution in clear and precise terms, either through programming or through a drag-and-drop builder. The ability of the machine to solve a given problem depended heavily on a highly skilled human giving it precise instructions on how to do it. Instructing Solution APIs required an understanding of specialized language, statistical functions, and many other complex matters. Some of this complexity was carried over to the first Problem APIs; a moderate degree of technical savvy was still expected to take full advantage of the system.
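The difference between describing a solution and stating a problem can be sketched in a few lines. Everything below is a toy under stated assumptions: `solution_spec`, `ResearchPlan`, and `plan_from_question` are hypothetical names invented for this illustration, and the keyword matching stands in for language understanding that a real Problem API would need.

```python
from dataclasses import dataclass, field

# Solution-style input: the human spells out HOW to solve the problem.
# (Illustrative structure only; not the format of any real library.)
solution_spec = {
    "layers": [("dense", 64, "relu"), ("dense", 1, "sigmoid")],
    "loss": "binary_crossentropy",
    "optimizer": "adam",
    "epochs": 20,
}


@dataclass
class ResearchPlan:
    """What a Problem API might derive from a plain-language question."""
    question: str
    task: str
    steps: list = field(default_factory=list)


def plan_from_question(question: str) -> ResearchPlan:
    """Toy stand-in for a Problem API: map a question to a research plan.

    The keyword rules are purely illustrative assumptions; a real system
    would need far more than string matching.
    """
    text = question.lower()
    if "factors" in text or "related to" in text:
        task = "association_analysis"
    elif "predict" in text:
        task = "prediction"
    else:
        task = "literature_review"
    steps = ["collect_data", "prepare_data", "select_model", "validate"]
    return ResearchPlan(question=question, task=task, steps=steps)


# Problem-style input: the human only states WHAT they want to know.
plan = plan_from_question(
    "Which factors are related to mortality in advanced cervical cancer patients?"
)
print(plan.task, plan.steps)
```

In the first case the researcher has to know about layers, losses, and optimizers. In the second, those decisions move behind the interface, which is where the Solution APIs end up as building blocks.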
The Voice Revolution
Developments in voice comprehension and synthesis had recently made it possible to have meaningful interactions with machines without using anything but voice. Voice removed the need for complex controls that had perpetuated the divide between the highly skilled 0.1% and the rest of the world’s population.
The emergence of Solution APIs and their convergence with voice technology grew the number of people with access to state-of-the-art AI from several million in 2020 to several hundred million in 2030. An unprecedented era of scientific discovery, innovation, and human creativity had started.
Automated Researcher and Beyond
In 2030, there is still some way to go towards a truly automated researcher. Human researchers believe, with everything we know today, that it should not take more than five to ten years before the early 21st-century role of the data scientist is made redundant by systems based on machine intelligence. The future roadmap now also includes a pseudo-autonomous version of the automated researcher.
The automated researcher is a precursor to pseudo-autonomous research systems that are not only able to test a hypothesis, but to take the results of the performed research, and formulate a new hypothesis based on the findings. This means that the system is able to explore research topics horizontally and vertically without end. While the premise of an autonomous system is simple — the output of the system needs to be an acceptable input for the same system — much work is still required before autonomous research systems will play an important role in answering the most pressing questions of humanity.
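The premise that the output of the system must be an acceptable input for the same system can be shown with a small closed loop. This is only a sketch: `Hypothesis`, `Finding`, `evaluate`, and `formulate` are hypothetical names, the example hypothesis is made up, and the "research" step is a placeholder rule rather than any real analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    statement: str
    depth: int = 0  # how many rounds of follow-up led to this hypothesis


@dataclass(frozen=True)
class Finding:
    hypothesis: Hypothesis
    supported: bool


def evaluate(hypothesis: Hypothesis) -> Finding:
    """Placeholder for hypothesis testing.

    Assumption for illustration only: hypotheses at even depth are treated
    as supported. A real system would run an actual study here.
    """
    return Finding(hypothesis, supported=hypothesis.depth % 2 == 0)


def formulate(finding: Finding) -> Hypothesis:
    """Turn a finding back into a new hypothesis, closing the loop."""
    direction = "narrower" if finding.supported else "alternative"
    return Hypothesis(
        statement=f"{direction} follow-up to: {finding.hypothesis.statement}",
        depth=finding.hypothesis.depth + 1,
    )


def research_loop(seed: Hypothesis, max_rounds: int) -> list:
    """Alternate testing and formulation for a fixed number of rounds."""
    findings = []
    current = seed
    for _ in range(max_rounds):
        finding = evaluate(current)
        findings.append(finding)
        current = formulate(finding)  # the output becomes the next input
    return findings


history = research_loop(
    Hypothesis("Stage at diagnosis is related to mortality"), max_rounds=3
)
for item in history:
    print(item.hypothesis.depth, item.supported, item.hypothesis.statement)
```

The types carry the point: `evaluate` maps a `Hypothesis` to a `Finding`, and `formulate` maps a `Finding` back to a `Hypothesis`, so the loop can run for as long as it is allowed to. The `max_rounds` limit is the only thing stopping it here, which hints at why governance matters for such systems.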
Eventually, autonomous research systems will be able to discover completely new fields of science. Nobody is able to say precisely how long it will take before development reaches that point. Once the processes related to hypothesis testing become mature, work on hypothesis formulation will no doubt accelerate in exactly the same way we saw with the first Solution APIs followed by Problem APIs. At least until 2032, our research will remain focused on the ability of the system to formulate meaningful qualitative insights from a literature review or quantitative data analysis.
Once the machine is able to formulate qualitative insights, expanding the capability to forming new questions based on available answers, including questions that have never been asked before, will be merely a matter of time.