
Interview: AI Not Yet a Miracle Worker, But a Powerful Scientific Prompt Responder


Sakthi Prasad T, Content Director   |   7 mins

Tech giant Google, in collaboration with Yale University, has released an AI model known as Cell2Sentence-Scale 27B (C2S-Scale), a new 27-billion-parameter foundation model designed to understand the language of individual cells. Built on Google’s Gemma family of open models, C2S-Scale represents a new frontier in single-cell analysis.


Against this backdrop, as AI fever -- along with its fair share of disillusionment -- grips Biopharma and other science-focused industries, we spoke with Paul Denny-Gouldson, Zifo's Chief Scientific Officer, to make sense of what Google’s announcement means for AI in science and for scientists.

Google announced this partnership with Yale, and their AI model has generated a cancer hypothesis. This, of course, is the initial stage of research, and the company says they achieved in vitro success, which then requires testing and confirmation in vivo. Firstly, what are your thoughts on this development? Secondly, what does it truly mean for an AI model to generate a scientific hypothesis, and why is that significant? I am asking this because I want to know: is this a 'Sputnik moment' for AI, or is that an exaggeration?

Firstly, I would like to congratulate the Google and Yale team for achieving this impressive breakthrough. The first thing we have to consider is what this thing that's been created actually is: it's a model of a single cell that can answer questions a scientist asks. For instance, they might ask, "Can you find me this?" or "Can you tell me about this?"

The model's power lies in the vast volume of different, multimodal data types used to build it. This essentially creates a statistical model of how the single cell will behave. The data types include imaging, biochemical, and phenotypic data, all merged with open-source information on pathways, genes, and proteins.

This specific development is a single-cell model for cancer, making it disease-specific. The paper discusses the concept of developing similar models for other diseases and cell types. By linking different data types, the model offers predictions. However, we must be careful about calling it an "AI" in the sense of an independent creator; it is a model that is only as good as the data used to create it.

The authors themselves discuss extending the model with more data types to increase the fidelity of the information and the model's ability to offer alternatives or predict things. But it always comes down to the question you ask of the model. The model itself won't just create something out of nothing regarding potential treatments or outcomes for the disease in question. It responds to a prompt like, "I am investigating this; can you find me an alternative or something that works with this?"

That is why it's so important to really distinguish this from the de novo creation of a hypothesis by the model or the AI on its own.

Back to the question -- is this a Sputnik moment -- maybe, maybe not. I am not sure. My healthy scepticism has me wondering whether this has been done before, but behind “closed doors” -- so in that way it may not be the “first”. There is no doubt, however, that it will fuel research into this space as it shows what's possible -- and being open-source means others can build on it, which is the best way to accelerate development. So, in this way, it is just like Sputnik -- it will incentivise and challenge others to build on it and do better, like getting a man on the moon, or in this case, even Mars.

Okay, this is fantastic. They built a model, layered a “specialised LLM” -- trained on a multimodal corpus of over 50 million cells and associated text -- on top of it, and the scientists used a highly structured, computational query to interact with the model. Hence, this isn’t a classic LLM query like “Hey C2S, what drug combination works for cold tumors?”

Yes, that’s correct. Now, let's be clear: this is a very, very large model. While I don't know if private organizations have produced bigger ones, the sheer number of parameters they used -- in the billions -- means it can accommodate a vast variety of different questions. This is because all cell behavior is interlinked, and the model's main function is to map all those links.

The power of having such a large model to describe the overall behavior of an individual cell -- remember, this is a single-cell model -- is its ability to find relationships that a human cannot see or even contemplate. We can only actively manipulate perhaps two or three parameters in our minds at once, whereas this model has billions.

That is the true power I see in this methodology: building a model using the highest quality and largest volume of data possible. This allows the model to be used effectively to test or generate hypotheses.
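To make the idea of a structured query concrete, here is a minimal, purely illustrative Python sketch of how a single cell's expression profile might be reduced to a gene-rank "cell sentence" and wrapped in a task-specific prompt. The function names, gene list, and prompt wording are hypothetical and are not taken from the C2S-Scale paper.

```python
# Illustrative only: a toy "cell sentence" prompt builder.
# Gene names, values, and prompt wording are hypothetical,
# not drawn from the C2S-Scale paper.

def cell_sentence(expression: dict[str, float], top_n: int = 5) -> str:
    """Rank genes by expression and write them out as a space-separated 'sentence'."""
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    return " ".join(gene for gene, _ in ranked[:top_n])

def build_query(expression: dict[str, float], condition: str) -> str:
    """Embed the cell sentence in a structured, task-specific query
    rather than a free-form chat question."""
    return (
        "TASK: predict perturbation response\n"
        f"CONDITION: {condition}\n"
        f"CELL_SENTENCE: {cell_sentence(expression)}"
    )

if __name__ == "__main__":
    toy_cell = {"GAPDH": 9.1, "CD8A": 7.4, "B2M": 6.8, "MKI67": 2.2, "PDCD1": 1.9}
    print(build_query(toy_cell, "interferon-low tumor context"))
```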

Now that the model has come up with the hypothesis, the scientists will still have to test it as the next step, through experiments.

Yes, now that the model has come up with the hypothesis, the normal process goes on: the classic design, make, test, analyze (DMTA) cycle. What the Google-Yale model offers is the ability to generate ideas for testable hypotheses in the lab, all derived from a vast pool of quality data.

This brings us back to the issue of good data: if the data used to create the model is not trustworthy or high-quality, the model will inherently be flawed. Regardless, the scientist, organization, or lab environment still needs to proceed with testing that hypothesis.

The fascinating aspect here lies in the cost and subsequent cycle. The investment to build and train this model is significant -- likely multiple millions, if not tens of millions of dollars, factoring in compute power, personnel, and data curation. This endeavor is a real showcase of what is possible with major investment from partners like Google and Yale.

Now that the model exists, as we go through the design, make, test, analyze cycle in the lab -- the "wet work" -- and generate new data, that data can be used to augment and retrain the model. If an experimental result confirms the model's prediction, the model's confidence in that relationship is enhanced. If the experimental result “fails”, the significance of that perceived relationship is lowered.

This entire cycle must continue: you continually add new information based on the hypotheses the model generated. This process is called reinforcement or checking of the model. As you conduct more testing, the model becomes progressively better, because you are exploring, extending and validating the entire “cell process space” that exists within the model.
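As a rough illustration of the reinforcement loop described above, the sketch below keeps a confidence score for each predicted relationship and nudges it up or down as wet-lab results confirm or contradict the prediction. The update rule and numbers are assumptions made for illustration, not the Google-Yale retraining procedure.

```python
# Toy feedback loop: confidence in a predicted relationship rises with
# confirming experiments and falls with failed ones.
# The update rule and values are illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class Relationship:
    source: str
    target: str
    confidence: float  # 0.0-1.0, current belief in the link

def update(rel: Relationship, confirmed: bool, rate: float = 0.2) -> Relationship:
    """Move confidence toward 1.0 on confirmation, toward 0.0 on failure."""
    goal = 1.0 if confirmed else 0.0
    rel.confidence += rate * (goal - rel.confidence)
    return rel

link = Relationship("drug X + low-dose interferon", "increased antigen presentation", 0.6)
update(link, confirmed=True)   # an in vitro result supports the prediction
update(link, confirmed=False)  # a follow-up assay fails to reproduce it
print(f"{link.source} -> {link.target}: confidence {link.confidence:.2f}")
```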

Okay, so how was this hypothesis validated? I am asking because we are so used to the consumerization of LLM models. People can ask ChatGPT or any other LLM any question, and it throws out an answer, but we don't know if it's true or if it's just hallucinating, and so on. From the paper, were you able to deduce any guardrails that Google and Yale put in place to ensure that the hypothesis generated by the model is actually validated, and not just a hallucination? How do they ensure that?

There are different ways to reduce the risk of hallucination. One is ensuring the data used is of good quality. This alone is a labor-intensive problem, requiring knowledgeable people to validate the data. However, a number of automated and other tools can be used to check the data for consistency.

As for the guardrails, this is where the concept of a data foundation comes in. Beyond the raw data, the critical piece is the ontology used to govern the relationship between all the entities. You can start with a gene, a pathway, a hormone, or a metabolite and traverse the relationships between all those entities.

The ontology is critical; they used and curated a number of external ontologies to ensure they had a good definition of all the relationships and terminology. This ontology can then be used to check the data. It asks: "Does this new data fit the ontology, or does it suggest claims or relationships that are either new or contradict existing information?" If a contradiction arises, scientists must go in and examine the data to see if it was tagged incorrectly or produced inconsistently.

The large language model (LLM) itself also needs guardrails to prevent it from making incorrect assumptions or stating that two things are related when they are not. The ontology helps serve as the quality checker or QC element to support the LLM. The final piece, of course, is that you cannot trust anything without experimental testing and checking.
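A minimal sketch of the kind of ontology-based guardrail described here: curated relationships form a small reference set, and each new claim is flagged as consistent, contradictory, or novel before it is allowed to feed the model. The entities and relations below are invented for illustration.

```python
# Toy ontology check: compare an incoming claim against curated relationships
# and flag it for review if it is novel or contradictory.
# Entities and relations are invented for illustration.

KNOWN = {
    ("GENE_A", "activates", "PATHWAY_P"),
    ("PATHWAY_P", "upregulates", "PROTEIN_Q"),
}

OPPOSITE = {
    "activates": "inhibits", "inhibits": "activates",
    "upregulates": "downregulates", "downregulates": "upregulates",
}

def check_claim(subject: str, relation: str, obj: str) -> str:
    if (subject, relation, obj) in KNOWN:
        return "consistent with the ontology"
    if (subject, OPPOSITE.get(relation, ""), obj) in KNOWN:
        return "contradiction: send to a scientist for review"
    return "novel: needs experimental follow-up before it enters the model"

print(check_claim("GENE_A", "activates", "PATHWAY_P"))  # consistent
print(check_claim("GENE_A", "inhibits", "PATHWAY_P"))   # contradiction
print(check_claim("GENE_B", "activates", "PATHWAY_P"))  # novel
```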

What excites you about this AI model?

The exciting part here is using this concept of a model to empower scientists to explore hypotheses. The next step would be to have an agent that can take that hypothesis and turn it into an experimental design -- for example, "I can help you design an experiment to test the relationship between this gene-pathway-protein-disease”. This could then be executed by another agent that controls an automated robotics laboratory -- and a third agent could pull all that data and analysis together for the scientists, before pushing it back into the model after the scientist intervention.

This creates a virtuous cycle of support. As you add more data from successful or failed experimental studies, the model is continually refined through reinforcement. If something works, it enhances the trust in that relationship. If it doesn't, it lowers the significance of that potential relationship. The more testing conducted, the better the model becomes, because you are exploring and validating the entire experimental space.

Playing this forward, the ultimate goal is to have a whole set of models that mimic a human, starting at the cell level, scaling up to an organ, linked organs, and eventually the entire system -- a human -- the “moon shot” perhaps. Previous attempts at this were small-scale, and while useful, they are limited. The power of this new wave of large compute is that it enables us to build much better, larger-scale models.
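The agent hand-off described earlier in this answer -- hypothesis, experiment design, automated lab, analysis, then back to the model -- could be wired together along the lines of the sketch below. Every function is a hypothetical placeholder; no existing lab system or model API is implied.

```python
# Hypothetical agent hand-off for the hypothesis -> design -> run -> analyse loop.
# All functions are placeholders; no real robotics or model API is implied.

def design_experiment(hypothesis: str) -> dict:
    """Design agent: turn a hypothesis into an executable protocol."""
    return {"hypothesis": hypothesis, "assay": "in vitro screen", "replicates": 3}

def run_on_robotics(protocol: dict) -> dict:
    """Lab agent: submit the protocol to an automated lab and collect raw readouts."""
    return {"protocol": protocol, "readout": [0.42, 0.47, 0.44]}  # stubbed data

def analyse(results: dict) -> dict:
    """Analysis agent: summarise the run for the scientist to review."""
    readout = results["readout"]
    return {"mean_signal": sum(readout) / len(readout), "supports_hypothesis": True}

hypothesis = "drug X with low-dose interferon increases antigen presentation"
summary = analyse(run_on_robotics(design_experiment(hypothesis)))
print(summary)  # a scientist reviews this before anything is pushed back into the model
```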

Okay, moving on, Paul. As a scientist yourself, how important do you think scientists are in the current day, now that these AI models are slowly emerging, and what makes their job so important?

I think it's really interesting talking to fellow scientists, as I do most weeks. Many of them are a little distrustful of AI generally, mainly because of the hype and the outlandish claims some people have made about what it can do. I think it’s all about being very thoughtful and employing a healthy scepticism -- and that’s exactly what I see with my science colleagues and believe myself.

There are some scientists who are "all in" and pursuing it aggressively, and others who are completely against it. But the biggest group is that middle band utilizing healthy scepticism and asking, "What’s in it for me? How will this help me do my science?"

You have to imagine there are so many different types of scientists and problems that scientifically driven organizations are trying to solve. The example with Google and Yale -- a single-cell cancer model -- is perhaps only applicable to 0.1% of the scientific community. This is where the hype has taken over a little: the declaration that "we have done this, and it’s amazing" isn't relevant if you talk to chemists, food scientists, or people who don't work in cancer or cells. These other scientists are saying, "I need something that will help me."

I think having healthy scepticism is very good. The job now for all organizations is to determine what kind of support will work for each specific scientist or scientific group. Will it be a “model” like this, or a different concept of how machine learning or an LLM-based search interface can interrogate information?

Fundamentally, there is a whole host of different scientific problems to solve. Whether it is physics, chemistry, biology, or material science, all have their own unique issues. The key is to put the scientist at the center and determine how we can help them do their job: help them ask questions they perhaps didn't think of and see patterns they can't see. This is the ultimate power of LLM interrogation of big models: surfacing patterns, links, and relationships that are impossible for a human being to perceive.

The hypothesis that Google generated for a single-cell cancer model -- does this qualify as a discovery? And with AI entering the process, does that generation itself qualify as a discovery?

Personally, I don't believe it qualifies as discovery. Discovery is the finding of proof of something -- being able to definitively state, "We now know this behaves this way and this is a new discovery."

What Google’s single-cell model has done is create an environment for people to ask questions they perhaps couldn't have asked and to generate hypotheses, or "multipaths," that then need to be tested. Until that relationship is proven in the lab -- both in vitro, which they have successfully done, and then in vivo, which they are currently working on -- it does not qualify as a discovery for me.

The model didn't just tell the scientists there's a relationship "out of the blue." They had to apply their scientific knowledge of disease biology and pathways to formulate the initial queries -- as they describe in the paper -- to build up the investigation. Only then is the model able to suggest a potential link and the treatment regime for the specific problem.

It's an interesting journey through the semantics of what constitutes discovery. I believe these models, at this stage, will support scientists in exploring areas and relationships they had never seen before -- but the scientist still must prompt and interrogate the model.

All of the scientific AI we hear about at the moment is, so to speak, primarily focused on the discovery side -- specifically drug discovery -- which is arguably the most glamorous part. However, discovery alone is not all of science, right? There is a whole host of processes involved in bringing a drug to the market, and currently, all AI efforts are concentrated on discovery.

It's becoming a bit like a fairy tale where some people claim, "Oh, I just need to ask GPT to find the next prototype for paracetamol, or whatever, and it gives me an answer." Do you think there is too much focus on the discovery side because it's obviously glamorous -- the idea of just finding the next magical drug?

The chatter suggests that drug discovery might move from wet labs to computation. I know these are very far-fetched, futuristic pronouncements, but we shouldn't dismiss them either, as we don't know how technology will evolve.

As a scientist for a long time, Paul, when you hear such pronouncements, what comes to your mind? What do you think is going to happen?

You are absolutely right; the discovery topic is the one that grabs the headlines. However, behind the scenes, at all the scientifically driven organizations I talk to in Pharma-Biotech, FMCG, Speciality Chemicals, Food & Beverage, Agrotech etc., their understanding of AI is evolving very rapidly. They are looking at how AI can impact all parts of the lifecycle: not just discovery, but also research, development, manufacturing, and product trials.

They are exploring concepts like AI augmentation, decision support, pattern recognition, and workflow support to enable things they couldn't do before. Crucially, the time to value is often faster the further up the chain we go toward development and manufacturing.

This is because there are many problems in those stages that are manual and stepwise: creating documentation, checking data quality, and so on. These tasks are perfectly suited to well-trained models and agentic AI concepts. Furthermore, the amount of money you need to spend to address these issues is much lower. The problems are more discrete and defined -- you can precisely put a box around them. For example, if the goal is to create an Investigational New Drug (IND) application, you can work backward, mapping all the required data sources, variables, and parameters. This process can then be monitored and augmented with AI.
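As an illustration of the "work backward from the deliverable" idea, the sketch below maps a few hypothetical submission sections to the data sources they require and flags whatever is still missing. The section names and sources are simplified placeholders, not an actual IND checklist.

```python
# Toy completeness check for assembling a regulatory submission.
# Section names and required sources are simplified placeholders,
# not an actual IND checklist.

REQUIRED = {
    "pharmacology_summary": {"in_vitro_assays", "animal_pk"},
    "cmc_section": {"batch_records", "stability_data"},
    "clinical_protocol": {"dosing_plan", "safety_monitoring_plan"},
}

def missing_items(available: set[str]) -> dict[str, set[str]]:
    """Return, per section, the data sources still missing."""
    return {section: needed - available
            for section, needed in REQUIRED.items()
            if needed - available}

on_hand = {"in_vitro_assays", "animal_pk", "batch_records", "dosing_plan"}
for section, gaps in missing_items(on_hand).items():
    print(f"{section}: missing {sorted(gaps)}")
```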

In contrast, the discovery problems are massive. Look at the investment required to get a single-cell model to partially describe one disease type: multiple millions to tens of millions of dollars, plus the cost of aggregating data from many different places. These discovery problems are so hard, yet they dominate the headlines.

We have seen many startups in this drug discovery space that require huge amounts of capital and haven't yet succeeded, despite spending heavily on aggregating old data and generating new data.

I feel we are definitely heading toward, or are already in, the trough of disillusionment in this space. The actual wins we are seeing are in more process-oriented, stepwise concepts for AI, and that success will build trust.

But who knows? We could eventually see multiple massive models -- a "model of the heart," a "model of the lungs" -- each described at this huge scale. Models exist now that describe these organs, but they aren't built on these massive datasets. The interesting future question will be to compare the massive models with the smaller ones and determine how much better they are versus how much more money you have to spend to build them.

And, personally, what do you think is the importance of a scientist in this age of AI?

I think we are heading toward a state where these AI models will adopt the co-pilot concept -- an assistant that is simply a part of our life as scientists. How fast we get there, I don't know, but we are already seeing it. Product companies are implementing it, and organizations are building their own support tools.

This raises many questions: How do I know that a product-embedded AI is compatible with my own corporate AI strategy and models, or any orchestration layer I am using? Can I leverage its functionality, but turn it off if I want to use my own AI models in the scientist workflow?

This is because you have to trust the AI. You must trust the model, how it's been developed, the data it was created on, and its output. Without that trust, it's like a game of Russian roulette; you don't know if you'll get the right answer. This brings us back to: How do we avoid or pick up hallucinations and ensure the AI is giving a quality, proper result or inference?

Then we enter the realm of governance -- AI governance, model governance, and the concepts of AIOps and GenAIOps, similar to DevOps in many ways. I believe this will become highly connected to the quality groups within organizations, as they are responsible for the quality of the final product. If we are using tools that directly impact that quality, there will be a much bigger emphasis on governance.

Organizations are not fully there yet; some are just beginning to build this concept. It all comes back to: How do I build trust? The answer is: I need really good, trustworthy data. This is where the scientific data foundations become essential again -- they must be in place to deliver reliable data for the models to consume and provide quality inference or hypothesis support.

What does it take to be a scientist in the age of AI? I think it's all down to being a scientist -- healthy scepticism with an equal amount of curiosity and a willingness to learn and try new things. Apply the DMTA cycle to AI in the same way we typically apply it to our chosen area of science. Embracing the technology will help us find new ways to work that let us explore innovative approaches, speed up existing processes, increase precision and accuracy, or enable more capacity. All of these will add value, helping us be better scientists, thanks in part to our scientific AI assistants.