When Anthropic’s blog post went online, Ferdous Nasri had no idea how strong the response would be.
What followed was a flood of emails. Messages from the community. Collaboration requests. Feedback from people saying: “Thank you for building a solution.” That was when she realized just how many researchers had been waiting for a tool like this.
Ferdous Nasri is a doctoral researcher at the Hasso Plattner Institute (HPI), where her work brings together bioinformatics, AI, and global health. In her current project, she and her team ask:
What happens when AI agents are asked to query biological data, but the data infrastructure was never built for AI?
Ferdous explains it with an image:
Biological databases are often like old European cities: winding, complex, full of small streets. They work well for experts who know which paths give them better access and what not to trust blindly.
AI agents enter this city more like high-powered sports cars. The old streets were not designed for them. So they take shortcuts and produce answers that may sound convincing, but are not necessarily reliable.
In bioinformatics, labs worldwide upload viral genetic data to databases such as those maintained by the NCBI. Researchers use these data to track outbreaks and variants. Ferdous and her team investigated how well AI agents can retrieve information from these databases.
The team developed VirBench: 120 queries across 40 pathogens. They tested AI agents on their ability to find relevant datasets. The agents' results were not reliably accurate. That matters because inaccurate outputs can lead to flawed conclusions.
For their analyses, the team needed powerful models and a large compute budget. Anthropic and OpenAI provided tokens. Ferdous initiated and coordinated the transatlantic collaboration with the Broad Institute of MIT and Harvard and the NCBI.
Anthropic later published a blog post about the work because the team had built a solution: gget virus.
Returning to the image of the old European city, the tool is like a structured road built underneath it. A clear route that allows AI agents and humans to retrieve virus data reliably. And when something does not work, the tool makes that visible.
For Ferdous, data work is not tedious. It is the essential foundation. Without reliable data, even the best model cannot produce a reliable answer. While many are talking about larger models, better agents, and smarter prompts, she argues that data quality should be the first priority:
We cannot simply prompt our way out of this data bottleneck.
The paper is on its way to a journal, and Ferdous is gathering feedback at conferences. Several labs have reached out and some are already using the tool.
Grateful to have led this alongside Laura Luebbert, Sarah Gurev, Krithik Ramesh, Patrick Varilly, Nuala Oleary, Jonah Cool, Pardis Sabeti, and Bernhard Renard.