BERIL: Integrated AI Infrastructure for Genome-to-Phenotype Research Using the BER Data Lakehouse

BERIL is a new project (started in late 2025) funded by the Department of Energy’s Biological and Environmental Research (BER) program. The BER Data Lakehouse seeks to provide consistently labeled, harmonized datasets designed for AI-driven reasoning and inference across multiple heterogeneous data sources. The BER Integrative Layer (BERIL) will develop an extensible, self-updating AI ecosystem that uses the BER Data Lakehouse to orchestrate specialized reasoning agents and accelerate genome-to-phenotype discovery.

BERIL’s unified agentic AI infrastructure consists of a Central Orchestration Agent (the Generalist Agent) working with multiple Specialized Agents. The Generalist Agent coordinates overall activity and user interaction, and facilitates literature searches, tool execution, and Data Lakehouse queries. It also manages "reasoner traces": detailed records of reasoning processes that feed rationale-driven refinement loops. Feedback is therefore dual-layer, combining curiosity-driven interactions (captured as reasoner traces) with formal benchmarking. The agent coordination framework is likewise hybrid, pairing curiosity-driven exploration with structured, modular extensions via the Model Context Protocol (MCP).
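
To make the coordination pattern concrete, the following is a minimal sketch in Python of a generalist agent that routes tasks to specialized agents and records a reasoner trace for each step. It is an illustration under stated assumptions, not BERIL's implementation: the names `GeneralistAgent`, `TraceStep`, `literature_agent`, and `lakehouse_agent` are hypothetical, the specialized agents are stand-in functions, and the sketch does not use MCP or any real Data Lakehouse interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TraceStep:
    """One reasoner-trace entry: which agent acted, on what task, and why."""
    agent: str
    task: str
    rationale: str
    result: str


@dataclass
class GeneralistAgent:
    """Hypothetical orchestrator: dispatches tasks and logs a trace per step."""
    specialists: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    trace: List[TraceStep] = field(default_factory=list)

    def register(self, kind: str, handler: Callable[[str], str]) -> None:
        # Adding a specialized agent is a registration, so the set is extensible.
        self.specialists[kind] = handler

    def run(self, kind: str, task: str, rationale: str) -> str:
        if kind not in self.specialists:
            raise KeyError(f"no specialized agent registered for {kind!r}")
        result = self.specialists[kind](task)
        # Record the rationale alongside the result for later refinement.
        self.trace.append(TraceStep(kind, task, rationale, result))
        return result


# Stand-in specialized agents (assumptions); real ones would perform
# literature searches, tool execution, or Data Lakehouse queries.
def literature_agent(task: str) -> str:
    return f"literature results for: {task}"


def lakehouse_agent(task: str) -> str:
    return f"lakehouse rows for: {task}"


orchestrator = GeneralistAgent()
orchestrator.register("literature", literature_agent)
orchestrator.register("lakehouse", lakehouse_agent)

orchestrator.run("literature", "gene X function", "gather prior evidence first")
orchestrator.run("lakehouse", "gene X phenotypes", "check evidence against harmonized data")

for step in orchestrator.trace:
    print(f"[{step.agent}] {step.task} -> {step.result} (why: {step.rationale})")
```

Keeping the rationale next to each result is what would let a refinement loop, or a formal benchmark, inspect not only what an agent returned but why it was asked.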

As a pilot demonstration of BERIL, we will select genes, pathways, and microbial strain targets that support DOE BER biomanufacturing and bioproduct development objectives.
