Genome-scale metabolic models (GSMs) underpin pathway and strain engineering by enabling systematic interventions in cell metabolism for pathway design and host optimization in bioproduction. They support constraint-based analyses such as flux balance analysis, which inform pathway reconstruction, knockout analysis, and other engineering strategies that guide experimental design aimed at improving yields of target chemicals.
Despite their central role, practical challenges remain: constructing an efficient and robust implementation workflow is technically demanding, often requiring specialized expertise, rigorous feasibility checks, and integrated simulation toolchains. Large language models (LLMs) have emerged as powerful assistants for scientific work, offering natural-language interfaces that can explain concepts, parse files, and generate code and documentation. These capabilities could lower the barrier to GSM interpretation and analysis workflow setup, accelerating hypothesis generation and improving accessibility for non-experts. Yet there is limited evidence regarding LLMs’ domain knowledge for interpreting GSMs and implementing the associated analysis tasks.
In this work, we present a comprehensive evaluation of LLM capabilities for understanding and analyzing GSMs in metabolic engineering. We systematically evaluated five main areas: (i) domain knowledge, (ii) metabolic flux prediction, (iii) model reconstruction for pathways, (iv) up- or down-regulation analysis for pathway optimization, and (v) knockout analysis. We benchmarked four prominent LLMs (GPT, Gemini, Claude, and DeepSeek-R1) and assessed their outputs using standardized rubric-based scoring metrics, with independent evaluations performed by all four models. We identified recurrent failure modes and task- and model-specific limitations, and articulated best practices for deploying LLMs within GSM workflows and integrating them into knowledge-extraction and analysis-implementation pipelines in metabolic engineering. This work establishes an evidence-based baseline for LLM-enabled GSM analysis and informs the development of more reliable, accessible, and automation-ready computational workflows for pathway and strain design.