Attention presenters: please review the Speaker Information Page.
Schedule subject to change
All times listed are in GMT
Tuesday, April 21st
8:45-9:00
Welcome

Authors List:

  • Megha Hegde, Kingston University, Stela Vilhar, University of Westminster
9:00-9:15
PISTACHIO: Proteomics-Constrained Negative Binomial Matrix Factorisation for Spatial Transcriptomics Deconvolution
Confirmed Presenter: Esra Büşra Işık, University of Manchester, United Kingdom


Authors List:

  • Esra Büşra Işık, University of Manchester, United Kingdom
  • Sokratia Georgaka, University of Manchester, United Kingdom
  • Mauricio Alvarez, University of Manchester, United Kingdom
  • Magnus Rattray, University of Manchester, United Kingdom

Presentation Overview:

Spatial transcriptomics measures gene expression while preserving tissue architecture; however, most sequencing-based platforms capture mixtures of multiple cells per spatial location, necessitating robust deconvolution. This challenge is particularly acute in heterogeneous tumours, where spatial intermixing, transcriptional plasticity, and the limited availability of well-matched single-cell references undermine existing approaches. We present PISTACHIO, a reference-free deconvolution framework for spatial transcriptomics based on constrained non-negative matrix factorisation with a negative binomial loss. The model operates directly on raw count data to capture overdispersion and incorporates binary spatial priors derived from spatial proteomics (Imaging Mass Cytometry) to enforce biologically grounded sparsity. These priors restrict cell-type contributions to spatially plausible locations, suppressing spurious assignments while preserving interpretability. We evaluate PISTACHIO on synthetic spatial transcriptomics data generated from tumour single-cell references, as well as on real tumour sections from prostate cancer and glioblastoma with matched spatial proteomics. Across all settings, the method accurately recovers spatial cell-type distributions and interpretable gene expression profiles, remains robust to realistic perturbations of proteomics-derived spatial masks, and completes inference within minutes on standard CPU hardware. Benchmarking against reference-based and unsupervised methods demonstrates improved robustness under reference mismatch and spatial noise. By embedding proteomics-derived spatial constraints into a statistically principled matrix factorisation framework, PISTACHIO provides a scalable and interpretable solution for multimodal spatial omics analysis, enabling reliable deconvolution in complex tissue environments without reliance on matched single-cell RNA sequencing data.
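The idea of restricting cell-type contributions to spatially plausible spots with a binary mask can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not PISTACHIO itself: it replaces the negative binomial loss with the simpler Poisson (generalised KL) loss and its Lee-Seung multiplicative updates, and the function name `masked_kl_nmf` and the mask layout (cell types x spots) are hypothetical. Because multiplicative updates never turn a zero into a non-zero value, entries excluded by the mask stay at zero throughout fitting.

```python
import numpy as np

def masked_kl_nmf(X, mask, n_iter=200, eps=1e-10, seed=0):
    """Factorise counts X (genes x spots) as W @ H, keeping H zero wherever mask == 0.

    W: genes x cell types (expression profiles); H: cell types x spots (abundances).
    Assumption: Poisson / generalised-KL loss with multiplicative updates,
    NOT the negative binomial loss described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_spots = X.shape
    k = mask.shape[0]
    W = rng.uniform(0.1, 1.0, size=(n_genes, k))
    # Masked-out entries start at zero and multiplicative updates keep them there.
    H = rng.uniform(0.1, 1.0, size=(k, n_spots)) * mask
    for _ in range(n_iter):
        ratio = X / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)
        ratio = X / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```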

9:15-9:30
MultiCOCor: identifying multiple clustering structures in high-dimensional data
Confirmed Presenter: Jack Hodgkinson, MRC Biostatistics Unit, University of Cambridge, United Kingdom


Authors List:

  • Jack Hodgkinson, MRC Biostatistics Unit, University of Cambridge, United Kingdom
  • Paul Kirk, MRC Biostatistics Unit, University of Cambridge, United Kingdom

Presentation Overview:

Clustering is an exploratory tool for molecular datasets, for example to discover disease subtypes. Such high-dimensional data often reflect multiple biomolecular processes and pathways, each associated with different groups of variables, and may therefore exhibit multiple clustering structures. Standard clustering methods that produce a single partition of the dataset are not suited to these settings, as they can conflate heterogeneous signals arising from distinct processes and obscure meaningful structure.

We propose MultiCOCor, a novel algorithm that recovers multiple clustering structures in two stages: first, features are partitioned into distinct groups ('views') based on their correlation structure; second, samples are clustered within each view to identify subtype structures. This enables the identification of meaningful feature sets, where each set captures distinct biomarkers that may provide molecular signals for distinct subtypes.

We benchmark MultiCOCor using a simulation study that evaluates structure recovery performance across a wide range of scenarios. The method demonstrates robust performance and scalability across high-dimensional settings, establishing it as a competitive alternative to existing approaches. We further apply MultiCOCor to transcriptomic data from The Cancer Genome Atlas Network (PMID: 23000897), where sample clusters inferred within individual views show strong agreement with established clinical annotations, including PAM50 subtypes and hormone receptor status. The resulting views correspond to interpretable sets of co-regulated genes, highlighting distinct molecular signals associated with known disease subtypes.

Overall, MultiCOCor provides a scalable, flexible, and computationally efficient approach to multi-view clustering, enabling the discovery of multiple latent structures in heterogeneous high-dimensional data.
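The two-stage structure can be illustrated with deliberately simple stand-ins: features are grouped into views as connected components of a thresholded absolute-correlation graph, and samples are then clustered within each view by k-means. The 0.5 threshold, the farthest-point initialisation and all function names are assumptions for illustration; this is not the MultiCOCor algorithm.

```python
import numpy as np

def correlation_views(X, threshold=0.5):
    """Stage 1 (illustrative): connected components of the graph linking features
    whose |Pearson correlation| exceeds threshold. X has shape (samples, features)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    labels = np.full(corr.shape[0], -1)
    n_views = 0
    for start in range(corr.shape[0]):
        if labels[start] >= 0:
            continue
        labels[start] = n_views
        stack = [start]
        while stack:  # depth-first search over strongly correlated features
            i = stack.pop()
            for j in np.flatnonzero(corr[i] > threshold):
                if labels[j] < 0:
                    labels[j] = n_views
                    stack.append(j)
        n_views += 1
    return labels

def kmeans(X, k, n_iter=50):
    """Stage 2 (illustrative): Lloyd's k-means with deterministic farthest-point start."""
    centres = [X[0]]
    while len(centres) < k:
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[dist.argmax()])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        assign = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centres[c] = X[assign == c].mean(axis=0)
    return assign

def multi_view_clusters(X, k=2, threshold=0.5):
    """Return {view label: sample cluster assignment} for every feature view."""
    views = correlation_views(X, threshold)
    return {v: kmeans(X[:, views == v], k) for v in np.unique(views)}
```

Each view yields its own partition of the samples, so two views driven by unrelated processes can group the same samples differently.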

9:30-9:45
DeepPathway: Predicting Pathway Expression from Histopathology Images
Confirmed Presenter: Muhammad Ahtazaz Ahsan, University of Manchester, United Kingdom


Authors List:

  • Muhammad Ahtazaz Ahsan, University of Manchester, United Kingdom
  • Karen Piper Hanley, University of Manchester, United Kingdom
  • Martin Fergie, University of Manchester, United Kingdom
  • Claire O'Leary, University of Manchester, United Kingdom
  • Gerben Borst, University of Manchester, United Kingdom
  • Federico Roncaroli, University of Manchester, United Kingdom
  • Fayyaz Minhas, University of Warwick, United Kingdom
  • Magnus Rattray, University of Manchester, United Kingdom
  • Mudassar Iqbal, University of Manchester, United Kingdom
  • Syed Murtuza Baker, University of Manchester, United Kingdom

Presentation Overview:

Spatial transcriptomics (ST) technologies provide spatially localized gene expression along with Hematoxylin and Eosin (H&E)-stained image data, facilitating joint analysis of the tissue microenvironment. Despite their transformative potential, the widespread adoption of ST remains constrained by high costs and technical challenges in data acquisition. Thus, there have been recent efforts to develop deep learning methods for inferring spatial gene expression from much cheaper and more readily available H&E images. These methods demonstrate promising results in reconstructing transcriptomic landscapes within tissue sections. While existing approaches predominantly focus on gene-level prediction, biological processes are often regulated at the pathway level through coordinated activity among functionally related genes. We present DeepPathway, a bimodal contrastive learning approach trained on ST data to predict pathway expression from H&E images. Pathway expression profiles are derived by summarizing the expression of constituent genes using established pathway definitions. We assess the performance of DeepPathway on multiple cancer datasets, and further validate it on H&E images from The Cancer Genome Atlas (TCGA), where the model clearly discriminates between normal and tumour tissues. Finally, we apply our method to predict hypoxia signatures from H&E images of brain tumour samples for which pimonidazole hypoxia staining was available as ground truth. Implementation code for DeepPathway is available at https://github.com/aahsan045/DeepPathway.
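Summarizing gene-level expression into a pathway score can be as simple as averaging the member genes measured in a spot. The sketch below assumes mean aggregation and uses toy gene sets (`hypoxia_toy`, `proliferation_toy`) with made-up values; DeepPathway's actual summarization and pathway definitions may differ.

```python
from statistics import mean

def pathway_scores(expression, pathways):
    """Summarise gene-level expression into pathway-level scores for one spot.

    expression: dict gene -> normalised expression value in the spot
    pathways:   dict pathway name -> list of member genes
    Assumption: the score is the mean over member genes measured in the spot;
    unmeasured genes are skipped and empty pathways get NaN.
    """
    scores = {}
    for name, genes in pathways.items():
        values = [expression[g] for g in genes if g in expression]
        scores[name] = mean(values) if values else float("nan")
    return scores

# Toy values for illustration only (LDHA is deliberately not measured here).
spot = {"HIF1A": 2.0, "VEGFA": 4.0, "CA9": 3.0, "MKI67": 1.0}
toy_pathways = {"hypoxia_toy": ["HIF1A", "VEGFA", "CA9", "LDHA"],
                "proliferation_toy": ["MKI67"]}
scores = pathway_scores(spot, toy_pathways)
```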

9:45-10:00
Leveraging open-access data in ChEMBL to explore emerging drug modalities
Confirmed Presenter: Emma Manners, EMBL-EBI, United Kingdom


Authors List:

  • Emma Manners, EMBL-EBI, United Kingdom
  • Zainab Ashimiyu-Abdusalam, EMBL-EBI, United Kingdom
  • Melanie Schneider, EMBL-EBI, United Kingdom
  • Barbara Zdrazil, EMBL-EBI, United Kingdom
  • Noel O'Boyle, EMBL-EBI, United Kingdom

Presentation Overview:

Drug discovery is lengthy and costly with a high attrition
rate. It’s therefore crucial that existing knowledge is
captured early during drug development. ChEMBL is an
open-access bioactivity database providing a valuable
source of curated data linking drugs and “druglike”
compounds to their biological targets.

Small molecule inhibitors comprise the core bioactivity
data in ChEMBL; however, biotherapeutics and emerging
chemical modalities are becoming more prominent. New
chemical modalities include “targeted protein degraders”
(TPDs), such as PROTACs. TPDs are heterobifunctional small
molecules that elicit degradation of their therapeutic
targets using the host degradation machinery. TPDs bind
host degradation effectors, as well as disease-targets,
which is reflected in their annotations. Their higher MWT
means classical drug design principles (Lipinski’s rule of
5) are less relevant whilst ADME properties and drug
delivery strategies remain key.

Our goal was to extract TPD-related data and enrich ChEMBL
with new “modality” annotations (ChEMBL 37 onwards). We
achieved this by a) mining bioassay descriptions in ChEMBL
using a curated list of keywords derived from literature
and existing ChEMBL bioassays, and b) extracting
bioactivities mapped to UniProt protein accessions for
putative host degradation effectors. The combined dataset
provided ~5,400 TPD compounds, tested against ~500 targets,
with ~21.7K associated bioactivities (ChEMBL version 36).
Users can easily access assays, activities, targets, and
compound structural information for degrader-target
interactions. The characteristics, targets and mechanisms
of TPDs can be explored, associated ChEMBL data extracted
(e.g. pharmacokinetic parameters), and the dataset
harnessed for ML training applications.
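Step (a), keyword mining of bioassay descriptions, amounts to a case-insensitive whole-word text search. In this minimal sketch the keyword list, the assay IDs and the descriptions are hypothetical placeholders, not the curated keyword list or the ChEMBL data used in this work.

```python
import re

# Hypothetical keyword list for illustration; not the curated ChEMBL list.
TPD_KEYWORDS = ["PROTAC", "degrader", "DC50", "Dmax", "molecular glue"]
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in TPD_KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def flag_tpd_assays(assays):
    """Return IDs of assays whose free-text description mentions any TPD keyword.

    assays: iterable of (assay_id, description) pairs.
    """
    return [assay_id for assay_id, description in assays if PATTERN.search(description)]

# Hypothetical assay records.
toy_assays = [
    ("A1", "Degradation of BRD4 measured as DC50 after 24 h"),
    ("A2", "Inhibition of EGFR kinase activity"),
    ("A3", "PROTAC-induced degradation of BTK in Ramos cells"),
]
flagged = flag_tpd_assays(toy_assays)
```

Word boundaries keep short tokens such as `DC50` from matching inside longer identifiers; step (b), matching target accessions of host degradation effectors, would be a separate set-membership filter.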

10:00-10:15
Predicting drug resistance across cancer types using multi-omics transfer learning
Confirmed Presenter: Semih Alpsoy, Department of Molecular Biotechnology, Türkisch-Deutsche Universität, Istanbul, Turkey


Authors List:

  • Semih Alpsoy, Department of Molecular Biotechnology, Türkisch-Deutsche Universität, Istanbul, Turkey
  • Osman Ugur Sezerman, Acibadem University, School of Medicine, Biostatistics and Medical Informatics, Istanbul, Turkey

Presentation Overview:

Drug resistance remains one of the major obstacles to
effective cancer therapy. In this study, we present a deep
neural network (DNN)–based transfer learning (TL) framework
to predict drug response and systematically investigate
drug resistance mechanisms across multiple cancer types. We
integrated gene expression, somatic mutation, and copy
number aberration (CNA) data with drug response profiles
using a multi-omics integration strategy. Models were
trained on the Genomics of Drug Sensitivity in Cancer
(GDSC) dataset by leveraging drugs targeting similar
biological pathways to construct pan-drug predictive
models. The trained models were evaluated on independent
preclinical in vivo patient-derived xenograft (PDX)
datasets and ex vivo patient cohorts from The Cancer Genome
Atlas (TCGA). To elucidate molecular mechanisms of
resistance, we performed pathway enrichment analyses for
paclitaxel, 5-fluorouracil (5-FU), gemcitabine, and
cetuximab, and applied Fisher’s exact test to assess
associations between resistance and gene-level mutations or
CNAs. Our pan-drug models consistently outperformed
comparable approaches, achieving superior predictive
performance measured by the area under the precision–recall
curve (AUCPR). Enrichment analyses indicated that
LDHB-mediated pyruvate metabolism and FYN-mediated focal
adhesion signaling may play pivotal roles in paclitaxel
resistance, whereas PINK1-mediated mitophagy emerged as a
key mechanism in 5-FU resistance. Beyond transcriptional
regulation, Fisher’s exact test suggested that CNAs in LDHB
and PINK1 may further contribute to resistance to
paclitaxel and 5-FU, respectively. Furthermore, shared
resistance mechanisms between paclitaxel and cetuximab were
identified. Importantly, these findings are supported by
prior experimental evidence, providing literature-based
validation. This framework combines predictive accuracy with
interpretable insights for precision oncology.
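Fisher's exact test on a 2x2 table (for example, resistant versus sensitive samples against presence versus absence of a CNA) can be computed directly from the hypergeometric distribution. This pure-Python sketch uses the common two-sided convention of summing the probabilities of all tables with the same margins that are no more likely than the observed one; the function name is illustrative, and in practice a library routine such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Example layout (an assumption): rows = resistant / sensitive samples,
    columns = CNA present / absent.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For the classic table [[3, 1], [1, 3]] this gives 34/70, about 0.486.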

10:15-10:30
Cell-specific rewiring of GPCR signalling networks: A systems pharmacology perspective
Confirmed Presenter: Shanlin Rao, University of Cambridge, United Kingdom


Authors List:

  • Shanlin Rao, University of Cambridge, United Kingdom
  • Maria Marti Solano, University of Cambridge, United Kingdom

Presentation Overview:

The G protein-coupled receptor (GPCR) signalling system controls key physiological functions including vision, neurotransmission, hormonal regulation, and immunity. Its fundamental role in cellular homeostasis has also made GPCRs attractive pharmacological targets, with these receptors representing the most prevalent drug target class. To exert their function, GPCRs transmit extracellular signals into the cell via multiple transducers whose activities are modulated by regulatory proteins. Although it is widely accepted that cell-specific members of the GPCR signalling machinery can diversify phenotypic responses to the same ligand, we lack a comprehensive understanding of how signalling pathway rewiring alters physiology and drug responses. To gain a systematic view of this phenomenon, we have integrated existing high-throughput experimental data characterising pairwise interactions between receptors and G proteins or β-arrestins, alongside interactions of receptor activity-modifying proteins (RAMPs) and regulators of G protein signalling (RGS proteins), to construct a network representing the GPCR signalling pathway interactome. Cell-specific subnetting based on Human Protein Atlas RNA-seq data shows the extent of signalling pathway diversification across cell types, potentially impacting ligand responses for most GPCRs. To facilitate context-specific exploration of GPCR signalling, we present GPCRchitect, an interactive web-based tool for visualising GPCR pathway composition downstream of user-selected receptors and ligands across >200 cell types and >1,000 cell lines. Overall, our results illuminate the degree of functional diversity that could arise from cell-dependent GPCR pathway architectures, and serve to guide novel drug design strategies that transition from targeting individual GPCRs to modulating cell-specific signalling pathways associated with particular therapeutic responses.
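Cell-specific subnetting of this kind can be sketched as filtering the pan-cell interactome down to interactions whose partners are both expressed in a given cell type. The expression threshold, the toy edges and the expression values below are assumptions for illustration, not Human Protein Atlas data or GPCRchitect's implementation.

```python
def cell_specific_subnetwork(edges, expression, threshold=1.0):
    """Keep interactions whose partners are both expressed in one cell type.

    edges:      iterable of (protein_a, protein_b) pairs from the full interactome
    expression: dict protein -> expression level in the cell type (e.g. nTPM)
    threshold:  hypothetical cut-off for calling a gene 'expressed'
    """
    expressed = {p for p, level in expression.items() if level >= threshold}
    return [(a, b) for a, b in edges if a in expressed and b in expressed]

# Toy interactome edges and made-up expression values for one cell type.
toy_edges = [("ADRB2", "GNAS"), ("ADRB2", "ARRB2"), ("CALCRL", "RAMP1")]
toy_expression = {"ADRB2": 12.0, "GNAS": 80.0, "ARRB2": 0.4, "CALCRL": 5.0, "RAMP1": 3.0}
subnet = cell_specific_subnetwork(toy_edges, toy_expression)
```

Comparing such subnetworks across cell types shows which transducers and regulators a given receptor can engage in each context.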

10:30-10:45
Integrating protein structure and population genomic data to detect diversifying selection related to immunity
Confirmed Presenter: Leonie J. Lorenz, EMBL-EBI, United Kingdom


Authors List:

  • Leonie J. Lorenz, EMBL-EBI, United Kingdom
  • Joel Hellewell, School of Engineering Mathematics and Technology, University of Bristol, United Kingdom
  • Matthew J. Russell, EMBL-EBI, United Kingdom
  • John A. Lees, EMBL-EBI, United Kingdom

Presentation Overview:

As populations adapt to changes in their environment, they
can accumulate traces of diversifying selection in their
genomes. In pathogens, one example of such an adaptation is
immune escape. Diversifying selection can be detected by
calculating the ratio of nonsynonymous to synonymous
substitutions, dN/dS. Classical methods for estimating
dN/dS rely on phylogenetic trees, which limits scalability
and can give misleading results for bacterial species as
trees generally cannot be corrected to exclude
recombination. We have developed a new tool called
TOMBOMBADIL (Tree-free Omega Mapping By Observing Mutations
of Bases and Amino acids Distributed Inside Loci), which
computes dN/dS directly from codon counts in alignments
by comparing to expected frequencies from the coalescent.
We implemented TOMBOMBADIL as an end-to-end differentiable
model in both Stan and JAX, allowing us to estimate dN/dS
values across genes for alignments with potentially
millions of samples. TOMBOMBADIL includes different models
of evolution (NY98, GTR) and different assumptions of how
codons evolve (e.g., independently, or similarly within the
same gene/domain). We validated our method by comparing it
to established, tree-based dN/dS methods and on well-known
examples of bacterial genes under positive selection,
including the outer membrane protein PorB of Neisseria
meningitidis. We have added regression terms between
protein features and selection, which will help map sites
of positive selection to predictions of protein structures,
leveraging the relationship of functional evolution and
structure. In summary, TOMBOMBADIL allows fast detection of
positive selection in large-scale bacterial datasets using
a tree-free dN/dS approach and machine learning methods.
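For reference, the classical counting estimate divides the proportion of nonsynonymous differences per nonsynonymous site, pN = N_d / N, by the proportion of synonymous differences per synonymous site, pS = S_d / S; a ratio above 1 hints at diversifying selection. The sketch below computes this uncorrected ratio from hypothetical counts. It omits the multiple-hit correction (e.g. Jukes-Cantor) that published estimators usually apply, and it is not TOMBOMBADIL's tree-free coalescent method.

```python
def naive_dnds(n_sites, s_sites, n_diffs, s_diffs):
    """Uncorrected dN/dS: (nonsynonymous differences / nonsynonymous sites)
    divided by (synonymous differences / synonymous sites)."""
    if s_diffs == 0:
        raise ValueError("no synonymous differences: the ratio is undefined")
    p_n = n_diffs / n_sites
    p_s = s_diffs / s_sites
    return p_n / p_s

# Hypothetical counts: 20 nonsynonymous differences over 200 sites and
# 5 synonymous differences over 100 sites, i.e. 0.1 / 0.05.
omega = naive_dnds(n_sites=200, s_sites=100, n_diffs=20, s_diffs=5)
```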

10:45-11:00
Detection of recombination in Arabidopsis centromeres

Authors List:

  • Jacob González Isa, Department of Plant Sciences, University of Cambridge, Spain
  • Matthew Naish, Department of Plant Sciences, University of Cambridge, United Kingdom
  • Namil Son, Department of Plant Sciences, University of Cambridge, United Kingdom
  • Ian Henderson, University of Cambridge, United Kingdom

Presentation Overview:

Centromeres are essential for chromosome segregation, and
their function is highly conserved, but centromeric DNA and
associated proteins (which make up the kinetochore) evolve
very rapidly; a phenomenon known as the ‘centromere
paradox’. Because centromeres are highly repetitive, only
recent advances in long-read sequencing have made it
possible to include complete centromeres in genome
assemblies.

Meiotic recombination is one of the major contributors to
genetic variation: crossovers in the chromosome arms
generate new sequences from combinations of the parental
genotypes. However, meiotic
recombination events within centromeres are known to be
suppressed and have so far not been detected.

In this work, we present a general k-mer-based
computational pipeline to detect, for the first time, rare
meiotic recombination events at Arabidopsis centromeres,
which might improve our understanding of centromeric
structural variations.

To detect recombination events, we used nanopore sequencing
of pollen DNA from F1 hybrids between two Arabidopsis
ecotypes, Columbia (Col) and Landsberg (Ler). We then
analysed the reads using cenhap-mers (centromeric,
haplotype-specific k-mers), which increase the number of
genomic markers available in the centromeres. By assigning
Col and Ler cenhap-mers to each read, we detect transitions
between ecotype-specific k-mers and thereby identify rare
meiotic recombination events in centromeres: around 100
events per million reads.

This work provides new insights into the genetic complexity
of centromeres: using a novel approach, we find
recombination events at centromeres that could help explain
the high variability between centromeres of closely related
species.
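Detecting a crossover on a single read then reduces to labelling each informative k-mer as Col- or Ler-specific and looking for a switch between runs of labels. In this minimal sketch the k-mer sets, the toy read, `k = 4` and the `min_run` filter are assumptions for illustration; real nanopore reads would need more tolerance for sequencing errors than this exact-match lookup provides.

```python
def label_read(read, col_kmers, ler_kmers, k):
    """Label each k-mer start position as 'Col' or 'Ler'; uninformative k-mers are skipped."""
    labels = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in col_kmers:
            labels.append((i, "Col"))
        elif kmer in ler_kmers:
            labels.append((i, "Ler"))
    return labels

def find_transitions(labels, min_run=2):
    """Report (from, to, position) wherever the haplotype switches between runs
    supported by at least min_run consecutive markers (hypothetical filter)."""
    runs = []  # each run: [haplotype, start position, number of markers]
    for pos, hap in labels:
        if runs and runs[-1][0] == hap:
            runs[-1][2] += 1
        else:
            runs.append([hap, pos, 1])
    runs = [r for r in runs if r[2] >= min_run]
    return [(prev[0], cur[0], cur[1]) for prev, cur in zip(runs, runs[1:]) if prev[0] != cur[0]]

# Toy read: Col-specific k-mers at the start, Ler-specific k-mers at the end.
toy_labels = label_read("AAAACCCCGGGGTTTT",
                        col_kmers={"AAAA", "AAAC", "AACC"},
                        ler_kmers={"GGGT", "GGTT", "GTTT"}, k=4)
transitions = find_transitions(toy_labels)
```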

11:30-11:45
Pushing the limits of AlphaFold3: detecting DNA-binding domains at scale
Confirmed Presenter: Francesco Costa, EMBL-EBI, United Kingdom


Authors List:

  • Francesco Costa, EMBL-EBI, United Kingdom
  • Antonina Andreeva, EMBL-EBI, United Kingdom
  • Jeremy Pollak, EMBL-EBI, United Kingdom
  • Alex Bateman, EMBL-EBI, United Kingdom

Presentation Overview:

DNA-binding proteins (DBPs) regulate gene expression, DNA replication, and cell differentiation. Existing tools for predicting DNA binding rely primarily on sequence-based features, which provide no structural insight into the nature of DNA binding. The recent release of AlphaFold3, capable of modelling protein-nucleic acid complexes, offers a promising way to overcome the limitations of sequence-based methods by providing structural context to DNA-binding predictions. However, AlphaFold3 requires known binding partners as input information, which is unavailable when characterising novel proteins. To overcome this limitation, we trained a random forest model on a non-redundant dataset of SwissProt entries to predict DNA-binding using structural and confidence features derived from AlphaFold3-predicted complexes. Our method reached a ROC-AUC of 0.93, outperforming state-of-the-art sequence-based predictors when benchmarked on a dataset of recently released PDB structures. We demonstrate the potential of our workflow by predicting DNA-binding across all Domains of Unknown Function (DUFs) from the Pfam database. This led to the identification of novel putative DNA-binding domains, comprising putative transcriptional regulators, putative toxin-antitoxin systems and putative transposases. Our work establishes structure-informed DBP prediction as a more accurate and interpretable alternative to sequence-based methods, and provides a blueprint for integrating AlphaFold3 into functional annotation pipelines.

11:45-12:00
Identification and characterization of Polyurethane-Degrading Enzymes from MGnify Metagenomes
Confirmed Presenter: Joel Roca-Martinez, University College London, United Kingdom


Authors List:

  • Joel Roca-Martinez, University College London, United Kingdom
  • Christine Orengo, University College London, United Kingdom

Presentation Overview:

The discovery of enzymes capable of degrading synthetic
polymers remains a major challenge due to the vast size and
functional diversity of metagenomic sequence space. Here,
we present a rational, structure- and sequence-informed
pipeline for enzyme discovery, applied to the
identification of novel polyurethane-degrading enzymes
(PURases) from large-scale metagenomic data. Starting from
2.4 billion protein sequences in MGnify, we performed
homology-based filtering against three known PURases
followed by conservation analysis of the catalytic triad,
yielding ~8,000 high-confidence candidates. To prioritize
functionally diverse yet tractable subsets, we clustered
sequences into functional families using an embedding-based
classifier (eMMA-FunFamer) and identified
function-determining positions (FDPs) conserved within
families but variable across them. Physicochemical
variation at these FDPs was used to construct a sequence
similarity network, revealing distinct functional clusters
that guided representative selection.
Candidate prioritization further integrated metagenomic
biome metadata, to enrich for thermostable enzymes, and
structural features predicted using AlphaFold2, such as
loop architecture near the active site. From this rational
selection pipeline, 20 diverse enzymes were selected for
experimental characterization. Biochemical assays
identified 9 enzymes with activity against carbamate
substrates, including 2 that also exhibited activity
against polyurethane polymers. These results demonstrate
that combining functional family analysis, residue
tunability metrics, and structural diversity enables
efficient navigation of metagenomic sequence space and
substantially improves hit rates in enzyme discovery. This
pipeline is broadly applicable to other challenging
catalytic functions where experimental screening capacity
is limited.
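The notion of function-determining positions, conserved within families but variable across them, can be illustrated with a simple per-column heuristic on a multiple sequence alignment. The 0.8 conservation cut-off, the family names and the toy sequences are hypothetical, and this is not the FDP method used alongside eMMA-FunFamer.

```python
from collections import Counter

def candidate_fdps(families, min_within=0.8):
    """Flag alignment columns conserved within every family but differing between families.

    families: dict family name -> list of aligned sequences of equal length
    A column qualifies if, in each family, the most common residue reaches
    min_within frequency and the families' consensus residues are not all identical.
    """
    length = len(next(iter(families.values()))[0])
    hits = []
    for col in range(length):
        consensus = []
        for seqs in families.values():
            residue, count = Counter(s[col] for s in seqs).most_common(1)[0]
            if count / len(seqs) < min_within:
                break  # not conserved in this family, so not an FDP candidate
            consensus.append(residue)
        else:
            if len(set(consensus)) > 1:
                hits.append(col)
    return hits

# Toy alignment: column 1 is H in one family and D in the other.
toy_families = {"fam1": ["SHGA", "SHGC", "SHGA"], "fam2": ["SDGA", "SDGC", "SDGV"]}
fdp_columns = candidate_fdps(toy_families)
```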

12:00-12:15
Metagenomic analysis identifies co-occurrence of Desulfovibrio and curli genes in Parkinson’s patients
Confirmed Presenter: Fang Chi, University of Helsinki, Finland


Authors List:

  • Fang Chi, University of Helsinki, Finland

Presentation Overview:

Parkinson’s disease (PD) is increasingly linked to gut
microbiome dysbiosis, yet specific microbial drivers remain
inconsistent across heterogeneous populations. While
sulfate-reducing bacteria and bacterial amyloids have been
individually implicated in PD, their ecological
interactions and combined contributions to disease risk and
metabolic dysfunction remain unclear.
We performed a large-scale meta-analysis of 1,609 fecal
metagenomes from 10 PD cohorts across three continents.
Using a unified mixed-effects framework, we integrated
community-level profiling, targeted feature analysis, and
functional annotation to investigate the prevalence,
abundance, and synergistic interactions of Desulfovibrio,
Escherichia coli, and bacterial amyloid (curli) genes.
Global gut microbiome structure showed extensive overlap
between PD patients and controls, with limited
disease-associated variance at the community level.
Desulfovibrio, E. coli, and curli genes exhibited robust
and reproducible enrichment in PD patients at the
prevalence level. We identified a disease-specific
ecological coupling between Desulfovibrio and curli genes
that was absent in healthy controls. Individuals with
concurrent high exposure to both features displayed a
non-linear amplification of PD risk (odds ratio = 2.87),
exceeding the effects of either feature alone. This
synergistic interaction was associated with a distinct
functional reorganization of the gut microbiome,
characterized by the enrichment of virulence
factors—including lipopolysaccharide and aerobactin
biosynthesis—concomitant with the depletion of protective
biosynthetic pathways, specifically L-glutamine and biotin
biosynthesis.
PD-associated gut dysbiosis is predominantly
prevalence-driven and shaped by interaction-based
functional reprogramming rather than global community
shifts. Our findings support a multi-hit model in which
synergistic interactions between sulfate-reducing bacteria
and curli-producing microbes jointly define a
disease-relevant metabolic state.

12:15-12:30
Exploring Feature Representations for Cancer-Associated sORF Prediction in Non-coding RNA
Confirmed Presenter: Fabiana Rodrigues de Goes, Rosalind Franklin Institute, United Kingdom


Authors List:

  • Fabiana Rodrigues de Goes, Rosalind Franklin Institute, United Kingdom
  • Makanaka Mazheke, University College London, United Kingdom
  • Aparajita Karmakar, Rosalind Franklin Institute, United Kingdom
  • Nayane Souza, Universidade Tecnológica Federal do Paraná, Brazil
  • Amanda Piveta Schnepper, Sao Paulo State University, UNESP, Brazil
  • Robson Francisco Carvalho, Sao Paulo State University, UNESP, Brazil
  • Mark Basham, Rosalind Franklin Institute, United Kingdom
  • Alexandre Rossi Paschoal, Rosalind Franklin Institute, United Kingdom

Presentation Overview:

Advances in cancer bioinformatics have expanded our
understanding of tumor evolution and molecular
heterogeneity. Microproteins encoded by small open reading
frames (sORFs) in non-coding RNAs (ncRNAs) represent a
largely unexplored layer of cancer biology, with potential
roles in oncogenic regulation, biomarker discovery, and
therapeutic targeting. However, systematic identification
of cancer-associated sORFs remains challenging due to
experimental costs and technical limitations, highlighting
the need for computational approaches to enable large-scale
screening. Here, we present a comprehensive evaluation of
machine learning models and sequence feature
representations for predicting cancer-associated sORFs
using the Spencer database, which catalogs 29,526
ncRNA-derived small peptides across 15 cancer types.
Instead of relying on increasingly complex model
architectures, we systematically investigate the impact of
feature extraction strategies. Three classifiers (Random
Forest, Support Vector Machine, and Multilayer Perceptron)
were benchmarked with three feature types: k-mer frequency,
Word2Vec embeddings, and embeddings from pre-trained
genomic language models (gLMs). Our results show that
classical models, when combined with appropriate feature
engineering, consistently outperform the CoraL baseline,
achieving up to 10% higher accuracy. Notably, k-mer–based
representations often provided more stable and accurate
predictions than gLM embeddings without fine-tuning,
indicating that increased model complexity does not
guarantee superior performance. Tokenization choices, such
as k-mer length, also significantly affected outcomes.
Certain datasets, for example skin cancer, exhibited
reduced sensitivity, suggesting intrinsic challenges for
positive-case detection. Overall, our findings emphasize
the critical role of feature representation in
cancer-associated sORF prediction and demonstrate that
well-designed, interpretable models can outperform more
complex deep learning approaches in this domain.
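As an example of the simplest representation compared here, overlapping k-mer frequencies turn a variable-length sequence into a fixed-length vector that classical classifiers such as Random Forests can consume. The sketch assumes a DNA alphabet and normalisation to frequencies; the exact tokenisation and k values used in the study are not reproduced here.

```python
from itertools import product

def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return a fixed-length vector (len(alphabet) ** k) of overlapping k-mer frequencies."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing ambiguous bases such as N
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts

features = kmer_frequencies("ACGTA", k=2)
```

Changing `k` changes the vector length (4 ** k for DNA), which is one reason tokenisation choices such as k-mer length can noticeably affect downstream performance.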

12:30-12:40
Closing Remarks and Awards


13:30-13:45
Welcome and Opening Remarks

Authors List:

  • Mark Wass, University of Kent, Daniela Hensen, Biotechnology and Biological Sciences Research Council, part of UK Research and Innovation
13:45-14:30
Invited Presentation: Keynote from Dr. Rob Finn
Moderator(s): Mark Wass


Authors List:

  • Rob Finn, European Bioinformatics Institute
14:30-14:45
ImmunoMatch learns and predicts cognate pairing of heavy and light immunoglobulin chains
Confirmed Presenter: Dongjun Guo, University College London, United Kingdom


Authors List:

  • Dongjun Guo, University College London, United Kingdom
  • Deborah Dunn-Walters, University of Surrey, United Kingdom
  • Franca Fraternali, University College London, United Kingdom
  • Joseph Ng, University College London, United Kingdom

Presentation Overview:

Immunoglobulin heavy (H) and light (L) chains are assembled
separately through gene segment recombination, generating
enormous antibody diversity. While this process is well
characterized, the molecular rules governing H-L chain
pairing preferences remain poorly understood. However,
stable pairing is crucial for B cell maturation: for
example, many B cell lymphomas rely on functional B cell
receptors to sustain intracellular signalling for survival.
Moreover, stable H-L pairing affects the “developability”
of therapeutic antibodies. Here we present ImmunoMatch, an
artificial intelligence framework that deciphers the
principles underlying antibody H-L chain pairing.
ImmunoMatch leverages the pre-trained antibody language
model AntiBERTa2 and learns from human single-cell
sequencing data to distinguish cognate H-L pairs from
randomly paired sequences. The model achieves strong
validation performance (ROC-AUC = 0.75) and demonstrates
enhanced predictive capability when trained separately on κ
and λ light chain types, with ImmunoMatch-λ showing
superior generalizability. This is consistent with the
sequential assembly of H-κ and H-λ pairs during B cell
development in bone marrow. Analysis of B cell leukemia and
lymphoma samples reveals that chain pairing propensity
aligns with B cell maturation stages, suggesting refinement
of H-L pairing is a hallmark of B cell maturation.
ImmunoMatch enables diverse applications including
reconstructing paired antibody chains from spatial
transcriptomics and therapeutic antibody design
optimization. By addressing H-L chain pairing as an
under-explored dimension of antibody developability,
ImmunoMatch facilitates computational assessment and
large-scale optimization of stable immunoglobulin
assemblies for efficacious antibody therapeutics.

14:45-15:00
DNA Language Models for Efficient Non-Coding Variant Effect Prediction
Confirmed Presenter: Megha Hegde, Kingston University London, United Kingdom


Authors List:

  • Megha Hegde, Kingston University London, United Kingdom
  • Jean-Christophe Nebel, Kingston University London, United Kingdom
  • Farzana Rahman, Kingston University London, United Kingdom

Presentation Overview:

In the era of personalised medicine, interpreting the
effects of variants in human DNA is crucial. Though over
98% of known variants in human DNA reside in the non-coding
regions, research into these variants has been limited by
the difficulty of decoding the complex pathways through
which they interact. While large Transformer-based genomic
language models excel in interpreting patterns in coding
DNA, they scale poorly for non-coding DNA due to the
quadratic complexity of the attention mechanism. Models
such as Enformer and AlphaGenome have made excellent
advances in this area, combining Transformers with CNNs to
model long sequences. However, like other models in the
field, they still struggle to capture the effects of distal
regulatory elements (>100kb from the gene). While
non-attention methods such as Caduceus and HyenaDNA (using
Mamba and Hyena respectively) have tackled the quadratic
complexity challenge, they have not yet improved predictive
power on these distal regions. Hence, further exploration
is necessary to address this problem.
Recent work has explored techniques that can improve the
variant effect prediction ability of genomic LLMs, while
reducing their computational cost. Notable techniques
include combining ensembling with distillation, and using
layerwise pruning to identify and remove redundant layers.
The approach presented in this work employs a combination
of ensembling, distillation and pruning to produce a highly
accurate, yet efficient, method for non-coding DNA variant
effect prediction. Preliminary experiments on the ncVarDB
dataset have achieved close to state-of-the-art performance
without the need for extensive fine-tuning.
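
The abstract does not specify how ensembling and distillation are combined. As one hedged illustration, a common recipe distils an ensemble of teacher models into a single student by matching the student's temperature-softened output distribution to the averaged teacher distribution. The sketch below assumes a two-class (benign/pathogenic) output and uses hypothetical function names and logits; it is not the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(teacher_logits, temperature=2.0):
    """Average the softened class probabilities of several teacher models."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(ensemble || student) on softened distributions, scaled by T^2 as is customary."""
    target = ensemble_soft_targets(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0)
    return (temperature ** 2) * kl

# Hypothetical (benign, pathogenic) logits for one variant from three teachers.
teachers = [[2.0, -1.0], [1.5, -0.5], [2.5, -1.5]]
print(distillation_loss([0.5, 0.2], teachers))
```

Only the student is kept at inference time, which is where the efficiency gain of this kind of approach would come from.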

15:00-15:15
ESMRank: A ranking-based AI framework for interpretable prediction of protein variant effects
Confirmed Presenter: Riccardo Arnese, Università di Napoli Federico II, Italy


Authors List:

  • Riccardo Arnese, Università di Napoli Federico II, Italy
  • Gennaro Gambardella, Telethon Institute of Genetics and Medicine, Italy

Presentation Overview:

Predicting the functional effects of protein variants
remains a central challenge in both clinical genomics and
protein engineering. Despite the growing availability of
Deep Mutational Scanning (DMS) datasets, their integration
is hindered by assay heterogeneity and batch effects.
Here, we present ESMRank, a novel AI framework that
reformulates variant effect prediction as a learning-to-rank
problem and is trained on over 2M variants from MaveDB,
harmonized through a Reciprocal Rank Fusion strategy we
developed. Built on the LambdaMART algorithm, ESMRank
directly optimizes the ordering of variants by functional
relevance, integrating rich protein representations from
the ESM-2 language model, including sequence embeddings and
residue–residue contact maps, with physicochemical
descriptors of mutational impact.
When benchmarked on protein stability assays, ESMRank
consistently outperformed state-of-the-art sequence- and
structure-based predictors. On the Human Domainome dataset
(~500,000 mutations across 500 human protein domains;
Beltran et al., Nature 2025), ESMRank achieved a Spearman
correlation coefficient (ρ) of 0.62 versus 0.46 for
ThermoMPNN, a 35% relative improvement. On ProteinGym, it
again ranked first on stability assays (mean ρ = 0.64 vs
0.59 for ProSST), confirming a 10% performance gain. On
VariBench, ESMRank’s predictions strongly correlated with
both protein folding (ρ = 0.55) and unfolding rates
(ρ = –0.49), further corroborating its ability to identify
stability-affecting mutations.
By combining ranking-based learning with protein language
models, ESMRank bridges AI and molecular biology, providing
a scalable, interpretable, and biologically grounded
framework for variant interpretation and protein design.
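
The harmonization step is described only as a Reciprocal Rank Fusion strategy developed by the authors. For orientation, the sketch below shows RRF in its standard form, where each item scores the sum of 1 / (k + rank) over all rankings, using the conventional constant k = 60. The assay rankings and variant names are hypothetical, and the authors' variant may differ.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings (each a list of item IDs, best first) into one.

    Standard RRF: score(item) = sum over rankings of 1 / (k + rank), with
    1-based ranks. An item absent from a ranking gets nothing from it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item ID for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical per-assay orderings of the same variants (most damaging first).
assay_a = ["A12V", "G45D", "L7P"]
assay_b = ["G45D", "A12V", "L7P"]
assay_c = ["A12V", "L7P", "G45D"]
print(reciprocal_rank_fusion([assay_a, assay_b, assay_c]))
```

Because RRF uses only rank positions, it sidesteps the differing score scales across DMS assays, which is one plausible reason to choose it for harmonization.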

16:30-16:45
Leveraging protein language models and a scoring function for indel characterisation and transfer learning
Confirmed Presenter: Oriol Gracia I Carmona, King's College London and University College London, United Kingdom


Authors List:

  • Oriol Gracia I Carmona, King's College London and University College London, United Kingdom
  • Vile Leipart, Norwegian University of Life Sciences, Norway
  • Franca Fraternali, University College London, United Kingdom
  • Christine Orengo, University College London, United Kingdom
  • Gro V Adam, Arizona State University, United States

Presentation Overview:

Insertions and deletions (indels) represent a major yet
understudied class of genetic variation, largely because
they alter protein sequence length and disrupt direct
comparisons between wild-type and mutant proteins. This
poses a challenge for protein language models (PLMs) in
particular, whose success in variant effect prediction has
so far been demonstrated mainly on single amino acid
substitutions.
Existing indel pathogenicity predictors rely on limited
annotations, are often restricted to human proteins, and
provide little interpretability, limiting their broader
applicability and biological insight.
Here, we introduce IndeLLM Zero-shot, a scoring framework
that leverages PLMs to systematically assess the effects of
in-frame indels across diverse organisms. Our approach
resolves the challenge of differing sequence lengths by
enabling meaningful comparisons between wild-type and
indel-containing proteins using sequence information alone.
IndeLLM Zero-shot supports zero-shot inference, avoiding
task-specific fine-tuning while achieving performance
comparable to state-of-the-art indel pathogenicity
predictors at a fraction of the computational cost.
Crucially, the framework allows indel effects to be mapped
directly onto specific protein regions, enhancing
interpretability and enabling structural and functional
insights.
Building on this scoring strategy, we design a compact
Siamese neural network that applies transfer learning on
PLM embeddings. This task-specific architecture
significantly outperforms existing indel pathogenicity
prediction tools, achieving a Matthews correlation
coefficient of 0.77 across benchmark datasets. We further
provide practical guidelines for efficient transfer
learning with PLMs in indel-focused studies. To promote
accessibility and reproducibility, we release IndeLLM as a
plug-and-play Google Colab notebook with integrated
visualization of indel effects on protein sequence and
structure (https://github.com/OriolGraCar/IndeLLM).
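
The Matthews correlation coefficient quoted above combines all four cells of the confusion matrix, which makes it more informative than accuracy on imbalanced pathogenic/benign benchmarks. A minimal sketch of the standard formula follows; the helper names and the 1 = pathogenic label convention are assumptions for illustration.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def counts(y_true, y_pred):
    """Confusion-matrix counts, with 1 = pathogenic and 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn
```

MCC ranges from -1 (perfect disagreement) through 0 (chance level) to 1 (perfect prediction).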

16:45-17:00
Mapping the space of protein binding sites by integrating sequence-based protein language models with pocket-context
Confirmed Presenter: Jim Horne, Astex Pharmaceuticals, United Kingdom


Authors List:

  • Jim Horne, Astex Pharmaceuticals, United Kingdom
  • Tugce Oruc, Astex Pharmaceuticals, United Kingdom
  • Maria Kadukova, Astex Pharmaceuticals, United Kingdom
  • Thomas Davies, Astex Pharmaceuticals, United Kingdom
  • Marcel Verdonk, Astex Pharmaceuticals, United Kingdom
  • Carl Poelking, Astex Pharmaceuticals, United Kingdom

Presentation Overview:

Binding sites are the key interfaces that determine a
protein’s biological activity and are therefore common
targets for therapeutic intervention. Techniques that help us
detect, compare, and contextualize binding sites are hence
of immense interest to drug discovery.

We present an approach that integrates protein language
models with a 3D tessellation technique to derive rich and
versatile representations of binding sites that combine
functional, structural, and evolutionary information with
unprecedented detail. We demonstrate that the associated
similarity metrics induce meaningful pocket clusterings by
balancing local structure against global sequence effects.

The resulting embeddings are shown to simplify a variety of
downstream tasks: they help organize the ‘pocketome’ in a
way that efficiently contextualizes new binding sites,
construct performant druggability models, and can be used
to define debiased train-test splits for believable
benchmarking of pocket-centric machine-learning models.
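
The abstract does not describe how the debiased splits are built. One common approach, sketched here under stated assumptions, is to link pockets whose embedding similarity exceeds a threshold, take the connected components as clusters, and assign whole clusters to either train or test so that no test pocket has a close neighbour in training. Cosine similarity, the 0.8 threshold, and the greedy assignment are all illustrative choices, not the authors' metric.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_clusters(embeddings, threshold):
    """Single-linkage clusters: pockets are joined if cosine similarity >= threshold."""
    parent = list(range(len(embeddings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(embeddings)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def cluster_split(embeddings, threshold=0.8, test_fraction=0.2):
    """Fill the test set with whole clusters (smallest first) up to ~test_fraction."""
    clusters = sorted(similarity_clusters(embeddings, threshold), key=len)
    target = test_fraction * len(embeddings)
    train, test = [], []
    for cluster in clusters:
        (test if len(test) + len(cluster) <= target else train).extend(cluster)
    return sorted(train), sorted(test)

# Hypothetical 2-D pocket embeddings: two tight pairs and one outlier.
pockets = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99], [-1.0, 0.0]]
train_idx, test_idx = cluster_split(pockets, threshold=0.8, test_fraction=0.4)
```

Because clusters are connected components, every train/test pair falls below the similarity threshold by construction.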

17:00-17:15
Are We Teaching Computational Biology Backwards? A Call for a Renaissance of Critical Thinking in the GenAI Era
Confirmed Presenter: Eva Caamano Gutierrez, University of Liverpool, United Kingdom


Authors List:

  • Eva Caamano Gutierrez, University of Liverpool, United Kingdom
  • David Hughes, University of Liverpool, United Kingdom
  • Christina Birch, University of Liverpool, United Kingdom
  • Emily Johnson, University of Liverpool, United Kingdom
  • Anthony Evans, University of Liverpool, United Kingdom
  • Elisabeth Deja, University of Liverpool, United Kingdom
  • Alasdair Ivens, University of Edinburgh, United Kingdom
  • Andrew Jones, University of Liverpool, United Kingdom

Presentation Overview:

Computational biology is still too often taught as
something that happens after data acquisition, reduced to a
collection of tools and procedural steps. As a result,
educational practices remain misaligned with the skills
required for rigorous, responsible, and sustainable
data-driven science. In the era of GenAI, this imbalance is
no longer merely inefficient; it is dangerous.

Drawing on feedback from our MRC-funded project BIOMEDASA,
we present evidence from two work packages. NURTURE, which
delivers outreach training to professionals who routinely
work with data but are not formally trained as data
specialists, exposes persistent gaps in critical skills.
Participants consistently report weaknesses in study
design, limited understanding of FAIR principles and
regulatory compliance, and uncertainty around the
appropriate use of AI-driven methods.

We also present preliminary findings from ANALYSE, a
national review of MSc programmes in biomedical data
science and related fields. Student perspectives reveal
limited exposure to real-world applications, challenges
with the use of GenAI, and other insights that reinforce a
broader pedagogical problem: bioinformatics is taught as a
set of techniques rather than as a way of thinking. Widely
used platforms such as Galaxy exemplify this tension,
enabling accessibility while most training focuses on “what
to click” instead of “why an analysis is appropriate”.

As GenAI systems increasingly generate code and analyses on
demand, critical judgement, based on a deep understanding
of systems, becomes the defining skill of the computational
biologist. But how do we, as a community, reclaim and
collectively nurture critical thinking as our most powerful
research capability?

17:15-18:00
Panel: AI Roundtable Discussions

Wednesday, April 22nd
9:00-9:20
Panel: AI Roundtable Discussion Outcomes and Highlights

9:20-9:35
Integrating Predicted and Experimental Structures: The Role of AlphaFold DB in Modern Structural Biology
Confirmed Presenter: Joseph Ellaway, EMBL-EBI, United Kingdom


Authors List:

  • Jennifer Fleming, EMBL-EBI PDBe, United Kingdom
  • Damian Bertoni, EMBL-EBI PDBe, United Kingdom
  • Maxim Tsenkov, EMBL-EBI PDBe, United Kingdom
  • Paulyna Magaña, EMBL-EBI PDBe, United Kingdom
  • Sreenath Nair, EMBL-EBI PDBe, United Kingdom
  • Ivanna Pidruchna, EMBL-EBI PDBe, United Kingdom
  • Marcelo Querino Lima Afonso, EMBL-EBI PDBe, United Kingdom
  • Adam Midlik, EMBL-EBI PDBe, United Kingdom
  • Urmila Paramval, EMBL-EBI PDBe, United Kingdom
  • Melanie Vollmar, EMBL-EBI PDBe, United Kingdom
  • Joseph Ellaway, EMBL-EBI, United Kingdom
  • Dare Lawal, EMBL-EBI PDBe, United Kingdom
  • Ahsan Tanweer, EMBL-EBI PDBe, United Kingdom
  • Sameer Velankar, EMBL-EBI PDBe, United Kingdom

Presentation Overview:

The AlphaFold Protein Structure Database (AFDB), developed
by EMBL-EBI and Google DeepMind, provides open,
proteome-scale access to high-accuracy protein structure
predictions, offering structural coverage for hundreds of
millions of sequences across UniProt reference proteomes.
AFDB delivers standardised coordinate files, unified
metadata, and detailed confidence metrics, including pLDDT
and predicted aligned error (PAE) plots, ensuring reliable
interpretation and downstream use. Structural coverage has
recently been expanded to include isoforms, and the
underlying multiple sequence alignments supporting each
prediction are now also provided, enabling deeper analysis
of conservation, co-evolution, and model support.
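
AlphaFold DB model files store the per-residue pLDDT in the B-factor column, so confidence can be read without special tooling. The sketch below parses fixed-width PDB ATOM records for CA atoms and maps scores onto the AFDB confidence bands (very high > 90, confident > 70, low > 50, very low otherwise). The helper `atom_line` is hypothetical and exists only to build demo records; real files would be downloaded from AFDB.

```python
def atom_line(serial, name, resseq, bfactor, chain="A", resname="ALA"):
    """Build a fixed-width PDB ATOM record (coordinates zeroed) for demonstration."""
    return (f"ATOM  {serial:5d}  {name:<3s} {resname:3s} {chain:1s}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")

def plddt_per_residue(pdb_text):
    """Return {(chain, residue number): pLDDT}, read from the B-factor field of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores

def confidence_band(plddt):
    """AlphaFold DB confidence band for a pLDDT score."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"

demo = "\n".join([
    atom_line(1, "N", 1, 95.10),
    atom_line(2, "CA", 1, 95.10),
    atom_line(3, "CA", 2, 64.00),
])
for (chain, resseq), score in plddt_per_residue(demo).items():
    print(chain, resseq, score, confidence_band(score))
```

For PAE, which is pairwise rather than per-residue, AFDB provides a separate JSON file that this sketch does not cover.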

A comprehensive redesign of the entry page enhances
usability and structural interpretation by integrating
annotations directly with an interactive Mol* viewer and
introducing dedicated Domains, Annotations, and Similar
Proteins tabs. The Similar Proteins tab presents
Foldseek-based structural homologues and clustered views of
evolutionarily related proteins, while the Annotations tab
displays AlphaMissense visualisation and a new system for
uploading user-defined annotations, creating a flexible
framework that will accommodate custom data. Programmatic
APIs, FTP and cloud-hosted datasets, and bulk download
options support large-scale computational workflows.
Integration with UniProt, PDBe and PDBe-KB lets researchers
place AFDB predictions within their broader biological and
experimental context.

AFDB continues to expand in partnership with the scientific
community, guided by three principles: filling gaps in
structural coverage; improving model accuracy and utility;
and addressing global challenges such as antimicrobial
resistance and food security. By prioritising model inclusion
and validation and fostering community-driven annotation,
AFDB endeavours to be a FAIR, knowledge-rich resource that
accelerates discovery and amplifies the impact of protein
structure data.

9:35-9:50
Phyre2.2: Predicting protein structure and protein/ligand interactions prediction in the AlphaFold era
Confirmed Presenter: Michael J E Sternberg, Imperial College London, United Kingdom


Authors List:

  • Harold R Powell, Imperial College London, United Kingdom
  • Suhail A Islam, Imperial College London, United Kingdom
  • Alessia David, Imperial College London, United Kingdom
  • Michael J E Sternberg, Imperial College London, United Kingdom

Presentation Overview:

Phyre2.2 is an updated version of our template-based
protein structure prediction server, which is widely used,
with over 35K users in 2025. We report two developments.
The first enables users to input their sequence and find
the closest AlphaFold model, which then serves as the
template to yield the predicted model for the query.
The second, implemented in Phyre2.2 Ligand (to be launched
shortly), predicts the location of a ligand in the
predicted structure. There are two approaches. The first
transplants a ligand from the PDB template into the
predicted model. The second lets the user choose a
ligand: the user is presented with a list of ligands in
the template and details of any ligand listed in UniProt,
and can also input any ligand of their choice. Cavities
are identified in the predicted structure, and the ligand
is then docked into the selected cavity using AutoDock
Vina. Ligand placement is presented to the user via a
molecular graphics interface based on JSMol, and the
coordinates can be downloaded.
Phyre2.2 is freely available to all users, including
commercial users, at https://www.sbg.bio.ic.ac.uk/phyre2/ .
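
The exact docking setup used by Phyre2.2 Ligand is not described. As a rough sketch of one common way to dock into a detected cavity with AutoDock Vina, an axis-aligned search box can be placed around the cavity's points with some padding and written as a Vina configuration file. The padding, exhaustiveness, file names, and cavity coordinates below are assumptions.

```python
def vina_box(cavity_points, padding=4.0):
    """Axis-aligned search box (center, size) enclosing cavity points plus padding (Å)."""
    xs, ys, zs = zip(*cavity_points)
    center = tuple((min(c) + max(c)) / 2 for c in (xs, ys, zs))
    size = tuple(max(c) - min(c) + 2 * padding for c in (xs, ys, zs))
    return center, size

def vina_config(receptor, ligand, cavity_points, exhaustiveness=8, padding=4.0):
    """Render a Vina config file body for a box around the cavity."""
    center, size = vina_box(cavity_points, padding)
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, c, s in zip("xyz", center, size):
        lines += [f"center_{axis} = {c:.3f}", f"size_{axis} = {s:.3f}"]
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"

# Hypothetical cavity points (Å) and PDBQT file names.
cavity = [(10.0, 12.5, 3.0), (14.0, 15.5, 7.0), (12.0, 13.0, 5.5)]
print(vina_config("model.pdbqt", "ligand.pdbqt", cavity))
```

Saved as, say, `box.txt`, the file could be passed to the Vina command-line program with `vina --config box.txt`; receptor and ligand must first be prepared in PDBQT format.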

9:50-10:05
FAIRDOM-SEEK: Platform for FAIR data and research asset management
Confirmed Presenter: Munazah Andrabi, The University of Manchester, United Kingdom


Authors List:

  • Munazah Andrabi, The University of Manchester, United Kingdom
  • Stuart Owen, The University of Manchester, United Kingdom
  • Carole Goble, The University of Manchester, United Kingdom

Presentation Overview:

As research becomes more data-driven, collaborative, and
interdisciplinary, the need for structured, accessible, and
well-curated data outputs with rich, standardised metadata
is critical to ensure data is discoverable and reusable
beyond its original context. The FAIRDOM-SEEK platform
addresses these challenges by providing a customisable,
open-source, web-based catalogue designed to support FAIR
(Findable, Accessible, Interoperable, Reusable) data and
asset management.

FAIRDOM-SEEK enables scientists to organize, document,
share, and publish research data using the Investigation,
Study, Assay (ISA) framework, which structures experiments
and related assets such as samples and protocols. Key features
include robust metadata and sample management, version
control, linking to external repositories, and integration
with modeling tools. Controlled sharing and DOI creation
further enhance collaboration and long-term accessibility.
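
To make the ISA nesting concrete, the sketch below models Investigation → Study → Assay as plain Python dataclasses, with samples, protocols, and data files attached to assays. This is a hypothetical in-memory illustration of the framework's hierarchy, not the FAIRDOM-SEEK data model or API; all names and example entries are invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Assay:
    """One measurement or analysis, with the assets it uses or produces."""
    title: str
    samples: List[str] = field(default_factory=list)
    protocols: List[str] = field(default_factory=list)
    data_files: List[str] = field(default_factory=list)

@dataclass
class Study:
    """A unit of research grouping related assays."""
    title: str
    assays: List[Assay] = field(default_factory=list)

@dataclass
class Investigation:
    """Top-level research question grouping related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_samples(self):
        """Every sample referenced by any assay, without duplicates, in first-seen order."""
        seen = []
        for study in self.studies:
            for assay in study.assays:
                for sample in assay.samples:
                    if sample not in seen:
                        seen.append(sample)
        return seen

inv = Investigation("Yeast stress response", [
    Study("Heat shock", [
        Assay("RNA-seq", samples=["S1", "S2"], protocols=["rna-extraction-v2"]),
        Assay("Proteomics", samples=["S2", "S3"]),
    ]),
])
print(inv.all_samples())
```

The point of the hierarchy is that a shared sample (here `S2`) is linked from several assays rather than duplicated, which is what lets metadata be traced across experiments.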

The platform supports the creation of dedicated Project
Hubs, which are customised local instances deployed for
specific projects. These allow tailored use of the
platform’s core capabilities, including modified
appearance, structure, and content. Notable examples of
hubs include IBISBAHub, WorkflowHub, NFDI4Health Local
DataHub and DataHub. MIT BioMicroCenter has integrated the
platform to streamline data and sample management for their
ongoing research initiatives. In addition, FAIRDOMHub, the
flagship public instance, serves over 400 national and
international projects as both a repository and a
knowledge-sharing platform, promoting interdisciplinary
collaboration and community engagement. As a core resource
for many European organisations (e.g. de.NBI, ELIXIR) and
international consortia, FAIRDOMHub plays a vital role in
research data management. In the talk, we will present the
salient features of FAIRDOM-SEEK and highlight how it
facilitates FAIR data management.

10:05-10:20
Royal Society journals and open access publishing
Confirmed Presenter: Jessica Miller, Royal Society Publishing, United Kingdom


Authors List:

  • Jessica Miller, Royal Society Publishing, United Kingdom

Presentation Overview:

The Royal Society is the UK's academy of the sciences and
has been publishing scientific articles for 360 years. To
this day, it publishes internationally leading journals
across the sciences, including cutting-edge
cross-disciplinary work, particularly in Journal of the
Royal Society Interface and Interface Focus. In this talk,
you will learn more about the Royal Society and its journals,
including how we support authors and the exciting work we
publish from researchers worldwide. We will also provide
information on our extensive support for open access
publishing.

About the speaker: Following a Master's in science
communication, Jessica Miller moved into scientific
publishing. Now with over 10 years’ experience in the
field, Jessica is passionate about working directly with
multidisciplinary science teams. She has worked with
the interdisciplinary journals Journal of the Royal Society
Interface and Interface Focus for 6 years.

10:20-10:25
A foundation model to study the molecular principles of codon usage in eukaryotes
Confirmed Presenter: Susanne Bornelöv, Department of Biochemistry, University of Cambridge, UK, United Kingdom


Authors List:

  • Tirtharaj Dash, Department of Biochemistry, University of Cambridge, UK, United Kingdom
  • Toby Clark, Department of Biochemistry, University of Cambridge, UK, United Kingdom
  • Gautam, Department of Biochemistry, University of Cambridge, UK, United Kingdom
  • Susanne Bornelöv, Department of Biochemistry, University of Cambridge, UK, United Kingdom

Presentation Overview:

Understanding how DNA and RNA sequences encode quantitative aspects of gene expression remains a central challenge in molecular biology. I will present our effort to build a foundation model for coding sequences, designed to learn functional and evolutionary sequence representations that capture how synonymous codon choices influence mRNA stability and protein expression. Using an improved BERT architecture and limiting training to eukaryotic sequences, our model achieves state-of-the-art performance on downstream mRNA-related tasks such as mRNA stability and mRNA abundance predictions. Moreover, we show that by combining embeddings from our coding sequence model with embeddings from UTR models, we can further improve mRNA stability predictions to levels achieved by a previously published hybrid convolution and recurrent neural network, Saluki, built specifically for this task. Together, our work highlights how a carefully designed foundation model can achieve performance well above that of other recently published models, and be used as an in-silico system to study the rules underlying mRNA regulation and fate.
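
The abstract does not say how the model tokenizes coding sequences. Because the focus is synonymous codon choice, a codon-level vocabulary is one natural option, and it is assumed in the sketch below: each of the 64 codons gets its own token, so synonymous codons such as GCU and GCC (both alanine) remain distinguishable to a BERT-style encoder. The special tokens and their order are illustrative assumptions.

```python
from itertools import product

# Assumed BERT-style special tokens followed by all 64 RNA codons.
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]", "[MASK]", "[UNK]"]
CODONS = ["".join(c) for c in product("ACGU", repeat=3)]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + CODONS)}

def tokenize_cds(seq):
    """Split a coding sequence (DNA or RNA letters) into codon token IDs."""
    seq = seq.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    ids = [VOCAB.get(c, VOCAB["[UNK]"]) for c in codons]  # e.g. codons with N map to [UNK]
    return [VOCAB["[CLS]"]] + ids + [VOCAB["[SEP]"]]

print(tokenize_cds("ATGGCTGCC"))  # Met, Ala (GCU), Ala (GCC)
```

A nucleotide- or amino-acid-level vocabulary would either dilute or erase the synonymous-codon signal, which is the motivation for this design.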

10:25-10:30
A Sneaky Peek at the CRUK Data Hub 
Confirmed Presenter: Frances Pearl, University of Sussex, United Kingdom


Authors List:

  • Sarah Wooller, University of Sussex, United Kingdom
  • Ayoola Olojede, University of Sussex, United Kingdom
  • Sanika Raut, CRUK, United Kingdom
  • Leslie Glass, HDR-UK, United Kingdom
  • Joseph Day, Cancer Research UK, United Kingdom
  • Eytan Kovelr, CRUK, United Kingdom
  • Andrew Blake, Oxford University, United Kingdom
  • Loki Sinclaire, HDR-UK, United Kingdom
  • Peter Harrison, HDR-UK, United Kingdom
  • Frances Pearl, University of Sussex, United Kingdom

Presentation Overview:

One of the biggest problems in bioinformatics today is
finding and accessing suitable datasets. To address this
problem in the field of cancer data, CRUK and the
Bioinformatics Lab at the University of Sussex are working
together to develop an online CRUK Data Hub where
researchers can search for information about cancer
datasets funded by CRUK. Fully compatible with the UK
Health Data Research Gateway, the Hub will provide details
on the datasets and how to access them, making it easier
for researchers to find and reuse cancer datasets,
fostering collaboration and the sharing of data. This
resource network will ultimately lead to more effective
research and, most importantly, better outcomes for cancer
patients. This is a chance for a sneaky peek at the
pilot hub and an opportunity to influence the final
product so that it better serves your needs.

10:30-10:35
Pandemic-scale phylogenetics
Confirmed Presenter: Nicola De Maio, EMBL-EBI, United Kingdom


Authors List:

  • Nicola De Maio, EMBL-EBI, United Kingdom

Presentation Overview:

Large-scale genomic epidemiological data can reveal details
of pathogen transmission and evolution, such as the
emergence of new variants and antimicrobial resistance.
However, analysing large collections of genomes is
computationally challenging, particularly in phylogenetics
and phylodynamics. Here I will present recent progress in
massively scalable phylogenetic inference, in the
measurement and representation of phylogenetic uncertainty,
and in improving phylogenetic accuracy by modelling
recurrent mutations and sequence errors. Altogether these
advances enable rapid, accurate, and robust pathogen
surveillance.

11:45-12:00
Federated Learning Approaches to Biomedical Knowledge Discovery
Confirmed Presenter: Gamze Gursoy, University of Cambridge, United Kingdom


Authors List:

  • Gamze Gursoy, University of Cambridge, United Kingdom

Presentation Overview:

The rapid expansion of omics technologies, coupled with the
growing availability of structured medical records, creates
unprecedented opportunities to deepen our understanding of
health and disease. Yet these advances also raise
formidable challenges: protecting patient privacy and
enabling the integration of sensitive data across
institutions. In this talk, I will present our lab’s work
on privacy-preserving informatics and machine learning
methods that enable critical biomedical analyses without
requiring raw data to be centralized or shared. I will
highlight techniques such as federated learning and secure
multiparty computation that make it possible to discover
new knowledge while maintaining strong privacy guarantees.
Finally, I will discuss how standard federated learning
often breaks down under real-world distributional shifts,
and introduce novel approaches we have developed to address
these limitations.
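
As a baseline for what "standard federated learning" usually means, the sketch below implements federated averaging (FedAvg) on a toy one-parameter regression: each site runs a few local gradient steps on its private data, and the server averages only the resulting weights, weighted by each site's data size. The sites, learning rate, and epochs are hypothetical, and both toy sites share the same underlying relationship; the distribution shifts discussed in the talk, and the lab's own methods, are not modelled here.

```python
def local_update(w, data, lr=0.05, epochs=5):
    """Full-batch gradient descent on mean squared error for y ≈ w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fedavg_round(w_global, client_datasets, lr=0.05, epochs=5):
    """One FedAvg round: local training at each site, then a size-weighted average."""
    local_ws = [local_update(w_global, data, lr, epochs) for data in client_datasets]
    sizes = [len(data) for data in client_datasets]
    return sum(w * n for w, n in zip(local_ws, sizes)) / sum(sizes)

# Hypothetical sites: raw (x, y) pairs never leave them; only w is shared.
clients = [
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    [(0.5, 1.0), (1.0, 2.0)],
]
w = 0.0
for _ in range(50):
    w = fedavg_round(w, clients)
print(w)
```

When sites' data come from different distributions, their local optima pull the averaged model in conflicting directions, which is one intuition for why plain averaging can break down.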

12:00-12:15
Mind your own binding: computational prediction of paratope-epitope interfaces
Confirmed Presenter: Montader Ali, University of Cambridge, United Kingdom


Authors List:

  • Samo Miklavc
  • Eva Smorodina, University of Oslo, Norway
  • Montader Ali, University of Cambridge, United Kingdom
  • Klara Kropivsek, University of Nova Gorica, Slovenia
  • Leonardo Salicari, CINECA, Italy
  • Aibek Kappassov, Imperial College London, United Kingdom
  • Chengcheng Fu, University of Oslo, Norway
  • Pietro Sormanni, Imperial College London, United Kingdom
  • Ario de Marco, University of Nova Gorica, Slovenia
  • Victor Greiff, University of Oslo, Norway

Presentation Overview:

Antibodies are key immunotherapeutic biomolecules
distinguished by their ability to bind antigens with high
specificity. Although computational methods for protein
structure prediction have advanced considerably, accurately
predicting the precise paratope-epitope interface, where
antibody CDR loops interact with antigen surfaces, remains
a major challenge. Reliable computational prediction of
these binding sites would transform antibody design and
therapeutic development. In this study, we evaluate the
performance of current structure prediction and
protein-protein docking tools in identifying correct
paratope-epitope pairs and modeling antibody–antigen
complexes. We assess their accuracy in prioritizing true
binding interfaces and highlight key limitations and
opportunities for improvement. Our findings provide
critical insights into the current capabilities of
computational paratope-epitope prediction and offer
guidance for the next generation of models aimed at
improving antibody–antigen interaction modeling for
therapeutic applications.

12:15-12:30
Chemistry Aware AI Model for Interpretable siRNA Engineering and Activity Prediction
Confirmed Presenter: Aparajita Karmakar, The Rosalind Franklin Institute, United Kingdom


Authors List:

  • Aparajita Karmakar, The Rosalind Franklin Institute, United Kingdom
  • Angus Weir, The Rosalind Franklin Institute, United Kingdom
  • Alex Lubbock, The Rosalind Franklin Institute, United Kingdom
  • Grzegorz Kudla, The University of Edinburgh, United Kingdom
  • Mark Basham, The Rosalind Franklin Institute, United Kingdom

Presentation Overview:

Small interfering RNAs (siRNAs) are a promising therapeutic
approach because they silence genes at the mRNA level,
enabling targeting of pathways that are difficult to
address with conventional drugs. Their specificity and ease
of synthesis support clinical use, but natural siRNAs often
suffer from limited stability, delivery, and potency.
Chemical modifications can overcome these issues, yet
identifying optimal designs is costly and labour-intensive.
Computational and automated methods therefore provide a
faster, more cost-effective strategy for developing
improved siRNA candidates.

In this study, we developed a computational pipeline to
link the chemical properties of modified sequences with
target gene knockdown. Molecular fingerprints and
chemistry-focused language models (ChemBERTa, MolBERT) were
used to generate high-dimensional embeddings, which were
fed into a convolutional neural network for efficacy
prediction. The model achieved an AUC of up to 0.86.
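
The reported AUC can be read as a probability: the chance that a randomly chosen active siRNA is scored above a randomly chosen inactive one. The sketch below computes ROC AUC directly from that Mann–Whitney definition (ties count one half). It is a generic illustration with hypothetical labels and scores, not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann–Whitney U statistic (1 = active, 0 = inactive)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical knockdown labels and model scores.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

This pairwise loop is O(n²) and meant for clarity; rank-based implementations compute the same value in O(n log n).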

While our deep learning model can predict the activity of
any modified siRNA, it is equally important to elucidate
the chemical features that drive these outcomes. Gaining
this mechanistic insight is crucial for chemists aiming to
design candidates more intuitively, yet this dimension has
remained largely unexplored. Therefore, we applied SHAP
analysis to our model to identify sequence-chemical
features that are associated with its binding affinity.
These insights highlight underlying patterns in siRNA
chemistry and can inform improved siRNA design and
accelerate therapeutic discovery.

12:30-12:35
Training a force field for proteins and small molecules from scratch
Confirmed Presenter: Joe Greener, MRC Laboratory of Molecular Biology, United Kingdom


Authors List:

  • Joe Greener, MRC Laboratory of Molecular Biology, United Kingdom
  • Alexandre Blanco-González, MRC Laboratory of Molecular Biology, United Kingdom
  • Thea Schulze, MRC Laboratory of Molecular Biology, United Kingdom
  • Evianne Rovers, MRC Laboratory of Molecular Biology, United Kingdom

Presentation Overview:

Force fields for molecular dynamics are usually developed manually, limiting their transferability and making systematic exploration of functional forms challenging. We developed a graph neural network that assigns all force field parameters for diverse molecules using continuous atom typing. The freely available model, called Garnet, was trained on quantum mechanical, condensed-phase and protein nuclear magnetic resonance data without the use of existing parameters. The resulting force field performs comparably to current force fields on small molecules, folded proteins, protein complexes and disordered proteins. It gives results similar to popular approaches for relative binding free energy predictions across a range of targets. Assessing different functional forms shows that the double exponential potential is a flexible and accurate alternative to the Lennard-Jones potential. Garnet provides a platform for automated, reproducible force field discovery that brings the benefits of machine learning to classical force fields.
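
To show the two functional forms being compared, the sketch below writes both in terms of the well depth ε and the minimum position r_min, so each reaches −ε at r_min. The Lennard-Jones form is the standard 12-6 potential; the double exponential is written in the commonly used form with shape parameters α and β, whose default values here (16.766, 4.427) are assumed literature defaults, not Garnet's fitted values. Units are arbitrary.

```python
import math

def lennard_jones(r, epsilon, r_min):
    """12-6 Lennard-Jones written with the minimum position r_min (= 2**(1/6) * sigma)."""
    x = r_min / r
    return epsilon * (x ** 12 - 2 * x ** 6)

def double_exponential(r, epsilon, r_min, alpha=16.766, beta=4.427):
    """Double exponential potential; alpha and beta shape the repulsive and attractive parts."""
    rep = beta * math.exp(alpha) / (alpha - beta) * math.exp(-alpha * r / r_min)
    att = alpha * math.exp(beta) / (alpha - beta) * math.exp(-beta * r / r_min)
    return epsilon * (rep - att)

# Hypothetical pair parameters: well depth 0.2, minimum at 3.5.
for r in (3.0, 3.5, 4.0, 5.0):
    print(r, lennard_jones(r, 0.2, 3.5), double_exponential(r, 0.2, 3.5))
```

Unlike the fixed 12-6 exponents, α and β let the steepness of the repulsive wall and the range of attraction be tuned separately, which is the flexibility the abstract refers to.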

12:35-12:40
Allosteric Communication and Kinetic Regulation in Membrane Proteins

Authors List:

  • Hossein Batebi, Free University of Berlin, Germany

Presentation Overview:

A major open question in GPCR biology is how small chemical
changes, such as ligand protonation or nucleotide exchange,
trigger the structural shifts that determine signaling.
High-resolution structures show us the endpoints, but they
don't explain how these long-range effects actually happen.
Even when we characterize intermediate states using
extensive MD simulations or time-resolved cryo-EM (as we did
recently with the β AR complex), standard analyses only show
correlated motions; they fail to map the specific routes
that mechanical information takes. Additionally, while enhanced
sampling methods help model these dynamics at a lower cost,
they typically generate free-energy profiles that ignore
friction. This is a significant oversight because spatially
varying friction heavily influences the speed of
conformational changes. As a result, we still cannot fully
link chemistry at one site to signaling kinetics tens of
ångströms away. To bridge this gap, we developed two
complementary methods. The first is a force-flow framework
that tracks how mechanical signals propagate, identifying
the specific residues acting as relay points. The second is
a friction-aware kinetic model that computes
position-dependent friction to predict accurate timescales.
This approach reveals kinetic bottlenecks that standard
energy profiles miss. Together, these tools connect
structure to kinetics, offering a quantitative way to
understand the mechanisms regulating GPCR activation.
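
The authors' friction-aware model is not specified in the abstract. As a generic textbook illustration of why friction matters, the sketch below computes the mean first-passage time for overdamped 1D diffusion on a free-energy profile G(x) with position-dependent friction γ(x), using the standard formula τ = ∫ₐᵇ dx e^{G(x)/kT} / D(x) ∫ₐˣ e^{−G(y)/kT} dy with D(x) = kT/γ(x), a reflecting boundary at a and an absorbing one at b. The profiles, units, and grid size are assumptions.

```python
import math

def mfpt_1d(free_energy, friction, a, b, kT=1.0, n=2000):
    """Mean first-passage time from a (reflecting) to b (absorbing).

    Overdamped 1D diffusion with D(x) = kT / gamma(x):
        tau = int_a^b exp(G(x)/kT) / D(x) * [int_a^x exp(-G(y)/kT) dy] dx
    Both integrals use the trapezoid rule on a uniform grid of n intervals.
    """
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    boltzmann = [math.exp(-free_energy(x) / kT) for x in xs]
    # Cumulative inner integral I(x_i) = int_a^{x_i} exp(-G/kT) dy.
    inner = [0.0]
    for i in range(1, n + 1):
        inner.append(inner[-1] + 0.5 * h * (boltzmann[i - 1] + boltzmann[i]))
    # Outer integrand exp(G/kT) / D(x) * I(x), using 1 / D(x) = gamma(x) / kT.
    outer = [math.exp(free_energy(x) / kT) * friction(x) / kT * inner[i]
             for i, x in enumerate(xs)]
    return sum(0.5 * h * (outer[i - 1] + outer[i]) for i in range(1, n + 1))

flat = lambda x: 0.0
barrier = lambda x: 3.0 * math.exp(-((x - 0.5) / 0.1) ** 2)  # hypothetical 3 kT barrier
uniform = lambda x: 1.0
sticky = lambda x: 2.0 if 0.4 < x < 0.6 else 1.0  # hypothetical high-friction patch
print(mfpt_1d(barrier, uniform, 0.0, 1.0), mfpt_1d(barrier, sticky, 0.0, 1.0))
```

The two printed times share the same free-energy profile yet differ, which is exactly the kinetic information that a friction-free energy profile would miss.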

12:40-12:45
InterProScan 6: a modern large-scale protein function annotation pipeline
Confirmed Presenter: Matthias Blum, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom


Authors List:

  • Matthias Blum, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Emma Hobbs, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Laise Cavalcanti Florentino, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Alex Bateman, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom

Presentation Overview:

InterProScan is a widely used software system for
large-scale protein function annotation, forming a core
component of annotation pipelines at resources such as
UniProt, Ensembl, and MGnify. After more than a decade of
use, InterProScan 5 faced increasing challenges related to
scalability, deployment complexity, data management, and
integration with modern computational infrastructures.

We present InterProScan 6, a complete reimplementation of
the pipeline as a Nextflow-based workflow designed to
support contemporary bioinformatics use cases. The new
architecture improves portability, reproducibility, and
scalability across local workstations, high-performance
computing (HPC) systems, and cloud environments.
InterProScan 6 decouples pipeline logic from signature
data, enabling flexible management of InterPro releases and
on-demand data retrieval, and provides native support for
containerised execution using Docker, Singularity, and
Apptainer.

Performance benchmarking across nine reference proteomes
spanning bacteria to large eukaryotes shows consistent
reductions in wall-clock runtime relative to InterProScan
5, with approximately two-fold speedups on complex
eukaryotic proteomes. When pre-computed annotations are
available, integration with a redesigned Matches API
reduces runtimes to minutes. Concordance
analysis across the full Swiss-Prot dataset demonstrates
that InterProScan 6 reproduces InterProScan 5 annotations
with near-identical precision and sensitivity across all
InterPro member databases.
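The concordance analysis compares two annotation runs over the same sequences. Below is a minimal sketch of the comparison. It assumes each match is reduced to a hypothetical (protein, member database, signature) triple and treats InterProScan 5 output as the reference. The tuple layout and function names are illustrative only and are not part of InterProScan's output format or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical match representation: (protein accession, member database, signature accession).
Match = Tuple[str, str, str]


def precision_sensitivity(reference: Set[Tuple[str, str]],
                          new: Set[Tuple[str, str]]) -> Tuple[float, float]:
    """Precision and sensitivity of `new`, treating `reference` as the truth set.

    Empty sets score 1.0 by convention (nothing reported wrongly / nothing missed).
    """
    tp = len(reference & new)  # matches reported by both runs
    precision = tp / len(new) if new else 1.0
    sensitivity = tp / len(reference) if reference else 1.0
    return precision, sensitivity


def concordance_by_database(reference: Iterable[Match],
                            new: Iterable[Match]) -> Dict[str, Tuple[float, float]]:
    """Per-member-database (precision, sensitivity) between two annotation runs."""
    ref_db: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    new_db: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for protein, db, signature in reference:
        ref_db[db].add((protein, signature))
    for protein, db, signature in new:
        new_db[db].add((protein, signature))
    databases = sorted(set(ref_db) | set(new_db))
    return {db: precision_sensitivity(ref_db[db], new_db[db]) for db in databases}
```

Reporting the scores per member database, rather than pooled, keeps a regression in one database from being masked by the others.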

InterProScan 6 provides a robust and future-proof platform
for efficient, flexible, and reproducible genome-scale
protein function annotation.

12:45-13:10
Invited Presentation: BioFAIR

Authors List:

  • Ed Clark, Eva Wan
14:25-14:40
REMAG: recovery of eukaryotic genomes from metagenomes using contrastive learning
Confirmed Presenter: Daniel Gómez Pérez, Earlham Institute, United Kingdom


Authors List:

  • Daniel Gómez Pérez, Earlham Institute, United Kingdom
  • Sebastien Raguideau, Earlham Institute, United Kingdom
  • Falk Hildebrand, Quadram Institute Bioscience, United Kingdom
  • Christopher Quince, Earlham Institute, United Kingdom

Presentation Overview:

Metagenome-assembled genomes (MAGs) are central to exploring microbial communities. Yet, despite the relevance of protists and fungi to many diverse communities, eukaryotic MAG recovery lags behind that of prokaryotes. A major bottleneck is that most state-of-the-art binning pipelines rely exclusively on prokaryotic single-copy core gene reference databases and are optimized for smaller genomes. We present REMAG (Recovery of Eukaryotic MAGs), a tool designed to recover high-quality eukaryotic genomes from metagenomic data. REMAG leverages fine-tuned HyenaDNA genomic foundation models to efficiently filter eukaryotic contigs. It then employs a dual-encoder Siamese network trained with Barlow Twins contrastive loss to learn a shared embedding space from contig composition and differential coverage. High-quality bins are extracted using greedy iterative Leiden clustering optimized with eukaryotic single-copy core gene constraints. In benchmarks based on simulated and real prokaryotic/eukaryotic datasets of varying sizes, we demonstrate REMAG's ability to recover more near-complete eukaryotic genomes than similar state-of-the-art tools, which often produce highly fragmented eukaryotic bins. Overall, our approach provides a reference-free method for eukaryotic binning that scales well with the increasing size and sequencing depth of metagenomic datasets.
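REMAG trains its dual-encoder network with the Barlow Twins loss. Take two batches of embeddings, z_a and z_b, with one row per contig, produced from two views of the same contigs. Barlow Twins standardizes each embedding dimension across the batch and computes the cross-correlation matrix C. It then penalizes (1 - C_ii)^2 on the diagonal plus lambda * C_ij^2 off the diagonal, so that matching views agree while different dimensions stay decorrelated. The pure-Python sketch below implements that loss on plain lists. Two details are assumptions here: how REMAG builds the two views, and the weight lambda = 0.005, which is the value suggested in the original Barlow Twins paper. A real implementation would use a tensor library with automatic differentiation.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = contigs in the batch, columns = embedding dimensions


def _standardize(z: Matrix, eps: float = 1e-8) -> Matrix:
    """Center and scale each embedding dimension across the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [z[i][j] for i in range(n)]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mean) / (std + eps)
    return out


def barlow_twins_loss(z_a: Matrix, z_b: Matrix, lambd: float = 5e-3) -> float:
    """Barlow Twins objective: drive the cross-correlation matrix towards identity."""
    a, b = _standardize(z_a), _standardize(z_b)
    n, d = len(a), len(a[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2   # invariance: the two views should agree
            else:
                loss += lambd * c_ij ** 2   # redundancy reduction: decorrelate dimensions
    return loss
```

Identical, already-decorrelated embeddings give a loss of essentially zero. Views whose dimensions do not correlate give at least 1 per mismatched dimension.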

14:40-14:55
Protal: Ultra-fast metagenomic profiling and strain-resolved analysis
Confirmed Presenter: Joachim Fritscher, M3 Research Center Tuebingen, Quadram Institute Bioscience
Norwich, Earlham Institute Norwich, Germany


Authors List:

  • Joachim Fritscher, M3 Research Center Tuebingen, Quadram Institute Bioscience
    Norwich, Earlham Institute Norwich, Germany
  • Anthony Duncan, Quadram Institute Bioscience Norwich, Earlham Institute
    Norwich, United Kingdom
  • Falk Hildebrand, Quadram Institute Bioscience Norwich, Earlham Institute
    Norwich, United Kingdom

Presentation Overview:

Efficient high-resolution metagenomic profiling is becoming
more important, as larger metagenomic studies allow for
finely resolved bacterial association studies. Yet most
contemporary profilers trade off sensitivity, precision,
the breadth of represented taxa, and computational
efficiency against one another. To address these
challenges, we developed protal (profiling through
alignment), a strain-resolved metagenomic profiler that
excels in all four categories, providing fast, precise,
strain-resolved taxonomic profiling covering the full range
of taxa in GTDB r214.

This is made possible by combining a new alignment
algorithm, machine learning, the standardized and
regularly updated GTDB taxonomy, and conserved bacterial
marker genes. Protal can
reliably profile all 80,789 GTDB species, with profiles
more precise than all other tested contemporary profilers
(MetaPhlAn4, mOTUs3, bracken, sylph). In our benchmarks,
protal offered the highest species-level precision (0.989)
and sensitivity (0.951) of all tested tools. Further, we
demonstrate that protal reconstructed intra-specific
phylogenies are at least as precise as those of other
profilers. Compared to other alignment-based profilers
capable of de novo strain-resolved phylogenetics
(MetaPhlAn4+StrainPhlAn4), protal is at least 5x and 70x
faster at species- and strain-level profiling, respectively.
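A common way to turn alignments against conserved marker genes into a species-level profile is sketched below. This is a hypothetical illustration only. Protal's own approach combines a new aligner with machine learning, and the abstract does not describe it in enough detail to reproduce. The sketch converts reads aligned to each species' marker genes into per-gene depth and requires a minimum number of detected markers before calling a species present. It then takes the median depth as that species' abundance. The read length, the `min_markers` threshold, and the input layout are all assumptions.

```python
from statistics import median
from typing import Dict, List, Tuple


def marker_abundance(
    markers: Dict[str, List[Tuple[int, int]]],  # species -> [(aligned reads, gene length bp), ...]
    read_length: int = 150,                     # assumption: fixed read length
    min_markers: int = 3,                       # assumption: detection threshold
) -> Dict[str, float]:
    """Relative abundances from marker-gene depth (hypothetical sketch)."""
    depth_by_species: Dict[str, float] = {}
    for species, hits in markers.items():
        # Per-gene depth = bases aligned / gene length; undetected genes are skipped.
        depths = [reads * read_length / length for reads, length in hits if reads > 0]
        if len(depths) >= min_markers:
            # The median is robust to a single mis-mapped or multi-copy marker.
            depth_by_species[species] = median(depths)
    total = sum(depth_by_species.values())
    if total == 0:
        return {}
    return {sp: d / total for sp, d in depth_by_species.items()}
```

Requiring several independent markers is what keeps precision high. A species supported by only one marker hit, which is often a spurious alignment, is never reported.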

14:55-15:10
Detecting signatures underlying the composition of biological data
Confirmed Presenter: Anthony Duncan, Earlham Institute, Quadram Institute, United Kingdom


Authors List:

  • Anthony Duncan, Earlham Institute, Quadram Institute, United Kingdom
  • Wing Koon, Quadram Institute, United Kingdom
  • Katarzyna Sidorczuk, Quadram Institute, United Kingdom
  • Christopher Quince, Earlham Institute, Quadram Institute, United Kingdom
  • Clémence Frioux, Inria, France
  • Falk Hildebrand, Earlham Institute, Quadram Institute, United Kingdom

Presentation Overview:

Biological community data is inherently multidimensional
and therefore difficult to visualize and interpret. To
allow for the automatic decomposition of large surveys of
community composition into ‘signatures’ which capture
gradients in co-occurring features, we developed a new
software package ‘cvaNMF’. Our benchmarks on synthetic data
show the effectiveness of cross-validation and our novel
signature-similarity method to identify a suitable
decomposition using non-negative matrix factorization
(NMF). This software provides a complete set of tools to
identify and visualize biologically informative signatures
which we demonstrate in a wide range of microbial and
cellular datasets, ranging in size from under 100 to over
16,000. We detected ‘Enterosignatures’ in gut metagenomes
which differentiated human hosts with diverse diseases. We
found five ‘terrasignatures’ from rhizosphere metagenomes
which differentiated root- or soil-associated microbiomes,
while being refined enough to infer geographic distances
between plants. Large-scale data representing 25 biomes
were decomposed into environmental and host-associated
microbiomes based on five newly discovered signatures.
Finally, analysis of the cell composition of non-small cell
lung cancer samples allowed separation of cancerous and
inflamed tissues based on four cell-type signatures. cvaNMF
is a Python package available on Bioconda, along with an
accompanying Nextflow pipeline for large-scale data analysis.
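NMF factorizes a non-negative matrix V, for example features such as taxa by samples, into two non-negative factors. W is features by k signatures, and H is k signatures by samples, so each sample is an additive mix of signatures. The sketch below is the standard Lee-Seung multiplicative-update NMF under a squared-error objective, in pure Python for a small matrix. The loss, initialization, and iteration count are assumptions. cvaNMF's cross-validation and signature-similarity steps for choosing k are not shown.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v: Matrix, rank: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Lee-Seung multiplicative updates; W, H stay non-negative by construction."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h
```

Because every update multiplies by a non-negative ratio, W and H never go negative. This is why each signature can be read as a set of co-occurring features with additive weights.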

15:10-15:15
COTAN: scRNA-seq comprehensive workflow based on gene correlations
Confirmed Presenter: Silvia Giulia Galfre', University of Pisa, Italy


Authors List:

  • Silvia Giulia Galfre', University of Pisa, Italy
  • Marco Fantozzi, University of Parma, Italy
  • Alina Sirbu, Department of Computer Science, University of Pisa, Italy
  • Irene Testa, University of Pisa, Italy
  • Matteo Tolloso, University of Pisa, Italy
  • Andrea Alberti, University of Pisa, Italy
  • Corrado Priami, University of Pisa, Italy
  • Francesco Morandin, University of Pisa, Italy

Presentation Overview:

Motivation:
Single-cell RNA sequencing (scRNA-seq) generates extremely
sparse UMI count matrices, where many biologically
important regulators are expressed at very low levels.
Standard workflows borrowed from bulk RNA-seq
(normalization, log-transformation, and feature filtering)
can distort the information and systematically discard
low-abundance but informative genes. Methods that model
sparsity directly and avoid imputation are therefore
desirable.

Results:
We present the comprehensive COTAN workflow, a
contingency-table based framework that models zero counts
directly and does not use log-normalization, scaling, or
imputation. COTAN is built on a robust gene-gene
co-expression matrix with fewer spurious correlations on
UMI scRNA-seq data, improves marker detection without the
need to remove any genes, and introduces highly sensitive
scoring via the Global Differentiation Index (GDI) that
enables detection of heterogeneous cell groups and
statistical assessment of cluster homogeneity. Together,
these advances provide a biologically grounded, end-to-end
solution for scRNA-seq analysis.
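COTAN works from contingency tables of zero versus non-zero UMI counts rather than from normalized values. The sketch below illustrates only that idea. It builds the 2x2 detection table for two genes and converts it into a signed, phi-like co-expression score, using the independence expectation from the table margins. That expectation is an assumption made for illustration. COTAN derives its expected tables from a statistical model of UMI counts, so this score is not the package's own statistic.

```python
import math
from typing import List, Tuple


def contingency(x: List[int], y: List[int]) -> Tuple[int, int, int, int]:
    """Cells with (x>0, y>0), (x>0, y==0), (x==0, y>0), (x==0, y==0); zeros are kept as-is."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(x, y):
        if a > 0 and b > 0:
            n11 += 1
        elif a > 0:
            n10 += 1
        elif b > 0:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def signed_coexpression(x: List[int], y: List[int]) -> float:
    """Signed phi-like score in [-1, 1] (illustrative independence model, not COTAN's)."""
    n11, n10, n01, n00 = contingency(x, y)
    n = n11 + n10 + n01 + n00
    rx, ry = n11 + n10, n11 + n01  # cells detecting gene x, cells detecting gene y
    observed = [n11, n10, n01, n00]
    expected = [rx * ry / n, rx * (n - ry) / n, (n - rx) * ry / n, (n - rx) * (n - ry) / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    sign = 1.0 if n11 >= expected[0] else -1.0  # more joint detection than expected => positive
    return sign * math.sqrt(chi2 / n)
```

No counts are log-transformed, scaled, or imputed. The score depends only on where zeros fall, which is what allows low-abundance genes to be kept.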

Availability: Bioconductor package and GitHub repository

15:15-15:20
Fine-tuning Oxford Nanopore basecalling models for high-accuracy repeat expansion calling
Confirmed Presenter: Rugare Maruzani, King's College London, United Kingdom


Authors List:

  • Rugare Maruzani, King's College London, United Kingdom
  • Ali Awan, King's College London, United Kingdom
  • Catherine Sutherland, King's College London, United Kingdom
  • Chloe Fisher, King's College London, United Kingdom

Presentation Overview:

Tandem repeat expansions cause several neurological
disorders, including Huntington’s disease, Kennedy’s
disease, Fragile X syndrome, amyotrophic lateral sclerosis
(ALS), and frontotemporal dementia (FTD). Pathogenicity
depends on both repeat length and interruptions within the
expanded tandem repeats. However, Oxford Nanopore
Technologies (ONT) basecalling models perform poorly when
calling long, repetitive sequences. Here, we fine-tuned ONT
basecalling models to improve accuracy for the CGG
trinucleotide repeat in FMR1 and the GGGGCC hexanucleotide
repeat in C9orf72.

We selected the latest ONT basecalling model as a
pretrained baseline and generated standards comprising two
FMR1 expansions (160 and 640 CGG repeats) and four C9orf72
expansions (128, 256, 512, and 1024 GGGGCC repeats). Two
fine-tuned models were produced, one for each expansion
type. The C9orf72 model was trained using reads from the
512-repeat standard, while the FMR1 model was trained using
reads from the 640-repeat standard. Model performance was
assessed using repeat tract purity, defined as the
percentage of sliding k-mers matching any rotation of the
expected repeat sequence. Performance was also compared
against the PacBio platform.
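The purity metric defined above is simple to compute. The minimal sketch below makes three assumptions. The repeat tract has already been extracted from each read on the forward strand. The sliding window length k equals the repeat unit length. The window advances one base at a time.

```python
from typing import Set


def rotations(motif: str) -> Set[str]:
    """All cyclic rotations of a repeat unit, e.g. CGG -> {CGG, GGC, GCG}."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def repeat_purity(tract: str, motif: str) -> float:
    """Percentage of sliding k-mers (k = motif length) matching any rotation of the motif."""
    k = len(motif)
    seq = tract.upper()
    if len(seq) < k:
        return 0.0
    allowed = rotations(motif.upper())
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    hits = sum(window in allowed for window in windows)
    return 100.0 * hits / len(windows)
```

Accepting any rotation makes the score independent of where the read happens to enter the repeat. A single interruption, such as the AGG in CGG AGG CGG, lowers purity by disrupting every window that overlaps it: here 4 of 7 windows match, about 57.1%.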

The fine-tuned C9orf72 ONT model achieved a mean purity of
99.53% across all expansion lengths, compared to 66.62% for
the pretrained model and 96.78% for PacBio. For FMR1,
PacBio outperformed ONT, with 98.63% purity compared to
84.56% and 55.90% for the fine-tuned and pretrained ONT
models, respectively. Future work will investigate
associations between repeat genotypes and clinical outcomes
in Fragile X, ALS, and FTD patients.

15:20-15:25
ProQuest: A Large Language Model Application on the UniProt Protein Sequence and Annotation Database
Confirmed Presenter: Melike Akkaya, Hacettepe University, Turkey


Authors List:

  • Melike Akkaya, Hacettepe University, Turkey
  • Rauf Yanmaz, Hacettepe University, Turkey
  • Sezin Yavuz, Hacettepe University, Turkey
  • Vishal Joshi, EMBL-EBI, United Kingdom
  • Maria Martin, EMBL-EBI, United Kingdom
  • Tunca Dogan, Hacettepe University, Turkey

Presentation Overview:

Accessing complex biological data through natural language
remains a significant challenge for researchers,
particularly in fields like proteomics where large-scale,
annotated datasets are the norm. In this project, we
present ProQuest (https://proquest.ngrok.app /
https://github.com/HUBioDataLab/PROQUEST), a
Retrieval-Augmented Generation (RAG) system built on
UniProtKB (https://www.uniprot.org/) flat files that
enables intuitive, efficient, and semantically rich
querying of protein-related information. The system is
built on a two-stage pipeline: retrieval and generation. In
the retrieval phase, user queries are vectorized using the
nomic-ai/nomic-embed-text-v1
(https://huggingface.co/nomic-ai/nomic-embed-text-v1) model
and matched with semantically similar documents stored in a
ChromaDB (https://www.trychroma.com/) vector database. To
enhance retrieval coverage, we also integrate two
keyword-based search techniques: SQLite FTS5
(https://www.sqlite.org/fts5.html) using trigram-based
inverted indexing for substring-level precision, and BM25
Encoder
(https://pinecone-io.github.io/pinecone-text/pinecone_text.html)
via Pinecone, which enables sparse vector scoring based on
dot product similarity. In the generation stage, the
retrieved documents are synthesized with the user query to
produce natural language responses using a large language
model. This design allows users to explore complex protein
data through simple queries, making the system accessible
to both domain experts and non-specialists. Although
numerical performance evaluations are ongoing, early
semantic testing has shown that the system consistently
provides coherent and relevant results. Future phases of
the project will focus on parameter tuning, deeper analysis
across biological use cases, and integration with
additional data sources. Overall, the system offers a
powerful new interface for biological data
exploration, enhancing search efficiency, reducing cognitive
load, and accelerating insight generation in protein
research.
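ProQuest's keyword branch includes a BM25 encoder. For reference, the sketch below scores documents against a query with the standard Okapi BM25 formula in pure Python. It is a stand-in, not the pinecone-text BM25Encoder, which emits sparse vectors for dot-product scoring. Several choices are assumptions: the tokenizer, the non-negative IDF variant, and the parameter values k1 = 1.2 and b = 0.75. The example annotations are made-up strings, not UniProtKB entries.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric tokens (assumption: no stemming or stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: List[str], k1: float = 1.2,
              b: float = 0.75) -> List[Tuple[int, float]]:
    """Return (doc index, BM25 score) pairs, best first."""
    toks = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))  # document frequency per term
    scores = []
    for idx, t in enumerate(toks):
        tf = Counter(t)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Length normalization means that when two entries mention the same query terms, the shorter, more focused one ranks higher. Exact-term scoring like this complements dense embeddings, which can miss rare identifiers.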

15:45-16:00
NetREm: Network Regression Embeddings reveal cell-type transcription factor coordination for gene regulation

Authors List:

  • Saniya Khullar, University of Wisconsin - Madison, United States
  • Xiang Huang, University of Wisconsin - Madison, United States
  • Raghu Ramesh, University of Wisconsin - Madison, United States
  • John Svaren, Waisman Center, United States
  • Daifeng Wang, University of Wisconsin - Madison, United States

Presentation Overview:

Background: Transcription factor (TF) coordination plays a
key role in gene regulation via direct and/or indirect
protein–protein interactions (PPIs) and co-binding to
regulatory elements on DNA. Single-cell technologies enable
gene expression measurement for individual cells and
identification of distinct cell types, yet the link between
TF-TF coordination and target gene (TG) regulation across
diverse cell types remains poorly understood.

Method: In response, we introduce Network Regression
Embeddings (NetREm), an innovative computational approach
to uncover cell-type-specific TF-TF coordination activities
driving TG regulation. NetREm leverages network-constrained
regularization, integrating prior knowledge of TF-TF PPIs
with single-cell/bulk-level gene expression data. It
identifies transcriptional regulatory modules (TRMs)
composed of antagonistic and/or cooperative TF-TF PPIs and
predicts novel TF-TG regulatory links complementing
state-of-the-art gene regulatory networks (GRNs).
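NetREm's network-constrained regularization couples the regression coefficients of TFs that interact in the prior PPI network. The abstract does not give the exact objective. The sketch below therefore assumes one common form of network-constrained regression for a single target gene: a ridge-style fit with a graph-Laplacian penalty. It minimizes ||y - X b||^2 + lam * b^T L b, where L = D - A is the Laplacian of the TF-TF PPI adjacency matrix A. Because b^T L b equals the sum of (b_i - b_j)^2 over interacting TF pairs, a larger lam pulls the coefficients of interacting TFs together. The sketch has no sparsity (lasso) term, and NetREm's embedding step is not shown.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(adj: Matrix) -> Matrix:
    """Graph Laplacian L = D - A of a symmetric TF-TF adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]


def solve(a: Matrix, rhs: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting (assumes a non-singular system)."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def network_ridge(x: Matrix, y: List[float], adj: Matrix, lam: float) -> List[float]:
    """Closed form b = (X^T X + lam * L)^{-1} X^T y for one target gene."""
    p = len(x[0])
    lap = laplacian(adj)
    lhs = [[sum(row[i] * row[j] for row in x) + lam * lap[i][j] for j in range(p)]
           for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(lhs, rhs)
```

With lam = 0 this reduces to ordinary least squares. As lam grows, two interacting TFs are increasingly treated as a coordinated module rather than as competing predictors.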

Results: We validate NetREm’s performance through
simulation studies and benchmark it across multiple
datasets in humans, mice, and yeast. NetREm prioritizes
biologically-meaningful TF-TF coordination networks in 9
peripheral blood mononuclear cell types and 42 immune cell
subtypes. Additionally, we apply NetREm to cell types
(e.g., neurons, glia, Schwann cells) from central and
peripheral nervous systems, and to Alzheimer’s disease
versus control brains. Top predictions are supported by
orthogonal experimental validation data, including ChIP-seq,
CUT&RUN, scATAC-seq, knockout studies, expression
QTLs, and genome-wide association studies. We
further link disease-associated single-nucleotide
polymorphisms (SNPs) to our inferred networks.

Conclusion: NetREm provides a powerful and interpretable
framework for predicting GRNs and TF-TF coordination
networks in a cell-type-specific manner. The tool is
available on GitHub to support functional genomics and
therapeutic discovery.

16:00-16:15
Evolutionary conservation and rewiring of enhancer-promoter connectivity across mammals
Confirmed Presenter: Stephen Rong, Institute of Clinical Sciences, Imperial College London;
MRC Laboratory of Medical Sciences, United Kingdom


Authors List:

  • Stephen Rong, Institute of Clinical Sciences, Imperial College London;
    MRC Laboratory of Medical Sciences, United Kingdom
  • Martina Rimoldi, EMBL-EBI, Wellcome Genome Campus, United Kingdom
  • Sarah Elderkin, Nuclear Dynamics Programme, The Babraham Institute, United Kingdom
  • Duncan Odom, Division of Regulatory Genomics and Cancer Evolution, DKFZ, Germany
  • Mikhail Spivakov, Institute of Clinical Sciences, Imperial College London;
    MRC Laboratory of Medical Sciences, United Kingdom

Presentation Overview:

Enhancers are crucial to gene regulation, often controlling
genes located far away through 3D chromatin interactions.
Gene expression patterns are generally conserved across
evolution, yet the enhancers that control them turn over
rapidly. What determines whether gene expression remains
robust to enhancer turnover or undergoes regulatory
divergence? Comparing orthologous genes across species
offers a natural perturbation experiment, but without
knowing which enhancers physically contact which genes, it
remains difficult to test mechanisms for robustness or
identify drivers of divergence.

To address this, we generated promoter capture Hi-C
(PCHi-C) from brain, liver, and testis across five mammals
spanning 94 million years of divergence (macaque, marmoset,
mouse, rat, and dog). Unlike Hi-C, PCHi-C provides the high
resolution needed to identify enhancer-gene contacts.
Integrating these data with published H3K27ac ChIP-seq and RNA-seq
(Roller et al., 2021), we mapped the evolution of
expression, activity, and contacts for 3,444 one-to-one
orthologous genes and over 86,000 enhancer-gene pairs
across species.

Gene expression, enhancer activity, and chromatin
interactions exhibit distinct evolutionary dynamics.
Expression clusters primarily by tissue, while enhancer
activity and interactions show stronger lineage-specific
divergence, consistent with different selective constraints
acting on each modality. Individual loci reveal examples of
deep conservation of long-range promoter-promoter
interactions, regulatory robustness despite proximal
enhancer turnover, and lineage-specific changes in distal
enhancers and interaction rewiring correlated with
expression divergence. These findings demonstrate how
regulatory architectures at some genes achieve robustness
through maintained chromatin contacts even as enhancers
turn over, while others undergo coordinated regulatory
rewiring, driving gene expression divergence.

16:15-16:30
FlowSign – A Nextflow Workflow for Network Orientation and Regulatory Sign Prediction Using Prior Knowledge and Omics Data
Confirmed Presenter: Benjamin Dominik Maier, European Bioinformatics Institute (EMBL-EBI), United Kingdom


Authors List:

  • Benjamin Dominik Maier, European Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Evangelia Petsalaki, European Bioinformatics Institute (EMBL-EBI), United Kingdom

Presentation Overview:

Biological networks derived from literature and omics data are widely used to study cellular signaling, yet most lack information on interaction direction and regulatory effect (whether a protein activates or inhibits another), limiting interpretability and mechanistic modeling. We present FlowSign, a framework that predicts edge directionality and regulatory sign in protein networks by integrating prior knowledge with data-driven inference.

FlowSign provides a precomputed, harmonized protein–protein interaction resource with directionality and sign confidence scores for all protein interactions. It integrates diverse prior knowledge, including transcription factor regulons, pathway databases, and kinase–substrate relationships. We then propagate regulatory insights to related proteins and use a random forest classifier to resolve inconsistent annotations and infer missing ones.

Users supply a network together with anchor proteins (e.g., receptors and transcription factors) to guide regulatory flow. FlowSign maps prior knowledge onto the network and iteratively infers edge direction and effect, integrating user-provided omics data, annotations, and perturbation data. Contradictory edges are retained only when supported by evidence. For interpretability, FlowSign can trim non-contributing nodes, compress linear cascades, and extract subnetworks or shortest paths between anchors.
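To illustrate how anchor proteins can orient an undirected network, the sketch below uses a deliberately simple heuristic. It is not FlowSign's algorithm, which iteratively combines prior knowledge, omics and perturbation data, and a random forest classifier. The heuristic computes breadth-first distances from source anchors such as receptors and points each edge from the node closer to a source to the node farther away. Edges between equally distant or unreachable nodes are left unresolved. Only source anchors are used here, and the protein names in the usage check are illustrative.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def bfs_distances(adj: Dict[str, Set[str]], sources: Iterable[str]) -> Dict[str, int]:
    """Hop distance from the nearest source anchor (multi-source BFS)."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def orient_edges(edges: List[Edge], sources: Iterable[str]) -> Tuple[List[Edge], List[Edge]]:
    """Orient undirected edges away from source anchors (hypothetical heuristic)."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = bfs_distances(adj, sources)
    directed: List[Edge] = []
    unresolved: List[Edge] = []
    for u, v in edges:
        du, dv = dist.get(u), dist.get(v)
        if du is None or dv is None or du == dv:
            unresolved.append((u, v))  # no distance evidence for a direction
        elif du < dv:
            directed.append((u, v))
        else:
            directed.append((v, u))
    return directed, unresolved
```

In a full method, the unresolved edges are exactly where prior knowledge and omics evidence must decide the direction.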

Benchmarking on knock-out and drug response datasets shows that FlowSign consistently outperforms state-of-the-art tools such as SIGNAL and NeKo in predicting regulatory direction and effect.

Implemented in R and distributed as a scalable Nextflow pipeline, FlowSign bridges data-driven network inference and executable mechanistic models, enabling conversion to Boolean or ODE frameworks for context-specific signaling simulations. Future extensions will support time-course data and iterative updating of prior knowledge through predictions. GitHub: https://github.com/benjamindmaier/flowsign-public

16:30-16:35
Advancing Careers and Team Science in Biomedical Data Science
Confirmed Presenter: Daria Sokolova, EMBL-EBI, United Kingdom


Authors List:

  • Daria Sokolova, EMBL-EBI, United Kingdom
  • Denise Bianco, The Alan Turing Institute, United Kingdom
  • Giulia Tomba, The Alan Turing Institute, United Kingdom
  • Kim Gurwitz, EMBL-EBI, United Kingdom
  • Catherine Brooksbank, EMBL-EBI, United Kingdom
  • Vera Matser, The Alan Turing Institute, United Kingdom
  • Emma Karoune, The Alan Turing Institute, United Kingdom

Presentation Overview:

Biomedical research increasingly relies on data-intensive
methodologies, creating demand for a workforce proficient
in data science. However, the recognition and advancement
of careers in biomedical data science remain inconsistent,
often hindered by unclear role definitions, unstructured
technical career pathways, and the undervaluation of
team-science approaches.

Advancing Biomedical Data Science Careers is a two-year
project, funded by the MRC and jointly led by The Alan
Turing Institute and EMBL-EBI, that aims to document
skills, roles, career pathways, and team-science approaches
in biomedical data science.

The project is currently at its midpoint, allowing us to
present preliminary findings. These include an extensive
stakeholder map of the UK biomedical data science
ecosystem, developed by the research team and informed by
stakeholder input gathered through engagement activities
and open calls for feedback. The map provides a foundation
for high-level mapping of selected competency frameworks
relevant to the field, supporting the community in
navigating the landscape and identifying skills gaps. The
poster will also present early insights from an ongoing
qualitative study examining interdisciplinary biomedical
data science practices across organisations of different
types and scales. This work aims to document key
challenges, emerging best practices, and future needs for
initiating and sustaining effective team science.

By fostering a structured understanding of competencies and
collaborative dynamics, the project aims to strengthen the
visibility, sustainability, and inclusivity of careers in
biomedical data science. Ultimately, this research
contributes to building a resilient, well-supported
workforce equipped to meet the demands of modern
data-driven biomedical research.

16:35-16:40
InterProScan 6: a modern large-scale protein function annotation pipeline
Confirmed Presenter: Matthias Blum, European Molecular Biology Laboratory, European
Bioinformatics Institute (EMBL-EBI), United Kingdom


Authors List:

  • Matthias Blum, European Molecular Biology Laboratory, European
    Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Emma Hobbs, European Molecular Biology Laboratory, European
    Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Laise Cavalcanti Florentino, European Molecular Biology Laboratory, European
    Bioinformatics Institute (EMBL-EBI), United Kingdom
  • Alex Bateman, European Molecular Biology Laboratory, European
    Bioinformatics Institute (EMBL-EBI), United Kingdom

Presentation Overview:

InterProScan is a widely used software system for
large-scale protein function annotation, forming a core
component of annotation pipelines at resources such as
UniProt, Ensembl, and MGnify. After more than a decade of
use, InterProScan 5 faced increasing challenges related to
scalability, deployment complexity, data management, and
integration with modern computational infrastructures.

We present InterProScan 6, a complete reimplementation of
the pipeline as a Nextflow-based workflow designed to
support contemporary bioinformatics use cases. The new
architecture improves portability, reproducibility, and
scalability across local workstations, high-performance
computing (HPC) systems, and cloud environments.
InterProScan 6 decouples pipeline logic from signature
data, enabling flexible management of InterPro releases and
on-demand data retrieval, and provides native support for
containerised execution using Docker, Singularity, and
Apptainer.

Performance benchmarking across nine reference proteomes
spanning bacteria to large eukaryotes shows consistent
reductions in wall-clock runtime relative to InterProScan
5, with approximately two-fold speedups on complex
eukaryotic proteomes. When pre-computed annotations are
available, integration with a redesigned Matches API
reduces runtimes to minutes. Concordance
analysis across the full Swiss-Prot dataset demonstrates
that InterProScan 6 reproduces InterProScan 5 annotations
with near-identical precision and sensitivity across all
InterPro member databases.

InterProScan 6 provides a robust and future-proof platform
for efficient, flexible, and reproducible genome-scale
protein function annotation.

16:40-16:45
A Sneaky Peek at the CRUK Data Hub 
Confirmed Presenter: Frances Pearl, University of Sussex, United Kingdom


Authors List:

  • Sarah Wooller, University of Sussex, United Kingdom
  • Ayoola Olojede, University of Sussex, United Kingdom
  • Sanika Raut, CRUK, United Kingdom
  • Leslie Glass, HDR-UK, United Kingdom
  • Joseph Day, Cancer Research UK, United Kingdom
  • Eytan Kovelr, CRUK, United Kingdom
  • Andrew Blake, Oxford University, United Kingdom
  • Loki Sinclaire, HDR-UK, United Kingdom
  • Peter Harrison, HDR-UK, United Kingdom
  • Frances Pearl, University of Sussex, United Kingdom

Presentation Overview:

One of the biggest problems in bioinformatics today is
finding and accessing suitable datasets. To address this
problem in the field of cancer data, CRUK and the
Bioinformatics Lab at the University of Sussex are working
together to develop an online CRUK Data Hub where
researchers can search for information about cancer
datasets funded by CRUK. Fully compatible with the UK
Health Data Research Gateway, the Hub will provide details
on the datasets and how to access them, making it easier
for researchers to find and reuse cancer datasets,
fostering collaboration and the sharing of data. This
resource network will ultimately lead to more effective
research and, most importantly, better outcomes for cancer
patients. This is an opportunity for a sneaky peek at the
pilot hub and to influence the final product so that it
better serves your needs.

16:45-17:30
Invited Presentation: Keynote from Dr. Syma Khalid

Authors List:

  • Syma Khalid, University of Oxford
17:30-17:45
Closing Remarks and Awards

Authors List:

  • Mark Wass