This talk was part of Statistical and Computational Challenges in Probabilistic Scientific Machine Learning (SciML).

Architectural Nuances and Benchmark Gaps in Scientific ML: Two Vignettes

Andrej Risteski, Carnegie Mellon University

Wednesday, June 11, 2025



Slides
Abstract: In deep learning, small architectural modifications, such as residual connections or rotary embeddings, have often had outsized impact. This talk explores two short vignettes in scientific machine learning that follow a similar spirit: modest architectural adjustments can matter in the right regimes, and often in ways that standard benchmarks fail to capture.

The first vignette considers a variant of GNNs that maintains edge-level states. We prove that in graphs with bottlenecks or hubs, this added state yields strictly more powerful models when the memory of the nodes (a proxy for the embedding dimension) and the depth are bounded. This representational separation would not be captured by prior theoretical lenses, which focus only on symmetry, and would not surface in standard benchmarks. Mathematically, we bring to bear techniques inspired by results on time-space tradeoffs in theoretical computer science, which to the best of our knowledge are new in this area.

In the second vignette, we revisit time-dependent PDEs and consider a simple architectural change: adding a lightweight memory layer (based on state-space models such as S4) to a neural operator. While this has little effect under full observability, it significantly improves performance when the system is partially observed or coarsely resolved, a setting where simple theory (via Mori-Zwanzig-Nakajima formalisms) predicts memory should help.

Based on https://arxiv.org/abs/2410.09867 (to appear in ICML 2025) and https://arxiv.org/abs/2409.02313 (appeared in ICLR 2025).
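To make the first vignette concrete, below is a minimal sketch of what a message-passing layer with persistent edge-level states might look like. The class name, dimensions, and update rules are illustrative assumptions, not the architecture analyzed in the paper; the only point being illustrated is that per-edge embeddings are updated and carried from layer to layer alongside the per-node embeddings.

```python
import torch
import torch.nn as nn

class EdgeStateMPLayer(nn.Module):
    """Illustrative message-passing layer that keeps a persistent per-edge
    state alongside the usual per-node state (a sketch of the flavor of
    architecture discussed in the talk, not the paper's exact model)."""

    def __init__(self, node_dim, edge_dim):
        super().__init__()
        # Edge update sees the two endpoint nodes and the previous edge state.
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, edge_dim), nn.ReLU(),
            nn.Linear(edge_dim, edge_dim),
        )
        # Node update aggregates the refreshed states of incoming edges.
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + edge_dim, node_dim), nn.ReLU(),
            nn.Linear(node_dim, node_dim),
        )

    def forward(self, x, e, edge_index):
        # x: [num_nodes, node_dim]; e: [num_edges, edge_dim]
        # edge_index: [2, num_edges] with rows (source, destination)
        src, dst = edge_index
        # 1) Update edge states from their endpoints and previous edge state.
        e = self.edge_mlp(torch.cat([x[src], x[dst], e], dim=-1))
        # 2) Sum updated edge states into their destination nodes.
        agg = torch.zeros(x.size(0), e.size(-1), device=x.device)
        agg.index_add_(0, dst, e)
        # 3) Update node states; edge states persist to the next layer.
        x = self.node_mlp(torch.cat([x, agg], dim=-1))
        return x, e

# Example: 4 nodes, 3 directed edges (hypothetical shapes for illustration)
# x, e = torch.randn(4, 32), torch.randn(3, 16)
# edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
# x, e = EdgeStateMPLayer(32, 16)(x, e, edge_index)
```

The extra per-edge memory is exactly the kind of modest change whose benefit, per the abstract, only shows up when node memory and depth are bounded and the graph has bottlenecks or hubs.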
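For the second vignette, the sketch below shows one simplified way a lightweight state-space memory layer could act on the latent trajectory of a neural-operator rollout. This is a plain diagonal linear SSM written as an explicit recurrence, standing in for an S4-style layer; it is not the layer or training setup from the paper, and all names and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class DiagonalSSMMemory(nn.Module):
    """Simplified diagonal state-space memory layer (an S4-like stand-in,
    not the exact layer from the paper): maintains a hidden state across
    time steps so the operator can condition on the trajectory so far."""

    def __init__(self, feat_dim, state_dim=64):
        super().__init__()
        # Learnable diagonal dynamics, parameterized via log for stability.
        self.log_a = nn.Parameter(torch.full((feat_dim, state_dim), -0.5))
        self.B = nn.Parameter(torch.randn(feat_dim, state_dim) * 0.1)
        self.C = nn.Parameter(torch.randn(feat_dim, state_dim) * 0.1)
        self.D = nn.Parameter(torch.ones(feat_dim))  # skip (feedthrough) term

    def forward(self, u):
        # u: [batch, time, feat_dim] latent features from the neural operator.
        a = torch.exp(self.log_a)  # positive decay rates, inside (0, 1) at init
        h = torch.zeros(u.size(0), u.size(-1), a.size(-1), device=u.device)
        outputs = []
        for t in range(u.size(1)):
            # h_t = a * h_{t-1} + B * u_t   (elementwise, per feature channel)
            h = a * h + self.B * u[:, t].unsqueeze(-1)
            # y_t = <C, h_t> + D * u_t
            y = (self.C * h).sum(-1) + self.D * u[:, t]
            outputs.append(y)
        return torch.stack(outputs, dim=1)  # [batch, time, feat_dim]
```

Under full observability such a layer has little to add, but when each step's input is only a partial or coarsened view of the state, the hidden state gives the operator access to the history of the trajectory, which is the regime where the abstract reports clear gains.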