Authors
Jure Brence, Sašo Džeroski, Ljupčo Todorovski
Publication
International Conference on Discovery Science, 2025
Abstract

Equation discovery, or symbolic regression, aims to uncover closed-form mathematical expressions from data, with the goal of facilitating scientific discovery in fields like physics and biology. Probabilistic context-free grammars (PCFGs) provide a structured, interpretable framework for encoding domain knowledge and defining distributions over candidate expressions. While sampling from a fixed PCFG can provide a simple and effective approach to equation discovery, it struggles to identify complex expressions due to the vastness of the search space. To address this obstacle, we consider a novel Bayesian approach that iteratively refines the production rule probabilities within a PCFG, guiding the search toward more promising regions of the space based on observed data. Our computational experiments demonstrate that this approach substantially increases the probabilities assigned to true equations, directly improving both the performance and computational efficiency of the discovery process. However, the lack of context sensitivity in PCFGs limits the effect of tuning production rule probabilities and consequently limits the gains in computational efficiency. This highlights an important challenge for future work in this area. The introduced approach is implemented as part of the Python package ProGED: https://github.com/brencej/ProGED.
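The idea of iteratively refining production rule probabilities can be illustrated with a minimal, self-contained sketch. It does not use the ProGED API and is not the update rule from the paper, which the abstract does not specify. It assumes a toy grammar, a Dirichlet prior over the rules of each nonterminal, a fixed constant value `c = 1.0` instead of parameter fitting, and an update that adds rule-usage counts from the best-scoring samples to the Dirichlet pseudo-counts. All names (`GRAMMAR`, `uniform_prior`, `sample`, `update`, `refine`) are hypothetical.

```python
import random

# Toy grammar (an assumption for illustration): nonterminal -> right-hand sides.
# The last rule of each nonterminal is non-recursive, which guarantees termination.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "V"), ("V",)],
    "V": [("x",), ("c",)],
}


def uniform_prior(grammar, alpha=1.0):
    """Dirichlet pseudo-counts, one per production rule."""
    return {nt: [alpha] * len(rules) for nt, rules in grammar.items()}


def rule_probabilities(pseudo_counts):
    """Posterior-mean rule probabilities under the Dirichlet pseudo-counts."""
    return {nt: [c / sum(cs) for c in cs] for nt, cs in pseudo_counts.items()}


def sample(grammar, probs, rng, symbol="E", depth=0, max_depth=12):
    """Sample one expression; return (tokens, [(nonterminal, rule index), ...])."""
    if symbol not in grammar:  # terminal symbol
        return [symbol], []
    if depth >= max_depth:
        idx = len(grammar[symbol]) - 1  # force the non-recursive rule
    else:
        idx = rng.choices(range(len(grammar[symbol])), weights=probs[symbol])[0]
    tokens, used = [], [(symbol, idx)]
    for child in grammar[symbol][idx]:
        child_tokens, child_used = sample(grammar, probs, rng, child, depth + 1, max_depth)
        tokens += child_tokens
        used += child_used
    return tokens, used


def mse(tokens, xs, target):
    """Mean squared error of the expression against target(x), with c fixed to 1.0."""
    expr = " ".join(tokens)  # contains only x, c, + and *
    errors = [
        (eval(expr, {"__builtins__": {}}, {"x": x, "c": 1.0}) - target(x)) ** 2
        for x in xs
    ]
    return sum(errors) / len(errors)


def update(pseudo_counts, derivations, weights):
    """Return new pseudo-counts with weighted rule-usage counts added."""
    new = {nt: list(cs) for nt, cs in pseudo_counts.items()}
    for used, weight in zip(derivations, weights):
        for nt, idx in used:
            new[nt][idx] += weight
    return new


def refine(grammar, target, rounds=5, n_samples=200, n_elite=20, seed=0):
    """Alternate sampling and updating; return the refined rule probabilities."""
    rng = random.Random(seed)
    xs = [i / 4 for i in range(-8, 9)]
    counts = uniform_prior(grammar)
    for _ in range(rounds):
        probs = rule_probabilities(counts)
        samples = [sample(grammar, probs, rng) for _ in range(n_samples)]
        samples.sort(key=lambda s: mse(s[0], xs, target))
        elite = samples[:n_elite]  # assumption: only the best samples are counted
        counts = update(counts, [used for _, used in elite], [1.0] * len(elite))
    return rule_probabilities(counts)


if __name__ == "__main__":
    refined = refine(GRAMMAR, target=lambda x: x * x + x)
    for nonterminal, probabilities in refined.items():
        print(nonterminal, [round(p, 3) for p in probabilities])
```

Note that `update` counts how often a rule was used without regard to where in the derivation tree it was applied. This is the context-free limitation the abstract points to: a rule that helps in one position and hurts in another receives a single shared probability.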