AI4Science

Project Overview

The AI4Science project is a multidisciplinary initiative aimed at advancing the role of artificial intelligence in the scientific process. It focuses on developing novel AI methodologies (spanning explainable machine learning, foundation models, automated scientific modeling, and semantic technologies) to address the unique challenges of applying AI in the physical and life sciences. The project emphasizes the integration of data and domain knowledge, transparency in AI models, and support for open science principles. Applications range from drug and gene therapy design to equation discovery, environmental modeling, and materials science. With a strong consortium of Slovenian research institutions and access to state-of-the-art computational infrastructure, AI4Science seeks to significantly enhance scientific discovery through AI-driven tools and frameworks.

Work Packages

Each scientific Work Package (WP), WP1 through WP4, has two parts: developing AI methods and applying them in scientific domains. Each part is divided into three tasks. Method development tasks usually start earlier and follow a cycle of design, implementation, and evaluation of AI methods. Application tasks begin with problem definition and data collection, continue with applying AI methods (often machine learning) to generate insights, and end with evaluation from the perspective of the scientific field.

WP1: Explainable Machine Learning for Science

The central objective of WP1 is to create new ML methods that produce models that are both effective and easy to interpret, particularly for complex data. WP1 also aims to use these methods to explain model predictions and to monitor trends in scientific fields by analyzing bibliographic data. The motivation is to increase transparency, reproducibility, and trust in AI-driven discoveries, especially in fields such as healthcare and genomics. The methods developed in this work package will be applied to practical problems such as monitoring the development of scientific fields, designing gene therapies, and drug design.

WP2: Foundation Models for Science

WP2 aims to develop multimodal foundation models that can be applied to various scientific domains. The project will create new methodologies for pre-training and fine-tuning these models, which will be able to handle multiple data types (modalities) such as text, images, and other scientific data like patient records and genomic information. The models will be designed to handle missing data and fuse information from different modalities into a single representation. The developed models will be applied to practical tasks in different scientific fields, including medicine (e.g., diagnosing breast cancer and predicting brain cancer prognosis), life sciences (e.g., predicting protein-RNA interactions and designing gene therapies), and materials science (e.g., discovering new materials and simulating manufacturing processes).

WP3: Automated Scientific Modelling

The main objective of WP3 is to develop new AI methods for discovering scientific models, represented as equations, from both data and existing domain knowledge. The project will use both symbolic and neural approaches to ensure the models are both accurate and interpretable. It will also develop methods for discovering different types of equations, including ordinary, delay, and fractional-order differential equations, which can be used to model complex systems. The developed methods will be applied to various scientific domains, including plant biology, ecology, electrochemistry, and materials science, to solve problems such as estimating reaction rates in plant stress signaling networks and modeling the behavior of solid oxide cells.

WP4: Semantic Technologies for Open Science

The primary goal of WP4 is to develop and extend semantic resources, such as ontologies, for the fields of machine learning and optimization. This includes creating resources that can effectively represent complex data types, tasks, and models, including large language models and multimodal foundation models. The project will also develop semantic resources to support AI applications in various scientific domains like materials science, mathematics, and medicine. The ultimate objective is to apply these semantic resources to explain the relationships between problem properties, algorithm configurations, and algorithm performance, thereby enabling tasks like automated machine learning (AutoML) and automated optimization (AutoOPT) in a more explainable manner.

WP5: Project Management, Dissemination, and Communication

The core objective of WP5 is to ensure the project is completed successfully and on time through efficient coordination, management, and outreach. This includes maintaining high quality standards for all project outcomes, implementing robust monitoring to manage risks, and promoting open science principles. The work package also focuses on disseminating project results to a wide range of stakeholders through a dedicated project website, various communication materials, and events such as workshops, a summer school, and conference presentations. This approach aims to maximize the project's impact by making its findings and resources transparent, accessible, and widely adopted by the scientific community.

Objectives

  • 1a. To develop explainable ML methods for interpretable modeling of complex data, integrating neural and symbolic approaches, explaining predictions, and tracking scientific trends using bibliographic data.
  • 1b. To apply explainable ML in complex settings to scientific problems, including trend monitoring, gene therapy, drug design, and mathematical discovery.
  • 2a. To develop methodology for pre-training and fine-tuning multimodal foundation models, to be used in different domains of science.
  • 2b. To learn and apply multimodal foundation models in different scientific domains, including medicine and healthcare, as well as materials science.
  • 3a. To develop AI methods to learn equation-based scientific models from data and domain knowledge using symbolic and neural approaches for accuracy and interpretability.
  • 3b. To apply equation-based AI methods to different domains: plant biology and ecology, as well as electrochemistry and materials science.
  • 4a. To develop semantic resources that describe ML and optimization and support AI applications across scientific domains.
  • 4b. To apply explainable ML to relate tasks/problems and properties/configurations of algorithms to algorithm performance in the areas of ML and optimization.
  • 5. To ensure effective project execution through strong coordination, quality management, open science practices, and broad dissemination to key stakeholders.

Timeline

Start: October 1st 2024
End: September 30th 2027