
Job Information
Organisation/Company
École nationale des ponts et chaussées
Application Deadline
30 Nov 2025 – 23:59 (Europe/Paris)
Type of Contract
Offer Description
Context
France has a vast number of shallow abandoned underground mines, a consequence of its geology and industrial legacy. It is estimated that there are hundreds of thousands of these mines, many of which remain unidentified. Over time, these mines deteriorate, potentially leading to ground movements such as subsidence and localised or large-scale collapses. Particular attention must therefore be given to old, abandoned mines. It is essential to address the consequences of ceased mining operations in order to prevent risks to people, property and the environment. The mining risk at former mining sites should be assessed by considering several factors: predisposition, triggering or aggravating factors of external origin, and intensity. As an aggravating factor, climate change is expected to increase the frequency of extreme hydroclimatic events, such as severe droughts, heavy rains or floods. These events may cause substantial changes in groundwater or watercourse levels, as well as significant water infiltration. However, this factor alone cannot fully explain the risk of collapse; other aspects, such as the geometric and geomechanical characteristics of the mines, must also be considered.
Recent advances in the geotechnical and geomechanical fields have led to a significant increase in the use of machine learning (ML), thanks to its computational power and ability to solve complex problems. However, its efficacy remains a challenge when data are limited, which is a common situation in these fields. In such cases, models may overfit, resulting in inaccurate predictions on new datasets. To address this, there has been a concerted effort to integrate traditional numerical methods with ML to improve our understanding of complex behaviours. Various methodologies have been developed, including physics-informed ML approaches that use numerical modelling to create synthetic datasets (e.g. Tristani et al., 2025). Additionally, approaches combining multiple ML models have been explored to optimise predictions, enabling algorithms to collaborate and achieve better results. Ensemble methods, in particular, have demonstrated superior performance and reliability with constrained datasets (e.g. Guayacán-Carrillo et al., 2025; Richa et al., 2025; Tristani et al., 2024). Furthermore, for practical applications in these fields, it is crucial to incorporate interpretable tools that offer transparency and help engineers understand model outputs. One promising technique for this purpose is symbolic regression (SR), which aims to identify a mathematical expression that accurately describes the input–output relationship. The effectiveness of SR has been demonstrated in underground projects (e.g. Guayacán-Carrillo & Sulem, 2024).
This PhD project will analyse data from Ineris comprising almost 4,000 records of localised collapse cases in abandoned French mines. The primary goal is to assess the effectiveness of applying machine learning tools to this dataset to gain deeper insights into the mechanics of collapses. Subsequently, the project will focus on developing a predictive tool to assist local authorities in making informed decisions about abandoned mines at risk of collapse.
The research is divided into three main parts:
* Creating a foundational database for machine learning applications: This involves implementing a strategy tailored to the project’s specific requirements, which demands expert knowledge. The process also includes merging different data types into a single repository to enable fast, effective and secure access. An initial review of the Ineris inventory is therefore planned, followed by comprehensive pre-processing and statistical analyses. Additionally, unsupervised learning methods will be applied to the dataset to reveal hidden patterns and identify influential factors that might otherwise remain difficult to detect.
* Generating synthetic data: This project will explore the effectiveness and validity of using machine learning approaches to generate synthetic data in order to address incompleteness and improve the training datasets of machine learning models. Additionally, synthetic data informed by physical principles will also be used. Significant advances have been made in understanding field phenomena observed in underground mines through empirical and numerical methods. Physics-based learning could enhance model quality and generalisation by providing a physical foundation using synthetic data from these methods.
* Developing machine learning surrogate models: Drawing on engineering expertise and insights from previous stages, this phase has two main objectives:
(1) To assess and verify the effectiveness of ensemble methods with the database established in this project. Building on the proven effectiveness of ensemble techniques such as random forests (RF) with limited or incomplete datasets (Guayacán-Carrillo et al., 2025), the study will evaluate various ensemble approaches. These will include RF and XGBoost, which are well known for their efficiency with small datasets.
(2) To develop accessible predictive tools for engineers and local authorities. In this endeavour, symbolic regression will be utilised. These tools are essential as they provide explicit mathematical expressions linking input and output data, thereby enhancing the model’s relevance to engineers and facilitating informed decision-making.
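As a purely illustrative aside on the ensemble idea mentioned in objective (1), the sketch below shows bootstrap aggregation (bagging) of decision stumps in plain Python. It is a much simplified relative of random forests, not RF or XGBoost themselves, and not the project's method. The features (depth and gallery width, in metres), the labelling rule, the noise level and all function names are hypothetical assumptions; the data are randomly generated toy values, not Ineris records.

```python
import random

def fit_stump(rows):
    """Find the single (feature, threshold) split with the fewest errors.
    Each row is (features, label), with label 0 (stable) or 1 (collapse)."""
    best = None
    for j in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[j]
            left = [y for f, y in rows if f[j] <= t]
            right = [y for f, y in rows if f[j] > t]
            # Majority label on each side (0 if the side is empty)
            l_lab = int(2 * sum(left) >= len(left)) if left else 0
            r_lab = int(2 * sum(right) >= len(right)) if right else 0
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]  # (feature index, threshold, left label, right label)

def predict_stump(stump, x):
    j, t, l_lab, r_lab = stump
    return l_lab if x[j] <= t else r_lab

def fit_bagged_stumps(rows, n_models=25, seed=0):
    """Train each stump on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows])
            for _ in range(n_models)]

def predict_proba(models, x):
    """Fraction of stumps voting 'collapse' for the feature vector x."""
    return sum(predict_stump(m, x) for m in models) / len(models)

# Toy data (assumption): collapse is more likely for shallow mines,
# with roughly 10% of labels flipped to mimic noisy field records.
rng = random.Random(42)
rows = []
for _ in range(80):
    depth = rng.uniform(5.0, 50.0)   # hypothetical depth in metres
    width = rng.uniform(2.0, 8.0)    # hypothetical gallery width in metres
    label = int(depth < 20.0)
    if rng.random() < 0.1:
        label = 1 - label
    rows.append(((depth, width), label))

models = fit_bagged_stumps(rows)
p_shallow = predict_proba(models, (8.0, 5.0))
p_deep = predict_proba(models, (45.0, 5.0))
print("shallow mine:", p_shallow, "deep mine:", p_deep)
```

Averaging many weak models trained on resampled data tends to reduce the variance of the prediction, which is one reason ensemble methods are often considered when records are scarce or incomplete.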
