As part of the Data Science Project Unit, run in collaboration with SNCF, our team analyzed data from catenary uplift sensors across the French railway network to model and predict extreme uplift events. I was responsible for cleaning and structuring the dataset, ensuring consistency across more than 800,000 raw records from six monitoring stations. This involved filtering out erroneous or incomplete sensor readings and aggregating the validated data into a single dataset for analysis.
I also led the Extreme Value Theory (EVT) study, applying both the Generalized Pareto Distribution (GPD) and the Generalized Extreme Value (GEV) distribution to characterize rare but critical uplift events. By determining optimal thresholds through Monte Carlo simulations, I identified physically meaningful extremes and estimated failure probabilities for each monitoring site. This work provided insight into the reliability and safety of SNCF’s overhead contact line system.
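
To illustrate the peaks-over-threshold idea behind the GPD part of the study, here is a minimal sketch. It is not the project's code. The uplift values are synthetic, the threshold is simply fixed at the 90th percentile rather than chosen by Monte Carlo simulation, and the GPD parameters are estimated with a simple method-of-moments rule (valid for a shape parameter below 0.5) in place of whatever fitting procedure was actually used. The function names and the 60 mm limit are hypothetical.

```python
import math
import random
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (xi, sigma) for a GPD fitted to threshold excesses."""
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v                 # equals 1 - 2*xi for an exact GPD
    xi = 0.5 * (1.0 - ratio)          # shape
    sigma = 0.5 * m * (ratio + 1.0)   # scale
    return xi, sigma


def exceedance_probability(level, threshold, xi, sigma, rate):
    """Estimate P(X > level) for level >= threshold.

    rate is the observed fraction of readings above the threshold.
    """
    y = level - threshold
    if abs(xi) < 1e-9:                # exponential tail as xi -> 0
        return rate * math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:                   # beyond the finite upper endpoint (xi < 0)
        return 0.0
    return rate * base ** (-1.0 / xi)


# Synthetic uplift readings in mm (assumption: exponential, mean 10 mm).
rng = random.Random(0)
readings = [rng.expovariate(0.1) for _ in range(20000)]

# Fixed 90th-percentile threshold (assumption; the study tuned its thresholds).
threshold = sorted(readings)[int(0.9 * len(readings))]
excesses = [x - threshold for x in readings if x > threshold]
rate = len(excesses) / len(readings)

xi, sigma = fit_gpd_moments(excesses)
limit = threshold + 20.0  # hypothetical critical uplift level
p_fail = exceedance_probability(limit, threshold, xi, sigma, rate)
print(f"xi={xi:.3f}, sigma={sigma:.2f}, P(uplift > {limit:.1f} mm)={p_fail:.5f}")
```

Applied per monitoring site, the same two steps (fit the excesses over a threshold, then extrapolate the tail) yield a site-specific estimate of the probability that uplift exceeds a given limit.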