All the presentations are available here.

__Marcelo Barreiro (University of the Republic of Uruguay):__ "Climate networks and atmospheric connectivity"

Advancing our understanding of the complex dynamics of our climate requires the development and use of new approaches for climate data analysis. In this talk I will present how the application of the complex network approach in conjunction with nonlinear analysis has yielded new insights into atmospheric and oceanic phenomena. In particular, I will focus on the detection and variability of atmospheric connectivity during the 20th century and how it might change under anthropogenic forcing.

__Marc Bocquet (Ecole des Ponts ParisTech):__ "Dynamics-based reduction of data assimilation for chaotic models"

Data assimilation in geophysics handles huge sets of data. As opposed to big data techniques, data assimilation uses high-dimensional dynamical models to make sense of these data. For the sake of consistency and computational efficiency, there are several pragmatic ways in geophysical DA to discard irrelevant data (quality control, thinning, etc.) and others to reduce the model complexity (upscaling, parametrizations, etc.). Quite differently, I will show how certain features of the dynamics of chaotic models could be used to alleviate such big data problems. In line with the work of Anna Trevisan and co-authors, this talk will focus on the impact of the unstable and neutral subspace, i.e. the space spanned by the backward Lyapunov vectors with non-negative exponents, on (ensemble) Kalman filtering and smoothing techniques. I will demonstrate that, in the linear and perfect model case, the error covariance matrix is asymptotically supported by the unstable and neutral subspace only. I will examine what becomes of this picture in the weakly nonlinear, possibly imperfect case, and will also discuss how this extends to new techniques such as 4D nonlinear ensemble variational methods. These investigations suggest how the cost of representing the uncertainty in chaotic dynamical systems could be significantly reduced.

__Ronan Fablet (IMT-Atlantique, France):__ "Analog assimilation for high-dimensional geophysical dynamics"

Analog assimilation has recently been introduced as a data-driven alternative to model-driven data assimilation. The key idea is to build an exemplar-based dynamical model from a representative dataset of exemplars of the considered state-space. Here, we specifically address and discuss analog assimilation for high-dimensional state-spaces, more specifically spatio-temporal fields. We introduce a novel model, which combines a patch-based representation with a multi-scale and PCA-based decomposition. This model amounts to decomposing the global high-dimensional assimilation problem into a series of local low-dimensional assimilation problems. We demonstrate its relevance through an application to the reconstruction of spatio-temporal fields from irregularly-sampled observations. As a case study, we consider the spatio-temporal interpolation of satellite-derived sea surface geophysical fields. We further discuss large-scale implementation issues associated with analog assimilation.

__Dorit Hammerling (IMAGe-NCAR, US):__ "Compression and Conditional Emulation of Climate Model Output"

Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the generated data is becoming a bottleneck, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. The statistical model can be used to generate realizations representing the full dataset, along with characterizations of the uncertainties in the generated data. Thus, the methods are capable of both compression and conditional emulation of the climate models. Considerable attention is paid to accurately modeling the original dataset, particularly with regard to the inherent spatial nonstationarity in global temperature fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.

__Alexis Hannart (Ouranos, Canada):__ "A few problems in climate research that data science may help tackle"

This conference attempts to bring together researchers in the environmental sciences and researchers in data science, in the hope that the interaction between the two areas will be fruitful. But why are we inclined to believe that such an interaction could actually be fruitful? Beyond the very high-level and blurry "data deluge" argument, can we be more specific? Here we attempt to draft a list of precise scientific questions in the realm of climate sciences, for which a data science approach seems to be relevant and may potentially make a big difference. Furthermore, we attempt to reframe these scientific questions in a way that makes them directly amenable to the usual tools and concepts of data science, in order to make them more accessible and attractive to data scientists. An illustration on the attribution of climate trends and events is discussed.

__Ibrahim Hoteit (KAUST, Saudi Arabia):__ "Gaussian-mixture filtering high dimensional systems with small ensembles"

The update step of the Gaussian-mixture filter consists of an ensemble of Kalman updates for each center of the mixture, generalizing the ensemble Kalman filter (EnKF) update to non-Gaussian distributions. Sampling the posterior distribution is required for forecasting with the numerical model. Because of computational limitations, only small samples can be considered when dealing with large-scale atmospheric and oceanic models. As such, a "targeted" sampling that captures some features of the posterior distribution might be a better strategy than a straightforward random sampling. This was numerically demonstrated for the Gaussian-based ensemble filters, with the deterministic EnKFs outperforming the stochastic EnKF in many applications. In this talk, I will present two filtering algorithms based on this idea of "targeted" sampling: the first one introduces a deterministic sampling of the observation perturbations in the stochastic EnKF in order to exactly match the first two moments of the Kalman filter, and the second one relies on a Gaussian-mixture update step based on a clustering of the forecast ensemble and a resampling step matching the first two moments of the posterior distribution. Numerical results will be presented and discussed.
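
For readers unfamiliar with the baseline, the following is a minimal sketch of the standard stochastic EnKF update for a scalar, directly observed state. It is an illustration only, not either of the algorithms presented in the talk: the function name, the scalar setting and the observation operator H = 1 are assumptions made here for brevity. The random observation perturbations drawn in the last line are the step that the first algorithm above replaces with a deterministic sampling.

```python
import random
import statistics


def stochastic_enkf_update(ensemble, obs, obs_var, rng):
    """Scalar stochastic EnKF update, assuming the state is observed directly (H = 1)."""
    forecast_var = statistics.variance(ensemble)    # sample forecast variance
    gain = forecast_var / (forecast_var + obs_var)  # scalar Kalman gain
    # Each member assimilates its own randomly perturbed copy of the observation.
    return [
        x + gain * (obs + rng.gauss(0.0, obs_var ** 0.5) - x)
        for x in ensemble
    ]
```

With a small ensemble, the sample moments of the random perturbations differ from their nominal values, which is one source of sampling error in this update.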

__Erwan Le Pennec (Ecole Polytechnique, France):__ "A gentle introduction to Data Science"

In this talk, I will try to explain what Data Science is, to demystify the term Big Data, to present a few open challenges in Data Science, and to describe what a data scientist should be.

__Pierre-Yves Le Traon (Mercator, France):__ "The Copernicus Marine Environment Monitoring Service"

More than ever, there is a need to continuously monitor the oceans. This is imperative to understanding and predicting the evolution of our weather and climate. This is also essential for a better and sustainable management of our oceans and seas. The Copernicus Marine Environment Monitoring Service (CMEMS) has been set up to answer these challenges. CMEMS provides a unique monitoring of the global ocean and European seas based on satellite and in situ observations and models. CMEMS monitors past (over the last 30 years) and current marine conditions and provides short-term forecasts. Mercator Ocean was tasked by the EU to implement the service. The organisation is based on a strong European partnership with more than 60 marine operational and research centres in Europe that are involved in the service and its evolution. An overview of CMEMS, its drivers, organization and initial achievements will be given. The essential role of in situ and satellite upstream observations will be discussed, as well as the CMEMS Service Evolution Strategy, associated R&D priorities and future technical and scientific challenges. Challenges related to big data issues will, in particular, be addressed.

__Manuel Lopez Radcenco (IMT-Atlantique, France):__ "Non-negative and sparse decomposition of geophysical dynamics"

The growing availability of multi-source environmental data (remote sensing data, numerical simulations, in situ data, etc.) paves the way for the development of novel data-driven models and strategies for the characterization, reconstruction and forecasting of geophysical dynamics. In this context, the observation-driven identification and separation of contributions and operators associated with different geophysical sources or processes is a key issue. Following significant advances reported in signal processing with the introduction of non-negative and sparse formulations, we address this issue through the blind decomposition of linear operators or transfer functions between variables or processes of interest. The proposed scheme relies on multiple superimposing linear regressions and on their calibration from the observed data. We explore locally-adapted multi-modal regression models and investigate different dictionary-based decompositions, namely based on principal component analysis (PCA), sparse priors and non-negativity constraints. This is regarded as a key feature to improve model calibration robustness. We illustrate and demonstrate the relevance of such decompositions for the analysis and reconstruction of geophysical dynamics. We first address forecasting issues. Using the Lorenz '96 dynamical system as a case study, we introduce the blind dictionary-based decomposition of local linear operators. Our numerical experiments show improved forecasting performance when dealing with small-sized and noisy observation datasets. A second application addresses the super-resolution of irregularly-sampled ocean remote sensing images. We focus on the reconstruction of high-resolution Sea Surface Height (SSH) from the synergy between along-track altimeter data, OI-interpolated SSH fields and satellite-derived high-resolution Sea Surface Temperature (SST) fields. The reported experiments, for a case study region in the Western Mediterranean Sea, demonstrate the relevance of the proposed model, especially of locally-adapted parametrizations with non-negativity constraints, to outperform optimally-interpolated reconstructions.

__Guillaume Maze (IFREMER, France):__ "Applications of ocean profile classification modelling"

Ocean dynamics and the induced 3-dimensional structure and variability are so complex that it is very difficult to develop objective and efficient diagnostics of horizontally and vertically coherent oceanic patterns. However, identifying such patterns is crucial to the understanding of interior mechanisms as, for instance, the integrand giving rise to Global Ocean Indicators (e.g. heat content and sea level rise). We believe that, by using state-of-the-art machine learning algorithms and by building on the increasing availability of ever-larger in situ and numerical model datasets, we can address this challenge in a way that was simply not possible a few years ago. We will present the principles and first results of an approach introduced by Maze et al. (2017) based on what is coined a "Profile Classification Model" or PCM, which focuses on vertically coherent patterns and their spatial distribution. PCM can be used in a variety of oceanographic problems (front detection, water mass identification, natural region contouring, reference profile selection for validation, etc.).

__Patrick McDermott (University of Missouri, US):__ "A hierarchical spatio-temporal analog forecasting model for nonlinear ecological processes"

Analog forecasting has been successful at producing robust forecasts for a variety of ecological and physical processes. In essence, analog forecasting is a mechanism-free nonlinear method that forecasts a system forward in time by examining how past states deemed similar to the current state moved forward. Previous work on analog forecasting has typically been presented in an empirical or heuristic context, as opposed to a formal statistical framework. We propose a Bayesian model for analog forecasting, building upon previous analog methods. Thus, unlike traditional analog forecasting methods, the use of Bayesian modeling allows one to rigorously quantify uncertainty to obtain realistic posterior predictive forecasts. The model is applied to the long-lead time forecasting of mid-May averaged soil moisture anomalies in Iowa over a high-resolution grid of spatial locations. We also further develop the model in a hierarchical framework for the purpose of forecasting count-valued data by using nonnegative matrix factorization (NMF) to conduct dimension reduction. This extension of the model is applied to the forecasting of waterfowl counts in the United States and Canada.
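
The basic, non-Bayesian analog idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the hierarchical model of the talk: the function name, the Euclidean distance, the inverse-distance weighting and the parameter `k` are all choices made here for the example.

```python
import math


def analog_forecast(catalog, current_state, k=3):
    """Forecast the next state from the successors of the k closest past states.

    catalog: time-ordered list of past states, each a list of floats.
    current_state: list of floats with the same length as each catalog state.
    """
    # Only states with a known successor can serve as analogs.
    scored = sorted(
        (math.dist(state, current_state), index)
        for index, state in enumerate(catalog[:-1])
    )
    nearest = scored[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (distance + 1e-12) for distance, _ in nearest]
    total = sum(weights)
    return [
        sum(w * catalog[index + 1][j] for w, (_, index) in zip(weights, nearest)) / total
        for j in range(len(current_state))
    ]
```

A heuristic of this kind returns a single point forecast; the Bayesian formulation proposed in the talk is what provides the uncertainty quantification.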

__Olivier Mestre (Météo France):__ "Calibration of Numerical Weather Forecasts using Machine Learning Algorithms"

NWP models usually capture the main circulation patterns, but are often biased in accounting for local variations in surface meteorological parameters. Hence, statistical post-processing techniques are used to improve local weather predictions: MOS (Model Output Statistics) and EMOS (Ensemble Model Output Statistics). In this talk, we briefly recall the principle of MOS techniques, and show examples of applications for parameters such as temperature, wind speed and cloud cover. We discuss the applicability of linear models versus classical machine learning algorithms: from trees to random forests, SVM, etc. Since the amount of data involved in the post-processing of high-resolution gridded fields is huge (more than a terabyte), we investigate ways to reduce computation time. Similarly to deterministic models, Ensemble Forecast Systems tend to be biased, but this bias also affects dispersion: very often, raw ensembles tend to be underdispersive. We will show how techniques based on Quantile Regression Forests are able to efficiently correct probabilistic forecasts in a non-parametric way.

__Takemasa Miyoshi (RIKEN, Japan):__ "Big Data Assimilation for 30-second-update 100-m-mesh Numerical Weather Prediction"

As computer and sensor technologies advance, numerical weather prediction will face the challenge of integrating Big Simulations and observation Big Data. I will present my perspective on the next 10-20 years of data assimilation with the future-generation sensors and post-peta-scale supercomputers, based on our own experience with the 10-petaflops “K computer”. New sensors produce orders of magnitude more data than the current sensors, and faster computers enable orders of magnitude more precise simulations, or “Big Simulations”. Data assimilation integrates the “Big Data” from both new sensors and Big Simulations. We started a “Big Data Assimilation” project, aiming at a revolutionary weather forecasting system to refresh 30-minute forecasts at 100-m resolution every 30 seconds, 120 times more rapidly than hourly-updated systems. We also investigated ensemble data assimilation using 10240 ensemble members, the largest ever for the global atmosphere. Based on the experience using the K computer, we will discuss the future of data assimilation in the forthcoming Big Data and Big Simulation era.

__Philippe Naveau (IPSL, France):__ "Revising return periods for record events in a climate event attribution context"

Both climate and statistical models play an essential role in the process of demonstrating that the distribution of some atmospheric variable has changed over time and in establishing the most likely causes for the detected change. One statistical difficulty in the research field of Detection and Attribution resides in defining events that can be easily compared and accurately inferred from reasonable sample sizes. As many impact studies focus on extreme events, the inference of small probabilities and the computation of their associated uncertainties quickly becomes challenging. In the particular context of event attribution, we address the question of how to compare records between the so-called "world as it might have been without anthropogenic forcings" and the "world that is". Records are often the most important events in terms of impact and get much media attention. We will show how to efficiently estimate the ratio of two small record probabilities. The inferential gain is particularly substantial when a simple hypothesis testing procedure is implemented. The theoretical justification of such a proposed scheme can be found in Extreme Value Theory. To illustrate our approach, classical indicators in event attribution studies, like the Risk Ratio or the Fraction of Attributable Risk, are modified and tailored to handle records. We illustrate the advantages of our method through theoretical results, simulation studies, temperature records in Paris and outputs from a numerical climate model.
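
For reference, the two classical indicators named above can be written out in their usual form. The sketch below shows the standard definitions only, not the record-tailored versions developed in the talk. The function and argument names are chosen here for illustration: `p_factual` stands for the event probability in the "world that is" and `p_counterfactual` for the probability in the world without anthropogenic forcings.

```python
def risk_ratio(p_factual, p_counterfactual):
    """Risk Ratio: how many times more likely the event is in the factual world."""
    return p_factual / p_counterfactual


def fraction_attributable_risk(p_factual, p_counterfactual):
    """Fraction of Attributable Risk, FAR = 1 - p_counterfactual / p_factual."""
    return 1.0 - p_counterfactual / p_factual
```

Both indicators are ratios of small probabilities, which is why the estimation uncertainty discussed in the abstract matters so much in practice.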

__Douglas Nychka (IMAGe-NCAR, US):__ "Large and non-stationary spatial fields: Quantifying uncertainty in the pattern scaling of climate models"

Pattern scaling has proved to be a useful way to extend and interpret Earth system model (i.e. climate) simulations. In the simplest case the response of local temperatures is assumed to be a linear function of the global temperature. This relationship makes it possible to consider many different scenarios of warming by using simpler climate models and combining them with the scaling pattern deduced from a more complex model. This work explores a methodology using spatial statistics to quantify how the pattern varies across an ensemble of model runs. The key is to represent the pattern uncertainty as a Gaussian process with a spatially varying covariance function. We found that when applied to the NCAR/DOE CESM1 large ensemble experiment we are able to reproduce the heterogeneous variation of the pattern among ensemble members. These data also present an opportunity to fit a large, fixed-rank Kriging model (LatticeKrig) to give a global representation of the covariance function on the sphere. The climate model output at 1 degree resolution has more than 50,000 spatial locations and so requires special numerical approaches to fit the covariance function and simulate fields. Many of the local statistical computations are embarrassingly parallel and the analysis can be accelerated by parallel tools within the R statistical environment.
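
The simplest case mentioned above, a local temperature that depends linearly on the global temperature, can be sketched as one least-squares slope per location. This illustrates the linear assumption only and says nothing about the Gaussian process model of the pattern uncertainty; the function name and the list-based data layout are assumptions made for the example.

```python
def fit_scaling_pattern(global_temp, local_temps):
    """Least-squares slope of each location's temperature against the global mean.

    global_temp: list of T global-mean temperature values.
    local_temps: list of locations, each a list of T local temperature values.
    Returns one slope per location (local change per unit of global change).
    """
    n = len(global_temp)
    g_mean = sum(global_temp) / n
    g_var = sum((g - g_mean) ** 2 for g in global_temp)
    pattern = []
    for series in local_temps:
        s_mean = sum(series) / n
        cov = sum((g - g_mean) * (s - s_mean) for g, s in zip(global_temp, series))
        pattern.append(cov / g_var)
    return pattern
```

Multiplying the fitted pattern by a global temperature change from a simpler model then gives an approximate local response, which is the use of pattern scaling described in the abstract.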

__Thierry Penduff (IGE, France):__ "Probabilistic analysis of the OCCIPUT global ocean simulation ensemble"

The ocean dynamics are described by nonlinear partial differential equations, in which the time-dependent atmospheric forcing (winds, heat/freshwater fluxes) is prescribed as boundary conditions. Ocean General Circulation Models (OGCMs) are used to solve these equations, in order to study the Global Ocean evolution over weeks to centuries in a realistic context (in terms of physics, initial/boundary conditions, domain geometry, etc.).

Unlike low-resolution OGCMs that were used in recent climate projections (IPCC), high-resolution OGCMs are nonlinear enough to spontaneously generate an intrinsic ocean variability, i.e. under constant forcing. This strong phenomenon has a chaotic behavior (i.e. sensitivity to initial perturbations) and impacts many climate-relevant quantities over a broad range of spatio-temporal scales (up to the scale of oceanic basins and multiple decades). Whether and how this atmospherically-modulated, low-frequency oceanic chaos may, in turn, impact the atmosphere and climate is an unsettled issue; it is however crucial in the perspective of the next IPCC projections, which will use high-resolution OGCMs coupled to the atmosphere.

Before addressing this coupled issue, oceanographers need to disentangle the forced/intrinsic parts of the oceanic variability, identify the structure and scales of both components, and their possible interplays. In the framework of the OCCIPUT ANR/PRACE project, we have performed a 50-member ensemble of global ocean/sea-ice 3D simulations, driven by the same 1958-2015 atmospheric forcing after initial state perturbations. The structure and temporal evolution of the resulting ensemble PDFs hence yield a new (probabilistic) description of the global ocean/sea-ice multi-scale variability over the last 5 decades, raising new questions regarding the detection and attribution of climatic signals, and providing new insights about the complex oceanic dynamical system.

We will first describe our objectives, our ensemble simulation strategy, the classical approaches we have first used to analyze these data and our present results. We will present the non-Gaussian metrics (based e.g. on information theory) we are developing to more thoroughly characterize the features, scales and imprints of the oceanic chaos and of their atmospheric modulation. As a perspective, it is likely that more specific (supervised/unsupervised classification/analysis/pattern recognition) signal processing techniques could provide more relevant information from this large (~100 TB), novel 5-dimensional (space, time, ensemble) dataset, and strengthen the emergence of probabilistic oceanography for climate science.

__Nicolas Raillard (IFREMER, France):__ "Spatial modeling of extreme"

In ocean engineering, estimating the probability of occurrence of extreme sea-states is crucial for the design of Marine Renewable Energy (MRE) structures. This probability is usually estimated for a return period which is considerably larger than the observation period, and thus one needs to extrapolate far beyond the observed range of data, which leads to uncertain estimates. In addition, as far as structural safety is concerned, the extremal loads involve covariates and multivariate modeling. In this study, we will review methods available to assess the extremal behavior of sea-states: we will give particular attention to the definition of an extremal event, to the threshold selection and to the temporal and spatial correlations. For example, the spatial variability should be taken into account in order to reduce the uncertainty of single-site analysis, and automated methods must be developed to cope with the large amount of available data. Moreover, the modeling of multivariate extremes is still at an early stage; many methods exist and will be compared. The influence of the differences between the methods on structural safety will be assessed on systems whose responses cannot be easily characterized by only one variable. For illustration purposes, we will rely on a recently released dataset, the HOMERE hindcast database, which is available on an unstructured grid whose resolution evolves from about 10 km offshore down to 200 m near-shore. The domain of the model extends from the South of the North Sea to the North of Spain, covering the whole continental shelf in the Bay of Biscay. Global parameters are available at each point of the high-resolution computational grid over the whole 19-year simulation running from 1994 to 2012 with an hourly time step. A comparison to other available data sources will also be carried out.

__Mélanie Rochoux (Cerfacs, France):__ "Environmental risk prediction using reduced-cost Ensemble Kalman Filter based on Polynomial Chaos surrogate"

In flood forecasting, uncertainties in river bed roughness coefficients, upstream and lateral discharges as well as the water level-discharge relation translate into uncertainties in simulated water levels. In wildfire forecasting, uncertainties in the biomass moisture content, biomass fuel properties, surface wind velocity and orography induce uncertainties in the simulated position and intensity of active flame areas. These uncertainties in physical parameters, external forcing and modeling assumptions go beyond the limitations of deterministic forecast capabilities of a given dynamical system. They suggest the use of ensemble forecasts to stochastically characterize the response of the forward model and thereby establish the range of possible scenarios in case of imminent environmental risk. Uncertainty quantification methods aim at designing a cost-effective surrogate to the forward model, which is then used to identify the main sources of uncertainties in a given system and to quantify how they translate into uncertainties in the Quantities of Interest (QoI) for the risk managers, for instance the probability to exceed a given threshold at some strategic locations. We focus here on Polynomial Chaos (PC) expansion since its coefficients directly relate to the statistics of the quantities of interest. A promising approach to reduce uncertainties is to integrate in situ and/or remote sensing observations into the model using an Ensemble Kalman Filter (EnKF). This algorithm requires the stochastic estimation of the covariances between the target sources of uncertainties and the QoI, thus implying a large number of model integrations due to possible model nonlinearity. In order to reduce the EnKF computational cost, a probabilistic sampling based on PC expansion is combined with the EnKF algorithm (PC-EnKF). In this hybrid strategy, the PC surrogate is used in place of the forward model to estimate the input-output error cross-covariance and output error covariance statistics, leading to the formulation of the Kalman gain matrix. The new estimate of the target sources of uncertainties is then obtained by applying the classical Kalman filter equation. The hybrid PC-EnKF was found to feature performance similar to that of the standard EnKF for both wildfire and open-channel flow applications, without loss of accuracy but at a much-reduced computational cost. The following question is to be addressed in the near future: can data science help estimate, at reduced cost, the input-output error cross-covariance and output error covariance statistics in the framework of the PC-EnKF algorithm, to pave the way toward operational applications?

__Thomas Romary (Mines ParisTech, France):__ "Covariance decomposition for the kriging of large datasets"

Large spatial datasets are becoming ubiquitous in environmental sciences with the explosion in the amount of data produced by sensors that monitor and measure the Earth system. Consequently, methods for the geostatistical analysis of these data have to be developed. Richer datasets lead to more complex modeling but may also prevent the use of classical techniques. Indeed, the kriging predictor is not straightforwardly available as it requires the inversion of the covariance matrix of the data. The challenge of handling such datasets is therefore to extract the maximum of information they contain while ensuring the numerical tractability of the associated inference and prediction algorithms. In this work, we will first provide an overview of the different approaches that have been developed in the literature to address this problem. They can be classified into two families, both aiming at making the inversion of the covariance matrix computationally feasible. The tapering approach circumvents the problem by enforcing the sparsity of the covariance matrix, making it invertible in a reasonable computation time. The second available approach assumes, on the contrary, a low rank representation of the covariance function. While both approaches have their drawbacks, we propose a way to combine them. The covariance model is assumed to have the form sparse plus low rank, both terms being possibly non-stationary. The choice of the basis functions sustaining the low rank component is data-driven and is achieved through a selection procedure. This model can be expressed as a spatial random effects model, and maximum likelihood estimation of the parameters can be conducted through the expectation-maximization algorithm.
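
The tapering family mentioned above can be illustrated with a short sketch: a covariance function is multiplied by a compactly supported correlation function, so that entries for pairs of points farther apart than the taper range become exactly zero. The exponential covariance, the Wendland-type taper and all function names are assumptions chosen for this example; the sketch covers neither the low rank component nor the combined sparse plus low rank model proposed in the talk.

```python
import math


def exponential_cov(d, sill=1.0, scale=1.0):
    """Exponential covariance at distance d (an illustrative model choice)."""
    return sill * math.exp(-d / scale)


def wendland_taper(d, support):
    """Wendland-type taper: equals 1 at d = 0 and exactly 0 for d >= support."""
    r = d / support
    return (1.0 - r) ** 4 * (4.0 * r + 1.0) if r < 1.0 else 0.0


def tapered_cov_matrix(points, support, sill=1.0, scale=1.0):
    """Dense list-of-lists covariance matrix whose far-apart entries are exactly zero."""
    return [
        [
            exponential_cov(math.dist(p, q), sill, scale)
            * wendland_taper(math.dist(p, q), support)
            for q in points
        ]
        for p in points
    ]
```

In practice the zeros would be exploited by storing the matrix in a sparse format and using sparse solvers; the dense layout here only keeps the example short.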

__Jakob Runge (Imperial College, UK):__ "Causal inference methods in the geosciences"

What can we learn about interactions between subprocesses of the Earth system from studying time series measurements? What can novel causal inference methods tell us beyond common correlation analyses and Granger causality, and what are the pitfalls? The focus of this talk will be statistical methods of causal inference to reconstruct causal associations and identify causal pathways in the reconstructed interaction network. These approaches build on recent causal discovery methods and machine learning techniques. The talk will interweave methodological parts with climate examples.

__Adam Sykulski (Lancaster University, UK):__ "Stochastic Lagrangian modelling of ocean surface drifter trajectories"

The analysis of large-scale oceanographic datasets obtained from surface drifters is critical for developing our understanding of ocean circulation. One such database is from the Global Drifter Program by NOAA, which has deployed over 23,000 drifters resulting in over 100 million recorded positions since 1979. In this talk we present a novel stochastic model that describes the motion of ocean surface drifters. The aim is to construct a model that provides a good fit to observations, but is also constructed from geophysical fluid flow principles such that estimated parameters of the model are physically informative and in meaningful units. The model consists of four physically-motivated stochastic components. The first is for modelling the turbulent background flow, the second is for inertial oscillations, the third is for the semidiurnal tide, and the fourth is for the diurnal tide. For the turbulent background, we build on existing stochastic models in the literature, by proposing a more generalised stochastic process that allows for a wide range of decay rates of the Lagrangian velocity spectral slope. To model inertial oscillations, we construct a stochastic analogue of the damped-slab model of the surface mixed layer. We then construct novel and computationally-efficient procedures for fitting the aggregated stochastic model to observed Lagrangian velocity spectra from surface drifters. In total we estimate up to nine free parameters for each analysed trajectory segment, and these parameter estimates then provide useful summaries of structure at the drifter’s location. As examples of summaries, spatially-dependent estimates can be made of the rate of horizontal diffusivity, the damping timescale of inertial oscillations, or the rate of decay of the Lagrangian spectral slope. We present the results of two global analyses. The first uses all observations of the global surface drifter dataset since 1979 at lower temporal resolution to analyse the turbulent background. These findings provide the first global estimates of the Lagrangian spectral slope and how this varies spatially. The second analysis uses higher temporal resolution observations available since 2005 to resolve parameters that describe inertial oscillations. Here the aggregated stochastic model is used to separate the properties of inertial oscillations from the background flow and tidal signals. We present high-resolution global maps of inertial oscillation amplitude and damping timescales.

__Eniko Szekely (CIMS-NYU, US):__ "Data-driven kernel methods for dynamical systems with application to atmosphere ocean science"

Datasets generated by dynamical systems are often high-dimensional, but they only display a small number of patterns of interest. The underlying low-dimensional structure governing such systems is generally modeled as a manifold, and its intrinsic geometry is well described by local measures that vary smoothly on the manifold, such as kernels, rather than by global measures, such as covariances. In this talk, a kernel-based nonlinear dimension reduction method, namely nonlinear Laplacian spectral analysis (NLSA), is used to extract a reduced set of basis functions that describe the large-scale behavior of the dynamical system. These basis functions are the leading Laplace-Beltrami eigenfunctions of a discrete Laplacian operator. They can be further employed as predictors to quantify the regime predictability of a signal of interest using clustering and information-theoretic measures. In this talk, NLSA will be employed to extract physically meaningful spatiotemporal patterns from organized tropical convection covering a wide range of timescales, from interannual to annual, semiannual, intraseasonal and diurnal scales.

__Laurent Terray (Cerfacs, France):__ "Can Data Science help in the attribution of the southeastern United States *Warming Hole*?"

The past evolution of the Northern Hemisphere land surface air temperatures (SAT) reflects the combined influence of external forcing and internal variability. Robust attribution statements (for example, that a given external forcing caused an observed variation) are challenging at regional scales because the response to external forcings can easily be obscured by the confounding role of internal variability. Here we focus on the Warming Hole, a period of twenty-five years (1950-1975) with a large summer cooling trend that occurred over the southeastern United States. Previous studies have claimed that increasing anthropogenic aerosol emissions are the main driver of the Warming Hole. Here we use both observations and initial-condition large ensembles of historical climate model simulations to revisit this attribution statement. In addition to the usual forced/free (internal) separation, we also perform a dynamical adjustment procedure. Dynamical adjustment decomposes the forced response and internal variability trend contributions into dynamically and thermodynamically induced components. We use Data Science algorithms, namely Random Forest and Stochastic Gradient Boosting, to perform the dynamical adjustment. The causes of SAT trends can then be formally separated into four components, according to whether they are free or forced and dynamically or thermodynamically induced. We then investigate the respective contributions of each component to the Warming Hole SAT trends. The free dynamical component is shown to be mainly responsible for the Warming Hole SAT trends. The forced thermodynamical component (representing the radiative effect of aerosols and other forcings) is shown to have a much less significant influence.
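
The four-way separation described above can be sketched on synthetic data. In this sketch ordinary least squares stands in for the Random Forest and Stochastic Gradient Boosting regressors named in the abstract, the ensemble mean is taken as the forced part, and all names and array shapes are hypothetical; it illustrates the bookkeeping only and is not the procedure used in the talk.

```python
import numpy as np

def four_way_decomposition(sat, circulation):
    """Split SAT into forced/free x dynamical/thermodynamical parts.

    sat:         array (members, time) of temperature anomalies
    circulation: array (members, time, predictors) of circulation indices
    Least squares is a stand-in for the nonlinear regressors of the abstract.
    """
    n_members, n_time, n_pred = circulation.shape
    design = np.column_stack([np.ones(n_members * n_time),
                              circulation.reshape(-1, n_pred)])
    coef, *_ = np.linalg.lstsq(design, sat.reshape(-1), rcond=None)
    # Dynamical part: SAT explained by circulation; the rest is thermodynamical
    dynamical = (design @ coef).reshape(n_members, n_time)
    thermodynamical = sat - dynamical
    # Forced part: ensemble mean; free part: departure of each member from it
    forced_dyn = np.broadcast_to(dynamical.mean(axis=0), sat.shape).copy()
    forced_thermo = np.broadcast_to(thermodynamical.mean(axis=0), sat.shape).copy()
    return {
        "forced_dynamical": forced_dyn,
        "free_dynamical": dynamical - forced_dyn,
        "forced_thermodynamical": forced_thermo,
        "free_thermodynamical": thermodynamical - forced_thermo,
    }

# Synthetic ensemble: 20 members, 60 time steps, 3 circulation predictors
rng = np.random.default_rng(1)
circulation = rng.standard_normal((20, 60, 3))
forced_signal = 0.02 * np.arange(60)  # common to all members
sat = (forced_signal[None, :]
       + circulation @ np.array([0.5, -0.3, 0.1])
       + 0.1 * rng.standard_normal((20, 60)))

parts = four_way_decomposition(sat, circulation)
total = sum(parts.values())
```

By construction the four components add up to the original field, and the two free components average to zero across the ensemble.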

__Christopher Wikle (University of Missouri, US):__ "Recent Advances in Quantifying Uncertainty in Nonlinear Spatio-Temporal Statistical Models"

Spatio-temporal data are ubiquitous in the environmental sciences, and their study is important for understanding and predicting a wide variety of processes of interest to meteorologists and climate scientists. Two of the primary difficulties in modeling spatial processes that change with time are the complexity of the dependence structures that must describe how such a process varies, and the presence of high-dimensional datasets and prediction domains. Much of the methodological development in recent years has considered either efficient moment-based approaches or spatio-temporal dynamical models. To date, most of the focus on statistical methods for dynamic spatio-temporal processes has been on linear models or highly parameterized nonlinear models (e.g., quadratic nonlinear models). Even in these relatively simple models, there are significant challenges in specifying parameterizations that are simultaneously useful scientifically and efficient computationally. Approaches for nonlinear spatio-temporal data from outside statistics (e.g., analog methods, neural networks, agent-based models) offer intriguing alternatives. Yet these methods often do not have formal mechanisms to quantify various sources of uncertainty in observations, model specification, and parameter estimation. This talk presents some recent attempts to place these models, many of which were motivated in the atmospheric and oceanic sciences, into a more rigorous uncertainty quantification framework.