Data Stewards play an important role in institutional, project, and national data infrastructures by supporting sustainable, FAIR, and efficient management of research data. The goal of this coffee meeting is to foster exchange between domain-specific (embedded) data stewards and data experts engaged in support infrastructures such as the NFDI4Earth helpdesk, the DataHUB support group, or federal state networks, and to collaboratively develop strategies for enhancing data visibility and reusability by supporting researchers and ensuring sustainable management of research data. The focus is on practical solutions and best practices that advance Data Stewardship in Earth System Sciences and beyond.
Meeting rooms
We present a comprehensive machine learning framework for predicting spatially distributed geographical data from point measurements. The framework takes as input a set of geographical features at a specified grid resolution (e.g., 5 arc-minute scale) and corresponding point measurements with their spatial coordinates and target values. The framework trains and evaluates multiple machine learning models, including both tree-based methods (Random Forest, XGBoost, CatBoost) and deep learning architectures (feedforward neural networks, TabPFN[1]), to identify the optimal predictive model for the given dataset.
The framework incorporates a hyperparameter search (depth and width) for deep learning models and a systematic parameter search for tree-based models (e.g., number of estimators). This ensures robust model selection and performance optimization across different geographical contexts and data characteristics. The framework outputs the best-performing model along with comprehensive performance metrics and uncertainty estimates.
As a non-trivial application, we demonstrate the framework's effectiveness in predicting total organic carbon (TOC) concentrations[2] and sedimentation rates in the ocean. This involves integrating features from both the sea surface and seafloor, encompassing a diverse array of oceanographic, geological, geographic, biological, and biogeochemical parameters. The framework successfully identifies the most suitable model architecture and hyperparameters for this complex spatial prediction task, providing both high accuracy and …
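The model-selection loop described in this abstract can be sketched in a few lines. The example below is a minimal, standard-library-only illustration and not the authors' framework: a k-nearest-neighbour regressor stands in for the tree-based and deep learning models, the single hyperparameter `k` stands in for settings such as depth, width, or number of estimators, and the synthetic points, function names, and search grid are all assumptions made for illustration.

```python
import math
import random


def knn_predict(train, query, k):
    """Predict the target at `query` as the mean of its k nearest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cv_rmse(points, k, n_folds=5):
    """Root-mean-square error of the k-NN stand-in under n-fold cross-validation."""
    squared_errors = []
    for fold in range(n_folds):
        # Interleaved split: every n_folds-th point is held out for testing.
        test = points[fold::n_folds]
        train = [p for i, p in enumerate(points) if i % n_folds != fold]
        for features, target in test:
            squared_errors.append((knn_predict(train, features, k) - target) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def select_best(points, grid):
    """Return (best_k, scores), where scores maps each candidate k to its CV RMSE."""
    scores = {k: cv_rmse(points, k) for k in grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic point measurements (assumed): (lon, lat) features, smooth target plus noise.
rng = random.Random(0)
points = []
for _ in range(200):
    lon, lat = rng.uniform(0, 10), rng.uniform(0, 10)
    points.append(((lon, lat), math.sin(lon) + 0.5 * lat + rng.gauss(0, 0.1)))

best_k, scores = select_best(points, grid=[1, 3, 5, 10, 25])
print(best_k, round(scores[best_k], 3))
```

The interleaved split is the simplest choice here. For geographical data, spatially blocked folds are often preferable, because neighbouring points in the training and test sets can make the error estimate look too optimistic.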
Meeting room
Lecture Hall
This study presents an end-to-end deep learning framework, 4DVarNet, for reconstructing high-resolution spatiotemporal fields of suspended particulate matter (SPM) in the German Bight under realistic satellite data gaps. Using a two-phase approach, the network is first pretrained on gap-free numerical model outputs masked with synthetic cloud patterns, then fine-tuned against sparse CMEMS observations with an additional independent validation mask. The framework architecture embeds a trainable dynamical prior and a convolutional LSTM solver to iteratively minimize a cost function that balances data agreement with physical consistency. The framework is applied to one year (2020) of real CMEMS observations and co-located model simulations, demonstrating robust performance under operational conditions. Reconstructions capture major spatial patterns with a correlation of R² = 0.977 and 50% of errors within ±0.2 mg/L, even when 27% of days lack any observations. Sensitivity experiments reveal that removing 60% of available data doubles RMSE and smooths fine-scale SPM spatial features. Moreover, increasing the assimilation window reduces edge discontinuities between the data-void area and the adjacent data-rich region but degrades sub-daily variability. Extending 4DVarNet to higher temporal resolution (hourly) reconstruction will require incorporating tidal dynamics to account for SPM resuspension, enabling real-time sediment transport forecasting in coastal environments.
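The idea of a cost function that balances data agreement with physical consistency can be illustrated on a one-dimensional toy series. The sketch below is not the 4DVarNet implementation: the trainable dynamical prior is replaced by a fixed neighbour-averaging operator, the convolutional LSTM solver is replaced by plain gradient descent, and the weight `lam`, the step size, and all sample values are assumptions chosen for illustration.

```python
def prior(x):
    """Stand-in prior: each value is the mean of its two (periodic) neighbours."""
    n = len(x)
    return [0.5 * (x[i - 1] + x[(i + 1) % n]) for i in range(n)]


def cost(x, y, mask, lam):
    """Misfit on observed points plus lam times the squared prior residual."""
    obs = sum(m * (xi - yi) ** 2 for xi, yi, m in zip(x, y, mask))
    reg = sum((xi - pi) ** 2 for xi, pi in zip(x, prior(x)))
    return obs + lam * reg


def gradient(x, y, mask, lam):
    """Exact gradient of `cost`; valid because this linear prior is symmetric."""
    r = [xi - pi for xi, pi in zip(x, prior(x))]
    pr = prior(r)
    return [2 * m * (xi - yi) + 2 * lam * (ri - pri)
            for xi, yi, m, ri, pri in zip(x, y, mask, r, pr)]


def reconstruct(y, mask, lam=1.0, step=0.05, n_iter=500):
    """Fill gaps by gradient descent on the cost, starting from zeros."""
    x = [0.0] * len(y)
    for _ in range(n_iter):
        g = gradient(x, y, mask, lam)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Made-up series; mask = 0 marks a gap (the value in y is then ignored).
y = [1.0, 1.2, 0.0, 0.0, 1.8, 2.0, 0.0, 1.6, 1.3, 1.1]
mask = [1, 1, 0, 0, 1, 1, 0, 1, 1, 1]

x_hat = reconstruct(y, mask)
print([round(v, 2) for v in x_hat])
```

Only the observed entries contribute to the first term, so gaps are filled purely by the prior term. This mirrors, in a very reduced form, how masked observations and a dynamical prior interact in a variational reconstruction.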
Images and videos are usually a more vivid data source than raw scalar data. However, even in the era of analog photo albums, metadata was added to images to preserve their context for the future. Today, the marine community wants to analyze far larger datasets of videos and images using computers, which generally cannot easily understand the image content on their own. Therefore, researchers have to record the content and context of images in a structured format to enable automated, systematic and quantitative image analysis.
The metadata file format FAIR Digital Objects for images (iFDOs) provides this structure for describing individual images and whole datasets. iFDOs primarily structure the answers to the five W's and H questions: where were the images taken, by whom, why, when, how, and what is actually shown in the images or videos. Together, these pieces of information provide FAIRness (findability, accessibility, interoperability, and reusability) to datasets.
Researchers benefit from iFDO-enhanced datasets, as they already provide the information necessary for data homogenization, enabling machine learning applications and mass data analysis. Data viewers and portals, such as marine-data.de, can increase the reach and impact of datasets by visualizing the datasets and making them findable using the context …
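As a rough illustration of what "structured answers to the five W's and H" could look like in code, the sketch below defines a small per-image record and a completeness check. The class, its field names, and the sample values are hypothetical and are not taken from the iFDO specification; the actual format defines its own field names and file layout.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ImageRecord:
    """Hypothetical per-image metadata answering the five W's and H."""
    filename: str
    where: tuple  # (latitude, longitude) in decimal degrees
    who: str      # creator of the image
    why: str      # purpose of the acquisition
    when: str     # ISO 8601 timestamp in UTC
    how: str      # platform and camera used
    what: str     # short description of the content


REQUIRED = ("where", "who", "why", "when", "how", "what")


def missing_fields(record):
    """Return the names of required fields that are empty in this record."""
    values = asdict(record)
    return [name for name in REQUIRED if not values[name]]


# Made-up example values, for illustration only.
record = ImageRecord(
    filename="dive01_000123.jpg",
    where=(54.5, 10.2),
    who="Example Person",
    why="habitat mapping",
    when="2024-05-01T12:00:00Z",
    how="towed camera sled",
    what="sandy seafloor with scattered stones",
)

print(missing_fields(record))              # empty list when all answers are present
print(json.dumps(asdict(record), indent=2))
```

A check like `missing_fields` is the kind of automated validation that a structured format makes possible before a dataset is passed to a portal or a machine learning pipeline.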
Long-term archiving of marine seismic data from active-source surveys in compliance with the FAIR data principles remains a critical yet complex task within the geophysical data life cycle. Data infrastructures such as PANGAEA – Data Publisher for Earth & Environmental Science and affiliated repositories must address the increasing volume, heterogeneity, and complexity of these datasets, which are produced using a variety of acquisition systems. To support this, the German marine seismic community is actively developing metadata standards tailored to different seismic data types, enabling their proper integration and archiving in PANGAEA. In parallel, new semi-automated workflows and standard operating procedures (SOPs) are being established and implemented to ensure consistent data publication and sustainable long-term stewardship.
These advancements are being driven by the “Underway” Research Data project, a cross-institutional initiative of the German Marine Research Alliance (Deutsche Allianz Meeresforschung e.V., DAM). Initiated in mid-2019, the project aims to standardize and streamline the continuous data flow from German research vessels to open-access repositories, in alignment with FAIR data management practices. Marine seismic data curation, in particular, stands out as a successful use case for integrating expedition-based data workflows. By leveraging the tools, infrastructure, and expertise provided by the “Underway” Research Data …
Autonomous Underwater Vehicles (AUVs) and Remotely Operated Vehicles (ROVs) are essential tools for investigating marine environments. These large-scale platforms are equipped with a variety of sensors and systems, including CTD, fluorometers, multibeam echosounders, side-scan sonar, and camera systems. ROVs also have the capability to collect water, biological, and geological samples. As a result, the datasets acquired from these missions are highly heterogeneous, combining diverse data types that require careful handling, standardization of metadata information, and publication.
At GEOMAR, within the context of the Helmholtz DataHub, we develop and implement a comprehensive workflow that spans the entire data lifecycle for large-scale facilities.
It combines the O2A Registry for device management, the Ocean Science Information System (OSIS) for cruise information, PANGAEA for data publication, and the portal earth-data.de for future visualization of AUV and ROV missions.
The presented workflow is currently deployed for GEOMAR’s REMUS6000 AUV "Abyss" and is designed with scalability in mind, enabling its future application to other AUVs and ROVs.
The German research vessels Alkor, Elisabeth Mann Borgese, Heincke, Maria S. Merian, Meteor, Polarstern and Sonne steadily provide oceanographic, meteorological and other data to the scientific community. However, accessing and integrating time series raw data from these platforms has traditionally been fragmented and technically challenging. The newly deployed DSHIP Land System addresses this issue by consolidating time series data from marine research vessels into a unified and scalable data warehouse.
At its core, the new system stores raw measurement data in the efficient and open Apache Parquet format. These columnar storage files allow for rapid querying and filtering of large datasets. To ensure flexible and high-performance access, the system uses a Trino SQL query engine running on a Kubernetes cluster composed of three virtual machines. This setup can be elastically scaled to meet variable demand, enabling efficient data access even under high load.
This talk will briefly introduce the technical foundations of the DSHIP Land System, highlighting the choice of storage format, the architecture of the Trino engine, and its deployment in a containerized Kubernetes environment. The focus will then shift to a demonstration of how users can interactively query the datasets using standard SQL, enabling cross-vessel data exploration, filtering by …
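To give a feel for cross-vessel querying with standard SQL, the sketch below runs a filtered, grouped query against a small in-memory table. It is only an illustration: Python's built-in `sqlite3` stands in for the Trino engine, and the table name, column names, and sample values are assumptions that do not reflect the actual DSHIP Land System schema.

```python
import sqlite3

# In-memory stand-in for the data warehouse (the real system uses Trino over Parquet).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements ("
    "vessel TEXT, time_utc TEXT, parameter TEXT, value REAL)"
)

# Made-up sample rows, for illustration only.
rows = [
    ("Heincke", "2024-05-01T00:00:00Z", "water_temperature", 10.0),
    ("Heincke", "2024-05-01T12:00:00Z", "water_temperature", 11.0),
    ("Meteor", "2024-05-01T06:00:00Z", "water_temperature", 18.0),
    ("Meteor", "2024-05-02T06:00:00Z", "water_temperature", 18.5),
    ("Sonne", "2024-05-01T06:00:00Z", "salinity", 35.1),
]
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)", rows)

# Filter by parameter and time window, then aggregate per vessel.
query = """
    SELECT vessel, COUNT(*) AS n, AVG(value) AS mean_value
    FROM measurements
    WHERE parameter = ? AND time_utc >= ? AND time_utc < ?
    GROUP BY vessel
    ORDER BY vessel
"""
result = conn.execute(
    query, ("water_temperature", "2024-05-01", "2024-05-02")
).fetchall()

for vessel, n, mean_value in result:
    print(vessel, n, mean_value)
```

The `SELECT ... WHERE ... GROUP BY` pattern is standard SQL and should carry over to other engines, although details such as timestamp types and parameter placeholders differ between engines and client libraries.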
The Baltic Sea is a semi-enclosed shelf sea characterized by distinct geographical and oceanographic features. One of the Baltic’s most remarkable features is its surface salinity gradient, which decreases horizontally from the saline North Sea to the nearly fresh Bothnian Sea in the north and the Gulf of Finland in the east. Additionally, a vertical gradient and strong stratification separate less saline surface water from deep saline water. These salinity features are mainly driven by a combination of river runoff, net precipitation, wind conditions, and geographic features that lead to restricted and irregular inflow of saltwater into the Baltic and limited mixing. The overall positive freshwater balance makes the Baltic much fresher than fully marine ocean waters, with a mean salinity of only about 7 g/kg. The Baltic Sea is particularly sensitive to climate change and global warming due to its shallowness, small volume, and limited exchange with the world oceans. Consequently, it is changing more rapidly than other regions. Recent changes in salinity are less clear because of high variability, but overall surface salinity seems to be decreasing while salinity in the deeper water layers is increasing. Furthermore, the overall salinity distribution is …
The novel DSHIP land system integrates new concepts and state-of-the-art technology to explore, access, and process environmental time series data from various platforms. In this demo session we would like to i) show where to find and how to access the long-established features of the land system, ii) highlight some of the new features, such as filtering, querying, and subsetting of (mass) data, iii) present options for traceability and interoperability, and iv) give some insights into and benchmarks of the system.