Submitted Contributions

There is an ever-increasing amount of data, projects, and publications hosted across several platforms and coming from different disciplines. On the one hand, funding agencies require that as many results as possible remain accessible even after a project has finished. On the other hand, scientists should have an intrinsic motivation to make their data as well described and as easily findable as possible. Thoroughly described metadata is key to increasing findability and, subsequently, reusability, which can increase the impact of one’s own research. Such metadata can be rated using the FAIR principles. However, fewer than half of all scientists have ever heard of these principles, and even fewer have used them to describe their research data efficiently. Consequently, metadata found in scientific repositories often shows strongly varying levels of completeness, or is simply faulty. There are various guidelines and initiatives, led e.g. by research institutes, to improve metadata quality, as it really is, as famously noted, “a love letter to the future”. We, the National Centre for Environmental and Nature Conservation Information based at the German Environment Agency, aim to integrate Germany’s scattered data and information landscape on environmental and nature knowledge into one central access point. We have encountered a wide …

📢 You're Invited: Advancing FAIR Data with NetCDF – Join the Conversation!

Ensuring that scientific data is Findable, Accessible, Interoperable, and Reusable (FAIR) is more important than ever. In Earth System Science, NetCDF has become the quasi-standard for storing multidimensional data. But to truly unlock its potential, we need rich, standardized metadata.

Join us for an insightful discussion where we’ll explore the key challenges in metadata compatibility and completeness. What tools do we need to improve the metadata of our scientific output? And how can we guarantee seamless metadata integration, AI-readiness, and improved data discoverability?

🔍 What to Expect:

  • An overview of the HMG NetCDF Initiative and its goals
  • A first look at the NetCDF metadata attribute guidelines
  • Discussion on aligning metadata fields across disciplines
  • A discussion on tools for machine-readable templates and user-friendly metadata entry

🌍 This is a collaborative effort across German research centers and contributes to broader Helmholtz initiatives like HMC.

Let’s shape the future of geoscientific data together. We look forward to your participation and insights!

Data Stewards play an important role in institutional, project, and national data infrastructures by supporting sustainable, FAIR, and efficient management of research data. The goal of this coffee meeting is to foster exchange between domain-specific (embedded) data stewards and data experts engaged in support infrastructures such as the NFDI4Earth helpdesk, the DataHUB support group, or federal state networks, in order to collaboratively develop strategies for enhancing data visibility and reusability by supporting researchers and ensuring sustainable management of research data. The focus is on practical solutions and best practices that advance Data Stewardship in Earth System Sciences and beyond.

The World Data Center for Climate (WDCC) is a repository hosted by the German Climate Computing Center (DKRZ) in Hamburg, Germany. It provides access to and offers long-term archiving for data relevant to climate and Earth System research in a highly standardized manner following the FAIR principles. WDCC has been an accredited regular member of the World Data System (WDS) since 2003 and is certified as a Trustworthy Data Repository by CoreTrustSeal (https://www.coretrustseal.org).

WDCC services are aimed both at scientists who produce data (e.g. long-term archiving to fulfill the guidelines of good scientific practice) and at those who re-use published data for new research. In the Earth System Sciences, large quantities of data are produced and needed for climate research; this is especially true for climate model output. To enable scientists to re-use data across domains, it is essential that data are archived together with rich metadata. Before data is published in WDCC, it undergoes multiple checks and curation steps. Recently, WDCC has established its own standard for NetCDF file headers, so that only data fulfilling this standard are accepted for publication.

We will be available at the booth to provide advice on data preparation, submission, publication - how …

Discover the full spectrum of data science support and expertise at the Helmholtz booth, where five Helmholtz-wide platforms showcase their combined strength to accelerate research—particularly in the Earth and Environment field. These platforms—Helmholtz AI, Helmholtz Information and Data Science Academy (HIDA), Helmholtz Federated IT Services (HIFIS), Helmholtz Imaging and Helmholtz Metadata Collaboration (HMC)—operate across the entire Helmholtz Association to advance artificial intelligence, imaging, metadata, research software and training. Our booth serves as a one-stop hub for information and data science exchange and support.

What We Offer:
- Helmholtz AI offers tailored consulting, computing resources, and funding opportunities to democratise and accelerate the application of artificial intelligence in science - supporting a wide range of use cases, from foundation models to domain-specific solutions.
- HIDA offers a rich portfolio of training, networking, and funding opportunities, fostering the next generation of data scientists and supporting upskilling across all career stages.
- HIFIS provides and brokers federated cloud services hosted across Helmholtz centers and delivers training and support on using those services, as well as on Research Software Engineering.
- Helmholtz Imaging provides access to cutting-edge imaging infrastructure, funding opportunities, and expert consulting - enabling researchers to unlock new insights from …

NFDI4Earth is part of the National Research Data Infrastructure (NFDI), and specifically aims to address the needs of the Earth System Science community.
The ambition of the NFDI4Earth is to further identify the demands for digital changes in the German Earth System Science community, to establish a set of common principles, rules and standards for research data management in Earth System Science, and to provide tools and mechanisms for data integration and analysis in a structured community consultation process. NFDI4Earth will provide simple, efficient, open, and – whenever possible – unrestricted access to all relevant Earth system data, scientific data management and data analysis services. It will also provide learning and training materials through a learning community portal and open community resources.

NFDI4Earth is a community-driven process - its network comprises 65 partnering institutions from universities, research organisations, infrastructure providers, governmental institutions, as well as scientific associations and networks.

Visit us at our booth to meet our team and to get more information about this exciting project!

NFDI4Earth project website: https://www.nfdi4earth.de/

OneStop4All portal: https://onestop4all.nfdi4earth.de/

Climate research often requires substantial technical expertise. This involves managing data standards, various file formats, software engineering, and high-performance computing. Translating scientific questions into code that can answer them demands significant effort. The question is: why? Data analysis platforms like Freva (Kadow et al. 2021, e.g., gems.dkrz.de) aim to enhance user convenience, yet programming expertise is still required. In this context, we introduce a large language model setup and chatbot interface for different cores, e.g. based on GPT-4/ChatGPT or DeepSeek, which enables climate analysis without technical obstacles, including language barriers. We are not yet dealing with dedicated climate LLMs for this purpose; dedicated natural language processing methodologies could take this to the next level. This approach is tailored to the needs of the broader climate community, which ranges from small and fast analyses to massive data sets from kilometer-scale modeling and requires a processing environment utilizing modern technologies while still, after all, addressing society, such as those in the Earth Virtualization Engines (EVE - eve4climate.org). Our interface runs on a High Performance Computer with access to petabytes of data - everything just a chat away.

The protection of critical underwater infrastructure, such as pipelines, data cables, or offshore energy assets, has become an emerging security challenge. Despite its growing importance, maritime infrastructure monitoring remains limited by high costs, insufficient coverage, and fragmented data processing workflows. The ARGUS project addresses these challenges by developing an AI-driven platform to support risk assessment and surveillance at sea.

At its core, ARGUS integrates satellite-based Synthetic Aperture Radar (SAR) imagery, AIS vessel tracking data, and spatial information on critical assets into a unified data management system. A key functionality is detecting so-called "ghost ships" – vessels that deliberately switch off their AIS transponders – using object detection techniques on SAR imagery.

At the same time, we are currently developing methods for underwater anomaly and change detection based on optical imagery. This work is still ongoing and focuses on identifying relevant structural or environmental changes in submerged infrastructure through automated image comparison and temporal analysis.

In this talk, we present the architecture and workflows of the ARGUS system, including our use of deep learning (YOLO-based object detection) in the maritime context. We share insights into the current capabilities and limitations of AI models for maritime surveillance, especially in the context of …

Current AI cannot function without data, yet this precious resource is often underappreciated. In the context of machine learning, dealing with incomplete datasets is a widespread challenge. Large, consistent, and error-free data sets are essential for an optimally trained neural network. Complete and well-structured inputs substantially contribute to training, results, and subsequent conclusions. As a result, using high-quality data improves the performance and the ability of neural networks to generalize. However, real-world datasets from field measurements can contain gaps in information: sensor failures, maintenance issues, or inconsistent data collection can cause invalid ('NaN', Not a Number) values to appear in the neural network input matrices. Imputation techniques are therefore an important step in data processing for handling missing values. Estimating 'NaN' values or replacing them with plausible values directly affects the quality of the input data and thus the effectiveness of the neural network.

In this contribution, we present a neural network-based regression model (ANN regression) that describes the salt characteristics in the Elbe estuary. In this context, we focus on selecting appropriate imputation strategies. While traditional methods such as imputation by mean, median, or mode are simple and computationally efficient, they sometimes fail to preserve the underlying data …
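
As a hedged illustration of such strategies (not the exact pipeline of this study; data and parameters below are invented), a simple mean imputation can be contrasted with a k-nearest-neighbours imputation using scikit-learn:

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# Toy input matrix with missing ('NaN') entries, e.g. sensor readings
X = np.array([
    [7.1, 120.0, np.nan],
    [6.9, np.nan, 14.2],
    [np.nan, 118.5, 13.8],
    [7.3, 121.2, 14.5],
])

# Simple, computationally cheap strategy: replace NaN by the column mean
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# More structure-preserving strategy: estimate NaN from the k nearest samples
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

print(X_mean)
print(X_knn)
```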

We present a comprehensive machine learning framework for predicting spatially distributed geographical data from point measurements. The framework takes as input a set of geographical features at a specified grid resolution (e.g., 5 arc-minute scale) and corresponding point measurements with their spatial coordinates and target values. The framework trains and evaluates multiple machine learning models, including both tree-based methods (Random Forest, XGBoost, CatBoost) and deep learning architectures (feed forward neural networks, TabPFN[1]), to identify the optimal predictive model for the given dataset.
The framework incorporates hyperparameter search (depth and width) for deep learning models and systematic parameter search for tree-based models (e.g., number of estimators). This ensures robust model selection and performance optimization across different geographical contexts and data characteristics. The framework outputs the best-performing model along with comprehensive performance metrics and uncertainty estimates.
As a non-trivial application, we demonstrate the framework's effectiveness in predicting total organic carbon (TOC) concentrations[2] and sedimentation rates in the ocean. This involves integrating features from both the sea surface and seafloor, encompassing a diverse array of oceanographic, geological, geographic, biological, and biogeochemical parameters. The framework successfully identifies the most suitable model architecture and hyperparameters for this complex spatial prediction task, providing both high accuracy and …
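
A minimal sketch of the model-comparison idea is given below, assuming scikit-learn and XGBoost are available; the features, parameter grids, and RMSE scoring are illustrative placeholders rather than the framework's actual configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# Toy stand-in for gridded features sampled at point-measurement locations
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))          # e.g. bathymetry, distance to coast, ...
y = X[:, 0] * 2.0 + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Systematic parameter search for tree-based candidates (illustrative grids)
candidates = {
    "random_forest": GridSearchCV(RandomForestRegressor(random_state=0),
                                  {"n_estimators": [100, 300]}, cv=3),
    "xgboost": GridSearchCV(XGBRegressor(random_state=0),
                            {"n_estimators": [100, 300], "max_depth": [3, 6]}, cv=3),
}

scores = {}
for name, search in candidates.items():
    search.fit(X_train, y_train)
    pred = search.predict(X_test)
    scores[name] = mean_squared_error(y_test, pred) ** 0.5  # RMSE

best = min(scores, key=scores.get)
print(scores, "-> best:", best)
```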

Urban-scale air quality data is crucial for exposure assessment and decision-making in cities. However, high-resolution Eulerian Chemistry Transport Models (CTMs) with street-scale resolutions (100 m x 100 m), while process-based and scenario-capable, are computationally expensive and require city-specific emission inventories, meteorological fields and boundary concentrations. In contrast, machine learning (ML) offers a scalable and efficient alternative to enhance spatial resolution using existing regional-scale (1 km - 10 km grid resolutions) reanalysis datasets.

We present a reproducible ML framework that downscales hourly NO₂ data from the CAMS Europe ensemble (~10 km resolution) to 100 × 100 m² resolution, using 11 years of data (2013–2023) for Hamburg. The framework integrates satellite-based and modelled inputs (CAMS, ERA5-Land), spatial predictors (CORINE, GHSL, OSM), and time indicators. Two ML approaches are employed: XGBoost for robust prediction and interpretability (via SHAP values), and Gaussian Processes for quantifying spatial and temporal uncertainty.

The downscaling is evaluated through random, time-based and leave-site-out validation approaches. Results demonstrate good reproduction of observed spatial and temporal NO₂ patterns, including traffic peaks and diurnal/seasonal trends. The trained models generate over 160 million hourly predictions for Hamburg with associated uncertainty fields. Although developed for Hamburg, the framework has been successfully …
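
As a hedged sketch of the XGBoost-plus-SHAP component of such a setup (the real predictor set and model configuration are not reproduced here), a gradient-boosted regressor can be trained on coarse-resolution predictors and its feature attributions inspected:

```python
import numpy as np
import shap
from xgboost import XGBRegressor

# Invented predictors: coarse NO2, meteorology, land-use fractions, hour of day
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))
y = 3.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.2, size=2000)  # toy fine-scale target

model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)

# SHAP values quantify how much each predictor moves an individual prediction
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])
print(shap_values.shape)   # (100, 6): per-sample, per-feature attributions
```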

This study presents an end-to-end deep learning framework, 4DVarNet, for reconstructing high-resolution spatiotemporal fields of suspended particulate matter (SPM) in the German Bight under realistic satellite data gaps. Using a two-phase approach, the network is first pretrained on gap-free numerical model outputs masked with synthetic cloud patterns, then fine-tuned against sparse CMEMS observations with an additional independent validation mask. The framework architecture embeds a trainable dynamical prior and a convolutional LSTM solver to iteratively minimize a cost function that balances data agreement with physical consistency. The framework is applied to one year of data (2020) of real observations (CMEMS) and co-located model simulations, demonstrating robust performance under operational conditions. Reconstructions capture major spatial patterns with correlation R² = 0.977 and 50% of errors within ±0.2 mg/L, even when 27% of days lack any observations. Sensitivity experiments reveal that removing 60% of available data doubles the RMSE and smooths fine-scale SPM spatial features. Moreover, increasing the assimilation window reduces edge discontinuities between the data-void area and the adjacent data-rich region, whereas it degrades sub-daily variability. Extending 4DVarNet to higher temporal resolution (hourly) reconstruction will require incorporating tidal dynamics to account for SPM resuspension, enabling real-time sediment transport forecasting in coastal environments.

The scientific landscape is continually shifting towards increasing amounts of data, demanding a greater investment of (time) resources in the management and (pre-)processing of these data. As a result, data literacy has become a key element for researchers from all domains. Additionally, interdisciplinary, multidisciplinary, and collaborative approaches are more essential now than ever before. The Rhine-Ruhr Center for Scientific Data Literacy (DKZ.2R) focuses on a combined methodological data literacy, integrating data science and machine learning skills, high-performance computing, and research data management competencies. Our main objective is to promote a holistic data literacy, offering support for researchers in the form of training, consulting, data challenges, and tools for data analysis and management.

The availability of ever larger and more complex amounts of data requires comprehensive methodological skills that researchers must often learn independently. These skills begin with the consideration of how scientific data should be collected, extending to questions about data processing applications, methods, infrastructure, and, finally, publishing. The DKZ.2R focuses on offering support for researchers to break through data-related hurdles in order to find cross-domain solutions and synergies.

In our contribution, we present our workflow for filtering training data for Foundation Models …

Images and videos are usually a more vivid data source than raw scalar data. However, even in the era of analog photo albums, metadata was added to images to preserve their context for the future. Today, the marine community wants to analyze far larger datasets of videos and images using computers, which generally cannot easily understand the image content on their own. Therefore, researchers have to record the content and context of images in a structured format to enable automated, systematic and quantitative image analysis.

The metadata file format FAIR Digital Objects for images (iFDOs) provides this structure for describing individual images and whole datasets. iFDOs primarily structure the answers to the five W's and one H: where were the images taken, by whom, why, when, how, and what is actually shown in the images or videos. Together, these pieces of information provide FAIRness (findability, accessibility, interoperability and reusability) to datasets.

Researchers benefit from iFDO-enhanced datasets, as they already provide the information necessary for data homogenization, enabling machine learning applications and mass data analysis. Data viewers and portals, such as marine-data.de, can increase the reach and impact of datasets by visualizing the datasets and making them findable using the context …

In the Environmental Sciences, time-series data is key to, for example, monitoring environmental processes and validating Earth system models. A major issue is the lack of a consistent data availability standard aligned with the FAIR principles, but the Helmholtz Earth and Environment DataHub is working with the Helmholtz Metadata Collaboration project STAMPLATE to address this.

The seven participating research centers are building a large-scale infrastructure using the Open Geospatial Consortium's SensorThings API (STA) as the central data interface. It is linked to other community-driven tools, such as sensor and device management systems, data ingestion systems, and the Earth Data Portal (www.earth-data.de) with highly customizable viewers.
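
For illustration only, a SensorThings API service can be queried with plain HTTP using the standard STA entities and OData-style query options; the instance URL and identifiers below are assumptions, not the DataHub's actual endpoints:

```python
import requests

# Hypothetical STA endpoint; real DataHub instances will differ
BASE = "https://example-sta.earth-data.de/v1.1"

# List Things together with their Datastreams (standard STA $expand option)
things = requests.get(f"{BASE}/Things", params={"$expand": "Datastreams", "$top": 5}).json()

for thing in things.get("value", []):
    print(thing["name"], "->", [ds["name"] for ds in thing.get("Datastreams", [])])

# Fetch the latest observations of one datastream (the id is illustrative)
obs = requests.get(
    f"{BASE}/Datastreams(1)/Observations",
    params={"$orderby": "phenomenonTime desc", "$top": 3},
).json()
print(obs.get("value", []))
```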

Our custom, semantic metadata profile augments STA’s core data model with domain-specific information. This ensures that metadata entered in any user self-service is also displayed in the Earth Data Portal along with the ingested data.

The operationalization of the framework and its subsequent integration into research data workflows is imminent, making long-term, nationwide measurements spanning decades available. Concurrently, our RDM processes are undergoing a transformative shift from manual, person-based workflows to self-organized, digitally supported ones.

This poster presents the fundamental elements of our initiative and the associated challenges. It also encourages new domains to get involved.

The Ocean Science Information System (OSIS), developed at GEOMAR Helmholtz Centre for Ocean Research Kiel, is a central platform for managing and publishing metadata related to marine research expeditions, experiments and simulations. In response to evolving national needs and broader integration efforts, OSIS is currently undergoing a major transformation.

A key driver of this development is its adoption by the Deutsche Allianz Meeresforschung (DAM) as the primary system for recording German research cruises across partner institutions. This expansion has necessitated enhanced interoperability, standardized metadata workflows, and scalable infrastructure.

In parallel, OSIS is building stronger integration with the O2ARegistry, developed by AWI as a cross-institutional metadata registry for sensors and platforms. These efforts aim to support the reuse of expedition and instrument metadata in broader national and international contexts.

As part of its modernization, OSIS will support single sign-on (SSO) via the Helmholtz AAI, enabling seamless and secure access for users across participating institutions.

Another major focus is the automated import of planned expedition data from upstream expedition planning and logistics systems such as MFP (Marine Facilities Planning) and EIS (Expeditions-Informationssystem). These enhancements are designed to streamline data entry, reduce redundancy, and improve data consistency across the …

Building web applications for the exploration of scientific data presents several challenges. These include the difficulty of accessing large volumes of data, the need for high-performance computing (HPC) resources to process and analyze such data, and the complexity of developing intuitive web frontends—especially for scientists who are not trained web developers. The Data Analysis Software Framework (DASF) addresses these challenges by enabling scientists to focus on Python-based backend development while seamlessly integrating HPC resources, even when these are not directly exposed to the internet. DASF also provides an automated mechanism to generate web frontends, significantly lowering the barrier to entry for scientific web application development (DOI:10.5194/egusphere-egu25-3120).

Complementing this, the ESRI Experience Builder empowers users to create multi-page web applications and dashboards through a content management system, without requiring expertise in JavaScript-based frontend frameworks. This makes it an ideal platform for scientists to build rich, interactive data exploration tools. The newly developed DASF plugin for the Experience Builder (available at https://codebase.helmholtz.cloud/dasf/dasf-experiencebuilder-plugin) bridges these two ecosystems. It enables seamless access to data and computational resources from within Experience Builder applications, facilitating the creation of powerful, user-friendly scientific web portals.

We, the National Centre for Environmental and Nature Conservation Information, develop the portal umwelt.info, which acts as a central access point to all of Germany’s knowledge on the environment and nature protection. We integrate all openly accessible sources, from municipalities to federal states, civil society, the economy, and the sciences, into one flexible catalogue. At its core, this catalogue will make it easier to find and share all kinds of data and information, such as web applications, research data, or editorials. Here, we want to present our approach to combining this diverse data ecosystem into one searchable catalogue. Our approach is to develop open-source software to which everybody can contribute. We want to give insights into our development process, both in our front end and back end.

To support the open data community, we offer a native API as well as an emulated CKAN interface. Furthermore, we create editorials and scripts about data availability, with a current focus on water-related data sets in Germany. These products aim to help scientists gain easier access to data as well as information on reusability. Our current product can be found at https://umwelt.info and our current development stage at https://gitlab.opencode.de/umwelt-info .
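
As a hedged example of using such a CKAN-compatible interface (the base path below is an assumption, not the documented endpoint; package_search is the standard CKAN action):

```python
import requests

# The exact base path of the CKAN-compatible endpoint is assumed here;
# package_search is a standard CKAN action API call.
CKAN_BASE = "https://umwelt.info/ckan"  # hypothetical

resp = requests.get(
    f"{CKAN_BASE}/api/3/action/package_search",
    params={"q": "grundwasser", "rows": 5},
    timeout=30,
)
resp.raise_for_status()
for dataset in resp.json()["result"]["results"]:
    print(dataset.get("title"))
```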

Aquatic life is crucial for human well-being, playing a key role in carbon sequestration, climate regulation, biodiversity conservation and nutrition. Plankton are the basis of aquatic food webs and sustainably sequester vast amounts of carbon from the atmosphere to the ocean’s interior. Impacts of climate change and pollution on plankton functioning and diversity not only impact fish resources that play a major role in human nutrition, but also the efficiency of the biological carbon pump. The critical role of aquatic life in biogeochemical cycles, climate regulation, conservation of aquatic biodiversity and human nutrition mandates precise mapping and monitoring. Distributed pelagic imaging techniques enable comprehensive global observation, providing critical insights for decision-making and carbon dioxide removal strategies. To this end, each day, millions of images of plankton and particles are taken by researchers around the globe, using a variety of imaging systems. Each individual image and its associated metadata can provide crucial information not only about the individual organism or particle, but also on the biodiversity and functioning of aquatic food webs, ecosystem status of the related water body, and its role in carbon sequestration. The Aquatic Life Foundation Project will, for the first time, combine billions of images acquired with …

In forestry, having consistently relevant and correct information is critical for environmentally conscientious decision-making. Consulting commercially available or open-source Large Language Models (LLMs) in this decision-making process can be an effective route towards informed decisions. However, current LLMs have demonstrated deficiencies in three critical areas: reliable information sources, access to real-world data, and unambiguous scientific reasoning.

To overcome these shortcomings, we opted to instantiate a curated knowledge base containing information from relevant CC-0 research articles. With clearly defined constraints applied to each research article intended for ingestion into the knowledge base, it becomes possible for the underlying LLM to produce feedback that is correct, concise and accurate. Some of the constraints of the knowledge base are in place to adhere to European laws regarding the ethical use of AI as well as to comply with copyright laws.

Aside from having access to relevant research papers, having access to real world data is one of the cornerstones of the proposed framework. By utilizing calibrated level 1 data from multiple sensors, platforms and measuring devices, we can implement agentic RAG functionality to retrieve information about our area of interest.

Lastly, reasoning abilities in Large Language Models …

Image-based data analysis is becoming increasingly important in Earth and environmental sciences – for example, in marine biodiversity monitoring, drone image evaluation, or automated habitat classification. Deep learning approaches such as YOLO (You Only Look Once) offer powerful tools for object detection, but their application is often limited by technical complexity and the need for programming skills.

In this demo, I present a user-friendly graphical interface that allows researchers to upload their own image datasets and annotations, and then configure and run a complete object detection workflow – including data preparation, model training, validation, and testing – all without writing any code. This tool is particularly aimed at scientists working with image data who want to apply deep learning methods without needing expertise in machine learning frameworks.

The demo will showcase typical use cases from marine biology, but the workflow is domain-agnostic and easily transferable to a wide range of Earth and environmental science applications. In the future, the tool will be available via BinderHub, allowing users to run the entire workflow directly in their web browser without any local installation.

At the Helmholtz Association, we aim to establish a well-structured and harmonized data space that connects information across distributed data infrastructures. Achieving this goal requires the standardization of dataset descriptions using appropriate metadata and the definition of a single source of truth for much of this metadata, from which different systems can draw. Persistent Identifiers (PIDs) in metadata enable the reuse of common information from shared sources. Broad adoption of PID types enhances interoperability and supports machine-actionable data. As a first step, we recommend implementing ROR, ORCID, IGSN, PIDINST, DataCite DOI, and Crossref DOI in our data systems.
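
As a hedged illustration of how several of these PID types can appear together in a dataset description (the snippet loosely follows DataCite-style metadata; all identifiers and values are fictitious):

```python
# Illustrative dataset record combining several PID types (values are fictitious)
dataset_record = {
    "identifier": {"identifier": "10.5281/zenodo.0000000", "identifierType": "DOI"},
    "creators": [{
        "name": "Doe, Jane",
        "nameIdentifiers": [{
            "nameIdentifier": "https://orcid.org/0000-0002-1825-0097",
            "nameIdentifierScheme": "ORCID",
        }],
        "affiliation": [{
            "name": "Example Helmholtz Centre",
            "affiliationIdentifier": "https://ror.org/00example00",
            "affiliationIdentifierScheme": "ROR",
        }],
    }],
    "relatedIdentifiers": [
        # Physical sample and instrument the data were produced from/with
        {"relatedIdentifier": "10.00000/IGSN.EXAMPLE", "relatedIdentifierType": "IGSN",
         "relationType": "IsDerivedFrom"},
        {"relatedIdentifier": "https://hdl.handle.net/00.00000/instrument-example",
         "relatedIdentifierType": "Handle", "relationType": "IsCompiledBy"},  # PIDINST-style instrument PID
    ],
}
print(dataset_record["identifier"]["identifier"])
```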

However, to practically record and integrate this information into our repositories, we must first identify the specific locations and stakeholders within institutions where this data is generated and maintained. We must also assess what kinds of tools and services the Association needs to provide to support seamless data management for its users.

In this presentation, we highlight several tools we propose to implement across the organization, based on envisioned workflows. These include, for example, repository software, electronic lab notebooks (ELNs), terminology services, and other infrastructure components. Implementing these tools will support the various stakeholder groups in fulfilling their roles and will contribute …

Bathymetric evolution in coastal environments is driven by complex interactions between hydrodynamics, sediment transport, and morphodynamics. Traditional morphodynamic models often face challenges in capturing these dynamics, particularly in regions like the Wadden Sea, where the feedback mechanisms between physical processes and seabed changes are highly intricate. In this study, we explore the potential of deep learning techniques to address these limitations, using convolutional neural networks (CNN) for bathymetric reconstruction and convolutional long short-term memory (ConvLSTM) networks for forecasting. We applied these models to a dataset of bathymetric observations from the German Bight, which provides detailed coverage of the seabed from 1983 to 2012. First, we demonstrated that the CNN effectively reconstructs spatial bathymetric patterns from incomplete data inputs, achieving accurate reproductions of observed bathymetry with minimal reconstruction error, particularly in regions with active dynamics such as tidal channels. Second, we used the ConvLSTM for forecasting, training the model with past observations to predict bathymetry. The ConvLSTM model performed well, with an area-averaged root mean square error of 0.139 m. Our results indicate that deep learning techniques offer promising alternatives to traditional methods for both spatial reconstruction and forecasting of bathymetric changes. These models can improve predictions of seabed dynamics, which are critical …
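
A minimal sketch of a ConvLSTM forecasting model of the kind described is shown below, using Keras; the grid size, sequence length, and layer sizes are placeholders rather than the study's configuration:

```python
import numpy as np
from tensorflow.keras import layers, models

# Toy sequences: 8 samples of 5 yearly bathymetry grids (64 x 64), one channel
X = np.random.rand(8, 5, 64, 64, 1).astype("float32")
y = np.random.rand(8, 64, 64, 1).astype("float32")   # bathymetry of the following year

model = models.Sequential([
    layers.Input(shape=(5, 64, 64, 1)),
    layers.ConvLSTM2D(16, kernel_size=3, padding="same", return_sequences=False),
    layers.Conv2D(1, kernel_size=3, padding="same"),  # map hidden state to a depth field
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=2, verbose=0)

pred = model.predict(X[:1])
print(pred.shape)  # (1, 64, 64, 1): forecast bathymetry grid
```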

The proliferation of digital portals for accessing environmental and geoscientific data has significantly enhanced the ability of researchers, policymakers, and the public to retrieve and utilize critical information.

Furthermore, metadata content can be harvested, which brings the added value that information is collected only once and then presented in different web presences. In this way, it is possible to tailor the presentation of metadata attributes to the relevant user groups - optimally presented according to their priorities.

The Earth Data Portal (https://earth-data.de/), as a collaborative effort of the Helmholtz centers of the research field Earth and Environment, enables querying data from multiple repositories, particularly from the Helmholtz research centers.

In addition, umwelt.info (https://umwelt.info/de), operated by the German Environment Agency, offers a user-friendly interface of openly available environmental and nature protection data tailored for seamless access by the general public.

The respective metadata content, on the other hand, is certainly best presented on the website of the source repository.

This poster provides a general overview of the highlighted portals and delves into the specifics of their implementations, with a focus on the use of Persistent Identifiers (PIDs). PIDs play a crucial role in ensuring the long-term accessibility and citability of data, …

The Helmholtz Model Zoo (HMZ) is a cloud-based platform that provides remote access to deep learning models within the Helmholtz Association. It enables seamless inference execution via both a web interface and a REST API, lowering the barrier for scientists to integrate state-of-the-art AI models into their research.

Scientists from all 18 Helmholtz centers can contribute their models to HMZ through a streamlined, well-documented submission process on GitLab. This process minimizes effort for model providers while ensuring flexibility for diverse scientific use cases. Based on the information provided about the model, HMZ automatically generates the web interface and API, tests the model, and deploys it. The REST API further allows for easy integration of HMZ models into other computational pipelines.

With the launch of HMZ, researchers can now run AI models within the Helmholtz Cloud while keeping their data within the association. The platform imposes no strict limits on the number of inferences or the volume of uploaded data, and it supports both open-access and restricted-access model sharing. Data uploaded for inference is stored within HIFIS dCache InfiniteSpace and remains under the ownership of the uploading user.

HMZ is powered by GPU nodes equipped with four NVIDIA L40 GPUs per …

In 2022, GEOMAR created the Data Science Unit as its internal start-up to centralize data science support and activities. With up to eight data scientists as support personnel for GEOMAR, various projects and services were addressed in the following years. Now, three years after its foundation, we present lessons learned, such as the importance of on-site training programs, the challenges in balancing generalisation and customisation, or the varied success in achieving science-based key performance indicators.

Compliant with the FAIR data principles, the long-term archiving of marine seismic data acquired from active-source surveys remains a critical yet complex task within the geophysical data life cycle. Data infrastructures such as PANGAEA – Data Publisher for Earth & Environmental Science and affiliated repositories must address the increasing volume, heterogeneity, and complexity of these datasets, which are produced using a variety of acquisition systems. To support this, the German marine seismic community is actively developing metadata standards tailored to different seismic data types, enabling their proper integration and archiving in PANGAEA. In parallel, new semi-automated workflows and standard operating procedures (SOPs) are being established and implemented to ensure consistent data publication and sustainable long-term stewardship.

These advancements are being driven by the “Underway” Research Data project, a cross-institutional initiative of the German Marine Research Alliance (Deutsche Allianz Meeresforschung e.V., DAM). Initiated in mid-2019, the project aims to standardize and streamline the continuous data flow from German research vessels to open-access repositories, in alignment with FAIR data management practices. Marine seismic data curation, in particular, stands out as a successful use case for integrating expedition-based data workflows. By leveraging the tools, infrastructure, and expertise provided by the “Underway” Research Data …

Autonomous Underwater Vehicles (AUVs) and Remotely Operated Vehicles (ROVs) are essential tools for investigating marine environments. These large-scale platforms are equipped with a variety of sensors and systems, including CTD, fluorometers, multibeam echosounders, side-scan sonar, and camera systems. ROVs also have the capability to collect water, biological, and geological samples. As a result, the datasets acquired from these missions are highly heterogeneous, combining diverse data types that require careful handling, standardization of metadata information, and publication.
At GEOMAR, within the context of the Helmholtz DataHub, we develop and implement a comprehensive workflow that spans the entire data lifecycle for large-scale facilities.
It combines the infrastructures of the O2A Registry for device management, the Ocean Science Information System (OSIS) for cruise information, PANGAEA for data publication, and the portal earth-data.de for future visualization of AUV and ROV missions.
The presented workflow is currently deployed for GEOMAR’s REMUS6000 AUV "Abyss", and is being designed with scalability in mind, enabling its future application to other AUVs and ROVs.

The German research vessels Alkor, Elisabeth Mann Borgese, Heincke, Maria S. Merian, Meteor, Polarstern and Sonne steadily provide oceanographic, meteorological and other data to the scientific community. However, accessing and integrating time series raw data from these platforms has traditionally been fragmented and technically challenging. The newly deployed DSHIP Land System addresses this issue by consolidating time series data from marine research vessels into a unified and scalable data warehouse.

At its core, the new system stores raw measurement data in the efficient and open Apache Parquet format. These columnar storage files allow for rapid querying and filtering of large datasets. To ensure flexible and high-performance access, the system uses a Trino SQL query engine running on a Kubernetes cluster composed of three virtual machines. This setup can be elastically scaled to meet variable demand, enabling efficient data access even under high load.

This talk will briefly introduce the technical foundations of the DSHIP Land System, highlight the choice of storage format, the architecture of the Trino engine, and its deployment in a containerized Kubernetes environment. The focus will then shift to a demonstration of how users can interactively query the datasets using standard SQL, enabling cross-vessel data exploration, filtering by …
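
As a hedged illustration of such interactive access (catalog, schema, table, and column names are invented; the trino Python client is one of several ways to talk to a Trino engine):

```python
from trino.dbapi import connect

# Connection parameters are placeholders for a real DSHIP Land System deployment
conn = connect(host="trino.example.org", port=443, user="scientist", http_scheme="https")
cur = conn.cursor()

# Cross-vessel query over Parquet-backed time series (table/column names invented)
cur.execute("""
    SELECT vessel, date_trunc('hour', obs_time) AS hour, avg(water_temp) AS mean_temp
    FROM dship.underway.thermosalinograph
    WHERE obs_time BETWEEN TIMESTAMP '2023-06-01' AND TIMESTAMP '2023-06-30'
      AND vessel IN ('Sonne', 'Meteor')
    GROUP BY 1, 2
    ORDER BY 1, 2
""")
for row in cur.fetchmany(5):
    print(row)
```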

The Baltic Sea is a semi-enclosed shelf sea characterized by its distinct geographical and oceanographic features. One of the Baltic’s most remarkable features is its surface salinity gradient, which decreases horizontally from the saline North Sea to the nearly fresh Bothnian Sea in the north and the Gulf of Finland in the east. Additionally, a vertical gradient and strong stratification separate less saline surface water from deep saline water. These salinity features are mainly driven by a combination of river runoff, net precipitation, wind conditions, and geographic features that lead to restricted and irregular inflow of saltwater into the Baltic and limited mixing. The overall positive freshwater balance causes the Baltic to be much fresher than fully marine ocean waters, with a mean salinity of only about 7 g/kg. The Baltic Sea is particularly sensitive to climate change and global warming due to its shallowness, small volume and limited exchange with the world oceans. Consequently, it is changing more rapidly than other regions. Recent changes in salinity are less clear due to high variability, but overall surface salinity seems to decrease with a simultaneous increase in the deeper water layers. Furthermore, the overall salinity distribution is …

The growing complexity of digital research environments and the explosive increase in data volume demand robust, interoperable infrastructures to support sustainable Research Data Management (RDM). In this context, data spaces have emerged—especially in industry—as a powerful conceptual framework for organizing and sharing data across ecosystems, institutional boundaries, and disciplines. Although the term is not yet fully established in the research community, it maps naturally onto scientific practice, where the integration of heterogeneous datasets and cross-disciplinary collaboration are increasingly central.

Aligned with the principles of open science, FAIR Digital Objects (FDOs) provide a promising infrastructure for structuring these emerging data spaces. FDOs are standardized, autonomous, and machine-actionable digital entities that encapsulate data, metadata, software, and semantic assertions. They enable both humans and machines to Find, Access, Interoperate, and Reuse (FAIR) digital resources efficiently. By abstracting from underlying technologies and embedding persistent, typed relations, FDOs allow for seamless data integration, provenance tracking, and rights management across domains. This structure promotes reproducibility, trust, and long-term sustainability in data sharing.

Using an example from climate research, we demonstrate how data from different data spaces can be combined. By employing STACs (SpatioTemporal Asset Catalogs) defined as FAIR Digital Objects facilitating the European Open …

Based on statistical analysis combined with numerical modeling and machine learning, we investigated annual- to decadal-scale morphodynamic patterns of the German Wadden Sea and their predictability at the relevant scales. Results from the multivariate EOF (Empirical Orthogonal Function) analysis of the annual bathymetry data spanning from 1998 to 2022 and potentially related drivers and environmental factors (tidal range, storm surge level and frequency, sediment properties and longshore currents) provide insights into the morphodynamic patterns of the study area. Both extreme water levels (storm surges) and tidal range show a significant positive correlation with the magnitude of morphological changes, indicating their important role in controlling sediment transport and morphological evolution. Coastal longshore currents exhibit a correlation with the movement of tidal channels, which are continuously migrating and deepening in the East and North Frisian regions and oscillating in the estuarine areas (Ems, Weser and Elbe). Numerical modeling was then applied to derive a process-based understanding of the feedback mechanisms between the physical drivers and the morphology of the Wadden Sea. Finally, state-of-the-art machine learning approaches were used to explore the predictability of morphological change of the Wadden Sea and compared with numerical predictions to identify the strengths and weaknesses of both methods.
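
A hedged sketch of the EOF step, i.e. a singular value decomposition of the anomaly matrix built from annual bathymetry grids, is shown below; array sizes and data are placeholders:

```python
import numpy as np

# Toy stack of annual bathymetry grids: 25 years on a 50 x 80 grid
years, ny, nx = 25, 50, 80
bathy = np.random.rand(years, ny, nx)

# Build the anomaly matrix (time x space) and decompose it
data = bathy.reshape(years, ny * nx)
anomalies = data - data.mean(axis=0)
pcs, singular_values, eofs = np.linalg.svd(anomalies, full_matrices=False)

# Fraction of variance explained by each mode
explained = singular_values**2 / np.sum(singular_values**2)
print("Leading EOF explains", round(100 * explained[0], 1), "% of the variance")

# Spatial pattern of the leading mode, back on the grid
eof1 = eofs[0].reshape(ny, nx)
print(eof1.shape)
```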

Coastal zones face increasing pressure from both natural forces and human activity, including sea-level rise, erosion, and expanding infrastructure. Understanding how these landscapes evolve over time is essential for informed decision-making in environmental management, urban planning, and climate adaptation.

We present AutoCoast, a web-based platform for long-term coastal monitoring that combines multi-source satellite imagery with machine learning to detect and visualize shoreline changes from 2015 to 2024. Initially developed for the Baltic Sea, the system is being expanded to cover additional regions such as the North Sea.

A key component of the platform is a custom annotation tool that supports rapid image labeling through active learning. This approach reduces manual effort while maintaining high-quality training data. Our curated dataset, based on Sentinel-2 imagery, includes coastal-specific classes such as beaches, marshes, tidal flats, cliffs, and man-made structures. The resulting segmentation model can reliably identify and classify coastal landforms.

To enhance temporal consistency and spatial accuracy, we implement post-processing steps such as tidal normalization and integrate complementary Sentinel-1 radar data for detecting elevation-driven changes and improving resilience to cloud cover.

The user interface supports dynamic visualization and comparison of coastline evolution, enabling exploration of trends in erosion, accretion, and land use change. …

When applying sustainable Nature-based Solutions (NbS) for coastal engineering, a major challenge lies in determining the effectiveness of these NbS approaches in mitigating coastal erosion. The efficacy of NbS is influenced by various factors, including the specific location, layout, and scale of implementation. This study integrates artificial intelligence (AI) with hydro-morphodynamic numerical simulations to develop an AI-based emulator focused on predicting Bed Level Changes (BLC) as indicators of erosion and deposition dynamics. In particular, we explore the influence of seagrass meadows, which vary in their initial depth (hs) and depth range (hr), on the attenuation of coastal erosion during storm events.

The framework employs a hybrid approach combining the SCHISM-WWM hydrodynamic model with XBeach to simulate 180 depth-range and starting-depth combination (hr-hs) scenarios along the Norderney coast in the German Bight. A Convolutional Neural Network (CNN) architecture is used with two inputs—roller energy and Eulerian velocity—to efficiently predict BLC. The CNN shows high accuracy in replicating spatial erosion patterns and quantifying erosion/deposition volumes, achieving an R² of 0.94 and an RMSE of 3.47 cm during validation.

This innovative integration of AI and NbS reduces computational costs associated with traditional numerical modelling and improves the …

Monitoring marine mammals is critical during noisy activities such as seismic surveys and naval operations, where the use of loud airguns and sonars can harm whales and seals. Traditional visual monitoring by marine mammal observers is limited by factors such as low light, rough seas, fog, and human fatigue. Drawing on over 10 years of at-sea and ashore research, thousands of whale cues, primarily blows and bodily displays, were captured using infrared cameras. Studies demonstrated that infrared imaging reliably detects whale blows worldwide across all climate zones and operates continuously over extended periods (with fog being the primary limitation) [1].

To enable continuous, 24/7 monitoring of whales, we developed a deep learning framework that utilizes infrared video captured by a commercial, cooled 360° thermal imaging sensor for automatic whale blow detection. We evaluated multiple machine learning models, including 2D CNN, 3D CNN, and classical algorithms such as Random Forest and SVM, with infrared video data as input. Each video contains a set of 30 frames, which the system analyzes to determine whether a whale is present in any of them.

Our experiments demonstrated strong detection performance. The results highlighted the robustness of our model, demonstrating its adaptability across different environmental …

The novel DSHIP land system integrates new concepts and state-of-the-art technology to explore, access, and process environmental time series data from various platforms. In this demo session we would like to show i) where to find and how to access the long established features from the land system, ii) highlight some of the new features, such as filtering, querying, and subsetting of (mass) data, iii) options for traceability and interoperability, and iv) give some insights and benchmarks of the systems.

Soil moisture is a critical environmental variable that has a major impact on hydrological extremes, plant water availability and climate processes. Hence, accurate and extensive in situ soil moisture data are crucial to assess these environmental impacts. The International Soil Moisture Network (ISMN, https://ismn.earth) collects such data from several monitoring networks, providing a harmonized, quality-controlled and freely accessible archive.

Collecting and integrating soil moisture data from various data providers presents significant challenges, as they use diverse measurement technologies and data formats. Besides ingesting data into the database daily, keeping the metadata updated and clean remains challenging. ISMN’s existing metadata mapping between input data and the database could potentially lead to data corruption due to non-robust data processing.

To ensure reliable data ingestion and mitigate corruption risks, we significantly enhanced the data integration and harmonization process. This involved improving the robustness of data downloads, developing an automated procedure for metadata mapping, and implementing automated metadata checks and retrievals. After successfully testing the automated detection and incorporation of metadata changes (e.g. sensor types, exchange dates), the maintenance efforts and human interventions were reduced significantly. Future work will focus on expanding this functionality to other providers and fully incorporating it …

Effective data stewardship in research hinges upon the consistent and FAIR (Findable, Accessible, Interoperable, Reusable) representation of scientific variables across diverse environmental disciplines. Within the Helmholtz Earth and Environment DataHub initiative, we are, hence, developing an innovative approach utilizing Large Language Models (LLMs) to support data producers by automating the semantic annotation of research data. Our service employs the community-driven I-ADOPT framework, which decomposes variable definitions from natural language descriptions into essential atomic parts, ensuring naming consistency and interoperability.
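
For illustration (a hedged sketch, not the output of our service), an I-ADOPT-style decomposition of a free-text variable description could look as follows; the component names follow the I-ADOPT framework, while the example values and vocabulary URIs are invented:

```python
# Hypothetical decomposition of the free-text variable
# "concentration of nitrate in lake water" into I-ADOPT components
variable_annotation = {
    "label": "concentration of nitrate in lake water",
    "property": "concentration",              # what is measured
    "objectOfInterest": "nitrate",            # the entity the property refers to
    "matrix": "lake water",                   # the medium containing the object
    "constraints": [],                        # e.g. depth range, size fraction (none here)
    "mappings": {
        # Invented URIs standing in for links to controlled vocabularies
        "property": "https://vocab.example.org/prop/concentration",
        "objectOfInterest": "https://vocab.example.org/chem/nitrate",
    },
}
print(variable_annotation["property"], "of", variable_annotation["objectOfInterest"])
```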

In this poster, we present our approach to developing an LLM-based annotation service, highlighting key challenges and solutions as well as the integration in higher-level infrastructures of the Helmholtz DataHub and beyond. The proposed annotation framework significantly streamlines the integration and harmonizes the description of environmental data across domains such as climate, biodiversity, and atmospheric sciences, aligning closely with the objectives of the NFDI and European Open Science Cloud (EOSC).

This contribution showcases how advanced semantic annotation tools can support data stewardship in practical research contexts, enhancing reproducibility, interoperability, and collaboration within the scientific community.

The Helmholtz Metadata Collaboration (HMC) Hub Earth and Environment seeks to create a framework for semantic interoperability across the diverse research data platforms within the Helmholtz research area Earth and Environment (E&E). Standardizing metadata annotations and aligning the use of semantic resources are essential for overcoming barriers in data sharing, discovery, and reuse. To foster a unified, community-driven approach, HMC, together with the DataHub, has established the formal "Metadata-Semantics" Working Group, which brings together engaged data stewards from major Helmholtz research data platforms within the E&E domain.

As part of its strategy to standardize metadata annotation in collaboration with the community, the working group will begin by harmonizing device-type denotations across two Helmholtz sensor registries: the O2A REGISTRY, developed at AWI, and the Sensor Management System (SMS), maintained by UFZ, GFZ, KIT, and FZJ. This harmonization involves the development of a shared FAIR controlled vocabulary and the implementation of a peer-reviewed curation process for it.

The common vocabulary will support the creation of referenceable ad-hoc terms when needed, incorporate versioning and quality assurance measures, and establish links with existing terminologies in the field (e.g., NERC L05, L06, ODM2, GCMD). Its development will involve experts from various disciplines within Helmholtz E&E …

Climate change has strong effects on many areas in our daily life. It, therefore, requires a thorough understanding of climate change and its consequences to develop effective mitigation and adaptation strategies. The wealth of available data and services is a key to help understand climate change. However, they are often not easily accessible and interpretable, and the hurdle to work with them is high for non-domain experts.

The EU Horizon project FOCAL aims to bridge the gap between data, services and end users by implementing a platform that enables easy and efficient exploration of climate data on a local scale. In particular, decision-making processes of stakeholders in the fields of forestry and urban planning shall be supported with the developed tools. To this end, an open compute platform will be implemented and launched that combines intelligent workflow management with high-performance computing (HPC) resources. A modular and interoperable platform architecture will be used that provides software container-based services. Furthermore, new AI tools to enhance the climate data analysis in terms of speed, robustness and accuracy and to broaden the toolkit of climate data analysis and impact assessment tools will be investigated, developed and made available via the platform.

An additional co-design …

Confronting the escalating impacts of climate change on our coastlines demands a revolution in how we monitor, predict, and protect these vital zones. The rapid advancement of the Digital Ocean, powered by AI-enhanced and data fusion high-resolution coastal observations, and sophisticated AI-driven numerical models, offers this crucial leap forward. Building on this technology and the Copernicus Marine Environment Monitoring Service (CMEMS), the Horizon Europe programme of the European Commission launched the Forecasting and Observing the Open-to-Coastal Ocean for Copernicus Users (FOCCUS) project (foccus-project.eu), consisting of 19 partners from 11 countries. Member State Coastal Systems (MSCS) and users will collaborate to advance the coastal dimension of CMEMS by improving existing capability and developing innovative coastal products.

FOCCUS enhances CMEMS’s coastal capabilities through three key pillars: i) developing novel high-resolution coastal data products by integrating multi-platform observations (remote sensing and in-situ), Artificial Intelligence (AI) algorithms, and advanced data fusion techniques for improved monitoring; ii) developing advanced hydrology and coastal models including a pan-European hydrological ensemble for improved river discharge predictions, and improving member state coastal systems by testing new methodologies in MSCS production chains while taking advantage of stochastic simulation, ensemble approaches, and AI technology; and iii) demonstrating innovative products and improved …

Software development and data curation or analysis share many of their issues: keeping track of the evolution of files -- ideally with information on how, why, when, and by whom --, organizing collaboration with multiple people, keeping track of known issues and other TODOs, discussing changes, making versions available to others, automating tasks, and more. Often you will even have to write code as part of a data project, blurring the line between the two even more. In the free and open-source software development world these issues already have well established solutions: a version control system keeps track of your projects history and ongoing development, a forge can serve as a collaboration hub, CI/CD services provide flexible automation. So, why not apply them to our data management needs? Forgejo-aneksajo does just that and extends Forgejo with git-annex support, making it a versatile (meta-)data collaboration platform that neatly fits into a data management ecosystem around git, git-annex and DataLad.

Morbidity from heat extremes is much higher than mortality and has higher costs, but has received less attention. In addition, there are few, if any, models available to assess the current and potential future impacts of heat extremes on morbidity under climate change. In this study, we develop a machine learning model based on a large insurance dataset for springtime (Q2: April, May, June) and summertime (Q3: July, August, September) for the period 2013-2023 in Germany. From this dataset, we construct a spatially distributed 1 km² dataset on the incidence of heat stroke and volume depletion for the federal state of North Rhine-Westphalia. We link this to detailed estimates of past heat extremes (maximum air temperature, average air temperature, number of hot days) as well as air pollution (NO₂, O₃, PM₁₀, and PM₂.₅) and socioeconomic factors (education level, household income, and unemployment rate) to explain temporal and spatial differences in incidence. We present results for the XGBoost algorithm, as well as initial results for deep-learning algorithms.

ADCP moving ship measurements provide high-resolution insight into the hydrodynamics and sediment transport of a waterbody and thus improve the understanding of the underlying physical processes. In order to facilitate scientific advancement, the BAW provides ADCP data from a measurement campaign conducted in the Eider estuary in 2020. Data compliant with the FAIR principles can be downloaded from the BAW-Datenrepository. The raw and analysed data are available in ASCII format. Entering the ISO metadata and uploading the files is currently performed manually.

As an increasing number of datasets is to be published, this error-prone manual work is to be replaced by an automated workflow. In a first step, the raw data are converted into binary NetCDF files whose metadata comply with the CF metadata conventions. Measurement data and metadata are thus inseparable, which makes the process less error-prone. The existing ADCP2NETCDF converter has been extended for this purpose. Software that supports the generated CF NetCDF feature type “trajectoryProfile” can process the offered files directly.
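
The sketch below shows the kind of CF-style global attributes such a converted file might carry, written with xarray. It is illustrative only (not the ADCP2NETCDF converter), uses placeholder variables and values, and omits the additional variables a fully compliant trajectoryProfile discrete-sampling-geometry file would need.

```python
# Illustrative skeleton only (not the BAW ADCP2NETCDF converter); values are
# placeholders, and a fully compliant trajectoryProfile file would require
# further discrete-sampling-geometry variables.
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"velocity_east": (("obs", "z"), np.zeros((4, 10)),
                       {"standard_name": "eastward_sea_water_velocity",
                        "units": "m s-1"})},
    coords={
        "time": ("obs", pd.date_range("2020-06-01", periods=4, freq="min")),
        "depth": ("z", np.linspace(0.5, 5.0, 10),
                  {"standard_name": "depth", "units": "m", "positive": "down"}),
    },
    attrs={
        "Conventions": "CF-1.8",
        "featureType": "trajectoryProfile",  # CF discrete sampling geometry
        "title": "ADCP moving ship survey, Eider estuary 2020 (example)",
    },
)
ds.to_netcdf("adcp_example.nc")
```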

In a second processing step, it is planned to convert the NetCDF files into ISO-compliant metadata in XML format, which can be imported into the metadata information system (MIS) of the BAW-Datenrepository. A method that has already …
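
As a rough, heavily simplified illustration of this second step, the sketch below pulls global attributes out of a NetCDF file (here the example file from the previous sketch, but any NetCDF file works) and writes them into a bare-bones XML record. A real export would have to target the full ISO 19115 schema expected by the MIS; the element names used here are placeholders.

```python
# Heavily simplified sketch: copy a few global attributes from a NetCDF file
# into a minimal XML record. Element names are placeholders, not ISO 19115.
import xml.etree.ElementTree as ET
import xarray as xr

ds = xr.open_dataset("adcp_example.nc")        # file from the sketch above
root = ET.Element("metadata")
for key in ("title", "Conventions", "featureType"):
    if key in ds.attrs:
        ET.SubElement(root, key).text = str(ds.attrs[key])
ET.ElementTree(root).write("adcp_example_metadata.xml",
                           encoding="utf-8", xml_declaration=True)
```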

Neural networks have been applied for fast downscaling of environmental fields. However, their inherent randomness can lead to prediction instability. This study introduces an ensemble neural network to assess the effectiveness of the ensemble method in mitigating instability in statistical spatial wave downscaling. Its performance is compared with a deterministic linear regression model. Significant wave height (SWH) in the western Black Sea is considered, with low-resolution SWH and wind data from ERA5 and high-resolution SWH data from a regional numerical model. Both self-variable downscaling (from low-resolution SWH) and cross-variable downscaling (from low-resolution wind fields) are considered. Results show that the ensemble method significantly reduces the base neural network’s prediction instability. In self-variable SWH downscaling, the two models perform similarly well, whereas in cross-variable downscaling, the ensemble model outperforms the linear model. These findings provide valuable insights into downscaling methodologies, contributing to improved spatial wave predictions.
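
The general ensemble idea, averaging several independently initialized networks to damp run-to-run variability, can be sketched as follows; synthetic data and scikit-learn stand in for the study's actual fields and architecture.

```python
# Illustrative only: average several independently initialized networks to
# damp the run-to-run variability of a single network. Synthetic data stand
# in for the coarse-resolution predictors and the fine-scale SWH target.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 16))                             # coarse SWH/wind predictors
y = X @ rng.normal(size=16) + 0.1 * rng.normal(size=2000)   # fine-scale target

members = [
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                 random_state=seed).fit(X, y)
    for seed in range(5)                                    # 5 ensemble members
]
ensemble_prediction = np.mean([m.predict(X) for m in members], axis=0)
```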

To ensure FAIR data (Wilkinson et al., 2016: https://doi.org/10.1038/sdata.2016.18 ), well-described datasets with rich metadata are essential for interoperability and reusability. In Earth System Science, NetCDF is the quasi-standard for storing multidimensional data, supported by metadata conventions such as Climate and Forecast (CF, https://cfconventions.org/ ) and Attribute Convention for Data Discovery (ACDD, https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3 ).
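
As a minimal illustration of such conventions, the sketch below attaches a handful of CF and ACDD global attributes to a small dataset with xarray before writing NetCDF; the values are placeholders and far from a complete attribute set.

```python
# Minimal sketch: attach a few CF and ACDD global attributes before writing
# NetCDF. The values are placeholders, not a complete attribute set.
import numpy as np
import xarray as xr

ds = xr.Dataset({"tas": (("time",), np.array([287.1, 287.4]),
                         {"standard_name": "air_temperature", "units": "K"})})
ds.attrs.update({
    "Conventions": "CF-1.8, ACDD-1.3",
    "title": "Example near-surface air temperature series",
    "summary": "Short human-readable abstract of the dataset.",
    "creator_name": "Jane Doe",
    "license": "CC-BY-4.0",
})
ds.to_netcdf("example_cf_acdd.nc")
```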

While NetCDF files can be self-describing, their metadata often lack the compatibility and completeness required by repositories and data portals. The Helmholtz Metadata Guideline for NetCDF (HMG NetCDF) Initiative addresses these issues by establishing a standardized NetCDF workflow. This ensures seamless metadata integration into downstream processes and enhances AI-readiness.

A consistent metadata schema benefits the entire processing chain. We demonstrate this by integrating enhanced NetCDF profiles into selected clients such as the Earth Data Portal (EDP, https://earth-data.de ). Standardized metadata practices also ease ingestion into repositories such as PANGAEA ( https://www.pangaea.de/ ) and WDCC ( https://www.wdc-climate.de ), ensuring compliance with established norms.

The HMG NetCDF Initiative is a collaborative effort across German research centers, supported by the Helmholtz DataHub. It contributes to broader Helmholtz efforts (e.g., HMC) to improve research data management, discoverability, and interoperability.

Key milestones include:

  • Aligning metadata fields across disciplines,
  • Implementing guidelines,
  • Developing machine-readable templates and validation tools (see the sketch after this list),
  • Supporting user-friendly metadata …
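
A validation tool of this kind can start from something as simple as a machine-readable list of required global attributes. The sketch below is illustrative only (not the HMG validator); it reads the example file written above, or any other NetCDF file, and reports which required attributes are missing.

```python
# Sketch of a check against a machine-readable template (here just a list of
# required global attributes); a real HMG validator would cover far more.
import xarray as xr

REQUIRED_GLOBAL_ATTRS = ["Conventions", "title", "summary",
                         "creator_name", "license"]    # illustrative template

def missing_required_attrs(path):
    """Return the required global attributes missing from a NetCDF file."""
    with xr.open_dataset(path) as ds:
        return [a for a in REQUIRED_GLOBAL_ATTRS if a not in ds.attrs]

print(missing_required_attrs("example_cf_acdd.nc"))    # [] if all present
```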

Meeting room

Lecture Hall

There is an increasing effort in scientific communities to create shared vocabularies and ontologies. These form the foundation of a semantically annotated knowledge graph that can surface all research data and enable holistic data analysis across various data sources and research domains.

Making machine-generated data available in such a knowledge graph is typically done by setting up scripts and data transformation pipelines which automatically add semantic annotations. Unfortunately, a good solution for capturing manually recorded (meta)data in such a knowledge graph is still lacking.

Herbie, the semantic electronic lab notebook and research database developed at Hereon, fills this gap. In Herbie, users can enter all (meta)data on their experiments in customized web forms. Once submitted, Herbie automatically adds semantic annotations and stores everything directly in the knowledge graph. It is thus as easy to use as a spreadsheet but produces FAIR data without any additional post-processing work. Herbie is configured using the W3C-standardized Shapes Constraint Language (SHACL) and builds on well-established frameworks in the RDF ecosystem such as RDFS, OWL, and RO-Crate.
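
To give a flavour of what SHACL-based constraints look like (this is a generic example, not Herbie's actual configuration), the sketch below validates a tiny RDF graph against a shape using rdflib and pySHACL; the namespace and properties are invented.

```python
# Generic example, not Herbie's configuration: validate a small RDF graph
# against a SHACL shape. The namespace and properties are invented.
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(data="""
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/lab#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:SampleShape a sh:NodeShape ;
    sh:targetClass ex:Sample ;
    sh:property [ sh:path ex:preparedBy ; sh:minCount 1 ;
                  sh:datatype xsd:string ] .
""", format="turtle")

data = Graph().parse(data="""
@prefix ex: <http://example.org/lab#> .
ex:sample42 a ex:Sample .   # ex:preparedBy is missing, so validation fails
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)   # False: the shape requires at least one ex:preparedBy
print(report)
```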

We will showcase this approach through a typical example of a production and analysis chain of the kind found in many scientific domains.

Meeting room

Lecture Hall

The collection and use of sensor data are vital for scientists monitoring the Earth's environment. Sensor data allow the evaluation of natural phenomena over time and are essential for validating experiments and simulations. Assessing data quality requires understanding the sensor's state, including its operation and maintenance, such as calibration parameters and maintenance schedules. In the HMC project MOIN4Herbie, the digital recording of FAIR sensor maintenance metadata is being developed using the electronic lab notebook Herbie.

In this talk, we will describe the process of configuring Herbie with ontology-based forms for sensor maintenance metadata in our two pilot cases, the Boknis Eck underwater observatory and the Tesperhude research platform. This includes the development of a sensor maintenance ontology and task-specific ontologies tailored to each use case. In information science, ontologies are formalizations of concepts, their relations, and their properties. They allow the collection of input that is immediately fit for purpose as findable, machine-readable, and interoperable metadata. By using ontologies, we can ensure the use of controlled vocabularies and organize the stored knowledge for accessibility and reusability.
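
For illustration only (the terms below are invented and not part of the MOIN4Herbie ontology), declaring a maintenance-related class and property with rdflib might look like this:

```python
# Toy example with invented terms (not the MOIN4Herbie ontology): declare a
# maintenance class and a property, then print the graph as Turtle.
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL

MAINT = Namespace("http://example.org/sensor-maintenance#")
g = Graph()
g.bind("maint", MAINT)

g.add((MAINT.CalibrationEvent, RDF.type, OWL.Class))
g.add((MAINT.CalibrationEvent, RDFS.label, Literal("Calibration event")))
g.add((MAINT.performedOn, RDF.type, OWL.ObjectProperty))
g.add((MAINT.performedOn, RDFS.domain, MAINT.CalibrationEvent))
g.add((MAINT.performedOn, RDFS.comment,
       Literal("Links a calibration event to the sensor it was performed on.")))

print(g.serialize(format="turtle"))
```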

A further focus will be the translation of maintenance tasks into Shapes Constraint Language (SHACL) documents that can be rendered as forms to the users …

Meeting room

Lecture Hall

Technology is revolutionizing our approach to environmental challenges. Among the most promising tools of digitalization is the Digital Twin (DT), or more specifically the Digital Twin of the Ocean (DTO): a virtual replica of the ocean that holds immense potential for sustainable marine development. To successfully confront the increasing impacts and hazards of a changing climate (such as coastal erosion and flooding), it is vital to further develop the DTO so that we can monitor, predict, and protect vulnerable coastal communities. DTOs are powered by AI-enhanced data that integrate ocean conditions, ecosystems, and anthropogenic influences, along with novel AI-driven predictive modeling capabilities combining wave, hydrodynamic, and morphodynamic models. This enables unprecedented accuracy in seamless forecasting. In addition to natural phenomena, DTOs can also include socio-economic factors (e.g. ocean use, pollution). Thus, DTOs can be used to monitor the current ocean state, but also to simulate future ‘What-if’ Scenarios (WiS) for various human interventions. In this way the DTO can guide decisions on coastal protection and the sustainable use of marine resources, while also promoting collaboration on effective solutions for ocean conservation.

In European projects such as the European Digital Twin Ocean (EDITO) ModelLab, work …

Meeting room

Lecture Hall

Digital twins of the ocean (DTO) make marine data available to support the development of the blue economy and enable direct interaction through bi-directional components. Typical DTOs provide insufficient detail near the coast because their resolution is too coarse and the underlying models lack processes that become relevant in shallow areas, e.g., the wetting and drying of tidal flats. As roughly 2.13 billion people worldwide live near a coast, downscaling ocean information to a local scale becomes necessary, as many practical applications, e.g., sediment management, require high-resolution data. For this reason, we focused on the appropriate downscaling of regional and global data from existing DTOs using a high-resolution (hundreds of metres), unstructured, three-dimensional, process-based hindcast model in combination with in-situ observations. This high-resolution model allows fine tidal channels, estuaries, and coastal structures such as dams and flood barriers to be represented digitally. Our digital twin includes tidal dynamics, salinity, sea water temperature, waves, and suspended sediment transport. Through the fast and intuitive web interface of our prototype digital twin, the model data enable a wide range of coastal applications and support sustainable management. Bi-directional web processing services (WPS) were implemented within the interactive web-viewer …
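
To illustrate how such a bi-directional WPS could be called from client code, a request with OWSLib might look as follows; the endpoint URL, process identifier, and inputs are hypothetical and do not describe the project's actual service.

```python
# Hypothetical endpoint, process identifier, and inputs; shown only to
# illustrate the request pattern, not the project's actual service.
from owslib.wps import WebProcessingService, monitorExecution

wps = WebProcessingService("https://example.org/coastal-twin/wps")
print([p.identifier for p in wps.processes])       # discover offered processes

execution = wps.execute(
    "extract_timeseries",                          # hypothetical process id
    inputs=[("station", "eider_estuary_01"), ("variable", "salinity")])
monitorExecution(execution)                        # poll until the job is done
print(execution.processOutputs[0].reference)       # URL of the result
```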

Meeting room

Lecture Hall

The rapid growth of offshore wind energy requires effective decision-support tools to optimize operations and manage risks. To address this, we developed iSeaPower, a web-based platform designed to support decision-making in offshore renewable energy tasks through real-time data analysis and interactive visualizations. iSeaPower integrates detailed meteorological and oceanographic data with advanced statistical methods, machine learning forecasts, and data assimilation techniques. This integration enables accurate predictions of weather windows, thorough risk assessments, and efficient operational planning for offshore wind energy stakeholders. iSeaPower is designed to optimize journey planning by considering weather conditions and travel duration. The current framework includes five methods tailored to different operational requirements. First, the forecasting method evaluates wind speed and wave height risks over short-term windows (1–3 days) using real-time weather data to quickly identify potential hazards. Second, historical database analysis calculates exceedance probabilities based on 30-day intervals from long-term historical data, revealing recurring weather risk patterns. Third, the delay time estimation method determines potential task delays across the entire year by analyzing monthly weather trends, supporting long-term operational planning and risk management. Fourth, machine-learning methods trained on historical data enhance the accuracy of seven-day forecasts, improving short-term predictions. Finally, the updated statistics …
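
The exceedance-probability idea behind the historical analysis can be sketched in a few lines; the data are synthetic and the threshold hypothetical, so this is not the iSeaPower implementation.

```python
# Illustrative only (not the iSeaPower code): monthly probability that
# significant wave height exceeds an operational threshold, from a synthetic
# long-term record.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("1994-01-01", "2023-12-31", freq="3h")
hs = pd.Series(rng.gamma(shape=2.0, scale=0.8, size=len(idx)), index=idx)

THRESHOLD = 2.5   # m, hypothetical limit for a crew-transfer operation
exceedance = hs.gt(THRESHOLD).groupby(hs.index.month).mean()
print(exceedance.round(3))   # probability of unworkable conditions per month
```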

Meeting room

Lecture Hall

Lab and field notebooks are essential tools for documenting structured information during measurement campaigns or field and laboratory work. Modern Electronic Lab Notebooks (ELNs) offer advanced features to support this documentation process and can enrich records with additional metadata—such as instrumentation details, personnel involved, sample registration, and more. To fully harness this potential, it is desirable to integrate ELNs seamlessly into the center’s data workflows—supporting information flow from sample and data acquisition, through measurement activities, all the way to data publication in repositories.

However, many laboratories face significant barriers: ELNs are not readily available, may require costly licenses, and often lack institutional support or training opportunities. As a result, their use is not yet widespread.

In this coffee round, we invite participants to explore the potential of ELNs in scientific workflows. Together, we’ll discuss desirable features, briefly review a few existing solutions, and consider whether centrally provided ELN services across Helmholtz could be a sustainable way forward.

Meeting rooms

Library Meetingroom
Big Meetingroom

Modern AI tools for academic literature research can be divided into finders and connectors.

Finders operate much like catalogues: you enter a keyword, a phrase or, even better, a complete question and receive matching results. Finders work particularly well in the natural sciences, where semantic analyses are less dependent on monographs or older literature sources than in the humanities or social sciences. This is because finders primarily index and analyse English-language journal articles with DOIs and open-access status. Connectors, on the other hand, start with a pre-existing literature source, designated as a seed. You enter part of the seed's metadata (ideally a DOI) into the AI tool to identify further publications. This process automatically identifies literature that is cited, thematically related, or methodologically relevant.

In this Collaboration Coffee session, I will demonstrate how to use both options effectively: getting started on a specific research question can be aided by a finder such as Semantic Scholar or Elicit. Starting from a selected publication, connectors such as ResearchRabbit or Inciteful then help to tap into further literature sources. Alongside my specific examples, I would like to use an …

Meeting rooms

Library Meetingroom
Big Meetingroom