Poster Session

  • Sept. 3, 2025, 15:45 – 16:30

Poster Session No. 1

Sept. 3, 2025, 15:45 – 16:30

Timetable

Sept. 3, 2025

In forestry, having consistently relevant and correct information is critical for environmentally responsible decision-making. Consulting commercially available or open-source Large Language Models (LLMs) can be an effective route to informed decisions. However, current LLMs have proven deficient in three critical areas: unreliable information sources, lack of access to real-world data, and ambiguity in their scientific reasoning ability.

To overcome these shortcomings, we instantiate a curated knowledge base containing information from relevant CC0-licensed research articles. With clearly defined constraints applied to each research article before ingestion into the knowledge base, the underlying LLM can produce feedback that is correct, concise, and accurate. Some of these constraints are in place to adhere to European regulations on the ethical use of AI and to comply with copyright law.

Besides access to relevant research papers, access to real-world data is a cornerstone of the proposed framework. By utilizing calibrated level 1 data from multiple sensors, platforms, and measuring devices, we implement agentic retrieval-augmented generation (RAG) functionality to retrieve information about an area of interest.
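The sketch below illustrates the retrieval step of such a RAG setup in a minimal form; the vector-store library (chromadb), the placeholder article snippet, and all identifiers are illustrative assumptions, not the project's actual implementation.

```python
# Minimal sketch of retrieval-augmented generation over a curated knowledge base.
# Library choice (chromadb) and all names are illustrative, not the project's stack.
import chromadb

client = chromadb.Client()  # in-memory vector store
papers = client.create_collection("cc0_papers")

# Ingest constrained, CC0-licensed article snippets together with provenance metadata.
papers.add(
    ids=["doi:10.xxxx/example-1"],  # placeholder identifier
    documents=["Thinning dense spruce stands reduces drought stress ..."],
    metadatas=[{"license": "CC0", "source": "curated"}],
)

# Retrieve context for a forestry question; the hits (plus level 1 sensor data
# fetched by separate agent tools) would be passed to the LLM as grounded context.
hits = papers.query(
    query_texts=["How should I thin a drought-stressed spruce stand?"],
    n_results=3,
)
print(hits["documents"][0])
```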

Lastly, reasoning abilities in Large Language Models …

The proliferation of digital portals for accessing environmental and geoscientific data has significantly enhanced the ability of researchers, policymakers, and the public to retrieve and utilize critical information.

Furthermore, metadata content can be harvested, with the added value that information is collected only once and then presented in different web presences. In this way, metadata attributes can be presented to the relevant user groups in a way tailored to their priorities.

The Earth Data Portal (https://earth-data.de/), a collaborative effort of the Helmholtz centers of the research field Earth and Environment, enables querying data from multiple repositories, particularly from the Helmholtz research centers.

In addition, umwelt.info (https://umwelt.info/de), operated by the German Environment Agency, offers a user-friendly interface to openly available environmental and nature protection data, tailored for seamless access by the general public.

The respective metadata content, on the other hand, is certainly best presented on the website of the source repository.

This poster provides a general overview of the highlighted portals and delves into the specifics of their implementations, with a focus on the use of Persistent Identifiers (PIDs). PIDs play a crucial role in ensuring the long-term accessibility and citability of data, …

At the Helmholtz Association, we aim to establish a well-structured and harmonized data space that connects information across distributed data infrastructures. Achieving this goal requires the standardization of dataset descriptions using appropriate metadata and the definition of a single source of truth for much of this metadata, from which different systems can draw. Persistent Identifiers (PIDs) in metadata enable the reuse of common information from shared sources. Broad adoption of PID types enhances interoperability and supports machine-actionable data. As a first step, we recommend implementing ROR, ORCID, IGSN, PIDINST, DataCite DOI, and Crossref DOI in our data systems.
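As a schematic illustration (not an actual record), the snippet below shows how a dataset description can reference shared information via such PIDs instead of duplicating it; all identifiers are placeholders.

```python
# Schematic example of a dataset description in which shared information is
# referenced via PIDs instead of being re-entered; all values are placeholders.
import json

dataset = {
    "identifier": {"identifierType": "DOI", "identifier": "10.xxxx/placeholder"},
    "creators": [{
        "name": "Doe, Jane",
        "nameIdentifier": {"scheme": "ORCID",
                           "value": "https://orcid.org/0000-0000-0000-0000"},
        "affiliation": {"scheme": "ROR", "value": "https://ror.org/00example00"},
    }],
    "relatedIdentifiers": [
        {"relationType": "IsSourceOf", "scheme": "IGSN",
         "value": "IGSN:XX0000000"},                      # physical sample
        {"relationType": "IsCollectedBy", "scheme": "PIDINST",
         "value": "https://hdl.handle.net/21.XXXXX/instrument-placeholder"},
    ],
}
print(json.dumps(dataset, indent=2))
```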

However, to practically record and integrate this information into our repositories, we must first identify the specific locations and stakeholders within institutions where this data is generated and maintained. We must also assess what kinds of tools and services the Association needs to provide to support seamless data management for its users.

In this presentation, we highlight several tools we propose to implement across the organization, based on envisioned workflows. These include, for example, repository software, electronic lab notebooks (ELNs), terminology services, and other infrastructure components. Implementing these tools will support the various stakeholder groups in fulfilling their roles and will contribute …

Images and videos are usually a more vivid data source than raw scalar data. However, even in the era of analog photo albums, metadata was added to images to preserve their context for the future. Today, the marine community wants to analyze far larger datasets of videos and images using computers, which generally cannot easily understand the image content on their own. Therefore, researchers have to record the content and context of images in a structured format to enable automated, systematic and quantitative image analysis.

The metadata file format FAIR Digital Objects for images (iFDOs) provides this structure for describing individual images and whole datasets. iFDOs primarily structure the answers to the five W's and H questions: where were the images taken, by whom, why, when, how, and what is actually shown in the images or videos. Together, these pieces of information provide FAIRness (findability, accessibility, interoperability and reusability) to datasets.
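A simplified sketch of such a description is shown below; the field names are abbreviated for illustration, all values are placeholders, and the iFDO specification defines the authoritative vocabulary.

```python
# Illustrative sketch of how an iFDO-style header answers the five W's and H;
# field names are simplified and values are placeholders, the iFDO
# specification defines the authoritative vocabulary.
image_set_header = {
    "image-set-name": "Example AUV dive 42",                   # which collection
    "image-set-handle": "https://hdl.handle.net/20.XXXX/placeholder",
    "image-creators": ["Doe, Jane (ORCID: 0000-0000-0000-0000)"],  # by whom
    "image-project": "Habitat mapping",                         # why
}

image_items = {
    "dive42_0001.jpg": {
        "image-datetime": "2025-09-03T15:45:00Z",                # when
        "image-latitude": 54.32, "image-longitude": 7.89,        # where
        "image-acquisition": "AUV camera, 2 m altitude",         # how
        "image-annotations": ["Lophelia pertusa"],               # what is shown
    }
}
print(image_set_header["image-set-name"], len(image_items), "images")
```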

Researchers benefit from iFDO-enhanced datasets, as these already provide the information necessary for data homogenization, enabling machine learning applications and mass data analysis. Data viewers and portals, such as marine-data.de, can increase the reach and impact of datasets by visualizing them and making them findable using the context …

In the environmental sciences, time-series data is key to, for example, monitoring environmental processes and validating Earth system models. A major issue is the lack of a consistent data availability standard aligned with the FAIR principles; the Helmholtz Earth and Environment DataHub is addressing this together with the Helmholtz Metadata Collaboration project STAMPLATE.

The seven participating research centers are building a large-scale infrastructure using the Open Geospatial Consortium's SensorThings API (STA) as the central data interface. It is linked to other community-driven tools, such as sensor and device management systems, data ingestion systems, and the Earth Data Portal (www.earth-data.de) with highly customizable viewers.
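As a minimal illustration of what the STA interface offers to client applications, the sketch below reads station and observation data from a SensorThings endpoint using standard OGC query options; the base URL is a placeholder, not one of the centers' actual services.

```python
# Minimal sketch of reading time series via the OGC SensorThings API (STA);
# the endpoint URL is a placeholder, not one of the centers' actual services.
import requests

STA = "https://example.org/sta/v1.1"   # placeholder STA endpoint

# Discover stations ("Things") together with their locations and data streams.
things = requests.get(f"{STA}/Things",
                      params={"$expand": "Locations,Datastreams"}).json()

# Fetch the most recent observations of the first data stream of the first station.
ds_id = things["value"][0]["Datastreams"][0]["@iot.id"]
obs = requests.get(
    f"{STA}/Datastreams({ds_id})/Observations",
    params={"$orderby": "phenomenonTime desc", "$top": 100},
).json()
for o in obs["value"][:3]:
    print(o["phenomenonTime"], o["result"])
```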

Our custom semantic metadata profile augments STA's core data model with domain-specific information. This ensures that metadata entered in any of the user self-service tools is also displayed in the Earth Data Portal along with the ingested data.

The framework is about to be put into operation and integrated into research data workflows, making long-term, nationwide measurements spanning decades available. Concurrently, our research data management (RDM) processes are shifting from manual, person-based workflows to self-organized, digitally supported ones.

This poster presents the fundamental elements of our initiative and the associated challenges. It also encourages new domains to get involved.

We, the National Centre for Environmental and Nature Conservation Information, develop the portal umwelt.info, which acts as the central access point to all of Germany's knowledge on the environment and nature protection. We integrate all openly accessible sources, from municipalities to federal states, civil society, the economy, and the sciences, into one flexible catalogue. At its core, this catalogue will make it easier to find and share all kinds of data and information, such as web applications, research data, or editorials. Here, we want to present our approach to combining this diverse data ecosystem into one searchable catalogue: we develop open-source software to which everybody can contribute. We want to give insights into our development process, both in the front end and the back end.

To support the open data community, we offer a native API as well as an emulated CKAN interface. Furthermore, we create editorials and scripts on data availability, with a current focus on water-related datasets in Germany. These products aim to give scientists easier access as well as information on reusability. Our current product can be found at https://umwelt.info and our current development stage at https://gitlab.opencode.de/umwelt-info.
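As an illustration, the snippet below queries a CKAN-compatible search endpoint for water-related datasets; the base path is an assumption, and the API documentation remains authoritative for the actual URL of the emulated CKAN interface.

```python
# Sketch of querying a CKAN-compatible interface for water-related datasets;
# the base path below is an assumption, not the documented endpoint of
# umwelt.info's emulated CKAN interface.
import requests

BASE = "https://umwelt.info/api/3/action"   # assumed CKAN-style base path
resp = requests.get(f"{BASE}/package_search",
                    params={"q": "Grundwasser", "rows": 5})
for pkg in resp.json()["result"]["results"]:
    print(pkg["title"])
```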

The scientific landscape is continually shifting towards increasing amounts of data, demanding a greater investment of (time) resources in the management and (pre-)processing of these data. As a result, data literacy has become a key element for researchers from all domains. Additionally, interdisciplinary, multidisciplinary, and collaborative approaches are more essential now than ever before. The Rhine-Ruhr Center for Scientific Data Literacy (DKZ.2R) focuses on a combined methodological data literacy, integrating data science and machine learning skills, high-performance computing, and research data management competencies. Our main objective is to promote a holistic data literacy, offering support for researchers in the form of training, consulting, data challenges, and tools for data analysis and management.

The availability of ever larger and more complex amounts of data requires comprehensive methodological skills that researchers must often learn independently. These skills range from considering how scientific data should be collected to questions about data processing applications, methods, infrastructure, and, finally, publishing. The DKZ.2R focuses on supporting researchers in overcoming data-related hurdles in order to find cross-domain solutions and synergies.

In our contribution, we present our workflow for filtering training data for foundation models …

Building web applications for the exploration of scientific data presents several challenges. These include the difficulty of accessing large volumes of data, the need for high-performance computing (HPC) resources to process and analyze such data, and the complexity of developing intuitive web frontends—especially for scientists who are not trained web developers. The Data Analysis Software Framework (DASF) addresses these challenges by enabling scientists to focus on Python-based backend development while seamlessly integrating HPC resources, even when these are not directly exposed to the internet. DASF also provides an automated mechanism to generate web frontends, significantly lowering the barrier to entry for scientific web application development (DOI:10.5194/egusphere-egu25-3120).
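To make the backend-first development style concrete, the toy function below shows the kind of typed, plain Python code a scientist writes; a DASF-like framework can then dispatch calls to such a function via a message broker and derive a web form from its signature. The function is a generic illustration, not part of the DASF API.

```python
# Toy backend function with a typed signature; a framework in the spirit of
# DASF can expose such a function remotely and auto-generate a frontend form
# from the parameter types. Purely illustrative, not DASF's actual API.
import numpy as np

def moving_average(values: list[float], window: int = 12) -> list[float]:
    """Smooth a time series; would run on the HPC side, called from the web frontend."""
    arr = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(arr, kernel, mode="valid").tolist()

if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5, 6], window=3))
```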

Complementing this, the ESRI Experience Builder empowers users to create multi-page web applications and dashboards through a content management system, without requiring expertise in JavaScript-based frontend frameworks. This makes it an ideal platform for scientists to build rich, interactive data exploration tools. The newly developed DASF plugin for the Experience Builder (available at https://codebase.helmholtz.cloud/dasf/dasf-experiencebuilder-plugin) bridges these two ecosystems. It enables seamless access to data and computational resources from within Experience Builder applications, facilitating the creation of powerful, user-friendly scientific web portals.

Bathymetric evolution in coastal environments is driven by complex interactions between hydrodynamics, sediment transport, and morphodynamics. Traditional morphodynamic models often face challenges in capturing these dynamics, particularly in regions like the Wadden Sea, where the feedback mechanisms between physical processes and seabed changes are highly intricate. In this study, we explore the potential of deep learning techniques to address these limitations, using convolutional neural networks (CNNs) for bathymetric reconstruction and convolutional long short-term memory (ConvLSTM) networks for forecasting. We applied these models to a dataset of bathymetric observations from the German Bight, which provides detailed coverage of the seabed from 1983 to 2012. First, we demonstrated that the CNN effectively reconstructs spatial bathymetric patterns from incomplete data inputs, achieving accurate reproductions of observed bathymetry with minimal reconstruction error, particularly in regions with active dynamics such as tidal channels. Second, we used a ConvLSTM network for forecasting, training the model with past observations to predict bathymetry. The ConvLSTM model performed well, with an area-averaged root mean square error of 0.139 m. Our results indicate that deep learning techniques offer promising alternatives to traditional methods for both spatial reconstruction and forecasting of bathymetric changes. These models can improve predictions of seabed dynamics, which are critical …
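For readers unfamiliar with the architecture, the minimal sketch below shows a ConvLSTM forecasting setup of the kind described; grid size, sequence length, layer widths, and data are synthetic placeholders and do not reproduce the study's actual model.

```python
# Minimal sketch of a ConvLSTM bathymetry forecasting setup; grid size,
# sequence length, and layer widths are placeholders, and the data is
# synthetic, so this is not the study's actual architecture.
import numpy as np
from tensorflow.keras import layers, models

T, H, W = 5, 64, 64                      # 5 past bathymetry grids of 64x64 cells
model = models.Sequential([
    layers.Input(shape=(T, H, W, 1)),
    layers.ConvLSTM2D(16, kernel_size=3, padding="same", return_sequences=False),
    layers.Conv2D(1, kernel_size=1, activation="linear"),   # next bathymetry grid
])
model.compile(optimizer="adam", loss="mse")

# Toy data: sequences of past grids -> next grid (replace with survey data).
x = np.random.rand(8, T, H, W, 1).astype("float32")
y = np.random.rand(8, H, W, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
print(model.predict(x[:1]).shape)        # (1, 64, 64, 1)
```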

Aquatic life is crucial for human well-being, playing a key role in carbon sequestration, climate regulation, biodiversity conservation, and nutrition. Plankton are the basis of aquatic food webs and sustainably sequester vast amounts of carbon from the atmosphere to the ocean's interior. Impacts of climate change and pollution on plankton functioning and diversity affect not only fish resources, which play a major role in human nutrition, but also the efficiency of the biological carbon pump. The critical role of aquatic life in biogeochemical cycles, climate regulation, conservation of aquatic biodiversity, and human nutrition mandates precise mapping and monitoring. Distributed pelagic imaging techniques enable comprehensive global observation, providing critical insights for decision-making and carbon dioxide removal strategies. To this end, each day, millions of images of plankton and particles are taken by researchers around the globe, using a variety of imaging systems. Each individual image and its associated metadata can provide crucial information not only about the individual organism or particle, but also about the biodiversity and functioning of aquatic food webs, the ecosystem status of the related water body, and its role in carbon sequestration. The Aquatic Life Foundation Project will, for the first time, combine billions of images acquired with …

Image-based data analysis is becoming increasingly important in Earth and environmental sciences – for example, in marine biodiversity monitoring, drone image evaluation, or automated habitat classification. Deep learning approaches such as YOLO (You Only Look Once) offer powerful tools for object detection, but their application is often limited by technical complexity and the need for programming skills.

In this demo, I present a user-friendly graphical interface that allows researchers to upload their own image datasets and annotations, and then configure and run a complete object detection workflow – including data preparation, model training, validation, and testing – all without writing any code. This tool is particularly aimed at scientists working with image data who want to apply deep learning methods without needing expertise in machine learning frameworks.
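For orientation, the snippet below sketches the steps such a workflow wraps (training, validation, prediction), using the ultralytics YOLO package as one common implementation; file names and parameters are placeholders, and the demo tool itself may rely on a different backend.

```python
# Sketch of the object detection steps the graphical interface wraps, shown
# with the ultralytics YOLO package as one common implementation; file names
# and parameters are placeholders, the demo tool may use a different backend.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                                 # pretrained starting weights
model.train(data="dataset.yaml", epochs=50, imgsz=640)     # dataset.yaml lists images/labels
metrics = model.val()                                      # validation on the held-out split
results = model.predict("test_images/", conf=0.25)         # inference on new images
for r in results:
    print(r.boxes.xyxy, r.boxes.cls)                       # detected boxes and class indices
```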

The demo will showcase typical use cases from marine biology, but the workflow is domain-agnostic and easily transferable to a wide range of Earth and environmental science applications. In the future, the tool will be available via BinderHub, allowing users to run the entire workflow directly in their web browser without any local installation.

The Ocean Science Information System (OSIS), developed at GEOMAR Helmholtz Centre for Ocean Research Kiel, is a central platform for managing and publishing metadata related to marine research expeditions, experiments, and simulations. In response to evolving national needs and broader integration efforts, OSIS is currently undergoing a major transformation.

A key driver of this development is its adoption by the Deutsche Allianz Meeresforschung (DAM) as the primary system for recording German research cruises across partner institutions. This expansion has necessitated enhanced interoperability, standardized metadata workflows, and scalable infrastructure.

In parallel, OSIS is building stronger integration with the O2ARegistry, developed by AWI as a cross-institutional metadata registry for sensors and platforms. These efforts aim to support the reuse of expedition and instrument metadata in broader national and international contexts.

As part of its modernization, OSIS will support single sign-on (SSO) via the Helmholtz AAI, enabling seamless and secure access for users across participating institutions.

Another major focus is the automated import of planned expedition data from upstream expedition planning and logistics systems such as MFP (Marine Facilities Planning) and EIS (Expeditions-Informationssystem). These enhancements are designed to streamline data entry, reduce redundancy, and improve data consistency across the …

Supplementary Software Demo to Submission 76 by N. Ali, B. Geyer, and J. Schulz-Stellenfleth of the Helmholtz-Zentrum Hereon

The rapid growth of offshore wind energy requires effective decision-support tools to optimize operations and manage risks. To address this, we developed iSeaPower, a web-based platform designed to support decision-making in offshore renewable energy tasks through real-time data analysis and interactive visualizations. iSeaPower integrates detailed meteorological and oceanographic data with advanced statistical methods, machine learning forecasts, and data assimilation techniques. This integration enables accurate predictions of weather windows, thorough risk assessments, and efficient operational planning for offshore wind energy stakeholders. iSeaPower is designed to optimize journey planning by considering weather conditions and travel duration. The current framework includes five methods tailored to different operational requirements. First, the forecasting method evaluates wind speed and wave height risks over short-term windows (1–3 days) using real-time weather data to quickly identify potential hazards. Second, historical database analysis calculates exceedance probabilities based on 30-day intervals from long-term historical data, revealing recurring weather risk patterns. Third, the delay time estimation method determines potential task delays across the entire year by analyzing monthly weather trends, supporting long-term operational planning and risk management. Fourth, machine learning approaches enhance the …
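To make the second method concrete, the sketch below estimates exceedance probabilities for a planning date from a long historical record; the thresholds and the synthetic data are placeholders, not iSeaPower's actual configuration.

```python
# Schematic of the historical exceedance-probability idea (second method above):
# for a ~30-day window around a planned date, estimate how often wind speed or
# significant wave height exceeded operational limits. Thresholds and the
# synthetic data are placeholders, not iSeaPower's actual configuration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
hours = pd.date_range("1995-01-01", "2024-12-31 23:00", freq="h")
hist = pd.DataFrame({
    "wind_ms": rng.gamma(4.0, 2.2, len(hours)),   # toy wind speed [m/s]
    "hs_m": rng.gamma(2.0, 0.8, len(hours)),      # toy significant wave height [m]
}, index=hours)

def exceedance_probability(day_of_year: int, wind_limit=12.0, hs_limit=1.5) -> float:
    """Fraction of historical hours in a +/-15-day window violating either limit."""
    offset = (hist.index.dayofyear - day_of_year) % 365
    sel = hist[(offset <= 15) | (offset >= 350)]
    return float(((sel["wind_ms"] > wind_limit) | (sel["hs_m"] > hs_limit)).mean())

print(f"Exceedance probability mid-June: {exceedance_probability(166):.2f}")
```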

Supplementary Software Demo to Submission 110 by F. Kirchner, C. Eschke, and M. Held from the Helmholtz-Zentrum Hereon

There is an increasing effort in scientific communities to create shared vocabularies and ontologies. These build the foundation of a semantically annotated knowledge graph which can surface all research data and enable holistic data analysis across various data sources and research domains.

Making machine-generated data available in such a knowledge graph is typically done by setting up scripts and data transformation pipelines which automatically add semantic annotations. Unfortunately, a good solution for capturing manually recorded (meta)data in such a knowledge graph is still lacking.

Herbie, the semantic electronic lab notebook and research database developed at Hereon, fills this gap. In Herbie, users enter all (meta)data on their experiments in customized web forms. Once submitted, Herbie automatically adds semantic annotations and stores everything directly in the knowledge graph. It is thus as easy to use as a spreadsheet but produces FAIR data without any additional post-processing work. Herbie is configured using the W3C-standardized Shapes Constraint Language (SHACL) and furthermore builds on well-established frameworks and formats in the RDF ecosystem such as RDFS, OWL, and RO-Crate.
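The sketch below illustrates the configuration idea with a minimal SHACL shape built in Python: the shape declares which fields an experiment record must contain, from which a web form can be derived and submitted entries validated. Class and property names are placeholders, not Herbie's actual configuration.

```python
# Minimal SHACL shape illustrating the configuration idea: it declares a
# mandatory, typed field of an experiment record; class and property names
# are placeholders, not Herbie's actual configuration.
from rdflib import Graph, Namespace, Literal, BNode
from rdflib.namespace import RDF, XSD

SH = Namespace("http://www.w3.org/ns/shacl#")
EX = Namespace("https://example.org/lab#")
g = Graph()

shape, prop = EX.ExperimentShape, BNode()
g.add((shape, RDF.type, SH.NodeShape))
g.add((shape, SH.targetClass, EX.TensileTest))
g.add((shape, SH.property, prop))
g.add((prop, SH.path, EX.maxForce))
g.add((prop, SH.datatype, XSD.double))
g.add((prop, SH.minCount, Literal(1)))      # the derived form field becomes mandatory
g.add((prop, SH.name, Literal("Maximum force [N]")))

print(g.serialize(format="turtle"))
```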

We will showcase this approach through a typical …