Poster

  • 18:15 – 19:00
Community members

LLM-Assisted Variable Annotation using the I-ADOPT Framework

In session Postersession No. 2, Sept. 3, 2025, 18:15 – 19:00

Rastegar, Arvin¹
  1. Karlsruhe Institute of Technology

Effective data stewardship in research depends on a consistent and FAIR (Findable, Accessible, Interoperable, Reusable) representation of scientific variables across diverse environmental disciplines. Within the Helmholtz Earth and Environment DataHub initiative, we are therefore developing an approach that uses Large Language Models (LLMs) to support data producers by automating the semantic annotation of research data. Our service employs the community-driven I-ADOPT framework, which decomposes a variable's natural-language description into its essential atomic parts, ensuring naming consistency and interoperability.
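To illustrate what such a decomposition might look like, the following minimal Python sketch models a variable with the core I-ADOPT components (property, object of interest, matrix, context objects, constraints) and validates a JSON annotation of the kind an LLM could be asked to return. The class names, field names, JSON layout, and the example variable are assumptions made for illustration; they do not describe the actual service or its output format.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Constraint:
    label: str       # e.g. "dissolved"
    constrains: str  # label of the component this constraint narrows down


@dataclass(frozen=True)
class Variable:
    label: str                     # the original natural-language description
    property_name: str             # what is measured, e.g. "concentration"
    object_of_interest: str        # the entity the property belongs to
    matrix: Optional[str] = None   # medium in which the object is embedded
    context_objects: Tuple[str, ...] = ()
    constraints: Tuple[Constraint, ...] = ()


# Assumption of this sketch: these three parts must always be present.
REQUIRED_KEYS = ("label", "property_name", "object_of_interest")


def parse_annotation(raw_json: str) -> Variable:
    """Turn a JSON annotation (hypothetical layout) into a Variable."""
    data = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError(f"annotation is missing required parts: {missing}")
    return Variable(
        label=data["label"],
        property_name=data["property_name"],
        object_of_interest=data["object_of_interest"],
        matrix=data.get("matrix"),
        context_objects=tuple(data.get("context_objects", [])),
        constraints=tuple(
            Constraint(c["label"], c["constrains"])
            for c in data.get("constraints", [])
        ),
    )


# Hypothetical annotation for an illustrative variable description.
EXAMPLE_RESPONSE = """
{
  "label": "concentration of dissolved nitrate in river water",
  "property_name": "concentration",
  "object_of_interest": "nitrate",
  "matrix": "river water",
  "constraints": [{"label": "dissolved", "constrains": "nitrate"}]
}
"""

variable = parse_annotation(EXAMPLE_RESPONSE)
print(variable.property_name, "|", variable.object_of_interest, "|", variable.matrix)
```

Rejecting annotations that lack a property or an object of interest is a design choice of this sketch: it lets incomplete model output be caught before it is stored alongside the data.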

In this poster, we present our approach to developing an LLM-based annotation service, highlighting key challenges and solutions as well as its integration into higher-level infrastructures of the Helmholtz DataHub and beyond. The proposed annotation framework streamlines the integration and harmonizes the description of environmental data across domains such as climate, biodiversity, and atmospheric sciences, aligning closely with the objectives of the German National Research Data Infrastructure (NFDI) and the European Open Science Cloud (EOSC).

This contribution showcases how advanced semantic annotation tools can support data stewardship in practical research contexts, enhancing reproducibility, interoperability, and collaboration within the scientific community.