NFDI4Earth Pilots

The development of NFDI4Earth is driven by researchers’ needs and requirements for research data management according to the FAIR principles. NFDI4Earth uses Earth System Science (ESS) pilots to engage the community. These pilots come from different domains of the ESS community, reflect the researchers’ needs, and are the community’s contribution to the agile development of NFDI4Earth.

Pilots typically run for one year; they are collected through an open call and reviewed in an open process. In the first round in 2020, 14 pilots were selected from 38 proposals; the first of them started in April 2022. Further calls for pilots will be launched in 2023 and 2024.
The next call is planned for spring 2023.

The submitted proposals are evaluated by NFDI4Earth co-applicants according to the following equally weighted criteria:

If you are interested in current or future pilot rounds, please contact the coordinator of pilot projects. For the contact persons of specific pilots, see the descriptions below.

Bathy4All: Workflows for Multibeam Processing and Visualization

Domain: Geophysics and Geodesy
Contact: Vikram Unnithan, Jacobs University Bremen
Duration: 01.08.22 - 31.03.23

Bathymetry data are used across a wide spectrum of ocean research, ranging from anthropogenic impacts, natural hazards, benthic habitats and ecosystems to maritime transport and security. Over the past decade, most German research vessels have been equipped with multibeam echosounders, which provide accurate seafloor topography and, in some cases, information on the water column.
The pilot will increase the accessibility and thus the reusability of bathymetry data. In a first step, a common processing algorithm will be integrated into open-source software to make the workflow that produced the provided data more transparent. Furthermore, the visualization and processing of water column data will be explored, which may provide insights into fish and phytoplankton. To enable easy access and reuse, the available bathymetry data will be combined in a Data Cube.

Keywords: Bathymetry, Marine Sciences, Accessibility, Reusability, DataCube

Data Cube Visualisation

Domain: Geography
Contact: Maximilian Söchting, Universität Leipzig
Duration: 01.04.22 - 31.03.23

Many subsystems of the Earth are constantly monitored in space and time by a large number of different data streams (e.g. gridded climate data or biophysical parameters of the land surface or of water bodies). As the spatial and temporal resolution of these data sets continuously rises with the development of improved sensors, global and local insights into them become more difficult to obtain. To facilitate research processes and make it easy to gain insights from large data cubes, we want to explore different approaches to interactively visualizing data cubes generated from socioeconomic and multivariate remote sensing data sources. We aim to extend our existing client-server software architecture for the interactive exploration and visualization of data cubes. New features to be developed as part of this pilot include 3D volume visualization and further features from the domain of visual analytics.

Keywords: Data Cubes, Visualisation, Earth Observation, Socioeconomic Data
Updates: The current version of the visualisation tool is available at (June 2022)

Developing Tools and FAIR Principles for the MetBase Database

Domain: Geochemistry, Mineralogy and Crystallography
Contact: Dominik Hezel, Goethe-Universität Frankfurt am Main
Duration: 01.04.22 - 31.03.23

The study of geo- and cosmochemical materials provides important insights into the formation and evolution of the terrestrial planets and the early solar system, as well as into the formation and evolution of the early Earth. MetBase is the largest cosmochemical database; it is currently hosted in Germany and is highly relevant for the community. However, its metadata and FAIR implementation remain rudimentary, and the database needs to be modernized for efficient use and new scientific approaches. To facilitate joint analysis and interoperability, the MetBase data will be harmonized and merged with the US-American AstroMaterials database. In a further step, the existing analysis interface will be enhanced by integrating new tools and enabling access to various geo- and cosmochemical databases, such as MetBase, GeoRoc and others.

Keywords: Cosmochemistry, Meteorites, Metadata, Interoperability, Reusability, Visualization, Analysis Platform

Enhancing Earth System Model Evaluation with Data Cube enabled Machine Learning

Domain: Atmospheric Science, Oceanography and Climate Research
Contact: Rémi Kazeroni, German Aerospace Center (DLR)
Duration: 01.04.22 - 31.03.23

The evaluation of Earth System Models (ESMs) against observations is crucial to improve models and ensure reliable climate projections. The ESM Evaluation Tool (ESMValTool) is a community-driven diagnostics and performance metrics tool for the routine evaluation of ESMs, supporting activities within the Coupled Model Intercomparison Project (CMIP) and at individual modelling centers. The aim of this pilot is to enhance the ESMValTool with Machine Learning (ML) techniques, which offer great potential to overcome some of the existing limitations in Earth System Science. To efficiently handle the large volume of input data required for ML, the ESMValTool will be adapted to interoperate with Data Cubes. Once this is achieved, an exemplary ML algorithm, the PCMCI causal discovery algorithm, will be integrated into the tool.

Keywords: Earth System Models, DataCube, Machine Learning, Interoperability, Model Data Integration

German Marine Seismic Data Access

Domain: Geophysics and Geodesy
Contact: Janine Berndt, GEOMAR Kiel
Duration: 01.04.22 - 31.03.23

Reflection seismic data are the most important source of information on the marine subsurface structure and thus facilitate research on submarine slope stability, megathrust faults, or the distribution and formation of natural resources in the subsurface. The major challenges in managing reflection seismic data are the large size of the datasets and the lack of standardization in processing and storage. The aim of the pilot is twofold: 1) to develop a systematic procedure for data acquisition with a unified metadata standard and extensively documented processing, verified on a test cruise, and 2) to develop a strategy to rescue and standardize legacy data that are at risk of being lost. Both aspects foster the reusability of reflection seismic data within the German research community.

Keywords: Seismics, Measurement Harmonization, Reusability, Interoperability, Metadata, Data Rescue, Legacy Data

GeoFRESH: Getting freshwater spatio-temporal data on track

Domain: Water Research
Contact: Sami Domisch, Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB)
Duration: 01.04.22 - 31.03.23

Freshwater bodies are highly connected with each other and with their terrestrial catchments. In light of climate and land-use change, as well as feedback mechanisms between components of the Earth system, the integration of Earth system data into freshwater research is long overdue to assess these interdependencies. However, freshwater-specific characteristics such as spatial connectivity and fragmentation, as well as legacy effects, require a specialized workflow. Within the pilot, a prototype of a new online platform called GeoFRESH will be developed that integrates, processes, manages and visualizes various standardized spatiotemporal freshwater-related Earth system data. The platform will be built around IGB’s GeoNode using R Shiny and will include a newly created global high-resolution hydrographic network dataset.

Keywords: Freshwater, Earth System, Interoperability, Visualization, Analysis Platform

Interoperability and Reusability of Geoscientific Lab Data

Domain: Geophysics and Geodesy
Contact: Sven Nordsiek, Leibniz-Institut für Angewandte Geophysik (LIAG) Hannover
Duration: 01.05.22 - 31.03.23

Geoscientific research has become a highly complex and interdisciplinary task that often produces huge amounts of diverse data, with associated data types and related documentation. Interoperability and reusability are often severely limited by the lack of standards, especially in the field of petrophysics. This pilot develops a data map, i.e., a structured overview of existing instruments, methods, parameters etc. and their interconnections. This semantic map aims to resolve intra- and interdisciplinary ambiguities. Initially, it will cover petrophysics and will then be expanded to other disciplines such as geochemistry, mineralogy, hydrology and many others. Based on the data map, a database model will be created, which in turn could be used to develop automated tools for generating metadata and incorporating new methods and parameters. Such a tool can be used as an upstream layer before data are published to ensure compliance with metadata standards.

Keywords: Petrophysics, Metadata, Interoperability, Reusability, Interdisciplinary, Lab, Measurements

Linking Environmental Data into European Scale Research Infrastructures

Domain: Ecology, Biogeochemistry
Contact: Jan Bumberger, Helmholtz Centre for Environmental Research (UFZ)
Duration: position to be filled (status as of 10 June 2022)

Long-term monitoring data are needed to analyze the effects of climate change on ecosystem processes and biodiversity. The eLTER initiative strives to harmonize measurements at its sites to overcome interoperability issues when combining observations for large-scale assessments. This pilot will set up a data node that standardizes observation data and connects them with the European Open Science Cloud (EOSC). It targets data from UFZ but is expected to be transferable to other German eLTER observation site operators. Furthermore, new methods for automated data curation and quality assessment will be explored. This pilot concerns the abiotic data gathered at the biogeochemical observation sites, whereas an analogous use case in NFDI4Biodiversity covers biotic observations.

Keywords: Sensor Data, Measurement Harmonization, Accessibility, Interoperability

NFDI for Seamless Earth System Model-Data Integration

Domain: Atmospheric Science, Oceanography and Climate Research
Contact: Naixin Fan, Technische Universität Dresden
Duration: 01.04.22 - 31.03.23

Global Earth observation (EO) data are invaluable for evaluating, parametrizing and enhancing Earth System Models (ESMs). However, the integration of EO data with ESMs currently lacks a formal infrastructure and instead involves different file formats and various software tools written in different programming languages. This pilot envisions seamless model-data integration for EO data that builds on existing tools and develops the missing connections. In a first step, the requirements for the framework and existing implementations will be assessed in exchange with fellow researchers. Based on these insights, a model-data interface toolbox and a calibration toolbox will be developed as prototypes. The pilot will focus on the land components of ESMs, the so-called dynamic global vegetation models; however, the framework is expected to be transferable to other models or model components.

Keywords: Earth System Models, Earth Observation, DataCube, Interoperability, Model Data Integration

OcMOD: Observations closer to Model Data

Domain: Atmospheric Science and Climate Research
Contact: Martin Schupfner, German Climate Computing Centre (DKRZ)
Duration: 01.04.22 - 30.09.23

The output of climate models is usually validated against reference data from observations. Model data and commonly used observational data must be obtained from different sources: model data stem from research institutions, observational data often from public authorities. Before further analysis, all data must be standardized and brought into the same format. The objective of this pilot is to bring observational data closer to the model output and to increase the number of users by making data from public authorities more easily accessible. As an example, the pilot will integrate the German Weather Service (DWD) reanalysis dataset COSMO-REA6 into the infrastructure of the Earth System Grid Federation (ESGF). The developed workflow will be documented and serve as a blueprint for the integration of further datasets from DWD or other public authorities.

Keywords: Climate Models, Earth System Models, Authority Data, Interoperability, Model Data Integration

PAMbase: A Repository of Soundscape Recordings to Study Earth’s Phonosphere

Domain: Landscape Ecology
Contact: Jan Engler, Technische Universität Dresden
Duration: 01.04.22 - 31.03.23

Acoustic environmental data are used to monitor biodiversity, glacier calving, or noise pollution in aquatic ecosystems. Developments in automated sound recording and analysis technologies provide unprecedented possibilities for studying acoustic environments. However, a central data hub to store and manage the growing amount of data is still missing. The pilot develops a prototype repository for passive acoustic monitoring data called PAMbase. Crucial aspects of the development are a unified standard for metadata and indexing as well as solutions for data storage. PAMbase will feature a user-friendly front end for uploading, searching and exploring sound files, potentially including citizen science data. Furthermore, tools for automated sound detection and signal classification will be explored.

Keywords: Ecology, Biodiversity, Accessibility, Repository, Analysis Platform

Reusability of Data with Complex Semantic Structure

Domain: Geology and Paleontology
Contact: Lukas Jonkers, MARUM Universität Bremen
Duration: 01.08.22 - 31.03.23

Data on the occurrence and abundance of fossils provide invaluable insights into past climate and biodiversity change. However, the lack of common taxonomic standards and associated vocabularies limits the reusability of fossil data and thus global assessments. Inconsistent and changing taxonomies are a common challenge in biodiversity research based on species occurrence data. The pilot aims to resolve these semantic barriers using planktonic foraminifera as an example. This includes a community-driven process to develop metadata standards and ontologies that can accommodate varying research needs as well as future changes in taxonomy. On the one hand, this approach can be used to create a submission pipeline for new data. On the other hand, it enables the reuse of legacy data by translating them to the newly developed standard.

Keywords: Interoperability, Ontology, Metadata, Taxonomy, Data Rescue, Legacy Data

Statistical Learning to assess factors underlying environmental changes

Domain: Atmospheric Science, Oceanography and Climate Research
Contact: Daniel Pabon, Max Planck Institute for Biogeochemistry Jena
Duration: 01.06.22 - 31.05.23

The Earth system is currently undergoing profound environmental changes, for example in the climate, in biogeochemical flows or in biodiversity. However, assessing the drivers underlying these changes is challenging, both technically and scientifically. This pilot will implement different methods for driver attribution in the Julia programming language, within a data cube environment based on the Zarr format. Gridded environmental observational data are increasingly stored in data cubes, which enable straightforward analysis of multiple variables across different dimensions. The technical implementation is accompanied by an example analysis that explores the impact of land cover change on different climate variables. Simulated data will be used in the implementation and testing phase; afterwards, the analysis will be performed on the global multivariate dataset of the Earth System Data Lab.

Keywords: DataCube, Driver Attribution, Deep Learning, Julia, Earth System

World Settlement Footprint (WSF)

Domain: Geography
Contact: Jan Karl Haug, German Aerospace Center (DLR)
Duration: 01.04.22 - 31.03.23

Urbanization is both a cause and a consequence of most environmental and societal changes on Earth. The German Aerospace Center (DLR) has developed a suite of global maps, the World Settlement Footprint (WSF), that capture the distribution and evolution of human settlements. The pilot makes the data and their future updates available to the research community via the EOC Geoservice. The WSF data suite will be equipped with STAC-compliant metadata to ensure easy use and interoperability, enabling, for example, integration into Data Cubes. Together with the Leibniz Institute of Ecological Urban and Regional Development (IÖR), the pilot will promote common standards for human settlement data to ease the joint analysis of WSF and IÖR-Monitor data.

Keywords: Earth Observation, Urban Data, Accessibility, Interoperability

