We fund the Earth system science community to create educational content through annual calls for educational pilots. The outcomes are made publicly available through the NFDI4Earth educational portal. The submitted proposals are evaluated by NFDI4Earth co-applicants according to the following criteria (read the full guideline here):
a. Relevance to NFDI4Earth
b. State-of-the-art content
c. Novelty (addressing the gaps in existing OERs in ESS)
d. Use of active teaching methods
e. Relevance to RDM in ESS
f. Potential for integration into NFDI4Earth curricula
Here's a peek at the educational pilots that have made the cut year after year, each adding something special to our understanding of Earth system sciences:
The increasing availability of satellite data in recent years opens up new applications in many areas of environmental science. The processing of large amounts of data, especially satellite data, is one of the most important pillars of environmental monitoring. However, these workflows also require extensive knowledge and appropriately trained personnel. Educational institutions such as universities have the task of adapting to these requirements. This adaptation must cover all steps of the complex process chain for processing satellite data. In this context, it is important not only to train technical skills but also to build the methodological competencies that enable students to critically evaluate their own work steps. To reduce complexity, a modular structure is used, which also makes it possible to take existing skills in the data processing chain into account.
Digital teaching and learning opportunities experienced an enormous boost, at the latest with the restrictions of the COVID-19 pandemic. Experience has shown that flexible content can be an essential element in motivating learning. The growing importance of MOOCs impressively underlines this development. These developments and effects represent an opportunity to transform the necessary content of satellite data processing into teaching and self-learning materials.
The objective is to develop Jupyter notebooks as self-learning material that provide a complete processing chain for a common classification task with remote sensing data.
The project comprises three main work stages, with the technical implementation of the modules as the central element: methodological development of learning modules on the process flow of satellite data processing, technical implementation of the modules in Jupyter notebooks with example datasets, and testing and assessment of the modules with MSc students in Environmental Sciences at TU Dresden.
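Purely as an illustration of the kind of processing chain meant here, a minimal sketch in Python with scikit-learn could look as follows; the synthetic data, feature dimensions, and classifier choice are assumptions for this sketch and not the pilot's actual material:

```python
# Minimal sketch of a supervised classification chain for multispectral pixels.
# Synthetic data stands in for satellite bands and reference labels; in the
# actual notebooks these would come from real satellite scenes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)

# 1. "Preprocessing": a feature matrix of pixels x spectral bands (here random).
n_pixels, n_bands = 5000, 6
X = rng.normal(size=(n_pixels, n_bands))
y = (X[:, 3] - X[:, 2] > 0).astype(int)  # toy "vegetation vs. other" labels

# 2. Split the reference data into training and validation sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 3. Train a classifier and 4. assess it on held-out reference pixels.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```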
Simulation and modelling of environmental processes are generally carried out at the grid level, where the study region is discretized into numerous grid points in the three dimensions of space plus time. Consequently, these simulations produce enormous data sets, and processing them exceeds the capacity of the average personal computer. At the same time, only some researchers have access to high-performance computing centers. However, calculations and modelling can be sped up on any PC by using compiled programming languages such as Fortran. This speeds up computations and can drastically reduce the CO2 footprint.
R is one of the languages widely used for data analysis, visualization, and presentation, and it has a large supporting community and thousands of packages. Fortran, on the other hand, is one of the fastest-performing languages (if not the fastest for number crunching) and one of the oldest; partly because of its age, interest in Fortran remains consistently low. Considering all of the above, there is a clear need for educational material that links R and Fortran.
This project aims to provide a single OER platform that serves as a one-stop resource for all R users looking for speed in general, and for users from the Environmental Science disciplines in particular.
Many developers have made efforts to speed up R using C++; however, to the best of the authors' knowledge, no comparable package integrates Fortran and R. Filling this gap is important because Fortran is well suited for numerical and scientific computations due to its array processing capabilities, performance, and efficiency. Computationally demanding models are commonly written in Fortran; thus, integrating Fortran and R will allow environmental modelers and researchers to minimize switching between different programming languages.
Lead isotopes are a well-known geochronological tool. However, lead isotope signatures can also be used to link non-ferrous metal objects to ore deposits because they do not fractionate in metallurgical processes. Based on this link, lead isotopes are a powerful tool to reconstruct past economic networks. When combined with other methods, they also help to decipher past interactions between humankind and the environment, especially the impact of mining activities. For these reasons, lead isotopes are a particularly well-suited example for an interdisciplinary approach that combines Earth System Sciences, Humanities, and Data Sciences.
The Educational Pilot “Teaching lead isotope geochemistry and application in archaeometry (LIGA-A)” will create a collection of educational materials that highlights this interlinkage and the importance of modern data-scientific approaches to the topic. The educational materials will stand on their own but follow the path of lead isotope signatures from their generation in ore deposits, through the metallurgical process and their measurement in the lab, to the proper handling of such data, their visualization and interpretation, and finally their application in concert with data from, e.g., archaeological excavations, textual sources, and sediment cores.
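As an illustration of the data handling and visualization steps mentioned above, a minimal sketch of a common lead isotope biplot (207Pb/206Pb versus 208Pb/206Pb) in Python is given below; all sample values are invented for this sketch and do not represent real ore fields or artefacts:

```python
# Minimal sketch: comparing artefact lead isotope ratios with ore field data
# in a 207Pb/206Pb vs. 208Pb/206Pb biplot. All values are invented.
import matplotlib.pyplot as plt

ores = {  # hypothetical ore field signatures
    "Ore field A": ([0.845, 0.848, 0.846], [2.080, 2.090, 2.085]),
    "Ore field B": ([0.860, 0.862, 0.861], [2.100, 2.110, 2.105]),
}
artefacts = ([0.847, 0.861], [2.086, 2.104])  # hypothetical artefact measurements

fig, ax = plt.subplots()
for name, (pb76, pb86) in ores.items():
    ax.scatter(pb76, pb86, label=name)
ax.scatter(*artefacts, marker="x", color="black", label="Artefacts")
ax.set_xlabel("207Pb/206Pb")
ax.set_ylabel("208Pb/206Pb")
ax.legend()
plt.show()
```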
To reach this aim, the educational resources will utilize a wide range of formats such as presentations, quizzes, animations, interactive visualizations, and coding exercises. At the same time, the Educational Pilot will focus on creating materials that are as inclusive as possible, both from a technical point of view and with regard to learners with different impairments.
For the efficient handling of large gridded datasets, the concept of a datacube has received much attention in recent years. A datacube stores datasets with common axes (such as latitude, longitude, and time) in a neatly organized and easily accessible format that, for example, allows fast data subsetting. Part of the convenience of a datacube comes from the data being stored in so-called chunks: standardized subsets of the data that fit into memory and allow efficient data access and parallel processing. However, accessing data on disk also adds an overhead in computation time from input/output operations. Access to the datacube is therefore only fast when the data is chunked in a way that is aligned with the analysis in question: if data is chunked for time-series access, it is inefficient to read a map (one time point has to be fetched from every chunk); conversely, if data is chunked for spatial processing, it is inefficient to read a time series that is spread across many chunks.
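This effect can be sketched with xarray and dask; the array size and chunk shapes below are illustrative assumptions, not the course's actual material:

```python
# Minimal sketch: the same array chunked for map access vs. time-series access.
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.random.rand(365, 180, 360),
    dims=("time", "lat", "lon"),
    name="tas",
)

# Chunked for maps: each chunk holds one time step over the full spatial grid.
maps = data.chunk({"time": 1, "lat": 180, "lon": 360})

# Chunked for time series: each chunk holds the full time axis for a small tile.
series = data.chunk({"time": 365, "lat": 18, "lon": 36})

# Reading one map touches 1 chunk of `maps` but all 100 chunks of `series`;
# reading one time series touches 1 chunk of `series` but all 365 chunks of `maps`.
one_map = maps.isel(time=0).compute()
one_series = series.isel(lat=90, lon=180).compute()
```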
Proper chunking for efficient data reading and writing is especially important for the following reasons. The datasets that have to be handled in the Earth system sciences are becoming so large that they can no longer be loaded into working memory in full. When data has to be accessed on disk, the number of input/output operations should be minimized so that they do not limit computation speed. In addition, more and more data is available in the cloud and needs to be made cloud-compatible. Since data latency becomes even more important in the cloud, the data is compressed, and it is then essential to decompress only the data that is needed for the given analysis in order to optimize resources and computational speed. Both can be achieved by optimal chunking.
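For the cloud and compression aspects, chunked datacubes are often persisted as Zarr stores, where each chunk is written as a separate compressed object. A minimal sketch, assuming xarray with dask and zarr and using an illustrative chunk shape and file name, could look like this:

```python
# Minimal sketch: persisting a chunked datacube as a compressed Zarr store,
# so that later analyses only read and decompress the chunks they need.
# Array size, chunk shape, and file name are illustrative assumptions.
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.random.rand(365, 180, 360))}
).chunk({"time": 30, "lat": 90, "lon": 90})

# Each chunk becomes a separate compressed object, which suits cloud object storage.
ds.to_zarr("tas_cube.zarr", mode="w")

# Re-opening is lazy; only the chunks touched by the analysis are decompressed.
cube = xr.open_zarr("tas_cube.zarr")
monthly_mean = cube["tas"].isel(time=slice(0, 30)).mean().compute()
```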
This course provides interactive notebooks and explorable explanations to give students an intuition for different chunking strategies and their influence on the performance of computations. The material will be provided as interactive Jupyter notebooks so that learners can follow along, experiment, and modify the code at their own pace. The notebooks will be made available on Binder, allowing interactive online code execution and lowering the entry barrier. The material will be provided in English. The target group is expected to have some programming experience and some experience working with gridded data.
Coding exercises are an important component of teaching data analysis in ESS today. Manually correcting assignments is often a heavy workload for exercise instructors, and students often neither submit on time nor receive timely feedback. Automated code checking systems are therefore promising for a wide range of teaching activities in ESS education. Several universities offer such a service, based on different software architectures and infrastructures, but most of them are restricted to their own students. In addition, the same basic content is often developed repeatedly at different universities, or even in different departments of the same university.
Nbgrader is an existing tool that supports creating and grading assignments in Jupyter Notebooks. It can easily be deployed on a conventional server, where students can write Python code online in a Jupyter Notebook interface and exercise instructors can automatically grade their submissions. The Institute of Cartography and Geoinformatics at the University of Hannover has implemented such a system and, since 2021, has successfully deployed it for teaching activities that use Python as the programming language, for courses such as GIS I - modeling and data structure, laser scanning data processing, and SLAM.
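As an illustration of how such assignments are typically structured, a minimal sketch of an nbgrader-style solution cell and its autograder test cell is shown below; the exercise and function name are hypothetical and not taken from the Hannover courses:

```python
# --- Solution cell: the instructor's reference solution sits between the
# nbgrader markers and is stripped from the student version of the notebook.
def ndvi(nir, red):
    """Compute the normalized difference vegetation index."""
    ### BEGIN SOLUTION
    return (nir - red) / (nir + red)
    ### END SOLUTION

# --- Autograded test cell: executed by nbgrader to assign points automatically.
assert abs(ndvi(0.8, 0.2) - 0.6) < 1e-9
assert ndvi(0.5, 0.5) == 0.0
```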
The reuse of existing teaching materials is also of great importance. Within the education-oriented project ICAML - Interdisciplinary Center for Applied Machine Learning (coordinated by co-applicant Martin Werner, BMBF-funded 2018-2020), numerous Jupyter Notebook tutorials on machine learning topics in geospatial data analysis were developed and introduced to the community. An interactive code checking process is important to further develop these tutorials and to make their contents interactive and effortless to include in future e-teaching activities related to geospatial data analysis.
Changes in land use/cover are taking place worldwide at a variety of spatiotemporal scales and intensities. In this context, urbanization is a process that affects more and more areas of society and nature. Today, more than half of the world's population already lives in cities; in some European countries, the figure is up to 80%. Even though built-up areas account for only 2-3% of the land surface worldwide, their “ecological footprint” is enormous. Agricultural land, in particular, is being taken up for the expansion of settlement and transport areas. The analysis of such changes based on heterogeneous geospatial data sources is an important work step for estimating the future evolution of socio-ecological parameters such as migration, erosion, runoff patterns, and biodiversity.
Regional case studies from “hot spots” of urbanization will be used to carry out the work steps needed to capture and quantify urbanization in the context of sustainable development (Sustainable Development Goal 11). Modern methods for accessing open geodata will be presented, and the extraction of thematic information from volunteered geographic information (VGI), social media geographic information (SMGI), and earth observation (EO) data with Python will be taught. Learners can reproduce all work steps independently on their own computers. Basic knowledge of digital image processing and Geographic Information Systems is required.
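As an illustration of the kind of open-geodata access step meant here, a minimal sketch using the osmnx package to retrieve OpenStreetMap building footprints could look as follows; the package choice and the place name are assumptions for this sketch and not necessarily those used in the course:

```python
# Minimal sketch: quantify built-up area from OpenStreetMap building footprints
# (illustrative only; the course's case studies and tooling may differ).
import osmnx as ox

# Hypothetical study area; any place name understood by OSM's geocoder works.
place = "Leipzig, Germany"

# Download building footprints as a GeoDataFrame and keep polygon geometries.
buildings = ox.features_from_place(place, tags={"building": True})
buildings = buildings[buildings.geometry.geom_type.isin(["Polygon", "MultiPolygon"])]

# Project to a metric CRS so that areas are in square metres.
buildings = buildings.to_crs(buildings.estimate_utm_crs())

built_up_km2 = buildings.geometry.area.sum() / 1e6
print(f"Building footprint area in {place}: {built_up_km2:.1f} km²")
```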