Scope
e-Infrastructure for Earth Sciences
H. Igel, H.-P. Bunge, LMU Munich
Introduction – Rationale
Computational simulations will undoubtedly play an increasingly important role in hypothesis testing, data modelling, and the development of new theories in all major branches of the natural sciences. Europe (e.g., the EU-HPC initiative in FP7), and Germany in particular (e.g., the recently formed Gauss Centre for Supercomputing), are investing heavily in the development and maintenance of supercomputing infrastructure. In comparison, little funding is dedicated to scientific projects that exploit the available resources with specific applications, or to the development of an e-infrastructure that enables the wider community to make use of advanced simulation technology.
Computational scientists are predicting a new era of supercomputing: after the “microprocessor era” with ever-increasing clock rates, we are now entering an era of heterogeneous “multi-core” architectures that will affect hardware from large-scale shared-memory machines (SMP) down to mobile phones, PC processors and clusters. These rapid developments will have fundamental consequences for the development and use of simulation tools in all fields of science: (1) standard code implementations have to be parallel, and new (non-MPI) programming paradigms may have to be adopted as hardware develops; (2) portability may become increasingly difficult because of the heterogeneous architectures, and optimisation will be hardware-specific; and (3) multi-platform use and development require substantial interaction with professional software engineers and can no longer be carried out by scientists alone.
These developments require a paradigm shift in the approach to “cpu-rich” applications (simulation technology), particularly in the Earth Sciences: “software is infrastructure”. This implies that a qualitatively new level of IT support will be necessary for internationally competitive, leading-edge research.
What are the consequences for Earth Science research?
In the past few years numerous simulation techniques have been developed in Earth Science branches such as geodynamics, seismology, crustal deformation, geophysical fluid flow, geomaterial simulation, core dynamics, and others. These developments, often undertaken in doctoral or postdoctoral projects, have led to highly complex algorithms that are usually implemented and maintained on local (parallel) hardware without the support of professional software engineering. In most cases these “heroic codes” (IT jargon) are sufficient for internal use, but substantial effort, usually by young researchers, is required to adapt them to new hardware, operating systems and compilers and/or to maintain them over a longer time scale. With increasing algorithmic and hardware complexity this implies that (1) a substantial amount of time passes before researchers can do science with specific simulation techniques; (2) access to and use of advanced simulation tools requires expert knowledge in computational sciences, supercomputing technology, and parallel programming; and (3) a large part of the scientific community is excluded from access to and use of advanced simulation tools, with negative consequences for the scientific output of data modelling projects.
Steps towards establishing an e-infrastructure for Earth Sciences
To allow the Earth Science community, particularly its data-rich part, to make efficient use of available computational resources, the following steps are necessary:
- Identification of central, advanced applications/simulation tools in Earth Sciences with a sufficiently large user group
- Establishment of professional coding standards for development and maintenance (version control, automated testing, benchmarking)
- Porting specific applications to software development platforms with close interaction between software engineers and scientists
- Development of a library of (multi-platform) tested and benchmarked simulation software (“community software”) with standardized user interfaces for efficient use by scientists without specialist training in computational science
- Development/adaptation of (international) standards for Earth model and data exchange, definitions of meta-data for simulation results
- Establishment of links between simulation technology and observational infrastructure (e.g., data archives)
- Development of web interfaces to German (European) supercomputing hardware (e.g., GRIDs, D-GRID, DEISA) and “on-demand” computing for dedicated applications
- Provision of training programs, extensive documentation, and e-learning tools for the engineered simulation techniques in the library
- Development and provision of standardized co- and post-processing tools for simulation data (e.g., visualization, data contraction, data analysis)
Development of the proposed e-infrastructure would have a strong impact on the practice of research in many Earth Science branches: (1) it would improve the scientific output of data modelling studies and, through the rigorous development strategy, would be a big step towards the reproducibility and reliability of results; (2) the community making use of High-Performance Computing (HPC) infrastructure would grow and in turn further support and justify investments in future HPC developments; (3) the time needed to reach leading-edge research requiring simulations would be shortened, particularly for researchers at the beginning of their careers; and (4) it would improve the dissemination of German Earth Science simulation technology through open access to software libraries.
What are necessary elements for an e-infrastructure initiative?
- Network of Earth Science research groups, computational scientists developing HPC applications, and representatives of user communities
- Science steering committee selecting applications for the software library
- Pool of software engineers developing, adapting, optimising, implementing, and maintaining specific applications for wide use by Earth scientists
- Access to local, national and European HPC hardware (multi-core PCs, PC clusters, supercomputers)
- Bug tracking, newsgroups, and evaluation mechanisms for library contents
- Infrastructure for simulation data handling (storage, visualisation, analysis)
- Cooperation with international projects of similar scope and with data centres (e.g., CIG, GEON, DEGREE, DEISA, ORFEUS, IRIS, SCEC)
- Funding for training courses, workshops and meetings
- Computational resources for “on-demand” applications (e.g., GRIDs)