Adrien Devresse1, Omar Awile1, Jorge Blanco1, Tristan Carel1, Nicolas Cornu1, Tom de Geus2, Luc Grosheintz-Laval1, Pramod Kumbhar1, Fernando Pereira1, Sergio Rivas Gomez1, Matthias Wolf1, James King1
1 Blue Brain Project, École Polytechnique Fédérale de Lausanne, Switzerland
2 Physics of Complex Systems Laboratory, École Polytechnique Fédérale de Lausanne, Switzerland
The use of portable scientific data formats is vital for managing complex workflows, reliable data storage, knowledge transfer, and long-term maintainability and reproducibility. Hierarchical Data Format 5 (HDF5) is considered the de facto industry standard for this purpose. While the official HDF5 library is versatile and well supported, it only provides a low-level C/C++ interface. The lack of proper high-level C++ abstractions discourages the use of HDF5 in scientific applications. A number of C++ wrapper libraries are available. Many, however, are domain-specific, incomplete, or not actively maintained.
HighFive is an attempt to address these challenges.
It is an easy-to-use, modern, header-only C++ library that removes most of the bookkeeping overhead required by HDF5. HighFive uses RAII to manage object lifetimes and automatically handles reference counting on HDF5 objects. The library uses C++ templates for automatic type mapping. These features significantly increase programmer productivity and reduce coding bugs.
[Side-by-side code comparison: HighFive | HDF5]
This simplified data management does not come at the cost of HDF5's flexibility: advanced features and tunable parameters are exposed through a simple interface. File version bounds can be read and written to define object compatibility, and the metadata block size can be set. Group and dataset properties, such as compression, chunking, and link info estimates, can be set and read.
HighFive is built with scientific applications in mind. Besides scalars and simple STL
vectors, it is possible to map C++ structs to HDF5 compound types and to read and write
Boost, Boost uBLAS, Eigen, and xtensor array types. The library is also able to handle
combinations of array types (e.g. std::vector<Eigen::Matrix>). This is achieved through
various templated converters. Additionally, HighFive supports enums and various string
types.
With the aim of supporting large-scale scientific applications, we have made an effort to
also natively support the HDF5 MPI backend in HighFive. A special MPIOFileDriver is used in
the application code to ensure that HDF5 is correctly initialized. No other special API
calls are required, since all necessary provisions are handled transparently.
HighFive also offers the H5Easy API, in its own namespace. It allows common operations to be written as one-liners, with a syntax comparable to, for example, h5py for Python. It offers overloads for STL containers, Boost, Eigen, xtensor, and OpenCV.
[Side-by-side code comparison: H5Easy | h5py]
HighFive is developed as open source and can be cloned or forked from GitHub.
It can also be installed from the clone, via Spack, or via conda.
It can then be used, for example, with find_package(HighFive) in CMake.
Being a header-only library, HighFive can also be used directly as a subfolder in a C++
project, for example by adding it as a Git submodule.
More details can be found in the README.md file.