Using the cross-domain Data-Management Platform repository

A step-by-step overview of how to use the repository.

Beta: This tutorial is in a testing phase. Feedback is welcome.

Overview

Time estimation: 40 minutes

Version: main

Last update: 2025-12-10

Questions:

How to start looking for a Data-Management Solution?

What benefits are there to having a cross-domain repository?

How to extend a Data-Management solution using APIs and other platforms?

Objectives:

Understand the purpose and scope of the Data Management Platforms repository.

Learn how to navigate and interpret the listed platforms.

Contribute new or updated information to the repository.

Know its limits and what to do about them.

Table of Contents
In this tutorial we will deal with:

Scope

Prerequisites

Setting-up a Data Management Solution, a short overview

Goals and Purposes

Means

Constraints

Introduction to the Data Management Platform Repository

Hands-on: Using the Repository

Step 1: Access the Repository

Step 2: Understand the Main Goals

Step 3: Using the repository

Using the menus

Hands On: Find a commercial DM platform for a domain

Answer

A typical entry

Hands On: Find details about a specific entry

Answer

Using the search

Hands On: Find details about a specific entry

Answer

Extras

Step 3: Generalists vs Specialists

Hands On: Find one generalist platform and a specialist one

Answer

Hands On: How could you combine 2 platforms

Answer

Step 4: Considering Set-up Options

Step 5: Be Aware of What’s NOT Included

Step 6: Next Technical Steps

Step 7: Contribute to the Repository

Conclusion

Scope

This tutorial is not a comprehensive guide to Data Management. It is a “been-there-done-that” introduction, followed by a guide to using the DeKCD registry of FAIR Data Management platforms for the cloud. It focuses mostly on the technical side of Data Management.

Prerequisites

To follow this tutorial:

Knowledge of what a Data Management Platform is,
Knowledge of the FAIR principles.

To make good use of the repository:

Knowledge of Linux and Containers (Docker),
Knowledge of APIs.

Setting-up a Data Management Solution, a short overview

The repository is mostly a technical overview, so this section focuses on the technical aspects.
Several online resources help with writing a Data Management Plan, making data FAIR, and related aspects, such as the RDM Kit (for Life Sciences), the FAIR Cookbook, the Data Stewardship Wizard, and FAIR Wizard.

Goals and Purposes

The choice of a Data Management Solution is deeply dependent on the given goals and purposes. A solution for supporting a one-year work period could use a simple web application installed using Docker on a local server, while a solution meant for 10 years and several projects might benefit from being on the cloud, using several connected web applications.
Unfortunately, there is no silver bullet: the best solution depends on your specific conditions, such as whether you are part of a consortium that provides infrastructure, whether your institution offers solutions, and whether you can fund a technical person for the duration of the project.

Means

Some Data Management platforms are easy to set up, easy to update, and easy to use. Others are not, but might address your needs better. In that case, consider the main points of friction (what will cost the most in the long run) as well as the critical points (what cannot be accepted).

One critical point is being unable to apply updates. If your Data Management platform is online, it must be updated regularly for security reasons. This is less critical if the application is behind a VPN, but that situation can change and become a bigger problem later.

The points of friction are, from most to least important: ease of use, difficulty of updating, and difficulty of setting up. Ease of use comes first because a difficult-to-use application might simply be avoided, and users might fall back on their own solutions, making future data management really difficult. Still, you must be able to set up and update the platform, and a difficult setup might turn an emergency into a long downtime.

Similarly, an assembly of platforms can be desirable, for instance one application close to the lab communicating with a sharing platform. But the API communication then needs to be secure, robust, and well documented (including in the documentation of your own setup).

Versioning is always a good option for all your customisations: API scripts, templates, parameters… When a customisation cannot easily be versioned directly but is available as text, work on a versioned copy. This works particularly well with a test setup, which is also always good to have: apply work-in-progress changes only on the test instance, and apply them to production only once they are committed to the Version Control System.
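The rule “apply to production only once committed” can be enforced with a small check before promotion. The sketch below is a minimal example, assuming each customisation is a text file inside a Git working copy and that the production instance reads it from a directory you can copy into; the function names and paths are hypothetical and not part of any platform.

```python
import shutil
import subprocess
from pathlib import Path


def is_committed(repo: Path, relpath: str) -> bool:
    """Return True if relpath is tracked by Git and has no uncommitted changes."""
    # `git ls-files --error-unmatch` exits non-zero for files Git does not track.
    tracked = subprocess.run(
        ["git", "-C", str(repo), "ls-files", "--error-unmatch", "--", relpath],
        capture_output=True,
    ).returncode == 0
    if not tracked:
        return False
    # `git status --porcelain` prints nothing for a clean, committed file.
    status = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain", "--", relpath],
        capture_output=True, text=True, check=True,
    )
    return status.stdout.strip() == ""


def promote_to_production(repo: Path, relpath: str, prod_dir: Path) -> Path:
    """Copy a customisation tested on the test instance to production, if committed."""
    if not is_committed(repo, relpath):
        raise RuntimeError(f"{relpath} has uncommitted changes; commit it first")
    prod_dir.mkdir(parents=True, exist_ok=True)
    target = prod_dir / Path(relpath).name
    shutil.copy2(repo / relpath, target)
    return target
```

For example, `promote_to_production(Path("dmp-config"), "settings.yaml", Path("/srv/dmp/prod"))` (hypothetical paths) copies the file only when Git reports no pending change for it, so production always matches a committed version.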

Constraints

Constraints can be funding constraints or data usage constraints due to data privacy policies, but also licences restricting software or data to non-commercial use…

They can be seen as a chain: your Data Management platform needs to pass every link of the chain.

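Some constraints might be extremely costly, like adapting a GPL-licensed platform in a commercial environment: if you distribute the modified software, the licence requires you to share its source code, including your changes, under the same licence (the AGPL extends this obligation to software offered over a network), which could be a no-go for a private entity. Non-compliance with data privacy regulations could result in a fine, and a leak of personal data could be devastating.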

Constraints are strongly connected to your means. You need some leeway to manage them: the setup should not be so complex that the person(s) managing it can only focus on keeping it running technically.

Some decisions related to constraints must also be made before setting up the platform. For instance, if you work on patient-related data and need encrypted storage, you need to choose a platform supporting it.
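The chain view can be made explicit when comparing candidates: each constraint is a check that either passes or gives a reason for failing, and a platform is acceptable only if it passes every link. This is a minimal sketch; the `Platform` fields, the budget and the platform names are hypothetical placeholders for your own criteria.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Platform:
    """Hypothetical summary of a candidate platform (fields are illustrative)."""
    name: str
    encrypted_storage: bool
    commercial_use_allowed: bool
    yearly_cost_eur: float


# A constraint returns None when satisfied, or a short reason when it fails.
Constraint = Callable[[Platform], Optional[str]]


def within_budget(limit_eur: float) -> Constraint:
    """Build a funding constraint for a given yearly budget."""
    def check(p: Platform) -> Optional[str]:
        if p.yearly_cost_eur > limit_eur:
            return f"costs {p.yearly_cost_eur:g} EUR/year, budget is {limit_eur:g}"
        return None
    return check


def needs_encrypted_storage(p: Platform) -> Optional[str]:
    """Data privacy constraint, e.g. for patient-related data."""
    return None if p.encrypted_storage else "no encrypted storage"


def commercial_context(p: Platform) -> Optional[str]:
    """Licence constraint for use in a commercial environment."""
    return None if p.commercial_use_allowed else "licence forbids commercial use"


def check_chain(p: Platform, chain: List[Constraint]) -> List[str]:
    """Run every link of the chain; an empty result means the platform passes."""
    failures = []
    for constraint in chain:
        reason = constraint(p)
        if reason is not None:
            failures.append(reason)
    return failures


if __name__ == "__main__":
    chain = [within_budget(5000), needs_encrypted_storage, commercial_context]
    candidates = [
        Platform("platform-a", True, True, 1200),
        Platform("platform-b", False, False, 0),
    ]
    for platform in candidates:
        failures = check_chain(platform, chain)
        print(platform.name, "passes" if not failures else "fails: " + "; ".join(failures))
```

Collecting every failure, rather than stopping at the first one, shows at a glance how far a candidate is from being acceptable.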

Introduction to the Data Management Platform Repository

The Data Management Platforms (DMP) repository is a curated collection of major platforms used in research data management.
It emphasizes the FAIR principles (Findable, Accessible, Interoperable, Reusable) and evaluates how platforms integrate with cloud infrastructure.
This tutorial will guide you through using the repository, understanding its scope, and contributing to it.

Hands-on: Using the Repository

Step 1: Access the Repository

Open your browser and go to https://dekcd.github.io/FAIR-DMP-Registry/.
Spend some time exploring the different menus; we will cover them later.
In the top header, both the title and the Home link return to the main page, and About gives a short description of the registry.
The left column shows the different pages, while the right column shows the topics in the current page.

Step 2: Understand the Main Goals

The repository highlights platforms with:

Reusability across domains,
Interoperability,
Affinity to cloud usage/set-ups,
Good documentation.

Note: While most platforms align with FAIR principles, not all do completely.

Step 3: Using the repository

Using the menus

The menus are on the top, the left, and the right:

Top Menu: The Home page and the About page.
Left Menu: All pages of the repository.
Right Menu: The topics in the current page.

The top menu will typically be used once or twice, to check the About page.

Image showing the top menu

The left menu is used to select the category to look in:

Image showing the left menu

The right menu is used to go to a specific topic, most of the time a specific domain. Some topics have subtopics that become visible in the menu when you are in the parent topic.

Image showing the right menu

Hands On: Find a commercial DM platform for a domain

From the Home Page, navigate to the Commercials subtopic of Geomatics in the list of Major Data Management platforms.

Answer

First, click on the Major Data Management Platforms link in the left menu.

Then click on Geomatics in the right menu, and finally on Commercials in the sub-menu.

A typical entry

Not all entries are identical, as some entries include extra information. However, all entries should share a common base, and most entries stick to it:

- Link with the name and a quick description, generally pointing to the URL of the main web page or the documentation.
- Description and/or link to potential Docker image/Docker compose/Kubernetes manifest/…
- Link to API(s), quick clarification on how well documented it is.
- Interoperability NONE/LOW/MEDIUM/HIGH / No interoperability: explanation on why.
- Not/Partly/Mostly/Fully cross-domain: explanation on why.

Hands On: Find details about a specific entry

In the Major Data Management Platforms page, find the details about pyiron, which is in Materials Science. Check if it is cross-domain.

Answer

Click on Materials Science in the right menu, then, if needed, scroll down to find the details about pyiron, including whether it is cross-domain.

Using the search

The repository provides a search function, powered by the Quarto publishing system, which you access by clicking the magnifying glass icon at the top right of the top menu.

In the pop-up that appears, typing should automatically start a search.

Doing a search

Clicking a result leads to the paragraph containing it. You can then use the browser's “Find in Page” function to jump to the actual entry.

Finding the entry using the browser search

Depending on the size of the browser window, there may be two magnifying glass icons. Both have exactly the same function.
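Quarto websites usually publish their search index as a `search.json` file at the site root. Assuming the registry keeps that default (the exact URL below is an assumption, not documented by the registry), the same search can be run from a script, for instance to check whether a platform is already listed before contributing it.

```python
import json
import urllib.request
from typing import Dict, List

# Assumption: Quarto's default location for this site's search index.
INDEX_URL = "https://dekcd.github.io/FAIR-DMP-Registry/search.json"


def load_index(url: str = INDEX_URL) -> List[Dict]:
    """Download the search index (a JSON list of page sections)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def search(index: List[Dict], term: str) -> List[str]:
    """Return the links of index entries whose title, section or text contain term."""
    term = term.lower()
    hits = []
    for entry in index:
        haystack = " ".join(str(entry.get(key, "")) for key in ("title", "section", "text"))
        if term in haystack.lower():
            hits.append(entry.get("href", ""))
    return hits


if __name__ == "__main__":
    # Same query as in the hands-on below; prints relative links into the site.
    for href in search(load_index(), "CIV"):
        print(href)
```

Missing fields are read with `.get`, so the sketch keeps working if the index layout differs slightly from the one assumed here.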

Hands On: Find details about a specific entry

Using the search, look for CIViC.

Answer

Click on the magnifying glass.

Then type CIV in the search bar.

Finally click on the search result.

The result should roughly be in the middle of the page.

Extras

The repository also lists, in a simplified format:

the major ontologies and thesauri,
the major taxonomies and classifications,
some useful tools and platforms for Data Management: applications and tools that can be used alongside Data Management platforms, such as central identity providers that help set up Single Sign-On (SSO: logging in once for several applications), or used by Data Management platforms, such as an S3 implementation to store data in a cloud environment.

As with the Data Management platforms, these lists do not claim to be exhaustive, and all contributions are welcome.

Step 3: Generalists vs Specialists

One main consideration when choosing a Data Management platform is the choice between generalist platforms (i.e. not dedicated to a domain) and specialist platforms, dedicated and adapted to a domain. A specialist platform might still be used in another domain, possibly with some caveats. The last point of each entry explains how cross-domain the platform is.
While choosing a generalist platform might allow you to start working and storing data quickly, a specialised platform might offer some clear benefits: pre-existing metadata and/or ontologies/taxonomies, structured data storage that makes reuse and sharing easier, …

Generalist platforms: e.g. NextCloud (supporting tasks), iRODS or RUCIO (registering/storing data).
Specialist platforms: e.g. Electronic Laboratory Notebooks for domain-specific work.
Tip: Combining both may be necessary or useful. Interconnection often depends on available APIs and documentation.

Hands On: Find one generalist platform and a specialist one

Using the right menu, look for Generalists. Find a Generalist platform of your choice.

Then select a specific domain to find a Specialist platform.

Answer

Click on Generalists in the right menu.

This section lists the generalist Data Management platforms. The next section lists authority control platforms, such as ORCID, which are generally online services and also generalist. The last list covers platforms that are not strictly Data Management platforms but can be used as one, or used by another platform; at the time of writing, it contains only NextCloud.

NextCloud can be used as a flexible Data Management platform, with the limit that it is difficult to structure the stored data, or as a storage solution for another platform.

Hands On: How could you combine 2 platforms

Your institution is already using NextCloud as a generic distributed storage facility. Inside a shared folder, you store many large microscopy images. You would like to have an online tool to visualise and work on these images.

Answer

Omero is an online imaging platform for microscopy outputs, and can be connected to other platforms using APIs. A central login is also possible using a central identity service (but this information is not part of the current registry).

Using the NextCloud and Omero APIs, it is possible to bridge the two, for instance with a small script that fetches the images from NextCloud and pushes them to Omero. Omero also offers specific import scenarios.
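A minimal sketch of such a bridge follows, using only the Python standard library. It assumes NextCloud's WebDAV endpoint (`/remote.php/dav/files/<user>/<folder>/`), an app password for basic authentication, and an OMERO CLI session already opened with `omero login`. The host, user, folder and function names are hypothetical, and for large collections Omero's own import scenarios are likely the better route.

```python
import shutil
import subprocess
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from base64 import b64encode
from pathlib import Path
from typing import Dict, List, Optional, Tuple

DAV = "{DAV:}"  # XML namespace used in WebDAV responses


def webdav_url(host: str, user: str, folder: str) -> str:
    """NextCloud WebDAV URL of a folder in a user's files."""
    path = urllib.parse.quote(f"/remote.php/dav/files/{user}/{folder.strip('/')}/")
    return f"https://{host}{path}"


def open_url(url: str, user: str, password: str, method: str = "GET",
             headers: Optional[Dict[str, str]] = None):
    """Open a URL with HTTP basic authentication (e.g. a NextCloud app password)."""
    token = b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method=method, headers=headers or {})
    req.add_header("Authorization", f"Basic {token}")
    return urllib.request.urlopen(req, timeout=120)


def image_hrefs(propfind_xml: bytes,
                extensions: Tuple[str, ...] = (".tif", ".tiff")) -> List[str]:
    """Extract the hrefs of image files from a PROPFIND (Depth: 1) response."""
    root = ET.fromstring(propfind_xml)
    hrefs = []
    for response in root.iter(f"{DAV}response"):
        href = response.findtext(f"{DAV}href", default="")
        if urllib.parse.unquote(href).lower().endswith(extensions):
            hrefs.append(href)
    return hrefs


def copy_folder_to_omero(host: str, user: str, password: str,
                         folder: str, workdir: Path) -> None:
    """Download the images of a NextCloud folder, then import each one into OMERO."""
    with open_url(webdav_url(host, user, folder), user, password,
                  method="PROPFIND", headers={"Depth": "1"}) as listing:
        hrefs = image_hrefs(listing.read())
    workdir.mkdir(parents=True, exist_ok=True)
    for href in hrefs:
        local = workdir / Path(urllib.parse.unquote(href)).name
        with open_url(f"https://{host}{href}", user, password) as remote, \
                open(local, "wb") as out:
            shutil.copyfileobj(remote, out)
        # Assumes an OMERO CLI session was opened beforehand with `omero login`.
        subprocess.run(["omero", "import", str(local)], check=True)
```

A call could look like `copy_folder_to_omero("cloud.example.org", "alice", "app-password", "Microscopy", Path("/tmp/omero-import"))`. Downloading to a local working directory keeps the two APIs decoupled; the price is temporary disk space for the images.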

Step 4: Considering Set-up Options

Entries do not explain in detail how to set up a platform, but each one states whether a containerized setup exists and whether a Kubernetes integration is available.

Containerized platforms are generally easier to set up and should be easier to maintain and update. But it is important to check thoroughly how good this support is: whether the image is updated regularly, and whether an update process exists.
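Checking how recently an image was updated can be scripted. The sketch below assumes the image is hosted on Docker Hub and that its public repository endpoint (`https://hub.docker.com/v2/repositories/<namespace>/<name>/`) returns a `last_updated` field; verify this against Docker Hub's API documentation, and treat the 180-day threshold as an arbitrary example.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional


def image_last_updated(namespace: str, name: str) -> str:
    """Fetch the `last_updated` timestamp of a Docker Hub repository (assumed endpoint)."""
    url = f"https://hub.docker.com/v2/repositories/{namespace}/{name}/"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["last_updated"]


def parse_timestamp(value: str) -> datetime:
    """Parse e.g. '2024-05-01T12:34:56.123456Z' as UTC, ignoring fractional seconds."""
    main = value.split(".")[0].rstrip("Z")
    return datetime.strptime(main, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def days_since(value: str, now: Optional[datetime] = None) -> int:
    """Number of whole days between the timestamp and now."""
    now = now or datetime.now(timezone.utc)
    return (now - parse_timestamp(value)).days


def looks_maintained(last_updated: str, max_age_days: int = 180,
                     now: Optional[datetime] = None) -> bool:
    """Rough heuristic: was the image updated within the last max_age_days?"""
    return days_since(last_updated, now) <= max_age_days


if __name__ == "__main__":
    # Official images live under the "library" namespace on Docker Hub.
    stamp = image_last_updated("library", "postgres")
    print(stamp, "recently updated" if looks_maintained(stamp) else "check further")
```

A recent update date is only a hint: it does not replace reading the project's release notes and documented update process.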

An integration with Kubernetes might allow easier continuous operation, as Kubernetes takes care of the lifecycle of the platform. But the setup will generally be more complex as well.

In both cases, it is important to know where the data will be stored. Containers are started from images and can be replaced at any time, so the data must be stored outside them (most probably in volumes, in both cases). The data should be stored securely and backed up regularly.

Some platforms come with a Docker Compose file or a Helm chart, in which case the storage might be set up as part of the configuration, but generally not the backup.
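Since the backup is usually left to you, here is a minimal sketch of a timestamped backup with a simple retention policy. It assumes the volume is a bind-mounted host directory (the paths in the usage lines are hypothetical) and that the platform is stopped or idle while archiving: copying the files of a running database can produce an inconsistent backup, so databases are better dumped with their own tools.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def backup_volume(data_dir: Path, backup_dir: Path, keep: int = 7) -> Path:
    """Archive data_dir as a timestamped .tar.gz, keeping only the newest `keep` archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Fixed-width UTC timestamp, so file names sort chronologically.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    archive = backup_dir / f"{data_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname=data_dir.name)
    # Delete the oldest archives beyond the retention limit.
    archives = sorted(backup_dir.glob(f"{data_dir.name}-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive


if __name__ == "__main__":
    # Hypothetical paths: a bind-mounted volume and a separate backup disk.
    print(backup_volume(Path("/srv/dmp/data"), Path("/mnt/backup/dmp"), keep=14))
```

Run from a scheduler such as cron, this gives regular backups; storing them on a different machine or disk than the volume itself is what makes them useful after a hardware failure.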

A lack of containerized setup does not mean that the platform will be difficult to set up. But with a standard setup, it will always be necessary to consult the installation documentation and follow the given procedure.

Finally, each entry tries to state whether API access exists, enabling interoperability and data extraction.

Adding API access to a platform is not a simple task, so the need for one should be clarified before choosing a platform. Access can be via an HTTP REST API, which is generally easy to use; a CLI, which will generally be non-standard; or a language API (Python, Java, …). Interfacing from the same language will be easy, probably easier than through a REST API (or very similar in the case of Python), but interfacing from a different language might be difficult.
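To compare the access modes in practice, here is a minimal sketch of a REST call and a CLI call from Python, using only the standard library. The base URL, endpoint path, bearer-token scheme and the assumption that the CLI can print JSON are all hypothetical; each platform documents its own conventions.

```python
import json
import subprocess
import urllib.request
from typing import Any, List, Optional


def get_json(base_url: str, path: str, token: Optional[str] = None,
             timeout: float = 30) -> Any:
    """GET a JSON resource from a REST API, optionally with a bearer token."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def run_cli_json(command: List[str]) -> Any:
    """Run a CLI that prints JSON on stdout and parse its output.

    Many CLIs print plain text instead, in which case parsing is tool-specific.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hypothetical endpoint and token.
    print(get_json("https://dmp.example.org/api/v1", "projects", token="YOUR-TOKEN"))
```

The REST helper works the same way for any platform that returns JSON over HTTP, whereas the CLI wrapper depends on each tool's output format, which is why a CLI is generally the least standard option.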

These technical aspects can create security risks: a platform that is hard to update, an API that is too open, or a setup so complex that it might leave openings. In some cases simpler is better, as it makes it possible to set up the platform with a reasonable level of security. If IT support is available, it is also advisable to involve them early.

Step 5: Be Aware of What’s NOT Included

The repository does not provide:

A full Data Management Plan (e.g. retention policy, access rights).
Guarantees on platform lifetime or funding.
Alignment between scientists’ low-friction tools and the FAIR requirements of institutes and funders.

Pitfall: Modifying open-source software may block future updates. Prefer configuring over modifying.

Step 6: Next Technical Steps

A Quickstart guide for setting up a Data Management platform is planned:

From bare-metal to full Kubernetes deployments,
With details on costs, security, and support needs,
With vocabulary clarification for better understanding of existing documentation.

Step 7: Contribute to the Repository

You can contribute via GitHub (the link is also provided in the top header, with the GitHub logo):

Create an issue to suggest a change.
Fork the repository and submit a pull request to add or modify entries.

A typical entry should include:

Platform name & quick description (with link),
Links to Docker/Kubernetes manifests,
Links to APIs & documentation,
Interoperability level (None/Low/Medium/High),
Cross-domain applicability (Not/Partly/Mostly/Fully).

* [Name, quick description](https:URL of main web page)
	+ Description and/or link to potential Docker image/Docker compose/Kubernetes manifest/…
	+ Link to API(s), quick clarification on how well documented it is.
	+ Interoperability **NONE/LOW/MEDIUM/HIGH** / No interoperability: explanation on why.
	+ **Not/Partly/Mostly/Fully** cross-domain: explanation on why.
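Before opening a pull request, you might check that a new entry follows this template. The sketch below is a hypothetical helper, not part of the repository's tooling; its regular expressions encode one reading of the template above and may need adjusting to the exact style used in the registry.

```python
import re
from typing import List

# First line: "* [Name, quick description](https://...)"
LINK_RE = re.compile(r"^\* \[[^\]]+\]\(https?://[^)\s]+\)")
# Interoperability level in bold, or an explicit "No interoperability".
INTEROP_RE = re.compile(
    r"Interoperability \*\*(NONE|LOW|MEDIUM|HIGH)\*\*|No interoperability", re.IGNORECASE)
# Cross-domain level in bold, followed by "cross-domain".
CROSS_RE = re.compile(r"\*\*(Not|Partly|Mostly|Fully)\*\* cross-domain", re.IGNORECASE)


def check_entry(entry: str) -> List[str]:
    """Return a list of problems found in a Markdown entry (empty if it looks fine)."""
    lines = [line for line in entry.strip().splitlines() if line.strip()]
    problems = []
    if not lines or not LINK_RE.match(lines[0]):
        problems.append("first line should be '* [Name, quick description](URL)'")
    subitems = [line.strip() for line in lines[1:] if line.strip().startswith("+ ")]
    if len(subitems) < 4:
        problems.append(f"expected 4 sub-items, found {len(subitems)}")
    if not any(INTEROP_RE.search(item) for item in subitems):
        problems.append("missing interoperability level (NONE/LOW/MEDIUM/HIGH)")
    if not any(CROSS_RE.search(item) for item in subitems):
        problems.append("missing cross-domain level (Not/Partly/Mostly/Fully)")
    return problems


if __name__ == "__main__":
    entry = """* [ExamplePlatform, a hypothetical platform](https://example.org)
\t+ Docker image available.
\t+ REST API, well documented.
\t+ Interoperability **HIGH**: standard formats.
\t+ **Mostly** cross-domain: generic metadata."""
    print(check_entry(entry) or "entry follows the template")
```

Returning every problem at once lets you fix an entry in a single pass before submitting it.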

Conclusion

You now know how to use and navigate the Data Management Platforms repository.
It is a living resource focused on FAIR principles and cloud compatibility, and it depends on community contributions to grow and remain current.

For questions or contributions, contact:
Alain Becam – Alain.Becam@bioquant.uni-heidelberg.de
Project site: https://datenkompetenz.cloud/

Key Points

The repository lists cross-domain Data Management platforms with a FAIR and cloud focus.

It does not replace a full Data Management Plan.

Platforms can be generalist or specialist, depending on needs.

Contributions are welcome via GitHub issues and pull requests.

💬 Feedback: Found something unclear or want to suggest an improvement? Open a feedback issue.

👥 Contribution: We also welcome contributions when you spot an opportunity to improve the training materials. Please review the contribution page first. Then, edit this material on GitHub to suggest your improvements.

Contributions

Author(s): Alain Becam

Editor(s): AB
