Roadmap for next NLnet funding
I’m pleased to announce that the NLnet Foundation has approved follow-up funding for Ontogen from the NGI0 Commons Fund. I’m grateful for NLnet’s trust in this project and their general commitment to advancing open source technologies.
Unfortunately, the future of such funding through the EU is uncertain. There are plans to redirect these funds in questionable directions, potentially affecting this and many other open source projects. For more information on this concerning development, see this article by the Free Software Foundation.
Fortunately, the funding for Ontogen’s development is secured for the coming year. With this support, the plan is to overcome the current limitations mentioned in the introductory articles (and now also clearly stated in the project READMEs) and enhance Ontogen’s functionality. Here’s an overview of the planned work.
Roadmap
Mud
Bog, Ontogen’s configuration language and corresponding precompiler for automatic ID management in RDF, will be extracted from Ontogen and realized as its own project, as its functions can be usefully applied outside the versioning context. In the process, it will be renamed to Mud.
The problem of cryptic graph names will be solved by extending the URI generation mechanism so that generated URIs can later be renamed to custom URIs while the past URIs remain tracked. URI generation, previously limited to the resources of an Ontogen repository, will also be extended to the resources of the version-controlled dataset.
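To make the rename-and-track idea more concrete, here is a minimal sketch in Python. It is not Mud's actual design or API: the `UriRegistry` class, its method names, and the example URIs are all hypothetical and only illustrate how a generated URI could be renamed while old URIs stay resolvable.

```python
class UriRegistry:
    """Hypothetical registry; not part of Mud or Ontogen."""

    def __init__(self):
        self._current = {}  # every known URI (past or present) -> current URI
        self._past = {}     # current URI -> list of former URIs, oldest first

    def register(self, uri):
        """Record a freshly generated URI."""
        self._current[uri] = uri
        self._past[uri] = []

    def resolve(self, uri):
        """Return the current URI for any past or present URI."""
        return self._current[uri]  # raises KeyError for unknown URIs

    def rename(self, old, new):
        """Rename a resource and keep its former URIs resolvable."""
        current = self.resolve(old)
        if new in self._current:
            raise ValueError(f"{new} is already in use")
        history = self._past.pop(current) + [current]
        self._past[new] = history
        for uri in history + [new]:
            self._current[uri] = new

    def past_uris(self, uri):
        """Return all former URIs of the resource, oldest first."""
        return list(self._past[self.resolve(uri)])


registry = UriRegistry()
registry.register("urn:uuid:1234")  # assumed shape of a generated URI
registry.rename("urn:uuid:1234", "http://example.com/graph/people")
registry.rename("http://example.com/graph/people", "http://example.com/graph/persons")
```

After the two renames, both the generated URI and the intermediate name still resolve to the latest custom URI.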
Furthermore, the range of functions will be expanded to include additional typical and extensible operations that one might want to perform when working with RDF datasets, such as URI normalizations, RDF smushing, etc.
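For readers unfamiliar with the term, smushing means merging nodes that denote the same resource into one. The following is a rough Python sketch of the general technique, not of Mud's planned implementation. It assumes that triples are plain tuples of strings, that equivalence is expressed with `owl:sameAs`, and that the lexicographically smallest URI is kept as the canonical one (an arbitrary but deterministic choice).

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def smush(triples):
    """Merge resources linked by owl:sameAs into one canonical URI."""
    triples = list(triples)  # we iterate twice
    parent = {}              # union-find forest over the linked resources

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    # First pass: collect the equivalence classes.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)

    def canonical(term):
        return find(term) if term in parent else term

    # Second pass: rewrite subjects and objects, drop the sameAs links.
    return {
        (canonical(s), p, canonical(o))
        for s, p, o in triples
        if p != OWL_SAME_AS
    }


data = [
    ("ex:a", OWL_SAME_AS, "ex:b"),
    ("ex:b", "ex:name", "Alice"),
    ("ex:a", "ex:age", "30"),
]
smushed = smush(data)
```

Here `smushed` contains both statements with `ex:a` as the subject, and the `owl:sameAs` link itself is gone.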
Saga Synchronization Protocol
Sagas introduce a synchronization protocol that keeps different storage locations in sync by utilizing Ontogen’s version history itself. Just as a VCS can be viewed as a synchronization solution for copies of a repo, “Sagas” can be viewed as “clones” within an Ontogen instance that are kept in sync via the shared version history.
This allows us to turn necessity into virtue and actually duplicate and synchronize not just the repository configuration in the file system, but the entire content of the dataset. This duplication opens up new possibilities. Having a complete copy of the repository in the file system enables the use of the many file-based tools for working with the data. For example, ordinary text editors can be used to edit the data, which is kept in sync with the triple store.
This significantly enhances the flexibility and accessibility of the data, allowing users to leverage their preferred tools and workflows. Furthermore, this approach allows for integration with Git. Combining Git’s versioning capabilities with Ontogen’s specialized RDF versioning enables better collaboration, history backups and migrations, and more flexible workflows in RDF dataset management, especially until Ontogen is more mature and can offer similar functionality itself. Ontogen will never offer Git’s syntactic versioning, however, so for users who want or need it, this integration remains useful in the long term.
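The core idea of syncing through a shared version history can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the planned protocol: the `Commit` and `Replica` classes are hypothetical, the history is assumed to be a linear list, and each commit is reduced to a set of added and a set of removed triples.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """Hypothetical, simplified commit: triples added and removed."""
    id: str
    added: frozenset = frozenset()
    removed: frozenset = frozenset()


@dataclass
class Replica:
    """A storage location (e.g. triple store or file system copy)."""
    triples: set = field(default_factory=set)
    applied: int = 0  # number of commits of the shared history already applied

    def sync(self, history):
        """Replay all commits this replica has not seen yet."""
        for commit in history[self.applied:]:
            self.triples -= commit.removed
            self.triples |= commit.added
            self.applied += 1


history = [
    Commit("c1", added=frozenset({("ex:a", "ex:name", "Alice")})),
    Commit(
        "c2",
        added=frozenset({("ex:a", "ex:name", "Alicia")}),
        removed=frozenset({("ex:a", "ex:name", "Alice")}),
    ),
]

store = Replica()
files = Replica()
store.sync(history)       # the store is up to date
files.sync(history[:1])   # the file copy lags one commit behind
files.sync(history)       # catching up replays only the missing commit
```

Because both replicas derive their state from the same history, they converge to the same set of triples regardless of when each one syncs.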
Multi-Graph Dataset Versioning
Unfortunately, supporting the versioning of datasets with multiple graphs is a larger undertaking, because RDF-star, which forms the basis of Ontogen’s versioning model, only allows the annotation of triples, not quads. The assignment of individual triples to graphs must therefore be tracked, in a rather complex way, at the level of the RDF-star annotations of RTC compounds.
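To illustrate why the graph assignment needs separate tracking, here is a deliberately simplified Python sketch. It does not model RDF-star annotations or RTC compounds, and the function names are hypothetical. It only shows that once quads are reduced to triples, the information about which graphs a triple belongs to has to be kept alongside it.

```python
from collections import defaultdict


def split_quads(quads):
    """Map each triple to the set of graphs it is asserted in."""
    membership = defaultdict(set)
    for s, p, o, g in quads:
        membership[(s, p, o)].add(g)
    return dict(membership)


def to_quads(membership):
    """Rebuild the quads from the triple-to-graphs mapping."""
    return {
        (s, p, o, g)
        for (s, p, o), graphs in membership.items()
        for g in graphs
    }


quads = {
    ("ex:a", "ex:name", "Alice", "ex:graph1"),
    ("ex:a", "ex:name", "Alice", "ex:graph2"),  # same triple, second graph
    ("ex:b", "ex:name", "Bob", "ex:graph1"),
}
membership = split_quads(quads)
```

The same triple appearing in two graphs ends up as a single key with two graph names, and any change to that assignment would itself have to be versioned.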
DID Integration
In addition to these core developments, Patrick Oscity will contribute by developing an Elixir implementation of W3C Decentralized Identifiers (DIDs). This implementation will be integrated into Mud and Ontogen to support the automatic generation of DIDs, providing a standardized, interoperable format for persistent identifiers with enhanced privacy features, for datasets where this is needed or useful.
Scalability problem
One limitation has not been mentioned so far: Ontogen, in its current form, is not suitable for large datasets. While I work on the planned developments, I’m also exploring solutions to this scalability issue.
At least one issue that caused timeouts with larger queries was fixed in the recently released version 0.1.3. If you’ve already installed Ontogen, you can upgrade using:
brew upgrade ontogen