Introduction

This site introduces and compares graph database benchmarks, with a focus on RDF and semantic graph databases.
It is intended for architecture teams, performance engineers, and decision makers who need data-driven ways to select and operate graph database technologies as part of a wider Virtual Enterprise Knowledge Graph (EKG) and GenAI capability.

What is the problem with Semantic Graph Database Benchmarks?

Benchmarks for semantic graph databases usually provide tools and guidelines for vendors to set up and run the tests themselves and then publish the results on their own websites. In practice, vendors often publish only partial, non-audited results (for example, data-import performance figures on the W3C website), while ignoring other aspects of the benchmark.

Only some benchmarks support reasoning, which is one of the main selling points of RDF graph databases. Even fewer can simulate realistic enterprise environments where concurrent users execute use-case-driven query workloads over time. These workloads are essential to understand scalability and to expose contention and data concurrency bottlenecks.
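
To make the idea of such a workload concrete, here is a minimal sketch (not taken from any particular benchmark) of a workload driver: a handful of simulated users repeatedly send SPARQL queries to an endpoint over the SPARQL Protocol while per-query latencies are recorded. The endpoint URL and the two queries are placeholders.

```python
"""Minimal sketch of a concurrent SPARQL query workload (illustrative only).

The endpoint URL and the two queries are placeholders; a real benchmark
driver would draw them from use-case-driven query mixes per user role.
"""
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:7200/repositories/benchmark"  # placeholder endpoint
QUERIES = [
    "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }",
    "SELECT ?type (COUNT(?s) AS ?n) WHERE { ?s a ?type } GROUP BY ?type LIMIT 10",
]


def simulate_user(user_id: int, iterations: int = 20) -> list:
    """One simulated user running the query mix repeatedly; returns per-query latencies."""
    latencies = []
    for i in range(iterations):
        start = time.perf_counter()
        response = requests.post(
            ENDPOINT,
            data={"query": QUERIES[i % len(QUERIES)]},
            headers={"Accept": "application/sparql-results+json"},
            timeout=60,
        )
        response.raise_for_status()
        latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    concurrent_users = 8
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        per_user = list(pool.map(simulate_user, range(concurrent_users)))
    all_latencies = [t for user in per_user for t in user]
    print(f"{len(all_latencies)} queries, "
          f"mean latency {sum(all_latencies) / len(all_latencies):.3f}s")
```

Real benchmark drivers go further, adding query mixes per user role, think times, warm-up phases, and update queries, which is what exposes the contention and concurrency bottlenecks mentioned above.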

The few benchmarks that do combine realistic workloads, reasoning support, and auditing are not widely used by triplestore vendors, which leaves buyers with an incomplete and fragmented picture of performance.

Note

Most semantic graph database benchmark websites have not been actively maintained, and few publish audited results anymore.
Apart from partial results from some vendors, there is little up-to-date, trustworthy information.

As a result, prospective buyers who lack hands-on experience with RDF graph databases find it extremely difficult to make evidence-based product selections for mission-critical use cases.

Benchmark facts

  • Very few official, audited benchmark results from RDF and property graph database vendors are publicly available.
  • Most RDF graph database vendors publish unofficial, partial results (for example, data import only, without query timings).
  • Not all benchmarks provide realistic transactional and analytical workloads.
  • The benchmarks that do support such workloads are rarely used by RDF graph database vendors.
  • Not all benchmarks support reasoning.
  • There is currently no single, complete, and reliable source of benchmark results that buyers can use to select RDF graph databases.

Benchmarks available

The following are the main graph database benchmarks available today. Click the links for short descriptions and details:

The W3C RDF Store Benchmarking page collects references to RDF graph benchmarks, results, and papers on graph benchmarking. On the W3C Large Triple Stores page you will see that most vendors have published results for the Lehigh University Benchmark (LUBM).

LUBM is the most commonly used RDF graph benchmark, but it is not the most complete and has a number of shortcomings that we address with our extended versions.
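
As an illustration of what LUBM exercises, the sketch below sends a query shaped like LUBM's first query (graduate students taking a particular course) to a hypothetical SPARQL endpoint; the endpoint URL is a placeholder, and the course URI shown is the form emitted by the LUBM data generator's default configuration.

```python
"""Illustrative LUBM-style query sent over the SPARQL Protocol.

The endpoint URL is a placeholder; the query mirrors the shape of LUBM's
first query, and the course URI is the form produced by the LUBM data generator.
"""
import requests

ENDPOINT = "http://localhost:3030/lubm/sparql"  # placeholder endpoint

QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub:  <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?student WHERE {
  ?student rdf:type ub:GraduateStudent .
  ?student ub:takesCourse <http://www.Department0.University0.edu/GraduateCourse0> .
}
"""

response = requests.post(
    ENDPOINT,
    data={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=60,
)
response.raise_for_status()
for binding in response.json()["results"]["bindings"]:
    print(binding["student"]["value"])
```

Several of the other LUBM queries depend on class- and property-hierarchy reasoning, which is one reason reasoning support matters when comparing published results.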

Benchmarks in an Enterprise Knowledge Graph context

In an Enterprise Knowledge Graph (EKG), you typically run hundreds of different use cases across many lines of business, often over multiple types of graph databases. For each use case, you need to determine which backend graph database is best suited.

In practice, a single use case may run across multiple graph databases (for example, a combination of Amazon Neptune and RDFox, or Stardog or Ontotext with Neo4j or TigerGraph). Large enterprises usually end up with several graph technologies co-existing.

To manage this complexity, benchmarking must be integrated with monitoring and DevOps (or DataOps / SemOps). We are moving towards continuous benchmarking, where benchmark scenarios are defined as part of model-driven use cases in the EKG and executed repeatedly over time as part of a Plan–Build–Run lifecycle.

For more background on EKGs, see the Enterprise Knowledge Graph Foundation (EKGF) at ekgf.org, and our own view of the Virtual Enterprise Knowledge Graph and Plan–Build–Run journey on the main agnos.ai site.

The benchmark work described here is part of a broader EKGF ecosystem.
You may also want to explore the following EKGF sites:

  • Use Case Tree Method – a method for developing EKG capabilities through structured use cases and outcomes: method.ekgf.org
  • EKG Use Case Catalog – a catalog of EKG use cases, datasets, and ontologies curated by the EKGF community: catalog.ekgf.org
  • EKG Maturity Model – a maturity model for assessing and planning Enterprise Knowledge Graph capabilities: maturity.ekgf.org
  • EKG Principles & Manifesto – foundational principles and manifesto for Enterprise Knowledge Graphs: principles.ekgf.org

Solving the problem

At agnos.ai we believe that robust graph database benchmarks are essential for the success of the graph database market, much as TPC has been for the relational database market for decades.

Note

We translate each benchmark into a use case model that becomes part of the knowledge graph itself.

Likewise, every use case defined as a metamodel in your EKG, with its stories and test scenarios, can be treated as a benchmark: the benchmark test bed can run those scenarios against multiple backend databases, capture performance metrics over time, and build time series that show how performance evolves (for better or worse).
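
As a rough sketch of that idea (the names, endpoints, and CSV result format below are our illustrative assumptions, not an existing API), a test scenario attached to a use case can be run against several backends, with each timestamped measurement appended to a growing time series:

```python
"""Sketch of treating a use-case test scenario as a repeatable benchmark.

All names, endpoints, and the CSV result format are illustrative assumptions,
not an existing agnos.ai or EKGF API.
"""
import csv
import time
from dataclasses import dataclass
from datetime import datetime, timezone

import requests


@dataclass
class Scenario:
    use_case: str   # identifier of the use case in the EKG
    name: str       # name of the test scenario / story
    query: str      # SPARQL text belonging to that scenario


BACKENDS = {  # placeholder endpoints for the databases under test
    "store-a": "http://localhost:7200/repositories/ekg",
    "store-b": "http://localhost:3030/ekg/sparql",
}


def run_scenario(scenario: Scenario, endpoint: str) -> float:
    """Execute one scenario against one backend; return the latency in seconds."""
    start = time.perf_counter()
    response = requests.post(
        endpoint,
        data={"query": scenario.query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=300,
    )
    response.raise_for_status()
    return time.perf_counter() - start


def record_run(scenarios, results_file="benchmark-results.csv"):
    """Append one timestamped measurement per scenario and backend (the time series)."""
    with open(results_file, "a", newline="") as f:
        writer = csv.writer(f)
        for scenario in scenarios:
            for backend, endpoint in BACKENDS.items():
                latency = run_scenario(scenario, endpoint)
                writer.writerow([
                    datetime.now(timezone.utc).isoformat(),
                    scenario.use_case,
                    scenario.name,
                    backend,
                    f"{latency:.3f}",
                ])
```

Triggering such a run from a scheduler or CI pipeline for every new product version is what turns individual benchmark runs into the performance time series described above.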

The graph data model is implemented by a growing number of storage systems, and graph databases are widely used across enterprises and the Web. As SPARQL adoption increases, so does the need for benchmarks that compare the performance of systems exposing SPARQL endpoints via the SPARQL Protocol.

We have been extending some of the most popular benchmarks to create more comprehensive and mature test suites. We currently offer:

We are also working with the Linked Data Benchmark Council (LDBC) benchmarks, including the Semantic Publishing Benchmark (LDBC-SPB) and the Financial Benchmark (LDBC FinBench).

We provide benchmarking services that help organizations:

  1. Select graph database vendors based on use-case requirements translated into measurable metrics.
  2. Collect continuous, use-case-specific performance metrics across product versions, providing insight into use-case maturity and performance evolution over time.