Benchmarks

There are many public benchmarks that test the performance of graph databases. We assessed a few of them and summarized our findings below.

LUBM Benchmark

The Lehigh University Benchmark (LUBM) is the most popular benchmark among graph database vendors and has been used by Oracle, AnzoGraph, AllegroGraph, OpenLink Virtuoso, Stardog, RDFox, Blazegraph, YARS2, and Sesame/RDF4J, among others. Some of the published results can be found on the Large Triplestores page on W3C.org.

For more details, please refer to LUBM Original Version and LUBM Extended Version implemented by agnos.ai.

WatDiv Benchmark

The WatDiv Benchmark was developed to measure how an RDF data management system performs across a wide spectrum of SPARQL queries with varying structural characteristics and selectivity classes.

Main characteristics:

  • does not support reasoning
  • can be configured to run in stress-testing mode
  • comes with a data generator

agnos.ai has extended the original version of the WatDiv queries in order to add more complexity.
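
To make the notion of selectivity mentioned above more concrete, here is a minimal Python sketch. It assumes a toy in-memory list of triples and a hypothetical `selectivity` helper; neither the data nor the function names come from WatDiv itself. The sketch measures the fraction of triples that a single triple pattern matches, where `None` plays the role of a SPARQL variable.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Toy dataset for illustration only (hypothetical, not produced by the WatDiv generator).
TRIPLES: List[Triple] = [
    ("user1", "follows", "user2"),
    ("user1", "likes", "product1"),
    ("user2", "likes", "product1"),
    ("user2", "likes", "product2"),
    ("user3", "follows", "user1"),
    ("product1", "hasGenre", "genre1"),
    ("product2", "hasGenre", "genre1"),
    ("product2", "hasGenre", "genre2"),
]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """Return True if every bound position of the pattern equals the triple's term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def selectivity(pattern: Pattern, triples: List[Triple]) -> float:
    """Fraction of triples matched by the pattern; a lower value is more selective."""
    if not triples:
        return 0.0
    return sum(matches(t, pattern) for t in triples) / len(triples)


if __name__ == "__main__":
    print(selectivity((None, "likes", None), TRIPLES))          # pattern bound on predicate only
    print(selectivity(("user1", "follows", "user2"), TRIPLES))  # fully bound pattern
```

A benchmark that spans selectivity classes would mix patterns like the fully bound one (which matches very few triples) with loosely bound ones (which match many), since a query engine may behave very differently on each.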

Berlin Benchmark (BSBM)

The Berlin SPARQL Benchmark (BSBM) defines a suite of benchmarks for comparing the performance of storage systems that expose SPARQL endpoints via the SPARQL protocol. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors and consumers have posted reviews about products. The benchmark query mix illustrates the search and navigation pattern of a consumer looking for a product.

Main characteristics:

  • e-commerce use case with products offered by vendors and reviewed by consumers
  • has no ontology
  • does not support reasoning
  • has a good query mix that covers explore, update and business intelligence use cases
  • comes with a data generator
  • contains a dictionary with product and person names

BSBM is not designed to require complex reasoning but to measure SPARQL query performance against large amounts of RDF data.
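
As an illustration of the search-and-navigate pattern described above, the following Python sketch models a handful of products, offers, and reviews as triples and evaluates two small join queries over them. Everything here is an assumption made for the example: the data, the predicate names, and the `match`, `cheapest_offer`, and `average_rating` helpers are hypothetical and are not part of BSBM, which uses SPARQL queries against a real endpoint.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, object]

# Hypothetical e-commerce data: one product with two offers and two reviews.
TRIPLES: List[Triple] = [
    ("product1", "label", "Laptop"),
    ("product2", "label", "Phone"),
    ("offer1", "product", "product1"),
    ("offer1", "vendor", "vendorA"),
    ("offer1", "price", 950),
    ("offer2", "product", "product1"),
    ("offer2", "vendor", "vendorB"),
    ("offer2", "price", 899),
    ("review1", "reviewFor", "product1"),
    ("review1", "rating", 8),
    ("review2", "reviewFor", "product1"),
    ("review2", "rating", 6),
]


def is_var(term: object) -> bool:
    """Terms starting with '?' are variables, mirroring SPARQL notation."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Dict[str, object]] = None) -> Iterator[Dict[str, object]]:
    """Naive nested-loop evaluation of a list of triple patterns (a join)."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # A variable must agree with any value it was bound to earlier.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


def cheapest_offer(label: str) -> Optional[Tuple[object, object]]:
    """Return (vendor, price) of the cheapest offer for the labelled product."""
    rows = list(match([("?p", "label", label), ("?o", "product", "?p"),
                       ("?o", "vendor", "?v"), ("?o", "price", "?price")], TRIPLES))
    if not rows:
        return None
    best = min(rows, key=lambda row: row["?price"])
    return best["?v"], best["?price"]


def average_rating(label: str) -> Optional[float]:
    """Return the mean review rating for the labelled product, if it has reviews."""
    ratings = [row["?rating"] for row in match(
        [("?p", "label", label), ("?r", "reviewFor", "?p"),
         ("?r", "rating", "?rating")], TRIPLES)]
    return sum(ratings) / len(ratings) if ratings else None


if __name__ == "__main__":
    print(cheapest_offer("Laptop"))
    print(average_rating("Laptop"))
```

The nested-loop join is deliberately naive. Measuring how well an engine evaluates this kind of query at scale is exactly what a SPARQL performance benchmark is for.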

Oxigraph has published BSBM results on its GitHub page, which also cover other systems such as Blazegraph, GraphDB, Jena, and Virtuoso.

LDB Council Benchmark (LDBC)

The Linked Data Benchmark Council (LDBC) is a non-profit organization that aims to define standard graph benchmarks and to foster a community around graph processing technologies. It is probably the only benchmark organization that is currently active. LDBC has an extensive set of benchmarks covering different use cases, including a brand new financial benchmark.

LDBC currently offers the following benchmarks:

  • Semantic Publishing Benchmark: an RDF-based benchmark for semantic databases (with reasoning)
  • Social Network Benchmark Suite: targets graph database management systems and consists of two workloads, Interactive and Business Intelligence; it is mainly used by property graph vendors
  • Financial Benchmark (FinBench): a project that aims to define a graph database benchmark targeting financial scenarios such as anti-fraud and risk control; FinBench is scheduled to be released at the end of 2022
  • Graphalytics: graph algorithms for graph analytics platforms

As FinBench is still a work in progress, the current stable LDBC benchmarks of interest are the other three listed above.

Graph500

Graph500 is mostly used by property graph vendors. It is backed by a steering committee of over 50 international HPC experts from academia, industry, and national laboratories, and aims to establish a set of large-scale benchmarks for graph applications. The steering committee is in the process of developing comprehensive benchmarks to address three application kernels: concurrent search, optimization (single source shortest path), and edge-oriented (maximal independent set). The committee is also in the process of addressing five graph-related business areas: Cybersecurity, Medical Informatics, Data Enrichment, Social Networks, and Symbolic Networks.
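
For readers unfamiliar with the first two kernels named above, here is a small Python sketch of a breadth-first search and a single source shortest path computation on a toy weighted graph. It is an illustration under stated assumptions, not the Graph500 reference code: the graph, the function names `bfs_levels` and `sssp`, and the choice of Dijkstra's algorithm are ours, and the search runs sequentially rather than concurrently.

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple

# Adjacency list: node -> list of (neighbour, edge weight). Hypothetical toy graph.
Graph = Dict[str, List[Tuple[str, float]]]

GRAPH: Graph = {
    "a": [("b", 4.0), ("c", 1.0)],
    "b": [("d", 1.0)],
    "c": [("b", 2.0), ("d", 5.0)],
    "d": [],
}


def bfs_levels(graph: Graph, source: str) -> Dict[str, int]:
    """Search kernel: number of hops from source to each reachable node."""
    levels = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour, _weight in graph[node]:
            if neighbour not in levels:
                levels[neighbour] = levels[node] + 1
                queue.append(neighbour)
    return levels


def sssp(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest path kernel: Dijkstra's algorithm for non-negative edge weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbour, weight in graph[node]:
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


if __name__ == "__main__":
    print(bfs_levels(GRAPH, "a"))
    print(sssp(GRAPH, "a"))
```

Note that the two kernels can disagree on what "closest" means: node `b` is one hop from `a`, yet its cheapest weighted path goes through `c`.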

References


  1. A "triplestore" is also known as an RDF graph database, a semantic graph database, or a quad-store. It is a database that stores triples, i.e. RDF statements, to form a graph.