The goal of this article is to introduce Datahike’s Java API. To do so we will create a small web application together. The figure below illustrates the application architecture and lists some of the core advantages of using Datahike. Datahike is an open-source, light-weight Datalog runtime with durable and auditable storage. It has an efficient query engine and is written in Clojure. On the left we can see the Spring Boot-based application interacting with Datahike’s Java API through its controllers. Since Clojure is also hosted on the JVM, objects such as lists or functions can be transparently passed into Datahike’s runtime.

Architecture of Java application using Datahike.

There are good reasons for using a Datalog database like Datahike. First, it is simple: a small, well-factored core codebase (fewer than 5000 lines of code) built around a few core concepts. This allows it to be flexibly recomposed and integrated with existing data sources in novel ways. Furthermore, it is more declarative than SQL due to its roots in logic programming languages like Prolog, which provide first-class support for implicit binding through logic variables. Because of its support for recursion it is also strictly more expressive than pure relational algebras such as the one underlying SQL. Compared to databases built on mutable state, Datahike provides coordination-free read scaling by automatically snapshotting all write operations. These snapshots can be read in parallel in the JVM runtime context of an arbitrary number of reading processes. Like git, the database can also be audited at any point in time. Datahike requires only Java dependencies and can be used in-memory, with a simple file-based backend, with Redis for high throughput, with auto-scaling cloud infrastructure like AWS S3 and DynamoDB, or with all of these backends combined in one query context.
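
As a first taste of configuration, the backend is selected through the database URI. In the sketch below, the file URI matches the one used later in this article, while the in-memory form is an assumption based on Datahike’s URI conventions at the time of writing:

String inMemoryUri = "datahike:mem://team-db";                  // volatile, e.g. for tests
String fileUri     = "datahike:file:///home/user/temp/team-db"; // durable file backend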

This combination makes Datahike an extremely powerful yet light-weight runtime to reason about data. Datahike follows the path pioneered by Cognitect’s Datomic and by DataScript. We also take inspiration from the long and rich research tradition on Datalog. Datahike is largely API-compatible with Datomic. Datomic has more features and is more mature, but Datahike is open source, leaner and easier to adapt.

Quick introduction to Datahike

Datoms and Entities

A Datahike database stores all facts as so-called Datoms. A Datom is a tuple made of an entity id, an attribute and the associated value. The same concept is at the foundation of the Semantic Web, where resources are described using subject–predicate–object relations, also known as triples (see RDF). With this model it is possible to represent relational, graph or columnar data. Below is a Datahike fact stating that the player with id 532 is named ’John’.

[532 :player/name "John"]

In this article we will often use Clojure syntax. Unless obvious, I will introduce the terms along the way. The example above is a Clojure vector. The attribute part, :player/name, is a Clojure keyword. In Datahike, an entity is a group of Datoms sharing the same entity id. An entity can be seen as an object, grouping multiple facts together in time and space. The first three Datoms below describe the entity for player 532.

;; ...
[532 :player/name  "John"]
[532 :player/team  1201]
[532 :player/event 2534]
;; ...
[1201 :team/name   "The Blue Jays"]
;; ...

These Datoms tell us that the player’s name is John and that the player’s team is the entity with id 1201 (partially shown in the listing above). This example shows how entities reference other entities. The database also contains the facts that the team’s name is ’The Blue Jays’ and that the player is associated with the event with entity id 2534.

Queries

Datahike queries are written in Datalog. The general structure of a query (in Clojure syntax) is as follows:

[:find ?e
 :where [?e :player/name "John"]]

Here ?e is a variable and [?e :player/name "John"] is a clause. The query returns the entity id of every Datom with attribute :player/name and value "John".
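
Since the rest of this article works through Datahike’s Java API, here is a minimal sketch of how this query could be issued from Java, using the q method and the dConn helper introduced in detail later (conn is the database connection we create further below):

String query = "[:find ?e :where [?e :player/name \"John\"]]";
// dConn(conn) dereferences the connection into an immutable snapshot.
Set<PersistentVector> res = Datahike.q(query, dConn(conn));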

Recursion

 1  (transact conn [{:db/id 1
 2                   :ancestor 2}
 3                  {:db/id 2
 4                   :ancestor 3}
 5                  {:db/id 3
 6                   :ancestor 4}])
 7  
 8  (def rule '[[(ancestor ?e1 ?e2)
 9               [?e1 :ancestor ?e2]]
10              [(ancestor ?e1 ?e2)
11               [?e1 :ancestor ?t]
12               (ancestor ?t ?e2)]])
13  
14  (q '[:find  ?u1 ?u2
15       :in    $ %
16       :where (ancestor ?u1 ?u2)]
17     @conn
18     rule)

The above example illustrates how Datahike supports recursion. The call to the transact function inserts three entities into the database (represented by the connection conn). The syntax is slightly different here, as we use Clojure’s map syntax to pass each entity to the transactor. This transactor DSL is flexible and convenient. The transaction results in the following Datoms as facts:

[1 :ancestor  2]
[2 :ancestor  3]
[3 :ancestor  4]

Lines 8 to 12 define a Datalog rule. A rule’s role is to infer new facts from existing ones. In our example the rule defines the meaning of ’ancestor’: ?e1 is an ancestor of ?e2 if the database contains a fact [?e1 :ancestor ?e2]. And, by induction, ?e1 is an ancestor of ?e2 if ?e1 is an ancestor of another entity ?t and ?t is an ancestor of ?e2.

Lines 14 to 18 define the recursive query that deduces and retrieves all ancestor relationships. Notice that the equivalent query in SQL would require considerably more work and could not be expressed as elegantly. Datalog turns out to be well-suited to both relational and graph database use cases.
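
For readers who want to stay in Java, the same recursive query can be issued through the Java API introduced later in this article. This is a sketch under the assumption that a rule set read with Clojure.read can be passed as the % input, mirroring the Clojure call above:

// Sketch: read the rule set as Clojure data and pass it as the % input.
Object rules = Clojure.read("[[(ancestor ?e1 ?e2) [?e1 :ancestor ?e2]]" +
                            " [(ancestor ?e1 ?e2) [?e1 :ancestor ?t]" +
                            "  (ancestor ?t ?e2)]]");
Set<PersistentVector> res = Datahike.q("[:find ?u1 ?u2 :in $ % :where (ancestor ?u1 ?u2)]",
                                       dConn(conn), rules);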

Our small web application example

The goal of our small web application is to let a team plan for events it is going to participate in. More precisely it lets teams state which of their players will participate in which events.

Below is an illustration of the application. It shows a team with its events and for each event the players who are going to join.

A team at events

As the application is built with Spring Boot, we are going to use Thymeleaf for the view part, and we will bootstrap the project with Spring Initializr. The latter generates a skeleton of the application with all libraries and dependencies correctly set up.

To build our example, first go to the Spring Initializr web page at https://start.spring.io. Fill in the fields as shown in the form below. Don’t forget to add Spring Web, Spring Boot DevTools and Thymeleaf to the dependencies, then click Generate. This will download a zip file called team-events.zip. Unzip the file and your project is ready. Import the project into your favorite editor. As I am using IntelliJ, I will illustrate the steps with it.

Screenshot of Spring Initializr.

Select File | New | Project from Existing Sources and choose the folder where you unzipped your project. Choose to import the project as a Maven project: Import project from external model | Maven. Select a JDK no older than JDK 11. In your pom.xml, add dependencies on the latest versions of Datahike and Clojure.

<dependency>
    <groupId>io.replikativ</groupId>
    <artifactId>datahike</artifactId>
    <version>0.3.0</version>
</dependency>

<dependency>
    <groupId>org.clojure</groupId>
    <artifactId>clojure</artifactId>
    <version>1.10.1</version>
</dependency>

Then select the TeamEventsApplication.java file and run it. If all went well, it starts a web server listening locally on port 8080. If you point your web browser at localhost:8080 you should see the following error page. This is expected at this stage, as we have not yet defined any routes.

Screenshot of the default error page.

Building the application

It is now time to build our application. The first step is to create a database. With Datahike, you have to decide whether to create your database with or without a schema. In our application we are going to use a schema. In Datahike, the schema’s role is to define and constrain each attribute.

{:db/ident :player/event
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/many}

In the above example, we declare that the :player/event attribute is of type ref, because it is used to reference other entities in the database, and of cardinality many, meaning that the attribute can appear multiple times within one entity. An attribute can also be declared as unique, which means that each of its values identifies exactly one entity. This allows very handy shorthands, one of which is sketched below.
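
For example, assuming Datahike follows Datomic’s lookup-ref convention here (its API compatibility suggests so, but treat this as an assumption), a unique attribute-value pair can stand in for an entity id. A sketch using the Java helpers vec and k and the connection conn introduced below:

long eventId = 2534; // stand-in for an existing event's entity id
// With :player/name unique, the lookup ref [:player/name "John"]
// replaces John's numeric entity id in the transaction.
Datahike.transact(conn, vec(vec(k(":db/add"),
                                vec(k(":player/name"), "John"),
                                k(":player/event"), eventId)));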

Below is a typical sequence for creating a Datahike database in Java. After defining the schema, we create the database and connect to it. This returns conn, a reference to the database which we will use to interact with it.

Object schema = Clojure.read("[{:db/ident :team/name\n" +
                             "  :db/valueType :db.type/string\n" +
                             "  :db/unique :db.unique/identity\n" +
                             "  :db/index true\n" +
                             "  :db/cardinality :db.cardinality/one}\n" +

                             " {:db/ident :team/event\n" +
                             "  :db/valueType :db.type/ref\n" +
                             "  :db/cardinality :db.cardinality/many}\n" +

                             " {:db/ident :event/name\n" +
                             "  :db/valueType :db.type/string\n" +
                             "  :db/unique :db.unique/identity\n" +
                             "  :db/cardinality :db.cardinality/one}\n" +

                             " {:db/ident :player/name\n" +
                             "  :db/valueType :db.type/string\n" +
                             "  :db/unique :db.unique/identity\n" +
                             "  :db/cardinality :db.cardinality/one}\n" +

                             " {:db/ident :player/event\n" +
                             "  :db/valueType :db.type/ref\n" +
                             "  :db/cardinality :db.cardinality/many}\n" +

                             " {:db/ident :player/team\n" +
                             "  :db/valueType :db.type/ref\n" +
                             "  :db/cardinality :db.cardinality/many}]");
                             
String uri = "datahike:file:///home/user/temp/team-db";
Datahike.createDatabase(uri, k(":initial-tx"), schema);
Object conn = Datahike.connect(uri);
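
With the connection in hand we could already seed some data. Below is a minimal sketch using the transact method and the map and k helpers explained in the next section; the names are purely illustrative:

// Illustrative seed transaction: one team and one event.
Datahike.transact(conn, vec(map(k(":team/name"), "The Blue Jays"),
                            map(k(":event/name"), "Summer Cup")));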

With the database in place, we can build the logic of our application. We will start with what is required to handle players.

Creating a player

To create a new player, we define in the PlayerController class the handler for POST requests at the /players URL. The request provides the player’s name as a request parameter.

    @RequestMapping(path = "/players", method = RequestMethod.POST)
    public String create(Model model, @RequestParam("name") String name) {
        Datahike.transact(conn, vec(map(k(":player/name"), name)));
        return "redirect:/players";
    }

As we can see, adding a new entity to a Datahike database is fairly simple. It consists of transacting a Datom whose attribute is :player/name, with the player’s name as its value. In Clojure this would be written as (transact conn [{:player/name name}]). The equivalent code in Java is what appears in our controller: Datahike.transact(conn, vec(map(k(":player/name"), name))). We can see that the Java methods vec and map are the equivalents of Clojure’s literal vector and map (written [] and {} respectively). Notice also that Clojure keywords are built using the k method in Java, which expects a Java string starting with a colon as its argument. Finally, we use conn as the reference to the database for the transaction.
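
For reference, the static helpers used in these controllers (k, map, vec, as well as dConn used below) are assumed here to come from Datahike’s Java utility class; the exact package and class names may vary between versions, so check the Java API definition linked at the end of this article:

import java.util.Set;

import clojure.java.api.Clojure;      // reading schemas and rules from strings
import clojure.lang.PersistentVector; // query result rows
import datahike.java.Datahike;        // createDatabase, connect, transact, q
// Assumption: k, map, vec (and dConn) live in Datahike's Java utility class.
import static datahike.java.Util.*;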

Listing all the players

Listing the players currently in the database is also fairly simple:

    @RequestMapping(path = "/players", method = RequestMethod.GET)
    public String index(Model model){
        String query = "[:find ?e ?n :in $ :where [?e :player/name ?n]]";
        Set<PersistentVector> res = Datahike.q(query, dConn(conn));
        model.addAttribute("players", res);
        return "players/index";
    }

Datahike’s q method is used for querying the database. It takes a query and a variable number of arguments that serve as inputs to the query. In the current version of the API, the query is passed as a string written in Clojure syntax. In our example the only input is the database itself. More precisely, we use a dereferenced version of the database, dConn(conn), which gives us the latest version of the database as an immutable snapshot. A query returns a set of PersistentVectors, the Java implementation of Clojure’s vector type, which can be used like an ordinary Java list.
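
Concretely, a result row supports positional access with get, so consuming a result set needs no conversion; a small sketch:

// Each row holds the bound values in the order of the :find clause.
for (PersistentVector row : res) {
    Object id = row.get(0);            // ?e, the entity id
    String name = (String) row.get(1); // ?n, the player's name
    System.out.println(id + " -> " + name);
}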

The query [:find ?e ?n :in $ :where [?e :player/name ?n]] extracts all existing players. It returns a set of tuples consisting of an entity id and the name of a player (the variables ?e and ?n respectively). The query result is then passed as an attribute to the Spring model object so that it becomes reachable from the view. An excerpt of the view listing the players is shown below:

    <h1>Players</h1>
    
    <table>
      <tr>
        <th>NAME</th>
      </tr>
      <tr th:each="player : ${players}">
        <td th:text="${player[0]}">id</td>
        <td th:text="${player[1]}">name</td>
        ...
      </tr>
    </table>

We use Thymeleaf’s th:each operator to iterate over the players that were passed to the view under the players attribute. The figure below shows the resulting list of players.

List of Players

Adding a player to an event

For a player to attend an event with their team, we need the following controller:

     1  @RequestMapping(path = "/events/{eventId}/players", method = RequestMethod.POST)
     2  public String create(Model model, @PathVariable("eventId") int eventId,
     3                       @RequestParam("playerId") int playerId,
     4                       @RequestParam("teamName") String teamName) {
     5      String query = "[" +
     6          ":find ?ti ?teamName " +
     7          ":in $ ?teamName " +
     8          ":where [?ti :team/name ?teamName]]";
     9      Set<PersistentVector> res = Datahike.q(query, dConn(conn), teamName);
    10      int teamId = (int) res.iterator().next().get(0);
    11  
    12      Datahike.transact(conn,
    13                        vec(vec(k(":db/add"), playerId, k(":player/event"), eventId),
    14                            vec(k(":db/add"), playerId, k(":player/team"), teamId)));
    15      return "redirect:/teams/" + teamName;
    16  }

The goal is to transact the association between a player and an event, plus the association between the player and a team. This is what the call to Datahike’s transact method does (line 12). To be safe with user input, a query (lines 5-8) should always be a constant String; parameters can be passed to q explicitly, so there is never a need to splice values into the query string. The string is only split into concatenated lines here for readability.

Since the request only provides the team name, the first part of the controller queries the database for the team id of the given team. Notice that, along with the database, teamName is passed as an additional input to the query (line 9).

Removing a player from an event

     1  @RequestMapping(value = "events/{eventId}/teams/{teamId}/players/delete/{playerId}",
     2                  method = RequestMethod.GET)
     3  public String deleteFromEvent(Model model, @PathVariable("teamId") int teamId,
     4                                @PathVariable("eventId") int eventId,
     5                                @PathVariable("playerId") int playerId) {
     6      Datahike.transact(conn, vec(vec(k(":db/retract"), playerId, k(":player/event"),
     7                                      eventId)));
     8  
     9      String query = "[:find ?teamName :in $ ?ti :where [?ti :team/name ?teamName]]";
    10      Set<PersistentVector> res = Datahike.q(query, dConn(conn), teamId);
    11      String teamName = (String) res.iterator().next().get(0);
    12      return "redirect:/teams/" + teamName;
    13  }

To remove a player from an event, we use Datahike’s retraction API: we transact a tuple starting with the keyword :db/retract, followed by the Datom we want to remove (line 6).

The rest of the method queries the team name for the given team id and uses it to render the correct view.

Removing a player from all its events and teams

    @RequestMapping(value = "/players/delete/{id}", method = RequestMethod.GET)
    public String delete(Model model, @PathVariable("id") int id) {
        Datahike.transact(conn, vec(vec(k(":db.fn/retractEntity"), id)));
        "return redirect:/players;
    }

To fully remove a player from the database, we remove its entity, which also removes all its relations to events and teams. To do so, we use Datahike’s entity removal API, which consists of transacting the keyword :db.fn/retractEntity followed by the id of the entity to remove.

Other controllers

The controllers for events and teams follow the same principles as the player controller. For brevity I will not list the code here; you can find the full code in the application repository.

Conclusion

In this article we have introduced the Java API of Datahike, a durable Datalog-based database, and shown how easy it is to build a Java web application on top of it. We will keep the Java API in sync with our ongoing development of Datahike. As future work, we plan to provide an embedding of Datalog as a Java DSL in addition to the string-based query representation. If you want more information on Datahike, please visit our repository, the Java API definition or the Slack and Zulip channels. If you have special needs regarding Datahike, we are happy to help. In that case, please contact info@lambdaforge.io.