Datahike

(Scicloj Online Meetup 2019 October 14th)

Konrad Kühne

Created: 2019-10-14 Mon 16:35

Motivation

  • Datomic concept well-established
    • fast iterations, scalable
  • datalog popular in academic research
    • e.g. static code analysis
  • successful application in customer projects
  • few, if any, extensible open source solutions

Triple Store

  • database for storage and retrieval of triples (three-part facts)
    • entity, attribute, value
  • value is either a primitive data type or a reference to another entity
  • retrieval via query language
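The idea can be sketched in a few lines of plain Clojure. This is only a toy illustration, not how Datomic or Datahike store data: the names `store` and `lookup` and the `:friend` attribute are made up for this example, and `nil` is used as a wildcard in place of a real query language.

```clojure
;; A toy triple store: a set of [entity attribute value] vectors.
(def store
  #{[1 :name "Alice"]
    [1 :friend 2]        ; the value 2 is a reference to entity 2
    [2 :name "Bob"]})

(defn lookup
  "Returns the set of triples matching the pattern [e a v].
  A nil in any position matches everything."
  [store [e a v]]
  (set (filter (fn [[te ta tv]]
                 (and (or (nil? e) (= e te))
                      (or (nil? a) (= a ta))
                      (or (nil? v) (= v tv))))
               store)))

(lookup store [1 nil nil])
;; => #{[1 :name "Alice"] [1 :friend 2]}

(lookup store [nil :name nil])
;; => #{[1 :name "Alice"] [2 :name "Bob"]}
```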

Datomic

  • transactional immutable distributed database with commercial licensing
    • database of facts
  • developed by Rich Hickey
    • motivation for Clojure
  • interdependent components
    • client, server, storage, transactor, console
  • implements flavor of Datalog
    • query, pull
  • many persistence protocols
    • e.g. cloud, relational databases

DataScript

  • in-memory open source datalog implementation
  • very mature (5+ years of development)
  • faster in-memory query engine than Datomic
  • supports a lot of Datomic’s datalog flavour
  • only partial schema-on-read functionality

Datalog

Features

  • declarative logic programming language
  • subset of Prolog
  • based on first-order logic
  • not Turing complete
  • implicit joins
  • list of predicates (clauses)

Datalog and others

SQL

SELECT id
FROM members
WHERE
name = 'Alice'

Prolog

?- name(Id, "Alice").

Datalog

[:find ?id
 :where
 [?id :name "Alice"]]
;; => #{[22]}

Queries

Facts

[[1 :name "Alice"]
 [2 :name "Bob"]
 [3 :name "Charlie"]
 [1 :age 45]
 [2 :age 35]
 [3 :age 25]]

Basic

[:find ?e
 :where
 [?e :age 45]]
;; => #{[1]}

Unification

[:find ?n
 :where
 [?e :age 45]
 [?e :name ?n]]
;; => #{["Alice"]}

Any

[:find ?n
 :where
 [_ :name ?n]]
;; => #{["Alice"] ["Bob"] ["Charlie"]}

Inputs

[:find ?a
 :in $ ?n
 :where
 [?e :name ?n]
 [?e :age ?a]]
;; <= db "Alice"
;; => #{[45]}

Result Specs

[:find ?e .
 :where [?e :name "Alice"]]
;; => 1

[:find [?n ...]
 :where [_ :name ?n]]
;; => ["Alice" "Bob" "Charlie"]

[:find [?e ?a]
 :in $ ?n
 :where [?e :name ?n]
        [?e :age ?a]]
;; <= db, "Alice"
;; => [1 45]
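The queries above can be tried at a REPL without setting up a database. The sketch below uses DataScript (assumed to be on the classpath as `datascript.core`), whose `q` function accepts a plain collection of tuples as a data source; the var name `facts` is chosen for this example only.

```clojure
(require '[datascript.core :as d])

;; The facts from the slides, as a plain vector of [e a v] tuples.
(def facts
  [[1 :name "Alice"]
   [2 :name "Bob"]
   [3 :name "Charlie"]
   [1 :age 45]
   [2 :age 35]
   [3 :age 25]])

;; Basic: which entity has age 45?
(d/q '[:find ?e
       :where [?e :age 45]]
     facts)
;; => #{[1]}

;; Input: the name is passed as an extra argument bound to ?n.
(d/q '[:find ?a
       :in $ ?n
       :where [?e :name ?n]
              [?e :age ?a]]
     facts "Alice")
;; => #{[45]}

;; Scalar result spec: a single value instead of a set of tuples.
(d/q '[:find ?e .
       :where [?e :name "Alice"]]
     facts)
;; => 1
```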

Pull

Syntax

[*]
[:name :age]
[:name :age {:car [*]}]
[:name :age
 {:car [* {:vendor [*]}]}]

Within queries

[:find (pull ?e [:name :age])
 :where [?e :name _]]
;; => #{[{:name "Alice" :age 45}]
;;      [{:name "Bob" :age 35}]
;;      [{:name "Charlie" :age 25}]}
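Pull patterns can also be used directly, outside of a query. The following is a minimal sketch using DataScript's `pull` (assumed available as `datascript.core`). The `:car` reference comes from the syntax examples above; the `:model` attribute and the entity ids are hypothetical and only serve this illustration.

```clojure
(require '[datascript.core :as d])

;; :car must be declared as a reference so that pull can follow it.
(def db
  (d/db-with
    (d/empty-db {:car {:db/valueType :db.type/ref}})
    [{:db/id 1 :name "Alice" :age 45 :car 2}
     {:db/id 2 :model "Roadster"}]))   ; :model is a made-up attribute

;; Select two attributes of entity 1.
(d/pull db [:name :age] 1)
;; => {:name "Alice", :age 45}

;; Follow the :car reference and select attributes of the target entity.
(d/pull db [:name {:car [:model]}] 1)
;; => {:name "Alice", :car {:model "Roadster"}}
```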

Datahike

Background

  • first database ideas came from replikativ (cross-platform replication system)
  • originally a fork of DataScript combined with a hitchhiker tree index
  • partially compatible with the Datomic API

Philosophy

  • open source
  • datalog
  • composable
  • extensible
  • configurable

Features

  • datalog query engine
    • Datomic flavored
  • multi-index
  • schema flexibility
    • on-read, on-write
  • storage
    • in-memory, file, LevelDB, PostgreSQL
  • auditable
    • time travel, history
  • community friendly
    • modular, extensible components
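A short session may make these features more concrete. The sketch below assumes the URI-based configuration of the Datahike 0.2.x releases that were current at the time of this talk; later releases configure the store differently, so treat the calls as version-dependent. The store name `example` is arbitrary, and schema-on-write is assumed to be the default, which is why the attributes are declared first.

```clojure
(require '[datahike.api :as d])

;; In-memory backend; "example" is an arbitrary store name.
(def uri "datahike:mem://example")

(d/create-database uri)
(def conn (d/connect uri))

;; Schema-on-write (assumed default): declare attributes before using them.
(d/transact conn [{:db/ident       :name
                   :db/valueType   :db.type/string
                   :db/cardinality :db.cardinality/one}
                  {:db/ident       :age
                   :db/valueType   :db.type/long
                   :db/cardinality :db.cardinality/one}])

(d/transact conn [{:name "Alice" :age 45}])

;; Datomic-flavored datalog query against the current database value.
(def result
  (d/q '[:find ?n ?a
         :where [?e :name ?n]
                [?e :age ?a]]
       @conn))

;; Auditing: the same kind of query over all historical facts.
(def ages-over-time
  (d/q '[:find ?a
         :where [?e :name "Alice"]
                [?e :age ?a]]
       (d/history @conn)))

;; Clean up the store when it is no longer needed.
(d/delete-database uri)
```

Switching the storage backend would, under the same assumption, only change the URI scheme (for example a `datahike:file://` path), leaving the rest of the session as it is.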

Internals

Architecture

[Figure: architecture diagram]

  • API
    • connection from the application
  • Connector
    • mutable db connection, communication with persistence
  • Query Engine
    • parses datalog, joins predicates, uses db search
  • DB
    • core record; index transactions and search
  • Indices
    • EAVT, AEVT, AVET and temporal
  • Store
    • persistence (e.g. PostgreSQL, in-memory)
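The index names describe sort orders over the same facts. As a rough illustration in plain Clojure, each index can be pictured as a sorted set whose comparator reorders the positions of the triple. This is a simplification: Datahike backs its indices with hitchhiker trees, the temporal indices are not modelled here, and the helper `index` is a name invented for this sketch.

```clojure
(def facts
  [[1 :name "Alice"] [2 :name "Bob"] [3 :name "Charlie"]
   [1 :age 45] [2 :age 35] [3 :age 25]])

(defn index
  "Builds a sorted set of the facts, ordered by the given positions
  of the [e a v] triple (0 = entity, 1 = attribute, 2 = value)."
  [order facts]
  (into (sorted-set-by (fn [x y]
                         (compare (mapv x order) (mapv y order))))
        facts))

(def eavt (index [0 1 2] facts))  ; all facts of one entity sit together
(def aevt (index [1 0 2] facts))  ; all facts of one attribute, by entity
(def avet (index [1 2 0] facts))  ; all facts of one attribute, by value

(first eavt)
;; => [1 :age 45]

(take 3 aevt)
;; => ([1 :age 45] [2 :age 35] [3 :age 25])

(take 3 avet)
;; => ([3 :age 25] [2 :age 35] [1 :age 45])
```

Which index serves a query clause best depends on which positions are bound: a clause such as `[?e :age 45]` fixes attribute and value, which is the case the AVET order supports directly.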

Transaction Flow

[Figure: transaction flow diagram]

Query Flow

[Figure: query flow diagram]

Live Coding

  • Helsinki public services data
  • repo

Next Steps

  • data migration facilities
  • Java / Scala bindings
  • standalone client and server
  • query planner
  • identity and access management
  • probabilistic reasoning
  • datalog for modelling economic systems (datopia)

Thanks

  • Nikita Prokopov (datascript)
  • Rich Hickey (Clojure, Datomic)
  • David Greenberg (Hitchhiker Tree)
