Screenshot of datalog linting in editor.

TLDR; How do I use it?

It is very simple, just use the newest clj-kondo version and you are good to go. We are able to detect Datalog in the position of first argument to DataScript’s, Datomic’s and Datahike’s q function. I am not sure whether there are other situations where you know that a data structure corresponds to a query of these functions, but we could also support a metadata hint for example.

Why support linting for Datalog?

We use Datalog daily, either to build frontends with DataScript, backends with Datomic or with our composable database stack Datahike. These three systems all use the Datalog dialect that has first been introduced by Datomic. A benefit of this dialect is that it is simply described as a symbolic expression built out of edn. Its ability to bind and join input data over multiple data sources as well as invoke pure Clojure functions is extremely powerful even without managing durable data. But also, whether beginner or expert, one still often makes mistakes in query formulations and you need to invoke a query first before you get feedback. Datalog is a very concise language for which we have an informative parser that both checks syntax and part of its semantics, so my colleague Konrad suggested that it would be very useful to extend the linter of clj-kondo and tighten the feedback loop for writing Datalog queries as shown in the picture above. So I just did that.

Implementing a linter

Let’s see how easy implementing static analysis can be. Static analysis means that your code is not really run, i.e. it is interpreted in a way to prove its (in)consistencies without causing any side effects, e.g. in a type checker or a verification tool. clj-kondo is a linter, a tool that does analysis on your code in a separate context, that means it is not interfering with the REPL and not stopping you from running inconsistent code. This allows you to care of consistency whenever you have figured out what you actually want to do, compared to languages that need intrinsic type safety to be run. So clj-kondo needs to analyze your code, but instead of executing (compiling and running) it, it just tries to find (likely) inconsistencies, e.g. unbound variable errors or redundant code warnings. To add the Datalog analyzer I have added a case in the analysis of function calls for the three query functions of DataScript, Datomic and Datahike for the following function:

(defn analyze-datalog [{:keys [:findings] :as ctx} expr]
  (let [children (next (:children expr))
        query-raw (first children)
        quoted? (when query-raw
                  (= :quote (tag query-raw)))
        ;; 1. Let's try to unpack the first argument (query) if it is a quoted literal.
        datalog-node (when quoted?
                       (when-let [edn-node (first (:children query-raw))]
                         (when (one-of (tag edn-node) [:vector :map])
                           edn-node)))]
    (when datalog-node
      (try
        ;; 2. Call the external parser.
        (datalog/parse (sexpr datalog-node))
        nil
        (catch Exception e
          ;; 3. Finally we can extract the error from the exception
          ;; and point to the source.
          (findings/reg-finding! findings
                                 (node->line (:filename ctx) query-raw
                                             :error :datalog-syntax
                                             (.getMessage e))))))))

This code mainly follows 3 steps. First it uses rewrite-clj to extract the symbolic expression for the first argument and tries to unpack it into a Datalog literal. Second it calls our datalog-parser, which is derived from DataScript, to check whether it would succeed. Third if it throws an Exception we extract the error message from the parser and register it with reg-finding! for the original query-raw expression. The only thing missing is to define the :datalog-syntax type in the default configuration map in config.clj. We also had to make sure that the parser does not use reflection so that it is compatible with GraalVM’s native image support that is used to compile clj-kondo. This functionality has only added a few hundred kilobytes to the final static binary. Overall it was a very pleasing experience and doable in two afternoons. Done.

Pushing things further

Depending on how much you work in a specific domain with formally specified language abstractions providing a linter for it can pay off quickly. I am sure it was even worth the effort if I would just have done it for myself. It might also be good to improve your parser/checker from the perspective of an interactive helper, maybe we should try to catch more errors by making sure all variables passed to predicates are properly bound?

In general I think static analysis is extremely underexplored in Clojure, partially because of the very good arguments Rich Hickey made against global static type checking and the resulting puzzling mentality that distracts from the actual problem solving in the vaguely formalized domains of everyday programming. Here is a good clarifying article about his talk . But we can try to proof consistencies locally and for some domains stronger static guarantees might be worth it, e.g. if the cost of uncaught downstream errors is high. Clojure’s immutable nature makes it a superb target for such analysis. clj-kondo already provides a simple extrinsic type system that works very well for me. Static analysis for Clojure is also provided by joker and Cursive.

The Clojure ecosystem can borrow plenty of good ideas in this direction from the tower of languages and libraries provided in Racket. While this community is much more academic, they are our friendly cousins in parenthesis. I interact with researchers working in Racket daily and I am pretty sure they would help you if you want to port one of their languages or libraries to Clojure. core.typed is a port of Typed Racket for instance.

A pretty mind boggling approach is to use Datalog itself for analysis, e.g. with clindex, which I plan to use in my research compiler. Facebook is using a similar approach in Glean and Rust’s borrow checker can be described in Datalog. Semmle has been bought by Github and is integrating its Datalog based static analysis now. There are also many other interesting ideas around Datalog in the programming language community. e.g. Lambda Prolog and datafun. I will try to collect some interesting type system related resources for clj-kondo here, feel free to reach out to me or Michiel Borkent if you have ideas and want to contribute.

Finally I want to thank Michiel Borkent for clj-kondo and for guiding me through the process, Nikita Tonsky for providing the original Datalog parser, Mistral Constratin for pointers to static analysis and his work on data flow analysis for Datalog itself and Ron Garcia for teaching me a thing or two about static analysis ;).

Let’s bring more static analysis and Datalog to Clojure for even more fun and profit!