Datalog in Rust (github.com)
254 points by brson 16 hours ago | 26 comments
Leynos 15 hours ago [-]
It's funny seeing this as the top story.

I'm in the middle of putting together a real-time strategy game using Differential Datalog[1] and Rust, with DDlog managing the game's logic. Mostly as an excuse to expose myself to new ideas and engage in a whole lot of yak shaving.

[1] https://github.com/vmware-archive/differential-datalog

foota 20 minutes ago [-]
I wonder if you could make a Frankenstein version of Differential Datalog by combining the OP repo with salsa[1] (the crate that powers rust-analyzer).

[1] https://github.com/salsa-rs/salsa

Yoric 15 hours ago [-]
Oh, nice!

I'll be interested in reading how this goes!

cmrdporcupine 14 hours ago [-]
Very cool. I'm curious to see what state that implementation reaches and how far you get, since DDlog is no longer actively maintained.
burakemir 11 hours ago [-]
I've made some progress porting Mangle Datalog to Rust: https://github.com/google/mangle/tree/main/rust - it lives in the same repo as the Go implementation.

It is slow going, partly because it is not a priority and partly because I suffer from second-system syndrome. Mangle Rust should handle data of any size by reading and writing facts on disk via memory mapping; the Go implementation is purely in-memory.
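
Roughly the idea (not Mangle's actual on-disk format - a toy sketch using the memmap2 crate, with a made-up fixed-width record layout):

    use std::fs::File;
    use memmap2::Mmap; // assumed dependency: memmap2 = "0.9"

    // Toy layout: each fact is a fixed-width (u32, u32) record.
    // Mapping the file lets the fact store exceed available RAM;
    // the OS pages records in and out on demand.
    fn facts_with_key(path: &str, key: u32) -> std::io::Result<Vec<(u32, u32)>> {
        let file = File::open(path)?;
        // Safety: the file must not be resized or mutated while mapped.
        let mmap = unsafe { Mmap::map(&file)? };
        let mut hits = Vec::new();
        for rec in mmap.chunks_exact(8) {
            let x = u32::from_le_bytes(rec[0..4].try_into().unwrap());
            let y = u32::from_le_bytes(rec[4..8].try_into().unwrap());
            if x == key {
                hits.push((x, y));
            }
        }
        Ok(hits)
    }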

This post is nice because it parses Datalog and mentions the LSM tree, and it is much easier to follow than the datafrog stuff.

There are many Datalog implementations in Rust (ascent, crepe) that use proc macros. The downside is that they cannot accept queries at runtime; see the sketch below. For the static-analysis use case, where queries/programs are fixed, the proc-macro approach might be better.
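
For a flavor of that style, here is roughly what a crepe program looks like (a sketch along the lines of crepe's documented interface). The rules are baked in at compile time, which is exactly why runtime queries are out:

    use crepe::crepe;

    crepe! {
        @input
        struct Edge(u32, u32);

        @output
        struct Reachable(u32, u32);

        // Classic transitive closure; the macro compiles these rules
        // into a semi-naive evaluation loop at build time.
        Reachable(x, y) <- Edge(x, y);
        Reachable(x, z) <- Edge(x, y), Reachable(y, z);
    }

    fn main() {
        let mut runtime = Crepe::new();
        runtime.extend([Edge(1, 2), Edge(2, 3), Edge(3, 4)]);
        let (reachable,) = runtime.run();
        for Reachable(x, y) in reachable {
            println!("{} reaches {}", x, y);
        }
    }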

maweki 12 hours ago [-]
It is nice to see a core group of Datalog enthusiasts persist, even though the current Datalog revival seems to be on the decline. The recent Datalog 2.0 conference was quite small compared to previous years, and the second HYTRADBOI conference was very light on Datalog as well, while the first one drew a quarter of its submissions with a Datalog connection.

I'm encouraged by the other commenters sharing their recent Datalog projects. I am currently building a set of data quality pipelines for a legacy SQL database in preparation for a huge software migration.

We find Datalog much more useful than SQL for identifying data quality issues, as the queries can be incredibly readable when well structured.
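
For a flavor of what such a check can look like (a made-up example, sketched with the crepe crate's embedded Datalog rather than our actual tooling):

    use crepe::crepe;

    crepe! {
        @input
        struct Account(u32, u32); // (account_id, owner_id)
        @input
        struct Person(u32);       // known person ids

        @output
        struct OrphanAccount(u32);

        // One readable rule: an account is an orphan if its owner id
        // has no matching person record (stratified negation).
        OrphanAccount(a) <- Account(a, o), !Person(o);
    }

    fn main() {
        let mut rt = Crepe::new();
        rt.extend([Account(1, 10), Account(2, 99)]);
        rt.extend([Person(10)]);
        let (orphans,) = rt.run();
        for OrphanAccount(a) in orphans {
            println!("account {} has a dangling owner reference", a);
        }
    }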

kmicinski 11 hours ago [-]
No offense, but I wouldn't take Datalog 2.0's small attendance as evidence of Datalog's decline, even if I agree with that high-level point. Datalog 2.0 is a satellite workshop of LPNMR, a relatively unknown European conference that was, somewhat randomly, held in Dallas this year. I attended Datalog 2.0 and also found the event relatively sparse. I had a paper there too (not my primary work; the first author is the real wizard, of course :-), and I saw relatively few folks from that space even attending the event--with the notable exception of some Europeans (e.g., the group introducing the Nemo solver).

All of this is to say: I think Datalog 2.0's sparse attendance this year is more indicative of its being a satellite workshop of an already less prestigious conference (itself not even the main event! That was ICLP!) than of any lack of excitement about Datalog implementation.

For what it's worth, none of this is meant to rebut your high-level point that there is little novelty left in implementing raw Datalog engines. Of course I agree: the research space has moved far beyond that (arguably it did a while ago) and into more exotic problems involving things like streaming (HydroFlow), choice (Dusa), and things that get closer to the general chase (e.g., Egglog's chase engine). I don't think anyone disagrees that vanilla Datalog is boring; it's just that monotonic, forward-chaining saturation (Horn clauses!) is a rich baseline with a well-understood engineering landscape (especially in the high-performance space) on which to build out more interesting theories (semirings, Z-sets, etc.).

banana_feather 10 hours ago [-]
I like the author's Datalog work generally, but I really wish his introductory material did not teach binary joins, which in my experience get very messy internally as soon as you move away from the ideal case. I find generic-join-style methods much, much simpler to generalize in one's head (see https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...).
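
To illustrate for anyone following along: a binary join materializes R join S first and then joins T, while the generic-join style binds one variable at a time and intersects the candidates from every relation mentioning it. A toy Rust sketch of that idea for the triangle query R(a,b), S(b,c), T(a,c) - illustrative only, not the author's code:

    use std::collections::{BTreeMap, BTreeSet};

    fn index(rel: &[(u32, u32)]) -> BTreeMap<u32, BTreeSet<u32>> {
        let mut m: BTreeMap<u32, BTreeSet<u32>> = BTreeMap::new();
        for &(x, y) in rel {
            m.entry(x).or_default().insert(y);
        }
        m
    }

    // Enumerate triangles a-b-c with R(a,b), S(b,c), T(a,c),
    // binding variables in the order a, b, c and intersecting
    // the candidate sets for each new variable.
    fn triangles(r: &[(u32, u32)], s: &[(u32, u32)], t: &[(u32, u32)]) -> Vec<(u32, u32, u32)> {
        let r_by_a = index(r); // a -> {b}
        let s_by_b = index(s); // b -> {c}
        let t_by_a = index(t); // a -> {c}
        let mut out = Vec::new();
        for (&a, bs) in &r_by_a {
            let Some(cs_from_t) = t_by_a.get(&a) else { continue };
            for &b in bs {
                let Some(cs_from_s) = s_by_b.get(&b) else { continue };
                // c must satisfy both S(b, c) and T(a, c).
                for &c in cs_from_s.intersection(cs_from_t) {
                    out.push((a, b, c));
                }
            }
        }
        out
    }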
davery22 9 hours ago [-]
Related: McSherry's preceding blog post was all about demonstrating how binary joins can achieve worst-case optimal runtime, given suitable adjustments to the query plan.

- https://github.com/frankmcsherry/blog/blob/master/posts/2025...

kmicinski 8 hours ago [-]
For materialization-heavy workloads (program analysis, etc.), we often find that optimized binary join plans (profile-optimized, hand-optimized, etc.) beat worst-case optimal plans, because they scale better (less locking) without needing a trie-based representation. Within the space of worst-case optimal plans there are still lots of choices, but even a bad worst-case optimal plan can often beat a bad (randomly chosen) binary plan. And of course (the whole point of this exercise) there are some queries where every binary plan explodes and you do need WCOJ. There's also some work on making more traditional binary joins robust (https://db.in.tum.de/people/sites/birler/papers/diamond.pdf), among other interesting work (https://arxiv.org/html/2502.15181v1). Effectively parallelizing WCOJs is still an open problem as far as I am aware (at least, that is what folks working on it tell me), but I believe several groups are pursuing some exciting directions for tackling it.
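
For concreteness, one step of such a binary plan is conceptually just a hash build-and-probe, which is part of why it parallelizes well and why plan order matters so much (a textbook sketch, not any particular engine's code):

    use std::collections::HashMap;

    // One step of a binary join plan: build a hash table on one
    // input's join key, probe it with the other, emit matches.
    fn hash_join(
        left: &[(u32, u32)],  // (key, payload)
        right: &[(u32, u32)], // (key, payload)
    ) -> Vec<(u32, u32, u32)> {
        let mut built: HashMap<u32, Vec<u32>> = HashMap::new();
        for &(k, v) in left {
            built.entry(k).or_default().push(v);
        }
        let mut out = Vec::new();
        for &(k, w) in right {
            if let Some(vs) = built.get(&k) {
                for &v in vs {
                    out.push((k, v, w)); // one output row per match
                }
            }
        }
        out
    }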
tulio_ribeiro 13 hours ago [-]
"I, a notorious villain, was invited for what I was half sure was my long-due comeuppance." -- Best opening line of a technical blog post I've read all year.

The narrator's interjections were a great touch. It's rare to see a post that is this technically deep but also so fun to read. The journey through optimizing the aliasing query felt like a detective story. We, the readers, were right there with you, groaning at the 50GB memory usage and cheering when you got it down to 5GB.

Fantastic work, both on the code and the prose.

baby 13 hours ago [-]
[flagged]
Philpax 13 hours ago [-]
No offence, but your comment violates the HN guidelines: https://news.ycombinator.com/newsguidelines.html
rnikander 9 hours ago [-]
Some Clojure fans once told me they thought Datalog was better than SQL and that it was a shame the relational DBs all used SQL. I never dug into it enough to find out why they thought that.
kragen 3 hours ago [-]
Basically Datalog is much less verbose than SQL, imposes much lighter tariffs on factoring out views, and supports transitive closure enormously better. I started http://canonical.org/~kragen/binary-relations off with a simple nonrecursive query for which the SQL translation (given below) is already criminal and whose properly factored SQL solution merits the death penalty.

Recent revisions to ANSI SQL have added the capacity for recursion, so it's no longer completely impossible. But they have three big disadvantages:

1. They accidentally made SQL Turing-complete. Datalog queries, by contrast, are guaranteed to terminate.

2. They're still extremely clumsy to use.

3. Because of #1, they're often not implemented fully, so they are hard to rely on.

jitl 5 hours ago [-]
I struggle to understand the Clojure/Datomic dialect, but I agree generally. I recommend Percival for playing around with Datalog in a friendly notebook environment online: https://percival.ink/

Although there’s no “ANSI SQL”-equivalent standard across Datalog implementations, once you get the hang of the core idea it’s not too hard to understand another Datalog.

I started a Percival fork that compiles the Datalog to SQLite, if you want to see how the two can express the same thing: https://percival.jake.tl/ (unfinished when it comes to aggregates and more advanced joins, but the basic forms work okay). Logica is a much more serious / complete Datalog->SQL compiler written by a Google researcher; it compiles to BigQuery, DuckDB, and a few other SQL dialects (https://logica.dev/).

One area where Datalog is an order of magnitude easier is recursive queries / rules; this is possible in SQL but feels a bit like drinking playdough through a straw. Frank’s Materialize.com has a “WITH MUTUALLY RECURSIVE” SQL form (https://materialize.com/blog/recursion-in-materialize/) that’s much nicer than the ancient ANSI SQL recursive approach; we’re evaluating it for page-load queries & data sync at Notion.

Feldera has a similar form for recursive views as well (https://www.feldera.com/blog/recursive-sql-queries-in-felder...). I like that Feldera lets you make each “rule” or subview its own statement rather than packing everything into a single huge statement. The main downside I found when testing Feldera is that their SQL dialect has a bunch of limitations inherited from Apache Calcite; the Materialize SQL dialect tries very hard to be PostgreSQL compatible.

ben_pfaff 2 hours ago [-]
> Main downside I found when testing Feldera is that their SQL dialect has a bunch of limitations inherited from Apache Calcite

At Feldera, we're adding features to our SQL over time by contributing them upstream to Calcite, which makes it better for everyone. Mihai Budiu, the author of the Feldera SQL compiler, is a Calcite committer.

rienbdj 15 hours ago [-]
A new McSherry post! Excellent.

Last I checked, VMware had moved away from Differential Datalog?

jitl 15 hours ago [-]
The Differential Datalog team founded Feldera: https://www.feldera.com/

They switched from differential Datalog to differential SQL, I think because they realized Datalog is a really tough sell.

rebanevapustus 14 hours ago [-]
They did, and their product is great.

It is the only database/query engine that allows you to use the same SQL for both batch and streaming (with UDFs).

I have made an accessible version of a subset of Differential Dataflow (DBSP) in Python right here: https://github.com/brurucy/pydbsp

DBSP is so expressive that I have implemented a fully incremental dynamic datalog engine as a DBSP program.

Think of SQL/Datalog where the query can change at runtime, and the changes themselves (program diffs) are incrementally computed: https://github.com/brurucy/pydbsp/blob/master/notebooks/data...
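
The Z-set idea underneath DBSP is tiny. A rough Rust sketch of the concept (not pydbsp's API): records carry integer weights, +1 for insert and -1 for delete, and operators are functions on these deltas.

    use std::collections::HashMap;
    use std::hash::Hash;

    // A Z-set: records with integer weights (+1 insert, -1 delete).
    type ZSet<T> = HashMap<T, i64>;

    // Fold a delta into accumulated state; weights that cancel
    // to zero mean the record is gone.
    fn apply<T: Eq + Hash>(state: &mut ZSet<T>, delta: ZSet<T>) {
        for (rec, w) in delta {
            *state.entry(rec).or_insert(0) += w;
        }
        state.retain(|_, w| *w != 0);
    }

    // Linear operators like filter act on deltas directly, which is
    // what makes evaluation incremental: small change in, small work out.
    fn filter_delta<T: Clone + Eq + Hash>(delta: &ZSet<T>, keep: impl Fn(&T) -> bool) -> ZSet<T> {
        delta.iter()
            .filter(|(rec, _)| keep(rec))
            .map(|(rec, &w)| (rec.clone(), w))
            .collect()
    }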

gunnarmorling 12 hours ago [-]
> It is the only database/query engine that allows you to use the same SQL for both batch and streaming (with UDFs).

Flink SQL also checks that box.

jitl 9 hours ago [-]
Flink SQL is quite limited compared to Feldera/DBSP or Frank’s Materialize.com, and it has some correctness limitations: it’s “eventually consistent”, but until you stop the incoming data it’s unlikely ever to be actually correct when working with streaming joins. https://www.scattered-thoughts.net/writing/internal-consiste...
rebanevapustus 11 hours ago [-]
Not true.

There has to be some change in the code, and the two will not share the same semantics (and it perhaps won't work when retractions/deletions appear mid-stream). And let's not even get into the leaky abstractions required for good performance (watermarks et al.).

29athrowaway 13 hours ago [-]
If you wish to use Datalog and Rust, cozodb is written in Rust and has a Datalog query syntax.
jitl 12 hours ago [-]
CozoDB seems cool but also inactive. I poked around in it in November 2024 and found some low-hanging fruit in the SQLite storage backend: https://github.com/cozodb/cozo/issues/285
29athrowaway 7 hours ago [-]
It's not a lot of code so it's easy to tinker with.
rc00 15 hours ago [-]
Posted 1 day ago

https://news.ycombinator.com/item?id=44274592