A hypergraph query language that supports multiple, independent time dimensions — enabling reasoning over how data evolves and when it was known
May 20th, 2026
Yesterday we introduced hgres. Today we're releasing the first research preview of HashQL, the query language we've been building alongside it.
Building hgres put us in an awkward spot. It sits at the intersection of three things that are usually kept apart:
No existing query language gives you all three at once. The usual workaround is to bolt them on through extensions or to glue a couple of languages together with application code. Both options leak abstractions: temporal axes tend to get lost, and queries that sound simple become computationally infeasible. HashQL exists because we needed a single language in which temporality, typing, and graph structure are all first-class, rather than hacky extensions on top of something else.
HashQL is a statically typed, functional query language. Whereas most query languages produce strings that databases parse and plan at runtime, HashQL queries are programs that go through a compiler — either at query time or ahead of time — which lets us apply the same techniques you'd expect from any other compiled language to reason about a query's structure before any of it executes.
At its core HashQL has a type system that extends SemType with closures, unions, intersections, and generic containers. It's structural by default in the TypeScript sense, with nominal typing available through newtypes — no branded types, no awkward indexing tricks. A Hindley-Milner style inference engine over subtyping constraints handles the bookkeeping, so most queries need no annotations at all; the compiler derives types from usage.
Typical declarative query languages express traversals through pattern matching, which turns every query into a subgraph isomorphism problem. Pattern matching becomes hard to manage once multiple time axes are involved, because temporal traversals are difficult to express clearly as patterns. HashQL instead takes a traversal-based approach, similar to that of the Gremlin query language, which avoids these problems: every traversal is a sequence of calls to traversal operators, and a query is just a program.
In HashQL you describe the traversal directly: start from a set of entities, filter, transform, follow links, aggregate.
head::entities(temporal_axes)
|> body::filter(entity -> `https://.../person/v/1` in entity.entity_types)
|> body::filter(entity -> entity.properties.`https://.../age` > 30)
|> tail::collect
head::entities establishes the temporal slice and vertex type. Each body::filter closure receives a typed entity value and returns a boolean. The compiler verifies that the property paths exist on the declared type and that the comparison is valid. tail::collect materializes the result. Because closures are first-class, predicates compose and transformations are reusable. Aggregations, transformations, and filters all form part of the same typed program, with no separation between the language used to query and the language used to compute over the results.
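To make the point about composable predicates concrete, here is a minimal Python sketch (not HashQL) that mimics the shape of the pipeline over plain dictionaries. The helpers `has_type`, `older_than`, `both`, and `run`, the sample data, and the `example:` type identifiers are all hypothetical and exist only for illustration; unlike HashQL, nothing here is checked statically.

```python
from typing import Callable, Iterable

Entity = dict
Predicate = Callable[[Entity], bool]

# Hypothetical type identifier, not a real type URL.
PERSON = "example:person/v/1"


def has_type(type_id: str) -> Predicate:
    """Build a predicate that accepts entities carrying the given type."""
    return lambda entity: type_id in entity["entity_types"]


def older_than(years: int) -> Predicate:
    """Build a predicate that accepts entities whose age exceeds `years`."""
    return lambda entity: entity["properties"].get("age", 0) > years


def both(p: Predicate, q: Predicate) -> Predicate:
    """Compose two predicates into one; the result is itself a predicate."""
    return lambda entity: p(entity) and q(entity)


def run(entities: Iterable[Entity], *filters: Predicate) -> list:
    """Apply each filter in order, then materialize (the 'collect' step)."""
    stream = iter(entities)
    for predicate in filters:
        stream = filter(predicate, stream)
    return list(stream)


entities = [
    {"entity_types": [PERSON], "properties": {"age": 42}},
    {"entity_types": [PERSON], "properties": {"age": 25}},
    {"entity_types": ["example:company/v/1"], "properties": {"age": 99}},
]

# A reusable predicate built from smaller ones.
adult_person = both(has_type(PERSON), older_than(30))
result = run(entities, adult_person)
```

Chaining the two filters separately, as in `run(entities, has_type(PERSON), older_than(30))`, gives the same result as the composed predicate, which is the property that makes predicates reusable building blocks.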
All interaction with the graph is isolated by the effect type Graph<T>. Graph operations compose but are opaque — only the host runtime can execute them — so ordinary expressions stay side-effect-free and equational reasoning is preserved, while the compiler keeps full control over when and where graph operations execute.
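One way to picture an opaque, composable effect is as a plain description of work that only a runtime can carry out. The Python sketch below is an analogy under that assumption, not HashQL's actual `Graph<T>`: the `Graph` class, its `map` method, the `read_entities` operation name, and `HostRuntime` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Graph:
    """An opaque description of a graph operation. Building one runs nothing."""
    op: str
    then: tuple = ()  # pure functions to apply to the result, in order

    def map(self, fn: Callable[[Any], Any]) -> "Graph":
        # Composition only extends the description; no graph access happens here.
        return Graph(self.op, self.then + (fn,))


class HostRuntime:
    """The only component in this sketch that is able to execute a Graph value."""

    def __init__(self, store: list):
        self._store = store

    def execute(self, effect: Graph) -> Any:
        if effect.op != "read_entities":  # hypothetical operation name
            raise ValueError(f"unknown graph operation: {effect.op}")
        value = list(self._store)
        for fn in effect.then:
            value = fn(value)
        return value


# Pure code: describes "read entities, then count them" without touching data.
count_entities = Graph("read_entities").map(len)

# Only the runtime turns the description into a value.
total = HostRuntime(["a", "b", "c"]).execute(count_entities)
```

Because `count_entities` is just a value, it can be passed around, compared, and composed freely, and the runtime alone decides when and where it runs.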
But typing a query is only half the problem. The other half is what to do when the data the query touches lives in more than one place.
In practice, no single database handles every shape of data well: relational data may belong in PostgreSQL, embeddings in a dedicated vector store, and time-series data somewhere else again. Instead of trying to build "the ultimate datastore" that does everything, hgres and HashQL treat specialization as the starting point, rather than as an enemy to be eliminated.
But that raises the question: how do we query data that's spread across multiple backends? One answer is to push that complexity to the user: make them specify where each piece of data comes from, wire up different APIs, and write different query syntaxes. This is how most systems work today. But we started asking ourselves: what if the user didn't need to decide? What if we could query data transparently, regardless of where it lives? And if we can do that, can we colocate computation with data, minimizing the transfer and computational overhead?
After three years of research and experimentation we now know the answer to be "yes". While HashQL began as a query language, the same compiler-first principles also let us assign computation transparently across heterogeneous execution targets, each with its own capabilities, performance characteristics, and comparative advantages. None of this complexity has to be surfaced to query authors. Rather, at the system level, you define where data lives and how it can be accessed, and HashQL takes care of the rest. The remainder of this post explains how.
Instead of generating query strings, HashQL compiles. Similar to languages such as Rust, HashQL moves through a chain of intermediate representations:
Placement analysis assigns each basic block to the backend that can execute it most efficiently. Not every backend can run every instruction: PostgreSQL handles arithmetic, comparisons, and aggregate construction, but rejects closures and function calls; the interpreter handles the full instruction set, but at higher per-instruction cost. These capability constraints are hard — if a statement produces a value a backend cannot represent, every downstream consumer of that value is also ineligible on that backend. Correctness defines the feasible space; within it, a cost model guides the solver toward the cheapest assignment, approximating ideal placement cost along three axes:
Separating data locality from transfer cost may look like double-counting at first, but it isn't. Data locality is a property of the backend target we execute on, whereas transfer cost arises universally at every backend transition regardless of where the data originated. The full reasoning is explored further in the linked documents below.
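The hard capability constraints described above can be sketched as a simple forward pass over statements. The Python below is a simplified illustration, not the actual placement analysis: the `Stmt` shape, the `SUPPORTED` capability table, and the rule that "a backend cannot represent the result of an instruction it does not support" are assumptions made for the example, and the cost model is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    name: str         # the value this statement defines
    op: str           # instruction kind
    uses: tuple = ()  # names of the values it consumes


# Hypothetical capability table, loosely following the prose.
SUPPORTED = {
    "postgres": {"load", "arith", "compare", "aggregate"},
    "interpreter": {"load", "arith", "compare", "aggregate", "closure", "call"},
}


def eligible_backends(stmts):
    """Map each defined value to the backends that may compute it.

    Statements must be listed in definition-before-use order. A backend is
    ruled out if it cannot run the instruction, or if any consumed value
    was already ruled out on that backend (ineligibility flows downstream).
    """
    eligible = {}
    for stmt in stmts:
        allowed = {b for b, ops in SUPPORTED.items() if stmt.op in ops}
        for used in stmt.uses:
            allowed &= eligible[used]
        eligible[stmt.name] = allowed
    return eligible


program = [
    Stmt("age", "load"),
    Stmt("adult", "compare", ("age",)),
    Stmt("pred", "closure"),
    Stmt("hit", "call", ("pred", "age")),
    Stmt("flag", "compare", ("hit",)),  # plain comparison, tainted input
]
feasible = eligible_backends(program)
```

Here `flag` is an ordinary comparison, which the hypothetical PostgreSQL entry supports, yet it is restricted to the interpreter because it consumes `hit`. A cost model would then choose among whatever backends remain in each set.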
A solver applies the cost model and partitions the MIR into execution islands: connected groups of same-backend blocks that get dispatched as a unit. Each island is then handed to a target-specific code generator, similar to how Rust hands its MIR off to LLVM, except that we have one backend per execution target rather than one per CPU architecture, and each target produces its own artifacts. While that process might sound slow, this whole stage, from MIR optimization through solving to code generation, takes just 0.72% of total pipeline time, and once the artifacts and island graph are produced the result is cached and reused without recompilation.
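Grouping blocks into islands can be pictured as finding connected components once every block has a backend. The sketch below assumes a placement has already been chosen and treats control-flow edges as undirected for grouping purposes; both the `execution_islands` function and the block names are hypothetical, and the real partitioning may differ.

```python
from collections import defaultdict


def execution_islands(placement, edges):
    """Group blocks into maximal connected sets that share a backend.

    placement: dict mapping block name -> backend name
    edges:     iterable of (source_block, target_block) control-flow edges
    Returns a list of (backend, sorted block names), ordered by first block.
    """
    # Keep only the edges whose endpoints run on the same backend.
    adjacent = defaultdict(set)
    for src, dst in edges:
        if placement[src] == placement[dst]:
            adjacent[src].add(dst)
            adjacent[dst].add(src)

    islands, seen = [], set()
    for block in sorted(placement):
        if block in seen:
            continue
        # Depth-first walk collects every block reachable via same-backend edges.
        stack, members = [block], set()
        while stack:
            current = stack.pop()
            if current in members:
                continue
            members.add(current)
            stack.extend(adjacent[current] - members)
        seen |= members
        islands.append((placement[block], sorted(members)))
    return islands


placement = {
    "bb0": "postgres",
    "bb1": "postgres",
    "bb2": "interpreter",
    "bb3": "interpreter",
    "bb4": "interpreter",
}
edges = [("bb0", "bb1"), ("bb1", "bb2"), ("bb2", "bb3"), ("bb1", "bb4")]
islands = execution_islands(placement, edges)
```

Note that `bb4` ends up in its own island even though it shares a backend with `bb2` and `bb3`: it is only reachable through a PostgreSQL block, so it is not connected to them by same-backend edges.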
Finally, at runtime, an orchestrator coordinates execution across backends. Two details here are worth flagging.
First, PostgreSQL is outgoing-only in our transition model: execution can exit from PostgreSQL into another backend, but no backend can transition back into PostgreSQL. That constraint allows us to guarantee that every PostgreSQL island forms an entry-side prefix of any execution path. The head determination (which entities match the temporal slice) and all PostgreSQL island evaluation therefore collapse into a single prepared statement executed in just one database round-trip.
Second, each row that comes back from PostgreSQL carries a continuation: a three-valued filter where TRUE means the row was accepted inside the database, FALSE means it was rejected (and the WHERE clause has already filtered it out), and NULL means execution needs to continue elsewhere. NULL rows additionally carry the block to resume at and the live-out state. In practice, most of the work happens inside PostgreSQL; the interpreter only picks up what PostgreSQL cannot express: closures, complex control flow, and operations without a SQL translation.
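The three outcomes of the continuation can be sketched as follows. This is an illustrative Python model, not the orchestrator's real data layout: the `Row` fields, the `finish` function, and the `interpret` callback are hypothetical names, with Python's `None` standing in for SQL NULL.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Row:
    entity: str
    continuation: Optional[bool]        # True, False, or None (SQL NULL)
    resume_block: Optional[str] = None  # only meaningful when continuation is None
    live_out: dict = field(default_factory=dict)


def finish(rows, interpret: Callable[[str, dict], bool]) -> list:
    """Resolve each row: accept, skip, or hand over to the interpreter."""
    accepted = []
    for row in rows:
        if row.continuation is True:
            # Fully decided inside the database.
            accepted.append(row.entity)
        elif row.continuation is None:
            # Undecided: resume at the recorded block with the live-out state.
            if interpret(row.resume_block, row.live_out):
                accepted.append(row.entity)
        # False rows are normally removed by the WHERE clause already;
        # they are skipped here defensively.
    return accepted


def interpret(block: str, state: dict) -> bool:
    # Stand-in for the interpreter: a predicate the database could not express.
    return state["age"] > 30


rows = [
    Row("alice", True),
    Row("bob", None, "bb2", {"age": 41}),
    Row("carol", None, "bb2", {"age": 12}),
    Row("dave", False),
]
accepted = finish(rows, interpret)
```

Only the NULL rows reach the interpreter, which matches the idea that it picks up just the remainder of the work.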
When the interpreter encounters a graph effect — a traversal that needs the orchestrator — it yields a suspension. The orchestrator fulfills the request, resumes the interpreter, and the cycle continues. Only the vertex fields that downstream islands actually need are transferred, determined statically by traversal-aware liveness analysis performed during compilation.
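The yield-and-resume cycle maps naturally onto coroutines. The sketch below uses a Python generator as a stand-in for the interpreter; the request tuples (`outgoing_links`, `read_field`), the `fulfil` and `orchestrate` functions, and the in-memory `graph` are hypothetical and chosen only to show the control flow.

```python
def interpreter(entity_id):
    """Generator standing in for the interpreter.

    Each `yield` is a suspension: it hands a request to the orchestrator and
    receives the fulfilled value when resumed via send().
    """
    links = yield ("outgoing_links", entity_id)
    names = []
    for target in links:
        # Request only the single field that is needed downstream.
        name = yield ("read_field", target, "name")
        names.append(name)
    return names


def fulfil(graph, request):
    """Answer one suspension request from an in-memory graph."""
    kind, *args = request
    if kind == "outgoing_links":
        return graph["links"][args[0]]
    if kind == "read_field":
        target, field_name = args
        return graph["fields"][target][field_name]
    raise ValueError(f"unknown request: {kind}")


def orchestrate(program, graph):
    """Drive the program: fulfil each suspension, resume, repeat until done."""
    try:
        request = next(program)
        while True:
            request = program.send(fulfil(graph, request))
    except StopIteration as done:
        return done.value


graph = {
    "links": {"a": ["b", "c"]},
    "fields": {"b": {"name": "Bob"}, "c": {"name": "Cy"}},
}
names = orchestrate(interpreter("a"), graph)
```

The interpreter never touches `graph` directly; it only describes what it needs, and the orchestrator decides how to supply it.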
HashQL started as a research project at the Chair for Compiler Construction at TU Dresden. The language design, type system, and compiler frontend are the subject of "A Processing Pipeline for Querying Bi-temporal Graph Databases", while everything else (from the MIR and placement through to code generation and orchestration) is explored in "An Extended Evaluator and Backend for Querying Bi-temporal Graph Databases". Both papers are publicly available and go into substantially more detail than this post can.
The hgres.org/hashql documentation sits between this post and the theses: it covers the type system, compiler pipeline, target placement, and code generation and execution at a level that's meant to be read without academic context.
This is a research preview, not a 1.0 release of HashQL. The version of HashQL launched today supports read queries with filter and collect operations across two backends. Next up, we'll be adding support for mutation operations (create, update, and delete with transactional semantics), additional execution backends beyond PostgreSQL and the interpreter, a lower-level IR for backend-specific optimization, and further passes in the canonicalization pipeline. Adding new backends is cheap by design: it requires only new cost entries and capability constraints, rather than a restructuring of the solver itself.
HashQL is open-source as part of hgres and welcomes contributions. Try it out, build something on top of it, and tell us what breaks — either by getting in touch or by opening an issue or discussion in the HASH repository.