HashQL: Research Preview

A hypergraph query language that supports multiple, independent time dimensions — enabling reasoning over how data evolves and when it was known

May 20th, 2026

Bilal Mahmoud, Member of Technical Staff, HASH

Yesterday we introduced hgres. Today we're releasing the first research preview of HashQL, the query language we've been building alongside it.

Why HashQL

Building hgres put us in an awkward spot. It sits at the intersection of three things that are usually kept apart:

Graph databases capable of representing data as a knowledge graph — an abstraction that lends itself well to modeling the real world;
A strong type system that encodes not only constraints but also semantic meaning; and
Temporality: the idea that each vertex in the graph, be it ontology or knowledge, is bound to a specific point in time across two independent time axes.

No existing query language gives you all three at once. The usual workaround is to bolt them on through extensions, or to glue a couple of languages together with application code. Both options leak abstractions: temporal axes tend to get lost, and queries that sound simple become computationally infeasible. HashQL exists because we needed a single language in which temporality, typing, and graph structure are all first-class, rather than hacky extensions on top of something else.

What is HashQL?

HashQL is a statically typed, functional query language. Whereas most query languages produce strings that databases parse and plan at runtime, HashQL queries are programs that go through a compiler — either at query time or ahead of time — which lets us apply the same techniques you'd expect from any other compiled language to reason about a query's structure before any of it executes.

At its core HashQL has a type system that extends SemType with closures, unions, intersections, and generic containers. It's structural by default in the TypeScript sense, with nominal typing available through newtypes — no branded types, no awkward indexing tricks. A Hindley-Milner style inference engine over subtyping constraints handles the bookkeeping, so most queries need no annotations at all; the compiler derives types from usage.

Conventional declarative query languages express traversals through pattern matching, turning every query into a subgraph isomorphism problem. That approach becomes hard to manage once multiple time axes are involved, because temporal traversals are difficult to express clearly as patterns. HashQL instead takes a traversal-based approach, similar to that of the Gremlin query language, which avoids these problems: every traversal is just a sequence of calls to traversal operators, and a query is just a program.

In HashQL you describe the traversal directly: start from a set of entities, filter, transform, follow links, aggregate.

head::entities(temporal_axes)
  |> body::filter(entity -> `https://.../person/v/1` in entity.entity_types)
  |> body::filter(entity -> entity.properties.`https://.../age` > 30)
  |> tail::collect

head::entities establishes the temporal slice and vertex type. Each body::filter closure receives a typed entity value and returns a boolean. The compiler verifies that the property paths exist on the declared type and that the comparison is valid. tail::collect materializes the result. Because closures are first-class, predicates compose and transformations are reusable. Aggregations, transformations, and filters all form part of the same typed program, with no separation between the language used to query and the language used to compute over the results.
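As a rough analogy only, the head/body/tail shape of the query above can be mimicked in Python with ordinary closures. This is not HashQL and not its API: the function names, the `Entity` class, and the two identifier constants (stand-ins for the URLs elided in the query) are all hypothetical, the temporal axes are left out, and Python performs none of the static checks the HashQL compiler does.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the type and property URLs elided in the query above.
PERSON_TYPE = "example:person/v/1"
AGE_PROPERTY = "example:age"


@dataclass
class Entity:
    entity_types: set
    properties: dict


def entities(store):
    """Head: produce the stream of candidate entities."""
    return iter(store)


def filter_by(predicate):
    """Body: wrap a predicate closure into a reusable stream stage."""
    return lambda stream: (e for e in stream if predicate(e))


def collect(stream):
    """Tail: materialize the result."""
    return list(stream)


def pipeline(source, *stages):
    """Apply each stage in order, like the |> operator in the query above."""
    for stage in stages:
        source = stage(source)
    return source


store = [
    Entity({PERSON_TYPE}, {AGE_PROPERTY: 42}),
    Entity({PERSON_TYPE}, {AGE_PROPERTY: 25}),
    Entity({"example:company/v/1"}, {}),
]

is_person = lambda e: PERSON_TYPE in e.entity_types
older_than_30 = lambda e: e.properties.get(AGE_PROPERTY, 0) > 30

result = pipeline(
    entities(store),
    filter_by(is_person),
    filter_by(older_than_30),
    collect,
)
```

Because each predicate is an ordinary value, `is_person` and `older_than_30` can be reused or combined in other pipelines, which is the composability the paragraph above describes.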

All interaction with the graph is isolated by the effect type Graph<T>. Graph operations compose but are opaque — only the host runtime can execute them — so ordinary expressions stay side-effect-free and equational reasoning is preserved, while the compiler keeps full control over when and where graph operations execute.
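One way to picture an opaque but composable effect is a value that only describes a graph read, plus a separate host function that is the sole place the read happens. The sketch below is a Python illustration under that assumption. Only the name `Graph` comes from the text; `map`, `host_run`, and the dictionary backend are hypothetical and say nothing about how HashQL implements Graph<T>.

```python
class Graph:
    """Describes a graph read without performing it (hypothetical model of Graph<T>)."""

    def __init__(self, request, transform=lambda value: value):
        self._request = request
        self._transform = transform

    def map(self, fn):
        # Composing returns a new description; nothing is executed here.
        inner = self._transform
        return Graph(self._request, lambda value: fn(inner(value)))


def host_run(operation, backend):
    """Stand-in for the host runtime, the only place a read actually happens."""
    return operation._transform(backend[operation._request])


# Building the operation touches no data at all.
oldest = Graph("ages").map(sorted).map(lambda ages: ages[-1])

# Only the host decides when, and against which data, it runs.
backend = {"ages": [41, 25, 33]}
result = host_run(oldest, backend)
```

Since `oldest` is inert until the host runs it, the pure functions passed to `map` can be reasoned about in isolation.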

But typing a query is only half the problem. The other half is what to do when the data the query touches lives in more than one place.

The heart of the problem: heterogeneous data

In practice, no single database handles every shape of data well: relational data may belong in PostgreSQL, embeddings in a dedicated vector store, and time-series data somewhere else again. Instead of trying to build "the ultimate datastore" that does everything, hgres and HashQL treat specialization as the starting point, rather than as an enemy to be eliminated.

But that raises the question: how do we query data that's spread across multiple backends? One answer is to push that complexity onto the user: make them specify where each piece of data comes from, wire up different APIs, and write different query syntaxes. This is how most systems work today. But we started asking ourselves: what if the user didn't need to decide? What if we could query data transparently, regardless of where it lives? And if we can do that, can we colocate computation with data, minimizing the transfer and computational overhead?

After three years of research and experimentation we now know the answer to be "yes". While HashQL began as a query language, the same compiler-first principles also let us assign computation transparently across heterogeneous execution targets, each with its own capabilities, performance characteristics, and comparative advantages. None of this complexity has to be surfaced to query authors. Rather, at the system level, you define where data lives and how it can be accessed, and HashQL takes care of the rest. The remainder of this post explains how.

The compiler

Instead of generating query strings, HashQL compiles. Similar to languages such as Rust, HashQL moves through a chain of intermediate representations:

A parser produces an abstract syntax tree (AST). The language is deliberately syntax-agnostic, so different surface syntaxes can target the same compiler.
The AST is then lowered into a higher-level intermediate representation (HIR) that preserves the programmer's intent and drives type checking and inference.
The HIR is in turn lowered into a middle-level IR (MIR), and the shape of the language shifts. HashQL has no statements at the source level, only expressions, but the MIR is statement-based and laid out as a control-flow graph in SSA form. The flat representation buys us two things: standard compiler optimizations (constant propagation, dead store elimination, inlining, CFG simplification) become directly applicable, and target placement can operate at instruction-level granularity. The MIR is the same kind of representation that systems languages compile through. Each basic block is a sequence of statements ending in a single terminator, and graph effects are modeled as terminators that suspend execution and yield to an orchestrator. Because effects are control-flow boundaries, the same representation serves optimization, placement, and runtime dispatch alike, without needing special cases for any of them.
The MIR is lowered and optimized a final time, and the question of where to run each individual piece of the query — not the program as a whole, but its constituent blocks — is now ready to be answered.
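To make the MIR description more concrete, here is a small Python model of a basic block (a list of SSA statements plus one terminator) with a constant propagation pass over it. It is a sketch under stated assumptions: the class names, the `%n` naming, the operator set, and the `graph_read` terminator kind are all hypothetical and are not taken from the HashQL compiler.

```python
import operator
from dataclasses import dataclass
from typing import Optional

OPS = {"add": operator.add, "mul": operator.mul, "gt": operator.gt}


@dataclass
class Assign:
    dest: str    # SSA name, assigned exactly once
    op: str      # "const" or a key of OPS
    args: tuple  # a literal for "const", operand names otherwise


@dataclass
class Terminator:
    kind: str                     # "goto", "return", or "graph_read" (an effect)
    target: Optional[str] = None  # block to continue at, if any


@dataclass
class BasicBlock:
    name: str
    statements: list
    terminator: Terminator


def propagate_constants(block):
    """Fold statements whose operands are all known constants."""
    known = {}
    folded = []
    for stmt in block.statements:
        if stmt.op == "const":
            known[stmt.dest] = stmt.args[0]
            folded.append(stmt)
        elif stmt.op in OPS and all(arg in known for arg in stmt.args):
            value = OPS[stmt.op](*(known[arg] for arg in stmt.args))
            known[stmt.dest] = value
            folded.append(Assign(stmt.dest, "const", (value,)))
        else:
            folded.append(stmt)  # depends on a runtime value; left as-is
    return BasicBlock(block.name, folded, block.terminator)


block = BasicBlock(
    "bb0",
    [
        Assign("%0", "const", (10,)),
        Assign("%1", "const", (3,)),
        Assign("%2", "mul", ("%0", "%1")),   # foldable: 10 * 3
        Assign("%3", "gt", ("%age", "%2")),  # %age is a block input, not a constant
    ],
    Terminator("graph_read", target="bb1"),  # the effect ends the block
)
optimized = propagate_constants(block)
```

Because every name is assigned once, a single dictionary of known values is enough for the pass, and the effect terminator is carried through untouched as the block boundary.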

Figure: Source (surface syntax) -> Parse -> AST (syntax-agnostic tree) -> Lower -> HIR (programmer intent; type inference) -> Lower -> MIR (SSA control-flow graph; graph effects as terminators) -> Optimize -> Opt MIR (const-prop, DSE, inlining) -> Solve -> Placement (cost-driven backend assignment) -> Codegen -> Islands (per-backend artifacts + orchestrator plan)

HashQL compiler pipeline: source code is parsed into an AST, lowered into a higher-level intermediate representation that drives type inference, lowered again into a middle-level IR in SSA form, optimized, partitioned by a placement solver, and finally lowered into per-backend execution islands together with an orchestrator plan.

Transparent placement

Placement analysis assigns each basic block to the backend that can execute it most efficiently. Not every backend can run every instruction: PostgreSQL handles arithmetic, comparisons, and aggregate construction, but rejects closures and function calls; the interpreter handles the full instruction set, but at higher per-instruction cost. These capability constraints are hard — if a statement produces a value a backend cannot represent, every downstream consumer of that value is also ineligible on that backend. Correctness defines the feasible space; within it, a cost model guides the solver toward the cheapest assignment, approximating ideal placement cost along three axes:

Computational cost. The cost of executing a given operation on a given backend. PostgreSQL evaluating a comparison inside the database engine is far cheaper than the interpreter doing the same work per-row.
Transfer cost. The cost of moving data between backends at every transition: serialization, deserialization, and bandwidth.
Data locality. The cost of retrieving vertex data from the graph when the data originates on a different backend than the one executing.

Separating data locality from transfer cost may seem like double-counting at first, but it isn't. Data locality is a property of the backend target we execute on, whereas transfer cost arises universally at every backend transition regardless of where the data originated. The full reasoning is explored further in the linked documents below.
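The interplay of hard capability constraints and the three cost terms can be sketched as a small dynamic program over a straight-line chain of blocks. Everything numeric here is invented for illustration, and the real solver works on a control-flow graph rather than a chain. The capability names, `TRANSFER`, and the `ENTRY_ONLY` set are hypothetical. `ENTRY_ONLY` encodes the rule, described under Execution, that nothing transitions back into PostgreSQL.

```python
# Hypothetical capabilities and costs; the numbers are illustrative only.
BACKENDS = {
    "postgres": {"caps": {"arith", "compare"}, "compute": 1.0, "locality": 0.0},
    "interpreter": {"caps": {"arith", "compare", "closure"}, "compute": 5.0, "locality": 2.0},
}
TRANSFER = 3.0             # paid at every backend transition
ENTRY_ONLY = {"postgres"}  # backends that may never be transitioned into


def place(blocks):
    """blocks: list of (needed capability, reads vertex data?) in execution order.

    Returns (total cost, backend per block) for the cheapest feasible assignment.
    """
    best = {None: (0.0, [])}  # last backend used -> (cost so far, assignment)
    for need, reads_vertex in blocks:
        nxt = {}
        for name, spec in BACKENDS.items():
            if need not in spec["caps"]:
                continue  # hard constraint: backend cannot run this block
            step = spec["compute"] + (spec["locality"] if reads_vertex else 0.0)
            candidates = [
                (cost + (TRANSFER if prev not in (None, name) else 0.0) + step, path + [name])
                for prev, (cost, path) in best.items()
                if prev in (None, name) or name not in ENTRY_ONLY
            ]
            if candidates:
                nxt[name] = min(candidates, key=lambda c: c[0])
        best = nxt
    return min(best.values(), key=lambda c: c[0])


chain = [("compare", True), ("arith", False), ("closure", False), ("arith", False)]
cost, assignment = place(chain)
```

In this toy chain the closure forces a move to the interpreter at the third block, and the final arithmetic block stays there even though PostgreSQL would compute it more cheaply, because re-entering PostgreSQL is not a legal transition.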

A solver applies the cost model and partitions the MIR into execution islands: connected groups of same-backend blocks that get dispatched as a unit. Each island is then handed to a target-specific code generator, similar to how Rust hands its MIR off to LLVM, except that we have one backend per execution target rather than one per CPU architecture, and each target produces its own artifacts. While that process might sound slow, the whole sequence from MIR optimization through solving to code generation takes just 0.72% of total pipeline time, and once the artifacts and island graph are produced, the result is cached and reused without recompilation.
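Taking "connected groups of same-backend blocks" literally, islands can be computed as connected components over the control-flow edges whose endpoints share a backend. The following Python sketch does that with a small union-find. The block names, the edge list, and the `islands` function are hypothetical, and the actual partitioning in HashQL may differ in its details.

```python
def islands(assignment, edges):
    """Group blocks into islands.

    assignment: block name -> backend name
    edges: iterable of (source block, destination block) control-flow edges
    Returns a sorted list of (backend, sorted block names).
    """
    parent = {block: block for block in assignment}

    def find(block):
        # Follow parent links to the representative, halving the path as we go.
        while parent[block] != block:
            parent[block] = parent[parent[block]]
            block = parent[block]
        return block

    for src, dst in edges:
        if assignment[src] == assignment[dst]:
            parent[find(src)] = find(dst)  # same backend and connected: merge

    groups = {}
    for block in assignment:
        groups.setdefault(find(block), []).append(block)
    return sorted((assignment[members[0]], sorted(members)) for members in groups.values())


assignment = {
    "bb0": "postgres",
    "bb1": "postgres",
    "bb2": "interpreter",
    "bb3": "interpreter",
    "bb4": "interpreter",
}
edges = [("bb0", "bb1"), ("bb1", "bb2"), ("bb1", "bb3"), ("bb2", "bb4"), ("bb3", "bb4")]
result = islands(assignment, edges)
```

Here the two PostgreSQL blocks form one island and the three interpreter blocks, joined through `bb4`, form another. Each island would then go to its own code generator.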

Execution

Finally, at runtime, an orchestrator coordinates execution across backends. Two details here are worth flagging.

First, PostgreSQL is outgoing-only in our transition model: execution can exit from PostgreSQL into another backend, but no backend can transition back into PostgreSQL. That constraint allows us to guarantee that every PostgreSQL island forms an entry-side prefix of any execution path. The head determination (which entities match the temporal slice) and all PostgreSQL island evaluation therefore collapse into a single prepared statement executed in just one database round-trip.
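The prefix property can be stated as a simple check on any execution path: once a path leaves PostgreSQL, PostgreSQL must not appear again. The helper below is a hypothetical Python illustration of that invariant, not part of HashQL. It splits a path into the PostgreSQL prefix (the part that could collapse into one prepared statement) and the remainder.

```python
def split_at_postgres_exit(path):
    """path: backend names in execution order.

    Returns (postgres prefix, remainder); raises if the path re-enters postgres.
    """
    exit_index = next(
        (i for i, backend in enumerate(path) if backend != "postgres"),
        len(path),
    )
    if "postgres" in path[exit_index:]:
        raise ValueError("illegal transition back into postgres")
    return path[:exit_index], path[exit_index:]


prefix, remainder = split_at_postgres_exit(["postgres", "postgres", "interpreter"])

try:
    split_at_postgres_exit(["postgres", "interpreter", "postgres"])
    reentry_rejected = False
except ValueError:
    reentry_rejected = True
```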

Second, each row that comes back from PostgreSQL carries a continuation: a triple-valued filter where TRUE means the row was accepted inside the database, FALSE means it was rejected (and the WHERE clause has already filtered it out), and NULL means execution needs to continue elsewhere. NULL rows additionally carry the block to resume at and the live-out state. In practice, most of the work happens inside PostgreSQL; the interpreter only picks up what PostgreSQL cannot express — closures, complex control flow, and operations without a SQL translation.
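A minimal sketch of how an orchestrator might act on the continuation value, assuming rows arrive as dictionaries. The field names mirror the terms used above (`continuation`, `resume_at`, `live_out`), while the row layout, the `dispatch` function, and the `resume` callback standing in for the interpreter are hypothetical.

```python
def dispatch(rows, resume):
    """Return the entities accepted either by the database or by the interpreter.

    rows: dicts with 'entity' and 'continuation' (True, False, or None);
          None rows also carry 'resume_at' and 'live_out'.
    resume: callable(block name, live-out state) -> bool, standing in for
            the interpreter picking up at the given block.
    """
    accepted = []
    for row in rows:
        if row["continuation"] is True:
            accepted.append(row["entity"])  # decided inside the database
        elif row["continuation"] is None:
            # Undecided: continue execution elsewhere with the carried state.
            if resume(row["resume_at"], row["live_out"]):
                accepted.append(row["entity"])
        # FALSE rows are already removed by the WHERE clause; skipped defensively.
    return accepted


rows = [
    {"entity": "alice", "continuation": True},
    {"entity": "bob", "continuation": None, "resume_at": "bb_03", "live_out": {"score": 7}},
    {"entity": "carol", "continuation": None, "resume_at": "bb_03", "live_out": {"score": 2}},
]

# Hypothetical interpreter predicate that only needs the live-out state.
result = dispatch(rows, resume=lambda block, live_out: live_out["score"] > 5)
```

Only the rows marked NULL (`None` here) cost any interpreter work, which matches the observation that most of the work stays inside PostgreSQL.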

When the interpreter encounters a graph effect — a traversal that needs the orchestrator — it yields a suspension. The orchestrator fulfills the request, resumes the interpreter, and the cycle continues. Only the vertex fields that downstream islands actually need are transferred, determined statically by traversal-aware liveness analysis performed during compilation.
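Python generators give a compact model of this suspend-and-resume cycle: a `yield` plays the role of a graph-effect suspension, and the driver that calls `send` plays the orchestrator. This is an analogy under stated assumptions, not HashQL's runtime. The request tuples, the dictionary standing in for the graph, and both function names are hypothetical.

```python
def interpreter_island(entity_id):
    """Find the highest age among an entity's linked entities."""
    # Each yield models a graph effect: execution suspends until fulfilled.
    linked = yield ("outgoing_links", entity_id)
    ages = []
    for other in linked:
        age = yield ("property", other, "age")
        ages.append(age)
    return max(ages, default=None)


def orchestrate(island, graph):
    """Drive an island to completion, fulfilling each suspension from `graph`."""
    try:
        request = next(island)  # run until the first suspension
        while True:
            request = island.send(graph[request])  # fulfill, then resume
    except StopIteration as finished:
        return finished.value  # the island's final result


graph = {
    ("outgoing_links", "alice"): ["bob", "carol"],
    ("property", "bob", "age"): 31,
    ("property", "carol", "age"): 45,
}
result = orchestrate(interpreter_island("alice"), graph)
```

The island never touches `graph` directly. It only names what it needs, so the orchestrator stays free to decide how each request is fulfilled and how much data is transferred.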

PostgreSQL island (execution island): closure-free arithmetic, comparisons, aggregate construction. Executes as a single prepared statement, one round-trip.

WHERE entity_type = 'person'
  AND properties->'age' > 30
RETURN row, continuation,
       resume_at, live_out;

Interpreter island (execution island): closures, complex control flow, graph effects. Picks up where Postgres cannot continue.

resume bb_03 {
  for entity in live_out:
    let related = traverse(entity, link)
    if predicate(related): emit(entity)
}

Row continuation column

Row     Continuation   Meaning
row 1   TRUE           accepted inside Postgres; row continues
row 2   FALSE          rejected by the WHERE clause; row dropped
row 3   NULL           needs interpreter; carries resume_at + live-out state

A small MIR fragment partitioned into a PostgreSQL island and an interpreter island. The Postgres island returns each row with a triple-valued continuation column: TRUE means accepted in Postgres, FALSE means dropped by the WHERE clause, NULL means the row must continue execution in the interpreter and carries resume-at and live-out state with it.

Where this comes from

HashQL started as a research project at the Chair for Compiler Construction at TU Dresden. The language design, type system, and compiler frontend are the subject of "A Processing Pipeline for Querying Bi-temporal Graph Databases", while everything else (from the MIR and placement through to code generation and orchestration) is explored in "An Extended Evaluator and Backend for Querying Bi-temporal Graph Databases". Both theses are publicly available and go into substantially more detail than this post can.

The hgres.org/hashql documentation sits between this post and the theses: it covers the type system, compiler pipeline, target placement, and code generation and execution at a level that's meant to be read without academic context.

What's next

This is a research preview, not a 1.0 release of HashQL. The version of HashQL launched today supports read queries with filter and collect operations across two backends. Next up, we'll be adding support for mutation operations (create, update, and delete with transactional semantics), additional execution backends beyond PostgreSQL and the interpreter, a lower-level IR for backend-specific optimization, and further passes in the canonicalization pipeline. Adding new backends is cheap by design: it requires only new cost entries and capability constraints, rather than restructuring the solver itself.

Get involved

HashQL is open-source as part of hgres and welcomes contributions. Try it out, build something on top of it, and tell us what breaks — either by getting in touch or by opening an issue or discussion in the HASH repository.
