
Multi-temporal versioning in Postgres

Implementing an n-temporal datastore from scratch atop PostgreSQL

December 15th, 2024

Tim Diekmann, Senior Platform Engineer, HASH

Versioned Datastores

Datastores like databases, file systems, or version control systems often include a way to store the date and time when a record was added, or its last-modified timestamp. Some datastores even store timestamps on different timelines, also referred to as temporal axes. For example, an application which supports offline editing might want to record the moment when a user edited a document, versus the time that the edit was submitted to the backend.

Storing this information is particularly useful for applications with a heavy focus on data, where it’s important to have confidence in how up-to-date information is. Our vision of HASH is a highly-featured application that includes:

  • collaborative realtime editing;
  • offline editing; and
  • an auditable and explainable datastore.

These are all features which benefit from a multi-temporal versioning model in the application’s datastore (i.e. a model which utilizes multiple temporal axes). To understand this better, in this post we explore why some applications need versioning at all, and how you can extend that model to multiple time axes. We think this is of general interest to application builders. After that, for anybody interested in the nitty-gritty, we’ll do a deeper dive into how we actually implemented our system in Postgres for HASH, and some of the challenges involved.

What is a Versioned Datastore?

Let’s assume we have a datastore which only contains the current state of entities. When someone updates the content of a paragraph in such a datastore, the new content overwrites the previous version, erasing that information forever.

If someone else wants to revert that update, they could rewrite the paragraph with the same content as the original. However, this is not a true reversion of the update, as the metadata associated with the contents, such as the user who wrote the content, may not match. Determining the author of the original contents of the paragraph may not be possible.

An alternative to this is an append-only or versioned datastore, which marks the previous revision as obsolete but does not physically remove it from the database. This allows outdated records to be recovered later if necessary, maintaining the integrity of previous data while still allowing new data to be added. Additionally, it ensures that all changes can be tracked, audited, or easily reverted to a previous state, which is commonly used to build an undo history.

We want HASH to provide a clear record of the history of all data in the system, including which users created or updated it. A versioned datastore therefore helps maintain the integrity of data, prevent data loss, and keep a clear provenance of updates.

Temporal Datastores

A clear extension of a versioned datastore is to store the time at which records were inserted, making it temporally versioned. These times are commonly referred to as “transaction time” or “system time”. This makes it possible not only to track the changes themselves but also the time at which they were made. This kind of auditing is commonly used in version control systems like git.

Moreover, having temporal versioning allows for more accurate analysis of the data over time. This is particularly useful in fields such as finance, where a complete history of transactions is crucial to understanding trends and patterns. Additionally, it provides a way to recover lost data, as it allows for the reconstruction of a database as it existed at a given point in time. Overall, the benefits of temporal versioning go well beyond simple tracking of changes, making it a valuable tool in various scenarios.

One way of doing this is to store a single insertion timestamp, but we can additionally store the time at which a row was superseded. When we want to know the range of time over which a row was active, this saves a query from having to find both the row of interest and the row which supersedes it.

These timestamps form an interval, where all intervals for a record are non-overlapping, which means that for any given point in time, you will have at most one unique active revision of the record.

To create the interval, inserted_at and superseded_at columns are added to the database schema. The inserted_at field represents the time when the record was inserted into the database, while the superseded_at field represents the time when the record ceased to be the latest inserted record, which is the inserted_at of the next record. If the record was never superseded, superseded_at contains a sentinel value such as NULL, or is set to infinity (∞).
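
As a minimal sketch, such a uni-temporal table could be declared like this (the product_sales table and its columns are hypothetical, mirroring the example below):

-- NULL in `superseded_at` acts as the "not superseded" sentinel
CREATE TABLE IF NOT EXISTS
    "product_sales" (
        "product" TEXT NOT NULL,
        "sales" INTEGER NOT NULL,
        "inserted_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
        "superseded_at" TIMESTAMP WITH TIME ZONE
    );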

Taking the example of a company recording sales figures for its coffee product, the revisions of the entity might look like this:

product | inserted_at | superseded_at | sales
Coffee  | 10:00       | 10:10         | 500
Coffee  | 10:10       | 10:32         | 600
Coffee  | 10:32       | ∞             | 635

Now, it is possible to make precise queries to the database using a specific time as a reference. For example, we can determine that the database’s record of the sales figures at 10:31 was 600, and that this information will be superseded at 10:32.
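
Using the hypothetical product_sales table sketched above (attaching an arbitrary date to the clock times), such a point-in-time query could look like:

-- What was the recorded sales figure at 10:31?
SELECT "sales" FROM "product_sales"
  WHERE "product" = 'Coffee'
    AND "inserted_at" <= '2000-01-01 10:31'
    AND ("superseded_at" IS NULL OR "superseded_at" > '2000-01-01 10:31');
-- returns 600 for the table above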

Data is no longer lost when changing records, but we only know when the database was updated – we can’t tell when the user took action to create an update, which might take time to reach the database, for example if they were editing offline, or if some central process coordinates updates from multiple users as part of collaborative editing. In order to know when a user took action, we need another time axis.

Bi-temporal Datastores

Similar to the above approach, we add another time axis to the database – the “decision time” – which adds two new columns to the table: decided_at and decision_superseded_at.

Another common axis is the “valid time”, which is also often referred to as the “application time”. However, the time at which something was valid in the real world must be specified by the user, which would require users to manually maintain valid time ranges. There are also entities where “real world validity” is not an applicable concept. We therefore chose to leave valid time as a concept that users can encode in entity properties as and when it is appropriate – for example, the valid time for an employment relationship between a person and a company – rather than making it a native feature of the datastore (for now). The decision time axis has an interesting property: it is never later than the transaction time, as it’s not possible to decide to write something to the store after it was written to the store.
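
As a sketch of what this shape looks like, a bi-temporal variant of the hypothetical product_sales table could be declared like this (the table and column names are illustrative):

CREATE TABLE IF NOT EXISTS
    "product_sales_bitemporal" (
        "product" TEXT NOT NULL,
        "sales" INTEGER NOT NULL,
        "decided_at" TIMESTAMP WITH TIME ZONE NOT NULL,
        "decision_superseded_at" TIMESTAMP WITH TIME ZONE,
        "inserted_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
        "superseded_at" TIMESTAMP WITH TIME ZONE,
        -- a decision cannot be transacted before it was made
        CHECK ("decided_at" <= "inserted_at")
    );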

Representing two dimensions in a database requires the time axes to be represented as a Cartesian product, making the representation more complex than the previous approach. To help understand this, we'll go through the stream of events step by step. For the sake of simplicity, let's assume that the data was sent to the database 5 minutes after the user input it, so the transaction time is 5 minutes after the decision time.

  1. At 10:00 the entity was added with 500 sales, which was inserted into the database 5 minutes later.

    # | product | decided_at | decision_superseded_at | inserted_at | superseded_at | sales
    1 | Coffee  | 10:00      | ∞                      | 10:05       | ∞             | 500

    Initially both upper bounds are unbounded: the data has not been superseded yet.

  2. At 10:10 a user added a new revision of the entity with 600 sales, which was inserted at 10:15 into the database.

    # | product | decided_at | decision_superseded_at | inserted_at | superseded_at | sales
    1 | Coffee  | 10:00      | ∞                      | 10:05       | 10:15         | 500
    2 | Coffee  | 10:00      | 10:10                  | 10:15       | ∞             | 500
    3 | Coffee  | 10:10      | ∞                      | 10:15       | ∞             | 600

    Adding a new revision to a bi-temporal table requires two new rows to be able to represent all possible permutations. A given decision time may have a transaction interval during which the decision was not superseded, which has to be recorded, as well as a transaction interval during which it was superseded. Even that transaction interval may be superseded if an out-of-order decision arrives later.

    With the exception of the most recent decision transacted at the most recent time, for every row with an unbounded decision interval we would expect to see a bounded transaction interval, and vice versa, except where an out-of-order decision is transacted, which causes bounded intervals on both axes. The latter case will be demonstrated later.

    The table above shows that (by row number):

    1. We had a record of a decision to record 500 sales at 10:00 that was not superseded, but this record is no longer current in the database (the transaction was superseded_at 10:15).
    2. We have a record of a decision to record 500 sales at 10:00 that was then superseded at 10:10, the database recorded this at 10:15, and it is still the database's current view of the decision about the entity between 10:00 and 10:15 (the transaction interval does not have an upper bound).
    3. We have a record of a decision to record 600 sales at 10:10, this decision has not been superseded, and it is the current database view of the current decision (neither interval has an upper bound).

    We cannot simply set Row 1's decision_superseded_at to 10:10 when the next decision is made, because this would be an inaccurate representation of the database state in the 10:05 to 10:15 transaction interval (the decision was not superseded according to the database during this time).

  3. At 10:32 the entity was updated again with 635 sales, and at 10:37 the database was notified about this change.

    # | product | decided_at | decision_superseded_at | inserted_at | superseded_at | sales
    1 | Coffee  | 10:00      | ∞                      | 10:05       | 10:15         | 500
    2 | Coffee  | 10:00      | 10:10                  | 10:15       | ∞             | 500
    3 | Coffee  | 10:10      | ∞                      | 10:15       | 10:37         | 600
    4 | Coffee  | 10:10      | 10:32                  | 10:37       | ∞             | 600
    5 | Coffee  | 10:32      | ∞                      | 10:37       | ∞             | 635
    Similar to the previous update, the information in row 3 has to be updated so that its transaction time is recorded as superseded at 10:37. Row 4 records that, according to the database’s current view, the 600-sales decision applied from 10:10 until it was superseded at 10:32, and row 5 records the new 635-sales decision.

Now we have three rows with transaction intervals without an upper bound, because we have three decision time intervals. We have two rows (1 and 3) with a fully bounded transaction interval, because the database now knows that those decision intervals are no longer accurate.

As the time axes act as identifiers, both axes have to be specified to precisely identify an entity revision. To read the most up-to-date information, it’s possible to simply use the current timestamp on both axes, which in the above case would return row 5: 635 sales.

Alternatively, one could specify the current timestamp for transaction time only in order to receive a history of decisions that were made according to the database’s latest view, which in this case would be rows 2, 4 and 5.
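
In terms of the hypothetical product_sales_bitemporal sketch from earlier, these two read modes could look roughly like this:

-- Latest state: pin both axes to the current time
SELECT "sales" FROM "product_sales_bitemporal"
  WHERE "decided_at" <= now() AND ("decision_superseded_at" IS NULL OR "decision_superseded_at" > now())
    AND "inserted_at" <= now() AND ("superseded_at" IS NULL OR "superseded_at" > now());

-- History of decisions as the database currently sees them: pin only the transaction axis
SELECT "decided_at", "decision_superseded_at", "sales" FROM "product_sales_bitemporal"
  WHERE "inserted_at" <= now() AND ("superseded_at" IS NULL OR "superseded_at" > now())
  ORDER BY "decided_at";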

An incredibly useful feature in multi-temporal datastores is not only to allow appending data but also to update records in the past. This means that it’s possible to correct earlier data while fully preserving audit functionality. This is powerful for offline editing, for example, where the transaction to the database occurs much later than the actual decision to add the data.

To demonstrate this, let’s assume that at 11:00 a much-delayed update with a decision time of 10:15 to record 550 sales hit the database. With a bi-temporal store this can be captured, and the resulting table would look like:

# | product | decided_at | decision_superseded_at | inserted_at | superseded_at | sales
1 | Coffee  | 10:00      | ∞                      | 10:05       | 10:15         | 500
2 | Coffee  | 10:00      | 10:10                  | 10:15       | ∞             | 500
3 | Coffee  | 10:10      | ∞                      | 10:15       | 10:37         | 600
4 | Coffee  | 10:10      | 10:32                  | 10:37       | 11:00         | 600
5 | Coffee  | 10:32      | ∞                      | 10:37       | ∞             | 635
6 | Coffee  | 10:10      | 10:15                  | 11:00       | ∞             | 600
7 | Coffee  | 10:15      | 10:32                  | 11:00       | ∞             | 550

The modifications from the previous table work as follows:

  • Row 4 was the current database’s view of the decision that applied between 10:10 and 10:32, i.e. the decision interval containing 10:15. Since we now have a different decision landing during that interval, we record a superseded_at time of 11:00 for the row (when the update hit).
  • We need to account for the 600 sales that Row 4 had recorded as being decided between 10:10 and 10:32, but which we now know only applied until 10:15 (the decision time of the delayed update), so we add Row 6 to capture this.
  • Finally, we add Row 7 to capture the information from our update that from 10:15 to 10:32 the sales were in fact decided as 550. This row has a decision_superseded_at of 10:32 because we already had row 5 capturing a later decision to change sales to 635.

If queried for rows at the latest transaction time, the database would return the history of decisions represented by rows 2, 5, 6 and 7 – in the previous example row 4 was included, but the database now knows that row 4’s decision time was superseded earlier than it previously recorded.

We use a bitemporal datastore in HASH, recording the transaction time and decision time for every update, making it possible to determine how an entity must have appeared to users at a given point in time, allowing for a more accurate representation of the entity’s history and changes over time.

More Dimensions

A temporal datastore can have an unlimited number of possible dimensions. However, each additional dimension significantly increases the space required to store the data, as illustrated by the bi-temporal approach.

It's worth noting that two dimensions alone already add significant complexity to the implementation. It's also important to note that the ordering of temporal axes and their associated dimensions is semantically significant. For example, the transaction time cannot precede the decision time, since data cannot be added to the database before a decision has been made to store it.

There are many possible combinations of dimensions in a temporal datastore. It's perfectly valid to use a bi-temporal datastore that includes e.g. valid time and decision time, instead of decision time and transaction time.

We’re now going to get into the detail of how we implemented bi-temporal axes in Postgres. This requires a reasonable understanding of advanced SQL features.

Implementation of a Bi-Temporal Datastore in PostgreSQL

At the time of implementing our initial solution (2022), Postgres plugins for uni-temporal modeling were available, but none existed for adding multi-temporal tables. This may still be the case as of this post’s publication date, but no comprehensive check has been made. Due to the lack of existing solutions, we wrote the required code ourselves.

Database Structure

We require a bi-temporal datastore with the decision time and transaction time as our two axes. Our goal is to store entities and all of their editions. Due to the multi-dimensional nature of multi-temporal tables, they take up much more space than uni-temporal tables. Therefore, we split our data into three main tables.

Entity IDs

The entity_ids table only holds the identifiers of entities and is used as a foreign key constraint:

-- Only holds the entity identifier to allow the use of foreign key constraints
CREATE TABLE IF NOT EXISTS
	"entity_ids" (
		"entity_id" SERIAL PRIMARY KEY
	);

Entity Editions

To hold the actual data of entities, an entity_editions table is created. We simplify the definition by assuming that entities only consist of a JSON blob called properties:

-- Holds an entity properties snapshot
CREATE TABLE IF NOT EXISTS
	"entity_editions" (
		"entity_edition_id" SERIAL PRIMARY KEY,
		"properties" JSONB NOT NULL
	);

Temporal Metadata

The heart of the temporal versioned datastore is the actual temporal data, entity_temporal_metadata, which associates the temporal axes with an entity edition identifier and the entity id. Postgres has built-in datatypes for time ranges which support various functions such as intersections and overlapping checks. This means that instead of having two columns for each temporal axis, only a single column is required:

-- The temporal metadata associated with an entity edition and its identifier
CREATE TABLE IF NOT EXISTS
	"entity_temporal_metadata" (
		"entity_id" SERIAL REFERENCES "entity_ids",
		"entity_edition_id" SERIAL REFERENCES "entity_editions",
		"decision_time" tstzrange NOT NULL,
		"transaction_time" tstzrange NOT NULL
	);
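
To illustrate the range operators used throughout the rest of this post (standalone examples, not part of the schema):

-- `@>` checks whether a range contains a timestamp; an unbounded upper limit means "not superseded"
SELECT tstzrange('2000-01-01 10:00+00', NULL, '[)') @> timestamptz '2000-01-01 10:30+00'; -- true

-- `&&` checks whether two ranges overlap
SELECT tstzrange('2000-01-01 10:00+00', '2000-01-01 10:15+00', '[)')
    && tstzrange('2000-01-01 10:10+00', NULL, '[)'); -- true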

Temporal datastores come with some constraints. In particular, with the given axes it’s possible to restrict the temporal data such that the decision start time always has to be less than or equal to the transaction start time:

-- It's not allowed to have a higher decision start time than the transaction start time
ALTER TABLE "entity_temporal_metadata"
	ADD CHECK (LOWER(decision_time) <= LOWER(transaction_time));

In addition, for a given entity id it must not be possible to have overlapping intervals on both axes at once. Postgres’s btree_gist extension allows us to express this precise constraint as an exclusion constraint.

CREATE EXTENSION IF NOT EXISTS "btree_gist";

-- Ensure that no entity editions belonging to the same entity overlap in time
-- `&&` is the overlap operator for ranges
ALTER TABLE "entity_temporal_metadata"
	ADD CONSTRAINT "entity_temporal_metadata_overlapping"
	EXCLUDE USING gist (
		entity_id WITH =,
		decision_time WITH &&,
		transaction_time WITH &&
	)
	-- declared DEFERRABLE so that `update_entity` (below) can temporarily defer the check
	DEFERRABLE INITIALLY IMMEDIATE;
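
For example, assuming entity 1 already has a row whose decision and transaction ranges are both still unbounded (and that an edition with id 2 exists), inserting a second row that overlaps it on both axes is rejected with an error like "conflicting key value violates exclusion constraint":

INSERT INTO "entity_temporal_metadata" (
	entity_id,
	entity_edition_id,
	decision_time,
	transaction_time
) VALUES (
	1,
	2,
	tstzrange('2000-01-01 10:00+00', NULL, '[)'),
	tstzrange('2000-01-01 10:05+00', NULL, '[)')
);

Rows which overlap on only one of the two axes are still allowed, which is exactly what the bi-temporal representation above relies on.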

Creation and Updating of Entities

Since the publication of this blog post, we have moved away from implementing the logic in PL/pgSQL and implemented it in the application code directly. The application logic is easier to maintain and test. In addition, it’s easier to handle errors which arise in application code than handling different kinds of exceptions from Postgres.

Creating and updating entities should be as straightforward as possible. It should not be possible to manually set the transaction time, and if the decision time is not specified it should default to the transaction time. Due to potential race conditions in the updating logic, and to avoid dealing with these constraints every time, we create update_entity as a PL/pgSQL (Procedural Language/PostgreSQL) function. For consistency, we also create a create_entity PL/pgSQL function:

CREATE OR REPLACE FUNCTION "create_entity" (
  "_properties" JSONB,
  "_decision_time" TIMESTAMP WITH TIME ZONE
) RETURNS SETOF "entity_temporal_metadata" AS $pga$
    BEGIN
      IF _decision_time IS NULL THEN _decision_time := now(); END IF;

      RETURN QUERY WITH
        -- The new entity identifier ...
        entity_id AS (
          INSERT INTO entity_ids (
            entity_id
          ) VALUES (
            DEFAULT
          ) RETURNING entity_ids.entity_id
        ),
        -- ... and the new entity edition is inserted first ...
        entity_edition AS (
          INSERT INTO entity_editions (
            entity_edition_id,
            properties
          ) VALUES (
            DEFAULT,
            _properties
          ) RETURNING entity_editions.entity_edition_id
        )
      -- ... and then the temporal metadata with the new identifiers.
      INSERT INTO entity_temporal_metadata (
        entity_id,
        entity_edition_id,
        decision_time,
        transaction_time
      ) VALUES (
        (SELECT entity_id.entity_id FROM entity_id),
        (SELECT entity_edition.entity_edition_id FROM entity_edition),
        tstzrange(_decision_time, NULL, '[)'),
        tstzrange(now(), NULL, '[)')
      ) RETURNING entity_temporal_metadata.*;
    END
    $pga$ VOLATILE LANGUAGE plpgsql;

The function requires the properties associated with the entity and an optional decision time, and returns the newly inserted entity_temporal_metadata row. If a decision time is not provided, the current time is used for both the decision time and transaction time. A new entity id is inserted into its corresponding table and the properties are then added to entity_editions. Finally, both newly added identifiers are inserted with the given decision time and current time as transaction time, with both time ranges having an unbounded upper bound until the entity is updated or archived.

The function can then be called by using a SELECT statement:

SELECT * FROM create_entity('{"sales": "500"}', '2000-01-01 10:00');

This may return a table like:

entity_id | entity_edition_id | decision_time         | transaction_time
1         | 1                 | [2000-01-01 10:00, ∞) | [2000-01-01 10:05, ∞)

Updating an entity is a bit more complicated due to the nature of having multiple time axes. Up to the time when the update happens, the previous entity edition is valid, so the old row’s transaction time has to be changed to end at the current time. The same row also has to be duplicated so that, according to the new transaction time, the old edition is recorded as valid only until the specified decision time. Finally, a new row is inserted with the new properties and the new time ranges. For example, if the above entity was updated with 600 sales at a decision time of 10:10, reaching the database at 10:15, the entity_temporal_metadata table would look like:

# | entity_id | entity_edition_id | decision_time                        | transaction_time
1 | 1         | 1                 | [2000-01-01 10:00, ∞)                | [2000-01-01 10:05, 2000-01-01 10:15)
2 | 1         | 1                 | [2000-01-01 10:00, 2000-01-01 10:10) | [2000-01-01 10:15, ∞)
3 | 1         | 2                 | [2000-01-01 10:10, ∞)                | [2000-01-01 10:15, ∞)

When updating an entity it’s crucial that two Postgres connections do not try to update the same row at the same time. First, this implies that we must not use a separate SELECT statement inside the update_entity function. Second, this means that we should update the entity_edition_id and the temporal axis columns rather than inserting new rows directly. Postgres locks a row that is about to be updated, preventing two transactions from updating the same row concurrently, so we can utilize this feature. For this, we introduce a function which acts as a TRIGGER. The function runs before the actual update happens and inserts rows 1 and 2 of the table above; the third row will be the updated original row.

CREATE OR REPLACE FUNCTION "update_entity_version_trigger" () RETURNS TRIGGER AS $pga$
    BEGIN
      -- Insert a new row with the old decision time and set the upper bound of the transaction time
      INSERT INTO entity_temporal_metadata (
        entity_id,
        entity_edition_id,
        decision_time,
        transaction_time
      ) VALUES (
        OLD.entity_id,
        OLD.entity_edition_id,
        OLD.decision_time,
        tstzrange(lower(OLD.transaction_time), lower(NEW.transaction_time), '[)')
      );

      -- Insert a new row with the new transaction time and set the upper bound of the decision time
      INSERT INTO entity_temporal_metadata (
        entity_id,
        entity_edition_id,
        decision_time,
        transaction_time
      ) VALUES (
        OLD.entity_id,
        OLD.entity_edition_id,
        tstzrange(lower(OLD.decision_time), lower(NEW.decision_time), '[)'),
        NEW.transaction_time
      );

      RETURN NEW;
    END
    $pga$ VOLATILE LANGUAGE plpgsql;

-- The function should be called before the row is updated
CREATE OR REPLACE TRIGGER "update_entity_version_trigger"
  BEFORE UPDATE ON "entity_temporal_metadata"
  FOR EACH ROW EXECUTE PROCEDURE "update_entity_version_trigger" ();

The actual update_entity function is similar to the creation function:

CREATE OR REPLACE FUNCTION "update_entity" (
  "_entity_id" INTEGER,
  "_properties" JSONB,
  "_decision_time" TIMESTAMP WITH TIME ZONE
) RETURNS SETOF "entity_temporal_metadata" AS $pga$
    BEGIN
      IF _decision_time IS NULL THEN _decision_time := now(); END IF;

      -- The update will insert overlapping time ranges before the actual row is updated, so the overlapping constraint has to be deferred
      SET CONSTRAINTS entity_temporal_metadata_overlapping DEFERRED;

      RETURN QUERY
      WITH entity_edition AS (
        INSERT INTO entity_editions (
          entity_edition_id,
          properties
        ) VALUES (
          DEFAULT,
          _properties
        ) RETURNING entity_editions.entity_edition_id
      )
      UPDATE entity_temporal_metadata
      SET decision_time = tstzrange(_decision_time, upper(entity_temporal_metadata.decision_time), '[)'),
          transaction_time = tstzrange(now(), NULL, '[)'),
          entity_edition_id = (SELECT entity_edition.entity_edition_id FROM entity_edition)
      WHERE entity_temporal_metadata.entity_id = _entity_id
        -- `@>` denotes the "contains" operator - "Does the time axis contain the timestamp?"
        AND entity_temporal_metadata.decision_time @> _decision_time
        AND entity_temporal_metadata.transaction_time @> now()
      RETURNING entity_temporal_metadata.*;

      -- The row is now updated, so the overlapping constraint can be checked again
      SET CONSTRAINTS entity_temporal_metadata_overlapping IMMEDIATE;
    END
    $pga$ VOLATILE LANGUAGE plpgsql;

Similar to the create_entity function, we insert the properties into the editions table, but since the entity identifier already exists, it's passed to the function instead. The UPDATE clause then updates the row identified by the entity identifier and the provided timestamps with the new data, and the newly updated row is returned.
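
For example, assuming the entity created above (entity_id 1) and the same pretend timeline, the first update could be issued like this – the decision time mirrors the conceptual example, while the transaction time is simply now() at call time:

SELECT * FROM update_entity(1, '{"sales": "600"}', '2000-01-01 10:10');

For the first update, a table like this will be returned (assuming it was transacted at 10:15):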

entity_id | entity_edition_id | decision_time         | transaction_time
1         | 2                 | [2000-01-01 10:10, ∞) | [2000-01-01 10:15, ∞)

An important detail in the above implementation is that it does not set the upper bound of the decision time to an unbounded value, but instead reuses the previous upper bound of the decision time. With this it's even possible to update data in the past, for example by adding another edition of the entity as shown below.
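
A sketch of such a past-dated update, mirroring the delayed 550-sales decision from the conceptual example (the values are illustrative):

SELECT * FROM update_entity(1, '{"sales": "550"}', '2000-01-01 10:15');

Because the function reuses the previous upper bound of the decision interval, the rows inserted by the trigger bound the old edition’s decision interval at 10:15, analogous to rows 6 and 7 in the delayed-update table earlier.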

Note that the transaction_time will always use an unbounded upper limit as time goes on and Postgres does not travel faster than light — yet.

Querying Entities

For a single entity, many different editions may exist in the database, and for each update there are two time ranges per row. This implies that to uniquely identify an entity revision, its entity_id and a timestamp on each temporal axis are required. A typical use case is that the transaction_time is used for auditing: when a record hit the database. In most cases, only the current state of the database is of interest, and usually only the latest version of an entity. The simplest way to achieve this default behavior is to create a VIEW:

CREATE VIEW "entities"
  AS SELECT entity_temporal_metadata.*,
            entity_editions.properties
       FROM entity_temporal_metadata
       JOIN entity_editions
         ON entity_temporal_metadata.entity_edition_id = entity_editions.entity_edition_id;

CREATE VIEW "latest_entities"
  AS SELECT entity_id,
            lower(decision_time) AS decision_time,
            lower(transaction_time) AS transaction_time,
            properties
       FROM entities
      -- `@>` denotes the "contains" operator - "Does the time axis contain the timestamp?"
      WHERE decision_time @> now() AND transaction_time @> now();

SELECT * FROM latest_entities WHERE entity_id = 1;

The above statement will return only the latest row for entity_id = 1:

entity_id | decision_time    | transaction_time | properties
1         | 2000-01-01 10:32 | 2000-01-01 10:37 | { "sales": "635" }

In some cases, e.g. when a timeline of the data should be created, not only the latest data is interesting but also previous data. Some queries could, for example, be…

-- Query the entity with id `1` between 2000-01-01 and 2000-01-02
-- This will return all four revisions of the entity visible at the current transaction time.
SELECT * FROM entities
  WHERE entity_id = 1
    -- `&&` denotes the "overlapping" operator
    AND decision_time && tstzrange('2000-01-01', '2000-01-02')
    AND transaction_time @> now();

-- What was the state of the database at 2000-01-01 10:45 for all
-- current entities? This will return `635`.
SELECT properties->>'sales' FROM entities
  WHERE decision_time @> now()
    AND transaction_time @> '2000-01-01 10:45'::timestamptz;

Conclusion

As demonstrated earlier, implementing a bi-temporal datastore can be a complex task, requiring careful consideration of the application's querying methods. While it is possible to add more dimensions to the datastore, the process of updating it becomes increasingly challenging with each additional axis. Furthermore, the addition of each new dimension also increases the required storage space, which can become a significant concern.

However, despite these potential challenges, a temporal datastore offers numerous benefits that cannot be achieved with a non-temporal one. For instance, multiple dimensions can be queried independently, providing a powerful tool for data analysis and exploration. Additionally, a temporal datastore can help identify patterns and trends over time, facilitating more accurate forecasting and decision-making.

As such, while the implementation of a bi-temporal datastore may require additional effort and resources, the potential benefits can be significant, making it a worthwhile investment for applications with collaborative editing, data robustness, or other information assurance/safety requirements.
