This week I noticed strange behaviour in a system containing a table that is pretty straight forward and does not have any history logging, versioning or anything like that. The contents of the table are updated by automatic processes without logging also. Since two weeks there have been updates to the contents of this table which I did not notice in the years before. At such moments it is difficult to track where the updates come from or maybe who updated it manually. This situation always results in guessing what happened. In the end I restored the contents of the table from 2 weeks ago (thank you backups!) but this is generally not ideal.
History in database data in my opinion is about keeping track of changes of database records. For example, you want to keep track of when and by whom a database record was updated. This is the case when you do not want to lose earlier data or want to be able to see what the value was of a specific database record in the past. Most importantly, if a record is deleted from the database, you will never again know what the contents was of that specific database record.
This can be implemented for any database type and product, but there are differences how you can choose to implement them. In practice I see basically two variants of how to implement this. As I am mostly working with relational databases, let's take a moment to look at those variants:
Versioning table
Write a copy of a record to another table every time a modification on a record is executed.
This is handled by code or database triggers that store a full copy of
the database record in another table.
An example like this:
CREATE TABLE book (
id INT,
bookname VARCHAR(50),
author VARCHAR(50),
etc
);
CREATE TABLE book_version (
id INT,
time DATETIME,
versioned_by VARCHAR(50),
book_id INT,
bookname VARCHAR(50),
author VARCHAR(50),
etc
);
Note, SQL commands in this article are not meant to directly execute on a database. They are solely for explanatory purposes to indicate the subject.
In this example we have a table book, and just before a mutation of a book is written to the database, a copy of the old record
is stored in the book_version, along with the timestamp and name of the user that updated the book.
A lot of Object - Relation Mapping (ORM) libraries for different languages support this kind of history keeping.
The usual history flow of a database record in the table book happens as follows:
1. When a new book is added to the database, an SQL INSERT command is used.
Some ORM libraries add also an initial version to the book_version table, but not all.
2. When a book is updated, just before issuing the SQL UPDATE command, this happens:
INSERT INTO book_version (time, versioned_by, book_id, bookname, author)
SELECT CURRENT_TIMESTAMP, 'username', id, bookname, author
FROM book
WHERE id = ?
- Step 2 also happens when a book is removed from the
book
table before executing the DELETE command.
This way you can very easily get a full history of records in a database.
Advantages
- Minimal overhead.
- History is separated from the current version of the data.
- Better normalization of data.
Disadvantages
- The only way to detect if a record was deleted from the table
book
is to examine thebook_history
table. Only if the book_id is present in thebook_history
table and not in thebook
table, can you determine this record was present in the table at some moment. - While you can see the old values of the records in the
book_history
table, it is more difficult to compare the versioned records to the current state of the record. You have to write a join between the tables to select both current and past data. In short: the normalization is a disadvantage in this case.
Keeping track of modifications in the same table
Basically, this variant adds timestamp columns to the table for each record:
CREATE TABLE book (
objectid BIGINT,
book_id INT,
bookname VARCHAR(50),
author VARCHAR(50),
etc,
added_dt TIMESTAMP,
added_by VARCHAR(50),
modified_dt TIMESTAMP,
modified_by VARCHAR(50),
removal_dt TIMESTAMP,
removal_by VARCHAR(50)
);
- Now, every time a record is inserted, only the
added_dt
andadded_by
columns are filled (along with the content columns). - For updates to the record, two things happen:
- Mark the old record as removed by setting the
removal_dt
andremoval_by
columns. - Insert a new record based on the existing record with the new values, and by also updating the
modified_dt
andmodified_by
columns.
- Mark the old record as removed by setting the
- Deleting a record also means updating the
removal_dt
andremoval_by
columns. This way, the most recent data of the records is always available by using a SELECT query like this:
SELECT *
FROM book
WHERE removal_dt IS NULL;
If you want to see how the changes of the book progressed over time, just write a query like this without the use of the removal_dt
column in the WHERE clause:
SELECT *
FROM book
ORDER BY objectid;
which gives a result like this:
objectid | book_id | bookname | author | added_dt | added_by | modified_dt | modified_by | removal_dt | removal_by |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | The fellowship of the Ring | Tolkien | 2025-05-27 08:00:00 | Arjan van der Veen | NULL | NULL | 2025-05-27 08:00:15 | Arjan van der Veen |
2 | 1 | The fellowship of the Ring | J.R.R. Tolkien | 2025-05-27 08:00:00 | Arjan van der Veen | 2025-05-27 08:00:15 | Arjan van der Veen | 2025-05-27 08:00:30 | Arjan van der Veen |
3 | 1 | The Fellowship of the Ring | J.R.R. Tolkien | 2025-05-27 08:00:00 | Arjan van der Veen | 2025-05-27 08:00:30 | Arjan van der Veen | NULL | NULL |
Here I added "The Fellowship of the Ring" to the table, and updated it a few seconds later by correcting the name of the author and then the title.
Advantages
- Minimal overhead.
- History is stored within the same table, so it is easier to see what changed over time.
- Deleted records are still visible for a DBA.
- The latest change is always visible with the current record (in this case record with
objectid
3).
Disadvantages
- You have to extend WHERE clauses of your SQL statements everywhere to use the
removal_dt IS NULL
to get your current data. This can easily be solved by using a database view for this table. But since almost all SQL queries have a WHERE clause this is no issue for me personally. - More complex to use this database design with an ORM (Object - Relation Mapping) library when coding.
In the past I wrote an extension to Java Hibernate ORM to automatically make use of this pattern for a client, but I do not have the license to share that code. It is doable though to implement it and significantly makes coding an application using this easier. Nowadays I just write SQL commands directly and noticed that, for me personally, it does not take much longer to develop applications with this database design.