Tracking history in database data

This week I noticed strange behaviour in a system containing a table that is pretty straight forward and does not have any history logging, versioning or anything like that. The contents of the table are updated by automatic processes without logging also. Since two weeks there have been updates to the contents of this table which I did not notice in the years before. At such moments it is difficult to track where the updates come from or maybe who updated it manually. This situation always results in guessing what happened. In the end I restored the contents of the table from 2 weeks ago (thank you backups!) but this is generally not ideal.

History in database data in my opinion is about keeping track of changes of database records. For example, you want to keep track of when and by whom a database record was updated. This is the case when you do not want to lose earlier data or want to be able to see what the value was of a specific database record in the past. Most importantly, if a record is deleted from the database, you will never again know what the contents was of that specific database record.

This can be implemented for any database type and product, but there are differences how you can choose to implement them. In practice I see basically two variants of how to implement this. As I am mostly working with relational databases, let's take a moment to look at those variants:

Versioning table

Write a copy of a record to another table every time a modification on a record is executed. This is handled by code or database triggers that store a full copy of the database record in another table.
An example like this:

CREATE TABLE book (
  id INT,
  bookname VARCHAR(50),
  author VARCHAR(50),
  etc
);

CREATE TABLE book_version (
  id INT,
  time DATETIME,
  versioned_by VARCHAR(50),
  book_id INT,
  bookname VARCHAR(50),
  author VARCHAR(50),
  etc
);

Note, SQL commands in this article are not meant to directly execute on a database. They are solely for explanatory purposes to indicate the subject.

In this example we have a table book, and just before a mutation of a book is written to the database, a copy of the old record is stored in the book_version, along with the timestamp and name of the user that updated the book. A lot of Object - Relation Mapping (ORM) libraries for different languages support this kind of history keeping.
The usual history flow of a database record in the table book happens as follows: 1. When a new book is added to the database, an SQL INSERT command is used.
Some ORM libraries add also an initial version to the book_version table, but not all. 2. When a book is updated, just before issuing the SQL UPDATE command, this happens:

INSERT INTO book_version (time, versioned_by, book_id, bookname, author)
SELECT CURRENT_TIMESTAMP, 'username', id, bookname, author
FROM book
WHERE id = ?

Step 2 also happens when a book is removed from the book table before executing the DELETE command.

This way you can very easily get a full history of records in a database.

Advantages

Minimal overhead.
History is separated from the current version of the data.
Better normalization of data.

Disadvantages

The only way to detect if a record was deleted from the table book is to examine the book_history table. Only if the book_id is present in the book_history table and not in the book table, can you determine this record was present in the table at some moment.
While you can see the old values of the records in the book_history table, it is more difficult to compare the versioned records to the current state of the record. You have to write a join between the tables to select both current and past data. In short: the normalization is a disadvantage in this case.

Keeping track of modifications in the same table

Basically, this variant adds timestamp columns to the table for each record:

CREATE TABLE book (
  objectid BIGINT,
  book_id INT,
  bookname VARCHAR(50),
  author VARCHAR(50),
  etc,
  added_dt TIMESTAMP,
  added_by VARCHAR(50),
  modified_dt TIMESTAMP,
  modified_by VARCHAR(50),
  removal_dt TIMESTAMP,
  removal_by VARCHAR(50)
);

Now, every time a record is inserted, only the added_dt and added_by columns are filled (along with the content columns).
For updates to the record, two things happen:
1. Mark the old record as removed by setting the removal_dt and removal_by columns.
2. Insert a new record based on the existing record with the new values, and by also updating the modified_dt and modified_by columns.
Deleting a record also means updating the removal_dt and removal_by columns. This way, the most recent data of the records is always available by using a SELECT query like this:

SELECT *
FROM book
WHERE removal_dt IS NULL;

If you want to see how the changes of the book progressed over time, just write a query like this without the use of the removal_dt column in the WHERE clause:

SELECT *
FROM book
ORDER BY objectid;

which gives a result like this:

objectid	book_id	bookname	author	added_dt	added_by	modified_dt	modified_by	removal_dt	removal_by
1	1	The fellowship of the Ring	Tolkien	2025-05-27 08:00:00	Arjan van der Veen	NULL	NULL	2025-05-27 08:00:15	Arjan van der Veen
2	1	The fellowship of the Ring	J.R.R. Tolkien	2025-05-27 08:00:00	Arjan van der Veen	2025-05-27 08:00:15	Arjan van der Veen	2025-05-27 08:00:30	Arjan van der Veen
3	1	The Fellowship of the Ring	J.R.R. Tolkien	2025-05-27 08:00:00	Arjan van der Veen	2025-05-27 08:00:30	Arjan van der Veen	NULL	NULL

Here I added "The Fellowship of the Ring" to the table, and updated it a few seconds later by correcting the name of the author and then the title.

Advantages

Minimal overhead.
History is stored within the same table, so it is easier to see what changed over time.
Deleted records are still visible for a DBA.
The latest change is always visible with the current record (in this case record with objectid 3).

Disadvantages

You have to extend WHERE clauses of your SQL statements everywhere to use the removal_dt IS NULL to get your current data. This can easily be solved by using a database view for this table. But since almost all SQL queries have a WHERE clause this is no issue for me personally.
More complex to use this database design with an ORM (Object - Relation Mapping) library when coding.

In the past I wrote an extension to Java Hibernate ORM to automatically make use of this pattern for a client, but I do not have the license to share that code. It is doable though to implement it and significantly makes coding an application using this easier. Nowadays I just write SQL commands directly and noticed that, for me personally, it does not take much longer to develop applications with this database design.

Versioning table

Advantages

Disadvantages

Keeping track of modifications in the same table

Advantages

Disadvantages

Links