In a traditional centralized infrastructure that supports buying and selling your favorite cryptocurrencies, we need to store purchase and sale information as accurately as possible so that, should a dispute arise, we have a way to credit users correctly.
A difficulty with financial data is that many things about a transaction change over time. While transactions from one user to another are relatively easy (sending like for like), transactions involving a time-varying price can get hairy.
The first impulse is to store a single row per purchase or sale of a cryptocurrency and then update the status of that purchase or sale event as it proceeds through a state machine. A state machine is a construction that allows state changes only along directed transitions, and an example could be something like this:
NEW -> OPEN -> PARTIAL -> FILLED -> DONE
 |      |        |
 |      |        |-> FAILED
 |      |        |-> EXPIRED
 |      |
 |      |-> FAILED
 |      |-> EXPIRED
 |      |-> CANCEL_REQUESTED -> CANCELED
 |
 |-> REJECTED
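A state machine like this can be sketched as a transition table plus a guard function. This is a minimal illustration, not our production code, and it assumes the transition set shown in the diagram (including a direct OPEN to FILLED transition for orders that fill in one shot):

```python
# Allowed transitions for the order state machine sketched above.
# This transition set is an assumption based on the diagram; the real
# service's states and rules may differ.
TRANSITIONS = {
    "NEW": {"OPEN", "REJECTED"},
    "OPEN": {"PARTIAL", "FILLED", "FAILED", "EXPIRED", "CANCEL_REQUESTED"},
    "PARTIAL": {"FILLED", "FAILED", "EXPIRED"},
    "CANCEL_REQUESTED": {"CANCELED"},
    "FILLED": {"DONE"},
}

def can_transition(current: str, nxt: str) -> bool:
    """Return True if an order may move from `current` to `nxt`."""
    return nxt in TRANSITIONS.get(current, set())

print(can_transition("NEW", "OPEN"))     # True
print(can_transition("FILLED", "OPEN"))  # False
```

States with no outgoing edges (DONE, FAILED, EXPIRED, CANCELED, REJECTED) are terminal: `can_transition` returns `False` for anything leaving them.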
In the above state machine, the sale of BAR, an example cryptoasset, goes from the NEW state to the DONE state. In a database, we update the entry to change its state, and then update the time at which this change was made.
Unfortunately, updating the row of data destroys its history. We know about the last time we updated that particular client order, but we don't know what its prior state was.
The next naive idea is to incorporate a way to store historical data by making the table append-only. Sound familiar?
So when the order has been submitted, the internal ledger might have something like the following:

id | userId | symbol | price | size | status | createdAt           | updatedAt
1  | foo    | BAR    | 1.00  | 1.00 | NEW    | 2020-10-24 00:00:00 | 2020-10-24 00:00:00

and when the order has been accepted by the services in the backend:

id | userId | symbol | price | size | status | createdAt           | updatedAt
1  | foo    | BAR    | 1.00  | 1.00 | NEW    | 2020-10-24 00:00:00 | 2020-10-24 00:00:00
1  | foo    | BAR    | 1.00  | 1.00 | OPEN   | 2020-10-24 00:00:10 | 2020-10-24 00:00:10
At which point updatedAt becomes somewhat useless:

id | userId | symbol | price | size | status | createdAt
1  | foo    | BAR    | 1.00  | 1.00 | NEW    | 2020-10-24 00:00:00
1  | foo    | BAR    | 1.00  | 1.00 | OPEN   | 2020-10-24 00:00:10
However, this arrangement makes it inefficient to locate current entries quickly: one would have to group the orders by id and then return only the latest entry by its createdAt timestamp. Fortunately, we can address this by separating current data from historical data. Even so, this temporal database still does not tell us the actual execution price versus what was submitted.
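The "latest entry per order" lookup can be sketched in a few lines. The rows below are hypothetical sample data matching the table above; in a real SQL database this would typically be a window-function or `DISTINCT ON`-style query instead:

```python
from datetime import datetime

# Hypothetical append-only rows, mirroring the table above.
rows = [
    {"id": 1, "status": "NEW",  "createdAt": datetime(2020, 10, 24, 0, 0, 0)},
    {"id": 1, "status": "OPEN", "createdAt": datetime(2020, 10, 24, 0, 0, 10)},
]

def latest_by_id(rows):
    """Keep only the most recent entry per order id."""
    latest = {}
    for row in rows:
        cur = latest.get(row["id"])
        if cur is None or row["createdAt"] > cur["createdAt"]:
            latest[row["id"]] = row
    return latest

print(latest_by_id(rows)[1]["status"])  # OPEN
```

Note that the scan is linear in the number of historical rows, which is exactly why separating current data from historical data pays off as the ledger grows.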
Tacen has to ensure that all orders are handled in a timely manner. We at Tacen try to make sure that all events are replayable and deterministic - i.e., events happen in a time-ordered way, and the orderbook's processing is deterministic. However, there will be certain edge cases where errors do occur, whether by a backend fault or an act of God.
This means that sometimes, when we execute orders on the market, new information may arrive immediately afterward that rolls back an error, compensates the trader for it, and so on. Normally, because all order flow funnels into the orderbook service serially, this isn't a problem. For the sake of illustrating the benefit of what we're doing, however, consider the following series of events, where a trader submitted a market order and got partially filled:
id | userId | type | symbol | price | size | status  | createdAt
1  | foo    | SELL | BAR    | 1.00  | 1.00 | NEW     | 2020-10-24 00:00:00
1  | foo    | SELL | BAR    | 1.00  | 1.00 | OPEN    | 2020-10-24 00:00:10
1  | foo    | SELL | BAR    | 1.00  | 0.80 | PARTIAL | 2020-10-24 00:00:11
It's essential to give users accurate information about their orders, but when making any request over the Internet, data that is valid at one point in time may be invalid at another. When one deals with financial markets, one often sees discrepancies between the price quoted and the price executed.
In our case, since this was a market order, the fill at 2020-10-24 00:00:11 might actually have been size 1.00 at price 1.05 instead of size 0.80 at price 1.00. But how do we update what happened for the user while keeping the most accurate history of the events? Is the following sufficient?
id | userId | type | symbol | price | size | status  | createdAt
1  | foo    | SELL | BAR    | 1.00  | 1.00 | NEW     | 2020-10-24 00:00:00
1  | foo    | SELL | BAR    | 1.00  | 1.00 | OPEN    | 2020-10-24 00:00:10
1  | foo    | SELL | BAR    | 1.00  | 0.80 | PARTIAL | 2020-10-24 00:00:11
1  | foo    | SELL | BAR    | 1.05  | 1.00 | FILLED  | 2020-10-24 00:00:12
In the above table, 00:00:12 is the timestamp at which the order record was updated, not the actual time at which the order was executed at the corrected size and price. Much information has been lost, even though we kept temporal information about the status updates and the size and price of the symbol.
Enter the bitemporal database.
The above can be stored as the following:
id | userId | type | symbol | price | size | status  | actual              | recorded
1  | foo    | SELL | BAR    | 1.00  | 1.00 | NEW     | 2020-10-24 00:00:00 | 2020-10-24 00:00:00
1  | foo    | SELL | BAR    | 1.00  | 1.00 | OPEN    | 2020-10-24 00:00:10 | 2020-10-24 00:00:10
1  | foo    | SELL | BAR    | 1.00  | 0.80 | PARTIAL | 2020-10-24 00:00:11 | 2020-10-24 00:00:11
1  | foo    | SELL | BAR    | 1.05  | 1.00 | FILLED  | 2020-10-24 00:00:11 | 2020-10-24 00:00:12
In the above table, we separate the time into actual time and recorded time, which allows us to understand what the corrected data is even when the first record was erroneous (third row). We do this by checking whether a later recorded time exists for an entry with the same actual time. This gives us a fully replayable database of what the user sees (actual) versus what was the truth (latest recorded), and also lets us see how the system's understanding of an event changes as time progresses along the recorded axis.
Here's a human-readable representation of what's happening:
actual                       | system                        | time
SELL 1.00 BAR @ 1.00 NEW     | SELL 1.00 BAR @ 1.00 NEW      | 2020-10-24 00:00:00
SELL 1.00 BAR @ 1.00 OPEN    | SELL 1.00 BAR @ 1.00 OPEN     | 2020-10-24 00:00:10
SELL 1.00 BAR @ 1.05 FILLED  | SELL 0.80 BAR @ 1.00 PARTIAL  | 2020-10-24 00:00:11
SELL 1.00 BAR @ 1.05 FILLED  | SELL 1.00 BAR @ 1.00 FILLED   | 2020-10-24 00:00:12
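The "what did the system believe at time T" question is the classic bitemporal as-of query. Here is a minimal sketch over the four rows above, using tuples as hypothetical stand-ins for database rows; a production system would express this in SQL over the actual and recorded columns:

```python
from datetime import datetime

# (status, price, size, actual, recorded) rows from the bitemporal table.
rows = [
    ("NEW",     1.00, 1.00, datetime(2020, 10, 24, 0, 0, 0),  datetime(2020, 10, 24, 0, 0, 0)),
    ("OPEN",    1.00, 1.00, datetime(2020, 10, 24, 0, 0, 10), datetime(2020, 10, 24, 0, 0, 10)),
    ("PARTIAL", 1.00, 0.80, datetime(2020, 10, 24, 0, 0, 11), datetime(2020, 10, 24, 0, 0, 11)),
    ("FILLED",  1.05, 1.00, datetime(2020, 10, 24, 0, 0, 11), datetime(2020, 10, 24, 0, 0, 12)),
]

def as_of(rows, recorded_time):
    """Reconstruct the system's view at `recorded_time`: for each actual
    time, keep the row with the latest recorded time not after it."""
    visible = [r for r in rows if r[4] <= recorded_time]
    best = {}
    for row in visible:
        actual = row[3]
        cur = best.get(actual)
        if cur is None or row[4] > cur[4]:
            best[actual] = row
    return [best[k] for k in sorted(best)]

# At 00:00:11 the system still believed the order was PARTIAL;
# by 00:00:12 the correction row supersedes it.
print(as_of(rows, datetime(2020, 10, 24, 0, 0, 11))[-1][0])  # PARTIAL
print(as_of(rows, datetime(2020, 10, 24, 0, 0, 12))[-1][0])  # FILLED
```

Replaying the same query at successive recorded times reproduces the "system" column of the table above, which is exactly what makes the event history auditable.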
The concept of a bitemporal database has been around for quite a while, and it is used in many serious financial institutions. We at Tacen decided to go with a bitemporal database for market transactions because we wanted fully replayable events to keep our accounting straight, which ultimately means fewer headaches for our users. Trust is possible when temporal data is kept, and trust is improved when temporal data can be compared against what the actual value was versus what the system thought it was.
Complete event replayability is useful in many aspects of financial soundness. One place where it is broadly useful is accounting; another is forensic analysis, should something go awry.
I hope you enjoyed our first engineering blog!