Continuing on the previous write up on how time series data can be stored in Postgres efficiently, here is another approach, this time providing for extreme write performance.
The “horizontal” data structure in the last article requires an SQL statement for every data point update. If you cache data points long enough, you might be able to collect a bunch for a series and write them out at once for a slight performance advantage. But there is no way to update multiple series with a single statement, it’s always at least one update per series. With a large number of series, this can become a performance bottleneck. Can we do better?
One observation we can make about incoming time series data is that commonly the data points are roughly from the same time period, the current time, give or take. If we’re storing data at regularly-spaced intervals, then it is extremely likely that many if not all of the most current data points from various time series are going to belong to the exact same time slot. Considering this observation, what if we organized data points in rows of arrays, only now we would have a row per timestamp while the position within the array would determine the series?
Lets create the tables:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Notice how the step and size now become properties of the bundle
rather than the rra which now refers to a bundle. In the ts
table,
i
is the index in the round-robin archive (which in the previous
“horizontal” layout would be the array index).
The data we used before was a bunch of temperatures, lets add two more series, one where temperature is 1 degree higher, and one where it’s 1 degree lower. (Not that it really matters).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
|
Notice that every INSERT adds data for all three of our series in a single database operation!
Finally, let us create the view. (How it works is described in detail in the previous article)
1 2 3 4 5 6 7 8 9 |
|
And now let’s verify that it works:
1 2 3 4 5 6 7 8 |
|
This approach makes writes blazingly fast though it does have its drawbacks. For example there is no way to read a single series - even though the view selects a single array element, under the hood Postgres reads the whole row. Given that time series is more write intensive and rarely read, this may not be a bad compromise.