Grisha Trubetskoy

Notes to self.

Simple Tgres Part II - a High Rate Counter


Continuing from the previous post on simple use of Tgres components, let’s try to count something that goes by really fast.

This time let’s start out with creating a memory-based SerDe. This means that all our data is in memory and there is no database backing our series.

package main

import (
    "fmt"
    "net/http"
    "time"

    "github.com/tgres/tgres/dsl"
    h "github.com/tgres/tgres/http"
    "github.com/tgres/tgres/receiver"
    "github.com/tgres/tgres/rrd"
    "github.com/tgres/tgres/serde"
)

func main() {

    step := 1 * time.Second // 1 second resolution
    span := 600 * step      // spanning 10 minutes

    // In-memory SerDe
    ms := serde.NewMemSerDe()

    // Create a receiver of our data points backed by the above
    // memory SerDe
    rcvr := receiver.New(ms, &receiver.SimpleDSFinder{&rrd.DSSpec{
        Step: step,
        RRAs: []rrd.RRASpec{
            rrd.RRASpec{Function: rrd.WMEAN,
                Step: step,
                Span: span,
            },
        }}})
    rcvr.Start()

Now let’s create a goroutine which creates data points as fast as it can. The difference from the previous blog post is that we are using QueueGauge(), which is a paced metric, meaning that it flushes to the time series only periodically (once per second by default) so as not to overwhelm the I/O and/or network (even though in this case it doesn’t really matter, since we’re using a memory-based SerDe anyway).

    start := time.Now()
    end := start.Add(span)

    go func() {
        n := 0
        for t := time.Now(); t.Before(end); t = time.Now() {
            rcvr.QueueGauge(serde.Ident{"name":"foo.bar"}, float64(n)/(t.Sub(start)).Seconds())
            n++
        }
    }()

And finally, as before, we need to hook up a couple of http handlers:

    db := dsl.NewNamedDSFetcher(ms.Fetcher())

    http.HandleFunc("/metrics/find", h.GraphiteMetricsFindHandler(db))
    http.HandleFunc("/render", h.GraphiteRenderHandler(db))

    listenSpec := ":8088"
    fmt.Printf("Waiting for requests on %s\n", listenSpec)
    http.ListenAndServe(listenSpec, nil)

} // end of main()

Now if we run the above code with something like go run simpletgres.go, we’ll notice that unlike with the previous example, the web server starts right away, and the data points are being written while the server is running. If we aim Grafana at it, we should be able to see the chart update in real time.

After a couple of minutes, mine looks like this:

So my macbook can crank these out at about 2.5 million per second.

In my experience, instrumenting my apps with simple counters like this and having them available directly from the app, without having to send them to a separate statsd server somewhere, has been extremely useful in helping understand performance and other issues.

Why Is There No Formal Definition of Time Series?


If you’re reading this, chances are you may have searched for a definition of “Time Series”. And, like me, you were probably disappointed by what you found.

The most popular “definition” I come across amongst our fellow programmer folk is that it’s “data points with timestamps”. Or something like that. And you can make charts from it. And that’s about it, alas.

The word time suggests that it has something to do with time. At first it seems reasonable, I’ll bite. The word series is a little more peculiar. A mathematician would argue that a series is a sum of a sequence. Most people though think “series” and “sequence” are the same thing, and that’s fine. But it’s a clue that time series is not a scientific term, because otherwise it most likely would have been called a time sequence.

Let’s get back to the time aspect of it. Why do data points need timestamps? Or do they? Isn’t it the time interval between points that is most essential, rather than the absolute time? And if the data points are spaced equally (which conforms to the most common definition of time series), then what purpose would any time-related information attached to a data point serve?

To understand this better, picture a time chart. Of anything - temperature, price of bitcoin over a week, whatever. Now think - does the absolute time of every point provide any useful information to you? Does the essential meaning of the chart change depending on whether it shows the price of bitcoin in the year 2016 or 2098 or 10923?

Doesn’t it seem like “time” in “time series” is a bit of a red herring?

Here is another example. Let’s say I decide to travel from San Francisco to New York taking measurements of elevation above sea level at every mile. I then plot that sequence on a chart where the x-axis is distance traveled and the y-axis is elevation. You would agree that this chart is not a “time series” by any stretch, right? But then if I renamed the x-axis to “time traveled” (let’s assume I moved at constant speed), the chart wouldn’t change at all, but now it’s okay to call it “time series”?

So it’s no surprise that there is no formal definition of “time series”. In the end a “time series” is just a sequence. There are no timestamps required and there is nothing at all special about a dimension being time as opposed to any other unit, which is why there is no mathematical definition of “time series”. Time series is a colloquial term, the etymological origins of which are not known to me, but it’s not a thing from a scientific perspective, I’m afraid.

Next time you hear “time series” just substitute it with “sequence” and see how much sense that makes. For example a “time series database” is a “sequence database”, i.e. database optimized for sequences. Aren’t all relational databases optimized for sequences?

Something to think about over the holidays…

Edit: Someone brought up the subject of unevenly-spaced time series. All series are evenly spaced given proper resolution. An unevenly-spaced time series with timestamps accurate to 1 millisecond is a sparse evenly-spaced series with a 1 millisecond resolution.

Simple Time Series App With Tgres


Did you know that you can use Tgres components in your code without PostgreSQL, and instrument your program with a time series in just a dozen lines of code? This example shows a complete server emulating the Graphite API which you can use with Grafana (or any other tool).

In this example we will be using three Tgres packages, like so (in addition to a few standard ones, which I’m skipping here for brevity - see the complete source code gist):

import (
    "github.com/tgres/tgres/dsl"
    h "github.com/tgres/tgres/http"
    "github.com/tgres/tgres/rrd"
)

First we need a Data Source. This will create a Data Source containing one Round Robin Archive with a 10 second resolution spanning 1000 seconds.

step := 10 * time.Second
span := 100 * step

ds := rrd.NewDataSource(rrd.DSSpec{
    Step: 1 * time.Second,
    RRAs: []rrd.RRASpec{
        rrd.RRASpec{Step: step, Span: span},
    },
})

Let’s shove a bunch of data points into it. To make it look extra nice, we can make these points look like a sinusoid with this little function:

func sinTime(t time.Time, span time.Duration) float64 {
    x := 2 * math.Pi / span.Seconds() * float64(t.Unix()%(span.Nanoseconds()/1e9))
    return math.Sin(x)
}

And now for the actual population of the series:

start := time.Now().Add(-span)

for i := 0; i < int(span/step); i++ {
    t := start.Add(time.Duration(i) * step)
    ds.ProcessDataPoint(sinTime(t, span), t)
}

We will also need to create a NamedDSFetcher, the structure which knows how to search dot-separated series names a la Graphite.

db := dsl.NewNamedDSFetcherMap(map[string]rrd.DataSourcer{"foo.bar": ds})

Finally, we need to create two http handlers which will mimic a Graphite server and start listening for requests:

http.HandleFunc("/metrics/find", h.GraphiteMetricsFindHandler(db))
http.HandleFunc("/render", h.GraphiteRenderHandler(db))

listenSpec := ":8088"
fmt.Printf("Waiting for requests on %s\n", listenSpec)
http.ListenAndServe(listenSpec, nil)

Now if you point Grafana at it, it will happily think it’s Graphite and should show you a chart like this:

Note that you can use all kinds of Graphite functions at this point - it all “just works”.
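
For instance, with the server above running, you can query it over the same Graphite render API that Grafana uses. Here is a minimal sketch (assuming the listenSpec above and Graphite’s usual target/from/format query parameters, which Tgres emulates):

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "net/url"
)

func main() {
    // Build a Graphite-style render request against the server above.
    v := url.Values{}
    v.Set("target", "foo.bar") // any Graphite function should work here too
    v.Set("from", "-10min")
    v.Set("format", "json")

    resp, err := http.Get("http://localhost:8088/render?" + v.Encode())
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := ioutil.ReadAll(resp.Body)
    fmt.Println(string(body)) // the series as JSON
}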

Enjoy!

Storing Time Series in PostgreSQL (Continued)


I have previously written how time series can be stored in PostgreSQL efficiently using arrays.

As a continuation of that article, I shall attempt to describe in detail the inner workings of an SQL view that Tgres uses to make an array of numbers appear as a regular table (link to code).

In short, I will explain how incomprehensible data like this:

=> select * from ts;
 rra_id | n |           dp
--------+---+------------------------
      1 | 0 | {64,67,70,71,72,69,67}
      1 | 1 | {65,60,58,59,62,68,70}
      1 | 2 | {71,72,77,70,71,73,75}
      1 | 3 | {79,82,90,69,75,80,81}

… can be transformed in an SQL view to appear as so:

=> select * from tv order by t;
 rra_id |           t            | r
--------+------------------------+----
      1 | 2008-03-06 00:00:00+00 | 64
      1 | 2008-03-07 00:00:00+00 | 67
      1 | 2008-03-08 00:00:00+00 | 70
      1 | 2008-03-09 00:00:00+00 | 71
...

This write-up will make a lot more sense if you read the previous post first. To recap, Tgres stores series in an array broken up over multiple table rows, each containing an array representing a segment of the series. The series array is a round-robin structure, which means that it occupies a fixed amount of space and we do not need to worry about expiring data points: the round-robin nature of the array takes care of it by overwriting old data with new on assignment.

An additional benefit of such a fixed interval round-robin structure is that we do not need to store timestamps for every data point. If we know the timestamp of the latest entry along with the series step and size, we can extrapolate the timestamp of any point in the series.

Tgres creates an SQL view which takes care of this extrapolation and makes this data easy to query. Tgres actually uses this view as its only source of time series information when reading from the database thus delegating all the processing to the database server, where it is close to the data and most efficient.

If you would like to follow along on the Postgres command line, feel free to create and populate the tables with the following SQL, which is nearly identical to the schema used by Tgres:

CREATE TABLE rra (
  id SERIAL NOT NULL PRIMARY KEY,
  step_s INT NOT NULL,
  steps_per_row INT NOT NULL,
  size INT NOT NULL,
  width INT NOT NULL,
  latest TIMESTAMPTZ DEFAULT NULL);

CREATE TABLE ts (
  rra_id INT NOT NULL,
  n INT NOT NULL,
  dp DOUBLE PRECISION[] NOT NULL DEFAULT '{}');

INSERT INTO rra VALUES (1, 60, 1440, 28, 7, '2008-04-02 00:00:00-00');

INSERT INTO ts VALUES (1, 0, '{64,67,70,71,72,69,67}');
INSERT INTO ts VALUES (1, 1, '{65,60,58,59,62,68,70}');
INSERT INTO ts VALUES (1, 2, '{71,72,77,70,71,73,75}');
INSERT INTO ts VALUES (1, 3, '{79,82,90,69,75,80,81}');

And finally create the view:

CREATE VIEW tv AS
  SELECT rra.id rra_id,
         latest - INTERVAL '1 SECOND' * rra.step_s * rra.steps_per_row *
           MOD(rra.size + MOD(EXTRACT(EPOCH FROM rra.latest)::BIGINT/(rra.step_s * rra.steps_per_row), size) + 1
           - (generate_subscripts(dp,1) + n * width), rra.size) AS t,
         UNNEST(dp) AS r
    FROM rra
   INNER JOIN ts ts ON ts.rra_id = rra.id;

Now give it a whirl with a SELECT * FROM tv ORDER BY t. Impressive? So how does it work?

First let’s go over the columns of the rra table.

  • step_s: the minimal unit of time expressed in seconds (60 or 1 minute in the above data).
  • steps_per_row: the number of the step_s intervals in one slot of our time series. In our example it is 1440, which is the number of minutes in a day, thus making our time series resolution one day.
  • size: number of slots in the series. Ours is 28, i.e. four weeks.
  • width: size of a segment which will be stored in a single row, which in our case is 7 (one week).
  • latest: the timestamp of the last data point in the series.

Next, let’s look at the UNNEST keyword in the SQL of the view. UNNEST takes an array and turns it into rows, e.g.:

=> SELECT UNNEST(dp) AS r FROM ts;
 r
----
 64
 67
...

UNNEST works in conjunction with the generate_subscripts PostgreSQL function which generates index values:

=> SELECT generate_subscripts(dp,1) AS i, UNNEST(dp) AS r FROM ts;
 i | r
---+----
 1 | 64
 2 | 67
...

Let us now zoom in on the very long expression in the view; here it is again:

latest - INTERVAL '1 SECOND' * rra.step_s * rra.steps_per_row *
  MOD(rra.size + MOD(EXTRACT(EPOCH FROM rra.latest)::BIGINT/(rra.step_s * rra.steps_per_row), size) + 1
  - (generate_subscripts(dp,1) + n * width), rra.size) AS t

A perhaps not immediately apparent trick to how all this works is that all our series are aligned on the beginning of the epoch. This means that at UNIX time 0, any series’ slot index is 0. From then on it increments sequentially until the series size is reached, at which point it wraps around to 0 (thus “round-robin”). Armed with this information we can calculate the index for any point in time.

The formula for calculating the index i for a given time t is:

i = t/step % size.

We need time to be expressed as UNIX time, which is done with EXTRACT(EPOCH FROM rra.latest)::BIGINT. Now you should recognize the above formula in the more verbose expression

MOD(EXTRACT(EPOCH FROM rra.latest)::BIGINT/(rra.step_s * rra.steps_per_row), size)

where rra.step_s * rra.steps_per_row is the duration of one slot of our series (i.e. the series step) in seconds.

Next, we need to compute the distance between the current slot and the last slot (for which we know the timestamp). I.e. if the last slot is i and the slot we need the timestamp for is j, the distance between them is i-j, but with a caveat: it is possible for j to be greater than i if the series wraps around, in which case the distance is the sum of the distance from j to the end of the series and the distance from the beginning to i. If you ponder over it with a pencil and paper long enough, you will arrive at the following formula for distance between two slots i and j in a wrap-around array:

distance = (size + i - j) % size
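
Here are the two formulas above as a small Go sketch (just an illustration of the arithmetic, not Tgres code), using the numbers from our example rra row:

package main

import "fmt"

// slotIndex returns the index of the slot containing UNIX time t, given the
// slot duration in seconds and the series size in slots (i = t/step % size).
func slotIndex(t, slotSeconds, size int64) int64 {
    return t / slotSeconds % size
}

// distance returns how many slots separate slot j from the latest slot i
// in a wrap-around array of the given size.
func distance(i, j, size int64) int64 {
    return (size + i - j) % size
}

func main() {
    // From the example: step_s=60, steps_per_row=1440 (one day per slot),
    // size=28, latest='2008-04-02 00:00:00+00' (UNIX time 1207094400).
    slotSeconds := int64(60 * 1440)
    size := int64(28)
    latest := int64(1207094400)

    i := slotIndex(latest, slotSeconds, size)
    fmt.Println("index of the latest slot:", i)
    fmt.Println("slot 0 is this many slots behind the latest:", distance(i, 0, size))
}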

Another thing to consider is that we’re splitting our series across multiple rows, thus the actual index of any point is the subscript into the current segment plus the index of the segment itself (the n column) multiplied by the width of the segment: generate_subscripts(dp,1) + n * width.

All of which, pieced together in SQL, now looks like this:

MOD(rra.size + MOD(EXTRACT(EPOCH FROM rra.latest)::BIGINT/(rra.step_s * rra.steps_per_row), size) + 1
  - (generate_subscripts(dp,1) + n * width), rra.size)

Astute readers should notice an unexplained + 1. This is because PostgreSQL arrays are 1-based.

Now we need to convert the distance expressed in array slots into a time interval, which we do by multiplying it by INTERVAL '1 SECOND' * rra.step_s * rra.steps_per_row.

And finally, we need to subtract the above time interval from the latest stamp which yields (ta-da!) the timestamp of the current slot:

latest - INTERVAL '1 SECOND' * rra.step_s * rra.steps_per_row *
  MOD(rra.size + MOD(EXTRACT(EPOCH FROM rra.latest)::BIGINT/(rra.step_s * rra.steps_per_row), size) + 1
  - (generate_subscripts(dp,1) + n * width), rra.size) AS t

That’s it! And even though this may look complicated, from the computational viewpoint it is very efficient, and PostgreSQL can handle it easily.

As an exercise, try setting latest to various timestamps and observe how it affects the output of the view and see if you can explain how and why it happens.
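
If you would rather verify the view’s arithmetic outside of SQL, here is a Go sketch (an illustration only, not Tgres code) that reproduces the same timestamp calculation for every element of our example data:

package main

import (
    "fmt"
    "time"
)

func main() {
    // Values from the example rra row.
    stepS := int64(60)
    stepsPerRow := int64(1440)
    size := int64(28)
    width := int64(7)
    latest := time.Date(2008, 4, 2, 0, 0, 0, 0, time.UTC)

    // The ts table: one array segment per row.
    segments := [][]float64{
        {64, 67, 70, 71, 72, 69, 67},
        {65, 60, 58, 59, 62, 68, 70},
        {71, 72, 77, 70, 71, 73, 75},
        {79, 82, 90, 69, 75, 80, 81},
    }

    slotSeconds := stepS * stepsPerRow
    latestIdx := latest.Unix() / slotSeconds % size // index of the latest slot

    for n, dp := range segments {
        for sub, val := range dp {
            // 1-based position in the whole series, mirroring
            // generate_subscripts(dp,1) + n * width in the view.
            pos := int64(sub+1) + int64(n)*width
            // wrap-around distance from the latest slot (+1 for 1-based arrays)
            dist := (size + latestIdx + 1 - pos) % size
            t := latest.Add(-time.Duration(dist*slotSeconds) * time.Second)
            fmt.Printf("%s  %v\n", t.Format("2006-01-02 15:04:05"), val)
        }
    }
}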

Parsing Table Names From SQL


Sometimes it is useful to extract table names from an SQL statement, for example if you are trying to figure out dependencies for your Hive or BigQuery (or whatever) tables.

It is actually a lot simpler than it seems and you don’t need to write your own SQL parser or find one out there. In SQL, table names always follow the FROM and JOIN keywords. So all you have to do is split the statement into tokens, and scan the list for any mention of FROM or JOIN and grab the next token.

Here is a very simplistic Python function that does this using regular expressions:

import re

def tables_in_query(sql_str):

    # remove the /* */ comments
    q = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)

    # remove whole line -- and # comments
    lines = [line for line in q.splitlines() if not re.match(r"^\s*(--|#)", line)]

    # remove trailing -- and # comments
    q = " ".join([re.split("--|#", line)[0] for line in lines])

    # split on blanks, parens and semicolons
    tokens = re.split(r"[\s)(;]+", q)

    # scan the tokens. if we see a FROM or JOIN, we set the get_next
    # flag, and grab the next one (unless it's SELECT).

    result = set()
    get_next = False
    for tok in tokens:
        if get_next:
            if tok.lower() not in ["", "select"]:
                result.add(tok)
            get_next = False
        get_next = tok.lower() in ["from", "join"]

    return result

This is obviously not perfect, for example in BigQuery there is a possibility that what follows FROM is a UDF name rather than a table, but I’ll leave working around that as an exercise for the reader.
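
For what it’s worth, the same token-scanning approach ports easily to other languages. Here is a rough Go sketch of the core loop (comment stripping omitted for brevity):

package main

import (
    "fmt"
    "regexp"
    "strings"
)

// tablesInQuery collects whatever follows FROM or JOIN in the token stream,
// skipping subqueries (a SELECT right after FROM). Comment stripping omitted.
func tablesInQuery(sql string) map[string]bool {
    // split on blanks, parens and semicolons
    tokens := regexp.MustCompile(`[\s)(;]+`).Split(sql, -1)

    result := map[string]bool{}
    getNext := false
    for _, tok := range tokens {
        lower := strings.ToLower(tok)
        if getNext && lower != "" && lower != "select" {
            result[tok] = true
        }
        getNext = lower == "from" || lower == "join"
    }
    return result
}

func main() {
    fmt.Println(tablesInQuery("SELECT a.x FROM foo a JOIN bar b ON a.id = b.id"))
}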

Load Testing Tgres


So I finally got around to some load testing of Tgres. Load testing is mysterious, it never goes the way you think it would, and what you learn is completely unexpected.

Given that I presently don’t have any spare big iron at my disposal and my “servers” are my macbook and an old thinkpad, all I was really after was making sure that Tgres is “good enough”, whatever that means. And I think it is.

I was hoping to gather some concrete numbers and maybe even make a chart or two, but in the end it all turned out to be so tedious and time consuming, running the tests with various settings for hours on end, that I just gave up for now - after all, “premature optimization is the root of all evil”.

I also wanted to see how it stacks up against Graphite carbon-cache.py. As in, is it on par, or much better or much worse. My expectation was that Graphite would outperform it, because what it does is so much simpler (and I was right). The first thing I tried to do was overwhelm Graphite. I never succeeded in that - I probably could have tried harder, but I quickly learned that I didn’t know what symptoms I was looking for. I wrote a Go program that blasted UDP data points at 10K/sec across 10K different series, and taking it to over 20K/sec saturated my network before Graphite showed any signs of deterioration. There was also no reliable way for me to audit the data points - maybe some of them got lost, but at 600K+ per minute, I don’t know of any practical way of doing it. Not without a lot of work, at least.

With Tgres things were much easier. The weakest link is, not surprisingly, PostgreSQL. What I learned is that there are two kinds of deterioration when it comes to PostgreSQL. The first one is outright: database requests get progressively slower until Tgres gets stuck with all its channels full.

You can make PostgreSQL significantly faster with a few simple tricks. For example, the following settings:

synchronous_commit = off
commit_delay = 100000

This post isn’t about PostgreSQL, so I’m not going to get into the details of what this does; there is plenty of documentation and blog posts on the subject. If you plan on hosting a busy Tgres setup, you should probably use the above settings.

The second way PostgreSQL deteriorates is not immediately apparent - it is the infamous table bloat. Getting autovacuum to keep up with the ts table (which stores all the time series) is tricky, and once you’ve run out of options to tweak, this is probably it - the maximum load the database can handle, even if it may seem relatively calm.

Autovacuum has a lot of knobs, but ultimately they all exist to take advantage of the variability of load in a database, i.e. you can let it get behind during the day and catch up at night when the database is not as busy. It doesn’t really work with time series, which are not variable by nature - if you’re receiving 5 thousand data points per second at noon, you can expect the same rate at 4am. I think the settings that worked best for me were:

autovacuum_max_workers = 10
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 2000
autovacuum_vacuum_scale_factor = 0.0
autovacuum_vacuum_cost_delay = 0 # disable cost based

To the best of my understanding, the above settings disable cost-based autovacuum (meaning it doesn’t pause periodically to yield resources to the normal db tasks), make autovacuum kick in after 2K updates (which happens in no time), and have it sleep 1s between runs, which means it’s running pretty much continuously.

I was able to sustain a load of ~6K data points per second across 6K series - anything higher caused autovacuum on my “database server” (which is a 2010 i7 Thinkpad) to fall behind.

I also did some testing of how TOAST affects performance. There is no setting for turning TOAST on or off, but it can easily be done in Tgres by changing the number of data points per row. The default is 768, which is about 75% of a page. If you, for example, double it, then each row becomes larger than a page and TOAST kicks in. TOAST is compressed, which is an advantage, but it is a separate table, which is a disadvantage. In the end it seemed like the database deteriorated quicker with TOAST, but it was rather inconclusive.

In the end the key factor, or the weakest link, was the rate of queries per second. I have now added a special rate-limiting setting to Tgres (max-flushes-per-second) which trumps all other settings and will keep your database happy, at the expense of Tgres possibly caching a few more points in memory than expected.

I will probably get back to some more load testing in a while, but for now this is it.

Golang Receiver vs Function Argument


What is the difference between a Go receiver (as in “method receiver”) and a function argument? Consider these two bits of code:

func (d *duck) quack() { // receiver
     // do something
}

versus

func quack(d *duck) { // function argument
    // do something
}

The “do something” part above would work exactly the same regardless of how you declare the function. Which raises the question: which should you use?

In the object-oriented world we were used to objects doing things, and in that context d.quack() may seem more intuitive or familiar than quack(d) because it “reads better”. After all, one could argue that the former is a duck quacking, but the latter reads like you’re quacking a duck, and what does that even mean? I have learned that you should not think this way in the Go universe, and here is why.

First, what is the essential difference? It is that when the method is called through an interface, the function to be called is determined dynamically at run time. If you are not using interfaces, then this doesn’t matter whatsoever and the only benefit you are getting from using a method is syntactic sweetness.

But what if you need to write a test where you want to stub out quack()? If your code looks like this, then it is not possible, because methods are attached to their types inflexibly, you cannot change them, and there is no such thing as a “method variable”:

type duck struct{}

func (d *duck) quack() {
     // do something
}

// the function we are testing:
func testme(d *duck) {
    d.quack() // cannot be stubbed
}

However, if you used a function argument, it would be easy:

type duck struct{}

var quack = func(d *duck) {
     // do something
}

// the function we are testing:
func foo(d *duck) {
    quack(d)
}

Now you can assign another function to quack at test time, e.g. quack = func(d *duck) { // do something else } and all is well.
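
For example, a test might stub it out like this (a sketch, written as a _test.go file in the same package as the snippet above, using the standard testing package):

package main

import "testing"

func TestFoo(t *testing.T) {
    // Save the original quack and restore it when the test is done.
    origQuack := quack
    defer func() { quack = origQuack }()

    called := false
    quack = func(d *duck) {
        called = true // record that foo() called quack()
    }

    foo(&duck{})

    if !called {
        t.Error("expected foo() to call quack()")
    }
}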

Alternatively, you can use an interface:

type quacker interface {
    quack()
}

type duck struct{}

func (d *duck) quack() { // satisfies quacker
     // do something
}

// the function we are testing:
func foo(d quacker) {
    d.quack()
}

Here, if we need to test foo() we can provide a different quacker.
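
For instance (again a sketch, as a _test.go file in the same package as the code above), the test can pass in a fake quacker:

package main

import "testing"

// fakeDuck satisfies the quacker interface and records that it was called.
type fakeDuck struct {
    quacked bool
}

func (f *fakeDuck) quack() {
    f.quacked = true
}

func TestFooWithFake(t *testing.T) {
    f := &fakeDuck{}
    foo(f)
    if !f.quacked {
        t.Error("expected foo() to call quack()")
    }
}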

Bottom line is that it only makes sense to use a receiver if this function is part of an interface implementation, OR if you never ever need to augment (stub) that function for testing or some other reason. As a practical matter, it seems like (contrary to how it’s done in the OO world) it is better to always start out with quack(d) rather than d.quack().

How Data Points Build Up


This silly SVG animation (animation not my strong suit) demonstrates what happens when multiple Tgres data points arrive within the same step (i.e. smallest time interval for this series, also known as PDP, primary data point).

(SVG animation)

Explanation

Let’s say we have a series with a step of 100 seconds. We receive the following data points, all within the 100 second interval of a single step:

Time    Value   Recorded
 25s    2.0     0.5
 75s    3.0     2.0
100s    1.0     2.25
                Final: 2.25

Tgres will store 2.25 as the final value for this step. So how do 1, 2 and 3 add up to 2.25?

One way to think about it is that the incomplete step is an empty swimming pool as wide as 1 step, into which we dump blocks of water. The first data point dumps a 2.0 × 0.25 block of water, which fills the pool to 0.5. The second data point dumps a 3.0 × 0.50 block, which raises the water another 1.5 to 2.0. The last data point dumps a 1.0 × 0.25 block which raises it to the final value of 2.25. Compare this with Graphite which would simply discard the first two data points and we are left with 1.0 as the final value.

Why is it done this way? Because this is how rates add up. If this was the speed of a car in meters per second (more like a bicycle, I guess), its weighted average speed for the duration of this step of 2.25 meters per second would mean that in the 100s it would have traveled exactly 225 meters. As one of the smart readers pointed out in the comments to this post, this boils down to the Fundamental Theorem of Calculus.
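
Here is the same arithmetic as a tiny Go sketch (just the weighted-average math, not actual Tgres code):

package main

import "fmt"

func main() {
    step := 100.0 // step duration in seconds

    // (offset within the step in seconds, value) pairs from the example
    points := []struct{ at, value float64 }{
        {25, 2.0},
        {75, 3.0},
        {100, 1.0},
    }

    total, prev := 0.0, 0.0
    for _, p := range points {
        // each value is weighted by the fraction of the step it covers
        total += p.value * (p.at - prev) / step
        prev = p.at
        fmt.Printf("after %vs: %v\n", p.at, total) // 0.5, then 2, then 2.25
    }
}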

NaNs or “Unknowns”

What if instead of the first data point, the first 25s were “unknown” (recorded as NaN)? This would happen, for example, if the series heartbeat (maximum duration without any data) was exceeded. Even though the data point has a value of 2.0, it gets recorded as NaN.

Time    Value   Recorded
 25s    2.0     NaN
 75s    3.0     2.0
100s    1.0     2.33
                Final: 2.33

But wait a second… 0.50 × 3 + 0.25 × 1 = 1.75 ? Where did the value of 2.33 come from?

The reason for this is that NaN ought not to influence the value. The above calculation would only be correct if we assumed that NaN is synonymous with zero, but that would be a false assumption, as NaN means “we do not know”.

Therefore, we must only consider the known part of the step (the PDP), which is 75s. We can think of it as the “swimming pool” just getting smaller. Thus the correct calculation for the 3.0 point would be 3.0 × 50 ÷ 75 = 2.0 and for the 1.0 point 2.0 + 1.0 × 25 ÷ 75 = 2.33.
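
And the adjusted arithmetic for the NaN case, as another small sketch (again, just the math, not Tgres code):

package main

import "fmt"

func main() {
    step := 100.0   // step duration in seconds
    unknown := 25.0 // the first 25s were recorded as NaN
    known := step - unknown

    // the remaining (offset within the step, value) pairs
    points := []struct{ at, value float64 }{
        {75, 3.0},
        {100, 1.0},
    }

    total, prev := 0.0, unknown
    for _, p := range points {
        // weight by the known part of the step only
        total += p.value * (p.at - prev) / known
        prev = p.at
    }
    fmt.Println(total) // 2.333...
}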

Here it is in SVG:

(SVG animation)

Also note how the value of the data point which was recorded as NaN (2.0 in our example) is essentially irrelevant. This is because any calculation with a NaN always results in a NaN. The only thing that matters about this data point is that it was not itself a NaN and that it marked the end of the period recorded as NaN. The next data point after it (3.0 in our example) is not affected by the NaN, however, because it in effect starts afresh, not considering anything in the past.

Introducing Tgres - a Time Series DB on Top of PostgreSQL


Tgres is a metrics collection and storage server, aka a time series database. I’m not very comfortable with referring to it as a database, because at least in the case of Tgres, the database is actually PostgreSQL. But also “database” to me is in the same category as “operating system” or “compiler”, a thing so advanced that only a few can claim to be it without appearing pretentious. But for the sake of tautology avoidance, I might occasionally refer to Tgres as a TS database.

Unlike Graphite or RRDTool, Tgres produces no charts, it assumes you’re using something like Grafana. Currently Tgres supports most of the Graphite functionality (including the vast majority of the functions) as well as Statsd functionality. Tgres supports clustering, albeit one where all nodes must share the same PostgreSQL instance. Tgres can be used as a standalone server or as a Go package compiled into your app.

Current status

It’s been over a year since I began hacking on it in this incarnation, though the idea and a couple of scrapped implementations thereof go back more than two years. Tgres is still not quite production quality, though it’s probably stable enough for someone who knows their way around Go to give it a whirl. At this point I have proven the concept, and believe the architecture is sound, but the magnitude of the project turned out to be much greater than I originally pictured, and so it still needs lots and lots of proofreading, t’s crossed and i’s dotted.

Raisons d’etre

With Go, new things are possible

The idea of a TS database came about when I first decided to dive into Golang. Go can do great stuff, but I didn’t see how it applied to anything I was working on at the time. I needed a project that was a better match for the domain of applications that Go made possible, something where performance and scale matter, something with concurrent moving pieces, something challenging. A “time series database” seemed like it had potential. It has all kinds of curious requirements that could be great fun to implement in Go.

Present state of “time series databases” is dismal

I was (and still am) frustrated with the state of TS in our industry. Since the appearance of MRTG back in 1995 when the network admins of the then burgeoning Internet realized that TS is essential to device monitoring, not much has happened.

RRDTool was definitely a major step forward from MRTG which was merely a Perl script. RRDTool to this day is the best implementation of a round-robin database for time series data (in C to boot). Similarly to MRTG, RRDTool was designed as a command-line tool, the server component was left as an exercise for the user. And even though linking RRDTool into your app was not too difficult (I did it in 2004), somehow an “RRD server” never appeared.

Then there was Graphite. (I think Graphite is a reflection of the Python-can-do-anything era.) Graphite borrowed a lot of ideas from RRDTool, though its re-implementation of round-robin on-disk files in pure Python, while claiming superiority, is IMHO not that much better than RRDTool, if at all, in either accuracy or performance. In general though, I think storing data directly in files is the wrong approach to begin with.

Graphite’s appeal is that it’s an easy-to-start server that does everything, and it became especially popular alongside Statsd, a tool with umpteen different implementations designed to sit in front of Graphite. Eventually people stopped using Graphite to make charts, favoring instead the most excellent Grafana, while Graphite (or its nephew Graphite-API) became a UI-less server-only component to store and retrieve data.

Graphite and RRDTool didn’t scale very well, so for “Big Time Series” (as in very large networks, or specialized fields like finance, weather, etc.) people used solutions backed by Cassandra, HBase, or Solr such as OpenTSDB.

There are also new kids on the block such as InfluxDB or Prometheus, which are a little too flashy and commercial for my taste, each trying to solve problems that I don’t think I have.

Bottom line is that some 20 years after MRTG, time series remains mostly a system monitoring aid and has never crossed over to mainstream application development.

Data isolation

Virtually all of the aforementioned tools contribute to a problem I dub data isolation. Data isolation is when a part of our data is stored using a separate tool in a different format and is therefore not as easily accessible. For example if our metrics are in Graphite, we probably don’t even know how to get them out of it, nor does it occur to us that it might be useful. All we’ve been able to do is get a Grafana chart and are quite satisfied with it. We do not question why it isn’t a first-class citizen right in the database as a table, where we could use it in SQL joins, for example. Or export it to our big data rig and query it with Hive or Spark, etc.

Why is getting a quick chart of customer sign-ups per second next to all my customer data such a big deal these days? Why can’t it be as simple as a model in my Rails or Django app?

PostgreSQL - Avoid the storage mire

I believe that there is nothing about time series that makes it unfit for a relational database. Many projects out there are spinning their wheels solving the wrong problem, that of data storage. Storage is one of the hardest problems in computers, time series databases should focus on time series and delegate the storage to tried-and-true tools which are good at it.

Time series data does carry certain special requirements, and I’ve researched extensively all the different ways TS can be stored in a relational database. It does require taking advantage of some newer features that in the open source database world seem most available in PostgreSQL. I am guessing that with time these capabilities will become more available in other databases, and some of them already are, but for the time being I’ve decided that Tgres is PostgreSQL-only.

A bit of detail

Emulating Graphite as a starting point

I would like Tgres to be useful. The simplest way I could think of achieving usefulness is by emulating an existing tool so that it can become a drop-in replacement. This makes adoption easy and it also proves that the underlying architecture is capable. It also lets us compare performance.

It doesn’t mean that I am a fan of how Graphite does things, but I think that if Tgres is architected in such a way that there is a lower level which does the heavy lifting and then a layer on top of it that makes it behave like Graphite, that’s a great start, and it leaves options open for potential improvement and a different/better interface.

General terminology

I always liked how RRDTool documentation broke down the problem of time series into concise and clear terms. Tgres tries to leverage the RRDTool terminology. Tgres also adopts the same techniques to the extent that is possible given a considerably different architecture. Unlike RRDTool, Tgres uses a millisecond as the smallest unit of time measure.

Data Point (DP)

A data point is a value (a floating point number), a timestamp and a string name identifying the series. (For a while I contemplated allowing a data point to have multiple values, but it made things too complicated, so I reverted to a single value per data point).

Round-Robin Archive (RRA)

Tgres stores data points in round-robin archives. While “round-robin” is an implementation detail, it is part of the name because the only way it can be round-robin is if the number of data points in the archive is constant. The time-span of the RRA is determined by the step (resolution) and the size of the archive (in steps). Thus RRA’s are defined by step and size, e.g. 10s for 24 hours (a data point every 10s for 24 hours, or 8,640 points).

A series is usually stored in multiple RRA’s. The RRA’s typically have varying resolutions, e.g. we want a 10s step for the past 24h, but also a 1h step for a week and a 6h step for 3 years. In this example we have 3 RRA’s (see the sketch below). Tgres takes care of maintaining the RRA’s and selecting the right resolution for a given query so that there is no need to deal with individual RRA’s directly.
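
Using the rrd package from the earlier posts, a DS spec with those three RRAs might look roughly like this (a sketch; the field names are the ones used in the earlier examples):

package main

import (
    "fmt"
    "time"

    "github.com/tgres/tgres/rrd"
)

func main() {
    spec := rrd.DSSpec{
        Step: 10 * time.Second, // base resolution of the DS
        RRAs: []rrd.RRASpec{
            // 10s step for the past 24 hours
            rrd.RRASpec{Function: rrd.WMEAN, Step: 10 * time.Second, Span: 24 * time.Hour},
            // 1h step for a week
            rrd.RRASpec{Function: rrd.WMEAN, Step: time.Hour, Span: 7 * 24 * time.Hour},
            // 6h step for 3 years
            rrd.RRASpec{Function: rrd.WMEAN, Step: 6 * time.Hour, Span: 3 * 365 * 24 * time.Hour},
        },
    }
    fmt.Printf("%+v\n", spec)
}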

Data Source (DS)

A group of RRA’s under the same identifier (aka series name) is referred to as a data source (DS). I suppose “DS” can be used interchangeably with “series”. Depending on how Tgres is configured, DS’s are either predefined or are created on the fly based on DS name matching rules.

Note that Tgres does not store the original data points, but only the weighted averages of the received data points in each RRA. This is how RRDTool does it. Graphite doesn’t bother averaging the points but simply discards previous data points within the same step. At first it may seem not ideal that the original data is discarded, but experience shows that just about any time series operation results in a conversion to a fixed interval form as the first step, so it might as well just be done upfront.

Heartbeat (HB)

Every DS has a heartbeat, a time duration which defines the longest possible period of inactivity before the DS is considered dysfunctional. If the heartbeat is exceeded, the data since the last update will be recorded as NaNs.

Xfiles factor (XFF)

When data is consolidated from smaller to larger step RRAs, the XFF determines how much of the data is allowed to be NaN before the consolidated value becomes NaN. For example, if we are consolidating per-minute values into a per-hour value and one of the minutes happens to be NaN, strictly speaking the whole hour ought to be NaN, but that wouldn’t be very useful. The default XFF is .5, i.e. more than half of the per-minute values should be NaN before the per-hour value is considered NaN.
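
As an illustration of the idea (not actual Tgres code, and assuming the consolidated value is simply the mean of the known values), the check might look like this:

package main

import (
    "fmt"
    "math"
)

// consolidate returns the mean of the known (non-NaN) values, or NaN if the
// fraction of NaNs exceeds the xff threshold.
func consolidate(values []float64, xff float64) float64 {
    sum, known := 0.0, 0
    for _, v := range values {
        if math.IsNaN(v) {
            continue
        }
        sum += v
        known++
    }
    if float64(len(values)-known)/float64(len(values)) > xff {
        return math.NaN()
    }
    return sum / float64(known)
}

func main() {
    minutes := []float64{1, 2, math.NaN(), 4, 5}
    fmt.Println(consolidate(minutes, 0.5)) // 3: only 20% NaN, below the 0.5 XFF
}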

Postgres storage format

A time series is a series of floats. Note that when it’s stored in RRA’s, there is no need for timestamps - each position in an RRA has its timestamp defined by the current state of the RRA. If we know the timestamp of the tip, we know the timestamp of every element going back to the beginning of the RRA.

To store data points Tgres takes advantage of PostgreSQL arrays. A single row stores many data points. Tgres further splits series into multiple rows to optimize the IO.

To make the data easy to use, Tgres also creates a view which makes the data points structured as a regular table with a row per data point.

There are only 3 tables and 1 view required for Tgres operation. You can use the same database you use for any other web app you have. This means you can access the time series by simply adding a model pointing at the Tgres time series view to your Rails/Django/whatever app.

Tgres components

Tgres is organized as a set of Go packages.

tgres/daemon

The daemon is the main process that runs everything. It includes the config parser, and the listeners that receive and parse incoming data points using both UDP and TCP Graphite formats, as well as Python Pickle format (though I’m not sure who out there really uses it). It’s not too hard to add more formats, for example I think it’d be neat if Tgres could receive data points via an HTTP pixel that could be embedded in web pages.

The daemon also takes care of graceful restarts, logging and other typical long-running service stuff.

tgres/receiver

The receiver (formerly known as transceiver) is the data point router and cache. It maintains a set of workers responsible for writing the data points to their respective RRA’s, as well as caching and periodic flushing of the cache. Flushing is done once a certain number of points has accumulated or a period of time has passed, but not more often than the minimal flush frequency (all configurable).

tgres/rrd

The responsibility of rrd is to add data points to RRA’s. This is not as simple as it sounds, a good description of the concepts behind it is available here.

tgres/http

http is the place for all things related to HTTP, which currently is just the Graphite API. The API requests are passed down to the DSL level for processing.

tgres/dsl

dsl is an implementation of the Graphite functions. There are a few differences because I used the Go parser which is nearly syntactically identical. (For example a series name cannot begin with a digit because that is not a proper Go identifier).

Graphite has a large number of functions available in its DSL, and I spent a lot of time during our beach vacation last summer trying to implement them all, but I think a few are still left undone. Some were harder than others, and some led me on side adventures such as figuring out the Holt-Winters triple exponential smoothing and how to do it correctly.

tgres/serde

The interface to the database is reduced to a fairly compact SerDe (Serializer-Deserializer) interface. While the SerDe itself is utterly simplistic (e.g. “get me this series”), the SQL behind it is anything but; still, it should be possible to throw together an alternative SerDe for a different relational database (or not a database at all?).

tgres/statsd

Statsd is currently in a separate Go package, but I might integrate it with the RRD because it is not very clear that it needs to be a separate thing. Somehow it so happened that Graphite and Statsd are two separate projects, but the reasons for this are probably more cultural than by design.

tgres/cluster

Cluster supports very basic clustering. At this point it’s “good enough” given that it’s OK to occasionally lose data points during cluster transitions and all that we want to make sure of is that nodes can come and go without disruptions.

The principle behind cluster is that each node is responsible for one or more series and other nodes will forward data points to the responsible node. There is nearly zero configuration, and any node can act as the point of contact, i.e. there is no leader.

The way clustering is done is in flux at the moment, we might change it to something more robust in the near future, but for the time being it addresses the horizontal scaling problem.

There’s still lots to do…

There’s still a lot of work to be done on Tgres. For one thing, I don’t have any tests. This is mainly because I don’t believe in testing that which hasn’t “gelled”, and I wouldn’t be surprised if the above organization of packages and how they interface changes as I understand the problem better. We also need documentation. And some real-life use/testing/feedback would be great as well.

Deploying a Golang App to AWS ECS With Terraform


I’ve put together a basic example of a “Hello World” Go program which runs in Amazon AWS Elastic Compute Service (ECS), which allows running applications in Docker containers and has the ability to scale on demand.

I initially wanted to write about the components of this system and the tools you use to deploy your application, but soon realized that this would make for an extremely long post, as the number of components required for a simple “Hello World” is mind boggling. However problematic it may seem, it’s par for the course; this is what it takes to run an application in our cloudy times.

I used Terraform to build all the AWS infrastructure. Initially I was skeptical about how well it could accomplish such a tedious task, but I have to say my confidence in Terraform grew the more I used it.

The main top level tool for everything is the good old make, a tool that stood the test of time.

Here is the code of the example, read the README, I hope you find it useful:

https://github.com/grisha/hello-go-ecs-terraform