Building a Go Web App - Part 3

This is part 3. See part 1 and part 2.

The previous two posts got us to a point where we had a Go app which was able to serve a tiny bit of HTML. This post will talk about the client side, which, alas, is mostly JavaScript, not Go.

JavaScript in 2017

This is what gave me the most grief. I don’t really know how to categorize the mess that present-day JavaScript is, nor do I really know what to attribute it to, and trying to rationalize it would make for a great but entirely different blog post. So I’m just going to accept it as a reality we cannot change and move on to how best to work with it.

Building a Go Web App - Part 4

This is part 4. See part 1, part 2 and part 3.

In this part I will try to briefly go over the missing pieces in our very simplistic Go Web App.

HTTP Handler Wrappers

A tiny rant: I do not like the word “middleware”. The concept of a wrapper has been around since the dawn of computing; there is no need to invent new words for it.

With that out of the way, let’s say we need to require authentication for a certain URL. This is what our index handler presently looks like:

Tgres 0.10.0b - Time Series with Go and PostgreSQL

After nearly two years of hacking, I am tagging this version of Tgres as beta. It is functional and stable enough for people to try out and not feel like they are wasting their time. There is still a lot that could and should be improved, but at this point the most important thing is to get more people to check it out.

What is Tgres?

Tgres is a Go program which can receive time series data via the Graphite or Statsd protocols or an HTTP pixel, store it in PostgreSQL, and provide Graphite-like access to the data in a way that is compatible with tools such as Grafana. You could think of it as a drop-in Graphite/Statsd replacement, though I’d rather avoid direct comparison, because the key feature of Tgres is that data is stored in PostgreSQL.

Tgres Load Testing Follow Up

To follow up on the previous post, after a bunch of tweaking, here is Tgres (commit) receiving over 150,000 data points per second across 500,000 time series without any signs of the queue size or any other resource blowing up.

This is both Tgres and Postgres running on the same i2.2xlarge EC2 instance (8 cores, 64GB, SSD).

At this point I think there’s been enough load testing and optimization, and I am going to get back to crossing the t’s and dotting the i’s so that we can release the first version of Tgres.

PostgreSQL vs Whisper, which is Faster?

Note: there is an update to this post.

TL;DR

On an 8 CPU / 16 GB EC2 instance, Tgres can process 150,000 data points per second across 300,000 series (Postgres running on the same machine). With some tweaks we were able to get the number of series to half a million, flushing ~60K data points per second.
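To put these throughput figures in per-series terms, here is a back-of-the-envelope calculation. It assumes incoming points are spread evenly across series, which real traffic rarely is, so read the results as averages only.

```latex
\begin{aligned}
\frac{150{,}000\ \text{points/s}}{300{,}000\ \text{series}} &= 0.5\ \text{points/s per series} \;\Rightarrow\; \text{one point every } 2\ \text{s per series} \\
\frac{500{,}000\ \text{series}}{60{,}000\ \text{points/s}} &\approx 8.3\ \text{s between flushes of any one series}
\end{aligned}
```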

Now the long version…

If you had asked me just a couple of months ago whether Tgres could outperform Graphite, my answer would have been “No”. Tgres uses Postgres to store time series data, while Graphite stores data by writing to files directly, and the overhead of the relational database just seemed too great.

Storing Time Series in PostgreSQL - Optimize for Write

Continuing from the previous write-up on how time series data can be stored in Postgres efficiently, here is another approach, this time designed for extreme write performance.

The “horizontal” data structure in the last article requires an SQL statement for every data point update. If you cache data points long enough, you might be able to collect a bunch for a series and write them out at once for a slight performance advantage. But there is no way to update multiple series with a single statement, it’s always at least one update per series. With a large number of series, this can become a performance bottleneck. Can we do better?
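To make the bottleneck concrete, here is a minimal Go sketch that generates the statements the “horizontal” layout implies. The table and column names (ts, rra_id, n, dp) are assumptions borrowed from the ts table shown in the “Storing Time Series in PostgreSQL (Continued)” post, and pendingPoint is a hypothetical type. The point is only that the statement count grows with the number of series.

```go
package main

import "fmt"

// pendingPoint is a hypothetical cached update for one series.
type pendingPoint struct {
	RRAID int     // which series
	Seg   int     // which array row (segment n) holds the slot
	Slot  int     // 1-based Postgres array index within that row
	Value float64 // the new data point
}

// horizontalUpdates returns one UPDATE per pending point. Values are
// interpolated here only so the SQL is easy to read; real code should
// use query placeholders ($1, $2, ...) instead.
func horizontalUpdates(points []pendingPoint) []string {
	stmts := make([]string, 0, len(points))
	for _, p := range points {
		stmts = append(stmts, fmt.Sprintf(
			"UPDATE ts SET dp[%d] = %g WHERE rra_id = %d AND n = %d;",
			p.Slot, p.Value, p.RRAID, p.Seg))
	}
	return stmts
}

func main() {
	// One fresh point for each of 5 series means 5 separate statements;
	// with 500,000 series it means 500,000 statements per flush.
	var pts []pendingPoint
	for id := 1; id <= 5; id++ {
		pts = append(pts, pendingPoint{RRAID: id, Seg: 0, Slot: 3, Value: float64(id) * 1.5})
	}
	stmts := horizontalUpdates(pts)
	fmt.Println(len(stmts)) // prints 5
	fmt.Println(stmts[0])   // prints UPDATE ts SET dp[3] = 1.5 WHERE rra_id = 1 AND n = 0;
}
```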

Simple Tgres Part II - A High Rate Counter

Continuing from the previous post on simple use of Tgres components, let’s try to count something that goes by really fast.

This time let’s start out by creating a memory-based SerDe. This means that all our data is in memory and there is no database backing our series.

package main

import (
    "fmt"
    "net/http"
    "time"

    "github.com/tgres/tgres/dsl"
    h "github.com/tgres/tgres/http"
    "github.com/tgres/tgres/receiver"
    "github.com/tgres/tgres/rrd"
    "github.com/tgres/tgres/serde"
)

func main() {

    step := 1 * time.Second // 1 second resolution
    span := 600 * step      // spanning 10 minutes

    // In-memory SerDe
    ms := serde.NewMemSerDe()

    // Create a receiver of our data points backed by the above
    // memory SerDe
    rcvr := receiver.New(ms, &receiver.SimpleDSFinder{&rrd.DSSpec{
        Step: step,
        RRAs: []rrd.RRASpec{
            rrd.RRASpec{Function: rrd.WMEAN,
                Step: step,
                Span: span,
            },
        }}})
    rcvr.Start()

Now let’s create a goroutine which creates data points as fast as it can. The difference from the previous blog post is that we are using QueueGauge(), which is a paced metric: it flushes to the time series only periodically (once per second by default) so as not to overwhelm the I/O and/or network. (In this case it doesn’t really matter, since we’re using a memory-based SerDe anyway.)

Why is there no Formal Definition of Time Series?

If you’re reading this, chances are you have searched for a definition of “Time Series”. And, like me, you were probably disappointed by what you found.

The most popular “definition” I come across amongst our fellow programmer folk is that it’s “data points with timestamps”. Or something like that. And you can make charts from it. And that’s about it, alas.

The word time suggests that it has something to do with time. At first this seems reasonable; I’ll bite. The word series is a little more peculiar. A mathematician would argue that a series is the sum of a sequence. Most people, though, think “series” and “sequence” are the same thing, and that’s fine. But it’s a clue that time series is not a scientific term, because otherwise it would most likely have been called a time sequence.
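For reference, here is the standard mathematical distinction. A sequence is an ordered list of terms, and a series is the sum of those terms, defined as the limit of the partial sums $S_N$:

```latex
\text{sequence: } (a_n)_{n \ge 1} = a_1, a_2, a_3, \ldots
\qquad
\text{series: } \sum_{n=1}^{\infty} a_n = \lim_{N \to \infty} S_N,
\quad S_N = \sum_{n=1}^{N} a_n

\text{e.g. } a_n = \tfrac{1}{2^n}: \text{ the sequence is } \tfrac12, \tfrac14, \tfrac18, \ldots
\text{ and the series is } \sum_{n=1}^{\infty} \tfrac{1}{2^n} = 1
```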

Simple Time Series App with Tgres

Did you know you can use Tgres components in your code without PostgreSQL and, in just a dozen lines of code, instrument your program with a time series? This example shows a complete server emulating the Graphite API, which you can use with Grafana (or any other tool).

In this example we will be using three Tgres packages like so (in addition to a few standard ones, which I’m skipping here for brevity; see the complete source code gist):

Storing Time Series in PostgreSQL (Continued)

Edit: there is now a part III in this series of articles.

I have previously written about how time series can be stored in PostgreSQL efficiently using arrays.

As a continuation of that article, I shall attempt to describe in detail the inner workings of an SQL view that Tgres uses to make an array of numbers appear as a regular table (link to code).
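The core of such a view is unnesting: each array element becomes its own row. Here is a minimal Go sketch of that idea, applied to the first two rows of the ts sample in this post. It is not the Tgres view. The position formula n*width + index assumes every segment holds the same number of slots (7 in the sample). The actual view also maps positions to timestamps, which this sketch leaves out.

```go
package main

import "fmt"

// arrayRow mirrors one row of the ts table: series id, segment n, data points.
type arrayRow struct {
	RRAID int
	N     int
	DP    []float64
}

// flatRow is one data point per row, as a view over ts could expose it.
type flatRow struct {
	RRAID int
	Pos   int // assumed global slot position: n*width + index within the array
	Value float64
}

// flatten unnests every array, assuming all segments have the same width.
func flatten(rows []arrayRow, width int) []flatRow {
	var out []flatRow
	for _, r := range rows {
		for i, v := range r.DP {
			out = append(out, flatRow{RRAID: r.RRAID, Pos: r.N*width + i, Value: v})
		}
	}
	return out
}

func main() {
	rows := []arrayRow{
		{1, 0, []float64{64, 67, 70, 71, 72, 69, 67}},
		{1, 1, []float64{65, 60, 58, 59, 62, 68, 70}},
	}
	// Prints 14 rows: positions 0-6 come from n=0, positions 7-13 from n=1.
	for _, f := range flatten(rows, 7) {
		fmt.Printf("%d %2d %g\n", f.RRAID, f.Pos, f.Value)
	}
}
```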

In short, I will explain how incomprehensible data like this:

=> select * from ts;
 rra_id | n |           dp
--------+---+------------------------
      1 | 0 | {64,67,70,71,72,69,67}
      1 | 1 | {65,60,58,59,62,68,70}
      1 | 2 | {71,72,77,70,71,73,75}
      1 | 3 | {79,82,90,69,75,80,81}

… can be transformed by an SQL view to appear like this: