Parsing Table Names from SQL

2016-11-14

Sometimes it is useful to extract table names from an SQL statement, for example if you are trying to figure out dependencies for your Hive or BigQuery (or whatever) tables.

It is actually a lot simpler than it seems and you don’t need to write your own SQL parser or find one out there. In SQL table names always follow the FROM and JOIN keywords. So all you have to do is split the statemement into tokens, and scan the list for any mention of FROM or JOIN and grab the next token.

Load testing Tgres

2016-11-08

Edit: There is an update to this story.

So I finally got around to some load testing of Tgres. Load testing is mysterious, it never goes the way you think it would, and what you learn is completely unexpcted.

Given that I presently don’t have any spare big iron at my disposal and my “servers” are my macbook and an old thinkpad, all I really was after is making sure that Tgres is “good enough” whatever that means. And I think it is.

Golang receiver vs function argument

2016-09-22

What is the difference between a Go receiver (as in “method receiver”) and a function argument? Consider these two bits of code:

func (d *duck) quack() { // receiver
     // do something
}

versus

func quack(d *duck) { // funciton argument
    // do something
}

The “do something” part above would work exactly the same regardless of how you declare the function. Which begs the question, which should you use?

In the object-oriented world we were used to objects doing things, and in that context d.quack() may seem more intuitive or familiar than quack(d) because it “reads better”. After all, one could argue that the former is a duck quacking, but the latter reads like you’re quacking a duck, and what does that even mean? I have learned that you should not think this way in the Go universe, and here is why.

How Data Points Build Up

2016-08-04

This silly SVG animation (animation not my strong suit) demonstrates what happens when multiple Tgres data points arrive within the same step (i.e. smallest time interval for this series, also known as PDP, primary data point).

Explanation

Let’s say we have a series with a step of 100 seconds. We receive the following data points, all within the 100 second interval of a single step:

Introducing Tgres - A Time Series DB on top of PostgreSQL

2016-07-29

Tgres is a metrics collection and storage server, aka a time series database. I’m not very comfortable with referring to it as a database, because at least in case of Tgres, the database is actually PostgreSQL. But also “database” to me is in the same category as “operating system” or “compiler”, a thing so advanced that only few can claim to be it without appearing pretentious. But for the sake of tautology avoidance, I might occasionally refer to Tgres as a TS database.

Deploying a Golang app to AWS ECS with Terraform

2016-04-19

I’ve put together a basic example of a “Hello World” Go program which runs in Amazon AWS Elastic Compute Service (ECS), which allows running applications in Docker containers and has the ability to scale on demand.

I initially wanted to write about the components of this system and the tools you use to deploy your application, but soon realized that this would make for an extremely long post, as the number of components required for a simple “Hello World” is mind boggling. However problematic it may seem, it’s par for the course, this is what takes to run an application in our cloudy times.

Holt-Winters Forecasting for Dummies - Part III

2016-02-17

If you haven’t read Part I and Part II you probably should, or the following will be hard to make sense of.

In Part I we’ve learned how to forceast one point, in Part II we’ve learned how to forecast two points. In this part we’ll learn how to forecast many points.

More Terminology

Season

If a series appears to be repetitive at regular intervals, such an interval is referred to as a season, and the series is said to be seasonal. Seasonality is required for the Holt-Winters method to work, non-seasonal series (e.g. stock prices) cannot be forecasted using this method (would be nice though if they could be).

Holt-Winters Forecasting for Dummies - Part II

2016-02-16

If you haven’t read Part I you probably should, or the following will be hard to make sense of.

All the forecasting methods we covered so far, including single exponential smoothing, were only good at predicting a single point. We can do better than that, but first we need to be introduced to a couple of new terms.

More terminology

Level

Expected value has another name, which, again varies depending on who wrote the text book: baseline, intercept (as in Y-intercept) or level. We will stick with “level” here.

Holt-Winters Forecasting for Dummies (or Developers) - Part I

2016-01-29

This three part write up [Part II Part III] is my attempt at a down-to-earth explanation (and Python code) of the Holt-Winters method for those of us who while hypothetically might be quite good at math, still try to avoid it at every opportunity. I had to dive into this subject while tinkering on tgres (which features a Golang implementation). And having found it somewhat complex (and yet so brilliantly simple), figured that it’d be good to share this knowledge, and in the process, to hopefully solidify it in my head as well.

Storing Time Series in PostgreSQL efficiently

2015-09-23

With the latest advances in PostgreSQL (and other db’s), a relational database begins to look like a very viable TS storage platform. In this write up I attempt to show how to store TS in PostgreSQL. (2016-12-17 Update: there is a part 2 of this article.)

A TS is a series of [timestamp, measurement] pairs, where measurement is typically a floating point number. These pairs (aka “data points”) usually arrive at a high and steady rate. As time goes on, detailed data usually becomes less interesting and is often consolidated into larger time intervals until ultimately it is expired.