Gregory Trubetskoy

Relative Imports Hack in Golang

2018-10-18T12:42:00-04:00

If your program’s Go source code contains multiple packages, you have probably come across a situation where creating a fork on GitHub, or moving or renaming your top-level directory, means that all your import statements need to be adjusted.

There is a simple hack to accomplish relative imports by using the vendor directory and symlinks. If you have a package directory called mypkg, then the following should work:

$ mkdir -p vendor/relative
$ ln -s ../../mypkg vendor/relative/mypkg

And now, your Go code could do this:

package main

import (
    "fmt"

    "relative/mypkg"
)

I cannot think of any downsides to this approach. If you know of any, please do comment!

Blockchain Proof-of-Work is a Decentralized Clock

2018-01-23T11:41:00-05:00

This is an explanation of the key function of Proof-of-Work in the Bitcoin blockchain. It focuses on the one feature of Proof-of-Work that is essential and shows that other features often talked about, such as security, are secondary side-effects: useful, but not essential.

This explanation rests on illustrating a few interesting properties of how Proof-of-Work is used in the blockchain that are not immediately obvious and sometimes are rather counter-intuitive, for example how participants collectively solve a problem without ever communicating.

Having understood each of these properties, one should conclude that Proof-of-Work is primarily a mechanism which accomplishes a distributed and decentralized system of timing, i.e. a clock.

Note that this write up isn’t about Proof-of-Work per se, it explains how the blockchain takes advantage of it. If you do not know anything about Proof-of-Work, then this link might be a good start.

The Decentralized Ledger Time Ordering Problem

Before describing the solution, let us focus on the problem. Much of the literature around Proof-of-Work is so confusing because it attempts to explain the solution without first identifying the problem.

Any ledger absolutely needs order. One cannot spend money that has not been received, nor can one spend money that is already spent. Blockchain transactions (or blocks containing them) must be ordered, unambiguously, and without the need for a trusted third party.

Even if the blockchain was not a ledger but just data like a log of some sort, for every node to have an identical copy of the blockchain, order is required. A blockchain in a different order is a different blockchain.

But if transactions are generated by anonymous participants all over the world, and no central party is responsible for organizing the list, how can it be done? For example transactions (or blocks) could include timestamps, but how could these timestamps be trusted?

Time is but a human concept, and any source of it, such as an atomic clock, is a “trusted third party”, one which, on top of everything, is slightly wrong most of the time due to network delays as well as the effects of relativity. Even time dilation between someone in an airplane vs the ground, though minute, is sufficient to make ordering impossible. Paradoxically, relying on a timestamp to determine event order is not possible in a decentralized, geographically dispersed system.

The “time” we are interested in is not the year, month, day, etc. that we are used to. What we need is a mechanism by which we can verify that one event took place before another or perhaps concurrently.

First though, for the notions of before and after to be applicable, a point in time needs to be established. Establishing a point in time may seem theoretically impossible at first because there is no technology accurate enough to measure a Planck. But as you’ll see, Bitcoin works around this by creating its own notion of time where precise points in time are in fact possible.

This problem is well described in Leslie Lamport’s 1978 paper “Time, Clocks, and the Ordering of Events in a Distributed System”, which doesn’t actually provide a comprehensive solution other than “properly synchronized physical clocks”. In 1982 Lamport also described the “Byzantine Generals Problem”, and Satoshi in one of his first emails explains how Proof-of-Work is a solution, though the Bitcoin paper states “To implement a distributed timestamp server on a peer-to-peer basis, we will need to use a proof-of-work system”, suggesting that it primarily solves the issue of timestamping.

Timing is the Root Problem

It must be stressed that the impossibility of associating events with points in time in distributed systems was the unsolved problem that precluded a decentralized ledger from ever being possible until Satoshi Nakamoto invented a solution. There are many other technical details that play into the blockchain, but timing is fundamental and paramount. Without timing there is no blockchain.

Proof-of-Work Recap

Very briefly, the Bitcoin Proof-of-Work is a value whose SHA-2 hash conforms to a certain requirement which makes such a value difficult to find. The difficulty is established by requiring that the hash is less than a specific number: the smaller the number, the rarer the input value and the higher the difficulty of finding it.

It is called “Proof Of Work” because it is known that a value with such a hash is extremely rare, which means that finding such a value requires a lot of trial and error, i.e. “work”. Work in turn implies time.

By varying the requirement, we can vary the difficulty and thus the probability of such a hash being found. The Bitcoin Difficulty adjusts dynamically so that a proper hash is found on average once every ten minutes.
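As a rough illustration of the recap above, here is a minimal Python sketch of the search. The header bytes, the `hash_as_int` and `mine` helpers and the deliberately easy target are made up for this example; real Bitcoin hashes an 80-byte block header against a vastly smaller target.

```python
import hashlib
import struct

def hash_as_int(data: bytes) -> int:
    """Double SHA-256, read as a little-endian 256-bit integer."""
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "little")

def mine(prefix: bytes, target: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash falls below the target."""
    nonce = 0
    while hash_as_int(prefix + struct.pack("<I", nonce)) >= target:
        nonce += 1
    return nonce

# Toy target: about 1 in 2**12 hashes conforms, so ~4096 tries on average.
TOY_TARGET = 2 ** (256 - 12)
nonce = mine(b"toy block header", TOY_TARGET)
print("conforming nonce:", nonce)
```

Lowering `TOY_TARGET` by one bit doubles the expected number of attempts, which is all that "adjusting the difficulty" means in this sketch.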

Nothing Happens Between Blocks

The state of the chain is reflected by its blocks, and each new block produces a new state. The blockchain state moves forward one block at a time, and the average 10 minutes of a block is the smallest measure of blockchain time.

SHA is Memoryless and Progress-Free

The Secure Hash Algorithm is what is known in statistics and probability as memoryless. This is a property that is particularly counter-intuitive for us humans.

The best example of memoryless-ness is a coin toss. If a coin comes up heads 10 times in a row, does it mean that the next toss is more likely to be tails? Our intuition says yes, but in reality each toss has a 50/50 chance of either outcome regardless of what happened immediately prior.

Memorylessness is required for the problem to be progress-free. Progress-free means that as miners try to solve blocks iterating over nonces, each attempt is a stand-alone event and the probability of finding a solution is constant at each attempt, regardless of how much work has been done in the past. In other words at each attempt the participant is not getting any “closer” to a solution or is making no progress. And a miner who’s been looking for a solution for a year isn’t more likely to solve a block at the next attempt than a miner who started a second ago.
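In symbols (a sketch: $p$ for the per-attempt success probability and $X$ for the number of attempts until the first success are notation introduced here, not taken from the text), independent attempts follow a geometric distribution:

```latex
P(X > n) = (1-p)^{n}, \qquad
P(X > m+n \mid X > m) = \frac{(1-p)^{m+n}}{(1-p)^{m}} = (1-p)^{n} = P(X > n).
```

Having already failed $m$ times changes nothing about the odds for the next $n$ attempts, which is exactly the progress-free property.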

The probability of finding the solution given a specific difficulty in a given period of time is therefore determined solely by the speed at which all participants can iterate through the hashes. Not the prior history, not the data, just the hashrate.

The hashrate in turn is a function of the number of participants and the speed of the equipment used to calculate the hash.

(NB: Though strictly speaking SHA is not progress-free because there is a finite number of hashes, the range of a 256-bit integer is so vast that it is practically progress-free.)

The SHA Input is Irrelevant

In the Bitcoin blockchain the input is a block header. But if we just fed it random values, the probability of finding a conforming hash would still be the same. Regardless of whether the input is a valid block header or bytes from /dev/random, it is going to take 10 minutes on average to find a solution.

Of course if you find a conforming hash but your input wasn’t a valid block, such a solution cannot be added to the blockchain, but it is still Proof-of-Work (albeit useless).

The Difficulty is Intergalactic

Curiously, the difficulty is universal, meaning it spans the entire universe. We could have miners on Mars helping out, they do not need to know, or communicate with the Earth miners, the problem would still be solved every 10 minutes. (Ok, they’ll need to somehow tell the Earth people that they solved it if they do, or else we’ll never know about it.)

Remarkably, the distant participants are communicating without actually communicating, because they are collectively solving the same statistical problem and yet they’re not even aware of each other’s existence.

This “universal property” while at first seemingly magical is actually easy to explain. I used the term “universal” because it describes it well in one word, but really it means “known by every participant”.

The input to SHA-256 can be thought of as an integer between 0 and 2²⁵⁶ (because the output is 32 bytes, i.e. also between 0 and 2²⁵⁶, anything larger guarantees a collision, i.e. becomes redundant). Even though it is extremely large (exponentially larger than the number of atoms in the perceivable universe), it is a set of numbers that is known by every participant and the participants can only pick from this set.

If the input set is universally known, the function (SHA-256) is universally known, as well as the difficulty requirement is universally known, then the probability of finding a solution is also indeed “universal”.

Trying a SHA Makes You a Participant

If the stated problem is to find a conforming hash, all you have to do is to try it once, and bingo, you’ve affected the global hash rate, and for that one attempt you were a participant helping others solve the problem. You did not need to tell others that you did it (unless you actually found a solution), others didn’t need to know about it, but your attempt did affect the outcome. For the whole universe, no less.

If the above still seems suspicious, a good analogy might be the problem of finding large prime numbers. Finding the largest prime number is hard and once one is found, it becomes “discovered” or “known”. There is an infinite number of prime numbers, but only one instance of each number in the universe. Therefore whoever attempts to find the largest prime is working on the same problem, not a separate instance of it. You do not need to tell anyone you decided to look for the largest prime, you only need to announce when you find one. If no one ever looks for the largest prime, then it is never going to be found. Thus, participation (i.e. an attempt to find one), even if it’s in total secrecy, still affects the outcome, as long as the final discovery (if found at all) is publicized.

Taking advantage of this mind-boggling probabilistic phenomenon whereby any participation affects the outcome even if in complete secrecy and without success, is what makes Satoshi’s invention so remarkably brilliant.

It is noteworthy that since SHA is progress-free, each attempt could be thought of as a participant joining the effort and immediately leaving. Thus miners join and leave, quintillions of times per second.

The Participation is Revealed in Statistics

The magical secret participation property also works in reverse. The global hashrate listed on many sites is known not because every miner registered at some “miners registration office” where they report their hash rate periodically. No such thing exists.

The hash rate is known because for a solution of a specific difficulty to be found in 10 minutes, on average this many attempts (~10²¹ as of this writing) had to have been made by someone somewhere.
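This inference fits in a few lines. The sketch below relies on the conventional definition of Bitcoin difficulty, under which a difficulty of 1 corresponds to roughly 2**32 expected hash attempts per block; the function name and the sample difficulty are illustrative, not measurements.

```python
def estimate_hashrate(difficulty: float, block_interval_s: float = 600.0) -> float:
    """Estimate hashes per second from the difficulty and the observed block interval."""
    expected_attempts = difficulty * 2 ** 32  # average attempts needed per block
    return expected_attempts / block_interval_s

# An assumed round difficulty of 2 * 10**12, not a quoted figure.
rate = estimate_hashrate(2e12)
print(f"{rate:.2e} hashes per second")
```

Note that nothing in the calculation asks who did the hashing or where: the difficulty and the block interval are the only inputs.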

We do not know who these participants are, they never announced that they are working, those who did not find a solution (which is practically all of them) never told anyone they were working, their location could have been anywhere in the universe, and yet we know with absolute certainty that they exist. Simply because the problem continues to be solved.

Work is a Clock

And there is the crux of it: The difficulty in finding a conforming hash acts as a clock. A universal clock, if you will, because there is only one such clock in the universe, and thus there is nothing to sync and anyone can “look” at it.

It doesn’t matter that this clock is imprecise. What matters is that it is the same clock for everyone and that the state of the chain can be tied unambiguously to the ticks of this clock.

This clock is operated by the multi-exahash rate of an unknown number of collective participants spread across the planet, completely independent of one another.

Last Piece of the Puzzle

The solution must be the hash of a block (the block header, to be precise). As we mentioned, the input doesn’t matter, but if it is an actual block, then whenever a solution is found, it happened at the tick of our Proof-of-Work clock. Not before, not after, but exactly at. We know this unambiguously because the block was part of that mechanism.

To put it another way, if blocks weren’t the input to the SHA256 function, we’d still have a distributed clock, but we couldn’t tie blocks to the ticks of this clock. Using blocks as input addresses this issue.

Note that our Proof-of-Work clock only provides us with ticks. There is no way to tell order from the ticks alone; this is what the hash chain is for.
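A minimal sketch of how the hash chain supplies the order that the ticks lack. The three-field toy block (previous hash, payload, nonce), the helper names and the easy target are assumptions made for brevity; real headers carry more fields.

```python
import hashlib

TOY_TARGET = 2 ** (256 - 10)  # easy toy difficulty: ~1024 tries per block

def block_hash(prev_hash: bytes, payload: bytes, nonce: int) -> bytes:
    data = prev_hash + payload + nonce.to_bytes(8, "little")
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine_block(prev_hash: bytes, payload: bytes) -> dict:
    """Search for a nonce whose hash is below the target: one tick of the clock."""
    nonce = 0
    while int.from_bytes(block_hash(prev_hash, payload, nonce), "little") >= TOY_TARGET:
        nonce += 1
    return {"prev": prev_hash, "payload": payload, "nonce": nonce,
            "hash": block_hash(prev_hash, payload, nonce)}

# Each block commits to the hash of its predecessor, so it can only have
# been mined after that predecessor existed.
chain = []
prev = bytes(32)  # all-zero "previous hash" for the first block
for payload in (b"first", b"second", b"third"):
    block = mine_block(prev, payload)
    chain.append(block)
    prev = block["hash"]

def recover_order(blocks: list) -> list:
    """Rebuild the order from the prev links alone, ignoring list position."""
    by_prev = {b["prev"]: b for b in blocks}
    ordered, cursor = [], bytes(32)
    while cursor in by_prev:
        ordered.append(by_prev[cursor]["payload"])
        cursor = by_prev[cursor]["hash"]
    return ordered

print(recover_order(list(reversed(chain))))  # [b'first', b'second', b'third']
```

Shuffling the list does not matter: the links inside the blocks, not their arrival order, determine the sequence.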

What About the Distributed Consensus?

Consensus means agreement. What all participants have no choice but to agree on is that the clock has ticked. Also that everyone knows the tick and the data attached to it. And this, in fact, does solve the Byzantine Generals Problem, as Satoshi explained in an email referenced earlier.

There is a separate consensus in the infrequent but regularly occurring case of two consecutive ticks being associated with conflicting blocks. The conflict is resolved by which block will be associated with the next tick, rendering the other disputed block an “orphan”. How the chain will continue is a matter of chance, and so this too could probably be indirectly attributed to the Proof-of-Work clock.

And that is it

This is what Proof-of-Work does for the blockchain. It is not a “lottery” where miners win the right to solve a block, nor is it some peculiar conversion of real energy into a valuable concept, those are all red herrings.

For example the lottery and the miner’s reward aspect is what encourages miners to participate, but it isn’t what makes the blockchain possible. Block hashes form a chain, but again, that has nothing to do with Proof-of-Work; it cryptographically reinforces the recording of the block ordering. The hash chain also makes the previous ticks “more certain”, “less deniable” or simply more secure.

Proof-of-Work is also the mechanism by which blocks become effectively immutable, and that’s a nice side-effect which makes Segregated Witness possible, but it could just as well be done by preserving the signatures (witness), so this too is secondary.

Conclusion

The Bitcoin blockchain Proof-of-Work is simply a distributed, decentralized clock.

If you understand this explanation, then you should have a much better grasp of how Proof-of-Work compares to Proof-of-Stake, and it should be apparent that the two are not comparable: Proof-Of-Stake is about (randomly distributed) authority, while Proof-of-Work is a clock.

In the context of the blockchain, Proof-of-Work is probably a misnomer. The term is a legacy from the Hashcash project, where it indeed served to prove work. In the blockchain it is primarily about verifiably taking time. When one sees a hash that satisfies the difficulty, one knows it must have taken time. The method by which the delay is accomplished is “work”, but the hash is primarily interesting because it is a proof of time.

The fact that Proof-of-Work is all about time rather than work also suggests that there may be other similar statistical challenges that are time-consuming but require less energy. It may also mean that the Bitcoin hashrate is excessive and that the Bitcoin clock we described above could operate as reliably on a fraction of the hashrate, but it is the incentive structure that drives up the energy consumption.

Figuring out a way to pace ticks with less work is a trillion dollar problem, if you find one, please do let me know!

P.S. Special thanks to Sasha Trubetskoy of UChicago Statistics for the review and suggestions for the above text.

The Bitcoin Blockchain PostgreSQL Schema

2017-12-15T08:28:00-05:00

In a previous post I wrote some initial thoughts on storing the blockchain in Postgres. It’s been a couple of months and I’ve made some progress on the import project. This post documents the latest incarnation of the SQL schema used to store the blockchain as well as thoughts on why it was decided to be this way.

Blockchain Data Structure Overview

The Bitcoin blockchain consists of blocks. A block is a set of transactions. A block also contains some block-specific information, such as the nonce for the Proof-Of-Work validating the block.

A transaction consists of inputs and outputs. The inputs reference outputs from prior transactions, which may include transactions in the same block. When an output is referenced by an input, the output is considered spent in its entirety, i.e. there is no way to spend a part of an output.

When two different transactions’ inputs reference the same output, it is considered a double spend, and only one of the spending transactions is valid (the details of how validity is determined are outside the scope of this write up). While double-spends do imply that one of the transactions is invalid, it is not uncommon for double-spends to exist at least for a period of time, thus the database schema needs to allow them.

A transaction’s input value is the sum of its inputs and the output value is the sum of its outputs. Naturally, the output value cannot exceed the input value, but it is normal for the output value to be less than the input value. The discrepancy between the input and the output is the transaction fee and is taken by the miner solving the block in which the transaction is included.
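In code the fee rule is a one-liner. The sketch below uses plain lists of satoshi amounts as a stand-in for real inputs and outputs; the `tx_fee` name and the sample values are invented for illustration, and the coinbase transaction is a special case this rule does not cover.

```python
def tx_fee(input_values: list, output_values: list) -> int:
    """Fee in satoshis: whatever the inputs carry beyond what the outputs assign."""
    fee = sum(input_values) - sum(output_values)
    if fee < 0:
        raise ValueError("outputs exceed inputs: invalid transaction")
    return fee

# Two inputs worth 1.5 BTC in total, two outputs worth 1.4999 BTC.
print(tx_fee([100_000_000, 50_000_000], [120_000_000, 29_990_000]))  # 10000
```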

The first transaction in a block is referred to as the coinbase. Coinbase is a special transaction where the inputs refer to a (non-existent) transaction hash of all zeros. Coinbase outputs are the sum of all the fees and the miner reward.

Curiously it is possible for the same coinbase transaction to be included in more than one block, and there is at least one case of this in the blockchain. The implication of this is that the second instance of such a transaction is unspendable. This oddity was addressed by consensus changes, first by rejecting such duplicates (see BIP30) and later by requiring the block height to be referenced in the coinbase (see BIP34), and it has not been possible since.

The same transaction can be included in more than one block. This is common during chain splits, i.e. when more than one miner solves a block. Eventually one of such blocks will become orphaned, but there is a period of time during which it is not known which chain is considered “best”, and the database structure needs to accommodate this. Chain splits also cause multiple blocks to have the same height, which implies that height is not unique and cannot by itself identify a particular block.

With the introduction of SegWit, transactions also include witness data. Witness is stored at the end of a transaction as a list where each entry corresponds to an input. A witness entry is in turn a list, because an input can have multiple signatures (aka witness). Presently the per-input witness list is stored in the input record as a BYTEA.

Row Ids and Hashes

In the blockchain blocks and transactions are always referred to through their hash. A hash is an array of 32 bytes. While in theory we could build a schema which relies on the hash as the record identifier, in practice it is cumbersome compared to the traditional integer ids. Firstly, 32 bytes is four times larger than a BIGINT and eight times larger than an INT, which impacts greatly the amount of space required to store inputs and outputs as well as degrades index performance. For this reason we use INT for block ids and BIGINT for transaction ids (INT is not big enough and would overflow in a few years).

There is also an ambiguity in how the hash is printed versus how it is stored. While the SHA standard does not specify the endian-ness of the hash and refers to it as an array of bytes, Satoshi Nakamoto decided to treat hashes as little-endian 256-bit integers. The implication is that when the hash is printed (e.g. as a transaction id) the order of bytes is the reverse of how it is stored in the blockchain.

Using integer ids creates a complication in how inputs reference outputs. Whereas in the blockchain it is done entirely via a transaction hash, here we need to also store the integer id of the referenced transaction (prevout_tx_id). This is an easily justifiable optimization: without it, looking up the input transaction would require first finding the transaction integer id. The downside is that during the initial import, maintaining the hash to integer id correspondence in an efficient manner is a bit of a challenge.

Integers

Most integers in Core are defined as uint32_t, which is an unsigned 4-byte integer. Postgres 4-byte INT is signed, which presents us with two options: (1) use BIGINT instead, or (2) use INT with the understanding that larger values may appear as negative. We are opting for the latter as preserving space is more important and for as long as all the bits are correct, whether the integer is interpreted as signed or unsigned is of no consequence.
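A small sketch of option (2), showing that the bits survive the round trip between an unsigned uint32_t and a signed 4-byte INT. The helper names are made up for this example. The value 0xFFFFFFFF, which Bitcoin uses as the output index in a coinbase input, shows up as -1, consistent with the `prevout_n <> -1` check in the trigger code later in this post.

```python
import struct

def to_pg_int(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as the signed INT Postgres would store."""
    return struct.unpack("<i", struct.pack("<I", value))[0]

def from_pg_int(value: int) -> int:
    """Recover the original unsigned 32-bit value from the signed INT."""
    return struct.unpack("<I", struct.pack("<i", value))[0]

print(to_pg_int(0xFFFFFFFF))          # -1
print(to_pg_int(0x1D00FFFF))          # 486604799 (top bit clear, unchanged)
print(from_pg_int(-1) == 0xFFFFFFFF)  # True
```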

Blocks

Blocks are collections of transactions. It is a many-to-many relationship as multiple blocks can include the same transaction. The CBlockHeader is defined in Core as follows:

class CBlockHeader
{
public:
    // header
    int32_t nVersion;
    uint256 hashPrevBlock;
    uint256 hashMerkleRoot;
    uint32_t nTime;
    uint32_t nBits;
    uint32_t nNonce;

Our blocks table is defined as follows:

  CREATE TABLE blocks (
   id           SERIAL
  ,height       INT NOT NULL
  ,hash         BYTEA NOT NULL
  ,version      INT NOT NULL
  ,prevhash     BYTEA NOT NULL
  ,merkleroot   BYTEA NOT NULL
  ,time         INT NOT NULL
  ,bits         INT NOT NULL
  ,nonce        INT NOT NULL
  ,orphan       BOOLEAN NOT NULL DEFAULT false
  ,status       INT NOT NULL
  ,filen        INT NOT NULL
  ,filepos      INT NOT NULL
  );

Columns orphan, status, filen and filepos are from the CBlockIndex class which is serialized in LevelDb and not formally part of the blockchain. It contains information about the file in which the block was stored on-disk as far as Core is concerned. This information is only necessary for debugging purposes, also note that it is unique to the particular instance of the Core database, i.e. if you were to wipe it and download the chain from scratch, location and even status of blocks is likely to be different.

Note that the C++ CBlockHeader class does not actually include the hash, it is computed on-the-fly as needed. Same is true with respect to transaction id.
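To illustrate both the on-the-fly hash and the byte-order quirk described earlier, here is a hedged Python sketch that serializes the six header fields and hashes them. The inputs are the publicly known genesis block values, which are not part of the text above; `header_hash` is a name invented for this example.

```python
import hashlib
import struct

def header_hash(version, prev_hash, merkle_root, time, bits, nonce) -> bytes:
    """Serialize the six CBlockHeader fields (80 bytes) and double-SHA-256 them."""
    header = (struct.pack("<i", version) + prev_hash + merkle_root
              + struct.pack("<III", time, bits, nonce))
    assert len(header) == 80
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()

# Genesis block fields. Hashes are usually shown byte-reversed, so the
# printed Merkle root is reversed back into its stored order here.
merkle_root = bytes.fromhex(
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b")[::-1]
stored = header_hash(1, bytes(32), merkle_root, 1231006505, 0x1D00FFFF, 2083236893)

print(stored.hex())        # byte order as stored in the blockchain
print(stored[::-1].hex())  # byte order as printed, with the leading zeros
```

The second line printed is the familiar form with leading zeros; the first is what would go into the hash column if the bytes are stored as they appear in the blockchain.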

We also need a many-to-many link to transactions, which is the block_txs table. Not only do we need to record that a transaction is included in a block, but also its exact position relative to other transactions, denoted by the n column:

  CREATE TABLE block_txs (
   block_id      INT NOT NULL
  ,n             INT NOT NULL
  ,tx_id         BIGINT NOT NULL
  );

Transactions

A transaction is a collection of inputs and outputs. The CTransaction C++ class is defined as follows:

class CTransaction
{
public:
    const int32_t nVersion;
    const std::vector<CTxIn> vin;
    const std::vector<CTxOut> vout;
    const uint32_t nLockTime;

In Postgres transactions are in the txs table:

  CREATE TABLE txs (
   id            BIGSERIAL
  ,txid          BYTEA NOT NULL
  ,version       INT NOT NULL
  ,locktime      INT NOT NULL
  );

The txid column is the transaction hash and should not be confused with tx_id in other tables referencing the transaction. (“txid” is what the transaction hash is typically called in code and documentation).

Outputs

In Core an output is represented by the CTxOut class:

class CTxOut
{
public:
    CAmount nValue;
    CScript scriptPubKey;

The CAmount type above is a typedef of int64_t. It holds the value of the output in satoshis, which can be as high as 21M * 100M (21 million bitcoins times the 100 million satoshis in a bitcoin).

In SQL, an output looks like this:

  CREATE TABLE txouts (
   tx_id        BIGINT NOT NULL
  ,n            INT NOT NULL
  ,value        BIGINT NOT NULL
  ,scriptpubkey BYTEA NOT NULL
  ,spent        BOOL NOT NULL
  );

The tx_id column is the transaction to which this output belongs, n is the position within the output list.

The spent column is an optimization, it is not part of the blockchain. An output is spent if later in the blockchain there exists an input referencing it. Core maintains a separate LevelDb dataset called the UTXO Set (Unspent Transaction Output Set) which contains all unspent outputs. The reason Core does it this way is because by default it does not index transactions, i.e. Core actually does not have a way of quickly retrieving a transaction from the store as there generally is no need for such retrieval as part of a node operation, while the UTXO Set is both sufficient and smaller than a full transaction index. Since in Postgres we have no choice but to index transactions, there is no benefit in having UTXOs as a separate table, the spent flag serves this purpose instead.

The UTXO Set does not include any outputs with the value of 0, since there is nothing to spend there even though no input refers to them and they are not technically spent.

Inputs

An input in Core is represented by the CTxIn class, which looks like this:

class CTxIn
{
public:
    COutPoint prevout;
    CScript scriptSig;
    uint32_t nSequence;
    CScriptWitness scriptWitness;

The COutPoint class is a combination of a hash and an integer representing an output. CScriptWitness is an array of “witnesses” or (roughly speaking) signatures, which are byte arrays, just like the scriptSig.

In our schema, an input is defined as:

  CREATE TABLE txins (
   tx_id         BIGINT NOT NULL
  ,n             INT NOT NULL
  ,prevout_hash  BYTEA NOT NULL
  ,prevout_n     INT NOT NULL
  ,scriptsig     BYTEA NOT NULL
  ,sequence      INT NOT NULL
  ,witness       BYTEA
  ,prevout_tx_id BIGINT
  );

As we already mentioned above witness is stored as opaque bytes. The prevout_tx_id is the database row id of the transaction this input is spending.

Indexes and Foreign Key Constraints

Blocks and transactions are indexed by id as their primary index. Blocks also need an index on hash (unique), as well as on height and on prevhash (not unique). Transactions need a unique index on the txid.

Inputs and outputs need (tx_id, n) as primary indexes. Inputs are also indexed on (prevout_tx_id, prevout_n) so that we can quickly identify the spending input given an output.
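The lookup that the second index serves can be sketched as follows. This uses Python's built-in SQLite purely as a stand-in for Postgres, with a cut-down txins table; the index name, the helper and the sample rows are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE txins (
        tx_id         INTEGER NOT NULL,
        n             INTEGER NOT NULL,
        prevout_tx_id INTEGER,
        prevout_n     INTEGER NOT NULL,
        PRIMARY KEY (tx_id, n)
    );
    CREATE INDEX txins_prevout_idx ON txins (prevout_tx_id, prevout_n);
""")
# Transaction 2 spends output 0 of transaction 1; transaction 3 spends output 1.
db.executemany("INSERT INTO txins VALUES (?, ?, ?, ?)",
               [(2, 0, 1, 0), (3, 0, 1, 1)])

def spending_input(prevout_tx_id: int, prevout_n: int):
    """Return (tx_id, n) of the input spending the given output, or None."""
    return db.execute(
        "SELECT tx_id, n FROM txins WHERE prevout_tx_id = ? AND prevout_n = ?",
        (prevout_tx_id, prevout_n)).fetchone()

print(spending_input(1, 1))  # (3, 0)
print(spending_input(1, 2))  # None
```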

Finally, we need a basic set of foreign key constraints that ensure the integrity between all the related tables.

Triggers

The spent column in the output and the prevout_tx_id of an input are maintained by a trigger on the txins table. Every time an input is inserted, it locates the database id of the transaction it spends as well as updates the spent flag of the corresponding output.

Technically it is done using two triggers for performance reasons. This is because a trigger that modifies the row being inserted must be a BEFORE trigger, but BEFORE triggers are not allowed to be CONSTRAINT triggers. CONSTRAINT triggers have the advantage of being deferrable, i.e. they can be postponed until (database) transaction commit time. Deferring constraints can speed up inserts considerably, and for this reason the code that updates spent is in a separate AFTER trigger.

The trigger code is still rough around the edges, but here it is for posterity anyway:

CREATE OR REPLACE FUNCTION txins_before_trigger_func() RETURNS TRIGGER AS $$
  BEGIN
    IF (TG_OP = 'UPDATE' OR TG_OP = 'INSERT') THEN
      IF NEW.prevout_n <> -1 AND NEW.prevout_tx_id IS NULL THEN
        SELECT id INTO NEW.prevout_tx_id FROM txs WHERE txid = NEW.prevout_hash;
        IF NOT FOUND THEN
          RAISE EXCEPTION 'Unknown prevout_hash %', NEW.prevout_hash;
        END IF;
      END IF;
      RETURN NEW;
    END IF;
    RETURN NULL;
  END;
$$ LANGUAGE plpgsql;

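-- DELETE is deliberately not listed: the function above returns NULL for
-- it, and a NULL result from a BEFORE row trigger cancels the operation.
CREATE TRIGGER txins_before_trigger
BEFORE INSERT OR UPDATE ON txins
  FOR EACH ROW EXECUTE PROCEDURE txins_before_trigger_func();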

CREATE OR REPLACE FUNCTION txins_after_trigger_func() RETURNS TRIGGER AS $$
  BEGIN
    IF (TG_OP = 'DELETE') THEN
      IF OLD.prevout_tx_id IS NOT NULL THEN
        UPDATE txouts SET spent = FALSE
         WHERE tx_id = OLD.prevout_tx_id AND n = OLD.prevout_n;
      END IF;
      RETURN OLD;
    ELSIF (TG_OP = 'UPDATE' OR TG_OP = 'INSERT') THEN
      IF NEW.prevout_tx_id IS NOT NULL THEN
        UPDATE txouts SET spent = TRUE
         WHERE tx_id = NEW.prevout_tx_id AND n = NEW.prevout_n;
        IF NOT FOUND THEN
          RAISE EXCEPTION 'Unknown prevout_n % in txid % (id %)', NEW.prevout_n, NEW.prevout_hash, NEW.prevout_tx_id;
        END IF;
      END IF;
      RETURN NEW;
    END IF;
    RETURN NULL;
  END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER txins_after_trigger
AFTER INSERT OR UPDATE OR DELETE ON txins DEFERRABLE
  FOR EACH ROW EXECUTE PROCEDURE txins_after_trigger_func();

Identifying Orphaned Blocks

While this is not part of the schema, I thought it would be of interest to the readers. An orphaned block is a block to which no other prevhash refers. At the time of a chain split we start out with two blocks referring to the same block as previous, but the next block to arrive will identify one of the two as its previous thereby orphaning the other of the pair.

To identify orphans we need to walk the chain backwards starting from the highest height. Any block that this walk does not visit is an orphan.

In SQL this can be done using the WITH RECURSIVE query like so:

UPDATE blocks
   SET orphan = a.orphan
  FROM (
    SELECT blocks.id, x.id IS NULL AS orphan
      FROM blocks
      LEFT JOIN (
        WITH RECURSIVE recur(id, prevhash) AS (
          SELECT id, prevhash, 0 AS n
            FROM blocks
                            -- this should be faster than MAX(height)
           WHERE height IN (SELECT height FROM blocks ORDER BY height DESC LIMIT 1)
          UNION ALL
            SELECT blocks.id, blocks.prevhash, n+1 AS n
              FROM recur
              JOIN blocks ON blocks.hash = recur.prevhash
            -- %s (format placeholder filled in by the calling code; omit it when running the query by hand)
        )
        SELECT recur.id, recur.prevhash, n
          FROM recur
      ) x ON blocks.id = x.id
   ) a
  WHERE blocks.id = a.id;

The WITH RECURSIVE part connects rows by joining prevhash to hash, thereby building a list which starts at the highest height and goes towards the beginning until no parent can be found.

Then we LEFT JOIN the above to the blocks table, and where there is no match (x.id IS NULL) we mark it as orphan.
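The same walk, outside the database, can be sketched in a few lines of Python. The dictionary of hash to prevhash and the single-letter block names are stand-ins invented for the example.

```python
def find_orphans(blocks: dict, tip: str) -> set:
    """blocks maps hash -> prevhash; returns hashes not on the chain ending at tip."""
    on_chain = set()
    cursor = tip
    while cursor in blocks:          # stop when the parent is unknown
        on_chain.add(cursor)
        cursor = blocks[cursor]
    return set(blocks) - on_chain

# Toy chain: A <- B <- C <- D, with B2 competing with B at the same height.
blocks = {"A": None, "B": "A", "B2": "A", "C": "B", "D": "C"}
print(find_orphans(blocks, tip="D"))  # {'B2'}
```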

Conclusion

Devising this schema was surprisingly tedious and took many trial and error attempts to reimport the entire blockchain which collectively took weeks. Many different variations on how to optimize operations were attempted, for example using an expression index to only index a subset of a hash (first 10 bytes are still statistically unique), etc.

I would love to hear comments from the database experts out there. I’m not considering this version “final”, there is probably still room for improvement and new issues might be discovered as I progress to writing up how to insert new blocks and actually verify blocks and transactions.

Blockchain in PostgreSQL Part 2

2017-10-20T08:05:00-04:00

Update: there is now a better write up of the PostgreSQL schema. This post was rather half-baked as much was still not understood when I wrote it.

In a previous post I described a simplistic schema to store the Bitcoin blockchain in PostgreSQL. In this post I’m investigating pushing the envelope with a bit of C programming.

The Missing Functionality

Postgres cannot do certain things required to fully handle transactions. The missing functionality is (at least):

Support for Variable Length Integer used in the blockchain and more generally the binary encoding of a transaction or its components.
Elliptic Curve Signature. Even though postgres integrates with OpenSSL, which has that functionality, there is no way to call the EC functions.
Ability to parse and evaluate Bitcoin script. This is a biggie, as transaction verification requires it, and it is one of the more complex and bug-prone aspects of Bitcoin.
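To give an idea of the first item, here is a minimal Python sketch of the Bitcoin variable length integer (called CompactSize in Bitcoin Core). The function names are illustrative assumptions and have nothing to do with the extension's actual C code.

```python
import struct


def encode_varint(n):
    """Encode a non-negative integer as a Bitcoin variable length integer."""
    if n < 0xFD:
        return bytes([n])                      # one byte, the value itself
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)  # marker + 2 bytes, little-endian
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)  # marker + 4 bytes
    return b"\xff" + struct.pack("<Q", n)      # marker + 8 bytes


def decode_varint(data, offset=0):
    """Return (value, next_offset) for the varint starting at data[offset]."""
    first = data[offset]
    if first < 0xFD:
        return first, offset + 1
    size, fmt = {0xFD: (2, "<H"), 0xFE: (4, "<I"), 0xFF: (8, "<Q")}[first]
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    return value, offset + 1 + size


print(encode_varint(252).hex())  # fc
print(encode_varint(253).hex())  # fdfd00
print(decode_varint(encode_varint(70000)))  # (70000, 5)
```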

It is also important that all of the above be performant. Even though varints, script and even elliptic curve could be implemented in plain PL/pgSQL, it probably wouldn’t be fast enough for practical use. Which leaves us with the only possible option: a C extension.

Avoid Reinventing the Wheel

Anything is possible in C, but can we avoid having to reimplement it from scratch? Are there libraries that could be leveraged?

As it is now, the Bitcoin protocol is primarily specified by its source code, and the source of all truth is the Bitcoin Core. It is possible to use C++ in PG extensions, which means at least in theory the Bitcoin Core code could be leveraged somehow.

My initial conclusion is that this would be a daunting task. Bitcoin Core code requires at least C++11, as well as Boost. It also seems that the core code assumes its own specific storage and caching mechanism and isn’t easily abstracted away from it. Not to mention that using C++ libs from Postgres has complexities of its own.

I looked around for a plain C implementation of Bitcoin and found a few rather incomplete ones. The most functional one seems to be Jeff Garzik’s picocoin. With the looming Segwit2x fork and all the controversy surrounding it this may seem like an odd choice of a library, but for the purpose of what we are doing, I think it’s fine. It also seems like Picocoin isn’t actively developed, which is not great. I would very much appreciate opinions/advice on this, if you know of a better C lib, do leave a comment.

The C extension

Thanks to this excellent series of posts and Postgres’ superb documentation, I was able to put together a proof-of-concept extension, available at https://github.com/blkchain/pg_blkchain. While the C internals of it would be the subject of a whole separate post (or a few), suffice it to say that it is fairly rudimentary and all the heavy lifting is delegated to the picocoin lib.

As of now, the extension provides a handful of functions:

get_vin(tx bytea) This is a Set Returning Function (SRF), which returns the transaction inputs as rows.
get_vout(tx bytea) Similarly to get_vin(), an SRF that returns outputs.
parse_script(script bytea) An SRF which parses a Bitcoin script and returns (more or less) human-readable rows.
verify_sig(tx bytea, previous_tx bytea, n int) Verifies a specific input of a transaction (denoted by n), given the previous transaction to which the input refers. Returns a boolean.

This is hardly enough to support all of what would be required by a full node, but this is sufficient to do some interesting stuff.

Note that the function names and signatures are not final, this is a work in progress and I expect this all to evolve and change. For example, initially I implemented get_vout() which returned an array, but in the end an SRF seemed like a more flexible approach.

The Schema

In the last post I used separate tables for the transaction, inputs and outputs. With the ability to serialize/deserialize transactions at our disposal, there are more interesting options.

The most compact way to store transactions is to just use the serialized binary form in a binary (bytea) column. We can get at any particulars of it by using our functions.

The examples below are based on a single table created as

CREATE TABLE rtxs (
   id            BIGINT NOT NULL,
   tx            BYTEA NOT NULL
);

I imported the first 100K blocks or so into this table, how it was done I might describe in a separate post.

I’ll introduce the extension with my favorite example: the decoding of the signature of the genesis block input:

SELECT (sig).op_sym, encode((sig).data, 'escape')
  FROM (
    SELECT parse_script((get_vin(tx)).scriptSig) AS sig FROM rtxs
    WHERE digest(digest(tx, 'sha256'), 'sha256') = E'\\x3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a'
  ) x;
   op_sym    |                                encode
-------------+-----------------------------------------------------------------------
 OP_PUSHDATA | \377\377\000\x1D
 OP_PUSHDATA | \x04
 OP_PUSHDATA | The Times 03/Jan/2009 Chancellor on brink of second bailout for banks
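For comparison, the same three pushes can be picked apart with a few lines of Python. This is only a rough sketch: the script bytes are rebuilt from the output above rather than read from the database, the parser handles only the direct push opcodes (0x01 through 0x4b), and parse_pushes is a made-up name, not the extension's parse_script().

```python
def parse_pushes(script):
    """Yield (opcode, data) pairs; only direct pushes (0x01..0x4b) carry data."""
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4B:
            # The opcode itself is the number of bytes to push.
            yield op, script[i:i + op]
            i += op
        else:
            yield op, b""


msg = b"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"
# push 4 bytes, push 1 byte, push len(msg) bytes
script_sig = bytes([4]) + bytes.fromhex("ffff001d") + bytes([1, 4]) + bytes([len(msg)]) + msg

for op, data in parse_pushes(script_sig):
    print(op, data)
```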

Expression Indexes

One neat feature of PostgreSQL is the ability to index expressions. For example, we know that we can compute a transaction hash with

select digest(digest(tx, 'sha256'), 'sha256') from rtxs limit 1;
                               digest
--------------------------------------------------------------------
 \x6e29b04a029e308344995fab2b75e953e1efa914d306ad47c14a3cebc84564fd

Note that this is little-endian, while conventionally transaction id’s are represented with bytes reversed (big-endian): fd6445c8eb3c4ac147ad06d314a9efe153e9752bab5f994483309e024ab0296e
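Outside of Postgres the same two steps look like this in Python. It is a small sketch using only hashlib; the helper names are assumptions made for the example.

```python
import hashlib


def double_sha256(data):
    """SHA-256 applied twice, as in digest(digest(tx, 'sha256'), 'sha256')."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def display_txid(digest):
    """Reverse the bytes of the raw digest to get the conventional txid string."""
    return digest[::-1].hex()


raw = bytes.fromhex("6e29b04a029e308344995fab2b75e953e1efa914d306ad47c14a3cebc84564fd")
print(display_txid(raw))
# fd6445c8eb3c4ac147ad06d314a9efe153e9752bab5f994483309e024ab0296e
```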

Now if we want to be able to look up transactions quickly by the transaction hash, as is the convention, we can create an expression index like so:

CREATE INDEX ON rtxs(digest(digest(tx, 'sha256'), 'sha256'));

When we do this, PostgreSQL scans the entire table, computes the hash and stores it in the index. An index, after all, is just another table (of sorts), and there is nothing wrong with indexes containing values that do not exist in the table to which the index refers.

Once we do this, any time the expression digest(digest(tx, 'sha256'), 'sha256') appears in a condition on the rtxs table, PostgreSQL can look the value up in the index instead of executing the digest() function for every row.

We can attest to this with

explain analyze SELECT id
FROM rtxs
WHERE digest(digest(tx, 'sha256'), 'sha256') = E'\\x6e29b04a029e308344995fab2b75e953e1efa914d306ad47c14a3cebc84564fd';
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using rtxs_digest_idx on rtxs  (cost=0.42..8.44 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
   Index Cond: (digest(digest(tx, 'sha256'::text), 'sha256'::text) = '\x6e29b04a029e308344995fab2b75e953e1efa914d306ad47c14a3cebc84564fd'::bytea)
 Planning time: 0.077 ms
 Execution time: 0.037 ms
(4 rows)

This is pretty clever - even though we do not have an actual “transaction hash” column in our table, we do have the value and an index in the database.

Views

But what if we wanted to have a more readable representation of transactions, for example something that includes the transaction hash?

The best way to do this is via a view:

CREATE VIEW tx_view AS
  SELECT id, digest(digest(tx, 'sha256'), 'sha256') AS txid, tx
    FROM rtxs;

Postgres is clever enough to use the above index for the view:

explain analyze SELECT * FROM tx_view
 WHERE txid = E'\\x6e29b04a029e308344995fab2b75e953e1efa914d306ad47c14a3cebc84564fd';
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using rtxs_digest_idx on rtxs  (cost=0.42..8.45 rows=1 width=318) (actual time=0.045..0.046 rows=1 loops=1)
   Index Cond: (digest(digest(tx, 'sha256'::text), 'sha256'::text) = '\x6e29b04a029e308344995fab2b75e953e1efa914d306ad47c14a3cebc84564fd'::bytea)
 Planning time: 0.104 ms
 Execution time: 0.067 ms

A similar technique can be applied to inputs and outputs, for example for outputs we could create a view like so:

CREATE VIEW rtxouts AS
 SELECT id, (vout).n, (vout).value, (vout).scriptpubkey
  FROM ( SELECT id, get_vout(tx) vout FROM rtxs) x;

The outputs are now easily accessible as:

# select * from rtxouts limit 3;
 id | n |   value    |                                                               scriptpubkey
----+---+------------+------------------------------------------------------------------------------------------------------------------------------------------
  1 | 0 | 5000000000 | \x4104678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5fac
  2 | 0 | 5000000000 | \x410496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858eeac
  3 | 0 | 5000000000 | \x41047211a824f55b505228e4c3d5194c1fcfaa15a456abdf37f9b9d97a4040afc073dee6c89064984f03385237d92167c13e236446b417ab79a0fcae412ae3316b77ac
(3 rows)

Want to know the most popular opcode used in scripts?

--Note: this is obviously not the full blockchain

SELECT (parse_script(scriptpubkey)).op_sym, count(1)
  FROM (SELECT scriptpubkey FROM rtxouts) x
GROUP BY op_sym
ORDER BY count(1);
  op_sym     |  count
----------------+---------
 OP_NOP         |       5
 OP_DUP         | 1007586
 OP_EQUALVERIFY | 1007586
 OP_HASH160     | 1007586
 OP_PUSHDATA    | 1139431
 OP_CHECKSIG    | 1151434
(6 rows)

Anyway, that’s it for now. Please leave your questions and comments below or via twitter. I am very curious what people think of this approach!

Bitcoin Transaction Hash in Pure PostgreSQL

2017-10-10T17:54:00-04:00

Update: hacked together this, more details to follow later…

In theory, Postgres should be able to verify transactions and blocks, as well as do a lot of other things that are currently only done by full nodes. For this to be performant, it will most likely require an extension written in C, but I’m curious how far we can get with bare bones Postgres.

More importantly, would that actually be useful? A node is really just a database, a very efficient one for a very specific purpose, but would leveraging the full power of Postgres be somehow more beneficial than just running Bitcoin-Qt or btcd, for example?

To get to the bottom of this would be a lot of work, and potentially a lot of fun. It would also be a great blockchain learning exercise. (If you’re working on a PG extension for Bitcoin or more generally blockchain, please do let me know!)

Random Thoughts

The structure of the Bitcoin blockchain is relatively simple. We have transactions, which in turn have inputs and outputs and belong to blocks. Four tables, that’s it.

I’ve been able to import the whole blockchain with some fairly basic Go code into my old Thinkpad running Linux overnight. The Go code needs some more polishing and is probably worthy of a separate write up, so I won’t get into it for now. Below is the schema I used. I intentionally left out referential integrity and indexes to keep it simple and avoid premature optimization.

CREATE TABLE blocks (                     -- CBlockIndex (chain.h)
   id           BIGINT NOT NULL
  ,prev         BIGINT NOT NULL              -- .prev->nHeight  // genesis will have -1
  ,height       BIGINT NOT NULL              -- .nHeight
  ,hash         BYTEA NOT NULL            -- 
  ,version      BIGINT NOT NULL              -- .nVersion
  ,prevhash     BYTEA NOT NULL            -- .pprev->GetBlockHash()
  ,merkleroot   BYTEA NOT NULL            -- .hashMerkleRoot
  ,time         BIGINT NOT NULL           -- .nTime
  ,bits         BIGINT NOT NULL           -- .nBits
  ,nonce        BIGINT NOT NULL           -- .nNonce
);

CREATE TABLE txs (
   id            BIGINT NOT NULL
  ,txid          BYTEA NOT NULL
  ,version       BIGINT NOT NULL
  ,locktime      BIGINT NOT NULL
);

CREATE TABLE txins (
   id            BIGINT NOT NULL
  ,tx_id         BIGINT NOT NULL
  ,n             BIGINT NOT NULL
  ,prevout_hash  BYTEA NOT NULL
  ,prevout_n     BIGINT NOT NULL
  ,scriptsig     BYTEA NOT NULL
  ,sequence      BIGINT NOT NULL
);

CREATE TABLE txouts (
   id           BIGINT NOT NULL
  ,tx_id        BIGINT NOT NULL
  ,n            BIGINT NOT NULL
  ,value        BIGINT NOT NULL
  ,scriptpubkey BYTEA NOT NULL
);

There are a couple projects out there that keep the blockchain in a database, most notably Abe. I haven’t studied the code very carefully, but my initial impression was that Abe tries to use standard SQL that would work across most big databases, which is philosophically different from my objective of going 100% Postgres and leveraging all that it can do for us.

Bitcoin uses a lot of uint32’s. A Postgres INT is the correct size, but it is signed, which means we have to use the next larger type, BIGINT. It seems like it might be a waste to use 64 bits for a 32-bit value, but I couldn’t think of anything better than a BIGINT. For the binary stuff it seems like BYTEA is the best match.
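A quick Python illustration of why a signed 32-bit column does not work, using the common sequence value 0xFFFFFFFF as an example. The constant names are mine.

```python
import struct

INT4_MAX = 2**31 - 1    # largest value of a Postgres INT (signed 32-bit)
INT8_MAX = 2**63 - 1    # largest value of a Postgres BIGINT (signed 64-bit)

sequence = 0xFFFFFFFF   # a typical uint32 value found in transaction inputs

print(sequence > INT4_MAX)   # True: does not fit in an INT
print(sequence <= INT8_MAX)  # True: fits in a BIGINT

# The same four bytes read back as a signed 32-bit integer
(as_signed,) = struct.unpack("<i", struct.pack("<I", sequence))
print(as_signed)  # -1
```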

So what can we do with this? There is no easy way to create or verify an Elliptic Curve signature in Postgres, but with the help of the pgcrypto extension, we should be able to at least generate the correct SHA256 digest which is used in the signature. As a side note, EC signature math is actually remarkably simple and could probably be implemented as a PG function, but I’m too lazy. Here it is in a few lines of Python.

The rules on how Bitcoin generates the hash (which is then signed) are slightly complicated, and that’s an understatement.

For the purposes of this exercise, I’d just be happy with a value that matches, even if the code does not fully comply with the Bitcoin rules.

One problem I ran into was that, because Bitcoin blockchain is little-endian except for where it isn’t, you often need a way to reverse bytes in a BYTEA. Strangely, Postgres does not provide a way to do that, unless I’m missing something. But thanks to stackoverflow, here is one way to do this:

CREATE OR REPLACE FUNCTION reverse(bytea) RETURNS bytea AS $reverse$
    SELECT string_agg(byte,''::bytea)
       FROM (
          SELECT substr($1,i,1) byte
             FROM generate_series(length($1),1,-1) i) s
$reverse$ LANGUAGE sql;

We also have no way to render a Bitcoin varint, but we can fake it with some substringing for the time being.
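The statement that follows leans heavily on the pattern substring(reverse(int8send(x)) from 1 for N). Here is a Python sketch of what that pattern computes, assuming int8send() returns the 8-byte big-endian (network order) form of the integer; le_bytes is a made-up helper name. It also shows why the one-byte "varint" is only a fake: it is correct for counts and lengths below 253 and wrong above that.

```python
import struct


def int8send(n):
    """Mimic Postgres int8send(): 8 bytes, big-endian."""
    return struct.pack(">q", n)


def le_bytes(n, size):
    """Equivalent of substring(reverse(int8send(n)) from 1 for size)."""
    return int8send(n)[::-1][:size]


print(le_bytes(1, 4).hex())    # 01000000, a little-endian uint32
print(le_bytes(25, 1).hex())   # 19, a valid one-byte varint
print(le_bytes(300, 1).hex())  # 2c, NOT a valid varint for 300 (that would be fd2c01)
```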

Equipped with this, we can construct the following statement, sorry it’s a little long and I do not have the patience to explain it in writing.

select digest(digest(tx_ser || hashtype, 'sha256'), 'sha256') as shasha from (
 select substring(reverse(int8send(version)) from 1 for 4) ||
       vin ||
       vout ||
       substring(reverse(int8send(locktime)) from 1 for 4) AS tx_ser,
       substring(reverse(int8send(1)) from 1 for 4) AS hashtype
  from txs t
  join txins tt ON tt.tx_id = t.id
  join lateral (
    select tx_id, substring(reverse(int8send(count(1))) from 1 for 1) || string_agg(txin_ser, '') as vin
    from (
      select
         ti.tx_id,
         reverse(prevout_hash) ||
         substring(reverse(int8send(prevout_n)) from 1 for 4) ||
         substring(reverse(int8send(length(CASE WHEN ti.n = tt.n THEN ptxout.scriptpubkey ELSE '' END))) from 1 for 1) ||
         CASE WHEN ti.n = tt.n THEN ptxout.scriptpubkey ELSE '' END ||
         substring(reverse(int8send(sequence)) from 1 for 4) as txin_ser
      from txins ti
      join txs ptx on ti.prevout_hash = ptx.txid
      join txouts ptxout on ptxout.tx_id = ptx.id and ti.prevout_n = ptxout.n
      order by ti.n
     ) x
   group by tx_id
   ) vin on vin.tx_id = tt.tx_id
   join (
      select tx_id, substring(reverse(int8send(count(1))) from 1 for 1) || string_agg(txout_ser, '') as vout
      from (
        select
          tx_id,
          reverse(int8send(value)) ||
          substring(reverse(int8send(length(scriptpubkey))) from 1 for 1) ||
          scriptpubkey as txout_ser
        from txouts
        order by n
        ) x
      group by tx_id
    ) out ON out.tx_id = tt.tx_id
 where tt.tx_id = 37898
) x;
-[ RECORD 1 ]--------------------------------------------------------------
shasha | \x23c3bf5091f3cdaf5996b0091c5f5bb6d82f3cdc2ce077018bb854f40274e512
-[ RECORD 2 ]--------------------------------------------------------------
shasha | \xbcd4d519931da3ab98ca9745a0ceba79f05306cad4fa6ee9863819d1783a2e00

The particular transaction we are looking at is this. It happens to have id of 37898 in my database. In case you’re wondering, for this example I used a subset of the blockchain which only has the first 182,000 blocks. On the full blockchain and without indexes, this statement would have taken an eternity to execute.

What makes this particular transaction interesting is that it has two inputs, which is slightly trickier, because to spend them, there need to be two different signatures of the same transaction. This is because before signing, the input scriptSig needs to be replaced with the output’s scriptPubKey (the oversimplified version). This is reflected in the SQL in the use of LATERAL and CASE.
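Since the SQL is hard to read, here is roughly the same procedure as a Python sketch. It is a simplification under several assumptions: legacy (pre-SegWit) transactions only, hash type 1 only, counts and lengths below 253, and no handling of the more obscure script rules. The function and the sample data are made up and are not the transaction discussed here.

```python
import hashlib
import struct

SIGHASH_ALL = 1


def varint(n):
    # Single-byte form only, which is enough for this sketch.
    assert n < 0xFD
    return bytes([n])


def sighash_all(version, inputs, outputs, locktime, index, prev_script_pubkey):
    """Digest to be signed for input number `index`.

    inputs:  list of (prev_hash_little_endian, prev_n, sequence)
    outputs: list of (value, script_pubkey)
    """
    buf = struct.pack("<i", version) + varint(len(inputs))
    for i, (prev_hash, prev_n, sequence) in enumerate(inputs):
        # Only the input being signed carries a script (the CASE in the SQL);
        # all other inputs get an empty one.
        script = prev_script_pubkey if i == index else b""
        buf += prev_hash + struct.pack("<I", prev_n)
        buf += varint(len(script)) + script + struct.pack("<I", sequence)
    buf += varint(len(outputs))
    for value, script_pubkey in outputs:
        buf += struct.pack("<q", value) + varint(len(script_pubkey)) + script_pubkey
    buf += struct.pack("<I", locktime) + struct.pack("<I", SIGHASH_ALL)
    return hashlib.sha256(hashlib.sha256(buf).digest()).digest()


# Made-up two-input transaction: each input yields its own digest.
inputs = [(b"\x11" * 32, 0, 0xFFFFFFFF), (b"\x22" * 32, 1, 0xFFFFFFFF)]
outputs = [(5000000000, b"\x76\xa9")]
d0 = sighash_all(1, inputs, outputs, 0, 0, b"\x51")
d1 = sighash_all(1, inputs, outputs, 0, 1, b"\x52")
print(d0.hex())
print(d1.hex())
```

One digest per input is exactly why the query above returns two records.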

You do not have to take my word that the two hashes are correct, we can verify them fairly easily with a bit of help from the Python ecdsa library. Here is the code to verify the second hash. The key and the signature are in the transaction itself.

import codecs
import ecdsa

# Uncompressed public key, taken from the transaction itself
key = codecs.decode(
    "04de99a4267263f495e07721f96241359b48b9f522973b9d333ed8e296357c595130535ca387601955f1406e335cf658bb6a12d62c177e9511498fefcafead1c0e",
    "hex")
# DER header for an EC public key on the secp256k1 curve, followed by the key
# (a bytes literal, so this runs on both Python 2 and Python 3)
der = b'0V0\x10\x06\x07*\x86H\xce=\x02\x01\x06\x05+\x81\x04\x00\n\x03B\x00' + key
# The second hash produced by the SQL statement above
digest = codecs.decode("bcd4d519931da3ab98ca9745a0ceba79f05306cad4fa6ee9863819d1783a2e00", "hex")
# DER-encoded signature from the transaction
signature = codecs.decode(
    "30460221008e95fd3536cfd437c49e4c1dfaeeb2ece0e521420c89f1487ca6eff94053485c022100ef3a8cdc9b0a6d6d403bf7758c6b617380db6936de2bbcd3b556ec5f45c03b54",
    "hex")
vk = ecdsa.VerifyingKey.from_der(der)
print(vk.verify_digest(signature, digest, sigdecode=ecdsa.util.sigdecode_der))
# True

I hope this was fun! Now I wonder how hard it would be to make an extension to provide all the functionality required by Bitcoin….

Electricity cost of 1 Bitcoin (Sep 2017)

2017-09-28T16:38:00-04:00

How much does it cost in electricity to mine a Bitcoin?

As of Sep 28, 2017, according to blockchain.info the hashrate is: 9,214,860,125 GH/s.

These days it seems that the best miner available for sale is the AntMiner S9. It is actually over a year old, and there are faster and more energy efficient ASICs now, e.g. BitFury, but it is very hard to get any information on those, so we will just use the S9 information.

The S9 is capable of 12,930 GH/s. The collective Bitcoin hash rate is equivalent to 712,672 S9 miners running in parallel.

An S9 uses 1375W, which means that in 1 hour it consumes 1.375 kWh.

In the USA, a kWh costs $0.12 on average. (It can be as low as $0.04, according to this EIA chart.)

At 12c per kWh a running S9 costs $0.165 per hour.

712,672 running S9’s would cost $117,591.02 per hour.

Bitcoin blocks are solved at 6 per hour on average. Thus, each block costs $19,598.50 to solve.

The current mining reward is 12.5 BTC, which gives us the answer:

At $0.12 per kWh a Bitcoin costs $1,567.88 to mine.

At $0.04 per kWh a Bitcoin costs $522.62 to mine.

This, of course, does not include hardware and other costs.
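The arithmetic above fits in a few lines of Python. This is just the same back-of-the-envelope estimate written as a function; the function and parameter names are mine, and the miner count is kept fractional, so results can differ from the rounded figures by a cent.

```python
def cost_per_btc(network_ghs, miner_ghs, miner_watts, usd_per_kwh,
                 blocks_per_hour=6, reward_btc=12.5):
    """Electricity cost in USD of mining one BTC, ignoring hardware and other costs."""
    miners = network_ghs / miner_ghs             # equivalent number of miners
    kwh_per_hour = miners * miner_watts / 1000.0 # energy used by all of them in an hour
    usd_per_hour = kwh_per_hour * usd_per_kwh
    usd_per_block = usd_per_hour / blocks_per_hour
    return usd_per_block / reward_btc


# Network hashrate and S9 figures from the text
at_12c = cost_per_btc(9214860125, 12930, 1375, 0.12)
at_4c = cost_per_btc(9214860125, 12930, 1375, 0.04)
print(round(at_12c, 2))
print(round(at_4c, 2))
```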

It’s quite likely that the largest mining operations pay even less than $0.04 for electricity and the hardware they use is many times more efficient.

While grossly inaccurate, this shows that mining is quite profitable, and that Bitcoin price would have to fall a lot for mining to stop being profitable.

Looking at the Trend

Current difficulty is 1103400932964. The difficulty before that was 922724699725.

Difficulty adjusts every 2,016 blocks, which is about two weeks.

The Difficulty number is a coefficient of the “difficulty 1 target”, i.e. where the hash has to begin with 4 zero bytes (32 zero bits). It means that finding a block is N times harder than at the “difficulty 1 target”.
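To put a number on it: with 32 leading zero bits required, about 2^32 hashes are expected per block at difficulty 1, and about N times that at difficulty N. Dividing by the 600-second block interval gives the hash rate at which blocks would arrive exactly every 10 minutes. The Python below is a rough sketch using the usual 2^32 approximation, and the function name is an assumption. The rate actually measured on the network is higher or lower whenever blocks come in faster or slower than that.

```python
def implied_hashrate_ghs(difficulty, block_seconds=600):
    """Hash rate (GH/s) at which blocks are found every block_seconds on average."""
    expected_hashes = difficulty * 2**32  # expected attempts per block
    return expected_hashes / block_seconds / 1e9


current = implied_hashrate_ghs(1103400932964)
previous = implied_hashrate_ghs(922724699725)
print(round(current))
print(round(previous))
```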

We can see that at the last adjustment it went up by 180,676,233,239, or about 20% of the previous value (16% of the new one), which is quite a bit in just two weeks. The adjustment before that was from 888171856257, or 4%.
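The percentages depend on which value is used as the base, so here is the calculation spelled out in Python: once relative to the previous difficulty (the usual way to state an increase) and once as a share of the new one. The function names are mine.

```python
def pct_of_previous(old, new):
    """Increase as a percentage of the old value."""
    return (new - old) / old * 100.0


def pct_of_new(old, new):
    """Increase as a percentage of the new value."""
    return (new - old) / new * 100.0


last = (922724699725, 1103400932964)
before = (888171856257, 922724699725)

print(round(pct_of_previous(*last), 1), round(pct_of_new(*last), 1))
print(round(pct_of_previous(*before), 1), round(pct_of_new(*before), 1))
```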

Assuming that the only miner in the world was the S9, the difficulty adjustment can only be explained by more S9’s coming online. The number of S9’s online is directly proportional to the hashrate, which is directly proportional to the difficulty. Thus there is a direct relationship between energy cost and the difficulty.

When the difficulty was 922724699725 (Sep 6 through 16), the hash rate was at about 8,000,000 TH/s, or the equivalent of 618,716 S9’s. At that difficulty and the 12c per kWh price, a BTC cost $1,361 to mine.

Now let’s look back at the world before the S9, which uses the Bitmain BM1387 16nm ASIC. Before the S9, there was S7, based on BitMain BM1385 28nm ASIC. The S7 power consumption is roughly same as S9, or let’s assume it is for simplicity, but it is only capable of 4,000 GH/s.

Back at the end of 2015 when the S7 was announced, the hashrate was at around 700,000,000 GH/s, or equivalent to 175,000 S7’s. That cost $28,875 per hour, or $4,812.50 per block. The block reward was 25 Bitcoins then, so a Bitcoin would cost only $192 to mine. (With a 12.5 reward it would have been $385).

This is all very confusing, but we can see that faster hardware and more of it drives the cost of mining up, and that the relationship between the difficulty and the cost of mining a Bitcoin is linear. Faster hardware enables a higher hash rate at improved energy efficiency, and the difficulty adjusts to keep the block interval, and with it the supply of new BTC, at one block every 10 minutes.

The cost factor behind Bitcoin is energy, and spending more energy on mining makes a Bitcoin more expensive and less profitable. However, a more energy-expensive Bitcoin is a more sound/secure Bitcoin from the cryptographic perspective, which means it is likely to go up in USD price, and thus should still be profitable for the miners. This is a very interesting factor here, because if the BTC/USD price weren’t going up, the miners would be bitter enemies and would do everything possible to prevent more miners from coming online. The rise of the BTC/USD price is what makes more miners coming online a positive. So far we have not seen any news reports of mining facilities being sabotaged, which probably means miners are not enemies.

I will need to think on this some more as there are a lot of moving parts. But if I can make a cursory conclusion here, it is that (industrial) mining is and will remain very profitable for some time.

Bitcoin: USD Value

2017-09-25T08:50:00-04:00

In part 1 I explained how money has always been a global ledger and Bitcoin is just a different implementation of one. The million dollar question remains, what should Bitcoin be worth in a currency we’re more familiar with, such as USD?

Asset Pricing

To illustrate the dilemma we’re faced with, let’s look at three types of assets and how we price them.

Stocks: price is determined by the present and future profits, which is relatively straightforward.

Commodities (e.g. oil): price is a function of supply (oil being produced) and demand (oil burned in engines or whatever).

Store of value: this is anything that is bought because it keeps its value. Most commonly it is precious metals like gold, but it can also be fiat currency or valuable works of art. This category is most fitting for Bitcoin. Pricing of a store of value is strangely arbitrary, as I attempt to explain below.

Speculative Demand

There are two kinds of demand. Actual demand is based on our everyday needs. For example we rely on combustion engines which consume oil. Engines burn oil (or its byproducts) converting it to exhaust gases, at which point it is no more. The more we drive, the more oil we burn, the higher the demand.

The second kind of demand is speculative. It is based on the expectation of a future price change. Speculators buy assets they expect to go up in price and sell those they don’t. When everyone wants to buy oil because they think the price is going up, its price does indeed go up. But that is not directly related to the actual supply of oil from the ground (it often is, but not always).

Right Price

In part 1 we covered how a sale is actually a loan, and how the seller ends up with tokens representing the value owed to the seller. The number of tokens (aka price) is a reflection of how we value things relative to each other.

When we buy stuff for everyday use we establish a price range that is driven by, for lack of a better term, common sense. For example we may think that a loaf of bread is worth a dozen eggs. If the price of something exceeds common sense, we will forego buying it. This means that the price has a direct effect on demand especially when it comes to everyday consumption items such as food or fuel.

Speculators have no respect for common sense and the right price. They are only concerned with the trend. It is possible for the speculative demand to drive the price way above the common sense level, we saw that when “peak oil” was a thing. But the price will eventually gravitate towards the common sense price.

Store of Value Price

In contrast, price for a store of value is purely speculative, which means the sensibility of the price does not apply.

Let’s take gold, for example. Intuitively we might reason that there is a (non-speculative) supply and demand for gold, but it’s actually illusory. The annual production of gold is minute compared to the total gold above ground, which means there is essentially no supply. There is also next to no demand, because gold cannot be consumed. There is never less or more gold available in the world, its quantity is fairly constant. Yet the price fluctuates. The only explanation for this is speculation.

There is simply no such thing as the “right price” for a store of value. If I want to move a million dollars into gold, the price of gold is of no consequence to me, be it a thousand or a million dollars per ounce. So long as I know that it is stable, it is a good store of value.

The good news here is that no price is too high (or too low) for Bitcoin. 4K only seems high because a year ago it was 400. We tend to judge the price based on history, and there is good sense in that, indeed what goes up in value too much too fast often subsequently corrects.

Importance of Market Cap

Market capitalization is the price of all of the asset available in the world. It’s easy to compute the market cap for a stock because we know the number of shares outstanding. There is no such thing as a market cap for a commodity because it is continuously produced and consumed. When it comes to something like gold, we can estimate the market cap because we know approximately how much physical gold is above ground. Bitcoin, like gold, has an approximate market cap (approximate because it is not possible to know how much BTC has been lost).

Market cap size is critical for adoption of a store of value. It needs to be large enough to “fit” even very large amounts of fiat, ideally without affecting the market. Gold market cap is estimated at 7 trillion USD, which means that even the richest people can move all their assets into gold and not move the market. (At least one at a time. All of them at once will move the market big time).

Bitcoin market cap of about 70B USD is not large enough for even one of the richest people on the planet. This implies that if the market cap does not grow, Bitcoin is likely to fail as a store of value.

Hash Rate

What sets Bitcoin apart from all other crypto currencies is its extremely high hash rate. This means that a Bitcoin is orders of magnitude more “precious” than any other crypto coin presently in existence.

There is a definite correlation between the Bitcoin hash rate and the price. Some people argue that hash rate follows price, not the other way around, and it’s probably true.

Bitcoin’s high hash rate is what makes it the best store of value among crypto coins today.

Adoption

Ultimately, I believe adoption is the most important factor in Bitcoin USD price. Greater adoption will increase the number of speculators willing to own Bitcoin, it will drive the price up, and hopefully bring it to a level comparable to that of gold.

The key to adoption is not ease of payment or volume of transactions like we used to think until very recently. The key to adoption is understanding of the mathematics behind Bitcoin. With all the hype surrounding it, only remarkably few understand how sound Bitcoin actually is. In many ways it is more sound than any other store of value known, including gold.

Regardless of whether Bitcoin becomes de facto digital gold or not, we are witnessing a historic transformation possibly bigger than the Internet itself.

Bitcoin: Better Ink than Gold?

2017-09-22T07:52:00-04:00

The fundamental question about Bitcoin is not whether it is sound from the cryptography standpoint. The question is: what is it?

Money is Debt Ink

To define Bitcoin we need to look back at the history of money. The earliest money was in the form of things that were scarce and impossible to falsify, something like specific kinds of sea shells. Everyone knew that stuff could be traded for these tokens.

Once such monetary tokens were invented, we no longer needed to decide what to barter right there and then, we could postpone the decision. One could sell milk for tokens, then use those tokens to buy a spear later, this way the milk didn’t spoil while waiting for the spear to be made.

What is not very obvious is that the tokens represented debt. A sale is really a loan in disguise. Before the sale, the seller had milk. After the sale, the seller had tokens, which are proof that the value of the tokens is owed to the seller. In other words, the tokens received for the milk sold were a record of debt. Tokens are the ink in which this record is written.

It is noteworthy that there is no money if there is no debt, or that money implies debt. It’s a simple principle that so few understand.

World-Wide Debt Ledger

The best way of thinking about money is that it is the medium in which we maintain a world-wide record of debt. The entries in this book or ledger are written as physical tokens. Only the people in possession of the tokens actually know how much they have and there is no history, only the final state. The history exists only in the minds (or records) of the traders. It is very private.

Gold Ink

Later people started using rare metals such as gold or silver as money. Metals were better than sea shells because they were divisible. We could now make arbitrary size tokens we called (coined?) coins.

Although we intuitively think that gold has a lot of value, in reality it has very little. Gold does not feed us or keep us warm. It does have some unique properties, but back when we started using gold as money we couldn’t possibly appreciate those, other than perhaps gold being pretty and extremely durable.

Gold is also rare. But rarity does not imply value. The sea shells were worthless before they were used as money, and they are worthless now, yet they too are rare.

Banks, Paper and Records of Records (of Records)

But it turned out that keeping valuable tokens was difficult, they could be lost or stolen, and worse, people were willing to kill for them. And so we decided to keep them all safe in one place. This was the original bank.

The bank issued paper notes that corresponded to the gold in the vault. Now these paper notes could be traded for anything. This was because people knew that even though the paper is worthless, it represents gold that is in the bank. At any time one could go to the bank, give the bank the paper note and receive gold (at which point the bank would destroy the paper note because the debt is settled).

A paper note is a record of the record of debt. The true record was in gold, paper was a copy. It’s a bit of a mind-twister, but humans have become really good at rewriting the original debt ledger in other mediums.

Ironically the concept of the bank as a safe vault never really worked: people were willing to steal and kill for paper money just the same. These days bank vaults keep paper notes as if they were gold. And the bank’s computer keeps a record of the record of the record of debt.

Real Estate Ink

At some point bankers realized that they could manipulate the monetary supply because only the bankers actually knew how much gold they had. It was done “for the good of the public” who could get easier loans, but it was also an easy way for the banks to make money out of nothing.

Eventually it was decreed that not just gold, but anything could be similarly held by the bank so that vastly more paper notes could be issued. Most notably real estate, the arrangement of issuing paper notes for a house being known as a mortgage. And since a house cannot be placed into the vault, it too had to be recorded, creating yet another layer of abstraction. It all ended with collateralized debt obligations, credit default swaps and ultimately the 2008 subprime mortgage crisis. The following year Bitcoin was born…

Monetary System is Just a Ledger

The bottom line remains: we kept a ledger. The recording medium was precious metals, then evolved to paper and metals, and finally when we went off the gold standard it became just paper reflecting the value of arbitrary things held under lien as collateral.

Enter Blockchain

The Bitcoin blockchain is also a medium for this ledger, only instead of relying on scarcity of precious metals, the scarcity is in the mathematical complexity of a problem.

And this is where our minds begin to play tricks on us, because this is a concept previously unknown to humans. A Bitcoin, which takes an enormous amount of computational power to generate, is actually, really scarce. Yes, it is not physical, it is just “knowledge” or “information”, but by all laws of nature it is scarce, in fact more scarce than gold, the total amount of which in the universe isn’t fully known.

But Bitcoin is just an Agreement?

Interestingly, Bitcoin is merely an agreement and one might argue that some day we can collectively decide to increase the 21 million limit thereby diluting Bitcoin value. But can we actually do that, or will it not be Bitcoin at that point? I think only time will tell.

We do already have a lot of things that we agree on and we don’t really question how it happened. The aforementioned shells were collectively agreed upon. We agree on what the current date is, does it matter how it happened? In fact, much of what the world is, just is, including the fiat money (where “fiat” literally means “let it be done”). And so now Bitcoin just is.

The sea shells ceased to exist as money in favor of precious metals, and it is likely that the same will at some point happen with Bitcoin.

History does show that when it comes to money, people show their worst traits. This is why countries with solid currencies have big armies and police, and very strict laws regarding manipulation of money. This is how “fiat” actually works.

Amazingly, the mathematical principles on which Bitcoin is based do not need to be defended. No army in the world could ever change a single prime number.

Alternative Realities

The name Bitcoin refers to a specific blockchain. There can be many like it. The name could have been different, the parameters of the algorithm could have been different, just like the dollar bills could have been blue. There are other cryptographic currencies, and they are different, they too now exist. (Caveat: some of them are mathematically bogus).

One could argue that gold exists in nature, while Bitcoin was created by man, and thus gold is somehow more real. But Bitcoin rests on the mathematical principles that too are just part of this universe, they were not created by man, they were discovered and applied, and again in this sense Bitcoin isn’t much different than gold.

The Mystery of Value

The mystery to me is how we collectively set the value of things like gold or Bitcoins. Now that we’ve demonstrated that, as money, they are equivalent, why is an ounce of gold worth $1300? Who decided that? The market? Is its real value in how good an ink it is in the world-wide debt ledger?

To be continued…

Tgres Status - July 4th 2017

2017-07-04T08:28:00-04:00

It’s been a while since I’ve written on Tgres, here’s a little update, Independence Day edition.

Current Status

The current status is that Tgres is looking more and more like a finished product. It still needs time and especially user testing (the ball is in your court, dear reader), because only time reveals the weirdest of bugs and validates stability. I would not ditch your current stack just yet, but at this point you’d be remiss not having given Tgres a spin.

Recently I had an opportunity to test Tgres as a mirror replica of a sizable Graphite/Statsd/Grafana set up receiving approximately 10K data points per second across more than 200K series, and the results were inspiring. Tgres handled the incoming data without breaking a sweat on “hardware” (ec2 instances, rather) that was a fraction of the Graphite machines while still outperforming it in most respects.

I’d say the biggest problem (and not really a Tgres one) is that mirroring Graphite functionality exactly is next to impossible. Or, rather, it is possible, but it would imply purposely introducing inaccuracies and complexities. Because of this Tgres can never be a “drop in” replacement for Graphite. Tgres can provide results that are close but not identical, and dashboards and how the data is interpreted would require some rethinking.

What’s new?

Data Point Versioning

In a round-robin database slot values are overwritten as time moves forward and the archive comes full-circle. Whenever a value is not overwritten for whatever reason, a stale value from an obsolete iteration erroneously becomes the value for the current iteration.

One solution is to be diligent and always make sure that values are overwritten. This solution can be excessively I/O intensive for sparse series. If a series is sparse, then more I/O resources are spent blanking out non-data (by setting the value to NaN or whatever) than storing actual data points.

A much more efficient approach is to store a version number along with the datapoint. Every time the archive comes full-circle, version is incremented. With versions there is no need to nullify slots, they become obsoleted by virtue of the data point version not matching the current version.

Under the hood Tgres does this by keeping a separate array in the ts table which contains a smallint (2 bytes) for every data point. The tv view is aware of it and considers versions without exposing any details, in other words everything works as before, only Tgres is a lot more efficient and executes far fewer SQL statements.
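To illustrate the versioning idea, here is a minimal Go sketch of a versioned round-robin archive. It is not Tgres code: the archive type, its methods and the use of int16 (chosen only to mirror the 2-byte smallint) are assumptions made for this example, and version wrap-around is ignored.

```go
package main

import (
	"fmt"
	"math"
)

// archive is a hypothetical round-robin archive: a fixed number of slots,
// each paired with the version (iteration) in which it was last written.
// Version 0 means "never written".
type archive struct {
	values   []float64
	versions []int16 // mirrors the 2-byte smallint per data point
}

func newArchive(size int) *archive {
	return &archive{
		values:   make([]float64, size),
		versions: make([]int16, size),
	}
}

// locate maps an ever-increasing slot sequence number to a position in
// the archive and to the version that is current for that position.
// Overflow of the int16 version is not handled in this sketch.
func (a *archive) locate(seq int) (pos int, ver int16) {
	n := len(a.values)
	return seq % n, int16(seq/n) + 1
}

// write stores a value and stamps the slot with the current version.
func (a *archive) write(seq int, v float64) {
	pos, ver := a.locate(seq)
	a.values[pos] = v
	a.versions[pos] = ver
}

// read returns NaN when the slot was last written in another iteration,
// so stale values never need to be blanked out explicitly.
func (a *archive) read(seq int) float64 {
	pos, ver := a.locate(seq)
	if a.versions[pos] != ver {
		return math.NaN()
	}
	return a.values[pos]
}

func main() {
	a := newArchive(4)
	a.write(1, 10) // first iteration, position 1
	a.write(6, 20) // second iteration, position 2; position 1 is not rewritten

	fmt.Println(a.read(1)) // value from the first iteration
	fmt.Println(a.read(5)) // same position, second iteration: stale, so NaN
	fmt.Println(a.read(6)) // value from the second iteration
}
```

The point of the sketch is that a skipped slot costs nothing: no write is issued for it, and the version mismatch alone marks it as unknown.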

Zero Heartbeat Series

Tgres always strives to connect the data points. If two data points arrive more than a step apart, the slots in between are filled in to provide continuity. A special parameter called Heartbeat controls the maximum time between data points. A gap greater than the Heartbeat is considered unknown or NaN.

This was a deliberate design decision from the beginning, and it is not changing. Some tools choose to store data points as is, deferring any normalization to the query time. Graphite is kind of in the middle: it doesn’t store the data points as is, yet it does not attempt to do any normalization either, which ultimately leads to inaccuracies which I describe in another post.

The concept of Heartbeat should be familiar to those experienced with RRDTool, but it is unknown to users of Graphite, which has no such parameter. This “disconnected” behavior is often taken advantage of to track things that aren’t continuous but are event occurrences which can still benefit from being observed as a time series. Tracking application deploys, where each deploy is a data value of 1, is one such example.

Tgres now supports this behavior when the Heartbeat is set to 0. Incoming data points are simply stored in the best matching slot and no attempt is made to fill in the gap in between with data.
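The difference between the two modes can be sketched in Go roughly as follows. This is a deliberate simplification and not the Tgres implementation: the series type, its fields, the use of plain integer seconds and the choice to fill a gap with the newly arrived value are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// series is a hypothetical fixed-step series. Times are seconds since the
// start of the series, and slot i covers [i*step, (i+1)*step).
type series struct {
	step      int // seconds per slot
	heartbeat int // maximum gap in seconds; 0 disables gap filling
	last      int // time of the most recent data point
	slots     []float64
}

func newSeries(step, heartbeat, n int) *series {
	s := &series{step: step, heartbeat: heartbeat, slots: make([]float64, n)}
	for i := range s.slots {
		s.slots[i] = math.NaN() // everything starts out unknown
	}
	return s
}

// update stores value v observed at time at. The caller is assumed to
// pass increasing times that fall within the slots of the series.
func (s *series) update(at int, v float64) {
	cur := at / s.step
	prev := s.last / s.step
	if s.heartbeat > 0 {
		fill := v
		if at-s.last > s.heartbeat {
			fill = math.NaN() // gap exceeds the heartbeat: unknown
		}
		for i := prev + 1; i < cur; i++ {
			s.slots[i] = fill // connect the two data points
		}
	}
	// With heartbeat 0 only the best matching slot is written.
	s.slots[cur] = v
	s.last = at
}

func main() {
	connected := newSeries(10, 30, 6)
	connected.update(0, 1)
	connected.update(30, 2)
	fmt.Println(connected.slots)

	events := newSeries(10, 0, 6)
	events.update(0, 1)
	events.update(30, 1)
	fmt.Println(events.slots)
}
```

In the first series the slots between the two points are filled in, while in the zero-heartbeat series they stay NaN, which is the behavior wanted for event-like data such as deploys.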

Tgres Listens to DELETE Events

This means that to delete a DS all you need to do is run DELETE FROM ds WHERE ... and you’re done. All the corresponding table rows will be deleted by Postgres because of the foreign key constraints, and the DS will be cleared from the Tgres cache at the same time.

This is possible thanks to Postgres’ excellent LISTEN/NOTIFY capability.

In-Memory Series for Faster Querying

A subset of series can be kept entirely in memory. The recent testing has shown that people take query performance very seriously, and dashboards with refresh rates of 5s or even 1s are not unheard of. When you have to go to the database to answer every query, and if the dashboard touches a hundred series, this does not work too well.

To address this, Tgres now keeps an in-memory cache of queried series. The cache is an LRU and its size is configurable. On restart Tgres saves cache keys and loads the series back to keep the cache “warm”.

Requests for some cached queries can now be served in literally microseconds, which makes for some pretty amazing user experience.
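For readers unfamiliar with the term, an LRU cache of series can be sketched in Go with the standard container/list package. This is not the Tgres cache: the type and method names are hypothetical, the cached value is just a slice of floats, and the sketch is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element holds: the key (needed on eviction)
// and the cached data points.
type entry struct {
	key    string
	points []float64
}

// lruCache is a hypothetical fixed-capacity cache of series.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRU(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// get returns the cached points and marks the key as recently used.
func (c *lruCache) get(key string) ([]float64, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).points, true
}

// put adds or replaces a key, evicting the least recently used key
// when the cache grows beyond its capacity.
func (c *lruCache) put(key string, points []float64) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).points = points
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, points: points})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLRU(2)
	c.put("cpu", []float64{1, 2})
	c.put("mem", []float64{3, 4})
	c.get("cpu")                   // "cpu" is now the most recently used
	c.put("disk", []float64{5, 6}) // evicts "mem"

	_, ok := c.get("mem")
	fmt.Println("mem cached:", ok)
	_, ok = c.get("cpu")
	fmt.Println("cpu cached:", ok)
}
```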

DS and RRA State is an Array

One problem with the Tgres table layout was that DS and RRA tables contained frequently updated columns such as lastupdate, value and duration. The initial strategy was that these could be updated periodically in a lazy fashion, but it became apparent that it was not practical for any substantial number of series.

To address this all frequently mutable attributes are now stored in arrays, same way as data points and therefore can be updated 200 (or whatever segment width is configured) at a time.

To simplify querying DSs and RRAs two new views (dsv and rrav) were created which abstract away the array value lookup.

Whisper Data Migration

The whisper_import tool has been pretty much rewritten and has better instructions. It’s been tested extensively, though admittedly on one particular set up, your mileage may vary.

Graphite DSL

Lots and lots of fixes and additions to the Graphite DSL implementation. Tgres still does not support all of the functions, but that was never the plan to begin with.

Future

Here are some ideas I might tackle in the near future. If you are interested in contributing, do not be shy, pull requests, issues and any questions or comments are welcome. (Probably best to keep development discussion in Github).

Get rid of the config file

Tgres doesn’t really need a config file - the few options that are required for running should be command line args, the rest, such as new series specs should be in the database.

A user interface

Not terribly high on the priority list, since the main UI is psql for low level stuff and Grafana for visualization, but something to list series and tweak config options might come in handy.

Track Usage

It would be interesting to know how many bytes exactly a series occupies, how often it is updated and queried, and what is the resource cost for maintaining it.

Better code organization

For example vcache could be a separate package.

Rethink the DSL

There should be a DSL version 2, which is not based on the Graphite unwieldiness. It should be very simple and versatile and not have hundreds of functions.

Authentication and encryption

No concrete ideas here, but it would be nice to have a plan.

Clustering needs to be re-considered

The current clustering strategy is flawed. It might work with the current plan, but some serious brainstorming needs to happen here. Perhaps it should just be removed in favor of delegating horizontal scaling to the database layer.

Building a Go Web App in 2017

2017-04-27T15:00:00-04:00

Update: part 2 is here, enjoy. And part 3. And part 4.

A few weeks ago I started building yet another web-based app, in Go. Being mostly a back-end developer, I don’t have to write web apps very often, and every time I do, it seems like a great challenge. I often wish someone would write a guide to web development for people who do not have all day to get into the intricacies of great design and just need to build a functional site that works without too much fuss.

I’ve decided to use this opportunity to start from scratch and build it to the best of my understanding of how an app ought to be built in 2017. I’ve spent many hours getting to the bottom of all things I’ve typically avoided in the past, just so that for once in many years I can claim to have a personal take on the matter and have a reusable recipe that at least works for me, and hopefully not just me.

This post is the beginning of what I expect to be a short series highlighting what I’ve learned in the process. The first post is a general introduction describing the present problematic state of affairs and why I think Go is a good choice. The subsequent posts have more details and code. I am curious whether my experience resonates with others, and what I may have gotten wrong, so don’t hesitate to comment!

Edit: If you’d rather just see code, it’s here.

Introduction

In the past my basic knowledge of HTML, CSS and JavaScript has been sufficient for my modest site building needs. Most of the apps I’ve ever built were done using mod_python directly using the publisher handler. Ironically for an early Python adopter, I’ve also done a fair bit of work with Rails. For the past several years I focused on (big) data infrastructure, which isn’t web development at all, though having to build web-based UI’s is not uncommon. In fact the app I’m referring to here is a data app, but it’s not open source and what it does really doesn’t matter for this discussion. Anyway, this should provide some perspective of where I come from when approaching this problem.

Python and Ruby

As recently as a year ago, Python and Ruby would be what I would recommend for a web app environment. There may be other similar languages, but from where I stand, the world is dominated by Python and Ruby.

For the longest time the main job of a web application was constructing web pages by piecing HTML together server-side. Both Python and Ruby are very well suited for the template-driven work of taking data from a database and turning it into a bunch of HTML. There are lots of frameworks/tools to choose from, e.g. Rails, Django, Sinatra, Flask, etc, etc.

And even though these languages have certain significant limitations, such as the GIL, the ease with which they address the complexity of generating HTML is far more valuable than any trade-offs that came with them.

The GIL

The Global Interpreter Lock is worthy of a separate mention. It is the elephant in the room, by far the biggest limitation of any Python or Ruby solution. It is so crippling, people can get emotional talking about it, there are endless GIL discussions in both Ruby and Python communities.

For those not familiar with the problem - the GIL only lets one thing happen at a time. When you create threads and it “looks” like parallel execution, the interpreter is still executing instructions sequentially. This means that a single process can only take advantage of a single CPU.

There do exist alternative implementations, for example JVM-based, but they are not the norm. I’m not exactly clear why, they may not be fully interchangeable, they probably do not support C extensions correctly, and they might still have a GIL, not sure, but as far as I can tell, the C implementation is what everyone uses out there. Re-implementing the interpreter without the GIL would amount to a complete rewrite, and more importantly it may affect the behavior of the language (at least that’s my naive understanding), and so for this reason I think the GIL is here to stay.

Web apps of any significant scale absolutely require the ability to serve requests in parallel, taking advantage of every CPU a machine has. Thus far the only possible solution known is to run multiple instances of the app as separate processes.

This is typically done with help of additional software such as Unicorn/Gunicorn with every process listening on its own port and running behind some kind of a connection balancer such as Nginx and/or Haproxy. Alternatively it can be accomplished via Apache and its modules (such as mod_python or mod_wsgi), either way it’s complicated. Such apps typically rely on the database server as the arbiter for any concurrency-sensitive tasks. To implement caching without keeping many copies of the same thing on the same server a separate memory-based store is required, e.g. Memcached or Redis, usually both. These apps also cannot do any background processing, for that there is a set of tools such as Resque. And then all these components need to be monitored to make sure it’s working. Logs need to be consolidated and there are additional tools for that. Given the inevitable complexity of this set up there is also a requirement for a configuration manager such as Chef or Puppet. And still, these set ups are generally not capable of maintaining a large number of long term connections, a problem known as C10K.

Bottom line is that a simple database-backed web app requires a whole bunch of moving parts before it can serve a “Hello World!” page. And nearly all of it because of the GIL.

Emergence of Single Page Applications

More and more, server-side HTML generation is becoming a thing of the past. The latest (and correct) trend is for UI construction and rendering to happen completely client-side, driven by JavaScript. Apps whose user interface is fully JS-driven are sometimes called Single Page Applications, and are in my opinion the future whether we like it or not. In an SPA scenario the server only serves data, typically as JSON, and no HTML is constructed there. In this set up, the tremendous complexity introduced primarily so that a popular scripting language could be used isn’t worth the trouble. Especially considering that Python or Ruby bring little to the table when all of the output is JSON.

Enter Golang

Go is gradually disrupting the world of web applications. Go natively supports parallel execution which eliminates the requirement for nearly all the components typically used to work around the GIL limitation.

Go programs are binaries which run natively, so there is no need for anything language-specific to be installed on the server. Gone is the problem of ensuring the correct runtime version the app requires, there is no separate runtime, it’s part of the binary. Go programs can easily and elegantly run tasks in the background, thus no need for tools like Resque. Go programs run as a single process which makes caching trivial and means Memcached or Redis is not necessary either. Go can handle a very large number of concurrent connections, eliminating the need for a front-end guard like Nginx.
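As a small illustration of the single-process point, here is a sketch of an in-process cache shared by goroutines that may run in parallel on separate CPUs. The cache type and the warm function are hypothetical names made up for this example, and a real app would cache something more useful than strings.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical in-process cache shared by all goroutines.
// The mutex makes it safe to use from code running in parallel.
type cache struct {
	mu   sync.RWMutex
	data map[string]string
}

func newCache() *cache {
	return &cache{data: make(map[string]string)}
}

func (c *cache) get(k string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[k]
	return v, ok
}

func (c *cache) set(k, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[k] = v
}

// warm fills the cache using one goroutine per key and waits for all of
// them to finish. The goroutines can be scheduled on different CPUs.
func warm(c *cache, keys []string) {
	var wg sync.WaitGroup
	for _, k := range keys {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			c.set(k, "value for "+k)
		}(k)
	}
	wg.Wait()
}

func main() {
	c := newCache()
	warm(c, []string{"a", "b", "c"})
	v, ok := c.get("b")
	fmt.Println(v, ok)
}
```

Because everything lives in one process, there is a single copy of the cached data and no external store is needed to share it.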

With Go the tall stack of Python, Ruby, Bundler, Virtualenv, Unicorn, WSGI, Resque, Memcached, Redis, etc, etc is reduced to just one binary. The only other component generally still needed is a database (I recommend PostgreSQL). It’s important to note that all of these tools are available as before for optional use, but with Go there is the option of getting by entirely without them.

To boot, this Go program will most likely outperform any Python/Ruby app by an order of magnitude, require less memory, and do so with fewer lines of code.

So Is there a Popular Framework Yet?

The short answer is: a framework is entirely optional and not recommended. There are many projects claiming to be great frameworks, but I think it’s best to try to get by without one. This isn’t just my personal opinion, I find that it is generally shared in the Go community.

It helps to think why frameworks existed in the first place. On the Python/Ruby side this was because these languages were not initially designed to serve web pages, and lots of external components were necessary to bring them up to the task. Same can be said for Java, which just like Python and Ruby, is about as old as the web as we know it, or even pre-dates it slightly.

As I remember it, out of the box, early versions of Python did not provide anything to communicate with a database, there was no templating, HTTP support was confusing, networking was non-trivial, bundling crypto would not even be legal then, and there was a whole lot of other things missing. A framework provided all the necessary pieces and set out rules for idiomatic development for all the common web app use cases.

Go, on the other hand, was built by people who already experienced and understood web development. It includes just about everything necessary. An external package or two can be needed to deal with certain specific aspects, e.g. OAuth, but by no means does a couple of packages constitute a “framework”.

If the above take on frameworks is not convincing enough, it’s helpful to consider the framework learning curve and the risks. It took me about two years to get comfortable with Rails. Frameworks can become abandoned and obsolete, porting apps to a new framework is hard if not impossible. Given how quickly the information technology sands shift, frameworks are not to be chosen lightly.

I’d like to specifically single out tools and frameworks that attempt to mimic idioms common to the Python, Ruby or JavaScript environments. Anything that looks, feels or claims to be “Rails for Go”, or that features techniques like injection, dynamic method publishing and the like, which rely heavily on reflection, is not the Go way of doing things. It’s best to stay away from those.

Frameworks definitely do make some things easier, especially in the typical business CRUD world, where apps have many screens with lots of fields, manipulating data in complex and ever-changing database schemas. In such an environment, I’m not sure Go is a good choice in the first place, especially if performance and scaling are not a priority.

Another issue common to frameworks is that they abstract lower level mechanisms from the developer, often in a way that over time grows to be so arcane that it is literally impossible to figure out what is actually happening. What begins with an idiomatic alias for a single line of JavaScript becomes layers upon layers of transpilers, minimizers on top of helpers hidden somewhere in a sub-dependency. One day something breaks and it’s impossible to know where to even look for the problem. It’s nice to know exactly what is going on sometimes, Go is generally very good about that.

What about the database and ORM?

Similarly to frameworks, Go culture is not big on ORM’s. For starters, Go specifically does not support objects, which is what the O in ORM stands for.

I know that writing SQL by hand instead of relying on User.find(:all).filter... convenience provided by the likes of ActiveRecord is unheard of in some communities, but I think this attitude needs to change. SQL is an amazing language. Dealing with SQL directly is not that hard, and quite liberating, as well as incredibly powerful. Possibly the most tedious part of it all is copying the data from a database cursor into structures, but here I found the sqlx project very useful.

In Conclusion

I think this sufficiently describes the present situation of the server side. The client side I think could be a separate post, so I’ll pause here for now. To sum up, thus far it looks like we’re building an app with roughly the following requirements:

Minimal reliance on third party packages.
No web framework.
PostgreSQL backend.
Single Page Application.

part 2

Building a Go Web App - Part 2

2017-04-27T14:00:00-04:00

This is a continuation of part 1. (There is also part 3 and part 4).

So our app is going to have two major parts to it: client and server. (What year is this?). The server side is going to be in Go, and the client side in JS. Let’s talk about the server side first.

The Go (Server) Side

The server side of our application is going to be responsible for initially serving up all the necessary JavaScript and supporting files, if any (aka static assets), and for serving data in the form of JSON. That’s it, just two functions: (1) static assets and (2) JSON.

It’s worth noting that serving assets is optional: assets could be served from a CDN, for example. What is different is that serving them is not a problem for our Go app: unlike a Python/Ruby app, it can perform on par with Nginx and Apache when serving static assets. Delegating assets to another piece of software to lighten its load is no longer necessary, though it certainly makes sense in some situations.

To make this simpler, let’s pretend we’re making an app that lists people (just first and last name) from a database table, that’s it. The code is here https://github.com/grisha/gowebapp.

Directory Layout

It has been my experience that dividing functionality across packages early on is a good idea in Go. Even if it is not completely clear how the final program will be structured, it is good to keep things separate to the extent possible.

For a web app I think something along the lines of the following layout makes sense:

# github.com/user/foo

foo/            # package main
  |
  +--daemon/    # package daemon
  |
  +--model/     # package model
  |
  +--ui/        # package ui
  |
  +--db/        # package db
  |
  +--assets/    # where we keep JS and other assets

Top level: `package main`

At the top level we have package main and its code in main.go. The key advantage here is that with this layout go get github.com/user/foo can be the only command required to install the whole application into $GOPATH/bin.

Package main should do as little as possible. The only code that belongs here is to parse the command argument flags. If the app had a config file, I’d stick parsing and checking of that file into yet another package, probably called config. After that main should pass the control to the daemon package.

An essential main.go is:

package main

import (
    "flag"
    "log"
    "net/http"

    "github.com/user/foo/daemon"
)

var assetsPath string

func processFlags() *daemon.Config {
    cfg := &daemon.Config{}

    flag.StringVar(&cfg.ListenSpec, "listen", "localhost:3000", "HTTP listen spec")
    flag.StringVar(&cfg.Db.ConnectString, "db-connect", "host=/var/run/postgresql dbname=gowebapp sslmode=disable", "DB Connect String")
    flag.StringVar(&assetsPath, "assets-path", "assets", "Path to assets dir")

    flag.Parse()
    return cfg
}

func setupHttpAssets(cfg *daemon.Config) {
    log.Printf("Assets served from %q.", assetsPath)
    cfg.UI.Assets = http.Dir(assetsPath)
}

func main() {
    cfg := processFlags()

    setupHttpAssets(cfg)

    if err := daemon.Run(cfg); err != nil {
        log.Printf("Error in main(): %v", err)
    }
}

The above program accepts three parameters, -listen, -db-connect and -assets-path, nothing earth shattering.

Using structs for clarity

In line cfg := &daemon.Config{} we are creating a daemon.Config object. Its main purpose is to pass around configuration in a structured and clear format. Every one of our packages defines its own Config type describing the parameters it needs, and packages can include other package configs. We see an example of this in processFlags() above: flag.StringVar(&cfg.Db.ConnectString, .... Here db.Config is included in daemon.Config. I find doing this very useful. These structures also keep open the possibility of serializing configs as JSON, TOML or whatever.
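To make the serialization point concrete, here is a minimal sketch using encoding/json. The dbConfig and daemonConfig types are stand-ins for db.Config and daemon.Config so that the example fits in one file, the UI field is left out, and the encode and decode helpers are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// dbConfig stands in for db.Config in this single-file sketch.
type dbConfig struct {
	ConnectString string
}

// daemonConfig stands in for daemon.Config and includes dbConfig,
// the same way packages include each other's configs.
type daemonConfig struct {
	ListenSpec string
	Db         dbConfig
}

// encode serializes the whole nested config as JSON.
func encode(cfg *daemonConfig) (string, error) {
	b, err := json.Marshal(cfg)
	return string(b), err
}

// decode reads the nested config back from JSON.
func decode(s string) (*daemonConfig, error) {
	cfg := &daemonConfig{}
	if err := json.Unmarshal([]byte(s), cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	cfg := &daemonConfig{
		ListenSpec: "localhost:3000",
		Db:         dbConfig{ConnectString: "dbname=gowebapp"},
	}
	s, err := encode(cfg)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(s)
	// prints {"ListenSpec":"localhost:3000","Db":{"ConnectString":"dbname=gowebapp"}}
}
```

Since the fields are exported, no struct tags are required for this to work; tags would only be needed to change the key names.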

Using http.FileSystem to serve assets

The http.Dir(assetsPath) in setupHttpAssets is in preparation for how we will serve the assets in the ui package. The reason it’s done this way is to leave the door open for cfg.UI.Assets (which is an http.FileSystem interface) to be provided by other implementations, e.g. serving this content from memory. I will describe it in more detail in a later post.
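As a rough idea of what such an alternative implementation could look like, the sketch below builds an http.FileSystem from an in-memory file set. It relies on http.FS and testing/fstest, which were added to the standard library in Go 1.16, well after this post was written, so it is an assumption about one possible approach and not the approach the later post takes. The memoryAssets and fetch names and the js/app.js file are made up for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// memoryAssets returns an http.FileSystem backed entirely by memory.
// fstest.MapFS is used here only because it is a ready-made in-memory
// fs.FS in the standard library.
func memoryAssets() http.FileSystem {
	return http.FS(fstest.MapFS{
		"js/app.js": &fstest.MapFile{Data: []byte("console.log('hello');")},
	})
}

// fetch serves path from fs through http.FileServer, without opening a
// network port, and returns the status code and response body.
func fetch(fs http.FileSystem, path string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest("GET", path, nil)
	http.FileServer(fs).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	// The same value could be assigned to cfg.UI.Assets instead of
	// http.Dir(assetsPath); nothing else would have to change.
	code, body := fetch(memoryAssets(), "/js/app.js")
	fmt.Println(code, body)
}
```

The design benefit is that the ui package only ever sees the interface, so swapping a directory on disk for assets held in memory is a one-line change in main.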

Lastly, main calls daemon.Run(cfg) which is what actually starts our app and where it blocks until it’s terminated.

`package daemon`

Package daemon contains everything related to running a process. Stuff like which port to listen on and custom logging would belong here, as well as anything related to a graceful restart, etc.

Since the job of the daemon package is to initiate the database connection, it will need to import the db package. It’s also responsible for listening on the TCP port and starting the user interface for that listener, therefore it needs to import the ui package, and since the ui package needs to access data, which is done via the model package, it will need to import model as well.

A bare bones daemon might look like this:

package daemon

import (
    "log"
    "net"
    "os"
    "os/signal"
    "syscall"

    "github.com/grisha/gowebapp/db"
    "github.com/grisha/gowebapp/model"
    "github.com/grisha/gowebapp/ui"
)

type Config struct {
    ListenSpec string

    Db db.Config
    UI ui.Config
}

func Run(cfg *Config) error {
    log.Printf("Starting, HTTP on: %s\n", cfg.ListenSpec)

    db, err := db.InitDb(cfg.Db)
    if err != nil {
        log.Printf("Error initializing database: %v\n", err)
        return err
    }

    m := model.New(db)

    l, err := net.Listen("tcp", cfg.ListenSpec)
    if err != nil {
        log.Printf("Error creating listener: %v\n", err)
        return err
    }

    ui.Start(cfg.UI, m, l)

    waitForSignal()

    return nil
}

func waitForSignal() {
    ch := make(chan os.Signal, 1) // buffered: signal.Notify does not block when sending
    signal.Notify(ch, syscall.SIGINT, syscall.SIGTERM)
    s := <-ch
    log.Printf("Got signal: %v, exiting.", s)
}

Note how Config includes db.Config and ui.Config as I described earlier.

All the action happens in Run(*Config). We initialize a database connection, create a model.Model instance, and start the ui passing in the config, a pointer to the model and the listener.

`package model`

The purpose of model is to separate how data is stored in the database from the ui, as well as to contain any business logic an app might have. It’s the brains of the app if you will.

The model package should define a struct (Model seems like an appropriate name) and a pointer to an instance of the struct should be passed to all the ui functions and methods. There should only be one such instance in our app - for extra credit you can enforce that programmatically by making it a singleton, but I don’t think that’s necessary.

Alternatively you could get by without a Model and just use the package model itself. I don’t like this approach, but it’s an option.

The model should also define structs for the data entities we are dealing with. In our example it would be a Person struct. Its members should be exported (capitalized) because other packages will be accessing those. If you use sqlx, this is where you would also specify tags that map elements to db column names, e.g. `db:"first_name"`

Our Person type:

type Person struct {
    Id          int64
    First, Last string
}

In our case we do not need tags because our column names match the element names, and sqlx conveniently takes care of the capitalization, so Last matches the column named last.

package model should NOT import db

Somewhat counter-intuitively, model cannot import db. This is because db needs to import model, and circular imports are not allowed in Go. This is one case where interfaces come in handy. model needs to define an interface which db should satisfy. For now all we know is we need to list people, so we can start with this definition:

type db interface {
    SelectPeople() ([]*Person, error)
}

Our app doesn’t really do much, but we know it lists people, so our model should probably have a People() ([]*Person, error) method:

func (m *Model) People() ([]*Person, error) {
    return m.SelectPeople()
}

To keep things tidy, code should be in separate files, e.g. Person definition should be in person.go, etc. For readability, here is a single file version of our model:

package model

type db interface {
    SelectPeople() ([]*Person, error)
}

type Model struct {
    db
}

func New(db db) *Model {
    return &Model{
        db: db,
    }
}

func (m *Model) People() ([]*Person, error) {
    return m.SelectPeople()
}

type Person struct {
    Id          int64
    First, Last string
}
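
A side benefit of model depending on an interface rather than on the db package is that anything with a SelectPeople() method can stand in for the database. Here is a minimal sketch of that idea: it repeats the model in a single file and plugs in fakeDb, a hypothetical in-memory type that is not part of the app, as might be done in a test.

```go
package main

import "fmt"

// Person, db, Model and New mirror the model package above.
type Person struct {
	Id          int64
	First, Last string
}

type db interface {
	SelectPeople() ([]*Person, error)
}

type Model struct {
	db
}

func New(db db) *Model {
	return &Model{db: db}
}

func (m *Model) People() ([]*Person, error) {
	return m.SelectPeople()
}

// fakeDb is a hypothetical in-memory stand-in for the real database.
// It satisfies the db interface simply by having the right method.
type fakeDb struct {
	people []*Person
}

func (f *fakeDb) SelectPeople() ([]*Person, error) {
	return f.people, nil
}

// countPeople builds a Model on top of a fakeDb and reports how many
// people the model lists.
func countPeople(people ...*Person) (int, error) {
	m := New(&fakeDb{people: people})
	got, err := m.People()
	return len(got), err
}

func main() {
	n, err := countPeople(&Person{Id: 1, First: "John", Last: "Doe"})
	fmt.Println(n, err)
}
```

The model never learns which implementation it was given, which is the same property that lets the real db package import model without creating a cycle.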

`package db`

db is the actual implementation of the database interaction. This is where the SQL statements are constructed and executed. This package also imports model because it will need to construct those structs from database data.

First, db needs to provide the InitDb function which will establish the database connection, as well as create the necessary tables and prepare the SQL statements.

Our simplistic example doesn’t support migrations, but in theory this is also where they might potentially happen.

We are using PostgreSQL, which means we need to import the pq driver. We are also going to rely on sqlx, and we need our own model. Here is the beginning of our db implementation:

package db

import (
    "database/sql"

    "github.com/grisha/gowebapp/model"
    "github.com/jmoiron/sqlx"
    _ "github.com/lib/pq"
)

type Config struct {
    ConnectString string
}

func InitDb(cfg Config) (*pgDb, error) {
    if dbConn, err := sqlx.Connect("postgres", cfg.ConnectString); err != nil {
        return nil, err
    } else {
        p := &pgDb{dbConn: dbConn}
        if err := p.dbConn.Ping(); err != nil {
            return nil, err
        }
        if err := p.createTablesIfNotExist(); err != nil {
            return nil, err
        }
        if err := p.prepareSqlStatements(); err != nil {
            return nil, err
        }
        return p, nil
    }
}

Our InitDb() creates an instance of a pgDb, which is our Postgres implementation of the model.db interface. It keeps all that we need to communicate with the database, including the prepared statements, and exports the necessary methods to satisfy the interface.

type pgDb struct {
    dbConn *sqlx.DB

    sqlSelectPeople *sqlx.Stmt
}

Here is the code to create the tables and the statements. From the SQL perspective this is rather simplistic, it could be a lot more elaborate, of course:

func (p *pgDb) createTablesIfNotExist() error {
    create_sql := `

       CREATE TABLE IF NOT EXISTS people (
       id SERIAL NOT NULL PRIMARY KEY,
       first TEXT NOT NULL,
       last TEXT NOT NULL);

    `
    // Exec is the usual call for statements that return no rows.
    if _, err := p.dbConn.Exec(create_sql); err != nil {
        return err
    }
    return nil
}

func (p *pgDb) prepareSqlStatements() (err error) {

    if p.sqlSelectPeople, err = p.dbConn.Preparex(
        "SELECT id, first, last FROM people",
    ); err != nil {
        return err
    }

    return nil
}

Finally, we need to provide the method to satisfy the interface:

func (p *pgDb) SelectPeople() ([]*model.Person, error) {
    people := make([]*model.Person, 0)
    if err := p.sqlSelectPeople.Select(&people); err != nil {
        return nil, err
    }
    return people, nil
}

Here we’re taking advantage of sqlx to run the query and construct a slice from results with a simple call to Select() (NB: p.sqlSelectPeople is a *sqlx.Stmt). Without sqlx we would have to iterate over the result rows, processing each with Scan, which would be considerably more verbose.

Beware of a very subtle “gotcha” here. people could also be defined as var people []*model.Person and the method would work just the same. However, if the database returned no rows, the method would return nil, not an empty slice. If the result of this method is later encoded as JSON, the former would become null and the latter []. This could cause problems if the client side doesn’t know how to treat null.
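
The difference is easy to see in isolation. The following sketch uses a local copy of the Person struct and a hypothetical encode() helper (neither is part of the app) to marshal a nil slice and an empty slice:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person is a local copy of model.Person, so that the example is
// self-contained.
type Person struct {
	Id          int64
	First, Last string
}

// encode is a hypothetical helper which marshals a slice of people
// into a JSON string.
func encode(people []*Person) string {
	js, err := json.Marshal(people)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(js)
}

func main() {
	var nilPeople []*Person           // nil slice
	emptyPeople := make([]*Person, 0) // empty, but not nil

	fmt.Println(encode(nilPeople))   // prints null
	fmt.Println(encode(emptyPeople)) // prints []
}
```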

That’s it for db.

`package ui`

Finally, we need to serve all that stuff via HTTP and that’s what the ui package does.

Here is a very simplistic variant:

package ui

import (
    "fmt"
    "net"
    "net/http"
    "time"

    "github.com/grisha/gowebapp/model"
)

type Config struct {
    Assets http.FileSystem
}

func Start(cfg Config, m *model.Model, listener net.Listener) {

    server := &http.Server{
        ReadTimeout:    60 * time.Second,
        WriteTimeout:   60 * time.Second,
        MaxHeaderBytes: 1 << 16}

    http.Handle("/", indexHandler(m))

    go server.Serve(listener)
}

const indexHTML = `
<!DOCTYPE HTML>
<html>
  <head>
    <title>Simple Go Web App</title>
  </head>
  <body>
    <div id='root'></div>
  </body>
</html>
`

func indexHandler(m *model.Model) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, indexHTML)
    })
}

Note how indexHTML contains next to nothing. This is 100% of the HTML that this app will ever serve. It will evolve a little as we get into the client side of the app, but only by a few lines.

Also noteworthy is how the handler is defined. If this idiom is not familiar, it’s worth spending a few minutes (or a day) to internalize it completely as it is very common in Go. indexHandler() is not a handler, it returns a handler function. It is done this way so that we can pass in a *model.Model via closure, since an HTTP handler function definition is fixed and a model pointer is not one of the parameters.

In the case of indexHandler() we’re not actually doing anything with the model pointer, but when we get to implementing an actual list of people we will need it.
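
To see the closure idiom at work where the model is actually used, here is a small sketch. The Model type with a Greeting field, greetHandler() and the fetch() helper are all made up for this illustration; only the shape of the handler-returning function is the same as indexHandler().

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Model is a hypothetical stand-in for model.Model.
type Model struct {
	Greeting string
}

// greetHandler is not a handler, it returns one. The returned function
// closes over m, which is how it gets at the model even though the
// handler signature has no parameter for it.
func greetHandler(m *Model) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, m.Greeting)
	})
}

// fetch runs a GET request through h in-process (no network involved)
// and returns the status code and the body.
func fetch(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := fetch(greetHandler(&Model{Greeting: "hello"}), "/")
	fmt.Println(code, body)
}
```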

Conclusion

Above is essentially all the knowledge required to build a basic Go web app, at least the Go side of it. Next week I’ll get into the client side and we will complete the people listing code.

Continue to part 3.

Building a Go Web App - Part 3

2017-04-27T13:00:00-04:00

This is part 3. See part 1 and part 2.

The previous two posts got us to a point where we had a Go app which was able to serve a tiny bit of HTML. This post will talk about the client side, which, alas, is mostly JavaScript, not Go.

JavaScript in 2017

This is what gave me the most grief. I don’t really know how to categorize the mess that present day JavaScript is, nor do I really know what to attribute it to, and trying to rationalize it would make for a great, but entirely different blog post. So I’m just going to accept this as the reality we cannot change and move on to how to best work with it.

Variants of JS

The most common variant of JS these days is known as ES2015 (aka ES6 or ECMAScript 6th Edition), and it is mostly supported by the more or less latest browsers. The latest released spec of JavaScript is ES7 (aka ES2016), but since the browsers are still catching up with ES6, it looks like ES7 is never really going to be adopted as such, because most likely the upcoming ES8, which might be released in 2017, will supersede it before the browsers are ready.

Curiously, there appears to be no simple way to construct an environment fully specific to a particular ECMAScript version. There is not even a way to revert to an older, fully supported version such as ES5, and thus it is not really possible to test your script for compliance. The best you can do is to test it on the browsers you have access to and hope for the best.

Because of the ever changing and vastly varying support for the language across platforms and browsers, transpilation has emerged as a common idiom to address this. Transpilation mostly amounts to JavaScript code being converted to JavaScript that complies with a specific ES version or a particular environment. For example import Bar from 'foo'; might become var Bar = require('foo');. And so if a particular feature is not supported, it can be made available with the help of the right plug-in or transpiler. I suspect that the transpilation proliferation phenomenon has led to additional problems, such as the input expected by a transpiler assuming the existence of a feature that is no longer supported, and the same goes for the output. Often this might be remedied by additional plugins, and it can be very difficult to sort out. On more than one occasion I spent a lot of time trying to get something to work only to find out later that my entire approach had been obsoleted by a new and better solution now built into some other tool.

JS Frameworks

There also seems to be a lot of disagreement on which JS framework is best. It is even more confusing because the same framework can be so radically different from one version to the next that I wonder why they didn’t just change the name.

I have no idea which is best, and I only had the patience to try a couple. About a year ago I spent a bunch of time tinkering with AngularJS, and this time, for a change, I tinkered with React. For me, I think React makes more sense, and so this is what this example app is using, for better or worse.

React and JSX

If you don’t know what React is, here’s my (technically incorrect) explanation: it’s HTML embedded in JavaScript. We’re all so brainwashed into JavaScript being embedded in HTML as the natural order of things, that inverting this relationship does not even occur as a possibility. For the fundamental simplicity of this revolutionary (sic) concept I think React is quite brilliant.

A React “Hello World!” looks approximately like this:

class Hello extends React.Component {
  render() {
    let who = "World";
    return (
      <h1> Hello {who}! </h1>
    );
  }
}

Notice how the HTML just begins without any escape or delimiter. Surprisingly, the opening “<” works quite reliably as the marker signifying beginning of HTML. Once inside HTML, the opening curly brace indicates that we’re back to JavaScript temporarily, and this is how variable values are interpolated inside HTML. That’s pretty much all you need to know to “get” React.

Technically, the above file format is known as JSX, while React is the library which provides the classes used to construct React objects such as React.Component above. JSX is transpiled into regular JavaScript by a tool known as Babel, and in fact JSX is not even required, a React component can be written in plain JavaScript, and there is a school of thought whereby React is used without JSX. I personally find the JSX-less approach a little too noisy, and I also like that Babel allows you to use a more modern dialect of JS (though not having to deal with a transpiler is definitely a win).

Minimal Working Example

First, we need three pieces of external JavaScript. They are (1) React and ReactDOM, (2) Babel in-browser transpiler and (3) a little lib called Axios which is useful for making JSON HTTP requests. I get them out of Cloudflare CDN, there are probably other ways. To do this, we need to augment our indexHTML variable to look like this:

const (
	cdnReact           = "https://cdnjs.cloudflare.com/ajax/libs/react/15.5.4/react.min.js"
	cdnReactDom        = "https://cdnjs.cloudflare.com/ajax/libs/react/15.5.4/react-dom.min.js"
	cdnBabelStandalone = "https://cdnjs.cloudflare.com/ajax/libs/babel-standalone/6.24.0/babel.min.js"
	cdnAxios           = "https://cdnjs.cloudflare.com/ajax/libs/axios/0.16.1/axios.min.js"
)

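const indexHTML = `
<!DOCTYPE HTML>
<html>
  <head>
    <title>Simple Go Web App</title>
  </head>
  <body>
    <div id='root'></div>
    <script src="` + cdnReact + `"></script>
    <script src="` + cdnReactDom + `"></script>
    <script src="` + cdnBabelStandalone + `"></script>
    <script src="` + cdnAxios + `"></script>
    <script src="/js/app.jsx" type="text/babel"></script>
  </body>
</html>
`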

At the very end it now loads "/js/app.jsx" which we need to accommodate as well. Back in part 1 we created a UI config variable called cfg.Assets using http.Dir(). We now need to wrap it in a handler which serves files, and Go conveniently provides one:

    http.Handle("/js/", http.FileServer(cfg.Assets))

With the above, all the files in "assets/js" become available under "/js/".
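
The path mapping can be checked without touching the disk. This is only a sketch under a few assumptions: it uses its own ServeMux rather than the default one, and instead of http.Dir() it uses a hypothetical in-memory file system built with testing/fstest and http.FS (both require Go 1.16 or later). The newMux(), testAssets() and status() names are made up for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// newMux registers a file server for everything under "/js/".
// The request path is passed to the file system as-is, so
// "/js/app.jsx" is looked up as "js/app.jsx" under the assets root.
func newMux(assets http.FileSystem) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/js/", http.FileServer(assets))
	return mux
}

// testAssets is a hypothetical in-memory replacement for the assets
// directory, containing a single file.
func testAssets() http.FileSystem {
	return http.FS(fstest.MapFS{
		"js/app.jsx": &fstest.MapFile{Data: []byte("// app code")},
	})
}

// status requests path from h in-process and returns the response code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code
}

func main() {
	mux := newMux(testAssets())
	fmt.Println(status(mux, "/js/app.jsx"))     // file exists
	fmt.Println(status(mux, "/js/missing.jsx")) // file does not exist
}
```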

Finally we need to create the assets/js/app.jsx file itself:

class Hello extends React.Component {
  render() {
    let who = "World";
    return (
      <h1> Hello {who}! </h1>
    );
  }
}

ReactDOM.render( <Hello/>, document.querySelector("#root"));

The only difference from the previous listing is the very last line, which is what makes the app actually render itself.

If we now hit the index page from a (JS-capable) browser, we should see a “Hello World”.

What happened was that the browser loaded “app.jsx” as it was instructed, but since “jsx” is not a file type it is familiar with, it simply ignored it. When Babel got its chance to run, it scanned our document for any script tags referencing “text/babel” as its type, and re-requested those pages (which makes them show up twice in developer tools, but the second request ought to be served entirely from browser cache). It then transpiled it to valid JavaScript and executed it, which in turn caused React to actually render the “Hello World”.

Listing People

We need to first go back to the server side and create a URI that lists people. In order for that to happen, we need an http handler, which might look like this:

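func peopleHandler(m *model.Model) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		people, err := m.People()
		if err != nil {
			// A failure here is on the server side, hence a 500 rather than a 400.
			http.Error(w, "This is an error", http.StatusInternalServerError)
			return
		}

		js, err := json.Marshal(people)
		if err != nil {
			http.Error(w, "This is an error", http.StatusInternalServerError)
			return
		}

		// Write the bytes as-is. Passing them to Fprintf as the format
		// string would mangle any "%" that happens to be in the data.
		w.Write(js)
	})
}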

And we need to register it:

    http.Handle("/people", peopleHandler(m))

Now if we hit "/people", we should get a "[]" in response. If we insert a record into our people table with something along the lines of:

INSERT INTO people (first, last) VALUES('John', 'Doe');

The response should change to [{"Id":1,"First":"John","Last":"Doe"}].

Finally we need to hook up our React/JSX code to make it all render.

For this we are going to create a PersonItem component, and another one called PeopleList which will use PersonItem.

A PersonItem only needs to know how to render itself as a table row:

class PersonItem extends React.Component {
  render() {
    return (
      <tr>
        <td> {this.props.id}    </td>
        <td> {this.props.first} </td>
        <td> {this.props.last}  </td>
      </tr>
    );
  }
}

A PeopleList is slightly more complicated:

class PeopleList extends React.Component {
  constructor(props) {
    super(props);
    this.state = { people: [] };
  }

  componentDidMount() {
    this.serverRequest =
      axios
        .get("/people")
        .then((result) => {
           this.setState({ people: result.data });
        });
  }

  render() {
    const people = this.state.people.map((person, i) => {
      return (
        <PersonItem key={i} id={person.Id} first={person.First} last={person.Last} />
      );
    });

    return (
      <div>
        <table><tbody>
          <tr><th>Id</th><th>First</th><th>Last</th></tr>
          {people}
        </tbody></table>

      </div>
    );
  }
}

It has a constructor which initializes a this.state variable. It also declares a componentDidMount() method, which React calls right after the component has been mounted (i.e. inserted into the DOM), making it one of the correct places to fetch the data from the server. It fetches the data via an Axios call, and saves the result in this.state.people. Finally, render() iterates over the contents of this.state.people creating an instance of PersonItem for each.

That’s it, our app now responds with a (rather ugly) table listing people from our database.

Conclusion

In essence, this is all you need to know to make a fully functional Web App in Go. This app has a number of shortcomings, which I will hopefully address later. For example in-browser transpilation is not ideal, though it might be fine for a low volume internal app where page load time is not important, so we might want to have a way to pre-transpile it ahead of time. Also our JSX is confined to a single file, this might get hard to manage for any serious size app where there are lots of components. The app has no navigation. There is no styling. There are probably things I’m forgetting about…

Enjoy!

P.S. Complete code is here

Continued in part 4…

Building a Go Web App - Part 4

2017-04-27T09:13:00-04:00

This is part 4. See part 1, part 2 and part 3.

In this part I will try to briefly go over the missing pieces in our very simplistic Go Web App.

HTTP Handler Wrappers

A tiny rant: I do not like the word “middleware”. The concept of a wrapper has been around since the dawn of computing, there is no need to invent new words for it.

Having that out of the way, let’s say we need to require authentication for a certain URL. This is what our index handler presently looks like:

func indexHandler(m *model.Model) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, indexHTML)
	})
}

We could write a function which takes an http.Handler as an argument and returns a (different) http.Handler. The returned handler checks whether the user is authenticated with m.IsAuthenticated() (whatever it does is not important here) and redirects the user to a login page, or executes the original handler by calling its ServeHTTP() method.

func requireLogin(h http.Handler, m *model.Model) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !m.IsAuthenticated(r) {
			http.Redirect(w, r, loginURL, http.StatusFound)
			return
		}
		h.ServeHTTP(w, r)
	})
}

Given the above, the function registration now would look like this:

   http.Handle("/", requireLogin(indexHandler(m), m))

Handlers can be wrapped this way in as many layers as needed and this approach is very flexible. Anything from setting headers to compressing output can be accomplished via a wrapper. Note also that we can pass in whatever arguments we need, for example our *model.Model.
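
As an illustration of layering, here is a sketch with two wrappers stacked on one handler. None of these names come from the app: withHeader() sets a response header, requireToken() is a simplified stand-in for requireLogin() which checks a made-up X-Token request header, and call() is a helper to exercise the result in-process.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withHeader returns a handler which sets one response header and then
// hands the request over to the wrapped handler.
func withHeader(h http.Handler, key, value string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set(key, value)
		h.ServeHTTP(w, r)
	})
}

// requireToken is a hypothetical stand-in for requireLogin: it rejects
// requests which lack the expected X-Token header.
func requireToken(h http.Handler, token string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Token") != token {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		h.ServeHTTP(w, r)
	})
}

// hello is the innermost handler, the one being wrapped.
func hello() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello")
	})
}

// call sends a GET carrying the given token through h and returns the
// status code and the X-Frame-Options response header.
func call(h http.Handler, token string) (int, string) {
	req := httptest.NewRequest("GET", "/", nil)
	req.Header.Set("X-Token", token)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("X-Frame-Options")
}

func main() {
	// The outermost wrapper runs first, the innermost handler last.
	h := withHeader(requireToken(hello(), "secret"), "X-Frame-Options", "DENY")
	fmt.Println(call(h, "secret"))
	fmt.Println(call(h, "wrong"))
}
```

The order of nesting is the order of execution, so a wrapper that must see every request (logging, for instance) belongs on the outside.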

URL Parameters

Sooner or later we might want to rely on URL parameters, e.g. /person/3 where 3 is a person id. Go standard library doesn’t provide any support for this leaving it as an exercise for the developer. The software component responsible for this sort of thing is known as a Mux or “router” and it can be replaced by a custom implementation. A Mux also provides a ServeHTTP() method which means it satisfies the http.Handler interface, i.e. it is a handler.

A very popular implementation is the Gorilla Mux. It is easy to delegate entire sub-urls to the Gorilla Mux wherever more flexibility is needed. For example we can decide that everything from /person and below is handled by an instance of a Gorilla router and we want that to be all authenticated, which might look like this:

	// import "github.com/gorilla/mux"
	pr := mux.NewRouter().PathPrefix("/person").Subrouter()
	pr.Handle("/{id}", personGetHandler(m)).Methods("GET")
	pr.Handle("/", personPostHandler(m)).Methods("POST")
	pr.Handle("/{id}", personPutHandler(m)).Methods("PUT")
	http.Handle("/person/", requireLogin(pr, m))

NB: I found that trailing slashes are important and the rules on when they are required are a bit confusing.

There are many other router/mux implementations out there, the beauty of not buying into any kind of a framework is that we can choose the one that works best for us or write our own (they are not difficult to implement).
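
To give an idea of what writing our own might involve in the simplest case, here is a sketch which pulls the id out of /person/3 using nothing but the standard library. The personID(), personHandler() and fetch() names are made up for this example, and a real router would of course handle much more than one pattern.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// personID extracts the numeric id from a path such as "/person/3".
// The second return value is false if the path does not match.
func personID(path string) (int64, bool) {
	rest := strings.TrimPrefix(path, "/person/")
	if rest == path || rest == "" || strings.Contains(rest, "/") {
		return 0, false
	}
	id, err := strconv.ParseInt(rest, 10, 64)
	if err != nil {
		return 0, false
	}
	return id, true
}

// personHandler is a hypothetical handler which echoes the id back.
func personHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id, ok := personID(r.URL.Path)
		if !ok {
			http.NotFound(w, r)
			return
		}
		fmt.Fprintf(w, "person %d", id)
	})
}

// fetch runs a GET request through h in-process and returns the status
// code and the body.
func fetch(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(fetch(personHandler(), "/person/3"))
	code, _ := fetch(personHandler(), "/person/abc")
	fmt.Println(code)
}
```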

Asset Handling

One of the neatest things about Go is that a compiled program is a single binary not a big pile of files like it is with most scripting languages and even compiled ones. But if our program relies on assets (JS, CSS, image and other files), we would need to copy those over to the server at deployment time.

There is a way we can preserve the “one binary” characteristic of our program by including assets as part of the binary itself. For that there is the go-bindata project and its nephew go-bindata-assetfs.

Since packing assets into the binary is slightly beyond what go build can accomplish, we will need some kind of a script to take care of it. My personal preference is to use the tried and true make, and it is not uncommon to see Go projects come with a Makefile.

Here is a relevant example Makefile rule:

ASSETS_DIR = "assets"
build:
	@export GOPATH=$${GOPATH-~/go} && \
	go get github.com/jteeuwen/go-bindata/... github.com/elazarl/go-bindata-assetfs/... && \
	$$GOPATH/bin/go-bindata -o bindata.go -tags builtinassets ${ASSETS_DIR}/... && \
	go build -tags builtinassets -ldflags "-X main.builtinAssets=${ASSETS_DIR}"

The above rule creates a bindata.go file which will be placed in the same directory where main.go is and becomes part of package main. main.go will somehow know that assets are built-in and this is accomplished via an -ldflags "-X main.builtinAssets=${ASSETS_DIR}" trick, which is a way to assign values to variables at compile time. This means that our code can now check for the value of builtinAssets to decide what to do, e.g.:

	if builtinAssets != "" {
		log.Printf("Running with builtin assets.")
		cfg.UI.Assets = &assetfs.AssetFS{Asset: Asset, AssetDir: AssetDir, AssetInfo: AssetInfo, Prefix: builtinAssets}
	} else {
		log.Printf("Assets served from %q.", assetsPath)
		cfg.UI.Assets = http.Dir(assetsPath)
	}
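
For the -X flag to have something to assign to, package main needs a package-level string variable by that name. A minimal sketch of just that part follows; assetSource() and the "./assets" path are made up for the illustration, they only mimic the decision made in the snippet above.

```go
package main

import "fmt"

// builtinAssets stays empty in a plain "go build". A build such as
//
//	go build -ldflags "-X main.builtinAssets=assets"
//
// assigns it at link time. -X only works on package-level string
// variables.
var builtinAssets string

// assetSource is a hypothetical helper which reports where the assets
// would be served from: the binary itself or a directory on disk.
func assetSource(builtin, assetsPath string) string {
	if builtin != "" {
		return "builtin:" + builtin
	}
	return "dir:" + assetsPath
}

func main() {
	fmt.Println(assetSource(builtinAssets, "./assets"))
}
```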

The second important thing is that we are defining a build tag called builtinassets. We are also telling go-bindata about it, what this means is “only compile me when builtinassets is set”, and this controls under which circumstances bindata.go (which contains our assets as Go code) is to actually be compiled.

Pre-transpilation of JavaScript

Last but not least, I want to briefly mention packing of web assets. To describe it properly is enough material for a whole new series of posts, and this would really have nothing to do with Go. But I can at least list the following points.

You might as well give in and install npm, and make a package.json file.
Once npm is installed, it is trivial to install the Babel command-line compiler, babel-cli, which is one way to transpile JavaScript.
A more complicated, frustrating, but ultimately more flexible method is to use webpack. Webpack will pre-transpile and do things like combine all JS into a single file as well as minimize it.
I was surprised by how difficult it was to provide module import functionality in JavaScript. The problem is that there is an ES6 standard for import and export keywords, but there is no implementation, and even Babel assumes that something else implements it for you. In the end I settled on SystemJS. The complication with SystemJS is that now in-browser Babel transpilation needs to be something that SystemJS is aware of, so I had to use its Babel plugin for that. Webpack in turn (I think?) provides its own module support implementation, so SystemJS is not needed when assets are packed. Anyhow, it was all rather frustrating.

Conclusion

I would say that in the set up I describe in this four part series Go absolutely shines, while JavaScript not so much. But once I got over the initial hurdle of getting it all to work, React/JSX was easy and perhaps even pleasant to work with.

That’s it for now, hope you find this useful.

Tgres 0.10.0b - Time Series with Go and PostgreSQL

2017-03-22T13:52:00-04:00

After nearly two years of hacking, I am tagging this version of Tgres as beta. It is functional and stable enough for people to try out and not feel like they are wasting their time. There is still a lot that could and should be improved, but at this point the most important thing is to get more people to check it out.

What is Tgres?

Tgres is a Go program which can receive time series data via Graphite, Statsd protocols or an http pixel, store it in PostgreSQL, and provide Graphite-like access to the data in a way that is compatible with tools such as Grafana. You could think of it as a drop-in Graphite/Statsd replacement, though I’d rather avoid direct comparison, because the key feature of Tgres is that data is stored in PostgreSQL.

Why PostgreSQL?

The “grand vision” for Tgres begins with the database. Relational databases have the most man-decades of any storage type invested into them, and PostgreSQL is probably the most advanced implementation presently in existence.

If you search for “relational databases and time series” (or some variation thereupon), you will come across the whole gamut of opinions (if not convictions) varying so widely it is but discouraging. This is because time series storage, while simple at first glance, is actually fraught with subtleties and ambiguities that can drive even the most patient of us up the wall.

Avoid Solving the Storage Problem.

Someone once said that “anything is possible when you don’t know what you’re talking about”, and nowhere is it more evident than in data storage. File systems and relational databases trace their origin back to the late 1960s and over half a century later I doubt that any field experts would say “the storage problem is solved”. And so it seems almost foolish to suppose that by throwing together a key-value store and a consensus algorithm or some such it is possible to come up with something better. Instead of re-inventing storage, why not focus on how to structure the data in a way that is compatible with a storage implementation that we know works and scales reliably?

As part of the Tgres project, I thought it’d be interesting to get to the bottom of this. If not bottom, then at least deeper than most people dare to dive. I am not a mathematician or a statistician, nor am I a data scientist, whatever that means, but I think I understand enough about the various subjects involved, including programming, that I can come up with something more than just another off-the-cuff opinion.

And so now I think I can conclude definitively that time series data can be stored in a relational database very efficiently, PostgreSQL in particular for its support for arrays. I described the general approach in a series of blogs starting with this one, and Tgres uses the technique described in the last one. In my performance tests the Tgres/Postgres combination was so efficient it was possibly outperforming its time-series siblings.

The good news is that as a user you don’t need to think about the complexities of the data layout, Tgres takes care of it. Still I very much wish people would take more time to think about how to organize data in a tried and true solution like PostgreSQL before jumping ship into the murky waters of the “noSQL” ocean, lured by alternative storage sirens, big on promise but shy on delivery, only to drown where no one could come to the rescue.

How else is Tgres different?

Tgres is a single program, a single binary which does everything (one of my favorite things about Go). It supports all of Graphite and Statsd protocols without having to run separate processes, there are no dependencies of any kind other than a PostgreSQL database. No need for Python, Node or a JVM, just the binary, the config file and access to a database.

And since the data is stored in Postgres, virtually all of the features of Postgres are available: from being able to query the data using real SQL with all the latest features, to replication, security, performance, back-ups and whatever else Postgres offers.

Another benefit of data being in a database is that it can be accessible to any application frameworks in Python, Ruby or whatever other language as just another database table. For example in Rails it might be as trivial as class Tv < ActiveRecord::Base; end et voilà, you have the data points as a model.

It should also be mentioned that Tgres requires no PostgreSQL extensions. This is because optimizing by implementing a custom extension which circumvents the PostgreSQL natural way of handling data means we are solving the storage problem again. PostgreSQL storage is not broken to begin with, no customization is necessary to handle time series.

In addition to being a standalone program, Tgres packages aim to be useful on their own as part of any other Go program. For example it is very easy to equip a Go application with Graphite capabilities by providing it access to a database and using the provided http handler. This also means that you can use a separate Tgres instance dedicated to querying data (perhaps from a downstream Postgres slave).

Some Internals Overview

Internally, Tgres series identification is tag-based. The series are identified by a JSONB field which is a set of key/value pairs indexed using a GIN index. In Go, the JSONB field becomes a serde.Ident. Since the “outside” interface Tgres is presently mimicking is Graphite, which uses dot-separated series identifiers, all idents are made of just one tag “name”, but this will change as we expand the DSL.

Tgres stores data in evenly-spaced series. The conversion from the data as it comes in to its evenly-spaced form happens on-the-fly, using a weighted mean method, and the resulting stored rate is actually correct. This is similar to how RRDTool does it, but different from many other tools which simply discard all points except for last in the same series slot as I explained in this post.

Tgres maintains a (configurable) number of Round-Robin Archives (RRAs) of varying length and resolution for each series, this is an approach similar to RRDTool and Graphite Whisper as well. The conversion to evenly-spaced series happens in the rrd package.
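
The round-robin idea itself fits in a few lines. The sketch below is only an illustration of the concept and not how the rrd package is written: the rra type, its fields and its methods are made up, and timestamps are assumed to be non-negative Unix seconds.

```go
package main

import "fmt"

// rra is a hypothetical round-robin archive: a fixed number of slots,
// each covering step seconds, overwritten in a circle.
type rra struct {
	step  int64 // seconds per slot
	slots []float64
}

func newRRA(step int64, size int) *rra {
	return &rra{step: step, slots: make([]float64, size)}
}

// index maps a Unix timestamp to the slot which covers it.
func (r *rra) index(ts int64) int {
	return int((ts / r.step) % int64(len(r.slots)))
}

func (r *rra) set(ts int64, v float64) { r.slots[r.index(ts)] = v }
func (r *rra) get(ts int64) float64    { return r.slots[r.index(ts)] }

func main() {
	a := newRRA(10, 6) // 6 slots of 10 seconds: one minute of history
	a.set(5, 1.5)      // lands in slot 0
	a.set(65, 2.5)     // a full cycle later: slot 0 again, old value gone
	fmt.Println(a.index(5), a.index(65), a.get(5))
}
```

The storage never grows: a longer history at the same size means a coarser step, which is why several archives of varying length and resolution are kept per series.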

Tgres does not store the original (unevenly spaced) data points. The rationale behind this is that for analytical value you always inevitably have to convert an uneven series to a regular one. The problem of storing the original data points is not a time-series problem, the main challenge there is the ability to keep up with a massive influx of data, and this is what Hadoop, Cassandra, S3, BigQuery, etc are excellent at.

While Tgres code implements most of the Graphite functions, complete compatibility with the Graphite DSL is not a goal, and some functions will probably be left unimplemented. In my opinion the Graphite DSL has a number of shortcomings by design. For example, the series names are not strings but are syntactically identifiers, i.e. there is no difference between scale(foo.bar, 10) and scale("foo.bar", 10), which is problematic in more than one way. The dot-names are ingrained into the DSL, and lots of functions take arguments denoting position within the dot-names, but they seem unnecessary. For example there is averageSeriesWithWildcards and sumSeriesWithWildcards, while it would be cleaner to have some kind of a wildcard() function which can be passed into average() or sum(). Another example is that Graphite does not support chaining (but Tgres already does), e.g. scale(average("foo.*"), 10) might be better as average("foo.*").scale(10). There are many more similar small grievances I have with the DSL, and in the end I think that the DSL ought to be revamped to be more like a real language (or perhaps just be a language, e.g. Go itself), exactly how hasn’t been crystallized just yet.

Tgres also aims to be a useful time-series processing Golang package (or a set of packages). This means that in Go the code also needs to be clean and readable, and that there ought to be a conceptual correspondence between the DSL and how one might do something at the lower level in Go. Again, the vision here is still blurry, and more thinking is required.

For Statsd functionality, the network protocol is supported by the tgres/statsd package while the aggregation is done by the tgres/aggregator. In addition, there is also support for “paced metrics” which let you aggregate data before it is passed on to the Tgres receiver and becomes a data point, which is useful in situations where you have some kind of an iteration that would otherwise generate millions of measurements per second.

The finest resolution for Tgres is a millisecond. Nanoseconds seem too small to be practical, though it shouldn’t be too hard to change it, as internally Tgres uses native Go types for time and duration - the milliseconds are the integers in the database.

When data points are received via the network, the job of parsing the network stuff is done by the code in the tgres/daemon package with some help from tgres/http and tgres/statsd, as well as potentially others (e.g. Python pickle decoding).

Once received and correctly parsed, the data points are passed on to the tgres/receiver. The receiver’s job is to determine whether the series ident is already known to us by checking the cache, or whether the series needs to be loaded from the database or created. Once the appropriate series is found, the receiver updates the in-memory cache of the RRAs for the series (which causes the data points to be evenly spaced) and periodically flushes data points to the database. The receiver also controls the aggregator of statsd metrics.

The database interface code is in the tgres/serde package which supports PostgreSQL or an in-memory database (useful in situations where persistence is not required or during testing).

When Tgres is queried for data, it loads it from the database into a variety of implementations of the Series interface in the tgres/series package as controlled by the tgres/dsl responsible for figuring out what is asked of it in the query.

In addition to all of the above, Tgres supports clustering, though this is highly experimental at this point. The idea is that a cluster of Tgres instances (all backed by the same database, at least for now) would split the series amongst themselves and forward data points to the node which is responsible for a particular series. The nodes are placed behind a load-balancer of some kind, and with this set up nodes can go in and out of the cluster without any overall downtime for maximum availability. The clustering logic lives in tgres/cluster.
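One simple way to split series amongst nodes is to hash the series ident. The sketch below only illustrates the idea; ownerOf and the node names are hypothetical, and this is not necessarily how tgres/cluster assigns series.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownerOf picks the node responsible for a series by hashing its ident.
// Hypothetical scheme for illustration; nodes must not be empty.
func ownerOf(ident string, nodes []string) string {
	h := fnv.New32a()
	h.Write([]byte(ident)) // Write on a hash.Hash never returns an error
	return nodes[h.Sum32()%uint32(len(nodes))]
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	for _, ident := range []string{"foo.bar", "foo.baz", "cpu.load"} {
		// Any node that receives this data point would forward it to the owner.
		fmt.Printf("%s -> %s\n", ident, ownerOf(ident, nodes))
	}
}
```

A plain modulo like this reassigns most series whenever a node joins or leaves, so a cluster that expects nodes to come and go would likely need something more stable, such as consistent hashing.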

This is an overly simplistic overview which hopefully conveys that there are a lot of pieces to Tgres.

Future

In addition to a new/better DSL, there are lots of interesting ideas, and if you have any please chime in on Github.

One thing that is missing in the telemetry world is encryption, authentication and access control so that tools like Tgres could be used to store health data securely.

A useful feature might be interoperability with big data tools to store the original data points and perhaps provide means for pulling them out of BigQuery or whatever and replay them into series - this way we could change the resolution to anything at will.

Or little details like a series alias - so that a series could be renamed. The way this would work is you rename a series while keeping its old ident as an alias, then take your time to make sure all the agents send data under the new name, at which point the alias can go away.

Lots can also be done on the scalability front with improved clustering, sharding, etc.

We Could Use Your Help

Last but not least, this is an Open Source project. It works best when people who share the vision also contribute to the project, and this is where you come in. If you’re interested in learning more about time series and databases, please check it out and feel free to contribute in any way you can!

Tgres Load Testing Follow Up

2017-02-28T21:40:00-05:00

To follow up on the previous post, after a bunch of tweaking, here is Tgres (commit) receiving over 150,000 data points per second across 500,000 time series without any signs of the queue size or any other resource blowing up.

This is both Tgres and Postgres running on the same i2.2xlarge EC2 instance (8 cores, 64GB, SSD).

At this point I think there’s been enough load testing and optimization, and I am going to get back to crossing the t’s and dotting the i’s so that we can release the first version of Tgres.

PostgreSQL vs Whisper, which is Faster?

2017-02-23T09:49:00-05:00

Note: there is an update to this post.

TL;DR

On an 8 CPU / 16 GB EC2 instance, Tgres can process 150,000 data points per second across 300,000 series (Postgres running on the same machine). With some tweaks we were able to get the number of series to half a million, flushing ~60K data points per second.

Now the long version…

If you were to ask me whether Tgres could outperform Graphite, just a couple of months ago my answer would have been “No”. Tgres uses Postgres to store time series data, while Graphite stores data by writing to files directly; the overhead of the relational database just seemed too great.

Well, I think I’ve managed to prove myself wrong. After re-working Tgres to use the write-optimized layout, I’ve run some tests on AWS yielding unexpectedly promising results.

As a benchmark I targeted the excellent blog post by Jason Dixon describing his AWS Graphite test. My goal was to get to at least half the level of performance described therein. But it appears the combination of Go, Postgres and some clever data structuring has been able to beat it, not without breaking a little sweat, but it has.

My test was conducted on a c4.2xlarge instance, which has 8 cores and 16 GB, using 100GB EBS (which, if I understood it correctly, comes with 300 IOPS, please comment if I’m wrong). The “c4” instances are supposed to be some of the highest speed CPU AWS has to offer, but compared with the instance used in the Graphite test, an i2.4xlarge (16 CPU / 122GB), it had half the CPU cores and nearly one tenth of the RAM.

Before I go any further, here is the obligatory screenshot, then my observations and lessons learned in the process, as well as a screenshot depicting even better performance.

The Tgres version running was this one, with the config detailed at the bottom of the post.

Postgres was whatever yum install postgresql95-server brings your way, with the data directory moved to the EBS volume formatted using ext4 (not that I think it matters). The Postgres config was modified to allow a 100ms commit delay and to make autovacuum extra aggressive. I did not increase any memory buffers and left everything else as is. Specifically, these were the changes:

autovacuum_work_mem = -1
synchronous_commit = off
commit_delay = 100000
autovacuum_max_workers = 10
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 2000
autovacuum_vacuum_scale_factor = 0.0
autovacuum_vacuum_cost_delay = 0

The data points for the test were generated by a goroutine in the Tgres process itself. In the past I’ve found that blasting a server with this many UDP packets can be tricky and hardware/network intensive. It’s also hard to tell when/if they get dropped and why, etc. Since Go is not known for having problems in its network stack, I was not too worried about it; I just wanted a reliable and configurable source of incoming packets, and in the Go world writing a simple goroutine seemed like the right answer.

Somewhat Random Notes and Making Tgres Even Faster

Determining failure

Determining when we are “at capacity” is tricky. I’ve mostly looked at two factors (aside from the obvious - running out of memory/disk, becoming unresponsive, etc): receiver queue size and Postgres table bloat.

Queue size

Tgres uses “elastic channels” (so eloquently described here by Nick Patavalis) for incoming data points and to load series from Postgres. These are channel-like structures that can grow to arbitrary length only limited by the memory available. This is done so as to be able to take maximum advantage of the hardware at hand. If any of those queues starts growing out of control, we are failing. You can see in the picture that at about 140K data points per second the receiver queue started growing, though it did stay steady at this size and never spun out of control (the actual test was left overnight at this rate just to make sure).
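For readers unfamiliar with the pattern, here is a minimal sketch of a channel-like structure that grows as needed. It is an illustration of the general technique, not the Tgres implementation; the elastic function and the int element type are chosen just for this example.

```go
package main

import "fmt"

// elastic returns an input and an output channel connected by a goroutine
// that buffers items in a slice. Sends on the input are accepted as soon as
// the goroutine is scheduled, and the buffer is limited only by memory.
func elastic() (chan<- int, <-chan int) {
	in := make(chan int)
	out := make(chan int)
	go func(src <-chan int) {
		defer close(out)
		var buf []int
		for src != nil || len(buf) > 0 {
			// Offer a value on out only when one is buffered; a nil
			// channel in a select case is never ready.
			var send chan<- int
			var next int
			if len(buf) > 0 {
				send, next = out, buf[0]
			}
			select {
			case v, ok := <-src:
				if !ok {
					src = nil // input closed: drain what is left
					continue
				}
				buf = append(buf, v)
			case send <- next:
				buf = buf[1:]
			}
		}
	}(in)
	return in, out
}

func main() {
	in, out := elastic()
	for i := 0; i < 1000; i++ {
		in <- i // no reader yet; the goroutine buffers the value
	}
	close(in)
	n := 0
	for range out {
		n++
	}
	fmt.Println("received", n)
}
```

The length of the internal buffer is the "queue size" one would watch: if it keeps growing, the consumer is not keeping up.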

PG Table Bloat

Table bloat is a phenomenon affecting Postgres in write-intensive situations because of its use of MVCC. It basically means that pages on disk are being updated faster than the autovacuum process can keep up with them and the table starts growing out of control.

To monitor for table bloat, I used a simple formula which determined the approximate size of the table based on the row count (our data is all floats, which makes it very predictable) and compared it with the actual size. If the actual size exceeded the estimated size, that’s considered bloat. Bloat is reported in the “TS Table Size” chart. A little bloat is fine, and you can see that it stayed at a fairly low percentage throughout the test.
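The exact formula is not given here, but a sketch of this kind of estimate might look as follows. The per-row overhead of 48 bytes is an assumed placeholder rather than a measured Postgres figure, and the function names are hypothetical.

```go
package main

import "fmt"

// estimatedBytes approximates the table size from the row count, assuming
// each row holds pointsPerRow 8-byte floats plus a fixed per-row overhead.
// rowOverhead is an assumption to be calibrated against a freshly loaded table.
func estimatedBytes(rows, pointsPerRow, rowOverhead int64) int64 {
	return rows * (rowOverhead + 8*pointsPerRow)
}

// bloatPercent reports by how many percent the actual size exceeds the
// estimate; it returns 0 when there is no excess.
func bloatPercent(actual, estimated int64) float64 {
	if estimated <= 0 || actual <= estimated {
		return 0
	}
	return 100 * float64(actual-estimated) / float64(estimated)
}

func main() {
	est := estimatedBytes(1000, 200, 48)
	fmt.Println("estimated bytes:", est)
	fmt.Println("bloat %:", bloatPercent(1812800, est))
}
```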

In the end, though more research is warranted, it may just turn out that contrary to every expectation PostgreSQL was not the limiting factor here. The postmaster processes stayed below 170MB RSS, which is absolutely remarkable, and Grafana refreshes were very quick even at peak loads.

Memory consumption

Tgres has a slight limitation in that creating a series is expensive. It needs to check with Postgres and for reasons I don’t want to bore you with it’s always a SELECT, optionally followed by an “UPSERT”. This takes time, and during the ramp-up period when the number of series is growing fast and lots of them need to be created, the Go runtime ends up consuming a lot of memory. You can see that the screenshot reports 4.69GB. After a restart of Tgres (which causes all existing DS names to be pre-cached) its memory footprint stayed at about 1.7GB. More work needs to be done to figure out what accounts for the difference.

Data Point Rate and Number of Series

The rate of data points that need to be saved to disk is a function of the number of series and the resolution of the RRAs. To illustrate, if I have one series at 1 point per second, even if I blast a million data points per second, still only 1 data point per second needs to be saved.

There is an important difference between Graphite and Tgres in that Tgres actually adjusts the final value considering every data point value using a weighted mean, while Graphite just ignores all points but the last. So Tgres does a bit more work, which adds up quickly at 6-figure rates per second.

The Graphite test, if I read the chart correctly, was able to process ~70K data points per second across 300K series. My test had 300K series and data points were coming in at over 150K/s. But just out of curiosity, I tried to push it to its limit.

At 400K series, you can see clear signs of deterioration. You can see how vcache isn’t flushed fast enough leaving gaps at the end of series. If we stop the data blast, it does eventually catch up, so long as there is memory for the cache.

If you don’t catch this condition in time, Tgres will die with:

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0xa33e5a, 0x16)
        /home/grisha/.gvm/gos/go1.8/src/runtime/panic.go:596 +0x95
...

Segment Width

There is still one easy performance card we can play here. Segment width is how many data points are stored in one row; it is also the limit on how many points we can transfer in a single SQL operation. Segment width by default is 200, because a width higher than that causes rows to exceed a page and trigger TOAST. TOAST can be good or bad because it means data is stored in a separate table (not so good), but it also means it’s compressed, which may be an I/O win.

So what would happen if we set the segment width to 1000?

The picture changes significantly (see below). I was able to get the number of series to 500K, note the whopping 52,602 data points being written to the database per second! You can see we’re pushing it to the limit because the receiver queue is beginning to grow. I really wanted to get the rate up to 150K/sec, but it just didn’t want to go there.

And what would happen if we set the segment width to 4096?

Interestingly, the memory footprint is a tad larger while the vcache is leaner, the number of data points flushed per second is about the same, though in fewer SQL statements, and the overall picture is about the same and the incoming queue still skyrockets at just about 100K/sec over 500K series.

Conclusion

There are plenty of places in the Tgres code that could still be optimized.

One issue that would be worth looking into is exposing Tgres to the firehose on an empty database. The current code runs out of memory in under a minute when suddenly exposed to 300K new series at 150K/s. Probably the simplest solution to this would be to somehow detect that we’re unable to keep up and start dropping data points. Eventually, when all the series are created and cached, performance should even out after the initial spike and all should be well.
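Dropping data points under pressure could be as simple as a non-blocking send on a bounded queue. This is a sketch of that general Go idiom under the assumption that a fixed queue capacity is an acceptable signal of "unable to keep up"; it is not existing Tgres code, and offer is a hypothetical name.

```go
package main

import "fmt"

// offer tries to enqueue dp without blocking and reports whether it was
// accepted. When the queue is full the data point is dropped.
func offer(queue chan<- float64, dp float64) bool {
	select {
	case queue <- dp:
		return true
	default:
		return false
	}
}

func main() {
	queue := make(chan float64, 3) // deliberately tiny for the example
	dropped := 0
	for i := 0; i < 5; i++ {
		if !offer(queue, float64(i)) {
			dropped++
		}
	}
	fmt.Println("queued:", len(queue), "dropped:", dropped)
}
```

Counting the drops and exposing that count as a metric would make it visible when, and for how long, the receiver was shedding load.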

In any event, it’s nice to be able to do something like this and know that it is performant as well:

tgres=> select t, r from ds
 join tv  on tv.ds_id = ds.id
where ident @> '{"name":"tgres.0_0_0_0.runtime.load.five"}'
  and tv.step_ms = 10000
order by t desc
limit 5;
           t            |       r
------------------------+----------------
 2017-02-23 22:31:50+00 | 1.256833462648
 2017-02-23 22:26:30+00 | 1.305209492142
 2017-02-23 22:24:10+00 | 1.554056287975
 2017-02-23 22:24:00+00 | 1.453365774931
 2017-02-23 22:23:50+00 | 1.380504724386
(5 rows)

Reference

For the sake of completeness, the instance was created using Terraform config approximately like this:

variable "aws_region" { default = "us-east-1" }
variable "aws_zone" { default = "us-east-1a" }
variable "key_name" { default = "REDACTED" }

provider "aws" {
  region = "${var.aws_region}"
}

resource "aws_ebs_volume" "ebs_volume" {
  availability_zone = "${var.aws_zone}"
  size = 100
}

resource "aws_volume_attachment" "ebs_att" {
  device_name = "/dev/sdh"
  volume_id = "${aws_ebs_volume.ebs_volume.id}"
  instance_id = "${aws_instance.tgres-test-tmp.id}"
}

resource "aws_instance" "tgres-test-tmp" {
  ami = "ami-0b33d91d"
  instance_type = "c4.2xlarge"
  subnet_id = "REDACTED"
  vpc_security_group_ids = [
    "REDACTED"
  ]
  associate_public_ip_address = true
  key_name = "${var.key_name}"
}

And then the following commands were used to prime everything:

sudo mke2fs /dev/sdh
sudo mkdir /ebs
sudo mount /dev/sdh /ebs

sudo yum install -y postgresql95-server
sudo service postgresql95 initdb
sudo mkdir /ebs/pg
sudo mv /var/lib/pgsql95/data /ebs/pg/data
sudo ln -s /ebs/pg/data /var/lib/pgsql95/data

sudo vi /var/lib/pgsql95/data/postgresql.conf
# BEGIN postgres config - paste this somewhere in the file
autovacuum_work_mem = -1
synchronous_commit = off
commit_delay = 100000
autovacuum_max_workers = 10
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 2000
autovacuum_vacuum_scale_factor = 0.0
autovacuum_vacuum_cost_delay = 0
# END postgres config

sudo service postgresql95 restart

# create PG database

sudo su - postgres
createuser -s ec2-user   # note -s is superuser - not necessary for tgres but just in case
createdb tgres
exit

# Tgres (requires Go - I used 1.8)
# (or you can just scp it from some machine where you already have go environment)
mkdir golang
export GOPATH=~/golang/
go get github.com/tgres/tgres
cd /home/ec2-user/golang/src/github.com/tgres/tgres
go build
cp etc/tgres.conf.sample etc/tgres.conf

The tgres.conf file looked like this:

min-step                = "10s"

pid-file =                 "tgres.pid"
log-file =                 "log/tgres.log"
log-cycle-interval =       "24h"

max-flushes-per-second      = 1000000 # NB - Deprecated setting
workers                     = 4       # NB - Deprecated setting

http-listen-spec            = "0.0.0.0:8888"
graphite-line-listen-spec   = "0.0.0.0:2003"
graphite-text-listen-spec   = "0.0.0.0:2003"
graphite-udp-listen-spec    = "0.0.0.0:2003"
graphite-pickle-listen-spec = "0.0.0.0:2004"

statsd-text-listen-spec     = "0.0.0.0:8125"
statsd-udp-listen-spec      = "0.0.0.0:8125"
stat-flush-interval         = "10s"
stats-name-prefix           = "stats"

db-connect-string = "host=/tmp dbname=tgres sslmode=disable"

[[ds]]
regexp = ".*"
step = "10s"
#heartbeat = "2h"
rras = ["10s:6h", "1m:7d", "1h:1y"]

Tgres was running with the following. The TGRES_BLASTER starts the blaster goroutine.

TGRES_BIND=0.0.0.0 TGRES_BLASTER=1 ./tgres

Once you have Tgres with the blaster running, you can control it via HTTP, e.g. the following would set it to 50K/s data points across 100K series. Setting rate to 0 pauses it.

curl -v "http://127.0.0.1:8888/blaster/set?rate=50000&n=100000"

Storing Time Series in PostgreSQL - Optimize for Write

2017-01-21T09:33:00-05:00

Continuing on the previous write up on how time series data can be stored in Postgres efficiently, here is another approach, this time providing for extreme write performance.

The “horizontal” data structure in the last article requires an SQL statement for every data point update. If you cache data points long enough, you might be able to collect a bunch for a series and write them out at once for a slight performance advantage. But there is no way to update multiple series with a single statement, it’s always at least one update per series. With a large number of series, this can become a performance bottleneck. Can we do better?

One observation we can make about incoming time series data is that commonly the data points are roughly from the same time period, the current time, give or take. If we’re storing data at regularly-spaced intervals, then it is extremely likely that many if not all of the most current data points from various time series are going to belong to the exact same time slot. Considering this observation, what if we organized data points in rows of arrays, only now we would have a row per timestamp while the position within the array would determine the series?

Let’s create the tables:

CREATE TABLE rra_bundle (
  id SERIAL NOT NULL PRIMARY KEY,
  step_ms INT NOT NULL,
  steps_per_row INT NOT NULL,
  size INT NOT NULL,
  latest TIMESTAMPTZ DEFAULT NULL);

CREATE TABLE rra (
  id SERIAL NOT NULL PRIMARY KEY,
  ds_id INT NOT NULL,
  rra_bundle_id INT NOT NULL,
  pos INT NOT NULL);

CREATE TABLE ts (
  rra_bundle_id INT NOT NULL,
  i INT NOT NULL,
  dp DOUBLE PRECISION[] NOT NULL DEFAULT '{}');

Notice how the step and size now become properties of the bundle rather than the rra which now refers to a bundle. In the ts table, i is the index in the round-robin archive (which in the previous “horizontal” layout would be the array index).

The data we used before was a bunch of temperatures, let’s add two more series, one where temperature is 1 degree higher, and one where it’s 1 degree lower. (Not that it really matters).

INSERT INTO rra_bundle VALUES (1, 60000, 1440, 28, '2008-04-02 00:00:00-00');

INSERT INTO rra VALUES (1, 1, 1, 1);
INSERT INTO rra VALUES (2, 2, 1, 2);
INSERT INTO rra VALUES (3, 3, 1, 3);

INSERT INTO ts VALUES (1, 0, '{64,65,63}');
INSERT INTO ts VALUES (1, 1, '{67,68,66}');
INSERT INTO ts VALUES (1, 2, '{70,71,69}');
INSERT INTO ts VALUES (1, 3, '{71,72,70}');
INSERT INTO ts VALUES (1, 4, '{72,73,71}');
INSERT INTO ts VALUES (1, 5, '{69,70,68}');
INSERT INTO ts VALUES (1, 6, '{67,68,66}');
INSERT INTO ts VALUES (1, 7, '{65,66,64}');
INSERT INTO ts VALUES (1, 8, '{60,61,59}');
INSERT INTO ts VALUES (1, 9, '{58,59,57}');
INSERT INTO ts VALUES (1, 10, '{59,60,58}');
INSERT INTO ts VALUES (1, 11, '{62,63,61}');
INSERT INTO ts VALUES (1, 12, '{68,69,67}');
INSERT INTO ts VALUES (1, 13, '{70,71,69}');
INSERT INTO ts VALUES (1, 14, '{71,72,70}');
INSERT INTO ts VALUES (1, 15, '{72,73,71}');
INSERT INTO ts VALUES (1, 16, '{77,78,76}');
INSERT INTO ts VALUES (1, 17, '{70,71,69}');
INSERT INTO ts VALUES (1, 18, '{71,72,70}');
INSERT INTO ts VALUES (1, 19, '{73,74,72}');
INSERT INTO ts VALUES (1, 20, '{75,76,74}');
INSERT INTO ts VALUES (1, 21, '{79,80,78}');
INSERT INTO ts VALUES (1, 22, '{82,83,81}');
INSERT INTO ts VALUES (1, 23, '{90,91,89}');
INSERT INTO ts VALUES (1, 24, '{69,70,68}');
INSERT INTO ts VALUES (1, 25, '{75,76,74}');
INSERT INTO ts VALUES (1, 26, '{80,81,79}');
INSERT INTO ts VALUES (1, 27, '{81,82,80}');

Notice that every INSERT adds data for all three of our series in a single database operation!
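On the application side, a writer would gather the latest value of every series in a bundle and pack them into one array ordered by pos. Below is a small sketch of that packing step; arrayLiteral is a hypothetical helper, and production code would pass the array as a query parameter through a database driver rather than formatting SQL text.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// arrayLiteral renders one ts row as a Postgres array literal. values maps
// a series position (the 1-based pos column of the rra table) to its value
// for this time slot; positions without a value become NULL.
func arrayLiteral(values map[int]float64, width int) string {
	elems := make([]string, width)
	for pos := 1; pos <= width; pos++ {
		if v, ok := values[pos]; ok {
			elems[pos-1] = strconv.FormatFloat(v, 'f', -1, 64)
		} else {
			elems[pos-1] = "NULL"
		}
	}
	return "{" + strings.Join(elems, ",") + "}"
}

func main() {
	// The three temperature series for slot 0 of bundle 1.
	row := arrayLiteral(map[int]float64{1: 64, 2: 65, 3: 63}, 3)
	fmt.Printf("INSERT INTO ts VALUES (1, 0, '%s');\n", row)
}
```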

Finally, let us create the view. (How it works is described in detail in the previous article)

CREATE VIEW tv AS
  SELECT rra.id as rra_id,
     rra_bundle.latest - INTERVAL '1 MILLISECOND' * rra_bundle.step_ms * rra_bundle.steps_per_row *
       MOD(rra_bundle.size + MOD(EXTRACT(EPOCH FROM rra_bundle.latest)::BIGINT*1000/(rra_bundle.step_ms * rra_bundle.steps_per_row),
       rra_bundle.size) - i, rra_bundle.size) AS t,
     dp[pos] AS r
  FROM rra AS rra
  JOIN rra_bundle AS rra_bundle ON rra_bundle.id = rra.rra_bundle_id
  JOIN ts AS ts ON ts.rra_bundle_id = rra_bundle.id;

And now let’s verify that it works:

=> select * from tv where rra_id = 1 order by t;
 rra_id |           t            | r
 --------+------------------------+----
       1 | 2008-03-06 00:00:00-00 | 64
       1 | 2008-03-07 00:00:00-00 | 67
       1 | 2008-03-08 00:00:00-00 | 70
 ...

This approach makes writes blazingly fast though it does have its drawbacks. For example there is no way to read a single series - even though the view selects a single array element, under the hood Postgres reads the whole row. Given that time series data is written far more often than it is read, this may not be a bad compromise.

Simple Tgres Part II - A High Rate Counter

2016-12-28T17:06:00-05:00

Continuing on the previous post on simple use of Tgres components, let’s try to count something that goes by really fast.

This time let’s start out with creating a memory-based SerDe. This means that all our data is in memory and there is no database backing our series.

package main

import (
    "fmt"
    "net/http"
    "time"

    "github.com/tgres/tgres/dsl"
    h "github.com/tgres/tgres/http"
    "github.com/tgres/tgres/receiver"
    "github.com/tgres/tgres/rrd"
    "github.com/tgres/tgres/serde"
)

func main() {

    step := 1 * time.Second // 1 second resolution
    span := 600 * step      // spanning 10 minutes

    // In-memory SerDe
    ms := serde.NewMemSerDe()

    // Create a receiver of our data points backed by the above
    // memory SerDe
    rcvr := receiver.New(ms, &receiver.SimpleDSFinder{&rrd.DSSpec{
        Step: step,
        RRAs: []rrd.RRASpec{
            rrd.RRASpec{Function: rrd.WMEAN,
                Step: step,
                Span: span,
            },
        }}})
    rcvr.Start()

Now let’s create a goroutine which creates data points as fast as it can. The difference from the previous blog post is that we are using QueueGauge(), which is a paced metric, meaning that it flushes to the time series only periodically (once per second by default) so as to not overwhelm the I/O and/or network (even though in this case it doesn’t really matter since we’re using a memory-based SerDe anyway).

    start := time.Now()
    end := start.Add(span)

    go func() {
        n := 0
        for t := time.Now(); t.Before(end); t = time.Now() {
            rcvr.QueueGauge(serde.Ident{"name":"foo.bar"}, float64(n)/(t.Sub(start)).Seconds())
            n++
        }
    }()

And finally, as before, we need to hook up a couple of http handlers:

    db := dsl.NewNamedDSFetcher(ms.Fetcher())

    http.HandleFunc("/metrics/find", h.GraphiteMetricsFindHandler(db))
    http.HandleFunc("/render", h.GraphiteRenderHandler(db))

    listenSpec := ":8088"
    fmt.Printf("Waiting for requests on %s\n", listenSpec)
    http.ListenAndServe(listenSpec, nil)

} // end of main()

Now if we run the above code with something like go run simpletgres.go, we’ll notice that unlike with the previous example, the web server starts right away, and the data points are being written while the server is running. If we aim Grafana at it, we should be able to see the chart update in real time.

After a couple of minutes, mine looks like this:

So my macbook can crank these out at about 2.5 million per second.

In my experience instrumenting my apps with simple counters like this and having them available directly from the app without having to send them to a separate statsd server somewhere has been extremely useful in helping understand performance and other issues.

Why is there no Formal Definition of Time Series?

2016-12-23T09:13:00-05:00

If you’re reading this, chances are you may have searched for a definition of “Time Series”. And, like me, you were probably disappointed by what you’ve found.

The most popular “definition” I come across amongst our fellow programmer folk is that it’s “data points with timestamps”. Or something like that. And you can make charts from it. And that’s about it, alas.

The word time suggests that it has something to do with time. At first it seems reasonable, I bite. The word series is a little more peculiar. A mathematician would argue that a series is a sum of a sequence. Most people though think “series” and “sequence” are the same thing, and that’s fine. But it’s a clue that time series is not a scientific term, because it would have been called time sequence most likely.

Let’s get back to the time aspect of it. Why do data points need timestamps? Or do they? Isn’t it the time interval between points that is most essential, rather than the absolute time? And if the data points are spaced equally (which conforms to the most common definition of time series), then what purpose would any time-related information attached to a data point serve?

To understand this better, picture a time chart. Of anything - temperature, price of bitcoin over a week, whatever. Now think - does the absolute time of every point provide any useful information to you? Does the essential meaning of the chart change depending on whether it shows the price of bitcoin in the year 2016 or 2098 or 10923?

Doesn’t it seem like “time” in “time series” is a bit of a red herring?

Here is another example. Let’s say I decide to travel from San Francisco to New York taking measurements of elevation above sea level at every mile. I then plot that sequence on a chart where x-axis is distance traveled and y-axis is elevation. You would agree that this chart is not a “time series” by any stretch, right? But then if I renamed x-axis to “time traveled” (let’s assume I moved at constant speed), the chart wouldn’t change at all, but now it’s okay to call it “time series”?

So it’s no surprise that there is no formal definition of “time series”. In the end a “time series” is just a sequence. There are no timestamps required and there is nothing at all special regarding a dimension being time as opposed to any other unit, which is why there is no mathematical definition of “time series”. Time series is a colloquial term, the etymological origins of which are not known to me, but it’s not a thing from a scientific perspective, I’m afraid.

Next time you hear “time series” just substitute it with “sequence” and see how much sense that makes. For example a “time series database” is a “sequence database”, i.e. a database optimized for sequences. Aren’t all relational databases optimized for sequences?

Something to think about over the holidays…

Edit: Someone brought up the subject of unevenly-spaced time series. All series are evenly spaced given proper resolution. An unevenly-spaced time series with timestamps accurate to 1 millisecond is a sparse evenly-spaced series with a 1 millisecond resolution.

Simple Time Series App with Tgres

2016-12-21T19:55:00-05:00

Did you know you can use Tgres components in your code without PostgreSQL, and in just a dozen lines of code instrument your program with a time series? This example shows a complete server emulating Graphite API which you can use with Grafana (or any other tool).

In this example we will be using three Tgres packages like so (in addition to a few standard ones, I’m skipping them here for brevity - complete source code gist):

import (
    "github.com/tgres/tgres/dsl"
    h "github.com/tgres/tgres/http"
    "github.com/tgres/tgres/rrd"
)

First we need a Data Source. This will create a Data Source containing one Round Robin Archive with a 10 second resolution spanning 1000 seconds.

step := 10 * time.Second
span := 100 * step

ds := rrd.NewDataSource(rrd.DSSpec{
    Step: 1 * time.Second,
    RRAs: []rrd.RRASpec{
        rrd.RRASpec{Step: step, Span: span},
    },
})

Let’s shove a bunch of data points into it. To make it look extra nice, we can make these points look like a sinusoid with this little function:

func sinTime(t time.Time, span time.Duration) float64 {
    x := 2 * math.Pi / span.Seconds() * float64(t.Unix()%(span.Nanoseconds()/1e9))
    return math.Sin(x)
}

And now for the actual population of the series:

start := time.Now().Add(-span)

for i := 0; i < int(span/step); i++ {
    t := start.Add(time.Duration(i) * step)
    ds.ProcessDataPoint(sinTime(t, span), t)
}

We will also need to create a NamedDSFetcher, the structure which knows how to search dot-separated series names a la Graphite.

db := dsl.NewNamedDSFetcherMap(map[string]rrd.DataSourcer{"foo.bar": ds})

Finally, we need to create two http handlers which will mimic a Graphite server and start listening for requests:

http.HandleFunc("/metrics/find", h.GraphiteMetricsFindHandler(db))
http.HandleFunc("/render", h.GraphiteRenderHandler(db))

listenSpec := ":8088"
fmt.Printf("Waiting for requests on %s\n", listenSpec)
http.ListenAndServe(listenSpec, nil)

Now if you point Grafana at it, it will happily think it’s Graphite and should show you a chart like this:

Note that you can use all kinds of Graphite functions at this point - it all “just works”.

Enjoy!
