What is a blockchain?

Trustworthy Ledgers

A ledger is a list of transactions, often financial ones. Businesses typically keep track of their finances with one or more ledgers that serve as the record of their transactions. For example, when I was writing my book on being a software engineer, I spent some money on the storefront and sample copies, but I made some sales, too.

The ledger for my business might look like so:

ID | Date       | Amount    | Purpose
---|------------|-----------|---------------------------------------
 1 | 2013-09-11 |   (35.00) | Reviewer copies
 2 | 2013-09-26 |   (30.00) | Purchase physical copies for promotion
 3 | 2013-09-30 |    (9.00) | Send Owl Monthly Fee
 4 | 2013-10-01 |    25.00  | Sale
 5 | 2013-10-01 |    25.00  | Sale
 6 | 2013-10-01 |    25.00  | Sale

Each new transaction is added to this list as I make more sales or incur more expenses.

For managing my publishing empire, it's just me, so storing this in a Google Doc works fine. The only other person who needs it is my accountant.

Even a huge public company still uses this underlying concept to manage its finances. In the case of such a company, there will be many hands on these ledgers, and the number of people and systems that consume them is large. There are also many people and systems that feed data into these ledgers (a purchasing system, ordering system, payments system, etc.). You've also got myriad folks on some finance team consolidating this to report on the company's financial health and, in the United States at least, the Chief Financial Officer and Chief Executive Officer have to sign off on the validity of this ledger.

So, this thing has to be something people can rely on. For that to be true, the ledger must be consistent and correct.

Ensuring the Ledger is Correct

Above, I said that new transactions were added to the ledger. What about corrections? What if I really only made $23.75 per copy of my book, but my ledger says $25? I could change those entries, but then I'd be changing history.

The reason preserving the history of a ledger is important is that you make decisions based on what the ledger looked like at that time, and you publicize summarized results based on that ledger.

For example, the ledger above indicates I made $1 in profit. I'll need to pay tax on that, and I also might decide that my book-selling business is doing well and I don't need to pay for any new marketing.

If it turns out that I only made $23.75 for each copy sold, and I go back and change the ledger, it becomes confusing. I no longer see the information that led me to pay some taxes and decide to not buy more marketing. Even though I booked incorrect values, I used them to make decisions and take actions, and obscuring that history creates more problems than it solves.
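The arithmetic behind these two views of the ledger is easy to check. Here's a minimal sketch in Python, assuming the amounts from the table above (with the parenthesized expenses written as negatives) and the $23.75 figure as the corrected sale price. The variable names are mine, purely for illustration.

```python
from decimal import Decimal

# Amounts as booked in the ledger; expenses are negative.
booked = ["-35.00", "-30.00", "-9.00", "25.00", "25.00", "25.00"]

# The same ledger if each sale had really brought in $23.75.
corrected = ["-35.00", "-30.00", "-9.00", "23.75", "23.75", "23.75"]

profit = sum(Decimal(a) for a in booked)
corrected_profit = sum(Decimal(a) for a in corrected)

print(profit)            # 1.00
print(corrected_profit)  # -2.75
```

The booked ledger shows a $1 profit, while the corrected figures show a loss. If I simply overwrote the sale entries, the $1 that drove my decisions would vanish from the record.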

Further, if the ledger can be changed willy-nilly after the fact, it allows all sorts of bad actors to conceal their corrupt activities.

Thus, we want to be able to rely on each ledger entry being the same as it was when it was booked, and on each subsequent entry being based on the previous one, unchanged. This gives us an accurate history of what we made decisions on.

Making the Ledger Trustworthy, Part I

If we treat the ledger as an immutable append-only store, we can rely on it. So, how do we implement it in this way?

Most companies store it in some database and add a ton of controls around access to that database so that they can feel confident no one has tampered with it. The problem is that a SQL database (or whatever) is not designed to store immutable append-only data. It's designed to allow data to be mutated as needed. So, it's possible that someone could steal $1,000,000 and go back and edit the ledger in SQL to conceal this fact.

Most standard IT policies deal with this by instituting auditing and access controls. Auditing controls regularly report on what activity has taken place, and someone reviews it for appropriateness. Access controls limit who can do what.

These controls aren't perfect. It would be better if the ledger itself could be self-verifying. If the contents of any part of the ledger could prove their own accuracy, we wouldn't need to lean so heavily on these controls.

The way the Bitcoin paper solves this problem is to store a cryptographic hash of the previous entry in the current entry. By calculating the hash of the previous entry and comparing it to the previously-calculated hash stored in the current entry, we can be sure that the current entry was booked against the previous one.

Let's see an example with my ledger from above. Let's write it in a JSON-like notation so we avoid wrapping in this text.

{ id: 1,
  date: "2013-09-11",
  amount: "-35.00",
  purpose: "Reviewer copies"
},
{ id: 2,
  date: "2013-09-26",
  amount: "-30.00",
  purpose: "Purchase physical copies for promotion"
},
{ id: 3,
  date: "2013-09-30",
  amount: "-9.00",
  purpose: "Send Owl Monthly Fee"
},
{ id: 4,
  date: "2013-10-01",
  amount: "25.00",
  purpose: "Sale"
},
{ id: 5,
  date: "2013-10-01",
  amount: "25.00",
  purpose: "Sale"
},
{ id: 6,
  date: "2013-10-01",
  amount: "25.00",
  purpose: "Sale"
}

With this in place, we can calculate the SHA-256 hash of each entry. We then store that hash in the subsequent entry to form a chain:

{ id:1,
  date:"2013-09-11",
  amount:"-35.00",
  purpose:"Reviewer copies",
  previous_hash: 0
},
{ id:2,
  date:"2013-09-26",
  amount:"-30.00",
  purpose:"Purchase physical copies for promotion",
  previous_hash: "3b54d7ae5e7afd268ad59b147b73da182589a661f38271f7c43b4e5427325a4a"
},
{ id:3,
  date:"2013-09-30",
  amount:"-9.00",
  purpose:"Send Owl Monthly Fee",
  previous_hash: "d2e695ed4c634f0cb48f6f397949fd4bc1001714e570febc3b555a40b3ff7eee"
},
{ id:4,
  date:"2013-10-01",
  amount:"25.00",
  purpose:"Sale",
  previous_hash: "3686edc9a31027262b53fa136942b7e5aec0f55cb054e142a8e0552c9a4603d2"
},
{ id:5,
  date:"2013-10-01",
  amount:"25.00",
  purpose:"Sale",
  previous_hash: "a6842789e3856ede5728142db8146647dbb8a4afb0028218003a60b9af03bf2c"
},
{ id:6,
  date:"2013-10-01",
  amount:"25.00",
  purpose:"Sale",
  previous_hash: "87cc440b361435d04d3177566a9d0ffd400f5abe3810d57e8be4df395a434041"
}

Now, the ledger is consistent and we can verify the entire thing. Starting with the first entry, we calculate the SHA-256 hash of its contents. If that matches the value of previous_hash in the second entry, we can be sure that the contents of the first entry haven't changed since the second entry was created. Thus, we know that this ledger's accounting of history is accurate.

We can then repeat that up the chain until the last entry. If each entry's previous_hash matches the hash of the previous entry, we know that the entire ledger is accurate and captures history the way it happened. The hashes form a chain, and if we consider each hashed entry as a block, we'd call this…a blockchain.

If someone had tampered with an older entry, the chain would be broken. When we calculate the SHA-256 of the tampered entry, it won't match the value of the subsequent entry's previous_hash and we'll know that the tampered entry was changed after the subsequent entry was created.
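Here's a rough sketch of that chaining and checking in Python. One big assumption: the text doesn't say how an entry is turned into bytes before hashing, so I've picked JSON with sorted keys, which means these hashes won't match the ones shown above. The function names (entry_hash, build_chain, first_broken_link) are made up for this example.

```python
import hashlib
import json

def entry_hash(entry):
    """SHA-256 of an entry's JSON form (sorted keys: an assumed serialization)."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(entries):
    """Return copies of the entries, each carrying the hash of its predecessor."""
    chain, previous_hash = [], 0  # the first entry has no predecessor
    for entry in entries:
        linked = {**entry, "previous_hash": previous_hash}
        chain.append(linked)
        previous_hash = entry_hash(linked)
    return chain

def first_broken_link(chain):
    """Return the id of the first entry whose contents no longer match the
    hash stored in the entry after it, or None if every link checks out."""
    for previous, current in zip(chain, chain[1:]):
        if entry_hash(previous) != current["previous_hash"]:
            return previous["id"]
    return None

ledger = [
    {"id": 1, "date": "2013-09-11", "amount": "-35.00", "purpose": "Reviewer copies"},
    {"id": 2, "date": "2013-09-26", "amount": "-30.00",
     "purpose": "Purchase physical copies for promotion"},
    {"id": 3, "date": "2013-09-30", "amount": "-9.00", "purpose": "Send Owl Monthly Fee"},
    {"id": 4, "date": "2013-10-01", "amount": "25.00", "purpose": "Sale"},
    {"id": 5, "date": "2013-10-01", "amount": "25.00", "purpose": "Sale"},
    {"id": 6, "date": "2013-10-01", "amount": "25.00", "purpose": "Sale"},
]

chain = build_chain(ledger)
print(first_broken_link(chain))  # None: the chain is intact

chain[3]["amount"] = "23.75"     # quietly rewrite the first sale (id 4)
print(first_broken_link(chain))  # 4: entry 5 no longer vouches for entry 4
```

Note that in this sketch the last entry has nothing after it to vouch for it, so a change there would only show up once another entry is appended.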

Of course, this is still not foolproof. Calculating a SHA-256 hash isn't computationally expensive. If someone wanted to subvert our ledger, they could change the values already logged, and then recompute the hashes for all entries, making a self-consistent ledger that was still wrong.
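To see how cheap that attack is, here's a small sketch in Python. As before, the serialization (JSON with sorted keys) is my assumption, and relink and is_consistent are hypothetical helper names. The three-entry chain is trimmed down from my ledger to keep the example short.

```python
import hashlib
import json

def entry_hash(entry):
    """SHA-256 of an entry's JSON form (sorted keys: an assumed serialization)."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def relink(chain):
    """Recompute every previous_hash in place, front to back."""
    for previous, current in zip(chain, chain[1:]):
        current["previous_hash"] = entry_hash(previous)

def is_consistent(chain):
    """True if each entry's previous_hash matches the entry before it."""
    return all(entry_hash(p) == c["previous_hash"]
               for p, c in zip(chain, chain[1:]))

chain = [
    {"id": 4, "amount": "25.00", "purpose": "Sale", "previous_hash": 0},
    {"id": 5, "amount": "25.00", "purpose": "Sale", "previous_hash": 0},
    {"id": 6, "amount": "25.00", "purpose": "Sale", "previous_hash": 0},
]
relink(chain)                 # build the honest chain
print(is_consistent(chain))   # True

chain[0]["amount"] = "2.50"   # tamper with history
print(is_consistent(chain))   # False: the change is detectable...

relink(chain)                 # ...until the attacker recomputes the hashes
print(is_consistent(chain))   # True: self-consistent, but wrong
```

Rehashing the whole thing takes a handful of hash computations, which is exactly the weakness.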

Making the Ledger Trustworthy, Part II

The Bitcoin whitepaper solves this problem with a proof of work, which is a computationally expensive operation that must be performed in order to log an entry into the ledger. This makes recomputing the entire ledger infeasible.

For example, we could require that each entry contain a nonsense value (called a nonce) that, when combined with the other values, produces a SHA-256 hash that meets certain criteria. In Bitcoin's case, that criterion is, roughly speaking, that the hash must start with some number of zeros. Instead of a hash like 87cc440b361435d04d3177566a9d0ffd400f5abe3810d57e8be4df395a434041, we'd only accept a hash like 00000000000035d04d3177566a9d0ffd400f5abe3810d57e8be4df395a434041 (note the leading 0's).

Here's how part of our ledger might look:

{ id:2,
  date:"2013-09-26",
  amount:"-30.00",
  purpose:"Purchase physical copies for promotion",
  nonce: 83495845,
  previous_hash: "00000000000000008ad59b147b73da182589a661f38271f7c43b4e5427325a4a"
},
{ id:3,
  date:"2013-09-30",
  amount:"-9.00",
  purpose:"Send Owl Monthly Fee",
  nonce: 94283243,
  previous_hash: "0000000000000000b48f6f397949fd4bc1001714e570febc3b555a40b3ff7eee"
},

Notice how all the hashes start with 16 0's.

The reason this is called “work” is that it's not easy to choose a nonce that, when combined with the ledger entry's actual data, produces a hash with the right number of leading 0's. It's so difficult, in fact, that it must be brute-forced—you can't choose a hash and reverse-engineer the nonce.

Thus, for each new entry, you start with a nonce of 0, add it to the entry, compute the SHA-256 hash and see what value comes out. If it's not one that starts with (in our case) 16 0's, you increment the nonce and try again. Repeat until you find a hash that starts with 16 0's.
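That loop is short enough to sketch in Python. The assumptions here are mine: the entry is serialized as JSON with sorted keys before hashing, the helper names (entry_hash, mine) are invented, and the difficulty is turned way down to 4 leading 0's so it finishes quickly. Asking for 16 would not finish in any reasonable time on a laptop.

```python
import hashlib
import json

def entry_hash(entry):
    """SHA-256 of an entry's JSON form (sorted keys: an assumed serialization)."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def mine(entry, leading_zeros):
    """Brute-force a nonce so the entry's hash starts with the required 0's."""
    prefix = "0" * leading_zeros
    nonce = 0
    while True:
        candidate = {**entry, "nonce": nonce}
        digest = entry_hash(candidate)
        if digest.startswith(prefix):
            return candidate, digest
        nonce += 1  # no luck, try the next nonce

entry = {
    "id": 1,
    "date": "2013-09-11",
    "amount": "-35.00",
    "purpose": "Reviewer copies",
    "previous_hash": 0,
}

mined, digest = mine(entry, leading_zeros=4)
print(mined["nonce"], digest)
```

Checking the result takes a single hash, while finding it took however many tries the loop needed. That asymmetry is the whole point.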

This is expensive and takes a long time. And, there's no way around it.

SHA-256 is designed so that its output can't feasibly be worked backward to an input. You might think you could do this in reverse to save time: start with a hash that has 16 leading 0's, and then reverse-engineer it to find a nonce that works with our ledger entry. This is mathematically very difficult. And things that are this mathematically difficult are, for practical purposes, impossible.

This property of SHA-256 is why it's used in cryptography. It's a hash function that cannot feasibly be reversed: you can't take a hash and figure out what data was fed into it without exhaustively trying possible inputs (this reliance on cryptography is why Bitcoin is called a cryptocurrency).

What this means is that if you wanted to tamper with the ledger, and hide your tracks by recomputing the entire thing, it would be incredibly expensive to do so. It would be so expensive that you effectively couldn't do it, because legitimate entries would be added to the ledger faster than you could recompute the historical ones.
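To put a rough number on "incredibly expensive", here's a back-of-the-envelope sketch in Python. Each hex digit of a hash is 0 about one time in 16, so a hash with 16 leading 0's turns up about once every 16^16 tries. The hash rate below is a number I made up for illustration, not a measurement of any real hardware.

```python
HEX_ZEROS = 16                     # required leading 0's, as in our example
HASHES_PER_SECOND = 1_000_000_000  # hypothetical attacker speed (an assumption)

# Each attempt succeeds with probability 16**-HEX_ZEROS, so the expected
# number of attempts for one entry is the reciprocal.
expected_attempts = 16 ** HEX_ZEROS
seconds = expected_attempts / HASHES_PER_SECOND
years = seconds / (365.25 * 24 * 3600)

print(expected_attempts)  # 18446744073709551616
print(round(years))       # 585
```

Under those assumptions, redoing the work for a single entry would take centuries on average, and a tamperer has to redo it for every entry after the one they changed.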

And this means the ledger is trustworthy.

So, how does this get us an electronic currency? Let's learn about that, and then see what other problems we can solve with this technology.