Sitemap

Hammersbald

4 min readNov 11, 2018

TL;DR: Generic databases and key-value stores offer more functionality than needed to store a blockchain and that superfluous functionality comes at a high cost in speed. Generic databases also lack an efficient mean of navigating and filtering by forks of the chain. I wrote a high performance, small footprint embedded database for applications in safe Rust with functionality tailored for a blockchain and discuss its design here.

About blockchain data

Bitcoin’s blockchain contains more than 354 million transactions in 549011 blocks as I write this document, its pure data content is around 190 GiB.

Sequential storage and processing of this amount of data is cheap and trivial, random access, navigation of internal references is not. Current best performing blockchain storages, such as Bitcoin Core’s own database or BlockSci, combine two technologies:

  • Append only storage of blockchain data
  • A key-value store that maps keys to pointers into the append only store.

A database that combines both technologies and adds navigation that follows internal references would be more convenient to use and allow blockchain specific optimizations:

  • Data is never deleted. Data structures can be optimized for insert and retrieve operations, we do not need to care how expensive it was to amend them.
  • Keys do not define a meaningful order. Popular key-value stores use structures that allow key iteration in a defined order. We do not need that, therefore may use a simple hash table instead of a search tree.
  • The data has internal references e.g. to previous block. We want to follow those references directly without a key-value lookup.

Introducing Hammersbald

Hammersbald is German slang for “haben wir es bald?”, approximately: “will we have it soon?”. A cheeky expression of impatience. Hammersbald is the blockchain database for the impatient.

Hammersbald is a project I have been working on during the last couple months. I wrote Hammersbald in safe Rust, from scratch here.

Hammersbald implements a persistent linear hash table pointing to append only DAG storage.

A Hammersbald database consists of four file groups. A file group consists files of max. 1 GiB size each.

  • Hash table slots. These are updated in random order as needed. (*.tb)
  • A hash table slot points to a list of data pointers. Data pointers are stored in an append only file group. (*.bl)
  • The blockchain data is stored in an append only file group (*.bc).
  • Log is a temporary store of hash table slot pre-images (.lg).
Image
Hammersbald storage (simplified)

The hash table is updated an consulted at high frequency therefore a copy of it is also maintained in memory. The number of slots is limited to 2³². A slot may point to any number of data elements, the mean length of the data pointer lists is configurable, but the actual length of a list is probabilistic. Hammersbald uses SipHash with a random key to achieve approximately even distribution of data to slots.

The hash table is written to disk at the end of a batch that can bundle any number of inserts.

Hammersbald recovers gracefully from a crash of the process it is running in: It stores pre-images of the hash table slots and sizes of the append only files in the log before changing the persistent hash table. It checks at start the log file, that might be left over from a crashed process, patches the hash table to its last known consistent state with the pre-images and truncate the append only stores to their last known correct size. In effect the last batch is either committed in its entirety or will be rolled back at startup.

Writes to append only files are asynchronous. Since hash table slots and links are cached in memory, read access to keyed data takes at most one disk seek.

Iteration is sequential in the reverse order of data inserts and skips over unreferenced data. Sequential search with iterators will be dominated by the speed of the data parser, not Hammersbald.

API

Hammersbald supports put, get and iterate operations.

Put and get operations may specify a key of up to 255 bytes length. Data stored with a key may be retrieved with it. Repeated storage with the same key renders the previously stored data inaccessible with the key. Data stored with or without a key may be retrieved if some other data stores a reference to it. A reference is returned by the put operation. Data may only refer to previously stored data, hence the data storage becomes a collection of directed acyclic graphs.

// stores keyed data and returns reference
fn put(&mut self, key: &[u8], data: &[u8], referred: &Vec<PRef>) -> Result<PRef, HammersbaldError>
// returns reference to data, data, referred other data
fn get(&self, key: &[u8]) -> Result<Option<(PRef, Vec<u8>, Vec<PRef>)>, HammersbaldError>
// store data and return a reference to it
fn put_referred(&mut self, data: &[u8], referred: &Vec<PRef>) -> Result<PRef, HammersbaldError>
// return optional data key, data, referred other data
fn get_referred(&self, pref: PRef) -> Result<(Vec<u8>, Vec<u8>, Vec<PRef>), HammersbaldError>

The iterator follows data references and thereby ignores data that is not reachable from the starting point of the iteration. Starting an iteration from the current chain tip will ignore transactions in orphaned forks.

// iterate staring from root through referred data recursively
fn dag<'a>(&'a self, root: PRef) -> impl Iterator<Item=(PRef, Envelope)> +'a

Bitcoin support

Hammersbald is blockchain agnostic, but comes with a specialized adapter and an example use with Bitcoin.

Target use

Hammersbald is a fast embedded blockchain database for applications. It only caches a hash table of data pointers in memory and not the blockchain data, therefore it will not reach BlockSci’s query performance, but does also not have BlockSci’s huge memory requirement. Hammersbald memory footprint is smaller, its design and implementation is simpler and safer than Bitcoin Core’s LevelDB + flat file combo.

Would be nice to have

  • Offline garbage collector for unreferenced data and data links.
  • Adapters for other blockchains.

--

--

Tamas Blummer
Tamas Blummer

Written by Tamas Blummer

Independent, Bitcoin Developer since 2012, Former: CLA @ Digital Asset Holdings, VP @ CoinTerra, CEO @ Bits of Proof, Engineer, Financial Risk Manager.