Running a Tezos node currently costs a lot of disk space, about 59 GB for the context database, the place where the node stores the states corresponding to every block in the blockchain, since the first one. Of course, this is going to decrease once garbage collection is integrated, i.e. removing very old information, that is not used and cannot change anymore (PR720 by Thomas Gazagnaire, Tarides, some early tests show a decrease to 14GB). As a side note, this is different from pruning, i.e. transmitting only the last cycles for “light” nodes (PR663 by Thomas Blanc, OCamlPro). Anyway, as Tezos will be used more and more, contexts will keep growing, and we need to keep decreasing the space and performance cost of Tezos storage.
As one part of our activity at OCamlPro is to allow companies to deploy their own private Tezos networks, we decided to experiment with new storage layouts. We implemented two branches: our branch
IronTez1 is based on a full LMDB database, as Tezos currently, but with optimized storage representation ; our branch
IronTez2 is based on a mixed database, with both LMDB and file storage.
To test these branches, we started a node from scratch, and recorded all the accesses to the context database, to be able to replay it with our new experimental nodes. The node took about 12 hours to synchronize with the network, on which about 3 hours were used to write and read in the context database. We then replayed the trace, either only the writes or with both reads and writes.
Here are the results:
The mixed storage is the most interesting: it uses half the storage of a standard Tezos node !
Again, the mixed storage is the most efficient : even with reads and writes,
IronTez2 is five time faster than the current Tezos storage.
Finally, here is a graph that shows the impact of the two attacks that happened in November 2018, and how it can be mitigated by storage improvement:
The graph shows that, using mixed storage, it is possible to restore the storage growth of Tezos to what it was before the attack !
Interestingly, although these experiments have been done on full traces, our branches are completely backward-compatible : they could be used on an already existing database, to store the new contexts in our optimized format, while keeping the old data in the ancient format.
Of course, there is still a lot of work to do, before this work is finished. We think that there are still more optimizations that are possible, and we need to test our branches on running nodes for some time to get confidence (TzScan might be the first tester !), but this is a very encouraging work for the future of Tezos !