<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Llorx on Medium]]></title>
        <description><![CDATA[Stories by Llorx on Medium]]></description>
        <link>https://medium.com/@Llorx?source=rss-68506c31b4b0------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*--kl5hKGdeufe4KvBgSxxw.jpeg</url>
            <title>Stories by Llorx on Medium</title>
            <link>https://medium.com/@Llorx?source=rss-68506c31b4b0------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 17 Jun 2026 12:39:48 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@Llorx/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Your Node.js Benchmarks Are (Probably) Invalid]]></title>
            <link>https://medium.com/@Llorx/your-node-js-benchmarks-are-probably-invalid-a4ed2f14aadf?source=rss-68506c31b4b0------2</link>
            <guid isPermaLink="false">https://medium.com/p/a4ed2f14aadf</guid>
            <category><![CDATA[optimization]]></category>
            <category><![CDATA[nodejs]]></category>
            <category><![CDATA[benchmark]]></category>
            <category><![CDATA[v8]]></category>
            <dc:creator><![CDATA[Llorx]]></dc:creator>
            <pubDate>Tue, 04 Nov 2025 16:40:06 GMT</pubDate>
            <atom:updated>2025-11-04T16:40:06.091Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WWE7-snh2t7qo0hoifV9PQ.png" /></figure><h3>⚙️ The Problem with Node.js Benchmarks</h3><p>Node.js and its engine, <strong>V8</strong>, perform a lot of behind-the-scenes optimizations. That “magic” makes your JavaScript run fast — but it can also completely distort your benchmark results.</p><p>When I was developing <a href="https://github.com/Llorx/pacopack"><strong>pacopack</strong></a>, I wanted to squeeze every bit of performance out of Node.js so I used <a href="https://benchmarkjs.com/"><strong>benchmark.js</strong></a> to measure critical parts of the code.</p><p>Then something weird happened: I simply <strong>reordered the tests</strong>, and the results changed. Significantly.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/100/1*FnH96Ca24eUajcG1nQXk4w.gif" /><figcaption>wut</figcaption></figure><h3>🧙‍♂️ V8’s Hidden Optimizations</h3><p>Depending on what you’re testing, <strong>Node.js may reuse internal caches</strong>, and <strong>V8 applies just-in-time (JIT) optimizations</strong>.</p><p>This means one test can influence the next — what I like to call <strong>optimization pollution</strong>.</p><p>At first, I tried running each test individually to isolate them. It worked… but it broke the convenience of using benchmark suites. 
So I built a solution that handled isolation automatically.</p><p>That’s how <a href="https://github.com/Llorx/iso-bench"><strong>iso-bench</strong></a> was born.</p><h3>🧩 Example</h3><p>Let’s start with a simple benchmark suite using benchmark.js:</p><pre>import Benchmark from &quot;benchmark&quot;;<br>// The test functions<br>const functions = {<br>  read: function (buf) {<br>    return buf.readUint8(0);<br>  },<br>  direct: function (buf) {<br>    return buf[0];<br>  },<br>  read_again: function (buf) {<br>    return buf.readUint8(0);<br>  },<br>};<br>// Prepare buffers<br>const buffers = new Array(1000).fill(0).map(() =&gt; {<br>  const buf = Buffer.allocUnsafe(1);<br>  buf[0] = Math.floor(Math.random() * 0xff);<br>  return buf;<br>});<br>// The benchmark suite<br>const suite = new Benchmark.Suite();<br>for (const [type, fn] of Object.entries(functions)) {<br>  suite.add(`${type}`, () =&gt; {<br>    for (let i = 0; i &lt; buffers.length; i++) {<br>      fn(buffers[i]);<br>    }<br>  });<br>}<br>suite.on(&quot;cycle&quot;, (event) =&gt; console.log(String(event.target))).run({ async: true });</pre><p>The function read_again is identical to read — the only difference is that it runs <strong>after</strong> direct.</p><p>Results:</p><pre>read       x 1,531,039<br>direct     x 216,381<br>read_again x 312,087 // slower than &quot;read&quot;? 
It’s the same code!</pre><p>🤨 <em>Why is the same function suddenly slower?</em></p><h3>🔁 Changing the Order Changes Everything</h3><p>If we just swap the order and put direct first:</p><pre>const functions = {<br>  direct: function (buf) { // Now &quot;direct&quot; goes before &quot;read&quot;<br>    return buf[0];<br>  },<br>  read: function (buf) {<br>    return buf.readUint8(0);<br>  },<br>  read_again: function (buf) {<br>    return buf.readUint8(0);<br>  },<br>};</pre><p>It yields:</p><pre>direct     x 1,599,998 // &quot;direct&quot; 7x faster than before?<br>read       x 163,890   // slower than before?<br>read_again x 184,126   // slower than before too??</pre><p><strong>WHAT.</strong></p><p>The only thing that changed was the <strong>order</strong> of the tests, not the code itself. This is the “optimization pollution” I was talking about.</p><h3>🧱 The Fix: Run Benchmarks in Isolation</h3><p><strong>iso-bench</strong> solves this by running each benchmark in a separate process, keeping them fully isolated from one another.</p><p>Example:</p><pre>import { IsoBench } from &quot;iso-bench&quot;;<br>// The test functions<br>const functions = {<br>  read: function (buf) {<br>    return buf.readUint8(0);<br>  },<br>  direct: function (buf) {<br>    return buf[0];<br>  },<br>  read_again: function (buf) {<br>    return buf.readUint8(0);<br>  },<br>};<br>// Prepare buffers<br>const buffers = new Array(1000).fill(0).map(() =&gt; {<br>  const buf = Buffer.allocUnsafe(1);<br>  buf[0] = Math.floor(Math.random() * 0xff);<br>  return buf;<br>});<br>// The benchmark suite<br>const bench = new IsoBench();<br>for (const [type, fn] of Object.entries(functions)) {<br>  bench.add(`${type}`, () =&gt; {<br>    for (let i = 0; i &lt; buffers.length; i++) {<br>      fn(buffers[i]);<br>    }<br>  });<br>}<br>bench.consoleLog().run();</pre><p>It will yield these results:</p><pre>read       - 1,709,528 op/s<br>direct     - 1,713,134 op/s<br>read_again - 1,701,391 
op/s</pre><p>Which makes <em>way</em> more sense.</p><h3>🧠 How It Works</h3><p>Here’s what happens under the hood:</p><ul><li>The <strong>main process</strong> sets up the suite but doesn’t execute the test functions directly.</li><li>For each test, <strong>a new Node.js process</strong> is spawned.</li><li>Each process runs the same entry point and is told which single test it should <strong>run in isolation</strong>.</li><li>The results are sent back to the main process for aggregation.</li></ul><p>This ensures that <strong>each test runs in a completely clean context</strong>, unaffected by <strong>V8</strong> optimizations or cache warmups from previous tests.</p><p>And as always, it’s lightweight and has zero dependencies. 🙂</p><p>Install it with:</p><pre>npm install iso-bench</pre><p>👉 Documentation: <a href="https://github.com/Llorx/iso-bench">https://github.com/Llorx/iso-bench</a></p><h3>🧭 TL;DR</h3><p>If you’re benchmarking Node.js code, don’t trust your numbers blindly. <strong>V8’s optimizations can change your results depending on test order.</strong></p><p>Use <a href="https://github.com/Llorx/iso-bench"><strong>iso-bench</strong></a> to run each benchmark in isolation and finally get results you can believe, without second-guessing whether leftover optimizations are skewing them.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How I Created My Own Testing Framework — An Opinionated Way of Testing]]></title>
            <link>https://medium.com/@Llorx/how-i-created-my-own-testing-framework-13d998ef5c73?source=rss-68506c31b4b0------2</link>
            <guid isPermaLink="false">https://medium.com/p/13d998ef5c73</guid>
            <category><![CDATA[snapshot-testing]]></category>
            <category><![CDATA[testing]]></category>
            <category><![CDATA[arrange-act-assert]]></category>
            <category><![CDATA[optimization]]></category>
            <category><![CDATA[javascript]]></category>
            <dc:creator><![CDATA[Llorx]]></dc:creator>
            <pubDate>Sat, 21 Jun 2025 09:53:22 GMT</pubDate>
            <atom:updated>2025-06-23T08:08:29.834Z</atom:updated>
            <content:encoded><![CDATA[<h3>How I Created My Own Testing Framework — An Opinionated Way of Testing</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*jX8qlmTuTjHA9R6hVFFeZQ.png" /></figure><h3>Motivation</h3><p>When I started doing tests — although they were big monolithic tests doing multiple actions and assertions under the same test — I loved testing. I could implement a full class without running any other part of the software, and when I <em>connected</em> the class to the remaining code, it worked on the first try. That feeling is awesome!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*GNg2ijLk79jWaHyyaBchMw.gif" /><figcaption>Me implementing tests</figcaption></figure><p>But when I had to modify or add tests, they were <em>a bit</em> painful. Something was off. Testing should make your life easier — not harder.</p><p>After reading a lot about testing, I stumbled upon this <a href="https://github.com/goldbergyoni/javascript-testing-best-practices?tab=readme-ov-file#section-0%EF%B8%8F%E2%83%A3-the-golden-rule">incredible guide about the Arrange-Act-Assert pattern</a>.</p><p>I loved the premise: <strong>make tests easy to process</strong>. Define explicit sections for arranging the test, doing an action, and asserting the result. So, I started adding // Arrange, // Act, and // Assert comments all around my tests.</p><p>I had mixed feelings doing this. Something was still off. I didn’t like using comments as section dividers. In my opinion, comments should explain code that is difficult to understand by itself. I read comments only when I don’t fully understand something, so my brain subconsciously ignores them until needed. 
It still wastes brain cycles to read the test and understand its sections.</p><p>With this in mind, <a href="https://github.com/Llorx/arrange-act-assert"><strong>arrange-act-assert</strong></a> was born.</p><h3>Optimize Brain Cycles</h3><p>Another thing I love is <strong>good practices enforced by software design</strong>, so I needed a library that forced me to create explicit <em>arrange</em>, <em>act</em>, and <em>assert</em> sections — rather than relying on optional comments.</p><p>Take this test, for example, written using the Node.js test runner:</p><pre>import test from &quot;node:test&quot;;<br>import Assert from &quot;node:assert&quot;;<br><br>import { MyFactory } from &quot;./MyFactory&quot;;<br>import { MyBase } from &quot;./MyBase&quot;;<br><br>test(&quot;should do that thing properly&quot;, () =&gt; {<br>    const baseOptions = {<br>        a: 1,<br>        b: 2,<br>        c: 3,<br>        d: 4<br>    };<br>    const base = new MyBase(baseOptions);<br>    test.after(() =&gt; base.close());<br>    base.setData(&quot;a&quot;, 2);<br>    const factory = new MyFactory();<br>    test.after(() =&gt; factory.dispose());<br>    const processor = factory.getProcessor();<br>    const data = processor.processBase(base);<br>    Assert.deepStrictEqual(data, {<br>        a: 2,<br>        b: 27<br>    });<br>});</pre><p>You’ll notice how you have to spend brain cycles to understand what’s going on. 
Whether it’s more or fewer, <em>some</em> are wasted.</p><p>To improve this, I added // Arrange, // Act, and // Assert comments:</p><pre>test(&quot;should do that thing properly&quot;, () =&gt; {<br>    // Arrange<br>    const baseOptions = { a: 1, b: 2, c: 3, d: 4 };<br>    const base = new MyBase(baseOptions);<br>    test.after(() =&gt; base.close());<br>    base.setData(&quot;a&quot;, 2);<br>    const factory = new MyFactory();<br>    test.after(() =&gt; factory.dispose());<br>    const processor = factory.getProcessor();<br><br>    // Act<br>    const data = processor.processBase(base);<br><br>    // Assert<br>    Assert.deepStrictEqual(data, { a: 2, b: 27 });<br>});</pre><p>This helps differentiate the sections and even discourages mistakes like mixing the Act and Assert steps:</p><pre>Assert.deepStrictEqual(processor.processBase(base), {...}); // ❌ Bad</pre><p>But I still had to look for the comments. Worse, I could still do weird things — like add multiple acts in a single test.</p><h3>Enforcing Better Structure</h3><p>So, I created a library where that same test looks like this:</p><pre>import test from &quot;arrange-act-assert&quot;;<br>import Assert from &quot;node:assert&quot;;<br><br>import { MyFactory } from &quot;./MyFactory&quot;;<br>import { MyBase } from &quot;./MyBase&quot;;<br><br>test(&quot;should do that thing properly&quot;, {<br>    ARRANGE(after) {<br>        const baseOptions = { a: 1, b: 2, c: 3, d: 4 };<br>        const base = after(new MyBase(baseOptions), base =&gt; base.close());<br>        base.setData(&quot;a&quot;, 2);<br>        const factory = after(new MyFactory(), factory =&gt; factory.dispose());<br>        const processor = factory.getProcessor();<br>        return { base, processor };<br>    },<br>    ACT({ base, processor }) {<br>        return processor.processBase(base);<br>    },<br>    ASSERT(data) {<br>        Assert.deepStrictEqual(data, { a: 2, b: 27 });<br>    }<br>});</pre><p>Yes, I hear you: <em>“Ugh, those 
uppercase section names!”</em> But <strong>that’s the whole point</strong>: they’re visible, they’re obvious, THEY’RE UPPERCASE — so your brain spends almost <strong>zero</strong> cycles identifying them.</p><p>I even considered adding a lowercase variant, but this is an <strong>opinionated</strong> testing library — and in my opinion, <strong>uppercase is way more noticeable</strong>.</p><p>This structure also helps you clearly distinguish the method you’re testing (processBase() in ACT) from the expected result ({ a: 2, b: 27 } in ASSERT).</p><p>Also, the library enforces one ACT per test. That’s a design feature to encourage cleaner, single-purpose tests.</p><h3>Smarter Resource Cleanup</h3><p>Consider this code using Node’s test runner:</p><pre>test(&quot;should do that thing properly&quot;, () =&gt; {<br>    const base = new MyBase();<br>    const factory = new MyFactory();<br>    test.after(() =&gt; {<br>        base.close();<br>        factory.dispose();<br>    });<br>    [...]<br>});</pre><p>If base.close() fails, then factory.dispose() will never be called. 
<strong>That’s a leak!</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/167/1*d6pYepazrhWfCkxJCnxTQw.png" /></figure><p>By design, arrange-act-assert nudges you toward a single cleanup callback for each resource:</p><pre>test(&quot;should do that thing properly&quot;, {<br>    ARRANGE(after) {<br>        const base = after(new MyBase(), base =&gt; base.close());<br>        const factory = after(new MyFactory(), factory =&gt; factory.dispose());<br>        [...]<br>    },<br>    [...]<br>});</pre><p>Each disposable is tightly coupled to its cleanup logic, reducing the chance of unintended leaks.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/153/1*MLNXQcDvGZ72OeWhJAAOcQ.png" /></figure><h3>Human-Error-Safe Snapshots</h3><p>Tests exist because humans make mistakes, so the tools should also be <em>designed</em> to reduce human errors.</p><p>With this in mind, I introduced a new approach to snapshot testing: one that <strong>forces</strong> validation and <strong>reduces</strong> accidental acceptance of wrong outputs.</p><p>Writing a snapshot test is pretty straightforward:</p><pre>test(&quot;should do that thing properly&quot;, {<br>    ARRANGE(after) {<br>        [...]<br>        return whatever;<br>    },<br>    SNAPSHOT(whatever) {<br>        return whatever.methodToSnapshot();<br>    }<br>});</pre><p>How is this safer?</p><p>The library creates an <strong>unconfirmed snapshot</strong>. When the test is run again without confirmation, it <strong>fails</strong> because the snapshot is still unconfirmed, so you must run the test with --confirm-snapshots to explicitly confirm it. On that run, the library asserts the result against the unconfirmed snapshot and creates a <strong>confirmed snapshot</strong>.</p><p>You can’t create and confirm a snapshot in the same run, and you can’t confirm a snapshot if the test returns a different result than the previous unconfirmed one. 
This <strong>two-pass design</strong> drastically reduces the chance of forgetting to validate what was snapshotted.</p><h3>Addendum</h3><p>That’s pretty much it! The library also supports <strong>code coverage</strong> and <strong>parallelism</strong>, and, as always, has <strong>zero dependencies</strong> 😉.</p><p>If you like the Arrange-Act-Assert pattern, I’d love for you to try it out.</p><p>👉 <a href="https://github.com/Llorx/arrange-act-assert"><strong>GitHub Repository</strong></a><br>📦 <strong>npm install arrange-act-assert</strong></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How I created this Crash-Safe Persistence System in Node.js — Avoid this common practice]]></title>
            <link>https://medium.com/@Llorx/this-one-flaw-could-destroy-your-data-an-analysis-on-resilient-data-persistence-node-js-cd33d8835346?source=rss-68506c31b4b0------2</link>
            <guid isPermaLink="false">https://medium.com/p/cd33d8835346</guid>
            <category><![CDATA[software-engineering]]></category>
            <category><![CDATA[javascript]]></category>
            <category><![CDATA[database]]></category>
            <category><![CDATA[nodejs]]></category>
            <category><![CDATA[software-development]]></category>
            <dc:creator><![CDATA[Llorx]]></dc:creator>
            <pubDate>Wed, 28 May 2025 23:31:20 GMT</pubDate>
            <atom:updated>2025-10-21T18:09:52.877Z</atom:updated>
            <content:encoded><![CDATA[<h3>How I created this Crash-Safe Persistence System in Node.js — Avoid this common practice (With animations 😊)</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UpIN3ws65hkJcCBitSPH9Q.jpeg" /></figure><h3>The problem</h3><p>As a developer, you may find yourself (if it hasn’t happened already) needing to store a small cache or some kind of local state that must persist across runs. The usual approach to storing data is to use a database, but sometimes that’s overkill, especially if you just need to store a little bit of data. In these cases, you might end up just <strong>saving the data in a local file</strong> and <strong>rewriting that file when the data changes</strong>. And <em>this</em> is <strong>the flaw</strong>.</p><h3>The flaw</h3><p>Imagine that you manage a parking lot. Someone arrives with an <strong><em>orange car</em></strong> and wants to park it, so you send him to <strong><em>slot 1</em></strong>; he parks and leaves. Later, the same person returns with a <strong><em>blue car</em></strong> and wants to switch vehicles. You tell him to unpark the <strong><em>orange car</em></strong> and then park the <strong><em>blue car</em></strong> in the same <strong><em>slot 1</em></strong>. In the best-case scenario this works, but if something interrupts the process while he is unparking the <strong><em>orange car</em></strong> (the worker on duty suddenly has to leave, for example), the new worker who takes over will find no cars in any slot and will have no idea what the situation was.</p><p>That’s exactly what can happen when you overwrite data in a file. If a crash or power outage occurs mid-write, you can end up with a partially written file, which is often unreadable. Best-case scenario? You detect the corruption and reset the file, <em>only</em> losing your data. Worst-case? 
You read a mix of old and new data, leading to a false sense of correctness.</p><h3>The solution</h3><p>Continuing the parking lot analogy, one common way to overcome this is to park the <strong><em>blue car</em></strong> in <strong><em>slot 2</em></strong> and, only once you are sure that the <strong><em>blue car</em></strong> is correctly parked, unpark the <strong><em>orange car</em></strong>. This is known as <a href="https://en.wikipedia.org/wiki/Copy-on-write"><strong>copy-on-write</strong></a>.</p><p>Actually, there’s more to it: we need integrity checks to avoid loading invalid data, versioning to tell which copy of an entry is the newest, space reclaiming to avoid files growing forever, and, last but not least, data relocation to shrink the files once space has been reclaimed. All these methods are applied by this Node.js <a href="https://github.com/Llorx/persistency"><strong>persistency library</strong></a> and will be explained in detail in this post.</p><h3>The details</h3><p>The main feature to avoid data corruption during a write is to apply <strong>copy-on-write</strong>. Instead of overwriting existing data, we always append a new version:</p><pre>import * as Fs from &quot;node:fs/promises&quot;;<br>const FILE = &quot;data.bin&quot;; // Example path, use your own<br>async function appendData(key:string, value:any) {<br>    const data = Buffer.from(JSON.stringify({ key, value }));<br>    const dataSize = Buffer.allocUnsafe(4);<br>    dataSize.writeUint32BE(data.length);<br>    await Fs.appendFile(FILE, dataSize); // Save the data size as 4 bytes<br>    await Fs.appendFile(FILE, data); // Save the data<br>}<br>await appendData(&quot;blue&quot;, { value: &quot;X&quot; }); // write &quot;blue&quot;<br>await appendData(&quot;red&quot;, { value: &quot;Y&quot; });  // write &quot;red&quot;<br>await appendData(&quot;blue&quot;, { value: &quot;Z&quot; }); // write a new &quot;blue&quot; instead of overwriting the old one</pre><p>But because the code will get more complex over time, I’m going to use visuals instead. 
This script can be visualized like so:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*hKIv3r2EckYu9JPJ6JbvWg.gif" /></figure><p>With this simple <em>trick</em>, if the last blue “value: Z” write fails, you still have the previous blue “value: X” entry untouched. I’m sure that you’ve noticed a big problem with this: the data file will grow forever. Worry not, as we are going to implement a space reclaiming system.</p><p>After we ensure that the new blue “value: Z” entry is fully written, we reclaim the old blue “value: X” space, so when new data is added, it overwrites the unneeded old blue “value: X” entry:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*9aPj56d-5m5H5Cv3YptTlg.gif" /></figure><p>But how do we ensure that the data is written to the storage device? It’s a simple question with a pretty complex answer, so we are going to summarize it with two features: <strong>fsync</strong> and <em>time</em>.</p><p>When you tell the OS to write data to the storage device, the OS will cache this data and send it to the storage device at <em>some point in the future</em>. To overcome this, we signal the OS to flush the data to the storage device with Linux’s <a href="https://man7.org/linux/man-pages/man2/fsync.2.html"><strong>fsync</strong></a> or Windows’ <a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-flushfilebuffers?redirectedfrom=MSDN"><strong>FlushFileBuffers</strong></a>, both wrapped by Node.js’s <a href="https://nodejs.org/api/fs.html#fsfsyncfd-callback"><strong>fs.fsync()</strong></a> API method.</p><p>But there’s a twist: storage devices usually have an internal cache. Even after flushing the data, they might delay writing data physically. That’s very device-specific, and different manufacturers offer different methods to avoid losing this cache, like using a capacitor to store enough power to flush the internal cache during a power outage. 
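</p><p>As a minimal sketch of the OS-level flush described above (not the persistency library’s actual code), the snippet below appends one length-prefixed record and then calls Node’s fs.fsyncSync(). The helper name appendDurably, the record layout, and the temporary file are assumptions made for illustration. Keep in mind that a successful fsync still says nothing about the device’s own cache:</p>
<pre>
```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: append one length-prefixed record and ask the OS to flush it.
function appendDurably(file: string, payload: Buffer): void {
    const size = Buffer.alloc(4);
    size.writeUInt32BE(payload.length, 0); // 4-byte big-endian length prefix
    const fd = fs.openSync(file, "a"); // open for appending, create if missing
    try {
        fs.writeSync(fd, Buffer.concat([size, payload]));
        fs.fsyncSync(fd); // flush OS caches; the device cache may still hold the data
    } finally {
        fs.closeSync(fd);
    }
}

// Example usage against a throwaway file in the OS temp directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "fsync-demo-"));
const demoFile = path.join(demoDir, "data.bin");
appendDurably(demoFile, Buffer.from(JSON.stringify({ key: "blue", value: "Z" })));
```
</pre>
<p>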
But still, to be safe, we assume the worst. The only way to ensure that it’s safe to reclaim old data is with <em>time</em>. When we want to reclaim old data, we can wait a specified amount of time to ensure that storage devices have finally persisted the new data. After some reading, I think that 15 minutes is, by far, way more than needed to assume the data is safely persisted, even on older or poorly designed devices:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*19qhteUN8Fq1bj75TMJFEw.gif" /></figure><p>But, what can we do with this empty space? Just fill it with new entries? But what if you deleted a large section and don’t expect to write that much? Worry not again, as we can implement a resilient compacting system by following the same principles as before.</p><p>When a space is reclaimed because the reclaiming delay has triggered or the data was directly deleted by the user, we can move entries from the end of the storage to this free space, and after that, we are going to <em>dealloc</em> (or truncate) this storage to reduce its maximum size.</p><p>But we don’t just move the data. To avoid data loss during compaction we need to follow the same principles defined above, so first we copy the last entry to the free location, and only after we are sure that the data is written (reclaim delay), we can free the old entry that we copied:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*TrCzizHg1nhFuR3Cd9W1qA.gif" /></figure><p>When deleting entries, keep in mind that you’re also removing all of their previous versions that haven’t yet passed the reclaim delay:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*t413EFCnLFZVycbpLwiYBg.gif" /></figure><p>But, what would happen if I’m overwriting a reclaimed space and a power outage happens? I will read mixed data when I open the file again! Yes, that’s why we added integrity checks. 
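</p><p>One way to picture such an integrity check is the sketch below. It assumes SHA-256 from node:crypto and a made-up record layout (hash first, payload after); seal and unseal are hypothetical names, and the persistency library may well use a different hash and layout. A record whose stored hash does not match its payload is rejected on load:</p>
<pre>
```typescript
import { createHash } from "node:crypto";

const HASH_SIZE = 32; // SHA-256 digest length in bytes

// Assumed record layout for this sketch: [32-byte SHA-256 of payload][payload]
function seal(payload: Buffer): Buffer {
    const hash = createHash("sha256").update(payload).digest();
    return Buffer.concat([hash, payload]);
}

// Returns the payload, or null when the stored hash does not match it.
function unseal(record: Buffer): Buffer | null {
    if (HASH_SIZE > record.length) return null; // too short to even hold a hash
    const stored = record.subarray(0, HASH_SIZE);
    const payload = record.subarray(HASH_SIZE);
    const actual = createHash("sha256").update(payload).digest();
    return stored.equals(actual) ? payload : null;
}

const sealed = seal(Buffer.from("value: Z"));

// Simulate a torn write by flipping the bits of the last payload byte in a copy.
const torn = Buffer.from(sealed);
const last = torn.length - 1;
torn.writeUInt8(torn.readUInt8(last) ^ 0xff, last);
```
</pre>
<p>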
Every time we store data, a hash is computed and stored alongside the data, so when we load the data we will know if the data is valid. If any invalid data is found, compaction is applied:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*iUNPU5wIEEBfD5_eoLATlA.gif" /></figure><p>When entries are moved or rewritten, how do we know which version is the newest? Each entry has a version number. When loading, we simply keep the entry with the highest version (and then we apply compaction, as always):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*kf3ra8pY3ssbIvSPiU0XrQ.gif" /></figure><p>After all this, we still have a small problem: each entry could have a different size, so when we save our data, we need to somehow save the data size so we know where this entry ends and the next entry begins. But what would happen if an entry is corrupted? We won’t know the entry size, so we won’t know where the next entry begins and we will lose all the entries after this corrupted one:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*sQkB5n5EgcSBgCZMlCgrJg.gif" /></figure><p>To overcome this situation, we have two files: one for the metadata, where each entry has a fixed size, and another for the data that just stores bytes:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*msE1FMLbslGuUnDVDD4h1w.gif" /></figure><p>If an entry in the metadata file is invalid, we just skip ahead by its fixed size to encounter the next entry, ensuring we don’t lose everything after it:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*JMcl6brnCGYlk-XU3ilDoA.gif" /></figure><p>Having two files, compaction is handled individually on each file. 
To compact the data file we move the data, create a new entry and reclaim the previous one:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*P9wP8S-3BTWY1EPJWKPFfQ.gif" /></figure><p>Sometimes we end up with a gap in the metadata file (not the data file) that we need to compact without moving any data. That case is easier, as we only need to copy the last metadata entry, which keeps pointing to the very same data location. When two metadata entries with the same version are found, the leftmost one takes precedence:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*XfPXf3umPd_lNeqKKWj0vQ.gif" /></figure><h3><strong>The conclusion</strong></h3><p>With all this, we’ve built a very resilient local persistence system with:</p><ul><li>💾 Data durability to avoid losing information in worst-case scenarios.</li><li>🛡️ Data integrity to avoid loading invalid entries.</li><li>♻️ Space reclaiming to avoid infinite growth.</li><li>📉 Automatic compaction to shrink the file sizes when needed.</li></ul><p>You can find all of these features implemented in Node.js, <strong>without dependencies</strong> 😊:</p><ul><li>👉 <a href="https://github.com/Llorx/persistency">https://github.com/Llorx/persistency</a></li></ul><pre>npm install persistency</pre><p>There are other approaches too, like <a href="https://en.wikipedia.org/wiki/Write-ahead_logging"><strong>write-ahead logging (WAL)</strong></a>. Maybe I’ll create a Node.js “persistency-wal” library and a detailed post on how it works 😉.</p>]]></content:encoded>
        </item>
    </channel>
</rss>