Stories by Yoshifumi Kawai on Medium

ToonEncoder — A JSON-Compatible Format Encoder for C# and LLMs

Yoshifumi Kawai — Thu, 25 Dec 2025 07:36:33 GMT


I’ve created a serializer (encode-only) for Token-Oriented Object Notation (TOON), a JSON-compatible format. When used appropriately, TOON has the potential to significantly reduce token consumption when interacting with LLMs. While this is a compact library, internally it processes everything in UTF-8, and it’s equipped with all the essential features of a modern library, including IBufferWriter support and Source Generator-based serializer generation.

GitHub — Cysharp/ToonEncoder

Of course, compared to competitors, the performance and memory efficiency are overwhelmingly better.

(Benchmark chart: the top 3 variations are all ToonEncoder.)

I’m simply too experienced in serializer design at this point (MessagePack-CSharp, MemoryPack, Utf8Json, etc…) with plenty of track record and know-how! Given the nature of this field, there are many libraries out there that seem to have been thrown together with AI to “just make it work,” but they’re no match for this. After all, this is warm, handcrafted code! Hyper-handmade craft coding. Honestly, I believe current AI-generated code is still far from top-tier quality. Sure, it can produce working code, and that’s impressive, but still.

Now, let me briefly explain TOON. The following JSON data:

{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {
      "id": 1,
      "name": "Blue Lake Trail",
      "distanceKm": 7.5,
      "elevationGain": 320,
      "companion": "ana",
      "wasSunny": true
    },
    {
      "id": 2,
      "name": "Ridge Overlook",
      "distanceKm": 9.2,
      "elevationGain": 540,
      "companion": "luis",
      "wasSunny": false
    },
    {
      "id": 3,
      "name": "Wildflower Loop",
      "distanceKm": 5.1,
      "elevationGain": 180,
      "companion": "sam",
      "wasSunny": true
    }
  ]
}

can be expressed in TOON as follows, resulting in a much smaller representation:

context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true

Rather than JSON, it’s more like a hybrid of YAML and CSV. In particular, arrays of objects containing only primitive elements — which can be represented as tables (CSV) — are output in a CSV-like format, significantly reducing data size. This reduction translates to token savings for LLMs, which has garnered some attention. You might ask, “Why not just use CSV instead of some obscure format?” Well, CSV alone only handles tables and can’t include accompanying metadata, making it impractical. TOON offers better usability in this regard. Additionally, since the specification maintains mutual compatibility with JSON, it can serve as a drop-in replacement for JSON — another selling point.
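To make the tabular layout concrete, here is a minimal sketch of how a uniform array of flat rows maps onto TOON’s header-plus-rows form. `TabularSketch.Encode` is a hypothetical helper written for illustration, not ToonEncoder’s API, and it assumes no value ever needs quoting (a real encoder must quote values that contain the delimiter or other special characters).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only (not ToonEncoder's API).
// Assumes no value needs quoting or escaping.
public static class TabularSketch
{
    public static string Encode(string key, string[] fields, IReadOnlyList<object[]> rows)
    {
        var sb = new StringBuilder();
        // Header line: key[rowCount]{field1,field2,...}:
        sb.Append(key).Append('[').Append(rows.Count).Append("]{")
          .Append(string.Join(",", fields)).Append("}:");
        foreach (var row in rows)
        {
            sb.Append("\n  "); // each row on its own line, indented two spaces
            for (int i = 0; i < row.Length; i++)
            {
                if (i > 0) sb.Append(',');
                sb.Append(Format(row[i]));
            }
        }
        return sb.ToString();
    }

    // true/false in lowercase; numbers formatted with the invariant culture.
    static string Format(object value) => value switch
    {
        bool b => b ? "true" : "false",
        IFormattable f => f.ToString(null, CultureInfo.InvariantCulture),
        _ => value.ToString() ?? "",
    };
}
```

Field names are written once in the header instead of once per object, which is where the size reduction over JSON comes from.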

My personal take is that TOON is not human-readable. Because TOON prioritizes efficiency, there are three different ways to represent arrays. In ToonEncoder, I call them TabularArray, InlineArray, and NonUniformArray, but honestly, having three types makes it hard to read. Moreover, when TabularArray and NonUniformArray are combined with nested objects, the indentation becomes utterly confusing. While LLMs seem to somehow correctly interpret human-readable formats even if they’re unfamiliar, I’m concerned whether they can properly understand such broken-looking structures.

Therefore, rather than replacing all JSON, I think the sweet spot — in terms of token efficiency, LLM comprehension, and human readability — is to apply TOON to CSV-like tables (TabularArray) or flat objects with TabularArray appended at the end. ToonEncoder is tuned to deliver optimal performance for exactly this kind of usage, and it integrates with Microsoft.Extensions.AI to enable selective TOON conversion for specific types only.
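Which of the three array layouts applies can be sketched as a small classifier over System.Text.Json’s JsonElement. The enum names follow the article’s terminology, but the rules below (all primitives, then same-keyed objects with only primitive values, then everything else) are a simplified assumption, not ToonEncoder’s actual decision logic.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Names follow the article; the classification rules are a simplified assumption.
public enum ArrayLayout { InlineArray, TabularArray, NonUniformArray }

public static class LayoutSketch
{
    static bool IsPrimitive(JsonElement e) =>
        e.ValueKind is not (JsonValueKind.Object or JsonValueKind.Array);

    public static ArrayLayout Classify(JsonElement array)
    {
        var items = array.EnumerateArray().ToArray();

        // Every element is a primitive, e.g. friends[3]: ana,luis,sam
        if (items.All(IsPrimitive)) return ArrayLayout.InlineArray;

        // Every element is an object with the same keys in the same order
        // and only primitive values: header row plus CSV-like rows.
        if (items.Length > 0 && items.All(e => e.ValueKind == JsonValueKind.Object))
        {
            var keys = items[0].EnumerateObject().Select(p => p.Name).ToArray();
            bool uniform = items.All(e =>
                e.EnumerateObject().Select(p => p.Name).SequenceEqual(keys) &&
                e.EnumerateObject().All(p => IsPrimitive(p.Value)));
            if (uniform) return ArrayLayout.TabularArray;
        }

        // Mixed or nested elements fall back to a list-style layout.
        return ArrayLayout.NonUniformArray;
    }
}
```

Under this reading, only the TabularArray case produces the CSV-like savings, which is why it is the recommended sweet spot.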

Using with Microsoft.Extensions.AI

Install the ToonEncoder package from NuGet to get the core library and the Source Generator bundled together. Note that the minimum target platform is .NET 10.

Basically, you call ToonEncoder.Encode with either a JsonElement or a value of type T.

using Cysharp.AI;

var users = new User[]
{
    new (1, "Alice", "admin"),
    new (2, "Bob", "user"),
};

// simply encode
string toon = ToonEncoder.Encode(users);

// [2]{Id,Name,Role}:
//   1,Alice,admin
//   2,Bob,user
Console.WriteLine(toon);

public record User(int Id, string Name, string Role);

In this case, since we have an array of objects containing only primitive elements, it’s serialized as a tabular layout (TabularArray).

For practical usage with Function Calling in Microsoft.Extensions.AI, prepare a JsonSerializerOptions configured with converters for the target types and pass it through AIFunctionFactoryOptions. The Source Generator creates efficient JsonConverters, and usage is simple: just apply [GenerateToonTabularArrayConverter] to your target type!

public IEnumerable<AIFunction> GetAIFunctions()
{
    var jsonSerializerOptions = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
        WriteIndented = false,
        DefaultIgnoreCondition = JsonIgnoreCondition.Never,
        Converters =
        {
            // setup generated converter
            new Cysharp.AI.Converters.CodeDiagnosticTabularArrayConverter(),
        }
    };
    jsonSerializerOptions.MakeReadOnly(true); // requires MakeReadOnly(true), or register the converter through TypeInfoResolver

    var factoryOptions = new AIFunctionFactoryOptions
    {
        SerializerOptions = jsonSerializerOptions
    };

    yield return AIFunctionFactory.Create(GetDiagnostics, factoryOptions);
}

[Description("Get error diagnostics of the target project.")]
public CodeDiagnostic[] GetDiagnostics(string projectName)
{
    // ...
}

// Trigger of Source Generator
[GenerateToonTabularArrayConverter]
public class CodeDiagnostic
{
    public string Code { get; set; }
    public string Description { get; set; }
    public string FilePath { get; set; }
    public int LocationStart { get; set; }
    public int LocationLength { get; set; }
}

In this example, when the number of CodeDiagnostic[] items is large, there's a significant difference in token consumption between JSON and TOON, giving TOON a clear advantage. However, since TOON has its strengths and weaknesses, I recommend evaluating the characteristics of your data to decide whether to apply TOON (add a Converter) or leave it as-is (JSON).

For flat objects (containing primitives, arrays of primitives, or arrays of objects composed only of primitives), a different attribute, [GenerateToonSimpleObjectConverter], handles the TabularArray-plus-metadata scenario.

var item = new Item
{
    Status = "active",
    Users = [new(1, "Alice", "Admin"), new(2, "Bob", "User")]
};

var toon = Cysharp.AI.Converters.ItemSimpleObjectConverter.Encode(item);

// Status: active
// Users[2]{Id,Name,Role}:
//   1,Alice,Admin
//   2,Bob,User
Console.WriteLine(toon);

[GenerateToonSimpleObjectConverter]
public record Item
{
    public required string Status { get; init; }
    public required User[] Users { get; init; }
}

Conclusion

I created ToonEncoder as a component for Cysharp/CompilerBrain, a C# Coding Agent that’s still very much a work in progress. Since it handles a lot of data, I wanted to save on tokens. So, early next year I’ll be focusing on CompilerBrain… probably!

To be completely honest, I don’t think TOON itself is a particularly good format. In fact, I’d say it’s quite rough around the edges. However, the marketing appeal of “JSON-compatible drop-in replacement” seems to have resonated, and since CSV alone is indeed limiting, having an actual specification to work with makes it a reasonable compromise choice.

My decision not to target complex data is reflected in [GenerateToonTabularArrayConverter] and [GenerateToonSimpleObjectConverter]. These also function as Analyzers: they produce compile errors when you try to include unsupported nested properties, effectively defining a pseudo-subset of TOON. Of course, if you encode through a JsonElement instead, nested properties are serialized properly. The library passes all cases in the official test suite (except for intentionally unsupported features).

Also, as the library name suggests, it only supports Encoding. There’s no Decode. Since this is meant for sending data to LLMs, decoding isn’t really necessary.

While there are various shortcuts that make this a compact library, it’s still quite practical, so if you’re interested, please give it a try!

2025 — Work, Life, OSS

Since it’s the end of the year, let me also reflect on this year’s situation, though it’s unrelated to the main topic.

My company Cysharp’s parent company Cygames released Shadowverse: Worlds Beyond this year. Cysharp contributed to the C#-related portions. Some details are explained in a Japanese presentation: Cygames Approach to Technical Design for the Latest Smartphone Games: The Challenge of Architectural Redesign in “Shadowverse: Worlds Beyond”.

The main card battle component and the lobby space where many users gather are built with Cysharp’s network framework MagicOnion, running on C# servers. As someone who strongly wishes to see C# thrive not just in the enterprise world, and not just within Unity, but in consumer-facing applications, I’m delighted that we could deliver this worldwide.

While work went very well, unfortunately I significantly damaged my health from mid-year onward. Because of this, I intentionally reduced the time I spend on OSS maintenance. Since it was difficult to find extended periods of concentration, I focused my time on only a few things like ConsoleAppFramework. Additionally, being frustrated by Microsoft (employees) was also part of the reason I stepped back a bit. As for my health, I’m on the path to recovery, so I plan to gradually get back into things next year.

As you can see from the many OSS projects provided at GitHub/Cysharp, the OSS and money issues that made waves this year are certainly not irrelevant to me. My personal opinion is that seeking money is the right thing to do. For a truly sustainable environment, money cannot be separated from the equation and must be addressed seriously. I’m very grateful to everyone sponsoring me at sponsors/neuecc. Thank you!

I hope to continue delivering results that surprise everyone in the C# world next year. Happy New Year!

“ZLinq”, a Zero-Allocation LINQ Library for .NET

Yoshifumi Kawai — Thu, 15 May 2025 09:44:43 GMT


I released ZLinq v1 last month! By building on structs and generics, it achieves zero allocations. It includes extensions like LINQ to Span, LINQ to SIMD, LINQ to Tree (FileSystem, JSON, GameObject, etc.), a drop-in replacement Source Generator for arbitrary types, and support for multiple platforms including .NET Standard 2.0, Unity, and Godot. It has now exceeded 2000 GitHub stars.

https://github.com/Cysharp/ZLinq

Struct-based LINQ itself isn’t particularly rare, and many implementations have attempted this approach over the years. However, none have been truly practical until now. They’ve typically suffered from extreme assembly size bloat, insufficient operator coverage, or performance issues due to inadequate optimization, never evolving beyond experimental status. With ZLinq, we aimed to create something practical by implementing 100% coverage of all methods and overloads in .NET 10 (including new ones like Shuffle, RightJoin, LeftJoin), ensuring 99% behavior compatibility, and implementing optimizations beyond just allocation reduction, including SIMD support, to outperform in most scenarios.

This was possible because of my extensive experience implementing LINQ. In April 2009, I released linq.js, a LINQ to Objects library for JavaScript (it’s wonderful to see that linq.js is still being maintained by someone who forked it!). I’ve also implemented the widely-used Reactive Extensions library UniRx for Unity, and recently released its evolution, R3. I’ve created variants like LINQ to GameObject, LINQ to BigQuery, and SimdLinq. By combining these experiences with knowledge from zero-allocation related libraries (ZString, ZLogger) and high-performance serializers (MessagePack-CSharp, MemoryPack), we achieved the ambitious goal of creating a superior alternative to the standard library.

This simple benchmark shows that while normal LINQ allocations increase as you chain more methods (Where, Where.Take, Where.Take.Select), ZLinq remains at zero.

Performance varies depending on the source, quantity, element type, and method chaining. To confirm that ZLinq performs better in most cases, we’ve prepared various benchmark scenarios that run on GitHub Actions: ZLinq/actions/Benchmark. While there are cases where ZLinq structurally can’t win, it outperforms in most practical scenarios.

For an extreme difference in benchmarks, consider calling Select repeatedly. Neither System.Linq nor ZLinq applies any special optimization in this case, but ZLinq shows a significant performance advantage:

(The 1 B memory figures are measurement errors from BenchmarkDotNet’s MemoryDiagnoser. Its documentation states an accuracy of 99.5%, so slight measurement errors can occur.)

In simple cases, operations that require intermediate buffers like Distinct or OrderBy show large differences because aggressive pooling significantly reduces allocations (ZLinq uses somewhat aggressive pooling since it’s primarily based on ref struct, which is expected to be short-lived):

LINQ applies special optimizations based on method call patterns, so reducing allocations alone isn’t enough to always outperform it. For operator chain optimizations, such as those introduced in .NET 9 and described in Performance Improvements in .NET 9, ZLinq implements all these optimizations to achieve even higher performance:

A great benefit of ZLinq is that these LINQ evolution optimizations become available to all .NET generations (including .NET Framework), not just the latest versions.

Usage is simple — just add an AsValueEnumerable() call. Since all operators are 100% covered, replacing existing code works without issues:

using ZLinq;

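var source = new[] { 1, 2, 3, 4, 5, 6 }; // example input: any array, List<T>, or IEnumerable<T> works
var seq = source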
    .AsValueEnumerable() // only add this line
    .Where(x => x % 2 == 0)
    .Select(x => x * 3);
foreach (var item in seq) { }

To ensure behavior compatibility, ZLinq ports System.Linq.Tests from dotnet/runtime and continuously runs them at ZLinq/System.Linq.Tests.

9000 test cases verify the behavior (skipped cases are due to ref struct limitations where identical test code can’t be run, etc.)

Additionally, ZLinq provides a Source Generator for Drop-In Replacement that can optionally eliminate even the need for AsValueEnumerable():

[assembly: ZLinq.ZLinqDropInAttribute("", ZLinq.DropInGenerateTypes.Everything)]

This mechanism allows you to freely control the scope of the Drop-In Replacement. ZLinq/System.Linq.Tests itself uses Drop-In Replacement to run existing test code with ZLinq without changing the tests.

ValueEnumerable Architecture and Optimization

For usage, please refer to the ReadMe. Here, I’ll delve deeper into optimization. The architectural distinction goes beyond simply implementing lazy sequence execution, containing many innovations compared to collection processing libraries in other languages.

The definition of ValueEnumerable, which forms the basis of chaining, looks like this:

public readonly ref struct ValueEnumerable<TEnumerator, T>(TEnumerator enumerator)
    where TEnumerator : struct, IValueEnumerator<T>, allows ref struct // allows ref struct only in .NET 9 or later
{
    public readonly TEnumerator Enumerator = enumerator;
}

public interface IValueEnumerator<T> : IDisposable
{
    bool TryGetNext(out T current); // as MoveNext + Current
    // Optimization helper
    bool TryGetNonEnumeratedCount(out int count);
    bool TryGetSpan(out ReadOnlySpan<T> span);
    bool TryCopyTo(scoped Span<T> destination, Index offset);
}

Based on this, operators like Where chain as follows:

public static ValueEnumerable<Where<TEnumerator, TSource>, TSource> Where<TEnumerator, TSource>(this ValueEnumerable<TEnumerator, TSource> source, Func<TSource, bool> predicate)
    where TEnumerator : struct, IValueEnumerator<TSource>, allows ref struct

We chose this approach rather than using IValueEnumerable because with a definition like (this TEnumerable source) where TEnumerable : struct, IValueEnumerable<TSource>, type inference for TSource would fail. This is due to a C# language limitation where type inference doesn't work from type parameter constraints (dotnet/csharplang#6930). Working around it would require defining instance methods for a vast number of combinations. LinqAF took that approach, resulting in 100,000+ methods and massive assembly sizes, which wasn't ideal.

Since the entire implementation lives in IValueEnumerator and every Enumerator is a struct, I realized that instead of calling GetEnumerator(), we could simply pass a copy of the Enumerator along, and each copy would process with its own independent state. This led to the final structure of wrapping IValueEnumerator with ValueEnumerable. This way, the types appear in the parameter’s type declaration rather than in constraints, which avoids the type inference issue.
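The following self-contained sketch reproduces the shape of this design with hypothetical Mini* names. It drops `allows ref struct`, IDisposable, and the optimization helpers so that it compiles without .NET 9, and it is not ZLinq’s code. The key point is that `Where` takes `MiniEnumerable<TEnumerator, T>` as its parameter type, so the compiler infers both TEnumerator and T from the receiver; a signature like `Where<TEnumerable, T>(this TEnumerable source) where TEnumerable : IValueEnumerable<T>` would leave T uninferable.

```csharp
using System;

// Trimmed stand-in for IValueEnumerator<T> (no IDisposable, no optimization helpers).
public interface IMiniEnumerator<T>
{
    bool TryGetNext(out T current);
}

// The wrapper carries the enumerator type in its own type arguments.
public readonly struct MiniEnumerable<TEnumerator, T>
    where TEnumerator : struct, IMiniEnumerator<T>
{
    public readonly TEnumerator Enumerator;
    public MiniEnumerable(TEnumerator enumerator) => Enumerator = enumerator;
}

public struct MiniArrayEnumerator<T> : IMiniEnumerator<T>
{
    readonly T[] array;
    int index;
    public MiniArrayEnumerator(T[] array) { this.array = array; index = 0; }

    public bool TryGetNext(out T current)
    {
        if (index < array.Length) { current = array[index++]; return true; }
        current = default!;
        return false;
    }
}

public struct MiniWhere<TEnumerator, T> : IMiniEnumerator<T>
    where TEnumerator : struct, IMiniEnumerator<T>
{
    TEnumerator source; // the previous stage is embedded by value
    readonly Func<T, bool> predicate;
    public MiniWhere(TEnumerator source, Func<T, bool> predicate) { this.source = source; this.predicate = predicate; }

    public bool TryGetNext(out T current)
    {
        while (source.TryGetNext(out var value))
        {
            if (predicate(value)) { current = value; return true; }
        }
        current = default!;
        return false;
    }
}

public static class MiniLinq
{
    public static MiniEnumerable<MiniArrayEnumerator<T>, T> AsMini<T>(this T[] array) =>
        new(new MiniArrayEnumerator<T>(array));

    // TEnumerator and T are both inferred from the receiver's type.
    public static MiniEnumerable<MiniWhere<TEnumerator, T>, T> Where<TEnumerator, T>(
        this MiniEnumerable<TEnumerator, T> source, Func<T, bool> predicate)
        where TEnumerator : struct, IMiniEnumerator<T>
        => new(new MiniWhere<TEnumerator, T>(source.Enumerator, predicate));

    public static int Count<TEnumerator, T>(this MiniEnumerable<TEnumerator, T> source)
        where TEnumerator : struct, IMiniEnumerator<T>
    {
        var e = source.Enumerator; // a copy: this enumeration has its own state
        var count = 0;
        while (e.TryGetNext(out _)) count++;
        return count;
    }
}
```

Because `Count` works on a copy of the enumerator, calling it twice on the same value enumerates from the start both times, which is the copy-pass behavior described above.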

TryGetNext

Let’s examine MoveNext, the core of iteration, in more detail:

// Traditional interface
public interface IEnumerator<T> : IDisposable
{
    bool MoveNext();
    T Current { get; }
}

// iterate example
while (e.MoveNext())
{
    var item = e.Current; // invoke get_Current()
}
// ZLinq interface
public interface IValueEnumerator<T> : IDisposable
{
    bool TryGetNext(out T current);
}
// iterate example
while (e.TryGetNext(out var item))
{
}

C#’s foreach expands to MoveNext() + Current, which presents two issues. First, each iteration requires two method calls: MoveNext and get_Current. Second, Current requires the enumerator to keep the current element in a field. Therefore, I combined them into bool TryGetNext(out T current), which reduces the work to one method call per iteration and improves performance.
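As a hedged illustration of the TryGetNext shape, here is a hypothetical adapter that exposes any IEnumerator<T> through it; the interface is a trimmed stand-in for the IValueEnumerator<T> shown earlier. Wrapping does not remove the MoveNext + Current pair underneath, so the one-call saving only materializes when a source implements TryGetNext natively, as ZLinq’s struct enumerators do.

```csharp
using System;
using System.Collections.Generic;

// Trimmed stand-in for the IValueEnumerator<T> interface shown earlier.
public interface IValueEnumerator<T> : IDisposable
{
    bool TryGetNext(out T current); // advance and return in a single call
}

// Hypothetical adapter: presents any IEnumerator<T> through TryGetNext.
public struct EnumeratorAdapter<T> : IValueEnumerator<T>
{
    readonly IEnumerator<T> inner;

    public EnumeratorAdapter(IEnumerator<T> inner) => this.inner = inner;

    public bool TryGetNext(out T current)
    {
        // The wrapped enumerator still pays MoveNext + get_Current here.
        if (inner.MoveNext())
        {
            current = inner.Current;
            return true;
        }
        current = default!;
        return false;
    }

    public void Dispose() => inner.Dispose();
}
```

The consuming loop becomes `while (e.TryGetNext(out var item)) { ... }`, with no separate Current access.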

A similar design, where advancing and returning the value happen in a single call, is used by Rust’s Iterator trait:

pub trait Iterator {
    type Item;
    // Required method
    fn next(&mut self) -> Option<Self::Item>;
}

To understand the variable holding issue, let’s look at the Select implementation:

public sealed class LinqSelect<TSource, TResult>(IEnumerator<TSource> source, Func<TSource, TResult> selector) : IEnumerator<TResult>
{
    // Three fields
    IEnumerator<TSource> source = source;
    Func<TSource, TResult> selector = selector;
    TResult current = default!;

    public TResult Current => current;

    public bool MoveNext()
    {
        if (source.MoveNext())
        {
            current = selector(source.Current);
            return true;
        }
        return false;
    }

    // (Reset, Dispose, and the non-generic Current are omitted for brevity)
}

public ref struct ZLinqSelect<TEnumerator, TSource, TResult>(TEnumerator source, Func<TSource, TResult> selector) : IValueEnumerator<TResult>
    where TEnumerator : struct, IValueEnumerator<TSource>, allows ref struct
{
    // Two fields
    TEnumerator source = source;
    Func<TSource, TResult> selector = selector;

    public bool TryGetNext(out TResult current)
    {
        if (source.TryGetNext(out var value))
        {
            current = selector(value);
            return true;
        }
        current = default!;
        return false;
    }

    // (Dispose and the optimization helpers are omitted for brevity)
}

IEnumerator requires a current field because it advances with MoveNext() and returns with Current. However, ZLinq advances and returns values simultaneously, eliminating the need to store the field. This makes a significant difference in ZLinq's struct-based architecture. Since ZLinq embraces a structure where each method chain encompasses the previous struct entirely (TEnumerator being a struct), struct size grows with each method chain. While performance remains acceptable within reasonable method chain lengths, smaller structs mean lower copy costs and better performance. The adoption of TryGetNext was essential to minimize struct size.
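The cost of a cached current field compounds with nesting, which a quick measurement with `Unsafe.SizeOf<T>()` can show. The structs below are hypothetical stand-ins that keep only the fields (no methods); they are not ZLinq types. On a typical runtime each SelectWithCurrent layer is larger than the matching SelectNoCurrent layer, so a three-operator chain ends up noticeably bigger.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-ins (not ZLinq types): fields only, to compare layouts.
public struct ArraySource
{
    public int[] Array;
    public int Index;
}

// Select layer that advances and returns in one call: no cached element.
public struct SelectNoCurrent<TSource>
{
    public TSource Source;
    public Func<int, int> Selector;
}

// Select layer in the MoveNext/Current style: must cache the element.
public struct SelectWithCurrent<TSource>
{
    public TSource Source;
    public Func<int, int> Selector;
    public int Current;
}

public static class SizeSketch
{
    // Three chained Select layers on top of the same source.
    public static int ThreeLayersNoCurrent() =>
        Unsafe.SizeOf<SelectNoCurrent<SelectNoCurrent<SelectNoCurrent<ArraySource>>>>();

    public static int ThreeLayersWithCurrent() =>
        Unsafe.SizeOf<SelectWithCurrent<SelectWithCurrent<SelectWithCurrent<ArraySource>>>>();
}
```

Every extra field is paid once per layer, and alignment padding can make a 4-byte field cost more than 4 bytes.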

A drawback of TryGetNext is that it cannot support covariance and contravariance. However, I believe iterators and arrays should abandon covariance/contravariance support altogether. They’re incompatible with Span, making them outdated concepts when weighing pros and cons. For example, converting a covariant array to a Span can fail at runtime with no compile-time detection:

// Due to array covariance, a Derived[] can be assigned to a Base[] variable
Base[] array = new Derived[] { new Derived(), new Derived() };

// In this case, casting to Span<Base> or calling AsSpan() throws at runtime!
// System.ArrayTypeMismatchException: Attempted to access an element as a type incompatible with the array.
Span<Base> foo = array;
class Base;
class Derived : Base;

While this behavior exists because these features were added before Span, it's problematic in modern .NET where Span is widely used, making features that can cause runtime errors practically unusable.
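The runtime check can be observed directly. In this small sketch (the class and method names are hypothetical), wrapping a Derived[] held as Base[] in a Span<Base> throws, while a genuine Base[] does not. ReadOnlySpan<T> performs no such check, since reading through a covariant array is always type-safe.

```csharp
using System;

public class Base { }
public class Derived : Base { }

public static class CovarianceSketch
{
    // Returns true if wrapping the array in a writable Span<Base> throws.
    public static bool SpanThrows(Base[] array)
    {
        try
        {
            Span<Base> span = array; // runtime array-type check happens here
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    // Read-only access skips the check, so this never throws.
    public static int ReadOnlyLength(Base[] array)
    {
        ReadOnlySpan<Base> span = array;
        return span.Length;
    }
}
```

The compiler accepts both calls equally, so the failure only surfaces when the covariant array actually reaches the Span conversion.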

TryGetNonEnumeratedCount / TryGetSpan / TryCopyTo

Naively enumerating everything doesn’t maximize performance. For example, when calling ToArray, if the size doesn’t change (e.g., array.Select().ToArray()), we can create a fixed-length array with new T[count]. System.Linq internally uses an Iterator<TSource> type for such optimizations, but since the parameter is IEnumerable<TSource>, code like if (source is Iterator<TSource> iterator) is always needed.

Since ZLinq is designed specifically for LINQ from the start, we’ve prepared for these optimizations. To avoid assembly size bloat, we’ve carefully selected the minimal set of definitions that provide maximum effect, resulting in these three methods.

TryGetNonEnumeratedCount(out int count) succeeds when the original source has a finite count and no filtering methods (Where, Distinct, etc., though Take and Skip are calculable) intervene. This benefits ToArray and methods requiring intermediate buffers like OrderBy and Shuffle.
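The same idea is visible in the standard library: `Enumerable.TryGetNonEnumeratedCount` (available since .NET 6) reports a count only when it can do so without enumerating. Here is a minimal sketch of a count-aware ToArray built on it; `ToArrayPresized` is a hypothetical name, and this is plain IEnumerable<T> code, not ZLinq’s implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToArraySketch
{
    // Hypothetical helper illustrating the count-first strategy.
    public static T[] ToArrayPresized<T>(IEnumerable<T> source)
    {
        if (source.TryGetNonEnumeratedCount(out int count))
        {
            // Count known up front: allocate exactly once, no resizing or final copy.
            var array = new T[count];
            int i = 0;
            foreach (var item in source) array[i++] = item;
            return array;
        }
        // Count unknown (e.g. after Where): fall back to a growable buffer.
        return new List<T>(source).ToArray();
    }
}
```

A filtering step such as Where makes the count unknowable in advance, which is why the fallback path is still needed.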

TryGetSpan(out ReadOnlySpan<T> span) potentially delivers dramatic performance improvements when the source can be accessed as contiguous memory, enabling SIMD operations or Span-based loops for faster aggregation.

TryCopyTo(scoped Span<T> destination, Index offset) enhances performance through internal iterators. To explain external vs. internal iterators, consider that List<T> offers both foreach and ForEach:

// external iterator
foreach (var item in list) { Do(item); }

// internal iterator
list.ForEach(Do);

They look similar but perform differently. Breaking down the implementations:

// external iterator
List<T>.Enumerator e = list.GetEnumerator();
while (e.MoveNext())
{
    var item = e.Current;
    Do(item);
}

// internal iterator
for (int i = 0; i < _size; i++)
{
    action(_items[i]);
}

This becomes a competition between delegate call overhead (+ delegate creation allocation) vs. iterator MoveNext + Current calls. The iteration speed itself is faster with internal iterators. In some cases, delegate calls may be lighter, making internal iterators potentially advantageous in benchmarks.

Of course, this varies case by case, and since capturing lambdas carry their own costs and normal control flow (continue, break, await, etc.) isn’t available inside a delegate, I personally believe ForEach shouldn’t be used, nor should custom extension methods be defined to mimic it. However, this structural difference exists.

TryCopyTo(scoped Span<T> destination, Index offset) achieves limited internal iteration by accepting a Span rather than a delegate.

Using Select as an example, for ToArray when Count is available, it passes a Span for internal iteration:

public ref struct Select<TEnumerator, TSource, TResult>
{
    public bool TryCopyTo(scoped Span<TResult> destination, Index offset)
    {
        if (source.TryGetSpan(out var span))
        {
            if (EnumeratorHelper.TryGetSlice(span, offset, destination.Length, out var slice))
            {
                // loop inlining
                for (var i = 0; i < slice.Length; i++)
                {
                    destination[i] = selector(slice[i]);
                }
                return true;
            }
        }
        return false;
    }
}

// ------------------

// ToArray
if (enumerator.TryGetNonEnumeratedCount(out var count))
{
    var array = GC.AllocateUninitializedArray<TSource>(count);
    // try internal iterator
    if (enumerator.TryCopyTo(array.AsSpan(), 0))
    {
        return array;
    }
    // otherwise, use external iterator
    var i = 0;
    while (enumerator.TryGetNext(out var item))
    {
        array[i] = item;
        i++;
    }
    return array;
}

Thus, while Select can’t create a Span, if the original source can, processing as an internal iterator accelerates loop processing.

TryCopyTo differs from a regular CopyTo by taking an Index offset and allowing destination to be smaller than the source (a normal .NET CopyTo fails if destination is smaller). With a destination of length 1, it can express ElementAt: index 0 becomes First, and ^1 becomes Last. Adding First, Last, and ElementAt directly to IValueEnumerator would create redundancy in every enumerator definition (affecting assembly size), but combining small destinations with Index lets one method cover more optimization cases:

public static TSource ElementAt<TEnumerator, TSource>(this ValueEnumerable<TEnumerator, TSource> source, Index index)
    where TEnumerator : struct, IValueEnumerator<TSource>, allows ref struct
{
    using var enumerator = source.Enumerator;
    var value = default(TSource)!;
    var span = new Span<TSource>(ref value); // create single span
    if (enumerator.TryCopyTo(span, index))
    {
        return value;
    }
    // else...
}
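For a span-backed source, the Index-offset contract can be sketched in a few lines. This is our reading of the behavior (succeed only when the requested window fits inside the source), written as hypothetical static helpers rather than ZLinq’s enumerator methods; it needs .NET 7 or later for the `Span<T>(ref T)` constructor.

```csharp
using System;

// Hypothetical span-only sketch of TryCopyTo(destination, offset); not ZLinq's code.
public static class CopyToSketch
{
    public static bool TryCopyTo<T>(ReadOnlySpan<T> source, Span<T> destination, Index offset)
    {
        int start = offset.GetOffset(source.Length); // resolves ^n against the length
        // Fail unless [start, start + destination.Length) lies inside source.
        if (start < 0 || start > source.Length || destination.Length > source.Length - start)
            return false;
        source.Slice(start, destination.Length).CopyTo(destination);
        return true;
    }

    // ElementAt via a one-element destination: 0 acts as First, ^1 as Last.
    public static bool TryElementAt<T>(ReadOnlySpan<T> source, Index index, out T value)
    {
        T local = default!;
        bool found = TryCopyTo(source, new Span<T>(ref local), index);
        value = local;
        return found;
    }
}
```

The destination length doubles as "how many elements do I want", so the same method serves ToArray (full length), Take-like windows, and single-element lookups.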

LINQ to Span

In .NET 9 and above, ZLinq allows chaining all LINQ operators on Span<T> and ReadOnlySpan<T>:

using ZLinq;

// Can also be applied to Span (only in .NET 9/C# 13 environments that support allows ref struct)
Span<int> span = stackalloc int[5] { 1, 2, 3, 4, 5 };
var seq1 = span.AsValueEnumerable().Select(x => x * x);
// If Drop-In Replacement is enabled, you can call LINQ operators directly.
var seq2 = span.Select(x => x);

While some libraries claim to support LINQ for Spans, they typically only define extension methods for Span without a generic mechanism. They offer limited operators due to language constraints that previously prevented receiving Span as a generic parameter. Generic processing became possible with the introduction of allows ref struct in .NET 9.

In ZLinq, there’s no distinction between IEnumerable<T> and Span<T>: they’re treated equally.

However, since allows ref struct requires language and runtime support, Span support is limited to .NET 9 and above, even though ZLinq itself supports everything from .NET Standard 2.0 up. This also means that on .NET 9+, all operators are ref structs, which differs from earlier versions.

LINQ to SIMD

System.Linq accelerates certain aggregation methods with SIMD. For example, calling Sum or Max directly on arrays of primitive types is faster than a hand-written for loop. However, because it’s based on IEnumerable<T>, the set of sources that benefit is limited. ZLinq generalizes this through IValueEnumerator.TryGetSpan, targeting any collection from which a Span can be obtained (including Span itself).

Supported methods include:

Range to ToArray/ToList/CopyTo/etc…
Repeat (for unmanaged structs whose size is a power of 2) to ToArray/ToList/CopyTo/etc...
Sum for sbyte, short, int, long, byte, ushort, uint, ulong, double
SumUnchecked for sbyte, short, int, long, byte, ushort, uint, ulong, double
Average for sbyte, short, int, long, byte, ushort, uint, ulong, double
Max for byte, sbyte, short, ushort, int, uint, long, ulong, nint, nuint, Int128, UInt128
Min for byte, sbyte, short, ushort, int, uint, long, ulong, nint, nuint, Int128, UInt128
Contains for byte, sbyte, short, ushort, int, uint, long, ulong, bool, char, nint, nuint
SequenceEqual for byte, sbyte, short, ushort, int, uint, long, ulong, bool, char, nint, nuint

Sum checks for overflow, which adds overhead. We've added a custom SumUnchecked method that's faster:

Since these methods apply implicitly only when conditions match, you need to understand the internal pipeline to reliably hit the SIMD path. Therefore, for T[], Span<T>, or ReadOnlySpan<T>, we provide the .AsVectorizable() method to explicitly call SIMD-applicable operations like Sum, SumUnchecked, Average, Max, Min, Contains, and SequenceEqual (these fall back to normal processing when Vector.IsHardwareAccelerated && Vector<T>.IsSupported is false).
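Part of why an unchecked sum vectorizes so cleanly is that the lanes are simply added with wraparound and never inspected for overflow. The following is a minimal sketch of that loop shape using System.Numerics.Vector<T>; `SumSketch.SumUnchecked` is a hypothetical name, not ZLinq’s implementation, and `Vector.Sum` requires .NET 6 or later.

```csharp
using System;
using System.Numerics;

public static class SumSketch
{
    // Hypothetical SumUnchecked-style loop: lanes wrap on overflow, no checks.
    public static int SumUnchecked(ReadOnlySpan<int> source)
    {
        int i = 0;
        int total = 0;
        if (Vector.IsHardwareAccelerated && source.Length >= Vector<int>.Count)
        {
            var acc = Vector<int>.Zero;
            for (; i <= source.Length - Vector<int>.Count; i += Vector<int>.Count)
            {
                acc += new Vector<int>(source.Slice(i)); // loads Vector<int>.Count elements
            }
            total = Vector.Sum(acc); // horizontal add of all lanes
        }
        for (; i < source.Length; i++)
        {
            total = unchecked(total + source[i]); // scalar remainder
        }
        return total;
    }
}
```

A checked Sum has to detect overflow somewhere in this loop, which is the overhead SumUnchecked avoids.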

int[] and Span<int> gain the VectorizedFillRange method, which performs the same operation as ValueEnumerable.Range().CopyTo(), filling with sequential numbers using SIMD acceleration. This is much faster than filling with a for loop when needed.
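A fill-range loop vectorizes naturally: seed one vector with consecutive values, then store it and add the lane count on every step. The helper below is a hypothetical sketch of that technique, not the library’s VectorizedFillRange source.

```csharp
using System;
using System.Numerics;

public static class FillSketch
{
    // Hypothetical VectorizedFillRange-style loop: writes start, start+1, ...
    public static void FillRange(Span<int> destination, int start)
    {
        int i = 0;
        if (Vector.IsHardwareAccelerated && destination.Length >= Vector<int>.Count)
        {
            int width = Vector<int>.Count;
            Span<int> seed = stackalloc int[width];
            for (int k = 0; k < width; k++) seed[k] = start + k;
            var current = new Vector<int>(seed); // [start, start+1, ..., start+width-1]
            var step = new Vector<int>(width);   // every lane advances by width
            for (; i <= destination.Length - width; i += width)
            {
                current.CopyTo(destination.Slice(i));
                current += step;
            }
        }
        for (; i < destination.Length; i++) destination[i] = start + i; // scalar remainder
    }
}
```

Each iteration is one vector store and one vector add, instead of `width` separate scalar writes.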

Vectorizable Methods

Handwriting SIMD loops requires practice and effort, so we’ve provided helpers that take Func arguments for casual use. While these incur delegate overhead and perform worse than inline code, they’re convenient for casual SIMD processing. They accept a Func<Vector<T>, Vector<T>> vectorFunc and a Func<T, T> func, processing with Vector<T> where possible and handling the remainder with func.

T[] and Span<T> offer the VectorizedUpdate method:

using ZLinq.Simd; // needs using

int[] source = Enumerable.Range(0, 10000).ToArray();
[Benchmark]
public void For()
{
    for (int i = 0; i < source.Length; i++)
    {
        source[i] = source[i] * 10;
    }
}
[Benchmark]
public void VectorizedUpdate()
{
    // arg1: Vector<int> => Vector<int>
    // arg2: int => int
    source.VectorizedUpdate(static x => x * 10, static x => x * 10);
}

While faster than for loops, performance varies by machine environment and size, so verification is recommended for each use case.
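The vector-then-remainder split that these helpers use can be sketched as follows. `UpdateSketch.Update` is a hypothetical stand-in with the same two-delegate shape described above (one Func over Vector<T>, one over T); it is not ZLinq’s implementation.

```csharp
using System;
using System.Numerics;

public static class UpdateSketch
{
    // Hypothetical helper: vectorFunc handles full Vector<T>-sized chunks,
    // func handles the scalar tail (and everything when SIMD is unavailable).
    public static void Update<T>(Span<T> span, Func<Vector<T>, Vector<T>> vectorFunc, Func<T, T> func)
        where T : struct
    {
        int i = 0;
        if (Vector.IsHardwareAccelerated)
        {
            int width = Vector<T>.Count;
            for (; i <= span.Length - width; i += width)
            {
                var chunk = span.Slice(i, width);
                vectorFunc(new Vector<T>(chunk)).CopyTo(chunk); // load, transform, store back
            }
        }
        for (; i < span.Length; i++)
        {
            span[i] = func(span[i]);
        }
    }
}
```

Both delegates must compute the same thing; the split only decides which one runs for each element.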

AsVectorizable() provides Aggregate, All, Any, Count, Select, and Zip:

source.AsVectorizable().Aggregate((x, y) => Vector.Min(x, y), (x, y) => Math.Min(x, y));
source.AsVectorizable().All(x => Vector.GreaterThanAll(x, new(5000)), x => x > 5000);
source.AsVectorizable().Any(x => Vector.LessThanAll(x, new(5000)), x => x < 5000);
source.AsVectorizable().Count(x => Vector.GreaterThan(x, new(5000)), x => x > 5000);

Performance depends on data, but Count can show significant differences:

For Select and Zip, you follow with either ToArray or CopyTo:

// Select
source.AsVectorizable().Select(x => x * 3, x => x * 3).ToArray();
source.AsVectorizable().Select(x => x * 3, x => x * 3).CopyTo(destination);

// Zip2
array1.AsVectorizable().Zip(array2, (x, y) => x + y, (x, y) => x + y).CopyTo(destination);
array1.AsVectorizable().Zip(array2, (x, y) => x + y, (x, y) => x + y).ToArray();
// Zip3
array1.AsVectorizable().Zip(array2, array3, (x, y, z) => x + y + z, (x, y, z) => x + y + z).CopyTo(destination);
array1.AsVectorizable().Zip(array2, array3, (x, y, z) => x + y + z, (x, y, z) => x + y + z).ToArray();

Zip can be particularly interesting and fast for certain use cases (like merging two Vec3):

LINQ to Tree

Have you used LINQ to XML? In 2008 when LINQ appeared, XML was still dominant, and LINQ to XML’s usability was shocking. Now that JSON has taken over, LINQ to XML is rarely used.

However, LINQ to XML’s value lies in being a reference design for LINQ-style operations on tree structures — a guideline for making tree structures LINQ-compatible. Tree traversal abstractions work excellently with LINQ to Objects. A prime example is working with Roslyn’s SyntaxTree, where methods like Descendants are commonly used in Analyzers and Source Generators.

ZLinq extends this concept by defining an interface that generically enables Ancestors, Children, Descendants, BeforeSelf, and AfterSelf for tree structures:

This diagram shows traversal of Unity’s GameObject, but we’ve included standard implementations for FileSystem (DirectoryTree) and JSON (enabling LINQ to XML-style operations on System.Text.Json’s JsonNode). Of course, you can implement the interface for custom types:

public interface ITraverser<TTraverser, T> : IDisposable
    where TTraverser : struct, ITraverser<TTraverser, T> // self
{
    T Origin { get; }
    TTraverser ConvertToTraverser(T next); // for Descendants
    bool TryGetHasChild(out bool hasChild); // optional: optimize use for Descendants
    bool TryGetChildCount(out int count);   // optional: optimize use for Children
    bool TryGetParent(out T parent); // for Ancestors
    bool TryGetNextChild(out T child); // for Children | Descendants
    bool TryGetNextSibling(out T next); // for AfterSelf
    bool TryGetPreviousSibling(out T previous); // for BeforeSelf
}

For JSON, you can write:

var json = JsonNode.Parse("""
// snip...
""");

// JsonNode
var origin = json!["nesting"]!["level1"]!["level2"]!;
// JsonNode axis, Children, Descendants, Ancestors, BeforeSelf, AfterSelf and ***Self.
foreach (var item in origin.Descendants().Select(x => x.Node).OfType<JsonArray>())
{
    // [true, false, true], ["fast", "accurate", "balanced"], [1, 1, 2, 3, 5, 8, 13]
    Console.WriteLine(item.ToJsonString(JsonSerializerOptions.Web));
}
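For comparison, this is roughly what a similar query looks like when written by hand with only System.Text.Json: a recursive, depth-first Descendants. Descendants here is a hypothetical local helper, not ZLinq's API, and unlike ZLinq it allocates an iterator for every node it recurses into.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical helper: depth-first Descendants over JsonNode with plain iterators.
// Each recursion level allocates its own enumerator, unlike ZLinq's struct traversers.
static IEnumerable<JsonNode> Descendants(JsonNode node)
{
    IEnumerable<JsonNode?> children = node switch
    {
        JsonObject obj => obj.Select(kv => kv.Value), // property values
        JsonArray arr => arr,                          // array elements
        _ => Enumerable.Empty<JsonNode?>(),            // JsonValue has no children
    };

    foreach (var child in children)
    {
        if (child is null) continue; // JSON null literals are represented as null
        yield return child;
        foreach (var descendant in Descendants(child)) yield return descendant;
    }
}

var doc = JsonNode.Parse("{\"a\":{\"b\":[1,2,{\"c\":[true]}]},\"d\":\"x\"}")!;
var arrays = Descendants(doc).OfType<JsonArray>().Select(a => a.ToJsonString()).ToList();
arrays.ForEach(Console.WriteLine);
```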

We’ve included standard LINQ to Tree implementations for Unity’s GameObject and Transform and Godot's Node. Since allocation and traversal performance are carefully optimized, they might even be faster than manual loops.

OSS and Me

There have been several incidents in .NET-related OSS in recent months, including the commercialization of well-known OSS projects. With over 40 OSS projects under github/Cysharp and more under my personal and other organizations like MessagePack, totaling over 50,000 stars, I believe I’m one of the largest OSS providers in the .NET ecosystem.

Regarding commercialization, I have no plans for it, but maintenance has become challenging due to growing scale. A major factor in OSS projects attempting commercialization despite criticism is the mental burden on maintainers (compensation doesn’t match time investment). I experience this too!

Setting aside financial aspects, my request is for users to accept occasional maintenance delays! When developing large libraries like ZLinq, I need focused time, which means Issues and PRs for other libraries might go without response for months. I intentionally avoid looking at them, not even reading titles (avoiding dashboards and notification emails). This seemingly neglectful approach is necessary to create innovative libraries — a necessary sacrifice!

Even without that, the sheer number of libraries means rotation delays of months are inevitable. This is unavoidable due to absolute manpower shortage, so please accept these delays and don’t claim “this library is dead” just because responses are slow. That’s painful to hear! I try my best, but creating new libraries consumes tremendous time, causing cascading delays that drain my mental energy.

Also, irritations related to Microsoft can reduce motivation — a common experience for C# OSS maintainers. Despite this, I hope to continue long-term.

Conclusion

ZLinq’s structure changed significantly after feedback from the initial preview release. @Akeit0 provided many proposals for core performance-critical elements like the ValueEnumerable definition and adding Index to TryCopyTo. @filzrev contributed extensive test and benchmark infrastructure. Ensuring compatibility and performance improvements wouldn't have been possible without their contributions, for which I'm deeply grateful.

While zero-allocation LINQ libraries aren’t novel, ZLinq’s thoroughness sets it apart. With experience and knowledge, driven by sheer determination, we implemented all methods, ran all test cases for complete compatibility, and implemented all optimizations including SIMD. This was truly challenging!

The timing was perfect as .NET 9/C# 13 provided all the language features needed for a full implementation. Simultaneously, maintaining support for Unity and .NET Standard 2.0 was also important.

Beyond being just a zero-allocation LINQ, LINQ to Tree is a favorite feature that I hope people will try!

One LINQ performance bottleneck is delegates, and some libraries adopt a ValueDelegate approach, using structs to mimic Func. We deliberately avoided this because such definitions are impractical due to their complexity: it's better to write inline code than to use LINQ with ValueDelegate structures. Complicating the internal structure and bloating assembly size for benchmark hacks is wasteful, so we accept only System.Linq-compatible signatures.

R3 was an ambitious library intended to replace .NET’s standard System.Reactive, but replacing System.Linq would be a much larger or perhaps excessive undertaking, so I think there might be some resistance to adoption. However, I believe we’ve demonstrated sufficient benefits to justify the replacement, so I’d be very happy if you could try it out!

MasterMemory v3 — A Fast Read-Only In-Memory Database for C# with Source Generator Support

Yoshifumi Kawai — Fri, 20 Dec 2024 09:57:57 GMT

I’ve released MasterMemory v3! It finally supports Source Generators!

MasterMemory is a C# in-memory database that is fast, memory-efficient, and type-safe. It’s 4700 times faster than using SQLite directly!

Originally, MasterMemory had an advanced design philosophy of generating C# code from C# code, doing Source Generator-like tasks in an era before Source Generators existed. When porting it now, I was impressed by how smoothly it could be ported and how the legacy code worked without any modifications. The times have finally caught up…

As such, database construction code and query portions are automatically generated by Source Generator from C# definitions like this:

[MemoryTable("person"), MessagePackObject(true)]
public record Person
{
    [PrimaryKey]
    public required int PersonId { get; init; }
    
    [SecondaryKey(0), NonUnique]
    [SecondaryKey(1, keyOrder: 1), NonUnique]
    public required int Age { get; init; }

    [SecondaryKey(2), NonUnique]
    [SecondaryKey(1, keyOrder: 0), NonUnique]
    public required Gender Gender { get; init; }

    public required string Name { get; init; }
}

Since it’s generated as C# code, not only do all queries have input completion and type-safe return values, but it also contributes to better performance.
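Read-only data makes it possible to serve lookups from arrays that are sorted once at build time and then searched with binary search. That is my assumption about the general approach, not a description of MasterMemory's generated code. The sketch below shows the idea for a non-unique secondary key like Age, using a tuple array in place of the generated Person table; FindByAge and LowerBound are hypothetical names.

```csharp
using System;

// Hypothetical rows standing in for the generated Person table: (PersonId, Age, Name).
var rows = new (int PersonId, int Age, string Name)[]
{
    (1, 30, "Alice"), (2, 25, "Bob"), (3, 30, "Carol"), (4, 41, "Dave"),
};

// Built once: sort by the secondary key (Age), ties broken by the primary key.
Array.Sort(rows, (a, b) => a.Age != b.Age ? a.Age.CompareTo(b.Age) : a.PersonId.CompareTo(b.PersonId));

// Binary search for the first index whose Age >= age.
static int LowerBound((int PersonId, int Age, string Name)[] sorted, int age)
{
    int lo = 0, hi = sorted.Length;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (sorted[mid].Age < age) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

// Non-unique lookup: every row with the given Age, returned as a slice (no copy).
static ReadOnlySpan<(int PersonId, int Age, string Name)> FindByAge(
    (int PersonId, int Age, string Name)[] sorted, int age)
{
    int start = LowerBound(sorted, age);
    int end = LowerBound(sorted, age + 1); // assumes age < int.MaxValue
    return sorted.AsSpan(start, end - start);
}

var thirty = FindByAge(rows, 30);
foreach (var p in thirty) Console.WriteLine($"{p.PersonId}: {p.Name}");
```

Because the rows are never mutated after construction, the matches for a non-unique key can be returned as a slice of the sorted array without copying.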

Since it’s used as a read-only database, immutable class definitions are preferable, and with recent C# features like record, init, and required, it's become even more convenient to use as a Readonly Database. While required isn't available in Unity, record and init are, so there's no problem using it with Unity.

Note that the Unity version is now provided through NuGetForUnity. Also, it requires MessagePack for C# v3, which supports Source Generator.

MasterMemory is actually quite widely used. I’ve started to see it being adopted in games more frequently. So I’m really happy that we’ve finally resolved the hassle of code generation from external tools, which had been causing quite some concern!

Migration from v2 to v3 shouldn’t be too difficult. We deliberately avoided touching the quality of generated code, core functions, and method signatures, so it should work right out of the box just by removing the parts where you were running the command-line tool. Just make sure to set the namespace using assembly attributes.

Additionally, we’ve added support for records (which we hadn’t done before!) and #nullable enable (which we also hadn’t done before!), so the usability should be improved beyond just the generated parts.

In the future, we’re considering adding MemoryPack support, modernizing the API further (it’s currently netstandard2.0, so it’s old), and making overall improvements (like replacing generated code parts such as ImmutableBuilder). There’s a lot we can do, so I hope we can work on these improvements when the opportunity arises.

ConsoleAppFramework v5.3.0

Yoshifumi Kawai — Tue, 17 Dec 2024 03:39:50 GMT

ConsoleAppFramework v5.3.0 — Enhanced DI Integration through Auto-generated Methods from NuGet References, and More

I’ve made a relatively significant update to ConsoleAppFramework v5! For details about v5 itself, please refer to my previous article ConsoleAppFramework v5 — Zero Overhead, Native AOT-compatible CLI Framework for C#. While v5 introduced some interesting concepts that were well-received, it did sacrifice some usability aspects. This update addresses those issues, and I believe it has significantly improved the overall user experience!

Disabling Automatic Name Conversion

By default, command names and option names are automatically converted to kebab-case. While this follows standard command-line tool naming conventions, it might feel cumbersome when using the framework for internal applications or batch file creation. Therefore, we’ve added the ability to disable this conversion at the assembly level.

using ConsoleAppFramework;

[assembly: ConsoleAppFrameworkGeneratorOptions(DisableNamingConversion = true)]

var app = ConsoleApp.Create();
app.Add<MyProjectCommand>();
app.Run(args);

public class MyProjectCommand
{
    public void Execute(string fooBarBaz)
    {
        Console.WriteLine(fooBarBaz);
    }
}

The automatic conversion is disabled by [assembly: ConsoleAppFrameworkGeneratorOptions(DisableNamingConversion = true)]. In this example, the command becomes ExecuteCommand --fooBarBaz.
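For reference, this is roughly what the default conversion does: it maps fooBarBaz to foo-bar-baz. The ToKebabCase function below is a simplified approximation written for illustration, not ConsoleAppFramework's actual implementation; in particular, it does not split runs of capitals such as HTTPServer.

```csharp
using System.Text;

// Simplified approximation of kebab-case conversion (illustration only):
// insert '-' before an uppercase letter that follows a lowercase letter or digit,
// then lowercase everything.
static string ToKebabCase(string name)
{
    var sb = new StringBuilder(name.Length + 4);
    for (int i = 0; i < name.Length; i++)
    {
        char c = name[i];
        if (char.IsUpper(c) && i > 0 && (char.IsLower(name[i - 1]) || char.IsDigit(name[i - 1])))
        {
            sb.Append('-');
        }
        sb.Append(char.ToLowerInvariant(c));
    }
    return sb.ToString();
}

string option = "--" + ToKebabCase("fooBarBaz"); // "--foo-bar-baz"
string command = ToKebabCase("Execute");         // "execute"
System.Console.WriteLine($"{command} {option}");
```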

From an implementation perspective, while many Source Generators use AdditionalFiles with JSON or custom format files (like BannedSymbols.txt in BannedApiAnalyzers) to provide configuration, using files can be quite cumbersome. For setting just one or two boolean values, using assembly attributes is the most straightforward approach.

The implementation can pull this from CompilationProvider using Assembly.GetAttributes:

var generatorOptions = context.CompilationProvider.Select((compilation, token) =>
{
    foreach (var attr in compilation.Assembly.GetAttributes())
    {
        if (attr.AttributeClass?.Name == "ConsoleAppFrameworkGeneratorOptionsAttribute")
        {
            var args = attr.NamedArguments;
            var disableNamingConversion = args.FirstOrDefault(x => x.Key == "DisableNamingConversion").Value.Value as bool? ?? false;
            return new ConsoleAppFrameworkGeneratorOptions(disableNamingConversion);
        }
    }

    return new ConsoleAppFrameworkGeneratorOptions(DisableNamingConversion: false);
});

By combining this with the output of other syntax providers (for example, via Combine), we can reference the attribute values during generation.

ConfigureServices/Logging/Configuration

ConsoleAppFramework v5 had a constraint where it couldn’t generate code dependent on specific libraries due to its zero-dependency principle. This meant that integrating with DI required manually building the ServiceProvider, adding an extra step for users. To address this, we’ve added functionality that analyzes NuGet DLL references and makes the ConfigureServices method available on ConsoleAppBuilder when Microsoft.Extensions.DependencyInjection is referenced.

var app = ConsoleApp.Create()
    .ConfigureServices(service =>
    {
        service.AddTransient<MyService>();
    });

app.Add("", ([FromServices] MyService service, int x, int y) => Console.WriteLine(x + y));

app.Run(args);

This provides a new experience where the framework itself maintains zero dependencies while still being able to generate library-dependent code. This is achieved by pulling from MetadataReferencesProvider and feeding it into the generation process:

var dllReference = context.MetadataReferencesProvider
    .Collect()
    .Select((xs, _) =>
    {
        var hasDependencyInjection = false;
        var hasLogging = false;
        var hasConfiguration = false;
        var hasJsonConfiguration = false;
        var hasHost = false;

        foreach (var x in xs)
        {
            var name = x.Display;
            if (name == null) continue;

            if (!hasDependencyInjection && name.EndsWith("Microsoft.Extensions.DependencyInjection.dll"))
            {
                hasDependencyInjection = true;
                continue;
            }

            // etc... (similar checks set hasLogging, hasConfiguration, hasJsonConfiguration, hasHost)
        }

        return new DllReference(hasDependencyInjection, hasLogging, hasConfiguration, hasJsonConfiguration, hasHost);
    });

context.RegisterSourceOutput(dllReference, EmitConsoleAppConfigure);

Reference analysis is performed for multiple dependencies. For example, if Microsoft.Extensions.Logging is referenced, ConfigureLogging becomes available. This allows for clean integration with ZLogger:

// Package Import: ZLogger
var app = ConsoleApp.Create()
    .ConfigureLogging(x =>
    {
        x.ClearProviders();
        x.SetMinimumLevel(LogLevel.Trace);
        x.AddZLoggerConsole();
        x.AddZLoggerFile("log.txt");
    });

app.Add<MyCommand>();
app.Run(args);

// inject logger to constructor
public class MyCommand(ILogger<MyCommand> logger)
{
    public void Echo(string msg)
    {
        logger.ZLogInformation($"Message is {msg}");
    }
}

Loading configuration from appsettings.json is now a common pattern, and when Microsoft.Extensions.Configuration.Json is referenced, ConfigureDefaultConfiguration becomes available. This automatically performs SetBasePath(System.IO.Directory.GetCurrentDirectory()) and AddJsonFile("appsettings.json", optional: true) (additional configuration via Action is possible, and ConfigureEmptyConfiguration is also available).

This makes it simple to read configuration, bind it to classes, and inject it into commands:

// Package Import: Microsoft.Extensions.Configuration.Json
var app = ConsoleApp.Create()
    .ConfigureDefaultConfiguration()
    .ConfigureServices((configuration, services) =>
    {
        // Package Import: Microsoft.Extensions.Options.ConfigurationExtensions
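        services.Configure<PositionOptions>(configuration.GetSection("Position"));
    });

app.Add<MyCommand>();
app.Run(args);

// inject options
public class MyCommand(IOptions<PositionOptions> options)
{
    public void Echo(string msg)
    {
        ConsoleApp.Log($"Bound Option: {options.Value.Title} {options.Value.Name}");
    }
}

// bound from the "Position" section of appsettings.json
public class PositionOptions
{
    public string Title { get; set; } = "";
    public string Name { get; set; } = "";
}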

For those wanting to build with Microsoft.Extensions.Hosting, ToConsoleAppBuilder becomes available when Microsoft.Extensions.Hosting is referenced:

// Package Import: Microsoft.Extensions.Hosting
var app = Host.CreateApplicationBuilder()
    .ToConsoleAppBuilder();

Additionally, the configured IServiceProvider is now automatically disposed after Run or RunAsync completes.

RegisterCommands from Attribute

While commands previously had to be registered with Add or Add<T>, we've added functionality to register commands automatically through a class attribute:

[RegisterCommands]
public class Foo
{
    public void Baz(int x)
    {
        Console.Write(x);
    }
}

[RegisterCommands("bar")]
public class Bar
{
    public void Baz(int x)
    {
        Console.Write(x);
    }
}

These are automatically added:

var app = ConsoleApp.Create();

// Commands:
//   baz
//   bar baz
app.Run(args);

You can still use Add and Add<T> alongside these attribute-based registrations.

Initially, we planned to allow arbitrary attributes, but due to IncrementalGenerator API limitations, we're restricted to the fixed RegisterCommands attribute. Inheritance is also not supported.

Since the v5 release, we’ve continued making improvements, including allowing filters to be defined in external assemblies and optimizing the Incremental Generator implementation for better performance. The framework has evolved into an excellent solution!

By the way, regarding System.CommandLine, they announced Resetting System.CommandLine in March due to ongoing issues. As expected, there hasn’t been much progress. This was predictable, and it’s better not to have high expectations. Using ConsoleAppFramework is a solid choice moving forward.

MessagePack for C# v3 Release with Source Generator Support

Yoshifumi Kawai — Fri, 06 Dec 2024 08:10:47 GMT

Last month, the MessagePack for C# project joined the .NET Foundation! I hope this will help users feel more confident about using the library with a stable perspective.

And now, after long development, the major version v3 has been released. While the core remains mostly unchanged from v2, it fully incorporates a Source Generator. Since IL dynamic generation still exists, it is a hybrid serializer that uses both IL dynamic generation and Source Generator output. v3 ships with a built-in Source Generator and Analyzer, and existing code automatically becomes Source Generator-enabled just by compiling with v3. Users don't need to write any additional code to get Source Generator support when updating from v2 to v3!

Let’s look at the behavior in detail. For example, when you write code like:

[MessagePackObject]
public class MyTestClass
{
    [Key(0)]
    public int MyProperty { get; set; }
}

The following code is automatically generated internally by the Source Generator:

partial class GeneratedMessagePackResolver
{
    internal sealed class MyTestClassFormatter : IMessagePackFormatter<MyTestClass>
    {
        public void Serialize(ref MessagePackWriter writer, MyTestClass value, MessagePackSerializerOptions options)
        {
            if (value == null)
            {
                writer.WriteNil();
                return;
            }

            writer.WriteArrayHeader(1);
            writer.Write(value.MyProperty);
        }

        public MyTestClass Deserialize(ref MessagePackReader reader, MessagePackSerializerOptions options)
        {
            if (reader.TryReadNil())
            {
                return null;
            }

            options.Security.DepthStep(ref reader);
            var length = reader.ReadArrayHeader();
            var ____result = new MyTestClass();

            for (int i = 0; i < length; i++)
            {
                switch (i)
                {
                    case 0:
                        ____result.MyProperty = reader.ReadInt32();
                        break;
                    default:
                        reader.Skip();
                        break;
                }
            }

            reader.Depth--;
            return ____result;
        }
    }
}

Moreover, this GeneratedMessagePackResolver is already registered in the default options (like StandardResolver):

public static readonly IFormatterResolver[] DefaultResolvers = [
    BuiltinResolver.Instance,
    AttributeFormatterResolver.Instance,
    SourceGeneratedFormatterResolver.Instance, // here
    ImmutableCollection.ImmutableCollectionResolver.Instance,
    CompositeResolver.Create(ExpandoObjectFormatter.Instance),
    DynamicGenericResolver.Instance, // only enable for RuntimeFeature.IsDynamicCodeSupported
    DynamicUnionResolver.Instance];

Serialization target classes included in user code assemblies will prioritize using code generated by the Source Generator. GeneratedMessagePackResolver offers several customization points, such as changing the default namespace and names, or modifying generated formatters to be map-based. For more details, please check the new documentation. For those wanting to know the detailed changes from v2 to v3, please check the Migration Guide v2 -> v3.

For Unity, the installation method has significantly changed. The core library is now common with the .NET version and requires installation from NuGet. Additionally, you need to download Unity-specific additional code via UPM. For details, please check the MessagePack-CSharp#unity-support section.

The .unitypackage distribution has been discontinued. Also, mpc, which was required for IL2CPP support, is no longer needed; everything has been migrated to the Source Generator. As a result, the minimum supported Unity version is 2022.3.12f1. The Source Generator is enabled automatically when the core library is installed via NuGetForUnity, so no additional work is required.

History and Next

The original MessagePack for C# (v1) was released by me (Yoshifumi Kawai/@neuecc) in 2017. I created it as a performance-focused binary serializer because the existing (binary) serializers in 2016 couldn’t meet the performance requirements for solving issues in the game I was developing at the time. Along with it, I also released MagicOnion, a gRPC-based RPC framework created as a network system.

While the v1 release targeted only byte[], .NET kept adding new I/O-related APIs such as Span<T> and IBufferWriter<T>, so v2 introduced a new design built around them. This implementation was led and released by Microsoft engineer Andrew Arnott / @AArnott.

Since then, it has continued under joint maintenance and moved from my personal repository (neuecc/MessagePack-CSharp) to an organization (MessagePack-CSharp/MessagePack-CSharp). It’s used in major Microsoft products like Visual Studio 2022, SignalR’s binary protocol, and Blazor Server protocol, and has gathered the most stars on GitHub among .NET binary serializers. It’s also recommended as one of the migration targets for BinaryFormatter, which is being deprecated in .NET 9.

With v3’s Source Generator support, we’ve taken the first step toward higher performance, flexibility, and AOT compatibility.

While I consider the MessagePack for C# project a great success, AArnott is currently starting development on his own new MessagePack project, and in the meantime I’ve also released MemoryPack, a serializer with a different format. So I think it’s worth explaining a little about the future of MessagePack for C# and its characteristics.

I believe the maintenance system will continue with two people, but regarding active development, I might take the lead again. I operate with the understanding that MessagePack and MemoryPack have different characteristics as formats, and both are important. I like the original implementation of MessagePack for C#, and I think it’s still absolutely competitive even today.

AArnott’s different MessagePack serializer has slightly different fundamental philosophy. In that regard, I recognize it not as an improved serializer but as one with a different personality. Let me explain the differences.

Binary spec, default settings and performance

What’s important for serializer performance is both “specification and implementation.” For example, binary formats are generally faster than text formats like JSON. However, a well-implemented JSON serializer is faster than a mediocre binary serializer (I’ve demonstrated this by creating a serializer called Utf8Json). So, both specification and implementation are important. If you can achieve both, that becomes the best-performing serializer.

MessagePack’s binary specification is expressed as a binary version of JSON, as its motto “It’s like JSON, but fast and small” suggests. However, MessagePack for C#’s default doesn’t necessarily aim to be JSON-like.

[MessagePackObject]
public class MsgPackSchema
{
    [Key(0)]
    public bool Compact { get; set; }
    [Key(1)]
    public int Schema { get; set; }
}

When this class is serialized, it would be expressed in JSON as [true, 0]. This is because the object is serialized array-based, whereas if serialized map-based, it would be expressed as {"Compact":true,"Schema":0}.

The advantage of array-based serialization is, as you can see, it becomes more compact in binary size. Compact size means less processing, which positively affects serialization speed. Also, for deserialization, since there’s no need to search for properties to deserialize by comparing strings, faster deserialization speed can be expected.
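To make the size difference concrete, the sketch below hand-encodes MsgPackSchema with Compact = true and Schema = 0 both ways, using format bytes from the MessagePack specification (fixarray 0x90 | n, fixmap 0x80 | n, fixstr 0xa0 | length, true 0xc3, positive fixint 0x00 to 0x7f). It does not use MessagePack for C# and only handles the tiny values in this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Writes a MessagePack fixstr: 0xa0 | length, then the UTF-8 bytes (length must be <= 31).
static void WriteFixStr(List<byte> buffer, string s)
{
    byte[] utf8 = Encoding.UTF8.GetBytes(s);
    if (utf8.Length > 31) throw new ArgumentException("fixstr holds at most 31 bytes");
    buffer.Add((byte)(0xa0 | utf8.Length));
    buffer.AddRange(utf8);
}

// Array mode: [true, 0] -> fixarray(2), true, positive fixint 0
var arrayMode = new List<byte> { 0x92, 0xc3, 0x00 };

// Map mode: {"Compact":true,"Schema":0} -> fixmap(2), then key/value pairs
var mapMode = new List<byte> { 0x82 };
WriteFixStr(mapMode, "Compact");
mapMode.Add(0xc3);
WriteFixStr(mapMode, "Schema");
mapMode.Add(0x00);

Console.WriteLine($"array: {arrayMode.Count} bytes, map: {mapMode.Count} bytes");
// array: 3 bytes, map: 18 bytes
```

Array mode needs 3 bytes while map mode needs 18, because the property names travel with the data; in an array of such objects, that overhead is repeated for every element.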

Note that array-based serialization is also adopted by msgpack-java, the reference implementation by MessagePack specification creator Sadayuki Furuhashi, so it’s not an unorthodox approach.

In MessagePack-CSharp, if you want to serialize in a JSON-like map-based format, you can write [MessagePackObject(true)]. Also, with Source Generator, you can override at the Resolver level to force map-based serialization.

[MessagePackObject(keyAsPropertyName: true)]
public class MsgPackSchema
{
    public bool Compact { get; set; }
    public int Schema { get; set; }
}

The advantages of maps are enabling flexible schema evolution, easier communication when interfacing with other languages, and higher self-descriptiveness of the binary itself. The disadvantages are the impact on size and performance, especially in arrays of objects where property names are included for each element, which becomes quite wasteful.

The default is array mode, in pursuit of compactness and performance. I saw MessagePack first as a binary specification capable of high performance, and only second as something JSON-like. Of course, maps are important too, so I made map mode easy to enable by just adding (true) to the attribute.

In array mode, you need to attach the Key attribute to all properties. This is necessary, just as Protocol Buffers requires numeric tags, when you’re not using the property name itself as the key. Of course, automatic numbering in sequence is possible, but I’ve determined that implicit handling of binary format keys is too risky (binary compatibility would break just by manipulating the order). In other words, explicit is the default. In large project development, both senior and junior members will touch the code; not everyone touching the code understands everything. So, implicit behavior should be avoided, and things should be explicit — this strong conviction led to this design choice.

However, attaching Keys to all properties is very painful (I had painful experiences with DataContract and protobuf-net before developing MessagePack-CSharp). So, we provided a feature to automatically attach them through Analyzer + Code Fix. This alleviates the pain of being explicit while getting the best of both worlds.

The other MessagePack serializer’s default appears to be map-based. This is partly because it’s based on PolyType, an abstraction library for creating Source Generator-based libraries, and partly because it seems to be an explicit preference for that approach.

A library can only choose one “default.” Even if it can process in either mode, there can only be one “default.” To reiterate, I prefer and prioritize “compactness and performance” as a binary format.

You might be hearing about PolyType for the first time. I’m not very favorable towards PolyType. While I think it’s very convenient for creating small things, I believe its limitations as an abstraction layer are too significant when aiming for best performance or expressing the best ideas. Therefore, I won’t adopt it in MessagePack for C# or in creating anything else.

Unity(multiplatform) Support

MessagePack for C# has provided first-class support for the Unity game engine since v1. This is partly because I serve as CEO of Cysharp, an affiliated company of Cygames, a Japanese game company, and have deep connections with the video game industry. We’ve actually created and used things that run on Unity ourselves. Of course, we also use it for server-side and desktop applications.

Unity has its own AOT system called IL2CPP, which is essential especially for releasing on mobile platforms like iOS. Even before Source Generator existed, we created and provided mpc, a code generation tool using Roslyn. It’s no exaggeration to say that MessagePack being used in hundreds of mobile games is thanks to my passionate support. With v3 finally becoming Source Generator-based, the workflow will be greatly simplified!

Generally, Unity support has been quite undervalued in the .NET community. Also, from an outside perspective, Microsoft and Microsoft employees seem to share this attitude, with little interest in platforms other than their own. I don’t think this attitude is very favorable, and it’s also limiting the potential of .NET. I think Xamarin’s failure to gain momentum was also partly due to such cold regard from Microsoft itself.

I take care to ensure that the libraries I create can properly support Unity as much as possible (the latest being Cysharp/R3, a new Reactive Extensions library). As for the other MessagePack serializer, it doesn’t seem likely to have solid Unity support…

Beyond v3

v3’s Native AOT Support is not complete. It’s challenging that just making it Source Generator-based doesn’t result in complete Native AOT support. This is honestly perplexing given that it works perfectly with Unity’s AOT, IL2CPP, and I think it also shows Microsoft’s not-so-good habits. In other words, they’re providing something complex to achieve perfect support. That’s the current Native AOT. While I can understand some aspects of the complex and bizarre attributes and flows, I think they should have been simplified more. Well, it probably won’t be fixed anymore…

In terms of performance, there are also points that regressed from v1 to v2, so we need to make implementation improvements based on the latest insights. I’m particularly dissatisfied with how the wide use of ReadOnlySequence creates significant constraints.

Better asynchronous APIs, following the standardization of PipeReader/PipeWriter in .NET 9, and streaming support that doesn’t sacrifice performance might also become major topics.

Because MessagePack for C# is widely used, breaking changes are difficult to make, and maintaining compatibility is the most important topic. However, as the world changes, choosing not to evolve is choosing the path to extinction. I think there’s still a lot we can do, so I want to continue being the cutting-edge, best binary serializer in .NET (MemoryPack too…!)

First, please try v3’s Source Generator. I think one of the good things about OSS is that we can create better things with everyone’s power.

Fast Dictionary Lookup of UTF-8 String in the C# 13 with .NET 9 AlternateLookup

Yoshifumi Kawai — Thu, 29 Aug 2024 08:20:45 GMT

In .NET 9, a new method, GetAlternateLookup<TAlternateKey>(), has been added to dictionary-like classes: Dictionary<TKey, TValue>, ConcurrentDictionary<TKey, TValue>, HashSet<T>, FrozenDictionary<TKey, TValue>, and FrozenSet<T>. Until now, Dictionary operations could only be performed via TKey. This was natural, but it became a problem with string keys, because we want to operate with both string and ReadOnlySpan<char>. Previously, when only a ReadOnlySpan<char> was available, converting it to a string with ToString was mandatory, which allocates new memory even if we just wanted to look up a value!

This issue has been resolved with the introduction of GetAlternateLookup in .NET 9, which allows dictionaries to have alternate search keys.

var dict = new Dictionary<string, int>
{
    { "foo", 10 },
    { "bar", 20 },
    { "baz", 30 }
};

var lookup = dict.GetAlternateLookup<ReadOnlySpan<char>>();

var keys = "foo, bar, baz";

// .NET 9 SpanSplitEnumerator
foreach (Range range in keys.AsSpan().Split(','))
{
    ReadOnlySpan<char> key = keys.AsSpan(range).Trim();

    // Get/Add/Remove from string key dictionary using ReadOnlySpan<char>
    int value = lookup[key];
    Console.WriteLine(value);
}

By the way, the usual string.Split allocates an array plus a new string for each segment. In .NET 8, MemoryExtensions.Split was added, which splits a ReadOnlySpan<char> into a caller-provided Span<Range>, so the number of splits is fixed in advance. In .NET 9, a new Split that returns SpanSplitEnumerator<T> has been added. This allows slicing ReadOnlySpan<char> segments out of the original string without any additional allocations.
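The .NET 8 overload writes Range values into a buffer supplied by the caller, so the maximum number of segments is fixed by the buffer size. A minimal sketch, assuming .NET 8 or later:

```csharp
using System;

ReadOnlySpan<char> text = "foo, bar, baz";

// Caller-provided buffer: at most 4 segments can be produced.
Span<Range> ranges = stackalloc Range[4];
int count = text.Split(ranges, ',', StringSplitOptions.TrimEntries);

for (int i = 0; i < count; i++)
{
    ReadOnlySpan<char> segment = text[ranges[i]]; // a slice, no string allocated
    Console.WriteLine(segment.ToString());      // ToString only for display
}
```

If the input contained more segments than the buffer has slots, the last range would hold the unsplit remainder, which is why the .NET 9 enumerator is more convenient when the number of segments is not known up front.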

To look up keys with the extracted ReadOnlySpan<char>, GetAlternateLookup is necessary.

One use case is serializers, which frequently require key-value lookups. In MessagePack for C#, which I’m developing, we adopt multiple strategies for fast, allocation-free deserialization. One is AutomataDictionary, which treats UTF-8 strings as 8-byte automata. This part is further inlined and embedded in IL Emit and Source Generator output to eliminate dictionary lookups. Another is the AsymmetricKeyHashTable mechanism, which allows searching with two keys representing the same target, internally creating a dictionary searchable by both byte[] and ArraySegment<byte>.

// From MessagePack for C#
internal interface IAsymmetricEqualityComparer<TKey1, TKey2>
{
    int GetHashCode(TKey1 key1);
    int GetHashCode(TKey2 key2);
    bool Equals(TKey1 x, TKey1 y);
    bool Equals(TKey1 x, TKey2 y); // Comparison between TKey1 and TKey2
}

In other words, until now, scenarios requiring dictionaries with alternate search keys necessitated creating custom dictionaries, and for performance, even basic data structures had to be custom-made. However, from .NET 9, this is finally achievable with standard tools.

What's needed for AlternateLookup is IAlternateEqualityComparer<TAlternate, T>, defined as follows (the definition is similar to IAsymmetricEqualityComparer, so I might have anticipated the future by 10 years):

public interface IAlternateEqualityComparer<in TAlternate, T>
    where TAlternate : allows ref struct
    where T : allows ref struct
{
    bool Equals(TAlternate alternate, T other);
    int GetHashCode(TAlternate alternate);
    T Create(TAlternate alternate);
}

The allows ref struct anti-constraint, a language feature added in C# 13, lets ref structs such as Span<T> be used as generic type arguments.

Basically, this needs to be implemented alongside IEqualityComparer<T>. In fact, Dictionary<TKey, TValue>.GetAlternateLookup<TAlternateKey>() throws an InvalidOperationException at runtime (not a compile-time check!) if the dictionary's IEqualityComparer<TKey> doesn't implement IAlternateEqualityComparer<TAlternateKey, TKey>; TryGetAlternateLookup returns false instead. It's also a bit odd for an equality comparer to have a Create method, but it's needed for Add operations, which must turn the alternate key into a real key.

Currently, the standard library only provides an IAlternateEqualityComparer for string. The equality comparers normally used for string keys implement IAlternateEqualityComparer<ReadOnlySpan<char>, string?> and can therefore be operated with ReadOnlySpan<char>, but nothing else is provided.

However, what's realistically needed today is UTF8, that is, ReadOnlySpan<byte>. I mentioned using it for serializer lookups, and the input of modern serializers is UTF8; there's no place for ReadOnlySpan<char>. So, let's prepare an IAlternateEqualityComparer like this!

using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.IO.Hashing; // NuGet: System.IO.Hashing

public sealed class Utf8StringEqualityComparer : IEqualityComparer<byte[]>, IAlternateEqualityComparer<ReadOnlySpan<byte>, byte[]>
{
    public static IEqualityComparer<byte[]> Default { get; } = new Utf8StringEqualityComparer();

    // IEqualityComparer<byte[]>

    public bool Equals(byte[]? x, byte[]? y)
    {
        if (x == null && y == null) return true;
        if (x == null || y == null) return false;

        return x.AsSpan().SequenceEqual(y);
    }

    public int GetHashCode([DisallowNull] byte[] obj)
    {
        return GetHashCode(obj.AsSpan());
    }

    // IAlternateEqualityComparer<ReadOnlySpan<byte>, byte[]>

    public byte[] Create(ReadOnlySpan<byte> alternate)
    {
        return alternate.ToArray();
    }

    public bool Equals(ReadOnlySpan<byte> alternate, byte[] other)
    {
        return other.AsSpan().SequenceEqual(alternate);
    }

    public int GetHashCode(ReadOnlySpan<byte> alternate)
    {
        // System.IO.Hashing package; truncating the 64-bit hash to int is fine for hashing
        return unchecked((int)XxHash3.HashToUInt64(alternate));
    }
}

By default, byte[] is compared by reference, but we want to compare by content, so we use SequenceEqual on ReadOnlySpan<byte>. This gives fast, SIMD-accelerated comparison, especially when the element type is a primitive such as byte. For the hash code, XxHash3 is the best choice: it is the .NET implementation of XXH3, the latest version of the fast xxHash algorithm family, and it requires the System.IO.Hashing package from NuGet. It computes a 64-bit ulong, but the xxHash author states that when a 32-bit value is needed, simply dropping the upper bits is fine, so we can just cast to int.

Here’s an example of how to use it:

using System;
using System.Collections.Generic;
using System.Text.Json;

// Create a dictionary with Utf8StringEqualityComparer

var dict = new Dictionary<byte[], bool>(Utf8StringEqualityComparer.Default)
{
    { "foo"u8.ToArray(), true },
    { "bar"u8.ToArray(), false },
    { "baz"u8.ToArray(), false }
};

var lookup = dict.GetAlternateLookup<ReadOnlySpan<byte>>();

// Assume we have this input

ReadOnlySpan<byte> json = """
{
    "foo": 0,
    "bar": 0,
    "baz": 0
}
"""u8;

// System.Text.Json
var reader = new Utf8JsonReader(json);

while (reader.Read())
{
    if (reader.TokenType == JsonTokenType.PropertyName)
    {
        // Can search with the extracted key
        ReadOnlySpan<byte> key = reader.ValueSpan;
        var flag = lookup[key];

        Console.WriteLine(flag);
    }
}

One thing to note: avoid pairing string keys with a ReadOnlySpan<byte> alternate key. Every lookup would then require transcoding, giving you the worst of both worlds (even allocation-free processing with Rune is no match for byte[] keys, which compare with a plain binary comparison). If you really need both kinds of lookup, it's better to prepare two dictionaries.
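Here's a self-contained sketch of two details mentioned above: adding through the alternate lookup goes through Create, and a comparer without IAlternateEqualityComparer is only detected at runtime. To avoid the NuGet dependency, it swaps XxHash3 for System.HashCode.AddBytes; SpanBytesComparer is a hypothetical name for this example, not a library type.

```csharp
using System;
using System.Collections.Generic;

var dict = new Dictionary<byte[], int>(new SpanBytesComparer());
var lookup = dict.GetAlternateLookup<ReadOnlySpan<byte>>();

// Adding through the alternate key calls Create(), which materializes the byte[] key.
bool added = lookup.TryAdd("foo"u8, 1);
bool found = lookup.TryGetValue("foo"u8, out int value);

// The default byte[] comparer compares references and has no alternate comparer,
// so this is only discovered at runtime (GetAlternateLookup would throw here).
var plain = new Dictionary<byte[], int>();
bool supported = plain.TryGetAlternateLookup<ReadOnlySpan<byte>>(out _);

Console.WriteLine($"{added} {found} {value} {dict.Count} {supported}"); // True True 1 1 False

// Assumption: HashCode.AddBytes stands in for XxHash3 so no extra package is needed.
sealed class SpanBytesComparer : IEqualityComparer<byte[]>, IAlternateEqualityComparer<ReadOnlySpan<byte>, byte[]>
{
    public bool Equals(byte[]? x, byte[]? y) =>
        ReferenceEquals(x, y) || (x is not null && y is not null && x.AsSpan().SequenceEqual(y));

    public int GetHashCode(byte[] obj) => GetHashCode((ReadOnlySpan<byte>)obj);

    public int GetHashCode(ReadOnlySpan<byte> alternate)
    {
        var hash = new HashCode();
        hash.AddBytes(alternate); // content-based hash, stable within one process
        return hash.ToHashCode();
    }

    public bool Equals(ReadOnlySpan<byte> alternate, byte[] other) => alternate.SequenceEqual(other);

    // Called when an entry is added through the alternate lookup.
    public byte[] Create(ReadOnlySpan<byte> alternate) => alternate.ToArray();
}
```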

Anyway, this is a long-awaited feature for me! I've written dictionaries in many variations, unable to use generics for Span support and having to hard-code them. I'm very excited that this is now available to everyone. While allows ref struct adds some complexity to generic definitions (maybe it could have been inferred automatically?), it's an important advancement for the language.

Let's start using .NET 9 and C# 13. It's still in preview, but the official release should be in November.

ZLogger v2 Architecture: Leveraging .NET 8 to Maximize Performance

Yoshifumi Kawai — Fri, 05 Jul 2024 11:42:05 GMT


We have released ZLogger v2, a new ultra-fast and low-allocation logging library for C# and .NET. It's been completely redesigned from v1 to align with the latest C# features. While it works best with .NET 8, it supports .NET Standard 2.0 and above, as well as Unity 2022.2 and above. Both the .NET and Unity versions support text messages and structured logging (JSON and MessagePack by default).

Cysharp/ZLogger

The key point of the new design is the full adoption of String Interpolation, which achieves both clean syntax and performance.

logger.ZLogInformation($"Hello my name is {name}, {age} years old.");

Code written like this is compiled into:

if (logger.IsEnabled(LogLevel.Information))
{
    var handler = new ZLoggerInformationInterpolatedStringHandler(30, 2, logger);
    handler.AppendLiteral("Hello my name is ");
    handler.AppendFormatted(name, 0, null, "name");
    handler.AppendLiteral(", ");
    handler.AppendFormatted(age, 0, null, "age");
    handler.AppendLiteral(" years old.");
}

The efficiency is evident from the code: the format string is expanded at compile time rather than at runtime, and parameters are received as generics through AppendFormatted, avoiding boxing. Incidentally, 30 in the constructor is the total length of the literal parts (17 + 2 + 11 characters) and 2 is the number of parameters; together they let the handler calculate the required initial buffer size.

String Interpolation itself has existed since C# 6.0, but the improved String Interpolation in C# 10.0 allows custom interpolated string handlers.
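As a minimal illustration of what a custom handler receives (a hypothetical RecordingHandler, not ZLogger's actual type), the sketch below just records each call the compiler generates. It shows the (literalLength, formattedCount) constructor arguments and how CallerArgumentExpression captures the parameter name.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

var name = "foo";
var age = 33;

// The compiler turns this interpolated string into RecordingHandler calls.
var log = Describe($"Hello my name is {name}, {age} years old.");
Console.WriteLine(log);

static string Describe(RecordingHandler handler) => handler.ToString();

// Hypothetical handler for illustration only: it logs every call it receives.
[InterpolatedStringHandler]
public ref struct RecordingHandler
{
    readonly StringBuilder calls;

    public RecordingHandler(int literalLength, int formattedCount)
    {
        calls = new StringBuilder();
        calls.Append($"ctor({literalLength}, {formattedCount})");
    }

    public void AppendLiteral(string s) => calls.Append($";lit[{s}]");

    // CallerArgumentExpression receives the source text of the hole, e.g. "name".
    public void AppendFormatted<T>(T value, [CallerArgumentExpression("value")] string? expr = null)
        => calls.Append($";fmt[{expr}={value}]");

    public override string ToString() => calls.ToString();
}
```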

The string fragments and parameters obtained this way are ultimately written directly to the Stream as UTF8 without being stringified, through Cysharp/Utf8StringInterpolation, achieving high speed and low allocation.

Structured Logging likewise writes directly as UTF8, by working tightly with System.Text.Json's Utf8JsonWriter:

// For example, write {"name":"foo","age":33} to Utf8JsonWriter
// Source Generator version, very easy to understand what's actually happening
public void WriteJsonParameterKeyValues(Utf8JsonWriter writer, JsonSerializerOptions jsonSerializerOptions)
{
    writer.WriteString(_jsonParameter_name, this.name);
    writer.WriteNumber(_jsonParameter_age, this.age);
}

// StringInterpolation version, seems a bit roundabout but does the same thing
public void WriteJsonParameterKeyValues(Utf8JsonWriter writer, JsonSerializerOptions jsonSerializerOptions)
{
    for (var i = 0; i < ParameterCount; i++)
    {
        ref var p = ref parameters[i];
        writer.WritePropertyName(p.Name.AsSpan());
        // Explanation of MagicalBox will come later
        if (!magicalBox.TryReadTo(p.Type, p.BoxOffset, writer, jsonSerializerOptions))
        {
            // ....
        }
    }
}

It’s written directly as UTF8 again. Structured Logging is a recent trend, so it’s implemented in loggers of various languages, but I don’t think there’s any other implementation that achieves such clean syntax while maintaining performance!

So, what about the actual benchmark results? At the very least, allocation is overwhelmingly low.

The reason I hedge with “at the very least” is that NLog, carefully configured for high speed, was faster than expected, grrr…

Now, another feature of ZLogger is that it’s built directly on top of Microsoft.Extensions.Logging. Usually, loggers have their own systems and use a bridge to connect with Microsoft.Extensions.Logging. In realistic applications, it’s almost impossible to avoid Microsoft.Extensions.Logging, such as when using ASP.NET. From .NET 8, with enhanced OpenTelemetry support and Aspire, the importance of Microsoft.Extensions.Logging is increasing. Unlike ZLogger v1, v2 supports all features of Microsoft.Extensions.Logging, including Scope.

For example, the quality of Serilog's bridge library is quite low (I checked the source code as well), and this shows up in the actual performance numbers. ZLogger has no such overhead.

Default settings also matter a lot. The standard settings of most loggers are quite slow, for example flushing on every write to a file stream. To speed this up, you need to tune the async and buffering settings properly and make sure a final flush happens at shutdown so nothing is lost, which is quite difficult. So, many people probably just leave the defaults as they are? ZLogger's defaults are tuned to be the fastest, and the final flush is tied automatically to the lifecycle of Microsoft.Extensions' DI, so when you build applications with ApplicationBuilder and the like, no logs are lost without any conscious effort.

Note that the performance of flushing each time heavily depends on storage write performance, so you might find it’s not that slow when benchmarking locally on recent machines with M.2 SSDs, which are very fast. However, it’s better not to trust local results too much, as the storage performance of cloud servers where you actually deploy applications is unlikely to be that high.

MagicalBox

Here, I'll introduce some tricks used to achieve this performance. Carried over from v1 are the asynchronous write pipeline built on System.Threading.Channels and the efficient buffered writes through IBufferWriter<byte> that optimize writing to the Stream, but I'll skip explaining those.
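For readers unfamiliar with the Channels pattern, here is a minimal sketch of the idea (not ZLogger's implementation; the MemoryStream stands in for a file). Callers enqueue encoded lines without blocking, a single consumer drains the queue and flushes once per batch, and completing the writer and then awaiting the consumer provides the reliable final flush.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<byte[]>(new UnboundedChannelOptions
{
    SingleReader = true,   // exactly one background consumer
    SingleWriter = false,  // any thread may log
});

var output = new MemoryStream(); // stands in for a FileStream

var consumer = Task.Run(async () =>
{
    var reader = channel.Reader;
    while (await reader.WaitToReadAsync())
    {
        // Drain everything currently queued, then flush once for the whole batch.
        while (reader.TryRead(out var line))
        {
            output.Write(line);
        }
        await output.FlushAsync();
    }
});

foreach (var msg in new[] { "first", "second", "third" })
{
    channel.Writer.TryWrite(Encoding.UTF8.GetBytes(msg + "\n")); // never blocks
}

// Completing the writer lets the consumer exit once the queue is empty.
channel.Writer.Complete();
await consumer;

var written = Encoding.UTF8.GetString(output.ToArray());
Console.Write(written);
```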

For JSON conversion, parameters are temporarily held as values inside the InterpolatedStringHandler. The question then is how to hold a value of an arbitrary type T. Normally, you'd think of holding it as object, as in List<object>.

[InterpolatedStringHandler]
public ref struct ZLoggerInterpolatedStringHandler
{
    // Using object to store values of any T type, not good as it causes boxing
    List<object> parameters = new ();

    public void AppendFormatted<T>(T value, int alignment = 0, string? format = null, [CallerArgumentExpression("value")] string? argumentName = null)
    {
        parameters.Add((object)value);
    }
}

To avoid this, ZLogger has prepared a mechanism called MagicalBox.

[InterpolatedStringHandler]
public ref struct ZLoggerInterpolatedStringHandler
{
    // Pack infinitely into the magic box
    MagicalBox magicalBox;
    List<int> boxOffsets = new (); // Actually, this part is carefully cached

    public void AppendFormatted<T>(T value, int alignment = 0, string? format = null, [CallerArgumentExpression("value")] string? argumentName = null)
    {
        if (magicalBox.TryWrite(value, out var offset)) // No boxing occurs!
        {
            boxOffsets.Add(offset);
        }
    }
}

MagicalBox is based on the idea that it can write values of any type (limited to types containing no references, essentially unmanaged types) without boxing. Its actual implementation just writes to a byte[] with Unsafe.WriteUnaligned and reads back with Unsafe.ReadUnaligned at the recorded offset.

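using System.Runtime.CompilerServices;

internal unsafe partial struct MagicalBox
{
    byte[] storage;
    int written;

    public MagicalBox(byte[] storage)
    {
        this.storage = storage;
    }

    public bool TryWrite<T>(T value, out int offset)
    {
        // Only values without object references can be stored as raw bytes
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        {
            offset = 0;
            return false;
        }
        // Bounds check added for safety in this simplified version
        if (storage.Length - written < Unsafe.SizeOf<T>())
        {
            offset = 0;
            return false;
        }
        Unsafe.WriteUnaligned(ref storage[written], value);
        offset = written;
        written += Unsafe.SizeOf<T>();
        return true;
    }

    public bool TryRead<T>(int offset, out T value)
    {
        // Same condition as TryWrite: reference-containing types are never stored
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        {
            value = default!;
            return false;
        }
        value = Unsafe.ReadUnaligned<T>(ref storage[offset]);
        return true;
    }
}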

This is based on implementation experience from MemoryPack serializer and works well.

Note that the actual code is somewhat more complex, with efficient reuse of the byte[] storage, non-generic Read support, special handling for Enum, and so on. As you'd expect.
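Stripped of those details, the core idea fits in a couple of local functions. This is a simplified, hypothetical stand-in for MagicalBox, not ZLogger's code: it writes values of different types back to back into one byte[] and reads them back by offset, and it throws instead of growing the buffer.

```csharp
using System;
using System.Runtime.CompilerServices;

var storage = new byte[64];
var written = 0;

// Heterogeneous values, no boxing: each is copied as raw bytes.
int intOffset = Write(42);
int doubleOffset = Write(3.5);
int longOffset = Write(123456789012L);

// The generic type argument tells Read how to reinterpret the bytes.
int i = Read<int>(intOffset);
double d = Read<double>(doubleOffset);
long l = Read<long>(longOffset);

Console.WriteLine($"{intOffset} {doubleOffset} {longOffset} {written}"); // 0 4 12 20
Console.WriteLine($"{i} {d} {l}");

int Write<T>(T value)
{
    // Only types without object references may be stored as raw bytes.
    if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        throw new ArgumentException($"{typeof(T)} contains references.");
    int size = Unsafe.SizeOf<T>();
    if (storage.Length - written < size)
        throw new InvalidOperationException("storage is full");
    Unsafe.WriteUnaligned(ref storage[written], value);
    int offset = written;
    written += size;
    return offset;
}

T Read<T>(int offset) => Unsafe.ReadUnaligned<T>(ref storage[offset]);
```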

Custom Format Strings

A good point of ZLogger’s String Interpolation is that if you include method calls in parameter values, they are called after the LogLevel check, preventing unnecessary execution.

// This
logger.ZLogDebug($"Id {obj.GetId()}: Data: {obj.GetData()}.");

// Is checked for LogLevel validity before methods are called, like this
if (logger.IsEnabled(LogLevel.Debug))
{
    // snip...
    writer.AppendFormatted(obj.GetId());
    writer.AppendFormatted(obj.GetData());
}

However, when the output goes to Structured Logging, ZLogger gets the parameter name through CallerArgumentExpression, added in C# 10.0, so a method call ends up with the rather awkward name “obj.GetId()”. Therefore, you can specify an alias with a special custom format string.

// You can give an alias with @name
logger.ZLogDebug($"Id {obj.GetId():@id}: Data: {obj.GetData():@data}.");

In ZLogger, following standard String Interpolation syntax, you can specify alignment with “,” and a format string with “:”. In addition, as a special case, if the format string starts with @, it's used as the parameter name.

The @ parameter name specification and format string can be used together.

// Today is 2023-12-19.
// {"date":"2023-12-19T11:25:34.3642389+09:00"}
logger.ZLogDebug($"Today is {DateTime.Now:@date:yyyy-MM-dd}.");

Another common special format string is “json”, which outputs the value serialized as JSON (this feature was inspired by Serilog's capabilities):

var position = new { Latitude = 25, Longitude = 134 };
var elapsed = 34;

// {"position":{"Latitude":25,"Longitude":134},"elapsed":34}
// Processed {"Latitude":25,"Longitude":134} in 034 ms.
logger.ZLogInformation($"Processed {position:json} in {elapsed:000} ms.");

Special format strings are also prepared for PrefixFormatter/SuffixFormatter to add log levels, categories, dates to the beginning/end.

logging.AddZLoggerConsole(options =>
{
    options.UsePlainTextFormatter(formatter =>
    {
        // 2023-12-19 02:46:14.289 [DBG]......
        formatter.SetPrefixFormatter($"{0:utc-longdate} [{1:short}]", (template, info) => template.Format(info.Timestamp, info.LogLevel));
    });
});

For Timestamp, there are longdate, utc-longdate, dateonly, etc. For LogLevel, short converts to a 3-character notation (every prefix has the same width, which makes logs easier to read in an editor). These built-in special format strings are also a performance optimization. For example, the LogLevel code looks like the following, so writing pre-built UTF8 strings this way is far more efficient than building the format manually.

static void AppendLogLevel(ref Utf8StringWriter<IBufferWriter<byte>> writer, ref LogLevel value, ref MessageTemplateChunk chunk)
{
    if (!chunk.NoAlignmentAndFormat)
    {
        if (chunk.Format == "short")
        {
            switch (value)
            {
                case LogLevel.Trace:
                    writer.AppendUtf8("TRC"u8);
                    return;
                case LogLevel.Debug:
                    writer.AppendUtf8("DBG"u8);
                    return;
                case LogLevel.Information:
                    writer.AppendUtf8("INF"u8);
                    return;
                case LogLevel.Warning:
                    writer.AppendUtf8("WRN"u8);
                    return;
                case LogLevel.Error:
                    writer.AppendUtf8("ERR"u8);
                    return;
                case LogLevel.Critical:
                    writer.AppendUtf8("CRI"u8);
                    return;
                case LogLevel.None:
                    writer.AppendUtf8("NON"u8);
                    return;
                default:
                    break;
            }
        }

        writer.AppendFormatted(value, chunk.Alignment, chunk.Format);
        return;
    }

    switch (value)
    {
        case LogLevel.Trace:
            writer.AppendUtf8("Trace"u8);
            break;
        case LogLevel.Debug:
            writer.AppendUtf8("Debug"u8);
            break;
        case LogLevel.Information:
            writer.AppendUtf8("Information"u8);
            break;
        case LogLevel.Warning:
            writer.AppendUtf8("Warning"u8);
            break;
        case LogLevel.Error:
            writer.AppendUtf8("Error"u8);
            break;
        case LogLevel.Critical:
            writer.AppendUtf8("Critical"u8);
            break;
        case LogLevel.None:
            writer.AppendUtf8("None"u8);
            break;
        default:
            writer.AppendFormatted(value);
            break;
    }
}

.NET 8 XxHash3 + Non-GC Heap

XxHash3 was added in .NET 8. It's the latest member of the xxHash family, the fastest hash algorithm, and it performs well enough to be used without hesitation for almost everything, from small to large data. Note that it ships in the System.IO.Hashing NuGet package, so it can be used even with .NET Standard 2.0, not just .NET 8.

ZLogger uses it in multiple places, but as one example, here’s the process of retrieving a cache from String Interpolation string literals:

// LiteralList generated by $"Hello my name is {name}, {age} years old."
// ["Hello my name is ", "name", ", ", "age", " years old."]
// Process to retrieve UTF8 converted cache (MessageSequence) from this
static readonly ConcurrentDictionary<LiteralList, MessageSequence> cache = new();

// Non-.NET 8 version
#if !NET8_0_OR_GREATER
struct LiteralList(List<string?> literals) : IEquatable<LiteralList>
{
    [ThreadStatic]
    static XxHash3? xxhash;

    public override int GetHashCode()
    {
        var h = xxhash;
        if (h == null)
        {
            h = xxhash = new XxHash3();
        }
        else
        {
            h.Reset();
        }

        var span = CollectionsMarshal.AsSpan(literals);
        foreach (var item in span)
        {
            h.Append(MemoryMarshal.AsBytes(item.AsSpan()));
        }

        // https://github.com/Cyan4973/xxHash/issues/453
        // XXH3 64bit -> 32bit, okay to simple cast answered by XXH3 author.
        return unchecked((int)h.GetCurrentHashAsUInt64());
    }

    public bool Equals(LiteralList other)
    {
        var xs = CollectionsMarshal.AsSpan(literals);
        var ys = CollectionsMarshal.AsSpan(other.literals);
        if (xs.Length == ys.Length)
        {
            for (int i = 0; i < xs.Length; i++)
            {
                if (xs[i] != ys[i]) return false;
            }
            return true;
        }
        return false;
    }
}
#endif

XxHash3 is a class (it would have been nice if it were a struct like System.HashCode), so a [ThreadStatic] instance is reused across GetHashCode calls. XxHash3 only outputs a 64-bit ulong, but according to the author, when reducing to 32 bits it's fine to just truncate, without XOR-folding or anything.

This is the normal usage, but for the .NET 8 version, we implemented an extreme optimization.

#if NET8_0_OR_GREATER

struct LiteralList(List<string?> literals) : IEquatable<LiteralList>
{
    // literals are all const strings; in .NET 8 they are allocated in the Non-GC Heap, so they can be compared by address.
    // https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-8/#non-gc-heap
    static ReadOnlySpan<byte> AsBytes(ReadOnlySpan<string?> literals)
    {
        return MemoryMarshal.CreateSpan(
            ref Unsafe.As<string?, byte>(ref MemoryMarshal.GetReference(literals)),
            literals.Length * Unsafe.SizeOf<string>());
    }

    public override int GetHashCode()
    {
        return unchecked((int)XxHash3.HashToUInt64(AsBytes(CollectionsMarshal.AsSpan(literals))));
    }

    public bool Equals(LiteralList other)
    {
        var xs = CollectionsMarshal.AsSpan(literals);
        var ys = CollectionsMarshal.AsSpan(other.literals);
        return AsBytes(xs).SequenceEqual(AsBytes(ys));
    }
}
#endif

It reinterprets the List<string?> as a ReadOnlySpan<byte> and then calls XxHash3.HashToUInt64 or SequenceEqual in one go. This is clearly more efficient, but is it legal to convert List<string?> to ReadOnlySpan<byte>? Here, converting the strings means viewing the list's references as raw bytes, that is, as a list of the addresses of the strings in the heap (8 bytes each on 64-bit).

That's fine so far, but the question is whether comparing addresses is too dangerous. First, strings that are identical as strings can often live at different addresses. Second, strings in the heap aren't fixed in place; they can move. A dictionary key's GetHashCode and Equals must stay completely stable for the whole run of the application.

However, focusing on this usage example, AppendLiteral called by String Interpolation is always passed as a constant at compile time, like handler.AppendLiteral("Hello my name is ");. Therefore, it's guaranteed to point to the same entity.
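To see why address comparison is safe for literals but not for strings in general, here is a small sketch reusing the same AsBytes trick (the lists and the SameReferences helper are hypothetical). Two lists built from the same literals compare equal byte for byte, while a list containing an equal but runtime-created string does not.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Literal lists as two different call sites would produce them.
var a = new List<string?> { "Hello my name is ", ", ", " years old." };
var b = new List<string?> { "Hello my name is ", ", ", " years old." };
// Same text, but the first element is a fresh heap object created at runtime.
var c = new List<string?> { new string("Hello my name is ".AsSpan()), ", ", " years old." };

bool sameLiterals = SameReferences(a, b); // interned literals share one instance
bool runtimeCopy = SameReferences(a, c);  // equal text, different address
Console.WriteLine($"{sameLiterals} {runtimeCopy}"); // True False

// View the list's backing array of string references as raw bytes.
static ReadOnlySpan<byte> AsBytes(ReadOnlySpan<string?> refs) =>
    MemoryMarshal.CreateReadOnlySpan(
        ref Unsafe.As<string?, byte>(ref MemoryMarshal.GetReference(refs)),
        refs.Length * Unsafe.SizeOf<string>());

// One SequenceEqual compares every address at once.
static bool SameReferences(List<string?> x, List<string?> y) =>
    AsBytes(CollectionsMarshal.AsSpan(x)).SequenceEqual(AsBytes(CollectionsMarshal.AsSpan(y)));
```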

[InterpolatedStringHandler]
public ref struct ZLoggerInterpolatedStringHandler
{
    public void AppendLiteral([ConstantExpected] string s) { /* ... */ }
}

As a precaution, we explicitly state that only constants should be passed by using ConstantExpected (System.Diagnostics.CodeAnalysis.ConstantExpectedAttribute, available since .NET 7).

Another point: such constant strings are already interned, but until .NET 8 there was no guarantee that the interned strings would not move. With the Non-GC Heap introduced in .NET 8, they can be considered guaranteed not to move.

// From .NET 8, the result of GC.GetGeneration for constants is int.MaxValue (in Non-GC Heap)
var str = "foo";
Console.WriteLine(GC.GetGeneration(str)); // 2147483647

This allowed us to maximize the speed of the UTF16-to-UTF8 string conversion, which is unavoidable in C#. Note that the Source Generator version eliminates this lookup cost entirely, so, as the benchmark results showed, it's even faster.

.NET 8 IUtf8SpanFormattable

ZLogger's performance rests on writing directly to UTF8 without going through strings. .NET 8 added IUtf8SpanFormattable, which lets any value that implements it be written directly as UTF8 through a generic path. Since ZLogger also supports .NET Standard 2.0 and runtimes older than .NET 8, basic primitives like int and double get special handling there so they're still written directly as UTF8, but .NET 8 covers a wider range of types, so .NET 8 is recommended if possible.
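A minimal sketch of generic UTF8 formatting on .NET 8, assuming an ArrayBufferWriter<byte> as the destination; WriteFormatted and WriteLiteral are hypothetical helpers, not ZLogger or Utf8StringInterpolation APIs. Any T implementing IUtf8SpanFormattable formats straight into the buffer, retrying with a larger span if it doesn't fit.

```csharp
using System;
using System.Buffers;
using System.Globalization;
using System.Text;

var buffer = new ArrayBufferWriter<byte>();
WriteLiteral(buffer, "id="u8);
WriteFormatted(buffer, 42);
WriteLiteral(buffer, " ratio="u8);
WriteFormatted(buffer, 0.25, "F3");

// Decoding back to a string is only for display; a logger would hand the bytes to a Stream.
string text = Encoding.UTF8.GetString(buffer.WrittenSpan);
Console.WriteLine(text); // id=42 ratio=0.250

// Hypothetical helper: formats any IUtf8SpanFormattable value directly as UTF8.
static void WriteFormatted<T>(IBufferWriter<byte> writer, T value, string? format = null)
    where T : IUtf8SpanFormattable
{
    int sizeHint = 32;
    while (true)
    {
        Span<byte> span = writer.GetSpan(sizeHint);
        if (value.TryFormat(span, out int bytesWritten, format, CultureInfo.InvariantCulture))
        {
            writer.Advance(bytesWritten);
            return;
        }
        sizeHint = span.Length * 2; // too small: ask for a bigger span and retry
    }
}

// Hypothetical helper: copies pre-encoded UTF8 literals as-is.
static void WriteLiteral(IBufferWriter<byte> writer, ReadOnlySpan<byte> utf8) => writer.Write(utf8);
```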

Note that IUtf8SpanFormattable doesn't handle alignment in format strings, so Cysharp/Utf8StringInterpolation, a separate library, adds alignment support on top of it while also supporting .NET Standard 2.0.

.NET 8 TimeProvider

TimeProvider is an abstraction over time-related APIs (including TimeZone, Timer, etc.) added in .NET 8. It's very useful for unit testing and will be an essential class going forward. TimeProvider is also available on .NET Standard 2.0 and Unity, even for versions below .NET 8, through Microsoft.Bcl.TimeProvider.

So in ZLogger, you can fix the time used for log output by specifying a TimeProvider in ZLoggerOptions.

// It's better to use FakeTimeProvider from Microsoft.Extensions.TimeProvider.Testing
class FakeTime : TimeProvider
{
    public override DateTimeOffset GetUtcNow()
    {
        return new DateTimeOffset(1999, 12, 30, 11, 12, 33, TimeSpan.Zero);
    }
    
    public override TimeZoneInfo LocalTimeZone => TimeZoneInfo.Utc;
}

public class TimestampTest
{
    [Fact]
    public void LogInfoTimestamp()
    {
        var result = new List<string>();
        using var factory = LoggerFactory.Create(builder =>
        {
            builder.AddZLoggerInMemory((options, _) =>
            {
                options.TimeProvider = new FakeTime(); // Set TimeProvider to a custom one
                options.UsePlainTextFormatter(formatter =>
                {
                    // Add Timestamp to the beginning
                    formatter.SetPrefixFormatter($"{0} | ", (template, info) => template.Format(info.Timestamp));
                });
            }, x =>
            {
                x.MessageReceived += msg => result.Add(msg);
            });
        });

        var logger = factory.CreateLogger<TimestampTest>();
        logger.ZLogInformation($"Foo");

        Assert.Equal("1999-12-30 11:12:33.000 | Foo", result[0]);
    }
}

This can be effectively used when you need to test with exact matches of log output.
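The same idea works without ZLogger or xUnit. In this sketch, FormatLine and FixedTimeProvider are hypothetical names; the point is that code which only reads time through an injected TimeProvider can be pinned to a fixed instant in tests and use TimeProvider.System in production.

```csharp
using System;
using System.Globalization;

var fixedClock = new FixedTimeProvider(new DateTimeOffset(1999, 12, 30, 11, 12, 33, TimeSpan.Zero));
string line = FormatLine(fixedClock, "Foo");
Console.WriteLine(line); // 1999-12-30 11:12:33.000 | Foo

string live = FormatLine(TimeProvider.System, "Bar"); // real clock, differs per run
Console.WriteLine(live);

// Hypothetical formatter: never reads DateTime.Now, only the injected TimeProvider.
static string FormatLine(TimeProvider time, string message) =>
    time.GetUtcNow().ToString("yyyy-MM-dd HH:mm:ss.fff", CultureInfo.InvariantCulture) + " | " + message;

// Minimal fixed clock; FakeTimeProvider (Microsoft.Extensions.TimeProvider.Testing) is the fuller option.
sealed class FixedTimeProvider(DateTimeOffset now) : TimeProvider
{
    public override DateTimeOffset GetUtcNow() => now;
}
```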

Source Generator

Microsoft.Extensions.Logging provides LoggerMessageAttribute and Source Generator as standard for high-performance log output.

While this is indeed excellent for generating UTF16 strings, there’s a question mark over the Structured Logging generation part.

// This partial method
[LoggerMessage(LogLevel.Information, "My name is {name}, age is {age}.")]
public static partial void MSLog(this ILogger logger, string name, int age, int other);

// Generates this class
private readonly struct __MSLogStruct : global::System.Collections.Generic.IReadOnlyList<global::System.Collections.Generic.KeyValuePair<string, object?>>
{
    private readonly global::System.String _name;
    private readonly global::System.Int32 _age;
    private readonly global::System.Int32 _other;

    public __MSLogStruct(global::System.String name, global::System.Int32 age, global::System.Int32 other)
    {
        this._name = name;
        this._age = age;
        this._other = other;
    }

    public override string ToString()
    {
        var name = this._name;
        var age = this._age;
        return $"My name is {name}, age is {age}."; // String generation seems fast (it rides on C# 10.0's String Interpolation improvements, so no complaints!)
    }

    public static readonly global::System.Func<__MSLogStruct, global::System.Exception?, string> Format = (state, ex) => state.ToString();

    public int Count => 4;

    // This is the code for Structured Logging, but hmm...?
    public global::System.Collections.Generic.KeyValuePair<string, object?> this[int index]
    {
        get => index switch
        {
            0 => new global::System.Collections.Generic.KeyValuePair<string, object?>("name", this._name),
            1 => new global::System.Collections.Generic.KeyValuePair<string, object?>("age", this._age),
            2 => new global::System.Collections.Generic.KeyValuePair<string, object?>("other", this._other),
            3 => new global::System.Collections.Generic.KeyValuePair<string, object?>("{OriginalFormat}", "My name is {name}, age is {age}."),
            _ => throw new global::System.IndexOutOfRangeException(nameof(index)),  // return the same exception LoggerMessage.Define returns in this case
        };
    }

    public global::System.Collections.Generic.IEnumerator<global::System.Collections.Generic.KeyValuePair<string, object?>> GetEnumerator()
    {
        for (int i = 0; i < 4; i++)
        {
            yield return this[i];
        }
    }

    global::System.Collections.IEnumerator global::System.Collections.IEnumerable.GetEnumerator() => GetEnumerator();
}

[global::System.CodeDom.Compiler.GeneratedCodeAttribute("Microsoft.Extensions.Logging.Generators", "8.0.9.3103")]
public static partial void MSLog(this global::Microsoft.Extensions.Logging.ILogger logger, global::System.String name, global::System.Int32 age, global::System.Int32 other)
{
    if (logger.IsEnabled(global::Microsoft.Extensions.Logging.LogLevel.Information))
    {
        logger.Log(
            global::Microsoft.Extensions.Logging.LogLevel.Information,
            new global::Microsoft.Extensions.Logging.EventId(764917357, nameof(MSLog)),
            new __MSLogStruct(name, age, other),
            null,
            __MSLogStruct.Format);
    }
}

Since the values are exposed as KeyValuePair<string, object?>, value types such as age are boxed whenever the list is read; with this design it can't be helped.

So, ZLogger provides a similar Source Generator attribute called ZLoggerMessageAttribute. This enables UTF8 optimization and boxing-less JSON logging.

// Just change LoggerMessage to ZLoggerMessage
// Note that in the format string part of ZLoggerMessage, you can use @ for aliases and json for JSON conversion, just like in the String Interpolation version
[ZLoggerMessage(LogLevel.Information, "My name is {name}, age is {age}.")]
static partial void ZLoggerLog(this ILogger logger, string name, int age);

// This kind of code is generated
readonly struct ZLoggerLogState : IZLoggerFormattable
{
    // Pre-generate JsonEncodedText for JSON
    static readonly JsonEncodedText _jsonParameter_name = JsonEncodedText.Encode("name");
    static readonly JsonEncodedText _jsonParameter_age = JsonEncodedText.Encode("age");

    readonly string name;
    readonly int age;

    public ZLoggerLogState(string name, int age)
    {
        this.name = name;
        this.age = age;
    }

    public IZLoggerEntry CreateEntry(LogInfo info)
    {
        return ZLoggerEntry.Create(info, this);
    }

    public int ParameterCount => 2;
    public bool IsSupportUtf8ParameterKey => true;
    public override string ToString() => $"My name is {name}, age is {age}.";

    // Text messages are directly written to UTF8
    public void ToString(IBufferWriter<byte> writer)
    {
        var stringWriter = new Utf8StringWriter<IBufferWriter<byte>>(literalLength: 21, formattedCount: 2, bufferWriter: writer);
        stringWriter.AppendUtf8("My name is "u8); // Write literals directly with u8
        stringWriter.AppendFormatted(name, 0, null);
        stringWriter.AppendUtf8(", age is "u8);
        stringWriter.AppendFormatted(age, 0, null);
        stringWriter.AppendUtf8("."u8);            
        stringWriter.Flush();
    }

    // For JSON output, write directly to Utf8JsonWriter to completely avoid boxing
    public void WriteJsonParameterKeyValues(Utf8JsonWriter writer, JsonSerializerOptions jsonSerializerOptions, IKeyNameMutator? keyNameMutator = null)
    {
        // The method called differs depending on the type (WriteString, WriteNumber, etc...)
        writer.WriteString(_jsonParameter_name, this.name);
        writer.WriteNumber(_jsonParameter_age, this.age);
    }

    // Methods for extensions such as MessagePack support are actually generated below, but omitted
} 

static partial void ZLoggerLog(this global::Microsoft.Extensions.Logging.ILogger logger, string name, int age)
{
    if (!logger.IsEnabled(LogLevel.Information)) return;
    logger.Log(
        LogLevel.Information,
        new EventId(-1, nameof(ZLoggerLog)),
        new ZLoggerLogState(name, age),
        null,
        (state, ex) => state.ToString()
    );
}

By writing directly to Utf8JsonWriter and pre-generating key names as JsonEncodedText, we maximize the performance of JSON conversion.
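Here's a standalone sketch of the same technique using only System.Text.Json: encode property names once as JsonEncodedText, then use the typed Write* overloads so the int goes straight to UTF8 without boxing. The key names mirror the example above; the rest is illustrative.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Property names are escaped and transcoded to UTF8 once, up front.
JsonEncodedText nameKey = JsonEncodedText.Encode("name");
JsonEncodedText ageKey = JsonEncodedText.Encode("age");

var stream = new MemoryStream();
using (var writer = new Utf8JsonWriter(stream))
{
    writer.WriteStartObject();
    // Typed overloads: the string and the int are written as UTF8 without boxing.
    writer.WriteString(nameKey, "foo");
    writer.WriteNumber(ageKey, 33);
    writer.WriteEndObject();
} // Dispose flushes the remaining bytes to the stream

var json = Encoding.UTF8.GetString(stream.ToArray());
Console.WriteLine(json); // {"name":"foo","age":33}
```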

Also, Structured Logging is not limited to JSON; other formats are possible. For example, MessagePack could make it smaller and faster. ZLogger defines interfaces that avoid boxing even for output formats that, unlike JSON, have no dedicated built-in support.

public interface IZLoggerFormattable : IZLoggerEntryCreatable
{
    int ParameterCount { get; }

    // Used for message output
    void ToString(IBufferWriter<byte> writer);
    
    // Used for JSON output
    void WriteJsonParameterKeyValues(Utf8JsonWriter jsonWriter, JsonSerializerOptions jsonSerializerOptions, IKeyNameMutator? keyNameMutator = null);

    // Used for other structured log outputs
    ReadOnlySpan<byte> GetParameterKey(int index);
    ReadOnlySpan<char> GetParameterKeyAsString(int index);
    object? GetParameterValue(int index);
    T? GetParameterValue<T>(int index);
    Type GetParameterType(int index);
}

It’s a bit of an unusual interface, but by running a loop like this, we can eliminate the occurrence of boxing:

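for (var i = 0; i < ParameterCount; i++)
{
    var key = GetParameterKey(i);
    if (GetParameterType(i) == typeof(int))
    {
        var value = GetParameterValue<int>(i); // typed read, no boxing
    }
    // ... and so on for other types
}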

This design is the same as the usage of IDataRecord in ADO.NET. Also, in Unity, it’s common to retrieve via index to avoid allocation of arrays from native to managed.
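As a rough sketch of that index-based pattern, here is a hypothetical NameAgeState (not ZLogger's generated code) that exposes a count, keys, types, and a generic typed getter. The consumer dispatches on GetParameterType and calls GetParameterValue<int>, so the int is reinterpreted with Unsafe.As rather than boxed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

var state = new NameAgeState("foo", 33);
var sb = new StringBuilder();

// Walk parameters by index, dispatch on the reported Type, read with a typed call.
for (var i = 0; i < state.ParameterCount; i++)
{
    sb.Append(state.GetParameterKey(i)).Append('=');
    var type = state.GetParameterType(i);
    if (type == typeof(int))
        sb.Append(state.GetParameterValue<int>(i));     // stays an int: no boxing
    else if (type == typeof(string))
        sb.Append(state.GetParameterValue<string>(i));
    sb.Append(';');
}

string result = sb.ToString();
Console.WriteLine(result); // name=foo;age=33;

// Hypothetical state type for illustration only.
readonly struct NameAgeState(string name, int age)
{
    public int ParameterCount => 2;

    public string GetParameterKey(int index) => index switch
    {
        0 => "name",
        1 => "age",
        _ => throw new ArgumentOutOfRangeException(nameof(index)),
    };

    public Type GetParameterType(int index) => index switch
    {
        0 => typeof(string),
        1 => typeof(int),
        _ => throw new ArgumentOutOfRangeException(nameof(index)),
    };

    public T? GetParameterValue<T>(int index)
    {
        if (index == 0 && typeof(T) == typeof(string))
        {
            var value = name;
            return Unsafe.As<string, T>(ref value);
        }
        if (index == 1 && typeof(T) == typeof(int))
        {
            var value = age;
            return Unsafe.As<int, T>(ref value); // reinterpret the int as T (T is int here)
        }
        throw new InvalidOperationException($"Parameter {index} is not a {typeof(T)}.");
    }
}
```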

Unity

Even in Unity 2023, the officially supported C# version is 9.0. ZLogger requires the String Interpolation of C# 10.0 or higher, so it won't work normally. Normally. However, although it hasn't been officially announced, we discovered that from Unity 2022.2 the bundled compiler version has been raised, and internally it's possible to compile with C# 10.0.

You can pass compiler options through the csc.rsp file, so if you explicitly specify the language version there, all C# 10.0 syntax becomes available.

-langVersion:10

As it is, the generated csproj still specifies 9.0, so the IDE won't let you write C# 10.0. So let's overwrite LangVersion using Cysharp/CsprojModifier. If you create a file called LangVersion.props like the following and have CsprojModifier mix it in, you'll be able to write C# 10.0 in the IDE as well.


  
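<Project>
  <PropertyGroup>
    <LangVersion>10</LangVersion>
    <Nullable>enable</Nullable>
  </PropertyGroup>
</Project>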

For Unity, we’ve added an extension called AddZLoggerUnityDebug, so you can set up logging like this:

// Prepare such a global utility
public static class LogManager
{
    static ILoggerFactory loggerFactory;

    public static ILogger<T> CreateLogger<T>() => loggerFactory.CreateLogger<T>();
    public static readonly Microsoft.Extensions.Logging.ILogger Global;
    
    static LogManager()
    {
        loggerFactory = LoggerFactory.Create(logging =>
        {
            logging.SetMinimumLevel(LogLevel.Trace);
            logging.AddZLoggerUnityDebug(); // log to UnityDebug
        });
        Global = loggerFactory.CreateLogger("Logger");
        Application.exitCancellationToken.Register(() =>
        {
            loggerFactory.Dispose(); // flush when application exit.
        });
    }
}

// Try using it like this, for example
public class NewBehaviourScript : MonoBehaviour
{
    static readonly ILogger<NewBehaviourScript> logger = LogManager.CreateLogger<NewBehaviourScript>();

    void Start()
    {
        var name = "foo";
        var hp = 100;
        logger.ZLogInformation($"{name} HP is {hp}.");
    }
}

Note that the performance benefit of C# 10.0 String Interpolation applies only when using ZLog; ordinary string interpolation used to build a string will not get faster. Faster string building requires DefaultInterpolatedStringHandler in the runtime, which ships only with .NET 6 and later. If DefaultInterpolatedStringHandler doesn’t exist, the compiler falls back to the traditional string.Format, so boxing occurs as usual.
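As a rough illustration of why the runtime type matters, the sketch below calls DefaultInterpolatedStringHandler by hand in approximately the sequence the C# 10 compiler emits for $"{name} HP is {hp}." when targeting .NET 6 or later. The exact lowering is a compiler detail, so treat this as an approximation; the relevant part is that AppendFormatted<T> is generic, so hp is formatted without boxing.

```csharp
using System;
using System.Runtime.CompilerServices;

var name = "foo";
var hp = 100;

// Approximation of the compiler's lowering of $"{name} HP is {hp}.":
// literalLength = 8 (" HP is " is 7 chars, "." is 1), formattedCount = 2.
var handler = new DefaultInterpolatedStringHandler(literalLength: 8, formattedCount: 2);
handler.AppendFormatted(name);     // string overload
handler.AppendLiteral(" HP is ");
handler.AppendFormatted(hp);       // AppendFormatted<int>: generic, so hp is not boxed
handler.AppendLiteral(".");
string manual = handler.ToStringAndClear();

string interpolated = $"{name} HP is {hp}.";
Console.WriteLine(manual); // foo HP is 100.
```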

It also supports JSON structured logging, output customization, file output, and so on.

var loggerFactory = LoggerFactory.Create(logging =>
{
    logging.AddZLoggerFile("/path/to/logfile", options =>
    {
        options.UseJsonFormatter();
    });
});

As one more bonus, with Unity 2022.3.12f1 and above the bundled C# compiler is a bit newer, and if you specify -langVersion:preview, you can use C# 11.0. ZLogger's Source Generator is also enabled automatically, so you can use [ZLoggerMessage] to generate logging methods.

public static partial class LogExtensions
{
    [ZLoggerMessage(LogLevel.Debug, "Hello, {name}")]
    public static partial void Hello(this ILogger logger, string name);
}

Since the code generated by the Source Generator requires C# 11.0 (because it uses UTF8 String Literal extensively), [ZLoggerMessage] is limited to Unity 2022.3.12f1 and above.

By the way, Unity has released com.unity.logging as a standard logging library of the same kind. It allows structured logging and file output in the same way, and it had an interesting design of using Source Generator to automatically generate the class itself and generate method overloads according to arguments to avoid boxing of values. There’s a lot of talk about Burst, but I think this bold use of Source Generator is the key to performance. ZLogger is utilizing C# 10.0’s String Interpolation, but I hadn’t thought about such an approach as a workaround. It’s quite eye-opening. The performance is also quite refined.

ZLogger has better writing feel due to String Interpolation, and I’d like to think the performance is a good match… what do you think?

Conclusion

By the way, in creating ZLogger v2, @hadashiA, famous for VContainer and VYaml, helped me from idea generation to detailed implementation, and put up with repeated specification overhauls. I think this v2 has become very complete, but I wouldn’t have reached this point alone, so I’m very grateful.

Anyway, I think ZLogger has become the strongest logger in terms of both ease of use and performance, so please give it a try.

ConsoleAppFramework v5 — Zero Overhead, Native AOT-compatible CLI Framework for C#

Yoshifumi Kawai — Fri, 14 Jun 2024 10:09:33 GMT


We have released a completely new version of ConsoleAppFramework. It is a brand new framework that has been completely redesigned and reimplemented from scratch. With the design principles of “Zero Dependency, Zero Overhead, Zero Reflection, Zero Allocation, AOT Safe”, it achieves overwhelming performance that outpaces others by a wide margin.

This benchmark is for cold startup without any warm-up, which we believe is the most relevant to actual usage in CLI applications. Compared to System.CommandLine, it’s 280 times faster! The amount of memory allocation is also 100 to 1000 times less than other frameworks (the 400B shown is almost entirely system allocation, so the framework itself is 0).

This performance is achieved by generating everything with Source Generators. For example, consider the following code:

using ConsoleAppFramework;

// args: ./cmd --foo 10 --bar 20
ConsoleApp.Run(args, (int foo, int bar) => Console.WriteLine($"Sum: {foo + bar}"));

ConsoleAppFramework’s Source Generator analyzes the arguments of the lambda expression passed to Run and generates the Run method itself.

internal static partial class ConsoleApp
{
    // Generate the Run method itself with arguments and body to match the lambda expression
    public static void Run(string[] args, Action<int, int> command)
    {
        // code body
    }
}

Normally, C#’s Source Generators are triggered by attributes given to classes or methods, but ConsoleAppFramework monitors method invocations and uses them as the key for generation. This idea is inspired by Rust’s macros. In Rust, there are classifications like Attribute-like macros and Function-like macros, and this approach can be considered a Function-like style.

The actual generated code in its entirety looks something like this:

internal static partial class ConsoleApp
{
    public static void Run(string[] args, Action<int, int> command)
    {
        if (TryShowHelpOrVersion(args, 2, -1)) return;

        var arg0 = default(int);
        var arg0Parsed = false;
        var arg1 = default(int);
        var arg1Parsed = false;

        try
        {
            for (int i = 0; i < args.Length; i++)
            {
                var name = args[i];

                switch (name)
                {
                    case "--foo":
                    {
                        if (!TryIncrementIndex(ref i, args.Length) || !int.TryParse(args[i], out arg0)) { ThrowArgumentParseFailed("foo", args[i]); }
                        arg0Parsed = true;
                        break;
                    }
                    case "--bar":
                    {
                        if (!TryIncrementIndex(ref i, args.Length) || !int.TryParse(args[i], out arg1)) { ThrowArgumentParseFailed("bar", args[i]); }
                        arg1Parsed = true;
                        break;
                    }
                    default:
                        // omit...(case-insensitive compare codes)
                        ThrowArgumentNameNotFound(name);
                        break;
                }
            }
            if (!arg0Parsed) ThrowRequiredArgumentNotParsed("foo");
            if (!arg1Parsed) ThrowRequiredArgumentNotParsed("bar");

            command(arg0!, arg1!);
        }
        catch (Exception ex)
        {
            Environment.ExitCode = 1;
            if (ex is ValidationException or ArgumentParseFailedException)
            {
                LogError(ex.Message);
            }
            else
            {
                LogError(ex.ToString());
            }
        }
    }

    static partial void ShowHelp(int helpId)
    {
        Log("""
Usage: [options...] [-h|--help] [--version]

Options:
  --foo      (Required)
  --bar      (Required)
""");
    }
}

It looks like straightforward, simple code without any twists, doesn’t it? That’s important! The simpler the code, the faster it is! It’s a framework, yet the code is simple, and that’s why it’s fast. There is no extraneous code, and all the processing is aggregated in the method body itself, achieving zero framework overhead and the same speed as optimized handwritten code.

CLI applications typically run once from a cold start, which makes dynamic code generation (IL.Emit or Expression.Compile) and caching (building ArrayPool buffers or Dictionaries to speed up later matching) ineffective: building them costs more than it saves. On the other hand, using reflection directly is slow in itself. ConsoleAppFramework dramatically speeds up single-shot execution by generating all the necessary processing inline.

With no reflection, it also has overwhelming affinity with Native AOT, eliminating any disadvantages of C# in terms of cold startup speed.

Another feature is that since everything, including the ConsoleApp class itself, is generated by the Source Generator, there are no runtime dependencies at all, not even on a ConsoleAppFramework assembly.

There are various situations for creating console applications. Sometimes it’s a large batch application with many dependencies, and other times it’s a tiny single-function command. When creating a small command, you wouldn’t want to add any additional dependencies at all. Adding a reference to Microsoft.Extensions.Hosting alone brings in dozens of dependent DLLs! With ConsoleAppFramework, there are zero dependencies, including itself.

The advantage of zero dependencies is obviously a smaller binary size. Especially with Native AOT, binary size is a concern, but with ConsoleAppFramework, the additional cost is nearly zero.

And of course, a single function is not enough for a framework, so the following features are implemented. The rich set of features should be on par with other frameworks.

SIGINT/SIGTERM (Ctrl+C) handling with graceful shutdown via CancellationToken
Filter (middleware) pipeline to intercept before/after execution
Exit code management
Support for async commands
Registration of multiple commands
Registration of nested commands
Setting option aliases and descriptions from XML documentation comments
System.ComponentModel.DataAnnotations attribute-based validation
Dependency Injection for command registration by type and public methods
Microsoft.Extensions (Logging, Configuration, etc.) integration
High-performance value parsing via ISpanParsable
Parsing of params arrays
Parsing of JSON arguments
Help (-h|--help) option builder
Default version (--version) option

The generated code is modularized and varies depending on the features used by the code, always generating the minimum code required to implement that feature. This allows it to balance functionality and performance. Additionally, every feature has been carefully tuned to run at the fastest possible speed, so even with all features enabled, it remains overwhelmingly fast compared to others.

As an aside, creating a delegate does allocate, so strictly speaking it’s not truly zero allocation and zero overhead. However, ConsoleAppFramework provides a mechanism to achieve true zero allocation: pass a static function as a function pointer, as follows:

unsafe
{
    ConsoleApp.Run(args, &Sum);
}

static void Sum(int x, int y) => Console.Write(x + y);

Then it generates a Run method with a delegate* managed<int, int, void> parameter (it may be unfamiliar, but C# has a language feature called managed function pointers).

public static unsafe void Run(string[] args, delegate* managed<int, int, void> command)

Now it’s completely and indisputably zero allocation and zero overhead!

High-performance Value Conversion

What is the fastest way to convert a string to a C# value? For int, it’s int.TryParse, right? What about other types? When the type is hardcoded as int it's easy, but how do you make string -> T (or object) generic? It gets a bit tricky, and in the past TypeConverter was used. Of course, its performance is poor.

Alternatively, since JsonSerializer is now built in, you could delegate to it. Of course, its performance is not particularly good either. Especially for cold startup, JsonSerializer has to build its caches first, which adds significant overhead for single-shot execution.

ConsoleAppFramework adopts IParsable<TSelf> and ISpanParsable<TSelf>. These were added in .NET 7 and use static abstract interface members, which were added in C# 11.

public interface IParsable<TSelf> where TSelf : IParsable<TSelf>?
{
    static abstract TSelf Parse(string s, IFormatProvider? provider);
    static abstract bool TryParse([NotNullWhen(true)] string? s, IFormatProvider? provider, [MaybeNullWhen(returnValue: false)] out TSelf result);
}

Finally, with C# 11, a generic “string -> value” conversion mechanism has been realized! ConsoleAppFramework adopts it without hesitation, since its minimum requirement is .NET 8/C# 12. Newer numeric types such as Half (.NET 5) and Int128 (.NET 7), as well as user-defined types that implement IParsable<TSelf>, can all be parsed with high performance!

However, for basic types like int, the Source Generator already knows it’s an int, so it directly executes int.TryParse.
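A minimal sketch of the generic “string -> T” conversion that IParsable<T> enables. ParseValue is a hypothetical helper, not ConsoleAppFramework's API; it relies only on the static abstract TryParse member (C# 11, .NET 7 or later), so any implementing type works, whether built in or user-defined.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: one generic conversion for every IParsable<T>.
// T.TryParse is a call to a static abstract interface member.
static T ParseValue<T>(string s) where T : IParsable<T>
    => T.TryParse(s, CultureInfo.InvariantCulture, out var result)
        ? result
        : throw new FormatException($"'{s}' is not a valid {typeof(T).Name}.");

int count = ParseValue<int>("42");
Guid id = ParseValue<Guid>("0f8fad5b-d9cb-469f-a165-70867728950e");
Int128 big = ParseValue<Int128>("170141183460469231731687303715884105727");
Half half = ParseValue<Half>("1.5");

Console.WriteLine($"{count} {id} {big} {half}");
```

A generator does not even need this generic path for int, since it can emit int.TryParse directly; the generic form matters for types it cannot special-case.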

As for value binding, it also supports params arrays and default values.

ConsoleApp.Run(args, (
    [Argument]DateTime dateTime,  // Argument
    [Argument]Guid guidvalue,     //
    int intVar,                   // required
    bool boolFlag,                // flag
    MyEnum enumValue,             // enum
    int[] array,                  // array
    MyClass obj,                  // object
    string optional = "abcde",    // optional
    double? nullableValue = null, // nullable
    params string[] paramsArray
    ) => { });

C# 12 has just added the ability to use default values and params in lambda expressions, which is reflected here.
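For reference, here is that C# 12 feature in isolation (plain C#, no ConsoleAppFramework involved). The names are illustrative only: a lambda with a default value and a params array gets a compiler-synthesized delegate type that preserves both, so it can be called with fewer or more arguments.

```csharp
using System;

// C# 12: lambda parameters may have default values and a params array.
// `var` infers a compiler-synthesized anonymous delegate type that keeps both.
var describe = (string name = "world", params int[] values) => $"{name}:{values.Length}";

Console.WriteLine(describe());            // world:0
Console.WriteLine(describe("sum", 1, 2)); // sum:2
```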

Defining with Document Comments

In the past, or in other frameworks, adding Description and Alias was done using attributes. However, assigning attributes to each parameter of a method, especially with quite long strings, makes the method significantly less readable.

So ConsoleAppFramework decided to utilize document comments.

class Commands
{
    /// <summary>
    /// Display Hello.
    /// </summary>
    /// <param name="message">-m, Message to show.</param>
    public static void Hello(string message) => Console.Write($"Hello, {message}");
}

This becomes a command like:

Usage: [options...] [-h|--help] [--version]

Display Hello.

Options:
  -m|--message     Message to show. (Required)

With document comments, the method keeps a natural appearance even with many arguments. This approach is a strength of Source Generators: the comments are read directly from the source code, so no generated .xml documentation file is needed. (However, some hacks were needed to make document comments readable from a Source Generator in every environment.)

Adding Multiple Commands

ConsoleApp.Run is a shortcut for a single command, but it's also possible to add multiple commands and nested subcommands. For example, let's look at the generation example when the following configuration is made.

var app = ConsoleApp.Create();

app.Add("foo", () => { });
app.Add("foo bar", (int x, int y) => { });
app.Add("foo bar barbaz", (DateTime dateTime) => { });
app.Add("foo baz", async (string foo = "test", CancellationToken cancellationToken = default) => { });

app.Run(args);

The Add in this code is expanded as follows. The Source Generator knows the types of all the lambda expressions being added, so it assigns them to fields with unique types.

partial struct ConsoleAppBuilder
{
    Action command0 = default!;
    Action<int, int> command1 = default!;
    Action<DateTime> command2 = default!;
    Func<string, CancellationToken, Task> command3 = default!;

    partial void AddCore(string commandName, Delegate command)
    {
        switch (commandName)
        {
            case "foo":
                this.command0 = Unsafe.As<Action>(command);
                break;
            case "foo bar":
                this.command1 = Unsafe.As<Action<int, int>>(command);
                break;
            case "foo bar barbaz":
                this.command2 = Unsafe.As<Action<DateTime>>(command);
                break;
            case "foo baz":
                this.command3 = Unsafe.As<Func<string, CancellationToken, Task>>(command);
                break;
            default:
                break;
        }
    }
}

This prevents the need for arrays to hold Delegates and the reflection/boxing overhead of invoking them as Delegates.
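The difference between holding a plain Delegate and holding a field of the exact type can be sketched in a few lines. DynamicInvoke stands in for the generic fallback a framework without code generation might use; Unsafe.As<T> is what the generated AddCore above uses. It skips the runtime type check, which is only safe because the generator knows the exact delegate type; an ordinary cast would also work, with a check.

```csharp
using System;
using System.Runtime.CompilerServices;

// AddCore(string, Delegate) only sees the lambda as a System.Delegate.
Delegate stored = (int x, int y) => x + y;

// Generic path: DynamicInvoke goes through reflection and boxes the arguments and result.
object? viaReflection = stored.DynamicInvoke(3, 4);

// Generated-code path: the exact type is known at compile time, so the Delegate is
// reinterpreted once (no runtime type check) and then invoked directly.
Func<int, int, int> typed = Unsafe.As<Func<int, int, int>>(stored);
int direct = typed(3, 4);

Console.WriteLine($"{viaReflection} {direct}"); // 7 7
```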

In Run, a switch with constant strings is embedded to select the command from string[] args.

partial void RunCore(string[] args)
{
    if (args.Length == 0)
    {
        ShowHelp(-1);
        return;
    }
    switch (args[0])
    {
        case "foo":
            if (args.Length == 1)
            {
                RunCommand0(args, args.AsSpan(1), command0);
                return;
            }
            switch (args[1])
            {
                case "bar":
                    if (args.Length == 2)
                    {
                        RunCommand1(args, args.AsSpan(2), command1);
                        return;
                    }
                    switch (args[2])
                    {
                        case "barbaz":
                            RunCommand2(args, args.AsSpan(3), command2);
                            break;
                        default:
                            RunCommand1(args, args.AsSpan(2), command1);
                            break;
                    }
                    break;
                case "baz":
                    RunCommand3(args, args.AsSpan(2), command3);
                    break;
                default:
                    RunCommand0(args, args.AsSpan(1), command0);
                    break;
            }
            break;
        default:
            ShowHelp(-1);
            break;
    }
}

The fastest way in C# to jump from a string to specific code is a switch over string constants. The compiler's lowering for it has been revised several times; in C# 12, as described in the dotnet/roslyn issue “Performance: faster switch over string objects” (#56374), it first checks the length and then narrows down to a single character position where the candidates differ.

This is faster than a Dictionary lookup and has no initialization cost or allocations, which is the strength of leveraging the C# compiler. Such processing is only possible with the Source Generator approach, which outputs the C# code itself. So it's absolutely the fastest.

DI, CancellationToken, and Lifetime

In addition to parameters that become command-line options, arguments can also be types you want to receive via DI (such as ILogger<T> or IOptions<T>) and specially handled types such as ConsoleAppContext and CancellationToken.

Receiving via DI is useful when the console application wants to share configuration files with ASP.NET projects. For such cases, integration with Microsoft.Extensions.Hosting is possible.

Also, when a CancellationToken is passed, lifetime management that hooks SIGINT/SIGTERM (Ctrl+C), as expected of a console application, becomes active.

await ConsoleApp.RunAsync(args, async (int foo, CancellationToken cancellationToken) =>
{
    await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
    Console.WriteLine($"Foo: {foo}");
});

The above code is expanded as follows:

using var posixSignalHandler = PosixSignalHandler.Register(ConsoleApp.Timeout);
var arg0 = posixSignalHandler.Token;

await Task.Run(() => command(arg0!)).WaitAsync(posixSignalHandler.TimeoutToken);

Using PosixSignalRegistration, added in .NET 6, it hooks SIGINT/SIGTERM and cancels the CancellationToken. At the same time, it suppresses immediate termination (normally pressing Ctrl+C aborts the process immediately, but now it no longer does).

This leaves room for the application to properly handle the CancellationToken.

However, if the CancellationToken is not handled, the termination request is simply ignored, which is troublesome in itself, so a forced-termination timeout is set. It defaults to 5 seconds but can be changed freely with the ConsoleApp.Timeout property. To turn off forced termination, specify ConsoleApp.Timeout = Timeout.InfiniteTimeSpan.

Task.WaitAsync was added in .NET 6. Besides a TimeSpan, it also accepts a CancellationToken, so the timeout can be conditional: the timeout token starts counting down only after PosixSignalRegistration fires, rather than a fixed number of seconds from startup.
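A simplified sketch of the signal-plus-timeout mechanism using only BCL APIs (PosixSignalRegistration, CancellationTokenSource.CancelAfter, Task.WaitAsync). It is not ConsoleAppFramework's PosixSignalHandler: RequestStop and OnSignal are hypothetical names, the timeout is shortened to 200 ms so the demo finishes quickly, and Ctrl+C is simulated by calling RequestStop directly.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

// Shortened for the demo; the framework's documented default is 5 seconds.
var timeout = TimeSpan.FromMilliseconds(200);
using var tokenSource = new CancellationTokenSource();   // handed to the command
using var timeoutSource = new CancellationTokenSource(); // fires the forced termination

void RequestStop()
{
    tokenSource.Cancel();               // ask the command to stop gracefully
    timeoutSource.CancelAfter(timeout); // countdown starts only once a signal arrives
}

void OnSignal(PosixSignalContext context)
{
    context.Cancel = true; // suppress the default immediate termination
    RequestStop();
}

using var sigint = PosixSignalRegistration.Create(PosixSignal.SIGINT, OnSignal);
using var sigterm = PosixSignalRegistration.Create(PosixSignal.SIGTERM, OnSignal);

// A badly behaved command that ignores its CancellationToken.
var command = Task.Run(() => Thread.Sleep(TimeSpan.FromSeconds(10)));

RequestStop(); // simulate Ctrl+C for the demo

var forced = false;
try
{
    await command.WaitAsync(timeoutSource.Token);
}
catch (OperationCanceledException)
{
    forced = true; // stopped waiting once the timeout elapsed after the "signal"
}

Console.WriteLine($"graceful stop requested: {tokenSource.IsCancellationRequested}, forced: {forced}");
```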

Filter Pipeline

ConsoleAppFramework adopts Filters as a mechanism to hook before and after execution. Also known as the middleware pattern, it’s a pattern often seen in languages that support async/await.

internal class NopFilter(ConsoleAppFilter next) : ConsoleAppFilter(next) // ctor needs `ConsoleAppFilter next` and call base(next)
{
    // implement InvokeAsync as filter body
    public override async Task InvokeAsync(ConsoleAppContext context, CancellationToken cancellationToken)
    {
        try
        {
            /* on before */
            await Next.InvokeAsync(context, cancellationToken); // invoke next filter or command body
            /* on after */
        }
        catch
        {
            /* on error */
            throw;
        }
        finally
        {
            /* on finally */
        }
    }
}

This design pattern is truly excellent, and if you need to provide a mechanism to hook execution, I highly recommend adopting this pattern. If async/await existed in the GoF era, it would have been included as an important design pattern.

The README introduces logging execution time, customizing the ExitCode, prohibiting multiple executions, and authentication as things that can be done with filters. It’s wonderful that such varied processing can be implemented with a single Task InvokeAsync method.

There are various approaches to designing filters, but ConsoleAppFramework chose the method that yields the highest performance. By receiving Next in the constructor and determining all the filters to be used statically at code generation time (dynamic addition is not allowed), everything is embedded and assembled.

app.UseFilter<NopFilter>();
app.UseFilter<NopFilter>();
app.UseFilter<NopFilter>();
app.UseFilter<NopFilter>();
app.UseFilter<NopFilter>();

// The above code will generate the following code:

sealed class Command0Invoker(string[] args, Action command) : ConsoleAppFilter(null!)
{
    public ConsoleAppFilter BuildFilter()
    {
        var filter0 = new NopFilter(this);
        var filter1 = new NopFilter(filter0);
        var filter2 = new NopFilter(filter1);
        var filter3 = new NopFilter(filter2);
        var filter4 = new NopFilter(filter3);
        return filter4;
    }

    public override Task InvokeAsync(ConsoleAppContext context, CancellationToken cancellationToken)
    {
        return RunCommand0Async(context.Arguments, args, command, context, cancellationToken);
    }
}

This avoids intermediate array allocations and lambda capture allocations; the only additional cost is one object per filter plus one for the invoker wrapping the method body. Also, if the returned Task completes synchronously, a cached completed task equivalent to Task.CompletedTask is used, so there’s no need to make it a ValueTask.

Writing code that only receives Next in the constructor and passes it to base has become easy thanks to primary constructors in C# 12.
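The nesting that BuildFilter performs can be illustrated without any framework types. The sketch below uses hypothetical names and plain delegates instead of ConsoleAppFilter subclasses: it wraps a command body in two filters and records the order in which the before/after/finally sections run. Unlike the class-based design above, this delegate version does allocate closures.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var log = new List<string>();

// Innermost step: the command body (hypothetical).
Func<Task> body = () =>
{
    log.Add("command");
    return Task.CompletedTask;
};

// A "filter" receives the next step and returns a wrapped step, the delegate
// analogue of a filter whose constructor receives `next`.
Func<Func<Task>, Func<Task>> Filter(string name) => next => async () =>
{
    log.Add($"{name}:before");
    try
    {
        await next();
        log.Add($"{name}:after");
    }
    finally
    {
        log.Add($"{name}:finally");
    }
};

// The chain is assembled once, up front, by nesting: outer wraps inner wraps body.
var pipeline = Filter("outer")(Filter("inner")(body));
await pipeline();

Console.WriteLine(string.Join(", ", log));
```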

Command-line Argument Syntax

Apart from being split on spaces and passed in as string[] args, command-line arguments have no fixed format. -- and - are commonly assumed to mark parameters, but in reality anything goes, and on Windows even / is often used.

That said, there are some standard rules to some extent. The most well-known are probably the POSIX standard and its extension, the GNU Coding Standards. ConsoleAppFramework also follows the POSIX standard to some extent and includes the --version and --help defined in the GNU Coding Standards as built-in options. The names are also --lower-kebab-case by default.

“To some extent” means that it doesn’t fully conform to the standard. Whether standard or traditional convention, quite a few old rules are unacceptable from a modern perspective. For example, giving -x and -X different behaviors is an absolute no-no. Even widely used practices like bundling, where -fdx is interpreted as -f, -d, -x, are not very good in my opinion. Bundling is also problematic for performance, as it complicates parsing.

Since ConsoleAppFramework prioritizes performance, it does not adopt rules that may cause performance issues. Option names are matched case-insensitively, but an exact (lower-case) match is tried first and the case-insensitive comparison runs only as a fallback, so there is no practical performance degradation.
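A hand-written sketch of that parsing shape for the (int foo, int bar) command from earlier: an exact string switch as the fast path, and a case-insensitive comparison only in the default branch. The generated code shown earlier elides its case-insensitive part, so this fallback is an assumption about its shape, and Parse/ReadInt are hypothetical names.

```csharp
using System;

// Hypothetical equivalent of the generated parsing for (int foo, int bar).
(int Foo, int Bar) Parse(string[] args)
{
    int foo = 0, bar = 0;
    bool fooParsed = false, barParsed = false;

    for (int i = 0; i < args.Length; i++)
    {
        var name = args[i];
        switch (name)
        {
            // Fast path: exact matches, compiled into a string switch.
            case "--foo": foo = ReadInt(args, ref i, "foo"); fooParsed = true; break;
            case "--bar": bar = ReadInt(args, ref i, "bar"); barParsed = true; break;
            default:
                // Slow path, reached only when the exact match failed.
                if (string.Equals(name, "--foo", StringComparison.OrdinalIgnoreCase))
                {
                    foo = ReadInt(args, ref i, "foo"); fooParsed = true;
                }
                else if (string.Equals(name, "--bar", StringComparison.OrdinalIgnoreCase))
                {
                    bar = ReadInt(args, ref i, "bar"); barParsed = true;
                }
                else
                {
                    throw new ArgumentException($"Argument '{name}' is not recognized.");
                }
                break;
        }
    }

    if (!fooParsed) throw new ArgumentException("Required argument 'foo' was not specified.");
    if (!barParsed) throw new ArgumentException("Required argument 'bar' was not specified.");
    return (foo, bar);
}

// Consumes the value after an option name and advances the shared loop index.
static int ReadInt(string[] args, ref int i, string name)
{
    if (i + 1 >= args.Length || !int.TryParse(args[i + 1], out var value))
        throw new ArgumentException($"Argument '{name}' failed to parse.");
    i++;
    return value;
}

var result = Parse(new[] { "--foo", "10", "--BAR", "20" });
Console.WriteLine($"Sum: {result.Foo + result.Bar}"); // Sum: 30
```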

Looking at Overview of System.CommandLine command-line syntax — .NET | Microsoft Learn, it’s clear that System.CommandLine allows for quite flexible syntax interpretation. That’s a very good thing! It’s a good thing, but if it causes performance degradation, it’s a problem. And in fact, as evident from the benchmark results, the performance of System.CommandLine is very poor. This is unacceptable.

The wandering System.CommandLine seems to be splitting itself up again and changing its implementation. Under “Resetting System.CommandLine”, it aims to ship a small core POSIX-standard parser as part of .NET 9 or .NET 10.

Even if they are adopted as standard, from a performance perspective, they will absolutely never surpass ConsoleAppFramework.

Compatibility with v4

Breaking changes! Not shying away from breaking changes is a good thing: it keeps innovation unhindered and is necessary to stay cutting-edge. Running at the forefront of C# is also part of Cysharp’s identity. At the same time, of course, it’s a huge inconvenience. The change from v4 to v5 is like the change from .NET Framework to .NET Core, or from ASP.NET to ASP.NET Core, so it can’t be helped; it was an absolutely necessary change…

However, in reality, it hasn’t changed that much. The name conversion logic (lower-kebab-case) uses the same logic, so there’s no concern about names going out of sync. It’s just a matter of mapping the method names that cause compile errors. That happens quite often, right?

var app = ConsoleApp.Create(args); app.Run(); -> var app = ConsoleApp.Create(); app.Run(args);
app.AddCommand/AddSubCommand -> app.Add(string commandName)
app.AddRootCommand -> app.Add("")
app.AddCommands<T> -> app.Add<T>
app.AddSubCommands<T> -> app.Add<T>(string commandPath)
app.AddAllCommandType -> NotSupported (use Add<T> manually)
[Option(int index)] -> [Argument]
[Option(string shortName, string description)] -> XML document comment
ConsoleAppFilter.Order -> NotSupported (global -> class -> method declarative order)
ConsoleAppOptions.GlobalFilters -> app.UseFilter<T>

Overall, I think the specification changes can be considered as simplifications, in other words, “improvements”.

Also, not relying on Microsoft.Extensions.Hosting by default is a big difference, but it can be resolved by adding one line. Running on top of Hosting just means using the ServiceProvider built by Hosting, nothing more. Hosting also provides lifetime management, but ConsoleAppFramework handles that on its own, so in practice there's no difference as long as you pass the ServiceProvider for DI.

using Microsoft.Extensions.Hosting;

using var host = Host.CreateDefaultBuilder().Build(); // use using for host lifetime
ConsoleApp.ServiceProvider = host.Services; // IHost exposes its container as Services

In v4, ConsoleAppBase had to be inherited, but in v5, POCO is sufficient. Instead, please receive ConsoleAppContext and CancellationToken via constructor injection. This has also become less troublesome thanks to primary constructors in C# 12. This is another reason for abandoning the mechanism that requires a base class.

True Incremental Generator

Incremental Generators, if you just create them without any consideration, don’t actually become Incremental.

The first thing to do is to make it visible whether it is Incremental or not. Normally, the internal state is completely invisible when running, so it’s important to make it possible to check the state in unit tests. For example, a unit test like this is written.

    [Fact]
    public void RunLambda()
    {
        var step1 = """
using ConsoleAppFramework;

ConsoleApp.Run(args, int () => 0);
""";

        var step2 = """
using ConsoleAppFramework;

ConsoleApp.Run(args, int () => 100); // body change

Console.WriteLine("foo"); // unrelated line
""";

        var step3 = """
using ConsoleAppFramework;

ConsoleApp.Run(args, int (int x, int y) => 100); // change signature

Console.WriteLine("foo");
""";

        var reasons = CSharpGeneratorRunner.GetIncrementalGeneratorTrackedStepsReasons("ConsoleApp.Run.", step1, step2, step3);

        reasons[0][0].Reasons.Should().Be("New");
        reasons[1][0].Reasons.Should().Be("Unchanged");
        reasons[2][0].Reasons.Should().Be("Modified");

        VerifySourceOutputReasonIsCached(reasons[1]);
        VerifySourceOutputReasonIsNotCached(reasons[2]);
    }

When you run the Driver with the trackIncrementalGeneratorSteps: true option for an Incremental Generator, the state of each step becomes visible. IncrementalStepRunReason has states like New, Unchanged, Modified, Cached, and Removed, and if the step before the final output is Unchanged or Cached, the output processing is skipped.

In the unit test above, step2 only changes parts that don’t affect the output code, so it’s Unchanged, and the final output stage is Cached. step3 has changes that require regeneration, so it’s Modified and goes through the source generation process.

IncrementalStepRunReason can be retrieved from TrackedSteps, but it's too hard to read as is, so the GetIncrementalGeneratorTrackedStepsReasons utility method formats it for easier checking.

public static (string Key, string Reasons)[][] GetIncrementalGeneratorTrackedStepsReasons(string keyPrefixFilter, params string[] sources)
{
    var parseOptions = new CSharpParseOptions(LanguageVersion.CSharp12); // 12
    var driver = CSharpGeneratorDriver.Create(
        [new ConsoleAppGenerator().AsSourceGenerator()],
        driverOptions: new GeneratorDriverOptions(IncrementalGeneratorOutputKind.None, trackIncrementalGeneratorSteps: true))
        .WithUpdatedParseOptions(parseOptions);

    var generatorResults = sources
        .Select(source =>
        {
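            // baseCompilation: a CSharpCompilation with the required references, defined in the test helper (not shown here)
            var compilation = baseCompilation.AddSyntaxTrees(CSharpSyntaxTree.ParseText(source, parseOptions));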
            driver = driver.RunGenerators(compilation);
            return driver.GetRunResult().Results[0];
        })
        .ToArray();

    var reasons = generatorResults
        .Select(x => x.TrackedSteps
            .Where(x => x.Key.StartsWith(keyPrefixFilter) || x.Key == "SourceOutput")
            .Select(x =>
            {
                if (x.Key == "SourceOutput")
                {
                    var values = x.Value.Where(x => x.Inputs[0].Source.Name?.StartsWith(keyPrefixFilter) ?? false);
                    return (
                        x.Key,
                        Reasons: string.Join(", ", values.SelectMany(x => x.Outputs).Select(x => x.Reason).ToArray())
                    );
                }
                else
                {
                    return (
                        Key: x.Key.Substring(keyPrefixFilter.Length),
                        Reasons: string.Join(", ", x.Value.SelectMany(x => x.Outputs).Select(x => x.Reason).ToArray())
                    );
                }
            })
            .OrderBy(x => x.Key)
            .ToArray())
        .ToArray();

    return reasons;
}

It’s a mess and hard to follow, which reflects how hard TrackedSteps itself is to read as is. Since TrackedSteps is an ImmutableDictionary, its enumeration order is effectively arbitrary and hard to check, so I added numbering and sorting. Also, when multiple RegisterSourceOutput pipelines are running (ConsoleAppFramework has two: Run-based and Builder-based), it gets confusing when they are mixed together, so I added filtering by keyPrefix.

Summary

Originally, ConsoleAppFramework was unique among Cysharp’s product lines in that it didn’t prioritize performance. It was built around the then-rare concept of a CLI framework integrated with Hosting (Microsoft.Extensions.Hosting), and it achieved some success. A few revisions made the help output richer and allowed writing in a Minimal API-like style, but its clunkiness became noticeable.

In particular, Cocona is a truly excellent library that was influenced by ConsoleAppFramework while offering more flexibility and powerful features. At this rate, ConsoleAppFramework would be just an inferior version, which was a concern. It’s painful to not be able to recommend it with confidence as the best. After all, the creator of Cocona is a colleague at Cysharp…

So this time, while taking influence from some of Cocona’s APIs (like [Argument]), I strived to make it a framework with a completely different character. As explained in the parsing section, ConsoleAppFramework v5 sacrifices some flexibility for performance, so if you need rich functionality, I recommend using System.CommandLine or Cocona.

Also, from a performance perspective, the longer the actual execution time, the less the framework overhead matters. If the processing takes 10 minutes, 1 minute, or even 10 seconds, whether the framework portion takes 1ms or 50ms is within the margin of error. That holds even with JIT compilation, but given the recent attention to Native AOT and cold-start speed, overhead can’t be dismissed outright, and faster is certainly better.

While the advantages of performance and zero dependencies are obvious, I believe it has also become a unique and interesting framework in terms of approach and design. Please give it a try! Of course, it’s also extremely practical, so you could consider it an essential library without hesitation!

https://github.com/Cysharp/ConsoleAppFramework

How to create a modern C# web API client: An example implementation of the C# SDK for Anthropic Claude

Yoshifumi Kawai — Sun, 24 Mar 2024 21:21:25 GMT

Anthropic Claude 3, a recently emerged rising star among LLMs, has exceptionally high performance and surpasses GPT-4! I am greatly impressed by it. Therefore, I wanted to use it with C#, but since there was no SDK available, I created an unofficial one. The library is named Claudia, derived from Claude. It can be used across the .NET ecosystem, and I have confirmed its functionality in both Unity Runtime and Editor, so I believe it can be utilized in various ways depending on your ideas.

GitHub — Cysharp/Claudia

To give you an idea of what style of Web API SDK you can create in C#, please take a look at the Claudia usage example first.

The primary design principle in creating this SDK was to make it as similar as possible to the official Python SDK and TypeScript SDK. This is because the explanations in the documentation will be based on these official SDKs, and many articles in the world will also be based on the official SDKs. You may also want to use the official prompt library with API requests.

If the API style differs, translating between the two styles adds cognitive load. It may seem trivial, but it matters and can become a stumbling block, so I eliminated it thoroughly. On top of that, it’s important to keep the design C#-like without forcibly introducing dynamic elements.

The appearance of the C# client looks like this:

// C#
using Claudia;

var anthropic = new Anthropic();

var message = await anthropic.Messages.CreateAsync(new()
{
    Model = "claude-3-opus-20240229",
    MaxTokens = 1024,
    Messages = [new() { Role = "user", Content = "Hello, Claude" }]
});

Console.WriteLine(message);

For comparison, the TypeScript version looks like this:

// TypeScript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

const message = await anthropic.messages.create({
    model: 'claude-3-opus-20240229',
    max_tokens: 1024,
    messages: [{ role: 'user', content: 'Hello, Claude' }],
});

console.log(message.content);

They are quite similar, right? On top of that, the C# version doesn’t use dynamic or Dictionary, and everything is specified with typed objects. The example above utilizes Target-typed new expressions added in C# 9.0 and Collection expressions added in C# 12, which are assumed to exist and are used to match the API nicely.
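If target-typed new and collection expressions are unfamiliar, here is a minimal sketch of both using only BCL types (no Claudia types are involved). It assumes C# 12 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Target-typed new (C# 9): the constructed type comes from the declaration on the left.
StringBuilder sb = new();
Dictionary<string, int> maxTokens = new() { ["opus"] = 1024 };

// Collection expressions (C# 12): [...] converts to arrays, List<T>, spans, and more.
int[] numbers = [1, 2, 3];
List<string> roles = ["user", "assistant"];
int[] combined = [.. numbers, 4]; // ".." spreads an existing collection into the new one

sb.Append(string.Join(",", combined));
Console.WriteLine(sb.ToString()); // prints "1,2,3,4"
```

In the Claudia example, the same two mechanisms let `new() { ... }` and `[new() { ... }]` pick up their types from the request object's property types, which is why no type names need to appear at the call site.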

Often, APIs of dynamically typed languages appear (visually) simpler and easier to use, so being able to write with the same level of simplicity while being properly typed is a significant strength of modern C#. (The reason I decided to match the official TypeScript SDK in the first place was that I thought the API style of the official SDK was well-designed from my perspective; if it were terrible, I wouldn’t have attempted to match it.)

Streaming and Blazor

The Streaming API is also available, and when combined with Blazor, it’s easy to create a real-time updating Chat UI. The code is really just this, with the method body being just over 10 lines!

[Inject]
public required Anthropic Anthropic { get; init; }

double temperature = 1.0;
string textInput = "";
string systemInput = SystemPrompts.Claude3;
List<Message> chatMessages = new();

async Task SendClick()
{
    chatMessages.Add(new() { Role = Roles.User, Content = textInput });

    var stream = Anthropic.Messages.CreateStreamAsync(new()
    {
        Model = Models.Claude3Opus,
        MaxTokens = 1024,
        Temperature = temperature,
        System = string.IsNullOrWhiteSpace(systemInput) ? null : systemInput,
        Messages = chatMessages.ToArray()
    });

    var currentMessage = new Message { Role = Roles.Assistant, Content = "" };
    chatMessages.Add(currentMessage);

    textInput = "";
    StateHasChanged();

    await foreach (var messageStreamEvent in stream)
    {
        if (messageStreamEvent is ContentBlockDelta content)
        {
            currentMessage.Content[0].Text += content.Delta.Text;
            StateHasChanged();
        }
    }
}

All request/response types are serializable with System.Text.Json.JsonSerializer, so you can save the conversation by serializing this List<Message> as-is and restore it by deserializing it.

Function Calling

Claudia is not just an SDK that requests a REST API. It utilizes Source Generators to provide a mechanism for easily defining Function Calling.

What are the benefits of Function Calling? There are several things LLMs currently can’t do on their own. Calculation is one: they often return plausible-looking answers, and while having them think step by step improves accuracy, they can’t perform exact calculations (given a complex calculation, they tend to produce answers that look correct but are wrong). When a calculation is needed, the model can instead hand it to a calculator and write its answer based on the result. LLMs also can’t tell you the current date and time, and if you ask them to summarize or translate a given web page, they will say they can’t see its contents. Function Calling solves these issues.

First, as an example, let’s define a function that returns a specified URL’s web page to Claude.

public static partial class FunctionTools
{
    /// <summary>
    /// Retrieves the HTML from the specified URL.
    /// </summary>
    /// <param name="url">The URL to retrieve the HTML from.</param>
    [ClaudiaFunction]
    static async Task<string> GetHtmlFromWeb(string url)
    {
        using var client = new HttpClient();
        return await client.GetStringAsync(url);
    }
}

For a method marked with [ClaudiaFunction], the Source Generator generates the supporting code. Using it looks like this:

var input = new Message
{
    Role = Roles.User,
    Content = """
        Could you summarize this page in three lines?
        https://docs.anthropic.com/claude/docs/intro-to-claude
"""
};

var message = await anthropic.Messages.CreateAsync(new()
{
    Model = Models.Claude3Haiku,
    MaxTokens = 1024,
    System = FunctionTools.SystemPrompt, // set generated prompt
    StopSequences = [StopSequnces.CloseFunctionCalls], // set </function_calls> as stop sequence
    Messages = [input],
});

var partialAssistantMessage = await FunctionTools.InvokeAsync(message);

var callResult = await anthropic.Messages.CreateAsync(new()
{
    Model = Models.Claude3Haiku,
    MaxTokens = 1024,
    System = FunctionTools.SystemPrompt,
    Messages = [
        input,
        new() { Role = Roles.Assistant, Content = partialAssistantMessage! } // set as Assistant
    ],
});

// The page can be summarized in three lines:
// 1. Claude is a family of large language models developed by Anthropic designed to revolutionize the way you interact with AI.
// 2. This documentation is designed to help you get the most out of Claude, with clear explanations, examples, best practices, and links to additional resources.
// 3. Claude excels at a wide variety of tasks involving language, reasoning, analysis, coding, and more, and the documentation covers key capabilities, getting started with prompting, and using the API.
Console.WriteLine(callResult);

Two requests are made to Claude. The first request sends the question along with a list and description of the available functions. If Claude determines that calling a function is the best way to answer, it returns the name of the function to execute and its parameters. The function is then executed locally, and passing its result back to Claude in a second request yields the desired final answer.

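So what is the Source Generator doing? First, it generates FunctionTools.SystemPrompt, which is passed as Claude’s system prompt; its contents are as follows (partially omitted).

<tools>
    <tool_description>
        <tool_name>GetHtmlFromWeb</tool_name>
        <description>Retrieves the HTML from the specified URL.</description>
        <parameters>
            <parameter>
                <name>url</name>
                <type>string</type>
                <description>The URL to retrieve the HTML from.</description>
            </parameter>
        </parameters>
    </tool_description>
</tools>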

It’s XML. Claude is designed to recognize XML tags, and using them is considered a best practice when you want to give it clearly structured information. The Source Generator therefore generates this XML automatically from your C# functions. You wouldn’t want to write this by hand, would you?
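To make the idea concrete, here is a hand-written sketch of producing that kind of XML with System.Xml.Linq. It is not Claudia’s actual generator output: the parameter metadata is a hypothetical tuple array standing in for what a source generator would read from the method signature and XML doc comments, and the element names follow Anthropic’s legacy tool-use prompt format (tools, tool_description, tool_name, parameters).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical metadata, standing in for what a generator would read from a [ClaudiaFunction] method.
var urlParams = new (string Name, string Type, string Description)[]
{
    ("url", "string", "The URL to retrieve the HTML from."),
};

// Builds one <tool_description> element; XElement escapes text content ('&', '<', ...) automatically.
XElement BuildToolDescription(string name, string description,
    (string Name, string Type, string Description)[] parameters) =>
    new XElement("tool_description",
        new XElement("tool_name", name),
        new XElement("description", description),
        new XElement("parameters",
            parameters.Select(p => new XElement("parameter",
                new XElement("name", p.Name),
                new XElement("type", p.Type),
                new XElement("description", p.Description)))));

var tools = new XElement("tools",
    BuildToolDescription("GetHtmlFromWeb", "Retrieves the HTML from the specified URL.", urlParams));

Console.WriteLine(tools); // indented XML
```

Building the tree with XElement rather than string concatenation keeps the output well-formed even when a doc comment contains characters that are special in XML.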

Claude then returns a result like the following in response to that request.

<function_calls>
    <invoke>
        <tool_name>GetHtmlFromWeb</tool_name>
        <parameters>
            <url>https://docs.anthropic.com/claude/docs/intro-to-claude</url>
        </parameters>
    </invoke>

Again, it’s XML. (The closing </function_calls> tag is missing because generation stops at the stop sequence; nothing more is needed to call the function, so the output is cut off there.) The Source Generator generates the FunctionTools.InvokeAsync method, which parses this, executes the function (GetHtmlFromWeb), and builds the result to pass back to Claude. The generated InvokeAsync method looks like this:

public static async ValueTask<string?> InvokeAsync(MessageResponse message)
{
    var content = message.Content.FirstOrDefault(x => x.Text != null);
    if (content == null) return null;

    var text = content.Text;
    var tagStart = text.IndexOf("<function_calls>");
    if (tagStart == -1) return null;

    // The response was cut off at the stop sequence, so restore the closing tag before parsing.
    var functionCalls = text.Substring(tagStart) + "</function_calls>";
    var xmlResult = XElement.Parse(functionCalls);

    var sb = new StringBuilder();
    sb.AppendLine(functionCalls);
    sb.AppendLine("<function_results>");

    foreach (var item in xmlResult.Elements("invoke"))
    {
        var name = (string)item.Element("tool_name")!;
        switch (name)
        {
            case "GetHtmlFromWeb":
                {
                    var parameters = item.Element("parameters")!;

                    var _0 = (string)parameters.Element("url")!;

                    BuildResult(sb, "GetHtmlFromWeb", await GetHtmlFromWeb(_0).ConfigureAwait(false));
                    break;
                }

            default:
                break;
        }
    }

    sb.Append("</function_results>"); // final assistant content cannot end with trailing whitespace

    return sb.ToString();

    static void BuildResult<T>(StringBuilder sb, string toolName, T result)
    {
        sb.AppendLine(@$"    <result>
        <tool_name>{toolName}</tool_name>
        <stdout>{result}</stdout>
    </result>");
    }
}

You wouldn’t want to write this by hand. Especially as the number of functions you want to call increases, it becomes more and more difficult.

By invoking the function, generating the result XML, and passing it back to Claude as the beginning of the Assistant’s response, you obtain the desired answer. This technique is officially introduced as a best practice under “Prefill Claude’s response,” and it’s useful for steering Claude’s responses in the desired direction. For example, if you prefill the response with {, Claude becomes much more likely to output its result as JSON.
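At the wire level, a prefill is simply a final assistant message in the request. Here is a minimal sketch of such a request body built with System.Text.Json.Nodes; it bypasses Claudia’s typed API, and the model ID and prompt are illustrative assumptions.

```csharp
using System;
using System.Text.Json.Nodes;

var body = new JsonObject
{
    ["model"] = "claude-3-haiku-20240307",
    ["max_tokens"] = 1024,
    ["messages"] = new JsonArray(
        new JsonObject { ["role"] = "user", ["content"] = "Give me three hiking trails as JSON." },
        // The prefill: Claude continues from "{", so the reply is very likely to be JSON.
        new JsonObject { ["role"] = "assistant", ["content"] = "{" })
};

Console.WriteLine(body.ToJsonString());
```

Claude’s reply continues after the prefilled text rather than repeating it, so the caller has to prepend the { before parsing the result as JSON.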

vs Semantic Kernel

C# users in particular seem to reach for Semantic Kernel for everything, but its functionality is a bit excessive. If you are a C# engineer, it’s better to handle data storage and many of the other features on your own.

The User Guides in Claude’s API documentation are clear and excellent. Regardless of the framework you go through, ultimately what gets executed is the Raw API. Instead of a generic abstraction, I think it’s good to focus specifically on Claude and consider how to leverage its distinctive XML-based instructions.

How to Create a Modern Web API Client

From here, we’ll discuss how to design a modern API client based on Claudia’s design.

First, use HttpClient as the communication foundation. It’s the only choice. There’s no room for objection. Even Grpc.Net.Client uses HttpClient for HTTP/2 gRPC communication. Like it or not, the foundation of all HTTP-based communication is HttpClient.

Here, it’s a good idea to let callers pass in an HttpMessageHandler from the outside.

public class Anthropic : IMessages, IDisposable
{
    readonly HttpClient httpClient;

    // Make it public to allow changes to DefaultRequestHeaders and BaseAddress
    public HttpClient HttpClient => httpClient;

    public Anthropic()
        : this(new HttpClientHandler(), true)
    {
    }

    public Anthropic(HttpMessageHandler handler)
        : this(handler, true)
    {
    }

    public Anthropic(HttpMessageHandler handler, bool disposeHandler)
    {
        this.httpClient = new HttpClient(handler, disposeHandler);
    }

    public void Dispose()
    {
        httpClient.Dispose();
    }
}

HttpClient is actually just a shell, and the entity is HttpMessageHandler. HttpMessageHandler can do various things, such as implementing DelegatingHandler to hook the before and after of requests, and Cysharp/YetAnotherHttpHandler replaces the entire communication processing with a Rust implementation in the form of a HttpMessageHandler implementation. In cases where you want to use UnityWebRequest instead of the .NET runtime’s communication implementation in Unity, you can use UnityWebRequestHttpMessageHandler.cs to replace the entire communication processing with Unity’s implementation.
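As an illustration of that hook point, here is a minimal DelegatingHandler sketch that logs each request and response. LoggingHandler and StubHandler are hypothetical names for this example; the stub replaces the real network transport so the sketch runs offline. With the client above, you would pass the handler chain to the Anthropic(HttpMessageHandler) constructor instead of creating an HttpClient yourself.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Handler chain: LoggingHandler -> StubHandler. HttpClient is only the shell around it.
var logging = new LoggingHandler { InnerHandler = new StubHandler() };
using var client = new HttpClient(logging, disposeHandler: true);
using var response = await client.GetAsync("https://example.com/v1/messages");
Console.WriteLine($"{(int)response.StatusCode}, requests seen: {logging.Count}");

// Hooks the before and after of every request sent through the HttpClient.
sealed class LoggingHandler : DelegatingHandler
{
    public int Count;

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        Interlocked.Increment(ref Count);
        Console.WriteLine($"-> {request.Method} {request.RequestUri}");
        var response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        Console.WriteLine($"<- {(int)response.StatusCode}");
        return response;
    }
}

// Stands in for the real transport (SocketsHttpHandler, YetAnotherHttpHandler, ...); no network I/O.
sealed class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        => Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK) { RequestMessage = request });
}
```

Because the client only ever sees an HttpMessageHandler, swapping the innermost handler changes the transport without touching any of the API client code.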

Let’s also work on how to split the interfaces.

A two-level invocation style like client.Messages.CreateAsync, similar to Controller.Method in MVC, is intuitive and easy to use, and it’s especially friendly to IDE completion. To achieve this, split the operations into an interface, but as a trick, implement that interface explicitly and expose it through a property that returns the client itself (return this;).

public interface IMessages
{
    Task<MessageResponse> CreateAsync(MessageRequest request, RequestOptions? overrideOptions = null, CancellationToken cancellationToken = default);
    IAsyncEnumerable<IMessageStreamEvent> CreateStreamAsync(MessageRequest request, RequestOptions? overrideOptions = null, CancellationToken cancellationToken = default);
}

public class Anthropic : IMessages, IDisposable
{
    // Returns this typed as IMessages: no allocation, and the explicit members only appear after ".Messages".
    public IMessages Messages => this;

    async Task<MessageResponse> IMessages.CreateAsync(MessageRequest request, RequestOptions? overrideOptions, CancellationToken cancellationToken)
    {
        // ...
    }

    async IAsyncEnumerable<IMessageStreamEvent> IMessages.CreateStreamAsync(MessageRequest request, RequestOptions? overrideOptions, [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        // ...
    }
}

This way, there’s no allocation when going down one level (because it returns this), and since it’s an explicit implementation, it doesn’t appear in input completion at the top level, making it easy to use, performant, and easy to implement (because you can directly access all the client’s fields).

User-Friendly Request Type Generation

The Anthropic request types are well organized and the specification is friendly to typed languages, but a few fields accept either a single string or an array of content blocks. Having an either/or is a bit troublesome, but that doesn’t mean you should model it with some Option/Either-style union type; defined that way, the API client would feel terrible to use. On reflection, in the Anthropic API a single string is equivalent to an array containing one text content block.

// Instead of this
Content = [ new() { Type = "text", Text = "Hello, Claude" }]

// I want to write like this
Content = "Hello, Claude"

I think this is a good specification. It’s tedious to dogmatically write Type = “text”, Text = “…” every time, and 95% of usage will probably be single-string content. (Type can also be image, in which case the base64-encoded binary is set in Source; content is an array so that images and text can be passed together.)

Let’s implement that specification in C#. In this case, it’s like normalizing, so I implemented it with implicit conversion.

public record class Message
{
    /// <summary>
    /// user or assistant.
    /// </summary>
    [JsonPropertyName("role")]
    public required string Role { get; set; }

    /// <summary>
    /// single string or an array of content blocks.
    /// </summary>
    [JsonPropertyName("content")]
    public required Contents Content { get; set; }
}

public class Contents : Collection<Content>
{
    // Lets a plain string be assigned where Contents is expected; it becomes a single text content block.
    public static implicit operator Contents(string text)
    {
        var content = new Content
        {
            Type = ContentTypes.Text,
            Text = text
        };
        return new Contents { content };
    }
}

Instead of Content[], I made it a custom collection with an implicit conversion from string that produces a single text content block. It’s not even a recent C# feature; the technique has been around for a long time. It shouldn’t be used carelessly, but in places like this it’s an effective way to improve the feel of an API client.

Timeout

Timeouts are a common concern, so an API client should make them easy for the user to configure. Since HttpClient has a Timeout property, setting it is usually sufficient. In Claudia, however, it’s intentionally disabled.

public class Anthropic : IMessages, IDisposable
{
    public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(10);

    public Anthropic(HttpMessageHandler handler, bool disposeHandler)
    {
        this.httpClient = new HttpClient(handler, disposeHandler);
        this.httpClient.Timeout = System.Threading.Timeout.InfiniteTimeSpan;
    }
}

This is because the official Anthropic client allows the timeout to be overridden on each method call, and I followed that specification. HttpClient (and anything that wraps it) should be safe to call concurrently (in fact, API clients may be registered as singletons), so changing HttpClient’s properties per call inside SendAsync is not acceptable. Therefore, HttpClient’s own Timeout is disabled and the timeout is handled manually.

The implementation method is to generate a LinkedTokenSource, create a CancellationToken that gets canceled after the timeout duration using CancelAfter, and pass it to HttpClient.SendAsync. This is the same as the internal implementation when HttpClient.Timeout has a timeout duration.

// The actual code is mixed with retry processing, so it's slightly different
async Task<HttpResponseMessage> RequestWithAsync(HttpRequestMessage message, CancellationToken cancellationToken, RequestOptions? overrideOptions)
{
    var timeout = overrideOptions?.Timeout ?? Timeout;
    using (var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken))
    {
        cts.CancelAfter(timeout);

        try
        {
            // Pass the linked token so that either the caller's token or the timeout can cancel the request.
            var result = await httpClient.SendAsync(message, HttpCompletionOption.ResponseHeadersRead, cts.Token).ConfigureAwait(ConfigureAwait);
            return result;
        }
        catch (OperationCanceledException ex) when (ex.CancellationToken == cts.Token)
        {
            if (cancellationToken.IsCancellationRequested)
            {
                // Canceled by the caller: rethrow with the caller's token instead of the linked one.
                throw new OperationCanceledException(ex.Message, ex, cancellationToken);
            }
            else
            {
                // Only the timer fired: report it as a timeout.
                throw new TimeoutException($"The request was canceled due to the configured Timeout of {timeout.TotalSeconds} seconds elapsing.", ex);
            }
        }
    }
}

Be careful with error handling when cancellation actually occurs (an OperationCanceledException is thrown). First, don’t let the linked token leak out. If the exception is passed through as-is, its CancellationToken is still the linked token, which the caller can’t use to determine why the operation was canceled. If the cause was the caller’s CancellationToken, create a new OperationCanceledException that carries the caller’s token instead.

If it was a timeout, it’s better to throw a TimeoutException instead of an OperationCanceledException. Note that if you use the timeout implementation of HttpClient, it throws a TaskCanceledException due to historical reasons (apparently, they wanted to change it but couldn't due to compatibility; it's not a very good design, so you don't need to follow it).
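The same pattern can be tried out without any network access. In this sketch, WithTimeoutAsync and RunAsync are hypothetical helpers and Task.Delay stands in for HttpClient.SendAsync; it shows how a single linked token source yields a TimeoutException when the timer elapses and an OperationCanceledException carrying the caller’s own token when the caller cancels.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs an operation with a per-call timeout layered on the caller's token.
async Task<T> WithTimeoutAsync<T>(Func<CancellationToken, Task<T>> operation, TimeSpan timeout, CancellationToken cancellationToken)
{
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
    cts.CancelAfter(timeout);
    try
    {
        return await operation(cts.Token).ConfigureAwait(false);
    }
    catch (OperationCanceledException ex) when (ex.CancellationToken == cts.Token)
    {
        if (cancellationToken.IsCancellationRequested)
        {
            // Caller canceled: expose the caller's token, not the internal linked one.
            throw new OperationCanceledException(ex.Message, ex, cancellationToken);
        }
        // Only the timer fired: report it as a timeout.
        throw new TimeoutException($"The operation timed out after {timeout.TotalSeconds} seconds.", ex);
    }
}

// Simulated slow request (500 ms); reports which way it ended.
async Task<string> RunAsync(TimeSpan timeout, CancellationToken token)
{
    try
    {
        return await WithTimeoutAsync(async ct => { await Task.Delay(500, ct); return "ok"; }, timeout, token);
    }
    catch (TimeoutException) { return "timeout"; }
    catch (OperationCanceledException ex) when (ex.CancellationToken == token) { return "canceled"; }
}

var timedOut = await RunAsync(TimeSpan.FromMilliseconds(50), CancellationToken.None);
using var callerCts = new CancellationTokenSource(50); // the caller gives up after 50 ms
var canceled = await RunAsync(TimeSpan.FromSeconds(10), callerCts.Token);
var completed = await RunAsync(TimeSpan.FromSeconds(10), CancellationToken.None);
Console.WriteLine($"{timedOut}, {canceled}, {completed}"); // timeout, canceled, ok
```

Because the exception filter only matches the linked token, cancellations that have nothing to do with this call pass through unchanged.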

Retry

There may be some debate as to whether the API client itself should have retry functionality. Retrying isn’t as simple as catching any exception and trying again; you first need to distinguish what can be retried from what cannot. For example, if authentication fails or the JSON sent in the request is malformed, retrying is pointless no matter how many times you do it. Since only the API client itself knows these detailed conditions, it makes sense to build retry processing into it.

In Claudia, following the official client, 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 internal errors are retried. Authentication and permission failures (AuthenticationError(401), PermissionError(403)) and invalid request content (InvalidRequestError(400)) are not retried. The frequently occurring OverloadedError (returned when the server is too overloaded to produce a result) has status 529, which falls in the >=500 range; it usually clears up after a few attempts, so it is retried.

The retry logic also follows the official client. If the response header has retry-after-ms or retry-after, it follows that, and if not (or if retry-after is larger than the specified value), the interval is controlled by Exponential Backoff with jitter.
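A sketch of that policy follows. ShouldRetry, GetServerHint, and GetRetryDelay are hypothetical helpers, not Claudia’s code, and the constants (0.5 s initial delay, 8 s cap, up to 25% jitter, a 60 s ceiling on server hints) are assumptions chosen for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Net;
using System.Net.Http;

// Retryable: 408 Request Timeout, 409 Conflict, 429 Rate Limit, and any 5xx (including 529 Overloaded).
static bool ShouldRetry(HttpStatusCode status)
{
    var code = (int)status;
    return code is 408 or 409 or 429 || code >= 500;
}

// Server hints: "retry-after-ms" (milliseconds) takes precedence over the standard Retry-After.
static TimeSpan? GetServerHint(HttpResponseMessage response)
{
    if (response.Headers.TryGetValues("retry-after-ms", out var values)
        && double.TryParse(values.FirstOrDefault(), NumberStyles.Float, CultureInfo.InvariantCulture, out var ms))
    {
        return TimeSpan.FromMilliseconds(ms);
    }
    return response.Headers.RetryAfter?.Delta; // the HTTP-date form is ignored in this sketch
}

// Delay before the retry with the given 0-based attempt number.
static TimeSpan GetRetryDelay(HttpResponseMessage response, int attempt, Random random)
{
    var hint = GetServerHint(response);
    if (hint is { } h && h > TimeSpan.Zero && h <= TimeSpan.FromSeconds(60))
    {
        return h; // follow the server unless the hint is missing or unreasonably large
    }

    // Exponential backoff: 0.5 s, 1 s, 2 s, 4 s, then capped at 8 s.
    var seconds = Math.Min(0.5 * Math.Pow(2, attempt), 8.0);
    // Jitter: shave off up to 25% so many clients don't retry in lockstep.
    var jitter = 1.0 - random.NextDouble() * 0.25;
    return TimeSpan.FromSeconds(seconds * jitter);
}

using var overloaded = new HttpResponseMessage((HttpStatusCode)529);
overloaded.Headers.Add("retry-after-ms", "250");
using var rateLimited = new HttpResponseMessage(HttpStatusCode.TooManyRequests);

var hinted = GetRetryDelay(overloaded, 0, new Random(42));   // 250 ms, taken from the header
var backoff = GetRetryDelay(rateLimited, 3, new Random(42)); // somewhere in (3 s, 4 s]
Console.WriteLine($"{ShouldRetry(overloaded.StatusCode)} {hinted.TotalMilliseconds} {backoff}");
```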

Cancellation

The client side does not have a .Cancel() method or similar. This is because, in accordance with HttpClient, the client itself can be used almost like a singleton and shared across each call (it may be injected as a singleton by DI, depending on the case). Therefore, instead of .Cancel(), which affects everything, pass a CancellationToken to each call.

Ultra-Fast Parsing of Server Sent Events

The streaming API delivers its response in the server-sent events (SSE) format. Concretely, text messages like the following are received.

event: message_start
data: {"type":"message_start","message":...}

event: content_block_start
data: {"type":"content_block_start","index":...}

It’s a repetition of an event: line with the event name followed by a data: line with JSON, and so on. For newline-delimited text, the textbook answer is StreamReader and ReadLine, but in modern C# that’s the wrong answer.

ReadLine allocates a string for every line. Yet both determining the event name and eventually deserializing the data JSON into an object can work directly on the UTF-8 bytes, so strings aren’t needed at all. As long as you never go through strings, you can aim for zero allocation (apart from the objects handed to the user). StreamReader therefore has no role to play.

Let’s look at the specific code. We’ll divide it into the first half (preparation) and the second half (parsing part).

internal class StreamMessageReader
{
    readonly PipeReader reader;
    readonly bool configureAwait;
    MessageStreamEventKind currentEvent;

    public StreamMessageReader(Stream stream, bool configureAwait)
    {
        this.reader = PipeReader.Create(stream);
        this.configureAwait = configureAwait;
    }

    public async IAsyncEnumerable<IMessageStreamEvent> ReadMessagesAsync([EnumeratorCancellation] CancellationToken cancellationToken)
    {
    READ_AGAIN:
        var readResult = await reader.ReadAsync(cancellationToken).ConfigureAwait(configureAwait);

        if (!(readResult.IsCompleted | readResult.IsCanceled))
        {
            var buffer = readResult.Buffer;

            while (TryReadData(ref buffer, out var streamEvent))
            {
                yield return streamEvent;
                if (streamEvent.TypeKind == MessageStreamEventKind.MessageStop)
                {
                    yield break;
                }
            }

            reader.AdvanceTo(buffer.Start, buffer.End);
            goto READ_AGAIN;
        }
    }

First, wrap the Stream in a System.IO.Pipelines PipeReader. Here the Stream carries data streamed from the server over the network, arriving in chunks of unpredictable size, so buffer management is difficult. PipeReader/PipeWriter has some quirks, but it takes care of that management nicely and is a very important library in modern C#.

The basic flow is: read into the buffer (ReadAsync), parse it line by line (TryReadData), and yield return each object that can be parsed. When the buffer runs out of complete lines (the remainder has no newline yet, so it can’t be parsed), record how far it was consumed with AdvanceTo and call ReadAsync again.

The user side was shown in the Blazor sample, but the basic approach is to enumerate with await foreach.

await foreach (var messageStreamEvent in Anthropic.Messages.CreateStreamAsync(request)) // request: a MessageRequest, as in the Blazor sample
{
    // handle each event as it arrives
}

IAsyncEnumerable is very well-suited for streaming processing involving networks like this, and it has become much easier for the data source side to return an asynchronous sequence with yield return. It would be impossible to go back to the days when this didn’t exist.

Next is the second half: parsing the buffer that PipeReader hands over.

[SkipLocalsInit]
bool TryReadData(ref ReadOnlySequence<byte> buffer, [NotNullWhen(true)] out IMessageStreamEvent? streamEvent)
{
    var reader = new SequenceReader<byte>(buffer);
    Span<byte> tempBytes = stackalloc byte[64]; // alloc temp

    while (reader.TryReadTo(out ReadOnlySequence<byte> line, (byte)'\n', advancePastDelimiter: true))
    {
        if (line.Length == 0)
        {
            continue; // next.
        }
        else if (line.FirstSpan[0] == 'e') // event
        {
            // Parse Event.
            if (!line.IsSingleSegment)
            {
                line.CopyTo(tempBytes);
            }
            var span = line.IsSingleSegment ? line.FirstSpan : tempBytes.Slice(0, (int)line.Length);

            var first = span[7]; // "event: [c|m|p|e]"

            if (first == 'c') // content_block_start/delta/stop
            {
                switch (span[23]) // event: content_block_..[]
                {
                    case (byte)'a': // st[a]rt
                        currentEvent = MessageStreamEventKind.ContentBlockStart;
                        break;
                    case (byte)'o': // st[o]p
                        currentEvent = MessageStreamEventKind.ContentBlockStop;
                        break;
                    case (byte)'l': // de[l]ta
                        currentEvent = MessageStreamEventKind.ContentBlockDelta;
                        break;
                    default:
                        break;
                }
            }
            else if (first == 'm') // message_start/delta/stop
            {
                switch (span[17]) // event: message_..[]
                {
                    case (byte)'a': // st[a]rt
                        currentEvent = MessageStreamEventKind.MessageStart;
                        break;
                    case (byte)'o': // st[o]p
                        currentEvent = MessageStreamEventKind.MessageStop;
                        break;
                    case (byte)'l': // de[l]ta
                        currentEvent = MessageStreamEventKind.MessageDelta;
                        break;
                    default:
                        break;
                }
            }
            else if (first == 'p')
            {
                currentEvent = MessageStreamEventKind.Ping;
            }
            else if (first == 'e')
            {
                currentEvent = (MessageStreamEventKind)(-1);
            }
            else
            {
                // Unknown Event, Skip.
                // throw new InvalidOperationException("Unknown Event. Line:" + Encoding.UTF8.GetString(line.ToArray()));
                currentEvent = (MessageStreamEventKind)(-2);
            }

            continue;
        }
        else if (line.FirstSpan[0] == 'd') // data
        {
            // Parse Data.
            Utf8JsonReader jsonReader;
            if (line.IsSingleSegment)
            {
                jsonReader = new Utf8JsonReader(line.FirstSpan.Slice(6)); // skip data: 
            }
            else
            {
                jsonReader = new Utf8JsonReader(line.Slice(6)); // ReadOnlySequence.Slice is slightly slow
            }

            switch (currentEvent)
            {
                case MessageStreamEventKind.Ping:
                    streamEvent = JsonSerializer.Deserialize<Ping>(ref jsonReader, AnthropicJsonSerialzierContext.Default.Options)!;
                    break;
                case MessageStreamEventKind.MessageStart:
                    streamEvent = JsonSerializer.Deserialize<MessageStart>(ref jsonReader, AnthropicJsonSerialzierContext.Default.Options)!;
                    break;
                // Omitted (Deserialize for MessageDelta, MessageStop, ContentBlockStart, ContentBlockDelta, ContentBlockStop, error similarly)
                default:
                    // unknown event, skip
                    goto END;
            }

            buffer = buffer.Slice(reader.Consumed);
            return true;
        }
    }
END:
    streamEvent = default;
    buffer = buffer.Slice(reader.Consumed);
    return false;
}

The desired processing is to deserialize the JSON of data into an object from the two lines of event and data. The buffer doesn’t necessarily conveniently contain the two lines of event and data; it may contain only the event, only the data, or the data may be cut off (resulting in incomplete JSON). It needs to be structured so that it can be interrupted and resumed, taking these into consideration.

As long as a newline is present, the buffer holds at least one complete line, so the code loops with while (reader.TryReadTo(out ReadOnlySequence<byte> line, (byte)'\n', advancePastDelimiter: true)) as a substitute for StreamReader.ReadLine. This reader is a SequenceReader<byte>, a utility for reading from a ReadOnlySequence<byte>; since it’s a ref struct, the reader itself doesn’t allocate. ReadOnlySequence<T> has many pitfalls when it comes to using it correctly and efficiently, so it’s more convenient and safer to build on utilities like this.
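The resumable line-splitting idea can be exercised in isolation. This is a simplified sketch, not Claudia’s TryReadData: ReadLines is a hypothetical helper that only pulls the type field out of each data: line with Utf8JsonReader and JsonDocument, ignores event: lines, and collects strings purely so the result is easy to check. A stream cut off mid-line is simulated by parsing only the first 50 bytes, then resuming from the consumed position.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Consumes every complete '\n'-terminated line and returns the number of bytes consumed.
// An unterminated tail is left untouched so the caller can retry once more data arrives.
long ReadLines(ReadOnlySequence<byte> buffer, List<string> types)
{
    var reader = new SequenceReader<byte>(buffer);
    while (reader.TryReadTo(out ReadOnlySequence<byte> line, (byte)'\n', advancePastDelimiter: true))
    {
        if (line.IsEmpty) continue;                    // blank line: end of one event
        if (line.FirstSpan[0] != (byte)'d') continue;  // "event: ..." lines are ignored in this sketch

        // Parse the JSON straight from the UTF-8 bytes, skipping the 6-byte "data: " prefix.
        var json = new Utf8JsonReader(line.Slice(6));
        using var doc = JsonDocument.ParseValue(ref json);
        types.Add(doc.RootElement.GetProperty("type").GetString()!);
    }
    return reader.Consumed;
}

byte[] stream = Encoding.UTF8.GetBytes(
    "event: ping\ndata: {\"type\":\"ping\"}\n\n" +
    "event: message_stop\ndata: {\"type\":\"message_stop\"}\n\n");

var types = new List<string>();

// First read: pretend only 50 bytes have arrived, cutting the second event mid-line.
long consumed = ReadLines(new ReadOnlySequence<byte>(stream, 0, 50), types);

// Second read: resume from where the first pass stopped, now with the rest of the data.
int offset = (int)consumed;
ReadLines(new ReadOnlySequence<byte>(stream, offset, stream.Length - offset), types);

Console.WriteLine($"{consumed}: {string.Join(", ", types)}"); // prints "35: ping, message_stop"
```

The first pass stops after the blank line that closes the ping event (byte 35); the partial "event: message_" line stays unconsumed, which is exactly the position PipeReader.AdvanceTo would be told about.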

First, the event line tells us what type the following data is. The straightforward approach would be checks like if (span.SequenceEqual("event: content_block_start"u8)). SequenceEqual on spans is implemented efficiently, so that’s not bad, but is a chain of if statements really good? So Claudia actually simplifies the determination as follows.

var first = span[7]; // "event: [c|m|p|e]"

if (first == 'c') // content_block_start/delta/stop
{
    switch (span[23]) // event: content_block_..[]
    {
        case (byte)'a': // st[a]rt
            currentEvent = MessageStreamEventKind.ContentBlockStart;
            break;
        case (byte)'o': // st[o]p
            currentEvent = MessageStreamEventKind.ContentBlockStop;
            break;
        case (byte)'l': // de[l]ta
            currentEvent = MessageStreamEventKind.ContentBlockDelta;
            break;
        default:
            break;
    }
}
else if (first == 'm') // message_start/delta/stop
{
    switch (span[17]) // event: message_..[]
    {
        case (byte)'a': // st[a]rt
            currentEvent = MessageStreamEventKind.MessageStart;
            break;
        case (byte)'o': // st[o]p
            currentEvent = MessageStreamEventKind.MessageStop;
            break;
        case (byte)'l': // de[l]ta
            currentEvent = MessageStreamEventKind.MessageDelta;
            break;
        default:
            break;
    }
}

There are 8 types of messages: content_block_start/delta/stop, message_start/delta/stop, ping, and error. The first character tells whether it's a content_block_* event, a message_* event, or something else. For start/delta/stop, the third character (a, o, or l) is enough to tell them apart. So by checking one byte twice, the event can be classified. It's clearly fast! However, note that there is a non-zero chance of this check breaking if new message types are added in the future (for example, a hypothetical content_block_fforward would be misidentified as content_block_stop). Claudia optimistically assumes it will be fine, but it's something to keep in mind.
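The two-byte classification can be packaged as a standalone function. This is a sketch under stated assumptions: EventKind and EventClassifier are hypothetical names, the input is the whole line including the 7-byte "event: " prefix, and a length guard is added so short lines cannot index out of range.

```csharp
using System;

// Hypothetical enum for this sketch; it mirrors the eight event kinds named in the text.
public enum EventKind
{
    Unknown, Ping, Error,
    MessageStart, MessageDelta, MessageStop,
    ContentBlockStart, ContentBlockDelta, ContentBlockStop,
}

public static class EventClassifier
{
    // Classifies a line such as "event: content_block_delta" by peeking at two bytes.
    // Like the code above, it trusts that only the known event names ever arrive.
    public static EventKind Classify(ReadOnlySpan<byte> line)
    {
        if (line.Length < 8) return EventKind.Unknown; // shorter than "event: x"

        switch ((char)line[7]) // first byte after the 7-byte "event: " prefix
        {
            case 'p': return EventKind.Ping;
            case 'e': return EventKind.Error;
            case 'c' when line.Length > 23:
                // "event: content_block_" is 21 bytes, so index 23 is the 3rd letter of start/stop/delta.
                return (char)line[23] switch
                {
                    'a' => EventKind.ContentBlockStart,
                    'o' => EventKind.ContentBlockStop,
                    'l' => EventKind.ContentBlockDelta,
                    _ => EventKind.Unknown,
                };
            case 'm' when line.Length > 17:
                // "event: message_" is 15 bytes, so index 17 is the 3rd letter.
                return (char)line[17] switch
                {
                    'a' => EventKind.MessageStart,
                    'o' => EventKind.MessageStop,
                    'l' => EventKind.MessageDelta,
                    _ => EventKind.Unknown,
                };
            default:
                return EventKind.Unknown;
        }
    }
}
```

The caveat from the text is easy to reproduce: Classify("event: content_block_fforward"u8) returns ContentBlockStop, because the third letter after the prefix is 'o'.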

This can be said to be a variation of the code in Modern High-Performance C# 2023, which I presented before.


When looking at text protocols, it's hard to resist the urge to cut corners in the matching. If you want strict matching while still avoiding a chain of if statements, start with a length check: branch roughly on the length, then do an exact check with SequenceEqual. This is the same optimization the C# compiler applies to a switch on strings (the compiler converts it into that kind of processing!). If there are many branches, it may be a good idea to branch on a hash code instead, in other words, to implement an inline Dictionary.
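A strict variant of the same classification, branching on length first and confirming with SequenceEqual, might look like the sketch below. EventName and StrictEventMatcher are hypothetical names, the input is assumed to be the name after "event: ", and the "..."u8 literals assume C# 11 or later.

```csharp
using System;

// Hypothetical enum for this sketch only.
public enum EventName
{
    Unknown, Ping, Error,
    MessageStart, MessageDelta, MessageStop,
    ContentBlockStart, ContentBlockDelta, ContentBlockStop,
}

public static class StrictEventMatcher
{
    // name is the part after "event: ". Branch on length first (cheap), then confirm
    // with an exact SequenceEqual so unknown names are never misclassified.
    public static EventName Match(ReadOnlySpan<byte> name)
    {
        switch (name.Length)
        {
            case 4:
                if (name.SequenceEqual("ping"u8)) return EventName.Ping;
                break;
            case 5:
                if (name.SequenceEqual("error"u8)) return EventName.Error;
                break;
            case 12:
                if (name.SequenceEqual("message_stop"u8)) return EventName.MessageStop;
                break;
            case 13:
                if (name.SequenceEqual("message_start"u8)) return EventName.MessageStart;
                if (name.SequenceEqual("message_delta"u8)) return EventName.MessageDelta;
                break;
            case 18:
                if (name.SequenceEqual("content_block_stop"u8)) return EventName.ContentBlockStop;
                break;
            case 19:
                if (name.SequenceEqual("content_block_start"u8)) return EventName.ContentBlockStart;
                if (name.SequenceEqual("content_block_delta"u8)) return EventName.ContentBlockDelta;
                break;
        }
        return EventName.Unknown;
    }
}
```

Unlike the two-byte check, a future name such as content_block_fforward (22 bytes) falls through to Unknown instead of being mistaken for content_block_stop.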

Lastly, the data line is deserialized as JSON. To deserialize from a ReadOnlySpan<byte> or ReadOnlySequence<byte>, you pass it through Utf8JsonReader. Note that Utf8JsonReader is also a ref struct, so it adds no allocation.
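A minimal sketch of deserializing a data line straight from its bytes: PingEvent and DataLineParser are hypothetical names, and it uses reflection-based JsonSerializerOptions for brevity, whereas Claudia passes the options of its source-generated context.

```csharp
using System.Buffers;
using System.Text.Json;

// Hypothetical payload type standing in for types such as Ping or MessageStart.
public sealed class PingEvent
{
    public string? Type { get; set; }
}

public static class DataLineParser
{
    // Reflection-based options for the sketch only.
    private static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    // Deserializes the JSON after "data: " directly from UTF-8 bytes, without creating a string.
    // Utf8JsonReader is a ref struct, so the reader itself is not heap-allocated.
    public static T? Parse<T>(ReadOnlySequence<byte> dataLine)
    {
        var json = dataLine.Slice(6); // skip the 6-byte "data: " prefix
        var reader = new Utf8JsonReader(json);
        return JsonSerializer.Deserialize<T>(ref reader, Options);
    }
}
```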

With this, we were able to process without going through String at all! There’s a feeling that it would be super simple if we used StreamReader, but we can’t help it because we’re suffering from a disease that makes us think we’ve lost if we go through a string.

Source Generator vs Reflection

For the implementation of Function Calling, Claudia adopted Source Generator. It was possible to create it based on reflection, but in this case, Source Generators yielded more desirable results. First, let’s compare what kind of function definition would be required if it were implemented with reflection, using the case of Semantic Kernel.

public static partial class FunctionTools
{
    // Claudia Source Generator

    /// <summary>
    /// Retrieve the current time of day in Hour-Minute-Second format for a specified time zone. Time zones should be written in standard formats such as UTC, US/Pacific, Europe/London.
    /// </summary>
    /// <param name="timeZone">The time zone to get the current time for, such as UTC, US/Pacific, Europe/London.</param>
    [ClaudiaFunction]
    public static string TimeOfDay(string timeZone)
    {
        var time = TimeZoneInfo.ConvertTimeBySystemTimeZoneId(DateTime.UtcNow, timeZone);
        return time.ToString("HH:mm:ss");
    }

    // Semantic Kernel

    [KernelFunction]
    [Description("Retrieve the current time of day in Hour-Minute-Second format for a specified time zone. Time zones should be written in standard formats such as UTC, US/Pacific, Europe/London.")]
    public static string TimeOfDay([Description("The time zone to get the current time for, such as UTC, US/Pacific, Europe/London.")]string timeZone)
    {
        var time = TimeZoneInfo.ConvertTimeBySystemTimeZoneId(DateTime.UtcNow, timeZone);
        return time.ToString("HH:mm:ss");
    }
}

In Function Calling, the information about the function must be given to Claude, so descriptions for both the method and parameters are required. In Claudia’s Source Generator implementation, I made it retrieve them from document comments. In Semantic Kernel, it retrieves them from the Description attribute. Document comments are more natural and easier to write. Attributes for parameters are not only harder to write but also become quite difficult to read when there are multiple parameters.

Also, with Source Generators, the generator can double as an analyzer and report missing elements as compile errors.

All checks, such as document comments not being written for all parameters or using unsupported types, can be known in real-time not only at compile-time but also at edit-time.

The drawback is that Source Generators have a higher implementation difficulty, and great care must be taken when using document comments.

To retrieve document comments in Roslyn, ISymbol.GetDocumentationCommentXml() is the easiest option, but whether it returns anything depends on the project's GenerateDocumentationFile setting. If it's false, it always returns null. That makes it too hard to rely on, so in Claudia I tried to retrieve the comments from the SyntaxNode instead, but that was also affected by GenerateDocumentationFile.

So, I had no choice but to prepare an extension method like the following to successfully retrieve document comments in all situations (it’s a bit difficult to handle because it’s based on Trivia, but it’s much better than not being able to retrieve it).

public static DocumentationCommentTriviaSyntax? GetDocumentationCommentTriviaSyntax(this SyntaxNode node)
{
    if (node.SyntaxTree.Options.DocumentationMode == DocumentationMode.None)
    {
        var withDocumentationComment = node.SyntaxTree.Options.WithDocumentationMode(DocumentationMode.Parse);
        var code = node.ToFullString();
        var newTree = CSharpSyntaxTree.ParseText(code, (CSharpParseOptions)withDocumentationComment);
        node = newTree.GetRoot();
    }

    foreach (var leadingTrivia in node.GetLeadingTrivia())
    {
        if (leadingTrivia.GetStructure() is DocumentationCommentTriviaSyntax structure)
        {
            return structure;
        }
    }

    return null;
}

The DocumentationMode setting determines whether DocumentationCommentTriviaSyntax can be retrieved (it becomes None when GenerateDocumentationFile=false), so if it's None, the node is parsed again with DocumentationMode.Parse to retrieve it. Simply creating a CSharpSyntaxTree from the existing SyntaxNode with new options does not re-parse it, so changing DocumentationMode that way has no effect; instead, the node is converted to a string and passed to ParseText.

JSON Serializer

Requests and responses are JSON in today’s world. And the library to use is System.Text.Json.JsonSerializer, period. There is room for objection, but there isn’t. Like it or not, you have to use it now.

A feature of System.Text.Json is that it can process based on UTF8, so if you try to avoid going through strings as much as possible, you can expect high performance. To deserialize ReadOnlySpan or ReadOnlySequence, you need to pass it through Utf8JsonReader. This is a ref struct, so there's no allocation, so just new it and use it. What about the Writer? Utf8JsonWriter is a class. Why? So, for the Writer, depending on how the application is built, if you can hold it in a field and reuse it, hold it in a field and reuse it (there's Reset), and if you can't hold it, pull it from [ThreadStatic].
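Here is a minimal sketch of the [ThreadStatic] approach, assuming each thread performs one serialization at a time (no re-entrancy). JsonWriterCache and its method are hypothetical names, not Claudia's API; the sketch relies only on ArrayBufferWriter<byte>, Utf8JsonWriter.Reset(IBufferWriter<byte>), and JsonSerializer.Serialize(Utf8JsonWriter, T, JsonSerializerOptions).

```csharp
using System;
using System.Buffers;
using System.Text.Json;

public static class JsonWriterCache
{
    // One buffer and one writer per thread, created lazily because [ThreadStatic]
    // fields are only default-initialized on each thread.
    [ThreadStatic] private static ArrayBufferWriter<byte>? t_buffer;
    [ThreadStatic] private static Utf8JsonWriter? t_writer;

    // Serializes value and returns a copy of the UTF-8 bytes. Not re-entrant on the same thread.
    public static byte[] SerializeToUtf8Bytes<T>(T value, JsonSerializerOptions? options = null)
    {
        var buffer = t_buffer ??= new ArrayBufferWriter<byte>();
        buffer.Clear(); // discard the previous call's output

        var writer = t_writer;
        if (writer is null)
        {
            writer = t_writer = new Utf8JsonWriter(buffer);
        }
        else
        {
            writer.Reset(buffer); // reuse the writer instance instead of allocating a new one
        }

        JsonSerializer.Serialize(writer, value, options);
        writer.Flush();
        return buffer.WrittenSpan.ToArray();
    }
}
```

Returning a copied byte[] keeps the sketch simple; code that can consume buffer.WrittenSpan before the next call on the same thread could skip the copy.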

When providing it in a library, since all the types to be used are determined, source generating it should improve performance and increase AOT safety. Claudia is also generating it.

[JsonSourceGenerationOptions(
    GenerationMode = JsonSourceGenerationMode.Default,
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
    WriteIndented = false)]
[JsonSerializable(typeof(MessageRequest))]
[JsonSerializable(typeof(Message))]
[JsonSerializable(typeof(Contents))]
[JsonSerializable(typeof(Content))]
[JsonSerializable(typeof(Metadata))]
[JsonSerializable(typeof(Source))]
[JsonSerializable(typeof(MessageResponse))]
[JsonSerializable(typeof(Usage))]
[JsonSerializable(typeof(ErrorResponseShape))]
[JsonSerializable(typeof(ErrorResponse))]
[JsonSerializable(typeof(Ping))]
[JsonSerializable(typeof(MessageStart))]
[JsonSerializable(typeof(MessageDelta))]
[JsonSerializable(typeof(MessageStop))]
[JsonSerializable(typeof(ContentBlockStart))]
[JsonSerializable(typeof(ContentBlockDelta))]
[JsonSerializable(typeof(ContentBlockStop))]
[JsonSerializable(typeof(MessageStartBody))]
[JsonSerializable(typeof(MessageDeltaBody))]
public partial class AnthropicJsonSerialzierContext : JsonSerializerContext
{
}

// When used internally, this JsonSerializerContext is always specified
JsonSerializer.SerializeToUtf8Bytes(request, AnthropicJsonSerialzierContext.Default.Options);

One thing I stumbled upon was that JsonIgnoreCondition.WhenWritingNull, which normally (reflection-based) worked for Nullable as well, stopped working with Source Generators and no longer ignored null. I had no choice but to work around it by directly attaching [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingDefault)] to all Nullable properties of the target types.

public record class MessageRequest
{
    // ...

    [JsonPropertyName("temperature")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingDefault)]
    public double? Temperature { get; set; }
}

Honestly, I feel like it’s an implementation leak in the Source Generator version, but since I was able to work around it, I’ll just leave it for now…

Just as Azure OpenAI Service offers the OpenAI API, people in AWS environments may find Amazon Bedrock easier to use. So I've already added Bedrock support! It should be even easier to use now.

Creating a Web API client is not that difficult. However, many SDKs out there are simply not designed with ease of use in mind. We hope this article will help you build a better design.

R3 — A New Modern Reimplementation of Reactive Extensions for C#

Yoshifumi Kawai — Tue, 05 Mar 2024 10:31:44 GMT


Recently, I officially released R3 as a new implementation of Reactive Extensions for C#! The name R3 marks it as the third generation of Rx, counting Rx for .NET as the first generation and UniRx as the second. The core part of Rx (almost identical to dotnet/reactive) is provided as a library common to all of .NET, while custom schedulers and operators specific to each platform are separated into different libraries. This approach allows us to offer a core library for all .NET platforms and extension libraries for various frameworks such as Unity, Godot, Avalonia, WPF, WinForms, WinUI3, Stride, LogicLooper, MAUI, MonoGame and Blazor.

GitHub — Cysharp/R3

While it includes some breaking changes and is not a drop-in replacement, transitioning from dotnet/reactive or UniRx is kept realistically manageable. This is part of the beauty of Rx, where vocabulary and operations are largely standardized in a LINQ-like manner, meaning the transition may not seem significantly different.

The UniRx I had previously developed was an Rx exclusively for Unity. Therefore, R3 provides extensive support for Unity, offering sufficient functionality to serve as a migration destination from UniRx. Additionally, by becoming a general-purpose library, it now supports many scenarios for use with .NET. It is not an Rx specialized for use with game engines, but rather, it has been created as a new implementation of Reactive Extensions.

The History of Rx and vs. async/await

Are you using Rx? The number of people answering “no” is increasing, not just in .NET or Unity, but in Java, Swift, and Kotlin as well. Its presence is clearly declining. Why? The answer is simple: the advent of async/await. Reactive Extensions for .NET first appeared in 2009, during the era of C# 3.0 and .NET Framework 3.5, a time when platforms like Silverlight and Windows Phone, now defunct, were still relevant. async/await (introduced in C# 5.0, 2012) didn’t exist yet, and even Tasks had not been introduced. As a side note, the “Extensions” in Reactive Extensions were named after the earlier Parallel Extensions project, which included Parallel LINQ and the Task Parallel Library added in .NET Framework 4.0.

Initially, Rx spread across various languages as the definitive solution for asynchronous processing without language support, offering a powerful and user-friendly alternative to single-function Task or Promise. I, too, was mesmerized by Rx over TPL at the time. However, the landscape changed dramatically with the introduction of async/await, establishing it as the standard for asynchronous processing across numerous languages.

With the widespread adoption of async/await, the need for Rx just for asynchronous processing diminished, leading to a decline in its adoption rate. As the developer of UniRx, a standard for Rx in Unity, I quickly recognized the need for an async/await runtime tailored to game engines (Unity) and developed UniTask as soon as the necessary conditions (C# 7.0) were met in Unity.

Rediscovering the Value of Rx

Rx is not just for asynchronous processing, right? While it was hailed as “LINQ to Everything,” the notion of “Everything” might be noise, and it’s better to separate concerns and use the most optimal tools. Using Rx just for async processing is not the best approach; a single-value Observable should be represented by a Task for both clarity and performance benefits. This necessitates the integration of Rx with async/await through APIs that can coexist with asynchronous tasks, rather than focusing on minor details like being able to pass a Task to SelectMany because Observables are monads.

Simply being able to await is not sufficient for real-world application development. Various libraries have been devised for asynchronous/parallel processing, not just Rx, such as TPL Dataflow. However, few people would choose to use these libraries from scratch today. It’s now 2024, and the winners have been decided: language-supported IAsyncEnumerable and System.Threading.Channels are the best choices. These also incorporate backpressure characteristics, making operators related to backpressure in RxJava unnecessary for .NET. For more specific I/O operations, System.IO.Pipelines offers maximum performance.

Asynchronous LINQ might be a nice addition, but given its lower usage frequency compared to LINQ to Objects in actual asynchronous stream scenarios, it’s not something to be eagerly adopted (note that I have implemented UniTaskAsyncEnumerable and LINQ for UniTask myself). The dream of distributed queries (IQbservable) in Rx might have found its modern counterpart in GraphQL. In terms of distributed systems, Kubernetes has gained widespread adoption, with gRPC becoming the standard for RPC, and other options like Orleans, Akka.NET, SignalR, and MagicOnion offering a variety of choices.

The landscape is no longer the same as in 2009, where various technologies competed for dominance. Just as no one would choose Service Fabric today, venturing into distributed processing is not the future of Rx, in my opinion. Just because Rx was created by the Cloud Programmability Team doesn’t mean that making it useful for the cloud is the only correct approach. Of course, there could be multiple futures, and I hope one possible future for Rx is R3.

So, where does the value of Rx lie? I believe it returns to its roots: processing in-memory messaging with LINQ, or LINQ to Events. Especially on the client side and in UI processing, Rx continues to be valued, with Rx-like but more language-optimized options like Kotlin Flow and Swift Combine still actively used. Even in complex, event-heavy game applications, as a developer of UniRx used in the game engine (Unity), I find it extremely beneficial. The significance of the observer pattern and events is undeniable, and Rx’s role as a “better event” or the ultimate observer pattern remains unchanged.

Reconstruction with R3

Initially, I debated whether to maintain 100% compatibility with Rx interfaces while removing legacy APIs and adding new ones, or to fundamentally change them. However, to solve all the issues I perceive, radical changes were necessary. Inspired by the success of Kotlin Flow and Swift Combine, I decided to reconstruct Rx completely anew, tailored to the modern C# environment of .NET 8 and C# 12.

Even so, the differences in interfaces are not that significant in the end.

public abstract class Observable<T>
{
    public IDisposable Subscribe(Observer<T> observer);
}

public abstract class Observer<T> : IDisposable
{
    public void OnNext(T value);
    public void OnErrorResume(Exception error);
    public void OnCompleted(Result result); // Result is (Success | Failure)
}

At a glance, the main changes are the transformation of OnError into OnErrorResume and the shift from interface to abstract class. One of the changes I felt was necessary was OnError, where the behavior of unsubscribing due to exceptions in the pipeline was considered a billion-dollar mistake in Rx. In R3, exceptions flow to OnErrorResume without unsubscribing. Instead, the pipeline’s termination is indicated by passing a Result representing Success or Failure to OnCompleted.

The definitions of IObservable/IObserver are closely related to those of IEnumerable/IEnumerator, and they are claimed to be a mathematical duality, but there are practical inconveniences, with the most significant being that it stops on OnError. The inconvenience stems from the different lifetimes of exceptions in IEnumerable's foreach and IObservable. While an exception in foreach ends the iteration there, and if necessary, is handled with try-catch, often without retrying, subscribing to an Observable is different. The lifespan of an event subscription is long, and it's not unnatural to want it not to stop even if an exception occurs. Normal events do not stop when an exception occurs, but in Rx, due to the operator chain, there's always a possibility of exceptions occurring in the pipeline (e.g., Select or Where might throw exceptions). When considered as an alternative or superior to events, it becomes unnatural for it to stop due to exceptions.

And it’s not just about catching and retrying if needed! Re-subscribing to a stopped event in Rx is very difficult! Unlike events, Observables have a concept of completion. Subscribing to a completed IObservable immediately calls OnError | OnCompleted, thus automatic re-subscription risks re-subscribing to a completed sequence. Of course, this would lead to an infinite loop, without a way to detect and handle it properly. There are many questions on Stack Overflow about how to re-subscribe to UI subscriptions in Rx/Combine/Flow, and the answers often require writing very complex code. Reality is not solved with Repeat/Retry alone!

Therefore, it was changed to not stop on exceptions. To avoid confusion with the traditional stopping behavior, it was renamed to OnErrorResume. This solves all issues related to re-subscription. Moreover, this change has advantages; changing from stopping to not stopping is impossible (as the Dispose chain would run, making it impossible to restore state, leaving no means other than total re-subscription), but changing from not stopping to stopping is very easy to implement and performs well. Just prepare an operator that converts OnErrorResume to OnCompleted(Result.Failure) (a standard operator OnErrorResumeAsFailure has been added).
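The asymmetry described above (turning "not stopping" into "stopping" is easy) can be shown with a toy model. Everything here is a hypothetical, simplified stand-in for illustration, not R3's actual classes: the base class guarantees at most one completion, errors do not stop the stream, and a thin wrapper converts the first error into OnCompleted(Failure), in the spirit of OnErrorResumeAsFailure.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-ins for the shapes shown above; this is not R3's code.
public readonly record struct Result(Exception? Exception)
{
    public bool IsSuccess => Exception is null;
    public static Result Success => default;
    public static Result Failure(Exception error) => new(error);
}

public abstract class SimpleObserver<T>
{
    private bool _completed;

    // Errors are reported but do not stop the stream; only OnCompleted does.
    public void OnNext(T value) { if (!_completed) OnNextCore(value); }
    public void OnErrorResume(Exception error) { if (!_completed) OnErrorResumeCore(error); }

    public void OnCompleted(Result result)
    {
        if (_completed) return; // the base class guarantees at most one completion
        _completed = true;
        OnCompletedCore(result);
    }

    protected abstract void OnNextCore(T value);
    protected abstract void OnErrorResumeCore(Exception error);
    protected abstract void OnCompletedCore(Result result);
}

// "Not stopping -> stopping" needs only a thin wrapper: the first error becomes OnCompleted(Failure).
public sealed class ErrorResumeAsFailure<T> : SimpleObserver<T>
{
    private readonly SimpleObserver<T> _downstream;
    public ErrorResumeAsFailure(SimpleObserver<T> downstream) => _downstream = downstream;

    protected override void OnNextCore(T value) => _downstream.OnNext(value);
    protected override void OnErrorResumeCore(Exception error) => OnCompleted(Result.Failure(error));
    protected override void OnCompletedCore(Result result) => _downstream.OnCompleted(result);
}

// Records every notification as a string so the behavior is easy to check.
public sealed class RecordingObserver<T> : SimpleObserver<T>
{
    public List<string> Log { get; } = new();
    protected override void OnNextCore(T value) => Log.Add($"next:{value}");
    protected override void OnErrorResumeCore(Exception error) => Log.Add($"error:{error.Message}");
    protected override void OnCompletedCore(Result result) =>
        Log.Add(result.IsSuccess ? "completed" : $"failed:{result.Exception!.Message}");
}
```

Fed the sequence OnNext(1), OnErrorResume("boom"), OnNext(2), a plain observer keeps receiving values after the error, while the wrapped one completes with a failure and ignores the later value.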

Rx itself has complex contracts (such as: either OnError or OnCompleted is issued, but not both), yet its interfaces enforce none of them, which makes custom operators difficult to implement correctly. For instance, correctly handling the Disposable returned when Subscribe is delayed (using SingleAssignmentDisposable) is hard to understand properly. Where does an exception thrown in onNext during Subscribe go: to onError, followed by Dispose, or does the sequence continue? This behavior is not specifically defined, so implementations vary. By becoming an abstract class, R3 enforces most of these contracts itself, unifying behavior and making custom implementations easier.

The primary reason for making it an abstract class was to centralize management of all subscriptions. All Subscribes must go through the base class’s Subscribe implementation, enabling tracking of subscriptions. For example, it can be displayed as follows:

This is an extension window for Unity; an equivalent exists for Godot, and tracking is also exposed as an API, so subscriptions can be logged or retrieved at any time and custom visualizations are possible.

While Task has Parallel Debugger (which also centralizes management in the base class when s_asyncDebuggingEnabled), visualizing Rx subscriptions is far more critical. Event subscription leaks are common, and developers often scramble to find them at the end of development, but with R3, this is no longer necessary! Significantly improved development efficiency!

R3 prioritizes subscription management and leak prevention, tracking all subscriptions with Observable Tracker and introducing the concept that “all Observables can complete.”

The basic principle of subscription management in Rx is disposing of IDisposable. However, unsubscribing is not limited to this; it can also be done by flowing OnError | OnCompleted (not guaranteed by the IObservable contract but implemented as such, and R3 ensures it will always be so through the base class). Thus, handling leaks from both upstream (issuance of OnError | OnCompleted) and downstream (Dispose) can more reliably prevent leaks.

While this may seem excessive, experience in developing actual applications suggests that excessive subscription management is just right. From this philosophy, R3 has made Observables like Observable.FromEvent, Observable.Timer, EveryUpdate, which previously had no means to issue OnCompleted, able to do so. The method of issuance is by passing a CancellationToken, leveraging the widely (or excessively) used CancellationToken in the modern API design post-async/await. Additionally, with the idea that all Observables can complete, disposing of a Subject now standardly issues OnCompleted.
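As a rough illustration of "completing via CancellationToken", here is a hypothetical event source (think of an every-frame or UI event stream that would otherwise never end) that pushes OnCompleted when a token is cancelled or when it is disposed. It uses the BCL IObserver<T> for brevity, is single-threaded, and is not R3's implementation; CancellableEventSource and LogObserver are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical event source that completes when the token is cancelled or the source is disposed.
// Simplified: single-threaded use only, and Subscribe returns nothing.
public sealed class CancellableEventSource<T> : IDisposable
{
    private readonly List<IObserver<T>> _observers = new();
    private readonly CancellationTokenRegistration _registration;
    private bool _completed;

    public CancellableEventSource(CancellationToken cancellationToken)
        => _registration = cancellationToken.Register(Complete);

    public void Subscribe(IObserver<T> observer)
    {
        if (_completed) { observer.OnCompleted(); return; } // late subscribers complete immediately
        _observers.Add(observer);
    }

    public void Publish(T value)
    {
        if (_completed) return;
        foreach (var observer in _observers) observer.OnNext(value);
    }

    private void Complete()
    {
        if (_completed) return;
        _completed = true;
        foreach (var observer in _observers) observer.OnCompleted();
        _observers.Clear(); // drop references so subscribers cannot leak through this source
    }

    // Disposing also completes, in the spirit of "disposing a Subject issues OnCompleted".
    public void Dispose()
    {
        _registration.Dispose();
        Complete();
    }
}

// Records notifications as strings for inspection.
public sealed class LogObserver<T> : IObserver<T>
{
    public List<string> Log { get; } = new();
    public void OnNext(T value) => Log.Add($"next:{value}");
    public void OnError(Exception error) => Log.Add($"error:{error.Message}");
    public void OnCompleted() => Log.Add("completed");
}
```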

Reconsidering IScheduler

IScheduler is the mechanism that enables the magic of moving through time and space in Rx. By passing it to Timer or ObserveOn, you can move values to any place (Thread, Dispatcher, PlayerLoop, etc.) and time.

public interface IScheduler
{
    DateTimeOffset Now { get; }

    IDisposable Schedule<TState>(TState state, Func<IScheduler, TState, IDisposable> action);
    IDisposable Schedule<TState>(TState state, TimeSpan dueTime, Func<IScheduler, TState, IDisposable> action);
    IDisposable Schedule<TState>(TState state, DateTimeOffset dueTime, Func<IScheduler, TState, IDisposable> action);
}

And it turns out to be flawed. If you have ever looked at the Rx source code, you may have noticed that additional, separate definitions have been there from the beginning. For example, ThreadPoolScheduler implements interfaces like the following.

public interface ISchedulerLongRunning
{
    IDisposable ScheduleLongRunning<TState>(TState state, Action<TState, ICancelable> action);
}

public interface ISchedulerPeriodic
{
    IDisposable SchedulePeriodic<TState>(TState state, TimeSpan period, Func<TState, TState> action);
}

public interface IStopwatchProvider
{
    IStopwatch StartStopwatch();
}

public abstract partial class LocalScheduler : IScheduler, IStopwatchProvider, IServiceProvider
{
}

public sealed class ThreadPoolScheduler : LocalScheduler, ISchedulerLongRunning, ISchedulerPeriodic
{
}

And the following calls are made.

public static IStopwatch StartStopwatch(this IScheduler scheduler)
{
    var swp = scheduler.AsStopwatchProvider();
    if (swp != null)
    {
        return swp.StartStopwatch();
    }

    return new EmulatedStopwatch(scheduler);
}

private static IDisposable SchedulePeriodic_<TState>(IScheduler scheduler, TState state, TimeSpan period, Func<TState, TState> action)
{
    var periodic = scheduler.AsPeriodic();
    if (periodic != null)
    {
        return periodic.SchedulePeriodic(state, period, action);
    }

    var swp = scheduler.AsStopwatchProvider();
    if (swp != null)
    {
        var spr = new SchedulePeriodicStopwatch<TState>(scheduler, state, period, action, swp);
        return spr.Start();
    }
    else
    {
        var spr = new SchedulePeriodicRecursive<TState>(scheduler, state, period, action);
        return spr.Start();
    }
}

In essence, there are quite a few cases where raw IScheduler is not used. The reason for not using it is due to performance issues, as IScheduler.Schedule is defined only for single executions, and the idea is that multiple calls can be made recursively, but generating a new IDisposable each time poses a performance issue. To avoid this, ISchedulerPeriodic and others were prepared.

In that case, wouldn’t it be better to use something that properly reflects the reality rather than IScheduler? This led to the discovery that TimeProvider, added in .NET 8, can do what IScheduler did more efficiently.

public abstract class TimeProvider
{
    // use these.
    public virtual ITimer CreateTimer(TimerCallback callback, object? state, TimeSpan dueTime, TimeSpan period);
    public virtual long GetTimestamp();
}

The ITimer generated by CreateTimer has sufficient functionality to do what ISchedulerPeriodic does, and in scenarios where one-time executions are repeated (Schedule<TState>(TState state, TimeSpan dueTime, Func<IScheduler, TState, IDisposable> action)), using ITimer is more efficient than dotnet/reactive's ThreadPoolScheduler (which creates a new Timer each time).
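The "repeated one-time executions" case can be sketched with a single ITimer that is re-armed through Change, instead of a new timer per call. ReusableDelay is a hypothetical name and this is not R3's implementation; it assumes .NET 8, where TimeProvider and ITimer are available.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs a callback after successive one-shot delays while reusing
// a single ITimer, instead of allocating a new timer for every schedule call.
public sealed class ReusableDelay : IDisposable
{
    private readonly ITimer _timer;
    private readonly Action _callback;

    public ReusableDelay(TimeProvider timeProvider, Action callback)
    {
        _callback = callback;
        // Created disarmed: infinite due time and no period.
        _timer = timeProvider.CreateTimer(
            static state => ((ReusableDelay)state!)._callback(),
            this, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
    }

    // Re-arms the same timer for one more one-shot execution.
    public void Schedule(TimeSpan dueTime) => _timer.Change(dueTime, Timeout.InfiniteTimeSpan);

    public void Dispose() => _timer.Dispose();
}
```

Because the provider is passed in, the same class can be driven by TimeProvider.System in production or by a fake provider in tests.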

Regarding the current time acquisition, TimeProvider also has DateTimeOffset TimeProvider.GetUtcNow() similar to DateTimeOffset IScheduler.Now, but it only uses long GetTimestamp. The reason is that only ticks are necessary for operator implementation, so it's better to avoid the overhead of wrapping it in DateTimeOffset and directly handle raw ticks for time calculations.

DateTimeOffset.UtcNow can be affected by changes to the OS system time, so it's better to use GetTimestamp (which uses a high-resolution timer from Stopwatch.GetTimestamp() as standard) without going through DateTimeOffset for that reason as well.
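The timestamp-only approach looks like the following minimal sketch (ElapsedExample is a hypothetical name; TimeProvider.GetTimestamp, GetElapsedTime, and TimestampFrequency are .NET 8 APIs). Elapsed time is computed from raw ticks, no DateTimeOffset is created, and wall-clock changes have no effect.

```csharp
using System;

public static class ElapsedExample
{
    // Measures elapsed time purely from raw timestamps.
    public static TimeSpan Measure(TimeProvider timeProvider, Action work)
    {
        long start = timeProvider.GetTimestamp();
        work();
        return timeProvider.GetElapsedTime(start);
    }

    // The same computation by hand: only ticks and TimestampFrequency are needed.
    public static double ToMilliseconds(TimeProvider timeProvider, long startTimestamp, long endTimestamp)
        => (endTimestamp - startTimestamp) * 1000.0 / timeProvider.TimestampFrequency;
}
```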

Another problem with IScheduler is the existence of synchronously operating schedulers like ImmediateScheduler and CurrentScheduler. Assigning time-related processes like Timer or Delay to these results in emulating asynchronous code that should not be used, i.e., sleeping the thread. Therefore, it might be better not to have synchronous Schedulers at all. In R3, they were completely removed, and specifying TimeProvider means always making asynchronous calls.

The problem with ImmediateScheduler and CurrentScheduler is not just that, but also that their performance is critically poor.

Result of Observable.Range(1, 10000).Subscribe()

The poor result of ImmediateScheduler, unlike that of CurrentScheduler, may be counterintuitive. In dotnet/reactive, ImmediateScheduler creates a new AsyncLockScheduler() on each Schedule call, and the constructor of its base class LocalScheduler calls SystemClock.Register, which takes a lock, allocates a new WeakReference(scheduler), and performs a HashSet.Add. It's no wonder the performance is bad (for recursive calls this is limited to allocating a SingleAssignmentDisposable each time, which is still a lot).

You might think it's okay because Range is rarely used, but ImmediateScheduler is actually used quite often in unexpected places. A typical example is Merge, which uses ImmediateScheduler when no IScheduler is specified, so if your code subscribes frequently, it may be called a considerable number of times. In fact, when I used dotnet/reactive in a server application, Merge and ImmediateScheduler once accounted for a significant portion of the server's memory usage. At the time, I got by with a custom lightweight scheduler that I specified explicitly everywhere, thoroughly avoiding ImmediateScheduler. If there is ever a next version of dotnet/reactive, improving ImmediateScheduler's performance should be the first thing to do.

The reason for doing SystemClock.Register seems to be for monitoring changes to the system time with DateTimeOffset.UtcNow. In other words, had we used long(timestamp) from the start, we wouldn't have invited such critical performance degradation. This is also one of the reasons for the failure in defining the IScheduler interface.

By adopting TimeProvider, it’s also worth noting that unit testing has become easier with standard methods using Microsoft.Extensions.Time.Testing.FakeTimeProvider.

FrameProvider

One thing that is not present in other Rx libraries, but has been immensely effective in UniRx, is the frame-based operator. These include operators like DelayFrame that executes after a set number of frames, NextFrame for execution in the next frame, EveryUpdate as a factory that emits every frame, and EveryValueChanged for monitoring values every frame, all of which are convenient for use in game engines.

What I realized is that time and frames are conceptually similar, and these concepts exist not just in game engines but across many frameworks, since UI systems also have message loops and rendering loops. Therefore, in R3, we abstracted frame-based processing in the form of FrameProvider, complementing TimeProvider. This allows the frame-based operators, previously only available in Unity, to work in any framework that supports C# (WinForms, WPF, WinUI3, MAUI, Godot, Avalonia, Stride, etc…).

public abstract class FrameProvider
{
    public abstract long GetFrameCount();
    public abstract void Register(IFrameRunnerWorkItem callback);
}

public interface IFrameRunnerWorkItem
{
    // true, continue
    bool MoveNext(long frameCount);
}
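To show how a host loop might drive this abstraction, here is a sketch with local copies of the two types above plus a hypothetical ManualFrameProvider (advanced by hand, e.g. once per message-loop or game-loop iteration) and a hypothetical DelayFrame-style work item. These helper classes are illustrations, not R3's implementations.

```csharp
using System;
using System.Collections.Generic;

// Local copies of the two types shown above so this sketch compiles on its own.
public abstract class FrameProvider
{
    public abstract long GetFrameCount();
    public abstract void Register(IFrameRunnerWorkItem callback);
}

public interface IFrameRunnerWorkItem
{
    // true: keep running on the next frame; false: done, unregister.
    bool MoveNext(long frameCount);
}

// Hypothetical provider that a host loop (UI message loop, game loop, test) advances by hand.
public sealed class ManualFrameProvider : FrameProvider
{
    private readonly List<IFrameRunnerWorkItem> _items = new();
    private long _frameCount;

    public override long GetFrameCount() => _frameCount;
    public override void Register(IFrameRunnerWorkItem callback) => _items.Add(callback);

    // Call once per frame: runs every item in registration order and removes finished ones.
    // (Registering new items from inside MoveNext is not supported in this sketch.)
    public void Advance()
    {
        _frameCount++;
        _items.RemoveAll(item => !item.MoveNext(_frameCount));
    }
}

// Hypothetical DelayFrame-style work item: runs an action once, delayFrames frames after creation.
public sealed class DelayFrameWorkItem : IFrameRunnerWorkItem
{
    private readonly long _targetFrame;
    private readonly Action _action;

    public DelayFrameWorkItem(FrameProvider provider, int delayFrames, Action action)
    {
        _targetFrame = provider.GetFrameCount() + delayFrames;
        _action = action;
    }

    public bool MoveNext(long frameCount)
    {
        if (frameCount < _targetFrame) return true; // not yet, keep waiting
        _action();
        return false; // fired once; unregister
    }
}
```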

In R3, for every operator that requires a TimeProvider, we implemented a corresponding ***Frame operator.

Return <-> ReturnFrame
Yield <-> YieldFrame
Interval <-> IntervalFrame
Timer <-> TimerFrame
Chunk <-> ChunkFrame
Debounce <-> DebounceFrame
Delay <-> DelayFrame
DelaySubscription <-> DelaySubscriptionFrame
ObserveOn(TimeProvider) <-> ObserveOn(FrameProvider)
Replay <-> ReplayFrame
Skip <-> SkipFrame
SkipLast <-> SkipLastFrame
SubscribeOn(TimeProvider) <-> SubscribeOn(FrameProvider)
Take <-> TakeFrame
TakeLast <-> TakeLastFrame
ThrottleFirst <-> ThrottleFirstFrame
ThrottleFirstLast <-> ThrottleFirstLastFrame
ThrottleLast <-> ThrottleLastFrame
Timeout <-> TimeoutFrame

async/await Integration

First, we thoroughly eliminated Observables that return a single value, which are seen as a bad practice in existing Rx. These should be handled with async/await instead, as operators that return or expect a single value can introduce noise that leads to bad practices. First becomes FirstAsync, returning a Task. AsyncSubject is removed; please use TaskCompletionSource instead.

Moreover, modern C# code is full of asynchronous code, but fundamentally, Rx only accepts synchronous code. Used carelessly, this leads to fire-and-forget situations, and simply mixing the two with SelectMany is not sufficient. Thus, we introduced special methods for Where/Select/Subscribe.

SelectAwait(this Observable<T> source, Func<T, CancellationToken, ValueTask<TResult>> selector, AwaitOperation awaitOperation = Sequential, ...)
WhereAwait(this Observable<T> source, Func<T, CancellationToken, ValueTask<bool>> predicate, AwaitOperation awaitOperation = Sequential, ...)
SubscribeAwait(this Observable<T> source, Func<T, CancellationToken, ValueTask> onNextAsync, AwaitOperation awaitOperation = Sequential, ...)
SubscribeAwait(this Observable<T> source, Func<T, CancellationToken, ValueTask> onNextAsync, Action<Result> onCompleted, AwaitOperation awaitOperation = Sequential, ...)
SubscribeAwait(this Observable<T> source, Func<T, CancellationToken, ValueTask> onNextAsync, Action<Exception> onErrorResume, Action<Result> onCompleted, AwaitOperation awaitOperation = Sequential, ...)

public enum AwaitOperation
{
    /// All values are queued, and the next value waits for the completion of the asynchronous method.
    Sequential,
    /// New values are dropped while the asynchronous method is running.
    Drop,
    /// If the previous asynchronous method is running, it is cancelled and the next asynchronous method is executed.
    Switch,
    /// All values are sent immediately to the asynchronous method.
    Parallel,
    /// All values are sent immediately to the asynchronous method, but the results are queued and passed to the next operator in order.
    SequentialParallel,
    /// Send the first value and the last value while the asynchronous method is running.
    ThrottleFirstLast
}

SelectAwait, WhereAwait, and SubscribeAwait accept asynchronous methods and offer six patterns for handling values that arrive while the asynchronous method is executing. Sequential queues the values until the asynchronous method completes. Drop discards all values that arrive while it’s executing, which is useful for preventing duplicate submissions in event handling. Switch cancels the ongoing asynchronous method and starts the next one, similar to Observable.Switch. Parallel executes methods in parallel, like Observable.Merge. SequentialParallel runs operations in parallel but ensures the results are passed to the next operator in the order the values arrived. ThrottleFirstLast sends the first and last values received while the asynchronous method is running.
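To make the difference between the first two policies concrete, here is a single-threaded teaching sketch that models only Sequential and Drop. It is not R3's implementation: AsyncGate and GateMode are hypothetical names, and there is no thread safety or cancellation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical names: only two of the six AwaitOperation policies are modeled here.
public enum GateMode { Sequential, Drop }

// Single-threaded illustration of how values arriving during an async call are handled.
public sealed class AsyncGate<T>
{
    readonly Func<T, Task> handler;
    readonly GateMode mode;
    readonly Queue<T> pending = new();
    bool running;
    Task current = Task.CompletedTask;

    public AsyncGate(Func<T, Task> handler, GateMode mode)
    {
        this.handler = handler;
        this.mode = mode;
    }

    // Completes when the most recently started run has drained.
    public Task Idle => current;

    public void OnNext(T value)
    {
        if (running)
        {
            // Busy: Sequential keeps the value for later, Drop discards it.
            if (mode == GateMode.Sequential) pending.Enqueue(value);
            return;
        }
        running = true;
        current = RunAsync(value);
    }

    async Task RunAsync(T value)
    {
        try
        {
            await handler(value);
            // Sequential: process, in arrival order, the values queued while we were busy.
            while (pending.Count > 0) await handler(pending.Dequeue());
        }
        finally
        {
            running = false;
        }
    }
}
```

Sending 1, 2, 3 while the handler for 1 is still awaiting yields [1, 2, 3] under Sequential but only [1] under Drop, which is why Drop suits a submit button that must not fire twice.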

Furthermore, the following time-based filtering methods now accept asynchronous methods as well.

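Debounce(this Observable<T> source, Func<T, CancellationToken, ValueTask> throttleDurationSelector, ...)
ThrottleFirst(this Observable<T> source, Func<T, CancellationToken, ValueTask> sampler, ...)
ThrottleLast(this Observable<T> source, Func<T, CancellationToken, ValueTask> sampler, ...)
ThrottleFirstLast(this Observable<T> source, Func<T, CancellationToken, ValueTask> sampler, ...)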

We have also made Chunk accept asynchronous methods, and SkipUntil/TakeUntil have variations that accept a CancellationToken, a Task, or an asynchronous function.

SkipUntil(this Observable<T> source, CancellationToken cancellationToken)
SkipUntil(this Observable<T> source, Task task)
SkipUntil(this Observable<T> source, Func<T, CancellationToken, ValueTask> asyncFunc, ...)
TakeUntil(this Observable<T> source, CancellationToken cancellationToken)
TakeUntil(this Observable<T> source, Task task)
TakeUntil(this Observable<T> source, Func<T, CancellationToken, ValueTask> asyncFunc, ...)
Chunk(this Observable<T> source, Func<T, CancellationToken, ValueTask> asyncWindow, ...)

For example, using an asynchronous version of Chunk allows you to create chunks at random intervals, not just fixed ones, enabling complex logic to be written more naturally and simply.

Observable.Interval(TimeSpan.FromSeconds(1))
    .Index()
    .Chunk(async (_, ct) =>
    {
        await Task.Delay(TimeSpan.FromSeconds(Random.Shared.Next(0, 5)), ct);
    })
    .Subscribe(xs =>
    {
        Console.WriteLine(string.Join(", ", xs));
    });

async/await is indispensable in modern C# code, and we’ve made every effort to integrate it smoothly with Rx.

Retry operations can also benefit from async/await. Previously, Rx could only retry the entire pipeline, but because R3 accepts async/await, retries can be performed per asynchronous method call.

button.OnClickAsObservable()
    .SelectAwait(async (_, ct) =>
    {
        var retry = 0;
    AGAIN:
        try
        {            
            return await DownloadTextAsync("https://google.com/", ct);
        }
        catch
        {
            if (retry++ < 3) goto AGAIN;
            throw;
        }
    }, AwaitOperation.Drop)
    .Subscribe();
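If the goto reads awkwardly, the same per-call retry can be factored into a loop. This is a minimal sketch of a hypothetical helper (RetryHelper.RetryAsync is not part of R3) that retries a failed call up to a fixed number of times, never retries cancellation, and lets the last failure propagate.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RetryHelper
{
    // Hypothetical helper: runs `action`, retrying up to `maxRetries` more times after a failure.
    // Cancellation is never retried, and the final failure propagates to the caller.
    public static async ValueTask<T> RetryAsync<T>(
        Func<CancellationToken, ValueTask<T>> action, int maxRetries, CancellationToken ct)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return await action(ct);
            }
            catch (Exception ex) when (attempt < maxRetries && ex is not OperationCanceledException)
            {
                // Swallow this failure and try again.
            }
        }
    }
}
```

The exception filter keeps the original stack trace on the final failure, and the body of the SelectAwait selector above could delegate to a helper like this instead of using goto.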

Repeat can also be implemented with async/await. In this case, managing repeat conditions can be simpler than relying solely on Rx operators, potentially offering higher readability. Prioritizing readability (and performance) in coding is crucial. Let’s continue to effectively integrate Rx with async/await for better code.

You can also create Observables from asynchronous methods with Create and CreateFrom, which can be more straightforward than forcibly twisting a chain of operators into shape.

Create(Func<Observer<T>, CancellationToken, ValueTask> subscribe, ...)
CreateFrom(Func<CancellationToken, IAsyncEnumerable<T>> factory)
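Assuming the CreateFrom signature above, its factory is just a function from a CancellationToken to an async iterator, so ordinary `yield return` code can describe the sequence. The sketch below only builds such a factory with the BCL; CountdownSource is a hypothetical name, and the R3 call itself is left out so the snippet runs without the library.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class CountdownSource
{
    // Hypothetical async iterator: counts down, pausing `step` after each value.
    public static async IAsyncEnumerable<int> Countdown(int from, TimeSpan step,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        for (var i = from; i > 0; i--)
        {
            yield return i;
            await Task.Delay(step, ct);
        }
    }

    // Same shape as the factory parameter listed above: Func<CancellationToken, IAsyncEnumerable<T>>.
    public static Func<CancellationToken, IAsyncEnumerable<int>> Factory(int from, TimeSpan step)
        => ct => Countdown(from, step, ct);
}
```

A factory of this shape should then be passable to CreateFrom, with the subscription's cancellation flowing into the iterator through the token.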

Naming Rules

In R3, the names of several methods have been changed from those in dotnet/reactive or UniRx. For example:

Buffer -> Chunk
StartWith -> Prepend
Distinct(selector) -> DistinctBy
Throttle -> Debounce
Sample -> ThrottleLast

Let’s explain the reasons for these changes.

First, when creating a LINQ-style library in .NET, the highest priority should be given to the method names used in LINQ to Objects (Enumerable). Buffer was renamed to Chunk because Enumerable.Chunk was added in .NET 6 and does the same thing as Buffer. Rx predates Chunk, so nothing can be done about its differing name, but without such constraints, names should align with LINQ to Objects. Therefore, Chunk is the only choice. The same goes for Prepend and DistinctBy.
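For reference, these are the LINQ to Objects methods whose names R3 follows. The wrapper class LinqNaming is hypothetical; Chunk, Prepend, and DistinctBy are the real Enumerable methods (Chunk and DistinctBy require .NET 6 or later).

```csharp
using System.Linq;

// Hypothetical wrapper around real LINQ to Objects calls that R3's operator names mirror.
public static class LinqNaming
{
    // Chunk: fixed-size groups, the role Buffer(count) plays in dotnet/reactive.
    public static int[][] ChunkOfSeven() => Enumerable.Range(1, 7).Chunk(3).ToArray();

    // Prepend: put a value in front of the sequence, the role of StartWith.
    public static int[] PrependOne() => new[] { 2, 3 }.Prepend(1).ToArray();

    // DistinctBy: keep the first element for each key, the role of Distinct(selector).
    public static string[] FirstPerInitial() =>
        new[] { "apple", "avocado", "banana" }.DistinctBy(s => s[0]).ToArray();
}
```

Chunk(3) over 1..7 gives [1, 2, 3], [4, 5, 6], [7]; a developer who already knows that result can read R3's Chunk without learning a second name.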

You might resist renaming Throttle to Debounce, but Debounce is the standard name everywhere else. In the Rx world, dotnet/reactive is the only implementation that calls Debounce "Throttle". One could argue there is no need to change, since Rx.NET is the progenitor of the Rx world, but now that it is in the minority, going along with the majority is also correct.

The reason for choosing Debounce is not only that, but also the existence of ThrottleFirst / ThrottleLast. These take the first or the last value within a sampling period, respectively, and form a pair. But dotnet/reactive's Throttle behaves entirely differently, so the name Throttle is confusing alongside them. dotnet/reactive itself has no ThrottleFirst, only Sample (which corresponds to ThrottleLast), so its naming works on its own terms. However, once ThrottleFirst/ThrottleLast are adopted, the operator inevitably has to be named Debounce.

Regarding Sample, it was renamed to ThrottleLast for symmetry with ThrottleFirst in both name and function. In dotnet/reactive, where ThrottleFirst doesn't exist, Sample was a fine name, but once ThrottleFirst is adopted, its counterpart inevitably becomes ThrottleLast.
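The difference between the three operators is easiest to see on a timeline. Below is a simplified offline model over timestamped events, assumed for illustration only: the real operators run on a TimeProvider and may differ in edge cases, and this ThrottleLast model opens a window at the first value rather than ticking on a fixed period. RateLimitModels is a hypothetical name.

```csharp
using System.Collections.Generic;

// Simplified offline models over (time, value) events sorted by time; not R3's implementations.
public static class RateLimitModels
{
    // ThrottleFirst: pass a value, then ignore everything that arrives within `window` after it.
    public static List<(long Time, T Value)> ThrottleFirst<T>(IReadOnlyList<(long Time, T Value)> events, long window)
    {
        var result = new List<(long Time, T Value)>();
        long gateEnd = long.MinValue;
        foreach (var (time, value) in events)
        {
            if (time < gateEnd) continue; // still inside the silence window
            result.Add((time, value));
            gateEnd = time + window;
        }
        return result;
    }

    // ThrottleLast: the first value opens a window; when it closes, the latest value seen is emitted.
    public static List<(long Time, T Value)> ThrottleLast<T>(IReadOnlyList<(long Time, T Value)> events, long window)
    {
        var result = new List<(long Time, T Value)>();
        long? windowEnd = null;
        T latest = default!;
        foreach (var (time, value) in events)
        {
            if (windowEnd is long end && time >= end)
            {
                result.Add((end, latest)); // window closed: emit its last value
                windowEnd = null;
            }
            windowEnd ??= time + window;   // first value after an idle period opens a window
            latest = value;
        }
        if (windowEnd is long pending) result.Add((pending, latest));
        return result;
    }

    // Debounce: emit a value only after `quiet` passes with no newer value.
    public static List<(long Time, T Value)> Debounce<T>(IReadOnlyList<(long Time, T Value)> events, long quiet)
    {
        var result = new List<(long Time, T Value)>();
        for (var i = 0; i < events.Count; i++)
        {
            var due = events[i].Time + quiet;
            var superseded = i + 1 < events.Count && events[i + 1].Time < due;
            if (!superseded) result.Add((due, events[i].Value));
        }
        return result;
    }
}
```

In this model, ThrottleFirst and ThrottleLast emit once per window even under a continuous stream, while Debounce emits nothing until the stream goes quiet. That behavioral gap is why reusing the name Throttle for debouncing is confusing.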

There is a compromise: keep the name Sample as an alias for ThrottleLast (as RxJava does). But having different names for the same function confuses users, and questions like "what's the difference between sample and throttleLast?" come up quite often. Rx is complicated enough, and to avoid unnecessary confusion, aliases should definitely be avoided. Adding aliases such as Map for Select or Filter for Where is utterly foolish.

Default Scheduler for Platforms

In dotnet/reactive, the default scheduler is almost fixed. Technically, it’s possible to replace some behaviors by appropriately implementing IPlatformEnlightenmentProvider or IConcurrencyAbstractionLayer, but it's unnecessarily complicated and mostly hidden with [EditorBrowsable(EditorBrowsableState.Never)], so it's hardly intended for everyday use.

However, it is genuinely useful for Timer or Delay to run on DispatcherTimer in WPF, or on a timer driven by the PlayerLoop in Unity: they automatically dispatch to the main thread, which is convenient, and because ObserveOn becomes unnecessary in most cases, it is also better for performance.

In R3, we made it simple to replace the default TimeProvider/FrameProvider.

public static class ObservableSystem
{
    public static TimeProvider DefaultTimeProvider { get; set; } = TimeProvider.System;
    public static FrameProvider DefaultFrameProvider { get; set; } = new NotSupportedFrameProvider();
}

By replacing them at application startup, the best scheduler for that application will be used by default.

// For example, in WPF, the Dispatcher series is set, so it automatically returns to the UI thread
public static class WpfProviderInitializer
{
    public static void SetDefaultObservableSystem(Action<Exception> unhandledExceptionHandler)
    {
        ObservableSystem.RegisterUnhandledExceptionHandler(unhandledExceptionHandler);
        ObservableSystem.DefaultTimeProvider = new WpfDispatcherTimerProvider();
        ObservableSystem.DefaultFrameProvider = new WpfRenderingFrameProvider();
    }
}

// In the case of Unity, PlayerLoop-based ones are used, avoiding ThreadPool
public static class UnityProviderInitializer
{
    [RuntimeInitializeOnLoadMethod(RuntimeInitializeLoadType.AfterAssembliesLoaded)]
    public static void SetDefaultObservableSystem()
    {
        SetDefaultObservableSystem(static ex => UnityEngine.Debug.LogException(ex));
    }

    public static void SetDefaultObservableSystem(Action<Exception> unhandledExceptionHandler)
    {
        ObservableSystem.RegisterUnhandledExceptionHandler(unhandledExceptionHandler);
        ObservableSystem.DefaultTimeProvider = UnityTimeProvider.Update;
        ObservableSystem.DefaultFrameProvider = UnityFrameProvider.Update;
    }
}
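The pattern works because a consumer resolves the default when it is created rather than caching it at type-load time; that is an assumption about the design, not a quote of R3's code. The sketch below shows the idea with .NET 8's System.TimeProvider. AppDefaults, Deadline, and ManualTimeProvider are all hypothetical names.

```csharp
using System;

// Hypothetical process-wide default, modeled on the ObservableSystem.DefaultTimeProvider pattern above.
public static class AppDefaults
{
    public static TimeProvider DefaultTimeProvider { get; set; } = TimeProvider.System;
}

// Hypothetical consumer: it resolves the default when constructed, so a swap made at startup is picked up.
public sealed class Deadline
{
    readonly TimeProvider time;
    readonly DateTimeOffset due;

    public Deadline(TimeSpan after, TimeProvider? timeProvider = null)
    {
        time = timeProvider ?? AppDefaults.DefaultTimeProvider;
        due = time.GetUtcNow() + after;
    }

    public bool IsExpired => time.GetUtcNow() >= due;
}

// Hypothetical test double: a clock that only moves when Advance is called.
public sealed class ManualTimeProvider : TimeProvider
{
    DateTimeOffset now = new(2024, 1, 1, 0, 0, 0, TimeSpan.Zero);

    public override DateTimeOffset GetUtcNow() => now;

    public void Advance(TimeSpan by) => now += by;
}
```

Because the default is only a settable property, the same seam serves a WPF dispatcher, a Unity PlayerLoop, or a deterministic clock in tests, while an explicit argument still overrides it.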

Because its default scheduler cannot be changed, dotnet/reactive copes poorly with a variety of platforms.

internal static class SchedulerDefaults
{
    internal static IScheduler ConstantTimeOperations => ImmediateScheduler.Instance;
    internal static IScheduler TailRecursion => ImmediateScheduler.Instance;
    internal static IScheduler Iteration => CurrentThreadScheduler.Instance;
    internal static IScheduler TimeBasedOperations => DefaultScheduler.Instance;
    internal static IScheduler AsyncConversions => DefaultScheduler.Instance;
}

Especially in AOT scenarios (NativeAOT, Unity IL2CPP) or web publishing (WebGL, WASM), there are situations where the ThreadPool cannot be used and must be avoided entirely. So having SchedulerDefaults.TimeBasedOperations essentially fixed to the ThreadPool-backed DefaultScheduler is regrettably restrictive.

Pull IAsyncEnumerable vs Push Observable

IAsyncEnumerable (or UniTask's IUniTaskAsyncEnumerable) is a pull-based asynchronous sequence, while Reactive Extensions (Rx) is a push-based one. They are similar, and both support LINQ-like operations. It's easy to say that the choice depends on the case, but what are those cases? When should you use which? It would be nice to have some criteria for this decision.

Basically, if there’s a buffer (queue) behind the scenes, pull-based approaches seem suitable, so for network-related scenarios, it might be a good idea to use IAsyncEnumerable. Indeed, natural opportunities to use it come up with System.IO.Pipelines or System.Threading.Channels.
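Here is what "a buffer behind the scenes" looks like with System.Threading.Channels: the producer writes into a bounded queue, and the consumer pulls at its own pace through the IAsyncEnumerable returned by ReadAllAsync. PullDemo is a hypothetical name; the channel APIs are the standard BCL ones.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class PullDemo
{
    // Hypothetical demo: a producer fills a buffer, and the consumer pulls from it.
    public static async Task<List<int>> ProduceAndConsumeAsync(int count, CancellationToken ct = default)
    {
        // Bounded capacity gives back-pressure: WriteAsync waits while the buffer is full.
        var channel = Channel.CreateBounded<int>(capacity: 4);

        var producer = Task.Run(async () =>
        {
            try
            {
                for (var i = 0; i < count; i++) await channel.Writer.WriteAsync(i, ct);
                channel.Writer.Complete();
            }
            catch (Exception ex)
            {
                channel.Writer.TryComplete(ex); // surface the failure to the reader instead of hanging it
            }
        }, ct);

        var received = new List<int>();
        await foreach (var item in channel.Reader.ReadAllAsync(ct)) received.Add(item);
        await producer;
        return received;
    }
}
```

The queue is explicit here, so the pull model is natural. A raw OnClick event has no such queue, and forcing one in just to consume it as IAsyncEnumerable is the unnatural case this section argues against.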

Rx, on the other hand, belongs with events.

The deciding factor on which to use should be to choose the representation that is natural for the source. Raw events, such as OnMove or OnClick, are entirely push-based, with no buffer involved. It would also be suitable for high-frequency events like sensor data, or for events that come through the network where the buffer is hidden and delivered purely as events. This means Rx is the natural choice to handle them.

You could interpose a queue and deal with it via IAsyncEnumerable, but that would be unnatural. Alternatively, expressing the intentional dropping of values by not using a queue could also be done, but again, that’s unnatural. Being unnatural usually means worse performance and less clarity. In other words, it’s not good. Therefore, handle event-related things with Rx. With R3, integration with async/await allows you to explicitly specify buffering during asynchronous operations or dropping values with operators. This is clear and performs well. Let’s use R3.

Conclusion

Although I’ve pointed out many issues, I have nothing but gratitude for the original creators of Rx.NET. Once again, I am in awe of the brilliance of the Rx concept and the organized functionality of its various operators. Although some parts of the implementation have become outdated, it has a track record, stability, and high quality. I have been using it from the very beginning and have been enthusiastic about it. I also want to thank the current maintainers. It’s very difficult to maintain a widely used library in an ever-changing environment.

However, I wanted to revive the value of Rx. And if it were to be rebuilt, I believed I was the only one who could do it. I have known the history and implementation of Rx since the beginning, have implemented Rx itself (UniRx), and through its widespread use, have become familiar with many use cases and issues. I’ve also been involved on the application side of Rx in large-scale implementations for game titles and have implemented a custom runtime for async/await (UniTask) that has also been widely used, giving me insight into all aspects of this area. I have also accumulated experience in implementing high-performance serializers that have become standards in the industry, such as MessagePack for C# and MemoryPack, developing network frameworks like MagicOnion and applying new protocols (HTTP/2, gRPC), and in implementing modern high-performance C# in various areas with ZLogger and AlterNats. I have a sufficient technical foundation.

It’s fine for there to be multiple futures, so I hope you will see R3 as one possible future for Rx that I present. There may be another evolution and future for dotnet/reactive.

With that said, I believe R3 has shown enough potential and possibility to be considered a replacement. I have tried my best to consider migration scenarios as well, so please give it a try…!