“ZLinq”, a Zero-Allocation LINQ Library for .NET

May 21, 2025 - 04:15
I released ZLinq v1 last month! By building on structs and generics, it achieves zero allocations. It includes extensions like LINQ to Span, LINQ to SIMD, LINQ to Tree (FileSystem, JSON, GameObject, etc.), a drop-in replacement Source Generator for arbitrary types, and support for multiple platforms including .NET Standard 2.0, Unity, and Godot. It has now exceeded 2000 GitHub stars.

Struct-based LINQ itself isn’t particularly rare, and many implementations have attempted this approach over the years. However, none have been truly practical until now. They’ve typically suffered from extreme assembly size bloat, insufficient operator coverage, or performance issues due to inadequate optimization, never evolving beyond experimental status. With ZLinq, we aimed to create something practical by implementing 100% coverage of all methods and overloads in .NET 10 (including new ones like Shuffle, RightJoin, LeftJoin), ensuring 99% behavior compatibility, and implementing optimizations beyond just allocation reduction, including SIMD support, to outperform in most scenarios.

This was possible because of my extensive experience implementing LINQ. In April 2009, I released linq.js, a LINQ to Objects library for JavaScript (it’s wonderful to see that linq.js is still being maintained by someone who forked it!). I’ve also implemented the widely-used Reactive Extensions library UniRx for Unity, and recently released its evolution, R3. I’ve created variants like LINQ to GameObject, LINQ to BigQuery, and SimdLinq. By combining these experiences with knowledge from zero-allocation related libraries (ZString, ZLogger) and high-performance serializers (MessagePack-CSharp, MemoryPack), we achieved the ambitious goal of creating a superior alternative to the standard library.

This simple benchmark shows that while normal LINQ allocations increase as you chain more methods (Where, Where.Take, Where.Take.Select), ZLinq remains at zero.

Performance varies depending on the source, quantity, element type, and method chaining. To confirm that ZLinq performs better in most cases, we’ve prepared various benchmark scenarios that run on GitHub Actions: ZLinq/actions/Benchmark. While there are cases where ZLinq structurally can’t win, it outperforms in most practical scenarios.

For an extreme difference, consider calling Select repeatedly. Neither System.Linq nor ZLinq applies special optimizations in this case, but ZLinq shows a significant performance advantage:

(The 1 B memory measurement is a BenchmarkDotNet MemoryDiagnoser artifact; the documentation states that MemoryDiagnoser is accurate to 99.5%, so slight measurement errors can occur.)

In simple cases, operations that require intermediate buffers like Distinct or OrderBy show large differences because aggressive pooling significantly reduces allocations (ZLinq uses somewhat aggressive pooling since it’s primarily based on ref struct, which is expected to be short-lived):

LINQ applies special optimizations based on method call patterns, so reducing allocations alone isn’t enough to always outperform it. For operator chain optimizations, such as those introduced in .NET 9 and described in Performance Improvements in .NET 9, ZLinq implements all these optimizations to achieve even higher performance:

A great benefit of ZLinq is that these LINQ evolution optimizations become available to all .NET generations (including .NET Framework), not just the latest versions.

Usage is simple — just add an AsValueEnumerable() call. Since all operators are 100% covered, replacing existing code works without issues:

using ZLinq;
var seq = source
    .AsValueEnumerable() // only add this line
    .Where(x => x % 2 == 0)
    .Select(x => x * 3);

foreach (var item in seq) { }

To ensure behavior compatibility, ZLinq ports System.Linq.Tests from dotnet/runtime and continuously runs them at ZLinq/System.Linq.Tests.

9000 test cases guarantee behavior (skipped cases are due to ref struct limitations where identical test code can't run, etc.).

Additionally, ZLinq provides a Source Generator for Drop-In Replacement that can optionally eliminate even the need for AsValueEnumerable():

[assembly: ZLinq.ZLinqDropInAttribute("", ZLinq.DropInGenerateTypes.Everything)]

This mechanism allows you to freely control the scope of the Drop-In Replacement. ZLinq/System.Linq.Tests itself uses Drop-In Replacement to run existing test code with ZLinq without changing the tests.

ValueEnumerable Architecture and Optimization

For usage, please refer to the README. Here, I'll delve deeper into optimization. The architecture goes beyond simply implementing lazy sequence execution, containing many innovations compared to collection-processing libraries in other languages.

The definition of ValueEnumerable, which forms the basis of chaining, looks like this:

public readonly ref struct ValueEnumerable<TEnumerator, T>(TEnumerator enumerator)
    where TEnumerator : struct, IValueEnumerator<T>, allows ref struct // allows ref struct only in .NET 9 or later
{
    public readonly TEnumerator Enumerator = enumerator;
}

public interface IValueEnumerator<T> : IDisposable
{
    bool TryGetNext(out T current); // as MoveNext + Current

    // Optimization helpers
    bool TryGetNonEnumeratedCount(out int count);
    bool TryGetSpan(out ReadOnlySpan<T> span);
    bool TryCopyTo(scoped Span<T> destination, Index offset);
}

Based on this, operators like Where chain as follows:

public static ValueEnumerable<Where<TEnumerator, TSource>, TSource> Where<TEnumerator, TSource>(
    this ValueEnumerable<TEnumerator, TSource> source, Func<TSource, bool> predicate)
    where TEnumerator : struct, IValueEnumerator<TSource>, allows ref struct

We chose this approach rather than using IValueEnumerable<T> because with a definition like (this TEnumerable source) where TEnumerable : struct, IValueEnumerable<T>, type inference for TSource would fail. This is due to a C# language limitation where type inference doesn't work from type parameter constraints (dotnet/csharplang#6930). If implemented with that definition, it would require defining instance methods for a vast number of combinations. LinqAF took that approach, resulting in 100,000+ methods and massive assembly sizes, which wasn't ideal.

In LINQ, all implementation is in IValueEnumerator, and since all Enumerators are structs, I realized that instead of using GetEnumerator(), we could simply copy-pass the common Enumerator, allowing each Enumerator to process with its independent state. This led to the final structure of wrapping IValueEnumerator with ValueEnumerable. This way, types appear in type declarations rather than constraints, avoiding type inference issues.
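This copy-pass works because struct enumerators have value semantics: assignment duplicates the entire iteration state. You can observe the same behavior with the BCL's List<T>.Enumerator, which is also a mutable struct:

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 0, 1, 2 };

var e1 = list.GetEnumerator(); // List<T>.Enumerator is a mutable struct
e1.MoveNext();                 // e1 now points at 0

var e2 = e1;                   // value copy: e2 continues with its own independent state
e2.MoveNext();                 // e2 advances to 1
e1.MoveNext();                 // e1 also advances to 1, unaffected by e2

Console.WriteLine((e1.Current, e2.Current)); // (1, 1)
```

Had e1 been an interface-typed reference, e2 would have aliased the same object and the two iterations would interfere; the struct copy is what makes each enumerator's state independent.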

TryGetNext

Let’s examine MoveNext, the core of iteration, in more detail:

// Traditional interface
public interface IEnumerator<T> : IDisposable
{
    bool MoveNext();
    T Current { get; }
}

// iterate example
while (e.MoveNext())
{
    var item = e.Current; // invoke get_Current()
}

// ZLinq interface
public interface IValueEnumerator<T> : IDisposable
{
    bool TryGetNext(out T current);
}

// iterate example
while (e.TryGetNext(out var item))
{
}

C#’s foreach expands to MoveNext() + Current, which presents two issues. First, each iteration requires two method calls: MoveNext and get_Current. Second, Current requires holding a variable. Therefore, I combined them into bool TryGetNext(out T current). This reduces method calls to one per iteration, improving performance.

This bool TryGetNext(out T current) approach is similar to Rust's Iterator trait, whose next combines advancing and yielding into a single call:

pub trait Iterator {
    type Item;

    // Required method
    fn next(&mut self) -> Option<Self::Item>;
}

To understand the variable holding issue, let’s look at the Select implementation:

public sealed class LinqSelect<T, TResult>(IEnumerator<T> source, Func<T, TResult> selector) : IEnumerator<TResult>
{
    // Three fields
    IEnumerator<T> source = source;
    Func<T, TResult> selector = selector;
    TResult current = default!;

    public TResult Current => current;

    public bool MoveNext()
    {
        if (source.MoveNext())
        {
            current = selector(source.Current);
            return true;
        }
        return false;
    }
}

public ref struct ZLinqSelect<TEnumerator, T, TResult>(TEnumerator source, Func<T, TResult> selector) : IValueEnumerator<TResult>
    where TEnumerator : struct, IValueEnumerator<T>, allows ref struct
{
    // Two fields
    TEnumerator source = source;
    Func<T, TResult> selector = selector;

    public bool TryGetNext(out TResult current)
    {
        if (source.TryGetNext(out var value))
        {
            current = selector(value);
            return true;
        }
        current = default!;
        return false;
    }
}

IEnumerator requires a current field because it advances with MoveNext() and returns with Current. However, ZLinq advances and returns values simultaneously, eliminating the need to store the field. This makes a significant difference in ZLinq's struct-based architecture. Since ZLinq embraces a structure where each method chain encompasses the previous struct entirely (TEnumerator being a struct), struct size grows with each method chain. While performance remains acceptable within reasonable method chain lengths, smaller structs mean lower copy costs and better performance. The adoption of TryGetNext was essential to minimize struct size.
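The growth is easy to observe with Unsafe.SizeOf. The tuple layouts below are hypothetical stand-ins for the enumerator structs of an array.Where(...).Select(...) chain, not ZLinq's actual types:

```csharp
using System;
using System.Runtime.CompilerServices;

// hypothetical stand-ins for the enumerator structs of array.Where(...).Select(...):
// each operator embeds the previous enumerator by value, so the struct keeps growing
var sizeSource = Unsafe.SizeOf<(int[] Source, int Index)>();
var sizeWhere  = Unsafe.SizeOf<((int[] Source, int Index) Inner, Func<int, bool> Predicate)>();
var sizeSelect = Unsafe.SizeOf<(((int[] Source, int Index) Inner, Func<int, bool> Predicate) Inner, Func<int, int> Selector)>();

Console.WriteLine((sizeSource, sizeWhere, sizeSelect)); // sizes increase with each chained operator
```

Every field shaved off an enumerator (like the Current field TryGetNext eliminates) is saved once per operator in the chain, which is why minimizing struct size matters here.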

A drawback of TryGetNext is that it cannot support covariance and contravariance. However, I believe iterators and arrays should abandon covariance/contravariance support altogether. They're incompatible with Span<T>, making them outdated concepts when weighing pros and cons. For example, array-to-Span conversion can fail at runtime without compile-time detection:

// Due to array covariance, Derived[] is assignable to Base[]
Base[] array = new Derived[] { new Derived(), new Derived() };

// In this case, casting to Span<Base> or using AsSpan() causes a runtime error!
// System.ArrayTypeMismatchException: Attempted to access an element as a type incompatible with the array.
Span<Base> foo = array;

class Base;
class Derived : Base;

While this behavior exists because these features were added before Span, it's problematic in modern .NET where Span is widely used, making features that can cause runtime errors practically unusable.

TryGetNonEnumeratedCount / TryGetSpan / TryCopyTo

Naively enumerating everything doesn't maximize performance. For example, when calling ToArray, if the size doesn't change (e.g., array.Select().ToArray()), we can create a fixed-length array with new T[count]. System.Linq internally uses an Iterator type for such optimizations, but since the parameter is IEnumerable<T>, code like if (source is Iterator<TSource> iterator) is always needed.

Since ZLinq is designed specifically for LINQ from the start, we’ve prepared for these optimizations. To avoid assembly size bloat, we’ve carefully selected the minimal set of definitions that provide maximum effect, resulting in these three methods.

TryGetNonEnumeratedCount(out int count) succeeds when the original source has a finite count and no filtering methods (Where, Distinct, etc., though Take and Skip are calculable) intervene. This benefits ToArray and methods requiring intermediate buffers like OrderBy and Shuffle.
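System.Linq has exposed the same idea since .NET 6 as Enumerable.TryGetNonEnumeratedCount, which makes the success and failure conditions easy to observe:

```csharp
using System;
using System.Linq;

int[] source = { 1, 2, 3, 4, 5 };

// Select preserves a known count, so the count is available without enumerating
bool known = source.Select(x => x * 2).TryGetNonEnumeratedCount(out int count);
Console.WriteLine((known, count)); // (True, 5)

// Where filters, so the count can't be known without actually running the query
bool unknown = source.Where(x => x > 2).TryGetNonEnumeratedCount(out int filtered);
Console.WriteLine((unknown, filtered)); // (False, 0)
```

ZLinq's version of the check follows the same shape but is an interface method on the enumerator itself, so no runtime type test is needed.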

TryGetSpan(out ReadOnlySpan<T> span) potentially delivers dramatic performance improvements when the source can be accessed as contiguous memory, enabling SIMD operations or Span-based loop processing for aggregation.

TryCopyTo(scoped Span<T> destination, Index offset) enhances performance through internal iteration. To explain external vs. internal iterators, consider that List<T> offers both foreach and ForEach:

// external iterator
foreach (var item in list) { Do(item); }

// internal iterator
list.ForEach(Do);

They look similar but perform differently. Breaking down the implementations:

// external iterator
List<T>.Enumerator e = list.GetEnumerator();
while (e.MoveNext())
{
    var item = e.Current;
    Do(item);
}

// internal iterator
for (int i = 0; i < _size; i++)
{
    action(_items[i]);
}

This becomes a competition between delegate call overhead (+ delegate creation allocation) vs. iterator MoveNext + Current calls. The iteration speed itself is faster with internal iterators. In some cases, delegate calls may be lighter, making internal iterators potentially advantageous in benchmarks.

Of course, this varies case by case, and since lambda captures and normal control flow (like continue, break, await, etc…) aren’t available, I personally believe ForEach shouldn't be used, nor should custom extension methods be defined to mimic it. However, this structural difference exists.

TryCopyTo(scoped Span<T> destination, Index offset) achieves limited internal iteration by accepting a Span rather than a delegate.

Using Select as an example, for ToArray when Count is available, it passes a Span for internal iteration:

public ref struct Select<TEnumerator, T, TResult>
{
    public bool TryCopyTo(Span<TResult> destination, Index offset)
    {
        if (source.TryGetSpan(out var span))
        {
            if (EnumeratorHelper.TryGetSlice(span, offset, destination.Length, out var slice))
            {
                // loop inlining
                for (var i = 0; i < slice.Length; i++)
                {
                    destination[i] = selector(slice[i]);
                }
                return true;
            }
        }
        return false;
    }
}

// ------------------

// ToArray
if (enumerator.TryGetNonEnumeratedCount(out var count))
{
    var array = GC.AllocateUninitializedArray<T>(count);

    // try internal iterator
    if (enumerator.TryCopyTo(array.AsSpan(), 0))
    {
        return array;
    }

    // otherwise, use external iterator
    var i = 0;
    while (enumerator.TryGetNext(out var item))
    {
        array[i] = item;
        i++;
    }
    return array;
}

Thus, while Select can’t create a Span, if the original source can, processing as an internal iterator accelerates loop processing.

TryCopyTo differs from regular CopyTo by including an Index offset and allowing the destination to be smaller than the source (normal .NET CopyTo fails if the destination is smaller). This enables ElementAt to be expressed with a destination of size 1: index 0 becomes First, and ^1 becomes Last. Adding First, Last, and ElementAt directly to IValueEnumerator<T> would create redundancy in type definitions (affecting assembly size), but combining small destinations with Index allows one method to cover more optimization cases:

public static TSource ElementAt<TEnumerator, TSource>(this ValueEnumerable<TEnumerator, TSource> source, Index index)
    where TEnumerator : struct, IValueEnumerator<TSource>, allows ref struct
{
    using var enumerator = source.Enumerator;
    var value = default(TSource)!;
    var span = new Span<TSource>(ref value); // create single-element span
    if (enumerator.TryCopyTo(span, index))
    {
        return value;
    }
    // else...
}

LINQ to Span

In .NET 9 and above, ZLinq allows chaining all LINQ operators on Span<T> and ReadOnlySpan<T>:

using ZLinq;
// Can also be applied to Span<T> (only in .NET 9/C# 13 environments that support allows ref struct)
Span<int> span = stackalloc int[5] { 1, 2, 3, 4, 5 };
var seq1 = span.AsValueEnumerable().Select(x => x * x);

// If Drop-In replacement is enabled, you can call LINQ operators directly.
var seq2 = span.Select(x => x);

While some libraries claim to support LINQ for Spans, they typically only define extension methods for Span<T> without a generic mechanism, offering limited operators because language constraints previously prevented receiving Span<T> as a generic parameter. Generic processing became possible with the introduction of allows ref struct in .NET 9.

In ZLinq, there's no distinction between IEnumerable<T> and Span<T>: they're treated equally.

However, since allows ref struct requires language/runtime support, and ZLinq supports all .NET versions from .NET Standard 2.0 up, Span<T> support is limited to .NET 9 and above. This also means that in .NET 9+ all operators are ref structs, which differs from earlier versions.

LINQ to SIMD

System.Linq accelerates certain aggregation methods with SIMD. For example, calling Sum or Max directly on primitive-type arrays is faster than a for loop. However, being based on IEnumerable<T>, the applicable types are limited. ZLinq makes this more generic through IValueEnumerator<T>.TryGetSpan, targeting any collection from which a Span can be obtained (including Span<T> itself).

Supported methods include:

  • Range to ToArray/ToList/CopyTo/etc…
  • Repeat (for unmanaged structs whose size is a power of two) to ToArray/ToList/CopyTo/etc...
  • Sum for sbyte, short, int, long, byte, ushort, uint, ulong, double
  • SumUnchecked for sbyte, short, int, long, byte, ushort, uint, ulong, double
  • Average for sbyte, short, int, long, byte, ushort, uint, ulong, double
  • Max for byte, sbyte, short, ushort, int, uint, long, ulong, nint, nuint, Int128, UInt128
  • Min for byte, sbyte, short, ushort, int, uint, long, ulong, nint, nuint, Int128, UInt128
  • Contains for byte, sbyte, short, ushort, int, uint, long, ulong, bool, char, nint, nuint
  • SequenceEqual for byte, sbyte, short, ushort, int, uint, long, ulong, bool, char, nint, nuint

Sum checks for overflow, which adds overhead. We've added a custom SumUnchecked method that's faster:
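As a rough conceptual sketch (not ZLinq's actual implementation) of what an unchecked vectorized sum looks like: accumulate lanes in a Vector<int>, fold them at the end, and let overflow wrap silently:

```csharp
using System;
using System.Numerics;

static int SumUncheckedSketch(ReadOnlySpan<int> span)
{
    var acc = Vector<int>.Zero;
    int i = 0;

    // vectorized main loop: element-wise adds, Vector<int>.Count lanes at a time
    for (; i <= span.Length - Vector<int>.Count; i += Vector<int>.Count)
    {
        acc += new Vector<int>(span.Slice(i));
    }

    int sum = Vector.Sum(acc); // horizontal add of the lanes

    // scalar tail for the remainder
    for (; i < span.Length; i++)
    {
        sum += span[i];
    }
    return sum;
}

int[] data = new int[100];
for (int n = 0; n < data.Length; n++) data[n] = n + 1;

Console.WriteLine(SumUncheckedSketch(data)); // 5050
```

A checked Sum must instead detect overflow on every addition (or in every lane), which is exactly the overhead the unchecked variant avoids.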

Since these methods apply implicitly when conditions match, understanding the internal pipeline is necessary to know when SIMD kicks in. Therefore, for T[], Span<T>, and ReadOnlySpan<T>, we provide the .AsVectorizable() method to explicitly call SIMD-applicable operations like Sum, SumUnchecked, Average, Max, Min, Contains, and SequenceEqual (though these fall back to normal processing when Vector.IsHardwareAccelerated && Vector<T>.IsSupported is false).

int[] and Span<int> gain the VectorizedFillRange method, which performs the same operation as ValueEnumerable.Range().CopyTo(), filling with sequential numbers using SIMD acceleration. This is much faster than filling with a for loop when needed:
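Conceptually (again, an illustrative sketch rather than the library code), a SIMD sequential fill keeps a vector of running indices and adds a constant step per iteration:

```csharp
using System;
using System.Numerics;

static void FillRangeSketch(Span<int> destination)
{
    int i = 0;
    if (Vector.IsHardwareAccelerated && destination.Length >= Vector<int>.Count)
    {
        // seed vector: 0, 1, 2, ..., Count-1
        Span<int> seed = stackalloc int[Vector<int>.Count];
        for (int j = 0; j < seed.Length; j++) seed[j] = j;

        var current = new Vector<int>(seed);
        var step = new Vector<int>(Vector<int>.Count);

        for (; i <= destination.Length - Vector<int>.Count; i += Vector<int>.Count)
        {
            current.CopyTo(destination.Slice(i));
            current += step; // advance all lanes at once
        }
    }

    // scalar tail (also the fallback when SIMD isn't available)
    for (; i < destination.Length; i++)
    {
        destination[i] = i;
    }
}

var buffer = new int[10];
FillRangeSketch(buffer);
Console.WriteLine(string.Join(", ", buffer)); // 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
```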

Vectorizable Methods

Handwriting SIMD loop processing requires practice and effort. We've provided helpers that take delegate arguments for casual use. While these incur delegate overhead and perform worse than inline code, they're convenient for casual SIMD processing. They accept a Func<Vector<T>, Vector<T>> vectorFunc and a Func<T, T> func, processing with Vector<T> where possible and handling the remainder with the scalar func.

T[] and Span<T> offer the VectorizedUpdate method:

using ZLinq.Simd; // needs using
int[] source = Enumerable.Range(0, 10000).ToArray();

[Benchmark]
public void For()
{
    for (int i = 0; i < source.Length; i++)
    {
        source[i] = source[i] * 10;
    }
}

[Benchmark]
public void VectorizedUpdate()
{
    // arg1: Vector<int> => Vector<int>
    // arg2: int => int
    source.VectorizedUpdate(static x => x * 10, static x => x * 10);
}

While faster than for loops, performance varies by machine environment and size, so verification is recommended for each use case.

AsVectorizable() provides Aggregate, All, Any, Count, Select, and Zip:

source.AsVectorizable().Aggregate((x, y) => Vector.Min(x, y), (x, y) => Math.Min(x, y));
source.AsVectorizable().All(x => Vector.GreaterThanAll(x, new(5000)), x => x > 5000);
source.AsVectorizable().Any(x => Vector.LessThanAll(x, new(5000)), x => x < 5000);
source.AsVectorizable().Count(x => Vector.GreaterThan(x, new(5000)), x => x > 5000);

Performance depends on data, but Count can show significant differences:

For Select and Zip, you follow with either ToArray or CopyTo:

// Select
source.AsVectorizable().Select(x => x * 3, x => x * 3).ToArray();
source.AsVectorizable().Select(x => x * 3, x => x * 3).CopyTo(destination);

// Zip2
array1.AsVectorizable().Zip(array2, (x, y) => x + y, (x, y) => x + y).CopyTo(destination);
array1.AsVectorizable().Zip(array2, (x, y) => x + y, (x, y) => x + y).ToArray();

// Zip3
array1.AsVectorizable().Zip(array2, array3, (x, y, z) => x + y + z, (x, y, z) => x + y + z).CopyTo(destination);
array1.AsVectorizable().Zip(array2, array3, (x, y, z) => x + y + z, (x, y, z) => x + y + z).ToArray();

Zip can be particularly interesting and fast for certain use cases (like merging two Vec3):

LINQ to Tree

Have you used LINQ to XML? In 2008 when LINQ appeared, XML was still dominant, and LINQ to XML’s usability was shocking. Now that JSON has taken over, LINQ to XML is rarely used.

However, LINQ to XML’s value lies in being a reference design for LINQ-style operations on tree structures — a guideline for making tree structures LINQ-compatible. Tree traversal abstractions work excellently with LINQ to Objects. A prime example is working with Roslyn’s SyntaxTree, where methods like Descendants are commonly used in Analyzers and Source Generators.
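As a minimal illustration of why a Descendants axis composes so well with LINQ to Objects (a hypothetical adjacency-map tree, not ZLinq code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// hypothetical tree as an adjacency map (name -> child names)
var tree = new Dictionary<string, string[]>
{
    ["root"] = new[] { "a", "b" },
    ["a"] = new[] { "a1", "a2" },
    ["b"] = Array.Empty<string>(),
    ["a1"] = Array.Empty<string>(),
    ["a2"] = Array.Empty<string>(),
};

// depth-first Descendants axis, the same shape LINQ to XML uses
IEnumerable<string> Descendants(string node)
{
    foreach (var child in tree[node])
    {
        yield return child;
        foreach (var grand in Descendants(child)) yield return grand;
    }
}

// once the axis exists, every LINQ operator composes on top of it
var result = Descendants("root").Where(n => n.StartsWith("a"));
Console.WriteLine(string.Join(", ", result)); // a, a1, a2
```

The tree only has to expose the traversal; filtering, projection, and aggregation come for free from the query operators.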

ZLinq extends this concept by defining an interface that generically enables Ancestors, Children, Descendants, BeforeSelf, and AfterSelf for tree structures:

This diagram shows traversal of Unity’s GameObject, but we’ve included standard implementations for FileSystem (DirectoryTree) and JSON (enabling LINQ to XML-style operations on System.Text.Json’s JsonNode). Of course, you can implement the interface for custom types:

public interface ITraverser<TTraverser, T> : IDisposable
    where TTraverser : struct, ITraverser<TTraverser, T> // self
{
    T Origin { get; }
    TTraverser ConvertToTraverser(T next); // for Descendants
    bool TryGetHasChild(out bool hasChild); // optional: optimize use for Descendants
    bool TryGetChildCount(out int count);   // optional: optimize use for Children
    bool TryGetParent(out T parent); // for Ancestors
    bool TryGetNextChild(out T child); // for Children | Descendants
    bool TryGetNextSibling(out T next); // for AfterSelf
    bool TryGetPreviousSibling(out T previous); // for BeforeSelf
}

For JSON, you can write:

var json = JsonNode.Parse("""
// snip...
""");

// JsonNode
var origin = json!["nesting"]!["level1"]!["level2"]!;

// JsonNode axes: Children, Descendants, Ancestors, BeforeSelf, AfterSelf and ***Self
foreach (var item in origin.Descendants().Select(x => x.Node).OfType<JsonArray>())
{
    // [true, false, true], ["fast", "accurate", "balanced"], [1, 1, 2, 3, 5, 8, 13]
    Console.WriteLine(item.ToJsonString(JsonSerializerOptions.Web));
}

We’ve included standard LINQ to Tree implementations for Unity’s GameObject and Transform and Godot's Node. Since allocation and traversal performance are carefully optimized, they might even be faster than manual loops.

OSS and Me

There have been several incidents in .NET-related OSS in recent months, including the commercialization of well-known OSS projects. With over 40 OSS projects under the Cysharp GitHub organization, plus more under my personal account and other organizations such as MessagePack, totaling over 50,000 stars, I believe I'm one of the largest OSS providers in the .NET ecosystem.

Regarding commercialization, I have no plans for it, but maintenance has become challenging due to growing scale. A major factor in OSS projects attempting commercialization despite criticism is the mental burden on maintainers (compensation doesn’t match time investment). I experience this too!

Setting aside financial aspects, my request is for users to accept occasional maintenance delays! When developing large libraries like ZLinq, I need focused time, which means Issues and PRs for other libraries might go without response for months. I intentionally avoid looking at them, not even reading titles (avoiding dashboards and notification emails). This seemingly neglectful approach is necessary to create innovative libraries — a necessary sacrifice!

Even without that, the sheer number of libraries means rotation delays of months are inevitable. This is unavoidable due to absolute manpower shortage, so please accept these delays and don’t claim “this library is dead” just because responses are slow. That’s painful to hear! I try my best, but creating new libraries consumes tremendous time, causing cascading delays that drain my mental energy.

Also, irritations related to Microsoft can reduce motivation — a common experience for C# OSS maintainers. Despite this, I hope to continue long-term.

Conclusion

ZLinq’s structure changed significantly after feedback from the initial preview release. @Akeit0 provided many proposals for core performance-critical elements like the ValueEnumerable definition and adding Index to TryCopyTo. @filzrev contributed extensive test and benchmark infrastructure. Ensuring compatibility and performance improvements wouldn't have been possible without their contributions, for which I'm deeply grateful.

While zero-allocation LINQ libraries aren’t novel, ZLinq’s thoroughness sets it apart. With experience and knowledge, driven by sheer determination, we implemented all methods, ran all test cases for complete compatibility, and implemented all optimizations including SIMD. This was truly challenging!

The timing was perfect as .NET 9/C# 13 provided all the language features needed for a full implementation. Simultaneously, maintaining support for Unity and .NET Standard 2.0 was also important.

Beyond being just a zero-allocation LINQ, LINQ to Tree is a favorite feature that I hope people will try!

One LINQ performance bottleneck is delegates, and some libraries adopt a ValueDelegate approach using structs to mimic Func. We deliberately avoided this because such definitions are impractical due to their complexity; it's better to write inline code than use LINQ with ValueDelegate structures. Complicating the internal structure and bloating assembly size for benchmark hacks is wasteful, so we accept only System.Linq-compatible, delegate-based signatures.

R3 was an ambitious library intended to replace .NET’s standard System.Reactive, but replacing System.Linq would be a much larger or perhaps excessive undertaking, so I think there might be some resistance to adoption. However, I believe we’ve demonstrated sufficient benefits to justify the replacement, so I’d be very happy if you could try it out!
