Analyzing local functions in C# 7

Original author: SergeyT
  • Translation
At first, adding local functions to C# seemed redundant to me. After reading the article on SergeyT's blog, I realized that this feature really is needed. So if you doubt that local functions are necessary, or still do not know what they are, read on!

Local functions are a new feature in C# 7 that allows you to define a function inside another function.

When to use local functions?


The basic idea of local functions is very similar to that of anonymous methods: in some cases, creating a named function is too expensive in terms of the cognitive load on the reader. Sometimes a piece of functionality is inherently local to another function, and it makes no sense to pollute the “outer” scope with a separate named entity.

You might think that this feature is redundant because the same behavior can be achieved with anonymous delegates or lambda expressions. But that is not always true. Anonymous functions have certain limitations, and their performance characteristics may not be suitable for your scenarios.

Case Study 1: Preconditions in Iterator Blocks


Here is a simple function that reads a file line by line. Do you know when an ArgumentNullException will be thrown?
public static IEnumerable<string> ReadLineByLine(string fileName)
{
    if (string.IsNullOrEmpty(fileName)) throw new ArgumentNullException(nameof(fileName));
    foreach (var line in File.ReadAllLines(fileName))
    {
        yield return line;
    }
}
// When will the error happen?
string fileName = null;
// Here?
var query = ReadLineByLine(fileName).Select(x => $"\t{x}").Where(l => l.Length > 10);
// Or here?
ProcessQuery(query);

Methods with yield return in the body are special. They are called iterator blocks, and they are lazy. This means that their execution happens “on demand”: the first block of code in them runs only when a client of the method calls MoveNext on the resulting iterator. In our case, this means that the error will occur only inside the ProcessQuery method, because all LINQ operators are lazy as well.

Obviously, this behavior is undesirable because the ProcessQuery method will not have enough context to make sense of the ArgumentNullException. It would be much better to throw the exception right away, when the client calls ReadLineByLine, and not when the client processes the result.

To solve this problem, we need to split the method in two: an eager part that validates the arguments and a separate iterator that does the actual work. The iterator is a good candidate for an anonymous function, but anonymous delegates and lambda expressions do not support iterator blocks (*):

(*) Lambda expressions in VB.NET can have an iterator block.
public static IEnumerable<string> ReadLineByLine(string fileName)
{
    if (string.IsNullOrEmpty(fileName)) throw new ArgumentNullException(nameof(fileName));
    return ReadLineByLineImpl();
    IEnumerable<string> ReadLineByLineImpl()
    {
        foreach (var line in File.ReadAllLines(fileName))
        {
            yield return line;
        }
    }
}
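To see the difference between the two versions without touching the file system, here is a minimal sketch that replaces file reading with splitting an in-memory string. SplitLazy and SplitEager are hypothetical names used only for this illustration, and the program assumes C# 9 or later (top-level statements). The plain iterator does not throw until the sequence is consumed, while the version with a local function throws at the call site:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

bool lazyThrewAtCall = false, eagerThrewAtCall = false, lazyThrewLater = false;
IEnumerable<string>? lazy = null;

try { lazy = SplitLazy(null!); }
catch (ArgumentNullException) { lazyThrewAtCall = true; }

try { SplitEager(null!); }
catch (ArgumentNullException) { eagerThrewAtCall = true; }

// The lazy version fails only when the sequence is actually consumed.
try { lazy?.ToList(); }
catch (ArgumentNullException) { lazyThrewLater = true; }

Console.WriteLine(lazyThrewAtCall);   // False
Console.WriteLine(eagerThrewAtCall);  // True
Console.WriteLine(lazyThrewLater);    // True

// Lazy: the whole body, including the null check, is an iterator block.
IEnumerable<string> SplitLazy(string text)
{
    if (text is null) throw new ArgumentNullException(nameof(text));
    foreach (var word in text.Split(' '))
        yield return word;
}

// Eager: the outer function is not an iterator, so the check runs immediately;
// only the local function SplitImpl is lazy.
IEnumerable<string> SplitEager(string text)
{
    if (text is null) throw new ArgumentNullException(nameof(text));
    return SplitImpl();

    IEnumerable<string> SplitImpl()
    {
        foreach (var word in text.Split(' '))
            yield return word;
    }
}
```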


Case Study 2: Preconditions in Asynchronous Methods


Asynchronous methods have a similar problem with exception handling: any exception thrown by a method marked with the async keyword appears in the returned task:
public static async Task<string> GetAllTextAsync(string fileName)
{
    if (string.IsNullOrEmpty(fileName)) throw new ArgumentNullException(nameof(fileName));
    var result = await File.ReadAllTextAsync(fileName);
    Log($"Read {result.Length} characters from '{fileName}'");
    return result;
}
string fileName = null;
// No exceptions
var task = GetAllTextAsync(fileName);
// The following line will throw
var lines = await task;

You might think that it makes little difference when the error surfaces. But this is far from the truth. An exception thrown at the call site means that the caller violated the method's contract, for example by passing an invalid argument. A faulted task means that the method itself, or one of the components it depends on, failed to do what it was supposed to do.

Eager precondition checking is especially important when the resulting task is passed around the system. In that case, it would be very hard to understand when and what went wrong. A local function solves this problem:
public static Task<string> GetAllTextAsync(string fileName)
{
    // Eager argument validation
    if (string.IsNullOrEmpty(fileName)) throw new ArgumentNullException(nameof(fileName));
    return GetAllTextAsync();
    async Task<string> GetAllTextAsync()
    {
        var result = await File.ReadAllTextAsync(fileName);
        Log($"Read {result.Length} characters from '{fileName}'");
        return result;
    }
}
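The following sketch shows both behaviors side by side. It assumes C# 9 or later; FakeReadAsync is a hypothetical stand-in for File.ReadAllTextAsync so that the code runs without a file on disk, and the file name is arbitrary. Calling the plain async method with null returns a task that is already faulted, while the wrapper with an async local function throws immediately:

```csharp
using System;
using System.Threading.Tasks;

// Plain async method: the argument check ends up inside the returned task.
Task<string> lazyTask = LoadLazyAsync(null!);
bool lazyFaulted = lazyTask.IsFaulted;

// Non-async wrapper plus async local function: the argument check throws right away.
bool eagerThrewAtCall = false;
try { _ = LoadEagerAsync(null!); }
catch (ArgumentNullException) { eagerThrewAtCall = true; }

string text = await LoadEagerAsync("data.txt");
Console.WriteLine(lazyFaulted);       // True
Console.WriteLine(eagerThrewAtCall);  // True
Console.WriteLine(text);              // contents of data.txt

async Task<string> LoadLazyAsync(string fileName)
{
    if (string.IsNullOrEmpty(fileName)) throw new ArgumentNullException(nameof(fileName));
    return await FakeReadAsync(fileName);
}

Task<string> LoadEagerAsync(string fileName)
{
    if (string.IsNullOrEmpty(fileName)) throw new ArgumentNullException(nameof(fileName));
    return LoadImplAsync();

    async Task<string> LoadImplAsync() => await FakeReadAsync(fileName);
}

// Hypothetical stand-in for File.ReadAllTextAsync (no file system access).
static async Task<string> FakeReadAsync(string name)
{
    await Task.Yield();
    return $"contents of {name}";
}
```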


Case Study 3: Local Function with Iterator Blocks


I was always annoyed that iterators cannot be used inside lambda expressions. Here is a simple example: if you want to get all fields in a type hierarchy (including private ones), you have to walk the inheritance hierarchy manually. But the traversal logic is specific to a particular method and should be as “local” as possible:
public static FieldInfo[] GetAllDeclaredFields(Type type)
{
    var flags = BindingFlags.Instance | BindingFlags.Public |
                BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
    return TraverseBaseTypeAndSelf(type)
        .SelectMany(t => t.GetFields(flags))
        .ToArray();
    IEnumerable<Type> TraverseBaseTypeAndSelf(Type t)
    {
        while (t != null)
        {
            yield return t;
            t = t.BaseType;
        }
    }
}
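As a usage illustration, here is a self-contained sketch of the same traversal applied to a framework type. It assumes C# 9 or later (top-level statements), walks the hierarchy of ArgumentNullException, and collects declared fields the same way GetAllDeclaredFields does. The exact private field names of framework types vary between runtimes, so the sketch only checks that fields declared on the Exception base class are included:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

const BindingFlags Flags = BindingFlags.Instance | BindingFlags.Public |
                           BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

List<string> chain = TraverseBaseTypeAndSelf(typeof(ArgumentNullException))
    .Select(t => t.Name)
    .ToList();
Console.WriteLine(string.Join(" -> ", chain));
// ArgumentNullException -> ArgumentException -> SystemException -> Exception -> Object

// DeclaredOnly per type, collected over the whole chain, as in GetAllDeclaredFields.
FieldInfo[] fields = TraverseBaseTypeAndSelf(typeof(ArgumentNullException))
    .SelectMany(t => t.GetFields(Flags))
    .ToArray();
bool includesBaseFields = fields.Any(f => f.DeclaringType == typeof(Exception));
Console.WriteLine(includesBaseFields); // True

// An iterator block inside a local function: not possible in a C# lambda.
IEnumerable<Type> TraverseBaseTypeAndSelf(Type? t)
{
    while (t != null)
    {
        yield return t;
        t = t.BaseType;
    }
}
```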


Case Study 4: A Recursive Anonymous Method


Anonymous functions cannot refer to themselves by default. To get around this limitation, you must declare a local variable with the delegate type, and then capture this local variable inside the lambda expression or anonymous delegate:
public static List<Type> BaseTypesAndSelf(Type type)
{
    Action<List<Type>, Type> addBaseType = null;
    addBaseType = (lst, t) =>
    {
        lst.Add(t);
        if (t.BaseType != null)
        {
            addBaseType(lst, t.BaseType);
        }
    };
    var result = new List<Type>();
    addBaseType(result, type);
    return result;
}

This approach is not very readable, and the following solution with a local function seems more natural:
public static List<Type> BaseTypesAndSelf(Type type)
{
    return AddBaseType(new List<Type>(), type);
    List<Type> AddBaseType(List<Type> lst, Type t)
    {
        lst.Add(t);
        if (t.BaseType != null)
        {
            AddBaseType(lst, t.BaseType);
        }
        return lst;
    }
}
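Recursion and capture combine naturally in local functions. The sketch below is an illustrative example, not taken from the original article: a memoized Fibonacci in which the recursive local function captures a dictionary from the enclosing method. Fibonacci and Fib are hypothetical names, and the program assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(Fibonacci(50)); // 12586269025

long Fibonacci(int n)
{
    var cache = new Dictionary<int, long>();
    return Fib(n);

    // Recursive local function: it calls itself by name (no delegate variable
    // needed) and captures 'cache' from the enclosing function.
    long Fib(int k)
    {
        if (k < 2) return k;
        if (cache.TryGetValue(k, out long cached)) return cached;
        long value = Fib(k - 1) + Fib(k - 2);
        cache[k] = value;
        return value;
    }
}
```

With a lambda, the same code would need the "declare a null delegate variable first, then assign" workaround shown above.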


Case Study 5: When Allocations Matter


If you have ever worked on a performance-critical application, you know that anonymous methods are not cheap:
  • The overhead of a delegate invocation (very small, but real).
  • Two heap allocations if the lambda expression captures a local variable or a method argument (one for the closure instance and one for the delegate itself).
  • One heap allocation (the delegate) if the lambda expression captures only instance state of the enclosing object.
  • No allocations only if the lambda expression captures nothing or uses only static members.

But the allocation model for local functions is significantly different.
public void Foo(int arg)
{
    PrintTheArg();
    return;
    void PrintTheArg()
    {
        Console.WriteLine(arg);
    }
}

If a local function captures a local variable or argument, the C# compiler generates a special closure struct, creates an instance of it, and passes it by reference to a generated static method:
internal struct c__DisplayClass0_0
{
    public int arg;
}
public void Foo(int arg)
{
    // Closure instantiation
    var c__DisplayClass0_ = new c__DisplayClass0_0() { arg = arg };
    // Method invocation with a closure passed by ref
    Foo_g__PrintTheArg0_0(ref c__DisplayClass0_);
}
internal static void Foo_g__PrintTheArg0_0(ref c__DisplayClass0_0 ptr)
{
    Console.WriteLine(ptr.arg);
}
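Passing the closure struct by reference is what preserves the usual capture semantics: a write performed inside the local function is visible to the enclosing method, just as with a lambda. A minimal sketch, assuming C# 9 or later (CountEvens and Visit are hypothetical names):

```csharp
using System;

Console.WriteLine(CountEvens(new[] { 1, 2, 3, 4, 6 })); // 3

int CountEvens(int[] values)
{
    int count = 0;
    foreach (int v in values)
        Visit(v);
    return count;

    // 'count' lives in the compiler-generated closure struct, which is passed
    // to Visit by reference, so each increment is seen by CountEvens.
    void Visit(int value)
    {
        if (value % 2 == 0) count++;
    }
}
```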

(The compiler generates names containing characters that are not valid in C# identifiers, such as < and >. To improve readability, I changed the names and simplified the code a bit.)

A local function can capture instance state, local variables (***) or arguments without causing any allocation on the managed heap.
(***) Local variables used in a local function must be definitely assigned wherever the local function is used (called or converted to a delegate).

There are several cases in which an object will be allocated on the managed heap:

1. A local function is explicitly or implicitly converted to a delegate.
Only the delegate is allocated if the local function captures instance or static fields but does not capture local variables or arguments.
public void Bar()
{
    // Just a delegate allocation
    Action a = EmptyFunction;
    return;
    void EmptyFunction() { }
}

Both a closure and a delegate are allocated if the local function captures locals or arguments:
public void Baz(int arg)
{
    // Local function captures an enclosing variable.
    // The compiler will instantiate a closure and a delegate
    Action a = EmptyFunction;
    return;
    void EmptyFunction() { Console.WriteLine(arg); }
}
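One rough way to observe this is to count the bytes allocated on the current thread with GC.GetAllocatedBytesForCurrentThread (available on modern .NET). The sketch below assumes C# 9 or later; MeasureDirect and MeasureConverted are hypothetical helpers. Calling a capturing local function directly should not allocate, whereas converting it to a delegate allocates a new delegate on every conversion. Exact byte counts depend on the runtime, so treat the printed numbers as indicative only:

```csharp
using System;

// Warm up both paths once so that first-call (JIT) effects do not skew the numbers.
MeasureDirect(1);
MeasureConverted(1);

long direct = MeasureDirect(1);
long converted = MeasureConverted(1);
Console.WriteLine($"Direct calls:         {direct} bytes");
Console.WriteLine($"Delegate conversions: {converted} bytes");

static long MeasureDirect(int arg)
{
    long before = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < 100; i++)
        AddArg(i);                 // plain call: the closure is a struct passed by ref
    return GC.GetAllocatedBytesForCurrentThread() - before;

    int AddArg(int x) => x + arg;  // captures the parameter 'arg'
}

static long MeasureConverted(int arg)
{
    var sink = new Func<int, int>[100];  // allocated before measuring starts
    long before = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < sink.Length; i++)
        sink[i] = AddArg;          // each conversion creates a new delegate on the heap
    return GC.GetAllocatedBytesForCurrentThread() - before;

    // Because AddArg is converted to a delegate, its closure becomes a heap-allocated
    // class instance, created once per call of MeasureConverted, before the loop.
    int AddArg(int x) => x + arg;
}
```

The delegates are stored in an array on purpose, so that they escape and cannot be optimized away.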


2. A local function captures a local variable / argument, and an anonymous function captures a variable / argument from the same scope.
This is a more subtle case.

The C# compiler generates a separate closure type for each lexical scope (method arguments and top-level local variables belong to the same top-level scope). In the following case, the compiler will generate two closure types:
public void DifferentScopes(int arg)
{
    {
        int local = 42;
        Func<int> a = () => local;
        Func<int> b = () => local;
    }
    Func<int> c = () => arg;
}

Two different lambda expressions use the same type of closure if they capture variables from the same scope. The generated methods for lambda expressions a and b are in the same closure type:
private sealed class c__DisplayClass0_0
{
    public int local;
    internal int DifferentScopes_b__0()
    {
        // Body of the lambda 'a'
        return this.local;
    }
    internal int DifferentScopes_b__1()
    {
        // Body of the lambda 'b'
        return this.local;
    }
}
private sealed class c__DisplayClass0_1
{
    public int arg;
    internal int DifferentScopes_b__2()
    {
        // Body of the lambda 'c'
        return this.arg;
    }
}
public void DifferentScopes(int arg)
{
    var closure1 = new c__DisplayClass0_0 { local = 42 };
    var closure2 = new c__DisplayClass0_1() { arg = arg };
    var a = new Func<int>(closure1.DifferentScopes_b__0);
    var b = new Func<int>(closure1.DifferentScopes_b__1);
    var c = new Func<int>(closure2.DifferentScopes_b__2);
}

In some cases, this behavior can cause serious memory problems. Here is an example:
private Func<int> func;
public void ImplicitCapture(int arg)
{
    var o = new VeryExpensiveObject();
    Func<int> a = () => o.GetHashCode();
    Console.WriteLine(a());
    Func<int> b = () => arg;
    func = b;
}

It might seem that the variable o becomes eligible for garbage collection right after the call to a(). But this is not the case, because both lambda expressions share the same closure type:
private sealed class c__DisplayClass1_0
{
    public VeryExpensiveObject o;
    public int arg;
    internal int ImplicitCapture_b__0()
        => this.o.GetHashCode();
    internal int ImplicitCapture_b__1()
        => this.arg;
}
private Func<int> func;
public void ImplicitCapture(int arg)
{
    var c__DisplayClass1_ = new c__DisplayClass1_0()
    {
        arg = arg,
        o = new VeryExpensiveObject()
    };
    var a = new Func<int>(c__DisplayClass1_.ImplicitCapture_b__0);
    Console.WriteLine(a());
    var b = new Func<int>(c__DisplayClass1_.ImplicitCapture_b__1);
    this.func = b;
}

This means that the lifetime of the closure instance is tied to the lifetime of the func field: the closure stays alive as long as the delegate is reachable from application code. This can extend the lifetime of VeryExpensiveObject, which is essentially a kind of memory leak.
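The shared closure can be observed with reflection. In this sketch (assuming C# 9 or later and the current C# compiler's one-closure-per-scope behavior; MakeDelegates is a hypothetical name, and a plain object stands in for VeryExpensiveObject), only delegate b escapes, yet its Target, the compiler-generated closure, still holds a reference to the expensive object. The check deliberately does not rely on compiler-generated field names, which are an implementation detail:

```csharp
using System;
using System.Linq;
using System.Reflection;

var expensive = new object();               // stands in for VeryExpensiveObject
Func<int> kept = MakeDelegates(expensive, 7);

// 'kept' only needs 'arg', but its Target is the shared closure instance.
object closure = kept.Target!;
bool keepsExpensiveAlive = closure.GetType()
    .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
    .Any(f => ReferenceEquals(f.GetValue(closure), expensive));

Console.WriteLine(kept());               // 7
Console.WriteLine(keepsExpensiveAlive);  // True

static Func<int> MakeDelegates(object o, int arg)
{
    Func<int> a = () => o.GetHashCode();  // captures 'o'
    _ = a();                              // use 'a' once, then drop it
    Func<int> b = () => arg;              // captures only 'arg'
    return b;                             // only 'b' escapes, together with its closure
}
```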

A similar problem occurs when a local function and a lambda expression capture variables from the same scope. Even if they capture different variables, the closure type is shared, so a closure object is allocated on the managed heap:
public int ImplicitAllocation(int arg)
{
    if (arg == int.MaxValue)
    {
        // This code is effectively unreachable
        Func<int> a = () => arg;
    }
    int local = 42;
    return Local();
    int Local() => local;
}

It will be converted by the compiler to:
private sealed class c__DisplayClass0_0
{
    public int arg;
    public int local;
    internal int ImplicitAllocation_b__0()
        => this.arg;
    internal int ImplicitAllocation_g__Local1()
        => this.local;
}
public int ImplicitAllocation(int arg)
{
    var c__DisplayClass0_ = new c__DisplayClass0_0 { arg = arg };
    if (c__DisplayClass0_.arg == int.MaxValue)
    {
        var func = new Func<int>(c__DisplayClass0_.ImplicitAllocation_b__0);
    }
    c__DisplayClass0_.local = 42;
    return c__DisplayClass0_.ImplicitAllocation_g__Local1();
}

As you can see, all local variables from the upper scope are now part of the closure class, which leads to the creation of the closure object, even when the local function and lambda expression capture different variables.

Local functions 101


The following is a list of the most important aspects of local functions in C#:
  • Local functions can define iterator blocks.
  • Local functions are useful for eager precondition checks in asynchronous methods and iterator blocks.
  • Local functions can be recursive.
  • Local functions do not allocate on the heap unless they are converted to delegates.
  • Local functions are slightly more efficient than anonymous functions because they avoid the overhead of a delegate invocation (****).
  • Local functions can be declared after the return statement, which lets you separate the main logic of a method from the helper code.
  • Local functions can “hide” a function with the same name declared in the outer scope.
  • Local functions can be async and/or unsafe; no other modifiers are allowed (in C# 7; C# 8 added static local functions).
  • Local functions cannot have attributes (in C# 7; C# 9 lifted this restriction).
  • Local functions are not very IDE-friendly: there is no “extract local function” refactoring (translator's note: ReSharper 2017.3 already supports this feature), and if code containing a local function does not compile, you get a lot of “squiggles” in the IDE.

(****) Here are the results of the microbenchmark:
private static int n = 42;
[Benchmark]
public bool DelegateInvocation()
{
    Func<bool> fn = () => n == 42;
    return fn();
}
[Benchmark]
public bool LocalFunctionInvocation()
{
    return fn();
    bool fn() => n == 42;
}

Method                    Mean        Error       StdDev
DelegateInvocation        1.5041 ns   0.0060 ns   0.0053 ns
LocalFunctionInvocation   0.9298 ns   0.0063 ns   0.0052 ns

To get these numbers, you need to manually “decompile” the local function into a regular function. The reason is simple: a function as simple as “fn” gets inlined at runtime, and the benchmark would not show the true cost of the call. To get these numbers, I used a static function marked with [MethodImpl(MethodImplOptions.NoInlining)] (unfortunately, C# 7 does not allow attributes on local functions).
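On C# 9 and later, attributes are allowed on local functions, so the manual decompilation step is no longer needed: [MethodImpl(MethodImplOptions.NoInlining)] can be applied to a static local function directly. The sketch below is a crude Stopwatch loop, not a BenchmarkDotNet benchmark like the one above, and its timings are noisy and machine-dependent; IsAnswer is a hypothetical name:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

const int Iterations = 1_000_000;
Func<int, bool> isAnswerDelegate = value => value == 42;
int hits = 0;

var sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
    if (isAnswerDelegate(42)) hits++;     // delegate invocation
TimeSpan delegateTime = sw.Elapsed;

sw.Restart();
for (int i = 0; i < Iterations; i++)
    if (IsAnswer(42)) hits++;             // direct, non-inlined call
TimeSpan directTime = sw.Elapsed;

Console.WriteLine($"delegate: {delegateTime.TotalMilliseconds:F2} ms, direct: {directTime.TotalMilliseconds:F2} ms");

// C# 9+ only: attributes on local functions. NoInlining keeps the JIT from
// folding the body into the loop, so each iteration pays for a real call.
[MethodImpl(MethodImplOptions.NoInlining)]
static bool IsAnswer(int value) => value == 42;
```

For trustworthy numbers, a benchmarking harness such as BenchmarkDotNet (as used for the table above) is the better choice.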
