Paul-Sebastian Manole
Paul-Sebastian Codes

C#: Pass by value vs ref vs in vs out...

A rough guide

Paul-Sebastian Manole

Published on Oct 19, 2021

When learning C# you will find out about the different ways that variables can be passed as arguments to function parameters: by value, or by some kind of reference (ref, in, or out). In this post, I will try to summarize the essential differences between them.

The official Microsoft docs say that in C# arguments can be passed to parameters either by value or by reference. Passing by reference enables function members, methods, properties, indexers, operators, and constructors to change the value of the parameters and have that change persist in the calling environment.

Functional programming advocates might object to this feature, because it lets functions change variables that are defined outside of their scope, which can make code more prone to bugs, especially runtime bugs that are hard to diagnose.

The aim of this article is not to advocate for or against the different programming styles. None are perfect and they all have their uses. The aim is just to explain these features of the C# language and to briefly explain where they might be useful. In the end, they are just tools in your programmer's toolbox.

Pass by value

The default behavior of the language, when no parameter modifiers are used, is called pass by value. What this means is that whatever value the passed-in variable holds is copied into the function's parameter, effectively creating two instances of the data with exactly the same value at the start. These instances are in no other way related to each other. Changing (i.e. reassigning) one does not change the other, for example. They are independent.

This can sometimes be confusing for new C# programmers, because they usually think that only value types (the types inheriting from ValueType, such as int, double, and structs) are passed by value, because they are easy to copy.

In fact, reference types can also be passed by value. This is what normally happens when you don't use parameter modifiers and pass a reference type to a parameter. The difference between value types and reference types is that with reference types it's the reference that gets copied from one variable to the other, not the object the variable refers to (as happens with value types). This is because there's no telling how large the object graph might be, and copying such objects might be very expensive. So when the copy goes through, you will have two different variables holding the same reference. The references are equal, but the variables are not the same variable. They are still independent.

So passing by value copies values, but what the value actually is underneath doesn't matter. C# tries to abstract away memory pointers, which are more prevalent in programming languages like C and C++. This helps you always think of variables as their values, and not as either values or pointers/references. C# does this by automatically dereferencing references when you use them, without requiring any extra syntax like you have in C++ (i.e. the * prefix, the -> arrow operator, etc.).

More precisely, when a reference type object gets passed in by value (i.e. no parameter modifier is used), the outside variable's reference value gets copied to the function's parameter. The two variables, the outside variable and the parameter, are two distinct variables that now hold the same value: the same reference, to the same object in memory.

So if you have something like this:

public static void PassByValue(int i, MyObject o)
{
    // `i` holds its own copy of the argument's value, because `int` is a
    //     value type and value types are copied on assignment, so changing
    //     `i` here has no effect on whatever was passed in for it.
    // `o` holds a copy of the reference passed in when this function
    //     is called, so doing anything through this reference, like calling
    //     methods on the underlying object, can modify the object that
    //     `myObj1` below refers to, outside of the function body.
}

MyObject myObj1 = new MyObject();

PassByValue(0, myObj1);

// `myObj1` might have been modified here, if the above function calls
// a mutating function on `myObj1`, inside its body.

It might seem like calling PassByValue(0, myObj1) should clone myObj1 into the function body as the object o, but that's not what happens. myObj1 is not a value type but a reference type, so what actually gets copied is the reference itself, because that is the value stored in the variable.

o is functionally like a pointer, a special variable holding a reference to the location of an object somewhere in memory, but in C# you don't actually think like that, because you don't usually work with pointers directly, you just work with variables holding objects or values.

And while myObj1 and o are two distinct variables that point to the same object in memory, nothing is preventing you from changing the value of either variable so that they no longer point to the same object. Reassigning the parameter inside the function does not reassign the variable outside the function's scope.

public static void PassByValue(int i, MyObject o)
{
    // we don't do anything to modify `o` up until here,
    // for example, we don't call any members that modify
    // `o`'s state, because that would also modify the state
    // of the object passed as argument.

    // this assignment does not affect the value of `myObj1` below,
    // it just makes `o` point to a different object.
    o = new MyObject();

    // starting from here, we can now do anything with `o`,
    // without affecting `myObj1`.
}

MyObject myObj1 = new MyObject();

PassByValue(0, myObj1);

// `myObj1` still refers to the same, unmodified object that was created above.

To summarize, pass by value copies what the variable that's passed in holds:

  1. If it's a value type, the value itself is copied, so the variable in the caller's scope and the parameter in the function's scope hold two independent values (equal to each other, initially).
  2. If it's a reference type, only the reference is copied, so the variable in the caller's scope and the parameter in the function's scope both point to the same object. The two references are equal, but the two variables are still distinct, so reassigning one does not reassign the other.
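
To make the summary concrete, here is a small sketch that can be compiled and run. The Counter class and the PassByValueDemo method names are made up for this illustration and are not part of any library.

```csharp
using System;

// Hypothetical reference type, used only for this illustration.
public class Counter
{
    public int Count { get; set; }
}

public static class PassByValueDemo
{
    // `n` is a copy of the caller's int, so assigning to it stays local.
    public static void Increment(int n)
    {
        n = n + 1;
    }

    // `c` is a copy of the caller's reference. Both refer to the same
    // object, so this mutation is visible to the caller.
    public static void Mutate(Counter c)
    {
        c.Count = 42;
    }

    // Reassigning `c` only changes the local copy of the reference.
    public static void Reassign(Counter c)
    {
        c = new Counter { Count = 99 };
    }

    public static void Main()
    {
        int number = 1;
        Increment(number);
        Console.WriteLine(number);        // 1

        var counter = new Counter();
        Mutate(counter);
        Console.WriteLine(counter.Count); // 42

        Reassign(counter);
        Console.WriteLine(counter.Count); // 42
    }
}
```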

Pass by reference

Passing a variable by reference is just what it sounds like: the function receives a reference to the variable itself, not a copy of what the variable holds. In C# you ask for this with the ref keyword, both in the parameter declaration and at the call site.

As you might have deduced, it doesn't matter whether the variable holds a value type or a reference type: C# will always take a reference to the variable and pass that to the function.

To be more precise:

  1. If the variable passed in is a value type, then it will no longer be copied before entering the body of the function. C# takes a reference to the caller's variable and makes the parameter inside the function an alias for that same storage location.

    So this is almost the same behavior as the one described above for reference types passed by value, except that the reference now points at the caller's variable rather than at an object on the heap. Anything you do through it affects the outside variable too, including reassignment! With a ref parameter, = writes the new value into the storage the parameter refers to, which is the caller's variable, so the caller sees the change as soon as the assignment happens.

  2. If the variable passed in is a reference type, then a reference to that variable will be created (a reference to a variable that itself holds a reference), so if you reassign the parameter, for example, the outside variable reflects this and now points to the new object.

Note that the above two cases behave exactly the same: a reference to the outside variable is created and we work with that reference in the function's body. Assigning to it assigns to the outside variable too. Calling methods on it that modify state also modifies what the outside variable holds.
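
A classic way to see that both cases behave the same is a generic swap function. The sketch below uses a made-up SwapDemo class. The same Swap method works for an int (a value type) and a string (a reference type), because in both cases it is the caller's variables that get reassigned.

```csharp
using System;

public static class SwapDemo
{
    // Both parameters are aliases for the caller's variables, so
    // assigning to them reassigns the caller's variables.
    public static void Swap<T>(ref T left, ref T right)
    {
        T temp = left;
        left = right;
        right = temp;
    }

    public static void Main()
    {
        int a = 1, b = 2;
        Swap(ref a, ref b);
        Console.WriteLine($"{a} {b}"); // 2 1

        string first = "hello", second = "world";
        Swap(ref first, ref second);
        Console.WriteLine($"{first} {second}"); // world hello
    }
}
```

Without the ref modifiers, Swap would only exchange its own local copies and the caller would see no change.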

To conclude a few things that you might have already deduced by now, value types can almost be confused with the variables that hold them, while reference types sort of sit separately from the variables that hold them.

If you think about it, it sort of makes sense. A value type variable and its value are the same piece of memory. For local variables and parameters that memory is typically on the stack, although a value type that is a field of a class lives on the heap, inside that object. Since there is no separate object to point to, the only way for a function to change the caller's value is to be given a reference to the variable itself.

Reference types (actually the objects that these reference types point to) sit on the heap, somewhere far away 😂 and we work with them mostly by sending them messages and getting back replies.

Here is some test code to support the ideas above:

using System;

public class MyClass
{
    public int Y { get; set; } = 0;
}

public class Program
{
    public static int z = 0;

    public static void TestValueTypeByRef(ref int arg)
    {
        arg = 1;
        // calling a mutating method on `arg` here would also affect
        // the caller's variable
    }

    public static void TestRefByRef1(ref MyClass c)
    {
        c.Y = 1;
    }

    public static void TestRefByRef2(ref MyClass c)
    {
        c = new MyClass();
        c.Y = 2;
    }

    public static void Main()
    {
        Console.WriteLine($"Main z = {z}"); // Main z = 0

        TestValueTypeByRef(ref z);
        Console.WriteLine($"After TestValueTypeByRef(ref z): Main z = {z}"); // After TestValueTypeByRef(ref z): Main z = 1
        Console.WriteLine("");

        var mc = new MyClass();
        Console.WriteLine($"Main mc.Y = {mc.Y}"); // Main mc.Y = 0

        TestRefByRef1(ref mc);
        Console.WriteLine($"After TestRefByRef1(ref mc): Main mc.Y = {mc.Y}"); // After TestRefByRef1(ref mc): Main mc.Y = 1

        TestRefByRef2(ref mc);
        Console.WriteLine($"After TestRefByRef2(ref mc): Main mc.Y = {mc.Y}"); // After TestRefByRef2(ref mc): Main mc.Y = 2
    }
}

I almost forgot, but passing by reference requires that the variable be definitely assigned before it is passed in. An unassigned local variable won't compile, but a variable that has been assigned null is fine.
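
Here is a small sketch of what this looks like in practice for ref arguments. The EnsureName helper is hypothetical. Note that a variable holding null counts as assigned, so it can be passed by ref. With nullable reference types enabled you would declare it as string? to avoid a warning.

```csharp
using System;

public static class RefAssignmentDemo
{
    // Hypothetical helper: assigns a default when the caller's
    // variable currently holds null.
    public static void EnsureName(ref string name)
    {
        if (name == null)
        {
            name = "anonymous";
        }
    }

    public static void Main()
    {
        string name = null;      // assigned, even though the value is null
        EnsureName(ref name);    // allowed
        Console.WriteLine(name); // anonymous

        // string other;
        // EnsureName(ref other); // does not compile: use of unassigned
        //                        // local variable (CS0165)
    }
}
```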

Passing by input reference

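This is a special case of passing by reference where you're not allowed to assign to the variable inside the function, but you can still read it and call methods on it. Essentially the variable is readonly inside the function body. For a reference type, the object it points to is still modifiable through its own members. For a value type, you cannot assign to its fields either, and calling a method on a struct that is not declared readonly makes the compiler work on a hidden defensive copy, so the caller's value stays untouched.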

This is good for functions that might want to take a reference to an object from an outside scope and call methods on it, but ensure that the external variable still points to the same object and that it was not swapped with another object instance, inside the function.

You also don't need to use the in keyword when calling the function like you do when using ref (it is optional at the call site). In the calling scope, there's no need to be aware that the variable you're passing as an argument will be passed by reference, because it cannot be reassigned unexpectedly.

This also requires that the variable be definitely assigned before calling the function, or else the code won't compile. If the variable is a reference type that holds null, the call itself is fine, but calling a member on it inside the function throws a NullReferenceException.
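
The following sketch illustrates in parameters with one value type and one reference type. Point, Basket, and InParameterDemo are made-up names for this example. The commented-out lines are the ones the compiler rejects.

```csharp
using System;

// Hypothetical types, used only for this illustration.
public struct Point
{
    public int X;
    public int Y;
}

public class Basket
{
    public int Items { get; set; }
}

public static class InParameterDemo
{
    public static int Sum(in Point p)
    {
        // p.X = 10;        // does not compile: `p` is a readonly variable
        // p = new Point(); // does not compile either
        return p.X + p.Y;
    }

    public static void AddItem(in Basket b)
    {
        // b = new Basket(); // does not compile: cannot reassign `b`
        b.Items += 1;        // allowed: this changes the object, not `b`
    }

    public static void Main()
    {
        var point = new Point { X = 1, Y = 2 };
        Console.WriteLine(Sum(point));    // 3 (the `in` keyword is optional)
        Console.WriteLine(Sum(in point)); // 3

        var basket = new Basket();
        AddItem(basket);
        Console.WriteLine(basket.Items);  // 1
    }
}
```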

Passing by output reference

This is the last special case of passing by reference. It works like ref, except that you don't need to initialize the variable before calling the function, and the function cannot read the parameter until it has assigned something to it.

You do however need to assign a value to the out parameter before returning from the function.

You also need to use the out keyword both when declaring and when calling the function (you need to be aware of this in the calling scope, because the function will assign a new value to your variable).

Using out can make code more readable when a function needs to hand back more than one result. The extra results come back through out parameters, so you don't have to bundle them into the return type, which leaves the return value free for something else, like reporting success or failure. This is the pattern behind methods such as int.TryParse. Alternatively, you could return a tuple and use exceptions for error reporting. It's your choice.
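
As a rough sketch of this pattern, the hypothetical TryDivide method below reports success through its return value and hands back two results through out parameters. The last call uses the built-in int.TryParse, which follows the same convention.

```csharp
using System;

public static class OutParameterDemo
{
    // Hypothetical helper: returns false instead of throwing when
    // dividing by zero. Every out parameter must be assigned on every
    // path before the method returns.
    public static bool TryDivide(int dividend, int divisor,
                                 out int quotient, out int remainder)
    {
        if (divisor == 0)
        {
            quotient = 0;
            remainder = 0;
            return false;
        }

        quotient = dividend / divisor;
        remainder = dividend % divisor;
        return true;
    }

    public static void Main()
    {
        // The variables can be declared right in the call.
        if (TryDivide(17, 5, out int q, out int r))
        {
            Console.WriteLine($"{q} remainder {r}"); // 3 remainder 2
        }

        // `_` discards results we are not interested in.
        Console.WriteLine(TryDivide(1, 0, out _, out _)); // False

        if (int.TryParse("123", out int parsed))
        {
            Console.WriteLine(parsed); // 123
        }
    }
}
```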

Closing note

At this point, I mainly write these blog posts to help myself consolidate what I've learned, but I hope they can help others like myself as well. The quality right now is that of a rough draft of what could be better written and better thought out posts. The focus is to teach myself first and then maybe others, so I hope you can forgive me for putting out not so stellar content for now.

And hey, if you have anything constructive to say or anything to ask, feel free to use the comments section below.
