All about Unsafe Code in C#

C# .net hides most of memory management, which makes it much easier for the developer. Thanks for the Garbage Collector and the use of references. But to make the language powerful enough in some cases in which we need direct access to the memory, unsafe code was invented.

Commonly while programming in the .net framework we don?t need to use unsafe code, but in some cases there is no way not to, such as the following:

Real-time applications, we might need to use pointers to enhance performance in such applications.
External functions, in non-.net DLLs some functions requires a pointer as a parameter, such as Windows APIs that were written in C.
Debugging, sometimes we need to inspect the memory contents for debugging purposes, or you might need to write an application that analyzes another application process and memory.

Unsafe code is mostly about pointers which have the following advantages and disadvantages.

Advantages of Unsafe Code in C#:

Performance and flexibility, by using pointer you can access data and manipulate it in the most efficient way possible.
Compatibility, in most cases we still need to use old windows APIs, which use pointers extensively. Or third parties may supply DLLs that some of its functions need pointer parameters. Although this can be done by writing the DLLImport declaration in a way that avoids pointers, but in some cases it?s just much simpler to use pointer.
Memory Addresses, there is no way to know the memory address of some data without using pointers.

Disadvantages of Unsafe Code in C#:

Complex syntax, to use pointers you need to go throw more complex syntax than we used to experience in C#.
Harder to use, you need be more careful and logical while using pointers, miss using pointers might lead to the following:
- Overwrite other variables.
- Stack overflow.
- Access areas of memory that doesn?t contain any data as they do.
- Overwrite some information of the code for the .net runtime, which will surely lead your application to crash.
Your code will be harder to debug. A simple mistake in using pointers might lead your application to crash randomly and unpredictably.
Type-safety, using pointers will cause the code to fail in the .net type-safety checks, and of course if your security police don?t allow non type-safety code, then the .net framework will refuse to execute your application.

After we knew all the risks that might face us while using pointer and all the advantages those pointers introduces us of performance and flexibility, let us find now how to use them. The keyword unsafe is used while dealing with pointer, the name reflects the risks that you might face while using it. Let?s see where to place it. We can declare a whole class as unsafe:

unsafe class Class1
{
//you can use pointers here!
}

Or only some class members can be declared as unsafe:

class Class1
{
//pointer
unsafe int * ptr;
unsafe void MyMethod()
{
//you can use pointers here
}
}

The same applies to other members such as the constructor and the properties.

To declare unsafe local variables in a method, you have to put them in unsafe blocks as the following:

static void Main()
{
//can't use pointers here

unsafe
{
//you can declare and use pointer here

}

//can't use pointers here
}

You can?t declare local pointers in a ?safe? method in the same way we used in declaring global pointers, we have to put them in an unsafe block.

static void Main()
{
unsafe int * ptri; //Wrong
}

If you got too excited and tried to use unsafe then when you compile the code just by using

csc test.cs

You will experience the following error:

error CS0227: Unsafe code may only appear if compiling with /unsafe

For compiling unsafe code use the /unsafe

csc test.cs /unsafe

In VS.net go to the project property page and in ?configuration properties>build? set Allow Unsafe Code Blocks to True.

After we knew how to declare a block as unsafe we should now learn how to declare and use pointers in it.

Declaring pointers

To declare a pointer of any type all what you have to do is to put ?*? after the type name such as

int * ptri;

double * ptrd;

NOTE: If you used to use pointer in C or C++ then be careful that in C# int * ptri, i; ?*? applies to the type itself not the variable so ?i? is a pointer here as well, same as arrays.

void Pointers

If you want to declare a pointer, but you do not wish to specify a type for it, you can declare it as void.

void *ptrVoid;

The main use of this is if you need to call an API function than require void* parameters. Within the C# language, there isn?t a great deal that you can do using void pointers.

Using pointers

Using pointers can be demonstrated in the following example:

static void Main()
{

int var1 = 5;

unsafe
{
int * ptr1, ptr2;
ptr1 = &var1;
ptr2 = ptr1;
*ptr2 = 20;
}

Console.WriteLine(var1);
}

The operator ?&? means ?address of?, ptr1 will hold the address of var1, ptr2 = ptr1 will assign the address of var1, which ptr1 was holding, to ptr2. Using ?*? before the pointer name means ?the content of the address?, so 20 will be written where ptr2 points.

Now var1 value is 20.

sizeof operator

As the name says, sizeof operator will return the number of bytes occupied of the given data type

unsafe
{
Console.WriteLine("sbyte: {0}", sizeof(sbyte));
Console.WriteLine("byte: {0}", sizeof(byte));
Console.WriteLine("short: {0}", sizeof(short));
Console.WriteLine("ushort: {0}", sizeof(ushort));
Console.WriteLine("int: {0}", sizeof(int));
Console.WriteLine("uint: {0}", sizeof(uint));
Console.WriteLine("long: {0}", sizeof(long));
Console.WriteLine("ulong: {0}", sizeof(ulong));
Console.WriteLine("char: {0}", sizeof(char));
Console.WriteLine("float: {0}", sizeof(float));
Console.WriteLine("double: {0}", sizeof(double));
Console.WriteLine("decimal: {0}", sizeof(decimal));
Console.WriteLine("bool: {0}", sizeof(bool));

//did I miss something?!
}

The output will be:

sbyte: 1

byte: 1

short: 2

ushort: 2

int: 4

uint: 4

long: 8

ulong: 8

char: 2

float: 4

double: 8

decimal: 16

bool: 1

Great, we don?t have to remember the size of every data type anymore!

Casting Pointers

A pointer actually stores an integer that represents a memory address, and it?s not surprising to know that you can explicitly convert any pointer to or from any integer type. The following code is totally legal.

int x = 10;
int *px;

px = &x;
uint y = (uint) px;
int *py = (int*) y;

A good reason for casting pointers to integer types is in order to display them. Console.Write() and Console.WriteLine() methods do not have any overloads to take pointers. Casting a pointer to an integer type will solve the problem.

Console.WriteLine(?The Address is: ? + (uint) px);

As I mentioned before, it?s totally legal to cast a pointer to any integer type. But does that really mean that we can use any integer type for casting, what about overflows? On a 32-bit machine we can use uint, long and ulong where an address runs from zero to about 4 billion. And on a 64-bit machine we can only use ulong. Note that casting the pointer to other integer types is very likely to cause and overflow error. The real problem is that checked keyword doesn?t apply to conversions involving pointers. For such conversions, exceptions wont be raised when an overflow occur, even in a checked context. When you are using pointers the .net framework will assume that you know what you?re doing and you?ll be happy with the overflows!

You can explicitly convert between pointers pointing to different types. For example:

byte aByte = 8;
byte *pByte = &aByte;
double *pDouble = (double*) pByte;

This is perfectly legal code, but think twice if you are trying something like that. In the above example, the double value pointed to by pDouble will actually contain a byte (which is 8), combined by an area of memory contained a double, which surely won?t give a meaningful value. However, you might want to convert between types in order to implement a union, or you might want to cast pointers to other types into pointers to sbyte in order to examine individual bytes of memory.

Pointers Arithmetic

It?s possible to use the operators +, -, +=, -=, ++ and -- with pointers, with a long or ulong on the right-hand side of the operator. While it?s not permitted to do any operation on a void pointer.

For example, suppose you have a pointer to an int, and you want to add 1 to it. The compiler will assume that you want to access the following int in the memory, and so will actually increase the value by 4 bytes, the size of int. If the pointer was pointing to a double, adding 1 will increase its value by 8 bytes the size of a double.

The general rule is that adding a number X to a pointer to type T with a value P gives the result P + X *sizeof(T).

Let?s have a look at the following example:

uint u = 3;
byte b = 8;
double d = 12.5;
uint *pU = &u;
byte *pB = &b;
double *pD = &d;

Console.WriteLine("Before Operations");
Console.WriteLine("Value of pU:" + (uint) pU);
Console.WriteLine("Value of pB:" + (uint) pB);
onsole.WriteLine("Value of pD:" + (uint) pD);

pU += 5;
pB -= 3;
pD++;

Console.WriteLine("\nAfter Operations");
Console.WriteLine("Value of pU:" + (uint) pU);
Console.WriteLine("Value of pB:" + (uint) pB);
Console.WriteLine("Value of pD:" + (uint) pD);

The result is:

Before Operations
Value of pU:1242784
Value of pB:1242780
Value of pD:1242772

After Operations
Value of pU:1242804
Value of pB:1242777
Value of pD:1242780

5 * 4 = 20, where added to pU.

3 * 1 = 3, where subtracted from pB.

1 * 8 = 8, where added to pD.

We can also subtract one pointer from another pointer, provided both pointers point to the same date type. This will result a long whose value is given by the difference between the pointers values divided by the size of the type that they represent:

double *pD1 = (double*) 12345632;
double *pD2 = (double*) 12345600;
long L = pD1 ? pD2; //gives 4 =32/8(sizeof(double))

Note that the way of initializing pointers in the example is totally valid.

Pointers to Structs and Class members

Pointers can point to structs the same way we used before as long as they don?t contain any reference types. The compiler will result an error if you had any pointer pointing to a struct containing a reference type.

Let?s have an example,

Suppose we had the following struct:

struct MyStruct
{
public long X;
public double D;
}

Declaring a pointer to it will be:

MyStruct *pMyStruct;

Initializing it:

MyStruct myStruct = new MyStruct();
pMyStruct = & myStruct;

To access the members:

(*pMyStruct).X = 18;
(*pMyStruct).D = 163.26;

The syntax is a bit complex, isn?t it?

That?s why C# defines another operator that allows us to access members of structs through pointers with a simpler syntax. The operator ?Pointer member access operator? looks like an arrow, it?s a dash followed by a greater than sign: ->

pMyStruct->X = 18;
pMyStruct->D = 163.26;

That looks better!

Fields within the struct can also be directly accessed through pointer of their type:

long *pL = &(myStruct.X);
double *pD = &(myStruct.D);

Classes and pointers is a different story. We already know that we can?t have a pointer pointing to a class, where it?s a reference type for sure. The Garbage Collector doesn?t keep any information about pointers, it?s only interested in references, so creating pointers to classes could cause the Garbage Collector to not work probably.

On the other hand, class members could be value types, and it?s possible to create pointers to them. But this requires a special syntax. Remember that class members are embedded in a class, which sets in the heap. That means that they are still under the control of the Garbage Collector, which can at any time decide to move the class instance to a new location. The Garbage Collector knows about the reference, and will update its value, but again it?s not interested in the pointers around, and they will still be pointing to the old location.

To avoid the risk of this problem, the compiler will result an error if you tried to create pointers pointing to class members in the same way we are using up to now.

The way around this problem is by using the keyword ?fixed?. It marks out a block of code bounded by braces, and notifies the Garbage Collector that there may be pointers pointing to members of certain class instances, which must not be moved.

Let?s have an example,

Suppose the following class:

class MyClass
{
public long X;
public double D;
}

Declaring pointers to its members in the regular way is a compile-time error:

MyClass myClass = new MyClass();

long *pX = &(myClass.X); //compile-time error.

To create pointers pointing to class members by using fixed keyword:

fixed (long *pX = &(myClass.X))
{

// use *pX here only.
}

The variable *pX is scoped within the fixed block only, and tells the garbage collector that not to move ?myClass? while the code inside the fixed block.

stackalloc

The keyword "stackalloc" commands the .net runtime to allocate a certain amount of memory on the stack. It requires two things to do so, the type (value types only) and the number of variables you?re allocating the stack for. For example if you want to allocate enough memory to store 4 floats, you can write the following:

float *ptrFloat = stackalloc float [4];

Or to allocate enough memory to store 50 shorts:

short *ptrShort = stackalloc short [50];

stackalloc simply allocates memory, it doesn?t initialize it to any value. The advantage of stackalloc is the ultra-high performance, and it?s up to you to initialize the memory locations that were allocated.

A very useful place of stackalloc could be creating an array directly in the stack. While C# had made using arrays very simple and easy, it still suffers from the disadvantage that these arrays are actually objects instantiated from System.Array and they are stored on the heap with all of the overhead that involves.

To create an array in the stack:

int size;
size = 6; //we can get this value at run-time as well.
int *int_ary = stackalloc int [size];

To access the array members, it?s very obvious to use *(int_ary + i), where ?i ?is the index. But it won?t be surprising to know that it?s also possible to use int_ary[i].

*( int_ary + 0) = 5; //or *int_ary = 5;
*( int_ary + 1) = 9; //accessing member #1
*( int_ary + 2) = 16;

int_ary[3] = 19; //another way to access members
int_ary[4] = 7;
int_ary[5] = 10;

In a usual array, accessing a member outside the array bounds will cause an exception. But when using stackalloc, you?re simply accessing an address somewhere on the stack; writing on it could cause to corrupt a variable value, or worst, a return address from a method currently being executed.

int[] ary = new int[6];
ary[10] = 5;//exception thrown

int *ary = stackalloc int [6];
ary[10] = 5;// the address (ary + 10 * sizeof(int)) had 5 assigned to it.

This takes us to the beginning to the article; using pointer comes with a cost. You have to be very certain of what you?re doing, any small error could cause very strange and hard to debug run-time bugs.

Conclusion

Microsoft had chosen the term ?unsafe? to warn programmers of the risk they will go throw after typing that word. Using pointers as we had seen in this article has a lot of advantages from flexibility to high performance, but a very small error, even a simple typing mistake, might cause your entire application to crash. Worst, this could happen randomly and unpredictably, and will make debugging a very harder task to do.

- Article by Mardawi

Web Development - Dotnet with Cloud

Monday, June 8, 2009