Thursday, October 14, 2010

Is .NET Type-Safe?

A type error is erroneous or undesirable program behaviour caused by a discrepancy between differing data types. […] The behaviors classified as type errors by a given programming language are usually those that result from attempts to perform operations on values that are not of the appropriate data type.

That’s Wikipedia.

I always thought that .NET was type-safe. I always thought that the only legitimate way to access an object's data was through its own interface or through a dedicated external interface (reflection).

I was wrong.

You see, in C/C++ there’s a notion of a “union”, where different data can share the same memory location. It seems that, to preserve compatibility with legacy code, the same feature has been added to .NET. What shocks me is that it applies not only to value types but also to reference types.
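For readers who have not used unions, here is a minimal C sketch of the idea. The names `Overlay` and `byte_of` are made up for illustration, and the byte order you observe assumes a little-endian machine such as x86.

```c
#include <stdint.h>

/* Hypothetical illustration: two views of the same four bytes. */
union Overlay {
    uint32_t as_int;      /* the storage seen as one 32-bit integer */
    uint8_t  as_bytes[4]; /* the same storage seen as raw bytes     */
};

/* Writes through one member and reads back through the other. */
uint8_t byte_of(uint32_t value, int index)
{
    union Overlay o;
    o.as_int = value;          /* store via the integer view */
    return o.as_bytes[index];  /* load via the byte view     */
}
```

On a little-endian machine, `byte_of(0x11223344, 0)` yields 0x44: nothing was copied or converted, the same bytes were simply read under a different type.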

Consider the following example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Runtime.InteropServices;
 
namespace ConsoleApplication50
{
    class Program
    {
        static void Main( string[] args )
        {
            Foo f = new Foo();
 
            f.A = new A() { ab = 123, ai = 12345678 };
 
            Console.WriteLine( "{0} {1} {2} {3} {4}", f.B.b1, f.B.b2, f.B.b3, f.B.b4, f.B.bi );
            Console.ReadLine();
        }
    }
 
    [StructLayout( LayoutKind.Explicit )]
    public class Foo
    {
        [FieldOffset( 0 )]
        public A A;
        [FieldOffset( 0 )]
        public B B;
    }
 
    public class A
    {
        public int  ai;
        public byte ab;
    }
    public class B
    {
        public B( string Param ) { }
 
        public byte bi;
        public byte b1;
        public byte b2;
        public byte b3;
        public byte b4;
    }
}

Note that Foo is defined so that its fields A and B overlap. Both are reference types, so the two references occupy the same memory location and therefore point to the same object. The result is somewhat surprising: although only A is explicitly initialized, B is accessed without any issues and, because A and B overlap, the internal content of the A instance is read through consecutive members of B.

I must say that I am really and deeply confused.

The ECMA-335 standard, Partition II, section 10.7, says:

Offset values shall be non-negative. It is possible to overlap fields in this way, though offsets occupied by an
object reference shall not overlap with offsets occupied by a built-in value type or a part of another object
reference. While one object reference can completely overlap another, this is unverifiable.

If the code is unverifiable how can it run with no issues?

I always thought that code which cannot be statically verified to be type-safe (and is thus considered unsafe) can be executed only if the SecurityPermission( SecurityAction.RequestMinimum, SkipVerification=true ) attribute is present in the assembly. This is how you let code that uses pointers be executed by the runtime environment: it is unsafe, so it is your responsibility to allow it to run.

This is not the case, however, in the above example. The code runs perfectly under .NET 2.0 and .NET 4.0. What’s also confusing is that two different versions of PEVerify.exe give two different outputs:

Microsoft (R) .NET Framework PE Verifier.  Version  3.5.30729.1
Copyright (c) Microsoft Corporation.  All rights reserved.
All Classes and Methods in ConsoleApplication50.exe Verified.

and

Microsoft (R) .NET Framework PE Verifier.  Version  4.0.30319.1
Copyright (c) Microsoft Corporation.  All rights reserved.
[IL]: Error: [C:\Users\wzychla\Documents\Poligony\C#3.0\ConsoleApplication50\ConsoleApplication50\bin\Debug\ConsoleApplication50.exe : ConsoleApplication50.Program::Main] Type load failed.
[token  0x02000003] Type load failed.
2 Error(s) Verifying ConsoleApplication50.exe

You would at least expect the .NET 4.0 runtime to agree with the latter, wouldn't you? Well, it doesn't, as the code runs under .NET 4.0. The former is invoked from

c:\Program Files\Microsoft SDKs\Windows\v7.0A\bin

the latter from

c:\Program Files\Microsoft SDKs\Windows\v7.0A\bin\NETFX 4.0 Tools

The case is getting interesting! It seems that even the .NET Framework engineers are unsure whether such code is verifiable (or maybe the notion of “type-safety” changes as time passes). Why does the runtime not prevent this code from executing without the proper SecurityPermission, even though the latest PEVerify claims it’s unsafe?

Nevertheless, I believe that the example above reveals something not quite intended. To me, the .NET Framework is not type-safe anymore. Most people take type-safety to mean that code accesses only the memory locations it is authorized to access (directly or through a dedicated API). However, consider this (I just modified the Main method from the example above):

static void Main( string[] args )
{
     Foo f = new Foo();
 
     f.A = new A() { ab = 123, ai = 12345678 };
 
     Console.WriteLine( f.A.ai );
 
     f.B.b3 = 44;
 
     Console.WriteLine( f.A.ai );
 
     Console.ReadLine();
}
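To see why the second WriteLine in the modified Main can print a different number, here is a rough C analogue. It is only a sketch under loud assumptions: that the runtime places ai in bytes 0..3 and ab in byte 4 of A's field data, that B's five byte fields occupy bytes 0..4 in declaration order, and that the machine is little-endian. The struct and function names are hypothetical, and the CLR is free to lay fields out differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the field data of A and B
   (ASSUMED layout, not something the CLR guarantees). */
struct FieldsA { int32_t ai; uint8_t ab; };
struct FieldsB { uint8_t bi, b1, b2, b3, b4; };

/* Mimics `f.B.b3 = value` followed by reading `f.A.ai`. */
int32_t write_b3_then_read_ai(int32_t ai, uint8_t value)
{
    struct FieldsA a = { ai, 123 };
    struct FieldsB b;

    memcpy(&b, &a, sizeof b);  /* view A's first five bytes as a B */
    b.b3 = value;              /* the write made through B         */
    memcpy(&a, &b, sizeof b);  /* the same bytes, seen as A again  */
    return a.ai;
}
```

Under these assumptions b3 sits on the most significant byte of ai, so `write_b3_then_read_ai(12345678, 44)` turns 0x00BC614E into 0x2CBC614E, that is 750543182. Whether the .NET program prints exactly that value depends on the layout assumption holding.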

How am I allowed to modify the internal contents of an object (A in this case) by modifying the contents of another object? Do you still believe that the runtime cares for the consistency of your objects? Consider this:

public class A
{
    public A( int i ) { this.i = i; }
 
    private int i;
 
    public void WriteI()
    {
        Console.WriteLine( i );
    }
}

This is your class. It does not expose any public data; it can only print the internal value. I can easily compromise your object by wrapping it together with another object that has a different structure:

[StructLayout( LayoutKind.Explicit )]
public class HackShell
{
    [FieldOffset( 0 )]
    public A A;
    [FieldOffset( 0 )]
    public HackA B;
}
 
public class HackA
{
    public byte i1;
    public byte i2;
    public byte i3;
    public byte i4;
}

and now the integrity of your object does not hold anymore:

static void Main( string[] args )
 {
     A a = new A( 5 );
 
     // do you believe in the integrity of your object?
 
     HackShell shell = new HackShell();
     shell.A = a;
 
     // well, here you are:
 
     shell.B.i3 = 17;
 
     a.WriteI();
 
     Console.ReadLine();
 }

Unbelievable.

Although the internal data of an object can be modified using reflection, reflection cannot be used to alter the integrity of an object. And while pointers can be used to alter the integrity of objects, using them means that your code is not verifiable and so it will not be executed without explicit security permissions.

The conclusion is as follows: it seems that it’s valid to write code that is completely type-unsafe and verifiable at the same time.

Wednesday, October 6, 2010

Analyzing IIS logs (aka “Fundamental change in Microsoft’s attitude”)

There are plenty of interesting ways to analyze IIS logs; however, after some brief research I’ve found that two of them are worth mentioning:

  • use Analog, a free tool which can be customized to produce a detailed report from any log file and which, of course, works like a charm for IIS logs. Analog's output is a webpage which shows various statistics for your log, including a few charts. Edit: 06-2018: Analog has been discontinued. Please visit this comparison list instead.
  • import the log file to a SQL Server table and use SQL to build your own statistics

There’s a free tool from Microsoft to support the latter scenario, PrepWebLog. It’s supposed to rewrite the log file so that the lines starting with the comment character (#) are stripped out. The tool, however, does not work because of a stupid bug in the code (the source code is included):

#include <stdio.h>
#include <string.h>
 
int main(int argc, char **argsch)
{
   FILE *stream;
   char line[1000];
   int  ch;
 
   if(argc < 2)
   {
       printf("Usage: preplog.exe <weblog>\n");
       printf("\nThe output will go to stdout, so use > filename to direct to an output file\n");
       return -1;
    }//if
 
 
   if( (stream = fopen( argsch[1], "r" )) != NULL )
   {
        while(fgets(line,10000,stream) != NULL)
        {
           if(ch = strncmp(line,"#",1) !=0)
           {
            printf( "%s", line);
           }//if
        }//while
      fclose( stream );
      return 0;
   }//if
   else
   {
       printf("Could not open %s.  Please ensure that the path and filename are correct.\n",argsch[1]);
       return -1;
   }//else
}//main

(Yes, this is the complete source code of the tool!) Please note that “line” is defined as char[1000] while fgets is told it may store up to 10000 characters in it, so any line longer than 999 characters overflows the buffer. And yes, the log I’ve tried to analyze had a line longer than 1000 characters.
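One possible repair, sketched here rather than taken from Microsoft, is to pass the real buffer size to fgets and to remember whether a chunk starts a new line, so that a long line split across several reads is kept or dropped as a whole. The function name `strip_comment_lines` and its return value are my own choices for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Copies `in` to `out`, dropping every line that starts with '#'.
   Returns the number of lines kept. Lines may be of any length. */
long strip_comment_lines(FILE *in, FILE *out)
{
    char chunk[1000];
    int at_line_start = 1;  /* does the next chunk begin a line?   */
    int skipping = 0;       /* is the current line a comment line? */
    long kept = 0;

    /* sizeof chunk keeps fgets inside the buffer */
    while (fgets(chunk, sizeof chunk, in) != NULL) {
        if (at_line_start) {
            skipping = (chunk[0] == '#');
            if (!skipping)
                kept++;
        }
        if (!skipping)
            fputs(chunk, out);
        /* a chunk without '\n' means the same line continues */
        at_line_start = (strchr(chunk, '\n') != NULL);
    }
    return kept;
}
```

Called as `strip_comment_lines(stream, stdout)` in place of the original while loop, this keeps the tool's behaviour for short lines and no longer depends on the longest line fitting in the buffer.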

So I’ve been struck by a fundamental change in Microsoft’s attitude: instead of providing tools that work, they provide tools that do not work, but since the source code is included, customers can fix all the issues on their own.

Way to go, Microsoft :)