Object-Oriented Software Development: January 2009

Thursday, January 29, 2009

C# Puzzle No.12 (intermediate)

Consider the following class, which stores information about external URLs.

    class LinkInfo
    {
        public int    Ord { get; set; }
        public string Url { get; set; }
    }

In your application you somehow retrieve the list of such items:

    List<LinkInfo> l = new List<LinkInfo>()
    {
        new LinkInfo() { Ord = 1, Url = "" },
        new LinkInfo() { Ord = 2, Url = "test1" },
        new LinkInfo() { Ord = 3, Url = "test1" },
        new LinkInfo() { Ord = 4, Url = "test2" },
        new LinkInfo() { Ord = 5, Url = "" }
    };

Your goal is to use Linq to filter this list in a special way:

if the Url is empty, the item is always returned

if the Url is nonempty, only one item with that Url is returned

In the above case the Linq expression should return either:

    {
      new LinkInfo() { Ord = 1, Url = "" },
      new LinkInfo() { Ord = 2, Url = "test1" },
      new LinkInfo() { Ord = 4, Url = "test2" },
      new LinkInfo() { Ord = 5, Url = "" }
    };

or (either of the two "test1" items may be kept):

    {
      new LinkInfo() { Ord = 1, Url = "" },
      new LinkInfo() { Ord = 3, Url = "test1" },
      new LinkInfo() { Ord = 4, Url = "test2" },
      new LinkInfo() { Ord = 5, Url = "" }
    }

How efficient is your solution? Can you think not just of a solution, but of an efficient one?
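One possible answer is sketched below; it is not an official solution to the puzzle. It makes a single O(n) pass. It remembers the non-empty Urls already emitted in a HashSet<string> and lets HashSet.Add, which returns false for a duplicate, decide whether an item is kept. This keeps the first item for each Url, which gives the first of the two results listed above. The demo is a self-contained C# 9+ top-level program, so it models LinkInfo as a named tuple. The generic filter works the same way on the original list: KeepEmptyAndFirstPerUrl(l, x => x.Url). The method name is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Keeps every item whose Url is empty, plus the first item seen for each
// non-empty Url. HashSet<string>.Add returns false when the Url is already
// present, so the whole filter is a single O(n) pass.
static IEnumerable<T> KeepEmptyAndFirstPerUrl<T>(IEnumerable<T> source, Func<T, string> urlOf)
{
    var seen = new HashSet<string>();   // created anew each time the result is enumerated
    foreach (T item in source)
    {
        string url = urlOf(item);
        if (string.IsNullOrEmpty(url) || seen.Add(url))
            yield return item;
    }
}

// Demo data mirroring the list above; a named tuple stands in for LinkInfo.
var links = new List<(int Ord, string Url)>
{
    (1, ""), (2, "test1"), (3, "test1"), (4, "test2"), (5, "")
};

var filtered = KeepEmptyAndFirstPerUrl(links, x => x.Url).ToList();
Console.WriteLine(string.Join(", ", filtered.Select(x => x.Ord)));   // prints "1, 2, 4, 5"
```

The set lives inside the iterator, so every enumeration starts with an empty set. A one-liner such as l.Where(x => x.Url == "" || seen.Add(x.Url)) with a set declared outside the query also runs in O(n). However, it returns fewer items when enumerated a second time, because every non-empty Url is already in the set by then.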

Thursday, January 22, 2009

NTE_BAD_DATA (0x80090005) on CryptImportKey

The CryptImportKey documentation says that NTE_BAD_DATA can be returned when importing a key if:

Either the algorithm that works with the public key to be imported is not supported by this CSP, or an attempt was made to import a session key that was encrypted with something other than one of your public keys.

This was exactly the case in my scenario. I generate an RSA public/private key pair in .NET using RSACryptoServiceProvider:

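    using System.IO;
    using System.Security.Cryptography;

    RSACryptoServiceProvider rsa = new RSACryptoServiceProvider( 2048 );

    // public key only (PUBLICKEYBLOB) and public + private key (PRIVATEKEYBLOB)
    File.WriteAllBytes( @"capipublic.key", rsa.ExportCspBlob( false ) );
    File.WriteAllBytes( @"capiprivate.key", rsa.ExportCspBlob( true ) );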

and then try to use these keys to encrypt/decrypt data in C++ using CryptoAPI.

It seems that the default MS_DEF_PROV provider is unable to import a 2048-bit key; it just returns NTE_BAD_DATA.

However, initializing the crypto context (CryptAcquireContext) with a more powerful CSP, in this case MS_STRONG_PROV, is enough to fix it.
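When an import fails, it can help to confirm what the blob actually contains before blaming the provider. The sketch below is not from the original post. It reads the key type and bit length from a blob written by ExportCspBlob, assuming the documented CryptoAPI layout: an 8-byte BLOBHEADER (bType, bVersion, reserved, aiKeyAlg) followed by RSAPUBKEY (magic, bitlen, pubexp). ReadRsaBlobInfo is a hypothetical helper name. The sketch does not check which provider accepts which key size; the 2048-bit observation above remains the author's.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Reads the key kind and size from a CryptoAPI RSA key blob. Assumed layout:
//   BLOBHEADER (8 bytes): bType, bVersion, reserved(2), aiKeyAlg(4)
//   RSAPUBKEY (12 bytes): magic ("RSA1" public / "RSA2" private), bitlen, pubexp
static (bool IsPrivate, int BitLength) ReadRsaBlobInfo(byte[] blob)
{
    if (blob.Length < 20)
        throw new ArgumentException("Blob is too short to hold BLOBHEADER + RSAPUBKEY.");

    const byte PublicKeyBlob = 0x06, PrivateKeyBlob = 0x07;
    const uint Rsa1 = 0x31415352, Rsa2 = 0x32415352;   // "RSA1" / "RSA2" read as little-endian

    byte type = blob[0];
    uint magic = BinaryPrimitives.ReadUInt32LittleEndian(blob.AsSpan(8, 4));
    int bitLength = BinaryPrimitives.ReadInt32LittleEndian(blob.AsSpan(12, 4));

    if (type == PublicKeyBlob && magic == Rsa1) return (false, bitLength);
    if (type == PrivateKeyBlob && magic == Rsa2) return (true, bitLength);
    throw new ArgumentException($"Not an RSA key blob (bType=0x{type:X2}, magic=0x{magic:X8}).");
}

// Example: inspect the files written by the .NET snippet above, if they exist.
foreach (string path in new[] { "capipublic.key", "capiprivate.key" })
{
    if (!File.Exists(path)) continue;
    var blobInfo = ReadRsaBlobInfo(File.ReadAllBytes(path));
    Console.WriteLine($"{path}: private={blobInfo.IsPrivate}, {blobInfo.BitLength} bits");
}
```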

Thursday, January 15, 2009

Password recovery mechanism for applications which store data in password-protected files

The issue

If your application stores data in files, you probably think of a mechanism to protect the data with user-provided passwords.

The common issue with password-protected files is that a lost password cannot be easily recovered. However, if the application provides a backdoor or a magic unlock code, it can easily be compromised, and password-removing tools are easy to write.

Let's summarize a few assumptions about the encryption/recovery problem:

the application should be able to use any known encryption mechanism to protect files
the password-recovery mechanism should not be built into the application code but should rather involve additional actions from the application's producer (for example, I can send the file to the application's producer and they magically unlock it for me)
the password-recovery mechanism should not allow the application's producer to retrieve the password in plain, explicit form (because I might have used the same password to protect my files and my bank account)
the password-recovery mechanism should provide some form of authorization so that no one is able to steal my files and ask the producer to unlock them
the password-recovery mechanism should be as automated as possible so that no physical person has to unlock each single file

The idea

I use standard cryptography mechanisms to fulfill the above requirements: SHA512 is used to compute hashes, AES to encrypt file data and RSA to recover passwords.

The idea works like this: when the user decides to protect the file with a password, AES is used to encrypt the file data. In addition, the user is asked to provide both a password and an email address, which will later be used to authorize unlock requests.

The application then stores the data in an encrypted form containing three sections:

(E) the Base64 form of the email provided by the user (so that it can be easily extracted from the encrypted document)
(C) the file content encrypted with AES, with SHA512(Password) used as the actual password
(S) the tuple (Email, SHA512(Password)) encrypted with RSA (the RSA public key is shipped together with the application)
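To make the (E)/(C)/(S) layout concrete, here is a minimal sketch for .NET 6+ with top-level statements. It is not the reference implementation from fileencrecov.zip, and several details are assumptions. The post does not say how SHA512(Password) is fed to AES. This sketch therefore uses the hex form of the hash as the "unlock code" and stretches it into a 256-bit AES-CBC key with PBKDF2, using a random salt stored at the start of (C). (S) holds "email\nunlockCode" encrypted with RSA-OAEP (SHA-256). Protect, Unprotect and UnlockCode are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Unlock code: hex form of SHA512(password). This is what the recovery service can mail back.
static string UnlockCode(string password) =>
    Convert.ToHexString(SHA512.HashData(Encoding.UTF8.GetBytes(password)));

// ASSUMPTION: the unlock code is stretched into a 256-bit AES key with PBKDF2.
static byte[] DeriveKey(string unlockCode, byte[] salt) =>
    Rfc2898DeriveBytes.Pbkdf2(unlockCode, salt, 100_000, HashAlgorithmName.SHA256, 32);

// Builds the three sections. (C) is laid out as salt(16) | iv(16) | AES-CBC ciphertext.
static (string E, byte[] C, byte[] S) Protect(byte[] content, string email, string password, RSA publicKey)
{
    string code = UnlockCode(password);
    byte[] salt = RandomNumberGenerator.GetBytes(16);
    byte[] iv = RandomNumberGenerator.GetBytes(16);

    using Aes aes = Aes.Create();
    aes.Key = DeriveKey(code, salt);
    byte[] cipher = aes.EncryptCbc(content, iv);

    string e = Convert.ToBase64String(Encoding.UTF8.GetBytes(email));
    byte[] c = salt.Concat(iv).Concat(cipher).ToArray();
    // OAEP-SHA256 with a 2048-bit key fits at most 190 bytes: enough for
    // "email\n" plus the 128-character code unless the address is very long.
    byte[] s = publicKey.Encrypt(Encoding.UTF8.GetBytes(email + "\n" + code), RSAEncryptionPadding.OaepSHA256);
    return (e, c, s);
}

// Decrypts (C) with an unlock code, whether computed from the typed password
// or received from the recovery service.
static byte[] Unprotect(byte[] c, string unlockCode)
{
    using Aes aes = Aes.Create();
    aes.Key = DeriveKey(unlockCode, c[..16]);
    return aes.DecryptCbc(c[32..], c[16..32]);
}

// Demo: in the real scheme the application ships only the public key.
using RSA rsa = RSA.Create(2048);
var doc = Protect(Encoding.UTF8.GetBytes("secret notes"), "user@example.com", "p@ssw0rd", rsa);
byte[] recovered = Unprotect(doc.C, UnlockCode("p@ssw0rd"));
Console.WriteLine(Encoding.UTF8.GetString(recovered));   // prints "secret notes"
```

Unprotect needs only the unlock code. The same function therefore serves the normal path, where the code is computed from the typed password, and the recovery path, where the code arrives by email. CBC mode on its own does not detect tampering with (C), so an authenticated mode would be a sensible extension.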

When a password-protected file is opened, the user is asked to enter his/her password:

Note that the email is extracted from the (E) section of the document. Although it can easily be modified in the encrypted file (it is not encrypted, only Base64-encoded), the email is also stored in the RSA-encrypted (S) section, which the application can create with the public key but cannot read back ("one-way" from the application's point of view).

After the user provides the password, its SHA512 hash is used to decrypt the file content, with AES as the decryption algorithm.

At this point the triple (decrypted document content, user email and user password) exists in the application's memory. The user can then invoke the change-password operation, which again recomputes all three sections of the encrypted file, (E), (C) and (S), and stores the document.

Recovering lost passwords

To recover a lost password, the user clicks the [I lost my password] link in the password input window. The application calls the provider's web service and passes it three parameters: the encrypted file name, the email extracted from the public (E) section of the document and the contents of the (S) section of the document.

Note that the web service knows the RSA private key matching the public key used to encrypt the (S) section, so it can decrypt that section.

There are three possibilities:

the contents of the (S) section sent to the web service cannot be decrypted. An email is sent from the server to the provided address saying "sorry, but the request cannot be processed"
the content of the (S) section is successfully decrypted, but the email provided to the web service from the (E) section and the email recovered from the (S) section do not match. An email is sent to the provided address saying "sorry, but the document does not seem to be authorized for the provided address"
the content of the (S) section is successfully decrypted and both emails are the same. This means that the user is authorized to access the document data, and an email is sent saying "dear user, this is the unlock code for your file" together with the SHA512(Password) decrypted from the (S) section of the document
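The three cases can be sketched as a single server-side function. The sketch assumes that (S) holds "email\nunlockCode" encrypted with RSA-OAEP (SHA-256). That encoding is an assumption, not something the post specifies. HandleUnlockRequest is a hypothetical name, and it returns the body of the email instead of sending it, much like the dummy mailer in the reference implementation.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical server-side handler for the three cases above. `emailFromE` is the
// address the application read from the public (E) section; the returned string is
// the body of the email that would be sent to that address.
static string HandleUnlockRequest(string emailFromE, byte[] sectionS, RSA privateKey)
{
    const string Rejected = "Sorry, but the request cannot be processed.";
    string payload;
    try
    {
        payload = Encoding.UTF8.GetString(privateKey.Decrypt(sectionS, RSAEncryptionPadding.OaepSHA256));
    }
    catch (CryptographicException)
    {
        return Rejected;                                   // case 1: (S) cannot be decrypted
    }

    int separator = payload.IndexOf('\n');
    if (separator < 0) return Rejected;                     // decrypted, but not in the expected format
    string emailFromS = payload[..separator];
    string unlockCode = payload[(separator + 1)..];

    if (!string.Equals(emailFromE, emailFromS, StringComparison.Ordinal))
        return "Sorry, but the document does not seem to be authorized for this address.";  // case 2

    return "Dear user, this is the unlock code for your file: " + unlockCode;              // case 3
}

// Demo: one key pair plays both roles here; really only the service holds the private key.
using RSA serviceKey = RSA.Create(2048);
string code = Convert.ToHexString(SHA512.HashData(Encoding.UTF8.GetBytes("p@ssw0rd")));
byte[] encryptedSection = serviceKey.Encrypt(
    Encoding.UTF8.GetBytes("user@example.com\n" + code), RSAEncryptionPadding.OaepSHA256);

Console.WriteLine(HandleUnlockRequest("user@example.com", encryptedSection, serviceKey));
Console.WriteLine(HandleUnlockRequest("thief@example.com", encryptedSection, serviceKey));
```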

Note how the requirements are fulfilled. The email is stored independently in two sections of the document, and the unlock code is sent only when these two addresses match. On the other hand, the provider's service never gets access to the plain password; it only learns its SHA512 hash.

The user is then asked to enter the unlock code (which in fact is the SHA512 hash of his password):

Since the application now knows the SHA512 hash of the password, it uses it to AES-decrypt the document data (remember that the hash is used to encrypt the data, not the password itself).

Note that very few users will try to unlock files protected with an email address they can no longer use. As an emergency procedure, you can ask such users to send their files directly to the service provider. There the email authorization step is skipped, but some other authorization mechanism has to be used instead.

The implementation

As usual, a reference implementation is provided (fileencrecov.zip), containing two Visual Studio 2008 C# projects: a dummy application and a recovery web service. The web service does not send real emails! Instead, a dummy implementation stores the "emails" in the application's server directory, so you can open them with Notepad and copy/paste the unlock code into the application.

Where to go from there

The idea and the implementation are provided "as is", which means that I do not take any responsibility for any holes in the recovery protocol that could be used to compromise it. You are free to use the code; however, if you wish to use it in a closed-source application, you have to notify me.

You are also free to tune the protocol further, for example by using salts when computing hashes, so that the provider's web service cannot easily store the requests and perform a dictionary attack to retrieve users' plain passwords.
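A minimal sketch of that salting idea follows. It is an illustration, not part of the published protocol. The unlock code is derived with PBKDF2-HMAC-SHA512 from the password and a random per-file salt, which would be stored in clear in the document. Identical passwords in different files then produce different codes, precomputed tables stop working, and each dictionary guess costs `iterations` HMAC rounds. The iteration count and the SaltedUnlockCode name are assumptions. Whether the salt is also sent to the service is a separate protocol choice.

```csharp
using System;
using System.Security.Cryptography;

// Salted, slow unlock code: PBKDF2-HMAC-SHA512 over the password and a per-file salt.
// The 64-byte output has the same size as a plain SHA512 hash.
static string SaltedUnlockCode(string password, byte[] salt, int iterations = 210_000) =>
    Convert.ToHexString(Rfc2898DeriveBytes.Pbkdf2(password, salt, iterations, HashAlgorithmName.SHA512, 64));

byte[] saltA = RandomNumberGenerator.GetBytes(16);   // e.g. file A
byte[] saltB = RandomNumberGenerator.GetBytes(16);   // e.g. file B, same password

string codeA = SaltedUnlockCode("p@ssw0rd", saltA);
string codeB = SaltedUnlockCode("p@ssw0rd", saltB);
Console.WriteLine(codeA == codeB ? "same code" : "different codes");
```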

If you believe that the idea or the implementation is flawed in any way, please drop a note so that others do not use something with fundamental problems.

Wednesday, January 14, 2009

LinqToSQL: how to replace DELETE with UPDATE

Today I wrote a small web application which uses LinqToSQL and a LinqDataSource to connect to the data source. This time, however, I thought I could use a common pattern for deleting: instead of physically deleting records from the database, I would update them (and set object.Deleted = true).

My first successful approach was to plug into the LinqDataSource processing pipeline:

    protected void TheLinqDataSource_Deleting( object sender, LinqDataSourceDeleteEventArgs e )
    {
        /* an item to delete is passed in arguments */
        ACTIVITY Item = e.OriginalObject as ACTIVITY;

        /* create a new context */
        ActivityMonitorClassesDataContext context =
            new ActivityMonitorClassesDataContext();

        /* retrieve the item once again in the new context */
        ACTIVITY ItemToModify =
            ( from i in context.ACTIVITies where i.ID == Item.ID select i ).Single();

        /* update it */
        ItemToModify.Deleted = true;
        context.SubmitChanges();

        /* cancel the Deleting event from the processing pipeline */
        e.Cancel = true;
    }

This works like a charm, but then I thought: wait! Why do I hack the view's processing pipeline? Couldn't I just pretend that the object is deleted and instead modify the way LinqToSQL actually deletes data?

So I removed the TheLinqDataSource_Deleting handler and started to search for a place where I could plug into LINQ's processing pipeline.

It seems that a few partial methods are generated in the LINQ model:

    [System.Data.Linq.Mapping.DatabaseAttribute(Name="ActivityMonitor")]
    public partial class ActivityMonitorClassesDataContext :
        System.Data.Linq.DataContext
    {

        private static System.Data.Linq.Mapping.MappingSource
            mappingSource = new AttributeMappingSource();

    #region Extensibility Method Definitions
    partial void OnCreated();
    partial void InsertACTIVITY(ACTIVITY instance);
    partial void UpdateACTIVITY(ACTIVITY instance);
    partial void DeleteACTIVITY(ACTIVITY instance);
    #endregion

        ...

If you provide an explicit implementation of these methods (Ben Hall explains how Linq knows whether these methods are implemented), you can alter the way Linq inserts, updates and deletes items.

At first I thought that the following would work:

    public partial class ActivityMonitorClassesDataContext
    {
        partial void DeleteACTIVITY( ACTIVITY instance )
        {
            instance.Deleted = true;

            /* redirect delete to update */
            this.ExecuteDynamicUpdate( instance );
        }
    }

However, it does not work. Linq knows that the entity should be deleted, so the query generated for the update contains no new values. You end up with a SqlException saying that there is an error near WHERE (the update query looks like "UPDATE ... SET WHERE ...", with just a blank space between SET and WHERE).

I would still love to see the code above working; in the meantime, however, the following code works:

    public partial class ActivityMonitorClassesDataContext
    {
        partial void DeleteACTIVITY( ACTIVITY instance )
        {
            /* duplicate the context with its transaction */
            ActivityMonitorClassesDataContext context =
                new ActivityMonitorClassesDataContext( this.Connection );
            context.Transaction = this.Transaction;

            /* attach and modify the instance */
            context.ACTIVITies.Attach( instance );

            instance.Deleted = true;

            context.SubmitChanges();
        }
    }

I would love to learn a cleaner way to make it work.

Tutorial: using url rewriting to build an application using multiple independent data sources

Download the source code for this tutorial

Just as you are getting happy with your web application deployed on the application server and serving data to your clients, someone asks you to launch yet another instance of it for another group of clients. Then you repeat the process, creating new copies of the application on your application server and new databases.

Then suddenly you wake up with a few, a few dozen or a few hundred copies of the same application deployed on your application server. And someone asks you to upgrade the application. You copy files to their directories, sign ClickOnce applications and convert databases. The upgrade takes forever to complete or, worse, crashes before it's complete.

Sounds familiar?

I believe that this issue is not discussed as widely as I would expect. At the same time, it is one of the issues that occur not during the development phase but once the application has been in use for quite some time.

The intention of this post is to give you some basic ideas of architectural patterns used to build applications that are easily maintainable in many independent instances.

Assumptions

First of all, I do not assume that independent datasources can easily be merged into a single huge datasource. This is not the right way to go: large datasources are hard to maintain and debug and, moreover, putting all the data from different clients into a single database can be considered insecure.

However, there's no point in having copies of the same application on the application server. As you will see, it's fairly easy to run multiple independent datasources using a single application.

So, how do end users distinguish between datasources? Again, it's fairly simple: by using virtual URLs which are specific to datasources.

Suppose that you have your http://yourwebsite.com/start.aspx page, served to all users. Users of the datasource instance1 will use the http://yourwebsite.com/instance1/start.aspx address while users of instance2 will use http://yourwebsite.com/instance2/start.aspx. Both addresses will be virtual, meaning that the only resource physically existing on your web server will be the http://yourwebsite.com/start.aspx page.

Goal

The goal of this tutorial is then to write an application which can "virtually" address nonexistent "instances" of independent datasources, thus giving different users access to different datasources. Although the datasources have to be physically separated, we still want just a single instance of the application deployed on the application server.

Towards the solution

It seems that implementing such virtual addressing to pick the datasource is relatively easy: all you have to do is plug into the ASP.NET processing pipeline early enough to perform URL rewriting. There are two possible ways to rewrite URLs in ASP.NET; I prefer to use modules, specifically the Global.asax global application class.

It turns out that no matter which ASP.NET resource is actually accessed, and whether the resource exists or not, you can plug into the BeginRequest event and use HttpContext's RewritePath method to perform URL rewriting.

In real life applications you would probably prefer to match addresses using regular expressions, however, for this simple tutorial it's sufficient to manually parse the request address to find out which datasource the user is actually accessing.

   1: void Application_BeginRequest( object sender, EventArgs e )
   2: {
   3:     HttpApplication app = (HttpApplication)this;
   4:     HttpContext ctx = app.Context;
   5:
   6:     string LocalPath =
   7:         Helpers.ToAppRelativeWithQueryString( ctx.Request.Url.PathAndQuery );
   8:     string[] Segments = LocalPath.Split( '/' );
   9:
  10:     if ( Segments.Length > 2 )
  11:     {
  12:         string Instance = Segments[Segments.Length - 2].Replace( "/", "" );
  13:
  14:         /* remember current instance */
  15:         ctx.Items["INSTANCE"] = Instance;
  16:         /* remember that url has been rewritten */
  17:         ctx.Items["REWRITE"] = true;
  18:
  19:         string NewUrl = app.Request.Url.PathAndQuery.Replace( "/" + Instance, "" );
  20:
  21:         app.Context.RewritePath( NewUrl );
  22:     }
  23: }

Let's start with lines 6/7. When the request reaches the application server, the PathAndQuery differs depending on whether the application is located in the root directory of the web server or inside an IIS application. Suppose that a user requests the start.aspx page. In the former case the PathAndQuery is just /start.aspx. In the latter case, when the application is put inside an IIS application, the PathAndQuery is /ApplicationName/start.aspx.

To omit the part of the PathAndQuery which corresponds to the IIS application name, I use the handy

    VirtualPathUtility.ToAppRelative( ... )

from the standard library. However, this little shiny gem does not work with paths that contain nonempty query strings! This is why I wrap it in my custom ToAppRelativeWithQueryString.
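The post does not show that wrapper. Below is a hypothetical sketch of what it might do: split the query string off, convert only the path to its app-relative "~/" form, then re-append the query. The sketch does not call VirtualPathUtility, because that class needs a running ASP.NET application. Instead, the IIS application path ("/" or, for example, "/ApplicationName") is passed in explicitly, so the signature differs from the one used in line 7.

```csharp
using System;

// Hypothetical stand-in for Helpers.ToAppRelativeWithQueryString.
static string ToAppRelativeWithQueryString(string pathAndQuery, string appPath)
{
    // Split the query string off first, convert only the path, then re-append the query.
    int q = pathAndQuery.IndexOf('?');
    string path = q < 0 ? pathAndQuery : pathAndQuery[..q];
    string query = q < 0 ? "" : pathAndQuery[q..];

    string root = appPath.TrimEnd('/');   // "/" -> "", "/ApplicationName/" -> "/ApplicationName"
    bool insideApp = root.Length == 0 ||
        (path.StartsWith(root, StringComparison.OrdinalIgnoreCase) &&
         (path.Length == root.Length || path[root.Length] == '/'));
    if (!insideApp)
        return pathAndQuery;              // paths outside the application are returned unchanged

    string rest = path[root.Length..];
    return "~" + (rest.Length == 0 ? "/" : rest) + query;
}

Console.WriteLine(ToAppRelativeWithQueryString("/ApplicationName/instance1/start.aspx?x=1", "/ApplicationName"));
// prints ~/instance1/start.aspx?x=1
```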

Then, in line 8, I split the app-relative path to check whether it refers to the virtual address of a nonexistent instance of the application. As I said, you would usually use regular expressions there.

If there are more than 2 segments, I know that the address the user refers to has the form http://yourservername.com/instance1/start.aspx. In line 15 I store the instance name in the per-request Items container, and in line 17 I record that the URL has been rewritten.

Two notes here. First, as you'll find out in a minute, in this tutorial I store the information about the instance the user selects (with the virtual address) in the server's session container. However, this is not the best way to go, as it raises a few technical issues. For example, the session container is not available during the BeginRequest phase of the pipeline.

So I store the information in the Items container and copy it to the session container once it becomes available:

    void Application_AcquireRequestState( object sender, EventArgs e )
    {
        HttpApplication app = (HttpApplication)this;
        HttpContext ctx = app.Context;

        /* copy from items to session; Session is null for requests without session state */
        if ( ctx.Session != null && ctx.Items["INSTANCE"] != null )
            ctx.Session["INSTANCE"] = ctx.Items["INSTANCE"];
    }

Second, about line 17 of the BeginRequest snippet: we'll make use of the fact that the URL has been rewritten in the login page.

In line 19 I build the new URL by removing the virtual part that refers to the instance name, and in line 21 I rewrite the URL to the existing one.

The rewriting itself is a really powerful operation. It changes the internal address of the request while preserving all the POST parameters. This is what makes URL-rewriting applications possible: without the POST parameters passed to the rewritten address, you would, for example, end up with no events raised from any controls!
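As mentioned in the walkthrough, a real application would rather match the address with a regular expression than split it by hand. Below is a hypothetical sketch of that variant of lines 8 to 19. It takes an app-relative path such as the one produced in line 7 and returns the instance name together with the rewritten app-relative path, or null when the address is not virtual. Unlike splitting on '/', a '/' inside the query string cannot shift the segments. The pattern assumes that instance names consist of letters, digits, '_' or '-', and that exactly one virtual segment precedes the .aspx page.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical regex-based replacement for the Split logic of BeginRequest.
static (string Instance, string NewPath)? MatchInstance(string appRelativePath)
{
    Match m = Regex.Match(appRelativePath,
        @"^~/(?<instance>[A-Za-z0-9_-]+)/(?<page>[^/?]+\.aspx)(?<query>\?.*)?$",
        RegexOptions.IgnoreCase);
    if (!m.Success)
        return null;    // not a virtual instance address; leave the request alone
    // An unmatched optional group has an empty Value, so a missing query adds nothing.
    return (m.Groups["instance"].Value, "~/" + m.Groups["page"].Value + m.Groups["query"].Value);
}

Console.WriteLine(MatchInstance("~/instance2/Default.aspx"));   // prints (instance2, ~/Default.aspx)
```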

Tweaks in the login page

URL rewriting alone is not enough; we also have to tweak forms authentication. Normally, when users access pages they are not authorized to access, they are redirected to the login page. The problem is that only a single login page can be specified in the forms authentication section of web.config. How do we then redirect some users to http://yourservername.com/instance1/loginpage.aspx and other users to http://yourservername.com/instance2/loginpage.aspx?

Well, that's easy again. Put this into the Page_Load of the login page:

   1: public partial class LoginPage : System.Web.UI.Page
   2: {
   3:     protected void Page_Load(object sender, EventArgs e)
   4:     {
   5:         /* if there is an instance and url has NOT been rewritten yet */
   6:         if ( this.Session["INSTANCE"] != null &&
   7:              this.Context.Items["REWRITE"] == null
   8:             )
   9:         {
  10:             this.Response.Redirect(
  11:                 Helpers.CreateVirtualUrl( this.Request.Url.PathAndQuery ) );
  12:         }
  13:     }
  14: }

Take a look at lines 6/7. Suppose the user works with a virtual instance (which we record in the Session container) and no rewriting has happened while processing the current request. In that case we redirect from the login page to another login page, the virtual one. The first request to the login page has the form http://youraddress.com/LoginPage.aspx?ReturnUrl=Default.aspx, so we redirect to http://youraddress.com/instance1/LoginPage.aspx?ReturnUrl=Default.aspx. This new, virtual login page is then processed by BeginRequest, which rewrites it back to the former form. This time Page_Load does not redirect again, because the Items container records that the rewriting has been made.
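Helpers.CreateVirtualUrl is not shown in the post. Its single-argument call suggests that it looks the instance up itself, presumably from the session. The hypothetical sketch below receives the instance and the IIS application path as parameters instead, and only shows the string manipulation: the instance segment is inserted right after the application path.

```csharp
using System;

// Hypothetical version of Helpers.CreateVirtualUrl with the instance passed in explicitly.
static string CreateVirtualUrl(string pathAndQuery, string appPath, string instance)
{
    string root = appPath.TrimEnd('/');   // "" for an application at the site root
    if (!pathAndQuery.StartsWith(root + "/", StringComparison.OrdinalIgnoreCase))
        throw new ArgumentException("The path is not inside the application.", nameof(pathAndQuery));

    // Insert the instance segment right after the application path.
    return root + "/" + instance + pathAndQuery[root.Length..];
}

Console.WriteLine(CreateVirtualUrl("/LoginPage.aspx?ReturnUrl=Default.aspx", "/", "instance1"));
// prints /instance1/LoginPage.aspx?ReturnUrl=Default.aspx
```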

Storing the instance name user selects with his first request

As I said, in this tutorial we store the information about the user's virtual context in the Session container; however, cookies seem a much more practical solution. There's only one important security issue.

Suppose a user opens http://youraddress.com/instance1/Default.aspx, gets redirected to http://youraddress.com/instance1/LoginPage.aspx?ReturnUrl=Default.aspx and provides his/her valid credentials. Then he/she types http://youraddress.com/instance2/Default.aspx into the browser. At the same time, he/she deletes the browser cookie that points to the previously selected instance.

Can you see what happens? Your BeginRequest happily assigns new cookie which stores the information about instance2 as the current selection, however, the Forms Authentication cookie is still there! And the user gets the access to private data of another instance of the datasource. Disaster.

I solve this issue by calling FormsAuthentication.SignOut() in BeginRequest each time I find that there's no cookie pointing to a selected instance. Users can still switch datasource instances in the same browser window, but after switching they can access only public resources for which a forms authentication cookie is not required.

I also store the instance name in the UserData part of the Forms Authentication cookie and each time BeginRequest processes a virtual address, I check if the address of the virtual instance matches the one stored in the UserData.
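The two rules above can be condensed into one hypothetical decision helper that BeginRequest might consult before calling FormsAuthentication.SignOut(). The post does not say what happens when the URL instance and the UserData instance differ. Signing out in that case is an assumption of this sketch, as are the parameter names and the case-insensitive comparison.

```csharp
using System;

// Hypothetical decision helper for BeginRequest.
//   instanceCookie: instance stored in the selection cookie (null if the cookie is missing)
//   urlInstance:    instance taken from the virtual address of the current request
//   ticketInstance: instance stored in the forms authentication ticket's UserData
//                   (null if the user is not signed in)
// Returns true when BeginRequest should call FormsAuthentication.SignOut().
static bool MustSignOut(string instanceCookie, string urlInstance, string ticketInstance)
{
    if (instanceCookie == null)
        return true;                       // rule 1: no cookie pointing to a selected instance
    if (ticketInstance == null)
        return false;                      // anonymous user: only public resources anyway
    // rule 2 (mismatch handling assumed): the ticket belongs to another instance
    return !string.Equals(urlInstance, ticketInstance, StringComparison.OrdinalIgnoreCase);
}

Console.WriteLine(MustSignOut(null, "instance2", "instance1"));          // True (cookie deleted)
Console.WriteLine(MustSignOut("instance2", "instance2", "instance1"));  // True (ticket mismatch)
Console.WriteLine(MustSignOut("instance1", "instance1", "instance1"));  // False
```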

Handling IIS's 404 error pages

There's one more minor issue regarding the way IIS processes requests. If the request has the .aspx extension, IIS always passes it to ASP.NET. However, if a user just navigates to http://youraddress.com/instance1/ (which is his/her legitimate address!), IIS happily returns a 404 page, since there's no physical /instance1 resource on the server.

The solution is to redefine the page IIS sends back when a 404 occurs. Replace the standard page with your custom page, and make that page send the user's browser to an address ending with .aspx.

For example, put:

    <html>
    <head>
    <script type="text/javascript">
    function redirect404()
    {
      var href = location.href.toString();
      var infix = '';
      if ( href.charAt( href.length - 1 ) != '/' )
        infix = '/';
      location.href = href + infix + "start.aspx";
    }
    </script>
    </head>
    <body onload="redirect404()">
    </body>
    </html>

into the _404.htm file and configure IIS to serve _404.htm in response to 404 error.

Where to go now from there

With the basics behind us, let's discuss three remaining issues.

First of all, you'll notice that your code creates a lot of relative/absolute links which must be modified so that they point to /instance1/thelinkcontent instead of just /thelinkcontent. It does not take a lot of time, but the whole application has to be carefully examined and tested.

Another issue: you end up with a single application talking to many independent datasources. In the case of databases, you have to be sure that the structure of every database matches the object model in your application.

This is where I recommend a very handy pattern: do not convert databases manually, but write code which converts the database and run it when you handle the first request to that database. This way databases are automatically converted when they are accessed for the first time after you deploy a new version of your web application.
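The pattern might look roughly like the hypothetical sketch below. A process-wide ConcurrentDictionary of Lazy<int> values ensures that each datasource is checked at most once per process, even when several requests for it arrive at the same moment. Migration steps are applied in ascending version order. The sketch assumes that each database can report and store its schema version. The readVersion/writeVersion callbacks and the in-memory demo stand in for real database access.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// instance name -> upgrade that runs once per process (Lazy is thread-safe by default)
var upgraded = new ConcurrentDictionary<string, Lazy<int>>();

int EnsureUpgraded(string instance,
                   Func<string, int> readVersion,
                   Action<string, int> writeVersion,
                   SortedDictionary<int, Action<string>> migrations)
{
    Lazy<int> once = upgraded.GetOrAdd(instance, key => new Lazy<int>(() =>
    {
        int version = readVersion(key);
        foreach (var (target, step) in migrations)   // ascending target versions
        {
            if (target <= version) continue;
            step(key);                 // e.g. run the conversion scripts for this version
            writeVersion(key, target);
            version = target;
        }
        return version;
    }));
    return once.Value;
}

// Demo with in-memory "databases" instead of real connections.
var schemaVersions = new Dictionary<string, int> { ["instance1"] = 1, ["instance2"] = 3 };
int stepsRun = 0;
var steps = new SortedDictionary<int, Action<string>>
{
    [2] = db => stepsRun++,
    [3] = db => stepsRun++,
};

int Touch(string instance) =>
    EnsureUpgraded(instance, db => schemaVersions[db], (db, v) => schemaVersions[db] = v, steps);

Console.WriteLine($"{Touch("instance1")} {Touch("instance1")} {Touch("instance2")} steps={stepsRun}");
// prints "3 3 3 steps=2"
```

On a web farm, each server runs this check once. The conversion steps should therefore be safe to attempt more than once, or be guarded inside the database itself.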

And the last issue - your HTTP forms will be rendered with invalid action value. This issue can be resolved using standard techniques used by any ASP.NET url rewriting frameworks. You can either create your own Form Control Adapter or change the way forms are rendered.

If you are interested in url rewriting itself, please take a look here.
