Wednesday, January 14, 2009

Tutorial: using url rewriting to build an application using multiple independent data sources

Download the source code for this tutorial 

You are happy with your web application: it is deployed on the application server and serves data to your clients. Then someone asks you to launch yet another instance of the application to serve data for another group of clients. You repeat the process again and again, creating new copies of the application on your application server along with new databases.

Then suddenly you wake up with a few, a few dozen or a few hundred copies of the same application deployed on your application server. And someone asks you to upgrade the application. You copy files to their directories, sign ClickOnce applications and convert databases. The upgrade takes forever to complete or, worse, crashes before it completes.

Sounds familiar?

I believe that this issue is not discussed as widely as I would expect. At the same time, it is one of those issues that do not show up during development but only after the application has been in use for quite a while.

The intention of this post is to give you some basic ideas of architectural patterns used to build applications that are easily maintainable in many independent instances.

Assumptions

First of all - I do not assume that independent data sources can easily be merged into a single huge data source. This is not the right way to go: large data sources are hard to maintain and debug and, moreover, putting the data of different clients into a single database can be considered insecure.

However, there's no point in having copies of the same application on the application server. As you will see, it's fairly easy to serve multiple independent data sources from a single application.

So how do end users distinguish between data sources? Again, it's fairly simple: by using virtual URLs which are specific to each data source.

Suppose that you have your http://yourwebsite.com/start.aspx webpage, served to all users. Users of the data source instance1 will use the http://yourwebsite.com/instance1/start.aspx address, while users of instance2 will use http://yourwebsite.com/instance2/start.aspx. Both addresses are virtual, meaning that the only resource that physically exists on your web server is the http://yourwebsite.com/start.aspx webpage.

Goal

The goal of this tutorial is then to write an application which can "virtually" address nonexistent "instances" of independent data sources, thus giving different users access to different data sources. Although the data sources have to be physically separated, we still want to have just a single instance of the application deployed on the application server.

Towards the solution

Implementing such virtual addressing to pick the data source turns out to be relatively easy: all you have to do is plug into the ASP.NET processing pipeline early enough to perform URL rewriting. There are two possible ways to rewrite URLs in ASP.NET; I prefer modules or, more specifically, the global application class, Global.asax.

It turns out that no matter what ASP.NET resource is actually accessed and whether the resource exists or not, you can plug into the begin request event and use HttpContext's RewritePath method to perform url rewriting.

In real life applications you would probably prefer to match addresses using regular expressions, however, for this simple tutorial it's sufficient to manually parse the request address to find out which datasource the user is actually accessing.

   1: void Application_BeginRequest( object sender, EventArgs e )
   2: {
   3:     HttpApplication app = (HttpApplication)this;
   4:     HttpContext ctx = app.Context;
   5:  
   6:     string LocalPath = 
   7:         Helpers.ToAppRelativeWithQueryString( ctx.Request.Url.PathAndQuery );
   8:     string[] Segments = LocalPath.Split( '/' );
   9:  
  10:     if ( Segments.Length > 2 )
  11:     {
  12:         string Instance = Segments[Segments.Length - 2].Replace( "/", "" );
  13:  
  14:         /* remember current instance */
  15:         ctx.Items["INSTANCE"] = Instance;
  16:         /* remember that url has been rewritten */
  17:         ctx.Items["REWRITE"] = true;
  18:  
  19:         string NewUrl = app.Request.Url.PathAndQuery.Replace( "/" + Instance, "" );
  20:  
  21:         app.Context.RewritePath( NewUrl );
  22:     }
  23: }

 


Let's start with lines 6/7. When the actual request reaches the application server, PathAndQuery differs depending on whether the application is located in the root directory of the server or in one of the server's "application directories". Suppose that the user requests the start.aspx page. In the former case PathAndQuery is just /start.aspx. In the latter case, when the application lives inside an IIS application, PathAndQuery is /ApplicationName/start.aspx.


To be able to omit the part of the PathAndQuery which corresponds to the IIS application name, I use the magical



    VirtualPathUtility.ToAppRelative( ... )

from the standard library. However, this shiny little gem does not work with paths that contain nonempty query strings! This is why I wrap it in my custom ToAppRelativeWithQueryString helper.
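The helper itself is not listed in this post. A minimal sketch of what such a wrapper could look like is shown below. It is an assumption, not the code from the download: the class name PathHelpers and the explicit applicationPath parameter are made up for illustration, and the application prefix is trimmed with plain string operations so that the snippet also runs outside ASP.NET. Inside a web application you would rather pass the path part (without the query string) to VirtualPathUtility.ToAppRelative.

```csharp
using System;

public static class PathHelpers
{
    // Hypothetical stand-in for the post's helper: turns
    // "/App/instance1/start.aspx?x=1" into "~/instance1/start.aspx?x=1".
    public static string ToAppRelativeWithQueryString(
        string pathAndQuery, string applicationPath )
    {
        // Split off the query string so that only the path is converted.
        int q = pathAndQuery.IndexOf( '?' );
        string path  = q < 0 ? pathAndQuery : pathAndQuery.Substring( 0, q );
        string query = q < 0 ? "" : pathAndQuery.Substring( q );

        // "/" becomes "", "/App/" becomes "/App".
        string root = applicationPath.TrimEnd( '/' );

        // Drop the application prefix when the path starts with it.
        if ( root.Length > 0 &&
             ( path.Equals( root, StringComparison.OrdinalIgnoreCase ) ||
               path.StartsWith( root + "/", StringComparison.OrdinalIgnoreCase ) ) )
        {
            path = path.Substring( root.Length );
        }

        // The path is now either empty or starts with "/".
        return "~" + ( path.Length == 0 ? "/" : path ) + query;
    }
}
```

With this sketch, both "/start.aspx" in a root application and "/ApplicationName/start.aspx" in an IIS application named ApplicationName end up as "~/start.aspx", and the query string is carried over untouched.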


Then, in line 8, I split the app-relative path to check whether it refers to the virtual address of a nonexistent instance of the application. As I said, you would usually use regular expressions there.
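For comparison, here is a rough sketch of the regular-expression variant. It is only an illustration under a few assumptions: the InstanceUrl class and its TryParse method are hypothetical names, the input is already app-relative (it starts with "~/"), and pages live directly in the application root, so exactly one folder segment in front of the page is treated as the instance name.

```csharp
using System.Text.RegularExpressions;

public static class InstanceUrl
{
    // "~/", one folder segment (the instance), "/", then the page
    // with an optional query string.
    private static readonly Regex Pattern = new Regex(
        @"^~/(?<instance>[^/?]+)/(?<rest>[^/?]*(\?.*)?)$",
        RegexOptions.IgnoreCase );

    // Returns true when the address is a virtual one. In that case
    // 'instance' holds the instance name and 'rewritten' holds the
    // app-relative address of the physical page.
    public static bool TryParse(
        string appRelativeUrl, out string instance, out string rewritten )
    {
        Match m = Pattern.Match( appRelativeUrl );
        instance  = m.Success ? m.Groups["instance"].Value : null;
        rewritten = m.Success ? "~/" + m.Groups["rest"].Value : null;
        return m.Success;
    }
}
```

Because the instance segment may not contain "?" or "/", an address such as ~/LoginPage.aspx?ReturnUrl=/Default.aspx is not mistaken for a virtual one, even though its query string contains a slash.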


If there are more than 2 segments, I know that the address the user refers to is of the form http://yourservername.com/instance1/start.aspx. In line 15 I store the instance name in the temporary Items container and in line 17 I record the fact that the URL has been rewritten.


Two notes here. First, as you'll find out in a minute, in this tutorial I store the instance the user selects (via the virtual address) in the server's session container. This is not the best way to go, as it raises a few technical issues. For example, the session container is not available during the BeginRequest phase of the pipeline.


So I store the information in the Items container first and copy it to the session container as soon as the session becomes available:



    void Application_AcquireRequestState( object sender, EventArgs e )
    {
        HttpApplication app = (HttpApplication)this;
        HttpContext ctx = app.Context;

        /* copy from items to session; the session is null for requests
           served by handlers that do not use session state */
        if ( ctx.Session != null && ctx.Items["INSTANCE"] != null )
            ctx.Session["INSTANCE"] = ctx.Items["INSTANCE"];
    }

The second note concerns line 17 of the BeginRequest snippet: the login page will make use of the fact that the URL has been rewritten.


In line 19 I build the new URL by removing the virtual part which refers to the instance name, and in line 21 I rewrite the request to the address of the existing page.
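Note that string.Replace removes every occurrence of "/" + Instance, so a page name or a query string value that happens to contain the same text would be altered as well. A slightly more defensive variant could look like the sketch below. It is not part of the downloadable source: UrlRewriteHelpers and RemoveInstanceSegment are hypothetical names, and the sketch assumes that the instance segment is the last folder in the path, just as the Segments.Length - 2 lookup does.

```csharp
using System;

public static class UrlRewriteHelpers
{
    // Removes the last "/instance" folder from the path part only
    // and leaves the query string untouched.
    public static string RemoveInstanceSegment( string pathAndQuery, string instance )
    {
        int q = pathAndQuery.IndexOf( '?' );
        string path  = q < 0 ? pathAndQuery : pathAndQuery.Substring( 0, q );
        string query = q < 0 ? "" : pathAndQuery.Substring( q );

        // Look for the instance as a whole folder segment.
        string segment = "/" + instance + "/";
        int i = path.LastIndexOf( segment, StringComparison.OrdinalIgnoreCase );
        if ( i < 0 )
            return pathAndQuery;   // nothing to rewrite

        // Cut "/instance" but keep the slash that follows it.
        return path.Remove( i, segment.Length - 1 ) + query;
    }
}
```

For example, /instance1/instance1.aspx becomes /instance1.aspx, whereas the plain Replace call would reduce it to .aspx.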


The rewriting itself is a really powerful operation. It changes the internal address of the request while preserving all the POST parameters. This is what makes it possible to build applications that rewrite URLs: if the POST parameters were not passed on to the rewritten address, you would, for example, end up with no events raised by any controls!


Tweaks in the login page

URL rewriting alone is not enough; we also have to tweak forms authentication. Normally, when users access pages they are not authorized to access, they are redirected to the login page. The problem is that only a single login page can be specified in the forms authentication section of web.config. How do we then redirect some users to http://yourservername.com/instance1/loginpage.aspx and other users to http://yourservername.com/instance2/loginpage.aspx?


Well, that's easy again. Put this into the Page_Load of the login page:



   1: public partial class LoginPage : System.Web.UI.Page
   2: {
   3:     protected void Page_Load(object sender, EventArgs e)
   4:     {
   5:         /* if there is an instance and url has NOT been rewritten yet */
   6:         if ( this.Session["INSTANCE"] != null &&
   7:              this.Context.Items["REWRITE"] == null
   8:             )
   9:         {
  10:             this.Response.Redirect( 
  11:                 Helpers.CreateVirtualUrl( this.Request.Url.PathAndQuery ) );
  12:         }
  13:     }
  14: }

Take a look at lines 6/7. If the user works with a virtual instance (the information about which we store in the Session container) and there has been no rewriting during the processing of the current request, we redirect from the login page to another login page, the virtual one. As the first request to the login page will be of the form http://youraddress.com/LoginPage.aspx?ReturnUrl=Default.aspx, we redirect to http://youraddress.com/instance1/LoginPage.aspx?ReturnUrl=Default.aspx. This new, virtual login page is then processed by BeginRequest, where we rewrite it back to the former form. This time, however, there is no redirect in the LoginPage's Page_Load, since the Items container holds the information that the URL has been rewritten.
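Helpers.CreateVirtualUrl is not listed in this post either. A minimal sketch of the idea, under stated assumptions, is shown below: the version here takes the application path and the instance name as explicit parameters (the real helper presumably reads them from the request and the session), and the VirtualUrlHelpers class name is made up for this example.

```csharp
using System;

public static class VirtualUrlHelpers
{
    // Inserts the instance name right after the application root:
    // "/LoginPage.aspx?ReturnUrl=Default.aspx" with instance "instance1"
    // becomes "/instance1/LoginPage.aspx?ReturnUrl=Default.aspx".
    public static string CreateVirtualUrl(
        string pathAndQuery, string applicationPath, string instance )
    {
        // "/" becomes "", "/App/" becomes "/App".
        string root = applicationPath.TrimEnd( '/' );

        // Strip the application root (if present) from the front.
        string rest = pathAndQuery.StartsWith( root + "/", StringComparison.OrdinalIgnoreCase )
            ? pathAndQuery.Substring( root.Length )
            : pathAndQuery;

        // Make sure the remaining part starts with a slash.
        if ( !rest.StartsWith( "/", StringComparison.Ordinal ) )
            rest = "/" + rest;

        return root + "/" + instance + rest;
    }
}
```

The same kind of helper can be reused whenever the application has to emit a link that should stay inside the current virtual instance.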


Storing the instance name the user selects with the first request

As I said, in this tutorial we store the information about the user's virtual context in the Session container; cookies, however, seem to be a much more practical solution. There's only one important security issue.


Suppose a user opens http://youraddress.com/instance1/Default.aspx, gets redirected to http://youraddress.com/instance1/LoginPage.aspx?ReturnUrl=Default.aspx and provides valid credentials. Then the user types http://youraddress.com/instance2/Default.aspx into the browser and at the same time deletes the cookie that points to the previously selected instance.


Can you see what happens? Your BeginRequest happily assigns a new cookie which stores instance2 as the current selection. However, the Forms Authentication cookie is still there! The user gets access to the private data of another instance of the data source. Disaster.


I solve this issue by calling FormsAuthentication.SignOut() in BeginRequest each time I find out that there's no cookie pointing to a selected instance. Users can still switch data source instances in the same browser window, but after switching they can access only public resources, for which a forms cookie is not required.


I also store the instance name in the UserData part of the Forms Authentication cookie and each time BeginRequest processes a virtual address, I check if the address of the virtual instance matches the one stored in the UserData.
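The two checks can be summarized as a small decision function. This is only a sketch of the rule described above, not the actual code: InstanceGuard and InstanceDecision are hypothetical names, the inputs are passed in as plain strings instead of being read from the cookies, and signing the user out on a mismatch is my assumption about what the check should do.

```csharp
using System;

public enum InstanceDecision { Allow, SignOut }

public static class InstanceGuard
{
    // requestedInstance: instance name taken from the virtual address
    // instanceCookie:    value of the instance cookie, null if missing
    // ticketUserData:    UserData of the forms ticket, null if the
    //                    user is not authenticated
    public static InstanceDecision Check(
        string requestedInstance, string instanceCookie, string ticketUserData )
    {
        // Anonymous users hold no forms ticket, so there is nothing to revoke.
        if ( ticketUserData == null )
            return InstanceDecision.Allow;

        // The instance cookie is gone but the forms cookie is still there.
        if ( string.IsNullOrEmpty( instanceCookie ) )
            return InstanceDecision.SignOut;

        // The ticket was issued for a different instance (assumed policy).
        if ( !string.Equals( requestedInstance, ticketUserData,
                             StringComparison.OrdinalIgnoreCase ) )
            return InstanceDecision.SignOut;

        return InstanceDecision.Allow;
    }
}
```

In BeginRequest, a SignOut result would translate into a call to FormsAuthentication.SignOut(), so the scenario with the deleted instance cookie ends with an anonymous user instead of a user who is logged in to the wrong instance.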


Handling IIS's 404 error pages

There is one more minor issue, related to the way IIS processes requests. If the request has the .aspx extension, IIS always passes it to ASP.NET. However, if a user just navigates to http://youraddress.com/instance1/ (which is a legitimate address for that user!), IIS happily returns a 404 page, since there's no physical /instance1 resource available on the server.


The solution to this issue is to redefine the page IIS sends back when a 404 occurs. Replace the standard page with your custom page and make it do whatever is needed for the user's browser to ask for an address ending with .aspx.


For example, put:



    <html>
    <head>
    <script type="text/javascript">
    /* append start.aspx to the requested address so that ASP.NET handles it */
    function redirect404()
    {
      var infix = '';
      if ( location.href.charAt( location.href.length - 1 ) != '/' )
        infix = '/';
      location.href = location.href + infix + "start.aspx";
    }
    </script>
    </head>
    <body onload="redirect404()">
    </body>
    </html>

into the _404.htm file and configure IIS to serve _404.htm in response to 404 error.


Where to go from here

With the basics behind us, let's discuss three remaining issues.


First of all - you'll notice that your code creates a lot of relative and absolute links which must be modified so that they point to /instance1/thelinkcontent instead of just /thelinkcontent. It does not take a lot of time, but the whole application has to be carefully examined and tested.


And another issue - you end up with a single application talking to many independent data sources. In the case of databases, you have to be sure that the structure of every database matches the object model in your application.


This is where I recommend a very handy pattern: do not convert databases manually, but rather write code which converts a database and execute that code each time you handle the first request to a database. This way databases are automatically converted when they are accessed for the first time after you deploy a new version of your web application.
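A minimal sketch of this pattern is shown below, assuming that the conversion itself is available as a delegate that takes the instance name. DatabaseUpgrader and EnsureUpgraded are hypothetical names, and the conversion code is deliberately left out.

```csharp
using System;
using System.Collections.Generic;

public sealed class DatabaseUpgrader
{
    private readonly object _sync = new object();
    private readonly HashSet<string> _upgraded =
        new HashSet<string>( StringComparer.OrdinalIgnoreCase );
    private readonly Action<string> _upgrade;

    // 'upgrade' converts the database of the given instance
    // to the structure expected by the current application version.
    public DatabaseUpgrader( Action<string> upgrade )
    {
        _upgrade = upgrade;
    }

    // Call this before the first database access in a request.
    public void EnsureUpgraded( string instance )
    {
        lock ( _sync )
        {
            if ( _upgraded.Contains( instance ) )
                return;               // already converted in this process

            _upgrade( instance );     // runs once per instance
            _upgraded.Add( instance );
        }
    }
}
```

A single static DatabaseUpgrader would be shared by all requests. Since the set of converted instances lives in memory, it is empty again after every deployment or application restart, so the conversion code has to be safe to run against a database that is already up to date. The single lock also means that requests to other instances wait while one database is being converted, which keeps the sketch simple at the cost of some concurrency.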


And the last issue - your HTML forms will be rendered with an invalid action value, because the form posts back to the rewritten (physical) address rather than to the virtual one the user requested. This issue can be resolved using the standard techniques used by ASP.NET URL rewriting frameworks. You can either create your own Form Control Adapter or change the way forms are rendered.


If you are interested in url rewriting itself, please take a look here.

3 comments:

Anonymous said...

Interesting.

This is a very good solution for web sites. There is also a new WCF feature called UriTemplate, used as an attribute on service methods. It can be used to build RESTful web services, where a standard SOAP request can be changed to an HTTP GET with specified URLs - so basically an IIS-style mod_rewrite.

What's more, we can set up the service response (not only SOAP 1.0/1.1 but also clean XML or even JSON).

From the client's point of view, he sends a request via HTTP GET without stupid parameters, e.g. "www.page.pl/posts/23544", and he gets this post in XML.

Similar to generic *.ashx page, but it's not just a page - it's a living service. Very cool feature.

Wiktor Zychla said...

thanks for your comment. I will surely look at that WCF feature. Yet similar pattern is for example available in ASP.NET MVC.

Anonymous said...

What is the login details to login to the site