Friday, February 23, 2024

Entity framework migrations under heavy load from a server farm

Entity Framework migrations are great. I particularly like the mechanism that prevents any query on the database until all pending migrations are applied. This solves a lot of issues and in most scenarios you can even rely on the MigrateDatabaseToLatestVersion initializer in production. The initializer is smart enough to guard any instance of the DbContext within the current process, and this works correctly even from ASP.NET.
Well, mostly. Problems start when you have a farm of many ASP.NET servers that connect to the very same database. Each server runs its own migration, so under heavy load your database may be migrated concurrently from multiple servers.
Some time ago we almost had a disaster involving this scenario. A really busy app deployed on multiple servers was updated and, sadly, applying the pending migrations was constantly failing. Yes, it was something specific in one of the migrations, but the result was as follows: one of the servers tried to start the migration. The migration involved a heavy query that lasted a couple of seconds. All the other servers were migrating too, trying to execute the very same heavy query. After the heavy query there was another, lightweight query that failed on the first server because the heavy query was still pending on the other servers. And as soon as any other server finished the heavy query, it immediately failed on the lightweight query because yet another server was just executing the heavy query.
There's a solution, though, involving two custom initializers. One just checks whether there are pending migrations and throws; this one is configured as the default initializer. The other one actually migrates the database and is only invoked from a controlled environment, like a separate application or a specific controller/action.
Some code:
    public class DefaultDbContextInitializer : 
       IDatabaseInitializer<ExampleMigrationDbContext>
    {
        public void InitializeDatabase( ExampleMigrationDbContext context )
        {
            Configuration cfg = new Configuration(); // migration configuration class
            cfg.TargetDatabase =
               new DbConnectionInfo(
                  context.Database.Connection.ConnectionString,
                  "System.Data.SqlClient" );

            DbMigrator dbMigrator = new DbMigrator( cfg );

            // any pending migration means the database is not up to date - refuse to work
            if ( dbMigrator.GetPendingMigrations().Any() )
            {
                throw new MigrationsException( "pending migrations!" );
            }
        }
    }
    
    public class MigratingDbContextInitializer : 
       IDatabaseInitializer<ExampleMigrationDbContext>
    {
        public void InitializeDatabase( ExampleMigrationDbContext context )
        {
            Configuration cfg = new Configuration(); // migration configuration class
            cfg.TargetDatabase =
               new DbConnectionInfo(
                  context.Database.Connection.ConnectionString,
                  "System.Data.SqlClient" );

            DbMigrator dbMigrator = new DbMigrator( cfg );

            // apply pending migrations one by one so that progress can be tracked
            foreach ( string migrationName in dbMigrator.GetPendingMigrations() )
            {
                Stopwatch watch = new Stopwatch();
                watch.Start();

                dbMigrator.Update( migrationName );

                watch.Stop();
                // watch.Elapsed can be logged here to see how long each migration takes
            }
        }
    }    
The first, default initializer is configured globally:
   Database.SetInitializer<ExampleMigrationDbContext>( new DefaultDbContextInitializer() );
Because of that, any attempt to touch a database that has pending migrations will fail with an exception you can catch and show a message.
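Catching it could look more or less like this; Database.Initialize is just one convenient way of triggering the initializer, and the ShowMaintenancePage helper is a made-up placeholder for whatever your app does in that case:
    try
    {
        using ( var context = new ExampleMigrationDbContext() )
        {
            // any query would do; Initialize just triggers the configured initializer explicitly
            context.Database.Initialize( force: false );
        }
    }
    catch ( MigrationsException )
    {
        // hypothetical reaction - show a "temporarily unavailable" page instead of crashing
        ShowMaintenancePage();
    }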
But then, somewhere in a controlled environment you call this:
    var context = new ExampleMigrationDbContext();

    var migrator = new MigratingDbContextInitializer();
    migrator.InitializeDatabase( context );
This works. After the database is migrated in the controlled way, the default initializer stops throwing and the app is back to running.
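If the controlled environment is, as mentioned above, a specific controller/action, a minimal sketch could look like this (the MaintenanceController name and the Admin role are assumptions, not anything prescribed by the approach):
    public class MaintenanceController : Controller
    {
        // only a controlled, authorized caller should ever run the migration
        [Authorize( Roles = "Admin" )]
        public ActionResult Migrate()
        {
            using ( var context = new ExampleMigrationDbContext() )
            {
                new MigratingDbContextInitializer().InitializeDatabase( context );
            }

            return Content( "database migrated" );
        }
    }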

Thursday, February 22, 2024

ASP.NET MVC and Bootstrap error classes

Bootstrap requires specific error classes to be applied to inputs (e.g. is-invalid). MVC, on the other hand, has its own error classes (e.g. field-validation-error). There are numerous ideas on how to combine the two; this is another one.
The idea is to crudely replace the MVC error classes using reflection. A starting point would be to change the input's error class:
// overwrite the static HtmlHelper.ValidationInputCssClassName field via reflection
typeof( HtmlHelper ).InvokeMember(
	nameof( HtmlHelper.ValidationInputCssClassName ),
	System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.SetField,
	null,
	typeof( HtmlHelper ),
	new[] { "is-invalid" } ); 
This one overwrites the static readonly field HtmlHelper.ValidationInputCssClassName from its default value (input-validation-error) to Bootstrap's is-invalid. Calling this early (e.g. in Application_Start) makes invalid inputs get the Bootstrap class.
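The same trick should extend to the remaining HtmlHelper class names (validation messages and the validation summary). A sketch of a small helper called once at startup; the Bootstrap classes chosen for messages and the summary (invalid-feedback, text-danger) are just a guess at what fits a typical Bootstrap setup:
public static class BootstrapValidationClasses
{
    // call once, e.g. from Application_Start, before any view is rendered
    public static void Apply()
    {
        OverwriteStaticField( nameof( HtmlHelper.ValidationInputCssClassName ), "is-invalid" );
        OverwriteStaticField( nameof( HtmlHelper.ValidationMessageCssClassName ), "invalid-feedback" );
        OverwriteStaticField( nameof( HtmlHelper.ValidationSummaryCssClassName ), "text-danger" );
    }

    static void OverwriteStaticField( string fieldName, string value )
    {
        typeof( HtmlHelper ).InvokeMember(
            fieldName,
            System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.SetField,
            null,
            typeof( HtmlHelper ),
            new[] { value } );
    }
}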

Wednesday, February 7, 2024

Cross-domain front channel federated Single Log Out

SSO (Single Sign-On) can be implemented in various ways; people use WS-Federation, SAML2 or OpenID Connect, and all these protocols are mature and work like a charm.
What's troublesome is the SLO (Single Log Out). There are two possible approaches:
  • back-channel logout consists of server-to-server communication. The identity provider calls the service provider directly, without any support from the user's browser (thus the name "back-channel"), and says "hey, it's me, the identity provider. The session 10b7e82f-4a1b-489f-9848-0d8babcd737f should be terminated." The service provider then marks the session as terminated. While this looks great, there's an implementation and performance cost: the service provider must be implemented in a specific way, where every single request from any user is checked against the active sessions repository, just because the session can be terminated at any time by a server-to-server call from the identity provider (see the sketch after this list).
  • front-channel logout consists of server-browser-server communication. The usual way of implementing this is with nested iframes returned from the identity provider. This is where the problem is.
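To illustrate that cost, here is a rough sketch of what such a per-request check could look like as an ASP.NET MVC authorization attribute; the ITerminatedSessionStore abstraction and the sid claim name are invented for this example:
public class ActiveSessionAttribute : AuthorizeAttribute
{
    // hypothetical repository of sessions terminated by back-channel calls
    public static ITerminatedSessionStore Store { get; set; }

    protected override bool AuthorizeCore( HttpContextBase httpContext )
    {
        if ( !base.AuthorizeCore( httpContext ) ) return false;

        var identity = httpContext.User.Identity as ClaimsIdentity;
        var sessionId = identity?.FindFirst( "sid" )?.Value;

        // every single request pays the price of this lookup
        return sessionId != null && !Store.IsTerminated( sessionId );
    }
}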
The problem with nested iframes is an escalating one. The more restrictions browsers add in the name of stricter security policies, the more problems there are with nested iframes. I've blogged about an issue with Firefox preventing cookies with SameSite=none from being sent in cross-domain iframes. Who knows whether other browsers will adopt this seemingly incorrect policy, just because someone decides safety is more important than conforming to the spec.
Anyway, because of how some web browsers have started to act when cross-domain iframes are involved, marking your cookies with SameSite=none is no longer a solution. Instead, we've concluded that the idea of iframes has to be replaced by something else that would still let us use front-channel logout but would not suffer from browser quirks.
The idea is as follows. The identity provider maintains a per-user-session list of service providers that were authenticated in the current session (it maintains such lists anyway, that's nothing new). The new element is that the list allows the identity provider to distinguish between an old-type service provider and a new-type service provider. Old service providers are handled the old way, with nested iframes. New service providers are handled by redirecting to them and expecting them to redirect back.
Let's first discuss how the identity provider is able to distinguish service providers. There are two possible approaches:
  • hard approach - the identity provider can rely on a static whitelist configuration where each possible service provider is configured and an extra bit of information is available in the configuration.
  • soft approach - the identity provider can still have a whitelist, but the information about the service provider type is not part of the configuration. Instead, whenever a service provider tries to sign in, the signin protocol carries an extra bit of information that tells the identity provider what type of service provider is currently being processed. For us, this extra information was an extra key in the wctx parameter of WS-Federation. The wctx allows the service provider to pass extra info to the identity provider, of the form [key=value]. It's commonly used to pass something like rm=1 (remember me=1) or ru=.... (return url=....). We just add yet another key here, nslo=1 (new single log out=1); see the sketch below. In the case of third-party identity providers, this extra key in wctx is simply ignored. In the case of a first-party, our own identity provider, we use this extra info from the wctx to mark service providers as old/new at the identity provider side.
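A minimal sketch of what building such a signin request could look like on the service provider side; the helper name and the realm value are made up, only the wa, wtrealm and wctx parameters come from WS-Federation:
static string BuildSignInUrl( string identityProviderUrl, string realm )
{
    // wctx carries opaque, application-specific state; nslo=1 marks this
    // service provider as a "new-type" one that will redirect back on logout
    string wctx = HttpUtility.UrlEncode( "rm=1&nslo=1" );

    return identityProviderUrl +
        "?wa=wsignin1.0" +
        "&wtrealm=" + HttpUtility.UrlEncode( realm ) +
        "&wctx=" + wctx;
}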
And now comes the important part: the actual log out implemented at service providers and identity providers.
In case of a service provider:
  • signin - a regular wsignin1.0 together with extra wctx=nslo%3D1
  • sign out - a regular wsignout1.0. In the case of wsignoutcleanup1.0, the service provider is obliged to return back (302) to the identity provider (with wsignout1.0); a sketch follows below
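A rough sketch of how a new-type service provider could handle that in ASP.NET MVC; forms authentication for the local session and the hard-coded identity provider address are assumptions made for brevity:
public class FederationController : Controller
{
    // example address - in a real app this would come from configuration
    const string IdentityProviderUrl = "https://idp.example.com/";

    public ActionResult SignOutCleanup( string wa )
    {
        if ( wa == "wsignoutcleanup1.0" )
        {
            // terminate the local session...
            FormsAuthentication.SignOut();

            // ...and, unlike the old iframe flow, redirect back (302) to the
            // identity provider with wsignout1.0 so the logout chain can continue
            return Redirect( IdentityProviderUrl + "?wa=wsignout1.0" );
        }

        return new EmptyResult();
    }
}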
In case of an identity provider:
  • signin - pay attention to wctx and mark service providers as old/new (as explained above)
  • sign out
service_provider = list_of_service_providers_of_new_type_from_session.FirstOrDefault();
if ( service_provider != null ) {
    remove_from_session( service_provider );
    redirect_to_service_provider_with_wsignoutcleanup1.0( service_provider )
} else if ( identity_provider_has_its_own_top_level_identity_provider ) {
    create_page_with_iframes_to_old_type_service_providers();
    redirect_to_top_level_identity_provider_with_wsignout1.0( top_level_identity_provider )
} else {
    // there's no top-level identity provider above the current identity provider
    create_page_with_iframes_to_old_type_service_providers();
    close_session_and_return_information_page()
}
Q&A
  • Why does it work? It works because there are no more nested iframes; instead, the identity provider redirects to service providers and they redirect back. A 302 redirect always carries cookies.
  • Why the distinction between old/new service providers? Because the log out is implemented as a redirect to the service provider with wsignoutcleanup. Old-type service providers usually handle this by just terminating the local session and returning an empty response. Since the log out is now a redirect from the identity provider to the service provider, the identity provider has to be sure the service provider will redirect back.
  • Is it backwards compatible with existing service providers / identity providers? It is. An old-type identity provider (one that just returns iframes) will ignore the extra signin info provided in wctx. The cross-domain sign out would probably still fail, but you don't lose anything (it fails anyway). An old-type service provider with a new identity provider will not carry the extra wctx info, so it will be handled with an iframe (because the identity provider handles both types of service providers).

Thursday, February 1, 2024

ECMAScript modules in node.js

Node.js has supported ECMAScript modules for a few years now, and if you are still considering switching from CommonJS, there are a couple of good arguments in favor.
First, enabling the module subsystem is as easy as adding
  ...
  "type": "module",
  ...
to the package.json. Then, modules can be exported/imported; both default and named conventions are supported:
// foo.js
function foo(n) {
    return n+1;
}

function bar(n) {
    return n-1;
}

function qux(n) {
    return n-2;
}

export { bar };
export { qux };
export default foo;

// app.js
import foo from './foo.js';
import { bar, qux } from './foo.js';

console.log( foo(5) );
console.log( bar(5) );
console.log( qux(5) );
Modules can reference other modules recursively; good old CommonJS supports cycles too, but here it's even easier:
// a.js
import { b } from './b.js';

function a(n) {
    if ( n > 1 ) {
        console.log( `a: ${n}` );
        return b( n-1 );
    } else {
        return 1;
    }
}

export { a };

// b.js
import { a } from './a.js';

function b(n) {
    if ( n > 1 ) {
        console.log( `b: ${n}` );
        return a( n-1 );
    } else {
        return 1;
    }
}

export { b };

// app.js
import { a } from './a.js';

console.log( a(7) );
Modules support top-level await:
// app.js
console.log( await 7 );
And last but not least, modules interoperate with the existing infrastructure:
// app.js
import http from 'http';
import express from 'express';

var app = express();

app.get('/', (req, res) => {
    res.end( 'hello world');
});

http.createServer( app ).listen(3000, () => {
    console.log( 'started');
});

Thursday, January 25, 2024

Never, ever make your Random static

Years ago it was popular to advocate for sharing a single instance of the Random class. There are numerous sources, including blogs, Stack Overflow, etc. Occasionally, someone dared to write that it might not be the best idea. But still, when looking for information, people stumble across this decade-old recommendation.
That was the case in one of our apps. In one critical place, a shared instance of Random was used:

public static Random _rnd = new Random();

...
..._rnd.Next()...
This worked. For years. Until one day, it failed miserably.
You see, Random has two internal variables it overwrites each time Next is called so that it can advance to the next random value. If your app is just unlucky enough in its concurrent execution, the two values can end up being overwritten with the same value.
And guess what: Random starts to return 0 each time Next is called! For us, it was the case of an internal stress test, where heavy traffic was directed at the application, but it can happen at just any time.
There are numerous solutions. First, starting from .NET 6 there's the Random.Shared static property, documented as thread-safe. In older frameworks, one of the alternative approaches should be used, e.g. this
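A minimal sketch of one such alternative (not necessarily the one the original link pointed to): a per-thread Random seeded from a lock-protected global instance, so no instance is ever shared between threads:

public static class SafeRandom
{
    private static readonly Random _global = new Random();

    // each thread gets its own Random, seeded from the lock-protected global one
    private static readonly ThreadLocal<Random> _local = new ThreadLocal<Random>( () =>
    {
        lock ( _global )
        {
            return new Random( _global.Next() );
        }
    } );

    public static int Next() => _local.Value.Next();
    public static int Next( int maxValue ) => _local.Value.Next( maxValue );
}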

Wednesday, January 24, 2024

FireFox slowly becomes a new Internet Explorer

For years in the past, developing web apps was a constant case of if (IE) do_something_else_than_for_other_browsers(). Sadly, lately we've had some bad cases where things look similar, except that instead of IE we now have FF.
One of the earlier cases concerned the location.reload API. In a specific case of an iframe, the app calls it to reload the content when users change their color theme. It worked everywhere except FF. FF has its own forceGet parameter, not part of the spec but mentioned in the docs, and it seems that location.reload works for us in FF only when this extra argument is provided.
Another case appeared lately, unfortunately. Take the WS-Federation protocol and consider a scenario where the identity provider and the service provider are on different domains.
Signing in works correctly. The service provider redirects with wa=wsignin1.0 and the identity provider responds with the SAML token POSTed to the service provider.
Signing out is implemented using nested iframes where the identity provider points to the service provider and adds wa=wsignoutcleanup1.0 to terminate the session (drop the service provider's session cookie). As you know, there's been a change lately in the way cross-domain cookies are handled. To prevent CSRF, the SameSite flag was added and, in the default case, a cookie falls back to SameSite=Lax, which prevents it from being sent across different domains.
There is, however, still a way to make cookies available in cross-domain requests: you are supposed to just mark them with SameSite=None; Secure. And guess what, this works in all browsers except FF. It turns out the default security settings in FF block all cross-domain cookies, no matter whether they are marked with SameSite=none or not.
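For completeness, marking the cookie that way in classic ASP.NET could look roughly like this (requires .NET Framework 4.7.2+ for the SameSite property; the cookie name and the sessionId value are just placeholders); this is the part that works everywhere except FF in its default configuration:
var cookie = new HttpCookie( "sp-session", sessionId )
{
    SameSite = SameSiteMode.None, // explicitly allow the cookie in cross-domain requests
    Secure = true,                // SameSite=None is only honored together with Secure
    HttpOnly = true
};

Response.Cookies.Add( cookie );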
Sure, users can opt out by lowering the security level or configuring exceptions for your app, but this doesn't change the fact that the specific scenario mentioned above just doesn't work in the default FF setup. Other browsers have their own security settings and, at least at the moment, you can opt in to stricter settings (and cross-domain cookies stop working), but that requires changing the defaults. In the case of FF, it's the default setting that goes against the SameSite spec.

Monday, January 15, 2024

An interesting edge case in C# yield state machines

Among the many useful LINQ methods, there's no functional ForEach. People often implement it on their own; an extension method is just a few lines of code.
Someone did that in one of our projects, too:
public static void ForEach1<T>( this IEnumerable<T> enumeration, Action<T> action )
{
	foreach ( T item in enumeration )
	{
		action( item );
	}
}
This works great. Someone else, however, noticed that it doesn't allow further chaining, since the method returns void.
This someone else modified the code slightly:
public static IEnumerable<T> ForEach2<T>( this IEnumerable<T> enumeration, Action<T> action )
{
	foreach ( T item in enumeration )
	{
		action( item );
		yield return item;
	}
}
This doesn't work great. Frankly, it doesn't work at all. Surprising but true. Take this:
internal class Program
{
	static void Main( string[] args )
	{
		Test1();
		Test2();

		Console.ReadLine();
	}

	static void Test1()
	{
		var array = new int[] { 1,2,3 };

		var list = new List<int>();

		array.ForEach1( e => list.Add( e ) );

		Console.WriteLine( list.Count() );
	}
	static void Test2()
	{
		var array = new int[] { 1,2,3 };

		var list = new List<int>();

		array.ForEach2( e => list.Add( e ) );

		Console.WriteLine( list.Count() );
	}
}

public static class EnumerableExtensions
{
	public static void ForEach1<T>( this IEnumerable<T> enumeration, Action<T> action )
	{
		foreach ( T item in enumeration )
		{
			action( item );
		}
	}

	public static IEnumerable<T> ForEach2<T>( this IEnumerable<T> enumeration, Action<T> action )
	{
		foreach ( T item in enumeration )
		{
			action( item );
			yield return item;
		}
	}
}
While this is expected to produce 3 for both tests, it produces 3 and 0. The upgraded version doesn't work as expected.
The reason for this is a kind-of-a-limitation of the state machine generated by the yield sugar. The machine doesn't execute any code until the result is actually enumerated. This means that changing
array.ForEach2( e => list.Add( e ) );
to
array.ForEach2( e => list.Add( e ) ).ToList();
would "fix it". What a poor foreach, though, that requires an extra terminator (the ToList) and doesn't work otherwise.
Luckily, a simpler fix exists: just forget about the state machine altogether for this specific case:
public static IEnumerable<T> ForEach2<T>( this IEnumerable<T> enumeration, Action<T> action )
{
	foreach ( T item in enumeration )
	{
		action( item );
	}

	return enumeration;
}
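A quick usage sketch under the same test setup as above, just to show that the eager version both executes the action and still allows chaining:
var array = new int[] { 1, 2, 3 };
var list = new List<int>();

// the action runs eagerly, and the returned sequence can still be chained
var doubled = array.ForEach2( e => list.Add( e ) ).Select( e => e * 2 ).ToList();

Console.WriteLine( list.Count );    // 3
Console.WriteLine( doubled.Count ); // 3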