Monday, April 15, 2024

Rust/WebAssembly vs Javascript performance ... reloaded

Last year I blogged about a tiny contest where Rust/WASM and JavaScript were used to implement a simple Mandelbrot animation. I've polished the code and put it in a GitHub repo, rust-vs-js. I've also added a third contestant, a gpu.js accelerated version, which of course beats the heck out of the other two (JavaScript and Rust) since it's heavily parallelized.
Anyway, jump to the repository and enjoy the code.

Wednesday, April 3, 2024

Take just a single visible character?

A fairly simple requirement: take the first letter of a profile description and present it together with a link. You get the idea: if I have two profiles, Foo and Bar, I want two links with F and B respectively.
The first version of the code (not even shown here) was just something like: if the string has at least one character, take the uppercase of the first character.
This seemingly simple approach completely ignores emojis, most of which lie outside the Basic Multilingual Plane and are stored in C# strings as two consecutive chars (a surrogate pair). The second version of the code was then:
public static string ToShortDescription( this string source )
{
	var description = source?.Trim();

	if ( !string.IsNullOrWhiteSpace( description ) && description.Length >= 1 )
	{
		if ( char.IsSurrogatePair( description, 0 ) )
		{
			return description.Substring( 0, 2 ).ToUpper();
		}
		else
		{
			return description.Substring( 0, 1 ).ToUpper();
		}
	}

	return "?";
}
This is better, much better. It's no longer just the first char of the string: for a surrogate pair it's a substring of length 2. This simple approach correctly handles many two-char emojis, like the mage emoji, 🧙 (U+1F9D9), encoded as the surrogate pair D83E DDD9.
Unfortunately, it's just the beginning of the story. It turns out some emojis are combinations of other emojis. Take the mage emoji again. Its female version, 🧙‍♀️, is encoded as the base mage emoji followed by additional characters that indicate the female variant. The special character used to glue emojis together is the Zero-Width Joiner (ZWJ, U+200D).
Take a C# string that starts with the female mage emoji. This time it's not 2 chars that should be taken from it, it's 5: the two-char mage emoji, the ZWJ, the female sign (U+2640, a single char) and a variation selector (U+FE0F, also a single char).
Let this sink in: in order to have a single visible character on the screen, we need to take the first 5 chars of the C# string!
And as you can expect, the above version of the code correctly discovers the first surrogate pair but fails to discover the ZWJ.
There's even a discussion on SO on how to detect this.
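To see where the 5 chars come from, the sequence can be taken apart into its UTF-16 code units. The sketch below is in JavaScript, whose strings are UTF-16 just like C# strings. The helper name codeUnits is made up for this illustration.

```javascript
// The female mage emoji, written with escapes so that every part is visible:
// mage (U+1F9D9), ZWJ (U+200D), female sign (U+2640), variation selector (U+FE0F).
const femaleMage = '\u{1F9D9}\u200D\u2640\uFE0F';

// Hypothetical helper: lists each UTF-16 code unit as a four-digit hex value.
function codeUnits(text) {
    const units = [];
    for (let i = 0; i < text.length; i++) {
        units.push(text.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0'));
    }
    return units;
}

console.log(femaleMage.length);     // 5
console.log(codeUnits(femaleMage)); // [ 'D83E', 'DDD9', '200D', '2640', 'FE0F' ]
```

Only the first code point needs a surrogate pair. The remaining three code points fit into a single char each.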
My current approach is:
public static string ToShortDescription( this string source, bool autoUpper = true )
{
	var description = source?.Trim();

	if ( !string.IsNullOrWhiteSpace( description ) && description.Length >= 1 )
	{
		// take consecutive chars according to the following rules:
		// * a regular char - take it and stop
		// * a ZWJ - take it and continue
		// * a surrogate pair - take both chars and continue
		char[] sourceChars = description.ToCharArray(); // use the trimmed string, not the raw source
		List<char> destChars = new List<char>();

		var index = 0;
		bool takeAgain;

		do
		{
			takeAgain = false;

			// is there a char with another one right after it (a two-char sequence)?
			if ( index < sourceChars.Length - 1 )
			{
				// surrogate pair
				if ( char.IsSurrogatePair( sourceChars[index], sourceChars[index + 1] ) )
				{
					destChars.AddRange( new[] { sourceChars[index], sourceChars[index + 1] } );

					index    += 2;
					takeAgain = true;
				}
			}

			if ( index < sourceChars.Length - 2 )
			{
				// ZWJ (U+200D) - glues two emojis together; take it along with the next two chars
				if ( sourceChars[index] == '\u200D' )
				{
					destChars.AddRange( new[] { sourceChars[index], sourceChars[index + 1], sourceChars[index + 2] } );

					index    += 3;
					takeAgain = true;
				}

			}


		} while ( takeAgain && index < sourceChars.Length );

		// take one more char if nothing has been taken so far
		if ( !takeAgain && 
			 index <= sourceChars.Length-1 &&
			 destChars.Count == 0 
			)
		{
			destChars.Add( sourceChars[index] );
		}

		string _result = new string( destChars.ToArray() );

		return autoUpper ? _result.ToUpper() : _result;

		/*
		if ( char.IsSurrogatePair( description, 0 ) )
		{
			return description.Substring( 0, 2 ).ToUpper();
		}
		else
		{
			return description.Substring( 0, 1 ).ToUpper();
		}
		*/
	}

	return "?";
}
This passes some important unit tests. Namely, it correctly handles the England flag emoji, 🏴󠁧󠁢󠁥󠁮󠁧󠁿 (&#x1F3F4;&#xE0067;&#xE0062;&#xE0065;&#xE006E;&#xE0067;&#xE007F;), which is still a single visible sign, but in this extreme case it takes up the first 14 chars of the C# string! I believe there's still room for improvement (and possibly other strange cases that I still miss).
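For comparison, here is a minimal sketch of the same requirement delegated to a grapheme cluster segmenter instead of hand-written rules. It is not the code used in the app, just an illustration in JavaScript (UTF-16 strings, like C#) built on the standard Intl.Segmenter. The function name toShortDescription only mirrors the C# method above. On .NET 5 and later, StringInfo.GetNextTextElement should give similar results, though I haven't verified it against these test cases.

```javascript
// Returns the first user-perceived character (grapheme cluster) of the text,
// or "?" when there is nothing to take.
function toShortDescription(source, autoUpper = true) {
    const description = (source ?? '').trim();
    if (description.length === 0) {
        return '?';
    }

    // The segmenter applies the Unicode grapheme cluster rules, so surrogate
    // pairs, ZWJ sequences and tag sequences stay in one piece.
    const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
    const [first] = segmenter.segment(description);

    return autoUpper ? first.segment.toUpperCase() : first.segment;
}

const femaleMage = '\u{1F9D9}\u200D\u2640\uFE0F';
const englandFlag = '\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}';

console.log(toShortDescription('foo'));                         // F
console.log(toShortDescription(femaleMage + ' mage').length);   // 5
console.log(toShortDescription(englandFlag + ' flag').length);  // 14
console.log(toShortDescription('   '));                         // ?
```

Unlike the loop above, this also stops after the first emoji when two unrelated emojis follow each other directly.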

Friday, February 23, 2024

Entity framework migrations under heavy load from a server farm

Entity Framework migrations are great. I particularly like the mechanism that prevents any query on the database until all pending migrations are applied. This solves a lot of issues and in most scenarios you can even rely on the MigrateDatabaseToLatestVersion initializer in production. The initializer is smart enough to guard every instance of the DbContext within the current process, and this works correctly even from ASP.NET.
Well, mostly. Problems start when you have a farm of many ASP.NET servers that connect to the very same database. Each server runs its own migration. Under heavy load this can mean that your database is migrated concurrently from multiple servers.
Some time ago we almost had a disaster involving this scenario. A really busy app deployed on multiple servers was updated and, sadly, applying the pending migrations kept failing. Yes, it was caused by something specific in one of the migrations, but the result was as follows. One of the servers started the migration, which involved a heavy query that lasted a couple of seconds. All the other servers were migrating too, trying to execute the very same heavy query. The heavy query was followed by a lightweight query, which failed on the first server because the heavy query was still pending on other servers. And as soon as any other server finished the heavy query, it immediately failed on the lightweight query because yet another server was just executing the heavy one.
There's a solution, though, involving two custom initializers. One just checks if there are pending migrations and throws. This one is configured as the default initializer. Another one actually migrates the database and is only invoked from a controlled environment, like a separate application or a specific controller/action.
Some code:
    public class DefaultDbContextInitializer : 
       IDatabaseInitializer<ExampleMigrationDbContext>
    {
        public void InitializeDatabase( ExampleMigrationDbContext context )
        {
            Configuration cfg = new Configuration(); // migration configuration class
            cfg.TargetDatabase =
               new DbConnectionInfo(
                  context.Database.Connection.ConnectionString,
                  "System.Data.SqlClient" );

            DbMigrator dbMigrator = new DbMigrator( cfg );

            if ( dbMigrator.GetPendingMigrations().Count() > 0 )
            {
                throw new MigrationsException( "pending migrations!" );
            }
        }
    }
    
    public class MigratingDbContextInitializer : 
       IDatabaseInitializer<ExampleMigrationDbContext>
    {
        public void InitializeDatabase( ExampleMigrationDbContext context )
        {
            Configuration cfg = new Configuration(); // migration configuration class
            cfg.TargetDatabase =
               new DbConnectionInfo(
                  context.Database.Connection.ConnectionString,
                  "System.Data.SqlClient" );

            DbMigrator dbMigrator = new DbMigrator( cfg );

            foreach ( string MigrationName in dbMigrator.GetPendingMigrations() )
            {
                Stopwatch watch = new Stopwatch();
                watch.Start();

                dbMigrator.Update( MigrationName );

                watch.Stop();
            }
        }
    }    
The first one, the default initializer, is configured globally:
   Database.SetInitializer<ExampleMigrationDbContext>( new DefaultDbContextInitializer() );
Because of that, any attempt to touch the database that has pending migrations will fail with an exception you can catch and show a message.
But then, somewhere in a controlled environment you call this:
    var context = new ExampleMigrationDbContext();

    var migrator = new MigratingDbContextInitializer();
    migrator.InitializeDatabase( context );
This works. After the database is migrated in the controlled way, the default initializer stops throwing and the app is back to running.

Thursday, February 22, 2024

ASP.NET MVC and Bootstrap error classes

Bootstrap requires specific error classes to be applied to inputs (e.g. is-invalid). MVC, on the other hand, has its own error classes (e.g. field-validation-error). There are numerous ideas on how to combine the two and this is another one.
The idea is to replace the MVC error classes the dirty way, using reflection. A starting point would be to change the input's error class:
typeof( HtmlHelper ).InvokeMember(
	nameof( HtmlHelper.ValidationInputCssClassName ),
	System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.SetField,
	null,
	typeof( HtmlHelper ),
	new[] { "is-invalid" } ); 
This one overwrites the static readonly field HtmlHelper.ValidationInputCssClassName, changing it from its default value (input-validation-error) to Bootstrap's is-invalid. Calling this early, e.g. at application startup, makes invalid inputs get the Bootstrap class.

Wednesday, February 7, 2024

Cross-domain front channel federated Single Log Out

SSO (Single Sign On) can be implemented in various ways. People use WS-Federation, SAML2 or OpenID Connect; all these protocols are great and work like a charm.
What's troublesome is SLO (Single Log Out). There are two possible approaches:
  • back-channel logout consists of server-server communication. The identity provider calls the service provider directly, server-to-server, without any involvement of the user's browser (thus the name "back-channel") and says "hey, it's me, the identity provider. The session 10b7e82f-4a1b-489f-9848-0d8babcd737f should be terminated." The service provider then marks the session as terminated. While this looks great, there's an implementation and performance cost: the service provider must be implemented in a specific way, where every single request from any user is checked against the repository of active sessions, just because a session can be terminated by a server-server call from the identity provider.
  • front-channel logout consists of server-browser-server communication. The usual way of implementing this is by using nested iframes returned from the identity provider. This is where the problem is.
The problem with nested iframes is a new, escalating issue. The more restrictions browsers add in the name of stricter security policies, the more problems there are with nested iframes. I've blogged about an issue with Firefox preventing cookies with SameSite=none from being sent in cross-domain iframes. Who knows if other browsers will adopt this seemingly incorrect policy just because someone decides safety is more important than conforming to the spec.
Anyway, because of how some web browsers start to act when cross-domain iframes are involved, marking your cookies with SameSite=none is no longer a solution. Instead, we've concluded that iframes have to be replaced by something else that would still allow us to use the front-channel logout but would not suffer from browser quirks.
The idea is as follows. The identity provider maintains a per-user-session list of service providers that were authenticated in the current session (it maintains such a list anyway, this is nothing new). The new element here is that the list allows the identity provider to distinguish between an old-type service provider and a new-type service provider. Old service providers are handled the old way, with nested iframes. New service providers are handled by redirecting to them and expecting that they redirect back.
Let's first discuss how the identity provider is able to distinguish service providers. There are two possible approaches:
  • hard approach - the identity provider can rely on a static whitelist configuration where each possible service provider is configured and an extra bit of information (old/new) is available in the configuration.
  • soft approach - the identity provider can still have a whitelist but the information about the service provider type is not a part of the configuration. Instead, whenever a service provider tries to sign in, the signin request contains an extra bit of information that tells the identity provider what type of service provider is currently being processed. For us, this extra information was an extra key in the wctx parameter of WS-Federation. The wctx allows the service provider to pass extra info to the identity provider, in the form [key=value]. It's commonly used to pass something like rm=1 (remember me=1) or ru=.... (return url=....). We just add yet another key here, nslo=1 (new single log out=1). In case of third-party identity providers, this extra key in wctx is just ignored. In case of a first-party identity provider, our own, we use this extra info from the wctx to mark service providers as old/new on the identity provider side.
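The soft approach can be sketched as a small check on the identity provider side. This is only an illustration: the function name and the example URLs are made up, and it assumes that multiple wctx keys are joined with & (the exact wctx format is up to your own implementation).

```javascript
// Hypothetical helper: decides whether a signin request comes from a
// new-type service provider, based on the nslo key inside wctx.
function isNewServiceProvider(signInUrl) {
    const query = new URL(signInUrl).searchParams;

    // searchParams.get() percent-decodes the value once, so "nslo%3D1" becomes "nslo=1".
    const wctx = query.get('wctx') ?? '';

    // Assumption: wctx is a list of key=value pairs joined with '&'.
    const context = new URLSearchParams(wctx);

    return context.get('nslo') === '1';
}

// A new-type service provider adds nslo=1 to wctx ...
console.log(isNewServiceProvider(
    'https://idp.example.com/?wa=wsignin1.0&wtrealm=https%3A%2F%2Fsp.example.com%2F&wctx=nslo%3D1')); // true

// ... an old-type one does not.
console.log(isNewServiceProvider(
    'https://idp.example.com/?wa=wsignin1.0&wtrealm=https%3A%2F%2Fold.example.com%2F&wctx=rm%3D1')); // false
```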
And now comes the important part: the actual log out as implemented at service providers and identity providers.
In case of a service provider:
  • signin - a regular wsignin1.0 together with extra wctx=nslo%3D1
  • sign out - a regular wsignout1.0. When it receives wsignoutcleanup1.0, the service provider is obliged to redirect back (302) to the identity provider (with wsignout1.0)
In case of an identity provider:
  • signin - pay attention to wctx and mark service providers as old/new (as explained above)
  • sign out
service_provider = list_of_service_providers_of_new_type_from_session.FirstOrDefault();
if ( service_provider != null ) {
    remove_from_session( service_provider );
    redirect_to_service_provider_with_wsignoutcleanup1.0( service_provider )
} else if ( identity_provider_has_its_own_top_level_identity_provider ) {
    create_page_with_iframes_to_old_type_service_providers();
    redirect_to_top_level_identity_provider_with_wsignout1.0( top_level_identity_provider )
} else {
    // there's no top-level identity provider above the current identity provider
    create_page_with_iframes_to_old_type_service_providers();
    close_session_and_return_information_page()
}
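The pseudocode above can be turned into a small runnable model. The sketch below does no HTTP work at all. It only returns a description of the next step, and all the names in it (nextSignOutStep, the session shape, the example URLs) are made up for the illustration.

```javascript
// Decides what the identity provider does on a wsignout1.0 request.
// session.newServiceProviders / session.oldServiceProviders are hypothetical lists
// of service providers that signed in during the current session.
function nextSignOutStep(session, topLevelIdentityProvider = null) {
    // Take (and remove) the first new-type service provider, if any.
    const serviceProvider = session.newServiceProviders.shift();

    if (serviceProvider !== undefined) {
        // The service provider cleans up and redirects back with wsignout1.0,
        // which brings the browser to this function again.
        return { action: 'redirect', to: serviceProvider, wa: 'wsignoutcleanup1.0' };
    }

    // No new-type service providers left: old-type ones still get iframes.
    const iframes = [...session.oldServiceProviders];

    if (topLevelIdentityProvider !== null) {
        return { action: 'redirect', to: topLevelIdentityProvider, wa: 'wsignout1.0', iframes };
    }

    // There's no identity provider above this one.
    return { action: 'close-session', iframes };
}

const session = {
    newServiceProviders: ['https://sp1.example.com/', 'https://sp2.example.com/'],
    oldServiceProviders: ['https://legacy.example.com/']
};

console.log(nextSignOutStep(session).to);     // https://sp1.example.com/
console.log(nextSignOutStep(session).to);     // https://sp2.example.com/
console.log(nextSignOutStep(session).action); // close-session
```

Each call corresponds to one round trip: the browser is redirected to a service provider and comes back, until the list of new-type service providers is empty.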
Q&A
  • Why does it work? It works because there are no more nested iframes; instead, the identity provider redirects to service providers and they redirect back. A 302 redirect is a top-level navigation, so cookies that browsers withhold from cross-domain iframes are sent along.
  • Why the distinction between old/new service providers? It's because the log out is implemented as a redirect from the identity provider to the service provider with wsignoutcleanup. Old-type service providers usually handle this by just terminating the local session and returning an empty response. Since the log out is now a redirect from the identity provider to the service provider, the identity provider has to be sure the service provider will redirect back.
  • Is it backwards compatible with existing service providers / identity providers? It is. An old-type identity provider (that just returns iframes) will ignore the extra signin info provided in wctx. The cross-domain sign out would probably still fail but you don't lose anything (it fails anyway). An old-type service provider with a new identity provider will not carry the extra wctx info, so it will be handled with an iframe (because the identity provider handles both types of service providers).

Thursday, February 1, 2024

ECMAScript modules in node.js

Node.js has supported ECMAScript modules for a few years now and if you are still considering switching from CommonJS, there are a couple of good arguments for it.
First, enabling the module subsystem is as easy as adding
  ...
  "type": "module",
  ...
to the package.json. Then, modules can be exported/imported, both default and named conventions are supported:
// foo.js
function foo(n) {
    return n+1;
}

function bar(n) {
    return n-1;
}

function qux(n) {
    return n-2;
}

export { bar };
export { qux };
export default foo;

// app.js
import foo from './foo.js';
import { bar, qux } from './foo.js';

console.log( foo(5) );
console.log( bar(5) );
console.log( qux(5) );
Modules can reference each other in a cycle. Good old CommonJS supports cycles too, but here it's even easier:
// a.js
import { b } from './b.js';

function a(n) {
    if ( n > 1 ) {
        console.log( `a: ${n}` );
        return b( n-1 );
    } else {
        return 1;
    }
}

export { a };

// b.js
import { a } from './a.js';

function b(n) {
    if ( n > 1 ) {
        console.log( `b: ${n}` );
        return a( n-1 );
    } else {
        return 1;
    }
}

export { b };

// app.js
import { a } from './a.js';

console.log( a(7) );
Modules support top-level await:
// app.js
console.log( await 7 );
And last but not least, modules interoperate with the existing infrastructure:
// app.js
import http from 'http';
import express from 'express';

var app = express();

app.get('/', (req, res) => {
    res.end( 'hello world');
});

http.createServer( app ).listen(3000, () => {
    console.log( 'started');
});

Thursday, January 25, 2024

Never, ever make your Random static

Years ago it was popular to advocate for sharing a single instance of the Random class. There are numerous sources, including blogs, Stack Overflow, etc. Occasionally, someone dared to write that it might not be the best idea. But still, when looking for information, people stumble across this decade-old recommendation.
That was the case with one of our apps. In one critical place, a shared instance of Random was used:

public static Random _rnd = new Random();

...
..._rnd.Next()...
This worked. For years. Until one day, it failed miserably.
You see, Random has two internal variables that it overwrites each time Next is called so that it can advance to the next random value. Random is not thread-safe, so if your app gets unlucky with concurrent calls, the two variables can end up being overwritten with the same value.
And guess what, Random then starts to return 0 each time Next is called! For us, it happened during an internal stress test, where heavy traffic was directed at the application, but it can happen at any time.
There are numerous solutions. First, starting from .NET 6 there's the Random.Shared static property, documented as thread-safe. In older frameworks, one of the alternative approaches should be used, e.g. this