Thursday, January 15, 2009

Password recovery mechanism for applications which store data in password-protected files

The issue

If your application stores data in files, you have probably considered protecting that data with user-provided passwords.

The common issue with password-protected files is that a lost password cannot be easily recovered. However, if the application provides a backdoor or a magic unlock code, it can be compromised just as easily, and password-removing tools are simple to write.

Let's summarize a few requirements for the encryption and recovery scheme:

  • the application should be able to use any known encryption mechanism to protect files
  • the password-recovery mechanism should not be built into the application code but should instead involve an additional action from the application's producer (for example, I send the file to the application's producer and they unlock it for me)
  • the password-recovery mechanism should not allow the application's producer to retrieve the password in a plain, explicit form (because I might have used the same password to protect my files and my bank account)
  • the password-recovery mechanism should provide some form of authorization so that no one is able to steal my files and ask the producer to unlock them
  • the password-recovery mechanism should be as automated as possible, so that no person has to be involved in unlocking each individual file

The idea

I use standard cryptographic mechanisms to fulfill the above requirements: SHA512 to compute hashes, AES to encrypt the file data and RSA to recover passwords.

The idea works like this: when the user decides to protect the file with a password, AES is used to encrypt the file data. However, the user is asked to provide both a password and an email address; the address is later used to authorize unlock requests.

The application then stores the data in an encrypted form containing three sections:

  1. (E) Base64 form of the email provided by the user (so that it can be easily extracted from the encrypted document)
  2. (C) AES-encrypted file content, where SHA512(Password) is used as the actual AES password
  3. (S) RSA-encrypted pair (Email, SHA512(Password)) (the RSA public key is shipped with the application)
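
The three sections can be sketched in a few lines of Python. This is only an illustration and not the code from the reference implementation: the Python standard library has no AES or RSA, so both ciphers are passed in as callables (`aes_encrypt` and `rsa_encrypt` are hypothetical names), and the byte layout of the (Email, SHA512(Password)) pair is an assumption, since the post does not define one.

```python
import base64
import hashlib
from typing import Callable, NamedTuple


class ProtectedFile(NamedTuple):
    e: bytes  # (E) Base64 of the email, readable without any key
    c: bytes  # (C) file content encrypted with AES
    s: bytes  # (S) (Email, SHA512(Password)) encrypted with the RSA public key


def password_hash(password: str) -> bytes:
    """SHA512(Password): the value the scheme uses instead of the raw password."""
    return hashlib.sha512(password.encode("utf-8")).digest()


def pack_pair(email: str, pw_hash: bytes) -> bytes:
    """Assumed layout of the pair: the fixed 64-byte hash followed by the email."""
    return pw_hash + email.encode("utf-8")


def build_protected_file(
    content: bytes,
    email: str,
    password: str,
    aes_encrypt: Callable[[bytes, bytes], bytes],  # hypothetical: (secret, data) -> ciphertext
    rsa_encrypt: Callable[[bytes], bytes],         # hypothetical: encrypts with the public key
) -> ProtectedFile:
    pw_hash = password_hash(password)
    return ProtectedFile(
        e=base64.b64encode(email.encode("utf-8")),
        # How the 64-byte hash is turned into an AES key is left to aes_encrypt.
        c=aes_encrypt(pw_hash, content),
        s=rsa_encrypt(pack_pair(email, pw_hash)),
    )
```

In a real application the two callables would wrap an actual cryptography library. The point of the sketch is only to show which input goes into which section.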

When a password-protected file is opened, the user is asked to enter the password.

Note that the email is extracted from the (E) section of the document. Although it can be easily modified in the encrypted file (it is only Base64-encoded, not encrypted), the email is also stored in the RSA-encrypted (S) section, which the application can write but only the holder of the private key can read.

After the user provides the password, its SHA512 hash is used to AES-decrypt the file content.

At that point the triple (decrypted document content, user email, user password) exists in the application's memory. The user can then invoke the change-password operation, which recomputes all three sections of the encrypted file, (E), (C) and (S), and stores the document.

Recovering lost passwords

To recover a lost password, the user clicks the [I lost my password] link in the password input window. The application contacts the provider's webservice and passes three parameters: the name of the encrypted file, the email extracted from the public (E) section of the document and the contents of the (S) section of the document.

Note that the webservice holds the RSA private key that matches the public key used to encrypt the (S) section of the document. The webservice uses it to decrypt the (S) section.

There are three possibilities:

  • the contents of the (S) section sent to the webservice cannot be decrypted. An email is sent from the server to the provided email address saying "sorry, but the request cannot be processed"
  • the contents of the (S) section are successfully decrypted, but the email provided to the webservice from the (E) section and the email recovered from the (S) section do not match. An email is sent to the provided email address saying "sorry, but the document does not seem to be authorized for the provided address"
  • the contents of the (S) section are successfully decrypted and both emails are the same. This means that the user is authorized to access the document data, and an email is sent saying "dear user, this is the unlock code for your file" together with the SHA512(Password) decrypted from the (S) section of the document
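
The three cases map onto a small decision function on the server side. The sketch below is a hedged illustration rather than the reference webservice: `rsa_decrypt` is a hypothetical callable that returns `None` when decryption fails, the decrypted payload is assumed to be the 64-byte hash followed by the email, and the unlock code is assumed to be the hash written in hexadecimal.

```python
from typing import Callable, Optional

HASH_LEN = 64  # length of a SHA512 digest in bytes


def handle_recovery_request(
    email_from_e: str,
    s_section: bytes,
    rsa_decrypt: Callable[[bytes], Optional[bytes]],  # hypothetical private-key decryption
) -> str:
    """Return the body of the email that is sent to email_from_e."""
    plain = rsa_decrypt(s_section)

    # Case 1: the (S) section cannot be decrypted (or is too short to hold the pair).
    if plain is None or len(plain) <= HASH_LEN:
        return "Sorry, but the request cannot be processed."

    pw_hash, email_from_s = plain[:HASH_LEN], plain[HASH_LEN:]

    # Case 2: the public (E) email does not match the email sealed inside (S).
    if email_from_s != email_from_e.encode("utf-8"):
        return "Sorry, but the document does not seem to be authorized for the provided address."

    # Case 3: both emails match, so the unlock code is released.
    return "Dear user, this is the unlock code for your file: " + pw_hash.hex()
```

The function only produces the message text. Sending it to the address from the request, never to an address supplied in any other way, is what ties the unlock code to the owner of the mailbox.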

Note how the requirements are fulfilled. The email is stored independently in two sections of the document and the unlock code is generated only when these two addresses match. Because the unlock code is mailed to that address, someone who steals the file and edits the (E) section gains nothing. On the other hand, the provider's service gets no access to the plain password; it only learns its SHA512 hash.

The user is then asked to enter the unlock code (which is in fact the SHA512 hash of the password).

Since the application now knows the SHA512 hash of the password, it uses it to AES-decrypt the document data (remember that the hash is used to encrypt the data, not the password itself).
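
On the application side, both entry paths end with the same 64 bytes. A minimal sketch, assuming the unlock code is the hash written in hexadecimal (the post does not say how the code is encoded) and using the hypothetical function name `key_material`:

```python
import hashlib


def key_material(user_input: str, is_unlock_code: bool) -> bytes:
    """Return SHA512(Password), either computed from a password or parsed from an unlock code."""
    if is_unlock_code:
        # bytes.fromhex raises ValueError if the text is not valid hexadecimal.
        raw = bytes.fromhex(user_input.strip())
        if len(raw) != 64:
            raise ValueError("unlock code must encode a 64-byte SHA512 hash")
        return raw
    return hashlib.sha512(user_input.encode("utf-8")).digest()
```

Whatever is returned here is then passed to the AES decryption step, so the rest of the application does not need to know whether the user typed a password or pasted an unlock code.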

Note that very few users will try to unlock files encrypted with an email address which they are no longer able to use. As an emergency procedure, you can ask users to send such files directly to the service provider, where the email authorization step can be skipped, but then some other authorization mechanism has to be used.

The implementation

As usual, a reference implementation is provided (fileencrecov.zip) containing two Visual Studio 2008 C# projects: a dummy application and a recovery webservice. The webservice does not send real emails! Instead, a dummy implementation is provided and "emails" are stored in the application's server directory, so you can open them in Notepad and copy/paste the unlock code into the application.

Where to go from there

The idea and the implementation are provided "as is", which means that I do not take any responsibility for any holes in the recovery protocol which could be used to compromise it. You are free to use the code; however, if you wish to use it in a closed-source application, you have to notify me.

You are also free to tune the protocol further, for example by using salts when computing hashes, so that the provider's webservice cannot easily store the requests and perform a dictionary attack to retrieve users' plain passwords.
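
One possible shape of such a tuning is sketched below. Note that it swaps the plain SHA512 for PBKDF2-HMAC-SHA512 with a random per-file salt, which goes a step beyond simply adding a salt. The salt size, the iteration count and the function names are assumptions made for this illustration, and the salt would have to be stored in the file next to the other sections.

```python
import hashlib
import secrets


def new_salt() -> bytes:
    """Generate a random per-file salt (16 bytes is an assumed size)."""
    return secrets.token_bytes(16)


def salted_password_hash(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive 64 bytes from the password; a drop-in replacement for SHA512(Password)."""
    return hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, iterations)
```

A salt makes precomputed dictionaries useless and the iterations make each guess more expensive, but neither stops a determined attack on a weak password.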

If you believe that the idea or the implementation is flawed in any way, please drop a note so that others do not use something which has fundamental problems.
