Friday, October 11, 2013

Soft Delete pattern for Entity Framework Code First

Some time ago I’ve blogged on how to implement the Soft Delete pattern with NHibernate. This time I am going to show how to do the same with Entity Framework Code First.

(a side note: I really like the EFCF, I like its simplicity the and migration infrastructure. I tend to favor EFCF over other ORMs lately)

I’ve spent some time looking for a working solution and/or trying to come up with something on my own. There are solution that almost work, like the one by Zoran Maksimovic from his post “Entity Framework – Applying Global Filters”. Zoran’s approach involves cleverly replacing DbSets by FilteredDbSets internally in the DbContext. These FilteredDbSets have filtering predicates attached so that filtering occurs upon data retrieval. Unfortunately, this approaches missing the fact that filtering should also be applied to navigation properties. Specifically, this works correctly in Zoran’s approach

// both loop correctly over non-deleted entities only
foreach ( var child in context.Child ) ...
foreach ( var parent in context.Parent )...

but this fails miserably

foreach ( var parent in ctx.Parent )       // ok
  foreach ( var child in parent.Children ) // oops, deleted entities are included!
     ...

However, another solution has been proposed by a StackOverflow user Colin. This solution involves a discriminator column which normally is used when mapping class hierarchies to mark different types of entities mapped to the same table. There is the link to the original entry.

My job here is merely:

  • cleaning this up so that it complies
  • making it a little bit more general as the original approach makes some assumptions (a common base class for all entities where the primary key is always called “ID”)
  • adding a cache for the metadata so that all the metadata searching doesn’t have to be repeated over and over

All the credit goes to Colin, though.

Let’s start with entities:

public class Child
{
    public long ID { get; set; }
 
    public string ChildName { get; set; }
 
    public bool IsDeleted { get; set; }
 
    public virtual Parent Parent { get; set; }
}
 
public class Parent
{
    public long ID { get; set; }
 
    public string ParentName { get; set; }
 
    public bool IsDeleted { get; set; }
 
    public virtual ICollection<Child> Children { get; set; }
}

Nothing unusual as all the Soft Delete stuff is in the DbContext:

/// <summary>
/// http://stackoverflow.com/questions/19246067/ef-5-conditional-mapping/19248216#19248216
/// http://stackoverflow.com/questions/12698793/soft-delete-entity-framework-code-first/18985828#18985828
/// </summary>
public class Context : DbContext
{
    public DbSet<Child>  Child { get; set; }
    public DbSet<Parent> Parent { get; set; }
 
    public Context()
    {
        Database.SetInitializer<Context>( new MigrateDatabaseToLatestVersion<Context, Configuration>() );
    }
 
    protected override void OnModelCreating( DbModelBuilder modelBuilder )
    {
        modelBuilder.Entity<Child>()
            .Map( m => m.Requires( "IsDeleted" ).HasValue( false ) )
            .Ignore( m => m.IsDeleted );
        modelBuilder.Entity<Parent>()
            .Map( m => m.Requires( "IsDeleted" ).HasValue( false ) )
            .Ignore( m => m.IsDeleted );
 
        modelBuilder.Conventions.Remove<System.Data.Entity.ModelConfiguration.Conventions.PluralizingTableNameConvention>();
    }
 
    public override int SaveChanges()
    {
        foreach ( var entry in ChangeTracker.Entries()
                  .Where( p => p.State == EntityState.Deleted ) )
            SoftDelete( entry );
 
        return base.SaveChanges();
    }
 
    private void SoftDelete( DbEntityEntry entry )
    {
        Type entryEntityType = entry.Entity.GetType();
 
        string tableName      = GetTableName( entryEntityType );
        string primaryKeyName = GetPrimaryKeyName( entryEntityType );
 
        string deletequery =
            string.Format(
                "UPDATE {0} SET IsDeleted = 1 WHERE {1} = @id",
                    tableName, primaryKeyName );
 
        Database.ExecuteSqlCommand(
            deletequery,
            new SqlParameter( "@id", entry.OriginalValues[primaryKeyName] ) );
 
        //Marking it Unchanged prevents the hard delete
        //entry.State = EntityState.Unchanged;
        //So does setting it to Detached:
        //And that is what EF does when it deletes an item
        //http://msdn.microsoft.com/en-us/data/jj592676.aspx
        entry.State = EntityState.Detached;
    }
 
    private static Dictionary<Type, EntitySetBase> _mappingCache = 
       new Dictionary<Type, EntitySetBase>();
 
    private EntitySetBase GetEntitySet( Type type )
    {
        if ( !_mappingCache.ContainsKey( type ) )
        {
            ObjectContext octx = ( (IObjectContextAdapter)this ).ObjectContext;
 
            string typeName = ObjectContext.GetObjectType( type ).Name;
 
            var es = octx.MetadataWorkspace
                            .GetItemCollection( DataSpace.SSpace )
                            .GetItems<EntityContainer>()
                            .SelectMany( c => c.BaseEntitySets
                                            .Where( e => e.Name == typeName ) )
                            .FirstOrDefault();
 
            if ( es == null )
                throw new ArgumentException( "Entity type not found in GetTableName", typeName );
 
            _mappingCache.Add( type, es );
        }
 
        return _mappingCache[type];
    }
 
    private string GetTableName( Type type )
    {
        EntitySetBase es = GetEntitySet( type );
 
        return string.Format( "[{0}].[{1}]", 
            es.MetadataProperties["Schema"].Value, 
            es.MetadataProperties["Table"].Value );
    }
 
    private string GetPrimaryKeyName( Type type )
    {
        EntitySetBase es = GetEntitySet( type );
 
        return es.ElementType.KeyMembers[0].Name;
    }
}

A couple of explanations.

First, the mapping. Note that the discriminator column is used to force EF to focus on undeleted entities. This adds the filtering predicate to all queries, including queries involving navigation properties.

modelBuilder.Entity<Child>()
    .Map( m => m.Requires( "IsDeleted" ).HasValue( false ) )
    ...

But then the discriminator column has to be removed from the mapping:

modelBuilder.Entity<Child>()
    ...
    .Ignore( m => m.IsDeleted );

This is enough to make EF generate correct queries, you can ignore the following stuff for a moment and just try it.

Second, the data saving. It is not enough to be able to filter the data, the Soft Delete also requires that deleting should actually only mark data as deleted. This is done in the overridden SaveChanges method. For each entity that is internally marked as deleted in the EF’s object cache, we manually update it in the database and then mark them as unattached (just like EF’s SaveChanges does).

Third, the caching stuff, GetEntitySet/GetTableName/GetPrimaryKeyName. These are for reading metadata so that the query that marks the data can include correct table name and correct primary key name for given entity type.

And this is it, deleting the data

var child = ctx.Child.FirstOrDefault( c => c.ID == 123 );
ctx.Child.Remove( child );
correctly updates its state to deleted (IsDeleted=1) instead of physically deleting it from the database.

16 comments:

PapRoch said...

i would love to try this. i really need to implement soft delete. I am using dynamic data Entity Framework 4. I am new to all this stuff and i have no idea where to start or put what you have here. is there a sample project i can see and follow?

Wiktor Zychla said...

@Debra: you need to switch to EF5 to be able to compile my code. As for working example - just put all three classes, Parent/Child/Context in one project and you are done. This is really as simple as that.

Unknown said...

Great article! Would it not be better to use interceptors for soft delete?

With regards to using discriminators - what is the best why to handle the creation of "IsDeleted" when using migrations? (since it is set to ignore)

Thanks!

//Jens

Wiktor Zychla said...

@Jens: in practice - we have a base class for the dbcontext and two derivatives: the first one with soft deletes and the second one without. The latter is used for migrations, the former - for querying. Because both contexts derive from the same base context, the base class is used to establish obligations to client classes. This way one or the other context can be injected depending on the actual requirement.

Ben Lewies said...

@Wiktor

Hi Wiktor

I have been thinking about soft delete for a couple of days. At this juncture, I am of the opinion that the functionality does not quite make sense if you have a decent data auditing function in place (which implies that any delete that takes place is in effect only a "soft delete", since errors can be undone by reconstructing your record from the audit database).


From a requirements perspective, this opens a whole can of worms as well.

For myself, this sort of functionality, if you wish to implement it, should be done in a more generic manner as well. For me this translates to simply having to annotate your POCO's with some attributes, and some generic repository or DBContext derivative doing all of the work for you from there. It should not be necessary to manually override DBContext methods and doing custom wiring for all the entities you want to be able to soft delete.

Perhaps a nice candidate for an open source project? Something I might be interested in contributing towards.

I have only started using EF a month ago, so still new to it. I am, however, surprised that this functionality (and decent audit loggin) is not available out of the box already, considering that we are already at the 6th major release of the technology.

Wiktor Zychla said...

@Ben: interesting insight. We've been using the Soft Delete since 2009, incidentally, as it is the default (and the only) delete mode in DevExpress XPO.

I don't think we have observed any major drawbacks, in all applications that use XPO the data is deleted rarely. Thus, marked records are only a small fraction of all other data. Also, because of indexes put on discriminator columns, there are no performance penalties. I really haven't yet found the "can of worms" then.

On the other hand, having the data still in specific tables is often handy.

The presented solution is very generic and I really like it. I would probably prefer a derived context with a query tree rewriter that scans tress and appends discriminator clauses where applicable. I've tried that approach and I've failed. Specifically, I've tried to implement an interceptor using the QueryInterceptor (https://github.com/davidfowl/QueryInterceptor).

However, starting from EF6, EF has its own interception infrastructure. In particular, the IDbCommandTreeInterceptor looks promising (http://msdn.microsoft.com/en-us/library/system.data.entity.infrastructure.interception.idbcommandtreeinterceptor.treecreated%28v=vs.113%29.aspx) however I haven't found enough time to explore this possible route.

Marty said...

Your example doesn't work on EF 6.1 - the .HasValue() method in the mapping configuration no longer takes a parameter - it's an "IS NOT NULL" check only.

Wiktor Zychla said...

@Marty: as long as I can see the trunk, your statement is not true, still three versions of HasValue there. It still compiles under 6.1.0. http://entityframework.codeplex.com/SourceControl/latest#src/EntityFramework/ModelConfiguration/Configuration/Mapping/ValueConditionConfiguration.cs

Marty said...

@Wiktor -- you're correct; I assumed (incorrectly) that I could use the .Requires() overload that takes an expression rather than the overload that takes a string.

The .Requires() overload that takes an expression returns a NotNullConditionConfiguration, and the .HasValue() method on on that type doesn't take any parameters.

I hate typing magic strings (like column names) anywhere in my code and prefer the strongly-typed overloads whenever possible. But in this case I may not have a choice.

Marty said...

One last followup - for my database, I track "soft delete" via a column that is null for not deleted and has a value when deleted. I can't do a .HasValue(null) as that causes an error at runtime.

I suppose I can introduce a boolean IsDeleted but would rather not. Don't see any alternative though.

Wiktor Zychla said...

@Marty: the way soft deletes have to be implemented in EF is not the cleanest one. Since there are no global filters, the trick with discriminator columns at least works but is not ideal. nHibernate shines here, comparing to EF.

Daniel Mauritzson said...

Hi, we are looking to improve our Soft Delete implementation and this look promising. On question though: We would still like to be able to show values from deleted entries in some instances, would that be possible with this solution?

Wiktor Zychla said...

Yep, you need two context with two different metadata mapping. Both could even derive one from the other, EF doesn't care. All it cares is you have two different context types (so that two different mappings can be done).

Unknown said...

Looks good.

What happens with constraints? I.e. a non-nullable foreign row (which would return null if the foreign row was soft deleted)

Unknown said...

I don't understand the ".Ignore" part on the model builder.
When I do that, it doesn't include that property in the database table - which is how I understand it's supposed to work, so I have no IsDeleted column at all. What am I missing?

reader said...

Thank you very much for this solution! Ready for use and clean. Appreciate that!