Bulk Merge

BulkMerge() performs an upsert operation: it inserts new rows, updates existing ones, and optionally deletes missing rows from the target table. It’s the most powerful operation in the ETLBox.DbExtensions toolkit, suitable for full or delta synchronization of datasets.

Example

using System.Data.SqlClient;
using ETLBox.DbExtensions;

var connection = new SqlConnection("your-connection-string");

var customers = Enumerable.Range(1, 1_500)
    .Select(i => new Customer { Id = i, Name = $"Update Customer {i}", City = $"City {i % 50}" })
    .Union(
        Enumerable.Range(3_000, 1500)
        .Select(i => new Customer { Id = i, Name = $"New Customer {i}", City = $"City {i % 50}" })
    );

connection.BulkMerge(customers);

public class Customer : IMergeableRow {
    [IdColumn]
    public int Id { get; set; }
    public string Name { get; set; }
    public string City { get; set; }

    public DateTime ChangeDate { get; set; }
    public ChangeAction? ChangeAction { get; set; }
}

Method Signatures

IDbConnection BulkMerge<T>(
    this IDbConnection connection,
    IEnumerable<T> data
)

IDbConnection BulkMerge<T>(
    this IDbConnection connection,
    IEnumerable<T> data,
    Action<MergeBulkOptions<T>> options
)

Merge Modes

  • Full (default): Inserts new rows, updates existing ones, and deletes records from the target table that are missing in your source data.

  • Delta: Inserts and updates only. Deletes only rows marked with DeleteColumns.

  • InsertsAndUpdates: Inserts and updates only. No deletions.

  • InsertsOnly: Will only execute inserts.

  • UpdatesOnly: Will only execute updates.

Matching and Column Control

Merge operations rely on identifying matching records and determining what needs to be updated. This is controlled through the following columns:

  • IdColumn — Defines which property (or properties) uniquely identify a record in the database. Required for all merge operations.
  • CompareColumn — Indicates which properties should be compared between source and target. If values differ, an update is triggered.
  • UpdateColumn — Restricts updates to the specified properties, even if others differ.

You can configure these columns either through attributes in your class:

public class Customer : MergeableRow {
    [IdColumn]
    public int Id { get; set; }

    [CompareColumn]
    [UpdateColumn]
    public string City { get; set; }

    public string Name { get; set; }
}

…or programmatically in MergeBulkOptions:

options.IdColumns = new[] {
    new IdColumn("Id")
};
options.CompareColumns = new[] {
    new CompareColumn("City")
};
options.UpdateColumns = new[] {
    new UpdateColumn("City")
};

If CompareColumns is not specified, all non-ID properties are compared. If UpdateColumns is not specified, all differing non-ID columns will be updated.

Mergeable POCOs

To support merge tracking and internal bookkeeping, your class must either:

  • Implement the IMergeableRow interface or
  • Inherit from the base class MergeableRow

This enables ETLBox to track ChangeAction and ChangeDate, which are required for correctly identifying inserts, updates, deletes, and existing records.

Example using IMergeableRow

public class Customer : IMergeableRow {
    [IdColumn]
    public int Id { get; set; }
    public string Name { get; set; }
    public string City { get; set; }

    public ChangeAction? ChangeAction { get; set; }
    public DateTime ChangeDate { get; set; }
}
public class Customer : MergeableRow {
    [IdColumn]
    public int Id { get; set; }
    public string Name { get; set; }
    public string City { get; set; }
}

The MergeableRow base class provides the required properties (ChangeAction, ChangeDate) out of the box. Use it unless you need full control over change tracking behavior.

Customization Options

You can configure the operation using the optional BulkOptions<T> parameter:

var connection = new SqlConnection("your-connection-string");

var customers = Enumerable.Range(1, 1_500)
   .Select(i => new Customer { Id = i, Name = $"Options Customer {i}", City = $"New City {i % 50}" })
   .Union(
       Enumerable.Range(5_000, 500)
       .Select(i => new Customer { Id = i, Name = $"New Customer {i}", City = $"New City {i % 50}" })
   );

connection.BulkMerge(customers, options => {
    options.ReadGeneratedValues = true;
    options.CompareColumns = new[] { new CompareColumn() { ComparePropertyName = "City" } };
    options.UpdateColumns = new[] { new UpdateColumn() { UpdatePropertyName = "City" } };
    options.MergeMode = MergeMode.Delta;
});

Table Naming Convention

By default, the table name is inferred from the class name. For example:

public class Customer { ... }

This maps to either Customer or Customers.

You can override the name using TableName, or adjust it with TablePrefix and TableSuffix inside BulkOptions.

Partial Caching

For large destination tables, BulkMerge supports partial caching to reduce memory usage. Instead of loading the entire target table into memory, rows are fetched in smaller chunks as needed during processing. This is especially useful for merge operations with millions of records.

Enabling Partial Cache

To enable partial cache mode:

options.CacheMode = CacheMode.Partial;

By default, CacheMode is set to Full, meaning the entire destination table is preloaded into memory.

Read Connection Required

When CacheMode.Partial is used, BulkMerge performs read operations during the merge process while also writing to the database. On most databases, you must provide a separate read connection to avoid conflicts, especially when working inside transactions.

var mainConnection = new SqlConnection("your-connection-string");
var readConnection = new SqlConnection("your-connection-string");

connection.BulkMerge(customers, options => {
    options.CacheMode = CacheMode.Partial;
    options.ReadConnection = readConnection;
});

The ReadConnection must point to the same database but will only be used for lookups during the merge.

Example with Partial Caching

var writeConn = new SqlConnection("your-connection-string");
var readConn = new SqlConnection("your-connection-string");

var customers = Enumerable.Range(1000, 5000)
    .Select(i => new Customer {
        Id = i,
        Name = $"Partial Merge {i}",
        City = $"City {i % 100}"
    });

writeConn.BulkMerge(customers, options => {
    options.CacheMode = CacheMode.Partial;
    options.MaxCacheSize = 1000;
    options.ReadConnection = readConn;
    options.MergeMode = MergeMode.Full;
    options.BatchSize = 1000;
    options.EvictionPolicy = CacheEvictionPolicy.LeastRecentlyUsed;
});

Cache Eviction

When using partial caching, ETLBox manages memory with a configurable eviction policy. Available values:

  • FullRefresh (default)
  • LeastRecentlyUsed
  • LeastFrequentlyUsed
  • FirstInFirstOut
  • LastInFirstOut

Use EvictionPolicy to control how records are removed from the internal cache.

Example Code on GitHub

You can find examples — basic usage and usage with options — in the official demo project on GitHub:

The demo is ready to run and shows how to configure the connection, create the table, and execute bulk operations with real data.