Suggested workflow for a complex data migration

This article suggests a step-by-step process for migrating large amounts of data. When you transfer data from a feature-rich, cloud-based CRM system, careful planning is essential because of the complexity involved—custom objects, relationships between records, and unique record IDs. You need to think through both the technical and the functional sides of the migration.

  • Technical approach: Covers key migration steps—extracting, transforming, and loading data into Dataverse—while ensuring integrity, preserving relationships, and optimizing performance through validation and error handling.
  • Functional approach: Covers functional migration tasks like data segmentation and archiving, and highlights the need to involve business stakeholders to ensure the data meets their needs.

Technical approach for data migration

Ensure a smooth migration by following a structured approach—extract, transform, and load data while preserving integrity and minimizing disruption.

Diagram of a data migration workflow with six interconnected steps.

Extract data from source to staging database

For complex data migrations, we recommend staging data in a separate database (for example, SQL Server). This staging area captures a snapshot of the source system without disrupting ongoing business operations.

Key considerations:

  • Full vs. delta load: Organize data as full or incremental (delta) loads. Use autogenerated timestamps to track data arrival and identify changes for future loads.
  • Failover handling: Design the process to skip failed records (for example, records that fail due to field length limits or invalid lookups) without halting the migration. Log and resolve issues before reprocessing.
  • Field mapping: Convert source values to the target formats and value ranges in the staging database before migrating the data to Dataverse to improve efficiency.
  • Data validations: Run integrity checks to catch issues like missing references. Since data extraction can span hours or days, use the staging layer to filter incomplete records and ensure consistency (see the sketch after this list).
  • Data visualization: Use the staging database to audit and analyze data—for example, count records or sum financial fields—before final migration.
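
As an illustration, the following T-SQL sketch combines several of these considerations in a SQL Server staging database. The table and column names (Contact_Staging, Account_Staging, SourceContactId, DataMigration_CreatedDateTime) are placeholders, not prescribed names.

    -- Hypothetical staging table with an autogenerated load timestamp.
    CREATE TABLE dbo.Contact_Staging (
        SourceContactId               NVARCHAR(50)  NOT NULL,  -- unique ID from the source system
        FirstName                     NVARCHAR(100) NULL,
        LastName                      NVARCHAR(100) NULL,
        ParentAccountSourceId         NVARCHAR(50)  NULL,      -- reference to the source account
        DataMigration_CreatedDateTime DATETIME2     NOT NULL DEFAULT (SYSUTCDATETIME())
    );

    -- Delta load: select only records that arrived after the previous successful extract.
    DECLARE @LastSuccessfulLoad DATETIME2 = '2025-01-01';      -- read from your load-control table
    SELECT *
    FROM dbo.Contact_Staging
    WHERE DataMigration_CreatedDateTime > @LastSuccessfulLoad;

    -- Data validation: contacts whose parent account is missing from the staging area.
    SELECT s.SourceContactId
    FROM dbo.Contact_Staging AS s
    LEFT JOIN dbo.Account_Staging AS a
        ON a.SourceAccountId = s.ParentAccountSourceId
    WHERE s.ParentAccountSourceId IS NOT NULL
      AND a.SourceAccountId IS NULL;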

Transform data into target staging database

After you extract data from the source system, transform it into a target staging database that mirrors the Dataverse schema and contains values ready for direct insert or update.

Key transformation steps:

  • Field mapping: Map source columns to target Dataverse columns. Use scripts to join and merge tables where needed.

  • Optionset conversion: Convert text-based optionset values to Dataverse integers by using a mapping table (for example, OptionSetMapping) and bulk update queries. Create a table to standardize and automate the transformation of optionset values from source to target systems.

    Table: OptionSetMapping

    Column name       | Data type
    ----------------- | ---------
    Source table name | string
    Target table name | string
    Source text       | string
    Target text       | string
    Target value      | string
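
    A minimal definition of this mapping table might look like the following sketch. The data types are assumptions; for example, you might prefer to store Target value as an integer.

    CREATE TABLE dbo.OptionSetMapping (
        SourceTableName NVARCHAR(128) NOT NULL,
        TargetTableName NVARCHAR(128) NOT NULL,
        SourceText      NVARCHAR(256) NULL,
        TargetText      NVARCHAR(256) NULL,
        TargetValue     NVARCHAR(50)  NULL  -- the Dataverse optionset integer, stored as text here
    );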

    Use the OptionSetMapping table to efficiently transform and update optionset values in bulk. For example, to update all optionset values in the Contact table based on matching text values:

    UPDATE C
    SET C.<OptionsetValue> = M.<TargetValue>
    FROM Contact C
    JOIN OptionSetMapping M
      ON C.OptionsetText = M.TargetText
      AND M.TargetTableName = 'Contact'
    
  • Avoid custom GUIDs: Let Dataverse generate GUIDs to prevent fragmentation and performance issues.

  • String length checks: Ensure string values fit Dataverse limits. Trim or adjust as needed.

  • Calculated fields: Add derived fields (for example, Name for lookups) if missing in the source.

  • Other considerations: When designing tables to match the Dataverse schema, consider the following key columns and supporting tables. A sketch of such a table follows this list.

    • DataMigration_CreatedDateTime: Autopopulated timestamp for tracking data load batches.
    • Action flag: Indicates Insert (I), Update (U), or Delete (D).
    • Processing flag: Tracks status—Processed (P), Unprocessed (U), Error (E), or Success (S).
    • Unique column: Use a unique ID (for example, the unique ID from the source system) to map records.
    • Success/Error tables: Maintain separate tables (for example, Contact_Success, Contact_Error) to log outcomes and support retries.
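
As a rough illustration, a target staging table that mirrors the Dataverse Contact schema might combine these tracking columns as follows. The table name Contact_Target and the column lengths are assumptions, not prescribed values.

    -- Hypothetical target staging table that mirrors the Dataverse Contact schema (abbreviated).
    CREATE TABLE dbo.Contact_Target (
        ImportSequenceNumber          INT IDENTITY(1,1) PRIMARY KEY, -- unique column used to map records
        SourceContactId               NVARCHAR(50) NOT NULL,         -- unique ID from the source system
        FirstName                     NVARCHAR(50) NULL,             -- trimmed to the Dataverse column length
        LastName                      NVARCHAR(50) NULL,
        ParentAccountId               UNIQUEIDENTIFIER NULL,         -- lookup GUID, preloaded when available
        ActionFlag                    CHAR(1) NOT NULL DEFAULT 'I',  -- I = Insert, U = Update, D = Delete
        ProcessingFlag                CHAR(1) NOT NULL DEFAULT 'U',  -- P, U, E, or S
        DataMigration_CreatedDateTime DATETIME2 NOT NULL DEFAULT (SYSUTCDATETIME())
    );

    -- Contact_Success and Contact_Error can share this shape, plus a DataverseId (GUID) column
    -- and an error message column, to log outcomes and support retries.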

Sequence tables and preload lookups

After static transformations, order your tables to reduce cyclic dependencies—cases where tables reference each other, making isolated imports impossible. Use this approach:

  • List all tables eligible for migration.
  • Count unique lookups per table (ignore out-of-the-box fields like Created By, and lookups to tables you aren't migrating).
  • Sort tables in ascending order by lookup count.
  • Include N:N relationship tables, counting both lookups.
  • Exclude multi-table lookups (for example, "regarding" fields).

This approach defines the load sequence for the data migration and works well in most scenarios. For more complex cases:

  • Use a unique identifier (for example, importsequencenumber) to match records between staging and Dataverse when GUIDs are generated during insert.
  • Separate success and error logs to avoid locking issues and improve performance.
  • Preload lookup GUIDs from already migrated tables to resolve references during insert (see the sketch after this list).
  • Handle cyclic dependencies by:
    • Inserting records without dependent lookups.
    • Updating those lookups after related records are loaded.
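
For example, after accounts are migrated, their Dataverse GUIDs can be preloaded into the contact staging table before contacts are inserted. This is a minimal sketch; the table and column names (Account_Success, Contact_Target, ParentAccountSourceId, DataverseId) are assumptions that follow the staging design sketched earlier.

    -- Preload lookup GUIDs: resolve the parent account lookup for contacts
    -- by using accounts that were already migrated and logged in Account_Success.
    UPDATE c
    SET c.ParentAccountId = s.DataverseId
    FROM dbo.Contact_Target AS c
    JOIN dbo.Account_Success AS s
        ON s.SourceAccountId = c.ParentAccountSourceId
    WHERE c.ProcessingFlag = 'U'
      AND c.ParentAccountId IS NULL;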

Load data into Dataverse

The next step is to determine and implement your approach to loading data into Dataverse.

  1. Tooling: Select a tool based on data size and complexity:

    • SDK Configuration Migration Tool
    • Azure Data Factory
    • KingswaySoft
    • Scribe
    • XrmToolBox’s Data Transporter
  2. Key considerations (tool-agnostic):

    • Handle cyclic dependencies: Sequence table loads to minimize circular lookups. Insert records without dependent lookups, then update them later.

    • Track record IDs: Capture Dataverse GUIDs in a success table, then update the main table by using a unique identifier (for example, importsequencenumber).

    • Optimize batch size and number of threads: Review guidance for optimizing performance for bulk operations. The application you use must manage service protection errors that occur when extraordinary numbers of requests are sent to Dataverse. If you write your own code and use the Dataverse Web API, make sure you retry 429 errors as described in Service protection API limits. If you use the Dataverse SDK, it manages these errors for you.

      To achieve optimal performance, tune batch size and thread count based on table complexity (a staging-side batching sketch follows this section):

      • Out-of-the-box (OOB) tables (for example, Contact, Account, Lead): These tables are slower due to built-in plugins and jobs. Recommended: Batch size 200–300, up to 30 threads (if ≤10 lookups and 50–70 columns).
      • Simple tables (few or no lookups): Recommended: Batch size ≤10, up to 50 threads.
      • Moderately complex custom tables (some lookups): Recommended: Batch size ≤100, up to 30 threads.
      • Large/complex tables (>100 columns, >20 lookups): Recommended: Batch size 10–20, up to 10–20 threads to reduce errors.
  3. Infrastructure tips: To maximize data migration performance, run your migration from a virtual machine (VM) located in the same region as your Dataverse environment. This approach significantly reduces latency and speeds up the entire process. Learn how to determine the region of your Dataverse environment.

  4. Error handling: Don't ignore errors—resolve them to prevent cascading failures. Use defaults (for example, blank lookups, default optionset values) to insert placeholder records and capture GUIDs.

  5. Status updates: Only set the active status during initial record insertion. For inactive records or custom state/status codes, update them after data validation. For most custom tables, status updates can follow immediately after insert. However, for special tables like Case, Opportunity, or Lead, delay status updates until the end of the migration. Once these records are closed, they can't be modified unless reopened—a time-consuming process that risks data integrity.

  6. Ownership and security: Set the correct record owner during data insertion, as both user-level and business unit security in Dataverse are tied to the owner's business unit. Assign the right business unit at creation—updating it afterwards removes all security roles.

    • Use stub users:
      • Dataverse supports stub users (nonlicensed), which are useful for large or historical migrations. Stub users are automatically assigned the Salesperson security role—don't rename or modify this role. Stub users can own records if they have user-level read access to the relevant tables.
    • Recommendations:
      • Create all nonlicensed users during migration with the correct business unit set at insert time.
      • Don't change the business unit after creation—doing so removes all roles, including Salesperson.
      • Ensure the Salesperson role has read access to all migration-eligible tables.
      • Even users who are disabled in the Dataverse environment can own records if they have this role.
  7. Currency handling: Set exchange rates during insert by using a prevalidation plugin, as Dataverse doesn't support historical rates.
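
On the staging side, a tool-agnostic way to feed parallel load threads is to let each thread claim its next batch of unprocessed rows. The following is a minimal T-SQL sketch against the hypothetical Contact_Target table; the batch size is the tunable value discussed above.

    -- Each load thread claims the next batch of unprocessed rows.
    -- READPAST lets parallel threads skip rows already locked by other threads.
    DECLARE @BatchSize INT = 200;   -- tune per table complexity (see the guidance above)

    WITH NextBatch AS (
        SELECT TOP (@BatchSize) *
        FROM dbo.Contact_Target WITH (ROWLOCK, UPDLOCK, READPAST)
        WHERE ProcessingFlag = 'U'
        ORDER BY ImportSequenceNumber
    )
    UPDATE NextBatch
    SET ProcessingFlag = 'P'        -- picked up for processing; set S or E after the load completes
    OUTPUT inserted.*;              -- return the claimed rows to the loading process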

Post data load into Dataverse

After loading data into Dataverse, follow these steps to ensure data integrity and minimize downstream issues:

  1. Update the main table with GUIDs:

    • After a successful load, copy the Dataverse record GUIDs from the Success Table into the Main Table by using a unique identifier, such as importsequencenumber (a sketch follows this section).
    • Update the Processing Flag to mark records as:
      • P – Processed
      • E – Errored
      • U – Unprocessed

    This strategy enables efficient reruns by skipping already processed records and supports lookup resolution in subsequent loads.
  2. Retry failed records: To reduce rework and maintain referential integrity, consider these actions:

    • Trim string values if they exceed allowed lengths.
    • Apply default optionset values when mappings are missing.
    • Assign a fallback owner if the original owner isn't available (even as a stub user).
    • Use blank or default values for unresolved lookups.
    • Even placeholder records can help generate GUIDs needed for lookups in related tables.
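
Both steps can be expressed against the hypothetical staging tables sketched earlier. The following T-SQL is a sketch that assumes the main table has a DataverseId column and that Contact_Success and Contact_Error carry the ImportSequenceNumber of each record.

    -- 1. Copy Dataverse GUIDs back to the main table and mark the outcome.
    UPDATE t
    SET t.DataverseId    = s.DataverseId,
        t.ProcessingFlag = 'P'
    FROM dbo.Contact_Target AS t
    JOIN dbo.Contact_Success AS s
        ON s.ImportSequenceNumber = t.ImportSequenceNumber;

    UPDATE t
    SET t.ProcessingFlag = 'E'
    FROM dbo.Contact_Target AS t
    JOIN dbo.Contact_Error AS e
        ON e.ImportSequenceNumber = t.ImportSequenceNumber;

    -- 2. Prepare errored rows for a retry: trim overlong strings and flag them as unprocessed again.
    UPDATE dbo.Contact_Target
    SET FirstName      = LEFT(FirstName, 50),   -- match the Dataverse column length
        LastName       = LEFT(LastName, 50),
        ProcessingFlag = 'U'
    WHERE ProcessingFlag = 'E';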

Using elastic tables for data migration

Elastic tables are designed to handle large volumes of data in real time. With elastic tables, you can import, store, and analyze large volumes of data without scalability, latency, or performance issues.

Elastic tables offer unique capabilities for flexible schema, horizontal scaling, and automatic removal of data after a specific time period.

Elastic tables are stored in Azure Cosmos DB and support:

  • Schema-less data via JSON columns
  • Automatic horizontal scaling
  • Time-to-live (TTL) for auto-deletion of stale data
  • Partitioning for performance optimization

Elastic tables are best suited for bulk imports with variable schema. They're ideal for the following data types.

Data type | Description
--------- | -----------
Raw ingestion data | Source logs, sensor feeds, or bulk exports from legacy systems. For example, customer interaction logs from a legacy ERP, old email threads, and support tickets from the previous system.
Semi-structured records | Data with optional or evolving fields that don't fit a rigid schema. For example, customer feedback forms with optional fields, or event registration forms with custom notes or tags.
Staging data for validation | A temporary holding zone before syncing data to relational tables. For example, imported lead data awaiting deduplication and validation before being added to the main Leads table.
Time-sensitive or expiring data | Use TTL (Time-to-Live) for auto-deletion of temporary CRM records. For example, promotional discount codes tied to a campaign, one-time access links for customer surveys or onboarding portals, and temporary survey responses.
Partitioned bulk data | Partition data by ID or category to improve performance and scalability. For example, partition by account ID or region ID during bulk data migration, or segment customer activity logs by campaign ID for analytics.

Data types unsuitable for elastic tables

Elastic tables are optimized for flexible, high-scale scenarios—but not every data type fits. This section highlights common CRM data patterns that are better stored elsewhere to ensure performance, cost-efficiency, and maintainability. Learn more about features currently not supported with elastic tables.

Data type | Reason
--------- | ------
Highly relational data | Elastic tables don't support joins or lookups
Business-critical records | No transactional integrity or plugin support
Data requiring complex validation | Better handled in standard tables with business rules

Functional data segmentation and archival framework

Effective technical planning includes selecting the right tools and infrastructure, aligning source and target data volumes, and setting up auditing and reconciliation processes. Many migrations become complex due to a lack of upfront analysis, especially around what data needs to move and where it belongs. This section outlines the core principles of data analysis to support a successful migration.

Data segmentation

Data segmentation is a key step in migrating from one CRM system to Dataverse. Organize data tables by business function—such as sales, service, or marketing—to simplify migration planning and execution.

Table segmentation

Start by listing all tables eligible for migration, grouped by business area (for example, sales, marketing, service). Then:

  • Document the schema in Excel or a similar tool.
  • Run basic queries in the source system to check column usage (see the example query after this list).
  • Flag low-use columns. If fewer than 5% of records contain values, consult business stakeholders to decide whether to keep or discard them.
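
For example, if a copy of the source data is available in SQL Server, the fill rate of a column can be checked with a simple aggregate. The table and column names here (Contact_Staging, FaxNumber) are placeholders.

    -- Percentage of contact records that actually contain a value for a given column.
    SELECT
        COUNT(*)          AS TotalRecords,
        COUNT(FaxNumber)  AS RecordsWithValue,   -- COUNT(column) ignores NULLs
        CAST(100.0 * COUNT(FaxNumber) / COUNT(*) AS DECIMAL(5, 2)) AS PercentPopulated
    FROM dbo.Contact_Staging;
    -- If PercentPopulated is below 5, review the column with business stakeholders
    -- before including it in the migration.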

This simple analysis can significantly reduce migration scope. In long-running CRM systems, it’s common to eliminate 30–40% of columns and up to 20% of tables, streamlining the process and improving performance.

Column relevance

Some source system columns map directly to Dataverse, while others become calculated fields. Separate these columns and consult business stakeholders to decide if migration jobs are needed.

Ignore columns that are only relevant in the source system or not meaningful in the target. This includes many out-of-the-box fields like Created By, Modified By, or Row Version Number, unless they serve a specific purpose in your migration.

File type data

If your source system includes file-type data, flag these fields early and plan a separate migration strategy. Consider the following file types:

  • Office documents (for example, Word, Excel, PowerPoint): For up to 20,000 files, migrate to a collaborative platform like SharePoint to enable multi-user access.
  • Multimedia files (for example, images, videos): Choose a platform that supports playback. Options include SharePoint, streaming services, or other media-friendly storage solutions.
  • Large volumes or file sizes: If storage cost is a concern, use Azure Blob Storage or the native file column in Dataverse, which uses Azure Blob behind the scenes.
  • Malware protection: Run files through a malware detection tool (for example, Azure Advanced Threat Protection) before migration to ensure security.

After reviewing file relevance, you often find that total data volume drops significantly—especially in long-running CRM systems—making the migration more efficient.

Data archival strategies

Some data—like old emails, closed cases, or disqualified leads—remains important but is rarely accessed. To reduce migration volume without disrupting business operations, develop a smart archival strategy.

Step 1: Identify archivable data

Common candidates include:

  • Emails older than three years
  • Closed cases
  • Lost opportunities
  • Disqualified leads
  • Marketing emails, posts, and audit logs

Review your system to identify other tables that you can archive.
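
A quick way to size these candidates is to count them in the source or staging database. This sketch uses placeholder table, column, and status names; adjust them to your source system.

    -- Candidate records for archival rather than migration.
    SELECT COUNT(*) AS EmailsOlderThanThreeYears
    FROM dbo.Email_Staging
    WHERE CreatedOn < DATEADD(YEAR, -3, SYSUTCDATETIME());

    SELECT COUNT(*) AS ClosedCases
    FROM dbo.Case_Staging
    WHERE StatusText = 'Resolved';   -- adjust to the status values used by your source system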

Step 2: Choose an archival approach

  • Keep data in the source system. Retain a few admin licenses for access while deactivating others to reduce costs.
  • Move to external storage. Use local databases, Azure Blob Storage, or Azure Tables to store archived records. This approach reduces storage and migration costs but requires a separate migration strategy.
  • Use a separate Dataverse environment. This option is less common, but it's useful if you want to isolate archived data. You can retire this environment later to simplify cutover planning.

To ensure fast and reliable data migration into Dataverse:

  • Use a virtual machine (VM) in the same region as your Dataverse environment to reduce latency and improve migration speed.
  • Choose a high-performance VM. At minimum, use a D4 VM with eight cores, 28-GB RAM, and 500-GB storage to handle large data volumes efficiently.
  • Prefer a local database on the VM. Avoid remote connections during migration. If you use Azure Data Factory, deploy it in the same region as your Dataverse environment for optimal performance.
