The 5 pitfalls of a data migration - making a data migration painful

For all references and definition, see the page called “References & Glossary for the 5 pitfalls...”

Summary

This white paper provides an overview and a detailed view of pitfalls making painful a data migration related to Dynamics 365 or Dynamics CRM.

Regardless the type of data migration (migration to Dynamics online or migration to Dynamics On-Premises), native and custom items could delay the data migration process.

Those items should be categorized into 5 groups: data context, platform context, functional context, code context, architecture context.

Keywords

Data Migration, D365, Dynamics 365 On-Premise, Dynamics CRM On-Premise, Pitfalls, Data Context, Platform, Code, Customizations, CRM services, Data security Model, CRM Functionalities, CRM Database.

Confidentiality

The white paper does not provide any confidential client data related to data migration. All the code in this white paper is only related to the CRM SDK365 code.

Introduction: What are the main pitfalls during migration?

A data migration focuses on the movement of data between source (legacy data system and business) and destination (target system). Hence, in our white paper, a data migration will focus on the movement of data between Dynamics CRM / 365 on-premises (source) and Dynamics CRM / 365 on-premise or online (destination) [1].

However, pitfalls are real and can delay the data migration. Even with the most thoroughly tested tools and procedures, we need to identify those pitfalls and find a way to overcome the latter.

Data migration options

Without emphasizing the technical aspect of an option, in this case, whether it is a "Cloud" technology (cloud technology) or an "on Premise" technology, there are two execution contexts [2]:

OPTION	DATA SOURCE	DATA DESTINATION	DESIGN OF SCENARIO
TYPE A	DATABASE (DB)	DATABASE (DB)	DB/DB
TYPE B	DATABASE (DB)	SERVICE	DB/SERVICE

Data migration options applied to Dynamics CRM

Emphasizing the technical aspect of an option, what does it mean regarding the platform Dynamics CRM or Dynamics 365?

OPTION	DATA SOURCE	DATA DESTINATION	DESIGN SCENARIO	DYNAMICS 365
TYPE A	DATABASE	DATABASE	DB/DB	FROM DYNAMICS ON-PREM TO DYNAMICS ON-PREM
TYPE B	DATABASE	SERVICE	DB/SERVICE	FROM DYNAMICS ON-PREM TO DYNAMICS ON-LINE

From these options, could be extracted the cases of so-called figures specific to each execution context mentioned in the table above (types a, b, c, d, and e). So, what is a specific option?

Data migration options in details

A specific scenario is characterized by the following three elements: the data and metadata structure, the technology or platform in which the transfer is constrained and the technological context, and finally, our last element, which is the version and build number of the platform.

Description of option “type A”: from database to database, from Dynamics on-prem to Dynamics on-prem

OPTION - TYPE A (DB/DB)	DATA AND METADATA STRUCTURE	TECHNOLOGY/PLATEFORM	VERSION/PLATFORM BUILD NUMBER
TYPE A – 1	SIMILAR STRUCTURE	SIMILAR PLATEFORM	SIMILAR VERSION
TYPE A – 2	DIFFERENT STRUCTURE	SIMILAR PLATEFORM	DIFFERENT VERSION
TYPE A – 3	DIFFERENT STRUCTURE	DIFFERENT PLATEFORM	DIFFERENT VERSION

Description of option “type B”: from database to service, from Dynamics on-prem to Dynamics online

OPTION - TYPE B (DB/SERVICE)	DATA AND METADATA STRUCTURE	TECHNOLOGY/PLATEFORM	VERSION/PLATFORM BUILD NUMBER
TYPE B – 1	SIMILAR STRUCTURE	SIMILAR PLATEFORM	SIMILAR VERSION
TYPE B – 2	DIFFERENT STRUCTURE	SIMILAR PLATEFORM	DIFFERENT VERSION
TYPE B – 3	DIFFERENT STRUCTURE	DIFFERENT PLATEFORM	DIFFERENT VERSION

Selected option

Before the list of types mentioned above, it should be noted that our data migration takes place in TYPE A and more precisely in TYPE a-3.

Migration is the migration of data considering the three elements of our definition regarding our option:

→ CRM data and metadata structure differs from the D365 data and metadata structure.

→ The technology and the platform are different. From a .NET point of view, even if the platforms are based on the .NET Framework, their physical and application architectures differ from each other.

→ Hence, their versions and build numbers differ from each other.

OPTION - TYPE A (DB/DB)	DATA AND METADATA STRUCTURE	TECHNOLOGY/PLATEFORM	VERSION/PLATFORM BUILD NUMBER
TYPE A – 1	SIMILAR STRUCTURE	SIMILAR PLATEFORM	SIMILAR VERSION
TYPE A – 2	DIFFERENT STRUCTURE	SIMILAR PLATEFORM	DIFFERENT VERSION
TYPE A – 3	DIFFERENT STRUCTURE	DIFFERENT PLATEFORM	DIFFERENT VERSION

1. Generic Framework

Based on a generic framework, we asked ourselves questions related to the following concept: pitfalls …what are the pitfalls we could face during a data migration? To answer this question, I decided to regroup them in 5 categories: data context, platform context, functional context, code context and finally, architecture context.

2. Pitfalls Overview

During the data migration process and precisely during the milestones of data migration, pitfalls should occur. Each pitfall mentioned above (data context, platform context, functional context, code context and architecture context) should occur in all milestones of your data migration and should delay the process. A data migration is composed of important milestones in transferring business and system data from one Dynamics CRM platform to a new Dynamics 365 platform. Hence, a milestone is not a key factor in Dynamics CRM data migration.

3. Pitfalls Details

3.1 Pitfalls from data context

Data context could be categorized into five categories: data security model, data sharing, data model denormalization, data logs and audit and finally, data volume.

Data CRM Security Model

Complexity of data security model in the source:

By default, a team is associated with a business unit and the latter has a parent business unit. Dynamics allows:

→ To associate users with new and restricted teams and so, with restricted business unit.
→ To redefine the ownership of the data with a user or a team.

However, to limit access to business data in Dynamics CRM, we can give the ownership of business data to a team or even a single user. Then, we complexify the CRM data security model because we associate CRM business data with team or user, newly owner of the data. Doing so, we prefer to use the ownership to restrict access to records and do not use the native behavior of Dynamics CRM, behavior characterized by the security hierarchy and the Business Unit.

The complexity of the data security model could be a pitfall, specially with the Mailbox entity (New feature in D365 security model) !

If we start the data migration from an old version of Dynamics CRM (i.e. CRM 2011), we need consider that the native CRM security model has changed because the model includes the concept of “Mailbox”. Regarding the new versions (from CRM2013 to D365), the entity “User” is linked to the following entities “Mailbox” and “Queue”, creating a new CRM security model.

Data Sharing

What can be shared? Is there a limit to migrate shared data?

Sharing personally created views, charts, dashboards and even records in Dynamics CRM/365 could not be easier. But there is a cost to share data in CRM:

→ All shared data are stored in a table called “Principal Object Access” (POA) [3]. Because it is easy to share, the acceptable limit of records in the table could be easily exceeded and could become an issue to the migration. Effectively, as the limit is 1 Million, what do we do with the extra lines? Do we need to manage right away the volume of data because POA table is well known for its ability to cause performance issues?

→ Migration is complexified because POA is linked to entities such as SystemUserPrincipals and TeamMemberShip, and so, is linked to CRM security model (user, team…).

Complexity is increased because POA is linked to native and custom business entities (i.e. “account”).

Data Model Denormalization

Dynamics CRM allows to create a lot of relationships between entities and so, between tables. To do so, we can create as much as possible, those relationships in several ways:

Create objects such as lookup and customer in our business entities (native & custom).
Assign a record to a user or a team can increase the denormalization of the data because we update the ownership of the record. Ownership (OwnerId, Owner Team, Owner Business Unit, Owner User) concept is linked to all entities even the custom we create.
Also, “Regarding Object Id” can be linked to entities and increase the denormalization of our data model.

Directly create relationships from the following CRM features: customization through one-to-many relationships, customization through many-to-one relationships and customization through many-to-many relationships.

Data Logs and Audit

Different types of logs should be considered as pitfalls because the data could be relevant. But do we need to keep logs related to internal CRM process or custom process we developed?

Logs regarding the management of mailboxes in CRM by the server-side-sync could be relevant (i.e. Server-Side-Sync stores logs into “Trace log” table).
Logs regarding the customization deployed in CRM organization could be relevant (i.e. execution of workflows are stored in “Workflow Log” table).
Logs created by internal CRM process (i.e. CRM Async Service uses data from AsyncOperation table to execute internal process, CRM Async Service storing data in “Bulk Operation Log” table).
Logs created to audit the user actions in Dynamics CRM could be relevant for business (i.e. CRM SDK365 logs relevant data related to the following operations: update, create, delete, etc.).
Dynamics 365 App for Outlook (Server-Side Sync) uses 2 tracing mechanisms: Alert Wall (“TraceLog” table of CRM database) and Telemetry :

→ Alert Wall can also be very helpful in debugging processing failures;

→ Regarding Telemetry, Server-Side Sync sends data in ETW (Event Tracing for Windows) : first, data related to performance counters; secondly, data related to error information as verbose and warning information (i.e. ErrorSource, ErrorScope, TraceSnippet, ServerErrorCode, CrmTraceCode, MachineName, and Timestamp).

Regarding business needs or organizational constraints, we need to enable auditing in CRM.

“Audit” table could have become bigger and bigger and the challenge is how to maintain audit history creating an unmanageable database size over time and specially, in data migration process. Then, what business rules could help us to clean the data or archive data we will not want to migrate? Audit can be a pitfall because “Audit” table does not have any constraints that would be violated by moving data from the table to an audit archive database.

Data Volume (Row Number and Denormalization)

Data volume could be a pitfall because:

Bigger is the size of the database, longer should be the data migration.
Bigger is the size of the database, more functional testing and CRM deployment should be necessary.
Bigger is the size of the database, more indexes management should be necessary. Moreover, indexes management will be impacted by the degree of database denormalization.
Bigger is the size of the database, bigger should be the impact on the performance: 2 examples should be mentioned:

→ large AsyncOperation and PrincipalObjectAccess tables in Dynamics CRM has known performance issues that can occur because the number of records is larger than 1,000,000 rows;

→ excessive indexes fragmentation in Dynamics CRM database has knows performance issues that can occur because the average logical fragmentation exceeds 10% and the table could contain more than 5000 SQL Server pages.

PrincipalObjectAccess (POA) : a lot of data could be related to business data. Bigger is the size, more complex should be the strategy of data cleaning.

3.2 Pitfalls from platform context

Platform Cumulative Updates and Hotfixes (Version 8.2 & 9.0)

Regarding each major version of Dynamics CRM/365 on-premise (i.e. version 8.2 or version 9.0), there are a lot of cumulative updates: each month, until December 2020, there is a new update rollup include all the hotfixes that were released for limited distribution. The rollups have 2 purposes: enhance unexpected behavior and repair functionality.

Here is the list of cumulative updates of December 2020 for both major version of Dynamics (8.2 & 9.0) :

Dynamics 9.0: 21 rollups are available – from March 2019 to December 2020.
Dynamics 8.2: 25 rollups are available – from October 2019 to December 2020.

Update the version of the platform could be a pitfall because:

More we update the platform; more we need to include a new rollup deployment in the data migration;

More we need to test and validate the impact on the data previously migrated !

Black Box Behavior (Front-End and Back-End CRM Services)

Dynamics CRM/365 on-premise is a black box and so, unexpected behavior could derive from two services: CRM frontend services (CRM Client-code) and CRM backend services (CRM Async Service & CRM Async Service Maintenance). All the behaviors and the communication between the frontend and backend services are not easily apprehended, although we can:

decompile CRM backend services code and we can extract and analyze client-code (JavaScript);
extract and analyze SQL queries (i.e. stored procedures) of CRM.

Using external tools (decompiler, sql analyzer, …) could make it painfull.

System and Custom Metadata (Mapping old to new D365 values)

All retrievable, system and custom metadata has a “MetadataId” primary key to make it unique and has values that can be retrieved by primary key or by name.

The values extracted from the database source could be different from the database or platform destination. Then, we should map the old values (source values) to the new values (destination values).

3.3 Pitfalls from functional context

Non-Supported Functionalities (system functionalities)

Regarding Dynamics CRM, there are two groups of functionalities : one group related to inner CRM workings (i.e. auto-numbering management, relationship roles management, duplicate detection rules management, and many more) and another group related to integration CRM workings (i.e. SharePoint integration and document management, Exchange integration and mailbox management, and many more). If custom functionalities are replacing native functionalities, they could become pitfalls in a data migration process because:

migration code must adapt to new data and metadata model (dynamics on-premise, on-line);
or replace custom functionalities with native functionalities and then, adapt migrated data to the new data model.

Overridden Functionalities (unsupported JavaScript)

Web Resources in Dynamics CRM could be the place to override native CRM functionalities. Exactly, it is possible to override native functionalities with JavaScript (client-code), functionalities related to:

CRM forms: via “crmForm” tag, we can override native functionalities (i.e. resize the form).
CRM Ribbons: via custom utilities, we can override native functionalities related to CRM component (i.e. CRM grid).
CRM attributes: via custom utilities, we can override native functionalities related to native or custom attributes (i.e. turbo forms that are a new form rendering option in Dynamics 365 and not legacy form rendering).
CRM pages: via custom utilities, we can override the page refreshing from a save event.
CRM navigations: via custom utilities, we can override the native JavaScript event (i.e. onclick event).

Validation process could not pass. So, Custom and overriding code (JavaScript) is becoming a part of the data migration process; And Data migration process could be delayed because we would have to adapt our code or redesign our client-code to make sure it will work in our new platform.

3.4 Pitfalls from code and customization context

Customization Dependencies

In dynamics CRM, there is a way to determine the level of customization and so, the level of customization dependencies: DependencyBase and DependencyNodeBase tables are the starting point of our analysis and help to establish 2 points:

→ if the CRM component is dependent

→ if the CRM component is required

Regarding the mentioned points above, dependencies are related to the component type : entity, entity relationship, attribute, site map, saved query, save query visualization, ribbon command, ribbon context group, ribbon customization, ribbon rule, ribbon tab, ribbon diff, system form, etc.

Bigger is the number of dependency nodes, riskier is the data migration process and bigger could be the impact of CRM Deployment (CRM solutions).

Code and Customization Cohabitation and Dependencies (C# and JavaScript)

In Dynamics CRM, there are mainly 2 forms of development: no-code development (customization with workflows) and code development (plug-ins and code activity in C#, client-code in JavaScript). However, when real-time workflows are combined with plug-ins, 2 problems can occur:

→ Workflow can change the business data because it triggers plug-in and fires an update event;

→ Workflow can create unexpected behavior because it triggers plug-in execution causing an infinite loop;

If we have a design consideration when mixing plug-ins and workflows in Dynamics CRM, then how could we integrate this consideration into the data migration process?

A design issue has created useless dependencies and a potential negative impact on the migration process.

3.5 Pitfalls from Architecture context

Embedded CRM Systems (Dynamics 365 App for Outlook)

Dynamics CRM offers the opportunity to use an embedded application: “Dynamics 365 App for Outlook”. The latter helps the users to track CRM entities (CRM data) from MS Outlook (Office 365, Microsoft 365). Dynamics 365 App for Outlook integration could be a pitfall for the following reasons :

If users have tracked data into CRM by accident and it was never cleaned on the other side, then, we need to include a strategy considering this severity of the issue: data is confidential. It will automatically impact the execution of the migration code and delay our migration process.

Knowing that Dynamics 365 online or Power Platform have requests limits, we should not migrate data we want to remove anyway.

Dynamics 365 App for outlook uses “Server-Side Sync” mechanism to enable the user mailbox:

→ then, the capability to send, to receive, to forward emails in Dynamics 365;

→ then, the capability to track entities from Outlook to Dynamics 365.

Knowing this major issue, we must include the configuration of the “server-side sync” related to Exchange Online in the data migration process.

CRM System Maintenance (Log and AsyncOperation table)

Because “AsyncOperation” table is become too large, it could affect the performance and delay the data migration. The database is become bigger because asynchronous background operations (i.e. asynchronous workflows, bulk operations, system events, updates match codes…) save data in “AsyncOperation” table (i.e. which entity and with entry (id) is processed, at which time the operation finished, who is the owner of the task and many more).

Data migration process needs to determine the number of records to delete. Precisely, process needs to target the right data and perform the right cleanup of the table. To do so, process must include the list of the following tables:

→ PrincipalObjectAccess

→ WorkflowLogBase

→ BulkDeleteFailureBase

→ BulkDeleteOperationBase

→ WorkflowWaitSubscriptionBase

→ AsyncOperationBase

However, process needs to be very accurate because all data in “PrincipalObjectAccess” and “WorkflowLogBase” tables should not necessarily be deleted and should be kept because of business needs and business audits.

CRM System Externalization

Queue-Based System

Custom CRM functionalities developed for custom business needs need to be executed in batch mode. But, they can be and should be externalized because it will help to:

→ Avoid issue related to infinite loop during the plug-in execution;

→ Avoid performance issue because CRM plug-in architecture is not made for batch process.

Architecture queue-based could help to externalize custom CRM functionalities. However, data migration process should include this queueing process because data could be in a waiting line and could be a pitfall. Regarding queue-based architecture, many patterns exist (Publisher/Subscriber, Priority Queue, Queue-Based Load Leveling, etc.), question is the following:

How do we include data stocked in a waiting line in our data migration process? Regarding the pattern used to store data in a queue, we might not include data in a waiting line, but at what cost?

Notification-Based System

Dynamics CRM/365 Architecture allows 2 ways of notifying external applications about events in CRM by making requests to receiver endpoints with information about the events:

→ In previous version of Dynamics CRM to V. 9.0, Plugins Architecture allows us to create notifications through the event execution pipeline (pre-event, platform core operations, post-event). As those versions do not have encapsulated all the necessary objects, we must create all the objects from scratch to create notifications. Then, we must register a step in the Plugin Architecture of Dynamics CRM in synchronous or asynchronous mode.

→ In Dynamics CRM V. 9.0, Webhooks pattern has been introduced for connecting Web APIs and service based on the publish/subscribe model [4]. On the contrary of the previous versions of Dynamics CRM, V.9.0 has encapsulated all the objects in the CRM database we need to create notifications: endpoint URL, authentication options and execution mode. Then, we must register a webhook in the Plugin Architecture of Dynamics CRM (not a step) in synchronous or asynchronous mode.

Architecture notification-based could help to notify external applications. However, data between applications may be out of sync. And this could be a pitfall. Regarding notification-based architecture, question is the following: How do we include data out of sync in our data migration process? Regarding the pattern used to notify external applications, we might not include data in our process, but at what cost?

CRM Solutions Deployment (CRM Application Lifecycle Management)

Because we need to validate data which has been migrated, we need to deploy our CRM solutions (zip files) to evaluate the impact of the new data during and after the CRM deployment process. However, different ways of organizing CRM solutions in a CRM organization could be a pitfall because CRM solutions create dependencies into the database:

→ If all CRM solutions are “unmanaged” and we need to make them “managed”, how could we include this change into our data migration, knowing that we upgrade the platform to a new build version?

→ If a CRM organization contains “unmanaged” and “managed” and “unmanaged” solution is built from the “managed” solution, how could we include this pattern into our data migration, knowing that we upgrade the platform to a new build version?

→ If in the CRM Deployment of the previous CRM organization (source), process used to include data fix and configuration adjustment, how could we include this subprocess into our data migration?

→ If in the previous CRM Organization, Third-Party has been deployed in PROD environment and not in DEV or TEST environment, how could we include this pattern into our data migration?