Optimizing CRM Online Data Migration Performance

Written By: Tyler Sand

from October 29, 2012

After doing several data migrations into CRM Online, I noticed that the process was slower than the on premise data migrations that I have done in the past.  For projects with relatively small data sets, this may not cause a huge issue, since it is still possible to get the data migration done over several days or a weekend; with large data sets, however, it could end up taking weeks or even months to fully complete a data migration.  To get to the bottom of it and find out what improvements could be made, I recently had a call with Chris Brooks, Principal Program Manager on the Dynamics CRM team at Microsoft.  According to Chris, CRM Online performance comes down to two things: latency and concurrency.  Based on his suggestions and feedback, plus my own trial and error, I am going to describe my findings on optimizing data migration performance.

Latency

It is important to understand that CRM Online organizations are hosted in the Microsoft US East and US West data centers.  When you sign up for a new CRM Online organization, the system randomly assigns you a primary data center, and unless your primary data center goes down, this is the data center that you always interact with.  Unfortunately, this process does not currently assign a primary data center based on geography, nor are customers given a choice of data center, as they are when provisioning new resources in Windows Azure.  Customers on the west coast are just as likely to be assigned to the east coast data center as they are to the west coast data center, and companies located in the Midwest are roughly equally far from both data centers.  The result is that there can be high latency in the connection between you and your primary data center, since the packets of data have to travel a long distance.

To test your CRM Online performance, Microsoft provides a diagnostic tool within CRM that tests latency and bandwidth, among several other things.  To access this diagnostic tool, simply append “/tools/diagnostics/diag.aspx” to your root domain name.  For example, if your organization was named testorganization5000, the URL to the diagnostic tool would be https://testorganization5000.crm.dynamics.com/tools/diagnostics/diag.aspx.  Once you arrive at the page, click the Run button to have the system perform the tests.
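If you check several organizations, building the diagnostics URL can be scripted.  The following is a minimal sketch that only does the string handling described above; the function name diagnostics_url is my own and is not part of any CRM tooling.

```python
# Path of the built-in diagnostics page, as given in the text above.
DIAGNOSTICS_PATH = "/tools/diagnostics/diag.aspx"


def diagnostics_url(root_url: str) -> str:
    """Append the diagnostics path to an organization's root URL.

    A trailing slash on the root URL is tolerated so that the result
    never contains a double slash before the path.
    """
    return root_url.rstrip("/") + DIAGNOSTICS_PATH


print(diagnostics_url("https://testorganization5000.crm.dynamics.com"))
```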

In my results, the Latency Test shows that the latency between my data migration server and the Microsoft data center is 114ms (compared with a latency to my on premise CRM server of <5ms).  While this may not seem overly unreasonable, it can have a huge impact on data migrations.  If your data migration does multiple operations per source record, such as retrieving related values to populate lookups and creating/updating multiple records, each operation requires its own round trip to the server, so the latency comes into play many times per source row (two seeks and two inserts would result in 114ms * 4 = 456ms of additional time per source row).
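To see how the per-row overhead adds up across a whole migration, the arithmetic above can be written out as a small calculation.  This is only a rough model: it assumes every operation costs exactly one round trip and that rows are processed one at a time, and the 100,000-row data set is a hypothetical size chosen for illustration, not a figure from my tests.

```python
def added_latency_seconds(rows: int, round_trips_per_row: int, latency_ms: float) -> float:
    """Total time spent waiting on network round trips, in seconds.

    Assumes sequential processing and one round trip per operation.
    """
    return rows * round_trips_per_row * latency_ms / 1000.0


ROWS = 100_000          # hypothetical data set size (assumption)
ROUND_TRIPS = 4         # two seeks and two inserts per source row

remote = added_latency_seconds(ROWS, ROUND_TRIPS, 114)   # 114ms measured latency
nearby = added_latency_seconds(ROWS, ROUND_TRIPS, 4)     # 4ms latency for comparison

print(f"114ms latency: {remote / 3600:.1f} hours of waiting")
print(f"  4ms latency: {nearby / 60:.1f} minutes of waiting")
```

Under these assumptions, the waiting time alone is roughly 12.7 hours at 114ms versus under half an hour at 4ms, before any actual processing time is counted.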

Since customers do not have control over the location of their primary data center and since most people do not have the flexibility of continuously traveling back and forth between the east and west coasts to get as physically close to that data center as possible, Windows Azure virtual machines can be a great solution!  Conveniently, the CRM Online and Windows Azure data centers are located in the same physical locations, which results in low latency and high bandwidth.  Running the same diagnostic tests as before from Windows Azure produces a completely different set of results, on par with the results that you would see in an on premise environment; the latency is only 4ms in Windows Azure, compared to 114ms from my environment.

Microsoft does not tell you which data center you are hosted in unless you open a support request, but it can be easily discovered by running the diagnostics from a virtual machine on each coast, because the results will be noticeably different.  As a comparative test, I migrated 100 account records from a SQL database to CRM Online using Scribe Insight on my local server and on a Windows Azure virtual machine.  While it took 110 seconds (1.1 seconds per record) to migrate the accounts from my local server, it only took 20 seconds (0.2 seconds per record) to migrate the account records from the Windows Azure virtual machine.  The results were striking, with over five times the performance.
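To put the measured rates in perspective for a larger project, they can be projected onto a bigger record count.  The sketch below assumes the per-record time stays constant as the data set grows, which real migrations will not match exactly, and the one-million-record figure is a hypothetical example rather than something I tested.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_record(total_seconds: float, records: int) -> float:
    """Average time per record from a timed test run."""
    return total_seconds / records


def projected_days(per_record_seconds: float, record_count: int) -> float:
    """Projected duration in days, assuming the per-record time stays constant."""
    return per_record_seconds * record_count / SECONDS_PER_DAY


local_rate = seconds_per_record(110, 100)   # local server test: 1.1 s per record
azure_rate = seconds_per_record(20, 100)    # Windows Azure VM test: 0.2 s per record

TARGET_RECORDS = 1_000_000                  # hypothetical project size (assumption)

print(f"Speedup: {local_rate / azure_rate:.1f}x")
print(f"Local server: {projected_days(local_rate, TARGET_RECORDS):.1f} days")
print(f"Azure VM:     {projected_days(azure_rate, TARGET_RECORDS):.1f} days")
```

With those assumptions, a million records would take roughly 12.7 days from the local server and roughly 2.3 days from the virtual machine.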

Since Windows Azure virtual machines are not free, the costs of hosting such an environment do need to be considered and factored into data migration projects.  Between the minimized downtime for the customer and the monetary savings of fewer consulting hours, the value can be quickly and easily recognized in nearly all data migration projects.  The cost of running a medium virtual machine, which includes two processors and 3.5GB of RAM and is the minimum that I would recommend in most data migration scenarios, is $115.20/month, plus the cost of bandwidth and storage.  Fortunately, Microsoft has a free 90 day trial available that includes compute instances, bandwidth, storage, and many other benefits, which can definitely take a significant bite out of that cost.  For more information or to sign up, please visit http://www.windowsazure.com/en-us/pricing/free-trial/.

Concurrency

To further improve on the performance provided by minimizing latency, concurrency (multi-threading) can be utilized.  Many integration tools, such as Scribe Insight (which I will be focusing on in this section), have this type of functionality built in.  Both Scribe Small Business and Scribe Standard, including migration licenses, have the ability to run up to eight simultaneous message processors, and Scribe Enterprise has the ability to run up to 64 simultaneous message processors.  In both cases, one is always reserved for running time based integrations, so you are limited to seven and 63 simultaneous message processors respectively.  At a high level, this works by converting your source data into individual XML messages within a Windows Message Queue that can then be processed independently by each message processor.  By taking advantage of concurrency, I was able to migrate the same 100 account records as before from the virtual machine in Windows Azure to CRM Online in less than 5 seconds, with the initial connections already warmed up.
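The queue-and-workers pattern described above can be illustrated with a short, generic sketch.  This is not how Scribe Insight is implemented, and it does not call CRM; it simply shows several worker threads draining a shared queue of records.  The names migrate_concurrently and send_record are my own, and send_record stands in for whatever call actually writes a record to the target system.

```python
import queue
import threading
import time


def migrate_concurrently(records, send_record, worker_count=7):
    """Send every record using worker_count threads pulling from one queue.

    Returns the list of records that were sent (order is not guaranteed).
    """
    work = queue.Queue()
    for record in records:
        work.put(record)

    sent = []
    sent_lock = threading.Lock()

    def worker():
        while True:
            try:
                record = work.get_nowait()
            except queue.Empty:
                return              # queue drained, this worker is done
            send_record(record)     # placeholder for the real write call
            with sent_lock:
                sent.append(record)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sent


def fake_send(record):
    """Simulate a slow network call with a short sleep (assumption)."""
    time.sleep(0.01)


start = time.perf_counter()
done = migrate_concurrently(list(range(100)), fake_send, worker_count=7)
elapsed = time.perf_counter() - start
print(f"Sent {len(done)} records in {elapsed:.2f} seconds with 7 workers")
```

Threads suit this kind of work because each worker spends most of its time waiting on the network rather than using the CPU.  As a rough sanity check, splitting 100 records across seven workers at 0.2 seconds per record gives at most 15 records per worker, or about 3 seconds in the ideal case, which is in line with the result of under 5 seconds.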

One important consideration is the resource requirements for concurrency, since each additional integration process requires additional resources, mainly processing power (CPU) and memory.  I was able to run the seven concurrent message processors on a medium virtual machine without maxing out the resources, but a small instance would not have been enough, nor would a medium virtual machine have adequate resources for many more message processors (definitely not 63).  It is extremely important to monitor the performance of the environment and adjust the resources as needed.

In summary, to maximize CRM Online data migration performance, it is important to minimize latency and increase concurrency.  By combining these two elements, it is possible to increase performance to be on par with, or even exceed, the performance of an on premise environment.  While these considerations definitely take a bit more planning and work in the short term, they can provide invaluable benefits to data migration projects.