The current data migration solution uses a completely new database cluster as the migration target.
This implementation has the following advantages:
The implementation has the following disadvantages:
A single data migration mainly consists of the following phases: preparation, stock data migration, incremental data synchronization, and traffic switching.
In the preparation stage, the data migration module verifies data source connectivity and permissions, gathers statistics on the stock data, records the log, and finally shards the tasks according to the data volume and the parallelism set by the user.
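The task sharding step can be pictured as splitting a primary key range into contiguous slices, one per worker. The following is a minimal sketch under that assumption. The names `InventoryTask` and `split_by_parallelism` are hypothetical and are not part of the ShardingSphere API, and the actual splitting strategy may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InventoryTask:
    """One stock migration task covering an inclusive primary key range (hypothetical)."""
    low: int
    high: int


def split_by_parallelism(min_id: int, max_id: int, parallelism: int) -> List[InventoryTask]:
    """Split [min_id, max_id] into at most `parallelism` contiguous tasks."""
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    if min_id > max_id:
        return []  # nothing to migrate
    span = max_id - min_id + 1
    # Ceiling division so every key falls into exactly one task.
    size = -(-span // parallelism)
    tasks = []
    low = min_id
    while low <= max_id:
        high = min(low + size - 1, max_id)
        tasks.append(InventoryTask(low, high))
        low = high + 1
    return tasks
```

For example, `split_by_parallelism(1, 10, 3)` yields three tasks covering keys 1 to 4, 5 to 8, and 9 to 10. Range-based slices keep each task independent, so a failed task can be retried without touching the others.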
In the stock data migration stage, the tasks that were sharded during the preparation stage are executed. This stage uses JDBC queries to read data directly from the source and write it into the target according to the sharding rules and other configurations.
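The read-and-route loop of a single stock migration task might look roughly like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for JDBC so that it can run without an external database. The table `t_order`, the modulo routing rule, and the helper names are all assumptions made for illustration, not ShardingSphere's implementation.

```python
import sqlite3
from typing import List


def open_memory_db() -> sqlite3.Connection:
    """Create an in-memory database with the illustrative t_order table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_order (order_id INTEGER PRIMARY KEY, status TEXT)")
    return conn


def route(order_id: int, target_count: int) -> int:
    """Hypothetical sharding rule: pick the target node by modulo on the key."""
    return order_id % target_count


def migrate_range(source: sqlite3.Connection, targets: List[sqlite3.Connection],
                  low: int, high: int, batch_size: int = 2) -> int:
    """Copy rows with low <= order_id <= high from the source to the routed targets."""
    cursor = source.execute(
        "SELECT order_id, status FROM t_order "
        "WHERE order_id BETWEEN ? AND ? ORDER BY order_id",
        (low, high),
    )
    copied = 0
    while True:
        rows = cursor.fetchmany(batch_size)  # read in batches to bound memory use
        if not rows:
            break
        for order_id, status in rows:
            target = targets[route(order_id, len(targets))]
            # INSERT OR REPLACE makes a retried task idempotent.
            target.execute(
                "INSERT OR REPLACE INTO t_order (order_id, status) VALUES (?, ?)",
                (order_id, status),
            )
        copied += len(rows)
    for target in targets:
        target.commit()
    return copied
```

Writing with an upsert rather than a plain insert is a design choice in this sketch: it lets a task be re-run after a failure without producing duplicate key errors.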
Because the duration of the stock data migration depends on factors such as data volume and parallelism, the data written by business operations during this period must also be synchronized. Databases differ in technical details, but in general they all capture changed data through replication protocols or write-ahead logs (WAL).
The captured incremental data is also written into the new data nodes by the data migration module. Once incremental synchronization has caught up (the incremental data flow is not interrupted, because the business system is still running), you can move on to the traffic switching stage.
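Applying captured changes to the target can be sketched as replaying an ordered stream of change events while remembering the last applied log position. The `ChangeEvent` structure and the dictionary used as the target are hypothetical simplifications. Real change capture formats depend on the database.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """A simplified captured change (hypothetical format)."""
    position: int               # monotonically increasing log position
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row content; unused for deletes


def apply_events(target: Dict[int, dict], events: Iterable[ChangeEvent],
                 last_applied: int) -> int:
    """Replay events in log order and return the new last applied position."""
    for event in sorted(events, key=lambda e: e.position):
        if event.position <= last_applied:
            continue  # already applied, e.g. after a restart
        if event.op == "delete":
            target.pop(event.key, None)
        elif event.op in ("insert", "update"):
            target[event.key] = dict(event.row or {})
        else:
            raise ValueError(f"unknown operation: {event.op}")
        last_applied = event.position
    return last_applied
```

Skipping events at or below the recorded position means a batch can be replayed safely after an interruption, which is one plausible reason for recording the log position before the stock migration starts.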
During the traffic switching stage, there may be a read-only period in which data in the source data nodes is kept static for a short time so that incremental synchronization can fully complete. Users can arrange this by setting the database to read-only or by controlling the traffic flow generated from the source.
The length of this read-only window depends on whether users need to run a consistency check on the data and on the amount of data involved. The consistency check is an independent task that can be started and stopped separately and supports resuming from a breakpoint.
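One way to picture a consistency check with breakpoint resume is to compare digests of source and target chunk by chunk and report the chunk at which checking stopped. The sketch below assumes rows are tuples already ordered by key. The function names and the SHA-256 digest are illustrative choices, not the algorithm ShardingSphere uses.

```python
import hashlib
from typing import List, Sequence, Tuple


def chunk_digest(rows: Sequence[tuple]) -> str:
    """Digest of a chunk of rows, computed in the given order."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def check_consistency(source: List[tuple], target: List[tuple],
                      chunk_size: int, start_chunk: int = 0) -> Tuple[bool, int]:
    """Compare source and target chunk by chunk, starting at start_chunk.

    Returns (consistent, chunk_index). On a mismatch, chunk_index is the
    failing chunk; otherwise it is the index after the last checked chunk,
    which can be passed back as start_chunk to resume later.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = max(len(source), len(target))
    chunk = start_chunk
    while chunk * chunk_size < total:
        lo, hi = chunk * chunk_size, (chunk + 1) * chunk_size
        if chunk_digest(source[lo:hi]) != chunk_digest(target[lo:hi]):
            return False, chunk
        chunk += 1
    return True, chunk
```

Because each chunk is verified independently, a stopped check only needs the returned chunk index to continue, rather than rescanning data that has already been compared.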
Once the result is confirmed, the data migration is complete. Users can then switch read traffic or write traffic to Apache ShardingSphere.