Heterogeneous migration: reducing Dangdang’s customer system RTO 60x and increasing speed by 20%

Apache ShardingSphere helped Dangdang rebuild its customer system, which serves 350 million users, and move seamlessly from a PHP+SQL Server technology stack to a Java+ShardingSphere+MySQL stack. The performance, availability, and maintainability of the customer system improved significantly, making the project a best-practice example of heterogeneous migration with ShardingSphere.

Dangdang’s customer system

Dangdang’s customer system is mainly responsible for account registration, login, and privacy data maintenance. Its previous technology stack was based on PHP and SQL Server in a standard centralized architecture, as shown in the figure below.

Before the rebuild project started, several business modules of the customer system had run into problems and technical challenges, such as scattered business logic, low throughput, and high operation and maintenance costs.

To improve customers’ shopping experience, Dangdang’s technical team decided to optimize the business logic and the underlying data architecture, aiming to improve the availability and scalability of the customer system across multiple scenarios. The rebuild also introduced several technical innovations, such as cross-data-source double write, read/write splitting, an intelligent gateway, and gray release.

Dangdang’s technical team completed the system rebuild within half a year, from requirements design, sharding planning, logic optimization, and stress testing through to the official launch.

The project used Java to rebuild more than ten modules, built a distributed database solution with ShardingSphere and MySQL, and finally completed the online migration between heterogeneous databases. The project boasts the following highlights:

Pain points & challenges

Business pain points

At the business level, the registration and login logic of some customer system modules was scattered across different terminals. This resulted in high maintenance costs, and the old technical architecture limited further gains in performance and availability.

Challenges

Solutions

Overall planning

To improve the maintainability, availability, and performance of the customer system, the R&D team reorganized the customer system architecture.

At the application layer, the goal was to unify the function logic of all terminals and improve business maintainability.

At the database layer, the centralized architecture was transformed into a distributed database architecture to improve performance and availability, using the open-source distributed solution built on ShardingSphere and MySQL.

The overall architecture design introduced multiple schemes, such as distributed primary-key generation strategy, shard management, data migration verification, and gray release.

Distributed primary-key generation strategy

Distributed primary-key generation is the first problem to solve when a database moves from a centralized architecture to a distributed, middleware-based one.

During the system rebuild, we chose to build two or more ID-generating database servers. Each server has a Sequence table that records the current ID of each business table. The IDs in a Sequence table increase by a step equal to the number of servers, and the starting values are staggered, so the servers issue interleaved, non-overlapping IDs and ID generation is spread across all server nodes.
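
The step-and-offset scheme can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `SteppedIdGenerator` is hypothetical, and an in-memory counter stands in for the row that the real system keeps in each server's Sequence table.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for one ID-generating server.
// The AtomicLong plays the role of the "current ID" row in its Sequence table.
final class SteppedIdGenerator {
    private final AtomicLong current;
    private final long step;

    // serverIndex is 0-based; serverCount is the total number of ID servers.
    SteppedIdGenerator(int serverIndex, int serverCount) {
        if (serverCount < 1 || serverIndex < 0 || serverIndex >= serverCount) {
            throw new IllegalArgumentException("invalid server index or count");
        }
        // The step size equals the number of servers.
        this.step = serverCount;
        // Staggered start: server 0 issues 1 first, server 1 issues 2 first, and so on.
        this.current = new AtomicLong(serverIndex + 1L - serverCount);
    }

    // Returns the next ID owned by this server.
    long nextId() {
        return current.addAndGet(step);
    }
}
```

With two servers, the first yields 1, 3, 5 and the second yields 2, 4, 6, so the two sequences never collide and neither server needs to coordinate with the other on each request.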

Implementing sharding (Apache ShardingSphere)

During the customer system rebuild, database sharding was completed through Apache ShardingSphere, and the read/write splitting function was also enabled.

Because the customer system requires high concurrency and low latency, ShardingSphere-JDBC was chosen as the access layer. It is positioned as a lightweight Java framework that provides additional services in Java’s JDBC layer.

It connects directly to the database from the client and is delivered as a jar package, with no additional deployment or dependencies. It can be viewed as an enhanced version of the JDBC driver, fully compatible with JDBC and various ORM frameworks.

Sharding: ShardingSphere supports a complete set of sharding algorithms, including modulo, hash, range, time, and custom algorithms. The customer system uses the modulo sharding algorithm to split large tables.

Read/write splitting: in addition to sharding, ShardingSphere’s read/write splitting function is enabled to make full use of MHA cluster resources and improve system throughput.
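
To make the two routing ideas concrete, here is a minimal sketch in plain Java. It does not use the ShardingSphere API, which handles routing internally from its configuration. The class names, the `t_customer` table name, and the data source names used in the usage note are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Modulo sharding: the sharding key modulo the shard count selects the physical table.
final class ModuloRouter {
    private final String logicTable;
    private final int shardCount;

    ModuloRouter(String logicTable, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.logicTable = logicTable;
        this.shardCount = shardCount;
    }

    // floorMod keeps the index in [0, shardCount) even for negative keys.
    int shardIndex(long shardingKey) {
        return (int) Math.floorMod(shardingKey, (long) shardCount);
    }

    String actualTable(long shardingKey) {
        return logicTable + "_" + shardIndex(shardingKey);
    }
}

// Read/write splitting: writes go to the primary, reads rotate over the replicas.
final class ReadWriteRouter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger cursor = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = new ArrayList<>(replicas);
    }

    String route(boolean isWrite) {
        if (isWrite || replicas.isEmpty()) {
            return primary;
        }
        int i = Math.floorMod(cursor.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }
}
```

For example, `new ModuloRouter("t_customer", 4).actualTable(10)` returns `t_customer_2`, and a `ReadWriteRouter` built with one primary and two replicas sends every write to the primary while alternating reads between the replicas.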

Double-write & data synchronization

Data synchronization runs through the whole rebuild project, and the integrity and consistency of data migration are vital to the rebuild.

In this project, Elastic-Job tasks periodically synchronize SQL Server’s historical data to MySQL. During the database switchover, a double-write scheme is used as a safeguard to keep the two databases consistent. The process consists of:

Step 1: implement the double-write mechanism

Disconnect link 1, connect links 2, 3, and 4, and then connect links 9 and 10.

Step 2: switch the login service

Disconnect links 9 and 10, connect link 7, and disconnect link 5.

Step 3: switch the read service

Connect link 8 and disconnect link 6.

Step 4: cancel the double-write mechanism

Disconnect link 2 and complete the switchover.
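
The general shape of a staged double-write cutover can be sketched as follows. This is a simplified model, not the project's implementation: the numbered links belong to the original figure and are not modeled, the separate login-service step is omitted, and the class, the stage names, and the two in-memory maps (standing in for SQL Server and MySQL) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified cutover model: each stage decides which stores are written and which is read.
final class DoubleWriteStore {
    enum Stage {
        LEGACY_ONLY(true, false, false),
        DOUBLE_WRITE(true, true, false),
        DOUBLE_WRITE_READ_NEW(true, true, true),
        NEW_ONLY(false, true, true);

        final boolean writeLegacy;
        final boolean writeNew;
        final boolean readNew;

        Stage(boolean writeLegacy, boolean writeNew, boolean readNew) {
            this.writeLegacy = writeLegacy;
            this.writeNew = writeNew;
            this.readNew = readNew;
        }
    }

    private final Map<Long, String> legacy = new HashMap<>(); // stands in for SQL Server
    private final Map<Long, String> target = new HashMap<>(); // stands in for MySQL
    private volatile Stage stage = Stage.LEGACY_ONLY;

    void advanceTo(Stage next) {
        stage = next;
    }

    // Writes go to every store that the current stage enables.
    void save(long id, String value) {
        Stage s = stage;
        if (s.writeLegacy) {
            legacy.put(id, value);
        }
        if (s.writeNew) {
            target.put(id, value);
        }
    }

    // Reads come from exactly one store, chosen by the current stage.
    String load(long id) {
        return stage.readNew ? target.get(id) : legacy.get(id);
    }
}
```

In this model, a row written before double write begins exists only in the legacy store, which is why the periodic historical synchronization has to finish backfilling before reads move to the new database.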

Data verification is performed periodically on both the service side and the database side. Different frequencies are used in different time periods to sample or fully check data integrity. COUNT/SUM is also verified on the database side.
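
As a rough illustration of the two kinds of checks, the sketch below compares COUNT/SUM aggregates and spot-checks individual rows. It works on in-memory collections rather than real query results, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

final class MigrationChecks {
    private MigrationChecks() {
    }

    // Returns {COUNT, SUM} for one numeric column.
    static long[] countAndSum(Collection<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return new long[] {values.size(), sum};
    }

    // Cheap database-side style check: the aggregates of both sides must agree.
    static boolean aggregatesMatch(Collection<Long> source, Collection<Long> target) {
        return Arrays.equals(countAndSum(source), countAndSum(target));
    }

    // Row-level check: compares every n-th source row (in key order) with the target.
    // every = 1 is a full check; larger values give a sparser sample.
    static List<Long> sampleMismatches(Map<Long, String> source,
                                       Map<Long, String> target,
                                       int every) {
        if (every < 1) {
            throw new IllegalArgumentException("every must be positive");
        }
        List<Long> mismatched = new ArrayList<>();
        int i = 0;
        for (Map.Entry<Long, String> e : new TreeMap<>(source).entrySet()) {
            if (i++ % every == 0 && !Objects.equals(e.getValue(), target.get(e.getKey()))) {
                mismatched.add(e.getKey());
            }
        }
        return mismatched;
    }
}
```

Aggregate checks are cheap but can miss offsetting errors: the columns `[1, 4]` and `[2, 3]` have the same COUNT and SUM. That is one reason to combine them with sampled or full row comparisons.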

The customer system rebuild adopts an Apollo-based gray release. For the new login processing, configuration items are released gradually and traffic is cut over step by step within a small range to ensure a successful launch. The rebuilt system architecture is shown in the following figure.
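
One common way to drive a gray release from a configuration item is a percentage switch keyed on the user ID. The sketch below assumes that approach; the article does not say how users were selected. The `IntSupplier` is a placeholder for a value read from the configuration centre, not an Apollo API call, and the class name is hypothetical.

```java
import java.util.function.IntSupplier;

// Percentage-based gray release switch (assumed rollout strategy).
final class GrayReleaseSwitch {
    // Placeholder for a configuration item that operators raise step by step.
    private final IntSupplier rolloutPercent;

    GrayReleaseSwitch(IntSupplier rolloutPercent) {
        this.rolloutPercent = rolloutPercent;
    }

    // A user always lands in the same bucket, so raising the percentage
    // only adds users to the new path and never moves anyone back.
    boolean useNewLogin(long userId) {
        int percent = Math.max(0, Math.min(100, rolloutPercent.getAsInt()));
        int bucket = (int) Math.floorMod(userId, 100L);
        return bucket < percent;
    }
}
```

Because the supplier is read on every call, changing the configured value takes effect without a redeploy, and setting it back to 0 returns all users to the old login path.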

Advantages

After the rebuild, the response speed of Dangdang’s customer system has improved significantly, and daily operation and maintenance costs have been reduced.

The distributed solution provided by ShardingSphere plays a big part in this. The solution is suitable for various high-traffic Internet platform services, as well as e-commerce platforms and other data-processing systems.

Conclusion

This is Dangdang’s second implementation of ShardingSphere, following the one we shared in the post “Asia’s E-Commerce Giant Dangdang Increases Order Processing Speed by 30% — Saves Over Ten Million in Technology Budget with Apache ShardingSphere”.

Apache ShardingSphere provides strong support for enterprise systems. The project strives for simplicity and perfection, aiming for simpler business logic and maximum performance.

Apache ShardingSphere Project Links:

ShardingSphere Github

ShardingSphere Twitter

ShardingSphere Slack

Contributor Guide