Apache ShardingSphere 5.2.0 is Released! · ShardingSphere - Blog

Apache ShardingSphere 5.2.0 is Released!

Our new 5.2.0 release enhances features such as SQL audit, elastic migration, SQL execution process management, and data governance on the cloud.

Introduction

Since Apache ShardingSphere released version 5.1.2 at the end of June, our community has continued to optimize and enhance its product features. The community merged 1,728 PRs from teams and individuals around the world. The resulting 5.2.0 release has been optimized in terms of its features, performance, testing, documentation, examples, etc.

The establishment of the shardingsphere-on-cloud sub-project shows ShardingSphere’s commitment to being cloud native. We welcome anyone interested in Go, database, and cloud to join the shardingsphere-on-cloud community.

The 5.2.0 release brings the following highlights:

Newly added features, including SQL audit for data sharding and MySQL SHOW PROCESSLIST & KILL, can enhance users’ capability to manage ShardingSphere.

The SQL audit feature allows users to manage SQL audit to prevent the business system from being disrupted by inefficient SQL. The MySQL SHOW PROCESSLIST & KILL feature allows users to quickly view the SQL in execution through the SHOW PROCESSLIST statement, and forcibly cancel slow SQL.

The new version also supports elastic data migration. It supports the migration of data from Oracle, MySQL, and PostgreSQL to the distributed database ecosystem composed of ShardingSphere + MySQL or PostgreSQL, completing the transformation from a single database to a distributed one. The ShardingSphere community will support more features for heterogeneous database migration in future releases. Stay tuned for more updates.

The new version also transferred Helm Charts from the ShardingSphere repository to the shardingsphere-on-cloud sub-project. It is designed to provide distributed database solutions of ShardingSphere + MySQL or PostgreSQL on the cloud. This version significantly improves SQL parsing support for different databases and upgrades DistSQL’s parameter usage specifications. It removes the Memory operating mode from ShardingSphere mode and supports distributed transactions across multiple logical databases. This post will introduce the updates of ShardingSphere 5.2.0.

Highlights

SQL audit for data sharding

In large-scale data sharding scenarios, if a user executes an SQL query without the sharding feature, the SQL query will be routed to the underlying database for execution.

As a result, a large number of database connections will be occupied and businesses will be severely affected by timeout or other issues. If the user performs UPDATE/DELETE operations, a large amount of data may be incorrectly updated or deleted.

In response to the above problems, ShardingSphere 5.2.0 provides the SQL audit for data sharding feature and allows users to configure audit strategies. The strategy specifies multiple audit algorithms, and users can decide whether audit rules should be disabled. If any audit algorithm fails to pass, SQL execution will be prohibited. The configuration of SQL audit for data sharding is as follows.

rules:
- !SHARDING
  tables:
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: t_order_inline
      auditStrategy:
        auditorNames:
          - sharding_key_required_auditor
        allowHintDisable: true
  defaultAuditStrategy:
    auditorNames:
      - sharding_key_required_auditor
    allowHintDisable: true

  auditors:
    sharding_key_required_auditor:
      type: DML_SHARDING_CONDITIONS

In view of complex business scenarios, the new feature allows users to dynamically disable the audit algorithm by using SQL hints so that the business SQL that is allowable in partial scenarios can be executed. Currently, ShardingSphere 5.2.0 has a built-in DML disables full-route audit algorithm. Users can also implement ShardingAuditAlgorithm interface by themselves to realize more advanced SQL audit functions.

/* ShardingSphere hint: disableAuditNames=sharding_key_required_auditor */ SELECT * FROM t_order;

Elastic data migration

Data migration has always been a focus of the ShardingSphere community. Before 5.2.0, users needed to add an external table as a single sharding table, and then modify the sharding rules to trigger the migration, which was too complex and difficult for ordinary users.

To improve the ease of data migration, ShardingSphere 5.2.0 provides a new data migration feature, coupled with DistSQL for elastic migration. Users can migrate data from the existing single database to the distributed database system composed of ShardingSphere + MySQL or PostgreSQL in an SQL-like manner, achieving the transformation from a single database to a distributed one.

img

The new feature is capable of migrating Oracle data to PostgreSQL. Users can create sharding rules and sharding tables through DistSQL first, that is to create new distributed databases and tables, and then run MIGRATE TABLE ds.schema.table INTO table to trigger data migration.

During the migration process, users can also use the dedicated DistSQL for data migration in the table to manage the migration job status and data consistency. For more information about the new feature, please refer to the official document [Data Migration].

SQL Execution Process Management

The native MySQL database provides the SHOW PROCESSLIST statement, allowing the user to view the currently running thread. Users can kill the thread with the KILL statement for SQL that takes too long to be temporarily terminated.

The SHOW PROCESSLIST and KILL statements are widely used in daily operation and maintenance management. To enhance users’ ability to manage ShardingSphere, version 5.2.0 supports the MySQL SHOW PROCESSLIST and KILL statements. When a user executes a DDL/DML statement through ShardingSphere, ShardingSphere automatically generates a unique UUID identifier as an ID and stores the SQL execution information in each instance.

The following figure shows the results while executing the SHOW PROCESSLIST and KILL statements in ShardingSphere. When the user executes the SHOW PROCESSLIST statement, ShardingSphere processes the SQL execution information based on the current operating mode.

If the current mode is cluster mode, ShardingSphere collects and synchronizes the SQL execution information of each compute node through the governance center, and then returns the summary to the user. If the current mode is the standalone mode, ShardingSphere only returns SQL execution information in the current compute node.

The user determines whether to execute the KILL statement based on the result returned by the SHOW PROCESSLIST, and ShardingSphere cancels the SQL in execution based on the ID in the KILL statement.

Shardingsphere-on-cloud sub-project goes live

Shardingsphere-on-cloud is a project of Apache ShardingSphere providing cloud-oriented solutions. Version 0.1.0 has been released and it has been officially voted as a sub-project of Apache ShardingSphere.

Shardinsphere-on-cloud will continue to release various configuration templates, deployment scripts, and other automation tools for ShardingSphere on the cloud.

It will also polish the engineering practices in terms of high availability, data migration, observability, shadow DB, security, and audit, optimize the delivery mode of Helm Charts, and continue to enhance its cloud native management capabilities through Kubernetes Operator. Currently, there are already introductory issues in the project repository to help those who are interested in Go, Database, and Cloud to quickly get up and running.

Enhancement

Kernel

The ShardingSphere community optimizes the SQL parsing capability of different databases in this release, greatly improving ShardingSphere’s SQL compatibility.

Detailed SQL parsing optimization can be seen in the update log section below. It’s a long-term mission for the ShardingSphere community to improve SQL parsing support. Anyone who is interested is welcome to work with us.

Version 5.2.0 also supports the column-visible feature for MySQL, Oracle, SQLServer, and H2 databases, in a bid to meet the requirements of business SQL compatibility during a system upgrade. The read/write splitting feature supports the Cartesian product configuration, which greatly simplifies user configurations.

Access Port

In version 5.2.0, ShardingSphere-Proxy is capable of monitoring specified IP addresses and integrates openGauss database drivers by default. ShardingSphere-JDBC supports c3p0 data sources, and Connection.prepareStatement can specify the columns.

Distributed Transaction

In terms of distributed transactions, the original logical database-level transaction manager is adjusted to a global manager, supporting distributed transactions across multiple logical databases.

At the same time, it removed the XA statement’s ability to control distributed transactions as XA transactions are now automatically managed by ShardingSphere, which simplifies the operation for users.

Update logs

Below are all the update logs of ShardingSphere 5.2.0. To deliver a better user experience, this release adjusted the API of part of the functions, which can be seen from the API changes part below.

New Feature

Enhancement

Bug Fix

Refactor

API Changes

🔗 Download Link

🔗 Update Logs

🔗 Project Address

🔗 Cloud Sub-project Address

Community Contribution

The Apache ShardingSphere 5.2.0 release is the result of 1,728 merged PRs, committed by 64 Contributors. Thank you for your efforts!

Author

Duan Zhengqiang, a senior middleware development engineer at SphereEx & Apache ShardingSphere PMC.

He started to contribute to Apache ShardingSphere middleware in 2018 and used to play a leading role in sharding practices dealing with massive data. With rich practical experience, he loves open-source and is willing to contribute. Now he focuses on the development of Apache ShardingSphere kernel module.