How to Run SQL Trace with ShardingSphere-Agent

Apache ShardingSphere, a data service platform that follows the Database Plus concept for distributed database systems, offers a range of features, including data sharding, read/write splitting, data encryption, and shadow database. In production environments, especially in data-sharding scenarios, SQL tracing is critical for monitoring and analyzing slow queries and abnormal executions, so a thorough understanding of SQL rewriting and query execution is crucial.

What is ShardingSphere-Agent

ShardingSphere-Agent provides an observability framework for ShardingSphere. It is built on Java Agent technology and uses Byte Buddy to modify the target bytecode and weave data collection logic into it. Metrics, tracing, and logging are integrated into the agent as plugins that collect observable data about the system's status. The tracing plugin captures tracing information for SQL parsing and SQL execution, which helps users analyze SQL traces when using Apache ShardingSphere-JDBC or Apache ShardingSphere-Proxy.

This post will take ShardingSphere-Proxy as an example to explain how to use ShardingSphere-Agent for SQL tracing.

Two Basic Concepts You Need to Know

Before we begin, there are two concepts to keep in mind:

A SQL statement run in ShardingSphere-Proxy goes through parsing, routing, rewriting, execution, and merging. Currently, tracing is implemented for two critical steps, parsing and execution, with execution usually being the focus. In the execution stage, Proxy connects to the physical database to execute the actual SQL. The information obtained during this stage therefore provides important evidence for troubleshooting and fully reflects the correspondence between the logical SQL and the actual SQL after rewriting.

In ShardingSphere-Proxy, a trace consists of three types of spans:

| Span | Description |
| --- | --- |
| /ShardingSphere/rootInvoke/ | The complete execution of a SQL statement. It shows the total time spent executing the statement. |
| /ShardingSphere/parseSQL/ | The parsing stage of the SQL execution. It shows the parsing time and the SQL statement. (Not available when a PreparedStatement is used.) |
| /ShardingSphere/executeSQL/ | The execution of the rewritten SQL. It shows the time spent on execution. (Not available if the SQL does not need to be executed in the backend physical database.) |

How to Use ShardingSphere-Agent for SQL Tracing

For the convenience of viewing the tracing data, Zipkin or Jaeger is usually used to collect and display the tracing data. Currently, ShardingSphere-Agent supports reporting trace data to both components. Next, let’s use the sharding scenario as an example to explain how to report data and analyze the SQL trace.

Configuring ShardingSphere-Proxy

Once the configuration is complete, the tables t_order_0 and t_order_2 exist in the physical database demo_ds_0, and the tables t_order_1 and t_order_3 exist in the physical database demo_ds_1.

With ShardingSphere-Proxy configured, the next step is to report SQL trace data to Zipkin and Jaeger through ShardingSphere-Agent.

Reporting to Zipkin

plugins:
 tracing:
   OpenTelemetry:
     props:
       otel.service.name: "shardingsphere" # the service name configured
       otel.traces.exporter: "zipkin" # Use zipkin exporter
       otel.exporter.zipkin.endpoint: "http://localhost:9411/api/v2/spans" # the address where zipkin receives data
       otel.traces.sampler: "always_on" # sampling setting
       otel.metrics.exporter: "none" # close OpenTelemetry metric collection

./bin/stop.sh

./bin/start.sh --agent

Let’s analyze the trace of the insert query. After finding the trace, you can see the execution details of this query. The Tags information in the /shardingsphere/parsesql/ span shows that the parsed SQL is consistent with the SQL executed on the client.
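The same trace can also be inspected from a script instead of the Zipkin UI. The following is a minimal sketch, assuming the Zipkin server from the configuration above listens on localhost:9411 and serves its v2 query API under /api/v2/traces. The helper names (traces_url, fetch_traces, summarize_trace) are hypothetical and not part of ShardingSphere or Zipkin.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Host and port are taken from the agent configuration above (assumption:
# the Zipkin query API runs on the same address as the span collector).
ZIPKIN_BASE = "http://localhost:9411"


def traces_url(service_name, limit=10, base=ZIPKIN_BASE):
    """Build a Zipkin v2 query URL for recent traces of one service."""
    query = urlencode({"serviceName": service_name, "limit": limit})
    return f"{base}/api/v2/traces?{query}"


def fetch_traces(service_name, limit=10, base=ZIPKIN_BASE):
    """Return a list of traces; each trace is a list of span dicts."""
    with urlopen(traces_url(service_name, limit, base), timeout=5) as resp:
        return json.load(resp)


def summarize_trace(spans):
    """Map each span name to (span count, total duration in microseconds)."""
    counts = Counter()
    durations = Counter()
    for span in spans:
        name = span.get("name", "<unnamed>")
        counts[name] += 1
        durations[name] += span.get("duration", 0)
    return {name: (counts[name], durations[name]) for name in counts}
```

Calling fetch_traces("shardingsphere") against a running Zipkin instance and passing each returned trace to summarize_trace gives a per-span-name count, which is a quick way to confirm how many /shardingsphere/executesql/ spans a statement produced.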

The span table shows 4 /shardingsphere/executesql/ spans. The span details show that the following two SQL statements were executed in the storage unit ds_0:

insert into t_order_0 (order_id, user_id, address_id, status) VALUES (4, 4, 4, 'OK')
insert into t_order_2 (order_id, user_id, address_id, status) VALUES (2, 2, 2, 'OK')

The following two SQL statements were executed in the storage unit ds_1:

insert into t_order_1 (order_id, user_id, address_id, status) VALUES (1, 1, 1, 'OK')
insert into t_order_3 (order_id, user_id, address_id, status) VALUES (3, 3, 3, 'OK')

After executing the insert query, log in to the physical database to check the corresponding data.

Because the t_order table is split into 4 shards and rows with order_id 1 to 4 are inserted, one record is inserted into each of the t_order_0, t_order_1, t_order_2, and t_order_3 tables. As a result, there are 4 /shardingsphere/executesql/ spans. The displayed SQL trace is consistent with the actual execution results: the spans show the time spent on each step, and the /shardingsphere/executesql/ spans show exactly which SQL was executed and where.
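The mapping from order_id to table and storage unit can be reproduced in a few lines. This is a minimal sketch under an assumption: the sharding rule itself is not shown here, so the code assumes the table index is order_id modulo 4 and that the tables are spread over the two storage units in round-robin order. That assumption reproduces the statements listed above, but it is not taken from the actual ShardingSphere configuration, and the route and rewrite_insert helpers are hypothetical.

```python
SHARD_COUNT = 4                   # t_order_0 .. t_order_3, as listed in the article
STORAGE_UNITS = ["ds_0", "ds_1"]  # storage unit names as they appear in the spans


def route(order_id):
    """Return (storage_unit, actual_table) for a logical t_order row.

    Assumption: table index = order_id mod SHARD_COUNT, and tables are
    assigned to storage units in round-robin order.
    """
    table_index = order_id % SHARD_COUNT
    unit = STORAGE_UNITS[table_index % len(STORAGE_UNITS)]
    return unit, f"t_order_{table_index}"


def rewrite_insert(order_id):
    """Render the actual SQL that one execute span would carry.

    The article's sample rows use the same value for order_id, user_id,
    and address_id, so the sketch does the same.
    """
    unit, table = route(order_id)
    sql = (
        f"insert into {table} (order_id, user_id, address_id, status) "
        f"VALUES ({order_id}, {order_id}, {order_id}, 'OK')"
    )
    return unit, sql


# One logical insert of four rows fans out into four actual statements,
# which is why the trace contains four execute spans.
for order_id in range(1, 5):
    unit, sql = rewrite_insert(order_id)
    print(f"{unit}: {sql}")
```

Each printed line corresponds to one /shardingsphere/executesql/ span, so comparing this output with the span details is a simple way to check that routing and rewriting behaved as expected.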

The traces of the select, update, and delete queries are also consistent with the actual execution.

Reporting to Jaeger

plugins:
 tracing:
   OpenTelemetry:
     props:
       otel.service.name: "shardingsphere" # the service name configured
       otel.traces.exporter: "jaeger" # Use jaeger exporter
       otel.exporter.jaeger.endpoint: "http://localhost:14250" # the address where jaeger receives data
       otel.traces.sampler: "always_on" # sampling setting
       otel.metrics.exporter: "none" # close OpenTelemetry metric collection

./bin/stop.sh

./bin/start.sh --agent

Since the executed SQL queries are the same as those in the Zipkin example, the trace data should also be the same. As an example, we will use the trace from the insert query.

The trace contains one parsing span and 4 execution spans.

Storage unit ds_0 executed the following two SQL statements:

insert into t_order_0 (order_id, user_id, address_id, status) VALUES (4, 4, 4, 'OK')
insert into t_order_2 (order_id, user_id, address_id, status) VALUES (2, 2, 2, 'OK')

Storage unit ds_1 executed the following two SQL statements:

insert into t_order_1 (order_id, user_id, address_id, status) VALUES (1, 1, 1, 'OK')
insert into t_order_3 (order_id, user_id, address_id, status) VALUES (3, 3, 3, 'OK')

The number of spans, the parsed SQL statement, and the execution process all show that the whole SQL trace matches expectations.

Use Sampling Rate

Sampling is common in production environments, where the volume of trace data can be very large. The configuration below sets the sampling rate to 0.01 (1%). The data is exported through OpenTelemetry exporters; refer to the OpenTelemetry Exporters document [5] for the detailed parameters.

plugins:
 tracing:
   OpenTelemetry:
     props:
       otel.service.name: "shardingsphere"
       otel.metrics.exporter: "none"
       otel.traces.exporter: "zipkin"
       otel.exporter.zipkin.endpoint: "http://localhost:9411/api/v2/spans"
       otel.traces.sampler: "traceidratio"
       otel.traces.sampler.arg: "0.01"

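To make the effect of a 0.01 ratio concrete, the sketch below illustrates the general idea behind trace-ID ratio sampling: the decision is a deterministic function of the trace ID, so every span of a trace gets the same verdict. It is an illustration only, not the code the agent runs, and should_sample is a hypothetical helper.

```python
import random


def should_sample(trace_id, ratio):
    """Decide whether to keep a trace, based only on its ID.

    The low 64 bits of the trace ID are compared against a bound that is
    proportional to the ratio, so roughly `ratio` of all uniformly random
    IDs fall below it.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (1 << 64))
    return (trace_id & 0xFFFFFFFFFFFFFFFF) < bound


# Simulate 100,000 random 128-bit trace IDs with a fixed seed so the run
# is repeatable, then report the fraction that would be kept at 1%.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
kept = sum(should_sample(tid, 0.01) for tid in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces ({kept / len(trace_ids):.2%})")
```

Because the verdict depends only on the trace ID, a sampled trace is exported with all of its spans, while an unsampled one disappears entirely. A slow query that is not sampled therefore leaves no trace in Zipkin or Jaeger, which is worth keeping in mind when choosing the ratio.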
Summary

SQL tracing allows developers and DBAs to quickly diagnose and locate performance bottlenecks in applications. By collecting SQL trace data through ShardingSphere-Agent and viewing it in visualization tools such as Zipkin and Jaeger, you can analyze the time spent on each storage node, which helps improve the stability and robustness of the application and ultimately enhances the user experience.

Finally, you are welcome to join the ShardingSphere Slack channel [6] to discuss your questions, suggestions, or ideas about ShardingSphere and ShardingSphere-Agent.

[1] Download ShardingSphere-Proxy

[2] DistSQL document

[3] Zipkin official

[4] Jaeger official

[5] OpenTelemetry Exporters document

[6] ShardingSphere slack channel