ShardingSphere adopts a set of automatic execution engine, responsible for sending the true SQL, which has been routed and rewritten, to execute in the underlying data source safely and effectively. It does not directly send the SQL through JDBC to execute in the underlying data source simply, or put execution requests directly to the thread pool to execute concurrently, but focuses more on the creation of a balanced data source connection, the consumption generated by the memory usage, the maximum utilization of the concurrency and other problems. The objective of the execution engine is to automatically balance between the resource control and the execution efficiency.
From the perspective of resource control, the connection number of the business side’s visit of the database should be limited. It can effectively prevent the excessive resource occupation of some certain business exhausts the database connection resources and influences the normal use of other businesses. Especially when one database contains many tables, a logic SQL that does not contain any sharding key will produce a large amount of physical SQLs that fall into different tables in one database. If each physical SQL takes an independent connection, a query will undoubtedly take up excessive resources.
From the perspective of execution efficiency, holding an independent database connection for each sharding query can make effective use of multi-thread to improve execution efficiency. Opening an independent thread for each database connection can parallelize IO produced consumption. Holding an independent database connection for each sharding query can also avoid loading the query result to the memory too early. It is enough for independent database connections to maintain result set quotation and cursor position, and move the cursor when acquiring corresponding data.
The way to merge result set by moving down its cursor is called stream merger. It does not require to load all the query results to the memory and is able to, memory resource effectively, and thus, reduces the frequency of trash recycle.
When it is not able to make sure each sharding query holds an independent database connection, it requires to load all the current query results to the memory before reusing that database connection to acquire the query result from the next sharding table. Therefore, though the stream merger can be used, it will also degenerate to the memory merger under this kind of circumstances.
The control and protection of database connection resources is one thing, adopting better merging model to save the memory resources of middleware is another thing. How to deal with the relationship between them is a problem that should be solved by ShardingSphere execution engine. To be accurate, if a sharding SQL needs to operate 200 tables under some database case, should we choose to create 200 parallel connection executions or a serial connection execution? Or to say, how to choose between efficiency and resource control?
Aiming at the situation above, ShardingSphere has provided a solution. It has put forward a Connection Mode concept divided into two types,
MEMORY_STRICTLY mode and
The prerequisite to use this mode is that the database does not restrict the connection number of one operation. If the actual executed SQL needs to operate 200 tables in some database case, it will create a new database connection for each table and deal with them concurrently through multi-thread, to maximize the execution efficiency. When the SQL is up to standard, it will choose stream merger in priority to avoid memory overflow or frequent garbage recycle.
The prerequisite to use this mode is that the database strictly restrict the connection consumption number of one operation. If the SQL to be executed needs to operate 200 tables in database case, it will create one database connection and operate them serially. If shards exist in different databases, it will still be multi-thread operations for different databases, but with only one database connection being created for each operation of each database. It can prevent the problem brought by excessive occupation of database connection from one request. The mode chooses memory merger all the time.
The MEMORY_STRICTLY mode is applicable to OLAP operation, and can increase the capacity of the system by removing database connection restrictions. It is also applicable to OLTP operation, which usually has sharding keys and can be routed to a single shard. So it is a wise choice to control database connection strictly to make sure resources of online system databases can be used by more applications.
Which mode to use at first is up to users’ setting and they can choose to use MEMORY_STRICTLY mode or MEMORY_STRICTLY mode according to their actual business situations.
The solution that gives users the right to choose in the dilemma requires them to know the advantages and disadvantages of both modes and make decision according to the actual business situations. No doubt, it is not the best solution due to increasing users’ study cost and use cost.
This kind of dichotomy solution lacks flexible coping ability to switch between two modes with static initialization. In practical situations, route results of each time may differ with different SQL and placeholder indexes. It means some operations may need to use memory merger, while others are better to use stream merger. Connection modes should not be set by users before initializing ShardingSphere, but should be decided dynamically by the situation of SQL and placeholder indexes.
To reduce users’ use cost and solve the dynamic connection mode problem, ShardingSphere has extracted the thought of automatic execution engine in order to eliminate the connection mode concept inside. Users do not need to know what are so called MEMORY_STRICTLY mode and CONNECTION_STRICTLY mode, but let the execution engine to choose the best case according to current situations.
Automatic execution engine has narrowed the selection scale of connection mode to each SQL operation.
Aiming at each SQL request, automatic execution engine will do real-time calculations and evaluations according to its route result and execute the appropriate connection mode automatically, to strike the most optimized balance between resource control and efficiency.
For automatic execution engine, it is all right for users to only set
maxConnectionSizePerQuery, which represents the maximum connection number allowed by each database for one query.
The execution engine can be divided into two phases:
As indicated by its name, this phrase is used to prepare the data to be executed. It can be divided into two steps: result set grouping and unit creation.
Result set grouping is the key to realize the internal connection model concept.
According to the setting option of
maxConnectionSizePerQuery, execution engine will choose an appropriate connection mode combined with current route result.
Detailed steps are as follow:
Group the SQL route result according to the name of data sources.
Through the equation in the following picture, acquire the SQL route result group to be executed by each database case within the range that maxConnectionSizePerQuery permits; calculate the most optimized connection mode of this request.
Within the range that maxConnectionSizePerQuery permits, when the request number that one connection needs to execute is more than 1, meaning current database connection cannot hold the corresponding data result set, it must uses memory merger. On the contrary, when it equals to 1, meaning current database connection can hold the according data result set, it can use stream merger.
Each choice of connection mode aims at each physical database. That is to say, in one query, the connection mode of each database may not be the same but in a mixed existence form, if routed to more than one databases. Execution unit will use the route group result acquired from the last step to create the unit to execute. Execution unit refers to creating corresponding database connection for each route group result.
When data source uses technologies, such as database connection pool, to control database connection number, there is some chance of sending out deadlock, if it has not dealt with concurrency properly. As multiple requests waiting for each other to release database connection resources, it will generate hunger wait and cause the problem of crossing deadlock.
For example, suppose one query needs to acquire two database connections from a data source and apply them in two table sharding queries routed to one database. It is possible that Query A has already acquired a database connection of that data source and waits to acquire another connection; in the same time, Query B has also finished the same thing and waits. If the maximum connection number that database connection pool permits is 2, those two query requests will wait forever. The following picture has illustrated the deadlock situation:
To avoid deadlock, ShardingSphere will go through synchronous processing as acquiring database connection. As creating execution units, it acquires all the database connections that this SQL requires for once with atomic method, and reduces the possibility of only acquiring part of the resources. Due to the high operation frequency, locking database connection each time when acquiring database connection can decrease the concurrency of ShardingSphere. Therefore, it has improved two aspects here:
Avoid the setting that the condition for locking only takes one database connection one time. If it only acquires one connection each time, the situation that two requests wait for each other will not happen, so there is no need for locking. Most OLTP operations use sharding keys to route to the only data node, which will make the system in a totally unlocked state, thereby improve the concurrency efficiency further. In addition to the situation of routing to a single shard, read-write split also belongs to this category.
Only aim at MEMORY_STRICTLY mode to lock resources. When using CONNECTION_STRICTLY mode, all the query result sets will release database connection resources after loading to the memory, so the problem of deadlock wait will not appear.
Applied in actually SQL execution, this phrase can be divided into two steps: group execution and merger result generation.
Group execution can distribute execution unit groups generated in preparation phrase to the underlying concurrency engine and send events according to each key steps during the execution process, such as starting execution event, successful execution event and failed execution event. Execution engine only focuses on message sending rather than subscribers of the event. Other ShardingSphere modular, such as distributed transactions, invoked chain tracing and so on, will subscribe events interested in and do corresponding operations.
Through the connection mode acquired in preparation phrase, ShardingSphere will generate memory merger result set or stream merger result set, and transfer it to the result merger engine for the next step work.
The overall structure division of execution engine is shown as the following picture: