MYSQL Sharding Tutorial

This content originally appeared on Level Up Coding - Medium and was authored by Sonia Jessica

Introduction:

The majority of firms have begun to become data-driven in recent years. This implies they use data-driven decision-making to design future plans for departments like marketing, growth, business development, and product development, among others. Businesses must ensure that the essential data is collected in a format that is suitable for analysis and is easily available in order to undertake an accurate analysis. Because the goal is to provide an accurate analysis, a large amount of data is collected, making storage on a single system unfeasible. Database Sharding was born as a result of this.

What is Sharding?

Businesses that rely on monolithic Relational Database Management Systems (RDBMS) will have bottlenecks as the amount of data stored grows. Due to restricted CPU power, memory, storage capacity, and throughput, response time will inevitably deteriorate. Vertical scaling can be used in these scenarios, but it has its own restrictions and provides decreasing rewards after a certain point.

When dealing with large amounts of data, horizontal scaling is the ideal option. The load on each Server would be reduced if tables were partitioned horizontally and shared across many Servers, and numerous Servers performing a single query in parallel would result in a shorter response time.

Sharding is a scale-out technique in which database tables are partitioned and each partition is hosted on its own RDBMS server. In the case of MySQL, this means that each node is its own MySQL RDBMS, with its own set of data partitions. Because of this data separation, the application can distribute queries across numerous servers at the same time, resulting in parallelism and therefore boosting the workload’s scale. However, the separation of data and server introduces issues, such as sharding key selection, schema design, and application rewrites.

A Shard is a horizontal data split that contains a portion of the original data set. As a result, it is only accountable for a percentage of the entire workload. Data availability can also be ensured using sharding in the event of unanticipated outages. There will be no data available if the server for an Unsharded Database goes down for any reason. A Sharded Database, on the other hand, has several Shards among which the data is split. This means that in the event of an unforeseen outage, only the data in the downed Shards will be available. Those that are currently active will still be able to respond appropriately.

https://hazelcast.com/

Benefits of Sharding:

It’s relatively easy to run a relational database on a single system and expand it up as needed by upgrading the machine’s processing power. In the end, any non-distributed database will be constrained in terms of storage and processing capacity, therefore the ability to grow horizontally increases your setup’s flexibility.

Another reason some people may opt for a sharded database architecture is to reduce query response times. When you submit a query to a database that hasn’t been sharded, it may have to search every row in the table you’re querying to locate the result set you want.

Sharding can also help to improve the reliability of an application by reducing the impact of outages. If your program or website relies on an unsharded database, a failure could render the entire application inaccessible. An outage in a sharded database, on the other hand, is likely to affect only one shard.

MySQL Sharding:

If MySQL Sharding is to be performed, it must be done manually at the Application Layer, which demands a lot of Engineering bandwidth because a complete Sharding logic must be put up to determine how data is distributed and fetched.

Due to the fact that the implementation must be done manually, certain design considerations must be taken. The following are the decisions that must be made:

Choosing the MySQL Sharding Key.
Handling of Schema Changes.
Mapping among Physical Servers, Shards, and MySQL Sharding Keys.

Choosing the MySQL Sharding Key

The MySQL Sharding Key determines how data is shared between shards. The MySQL Sharding Key should be carefully chosen when deploying sharding in MySQL, as the erroneous key could result in system inflexibility later. If the parent and child rows are kept on distinct Shards, Referential Integrity, i.e. the parent/child relationship between tables that is maintained by the Relational Database Management System (RDBMS), will not be preserved automatically.

Another reason to choose the sharding key is that replacing an in-place sharding key might be time-consuming and inconvenient. Sharding key changes are normally avoided if at all possible because they might have a knock-on effect across applications, data locations, and transactionality (ACID as well) across nodes.

The following are the two types of MySQL Sharding Keys:

Intelligent sharding keys help to avoid cross-node transactions, but they are more susceptible to skew. If the user table is sharded by user_id, for example, it’s a good idea to place any related information about that user, such as user interactions, user contact points, and so on, on the same shard to avoid cross-node JOINS. This can function well if a large number of users are linked to each other.
Hashed sharding keys are disseminated throughout the shards automatically, although they are more vulnerable to cross-node transactions. If the sharding key is hash-distributed, for example, the associated user information will be hash-distributed as well. This is a good way to spread out the load, especially since new users can cause a spike in growth. However, frequent interaction across groups of users will necessitate cross-node JOINS and/or cross-node replication, resulting in continuing latency and/or potentially stale data due to replication lag.

Image Source: Link

Handling of Schema Changes

Users can alter table schemas in MySQL databases at any moment after they’ve been created. Each MySQL shard can undoubtedly do an online schema change, ensuring that no transactions are lost. But the issue isn’t RDBMS support; rather, it’s coordinating all of the shards’ DDL updates with the application state.

Before the application code employing the new schema(s) can be enabled, each of the shards must successfully complete their online schema change(s). Data inconsistencies, if not application failures, can occur if even one of the shards hasn’t finished the modification. This level of cooperation between several MySQL instances is a time-consuming process that exposes the application to failures and downtime. As a result, most sharded array administrators try to prevent schema changes wherever possible.

Mapping among Physical Servers, Shards, and MySQL Sharding Keys

If there isn’t a proper mapping between Physical Servers, Shards, and MySQL Sharding Keys, MySQL Sharding cannot be implemented. As the number of Shards grows and the Schemas change, this mapping may need to be changed on a frequent basis. This mapping would also have to be accessible for practically every query run on the database, thus it should be stored at a position that allows for quick lookups. As a result, storing it in an in-memory database is recommended.

Sharding: An Example

Let us assume that we have several servers and each server runs a MySQL instance that can have several databases. The data is sharded across each database.

We keep track of which machines these shards are on in a configuration table:

[{"range": (0, 255), "master": "MySQL1A", "slave": "MySQL1B"},
 {"range": (256, 511), "master": "MySQL2A", "slave": "MySQL2B"},
 …
 {"range": (512, 754), "master": "MySQL16A", "slave": "MySQL16B"}]

Only when we need to transfer shards or replace a host does this configuration alter. We can promote a slave and then bring up a new slave if a master dies.

There are the same tables in each shard: pencils, box, user_has_pencils, pencil_liked_by_user, and so on.

To distribute the shards:

ID = (shard ID << 45) | (type ID << 35) | (local ID<<0)

Let’s say a shard is on MySQL2A as an example. Here’s how we can get started:

conn = MySQLdb.connect(host="MySQL2A")
conn.execute("SELECT data FROM db03429.pencils where local_id=7075735")

Objects and mappings are the two forms of data.

Object tables feature an ID (the local ID, an auto-incrementing primary key) and a blob of data that contains a JSON with all of the object’s data, such as pencils, users, books, and comments.

CREATE TABLE pencils (
 local_id INT PRIMARY KEY AUTO_INCREMENT,
 data TEXT,
 ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

To modify a pencil, we use a MySQL transaction to read-modify-write the JSON:

> BEGIN
> SELECT blob FROM db01829.pencils WHERE localId=8325735 FOR UPDATE
[Modifying
 the json blob]
> UPDATE db01829.pencils SET blob='<modified blob>' WHERE localId=8325735
> COMMIT

A mapping table connects two objects, such as a board and the pencils it contains. A mapping table in MySQL has three columns: a 64-bit “from” ID, a 64-bit “to” ID, and a sequence ID. On the (from, to, sequence) triple, there are index keys that live on the shard of the ‘from’ ID.

CREATE TABLE box_has_pencils (
 boxId INT,
 pencilId INT,
 sequence INT,
 INDEX(boxId, pencilId, sequence)
) ENGINE=InnoDB;

To view things in a mapping table:

SELECT pencilId FROM box_has_pencils 
WHERE boardId=23468314728651 ORDER BY sequence 
LIMIT 25 OFFSET 100

Limitations of MySQL Sharding:

While sharding a database might help with scaling and performance, it can also come with certain drawbacks. Because of the disadvantages of MySQL Sharding implementation, it is not recommended for use in production for most applications.

The following are the constraints of using MySQL Sharding:

The sheer complexity of properly creating a sharded database architecture is the first challenge that people face with sharding. If the sharding procedure is done incorrectly, there’s a good chance that data may be lost or tables will be corrupted. Sharding, even when done correctly, is likely to have a significant influence on your team’s processes. Users must manage data across numerous shard locations rather than accessing and managing it from a single entry point, which could be disruptive to some teams.
Cross-joins across several Shards are not possible with MySQL Sharding. This is widely regarded as a significant disadvantage of MySQL Sharding, as many firms rely on SQL for analytics, which cannot be conducted without the usage of sophisticated join operations.
Another significant disadvantage is that after a database has been sharded, returning it to its original architecture can be quite difficult. Data written after the splitting will not be included in any database backups made before it was sharded. As a result, restoring the original unsharded architecture would necessitate either integrating the new partitioned data with the old backups or changing the partitioned DB back into a single DB, both of which would be costly and time-consuming.
One of the most important benefits of utilizing any SQL database is that it is ACID compliant. When MySQL Sharding is enabled, the database is no longer deemed ACID compliant, which raises the risk of data loss, as well as erroneous or inconsistent data. Learn More

Conclusion:

For those wishing to scale their database horizontally, sharding can be a wonderful alternative. However, it increases the complexity of your application and increases the number of potential failure spots. For some, sharding may be necessary, while for others, the time and resources required to build and maintain a sharded architecture may exceed the benefits.

Sharding implementation is seen to be more suitable for NoSQL databases. If sharding is critical to your company and data needs, it’s best to move all of the necessary data to a NoSQL database and then apply sharding to it.

MYSQL Sharding Tutorial was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

This content originally appeared on Level Up Coding - Medium and was authored by Sonia Jessica

Print Share Comment Cite Upload Translate Updates

APA

Sonia Jessica | Sciencx (2021-07-20T15:29:12+00:00) MYSQL Sharding Tutorial. Retrieved from https://www.scien.cx/2021/07/20/mysql-sharding-tutorial/

MLA

" » MYSQL Sharding Tutorial." Sonia Jessica | Sciencx - Tuesday July 20, 2021, https://www.scien.cx/2021/07/20/mysql-sharding-tutorial/

HARVARD

Sonia Jessica | Sciencx Tuesday July 20, 2021 » MYSQL Sharding Tutorial., viewed ,<https://www.scien.cx/2021/07/20/mysql-sharding-tutorial/>

VANCOUVER

Sonia Jessica | Sciencx - » MYSQL Sharding Tutorial. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2021/07/20/mysql-sharding-tutorial/

CHICAGO

" » MYSQL Sharding Tutorial." Sonia Jessica | Sciencx - Accessed . https://www.scien.cx/2021/07/20/mysql-sharding-tutorial/

IEEE

" » MYSQL Sharding Tutorial." Sonia Jessica | Sciencx [Online]. Available: https://www.scien.cx/2021/07/20/mysql-sharding-tutorial/. [Accessed: ]

rf:citation

» MYSQL Sharding Tutorial | Sonia Jessica | Sciencx | https://www.scien.cx/2021/07/20/mysql-sharding-tutorial/ |

Please log in to upload a file.

There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.