When the size of the database grows beyond millions of rows, it is advisable to start scaling the application and split the database into multiple physical servers.
The biggest problem of splitting a database into multiple parts is its subsequent synchronization when a user requests specific data.
Suppose you have a table of articles
, but because you have a huge site, there are tens of millions more articles and you have to physically split them across multiple machines.
If we were to use an ordinary integer as the id
(primary key) with the setting as autoincrement, we would very quickly find that when creating records on different machines in a decentralized way and then synchronizing them, there are ID collisions and we have to renumber the records in a complicated way. Additionally, if we are resolving many sessions to other tables, this can be a very complex overhead in which it is easy to make mistakes.
Therefore, instead of a numeric identifier, we can generate a UUID
, which is a text string that is generated by a complex algorithm that guarantees that it will be unique even if it is generated independently on multiple machines.
Advantages:
primary key
before actually inserting the record into the database. This reduces the number of SQL queries, simplifies transaction logic, and you can easily use it as a foreign key before the record collection exists.19010018
, it is easy to guess that user 19010017
and others also exist. The attack is called a vector attack.We can get the UUID either simply by SQL query SELECT UUID();
, but this increases the number of queries to the database and we lose the possibility to prepare the data first in bulk in the application logic and then write them at once.
Therefore, I like to use the ramsey/uuid package obtained by Composer as a good solution. The UUID itself has several versions, and the package can playfully generate all kinds as needed.
This makes it easy to use:
require 'vendor/autoload.php';use Ramsey\Uuid\Uuid;// Generates version 1 (time-based) UUID object$uuid1 = Uuid::uuid1();echo $uuid1->toString() . "\n"; // e4eaaaf2-d142-11e1-b3e4-080027620cdd// Generates version 3 (name-based and hashed as MD5) UUID object$uuid3 = Uuid::uuid3(Uuid::NAMESPACE_DNS, 'php.net');echo $uuid3->toString() . "\n"; // 11a38b9a-b3da-360f-9353-a5a725514269// Generates version 4 (random) UUID object$uuid4 = Uuid::uuid4();echo $uuid4->toString() . "\n"; // 25769c6c-d34d-4bfe-ba98-e0ee856f3e7a// Generates version 5 (name-based and hashed as SHA1) UUID object$uuid5 = Uuid::uuid5(Uuid::NAMESPACE_DNS, 'php.net');echo $uuid5->toString() . "\n"; // c4a760a8-dbcf-5254-a0d9-6a4474bd1b62
If you use Doctrine, there is an extension ramsey/uuid-doctrine that generates the ID directly as a data type.
In my first attempts I used varchar(36)
as the primary key (ID), but that's not a good idea at all.
Explanation of the internal logic:
MySql databases (and many others) can't use
varchar
,char
or other data types expressing a string as a primary key efficiently. In some databases, there is aGUID
datatype that is designed to store UUIDs directly. If you cannot use this type, there is a suitable substitute in the formbinary(16)
.
When physically examining the database, the ID is then represented in HEX format (since binary format cannot be displayed), instead of the nice ID 726c67c4-e5eb-4a4c-8fcc-031da5d6f3c6
, you will just see 726C67C4E5EB4A4C8FCC031DA5D6F3C6
, which looks like '?kYߟKg2c;'
in the INSERT query.
varchar(36)
to binary(16)
I assume you represent (or plan to represent) the newly set ID in the database as:
``sql
id
binary(16) NOT NULL
However, just changing the data type doesn't work, so something like:
```sql
SET FOREIGN_KEY_CHECKS=0;
ALTER TABLE article CHANGE id id BINARY(16) NOT NULL
SET FOREIGN_KEY_CHECKS=1;
There are basically two reasons for this:
must have the same data type
. Therefore, you need to change both the data type for the article ID and, for example, in the relational table that matches articles to authors.Therefore, the only correct solution is to back up the data (but you should do this before each migration anyway), prepare an empty database with functional relations and put the data there again via migration.
If you have generated UUIDs strangely before, it is better to choose some sequential method to get the UUID and renumber all records. The reason is that sequential layout allows better ordering of values and creating a btree
, which makes the performance almost identical to bigint
.
If you know of a better way to convert an existing database from UUID stored as a varchar to binary format without having to devise complex migrations and with preservation of foreign keys, I would be very grateful for feedback.
Jan Barášek Více o autorovi
Autor článku pracuje jako seniorní vývojář a software architekt v Praze. Navrhuje a spravuje velké webové aplikace, které znáte a používáte. Od roku 2009 nabral bohaté zkušenosti, které tímto webem předává dál.
Rád vám pomůžu:
Články píše Jan Barášek © 2009-2024 | Kontakt | Mapa webu
Status | Aktualizováno: ... | en