Sashkin - Fotolia
Should you run MySQL on VMs or physical machines?
Take stock of your current environment, including instance types, orchestration systems and storage devices, before deciding whether to deploy MySQL to VMs or physical machines.
Although MySQL is quite old in computer years, the surge in private and public clouds has architects wondering whether they should run MySQL on VMs or physical machines.
Despite its old age, MySQL is one of the most popular open source databases. One reason MySQL is so popular is that it's a key component of the Linux, Apache, MySQL and PHP (LAMP) architecture, which powers many of the world's websites -- especially those that serve up static content, like WordPress.
Performance tests
One of the early defenders of the VM approach was VMware, which developed the first hypervisor for the x86 chip. In 2008, it ran a performance test to see whether MySQL ran faster on a dedicated machine or a VM. According to its findings, when you run MySQL on a VM, it's 4.9% slower because of the hypervisor’s overhead. If you consider that there are many moving parts in an app, that's probably not too significant of a percentage. The network, app server, content delivery network, internet and any viruses on the user's computer all introduce latency. And with the very fast speeds at which MySQL and disk drives work, including solid-state drives, 4.9% is a pittance.
A more relevant factor in the decision of whether to run MySQL on VMs versus dedicated machines is the wide adoption of cloud computing and the resulting cost savings. There is such a critical mass behind this idea that to argue otherwise would be a tough sell.
VMs and Kubernetes, YARN and Mesos orchestration systems can drive down the subscription fees on public clouds significantly by collocating services on one machine. So there is no need to run three servers for the LAMP stack when you can run all three services on one.
Another driving factor is the idea of abstraction, where architecture is code. This lets administrators and programmers package up their application and push it out using automation tools like Puppet, Chef and Ansible. That abstraction can be pushed to single servers, but a single server isn't a resource pool as envisioned by orchestration products that span servers. Those tools look at the collection of VMs as a resource pool from which to draw upon as needed.
Then there is the issue of storage. Local VM storage isn't a good fit for a database because, when you shut that down, the storage goes offline -- except in the example of MySQL Cluster, which is explained below, and Amazon Elastic Block Store. Therefore, you need a storage area network (SAN) or other storage device to contain the logical mount point, data and so on. So, this is not really a deciding factor unless you are using a public cloud in which there are many different storage options with a wide range of pricing. In that case, isolating storage from machines really does drive down cost. For example, you can put the data warehouse and ad hoc databases on the least expensive tier.
The actual implementation of MySQL storage doesn't depend on the underlying hardware in all cases. MySQL has 15 different storage options, called Storage Engines. Those vary from memory, federated -- a remote MySQL database without replication -- and even regular CSV text files.
MySQL NDB Cluster
MySQL Cluster is a distributed system. So, just like Hadoop, Elasticsearch and Spark, by definition, it's designed to run on clusters of commodity machines.
The MySQL Cluster is analogous to the way Apache Spark works. It's an in-memory database designed to run across nodes of machines using each server's local disk drive. So, you don't use SAN. Here, there are no Network File System mount points or physical sharing of storage using some protocol. Instead, the cluster uses local storage. Spark calls this type of storage Resilient Distributed Datasets. MySQL calls it Network Database.
That architecture follows the Apache Hadoop storage mechanism of dividing data into DataNodes and controlling nodes, which, in the case of MySQL, are called daemons and management servers. This model lends itself to virtualization as administrators can add additional DataNodes without having to restart the cluster.
So, should you run MySQL on VMs or physical machines? Trying to argue for dedicated machines will be a hard sell in an organization that has already moved to VMs and containers or signed up with a public cloud. Virtualization has definitely become the de facto standard. The microservices registry -- an architecture where applications register their IP and ports so they can be discovered -- and the load balancing that that enables makes for an architecture that is easiest to push to new environments, manage and scale as the workload changes. All of this drives down costs and reduces errors by using repeatable scripts. This fits into continuous release or other process and orchestration tools and is something the organization understands quite well.