Key considerations in developing a storage area network design
The best storage area network design for your customers will take into consideration a number of critical issues: uptime needs, scalability, security and disaster recovery. Find out how each of these factors will influence storage area network design choices.
By Yuval Shavit, Features Writer
Storage area networks (SANs) let several servers share storage resources and are often used in situations that require high performance or shared storage with block-level access, like virtualized servers and clustered databases. Although SANs started out as a high-end technology used only in large enterprises, cheaper SANs are now affordable even for small and medium-sized businesses (SMBs). In earlier installments of this Hot Spot Tutorial, we examined what benefits SANs offer over other storage architectural choices, as well as the two main storage networking protocols, Fibre Channel and iSCSI. In this installment, we'll look at the main considerations you should keep in mind when putting together a storage area network design.
Uptime and availability
Because several servers will rely on a SAN for all of their data, it's important to make the system very reliable and eliminate any single points of failure, said Paul Franco, executive vice president at Zibiz Data Management, a Ronkonkoma, N.Y., storage consultancy. Most SAN hardware vendors offer redundancy within each unit -- like dual power supplies, internal controllers and emergency batteries -- but you should make sure that redundancy extends all the way to the server, Franco said.
In a typical storage area network design, each storage device connects to a switch that then connects to the servers that need to access the data. To make sure this path isn't a single point of failure, your client should buy two switches for the SAN. Each storage unit should connect to both switches, as should each server. If either path fails, multipathing software can fail over to the other. Some programs handle that failover automatically, but cheaper software may require you to trigger the failover manually, Franco said. You can also configure the software to use both paths when they're available, for load balancing.
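The dual-path behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not any vendor's multipathing software: the class name, the path labels "switch_a" and "switch_b", and the round-robin policy are all assumptions made for the example.

```python
class MultipathDevice:
    """Hypothetical model of a LUN reachable through two switches."""

    def __init__(self, paths):
        self._order = list(paths)                  # fixed rotation order
        self.healthy = {p: True for p in paths}    # path -> is it usable?
        self._next = 0                             # index of the next path to try

    def mark_failed(self, path):
        self.healthy[path] = False

    def mark_restored(self, path):
        self.healthy[path] = True

    def pick_path(self):
        """Round-robin over healthy paths; skip failed ones (failover)."""
        for _ in range(len(self._order)):
            path = self._order[self._next]
            self._next = (self._next + 1) % len(self._order)
            if self.healthy[path]:
                return path
        raise RuntimeError("no healthy path to storage")


dev = MultipathDevice(["switch_a", "switch_b"])
balanced = [dev.pick_path() for _ in range(4)]     # alternates between both paths
dev.mark_failed("switch_a")
after_failure = [dev.pick_path() for _ in range(3)]  # only switch_b is used
```

While both paths are healthy, I/O alternates between them (load balancing). Once one is marked failed, every request goes over the survivor, which is the automatic failover case Franco describes.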
But you should also consider how the drives themselves are configured, Franco said. RAID technology spreads data among several disks -- a technique called striping -- and can add parity checks so that if any one disk fails, its content can be rebuilt from the others. There are several types of RAID, but the most common in SAN designs are levels 5, 6 and 1+0, Franco said.
RAID 5 stripes data across all of the disks in the unit and sets aside the equivalent of one disk's capacity for parity information, which can be used to rebuild any single drive that needs to be replaced. RAID 6 adds a second, independent set of parity. This protects your client's data in case a second drive fails during the first disk's rebuild, which can take up to 24 hours for a terabyte, Franco said. RAID 1+0 stripes data across a series of disks without any parity checks, which is very fast, and mirrors each of those disks to a second disk for redundancy.
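To make the parity idea concrete, here is a minimal Python sketch of how single-parity protection of the kind RAID 5 uses can rebuild a lost block with XOR, along with a rough usable-capacity calculation for the three RAID levels mentioned. The function names and the minimum disk counts enforced are assumptions for illustration. Real controllers work on much larger stripes, and RAID 6 uses a second, more involved parity calculation that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def usable_disks(level, n_disks):
    """Disks' worth of usable capacity for an array of n_disks equal drives."""
    if level == "5":
        if n_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return n_disks - 1          # one disk's worth of parity
    if level == "6":
        if n_disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return n_disks - 2          # two disks' worth of parity
    if level == "1+0":
        if n_disks < 4 or n_disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, 4 or more")
        return n_disks // 2         # every disk is mirrored
    raise ValueError(f"unsupported RAID level: {level}")


# Three data blocks plus one parity block, as in a four-disk single-parity stripe.
data = [b"SAN1", b"SAN2", b"SAN3"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuilding it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

With eight equal drives, this works out to seven drives of usable space under RAID 5, six under RAID 6 and four under RAID 1+0, which is the capacity your client trades for each level of protection.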
Capacity and scalability
A good storage area network design should not only accommodate your client's current storage needs, but it should also be scalable so that your client can upgrade the SAN as needed throughout the expected lifespan of the system. You should consider how scalable the SAN is in terms of storage capacity, number of devices it supports and speed, said Greg Schulz, founder and senior analyst with The StorageIO Group, a Stillwater, Minn., consulting firm.
Because a SAN's switch connects storage devices on one side and servers on the other, its number of ports can affect both storage capacity and speed, Schulz said. By allowing enough ports to support multiple, simultaneous connections to each server, switches can multiply the bandwidth to servers. On the storage device side, you should make sure you have enough ports for redundant connections to existing storage units, as well as units your client may want to add later.
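A quick port budget can help when sizing switches. The sketch below assumes the redundant two-switch layout described earlier, in which every server and every storage unit connects to both switches, so the count applies to each switch. The function and parameter names are hypothetical, and inter-switch links or vendor-specific port licensing are not counted.

```python
def ports_needed_per_switch(servers, storage_units,
                            links_per_server=1, links_per_storage=1,
                            planned_storage_units=0):
    """Ports required on EACH of the two redundant switches."""
    server_ports = servers * links_per_server
    # Count storage the client may add later, not just what exists today.
    storage_ports = (storage_units + planned_storage_units) * links_per_storage
    return server_ports + storage_ports


# Example: 10 servers with two links to each switch for extra bandwidth,
# 2 storage units today and room for 2 more later.
needed = ports_needed_per_switch(10, 2, links_per_server=2,
                                 planned_storage_units=2)
# needed == 24, so a 16-port switch would already be too small
```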
One feature of storage area network design that you should consider is thin provisioning of storage. Thin provisioning tricks servers into thinking a given volume within a SAN, known as a logical unit number (LUN), has more space than it physically does. For instance, an operating system (OS) that connects to a given LUN may think the LUN is 2 TB, even though you have only allocated 250 GB of physical storage for it, Schulz said.
Thin provisioning allows you to plan for future growth without your client having to buy all of its expected storage hardware up front. In a typical "fat provisioning" model, each LUN's capacity corresponds to physical storage. That means that your client will have to buy as much space as it anticipates needing for the next few years, Schulz said. While it's possible to allocate a smaller amount of space for now and transfer its data to a larger provision as needed, that process is slow and could result in downtime for your client.
Thin provisioning allows you to essentially overbook a SAN's storage, promising a total capacity to the LUNs that is greater than the SAN physically has. As those LUNs fill up and start to reach the system's physical capacity, you can add more units to the SAN -- often in a hot-swappable way, Franco said. But because this approach to storage area network design requires more maintenance down the road, it's best for stable environments where a client can fairly accurately predict how each LUN's storage needs will grow, Schulz said.
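The overbooking arithmetic is simple enough to sketch. The following Python example is an assumption-laden illustration rather than any array's management API: the `ThinLun` class, the `pool_report` function and the 80% alert threshold are all hypothetical. It compares what the LUNs have been promised with what the pool physically holds, and flags when it is time to add disks.

```python
from dataclasses import dataclass


@dataclass
class ThinLun:
    name: str
    advertised_gb: int   # size the server's OS believes the LUN has
    used_gb: int         # physical space actually consumed so far


def pool_report(luns, physical_gb, alert_fraction=0.8):
    """Summarize overcommitment for a thin-provisioned pool."""
    advertised = sum(lun.advertised_gb for lun in luns)
    used = sum(lun.used_gb for lun in luns)
    return {
        "overcommit_ratio": advertised / physical_gb,
        "used_fraction": used / physical_gb,
        # Assumed policy: order more capacity once usage crosses the threshold.
        "needs_more_disks": used >= alert_fraction * physical_gb,
    }


luns = [ThinLun("db", 2048, 250), ThinLun("mail", 2048, 150)]
report = pool_report(luns, physical_gb=1000)
# Two 2 TB LUNs on 1,000 GB of disk: roughly 4x overcommitted, 40% used.
```

Tracking the used fraction over time is what lets you add capacity before the LUNs collectively hit the physical limit, which is the ongoing maintenance Schulz refers to.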
Security
With several servers able to share the same physical hardware, it should be no surprise that security plays an important role in a storage area network design. Your client will want to know that servers can only access data if they're specifically allowed to. If your client is using iSCSI, which runs on a standard Ethernet network, it's also crucial to make sure outside parties won't be able to hack into the network and have raw access to the SAN.
Most of this security work is done at the SAN's switch level, Franco said. Zoning allows you to give only specific servers access to certain LUNs, much as a firewall allows communication on specific ports for a given IP address. If an outward-facing application, like a website, needs to access the SAN, you should configure the switch so that only that application server's IP address can reach the storage, Franco said.
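Conceptually, this kind of access control is an allow list with a default of "deny." The Python sketch below models that idea only; it is not a switch configuration. The server and LUN names are made up, and real fabrics identify initiators by Fibre Channel World Wide Names or iSCSI initiator names rather than friendly host names.

```python
# Hypothetical allow list: which server may see which LUNs.
ZONES = {
    "web01": {"lun_web"},
    "db01": {"lun_db", "lun_logs"},
}


def can_access(server, lun, zones=ZONES):
    """Default deny: a server sees a LUN only if it is explicitly listed."""
    return lun in zones.get(server, set())


assert can_access("web01", "lun_web")
assert not can_access("web01", "lun_db")      # web server cannot reach database storage
assert not can_access("unknown", "lun_web")   # unlisted servers see nothing
```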
If your client is using virtual servers, the storage area network design will also need to make sure that each virtual machine (VM) has access only to its LUNs, Schulz said. Virtualization complicates SAN security because you cannot limit access to LUNs by physical controllers anymore -- a given controller on a physical server may now be working for several VMs, each with its own permissions. To restrict each server to only its LUNs, set up a virtual adapter for each virtual server. This will let your physical adapter present itself as a different adapter for each VM, with access to only those LUNs that the virtualized server should see.
Replication and disaster recovery
With so much data stored on a SAN, your client will likely want you to build disaster recovery into the system. SANs can be set up to automatically mirror data to another site, which could be a failsafe SAN a few meters away or a disaster recovery (DR) site hundreds or thousands of miles away.
If your client wants to build mirroring into the storage area network design, one of the first considerations is whether to replicate synchronously or asynchronously. Synchronous mirroring means that as data is written to the primary SAN, each change is sent to the secondary and must be acknowledged before the next write can happen.
While this ensures that both SANs are true mirrors, synchronization introduces a bottleneck. If latency to the secondary site is even as high as 100 to 200 milliseconds (msec), your system will slow down because the primary SAN has to wait for each confirmation, Schulz said. Although there are other factors, latency is often related to distance; synchronous replication is generally possible up to about 6 miles, Franco said.
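A back-of-the-envelope calculation shows why latency matters so much for synchronous mirroring. The Python sketch below assumes a single stream of writes in which each write must be acknowledged before the next begins, and it assumes light travels through fiber at roughly 200 km per millisecond (about two-thirds of its speed in a vacuum). It counts propagation delay only; switches, protocol overhead and the remote array's own write time add to it. The function names are hypothetical.

```python
FIBER_KM_PER_MS = 200.0   # assumed propagation speed in optical fiber


def round_trip_ms(distance_km):
    """Propagation-only round trip between the two sites."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_serial_writes_per_sec(local_write_ms, rtt_ms):
    """Upper bound when every write waits for the remote acknowledgment."""
    return 1000.0 / (local_write_ms + rtt_ms)


# About 6 miles is roughly 10 km: the wire adds only about 0.1 ms per write.
short_haul = round_trip_ms(10)

# At 100 ms of round-trip latency, even an instantaneous local write
# is capped at 10 acknowledged writes per second.
slow_link = max_serial_writes_per_sec(local_write_ms=0, rtt_ms=100)
```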
The alternative is to asynchronously mirror changes to the secondary site. You can configure this replication to happen as often as every second, or every few minutes or hours, Schulz said. This means that your client could permanently lose some data if the primary SAN goes down before it has a chance to copy its latest changes to the secondary, so your client should use its recovery point objective (RPO) to determine how often it needs to mirror.
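One simple way to frame that calculation, sketched in Python below, is to treat the worst-case data loss as the replication interval plus the time it takes to ship a batch of changes to the secondary. That model, the function names and the example numbers are assumptions for illustration; your client's actual exposure depends on its change rate and link speed.

```python
def worst_case_loss_seconds(interval_s, transfer_s=0.0):
    """Oldest unreplicated change if the primary fails at the worst moment."""
    return interval_s + transfer_s


def meets_rpo(interval_s, rpo_s, transfer_s=0.0):
    """True if the replication schedule keeps potential loss within the RPO."""
    return worst_case_loss_seconds(interval_s, transfer_s) <= rpo_s


# Client RPO of 15 minutes (900 seconds):
five_minute_ok = meets_rpo(interval_s=300, rpo_s=900, transfer_s=60)   # True
hourly_ok = meets_rpo(interval_s=3600, rpo_s=900)                      # False
```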