Key features of a distributed file system
Distributed file systems enable users to access file data that is spread across multiple storage servers. A DFS should uphold data integrity and be secure and scalable.
A DFS can spread the data of a single computing system across several servers, so client systems can use multiple storage resources as if they were local storage. This gives organizations access to data in a way that is easy to scale, secure and convenient.
A DFS enables hosts to access file data directly from multiple locations. For example, NFS is a distributed file system protocol that lets a client access files stored on a remote server over a network, typically a LAN, as if they were local. SMB is another widely used protocol for the same purpose. Admins can scale a DFS quickly by adding nodes. A DFS should also keep redundant copies of data so that a drive failure does not cause data loss.
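The article does not say how a DFS decides which nodes hold the redundant copies. One common technique is consistent hashing with a replication factor, which also keeps data movement small when admins add a node. The sketch below illustrates that technique under stated assumptions; it is not how NFS, SMB or any particular DFS places data, and `ReplicaRing`, `place` and the node names are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map a string to a ring position using the first 8 bytes of SHA-256.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ReplicaRing:
    """Hypothetical placement ring: each file is stored on `replicas` distinct nodes."""

    def __init__(self, replicas: int = 3, vnodes: int = 64):
        self.replicas = replicas
        self.vnodes = vnodes                # virtual points per node even out the load
        self._points: list[int] = []        # sorted ring positions
        self._owner: dict[int, str] = {}    # ring position -> node name
        self.nodes: set[str] = set()

    def add_node(self, node: str) -> None:
        # Scaling out: the new node takes over only the ring segments next to its points.
        self.nodes.add(node)
        for i in range(self.vnodes):
            p = _hash(f"{node}#{i}")
            self._owner[p] = node
            bisect.insort(self._points, p)

    def place(self, path: str) -> list[str]:
        # Walk clockwise from the file's position, collecting distinct nodes.
        if len(self.nodes) < self.replicas:
            raise ValueError("not enough nodes for the requested replica count")
        start = bisect.bisect(self._points, _hash(path))
        chosen: list[str] = []
        for k in range(len(self._points)):
            node = self._owner[self._points[(start + k) % len(self._points)]]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == self.replicas:
                    break
        return chosen


if __name__ == "__main__":
    ring = ReplicaRing(replicas=3)
    for n in ["node-a", "node-b", "node-c", "node-d"]:
        ring.add_node(n)
    print(ring.place("/projects/report.docx"))  # three distinct node names
```

Because adding a node only inserts new points on the ring, a file's placement either stays the same or gains the new node; files are not reshuffled across the whole cluster, and losing one drive still leaves the other copies.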
Features of distributed file systems
Key features of a DFS include the following:
- Transparency. Transparency is a design property that hides the details of how and where files are stored from users and applications, so the distributed system appears to be a single file system. There are four types:
- Structure transparency. The actual structure of the DFS, such as the number of file servers and storage devices, is hidden from users.
- Access transparency. Users access remote files with the same operations they use for local files, and after a secure login, the DFS displays the user's file resources regardless of the user's location.
- Replication transparency. Users do not see that a file is replicated across different nodes of the DFS; they see and work with a single logical file.
- Naming transparency. File names should not indicate the location of any given file and should not change when the files move among storage nodes supported by the DFS.
- Performance. This metric measures the time needed to process user file access requests and includes processor time, network transmission time, and the time needed to access the storage device and deliver the requested content. DFS performance should be comparable to that of a local file system.
- Scalability. As storage requirements grow, organizations typically deploy additional storage resources. The DFS should absorb these resources as capacity scales upward, without users noticing any drop in performance.
- High availability. Equipment managed by a DFS, like any storage infrastructure, should stay in service without interruption. If an issue such as a node failure or drive crash does occur, the DFS must remain operational and quickly redirect requests to alternate storage resources. DR plans must include provisions for backing up and recovering DFS servers as well as storage devices.
- Data integrity. When multiple users access the same file storage systems, and possibly the same files, the DFS must coordinate concurrent access requests, for example with file locking, so that file access is not disrupted and file contents are not corrupted.
- High reliability. Another way to ensure data availability and survivability in a disruption is to have the DFS create backup copies of files specified by users. This is complementary to high availability and ensures that files and databases are available when needed.
- Security. As with any data storage arrangement, data must be protected from unauthorized access and cyber attacks that could damage or destroy the data. Encryption of the data -- both at rest and in transit -- helps increase data security and protection.
- User mobility. The DFS makes a user's home directory and file resources available on whichever node the user logs in to.
- Namespaces. A namespace is a logical hierarchy of names that maps the paths users see to the physical locations where files are stored. A single namespace spanning multiple file systems presents them to users as one file system with one directory tree. Namespaces also keep names separate, reducing the chance that the contents of one namespace interfere with those of another.
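To make naming transparency, replication transparency and high availability more concrete, here is a minimal sketch of a single namespace modeled as a lookup table from logical paths to physical replicas. It is an illustration only: `Namespace`, `publish`, `move`, `resolve`, the server names and the paths are all hypothetical, and real implementations add caching, access control and consistency handling that this omits.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Hypothetical namespace: logical paths users see -> physical copies."""

    # logical path -> list of (server, physical path) locations holding a copy
    table: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    down: set[str] = field(default_factory=set)  # servers currently unavailable

    def publish(self, logical: str, replicas: list[tuple[str, str]]) -> None:
        self.table[logical] = list(replicas)

    def move(self, logical: str, old: tuple[str, str], new: tuple[str, str]) -> None:
        # Naming transparency: relocating a copy changes only the mapping,
        # never the logical name the user works with.
        locs = self.table[logical]
        locs[locs.index(old)] = new

    def resolve(self, logical: str) -> tuple[str, str]:
        # Replication transparency and failover: return the first copy on a server
        # that is up; the user never chooses among replicas.
        for server, path in self.table[logical]:
            if server not in self.down:
                return server, path
        raise FileNotFoundError(f"no live replica for {logical}")


if __name__ == "__main__":
    ns = Namespace()
    ns.publish("/corp/finance/q3.xlsx", [("fs1", "/vol2/q3.xlsx"), ("fs2", "/data/q3.xlsx")])
    print(ns.resolve("/corp/finance/q3.xlsx"))  # ('fs1', '/vol2/q3.xlsx')
    ns.down.add("fs1")                           # simulate a node failure
    print(ns.resolve("/corp/finance/q3.xlsx"))  # ('fs2', '/data/q3.xlsx')
```

The user keeps opening `/corp/finance/q3.xlsx` throughout; only the table changes when a server fails or a copy is moved.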
Standalone DFS versus domain-based DFS namespaces
Standalone DFS namespaces do not use Active Directory (AD). Instead, their configuration is stored locally on a single namespace server, which hosts the namespace's own unique root. They cannot be linked with other DFS entities, and they are less common than domain-based DFS namespaces.
Domain-based DFS namespaces store their configuration in AD. Clients reach them through the domain name rather than through a specific server, and the namespace can be hosted on multiple namespace servers, which makes it easier to use and more widely available across the organization.
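In Windows DFS Namespaces, the difference between the two types shows up directly in the root path clients use: a stand-alone namespace is addressed as `\\ServerName\RootName`, while a domain-based namespace is addressed as `\\DomainName\RootName`. The small helper below only illustrates that naming convention; `namespace_root` and the example server, domain and root names are hypothetical, and the function does not create or query a real namespace.

```python
from typing import Optional


def namespace_root(root_name: str, server: Optional[str] = None,
                   domain: Optional[str] = None) -> str:
    # Stand-alone:  \\ServerName\RootName  (tied to one namespace server)
    # Domain-based: \\DomainName\RootName  (configuration held in AD)
    if (server is None) == (domain is None):
        raise ValueError("pass either server= (stand-alone) or domain= (domain-based)")
    host = server if server is not None else domain
    return "\\\\" + host + "\\" + root_name


if __name__ == "__main__":
    print(namespace_root("Public", server="fs1"))               # \\fs1\Public
    print(namespace_root("Public", domain="corp.example.com"))  # \\corp.example.com\Public
```

Because the domain-based path names the domain rather than a server, the servers behind it can change without clients updating the path they use.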
Strengths and limitations of distributed file systems
DFS technology improves file survivability by distributing critical files and databases across multiple storage devices. Some of these devices can sit at alternate company locations or in the cloud, providing additional DR support. Admins can also move data between storage nodes without changing the file names users rely on.
Difficulties may arise when file servers, file storage applications or other storage protocols change and are not compatible with the DFS or NFS implementation. Data loss is a risk if security provisions are not in place, and moving data from one storage node to another can also result in lost data.
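The article does not prescribe a migration procedure, but one common safeguard against losing data during a move is to copy first, verify a checksum, and delete the source only after verification succeeds. The sketch below assumes two local directories stand in for storage nodes; `verified_move`, `sha256_of` and the `.partial` staging suffix are hypothetical, and it does not handle files that change while they are being copied.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    # Hash in 1 MiB chunks so large files are not read into memory at once.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verified_move(src: Path, dst_dir: Path) -> Path:
    """Copy src into dst_dir, verify the copy, and only then delete src."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    tmp = dst_dir / (src.name + ".partial")  # hypothetical staging name
    final = dst_dir / src.name
    expected = sha256_of(src)
    shutil.copy2(src, tmp)                   # copy contents and metadata
    if sha256_of(tmp) != expected:
        tmp.unlink()                         # discard the bad copy; source untouched
        raise OSError(f"checksum mismatch copying {src}")
    os.replace(tmp, final)                   # atomic rename within the same directory
    src.unlink()                             # remove the source only after verification
    return final


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = Path(d) / "node-a" / "ledger.csv"
        src.parent.mkdir()
        src.write_bytes(b"id,amount\n1,100\n")
        moved = verified_move(src, Path(d) / "node-b")
        print(moved.name, src.exists())  # ledger.csv False
```

If the copy or verification fails at any point, the original file is still in place on the source node.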