I recently attended Storage Field Day 7 (see disclosure below) and was introduced to the concept of data virtualization. At a high level, data virtualization is the abstraction of data services from the physical data. Initially, the concept didn't resonate with me. It reminded me of trying to understand network virtualization for the first time. I questioned what exactly is being abstracted and how it adds value. I had to re-watch the Primary Data SFD 7 presentation by their CTO.
Comparison to Server Virtualization
The advantage of data virtualization is the ability to use the underlying storage platform to its full capability while giving operational flexibility to infrastructure admins. The goal is very similar to that of server and network virtualization. Abstracting data services from the physical data location adds flexibility. In server virtualization, for example, you can move compute from one physical server to another. The compute service is abstracted from the hardware, which, to a degree, frees the application from the constraints of the physical underlay.
Data virtualization aims to do the same. By abstracting the data services, the application can be free of the physical underlay. A request to move data from Tier 1 storage to Tier 2 or 3 doesn't require a physical configuration change. Primary Data uses the pNFS protocol to route data traffic across physical boundaries.
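The core idea, separating the logical path an application sees from the physical tier holding the bytes, can be sketched in a few lines. This is a minimal illustration of the concept, not Primary Data's implementation; the class and tier names are my own invention.

```python
# Sketch of data virtualization: the application addresses a stable
# logical path, while a control plane maps that path to a physical tier.
# Migrating between tiers only changes the mapping, not the app's view.
class DataVirtualizer:
    def __init__(self):
        self._placement = {}  # logical path -> physical tier

    def place(self, path, tier):
        self._placement[path] = tier

    def migrate(self, path, new_tier):
        # The application keeps the same logical path; only the
        # control-plane mapping moves, similar in spirit to pNFS layouts.
        self._placement[path] = new_tier

    def resolve(self, path):
        return self._placement[path]

dv = DataVirtualizer()
dv.place("/vol/db/data01", "tier1-flash")
dv.migrate("/vol/db/data01", "tier2-hybrid")
print(dv.resolve("/vol/db/data01"))  # tier changed, path untouched
```

The point of the sketch is that the migrate step is purely a metadata operation from the application's perspective, which is exactly what makes the transition seamless.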
The advantage of Primary Data's approach is that the transition is seamless to the application. The pNFS protocol handles the data migration. However, as Primary Data made a point of explaining, data routing is just one component. The second is centralization of the metadata needed to manage storage. Think about how much of a storage system is kept outside of the 1's and 0's themselves. File system attributes, access control, and file locks are just some examples.
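To make the examples above concrete, here is a rough sketch of the out-of-band metadata a storage system tracks alongside the raw blocks. The field names are illustrative, not any vendor's schema.

```python
# Illustrative record of storage metadata that lives outside the data
# itself: file system attributes, access control, and lock state.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FileMetadata:
    path: str
    size: int                  # file system attribute
    mtime: float               # file system attribute
    mode: int                  # permission bits (access control)
    acl: list = field(default_factory=list)    # access-control entries
    lock_holder: Optional[str] = None          # current lock owner, if any

md = FileMetadata(path="/vol/app/log", size=4096, mtime=0.0, mode=0o640)
md.lock_holder = "node-07"  # keeping this consistent across nodes is the
                            # hard part once metadata is centralized
```

Every one of these fields has to stay correct and available when the metadata is pulled out of the array and managed centrally, which is where the scale and reliability challenges below come from.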
When abstracting metadata from the physical storage, all kinds of challenges arise, including scale and reliability. Handling metadata on a single array is difficult enough. When a solution proposes a distributed system for storage metadata, I get nervous. Primary Data isn't proposing their solution only for lower-level environments such as test and dev, but for Tier 1 workloads as well. I just don't know if I'm comfortable trusting the metadata of my storage system to a centralized solution.
The Future of Enterprise Storage
I've theorized that we'd be able to abstract the data center completely. This doesn't mean we'd get rid of physical workloads, but rather that we'd have the ability to abstract the service layer from the underlay. Data center abstraction can happen when automation tools can depend on services such as network, server, and data virtualization to live migrate entire infrastructures, not just server workloads.
Looking at solutions such as OpenStack, it’s obvious that server virtualization provides this capability. Network virtualization is maturing to a level that supports this abstraction. According to Primary Data, storage is the last leg to solve. Data virtualization is one way to achieve the goal. Another method is object storage which I’ll discuss in a future post.
Disclosure: Tech Field Day paid for my transportation and expenses to attend Storage Field Day. I’m not required to write about any of the vendors that presented at the event nor does any vendor preview my coverage prior to publishing.