I got into an interesting set of exchanges over CopyData with the fine folks of Actifio and Catalogic Software. In general, I’m a fan of the concept of CopyData software. Both solutions reduce the amount of storage consumed by copies of production data (which can be up to 60% of all enterprise data) using techniques such as snapshots, redirection and replication. An enterprise may also want to consider data validation to improve the quality of that data, so that insights can be gathered from it more effectively. Both companies have plenty of material to reference. Catalogic presented at Storage Field Day 7, and I wrote a preview before I attended the event. Actifio attended Tech Field Day 4 way back when. If you want to get into the technical weeds, follow the links to their Tech Field Day presentations.
I’m more concerned with the architect’s view of these types of solutions. Ultimately these solutions are a precursor to Data Virtualization, which is different from CopyData. The aim of Data Virtualization is to provide access to data regardless of the data transport or storage location while maintaining the same service levels. Data Virtualization is designed to be used with production data in addition to CopyData.
CopyData is designed to be a complement to production data. In CopyData solutions, any non-production instance of data is provided by the CopyData solution. My primary concern with these solutions is the need to change the workflow of the end users of copy data.
One of the biggest challenges that CopyData presents is that it can’t be invisible across every transport. If you are using, say, NFS as the transport protocol, then you simply have your end users point their NFS client to the CopyData solution’s NFS export. When data is refreshed from production, the refresh is virtually invisible to the end user.
However, if you are using block storage (Fibre Channel, iSCSI or even DAS), CopyData is disruptive. To make data refreshes less disruptive, you need to change your transport to something such as SMB (CIFS for my fellow TFD delegates) or NFS. If SMB or NFS isn’t an option, then you either have to follow a potentially disruptive mount and dismount process to refresh copy data, or install an agent on the server. Bottom line is that end users will have to adopt a new process.
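To make the contrast concrete, here’s a rough Python sketch of what a refresh looks like from the end user’s chair, depending on the transport. Everything in it (the `Transport` enum, the `refresh_steps` function, the wording of the steps) is made up for illustration; it isn’t Actifio’s or Catalogic’s API, and the block storage steps are just my reading of the mount and dismount process described above, assuming no agent is installed.

```python
from enum import Enum
from typing import List


class Transport(Enum):
    """Hypothetical labels for how a server reaches its copy data."""
    NFS = "nfs"
    SMB = "smb"
    FIBRE_CHANNEL = "fc"
    ISCSI = "iscsi"
    DAS = "das"


# File protocols: the client keeps pointing at the same export or share.
FILE_TRANSPORTS = {Transport.NFS, Transport.SMB}


def refresh_steps(transport: Transport) -> List[str]:
    """Return the manual steps an end user performs to refresh copy data.

    Assumption: no agent is installed on the server, so block storage
    needs the manual mount and dismount process.
    """
    if transport in FILE_TRANSPORTS:
        # The refresh happens behind the export; nothing changes for the user.
        return []
    return [
        "stop the application or database",
        "dismount the filesystem",
        "request the refreshed copy",
        "mount the filesystem",
        "run consistency checks",
    ]


if __name__ == "__main__":
    for transport in Transport:
        steps = refresh_steps(transport)
        print(f"{transport.name}: {len(steps)} manual step(s)")
```

The empty list for NFS and SMB is the whole point: the fewer steps the end user sees, the less process change you have to sell.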
The adoption of new processes is the easiest way to add risk to any technology improvement. If you have an established SDLC where copy data is refreshed by, say, a DB restore, then it will be a hard sell to get developers to change. If a developer has to mount and dismount a filesystem and still run consistency checks on the DB, there’s going to be a conversation. Ultimately you have to answer the tough question of what’s in it for them vs. the current process. If the answer is that Infrastructure gets to save money on storage, then I can predict how that conversation will end.
So CopyData, I like the concept but the execution is a long way off for a good portion of the market. For me, CopyData is a stopgap until Data Virtualization is mature.
(Note: Catalogic is a sponsor of Storage Field Day 7, which I attended. I’d go through the whole disclaimer thing, but if you wanted journalist-quality disclosure you shouldn’t be visiting some random guy’s enterprise IT blog.)