CopyData yeah… Long live Data Virtualization

I got into an interesting set of exchanges over CopyData with the fine folks of Actifio and Catalogic Software. In general, I'm a fan of the concept of CopyData software. Both solutions reduce the footprint of copies of production data on storage (copies account for up to 60% of all enterprise data) using techniques such as snapshots, redirection and replication. Enterprises may also want to consider data validation to improve the quality of these vast quantities of data so that insights can be gathered more effectively. Both companies have plenty of material to reference: Catalogic presented at Storage Field Day 7, and I wrote a preview before I attended the event; Actifio attended Tech Field Day 4 way back when. If you want to get into the technical weeds, follow the links to their Tech Field Day presentations.



Architect’s View

I'm more concerned with the architect's view of these types of solutions. Ultimately, these solutions are a precursor to Data Virtualization, which is different from CopyData. Data Virtualization's aim is to provide access to data regardless of the data transport or storage location while maintaining the same service levels. Data Virtualization is designed to be used with production data in addition to copy data.

CopyData is designed to be a complement to production data. In CopyData solutions, any non-production instance of data is provided by the CopyData solution. My primary concern with these solutions is the need to change the workflow of the end users of copy data.

One of the biggest challenges CopyData presents is its inability to be invisible across transports. If you are using, say, NFS as the transport protocol, you simply have your end users point their NFS clients at the CopyData solution's NFS export. When data is refreshed from production, it's virtually invisible to the end user.

However, if you are using block storage (Fibre Channel, iSCSI or even DAS), CopyData is disruptive. You either need to change your transport to something such as SMB (CIFS for my fellow TFD delegates) or NFS to make data refreshes less disruptive, or you have to follow a potentially disruptive mount-and-dismount process to refresh copy data. Alternatively, an agent needs to be installed on each server. The bottom line is that end users will have to adopt a new process.
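To make the disruption concrete, the block-storage refresh cycle looks roughly like the sketch below. This is a minimal illustration under stated assumptions: a Linux host, a hypothetical device path and mount point, and a vendor-specific re-presentation step that is only stubbed out as a comment. It is not any vendor's documented procedure.

```shell
#!/bin/sh
# Hypothetical refresh of a block-backed copy-data volume.
# Device path, mount point, and rescan step are illustrative assumptions.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
MOUNT_POINT=/mnt/copydata
DEVICE=/dev/mapper/copydata-lun
LOG=""

run() {
    # Record each step, then execute it (or just print it in dry-run mode).
    LOG="$LOG[$*]"
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Quiesce consumers, then unmount the stale copy.
run umount "$MOUNT_POINT"
# 2. (Vendor-specific step: the CopyData appliance re-presents the
#    refreshed snapshot, e.g. via its CLI or API.)
# 3. Rescan so the host sees the new generation of the LUN.
run sh -c 'echo "- - -" > /sys/class/scsi_host/host0/scan'
# 4. Remount and let the application run its own consistency checks.
run mount "$DEVICE" "$MOUNT_POINT"
```

The point of the sketch is that every refresh forces this unmount/rescan/remount churn onto the data's consumer, which NFS-style presentation avoids.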

The adoption of new processes is the easiest way to add risk to any technology improvement. If you have an established SDLC where copy data is refreshed by, say, a DB restore, then it will be a hard sell to get developers to change. If a developer has to mount and dismount a filesystem and still run consistency checks on the DB, there's going to be a conversation. Ultimately, you have to answer the tough question of what's in it for them versus the current process. If the answer is that Infrastructure gets to save money on storage, then I can predict how that conversation will end.

So CopyData, I like the concept but the execution is a long way off for a good portion of the market. For me, CopyData is a stopgap until Data Virtualization is mature.

(Note: Catalogic is a sponsor of Storage Field Day 7, which I attended. I'd go through the whole disclaimer thing, but if you wanted journalist-quality disclosure you shouldn't be visiting some random guy's enterprise IT blog.)


Published by Keith Townsend

Now I'm @CTOAdvisor

6 thoughts on “CopyData yeah… Long live Data Virtualization”

  1. Keith,

    Thanks for bringing this to this forum. (I had to laugh when I saw the last Actifio post in our exchange that called out 300 or so customers in x number of countries – basically had nothing to do w/ the exchange. I would point out that Catalogic has over 1000 customers in more countries, but back to the point at hand.)

Glad you're a “fan of the concept”. I find, having been in this space for a while, that the words we use to describe some technologies or capabilities sometimes tend to fall into the realm of what the industry analysts decide to call a particular market.

    I think, based on the piece, we both agree that copies of data created for multiple business purposes are becoming a major headache for storage managers everywhere. Every LOB whose business process requires a copy of production data in order to meet its business objectives puts a strain on IT. Today you have development (test/dev or DevOps) and sales, marketing and finance (analytics) that require a copy of production data in order to do their jobs. Who knows what the next group is and what the business process will be, but one thing we do know: it will require a copy of the production data. At the end of the day, process or not, these copies cost a lot of money: CapEx from a storage perspective (up to 60%, as your piece calls out), OpEx from a data management perspective, and who knows what from a security perspective (copies of data 'lying around' everywhere).

    I also agree w/ you 100% when you talk about process. I have written a number of blogs on how "…the hardest thing to change in IT is process, not technology." As our Twitter exchange pointed out, when the line of business brings up the new process, it is a bit easier. But how about not changing processes, and instead automating the processes IT already has? The core functionality that Catalogic provides with ECX is an actionable catalog that enables automation and orchestration of your existing processes on your snap data (snaps, vaults and mirrors), so you can then use that data for different business operations or use cases like test/dev or analytics, as well as for recovery and automated disaster recovery. I am not talking about 'changing' process, but automating the process.

    In addition, it is important to note that regardless of how the interface labels it, the data IS made available to the end user with what we term instant virtualization. This allows the data to be made available to the users (file OR block) without the user having to worry about it.

    What is more important, I feel, is how users of the data, say development, get access to the latest copy of production data to develop and test against today. In all my client meetings, customers tell me, "we go to our last full backup and we do a full restore to make the data available." To which I ask, "do you do this every morning so they have access to the latest and greatest data?" And the reply is, "no." So in some ways, this 'new' process, automating the process of making your snapshot data instantly available through instant data virtualization, can only increase the company's competitive advantage, reduce data sprawl, reduce the time and complexity of the data recovery process, and reduce the management of data, giving valuable time back to the storage admin.

    Thanks again Keith, just wanted to try to shed some clarity on how we at Catalogic think about data access and data availability.

