Category: Backup

Rubrik’s Masterless Cluster approach

I’ve written a blog post about Atlas, our distributed filesystem, before, where I tried to differentiate between pre-packaged solutions that ultimately reuse legacy architectures with inherited, built-in single points of failure, and a truly distributed system in which each node in the cluster can perform all of the tasks required for the successful operation of that cluster.

Atlas is just one piece of the puzzle. We also need to differentiate between a partially distributed design, i.e. a distributed storage layer combined with a legacy management architecture, and a distributed storage layer that is part of a modern masterless architecture. In Rubrik’s case, all components operate in a distributed, masterless design. Each 2U appliance contains 4 individual nodes (individual servers), and each server runs our Cloud Data Management software as part of the larger cluster. If a node (or an individual component inside a node) fails, it has no impact on the correct operation of the cluster as a whole. The only shared component is the redundant power supply.

Can the same be said of running a legacy software stack on top of a distributed storage layer? Are all components running completely distributed? If the management components fail, does a distributed storage layer help you?

Let’s start by looking at a legacy stack example:

If you think about this from the point of view of taking a backup, most of these components have a role to play. If one of them is not operational, no backups are taken, and the RPO keeps increasing until the backup solution is repaired.

Now let’s focus on distributing part of the legacy stack:

Again thinking about this from the point of view of taking a backup: assuming the system/management components are operational, you now have a scale-out, redundant storage layer, which brings additional flexibility but does not lessen the burden on the rest of the stack. If something in the system/management layer fails, the redundant storage layer does not save you; your RPO again keeps increasing until the infrastructure is restored.

Now contrast this with Rubrik’s masterless approach:

Since all processes run on all nodes (we don’t have the traditional demarcations of media server, search server, proxy/data mover, and so on, but we do run the processes/applications that provide all the CDM functions in a distributed manner across the cluster), we can similarly lose hardware components (because everything fails eventually) without bringing down the master server, simply because there is no master server. Say a node is ingesting backup data from your primary system and that node fails: the task is never marked as complete in our distributed system, so another node in the cluster picks up the job and makes sure the backup is taken.
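
To make that idea concrete without going into our actual implementation, here is an illustrative sketch of the general pattern: a lease-based job table that any node can claim work from. The in-memory table, names, and lease length are all made up for the example.

```python
import time

# Illustrative only: a lease-based job table as used in many masterless clusters.
# In a real system this table would live in a replicated metadata store.
LEASE_SECONDS = 60

jobs = {
    "backup-vm-042": {"state": "pending", "owner": None, "lease_expires": 0.0},
}

def try_claim(job_id: str, node_id: str) -> bool:
    """Claim a job if it is unowned or its previous owner's lease has lapsed."""
    job = jobs[job_id]
    now = time.time()
    if job["state"] == "done":
        return False
    if job["owner"] is None or job["lease_expires"] < now:
        job.update(state="running", owner=node_id, lease_expires=now + LEASE_SECONDS)
        return True
    return False

def complete(job_id: str, node_id: str) -> None:
    """Only the current owner may mark the job complete."""
    job = jobs[job_id]
    if job["owner"] == node_id:
        job["state"] = "done"

# Node A claims the backup job, then fails before calling complete().
try_claim("backup-vm-042", "node-a")
# Once the lease expires, node B can pick the job up and finish it:
# try_claim("backup-vm-042", "node-b")  -> True after LEASE_SECONDS elapse
```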

Rubrik has opted to build a truly scalable, masterless cluster, neatly leapfrogging legacy approaches of the past by taking lessons from the massive-scale modern environments found at Google, Facebook and the like. To be fair, when legacy backup systems were designed, these giants’ shoulders were not yet available to stand on…

The bigger picture

Even though the principles and technology behind Rubrik are fascinating, it is important not to lose sight of the bigger picture. The goal is to deliver a greatly simplified approach to data management; that the architecture is a converged, masterless, scale-out, self-healing system is nice, but the real benefit lies in the operational use.

An additional benefit of this design is that because we have a unified system, we can present it via a unified RESTful API. Integrating with or automating the storage target is not separate from integrating with the management components, simply because the system is one. Contrast that with running multiple different components, each with its own (or non-existent) integration points, and the practical experience will be night and day. A better (much better) experience is ultimately what we are after, after all.
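
As a taste of what a single integration point feels like, the sketch below drives both an SLA assignment and a snapshot query through one HTTP API. The hostname, paths, and payloads here are hypothetical placeholders to illustrate the idea, not our documented endpoints.

```python
import requests

# Hypothetical endpoint and paths, shown only to illustrate a unified API surface.
BASE = "https://rubrik-cluster.example.com/api"
session = requests.Session()
session.headers["Authorization"] = "Bearer <token>"

# Assign a protection policy (SLA) to a virtual machine...
session.post(f"{BASE}/sla_assignment", json={"vm": "db-server-01", "sla": "Gold"})

# ...and query available snapshots for restore, against the same API.
snapshots = session.get(f"{BASE}/snapshots", params={"vm": "db-server-01"}).json()
print(snapshots)
```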

Finally, I wanted to point out that the pictures above look at a traditional, on-premises-only environment. But what about running in the public cloud, multi-cloud, hybrid environments, branch offices, and so on? Again, the model we use delivers the flexibility to run anywhere, since you don’t have to drag these interlinked legacy pieces along.

 

Rubrik Alta feature spotlight: AWS.

With the announcement of Rubrik CDM version 4, codenamed Alta, we have added tons of new features to the platform. Since most of the release announcements focus on providing an overview of all the goodies, I wanted to dive more deeply into one specific topic: our integration with AWS.

Rubrik has a long history of working successfully with Amazon Web Services; we have integrated with Amazon S3 since our first version. In that initial release you could already use Amazon S3 as an archive location for on-premises backups: take local backups of VMware virtual machines, keep them on the local Rubrik cluster for a certain period (short-term retention), and optionally move longer-term retention data into Amazon S3. The idea was to leverage cloud storage economics and resiliency for backup data and at the same time have an active archive for longer-term retention, instead of an offline copy on tape. Additionally, the way our metadata system works allows us to retrieve only the specific bits of data you need to restore, instead of pulling down the entire VMDK file and incurring egress costs that could kill the cloud storage economics benefit.
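
Our metadata layer is what decides which bytes to fetch, but the S3 capability it leans on is a plain ranged GET: you ask the object store for just the byte range that holds what you need. A minimal boto3 sketch of that primitive (bucket, key, and offsets are made up for the example):

```python
import boto3

s3 = boto3.client("s3")

# Fetch only the bytes that hold the file being restored, not the whole archived object.
# Bucket, key, and offsets are illustrative.
response = s3.get_object(
    Bucket="rubrik-archive-example",
    Key="vm-backups/db-server-01/snapshot-0042.blob",
    Range="bytes=1048576-2097151",   # the 1 MiB slice the metadata points at
)
chunk = response["Body"].read()
print(len(chunk), "bytes retrieved instead of the full object")
```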


Also note that there is no need to put a gateway device between the Rubrik cluster and Amazon S3; Rubrik natively leverages the S3 APIs.

The ability to archive to Amazon S3 is of course still here in version 4, but now all the supported sources besides VMware ESXi (Microsoft Hyper-V, Nutanix AHV, Physical Windows/Linux, Native SQL, and so on) can also leverage this capability.

Then in Rubrik CDM version 3.2 we added the ability to protect native AWS workloads by having a Rubrik cluster run inside AWS using EC2 for compute and EBS for storage.

We run a 4-node Rubrik cluster (protecting your data using erasure coding) in your preferred AWS location; the Rubrik AMI image is uploaded as a private image. We use the m4.xlarge instance type with 64GB RAM, a 256GB SSD (General Purpose SSD, GP2) and 24TB raw capacity (Throughput Optimised HDD, ST1), resulting in 15TB usable capacity before deduplication and compression.

Once the Cloud Cluster is running, you can protect your native AWS workloads using the connector-based approach, i.e. Windows and Linux filesets and Native SQL workloads in the public cloud.

Additionally, since you can now have both an on-premises Rubrik cluster and a Rubrik Cloud Cluster, you can replicate from your in-house datacenter to your public cloud environment and vice versa, or replicate from one AWS region to another.

Since the Cloud Cluster has the same capabilities as the on-premises one, it can also back up your AWS EC2 workloads and then archive that data to S3, essentially going from EBS storage to S3. (Christopher Nolan really likes this feature.)


Version 4 of Rubrik CDM extends our AWS capabilities by delivering on-demand application mobility, called CloudOn. The idea is that you can now take an application running on-premises and move it on the fly to AWS for DR, dev/test, or analytics scenarios.


The way it works: just as you have archived data to Amazon S3 since v1, once you decide to instantiate a workload in the public cloud you select “Launch On Cloud” from the Rubrik interface, and a temporary Rubrik node (spun up on demand in AWS, in the VPC of your choice) converts those VMs into cloud instances (i.e. from VMware ESXi to AWS AMI images). Once complete, the temporary Rubrik node powers down and is purged.
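
Without going into the details of our conversion pipeline, the underlying AWS building block for turning a VMDK that already sits in S3 into an AMI is the EC2 image-import API. A hedged boto3 sketch of that call (bucket and key names are placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Illustrative only: import a VMDK that already lives in S3 as an AMI.
task = ec2.import_image(
    Description="Converted VMware snapshot (example)",
    DiskContainers=[{
        "Description": "db-server-01 system disk",
        "Format": "vmdk",
        "UserBucket": {
            "S3Bucket": "rubrik-archive-example",
            "S3Key": "exports/db-server-01.vmdk",
        },
    }],
)
print("Import task started:", task["ImportTaskId"])
```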

[Image: Launch On Cloud]

Rubrik scans the configuration file of a VM to understand its characteristics (compute, memory, storage, etc.) and recommends a compatible cloud instance type, so you are not left guessing which resources you need to consume on AWS.
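
Conceptually the matching boils down to reading the VM’s vCPU and memory configuration and picking the smallest instance type that covers both. A toy illustration of that idea (the instance table is a small hand-picked subset, not our actual matching logic):

```python
# A few common instance types: (name, vCPUs, memory in GiB). Illustrative subset only.
INSTANCE_TYPES = [
    ("t3.medium", 2, 4),
    ("m5.xlarge", 4, 16),
    ("m5.2xlarge", 8, 32),
    ("m5.4xlarge", 16, 64),
]

def recommend(vcpus: int, memory_gib: int) -> str:
    """Return the smallest listed type that satisfies the VM's compute and memory."""
    for name, cpu, mem in INSTANCE_TYPES:
        if cpu >= vcpus and mem >= memory_gib:
            return name
    return "no match in table"

print(recommend(4, 12))   # -> m5.xlarge
```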

Alternatively, we can auto-convert the latest snapshot that goes to S3, so you don’t have to wait for the conversion step.

Data has gravity: as you accumulate more and more data, it starts to make more sense to move the application closer to the data, since moving the data to the application becomes less performant and more cost-prohibitive in terms of transport costs. So what if your data sits on-premises but your applications run in the public cloud?
For instance, say you want to perform business analytics using Amazon QuickSight but your primary data sources are in your private data center. You simply archive data to S3 as part of your regular Rubrik SLA (archive data older than x days), pick the point-in-time dataset you are interested in, use Rubrik to “Launch on Cloud”, and point Amazon QuickSight (or any other AWS service) to that data source.

Together these AWS integrations allow you to make use of the public cloud on your terms, in an extremely flexible way.

Atlas distributed filesystem, think outside the box.

Rubrik recently presented at Tech Field Day 12, and one of the sessions focused on our distributed filesystem, Atlas. As one of the SEs at Rubrik I’m in the field every day, (proudly) representing my company but also competing with other, more traditional backup and recovery vendors. What is more and more apparent is that these traditional vendors are also going down the appliance route to sell their solution into the market, and as such I sometimes get pushback from potential customers saying they can also get an appliance-based offer from their current supplier, or not immediately grasping why this model can be beneficial to them.
A couple of things I wanted to clarify first. When I say “also going down the appliance route”, I need to make clear that for us the appliance is purely a way to bring the solution to market; there is nothing special about the appliance as such. All of the intelligence in Rubrik’s case lies in the software, and we recently even started to offer a software-only version in the form of a virtual appliance for ROBO use cases.
Secondly, some traditional vendors can indeed deliver their solution in an appliance-based model, be it their own branded one or pre-packaged via a partnership with a traditional hardware vendor. I’m not saying there is something inherently bad about this; simplifying the acquisition of a backup solution via an appliance-based model is great. But there the comparison stops: it is still a legacy architecture with disparate software components, and typically these components (think media server, database server, search server, storage node, etc.) need individual love and care, daily babysitting if you will, to keep them going.
Lastly, from a component point of view our appliance consists of multiple independent (masterless) nodes that are each capable of running all tasks of the data management solution. In other words, there is no need to protect, or indeed worry about, individual software and hardware components, as everything runs distributed and can sustain multiple failures while remaining operational.

[Image: There is no spoon (box)]

So the difference lies in the software architecture, not the packaging. As such, we need to look beyond the box itself and dive into why starting from a clustered, distributed system as a base makes much more sense in today’s information era.

The session at TFD12 was presented by Adam Gee and Roland Miller. Adam leads Rubrik’s distributed filesystem, Atlas, which shares some architectural principles with a filesystem Adam previously worked on at Google called Colossus. Most items you store while using Google services end up on Colossus, which itself is the successor to the Google File System (GFS), bringing the concept of a masterless cluster to it and making it much more scalable. Not a lot of technical detail about Colossus is available on the web, but you can read a high-level article about it on Wired here.

Atlas, which sits at the core of the Rubrik architecture, is a distributed filesystem built from scratch with the Rubrik data management application in mind. It pools the local storage (DAS) resources of all nodes in the cluster into a global namespace. As nodes are added to the cluster, the global namespace grows automatically, increasing the capacity of the cluster. The local storage resources on each node consist of both SSDs and HDDs; the metadata of Atlas (and of the data management application) is stored in the metadata store (Callisto), which also runs distributed across all nodes on the SSD layer. The nodes communicate internally using RPCs, which are presented to Atlas by the cluster management component (Forge) in a topology-aware manner, giving Atlas the ability to provide data locality. This is needed to ensure that data is spread correctly throughout the cluster for redundancy reasons. For example, assuming we are using a triple mirror, we need to store data on 3 different nodes in the appliance. If the cluster then grows beyond 1 appliance, it makes more sense from a failure-domain point of view to move 1 copy of the data from one of the local nodes to the other appliance.
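
To make the failure-domain reasoning concrete, here is a small illustrative sketch of topology-aware placement: given a node-to-brik map, pick three replica targets spread across as many appliances as possible. This shows the principle only, not Atlas code; the topology and node names are invented.

```python
# node -> appliance (brik) it lives in; illustrative two-appliance cluster.
TOPOLOGY = {
    "node1": "brik-A", "node2": "brik-A", "node3": "brik-A", "node4": "brik-A",
    "node5": "brik-B", "node6": "brik-B", "node7": "brik-B", "node8": "brik-B",
}

def place_replicas(replica_count: int = 3) -> list:
    """Spread replicas across appliances first, then across nodes within them."""
    nodes_by_brik = {}
    for node, brik in TOPOLOGY.items():
        nodes_by_brik.setdefault(brik, []).append(node)
    placement = []
    briks = list(nodes_by_brik)
    i = 0
    while len(placement) < replica_count and any(nodes_by_brik.values()):
        brik = briks[i % len(briks)]
        if nodes_by_brik[brik]:
            placement.append(nodes_by_brik[brik].pop(0))
        i += 1
    return placement

print(place_replicas())  # -> ['node1', 'node5', 'node2']: both failure domains used
```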


The system is self-healing in the sense that Forge publishes disk and node health status and Atlas reacts to it. Again assuming a triple mirror: if a node or an entire brik (appliance) fails, Atlas creates a new copy of the data on another node to make sure the requested failure tolerance is met. Additionally, Atlas runs a background task that checks the CRC of each chunk of data to ensure that what you have written to Rubrik is available at the time of recovery. See the article “How To Kill A Supercomputer: Dirty Power, Cosmic Rays, and Bad Solder” on why that could be important.
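
The scrubbing idea itself is easy to picture: every chunk carries a checksum recorded at write time, and a low-priority task re-reads chunks and compares. A minimal illustrative sketch using CRC-32 (the chunk layout and repair hook are invented for the example):

```python
import zlib

# chunk_id -> (data as written, CRC-32 recorded at write time). Illustrative layout.
chunks = {}

def write_chunk(chunk_id: str, data: bytes) -> None:
    chunks[chunk_id] = (data, zlib.crc32(data))

def scrub() -> list:
    """Background task: re-read every chunk and flag any whose CRC no longer matches."""
    corrupted = []
    for chunk_id, (data, stored_crc) in chunks.items():
        if zlib.crc32(data) != stored_crc:
            corrupted.append(chunk_id)   # a real system would rebuild from a replica
    return corrupted

write_chunk("chunk-001", b"backup data block")
print(scrub())   # -> [] while the data is intact
```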


The Atlas filesystem was designed with the data management application in mind: essentially, the application takes backups and places them on Atlas, building snapshot chains (Rubrik performs an initial full backup and incrementals forever after that). The benefit is that we can instantly materialize any point-in-time snapshot without needing to re-hydrate data.


In the example above you have the first full backup at t0 and then 4 incremental backups after that. Let’s assume you want to instantly recover data at point t3: this is simply a metadata operation, following pointers to the last time each block making up t3 was mutated; there is no data movement involved.

Taking it a step further, let’s now assume you want to use t3 as a base and start writing new data to it from that point on. Any new data you write (redirect-on-write) goes to a log file; no content from the original snapshot is changed, as this needs to remain an immutable copy (compliance). The use case here could be copy data management, where you want to present a copy of a dataset to internal dev/test teams.
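
A stripped-down way to picture both behaviours: each snapshot is a block map layered on its predecessor, so materializing t3 is just pointer resolution, and writing on top of t3 only adds entries to a separate overlay without touching the immutable chain. The structures below are purely illustrative, not the Atlas on-disk format.

```python
# Each snapshot maps block number -> data, and points at its parent snapshot.
# Illustrative structures only.
snapshots = {
    "t0": {"parent": None, "blocks": {0: "full-0", 1: "full-1", 2: "full-2"}},
    "t1": {"parent": "t0", "blocks": {1: "incr1-1"}},
    "t2": {"parent": "t1", "blocks": {2: "incr2-2"}},
    "t3": {"parent": "t2", "blocks": {0: "incr3-0"}},
}

def read_block(snap, block, overlay=None):
    """Resolve a block at a point in time purely by following metadata pointers."""
    if overlay and block in overlay:
        return overlay[block]          # redirect-on-write data, kept in a separate log
    while snap is not None:
        if block in snapshots[snap]["blocks"]:
            return snapshots[snap]["blocks"][block]
        snap = snapshots[snap]["parent"]
    return None

# Instantly "materialize" t3: no data movement, just pointer resolution.
print([read_block("t3", b) for b in range(3)])   # ['incr3-0', 'incr1-1', 'incr2-2']

# Write on top of t3 for a dev/test copy: new data goes to an overlay log,
# the original snapshots stay immutable.
overlay = {1: "dev-write-1"}
print(read_block("t3", 1, overlay))              # 'dev-write-1'
```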


Atlas also dynamically manages performance for certain operations in the cluster; for example, a backup ingest job gets a higher priority than a background maintenance task. Because each node also has a local SSD drive, Atlas can use it to place critical files on a higher-performance tier, and it can also tier all data, placing hot blocks on SSD. It also understands that each node has 3 HDDs and will use these to stripe the data of a file across all 3 on a single node, taking advantage of the aggregate disk bandwidth. This improves performance on large sequential reads and writes, by utilizing read-ahead and write buffering respectively.
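
As a rough picture of the striping idea, the toy sketch below cuts a write into stripe units handed out round-robin to a node’s three data disks, so a large sequential read can pull from all spindles at once. Disk names and stripe size are arbitrary choices for the example.

```python
# Toy round-robin striping of a write across a node's three HDDs.
DISKS = ["hdd0", "hdd1", "hdd2"]
STRIPE_SIZE = 4  # bytes, unrealistically small so the output is readable

def stripe(data: bytes) -> dict:
    layout = {disk: [] for disk in DISKS}
    for i in range(0, len(data), STRIPE_SIZE):
        disk = DISKS[(i // STRIPE_SIZE) % len(DISKS)]
        layout[disk].append(data[i:i + STRIPE_SIZE])
    return layout

print(stripe(b"ABCDEFGHIJKLMNOPQRSTUVWX"))
# {'hdd0': [b'ABCD', b'MNOP'], 'hdd1': [b'EFGH', b'QRST'], 'hdd2': [b'IJKL', b'UVWX']}
```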

For more information on Atlas you can find the blogpost Adam Gee wrote on it here, or watch the TFD12 recording here.

Whitewater 3 – waves of innovation washing onto the shore

Riverbed recently released the latest edition of its cloud storage gateway, both upgrading the software and providing new hardware options.

What is Riverbed Whitewater?

Whitewater is an on-premises appliance that connects your internal network with a cloud storage provider. It easily integrates your existing backup/archive infrastructure with cloud storage, leveraging the cloud as a low-cost tier for long-term storage.

[Image: Whitewater overview]

Whitewater brings cloud-scale cost and protection benefits into your existing infrastructure (cloud data durability is extremely high, 11 nines, thanks to advanced cloud architectures). At the same time, Whitewater provides fast restores, since the local cache holds the most recent backup data.

In contrast to Riverbed Steelhead, the WAN optimization solution, Whitewater is single-ended (you only need 1 appliance in your datacenter), whereas Steelhead requires an appliance (or soft client) at both ends of the WAN connection.

On the front end it presents itself as a CIFS/NFS share, providing easy integration with existing backup applications, and on the back end it connects to a cloud storage system using REST APIs.

[Image: Supported cloud storage providers]

Data written to Whitewater is deduplicated inline and securely transferred (encrypted at rest and in flight) to your cloud storage provider/system.

I’ve written about Whitewater a couple of times before;

http://filipv.net/2013/05/19/amazon-glacier-and-backup-economics/

http://filipv.net/2012/12/26/riverbed-whitewater-and-caringo-castor/

What’s new? – Bigger, Better, Faster

  • Whitewater now supports up to 2.88PB of locally cached source data
  • Up to 14.4PB of source data in the cloud
  • Scalability via optional disk shelf extensions
  • Faster performance, now ingesting up to 2.5TB/hr
  • 10Gb connectivity
  • Ability to locally pin a data set
  • Ability to replicate to a remote Whitewater (peer replication)
  • Symantec Enterprise Vault support

Storage shelf extensions

The current version allows you to connect 2 additional storage shelves, greatly expanding the local cache. Combined with local data pinning and peer replication, this makes it feasible to use the system as a backup-to-disk target without the cloud tier, although the main purpose of the solution remains leveraging cloud storage economics for long-term retention.

[Image: Whitewater storage shelves]

Locally pin a data set

If you have a particular data set whose SLA to the business requires a shorter RTO, you can optionally pin that data set in the local cache (changes will still be replicated to the peer and/or cloud storage). This way you ensure that this data set is always recovered from the local cache at LAN speed.

[Image: Pinned data set]

Peer replication

Another standard feature (at no additional license cost) in version 3 of Whitewater is the ability to replicate data to a peer Whitewater at a DR site.

Since Whitewater uses inline deduplication, the primary appliance sends only deduplicated (and encrypted) traffic towards the DR site, greatly reducing network transmissions. The secondary Whitewater first needs to acknowledge the data before it is replicated to the cloud as a third tier.

[Image: Whitewater peer replication]

Although we only transfer deduplicated data, we still allow you to control the bandwidth used for replication, both to the peer Whitewater and to the cloud.

[Image: Replication bandwidth settings]

Symantec Enterprise Vault support

Whitewater allows you to integrate the cloud as a storage vault for Symantec Enterprise Vault. Click here for more information on Enterprise Vault.

What if my datacenter is lost and I need to restore from the cloud?

First of all, we recommend replicating to a peer Whitewater in a DR site so you don’t incur cloud restore charges or transmission delays. But we also allow you to download a virtual Whitewater for FREE (read-only), which lets you restore your data and get back online quickly (or at least more quickly, since we are pulling down deduplicated data).

[Image: Free virtual Whitewater]

A word about deduplication

In order to make cloud storage economically feasible, Whitewater first deduplicates data before sending it to the cloud. With deduplication, only unique data is stored on disk, guaranteeing much more efficient utilization of any storage.

In the process of deduplication the incoming data stream is split into blocks. A fingerprint (digital signature) is created for each block to uniquely identify it and is stored in a signature index. The index provides the list of references used to determine whether a block already exists on disk. When the deduplication algorithm finds an incoming data block that has been processed before (a duplicate), it does not store it again but creates a reference to it. References are generated every time a duplicate is found. If a block is unique, the deduplication system writes it to disk.
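
That description maps almost directly onto a hash-indexed block store. A bare-bones illustrative sketch (SHA-256 stands in for the fingerprint; the actual hash and index structure a product uses may differ):

```python
import hashlib

# fingerprint -> stored block (the "disk"); plus an ordered list of references (the stream).
block_store = {}
references = []

def ingest(block: bytes) -> None:
    fingerprint = hashlib.sha256(block).hexdigest()
    if fingerprint not in block_store:       # unique block: write it once
        block_store[fingerprint] = block
    references.append(fingerprint)           # duplicate or not, record a reference

for block in [b"alpha", b"beta", b"alpha", b"alpha"]:
    ingest(block)

print(len(references), "blocks in the stream,", len(block_store), "actually stored")
# -> 4 blocks in the stream, 2 actually stored
```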

Some deduplication techniques split each file into fixed-length blocks; others, like Whitewater, use variable-length blocks. Fixed-block deduplication involves choosing a block size (the size varies from system to system but is fixed within one) and segmenting files/data into blocks of that size.

Variable-block deduplication uses an algorithm to determine variable block sizes, and the data is split based on the algorithm’s determination. When something changes, e.g. data is added upstream and the blocks shift, the algorithm detects the shift so that the blocks that follow are not “lost”; fixed block lengths cannot do this.

In the example below we have a fixed block length of 3, so the incoming data is “sliced” into blocks of 3 characters. The arrow indicates a change to the data, i.e. we add a new character (A) upstream. Since the boundaries do not change with a fixed length, the result is that all blocks now contain different data: there are zero block matches, meaning all blocks are unique and will be written to disk.

[Image: Fixed block length example]

Notice how the variable-block deduplication has seemingly random block sizes. While this does not look very efficient compared to fixed blocks, notice what happens when we add the same upstream element to the data.

[Image: Variable block length example]

Since the variable block length algorithm has determined the boundary for this particular data to lie between C and BB, only the first block (AABC) has changed and needs to be written to disk; the other blocks remain unchanged and can be referenced by the deduplication algorithm.
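
The shift behaviour is easy to reproduce. The sketch below chunks a string with fixed 3-character blocks and with a deliberately crude content-defined rule (close a chunk after a chosen boundary character), then shows how inserting one character upstream changes every fixed block but only the first variable one. Real variable-block systems use rolling hashes rather than this toy rule.

```python
def fixed_chunks(data: str, size: int = 3) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data: str, boundary: str = "C") -> list:
    """Crude content-defined chunking: close a chunk right after the boundary character.
    Real implementations use a rolling hash (e.g. Rabin fingerprints) instead."""
    chunks, current = [], ""
    for ch in data:
        current += ch
        if ch == boundary:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)
    return chunks

before = "ABCBBDDC"
after = "A" + before          # one character inserted upstream

print(fixed_chunks(before), fixed_chunks(after))
# ['ABC', 'BBD', 'DC']  vs  ['AAB', 'CBB', 'DDC']  -> every block changed
print(content_defined_chunks(before), content_defined_chunks(after))
# ['ABC', 'BBDDC']      vs  ['AABC', 'BBDDC']      -> only the first chunk changed
```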

Since Whitewater uses variable segment length inline deduplication, it achieves higher dedupe ratios than fixed block length deduplication (see above). Once we have deduplicated the data, we use LZ compression to further compact it. We see an average data set reduction of 10 to 30x, depending on the source data.

[Image: Deduplication ratio]

If you are an existing Riverbed Whitewater customer you can download Whitewater 3.0.x here

Amazon Glacier and Backup economics

In the summer of last year Amazon announced Amazon Glacier, an extremely low cost storage service designed for data archiving and backup.

This makes it a very compelling solution for offloading your backup data to the cloud at low cost. But the point of a backup solution is not backing up your data, it is enabling restores of that data. The time it takes for a restore to complete must fit within your RTO (how long the business can wait before the data is back and usable), and this is where Amazon Glacier potentially falls down: its SLA for getting your data back is between 3 and 5 hours. This is why it is primarily marketed as an archive solution, where time constraints are less stringent and the cost of storing the archive takes precedence over the RTO. (If you need faster access to your data, look at Amazon S3, but of course take the cost differential into account.)

[Image: Amazon Glacier low-cost pricing]

But have no fear, you can have your cake and eat it too: with Riverbed Whitewater you can leverage low-cost Amazon Glacier storage and still get fast restores. Whitewater is a tiered backup solution that ingests data from your existing, unmodified backup server, uses inline deduplication to minimize the local storage required to maintain a full backup of your data locally, and sends the rest up to Amazon Glacier. Because most restores your users request are for relatively new data, chances are this data is stored on the local disks of the Whitewater appliance and the restore will run at LAN speed. The pricing of Amazon Glacier (see picture above) also assumes that storage retrieval will be infrequent (this is factored into the pricing model), as it is for archiving purposes, and with Whitewater that holds for backup purposes as well.

[Image: Whitewater with Amazon Glacier]

So: a serious reduction in data protection costs, eliminating tape, tape vaulting and disaster recovery storage sites. Improved DR readiness with secure Amazon Glacier storage services that are accessible from anywhere (think DR, for example) and provide 11 nines of durability. No need to change your existing backup application and processes, less storage used in Glacier thanks to inline deduplication, and local LAN-speed restores. End-to-end security, with data secured in flight and at rest using SSL v3 and AES 256-bit encryption.