Category: Cloud

Rubrik Alta feature spotlight: AWS.

With the announcement of Rubrik CDM version 4, codenamed Alta, we have added tons of new features to the platform. Since most of the release announcements are focused on providing an overview of all the goodies, I wanted to dig more deeply into one specific topic, namely our integration with AWS.

Rubrik has a long history of working successfully with Amazon Web Services; we have integrated with Amazon S3 since our first version. In our initial release you could already use Amazon S3 as an archive location for on-premises backups: take local backups of VMware virtual machines, keep them on the local Rubrik cluster for a certain period of time (short-term retention), and optionally put longer-term retention data into Amazon S3. The idea was to leverage cloud storage economics and resiliency for backup data while keeping an active archive for longer-term retention data instead of an offline copy on tape. Additionally, the way our metadata system works allows us to retrieve only the specific bits of data you need to restore, instead of having to pull down the entire VMDK file and incur egress costs that could erode the cloud storage economics benefit.

Also note that there is no need to put a gateway device between the Rubrik cluster and Amazon S3; Rubrik natively leverages the S3 APIs.
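
As a purely illustrative aside, the snippet below shows the kind of byte-range read the S3 API makes possible, which is what lets a metadata-aware system pull back only the pieces it needs instead of a whole object. The bucket, key and offsets are made-up placeholders, not anything Rubrik-specific.

```python
# Illustrative only: a generic boto3 byte-range read from S3.
import boto3

s3 = boto3.client("s3")

def read_range(bucket: str, key: str, start: int, length: int) -> bytes:
    """Fetch only `length` bytes starting at `start` from an S3 object."""
    end = start + length - 1
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    return resp["Body"].read()

# e.g. pull 4 MiB out of the middle of a large archived object instead of the whole thing
chunk = read_range("example-archive-bucket", "vm-backups/blob-0001", 256 * 1024**2, 4 * 1024**2)
print(len(chunk))
```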

The ability to archive to Amazon S3 is of course still here in version 4, but now all the supported sources beyond VMware ESXi (Microsoft Hyper-V, Nutanix AHV, physical Windows/Linux, native SQL, and so on) can also leverage this capability.

Then, in Rubrik CDM version 3.2, we added the ability to protect native AWS workloads by running a Rubrik cluster inside AWS, using EC2 for compute and EBS for storage.

We’ll run a 4-node Rubrik cluster (protecting your data using erasure coding) in your preferred AWS region; the Rubrik AMI is uploaded as a private image.
We use the m4.xlarge instance type, with 64GB of RAM, 256GB of General Purpose SSD (GP2) and 24TB of raw capacity on Throughput Optimised HDD (ST1), resulting in 15TB of usable capacity before deduplication and compression.
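
For readers who like to see how that maps onto the raw AWS building blocks, here is a minimal, hypothetical boto3 sketch of launching an EC2 instance with a GP2 SSD volume and an ST1 throughput-optimised HDD volume attached. The AMI ID, key pair, subnet and volume sizes are placeholders; this is not the actual Cloud Cluster provisioning code.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder: the (private) Rubrik AMI would go here
    InstanceType="m4.xlarge",
    KeyName="my-keypair",                  # placeholder key pair
    SubnetId="subnet-0123456789abcdef0",   # placeholder: subnet in your chosen VPC
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        # SSD volume for cache/metadata, HDD volume for bulk capacity (sizes are examples)
        {"DeviceName": "/dev/sdb", "Ebs": {"VolumeSize": 256, "VolumeType": "gp2"}},
        {"DeviceName": "/dev/sdc", "Ebs": {"VolumeSize": 6144, "VolumeType": "st1"}},
    ],
)
print(response["Instances"][0]["InstanceId"])
```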

Once the Cloud Cluster is running you can protect your native AWS workloads using the connector-based approach, i.e. you can protect Windows and Linux filesets and native SQL workloads in the public cloud.

Additionally, since you can now have both a Rubrik cluster on-premises and a Rubrik Cloud Cluster, you can replicate from your in-house datacenter to your public cloud environment and vice versa, or replicate from one AWS region to another.

Since the Cloud Cluster has the same capabilities as the on-premises one, it can also back up your AWS EC2 workloads and then archive the data to S3, essentially going from EBS storage to S3. (Christopher Nolan really likes this feature.)

Version 4 of Rubrik CDM extends our AWS capabilities by delivering on-demand app mobility, called CloudOn. The idea is that you can now take an application that is running on-premises and move it on the fly to AWS for DR, dev/test, or analytics scenarios.

The way it works is this: just as since v1, you archive your data to Amazon S3. Once you decide to instantiate that workload in the public cloud, you select "Launch On Cloud" from the Rubrik interface and a temporary Rubrik node (spun up on demand in AWS, in the VPC of your choice) converts those VMs into cloud instances, i.e. going from VMware ESXi to AWS AMI images. Once complete, the temporary Rubrik node powers down and is purged.

launch on cloud

Rubrik scans the configuration file of a VM to understand its characteristics (compute, memory, storage, etc.) and recommends a compatible cloud instance type so you are not left guessing what resources you need to consume on AWS.
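
To make the idea concrete, here is a toy sketch of that kind of recommendation logic: map the VM's discovered vCPU and memory profile onto the smallest instance type in a small, hand-picked catalogue that fits. Rubrik's real logic is not public and is certainly richer; this is only an illustration.

```python
INSTANCE_CATALOGUE = [
    # (name, vCPUs, memory in GiB) - a tiny, hand-picked subset of EC2 instance types
    ("t2.medium", 2, 4),
    ("m4.large", 2, 8),
    ("m4.xlarge", 4, 16),
    ("m4.2xlarge", 8, 32),
    ("m4.4xlarge", 16, 64),
]

def recommend_instance(vcpus: int, memory_gib: float) -> str:
    """Return the first (smallest) instance type that satisfies the VM's profile."""
    for name, cpu, mem in INSTANCE_CATALOGUE:
        if cpu >= vcpus and mem >= memory_gib:
            return name
    raise ValueError("No instance type in the catalogue is large enough")

# A VM with 4 vCPUs and 12 GiB of RAM would map to m4.xlarge
print(recommend_instance(4, 12))
```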

Alternatively, we can auto-convert the latest snapshot as it is archived to S3, so you don’t have to wait for the conversion step.

Data has gravity: as you accumulate more and more data, it starts to make more sense to move the application closer to the data, since moving the data to the application becomes less performant and more cost-prohibitive in terms of transport costs. So what if your data sits on-premises but your applications are running in the public cloud?
For instance, let’s say you want to perform business analytics using Amazon QuickSight but your primary data sources are in your private data center. You simply archive data to S3 as part of your regular Rubrik SLA (archive data older than x days), pick the point-in-time dataset you are interested in, use Rubrik to "Launch on Cloud", and point Amazon QuickSight (or any other AWS service) at that data source.

Together these AWS integrations allow you to make use of the public cloud on your terms, in an extremely flexible way.

VMware Horizon DaaS and Google Chromebooks

Yesterday at VMware’s Partner Exchange (PEX), VMware announced that it is joining forces with Google to modernize corporate desktops for the Mobile Cloud Era by providing businesses with secure cloud access to Windows applications, data and desktops on Google Chromebooks.

In this post I want to provide an overview of what this means for VMware’s customers.

So first things first, what exactly is the “Mobile Cloud Era”?

If we look back (and generalise a bit) we can define two big computing architectures that defined IT in the past: the mainframe era and the client-server era. More recently we have been moving towards a third computing architecture, the mobile-cloud era.

IT eras

The mainframe era was defined by highly centralised, highly controlled IT infrastructures that were meant to connect a relatively small user population to a small number of applications (mainly data processing).
In the client-server era the focus shifted to decentralisation and powering a large number of different types of applications, each requiring (or so the thinking was) its own silo. The mobile-cloud era is all about letting billions of users consume IT as a service, enabling access to any application, be it on-premises or in the cloud, from any location on any device.

keynote vmworld

So why are VMware and Google doing this?

I don’t think the actual delivery mechanism matters that much; we are in the fast-moving mobile-cloud era and device choices are fluid. What really matters is the application. And, like it or not, most corporate customers still rely heavily on the Microsoft Office suite for their day-to-day business.

Now, there is a transition under way (the cloud piece of the mobile-cloud era) towards SaaS-based productivity applications like Google Apps, which is why ChromeOS and the Chromebook exist in the first place. This transition will be neither all-encompassing (in my humble opinion) nor immediate, so until we get there this is a great option.

What is VMware Horizon DaaS?

During last year’s VMworld Europe in Barcelona, VMware announced the acquisition of a company called Desktone. Desktone delivers desktops (DaaS) and applications as a service. The desktops can either be persistent Windows 7/8 desktops (the actual Windows client OS, compatible with all your applications; it can also be a Linux desktop if you want), Windows RDSH-based desktops (here you can use the Windows Server OS to get around the Microsoft VDI licensing snafu and potentially allow for additional cost savings), or remote applications. The Desktone solutions have recently been brought under the Horizon umbrella and renamed VMware Horizon DaaS.

desktone

You can access these services using PCoIP or RDP; this is in contrast to the joint Google solution announced yesterday, which will leverage VMware’s HTML5-based Blast protocol for remote access.

What is the solution announced yesterday?

The combination of Horizon DaaS as the “back-end” and ChromeOS/Chromebooks as the “front-end” allows browser-based access, via the HTML5 Blast protocol, to Windows desktops and applications on Google Chromebooks.

The Chromebooks are a low-cost, “cloud first” form factor that, combined with Horizon DaaS, can be fully utilised in a corporate setting. Low-cost hardware options exist from vendors like Dell, HP, Samsung and Acer. Google itself also has a higher-end Chromebook called the Pixel (perhaps not for you if you want to maintain a low TCO for this combined solution).

pixel

Initially the solution will be available as an on-premises service; the joint solution is expected to be delivered as a fully managed, subscription-based DaaS offering by VMware and other vCloud Service Provider Partners, in the public cloud or within hybrid deployments.

Demo of Windows Desktops and Apps as a Cloud service on Chromebooks

Whitewater 3 – waves of innovation washing onto the shore

Riverbed recently released the latest edition of its cloud storage gateway, both upgrading the software and providing new hardware options.

What is Riverbed Whitewater?

Whitewater is an on-premises appliance that connects your internal network with a cloud storage provider; it easily integrates your existing backup/archive infrastructure with cloud storage, leveraging the cloud as a low-cost tier for long-term storage.

wwa3 overview

Whitewater brings cloud-scale cost and protection benefits into your existing infrastructure (cloud data durability is extremely high, 11 nines, thanks to advanced cloud architectures). At the same time Whitewater provides fast restores, since the local cache holds the most recent backup data.

In contrast to Riverbed Steelhead, the WAN optimization solution, Whitewater is single-ended (you only need one appliance in your datacenter), whereas Steelhead requires an appliance (or soft client) at both ends of the WAN connection.

On the front-end it presents itself as a CIFS/NFS share, providing easy integration with existing backup applications, and on the back-end it connects to a cloud storage system using REST APIs.

wwa3 providers

Data that is written to Whitewater is deduplicated inline, and securely (encrypted at rest and in-flight) transferred to your cloud storage provider/system.

I’ve written about Whitewater a couple of times before;

http://filipv.net/2013/05/19/amazon-glacier-and-backup-economics/

http://filipv.net/2012/12/26/riverbed-whitewater-and-caringo-castor/

What’s new? – Bigger, Better, Faster

  • Whitewater now supports up to 2.88PB of source data locally cached
  • Up to 14.4PB of source data in the cloud
  • Scalability by optionally allowing you to connect disk shelf extensions
  • Faster performance, now ingesting up to 2.5TB/Hr
  • 10Gb connectivity
  • Ability to locally pin a data set
  • Ability to perform replication to a remote whitewater (peer replication)
  • Symantec Enterprise Vault support

Storage shelf extensions

The current version allows you to connect two additional storage shelves, greatly expanding the local cache. This, combined with local data pinning and peer replication, makes it feasible to use the system as a backup-to-disk system without the cloud tier. But the main purpose of the solution remains leveraging cloud storage economics for long-term retention.

wwa3 shelves

Locally pin a data set

If you have a particular data set for which your SLA to the business requires a shorter RTO you can optionally lock this data set on the local cache (changes will still be replicated to the peer and/or cloud storage). This way you can ensure that this data set will always be recovered from the local cache at LAN speed.

pinned data set

Peer replication

Another standard feature (at no additional license cost) in version 3 of Whitewater is the ability to replicate data to a peer Whitewater at a DR site.

Since Whitewater uses inline deduplication, the primary appliance will send only deduplicated (and encrypted) traffic towards the DR site, greatly reducing network transmissions. The secondary Whitewater first needs to acknowledge the data before it is replicated to the cloud as a third tier.

wwa3 replication

Although we are only transferring deduplicated data, we still allow you to control the bandwidth used for replication, both to the peer Whitewater and to the cloud.

wwa3 repl

Symantec Enterprise Vault support

Whitewater allows you to integrate the cloud as a storage vault for Symantec Enterprise Vault. Click here for more information on Enterprise Vault.

What if my datacenter is lost and I need to restore from the cloud?

First of all, we would recommend replicating to a peer Whitewater in a DR site so you don’t incur cloud restore charges or transmission delays. But we do allow you to download a virtual Whitewater for FREE (read-only), which will let you restore your data and get back online quickly (or at least more quickly, since we are pulling down deduplicated data).

free vwwa

A word about deduplication

In order to make cloud storage economically feasible, Whitewater first deduplicates data before sending it to the cloud. With deduplication only unique data is stored on disk, guaranteeing much more efficient utilization of any storage.

In the process of deduplication the incoming data stream is split into blocks. A fingerprint (digital signature) is created for each block to uniquely identify it and is stored in a signature index. The index provides the list of references used to determine whether a block already exists on disk. When the deduplication algorithm finds an incoming data block that has been processed before (a duplicate), it does not store it again but creates a reference to it. References are generated every time a duplicate is found. If a block is unique, the deduplication system writes it to disk.
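
A bare-bones sketch of that fingerprint-and-index mechanism, purely for illustration; SHA-256 as the fingerprint and an in-memory dictionary as the index are my choices here, not Whitewater's:

```python
import hashlib

class DedupeStore:
    def __init__(self):
        self.index = {}       # fingerprint -> block data (stands in for "on disk")
        self.references = []  # ordered fingerprints that reconstruct the original stream

    def write_block(self, block: bytes) -> None:
        fp = hashlib.sha256(block).hexdigest()   # the block's fingerprint
        if fp not in self.index:                 # unique block: store it once
            self.index[fp] = block
        self.references.append(fp)               # duplicate or not, record a reference

    def read_stream(self) -> bytes:
        return b"".join(self.index[fp] for fp in self.references)

store = DedupeStore()
for block in [b"AAA", b"BBB", b"AAA", b"CCC", b"AAA"]:
    store.write_block(block)
print(len(store.index), "unique blocks stored for", len(store.references), "references")
```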

Some deduplication techniques split each file into fixed-length blocks; others, like Whitewater, use variable-length blocks. Fixed-block deduplication involves determining a block size (the size varies between systems but is fixed within one) and segmenting files/data into blocks of that size.

Variable-block deduplication uses an algorithm to determine variable block boundaries, and the data is split accordingly. When something changes, e.g. data is added and the blocks shift, the algorithm detects the shift so the blocks that follow are not “lost”; fixed-length blocks cannot do this.

In the example below we have a fixed block length of 3, so the incoming data is “sliced” into blocks of 3 characters. The arrow indicates a change to the data, i.e. we add a new character (A) upstream. Since the boundaries do not move with a fixed block length, all blocks now contain different data and there are zero block matches: every block is unique and will be written to disk.

Fixed block

Notice how the variable block deduplication has seemingly random block sizes. While this does not look too efficient compared to fixed block, notice what happens when we add the same upstream element to the data. 

VBL

Since the variable-block-length algorithm has determined the boundary for this particular data to lie between C and BB, only the first block (AABC) has changed and needs to be written to disk; the other blocks remain unchanged and can be referenced by the deduplication algorithm.
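
Here is a small, self-contained sketch of content-defined (variable-length) chunking to make that resynchronisation behaviour tangible. The window size, divisor and hash function are arbitrary illustrative choices, not Whitewater's parameters, and a real implementation would use a proper rolling hash such as a Rabin fingerprint.

```python
def chunk(data: bytes, window: int = 16, divisor: int = 64, min_size: int = 32, max_size: int = 1024):
    """Split data into chunks whose boundaries are derived from the content itself."""
    chunks, start = [], 0
    for i in range(len(data)):
        size = i - start + 1
        if size < min_size:
            continue
        h = hash(data[max(start, i - window + 1): i + 1])  # hash of the trailing window
        if h % divisor == 0 or size >= max_size:           # content-defined boundary
            chunks.append(data[start: i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

original = bytes(range(256)) * 8
edited = b"\xff" + original          # insert one byte upstream, shifting everything after it
a, b = set(chunk(original)), set(chunk(edited))
print(f"{len(a & b)} of {len(a)} original chunks are unchanged after the insertion")
```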

Because Whitewater uses variable-segment-length inline deduplication, it achieves higher dedupe ratios than fixed-block-length deduplication (see above). Once we have deduplicated the data we use LZ compression to compact it further. We see an average data set reduction of 10 to 30x, depending on the source data.

dedupe ratio

If you are an existing Riverbed Whitewater customer you can download Whitewater 3.0.x here

Amazon Glacier and Backup economics

In the summer of last year Amazon announced Amazon Glacier, an extremely low cost storage service designed for data archiving and backup.

This makes it a very compelling solution for offloading your backup data to the cloud at low cost. But the point of a backup solution is not backing up your data; it is enabling restore of said data. The time it takes for a restore to complete must fit within your RTO (how long the business can wait before the data is back and usable), and this is where Amazon Glacier potentially falls down: its SLA for getting your data back is between 3 and 5 hours. This is why it is primarily marketed as an archive solution, where the time constraints are less stringent and the cost of storing the archive takes precedence over the RTO. (If you need faster access to your data, look at Amazon S3, but of course take the cost differential into account.)

glacier low cost

But have no fear, you can have your cake and eat it too: with Riverbed Whitewater you can leverage low-cost Amazon Glacier storage and still get fast restores. Whitewater is a tiered backup solution that ingests data from your existing, unmodified backup server, uses inline deduplication to minimize the local storage required to maintain a full backup of your data locally, and sends the rest up to Amazon Glacier. Because most restores your users request are for relatively new data, chances are this data is stored on the local disks of the Whitewater appliance and the restore will happen at LAN speed. The pricing of Amazon Glacier (see picture above) also assumes that storage retrieval will be infrequent (this is calculated into the pricing model), as with archiving; with Whitewater the same economics can work for backup as well.
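
For reference, this is roughly what the raw Glacier workflow looks like with boto3 if you were to use it directly, without a gateway in front: uploads are straightforward, but retrieval is an asynchronous job, which is exactly the 3-5 hour gap Whitewater's local cache papers over. The vault and file names are placeholders.

```python
import boto3

glacier = boto3.client("glacier")
VAULT = "example-backup-vault"   # placeholder vault name

# Upload: cheap to store, slow to get back.
with open("cold-backup-set.dat", "rb") as f:     # placeholder local file
    archive = glacier.upload_archive(vaultName=VAULT, archiveDescription="weekly full", body=f)
archive_id = archive["archiveId"]

# Retrieval is a job, not an immediate GET; you poll until Glacier has staged the data.
job = glacier.initiate_job(
    vaultName=VAULT,
    jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
)
print("retrieval job started:", job["jobId"])    # check later with describe_job / get_job_output
```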

wwa glacier

So: a serious reduction in data protection costs, eliminating tape, tape vaulting and disaster recovery storage sites. Improved DR readiness, with secure, anywhere-accessible Amazon Glacier storage providing 11 nines of durability. No need to change your existing backup application and processes, less storage consumed in Glacier because of our inline deduplication, and local LAN-speed restores. End-to-end security, with data secured in flight and at rest using SSL v3 and AES 256-bit encryption.

Riverbed Whitewater and Caringo CAStor

One of the coolest solutions (in my humble opinion, of course) I was able to work with when I was at Dell was the CAStor object storage system from Caringo, OEM’ed as the Dell DX6000.

The CTO and co-founder of Caringo is Paul Carpentier (the –car– in Caringo), known as the father of the content addressing concept. He previously worked at FilePool, where he invented the technology that created the Content Addressed Storage industry; FilePool was sold to EMC, which turned his work into the Centera platform.

CAS, or object storage, is one of three major forms of storage (you can argue about how many forms of storage there are until the cows come home, but for the sake of simplicity I’ll focus on the three major ones here), the other two being file and block storage.

In general the argument is that a certain type of content (documents, pictures, databases, …) requires, or at least works and fits better with, a specific type of storage. The access patterns of the different types of data mean that, for example, relational data is better served from a database on block storage than as separate files on a file server.

Block storage has access to raw, unformatted hardware; when speed and space efficiency are most important this is probably a good fit, for example databases containing relational customer information.

File storage is the most familiar to end users: it takes a formatted hard drive and exposes it to the user as a filesystem. It is an abstraction on top of the storage device, so you lose some speed and space in favour of simplicity. It has inherent scalability issues (both in the maximum size of a document and in the maximum number of documents you can store) and does not perform well when there is high demand for a certain piece of data by multiple users.

Object storage is probably the least familiar form of storage to the general public. It does not provide access to raw blocks of data, and it does not offer file-based access*. It provides access to whole objects (files plus metadata), or BLOBs of data, typically accessed through an HTTP REST API specific to that CAS system. It lends itself particularly well to content that doesn’t change often, like backup and archive data, but also to things like audio and video files, medical images, large numbers of documents, etc. The objective is to store a large, and growing, amount of data at a relatively low cost.

dell-dx-overview

The data and metadata are both written to the object storage system and get a unique UUID to reference it later.

A CAS system consists of a number of storage nodes (typically x86 servers with internal disks in a JBOD configuration; remember the “relatively inexpensive” remark above). In the case of CAStor it is a symmetric distributed system, meaning that all nodes are identical and perform the same functions.

Data is stored and retrieved using HTTP REST APIs. The client writes data to the cluster using an HTTP POST command; the cluster has an internal load-balancing algorithm that decides which node in the cluster will respond (with an HTTP 301 redirect). The client is redirected to the node selected by the cluster and repeats its POST command, and this time the node issues an HTTP 100 Continue to tell the client to start writing the data.

Once the data is received, the node issues an HTTP 201 response to let the client know the data has been written. The response also includes a UUID (Universally Unique Identifier), which is needed later to retrieve the data. Since one of the goals of a CAS system is to keep your data safe, the cluster now needs to replicate your freshly written data onto other nodes in the cluster; this replica is written shortly after the original data has been received (so yes, in theory, if the cluster fails very shortly after the initial write the data has not yet been replicated). All objects on the CAStor system are exact replicas; there is no special “original data set”.

In order to get the replica onto one or more of the other nodes in the cluster, the other node issues an HTTP GET request for the original piece of data. If you later need to read the data, the client likewise issues a GET request to a random node in the cluster; the cluster locates the data (by UUID) and again redirects (301) the client to one of the nodes holding it, using its internal load-balancing algorithm.
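
A hedged sketch of that write/read conversation using Python's requests library. The cluster address is a placeholder, the exact CAStor (SCSP) headers and paths are simplified, and where the UUID appears in the response is assumed here, so treat this as an illustration of the redirect-then-write pattern rather than a working client:

```python
import requests

CLUSTER = "http://castor-cluster.example.local"   # placeholder: any node in the cluster

def write_object(data: bytes) -> str:
    # Initial POST: the cluster answers with a 301 redirect to the node it has selected.
    first = requests.post(CLUSTER + "/", data=data, allow_redirects=False)
    if first.status_code == 301:
        # Repeat the POST against the node we were redirected to.
        final = requests.post(first.headers["Location"], data=data)
    else:
        final = first
    final.raise_for_status()                       # expect 201 Created
    # Assumption: the new object's UUID comes back in a Location-style header.
    return final.headers["Location"].rsplit("/", 1)[-1]

def read_object(uuid: str) -> bytes:
    # GET by UUID against any node; requests follows the 301 redirect automatically for GETs.
    resp = requests.get(f"{CLUSTER}/{uuid}")
    resp.raise_for_status()
    return resp.content

uuid = write_object(b"hello object storage")
print(uuid, read_object(uuid))
```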

Now, Riverbed has a cloud storage gateway called Whitewater specifically designed to store and retrieve backup and archive data on object storage systems. Whitewater is compatible with a set of REST APIs that allow it to interact with many public cloud storage systems, such as Amazon S3, Google, Azure, HP Cloud Services, AT&T Synaptic Storage as a Service, Nirvanix, Telus, Rackspace, and others. But Whitewater also allows you to utilize the same concept in a private cloud setup, as long as you use either OpenStack Swift or EMC Atmos.

Recently Caringo announced the beta version of CAStor for OpenStack giving organizations an option to include a more robust, fully supported object store while still benefiting from the progress the OpenStack community is achieving.

castor-for-openstack

So by leveraging the Whitewater integration with OpenStack, you can now have your backup and archive data stored on very robust object storage in your own private cloud. The number of replicas and their geographical location (or, in the case of a private cloud, your DR center location) is dictated by the cloud storage policy (managed by the Content Router, in Caringo terms): Whitewater writes to the CAS system and the right policy is invoked to replicate the data to the right locations.

The integration with your existing backup software is seamless, since the Whitewater appliance presents itself to your internal network using either CIFS or NFS while connecting to the object storage system using its specific APIs. Because Whitewater uses deduplication (and encryption in flight and at rest), you benefit both from having less data to store (dedupe) and from having a cheaper storage system (x86 servers) to store it on.
In terms of scalability, the system takes on new nodes and automatically uses them to redistribute the load if needed, moving towards the practically unlimited (128-bit namespace) scaling of the system.

How does your backup system scale these days..? 😉

*You can deploy a filesystem gateway to a CAS system so you can offer access via a filesystem if you want.

Your Salesforce.com, only faster.

As mentioned in my previous post, Riverbed has a joint SaaS optimization solution with Akamai called Steelhead Cloud Accelerator. In this blog post I will show you how to use this technology to accelerate your salesforce (both the people and the application).

The picture below is a diagram of the lab environment I’ll be using for this setup.

The lab uses a WAN simulator so we can simulate a transatlantic link towards Salesforce.com. For this simulation I have set the link to 200ms latency and 512Kbps.

For the Steelhead Cloud Accelerator functionality you need a specific firmware image, freely available to our customers on http://support.riverbed.com; you can recognize it by the -sca suffix at the end of the version number (right-hand corner in the screenshot below).

Once you are using the firmware you get an additional option under Configure –> Optimization, called Cloud Accelerator. (see screenshot above).

Here you can register the Steelhead with our cloud portal (which itself runs as a public cloud service on Amazon Web Services). You can also enable one or more of our currently supported SaaS applications (Google Apps, Salesforce.com, and Office 365).

When you register the appliance on the Riverbed Cloud Portal you need to grant the appliance cloud service to enable it.

Once the appliance is granted service, the status on the Steelhead itself will change to “service ready”

So let’s first look at the unoptimized version of our SaaS application. As you can see in the screenshot below I have disabled the Steelhead optimization service so all connections towards Salesforce.com will be pass-through. You can also see the latency is 214ms on average and the bandwidth is 512Kbps.

I logged into Salesforce.com and am attempting to download a 24MB PowerPoint presentation, as you can see in the screenshot below this is estimated to take about 7 minutes to complete. Time for another nice unproductive cup of coffee…

If we now enable the optimization service on the Steelhead, it will automatically detect that we are connecting to Salesforce.com and, in conjunction with Akamai, spin up a Cloud Steelhead on the Akamai edge server closest to the Salesforce.com datacenter I am currently using.

Looking at the current connections on the Steelhead you can see that my connections to Salesforce.com are now being symmetrically optimized by the Steelhead in the Lab and the Cloud Steelhead on the Akamai-ES.

Note the little lightning bolt in the notes section signifying that Cloud Acceleration is on.

Let’s attempt to download the presentation again.

Yeah, I think you could call that faster…

But that is not all: because we are using the same proven Steelhead technology, including byte-level deduplication, I can edit the PowerPoint file and upload it back to Salesforce.com with a minimum of data transferred across the cloud.

I edited the first slide by changing the title and subtitle and will upload the changed file to my SaaS application, notice that the filename itself is also changed.

Looking at the current connections on the Steelhead you can see I am uploading the file at the same breakneck speed since I only need to transfer the changed bytes.

So there you have it, Salesforce.com at lightning speeds!

NOTE: I have not mentioned the SSL based configuration needed to allow us to optimize https based SaaS applications (as all of them are), I will cover this in a later post.

Where is my cloud?

Pixies anyone?

When you use applications on your PC at work, in most cases (depending on when you read this) the server component of those applications will be sitting inside your company’s datacenter. A small but growing number of users don’t get all their applications from inside their own datacenters but use externally hosted ones in the public cloud. Those applications are delivered as a service across the Internet to your PC, hence Software as a Service, or SaaS.

The difference, of course, is that your IT department tightly controls what happens inside your datacenter, and that your datacenter is likely to be very close to you as the user of the application; if not, your IT department can alleviate the distance problem (latency makes applications slow) by using WAN optimization.

Recently Google published a video that gives a peek inside one of their datacenters.

Notice something about those servers? They don’t belong to your company do they? And I’m betting you don’t live near that particular datacenter either.

So not having any say about what is installed in the Google datacenter, and having lots of distance (latency) between your PC and the server powering your application, can be a performance nightmare. Latency makes or breaks a SaaS application.
Microsoft also has a rather nice video about its cloud services; it even starts by asking “where is the Microsoft Cloud?”

Obsessed with performance Riverbed has figured out how to accelerate these SaaS applications so you don’t kill the productivity of the average business user who has to use the application every day.

Riverbed, in partnership with Akamai, is delivering SaaS acceleration via our Steelhead Cloud Accelerator (SCA) solution.

We use the Akamai network to find an Akamai server as close as possible to the datacenter powering your SaaS application and spin up a Cloud Steelhead system to provide symmetrical WAN acceleration.

Since you need to traverse the Internet to reach the datacenter hosting your SaaS application, there is a good chance you won’t have the most efficient route from your PC to the server powering your app. Hence we also use Akamai SureRoute, which triplicates the first packet going out to the datacenter and then chooses the path with the fastest round-trip response. So you not only have a Steelhead very close to the datacenter, you now also have the fastest path across the Internet.
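
The core idea, stripped of everything Akamai-specific, is simply "probe the candidate paths and keep the fastest". A simplified sketch, with placeholder endpoints standing in for the real edge servers (a real implementation would also handle probes that fail or time out):

```python
import time
from concurrent.futures import ThreadPoolExecutor
import requests

CANDIDATES = [
    "https://edge-1.example.net/probe",   # placeholder edge servers / paths
    "https://edge-2.example.net/probe",
    "https://edge-3.example.net/probe",
]

def rtt(url):
    """Measure the round-trip time of a single probe request."""
    start = time.monotonic()
    requests.get(url, timeout=5)
    return time.monotonic() - start, url

with ThreadPoolExecutor(max_workers=len(CANDIDATES)) as pool:
    best_time, best_url = min(pool.map(rtt, CANDIDATES))   # lowest RTT wins
print(f"fastest path: {best_url} ({best_time * 1000:.0f} ms)")
```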

The video below shows the actual results of using this technology at Interop 2012 in New York.

 

So how do you go about enabling this technology? For my next post I’ll walk you through it step by step.

Using Windows Azure Cloud Storage with Riverbed Whitewater

Riverbed Whitewater allows you to connect your on-premises backup and archive environment to a public cloud storage provider like Microsoft’s Windows Azure.

Note that at this stage Whitewater is not meant to be a gateway for primary storage in the cloud; rather, it provides an optimized (using network optimization, deduplication and encryption) way to store your backup and archive data in the public cloud.

It connects your existing backup mechanism (Riverbed supports most backup vendors as seen in the picture below) with a number of cloud storage systems.

Also note that the Windows Azure Platform is a much broader public cloud platform than just cloud storage; it provides a Platform-as-a-Service (PaaS) for customers to run applications (web apps, middle-tier/worker apps, and standalone VMs) in the public cloud.

Windows Azure Platform

Windows Azure is made up of different building blocks namely:

  • Windows Azure Compute (RDFE, Fabric Controller, Networking)
  • Windows Azure Storage, consisting of BLOBs, Tables, and Queues (more on this later)
  • Windows Azure CDN (more on this later)
  • SQL Azure
  • AppFabric PaaS Middleware Services (AppFabric Caching, Access Control Server, Service Bus)

We are going to focus on the storage piece, if you want to learn more about the Windows Azure Platform I highly recommend watching some of Mark Russinovich’s Azure sessions, you can find these recordings on channel9  (just Bing for Azure).

The advantages of using Windows Azure storage are:

  • Fault-tolerance: Windows Azure Blobs, Tables and Queues stored on Windows Azure are replicated three times in the same data centre for resiliency against hardware failure. No matter which storage service you use, your data will be replicated across different fault domains to increase availability
  • Geo-replication: Windows Azure Blobs and Tables are also geo-replicated between two data centres 100s of miles apart from each other on the same continent, to provide additional data durability in the case of a major disaster, at no additional cost.
  • REST and availability: In addition to using Storage services for your applications running on Windows Azure, your data is accessible from virtually anywhere.
  • Content Delivery Network: With one-click, the Windows Azure CDN (Content Delivery Network) dramatically boosts performance by automatically caching content near your customers or users.
  • Price: No brainer

Windows Azure BLOBs, Tables, and Queues

Blobs and Tables reside within a Windows Azure Storage account. A single Windows Azure Storage account can hold up to 100 TB of data. If you need to store more data, then you can create additional storage accounts. Within a storage account, you can store data in Blobs and Tables. A storage account can also contain Queues. Here is a brief description for Blobs, Tables, and Queues:

  • Blobs (Binary Large Objects). Blobs are for storing individual data items (like files) which can be large in size. A single Blob typically contains a document (xml, docx, pptx, xlsx, pdf, etc.), a picture, a song or a video.
  • Tables. Tables contain large collections of property-bag state (called entities) such as customer information, order data, news feed items. Tables sort their entities and can return a filtered subset of the entities.
  • Queues. Windows Azure Queue storage is a service for storing large numbers of messages that can be accessed from anywhere in the world via authenticated calls using HTTP or HTTPS

Whitewater uses the Binary Large Object (BLOB) service, an easy way to store text or binary data within Windows Azure; we connect to the BLOB storage service using REST APIs over port 443.
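
As an aside, writing to the same BLOB service from your own code is straightforward; a minimal sketch with today's Python SDK (azure-storage-blob), using a placeholder connection string, container and file name, just to illustrate the service a gateway like Whitewater talks to over its REST API:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection string for a storage account
conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("backups")      # placeholder container name

with open("backup-set-001.dat", "rb") as f:               # placeholder local backup file
    container.upload_blob(name="backup-set-001.dat", data=f, overwrite=True)

print("stored:", [b.name for b in container.list_blobs()])
```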

Public Cloud Data Security

One of the key security tactics you want to employ when storing your data in a public cloud (where everything is shared to take advantage of economies of scale) is making sure your data remains private by using encryption; the data we send to Windows Azure is encrypted in flight and at rest using SSLv3 and AES 256.
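
To illustrate the encrypt-before-it-leaves-the-appliance idea in general terms (this is not Whitewater's actual cipher configuration or key management), here is a small AES-256 sketch using the cryptography package's GCM mode:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # AES-256 key; a real system protects this carefully
aead = AESGCM(key)

def encrypt_block(plaintext: bytes, associated_data: bytes = b"") -> bytes:
    nonce = os.urandom(12)                  # unique nonce per block
    return nonce + aead.encrypt(nonce, plaintext, associated_data)

def decrypt_block(blob: bytes, associated_data: bytes = b"") -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return aead.decrypt(nonce, ciphertext, associated_data)

sealed = encrypt_block(b"deduplicated backup block")
assert decrypt_block(sealed) == b"deduplicated backup block"
print(len(sealed), "bytes on the wire / at rest")
```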

Riverbed Whitewater CSG

The Riverbed Whitewater Cloud Storage Gateway presents itself to your existing backup infrastructure as either CIFS or NFS, so integrating it into your environment should be a snap.

Whitewater dedupes the data inline (using Riverbed’s very efficient and effective network optimization and storage deduplication method); we expect a dedupe ratio between 10 and 30x for backup data, meaning that a single Whitewater CSG can represent up to 1PB of cloud storage. Whitewater appliances use inline, variable-length, byte-level deduplication that maximizes the ability to identify duplicate data and reduce storage requirements.

The advantages of using Riverbed Whitewater CSG are:

  • Cost savings: Whitewater gateways reduce the management overhead, network bandwidth, and storage that
    are required for data protection.
  • Simplicity: Whitewater allows you to replace tape based infrastructures so you have a full lights-out solution requiring no ongoing tape management overhead, in addition to improved recovery point objectives (RPOs) and recovery time objectives (RTOs).
  • Drop-in DR: Because data is stored in the cloud, there is no need to provision and manage a second site for
    DR purposes. Should the main datacentre go down, a virtual or physical Whitewater can be deployed to a third-party site and, once the encryption key and cloud account information is loaded, restores can commence immediately. In addition, the virtual Whitewater could be spun up in the cloud itself, where the cloud could be used as the DR site.
  • Cloud effects: Whitewater also unlocks all the benefits of the cloud for IT administrators in one simple-to-deploy
    appliance, including pay-as-you-grow cost management, instant capacity scaling to accommodate demand fluctuations, greatly simplified capital budgeting, and access-anywhere flexibility to maximize IT efficiency.

The Whitewater appliance also uses a local disk cache to store the most recent backups; since most restores require the latest backup data, this significantly speeds up recovery time. At the same time we also store this local cache data in Windows Azure storage for data protection purposes. The amount of local cache, the ingest rate (how much data we can take in on the LAN side) and the supported amount of cloud storage depend on the Whitewater appliance model.

Azure: of or having a light, purplish shade of blue, like that of a clear and unclouded sky.

Whitewater: of or moving over or through rapids.