Category: Riverbed

New Year, New Job.

I’m super excited to be taking on a new role in the NSBU at VMware. As of the 1st of January I’ll officially be joining the team as a Sr. Systems Engineer for the Benelux, focused mainly on VMware NSX, including its integrations with other solutions (like vRA and OpenStack, for example).

Unofficially I’ve been combining this function with my “real” job for a couple of months now, ever since a dear and well-respected colleague decided to leave VMware. Recently I was fortunate enough to get the opportunity to attend a two-week training at our Palo Alto campus on NSX-v, NSX-MH, OpenStack, VIO, OVS,…


The experience was face-meltingly good, I definitely learned a lot and got the opportunity to meet many wonderful people. One conclusion is that the NSX team certainly is a very interesting and exciting place to be in the company.

In the last few months I got my feet wet by training some of our partner community on NSX (most are very excited about the possibilities, even the die-hard hardware fanatics), staffing the NSX booth at VMworld Europe, and by having some speaking engagements like my NSX session at the Belgian VMUG.


So why NSX?

In the past I’ve been working on a wide variety of technologies (being in a very small country and working for small system integrators you need to be flexible, and I guess it’s also just the way my mind works #squirrel!) but networking and virtualisation are my two main fields of interest so how convenient that both are colliding!
I’ve been a pure networking consultant in the past, mainly working with Cisco and Foundry/HP ProCurve and then moved more into application networking at Citrix ANG and Riverbed.

The whole network virtualisation and SDN field (let’s hold off the discussion of what’s what for another day) is on fire at the moment and is making the rather tedious and boring (actually I’ve never really felt that, but I’m a bit of a geek) field of networking exciting again. The possibilities and promise of SDN have lots of potential to be disruptive and change an industry, and I’d like to wholeheartedly and passionately contribute and be a part of that.

As NSX is an enabling technology for a lot of other technologies it needs to integrate with a wide variety of solutions. Two solutions from VMware that will have NSX integrated, for example, are EVO:RACK and VIO. I look forward to also working on those and hopefully finding some time to blog about it as well.

Other fields are also looking to the promise of SDN to enable some new ways of getting things done, like SocketPlane for example, trying to bring together Open vSwitch and Docker to provide pragmatic Software-Defined Networking for container-based clouds. As VMware is taking on a bigger and bigger role in the Cloud Native Apps space it certainly will be interesting to help support all these efforts.

“If you don’t cannibalise yourself, someone else will.”
– Steve Jobs

I’m enjoying a few days off with my family and look forward to returning in 2015 to support the network virtualisation revolution!


Horizon Branch Office Desktop Architecture

VMware has a number of virtual desktop architectures that give a prescriptive approach to matching a company’s specific use case to a validated design. These architectures are not price-list bundles; they include VMware’s own products combined with 3rd-party solutions, with the goal of bringing customers from the pilot phase all the way into production.

At the moment there are 4 different architectures focused on different use cases: the Mobile Secure Workplace, the AlwaysOn Workplace, the Branch Office Desktop, and the Business Process Desktop.


In this article I wanted to focus on the Branch Office Desktop, but in the interest of completeness please find below the partner solutions around:

Seeing that there are over 11 million branch offices across the globe, a lot of people are working with remote, or distributed, IT infrastructures which potentially have a lot of downsides. (No remote IT staff, slow and unreliable connectivity, no centralised management,…).


With the Horizon Branch Office Desktop you have some options to alleviate those concerns and bring the remote workers into the fold. Depending on your specific needs you could look at several options.

If you have plenty of bandwidth and low latency, using a traditional centralised Horizon View environment is going to be the most cost effective and easy path to pursue. There are of course additional options if you have bandwidth concerns but still want to provide a centralised approach.

Optimized WAN connectivity delivered by F5 Networks.

The F5 solution offers simplified access management, hardened security, and optimized WAN connectivity between the branch locations and the primary datacenter, using a Virtual Edition of F5’s Traffic Manager in the branch combined with a physical appliance in the datacenter.


The solution provides secure access management via the BIG-IP APM (Access Policy Manager), which is an SSL-VPN solution with integrated AAA services and SSO capabilities. The BIG-IP LTM (Local Traffic Manager) is an application delivery networking solution that provides load balancing for the Horizon View Security and Connection servers. The solution can also provide WAN optimisation through its WAN Optimization Manager (WOM) module, in this case focused on other, non-PCoIP branch traffic.

If you find that ample bandwidth is not available, however, you still have other options, like the architectures combining Horizon with Riverbed, Cisco, and IBM, which I’ll focus on in this article.

Riverbed for the VMware (Horizon) Branch Office Desktop.

With Riverbed’s architecture we essentially take your centralised storage (a LUN from your existing SAN array) and “project” this storage across the WAN towards the branch office. In the branch we have an appliance, called the Granite Edge (Steelhead EX + Granite in the picture below), which then presents this “projected” LUN to any server, including itself (the Granite Edge appliance is also an x86 server running VMware ESXi). If we install the virtual desktops on the LUN we have just “projected” out from the central SAN environment, these desktops are now essentially locally available in the branch office. This means that, from the point of view of the end-user, they set up a local (LAN) PCoIP connection toward the virtual desktop and can work with the same local performance one would expect in the datacenter location.


The end result is that from a management perspective you keep (or gain) centralised control, and from an end-user perspective you get the same performance as if you were local. For more details on this architecture you can download a deployment guide here: Deployment Guide: Riverbed for the VMware Branch Office Desktop …

Cisco Office in a Box.

With Cisco’s Office in a Box architecture you take their Integrated Services Routers Generation 2 (ISR G2) platforms (Cisco 2900 and 3900 Series ISRs) and the Cisco UCS E-Series Servers, and combine those into one physical platform that can host up to 50 virtual desktops in a Branch Office.

cisco office in a box

In this case you have essentially built a remote desktop appliance that sits in the branch office, all virtual machines share the direct-attached storage (DAS) of the Cisco UCS E-Series blade. So in this case the management domain is not stretched across the WAN but instead you have a “pod-like” design that includes everything you need to run virtual desktops in the branch.


For more information on Cisco’s architecture please see:

IBM Branch Office Desktop.

IBM has another validated approach that combines VMware Mirage and VMware Horizon View technologies to address the varying requirements within the branch office.

With VMware Mirage you can centrally manage OS images for both persistent virtual desktops and physical endpoints, while ensuring employees have fast, secure access to applications and data. With centralized images and layered single image management, the same image can be deployed in a server-hosted virtual desktop for remote execution and natively to a physical PC or client hypervisor for local execution.

This approach lets you deliver centrally managed desktops with LAN-like performance and disaster recovery capabilities to locations with robust and reliable as well as unreliable wide area networks.

These components run on IBM’s (Lenovo’s) System x and FlexSystems compute nodes, IBM storage and IBM System networking components.


For more information on the IBM architecture please see:

Alternatively (or in conjunction with all the architectures mentioned) we can also independently leverage Horizon Mirage for the branch office, specifically if you have to deal with frequently disconnected users (laptop users that are not always in the office, for example) or physical devices.

For more information on all these Branch Office architectures please see:  and for the partner extended capabilities.

Whitewater 3 – waves of innovation washing onto the shore

Riverbed recently released the latest edition of its cloud storage gateway, both upgrading the software and providing new hardware options.

What is Riverbed Whitewater?

Whitewater is an on-premises appliance that connects your internal network with a cloud storage provider. It easily integrates your existing backup/archive infrastructure with cloud storage, leveraging the cloud as a low-cost tier for long-term storage.

wwa3 overview

Whitewater brings cloud-scale cost and protection benefits (cloud data durability is extremely high, 11x9s, thanks to advanced cloud architectures) into your existing infrastructure. At the same time Whitewater provides fast restores, since the local cache holds the most recent backup data.

In contrast to Riverbed Steelhead, the WAN optimization solution, Whitewater is single-ended (you only need one appliance in your datacenter), whereas Steelhead requires an appliance (or soft client) at both ends of the WAN connection.

On the front-end it presents itself as a CIFS/NFS share, providing easy integration with existing back-up applications, and on the back-end it connects to a cloud storage system using REST APIs.

wwa3 providers

Data that is written to Whitewater is deduplicated inline, and securely (encrypted at rest and in-flight) transferred to your cloud storage provider/system.

I’ve written about Whitewater a couple of times before;

What’s new? – Bigger, Better, Faster

  • Whitewater now supports up to 2.88PB of source data locally cached
  • Up to 14.4PB of source data in the cloud
  • Scalability by optionally allowing you to connect disk shelf extensions
  • Faster performance, now ingesting up to 2.5TB/Hr
  • 10Gb connectivity
  • Ability to locally pin a data set
  • Ability to perform replication to a remote Whitewater (peer replication)
  • Symantec Enterprise Vault support

Storage shelf extensions

The current version allows you to connect 2 additional storage shelves, greatly expanding the local cache. This combined with local data pinning and peer replication makes it feasible to use the system as a backup to disk system without the cloud tier. But the main purpose of the solution remains leveraging cloud storage economics for long term retention.

wwa3 shelves

Locally pin a data set

If you have a particular data set for which your SLA to the business requires a shorter RTO you can optionally lock this data set on the local cache (changes will still be replicated to the peer and/or cloud storage). This way you can ensure that this data set will always be recovered from the local cache at LAN speed.

pinned data set

Peer replication

Another standard feature (at no additional license cost) in version 3 of Whitewater is the ability to replicate data to a peer Whitewater at a DR site.

Since Whitewater uses inline deduplication, the primary appliance will send only deduplicated (and encrypted) traffic towards the DR site, thus greatly reducing network transmissions. The secondary Whitewater first needs to acknowledge the data before it is replicated to the cloud as a third tier.

wwa3 replication

Although we are only transferring deduplicated data, we still allow you to control the bandwidth used for replication, both to the peer Whitewater and to the cloud.

wwa3 repl

Symantec Enterprise Vault support

Whitewater allows you to integrate the cloud as a storage vault for Symantec Enterprise Vault. Click here for more information on Enterprise Vault.

What if my datacenter is lost and I need to restore from the cloud?

First of all, we would recommend replicating to a peer Whitewater in a DR site so you don’t incur cloud restore charges or transmission delays. But we do allow you to download a virtual Whitewater for FREE (read-only), which will allow you to quickly (or at least more quickly, since we are pulling out deduplicated data) restore your data and get back online.

free vwwa

A word about deduplication

In order to make cloud storage economically feasible, Whitewater first deduplicates data before sending it to the cloud. With deduplication only unique data is stored on disk, guaranteeing much more efficient utilization of any storage.

In the process of deduplication the incoming data stream is split into blocks. A fingerprint (digital signature) is created for each block to uniquely identify it, and a signature index is maintained. The index provides the list of references used to determine whether a block already exists on disk. When the deduplication algorithm finds an incoming data block that has been processed before (a duplicate), it does not store it again but creates a reference to it. References are generated every time a duplicate is found. If a block is unique, the deduplication system writes it to disk.

Some deduplication techniques split each file into fixed-length blocks; others, like Whitewater, use variable-length blocks. Fixed-block deduplication involves determining a block size (the size varies per system but is fixed) and segmenting files/data into blocks of that size.

Variable-block deduplication uses an algorithm to determine a variable block size; the data is split based on the algorithm’s determination. When something changes, i.e. data is added and the blocks shift, the algorithm will detect the shift so the blocks that follow are not “lost”; fixed block lengths cannot do this.

In the example below we have a fixed block length of 3, so the incoming data is “sliced” into blocks of 3 characters. The arrow indicates a change to the data, i.e. we add a new character (A) upstream. The result is that, since the boundaries with a fixed length do not change, all blocks now contain different data and there are zero block matches, meaning all blocks are unique and will be written to disk.

Fixed block

Notice how the variable block deduplication has seemingly random block sizes. While this does not look too efficient compared to fixed block, notice what happens when we add the same upstream element to the data. 


Since the variable block length algorithm has determined the boundary for this particular data to lie between C and BB only the first block (AABC) has changed and needs to be written to disk, the other blocks remain unchanged and can be referenced by the deduplication algorithm.
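
To make the fixed- versus variable-block comparison concrete, here is a minimal Python sketch of fingerprint-indexed deduplication. It is a toy illustration only (the chunk sizes, the byte-sum boundary test and the sample data are my own assumptions, not Whitewater’s actual algorithm), but it shows why content-defined boundaries survive an upstream insertion while fixed boundaries do not.

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 4):
    """Slice data into fixed-length blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data: bytes, window: int = 3, mask: int = 0x03):
    """Toy content-defined chunking: cut whenever a checksum of the last
    `window` bytes hits a boundary pattern, so boundaries follow the
    content and re-synchronise after an insertion."""
    chunks, start = [], 0
    for i in range(window - 1, len(data)):
        if sum(data[i - window + 1:i + 1]) & mask == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def dedupe(chunks, index):
    """Store only unique blocks in `index`; duplicates become references."""
    for block in chunks:
        fingerprint = hashlib.sha256(block).hexdigest()
        index.setdefault(fingerprint, block)

original = b"AABCCBBDDAABCCBB"
modified = b"A" + original          # one character added upstream

for name, chunker in (("fixed", fixed_chunks), ("variable", variable_chunks)):
    index = {}
    dedupe(chunker(original), index)
    stored_before = len(index)
    dedupe(chunker(modified), index)
    print(f"{name:8s} new unique blocks after the insert: {len(index) - stored_before}")
```

On this sample the fixed-length chunker has to store several new blocks after the single-character insert, while the content-defined chunker only stores one, which is exactly the effect the character example above illustrates.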

Since Whitewater uses variable-segment-length inline deduplication it achieves higher dedupe ratios than fixed-block-length deduplication (see above). Once we have deduplicated the data we use LZ compression to further compact it. We see an average data set reduction of 10 to 30x, depending on the source data.

dedupe ratio

If you are an existing Riverbed Whitewater customer you can download Whitewater 3.0.x here

VMware Branch Office Desktop with Granite and Atlantis ILIO

When using VMware View, or any other VDI-based solution for that matter, across a Wide Area Network you need to think about certain limitations inherent in this setup that can potentially limit the user experience for your remote users.

Running your virtual desktops in the data center and connecting over the WAN.

If you decide to keep the virtual desktops in the data center and let your users connect remotely, the user experience will be impacted by the amount of bandwidth, the latency, the type of application, and the remoting protocol. In the case of VMware View we are using PCoIP* across the WAN. With Riverbed Steelhead you can use WAN optimization technology to optimize PCoIP, for example Riverbed Steelhead can optimize printer mappings, drive mappings and USB redirection between the branch office and the data center.

riverbed pciop

Riverbed Steelhead also enables QoS for PCoIP giving fine bandwidth control and latency prioritization for virtual channels within a PCoIP stream, enabling fine-tuning of traffic including voice, video and display rendering.

Running your virtual desktops in the branch office.

To get around the bandwidth and latency issue you could also decide to host the virtual desktop VMs in the branch office; Riverbed Granite allows you to host the VMs remotely while the central management components still remain in the data center.


The net result is that you only need bandwidth for PCoIP in the branch office, where it is readily available and not impacted by latency, and that the SAN hosting the virtual desktop VMs is less impacted by the IOPS requirement when booting and running the VMs, since they are now running on the local blockstore of the Granite appliance in the branch office. All while maintaining central management from the data center.

Now, depending on the number of virtual desktops you need to run in the branch office, you could be impacted by the amount of IOPS required. The Steelhead EX appliance in the branch, which runs the Granite Edge component, has a certain number of internal disks (HDD for the Granite blockstore, SSD for the Steelhead datastore), which translates to a maximum amount of IOPS available for your virtual desktops. The total amount of IOPS we can serve from a Steelhead EX depends on the model.

Let’s assume you are running a big branch office and require a large amount of IOPS to keep the user experience optimal. Again you have several options: you could run multiple Steelhead EX + Granite appliances (Riverbed supports up to 80 branch offices connected to a single Granite Core in the data center), or you could use a solution like Atlantis Computing ILIO to leverage your server’s RAM to satisfy your IOPS requirements. The Steelhead EX has a certain amount of memory depending on the model, or you can use an external VMware host chock-full of RAM and connect that to the Granite component in the branch office.

So how many IOPS do you need to run your virtual desktops?

A lot has been written about the IOPS requirements for VDI; there are numerous whitepapers and VDI storage calculators out there that will give you some idea of the amount of IOPS you should expect. Just be careful with steady-state numbers vs. booting the VM vs. starting an application (application virtualization also helps reduce IOPS requirements here); the idea is that you want to provide a user experience that is at least as good as working locally so your users won’t revolt. In general the Windows OS will consume as much disk I/O or throughput to the hard drive as is available, and Windows desktop workloads are write-heavy (70-80% writes, 20-30% reads).

VMware itself has also provided a way to alleviate IOPS requirements with its View Storage Accelerator, introduced in VMware View 5.1. This is a great addition to limit read IOPS (20-30% in Windows virtual desktops) but as such still leaves us with the write IOPS requirement.
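
As a rough back-of-the-envelope illustration of how these numbers interact, the sketch below estimates the IOPS that still land on the backend after read caching. The per-desktop IOPS figure and the cache hit rate are assumptions I picked for the example, not vendor sizing guidance.

```python
def backend_iops(desktops: int,
                 iops_per_desktop: float = 10.0,   # assumed steady-state figure per desktop
                 write_fraction: float = 0.75,      # 70-80% writes, per the text above
                 read_cache_hit: float = 0.9):      # assumed read-cache hit rate
    """Estimate the IOPS that still hit the backend storage."""
    total = desktops * iops_per_desktop
    writes = total * write_fraction                 # writes are not absorbed by the read cache
    read_misses = total * (1 - write_fraction) * (1 - read_cache_hit)
    return writes + read_misses

# e.g. a 100-desktop branch: most reads are absorbed, the writes are not
print(f"{backend_iops(100):.0f} IOPS still hit the backend for 100 desktops")
```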

Atlantis ILIO is a virtual machine that is hosted on the same host running the virtual desktops (in our case the Steelhead EX or an external VMware host). It essentially presents the RAM of the host as a datastore from which the virtual desktops run, providing IOPS from RAM (nanosecond latency, compared to microsecond latency when using flash-based arrays or PCI-based flash cards).


By using inline deduplication you further limit the amount of IOPS needed from the backend storage (in our case the Granite Edge), since fewer blocks are being transferred, and you also limit the amount of RAM required to run the virtual desktops.

A closer look at Atlantis ILIO.

First we need to differentiate between persistent and non-persistent VDI desktops. For non-persistent desktops ILIO has had a solution for some time: just run these desktops from RAM without needing persistent storage; when the server dies you simply reboot the virtual desktop on another host and start working from there.

With persistent desktops there is a need to write data to persistent storage so your users’ adjustments aren’t lost after a reboot; with the release of ILIO 4.0 this is now possible. I’ll further explore the persistent desktop use case since this is the most interesting one and has the bigger IOPS requirements.

The VM you install on the host is called the Session Host. This Session Host hosts the virtual desktops and exposes the RAM of the host as an NFS or iSCSI mountpoint to which you attach the VMware datastore.

Once data needs to be written to the datastore, ILIO (In-Line Image Optimization) performs inline deduplication and compression, and Windows I/O optimization (potentially fixing the I/O blender issue by optimizing random 4K blocks into sequential 64K blocks).
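
As a toy illustration of that kind of write coalescing (not Atlantis’ actual implementation), the sketch below buffers incoming random 4K writes and flushes them to the backing store as sequential 64K segments.

```python
import io

BLOCK, SEGMENT = 4 * 1024, 64 * 1024           # 4K writes coalesced into 64K flushes

class WriteCoalescer:
    """Buffer small random writes and emit them as large sequential segments."""
    def __init__(self, backing_store):
        self.backing_store = backing_store      # anything with a .write(bytes) method
        self.buffer = bytearray()

    def write_4k(self, block: bytes):
        assert len(block) == BLOCK
        self.buffer.extend(block)
        if len(self.buffer) >= SEGMENT:         # one big sequential write instead of 16 small ones
            self.backing_store.write(bytes(self.buffer[:SEGMENT]))
            del self.buffer[:SEGMENT]

    def flush(self):
        if self.buffer:                         # write out whatever is left
            self.backing_store.write(bytes(self.buffer))
            self.buffer.clear()

# usage: 32 random 4K writes become two sequential 64K writes on the backing store
store = io.BytesIO()
coalescer = WriteCoalescer(store)
for i in range(32):
    coalescer.write_4k(bytes([i % 256]) * BLOCK)
coalescer.flush()
```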


The Replication Host stores the persistent data of multiple Session Hosts and can further deduplicate data (across multiple hosts this time), further reducing the storage requirements of the SAN/NAS. The Replication Host is responsible for making sure that any changes made to the desktop are saved to a persistent storage device, either SAN or NAS. In order to get the data from RAM to the Replication Host, Atlantis uses Fast Replication (you can run Session and Replication Hosts on the same server if you want). So once you need to restart the virtual desktop on another host, the persistent state of the desktop is retrieved by the Replication Host from persistent storage. With all these features in the I/O path, Atlantis estimates that it only needs around 3GB of persistent storage space per desktop.

See below for a demo of the Atlantis ILIO Persistent VDI 4 user experience

Of course this is not the only way to deal with user experience requirements for branch office VDI and a lot of options exist out there but as far as I am concerned this is definitely one of the coolest.

*PCoIP is not the only option you have as a remoting protocol, you can also use RDP or the HTML5 Blast “client”.

Disclaimer: This is my personal blog and this post is in no way endorsed or approved by Riverbed, I have not built this solution in a production network and cannot comment on real life feasibility.

Amazon Glacier and Backup economics

In the summer of last year Amazon announced Amazon Glacier, an extremely low cost storage service designed for data archiving and backup.

This makes it a very compelling solution for offloading your backup data to the cloud at low cost, but the point of a backup solution is not backing up your data, it is enabling restores of said data. The time it takes for a restore to complete must fit within your RTO (how long the business can wait before the data is back and usable), and this is where Amazon Glacier potentially falls down: the SLA for getting your data back is between 3 and 5 hours. This is why it is primarily marketed as an archive solution, where the time constraints are less stringent and the cost of storing the archive takes precedence over the RTO. (If you need faster access to your data, look at Amazon S3, but of course take the cost differential there into account.)

glacier low cost

But have no fear, you can have your cake and eat it too: with Riverbed Whitewater you can leverage the low-cost Amazon Glacier storage and still get fast restores. Whitewater is a tiered backup solution that ingests data from your existing, unmodified backup server, using inline deduplication to minimize the local storage required to maintain a full backup of your data locally, and sending the rest up to Amazon Glacier. Because most restores your users request are for relatively new data, chances are this data is stored on the local disks of the Whitewater appliance and the restore will happen at LAN speed. The pricing of Amazon Glacier (see picture above) also assumes that storage retrieval will be infrequent (this is factored into the pricing model), as is typical for archiving; with Whitewater it can work for backup purposes as well.

wwa glacier

So: a serious reduction in data protection costs, eliminating tape, tape vaulting and disaster recovery storage sites. Improved DR readiness with secure, anywhere-accessible (think DR, for example) Amazon Glacier storage services providing 11 9’s of durability. No need to change your existing backup application and processes, less storage used in Glacier because of our inline deduplication, and local LAN-speed restores. And end-to-end security, with data secured in flight and at rest using SSL v3 and AES 256-bit encryption.

Boot from SAN? How about boot from WAN!

If you ask an IT administrator to draw his/her ideal IT architecture you’ll probably get a picture of a big consolidated datacenter with all remote branches connecting to it without any locally installed servers/storage.

Or not, since this would not deliver the local performance needed to keep your branch office users happy, and if the WAN connection went down nothing would work anymore.

Cue Riverbed Granite.

With Granite we can use your consolidated infrastructure in the datacenter to power your branch remotely while still delivering on centralized management and provisioning.


The Granite Core appliance in the datacenter will “project” an iSCSI LUN from your centralized SAN to the Steelhead EX (with Granite license) in the branch office, making data, applications, and virtual machines available locally.

There are many use cases for this deployment model, but in this blog post I wanted to call out booting a virtual server across a high-latency, low-bandwidth WAN link.

The problem with booting an entire virtual Windows server from the datacenter across this limited-bandwidth link with substantial latency is that you simply would need to transfer too much data (i.e. how big is your Windows Server 2008 VM?); it would either take too long or time out altogether.

What Granite is able to do is make a correlation between the file system and the block-based storage (supported filesystems today are NTFS and VMFS) so we can predictively transfer the blocks of storage from the SAN that are needed to power the filesystem in the branch, without needing to transfer the entire image (i.e. which blocks do we need to boot Windows Server 2008). Combined with WAN optimization using the Steelhead appliances, this results in being able to comfortably boot a server and use it locally.

In this case I will use an external ESXi host in the branch and make an iSCSI connection to the “projected” LUN on the Steelhead EX appliance. It is also possible to run ESXi on top of the Steelhead EX appliance so that all functionality resides in one box (the branch office box, if you will).

In terms of iSCSI functionality, the Steelhead EX appliance will act as the iSCSI initiator towards the Granite Core, which acts as the iSCSI target (the local LUN in the datacenter); the Steelhead EX in turn provides the iSCSI target (the “projected” LUN) towards the iSCSI initiator of the ESXi host.

core lun

On the Granite Core appliance I have a LUN called CDrive which contains the VM image of my Windows Server 2008, the LUN has been exposed to the IQN of the ESXi host.

I then need to connect my Granite Edge (Steelhead EX) to the Granite core.

edge to core conf

Once the connection is complete the GUI will show both ends connected and healthy


Next I will bring the CDrive LUN online on the core so it can be used from the edge.

cdrive online

Then from my ESXi host in the branch office I need to connect to the iSCSI LUN on the Granite Edge appliance (make sure the IQN from the ESXi host matches the configured initiator on the Edge).

esxi iqn

Then rescan the HBA on the ESXi host and simply add the storage to the host.

add storage

You can then browse the datastore and add the Windows Server VM to the inventory and place it on the host.


Now we are ready to boot our Windows Server VM across the WAN (in this case the Core is in San Francisco and the ESXi host is in Antwerp).

The report on the Edge gives some insight into the amount of data being pulled across the WAN to actually boot the server (what data is already locally available vs. what data needs to be pushed over the WAN).

Screen Shot 2013-03-29 at 11.54.08 AM

In this case it took about 5 minutes to boot the Windows Server 2008 VM (I have 166ms of latency and get around 512Kbps of throughput).

Screen Shot 2013-03-29 at 11.56.22 AM

All operations (write new data etc.) you perform inside the VM are now committed locally to the Granite Edge (LAN speed read/write) and will be asynchronously synced back to the Core. You can track the amount of uncommitted data at the edge in the reports.


Because of the integration with the SAN we will first commit all data back to the Core before allowing manipulation in the datacenter. I.e. if you take a snapshot on your SAN of the “projected” LUN, we will first transfer the data from the Edge and then allow the snapshot to go through.

SSL Acceleration

One of the prerequisites for WAN optimization is that the traffic we are attempting to de-duplicate across the WAN is not encrypted; we need “clear-text” data in order to find data patterns so that de-duplication is as effective as possible.

But Steelhead can optimize SSL encrypted data by applying the same optimization methods, while still maintaining end-to-end security and keeping the trust model intact.

To better understand how we perform SSL optimization let’s first look at a simple example of requesting a secured webpage from a webserver.

SSL example

The encryption using a private key/public key pair ensures that the data can be encrypted by one key but can only be decrypted by the other key pair. The trick in a key pair is to keep one key secret (the private key) and to distribute the other key (the public key) to everybody.

All of the same optimizations that are applied to normal non-encrypted TCP traffic can also be applied to encrypted SSL traffic. Steelhead appliances accomplish this without compromising end-to-end security and the established trust model. Your private keys remain in the data center and are not exposed in the remote branch office location where they might be compromised.

The Riverbed SSL solution starts with Steelhead appliances that have a configured trust relationship, enabling them to exchange information securely over their own dedicated SSL connection. Each client uses unchanged server addresses and each server uses unchanged client addresses; no application changes or explicit proxy configuration is required. Riverbed uses a unique technique to split the SSL handshake.

The handshake (the sequence depicted above) is the sequence of message exchanges at the start of an SSL connection. In an ordinary SSL handshake, the client and server first establish identity using public-key cryptography, and then negotiate a symmetric session key to use for data transfer. When you use Riverbed’s SSL acceleration, the initial SSL message exchanges take place between the web browser and the server-side Steelhead appliance. At a high level, Steelhead appliances terminate an SSL connection by making the client think it is talking to the server and making the server think it is talking to the client. In fact, the client is talking securely to the Steelhead appliances. You do this by configuring the server-side Steelhead appliance to include proxy certificates and private keys for the purpose of emulating the server.

SSL SH example

When the Steelhead appliance poses as the server, there does not need to be any change to either the client or the server. The security model is not compromised—the optimized SSL connection continues to guarantee server-side authentication, and prevents eavesdropping and tampering. The connection between the two Steelheads is secured by the use of secure peering (a separate SSL tunnel running between the two appliances).
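
To illustrate the split-termination idea in code, here is a heavily simplified Python sketch of an SSL-terminating proxy: it completes the client handshake locally using a proxy certificate for the server it emulates, sees the request in clear text (where optimization could happen), and re-encrypts towards the real server. The certificate file names and host names are placeholders, and a real implementation would of course also secure the inner appliance-to-appliance channel.

```python
import socket
import ssl

PROXY_CERT, PROXY_KEY = "server-proxy.crt", "server-proxy.key"   # placeholder proxy cert/key
ORIGIN = ("origin.example.com", 443)                              # placeholder real server

# Terminate the client's SSL session locally using the proxy certificate.
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.load_cert_chain(PROXY_CERT, PROXY_KEY)

listener = socket.create_server(("0.0.0.0", 8443))
with server_ctx.wrap_socket(listener, server_side=True) as tls_listener:
    client, _ = tls_listener.accept()        # SSL handshake with the client happens here
    request = client.recv(65536)             # clear text inside the proxy: optimize/dedupe here

    # Re-encrypt towards the origin server over a separate SSL session.
    client_ctx = ssl.create_default_context()
    with socket.create_connection(ORIGIN) as raw:
        with client_ctx.wrap_socket(raw, server_hostname=ORIGIN[0]) as upstream:
            upstream.sendall(request)
            reply = upstream.recv(65536)

    client.sendall(reply)
    client.close()
```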

Citrix HDX and WAN optimization (part 1)


In this first of a two-part blog post about Citrix HDX I want to explore the impact of HDX on the Wide Area Network. Part one will serve as the introduction, and in part two I will test-run some of the scenarios described in part one.

HDX came to be because Citrix was finally getting competitive pressure on its Independent Computing Architecture (ICA) protocol from Microsoft with RDP version 7 and beyond and Teradici/VMware with PCoIP. (And arguably other protocols like Quest EOP Xstream, HP RGS, RedHat SPICE, etc.)

Citrix’s reaction to these competitive pressures has been to elevate the conversation above the protocol, stating that a great user experience is more than just a protocol, thus Citrix created the HDX brand to discuss all the elements in addition to ICA that Citrix claims allow it to deliver the best user experience.

HDX Brands

HDX is not a feature or a technology — it is a brand.

Short for “High Definition user eXperience,” HDX is the umbrella term that encapsulates several different Citrix technologies. Citrix has created HDX sub-brands, these include the list below and each brand represents a variety of technologies:

  • HDX Broadcast (ICA)
    • Capabilities for providing virtual desktops and applications over any network. This is the underlying transport for many of the other HDX technologies; it includes instant mouse click feedback, keystroke latency reduction, multi-level compression, session reliability, queuing and tossing.
  • HDX MediaStream
    • Capabilities for multimedia such as sound and video, using HDX Broadcast as its base, including client-side rendering (streaming the content to the local client device for playback via local codecs with seamless embedding into the remote session).
    • Flash redirection (Flash v2), Windows Media redirection.
  • HDX Realtime
    • Capabilities for real-time communications such as voice and web cameras, using HDX Broadcast as its base; it includes EasyCall (VoIP integration) and bi-directional audio functionality.
  • HDX SmartAccess
    • Refers mainly to the Citrix Access Gateway (SSL VPN) and cloud gateway components for single sign-on.
  • HDX RichGraphics  (incl 3D, 3D PRO, and GDI+ remoting)
    • Capabilities for remoting high-end graphics using HDX Broadcast as its base; uses image acceleration and progressive display for graphically intense images (formerly known as Project Apollo).
  • HDX Plug-n-Play
    • Capabilities to provide connectivity for local devices and applications in a virtualized environment, including USB, multi-monitor support, smart card support, special folder redirection, universal printing, and file-type associations.
  • HDX WAN Optimization
    • Capabilities to locally cache bandwidth-intensive data and graphics and to locally stage streamed applications (formerly known as IntelliCache, relying mostly on their Branch Repeater product line).
  • HDX Adaptive Orchestration
    • Capabilities that enable seamless interaction between the HDX technology categories. The central concept is that all these components work adaptively to tune the unified HDX offering for the best possible user experience.


The goal of this post is to provide an overview of these HDX sub-brands and technologies that directly relate to the network, and WAN optimization, in order to have a clearer understanding of marketing vs. technology impact.

Not every HDX feature is available on both XenApp and XenDesktop (and now also VDI-in-a-Box after the acquisition of Kaviza); the table below shows the feature matrix for both:

hdx table

HDX and the network

As stated before most of the HDX technologies are either existing ICA components or rely on ICA (HDX Broadcast) as a remoting protocol. As such we should be able to (WAN) optimize most of the content within HDX one way or another.

HDX MediaStream

HDX MediaStream is used to optimize the delivery of multimedia content, it interacts with the Citrix Receiver (ICA Client) to determine the optimal rendering location (see overview picture below) for Windows Media and Flash content.

Within HDX MediaStream the process of obtaining the multimedia content and displaying the multimedia content are referenced by the terms fetching and rendering respectively.

Within HDX MediaStream, fetching the content is the process of obtaining or downloading the multimedia content from a location external (Internet, intranet, file server (for WMV only)) to the virtual desktop. Rendering utilizes resources on the machine to decompress and display the content within the virtual desktop. In a Citrix virtual desktop that is being accessed via Citrix Receiver, rendering of content can be executed by either the client or the hypervisor, depending on the policies and environmental resources available.


Adaptive Display (server-side rendering) provides the ability to fetch and render multimedia content on the virtual machine running in the datacenter and send the rendered content over ICA to the client device. This translates to more bandwidth needed on the network than client-side rendering. However, in certain scenarios client-side rendering can use more bandwidth than server-side rendering; it is, after all, adaptive.

HDX MediaStream Windows Media Redirection (client side rendering) provides the ability to fetch Windows Media content (inclusive of WMV, DivX, MPEG, etc.) on the server and render the content within the virtual desktop by utilizing the resources on the client hosting Citrix Receiver (Windows or Linux). When Windows Media Redirection is enabled via Citrix policy, Windows video content is sent to the client through an ICA Virtual Channel in its native, compressed format for optimal performance. The processing capability of the client is then utilized to deliver smooth video playback while offloading the server to maximize server scalability. Since the data is sent in its native compressed format this should result in less bandwidth needed on the network than server side rendering.

HDX MediaStream Flash Redirection  (client side rendering) provides the ability to harness the bandwidth and processing capability of the client to fetch and render Flash content. By utilizing Internet Explorer API hooks, Citrix Receiver is able to securely capture the content request within the virtual desktop and render the Flash data stream directly on the client machine. Added benefits include increased server hypervisor scalability as the servers are no longer responsible for processing and delivering Flash multimedia to the client.

This usually decreases the WAN bandwidth requirements by 2 to 4 times compared to Adaptive Display (server-side rendering).

HDX MediaStream network considerations

In some cases, Windows Media Redirection (client-side rendering of the video) can use significantly more bandwidth than Adaptive Display (server-side rendering of the video).

In the case of low-bit-rate videos, Adaptive Display may utilize more bandwidth than the native bit rate of the Windows Media content. This extra bandwidth usage occurs because full-screen updates are being sent across the connection rather than the actual raw video content.

Packet loss over the WAN connection is the most restricting aspect of an enhanced end-user experience for HDX MediaStream.

Citrix Consulting Solutions recommends Windows Media Redirection (client-side rendering) for WAN connections with a packet loss less than 0.5%.

Windows Media Redirection requires enough available bandwidth to accommodate the video bit rate. This can be controlled using SmartRendering thresholds: SmartRendering controls when the video reverts back to server-side rendering because the required bandwidth is not available. Citrix recommends setting the threshold to 8Mbps.

WAN optimization should provide the most benefits when the video is rendered on the client since the data stream for the compressed Windows Media content is similar between client devices, once the video has been viewed by one person in the branch, very little bandwidth is consumed when other workers view the same video.

HDX RichGraphics 3D Pro

HDX 3D Pro can be used to deliver any application that is compatible with the supported host operating systems, but is particularly suitable for use with DirectX and OpenGL-driven applications, and with rich media such as video.

The computer hosting the application can be either a physical machine or a XenServer VM with Multi-GPU Passthrough. The Multi-GPU Passthrough feature is available with Citrix XenServer 6.0.

For CPU-based compression, including lossless compression, HDX 3D Pro supports any display adapter on the host computer that is compatible with the application that you are delivering. To use GPU-based deep compression, HDX 3D Pro requires that the computer hosting the application is equipped with an NVIDIA CUDA-enabled GPU and has NVIDIA CUDA 2.1 or later display drivers installed. For optimum performance, Citrix recommends using a GPU with at least 128 parallel CUDA cores for single-monitor access.

To access desktops or applications delivered with XenDesktop and HDX 3D Pro, users must install Citrix Receiver. GPU-based deep compression is only available with the latest versions of Citrix Receiver for Windows and Citrix Receiver for Linux.

HDX 3D Pro supports all monitor resolutions that are supported by the GPU on the host computer. However, for optimum performance with the minimum recommended user device and GPU specifications, Citrix recommends maximum monitor resolutions for users’ devices of 1920 x 1200 pixels for LAN connections and 1280 x 1024 pixels for WAN connections.

Users’ devices do not need a dedicated GPU to access desktops or applications delivered with HDX 3D Pro.

HDX 3D Pro includes an image quality configuration tool that enables users to adjust in real time the balance between image quality and responsiveness to optimize their use of the available bandwidth.

HDX RichGraphics 3D Pro network considerations

HDX 3D Pro has significant bandwidth requirements depending on the encoding used (NVIDIA CUDA encoding, CPU encoding, and Lossless).


When supported NVIDIA chipsets are utilized, HDX 3D Pro offers the ability to compress the ICA session into a video stream. This significantly reduces bandwidth and CPU usage on both ends by utilizing NVIDIA CUDA-based deep compression. If an NVIDIA GPU is not present to provide compression, the server CPU can be utilized to compress the ICA stream; this method, however, does introduce a significant impact on CPU utilization. The highest-quality method for delivering a 3D-capable desktop is by using the Lossless option. As the Lossless title states, no compression of the ICA stream occurs, allowing pixel-perfect images to be delivered to the endpoint. This option is available for delivering medical imaging software that cannot have degraded image quality. This level of high-quality imaging does come at the price of very high bandwidth requirements.

HDX RichGraphics GDI and GDI+ remoting

GDI (Graphics Device Interface) and GDI+ remoting allows Microsoft Office specifically (although other apps, like WordPad, also use GDI) to be remoted to the client using native graphics commands instead of bitmaps. Using native graphics commands saves server-side CPU, saves network bandwidth, and eliminates visual artifacts, as the output doesn’t need to be compressed using image compression.

General network factors for Remoting protocols (including RDP/RemoteFX, ICA, PCoIP, Quest EoP,…)

  • Bandwidth – the protocols take all they can get, 2 Mbps is required for a decent user experience. (see planning bandwidth requirements below)
  • Latency – at 50ms things start getting tough (sometimes even at 20ms)
  • Packet loss – should stay under 1%

Planning bandwidth requirements for HDX (XenDesktop example)

Citrix publishes the numbers below in a medium (user load) user environment, this gives some indication as to what to expect in terms of network sizing.

  • MS Office-based: 43 Kbps
  • Internet: 85 Kbps
  • Printing (5MB Word doc): 555-593 Kbps
  • Flash video (server rendered): 174 Kbps
  • Standard WMV video (client rendered): 464 Kbps
  • HD WMV video (client rendered): 1812 Kbps

These are estimates. If a user watches a WMV HD video with a bit rate of 6.5 Mbps, that user will require a network link with at least that much bandwidth. In addition to the WMV video, the link must also be able to support the other user activities happening at the same time.
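
Using the per-activity figures above, a quick sizing sketch for a hypothetical branch might look like this (the user mix is an assumption I made for illustration; real sizing should use your own measurements):

```python
# Per-activity bandwidth figures (Kbps) from the Citrix numbers quoted above
KBPS = {
    "office": 43,
    "internet": 85,
    "printing": 593,                  # worst case of the 555-593 Kbps range
    "flash_server_rendered": 174,
    "wmv_sd_client_rendered": 464,
    "wmv_hd_client_rendered": 1812,
}

# Hypothetical 25-user branch: what is everyone doing at the busiest moment?
concurrent_mix = {"office": 15, "internet": 6, "printing": 2, "wmv_sd_client_rendered": 2}

total_kbps = sum(KBPS[activity] * users for activity, users in concurrent_mix.items())
print(f"Peak estimate: {total_kbps} Kbps (~{total_kbps / 1024:.1f} Mbps) before WAN optimization")
```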

Also, if multiple users are expected to be accessing the same type of content (videos, web pages, documents, etc.), integrating WAN Optimization into the architecture can drastically reduce the amount of bandwidth consumed. However, the amount of benefit is based on the level of repetition between users.

Note: Riverbed Steelhead can optimize ICA/HDX traffic extremely well; we even support the newer multi-stream ICA protocol. In part 2 of this blog I will demonstrate the effectiveness of Steelhead on HDX traffic and talk about our Citrix-specific optimizations, like our very effective Citrix QoS. Riverbed Steelheads also have the ability to decode the ICA Priority Packet Tagging that identifies the virtual channel from which each Citrix ICA packet originated. As part of this capability, Riverbed specifically developed a packet-order queuing discipline that respects the ordering of ICA packets within a flow, even when different packets from a given flow are classified by Citrix into different ICA virtual channels. This allows the Steelhead to deliver very granular Quality of Service (QoS) enforcement based on the virtual channel in which the ICA data is transmitted. Most importantly, this feature prevents any possibility of out-of-order packet delivery as a result of Riverbed’s QoS enforcement; out-of-order packet delivery would cause significant degradation in performance and responsiveness for the Citrix ICA user. Riverbed’s packet-order queuing capability is patent-pending, and not available from any other WAN optimization vendor.

Real-world impact can be seen in the picture below of a customer saving 14GB of ICA traffic over a transatlantic link every month.

citrixtraff

Riverbed Whitewater and Caringo CAStor

One of the coolest solutions (in my humble opinion, of course) I was able to work with when I was at Dell was the CAStor object storage system from Caringo, OEM’ed as the Dell DX6000.

The CTO and co-founder of Caringo is Paul Carpentier (the –car– in Caringo), known as the father of the Content Addressing concept. He previously worked at FilePool, where he invented the technology that created the Content Addressed Storage industry. FilePool was sold to EMC, which turned his work into the Centera platform.

CAS, or object storage, is one of three major forms of storage concepts (you can argue about how many forms of storage there are until the cows come home, but for the sake of simplicity I’ll focus on the three major ones here). The other two are file and block storage.

In general the argument is that a certain type of content (documents, pictures, databases,…) requires, or at least works/fits better, when it resides on a specific type of storage. The access patterns of the different types of data make it that for example relational data is better served from a database on block storage as opposed to separate files from a file server.

Block storage has access to raw, unformatted hardware, when speed and space efficiency are most important this is probably a good fit. For example databases containing relational customer information.

File storage is most familiar to end-users; it takes a formatted hard drive and is exposed to the user as a filesystem. It is an abstraction on top of the storage device, and thus you lose speed and space in favour of simplicity. It has inherent scalability issues (both in terms of the maximum size of a document and the maximum number of documents you can store) and does not perform well when there is high demand for a certain piece of data by multiple users.

Object storage is probably the least familiar form of storage to the general public. It does not provide access to raw blocks of data, and it does not offer file-based access*. It provides access to whole objects (files + metadata), or BLOBs of data. Typically you access it using an HTTP REST API specific to that CAS system. It lends itself particularly well to content like backup and archive data, since that doesn’t change often, but also to things like audio and video files, medical images, large numbers of documents, etc. The objective is to store a large, and growing, amount of data at a relatively low cost.


The data and metadata are both written to the object storage system and get a unique UUID to reference it later.

A CAS system consists of a number of storage nodes (typically x86 servers with internal disks in a JBOD configuration – remember the relatively inexpensive remark above). In the case of CAStor it is a symmetric distributed system, meaning that all nodes are exactly the same and perform the same functions.

Data is stored and retrieved using HTTP REST APIs. The client writes data to the cluster using an HTTP POST command; the cluster has an internal load-balancing algorithm that decides which node in the cluster will respond (with an HTTP 301 redirect) to the client. The client is redirected to the node selected by the cluster and repeats its POST command; this time the node will issue an HTTP 100 Continue response to tell the client to start writing the data.

Once the data is received, the node issues an HTTP 201 response to let the client know the data has been written. The response also includes a UUID (Universally Unique Identifier), which is needed later to retrieve the data. Since one of the goals of a CAS system is to keep your data safe, the cluster now needs to replicate your freshly written data onto other nodes in the cluster; this replica is written shortly after the original data has been received (so yes, in theory, if the cluster fails very shortly after the initial write the data was not yet replicated). All objects on the CAStor system are exact replicas; there is no special “original data set”.

In order to get the replica onto one or more of the other nodes in the cluster, the other node issues an HTTP GET request for the original piece of data. If you later need to read the data again, the client also issues an HTTP GET request to a random node in the cluster; the cluster locates the data (by UUID) and again redirects (301) the client to one of the nodes containing the data, using its internal load-balancing algorithm.
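
A rough sketch of that write/read conversation from a client’s point of view, using Python and the requests library, might look like the following. The node address is a placeholder and the exact header or body field carrying the UUID depends on the CAStor API version, so treat this as illustrative rather than a literal client.

```python
import requests

ANY_NODE = "http://castor-node.example.local"      # placeholder: any node in the cluster

payload = open("backup.dat", "rb").read()

# Write: POST to any node; the cluster answers with a 301 redirect to the node it selected.
first = requests.post(ANY_NODE + "/", data=payload, allow_redirects=False)
if first.status_code == 301:
    stored = requests.post(first.headers["Location"], data=payload)
else:
    stored = first
assert stored.status_code == 201                    # 201 Created: the data has been written
uuid = stored.headers.get("Location", stored.text).strip()   # where the UUID lands is version-specific

# Read: GET any node with the UUID; the cluster redirects (301) to a node holding a replica.
data = requests.get(f"{ANY_NODE}/{uuid}").content   # requests follows the redirect for GET
```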

Now, Riverbed has a cloud storage gateway called Whitewater, specifically designed to store and retrieve backup and archive data on object storage systems. Whitewater is compatible with a set of REST APIs that allows it to interact with a lot of public cloud storage systems, like Amazon S3, Google, Azure, HP Cloud Services, AT&T Synaptic Storage as a Service, Nirvanix, Telus, Rackspace, and others. But Whitewater also allows you to utilize the same concept in a private cloud setup, as long as you use either OpenStack Swift or EMC Atmos.


Recently Caringo announced the beta version of CAStor for OpenStack giving organizations an option to include a more robust, fully supported object store while still benefiting from the progress the OpenStack community is achieving.


So by leveraging the Whitewater integration with OpenStack, you can now have your backup and archive data stored on very robust object storage in your own private cloud. The number of replicas and the geographical location (or, in the case of a private cloud, your DR center location) of the replicas is dictated by the cloud storage policy (managed by the Content Router in Caringo terms): Whitewater writes to the CAS system, and the right policy is invoked to replicate the data to the right locations.


The integration with your existing backup software is seamless, since the Whitewater appliance presents itself using either CIFS or NFS towards your internal network while connecting to the object storage system using its specific APIs. Since Whitewater uses deduplication (and encryption in flight and at rest), you benefit both from having less data (dedupe) to store and from having a cheaper (x86 servers) storage system to store it on.
In terms of scalability the system takes on new nodes, and these are automatically used to redistribute the load if needed, moving towards the unlimited (128-bit namespace) scaling of the system.

How does your backup system scale these days..? 😉

*You can deploy a filesystem gateway to a CAS system so you can offer access via a filesystem if you want.

Managing Riverbed VSP with VMware vCenter

Riverbed has recently released version 2 of the EX platform software; this includes RiOS 8 and Virtual Services Platform v2. VSPv2 runs VMware ESXi 5 as its hypervisor layer and as such can be managed by VMware vCenter.

In this post I’ll first cover how to install EX 2 on your existing Riverbed Steelheads and then we’ll look at managing the hypervisor with VMware vCenter.

First thing you need is the new EX 2 firmware, which can be downloaded from our support website.

Install the new firmware just like any regular update and reboot the appliance.

After the appliance has rebooted you will notice a new menu option under Configure, called Virtualization. Here you can install the VSP platform and also migrate any legacy VSPv1 packages you have installed.

Before you install ESXi, it is recommended that you select the disk layout you need by going to the Virtual Services Platform page; this will allocate the internal disks on your Steelhead EX platform to your required setup (i.e. will you use the appliance only for Granite, only for VSP, or a mix of both).

After you have made your selection you can go ahead and launch the ESXi installation wizard.

As you can see, the ESXi installation wizard uses a colour scheme familiar to VMware engineers.

The wizard is pretty self-explanatory.
Start by giving ESXi a management IP; this can be placed on either or both of the Primary and AUX interfaces.

Enter the ESXi credentials in order to manage ESXi using vCenter. (or standalone).

If you want you can enter VNC credentials so you can have access to the ESXi console.

After verifying your settings click next to install and configure ESXi.

After the installation has finished you can manage the VSP platform by going to Configure, Virtualization, Virtual Services Platform.

Here you can see the resources currently allocated to the vSphere hypervisor. Notice that at the moment we allocate 1 socket (with 2 cores, on the EX760 appliance) to the hypervisor; this is important for VMware licensing, should you choose to license it. If not, you can keep running the free version (called the embedded license) of the hypervisor by managing each EX appliance separately.

Connect to your vCenter server using the vSphere Client (or Webclient) and add the Steelhead appliance (using the ESXi management address) to vCenter.
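
If you prefer to script the add instead of using the vSphere Client, a rough pyVmomi sketch might look like this. The host names, credentials and datacenter lookup are placeholders, and depending on your vCenter's certificate settings you may also need to supply the host's SSL thumbprint in the connect spec.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details
VCENTER, VC_USER, VC_PASS = "vcenter.example.local", "administrator@vsphere.local", "secret"
ESXI_HOST, ESXI_USER, ESXI_PASS = "steelhead-ex-esxi.example.local", "root", "secret"

context = ssl._create_unverified_context()          # lab only: skip certificate verification
si = SmartConnect(host=VCENTER, user=VC_USER, pwd=VC_PASS, sslContext=context)

datacenter = si.content.rootFolder.childEntity[0]   # assumes the first datacenter is the target

spec = vim.host.ConnectSpec(hostName=ESXI_HOST,
                            userName=ESXI_USER,
                            password=ESXI_PASS,
                            force=True)              # strict setups may still require sslThumbprint

# Add the Steelhead's ESXi instance as a standalone host in the datacenter's host folder
task = datacenter.hostFolder.AddStandaloneHost_Task(spec=spec, addConnected=True)
print("Add-host task started:", task.info.key)

Disconnect(si)
```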

At this point you can choose to add a license.

If you change the license, this is reflected on the management console (web interface) of the Steelhead appliance.

After adding the Steelhead appliance to vCenter you can manage it like any other vSphere server.

So there you have it, Steelhead EX version 2, managed by VMware vCenter 5.1.
Happy consolidating!