The benefits of ExpressRoute are well documented, and from the horse's mouth: “ExpressRoute connections offer higher security, reliability, and speeds, with lower and consistent latencies than typical connections over the Internet”. A point of contention has been this so-called reliability.

The only way to guarantee reliability in packet-switched networks is QoS. Sure, through capacity management and/or ‘massive pipes’ this can be avoided to a certain extent, but there is always a risk that one device/user/application could consume all of that bandwidth, leaving other systems with a smaller piece of the pie.

We have recently adopted Skype for Business PSTN Calling, and through this have developed a need for a priority queue for EF-marked packets (effectively, voice traffic). Microsoft’s documentation is quite clear on how it thinks QoS should be deployed.

It needs to be deployed end-to-end.

A QoS capable connection must be configured end-to-end (PC, network switches and routers to the cloud) as any part in the path that fails to support QoS could degrade the quality of the entire call

Your ExpressRoute provider must provide a class of service for EF packets.

Each ExpressRoute network service provider will have a class of service (QoS) that is appropriate for real-time voice and video. This COS is called ‘Expedited Forwarding’ (EF) for voice and ‘Assured Forwarding’ (AF) for video
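For context, EF and AF are just DSCP code points carried in the IP header (EF is 46, video typically lands in the AF4x range). The minimal Python sketch below, assuming a Linux host, shows how a test tool might produce EF-marked packets – purely to illustrate what the provider’s class of service is expected to match on. In production the Skype client itself doesn’t do this; a QoS GPO applies the marking (more on that later).

```python
import socket

# DSCP code points referenced above (illustrative values only)
DSCP_EF = 46    # Expedited Forwarding - voice
DSCP_AF41 = 34  # Assured Forwarding 41 - video

def mark_dscp(sock: socket.socket, dscp: int = DSCP_EF) -> None:
    """Mark outgoing packets on this socket with a DSCP value (Linux).

    The DSCP field is the top 6 bits of the legacy TOS byte, hence the shift.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)

# Example: a UDP socket whose packets leave the host marked EF,
# ready to be matched by an EF class of service along the path.
voice_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
mark_dscp(voice_socket)
```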

The problem is that our network provider doesn’t offer this, and Microsoft doesn’t mandate the requirement despite its clear stance on it. I’ve labored through support tickets with both our provider and Microsoft – the provider has no interest, as there is no pressure (from Microsoft, or from other customers at this point), and Microsoft won’t force them to implement it.

This post serves little purpose other than to say ‘be careful’ when expecting your ExpressRoute provider to support QoS. They may not, and they are not required to.

We are looking at alternative network providers.

A while ago Azure announced a new feature allowing you to connect VNets together without ExpressRoute or VPN.

VNet peering is a mechanism that connects two virtual networks (VNets) in the same region through the Azure backbone network. Once peered, the two virtual networks appear as one for all connectivity purposes. They are still managed as separate resources, but virtual machines in these virtual networks can communicate with each other directly by using private IP addresses.

The product page can be found here.

The Existing Architecture

Originally, your architecture may have consisted of several VNets, each with its own gateway (a billable item) and a form of connectivity between them – either ExpressRoute or VPN. There are some inefficiencies here, as all traffic between VNets needs to be routed via your ExpressRoute circuit. Bandwidth and the actual location of your ExpressRoute termination points may be problems here. A 1Gb/s link may be plenty between your on-prem networks and Azure, but is it enough for the traffic between all of your VNets? With the below architecture we have 4 gateways for 4 VNets.

Legacy Azure Architecture – A gateway in each VNet

The New Architecture

VNet peering does away with the requirement to have a gateway on every VNet. We can create one VNet which serves as the link between your on-prem networks and Azure, and have the rest connected via VNet peering. This is even supported between VNets that are in different subscriptions. Adapting the same reference architecture to use VNet peering, we end up with something like the below.

VNet Peering Azure Architecture – One more VNet but 3 fewer gateways. Profit!

So we have one extra VNet, but 3 fewer gateways. The gateways are what cost money in this setup, so this is not only a cheaper design but a more efficient one. All traffic between VNets will go via the (blazing fast!!!) Azure backbone and not your ExpressRoute link. At the moment VNet peering traffic is free of charge, so you will also save some money on your ExpressRoute ingress/egress charges.
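To give an idea of how little is involved, here is a rough sketch of creating one side of a peering with the Python azure-mgmt-network SDK (recent, track-2 versions). The resource names, groups and the gateway-transit flag are placeholders/assumptions, so check the current SDK docs before relying on this; a matching peering also has to be created from the spoke back towards the hub before the link comes up.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Peer the "hub" VNet (the one holding the ExpressRoute/VPN gateway)
# towards a spoke VNet. Repeat in the opposite direction from the spoke.
spoke_vnet_id = (
    "/subscriptions/<subscription-id>/resourceGroups/rg-spoke1"
    "/providers/Microsoft.Network/virtualNetworks/vnet-spoke1"
)

poller = client.virtual_network_peerings.begin_create_or_update(
    "rg-hub",         # resource group of the hub VNet
    "vnet-hub",       # hub VNet name
    "hub-to-spoke1",  # name of this peering
    {
        "remote_virtual_network": {"id": spoke_vnet_id},
        "allow_virtual_network_access": True,
        "allow_forwarded_traffic": True,
        "allow_gateway_transit": True,  # let spokes use the hub's gateway
    },
)
print(poller.result().peering_state)
```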

As with everything Azure, please do your own research, as pricing and features change regularly.

We have an old Meridian phone system which is about the same size as a small car. As part of our wider strategy to move most of our core infrastructure to Azure, we have begun testing Microsoft’s “PSTN Calling” add-on to the Skype for Business client. PSTN Calling requires no on-premises infrastructure (i.e. SIP trunks or a session border controller) and can leverage either your internet circuit or ExpressRoute links. The service has been available for some time in the US, and there is now a preview available in the UK for some select customers. For more information about the service itself, have a look here.

I have touched on our use of a Squid proxy previously with regard to accessing Skype for Business/O365. The need for this stems mostly from my reluctance to write a firewall policy allowing end clients to access the huge range of (changing) subnets located in Azure directly. While our Cisco ASAs do support DNS names, they do not support wildcard domains (how could they? They need to perform DNS resolution to resolve a name to an IP). Although MS best practice says “don’t use a proxy, make exceptions”, I really don’t see a great way to write firewall policies to make this workable. The Squid proxy provides a low-latency method of accessing the vast list of URLs for these services. We don’t do anything fancy on them – no SSL decryption, virus scanning, or URL categorisation. A PAC file then defines which URLs should go through Squid and which should go through our main internet path (Zscaler, FYI).
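For the curious, the PAC logic is nothing clever. A minimal sketch is below – the hostname patterns and proxy address are illustrative only (the real Office 365 URL list is far longer and changes regularly), and in our case the fall-through path goes to Zscaler rather than a plain DIRECT.

```python
# Generate a minimal PAC file: selected O365/Skype hostnames go via Squid,
# everything else takes the default internet path. The hostnames and proxy
# address below are examples, not our production list.
PAC = """function FindProxyForURL(url, host) {
    if (shExpMatch(host, "*.lync.com") ||
        shExpMatch(host, "*.skypeforbusiness.com") ||
        shExpMatch(host, "*.office365.com")) {
        return "PROXY squid.internal.example:3128";
    }
    return "DIRECT";  // stands in for the default path (Zscaler in our case)
}
"""

with open("proxy.pac", "w") as pac_file:
    pac_file.write(PAC)
```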

Traditionally we have protected voice traffic by marking packets with DSCP values and writing a QoS policy to protect that bandwidth. QoS really needs to be set up end to end; otherwise congestion somewhere along the path can cause voice/video degradation. A QoS GPO will usually mark the Skype packets, as they are expected to fall within a certain port range. The problem with a Squid server in the chain is that Skype will be accessed on the Squid listening port (3128) by default, not within the Skype port range (49152-57500). You could write a GPO to mark anything going to the Squid server, but what happens if non-voice/video traffic goes there as well? It will be marked unintentionally. Text conversations, files, videos etc. will all be tunnelled over port 3128, so we won’t be able to distinguish them by GPO.
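To make the problem concrete, here is a toy model of what a port-based marking policy can actually see. The port numbers come from the paragraph above; everything else is simplified:

```python
from typing import Optional

SKYPE_MEDIA_PORTS = range(49152, 57501)  # Skype client media port range
SQUID_PORT = 3128                        # Squid listening port

def dscp_for_destination(dst_port: int) -> Optional[int]:
    """Return the DSCP value a port-based QoS policy would apply, if any."""
    if dst_port in SKYPE_MEDIA_PORTS:
        return 46      # EF - direct media traffic is identifiable
    if dst_port == SQUID_PORT:
        return None    # voice, IM and file transfers all look identical here
    return None

print(dscp_for_destination(50020))  # 46   - direct voice/video flow
print(dscp_for_destination(3128))   # None - proxied flow, nothing to key on
```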

What is the solution? I’m not so sure yet!

I recently got the second ExpressRoute connection up and running at work. Out of our primary data centre, London, we have a 1Gb/s connection to Azure through their London point of presence. The second of the two links is to the Amsterdam POP and is from our Manchester data centre. As per Microsoft’s documentation, a resilient path is required from a different peering location in order for their SLAs to be valid. This must also be an active/active configuration.

I have tested failover between these two links by altering some of the prefixes we advertise, and it seems to behave as expected. Failover (with default BGP timers) takes between 15 and 20 seconds from one data centre to the other. The architecture looks like this:

Azure ExpressRoute Topology

The ExpressRoute topology I have implemented

I’ve chosen to use AS-path prepending on our advertised routes (which controls inbound traffic) and weight, a value local to each router, for the outbound routes. This way we should have deterministic failover – the egress and ingress paths should always be via London POP1 from ‘Datacenter A’ under normal circumstances. There are, of course, many ways in BGP to modify the best path in order to achieve the same or different routes for specific prefixes.
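As a rough illustration of why this ends up deterministic, here is a toy model of just the two BGP attributes in play. The values are made up, and real best-path selection considers many more attributes:

```python
from dataclasses import dataclass

@dataclass
class Circuit:
    name: str
    weight: int        # set locally on our routers - controls our egress choice
    as_path_len: int   # our prepends lengthen this - controls the return path

circuits = [
    Circuit("London POP1 (Datacenter A)", weight=200, as_path_len=1),
    Circuit("Amsterdam POP (Datacenter B)", weight=100, as_path_len=3),  # prepended
]

# Our routers compare weight first (highest wins), so egress stays on London.
egress = max(circuits, key=lambda c: c.weight)

# The far end has no notion of our local weight; it falls back to the shorter
# AS path, so ingress also stays on London until that advertisement disappears.
ingress = min(circuits, key=lambda c: c.as_path_len)

print(egress.name, ingress.name)  # both: London POP1 (Datacenter A)
```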

Under this same topology I run iBGP between our two data centres. Through this design a route-map can be used to manipulate the path to Azure that each data centre takes. As we have two 1Gb/s links, one at each data centre, it may be desirable to route some traffic out one link and the rest out the other (an example might be sending all the backup traffic out the second link).

I’m impressed with how flexible Azure allows this to be.

PS: When you have provisioned your second dedicated link, be sure to connect both to your VNet!

 

I’m currently involved in a project to move a data centre to ‘the cloud’. For commercial reasons, Azure was the chosen platform, and I have been tasked with evaluating the networking capability there. While Amazon AWS has the luxury of a few years’ head start and better adoption from most networking/security players, Azure is very immature in this area. There is currently only one firewall vendor present in Azure, and that is Barracuda.

Some of the Azure networking limitations that exist as of today (06/2014):

  • No network-level ACLs between guests in a single subnet. Any host in a subnet has free-for-all access to other guests in the same subnet. You cannot create VACLs like you would in a traditional DMZ environment. If one machine is compromised, there’s a good chance others will go with it.
  • There is a big reliance on guest OS firewalling. All the technical guides suggest you use some sort of firewall on the guest OS itself – generally iptables for Linux and Windows Firewall for Windows. Other vendors don’t seem to be recommended.
  • Access between virtual networks must use public endpoints. This means public IP addresses and NAT. A public IP address may represent several guests within a group, so the actual source of the traffic is obfuscated, which makes controlling this access less granular.
  • No role-based access – your platform team has as much access to make network changes as your network team does.
  • By default, guests have full outbound access if they are internet-accessible (i.e. have at least one endpoint). Once again, a firewall on the guest OS must be used to restrict this.
  • No gateway changes – there is no way to add a new default route to send traffic through a particular networking device, e.g. a firewall.
  • Only one NIC per guest, so no internal/external NIC topology is possible.

My impression is that Microsoft is pretty proactive about the Azure platform – it’s being improved constantly – but the networking side doesn’t seem to get much love. I’ll be doing a lot of work on this over the coming months, so I’ll post more information as I discover it.

Have a look at the currently requested features – some of this stuff is pretty much networking 101! http://feedback.azure.com/forums/217313-networking-dns-traffic-manager-vpn-vnet.