I’ve been involved in a few discussions with people at Microsoft, and it seems there has been a swing in sentiment regarding whether customers should use Microsoft Peering over ExpressRoute at all. The long and short of Microsoft Peering is that it makes several Azure services available over ExpressRoute rather than the general internet. This includes Exchange Online, SharePoint Online, Skype for Business, and Dynamics 365. It excludes services such as Azure websites and Azure storage, as these are available over ‘Public Peering’ instead.
When we first deployed Office 365 (probably a good four years ago now) we saw a noticeable increase in performance when running over ExpressRoute rather than the internet. Our PAs were running Outlook in non-cached mode due to the number of mailboxes and calendars they were looking after, and ExpressRoute made a big difference here.
It’s hard to say at this point why the recommendation would have changed from a reliable path (ExpressRoute) to big bad internet routing. To be fair, Microsoft probably peers with most major ISPs in some way or another, so maybe that’s where they are coming from (that it’ll effectively be the same number of hops anyway?). It’s still a contended path that doesn’t support QoS.
I see some murmurings on the internet that they are looking to combine the Public and Microsoft peerings into a single routing domain. I’d welcome this, as it means a few fewer BGP peering points. All of the prefixes they advertise now carry BGP communities, so it’s easy enough to distinguish the different services being advertised over the same peering.
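As a rough illustration of what distinguishing services by community looks like, here is a minimal Python sketch. The community values in the table are my assumption of what Microsoft publishes for these services and should be checked against the current ExpressRoute routing documentation. The prefixes are documentation ranges, not real Microsoft routes, and `classify` is a made-up helper name.

```python
# Assumed community-to-service mapping; verify against Microsoft's
# current ExpressRoute routing requirements before relying on it.
SERVICE_BY_COMMUNITY = {
    "12076:5010": "Exchange Online",
    "12076:5020": "SharePoint Online",
    "12076:5030": "Skype for Business Online",
}


def classify(routes):
    """Group prefixes by service, based on the communities on each route."""
    grouped = {}
    for prefix, communities in routes:
        services = [
            SERVICE_BY_COMMUNITY[c] for c in communities if c in SERVICE_BY_COMMUNITY
        ]
        # Routes with no recognised community land in an "unknown" bucket.
        for service in services or ["unknown"]:
            grouped.setdefault(service, []).append(prefix)
    return grouped


# Hypothetical received routes (RFC 5737 documentation prefixes).
routes = [
    ("192.0.2.0/24", ["12076:5010"]),
    ("198.51.100.0/24", ["12076:5030"]),
    ("203.0.113.0/24", ["65000:1"]),
]

for service, prefixes in classify(routes).items():
    print(service, prefixes)
```

On a real router the same idea would be a community list plus a route map or route filter; the sketch only shows the matching logic.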
The benefits of ExpressRoute are well documented, and from the horse’s mouth, “ExpressRoute connections offer higher security, reliability, and speeds, with lower and consistent latencies than typical connections over the Internet”. A point of contention has been this so-called reliability.
The only way to guarantee reliability in packet-switched networks is QoS. Sure, through capacity management and/or ‘massive pipes’ congestion can be avoided to a certain extent, but there is always a risk that one device, user, or application could consume all of that bandwidth, leaving other systems with a smaller piece of the pie.
We have recently adopted Skype for Business PSTN Calling, and through this have developed a need for a priority queue for EF-marked packets (effectively, voice traffic). Microsoft’s documentation is quite clear on how it thinks QoS should be deployed.
It needs to be deployed end-to-end.
A QoS capable connection must be configured end-to-end (PC, network switches and routers to the cloud) as any part in the path that fails to support QoS could degrade the quality of the entire call
Your ExpressRoute provider must provide a class of service for EF packets.
Each ExpressRoute network service provider will have a class of service (QoS) that is appropriate for real-time voice and video. This COS is called ‘Expedited Forwarding’ (EF) for voice and ‘Assured Forwarding’ (AF) for video
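For anyone unfamiliar with the markings, EF and AF are DSCP code points carried in the top six bits of the IP header’s TOS/Traffic Class byte. The arithmetic below is a small sketch of how the values are derived. EF is DSCP 46 and AF classes follow the 8 × class + 2 × drop precedence pattern from the RFCs. Using AF41 for video is my assumption of a common choice, not something the quoted documentation states.

```python
EF = 46  # Expedited Forwarding code point (RFC 3246)


def af(traffic_class, drop_precedence):
    """Return the DSCP value for Assured Forwarding AF<class><drop> (RFC 2597)."""
    if traffic_class not in range(1, 5) or drop_precedence not in range(1, 4):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * traffic_class + 2 * drop_precedence


def tos_byte(dscp):
    """Shift the 6-bit DSCP into the upper bits of the 8-bit TOS field."""
    return dscp << 2


AF41 = af(4, 1)  # assumed video marking for illustration

print("EF   dscp", EF, "tos", hex(tos_byte(EF)))
print("AF41 dscp", AF41, "tos", hex(tos_byte(AF41)))
```

These are the values a provider’s class of service would need to match on and queue accordingly.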
The problem is that our network provider doesn’t offer this, and Microsoft doesn’t mandate the requirement despite its clear stance on it. I’ve labored through support tickets with both our provider and Microsoft. The provider has no interest, as there is no pressure (from Microsoft, or from other customers at this point), and Microsoft won’t force them to implement it.
This post serves little more purpose than to say ‘be careful’ when expecting your ExpressRoute provider to support QoS. They may not, and they are not required to.
We are looking at alternative network providers.
We have an old Meridian phone system which is about the same size as a small car. As part of our wider strategy to move most of our core infrastructure to Azure, we have begun testing Microsoft’s “PSTN calling” add-on to the Skype for Business client. PSTN calling requires no on-premises infrastructure (i.e. SIP trunks or a session border controller) and can leverage either your internet circuit or ExpressRoute links. The service has been available for some time in the US, and there is now a preview available in the UK for some select customers. For more information about the service itself, have a look here.
I have touched on our use of a Squid proxy previously with regards to accessing Skype for Business/O365. It is needed mostly because of my reluctance to write a firewall policy allowing end clients to directly access the huge range of (changing) subnets located in Azure. While our Cisco ASAs do support DNS names, they do not support wildcard domains (how could they? They need to perform DNS resolution to resolve a name to an IP). Although MS best practice says “don’t use a proxy, make exceptions”, I really don’t see a great way to write firewall policies to make this workable. The Squid proxy provides a low-latency method of accessing the vast list of URLs for these services. We don’t do anything fancy on the proxies: no SSL decryption, virus scanning, or URL categorisation. A PAC file then defines which URLs should go through Squid, and which should go through our main internet path (Zscaler, FYI).
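To show the kind of decision the PAC file makes, here is a minimal sketch of the logic in Python. A real PAC file is JavaScript built around `FindProxyForURL` and `shExpMatch`, so this is only a model of the wildcard matching, not our actual file. The domain patterns and both proxy hostnames are hypothetical placeholders.

```python
from fnmatch import fnmatch

# Hypothetical wildcard list; the real set of O365/Skype domains is far longer.
SQUID_PATTERNS = ["*.lync.com", "*.office365.com", "*.outlook.com"]

# Placeholder return strings in PAC style; hostnames are made up.
SQUID = "PROXY squid.example.internal:3128"
DEFAULT = "PROXY gateway.example.internal:80"  # stands in for the main internet path


def find_proxy_for_host(host):
    """Send hosts matching a wildcard pattern to Squid, everything else to the default path."""
    host = host.lower()
    for pattern in SQUID_PATTERNS:
        if fnmatch(host, pattern):
            return SQUID
    return DEFAULT


print(find_proxy_for_host("outlook.office365.com"))
print(find_proxy_for_host("www.example.org"))
```

The point is that the wildcard match happens on the hostname at the client, which is exactly what a firewall rule keyed on resolved IP addresses cannot do.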
Traditionally we have protected voice traffic by marking packets with DSCP values and writing a QoS policy to protect this bandwidth. QoS really needs to be set up end to end, otherwise congestion somewhere can cause voice/video degradation. A QoS GPO will usually mark the Skype packets, as they are expected to be in a certain port range. The problem with a Squid server in the chain is that Skype will be accessed on the Squid listening port (3128) by default, not the Skype port ranges (49152:57500). You could write a GPO to mark anything going to the Squid server, but what happens if non-voice/video traffic goes there as well? It will be marked unintentionally. Text conversations, files, videos etc. will all be tunnelled over port 3128, so we won’t be able to distinguish them by GPO.
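The problem can be sketched in a few lines of Python. This is a simplified model that classifies on destination port only, which is an assumption about how the marking policy is keyed. The flow names are invented, and `dscp_for` is not a real GPO API.

```python
EF = 46             # DSCP for voice
BEST_EFFORT = 0     # unmarked traffic

MEDIA_PORTS = range(49152, 57501)  # the 49152:57500 range mentioned above
SQUID_PORT = 3128


def dscp_for(dst_port, mark_squid=False):
    """Model a port-based marking policy; optionally mark everything sent to Squid."""
    if dst_port in MEDIA_PORTS:
        return EF
    if mark_squid and dst_port == SQUID_PORT:
        return EF
    return BEST_EFFORT


# Hypothetical flows: (description, destination port)
flows = [
    ("voice, direct", 50000),
    ("voice, via Squid", SQUID_PORT),
    ("file transfer, via Squid", SQUID_PORT),
]

for name, port in flows:
    print(f"{name:26} port-range policy: {dscp_for(port):2}  "
          f"mark-all-Squid policy: {dscp_for(port, mark_squid=True):2}")
```

With the port-range policy, proxied voice goes unmarked. With the mark-everything policy, the file transfer gets the same priority as voice. Neither outcome is what we want.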
What is the solution? I’m not so sure yet!