
Orchestration and Automation Challenges When Every Network is a Snowflake

August 31, 2016
By Special Guest
Olivier Huynh Van, CTO and Co-founder of Glue Networks

Today’s wide area networks (WANs) have reached a level of complexity unlike anything IT has seen before. This is particularly the case for network operators who must manage networks that have grown over time into a mix of new and old equipment, so-called “brownfield” environments.

The variety of network equipment in large enterprises is staggering. Business users ask for specific throughput guarantees for their critical applications, legal departments require adherence to mandatory compliance frameworks, and operations teams are asked to do more with shrinking budgets. These requirements do not align easily with existing network architectures. As a result, network operators face a continuous stream of granular parameter change requests and must keep up with changing requirements without the proper tools in place.

Traditional network engineers have managed configurations hop by hop or router by router, a practice that is no longer feasible in light of rapid changes to hardware and software. Network complexity today requires control of the entire environment from end to end, and the ability to apply policy changes with precision across the whole network. Consider the questions that must be answered before a change can be made for a user:

  • Does the user need access to public or private cloud (or both)?
  • What is the location of the end user’s device, be it a laptop or a PC?
  • Is the device hard-wired or connecting wirelessly to the network?
  • Is the device on a corporate LAN or coming from a remote site?
  • Will changes to the device involve other operations teams (think telco and VoIP)?

Once all that is determined, the change must be implemented so that it meets the user’s request and satisfies the network performance requirements of the application while having minimal effect on the other services running on the network.

Then there’s the age-old problem of keeping up with those changes. It isn’t unusual for teams to fall behind as the changes roll in, but when operations is given a complex request only to realize there is no current network documentation, the results can be disastrous. Often senior network architects, designers and engineers must collaborate to minimize potential side effects, which slows the process even more.

Rebuilding the Jet Engine While in Flight

The job of an engineer is akin to rebuilding a jet engine while in flight. In corporate networks, disruptions can be catastrophic to the business. Hence, all changes must undergo stringent verification and approval processes. Add in changes that must be made across domains—for example, security, disaster recovery, video and VoIP—and a disparate knowledge base between the different departments involved often leads to conflicting or incomplete change requests.

Other issues compound this effect, for example the lack of highly skilled “full-stack engineers,” professionals who can both program software and make network configuration changes on the fly, regardless of the application or the equipment. Operations personnel tend to be generalists, and without specialized skills they must rely on subject matter experts who are well versed in particular subsets of technology. Alternatively, they must wait for configuration changes to be released by the hardware or software vendor, or engage a third-party contractor, a slow and painful process at best.

It is important to reiterate that most network projects are “brownfield,” gradually enhancing and evolving an existing network. This is in contrast to the less common “greenfield” scenario, where engineers can pick the newest router or product with the most up-to-date feature set. The average replacement cycle is usually dictated by the manufacturer’s support cycle, typically five to 10 years. Such a long cycle can create a significant mismatch in supported feature sets, since new firmware is issued every six to 12 months on average and existing devices are updated only when necessary. Today, engineers can’t treat every router or device as identical. Instead they must consider which firmware version is installed and which hardware plug-in extensions have come and gone, and then mix and match configurations that work from end to end.

Challenges Times Many

From real-time quotes on a trading desk to emergency services responding to 911 calls, the performance of today’s critical apps leaves no room for error, failure or downtime, bringing us back to the “jet in flight” scenario. In fact, it’s not just a single jet; it’s more like fixing multiple jet engines from multiple manufacturers while in flight.

Even when enterprises try to standardize deployed hardware, the situation can become untenable. With nearly 30 device types per vendor on the market, all featuring a variety of firmware, and two or three vendors in deployment, the numbers stack up quickly. Given that each piece of equipment comes with its own command line or user interface for configuration, it becomes a nightmare for even the savviest network engineer.
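
To put rough numbers on that, the figures quoted in this article can be multiplied out. The sketch below is only a back-of-the-envelope estimate: it assumes (this is not stated in the article) that any device type might be running any firmware release issued during its support cycle, so the result is a bound on possible combinations rather than a count of what a real network contains. The function names are illustrative.

```python
# Figures quoted in this article.
DEVICE_TYPES_PER_VENDOR = 30        # nearly 30 device types per vendor
VENDORS_DEPLOYED = (2, 3)           # two or three vendors in deployment
SUPPORT_CYCLE_YEARS = (5, 10)       # support cycle of five to 10 years
FIRMWARE_INTERVAL_MONTHS = (6, 12)  # new firmware every six to 12 months


def firmware_releases(cycle_years: int, interval_months: int) -> int:
    """Number of firmware releases issued during one support cycle."""
    return cycle_years * 12 // interval_months


def variant_bounds() -> tuple[int, int]:
    """Lower and upper bound on (device type, firmware release) pairs.

    ASSUMPTION (not from the article): every device type may be running
    any release issued during its support cycle.
    """
    # Shortest cycle with the slowest release cadence gives the fewest releases.
    fewest = firmware_releases(min(SUPPORT_CYCLE_YEARS), max(FIRMWARE_INTERVAL_MONTHS))
    # Longest cycle with the fastest release cadence gives the most releases.
    most = firmware_releases(max(SUPPORT_CYCLE_YEARS), min(FIRMWARE_INTERVAL_MONTHS))
    low = DEVICE_TYPES_PER_VENDOR * min(VENDORS_DEPLOYED) * fewest
    high = DEVICE_TYPES_PER_VENDOR * max(VENDORS_DEPLOYED) * most
    return low, high


if __name__ == "__main__":
    low, high = variant_bounds()
    print(f"Possible device/firmware combinations: {low} to {high}")
```

Under these assumptions the estimate lands between 300 and 1,800 combinations, before hardware plug-in extensions are even taken into account.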

The next major challenge is security. High-profile breaches, cyber attacks and the liabilities organizations face have made the security policy of a “vanilla” access list on a router a thing of the past. Simple firewalling and access control lists are no longer sufficient; they must be extended with more sophisticated intrusion detection and prevention systems. And using public internet transport for low-cost bandwidth requires additional layers of secured virtual networks.

Capacity planning has also become a hot topic. Many companies are now leveraging Software as a Service (SaaS) and Infrastructure as a Service (IaaS) such as AWS and Azure. Network bandwidth usage and flow patterns have changed significantly and are evolving rapidly with the introduction of additional services such as rollouts of Salesforce or Office 365, creating unique demands on networks that were originally designed around “internal use only” traffic and security concepts.

The reality is that adding new equipment to today’s complex networks can’t be done overnight.

Managing the Unmanageable

Some organizations have moved towards a self-service IT portal to provide help desk functionality and auto-ticketing. However, these services are typically reserved for simpler, specific tasks, such as accessing a network drive or adding access to a printer, that don’t require a change to network functions.

IT helpdesk ticketing systems were designed to assign incoming change requests to the proper engineer. Even so, requests that are handled manually can take three to five days to implement when no new equipment is needed. When new equipment such as a circuit or router is involved, turnaround can take 30 to 60 days or more and may require senior architects and network engineers to implement.

On the path to automation, here are a few tips on how to break down the challenge:

  • Shift the mindset of the organization. With the right tools and proper planning, automation does not need to be complicated. Start by automating a specific pain point, such as a QoS policy. A small success builds momentum for the automation strategy when you move on to the next pain point.
  • Define a strategy, then a policy to match. The strategy usually comes from the top down and relates to network attributes such as scalability, reliability and security. Rolling out standard policies across your enterprise, starting with simple enforcement of standards for DNS and NTP and progressing to routing and tunneling, will go a long way. These policies define the rules for each feature so that centralized control and policy management can be enforced. This is the start of taking control of your network.
  • Define the intended functionality for each device. Abstract models of standard features, nodes and site configurations are vendor-agnostic and can be implemented no matter what device you are working on. Consider what you want the network to do, then build it into the rule set to develop your model. Having models for network features, node types, site types and more will help tremendously when implementing the models in an automation/orchestration platform.
  • Capture the “as-is” state and deploy changes against your models. The challenging part here is getting started early. Conducting a discovery exercise on an existing network can be complex, and the complexity grows when what an internal configuration management database or existing documentation says differs from the devices and firmware that are actually in place. Once the network is known, it is possible to deploy changes against your modeled functionality and migrate the network to the desired state. After this is complete, it is essential to immediately protect the network against unauthorized changes by automatically monitoring the configuration state to ensure no other changes are applied.
  • Perform lifecycle management. The network is always changing. New sites are being deployed. Devices are being upgraded. User requirements and the applications using the network are ever evolving. To enable and maintain the agility to service these requests, centralized model-driven automation and orchestration are necessary. This type of control ensures that new devices are provisioned correctly and that policies and best practices are maintained throughout the network. To enable this, automation and orchestration tools must not slow down engineering and operations, and must enable DevOps to quickly develop, test and deploy new features into the production network.
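
As an illustration of the vendor-agnostic modeling described in the tips above, the following minimal sketch defines one baseline policy for NTP and DNS and renders it into two different command styles. It is an assumption-laden example, not how any particular orchestration product works: `BaselinePolicy`, `render` and the renderer functions are hypothetical names, the addresses come from a documentation-only range, and the output lines are simplified IOS-style and Junos-style statements rather than complete device configurations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselinePolicy:
    """Vendor-agnostic statement of intent (hypothetical model)."""
    ntp_servers: tuple[str, ...]
    dns_servers: tuple[str, ...]


def render_ios_style(policy: BaselinePolicy) -> list[str]:
    """Render the policy as simplified IOS-style configuration lines."""
    lines = [f"ntp server {ip}" for ip in policy.ntp_servers]
    lines += [f"ip name-server {ip}" for ip in policy.dns_servers]
    return lines


def render_junos_style(policy: BaselinePolicy) -> list[str]:
    """Render the policy as simplified Junos-style 'set' statements."""
    lines = [f"set system ntp server {ip}" for ip in policy.ntp_servers]
    lines += [f"set system name-server {ip}" for ip in policy.dns_servers]
    return lines


# Platform keys are illustrative labels, not an inventory standard.
RENDERERS = {"ios": render_ios_style, "junos": render_junos_style}


def render(policy: BaselinePolicy, platform: str) -> list[str]:
    """Translate one policy into the syntax of the given platform."""
    try:
        renderer = RENDERERS[platform]
    except KeyError:
        raise ValueError(f"no renderer for platform {platform!r}") from None
    return renderer(policy)


if __name__ == "__main__":
    # 192.0.2.0/24 is reserved for documentation, so these are placeholders.
    policy = BaselinePolicy(ntp_servers=("192.0.2.10",), dns_servers=("192.0.2.53",))
    for platform in RENDERERS:
        print(platform, render(policy, platform))
```

The point of the design is that the policy is written once and the per-vendor differences live in small renderers, so supporting another device family means adding a renderer rather than rewriting the policy.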

Best Practices for Managing Your Snowflake

Every network is a snowflake, so determining what works best in your environment will result in a less tangled web. Best practices suggest integrating an automated orchestration strategy that is model-driven but still provides the freedom and flexibility to handle customization. Next-generation solutions can help, but the underlying goal is that all network nodes remain in a config-synchronized state that falls in line with the approved network model. In doing so, engineers can untangle the network knot for good.
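
A config-synchronized state can be checked by comparing what the approved model says a node should contain with what the node actually contains. The sketch below is a deliberately simplified illustration with hypothetical function names: it assumes a configuration can be treated as an unordered set of lines, which does not hold for order-sensitive or hierarchical sections such as access lists, so a real tool would need a structure-aware comparison.

```python
def config_drift(intended: list[str], actual: list[str]) -> dict[str, list[str]]:
    """Compare intended and actual configuration lines.

    ASSUMPTION: line order and hierarchy do not matter. Real device
    configurations often violate this (for example, access lists).
    """
    intended_set = {line.strip() for line in intended if line.strip()}
    actual_set = {line.strip() for line in actual if line.strip()}
    return {
        # Lines the model requires but the device lacks.
        "missing": sorted(intended_set - actual_set),
        # Lines on the device that the model does not authorize.
        "unexpected": sorted(actual_set - intended_set),
    }


def is_synchronized(intended: list[str], actual: list[str]) -> bool:
    """True when the device matches the approved model exactly."""
    drift = config_drift(intended, actual)
    return not drift["missing"] and not drift["unexpected"]


if __name__ == "__main__":
    model = ["ntp server 192.0.2.10", "ip name-server 192.0.2.53"]
    device = ["ntp server 192.0.2.10", "ntp server 198.51.100.7"]
    print(config_drift(model, device))
    print(is_synchronized(model, device))
```

Run periodically against every node, a check of this kind flags both settings that were never applied and changes made outside the approved process.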

About the author:

Olivier Huynh Van is the visionary inventor of Gluware technology and leads R&D for Glue Networks. Previously, Olivier was CTO of Yelofin Networks, and he has 20 years of experience designing and managing mission-critical global networks for ADM Investor Services, Groupe ODDO & Cie, Natixis, Oxoid and Deutsche Bank. Olivier holds a Master’s Degree in Electronics, Robotics and Information Technology from ESIEA in Paris, France.

Edited by Alicia Young
