HyperScale Data Centers Featured Article


Evolution of the Network Engineer


May 30, 2018
By Special Guest
Joe Clarke, Distinguished Services Engineer, Cisco Systems

Each year, thousands of Cisco customers, partners, and employees descend on a city in Europe to discover, learn about, and collaborate over the latest Cisco technologies at Cisco Live. To enable the content sharing, interactive labs, exciting demos, and live streaming, Cisco engineers from throughout the company build and operate a powerful network.

The network at Cisco Live has become a critical piece of the conference, much as the network is a vital business enabler for customers in this digital world. The team responsible for the network, which has grown every year along with the network's size and complexity, ranges from early-in-career associate systems engineers to experienced CCIEs (Cisco Certified Internetwork Experts), technical leaders, and distinguished engineers. Many of these people return every year, bringing a richer set of experiences and providing unique perspectives and talents that help ensure a successful network deployment. This case study looks at the year-over-year evolution of the team in general, examining how the type of work has changed at each level.

When the team first began building out the network for Cisco Live in Europe, it was relatively small and operated in a heavily siloed manner. It wasn’t because they hadn’t worked together before, but rather because the different tasks involved in the build-out didn’t overlap much. There was the core routing team, the core switching team, wireless, access, data center and services, and network management. People knew their part of the network and did what they thought was best to accomplish their goals. There was no centralized automation, which meant a lot of configuration by hand and per device. Switches were unboxed and powered on, and configuration was copied and pasted into them. Complex Quality of Service (QoS) configuration on the edge routers was done by hand or with similar copying and pasting.

This manual configuration approach brought challenges that were compounded by the fact that the Cisco Live network was very dynamic, often needing last-minute changes driven either by stakeholder needs or by venue oddities. When there was a problem, it often meant breaking out a console cable and running across the venue to troubleshoot. Network visibility was limited to the tools that were on hand. Because of the lack of strong collaboration, work was sometimes duplicated. Ultimately, the team was very reactive, and the lack of pervasive network visibility and strong, holistic automation led to long hours.

This was very apparent in my own work as the lead for the data center build-out and operation.

In 2015, we needed to redesign the data center for Cisco Live Europe. Since this was the first time we were deploying this architecture, a lot of the configuration was done manually. This gave me the opportunity to learn how things work and become aware of any pitfalls or “gotchas.”

But if I had to replicate, say, adding a new VLAN, it meant having to manually configure four switches, four UCS fabrics, and 16 compute hosts. To carefully apply all the necessary commands on all of the devices took a good 20 minutes.

Even with the diligent pre-stage work, sure enough, once onsite, the need arose to add a new VLAN. After going through all of the manual steps while the network was running, I said, “I'm not doing that again.” I come from a programming and software background, and I had been spreading the message of automation at Cisco Live in various breakout sessions. It was time I practiced what I preached, so I began to build scripts that would automate the whole process.

I developed a set of Python and shell scripts that would take the parameters needed to create an end-to-end VLAN and push it to each device while checking that the configuration was properly applied. These scripts didn’t simply automate the CLI; they used the data model-driven APIs that the devices provided. This meant the configuration was applied more quickly and more reliably. So, at the next Cisco Live, none of the configuration had to be applied by hand. Instead of one VLAN taking 20 minutes, four new VLANs were created in two minutes.
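The scripts themselves are not reproduced in this article, but the push-then-verify pattern can be illustrated with a minimal Python sketch. Everything named here (FakeDevice, push_vlan, vlan_present, deploy_vlan, the VLAN number and name) is hypothetical and stands in for real devices; in a real deployment the push and verify steps would presumably wrap each device's model-driven API rather than update a dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeDevice:
    """In-memory stand-in for a switch, fabric, or host (hypothetical)."""
    name: str
    vlans: Dict[int, str] = field(default_factory=dict)


def push_vlan(device: FakeDevice, vlan_id: int, vlan_name: str) -> None:
    # Assumption: a real version would call the device's model-driven API here.
    device.vlans[vlan_id] = vlan_name


def vlan_present(device: FakeDevice, vlan_id: int, vlan_name: str) -> bool:
    # Read the state back instead of trusting that the push succeeded.
    return device.vlans.get(vlan_id) == vlan_name


def deploy_vlan(
    devices: List[FakeDevice],
    vlan_id: int,
    vlan_name: str,
    push: Callable[[FakeDevice, int, str], None] = push_vlan,
    verify: Callable[[FakeDevice, int, str], bool] = vlan_present,
) -> List[str]:
    """Push one VLAN to every device; return the names that failed verification."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID {vlan_id} is outside the usable range 1-4094")
    failed = []
    for device in devices:
        push(device, vlan_id, vlan_name)
        if not verify(device, vlan_id, vlan_name):
            failed.append(device.name)
    return failed


# Four switches, four UCS fabrics, and 16 compute hosts, as in the build described above.
inventory = (
    [FakeDevice(f"switch-{i}") for i in range(1, 5)]
    + [FakeDevice(f"fabric-{i}") for i in range(1, 5)]
    + [FakeDevice(f"host-{i}") for i in range(1, 17)]
)
failures = deploy_vlan(inventory, 210, "demo-lab")
print(f"{len(inventory)} devices configured, {len(failures)} failed verification")
```

Passing the push and verify steps in as functions keeps the loop identical for every device type, which is one plausible way a single run could cover switches, fabrics, and hosts alike.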

This successful use of automation was addictive. The next year, since we were using a similar data center architecture, I spent more time building scripts to monitor the health of the data center, the services running in it, and the network itself. I started to build Python scripts that gathered data using SNMP, device APIs, and application APIs; processed those data; and then pushed informative messages to various Spark rooms in which the NOC staff lurked. This allowed all of us to stay on top of issues such as routing table changes, DHCP pool exhaustion, interfaces going into an error-disabled state, and devices becoming unreachable. We also had a bot that would tell you where a given user, MAC address, or IP address was in the venue.
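As an illustration of the gather, process, and notify flow described above, here is a minimal Python sketch for one of the listed conditions, DHCP pool exhaustion. The function names, the 90 percent threshold, and the sample pool figures are assumptions made for this example; the data collection (SNMP or a device API) and the chat-room posting are left as injected pieces so that no real API is implied.

```python
from typing import Callable, Dict, List, Tuple


def dhcp_alerts(pools: Dict[str, Tuple[int, int]], threshold: float = 0.9) -> List[str]:
    """Return one message per pool whose lease usage is at or above the threshold.

    pools maps a pool name to (leases_in_use, pool_size).
    """
    messages = []
    for name, (in_use, size) in sorted(pools.items()):
        if size <= 0:
            continue  # Skip empty or misreported pools rather than divide by zero.
        usage = in_use / size
        if usage >= threshold:
            messages.append(
                f"DHCP pool {name} is {usage:.0%} full ({in_use}/{size} leases)"
            )
    return messages


def notify(messages: List[str], post: Callable[[str], None]) -> int:
    """Hand each message to a posting function; return how many were sent."""
    for message in messages:
        post(message)
    return len(messages)


# Hypothetical sample data; a real script would poll the DHCP server instead.
sample = {"attendee-wifi": (950, 1000), "speaker-wired": (40, 254)}
sent: List[str] = []
notify(dhcp_alerts(sample), sent.append)  # sent.append stands in for a chat-room post
print(sent)
```

Only the attendee-wifi pool crosses the threshold in this sample, so a single message would reach the room.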

The excitement for scripting and automation isn't just for those with a programming background, either. Application Programming Interfaces (APIs) are becoming more prevalent in applications as a way to do integration and customization that paves the way towards innovation. Near the end of Cisco Live this year, one of the wireless engineers in the NOC came over to me and asked for my help. The Cisco Mobile Experience (CMX) application had been updated with a feature in its analytics system that could make a remote call to an application. We had been using this in the NOC to have a Spark bot notify us when a cart (which had a wireless sensor on it) went out of a specified area. He wanted to develop a script to demo for his customers. This engineer didn’t have any prior scripting experience, but he saw the value in being able to do some basic coding. We sat for a while adapting the NOC Spark bot code to meet his needs, and when we were done, he had something working and useful so that he could show off the power of this feature to his customers.
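To give a sense of how little code such a notification handler needs, here is a minimal Python sketch of the receiving side. It is not the NOC bot code, and the payload is not the CMX notification format: the field names "asset" and "zone", the function cart_alert, and the sample values are all invented for illustration.

```python
import json
from typing import Optional


def cart_alert(body: str, allowed_zone: str) -> Optional[str]:
    """Turn a location notification into a chat message, or None if no alert is needed."""
    event = json.loads(body)
    # Hypothetical fields: "asset" names the tracked cart, "zone" is where it was last seen.
    asset = event["asset"]
    zone = event["zone"]
    if zone == allowed_zone:
        return None  # Still inside the permitted area, so stay quiet.
    return f"{asset} left {allowed_zone} and was last seen in {zone}"


# Hypothetical notification body, as a remote call might deliver it.
payload = json.dumps({"asset": "demo-cart", "zone": "Hall 8"})
print(cart_alert(payload, allowed_zone="NOC"))
```

The returned string would then be handed to whatever posts into the chat room.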

Over the years, both Cisco Live and the network operations team have grown. Eight years ago, there were about 5,000 attendees and a team of 35 operations engineers. At this past Cisco Live in Barcelona, there were about 15,000 attendees and a team of 70 network operations engineers, many of whom were new to Cisco and participating in the NOC for the first time. Each one of them played an important role in delivering an automation-centric, production-class network; many of them had key takeaways and insights into what they as individuals and the whole team can do better next year. It’s interesting to note that while the NOC team has doubled in eight years, the size of the event has tripled: roughly 143 attendees per engineer then, compared with about 214 now. The capacity of the network, as well as the number of services, will continue to increase, and because of increased use of automation, the NOC team can more than keep up.

As networks move from a traditional device-by-device management paradigm, through one driven by automation and orchestration, to one that is highly intent-focused, the way network interactions are performed changes, but the need for engineers at every skill level remains. Junior engineers will interact with the network less through the terminal and more through web-based portals or via API invocations. More senior engineers will use these same web-based tools to simulate new network designs and architectures and leverage the APIs to build custom integrations in order to tie the network more tightly to the core business (and make it a true digital differentiator).

General Transitional Activities

In summary, the value of the network engineer is stronger than ever. The workflow is shifting, however, to provide a more automation-focused set of network touch points. Additionally, the work tasks themselves are focusing on quicker, more scalable delivery; more reliable network interactions; and more time spent on higher-value network designs, business integrations, and mentoring to ensure a long-lasting pipeline of talented experts.

Below is a table of how this transition has been observed in the network operations center at Cisco Live.

Junior Engineers

| From | To |
| --- | --- |
| Pulling cable and installing switches/APs | Using an automation tool to set specific configuration elements |
| Using console cables to provision switches | Out-of-the-box automation to stage switches with an initial config |
| Waiting for someone to report a problem | Getting alerts via Spark about issues and proactively going to help users or remediate problems |
| Time spent waiting to get network or user details | Empowered through automation, orchestration, and programmability to access tools through web portals or via Spark bot interactions |
| Time spent running around, putting out fires | More opportunity to work with senior engineers to learn about the back-end operations |

Senior Engineers

| From | To |
| --- | --- |
| Jumping device-to-device to gather logs | Automated log consolidation and holistic network interface to gather real-time data |
| Multiple groups overlapping with each other | Tight collaboration and a concerted effort to automate config and monitoring |
| Repetitive copy and paste of configuration | Fully automated, complex configuration generated from a web tool, pushed to all required devices and validated |
| Interrupt driven on-boarding of new engineers | Time spent to develop self-service portal to add new users |
| Limited visibility of network health | Creation of API-driven dashboards that provide pertinent and actionable insight into network operations |
| Multiple days spent in pre-stage configuring and testing | Effort re-focused on creative architectural solutions to known challenges and how to automate the deployment and testing |
| Long days (and nights) onsite dealing with issues | Time freed up to work with junior engineers to grow talent and build apprentices to scale the next generation of automation |

About the author: Joe Clarke is a distinguished services engineer at Cisco Systems and has been an integral part of the Cisco Engineering team for over 15 years. Joe holds a CCIE certification and is a frequent speaker at Cisco Live conferences.




Edited by Maurice Nagle







