By Cathy Gadecki - June 24, 2022
The size and complexity of enterprise data center networks continue to grow, creating a serious burden for operations teams tasked with “keeping the lights on” and meeting service level agreements. A 2021 study from The Uptime Institute showed that 47 percent of data center operators say they’re struggling to find qualified candidates for open jobs, up from 38 percent in 2018. And as adoption of cloud, IoT, and edge computing increases, these technologies add to the workload of network engineers and technicians who are already overburdened.
Ultimately, no human being can keep track of all the links, devices, virtualizations, and policies that make up a modern data center network. Automation is the only viable solution. Operations teams now face one major problem: far too much repetitive, manual work, which increases the chance of human error and burnout. Automation-based tools can resolve it. These tools can complete in hours or minutes tasks that once took days or weeks, while also providing the change controls, up-to-date documentation, and compliance audits essential to network uptime and fast problem resolution. They are revolutionizing data center management, taking administrative tasks off operations teams’ plates and substantially improving network reliability.
The New Role of Automation in Day 2 Operations
Day 2 network operations, the ongoing work of running a network once it has been designed and deployed, are typically carried out manually today, and they consume an enormous amount of time and energy. Automation that covers the full operations life cycle can reduce the time and effort across the workflow, from configuring and deploying new services to the day-in, day-out tasks of routine change management, monitoring, troubleshooting, and root cause analysis. These tasks, such as finding the underlying reason an application is unreachable, are labor-intensive and leave network operators little time to apply their expertise to innovation and to growth plans that support enterprise goals.
One of the most revolutionary aspects of automation is Intent-Based Networking (IBN), a software-based approach to network configuration, deployment, and ongoing operations that uses high levels of intelligence, analytics, and orchestration to improve network operations and uptime. With IBN, engineers define the results the network must deliver in terms of connectivity, speed, availability, segmentation, and other parameters required by business needs. Once these parameters are declared, automation tools work out which configurations and policies are required to meet them. In other words, network operators focus on the “what,” while automation determines the “how.”
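To make the “what” versus “how” split concrete, here is a minimal Python sketch. It assumes a hypothetical intent model (virtual network names mapped to VLAN IDs, plus the racks that must carry each one), a hypothetical rack-to-leaf mapping, and a made-up, vendor-neutral configuration line format. Real IBN systems use far richer models and render vendor-specific syntax.

```python
from dataclasses import dataclass

# Hypothetical rack-to-leaf topology (an assumption for illustration only).
LEAF_FOR_RACK = {"rack1": "leaf1", "rack2": "leaf2", "rack3": "leaf3"}


@dataclass
class Intent:
    """The operator's declared "what": which networks exist and where they are needed."""
    virtual_networks: dict[str, int]   # network name -> VLAN ID
    placement: dict[str, list[str]]    # network name -> racks that must carry it


def render_configs(intent: Intent) -> dict[str, list[str]]:
    """Derive the "how": per-switch config lines (made-up, vendor-neutral syntax)."""
    configs: dict[str, list[str]] = {}
    for name, racks in intent.placement.items():
        vlan_id = intent.virtual_networks[name]
        for rack in racks:
            leaf = LEAF_FOR_RACK[rack]
            configs.setdefault(leaf, []).append(f"vlan {vlan_id} name {name}")
    # Sort switches and lines so the rendered output is deterministic.
    return {leaf: sorted(lines) for leaf, lines in sorted(configs.items())}


if __name__ == "__main__":
    intent = Intent(
        virtual_networks={"web": 10, "db": 20},
        placement={"web": ["rack1", "rack2"], "db": ["rack2"]},
    )
    print(render_configs(intent))
    # {'leaf1': ['vlan 10 name web'], 'leaf2': ['vlan 10 name web', 'vlan 20 name db']}
```

The operator only edits the Intent; if another rack later needs the db network, the switch-level lines fall out of render_configs instead of being typed by hand on each device.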
Because the system knows this “how,” it can greatly assist with common Day 2 operations tasks. These include monitoring for configuration drift and potential problems, flagging potential issues in planned changes, quickly identifying the specific root cause of a problem among a sea of real-time data and alerts, and rolling back the entire network to a known-good working state in minutes.
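Configuration drift monitoring comes down to comparing the intended state with what the devices actually report. A minimal sketch, assuming both states have already been flattened into dictionaries keyed by hypothetical setting paths such as "leaf1/eth1/mtu" (collecting and normalizing that state from real devices is the hard part and is not shown):

```python
def detect_drift(intended: dict[str, str], observed: dict[str, str]) -> dict[str, list[str]]:
    """Compare intended settings with settings observed on the devices.

    Keys are hypothetical setting paths such as "leaf1/eth1/mtu".
    """
    # Settings that should exist but were not found on the device.
    missing = sorted(k for k in intended if k not in observed)
    # Settings found on the device that the intent never asked for.
    unexpected = sorted(k for k in observed if k not in intended)
    # Settings present on both sides but with different values.
    changed = sorted(k for k in intended if k in observed and intended[k] != observed[k])
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


if __name__ == "__main__":
    intended = {
        "leaf1/eth1/mtu": "9216",
        "leaf1/eth1/admin": "up",
        "leaf1/bgp/neighbor/10.0.0.1": "enabled",
    }
    observed = {
        "leaf1/eth1/mtu": "1500",   # changed by hand at some point
        "leaf1/eth1/admin": "up",
        "leaf1/eth2/admin": "up",   # enabled outside the intent
    }
    print(detect_drift(intended, observed))
```

Here the missing BGP neighbor, the stray eth2 port, and the altered MTU each land in their own bucket, which is the kind of report an operator can act on directly.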
An essential benefit of automated monitoring and maintenance in data center operations is that it reduces human error. A recent survey by Juniper Networks indicated that 80 percent of businesses regularly experience network errors caused by human mistakes.
Faster Problem Resolution
While traditional network management tools can provide some troubleshooting insight, advanced tools underpinned by intelligent automation, such as Intent-Based Analytics (IBA), can predict issues proactively.
IBA uses real-time telemetry to continuously monitor the health of network nodes and can often predict a failure before it happens, allowing the underlying cause to be addressed before it results in downtime. The system also tracks the configuration state of the network and compares it with the intended state defined through IBN. Through these means, many problems can be resolved before users are even aware of them.
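Predicting a failure from telemetry can be as simple as extrapolating a trend. The sketch below is a generic least-squares extrapolation, not IBA’s actual analytics; the metric (optical receive power in dBm), the sample values, and the -7.0 dBm alarm threshold are all hypothetical.

```python
from typing import Optional


def hours_until_threshold(
    samples: list[tuple[float, float]], threshold: float
) -> Optional[float]:
    """Estimate hours until a falling metric crosses a lower alarm threshold.

    Fits an ordinary least-squares line to time-ordered (hour, value) samples
    and extrapolates it. Returns 0.0 if the fitted value at the last sample is
    already at or below the threshold, and None if there is too little data
    or the trend is not falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp; no trend can be fitted
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    if intercept + slope * last_t <= threshold:
        return 0.0  # the trend line has already reached the threshold
    if slope >= 0:
        return None  # flat or improving: no crossing predicted
    crossing_t = (threshold - intercept) / slope
    return crossing_t - last_t


if __name__ == "__main__":
    # Hypothetical optic losing 0.5 dBm per hour.
    rx_power = [(0, -3.0), (1, -3.5), (2, -4.0), (3, -4.5)]
    print(hours_until_threshold(rx_power, threshold=-7.0))  # 5.0
```

A five-hour warning is enough to schedule an optic swap before the link starts dropping packets. Production analytics would also smooth noise and combine several signals rather than trusting a single straight line.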
Modern automation tools also provide important predictive capabilities for planned network changes. They let network operators analyze the effects of a change and confirm in advance that it will have no unintended consequences, such as removing a necessary routing path.
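A simplified version of such a pre-change check: model the fabric as an undirected graph, apply the proposed link removals to a copy, and report any required endpoint pairs that are reachable today but would not be afterwards. This sketch assumes a hypothetical spine-leaf topology and only tests graph connectivity; a real analysis would also model routing policy, redundancy requirements, and capacity.

```python
from collections import deque


def link(a: str, b: str) -> frozenset[str]:
    """An undirected link is represented by the set of its two endpoints."""
    return frozenset((a, b))


def reachable(links: set[frozenset[str]], src: str, dst: str) -> bool:
    """Breadth-first search: is there any path from src to dst over these links?"""
    adjacency: dict[str, set[str]] = {}
    for edge in links:
        a, b = tuple(edge)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def unsafe_pairs(links, removals, required_pairs) -> list[tuple[str, str]]:
    """Required pairs that are connected now but would be cut off by the change."""
    after = links - removals
    return [
        (src, dst)
        for src, dst in required_pairs
        if reachable(links, src, dst) and not reachable(after, src, dst)
    ]


if __name__ == "__main__":
    topology = {
        link("leaf1", "spine1"), link("leaf2", "spine1"),
        link("leaf1", "spine2"), link("leaf2", "spine2"),
        link("leaf3", "spine2"),  # leaf3 is single-homed
    }
    required = [("leaf1", "leaf2"), ("leaf1", "leaf3")]
    print(unsafe_pairs(topology, {link("leaf1", "spine1")}, required))  # []
    print(unsafe_pairs(topology, {link("leaf3", "spine2")}, required))  # [('leaf1', 'leaf3')]
```

Removing leaf1-spine1 is safe because spine2 still provides a path, while removing leaf3’s only uplink is flagged before anyone touches the device.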
No matter how diligent operators are, however, it’s a fact of data center life that systems go down. When problems arise, IT teams often find out only after the issues have already affected business processes. This puts them in reactive mode, where resolving issues consumes all of their time.
When a problem is severe enough to cause lag or dropped connections for users, IT teams would normally need to check and sort through many different alerts manually to find the one underlying issue triggering the multitude of resulting problems and failures. Traditionally, the sheer number and variety of possible root causes had operators bouncing from one guess to another: a routing issue, a misconfiguration, a malfunctioning application creating traffic bursts, an intermittent interface failure, an optic brownout, and so on. With the visibility and automated identification that IBA provides, teams are directed to the problem immediately.
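One common way to narrow many alerts down to a likely root cause is to walk a dependency graph: an alerting element whose own dependencies are all healthy is a probable root cause, and the alerts downstream of it are symptoms. The sketch below illustrates that generic technique, not IBA’s actual algorithm; the element names and the dependency map are hypothetical.

```python
def upstream(node: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Every element that `node` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = list(depends_on.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, []))
    return seen


def rank_root_causes(
    depends_on: dict[str, list[str]], alerting: set[str]
) -> list[tuple[str, int]]:
    """Rank probable root causes by how many alerting elements each explains.

    A probable root cause is an alerting element with no alerting dependency.
    """
    roots = {a for a in alerting if not (upstream(a, depends_on) & alerting)}
    counts = {root: 0 for root in roots}
    for element in alerting:
        # An alert is explained by any root it is, or transitively depends on.
        for root in ({element} | upstream(element, depends_on)) & roots:
            counts[root] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    depends_on = {
        "app-checkout": ["vm-web1"],
        "app-search": ["vm-web2"],
        "vm-web1": ["leaf1:eth5"],
        "vm-web2": ["leaf1:eth5"],
        "leaf1:eth5": ["optic:leaf1:eth5"],
        "vm-mail": ["leaf2:eth1"],
    }
    alerting = {
        "app-checkout", "app-search", "vm-web1", "vm-web2",
        "leaf1:eth5", "optic:leaf1:eth5", "vm-mail",
    }
    print(rank_root_causes(depends_on, alerting))
    # [('optic:leaf1:eth5', 6), ('vm-mail', 1)]
```

Seven alerts collapse into two candidates, with the failing optic explaining six of them, which is the “one underlying issue” an operator would otherwise hunt for by hand.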
Further, while fixes are underway, these systems offer a fast path to restoration: rolling back the entire network to a prior working configuration. This restores services quickly through a single command, giving operators the breathing room to test and confirm that permanent fixes will be effective.
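Whole-network rollback depends on keeping versioned snapshots of the complete intended configuration, so that “go back to the last good state” is a single operation rather than a device-by-device undo. A minimal sketch of that bookkeeping, assuming a hypothetical in-memory store and the same made-up configuration line format used for illustration elsewhere; pushing the restored configuration out to the devices is outside its scope.

```python
import copy


class ConfigHistory:
    """Hypothetical versioned store of whole-network configuration snapshots."""

    def __init__(self, initial: dict[str, list[str]]):
        # Revision 0 is the starting configuration for every switch.
        self._revisions = [copy.deepcopy(initial)]

    @property
    def running(self) -> dict[str, list[str]]:
        # The newest revision is the one considered deployed; return a copy
        # so callers cannot mutate stored history by accident.
        return copy.deepcopy(self._revisions[-1])

    def commit(self, config: dict[str, list[str]]) -> int:
        """Record a new whole-network configuration and return its revision number."""
        self._revisions.append(copy.deepcopy(config))
        return len(self._revisions) - 1

    def rollback(self, revision: int) -> int:
        # Restoring an old revision is itself committed as a new revision,
        # so the record of what changed, and when, is never rewritten.
        return self.commit(self._revisions[revision])


if __name__ == "__main__":
    history = ConfigHistory({"leaf1": ["vlan 10 name web"]})               # revision 0: known good
    history.commit({"leaf1": ["vlan 10 name web", "vlan 99 name test"]})  # revision 1: bad change
    history.rollback(0)                                                    # revision 2 restores 0
    print(history.running)  # {'leaf1': ['vlan 10 name web']}
```

Recording the rollback as a new revision, instead of deleting the bad one, keeps the audit trail intact for the later root cause review.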
All these Day 2 automation capabilities are of huge value. In data centers that rely on manual intervention alone, resolving an issue, especially an intermittent one, can take anywhere from hours to days. IBA reduces these timeframes dramatically: teams using automation tools have shown an 80 percent improvement in operational efficiency and a 70 percent improvement in mean time to repair (MTTR).
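For readers who want to track these metrics themselves, MTTR is the total repair time divided by the number of incidents, and a percentage improvement compares the before and after values. The incident durations below are hypothetical and chosen only to show what a 70 percent MTTR improvement means in hours.

```python
def mttr(repair_hours: list[float]) -> float:
    """Mean time to repair: total repair time divided by the number of incidents."""
    if not repair_hours:
        raise ValueError("need at least one incident")
    return sum(repair_hours) / len(repair_hours)


def percent_improvement(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after` (positive means faster repairs)."""
    return 100 * (before - after) / before


if __name__ == "__main__":
    before = mttr([4.0, 16.0, 10.0])  # hypothetical manual-era incidents: 10.0 hours
    after = mttr([1.0, 5.0, 3.0])     # hypothetical automated-era incidents: 3.0 hours
    print(before, after, percent_improvement(before, after))  # 10.0 3.0 70.0
```

In other words, a 70 percent improvement turns a typical ten-hour repair into a three-hour one.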
Automation: From Nice-to-Have to Must-Have Status
Data centers have grown in size and complexity to the point where human beings simply cannot manage them effectively without help. Fortunately, new automation-based technologies are available that can bring higher speed, greater efficiency, and, most importantly, higher reliability to data center operations. With the right tools, operators can dramatically reduce the burden of manual administrative tasks and focus their efforts on exploring new technologies for improved operations and innovation.
Cathy Gadecki has over 20 years of experience leading complex, interdisciplinary teams on both the vendor and customer sides at companies such as Brocade and NetApp. Now Senior Director of Enterprise at Juniper, she serves as an industry expert, setting the goals and direction of launches and providing decision-making and redirection to adapt to new learnings and changing dynamics in the market and team.