Lowering aggregation link costs while improving service quality
22.2.2019

Automated decongestion of critical network links

A major international service provider was facing service quality issues caused by regular but non-deterministic congestion on its expensive international aggregation links. NIL helped them develop a solution that automatically detects congestion and rebalances traffic across multiple links.

In regions where inter-domain and long-distance bandwidth is still very expensive, telecommunication operators use a variety of techniques to balance their traffic across several aggregation links. Because traffic patterns change constantly, it is almost impossible to predict congestion across those links reliably, and service often degrades to unacceptable levels as a result.

Our client, a large international telecommunication operator, provides integrated services for three main segments: individuals, businesses, and carriers. They have deployed one of the largest wireless networks in the region and one of the most extensive fiber-to-the-home (FTTH) networks. They also run one of the biggest data centers in the region and worldwide.

Traffic congestion on expensive aggregation links

Despite their state-of-the-art infrastructure, rapid user-base growth and service expansion led to regular congestion on our client's aggregation links, most often triggered by upstream link failures. The specific challenges our client was facing were:

  • Traffic congestion on links toward peering points
  • Traffic congestion on links toward transit providers
  • Traffic congestion on internal links due to the failure of the transit links
  • Traffic congestion on internal links due to the failure of aggregation links

Relying only on manual intervention, the customer was unable to resolve or control the congestion in a timely manner. This resulted in service outages and degradation, causing SLA violations and subsequent penalties. The situation was also damaging their reputation and leaving customers dissatisfied.

Cutting the re-provisioning time from days to seconds

To maintain their position as the leading regional service provider, our client decided to eliminate these outages and raise the overall quality of their services as soon as possible. They wanted a solution that could recognize congestion immediately and resolve it effectively, without negatively affecting service delivery.

After considering different approaches, and based on their technology stack, they decided to involve NIL in developing the solution. We specialize in integration and software development for network automation platforms, so we were able to create a network automation solution tailored to the client's needs.

NIL's solution models the network and proposes configuration changes that make better use of the available network paths in the client's core environment. With our solution, the client was able to cut the re-provisioning time from days down to seconds. This was achieved through the following steps:

  • The client's operations team is automatically notified of a peering link outage that is about to cause congestion elsewhere
  • The automation solution instantly proposes the best evacuation plan for the affected prefixes through a simple web user interface
  • The client’s operations team fine-tunes or just confirms the change
  • The automation solution reliably pushes the new configurations to the network devices
  • When the failed peering link is back online, the solution suggests and applies the appropriate rollback
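The "evacuation plan" and "rollback" steps above can be illustrated with a small, hypothetical sketch. This is not NIL's algorithm: it assumes a simple greedy heuristic in which every prefix routed over the failed link is moved, largest first, to the surviving link with the most headroom below a utilization ceiling (assumed here to be 80%). The `Link` class, `plan_evacuation`, `rollback` and the traffic figures are illustrative names only, and pushing the resulting configuration to devices (which the article attributes to Cisco NSO) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical model of an aggregation link (all values in Mbps)."""
    name: str
    capacity_mbps: float
    load_mbps: float = 0.0
    up: bool = True

    def headroom(self, max_util: float) -> float:
        """Traffic the link can still absorb while staying at or below max_util."""
        return self.capacity_mbps * max_util - self.load_mbps


def plan_evacuation(failed: str, links: list[Link],
                    prefixes: dict[str, tuple[str, float]],
                    max_util: float = 0.8):
    """Greedy evacuation plan (an assumed heuristic, not the product's logic).

    `prefixes` maps prefix -> (current link name, traffic in Mbps).
    Returns (plan, unplaced): plan maps prefix -> new link name; unplaced lists
    prefixes no surviving link can take without exceeding max_util.
    Note: the load of the chosen surviving links is updated in place.
    """
    survivors = [l for l in links if l.up and l.name != failed]
    # Place the heaviest prefixes first, while the most headroom is still free.
    affected = sorted(
        ((p, mbps) for p, (link, mbps) in prefixes.items() if link == failed),
        key=lambda item: item[1],
        reverse=True,
    )
    plan, unplaced = {}, []
    for prefix, mbps in affected:
        best = max(survivors, key=lambda l: l.headroom(max_util), default=None)
        if best is None or best.headroom(max_util) < mbps:
            unplaced.append(prefix)
            continue
        best.load_mbps += mbps
        plan[prefix] = best.name
    return plan, unplaced


def rollback(plan: dict[str, str],
             prefixes: dict[str, tuple[str, float]]) -> dict[str, str]:
    """Return every moved prefix to the link it used before the evacuation."""
    return {prefix: prefixes[prefix][0] for prefix in plan}


links = [
    Link("transit-a", 10_000, 2_000, up=False),  # the failed link
    Link("transit-b", 10_000, 4_000),
    Link("peer-c", 10_000, 3_000),
]
prefixes = {
    "203.0.113.0/24": ("transit-a", 1_500),
    "198.51.100.0/24": ("transit-a", 500),
    "192.0.2.0/24": ("transit-b", 800),
}
plan, unplaced = plan_evacuation("transit-a", links, prefixes)
print(plan, unplaced)
print(rollback(plan, prefixes))
```

With these made-up numbers the 1,500 Mbps prefix goes to `peer-c` (5,000 Mbps of headroom) and the 500 Mbps prefix then goes to `transit-b`, which by that point has more headroom left. In the workflow described above, the operations team would review such a plan in the web interface before it is applied.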

The solution also enables the client to optimize the distribution of traffic across multiple links even when none of them has failed. The process can also be fully automated, with no manual intervention by the operations staff, turning the network into an intelligent, self-learning and self-healing network.
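Rebalancing when no link has failed can be sketched in a similar way. The following hypothetical example assumes that a link counts as congested once its utilization exceeds a fixed threshold (80% here), and that relief means moving prefixes, largest first, to the least-utilized link that can absorb them without crossing the threshold itself. The function names, the threshold and the data are illustrative assumptions, not the client's actual policy.

```python
def find_congested(loads: dict[str, int], capacities: dict[str, int],
                   threshold: float = 0.8) -> list[str]:
    """Links whose utilization (load / capacity) exceeds the threshold."""
    return [name for name, load in loads.items() if load / capacities[name] > threshold]


def rebalance(loads: dict[str, int], capacities: dict[str, int],
              prefixes: dict[str, tuple[str, int]], threshold: float = 0.8):
    """Move prefixes off congested links until each drops to the threshold.

    `prefixes` maps prefix -> (current link, traffic in Mbps).
    Returns (moves, new_loads); the input dictionaries are not modified.
    """
    loads = dict(loads)  # work on a copy so the caller's data stays intact
    moves = []
    for hot in find_congested(loads, capacities, threshold):
        # Largest prefixes first: fewer route changes to relieve the link.
        candidates = sorted(
            (p for p, (link, _) in prefixes.items() if link == hot),
            key=lambda p: prefixes[p][1],
            reverse=True,
        )
        for prefix in candidates:
            if loads[hot] / capacities[hot] <= threshold:
                break  # this link is no longer congested
            mbps = prefixes[prefix][1]
            # Only consider targets that stay at or below the threshold.
            targets = [l for l in loads
                       if l != hot and (loads[l] + mbps) / capacities[l] <= threshold]
            if not targets:
                continue  # this prefix does not fit anywhere; try a smaller one
            dest = min(targets, key=lambda l: loads[l] / capacities[l])
            loads[hot] -= mbps
            loads[dest] += mbps
            moves.append((prefix, hot, dest))
    return moves, loads


capacities = {"agg-1": 10_000, "agg-2": 10_000, "agg-3": 10_000}
loads = {"agg-1": 9_000, "agg-2": 4_000, "agg-3": 6_000}
prefixes = {"p1": ("agg-1", 3_000), "p2": ("agg-1", 1_200), "p3": ("agg-1", 500)}
moves, new_loads = rebalance(loads, capacities, prefixes)
print(moves, new_loads)
```

Moving the largest prefix first keeps the number of route changes low, at the cost of sometimes shifting more traffic than strictly needed (here 3,000 Mbps to cure a 1,000 Mbps excess). A production system would likely weigh that trade-off differently, for example by preferring the smallest prefix that alone clears the excess.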

In addition, the customer will be able to use these traffic management capabilities to offer new services to their own customers. For example, online gamers need low latency and jitter toward specific destinations on the Internet to stay competitive in their games.

Cisco NSO-based network automation solution

NIL's solution is based on Cisco NSO technology coupled with a custom traffic management application. It is built on the principles of machine learning as well as network self-care and self-adaptation.

Overall, this consulting and development engagement included the following technologies: