SLO-aware Robust and Resilient 5G Edge Infrastructure

Building SLO-aware, highly available, and secure cellular infrastructure for next-generation access-edge computing

Modern access-edge applications – from real-time video analytics to AR/VR – depend on 5G RAN stacks that are virtualized and disaggregated across edge datacenters. This project develops a holistic toolkit that makes cellular infrastructure robust against three chronic pain points: (1) unpredictable end-to-end performance that breaks tight service-level objectives (SLOs); (2) outages caused by failures or upgrades inside virtualized RAN (vRAN) components; and (3) new attack surfaces on the Ethernet-based fronthaul network. By enforcing SLO-awareness, high availability, and strong security, we show that next-generation edge services can run atop commodity 5G networks without costly over-provisioning or proprietary hardware.

Resilient 5G vRAN infrastructure

In virtualized RANs, a PHY crash or DU upgrade can disconnect users for 6.2 s on average. Roughly fifty such events would exhaust the five-nines budget of < 6 minutes of downtime per year, and existing failover mechanisms rebuild state too slowly to help. Worse, when a primary DU dies, protocol time-outs of 3–30 seconds at the CU cascade into mass UE drops.
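
The availability arithmetic behind this comparison can be checked with a few lines of Python. This is a minimal sketch: the 6.2 s figure is the average quoted above, while the function names and the non-leap-year constant are assumptions made for illustration.

```python
# Downtime budget implied by an availability target, and how many
# outages of a given length fit inside that budget.
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def downtime_budget_s(availability: float) -> float:
    """Allowed downtime per year, in seconds, for a target availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def outages_within_budget(availability: float, outage_s: float) -> int:
    """Whole number of outages of length outage_s that fit in the budget."""
    return int(downtime_budget_s(availability) // outage_s)


budget = downtime_budget_s(0.99999)
print(f"five-nines budget: {budget:.1f} s/year ({budget / 60:.2f} min)")
print(f"6.2 s outages that fit: {outages_within_budget(0.99999, 6.2)}")
```

Under these assumptions the yearly budget is about 315 s (5.26 minutes), so about fifty average-length disconnections consume all of it.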

Slingshot (Lazarev et al., 2023) cleanly hot-migrates the stateless PHY between servers via an in-switch middlebox and an Orion control loop, yielding zero user-visible downtime during planned maintenance and recovery from unplanned PHY faults with under 110 ms of disruption to a TCP connection. Atlas (Xing et al., 2023) extends resilience to the stateful distributed unit (DU): a fronthaul NF shares one RU between a source and backup DU, while a midhaul NF pre-notifies the CU of failures and a controller orchestrates proactive handovers or reactive failovers. The design re-establishes connectivity in ≈100 ms after a DU crash – an order of magnitude faster than stock vRAN behavior – with no throughput loss during proactive migration. Together, Slingshot and Atlas provide sub-second recovery across the full RAN stack.

Secure fronthaul network

Ethernet-based eCPRI/O-RAN fronthaul traffic still lacks mandatory integrity protection; standards bodies considered MITM attacks “unlikely and low-impact”. In this work (Xing et al., 2024), we demonstrate the opposite: software-only adversaries who bypass 802.1X can inject or modify fronthaul packets to trigger cell-wide signaling storms or corrupt control blocks, impacting DUs and UEs across whole regions.

We introduce two attack families, FRONTSTORM (mass handover storms) and FRONTSTRIKE (signal-level corruption), and show that they scale to line rate with minimal hardware. We then evaluate countermeasures, finding that MACsec with AES-NI adds only ~2.4 µs per jumbo packet and that selective header-only protection can cut this to < 0.3 µs, making full-time integrity protection both feasible and essential. We urge standards bodies to mandate integrity protection and recommend deploying lightweight anomaly detectors along the fronthaul path.
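
To make the header-only idea concrete, the sketch below tags either the first bytes of a frame or the whole frame and then checks what tampering each mode detects. It is illustrative only: HMAC-SHA256 from the Python standard library stands in for the AES-GCM integrity check that MACsec actually uses, and `HEADER_LEN`, `TAG_LEN`, `protect`, and `verify` are hypothetical names and values, not part of MACsec, eCPRI, or the paper's implementation.

```python
import hashlib
import hmac

HEADER_LEN = 32  # hypothetical: number of leading bytes treated as header
TAG_LEN = 16     # hypothetical: truncated tag length in bytes


def protect(key: bytes, frame: bytes, header_only: bool) -> bytes:
    """Append an integrity tag computed over the header or the whole frame."""
    covered = frame[:HEADER_LEN] if header_only else frame
    tag = hmac.new(key, covered, hashlib.sha256).digest()[:TAG_LEN]
    return frame + tag


def verify(key: bytes, protected: bytes, header_only: bool) -> bool:
    """Recompute the tag and compare it in constant time."""
    frame, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    covered = frame[:HEADER_LEN] if header_only else frame
    expected = hmac.new(key, covered, hashlib.sha256).digest()[:TAG_LEN]
    return hmac.compare_digest(tag, expected)


key = b"k" * 32
frame = bytes(range(32)) + b"\x00" * 8968  # 9000-byte stand-in for a jumbo frame
sealed = protect(key, frame, header_only=True)

# Flip one bit in the header, and separately one bit in the payload.
tampered_header = bytes([sealed[0] ^ 1]) + sealed[1:]
tampered_payload = sealed[:100] + bytes([sealed[100] ^ 1]) + sealed[101:]

print(verify(key, sealed, header_only=True))            # True
print(verify(key, tampered_header, header_only=True))   # False
print(verify(key, tampered_payload, header_only=True))  # True: payload is not covered
```

The last line shows the trade-off in this sketch: hashing only the header touches far fewer bytes per packet, but any modification outside the covered bytes goes undetected.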

Predictable 5G access edge computing

Latency-critical mobile-edge applications still miss their end-to-end deadlines because the 5G/MEC pipeline is governed by a patchwork of independent schedulers. Radio bandwidth, transport queues and edge-compute workloads all fluctuate, yet each layer allocates resources as if its own deadline were the only one that matters. Without any common notion of how much time-budget (or “slack”) remains, the network may deliver packets only after their compute window has vanished, or a frame that finishes inference on the edge may find no airtime left on the uplink. Even under moderate load this siloed decision-making can drag SLO success rates down to roughly one-quarter.

ARMA (Yi et al., 2025) introduces a lightweight controller above the O-RAN RIC that lets the application and the RAN share just enough state (frame deadlines, DNN load, instantaneous RB availability) to co-optimize bitrate, model depth, RB allocation and GPU time on a per-frame basis. By continuously re-splitting each request’s latency budget between “over-the-air” and “on-the-edge” stages, ARMA lifts SLO satisfaction from roughly 26% to about 97% on an Open-RAN video-analytics testbed, with negligible radio overhead. As ongoing work, we are developing a resource scheduling framework that enables applications beyond video analytics to meet their SLOs.
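
The idea of splitting one latency budget across the radio and compute stages can be illustrated with a toy per-frame planner. This is a sketch under stated assumptions, not ARMA's scheduling algorithm: the model and bitrate profiles, the utility proxy, and the `plan_frame` function are all hypothetical, and the planner simply enumerates configurations and keeps the most useful one whose estimated air time plus edge time fits the deadline.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Choice:
    bitrate_mbps: float  # video encoding bitrate
    model: str           # DNN variant
    air_ms: float        # estimated over-the-air time
    edge_ms: float       # estimated on-the-edge time
    utility: float       # toy accuracy proxy


# Hypothetical profiles: model -> (inference ms, base accuracy),
# bitrate in Mbit/s -> accuracy scale factor.
MODELS = {"small": (8.0, 0.70), "medium": (20.0, 0.80), "large": (45.0, 0.88)}
BITRATES = {2.0: 0.85, 4.0: 0.95, 8.0: 1.0}


def plan_frame(deadline_ms: float, frame_interval_ms: float,
               uplink_mbps: float, gpu_queue_ms: float) -> Optional[Choice]:
    """Pick the highest-utility (bitrate, model) pair that meets the deadline."""
    best: Optional[Choice] = None
    for (bitrate, scale), (model, (infer_ms, acc)) in product(
            BITRATES.items(), MODELS.items()):
        # Frame size is bitrate * interval; dividing by uplink rate gives air time.
        air_ms = bitrate * frame_interval_ms / uplink_mbps
        edge_ms = gpu_queue_ms + infer_ms
        if air_ms + edge_ms > deadline_ms:
            continue  # this split would miss the end-to-end deadline
        choice = Choice(bitrate, model, air_ms, edge_ms, acc * scale)
        if best is None or choice.utility > best.utility:
            best = choice
    return best


good = plan_frame(deadline_ms=100, frame_interval_ms=40, uplink_mbps=10, gpu_queue_ms=5)
poor = plan_frame(deadline_ms=100, frame_interval_ms=40, uplink_mbps=2, gpu_queue_ms=5)
print(good)
print(poor)
```

With these made-up profiles, a 10 Mbit/s uplink allows the highest bitrate with the large model, while a 2 Mbit/s uplink pushes the planner to the lowest bitrate so that 40 ms of air time still leaves room for 50 ms on the edge. A planner that sees only one side of the pipeline cannot make that trade.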

Publications

2025

MobiSys

Towards End-to-End Latency Guarantee in MEC Live Video Analytics with App-RAN Mutual Awareness

Juheon Yi, Goodsol Lee, Seokgyeong Shin, Minkyung Jeong, Daehyeok Kim, and Youngki Lee

In Proceedings of 23rd ACM International Conference on Mobile Systems, Applications, and Services, June 2025

While mobile live video analytics apps require end-to-end latency guarantee for responsiveness and immersiveness, achieving consistent low latency is challenging due to complex fluctuations of wireless channel and scene complexity; for example, latency SLO satisfaction rate drops to as low as 26% in commercial 5G MEC platforms. Prior works mostly focus on either app-only (bitrate, DNN adaptation, or GPU allocation) or RAN-only (radio resource allocation) scheduling, with mutual ignorance of the other side resulting in mismatched scheduling decisions and frequent SLO violations. Coordinating the two schedulers is also challenging, as they are run separately by network and cloud operators with disjoint control. We present ARMA, an end-to-end live video analytics system with app-RAN mutual-awareness for high end-to-end latency SLO satisfaction in MEC. We design a mutually-aware decoupled scheduling mechanism on top of RAN Intelligent Controller (RIC) in Open-RAN architecture that fosters cooperative interaction between the two operators’ schedulers while preserving operational proprietaries. We prototype an Open RAN-enabled 5G MEC testbed and evaluate ARMA, showing that it achieves 97% SLO satisfaction rate.

2024

USENIX Security

On the Criticality of Integrity Protection in 5G Fronthaul Networks

Jiarong Xing, Sophia Yoo, Xenofon Foukas, Daehyeok Kim, and Michael K. Reiter

In Proceedings of 33rd USENIX Security Symposium, August 2024

The modern 5G fronthaul, which connects the base stations to radio units in cellular networks, is designed to deliver microsecond-level performance guarantees using Ethernet-based protocols. Unfortunately, due to potential performance overheads, as well as misconceptions about the low risk and impact of possible attacks, integrity protection is not considered a mandatory feature in the 5G fronthaul standards. In this work, we show how vulnerabilities from the lack of protection can be exploited, making attacks easier and more powerful than ever. We present a novel class of powerful attacks and a set of traditional attacks, which can both be fully launched from software over open packet-based interfaces, to cause performance degradation or denial of service to users over large geographical regions. Our attacks do not require a physical radio presence or signal-based attack mechanisms, do not affect the network’s operation (e.g., not crashing the radios), and are highly severe (e.g., impacting multiple cells). We demonstrate the impact of our attacks in an end-to-end manner on a commercial-grade, multi-cell 5G testbed, showing that adversaries can degrade performance of connected users by more than 80%, completely block a selected subset of users from ever attaching to the cell, or even generate signaling storm attacks of more than 2500 signaling messages per minute, with just two compromised cells and four mobile users. We also present an analysis of countermeasures that meet the strict performance requirements of the fronthaul.

2023

SIGCOMM

Resilient Baseband Processing in Virtualized RANs with Slingshot

Nikita Lazarev, Tao Ji, Anuj Kalia, Daehyeok Kim, Ilias Marinos, Francis Y. Yan, Christina Delimitrou, Zhiru Zhang, and Aditya Akella

In Proceedings of ACM SIGCOMM conference, September 2023

In cellular networks, there is a growing adoption of virtualized radio access networks (vRANs), where operators are replacing the traditional specialized hardware for RAN processing with software running on commodity servers. Today’s vRAN deployments lack resilience, since there is no support for vRAN failover or upgrades without long service interruptions. Enabling these features for vRANs is challenging because of their strict real-time latency requirements and black-box nature. Slingshot is a new system that transparently provides resilience for the vRAN’s most performance-critical layer: the physical layer (PHY). We design new techniques for realtime workload migration with fast RAN protocol middleboxes, and realtime RAN failure detection. A key insight in our design is to view the transient disruptions from resilience events to RAN computation state and I/O similarly to regular wireless signal impairments, and leverage the inherent resilience of cellular networks to these events. Experiments with a state-of-the-art 5G vRAN testbed show that Slingshot handles PHY failover with no disruption to video conferencing, and under 110 ms of disruption to a TCP connection, and it also enables zero-downtime upgrades.

MobiCom

Enabling Resilience in Virtualized RANs with Atlas

Jiarong Xing, Junzhi Gong, Xenofon Foukas, Anuj Kalia, Daehyeok Kim, and Manikanta Kotaru

In Proceedings of 29th ACM International Conference on Mobile Computing and Networking, October 2023

Virtualized radio access networks (vRANs), which allow running RAN processing on commodity servers instead of proprietary hardware, are gaining adoption in cellular networks. Two properties of the vRAN’s “Distributed Unit (DU)” that implements the lower RAN layers (its real-time deadlines and its black-box nature) make it challenging to provide resilience features such as upgrades and failover without long service disruptions. These properties preclude the use of existing resilience techniques like virtual machine migration or state replication that are used for typical workloads. This paper presents Atlas, the first system that provides resilience for the DU. The central insight in Atlas is to repurpose existing cellular mechanisms for wireless resilience, namely handovers and cell reselection, to provide software resilience for the DU. For planned resilience events like upgrades, we design a novel technique that simultaneously serves cells from both the old and new DUs via the same radio, and uses handovers between these cells to migrate user devices. For unplanned failures, we identify deficiencies in existing RAN protocols that disrupt cell reselection after DU failure, and show how we can eliminate these disruptions using a middlebox between the DU and higher layers. Our evaluation with a state-of-the-art 5G vRAN testbed shows that Atlas achieves minimal disruption to cellular connectivity during resilience events, while incurring low overhead.