SLO-aware Robust and Resilient 5G Edge Infrastructure
Building SLO-aware, highly available, and secure cellular infrastructure for next-generation access-edge computing
Modern access-edge applications – from real-time video analytics to AR/VR – depend on 5G RAN stacks that are virtualized and disaggregated across edge datacenters. This project develops a holistic toolkit that makes cellular infrastructure robust against three chronic pain points: (1) unpredictable end-to-end performance that breaks tight service-level objectives (SLOs); (2) outages caused by failures or upgrades of virtualized RAN (vRAN) components; and (3) new attack surfaces on the Ethernet-based fronthaul network. By combining SLO awareness, high availability, and strong security, we show that next-generation edge services can run atop commodity 5G networks without costly over-provisioning or proprietary hardware.
Resilient 5G vRAN infrastructure
In virtualized RANs, a PHY crash or DU upgrade can disconnect users for 6.2 s on average, and existing failover mechanisms rebuild state too slowly. Because five-nines availability allows only about 5.26 minutes of downtime per year, roughly fifty such outages exhaust the entire annual budget. Worse, when a primary DU dies, protocol time-outs of 3–30 seconds at the CU cascade into mass UE drops.
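The availability arithmetic behind this argument can be checked directly. The sketch below assumes a 365-day year and that every outage counts fully against the five-nines budget; the function names are illustrative only.

```python
# Assumptions: a 365-day year, and every outage counts fully against the budget.
SECONDS_PER_YEAR = 365 * 24 * 3600

def downtime_budget_s(availability: float) -> float:
    """Annual downtime allowed at a given availability level, in seconds."""
    return (1.0 - availability) * SECONDS_PER_YEAR

def outages_within_budget(availability: float, outage_s: float) -> int:
    """Number of outages of length outage_s that fit in the annual budget."""
    return int(downtime_budget_s(availability) // outage_s)

five_nines = 0.99999
budget = downtime_budget_s(five_nines)
print(f"budget: {budget:.1f} s/year ({budget / 60:.2f} min)")
print("6.2 s outages per year:", outages_within_budget(five_nines, 6.2))
print("100 ms outages per year:", outages_within_budget(five_nines, 0.1))
```

Under these assumptions the budget is about 315 s (5.26 minutes) per year: only 50 outages of 6.2 s fit, whereas recovery in about 100 ms leaves room for more than 3,000 failures.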
Slingshot (Lazarev et al., 2023) cleanly hot-migrates the stateless PHY between servers via an in-switch middlebox and an Orion control loop, yielding zero user-visible downtime during planned maintenance and instant recovery from unplanned PHY faults. Atlas (Xing et al., 2023) extends resilience to the stateful distributed unit (DU): a fronthaul NF shares one RU between a source and a backup DU, a midhaul NF pre-notifies the CU of failures, and a controller orchestrates proactive handovers or reactive failovers. The design re-establishes connectivity in ≈100 ms after a DU crash – an order of magnitude faster than stock vRAN behavior – with no throughput loss during proactive migration. Together, Slingshot and Atlas provide sub-second recovery across the full RAN stack.
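The division of labor in Atlas can be illustrated with a small control-flow sketch. Everything here is hypothetical: `FronthaulNF`, `MidhaulNF`, `Controller`, and their methods are illustrative names, not Atlas's actual interfaces, and the real system's UE context migration and packet handling are abstracted away. The sketch only shows how one shared RU lets the controller switch DUs by redirecting traffic, with the CU notified immediately on a crash.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class FronthaulNF:
    """Hypothetical: steers each shared RU's traffic to the DU currently serving it."""
    serving_du: Dict[str, str]  # ru_id -> du_id

    def redirect(self, ru_id: str, du_id: str) -> None:
        self.serving_du[ru_id] = du_id

@dataclass
class MidhaulNF:
    """Hypothetical: tells the CU about a DU failure right away, instead of
    waiting for the CU's own multi-second protocol time-outs to expire."""
    notifications: List[Tuple[str, str]] = field(default_factory=list)

    def notify_cu(self, failed_du: str, new_du: str) -> None:
        self.notifications.append((failed_du, new_du))

@dataclass
class Controller:
    fronthaul: FronthaulNF
    midhaul: MidhaulNF
    backup_of: Dict[str, str]  # du_id -> paired backup du_id

    def on_planned_maintenance(self, ru_id: str) -> str:
        # Proactive path: the source DU is still healthy, so UEs are handed over
        # to the backup before it goes down (UE context transfer is abstracted away).
        src = self.fronthaul.serving_du[ru_id]
        dst = self.backup_of[src]
        self.fronthaul.redirect(ru_id, dst)
        return f"proactive handover {src} -> {dst}"

    def on_du_crash(self, ru_id: str) -> str:
        # Reactive path: pre-notify the CU, then steer the RU to the backup DU.
        src = self.fronthaul.serving_du[ru_id]
        dst = self.backup_of[src]
        self.midhaul.notify_cu(src, dst)
        self.fronthaul.redirect(ru_id, dst)
        return f"reactive failover {src} -> {dst}"

fh = FronthaulNF({"ru1": "du-a"})
ctl = Controller(fh, MidhaulNF(), backup_of={"du-a": "du-b", "du-b": "du-a"})
print(ctl.on_du_crash("ru1"))  # reactive failover du-a -> du-b
```

Because the RU is shared, neither path needs to re-attach the radio unit; the two paths differ only in whether the CU must be told about a failure it has not yet detected.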
Secure fronthaul network
Ethernet-based eCPRI/O-RAN fronthaul traffic still lacks mandatory integrity protection; standards bodies considered MITM attacks “unlikely and low-impact”. In this work (Xing et al., 2024), we demonstrate the opposite: software-only adversaries who bypass 802.1X can inject or modify fronthaul packets to trigger cell-wide signaling storms or corrupt control blocks, impacting DUs and UEs across whole regions.
This work introduces two attack families, FRONTSTORM (mass handover storms) and FRONTSTRIKE (signal-level corruption), and shows that both scale to line rate with minimal hardware. It then evaluates countermeasures, finding that MACsec with AES-NI adds only ~2.4 µs per jumbo packet and that selective header-only protection can cut this to < 0.3 µs, making always-on integrity protection both feasible and essential. The work urges standards bodies to mandate integrity protection and recommends deploying lightweight anomaly detectors along the fronthaul path.
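The trade-off between full-frame and header-only integrity protection can be sketched as follows. This is not MACsec: MACsec computes its ICV with AES-GCM, which Python's standard library does not provide, so truncated HMAC-SHA256 is swapped in as a stand-in MAC, and the timing figures above do not transfer to it. `PROTECTED_HDR_LEN` is an assumption (the 4-byte eCPRI common header plus the 4 bytes of ID and sequence fields that follow it), and the helper names are hypothetical.

```python
import hashlib
import hmac

# Assumption: protect the 4-byte eCPRI common header plus the 4-byte
# PC_ID/RTC_ID and SEQ_ID fields that follow it.
PROTECTED_HDR_LEN = 8
TAG_LEN = 16  # same length as a MACsec ICV; the MAC below is only a stand-in

def compute_tag(key: bytes, frame: bytes, header_only: bool) -> bytes:
    """Truncated HMAC-SHA256 over the whole frame, or over its leading header bytes only."""
    covered = frame[:PROTECTED_HDR_LEN] if header_only else frame
    return hmac.new(key, covered, hashlib.sha256).digest()[:TAG_LEN]

def protect(key: bytes, frame: bytes, header_only: bool = False) -> bytes:
    """Sender: append the integrity tag."""
    return frame + compute_tag(key, frame, header_only)

def verify(key: bytes, protected: bytes, header_only: bool = False) -> bool:
    """Receiver: recompute the tag and compare it in constant time."""
    frame, received = protected[:-TAG_LEN], protected[-TAG_LEN:]
    return hmac.compare_digest(received, compute_tag(key, frame, header_only))

key = bytes(range(32))
# Revision 1, message type 2 (real-time control), payload size 32, then IDs and a dummy body.
frame = bytes([0x10, 0x02, 0x00, 0x20]) + bytes(4) + bytes(28)
sent = protect(key, frame, header_only=True)

forged_type = bytearray(sent)
forged_type[1] ^= 0xFF   # tamper with the message-type byte
forged_body = bytearray(sent)
forged_body[20] ^= 0xFF  # tamper with a byte outside the protected header

print(verify(key, sent, header_only=True))                # True
print(verify(key, bytes(forged_type), header_only=True))  # False
print(verify(key, bytes(forged_body), header_only=True))  # True: body is not covered
```

Hashing only a few header bytes is what makes the selective mode cheap, but the last line shows its cost: tampering with the uncovered body goes undetected, so the choice of which fields to protect determines which attacks remain possible.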
Predictable 5G access edge computing
Latency-critical mobile-edge applications still miss their end-to-end deadlines because the 5G/MEC pipeline is governed by a patchwork of independent schedulers. Radio bandwidth, transport queues and edge-compute workloads all fluctuate, yet each layer allocates resources as if its own deadline were the only one that matters. Without a common notion of how much time budget (or “slack”) remains, the network may deliver packets only after their compute window has closed, or a frame that finishes inference on the edge may find no airtime left to return its result. Even under moderate load, this siloed decision-making can drag SLO success rates down to roughly one-quarter.
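A toy example shows why a shared notion of slack matters. In the sketch below, which is a hypothetical illustration rather than any system's actual scheduler, an edge scheduler picks a DNN from a made-up model menu. The siloed version uses its own fixed compute budget; the slack-aware version subtracts the time the frame already spent on the uplink. Model names, latencies, and the 60 ms budget are assumptions, and the downlink return of results is ignored.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model menu, most accurate first: (name, inference time in ms).
MODELS = [("large", 55.0), ("medium", 35.0), ("small", 20.0)]

@dataclass
class Frame:
    deadline_ms: float  # end-to-end budget, measured from capture
    uplink_ms: float    # time already spent over the air and in transport

def siloed_choice(frame: Frame, compute_budget_ms: float = 60.0) -> str:
    """Edge scheduler with its own fixed budget; it never looks at the uplink delay."""
    for name, infer_ms in MODELS:
        if infer_ms <= compute_budget_ms:
            return name
    return MODELS[-1][0]

def slack_aware_choice(frame: Frame) -> Optional[str]:
    """Edge scheduler that sees how much of the end-to-end budget is left."""
    slack_ms = frame.deadline_ms - frame.uplink_ms
    for name, infer_ms in MODELS:
        if infer_ms <= slack_ms:
            return name
    return None  # nothing fits: better to drop or degrade the frame upstream

def meets_deadline(frame: Frame, model: Optional[str]) -> bool:
    if model is None:
        return False
    return frame.uplink_ms + dict(MODELS)[model] <= frame.deadline_ms

# A frame whose uplink was slow because radio bandwidth dipped.
late = Frame(deadline_ms=100.0, uplink_ms=70.0)
for choose in (siloed_choice, slack_aware_choice):
    model = choose(late)
    print(choose.__name__, model, meets_deadline(late, model))
```

For this frame the siloed scheduler still picks the large model and finishes at 125 ms, missing the 100 ms deadline, while the slack-aware scheduler sees only 30 ms left and picks the small model, finishing at 90 ms.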
ARMA (Yi et al., 2025) introduces a lightweight controller above the O-RAN RIC that lets the application and the RAN share just enough state (frame deadlines, DNN load, and instantaneous RB availability) to co-optimize bitrate, model depth, RB allocation, and GPU time on a per-frame basis. By continuously re-splitting each request’s latency budget between the “over-the-air” and “on-the-edge” stages, ARMA lifts SLO satisfaction from roughly 26% to about 97% on an Open-RAN video-analytics testbed, with negligible radio overhead. In ongoing work, we are developing a resource-scheduling framework that helps applications beyond video analytics meet their SLOs.
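The idea of re-splitting a per-frame latency budget can be sketched as a small search. This is a swapped-in technique, exhaustive search over a tiny configuration menu, and not ARMA's actual optimizer. Every frame size, inference time, accuracy value, and the per-RB capacity `KBITS_PER_RB_PER_MS` is a hypothetical assumption; GPU time is modeled as a fixed inference latency per model depth, and downlink and transport delays are ignored. For each (resolution, depth) pair, the model's inference time is reserved for the edge and the rest of the deadline goes to the radio, which determines how many RBs the uplink needs.

```python
import math
from typing import Optional, Tuple

# Every number here is an illustrative assumption, not a value from ARMA.
FRAME_KBITS = {"480p": 400.0, "720p": 900.0, "1080p": 1800.0}  # encoded frame size
MODEL_MS = {"shallow": 15.0, "medium": 30.0, "deep": 50.0}        # edge inference time
ACCURACY = {  # hypothetical accuracy for each (resolution, model depth) pair
    ("480p", "shallow"): 0.60, ("480p", "medium"): 0.68, ("480p", "deep"): 0.72,
    ("720p", "shallow"): 0.66, ("720p", "medium"): 0.75, ("720p", "deep"): 0.80,
    ("1080p", "shallow"): 0.70, ("1080p", "medium"): 0.79, ("1080p", "deep"): 0.85,
}
KBITS_PER_RB_PER_MS = 0.5  # assumed uplink capacity of a single RB

Plan = Tuple[str, str, int, float, float]  # resolution, depth, RBs, air budget, edge budget

def plan_frame(deadline_ms: float, free_rbs: int) -> Optional[Plan]:
    """Pick the most accurate (resolution, depth) that fits the deadline, using the
    fewest RBs that let the uplink finish inside the time the model leaves over."""
    best = None
    for (res, depth), acc in ACCURACY.items():
        edge_ms = MODEL_MS[depth]
        air_ms = deadline_ms - edge_ms  # the rest of the budget goes to the radio
        if air_ms <= 0:
            continue
        rbs = math.ceil(FRAME_KBITS[res] / (air_ms * KBITS_PER_RB_PER_MS))
        if rbs > free_rbs:
            continue
        # Higher accuracy wins; among equals, fewer RBs wins.
        score = (acc, -rbs)
        if best is None or score > best[0]:
            best = (score, (res, depth, rbs, air_ms, edge_ms))
    return None if best is None else best[1]

print(plan_frame(100.0, free_rbs=50))  # ('720p', 'deep', 36, 50.0, 50.0)
print(plan_frame(100.0, free_rbs=20))  # ('480p', 'deep', 16, 50.0, 50.0)
print(plan_frame(40.0, free_rbs=50))   # ('480p', 'shallow', 32, 25.0, 15.0)
```

With fewer free RBs, the planner lowers the bitrate but keeps the deep model; with a tighter deadline, it shifts the split toward the radio (25 ms over the air, 15 ms on the edge) by choosing a shallower model. A per-frame controller would rerun such a decision as RB availability and GPU load change.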