Robust and Resilient 5G Edge Infrastructure
Building Predictable, Available and Secure Cellular Infrastructure for Next-Generation Access-Edge Computing
Modern access-edge applications, from real-time video analytics to AR/VR, depend on 5G RAN stacks that are virtualized and disaggregated across edge datacenters. This project develops a holistic toolkit that makes cellular infrastructure robust against three chronic pain points: (1) unpredictable end-to-end performance that breaks tight service-level objectives (SLOs); (2) outages caused by failures or upgrades inside virtualized RAN (vRAN) components; and (3) new attack surfaces on the Ethernet-based fronthaul network. By enforcing predictability, high availability and strong security, we show that next-generation edge services can run atop commodity 5G networks without costly over-provisioning or proprietary hardware.
Resilient 5G vRAN infrastructure
In virtualized RANs, a PHY crash or DU upgrade can disconnect users for 6.2 s on average, so each such outage consumes roughly 2% of the five-nines budget of under 6 minutes of downtime per year, and existing failover mechanisms rebuild state too slowly. Worse, when a primary DU dies, protocol timeouts of 3–30 s at the CU cascade into mass UE drops.
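The availability arithmetic behind these numbers can be checked with a short calculation. This is a minimal sketch: the function names are illustrative, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def downtime_budget_s(availability: float) -> float:
    """Allowed downtime per year, in seconds, for a target availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def outages_within_budget(availability: float, outage_s: float) -> int:
    """Number of whole outages of length outage_s that fit in the yearly budget."""
    return int(downtime_budget_s(availability) // outage_s)


if __name__ == "__main__":
    budget = downtime_budget_s(0.99999)
    print(f"five-nines budget: {budget / 60:.2f} min/year")
    print(f"6.2 s outages that fit: {outages_within_budget(0.99999, 6.2)}")
```

Under these assumptions the five-nines budget is about 5.26 minutes per year, which leaves room for about 50 outages of 6.2 s before the budget is spent.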
Slingshot (Lazarev et al., 2023) cleanly hot-migrates the stateless PHY between servers via an in-switch middlebox and an Orion control loop, yielding zero user-visible downtime during planned maintenance and instant recovery from unplanned PHY faults. Atlas (Xing et al., 2023) extends resilience to the stateful distributed unit (DU): a fronthaul NF shares one RU between a source and a backup DU, while a midhaul NF pre-notifies the CU of failures and a controller orchestrates proactive handovers or reactive failovers. The design re-establishes connectivity in ≈100 ms after a DU crash, an order of magnitude faster than stock vRAN behavior, with no throughput loss during proactive migration. Together, Slingshot and Atlas furnish sub-second recovery across the full RAN stack.
Secure fronthaul network
Ethernet-based eCPRI/O-RAN fronthaul traffic still lacks mandatory integrity protection; standards bodies considered MITM attacks “unlikely and low-impact”. In this work (Xing et al., 2024), we demonstrate the opposite: software-only adversaries who bypass 802.1X can inject or modify fronthaul packets to trigger cell-wide signaling storms or corrupt control blocks, impacting DUs and UEs across whole regions.
This work introduces two attack families, FRONTSTORM (mass handover storms) and FRONTSTRIKE (signal-level corruption), and shows that they scale to line rate with minimal hardware. It then evaluates countermeasures, finding that MACsec with AES-NI adds only ~2.4 µs per jumbo packet and that selective header-only protection can cut this to < 0.3 µs, making full-time integrity protection both feasible and essential. The work urges standards bodies to mandate integrity protection and recommends deploying lightweight anomaly detectors along the fronthaul path.
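The idea of selective header-only protection can be illustrated with a small sketch. It is not the scheme evaluated in the work: MACsec uses AES-GCM, whereas this example substitutes HMAC-SHA256 from the Python standard library, and the 8-byte header length, 16-byte tag length and all function names are hypothetical choices made only for illustration.

```python
import hashlib
import hmac

TAG_LEN = 16    # assumption: truncated tag length in bytes
HEADER_LEN = 8  # hypothetical header size; not the real eCPRI/O-RAN layout


def _tag(key: bytes, packet: bytes, header_only: bool) -> bytes:
    """Compute a truncated HMAC over the header only, or over the whole packet."""
    covered = packet[:HEADER_LEN] if header_only else packet
    return hmac.new(key, covered, hashlib.sha256).digest()[:TAG_LEN]


def protect(key: bytes, packet: bytes, header_only: bool = True) -> bytes:
    """Append an integrity tag to the packet."""
    return packet + _tag(key, packet, header_only)


def verify(key: bytes, protected: bytes, header_only: bool = True) -> bool:
    """Recompute the tag and compare it in constant time."""
    packet, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    return hmac.compare_digest(tag, _tag(key, packet, header_only))


def flip_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with one byte inverted (simulated tampering)."""
    out = bytearray(data)
    out[index] ^= 0xFF
    return bytes(out)


if __name__ == "__main__":
    key = b"k" * 32
    packet = bytes(range(HEADER_LEN)) + b"\x00" * 64  # header + payload
    sealed = protect(key, packet)
    print(verify(key, sealed))                            # untouched packet
    print(verify(key, flip_byte(sealed, 0)))              # header modified
    print(verify(key, flip_byte(sealed, HEADER_LEN + 1))) # payload modified
```

The sketch also makes the trade-off visible: hashing fewer bytes is cheaper, but a payload modification passes header-only verification, so this mode only protects the fields that the tag covers.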
Predictable 5G access edge computing
Latency-critical mobile-edge applications still miss their end-to-end deadlines because the 5G/MEC pipeline is governed by a patchwork of independent schedulers. Radio bandwidth, transport queues and edge-compute workloads all fluctuate, yet each layer allocates resources as if its own deadline were the only one that mattered. Without any common notion of how much time budget (or “slack”) remains, the network may deliver packets only after their compute window has vanished, or a frame that finishes inference on the edge may find no airtime left on the uplink. Even under moderate load this siloed decision-making can drag SLO success rates below one-quarter.
ARMA (Yi et al., 2025) introduces a lightweight controller above the O-RAN RIC that lets the application and the RAN share just enough state (frame deadlines, DNN load, instantaneous RB availability) to co-optimize bitrate, model depth, RB allocation and GPU time on a per-frame basis. By continuously re-splitting each request’s latency budget between “over-the-air” and “on-the-edge” stages, ARMA lifts SLO satisfaction from roughly 26 % to about 97 % on an Open-RAN video-analytics testbed, with negligible radio overhead. In ongoing work, we are developing a resource scheduling framework that enables applications beyond video analytics to meet their SLOs.
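The idea of re-splitting a latency budget between the radio and compute stages can be sketched as follows. This is only an illustration under stated assumptions, not ARMA's actual algorithm: the proportional split rule, the FrameState fields and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FrameState:
    """Hypothetical per-frame state shared between the application and the RAN."""
    deadline_ms: float  # end-to-end deadline, measured from frame capture
    elapsed_ms: float   # time already spent on this frame
    est_air_ms: float   # estimated over-the-air time at current RB availability
    est_edge_ms: float  # estimated inference time at current DNN load


def remaining_slack_ms(frame: FrameState) -> float:
    """Time left before the deadline, never negative."""
    return max(frame.deadline_ms - frame.elapsed_ms, 0.0)


def split_budget(frame: FrameState) -> Tuple[float, float]:
    """Split the remaining time between air and edge stages.

    Assumption: the split is proportional to each stage's estimated demand.
    """
    remaining = remaining_slack_ms(frame)
    demand = frame.est_air_ms + frame.est_edge_ms
    if demand <= 0.0:
        return remaining / 2.0, remaining / 2.0
    air = remaining * frame.est_air_ms / demand
    return air, remaining - air


def is_feasible(frame: FrameState) -> bool:
    """True if the estimated demand of both stages fits in the remaining time."""
    return frame.est_air_ms + frame.est_edge_ms <= remaining_slack_ms(frame)


if __name__ == "__main__":
    frame = FrameState(deadline_ms=100.0, elapsed_ms=20.0,
                       est_air_ms=30.0, est_edge_ms=10.0)
    print(split_budget(frame), is_feasible(frame))
```

Calling split_budget again whenever the estimates change mimics the continuous re-splitting described above; when is_feasible returns False, a controller would have to reduce demand, for example by lowering bitrate or model depth.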