Cellular Edge Infrastructure for Latency-critical Applications

Building SLO-aware, highly available, and secure cellular infrastructure for next-generation access-edge computing

Multi-access edge computing (MEC) brings computational power closer to mobile devices by connecting with 5G cellular networks. This paradigm enables latency-critical applications, from smart stadiums and AR/VR to cloud gaming and autonomous driving, to offload compute-intensive tasks to edge servers. These applications typically operate through request-response interactions between clients and edge servers, where each application must meet strict service-level objectives (SLOs) to maintain quality of service. To realize this vision, 5G RAN stacks have been virtualized and disaggregated across edge datacenters.

However, these systems face three critical challenges: (1) unpredictable end-to-end performance that breaks tight SLOs, (2) outages caused by failures or upgrades inside virtualized RAN (vRAN) components, and (3) new attack surfaces on the Ethernet-based fronthaul network. This project develops a holistic toolkit that makes cellular infrastructure robust against these challenges. By enforcing SLO-awareness, high availability, and strong security, we enable next-generation edge services to run atop commodity 5G networks without costly over-provisioning or proprietary hardware.

Predictable 5G access edge computing

Latency-critical mobile-edge applications frequently miss their end-to-end deadlines because the 5G/MEC pipeline is governed by a patchwork of independent schedulers. Radio bandwidth, transport queues, and edge-compute workloads all fluctuate, yet each layer allocates resources as if its own deadline were the only one that matters. Without any common notion of how much time budget (or “slack”) remains, the network may deliver packets only after their compute window has vanished, or a frame that finishes inference on the edge may find no airtime left on the uplink. Even under moderate load, this siloed decision-making can drag SLO success rates down to roughly one-quarter.

ARMA (Yi et al., 2025) introduces a lightweight controller above the O-RAN RIC that lets the application and the RAN share just enough state (frame deadlines, DNN load, instantaneous RB availability) to co-optimize bitrate, model depth, RB allocation, and GPU time on a per-frame basis. By continuously re-splitting each request’s latency budget between “over-the-air” and “on-the-edge” stages, ARMA lifts SLO satisfaction from roughly 26% to about 97% on an Open-RAN video-analytics testbed, with negligible radio overhead. As ongoing work, we are developing a resource scheduling framework that enables applications beyond video analytics to meet their SLOs while requiring only minimal hints from applications and no infrastructure changes.
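To make the idea of re-splitting a latency budget concrete, here is a minimal Python sketch. It is not ARMA's algorithm: the function name `split_budget`, the proportional split rule, and every number are illustrative assumptions. It only shows how a shared view of the remaining slack lets both stages receive a budget that fits the frame deadline.

```python
from dataclasses import dataclass


@dataclass
class StageEstimate:
    """Estimated time (ms) each stage needs for the current frame (hypothetical inputs)."""
    air_ms: float   # estimated over-the-air transmission time
    edge_ms: float  # estimated on-the-edge inference time


def split_budget(deadline_ms: float, elapsed_ms: float, est: StageEstimate):
    """Split the remaining budget between stages in proportion to their estimates.

    Returns (air_budget_ms, edge_budget_ms, slack_ms). Negative slack means the
    estimates no longer fit the deadline, which is the cue to lower the bitrate
    or pick a shallower model. The proportional rule is an assumption.
    """
    remaining = deadline_ms - elapsed_ms
    demand = est.air_ms + est.edge_ms
    slack = remaining - demand
    if demand <= 0 or remaining <= 0:
        return 0.0, 0.0, slack
    air_budget = remaining * est.air_ms / demand
    edge_budget = remaining - air_budget
    return air_budget, edge_budget, slack


# A 100 ms deadline with 20 ms already spent; the stages need about 30 ms and 10 ms.
air, edge, slack = split_budget(100.0, 20.0, StageEstimate(air_ms=30.0, edge_ms=10.0))
print(air, edge, slack)  # 60.0 20.0 40.0
```

Re-running the split for every frame, with fresh estimates from the radio and the GPU, is what turns a fixed per-layer deadline into a shared, continuously updated budget.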

Resilient 5G vRAN infrastructure

While virtualization brings flexibility and cost savings to cellular infrastructure, it also introduces new failure modes that can violate the strict availability requirements of latency-critical applications. Traditional cellular networks achieve five-nines availability (less than 6 minutes of downtime per year), but virtualized RAN components such as the PHY and distributed unit (DU) are now subject to software crashes, planned maintenance, and hardware failures. In virtualized RANs, a PHY crash or DU upgrade can disconnect users for several seconds, and existing failover mechanisms rebuild state too slowly. Worse, when a primary DU dies, protocol timeouts of 3 to 30 seconds at the CU cascade into mass UE drops, causing widespread service disruption that far exceeds acceptable downtime budgets.
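The arithmetic behind the downtime budget can be checked with a short Python sketch. The helper names are hypothetical, and the sketch assumes that every outage counts fully against a single yearly budget.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def downtime_budget_s(availability: float) -> float:
    """Allowed downtime per year, in seconds, for a given availability target."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def outages_within_budget(availability: float, outage_s: float) -> int:
    """How many outages of length outage_s fit into the yearly budget."""
    return int(downtime_budget_s(availability) // outage_s)


budget = downtime_budget_s(0.99999)
print(f"five-nines budget: {budget:.1f} s/year ({budget / 60:.2f} min)")
for outage_s in (3.0, 30.0):  # the CU timeout range quoted in the text
    count = outages_within_budget(0.99999, outage_s)
    print(f"{outage_s:.0f} s outages that fit in one year: {count}")
```

Five nines leaves about 315 seconds per year, so roughly ten 30-second timeout events exhaust the whole budget.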

Our approach leverages in-network processing and cross-layer coordination to enable fast failover and seamless migration of vRAN components. Our systems, Slingshot (Lazarev et al., 2023) and Atlas (Xing et al., 2023), deploy lightweight network functions along the fronthaul and midhaul and orchestrate proactive state management to achieve sub-second recovery without requiring changes to RAN protocols or radio hardware.

Slingshot cleanly hot-migrates the stateless PHY between servers via an in-switch middlebox and an Orion control loop, yielding zero user-visible downtime during planned maintenance and instant recovery from unplanned PHY faults. Atlas extends resilience to the stateful distributed unit (DU): a fronthaul NF shares one RU between a source and backup DU, while a midhaul NF pre-notifies the CU of failures and a controller orchestrates proactive handovers or reactive failovers. The design re-establishes connectivity in ≈100 ms after a DU crash, an order of magnitude faster than stock vRAN behavior, with no throughput loss during proactive migration. Together, Slingshot and Atlas deliver sub-second recovery across the full RAN stack.
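The distinction between proactive and reactive handling can be illustrated with a toy decision function in Python. This is not how Slingshot or Atlas detect failures or schedule migrations: the heartbeat mechanism, the `decide` function, and both thresholds are assumptions made purely for illustration.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"
    PROACTIVE_MIGRATION = "proactive_migration"  # planned: move users before shutdown
    REACTIVE_FAILOVER = "reactive_failover"      # unplanned: primary stopped responding


def decide(now_ms: float,
           last_heartbeat_ms: float,
           maintenance_at_ms: Optional[float],
           heartbeat_timeout_ms: float = 50.0,
           migration_lead_ms: float = 1000.0) -> Action:
    """Pick an action for the primary DU (all thresholds are made-up defaults)."""
    # A missed heartbeat takes priority: the primary is presumed dead.
    if now_ms - last_heartbeat_ms > heartbeat_timeout_ms:
        return Action.REACTIVE_FAILOVER
    # Otherwise start migrating early enough to finish before maintenance begins.
    if maintenance_at_ms is not None and maintenance_at_ms - now_ms <= migration_lead_ms:
        return Action.PROACTIVE_MIGRATION
    return Action.NONE


print(decide(now_ms=1000.0, last_heartbeat_ms=900.0, maintenance_at_ms=None).value)
print(decide(now_ms=1000.0, last_heartbeat_ms=990.0, maintenance_at_ms=1500.0).value)
```

The point of the sketch is the ordering: an unplanned failure must be handled at once, while a planned event leaves time to move users before any service is lost.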

Secure fronthaul network

The shift from proprietary fronthaul interfaces to open, Ethernet-based standards (eCPRI and O-RAN) has democratized RAN deployments but also introduced new security vulnerabilities. Unlike legacy fronthaul that relied on physical isolation and vendor-specific protocols, modern fronthaul networks traverse shared Ethernet infrastructure where adversaries can potentially intercept or manipulate traffic. Despite these risks, current standards still lack mandatory integrity protection because standards bodies considered man-in-the-middle (MITM) attacks “unlikely and low-impact.”

Our work (Xing et al., 2024) demonstrates the opposite: software-only adversaries who bypass 802.1X can inject or modify fronthaul packets to trigger cell-wide signaling storms or corrupt control blocks, impacting DUs and UEs across whole regions.

We introduce two attack families, FRONTSTORM (mass handover storms) and FRONTSTRIKE (signal-level corruption), and show they scale to line rate with minimal hardware. We then evaluate countermeasures, finding that MACsec with AES-NI adds only ~2.4 µs per jumbo packet and that selective header-only protection can cut this to under 0.3 µs, making full-time integrity protection both feasible and essential. Our findings urge standards bodies to mandate integrity protection and call for lightweight anomaly detectors along the fronthaul path.
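One rough way to put the per-packet overheads in context is to express them as a share of a latency budget. The Python sketch below does that. The 100 µs one-way budget is a placeholder assumption, not a figure from this work, and `overhead_fraction` is a hypothetical helper.

```python
def overhead_fraction(overhead_us: float, budget_us: float) -> float:
    """Fraction of a latency budget consumed by per-packet protection overhead."""
    if budget_us <= 0:
        raise ValueError("budget_us must be positive")
    return overhead_us / budget_us


ASSUMED_BUDGET_US = 100.0  # placeholder one-way fronthaul budget (assumption)

# Overheads quoted in the text; 0.3 us is an upper bound for header-only protection.
OVERHEADS_US = (("full-packet MACsec", 2.4), ("header-only (upper bound)", 0.3))

for name, overhead_us in OVERHEADS_US:
    share = overhead_fraction(overhead_us, ASSUMED_BUDGET_US)
    print(f"{name}: {overhead_us} us is {share:.1%} of the assumed budget")

reduction = OVERHEADS_US[0][1] / OVERHEADS_US[1][1]
print(f"header-only protection cuts the overhead by at least {reduction:.0f}x")
```

Under this assumed budget, full-packet protection costs a few percent of the allowance, and the header-only variant cuts the overhead by at least a factor of eight.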

Publications

2025

  1. MobiSys
    Towards End-to-End Latency Guarantee in MEC Live Video Analytics with App-RAN Mutual Awareness
    Juheon Yi, Goodsol Lee, Seokgyeong Shin, Minkyung Jeong, Daehyeok Kim, and Youngki Lee
    In Proceedings of 23rd ACM International Conference on Mobile Systems, Applications, and Services, June 2025

2024

  1. USENIX Security
    On the Criticality of Integrity Protection in 5G Fronthaul Networks
    Jiarong Xing, Sophia Yoo, Xenofon Foukas, Daehyeok Kim, and Michael K. Reiter
    In Proceedings of 33rd USENIX Security Symposium, August 2024

2023

  1. SIGCOMM
    Resilient Baseband Processing in Virtualized RANs with Slingshot
    Nikita Lazarev, Tao Ji, Anuj Kalia, Daehyeok Kim, Ilias Marinos, Francis Y. Yan, Christina Delimitrou, Zhiru Zhang, and Aditya Akella
    In Proceedings of ACM SIGCOMM Conference, September 2023
  2. MobiCom
    Enabling Resilience in Virtualized RANs with Atlas
    Jiarong Xing, Junzhi Gong, Xenofon Foukas, Anuj Kalia, Daehyeok Kim, and Manikanta Kotaru
    In Proceedings of 29th ACM International Conference on Mobile Computing and Networking, October 2023