Cellular Edge Infrastructure for Latency-critical Applications
Building SLO-aware, highly available, and secure cellular infrastructure for next-generation access-edge computing
Multi-access edge computing (MEC) brings computational power closer to mobile devices by colocating edge servers with 5G cellular networks. This paradigm enables latency-critical applications, from smart stadiums and AR/VR to cloud gaming and autonomous driving, to offload compute-intensive tasks to edge servers. These applications typically operate through request-response interactions between clients and edge servers, where each application must meet strict service-level objectives (SLOs) to maintain quality of service. To realize this vision, 5G RAN stacks have been virtualized and disaggregated across edge datacenters.
However, these systems face three critical challenges: (1) unpredictable end-to-end performance that breaks tight SLOs, (2) outages caused by failures or upgrades inside virtualized RAN (vRAN) components, and (3) new attack surfaces on the Ethernet-based fronthaul network. This project develops a holistic toolkit that makes cellular infrastructure robust against these challenges. By enforcing SLO-awareness, high availability, and strong security, we enable next-generation edge services to run atop commodity 5G networks without costly over-provisioning or proprietary hardware.
Predictable 5G access edge computing
Latency-critical mobile-edge applications frequently miss their end-to-end deadlines because the 5G/MEC pipeline is governed by a patchwork of independent schedulers. Radio bandwidth, transport queues, and edge-compute workloads all fluctuate, yet each layer allocates resources as if its own deadline were the only one that matters. Without any common notion of how much time budget (or “slack”) remains, the network may deliver packets only after their compute window has vanished, or a frame that finishes inference on the edge may find no airtime left on the uplink. Even under moderate load, this siloed decision-making can drag SLO success rates below one-quarter.
ARMA (Yi et al., 2025) introduces a lightweight controller above the O-RAN RIC that lets the application and the RAN share just enough state (frame deadlines, DNN load, instantaneous RB availability) to co-optimize bitrate, model depth, RB allocation, and GPU time on a per-frame basis. By continuously re-splitting each request’s latency budget between “over-the-air” and “on-the-edge” stages, ARMA lifts SLO satisfaction from roughly 26% to about 97% on an Open-RAN video-analytics testbed, with negligible radio overhead. As ongoing work, we are developing a resource scheduling framework that enables applications beyond video analytics to meet their SLOs while requiring only minimal hints from applications and no infrastructure changes.
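The budget-splitting idea can be illustrated with a minimal sketch. This is not ARMA's actual algorithm; the function name, model depths, and timing figures below are hypothetical, and the sketch only captures the core decision of dividing a frame's deadline between airtime and edge inference:

```python
# Illustrative sketch (not ARMA's real scheduler): split a per-frame
# latency budget between the over-the-air and on-the-edge stages,
# then pick the deepest DNN that still fits the remaining slack.
# All names and numbers are hypothetical.

def pick_config(deadline_ms, frame_bits, uplink_mbps, infer_ms_by_depth):
    """Return the deepest model depth whose inference time fits the
    slack left after transmitting the frame at the current uplink rate,
    or None if no configuration can meet the deadline."""
    # 1 Mbps == 1000 bits per ms, so airtime in ms is bits / (Mbps * 1e3).
    airtime_ms = frame_bits / (uplink_mbps * 1e3)
    slack_ms = deadline_ms - airtime_ms
    feasible = {d: t for d, t in infer_ms_by_depth.items() if t <= slack_ms}
    if not feasible:
        return None  # caller must lower the bitrate or drop the frame
    # Prefer the deepest (typically most accurate) model that fits.
    return max(feasible)

cfg = pick_config(
    deadline_ms=50.0,
    frame_bits=400_000,           # ~50 KB compressed frame
    uplink_mbps=40.0,             # instantaneous uplink estimate
    infer_ms_by_depth={18: 12.0, 34: 22.0, 50: 45.0},  # ResNet-style depths
)
# With these numbers: airtime is 10 ms, slack is 40 ms, so depth 34 fits
# but depth 50 (45 ms) does not.
```

In a real system this decision would be re-run per frame as radio-block availability and GPU load fluctuate, which is the cross-layer feedback loop the paragraph above describes.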
Resilient 5G vRAN infrastructure
While virtualization brings flexibility and cost savings to cellular infrastructure, it also introduces new failure modes that can violate the strict availability requirements of latency-critical applications. Traditional cellular networks achieve five-nines availability (less than 6 minutes of downtime per year), but virtualized RAN components such as the PHY and distributed unit (DU) are now subject to software crashes, planned maintenance, and hardware failures. In virtualized RANs, a PHY crash or DU upgrade can disconnect users for several seconds, and existing failover mechanisms rebuild state too slowly. Worse, when a primary DU dies, protocol timeouts of 3 to 30 seconds at the CU cascade into mass UE drops, causing widespread service disruption that far exceeds acceptable downtime budgets.
Our approach leverages in-network processing and cross-layer coordination to enable fast failover and seamless migration of vRAN components. Our systems, Slingshot (Lazarev et al., 2023) and Atlas (Xing et al., 2023), deploy lightweight network functions along the fronthaul and midhaul and orchestrate proactive state management to achieve sub-second recovery without requiring changes to RAN protocols or radio hardware.
Slingshot cleanly hot-migrates the stateless PHY between servers via an in-switch middlebox and an Orion control loop, yielding zero user-visible downtime during planned maintenance and instant recovery from unplanned PHY faults. Atlas extends resilience to the stateful distributed unit (DU): a fronthaul NF shares one RU between a source and backup DU, while a midhaul NF pre-notifies the CU of failures and a controller orchestrates proactive handovers or reactive failovers. The design re-establishes connectivity in ≈100 ms after a DU crash, an order of magnitude faster than stock vRAN behavior, with no throughput loss during proactive migration. Together, Slingshot and Atlas deliver sub-second recovery across the full RAN stack.
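The gap between the two recovery paths comes down to simple arithmetic. The sketch below uses hypothetical detection and re-attach figures (only the 3-30 s CU timeout range and the ≈100 ms Atlas recovery time come from the text above) to show why pre-notifying the CU matters:

```python
# Back-of-the-envelope sketch contrasting stock vRAN recovery with an
# Atlas-style pre-notified failover. Detection and re-attach timings
# are hypothetical; the 3 s CU timeout is the low end of the 3-30 s
# range cited above.

def reactive_recovery_ms(cu_timeout_s, reattach_ms):
    # Stock behavior: the CU only learns of the DU crash when a protocol
    # timer expires, after which every UE must re-attach.
    return cu_timeout_s * 1000 + reattach_ms

def prenotified_recovery_ms(detect_ms, switch_ms):
    # Atlas-style: a midhaul NF detects the crash and notifies the CU
    # immediately, while the fronthaul NF redirects the shared RU to
    # the backup DU.
    return detect_ms + switch_ms

stock = reactive_recovery_ms(cu_timeout_s=3, reattach_ms=500)  # 3500 ms
atlas = prenotified_recovery_ms(detect_ms=20, switch_ms=80)    # 100 ms
print(stock // atlas)  # 35x with these (best-case) stock numbers
```

Even at the most favorable end of the timeout range, the reactive path is tens of times slower; at a 30 s timeout the gap grows well beyond two orders of magnitude.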
Secure fronthaul network
The shift from proprietary fronthaul interfaces to open, Ethernet-based standards (eCPRI and O-RAN) has democratized RAN deployments but also introduced new security vulnerabilities. Unlike legacy fronthaul that relied on physical isolation and vendor-specific protocols, modern fronthaul networks traverse shared Ethernet infrastructure where adversaries can potentially intercept or manipulate traffic. Despite these risks, current standards still lack mandatory integrity protection because standards bodies considered man-in-the-middle (MITM) attacks “unlikely and low-impact.”
Our work (Xing et al., 2024) demonstrates the opposite: software-only adversaries who bypass 802.1X can inject or modify fronthaul packets to trigger cell-wide signaling storms or corrupt control blocks, impacting DUs and UEs across whole regions.
We introduce two attack families, FRONTSTORM (mass handover storms) and FRONTSTRIKE (signal-level corruption), and show they scale to line rate with minimal hardware. We then evaluate countermeasures, finding that MACsec with AES-NI adds only ~2.4 µs per jumbo packet, and that selective header-only protection can cut this to under 0.3 µs, making always-on integrity protection both feasible and essential. Our findings urge standards bodies to mandate integrity protection and deploy lightweight anomaly detectors along the fronthaul path.
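To put those per-packet costs in context, a rough calculation helps. The ~100 µs one-way fronthaul latency budget used below is an assumption (a figure commonly cited for O-RAN fronthaul), not a number from our measurements; only the 2.4 µs and 0.3 µs crypto costs come from the evaluation above:

```python
# Rough arithmetic relating the measured per-packet crypto cost to the
# fronthaul latency budget. The ~100 us one-way budget is an assumed
# figure (commonly cited for O-RAN fronthaul), not a measured one.

FRONTHAUL_BUDGET_US = 100.0  # assumed one-way fronthaul latency budget

def budget_fraction(crypto_us):
    """Fraction of the fronthaul budget consumed by integrity protection."""
    return crypto_us / FRONTHAUL_BUDGET_US

full_macsec = budget_fraction(2.4)   # full-packet MACsec with AES-NI
header_only = budget_fraction(0.3)   # selective header-only protection
print(f"{full_macsec:.1%} vs {header_only:.1%} of the latency budget")
```

Under these assumptions, full MACsec costs a few percent of the timing budget and header-only protection well under one percent, which is why we argue integrity protection no longer needs to be traded away for latency.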