publications | Daehyeok Kim

2025

IROS

ConfigBot: Adaptive Resource Allocation for Robot Applications in Dynamic Environments

Rohit Dwivedula, Sadanand Modak, Aditya Akella, Joydeep Biswas, Daehyeok Kim, and Christopher Rossbach

In Proceedings of 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 2025

Abs

The growing use of service robots in dynamic environments requires flexible management of on-board compute resources to optimize the performance of diverse tasks such as navigation, localization, and perception. %, and so on. Current robot deployments often rely on static OS configurations and system over-provisioning. However, they are suboptimal because they do not account for variations in resource usage. This results in poor system-wide behavior such as robot instability or inefficient resource use. This paper presents \system, a novel system designed to adaptively reconfigure robot applications to meet a predefined performance specification by leveraging \emphruntime profiling and \emphautomated configuration tuning. Through experiments on multiple real robots, each running a different stack with diverse performance requirements, which could be \emphcontext-dependent, we illustrate \system’s efficacy in maintaining system stability and optimizing resource allocation. Our findings highlight the promise of automatic system configuration tuning for robot deployments, including adaptation to dynamic changes.
MobiSys

Towards End-to-End Latency Guarantee in MEC Live Video Analytics with App-RAN Mutual Awareness

Juheon Yi, Goodsol Lee, Seokgyeong Shin, Minkyung Jeong, Daehyeok Kim, and Youngki Lee

In Proceedings of 23rd ACM International Conference on Mobile Systems, Applications, and Services, June 2025

Abs PDF

While mobile live video analytics apps require end-to-end latency guarantee for responsiveness and immersiveness, achieving consistent low latency is challenging due to complex fluctuations of wireless channel and scene complexity; for example, latency SLO satisfaction rate drops to as low as 26% in commercial 5G MEC platforms. Prior works mostly focus on either app-only (bitrate, DNN adaptation, or GPU allocation) or RAN-only (radio resource allocation) scheduling, with mutual ignorance of the other side resulting in mismatched scheduling decisions and frequent SLO violations. Coordinating the two schedulers is also challenging, as they are run separately by network and cloud operators with disjoint control. We present ARMA, an end-to-end live video analytics system with app-RAN mutual-awareness for high end-to-end latency SLO satisfaction in MEC. We design a mutually-aware decoupled scheduling mechanism on top of RAN Intelligent Controller (RIC) in Open-RAN architecture that fosters cooperative interaction between the two operators’ schedulers while preserving operational proprietaries. We prototype an Open RAN-enabled 5G MEC testbed and evaluate ARMA, showing that it achieves 97% SLO satisfaction rate.
NSDI

Enabling Portable and High-Performance SmartNIC Programs with Alkali

Jiaxin Lin, Zhiyuan Guo, Mihir Shah, Tao Ji, Yiying Zhang, Daehyeok Kim, and Aditya Akella

In Proceedings of 22nd USENIX Symposium on Networked Systems Design and Implementation, April 2025

Abs PDF

Trends indicate that emerging SmartNICs, either from different vendors or generations from the same vendor, exhibit substantial differences in hardware parallelism and memory interconnects. These variations make porting programs across NICs highly complex and time-consuming, requiring programmers to significantly refactor code for performance based on each target NIC’s hardware characteristics. We argue that an ideal SmartNIC compilation framework should allow developers to write target-independent programs, with the compiler automatically managing cross-NIC porting and performance optimization. We present such a framework, Alkali, that achieves this by (1) proposing a new intermediate representation for building flexible compiler infrastructure for multiple NIC targets and (2) developing a new iterative parallelism optimization algorithm that automatically ports and parallelizes the input programs based on the target NIC’s hardware characteristics. Experiments across a wide range of NIC applications demonstrate that Alkali enables developers to easily write portable, high-performance NIC programs. Our compiler optimization passes can automatically port these programs and make them run efficiently across all targets, achieving performance within 9.8% of hand-tuned expert implementations.

2024

Usenix Security

On the Criticality of Integrity Protection in 5G Fronthaul Networks

Jiarong Xing, Sophia Yoo, Xenofon Foukas, Daehyeok Kim, and Michael K. Reiter

In Proceedings of 33rd USENIX Security Symposium, August 2024

Abs PDF

The modern 5G fronthaul, which connects the base stations to radio units in cellular networks, is designed to deliver microsecond-level performance guarantees using Ethernet-based protocols. Unfortunately, due to potential performance overheads, as well as misconceptions about the low risk and impact of possible attacks, integrity protection is not considered a mandatory feature in the 5G fronthaul standards. In this work, we show how vulnerabilities from the lack of protection can be exploited, making attacks easier and more powerful than ever. We present a novel class of powerful attacks and a set of traditional attacks, which can both be fully launched from software over open packet-based interfaces, to cause performance degradation or denial of service to users over large geographical regions. Our attacks do not require a physical radio presence or signal-based attack mechanisms, do not affect the network’s operation (e.g., not crashing the radios), and are highly severe (e.g., impacting multiple cells). We demonstrate the impact of our attacks in an end-to-end manner on a commercial-grade, multi-cell 5G testbed, showing that adversaries can degrade performance of connected users by more than 80%, completely block a selected subset of users from ever attaching to the cell, or even generate signaling storm attacks of more than 2500 signaling messages per minute, with just two compromised cells and four mobile users. We also present an analysis of countermeasures that meet the strict performance requirements of the fronthaul.

2023

MLSys@NeurIPS

On a Foundation Model for Operating Systems

Divyanshu Saxena, Nihal Sharma, Donghyun Kim, Rohit Dwivedula, Jiayi Chen, Chenxi Yang, Sriram Ravula, Zichao Hu, Aditya Akella, Sebastian Angel, Joydeep Biswas, Swarat Chaudhuri, Isil Dillig, Alex Dimakis, P. Brighten Godfrey, Daehyeok Kim, Chris Rossbach, and Gang Wang

In Workshop on ML for Systems at NeurIPS, December 2023

Abs PDF

This paper lays down the research agenda for a domain-specific foundation model for operating systems (OSes). Our case for a foundation model revolves around the observations that several OS components such as CPU, memory, and network subsystems are interrelated and that OS traces offer the ideal dataset for a foundation model to grasp the intricacies of diverse OS components and their behavior in varying environments and workloads. We discuss a wide range of possibilities that then arise, from employing foundation models as policy agents to utilizing them as generators and predictors to assist traditional OS control algorithms. Our hope is that this paper spurs further research into OS foundation models and creating the next generation of operating systems for the evolving computing landscape.
MICRO

LogNIC: A High-Level Performance Model for SmartNICs

Zerui Guo, Jiaxin Lin, Yuebin Bai, Daehyeok Kim, Michael Swift, Aditya Akella, and Ming Liu

In Proceedings of 56th IEEE/ACM International Symposium on Microarchitecture, October 2023

Abs PDF

SmartNICs have become an indispensable communication fabric and computing substrate in today’s data centers and enterprise clusters, providing in-network computing capabilities for traversed packets and benefiting a range of applications across the system stack. Building an efficient SmartNIC-assisted solution is generally non-trivial and tedious as it requires programmers to understand the SmartNIC architecture, refactor application logic to match the device’s capabilities and limitations, and correlate an application execution with traffic characteristics. A high-level SmartNIC performance model can decouple the underlying SmartNIC hardware device from its offloaded software implementations and execution contexts, thereby drastically simplifying and facilitating the development process. However, prior architectural models can hardly be applied due to their ineptness in dissecting the SmartNIC-offloaded program’s complexity, capturing the nondeterministic overlapping between computation and I/O, and perceiving diverse traffic profiles.

This paper presents the LogNIC model that systematically analyzes the performance characteristics of a SmartNIC-offloaded program. Unlike conventional execution flow-based modeling, LogNIC employs a packet-centric approach that examines SmartNIC execution based on how packets traverse heterogeneous computing domains, on-/off-chip interconnects, and memory subsystems. It abstracts away the low-level device details, represents a deployed program as an execution graph, retains a handful of configurable parameters, and generates latency/throughput estimation for a given traffic profile. It further exposes a couple of extensions to handle multi-tenancy, traffic interleaving, and accelerator peculiarity. We demonstrate the LogNIC model’s capabilities using both commodity SmartNICs and an academic prototype under five application scenarios. Our evaluations show that LogNIC can estimate performance bounds, explore software optimization strategies, and provide guidelines for new hardware designs.
MobiCom

Enabling Resilience in Virtualized RANs with Atlas

Jiarong Xing, Junzhi Gong, Xenofon Foukas, Anuj Kalia, Daehyeok Kim, and Manikanta Kotaru

In Proceedings of 29th ACM International Conference on Mobile Computing and Networking, October 2023

Abs PDF

Virtualized radio access networks (vRANs), which allow running RAN processing on commodity servers instead of proprietary hardware, are gaining adoption in cellular networks. Two properties of the vRAN’s “Distributed Unit (DU)” that implements the lower RAN layers—its real-time deadlines and its black-box nature—make it challenging to provide resilience features such as upgrades and failover without long service disruptions. These properties preclude the use of existing resilience techniques like virtual machine migration or state replication that are used for typical workloads. This paper presents Atlas, the first system that provides resilience for the DU. The central insight in Atlas is to repurpose existing cellular mechanisms for \emphwireless resilience, namely handovers and cell reselection, to provide \emphsoftware resilience for the DU. For planned resilience events like upgrades, we design a novel technique that simultaneously serves cells from both the old and new DUs via the same radio, and uses handovers between these cells to migrate user devices. For unplanned failures, we identify deficiencies in existing RAN protocols that disrupt cell reselection after DU failure, and show how we can eliminate these disruptions using a middlebox between the DU and higher layers. Our evaluation with a state-of-the-art 5G vRAN testbed shows that Atlas achieves minimal disruption to cellular connectivity during resilience events, while incurring low overhead.
SIGCOMM

Resilient Baseband Processing in Virtualized RANs with Slingshot

Nikita Lazarev, Tao Ji, Anuj Kalia, Daehyeok Kim, Ilias Marinos, Francis Y. Yan, Christina Delimitrou, Zhiru Zhang, and Aditya Akella

In Proceedings of ACM SIGCOMM conference, September 2023

Abs PDF

In cellular networks, there is a growing adoption of virtualized radio access networks (vRANs), where operators are replacing the traditional specialized hardware for RAN processing with software running on commodity servers. Today’s vRAN deployments lack resilience, since there is no support for vRAN failover or upgrades without long service interruptions. Enabling these features for vRANs is challenging because of their strict real-time latency requirements and black-box nature. Slingshot is a new system that transparently provides resilience for the vRAN’s most performance-critical layer: the physical layer (PHY). We design new techniques for realtime workload migration with fast RAN protocol middleboxes, and realtime RAN failure detection. A key insight in our design is to view the transient disruptions from resilience events to RAN computation state and I/O similarly to regular wireless signal impairments, and leverage the inherent resilience of cellular networks to these events. Experiments with a state-of-the-art 5G vRAN testbed show that Slingshot handles PHY failover with no disruption to video conferencing, and under 110 ms of disruption to a TCP connection, and it also enables zero-downtime upgrades.
NSDI

ExoPlane: An Operating System for On-Rack Switch Resource Augmentation

Daehyeok Kim, Vyas Sekar, and Srinivasan Seshan

In Proceedings of 20th USENIX Symposium on Networked Systems Design and Implementation, April 2023

Abs PDF Slides Talk video

The promise of in-network computing continues to be unrealized in realistic deployments (e.g., clouds and ISPs) as serving concurrent stateful applications on a programmable switch is challenging today due to limited switch’s on-chip resources. In this paper, we argue that an on-rack switch resource augmentation architecture that augments a programmable switch with other programmable network hardware, such as smart NICs, on the same rack can be a pragmatic and incrementally scalable solution. To realize this vision, we design and implement ExoPlane, an operating system for on-rack switch resource augmentation to support multiple concurrent applications. In designing ExoPlane, we propose a practical runtime operating model and state abstraction to address challenges in managing application states correctly across multiple devices with minimal performance and resource overheads. Our evaluation with various P4 applications shows that ExoPlane can provide applications with low latency, scalable throughput, and fast failover while achieving these with small resource overheads and no or little modifications on applications.
NSDI

Sketchovsky: Enabling Ensembles of Sketches on Programmable Switches

Hun Namkung, Zaoxing Liu, Daehyeok Kim, Vyas Sekar, and Peter Steenkiste

In Proceedings of 20th USENIX Symposium on Networked Systems Design and Implementation, April 2023

Abs PDF

Network operators need to run diverse measurement tasks on programmable switches to support management decisions (e.g., traffic engineering or anomaly detection). While prior work has shown the viability of running a single sketch instance, they largely ignore the problem of running an ensemble of sketch instances for a collection of measurement tasks. As such, existing efforts fall short of efficiently supporting a general ensemble of sketch instances. In this work, we present the design and implementation of Sketchovsky, a novel cross-sketch optimization and composition framework. We identify five new cross-sketch optimization building blocks to reduce critical switch hardware resources. We design efficient heuristics to select and apply these building blocks for arbitrary ensembles. To simplify developer effort, Sketchovsky automatically generates the composed code to be input to the hardware compiler. Our evaluation shows that Sketchovsky makes ensembles with up to 18 sketch instances become feasible and can reduce up to 45% of the critical hardware resources.

2022

SOSR

Automatic Generation of Network Function Accelerators Using Component-Based Synthesis

Francisco Pereira, Gonçalo Matos, Hugo Sadok, Daehyeok Kim, Ruben Martins, Justine Sherry, Fernando Ramos, and Luis Pedrosa

In Proceedings of ACM Symposium on SDN Research, October 2022

Abs PDF

Designing networked systems that take best advantage of heterogeneous dataplanes – e.g., dividing packet processing across both a PISA switch and an x86 CPUs – can improve performance, efficiency, and resource consumption. However, programming for multiple hardware targets remains challenging because developers must learn platform-specific languages and skills. While some ‘write-once, run-anywhere’ compilers exist, they are unable to consider a range of implementation options to tune the NF to meet performance objectives. In this short paper, we explore preliminary ideas towards a compiler that explores a large search space of different mappings of functionality to hardware. This exploration can be tuned for a programmer-specified objective, such as minimizing memory consumption or maximizing network throughput. Our initial prototype, SyNAPSE, is based on a methodology called component-based synthesis and supports deployments across x86 and Tofino platforms. Relative to a baseline compiler which only generates one deployment decision, SyNAPSE uncovers thousands of deployment options – including a deployment which reduces the amount of controller traffic by an order of magnitude, and another deployment which halves memory usage.
NSDI

SketchLib: Enabling Efficient Sketch-based Monitoring on Programmable Switches

Hun Namkung, Zaoxing Liu, Daehyeok Kim, Vyas Sekar, and Peter Steenkiste

In Proceedings of 19th USENIX Symposium on Networked Systems Design and Implementation, April 2022

Abs PDF

Sketching algorithms or sketches enable accurate network measurement results with low resource footprints. While emerging programmable switches are an attractive target to get these benefits, current implementations of sketches are either inefficient and/or infeasible on hardware. Our contributions in the paper are: (1) systematically analyzing the resource bottlenecks of existing sketch implementations in hardware; (2) identifying practical and correct-by-construction optimization techniques to tackle the identified bottlenecks; and (3) designing an easy-to-use library called SketchLib to help developers efficiently implement their sketch algorithms in switch hardware to benefit from these resource optimizations. Our evaluation on state-of-the-art sketches demonstrates that SketchLib reduces the hardware resource footprint up to 96% without impacting fidelity.
NSDI

SwiSh: Distributed Shared State Abstractions for Programmable Switches

Lior Zeno, Dan R. K. Ports, Jacob Nelson, Daehyeok Kim, Shir Landau Feibish, Idit Keidar, Arik Rinberg, Alon Rashelbach, Igor De-Paula, and Mark Silberstein

In Proceedings of 19th USENIX Symposium on Networked Systems Design and Implementation, April 2022

Abs PDF

We design and evaluate SwiSh, a distributed shared state management layer for data-plane P4 programs. SwiSh enables running scalable stateful distributed network functions on programmable switches entirely in the data-plane. We explore several schemes to build a shared variable abstraction, which differ in consistency, performance, and in-switch implementation complexity. We introduce the novel Strong Delayed-Writes (SDW) protocol which offers consistent snapshots of shared data-plane objects with semantics known as rr-relaxed strong linearizability, enabling implementation of distributed concurrent sketches with precise error bounds. We implement strong, eventual, and SDW consistency protocols in Tofino switches, and compare their performance in microbenchmarks and three realistic network functions, NAT, DDoS detector, and rate limiter. Our results show that the distributed state management in the data plane is practical, and outperforms centralized solutions by up to four orders of magnitude in update throughput and replication latency.

2021

PhD Thesis

Towards Elastic and Resilient In-Network Computing

Daehyeok Kim

PhD Thesis, Carnegie Mellon University, Computer Science Department, November 2021

Abs PDF

Recent advances in programmable networking hardware technology such as programmable switches and smart network interface cards create a new computing paradigm called in-network computing. This new paradigm allows functionality that has been served by servers or proprietary hardware devices, ranging from network middleboxes to components of distributed systems, to now be performed in the network. The demand for higher performance and the commercial availability of programmable hardware have driven the popularity of in-network computing.
While many recent efforts have demonstrated the performance benefit of in-network computing, we observe a significant gap between what it offers today and evolving application demands. In particular, we argue that in-network computing lacks resource elasticity and fault resiliency which are essential building blocks for practical computing platforms, limiting its potential. Elasticity can address the shortcoming that today’s in-network computing only supports a simple deployment model where a single application runs on a single device equipped with fixed and limited resources. Similarly, fault resiliency is critical for managing prevalent device failures for the correctness and performance of applications, but it has gained littleattention. Although resource elasticity and fault resiliency have been extensively studied for traditional CPU server-based computing, we find that enabling them on programmable networking devices is challenging, especially due to their low-level abstractions, hardware constraints, heterogeneity, and workload characteristics.
In this thesis, we argue that by designing high-level abstractions and runtime environments that help leverage compute and memory resources available outside of one type of device, we can make in-network computing more elastic and resilient without any hardware modifications. This concept, which we call device resource augmentation, is a key enabler for resource elasticity and fault resiliency for stateful in-network applications written for programmable switches. In particular, we design three systems, named TEA, ExoPlane, and RedPlane, that use this concept to support elastic memory and elastic compute/memory, and fault resiliency, respectively. Each of these systems consists of a key abstraction, programming APIs, and a runtime environment. We demonstrate their feasibility and effectiveness with prototype implementations and evaluations using various in-network applications. Putting all the pieces together, developers can easily enable resource elasticity and fault resiliency for their applications without worrying about underlying complexities.
Tech Report

A Roadmap for Enabling a Future-Proof In-Network Computing Data Plane Ecosystem

Daehyeok Kim, Nikita Lazarev, Tommy Tracy, Farzana Siddique, Hun Namkung, James C. Hoe, Vyas Sekar, Kevin Skadron, Zhiru Zhang, and Srinivasan Seshan

arXiv preprint arXiv:2111.04563, October 2021

Abs PDF

As the vision of in-network computing becomes more mature, we see two parallel evolutionary trends. First, we see the evolution of richer, more demanding applications that require capabilities beyond programmable switching ASICs. Second, we see the evolution of diverse data plane technologies with many other future capabilities on the horizon. While some point solutions exist to tackle the intersection of these trends, we see several ecosystem-level disconnects today; e.g., the need to refactor applications for new data planes, lack of systematic guidelines to inform the development of future data plane capabilities, and lack of holistic runtime frameworks for network operators. In this paper, we use a simple-yet-instructive emerging application-data plane combination to highlight these disconnects. Drawing on these lessons, we sketch a high-level roadmap and guidelines for the community to tackle these to create a more thriving "future-proof" data plane ecosystem.
SIGCOMM

RedPlane: Enabling Fault-Tolerant Stateful In-Switch Applications

Daehyeok Kim, Jacob Nelson, Dan R. K. Ports, Vyas Sekar, and Srinivasan Seshan

In Proceedings of ACM SIGCOMM conference, August 2021

Abs PDF Slides Talk video Talk + Q&A video

Many recent efforts have demonstrated the performance benefits of running datacenter functions (\emphe.g., NATs, load balancers, monitoring) on programmable switches. However, a key missing piece remains: fault tolerance. This is especially critical as the network is no longer stateless and pure endpoint recovery does not suffice. In this paper, we design and implement RedPlane, a fault-tolerant state store for stateful in-switch applications. This provides in-switch applications consistent access to their state, even if the switch they run on fails or traffic is rerouted to an alternative switch. We address key challenges in devising a practical, provably correct replication protocol and implementing it in the switch data plane. Our evaluations show that RedPlane incurs negligible overhead and enables end-to-end applications to rapidly recover from switch failures.
SOSR

Telemetry Retrieval Inaccuracy in Programmable Switches: Analysis and Recommendations

Hun Namkung, Daehyeok Kim, Zaoxing Liu, Vyas Sekar, and Peter Steenkiste

In Proceedings of ACM Symposium on SDN Research, July 2021

Abs PDF

Sketching algorithms or sketches are attractive as telemetry capabilities on programmable hardware switches since they offer rigorous accuracy guarantees and use compact data structures. However, we find that in practice, their actual implementations can have a significant (up to 94x) accuracy drop compared to theoretical expectations. We find that the delays incurred by pulling and resetting the data plane state induce accuracy degradation. We design and implement solutions to reduce the delays and show that our solutions can help eliminate almost all the inaccuracy of existing sketch workflows.

2020

Tech Report

Unleashing In-network Computing on Scientific Workloads

Daehyeok Kim, Ankush Jain, Zaoxing Liu, George Amvrosiadis, Damian Hazen, Bradley Settlemyer, and Vyas Sekar

arXiv preprint arXiv:2009.02457, September 2020

Abs PDF

Many recent efforts have shown that in-network computing can benefit various datacenter applications. In this paper, we explore a relatively less-explored domain which we argue can benefit from in-network computing: scientific workloads in high-performance computing. By analyzing canonical examples of HPC applications, we observe unique opportunities and challenges for exploiting in-network computing to accelerate scientific workloads. In particular, we find that the dynamic and demanding nature of scientific workloads is the major obstacle to the adoption of in-network approaches which are mostly open-loop and lack runtime feedback. In this paper, we present NSinC (Network-accelerated ScIeNtific Computing), an architecture for fully unleashing the potential benefits of in-network computing for scientific workloads by providing closed-loop runtime feedback to in-network acceleration services. We outline key challenges in realizing this vision and a preliminary design to enable acceleration for scientific applications.
SIGCOMM

TEA: Enabling State-Intensive Network Functions on Programmable Switches

Daehyeok Kim, Zaoxing Liu, Yibo Zhu, Changhoon Kim, Jeongkeun Lee, Vyas Sekar, and Srinivasan Seshan

In Proceedings of ACM SIGCOMM conference, August 2020

Abs PDF Slides Talk video Talk + Q&A video

Programmable switches have been touted as an attractive alternative for deploying network functions (NFs) such as network address translators (NATs), load balancers, and firewalls. However, their limited memory capacity has been a major stumbling block that has stymied their adoption for supporting state-intensive NFs such as cloud-scale NATs and load balancers that maintain millions of flow-table entries. In this paper, we explore a new approach that leverages DRAM on servers available in typical NFV clusters. Our new system architecture, called TEA (Table Extension Architecture), provides a virtual table abstraction that allows NFs on programmable switches to look up large virtual tables built on external DRAM. Our approach enables switch ASICs to access external DRAM purely in the data plane without involving CPUs on servers. We address key design and implementation challenges in realizing this idea. We demonstrate its feasibility and practicality with our implementation on a Tofino-based programmable switch. Our evaluation shows that NFs built with TEA can look up table entries on external DRAM with low and predictable latency (1.8-2.2 μs) and the lookup throughput can be linearly scaled with additional servers (138 million lookups per seconds with 8 servers).
NSDI

Adapting TCP for Reconfigurable Datacenter Networks

Matthew Mukerjee, Christopher Canel, Weiyang Wang, Daehyeok Kim, Srinivasan Seshan, and Alex C. Snoeren

In Proceedings of 17th USENIX Symposium on Networked Systems Design and Implementation, February 2020

Abs PDF Slides Talk + Q&A video

Reconfigurable datacenter networks (RDCNs) augment traditional packet switches with high-bandwidth reconfigurable circuits. In these networks, high-bandwidth circuits are assigned to particular source-destination rack pairs based on a schedule. To make efficient use of RDCNs, active TCP flows between such pairs must quickly ramp up their sending rates when high-bandwidth circuits are made available. Past studies have shown that TCP performs well on RDCNs with millisecond-scale reconfiguration delays, during which time the circuit network is offline. However, modern RDCNs can reconfigure in as little as 20 μs, and maintain a particular configuration for fewer than 10 RTTs. We show that existing TCP variants cannot ramp up quickly enough to work well on these modern RDCNs. We identify two methods to address this issue: First, an in-network solution that dynamically resizes top-of-rack switch virtual output queues to prebuffer packets; Second, an endpoint-based solution that increases the congestion window, cwnd, based on explicit circuit state feedback sent via the ECN-echo bit. To evaluate these techniques, we build an open-source RDCN emulator, Etalon, and show that a combination of dynamic queue resizing and explicit circuit state feedback increases circuit utilization by 1.91x with an only 1.20x increase in tail latency.

2019

NSDI

FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds

Daehyeok Kim, Tianlong Yu, Hongqiang Harry Liu, Yibo Zhu, Jitu Padhye, Shachar Raindel, Chuanxiong Guo, Vyas Sekar, and Srinivasan Seshan

In Proceedings of 16th USENIX Symposium on Networked Systems Design and Implementation, February 2019

Abs PDF Slides Talk + Q&A video

Many popular large-scale cloud applications are increasingly using containerization for high resource efficiency and lightweight isolation. In parallel, many data-intensive applications (e.g., data analytics and deep learning frameworks) are adopting or looking to adopt RDMA for high networking performance. Industry trends suggest that these two approaches are on an inevitable collision course. In this paper, we present FreeFlow, a software-based RDMA virtualization framework designed for containerized clouds. FreeFlow realizes virtual RDMA networking purely with a software-based approach using commodity RDMA NICs. Unlike existing RDMA virtualization solutions, FreeFlow fully satisfies the requirements from cloud environments, such as isolation for multi-tenancy, portability for container migrations, and controllability for control and data plane policies. FreeFlow is also transparent to applications and provides networking performance close to bare-metal RDMA with low CPU overhead. In our evaluations with TensorFlow and Spark, FreeFlow provides almost the same application performance as bare-metal RDMA.

2018

HotNets

Generic External Memory for Switch Data Planes

Daehyeok Kim, Yibo Zhu, Changhoon Kim, Jeongkeun Lee, and Srinivasan Seshan

In Proceedings of the 17th ACM Workshop on Hot Topics in Networks (HotNets), November 2018

Abs PDF Slides Talk + Q&A video

Network switches are an attractive vantage point to serve various network applications and functions such as load balancing and virtual switching because of their in-network location and high packet processing rate. Recent advances in programmable switch ASICs open more opportunities for offloading various functionality to switches. However, the limited memory capacity on switches has been a major challenge that such applications struggle to deal with. In this paper, we envision that by enabling network switches to access remote memory purely from data planes, the performance of a wide range of applications can be improved. We design three remote memory primitives, leveraging RDMA operations, and show the feasibility of accessing remote memory from switches using our prototype implementation.
SIGCOMM

HyperLoop: Group-Based NIC-Offloading to Accelerate Replicated Transactions in Multi-Tenant Storage Systems

Daehyeok Kim, Amirsaman Memaripour, Anirudh Badam, Yibo Zhu, Hongqiang Harry Liu, Jitu Padhye, Shachar Raindel, Steven Swanson, Vyas Sekar, and Srinivasan Seshan

In Proceedings of ACM SIGCOMM conference, August 2018

Abs PDF Slides Talk + Q&A video

Storage systems in data centers are an important component of large-scale online services. They typically perform replicated transactional operations for high data availability and integrity. Today, however, such operations suffer from high tail latency even with recent kernel bypass and storage optimizations, and thus affect the predictability of end-to-end performance of these services. We observe that the root cause of the problem is the involvement of the CPU, a precious commodity in multi-tenant settings, in the critical path of replicated transactions. In this paper, we present HyperLoop, a new framework that removes CPU from the critical path of replicated transactions in storage systems by offloading them to commodity RDMA NICs, with non-volatile memory as the storage medium. To achieve this, we develop new and general NIC offloading primitives that can perform memory operations on all nodes in a replication group while guaranteeing ACID properties without CPU involvement. We demonstrate that popular storage applications can be easily optimized using our primitives. Our evaluation results with microbenchmarks and application benchmarks show that HyperLoop can reduce 99th percentile latency ≈800x with close to 0% CPU consumption on replicas.

2017

Comm. Letter

REboost: Improving Throughput in Wireless Networks using Redundancy Elimination

Kilho Lee, Daehyeok Kim, and Insik Shin

IEEE Communications Letters, January 2017

Abs PDF

Traffic redundancy elimination (RE) is an attractive approach to improve the throughput in bandwidth-limited networks. While previous studies show that the RE is useful for improving the throughput in such networks, we observed that the RE would not be an effective solution in wireless networks. We found the TCP congestion control cannot take advantage of the RE, without knowing how the underlying RE system manipulates each TCP packet. In this letter, we present a novel technique called REboost to enable the TCP layer to be aware of the underlying RE system and improve the throughput. Our evaluation with a prototype shows that REboost significantly improves the throughput compared with the previous RE systems.

2016

NDSS

What Mobile Ads Know About Mobile Users

Sooel Son, Daehyeok Kim, and Vitaly Shmatikov

In Proceedings of 23rd Network and Distributed System Security Symposium, February 2016

Abs PDF Slides

We analyze the software stack of popular mobile advertising libraries on Android and investigate how they protect the users of advertising-supported apps from malicious advertising. We find that, by and large, Android advertising libraries properly separate the privileges of the ads from the host app by confining ads to dedicated browser instances that correctly apply the same origin policy. We then demonstrate how malicious ads can infer sensitive information about users by accessing external storage, which is essential for media-rich ads in order to cache video and images. Even though the same origin policy prevents confined ads from reading other apps’ external-storage files, it does not prevent them from learning that a file with a particular name exists. We show how, depending on the app, the mere existence of a file can reveal sensitive information about the user. For example, if the user has a pharmacy price-comparison app installed on the device, the presence of external-storage files with certain names reveals which drugs the user has looked for. We conclude with our recommendations for redesigning mobile advertising software to better protect users from malicious advertising.
NDSS

FlexDroid: Enforcing In-App Privilege Separation in Android

Jaebaek Seo, Daehyeok Kim, Donghyun Cho, Taesoo Kim, and Insik Shin

In Proceedings of 23rd Network and Distributed System Security Symposium , February 2016

Abs PDF Slides

Mobile applications are increasingly integrating third-party libraries to provide various features, such as advertising, analytics, social networking, and more. Unfortunately, such integration with third-party libraries comes with the cost of potential privacy violations of users, because Android always grants a full set of permissions to third-party libraries as their host applications. Unintended accesses to users’ private data are underestimated threats to users’ privacy, as complex and often obfuscated third-party libraries make it hard for application developers to estimate the correct behaviors of third-party libraries. More critically, a wide adoption of native code (JNI) and dynamic code executions such as Java reflection or dynamic code reloading, makes it even harder to apply state-of-the-art security analysis. In this work, we propose FLEXDROID, a new Android security model and isolation mechanism, that provides dynamic, fine-grained access control for third-party libraries. With FLEXDROID, application developers not only can gain a full control of third-party libraries (e.g., which permissions to grant or not), but also can specify how to make them behave after detecting a privacy violation (e.g., providing a mock user’s information or kill). To achieve such goals, we define a new notion of principals for third-party libraries, and develop a novel security mechanism, called inter-process stack inspection that is effective to JNI as well as dynamic code execution. Our usability study shows that developers can easily adopt FLEXDROID’s policy to their existing applications. Finally, our evaluation shows that FLEXDROID can effectively restrict the permissions of third-party libraries with negligible overheads.

2015

RTSS

SounDroid: Supporting Real-Time Sound Application on Commodity Mobile Devices

Hyosu Kim, SangJeong Lee, Wookhyun Han, Daehyeok Kim, and Insik Shin

In Proceedings of 36th IEEE Real-Time Systems Symposium, December 2015

Abs PDF

A variety of advantages from sounds such as measurement and accessibility introduces a new opportunity for mobile applications to offer broad types of interesting, valuable functionalities, supporting a richer user experience. However, in spite of the growing interests on mobile sound applications, few or no works have been done in focusing on managing an audio device effectively. More specifically, their low level of real-time capability for audio resources makes it challenging to satisfy tight timing requirements of mobile sound applications, e.g., a high sensing rate of acoustic sensing applications. To address this problem, this work presents the SounDroid framework, an audio device management framework for real-time audio requests from mobile sound applications. The design of SounDroid is based on the requirement analysis of audio requests as well as an understanding of the audio playback procedure including the audio request scheduling and dispatching on Android. It then incorporates both real-time audio request scheduling algorithms, called EDF-V and AFDS, and dispatching optimization techniques into mobile platforms, and thus improves the quality-of-service of mobile sound applications. Our experimental results with the prototype implementation of SounDroid demonstrate that it is able to enhance scheduling performance for audio requests, compared to traditional mechanisms (by up to 40% of improvement), while allowing deterministic dispatching latency.
INFOCOM

Optimized Layered Integrated Video Encoding

Sangki Yun, Daehyeok Kim, Xiaofan Lu, and Lili Qiu

In Proceedings of 34th IEEE International Conference on Computer Communications, April 2015

Abs PDF

Wireless video traffic has grown at an unprecedented rate and put significant burden on wireless networks. Multicast can significantly reduce traffic by sending a single video to multiple receivers simultaneously. On the other hand, wireless receivers are heterogeneous due to both channel and antenna heterogeneity, the latter of which is rapidly increasing with the emergence of 802.11n and 802.11ac. In this paper, we develop optimized layered integrated video encoding (LIVE) to guarantee reasonable performance to weaker receivers (with worse channel and/or fewer antennas) and allow stronger receivers to enjoy better quality. Our approach has three distinct features: (i) It uses a novel layered coding to naturally accommodate the heterogeneity of different video receivers; (ii) It uses an optimization framework to optimize the amount of time used for transmission and the amount of information to transmit at each layer under the current channel condition; and (iii) It uses an integrated modulation, where most video data are transmitted using soft modulation to enjoy efficiency and resilience while the most important video data are transmitted using a combination of soft modulation and conventional hard modulation to further enhance their reliability. To our knowledge, this is the first approach that handles MIMO antenna heterogeneity in wireless video multicast. We demonstrate its effectiveness through extensive Matlab simulation and USRP testbed experiments.

2014

CCS

ATRA: Address Translation Redirection Attack against Hardware-based External Monitors

Daehee Jang, Hojoon Lee, Minsu Kim, Daehyeok Kim, Daegyeong Kim, and Brent B. Kang

In Proceedings of 21st ACM Conference on Computer and Communications Security, November 2014

Abs PDF

Hardware-based external monitors have been proposed as a trustworthy method for protecting the kernel integrity. We introduce the design and implementation of Address Translation Redirection Attack (ATRA) that enables complete evasion of the hardware-based external monitor that anchors its trust on a separate processor. ATRA circumvents the external monitor by redirecting the memory access to critical kernel objects into a non-monitored region. Despite the seriousness of the ATRA issue, the address translation integrity has been assumed in many hardware-based external monitors and the possibility of its exploitation has been suggested yet many considered hypothetical. We explore the intricate details of ATRA, explain major challenges in realizing ATRA in practice, and address them with two types of ATRA called Memory-bound ATRA and Register-bound ATRA. Our evaluations with benchmarks show that ATRA does not introduce a noticeable performance degradation to the host system, proving practical applicability of the attack to alert the researchers to seriously address ATRA in designing future external monitors.

2013

MobiCom

Fine-grained Spectrum Adaptation in WiFi Networks

Sangki Yun, Daehyeok Kim, and Lili Qiu

In Proceedings of 20th ACM International Conference on Mobile Computing and Networking, September 2013

Abs PDF

Explosive growth of WiFi traffic calls for new technologies to dramatically improve spectrum efficiency. In this paper, we propose an approach to adapt the spectrum on a per-frame basis. It consists of three major components: (i) a fine-grained spectrum access design that allows a sender and receiver to change their transmission and reception spectrum on demand, (ii) fast and accurate spectrum detection that allows a receiver to determine which spectrum is used by its sender on a per-frame basis by exploiting the IEEE 802.11 preamble structure, and (iii) an efficient spectrum allocation algorithm that determines which spectrum to use for each transmission by taking into account frequency diversity and interference. It can further be adapted to perform a joint assignment of spectrum, schedule, and access point (AP) for each frame. Using a SORA implementation and trace-driven simulation, we demonstrate the feasibility of per-frame spectrum adaptation and its significant benefit over existing channel assignment approaches. To our knowledge, this is the first per-frame spectrum adaptation prototype for WiFi networks.

2012

MS Thesis

Optimal Combination of Opportunistic Routing and Network Coding for Minimizing Transmission Time

Daehyeok Kim

MS Thesis, Pohang University of Science and Technology, February 2012
WCNC

Multi-rate Combination of Opportunistic Routing and Network Coding

Daehyeok Kim, and Young-Joo Suh

In Proceedings of 9th IEEE Wireless Communications and Networking Conference, April 2012

Abs PDF Slides

Recently, wireless communication methods that exploit the broadcast nature of the wireless medium have been attracting growing attention. Among these methods, opportunistic routing and network coding are regarded as the most promising techniques. While there have been some attempts to combine opportunistic routing with network coding to capture the advantages of both techniques, none of these attempts has considered bit-rate selection for data transmission in multi-rate wireless networks. In this paper, we study the potential benefits of the combination of opportunistic routing and network coding with the bit-rate selection mechanism from an optimization perspective. We develop a theoretical model and algorithm for finding the optimal forwarding scheme for a multi-rate combination of opportunistic routing and network coding in a given network. MIT Roofnet trace-based simulations show that considering bit-rate selection in combination with opportunistic routing and network coding has substantial benefits in terms of the expected transmission time compared to multi-rate opportunistic routing, multi-rate network coding, and a fixed-rate combination approach.

2011

JCSE

Multicast Extension to Proxy Mobile IPv6 for Mobile Multicast Services

Daehyeok Kim, Wan-Seon Lim, and Young-Joo Suh

Journal of Computing Science and Engineering, December 2011

Abs PDF

Recently, Proxy Mobile IPv6 (PMIPv6) has received much attention as a mobility management protocol in next-generation all-IP mobile networks. While the current research related to PMIPv6 mainly focuses on providing efficient handovers for unicast-based applications, there has been relatively little interest in supporting multicast services with PMIPv6. To provide support for multicast services with PMIPv6, there are two alternative approaches called Mobile Access Gateway (MAG)-based subscription and Local Mobility Anchor (LMA)-based subscription. However, MAG-based subscription causes a large overhead for multicast joining and LMA-based subscription provides non-optimal multicast routing paths. The two approaches may also cause a high packet loss rate. In this paper, we propose an efficient PMIPv6-based multicast protocol that aims to provide an optimal delivery path for multicast data and to reduce handover delay and packet loss rate. Through simulation studies, we found that the proposed protocol outperforms existing multicast solutions for PMIPv6 in terms of end-to-end delay, service disruption period, and the number of lost packets during handovers.