Crossroads Seminars

The Crossroads seminar series is offered regularly on Fridays 2~3pm (US eastern). On the 4th Friday of each month, we will have a featured talk on the latest research results by the center's PIs and students. In remaining weeks, we will host a diverse range of talks including work-in-progress and outside visitors.

Upcoming Seminars

Friday, December 24, 2021 | 2pm~3pm ET

There will not be a featured seminar on December 24. Please enjoy the break.

Past Seminars

Portrait of Nirav Atre
Friday, November 19, 2021 | 2pm~3pm ET

SurgeProtector: Mitigating Algorithmic Complexity Attacks using Adversarial Scheduling
Nirav Atre, Carnegie Mellon University

Abstract: Algorithmic complexity attacks (ACAs) are a class of Denial-of-Service (DoS) attacks where an attacker uses a small amount of adversarial traffic to induce a large amount of work in the target system, pushing the system into overload and causing it to drop packets from innocent users. ACAs are particularly dangerous because, unlike volumetric DoS attacks, ACAs don't require a significant network bandwidth investment from the attacker. Today, network functions (NFs) on the Internet must be painstakingly designed and engineered on a case-by-case basis to mitigate the debilitating impact of ACAs. Further, the resulting designs tend to be overly conservative in their attack mitigation strategy, limiting the innocent traffic that the NF can serve during common-case operation.

In this talk, I will present a general framework we designed to make any NF more resilient to ACAs without the limitations of prior approaches. Our framework, SurgeProtector, uses the NF's scheduler to mitigate the impact of ACAs using a very traditional scheduling algorithm: Weighted Shortest Job First (WSJF). To evaluate SurgeProtector, we propose a new metric of DoS vulnerability called the Displacement Factor (DF), which quantifies the maximum "harm per unit effort" an adversary can inflict on the system. Using novel insights from adversarial scheduling theory, we show that any system using WSJF has a worst-case DF of only a small constant (unity), whereas traditional schedulers place no upper bound on the adversary's DF. To illustrate that SurgeProtector is robust not only in theory but also in practice, we integrate it into an open source Intrusion Detection System (IDS). Under simulated attack, the SurgeProtector-augmented IDS suffers 90-99% lower innocent traffic loss than the original system.

Bio: Nirav is a fourth-year Ph.D. student in Computer Science at Carnegie Mellon University (CMU) advised by Prof. Justine Sherry. His research interests broadly lie at the intersection of networking and performance modeling. Prior to starting graduate school, Nirav completed his BASc in Computer Engineering at the University of Toronto, Canada, in 2018.

Portrait of Jing Li
(Invited) Friday, November 5, 2021 | 2pm~3pm ET

It is All About Abstraction: Virtualizing FPGAs in the Cloud
Jing Li, University of Pennsylvania

Abstract: We have seen growing interest in, and benefits from, exploiting FPGAs as first-class citizens in cloud computing. Cloud vendors such as Amazon and Microsoft have begun to support on-demand FPGA acceleration in various forms of cloud service. Nonetheless, system support for cloud FPGAs is still in its infancy. The lack of efficient virtualization support makes it challenging to fully unleash the benefits of integrating FPGAs into the cloud infrastructure, leading to low elasticity and resource utilization. There are many historical and practical reasons for this: traditional FPGAs and the associated compilation tools are not designed and optimized for the multi-tenant, resource-sharing cloud computing environment, and there is no widely adopted, simple hardware/software interface for spatial architectures (i.e., FPGAs), in contrast to temporal architectures such as CPUs.

In this talk, I will present our exploratory efforts to address these limitations. I will first describe the key requirements that we identified for virtualizing spatial architectures and present a generic virtualization stack that satisfies these requirements for heterogeneous FPGA clusters. Specifically, I will introduce a two-level system abstraction that decouples compilation from resource allocation and thus enables fine-grained resource management with low compilation overhead. I will then show how we modify the existing compilation flow and runtime management to leverage the proposed abstraction and achieve efficient virtualization. Finally, I will discuss further optimization opportunities through two case studies.

Bio: Jing (Jane) Li is the Eduardo D. Glandt Faculty Fellow and Associate Professor of Electrical and Systems Engineering and of Computer and Information Science at the University of Pennsylvania. She is broadly interested in developing fundamental methods for workload-optimized systems. To validate research ideas, her work puts a strong emphasis on real system prototyping at both the chip level and the system level. She is a recipient of DARPA's Young Faculty Award, the NSF CAREER Award, the IBM Research Division Outstanding Technical Achievement Award for successfully achieving CEO milestone, and multiple invention achievement awards and high value patent application awards from IBM. Previously she was the Dugald C. Jackson Assistant Professor at the University of Wisconsin–Madison and a faculty affiliate with the UW-Madison Computer Architecture group and Machine Learning group. She is one of the PIs in SRC JUMP center – Center for Research on Intelligent Storage and Processing-In-Memory (CRISP). She spent her early career at IBM T. J. Watson Research Center as a Research Staff Member after obtaining her PhD degree from Purdue University.

Portrait of Naif Tarafdar and Paul Chow
(Invited) Friday, October 29, 2021 | 2pm~3pm ET

AIgean: An Open Framework for Deploying Machine Learning on Heterogeneous Clusters
Naif Tarafdar and Paul Chow, University of Toronto

Abstract: AIgean, pronounced like the sea, is an open framework to build and deploy machine learning (ML) algorithms on a heterogeneous cluster of devices (CPUs and FPGAs). We present AIgean as a use case for our multi-FPGA deployment infrastructure: Galapagos. AIgean provides a full end-to-end multi-FPGA/CPU implementation of a neural network. The user supplies a high-level neural network description, and our tool flow is responsible for synthesizing the individual layers, partitioning layers across different nodes, and providing the bridging and routing required for these layers to communicate. If the user is an expert in a particular domain and would like to tinker with the implementation details of the neural network, we define a flexible implementation stack for ML that includes the layers of Applications & Algorithms, Cluster Deployment & Communication, and Hardware. The Cluster Deployment & Communication and Hardware layers leverage the Galapagos layer abstractions, where the communication protocol is abstracted from the application and the hardware implementations are abstracted from the physical hardware being used. This allows the user to modify specific layers of abstraction without having to worry about components outside of their area of expertise. We demonstrate the effectiveness of AIgean with three use cases: a small network running on a single network-connected FPGA, an autoencoder running on three FPGAs, and ResNet-50 running across twelve FPGAs.

Bio: Naif Tarafdar is a fifth-year PhD candidate at the University of Toronto. He has previously interned at Xilinx Research and Microsoft Research. His main research interest is in democratizing heterogeneous compute to give access to as many new users as possible. This can be through abstraction layers, APIs and programming models. He is the chief architect of Galapagos, a heterogeneous multi-FPGA development stack at the University of Toronto.

Paul Chow is a professor in the faculty of The Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto. He is a Fellow of the IEEE and Fellow of the Engineering Institute of Canada. His main research is about making FPGAs into computing devices so that applications can be easily deployed. In particular, he wants to do this at scale in a heterogeneous environment where FPGAs seamlessly interact with CPUs and other devices, all as peers, and transparently to the application.

Portrait of David Z. Pan
Friday, October 22, 2021 | 2pm~3pm ET

FPGA Placement: Recent Progress and Road Ahead
David Z. Pan, The University of Texas at Austin

Abstract: In the FPGA implementation flow, placement plays a crucial role in determining the overall quality of results and runtime. After synthesis and logic mapping, placement determines the physical locations of heterogeneous instances to optimize wirelength, timing, power, routability, etc., while meeting various constraints in modern FPGAs. This talk will give an overview of recent progress in FPGA placement targeting large-scale heterogeneous FPGAs, including UTPlaceF, which previously won ISPD FPGA Placement Contests, and the current academic state of the art, elfPlace. Since placement may need to be called many times to achieve design closure, it is very important to ensure high scalability with increasing design complexity, e.g., on future 3D FPGAs. We will discuss how to scale up and accelerate FPGA placement algorithms. We will also discuss some future directions, e.g., open source to enable cross-team collaborations, and leveraging machine learning hardware/software for FPGA placement.

Bio: David Z. Pan is a Professor and Silicon Laboratories Endowed Chair at the Department of Electrical and Computer Engineering, The University of Texas at Austin. His research interests include bidirectional AI and IC interactions, electronic design automation, design for manufacturing, hardware security, and CAD for analog/mixed-signal ICs and emerging technologies. He has published over 400 refereed journal/conference papers and holds 8 US patents. He has served on many journal editorial boards and conference committees, including various leadership roles such as ICCAD 2019 General Chair, ASP-DAC 2017 TPC Chair, and ISPD 2008 General Chair. He has received many awards, including SRC Technical Excellence Award, 19 Best Paper Awards (at DAC, ICCAD, DATE, ASP-DAC, ISPD, HOST, etc.), DAC Top 10 Author Award in Fifth Decade, ASP-DAC Frequently Cited Author Award, Communications of ACM Research Highlights, ACM/SIGDA Outstanding New Faculty Award, NSF CAREER Award, IBM Faculty Award (4 times), and many international CAD contest awards. He has graduated 40 PhD students and postdocs who have won many awards, including the First Place of ACM Student Research Competition Grand Finals (twice, in 2018 and 2021), ACM/SIGDA Student Research Competition Gold Medal (three times), ACM Outstanding PhD Dissertation in EDA Award (twice), EDAA Outstanding Dissertation Award (twice), etc. He is a Fellow of IEEE and SPIE.

Portrait of Justine Sherry
Friday, September 24, 2021 | 2pm~3pm ET

Crossroads RV1: Exploring Data on the Move Applications
Justine Sherry, Carnegie Mellon University

Abstract: The Crossroads FPGA is uniquely positioned to support applications which operate with high throughput over data "on the move" between endpoints such as CPUs, GPUs, Storage, Network, and other platforms. In this talk, we will highlight two applications under development in the Crossroads center. First, the Pigasus IDS is a 100Gbps hybrid FPGA + CPU platform for network security. Next, Norman is a new network dataplane for Linux that offloads the OS networking stack onto a Crossroads FPGA. We will discuss the high-level goals of Pigasus and Norman, some of their design details, and finally contrast their two different approaches to using Crossroads: Pigasus, as an "FPGA-centric" design with the CPU working in support of the FPGA, and Norman as a "CPU-centric" design, with the FPGA working in support of the CPU.

Bio: Justine Sherry is an assistant professor at Carnegie Mellon University. Her interests are in computer networking; her work includes middleboxes, networked systems, measurement, cloud computing, and congestion control. Her recent research focuses on new opportunities and challenges arising from the deployment of middleboxes -- such as firewalls and proxies -- as services offered by clouds and ISPs. Dr. Sherry received her PhD (2016) and MS (2012) from UC Berkeley, and her BS and BA (2010) from the University of Washington. She is a recipient of the SIGCOMM doctoral dissertation award, the David J. Sakrison prize, paper awards at USENIX NSDI and ACM SIGCOMM, and an NSF Graduate Research Fellowship. Most importantly, she is always on the lookout for a great cappuccino.

Portrait of Vaughn Betz
Friday, July 23, 2021 | 2pm~3pm ET

Verilog to Routing (VTR): A Flexible Open-Source CAD Flow to Explore and Target Diverse FPGA Architectures
Vaughn Betz, University of Toronto

Abstract: With the need for improvements in compute performance and efficiency beyond what process scaling can provide, FPGAs and FPGA-like programmable accelerators that can target a range of compute tasks efficiently are of interest in many application areas. However, creating a new CAD flow that can evaluate and map circuits to a new programmable architecture remains a daunting task, making flexible CAD flows that can be quickly retargeted to new architectures highly desirable.

This talk will give an overview of the Verilog-to-Routing (VTR) open source tool flow that addresses this need. We'll discuss recent enhancements to VTR that have broadened the range of architectures it can target, and allow it to not only evaluate new FPGA architectures, but also program the chosen architectures that are committed to silicon.

Architecture flexibility can have a cost, however, and a common conception in the FPGA Computer Aided Design (CAD) community is that architecture-specific algorithms and tools will significantly outperform more general approaches which target a variety of FPGA architectures. In this talk we'll show how, through careful algorithm design and code architecture, VTR has improved result quality without architecture-specific code, challenging the idea that result quality and architecture flexibility are mutually exclusive. We will detail the key packing and routing enhancements that led to large improvements in wirelength and timing, while simultaneously reducing run time by over 6x.

Finally, we'll present efforts to use Reinforcement Learning to create more adaptable and efficient CAD algorithms. Taking placement as an example, we'll show how an RL-enhanced move generator can improve the quality/run-time trade-off of VTR's placement algorithm.

Bio: Vaughn Betz is a Professor and the NSERC/Intel Industrial Research Chair in Programmable Silicon at the University of Toronto. He is the original developer of the widely used VPR FPGA placement, routing and architecture evaluation CAD flow, and a lead developer in the VTR project that has built upon VPR. He co-founded Right Track CAD to commercialize VPR, and joined Altera upon its acquisition of Right Track CAD. Dr. Betz spent 11 years at Altera, ultimately as Senior Director of software engineering, and is one of the architects of the Quartus CAD system and the first five generations of the Stratix and Cyclone FPGA families. He holds 101 US patents and has published over 100 technical articles in the FPGA area, thirteen of which have won best or most significant paper awards. Dr. Betz is a Fellow of the IEEE and the National Academy of Inventors, and a Faculty Affiliate of the Vector Institute for Artificial Intelligence.

Portrait of James C. Hoe
Friday, July 16, 2021 | 2pm~3pm ET

From “Field Programmable” to “Programmable”
James C. Hoe, Carnegie Mellon University

Abstract: This talk is an overview of RV5.

To elevate FPGAs from logic to computing roles, we need to address the greater requirement for programmability beyond being a “field programmable” ASIC. A computing FPGA will be asked to do more tasks than could fit on the fabric at once and to do new tasks that are unknown before deployment. Moreover, dynamically managing the logic resource utilization is a presently under-tapped source of performance optimization, whether by devoting available resources to only active tasks or by supporting tasks with design variants optimized differently for changing conditions.

To maximally exploit the benefits of FPGAs’ programmability, the Intel/VMware Crossroads 3D FPGA Academic Research Center aims to make runtime reprogramming a regular mode of operation for the Crossroads 3D-FPGA in future datacenter servers. This talk will motivate the need for a new, expanded design mindset by FPGA users and designers to fully pursue FPGAs’ programmability and dynamism. The talk next presents the Crossroads Center’s research toward realizing this new usage and programming on the Crossroads 3D-FPGA for datacenter applications. The talk will present a re-design of the Pigasus network intrusion detection/prevention system (IDS/IPS) following a design methodology to leverage the flexible and dynamic capabilities of FPGA targets.

Bio: James C. Hoe is a Professor of Electrical and Computer Engineering at Carnegie Mellon University. He received his Ph.D. in EECS from Massachusetts Institute of Technology in 2000 (S.M., 1994). He received his B.S. in EECS from UC Berkeley in 1992. He is interested in many aspects of computer architecture and digital hardware design, including the specific areas of FPGA architecture for computing; digital signal processing hardware; and high-level hardware design and synthesis. He is a Fellow of IEEE. For more information, please visit

Portrait of Zhipeng Zhao
Friday, July 9, 2021 | 2pm~3pm ET

Pigasus: Efficient Handling of Input-Dependent Streaming on FPGAs
Zhipeng Zhao, Carnegie Mellon University

Abstract: FPGAs have well-demonstrated success in many networking applications but have failed to accelerate Intrusion Detection and Prevention Systems (IDS/IPS). The root cause is the mismatch between traditional static, fixed-performance FPGA designs and the input-dependent behavior of IDS/IPS. As a result, the design is provisioned to handle the worst case, losing the opportunity to use the resources allocated for the worst case to improve common-case performance.

In this talk, I will present an FPGA-based IDS/IPS called Pigasus, which is tailored to the common case and thus uses minimal resources to extract maximum performance. Pigasus can achieve 100Gbps using 1 FPGA and on average 5 CPU cores, 100x faster than a CPU-only baseline and 50x faster than existing FPGA designs. A natural objection to this design is that it will suffer from shifting workloads. In the second part, I will show how to use a disaggregated architecture and spillover mechanism to scale subcomponents of the system on demand to address changes in the traffic profile at both compile time and runtime.

Bio: Zhipeng Zhao is a Ph.D. candidate in Electrical and Computer Engineering at Carnegie Mellon University, advised by Prof. James C. Hoe. His research interests broadly lie at the intersection of FPGA and networking. Prior to CMU, he received a BS and an MS in Electrical Engineering, both from Beihang University, China.

Portrait of Derek Chiou
Friday, June 25, 2021 | 2pm~3pm ET

Soft Processor Overlays to Improve Time-to-Solution
Derek Chiou, The University of Texas at Austin

Abstract: Soft Processor Overlays are application-specific processors implemented in FPGA logic. Overlays can be more efficient than standard processors because they can be highly specialized, and they can reach a working implementation faster than dedicated circuits in FPGAs because they have software compile times and are more debuggable. In this talk, I will discuss prior work in overlays, how we plan to experiment with, develop, and use overlays in Research Vector 2 (RV2) of the Intel/VMware Crossroads 3D-FPGA Academic Research Center, and how those overlays will influence and interact with other research vectors in the center, such as investigations on 3D FPGA base-die architecture (RV3) and partial reconfiguration (RV5).

Bio: Derek Chiou is a Research Scientist in the Electrical and Computer Engineering Department at The University of Texas at Austin and a Partner Architect at Microsoft responsible for future infrastructure hardware architecture. He is a co-founder of the Microsoft Azure SmartNIC effort and led the Bing FPGA team to first deployment of Bing ranking on FPGAs. Until 2016, he was an associate professor at UT. Before UT, Dr. Chiou was a system architect and led the performance modeling team at Avici Systems, a manufacturer of terabit core routers. Dr. Chiou received his Ph.D., S.M. and S.B. degrees in Electrical Engineering and Computer Science from MIT.

Portrait of Sanil Rao
Friday, June 11, 2021 | 2pm~3pm ET

High-Performance Code Generation for Graph Applications
Sanil Rao, Carnegie Mellon University

Abstract: Software libraries have been a staple in computing, providing users with a maintained interface of functions for their applications. One such library, GraphBLAS, is used in the graph processing community because of its foundation in linear algebra, and its clear description of the overarching computation through its library calls. One issue that arises, however, is that these library calls can leave performance on the table, especially when optimization opportunities span multiple library calls. Simply writing additional merged library calls is impractical given the importance of library clarity, and writing a general-purpose compiler that understands library call semantics would be infeasible. Therefore, we propose an approach from a higher level of abstraction, treating the GraphBLAS library as a specification, and generating code that understands the library’s semantics. We transform library calls to their linear algebraic descriptions, and use pattern matching techniques to look for optimizations. Preliminary results show that our code generation system, SPIRAL, achieves performance matching that of hand-optimized codes, while keeping the clarity of both the original library and user application.

Bio: Sanil Rao is a second-year PhD student in Electrical and Computer Engineering at CMU advised by Prof. Franz Franchetti and part of the SPIRAL group. His research focus is in the area of programming languages and compilers, specifically code generation. Prior to CMU, he received a BS in Computer Science from the University of Virginia.

Portrait of Hugo Sadok
Friday, May 28, 2021 | 2pm~3pm ET

We Need Kernel Interposition over the Network Dataplane
Hugo Sadok, Carnegie Mellon University

Abstract: Kernel-bypass networking, which allows applications to circumvent the kernel and interface directly with NIC hardware, is one of the main tools for improving application network performance. However, allowing applications to circumvent the kernel makes it impossible to use tools (e.g., tcpdump) or impose policies (e.g., QoS and filters) that need to interpose on traffic sent by different applications running on a host. This makes maintainability and manageability a challenge for kernel-bypass applications. In response, we propose Kernel On-Path Interposition (KOPI), in which traditional kernel dataplane functionality is retained but implemented in a fully programmable SmartNIC. We hypothesize that KOPI can support the same tools and policies as the kernel stack while retaining the performance benefits of kernel bypass.

Bio: Hugo Sadok is a second-year PhD student in Computer Science at CMU advised by Prof. Justine Sherry and part of the SNAP Lab. His research interests are broadly in computer networks and computer systems. Prior to CMU, he received a BS in Electronic and Computer Engineering and an MS in Electrical Engineering, both from UFRJ.

Portrait of James C. Hoe
Friday, May 14, 2021 | 2pm~3pm ET

The Role for Programmable Logic in Future Datacenter Servers
(An Overview of the Crossroads Center)

James C. Hoe, Carnegie Mellon University

Abstract: This talk is a rerun for those affiliated with the center and is intended to introduce the center to an outside audience.

Field Programmable Gate Arrays (FPGAs) have been undergoing rapid and dramatic changes fueled by their expanding use in datacenter computing. Rather than serving as a compromise or alternative to ASICs, FPGA 'programmable logic' is emerging as a third paradigm of compute that stands apart from traditional hardware vs. software archetypes. The Crossroads 3D-FPGA Research Center has been formed with the goal of defining a new role for programmable logic in future datacenter servers. Guided by both the demands of modern network-driven, data-centric computing and the new capabilities from 3D integration, this center is developing the Crossroads 3D-FPGA as a new central fixture component on future server motherboards, serving to connect all server endpoints (network, storage, memory, CPU) intelligently. As a literal crossroads of data, a Crossroads 3D-FPGA can apply application-specific functions over data-on-the-move between any pair of server endpoints, intelligently steer data to the right core or accelerator, and reduce and compress the volume of data that needs to be moved between servers. This talk will overview the Crossroads 3D-FPGA concepts, as well as the associated set of research thrusts to pursue a full-stack solution spanning application, programming support, dynamic runtime, design automation, and architecture.

Bio: James C. Hoe is a Professor of Electrical and Computer Engineering at Carnegie Mellon University. He received his Ph.D. in EECS from Massachusetts Institute of Technology in 2000 (S.M., 1994). He received his B.S. in EECS from UC Berkeley in 1992. He is interested in many aspects of computer architecture and digital hardware design, including the specific areas of FPGA architecture for computing; digital signal processing hardware; and high-level hardware design and synthesis. He is a Fellow of IEEE. For more information, please visit

Portrait of Joseph Melber
Friday, April 23, 2021 | 2pm~3pm ET

Raising the Level of Abstraction for FPGA System Design
Joseph Melber, Carnegie Mellon University

Abstract: Current Field Programmable Gate Array (FPGA) programming abstractions place disproportionately more emphasis on reducing the design effort for processing kernels than on the memory access side of the design task. Designers are asked to build all the datapaths for on-chip buffering and data movements, as well as the state machines to coordinate these datapath activities. These datapaths are often ad-hoc efforts that are not generally reusable.

Software programmers leverage abstraction to simplify their design efforts; hardware designers should be supported by similar abstractions in order to increase FPGA programmability in modern computing systems. In this talk, I will focus on (1) re-imagining what memory should look like for FPGA hardware designers, and (2) virtualizing functionalities, devices and platforms for FPGA computing. I have been investigating a service-oriented abstraction and framework to simplify hardware design efforts for FPGA accelerators’ memory systems. The goal is to enable FPGA accelerator designers to configure a specialized memory system that presents abstract semantic-rich memory operations, across diverse memory devices, without performance overhead. Current efforts also extend this abstraction to virtualize these functionalities across devices and architectures. I will conclude by discussing the future potential of my research and vision for FPGA computing.

Bio: Joseph Melber is a Ph.D. candidate in Electrical and Computer Engineering at Carnegie Mellon University. He is advised by Dr. James C. Hoe. His research interests are reconfigurable computing and computer architecture. His research focuses on memory systems and programming abstractions for heterogeneous FPGA computing systems. He received his M.S. in Electrical and Computer Engineering from Carnegie Mellon University in 2016, and B.S. in EE from the University at Buffalo in 2014.