Crossroads Seminars

The Crossroads seminar series is held regularly on Fridays from 2pm to 3pm (US Eastern). On the fourth Friday of each month, we host a featured talk on the latest research results from the center's PIs and students. In the remaining weeks, we host a diverse range of talks, including works in progress and talks by outside visitors.

Upcoming Seminars

(Invited) Friday, June 24, 2022 | 2pm~3pm ET

Soft Embedded FPGA Fabrics: Top-down Physical Design and Applications
Prashanth Mohan, Carnegie Mellon University

Abstract: Embedded FPGA (eFPGA) fabrics are increasingly used in modern System-on-Chip (SoC) designs because their programmability can be leveraged to accelerate a variety of workloads and to enable upgradeability, feature addition, and security. As technology scales to sub-5nm nodes, designing eFPGA fabrics with custom layout techniques requires extensive design time (many months), suffers from poor process portability, and is incompatible with demanding SoC design schedules. On the other hand, soft eFPGA fabrics described in RTL and implemented with standard cells offer effortless process portability and have the potential to reduce the eFPGA physical design cycle from months to less than a day. Conventional methodologies for implementing standard-cell-based eFPGAs take a bottom-up approach, in which individual tiles are synthesized in isolation and later stitched together to form the large FPGA fabric. However, the bottom-up approach deviates significantly from push-button ASIC flows and requires manual floorplanning and a custom buffering strategy for each FPGA architecture and process technology.

This work proposes a top-down design methodology, fully compatible with standard ASIC design flows, that enables agile physical design of a soft eFPGA just like any other digital block, without the manual effort required by the bottom-up flow. We developed a soft eFPGA fabric generator in Chisel and used it to tape out proof-of-concept homogeneous fabrics and heterogeneous fabrics with BRAM and DSP tiles in 16nm and 22nm industrial CMOS FinFET process nodes. The true potential of soft eFPGAs emerges when they are integrated with other designs to enable new applications that were previously difficult to realize. We present two such applications: hardware redaction and a reconfigurable co-processor for a RISC-V CPU. First, we propose hardware redaction, a hardware obfuscation approach that lets designers substitute security-critical IP blocks within a design with a synthesizable eFPGA; we demonstrate eFPGA redaction by obfuscating the control path of a RISC-V CPU. Second, we integrated a heterogeneous soft eFPGA fabric as a RISC-V co-processor to support custom RISC-V instructions on a 22nm SoC test chip.
Bio: Prashanth Mohan is a Ph.D. student at Carnegie Mellon University. Before that, he completed his Master's in Electronics Design at the Indian Institute of Science, Bangalore, and then worked as a physical design engineer at NVIDIA for two and a half years. His research interests include VLSI design, FPGAs, and hardware security.

Past Seminars

(Invited) Friday, April 22, 2022 | 2pm~3pm ET

FPGAs are (not) Good at Deep Learning
Mohamed S. Abdelfattah, Cornell University

Abstract: There have been many attempts to use FPGAs to accelerate deep neural networks (DNNs), including many by the speaker of this talk. Some of these attempts ended up in direct competition with GPUs and ASICs that are hyper-tuned for DNNs, and FPGAs inevitably often lose that competition. However, there are many promising research directions in which FPGAs are indeed the best platform for accelerating parts of a deep learning workload. This talk will discuss several emerging paradigms in which FPGA strengths can be successfully leveraged to accelerate deep learning workloads. I will focus on (1) automated DNN-HW codesign, (2) using FPGA lookup tables as DNN building blocks, and (3) the role of embedded networks-on-chip in FPGA-powered datacenters.
Bio: Mohamed Abdelfattah is an Assistant Professor at Cornell Tech and in the Electrical and Computer Engineering Department at Cornell University. His research interests include deep learning systems, automated machine learning, hardware-software codesign, reconfigurable computing, and FPGA architecture. Mohamed’s goal is to design the next generation of machine-learning-centric computer systems for both datacenters and mobile devices.

Mohamed received his BSc from the German University in Cairo, his MSc from the University of Stuttgart, and his PhD from the University of Toronto. His PhD was supported by the Vanier Canada Graduate Scholarship, and he received three best paper awards for his work on embedded networks-on-chip for FPGAs. His PhD work garnered much industrial interest and has since been adopted by multiple semiconductor companies in their latest FPGAs. After his PhD, Mohamed spent time at Intel’s Programmable Solutions Group and, most recently, at Samsung, where he led a research team focused on hardware-aware automated machine learning.

(Invited) Friday, March 25, 2022 | 2pm~3pm ET

Near-Storage Acceleration in Practice: Opportunities and Challenges
Sang-Woo Jun, University of California, Irvine

Abstract: Modern high-density, high-performance storage devices coupled with power-efficient accelerators such as FPGAs have demonstrated extremely good cost and power efficiency on various applications compared to conventional computer systems. Many off-the-shelf commercial offerings, such as the Samsung SmartSSD, already exist, putting these benefits within reach of real-world applications. However, extracting the most benefit from near-storage acceleration requires drastic changes to the role of the storage device and to how the rest of the software stack interacts with it. This is a daunting process because near-storage acceleration differs from conventional storage in abstraction level, programming model, and performance characteristics.

In this talk, we present some of the more prominent challenges and the design patterns we discovered that help overcome them. We base our findings on multiple important applications, including graph analytics and relational database queries, as well as unstructured log analytics explored in collaboration with VMware, targeting the Samsung SmartSSD platform. Our near-storage log analytics accelerator is efficient enough to make the best possible use of the underlying storage bandwidth, resulting in an order-of-magnitude throughput improvement over software-only systems such as Splunk and MonetDB running on comparable system resources. These experiences show that it is feasible to augment cloud software with near-storage acceleration, dramatically lowering the cost of operating cloud deployments.
Bio: Sang-Woo Jun is an assistant professor in the Department of Computer Science at the University of California, Irvine. He received his Ph.D. in 2018 from the Massachusetts Institute of Technology for his work on near-storage accelerators for graph analytics. His current research continues this line of work on reconfigurable hardware accelerators coupled with fast storage devices, with the goal of making large-scale data analytics more affordable across a wide array of scientific and enterprise applications.

(Featured) Friday, February 25, 2022 | 2pm~3pm ET

High Performance CNN Inference Acceleration on FPGAs
Mohamed Ibrahim, University of Toronto

Abstract: Field Programmable Gate Arrays (FPGAs) are programmable devices that can implement any digital circuit. FPGAs have gained popularity for accelerating CNN computations due to their programmability, energy efficiency, customizable operand precisions, and short time to market. HPIPE is a sparsity-aware, deeply pipelined CNN inference accelerator that converts a TensorFlow graph of a CNN into specialized hardware units that implement each layer of the CNN. HPIPE outperforms all CNN inference accelerators on FPGAs; moreover, its performance surpasses that of a V100 GPU on ResNet-50 at a batch size of one. CNNs are used extensively in image classification, but that is not their only use: object detection is another critical application that incorporates CNNs. We integrate HPIPE with a hardware-friendly unit to accelerate object detection. To accelerate large CNNs and further enhance HPIPE's performance, we develop an end-to-end flow to partition CNNs across multiple FPGAs.
Bio: Mohamed Ibrahim is currently an M.A.Sc. student in the Department of Electrical and Computer Engineering (ECE) at the University of Toronto. He holds a BSc degree in Electronics and Communications Engineering from the American University in Cairo. His M.A.Sc. research focuses on scaling machine learning accelerators to systems of FPGA clusters. His work is done in collaboration with the Intel PSG CTO.

(Invited) Friday, February 18, 2022 | 2pm~3pm ET

Re-thinking Data Center Hardware Architectures from the Ground-up
Akshitha Sriraman, Carnegie Mellon University

Abstract: Current hardware and software systems were conceived at a time when compute and memory resources were scarce, data and application functionality were limited, and hardware performance scaled easily thanks to Moore's Law. These assumptions no longer hold. Today's data centers must manage rapid growth in data, users, and application functionality while also coping with a slowdown in hardware performance scaling. However, modern server hardware has not evolved sufficiently to meet these new data center application requirements. In fact, the fundamental architecture of a modern server still dates back to the compute-centric desktop PCs of the 1980s: it manages memory at hardware speeds but accesses I/O through legacy software stacks and peripheral interfaces.

In this talk, I will focus on meeting modern web application requirements by fundamentally re-thinking data center hardware architectures from the ground up. Specifically, I will detail my efforts to answer the question: How should we build data center hardware for emerging software paradigms in the post-Moore era? I will conclude by describing my ongoing and future research on moving from a compute-centric to a data-centric hardware architecture to meet modern web application requirements.
Bio: Akshitha Sriraman is an Assistant Professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University. Her research interests are in the area of bridging computer architecture and systems software, with a focus on making hyperscale data centers more efficient (via solutions that span the systems stack). The central theme of her work is to design software that is aware of new hardware constraints/possibilities and architect hardware that efficiently supports new hyperscale software requirements.

Sriraman's research has been recognized with an IEEE Micro Top Picks distinction and the 2021 David J. Kuck Dissertation Prize. She was awarded a Facebook Fellowship, a Rackham Merit Ph.D. Fellowship, and a CIS Full-Tuition Scholarship. She was also named a 2019 Rising Star in EECS. Sriraman completed her Ph.D. in Computer Science and Engineering at the University of Michigan.

Friday, February 11, 2022 | 2pm~3pm ET

Redesigning NIC Interfaces for Direct Application Access
Hugo Sadok, Carnegie Mellon University

Abstract: The widening gap between network throughput and CPU performance has changed the way network-intensive applications transfer data. These applications can no longer afford the overheads of the kernel network stack and instead communicate directly with the Network Interface Card (NIC). While this significantly improves performance, it is still challenging for applications to achieve the line rates offered by the latest NICs. One factor that keeps applications from reaching the full potential of the communication hardware is that NICs expose a packet-level interface that places each individual packet in a separate fixed-size memory buffer. Unfortunately, dedicating a buffer to each packet imposes buffer-management overhead and causes scattered memory accesses that interact poorly with the CPU cache.

In this talk, I will present a new NIC interface design that eliminates most of the overheads imposed by the traditional packet-level interface. This new design builds on two novel techniques: contiguous data buffers and reactive descriptors. Contiguous data buffers eliminate the need for buffer management while increasing L1d cache hits by making memory accesses sequential. Reactive descriptors pace arrival notifications according to how fast applications consume the data, avoiding overwhelming applications with unnecessary notifications. I will show how this design lets us achieve more than 140 Mpps (94 Gbps with min-sized packets) with a single CPU core, compared to 64 Mpps (43 Gbps with min-sized packets) with a state-of-the-art NIC.
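The packet rates and bit rates quoted above are linked by Ethernet framing overhead. As a minimal sanity check, the sketch below assumes that "min-sized packets" means 64-byte Ethernet frames and that each frame also occupies 20 bytes of wire time (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap). The function names are illustrative and do not come from the talk.

```python
# Sketch: convert between packet rate and on-the-wire bit rate for Ethernet.
# Assumption: "min-sized packets" are 64-byte frames, and every frame also
# costs 8 bytes of preamble + start-of-frame delimiter and a 12-byte
# inter-frame gap on the wire (20 bytes of overhead per frame).
MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 8 + 12


def wire_bits_per_frame(frame_bytes: int = MIN_FRAME_BYTES) -> int:
    """Bits a single frame occupies on the wire, including overhead."""
    return (frame_bytes + WIRE_OVERHEAD_BYTES) * 8  # 672 bits for 64-byte frames


def mpps_to_gbps(mpps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Millions of packets per second -> gigabits per second on the wire."""
    return mpps * 1e6 * wire_bits_per_frame(frame_bytes) / 1e9


def gbps_to_mpps(gbps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Gigabits per second on the wire -> millions of packets per second."""
    return gbps * 1e9 / wire_bits_per_frame(frame_bytes) / 1e6


if __name__ == "__main__":
    for rate in (140, 64):
        print(f"{rate} Mpps -> {mpps_to_gbps(rate):.1f} Gbps")
    # For comparison: the packet rate a hypothetical 100 Gbps link would carry.
    print(f"100 Gbps -> {gbps_to_mpps(100):.1f} Mpps")
```

Under these assumptions, 140 Mpps works out to about 94.1 Gbps and 64 Mpps to about 43.0 Gbps, consistent with the rounded figures in the abstract.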
Bio: Hugo Sadok is a second-year PhD student in Computer Science at CMU, advised by Prof. Justine Sherry, and a member of the SNAP Lab. His research interests are broadly in computer networks and computer systems. Before CMU, he received a BS in Electronic and Computer Engineering and an MS in Electrical Engineering, both from UFRJ.

(Invited) Friday, February 4, 2022 | 2pm~3pm ET

Towards Predictable and Efficient Datacenter Storage
Huaicheng Li, Carnegie Mellon University

Abstract: The increasing complexity of storage software and hardware brings new challenges to achieving predictable performance and efficiency. On the one hand, emerging hardware breaks long-held system design principles and is held back by aged, inflexible system interfaces and usage models, requiring a radical rethinking of the software stack to leverage new hardware capabilities for optimal performance. On the other hand, the computing landscape is becoming increasingly heterogeneous and complex, demanding explicit systems-level support to manage hardware-specific complexity and idiosyncrasies, which is unfortunately still largely missing.

In this talk, I will discuss my efforts to build low-latency and cost-efficient datacenter storage systems. By revisiting existing storage interface and abstraction designs and the division of responsibilities between software and hardware, I will present holistic storage stack designs for cloud datacenters that deliver orders-of-magnitude latency improvements and significantly better cost-efficiency.
Bio: Huaicheng Li is a postdoc in the Parallel Data Lab (PDL) at CMU. He received his Ph.D. from the University of Chicago. His interests are mainly in operating systems and storage systems, with a focus on building high-performance and cost-efficient storage infrastructure for datacenters. His research has been recognized with two best paper nominations at FAST (2017 and 2018) and has also had real-world impact, including production deployment in datacenters, code integration into Linux, and a storage research platform widely used by the research community.

(Invited) Friday, January 28, 2022 | 2pm~3pm ET

Unleashing the Potential of In-Network Computing
Daehyeok Kim, Microsoft

Abstract: Recent advances in programmable networking hardware create a new computing paradigm called in-network computing. This new paradigm allows functionality that has been served by commodity servers, ranging from network middleboxes to components of distributed systems, to be performed in the network. I argue that to fully unleash its potential, we need resource elasticity and fault resiliency via higher-level abstractions.

In this talk, I demonstrate that in-network computing can be elastic and resilient by designing high-level abstractions and runtime systems that let us effectively leverage compute and memory resources available beyond a single type of device (e.g., programmable switches) while hiding the complexities of device heterogeneity. I begin by introducing TEA, a framework that provides elastic memory by enabling memory-intensive in-switch applications, such as cloud-scale load balancers, to leverage DRAM on remote servers through a virtual table abstraction. I then present ExoPlane and RedPlane, frameworks that support evolving in-network computing workloads and requirements (i.e., serving multiple concurrent applications and making them fault-tolerant) through the abstractions of an infinite switch resource and one big fault-tolerant switch, respectively. Several systems in industry are now adopting some of the technologies presented in this talk.
Bio: Daehyeok Kim is a senior researcher at Microsoft. He recently completed his Ph.D. in the computer science department at Carnegie Mellon University. His research interests lie in the intersection of computer systems and networking with a focus on building new abstractions and runtime systems for in-network computing. He is a recipient of the Microsoft Research Ph.D. Fellowship.

Friday, November 19, 2021 | 2pm~3pm ET

SurgeProtector: Mitigating Algorithmic Complexity Attacks using Adversarial Scheduling
Nirav Atre, Carnegie Mellon University

Abstract: Algorithmic complexity attacks (ACAs) are a class of Denial-of-Service (DoS) attacks where an attacker uses a small amount of adversarial traffic to induce a large amount of work in the target system, pushing the system into overload and causing it to drop packets from innocent users. ACAs are particularly dangerous because, unlike volumetric DoS attacks, ACAs don't require a significant network bandwidth investment from the attacker. Today, network functions (NFs) on the Internet must be painstakingly designed and engineered on a case-by-case basis to mitigate the debilitating impact of ACAs. Further, the resulting designs tend to be overly conservative in their attack mitigation strategy, limiting the innocent traffic that the NF can serve during common-case operation.

In this talk, I will present a general framework we designed to make any NF more resilient to ACAs without the limitations of prior approaches. Our framework, SurgeProtector, uses the NF's scheduler to mitigate the impact of ACAs with a very traditional scheduling algorithm: Weighted Shortest Job First (WSJF). To evaluate SurgeProtector, we propose a new metric of DoS vulnerability called the Displacement Factor (DF), which quantifies the maximum "harm per unit effort" an adversary can inflict on the system. Using novel insights from adversarial scheduling theory, we show that any system using WSJF has a worst-case DF of only a small constant (unity), whereas traditional schedulers place no upper bound on the adversary's DF. To show that SurgeProtector is robust in practice as well as in theory, we integrate it into an open-source Intrusion Detection System (IDS). Under simulated attack, the SurgeProtector-augmented IDS suffers 90-99% lower innocent traffic loss than the original system.
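In WSJF, the scheduler always serves the queued job with the largest ratio of weight to job size. The abstract does not say how SurgeProtector assigns weights and job sizes, so the sketch below illustrates only the ordering rule, under the illustrative assumption that a packet's weight is its size in bytes and its job size is the NF's estimated processing time for it. The job names and numbers are hypothetical, and the sketch leaves out everything else a real NF would need (queue limits, drop policy, estimating processing cost).

```python
import heapq
from typing import Iterable, List, Tuple

# A job is (name, weight, job_size). Assumption for illustration: weight is the
# packet size in bytes and job_size is the NF's estimated processing time.
Job = Tuple[str, float, float]


def wsjf_order(jobs: Iterable[Job]) -> List[str]:
    """Return job names in Weighted Shortest Job First order.

    WSJF serves the job with the largest weight / job_size ratio first.
    Ties are broken by arrival order (earlier jobs first).
    """
    heap: List[Tuple[float, int, str]] = []
    for seq, (name, weight, job_size) in enumerate(jobs):
        if job_size <= 0:
            raise ValueError(f"job {name!r} must have a positive job size")
        # heapq is a min-heap, so push the negated ratio to pop the largest first.
        heapq.heappush(heap, (-weight / job_size, seq, name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    # Hypothetical queue: "attack" packets are small but expensive to process,
    # the signature of an algorithmic complexity attack.
    queue = [
        ("attack-1", 64, 500.0),
        ("benign-1", 1500, 10.0),
        ("benign-2", 64, 1.0),
        ("attack-2", 64, 800.0),
    ]
    print(wsjf_order(queue))
```

Here the two benign packets (ratios 150 and 64) are served before the two expensive attack packets (ratios 0.128 and 0.08), so under overload it is the adversarial work that waits.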

Bio: Nirav is a fourth-year Ph.D. student in Computer Science at Carnegie Mellon University (CMU) advised by Prof. Justine Sherry. His research interests broadly lie at the intersection of networking and performance modeling. Prior to starting graduate school, Nirav completed his BASc in Computer Engineering at the University of Toronto, Canada, in 2018.

(Invited) Friday, November 5, 2021 | 2pm~3pm ET

It is All About Abstraction: Virtualizing FPGAs in the Cloud
Jing Li, University of Pennsylvania

Abstract: We have seen growing interest in, and benefits from, exploiting FPGAs as first-class citizens in cloud computing. Cloud vendors such as Amazon and Microsoft have begun to support on-demand FPGA acceleration in various forms of cloud service. Nonetheless, system support for cloud FPGAs is still in its infancy. The lack of efficient virtualization support makes it challenging to fully unleash the benefits of integrating FPGAs into cloud infrastructure, leading to low elasticity and resource utilization. There are many historical and practical reasons for this: traditional FPGAs and their associated compilation tools were not designed or optimized for the multi-tenant, resource-sharing cloud computing environment. Moreover, unlike temporal architectures such as CPUs, spatial architectures such as FPGAs lack a simple, widely adopted hardware/software interface.

In this talk, I will present our exploratory efforts to address these limitations. I will first present the key requirements we identified for virtualizing spatial architectures and then a generic virtualization stack for heterogeneous FPGA clusters that satisfies these requirements. Specifically, I will introduce a two-level system abstraction that decouples compilation from resource allocation and thus enables fine-grained resource management with low compilation overhead. I will show how we modify the existing compilation flow and runtime management to leverage the proposed abstraction and achieve efficient virtualization. Finally, I will discuss further optimization opportunities through two case studies.

Bio: Jing (Jane) Li is the Eduardo D. Glandt Faculty Fellow and Associate Professor of Electrical and Systems Engineering and of Computer and Information Science at the University of Pennsylvania. She is broadly interested in developing fundamental methods for workload-optimized systems. To validate research ideas, her work places a strong emphasis on real system prototyping at both the chip and system levels. She is a recipient of DARPA's Young Faculty Award, the NSF CAREER Award, the IBM Research Division Outstanding Technical Achievement Award for successfully achieving a CEO milestone, and multiple invention achievement awards and high-value patent application awards from IBM. Previously, she was the Dugald C. Jackson Assistant Professor at the University of Wisconsin–Madison and a faculty affiliate of the UW-Madison Computer Architecture and Machine Learning groups. She is one of the PIs in the SRC JUMP Center for Research on Intelligent Storage and Processing-in-Memory (CRISP). She began her career at the IBM T. J. Watson Research Center as a Research Staff Member after obtaining her PhD from Purdue University.

(Invited) Friday, October 29, 2021 | 2pm~3pm ET

AIgean: An Open Framework for Deploying Machine Learning on Heterogeneous Clusters
Naif Tarafdar and Paul Chow, University of Toronto

Abstract: AIgean, pronounced like the sea, is an open framework for building and deploying machine learning (ML) algorithms on a heterogeneous cluster of devices (CPUs and FPGAs). We present AIgean as a use case for our multi-FPGA deployment infrastructure, Galapagos. AIgean provides a full end-to-end multi-FPGA/CPU implementation of a neural network. The user supplies a high-level neural network description, and our tool flow is responsible for synthesizing the individual layers, partitioning layers across different nodes, and providing the bridging and routing these layers need to communicate. For users who are experts in a particular domain and want to tinker with the implementation details of the neural network, we define a flexible implementation stack for ML that includes the layers of Applications & Algorithms, Cluster Deployment & Communication, and Hardware. The Cluster Deployment & Communication and Hardware layers leverage the Galapagos layer abstractions, in which the communication protocol is abstracted from the application and the hardware implementations are abstracted from the physical hardware being used. This allows users to modify specific layers of abstraction without having to worry about components outside their area of expertise. We demonstrate the effectiveness of AIgean with three use cases: a small network running on a single network-connected FPGA, an autoencoder running on three FPGAs, and ResNet-50 running across twelve FPGAs.
Bio: Naif Tarafdar is a fifth-year PhD candidate at the University of Toronto. He has previously interned at Xilinx Research and Microsoft Research. His main research interest is democratizing heterogeneous compute to give access to as many new users as possible, through abstraction layers, APIs, and programming models. He is the chief architect of Galapagos, a heterogeneous multi-FPGA development stack at the University of Toronto.

Paul Chow is a professor in The Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto. He is a Fellow of the IEEE and a Fellow of the Engineering Institute of Canada. His main research focuses on turning FPGAs into computing devices so that applications can be easily deployed on them. In particular, he aims to do this at scale in a heterogeneous environment where FPGAs interact seamlessly with CPUs and other devices as peers, transparently to the application.

Friday, October 22, 2021 | 2pm~3pm ET

FPGA Placement: Recent Progress and Road Ahead
David Z. Pan, The University of Texas at Austin

Abstract: In the FPGA implementation flow, placement plays a crucial role in determining overall quality of results and runtime. After synthesis and logic mapping, placement determines the physical locations of heterogeneous instances to optimize wirelength, timing, power, routability, etc., while meeting various constraints in modern FPGAs. This talk will give an overview of recent progress in FPGA placement targeting large-scale heterogeneous FPGAs, including UTPlaceF, which won previous ISPD FPGA Placement Contests, and elfPlace, the current academic state of the art. Since placement may need to be called many times to achieve design closure, it is very important to ensure high scalability as design complexity increases, e.g., on future 3D FPGAs. We will discuss how to scale up and accelerate FPGA placement algorithms. We will also discuss some future directions, e.g., open sourcing to enable cross-team collaboration and leveraging machine learning hardware/software for FPGA placement.
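Wirelength is one of the placement objectives listed above, and a common way to estimate it is half-perimeter wirelength (HPWL): for each net, the width plus the height of the bounding box around its pins. The sketch below computes total HPWL for a toy placement. It is a generic illustration, not the objective used by UTPlaceF or elfPlace; real FPGA placers combine a wirelength model with density, timing, and other terms, none of which are shown here. Instance names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def net_hpwl(pins: List[Point]) -> float:
    """Half-perimeter of the bounding box enclosing a net's pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: List[List[str]], placement: Dict[str, Point]) -> float:
    """Sum of HPWL over all nets, given each instance's (x, y) location."""
    return sum(net_hpwl([placement[inst] for inst in net]) for net in nets)


if __name__ == "__main__":
    # Hypothetical heterogeneous instances placed on an (x, y) site grid.
    placement = {"lut_a": (0, 0), "lut_b": (3, 1), "dsp_0": (1, 4), "bram_0": (5, 5)}
    nets = [["lut_a", "lut_b", "dsp_0"], ["dsp_0", "bram_0"]]
    print(total_hpwl(nets, placement))  # 7 + 5 = 12

    # Moving bram_0 next to dsp_0 shortens the second net.
    placement["bram_0"] = (2, 4)
    print(total_hpwl(nets, placement))  # 7 + 1 = 8
```

A placer explores many such relocations; the sketch only shows how a single candidate placement would be scored on wirelength.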
Bio: David Z. Pan is a Professor and Silicon Laboratories Endowed Chair at the Department of Electrical and Computer Engineering, The University of Texas at Austin. His research interests include bidirectional AI and IC interactions, electronic design automation, design for manufacturing, hardware security, and CAD for analog/mixed-signal ICs and emerging technologies. He has published over 400 refereed journal/conference papers and 8 US patents. He has served on many journal editorial boards and conference committees, including various leadership roles such as ICCAD 2019 General Chair, ASP-DAC 2017 TPC Chair, and ISPD 2008 General Chair. He has received many awards, including SRC Technical Excellence Award, 19 Best Paper Awards (at DAC, ICCAD, DATE, ASP-DAC, ISPD, HOST, etc.), DAC Top 10 Author Award in Fifth Decade, ASP-DAC Frequently Cited Author Award, Communications of ACM Research Highlights, ACM/SIGDA Outstanding New Faculty Award, NSF CAREER Award, IBM Faculty Award (4 times), and many international CAD contest awards. He has graduated 40 PhD students and postdocs who have won many awards, including the First Place of ACM Student Research Competition Grand Finals (twice, in 2018 and 2021), ACM/SIGDA Student Research Competition Gold Medal (three times), ACM Outstanding PhD Dissertation in EDA Award (twice), EDAA Outstanding Dissertation Award (twice), etc. He is a Fellow of IEEE and SPIE.

Friday, September 24, 2021 | 2pm~3pm ET

Crossroads RV1: Exploring Data on the Move Applications
Justine Sherry, Carnegie Mellon University

Abstract: The Crossroads FPGA is uniquely positioned to support applications which operate with high throughput over data "on the move" between endpoints such as CPUs, GPUs, Storage, Network, and other platforms. In this talk, we will highlight two applications under development in the Crossroads center. First, the Pigasus IDS is a 100Gbps hybrid FPGA + CPU platform for network security. Next, Norman is a new network dataplane for Linux that offloads the OS networking stack onto a Crossroads FPGA. We will discuss the high level goals of Pigasus and Norman, some of their design details, and finally contrast their two different approaches to using Crossroads: Pigasus, as an "FPGA-centric" design with the CPU working in support of the FPGA, and Norman as a "CPU-centric" design, with the FPGA working in support of the CPU.
Bio: Justine Sherry is an assistant professor at Carnegie Mellon University. Her interests are in computer networking; her work includes middleboxes, networked systems, measurement, cloud computing, and congestion control. Her recent research focuses on new opportunities and challenges arising from the deployment of middleboxes -- such as firewalls and proxies -- as services offered by clouds and ISPs. Dr. Sherry received her PhD (2016) and MS (2012) from UC Berkeley, and her BS and BA (2010) from the University of Washington. She is a recipient of the SIGCOMM doctoral dissertation award, the David J. Sakrison prize, paper awards at USENIX NSDI and ACM SIGCOMM, and an NSF Graduate Research Fellowship. Most importantly, she is always on the lookout for a great cappuccino.

Friday, July 23, 2021 | 2pm~3pm ET

Verilog to Routing (VTR): A Flexible Open-Source CAD Flow to Explore and Target Diverse FPGA Architectures
Vaughn Betz, University of Toronto

Abstract: With the need for improvements in compute performance and efficiency beyond what process scaling can provide, FPGAs and FPGA-like programmable accelerators that can target a range of compute tasks efficiently are of interest in many application areas. However, creating a new CAD flow that can evaluate and map circuits to a new programmable architecture remains a daunting task, making flexible CAD flows that can be quickly retargeted to new architectures highly desirable.

This talk will give an overview of the Verilog-to-Routing (VTR) open-source tool flow, which addresses this need. We'll discuss recent enhancements to VTR that have broadened the range of architectures it can target, allowing it not only to evaluate new FPGA architectures but also to program the chosen architectures once they are committed to silicon.

Architecture flexibility can come at a cost, however, and a common belief in the FPGA Computer Aided Design (CAD) community is that architecture-specific algorithms and tools will significantly outperform more general approaches that target a variety of FPGA architectures. In this talk, we'll show how, through careful algorithm design and code architecture, VTR has improved result quality without architecture-specific code, challenging the idea that result quality and architecture flexibility are mutually exclusive. We will detail the key packing and routing enhancements that led to large improvements in wirelength and timing while simultaneously reducing run time by over 6x.

Finally, we'll present efforts to use Reinforcement Learning to create more adaptable and efficient CAD algorithms. Taking placement as an example, we'll show how an RL-enhanced move generator can improve the quality/run-time trade-off of VTR's placement algorithm.
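One simple way to picture an RL-enhanced move generator is as a multi-armed bandit that learns which kinds of placement moves tend to pay off. The sketch below is a hypothetical epsilon-greedy version, not VTR's implementation: the move-type labels are illustrative, and the reward (for example, the reduction in placement cost a move produced) and the toy environment in the usage block are assumptions.

```python
import random
from typing import Dict, List


class EpsilonGreedyMoveSelector:
    """Epsilon-greedy multi-armed bandit over placement move types.

    Each arm is a move type; the reward is whatever the caller measures after
    applying a move (e.g., the drop in placement cost). Hypothetical sketch.
    """

    def __init__(self, move_types: List[str], epsilon: float = 0.1, seed: int = 0):
        if not move_types:
            raise ValueError("need at least one move type")
        self.move_types = list(move_types)
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # seeded for reproducible runs
        self.counts: Dict[str, int] = {m: 0 for m in self.move_types}
        self.values: Dict[str, float] = {m: 0.0 for m in self.move_types}

    def select(self) -> str:
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.move_types)
        return max(self.move_types, key=lambda m: self.values[m])

    def update(self, move: str, reward: float) -> None:
        # Incremental running mean of the rewards observed for this move type.
        self.counts[move] += 1
        self.values[move] += (reward - self.values[move]) / self.counts[move]


if __name__ == "__main__":
    # Toy environment (assumption): each move type has a fixed mean cost reduction.
    true_gain = {"uniform": 0.2, "median": 1.0, "centroid": 0.5}
    env_rng = random.Random(1)
    selector = EpsilonGreedyMoveSelector(list(true_gain), epsilon=0.1)
    for _ in range(2000):
        move = selector.select()
        selector.update(move, true_gain[move] + env_rng.gauss(0.0, 0.1))
    print(max(selector.values, key=selector.values.get))
```

The epsilon parameter trades exploration for exploitation: a larger value keeps sampling move types that currently look worse, which matters if the most useful move type changes as the placement anneals.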
Bio: Vaughn Betz is a Professor and the NSERC/Intel Industrial Research Chair in Programmable Silicon at the University of Toronto. He is the original developer of the widely used VPR FPGA placement, routing and architecture evaluation CAD flow, and a lead developer in the VTR project that has built upon VPR. He co-founded Right Track CAD to commercialize VPR, and joined Altera upon its acquisition of Right Track CAD. Dr. Betz spent 11 years at Altera, ultimately as Senior Director of software engineering, and is one of the architects of the Quartus CAD system and the first five generations of the Stratix and Cyclone FPGA families. He holds 101 US patents and has published over 100 technical articles in the FPGA area, thirteen of which have won best or most significant paper awards. Dr. Betz is a Fellow of the IEEE and the National Academy of Inventors, and a Faculty Affiliate of the Vector Institute for Artificial Intelligence.

Portrait of James C. Hoe
Friday, July 16, 2021 | 2pm~3pm ET

From “Field Programmable” to “Programmable”
James C. Hoe, Carnegie Mellon University

Abstract: This talk is an overview of Research Vector 5 (RV5).

To elevate FPGAs from logic to computing roles, we need to address a greater requirement for programmability than that of a “field programmable” ASIC. A computing FPGA will be asked to perform more tasks than can fit on the fabric at once, and to perform new tasks that are unknown before deployment. Moreover, dynamically managing logic resource utilization is a presently under-tapped source of performance optimization: available resources can be devoted only to active tasks, or tasks can be served by differently optimized design variants as conditions change.

To maximally exploit the benefits of FPGAs’ programmability, the Intel/VMware Crossroads 3D-FPGA Academic Research Center aims to make runtime reprogramming a regular mode of operation for the Crossroads 3D-FPGA in future datacenter servers. This talk will motivate the need for a new, expanded design mindset among FPGA users and designers to fully pursue FPGAs’ programmability and dynamism. It will then present the Crossroads Center’s research toward realizing this new mode of usage and programming on the Crossroads 3D-FPGA for datacenter applications, including a redesign of the Pigasus network intrusion detection/prevention system (IDS/IPS) that follows a design methodology intended to leverage the flexible and dynamic capabilities of FPGA targets.
Bio: James C. Hoe is a Professor of Electrical and Computer Engineering at Carnegie Mellon University. He received his Ph.D. in EECS from Massachusetts Institute of Technology in 2000 (S.M., 1994). He received his B.S. in EECS from UC Berkeley in 1992. He is interested in many aspects of computer architecture and digital hardware design, including the specific areas of FPGA architecture for computing; digital signal processing hardware; and high-level hardware design and synthesis. He is a Fellow of IEEE. For more information, please visit

Portrait of Zhipeng Zhao
Friday, July 9, 2021 | 2pm~3pm ET

Pigasus: Efficient Handling of Input-Dependent Streaming on FPGAs
Zhipeng Zhao, Carnegie Mellon University

Abstract: FPGAs have demonstrated success in many networking applications but have so far failed to accelerate Intrusion Detection and Prevention Systems (IDS/IPS). The root cause is the mismatch between traditional static, fixed-performance FPGA designs and the input-dependent behavior of IDS/IPS. As a result, the design is provisioned for the worst case, losing the opportunity to use the resources allocated for the worst case to improve common-case performance.

In this talk, I will present an FPGA-based IDS/IPS called Pigasus, which is tailored to the common case, thus using minimal resources to extract maximum performance. Pigasus can achieve 100 Gbps using one FPGA and, on average, 5 CPU cores, which is 100x faster than a CPU-only baseline and 50x faster than existing FPGA designs. A natural objection to this design is that it will suffer under shifting workloads. In the second part, I will show how to use a disaggregated architecture and a spillover mechanism to scale subcomponents of the system on demand, addressing changes in the traffic profile at both compile time and runtime.
Bio: Zhipeng Zhao is a Ph.D. candidate in Electrical and Computer Engineering at Carnegie Mellon University, advised by Prof. James C. Hoe. His research interests broadly lie at the intersection of FPGA and networking. Prior to CMU, he received a BS and an MS in Electrical Engineering, both from Beihang University, China.

Portrait of Derek Chiou
Friday, June 25, 2021 | 2pm~3pm ET

Soft Processor Overlays to Improve Time-to-Solution
Derek Chiou, The University of Texas at Austin

Abstract: Soft Processor Overlays are application-specific processors implemented in FPGA logic. Overlays can be more efficient than standard processors because they can be highly specialized, and they can reach a working implementation faster than dedicated circuits in FPGAs because they have software compile times and are easier to debug. In this talk, I will discuss prior work on overlays, how we plan to experiment with, develop, and use overlays in Research Vector 2 (RV2) of the Intel/VMware Crossroads 3D-FPGA Academic Research Center, and how those overlays will influence and interact with other research vectors in the center, such as investigations on 3D FPGA base-die architecture (RV3) and partial reconfiguration (RV5).
Bio: Derek Chiou is a Research Scientist in the Electrical and Computer Engineering Department at The University of Texas at Austin and a Partner Architect at Microsoft responsible for future infrastructure hardware architecture. He is a co-founder of the Microsoft Azure SmartNIC effort and led the Bing FPGA team to the first deployment of Bing ranking on FPGAs. Until 2016, he was an associate professor at UT. Before UT, Dr. Chiou was a system architect and led the performance modeling team at Avici Systems, a manufacturer of terabit core routers. Dr. Chiou received his Ph.D., S.M., and S.B. degrees in Electrical Engineering and Computer Science from MIT.

Portrait of Sanil Rao
Friday, June 11, 2021 | 2pm~3pm ET

High-Performance Code Generation for Graph Applications
Sanil Rao, Carnegie Mellon University

Abstract: Software libraries have long been a staple of computing, providing users with a maintained interface of functions for their applications. One such library, GraphBLAS, is used in the graph processing community because of its foundation in linear algebra and the clear description of the overall computation that its library calls provide. One issue, however, is that these library calls can leave performance on the table, especially when a computation spans multiple library calls. Simply writing additional merged library calls is impractical given the importance of library clarity, and writing a general-purpose compiler that understands library call semantics would be infeasible. We therefore propose an approach at a higher level of abstraction that treats the GraphBLAS library as a specification and generates code that understands the library’s semantics. We transform library calls into their linear algebraic descriptions and use pattern matching techniques to find optimizations. Preliminary results show that our code generation system, SPIRAL, achieves performance matching that of hand-optimized code while preserving the clarity of both the original library and the user application.
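To make the idea of pattern matching across library calls concrete, the toy sketch below represents two GraphBLAS-style elementwise "apply" calls as an expression tree and rewrites `apply(f, apply(g, v))` into a single fused `apply(f ∘ g, v)`, which avoids materializing the intermediate vector. The node classes (`Vec`, `Apply`) and the `fuse` pass are hypothetical illustrations; this is neither SPIRAL's implementation nor the GraphBLAS C API.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Vec:
    """Leaf node: a dense vector (stand-in for a GraphBLAS vector)."""
    data: List[float]


@dataclass
class Apply:
    """Hypothetical node for an elementwise 'apply' call: out[i] = op(arg[i])."""
    op: Callable[[float], float]
    arg: "Expr"


Expr = Union[Vec, Apply]


def evaluate(e: Expr) -> List[float]:
    # Naive evaluation: every Apply materializes its own intermediate vector.
    if isinstance(e, Vec):
        return list(e.data)
    return [e.op(x) for x in evaluate(e.arg)]


def fuse(e: Expr) -> Expr:
    # Rewrite rule, applied bottom-up: Apply(f, Apply(g, v)) -> Apply(f . g, v).
    if isinstance(e, Vec):
        return e
    inner = fuse(e.arg)
    if isinstance(inner, Apply):
        f, g = e.op, inner.op
        return Apply(lambda x, f=f, g=g: f(g(x)), inner.arg)
    return Apply(e.op, inner)


def count_applies(e: Expr) -> int:
    # Number of library calls (Apply nodes) left in the expression.
    return 0 if isinstance(e, Vec) else 1 + count_applies(e.arg)


# Two separate library calls: add 1, then multiply by 2.
prog = Apply(lambda x: x * 2.0, Apply(lambda x: x + 1.0, Vec([1.0, 2.0, 3.0])))
fused = fuse(prog)
print(evaluate(prog), evaluate(fused), count_applies(prog), count_applies(fused))
```

A practical rewrite system would also have to handle masks, accumulators, semirings, and matrix operations; this sketch only shows the pattern-matching step that turns a chain of calls into one pass over the data.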
Bio: Sanil Rao is a second-year PhD student in Electrical and Computer Engineering at CMU, advised by Prof. Franz Franchetti, and part of the SPIRAL group. His research focus is in the area of programming languages and compilers, specifically code generation. Prior to CMU, he received a BS in Computer Science from the University of Virginia.

Portrait of Hugo Sadok
Friday, May 28, 2021 | 2pm~3pm ET

We Need Kernel Interposition over the Network Dataplane
Hugo Sadok, Carnegie Mellon University

Abstract: Kernel-bypass networking, which allows applications to circumvent the kernel and interface directly with NIC hardware, is one of the main tools for improving application network performance. However, allowing applications to circumvent the kernel makes it impossible to use tools (e.g., tcpdump) or impose policies (e.g., QoS and filters) that need to interpose on traffic sent by different applications running on a host. This makes maintainability and manageability a challenge for kernel-bypass applications. In response, we propose Kernel On-Path Interposition (KOPI), in which traditional kernel dataplane functionality is retained but implemented in a fully programmable SmartNIC. We hypothesize that KOPI can support the same tools and policies as the kernel stack while retaining the performance benefits of kernel bypass.
Bio: Hugo Sadok is a second-year PhD student in Computer Science at CMU advised by Prof. Justine Sherry and part of the SNAP Lab. His research interests are broadly in computer networks and computer systems. Prior to CMU, he received a BS in Electronic and Computer Engineering and an MS in Electrical Engineering, both from UFRJ.

Portrait of James C. Hoe
Friday, May 14, 2021 | 2pm~3pm ET

The Role for Programmable Logic in Future Datacenter Servers
(An Overview of the Crossroads Center)

James C. Hoe, Carnegie Mellon University

Abstract: This talk is a rerun for those affiliated with the Center and is intended to introduce the Center to outside audiences.

Field Programmable Gate Arrays (FPGAs) have been undergoing rapid and dramatic changes fueled by their expanding use in datacenter computing. Rather than serving as a compromise or alternative to ASICs, FPGA 'programmable logic' is emerging as a third paradigm of compute that stands apart from traditional hardware vs. software archetypes. The Crossroads 3D-FPGA Research Center has been formed with the goal of defining a new role for programmable logic in future datacenter servers. Guided by both the demands of modern network-driven, data-centric computing and the new capabilities of 3D integration, the Center is developing the Crossroads 3D-FPGA as a new central fixture on future server motherboards, intelligently connecting all server endpoints (network, storage, memory, CPU). As a literal crossroads of data, a Crossroads 3D-FPGA can apply application-specific functions to data on the move between any pair of server endpoints, intelligently steer data to the right core or accelerator, and reduce and compress the volume of data that needs to be moved between servers. This talk will give an overview of the Crossroads 3D-FPGA concepts, as well as the associated set of research thrusts pursuing a full-stack solution spanning application, programming support, dynamic runtime, design automation, and architecture.
Bio: James C. Hoe is a Professor of Electrical and Computer Engineering at Carnegie Mellon University. He received his Ph.D. in EECS from Massachusetts Institute of Technology in 2000 (S.M., 1994). He received his B.S. in EECS from UC Berkeley in 1992. He is interested in many aspects of computer architecture and digital hardware design, including the specific areas of FPGA architecture for computing; digital signal processing hardware; and high-level hardware design and synthesis. He is a Fellow of IEEE. For more information, please visit

Portrait of Joseph Melber
Friday, April 23, 2021 | 2pm~3pm ET

Raising the Level of Abstraction for FPGA System Design
Joseph Melber, Carnegie Mellon University

Abstract: Current Field Programmable Gate Array (FPGA) programming abstractions give disproportionately more emphasis to reducing the design effort for processing kernels than for the memory access side of the design task. Designers are asked to build all the datapaths for on-chip buffering and data movement, as well as the state machines that coordinate these datapath activities. These datapaths are often ad hoc efforts that are not generally reusable.

Software programmers leverage abstraction to simplify their design efforts; hardware designers should be supported by similar abstractions in order to increase FPGA programmability in modern computing systems. In this talk, I will focus on (1) re-imagining what memory should look like for FPGA hardware designers, and (2) virtualizing functionalities, devices, and platforms for FPGA computing. I have been investigating a service-oriented abstraction and framework to simplify hardware design efforts for FPGA accelerators’ memory systems. The goal is to enable FPGA accelerator designers to configure a specialized memory system that presents abstract, semantically rich memory operations across diverse memory devices without performance overhead. Current efforts also extend this abstraction to virtualize these functionalities across devices and architectures. I will conclude by discussing the future potential of my research and my vision for FPGA computing.
Bio: Joseph Melber is a Ph.D. candidate in Electrical and Computer Engineering at Carnegie Mellon University. He is advised by Dr. James C. Hoe. His research interests are reconfigurable computing and computer architecture. His research focuses on memory systems and programming abstractions for heterogeneous FPGA computing systems. He received his M.S. in Electrical and Computer Engineering from Carnegie Mellon University in 2016, and his B.S. in EE from the University at Buffalo in 2014.