FPGA Technology at Crossroads

Field Programmable Gate Arrays (FPGAs) have been undergoing rapid and dramatic changes, fueled by their expanding use in datacenter computing. Rather than serving as a compromise or an alternative to ASICs, FPGA ‘programmable logic’ is emerging as a third paradigm of compute that stands apart from the traditional hardware vs. software archetypes. A multi-university, multi-disciplinary team has formed around the question:

What should be the future role of FPGAs as a central function in datacenter servers?

Guided by both the demands of modern networked, data-centric computing and the new capabilities from 3D integration, the Intel/VMware Crossroads 3D-FPGA Academic Research Center will investigate a new programmable hardware data-nexus lying at the heart of the server and operating over data ‘on the move’ between network, traditional compute, and storage elements.

Jointly supported by Intel and VMware, the center is committed to the free and public dissemination of its research outcomes.


You can find an overview presentation on the center’s YouTube channel. Please contact any of the Crossroads PIs in your research area if you have any questions or interest.

If you are looking for an introductory overview of FPGAs, you may find the first four lectures of this course useful. For a technical overview article, please see FPGA Architecture: Principles and Progression by Boutros and Betz. A wide range of FPGA topics, presented at different skill levels, is available on this Intel YouTube Channel.


Latest News

February 2022 | Intel’s Corporate Research Council recognizes Crossroads Center PIs Sherry, Sekar and Hoe with 2021 Outstanding Researcher Awards for their work on the Pigasus FPGA-Accelerated Intrusion Detection and Prevention System. Pigasus inspects 100k+ concurrent connections against 10k+ SNORT rules at 100 Gbps in a single server form factor by handling common-case processing in an Intel FPGA SmartNIC. Pigasus was developed by former CMU PhD student Dr. Zhipeng Zhao in his dissertation on efficient acceleration of irregular, data-dependent stream processing. Today, Pigasus is a focus application driver for many technologies under research by the Crossroads Center. Pigasus has gained broad interest as an open-sourced project with a growing academic and industrial user and developer community.


December 2021 | Mohamed Ibrahim successfully defended his MASc thesis at the University of Toronto. His thesis detailed enhancements to the HPIPE FPGA-based CNN accelerator to perform object detection and to span multiple FPGAs for higher performance. Mohamed developed an automatic partitioning algorithm that allows HPIPE accelerators to achieve higher parallelism by spanning multiple FPGAs. Both performance models and deployment on a multi-Stratix-10 system in James Hoe’s group at CMU showed near-linear speedup as the FPGA count increased. Mohamed will join Intel’s Deep Learning Accelerator team in February.


December 2021 | The paper “Specializing for Efficiency: Customizing AI Inference Processors on FPGAs,” by Andrew Boutros, Vaughn Betz (University of Toronto) and Eriko Nurvitadhi (Intel) received the “Third Paper Award” from the IEEE International Conference on Microelectronics. This work showed that specializing NPU accelerators to workload classes improves performance by 9% to 35% while simultaneously reducing resource usage by 23% to 44%. Andrew is currently augmenting the SystemC NPU model to investigate dividing the NPU into modular latency-insensitive components; this will enable investigation of Crossroads FPGA architecture ideas that include linking components and accelerators with a (latency-insensitive) NoC.


[Find all News here]


Recent Publications

  • BBQ: A Fast and Scalable Integer Priority Queue for Hardware Packet Scheduling [abstract]

    Atre, N., Sadok, H., and Sherry, J. (2024). BBQ: A Fast and Scalable Integer Priority Queue for Hardware Packet Scheduling. In Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI). Santa Clara, CA, USA: USENIX Association. [bibtex]

    Abstract:
    The need for fairness, strong isolation, and fine-grained control over network traffic in multi-tenant cloud settings has engendered a rich literature on packet scheduling in switches and programmable hardware. Recent proposals for hardware scheduling primitives (e.g., PIFO, PIEO, BMW-Tree) have enabled run-time programmable packet schedulers, considerably expanding the suite of scheduling policies that can be applied to network traffic. However, no existing solution can be practically deployed on modern switches and NICs because they either do not scale to the number of elements required by these devices or fail to deliver good throughput, thus requiring an impractical number of replicas. In this work, we ask: is it possible to achieve priority packet scheduling at line rate while supporting a large number of flows? Our key insight is to leverage a scheduling primitive used previously in software, called Hierarchical Find First Set, and port it to a highly pipeline-parallel hardware design. We present the architecture and implementation of the Bitmapped Bucket Queue (BBQ), a hardware-based integer priority queue that supports a wide range of scheduling policies (via a PIFO-like abstraction). BBQ, for the first time, supports hundreds of thousands of concurrent flows while guaranteeing 100 Gbps line rate (148.8 Mpps) on FPGAs and 1 Tbps (1,488 Mpps) line rate on ASICs. We demonstrate this by implementing BBQ on a commodity FPGA, where it is capable of supporting over 100K flows and 32K priorities at 300 MHz, 3× the packet rate of similar hardware priority queue designs. On ASIC, we can synthesize 100K elements at 3.1 GHz using a 7nm process.
    BibTeX:
    @inproceedings {bbq,
      author = {Atre, Nirav and Sadok, Hugo and Sherry, Justine},
      title = {{BBQ}: A Fast and Scalable Integer Priority Queue for Hardware Packet Scheduling},
      booktitle = {21st {USENIX} Symposium on Networked Systems Design and Implementation},
      year = {2024},
      address = {Santa Clara, CA},
      publisher = {{USENIX} Association},
      month = apr,
      series = {{NSDI}~'24}
    }
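The Hierarchical Find First Set primitive that BBQ ports to hardware can be sketched in software. The following is a minimal illustrative Python sketch, not the authors' implementation: a two-level bitmap tracks which priority buckets are non-empty, so dequeuing the highest-priority element takes two find-first-set operations instead of a scan over all buckets. Word width, bucket count, and names are assumptions for illustration.

```python
W = 64                       # bits per bitmap word
NUM_PRIOS = W * W            # 4096 priorities with two bitmap levels

class HierFFSQueue:
    """Two-level hierarchical FFS integer priority queue (sketch).

    Lower priority index = higher scheduling priority, as in an
    integer priority queue for packet scheduling."""

    def __init__(self):
        self.top = 0                      # one bit per lower-level word
        self.low = [0] * W                # one bit per priority bucket
        self.buckets = [[] for _ in range(NUM_PRIOS)]  # FIFO per priority

    @staticmethod
    def _ffs(x):
        # Index of the least-significant set bit of a non-zero integer.
        return (x & -x).bit_length() - 1

    def push(self, prio, item):
        self.buckets[prio].append(item)
        self.low[prio // W] |= 1 << (prio % W)
        self.top |= 1 << (prio // W)

    def pop(self):
        if not self.top:
            return None
        hi = self._ffs(self.top)          # first non-empty bitmap word
        lo = self._ffs(self.low[hi])      # first non-empty bucket in it
        prio = hi * W + lo
        item = self.buckets[prio].pop(0)
        if not self.buckets[prio]:        # clear bits as buckets empty
            self.low[hi] &= ~(1 << lo)
            if not self.low[hi]:
                self.top &= ~(1 << hi)
        return prio, item
```

In hardware, the appeal of this structure is that each bitmap level can be probed in a separate pipeline stage, which is what enables the pipeline-parallel design described in the abstract.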
    
  • Of Apples and Oranges: Fair Comparisons in Heterogenous Systems Evaluation [abstract] [paper] [slides]

    Sadok, H., Panda, A., and Sherry, J. (2023). Of Apples and Oranges: Fair Comparisons in Heterogenous Systems Evaluation. In Proceedings of the 22nd Workshop on Hot Topics in Networks (HotNets). Boston, MA, USA: Association for Computing Machinery. [bibtex]

    Abstract:
    Accelerators, such as GPUs, SmartNICs and FPGAs, are common components of research systems today. This paper focuses on the question of how to fairly compare these systems. This is challenging because it requires comparing systems that use different hardware, e.g., two systems that use two different types of accelerators, or comparing a system that uses an accelerator with one that does not. We argue that fair evaluation in this case requires reporting not just performance, but also the cost of competing systems. We discuss what cost metrics should be used, and propose general principles for incorporating cost in research evaluations.
    BibTeX:
    @inproceedings{apples_oranges,
      author = {Sadok, Hugo and Panda, Aurojit and Sherry, Justine},
      title = {Of Apples and Oranges: Fair Comparisons in Heterogenous Systems Evaluation},
      year = {2023},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      url = {https://doi.org/10.1145/3626111.3628186},
      doi = {10.1145/3626111.3628186},
      booktitle = {Proceedings of the 22nd Workshop on Hot Topics in Networks},
      pages = {1--8},
      location = {Boston, Massachusetts},
      month = nov,
      series = {{HotNets}~'23}
    }
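The paper's core recommendation, reporting cost alongside performance, can be illustrated with a toy calculation. The systems, throughput numbers, and prices below are hypothetical, invented for illustration; the point is only that a cost-normalized metric can reorder a comparison that raw throughput alone would decide differently.

```python
def throughput_per_dollar(mpps, hw_cost_usd):
    """Cost-normalized performance: million packets/sec per dollar."""
    return mpps / hw_cost_usd

# Hypothetical systems: the accelerated one is faster in absolute terms,
# but the cheaper CPU-only box wins on throughput per dollar.
systems = {
    "CPU-only":       {"mpps": 20.0,  "cost_usd": 1500.0},
    "CPU + FPGA NIC": {"mpps": 148.8, "cost_usd": 15000.0},
}

for name, s in systems.items():
    tpd = throughput_per_dollar(s["mpps"], s["cost_usd"])
    print(f"{name}: {s['mpps']} Mpps, {tpd:.4f} Mpps/$")
```

The paper goes further, discussing which cost metrics (capital cost, energy, etc.) are appropriate; this sketch shows only the simplest capital-cost normalization.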
    
  • Ensō: A Streaming Interface for NIC-Application Communication [abstract] [paper] [slides] [video] [code]

    Sadok, H., Atre, N., Zhao, Z., Berger, D. S., Hoe, J., Panda, A., Sherry, J., and Wang, R. (2023). Ensō: A Streaming Interface for NIC-Application Communication. In Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Boston, MA, USA: USENIX Association. [bibtex]

    Abstract:
    Today, most communication between the NIC and software involves exchanging fixed-size packet buffers. This packetized interface was designed for an era when NICs implemented few offloads and software implemented the logic for translating between application data and packets. However, both NICs and networked software have evolved: modern NICs implement hardware offloads, e.g., TSO, LRO, and serialization offloads that can more efficiently translate between application data and packets. Furthermore, modern software increasingly batches network I/O to reduce overheads. These changes have led to a mismatch between the packetized interface, which assumes that the NIC and software exchange fixed-size buffers, and the features provided by modern NICs and used by modern software. This incongruence between interface and data adds software complexity and I/O overheads, which in turn limits communication performance. This paper proposes Ensō, a new streaming NIC-to-software interface designed to better support how NICs and software interact today. At its core, Ensō eschews fixed-size buffers, and instead structures communication as a stream that can be used to send arbitrary data sizes. We show that this change reduces software overheads, reduces PCIe bandwidth requirements, and leads to fewer cache misses. These improvements allow an Ensō-based NIC to saturate a 100 Gbps link with minimum-sized packets (forwarding at 148.8 Mpps) using a single core, improve throughput for high-performance network applications by 1.5-6x, and reduce latency by up to 43%.
    BibTeX:
    @inproceedings {enso,
      author = {Sadok, Hugo and Atre, Nirav and Zhao, Zhipeng and Berger, Daniel S. and Hoe, James C. and Panda, Aurojit and Sherry, Justine and Wang, Ren},
      title = {{Ensō}: A Streaming Interface for {NIC}-Application Communication},
      booktitle = {17th {USENIX} Symposium on Operating Systems Design and Implementation},
      year = {2023},
      isbn = {978-1-939133-34-2},
      address = {Boston, MA},
      pages = {1005--1025},
      publisher = {{USENIX} Association},
      month = jul,
      series = {{OSDI}~'23}
    }
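The packetized-versus-streaming distinction at the heart of Ensō can be sketched in a few lines. This is a simplified illustration, not Ensō's actual API or DMA machinery: the class and method names are invented, and the ring buffer omits wraparound and synchronization. It contrasts software reassembling a message from fixed-size packet buffers with software consuming arbitrary-sized spans from one contiguous byte stream.

```python
# Packetized interface: software receives fixed-size buffers and must
# stitch together application data that spans packet boundaries,
# paying a copy/reassembly cost per message.
def recv_packetized(packets, msg_len):
    data = b"".join(packets)
    return data[:msg_len]

# Streaming interface: the NIC writes bytes into one contiguous
# buffer; software consumes spans of any size with no per-packet framing.
class StreamPipe:
    def __init__(self, size=1 << 16):
        self.buf = bytearray(size)
        self.head = 0   # next byte software will read
        self.tail = 0   # next byte the "NIC" will write

    def nic_write(self, data):
        # Models the NIC DMA-ing a span of bytes into the stream.
        n = len(data)
        self.buf[self.tail:self.tail + n] = data
        self.tail += n

    def consume(self, n):
        # Software reads n bytes in place, regardless of how the
        # bytes originally arrived on the wire.
        span = bytes(self.buf[self.head:self.head + n])
        self.head += n
        return span
```

Under this framing, the cache-miss and PCIe-bandwidth savings reported in the abstract come from software touching one contiguous region rather than many scattered packet buffers.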
    

[Find all Publications and Downloads here]