Alibaba Cloud Announced Progress in Porting Android Functions onto RISC-V

By Announcement

The company also tops the MLPerf Tiny v0.7 benchmark with its IoT processor

Hangzhou, China, April 20, 2022 – Alibaba Cloud, the digital technologies and intelligence backbone of Alibaba Group, announced it has made further progress in porting basic Android functions onto the RISC-V instruction-set architecture (ISA). This proves the feasibility of using RISC-V based Android devices in scenarios ranging from multimedia to signal processing, device interconnection, and artificial intelligence.

Last year, the company reported it had successfully ported basic functions like Chrome browsing on Android 10. Since the initial porting trial, further effort has been invested in rebasing the previous engineering work on Android 12 and enabling third-party vendor modules that facilitate new functions, including audio and video playback, WiFi and Bluetooth, as well as camera operation.

To better facilitate these new functions, Alibaba Cloud has also enabled more system enhancement features, such as core tool sets, third-party libraries and an SoC board support package on RISC-V, further improving the robustness of the RISC-V ecosystem when running the Android software stack.

In addition, Alibaba Cloud successfully trialed TensorFlow Lite models on RISC-V, supporting AI functions such as image and audio classification and Optical Character Recognition (OCR), a development that helps accelerate the incorporation of RISC-V into smart devices.

“The support of Android 12, vendor modules and the AI framework on RISC-V based devices is another major milestone that we have achieved,” said Jianyi Meng, Senior Director at Alibaba. “We look forward to further contributing to the RISC-V community with our advanced technology and resources, and encouraging more innovation in the community together with global developers.”

Meng added that Alibaba Cloud will open-source the code of the related technologies in the near future.

Alibaba Cloud Tops MLPerf Tiny v0.7 Benchmark

Earlier this month, Alibaba Cloud’s Xuantie C906 processor took first place in the most recent results of MLPerf Tiny v0.7, an AI benchmark focusing on IoT devices. The Xuantie C906’s performance excelled in all four core categories – visual wake words, image classification, keyword spotting, and anomaly detection. The Xuantie C906 is Alibaba’s custom-built processor based on the RISC-V instruction-set architecture.

Xuantie C906’s remarkable performance marks a milestone that showcases the potential of the RISC-V framework in achieving tailored AI functions with extremely low computing power. 

The breakthrough performance in the AIoT area is driven by Alibaba Cloud’s innovation across hardware and software layers. Alibaba Cloud has improved computing efficiency by using SinianML, a model optimiser; the Heterogeneous Honey Badger (HHB), a neural network model deployment toolset designed for the RISC-V architecture; and CSI-NN2, an optimised neural network operator library. In addition, Alibaba’s software stack, along with the hardware toolset and library, has optimised AI operators and further improved the performance of the AI inference model, resulting in the Xuantie C906’s exceptional performance.

Alibaba Cloud’s RISC-V based processors have already been deployed widely across a range of applications including smart home appliances, automotive environments and edge computing. Last year, Alibaba Cloud opened the source code of its XuanTie IP Core series, enabling developers to access the code on GitHub and the Open Chip Community in order to build prototype chips of their own, which can be customised for IoT applications such as networking, gateways and edge servers.

Launched by the open engineering consortium MLCommons, the MLPerf™ Tiny benchmark measures how quickly a trained neural network can process new data on the lowest-power devices and smallest form factors. MLPerf Tiny v0.7 is the organisation’s second inference benchmark suite targeting machine learning use cases on embedded devices.

“AI for IoT is a highly competitive arena where customisation at every level is critical to achieving new breakthrough results at very low power,” said Calista Redmond, CEO of RISC-V International. “Alibaba continues to build RISC-V industry leadership in parallel with their dedication and contribution to the global RISC-V community.”

“The flexibility of the RISC-V framework gives it an advantage in meeting the customisation demands of clients in the AIoT field. We will continue to drive innovation in the thriving RISC-V community, and assist global developers in building their own RISC-V-based chips in a much more cost-effective way,” said Meng.

CHIPS Alliance Forms F4PGA Workgroup to Accelerate Adoption of Open Source FPGA Tooling

By Announcement

New workgroup draws support from industry leaders as the open FPGA toolchain matures

SAN FRANCISCO, Feb. 18, 2022 – CHIPS Alliance, the leading consortium advancing common and open source hardware for interfaces, processors and systems, today established the FOSS Flow For FPGA (F4PGA) Workgroup to drive open source tooling, IP and research efforts for FPGAs. 

FPGA vendors such as Xilinx (now part of AMD) and QuickLogic, industrial FPGA users and contributors such as Google and Antmicro, as well as universities including Brigham Young University, University of Pennsylvania, Princeton University and University of Toronto, can now officially collaborate under the umbrella of the newly launched F4PGA Workgroup.

“FPGAs are essential for a wide variety of low-latency compute use cases, from telecoms to space applications and beyond. This new F4PGA toolchain will enable a software-driven approach to building FPGA gateware, making code integration easier than ever,” said Rob Mains, General Manager at CHIPS Alliance. “Under the umbrella of the CHIPS Alliance, this workgroup will help unite current FPGA efforts so academia and industry leaders can collaborate on accelerating open FPGA innovation.”

The initial F4PGA projects are focused around the free and open source FPGA toolchain formerly known as SymbiFlow, as well as the FPGA Interchange Format, which is designed to enable interoperability between open and closed source FPGA toolchains. CHIPS Alliance’s newest member Xilinx, now part of AMD, collaborated with Google and Antmicro to develop the Interchange Format definition and related tools to provide a development standard for the entire FPGA industry. The FPGA Interchange Format allows developers to quickly and easily move from one tool to another, lowering the barriers to entry for the entire supply chain – from FPGA vendors to academics and FPGA users.

In addition to the work around the FPGA Interchange Format, several CHIPS Alliance members have collaborated on the FPGA tool perf framework. This open FPGA tooling project provides a comprehensive end-to-end FPGA synthesis flow and FPGA performance profiling framework, allowing developers to analyze FPGA designs by looking at metrics such as clock frequency, resource utilization and runtime.

CHIPS Alliance members have also worked on the development of the FPGA Assembly (FASM) format, a textual format specifying which FPGA features should be enabled or disabled; the textual nature of FASM makes it easy to analyze and experiment with different designs.
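To give an idea of the format, a FASM file is simply a list of feature lines; the tile and feature names below are hypothetical, shown only to convey the general shape of the format:

# Comments start with a hash; each line enables a single FPGA feature
CLB_X10Y20.SLICE_X0.ALUT.INIT[63:0] = 64'h0000000000000001
INT_X10Y20.IMUX12.EE2END0

A feature that is absent from the file is simply left disabled, which is part of what makes FASM output easy to diff and experiment with.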

Industry support for open FPGA tools has continued to rise, with QuickLogic becoming the first company to fully embrace the open source FPGA toolchain in 2020, and now with Xilinx’s participation in the FPGA Interchange project. The strong support for the F4PGA Workgroup promises to help further accelerate industry adoption across geographies and increase confidence in open source FPGA tooling as a viable option for all types of designs.

To learn more about the F4PGA Workgroup, please visit: https://chipsalliance.org/workgroups/

About the CHIPS Alliance

The CHIPS Alliance is an organization which develops and hosts high-quality, open source hardware code (IP cores), interconnect IP (physical and logical protocols), and open source software development tools for design, verification, and more. The primary focus is to provide a barrier-free collaborative environment, to lower the cost of developing IP and tools for hardware development. The CHIPS Alliance is hosted by the Linux Foundation. For more information, visit chipsalliance.org.

About the Linux Foundation

The Linux Foundation was founded in 2000 and has since become the world’s leading home for collaboration on open source software, open standards, open data, and open hardware. Today, the Foundation is supported by more than 1,000 members and its projects are critical to the world’s infrastructure, including Linux, Kubernetes, Node.js and more. The Linux Foundation focuses on employing best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, visit linuxfoundation.org.

CHIPS Alliance Announces Xilinx as its Newest Member 

By Announcement

Xilinx to continue to drive forward open source FPGA innovation

SAN FRANCISCO, Feb. 3, 2022 – CHIPS Alliance, the leading consortium advancing common and open hardware for interfaces, processors and systems, today announced that Xilinx, Inc. (NASDAQ: XLNX) has joined the CHIPS Alliance organization. Xilinx is a leader in adaptive computing, providing highly-flexible programmable silicon, enabled by a suite of advanced software and tools to drive rapid innovation across a wide span of industries and technologies – from consumer to cars to the cloud. 

“Xilinx has long been an advocate of open standards and open source,” said Tomas Evensen, CTO Open Source at Xilinx. “As a member of the CHIPS Alliance, we look forward to continuing to spearhead open FPGA initiatives to give everyone the opportunity to innovate faster and do more with their designs.”

Xilinx collaborated with longstanding CHIPS Alliance members Antmicro and Google to develop the FPGA Interchange Format, which helps to lower design barriers by enabling interoperability between open and closed source FPGA toolchains. Xilinx designed its RapidWright open source platform to work with the Interchange Format. RapidWright enables users to customize implementations to their unique challenges and provides a design methodology using pre-implemented modules with a gateway to back-end tools in Vivado. 

“As the inventor of the FPGA, Xilinx is one of the key companies driving forward innovation in this market,” said Rob Mains, General Manager at CHIPS Alliance. “Xilinx has already been working closely with several CHIPS Alliance members around open source efforts, so it’s great to have them under the CHIPS Alliance umbrella as we plan to boost our FPGA efforts this year.”

To learn more about Xilinx, please visit: www.xilinx.com.

About the CHIPS Alliance

The CHIPS Alliance is an organization which develops and hosts high-quality, open source hardware code (IP cores), interconnect IP (physical and logical protocols), and open source software development tools for design, verification, and more. The primary focus is to provide a barrier-free collaborative environment, to lower the cost of developing IP and tools for hardware development. The CHIPS Alliance is hosted by the Linux Foundation. For more information, visit chipsalliance.org.

About the Linux Foundation

The Linux Foundation was founded in 2000 and has since become the world’s leading home for collaboration on open source software, open standards, open data, and open hardware. Today, the Foundation is supported by more than 1,000 members and its projects are critical to the world’s infrastructure, including Linux, Kubernetes, Node.js and more. The Linux Foundation focuses on employing best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, visit linuxfoundation.org.

Towards UVM: Using Coroutines for Low-overhead Dynamic Scheduling in Verilator

By Blog

This post was originally published at Antmicro.

Verilator is a popular open source SystemVerilog simulator and one of the key tools in the ASIC and FPGA ecosystem, which Antmicro is actively using and developing, e.g. by enabling co-simulation with Renode or Cocotb integration. It is also one of the fastest HDL simulators available, proprietary alternatives included. It achieves that speed by generating highly optimized C++ code from a given hardware design. Verilator does a lot of work at compile time to make the generated (‘verilated’) code extremely fast, such as ordering statements in an optimal way.

Verilation diagram
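For reference, verilation is driven from a single command line. A typical invocation (with hypothetical file names for the design and its C++ test harness) that generates the C++ model and builds a simulation executable looks like this:

verilator -Wall --cc --exe --build sim_main.cpp top.v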

This static ordering of code also means that support for some SystemVerilog features has been sacrificed to make Verilator so performant. Namely, Verilator does not support what is known as the stratified scheduler, an algorithm that specifies the correct order of execution of SystemVerilog designs. This algorithm is dynamic by nature, and does not fit with Verilator’s static approach.

Because of this, Verilator doesn’t support UVM, a widely-used framework for testing hardware designs. Testbenches for Verilator have to be written in C++, which is not ideal – you shouldn’t have to know how to program in C++ in order to use a SystemVerilog simulator. Many ASIC projects are unable to take advantage of Verilator, because verification in this space is very often done with UVM. This is a gap that, together with Western Digital, Google and the entire CHIPS Alliance, we have been working to close in order to enable fully open source, cloud-scalable verification usable by the broad ASIC industry.

A milestone towards open source UVM

Some of the key features UVM requires are dynamically-triggered event variables and delays. To support them, we introduced what we call a dynamic scheduler to Verilator, with a proof-of-concept implementation described in more detail in a previous blog note earlier this year. Essentially, it enabled us to suspend the execution of SystemVerilog processes while waiting for delays to finish or events to be triggered, thus postponing some of the scheduling from compile time to runtime.

initial forever begin
    @ping;     // wait for the 'ping' event
    #1 ->pong; // after a delay of one time unit, trigger 'pong'
end

initial forever begin
    #1 ->ping; // after a delay of one time unit, trigger 'ping'
    @pong;     // wait for the 'pong' event
end

That thread-based implementation worked, but it required us to run each process in a design in a separate thread, using mutexes and condition variables to facilitate communication. With a working solution in hand, which proved that what we set out to do was possible, we started thinking about a different approach which would allow us to avoid the significant performance overhead introduced by threads and hopefully also simplify the implementation. That’s when coroutines came up as a possible solution.

What is a coroutine?

The concept of coroutines has been around for decades. Arguably, most programmers have used them, knowingly or not. They are available in some form for most modern programming languages, and now they are also included in the newest C++20 standard. But what are they exactly?

Normally, when a function or procedure is called, it needs to finish execution in order for the control flow to go back to a previously executed function. This is reflected in the way the call stack works. A coroutine is a generalization of the concept of a function, but it differs in that its execution can be paused at any point, and resumed from any other point in the program, even from a different thread. Implementations vary, but often this is achieved by allocating coroutine state on the heap.

Diagram depicting the call stack and coroutine state

Unlike threads, which are preemptively scheduled by the operating system, coroutines are a form of cooperative multitasking: they have to yield control by themselves, as there is no scheduler controlling them from the outside. A programmer needs to specify when and where a coroutine should resume execution.

A popular use case for coroutines is writing generators. As the name suggests, a generator is used for generating some set of values, but instead of returning them all at once, it yields them one by one to the function that called the generator.

// 'generator' is assumed here to be a coroutine generator type
// (std::generator in C++23, or a hand-written equivalent);
// std::exchange comes from <utility>, PRIu64 from <cinttypes>.
generator<uint64_t> fib(int n) {
    uint64_t a = 0, b = 1;
    for (int i = 0; i < n; i++) {
        b = b + std::exchange(a, b); // advance the Fibonacci pair
        co_yield a;                  // hand one value back to the caller
    }
}

for (uint64_t n : fib(40))
    printf("%" PRIu64 "\n", n);      // %d would be incorrect for uint64_t

Coroutines are also useful for asynchronous programming, for writing functions that start their execution on one thread but continue on another (e.g. a background thread intended for heavy computation).

// 'ui_task' and 'compute' are illustrative names: an awaitable task type
// and an asynchronous computation running on a background thread.
ui_task click_compute() {
    label = "Computing...";
    co_await compute();  // suspend the UI coroutine until the result is ready
    label = "Finished!";
}

Currently, coroutines are supported by many C++ compilers, including GCC 11 and Clang 13 (which offers experimental support). It’s worth mentioning that Clang is excellent at optimizing them: if a coroutine does not outlive the calling function’s stack frame, and its state object’s size is known at compile time, the heap allocation can be elided. Coroutine state is then simply stored on the stack. This gives Clang a significant performance edge over GCC in some cases, such as when using generators.

Coroutines for dynamic scheduling

From the get-go, coroutines seemed like a good fit for dynamic scheduling of SystemVerilog in Verilator. As previously mentioned, they follow the cooperative model of multitasking, which is sufficient for handling delays and events in SV processes. Preemption is not necessary, as there is no danger of starving a task. That is because all SystemVerilog processes should yield in a given time slot either after they finish or when they’re awaiting an event.

A significant drawback of threads, which was what the initial implementation was based on, is that it’s not possible to spawn thousands of them, one for each process in a design. However, it is possible to spawn thousands of coroutines, and that number is only bound by the amount of RAM available to the user. Also, with coroutines, one does not have to worry about multithreading problems like data races. All multitasking can be done on one thread.

The only issue with coroutines is the allocation of coroutine state. However, there are ways to mitigate that by using a custom allocator, as well as only using coroutines for the parts of a design that actually require it. After all, dynamic scheduling is not relevant to the synthesizable subset of SystemVerilog.

Thus, we decided to go ahead and replace threads with coroutines in our implementation. The new approach immediately proved to be easier to work with, and the development pace increased significantly. The new version has already surpassed the thread-based implementation in both completeness and performance, and is available here. Let’s take a closer look at how it works.

Implementation

// SystemVerilog source:
initial forever begin
    @ping;
    #1;
    ->pong;
end

// Corresponding C++ coroutine code (simplified):
while (true) {
    co_await ping;
    co_await 1;
    resume(pong);
}

The general idea for the implementation was to reflect the behavior of SystemVerilog delay and event trigger statements in the co_await statement in C++20. This statement is responsible for suspending coroutines, and we use it to suspend SystemVerilog processes represented by coroutines in a verilated design.

When a delay is encountered, the current coroutine (or process) is suspended and put into a queue. When the awaited time comes, the corresponding coroutine is removed from the queue and resumed.

Diagram depicting how delays are handled
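To illustrate the mechanism, here is a minimal sketch of how such a delayed queue could look, assuming C++20 coroutines; this is a simplified illustration, not Verilator's actual implementation:

#include <coroutine>
#include <cstdint>
#include <map>
#include <vector>

struct DelayedQueue {
    // Coroutines parked until a given simulation time, sorted by time
    std::multimap<uint64_t, std::coroutine_handle<>> queue;

    // 'co_await queue[time]' suspends the current process until 'time'
    auto operator[](uint64_t time) {
        struct Awaitable {
            DelayedQueue& q;
            uint64_t time;
            bool await_ready() { return false; }
            void await_suspend(std::coroutine_handle<> h) { q.queue.emplace(time, h); }
            void await_resume() {}
        };
        return Awaitable{*this, time};
    }

    // Called by the simulation loop: resume every process due at 'time'
    void activate(uint64_t time) {
        std::vector<std::coroutine_handle<>> due;
        auto end = queue.upper_bound(time);
        for (auto it = queue.begin(); it != end; ++it)
            due.push_back(it->second);
        queue.erase(queue.begin(), end);
        for (auto h : due)
            h.resume();  // resumed processes may schedule new delays
    }
};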

Event variables work in a similar way. When we are awaiting an event, we suspend the current coroutine and put it in what we call an event dispatcher. If the event is triggered at a later point, we inform the event dispatcher which resumes the corresponding coroutine.

Diagram depicting how event variables are handled
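The event dispatcher can be sketched in the same spirit, with events identified by their address, matching the generated code shown later in this note (again, an illustration rather than the actual implementation):

#include <coroutine>
#include <unordered_map>
#include <utility>
#include <vector>

struct EventDispatcher {
    // Coroutines parked until a given event variable is triggered
    std::unordered_map<const void*, std::vector<std::coroutine_handle<>>> waiting;

    // 'co_await dispatcher[&event]' suspends until the event fires
    auto operator[](const void* event) {
        struct Awaitable {
            EventDispatcher& d;
            const void* event;
            bool await_ready() { return false; }
            void await_suspend(std::coroutine_handle<> h) { d.waiting[event].push_back(h); }
            void await_resume() {}
        };
        return Awaitable{*this, event};
    }

    // A SystemVerilog '->event' trigger maps to a trigger() call
    void trigger(const void* event) {
        std::vector<std::coroutine_handle<>> handles = std::move(waiting[event]);
        waiting.erase(event);
        for (auto h : handles)
            h.resume();
    }
};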

With all this, the C++ code that Verilator generates for delays and event statements is very similar to the original SystemVerilog source code.

initial forever begin
    @ping;
    #10;
    ->pong;
end

This SystemVerilog corresponds to the following C++ code. The snippet shown here is simplified for readability, but the structure of the verilated code is preserved.

Coroutine initial() {
    while (true) {
        co_await eventDispatcher[&ping];
        co_await delayedQueue[TIME() + 10];
        eventDispatcher.trigger(&pong);
    }
}

As mentioned before, one of the main reasons for the switch to coroutines is performance. The original, thread-based implementation was hundreds of times slower than vanilla Verilator when simulating CHIPS Alliance’s SweRV EH1 core. Just replacing threads with coroutines resulted in a threefold speedup for SweRV. Further optimization, the most crucial part being detecting which parts of a design actually need dynamic scheduling, resulted in performance indistinguishable from vanilla Verilator when using Clang to compile the verilated code.

Next steps and future goals

There is still more work to be done. We are continuously working on improving the dynamic scheduler in the following areas:

  • working out some remaining edge cases,
  • making it work with Verilator’s built-in multithreading solution,
  • adding new test cases to push these new features to their limits.

Our goal is to provide the dynamic scheduler in Verilator as an optional scheduler that users can enable if they want more SystemVerilog compatibility. Of course, users should bear in mind that it is not as well-tested as Verilator’s default behavior, but this will most likely improve as we find more practical use cases for the solution.

Naturally, many more features are needed to provide full UVM support. Among others, this includes:

  • the built-in process class, which is used for controlling the behavior of a SystemVerilog process,
  • randomized constraints, which let the user generate test data easily by specifying constraints for random generation of said data,
  • better support for assertions, which are statements that allow for verifying that certain conditions are fulfilled by a tested design.

The dynamic scheduler is part of a bigger undertaking driven by Antmicro within the CHIPS Alliance to create fully open source toolchains and flows for FPGA and ASIC development. Together with Surelog/UHDM, a project aiming at providing a complete SystemVerilog parsing and elaboration solution, this brings us closer to being able to simulate, test and verify designs which use UVM with entirely open source tools.

SATA Design Implementation on FPGAs with Open Source Tools

By Blog

This post was originally published at Antmicro.

Real-world FPGA designs often require high-rate transmission protocols such as PCIe, USB and SATA, which rely on high-speed transceivers for external communication. These protocols are used to interface with various devices such as graphics cards and storage devices, and many of our clients reach out to us specifically because they need the flexibility, high-throughput and low-latency characteristics of FPGAs.

In particular, for customers that deal with high data volumes (which is very common in video applications), implementing SATA to communicate and transfer data with e.g. an SSD is a must.

Since Antmicro believes in an open source, vendor-neutral approach to FPGAs, today we will describe how to build a SATA-enabled system using a completely open source flow, including the hardware platform, FPGA IP and, perhaps most importantly, the tooling which we have been developing as part of our bigger effort within CHIPS Alliance.

Origin and motivation

Antmicro is a pioneer in a software-driven approach to developing FPGAs. While new hardware description languages, open source IP and software have been gaining traction in the FPGA space, one necessary element has been missing: open source tooling. Open tools allow a workflow more familiar to software developers, who are used to just downloading their toolchain without having to log in anywhere or manage licenses.

Moreover, open tools provide the great advantage of easy to set up CI systems that keep track of regressions and allow more efficient and robust development.

Some of our forward-looking customers such as Google require these kinds of workflows to take full control of their development toolchain, for various reasons: security, development productivity and scale. Others, like QuickLogic, who thanks to the cooperation with us are the first ever FPGA vendor company to fully embrace open source tools, are looking to deliver a more tailored experience to their own customers, which is easier to do based on open source.

To prove the viability of open source FPGA tools, being able to implement high-speed interfaces to verify how the toolchain handles high-speed transceivers is key; thus, a fully open source SATA setup is a very good target, especially since an open source core, LiteSATA, was available in our favorite open source SoC generator for FPGAs, LiteX. What was missing was a hardware platform, putting it all together, and – of course – tools.

Hardware setup

The SATA design we developed is meant to run on top of a Nexys Video board from Digilent, featuring an Artix-7 200T Xilinx FPGA, coupled with a custom expansion board connected through the FMC connector and hosting an M.2 SSD module. Thanks to the FMC connector on the Nexys Video we achieved a relatively simple and modular hardware setup.

The FMC expansion board, developed by Antmicro, is fully open-sourced and available on GitHub.

Open source SATA hardware setup

FPGA gateware and system block diagram

The FPGA design is generated with the LiteX SoC builder and the main components that we used are:

  • The VexRiscv RISC-V CPU
  • The LiteDRAM controller to communicate with the DDR memory
  • The LiteSATA core to communicate with the SSD module on the custom expansion board
  • A UART controller to control the system from the host

Moreover, the software running in the SoC includes a simple BIOS that can perform SATA initialization and basic read and write operations of the sectors in the SSD drive.

Running open source SATA diagram

Open source toolchain

The SATA setup proves that high speed protocols can be enabled on mainstream FPGAs such as Xilinx 7-series with an open source toolchain, with Yosys for synthesis and VPR for place and route. The LiteSATA IP core makes use of so-called GTP hard blocks, and in fact one of the main challenges we dealt with here was enabling these hard blocks in the Artix-7 architecture definition to get an end-to-end open source toolchain.

Besides enabling more coverage of popular FPGAs, much of our current FPGA toolchain effort goes into increasing the interoperability of tools like VPR and nextpnr as well as their proprietary counterparts, creating a more collaborative ecosystem in which the community – including universities, commercial companies, FPGA vendors and individual developers – can tackle the ambitious goal of open source FPGAs together.

For more information on the FPGA Interchange Format and the value it brings to open source FPGA tooling, refer to the dedicated Antmicro blog note. In the future, once that work is at a more advanced stage, LiteSATA will be one of the first example designs to be tested with the FPGA Interchange-enabled tools.

Building and running the setup

The FPGA SATA design is available in the symbiflow-examples repository and can be built with the open toolchain, and run on the hardware setup described above.

After following the instructions to install the toolchain and preparing the environment, run the following to build the LiteX SATA design:

cd xc7
make TARGET="nexys_video" -C litex_sata_demo

When the bitstream is generated, you can find it in the build directory under litex_sata_demo/build/nexys_video.

To load the bitstream on the Nexys Video board you can use the OpenFPGALoader tool, which has support for the board.

Once the bitstream is loaded on the FPGA, you can access the BIOS console through the UART connected to your host system and run the following (note that X depends on the assigned USB device):

picocom --baud 115200 /dev/ttyUSBX

When the LiteX BIOS gives you control, you need to perform the SATA initialization before being able to read and write sectors on the drive. See the output below:

Running LiteSATA with SymbiFlow in console
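For reference, the BIOS session follows roughly the sequence below; the exact command names can differ between LiteX versions, so treat this as an illustration rather than a verbatim transcript:

litex> sata_init
litex> sata_read 0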

Future goals

The work on enabling the SATA protocol in a fully open source flow was one of the steps on the way towards supporting PCIe in the toolchain, which will unlock even more advanced use cases. PCIe can be used for a variety of purposes, such as connecting external graphics cards or accelerators to an FPGA design, and generally enables even faster transmission rates to and from the FPGA chip.

Open Source FPGA Platform for Rowhammer Security Testing in the Data Center

By Blog

This post was originally published at Antmicro.

Our work together with Google and the world’s research community on detecting and mitigating the Rowhammer problem in DRAM memories has been proving that the challenge is far from being solved and a lot of systems are still vulnerable.
The DDR Rowhammer testing framework that we developed, together with an open hardware LPDDR4 DRAM tester board, has been used to detect new attack methods such as Half-Double and Blacksmith, and all data seems to suggest that more such methods will be discovered with time.

But consumer-facing devices are not the only ones at risk. With the growing role of shared compute infrastructure in the data center, keeping the cloud secure is critical. That is why we again teamed up with Google to bring the open source FPGA-based Rowhammer security research methodology to the DDR4 RDIMMs used in servers, designing a new Kintex-7 platform specifically for that use case to foster collaboration around what seems to be one of the world’s largest security challenges.

Hardware overview

Open source data center Rowhammer tester board

The data center DRAM tester is an open source hardware test platform that enables testing and experimenting with various DDR4 RDIMMs (Registered Dual In-line Memory Modules).

The main processing platform on this board is a Xilinx Kintex-7 FPGA which interfaces directly with a regular DDR4 DIMM connector. The new design required more IOs compared to the LPDDR version, which was a major driving factor for changing the Kintex-7 FPGA package from 484 to 686 pins.

Basing the test platform on the Kintex-7 FPGA allowed us to implement a completely open source memory controller – LiteDRAM – fully within the FPGA, just like in the LPDDR case. The system can thus be modified and re-configured on both the hardware and software levels to freely sculpt memory testing scenarios, providing developers with a flexible platform that can be easily adjusted to new data center use cases. Our previous design targeted a single channel from a single LPDDR4 IC, featuring specially-designed modules to make up for the fact that LPDDR memories aren’t meant to be particularly “modular”. For the data center use case, however, reflecting the more standardized nature of that space, the new board can handle a full-fledged, off-the-shelf DDR4 RDIMM with multiple DRAM chips.

As in the LPDDR4 version, the new board features different interfaces to communicate with the FPGA, such as RJ45 Gigabit Ethernet and a Micro USB console. Additionally, there is an HDMI output connector for development purposes. Other features include:

  • A JTAG programming connector
  • A microSD card slot and 12 MBytes of flash memory
  • HyperRAM – external DRAM that can be used as an FPGA cache.

What is worth stressing here is that unlike LPDDR4, DDR4 modules don’t have to be custom made and are available to buy off the shelf – an advantage that greatly expands the potential applicability and outreach of the platform.

Block diagram depicting open source data center Rowhammer tester platform

Using open source to transform data centers

The DRAM tester described here is meant, of course, to be used with the Antmicro open source Rowhammer testing framework mentioned in the opening of this blog note. The list of devices discovered to be vulnerable to attacks so far is significant, and the new design will help to cover a huge chunk of data center oriented memory modules.

The DRAM testing capabilities of the Rowhammer tester are not limited to DDR4 RDIMM and LPDDR4 memories. Plans for 2022 include support for LPDDR5 and DDR5, which will result in more hardware and collaborations, and hopefully more mitigation techniques. With an open source DRAM controller at its heart, the framework offers the potential for collaboration around building Rowhammer mitigations into the controller itself, using the transparency of open source IP to stay one step ahead of potential attacks.

The recent data center security work is part of our wider effort to bring open source tooling, methodologies and approaches to data center customers. In a similar vein, within the LibreBMC group in the OpenPOWER Foundation, we are leading a project to replace ASIC-based BMCs (baseboard management controllers) with soft CPUs running on popular and low-cost FPGA platforms. LibreBMC will be a completely transparent security and management solution in terms of both hardware and software, and includes two boards compatible with OCP’s DC-SCM standard, based on the Xilinx Artix-7 and Lattice ECP5 FPGAs respectively.

Complementing our software capabilities in scaling huge workloads and building robust design, development, test and CI, simulation and verification pipelines, our data center oriented platforms also include Scalenode, which shows how open source hardware can be used to build modular servers based on both ARM (Raspberry Pi 4 CM) and RISC-V (ARVSOM).

Our open source based services, ranging from ASIC and hardware design through IP and software development, let us offer comprehensive help to a wide array of data center customers, improving their security, development speed and collaboration capacities.

The DDR testing platform in a broader context

The data center DRAM tester is further proof that the open source hardware trend spearheaded by Antmicro can bring practical value, especially in terms of security and collaboration capability. Developing a completely open framework, configurable down to the DRAM controller itself, has led us to some fantastic collaborations and sparked ideas which would otherwise be impossible to implement. The CHIPS Alliance, the OpenPOWER Foundation and RISC-V International all have a keen interest in taking the memory controller work forward, potentially leading up to ASIC-proven DDR controller IP.

An open source IP ecosystem which we are actively participating in could revolutionize how ASIC and FPGA systems are built. It is one of the key components in a wider push for a more open source, pragmatic and software-centric approach to hardware that we are helping shape at the global level by participating in policy-making initiatives in the EU and US.

On a more down-to-earth note, the data center platform is yet another permissively licensed open source board in our arsenal, and can serve as a good reference design for Kintex-7 projects which we are happy to customize and build upon for other areas or types of research for our customers.

Software-driven ASIC Prototyping Using the Open Source SkyWater Shuttle

By Blog

This post was originally published at Antmicro.

The growing cost and complexity of advanced nodes, supply chain issues and demand for silicon independence mean that the ASIC design process is in need of innovation. Antmicro believes the answer to those challenges is bound to come from the software-driven, open source approach which shaped the Internet and gave rise to modern cloud computing. Applying the methodologies of software design to ASICs is, however, notoriously viewed as difficult, given the closed nature of many components needed to build chips – tools, IP and process design kits (PDKs for short) – as well as the slow turnaround of manufacturing.

The open source, collaborative SkyWater PDK project, combined with the free ASIC manufacturing shuttles running every quarter from Google and efabless, has been filling one of those gaps. Add to it the open source licensed ASIC design tools we are helping develop as part of CHIPS Alliance, as well as the massive parallelization capabilities offered by the cloud, and what you get is an ASIC design ecosystem on the verge of a breakthrough. To effect this change, together with Google, efabless, SkyWater and others we are working on more developments, including letting the shuttle designs benefit from software-driven hardware productivity tools such as LiteX and Renode (which we are currently helping the SkyWater shuttle effort to adopt), as well as new and exciting developments in the process technology itself.

If you want to participate in making ASIC design history, let us show you why and how the shuttle program is the way to do that. And by the end of this article, hopefully you will want to participate in the next, fourth shuttle with the submission deadline at the end of this year.

chip tapeout

SkyWater PDK – some background

In May 2020 Google and SkyWater Technology Foundry released the first ASIC-proven open source PDK. The PDK targets the 130 nm process which, while not state-of-the-art, is still in widespread practical use, especially in mixed-signal and specialized designs.

The PDK release involved restructuring the original code and data and properly documenting all the available cells in the PDK. This operation was performed in a collaboration between a group of industrial and academic partners, with Antmicro’s effort focused mostly on developing tools for automatic PDK structuring and documentation.

An open source PDK was a key missing piece in end-to-end open source ASIC development, but in itself it would not allow the average developer to feel the change. To enable developers to work with the PDK in practice and build fully open source chips with the fast turnaround necessary to breed innovation, Google funded the Open MPW Shuttle Program operated by efabless, a fellow CHIPS Alliance member. The program requires that applying projects be fully open source and based on a permissive license, target the 130 nm SkyWater process and use the open source ASIC toolchain. Projects accepted into the program are then manufactured and the authors receive their packaged ASICs without any additional costs – production, packaging, testing and delivery are all covered.

The program is a great opportunity for any developer wanting to develop open source ASICs and contribute to the emerging open source ASIC community. The first shuttle program attracted 37 projects, including:

  • Five RISC-V SoCs
  • A cryptocurrency miner
  • A robotic app processor
  • A template SoC based on OpenPOWER
  • An Amateur Satellite Radio Transceiver
  • Analog/RF IPs
  • Four eFPGAs
  • Antmicro’s AES-128 core integration.

We have been assisting customers who want to participate in the SkyWater shuttle, helping them assess the feasibility of their designs, create the necessary workflows and adapt the tools involved to their particular needs.

Our engineering services can be used to enhance your development teams with the ability to use open source tools more effectively and integrate them with your infrastructure in a way which allows you to benefit from the capabilities of open source without disrupting your internal workflows unnecessarily.

In total, over 100 designs have been sent to fabrication so far, many authored by teams with a predominantly software background. With over 2500 users in the SkyWater open source PDK Slack, this is truly a community in the making.

Most of the designs in the shuttles use the Caravel harness design, which implements a RISC-V CPU with some base peripherals, OpenRAM-generated memory, an I/O ring and a user area where developers can place their designs. The harness is meant to be a fixed block and starting point which significantly lowers the barrier to entry for ASIC developers, but as such it is also subject to evolution to better answer the needs of the shuttle participants, which we will describe later in this note.

Open source ASIC tools

The core part of the PDK shuttle process uses the OpenLane toolchain, a flow based on the OpenROAD project, also a part of CHIPS Alliance. The toolchain implements all the steps required to generate a production-ready ASIC layout (GDS) from an RTL design.

ASIC design with SkyWater Shuttle diagram

Since production is the most expensive and time consuming part of the process, testing and validation are key stages in need of innovation, and the experiences learned from the SkyWater shuttle effort are invaluable.

Under the auspices of CHIPS Alliance, Google, Western Digital and Antmicro are leading the work on enabling fully open source SystemVerilog development, testing and validation. The work focuses on a number of design flow aspects, all of which are meant to improve the development experience and benefit from the inherent scalability and reusability of open source tools to offer practical value for teams building new ASIC designs.

Adoption of LiteX for Caravel

Open source design tools constitute one aspect of fully open source ASIC design. The other aspect, just as important as tooling, is open source, high-quality, reusable IP cores, and indeed the very rules of the SkyWater shuttle program encourage developers to open source their designs and reuse existing cores.

At the core of the shuttles is the Caravel harness. To improve the shuttle’s user experience and let the community benefit from a wider array of off-the-shelf tools and cores, we are assisting with the ongoing effort of adopting the Caravel design to be based on LiteX.

LiteX, a widely known open source SoC generator, will make it possible for more open source cores to be integrated with ASIC designs, ultimately lowering the entry barrier for software engineers. It comes with multiple ready-to-use cores, including the open source DRAM controller used in the Rowhammer test platform we described some time ago. This alternative harness, whose development you can track in a dedicated GitHub repository, will open the door to more contributions from the LiteX community and allow us to use a number of tools that we have already integrated, like our open source simulation framework, Renode.

Renode’s hardware/software co-development capabilities

The LiteX framework provides developers with an easy way to experiment with various different CPU cores. Testing a system against many possible cores, often running complex software, makes validation a non-trivial task.

Renode, Antmicro’s open source development framework, features advanced SW/HW/FPGA/ASIC co-simulation capabilities and has been directly integrated with LiteX to generate the necessary configurations that correspond to the hardware system. Renode supports a multitude of CPU, I/O IP, sensor and network building blocks, both native to LiteX and otherwise, allowing its users to simulate the entire platform design before implementation, i.e. in the pre-silicon stage.

Renode addresses the profound challenge of testing complex software, running it on various CPUs and using custom peripheral cores at the same time. Developers can make use of Renode’s ability to co-simulate with Verilator or with physical hardware, reducing the simulation time of SoC systems that utilize custom IP cores.

Back in September, Antmicro presented a case of co-simulating the popular Xilinx Zynq-7000 SoC running Linux with a verilated FastVDMA core, and of course co-simulation with platforms like the PolarFire SoC is something we have been steadily improving on with our partner Microchip.

A similar kind of development methodology will be possible with the new Caravel harness.

Taking that HW/SW co-design workflow to its natural consequence, as showcased by our work with Google, Dover Microsystems and others, Renode allows developers to build SW-oriented hardware faster than with HDL-only workflows and benefit from the flexibility known from software development cycles, where iterations happen in a matter of days. Recently, Renode has been extended with support for RISC-V vector instructions, which translates into a further improvement of the development process of machine learning algorithms in open source ASICs.

Scaling into the cloud and hybrid setups

Building and testing ASIC designs is often a time- and resource-intensive task. The open source tooling approach, endorsed by the SkyWater shuttle program, possesses an important advantage over any proprietary approach – it allows for infinite scaling of compute resources, as there are neither licensing costs nor other license-related limitations involved.

Developments around distributed and scalable cloud based CI/CD systems like self-hosted GitHub Actions runners in GCP, a collaboration between Antmicro and Google, are providing the ecosystem with new options for reliable, fast testing and deployment of ASIC designs. Cloud based CI systems can be built to combine both closed and open source solutions, providing hybrid solutions that fill the gaps of either approach. And on a more general level, scalable and accessible CI/CD systems facilitate collaboration between large and geographically distributed teams of developers.

New developments

SkyWater PDK is being constantly improved, extending the possibilities for future designs. One of the recent add-ons to the PDK is a ReRAM library which can be used to develop non-volatile memories using the SkyWater 130nm technology.

Further SkyWater PDK development plans include extending the PDK portfolio with 180nm, 90nm and 45nm technology processes – stay tuned for upcoming developments in that space!

Participate in shuttle runs

Three shuttle runs have already happened, and thanks to Google’s commitment as well as the overwhelming interest from business, research and government institutions, the project will continue through 2022 and most likely beyond. The 4th shuttle run is currently open and will be accepting submissions until December 31, 2021.

For projects that, for any reason, cannot be open sourced or submitted within the timeline of the open shuttle, a private shuttle called ChipIgnite has been created.

Open Source Debayerization Blocks in FPGA

By Blog

This post was originally published at Antmicro.

In modern digital camera systems, the captured image undergoes a complex process involving various image signal processing (ISP) techniques to reproduce the observed scene as accurately as possible while preserving bandwidth. On the most basic level, most CCD and CMOS image sensors use the Bayer pattern filter, where 50% of the pixels are green, 25% are red and 25% are blue (corresponding to the increased sensitivity of the human eye to the green color). Demosaicing, also known as debayering, is an important part of any ISP pipeline, whereby an algorithm reconstructs the missing RGB components/channels of each pixel by interpolating the values collected for neighbouring pixels.

Diagram depicting debayerization process

The rapid development of FPGA technologies has made it possible to use advanced ISP algorithms in FPGAs even for high-res, multi-camera arrays, which is great news for resource-constrained real-time applications where image quality is essential. In our R&D work, we are developing reusable open source building blocks for various I/O and data processing purposes that can be used as a starting point for customer projects which need to be bootstrapped quickly, and those include IP cores for debayerization.

Open source debayerization

As part of a recent project, we implemented and tested an open source FPGA-based demosaicing system that converts raw data obtained from CCD or CMOS sensors and reconstructs the image using three different interpolation algorithms controlled via a dedicated wrapper. The three interpolation methods are:

  • Nearest neighbour interpolation, where the nearest pixel is used to approximate the color value. This algorithm uses a 2×2 px matrix, is common in real-time 3D rendering, and is the lightest and easiest of the three to implement.
  • Bilinear interpolation, which establishes color intensity by calculating the average value of the 4 nearest same-color pixels in relation to the given pixel (see the sketch after this list). This method uses a 3×3 px matrix and gives better results than the nearest neighbour method but takes up more FPGA resources.
  • Edge directed interpolation, which calculates the pixel components in a similar way to bilinear interpolation, but uses edge detection with a 5×5 px matrix. This algorithm is the most sophisticated of the three, but gives the best results and eliminates zippering.
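To make the arithmetic concrete, below is a minimal behavioral C++ model of the bilinear case for the green channel, assuming (for illustration only) an RGGB raster of 8-bit samples and an interior pixel; the actual IP core implements the equivalent logic in FPGA fabric. Note that which neighbours are averaged depends on the channel being reconstructed: for the green value at a red or blue location they are the four orthogonal neighbours.

#include <cstdint>
#include <vector>

// Reconstruct the green value at a red/blue location of a row-major
// RGGB Bayer raster by averaging the four green neighbours.
// Border handling is omitted for brevity.
uint8_t green_at(const std::vector<uint8_t>& raw, int width, int x, int y) {
    int sum = raw[(y - 1) * width + x]      // green above
            + raw[(y + 1) * width + x]      // green below
            + raw[y * width + (x - 1)]      // green to the left
            + raw[y * width + (x + 1)];     // green to the right
    return static_cast<uint8_t>(sum / 4);
}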

System structure

The demosaicing system consists of two parts. The most important part is formed by the demosaicing cores representing the three algorithms described earlier.

Diagram depicting demosaicing wrapper

The software part of the system runs on the FPGA and features the bootloader, an operating system (Linux), Antmicro’s open source FastVDMA IP core that controls data transmission between the demosaicing setup and the DDR3 memory, and a dedicated Linux driver that makes it possible to control the demosaicing cores from software.

Open source FPGA IP cores for vendor-neutral solutions

Apart from building highly-capable vision systems based on FPGA platforms, we are developing various tools, open source IP cores and other resources to provide our customers with a complete, end-to-end workflow that they can fully control.

Some of our recent projects include the FPGA Interchange Format to enable interoperability between FPGA development tools, an open source PCIe link for ASIC prototyping in FPGAs, and an FPGA-based testing framework for the hardware security of DRAM. If you would like to benefit from introducing a more software-driven, open source friendly work methodology into your next product development cycle, reach out to us at contact@antmicro.com and keep track of our growing list of open IP cores at the Antmicro Open Source Portal.

How Google is Applying Machine Learning to Macro Placement 

By Blog

CHIPS Alliance’s latest Deep Dive Cafe featured an outstanding talk by a Google physical design engineer, Young-Joon Lee, who has a PhD from Georgia Tech, and has been working on machine learning physical design projects for the past two years. 

The chip placement problem is a notoriously challenging design problem and has been explored in the electronic design automation research and development community for years. For those unfamiliar with the problem, it involves finding the optimal placement on a chip image of the physical cells implementing the logical function, so as to optimize the performance, power, and area of the silicon, which in the end affects the cost of the product. The effort by Google to apply machine learning to the placement problem started as part of the Google Brain effort, which in part focuses on running algorithms at scale on large amounts of data.

Deep learning itself started taking off in 2012, with computational needs rapidly increasing every three to four months. Machine learning is particularly applicable to the placement problem, as different moves that are tried during training can be fed back to the underlying network as part of reinforcement learning. Models with billions of parameters are trained at scale on distributed systems.

In this talk, Young-Joon shared how Google devised a hybrid approach to the placement problem, first placing large macros with reinforcement learning and then using force-directed placement for standard cells. As part of this effort, Google used an open source RISC-V processor called Ariane. The presentation highlighted the quality of results achieved, the challenges, and the overall designer productivity afforded by this technique. Of note, the recently released TPU v4 was placed within 24 hours by the machine learning approach, compared to six to eight weeks by a physical design engineer, achieving 3% shorter wirelengths at the cost of 23 more DRC violations.

Finally, next steps in the project were discussed as well as how Google is exploring more machine learning techniques to use for other parts of electronic design automation. 

Watch the presentation below and check out the slides here: Learning To Play the Game of Macro Placement with Deep Reinforcement Learning.

Improving the OpenLane ASIC Build Flow with Open Source SystemVerilog Support

By Blog

This post was originally published at Antmicro.

Open source toolchains are key to building collaborative ecosystems, welcoming to new approaches, opportunistic/focused innovations and niche use cases. The ASIC design domain, especially in view of the rising tensions around manufacturing and supply chains, is in dire need of software-driven innovation based on an open source approach. The fledgling open source hardware ecosystem has been energized by the success of RISC-V and is now being vastly expanded to cover the entire ASIC design flow by CHIPS Alliance, and Antmicro has been playing a leadership role in both of these organizations as well as offering commercial engineering and support services to assist with early adoption of open source approaches in hardware.

One of CHIPS Alliance’s projects, the DARPA-funded OpenROAD, has created the necessary tooling to build open source ASIC-oriented flows such as OpenLane and OpenFASoC, becoming one of the central elements of the open ASIC ecosystem. The OpenROAD project, led by Prof. Andrew Kahng of the University of California San Diego, aims at a fully end-to-end RTL-to-GDSII flow, providing accessibility and collaboration options that are often not available with proprietary tools.

Antmicro is helping early adopters of OpenROAD-based flows, providing development services and improving the tools themselves – such as the open source SystemVerilog support described in this note. Our engineering services involve introducing better, open workflows on the design and implementation level, but also scaling up into the cloud in collaboration with Google Cloud.

There has been a lot of progress recently in the practical adoption of open ASIC design workflows, with most of the effort currently focused around the fully open source 130 nm SkyWater PDK, on whose release we collaborated with Google, efabless, SkyWater Technologies and many others. What is perhaps less known is that there have been successful OpenROAD-based tapeouts in more modern processes such as GF12 (and together with our customers we are planning more next year), but of course there is a long way to go until the toolchain can be considered a viable replacement for your latest-gen smartphone SoC design. There are, however, serious advantages in favor of an open source ASIC design flow, and to get there, delivering practical value in useful increments is needed. This is why we are working in CHIPS Alliance on making the flow more useful for practical designs.

Why open source tools for ASIC development?

The hope for open source ASIC flows like OpenLane is to provide multiple benefits:

  • enable new, less conventional approaches
  • provide more capabilities of vertical integration
  • encourage more software-driven and AI-assisted workflows
  • use the infinite scalability of the cloud without worrying about licensing costs

In this early and emerging ecosystem, having a commercial support partner like Antmicro which can be relied on for tools development is important for success – this lets you focus on your design while we take care of the infrastructure needs. Already now there are many niche use cases which can benefit from employing targeted open source improvements, and our work in CHIPS Alliance aims at broadening the application area of open source ASIC design flows to capture the needs of the broader market and bring the benefits of software innovation to hardware. We offer flexible, scalable engineering services to support our customers every step of the way.

Open Source SystemVerilog support in OpenLane

OpenLane is an automated RTL to GDSII flow that is composed of several tools, such as OpenROAD, Yosys, Magic, Netgen, Fault, CVC, SPEF-Extractor, CU-GR and Klayout, plus a number of scripts used for design exploration and optimization. This collection of tools performs all steps required in a full ASIC implementation from RTL to GDSII.

The flow, depicted below, consists of several highly customizable stages. In the initial stage, the RTL source files written in an HDL (Hardware Description Language) are synthesized, technology mapped and analyzed in terms of timing. In the following step, the floorplan and power distribution network are prepared for the design placement. Once finished, the design and clock placement is performed, followed by global routing. With the physical layout ready in the form of a DEF (Design Exchange Format) file, it is possible to perform several checks, including DRC, antenna and LVS (Layout vs. Schematic) checks, and eventually generate the final GDSII layout file, which contains a complete layout description that is ready to be sent to the foundry.

Diagram depicting the OpenLane flow

Our latest improvement in the OpenLane open source ASIC build flow is adding a Surelog/UHDM/Yosys flow that enables SystemVerilog synthesis without the necessity of converting the HDL code to Verilog as an intermediate step.

Yosys is a highly extensible open source HDL synthesis tool used in the RTL synthesis step of the OpenLane flow. Yosys has extensive Verilog-2005 support, but it needs additional plugins to support other languages such as VHDL or SystemVerilog. Previously, SystemVerilog was only supported via a proprietary plugin, but Antmicro has been adding open source SystemVerilog support to Yosys through recent contributions introducing a UHDM (Universal Hardware Data Model) frontend, described in a separate blog note. UHDM is a file format for storing hardware designs and, at the same time, a library able to manipulate this format. Designs written in SystemVerilog can be elaborated into the UHDM format by Surelog, a tool that aims to be a fully-featured SystemVerilog 2017 preprocessor, parser and elaborator. In our recent efforts, both the Surelog parser library and the UHDM library have been integrated with Yosys, which essentially enables seamless SystemVerilog support.

Diagram depicting Surelog/UHDM -> Yosys

This combination of Surelog/UHDM/Yosys is a very practical improvement for the OpenLane ASIC build flow, as it enables using it with a number of existing ASIC designs which are often implemented in SystemVerilog (e.g. OpenTitan’s Ibex and Earl Grey (an SoC based on Ibex), OHG’s Core-V, University of Washington’s BlackParrot, CHIPS Alliance’s SweRV).

Together with Google, we are working with relevant communities to prove the feasibility of these designs with our open source flows and the open source SkyWater PDK.

Building and testing OpenLane with SystemVerilog support

The OpenLane toolchain relies on dockerized tools to implement the flow in practice. This allows users to focus on building their designs without having to handle complex tool dependencies.

In order to add SystemVerilog support to OpenLane, you need a Docker container with a UHDM-enabled Yosys. First, install the OpenLane flow files as well as the PDK (Process Design Kit) files.

git clone https://github.com/The-OpenROAD-Project/OpenLane.git
cd OpenLane

# Build the OpenLane environment and install the PDK files
make openlane
make pdk

# Rebuild the Yosys container (including the UHDM plugin) and merge the images
cd docker_build/
make build-yosys DOCKER_BUILD_OPTS=--no-cache
make merge

Once the tools are built and containers recreated, you can see the tools in action using this example CI run:

export IMAGE_NAME=<docker image name>
make test

Optionally, you can run the OpenLane container in interactive mode and manually run the OpenLane flow script.

export IMAGE_NAME=<docker image name>
make mount
./flow.tcl -design spm 
exit

The IMAGE_NAME environment variable selects which Docker image will be used. If not set, a default OpenLane image will be fetched from DockerHub. The default image does not implement the Surelog/UHDM/Yosys flow and will, most likely, fail when used with the synthesis scripts from the repository.

The Surelog/UHDM flow is very similar to the original (Verilog only) flow, but differs in the way Yosys handles the RTL design. This is because Surelog and UHDM (along with the Yosys UHDM frontend) are packed into a library and loaded into Yosys as a plugin. Loading the plugin is one of the first steps of the synthesis process. Once the plugin is loaded, you can proceed to loading the RTL files and continue with the rest of the flow.
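As a sketch, the Yosys side of the flow then looks like the script below; the plugin and command names follow Antmicro's yosys-uhdm-plugin at the time of writing and may differ between versions:

plugin -i uhdm            # load the Surelog/UHDM frontend plugin
read_uhdm top.uhdm        # read a design previously elaborated by Surelog
synth -top top            # continue with the usual synthesis flow
write_verilog netlist.v   # write out the synthesized netlist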

Early adoption services for open source in ASIC design

At Antmicro, we provide services to our customers developing custom ASICs on many levels: IP, tools (Yosys, VPR, OpenROAD, Renode etc.) as well as cloud scaling (distant, custom GitHub runners etc.), focusing on interoperability between the many building blocks needed for a complete ASIC design flow.

Our collaboration with CHIPS Alliance partners like Google and Western Digital has been spawning many interesting projects which have contributed to maturing the open source ecosystem. Our linting and formatting work, used by projects such as OpenTitan and Core-V and recently presented at the CHIPS Alliance Fall Workshop, is a very good example of delivering incremental value with solutions to practical problems. Other projects which enable the blending of old and new methodologies for chip design are the efforts towards UVM support in Verilator and co-simulation in Renode.