Advanced Co-simulation with Renode and Verilator: PolarFire SoC and FastVDMA

This post was originally published at Antmicro.

Co-simulating HDL has been possible in Renode since the 1.7.1 release, but the functionality – critical for hardware/software co-development as well as FPGA use cases – is constantly evolving based on the needs of our customers like Google and Microchip as well as our work in open source groups including CHIPS Alliance and RISC-V International. To quickly recap, by co-simulation we mean a scenario where a part of the system is simulated in Renode but some specific peripheral or subsystem is simulated directly from HDL, e.g. Verilog. To achieve this, Renode integrates with Verilator, a fast and popular open source HDL simulator, which we are helping our customers adopt as well as expanding its capabilities to cover new use cases. Peripherals simulated directly from HDL are typically called Verilated peripherals.

Why co-simulate?

Co-simulation is a highly effective approach to testing IP cores in complex scenarios. HDL simulation is typically much slower than functional simulation, so to cut down on development turnaround time you can partition your design into a fixed part that you can simulate fast using Renode and the IP you are developing that you can co-simulate using Verilator. Renode offers a lot of ready-to-use components – for which you do not need to have a hardware description – which can be put together to provide a complete system to test new IP core designs.

Co-simulation in Renode so far

For more than a year now, Renode has supported AXI4-Lite and Wishbone buses for the Verilated peripherals' communication with the rest of the system. As can be seen in the Verilated UART examples included with Renode, these can be used not only to receive data from, but also to send data to the peripheral. Additionally, because Renode can simulate external interfaces, you can easily open a window for such a UART (with the showAnalyzer command during simulation), and the whole interaction is driven by the UART's TX and RX signals.

Diagram depicting FastVDMA co-simulation

The riscv_verilated_liteuart.resc script shows the interaction between a system based on the VexRiscv CPU implemented in Renode and a UART based on an HDL model simulated with Verilator, which together let users work with Zephyr's shell interactively.

In these examples, however, all the communication is initiated from Renode's side by either reading from or writing to the UART. There was no way for the peripheral to trigger the communication, which limited the use cases in which co-simulation could be applied.

New co-simulation features

Thanks to our work with Microchip and Google, new features have been available in the VerilatorPlugin since Renode 1.12; the most important change is the ability for a Verilated peripheral to trigger communication with other elements of the system. The Renode API was expanded to accommodate the new features: there are now two new actions in the co-simulation communication that enable a Verilated peripheral to initiate an exchange with Renode by sending data to, or requesting data from, the system bus.

Support for the AXI4 bus was also added. This is different from the AXI4-Lite bus, which only implements a subset of AXI4 features. A Verilated peripheral can act on the AXI4 bus either as a controller that requests accesses (therefore utilizing the new Renode API actions) or as a peripheral (in the AXI4 bus context) that is only requested to take some action.

How it works

Creating a Renode agent (which handles an HDL model's connection with Renode) with a role chosen for your peripheral is as simple as including an appropriate header and connecting signals. For example, FastVDMA acts as a controller on the AXI4 bus with the following code in the file creating its Renode agent:

#include "src/buses/axi-slave.h"
VDMATop *top;

void Init() {
    AxiSlave* slaveBus = new AxiSlave(32, 32);

    slaveBus->aclk = &top->clock;
    slaveBus->aresetn = &top->reset;

    slaveBus->awid = &top->io_write_aw_awid;
    slaveBus->awaddr = (uint32_t *)&top->io_write_aw_awaddr;

    // Connecting the rest of the signals to the bus

Another new feature is the ability to connect a Verilated peripheral through multiple buses. For example, it is now possible to have a peripheral connected to both AXI4 and AXI4-Lite buses, which enables even more complex HDL models to be tested with Renode. This is the case for FastVDMA, which is connected as follows:

#include "src/buses/axi-slave.h"
#include "src/buses/axilite.h"

RenodeAgent *fastvdma;

void Init() {
	AxiLite* bus = new AxiLite();
	AxiSlave* slaveBus = new AxiSlave(32, 32);

	// Initializing both buses’ signals

	//=================================================
	// Init eval function
	//=================================================
	bus->evaluateModel = &eval;
	slaveBus->evaluateModel = &eval;

	//=================================================
	// Init peripheral
	//=================================================
	fastvdma = new RenodeAgent(bus);
	fastvdma->addBus(slaveBus);

	slaveBus->setAgent(fastvdma);
}

FastVDMA

Fast Versatile DMA (FastVDMA) is a Direct Memory Access controller designed with portability and customizability in mind. It is an open source IP core developed by Antmicro in 2019 and written in Chisel. It is a very versatile DMA controller thanks to the range of supported buses: it can be controlled through an AXI4-Lite or Wishbone bus, while the data can be transmitted through an AXI4, AXI4-Stream or Wishbone bus. Since its inception it has been used in a number of projects, e.g. bringing a GUI to krktl's snickerdoodle. For more information, it's best to head to the original blog note about FastVDMA.

Expanding the PolarFire SoC ecosystem

Through Antmicro's long-term partnership with Microchip, Renode is a vital part of the PolarFire SoC developer experience. Microchip customers are using the Renode integration with the vendor's default SoftConsole IDE to develop software for the PolarFire SoC platform and its development board, the Icicle Kit.

PolarFire Icicle with Antmicro's HDMI breakout board

With the latest co-simulation capabilities, developers are able to explore a combination of hard and soft IP, developing their FPGA payload and verifying it within Renode. This allows them to work on advanced projects, from graphics output support similar to what we implemented for the snickerdoodle with FastVDMA (possibly employing our open-hardware HDMI board for the Icicle Kit), through advanced FPGA-based crypto solutions, to design-space exploration and pre-silicon development, with the debugging and testing capabilities native to Renode.

To see how Renode allows you to simulate complex setups with multiple co-simulated IP blocks, see our tests of Verilated FastVDMA plus Verilated RAM.

It's worth noting that, as these blocks are isolated and interconnected only via Renode, you can easily prepare setups in which peripherals use different buses to connect. This gives you the ability to focus on the interesting details and to assemble your systems from building blocks more easily.

Commercial support

If you'd like to speed up the development of your ASIC or FPGA solution using the advantages of advanced co-simulation, Antmicro offers commercial support and engineering services around Renode, Verilator and other tools, as well as FPGA and ASIC design, hardware, software and cloud services. We work with organizations like CHIPS Alliance and RISC-V International to enable a more software-driven hardware ecosystem based on open source. For a full list of our open source activity, head to our open source portal.

Progress on Building Open Source Infrastructure for SystemVerilog

By Rob Mains, General Manager of CHIPS Alliance

SystemVerilog is a rich hardware design description and verification language that is seeing increased usage in industry. In the second Deep Dive Cafe Talk by CHIPS Alliance on July 20, Henner Zeller, a software developer with Google, provided an excellent in-depth technical talk on building out an open source tooling ecosystem around SystemVerilog to provide a common framework that can be used by both functional simulation applications and logic synthesis. In case you missed the live presentation, you can watch it here.

Henner started the talk by providing background and motivation for the work on expanding the tooling available for SystemVerilog. He noted that SystemVerilog is used by a large cohort in industry, but there is little awareness that open source tools supporting it exist. In terms of language robustness and complexity, SystemVerilog is analogous to C++ as Verilog is to C, so developing a framework and metric-based scorecard to measure the quality of the applications supporting it is important. A key aspect of the adoption of SystemVerilog and the associated open source tooling is that it works out of the box. To that end, the sv-tests framework was created by Antmicro and Google – and subsequently transferred to CHIPS Alliance – as a unit test system for SystemVerilog tools, assessing the quality of the parsing and handling of different attributes of the language. To incentivize participation, a leaderboard has been created to show quality of results.

The talk also provided an overview of the different parsers Surelog, sv-parser and Verible, along with a discussion of their positive attributes and limitations. This was followed by a presentation of UHDM (Unified Hardware Data Model) – a data model that allows transferring parsed and elaborated SystemVerilog between a frontend and simulation or synthesis software.

An example integration of the UHDM-based flow, developed collaboratively by CHIPS Alliance members Google and Antmicro, was presented next. The development aims to integrate the UHDM handling capability with the Verilator simulator and the Yosys synthesis tool. The work is done publicly in CHIPS Alliance's GitHub repositories. The UHDM integration tests repository collects all the pieces of the system and allows running the integration tests.

The development and measurement of different applications to parse SystemVerilog and robustly represent it in a common data model will do much to ensure a solid framework for the construction of different design automation tools.

The next CHIPS Alliance Deep Dive Cafe is on Tuesday, Aug. 10 at 4 p.m. PT. Experts from Intel and Blue Cheetah will present on PHYs, protocols, EDA, and heterogeneous integration as well as the latest developments on die-to-die interfacing. You can register for the event here.

What You Need to Know About Verilator Open Source Tooling

By Rob Mains, General Manager of CHIPS Alliance

Verilator is a high performance, open source functional simulator that has gained tremendous popularity in the verification of chip designs. The ASIC development community has widely embraced Verilator as an effective, often even superior alternative to proprietary solutions, and it is now the standard approach in RISC-V CPU design as the community has worked to provide Verilator simulation capabilities out of the box. CHIPS Alliance and RISC-V leaders Antmicro and Western Digital have been collaborating to make Verilator even more useful for ASIC design purposes, working towards supporting industry-standard verification methods in a completely open source flow.

On June 15 at the inaugural CHIPS Alliance Deep Dive Cafe Talk, Karol Gugala, an engineering manager at Antmicro, provided an excellent in-depth technical talk on extending Verilator towards supporting the Universal Verification Methodology (UVM). The event was well attended, and is now also available on YouTube.

Karol started the talk by providing motivation for the improvement and expansion of the SystemVerilog design environment, and in particular the importance of UVM as a common verification framework. He covered the current limitations of Verilator relative to its handling of UVM, and the goal of the community to more robustly support UVM as well as SystemVerilog as a design language – efforts Antmicro is deeply involved in as part of its work within CHIPS Alliance together with other member companies. In the talk, Karol provided in-depth details of the work that has been done to handle UVM.

The focus of the work summarized by Karol's talk was the comparison of the static event scheduling already present in Verilator to the dynamic scheduling required for event-driven simulation, specifically event and time awareness. This necessitated implementing a new event scheduling approach in Verilator. The new scheduler is an option that the user can activate as a runtime switch. Future work includes clocking blocks, assertions, class parameters, integration with UHDM, and further runtime optimization. Full UVM support for the SweRV RISC-V cores and OpenTitan – also based on a RISC-V implementation, Ibex – are two key goals. A detailed overview of how SystemVerilog statements are evaluated as members of different processes was provided. As an initial implementation, each process now runs in a separate thread, with processes and threads communicating via standard IPC synchronization mechanisms. Scheduling of the threads/processes is left to the OS scheduler, and further work in the project will include more involved scheduling techniques.

You can read more about dynamic scheduling in Verilator at Antmicro’s blog, and test all the new features of Verilator with examples of designs and some GitHub Actions-based CI simulations (based on Antmicro’s public repo).

The next talk in the CHIPS Deep Dive Cafe Series is the AIB Deep Dive + Opportunities Presented By Intel on Tuesday, Aug. 10 at 8 a.m. PT. You can register for the event here.

Efabless Launches chipIgnite with SkyWater to Bring Chip Creation to the Masses

  • Program includes a pre-designed carrier chip and automated open source design flow from Efabless
  • SkyWater’s open source SKY130 process is the first node to be used to fabricate chips for the program
  • Initiative removes access barriers by significantly reducing cost and the need for deep semiconductor experience to design chips

Efabless, a community chip creation platform, today announced the launch of its new chipIgnite program to bring chip design and fabrication to the masses and a collaboration with SkyWater Technology for the first node supported in the program. The chipIgnite program expands upon the SKY130-based open source chip manufacturing program sponsored by Google and supports private commercial designs that include non-open source IP. This initiative represents another step forward in the industry to broaden access to chip design by giving people the ability to more easily create and fabricate chips.

The first shuttle in the chipIgnite program will support fabrication of student projects as part of the EE272B course in the Electrical Engineering department at Stanford University for senior undergraduate and graduate students.

The new program also has industry support from organizations including QuickLogic and the CHIPS Alliance.

“CHIPS Alliance is a major champion of open source hardware design and associated design automation tools. I am excited to see the chipIgnite program offered by Efabless to include many different collaborative IP developers to prove new ideas. The platform alleviates the barriers to entry into chip design and allows for ready exploration of many concepts,” said Rob Mains, general manager of CHIPS Alliance.

Read the full announcement at Efabless: https://info.efabless.com/press-release-efabless-launches-chipignite-with-skywater-to-bring-chip-creation-to-the-masses/

Antmicro’s ARVSOM RISC-V Module Announced

This post was originally published at Antmicro.

We are excited to announce the ARVSOM – Antmicro's fully open source, RISC-V-based system-on-module featuring the StarFive 71x0 SoC. Using the RISC-V architecture, which Antmicro has been heavily involved in since the early days as a Founding Member of RISC-V International, the SoM is going to enable unprecedented openness, reusability and functionality across different verticals.

ARVSOM

Open source-driven innovation

Since its conception, Antmicro has been enabling its customers to tap into the technological freedom that is inherent in open source. We’ve been developing cutting-edge industrial and edge AI systems using vendor-neutral and customizable solutions as well as actively developing and contributing to the tooling ecosystem, improving processes and unlocking even more system design options – often in alignment with our efforts driving RISC-V International and CHIPS Alliance. Targeted at a range of different use cases and enabling unmatched flexibility, ARVSOM is the latest product and example of Antmicro’s expertise in creating practical and easily modifiable technologies using open source.

Based on StarFive 71x0

Sitting at the heart of ARVSOM is the StarFive 71x0 system-on-chip – the first Linux-capable RISC-V SoC intended for mainstream, general-purpose edge applications, expected to be used in millions of devices built on the open source RISC-V ISA. The SoC is also used in the BeagleV StarLight development platform (GitHub repo), which has just been shipped to beta users, including Antmicro, who will be verifying its robustness and reliability in real-life use cases while the development community focuses on further expanding the RISC-V software support. The SoC features a dual-core U74 CPU from our partner and RISC-V pioneer SiFive, two AI accelerators – the open source NVDLA and SiFive's Neural Network Engine – one MIPI DSI and two MIPI CSI interfaces, HDMI, Gigabit Ethernet, a dual ISP and USB 3.0. The production version of the SoC, to be released later in the year, will also have PCI Express.

Compatible with Scalenode for server room applications

The SoM will be compatible with our server-oriented Scalenode platform, released earlier this month, originally designed as a baseboard for the Raspberry Pi Compute Module 4. Used together, Scalenode and ARVSOM will enable building easily scalable and flexible infrastructures consisting of clusters of small RISC-V-based compute units, with as many as 18 Scalenode boards fitting in a 1U rack. ARVSOM will also work with other Raspberry Pi CM4 baseboards, opening the door to a host of solutions and custom devices to be built with it right away.

ARVSOM in Renode

You can already start software development with our ARVSOM and the BeagleV StarLight SBC thanks to support in Renode, our open source simulation framework. The two have joined an array of platforms and sensors that can be simulated in our open source framework for system development and testing. You can freely co-develop hardware and software in Renode’s virtual environment, with the code behaving exactly the way it would on real hardware. Renode can also be used throughout the development lifecycle to enable continuous testing and integration, as well as prototyping new solutions based on RISC-V, as it features extensive support of this open source ISA. You can read more about the latest updates in Renode in the version 1.12 announcement blog note.

Our SoM is in development and will be gradually unveiled throughout the year as the availability of the StarFive 71x0 SoC increases.

ARVSOM can be customized and integrated into your project. We have been helping our customers embrace the groundbreaking RISC-V architecture, enabling them to reap the benefits of having an advanced system based on vendor-neutral, future-proof and robust technologies. Get in touch with us at contact@antmicro.com if you need a modern product that you have full control over.

Dynamic Scheduling in Verilator – Milestone Towards Open Source UVM

This post was originally published at Antmicro.

UVM is a verification methodology traditionally used in chip design which has historically been missing from the open source landscape of verification-focused tooling. While new, open source approaches to verification have emerged, including the excellent Python-based Cocotb (which we also use and support) maintained by the FOSSi Foundation, not everyone can easily adopt it, especially in long-running projects and existing codebases that use a different verification approach. Leading the efforts towards comprehensive UVM / SystemVerilog support in open source tools, we have been gradually completing milestones, getting closer to what will essentially be a modular, collaboration-driven chip design methodology and workflow. Some examples of our activity in this space include enabling open source synthesis and simulation of the Ibex CPU in Verilator/Yosys via UHDM/Surelog, and the most recent joint project with Western Digital in which we have developed dynamic scheduling in Verilator.

Verilating non-synthesizable code

The work is a part of a wider, long-term effort that, besides Western Digital, has involved other fellow CHIPS Alliance members such as Google, to develop UVM / SystemVerilog support in open source tools. To get closer to this goal, this time we have introduced an improvement to Verilator – an open source tool that is great at doing what it was originally designed to do, that is simulating synthesizable code, but hasn’t been enabling its users to convert non-synthesizable code and run fully fledged testbenches written in Verilog/SystemVerilog. The biggest challenge to opening that possibility turned out to be Verilator’s original scheduler running everything sequentially, which in general is not a bad approach but can get you only so far without actually executing the code in parallel.

Dynamic scheduling in Verilator

To run proper UVM testbenches in Verilator, we had to be able to properly handle language constructs specifically designed for use in simulation. Those features include delay statements, forks, wait statements and events. To achieve all of this, we needed to add a proof-of-concept dynamic scheduler to Verilator.

How dynamic scheduling in Verilator works

Instead of grouping code blocks of the same type as the original scheduler does, the dynamic scheduler separates the code into many runnable blocks and then creates and spawns a process for each of them when they are due to be executed. To achieve this, a new process – which translates to a new high-level thread – is created for each initial block, always block and forever block, each branch of a fork statement, etc. This allows us to pause the execution of one block and continue the execution of other blocks. Blocks can be paused when they encounter simulation control statements like the ones below (see the sketch after the list):

  • The wait statement (e.g. wait (rdy == 1); or wait (event_name.triggered))
  • Waiting for an event (e.g. @event_name;)
  • A delay (#10;)
  • A join statement after a fork (join or join_any – join_none is also supported, however it is not used to control the execution per se)
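To make these constructs more concrete, below is a minimal, self-contained SystemVerilog sketch written for this post (it is not taken from the Verilator examples repository) that combines a fork with a delay, a wait statement and an event; running it requires a simulator that handles these simulation control statements, such as Verilator with the dynamic scheduler:

module dyn_sched_tb;
  event producer_done;
  bit   rdy = 0;

  initial begin
    fork
      // Producer: waits 10 time units, raises rdy and triggers an event
      begin
        #10;
        rdy = 1;
        ->producer_done;
      end
      // Consumer 1: blocks on a wait statement until rdy goes high
      begin
        wait (rdy == 1);
        $display("[%0t] rdy observed", $time);
      end
      // Consumer 2: blocks until the event is triggered
      begin
        @(producer_done);
        $display("[%0t] event received", $time);
      end
    join  // resumes only after all three branches have completed
    $finish;
  end
endmodule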

None of the statements listed above are supported in Verilator's original scheduler, so to be able to dynamically control the execution of the dynamically spawned sub-processes, a way to react to changes in signal and event states had to be implemented. We did this by wrapping the primitive types used for storing signal values in higher-level objects, which allows us to attach other processes that monitor the values of specific signals and execute an arbitrary callback when a signal changes. Note that this is a mechanism used internally to implement the flow control statements and is not visible to the user writing Verilog code.

One additional thing that had to be re-written and adapted to the dynamic scheduler was the way non-blocking assignments are handled.

With a scheduler that is aware of the simulation time and other flow control statements, the approach originally used by Verilator could not be applied anymore. A way to properly schedule assignments and execute them at the right moment was a crucial step in getting all of this to work.
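As a reminder of the semantics involved, here is a small, purely illustrative SystemVerilog snippet (not taken from the project's test suite) showing why non-blocking assignments have to be coordinated with simulation time: their right-hand sides are sampled immediately, but the updates land at the end of the current time step, or even in a later one when a delay is involved.

module nba_tb;
  logic a = 0, b = 1;

  initial begin
    // Both right-hand sides are sampled first; the updates are applied
    // at the end of the current time step, so the values swap cleanly.
    a <= b;
    b <= a;
    #1;
    $display("a=%0d b=%0d", a, b);      // prints a=1 b=0

    // With an intra-assignment delay the update lands in a later time step,
    // which a time-aware scheduler has to keep track of.
    a <= #5 0;
    $display("[%0t] a=%0d", $time, a);  // a is still 1 here
    #10;
    $display("[%0t] a=%0d", $time, a);  // a is 0 after the delayed update
    $finish;
  end
endmodule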

Note that the features described above are the ones that are immediately visible to users of Verilator with the dynamic scheduler. This is, however, only the tip of the iceberg when it comes to the amount of internal changes that needed to be made in Verilator.

Example usage

To present the usage of all the new Verilator features we have created a public GitHub repository with a number of example designs and GitHub Actions-based CI simulating them.

The examples range from single feature examples to more complex scenarios simulating a system with two UARTs sending data between each other.

The UART testbench example uses all the newly added features. It spawns a thread for every simulated UART block and each thread feeds the UART IP that is being tested with the predefined data (the “hello world” string in our example).

The UART block reads the data from an AXI stream interface, serializes it and sends it over the TX line.

The received data is available on the output AXI Stream bus.

Each testbench thread introduces a random delay between consecutive data chunk transmission.

The threads are synchronized using SystemVerilog events – e.g. once the data is fed into the UART block over the AXI stream interface, a thread triggers an event informing the other thread that the data is being transmitted by the IP.
Once transmission is done, another event is triggered informing the first thread that the IP is ready for a new data chunk.

Once all the test data is transferred between both tested UART instances, the threads are joined and data correctness is validated.
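The simplified sketch below illustrates this synchronization pattern with two threads exchanging events; it is a stand-in written for this post, not the actual UART testbench from the examples repository, which additionally drives the real UART IP over its AXI stream interfaces.

module uart_tb_sketch;
  string payload = "hello world";
  event  chunk_sent, ip_ready;
  byte   current;

  // Feeder thread: pushes one character at a time with a random gap in between.
  task automatic feeder();
    for (int i = 0; i < payload.len(); i++) begin
      current = payload[i];
      #($urandom_range(1, 20));  // random delay between consecutive chunks
      ->chunk_sent;              // tell the other thread a chunk is in flight
      @(ip_ready);               // wait until the IP can accept more data
    end
  endtask

  // Monitor thread: consumes chunks and reports the IP as ready again.
  task automatic monitor();
    repeat (payload.len()) begin
      @(chunk_sent);
      $display("[%0t] transmitting 0x%02x ('%c')", $time, current, current);
      #5;                        // stand-in for the actual transmission time
      ->ip_ready;
    end
  endtask

  initial begin
    fork
      feeder();
      monitor();
    join                         // both threads are joined before validation
    $display("all chunks transferred");
    $finish;
  end
endmodule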

Next steps, future goals

This ongoing project is a big step towards UVM support in open tools as it not only removes a number of limitations in this area but also opens the door to future developments – which include a number of SystemVerilog features oriented towards verification which were earlier out of scope since open source UVM verification wasn’t on the horizon. Together with the effort to support SystemVerilog in Verilator using UHDM which is part of another ongoing project, in CHIPS Alliance we continue to work towards enabling open source design verification for everyone, which will revolutionize the way chips are built.

New MPW-TWO Program Will Provide Fabrication For Fully Open Source Projects

By Rob Mains, General Manager of CHIPS Alliance

CHIPS Alliance is excited to announce that the hardware development community can submit their open source design projects to Efabless.com for space on their forthcoming shuttle. This opportunity comes after the success of having 40 submissions for the MPW-ONE shuttle; 60% of those designs were submitted by first-time ASIC designers. MPW-TWO is the second Open MPW Shuttle providing fabrication for fully open-source projects using the SkyWater Open Source PDK announced by Google and SkyWater. 

The shuttle gives designers the freedom to innovate without having to worry about the risks associated with the cost of fabrication. This is a great opportunity for individuals, universities, and industry to create their own IP and have it manufactured. 

The deadline for submission is June 18. Submissions must be open source designs and leverage open source tooling such as OpenROAD, OpenLane and other EDA applications that Efabless.com has at its design portal. Read more about the project requirements and submit here: https://efabless.com/open_shuttle_program/2.

I look forward to seeing the community’s contributions for this generous offering from Efabless.com and Google.

Modular, Open-source FPGA-based LPDDR4 Test Platform

This post was originally published at Antmicro.

The flexibility of FPGAs makes them an excellent choice not only for parallel processing applications but also for research and experimentation in a range of technological areas.

We often provide our customers with flexible R&D platforms that can be easily adapted to changing requirements and new use cases as a result of our practice of using open source hardware, software, FPGA IP and tooling.

As an example of such activity, we have recently been contracted to develop a hardware test platform for experimenting with memory controllers and measuring vulnerability of various LPDDR4 memory chips to the Row hammer attack and similar exploits.

LPDDR4 test platform

Modular and cost-optimized

Targeting high-volume customer-facing devices where size, power use and unit cost are a priority, LPDDR4 does not come in the form of modules, while the hardware tools and software frameworks for testing it can be prohibitively expensive.
Despite efforts to mitigate the Row hammer exploit, a number of memories available on the market remain vulnerable to the problem, which calls for a test platform that would allow experimenting with memory chips and memory controllers to devise new mitigation techniques.

Another issue is that preexisting work mostly relies on proprietary memory controllers which cannot be adapted to specific memory access patterns that trigger Row hammer.

To address this need, we have created a fully open source flow, including Enjoy Digital's open source LiteDRAM memory controller, for which we implemented LPDDR4 support to enable testing LPDDR4 memory chips.

What our customer needed was a flexible platform for developing security measures that would be cost-optimized for high volume production.

To accomplish that we’ve built a modular device that consists of the main test board and a series of easily swappable testbeds for different memory types, the first of which is already available on our GitHub.

What is more, thanks to being open source, the platform enables various research teams to combine efforts and work collaboratively on coming up with new attacks and mitigations, as well as fully reproduce the results of the work.

LPDDR4 test module

Experimenting reliably on a robust platform

The platform is based on a Xilinx Kintex-7 FPGA and features several I/O options: HDMI, which can be used for processing video data and experimenting with streaming and HDMI preview applications featuring RAM; USB for uploading your bitstream or debugging; as well as an SD card slot and GbE.

There is also an additional 64 MB of on-board HyperRAM that enables safe experimentation with interchangeable RAM chips under extreme conditions.

With Antmicro’s commercial development services the platform can be customized to meet your specific requirements, while the open source character of the solutions we use gives you full control over the product and vendor independence.

We help our customers build complicated FPGA solutions, embrace the dynamically growing open source tooling ecosystem and develop various technologies that allow developers to work more efficiently across the whole FPGA spectrum.

GitHub Actions Self-hosted Runners, Build Event Server and Google Cloud

This post was originally published at Antmicro.

Continuous Integration and smart lifecycle management are key for high-tech product development, which is often a complex and multi-faceted process that requires automation to be efficient and failure-proof. At Antmicro, we've been creating various open source cloud and hybrid cloud solutions for our customers, helping them to encapsulate the complexity of their software stack. Lots of those projects cross the hardware/software boundary and involve a mix of open source and proprietary code, which means that fine-grained control of the CI setups is needed to make them work.

To provide the level of flexibility that we and our customers require, we often find ourselves working extensively on the underlying CI infrastructure, building open source solutions that can scale between organizations and teams. One such project involved creating a custom, local GitHub Actions runner, with containerized builds, support for Google’s Build Event Server and workload measurement and analytics; in collaboration with Google we then also enabled running an identical setup with the extra capabilities in Google Cloud.

Self-hosted runner diagram

Custom runner, more applications

GitHub is the world’s largest open source code sharing space, home to many of our open source projects such as Renode, the open source FPGA toolchain SymbiFlow or our open source ASIC development-focused SystemVerilog work. GitHub Actions – used by millions of developers worldwide – is a natural choice for those projects as the go-to CI flow. However, by default it provides compute resources – in the CI world traditionally called runners – with a specific hardware configuration which does not always fit the needs of the workloads that we deal with.

Especially in our work that involves ASIC and FPGA development flows – working towards enabling fully open source chip and IP design in collaboration with our customers and fellow CHIPS Alliance members such as Google, Western Digital and QuickLogic – we find ourselves needing hybrid setups which would allow us to keep the code as well as the CI definitions public while being able to rely on internal infrastructure to do the heavy lifting. Long-running builds involving tools like VtR or OpenROAD that use lots of memory and CPU power can greatly benefit from the flexibility that comes with the use of custom, self-hosted runners, and this solution also gives you a high degree of freedom in terms of integrating your runner with external hardware or tools that can't be shared publicly. The latter is especially helpful in some of our other open source projects for things like benchmarking RISC-V, OpenPOWER and other cores or tracking the QoR for your FPGA designs. The Quality of Results and flexible Continuous Integration elements are extremely important for the custom engineering projects we embark on, which typically integrate a variety of open source components together – fortunately, the open source nature of the predominant part of the tools we use makes such work much easier.

Virtual machines, distant-bes and Google Cloud integration

Our internal and our customers' needs have called for the ability to integrate on-premise runners into our GitHub CI flows, which can be done using the GitHub runner project. For many of our projects, we provide flexible development infrastructure based on open source that allows us to better collaborate around shared code, and to do that, we need to be able to scale compute resources between the private and public cloud. To reach feature parity with some of our internal infrastructure, we also extended the self-hosted runners with some extra features.

Firstly, the custom runners developed as part of the project can be used with our distant-bes framework to push results in the Build Event Server format to custom results viewers transparently to the CI run itself. You can see an example of how this works in the symbiflow-examples repository. Secondly, we modified the runner so that instead of running the CI script on bare metal, it spawns virtual machines and performs the run steps inside them, collects results, and kills the machine, without changing the state of the host system’s kernel. This also allows us to gather performance metrics to see what the real utilization of the runner’s resources is – and we push those results in the form of graphs to our BES server.

Runner's resource utilization

Lastly, based on the needs of several of our collaborative open source projects with Google, we pursued yet another goal, namely, instantiating our self-hosted runners in Google Cloud, which enables our CI to spin powerful servers up and down on demand. This mix of robust internal infrastructure and always-available, scalable on-demand Google Cloud resources is very useful for heavy workloads run by multiple organizations. In the world of collaborative development in forums like the CHIPS Alliance and RISC-V International, this is no longer a nice-to-have, but a necessity.

Goings-on in the FuseSoC Project and Other Open Source Silicon Related News

This post was originally published by Olof Kindgren.

FOSSi Fever 2020

2020 was a year with a lot of bad news and so it feels slightly strange to cheerfully write about a very specific topic in the light of this. But there will always be good and bad things happening in the world. So let's keep fighting the bad things and for now take a look at what happened last year within the amazing world of open source silicon. I will start by mentioning the most significant, but by no means the only, milestones for the FOSSi movement as a whole and then take a more personal look at the work where I have been directly involved.

OpenMPW

The biggest story within free and open source silicon this year has undoubtedly been the openMPW project involving Google, SkyWater Foundries and eFabless together with a number of other collaborators.

Ever since I got involved in open source silicon ten years ago, building a fully open source ASIC has been one of those big milestones. While we have had FOSSi IP cores taped out on chips for at least 20 years and parts of the flow managed by open source tools, it has always seemed to be too much work, with too many gaps to fill in, to have a fully open source end-to-end flow to produce ASICs. But over the years people all over the world have filled in the gaps and done the work bit by bit. Sometimes in the context of overarching programmes to advance open source silicon, sometimes in academic settings, sometimes coming from industry and sometimes as completely unpaid hobby projects. And this year all these efforts came together, helped by funding, to produce four shuttle runs, each loaded with 40 different completely open source designs. The first of these runs is currently being fabricated, and it will be extremely interesting to see the chips coming back.

One of the final pieces in this puzzle was the PDK. And while SkyWater should be rightfully lauded for their decision to open up their 130nm PDK, it begs the question: why on Earth did it take this long? What could the fabs possibly have to lose by doing this? What they gain is easy to answer: a completely new market of users who can create chips at their fab. According to people within the project, it's estimated that 75% of the people on the first shuttle define themselves as software engineers. It's very likely that none of these people would ever dream of making an ASIC without this possibility. So I kind of feel that making the EDA industry open up their formats is a bit like trying to get your kids to eat vegetables. There's a lot of groaning and complaining, but was it really all that bad in the end to get some nutrition? Or in the case of ASIC fabs, was it really all that horrible to release your PDK to get some more customers? Let's just hope this opens up the eyes of more fabs. My, and many others', dream is to eventually see the same thing happening to ASIC fabs as has been happening with cheap PCB services over the past ten years. And using that analogy, I'm quite sure it pays off to be early in the race. So, get started folks!

QuickLogic and SymbiFlow

The other big thing happening this year is that we finally have an FPGA vendor shipping an open source toolchain for their devices. The company that will go down in the annals of history for being the first to do this is QuickLogic, with their EOS S3 FPGA. This is by no means the first FPGA with an open toolchain, and the QuickLogic-flavored version of SymbiFlow developed by FOSSi veterans Antmicro is based on all this prior work. But it is the first time we see a toolchain being created from the FPGA manufacturer's specifications rather than being figured out from compiled FPGA binaries, and it's the first toolchain that is supported and funded by the vendor rather than being at best tolerated by them. And again I must ask, why did it take so long for this to happen? If I was running a small FPGA startup with limited resources, I can't for the life of me understand why I would want to spend a lot of time and money to build and continuously maintain a big unwieldy toolchain all by myself instead of adding the required device-specific bits to a known good open source toolchain and sharing the maintenance burden. If nothing else it would free up resources to build other value-add products on top of the tools. It's as if every vendor of computer systems would first build their own operating system and compiler before shipping their products. This is what we had in the 80s and abandoned for very good reasons. Because it made absolutely no one happy. And you know what? I think the users of FPGAs should put more effort into pushing their vendors to support open source toolchains, because it will save everyone a heap of time and money.

Let me illustrate that last point with an example that actually happened when I was porting SERV to the QuickLogic devices. After synthesis I noticed that it used far more resources than expected. Looking at the synthesis logs I realized the memories in the design weren’t mapped to on-chip SRAM. So I asked the toolchain developers about this. They pointed me to the file in the toolchain that contained the rules for mapping to SRAM. I quickly found a badly tuned parameter, changed it to a more sensible value and ten minutes later it was working fine. An hour later I had submitted a patch back to the toolchain that fixed the problem for everyone else who would encounter it.

Let's break this down into numbers. Finding the cause of the bug took about 15 minutes. Fixing it, another five. At that point I could use it myself, but after spending another 15 minutes or so, it was also fixed for everyone else.

Now let’s do the same exercise for a proprietary closed source toolchain. Finding the cause of the bug takes… well…it depends… Let me explain.

I started my professional career at a company which at that point was the world's largest FPGA buyer. Whenever we had problems, they flew in two FAEs to sit in our lap, they could provide us with custom internal builds of their tools and they generally tried to make sure the problem quickly went away so that we would continue to buy FPGAs from them. However, most companies are not the world's largest FPGA buyer and do not get this treatment. Instead you will have to wade through layers of support people until you reach someone who is actually qualified enough to acknowledge the issue. I have been in this situation numerous times and would estimate this process usually takes around 2-3 months. Actually fixing the bug probably takes five minutes or so in this case too, but here comes the fun part. In most cases you will now have to wait, I don't know, a year or so until the fixed bug ends up in a released product that you can download. What happens in practice is that the user tends to find a workaround instead. In the example above, the likely solution would be to instantiate a RAM macro instead of relying on inference. This however doesn't come for free, as it requires finding all the instances where this is a problem and adding special handling for those cases, which results in a larger code base with more options to verify and maintain. This costs time and this costs money. So the moral of this story is that closed source tools are more expensive for everyone involved, and users of FPGAs should get better at telling the FPGA vendors that they are done with this closed source nonsense.
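For readers unfamiliar with the inference being discussed, the sketch below shows the kind of generic, synchronous-read memory description that synthesis rules try to map onto on-chip SRAM blocks rather than individual flip-flops; it is purely illustrative and is neither the SERV code nor the SymbiFlow mapping rule mentioned above.

module inferred_ram #(parameter int AW = 10, DW = 32) (
  input  logic          clk,
  input  logic          we,
  input  logic [AW-1:0] addr,
  input  logic [DW-1:0] wdata,
  output logic [DW-1:0] rdata
);
  logic [DW-1:0] mem [0:(1 << AW) - 1];

  always_ff @(posedge clk) begin
    if (we)
      mem[addr] <= wdata;
    rdata <= mem[addr];  // registered read is the usual block RAM inference pattern
  end
endmodule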

The QuickFeather. First FPGA board to ship with a FOSSi toolchain

There are numerous other news and projects that are well worth mentioning, but the above two are milestones that we have been waiting for a long time, so they deserved special attention. And if you want to keep up with the latest happenings in open source silicon, I highly recommend subscribing to the El Correo Libre newsletter which does a fantastic job of providing an overview of what goes on in all corners of the world. So let’s move on to some of my more personal victories that aren’t necessarily mentioned in other people’s year in review.

When I am working on and talking about open source silicon I am often wearing many hats because I’m associated with several different organizations. Luckily they are all pretty much aligned on this topic which makes things far easier. But these organizations also have different motives and goals so I would like to mention a few words about them here.

Qamcom

My day job is working for Qamcom Research & Technology, and there too there has been more FOSSi work than usual, which I think is a good indication that open source silicon is becoming increasingly common in chip design in general. The year started off by finishing up some work on SweRVolf together with a couple of my Qamcom colleagues. SweRVolf is a project under CHIPS Alliance, an organization that Qamcom has been part of since 2019 to help improve the state of open source and custom silicon. After that I was pulled into a project for doing climate research with a huge radar system. My task here was to handle sub-nanosecond time synchronization between systems located hundreds of miles from each other, using the White Rabbit system developed at CERN. I was pretty excited about getting to know White Rabbit. The timing section at CERN responsible for the White Rabbit project and associated technologies is a household name within open source silicon and has a long history here. I know many of the people working there personally and have great respect for their work, but I hadn't had the chance until now to actually get down and dirty with the technology. Once my job was done there I moved on to another proprietary project that I can't discuss here. I do get to use FuseSoC and Verilator though, so it's fine 🙂

FOSSi Foundation

The hat I tend to wear most when the topic revolves around open source silicon is my FOSSi Foundation director hat. And despite doing a lot less of the things we normally spend time on, it turns out we did a lot of other great things instead. I will however not go into more detail here, but instead point to the excellent summary written by my FOSSi Foundation colleague Philipp Wagner.

RISC-V

Arguably the most well-known project nowadays with ties to open source silicon is RISC-V. My RISC-V ties deepened in the beginning of the year when I was asked to become a RISC-V ambassador. Part of being an ambassador is to create awareness of RISC-V in the fields where I'm active. For well-known reasons the number and nature of events were a bit different from previous years, which meant fewer opportunities to wield this new-found power, but I did participate in an ask-the-expert session during the RISC-V global forum and a couple of other events that will be described later. And I also got to be interviewed about RISC-V and open source silicon by Sweden's largest electronics and tech news outlets as well as the Architecnologia blog.

Current crop of RISC-V Ambassadors; AKA the Twantasic 12

In addition to my day job and participating in different organizations I also run a bunch of open source projects, so let’s take a look at the progress of the most important ones during 2020.

SweRVolf

SweRVolf is an extendable and portable reference SoC platform for the Western Digital / CHIPS Alliance SweRV cores. SweRVolf is designed for software engineers who want a turn-key system to evaluate SweRV performance and features, for system designers who want a base platform to build upon, and for learners of SoC design, computer architecture, embedded systems or open source silicon methodology. To easily achieve the goals of portability and extendability it is powered by FuseSoC, which just happens to be one of my other open source silicon projects mentioned later on.

During 2020 SweRVolf gained support for booting from SPI Flash, but most effort was spent on usability: making it rock-solid, improving documentation, increasing compatibility with more EDA tools, keeping the underlying cores up-to-date and following along with changes in the Zephyr operating system, which is the officially supported software platform for SweRVolf.

But the biggest thing to happen to SweRVolf this year is that it will be used as the base of a new university course from the Imagination University Programme called RVfpga: Understanding Computer Architecture. I'm very excited (and slightly scared) about soon having thousands of students getting familiar with computer architecture, RISC-V and open source silicon through a SoC that I have designed. And I would like to mention a few things about how SweRVolf is built, because I think it's a great example of how to create chip designs. When I say that I have designed SweRVolf, most of the work has consisted of putting together various pieces and making sure they work well as a whole. Most of the underlying code has been written by other people, and from my perspective, that is really the most successful aspect of SweRVolf because it highlights the rich open source silicon ecosystem. The main CPU core is from Western Digital and governed by CHIPS Alliance. Most of the AXI infrastructure was developed through the PULP project at ETH Zürich and the University of Bologna. The UART and SPI controllers were developed for the OpenRISC project during the first wave of open source silicon almost 20 years ago. The Wishbone infrastructure was developed by me when I started out with open source silicon ten years ago, and the memory controller was created by Enjoy Digital and is written in Migen as part of the LiteX ecosystem. And to go full circle, the memory controller uses a tiny RISC-V CPU called SERV internally to aid with calibration. SERV, the world's smallest RISC-V CPU, is written by me. Small world. And of course the whole project is packaged with FuseSoC and uses Verilator by default for simulations, so it's FOSSi all the way. As I hope you understand by now, it's not about some lone hero churning out code; all this has been made possible by a huge amount of work by a ton of people over many years, and I'm proud to be one of them.

SERV

Probably the hobby (read: unpaid) project I spent most time on during 2020 was SERV, the world's smallest RISC-V CPU, which turned from small to even smaller during the year. SERV is very much a project driven by numbers, so let's look at some of those numbers.

In February I got hold of a ZCU106 development board with a huge Xilinx UltraScale+ FPGA for a project I was assigned to. As this was the largest FPGA I had ever had in my home, I got curious to see how many SERV cores I could squeeze into it. The year before, at the 2019 RISC-V workshop in Zürich, I had done a presentation on how to fit 8 RISC-V cores in a small Lattice iCE40 FPGA (spoiler: it ended up being slightly more than 8 eventually), giving each of them a single I/O to communicate with the outside world. The problem this time was that after stuffing in 360 cores I ran out of I/O pins. It would also have been practically impossible to verify that all these external pins actually did what they were supposed to do, so I needed some way of using less than one I/O pin per core. Then it struck me that just a few months earlier I had created a heterogeneous sensor aggregation platform based on SERV cores called Observer. The idea with Observer was to connect a lot of sensors to an FPGA, each serviced by its own SERV core, and then merge the data into an output stream. I gave up on the platform when I realized that while I could fit a lot of SERV cores into the devices, I just had a few sensors, so there wasn't much data to aggregate. But this platform was a very good starting point.

Block diagram of the Observer platform

By removing all sensor interfaces and just having each core print out an identification message instead, I had a system that I could instantiate with any number of cores. Trying this on the ZCU106 I could now run over 600 cores on the FPGA. The next problem was that I was running out of on-chip RAM way before any of the other FPGA resources. In case you don't know, most FPGAs contain a number of fixed-size SRAMs spread out over the device, each typically being 1-8 kB large. For SERV, each core used one for the RF (register file) and another one for the program/data memory. With RISC-V having 32 32-bit registers, only 128 bytes of the RF RAM are used, but since the fixed-size SRAMs on FPGAs are typically far larger than that, most of the RAM ends up unused. That's bad, but I had a plan. With a bit of work I managed to share RF, program and data in the same RAM, with the RF allocated to the top 128 bytes of the RAM. This freed up half of the on-chip RAM blocks and I could eventually hit 1000 SERV cores on the ZCU106 board. Of course, at this point I was curious to see what the situation was for other boards after all these optimizations. Taking it one step further I figured I should turn this into a real thing by creating a benchmark so that people can have a quick way to see roughly how large the FPGAs are on different boards. And with that ServMark was born.

ServMark lasted for about three minutes until I realized CoreScore was a much catchier name, so that's what we have now.

I had originally planned to do a presentation about SERV at Latch-Up in Boston. But for well-known reasons we cancelled all FOSSi Foundation physical events. Instead I accepted an invitation to speak at the first virtual RISC-V Munich meetup. By this point I had attended a couple of virtual meetups and I hated it. It was in most cases awful to watch a narrated slide deck without a stage and a speaker to bring it to life at least a bit. So, I decided to take a fresh look at the possibilities instead of being limited by the medium. I decided to make videos instead. First of all, people would watch on a computer screen with proper resolution instead of a washed-out image projected on a canvas. This meant I could have much more detailed pictures and smaller font sizes. I could also freely mix pictures and animations, fine-tune timings, do several takes of the audio and add sound effects. And despite being done by someone who has pretty much zero experience in this sort of thing, I think it turned out pretty well. So on the day of the event, I just introduced myself and let the audience indulge in my fully immersive multimedia edutainment experience about SERV. Not sure why the Oscars committee hasn't gotten in touch yet.

Is CoreScore the only attempt to put a mind-boggling number of RISC-V cores inside chips? Absolutely not, and during the summer I learned of the Manticore project from Florian Zaruba and Fabian Schuiki (jointly known as Flobian Schuba) from ETH Zürich, both well-known names in the FOSSisphere from their work in the PULP ecosystem. Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing had been accepted into the prestigious Hot Chips conference. Manticore is an impressive project, but I still thought it was a bit unfair that I wasn't invited as well. So I reached for the biggest FPGA board I could find and then wrote my first academic paper of the year, called Plenticore: A 4097-core RISC-V SoClet Architecture for Ultra-inefficient Floating-point Computing. Unfortunately I did not receive an invitation to Hot Chips despite this. I assume it must have gotten lost in the mail somewhere.

Oh well, let’s look at some more numbers instead.

  • A number of optimizations were found over the year which, depending on the measure, further shrank the core by 5-10%.
  • The number of supported FPGA boards for the servant SoC grew from 4 to 17, mostly thanks to other contributors (thanks everyone, love you all!)
  • The SERV support for the Zephyr operating system was rewritten and upgraded from the aging Zephyr 1.13 to 2.4, the latest version at the time of writing.
SERV resource usage over time on a Lattice iCE40 FPGA

Coming back to CoreScore, the results right now range from 10 cores on a smaller Lattice iCE40 device up to 5087 cores on a large Xilinx device and the high score table can even be viewed interactively online! If you can’t find your favorite board there, just send a PR and we’ll get it added. And if any people with access to crazy large FPGAs happen to read this, please get in touch with me. I’m very curious to see who will be first to get above 10000 cores.

SERV also saw another great improvement in the form of documentation. While not completely there yet, the functionality of most SERV modules has been documented together with detailed schematics showing the implementation down to the individual gates, muxes and flip-flops. And most changes to the source are now accompanied by a code comment to clarify what is going on. Since SERV is optimized for size rather than readability, many parts of the core are difficult to figure out by just looking at the source code. Hopefully this will make it easier for others who want to understand or work on the core, but frankly it has also been very useful for me, since I tend to forget why I did things in a certain way and have had to spend a lot of time following my own tracks.

Schematic of the SERV control unit from the SERV documentation

FuseSoC

The oldest of my open source silicon projects still going strong is FuseSoC. It is now about to turn ten years old and keeps growing in features and users every year. Looking back at the changes through 2020 I can see some new trends in the development. The most important one is that, for the first year ever, most of the work was not done by me. During 2020, my fellow FOSSi Foundation director and lowRISC employee Philipp Wagner has been pulling the heaviest load of FuseSoC development. And with Philipp came quality. Dr. Wagner has improved FuseSoC in pretty much every aspect. Bugs have been fixed and features have been added. The development experience has been improved by CI testing, automatic code formatting checks and improved testing coverage. And what makes me happiest is that the user experience has been improved, not least by a total rewrite of the documentation into something that is actually useful and can be proudly shown to the world. All this is very much needed as FuseSoC is becoming increasingly popular. It has already been picked up by many of the flagship open source silicon projects like OpenTitan, SweRVolf and OpenPiton, and with the RVfpga university programme there will soon be a whole new generation who will get familiar with it as well.

Edalize

In 2018, the part of FuseSoC that interacted with the EDA tools was spun off into Edalize. The reason was that it was believed this part could be useful for others who weren't interested in the whole FuseSoC package. This prediction seems to have been correct and Edalize has very much started a life of its own by now. In addition to FuseSoC, Edalize is now used by several other projects such as Silice, Clash and fpga-perf-tool, and over the year Edalize has gained support for 7 new EDA tool flows, bringing the total number up to 25.

2020 was also the year when Edalize had its first taste of being in the spotlight on its own merits. For the Workshop on Open-Source EDA Technology (WOSET) 2020 I decided to submit a presentation about Edalize. Being an academic conference, this also prompted me to write an accompanying paper, as is the common courtesy for these kinds of events. The paper received a lukewarm response but was accepted anyway. Once again I did not feel like reciting slides to a camera, so I turned back to my new-found interest in advanced multimedia productions. And it paid off. The Edalize video won an award for best video at WOSET 2020. Well done, Edalize!

LED to Believe

All of the above projects use FuseSoC and Edalize because – well, it's kind of why I created FuseSoC in the first place – they make it easy to reuse components and retarget designs to different devices. But I also realized there was a need for a dead simple project to help people get started with FuseSoC – the Hello World of silicon, so to speak. And the Hello World of silicon is of course the blinking LED. So in 2018 I created project LED to Believe with the ambitious goal of creating FuseSoC-powered LED blinkers for every FPGA board ever made. The project has several aspects that are useful in different ways. It serves as a very simple introduction to FuseSoC and to making a design that targets multiple pieces of hardware. It is also an excellent pipe cleaner for when you receive a new board. If you can run the project successfully and get the LED to blink, it likely means you have managed to install all the EDA tools correctly, which is no small feat, and you also have a template for taking on bigger projects. And it's also fun to see what boards are available out there. While I have submitted a bunch of the board ports myself, the vast majority have come from all the fantastic contributors out there. And during 2020 the number of supported boards grew from 16 to 44. Perhaps not all the FPGA boards ever made, but a considerable chunk of them. And already in the short amount of 2021 that has passed, there have been numerous more contributions, so we're getting closer all the time.
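For the curious, the Hello World in question boils down to something like the following generic blinker – a minimal SystemVerilog sketch written for illustration, not a copy of any particular port from the LED to Believe repository (board ports differ in clock frequency and pin constraints):

module blinky #(parameter int unsigned CLK_HZ = 50_000_000) (
  input  logic clk,
  output logic led
);
  logic [$clog2(CLK_HZ)-1:0] counter = '0;
  logic led_state = 1'b0;

  assign led = led_state;

  always_ff @(posedge clk) begin
    if (counter == CLK_HZ - 1) begin
      counter   <= '0;
      led_state <= ~led_state;  // toggle roughly once per second
    end else begin
      counter <= counter + 1;
    end
  end
endmodule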

In closing, 2020 was a busy year FOSSi-wise, and this has just scratched the surface of all the things that have been happening during the year. And just as we were about to close the books on 2020, I was informed that Lattice had incorporated one of my FOSSi projects into their shiny new award-winning Propel design suite. Which project, you might ask? Was it the similarly award-winning FuseSoC, to give Lattice users immediate access to a rich ecosystem of open IP cores? Or was it the Rosetta stone of Edalize, with its award-winning video, that would easily provide a coherent interface for a dozen simulators and make it easy to switch between Lattice's multitude of FPGA tools such as Diamond, iCEcube2 and Radiant? Or was it SERV itself, the award-winning CPU capable of offering a RISC-V experience for all but their absolutely smallest offerings? Well, actually, none of the above. It turns out that Propel now contains ipyxact, my somewhat feature-limited Python library for working with IP-XACT files. Not my first choice, but fair enough. I wonder if they have read about my somewhat complicated relationship with IP-XACT.

Finally my work is recognized by big EDA vendors (picture by Gatecat)