
Antmicro’s ARVSOM RISC-V Module Announced

This post was originally published at Antmicro.

We are excited to announce the ARVSOM – Antmicro’s fully open source, RISC-V-based system-on-module featuring the StarFive 71x0 SoC. Using the RISC-V architecture, which Antmicro has been heavily involved in since the early days as a Founding Member of RISC-V International, the SoM is going to enable unprecedented openness, reusability and functionality across different verticals.

ARVSOM

Open source-driven innovation

Since its conception, Antmicro has been enabling its customers to tap into the technological freedom that is inherent in open source. We’ve been developing cutting-edge industrial and edge AI systems using vendor-neutral and customizable solutions, as well as actively developing and contributing to the tooling ecosystem, improving processes and unlocking even more system design options – often in alignment with our efforts driving RISC-V International and CHIPS Alliance. Targeted at a range of different use cases and enabling unmatched flexibility, ARVSOM is the latest product demonstrating Antmicro’s expertise in creating practical and easily modifiable technologies using open source.

Based on StarFive 71x0

Sitting at the heart of ARVSOM is the StarFive 71x0 system-on-chip – the first Linux-capable RISC-V SoC intended for mainstream, general-purpose edge applications, expected to be used in millions of devices built on the open source RISC-V ISA. The SoC is also used in the BeagleV StarLight development platform (GitHub repo), which has just been shipped to beta users, including Antmicro, who will be verifying its robustness and reliability in real-life use cases while the development community focuses on further expanding RISC-V software support. The SoC features a dual-core U74 CPU from our partner and RISC-V pioneer SiFive, two AI accelerators (the open source NVDLA and SiFive’s Neural Network Engine), 1 MIPI DSI and 2 MIPI CSI interfaces, HDMI, Gigabit Ethernet, dual ISP and USB 3.0. The production version of the SoC, to be released later in the year, will also have PCI Express.

Compatible with Scalenode for server room applications

The SoM will be compatible with our server-oriented Scalenode platform, released earlier this month, originally designed as a baseboard for the Raspberry Pi 4 Compute Module. Used together, Scalenode and ARVSOM will enable building easily scalable and flexible infrastructures consisting of clusters of small RISC-V-based compute units, with as many as 18 Scalenode boards fitting in a 1U rack. ARVSOM will also work with other Raspberry Pi CM 4 baseboards, opening the door to a host of solutions and custom devices to be built with it right away.

ARVSOM in Renode

You can already start software development with our ARVSOM and the BeagleV StarLight SBC thanks to support in Renode, our open source simulation framework. The two have joined the array of platforms and sensors that Renode can simulate for system development and testing. You can freely co-develop hardware and software in Renode’s virtual environment, with the code behaving exactly the way it would on real hardware. Renode can also be used throughout the development lifecycle to enable continuous testing and integration, as well as for prototyping new solutions based on RISC-V, as it features extensive support for this open source ISA. You can read more about the latest updates in Renode in the version 1.12 announcement blog note.

Our SoM is in development and will be gradually unveiled throughout the year as StarFive 71x0 SoC availability increases.

ARVSOM can be customized and integrated into your project. We have been helping our customers embrace the groundbreaking RISC-V architecture, enabling them to reap the benefits of an advanced system based on vendor-neutral, future-proof and robust technologies. Get in touch with us at contact@antmicro.com if you need a modern product that you have full control over.

Dynamic Scheduling in Verilator – Milestone Towards Open Source UVM

This post was originally published at Antmicro.

UVM is a verification methodology traditionally used in chip design which has historically been missing from the open source landscape of verification-focused tooling. While new, open source approaches to verification have emerged, including the excellent Python-based Cocotb (which we also use and support) maintained by the FOSSi Foundation, not everyone can easily adopt it, especially in long-running projects and existing codebases that use a different verification approach. Leading the efforts towards comprehensive UVM / SystemVerilog support in open source tools, we have been gradually completing milestones, getting closer to what will essentially be a modular, collaboration-driven chip design methodology and workflow. Some examples of our activity in this space include enabling open source synthesis and simulation of the Ibex CPU in Verilator/Yosys via UHDM/Surelog, and the most recent joint project with Western Digital in which we have developed dynamic scheduling in Verilator.

Verilating non-synthesizable code

The work is part of a wider, long-term effort that, besides Western Digital, has involved other fellow CHIPS Alliance members such as Google, to develop UVM / SystemVerilog support in open source tools. To get closer to this goal, this time we have introduced an improvement to Verilator – an open source tool that is great at what it was originally designed to do, namely simulating synthesizable code, but which has not allowed its users to convert non-synthesizable code and run fully fledged testbenches written in Verilog/SystemVerilog. The biggest challenge in opening up that possibility turned out to be Verilator’s original scheduler, which runs everything sequentially – in general not a bad approach, but one that can only get you so far without actually executing the code in parallel.

Dynamic scheduling in Verilator

To run proper UVM testbenches in Verilator, we had to be able to properly handle language constructs specifically designed for use in simulation. Those features include delay statements, forks, wait statements and events. To achieve all of this, we needed to add a proof-of-concept dynamic scheduler to Verilator.

How dynamic scheduling in Verilator works

Instead of grouping code blocks of the same type as the original scheduler does, the dynamic scheduler separates the code into many runnable blocks and then spawns a process for each of them when they are due to be executed. In practice this translates to a new high-level thread for each initial block, always block, forever block, each branch of a fork statement etc. This allows us to pause the execution of one block and continue the execution of others. Blocks can be paused when they encounter simulation control statements like the following (a combined example is shown after this list):

  • The wait statement (e.g. wait (rdy == 1); or wait (event_name.triggered))
  • Waiting for an event (e.g. @event_name;)
  • A delay (#10;)
  • Or even a join statement after a fork (join or join_any – join_none is also supported, however it is not used to control the execution per se)
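
To make this concrete, here is a minimal, illustrative piece of non-synthesizable SystemVerilog combining these constructs – our own sketch, not code taken from the Verilator test suite. Under the dynamic scheduler, each branch of the fork becomes an independently suspendable process:

```systemverilog
// Illustrative only: a small non-synthesizable testbench mixing the
// constructs listed above. Each fork branch runs as its own process.
module flow_control_tb;
  logic rdy = 0;
  event done;

  initial begin
    fork
      begin
        #10;             // delay: suspends this process for 10 time units
        rdy = 1;
      end
      begin
        wait (rdy == 1); // wait statement: resumes once rdy becomes 1
        ->done;          // trigger the event for the third process
      end
      begin
        @done;           // event wait: resumes when 'done' is triggered
        $display("all three processes synchronized at t=%0t", $time);
      end
    join                 // blocks until all three branches have finished
    $finish;
  end
endmodule
```

With the original static scheduler, none of the four blocking points above could suspend a process mid-block; the ability to do so is exactly what the dynamic scheduler adds.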

None of the statements listed above are supported by Verilator’s original scheduler, so to be able to dynamically control the execution of those dynamically spawned sub-processes, a way to react to changes in signal/event states needed to be implemented. We did it by wrapping the primitive types used for storing signal values in objects at a higher abstraction level, which allows us to attach other processes that monitor the values of specific signals and execute an arbitrary callback when a signal changes. Note that this is a mechanism used internally to implement the flow control statements and is not visible to the user writing Verilog code.

One additional thing that had to be re-written and adapted to the dynamic scheduler was the way non-blocking assignments are handled.

With a scheduler that is aware of simulation time and other flow control statements, the approach originally used by Verilator could not be applied anymore. A way to properly schedule assignments and execute them at the right moment was a crucial step in getting all of this to work.
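
To see why this is crucial, consider the classic register swap – a generic SystemVerilog illustration rather than Verilator-specific code. It only produces the right result if non-blocking updates are deferred until all right-hand sides in the current time step have been evaluated:

```systemverilog
// Generic example of non-blocking assignment semantics: both
// registers must pick up each other's *pre-clock-edge* values,
// so the scheduler has to defer the actual updates until after
// all right-hand sides for the time step have been evaluated.
module nba_swap;
  logic clk = 0;
  logic a = 0, b = 1;

  always #5 clk = ~clk;

  always @(posedge clk) a <= b; // reads the old value of b
  always @(posedge clk) b <= a; // reads the old value of a

  initial begin
    @(posedge clk);
    #1 $display("a=%0d b=%0d", a, b); // prints a=1 b=0: a clean swap
    $finish;
  end
endmodule
```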

Note that the features described above are the ones that are immediately visible to users of Verilator with the dynamic scheduler. This is, however, only the tip of the iceberg when it comes to the amount of internal changes that had to be made in Verilator.

Example usage

To present the usage of all the new Verilator features we have created a public GitHub repository with a number of example designs and GitHub Actions-based CI simulating them.

The examples range from single-feature demonstrations to more complex scenarios, such as a simulated system with two UARTs sending data to each other.

The UART testbench example uses all the newly added features. It spawns a thread for every simulated UART block, and each thread feeds the UART IP under test with predefined data (the “hello world” string in our example).

The UART block reads the data from an AXI stream interface, serializes it and sends it over the TX line.

The received data is available on the output AXI Stream bus.

Each testbench thread introduces a random delay between consecutive data chunk transmissions.

The threads are synchronized using SystemVerilog events – e.g. once the data is fed into the UART block over the AXI stream interface, the feeding thread triggers an event informing the other thread that the data is being transmitted by the IP. Once transmission is done, another event is triggered, informing the first thread that the IP is ready for a new data chunk.

Once all the test data is transferred between both tested UART instances, the threads are joined and data correctness is validated.
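
The handshake described above boils down to the following simplified sketch. The module and event names and the timing here are hypothetical, and the actual AXI Stream and UART signal driving (available in the repository) is elided into comments:

```systemverilog
// Simplified, hypothetical sketch of the event handshake described
// above; the real testbench drives actual AXI Stream and UART pins.
module uart_handshake_tb;
  event data_sent; // producer -> monitor: a chunk was fed to the IP
  event ip_ready;  // monitor -> producer: transmission has finished

  string message = "hello world";

  initial begin
    fork
      // Producer thread: feeds one data chunk at a time
      for (int i = 0; i < message.len(); i++) begin
        #($urandom_range(1, 100));  // random inter-chunk delay
        // ...drive the byte onto the AXI Stream interface here...
        ->data_sent;
        @ip_ready;                  // wait until the IP sent the chunk
      end
      // Monitor thread: tracks the TX line of the UART IP
      forever begin
        @data_sent;
        // ...wait for the serialized frame on the TX line here...
        ->ip_ready;
      end
    join_any // the for-loop branch finishing ends the test
    $display("transfer complete");
    $finish;
  end
endmodule
```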

Next steps, future goals

This ongoing project is a big step towards UVM support in open tools, as it not only removes a number of limitations in this area but also opens the door to future developments – including a number of verification-oriented SystemVerilog features which were earlier out of scope, since open source UVM verification wasn’t on the horizon. Together with the effort to support SystemVerilog in Verilator using UHDM, which is part of another ongoing project, within CHIPS Alliance we continue to work towards enabling open source design verification for everyone, which will revolutionize the way chips are built.

New MPW-TWO Program Will Provide Fabrication For Fully Open Source Projects

By Rob Mains, General Manager of CHIPS Alliance

CHIPS Alliance is excited to announce that the hardware development community can submit their open source design projects to Efabless.com for space on their forthcoming shuttle. This opportunity comes after the success of having 40 submissions for the MPW-ONE shuttle; 60% of those designs were submitted by first-time ASIC designers. MPW-TWO is the second Open MPW Shuttle providing fabrication for fully open-source projects using the SkyWater Open Source PDK announced by Google and SkyWater. 

The shuttle gives designers the freedom to innovate without having to worry about the risks associated with the cost of fabrication. This is a great opportunity for individuals, universities, and industry to create their own IP and have it manufactured. 

The deadline for submission is June 18. Submissions must be open source designs and leverage open source tooling such as OpenROAD, OpenLane and other EDA applications that Efabless.com has at its design portal. Read more about the project requirements and submit here: https://efabless.com/open_shuttle_program/2.

I look forward to seeing the community’s contributions for this generous offering from Efabless.com and Google.

Modular, Open-source FPGA-based LPDDR4 Test Platform

This post was originally published at Antmicro.

The flexibility of FPGAs makes them an excellent choice not only for parallel processing applications but also for research and experimentation in a range of technological areas.

We often provide our customers with flexible R&D platforms that can be easily adapted to changing requirements and new use cases as a result of our practice of using open source hardware, software, FPGA IP and tooling.

As an example of such activity, we have recently been contracted to develop a hardware test platform for experimenting with memory controllers and measuring the vulnerability of various LPDDR4 memory chips to the Row hammer attack and similar exploits.

LPDDR4 test platform

Modular and cost-optimized

Targeting high-volume customer-facing devices where size, power use and unit cost are a priority, LPDDR4 does not come in the form of modules, while the hardware tools and software frameworks for testing it can be prohibitively expensive.
Despite efforts to mitigate the Row hammer exploit, a number of memories available on the market remain vulnerable to the problem, which calls for a test platform that allows experimenting with memory chips and memory controllers to devise new mitigation techniques.

Another issue is that preexisting work mostly relies on proprietary memory controllers which cannot be adapted to specific memory access patterns that trigger Row hammer.

To address this need, we have created a fully open source flow built around Enjoy Digital’s open source memory controller LiteDRAM, for which we implemented LPDDR4 support to enable testing LPDDR4 memory chips.

What our customer needed was a flexible platform for developing security measures that would be cost-optimized for high volume production.

To accomplish that we’ve built a modular device that consists of the main test board and a series of easily swappable testbeds for different memory types, the first of which is already available on our GitHub.

What is more, thanks to being open source, the platform enables various research teams to combine efforts and work collaboratively on coming up with new attacks and mitigations, as well as fully reproduce the results of the work.

LPDDR4 test module

Experimenting reliably on a robust platform

The platform is based on a Xilinx Kintex-7 FPGA and features several I/O options: HDMI, which can be used for processing video data and experimenting with streaming and HDMI preview applications that exercise the RAM, USB for uploading your bitstream or debugging, as well as an SD card slot and GbE.

There is also an additional 64 MB of on-board HyperRAM that enables safe experimentation with interchangeable RAM chips under extreme conditions.

With Antmicro’s commercial development services the platform can be customized to meet your specific requirements, while the open source character of the solutions we use gives you full control over the product and vendor independence.

We help our customers build complicated FPGA solutions, embrace the dynamically growing open source tooling ecosystem and develop various technologies that allow developers to work more efficiently across the whole FPGA spectrum.

CHIPS Alliance and RISC-V International Invite the RISC-V Community to Participate in Updating a New Unified Memory Architecture Standard

New joint working group will enhance the OmniXtend Cache Coherency architecture

SAN FRANCISCO, March 24, 2021 – RISC-V International, a non-profit corporation controlled by its members to drive the adoption and implementation of the free and open RISC-V instruction set architecture (ISA), and CHIPS Alliance, the leading consortium advancing common and open hardware for interfaces, processors and systems, today announced a joint collaboration to update the OmniXtend Cache Coherency specification and protocol, along with building out developer tools for OmniXtend.

As part of this collaboration, RISC-V International and CHIPS Alliance have formed a new OmniXtend working group which will focus on creating an open, cache coherent, unified memory standard for multicore compute architectures. The group will update the OmniXtend specification and protocol, build out architectural simulation models and a reference register-transfer level (RTL) implementation, as well as create a verification workbench. These tools for an open, standard unified memory coherency bus leveraging OmniXtend will make it easier for designers to take advantage of OmniXtend for data-centric applications.

“As RISC-V International develops implementation independent specifications and ecosystem components, it is an important priority for us to ensure that whatever we develop will work with emerging and established standards. The joint working group will interact with various RISC-V groups to review the OmniXtend protocol with an emphasis on cache management and paying close attention to coherency enablement for RISC-V members,” said Mark Himelstein, CTO at RISC-V International. “As a result of this joint effort, the RISC-V community will have the tools they need to leverage an open, coherent, unified memory standard for all types of data-centric applications.”

“The newly formed OmniXtend working group will set the standard for open, coherent heterogeneous compute architectures. We plan to allow for a mixture of hardware IP blocks, giving developers more design flexibility so they can choose what works best for their specific application needs,” said Rob Mains, General Manager at CHIPS Alliance. “We encourage the RISC-V community to get involved in this important initiative which will open new design possibilities with OmniXtend.”

Dejan Vucinic of Western Digital will be giving a talk on OmniXtend at the CHIPS Alliance Spring Workshop on March 30, 2021. The event will also cover the AIB chiplet ecosystem, SWeRV Core support, FPGA tooling and much more. To register for this free virtual event, please visit: https://events.linuxfoundation.org/chips-alliance-spring-workshop/register/.

To learn more about the OmniXtend working group, please visit: https://lists.chipsalliance.org/g/riscv-omnixtend-wg.

About RISC-V International

RISC-V is a free and open ISA enabling a new era of processor innovation through open collaboration. Founded in 2015, RISC-V International is composed of more than 1,300 members building the first open, collaborative community of software and hardware innovators powering a new era of processor innovation. The RISC-V ISA delivers a new level of free, extensible software and hardware freedom on architecture, paving the way for the next 50 years of computing design and innovation.

RISC-V International, a non-profit organization controlled by its members, directs the future development and drives the adoption of the RISC-V ISA. Members of RISC-V International have access to and participate in the development of the RISC-V ISA specifications and related HW / SW ecosystem.

About the CHIPS Alliance

The CHIPS Alliance is an organization which develops and hosts high-quality, open source hardware code (IP cores), interconnect IP (physical and logical protocols), and open source software development tools for design, verification, and more. The main aim is to provide a barrier-free collaborative environment, to lower the cost of developing IP and tools for hardware development. The CHIPS Alliance is hosted by the Linux Foundation. For more information, visit chipsalliance.org.

About the Linux Foundation    

The Linux Foundation was founded in 2000 and has since become the world’s leading home for collaboration on open source software, open standards, open data, and open hardware. Today, the Foundation is supported by more than 1,000 members and its projects are critical to the world’s infrastructure, including Linux, Kubernetes, Node.js and more. The Linux Foundation focuses on employing best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, visit linuxfoundation.org.

###

GitHub Actions Self-hosted Runners, Build Event Server and Google Cloud

This post was originally published at Antmicro.

Continuous Integration and smart lifecycle management are key for high-tech product development, which is often a complex and multi-faceted process that requires automation to be efficient and failure-proof. At Antmicro, we’ve been creating various open source cloud and hybrid cloud solutions for our customers, helping them to encapsulate the complexity of their software stack. Lots of those projects cross the hardware/software boundary and involve a mix of open source and proprietary code, which means that fine-grained control of the CI setup is needed to make them work.

To provide the level of flexibility that we and our customers require, we often find ourselves working extensively on the underlying CI infrastructure, building open source solutions that can scale between organizations and teams. One such project involved creating a custom, local GitHub Actions runner, with containerized builds, support for Google’s Build Event Server and workload measurement and analytics; in collaboration with Google we then also enabled running an identical setup with the extra capabilities in Google Cloud.

Self-hosted runner diagram

Custom runner, more applications

GitHub is the world’s largest open source code sharing space, home to many of our open source projects such as Renode, the open source FPGA toolchain SymbiFlow and our open source ASIC development-focused SystemVerilog work. GitHub Actions – used by millions of developers worldwide – is a natural choice as the go-to CI flow for those projects. However, by default it provides compute resources – in the CI world traditionally called runners – with a specific hardware configuration which does not always fit the needs of the workloads that we deal with.

This is especially true of our work on ASIC and FPGA development flows – working towards fully open source chip and IP design in collaboration with our customers and fellow CHIPS Alliance members such as Google, Western Digital and QuickLogic – where we find ourselves needing hybrid setups which allow us to keep the code as well as the CI definitions public while relying on internal infrastructure to do the heavy lifting. Long-running builds involving tools like VtR or OpenROAD, which use lots of memory and CPU power, can greatly benefit from the flexibility that comes with custom, self-hosted runners, and this solution also gives you a high degree of freedom in terms of integrating your runner with external hardware or tools that can’t be shared publicly. The latter is especially helpful in some of our other open source projects, for things like benchmarking RISC-V, OpenPOWER and other cores or tracking the QoR of your FPGA designs. The Quality of Results and flexible Continuous Integration elements are extremely important for the custom engineering projects we embark on, which typically integrate a variety of open source components – fortunately, the open source nature of the predominant part of the tools we use makes such work much easier.

Virtual machines, distant-bes and Google Cloud integration

Our internal and our customers’ needs have called for the ability to integrate on-premise runners into our GitHub CI flows, which can be done using the GitHub runner project. For many of our projects we provide flexible development infrastructure based on open source that allows us to better collaborate around shared code, and to do that we need to be able to scale compute resources between the private and public cloud. To reach feature parity with some of our internal infrastructure, we also extended the self-hosted runners with some extra features.

Firstly, the custom runners developed as part of the project can be used with our distant-bes framework to push results in the Build Event Server format to custom results viewers, transparently to the CI run itself. You can see an example of how this works in the symbiflow-examples repository. Secondly, we modified the runner so that instead of running the CI script on bare metal, it spawns a virtual machine, performs the run steps inside it, collects the results and kills the machine, without changing the state of the host system’s kernel. This also allows us to gather performance metrics showing the real utilization of the runner’s resources – and we push those results in the form of graphs to our BES server.

Runner's resource utilization

Lastly, based on the needs of several of our collaborative open source projects with Google, we pursued yet another goal, namely, instantiating our self-hosted runners in Google Cloud, which enables our CI to spin powerful servers up and down on demand. This mix of robust internal infrastructure and always-available, scalable on-demand Google Cloud resources is very useful for heavy workloads run by multiple organizations. In the world of collaborative development in forums like the CHIPS Alliance and RISC-V International, this is no longer a nice-to-have, but a necessity.

Goings-on in the FuseSoC Project and Other Open Source Silicon Related News

This post was originally published by Olof Kindgren.

FOSSi Fever 2020

2020 was a year with a lot of bad news, so it feels slightly strange to cheerfully write about a very specific topic in light of this. But there will always be good and bad things happening in the world, so let’s keep fighting the bad things and for now take a look at what happened last year within the amazing world of open source silicon. I will start by mentioning the most significant, but by no means the only, milestones for the FOSSi movement as a whole and then take a more personal look at the work where I have been directly involved.

OpenMPW

The biggest story within free and open source silicon this year has undoubtedly been the openMPW project involving Google, SkyWater and eFabless together with a number of other collaborators.

Ever since I got involved in open source silicon ten years ago, building a fully open source ASIC has been one of those big milestones. While we have had FOSSi IP cores taped out on chips for at least 20 years and parts of the flow being managed by open source tools, it has always seemed to be too much work, requiring filling in too many gaps, to have a fully open source end-to-end flow to produce ASICs. But over the years people all over the world have filled in the gaps and done the work bit by bit. Sometimes in the context of overarching programmes to advance open source silicon, sometimes in academic settings, sometimes coming from the industry and sometimes as completely unpaid hobby projects. And this year all these efforts came together, helped by funding, to produce four shuttle runs, each loaded with 40 different completely open source designs. The first of these runs is currently being fabricated, and it will be extremely interesting to see the chips coming back.

One of the final pieces in this puzzle was the PDK. And while SkyWater should rightfully be lauded for their decision to open up their 130 nm PDK, it begs the question: why on Earth did it take this long? What could the fabs possibly have to lose by doing this? What they gain is easy to answer: a completely new market of users who can create chips at their fab. According to people within the project, an estimated 75% of the people on the first shuttle define themselves as software engineers. It’s very likely that none of these people would ever dream of making an ASIC without this possibility. So I kind of feel that making the EDA industry open up their formats is a bit like trying to get your kids to eat vegetables: a lot of groaning and complaining, but was it really all that bad in the end to get some nutrition? Or, in the case of ASIC fabs, was it really all that horrible to release your PDK to get some more customers? Let’s just hope this opens the eyes of more fabs. My dream, and that of many others, is to eventually see the same thing happen to ASIC fabs as has been happening with cheap PCB services over the past ten years. And using that analogy, I’m quite sure it pays off to be early in the race. So, get started folks!

QuickLogic and SymbiFlow

The other big thing happening this year is that we finally have an FPGA vendor shipping an open source toolchain for their devices. The company that will go down in the annals of history for being the first to do this is QuickLogic, with their EOS S3 FPGA. This is by no means the first FPGA with an open toolchain, and the QuickLogic-flavored version of SymbiFlow, developed by FOSSi veterans Antmicro, is based on all this prior work. But it is the first time we see a toolchain being created from the FPGA manufacturer’s specifications rather than being figured out from compiled FPGA binaries, and it’s the first toolchain that is supported and funded by the vendor rather than being at best tolerated by them. And again I must ask, why did it take so long for this to happen? If I were running a small FPGA startup with limited resources, I can’t for the life of me understand why I would want to spend a lot of time and money building and continuously maintaining a big unwieldy toolchain all by myself instead of adding the required device-specific bits to a known good open source toolchain and sharing the maintenance burden. If nothing else it would free up resources to build other value-add products on top of the tools. It would be as if every vendor of computer systems first built their own operating system and compiler before shipping their products. This is what we had in the 80s, and we abandoned it for very good reasons, because it made absolutely no one happy. And you know what? I think the users of FPGAs should put more effort into pushing their vendors to support open source toolchains, because it will save everyone a heap of time and money.

Let me illustrate that last point with an example that actually happened when I was porting SERV to the QuickLogic devices. After synthesis I noticed that it used far more resources than expected. Looking at the synthesis logs I realized the memories in the design weren’t mapped to on-chip SRAM. So I asked the toolchain developers about this. They pointed me to the file in the toolchain that contained the rules for mapping to SRAM. I quickly found a badly tuned parameter, changed it to a more sensible value and ten minutes later it was working fine. An hour later I had submitted a patch back to the toolchain that fixed the problem for everyone else who would encounter it.

Let’s break this down into numbers. Finding the cause of the bug took about 15 minutes. Fixing it, another five. At that point I could use it myself, but after spending another 15 minutes or so, it was also fixed for everyone else.

Now let’s do the same exercise for a proprietary closed source toolchain. Finding the cause of the bug takes… well… it depends… Let me explain.

I started my professional career at a company which at that point was the world’s largest FPGA buyer. Whenever we had problems, they flew in two FAEs to sit in our lap, they could provide us with custom internal builds of their tools and they generally tried to make sure the problem quickly went away so that we would continue to buy FPGAs from them. However, most companies are not the world’s largest FPGA buyer and do not get this treatment. Instead you will have to wade through layers of support people until you reach someone who is actually qualified enough to acknowledge the issue. I have been in this situation numerous times and would estimate this process usually takes around 2-3 months. Actually fixing the bug probably takes five minutes or so in this case too, but here comes the fun part. In most cases you will now have to wait, I don’t know, a year or so until the fixed bug ends up in a released product that you can download. What happens in practice is that the user tends to find a workaround instead. In the example above, the likely solution would be to instantiate a RAM macro instead of relying on inference. This however doesn’t come for free, as it requires finding all the instances where this is a problem and adding special handling for each of them, which results in a larger code base with more options to verify and maintain. This costs time and this costs money. So the moral of this story is that closed source tools are more expensive for everyone involved, and users of FPGAs should get better at telling the FPGA vendors that they are done with this closed source nonsense.

The QuickFeather – the first FPGA board to ship with a FOSSi toolchain

There are numerous other news and projects that are well worth mentioning, but the above two are milestones that we have been waiting for a long time, so they deserved special attention. And if you want to keep up with the latest happenings in open source silicon, I highly recommend subscribing to the El Correo Libre newsletter which does a fantastic job of providing an overview of what goes on in all corners of the world. So let’s move on to some of my more personal victories that aren’t necessarily mentioned in other people’s year in review.

When I am working on and talking about open source silicon I often wear many hats, because I’m associated with several different organizations. Luckily they are all pretty much aligned on this topic, which makes things far easier. But these organizations have different motives and goals, so I would like to say a few words about each of them here.

Qamcom

My day job is at Qamcom Research & Technology, and there too there has been more FOSSi work than usual, which I think is a good indication that open source silicon is becoming increasingly common in chip design in general. The year started off by finishing up some work on SweRVolf together with a couple of my Qamcom colleagues. SweRVolf is a project under CHIPS Alliance, an organization that Qamcom has been part of since 2019 to help improve the state of open source and custom silicon. After that I was pulled into a project doing climate research with a huge radar system. My task was to handle sub-nanosecond time synchronization between systems located hundreds of miles from each other, using the White Rabbit system developed at CERN. I was pretty excited about getting to know White Rabbit. The timing section at CERN responsible for the White Rabbit project and associated technologies is a household name within open source silicon and has a long history here. I know many of the people working there personally and have great respect for their work, but I hadn’t had the chance until now to actually get down and dirty with the technology. Once my job was done there I moved on to another proprietary project that I can’t discuss here. I do get to use FuseSoC and Verilator though, so it’s fine 🙂

FOSSi Foundation

The hat I tend to wear most when the topic revolves around open source silicon is my FOSSi Foundation director hat. And despite doing a lot less of the things we normally spend time on, it turns out we did a lot of other great things instead. I will not go into more detail here, but instead point to the excellent summary by my FOSSi Foundation colleague Philipp Wagner.

RISC-V

Arguably the most well-known project nowadays with ties to open source silicon is RISC-V. My RISC-V ties deepened in the beginning of the year when I was asked to become a RISC-V ambassador. Part of being an ambassador is creating awareness of RISC-V in the fields where I’m active. For well-known reasons the number and nature of events were a bit different from previous years, which meant fewer opportunities to wield this new-found power, but I did participate in an ask-the-expert session during the RISC-V Global Forum and a couple of other events that will be described later. I also got to be interviewed about RISC-V and open source silicon by Sweden’s largest electronics and tech news outlets as well as the Architecnologia blog.

Current crop of RISC-V Ambassadors; AKA the Twantasic 12

In addition to my day job and participating in different organizations I also run a bunch of open source projects, so let’s take a look at the progress of the most important ones during 2020.

SweRVolf

SweRVolf is an extendable and portable reference SoC platform for the Western Digital / CHIPS Alliance SweRV cores. SweRVolf is designed for software engineers who want a turn-key system to evaluate SweRV performance and features, for system designers who want a base platform to build upon, and for learners of SoC design, computer architecture, embedded systems or open source silicon methodology. To easily achieve the goals of portability and extendability it is powered by FuseSoC, which just happens to be one of my other open source silicon projects, mentioned later on.

During 2020 SweRVolf gained support for booting from SPI Flash, but most effort was spent on usability: making it rock-solid, improving documentation, increasing compatibility with more EDA tools, keeping the underlying cores up to date and following along with changes in the Zephyr operating system, which is the officially supported software platform for SweRVolf.

But the biggest thing to happen to SweRVolf this year is that it will be used as the base of a new university course from the Imagination University Programme called RVfpga: Understanding Computer Architecture. I’m very excited (and slightly scared) about soon having thousands of students getting familiar with computer architecture, RISC-V and open source silicon through a SoC that I have designed. And I would like to mention a few things about how SweRVolf is built, because I think it’s a great example of how to create chip designs. When I say that I have designed SweRVolf, most of the work has consisted of putting together various pieces and making sure they work well as a whole. Most of the underlying code has been written by other people, and from my perspective that is really the most successful aspect of SweRVolf, because it highlights the rich open source silicon ecosystem. The main CPU core is from Western Digital and governed by CHIPS Alliance. Most of the AXI infrastructure was developed through the PULP project at ETH Zürich and the University of Bologna. The UART and SPI controllers were developed for the OpenRISC project during the first wave of open source silicon almost 20 years ago. The Wishbone infrastructure was developed by me when I started out with open source silicon ten years ago, and the memory controller was created by Enjoy Digital and is written in Migen as part of the LiteX ecosystem. And to go full circle, the memory controller internally uses a tiny RISC-V CPU called SERV to aid with calibration. SERV, the world’s smallest RISC-V CPU, is written by me. Small world. And of course the whole project is packaged with FuseSoC and uses Verilator by default for simulations, so it’s FOSSi all the way. As I hope you understand by now, it’s not about some lone hero churning out code; all this has been made possible by a huge amount of work by a ton of people over many years, and I’m proud to be one of them.

SERV

Probably the hobby (read: unpaid) project I spent most time on during 2020 was SERV, the world’s smallest RISC-V CPU, which turned from small to even smaller during the year. SERV is very much a project driven by numbers, so let’s look at some of these numbers.

In February I got hold of a ZCU106 development board with a huge Xilinx UltraScale+ FPGA for a project I was assigned to. As this was the largest FPGA I had ever had in my home, I got curious to see how many SERV cores I could squeeze into it. The year before, at the 2019 RISC-V workshop in Zürich, I had done a presentation on how to fit 8 RISC-V cores into a small Lattice iCE40 FPGA (spoiler: it ended up being slightly more than 8 eventually), giving each of them a single I/O to communicate with the outside world. The problem this time was that after stuffing in 360 cores I ran out of I/O pins. It would also have been practically impossible to verify that all these external pins actually did what they were supposed to do, so I needed some way of using less than one I/O pin per core. Then it struck me that just a few months earlier I had created a heterogeneous sensor aggregation platform based on SERV cores called Observer. The idea behind Observer was to connect a lot of sensors to an FPGA, each serviced by its own SERV core, and then merge the data into an output stream. I gave up on the platform when I realized that while I could fit a lot of SERV cores into the devices, I just had a few sensors, so there wasn’t much data to aggregate. But this platform was a very good starting point.

Block diagram of the Observer platform
By removing all sensor interfaces and just having each core print out an identification message instead, I had a system that I could instantiate with any number of cores. Trying this on the ZCU106 I could now run over 600 cores on the FPGA. The next problem was that I ran out of on-chip RAM way before any of the other FPGA resources. In case you don’t know, most FPGAs contain a number of fixed-size SRAMs spread out over the device, each typically 1-8 kB large. In SERV, each core used one for the RF (register file) and another for the program/data memory. With RISC-V using 32 32-bit registers, only 128 bytes of the RF RAM are used, but since the fixed-size SRAMs on FPGAs are typically far larger than that, most of the RAM ends up unused. That’s bad, but I had a plan. With a bit of work I managed to share RF, program and data in the same RAM, with the RF allocated to the top 128 bytes. This freed up half of the on-chip RAM blocks and I could eventually hit 1000 SERV cores on the ZCU106 board. Of course, at this point I was curious to see what the situation was for other boards after all these optimizations. Taking it one step further, I figured I should turn this into a real thing by creating a benchmark so that people have a quick way to see roughly how large the FPGAs on different boards are. And with that ServMark was born.

ServMark lasted for about three minutes until I realized CoreScore was a much catchier name, so that’s what we have now.
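
To illustrate the RAM-sharing idea described above, here is a conceptual SystemVerilog sketch – emphatically not the actual SERV RTL – of a single fixed-size SRAM serving both as program/data memory and as the register file, with the 32 x 32-bit RF mapped to the top 128 bytes:

```systemverilog
// Conceptual sketch, not the actual SERV implementation: one
// fixed-size FPGA SRAM holds both program/data memory and the
// register file, with the 32 x 32-bit RF (128 bytes) steered
// to the top 32 words of the RAM.
module shared_ram #(
  parameter AW = 10                   // 2**10 words = 4 kB SRAM
)(
  input  logic          clk,
  input  logic          we,
  input  logic          rf_sel,       // 1: register file access
  input  logic [AW-1:0] mem_addr,     // word address from the bus
  input  logic [4:0]    rf_addr,      // register number, 0-31
  input  logic [31:0]   wdata,
  output logic [31:0]   rdata
);
  logic [31:0] mem [0:2**AW-1];

  // RF accesses land in the top 32 words: all upper address
  // bits set, low 5 bits selecting the register.
  wire [AW-1:0] addr = rf_sel ? {{(AW-5){1'b1}}, rf_addr} : mem_addr;

  always_ff @(posedge clk) begin
    if (we) mem[addr] <= wdata;
    rdata <= mem[addr];
  end
endmodule
```

The trick relies on the core never needing the RF and the memory bus in the same cycle – presumably easier to arrange in a bit-serial design like SERV than in a conventional pipeline.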

I had originally planned to do a presentation about SERV at Latch-Up in Boston, but for well-known reasons we cancelled all FOSSi Foundation physical events. Instead I accepted an invitation to speak at the first virtual RISC-V Munich meetup. By this point I had attended a couple of virtual meetups and I hated it. It was in most cases awful to watch a narrated slide deck without a stage and a speaker to bring it to life at least a bit. So I decided to take a fresh look at the possibilities instead of being limited by the medium, and make videos instead. First of all, people would watch on a computer screen with proper resolution instead of a washed-out image projected on canvas, which meant I could have much more detailed pictures and smaller font sizes. I could also freely mix pictures and animations, fine-tune timings, do several takes of the audio and add sound effects. And despite being made by someone who has pretty much zero experience in this sort of thing, I think it turned out pretty well. So on the day of the event, I just introduced myself and let the audience indulge in my fully immersive multimedia edutainment experience about SERV. Not sure why the Oscars committee hasn’t gotten in touch yet.

Is CoreScore the only attempt to put a mind-boggling number of RISC-V cores inside chips? Absolutely not, and during the summer I learned of the Manticore project from Florian Zaruba and Fabian Schuiki (jointly known as Flobian Schuba) from ETH Zürich, both well-known names in the FOSSisphere from their work in the PULP ecosystem. Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing had been accepted into the prestigious Hot Chips conference. Manticore is an impressive project, but I still thought it was a bit unfair that I wasn’t invited as well. So I reached for the biggest FPGA board I could find and then wrote my first academic paper of the year, called Plenticore: A 4097-core RISC-V SoClet Architecture for Ultra-inefficient Floating-point Computing. Unfortunately I did not receive an invitation to Hot Chips despite this. I assume it must have gotten lost in the mail somewhere.

Oh well, let’s look at some more numbers instead.

  • A number of optimizations were found over the year which, depending on the measure, further shrank the core by 5-10%.
  • The number of supported FPGA boards for the servant SoC grew from 4 to 17, mostly thanks to other contributors (thanks everyone, love you all!)
  • The SERV support for the Zephyr operating system was rewritten and upgraded from the aging Zephyr 1.13 to 2.4, the latest version at the time of writing.

SERV resource usage over time on a Lattice iCE40 FPGA

Coming back to CoreScore, the results right now range from 10 cores on a smaller Lattice iCE40 device up to 5087 cores on a large Xilinx device and the high score table can even be viewed interactively online! If you can’t find your favorite board there, just send a PR and we’ll get it added. And if any people with access to crazy large FPGAs happen to read this, please get in touch with me. I’m very curious to see who will be first to get above 10000 cores.

SERV also saw another great improvement in the form of documentation. While not complete yet, the functionality of most SERV modules has been documented together with detailed schematics showing the implementation down to the individual gates, muxes and flip-flops. And most changes to the source are now accompanied by a code comment to clarify what is going on. Since SERV is optimized for size rather than readability, many parts of the core are difficult to figure out just by looking at the source code. Hopefully this will make it easier for others who want to understand or work on the core, but frankly it has also been very useful for me, since I tend to forget why I did things in a certain way and have had to spend a lot of time following my own tracks.

Schematic of the SERV control unit from the SERV documentation

FuseSoC

The oldest of my open source silicon projects still going strong is FuseSoC. It is now about to turn ten years old and keeps growing in features and users each year. Looking back at the changes through 2020 I can see some new trends in the development. The most important one is that, for the first year ever, most of the work was not done by me. During 2020, my fellow FOSSi Foundation director and lowRISC employee Philipp Wagner has been pulling the heaviest load of FuseSoC development. And with Philipp came quality. Dr. Wagner has improved FuseSoC in pretty much every aspect. Bugs have been fixed and features have been added. The development experience has been improved by CI testing, automatic code formatting checks and improved testing coverage. And what makes me happiest is that the user experience has been improved, not least by a total rewrite of the documentation into something that is actually useful and can be proudly shown to the world. All this is very much needed as FuseSoC is becoming increasingly popular. It has already been picked up by many of the flagship open source silicon projects like OpenTitan, SweRVolf and OpenPiton, and with the RVfpga university programme there will soon be a whole new generation who will get familiar with it as well.

Edalize

In 2018, the part of FuseSoC that interacted with the EDA tools was spun off into Edalize, in the belief that this part could be useful for others who weren’t interested in the whole FuseSoC package. This prediction seems to have been correct and Edalize has very much started a life of its own by now. In addition to FuseSoC, Edalize is now used by several other projects such as Silice, Clash and fpga-perf-tool, and over the year Edalize has gained support for 7 new EDA tool flows, bringing the total number up to 25.

2020 was also the year when Edalize had its first taste of being in the spotlight on its own merits. For the Workshop on Open-Source EDA Technology (WOSET) 2020 I decided to submit a presentation about Edalize. Being an academic conference, this also prompted me to write an accompanying paper, as is the common courtesy for these kinds of events. The paper received a lukewarm response but was accepted anyway. Once again I did not feel like reciting slides to a camera, so I turned back to my new-found interest in advanced multimedia productions. And it paid off. The Edalize video won an award for best video at WOSET 2020. Well done, Edalize!

LED to Believe

All of the above projects use FuseSoC and Edalize because – well, it’s kind of why I created FuseSoC in the first place – they make it easy to reuse components and retarget designs to different devices. But I also realized there was a need for a dead simple project to help people get started with FuseSoC – the Hello World of silicon, so to speak. And the Hello World of silicon is of course the blinking LED. So in 2018 I created project LED to Believe, with the ambitious goal of creating FuseSoC-powered LED blinkers for every FPGA board ever made. The project has several aspects that are useful in different ways. It serves as a very simple introduction to FuseSoC and to making a design that targets multiple boards. It is also an excellent pipe cleaner for when you receive a new board: if you can run the project successfully and get the LED to blink, it likely means you have managed to install all the EDA tools correctly, which is no small feat, and you also have a template for taking on bigger projects. And it’s also fun to see what boards are available out there. While I have submitted a bunch of the board ports myself, the vast majority have come from all the fantastic contributors out there. During 2020 the number of supported boards grew from 16 to 44. Perhaps not all the FPGA boards ever made, but a considerable chunk of them. And already in the short amount of 2021 that has passed there have been numerous more contributions, so we’re getting closer all the time.

In closing, 2020 was a busy year FOSSi-wise, and this has only scratched the surface of all the things that happened during the year. And just as we were about to close the books on 2020, I was informed that Lattice had incorporated one of my FOSSi projects into their shiny new award-winning Propel design suite. Which project, you might ask? Was it the similarly award-winning FuseSoC, to give Lattice users immediate access to a rich ecosystem of open IP cores? Or was it the Rosetta stone of Edalize, with its award-winning video, which would provide a coherent interface to a dozen simulators and make it easy to switch between Lattice’s multitude of FPGA tools such as Diamond, iCEcube2 and Radiant? Or was it SERV itself, the award-winning CPU capable of offering a RISC-V experience on all but their absolutely smallest offerings? Well, actually, none of the above. It turns out that Propel now contains ipyxact, my somewhat feature-limited Python library for working with IP-XACT files. Not my first choice, but fair enough. I wonder if they have read about my somewhat complicated relationship with IP-XACT.

Finally my work is recognized by big EDA vendors (picture by Gatecat)

High-Throughput Open Source PCIe on Xilinx VU19P-Based ASIC Prototyping Platform

This post was originally published at Antmicro.

In our daily work at Antmicro we use FPGAs primarily for their flexibility and parallel data processing capabilities, which make them remarkably effective in advanced vision and audio processing systems involving high-speed interfaces such as PCI Express, USB, Ethernet, HDMI or SDI that we develop and integrate as open source, portable building blocks. Many of our customers, however, also use FPGAs in a different context, namely for designing ASICs, which is a highly specialized market that typically involves large FPGAs, proprietary flows and IP. In one such project, we were working with one of the largest FPGAs in production today, the 9-million LUT Xilinx VU19P. As the design was considerably complex, it needed a high-throughput link between the FPGA and the host PC that could be thoroughly benchmarked, analyzed and optimized for the use case.

Implementing PCIe with open source

Implementing PCIe is not completely straightforward, as you have to synchronize multiple lanes of high-speed bi-directional data. If you hit a bug somewhere in your data flow, things get very tricky to debug, especially if you have no ability to inspect and change the source code of the IPs involved. Being active developers of a variety of portable and reusable open source FPGA IP cores, for the project in question we were able to integrate a fully open PCIe interface into the Xilinx VU19P-based ASIC prototyping platform using LiteX/LitePCIe, achieving a respectable throughput of 31 Gbit/s over 8 lanes. Although the FPGA chip itself is capable of 16-lane operation, the proFPGA board used in the setup supports only 8 lanes; with hardware capable of higher bandwidth we can achieve even greater throughput if needed. In fact, the repository also contains instructions for a 16-lane capable VU9P-based setup – using a popular and not as prohibitively expensive devboard available on the after-market – where we measured as much as 59 Gbit/s.

PCIe connection between host PC and VU19

Scalable, portable and customizable flows

Our ability to rapidly iterate as well as track down and fix bugs in the system we have created for this customer project demonstrates the scalability and portability of the open source-based approach, and is an example of Antmicro’s wider efforts aimed at developing reusable building blocks and introducing improvements to the whole FPGA ecosystem.

Open source-licensed IP cores play well with the open source FPGA and ASIC tooling that we are building to enable a faster, collaborative and modular system development workflow – a goal that is shared by CHIPS Alliance, of which we are proud to be a Platinum member. As one of the many examples, we are making great progress in enabling open source synthesis and simulation of complex SystemVerilog-based designs, such as security-focused RISC-V cores like OpenTitan’s Ibex. Some of our other projects focus on open source synthesis and place & route flows, linters, formatters, CI systems, simulation platforms, test suites and more.

Flexible system design

The PCIe core used in the ASIC prototyping project also works great in the sophisticated computer systems we have been building for our customers. The wide array of customizable and license-free FPGA IPs and SoC generators that we work with allows us to implement specific functionalities in the devices we build; it includes MIPI CSI and other camera interfaces, SDI, HDMI, ISP processing, video codecs, AI and 2D GPU acceleration, I2S, SPDIF, PCIe, USB, Ethernet, DMA, SATA and DRAM controllers.

CHIPS Alliance Welcomes Antmicro and VeriSilicon to the Platinum Membership Level

CHIPS Alliance continues to grow with more than 25 companies collaborating on open source hardware and software technologies

SAN FRANCISCO, Feb. 11, 2021 – CHIPS Alliance, the leading consortium advancing common and open hardware for interfaces, processors and systems, today welcomed Antmicro and VeriSilicon to the company’s Platinum membership level. Antmicro, one of the initial members of the CHIPS Alliance, has upgraded to the Platinum membership level to reflect its deepening involvement in the organization. VeriSilicon is new to the CHIPS Alliance, although the company is heavily involved in open source activities. 

“Over the past few years, Antmicro has continued to become more involved in the CHIPS Alliance, helping to steer the technical deliverables and strategic direction of this important organization,” said Michael Gielda, VP Business Development at Antmicro. “We’re deeply committed to furthering the goals of the CHIPS Alliance to realize the vision of open source RTL designs and tooling for silicon and FPGAs.” 

In addition to his role at Antmicro, Gielda is Chair of Outreach at the CHIPS Alliance, helping to drive the marketing, educational and community activities of the organization. Antmicro provides development and commercial support services for open source IP, systems and tools, actively participating in a number of other open source projects and initiatives including RISC-V International, the OpenPOWER Foundation, Renode and the Zephyr Project. Antmicro is also propelling many of the CHIPS Alliance’s efforts, such as open source SystemVerilog support and FPGA & ASIC tooling.

Said Wayne Dai, President and CEO at VeriSilicon: “We have been impressed by the momentum the CHIPS Alliance community has generated over the past two years, and we look forward to helping to drive its next phase of growth and development by joining as a Platinum member.”

In 2018, VeriSilicon was instrumental in establishing the China RISC-V Industry Consortium (CRVIC), which has more than 120 members today. VeriSilicon is also a member of RISC-V International, and is eager to expand its open source efforts by joining the CHIPS Alliance. With the company’s strong growth over the past two decades, the company recently celebrated a new milestone with its entry to the Sci-Tech Innovation Board (STAR Market) of the Shanghai Stock Exchange in China.

“The addition of Antmicro and VeriSilicon to our Platinum membership level demonstrates the growing commitment we’re seeing from companies across the silicon ecosystem,” said Rob Mains, Executive Director at CHIPS Alliance. “As we continue to expand our membership base, we remain laser focused on targeting other parts of ASICs beyond the CPU core, open sourcing the tools needed to work with ASICs, and providing real, battle-proven reference implementations and project infrastructure.”

As Platinum members, Antmicro and VeriSilicon are entitled to appoint a representative to the Governing Board and any Committee. Additionally, a representative of each Platinum member company is eligible to be elected Chair and/or Vice Chair of the Technical Steering Committee (the “TSC”). Furthermore, Platinum members get ten complimentary registrations for CHIPS Alliance workshops and events during the year of membership, along with each company’s logo prominently displayed in CHIPS Alliance online and print materials.

About the CHIPS Alliance

The CHIPS Alliance is an organization which develops and hosts high-quality, open source hardware code (IP cores), interconnect IP (physical and logical protocols), and open source software development tools for design, verification, and more. The main aim is to provide a barrier-free collaborative environment, to lower the cost of developing IP and tools for hardware development. The CHIPS Alliance is hosted by the Linux Foundation. For more information, visit chipsalliance.org.

About the Linux Foundation

The Linux Foundation was founded in 2000 and has since become the world’s leading home for collaboration on open source software, open standards, open data, and open hardware. Today, the Foundation is supported by more than 1,000 members and its projects are critical to the world’s infrastructure, including Linux, Kubernetes, Node.js and more. The Linux Foundation focuses on employing best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, visit linuxfoundation.org.

CHIPS Alliance Brings on Rob Mains as New Executive Director

Industry veteran to lead open hardware consortium democratizing silicon innovation

SAN FRANCISCO, Feb. 8, 2021 – CHIPS Alliance, the leading consortium advancing common and open hardware for interfaces, processors and systems, today announced the appointment of Rob Mains as the organization’s new executive director.

Rob has over 35 years of experience in software engineering and development, with 25 years of experience as an EDA software architect focused on microprocessor design and advanced process node technologies. He most recently served as a technology advisor at Spillbox, and prior to that worked in leadership and senior engineering roles at Qualcomm, Sun Microsystems (staying on at Oracle after the acquisition) and IBM. Throughout his career, Rob has worked closely with hardware developers to play a hands-on role in helping to devise innovative solutions for a wide range of applications.

“Rob is an ideal fit for the CHIPS Alliance with his strong leadership experience and deep understanding of the silicon industry,” said Dr. Zvonimir Bandić, Chairman, CHIPS Alliance. “As the CHIPS Alliance runs full steam ahead with its growing membership, impressive technical milestones and other activities, we look forward to having Rob on board to continue this strong momentum.”

“As more companies are looking to open source solutions to help eliminate design barriers, reduce costs and speed up development time, the CHIPS Alliance will play a critical role in advancing open hardware for the benefit of everyone,” said Mains. “I look forward to working closely with CHIPS Alliance members to continue the organization’s goals, while also focusing on growing the membership base.”

Today the CHIPS Alliance has more than 25 members collaborating to accelerate the creation and deployment of open system-on-chips (SoCs), peripherals and software development tools for a wide range of applications. To learn more, check out the CHIPS Alliance 2020 Annual Report: https://chipsalliance.org/chips-alliance-2020-annual-report/.

About the CHIPS Alliance
The CHIPS Alliance is an organization which develops and hosts high-quality, open source hardware code (IP cores), interconnect IP (physical and logical protocols), and open source software development tools for design, verification, and more. The main aim is to provide a barrier-free collaborative environment, to lower the cost of developing IP and tools for hardware development. The CHIPS Alliance is hosted by the Linux Foundation. For more information, visit chipsalliance.org.

About the Linux Foundation
The Linux Foundation was founded in 2000 and has since become the world’s leading home for collaboration on open source software, open standards, open data, and open hardware. Today, the Foundation is supported by more than 1,000 members and its projects are critical to the world’s infrastructure, including Linux, Kubernetes, Node.js and more. The Linux Foundation focuses on employing best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, visit linuxfoundation.org.