
Spade: An Expression-Based HDL With Pipelines

Frans will present Spade, a new open source standalone hardware description language. He will show how Spade's abstractions and tooling, which are inspired by software languages, improve the productivity of an HDL without sacrificing low-level control.

[For readability, moderator comments have been removed, as well as minor questions for better understanding.]

Yes, thank you very much for the introduction. So I'm Frans, I'm going to talk about Spade, and of course we need to motivate things. My motivation is very much in line with the answer to my question: it's abstraction, and to me that means which things we have to think about and which things we are in control over. As an example, take Verilog and VHDL. They are going to be low-level in almost any way you look at them, but you can do low-level Verilog and instantiate individual AND gates in a netlist, or do high-level Verilog with a more behavioral description, reasoning about individual operations on individual bundles of bits. On the other end of the spectrum we have high-level synthesis. Here we've given up a whole lot of control, but we also have way fewer things to think about. Now, Spade is absolutely not a high-level synthesis tool. Just like Bluespec, it falls on the lower end of the spectrum, and the goal with everything I'm trying to do here is to retain the control that you have with Verilog and VHDL, but push the amount of high-level reasoning you can do while still retaining that control. I also want to steal a bunch of stuff from software languages, because I'm a software developer and I miss a lot of things when I come to Verilog and VHDL and other HDLs.

So I want to start off with an example of what the language looks like. This is sort of the "Hello world!": the counter that counts from zero up to some maximum value. The inputs are separated from the outputs, so here we take a clock, a reset, and a max value, and this thing produces the current counter value. The whole language is more linear than Verilog and VHDL, for better or for worse, but I think it's easier to reason about things in a linear way and then explicitly describe any non-linear flow of data. To define a counter we need a register. We call that register 'val' and we clock it by this 'clk' signal. The register statement is the only sequential statement in the language; everything else is combinational. We specify a reset, so if the 'rst' signal is true, we set it to zero. And then to describe the behavior of the circuit, we give the new value of the register as a function of the old value. So if the current value is the 'max' value, the new value will be zero, otherwise it will be the old value incremented by one. Some key takeaways here. First of all, it's an expression-based language, so instead of saying "if 'val' is 'max', set 'val' to zero", we say "'val' is the result of this 'if' expression", and then we have to specify a value in each branch, which prevents a few bugs, and once you get used to it, I think it's a more natural mapping to hardware than the imperative style that most HDLs have. It's a statically typed language, so we specify all the types. It also tries to make sure that you don't accidentally throw away information that you might need. So "'val' plus one" could overflow, and the language will not allow you to throw away that overflow implicitly. You have to call this 'trunc' function to say "No, I actually do want to throw away a bit here", since we have feedback in the circuit. It has type inference, so you don't have to specify any types inside the body; the compiler figures them out for you, as long as things are consistent. If the compiler finds an inconsistency, it will, of course, alert you. So you get the benefits of static types without the annoying typing. Unlike Bluespec, this is a cycle-to-cycle description of your hardware: you give the new values of all the registers as a function of the old values of all the registers, but you have much more structured tools for doing this than you do in Verilog and VHDL.
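[A sketch of the counter described above, based on the published Spade examples; the 8-bit width is illustrative, and 'entity' is the keyword used at the time of the talk:]

    entity counter(clk: clock, rst: bool, max: int<8>) -> int<8> {
        // The register statement is the only sequential construct:
        // 'val' is clocked by 'clk' and reset to 0 while 'rst' is true.
        reg(clk) val reset (rst: 0) =
            // Expression-based: 'val' is the result of this if-expression,
            // and both branches must produce a value.
            if val == max {
                0
            } else {
                // 'val + 1' is one bit wider than 'val'; 'trunc' explicitly
                // throws the extra bit away so the feedback widths match.
                trunc(val + 1)
            };
        // The last expression is the output of the entity.
        val
    }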

One of those tools that you have access to is the pipelining feature. On the left here you have the code, on the right the resulting hardware. The thing I want you to pay attention to is these three lines. First of all, the head of the pipeline specifies its latency. Just by reading the head, you can see that the delay between input and output in this module is going to be two. Then you have the two 'reg' statements, which specify that all the variables above a 'reg' statement should now be registered, and when you refer to those variables below the 'reg' statement, you refer to the registered version. This decouples the description of where you put pipeline registers from the computation that you're performing inside the circuit. This is useful when you're describing the circuit originally, but it's way more useful when you have to refactor things. So let's say we realize that this 'g' block at the top is too slow. It's breaking our f_max, so we need to optimize it, and we do that by pipelining it. Normally you would have to do a bunch of thinking now, like "What do I have to change because of this?". In Spade, because the compiler knows about pipelines, it will first tell you that you need to instantiate this 'g' block as a pipeline and specify its depth there, so if there were more complex behavior here, the user would need to go back, check it, and either change the circuit to match the new description, or... In this case, the compiler will actually figure out the problem for us. We have this line going backwards through the pipeline. The behavior of our circuit changed, so the compiler will tell us that this 'x' value is not available where we're trying to use it; we need to use it in a later stage. Of course, the solution here is to delay the computation of this whole pipeline by one cycle, so we insert two new registers into the pipeline, and because we decoupled the description of pipelining from the description of computation, the only changes we have to make are to put a 'reg' there and to update the depth of the outer pipeline, because the latency of the pipeline has now changed.
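[A minimal sketch of such a pipeline; 'f', 'g', the widths, and the body are placeholders rather than the actual slide code:]

    // Two placeholder combinational blocks.
    fn g(x: int<8>) -> int<8> { trunc(x + 1) }
    fn f(a: int<8>, x: int<8>) -> int<8> { trunc(a + x) }

    // The head states the latency: two cycles from input to output.
    pipeline(2) example(clk: clock, x: int<8>) -> int<8> {
            let a = g(x);
        reg; // everything defined above this point is now registered
            // Below the 'reg', 'a' and 'x' refer to the registered versions.
            let b = f(a, x);
        reg;
            b
    }

[Instantiating it elsewhere also states the depth, along the lines of 'inst(2) example(clk, x)', which is what lets the compiler check latencies across pipeline boundaries.]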

This pipelining feature has a few more things that I don't have time to go into. You can do feedback and bypasses, so you can say, "I want to refer to the value of this register two cycles ago", or "the value as it appears two stages below me", for example. This is very useful if you're doing something like a processor, where you have a register file that feeds back on itself. As of a few weeks ago, there is also some built-in support for dynamic behavior, so you can say that all of the pipeline registers in this stage should stall if a condition holds, and then it will stall all the pipeline stages above this. This allows you to do correct flushing, and there's almost enough support in the language now for doing backpressure negotiation between pipelines, so you can solve that as well in a structured manner. That's the pipelining construct.

One of the things I wanted to steal from software languages was the type system. One of the major things I miss in a lot of languages that don't have it is the 'enum', and the more powerful 'enum' in the Rust sense rather than the 'enum' of C, where it's just one of a set of values. The type system supports generics, so we define a type here called 'Option', and it's going to be generic over any type 'T'. We could put integers in there, we could put 'structs' in there, other 'enums', whatever we feel like. This option type will take on one of two values: either it's 'Some', in which case a value is present, or it's 'None', in which case no value is present. To me, the best way to view this is as a valid bit bundled with the data it validates, so the representation of this type will be a tag along with a value. If the tag is zero, the option type is 'None' and the value bits are undefined, and we're not allowed to access those bits unless we first check that the tag is one, in which case the compiler will give us access to them. This prevents reading data that wasn't really valid.
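[The type described here has roughly the following shape; Spade's standard library ships an equivalent 'Option', so this is only a sketch, and the helper function is hypothetical:]

    enum Option<T> {
        // A value is present
        Some{val: T},
        // No value is present
        None
    }

    // The payload bits are only accessible after the tag has been checked,
    // here by pattern matching on the variant.
    fn unwrap_or(o: Option<int<8>>, default: int<8>) -> int<8> {
        match o {
            Some(val) => val,
            None => default
        }
    }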

This is very useful for a lot of other things as well. You could model commands on a bus. For example, if you have a memory, you could have no operation on the memory, a read operation, or a write operation, and you can bundle the data each one needs.
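[For example, a hypothetical memory command type; the address and data widths are made up:]

    enum MemCmd {
        // Do nothing this cycle
        Nop,
        // Read the word at 'addr'
        Read{addr: int<16>},
        // Write 'data' to 'addr'
        Write{addr: int<16>, data: int<32>}
    }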

You can model an instruction set, so you have a 'Set' instruction with a destination register and an immediate value, an 'Add' instruction with a destination register and two input registers, or a 'Jump' instruction with a target. This will be encoded in a similar way, and it's very nice to match on these, because you only get access to the fields of the instruction that you actually have a use for.
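[A sketch of such an instruction type and a match on it; the names and widths are illustrative, not the actual slide code:]

    enum Instruction {
        Set{rd: int<5>, imm: int<16>},
        Add{rd: int<5>, rs1: int<5>, rs2: int<5>},
        Jump{target: int<16>}
    }

    // Matching only exposes the fields of the variant that actually matched.
    fn jump_target(i: Instruction) -> int<16> {
        match i {
            Instruction::Jump(target) => target,
            _ => 0
        }
    }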

Finally, tooling. I think tooling is a very important part of any language. I showed the compiler error messages briefly. That's something I'm very passionate about: the compiler should give you useful error messages that describe what you need to change to make things work, so any unclear compiler error message is treated as a bug.

We have test benches in cocotb, which is a Python testing framework for Verilog. The thing that Spade adds on top of it is the ability to write Spade code inside cocotb, so you can write a Spade expression as a string, it is sent to the compiler and compiled, and that way you don't have to care how the compiler decided to encode your 'enum' or your 'struct'.

And there's a build tool. It can manage dependencies. Here I say I want a RISC-V implementation from a path and an RGB LED driver from some Git repo. It can call build tools for you, such as Yosys and nextpnr, and it's scriptable via plugins, so this specifies that I want to bring in a plugin that generates a program memory from an assembly file.

The last slide: Spade is an open-source project, of course, since this is OSDA. It's implemented in Rust, but it's not using any of the Rust compiler and it's not embedded inside Rust; it is a standalone language. It's just that I take a lot of inspiration from Rust, and I implemented the compiler in it. It's targeting Verilog, which is not great, but it's easy to do and all the tools support it, so that's why I decided to do it. I would like to explore something else, like CIRCT, Calyx, or RTLIL, because that would give me a whole lot more nice features.

That's all I have to say. If you want to learn more, there's a website, spade-lang.org, or you can follow me on Mastodon, where I ramble about this language.

Q&A

  1. I'm trying to grasp how this fits into your first diagram of the varying levels of abstraction. How would you compare its capabilities with something like Chisel, for one, I guess?

    For Chisel, I actually have a slide, because I get asked this question a lot. So, of course, Spade is a new language; Chisel and all the other languages are going to be way more mature. I would say all of these languages push the abstraction level, but Chisel in particular, and, I guess, all the other hardware construction languages, as I think they call them, do so through meta-programming. But when you can't use meta-programming, when you're describing the individual operations that you want to perform, you're still working with bundles of bits and individual operations on those bundles of bits. So there's no pipelining feature, because it's just bundles of bits. You don't have any nice types at the hardware level. You don't have this pattern matching that you can do on the 'enums'; I didn't show that off due to time. They're also imperative, so you have this "if this happens, set this", which I think is sort of the wrong way to view hardware. And they are also embedded languages, so to me they feel kind of clunky. You have to do "when/elsewhen/otherwise" instead of "if/else", which is fine, but then the autoformatter kind of messes with it when you do that. There's some other stuff as well; you can read the points on the slide. But I hope that answers your question. You asked about LiteX as well?

  2. Which is in the same space.

    Yeah. [...] Yeah.

  3. Could you give us an example of what you are designing with this?

    Sure, yeah. So, it's kind of early stage still. I have a working RISC-V processor, a five-stage pipeline thing that only supports the base instruction set for now. I've built the controller for a research project I'm working on, which is sort of doing dynamic programming: it feeds a bunch of stuff into a long pipeline and writes the results back to memories. I'm playing around with talking to SDRAM now. For that, I realized that I don't really want to write an SDRAM controller right now, so it's more like "Can I integrate LiteDRAM with Spade in a nice way?". And some random games. A few friends of mine built a game during a game jam.

  4. I've used, I guess, Migen, which is the older version of Amaranth, a lot. And I guess your work [?]. I'm trying to figure out how much time I'm going to use. I also love Rust, it's a nice thing. The only thing is, how far could you push this idea of basically leveraging typing and Rust-like strictness, which comes with a lot of relief over time? But at the same time, how much frustration has it been trying to put things in that way? Has it been a positive journey? How do you feel about it?

    I've had lots of fun with this project. It started two years ago as a hobby project and then it turned into a work project, and I still find it super fun to work on. It's a fun challenge, and it's nice to have things to borrow from. On your typing point, one thing that isn't a thing in software but is a thing in hardware is modeling ports. I didn't get into that here, but I have a system similar to lifetimes in Rust for modeling that, so that you can only use a memory port, for example, once. If you try to give a memory port to two different independent circuits, the compiler will say "No! It's being used already."

  5. That seems to be going a bit heavily on the Rust side. I do enjoy the fact that when I use Migen, it's a Python-based thing and you have this introspection of the objects, and actually there's a lot of power to be had there. Unfortunately, this kind of strict typing, which I think is coming to Python, but at the same time there's a trade-off between [?] and ease of expression.

    Yeah, I'm very much on the typing side of things.

  6. I could follow up on that also. As I mentioned in my talk, we use essentially all of Haskell's types. It's not a subset of Haskell's types; we just don't believe in that. There are all kinds of interesting benefits, apart from just generally matching types and making sure things are right, that you get out of that. For example, in Haskell you have a concept in the type system called type classes, which is a very disciplined structure for overloading. With that, what you can do is, for example, with an 'enum' or a 'struct' like a little earlier, recognize two separate questions. One is logically, "What is the 'enum' or the 'struct'?", and the other is physically, "How do you represent that in bits?". One way to do it is a particular tagging scheme this way, or another tagging scheme that way. Type classes completely solve that problem: you can separate out the concept of what the type logically is from what the representation is. That's one example of type classes. Another place where you use type classes is if you think of a large system that has control and status registers floating all over the place in different modules. Ultimately, that's a global space of control and status registers, and plumbing them through module interfaces is often extremely messy. Again, type classes completely solve that problem for you, because you can hide the plumbing using monadic types. We're just believers in very strong polymorphic typing in hardware design languages.

    [Comment from the audience:] I agree.

  7. I have just one question. How did you choose pipelining, where the user types 'pipeline', the data types, the latency? Why not let the tool do that for you?

    The answer is that I want control of the retiming, I guess. The reason I started this project was that I was doing a bunch of pipelining stuff, and if you're doing it in Verilog, you suffix signals with _s1, _s2, and then you have to make sure you refer to the right ones. I still wanted that level of description, I just didn't want to do the work manually, because it seemed so easy to automate. If you want more, where you specify a latency and it does the retiming for you, you should look at PipelineC, which is another HDL that does this iterative retiming automatically.