[For readability, moderator comments have been removed, as well as minor questions for better understanding.]
Yes, I'm going to tell you a little bit about the Bluespec high-level design languages. I would guess that most people might have heard the name Bluespec, but perhaps don't know very much about it at all, other than having heard the name. There's also a small terminological confusion in that there's a Bluespec company and there's the Bluespec languages. They used to be the same at one point, but the Bluespec language has been open-sourced completely for about three years now, and the Bluespec company is not directly involved in Bluespec language anything, other than allowing people like me to work on it and giving any resources for that. So I'd like to talk about, just give you a sense of what the language is about. It's yet another high-level design language, many of you will be saying, so what's different? And there are many dimensions or differences between different other HDLs, but I think the one fairly unique one, in the sense that I've not seen any other HDL that takes this approach, is the semantic model of condition-action rules, which is inspired by languages in formal verification.
So I'm going to mostly spend my time on that. There's also a different dimension, which is how it borrows lots of ideas from Haskell for static elaboration, et cetera, but we don't have enough time to go into both of these things, so I'm just going to focus on this thing, which is fairly unique. So I'll do it by means of a small example. So what Bluespec, the language is, is being inspired by, right from the beginning, by languages for formal specification of highly concurrent systems.
So I'll start by giving a small example. Just imagine a very small toy coherent cache system. So in this example, L2 is your L2 cache. We have two L1 caches, A and B, and we have FIFOs that do read-write requests from, let's say, from two CPUs into the L1 caches, and they may be coming concurrently into those CPUs. Alright? So let's imagine this is the initial state of the system, where L1_A has the data X in exclusive mode, maybe we simply did a write onto the data. It's invalid in L1_B, and L2 says, just has a notation that it's available in exclusive mode in A, and perhaps has an old value Y in it.
And now imagine a read request comes in to B, so we can imagine transitioning to a state like this, where the states have all become shared, the current value X is available everywhere, and L2 has a notation that it's available at both A and B. Once we have it in the state, seeing a read on B, we can immediately satisfy the read and put back an X.
So what I've shown you in a picture can be expressed in text as a rule, a condition action rule. This rule says if the request on B at the head is a read, its state is invalid, and L2 is exclusive, then here are some updates to the state that will transition the L1, L2s and respond to B with the value X in it.
So this, essentially the entire behavior of the cache can be specified. So this, I'm not claiming this is an implementation, but you can specify the cache behavior using rules like this. So for example, I'd have a similar rule if the request was a write, and similarly a pair of rules for reads and writes on A, right? And the nice thing about rules is that they are concurrent, in the sense that if rules don't interfere in any way, they can run concurrently, there's no ordering between rules. You can non-deterministically choose what order you execute rules on if you want to pick an order. The only thing you need to, that is important, is that rules should be treated as atomic transactions. And why atomic transactions? Because we want to think in terms of invariance for correctness on the overall cache system. An invariant may be a correctness condition like if one of the L1s is in exclusive mode, then the other one must be invalid, for example. Right? So we want you to check. So with rules, you can reason about that if the invariant was true before the rule, after this rule, is it still true? And atomicity gives you the tool exactly to do that kind of reasoning.
So here's the second version of the same problem, except I'm going to write it with much finer granularity rules, right? So start with the same state. And now my rules are going to be local. In other words, every rule is only going to look at one of the elements of that cache system, and it can send and receive messages. So when we see this initial state, we send a shared request to L2, and we go into a pending mode on B. When L2 sees the shared request, it sends a downgrade request up to L1_A, saying please get back into shared mode, and it also goes into pending mode. When that sees the shared thing, it goes into shared state, returns a shared response, and says here's the new value X. When the L2 gets that, it goes into shared mode and sends an acknowledgment back to B, saying okay, you're shared. And finally, B itself can go into shared mode, and now we're just like in the second state that you saw in the previous slide, and we can respond with an X.
So this is much more closer to the implementation because it's local. I'm not looking at state all over the system as if I can instantaneously look at state. I'm accommodating the idea that there's a cost to be paid by communication. Now this whole thing could also be written in exactly the same syntax, the same notation language, except that every rule now, if you look at it in detail, I've just given about three or four of these rules, is set up to be a local examination of your current local state and incoming messages, and the transition is the local change in state and outgoing messages.
So this is not a new idea in general, the idea of condition-action rules. It goes back to perhaps the earliest days of LISP programming in the 50s and 60s. I've mentioned some formal specification systems, TLA+, UNITY, term-rewriting systems, Event-B, et cetera, that all use modular different syntaxes and details, roughly the same idea. You have condition-action rules. Rules are not ordered. They can go in any order, so they give you concurrency. They're non-deterministic. You can choose them in any order you like, and the only thing is they all assume rules are atomic. So, and further, the other nice thing about having the single languages, you can refine, right? In my first slide and my second slide, I showed you the solution to the same cache-cores problem with rules at different granularity, where in some sense you're lowering it down closer to the implementation.
So given that background, let me now tell you a little bit about BSP and BH, these languages. So essentially, the syntax and language I showed you in the first two slides is actual BSP code. I can push a button, run it through the BSP compiler, and get synthesizable RTL out of it. So, and that RTL, then you take it through standard RTL flows, FPGA, or ASIC, and essentially all behavior is written in this way, as in these formal specification languages. There is no other behavioral description other than this rule notation right here. So we have two such languages for historical reasons.
The original compiler 20 years ago, when we first started it, was written by Haskell enthusiasts, had a very Haskell-like syntax, didn't go very well in the hardware engineering community, so we redesigned the syntax to be a more SystemVerilog-ish syntax, so that's called BSV. But essentially, it's the same language, your choice of front-end, and you can mix and match, mix and match them.
So it's, so this is not a new language, as I said, it's newly open-sourced. It's been about 20 years in the making, so this is not a research language, this is not a toy language we're experimenting with. We're using this, and it has been used a lot in production in many, many situations. And being of that provenance, of that much time, it's a so-called batteries-included language. In other words, there's lots and lots of libraries available, so that you don't design every little piece of IP from scratch, whether it's caches, or MMUs, or RISC-V CPUs, or debug controllers, interrupt controllers, AXI buses, all of these are available as open-source libraries, so that you can sort of focus more on the architecture of something that you might be designing, rather than having to build everything up from scratch.
So it's been used, like I said, it was a proprietary tool from the company for 15, 16 years, during which time it was used for ASIC design in ST and in TI, for some rather large subsystems. It was used for modeling in IBM and other places, because the nice fact about the fact that you have the same notation, whether you're, no matter what level of abstraction you're describing it at, means you can actually compile and synthesize even the higher description. So even the thing I showed you in the first slide, which was unrealistic from an implementation point of view, if you're looking for performance, is still compilable to RTL, and you can still produce an RTL, you can run it on FPGAs as is, even the specification can be run on FPGAs. So IBM, recognizing that, was using this as a modeling tool to explore microarchitectures for some of their power CPUs, etc., back in the 2000s.
So nowadays, I would say it's mostly used by people for FPGA programming, a lot of it in the RISC-V space. So it's been open-sourced, like I said, about three years ago, and so I hope that gives you a quick sense of what's in Bluespec. So this is a unique feature. You'll see other languages, for example, like Chisel has exploited Scala for high-level expression of RTL, if you like, because it generates RTL from that. This is the only language I'm aware of that uses this idea of condition-action rules as the behavioral spec, and like I said, the whole inspiration has been for formal verification. We and the company haven't used it.
First of all, I'd like to say that even if you're not using formal verification as part of your automation, the idea that it has a formal semantics and has a formal way of doing refinement is useful, even if you're doing it manually. Most of what we have done has, in fact, been done manually. The automation work, there's a bunch of work at MIT where they have done pretty advanced RISC-V superscalar, out-of-order speculative processors and proved them correct using Bluespec, and we're in the process of doing some DARPA-related projects also along these lines.
So I'll stop at that point with this main point. Just as a teaser, it's in the slide, so you can see it later on. It's too much to look at at this point, but the other side of Bluespec is that it uses essentially all the power of Haskell, including polymorphic types, polymorphic parameters, higher-order functions, et cetera, to make it into a very powerful language for circuit description. So if you think of generating your circuit as two phases, the static elaboration part in some sense is the circuit description part by which you describe what's the structure of your circuit, what are the pieces, what connects to what, and how do they connect correctly from a types point of view to each other. And then there's the behavioral part. So the condition-action rules is about the behavioral side of Bluespec, and there's also a whole chapter on the Haskell-based static elaboration, which you can talk to me about separately outside or later on.
Q&A
-
So, obviously, it's a very good decision. I'm always interested in hearing the reasoning behind these things. I'm just curious why your company decided to make a Bluespec compiler open-source. I'm always interested in knowing why people do this.
Yeah. So we essentially gave up on trying to commercialize this. We tried it for more than a decade. And I think what I find is that if you look at similar experience in software languages, open-source languages succeed. Open-source languages themselves might succeed if you have enough oomph in them. Let's say Python, for example. Or even if it's not an open-source language, it might succeed if you have a big name and big resources behind it. So I point at Java, for example. Why did people adopt Java? Well, Sun Microsystems was behind it. And why do people adopt C-sharp and F-sharp? Microsoft is behind it. So we can't compete in that space. We're not a big company like it. So unless we got acquired by one of the biggies and one of the biggies promoted it, then it would take off. But it didn't. So we gave up eventually. We found over time also that a lot of our activities in the company drifted more into services and IP and things like that, for which we use Bluespec for everything we do, the language. But as an EDA tool, we just gave up on that.
-
So this is a very different execution model from what we're used to, right? I can very much see how it's useful for describing caches and CPUs and stuff. But are there things that you tried to design with it where the execution model was a limitation, I guess?
I don't think so. So I think both for... So this raises an interesting question, a related question. Other people ask it. It was not your question. It was how does this compare with HLS, for example, right? So one interesting thing is that one separation I make is that whether you take Verilog, VHDL, SystemVerilog, Chisel, Bluespec, you, the designer, are completely ultimately choosing the microarchitecture of what you do. There is no guesswork. The tool is not making any innovation, except local combinatorial optimizations and all. Generally, the overall structure of what you're doing is designer-specified. Whereas in HLS, the tool has an architecture and it has knobs by which you can adjust details of that architecture. So that's one major difference. So we sit very much in the Verilog, SystemVerilog, Chisel, that camp, as opposed to HLS.
Then coming back to your question, then, is if you think all of these languages, Verilog, SystemVerilog, VHDL, and Chisel, all use the classical clocked synchronous circuit model as their, effectively, their behavioral model on top of it, right? And here we are lifting it up beyond that to the rule-based model. I didn't say anything about clocks in any of my description out here. So there is a question of how you synthesize from there into, because ultimately we produce standard clocked synthesizeable Verilog.
And ultimately, it comes down to scalability in your reasoning about correctness, right? I think for small circuits, it doesn't make much of a difference whether you use the conventional Verilog, SystemVerilog, or Bluespec. But the thing about this atomicity is it's a global property. In other words, the way in a Bluespec program, you can have a module that has rules, that rule might invoke a method that crosses the boundary of the module into another module, which you can think of the method as a piece of a rule that's extending, and that may transitively go beyond that to multiple modules.
And it's in that kind of a situation where atomicity becomes a nightmare if you have to think about every level of detail of arbitration of every possible interference, especially when there's a lot of conditional interference. That is, on some conditions it interferes, and on other conditions it doesn't, etc. So that is where the value of having this atomic transaction model really helps in quickly composing scalable systems that have a lot of this kind of thing.
It's definitely a big advantage on circuits that have a lot of control flow in them, because that is where the conditional interference, etc., becomes most prominent. If you have very much of a data path, you know, the kind of thing that HLS does well on loops and arrays and all that, I don't think Bluespec will give you much of an advantage on that.
-
At one point, they just go away, but there were fashionable discussions of self-timed logic and systolic operation. I guess they are very good at that.
So, good question. So the question was about self-timing logic and asynchronous logic and things like that. And again, Caltech was, of course, very prominent in that. That's a very interesting question because rules, as I gave you, didn't say anything about clocks. And it seems like a natural fit for asynchronous logic, self-timing logic, etc. It's a topic to be, somebody should explore. It's the kind of thing where we've always had this idea that it could be targeted towards asynchronous logic. There was a company called Acronix that was doing asynchronous logic-based FPGAs that we had some conversations with, etc. We just didn't have the bandwidth to explore that, per se. But absolutely, it could be, the same rule-based source language could, instead of our synthesis method that goes through into clocked Verilog, could have gone into asynchronous logic as well.