
SRAM Design with OpenRAM in SkyWater 130nm

In this talk, Prof. Guthaus presents the current status of the OpenRAM project, including SkyWater 130nm tape-out results. Prof. Guthaus also discusses the roadmap for new OpenRAM features and support for newer technologies.

[For readability, moderator comments and minor clarifying questions have been removed.]

Thank you, everybody, and thank you for the nice introduction. Hopefully I won't be the first one to disappoint by going over ten minutes, but we'll see. I have a lot of slides, so...

Now, I'm not going to talk a lot about what OpenRAM is and how it works; I've given a number of other talks that you can look at online. This one is going to focus a lot more on some actual first silicon test results and measurements that we've gotten. But the TL;DR: OpenRAM is a memory compiler written in Python, and it has reference flows and so on. I think the newest things in the last six to nine months are that you can now 'pip install' OpenRAM. It doesn't 'pip install' all the SkyWater stuff, because that's quite a big set of cells and libraries, but we're working on something for that. We've also moved from a Docker-type setup to more of a Conda-type tool setup, so we're making improvements from a software perspective over time.

Now, the interesting thing is that we've actually gotten back some of our first results. We've made two test chips. The first one was OR1, which I did with efabless and Google way back before the Google SkyWater open MPWs, so it was a dedicated test chip to get things going. It was all open-source OpenRAM, except that DRC and LVS still used the proprietary SkyWater PDK.

Then we did another test chip (we've actually done two more since then). The second test chip used the Caravel project with efabless, and we put 10 SRAMs on it: five dual-port memories and five single-port memories in a bunch of different configurations, so that we could hopefully test and characterize the memories in real silicon. Now, some of the challenges. You may think a memory compiler is easy: it's just a nested for loop, you make the array, and you're done. It's not quite that simple; there's a lot of control logic and other annoying things you have to deal with.
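For contrast, the naive "nested for loop" view of an array can be sketched in a few lines. This is a hypothetical illustration, not OpenRAM code: the cell dimensions and the convention of mirroring every other row are assumptions, and everything that makes a real compiler hard (decoders, sense amplifiers, control logic, routing) is deliberately missing.

```python
# Hypothetical sketch of the naive "nested for loop" array. Cell sizes and
# row mirroring are illustrative assumptions, not OpenRAM or SkyWater values.
from dataclasses import dataclass

@dataclass
class Placement:
    row: int
    col: int
    x: float
    y: float
    mirrored: bool  # True if the cell is flipped about the x-axis

def tile_array(rows: int, cols: int, cell_w: float, cell_h: float) -> list[Placement]:
    """Place rows x cols bitcells on a regular grid.

    Every other row is mirrored so neighboring rows can share rails,
    a common SRAM convention assumed here for illustration.
    """
    placements = []
    for r in range(rows):
        for c in range(cols):
            placements.append(
                Placement(row=r, col=c, x=c * cell_w, y=r * cell_h, mirrored=(r % 2 == 1))
            )
    return placements

array = tile_array(rows=4, cols=8, cell_w=1.2, cell_h=1.58)
print(len(array))  # 32
```

Everything the talk goes on to describe (strap and corner cells, abstract views for DRC, replica columns) is what this loop leaves out.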

One of the annoying things is how to deal with foundry-specific bit cells. The SkyWater bit cells were our first experience with this, because OpenRAM was originally an open-source tool targeting FreePDK and scalable CMOS technologies, which didn't include any of the lithography information. SkyWater started to expose us to some of that. We had some reference arrays of known bit cells from SkyWater, and we basically reverse engineered an old memory to make OpenRAM work for SkyWater. As you can see here, we have an example of the dual-port bit cell, which we extracted from an array; it includes a strap cell as well as a well tap cell, all in one.

That was on the first tape-out. On the second tape-out we added the single-port memories, which have a much denser and more complicated bit cell layout, including custom corner cells and a lot more integrated optical proximity correction (OPC) for the lithography. So we had to do a more customized placement of the array, which required writing some new code for OpenRAM using our plug-in interface for custom modules. You can see here the single-port bit cell along with a separate strap cell and the corner cells.

So you can see the size difference between the bit cells. This is a little unfair, because the dual-port cell on the left includes the tap cell and the one on the right does not, but the difference in size is quite dramatic. You can also see that the single-port cell layout is heavily customized and uses some non-rectilinear geometries. So it's a much denser cell.
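A customized array placement of the kind described above can be sketched, under assumptions, as a column plan: bitcell columns with a strap column inserted at a fixed interval and corner cells at both edges. The function names, cell widths, and strap interval below are hypothetical and are not taken from OpenRAM's plug-in code or the SkyWater cells.

```python
# Hypothetical column plan for one array row: corner cells at both ends and a
# strap column every `strap_every` bitcell columns. Widths are made up.

def column_plan(num_bit_cols: int, strap_every: int) -> list[str]:
    """Return the left-to-right sequence of column types."""
    plan = ["corner"]
    for c in range(num_bit_cols):
        if c > 0 and c % strap_every == 0:
            plan.append("strap")  # strap inserted between groups of bitcells
        plan.append("bit")
    plan.append("corner")
    return plan

def column_offsets(plan: list[str], widths: dict[str, float]) -> list[float]:
    """Accumulate x offsets; each column starts where the previous one ends."""
    offsets, x = [], 0.0
    for kind in plan:
        offsets.append(x)
        x += widths[kind]
    return offsets

widths = {"corner": 0.5, "bit": 1.2, "strap": 0.4}  # hypothetical widths (um)
plan = column_plan(num_bit_cols=8, strap_every=4)
print(plan)
# ['corner', 'bit', 'bit', 'bit', 'bit', 'strap', 'bit', 'bit', 'bit', 'bit', 'corner']
```

Because the strap and corner cells have different widths from the bitcell, the offsets can no longer come from a simple multiply, which is one reason a plain nested loop stops being enough.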

Now, one of the challenges we had was how to verify these. On our first tape-out we used the commercial tools, which were able to handle a lot of these proprietary rules. The open-source tools were less flexible with rules that fall outside the user design rules. Because the SRAM cells have all of these OPC layers that help with the lithography, they violate a lot of the user design rules. So our approach was to replace the bit cell, and any other offending cell, with an abstract view that passes DRC but doesn't necessarily have all of the features of the real cell. You can see here an example for the 6T single-port SRAM cell, which we used to do the connectivity analysis and to make sure that the bit lines and at least the high-level structures pass DRC, while ignoring the rest of the bit cell's contents. So that's how we addressed the main arrays.
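The abstract-view substitution can be pictured as a lookup applied to the cell hierarchy before running DRC. This is a minimal sketch under assumptions: the cell names and the flat list of (instance, cell) pairs are hypothetical stand-ins, not OpenRAM's layout data structures.

```python
# Hypothetical sketch: swap cells that rely on foundry-only (OPC) rules for
# abstract views that keep pins and top-level metal but drop internal
# geometry. Cell names and the flat instance list are illustrative only.

ABSTRACT_VIEWS = {
    "sp_bitcell": "sp_bitcell_abstract",
    "sp_strap": "sp_strap_abstract",
}

def substitute_abstracts(instances: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (instance_name, cell_name) pairs with offending cells replaced.

    Cells without an abstract view (e.g. peripheral logic) are kept as-is,
    so they are still checked in full.
    """
    return [(inst, ABSTRACT_VIEWS.get(cell, cell)) for inst, cell in instances]

top = [("bit_r0_c0", "sp_bitcell"), ("strap_0", "sp_strap"), ("sense_amp_0", "sense_amp")]
print(substitute_abstracts(top))
# [('bit_r0_c0', 'sp_bitcell_abstract'), ('strap_0', 'sp_strap_abstract'), ('sense_amp_0', 'sense_amp')]
```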

We also needed some other custom cells for the control logic in our memory. We use a replica-based control scheme with a fake column in which all the bit cells are pre-programmed to a logic zero, and we use that column to generate the timing for the array. To do this, we had to generate a replica bit cell, shown in red, that's programmed to zero, as well as a dummy bit cell that has its bit lines disconnected.
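One common way to reason about replica timing, assumed here rather than taken from OpenRAM's implementation, is a first-order constant-current model: a bitline of capacitance C discharged by a cell current I falls by ΔV in t = C·ΔV/I. If k replica cells storing zero pull the replica bitline through a large swing (enough to flip a logic gate) in the same time a single cell develops the small sense swing on a real bitline, the replica signal tracks the real read path. Every number below is an illustrative assumption.

```python
# First-order replica-timing sketch. All values are illustrative assumptions,
# not SkyWater 130nm or OpenRAM parameters.

def discharge_time(c_bitline: float, delta_v: float, i_cell: float, k: int = 1) -> float:
    """Seconds for k parallel cells, each sinking i_cell amps, to lower a
    bitline of c_bitline farads by delta_v volts (constant-current model)."""
    return c_bitline * delta_v / (k * i_cell)

def replica_cells_needed(replica_swing: float, sense_swing: float) -> int:
    """Cells needed so the replica's large swing takes about as long as a
    single cell's small sense swing on a real bitline."""
    return round(replica_swing / sense_swing)

VDD = 1.8            # supply (V), assumed
SENSE_SWING = 0.15   # swing the sense amp needs (V), assumed
C_BL = 100e-15       # bitline capacitance (F), assumed equal for both columns
I_CELL = 20e-6       # read current of one cell (A), assumed

k = replica_cells_needed(VDD / 2, SENSE_SWING)        # 0.9 V / 0.15 V -> 6 cells
t_read = discharge_time(C_BL, SENSE_SWING, I_CELL)    # real bitline
t_replica = discharge_time(C_BL, VDD / 2, I_CELL, k)  # replica bitline
print(k, t_read, t_replica)
```

In this model the two times match by construction; on silicon, the appeal of a replica column is that it shares process and temperature variation with the real bitcells.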

We made those cells by very slightly perturbing the layout. We're hopefully going to get some X-ray analysis of these bit cells to see how good our guesses were and how the lithography plays out. That's one of the benefits of an open-source community: someone's going to do that for us.

We did the same with the dummy bit cell as well. Now, decoders... Actually, I'm going to skip ahead to some of the actual results. So we taped out. Oops, jumped ahead too quickly. There's our actual die photomicrograph of the first chip. We don't have a photo of the second one yet, but we do have it in actual silicon.

And who thought you'd see a shmoo plot in an open-source talk? We've actually got silicon measurements of that first SRAM, and it's functional. I think the main challenge was a lot of the routing at the top level: we didn't buffer many of the signals connecting to the SRAM or do timing optimization on them. So we're actually limited in performance by the interconnect to the SRAM rather than by the SRAM itself. As you can see, we tested over a set of corners, with different temperatures, voltages, and so on, and it was working up to around 40 MHz, which is not bad for a first go.
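For readers unfamiliar with shmoo plots: the tester sweeps two parameters, here supply voltage and clock frequency, runs a pass/fail test at each point, and prints the results as a grid. The sketch below uses a made-up pass/fail model purely to show how the grid is built; it is not the measured data from the test chip.

```python
# Hypothetical shmoo-plot sketch. `passes` is a made-up stand-in model, not
# measured data; on real silicon each grid point is a tester run.

def passes(vdd_mv: int, freq_mhz: int) -> bool:
    # Assumed toy model: maximum working frequency rises linearly with VDD.
    f_max = (vdd_mv - 600) / 20
    return freq_mhz <= f_max

def shmoo(vdds_mv: list[int], freqs_mhz: list[int]) -> list[str]:
    """One text row per frequency (highest first); '*' = pass, '.' = fail."""
    rows = []
    for f in sorted(freqs_mhz, reverse=True):
        marks = "".join("*" if passes(v, f) else "." for v in vdds_mv)
        rows.append(f"{f:4d} MHz |{marks}")
    return rows

vdds = [1400, 1600, 1800, 2000]  # one column per supply voltage, in mV
for row in shmoo(vdds, [30, 45, 60, 75]):
    print(row)
```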

Then we also did voltage measurements. And finally, we did a data retention analysis: the SRAM retains its data down to about 440 mV, and when we raise the voltage back up, we can read the contents back. So we actually have some characterization results, which are encouraging. As for the second test chip, I have it on my desk. We're still configuring the IOs, and we haven't gotten much life out of it yet, but hopefully I can talk to Tim more and we can come up with plans to get it analyzed further. That's one of the reasons I'm here.
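The retention measurement follows a simple procedure: write a known pattern at nominal supply, drop the supply to a hold voltage, raise it back, and read the data back. The lowest hold voltage that still returns correct data is the retention voltage. The sketch below runs that procedure against a toy SRAM model whose retention threshold is an assumption; the class and function names are hypothetical, and no measured values are reproduced.

```python
# Hypothetical data-retention test sketch. ToySram is a stand-in model with an
# assumed retention threshold; it is not the test chip or its measurements.

class ToySram:
    def __init__(self, words: int, retention_mv: int, nominal_mv: int = 1800):
        self.mem = [0] * words
        self.retention_mv = retention_mv  # assumed: contents lost below this
        self.vdd_mv = nominal_mv

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value

    def read(self, addr: int) -> int:
        return self.mem[addr]

    def set_vdd(self, mv: int) -> None:
        self.vdd_mv = mv
        if mv < self.retention_mv:
            # Model corruption deterministically by inverting every bit.
            self.mem = [w ^ 0xFFFFFFFF for w in self.mem]

def min_retention_mv(sram, pattern, sweep_mv, nominal_mv=1800):
    """Write at nominal, hold at each lower supply, restore, read back.
    Returns the lowest hold supply that preserved the data (sweep runs downward)."""
    lowest = None
    for hold_mv in sweep_mv:
        sram.set_vdd(nominal_mv)
        for addr, word in enumerate(pattern):
            sram.write(addr, word)
        sram.set_vdd(hold_mv)     # drop the supply and hold
        sram.set_vdd(nominal_mv)  # raise it back before reading
        if any(sram.read(a) != w for a, w in enumerate(pattern)):
            break
        lowest = hold_mv
    return lowest

pattern = [0xA5A5A5A5 ^ i for i in range(16)]
print(min_retention_mv(ToySram(16, retention_mv=500), pattern, range(800, 300, -20)))
```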

As for future work, we've also just released OpenROM, a NAND ROM generator. It's not integrated with OpenLane yet. We're porting to GlobalFoundries 180nm. We also got some ReRAM test structures on the last MPW, and we're working on ReRAM arrays in OpenRAM as well. So there's a lot going on. I don't think I went too far over my 10 minutes, and I do want to leave time for some questions. So...
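The core idea behind a NAND ROM can be sketched as a mapping from data words to transistor placements. In a NAND ROM each bit line sits on a series stack with one position per word line. Under the convention assumed here, unselected word lines are driven high and the selected one low. If a transistor is present at the selected position, it turns off, the stack cannot discharge the precharged bit line, and the bit reads 1. If the position is shorted, the stack conducts and the bit reads 0. This is a behavioral sketch with hypothetical function names, not OpenROM's code.

```python
# Hypothetical NAND ROM sketch (behavioral, not OpenROM's implementation).
# Convention assumed: selected word line low, all others high; a transistor
# present at the selected position blocks the stack, so the bit line stays
# precharged and reads 1; a shorted position lets it discharge and read 0.

def nand_rom_layout(words: list[int], width: int) -> list[list[bool]]:
    """Return present[row][col]: True where a series transistor is placed."""
    return [[bool((w >> col) & 1) for col in range(width)] for w in words]

def read_word(present: list[list[bool]], row: int) -> int:
    # All unselected transistors in each stack are on (their word lines are
    # high), so only the selected position decides whether the stack conducts.
    value = 0
    for col, has_fet in enumerate(present[row]):
        if has_fet:
            value |= 1 << col
    return value

data = [0b1010, 0b0111, 0b0000, 0b1111]
layout = nand_rom_layout(data, width=4)
print([read_word(layout, r) for r in range(len(data))])  # [10, 7, 0, 15]
```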

Q&A

  1. You said you need a limited [?] of software, and this [?] a lot of times.

    Yeah. The Python code itself implements a lot of things, and for what it doesn't implement, it uses open-source tools in the back end for simulation, DRC, and LVS. We use a wrapper approach that hides each tool's interface, so we can run simulations with, for example, HSPICE, ngspice, or Xyce; for any standard simulator, we have an interface to it. Inside OpenRAM itself, we have data structures for layout and for the logic hierarchy: transistors, devices, and so on. We have data structures and an API to interface with all of that. It's meant to be a flexible interface so that you can generate basically any sort of custom layout, whether or not it's regular.

  2. Yeah. In my experience, any structural code that grows to a sizeable size, even if it's not, you know, too large, gets unmanageable.

    Yeah, it does become unmanageable to a certain extent, but we also automate certain things. For example, we have a channel router, and we have a maze router that's not very good, but it's a maze router for connecting some things. So some parts are more automated, which makes it a little more manageable. And we always sacrifice area for portability; portability is our key goal. The layout isn't very dense in a lot of cases.

  3. Yes, exactly. The question is how much more area-efficient your generator is at one kilobyte, since some normal design people say: just use RTL synthesis, and if you have enough chip area, it will at least work in one flow without costing much area. How does your generator compare with a standard full flow for the same SRAM sizes?

    If you were to use a 1k flip-flop- or latch-based RAM, we're probably about 4x smaller. It's considerable. Once you're above a couple hundred bits, we save area. Compared to a commercial memory compiler, I would say we're about 30% worse, in that ballpark. There's a lot of improvement needed there. But again, our goal has always been portability and productivity. Going back for density and layout is on the horizon, but it's still a secondary goal. I'm always looking for help, though; if people want to help with that, that'd be good.

  4. Thank you. It's a follow-up question on Python. C++ is the usual choice for this type of work, so have you ever had an "aha" moment where you thought maybe you shouldn't have chosen Python because of performance problems?

    I would say the only reason I've thought, aha, we shouldn't have chosen Python, is that it's horrible at object-oriented design. We started the project in Python 2, quite a long time ago actually, and Python has evolved over time. We didn't necessarily pick up on a lot of the design practices early on, because they came about after we started the project: things like naming schemes that help with object-oriented design. You know, the PEPs, or whatever they call them, the Python design suggestions. So that's the only reason I would reconsider Python: how it handles abstraction and so on. But that's not a fundamental limitation, and we've been revising the code over time, so I think it keeps getting better and better.

  5. Is ECC on your roadmap?

    That's a good question. As for ECC, we already support extra rows and columns. I had a student do a master's project on a soft Verilog wrapper that does the self-test and repair, so you have extra glue logic that gets synthesized, along with the redundant rows and columns. So yes.

  6. Are there any roadblocks, or, the opposite, avenues where you see a lot of potential?

    Yeah, I think changing how we think about memories in design is a big thing. Right now, the common approach is for designers to just instantiate a memory and say, I need this much memory. That's a bad approach; it should be more of a synthesis-type approach. I think interfacing with the high-level tools like OpenROAD and Yosys is where there's a lot of potential. And that's not really possible with a lot of the commercial or proprietary compilers, because you don't have as much flexibility.

  7. So how about, in the hardware, non-CC cells, like AP, AP, especially for processing in memory?

    Yeah, we intentionally wrote it so that the type of bit cell doesn't really matter. It does rely a little on differential signaling, so if you went to a single-ended cell, you'd have to change your sense amp scheme. There's probably some stuff you'd have to fix, but our intent was that you would be able to change your cell, and we've written it to be very flexible in that respect. It's also very flexible in, for example, your decoder: you can override our default decoder and make your own. It's intended to be very modifiable in that way.
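In the first Q&A answer, Prof. Guthaus describes wrapping each back-end tool behind a common interface so that HSPICE, ngspice, or Xyce can be swapped in. A minimal sketch of that wrapper idea follows. The class and function names are hypothetical rather than OpenRAM's. The command lines assume the tools' usual batch invocations ("ngspice -b", "Xyce <netlist>", "hspice -i"), which should be checked against each tool's documentation. The sketch only builds the argument list; a caller would hand it to subprocess.run.

```python
# Hypothetical sketch of a simulator wrapper: one interface, several SPICE
# back ends. Names are illustrative; command lines assume common batch usage.
from abc import ABC, abstractmethod

class Simulator(ABC):
    @abstractmethod
    def command(self, netlist: str) -> list[str]:
        """Return the argv used to simulate `netlist` in batch mode."""

class Ngspice(Simulator):
    def command(self, netlist: str) -> list[str]:
        return ["ngspice", "-b", netlist]

class Xyce(Simulator):
    def command(self, netlist: str) -> list[str]:
        return ["Xyce", netlist]

class Hspice(Simulator):
    def command(self, netlist: str) -> list[str]:
        return ["hspice", "-i", netlist]

SIMULATORS = {"ngspice": Ngspice, "xyce": Xyce, "hspice": Hspice}

def get_simulator(name: str) -> Simulator:
    """Look up a back end by case-insensitive name."""
    return SIMULATORS[name.lower()]()

print(get_simulator("ngspice").command("sram_tb.sp"))
# ['ngspice', '-b', 'sram_tb.sp']
```

The rest of a generator built this way only ever calls get_simulator(...).command(...), so adding a new simulator means adding one small class.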
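The area answer in the Q&A, that flip-flop or latch RAMs lose once you pass a couple hundred bits, follows from a simple model. A compiled SRAM pays a fixed periphery overhead (decoders, sense amps, control) but a much smaller per-bit cost than a flip-flop. The constants below are made-up placeholders in arbitrary units, not measurements. They only show where a crossover point comes from.

```python
# Toy area model with hypothetical constants (arbitrary units), not data.
import math

A_FF_PER_BIT = 15.0       # assumed flip-flop/latch area per bit
A_SRAM_PER_BIT = 1.0      # assumed bitcell area per bit
A_SRAM_OVERHEAD = 3000.0  # assumed fixed periphery area of the SRAM macro

def ff_area(bits: int) -> float:
    return bits * A_FF_PER_BIT

def sram_area(bits: int) -> float:
    return A_SRAM_OVERHEAD + bits * A_SRAM_PER_BIT

def crossover_bits() -> int:
    """Smallest bit count at which the SRAM is no larger than the flip-flop RAM:
    overhead + n * a_bit <= n * a_ff  =>  n >= overhead / (a_ff - a_bit)."""
    return math.ceil(A_SRAM_OVERHEAD / (A_FF_PER_BIT - A_SRAM_PER_BIT))

n = crossover_bits()
print(n, ff_area(1024) / sram_area(1024))
```

Below the crossover, the fixed overhead dominates and synthesized storage wins. Above it, the per-bit difference dominates and the ratio keeps growing with size.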
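The ECC answer in the Q&A is really about redundancy: spare rows and columns plus a synthesized wrapper that performs self-test and repair. The repair half can be sketched as a remap table that redirects accesses from faulty rows to spare rows. The class below is a hypothetical behavioral model, not the student project's Verilog, and it covers row repair only. It does not implement ECC.

```python
# Hypothetical row-redundancy sketch: a remap table redirects faulty rows to
# spare rows. Behavioral model only; not the actual Verilog wrapper.

class RepairedArray:
    def __init__(self, num_rows: int, num_spares: int):
        self.rows = [0] * (num_rows + num_spares)  # spares sit past the end
        self.num_rows = num_rows
        self.free_spares = list(range(num_rows, num_rows + num_spares))
        self.remap: dict[int, int] = {}

    def repair(self, bad_row: int) -> bool:
        """Map a faulty row to a spare; return False if spares are exhausted."""
        if bad_row in self.remap:
            return True
        if not self.free_spares:
            return False
        self.remap[bad_row] = self.free_spares.pop(0)
        return True

    def _physical(self, row: int) -> int:
        # Repaired rows go to their spare; all others map to themselves.
        return self.remap.get(row, row)

    def write(self, row: int, value: int) -> None:
        self.rows[self._physical(row)] = value

    def read(self, row: int) -> int:
        return self.rows[self._physical(row)]

mem = RepairedArray(num_rows=8, num_spares=2)
mem.repair(3)          # e.g. the self-test flagged row 3
mem.write(3, 0xBEEF)
print(hex(mem.read(3)))  # 0xbeef, stored in spare row 8
```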
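The last Q&A answer mentions that you can override the default decoder with your own. One way such an override can work is a name-based module registry, where a user-registered class replaces the default under the same name. The registry, decorator, and class names below are hypothetical assumptions for illustration, not OpenRAM's actual plug-in interface.

```python
# Hypothetical plug-in registry sketch; not OpenRAM's actual interface.

MODULES: dict[str, type] = {}

def register(name: str):
    """Class decorator: the most recently registered class for `name` wins."""
    def wrap(cls):
        MODULES[name] = cls
        return cls
    return wrap

@register("decoder")
class DefaultDecoder:
    def decode(self, addr: int, num_rows: int) -> list[int]:
        """One-hot word-line vector selecting row `addr`."""
        return [1 if r == addr else 0 for r in range(num_rows)]

def build_decoder():
    # The rest of the generator asks for "decoder" by name only.
    return MODULES["decoder"]()

# A user-supplied replacement registered under the same name overrides it.
@register("decoder")
class MyDecoder(DefaultDecoder):
    def decode(self, addr: int, num_rows: int) -> list[int]:
        if not 0 <= addr < num_rows:
            raise ValueError("address out of range")
        return super().decode(addr, num_rows)

print(type(build_decoder()).__name__, build_decoder().decode(2, 4))
# MyDecoder [0, 0, 1, 0]
```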