Thursday, January 31, 2008

Design of a 50M-gate ASIC

These are maybe 10% of the problems ASIC designers face in the physical design and STA of a 50M-gate ASIC.

Some of the problems are:
-------------------------
1. Not many CAD tools out there can handle this design flat, at least in an initial prototyping phase (synthesis + cluster placement), to arrive at the initial logical/physical hierarchies.

2. This leaves us with 50 partitions (assuming an ideal partition size of 1 million gates). A 1-million-gate block is an ideal size that can typically be closed netlist to GDS in under a day with current CAD tools (timing closure + routing + clean LVS and DRC).

Fifty blocks is very, very tough to manage, though.

A better tradeoff...

Maybe we can have 10 partitions with 5M gates each. But my block run times will be higher, and I have to live with them (netlist to GDS could easily take 3-4 days or more).
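This tradeoff is easy to put on the back of an envelope. A minimal sketch, using the assumed gate counts and days-per-block figures from above (illustrative numbers, not tool data):

```python
# Hypothetical partition tradeoff for a 50M-gate design.
# All numbers (block sizes, days per block) are assumptions from the text.

def partition_tradeoff(total_gates, block_gates, days_per_block):
    """Return (block_count, total_block_closure_days) for a given block size."""
    blocks = total_gates // block_gates
    return blocks, blocks * days_per_block

# 1M-gate blocks: ~1 day each, netlist to GDS
small = partition_tradeoff(50_000_000, 1_000_000, 1)
# 5M-gate blocks: assume 4 days each (the pessimistic end of 3-4 days)
large = partition_tradeoff(50_000_000, 5_000_000, 4)

print(small)  # (50, 50) -> 50 blocks to manage
print(large)  # (10, 40) -> far fewer blocks, longer per-block runs
```

With enough machines the blocks run in parallel, so the real cost of 50 blocks is management overhead, not wall-clock days; the sketch just shows the block-count side of the tradeoff.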

3. K-way partitioners work best when k is small (5-6 partitions). I'm not sure how good an initial seed partition will end up being if we have 50 of them. So I will freeze on a maximum of 10, or worst case 20, partitions.

Partitioning has traditionally used pin-count (cut-size) reduction as the main cost function. Any other cost function is EDA sales innovation.
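For reference, that traditional cost function is just the number of nets crossing partition boundaries (each cut net costs pins on every partition it touches). A minimal sketch with an invented toy netlist:

```python
# Classic min-cut objective: count nets (hyperedges) spanning >1 partition.
# The netlist and partition assignment below are made up for illustration.

def cut_cost(nets, part_of):
    """nets: {net_name: [cells on net]}; part_of: {cell: partition_id}."""
    cut = 0
    for cells in nets.values():
        if len({part_of[c] for c in cells}) > 1:
            cut += 1  # net crosses a boundary -> extra block pins to budget
    return cut

nets = {"n1": ["a", "b"], "n2": ["b", "c"], "n3": ["c", "d"]}
part = {"a": 0, "b": 0, "c": 1, "d": 1}
print(cut_cost(nets, part))  # 1: only n2 crosses the boundary
```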

4. Reality check: partitioning is not timing-, congestion-, or power-driven.

5. There can't be much glue logic at the top level, and it has to be a sliced floorplan. This is the simplest floorplan possible (minimal glue logic at top).

6. People have designed these chips routinely. But they were a bunch of very smart chip designers sitting at IBM Microelectronics. (Not monkeys pushing CAD tools.)

7. This is a 45nm chip with a 200-250 mm2 die size. What is my yield going to look like? How many times do I keep spinning this thing through the fab?

8. How am I going to reduce leakage on this chip? Power gating? Multi-Vt? DVFS? Some special high-k dielectric which the fab is going to employ? All of the above?

9. The complexity is difficult to fathom if the chip has multiple modes of operation (even 2-3 modes is very, very hard). What if it also has multiple power domains (say 3)?

10. How do I generate the block-level constraints? My netlist is getting generated in a bottom-up fashion. Although I would love to push top-level constraints down to each of the blocks, I will have a schedule slip of 6 months if I wait for my top level to finish :(

11. How do I plan my block budgeting?
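One simple hand method, absent anything smarter, is to split each top-level path's cycle budget proportionally to the pre-layout delay estimate of each block it traverses. A sketch; the block names, period, and delay estimates are invented:

```python
# Proportional timing budgeting: divide a path's available time among blocks
# in proportion to each block's estimated delay. All numbers are illustrative.

def proportional_budgets(period_ns, est_delays):
    """est_delays: {block_name: estimated_delay_ns} along one top-level path."""
    total = sum(est_delays.values())
    return {blk: period_ns * d / total for blk, d in est_delays.items()}

budgets = proportional_budgets(10.0, {"blkA": 2.0, "blkB": 3.0})
print(budgets)  # {'blkA': 4.0, 'blkB': 6.0}
```

Real budgeters iterate as placement data firms up; this only shows the initial proportional split.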

12. The constraints will be quite tough to manage for all modes/corners. How do I clean up the messed-up constraints? In each mode? Across multiple modes? Across corners?
Do I use automatic constraint generators? Validators? How correct are they going to be? How much time do I waste trying to see if these tools are production worthy?
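Part of why this gets out of hand: the number of STA scenarios to keep clean grows multiplicatively with modes and corners. A sketch with made-up mode and corner names:

```python
# Scenario explosion: each (mode, corner) pair is a separate set of
# constraints to own and sign off. Mode/corner names below are invented.

from itertools import product

modes = ["functional", "scan_shift", "scan_capture"]
corners = ["ss_low_temp", "ss_high_temp", "ff_low_temp", "tt_nominal"]

scenarios = list(product(modes, corners))
print(len(scenarios))  # 3 modes x 4 corners = 12 scenarios to keep clean
```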

13. Some of my sub-chips have 200+ macros. Are mixed placers (capable of simultaneous std cell + macro placement) going to help my woes, at least with a good initial seed placement?

14. How do I design my power plan, given IR drop and EM limits, routing congestion, and adequate decap insertion?
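For a first-order sanity check before a real IR-drop tool run, a single power strap can be estimated with V = I x R, with R from the sheet resistance. Every number below is an assumed placeholder, not foundry data:

```python
# Back-of-the-envelope IR drop on one power strap: resistance in "squares"
# (length/width) times sheet resistance, times the current it carries.
# All values are invented placeholders for illustration.

def strap_ir_drop(current_a, length_um, width_um, rsheet_ohm_sq):
    squares = length_um / width_um
    resistance = squares * rsheet_ohm_sq
    return current_a * resistance  # volts

drop = strap_ir_drop(current_a=0.05, length_um=500, width_um=10,
                     rsheet_ohm_sq=0.04)
print(f"{drop * 1000:.1f} mV")  # 50 squares * 0.04 ohm/sq * 50 mA = 100.0 mV
```

A hand estimate like this only flags grossly undersized straps; the full grid still needs a proper IR/EM analysis.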

15. How the heck do I handle those monstrous ECOs? Reconfigurable filler cells (metal-only ECO)? What spare cell planning will help?

16. How do I do the final timing signoff? Incremental timer updates... how long are they going to take in STA tools? Do I use ILMs? Across corners and across multiple modes? It's tough to manage so many timing models.

Validating the ILMs for correctness is another big challenge.

17. How many clocks does this monster have? Clock tree synthesis (CTS) is a nightmare with multiple modes and across corners.

Also, what if the clocks span multiple blocks? I need to do proper clock planning early on to keep the clock from jogging too much across modules running at different voltage levels (and modes).

18. I hope my manager doesn't commit to a 2-3 week final-netlist turnaround (netlist to GDSII) to the end customer.

Let me know your thoughts :)

12 comments:

John said...

I sympathize with all these points! Do you have to solve them all yourself?

I'm fortunate to work with a relatively big team to address such questions.
I feel extra sorry for guys at startups trying to tape out these nanometer chips on a shoestring budget.

Nick said...

Hi John,
It's a small team (5) of pretty smart individuals at a big semiconductor company, but unfortunately almost no one on the team (except 1-2 of us) has the experience of doing a tapeout of this magnitude and complexity within a 3-4 month schedule.

--Nik

Unknown said...

Hi Nick,

Specifically referring to "monstrous" ECOs, how big can they practically be in your case? I mean, when do you call a changed specification/design correction a "monstrous" ECO?

--Nil

Nick said...

Hi Nilesh,

We have seen industrial-level CAD tools choke on ECOs > 15-20K gates in size (the choking could be related to timing/congestion/DRC convergence of the overall design).

Most tools can handle the capacity, but that's not the only issue here.

The ECO could be very design specific as it depends quite a bit on where the ECO cells got placed (for timing/congestion reasons) and where the available white space in your design is.

We usually have to figure out workarounds or re-do some global optimizations so that they are more aware of the design parameters, but this rarely happens by default in the ECO flows of CAD tools, as global steps are runtime/memory intensive.

ECO steps are mostly unaware of some or all of the design parameters, as they are intended to be very small, quick-and-dirty optimizations that do not have a global view of your design.

It really depends on what kind of issues the ECO is going to present to the CAD tool of choice.

Many ECO optimization commands in CAD tools might not be congestion-aware (for example, an incremental route might just fix opens but not detour to avoid congestion, since that would need a global view of routing), and ECO placement might add cells in congested regions (creating more local congestion).

I would categorize a monstrous ECO as anything that is going to take a monstrous effort for design closure due to the above issues in the CAD tool of your choice, and it is going to vary from CAD tool to CAD tool.

Unknown said...
This comment has been removed by a blog administrator.
Unknown said...

Hi,
ECO falls under that purview and happens to be one of the problems I am looking at right now from a research angle: as in, what precisely do current tools/algorithms lack in tackling practical ECOs? I would love to hear your view on it.

Another question: what do you mean by ECO? A functional change, a change in timing requirements, or the design giving you too much trouble with power, thermal hotspots (reliability), etc.?

Are these ECOs pre-mask or post-mask?

Thanks,
Nil

Nick said...

Hi Nil,

To address this question..

1. We had functional ECOs (change in spec). Some of our memories changed, and hence we had to make floorplan and power plan changes to accommodate the new memories as last-minute changes.

2. Timing ECOs to fix remaining hold violations (other than the ones we fixed in our implementation tool), rolled in as netlist edits from a signoff timer like PT.

3. On one of our designs, we taped out the base layers and rolled in the ECO as a metal-only change. Here we used reconfigurable filler cells for added functionality as well as for fixing additional timing violations.

4. We almost always have a spare cell methodology in place. This needs planning on where in the design, and what kind of, changes we can expect later on.

Almost all the above steps work well if the perturbation to the design DB is small. If the ECO is huge (>20-25K gates), the incremental optimization in most CAD tools sucks.
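The spare cell step (point 4 above) is simple enough to sketch: given a fix location, grab the nearest unused spare of the required cell type. The names, cell types, and coordinates here are invented for illustration:

```python
# Minimal spare-cell ECO selection: nearest unused spare of the right type.
# Real flows also check drive strength, scan hookup, and routing feasibility.

def nearest_spare(spares, needed_type, x, y):
    """spares: list of (name, cell_type, x, y, used). Returns a name or None."""
    best, best_d2 = None, None
    for name, ctype, sx, sy, used in spares:
        if used or ctype != needed_type:
            continue  # skip consumed spares and wrong cell types
        d2 = (sx - x) ** 2 + (sy - y) ** 2  # squared distance is enough
        if best_d2 is None or d2 < best_d2:
            best, best_d2 = name, d2
    return best

spares = [("sp1", "NAND2", 10, 10, False),
          ("sp2", "NAND2", 100, 100, False),
          ("sp3", "INV", 12, 12, False)]
print(nearest_spare(spares, "NAND2", 15, 15))  # sp1
```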

Things we would like to see in futuristic CAD tools :)

1. RTL ECO

2. Incremental formal verification

3. If there is no spare cell methodology, we would need an incremental clock router for clocking the added flip-flops.

4. There is also a need for better/more accurate modeling of the timing/congestion/reliability/power cost functions in the incremental timing engine, incremental placer, and incremental router.

If we split up the existing CAD flows into

1. Global mode and
2. Final mode

Global mode is less accurate/fast/less memory intensive. Almost all initial optimizations post-placement happen in this mode.

Final mode is more accurate/slow/more memory intensive.

ECOs always need to be rolled into the final-mode DB (almost always as last-minute changes! :)).

This becomes a big pain if the size of the ECO is huge, as we need to perform final-mode optimizations to close on timing/DRC/reliability/power, which can end up being very expensive.

John said...

Have you guys heard about Cadence Conformal ECO? It's a new product they announced -- sounds like the old idea of Synopsys ECO Compiler. Maybe Cadence has done it better. :-)

Unknown said...

Yes, I interned in the same group in Cadence a summer back!

Nick said...

Hi Nil,
It would be nice to hear what the differentiating features are with this particular Cadence product compared to the rest of the products out there (Synopsys and Magma both have OK ECO flows).

Unknown said...

The differentiation comes at Tech. Mapping support:
http://www.cadence.com/company/newsroom/press_releases/pr.aspx?xml=032508_eco

Unknown said...

Have you guys used ECO routing tools/features? I have heard Cadence NanoRoute supports ECO routing... I want to know if it supports a feature with which you can minimize the number of metal layers affected?

--Nil

Nick said...

Most industrial routers support ECO routing, but I am not sure if that particular facility (minimizing the metal layers touched) is there in the routers.

As long as the ECO router does not cause any major perturbation to already-routed nets (minimal detouring or rip-up/reroute of other nets), I guess it's okay.