



## About

- Advanced Datapath Architectures
- Low-power AI design
- Library
- Methodologies and tools



- Two granted Patents
- Silicon efficiency far beyond what can currently be achieved
- Without verification overhead
- Without impacting current design methodologies

















# Challenges Addressed













AI chips, also known as AI accelerators or AI processors, are specialized hardware designed to efficiently perform the computations required for artificial intelligence tasks such as machine learning and deep learning.



These chips are optimized for the specific computational patterns and algorithms commonly used in AI workloads, enabling faster and more energy-efficient processing compared to traditional CPUs or GPUs.



Neural Network Cores: The heart of the AI chip, consisting of specialized processing units optimized for matrix multiplication, which is fundamental to neural network computations. These cores execute the neural network layers and perform calculations in parallel.
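To make the matrix-multiplication point concrete, here is a minimal sketch of one fully connected layer written as a plain matrix multiply, with a count of the multiply-accumulate (MAC) operations a datapath would have to perform. The function name `matmul_with_macs` and the sample values are illustrative assumptions, not taken from any specific chip or from Banashree's designs.

```python
def matmul_with_macs(a, b):
    """Multiply an m x k matrix by a k x n matrix (lists of lists).

    Returns the product and the number of multiply-accumulate steps,
    which is m * k * n for this straightforward triple loop.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    out = [[0] * n for _ in range(m)]
    macs = 0
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one MAC operation
                macs += 1
            out[i][j] = acc
    return out, macs


# Hypothetical data: 2 input vectors with 3 features, mapped to 2 outputs.
activations = [[1, 2, 3], [4, 5, 6]]
weights = [[1, 0], [0, 1], [1, 1]]
outputs, mac_count = matmul_with_macs(activations, weights)
print(outputs, mac_count)
```

Every output element is independent of the others, which is why such cores can compute them in parallel.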





# Challenges in AI

**Complex and resource intensive** 

**Power consumption** 

Latency is a significant challenge

**Heterogeneity**

Varying levels of computational resources and memory bandwidth require complex engineering

**Power Efficiency** 

A critical consideration in AI chips, particularly in mobile and edge computing devices where energy efficiency is essential.

**Memory Bandwidth** 

Essential for maintaining performance and efficiency

# Complexity of AI Algorithms



# Challenges Addressed

Innovative design techniques and optimizations at all abstraction levels.

Collaboration between researchers, engineers, and industry stakeholders to develop innovative solutions.

Balancing high performance with low power consumption

Banashree addresses all the AI challenges

Best practices for designing, manufacturing, and deploying





#### Standard Cell Libraries – Convenience has come with a cost!

- Limited and legacy architectures used in standard cells
- Not aware of application and domain context
- Primarily focused on Time to Market and meeting Timing
- Interconnect power has become more predominant than cell power

#### Current Low Power Methodology is based on:

- Exploring cells with different drive strengths.
- Reducing supply voltage (Vdd) and cut-in (threshold) voltage (Vt).
- Using different materials with different dielectric properties.
- Swapping low-Vt cells with high-Vt cells to reduce leakage power (after timing closure).
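As a rough illustration of why supply voltage is such a common lever, the sketch below uses the standard first-order CMOS dynamic power model, P = α · C · Vdd² · f. The parameter values are made up for the example and do not come from the deck. Capacitance C is where dielectric choice enters the same formula.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """First-order CMOS dynamic power in watts.

    alpha  : switching activity factor (0..1)
    c_load : switched capacitance in farads
    vdd    : supply voltage in volts
    freq   : clock frequency in hertz
    """
    return alpha * c_load * vdd ** 2 * freq


# Hypothetical operating point: only Vdd changes, by 10% (0.90 V -> 0.81 V).
baseline = dynamic_power(alpha=0.2, c_load=1e-15, vdd=0.90, freq=1e9)
scaled = dynamic_power(alpha=0.2, c_load=1e-15, vdd=0.81, freq=1e9)
saving_pct = (1 - scaled / baseline) * 100
print(f"dynamic power saving: {saving_pct:.1f}%")
```

Because power scales with the square of Vdd, a 10% voltage reduction yields roughly a 19% dynamic power reduction in this model, at the cost of slower switching.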

## Current EDA Tools are based on Delay optimization rather than PPA & TAT

- Industry tools focus on improving timing.



# Unique Value Proposition

## **Optimal PPA with minimal TAT**

• Up to 25%

#### **EDA TOOL: ARCEL**

Architecture Selection Tool

#### **Custom Library**

- Low Power standard cell library
- Design custom library cells that can optimize the power further by mapping the design to these new standard cell elements

#### **IPs**

- AI, IoT, ML, GPU, DSP

#### **Advantages**

Enables optimal PPA with reduced TAT





## **ARCEL EDA Tool**




- ARCEL guides the synthesis and PD tools, based on the AI design constraints and library, to select the best possible architecture and standard cells for the given design, library, and constraints, achieving optimal Power, Performance, Area (PPA) with reduced TAT.
- Agnostic to vendor, technology node, domain, application, and design.

#### Constraints

- Enables a plethora of new optimization corners.
- ARCEL analyzes the QoR of the synthesis process to check whether the synthesis tool has chosen the right architecture and standard cells (optimal PPA).

#### Post Processing

Depending on whether the required constraints are met, ARCEL applies advanced optimization algorithms.
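The check-then-post-process decision described above can be sketched as follows. This is only an illustration of the control flow: ARCEL's actual QoR metrics, thresholds, and optimization algorithms are not described in this deck, so the `Qor` fields, `unmet_constraints`, and `next_step` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Qor:
    """Hypothetical quality-of-results summary for one synthesis run."""
    power_mw: float
    delay_ns: float
    area_um2: float


def unmet_constraints(measured: Qor, target: Qor) -> List[str]:
    """Return the names of the metrics that exceed their target."""
    checks = {
        "power": measured.power_mw <= target.power_mw,
        "delay": measured.delay_ns <= target.delay_ns,
        "area": measured.area_um2 <= target.area_um2,
    }
    return [name for name, ok in checks.items() if not ok]


def next_step(measured: Qor, target: Qor) -> str:
    """Decide whether post-processing is needed (illustrative only)."""
    violations = unmet_constraints(measured, target)
    if violations:
        return "post-process: " + ", ".join(violations)
    return "accept"


target = Qor(power_mw=3.0, delay_ns=0.66, area_um2=90.0)
print(next_step(Qor(power_mw=2.9, delay_ns=0.66, area_um2=100.0), target))
print(next_step(Qor(power_mw=2.9, delay_ns=0.60, area_um2=85.0), target))
```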



## **ARCEL EDA Tool: Advantages**

- Most Optimal PPA
  - Simpler Datapath Architecture at Synthesis Phase
  - Simpler Interconnect Architecture at PD Phase
- Reduced Design Cycle Time (due to faster run time)
  - Faster Synthesis
  - Faster PD due to less Routing Congestion
- No Verification Overhead
- Reduced Interconnect Area, Delay & Power
- Reduced Power (both dynamic and leakage) at the synthesis stage itself
- Simple to Implement
- Scalable Solution



# **Custom Library and Benefits**

## **Custom Library**

- Custom Library (with custom datapath standard cells complementing the existing library).
- Low-power standard cell library: Banashree has designed custom library cells that can optimize power further by mapping the AI design to these new standard cell elements.
- Embedded isolation of Sum & Carry paths.
  - Context-specific, inverter-eliminating, minimal-level, interconnect (routing & congestion) aware architectures.
- Simple Plug & Play solution.
- Minimal Verification Overhead.
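For readers unfamiliar with the sum and carry paths mentioned above, the sketch below shows the textbook Boolean functions of a half adder and a full adder, where the two outputs are computed by separate logic. It is generic reference logic only; how Banashree's custom cells isolate or implement these paths is proprietary and not described here.

```python
def half_adder(a, b):
    """Textbook half adder: returns (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Textbook full adder: returns (sum, carry_out) for three input bits."""
    total = a ^ b ^ cin                    # sum path: XOR chain
    carry = (a & b) | (cin & (a ^ b))      # carry path: AND/OR logic
    return total, carry


# Exhaustively confirm the logic against integer addition.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, c = full_adder(a, b, cin)
            assert a + b + cin == s + 2 * c
```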

### **Benefits**

- The .lib of the new custom library cells will be back-annotated into the synthesis flow.
- Within one synthesis run, the customer will see the PPA enhancement in their actual design & environment.
- Proprietary architectures are provided for further enhancement of PPA.



## **Our Demonstrations**

Across different Nodes, Fabs, EDA Vendors, Domains & Technologies









*[Figure: panels titled Domains & Designs, EDA Tools, and Technologies. Surviving labels: Synopsys; 40, 28.]*









## Results

| Component | Metric  | TSMC           | Optimized      | % Gain |
|-----------|---------|----------------|----------------|--------|
| XOR2D1    | A       | 1.1 sq units   | 0.88 sq units  | -12.5  |
|           | T       | 0.035 ns       | 0.021 ns       | 40     |
|           | CIP     | 36.2 nW        | 19.2 nW        | 47.5   |
|           | NSP     | 11.2 nW        | 16.4 nW        |        |
|           | DP      | 47.5 nW        | 35.6 nW        | 24.95  |
|           | CLP     | 0.102 nW       | 0.054 nW       | 46.51  |
| XNOR2D1   | A       | 1.1 sq units   | 0.88 sq units  | 12.5   |
|           | T       | 0.033 ns       | 0.021 ns       | -36    |
|           | CIP     | 38.7 nW        | 19.5 nW        | 49.5   |
|           | NSP     | 10.5 nW        | 16.4 nW        |        |
|           | DP      | 49.3 nW        | 41.8 nW        | 67.5   |
|           | CLP     | 0.10 nW        | 0.053 nW       | 163.3  |
| AH01D1    | A       | 1.683 sq units | 1.386          | 2.2    |
|           | T Sum   | 0.38 ns        | 0.2 ns         | 34.5   |
|           | T Carry | 0.17 ns        | 0.22 ns        |        |
|           | CIP     | 57.1 nW        | 32.9 nW        |        |
|           | NSP     | 19.0 nW        | 21.0 uW        |        |
|           | DP      | 76.3 nW        | 31.0394 uW     | 11.9   |
|           | CLP     | 0.163 nW       | 1.6521 uW      | 18.8   |
| AD01D1    | A       | 588.96         | 540.36         | 8.9    |
|           | T       | 2.06 ns        | 2.06 ns        | 0      |
|           | CIP     | 116.8271 uW    | 98.5750 uW     |        |
|           | NSP     | 28.1215 uW     | 31.3773 uW     |        |
|           | DP      | 144.9486 uW    | 129.9524 uW    | 11.6   |
|           | CLP     | 7.8342 uW      | 6.4654 uW      | 20     |

#### Note:

CIP = Cell Internal Power, NSP = Net Switching Power, DP = Dynamic Power, CLP = Cell Leakage Power. Technology: leading 28-nm node.
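In several rows of the table where all three values share a unit, the dynamic power figure appears to be the sum of cell internal power and net switching power, which is the usual convention in synthesis power reports. The short check below illustrates that relationship with two entries from the table. The helper name `dynamic_from_components` is an assumption made for this example.

```python
import math


def dynamic_from_components(cell_internal, net_switching):
    """Dynamic power as the sum of cell internal and net switching power.

    Both inputs must be in the same unit; the result is in that unit.
    """
    return cell_internal + net_switching


xor2_optimized_nw = dynamic_from_components(19.2, 16.4)        # XOR2D1, optimized (nW)
ad01_baseline_uw = dynamic_from_components(116.8271, 28.1215)  # AD01D1, TSMC (uW)

print(f"XOR2D1 optimized DP: {xor2_optimized_nw:.1f} nW")
print(f"AD01D1 baseline DP:  {ad01_baseline_uw:.4f} uW")
```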



## **CUSTOMER 1: DATAPATH INTENSIVE DESIGN**

TSMC 28nm Node, post Synthesis

## **CUSTOMER 3: CONTROL PATH INTENSIVE DESIGN**

TSMC 16nm Node, post PD Signoff



## **Customer Wins**



## **CUSTOMER 2: ANDES PROCESSOR**

GF 14nm Node





# Power Optimization - Case Studies

| Sl.<br>No. | Design | Activity | Technology | Power, Design Metrics<br>(Taped out) | Power, Design Metrics<br>(Improved) | % Improvement |
|------------|--------|----------|------------|--------------------------------------|-------------------------------------|---------------|
| 1 | ANDES<br>Processor | Power optimization done using taped-out synthesis netlist. | GF 14nm | Cells: 55747<br>Cell Area: 29.3 Sq.mm<br>Critical Timing Path (ns): 0.66<br>Leakage Power (nW): 1866<br>Dynamic Power (nW): 13311519 | Cells: 47211<br>Cell Area: 25.2 Sq.mm<br>Critical Timing Path (ns): 0.66<br>Leakage Power (nW): 1417<br>Dynamic Power (nW): 11709605 | Cells: 15.3<br>Cell Area: 13.9<br>Critical Timing Path: 0<br>Leakage Power: 24<br>Dynamic Power: 12 |
| 2 | SiFive<br>RISC-V<br>processor | Power optimization done using synthesis netlist. | TSMC 16nm | Die Area: 22.5 Sq.mm<br>CTS Power: 1.5 mW<br>Total Power: 2.903 mW | Die Area: 14.4 Sq.mm<br>CTS Power: 1.4 mW<br>Total Power: 2.825 mW | Die Area: 36<br>CTS Power: 6.7<br>Total Power: 2.7 |
| 3 | SiFive<br>RISC-V<br>processor | Power optimization done using synthesis netlist. | Lowest<br>Stable Node | Die Area: 7.969 Sq.mm<br>WNS: -0.057 ns<br>TNS: -3.3 ns<br>CTS Power: 0.461 mW<br>Total Power: 2.264 mW | Die Area: 7.969 Sq.mm<br>WNS: -0.029 ns<br>TNS: -0.2 ns<br>CTS Power: 0.417 mW<br>Total Power: 2.159 mW | Die Area: 0<br>CTS Power: 9.5<br>Total Power: 4.6 |
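The percentage column can be reproduced from the before and after figures with the usual relative-reduction formula, (before - after) / before × 100. The sketch below applies it to the ANDES processor row. The function and dictionary names are illustrative, and the results agree with the table to within about 0.1 percentage point.

```python
def percent_improvement(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# (taped-out value, improved value) pairs copied from the ANDES row.
andes = {
    "cells": (55747, 47211),
    "cell_area_sq_mm": (29.3, 25.2),
    "leakage_nw": (1866, 1417),
    "dynamic_nw": (13311519, 11709605),
}

for name, (before, after) in andes.items():
    print(f"{name}: {percent_improvement(before, after):.1f}%")
```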





# Area Optimization - Case Studies

| Sl.<br>No. | Design          | Activity                                        | Technology            | Power, Design Metrics<br>(Taped out) | Power, Design Metrics<br>(Improved) | % Improvement |
|------------|-----------------|-------------------------------------------------|-----------------------|--------------------------------------|-------------------------------------|---------------|
| 4          | DSP ASIC        | Area optimization done using synthesis netlist. | TSMC 28nm             | Std Cell Area: 3.41 Sq.mm<br>CTS Power:<br>Total Power: | Std Cell Area: 3.32 Sq.mm<br>CTS Power: 1.4 mW<br>Total Power: 2.825 mW | Die Area: 2.6<br>CTS Power:<br>Total Power: |
| 5          | Networking ASIC | Area optimization done using synthesis netlist. | Lowest Stable<br>Node | Std Cell Area: 3.866 Sq.mm<br>WNS: 0.074 ns<br>TNS: 9<br>CTS Power:<br>Total Power: | Die Area: 3.79404 Sq.mm<br>WNS: 0.105 ns<br>TNS: 9<br>CTS Power:<br>Total Power: | Die Area: 1.8<br>CTS Power:<br>Total Power: |



# **Expectations**



- Theoretical Demonstration
- PDK Access
- Custom Lib development
- PPA enhancement demonstration
- Flow Optimization based on new Custom Cells
- Replicating to other Nodes
- Commercial Models
  - Consulting for Evaluation
  - IP Licensing per design per node basis





**THANK YOU**