
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnana Sangama, Belagavi, Karnataka - 590014

Project Report
On

DESIGN AND IMPLEMENTATION OF DDR FOR 32-BIT RISC V PROCESSOR

Submitted in partial fulfilment of the requirements for the award of the degree of

Bachelor of Engineering
in
Electronics & Communication
Submitted by

SAMARTH R BHARADWAJ 1BI20EC126


SACHIN BHASKAR S 1BI20EC121
SAGAR G P 1BI20EC123
SIDDESH M G 1BI20EC142

Under the Guidance of


Dr. KALPANA A.B
Professor
Dept. of ECE, BIT

Department of Electronics & Communication Engineering


BANGALORE INSTITUTE OF TECHNOLOGY
K. R. Road, V.V Puram, Bengaluru - 560004
2023-2024
BANGALORE INSTITUTE OF TECHNOLOGY
K.R. Road, V. V Puram, Bengaluru -560004
Phone: 26613237/26615865, Fax:22426796
www.bit-bangalore.edu.in

Department of Electronics and Communication Engineering

CERTIFICATE

Certified that the project work entitled “Design and Implementation of DDR For 32-bit RISC V
Processor” was carried out by SAMARTH R BHARADWAJ (USN: 1BI20EC126), SACHIN BHASKAR S
(USN: 1BI20EC121), SAGAR G P (USN: 1BI20EC123), and SIDDESH M G (USN: 1BI20EC142),
bonafide students of Bangalore Institute of Technology, in partial fulfillment for the award of
Bachelor of Engineering in Electronics and Communication Engineering of the
Visvesvaraya Technological University, Belgaum during the year 2023-2024. It is certified that
all corrections/suggestions indicated for Internal Assessment have been incorporated in the Report
deposited in the departmental library. The project report has been approved as it satisfies the
academic requirements in respect of Project work prescribed for the said Degree.

Signature of Guide: Dr. KALPANA A B, Professor, Dept. of ECE, BIT.
Signature of HOD: Dr. HEMANTH KUMAR A.R, Professor & HOD, Dept. of ECE, BIT.
Signature of Principal: Dr. ASWATH M U, Principal, BIT.

External Viva
Name of the examiners Signature with date

1.

2.
ACKNOWLEDGEMENT

We take this opportunity to express our sincere gratitude and respect to the
Bangalore Institute of Technology, Bangalore for providing us an opportunity to carry
out our final project.

We express our sincere regards and thanks to Dr. KALPANA A B, Professor,
Department of Electronics & Communication Engineering, BIT, Bangalore for giving
the necessary advice and guidance. Her incessant encouragement and valuable technical
support have been of immense help in realizing this project. Her guidance gave us the
environment to enhance our knowledge and skills and to reach the pinnacle with sheer
determination, dedication and hard work.

We would like to thank Dr. MUKTHI S.L, Prof. GAHAN A V, and Prof.
HITHAISHI.P, Assistant Professor and Project Coordinators, Department of Electronics
& Communication Engineering, BIT, Bangalore.

We express our sincere regards and thanks to Dr. HEMANTH KUMAR A.R,
Professor and HOD, Electronics & Communication Engineering, BIT for his valuable
suggestions.

We immensely thank Dr. ASWATH M U, Principal, BIT, Bangalore for
providing an excellent academic environment in the college.

We also extend our thanks to the entire faculty of the Department of ECE, BIT,
Bangalore, who have encouraged us throughout the course of our bachelor's degree.

SAMARTH R BHARADWAJ 1BI20EC126


SACHIN BHASKAR S 1BI20EC121
SAGAR G P 1BI20EC123
SIDDESH M G 1BI20EC142
ABSTRACT

The microprocessor industry has historically been dominated by complex proprietary technologies with
restrictive licensing, but there is a shift towards freely available, open-source alternatives like the RISC-V
Instruction Set Architecture (ISA). RISC-V is a license-free, modular, and extensible option that has gained
significant popularity across a wide range of applications, from microcontroller chips to supercomputing
initiatives. The RISC-V ISA provides designers the freedom to develop their own custom processors using
open-source or commercial resources as a starting point, and its modularity allows for a high degree of
customization, as users can choose to implement only the extensions they need, reducing the complexity
and power consumption of their hardware. The RISC-V architecture has been extensively tested and
verified, and its simplicity reduces the risk of bugs and vulnerabilities. The impact of RISC-V on the
processor market could be significant, as it has the potential to lower costs, increase innovation, and foster
competition in an industry traditionally dominated by a few key players. For hardware developers,
RISC-V offers the freedom to customize their processors to meet their specific needs, without the
licensing fees and restrictions associated with proprietary ISAs. For software developers, RISC-V
offers a stable target for software development, as the base ISA is frozen, meaning it will not change
in future versions. Looking ahead, the future of RISC-V appears promising. With a growing community
of developers and a wide range of applications, from microcontrollers to supercomputers, RISC-V is
poised to continue its growth and impact on the processor market.

A 5-stage pipelined RISC-V processor has been developed and implemented with DDR flip-flops, which
improve efficiency by reducing the number of clock cycles per instruction. With DDR, the minimum
operating clock frequency decreased from 221.047 MHz to 104.093 MHz, a reduction of about 53%. This
reduced operating frequency led to a significant decrease in dynamic power, from 0.198 W to 0.001 W
(a reduction of about 99.5%). As a result, the total supply power is reduced from 0.281 W to 0.083 W
(a reduction of about 70.5%). However, the reduction in power and clock cycles has been traded against
area: the number of LUTs increases from 244 in the existing design to 402 in the proposed
design. This processor design, realized using Verilog HDL and simulated in Xilinx Vivado, presents a
promising alternative to conventional proprietary microprocessor technologies, offering the advantages of
open-source customization and lower entry barriers.
LIST OF CONTENTS

CHAPTER 1: INTRODUCTION
1.1 Introduction to Microprocessor
1.2 Evolution of Microprocessor
1.3 Types of Microprocessors
1.4 Comparison between CISC and RISC
1.5 Problem Statement
1.6 Objectives

CHAPTER 2: LITERATURE SURVEY

CHAPTER 3: METHODOLOGY
3.1 Existing Methodology
3.2 Proposed Methodology
3.3 Double Data Rate Flip Flop
3.3.1 Structural Model of DDR Flip Flop
3.3.2 Construction of DDR
3.3.3 Working of DDR
3.4 Stages in RISC V Processor
3.5 Instruction Memory
3.5.1 Existing Instruction Memory
3.5.2 Proposed Instruction Memory
3.6 Control Unit
3.6.1 Existing Control Unit
3.6.2 Proposed Control Unit
3.7 Hazard Unit
3.7.1 Existing Hazard Unit
3.7.2 Proposed Hazard Unit
3.8 Data Memory
3.8.1 Existing Data Memory
3.8.2 Proposed Data Memory
3.9 Register File
3.9.1 Existing Register File
3.9.2 Proposed Register File
3.10 Pipelined Registers
3.10.1 Existing Pipelined Registers
3.10.2 Proposed Pipelined Registers

CHAPTER 4: RISC V INSTRUCTION SET ARCHITECTURE
4.1 RISC V
4.1.1 R type RV32I Instruction format
4.1.2 I type RV32I Instruction format
4.1.3 S type RV32I Instruction format
4.1.4 B type RV32I Instruction format
4.1.5 U type & J type RV32I Instruction format
4.2 Instruction supported
4.2.1 Arithmetic Operations
4.2.2 Logical Operations
4.2.3 Data Transfer Operations
4.2.4 Control Transfer Instructions
4.3 Five Stage RISC Pipeline Processor
4.4 Elimination of Pipeline Hazards
4.4.1 Elimination of Structural Hazards
4.4.2 Elimination of Data Hazards
4.4.3 Elimination of Control Hazards
4.5 DDR (Double Data Rate)
4.5.1 Benefits of DDR
4.5.2 DDR flip flop

CHAPTER 5: IMPLEMENTATION
5.1 Implementation of proposed methodology
5.2 Simulation Results
5.2.1 DDR without enable and reset
5.2.2 DDR with enable and reset
5.2.3 DDR with only reset
5.2.4 Output Waveform for Addition Instruction
5.3 Synthesis Results
5.3.1 Timing Report
5.3.2 Power Report
5.3.3 Area Report

CHAPTER 6: CONCLUSION AND FUTURE SCOPE
6.1 Conclusion
6.2 Future Scope
LIST OF FIGURES

1.1 Block diagram of Microprocessor
3.1 Existing RISC V datapath architecture
3.2 Flowchart of proposed methodology
3.3 Structural model of DDR Flip Flop
3.4 DDR Simulation Results
3.5 5 stages of RISC V processing
3.6 Instruction Memory
3.7 Instruction Memory clocking with DDR
3.8 Control Unit
3.9 Hazard Unit
3.10 Data Memory
3.11 Data Memory sensitive to both edges of clock
3.12 Register file
3.13 Register file with DDR
3.14 Pipeline Registers in existing datapath of RISC V
3.15 Pipeline Registers with DDR
4.1 R-Type RV32I V 2.0 Instruction Format
4.2 RISC-V instruction formats
4.3 I-Type RV32I V 2.0 Instruction Format
4.4 Decoding an I-type Instruction
4.5 S-Type RV32I V 2.0 Instruction Format
4.6 Decoding an S-type Instruction
4.7 B-Type RV32I V 2.0 Instruction Format
4.8 Decoding a B-type Instruction
4.9 J-Type RV32I V 2.0 Instruction Format
4.10 Decoding a J-type Instruction
4.11 U-Type RV32I V 2.0 Instruction Format
4.12 Decoding a U-type Instruction
4.13 Pipelining scheduling
4.14 Conventional dual edge triggered flip flop
4.15 Explicit pulsed dual edge flip flop
4.16 Implicit pulsed dual edge flip flop
5.1 Control Unit module
5.2 Hazard Unit
5.3 Datapath Unit
5.4 DDR flip flop
5.5 DDR flip flop with only reset as control signal
5.6 DDR flip flop with enable and reset as control signal
5.7 Conventional flip flops of fetch stage
5.8 DDR flip flops of fetch stage after replacement
5.9 Testbench Integration
5.10 Simulation of DDR without enable and reset
5.11 Simulation of DDR with enable and reset
5.12 Simulation of DDR with only reset
5.13 Simulated waveform of addition instruction
5.14 Existing Timing Report
5.15 Proposed Timing Report
5.16 Existing Power report
5.17 Proposed Power report
5.18 Existing Area report
5.19 Proposed Area report
6.1 Rise in Trend of RISC V
6.2 Practical application of RISC V processor
LIST OF TABLES

1.1 Comparison between CISC and RISC
3.1 Comparison between normal and DDR flip flop
4.1 Arithmetic Operations
4.2 Logical Operations
4.3 Data Transfer Operations
4.4 Control Transfer Operations
5.1 Comparison of synthesis results
Design and Implementation of DDR For 32-bit RISC V Processor 2023-2024

CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION TO MICROPROCESSOR

A central processing unit (CPU) is referred to as a microprocessor when it is implemented on a single
integrated circuit. It can handle several instructions at once thanks to its millions of transistors and
electronic components. All of this is contained on a single silicon chip that supports the computer system
with memory and other specialized capabilities. It can be programmed to read binary instructions from
memory, carry out the operation, and provide the desired result. It is used for concurrent data
transmission and reception, device interaction, and data storage.

Transistors, registers, and diodes are just a few of the many parts that make up a microprocessor
and work together to complete tasks. As technology has advanced, chip capabilities have grown
increasingly sophisticated, with better functionality and faster speed. These days, most
devices require a microprocessor in order to operate. It is the component that gives a device
intelligence. Every device, whether it be a computer or a smartphone, needs an interface to manage data,
which a microprocessor alone can supply. Furthermore, there is still a long way to go in the
advancement of artificial intelligence.

Fig 1.1 Block diagram of Microprocessor

Department of Electronics and Communication Engineering, BIT P a ge |1


Design and Implementation of DDR For 32-bit RISC V Processor 2023-2024

1.2 EVOLUTION OF MICROPROCESSOR

1. First Generation – 4-bit Microprocessors

The first microprocessor generation was released by Intel Corporation in 1971 with the 4-bit
Intel 4004. The processor ran at 740 kHz and executed about 60k instructions per second.
It had 16 pins and 2300 transistors. Built on a single chip, it could perform simple arithmetic
and logical operations, and a control unit interpreted the instructions fetched from memory and
carried them out.

2. Second Generation – 8-bit Microprocessors

Intel introduced the first 8-bit microprocessor, the 8008, in 1973, marking the start of the second
generation. It performed arithmetic and logic operations on 8-bit words, with a clock speed of
500 kHz and 50k instructions per second. The 8080 microprocessor followed in 1974 with a
2 MHz clock and 60k instructions per second. The 8085 microprocessor, which arrived in 1976,
ran at 3 MHz and could process up to 769230 instructions per second.

3. Third Generation – 16-bit Microprocessors

In 1978, the third generation of microprocessors debuted with the 8086/8088, which ran at clock
speeds of 4.77, 8, and 10 MHz and executed up to 2.5 million instructions per second. Other notable
processors were the Zilog Z800 and the Intel 80286, released in 1982 in a 68-pin package and capable
of 4 million instructions per second.

4. Fourth Generation – 32-bit Microprocessors

Around 1986, a number of companies released 32-bit microprocessors, although Intel remained
the market leader. Its 80386 had 275k transistors and clock speeds from 16 MHz to 33 MHz. The
Intel 80486 microprocessor followed with 1.2 million transistors, a 16–100 MHz clock speed, and
8 KB of cache memory. In 1993, the Pentium microprocessor was released, with a clock speed of
66 MHz and 8 KB of cache memory.

5. Fifth Generation – 64-bit Microprocessors

One of the earliest 64-bit processors with a clock speed range of 1.2 GHz to 3 GHz was the
Pentium, which debuted in 1995. There were 64kb of instructions per second and 291 million
transistors. The i3, i5, and i7 microprocessors followed in 2007, 2009, and 2010, respectively. These
were a few of this generation's salient features.

1.3 TYPES OF MICROPROCESSORS

1. Complex Instruction Set Microprocessors

A CISC microprocessor can execute an instruction together with other low-level operations such as
loading data from and storing data to memory. A single instruction can also carry out an intricate
mathematical calculation. CISC processors are widely used in personal computers and work with
simpler compilers. Their instructions take multiple clock cycles to execute. Examples include the
Intel 386, 486, and Pentium.

2. Reduced Instruction Set Microprocessor

RISC processors are designed to execute small, specific instructions quickly and with a high degree
of optimization. The instructions are simple and of uniform length, which keeps the instruction set
small. By adding registers, RISC processors reduce memory references. They use pipelining, so that
the fetching and execution of instructions overlap, and most instructions complete in one CPU cycle.
The AMD K6 and K7 are a few examples.

3. Explicitly Parallel Instruction Computing

EPIC is a hybrid that combines the best qualities of RISC and CISC processors. EPIC processors
execute instructions in parallel without a fixed width, and they allow compilers to use sequential
semantics when interfacing with the hardware. Intel IA-64 and Itanium are a couple of examples.


4. Superscalar Microprocessors

A superscalar processor can execute multiple instructions simultaneously. It contains several
functional units, such as ALUs and multipliers, and dispatches instructions to these units in
parallel within the CPU.

5. Application Specific Integrated Circuit

ASICs are widely used in personal digital assistants and automobile pollution control systems.
Their architecture is precisely specified, although they may be built from off-the-shelf components.

1.4 COMPARISON BETWEEN CISC & RISC

Sl. no | CISC | RISC
1 | More than 300 instructions in the instruction set. | Less than 100 instructions in the instruction set.
2 | CISC focuses on hardware, such as transistors, to execute instructions. | RISC focuses more on software, such as codes or compilers, to execute instructions.
3 | CISC devices are installed with a microprogramming unit. | RISC devices are embedded with a hardwired programming unit.
4 | A CISC processor works with 16 bits to 64 bits to execute each instruction. | A RISC processor utilizes 32 bits to execute each instruction.
5 | CISC does not support parallelism and pipelining. As such, CISC instructions are less pipelined. | RISC processors support instruction pipelining.
6 | CISC instructions require high execution time. | RISC instructions require less time for execution.
7 | In the CISC architecture, the task of decoding instructions is quite complex. | In RISC processors, instruction decoding is simpler than in CISC.
8 | Some examples of CISC processors include Intel x86 CPUs, System/360, VAX, PDP-11, Motorola 68000 family, and AMD. | Examples of RISC processors include Alpha, ARC, ARM, AVR, MIPS, PA-RISC, PIC, Power Architecture, and SPARC.

Table 1.1 Comparison between CISC and RISC


1.5 PROBLEM STATEMENT

The current processors in the market face challenges in efficiently executing complex instructions, resulting
in increased delay, and reduced overall performance. There is a need to explore alternative solutions to
enhance processor efficiency and address these issues. Introducing RISC-V processors, known for their
simplicity and flexibility, offers a potential solution. However, integrating DDR flip-flops into existing
RISC-V architectures presents technical challenges and requires careful consideration. The problem
statement aims to investigate the feasibility and benefits of replacing existing processors with RISC-V
processors while implementing DDR flip-flops to improve performance and reduce delay in executing
complex instructions.

1.6 OBJECTIVES
• Analysis of the RISC V processor's execution of complex instructions.

• Design and implementation of DDR in the RISC V processor.

• Evaluation of the performance parameters of the proposed processor, with the aim of reducing the
number of clock cycles and improving efficiency.


CHAPTER 2

LITERATURE SURVEY

To gain a basic understanding of the ideas behind the proposed project, previously published
technical papers in the domain of RISC V processors were reviewed and analyzed. The major
papers are summarized below.

[1] Mehrdad Poorhosseini, Wolfgang Nebel, Kim Gruttner, "A Compiler Comparison in the RISC-V
Ecosystem". In the context of the RISC-V environment, which is becoming more and more important
for embedded software development, the study compares the GCC and LLVM compilers. GCC has
long been the preferred compiler for embedded systems because of its broad support for RISC-V and
other instruction set architectures. Still, LLVM begs for comparison given its rising popularity and
recent support for RISC-V. The study reveals that LLVM compiles quicker in 88% of experiments,
whereas GCC and LLVM create similar binary sizes in 51%, with GCC winning in 37% of the
experiments and LLVM in 12%. The benchmarking framework evaluates compile time, binary size,
instruction count, and execution time. Remarkably, in 94% of situations, the binary size difference is
within +/- 5%. Similar clock cycles are found in 42% of the studies, according to execution time
analysis, with LLVM winning in 18% of the cases and GCC in 40%. Developers can use these data to
get insight into which compiler to choose based on project demands and optimization objectives. While
both compilers perform similarly in terms of binary size and execution time, LLVM has an advantage
in compilation speed. One of this paper's main contributions is the establishment of a compiler
benchmark approach designed exclusively for RISC-V ecosystem compiler evaluation. Comparing the
performance of the GCC and Clang/LLVM compilers is the main objective of the article. These
compilers' compilation efficiency, code optimization capabilities, and overall performance may be
systematically evaluated and compared with the help of this benchmark approach. In order to help
developers and organizations in the RISC-V community make well-informed decisions, the article uses
standardized benchmarking techniques to shed light on the advantages and disadvantages of the GCC
and Clang/LLVM compilers.

[2] Nguyen My Qui, Chang Hong Lin, and Poki Chen's paper "Design and Implementation of a 256-
Bit RISC-V-Based Dynamically Scheduled Very Long Instruction Word on FPGA" describes a novel
method of creating a very long instruction word (VLIW) microprocessor by utilizing the RISC-V
instruction set architecture (ISA). Utilizing basic integer RV32I and extension instruction sets like
RV32M, RV32F, and RV32D, the study focuses on building a 256-bit VLIW hardware. Eight 32-bit
instruction flows, each carrying out a specific operational purpose, make up each 256-bit instruction.
An obstacle in the process of creating VLIW microprocessors for novel instruction sets (ISAs) such as
RISC-V is the lack of specialized compilers. This means that an instruction scheduler must be
incorporated in order to dynamically schedule independent instructions. This method makes use of the
current RISC-V GNU toolchain and avoids the requirement for a specialized RISC-V VLIW compiler.
The suggested architecture, in contrast to conventional VLIW designs, consists of six primary stages:
fetch, data memory, execute, decode, writeback, and instruction scheduler. On a Xilinx Virtex-6 FPGA,
the entire design is validated, synthesized, and implemented, reaching a maximum synthesis frequency
of 83.739 MHz. The experimental findings show that when compared to current open-source RISC-V
cores, the suggested RISC-V-based VLIW architecture performs better in terms of average instructions
per cycle. All things considered, the paper offers a thorough approach to developing and putting into
practice a high-performance VLIW microprocessor built on the RISC-V ISA, tackling issues with
compiler support, and making use of pre-existing toolchains for effective hardware implementation.

[3] Aaron Elson Phangestu, Dr. Ir. Totok Mujiono, M.I.Kom and Ahmad Zaini ST, M.T “Five-Stage
Pipelined 32-Bit RISC-V Base Integer Instruction Set Architecture Soft Microprocessor Core in
VHDL”, marks a significant achievement in the realm of open-source Instruction Set Architectures
(ISAs) and microprocessor development. By successfully implementing the core in VHDL and
simulating it using ModelSim, and further synthesizing it using FPGA techniques with Synopsys
Design Compiler, the study showcases the feasibility and effectiveness of utilizing open-source ISAs
for microprocessor creation. The achievement of reaching a maximum frequency of 62.95 MHz
demonstrates the core's efficiency and performance potential. Despite excluding certain instructions
like FENCE, ECALL, and CSR, the core effectively executes the majority of RV32I instructions,
indicating its versatility and compatibility with common instruction sets. Moreover, the study
highlights the potential applications of such a processor core, particularly in embedded digital signal
processing (DSP) applications. Its competitive performance coupled with low resource usage makes it
an attractive choice for scenarios where efficiency and processing power are crucial, such as in
embedded systems and IoT devices.

[4] Srikanth V. Devarapalli, Payman Zarkesh-Ha and Steven C. Suddarth, “A Robust and Low Power
Dual Data Rate (DDR) Flip-Flop Using C-Elements”, introduces a novel dual-edge triggered flip-flop
(DETFF) design called DDR-FF, which aims to address power consumption and delay issues compared
to existing designs like ep-DSFF. DDR-FF leverages direct clock pulses to achieve a notable reduction
in power consumption by 32% and an improvement in power-delay product by 41% when compared
to ep-DSFF. This improvement is significant and indicates the potential for DDR-FF to enhance the
efficiency and performance of integrated circuits. One of the key advantages of DDR-FF is its
simplicity, achieved with only 24 transistors. This simplicity not only contributes to reduced power
consumption but also makes DDR-FF robust and suitable for integration into high-performance, low-
power ASIC designs. Additionally, DDR-FF offers several desirable features such as hard-edge
property, low activity factor, and radiation-hardened capabilities. The proposed design represents a
significant contribution to the field of flip-flop design, particularly in the context of low-power and
high-performance ASIC applications. By addressing power consumption and delay concerns while
offering simplicity and robustness, DDR-FF has the potential to improve the overall efficiency and
reliability of integrated circuits. Moreover, its suitability for radiation-hardened applications makes it
applicable in aerospace, military, and other critical environments where resilience to radiation-induced
errors is essential. Overall, the paper's findings demonstrate the feasibility and benefits of adopting
DDR-FF in modern integrated circuit design.


CHAPTER 3
METHODOLOGY

The objective of incorporating DDR (Double Data Rate) flip-flops into the RISC-V datapath architecture
is to enhance the performance by reducing operation time and increasing efficiency. Traditional flip-flops
operate on a single clock edge, either rising or falling, which can limit the speed at which data can be
processed. By using DDR flip-flops, which are sensitive to both rising and falling edges of the clock signal,
we aim to exploit more opportunities for data processing within each clock cycle.

The methodology involves replacing conventional registers in the datapath with DDR flip-flops to enable
dual-edge sensitivity. This allows for data to be sampled and processed on both the rising and falling edges
of the clock signal, effectively doubling the data transfer rate compared to single-edge flip-flops.
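
As a rough illustration of the doubling described above, the following Python sketch counts how many
times a register would capture data over the same clock waveform when it is single-edge or dual-edge
sensitive. The function name count_capture_events and the representation of the clock as a list of
0/1 levels are assumptions made for this illustration only; they are not part of the Verilog design
in this report.

```python
def count_capture_events(clock_levels, dual_edge):
    """Count the clock transitions on which a register would capture data.

    clock_levels is a list of successive clock levels (0 or 1).
    A single-edge register captures on rising edges only; a dual-edge
    (DDR) register captures on both rising and falling edges.
    """
    events = 0
    for prev, cur in zip(clock_levels, clock_levels[1:]):
        rising = prev == 0 and cur == 1
        falling = prev == 1 and cur == 0
        if rising or (dual_edge and falling):
            events += 1
    return events


# Four full clock cycles, starting and ending low: 0,1,0,1,0,1,0,1,0
clock = [0, 1] * 4 + [0]
sdr = count_capture_events(clock, dual_edge=False)
ddr = count_capture_events(clock, dual_edge=True)
print(sdr, ddr)  # prints: 4 8
```

For the same four clock cycles, the dual-edge register has twice as many capture events, which is
the sense in which the data transfer rate is doubled.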

The primary goals of this approach are:

1. Increased Throughput: By utilizing both edges of the clock signal, more data can be transferred and
processed within a single clock cycle, leading to higher throughput.

2. Reduced Latency: With the ability to sample data on both edges of the clock, the latency of critical
datapath operations can be reduced, enhancing overall system responsiveness.

3. Improved Efficiency: By optimizing the datapath architecture to leverage DDR flip-flops, we aim to
achieve better utilization of hardware resources, resulting in improved efficiency and performance.

In summary, the objective of incorporating DDR flip-flops into the RISC-V datapath architecture is to
enhance performance, reduce operation time, and increase efficiency by leveraging dual-edge sensitivity
for data processing.


3.1 EXISTING METHODOLOGY

Fig 3.1 Existing RISC V Datapath Architecture inside the processor

The RISC-V datapath architecture stands as the cornerstone of RISC-V processors, embodying
fundamental principles of simplicity, efficiency, and scalability. At its core, the architecture is built upon a
clean and streamlined instruction set, designed to execute instructions swiftly and with minimal complexity.

Within the datapath, various components collaborate seamlessly to facilitate instruction execution. The
instruction fetch unit retrieves instructions from memory, while the instruction decode unit interprets their
meaning. The register file serves as a repository for data operands, accessible by the arithmetic logic unit
(ALU) for computation. Additionally, the control unit orchestrates the flow of data and control signals,
ensuring smooth operation throughout the processor pipeline.
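
The cooperation of these components can be sketched in a few lines of Python. This is a minimal
behavioural illustration, assuming the standard RV32I R-type field layout and modelling only the ADD
instruction; the names decode_r_type and execute_one, and the use of Python lists for the instruction
memory and register file, are hypothetical and do not correspond to the Verilog modules of this project.

```python
def decode_r_type(instr):
    """Instruction decode: split a 32-bit R-type word into its fields."""
    return {
        "opcode": instr & 0x7F,
        "rd": (instr >> 7) & 0x1F,
        "funct3": (instr >> 12) & 0x7,
        "rs1": (instr >> 15) & 0x1F,
        "rs2": (instr >> 20) & 0x1F,
        "funct7": (instr >> 25) & 0x7F,
    }


def execute_one(instr_mem, regs, pc):
    """Fetch, decode and execute one instruction; return the next PC."""
    instr = instr_mem[pc // 4]          # instruction fetch (word addressed)
    f = decode_r_type(instr)            # instruction decode
    # Control: this sketch recognises only ADD (opcode 0x33, funct3 0, funct7 0).
    if (f["opcode"], f["funct3"], f["funct7"]) != (0x33, 0x0, 0x00):
        raise NotImplementedError("only ADD is modelled in this sketch")
    # ALU: 32-bit addition of the two register-file operands.
    result = (regs[f["rs1"]] + regs[f["rs2"]]) & 0xFFFFFFFF
    if f["rd"] != 0:                    # register x0 is hard-wired to zero
        regs[f["rd"]] = result
    return pc + 4


regs = [0] * 32
regs[1], regs[2] = 7, 5
instr_mem = [0x002081B3]                # encoding of: add x3, x1, x2
next_pc = execute_one(instr_mem, regs, 0)
print(regs[3], next_pc)                 # prints: 12 4
```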

The RISC-V datapath architecture represents a paradigm of elegance and efficiency in processor design.
Its modular and streamlined nature enables RISC-V processors to execute instructions with remarkable
speed and efficiency, while also providing flexibility for future innovations. As the RISC-V ecosystem
continues to grow and evolve, the datapath remains a foundational element, driving progress and pushing
the boundaries of what is achievable in the realm of processor architecture.


3.2 PROPOSED METHODOLOGY

Fig 3.2 Flowchart of proposed methodology

The aim is to make the sequential elements of the datapath architecture (flip-flops, latches, and
registers) sensitive to both clock edges by transitioning to DDR (Double Data Rate) flip-flops. This
transition follows a systematic approach, starting with an analysis of performance requirements and the
identification of target elements for modification. Design adjustments are then made to incorporate DDR
functionality, enabling sampling on both rising and falling clock edges. Thorough verification, integration
into the datapath architecture, and performance evaluation confirm the effectiveness of the transition.
Optimization and fine-tuning iterations are conducted to maximize the performance benefits, resulting in
an improved datapath architecture with greater efficiency and throughput.

In summary, the methodology involves careful analysis, design modification, verification, integration, and
optimization to transition from sequential elements to DDR flip-flops within the datapath architecture. By
enhancing sensitivity to dual clock edges, this approach aims to unlock performance improvements, leading
to a more efficient and responsive computing system.


3.3 DOUBLE DATA RATE FLIPFLOP


3.3.1 Structural model of DDR flipflop

Fig 3.3 Structural model of DDR flipflop

Data transport and memory were completely transformed by Double Data Rate (DDR) technology, which
allowed data to be transmitted on both the rising and falling edges of the clock signal. When compared to
conventional single data rate (SDR) systems, this invention essentially doubled the data transfer rate.
Because DDR memory offers better performance and efficiency, it is commonly employed in current
computing systems. It's especially important in sophisticated processors and high-speed memory interfaces,
because satisfying the demands of contemporary computer activities requires faster data transfer. DDR
technology has, in general, greatly increased computer systems' performance and capacities by speeding
up and streamlining data transfer.
DDR with control signals:
Control signals are added to the DDR flip-flops to synchronize data arriving from combinational logic.
Common control signals such as enable and reset are included.

3.3.2 CONSTRUCTION OF DDR


- The DDR flip-flop is constructed from two separate flip-flops and a 2:1 multiplexer.
- One flip-flop is positive-edge triggered and the other is negative-edge triggered.
- The outputs of both flip-flops are connected to the inputs of the 2:1 multiplexer.
- Multiplexer inputs I0 and I1 are connected to the positive-edge and negative-edge flip-flops, respectively.
- The clock drives the multiplexer select line, so the data sampled on each edge is selected in turn.


3.3.3 WORKING OF DDR


The inputs of the positive-edge and negative-edge flip-flops are driven simultaneously from the top-level DDR
module. Each flip-flop samples the data on its own edge: the positive-edge flip-flop at the rising edge and the
negative-edge flip-flop at the falling edge. Because the outputs of both flip-flops feed the multiplexer inputs,
the data sampled at each edge is available to the multiplexer separately. The clock is connected to the
multiplexer select line, so the rising edge of the clock makes the multiplexer select the positive-edge sample
and the falling edge makes it select the negative-edge sample. In this way the DDR flip-flop is modeled.
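The behaviour described above can be sketched in software. The following is a minimal behavioural model in Python, not the report's HDL. The class name `DDRFlipFlop` and the method `tick` are hypothetical, and the model assumes the multiplexer passes the rising-edge sample while the clock is high and the falling-edge sample while it is low.

```python
class DDRFlipFlop:
    """Behavioural sketch: two edge-triggered flip-flops plus a 2:1 multiplexer."""

    def __init__(self):
        self.clk = 0      # last clock level seen
        self.q_pos = 0    # output of the positive-edge flip-flop
        self.q_neg = 0    # output of the negative-edge flip-flop

    def tick(self, clk, d):
        """Apply a new clock level and data input; return the multiplexer output."""
        if self.clk == 0 and clk == 1:      # rising edge: positive flip-flop samples d
            self.q_pos = d
        elif self.clk == 1 and clk == 0:    # falling edge: negative flip-flop samples d
            self.q_neg = d
        self.clk = clk
        # The clock is the select line (assumed polarity): high selects the
        # rising-edge sample, low selects the falling-edge sample.
        return self.q_pos if clk == 1 else self.q_neg


ddr = DDRFlipFlop()
stream = [(1, 0xA), (0, 0xB), (1, 0xC), (0, 0xD)]    # (clock level, data) pairs
outputs = [ddr.tick(clk, d) for clk, d in stream]    # one new value per clock edge
```

Two full clock cycles deliver four data values here, whereas a single-edge flip-flop would capture only two in the same time.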

Fig 3.4 DDR simulation results

Aspect                  Normal Flip-Flop                       Double Data Rate (DDR) Flip-Flop

Clock Edge Sensitivity  Sensitive to one clock edge            Sensitive to both rising and falling clock edges
Data Transfer Rate      Transfers data on one clock edge       Transfers data on both rising and falling clock edges
Clock Frequency         Data rate equals the clock frequency   Data rate is twice the clock frequency, as data is transferred on both clock edges
Timing Constraints      Easier to meet timing constraints      More stringent timing constraints due to data being transferred on both clock edges
Performance             Lower performance compared to DDR      Higher performance compared to normal flip-flop due to faster data transfer
Applications            Commonly used in simpler systems       Commonly used in high-speed memory interfaces and advanced processors

Table 3.1 Comparison between normal and DDR flip-flop


3.4 STAGES IN RISC V PROCESSOR


In a typical RISC-V processor, the processing of instructions generally follows five key stages:
1. Instruction Fetch (IF):
- The processor fetches the next instruction from memory based on the current program counter (PC)
value.
- The instruction fetched is typically placed into an instruction register for decoding in the subsequent
stage.

2. Instruction Decode (ID):


- The fetched instruction is decoded to determine the operation to be performed and the operands
involved.
- Control signals are generated based on the instruction opcode, specifying the actions to be taken in the
subsequent stages.

3. Execute (EX):
- The decoded instruction is executed in this stage, which involves performing arithmetic, logic, or data
transfer operations.
- For arithmetic or logical operations, the operands are typically sourced from registers or immediate
values, and the result is computed by the ALU (Arithmetic Logic Unit).
- Branch instructions may also be evaluated in this stage to determine whether a branch is to be taken.

4. Memory Access (MA):


- If the instruction involves a memory operation, such as a load or store, it is executed in this stage.
- Data may be read from or written to memory, and addresses may be computed based on the instruction
operands.

5. Write Back (WB):


- The final stage involves writing back the results of the executed instruction to the appropriate
destination.
- For instructions that produce results (e.g., arithmetic operations), the computed values are written back
to registers.
- Control signals may also be updated based on the outcome of the instruction execution, such as updating
the program counter for branch instructions.

These stages represent the typical flow of instruction processing in a RISC-V processor, with each stage
contributing to the overall execution of instructions and manipulation of data. The five stages of instruction
processing in a RISC-V processor form a structured pipeline that efficiently executes instructions with
minimal complexity. Beginning with instruction fetching, each stage contributes to the seamless
progression of instructions through the pipeline, culminating in the completion of instruction execution and
the potential update of program state.

This structured approach not only simplifies the design and implementation of the processor but also
enables high performance and scalability. By dividing the instruction processing into distinct stages, the
processor can execute multiple instructions concurrently, leveraging parallelism to enhance throughput and
efficiency.

Fig 3.5 5 stages of RISC V processor


3.5 INSTRUCTION MEMORY

3.5.1 EXISTING INSTRUCTION MEMORY

Fig 3.6 Instruction memory

Instruction memory stands as a fundamental component within the RISC-V datapath architecture, serving
as the repository for program instructions. Its direct integration into the datapath ensures swift access to
instructions, thereby enhancing execution speed and overall efficiency. By storing instructions within the
datapath framework, the instruction memory facilitates sequential fetching, decoding, and execution of
instructions, thus ensuring smooth program flow and optimal performance. As a core element of the RISC-
V datapath, instruction memory plays a vital role in the seamless execution of instructions, contributing
significantly to the efficiency and effectiveness of RISC-V processors.

DRAWBACKS: In the RISC-V datapath, the program counter (PC) advances to the next instruction address by
incrementing by 4 once per clock cycle. This synchronous operation ensures consistent and reliable updates
to the PC value, and because RV32I instructions are 32 bits wide and aligned on 4-byte boundaries in
byte-addressed memory, the increment of 4 matches the instruction size. However, since the PC is updated on
only one clock edge, at most one instruction is fetched per clock cycle.

3.5.2 PROPOSED INSTRUCTION MEMORY

By replacing conventional clocking blocks with DDR flip-flops in instruction memory, RISC-V
architectures optimize data throughput and processing speed. DDR flip-flops, capable of sampling data on
both rising and falling clock edges, significantly reduce latency and accelerate instruction fetching and
execution. This enhancement in memory bandwidth utilization not only improves overall system
performance but also enhances responsiveness. Moreover, the strategic integration of DDR flip-flops
ensures seamless compatibility and benefits across the entire RISC-V processor architecture.


Fig 3.7 Instruction memory clocking with DDR

3.6 CONTROL UNIT

3.6.1 EXISTING CONTROL UNIT

The control unit within the RISC-V datapath serves as the conductor of instruction execution, generating
essential control signals to coordinate various components efficiently. It interprets instruction opcodes,
determining the sequence of operations required for optimal execution. Through seamless coordination
with the instruction decoder, the control unit facilitates proper data flow between registers, the ALU,
memory, and other functional units. By ensuring synchronization and coordination of datapath operations,
the control unit plays a pivotal role in optimizing performance and executing instructions accurately.

Fig 3.8 Control Unit

DRAWBACKS: Control signals are synchronized to consecutive clock edges of a single polarity in the pipelined
registers. This keeps the timing of datapath operations precisely coordinated, but the control signals can
change only once per clock cycle.


3.6.2 PROPOSED CONTROL UNIT


Control signals are synchronized to both edges of the clock in the pipelined registers. The pipelined registers
gate the control signals, ensuring sensitivity to each clock edge. This dual-edge synchronization optimizes
datapath operation timing.

3.7 HAZARD UNIT

3.7.1 EXISTING HAZARD UNIT

Fig 3.9 Hazard unit

The hazard detection and resolution unit in the RISC-V pipeline identifies and resolves hazards, including
data, control, and structural issues. By implementing mechanisms like data forwarding or pipeline stalls, it
minimizes their impact, ensuring smooth operation. This unit also manages data flow to maintain coherence
and prevent incorrect data from affecting instruction execution, enhancing pipeline reliability.

3.7.2 PROPOSED HAZARD UNIT

The proposed methodology for enhancing the hazard unit using DDR (Double Data Rate) involves
integrating DDR flip-flops to enable dual-edge sensitivity for hazard detection. This integration allows for
more precise identification of hazards, including data, control, and structural issues. Enhanced hazard
detection mechanisms are implemented, leveraging the increased sensitivity to clock edges, while dynamic
resolution strategies optimize pipeline performance. Thorough evaluation and optimization ensure seamless
integration into the RISC-V pipeline architecture, ultimately improving hazard detection and resolution
capabilities.


3.8 DATA MEMORY

3.8.1 EXISTING DATA MEMORY

The data memory component in the RISC-V datapath efficiently stores data accessed by instructions,
facilitating seamless data management. Through load and store operations, it enables smooth manipulation
of data, interfacing with the CPU to ensure efficient data exchange during execution. Additionally, the
inclusion of cache hierarchy optimizes performance by reducing latency, further enhancing the overall
efficiency of data processing within the RISC-V architecture.

Fig 3.10 Data memory

DRAWBACKS: Introduces Propagation Delay: Pipelined registers with single-edge sensitivity may
introduce propagation delays as data moves through pipeline stages. This delay can affect the overall system
throughput and performance, potentially leading to slower execution of instructions.

3.8.2 PROPOSED DATA MEMORY

Introducing DDR flip-flops in place of single-edge pipelined registers gives the data memory dual-edge
sensitivity, allowing data to be sampled on both the rising and falling edges of the clock signal. This
approach can enhance data transfer rates, potentially boosting system throughput and performance compared
to single-edge sensitive pipelined registers.


Fig 3.11 Data memory sensitive to both edges of clock

3.9 REGISTER FILE

3.9.1 EXISTING REGISTER FILE

The register file within the RISC-V datapath serves as a repository for a set of general-purpose registers,
essential for data manipulation and temporary storage during instruction execution. Comprising multiple
registers, each capable of holding fixed-size data values, commonly 32 or 64 bits long, it enables fast access
to operands required for arithmetic, logic, and data movement operations. This efficient architecture allows
instructions to swiftly read from or write to specific registers within the file, facilitating the rapid execution
of program instructions.

Fig 3.12 Register file

DRAWBACKS: Data transactions with the register file in the RISC-V datapath are completed within a
single clock cycle, effectively boosting processing speed. This swift operation enables instructions to
promptly access operands stored in the register file, enhancing system efficiency and responsiveness.


3.9.2 PROPOSED REGISTER FILE

DDR flip-flops optimize data throughput by sampling on both clock edges, enhancing instruction fetching
and execution efficiency. This dual-edge sensitivity reduces access latency, resulting in faster program
execution and enhanced memory bandwidth utilization. Strategically integrated into the register file,
DDR flip-flops replace traditional clocking blocks, ensuring seamless compatibility and efficiency
enhancement within the datapath architecture. This systematic upgrade process leads to improved efficiency
and reduced latency, ultimately enhancing overall system performance.

Fig 3.13 Register file with DDR

3.10 PIPELINED REGISTERS

3.10.1 EXISTING PIPELINED REGISTERS

The pipelined registers in the RISC-V datapath use sequential storage elements, typically
flip-flops, to stage data throughout the pipeline stages. These registers enable the smooth flow of data and
control signals, facilitating concurrent instruction execution and improving system performance by
reducing overall instruction latency. However, their single-edge sensitivity may present limitations in
achieving optimal throughput and efficiency in modern processor designs.

DRAWBACKS: Pipelined registers with single-edge sensitivity introduce propagation delays as data
progresses through pipeline stages. These delays can have a notable impact on system throughput and
performance, potentially resulting in slower execution of instructions. As data must wait for the next clock edge
to propagate through each stage, the cumulative effect of these delays can hinder overall efficiency and
responsiveness. This highlights the importance of considering alternative approaches, such as DDR flip-flops
with dual-edge sensitivity, to mitigate propagation delays and optimize system performance within the RISC-
V Datapath architecture.


Fig 3.14 Pipeline Registers in existing datapath of RISC V

3.10.2 PROPOSED PIPELINED REGISTERS

Integrating DDR (Double Data Rate) flip-flops instead of pipelined registers in the RISC-V datapath introduces
dual-edge sensitivity, enabling data sampling on both rising and falling edges of the clock signal. This approach
holds the potential to significantly enhance data transfer rates, as it allows for more frequent sampling and
processing of data within each clock cycle. By leveraging dual-edge sensitivity, DDR flip-flops can mitigate
the limitations of single-edge sensitive pipelined registers, potentially leading to improved system throughput
and performance. This enhancement in data processing efficiency aligns with the growing demands of modern
computing tasks and contributes to the overall optimization of RISC-V.

Fig 3.15 Pipeline Registers with DDR


CHAPTER 4

RISC V INSTRUCTION SET ARCHITECTURE

4.1 RISC V

The RISC-V (RV32I) instruction set, with a fixed length of 32 bits aligned to 32-bit boundaries,
is tailored to serve as a comprehensive compile target supporting modern operating systems. Crafted
to minimize hardware requirements, it adopts a little-endian format where the lowest address holds the
least significant byte of a word.

RV32I, the base 32-bit integer instruction set of RISC-V, is optimized for constructing RISC machines,
offering broad support for contemporary operations and functionalities. It features 32 general-purpose
registers (reg0 to reg31), with reg0 hardwired to 0, and a user-visible program counter. This
counter, 32 bits in length, increments at the positive edge of the clock: by 4 in byte-addressable
instruction memory, or by one in word-addressable instruction memory configurations.

RISC-V was chosen primarily for its pipeline-friendly nature and efficient resource
consumption, making it appealing for software-centric applications. To enhance processor
performance, techniques such as loop unrolling and compiler scheduling are utilized for runtime
optimization.

Within the RV32I instruction set, there are six distinct formats: R-type, U-type, I-type, B-type,
J-type, and S-type, each serving specific purposes in instruction encoding and execution.


4.1.1 R-type RV32I Instruction Format

Fig 4.1: R-Type RV32I V 2.0 Instruction Format

The Register-type RV32I ISA V 2.0, illustrated in Figure 4.1, comprises six fields. The Opcode
field spans 7 bits, determining the instruction type. Source registers (rs1, rs2) and the destination
register (rd) are denoted by five-bit fields. The 10 function bits, split into a 3-bit funct3 field and a
7-bit funct7 field, identify the operation type.
Supported instructions include add, sub, sltu, sll, xor, and, sra, srl, or, and slt.
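As an illustration of the field layout described above, the following Python sketch extracts the six R-type fields from a 32-bit instruction word. The function name `decode_r_type` is hypothetical. The bit positions follow the standard RV32I encoding.

```python
def decode_r_type(instr):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "opcode": instr & 0x7F,           # bits 6:0
        "rd":     (instr >> 7) & 0x1F,    # bits 11:7
        "funct3": (instr >> 12) & 0x7,    # bits 14:12
        "rs1":    (instr >> 15) & 0x1F,   # bits 19:15
        "rs2":    (instr >> 20) & 0x1F,   # bits 24:20
        "funct7": (instr >> 25) & 0x7F,   # bits 31:25
    }


add_fields = decode_r_type(0x002081B3)    # add x3, x1, x2
sub_fields = decode_r_type(0x402081B3)    # sub x3, x1, x2
```

The add and sub encodings differ only in funct7 (0x00 versus 0x20), which is why both function fields are needed to identify the operation.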

Fig 4.2: RISC-V instruction formats


4.1.2 I-type RV32I Instruction Format

Fig 4.3: I-Type RV32I V 2.0 Instruction Format

Fig 4.4: Decoding an I-type Instruction

Figure 4.3 depicts the Immediate-type RV32I ISA V 2.0. Similar to the R-type format, the Opcode
width within this format is 7 bits. Source register (rs1) and destination register (rd) are denoted by five-bit
fields. A three-bit function field is utilized to specify the operation type. Additionally, there's a dedicated
12-bit field for holding immediate operands, crucial for immediate data operations.

Instructions supported by this format include jalr, lhu, lw, lb, lbu, lh, srai, srli, slli, slti, addi, andi, ori, xori,
and sltiu. Figure 4.4 illustrates the decoding logic of the I-type Instruction.
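A minimal Python sketch of I-type decoding is given below for illustration. The helper names `sign_extend` and `decode_i_type` are hypothetical. The 12-bit immediate occupies bits 31:20 and is sign-extended, as in the standard RV32I encoding.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_i_type(instr):
    """Split a 32-bit I-type instruction word into its fields."""
    return {
        "opcode": instr & 0x7F,
        "rd":     (instr >> 7) & 0x1F,
        "funct3": (instr >> 12) & 0x7,
        "rs1":    (instr >> 15) & 0x1F,
        "imm":    sign_extend(instr >> 20, 12),   # bits 31:20, sign-extended
    }


addi_fields = decode_i_type(0xFFF00093)   # addi x1, x0, -1
```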


4.1.3 S-type RV32I Instruction Format

Fig 4.5: S-Type RV32I V 2.0 Instruction Format

Fig 4.6: Decoding an S-type Instruction

Figure 4.5 illustrates the Store-type RV32I ISA v2.0. Similar to the R-type format, the Opcode width is 7
bits. Source registers (rs1 and rs2) are identified by five-bit fields. A three-bit function field specifies the
size of the data to be stored. Additionally, there's a separate 12-bit field for holding the immediate operand,
which, when added to rs1, determines the address where the value from rs2 will be stored.

Instructions supported by this format include sw, sb, and sh. Figure 4.6 displays the decoding logic of the
S-type Instruction.
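In the standard RV32I encoding, the 12-bit store immediate is split across two parts of the instruction word (bits 31:25 and bits 11:7). The following Python sketch reassembles it. The function name `decode_s_type` is hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_s_type(instr):
    """Split a 32-bit S-type instruction word into its fields."""
    imm_hi = (instr >> 25) & 0x7F     # imm[11:5] in bits 31:25
    imm_lo = (instr >> 7) & 0x1F      # imm[4:0] in bits 11:7
    return {
        "opcode": instr & 0x7F,
        "funct3": (instr >> 12) & 0x7,    # store width: sb, sh or sw
        "rs1":    (instr >> 15) & 0x1F,   # base address register
        "rs2":    (instr >> 20) & 0x1F,   # register holding the data to store
        "imm":    sign_extend((imm_hi << 5) | imm_lo, 12),
    }


sw_pos = decode_s_type(0x0020A423)    # sw x2, 8(x1)
sw_neg = decode_s_type(0xFE20AE23)    # sw x2, -4(x1)
```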


4.1.4 B-type RV32I Instruction Format

Fig 4.7: B-Type RV32I V 2.0 Instruction Format

Fig 4.8: Decoding a B-type Instruction

In Figure 4.7, the Branch-type RV32I ISA V 2.0 is depicted. Similar to other instructions, the Opcode width
is 7 bits. Source registers (rs1 and rs2) are represented by five-bit fields, serving as the basis for comparison
during branching operations. The function field, spanning 3 bits, determines the type of condition to be
evaluated for branching.

A separate space of 12 bits accommodates the immediate operand, which, when a branch is taken, is added
to the program counter. Instructions supported by this format include bne, bltu, blt, bgeu, bge, and beq.
Figure 4.8 illustrates the decoding logic of the B-type Instruction.
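The 12 immediate bits of a B-type instruction are stored out of order in the standard RV32I encoding and represent a 13-bit byte offset whose lowest bit is always zero. The Python sketch below rebuilds that offset. The function name `decode_b_offset` is hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_b_offset(instr):
    """Rebuild the signed branch offset (in bytes) from a B-type instruction word."""
    imm = (((instr >> 31) & 0x1) << 12      # imm[12]   in bit 31
           | ((instr >> 7) & 0x1) << 11     # imm[11]   in bit 7
           | ((instr >> 25) & 0x3F) << 5    # imm[10:5] in bits 30:25
           | ((instr >> 8) & 0xF) << 1)     # imm[4:1]  in bits 11:8
    return sign_extend(imm, 13)


forward = decode_b_offset(0x00208463)     # beq x1, x2, +8
backward = decode_b_offset(0xFE208EE3)    # beq x1, x2, -4
```

When the branch is taken, this offset is added to the program counter, as stated above.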


4.1.5 U-type & J-type RV32I Instruction Format

Fig 4.9: J-Type RV32I V 2.0 Instruction Format

Fig 4.10: Decoding a J-type Instruction

Fig 4.11: U-Type RV32I V 2.0 Instruction Format


Fig 4.12: Decoding a U-type Instruction

Figures 4.11 and 4.9 present the U-type and J-type RV32I ISA V 2.0, respectively, both sharing similarities
in their structure. Each comprises three fields. The Opcode, spanning 7 bits, serves to distinguish
the type of instruction format. The destination register (rd) is identified by a five-bit field within these
instructions.

Additionally, there's a 20-bit field dedicated to holding immediate operands, crucial for immediate data
operations. However, in the case of J-type instructions, the immediate data undergoes rearrangement before
branching, distinguishing it from other types. Instructions supported by this format include jal, lui, and
auipc, each serving distinct purposes in program control and data manipulation.

Figures 4.10 and 4.12 provide insights into the decoding logic of J and U-type instructions, respectively,
elucidating how these instructions are interpreted and executed within the RV32I ISA framework.
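The difference between the two 20-bit immediates can be illustrated with a short Python sketch. The function names are hypothetical, and the bit positions follow the standard RV32I encoding. The U-type immediate is used as the upper 20 bits of a word, while the J-type immediate is rearranged into a signed 21-bit byte offset.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_u_imm(instr):
    """U-type: bits 31:12 become the upper 20 bits; the low 12 bits are zero."""
    return instr & 0xFFFFF000


def decode_j_offset(instr):
    """J-type: reorder the scattered immediate bits into a signed byte offset."""
    imm = (((instr >> 31) & 0x1) << 20       # imm[20]    in bit 31
           | ((instr >> 12) & 0xFF) << 12    # imm[19:12] in bits 19:12
           | ((instr >> 20) & 0x1) << 11     # imm[11]    in bit 20
           | ((instr >> 21) & 0x3FF) << 1)   # imm[10:1]  in bits 30:21
    return sign_extend(imm, 21)


lui_value = decode_u_imm(0x123452B7)      # lui x5, 0x12345
jal_fwd = decode_j_offset(0x008000EF)     # jal x1, +8
jal_back = decode_j_offset(0xFFDFF06F)    # jal x0, -4
```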


4.2 INSTRUCTIONS SUPPORTED


The RISC-V RV32I processor is capable of handling a wide array of tasks, encompassing
arithmetic, logical, data transfer, and control operations. Let's delve into each of these categories:

1) Arithmetic Operations:
These operations involve mathematical computations such as addition and subtraction; multiplication and
division are provided by the optional M extension rather than base RV32I. They are crucial for manipulating
numerical data and performing calculations within the processor.

2) Logical Operations:
Logical operations entail bitwise manipulations on binary data. These operations include bitwise AND,
OR, XOR, and logical shifts. They are utilized for tasks such as masking, setting or clearing specific
bits, and logical comparisons.

3) Data Transfer Operations:


Data transfer operations involve moving data between different memory locations or between memory
and registers. This includes instructions like load (e.g., lw for loading a word) and store (e.g., sw for
storing a word), which facilitate data movement within the processor and between the processor and
external memory.

4) Control Operations:
Control operations govern the flow of execution within the processor. These operations include
branching instructions (e.g., conditional branches like beq for branching if equal) and jump instructions
(e.g., jal for jumping to a specific address). They enable decision-making and looping structures within
programs.

In executing each of these operations, the processor relies on a series of interconnected stages, such as
instruction fetch, decode, execute, memory access, and write back. Each stage contributes to the overall
execution of an instruction, with dependencies between them ensuring the correct sequencing and
completion of operations. This interdependence underscores the intricate nature of processor operation and
the coordination required for efficient instruction execution. The specific details and behaviors of each
instruction type are documented in tables, providing comprehensive guidance for programmers and
hardware designers alike.


4.2.1 Arithmetic Operations

Table 4.1: Arithmetic Operations

Table 4.1 provides a comprehensive list of arithmetic operations that the processor supports. These
operations are executed by the Arithmetic Logic Unit (ALU) during the execution stage of the processor's
pipeline. During execution, arithmetic operations involve two source operands, typically retrieved from
registers, and the resulting value is written back to the register file during the write-back stage.

It's important to note that immediate data, which are values directly encoded within the instruction, are
sign-extended to 32 bits before being used in arithmetic operations. This ensures consistency in operand size, as
all operations are performed with respect to 32 bits. In these operations, register 1 always serves as the left-
hand side operand, while register 2 or the immediate data acts as the right-hand side operand.

In essence, arithmetic operations manipulate numerical data using basic mathematical functions such as
addition and subtraction. These operations play a fundamental role in data
processing within the processor, facilitating computations required for various tasks and algorithms. The
consistent handling of operand size and the sequential execution of these operations within the processor's
pipeline ensure efficient and reliable arithmetic computation.
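The 32-bit operand handling described above can be sketched in Python. The function `alu_arith` and its string operation names are hypothetical and cover only a few of the instructions named in this chapter. This is an illustrative model, not the report's ALU.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu_arith(op, a, b):
    """a is the left-hand operand (rs1); b is rs2 or the sign-extended immediate."""
    a &= MASK32
    b &= MASK32
    if op == "add":
        return (a + b) & MASK32          # result wraps around at 32 bits
    if op == "sub":
        return (a - b) & MASK32
    if op == "slt":
        return int(to_signed(a) < to_signed(b))   # signed comparison
    if op == "sltu":
        return int(a < b)                         # unsigned comparison
    raise ValueError(f"unsupported operation: {op}")


wrapped = alu_arith("add", 0xFFFFFFFF, 1)    # overflow wraps to 0
minus_one = alu_arith("add", 5, -6)          # -6 models a sign-extended immediate
```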


4.2.2 Logical Operations

Table 4.2: Logical Operations

Table 4.2 presents the assortment of Logical operations that the processor supports. These operations are
executed by the Arithmetic Logic Unit (ALU) during the execution stage of the processor's pipeline. In this
stage, two source operands are utilized for the operation, and the resulting output is subsequently written
back to the register file during the write-back stage.

During execution, immediate data are extended to 32 bits, ensuring consistency across all operations, which
are conducted within the framework of 32-bit data. The operations are structured such that Register 1
always serves as the left-hand operand, while Register 2 or the immediate data acts as the right-hand
operand.

In simpler terms, the processor can perform a variety of logical operations, such as bitwise AND, OR,
XOR, and logical shifts, using two input sources. These operations occur within a specific stage of the
processor's operation, and the results are stored back into the processor's registers. Immediate data, when
used, are expanded to 32 bits to maintain uniformity in processing, and the operands are arranged such that
Register 1 is on the left side and Register 2 or immediate data is on the right side.
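A comparable Python sketch for the logical and shift operations follows. The function `alu_logic` is hypothetical. Following the RV32I convention, shifts use only the low five bits of the right-hand operand as the shift amount.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu_logic(op, a, b):
    """a is the left-hand operand (rs1); b is rs2 or the immediate."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F                     # shift amount: low five bits only
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "xor":
        return a ^ b
    if op == "sll":
        return (a << shamt) & MASK32     # logical left shift
    if op == "srl":
        return a >> shamt                # logical right shift, zero fill
    if op == "sra":
        return (to_signed(a) >> shamt) & MASK32   # arithmetic shift, sign fill
    raise ValueError(f"unsupported operation: {op}")


masked = alu_logic("and", 0xF0, 0x3C)
```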


4.2.3 Data Transfer Operations

Table 4.3: Data Transfer Operations

Table 4.3 outlines the data transfer operations supported by the processor. During execution, the ALU
(Arithmetic Logic Unit) handles the address calculation: it adds the base register value to the immediate
offset, and the resulting address is passed to the subsequent stage for memory access. The immediate is
sign-extended to 32 bits, as all computations are performed with respect to this length.

In these operations, a consistent convention is followed: register 1 is always positioned on the left-hand
side, while the immediate data occupies the right-hand side (for stores, register 2 supplies the value to be
written). A load reads memory in the memory access stage and writes the loaded value to the register file in
the write-back stage, whereas a store completes during the memory access stage.
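The address calculation and the little-endian byte order mentioned in Section 4.1 can be illustrated with a small Python model. The class `DataMemory` and the function `effective_address` are hypothetical names. Only word-sized accesses (lw and sw) are modeled.

```python
def effective_address(rs1_value, imm):
    """Address computed by the ALU: base register plus sign-extended offset."""
    return (rs1_value + imm) & 0xFFFFFFFF


class DataMemory:
    """Byte-addressed, little-endian memory supporting word loads and stores."""

    def __init__(self, size):
        self.bytes = bytearray(size)

    def store_word(self, addr, value):       # models sw
        self.bytes[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def load_word(self, addr):               # models lw
        return int.from_bytes(self.bytes[addr:addr + 4], "little")


mem = DataMemory(64)
addr = effective_address(16, -4)             # base 16, offset -4
mem.store_word(addr, 0x11223344)
loaded = mem.load_word(addr)
lowest_byte = mem.bytes[addr]                # least significant byte is stored first
```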

In essence, the table provides a comprehensive overview of how data is transferred within the processor,
detailing the stages involved and the specific procedures for load and store operations. This streamlined
process ensures efficient handling of data movement, contributing to the overall functionality and
performance of the processor.


4.2.4 Control Transfer Instructions

Table 4.4: Control Transfer Operations

Table 4.4 provides an overview of the Control Transfer operations supported by the processor. These
operations involve evaluating conditions for branching, a task executed by the Arithmetic Logic Unit
(ALU) during the execution stage of the processor. In this process, two source operands are used, and the
resulting outcome determines whether the branch is taken or not.

Any immediate data involved in these operations are extended to 32 bits, ensuring uniformity in data
handling across the processor. It's important to note that all operations are performed with respect to 32
bits. Additionally, a consistent convention is followed where register 1 is always on the left-hand side,
while register 2 or immediate data is placed on the right-hand side.

The outcome of the branching operation, captured by the taken branch flag, influences the subsequent
instructions' ability to modify the memory and register file of the processor. This mechanism helps maintain
the integrity and coherence of the processor's state during control transfer operations.
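The condition evaluation and its effect on the program counter can be sketched as follows. The functions `branch_taken` and `next_pc` are hypothetical, and the sketch assumes a byte-addressed PC that otherwise advances by 4.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def branch_taken(op, a, b):
    """Evaluate a branch condition; a is rs1 (left side), b is rs2 (right side)."""
    a &= MASK32
    b &= MASK32
    conditions = {
        "beq":  a == b,
        "bne":  a != b,
        "blt":  to_signed(a) < to_signed(b),
        "bge":  to_signed(a) >= to_signed(b),
        "bltu": a < b,
        "bgeu": a >= b,
    }
    return conditions[op]


def next_pc(pc, taken, offset):
    """Add the branch offset to the PC when taken; otherwise move to pc + 4."""
    return (pc + offset) & MASK32 if taken else (pc + 4) & MASK32


target = next_pc(0x100, branch_taken("blt", 0xFFFFFFFF, 1), -8)
```

The same bit pattern 0xFFFFFFFF compares as less than 1 for blt (signed) but greater than 1 for bltu (unsigned).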


4.3 FIVE STAGE RISC PIPELINE PROCESSOR

Pipelining is the methodical approach used by processors to retrieve instructions and execute them through
a sequence of stages. It facilitates the organized storage and execution of instructions, optimizing the
efficiency of processing tasks. Sometimes referred to as pipeline processing, this technique streamlines the
flow of instructions through the processor's stages.

In essence, pipelining enables the processor to concurrently handle multiple instructions by breaking down
the execution process into smaller, sequential stages. Each stage focuses on a specific task, such as
instruction fetch, decode, execute, memory access, and write back. As instructions progress through these
stages, new instructions can be fetched, allowing for continuous processing without waiting for the
completion of earlier instructions.

Fig 4.13: Pipelining scheduling

Fig 4.13 illustrates pipeline scheduling: in any given clock cycle, multiple instructions occupy different
stages of the processor, which enhances overall throughput and efficiency.
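The overlap shown in the scheduling figure can be reproduced with a short Python sketch. It assumes an ideal five-stage pipeline with no stalls or flushes; the stage abbreviations and the function name `pipeline_schedule` are illustrative choices, not names taken from the design.

```python
# Ideal five-stage pipeline schedule (no hazards): instruction i enters
# stage s in clock cycle i + s.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c ('' if none)."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_schedule(4)):
    print(f"I{i + 1}: " + " ".join(f"{stage:>3}" for stage in row))
```

With four instructions the schedule spans eight cycles instead of the twenty a non-pipelined processor would need, and from the fourth cycle onward four instructions are in flight at once.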


4.4 ELIMINATION OF PIPELINE HAZARDS

Three types of hazards can arise in a pipelined processor:
1) Structural Hazards
2) Data Hazards
3) Control Hazards

4.4.1 Elimination of Structural Hazards:

Hardware Duplication: More than one resource can be provided for the same function. For example,
separate instruction and data memories eliminate the structural hazard that occurs when an instruction
fetch and a data operation need to access memory at the same time.
Pipeline Scheduling: Compiler techniques can be used to schedule the pipeline so that simultaneous
resource requests do not occur.

4.4.2 Elimination of Data Hazards:

Operand Forwarding: Also known as data bypassing, it involves forwarding the result of an operation
directly from the functional unit to the pipeline stage where it is needed, without having to write it to a
register and then read it back.
Pipeline Stalls: Also known as pipeline bubbles, these are deliberate delays inserted into the pipeline to
allow time for the data to be written and then read.
Register Renaming: This involves dynamically reassigning the registers used in the program to avoid
hazards.
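A pipeline stall is typically needed when a load is followed immediately by an instruction that uses the loaded value, because forwarding alone cannot deliver the data in time. The Python sketch below shows one common way to detect this case. The signal names mirror those in the appendix (resultsrcE0, rdE, rs1D, rs2D, stallF, stallD, flushE), but the exact condition is an assumption and is not taken from the authors' hazard unit.

```python
def load_use_stall(resultsrc_e0, rd_e, rs1_d, rs2_d):
    """Detect a load-use hazard between the execute and decode stages.

    Assumption: resultsrc_e0 is 1 when the instruction in the execute stage
    is a load. Register x0 never causes a hazard because it always reads 0.
    """
    stall = bool(resultsrc_e0) and rd_e != 0 and rd_e in (rs1_d, rs2_d)
    # Hold the fetch and decode stages for one cycle and insert a bubble in execute.
    return {"stallF": stall, "stallD": stall, "flushE": stall}


# A load into x5 in execute followed by an instruction reading x5 in decode.
print(load_use_stall(resultsrc_e0=1, rd_e=5, rs1_d=5, rs2_d=0))
```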

4.4.3 Elimination of Control Hazards:

Branch Prediction: This involves predicting the outcome of a branch operation and fetching instructions
accordingly. The prediction could be static (always taken or not taken) or dynamic (based on the history of
branch outcomes).
Branch Delay Slots: The compiler or hardware fills the slots following a branch instruction with other
instructions that can be executed whether or not the branch is taken.
Loop Unrolling: This involves replicating the body of the loop to reduce the number of branches.
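As an illustration of dynamic branch prediction, the Python sketch below models a two-bit saturating counter, a widely used history-based scheme. It is a generic example and is not claimed to be part of the processor described in this report; the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self):
        self.counter = 1  # start in the weakly not-taken state

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    """Run the predictor over a sequence of branch outcomes and count wrong predictions."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch that is taken nine times and then falls through.
loop_branch = [True] * 9 + [False]
print(count_mispredictions(loop_branch))
```

For this loop pattern the predictor is wrong only on the first iteration and on the loop exit, whereas a static not-taken prediction would be wrong nine times.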


4.5 DDR (Double Data Rate)

Double Data Rate (DDR) Synchronous Dynamic Random Access Memory (SDRAM) is a memory technology
used in computers and other electronic devices to improve performance. DDR SDRAM synchronizes its
operations with the system clock and allows random access to any memory location, which makes fast data
storage and retrieval possible. As technology advances, managing large volumes of data efficiently
demands faster and more sophisticated memory. DDR memory has evolved over time to meet this demand,
bringing cost savings as well as notable gains in speed and storage density. Its main advantage over
conventional single data rate (SDR) memory is that it transfers data on both the rising and falling edges
of the clock signal, doubling the transfer rate for a given clock frequency. This higher transfer rate
improves overall system performance and responsiveness, which is why modern computing systems rely on
DDR memory.

4.5.1 Benefits of DDR

• Faster data transfer rates: DDR memory achieves faster data transfer rates than SDR memory
because it transfers data on both the rising and the falling edges of the clock signal.
• Increased bandwidth: Bandwidth is the quantity of data that can be moved per unit of time, and it is
raised by the faster transfer rates of DDR memory. DDR memory can therefore handle larger amounts
of data at once, which is very helpful for high-performance equipment such as servers and game
consoles.
• Improved power efficiency: DDR memory is more power-efficient than SDR memory, so devices
that use it consume less energy and produce less heat. This is especially helpful for mobile devices,
which must conserve power to operate for extended periods between charges.
• Higher capacity: DDR memory exists in several generations, including DDR1, DDR2, DDR3, DDR4,
and DDR5, with different capacity specifications. For instance, DDR1 modules typically offer up to
1 GB, whereas DDR4 modules are commonly available at 16 GB and above. As a result, devices using
DDR memory have a larger capacity for data storage than those using SDR memory.
• Synchronized with the system clock: DDR memory is also called DDR SDRAM (Synchronous
Dynamic Random Access Memory) because its operation is synchronized with the system clock.
• Widely used in various applications: Many devices, such as mobile phones, game consoles,
servers, and personal computers, use DDR memory. It is a flexible technology with a wide range of
applications, including industrial automation and automotive systems.

4.5.2 DDR flip flop

A flip-flop can operate in two modes: standard mode and double data-rate mode. In standard mode, the
flip-flop outputs data on either the rising or the falling edge of the applied clock. In double data-rate
mode, it outputs data on both edges of the applied clock. In this mode two latches alternate: while the
first latch is sampling, the output data is supplied by the second latch, which is holding, and while the
second latch is sampling, the output data is supplied by the first latch, which is holding. Consequently,
one of the latches provides output data on each rising or falling edge of the clock.

Types of DDR flip flop:


• Conventional dual edge triggered flip flop: The conventional dual edge triggered flip-flop
(DETF) scheme triggers data transfer on both the rising and falling edges of the clock signal. This
doubles the data rate for a given clock frequency or, equivalently, allows the clock frequency to be
halved for the same data rate. However, the technique has drawbacks. The DETF requires additional
circuitry compared to a single edge triggered flip-flop (SETF), which increases silicon area, a
significant disadvantage in high-density integrated circuits. The DETF also doubles the load on the
data and clock inputs because data is sampled twice per clock cycle, leading to higher power
consumption and slower signal propagation. This extra load can offset the power saved by the
reduced clock frequency, so these factors must be weighed carefully when designing
high-performance, low-power digital circuits.

Fig 4.14 Conventional dual edge triggered flip flop

• Explicit pulsed dual edge flip flop: Unlike conventional dual edge triggered flip-flops (DETF),
pulsed flip-flops lack a hard-edge property, which makes them skew-tolerant and lets them borrow
cycle time, an advantage in certain timing scenarios. In the general explicit pulse dual edge
flip-flop scheme, each clock edge is replaced by a pulse. Although the clock frequency is halved, the
effective activity factor of the clock distribution network increases because each clock edge becomes
two edges (a pulse), doubling the number of transitions per clock cycle. The transistor count in the
clock system also increases to accommodate pulse generation, which adds load on the clock signal.
Consequently, despite the lower clock frequency, the higher activity factor and load can increase
overall power consumption.

Fig 4.15 Explicit pulsed dual edge flip flop

• Implicit pulsed dual edge flip flop: Implicit pulse dual-edge flip-flops are clocked storage
elements that capture data on both the rising and falling edges of the clock signal. The scheme
places two series devices in the logic branch, which receive the clock and a delayed version of the
clock. Unlike explicit pulsed flip-flops, implicit pulse dual-edge flip-flops can be used with dynamic
logic, but they may perform worse because of the deeper nMOS stack. Because the pulse generation
cannot be shared among flip-flops, their power overhead is also larger. Nonetheless, compared to
explicit pulsed flip-flops they can offer benefits such as a two-fold reduction in clock dynamic
power consumption and a simpler design with fewer transistors.


Fig 4.16 Implicit pulsed dual edge flip flop
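The power trade-off discussed for the three flip-flop schemes can be summarized with the standard first-order expression for dynamic power. The symbols below (activity factor, clock network capacitance, supply voltage, and clock frequency, with primes for the dual-edge design) are notation introduced here for illustration and do not come from the figures.

```latex
P_{\text{clk}} = \alpha \, C_{\text{clk}} \, V_{DD}^{2} \, f_{\text{clk}}
\qquad\Longrightarrow\qquad
\frac{P_{\text{dual-edge}}}{P_{\text{single-edge}}}
  = \frac{\alpha' \, C'_{\text{clk}} \, V_{DD}^{2} \, (f_{\text{clk}}/2)}
         {\alpha \, C_{\text{clk}} \, V_{DD}^{2} \, f_{\text{clk}}}
  = \frac{1}{2} \cdot \frac{\alpha' \, C'_{\text{clk}}}{\alpha \, C_{\text{clk}}}
```

Under this simple model, halving the clock frequency saves clock power only if the product of activity factor and clock load grows by less than a factor of two in the dual-edge design.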


CHAPTER 5
IMPLEMENTATION

5.1 IMPLEMENTATION OF PROPOSED METHODOLOGY

The proposed methodology is implemented in the following steps:
Step 1: Development of the control and hazard units.
Step 2: Development of the Datapath unit.
Step 3: Design and verification of the double data rate (DDR) flip-flop along with its control signals.
Step 4: Identification and replacement of sequential elements with DDR flip-flops.
Step 5: Development of test cases for immediate and register type instructions, encoded in the RV32I
format and incorporated in the instruction ROM.
Step 6: Writing the testbench to simulate and verify the design, and extraction of the post-synthesis
reports.

Step 1: Development of the control and hazard units:

• Control unit:

Fig 5.1 control unit module

The control unit of RISC-V, a modern open-source instruction set architecture (ISA), orchestrates the
execution of instructions within the processor. It decodes instructions, manages data flow, and
coordinates operations, adhering to the RISC philosophy of simplicity and efficiency. The control unit is
a combinational block made up of the main decoder and the ALU control decoder. It accepts fields of the
instruction in the decode stage as input and produces control signals based on the instruction type and
opcode as output.
The sub-modules of the control unit are:
Main decoder: The decoder unit that accepts the instruction as input and produces the control signals as
output. The opcode is used in a case statement to produce the following control signals:
branchD: The control signal used by the branch logic in the execute stage to branch to the target address.
immsrcD: The control signal that selects the sign extension according to the type of instruction.
jumpD: The control signal that transfers control to the jump target address.
memwrD: The control signal that enables writes to the data memory.
regwrD: The control signal that enables writes to the register file.
resultsrcD: The control signal for the writeback stage mux, which selects among the registered, memory,
and non-registered outputs.

ALU control signal generator: It accepts the part of the instruction that specifies the ALU function
(an arithmetic or logical operation) and produces two control signals: ALUctrlD, which is sent to the ALU
in the execute stage, and ALUsrcD, which is sent to the ALU mux that chooses the appropriate ALU input.
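The opcode-driven case statement of the main decoder can be sketched in Python as follows. The opcodes are the standard RV32I major opcodes, and the immsrcD values follow the case order of the Sign_ext module in the appendix. The remaining signal values, in particular the resultsrcD encoding (00 for the ALU result, 01 for memory data, 10 for the return address), are assumptions made for illustration and may differ from the authors' decoder.

```python
# Standard RV32I major opcodes (instruction bits [6:0]).
OP_R, OP_I_ALU, OP_LOAD = 0b0110011, 0b0010011, 0b0000011
OP_STORE, OP_BRANCH, OP_JAL = 0b0100011, 0b1100011, 0b1101111


def main_decoder(opcode):
    """Return the decode-stage control signals for one opcode (assumed encodings)."""
    # Defaults: nothing is written and control flow is sequential.
    ctrl = {"regwrD": 0, "memwrD": 0, "branchD": 0, "jumpD": 0,
            "ALUsrcD": 0, "immsrcD": 0b00, "resultsrcD": 0b00}
    if opcode == OP_R:
        ctrl.update(regwrD=1)
    elif opcode == OP_I_ALU:
        ctrl.update(regwrD=1, ALUsrcD=1, immsrcD=0b00)
    elif opcode == OP_LOAD:
        ctrl.update(regwrD=1, ALUsrcD=1, immsrcD=0b00, resultsrcD=0b01)
    elif opcode == OP_STORE:
        ctrl.update(memwrD=1, ALUsrcD=1, immsrcD=0b01)
    elif opcode == OP_BRANCH:
        ctrl.update(branchD=1, immsrcD=0b10)
    elif opcode == OP_JAL:
        ctrl.update(regwrD=1, jumpD=1, immsrcD=0b11, resultsrcD=0b10)
    else:
        raise ValueError(f"unsupported opcode {opcode:#09b}")
    return ctrl


print(main_decoder(OP_STORE))
```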

• Hazard unit:

Fig 5.2 hazard unit

The pipelined architecture differs from the single-cycle architecture in three ways: it has flip-flops
between every stage and the next so that the data of each stage propagates as one unit, it has two 3:1
muxes that select the ALU operands when forwarding is needed, and it has a hazard unit that controls the
forwarding conditions.

Forwarding: Forwarding is needed when a source register of the instruction in the execute stage (rs1E
or rs2E) matches the destination register of an instruction in the memory or writeback stage (rdM or
rdW). The hazard unit accepts the register write enables of the memory and writeback stages and
produces the forwardAE and forwardBE signals, which direct the corresponding ALU muxes to forward the
result from the memory or writeback stage.
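The forwarding condition can be expressed as a small Python sketch. The select encoding used here (10 for the memory stage, 01 for the writeback stage, 00 for the register file value) is an assumption chosen to match a 3:1 mux; the report does not state which mux input carries which source.

```python
def forward_select(rs_e, rd_m, regwr_m, rd_w, regwr_w):
    """Return the 2-bit select for one ALU operand mux (forwardAE or forwardBE).

    The memory stage has priority because it holds the most recent result.
    Register x0 is never forwarded because it always reads as zero.
    """
    if rs_e != 0 and regwr_m and rs_e == rd_m:
        return 0b10  # take the ALU result from the memory stage
    if rs_e != 0 and regwr_w and rs_e == rd_w:
        return 0b01  # take the result from the writeback stage
    return 0b00      # no hazard: use the value read from the register file


# rs1E = x3 matches both rdM and rdW; the newer memory-stage value wins.
forwardAE = forward_select(rs_e=3, rd_m=3, regwr_m=1, rd_w=3, regwr_w=1)
print(f"forwardAE = {forwardAE:02b}")
```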

Step 2: Development of Datapath unit:

Fig 5.3 Datapath unit

The Datapath module consists of the ALU, muxes, program counter, sign extender, and pipeline registers.
It accepts inputs from the control unit, hazard unit, and instruction memory, and processes the data
according to the control and hazard signals. The Datapath encloses all 5 stages, namely fetch, decode,
execute, memory, and writeback.
The Datapath consists of the following:
• Pipeline registers: Flip-flops placed between each of the five stages (fetch, decode, execute,
memory, and writeback) that hold the outputs of one stage for the next and keep the data
synchronized to the clock.

• Multiplexers: 2:1 and 3:1 muxes in the fetch and execute stages that choose among different
sources.

• Program counter: The Program Counter (PC) is a register in the processor that holds the address of
the next instruction to be fetched and executed.
• Register file: The register file is a high-speed storage area within the processor that holds a small
set of registers. In the RISC-V architecture, it contains a set of general-purpose registers (GPRs),
each storing a fixed-size binary data word, commonly 32 or 64 bits.
• ALU: Performs arithmetic and logical operations on data, such as addition, subtraction, AND, OR,
and XOR.
• Sign extender: Extends the sign bit of a binary number to fill the additional upper bits. In the
RISC-V architecture, it is used to widen the signed immediate fields of an instruction to 32 bits.
• Simple adder: A combinational block that adds either four bytes or the immediate offset to the
program counter.
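The sign extender's job can be illustrated with a Python sketch that extracts and sign-extends the immediate for each instruction format. The bit positions follow the standard RV32I formats, and the `immsrc` encoding follows the case order of the Sign_ext module in the appendix; the function names are illustrative.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def immediate(instr, immsrc):
    """Return the signed immediate of a 32-bit instruction for the given format."""
    instr &= 0xFFFFFFFF
    if immsrc == 0b00:  # I-type: imm[11:0] = instr[31:20]
        return sign_extend(instr >> 20, 12)
    if immsrc == 0b01:  # S-type: imm[11:5] = instr[31:25], imm[4:0] = instr[11:7]
        return sign_extend(((instr >> 25) << 5) | ((instr >> 7) & 0x1F), 12)
    if immsrc == 0b10:  # B-type: imm[12|10:5] = instr[31|30:25], imm[4:1|11] = instr[11:8|7]
        imm = (((instr >> 31) & 1) << 12) | (((instr >> 7) & 1) << 11) \
            | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1)
        return sign_extend(imm, 13)
    if immsrc == 0b11:  # J-type: imm[20|10:1|11|19:12] = instr[31|30:21|20|19:12]
        imm = (((instr >> 31) & 1) << 20) | (((instr >> 12) & 0xFF) << 12) \
            | (((instr >> 20) & 1) << 11) | (((instr >> 21) & 0x3FF) << 1)
        return sign_extend(imm, 21)
    raise ValueError("immsrc must be a 2-bit value")


print(immediate(0xFFF00093, 0b00))  # addi x1, x0, -1
```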

Step 3: Design and verification of the DDR flipflop:

Fig 5.4 DDR Flipflop

We designed the DDR flip-flop from two flip-flops, one triggered on the positive clock edge and the other
on the negative clock edge. A 2:1 mux selects between the outputs of the two flip-flops, with the clock
as its select line. The DDR flip-flop was described using structural modelling.
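The structure described above can be modelled behaviorally in Python to see how the output follows the input on both edges. This is a simplified sketch, not the authors' Verilog: it assumes the mux passes the positive-edge flip-flop while the clock is high and the negative-edge flip-flop while it is low, and it ignores setup, hold, and propagation delays.

```python
class DDRFlipFlop:
    """Two edge-triggered flip-flops followed by a 2:1 mux selected by the clock."""

    def __init__(self):
        self.q_pos = 0  # flip-flop that samples d on the rising edge
        self.q_neg = 0  # flip-flop that samples d on the falling edge
        self.clk = 0    # previous clock level, used to detect edges

    def step(self, clk, d):
        """Apply a new clock level and data value; return the output q."""
        if self.clk == 0 and clk == 1:
            self.q_pos = d
        elif self.clk == 1 and clk == 0:
            self.q_neg = d
        self.clk = clk
        # Assumed mux polarity: clock high selects q_pos, clock low selects q_neg.
        return self.q_pos if clk else self.q_neg


ff = DDRFlipFlop()
stimulus = [(1, 1), (0, 0), (1, 1), (0, 1)]  # (clk, d) pairs, one per half cycle
print([ff.step(clk, d) for clk, d in stimulus])
```

Each half cycle produces a new output value, so two data items pass through the flip-flop per clock period.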

• DDR with different control signals

Fig 5.5 DDR flipflop with only reset as control signal.

Fig 5.6 DDR flipflop with enable and reset as control signal.

The verification plan for the DDR (Double Data Rate) flip-flop covers its functionality together with its
control signals, as well as timing requirements, transition times, and data stability under various clock
and input signal conditions. Simulations validate proper functioning on both rising and falling clock
edges. Timing analysis checks compliance with the DDR timing requirements, while corner cases are
covered by the testbench, which applies the appropriate control signals at specific simulation times.


Step 4: Identifying and replacing the sequential units with DDR:

Fig 5.7 Conventional flipflops (under box) of Fetch stage

Fig 5.8 DDR flipflops of Fetch stage After replacement

The next step is to identify the sequential units in each stage. These are the pipeline registers, built
from conventional flip-flops, which are replaced with DDR flip-flops. Figs 5.7 and 5.8 illustrate the
replacement for the fetch stage; all other constructs remain the same. Similarly, the pipeline registers
between all other stages are replaced with DDR flip-flops.


Fig 5.8(a) existing implementation


Register file: Figs 5.8(a) and 5.8(b) show the existing and proposed register files. The existing
register file writes data only on one edge of the clock (rising or falling). The proposed
implementation, which uses DDR flip-flops in the register file, allows data to be written on both clock
edges.

Fig 5.8(b) proposed implementation


Step 5: Development of test cases for immediate and register type instructions, encoded in the
RV32I format and incorporated in the instruction ROM:
We then built test cases to check the functionality by loading them into the instruction ROM.
For testing we used immediate (I-type) and register (R-type) instructions.
The following instructions, along with their hex encodings, were used in testing:

We encoded our instructions using the reference released by the University of California, Berkeley.
The instruction format is as follows:

Here rs1 is the source register address, rd is the destination register address, funct3 selects the
ALU operation, and opcode specifies the type of instruction.
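The encoding of a test instruction into its hex form can be sketched in Python using the standard RV32I I-type and R-type field layouts. The two instructions below are illustrative examples and are not necessarily the ones stored in the authors' instruction ROM; the helper names `encode_i` and `encode_r` are hypothetical.

```python
def encode_i(imm, rs1, funct3, rd, opcode=0b0010011):
    """I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_r(funct7, rs2, rs1, funct3, rd, opcode=0b0110011):
    """R-type layout: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


# addi x1, x0, 5
print(f"{encode_i(imm=5, rs1=0, funct3=0b000, rd=1):08X}")
# add x3, x1, x2
print(f"{encode_r(funct7=0, rs2=2, rs1=1, funct3=0b000, rd=3):08X}")
```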


Step 6: Writing the testbench to simulate and verify, and extracting the post-synthesis report:

Fig 5.9 Testbench Integration

The testbench is the top-level module. It instantiates the processor top module, which consists of the
Datapath, control, and hazard units, and connects it to the instruction ROM and the data memory.
The testbench drives the global clock and reset signals and runs the processor through the elaborated
design of the instruction ROM, top module, and data memory.

We synthesized the design with the Xilinx ISE Design Suite and extracted the post-synthesis reports
from it, using the Xilinx ISE power analyzer for the power figures.


5.2 SIMULATION RESULTS

5.2.1 DDR without enable and reset


The simulation shows that the "q" output responds to both the rising and falling edges of the clock
signal ("clk"). Because it updates on both the positive and negative transitions of the clock, "q"
captures data at twice the rate of a system that samples on only one edge.

Fig 5.10 Simulation of DDR without enable and reset

5.2.2 DDR with enable and reset


The simulation shows that the "q" output responds to both the rising and falling edges of the clock
signal ("clk") when the enable signal ("en") is high and the reset signal ("rst_p") is low, so it
captures data at twice the rate of a system that samples on only one clock edge. When "rst_p" is high,
"q" is forced low regardless of the state of the other signals. When "en" is low, "q" remains unchanged
even across clock transitions, which allows data capture to be selectively enabled or disabled as
needed.
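The behavior seen in this simulation can be captured in a short behavioral Python model. The signal names clk, en, rst_p, d, and q follow the waveform description; the model treats the reset as taking effect immediately, which is an assumption, since the text does not say whether the reset is synchronous or asynchronous.

```python
class DualEdgeRegister:
    """Behavioral dual-edge register with enable and active-high reset."""

    def __init__(self):
        self.clk = 0  # previous clock level, used to detect edges
        self.q = 0

    def step(self, clk, d, en=1, rst_p=0):
        """Apply one set of input values and return the resulting output q."""
        edge = clk != self.clk  # true on a rising or a falling edge
        self.clk = clk
        if rst_p:
            self.q = 0          # reset overrides every other input
        elif edge and en:
            self.q = d          # capture on either clock edge
        return self.q           # otherwise hold the previous value


reg = DualEdgeRegister()
print(reg.step(1, 1))            # rising edge, enabled: captures 1
print(reg.step(0, 0, en=0))      # falling edge, disabled: holds 1
print(reg.step(1, 0))            # rising edge, enabled: captures 0
print(reg.step(0, 1, rst_p=1))   # reset high: forced to 0
```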


Fig 5.11 Simulation of DDR with enable and reset

5.2.3 DDR with only reset


The simulation shows that when the reset signal ("rst_p") is high, the output "q" is forced low
irrespective of the clock signal ("clk"). When the reset signal is low, "q" changes on both the rising
and falling edges of the clock, so it reflects the input data on both clock transitions.

Fig 5.12 Simulation of DDR with only reset


5.2.4 Output Waveform for Addition Instruction


The simulation results of the RISC-V processor show the pipeline operating on both clock edges: the
program counter increments on each clock edge, instructions are fetched and decoded on every clock
edge, and pipeline signals such as rdW, rdM, and rs1E track the pipeline stages in step. The register
file contents are updated every half clock cycle, so instructions complete in fewer clock cycles.
Multiple instructions progress through different stages simultaneously, which enhances throughput
while the control logic keeps instruction execution correct in the presence of control flow and data
dependencies.

Fig 5.13 Simulated waveform of addition instruction

5.3 SYNTHESIS RESULTS

5.3.1 TIMING REPORT


Existing Timing Report:

Fig 5.14 Existing Timing Report

Proposed Timing Report:

Fig 5.15 Proposed Timing Report


5.3.2 POWER REPORT


Existing Power Report:

Fig 5.16 Existing Power report

Proposed Power Report:

Fig 5.17 Proposed Power report

5.3.3 AREA REPORT


Existing Area Report:

Fig 5.18 Existing Area report

Proposed Area Report:

Fig 5.19 Proposed Area report

The proposed architecture enhancement aims to reduce the number of clock cycles per operation, and with
it the operational time, compared to the existing RISC-V architecture. Because the proposed Datapath
transfers data on both clock edges, the clock frequency reported by synthesis falls from 221.074 MHz to
104.039 MHz, a difference of about 53% (Table 5.1), so fewer clock cycles are needed per operation. This
is intended to speed up task execution and improve system responsiveness. However, the additional
multiplexers and flip-flops in the proposed design increase hardware complexity, highlighting a
trade-off between operational time and hardware complexity and size.
Property                                     Existing RISC V         Proposed RISC V         % difference in
                                             Datapath Architecture   Datapath Architecture   synthesis results
                                                                                             (existing to proposed)
Minimum time period of clock required (ns)   4.523                   9.607                   52
Maximum clock frequency allowed (MHz)        221.074                 104.039                 53
Dynamic power consumed (W)                   0.198                   0.001                   99.9
Total power consumed (W)                     0.281                   0.083                   97
Total number of cells (device utilization)   3581                    1384                    58

Table 5.1 Comparison of the synthesis results


CHAPTER 6
CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION
The RISC-V architecture presents a promising solution to address the challenges faced by current
processors in efficiently executing complex instructions. As an open-source, modular, and extensible ISA,
RISC-V offers hardware developers the freedom to customize processors to meet their specific needs,
without the restrictions and licensing fees associated with proprietary architectures. The integration of DDR
flip-flops into the RISC-V processor design is a key enhancement that aims to improve performance and
reduce the delay in executing complex instructions. By leveraging the dual edge triggering of DDR, the
processor can effectively double the data transfer rate, leading to a significant reduction in the number of
clock cycles required for complex instruction execution.

The proposed architecture modifications target a substantial improvement in operational time: with data
transferred on both clock edges, the minimum clock period changes from 4.523 ns to 9.607 ns, so the
required clock frequency is roughly halved (Table 5.1). This comes with added hardware complexity due to
the additional multiplexers and flip-flops. Overall, the RISC-V processor with DDR flip-flops represents
a compelling alternative to conventional proprietary microprocessor technologies, offering the
advantages of open-source customization, lower entry barriers, and enhanced performance and efficiency.
As the RISC-V community continues to grow and evolve, this architecture is poised to reshape the
landscape of the processor market, driving innovation and fostering competition across a wide range of
applications, from microcontrollers to supercomputing.

6.2 FUTURE SCOPE


The integration of DDR flip-flops for dual-edge detection within RISC-V data paths holds profound
implications across automotive, robotics, healthcare, and industrial automation sectors. In automotive
applications, the project's enhancements facilitate real-time decision-making in advanced driver assistance
systems (ADAS), contributing to safer roads and advancing autonomous driving capabilities. Moreover,
the optimization of vehicle networking and infotainment systems promises enriched multimedia
experiences and seamless communication between vehicle subsystems, paving the way for enhanced user
satisfaction and driving comfort.

In robotics, the project unlocks new frontiers in precision and efficiency. Medical robotics stand to
benefit significantly from faster data processing, enabling precise surgical interventions,
rehabilitation therapies, and remote medical consultations. Furthermore, agile search and rescue
operations, empowered by advanced robotics, promise to revolutionize disaster response efforts, saving
lives and minimizing risks in critical situations. Additionally, in industrial automation, the project's
capabilities offer opportunities for optimizing manufacturing processes, increasing productivity, and
ensuring sustainable production practices across diverse industries.

Beyond its immediate applications, the project's emphasis on security and privacy underscores its
commitment to responsible innovation. By safeguarding sensitive data collected and processed within
various domains, the project fosters trust and reliability in the adoption of advanced technologies. As
research and development efforts continue to push the boundaries of what is possible, the project's impact
is poised to reshape industries, driving progress and prosperity in the global economy.

Fig 6.1 Rise in Trend of RISC V

Fig 6.2 Practical application of RISC V processor



REFERENCES

[1] Aaron Elson Phangestu, Totok Mujiono, and Ahmad Zaini, "Five-Stage Pipelined 32-Bit RISC-V Base
Integer Instruction Set Architecture Soft Microprocessor Core in VHDL," 2022.
[2] K. Asanović and D. A. Patterson, "Instruction sets should be free: The case for RISC-V," EECS
Department, University of California, Berkeley, Aug. 2014.
[3] A. Waterman, Y. Lee, D. A. Patterson, and K. Asanović, "The RISC-V instruction set manual. Volume
1: User-level ISA, version 2.0," 2014.
[4] M. Poorhosseini, W. Nebel, and K. Grüttner, "A compiler comparison in the RISC-V ecosystem,"
Sep. 2020.
[5] N. M. Qui, C. H. Lin, and P. Chen, "Design and implementation of a 256-bit RISC-V based
dynamically scheduled very long instruction word on FPGA," 2020.
[6] Srikanth V. Devarapalli, Payman Zarkesh-Ha, and Steven C. Suddarth, "A Robust and Low Power Dual
Data Rate (DDR) Flip-Flop Using C-Elements," 2010.
APPENDIX
SOURCE CODE

TOP MODULE OF PROPOSED RISC V DATAPATH


`timescale 1ps / 1fs

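module riscc (
    input  wire        clk,
    input  wire        rst,
    input  wire [31:0] instrF,       // instruction fetched from the instruction ROM
    output wire [9:0]  addr,         // data memory address (ALUoutM[9:0])
    output wire [31:0] write_dataM,  // data written to the data memory
    output wire        memwrM,       // data memory write enable
    input  wire [31:0] read_dataM,   // data read from the data memory (consumed by the datapath)
    output wire [31:0] PCF,          // program counter sent to the instruction ROM
    output wire [31:0] instrD        // instruction in the decode stage
);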

wire [1:0] PCsrc ;
wire [1:0] immsrcD ;
wire ALUsrcD ;
wire [1:0] ALUctrlD ;
wire [1:0] resultsrcD ;
wire regwrD , regwrM , regwrW ;
wire memwrD ;
wire [31:0] ALUoutM ;
wire [1:0] forwardAE, forwardBE;
wire [4:0] rs1E , rs2E ;
wire [4:0] rdE ,rdM , rdW ;
wire [4:0] rs1D , rs2D ;
wire jumpD ,jalrD , branchD ;
wire stallF , stallD, flushD , flushE ;
wire resultsrcE0 , PCsrcE0 ;

assign addr = ALUoutM[9:0] ;

datapath u_dp (
.clk(clk),
.rst(rst),
//instr memory inputs
.instrF(instrF),
//data memory inputs
.read_dataM(read_dataM),
//CU inputs
.immsrcD(immsrcD),
.ALUsrcD(ALUsrcD),
.ALUctrlD(ALUctrlD),
.resultsrcD(resultsrcD),
.regwrD(regwrD),
.jumpD(jumpD),
.jalrD(jalrD),
.branchD(branchD),
.memwrD(memwrD),
//hazard unit inputs
.forwardAE(forwardAE),
.forwardBE(forwardBE),
.stallF(stallF),
.stallD(stallD),
.flushE(flushE),
.flushD(flushD),
//CU outputs
.instrD(instrD),
//hazard unit outputs
.rs1E(rs1E),
.rs2E(rs2E),
.rdM(rdM),
.rdW(rdW),
.regwrM(regwrM),
.regwrW(regwrW),
.rs1D(rs1D),
.rs2D(rs2D),
.rdE(rdE),
.resultsrcE0(resultsrcE0),
.PCsrcE0(PCsrcE0),
//instr memory outputs
.PCF(PCF),
//data memory outputs
.ALUoutM(ALUoutM),
.write_dataM(write_dataM),
.memwrM(memwrM)
);

control_unit u_cu (
.opD(instrD[6:0]),
.funct3D(instrD[14:12]),
.funct7_5D(instrD[30]),
//datapath outputs
.immsrcD(immsrcD),
.ALUsrcD(ALUsrcD),
.ALUctrlD(ALUctrlD),
.resultsrcD(resultsrcD),
.regwrD(regwrD),
.jumpD(jumpD),
.jalrD(jalrD),
.branchD(branchD),
//data memory output
.memwrD(memwrD)
);

hazard_unit u_hu (
.rst(rst),
.rs1E(rs1E),
.rs2E(rs2E),
.rdM(rdM),
.rdW(rdW),
.regwrM(regwrM),
.regwrW(regwrW),
//stalling inputs
.rs1D(rs1D),
.rs2D(rs2D),
.rdE(rdE),
.resultsrcE0(resultsrcE0),
//flushing inputs
.PCsrcE0(PCsrcE0),
//forwarding outputs
.forwardAE(forwardAE),
.forwardBE(forwardBE),
//stalling outputs
.stallF(stallF),
.stallD(stallD),
.flushE(flushE),
//flushing outputs
.flushD(flushD)
);

endmodule

//////////////////////////////////////////////////////////////////////
// 2:1 multiplexer: out = in1 when sel is 1, otherwise in0
module mux2x1 (
    input  wire        sel,
    input  wire [31:0] in0,
    input  wire [31:0] in1,
    output reg  [31:0] out
);
always @(*)
begin
    if (sel)
        out = in1 ;
    else
        out = in0 ;
end
endmodule
//////////////////////////////////////////////////////////////////////
////////////////////////////////
module mux3x1 (
input wire [1:0] sel,
input wire [31:0] in0,
input wire [31:0] in1,
input wire [31:0] in2,
output reg [31:0] out );
always@ (sel,in0,in1,in2)
begin
if(sel == 2'b10)
begin
out = in2;
end
else if (sel == 2'b01)
begin
out = in1;
end
else if (sel == 2'b00)
begin
out = in0;
end
else
begin
out = in0;
end
end
endmodule

//////////////////////////////////////////////////////////////////////
//////////////////////////////////////////

module adder ( input wire [31:0] in1, input wire [31:0] in2, output
wire [31:0] out );
assign out = in1 + in2 ;
endmodule

//////////////////////////////////////////////////////////////////////
////////////////////////////////////

module Sign_ext ( input wire [31:7] in, input wire [1:0] opcode,
output reg [31:0] out );
always@(opcode,in)
begin
case(opcode)
2'b00 : out = { {20{in[31]}} , in[31:20] } ; //I-type instruction
2'b01 : out = { {20{in[31]}} , in[31:25] , in[11:7] } ; //S-type instruction
2'b10 : out = { {20{in[31]}} , in[7] , in[30:25] , in[11:8] , 1'b0} ; //B-type instruction
2'b11 : out = { {12{in[31]}} , in[19:12] , in[20] , in[30:21] , 1'b0} ; //J-type instruction
default : out = 32'hxxxxxxxx ;
endcase
end
endmodule

//////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////

module Reg_file (

input wire clk, input wire [4:0] Addr1, input wire [4:0] Addr2, input
wire [4:0] Addr3, input wire [31:0] wd3, input wire we3, output reg
[31:0] rd1, output reg [31:0] rd2
);

reg [31:0] temp [0:31];


integer i;
//clocked writing
always@(negedge clk)
begin
if(we3)
begin
temp [Addr3] <= wd3;
end
end
//combinational reading
always@ (*) //also re-evaluate when the register contents change
begin
if (Addr1 == 0)
begin
rd1 = 0;
end
else
begin
rd1 = temp [Addr1];
end
if (Addr2 == 0)
begin
rd2 = 0;
end
else
begin
rd2 = temp [Addr2];
end
end
endmodule

//////////////////////////////////////////////////////////////////////
//////////////////////////////////////////

module Alu ( input wire [1:0] ALUctrl , input wire [31:0] A , B ,


output reg [31:0] ALUout, output wire zero );
assign zero = (ALUout == 0)? 1 : 0 ;
always@(ALUctrl,A,B) begin case(ALUctrl)
2'b00: ALUout = A + B ;
2'b01: ALUout = A - B ;
2'b10: ALUout = A & B ;
2'b11: ALUout = A | B ;
default: ALUout = 0 ;
endcase
end
endmodule

//////////////////////////////////////////////////////////////////////
/////////////////////////////
module main_decoder ( input wire [6:0] op, output reg jump,
output reg jalr,
output reg branch, output reg [1:0] immsrc, output reg ALUsrc,
output reg [1:0] ALUop, output reg [1:0] resultsrc, output reg regwr,
output reg memwr
);

always@(*) begin
case(op)

7'b0000011 : //lw instruction


begin
regwr = 1'b1 ;
immsrc = 2'b00 ;
ALUsrc = 1'b1 ;
memwr = 1'b0 ;
resultsrc = 2'b01 ;
branch = 1'b0 ;
ALUop = 2'b00 ;
jump = 1'b0 ;
jalr = 1'b0 ;
end
7'b0100011 : //sw instruction
begin
regwr = 1'b0 ;
immsrc = 2'b01 ;
ALUsrc = 1'b1 ;
memwr = 1'b1 ;
resultsrc = 2'bxx ;
branch = 1'b0 ;
ALUop = 2'b00 ;
jump = 1'b0 ;
jalr = 1'b0 ; end
7'b0110011 : //R-type instruction
begin
regwr = 1'b1 ;
immsrc = 2'bxx ;
ALUsrc = 1'b0 ;
memwr = 1'b0 ;
resultsrc = 2'b00 ;
branch = 1'b0 ;
ALUop = 2'b10 ;
jump = 1'b0 ;
jalr = 1'b0 ; end
7'b1100011 : //beq instruction
begin
regwr = 1'b0 ;
immsrc = 2'b10 ;
ALUsrc = 1'b0 ;
memwr = 1'b0 ;
resultsrc = 2'bxx ;
branch = 1'b1 ;
ALUop = 2'b01 ;
jump = 1'b0 ;
jalr = 1'b0 ; end

7'b0010011 : //I-type instruction (except jalr)


begin
regwr = 1'b1 ;
immsrc = 2'b00 ;
ALUsrc = 1'b1 ;
memwr = 1'b0 ;
resultsrc = 2'b00 ;
branch = 1'b0 ;
ALUop = 2'b10 ;
jump = 1'b0 ;
jalr = 1'b0 ; end
7'b1101111 : //jal instruction
begin
regwr = 1'b1 ;
immsrc = 2'b11 ;
ALUsrc = 1'bx ;
memwr = 1'b0 ;
resultsrc = 2'b10 ;
branch = 1'b0 ;
ALUop = 2'bxx ;
jump = 1'b1 ;
jalr = 1'b0 ; end
7'b1100111 : //jalr instruction
begin
regwr = 1'b1 ;
immsrc = 2'b00 ;
ALUsrc = 1'b1 ;
memwr = 1'b0 ;
resultsrc = 2'b10 ;
branch = 1'b0 ;
ALUop = 2'b00 ;
jump = 1'b0 ;
jalr = 1'b1 ; end

default :
begin
regwr = 1'bx ;
immsrc = 2'bxx ;
ALUsrc = 1'bx ;
memwr = 1'bx ;
resultsrc = 2'bxx ;
branch = 1'bx ;
ALUop = 2'bxx ;
jump = 1'bx ;
jalr = 1'bx ; end
endcase
end

endmodule
//////////////////////////////////////////////////////////////////////
///////////////////////

module Alu_decoder ( input wire [1:0] ALUop, input wire [2:0] funct3,
input wire funct7_5,
input wire op_5, output reg [1:0] ALUctrl
);

always@(*) begin
case(ALUop)
2'b00 : ALUctrl = 2'b00 ; //adding for lw,sw,jalr
2'b01 : ALUctrl = 2'b01 ; //subtracting for beq,bne
2'b10 : //R,I-type instructions
begin
case(funct3)
3'b000 :
begin
if({op_5,funct7_5} == 2'b11) begin
ALUctrl = 2'b01 ; //subtraction for sub
end else
begin
ALUctrl = 2'b00 ; //adding for add,addi
end
end
3'b111 : ALUctrl = 2'b10 ;//anding for and,andi
3'b110 : ALUctrl = 2'b11 ;//oring for or,ori
default : ALUctrl = 2'bxx ;
endcase

end
default : ALUctrl = 2'bxx ; endcase
end
endmodule
//ALU_CTRL
//////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////

module datapath (
//global inputs
input wire clk, input wire rst,
//instr memory inputs
input wire [31:0] instrF,
//data memory inputs
input wire [31:0] read_dataM,
//CU inputs
input wire [1:0] immsrcD,
input wire ALUsrcD,
input wire [1:0] ALUctrlD,
input wire [1:0] resultsrcD,
input wire regwrD,
input wire jumpD,
input wire jalrD,
input wire branchD,
input wire memwrD,
//hazard unit inputs
input wire [1:0] forwardAE,
input wire [1:0] forwardBE,
input wire stallF,
input wire stallD,
input wire flushE,
input wire flushD,
//CU outputs
output wire [31:0] instrD,
//hazard unit outputs
output wire [4:0] rs1E,
output wire [4:0] rs2E,
output wire [4:0] rdM,
output wire [4:0] rdW,
output wire regwrM,
output wire regwrW,
output wire [4:0] rs1D,
output wire [4:0] rs2D,
output wire [4:0] rdE,
output wire resultsrcE0,
output wire PCsrcE0,
//instr memory outputs
output wire [31:0] PCF,
//data memory outputs
output wire [31:0] ALUoutM,
output wire [31:0] write_dataM,
output wire memwrM
);

wire [31:0] resultW ;


wire [31:0] SrcA , SrcB ;
wire [31:0] immextD , immextE ;

wire [31:0] instrE ;


wire jalrE ,branchE , jumpE,regwrE , memwrE , ALUsrcE ;
wire [31:0] rd1D, rd2D ,rd1E, rd2E ;
wire [31:0] ALUoutE , ALUoutW;
wire [31:0] write_dataE ;
wire [1:0] PCsrcE , ALUctrlE;
wire [1:0] resultsrcE , resultsrcM , resultsrcW ;
wire [31:0] read_dataW ;
wire [4:0] rdD ;
wire [31:0] PCnext , PCplus4F , PCplus4D , PCplus4E , PCplus4M ,
PCplus4W, PCtargetE , PCD, PCE ;
wire zero ;
wire zero_new ;
assign zero_new = (instrE[12] && instrE[6:0] == 7'b1100011)? !zero :
zero ;
assign PCsrcE = {jalrE, ((zero_new & branchE) | jumpE) } ;
assign resultsrcE0 = resultsrcE[0] ;
assign PCsrcE0 = PCsrcE[0] ;
assign rs1D = instrD[19:15] ;
assign rs2D = instrD[24:20] ;
assign rdD = instrD[11:7] ;

////////////////////////////
//flip flops between fetch and decode

generate

for (genvar i = 0; i < 32; i = i + 1) begin : INST_LOOP1

ddre u_ff1 (
.clk(clk),
.rst_p(flushD),
.en(~stallD),
.din(instrF[i]),
.q(instrD[i])
);
end
endgenerate

generate

for (genvar i1 = 0; i1 < 32; i1 = i1 + 1) begin : INST_LOOP2

ddre u_ff2 (
.clk(clk),
.rst_p(flushD),
.en(~stallD),
.din(PCF[i1]),
.q(PCD[i1])
);
end
endgenerate

generate

for (genvar i2 = 0; i2 < 32; i2 = i2 + 1) begin : INST_LOOP3

ddre u_ff3 (
.clk(clk),
.rst_p(flushD),
.en(~stallD),
.din(PCplus4F[i2]),
.q(PCplus4D[i2])
);
end
endgenerate
//////////////////////////////
//flip flops between decode and execute
ddr u_ff4 (
.clk(clk),
.rst_p(flushE),
.din(regwrD),
.q(regwrE)
);

ddr u_ff5(
.clk(clk),
.rst_p(flushE),
.din(resultsrcD[1]),
.q(resultsrcE[1])
);
ddr u_ff51(
.clk(clk),
.rst_p(flushE),
.din(resultsrcD[0]),
.q(resultsrcE[0])
);

ddr u_ff6 (
.clk(clk),
.rst_p(flushE),
.din(memwrD),
.q(memwrE)
);

ddr u_ff7(
.clk(clk),
.rst_p(flushE),
.din(jumpD),
.q(jumpE)
);

ddr u_ff8(
.clk(clk),
.rst_p(flushE),
.din(jalrD),
.q(jalrE)
);

ddr u_ff9(
.clk(clk),
.rst_p(flushE),
.din(branchD),
.q(branchE)
);
generate
for (genvar i2 = 0; i2 < 2; i2 = i2 + 1) begin : INST_LOOP333

ddr u_ff10 (
.clk(clk),
.rst_p(flushE),

.din(ALUctrlD[i2]),
.q(ALUctrlE[i2])
);
end
endgenerate

ddr u_ff11(
.clk(clk),
.rst_p(flushE),
.din(ALUsrcD),
.q(ALUsrcE)
);

generate

for (genvar j = 0; j < 32; j = j + 1) begin : INST_LOOP4

ddr u_ff12 (
.clk(clk),
.rst_p(flushE),

.din(rd1D[j]),
.q(rd1E[j])
);
end
endgenerate

generate

for (genvar j1 = 0; j1 < 32; j1 = j1 + 1) begin : INST_LOOP5

ddr u_ff13 (
.clk(clk),
.rst_p(flushE),

.din(rd2D[j1]),
.q(rd2E[j1])
);
end
endgenerate

generate

for (genvar j2 = 0; j2 < 32; j2 = j2 + 1) begin : INST_LOOP6

ddr u_ff14 (
.clk(clk),
.rst_p(flushE),

.din(PCD[j2]),
.q(PCE[j2])
);
end
endgenerate

generate

for (genvar j31 = 0; j31 < 5; j31 = j31 + 1) begin : INST_LOOP11

ddr u_ff15 (
.clk(clk),
.rst_p(flushE),

.din(rs1D[j31]),
.q(rs1E[j31])
);
end
endgenerate

generate

for (genvar j21 = 0; j21 < 5; j21 = j21 + 1) begin : INST_LOOP12

ddr u_ff16 (
.clk(clk),
.rst_p(flushE),

.din(rs2D[j21]),
.q(rs2E[j21])
);
end
endgenerate

generate

for (genvar j11 = 0; j11 < 5; j11 = j11 + 1) begin : INST_LOOP13

ddr u_ff17 (
.clk(clk),
.rst_p(flushE),

.din(rdD[j11]),
.q(rdE[j11])
);
end
endgenerate

generate
for (genvar j3 = 0; j3 < 32; j3 = j3 + 1) begin : INST_LOOP7

ddr u_ff18 (
.clk(clk),
.rst_p(flushE),

.din(immextD[j3]),
.q(immextE[j3])
);
end
endgenerate

generate

for (genvar j4 = 0; j4 < 32; j4 = j4 + 1) begin : INST_LOOP8

ddr u_ff19 (
.clk(clk),
.rst_p(flushE),

.din(instrD[j4]),
.q(instrE[j4])
);
end
endgenerate

generate

for (genvar j5 = 0; j5 < 32; j5 = j5 + 1) begin : INST_LOOP9

ddr u_ff20 (
.clk(clk),
.rst_p(flushE),

.din(PCplus4D[j5]),
.q(PCplus4E[j5])
);
end
endgenerate
////////////////////////////////////////////
//flip flops between execute and memory
ddfr u_ff21(
.clk(clk),
.din(regwrE),
.q(regwrM)
);

ddfr u_ff221(
.clk(clk),
.din(resultsrcE[1]),
.q(resultsrcM[1])
);
ddfr u_ff222(
.clk(clk),
.din(resultsrcE[0]),
.q(resultsrcM[0])
);

ddfr u_ff23(
.clk(clk),
.din(memwrE),
.q(memwrM)
);

generate

for (genvar j32 = 0; j32 < 32; j32 = j32 + 1) begin : INST_LOOP20

ddfr u_ff24 (
.clk(clk),

.din(ALUoutE[j32]),
.q(ALUoutM[j32])
);
end
endgenerate
generate

for (genvar j33 = 0; j33 < 32; j33 = j33 + 1) begin : INST_LOOP21

ddfr u_ff25 (
.clk(clk),

.din(write_dataE[j33]),
.q(write_dataM[j33])
);
end
endgenerate

generate

for (genvar j17 = 0; j17 < 5; j17 = j17 + 1) begin : INST_LOOP23

ddfr u_ff26 (
.clk(clk),

.din(rdE[j17]),
.q(rdM[j17])
);
end
endgenerate

generate

for (genvar j34 = 0; j34 < 32; j34 = j34 + 1) begin : INST_LOOP22

ddfr u_ff27 (
.clk(clk),

.din(PCplus4E[j34]),
.q(PCplus4M[j34])
);
end
endgenerate
//////////////////////////////////////////
//flip flops between memory and writeback
ddfr u_ff28(
.clk(clk),
.din(regwrM),
.q(regwrW)
);

ddfr u_ff291(
.clk(clk),
.din(resultsrcM[1]),
.q(resultsrcW[1])
);

ddfr u_ff292(
.clk(clk),
.din(resultsrcM[0]),
.q(resultsrcW[0])
);

generate
for (genvar k = 0; k < 32; k = k + 1) begin : INS_LOOP22

ddfr u_ff30 (
.clk(clk),

.din(ALUoutM[k]),
.q(ALUoutW[k])
);
end
endgenerate

generate
for (genvar k4 = 0; k4 < 32; k4 = k4+ 1) begin : INS_LOOP1

ddfr u_ff31 (
.clk(clk),
.din(read_dataM[k4]),
.q(read_dataW[k4])
);
end
endgenerate
generate
for (genvar k3 = 0; k3 < 5; k3 = k3 + 1) begin : INST_LOOP40

ddfr u_ff32 (
.clk(clk),
.din(rdM[k3]),
.q(rdW[k3])
);
end
endgenerate

generate
for (genvar k1 = 0; k1 < 32; k1 = k1 + 1) begin : INS_LOOP2

ddfr u_ff33 (
.clk(clk),
.din(PCplus4M[k1]),
.q(PCplus4W[k1])
);
end
endgenerate

generate
for (genvar k2 = 0; k2 < 32; k2 = k2 + 1) begin : INS_LOOP3

ddre u_ff (
.clk(clk),
.rst_p(rst),
.en(~stallF),
.din(PCnext[k2]),
.q(PCF[k2])
);
end
endgenerate
/////////////////////////////////////
mux3x1 u_pcmux (
.sel(PCsrcE) ,
.in0(PCplus4F) ,
.in1(PCtargetE) ,
.in2( {ALUoutE[31:1],1'b0} ),
.out(PCnext)
);

Reg_file u_regf (
.clk(clk),
.Addr1(rs1D),
.Addr2(rs2D),
.Addr3(rdW),
.wd3(resultW),
.we3(regwrW),
.rd1(rd1D),
.rd2(rd2D)
);

Sign_ext u_signext(
.in(instrD[31:7]),
.opcode(immsrcD),
.out(immextD)
);

mux3x1 u_forwardAEmux (
.sel(forwardAE) ,
.in0(rd1E) ,
.in1(resultW) ,
.in2(ALUoutM),
.out(SrcA)
);

mux3x1 u_forwardBEmux (
.sel(forwardBE) ,
.in0(rd2E) ,
.in1(resultW) ,
.in2(ALUoutM),
.out(write_dataE)
);

mux2x1 u_alumux (
.sel(ALUsrcE) ,
.in0(write_dataE) ,
.in1(immextE) ,
.out(SrcB)
);

adder u_adderplus4 (
.in1(PCF),
.in2(32'd4),
.out(PCplus4F)
);

adder u_addertarget (
.in1(PCE),
.in2(immextE),
.out(PCtargetE)
);

Alu u_ALU (
.ALUctrl(ALUctrlE) ,
.A(SrcA) ,
.B(SrcB) ,
.ALUout(ALUoutE) ,
.zero(zero)
);
mux3x1 u_resultmux (
.sel(resultsrcW) ,
.in0(ALUoutW) ,
.in1(read_dataW) ,
.in2(PCplus4W),
.out(resultW)
);

endmodule /// Datapath

//////////////////////////////////////////////////////////////////////
//////////////

module control_unit( //control unit


//instr memory inputs
input wire [6:0] opD, input wire [2:0] funct3D,
input wire funct7_5D,
//datapath outputs
output wire [1:0] immsrcD,
output wire ALUsrcD,
output wire [1:0] ALUctrlD,
output wire [1:0] resultsrcD,
output wire regwrD,
output wire jumpD,
output wire jalrD,
output wire branchD,
//data memory output
output wire memwrD
);

wire [1:0] ALUopD ;

main_decoder u_md (
.op(opD),
.jump(jumpD),
.jalr(jalrD),
.branch(branchD),
.immsrc(immsrcD),
.ALUsrc(ALUsrcD),
.ALUop(ALUopD), //
.resultsrc(resultsrcD),
.regwr(regwrD),
.memwr(memwrD)
);

Alu_decoder u_ad (
.ALUop(ALUopD),
.funct3(funct3D),
.funct7_5(funct7_5D),
.op_5(opD[5]),
.ALUctrl(ALUctrlD)
);
endmodule //control unit

//////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////

module hazard_unit (
//Hazard
//forwarding inputs
input wire rst,
input wire [4:0] rs1E,
input wire [4:0] rs2E,
input wire [4:0] rdM,
input wire [4:0] rdW,
input wire regwrM,
input wire regwrW,
//stalling inputs
input wire [4:0] rs1D,
input wire [4:0] rs2D,
input wire [4:0] rdE,
input wire resultsrcE0,
//flushing inputs
input wire PCsrcE0,

//forwarding outputs
output reg [1:0] forwardAE,
output reg [1:0] forwardBE,
//stalling outputs
output reg stallF ,
output reg stallD,
output reg flushE,
//flushing outputs
output reg flushD
);

always@(*) begin
if( (rs1E == rdM) && regwrM && (rs1E != 0) ) begin
forwardAE = 2'b10 ; end
else if( (rs1E == rdW) && regwrW && (rs1E != 0) ) begin
forwardAE = 2'b01 ; end
else
begin
forwardAE = 2'b00 ; end

if( (rs2E == rdM) && regwrM &&( rs2E != 0) ) begin


forwardBE = 2'b10 ; end
else if( (rs2E == rdW) && regwrW && (rs2E != 0) ) begin
forwardBE = 2'b01 ; end
else
begin
forwardBE = 2'b00 ; end
end

always@(*) begin
if(rst) begin
stallF = 1'b0 ;
stallD = 1'b0 ; end
else if(( (rdE == rs1D) || (rdE == rs2D) ) && resultsrcE0 ) begin
stallF = 1'b1 ;
stallD = 1'b1 ; end
else
begin
stallF = 1'b0 ;
stallD = 1'b0 ; end
end

always@(*) begin
if(rst) begin
flushD = 1'b0 ; end
else if(PCsrcE0) begin
flushD = 1'b1 ; end
else
begin
flushD = 1'b0 ; end

if(rst) begin
flushE = 1'b0 ; end
else if((( (rdE == rs1D) || (rdE == rs2D) ) && resultsrcE0 ) ||
PCsrcE0) begin
flushE = 1'b1 ; end
else
begin
flushE = 1'b0 ; end
end
endmodule //Hazard Unit

//////////////////////////////////////////////////////////////////////
///////////////////////////

module instr_rom ( input wire [9:0] addr, //inst_rom
output wire [31:0] read_data
);

reg [31:0] mem [0:63] ; //n-2 for word addressable memory

initial begin
$readmemh("testcasess.txt", mem); end

assign read_data = mem[addr[9:2]] ; //[n-1:2] for word addressable memory

endmodule
//////////////////////////////////////////////////////////////////////
///////////////////////////

module ffp(input din,clk,rst_p, output reg q);


always @(posedge clk or posedge rst_p)
begin
if(rst_p==1)

q<=0;
// qb<=1;

else

q<=din;
// qb<=~din;

end
endmodule

//////////////////////////////////////////////////////////////////////
//
module ffn(input din,clk,rst_p, output reg q);
always @(negedge clk or posedge rst_p)
begin
if(rst_p==1)

q<=0;
// qb<=1;

else

q<=din;
// qb<=~din;

end
endmodule
///////////////////////////////////////////////////////////////////////
module mux2_1(input [1:0]in,input sel,output out);
assign out=(sel)?in[1]:in[0];
endmodule

module ddr(din,rst_p,clk,q);
wire qn,qp;
wire qbn,qbp;
input din,rst_p,clk;
output q;
// assign qb=~q;
ffn f1(din,clk,rst_p,qn);
ffp f2(din,clk,rst_p,qp);
mux2_1 m1({qp,qn},clk,q);
endmodule
/////////////////////////////////////////////////////////////////////

module ffpe(input din,en,clk,rst_p, output reg q);


always @(posedge clk or posedge rst_p)
begin
if(rst_p==1)

q<=0;
// qb<=1;

else
begin
if(en)
q<=din;
// qb<=~din;end
end
end
endmodule

//////////////////////////////////////////////////////////////////////
//
module ffne(input din,en,clk,rst_p, output reg q);
always @(negedge clk or posedge rst_p)
begin
if(rst_p==1)

q<=0;
// qb<=1;

else
begin
if(en)
q<=din;
// qb<=~din;end
end
end
endmodule
///////////////////////////////////////////////////////////////////////
module mux2_1e(input [1:0]in,input sel,output out);
assign out=(sel)?in[1]:in[0];
endmodule
/////////////////////////////////////////////////////
module ddre(din,en,rst_p,clk,q);
wire qn,qp;
wire qbn,qbp;
input din,rst_p,clk,en;
output q;
//assign qb=~q;
ffne f1(din,en,clk,rst_p,qn);
ffpe f2(din,en,clk,rst_p,qp);
mux2_1e m1({qp,qn},clk,q);
endmodule
///////////////////////////////////////////////////////

module fifp(input din,clk, output reg q);


always @(posedge clk )
begin
q<=din;
// qb<=~din;

end
endmodule

//////////////////////////////////////////////////////////////////////
//
module fifn(input din,clk, output reg q);
always @(negedge clk )
begin
q<=din;
// qb<=~din;

end
endmodule
///////////////////////////////////////////////////////////////////////
module muxf2_1(input [1:0]in,input sel,output out);
assign out=(sel)?in[1]:in[0];
endmodule

module ddfr(din,clk,q);
wire qn,qp;
wire qbn,qbp;
input din,clk;
output q;
// assign qb=~q;
fifn f1(din,clk,qn);
fifp f2(din,clk,qp);
muxf2_1 m1({qp,qn},clk,q);
endmodule
/////////////////////////////////////////////////////////
module data_ram ( input wire clk,
input wire rst,
input wire we,
input wire [9:0] addr,
input wire [31:0]write_data , output wire [31:0] read_data
);

reg [31:0] mem [0:80] ; integer i,j ;

always@(posedge clk,posedge rst) begin


if(rst) begin
for(i=0 ; i < 81 ; i = i+1) begin
mem[i] <= 'h0 ; end
end
else if(we) begin
mem[addr[9:2]] <= write_data ; end
end

always@(negedge clk,posedge rst) begin


if(rst) begin
for(j=0 ; j < 81 ; j = j+1) begin
mem[j] <= 'h0 ; end
end
else if(we) begin
mem[addr[9:2]] <= write_data ; end
end
assign read_data = mem[addr[9:2]] ;
endmodule

TESTBENCH MODULE OF PROPOSED RISC V DATAPATH


module top_RISCV_tb();
reg clk ;
reg rst ;
wire [31:0] instrF ;
wire [9:0] addr;
wire [31:0] write_dataM;
wire memwrM ;
wire [31:0] read_dataM;
wire [31:0] PCF;
wire [31:0] instrD;

//instantiation
riscc u_top (
.clk(clk),
.rst(rst),
.instrF(instrF),
.addr(addr),
.write_dataM(write_dataM),
.memwrM(memwrM),
.read_dataM(read_dataM),
.PCF(PCF),
.instrD(instrD)
);

instr_rom u_ins_rom (
.addr(PCF[9:0]),
.read_data(instrF)
);

data_ram u_data_ram (
.clk(clk),
.rst(rst),
.we(memwrM),
.addr(addr),
.write_data(write_dataM),
.read_data(read_dataM)
);

initial begin
clk = 0 ;
forever #250 clk = ~clk ; //clk with period 500ps
end

initial begin
rst = 1'b1 ;
#500;
rst = 1'b0 ; end

always@(negedge clk) begin


if(memwrM) begin
if((write_dataM == 2 )&& (addr == 96)) begin
$display("time = %0t , write_dataM = %4d , addr = %8d ,testcase1 passed (first sw)", $time , write_dataM , addr) ;
end
else if((write_dataM == 4) && (addr == 92)) begin
$display("time = %0t , write_dataM = %4d , addr = %8d ,testcase2 passed (second sw)", $time , write_dataM , addr) ;
#500;
$stop ;
end
else
begin
$display("time = %0t , write_dataM = %4d , addr = %8d ,testcase1,2 failed", $time , write_dataM , addr) ;

$stop ;
end
end
end

endmodule

TESTCASES
00500113 02728463 FF718393

00020463 00020463 02728463

00000293 008001EF 0471AA23

005203B3 00100113

402383B3 00910133

0471AA23 0221A023
