Family Manual: 16-Bit Digital Signal Controllers
DSP56800FM
Rev. 3.1
11/2005
freescale.com
Contents
Chapter 1
Introduction
1.1 DSP56800 Family Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.1.1 Core Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
1.1.2 Peripheral Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
1.1.3 Family Members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
1.2 Introduction to Digital Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
1.3 Summary of Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9
1.4 For the Latest Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-10
Chapter 2
Core Architecture Overview
2.1 Core Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
2.1.1 Data Arithmetic Logic Unit (ALU) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
2.1.2 Address Generation Unit (AGU) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
2.1.3 Program Controller and Hardware Looping Unit . . . . . . . . . . . . . . . . . . . . . . . 2-4
2.1.4 Bus and Bit-Manipulation Unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.1.5 On-Chip Emulation (OnCE) Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.1.6 Address Buses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.1.7 Data Buses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.2 Memory Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
2.3 Blocks Outside the DSP56800 Core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.3.1 External Data Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.3.2 Program Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.3.3 Bootstrap Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.3.4 IP-BUS Bridge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.3.5 Phase Lock Loop (PLL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.4 DSP56800 Core Programming Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
Chapter 3
Data Arithmetic Logic Unit
3.1 Overview and Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
3.1.1 Data ALU Input Registers (X0, Y1, and Y0) . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
3.1.2 Data ALU Accumulator Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
3.1.3 Multiply-Accumulator (MAC) and Logic Unit . . . . . . . . . . . . . . . . . . . . . . . . 3-5
3.1.4 Barrel Shifter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
3.1.5 Accumulator Shifter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
3.1.6 Data Limiter and MAC Output Limiter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
Chapter 4
Address Generation Unit
4.1 Architecture and Programming Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.1.1 Address Registers (R0-R3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
4.1.2 Stack Pointer Register (SP). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
4.1.3 Offset Register (N) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
4.1.4 Modifier Register (M01). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
4.1.5 Modulo Arithmetic Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
4.1.6 Incrementer/Decrementer Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
4.2 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
4.2.1 Register-Direct Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
4.2.1.1 Data or Control Register Direct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
4.2.1.2 Address Register Direct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
4.2.2 Address-Register-Indirect Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
4.2.2.1 No Update: (Rj), (SP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
4.2.2.2 Post-Increment by 1: (Rj)+, (SP)+ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
4.2.2.3 Post-Decrement by 1: (Rn)-, (SP)- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12
4.2.2.4 Post-Update by Offset N: (Rj)+N, (SP)+N. . . . . . . . . . . . . . . . . . . . . . . . 4-13
4.2.2.5 Index by Offset N: (Rj+N), (SP+N) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-14
4.2.2.6 Index by Short Displacement: (SP-xx), (R2+xx) . . . . . . . . . . . . . . . . . . . 4-15
4.2.2.7 Index by Long Displacement: (Rj+xxxx), (SP+xxxx) . . . . . . . . . . . . . . . 4-16
4.2.3 Immediate Data Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-17
4.2.3.1 Immediate Data: #xxxx. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
4.2.3.2 Immediate Short Data: #xx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-20
4.2.4 Absolute Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-20
4.2.4.1 Absolute Address (Extended Addressing): xxxx . . . . . . . . . . . . . . . . . . . 4-21
4.2.4.2 Absolute Short Address (Direct Addressing): <aa> . . . . . . . . . . . . . . . . . 4-22
4.2.4.3 I/O Short Address (Direct Addressing): <pp> . . . . . . . . . . . . . . . . . . . . . 4-23
4.2.5 Implicit Reference. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-23
4.2.6 Addressing Modes Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-23
4.3 AGU Address Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-25
4.3.1 Linear Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-25
4.3.2 Modulo Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-25
4.3.2.1 Modulo Arithmetic Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-25
4.3.2.2 Configuring Modulo Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-27
4.3.2.3 Supported Memory Access Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . 4-29
4.3.2.4 Simple Circular Buffer Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-29
4.3.2.5 Setting Up a Modulo Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-30
4.3.2.6 Wrapping to a Different Bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-31
4.3.2.7 Side Effects of Modulo Arithmetic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-32
4.3.2.7.1 When a Pointer Lies Outside a Modulo Buffer . . . . . . . . . . . . . . . . . 4-32
4.3.2.7.2 Restrictions on the Offset Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-32
4.3.2.7.3 Memory Locations Not Available for Modulo Buffers . . . . . . . . . . . 4-33
4.4 Pipeline Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-33
Freescale Semiconductor v
Chapter 5
Program Controller
5.1 Architecture and Programming Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
5.1.1 Program Counter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
5.1.2 Instruction Latch and Instruction Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
5.1.3 Interrupt Control Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
5.1.4 Looping Control Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
5.1.5 Loop Counter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
5.1.6 Loop Address . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5
5.1.7 Hardware Stack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
5.1.8 Status Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
5.1.8.1 Carry (C) — Bit 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
5.1.8.2 Overflow (V) — Bit 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
5.1.8.3 Zero (Z) — Bit 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
5.1.8.4 Negative (N) — Bit 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
5.1.8.5 Unnormalized (U) — Bit 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
5.1.8.6 Extension (E) — Bit 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
5.1.8.7 Limit (L) — Bit 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
5.1.8.8 Size (SZ) — Bit 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
5.1.8.9 Interrupt Mask (I1 and I0) — Bits 8–9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
5.1.8.10 Reserved SR Bits — Bits 10–14. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
5.1.8.11 Loop Flag (LF) — Bit 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
5.1.9 Operating Mode Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-10
5.1.9.1 Operating Mode Bits (MB and MA) — Bits 1–0 . . . . . . . . . . . . . . . . . . . 5-10
5.1.9.2 External X Memory Bit (EX) — Bit 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
5.1.9.3 Saturation (SA) — Bit 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
5.1.9.4 Rounding Bit (R) — Bit 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-12
5.1.9.5 Stop Delay Bit (SD) — Bit 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-12
5.1.9.6 Condition Code Bit (CC) — Bit 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-12
5.1.9.7 Nested Looping Bit (NL) — Bit 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13
5.1.9.8 Reserved OMR Bits — Bits 2, 7 and 9–14. . . . . . . . . . . . . . . . . . . . . . . . 5-13
5.2 Software Stack Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13
5.3 Program Looping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-14
5.3.1 Repeat (REP) Looping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-14
5.3.2 DO Looping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
5.3.3 Nested Hardware DO and REP Looping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
5.3.4 Terminating a DO Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16
Chapter 6
Instruction Set Introduction
6.1 Introduction to Moves and Parallel Moves. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
6.2 Instruction Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
6.3 Programming Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.4 Instruction Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6.4.1 Arithmetic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6.4.2 Logical Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7
Chapter 7
Interrupts and the Processing States
7.1 Reset Processing State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
7.2 Normal Processing State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
7.2.1 Instruction Pipeline Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
7.2.2 Instruction Pipeline with Off-Chip Memory Accesses. . . . . . . . . . . . . . . . . . . 7-3
7.2.3 Instruction Pipeline Dependencies and Interlocks . . . . . . . . . . . . . . . . . . . . . . 7-4
7.3 Exception Processing State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5
7.3.1 Sequence of Events in the Exception Processing State . . . . . . . . . . . . . . . . . . 7-5
7.3.2 Reset and Interrupt Vector Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-7
7.3.3 Interrupt Priority Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
7.3.4 Configuring Interrupt Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
7.3.5 Interrupt Sources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
7.3.5.1 External Hardware Interrupt Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-10
7.3.5.2 DSC Core Hardware Interrupt Sources . . . . . . . . . . . . . . . . . . . . . . . . . . 7-11
7.3.5.3 DSC Core Software Interrupt Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-11
7.3.6 Interrupt Arbitration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
7.3.7 The Interrupt Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-14
7.3.8 Interrupt Latency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-16
7.4 Wait Processing State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17
7.5 Stop Processing State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-19
7.6 Debug Processing State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-22
Chapter 8
Software Techniques
8.1 Useful Instruction Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1
8.1.1 Jumps and Branches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
Chapter 9
JTAG and On-Chip Emulation (OnCE™)
9.1 Combined JTAG and OnCE Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1
9.2 JTAG Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.2.1 JTAG Capabilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
9.2.2 JTAG Port Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
9.3 OnCE Port. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4
9.3.1 OnCE Port Capabilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
9.3.2 OnCE Port Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
9.3.2.1 Command, Status, and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7
9.3.2.2 Breakpoint and Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7
9.3.2.3 Pipeline Save and Restore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7
9.3.2.4 FIFO History Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7
Appendix A
Instruction Set Details
A.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1
A.2 Programming Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5
A.3 Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-6
A.4 Condition Code Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-6
A.4.1 The Condition Code Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-7
A.4.1.1 Size (SZ) — Bit 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-7
A.4.1.2 Limit (L) — Bit 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-8
A.4.1.3 Extension in Use (E) — Bit 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-8
A.4.1.4 Unnormalized (U) — Bit 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-9
A.4.1.5 Negative (N) — Bit 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-9
A.4.1.6 Zero (Z) — Bit 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
A.4.1.7 Overflow (V) — Bit 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
A.4.1.8 Carry (C) — Bit 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
A.4.2 Effects of the Operating Mode Register’s SA Bit . . . . . . . . . . . . . . . . . . . . A-11
A.4.3 Effects of the OMR’s CC Bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-11
A.4.4 Condition Code Summary by Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . A-12
A.5 Instruction Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-16
A.6 Instruction Set Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-26
A.7 Instruction Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-27
Appendix B
DSC Benchmarks
B.1 Benchmark Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
B.1.1 Real Correlation or Convolution (FIR Filter) . . . . . . . . . . . . . . . . . . . . . . . . . B-5
B.1.2 N Complex Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5
B.1.3 Complex Correlation Or Convolution (Complex FIR). . . . . . . . . . . . . . . . . . B-6
B.1.4 Nth Order Power Series (Real, Fractional Data) . . . . . . . . . . . . . . . . . . . . . . B-7
B.1.5 N Cascaded Real Biquad IIR Filters (Direct Form II) . . . . . . . . . . . . . . . . . . B-8
B.1.6 N Radix 2 FFT Butterflies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-10
B.1.7 LMS Adaptive Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-12
B.1.7.1 Single Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-14
B.1.7.2 Double Precision. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-16
B.1.7.3 Double Precision Delayed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-18
B.1.8 Vector Multiply-Accumulate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-20
B.1.9 Energy in a Signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-21
B.1.10 [3x3][3x1] Matrix Multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-22
B.1.11 [NxN][NxN] Matrix Multiply (for fractional elements). . . . . . . . . . . . . . . . B-23
B.1.12 N Point 3x3 2-D FIR Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-26
B.1.13 Sine-Wave Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-28
B.1.13.1 Double Integration Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-28
B.1.13.2 Second Order Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-29
B.1.14 Array Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-30
B.1.14.1 Index of the Highest Signed Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-30
B.1.14.2 Index of the Highest Positive Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-30
B.1.15 Proportional Integrator Differentiator (PID) Algorithm . . . . . . . . . . . . . . . . B-31
B.1.15.1 PID (Version 1). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-31
B.1.15.2 PID (Version 2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-32
B.1.16 Autocorrelation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-33
Table 6-16 Data ALU Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15
Table 6-17 Immediate Value Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15
Table 6-18 Move Word Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-18
Table 6-19 Immediate Move Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-19
Table 6-20 Register-to-Register Move Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-19
Table 6-21 Move Word Instructions — Program Memory. . . . . . . . . . . . . . . . . . . . . . . . . 6-19
Table 6-22 Conditional Register Transfer Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-20
Table 6-23 Data ALU Multiply Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-20
Table 6-24 Data ALU Extended Precision Multiplication Instructions . . . . . . . . . . . . . . . 6-21
Table 6-25 Data ALU Arithmetic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21
Table 6-26 Data ALU Miscellaneous Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23
Table 6-27 Data ALU Logical Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23
Table 6-28 Data ALU Shifting Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-24
Table 6-29 AGU Arithmetic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-25
Table 6-30 Bit-Manipulation Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-25
Table 6-31 Branch on Bit-Manipulation Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-26
Table 6-32 Change of Flow Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-27
Table 6-33 Looping Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-27
Table 6-34 Control Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-28
Table 6-35 Data ALU Instructions — Single Parallel Move . . . . . . . . . . . . . . . . . . . . . . . 6-29
Table 6-36 Data ALU Instructions — Dual Parallel Read . . . . . . . . . . . . . . . . . . . . . . . . . 6-30
Table 7-1 Processing States. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
Table 7-2 Instruction Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3
Table 7-3 Additional Cycles for Off-Chip Memory Accesses . . . . . . . . . . . . . . . . . . . . . . 7-4
Table 7-4 DSP56800 Core Reset and Interrupt Vector Table. . . . . . . . . . . . . . . . . . . . . . . 7-7
Table 7-5 Interrupt Priority Level Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
Table 7-6 Interrupt Mask Bit Definition in the Status Register . . . . . . . . . . . . . . . . . . . . . 7-8
Table 7-7 Fixed Priority Structure Within an IPL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-13
Table 8-1 Operations Synthesized Using DSP56800 Instructions . . . . . . . . . . . . . . . . . . . 8-1
Table A-1 Register Fields for General-Purpose Writes and Reads . . . . . . . . . . . . . . . . . . . . . . . . . A-1
Table A-2 Address Generation Unit (AGU) Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Table A-3 Data ALU Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Table A-4 Address Operands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3
Table A-5 Addressing Mode Operators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3
Table A-6 Miscellaneous Operands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3
Table A-7 Other Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-4
Table A-8 Notation Used for the Condition Code Summary Table . . . . . . . . . . . . . . . . . . . . . . . A-12
Table A-9 Condition Code Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-13
Table A-10 Instruction Timing Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-17
Figure 4-9 Address Register Indirect: Indexed by Long Displacement . . . . . . . . . . . . . . . 4-16
Figure 4-10 Special Addressing: Immediate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
Figure 4-11 Special Addressing: Immediate Short Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-19
Figure 4-12 Special Addressing: Absolute Address. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-21
Figure 4-13 Special Addressing: Absolute Short Address . . . . . . . . . . . . . . . . . . . . . . . . . . 4-22
Figure 4-14 Special Addressing: I/O Short Address . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-23
Figure 4-15 Circular Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-26
Figure 4-16 Circular Buffer with Size M=37 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-27
Figure 4-17 Simple Five-Location Circular Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-29
Figure 4-18 Linear Addressing with a Modulo Modifier . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-32
Figure 5-1 Program Controller Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
Figure 5-2 Program Controller Programming Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
Figure 5-3 Accessing the Loop Count Register (LC). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5
Figure 5-4 Status Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
Figure 5-5 Operating Mode Register (OMR) Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-10
Figure 6-1 Single Parallel Move. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
Figure 6-2 Dual Parallel Move . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
Figure 6-3 DSP56800 Core Programming Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
Figure 6-4 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-31
Figure 7-1 Interrupt Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6
Figure 7-2 Example Interrupt Priority Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
Figure 7-3 Example On-Chip Peripheral and IRQ Interrupt Programming . . . . . . . . . . . . . 7-9
Figure 7-4 Illegal Instruction Interrupt Servicing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
Figure 7-5 Interrupt Service Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-15
Figure 7-6 Repeated Illegal Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-16
Figure 7-7 Interrupting a REP Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17
Figure 7-8 Wait Instruction Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18
Figure 7-9 Simultaneous Wait Instruction and Interrupt . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18
Figure 7-10 STOP Instruction Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-19
Figure 7-11 STOP Instruction Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20
Figure 7-12 STOP Instruction Sequence Recovering with RESET . . . . . . . . . . . . . . . . . . . 7-21
Figure 8-1 Example of a DSP56800 Stack Frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-29
Figure 9-1 JTAG/OnCE Interface Block Diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
Figure 9-2 JTAG Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4
Figure 9-3 OnCE Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
Figure A-1 DSP56800 Core Programming Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5
Figure A-2 Status Register (SR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-7
Figure B-1 N Radix 2 FFT Butterflies Memory Map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-10
Figure B-2 LMS Adaptive Filter Graphic Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-12
Example 3-1 Loading an Accumulator with a Word for Integer Processing . . . . . . . . . . . . . 3-11
Example 3-2 Reading a Word from an Accumulator for Integer Processing . . . . . . . . . . . . 3-12
Example 3-3 Correctly Reading a Word from an Accumulator to a D/A . . . . . . . . . . . . . . . 3-12
Example 3-4 Correct Saving and Restoring of an Accumulator — Word Accesses . . . . . . . 3-13
Example 3-5 Bit Manipulation on an Accumulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13
Example 3-6 Converting a 36-Bit Accumulator to a 16-Bit Value . . . . . . . . . . . . . . . . . . . . 3-14
Example 3-7 Fractional Arithmetic Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Example 3-8 Integer Arithmetic Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Example 3-9 Multiplying Two Signed Integer Values with Full Precision . . . . . . . . . . . . . . 3-21
Example 3-10 Fast Integer MACs using Fractional Arithmetic. . . . . . . . . . . . . . . . . . . . . . . . 3-21
Example 3-11 Multiplying Two Unsigned Fractional Values . . . . . . . . . . . . . . . . . . . . . . . . . 3-23
Example 3-12 64-Bit Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-23
Example 3-13 64-Bit Subtraction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-23
Example 3-14 Fractional Single-Precision Times Double-Precision Value — Both Signed . 3-24
Example 3-15 Integer Single-Precision Times Double-Precision Value — Both Signed . . . . 3-24
Example 3-16 Multiplying Two Fractional Double-Precision Values. . . . . . . . . . . . . . . . . . . 3-25
Example 3-17 Demonstrating the Data Limiter — Positive Saturation . . . . . . . . . . . . . . . . . . 3-26
Example 3-18 Demonstrating the Data Limiter — Negative Saturation . . . . . . . . . . . . . . . . . 3-27
Example 3-19 Demonstrating the MAC Output Limiter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-28
Example 4-1 Initializing the Circular Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-29
Example 4-2 Accessing the Circular Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-30
Example 4-3 Accessing the Circular Buffer with Post-Update by Three . . . . . . . . . . . . . . . 4-30
Example 4-4 No Dependency with the Offset Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-33
Example 4-5 No Dependency with an Address Pointer Register. . . . . . . . . . . . . . . . . . . . . . 4-33
Example 4-6 No Dependency with No Address Arithmetic Calculation. . . . . . . . . . . . . . . . 4-34
Example 4-7 No Dependency with (Rn+xxxx) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-34
Example 4-8 Dependency with a Write to the Offset Register . . . . . . . . . . . . . . . . . . . . . . . 4-34
Example 4-9 Dependency with a Bit-Field Operation on the Offset Register . . . . . . . . . . . . 4-34
Example 4-10 Dependency with a Write to an Address Pointer Register . . . . . . . . . . . . . . . . 4-34
Example 4-11 Dependency with a Write to the Modifier Register . . . . . . . . . . . . . . . . . . . . . 4-34
Example 4-12 Dependency with a Write to the Stack Pointer Register. . . . . . . . . . . . . . . . . . 4-35
Example 4-13 Dependency with a Bit-Field Operation and DO Loop . . . . . . . . . . . . . . . . . . 4-35
Audience
The information in this manual is intended to assist design and software engineers with integrating a
DSP56800 Family device into a design and with developing application software.
Organization
Information in this manual is organized into chapters by topic. The contents of the chapters are as follows:
Chapter 1, “Introduction.” This section introduces the DSP56800 core architecture and its application. It
also provides the novice with a brief overview of digital signal processing.
Chapter 2, “Core Architecture Overview.” The DSP56800 core architecture consists of the data
arithmetic logic unit (ALU), address generation unit (AGU), program controller, bus and bit-manipulation
unit, and a JTAG/On-Chip Emulation (OnCE™) port. This section describes each subsystem and the buses
interconnecting the major components in the DSP56800 central processing module.
Chapter 3, “Data Arithmetic Logic Unit.” This section describes the data ALU architecture, its
programming model, an introduction to fractional and integer arithmetic, and a discussion of other topics
such as unsigned and multi-precision arithmetic on the DSP56800 Family.
Chapter 4, “Address Generation Unit.” This section specifically describes the AGU architecture and its
programming model, addressing modes, and address modifiers.
Chapter 5, “Program Controller.” This section describes in detail the program controller architecture, its
programming model, and hardware looping. Note, however, that the different processing states of the
DSP56800 core, including interrupt processing, are described in Chapter 7, “Interrupts and the Processing
States.”
Suggested Reading
A list of DSC-related books is included here as an aid for the engineer who is new to the field of DSC:
Advanced Topics in Signal Processing, Jae S. Lim and Alan V. Oppenheim (Prentice-Hall: 1988).
Applications of Digital Signal Processing, A. V. Oppenheim (Prentice-Hall: 1978).
Digital Processing of Signals: Theory and Practice, Maurice Bellanger (John Wiley and Sons: 1984).
Digital Signal Processing, Alan V. Oppenheim and Ronald W. Schafer (Prentice-Hall: 1975).
Digital Signal Processing: A System Design Approach, David J. DeFatta, Joseph G. Lucas, and William S.
Hodgkiss (John Wiley and Sons: 1988).
Discrete-Time Signal Processing, A. V. Oppenheim and R.W. Schafer (Prentice-Hall: 1989).
Foundations of Digital Signal Processing and Data Analysis, J. A. Cadzow (Macmillan: 1987).
Handbook of Digital Signal Processing, D. F. Elliott (Academic Press: 1987).
Introduction to Digital Signal Processing, John G. Proakis and Dimitris G. Manolakis (Macmillan: 1988).
Multirate Digital Signal Processing, R. E. Crochiere and L. R. Rabiner (Prentice-Hall: 1983).
Signal Processing Algorithms, S. Stearns and R. Davis (Prentice-Hall: 1988).
Signal Processing Handbook, C. H. Chen (Marcel Dekker: 1988).
Signal Processing: The Modern Approach, James V. Candy (McGraw-Hill: 1988).
Theory and Application of Digital Signal Processing, Lawrence R. Rabiner and Bernard Gold
(Prentice-Hall: 1975).
[Figure: block diagram of the 16-bit DSC CPU core with its PLL, external address and data bus interface, I/O, and JTAG debug port.]
The general-purpose MCU-style instruction set, with its powerful addressing modes and bit-manipulation
instructions, enables a user to begin writing code immediately, without having to worry about the
complexities previously associated with DSCs. A software stack allows for unlimited interrupt and
subroutine nesting, as well as support for structured programming techniques such as parameter passing
and the use of local variables. The veteran DSC programmer sees a powerful DSC instruction set with
many different arithmetic operations and flexible single- and dual-memory moves that can occur in parallel
with an arithmetic operation. The general-purpose nature of the instruction set also allows for an efficient
compiler implementation.
A variety of standard peripherals can be added around the DSP56800 core (see Figure 1-1 on page 1-1)
such as serial ports, general-purpose timers, real-time and watchdog timers, different memory
configurations (RAM, FLASH, or both), and general-purpose I/O (GPIO) ports.
On-Chip Emulation (OnCE™) capability is provided through a debug port conforming to the Joint Test
Action Group (JTAG) standard. This provides real-time, embedded system debugging with on-chip
emulation capability through the five-pin JTAG interface. A user can set hardware and software
breakpoints, display and change registers and memory locations, and single step or step through multiple
instructions in an application.
The DSP56800’s efficient instruction set, multiple internal buses, on-chip program and data memories,
external bus interface, standard peripherals, and industry-standard debug support make the DSP56800
Family an excellent solution for real-time embedded control tasks. It is an excellent fit for wireless or
wireline DSC applications, digital control, and controller applications in need of more processing power.
[Figure: DSP56800 core block diagram. Blocks shown: program controller (instruction decoder, interrupt unit, SR, OMR, LA, LC, PC, HWS); AGU (M01, N, SP, R0 through R3, modulo and +/- ALU); clock generator; program memory; data memory; external bus interface; peripherals on the IP-BUS (or PGDB); bus and bit-manipulation unit; OnCE; data ALU (data limiter, MAC and ALU, registers Y1, Y0, X0, A2, A1, A0, B2, B1, B0). Buses shown: PAB, PDB, XAB1, XAB2, CGDB, XDB2.]
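[Figure: analog signal processing. An analog filter built from Rf, Cf, and Ri takes the input x(t) from a sensor and delivers the output y(t) to an actuator. Its transfer function is

$$\frac{y(t)}{x(t)} = -\frac{R_f}{R_i} \cdot \frac{1}{1 + j\omega R_f C_f}$$

The frequency-characteristics plot compares the gain of the ideal filter and the actual filter against frequency f, with the cutoff marked at fc.]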
The equivalent circuit using a DSC is shown in Figure 1-5 on page 1-7. This application requires an
analog-to-digital (A/D) converter and digital-to-analog (D/A) converter in addition to the DSC. Even with
these additional parts, the component count can be lower using a DSC due to the high integration available
with current components.
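[Figure 1-5: equivalent circuit using a DSC. The input x(t) is converted by the A/D converter to x(n), filtered by a finite impulse response filter, $y(n) = \sum_{k=0}^{N-1} c(k)\,x(n-k)$, and converted by the D/A converter to y(t). Gain-versus-frequency plots (gain A, cutoff frequency fc) are shown for the analog filter and the digital filter stages.]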
Processing in this circuit begins by band limiting the input signal with an anti-alias filter, eliminating
out-of-band signals that can be aliased back into the pass band due to the sampling process. The signal is
then sampled, digitized with an A/D converter, and sent to the DSC.
The filter implemented by the DSC is strictly a matter of software. The DSC can directly employ any filter
that can also be implemented using analog techniques. Also, adaptive filters can be easily put into practice
using a DSC, whereas these filters are extremely difficult to implement using analog techniques. (Similarly,
compression can also be implemented on a DSC.)
The DSC output is processed by a D/A converter and is low-pass filtered to remove the effects of
digitizing. In summary, the advantages of using the DSC include the following:
• Fewer components
• Stable, deterministic performance
• No filter adjustments
• Wide range of applications
• Filters with much closer tolerances
• High noise immunity
• Adaptive filters easily implemented
• Self-test can be built in
• Better power-supply rejection
The DSP56800 Family is not a custom IC designed for a particular application; it is designed as a
general-purpose DSC architecture to efficiently execute commonly used DSC benchmarks and controller
code in minimal time.
As shown in Figure 1-6, the key attributes of a DSC are as follows:
• Multiply/accumulate (MAC) operation
• Fetching up to two operands per instruction cycle for the MAC
• Program control to provide versatile operation
• Input/output to move data in and out of the DSC
[Figure 1-6: FIR filter mapped onto the DSC. The signal path is x(t), A/D, x(n), $\sum_{k=0}^{N-1} c(k)\,x(n-k)$, y(n), D/A, y(t). The X memory and program blocks feed the MAC.]
The multiply-accumulation (MAC) operation is the fundamental operation used in DSC. The DSP56800
Family of processors has a dual Harvard architecture optimized for MAC operations. Figure 1-6 on
page 1-8 shows how the DSP56800 architecture matches the shape of the MAC operation. The two
operands, c( ) and x( ), are directed to a multiply operation, and the result is summed. This process is built
into the chip by allowing two separate data-memory accesses to feed a single-cycle MAC. The entire
process must occur under program control to direct the correct operands to the multiplier and save the
accumulated result as needed. Since the memory and the MAC are independent, the DSC can perform two
memory moves, a multiply and an accumulate, and two address updates in a single operation. As a result,
many DSC benchmarks execute very efficiently for a single-multiplier architecture.
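The sum-of-products structure described above can be sketched in Python. This is only an illustrative model of the FIR equation, not DSP56800 code: the function name `fir_output` and the sample data are hypothetical, and samples before x(0) are assumed to be zero.

```python
def fir_output(coeffs, samples, n):
    """Compute y(n) = sum over k of c(k) * x(n - k), one MAC per tap."""
    acc = 0.0
    for k, c in enumerate(coeffs):
        if n - k >= 0:                  # assumption: x(m) = 0 for m < 0
            acc += c * samples[n - k]   # one multiply-accumulate step
    return acc

# Hypothetical 3-tap filter and input sequence, chosen so the sums are exact.
coeffs = [0.25, 0.5, 0.25]
samples = [1.0, 2.0, 3.0, 4.0]
outputs = [fir_output(coeffs, samples, n) for n in range(len(samples))]
print(outputs)
```

Each pass through the inner loop corresponds to one MAC with two operand fetches, which is the step the architecture performs in a single instruction cycle.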
• Instruction set—The instruction mnemonics are MCU-like, making the transition from
programming microprocessors to programming the chip as easy as possible. New microcontroller
instructions, addressing modes, and bit-field instructions allow for significant decreases in program
code size. The orthogonal syntax controls the parallel execution units. The hardware DO loop
instruction and the repeat (REP) instruction make writing straight-line code obsolete.
• Low power—Designed in CMOS, the DSP56800 Family inherently consumes very low power.
Two additional low power modes, stop and wait, further reduce power requirements. Wait is a
low-power mode where the DSP56800 core is shut down but the peripherals and interrupt controller
continue to operate so that an interrupt can bring the chip out of wait mode. In stop mode, even more
of the circuitry is shut down for the lowest power-consumption mode. There are also several
different ways to bring the chip out of stop mode.
[Figure 2-1: DSP56800 core block diagram. Blocks shown: program controller (instruction decoder, interrupt unit, SR, OMR, LA, LC, PC, HWS); AGU (M01, N, SP, R0 through R3, modulo and +/- ALU); program memory; data memory; external bus interface; data ALU with data limiter; PGDB (or IP-BUS) interface. Buses shown: PAB, PDB, XAB1, XAB2, CGDB, XDB2.]
Note that Figure 2-1 illustrates two methods for connecting peripherals to the DSP56800 core: using the
Freescale-standard IP-BUS interface or via a dedicated Peripheral Global Data Bus (PGDB). The interface
method used to connect to peripherals is dependent on the specific DSP56800-based device being used.
The latest products have chosen the IP-BUS interface. Consult your device user’s manual for more
information on peripheral interfacing.
The address registers are 16-bit registers that may contain an address or data. Each address register can
provide an address for the XAB1 and PAB address buses. For instructions that read two values from X data
memory, R3 provides an address for the XAB2, and R0 or R1 provides an address for the XAB1. The
modifier and offset registers are 16-bit registers that control updating of the address registers. The offset
register can also be used to store 16-bit data. AGU registers may be read or written by the CGDB as 16-bit
operands. Refer to Chapter 4, “Address Generation Unit,” for a detailed description of the AGU.
Data transfer between the data ALU and the X data memory uses the CGDB when one memory access is
performed. When two simultaneous memory reads are performed, the transfers use the CGDB and the
XDB2. All other data transfers occur using the CGDB, except transfers to and from peripherals on
DSP56800-based devices that implement the IP-BUS or PGDB peripheral data bus. Instruction word
fetches occur simultaneously over the PDB. The bus structure supports general register-to-register moves,
register-to-memory moves, and memory-to-register moves, and can transfer up to three 16-bit words in the
same instruction cycle. Transfers between buses are accomplished in the bus and bit-manipulation unit. As
a general rule, when any register less than 16 bits wide is read, the unused bits are read as zeros. Reserved
and unused bits should always be written with zeros to ensure future compatibility.
[Figure: memory map. The interrupt vectors occupy locations $0 through $7F (0 through 127).]
NOTE: The placement of the peripheral space is dependent on the specific system
implementation for the DSP56800 core. When the IP-BUS interface is used,
peripheral registers may be memory mapped into any data (X) memory address
range and are accessed with standard X-memory reads and writes.
Locations $0 through $007F in the program memory space are available for reset and interrupt vectors.
Peripheral registers are located in the data memory address space as memory-mapped registers. This
peripheral space can be located anywhere in the data address space, although the address range
$FFC0–$FFFF provides faster access when using an addressing mode optimized for this region; however,
the location of the peripheral space is dependent on the specific peripheral bus implementation of the
DSP56800 core. See Section 4.2.4.3, “I/O Short Address (Direct Addressing): <pp>,” on page 4-23 for
more information.
[Figure: DSP56800-based DSC chip block diagram. Blocks shown: 16-bit DSC core (address generation unit, program controller, data ALU, internal data bus switch, JTAG/OnCE); PLL and clock generator; program RAM/FLASH; data RAM/FLASH; on-chip expansion area; peripheral modules behind an IP-BUS bridge. Buses shown: XAB1, XAB2, PAB, PDB, CGDB, XDB2. The data ALU is annotated: 16 x 16 + 36 → 36-bit MAC, three 16-bit input registers, two 36-bit accumulators.]
[Figure: programming model. Data ALU input registers X0, Y1, and Y0 (Y = Y1:Y0), each 16 bits; accumulators A = A2:A1:A0 and B = B2:B1:B0 (bits 35 to 32, 31 to 16, and 15 to 0). Other registers shown: R1, R2, R3, SP, N, M01, PC, MR, CCR, OMR, LC, LA.]
• Rounding
• Absolute value
• Division iteration
• Normalization iteration
• Conditional register moves (Tcc)
• Saturation (limiting)
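[Figure: data ALU block diagram. Registers Y1, Y0, and X0 and the accumulators A (A2:A1:A0) and B (B2:B1:B0) connect to the CGDB and XDB2, with the limiter on the accumulator outputs. The datapath comprises the multiplier, optional inverter, arithmetic/logical shifter, rounding constant, adder, MAC output limiter (controlled by the OMR's SA bit), and condition code generation (controlled by the OMR's CC bit), which sends condition codes to the status register. Results are formatted as EXT:MSP:LSP.]
[Figure: accumulator registers. A = A2:A1:A0 and B = B2:B1:B0, where bits 35 to 32 hold the 4-bit extension portion, bits 31 to 16 the 16-bit MSP, and bits 15 to 0 the 16-bit LSP.]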
Accessing an accumulator through its individual portions (A2, A1, A0, B2, B1, or B0) is useful for systems
and control programming. When accumulators are manipulated using their constituent components,
saturation and limiting are disabled. This allows for microcontroller-like 16-bit integer processing for
non-DSC purposes.
Section 3.2, “Accessing the Accumulator Registers,” provides a complete discussion of the ways in which
the accumulators can be employed. A description of the data limiting and saturation features of the data
ALU is provided in Section 3.4, “Saturation and Data Limiting.”
[Figure 3-3. Right and Left Shifts Through the Multi-Bit Shifting Unit. Right shift: the 16-bit value $AAAA shifted by $4 gives A = $F_FAAA_0000. Left shift: the 16-bit value $AAAA shifted by $4 gives A = $F_AAA0_0000.]
The barrel shifter performs all multi-bit shift operations: the arithmetic shifts (ASLL, ASRR) and the logical
shift (LSRR). When the destination is a 36-bit accumulator, the extension register is always loaded with
sign extension from bit 31 for arithmetic shifts (and is zero extended for the logical shift). The LSP is always
set to zero for these operations. Note that LSLL is implemented as an ASLL instruction but accepts only
16-bit registers as destinations. For information on LSLL, refer to Section 6.5.2, “LSLL Alias,” on page
6-12 and Appendix A.
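The behavior described above can be modeled with a short Python sketch. It is an illustration under stated assumptions, not a description of the hardware: the function name `shift_to_accumulator` is hypothetical, the source is taken to be a 16-bit word, and the result is packed as EXT:MSP:LSP in a 36-bit integer.

```python
def shift_to_accumulator(value16, count, op):
    """Model a multi-bit shift of a 16-bit word into a 36-bit accumulator."""
    value16 &= 0xFFFF
    if op == "ASLL":
        msp = (value16 << count) & 0xFFFF
        ext = 0xF if msp & 0x8000 else 0x0      # sign extension from bit 31
    elif op == "ASRR":
        signed = value16 - 0x10000 if value16 & 0x8000 else value16
        msp = (signed >> count) & 0xFFFF        # arithmetic right shift
        ext = 0xF if msp & 0x8000 else 0x0      # sign extension from bit 31
    elif op == "LSRR":
        msp = value16 >> count                  # logical right shift
        ext = 0x0                               # zero extended
    else:
        raise ValueError("unsupported operation: " + op)
    return (ext << 32) | (msp << 16)            # LSP is always zero

right = shift_to_accumulator(0xAAAA, 4, "ASRR")
left = shift_to_accumulator(0xAAAA, 4, "ASLL")
print(f"{right:09X} {left:09X}")
```

With the source value $AAAA and a shift count of 4, the model reproduces the two accumulator values shown in Figure 3-3.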
In all cases in Table 3-1 where a MOVE operation is specified, it is understood that the function is
identical for parallel moves and bit-field operations.
[Figure: register F2 used as a destination. The 4 LSBs of the word (bits 3 to 0) are written to F2; bits 15 to 4 are not used.]
When F2 is read, the register contents occupy the low-order portion (bits 3–0) of the word; the high-order
portion (bits 15–4) is sign extended. See Figure 3-5.
[Figure 3-5: register F2 used as a source. On the CGDB, bits 3 to 0 carry the contents of F2, and bits 15 to 4 carry the sign extension of F2.]
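As a rough illustration of the extension-register access rules above, the following Python sketch models a read and a write of a 4-bit extension register such as A2 or B2. The function names are hypothetical and the model covers only the bit placement described in the text.

```python
def read_extension_register(ext4):
    """Return the 16-bit word seen when a 4-bit extension register is read."""
    ext4 &= 0xF
    if ext4 & 0x8:                 # bit 3 is the sign bit of the register
        return ext4 | 0xFFF0       # bits 15..4 are filled with the sign
    return ext4                    # positive: bits 15..4 read as zero

def write_extension_register(word16):
    """Return the 4-bit value stored when a 16-bit word is written."""
    return word16 & 0xF            # only the 4 LSBs of the word are used

print(hex(read_extension_register(0xF)), hex(write_extension_register(0x1234)))
```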
Figure 3-6 shows the result of writing values to each portion of the accumulator. Note that only the portion
specified in the instruction is modified; the other two portions remain unchanged.
See Section 3.2, “Accessing the Accumulator Registers,” for a discussion of when it is appropriate to
access an accumulator by its individual portions and when it is appropriate to access it as an entire
accumulator.
Successfully using the DSP56800 Family requires a full understanding of the methods and implications of
the various accumulator-register access methods. The architecture of the accumulator registers offers a
great deal of flexibility and power, but it is necessary to completely understand the access mechanisms
involved to fully exploit this power.
Loading a 16-bit integer value into the A1 portion of the register is generally discouraged. In almost all
cases, it is preferable to follow Example 3-1 on page 3-11. One notable exception is when 36-bit
accumulator values must be stored temporarily. See Section 3.2.5, “Saving and Restoring Accumulators,”
for more details.
Note that with the use of the A1 register instead of the A register, saturation is disabled. The value in A1 is
written “as is” to memory.
Note the use of the A accumulator instead of the A1 register. Using the A accumulator enables saturation.
It is important that interrupt service routines do not use the MOVE A,X:(SP)+ instruction when saving to
the stack. This instruction operates with saturation enabled, and may inadvertently store the value $7FFF
or $8000 onto the stack, according to the rules employed by the Data Limiter. This could have catastrophic
effects on any DSC calculation that was in progress.
Since the BFTSTH, BFTSTL, BRCLR, and BRSET instructions only test the accumulator value and do
not modify it, it is recommended to do these operations on the A1 register where no limiting can occur
when integer processing is performed.
Where limiting is enabled, as in the second example in Example 3-6, limiting only occurs when the
extension register is in use. You can determine if the extension register is in use by examining the
extension bit (E) of the status register. Refer to Section 5.1.8, “Status Register,” on page 5-6.
Integer arithmetic, on the other hand, is invaluable for controller code, for array indexing and address
computations, compilers, peripheral setup and handling, bit manipulation, bit-exact algorithms, and other
general-purpose tasks. Typically, saturation is not used in this mode, but is available if desired. (See
Example 3-8.)
Example 3-8. Integer Arithmetic Examples
4 x 3 = 12
1201 + 79 = 1280
63 / 9 = 7
100 << 1 = 200
The main difference between fractional and integer representations is the location of the decimal (or
binary) point. For fractional arithmetic, the decimal (or binary) point is always located immediately to the
right of the MSP’s most significant bit; for integer values, it is always located immediately to the right of
the value’s LSB. Figure 3-8 on page 3-15 shows the location of the decimal point (binary point), the bit
weights, and the operand alignment for the different fractional and integer representations supported on the
DSP56800 architecture.
The following equation shows the relationship between a 16-bit integer and a fractional value:
$\text{Fractional Value} = \text{Integer Value} / 2^{15}$
There is a similar equation relating 36-bit integers and fractional values:
$\text{Fractional Value} = \text{Integer Value} / 2^{31}$
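The two scaling relationships can be checked numerically. The sketch below is a plain Python illustration; the function names are hypothetical, and the inputs are assumed to be raw two's-complement bit patterns of 16 and 36 bits.

```python
def word_to_fraction(word16):
    """Interpret a 16-bit two's-complement word as a signed fraction."""
    signed = word16 - 0x10000 if word16 & 0x8000 else word16
    return signed / 2**15

def accumulator_to_fraction(acc36):
    """Interpret a 36-bit two's-complement accumulator as a signed fraction."""
    signed = acc36 - (1 << 36) if acc36 & (1 << 35) else acc36
    return signed / 2**31

print(word_to_fraction(0x4000), accumulator_to_fraction(0x0_8000_0000))
```

Note that the 36-bit form can represent +1.0 exactly because of the extension bits, whereas the largest 16-bit fraction is $7FFF.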
Table 3-3 shows how a 36-bit value can be interpreted as either an integer or a fractional value, depending
on the location of the binary point.
Table 3-3. Interpretation of 36-bit Data Values
1. When the accumulator extension registers are in use, the data contained in the accumulators cannot be
stored exactly in memory or other registers. In these cases the data must be limited to the most positive or
most negative number consistent with the size of the destination.
[Figure: two panels, each showing accumulator A (A2, A1, A0) and X0 = $0040.]
Fractional word-sized arithmetic would be performed in a similar manner. For arithmetic operations where
the destination is a 16-bit register or memory location, the fractional or integer operation is correctly
calculated and stored in its 16-bit destination.
3.3.5 Multiplication
The multiplication operation is not the same for integer and fractional arithmetic. The result of a fractional
multiplication differs in a simple manner from the result of an integer multiplication. This difference
amounts to a 1-bit shift of the final result, as illustrated in Figure 3-10. Any binary multiplication of two
N-bit signed numbers gives a signed result that is 2N-1 bits in length. This 2N-1 bit result must then be
correctly placed into a field of 2N bits to correctly fit into the on-chip registers. For correct fractional
multiplication, an extra 0 bit is placed at the LSB to give a 2N bit result. For correct integer multiplication,
an extra sign bit is placed at the MSB to give a 2N bit result.
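[Figure 3-10: comparison of integer and fractional multiplication. In both cases the signed multiplier output is placed in a 2N-bit field: the integer result carries an extra sign bit at the MSB, and the fractional result carries an extra zero bit at the LSB.]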
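The 1-bit difference between the two kinds of product can be illustrated with a small Python model. This is a sketch under stated assumptions: the function names are hypothetical, operands are 16-bit two's-complement words, and results are shown in a 32-bit field, so the special case of $8000 times $8000 (which needs the accumulator extension bits for a fractional result) is deliberately not modeled.

```python
def to_signed16(word16):
    """Convert a 16-bit bit pattern to a signed Python integer."""
    return word16 - 0x10000 if word16 & 0x8000 else word16

def integer_product(x16, y16):
    """2N-bit integer product: the extra bit is a sign bit at the MSB."""
    return (to_signed16(x16) * to_signed16(y16)) & 0xFFFFFFFF

def fractional_product(x16, y16):
    """2N-bit fractional product: the extra bit is a zero at the LSB."""
    return ((to_signed16(x16) * to_signed16(y16)) << 1) & 0xFFFFFFFF

# $4000 is 16384 as an integer or 0.5 as a fraction; $2000 is 8192 or 0.25.
int_result = integer_product(0x4000, 0x2000)
frac_result = fractional_product(0x4000, 0x2000)
print(f"{int_result:08X} {frac_result:08X}")
```

Read as a fraction, the second result is 0.125 (that is, 0.5 times 0.25), and it is exactly the integer result shifted left by one bit.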
The MPY, MAC, MPYR, and MACR instructions perform fractional multiplication and fractional
multiply-accumulation. The IMPY16 instruction performs integer multiplication. Section 3.3.5.2, “Integer
Multiplication,” explains how to perform integer multiplication.
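[Figure: fractional multiplication. Two signed 16-bit operands produce a signed intermediate multiplier result that is zero filled at the LSB to 32 bits; the signed fractional MPY result occupies the 36-bit accumulator as EXT:MSP:LSP.]
[Figure: integer multiplication. Two signed 16-bit operands produce a 31-bit signed intermediate multiplier result; the signed 16-bit integer output is written to the MSP with sign extension in the extension register, and the LSP is unchanged.]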
At other times it is necessary to maintain the full 32-bit precision of an integer multiplication. To obtain
integer results, an MPY instruction is used, immediately followed by an ASR instruction. The 32-bit long
integer result is then correctly located into the MSP and LSP of an accumulator with correct sign extension
in the extension register of the same accumulator (see Example 3-9).
Example 3-9. Multiplying Two Signed Integer Values with Full Precision
MPY X0,Y0,A ; Generates correct answer shifted
; 1 bit to the left
ASR A ; Leaves Correct 32-bit Integer
; Result in the A Accumulator
; and the A2 register contains
; correct sign extension
When a multiply-accumulate is performed on a set of integer numbers, there is a faster way for generating
the result than performing an ASR instruction after each multiply. The technique is to use fractional
multiply-accumulates for the bulk of the computation and to then convert the final result back to integer.
See Example 3-10.
Example 3-10. Fast Integer MACs using Fractional Arithmetic
MOVE X:(R0)+,Y0 X:(R3)+,X0
DO #Count,LABEL ; Count defined as number of repetitions
MAC X0,Y0,A X:(R0)+,Y0 X:(R3)+,X0
LABEL:
ASR A ; Convert to Integer only after MACs are
; completed
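The reasoning behind Example 3-10 can be checked with a short Python model. It is a sketch only: the function name `fractional_mac` and the data are hypothetical, and accumulator width and overflow are ignored. Each fractional MAC adds twice the integer product, so a single arithmetic right shift at the end recovers the integer sum of products.

```python
def fractional_mac(acc, x, y):
    """Model a fractional MAC on signed integers: adds the doubled product."""
    return acc + ((x * y) << 1)

# Hypothetical integer data for three multiply-accumulate steps.
xs = [3, -4, 5]
ys = [7, 2, -6]

acc = 0
for x, y in zip(xs, ys):
    acc = fractional_mac(acc, x, y)     # loop body: one MAC per pair

result = acc >> 1                       # one ASR after the loop
expected = sum(x * y for x, y in zip(xs, ys))
print(result, expected)
```

Because the doubling is linear, deferring the shift until after the loop gives the same answer as shifting after every multiply, while saving one instruction per iteration.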
3.3.6 Division
Fractional and integer division of both positive and signed values is supported using the DIV instruction.
The dividend (numerator) is a 32-bit fractional or 31-bit integer value, and the divisor (denominator) is a
16-bit fractional or integer value, respectively. See Section 8.4, “Division,” on page 8-13 for a complete
discussion of division.
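[Figure: single-precision times double-precision signed multiplication. The 16-bit operand X0 multiplies the 32-bit operand Y1:Y0. The signed x unsigned product X0 x Y0 is added to the signed x signed product X0 x Y1 with sign extension, giving a 48-bit result in A2:A1:A0:B1.]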
;
; Signed 32x32 => 64 Multiplication Subroutine
;
; Parameters:
; R1 = ptr to lowest word of one operand
; R2 = ptr to lowest word of one operand
; R3 = ptr to where results are stored
MULT_S32_X_S32:
CLR B ; clears B2 portion
; Operation ; X0 Y1 Y0 A
; --------- ; ----- ----- ----- -------------------
MOVE X:(R1),Y0 ; --- --- lwr1 -----
ANDC #CLRMSB,Y0 ; --- --- lwr1' -----
MOVE X:(R2)+,Y1 ; --- lwr2 lwr1' -----
MPYSU Y0,Y1,A ; --- lwr2 lwr1' lwr1'.s * lwr2.u
TSTW X:(R1)+ ; check if MSB set in original lwr1 value
BGE CORRECT_RES1 ; perform correction if this was true
MOVE Y1,B1 ; --- lwr2 lwr1' -----
ADD B,A ; --- lwr2 lwr1' lwr1.u * lwr2.u
CORRECT_RES1:
MOVE A0,X:(R3)+ ; --- lwr2 lwr1' lwr1.u * lwr2.u
; Multiply two cross products and save next lowest 16-bits of result
; Operation ; X0 Y1 Y0 A
; --------- ; ----- ----- ----- -------------------
MOVE A1,X:TMP ; (arithmetic 16-bit right shift of 36-bit accum)
MOVE A2,A ; ---- ---- ---- -----
MOVE X:TMP,A0 ; ---- ---- ---- A = product1 >> 16
RTS
INC A ; A = $0_7FFD_0000
MOVE A,X:(R0)+ ; Write $7FFD to memory (limiter enabled)
INC A ; A = $0_7FFE_0000
MOVE A,X:(R0)+ ; Write $7FFE to memory (limiter enabled)
INC A ; A = $0_7FFF_0000
MOVE A,X:(R0)+ ; Write $7FFF to memory (limiter enabled)
Once the accumulator increments to $8000 in Example 3-17, the positive result can no longer be written to
a 16-bit memory location without overflow. So, instead of writing an overflowed value to memory, the
value of the most positive 16-bit number, $7FFF, is written instead by the data limiter block. Note that the
data limiter block does not affect the accumulator; it only affects the value written to memory. In the last
instruction, the limiter is disabled because the register is specified as A1.
Consider a second example, shown in Example 3-18 on page 3-27.
Example 3-18. Demonstrating the Data Limiter — Negative Saturation
MOVE #$1008,R0 ; Store results starting in address $1008
MOVE #$8003,A ; Initialize A = $F_8003_0000
DEC A ; A = $F_8002_0000
MOVE A,X:(R0)+ ; Write $8002 to memory (limiter enabled)
DEC A ; A = $F_8001_0000
MOVE A,X:(R0)+ ; Write $8001 to memory (limiter enabled)
DEC A ; A = $F_8000_0000
MOVE A,X:(R0)+ ; Write $8000 to memory (limiter enabled)
Once the accumulator decrements to $7FFF in Example 3-18, the negative result can no longer fit into a
16-bit memory location without overflow. So, instead of writing an overflowed value to memory, the value
of the most negative 16-bit number, $8000, is written instead by the data limiter block.
Test logic exists in the extension portion of each accumulator register to support the operation of the
limiter circuit; the logic detects overflows so that the limiter can substitute one of two constants to
minimize errors due to overflow. This process is called “saturation arithmetic.” When limiting does occur,
a flag is set and latched in the status register. The value of the accumulator is not changed.
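A minimal Python model of the data limiter, written from the description above, is given here as an illustration. The function names are hypothetical, the accumulator is represented as a 36-bit integer in EXT:MSP:LSP order, and the status-register flag is not modeled.

```python
def limited_read(acc36):
    """16-bit word produced when the whole accumulator is read (limiter on)."""
    signed = acc36 - (1 << 36) if acc36 & (1 << 35) else acc36
    value = signed >> 16                # EXT:MSP as a signed 20-bit integer
    if value > 0x7FFF:
        return 0x7FFF                   # most positive 16-bit number
    if value < -0x8000:
        return 0x8000                   # most negative 16-bit number
    return value & 0xFFFF               # fits: written unchanged

def unlimited_read(acc36):
    """16-bit word produced when only the MSP portion (A1 or B1) is read."""
    return (acc36 >> 16) & 0xFFFF       # written as is, no limiting

for acc in (0x0_7FFF_0000, 0x0_8000_0000, 0xF_8000_0000, 0xF_7FFF_0000):
    print(f"{acc:09X} -> {limited_read(acc):04X} / {unlimited_read(acc):04X}")
```

The accumulator value itself is never modified by either function, which mirrors the statement that the limiter affects only the value written to the destination.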
Table 3-4. Saturation by the Limiter Using the MOVE Instruction
It is possible to bypass this limiting feature when reading an accumulator by reading it out through its
individual portions.
Figure 3-14 on page 3-28 demonstrates the importance of limiting. Consider the A accumulator with the
following 36-bit value to be read to a 16-bit destination:
0000 1.000 0000 0000 0000 0000 0000 0000 0000 (in binary)
(+ 1.0 in fractional decimal, $0_8000_0000 in hexadecimal)
If this accumulator is read without the limiting enabled by a MOVE A1,X0 instruction, the 16-bit X0
register after the MOVE instruction would contain the following, assuming signed fractional arithmetic:
1.000 0000 0000 0000 (- 1.0 fractional decimal, $8000 in hexadecimal)
This is clearly in error because the value -1.0 in the X0 register greatly differs from the value of +1.0 in the
source accumulator. In this case, overflow has occurred. To minimize the error due to overflow, it is
preferable to write the maximum (“limited”) value the destination can assume. In this example, the limited
value would be:
0.111 1111 1111 1111 (+ 0.999969 fractional decimal, $7FFF in hexadecimal)
This is clearly closer to the original value, +1.0, than -1.0 is, and thus introduces less error. Saturation is
equally applicable to both integer and fractional arithmetic.
Thus, saturation arithmetic can have a large effect in moving from register A1 to register X0. The
instruction MOVE A1,X0 performs a move without limiting, and the instruction MOVE A,X0 performs a
move of the same 16 bits with limiting enabled. The magnitude of the error without limiting is 2.0; with
limiting it is 0.000031.
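The two error magnitudes quoted above follow directly from the fractional interpretation of the 16-bit words. The Python lines below simply redo that arithmetic; the helper name `q15` is hypothetical.

```python
def q15(word16):
    """Interpret a 16-bit word as a signed fraction (value / 2**15)."""
    signed = word16 - 0x10000 if word16 & 0x8000 else word16
    return signed / 2**15

original = 1.0                                      # $0_8000_0000 in the accumulator
error_without_limiting = abs(original - q15(0x8000))  # MOVE A1,X0 yields $8000
error_with_limiting = abs(original - q15(0x7FFF))     # MOVE A,X0 yields $7FFF
print(error_without_limiting, error_with_limiting)
```

The limited error is exactly 2 to the power of -15, which rounds to the 0.000031 given in the text.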
*Limiting automatically occurs when the 36-bit operands A and B are read with a MOVE instruction. Note that the
contents of the original accumulator are not changed.
INC A ; A = $0_7FFD_0000
INC A ; A = $0_7FFE_0000
INC A ; A = $0_7FFF_0000
Once the accumulator increments to $7FFF in Example 3-19, the saturation logic in the MAC Output
Limiter prevents it from growing larger because it can no longer fit into a 16-bit memory location without
overflow. So instead of writing an overflowed value back to the A accumulator, the most positive 32-bit
number, $7FFF_FFFF, is written as the arithmetic result.
The saturation logic operates by checking 3 bits of the 36-bit result out of the MAC unit: EXT[3], EXT[0],
and MSP[15]. When the SA bit is set, these 3 bits determine if saturation is performed on the MAC unit’s
output and whether to saturate to the maximum positive value ($7FFF_FFFF) or the maximum negative
value ($8000_0000), as shown in Table 3-5.
Table 3-5. MAC Unit Outputs with Saturation Enabled
EXT[3]  EXT[0]  MSP[15]  Result
0       0       1        $0_7FFF_FFFF
0       1       0        $0_7FFF_FFFF
0       1       1        $0_7FFF_FFFF
1       0       0        $F_8000_0000
1       0       1        $F_8000_0000
1       1       0        $F_8000_0000
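The decision made from the three tested bits can be sketched in Python as follows. This is an illustrative model with the SA bit assumed set: the function name is hypothetical, and the two bit combinations that do not appear in the rows above (all zeros and all ones) are assumed to leave the result unchanged.

```python
ACC_MASK = 0xF_FFFF_FFFF                 # 36-bit accumulator width

def mac_output_limit(result36):
    """Model the MAC output limiter acting on a 36-bit MAC result."""
    ext3 = (result36 >> 35) & 1          # EXT[3]
    ext0 = (result36 >> 32) & 1          # EXT[0]
    msp15 = (result36 >> 31) & 1         # MSP[15]
    if (ext3, ext0, msp15) in ((0, 0, 0), (1, 1, 1)):
        return result36 & ACC_MASK       # assumption: no saturation needed
    if ext3 == 0:
        return 0x0_7FFF_FFFF             # saturate to maximum positive value
    return 0xF_8000_0000                 # saturate to maximum negative value

for value in (0x0_7FFF_0000, 0x0_8000_0000, 0xF_7FFF_0000, 0xF_8000_0000):
    print(f"{value:09X} -> {mac_output_limit(value):09X}")
```

In this model EXT[3] alone selects the direction of saturation, and the other two bits only determine whether the result has left the 32-bit range.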
The MAC Output Limiter not only affects the results calculated by the instruction, but can also affect
condition code computation. See Appendix A.4.2, “Effects of the Operating Mode Register’s SA
Bit,” on page A-11 for more information.
Also, the MAC Output Limiter only affects operations performed in the data ALU. It has no effect on
instructions executed in other blocks of the core, such as the following:
• Bit Manipulation Instructions (Table 6-30 and Table 6-31 on page 6-26)
• Move instructions (Table 6-18 through Table 6-21)
• Looping instructions (Table 6-33 on page 6-27)
• Change of flow instructions (Table 6-32 on page 6-27)
• Control instructions (Table 6-34 on page 6-28)
NOTE:
The SA bit affects the TFR instruction when it is set, optionally limiting
data as it is transferred from one accumulator to another.
3.5 Rounding
The DSP56800 provides three instructions that can perform rounding — RND, MACR, and MPYR. The
RND instruction simply rounds a value in the accumulator register specified by the instruction, whereas
the MPYR or MACR instructions round the result calculated by the instruction in the MAC array. Each
rounding instruction rounds the result to a single-precision value so the value can be stored in memory or
in a 16-bit register. In addition, for instructions where the destination is one of the two accumulators, the
LSP of the destination accumulator (A0 or B0) is set to $0000.
The DSC core implements two types of rounding: convergent rounding and two’s-complement rounding.
For the DSP56800, the rounding point is between bits 16 and 15 of a 36-bit value; for the A accumulator, it
is between the A1 register’s LSB and the A0 register’s MSB. The usual rounding method rounds up any
value above one-half (that is, LSP > $8000) and rounds down any value below one-half (that is, LSP <
$8000). The question arises as to which way the number one-half (LSP = $8000) should be rounded. If it is
always rounded one way, the results will eventually be biased in that direction. Convergent rounding
solves the problem by rounding down if the number is even (bit 16 equals zero) and rounding up if the
number is odd (bit 16 equals one), whereas two’s-complement rounding always rounds this number up.
The type of rounding is selected by the rounding bit (R) of the operating mode register (OMR) in the
program controller.
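The two rounding methods differ only in how they treat the exact one-half case. The Python sketch below models that difference on a 36-bit accumulator image. The function name and the `convergent` flag are hypothetical stand-ins for the OMR rounding bit, and the sketch does not model overflow out of the extension bits.

```python
MASK36 = (1 << 36) - 1

def round_accumulator(acc, convergent=True):
    """Round a 36-bit accumulator to single precision and clear the LSP."""
    acc &= MASK36
    lsp = acc & 0xFFFF          # A0: the part below the rounding point
    upper = acc >> 16           # A2:A1: the part that is kept
    if lsp > 0x8000:
        upper += 1              # above one-half: round up
    elif lsp == 0x8000:
        # Exactly one-half: two's-complement rounding always rounds up;
        # convergent rounding rounds up only if bit 16 is one (odd).
        if not convergent or (upper & 1):
            upper += 1
    return (upper << 16) & MASK36

even_half = round_accumulator(0x0_0004_8000)                     # rounds down
odd_half = round_accumulator(0x0_0005_8000)                      # rounds up
twos_half = round_accumulator(0x0_0004_8000, convergent=False)   # rounds up
```

In every case the returned value has its low 16 bits cleared, as described for destinations that are accumulators.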
Case III: If A0 = $8000 (1/2), and the LSB of A1 = 0 (even), then round down (add nothing)
    Before rounding:  A2 = XX..XX   A1 = XXX...XXX0100   A0 = 1000........000
    After rounding:   A2 = XX..XX   A1 = XXX...XXX0100   A0 = 000.........000
Case IV: If A0 = $8000 (1/2), and the LSB = 1 (odd), then round up (add 1 to A1)
Once the rounding bit has been programmed in the OMR register, there is a delay of one instruction cycle
before the new rounding mode becomes active.
This chapter covers the architecture and programming model of the address generation unit, its addressing
modes, and a discussion of the linear and modulo arithmetic capabilities of this unit. It concludes with a
discussion of pipeline dependencies related to the address generation unit.
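[Figure: address generation unit block diagram. Elements shown: CGDB(15:0), SP, M01, N, address
registers R0-R3, the modulo arithmetic unit, and an incrementer/decrementer used by R3 only.]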
All four address pointer registers and the SP are used in generating addresses in the register indirect
addressing modes. The offset register can be used by all four address pointer registers and the SP, whereas
the modulo register can be used by the R0 or by both the R0 and R1 pointer registers.
Whereas all the address pointer registers and the SP can be used in many addressing modes, there are some
instructions that only work with a specific address pointer register. These cases are presented in Table 4-5
on page 4-9.
The address generation unit is connected to four major buses: CGDB, XAB1, XAB2, and PAB. The
CGDB is used to read or write any of the address generation unit registers. The XAB1 and XAB2 provide
a primary and secondary address, respectively, to the X data memory, and the PAB provides the address
when accessing the program memory.
A block diagram of the address generation unit is shown in Figure 4-1, and its corresponding programming
model is shown in Figure 4-2. The blocks and registers are explained in the following subsections.
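[Figure: address generation unit programming model. Each register is 16 bits wide (bits 15-0): pointer
registers R0, R1, R2, and R3; stack pointer SP; offset register N; modifier register M01.]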
NOTE:
If the N address register is changed with a MOVE instruction, this
register’s contents will be available for use on the immediately following
instruction. In this case the instruction that writes the N address register
will be stretched one additional instruction cycle. This is true for the case
when the N register is used by the immediately following instruction; if N
is not used, then the instruction is not stretched an additional cycle. If the
N address register is changed with a bit-field instruction, the new contents
will not be available for use until the second following instruction.
Other assembler forcing operators are available for jump and branch instructions, as shown in Table 4-2.
Table 4-2. Jump and Branch Forcing Operators
HHH Y, Y1, Y0
HHHH X0
1. The register field notations found in the middle column are explained in more detail in
Table 6-16 on page 6-15 and Table 6-15 on page 6-14.
modes specify that the operand is (or operands are) in memory and provide the specific address(es) of the
operand(s). A portion of the data bus movement field in the instruction specifies the memory reference to
be performed. The type of address arithmetic used is specified by the address modifier register.
Table 4-4. Addressing Mode — Address Register Indirect
Instructions that access P memory are not allowed when the XP bit in the OMR is set (that is, when the
instructions are executing from data memory).
1. Rj represents one of the four pointer registers R0-R3; Rn is any of the AGU address registers
R0-R3 or SP.
Address-register-indirect modes may require an offset and a modifier register for use in address
calculations. The address register (Rj or SP) supplies the base pointer, the shared offset register specifies
an optional offset from this pointer, and the modifier register specifies the type of arithmetic performed.
Some addressing modes are only available with certain address registers (Rn). For example, although all
address registers support the “indexed by long displacement” addressing mode, only the R2 address
register supports the “indexed by short displacement” addressing mode. For instructions where two reads
are performed from the X data memory, the second read using the R3 pointer must always be from on-chip
memory. The addressed register sets are summarized in Table 4-5.
The type of arithmetic to be performed is not encoded in the instruction, but it is specified by the address
modifier register (M01 for the DSP56800 core). It indicates whether linear or modulo arithmetic is
performed when doing address calculations. In the case where there is not a modifier register for a
particular register set (R2 or R3), linear addressing is always performed. For address calculations using R0,
the modifier register is always used; for calculations using R1, the modifier register is optionally used.
Each address-register-indirect addressing mode is illustrated in the following subsections.
[Figures: before/after register and memory diagrams for the addressing-mode examples. Values
recoverable from the memory-write diagrams:

Pointer   Before   After    N        Memory Written     Source Register
R0        $1000    $1000    (n/a)    X:$1000 = $1234    A = $0_1234_5678
R1        $2500    $2501    (n/a)    X:$2500 = $FEDC    B = $A_6543_FEDC
R1        $4735    $4734    (n/a)    X:$4735 = $6543    B = $0_6543_FEDC
R2        $3200    $3204    $0004    X:$3200 = $5555    Y = $5555_AAAA
R0        $7000    $7000    $0003    X:$7003 = $EDCB    A = $F_EDCB_A987
R2        $7000    $7000    $4567    X:$7003 = $EDCB    A = $F_EDCB_A987
R0        $7000    $7000    $4567    X:$80CF = $EDCB    A = $F_EDCB_A987

The remaining diagrams show registers being loaded from an undefined state: B1 = $A987 (B2 and B0
unchanged), B = $0_1234_0000, B = $F_A987_0000, N = $0027, X0 = $FFC6, B1 = $001C (B2 and B0
unchanged), B = $0_001C_0000, and B = $F_FFC6_0000.]
1. I/O short addressing mode is used when the peripheral registers are mapped to the last 64 locations
in X memory. When the IP-BUS (or PGDB) interface maps these registers outside the
X:$FFC0-X:$FFFF range, they are accessed with another suitable standard addressing mode.
[Figures: before/after diagrams for three further addressing-mode examples. Recoverable values: X0
changes from undefined to $1234 while X:$5079 = $1234 and R2 = $ABCD are unchanged; X:$0003
changes from undefined to $ABCD; R3 changes from undefined to $5678 while X:$FFFB = $5678 is
unchanged.]
Columns: Addressing Mode; Uses M01 [1]; Operand Reference (S [2], C [3], D [4], A [5], P [6], X [7],
XX [8]); Assembler Syntax
Register Direct
Software stack No X
No update No X (Rn)
Implicit No X X X X
1. The M01 modifier can only be used on the R0/N/M01 or R1/N/M01 register sets
2. Hardware stack reference
3. Program controller register reference
4. Data ALU register reference
5. Address Generation Unit register reference
6. Program memory reference
7. X memory reference
8. Dual X memory read
The modulo arithmetic unit in the AGU simplifies the use of a circular buffer by handling the address
pointer wrapping for you. After establishing a buffer in memory, the R0 and R1 address pointers can be
made to wrap in the buffer area by programming the M01 register.
Modulo arithmetic is enabled by programming the M01 register with a value that is one less than the size
of the circular buffer. See Section 4.3.2.2, “Configuring Modulo Arithmetic,” for exact details on
programming the M01 register. Once enabled, updates to the R0 or R1 registers using one of the
post-increment or post-decrement addressing modes are performed with modulo arithmetic, and will wrap
correctly in the circular buffer.
The address range within which the address pointers will wrap is determined by the value placed in the
M01 register and the address contained within one of the pointer registers. Due to the design of the modulo
arithmetic unit, the address range is not arbitrary, but limited based on the value placed in M01. The lower
bound of the range is calculated by taking the size of the buffer, rounding it up to the next highest power of
two, and then rounding the address contained in the R0 or R1 pointers down to the nearest multiple of that
value.
For example, for a buffer size of M, a value 2^k is calculated such that 2^k ≥ M. This is the buffer size
rounded up to the next power of two. For a value M of 37, 2^k would be 64 (k = 6). The lower boundary of
the range in which the pointer registers will wrap is the value in the R0 or R1 register with the low-order k
bits all set to zero, effectively rounding the value down to the nearest multiple of 2^k (64 in this case). This
is shown in Figure 4-16 on page 4-27.
[Figure: circular buffer in memory. Labels shown: $00B0, "(Unavailable Addresses)", "Upper Boundary:
$00A4", "Lower Bound + Size - 1 = Upper Bound", and "Circular Buffer".]
When modulo arithmetic is performed on the buffer pointer register, only the low-order k bits are
modified; the upper 16 - k bits are held constant, fixing the address range of the buffer. The algorithm used
to update the pointer register (R0 in this case) is as follows:
R0[15:k] = R0[15:k]
R0[k-1:0] = (R0[k-1:0] + offset) MOD (M01 + 1)
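The update rule above can be checked with a short model. This Python sketch assumes the pointer already lies inside the buffer (behaviour for pointers in the unavailable range is not modelled), and the function name is hypothetical.

```python
def modulo_update(pointer, offset, m01):
    """Model of the modulo pointer update for R0 (or R1 when enabled)."""
    size = (m01 & 0x3FFF) + 1            # low 14 bits of M01 hold size - 1
    k = (size - 1).bit_length()          # smallest k with 2**k >= size
    low_mask = (1 << k) - 1
    base = pointer & ~low_mask & 0xFFFF  # upper 16 - k bits are held constant
    low = ((pointer & low_mask) + offset) % size
    return base | low

# 37-location buffer occupying $0080 through $00A4 (M01 = 36)
wrapped_up = modulo_update(0x00A4, 1, 36)     # upper boundary + 1
wrapped_down = modulo_update(0x0080, -1, 36)  # lower boundary - 1
```

Python's `%` operator returns a non-negative result for a positive modulus, so negative offsets wrap to the top of the buffer without extra handling.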
Note that this algorithm can result in some memory addresses being unavailable. If the size of the buffer is
not an exact power of two, there will be a range of offsets from the lower boundary, between M and 2^k - 1
(37 and 63 in our example), that are not addressable. Section 4.3.2.7.3, “Memory Locations Not Available
for Modulo Buffers,” addresses this issue in greater detail.
$0000 (Reserved) —
$4000 (Reserved) —
$7FFF (Reserved) —
$8000 (Reserved) —
$C000 (Reserved) —
$FFFE (Reserved) —
The high-order two bits of the M01 register determine the arithmetic mode for R0 and R1. A value of 00
for M01[15:14] selects modulo arithmetic for R0. A value of 10 for M01[15:14] selects modulo arithmetic
for both R0 and R1. A value of 11 disables modulo arithmetic. The remaining 14 bits of M01 hold the size
of the buffer minus one.
NOTE:
The reserved values ($0000, $4000-$8000, and $C000-$FFFE) should not
be used. The behavior of the modulo arithmetic unit is undefined for these
values, and may result in erratic program execution.
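The encoding described above can be summarised as a decoder. The Python sketch below is a reading aid only: the function name and the returned labels are hypothetical, and it treats $FFFF as the single non-reserved value with both high-order bits set.

```python
def decode_m01(m01):
    """Return (arithmetic mode, buffer size) for a 16-bit M01 value."""
    m01 &= 0xFFFF
    if m01 == 0xFFFF:
        return ("linear", None)          # modulo arithmetic disabled
    if m01 == 0x0000 or 0x4000 <= m01 <= 0x8000 or 0xC000 <= m01 <= 0xFFFE:
        raise ValueError("reserved M01 value: behavior is undefined")
    size = (m01 & 0x3FFF) + 1            # low 14 bits hold size - 1
    if m01 & 0x8000:
        return ("modulo R0 and R1", size)
    return ("modulo R0", size)

r0_only = decode_m01(0x0004)
both = decode_m01(0x8004)
```

Rejecting the reserved ranges up front mirrors the note above, which warns that the hardware behaviour for those values is undefined.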
(Rn) (Rn)+
(Rn)- (Rn)+N
(Rn+N) (Rn+xxxx)
As noted in the preceding discussion, modulo arithmetic is only supported for the R0 and R1 address
registers.
[Figure: five-location circular buffer with its top at $0804 and R0 = $0800. M01 Register = Size - 1 =
5 - 1 = $0004.]
The location of the buffer in memory is determined by the value of the R0 pointer when it is used to access
memory. The size of the memory buffer (five in this case) is rounded up to the nearest power of two (eight
in this case). The value in R0 is then rounded down to the nearest multiple of eight. For the base address to
be X:$0800, the initial value of R0 must be in the range X:$0800 – X:$0804. Note that the initial value of
R0 does not have to be X:$0800 to establish this address as the lower bound of the buffer. However, it is
often convenient to set R0 to the beginning of the buffer. The source code in Example 4-1 shows the
initialization of the example buffer.
Example 4-1. Initializing the Circular Buffer
MOVE #(5-1),M01 ; Initialize the buffer for five locations
MOVE #$0800,R0 ; R0 can be initialized to any location
; within the buffer. For simplicity, R0
; is initialized to the value of the lower
; boundary
The buffer is used simply by accessing it with MOVE instructions. The effect of modulo address
arithmetic becomes apparent when the buffer is accessed multiple times, as in Example 4-2 on page 4-30.
Example 4-2. Accessing the Circular Buffer
MOVE X:(R0)+,X0 ; First time accesses location $0800
; and bumps the pointer to location $0801
MOVE X:(R0)+,X0 ; Second accesses at location $0801
MOVE X:(R0)+,X0 ; Third accesses at location $0802
MOVE X:(R0)+,X0 ; Fourth accesses at location $0803
MOVE X:(R0)+,X0 ; Fifth accesses at location $0804
; and bumps the pointer to location $0800
For the first several memory accesses, the buffer pointer is incremented as expected, from $0800 to $0801,
$0802, and so forth. When the pointer reaches the top of the buffer, rather than incrementing from $0804 to
$0805, the pointer value “wraps” back to $0800.
The behavior is similar when the buffer pointer register is incremented by a value greater than one.
Consider the source code in Example 4-3, where R0 is post-incremented by three rather than one. The
pointer register correctly “wraps” from $0803 to $0801 — the pointer does not have to land exactly on the
upper and lower bound of the buffer for the modulo arithmetic to wrap the value properly.
Example 4-3. Accessing the Circular Buffer with Post-Update by Three
MOVE #(5-1),M01 ; Initialize the buffer for five locations
MOVE #$0800,R0 ; Initialize the pointer to $0800
MOVE #3,N ; Initialize “bump value” to 3
NOP
NOP
MOVE X:(R0)+N,X0 ; First time accesses location $0800
; and bumps the pointer to location $0803
MOVE X:(R0)+N,X0 ; Second accesses at location $0803
; and wraps the pointer around to $0801
In addition, the pointer register does not need to be incremented; it could be decremented instead.
Instructions that post-decrement the buffer pointer also work correctly. Executing the instruction MOVE
X:(R0)-,X0 when the value of R0 is $0800 will correctly set R0 to $0804.
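The pointer sequences in Example 4-2, Example 4-3, and the post-decrement case can be reproduced with a small trace. This is a Python model of the pointer arithmetic only, with hypothetical function names, and it does not model memory contents or instruction timing.

```python
def post_update(r0, step, size):
    """Advance a pointer by step within a modulo buffer of the given size."""
    k = (size - 1).bit_length()                  # 2**k is size rounded up
    mask = (1 << k) - 1
    return (r0 & ~mask & 0xFFFF) | (((r0 & mask) + step) % size)

def trace(start, step, size, accesses):
    """Return the addresses accessed and the final pointer value."""
    addresses, r0 = [], start
    for _ in range(accesses):
        addresses.append(r0)                     # access uses the current pointer
        r0 = post_update(r0, step, size)         # then the pointer is updated
    return addresses, r0

by_one = trace(0x0800, 1, 5, 5)      # Example 4-2
by_three = trace(0x0800, 3, 5, 2)    # Example 4-3
decrement = trace(0x0800, -1, 5, 1)  # MOVE X:(R0)-,X0 with R0 = $0800
```

The three results end with the pointer at $0800, $0801, and $0804 respectively, in line with the comments in the assembly examples.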
2. Find the nearest power of two greater than or equal to the circular buffer size. In this
example, the value would be 2^k ≥ 37, which gives us a value of k = 6.
3. From k, derive the characteristics of the lower boundary of the circular buffer. Since the “k”
least-significant bits of the address of the lower boundary must all be 0s, the buffer
base address must be some multiple of 2^k. In this case, k = 6, so the base address is some
multiple of 2^6 = 64.
4. Locate the circular buffer in memory.
— The location of the circular buffer in memory is determined by the upper 16 - k bits of the
address pointer register used in a modulo arithmetic operation. If there is an open area of
memory from locations 111 to 189 ($006F to $00BD), for example, then the addresses of the
lower and upper boundaries of the circular buffer will fit in this open area for J = 2:
Lower boundary = (J x 64) = (2 x 64) = 128 = $0080
Upper boundary = (J x 64) + 36 = (2 x 64) + 36 = 164 = $00A4
— The exact area of memory in which a circular buffer is prepared is specified by picking a value
for the address pointer register, R0 or R1, whose value is inclusively between the desired lower
and upper boundaries of the circular buffer. Thus, selecting a value of 139 ($008B) for R0
would locate the circular buffer between locations 128 and 164 ($0080 to $00A4) in memory
since the upper 10 (16 - k) bits of the address indicate that the lower boundary is 128 ($0080).
— In summary, the size and exact location of the circular buffer is defined once a value is assigned
to the M01 register and to the address pointer register (R0 or R1) that will be used in a modulo
arithmetic calculation.
5. Determine the upper boundary of the circular buffer, which is the lower boundary + #
locations - 1.
6. Select a value for the offset register if it is used in modulo operations.
— If the offset register is used in a modulo arithmetic calculation, it must be selected as follows:
|N| ≤ M01 + 1 [where |N| refers to the absolute value of the contents of the offset register]
— The special case where N is a multiple of the block size, 2^k, is discussed in Section 4.3.2.6,
“Wrapping to a Different Bank.”
7. Perform the modulo arithmetic calculation.
— Once the appropriate registers are set up, the modulo arithmetic operation occurs when an
instruction with any of the following addressing modes using the R0 (or R1, if enabled) register
is executed:
(Rn)
(Rn)+
(Rn)-
(Rn)+N
(Rn+N)
(Rn+xxxx)
— If the result of the arithmetic calculation would exceed the upper or lower bound, then wrapping
around is correctly performed.
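The placement steps above can be worked through numerically. The Python sketch below uses the 37-location example from the text. The function names are hypothetical, and the offset check simply encodes the |N| ≤ M01 + 1 condition from step 6.

```python
def buffer_bounds(size, pointer):
    """Return (k, lower boundary, upper boundary) for a modulo buffer."""
    k = (size - 1).bit_length()                   # smallest k with 2**k >= size
    lower = pointer & ~((1 << k) - 1) & 0xFFFF    # clear the k low-order bits
    upper = lower + size - 1                      # lower boundary + locations - 1
    return k, lower, upper

def offset_is_valid(n, m01):
    """Check the step 6 condition on the offset register."""
    return abs(n) <= m01 + 1

m01 = 37 - 1                        # M01 holds the buffer size minus one
bounds = buffer_bounds(37, 139)     # R0 = 139 ($008B)
```

Any pointer value from 128 to 164 gives the same bounds, which is why R0 need not start at the lower boundary.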
If |N| is greater than M01, the result is data dependent and unpredictable except for the special case where
N = L*(2^k), a multiple of the block size, 2^k, where L is a positive integer. For this special case when using
the (Rn)+N addressing mode, the pointer Rn will be updated using linear arithmetic to the same relative
address that is L blocks forward in memory (see Figure 4-18). Note that this case requires that the offset N
must be a positive two’s-complement integer.
[Figure: two consecutive blocks of 2^k locations, each containing a buffer of M locations. Label shown:
(Rn) + N MOD M01, where N = 2^k (L = 1).]
This technique is useful in sequentially processing multiple tables or N-dimensional arrays. The special
modulo case of (Rn)+N with N = L*(2^k) is useful for performing the same algorithm on multiple blocks of
data in memory (e.g., implementing a bank of parallel IIR filters).
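The bank-wrapping case can be added to a pointer model as one extra branch. In this Python sketch the function name is hypothetical, and offsets that the text describes as unpredictable are rejected with an exception rather than modelled. The threshold used is |N| > M01 + 1, which is an assumption based on the offset-selection rule.

```python
def post_update_by_n(rn, n, m01):
    """Model of the (Rn)+N pointer update, including the multiple-of-2**k case."""
    size = (m01 & 0x3FFF) + 1
    block = 1 << (size - 1).bit_length()     # block size 2**k
    if n > 0 and n % block == 0:
        # N = L * 2**k: linear update to the same relative address,
        # L blocks forward in memory.
        return (rn + n) & 0xFFFF
    if abs(n) > size:
        raise ValueError("offset too large: result is unpredictable")
    low_mask = block - 1
    return (rn & ~low_mask & 0xFFFF) | (((rn & low_mask) + n) % size)

next_bank = post_update_by_n(0x008B, 64, 36)     # L = 1
two_banks = post_update_by_n(0x008B, 128, 36)    # L = 2
```

In both calls the pointer keeps its offset of $0B from the start of its block, so the same code can step through several equally sized buffers.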
In Example 4-5 there is no pipeline dependency since the R2 and N registers, used in the address
calculation, are not written in the previous instruction. Since there is no dependency, no extra instruction
cycles are inserted.
Example 4-5. No Dependency with an Address Pointer Register
MOVE #$7,R1 ; Write to R1 register
MOVE X:(R2)+N,X0 ; R1 not used in this instruction
In Example 4-6 there is no pipeline dependency since there is no address calculation performed in the
second instruction. Instead, the R1 register is used as the source operand in a MOVE instruction, for which
there is no pipeline dependency. Since there is no dependency, no extra instruction cycles are inserted.
Example 4-7 represents a special case. For the X:(Rn+xxxx) addressing mode, there is no pipeline
dependency even if the same Rn register is written on the previous cycle. This is true for R0-R3 as well as
the SP register. Since there is no dependency, no extra instruction cycles are inserted.
Example 4-7. No Dependency with (Rn+xxxx)
MOVE #$7,R1 ; Write to R1 register
MOVE X:(R1+$3456),X0 ; X:(Rn+xxxx) addressing mode using R1
In Example 4-8 there is a pipeline dependency since the N register is used in the second instruction. This is
true for using N to update R0-R3 as well as the SP register. For the case where a dependency is caused by
a write to the N register, the DSC core automatically stalls the pipeline by inserting one extra instruction
cycle. Thus, this sequence is allowed. This dependency also exists for the (Rn+N) addressing mode.
Example 4-8. Dependency with a Write to the Offset Register
MOVE #$7,N ; Write to the N register
MOVE X:(R2)+N,X0 ; N register used in address arithmetic calculation
In Example 4-9 there is a pipeline dependency since the N register is used in the second instruction. This is
true for using N to update R0-R3 as well as the SP register. For the case where a dependency is caused by
a bit-field operation on the N register, this sequence is not allowed and is flagged by the assembler. This
sequence may be fixed by rearranging the instructions or inserting a NOP between the two instructions.
This dependency only applies to the BFSET, BFCLR, or BFCHG instructions. There is no dependency for
the BFTSTH, BFTSTL, BRCLR, or BRSET instructions. This dependency also exists for the (Rn+N)
addressing mode.
Example 4-9. Dependency with a Bit-Field Operation on the Offset Register
BFSET #$7,N ; Bit-field operation on the N register
MOVE X:(R2)+N,X0 ; N register used in address arithmetic calculation
In Example 4-10 there is a pipeline dependency since the address pointer register written in the first
instruction is used in an address calculation in the second instruction. For the case where a dependency is
caused by a write to one of these registers, this sequence is not allowed and is flagged by the assembler.
This sequence may be fixed by rearranging the instructions or inserting a NOP between the two
instructions.
Example 4-10. Dependency with a Write to an Address Pointer Register
MOVE #$7,R2 ; Write to the R2 register
MOVE X:(R2)+,X0 ; R2 register used in address
; arithmetic calculation
In Example 4-11 there is a pipeline dependency since the M01 register written in the first instruction is
used in an address calculation in the second instruction. For the case where a dependency is caused by a
write to the M01 register, this sequence is not allowed and is flagged by the assembler. This sequence may
be fixed by rearranging the instructions or inserting a NOP between the two instructions.
Example 4-11. Dependency with a Write to the Modifier Register
MOVE #$7,M01 ; Write to the M01 register
MOVE X:(R0)+,X0 ; M01 register used in address arithmetic calculation
In Example 4-12 there is a pipeline dependency since the SP register written in the first instruction is used
by the immediately following JSR instruction to store the subroutine return address. The stack pointer will
not be updated with the immediate value in this case. This sequence may be fixed by inserting a NOP
between the two instructions.
Example 4-12. Dependency with a Write to the Stack Pointer Register
MOVE #$3800,SP ; Write to the SP register
JSR LABEL ; SP implicitly used to save the return address
; of the subroutine call
In Example 4-13 there is a pipeline dependency due to contention in the LF bit of the SR register. During
the first execution cycle of the BFSET instruction, the SR, whose LF bit is zero, is read. At the same time,
the first operand of the DO instruction is fetched. During the second execution cycle of the BFSET
instruction, the SR’s content is modified and written back to the SR. This is also the DO instruction decode
cycle, when the LF bit is set. In this case, the LF bit is first set by the DO decode, then cleared by the
BFSET SR modification. A cleared LF bit signals the end of a DO loop, so the DO loop is executed only
once. This sequence can be fixed by inserting a NOP instruction between these two instructions.
Example 4-13. Dependency with a Bit-Field Operation and DO Loop
BFSET #$0200,SR ; Write to the SR register
DO #8,ENDLOOP ; Repeat 8 times body of loop
; (instructions)
ENDLOOP:
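The outcomes of Example 4-5 through Example 4-13 can be collected into a single lookup. The Python sketch below is a simplified summary of those examples, not an assembler rule set: the function name, argument layout, and result strings are hypothetical, and only the cases described above are covered.

```python
NONE = "no dependency"
STALL = "allowed: core inserts one extra instruction cycle"
ILLEGAL = "not allowed: flagged by the assembler"
NOP_NEEDED = "insert a NOP between the two instructions"

MODIFYING_BITFIELD = {"BFSET", "BFCLR", "BFCHG"}

def classify(first_op, written, second_op, mode=None, pointer=None):
    """Classify the dependency between two adjacent instructions."""
    uses_n = mode in ("(Rn)+N", "(Rn+N)")
    if written == "N" and uses_n:
        if first_op == "MOVE":
            return STALL
        return ILLEGAL if first_op in MODIFYING_BITFIELD else NONE
    if written == "SP" and second_op == "JSR":
        return NOP_NEEDED
    if written == "SR" and first_op in MODIFYING_BITFIELD and second_op == "DO":
        return NOP_NEEDED
    if mode is None:
        return NONE                      # no address calculation performed
    if written == pointer:
        return NONE if mode == "(Rn+xxxx)" else ILLEGAL
    if written == "M01" and pointer == "R0":   # only the R0 case is modelled
        return ILLEGAL
    return NONE

example_4_8 = classify("MOVE", "N", "MOVE", "(Rn)+N", "R2")
example_4_9 = classify("BFSET", "N", "MOVE", "(Rn)+N", "R2")
```

A table like this is easiest to keep correct when each branch maps to exactly one example in the text, which is the approach taken here.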
[Figure: program controller block diagram. Elements shown: PAB, PDB, 16-bit incrementer, instruction
decoder, hardware stack (HWS0, HWS1) with the LF and NL bits, control signals, looping control (LA,
LC), interrupt control (IPR, interrupt request), OMR (external mode select pins, control bits to the DSC
core), and SR (condition codes from the data ALU, status and control bits to the DSC core).]
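[Figure: program controller programming model. Registers shown: program counter (PC, bits 15-0),
status register (SR, made up of MR in bits 15-8 and CCR in bits 7-0), operating mode register (OMR,
bits 15-0), loop counter (LC, bits 12-0), and loop address (LA, bits 15-0).]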
Details of interrupt arbitration and the exception processing state are discussed in Section 7.3, “Exception
Processing State,” on page 7-5. The reset processing state is discussed in Section 7.1, “Reset Processing
State,” on page 7-1.
[Figure: accessing the LC register. When LC is used as a source, its 13 bits appear in bits 12-0 of the
CGDB word and bits 15-13 are zero extended, because no bits are present in LC for those positions. When
LC is used as a destination, it is loaded from the 13 LSBs of the CGDB word; bits 15-13 are not used.]
This register is not stacked by a DO instruction and not unstacked by end-of-loop processing, as is done on
other Freescale DSCs. Section 5.3, “Program Looping,” discusses what occurs when the loop count is zero.
See Section 8.6.4, “Nested Loops,” on page 8-22 for a discussion of nesting loops in software.
The upper three bits of this register will read as zero during DSC read operations and should be written as
zero to ensure future compatibility.
Status Register (SR): read/write, reset = $0300

Bit     15   14-10   9    8    7    6    5    4    3    2    1    0
Field   LF   *       I1   I0   SZ   L    E    U    N    Z    V    C

LF: Loop Flag
I1,I0: Interrupt Mask
SZ: Size
L: Limit
E: Extension
U: Unnormalized
N: Negative
Z: Zero
V: Overflow
C: Carry
* Indicates reserved bits that are read as zero and should be written with zero for future compatibility
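A status register value can be unpacked into its named fields with a small helper. In this Python sketch the bit positions are read off the register layout (LF in bit 15, I1 and I0 in bits 9 and 8, and the eight condition code bits in bits 7-0), which is consistent with the reset value $0300. The names `SR_FIELDS` and `decode_sr` are hypothetical.

```python
SR_FIELDS = {
    "LF": 15,           # loop flag
    "I1": 9, "I0": 8,   # interrupt mask
    "SZ": 7, "L": 6, "E": 5, "U": 4,
    "N": 3, "Z": 2, "V": 1, "C": 0,
}

def decode_sr(sr):
    """Return a dict mapping each SR field name to its bit value."""
    return {name: (sr >> bit) & 1 for name, bit in SR_FIELDS.items()}

reset_state = decode_sr(0x0300)
set_fields = sorted(name for name, value in reset_state.items() if value)
```

At reset only the two interrupt mask bits are set, so `set_fields` contains just I0 and I1.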
I1   I0   Exceptions Permitted   Exceptions Masked
0    0    (Reserved)             (Reserved)
0    1    IPL 0, 1               None
1    0    (Reserved)             (Reserved)
1    1    IPL 1                  IPL 0
Operating Mode Register (OMR): read/write, reset = $0000

Fields, listed from the most significant to the least significant: NL, CC, SD, R, SA, EX, MB, MA. The
remaining eight bits are reserved (*).

NL: Nested Looping
CC: Condition Codes
SD: Stop Delay
R: Rounding
SA: Saturation
EX: External X Memory
MA,MB: Operating Mode
* Indicates reserved bits that are read as zero and should be written with zero for future compatibility
Program Memory
Configuration
MB MA Chip Operating Mode Reset Vector
(consult specific 56800
Family device manual)
The bootstrap modes are used to load an on-chip program RAM, from external memory or through a
peripheral, upon exiting reset. Operating modes 0 and 1 would typically be different for a program
FLASH part because no bootstrapping operation is required for a FLASH part. An example of possible
operating modes for a program FLASH part is shown in Table 5-3 on page 5-11.
Table 5-3. Program FLASH Operating Modes
Program Memory
MB MA Chip Operating Mode Reset Vector
Configuration
The MB and MA bit values are typically established on reset from an external input. Once the chip leaves
reset, they can be changed under software control. For more information about how they are configured on
reset, consult the appropriate device’s user’s manual.
Saturation is performed by a dedicated circuit inside the MAC unit. The saturation logic operates by
checking 3 bits of the 36-bit result out of the MAC unit — EXT[3], EXT[0], and MSP[15]. When the SA
bit is set, these 3 bits determine if saturation is performed on the MAC unit’s output and whether to
saturate to the maximum positive or negative value, as shown in Table 5-4.
Table 5-4. MAC Unit Outputs With Saturation Mode Enabled (SA = 1)
EXT[3]   EXT[0]   MSP[15]   Result
  0        0        0       (Unchanged)
  0        0        1       $0 7FFF FFFF
  0        1        0       $0 7FFF FFFF
  0        1        1       $0 7FFF FFFF
  1        0        0       $F 8000 0000
  1        0        1       $F 8000 0000
  1        1        0       $F 8000 0000
  1        1        1       (Unchanged)
NOTE:
Saturation mode is always disabled during the execution of the following
instructions: ASLL, ASRR, LSLL, LSRR, ASRAC, LSRAC, IMPY16,
MPYSU, MACSU, AND, OR, EOR, NOT, LSL, LSR, ROL, and ROR.
For these instructions, no saturation is performed at the output of the MAC
unit.
this bit set to one. Otherwise, the chip will not generate the unsigned
conditions correctly.
The effects of the CC bit on the condition codes generated by data ALU arithmetic operations are
discussed in more detail in Section 3.6, “Condition Code Generation,” on page 3-33.
NL LF DO Loop Status
0 0 No DO loops active
1 0 (Illegal combination)
If both the NL and LF bits are set (that is, two DO loops are active) and a DO instruction is executed, a
hardware-stack-overflow interrupt occurs because there is no more space on the hardware stack to support
a third DO loop.
The NL bit is also affected by any accesses to the hardware stack register. Any MOVE instruction that
writes this register copies the old contents of the LF bit into the NL bit and then sets the LF bit. Any reads
of this register, such as from a MOVE or TSTW instruction, copy the NL bit into the LF bit and then clear
the NL bit.
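The way hardware stack accesses shift the LF and NL bits can be modelled as a two-bit state machine. This Python sketch tracks only the two flags, not the stack contents, and the class and method names are hypothetical.

```python
class LoopFlags:
    """Model of the LF and NL bits as the hardware stack is written and read."""

    def __init__(self):
        self.lf = 0
        self.nl = 0

    def write_hws(self):
        """A write to the hardware stack register: LF moves to NL, LF is set."""
        self.nl = self.lf
        self.lf = 1

    def read_hws(self):
        """A read of the hardware stack register: NL moves to LF, NL is cleared."""
        self.lf = self.nl
        self.nl = 0

    def state(self):
        return (self.nl, self.lf)

flags = LoopFlags()
history = [flags.state()]
for access in (flags.write_hws, flags.write_hws, flags.read_hws, flags.read_hws):
    access()
    history.append(flags.state())
```

Two writes followed by two reads walk the (NL, LF) pair from (0, 0) up to (1, 1) and back, never passing through the illegal (1, 0) combination.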
data memory pointed to by the stack pointer (SP) register. The PUSH instruction macro (two instruction
cycles) pre-increments the SP register, and the POP instruction (one instruction cycle) will post-decrement
the SP register.
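The pre-increment and post-decrement behaviour means the stack grows toward higher addresses. The Python sketch below models this with a dictionary standing in for X data memory. The class name and the starting SP value are hypothetical, and instruction timing is not modelled.

```python
class SoftwareStack:
    """Model of the software stack pointer behaviour for PUSH and POP."""

    def __init__(self, sp):
        self.sp = sp & 0xFFFF
        self.x_memory = {}               # stands in for X data memory

    def push(self, value):
        self.sp = (self.sp + 1) & 0xFFFF         # pre-increment SP
        self.x_memory[self.sp] = value & 0xFFFF  # then store at X:(SP)

    def pop(self):
        value = self.x_memory[self.sp]           # read from X:(SP)
        self.sp = (self.sp - 1) & 0xFFFF         # then post-decrement SP
        return value

stack = SoftwareStack(0x3800)
stack.push(0x1234)
stack.push(0x5678)
sp_after_pushes = stack.sp
first_popped = stack.pop()
```

After both values are popped the stack pointer returns to its starting value, so balanced pushes and pops leave SP unchanged.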
The program counter and the SR are pushed on this stack for subroutine calls and interrupts. These
registers are pulled from the stack on return from a subroutine with the RTS instruction (which pulls and
discards the original SR) and on return from an interrupt service routine with the RTI instruction (which
restores the entire SR from the stack).
The software stack is also used for nesting hardware DO loops in software on the DSP56800 architecture.
On the DSP56800 architecture, the user must push and pop the LA and LC registers explicitly if DO loops
are nested. In this case, the software stack is typically used for this purpose, as demonstrated in
Section 8.6.4, “Nested Loops,” on page 8-22. The hardware stack is used, however, for stacking the
address of the first instruction in the loop. Because this stack is implemented using locations in the X data
memory, there is no limit to the number of interrupts or jump-to subroutines or combinations of these that
can be accommodated by this stack.
NOTE:
Care must be taken to allocate enough space in the X data memory so that
stack operations do not overlap other areas of data used by the program.
Similarly, it may be desirable to locate the stack in on-chip memory to
avoid delays due to wait states or bus arbitration.
See Section 8.5, “Multiple Value Pushes,” on page 8-19 and Section 8.8, “Parameters and Local
Variables,” on page 8-28 for recommended techniques for using the software stack.
NOTE:
REP loops are not interruptible since they are fetched only once. A DO
loop with a single instruction can be used in place of a REP instruction if
it is necessary to be able to interrupt while the loop is in progress.
For the case of REP looping with a register value, when the register
contains the value zero, then the instruction to be repeated is not executed
(as is desired in an application), and instruction flow continues with the
next sequential instruction. This is also true when an immediate value of
zero is specified.
5.3.2 DO Looping
The DO instruction is a two-word instruction that performs hardware looping on a block of instructions. It
executes this block of instructions the number of times specified either with a 6-bit unsigned value or
using the 13 least significant bits of a DSC core register. DO looping is interruptible and uses one location
on the hardware stack for each DO loop. For cases where an immediate value larger than 63 is desired for
the loop count, it is possible to use the technique presented in Section 8.6.1, “Large Loops (Count Greater
Than 63),” on page 8-20.
The program controller’s 13-bit loop count (LC) register and 16-bit loop address (LA) register are used to
implement no-overhead hardware program loops. When a program loop is initiated with the execution of a
DO instruction, the following events occur:
1. The LC and LA registers are loaded with values specified in the DO instruction.
2. The SR’s LF bit is set, and its old value is placed in the NL bit.
3. The address of the first instruction in the program loop is pushed onto the hardware stack.
A program loop begins execution after the DO instruction and continues until the program address fetched
equals the loop address register contents (the last address of program loop). The contents of the loop
counter are then tested for one. If the loop counter is not equal to one, the loop counter is decremented and
the top location in the DO Loop Stack is read (but not pulled) into the PC to return to the top of the loop. If
the loop counter is equal to one, the program loop is terminated by incrementing the PC, purging the stack
(pulling the top location and discarding the contents), and continuing with the instruction immediately
after the last instruction in the loop.
NOTE:
For the case of DO looping with a register value, when the register contains
the value zero, then the loop code is repeated 2^k times, where k = 13 is the
number of bits in the LC register (that is, 8192 times). If there is a
possibility that a register value may be less than or equal to zero, then the
technique outlined in Section 8.6.2, “Variable Count Loops,” on page 8-21
should be used. A DO loop with an immediate value of zero is not allowed.
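The end-of-loop test described above (compare LC with one, otherwise decrement) explains why a register value of zero yields 2^13 passes. The following Python sketch is a minimal model of that rule, not a description of the hardware implementation; the function name is hypothetical.

```python
LC_BITS = 13                     # width of the loop counter (LC) register
LC_MASK = (1 << LC_BITS) - 1

def do_loop_iterations(register_value):
    """Count how many times a DO loop body runs for a given register value."""
    lc = register_value & LC_MASK    # only the 13 LSBs are loaded into LC
    iterations = 0
    while True:
        iterations += 1              # execute the loop body once
        if lc == 1:                  # end-of-loop test: is LC equal to one?
            return iterations        # yes: the loop terminates
        lc = (lc - 1) & LC_MASK      # no: decrement LC (wraps within 13 bits)
```

With this model, a count of 5 gives five passes, while a count of zero wraps to the largest 13-bit value and runs 8192 passes before LC reaches one.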
The reason that nesting of hardware DO loops is supported is to provide for faster interrupt servicing.
When hardware DO loops are not nested, a second hardware stack location is left available for immediate
use by an interrupt service routine.
MOVE <any_DSCcore_register>,<X_Data_Memory>
MOVE <any_DSCcore_register>,<On_chip_peripheral_register>
MOVE <X_Data_Memory>,<any_DSCcore_register>
MOVE <On_chip_peripheral_register>,<any_DSCcore_register>
MOVE <immediate_value>,<any_DSCcore_register>
MOVE <immediate_value>,<X_Data_Memory>
MOVE <immediate_value>,<On_chip_peripheral_register>
For any MOVE instruction accessing X data memory or an on-chip memory-mapped peripheral register,
seven different addressing modes are supported. Additional addressing modes are available on the subset
of DSC core registers that are most frequently accessed, including the registers in the data ALU, and all
pointers in the address generation unit.
For all moves on the DSP56800, the syntax orders the source and destination as follows: SRC,DST. The
source of the data to be moved and the destination are separated by a comma, with no spaces either before
or after the comma.
The assembler syntax also specifies which memory is being accessed (program or data memory) on any
memory move. Table 6-1 shows the syntax for specifying the correct memory space for any memory
access; an example of a program memory access is shown where the address is contained in the register R2
and the address register is post-incremented after the access. The two examples for X data memory
accesses show an address-register-indirect addressing mode in the first example and an absolute address in
the second.
Table 6-1. Memory Space Symbols
The DSP56800 instruction set supports two additional types of moves — the single parallel move and the
dual parallel read. Both of these are considered “parallel moves” and are extremely powerful for DSC
algorithms and numeric computation.
The single parallel move allows an arithmetic operation and one memory move to be completed with one
instruction in one instruction cycle. For example, it is possible to add two numbers while reading a value
from memory or writing a value to memory in the same instruction.
Figure 6-1 illustrates a single parallel move, which uses one program word and executes in one instruction
cycle.
Figure 6-2 illustrates a double parallel move, which uses one program word and executes in one instruction
cycle.
MACR X0,Y0,A X:(R0)+N,Y0 X:(R3)-,X0
Both types of parallel moves use a subset of available DSP56800 addressing modes, and the registers
available for the move portion of the instruction are also a subset of the total set of DSC core registers.
These subsets include the registers and addressing modes most frequently found in high-performance
numeric computation and DSC algorithms. Also, the parallel moves allow a move to occur only with an
arithmetic operation in the data ALU. A parallel move is not permitted, for example, with a JMP, LEA, or
BFSET instruction.
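[Figure: core registers (AA0007). Data ALU input registers: X0, Y1, and Y0 (16 bits each; Y1:Y0 form Y). Accumulator registers: A (A2:A1:A0) and B (B2:B1:B0), each 36 bits wide with a 4-bit extension portion and two 16-bit portions. AGU registers: R1, R2, R3, SP, N, M01. Program controller registers: PC, MR, CCR, OMR, LC, LA.]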
Instruction Description
ADD Add
Instruction Description
CLR Clear
CMP Compare
NEG Negate
NORM Normalize1
RND Round
SUB Subtract
Instruction Description
OR Logical inclusive OR
Instruction Description
Instruction Description
NOTE:
Due to instruction pipelining, if an AGU register (Rj, N, SP, or M01) is
directly changed with a bit-field instruction, the new contents may not be
available for use until the second following instruction (see the restrictions
discussed in Section 4.4, “Pipeline Dependencies,” on page 4-33).
See Section 8.1.1, “Jumps and Branches,” on page 8-2 for other instructions that can be synthesized.
Instruction Description
page 6-29 and Table 6-36 on page 6-30 and are discussed in detail in Section 6.1, “Introduction to Moves
and Parallel Moves,” and Appendix A, “Instruction Set Details.” The LEA instruction is also included in
this instruction group.
NOTE:
There is a PUSH instruction macro, described in Section 8.5, “Multiple
Value Pushes,” on page 8-19, that can be used with the POP instruction
alias presented in Section 6.5.5, “POP Alias,” on page 6-13.
Instruction Description
NOTE:
Due to instruction pipelining, if an AGU register (Rj, SP, or M01) is
directly changed with a move instruction, the new contents may not be
available for use until the second following instruction. See the restrictions
discussed in Section 4.4, “Pipeline Dependencies,” on page 4-33.
Instruction Description
Instruction Description
BRA Branch
JMP Jump
NOP No operation
Desired Remapped
Operands Operands
Instruction Instruction
Desired Remapped
Operands Operands
Instruction Instruction
Note that for the ANDC instruction, a one’s-complement of the mask value is used when remapping to the
BFCLR instruction. For the NOTC instruction, all bits in the 16-bit mask are set to one.
In Example 6-2, an immediate value is logically ORed with a location in memory.
Example 6-2. Logical OR with a Data Memory Location
ORC #$00FF,X:$0400 ; Set all bits of lower byte in X:$0400
The assembler translates this instruction into BFSET #$00FF,X:$0400, which performs the same
operation. If the assembled code is later disassembled, it will appear as a BFSET instruction.
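The remapping of the logical aliases onto bit-field instructions can be sketched as follows. This Python model is illustrative: the function names are hypothetical, condition codes are ignored, and the mapping of EORC and NOTC onto BFCHG is an assumption (the text above states only the ANDC-to-BFCLR and ORC-to-BFSET cases and the all-ones NOTC mask).

```python
MASK16 = 0xFFFF

def remap_alias(mnemonic, mask=None):
    """Return the (bit-field instruction, 16-bit mask) an alias assembles to."""
    if mnemonic == "ANDC":
        return "BFCLR", ~mask & MASK16   # one's complement of the mask
    if mnemonic == "ORC":
        return "BFSET", mask & MASK16
    if mnemonic == "EORC":
        return "BFCHG", mask & MASK16    # assumed mapping
    if mnemonic == "NOTC":
        return "BFCHG", MASK16           # all 16 mask bits set to one
    raise ValueError(f"not a logical alias: {mnemonic}")

def apply_bitfield(instruction, mask, value):
    """Apply a bit-field instruction to a 16-bit value (condition codes ignored)."""
    if instruction == "BFSET":
        return (value | mask) & MASK16
    if instruction == "BFCLR":
        return value & ~mask & MASK16
    if instruction == "BFCHG":
        return (value ^ mask) & MASK16
    raise ValueError(f"unknown instruction: {instruction}")
```

For example, `remap_alias("ANDC", 0x00FF)` yields a BFCLR with mask `0xFF00`, so clearing the complemented mask gives the same result as ANDing with the original one.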
CLR X0, Y1, Y0, Identical to MOVE #0,<register>; does not set condition
A1, B1, codes
R0–R3, N
HHH A, B, A1, B1 Seven data ALU registers — two accumulators, two 16-bit MSP
X0, Y0, Y1 portions of the accumulators, and three 16-bit data registers
Y1, Y0, X0
OMR, SR
LA, LC
HWS
Table 6-15 shows the register set available for use as pointers in address-register-indirect addressing
modes. This table also shows the notation used for AGU registers in AGU arithmetic operations.
Table 6-15. Address Generation Unit (AGU) Registers
Rn    R0–R3, SP         Five AGU registers available as pointers for addressing and as sources and destinations for move instructions
Rj    R0, R1, R2, R3    Four pointer registers available as pointers for addressing
Table 6-16 shows the register set available for use in data ALU arithmetic operations. The most common
field used in this table is FDD.
Table 6-16. Data ALU Registers
FDD     A, B, X0, Y0, Y1      Five data ALU registers: two 36-bit accumulators and three 16-bit data registers accessible during data ALU operations
F1DD    A1, B1, X0, Y0, Y1    Five data ALU registers: two 16-bit MSP portions of the accumulators and three 16-bit data registers accessible during data ALU operations
~F,F                          ~F,F refers to either of the two valid accumulator combinations: A,B or B,A
F1      A1, B1                The 16-bit MSP portion of two accumulators accessible as source operands in parallel move instructions
This section explains how to read the instruction set summary tables, including the notation used
within them.
The register field notation is found in Section 6.6.1, “Register Field Notation.”
Some additional notation to be considered is found in the instruction summary tables when allowed
registers for multiplications are specified (Table 6-23 on page 6-20). In these tables, the following entry is
found:
(+)Y0,X0,FDD
The notation (+) in this entry indicates that an optional + or - sign can be specified before the input register
combination. If a - is specified, the multiplication result is negated. This allows each of the following
examples to be valid DSP56800 instructions:
MAC X0,Y0,A ; A + X0*Y0 -> A
MAC +X0,Y0,A ; A + X0*Y0 -> A
MAC -X0,Y0,A ; A - (X0*Y0) -> A
As an example, Table 6-36 on page 6-30 shows all registers and addressing modes that are allowed when
performing a dual read instruction, one of the DSP56800’s parallel move instructions. The instructions
shown in Example 6-3 are allowed.
Example 6-3. Valid Instructions
MOVE X:(R0)+,Y0 X:(R3)+,X0
MACR X0,Y1,A X:(R1)+N,Y1 X:(R3)-,X0
ADD Y0,B X:(R1)+N,Y0 X:(R3)+,X0
Consulting the information in Table 6-36 on page 6-30 also shows why a dual parallel read instruction
can be invalid. An instruction is not valid for any of the following reasons:
• The only operands accepted for ADD or SUB are X0,F, Y1,F, Y0,F, A,B, or B,A, where F is either
the A or B accumulator register. Thus, X0,Y1,A is an invalid entry.
• The pointer R2 is not allowed for the first memory read.
• The post-decrement addressing mode is not available for the first memory read.
• The X0 register may not be a destination for the first memory read because it is not listed in the
Destination 1 column.
• The post-update by N addressing mode is not allowed for the second memory read. The second
memory read is always identified as the memory move that uses R3 in instructions with two
memory moves. For the second memory read, only the post-increment and post-decrement
addressing modes are allowed.
• The Y0 register may not be a destination for the second memory read because it is not listed in the
Destination 2 column.
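The restrictions on the two memory reads can be expressed as a small checker. This Python sketch is a hedged illustration, not a substitute for Table 6-36: the function name is hypothetical, the data ALU operand check is omitted, and the set of pointers accepted for the first read (R0 and R1) is inferred from the valid examples and from the bullet excluding R2.

```python
import re

# First memory read: pointer R0 or R1 (inferred), post-increment or
# post-update by N, destination Y0 or Y1.
FIRST_READ = re.compile(r"X:\((R0|R1)\)(\+N|\+),(Y0|Y1)")
# Second memory read: always through R3, post-increment or post-decrement,
# destination X0.
SECOND_READ = re.compile(r"X:\(R3\)(\+|-),X0")

def check_dual_read(first, second):
    """Return a list of problems found in the two memory reads (empty if none)."""
    problems = []
    if not FIRST_READ.fullmatch(first):
        problems.append(f"first read not allowed: {first}")
    if not SECOND_READ.fullmatch(second):
        problems.append(f"second read not allowed: {second}")
    return problems
```

The three instructions in Example 6-3 pass this check, whereas a pair such as `X:(R2)-,X0` and `X:(R3)+N,Y0` is rejected on both reads.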
HHHH X:pp
or
X:<<pp
HHHH X:aa
or
X:<aa
MOVE #<-64,63> HHHH 2 1 Signed 7-bit integer data (data is put in the lowest 7 bits
or of the word portion of any accumulator, upper 8 bits and
MOVEI extension reg are sign extended, LSP portion is set to
“0”)
X:(SP-xx) 6 2
X:xxxx 6 3
MOVE #xxxx X:pp 4 2 Move 16-bit immediate data to the last 64 locations of X
or or data memory-peripheral registers.
MOVEP X:<<pp X:<<pp represents a 6-bit absolute I/O address.
MOVE #xxxx X:aa 4 2 Move 16-bit immediate data to a location within the first
or or 64 words of X data memory.
MOVES X:<aa X:aa represents a 6-bit absolute address.
1. These instructions are not allowed when the XP bit in the OMR is set (that is, when the instructions are executing
from data memory).
A B (No transfer)
B A (No transfer)
B A R0 R1
Note: The Tcc instruction does not allow the following condition codes: HI, LS, NN, and NR.
FDD,X:aa 6 2
#<0-31>,FDD 4 1 Add an immediate integer 0–31.
#xxxx 6 2 Add a signed 16-bit immediate integer.
(parallel) 2 1 Refer to Table 6-35 & Table 6-36.
F1,DD
F1,DD
F1,DD
ALIAS: the ANDC, EORC, ORC, and NOTC aliases can also be used to perform logical operations on
registers and data memory locations. ANDC, EORC, and ORC allow logical operations with 16-bit
immediate data. See Section 6.5.1, “ANDC, EORC, ORC, and NOTC Aliases,” for additional information.
(Rn)+N 2 1 Add N index register to the Rn register and store the result in
the Rn register
TSTW (Rn)- 2 1 Test and decrement AGU register. Refer to Table 6-25 for
other forms of TSTW that are executed in the Data ALU.
BFTSTL  #<MASK16>,DDDDD        4  2
        #<MASK16>,X:(R2+xx)    6  2
        BFTSTL tests all bits selected by the 16-bit immediate mask. If all selected bits are clear, then the C
        bit is set. Otherwise it is cleared.
BFCHG   #<MASK16>,DDDDD        4  2
        #<MASK16>,X:(R2+xx)    6  2
        #<MASK16>,X:(SP-xx)    6  2
        #<MASK16>,X:aa         4  2
        #<MASK16>,X:<<pp       4  2
        #<MASK16>,X:xxxx       6  3
        BFCHG tests all bits selected by the 16-bit immediate mask. If all selected bits are set, then the C
        bit is set. Otherwise it is cleared. Then it inverts all selected bits.
        All registers in DDDDD are permitted except HWS.
        X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing):
        <aa> on page 4-22.
        X:<<pp represents a 6-bit absolute I/O address.
BFCLR   #<MASK16>,DDDDD        4  2
        #<MASK16>,X:(R2+xx)    6  2
        #<MASK16>,X:(SP-xx)    6  2
        #<MASK16>,X:aa         4  2
        #<MASK16>,X:<<pp       4  2
        #<MASK16>,X:xxxx       6  3
        BFCLR tests all bits selected by the 16-bit immediate mask. If all selected bits are set, then the C
        bit is set. Otherwise it is cleared. Then it clears all selected bits.
        All registers in DDDDD are permitted except HWS.
        X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing):
        <aa> on page 4-22.
        X:<<pp represents a 6-bit absolute I/O address.
BFSET   #<MASK16>,DDDDD        4  2
        #<MASK16>,X:(R2+xx)    6  2
        #<MASK16>,X:(SP-xx)    6  2
        #<MASK16>,X:aa         4  2
        #<MASK16>,X:<<pp       4  2
        #<MASK16>,X:xxxx       6  3
        BFSET tests all bits selected by the 16-bit immediate mask. If all selected bits are clear, then the C
        bit is set. Otherwise it is cleared. Then it sets all selected bits.
        All registers in DDDDD are permitted except HWS.
        X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing):
        <aa> on page 4-22.
        X:<<pp represents a 6-bit absolute I/O address.
BRCLR   #<MASK8>,DDDDD,<OFFSET7>        10/8   2
        #<MASK8>,X:(R2+xx),<OFFSET7>    12/10  2
        #<MASK8>,X:(SP-xx),<OFFSET7>    12/10  2
        #<MASK8>,X:aa,<OFFSET7>         10/8   2
        #<MASK8>,X:<<pp,<OFFSET7>       10/8   2
        #<MASK8>,X:xxxx,<OFFSET7>       12/10  3
        BRCLR tests all bits selected by the immediate mask. If all selected bits are clear, then the carry bit
        is set and a PC relative branch occurs. Otherwise it is cleared and no branch occurs.
        All registers in DDDDD are permitted except HWS.
        <MASK8> specifies a 16-bit immediate value where either the upper or lower 8 bits contain all zeros.
BRSET   #<MASK8>,DDDDD,<OFFSET7>        10/8   2
        #<MASK8>,X:(R2+xx),<OFFSET7>    12/10  2
        #<MASK8>,X:(SP-xx),<OFFSET7>    12/10  2
        #<MASK8>,X:aa,<OFFSET7>         10/8   2
        #<MASK8>,X:<<pp,<OFFSET7>       10/8   2
        #<MASK8>,X:xxxx,<OFFSET7>       12/10  3
        BRSET tests all bits selected by the immediate mask. If all selected bits are set, then the carry bit
        is set and a PC relative branch occurs. Otherwise it is cleared and no branch occurs.
        All registers in DDDDD are permitted except HWS.
        <MASK8> specifies a 16-bit immediate value where either the upper or lower 8 bits contain all zeros.
1. First cycle count is if branch is taken (condition is true); second is if branch is not taken.
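The <MASK8> restriction and the BRSET/BRCLR branch conditions can be modeled as shown below. This Python sketch is illustrative only; the function names are hypothetical, and the model returns just the branch decision (which, per the table, is also the value written to the carry bit).

```python
def is_mask8(mask):
    """True if a 16-bit mask has all zeros in its upper or its lower byte."""
    mask &= 0xFFFF
    return (mask & 0xFF00) == 0 or (mask & 0x00FF) == 0

def brset_taken(mask, value):
    """BRSET branches (and sets carry) when every selected bit is set."""
    if not is_mask8(mask):
        raise ValueError("mask must fit in the upper or lower byte")
    return (value & mask) == mask

def brclr_taken(mask, value):
    """BRCLR branches (and sets carry) when every selected bit is clear."""
    if not is_mask8(mask):
        raise ValueError("mask must fit in the upper or lower byte")
    return (value & mask) == 0
```

A mask such as `0x0180` spans both bytes and is therefore rejected, while `0x00F0` and `0x8100` are both acceptable.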
JSR <ABS16> 8 2 Push 16-bit return address and jump to 16-bit target address
RTI 10 1 Return from interrupt, restoring 16-bit PC and SR from the stack
1. First cycle count is if branch is taken (condition is true); second is if branch is not taken.
DO #<1-63>,<ABS16> 6 2 Load LC register with unsigned value and start hardware DO loop
with 6-bit immediate loop count. The last address is 16-bit absolute.
Loop count = 0 not allowed by assembler.
Any register allowed except: SP, M01, SR, OMR, and HWS.
ENDDO 2 1 Remove one value from the hardware stack and update the NL
and LF bits appropriately.
Note: Does not branch to the end of the loop.
Any register allowed except: SP, M01, SR, OMR, and HWS.
ILLEGAL 4 1 Execute the illegal instruction exception. This instruction is made available
so that code may be written to test and verify interrupt
handlers for illegal instructions.
NOP 2 1 No operation.
SWI 8 1 Execute the trap exception at the highest interrupt priority level, level 1
(non-maskable).
X0 X:(Rj)+
ADD X0,F Y1 X:(Rj)+N
SUB Y1,F Y0
CMP Y0,F
A
TFR A,B B
B,A A1
B1
ABS F
ASL
ASR
CLR
RND
TST
INC or INCW
DEC or DECW
NEG (F = A or B) (Rj = R0-R3)
1. These instructions occupy only 1 program word and execute in 1 instruction cycle for every addressing
mode.
2. The destination of the data ALU operation is not allowed to be the same register as the destination of the
parallel read operation. Memory writes are allowed in this case.
Each instruction in Table 6-35 requires one program word and executes in one instruction cycle. The data
type accessed by the single memory move in all single parallel move instructions is signed word.
The solid double line running down the center of the table indicates that the data ALU operation is
independent from the parallel memory move. As a result, any valid operation can be combined with any
valid memory move. Example 6-5 lists examples of valid single parallel move instructions.
Example 6-5. Examples of Single Parallel Moves
MAC Y1,X0,A X:(R0)+,X0
MAC Y1,X0,A X0,X:(R0)+
ASL B X:(R0)+,Y1
ASL B Y1,X:(R0)+
It is not permitted to perform MAC A,B X:(R0)+,X0 because the MAC instruction requires three
operands, as shown in Table 6-35. The operands are not independent of the operation performed. This is
why a single line is used to separate the operation from the operands instead of a double line.
For the MAC, MPY, MACR, and MPYR instructions, the assembler accepts the two source operands in
any order.
Table 6-36. Data ALU Instructions — Dual Parallel Read
ADD X0,F
SUB Y1,F
Y0,F
(F = A or B)
MOVE
1. These parallel instructions are not allowed when the XP bit in the OMR is set (that is, when the instructions are
executing from data memory).
2. These instructions occupy only 1 program word and execute in 1 instruction cycle for every addressing mode.
NOTE:
The data types accessed by the two memory moves in all dual parallel read
instructions are signed words.
Figure 6-4 demonstrates pipelining; F1, D1, and E1 refer to the fetch, decode, and execute operations,
respectively, of the first instruction. Note that the third instruction contains an instruction extension word
and takes two cycles to execute.
Each instruction requires a minimum of three instruction cycles (six machine cycles) to be fetched,
decoded, and executed. A new instruction may be started after two machine cycles, giving a throughput
of one instruction executed every instruction cycle for single-cycle instructions. Two-word
instructions require a minimum of eight machine cycles to execute, and a new instruction may start after
four machine cycles.
State       Description
Reset       The state where the DSC core is forced into a known reset state. Typically, the first program instruction is fetched upon exiting this state.
Normal      The state of the DSC core where instructions are normally executed.
Exception   The state of interrupt processing, where the DSC core transfers program control from its current location to an interrupt service routine using the interrupt vector table.
Wait        A low-power state where the DSC core is shut down but the peripherals and interrupt machine remain active.
Stop        A low-power state where the DSC core, the interrupt machine, and most (if not all) of the peripherals are shut down.
Debug       The state where the DSC core is halted and all registers in the On-Chip Emulation (OnCE) port of the processor are accessible for program debug.
5. Clears the status register’s (SR) loop flag and condition code bits and sets the interrupt
mask bits
6. Clears the following bits in the operating mode register: nested looping, condition codes,
stop delay, rounding, and external X memory
The DSC remains in the reset state until the RESET pin is deasserted. When hardware deasserts the
RESET pin, the following occur:
1. The chip operating mode bits in the OMR are loaded from an external source, typically mode
select pins; see the appropriate device manual for details.
2. A delay of 16 instruction cycles (NOPs) occurs to sync the local clock generator and state
machine.
3. The chip begins program execution at the program memory address defined by the state of
the MA and MB bits in the OMR and the type of reset (hardware or COP time-out). The
first instruction must be fetched and then decoded before execution. Therefore, the first
instruction execution is two instruction cycles after the first instruction fetch.
After this last step, the DSC enters the normal processing state upon exiting reset. It is also possible for the
DSC to enter the debug processing state upon exiting reset when system debug is underway.
Operation    Instruction Cycle
             1    2    3    4    5    6    7    • • •
Fetch        F1   F2   F3   F3e  F4   F5   F6   • • •
Decode            D1   D2   D3   D3e  D4   D5   • • •
Execute                E1   E2   E3   E3e  E4   • • •
Table 7-2 demonstrates pipelining. “F1,” “D1,” and “E1” refer to the fetch, decode, and execute operations
of the first instruction, respectively. The third instruction, which contains an instruction extension word,
takes two instruction cycles to execute. Although it takes three instruction cycles (six machine cycles) for
the pipeline to fill and the first instruction to execute, an instruction usually executes on each instruction
cycle thereafter (two machine cycles).
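The staggered fetch, decode, and execute rows can be generated mechanically. The following Python sketch is a simplified model under stated assumptions: the function name is hypothetical, each program word occupies one slot per stage, and decode and execute trail fetch by one and two instruction cycles respectively, with no stalls or interrupts.

```python
def pipeline_schedule(words_per_instruction):
    """Build fetch/decode/execute rows for a three-stage pipeline.

    words_per_instruction[i] is the number of program words used by
    instruction i+1; each extension word adds a slot labeled with an
    "e" suffix.
    """
    slots = []
    for number, words in enumerate(words_per_instruction, start=1):
        slots.append(str(number))
        slots.extend(f"{number}e" for _ in range(words - 1))
    total = len(slots) + 2                   # two extra cycles to drain the pipe
    rows = {}
    for delay, stage in enumerate("FDE"):    # D lags F by 1 cycle, E lags by 2
        row = [""] * total
        for cycle, slot in enumerate(slots):
            row[cycle + delay] = stage + slot
        rows[stage] = row
    return rows
```

Calling `pipeline_schedule([1, 1, 2, 1, 1, 1])` models six instructions in which the third carries an extension word, which is the situation the table above describes.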
Memory Space
Number of
Comments
Program X Memory X Memory Additional Cycles
Fetch First Access Second Access
Note: The ‘mv’ and ‘mvm’ cycle time values reflect the additional time required for all MOVE instructions and for
MOVEM instructions, respectively.
No Pipeline Effect
ORC #$0001,SR ; Changes carry bit at the end of execution time slot
JCS LABEL ; Reads condition codes in SR in its
; execution time slot
The JCS instruction will test the carry bit modified by the ORC without any pipeline effect in this code segment.
Pipeline Effect
ORC #$0008,OMR ; Sets EX bit at execution time slot
MOVE X:$17,A ; Reads internal memory instead of external
; memory
A pipeline effect occurs because the address of the MOVE is formed at its decode time before the ORC changes the EX bit
(which changes the memory map) in the ORC’s execution time slot. The following code produces the expected results of reading
the external FLASH:
ORC #$0008,OMR ; Sets EX bit at execution time slot
NOP ; Delays the MOVE so it will read the updated memory map
MOVE X:$17,A ; Reads external memory
Section 4.4, “Pipeline Dependencies,” on page 4-33 contains more details on interlocks caused during
address generation.
AA0056
Steps 1 through 3 listed on page 7-5 require two additional instruction cycles, effectively making the
interrupt pipeline five levels deep.
Interrupt Starting Address    Interrupt Priority Level    Interrupt Source
$0004                         -                           (Reserved)
$0008                         1                           SWI
$000E                         1                           (Reserved)
$0010                         0                           IRQA
$0012                         0                           IRQB
...
It is required that a two-word JSR instruction is present in any interrupt vector location that may be fetched
during exception processing. If an interrupt vector location is unused, then the JSR instruction is not
required.
The hardware reset and COP reset are special cases because they are reset vectors, not interrupt vectors.
There is no IPL specified for these two because these conditions reset the chip and reset takes precedence
over any interrupt. Typically a two-word JMP instruction is used in the reset vectors. The hardware reset
vector will either be at address $0000 or $E000 and the COP reset vector will either be at $0002 or $E002
depending on the operating mode of the chip. The different operating modes are discussed in
Section 5.1.9.1, “Operating Mode Bits (MB and MA) — Bits 1–0,” on page 5-10.
The interrupt mask bits (I1, I0) in the SR reflect the current priority level and indicate the IPL needed for
an interrupt source to interrupt the processor (see Table 7-6). Interrupts are inhibited for all priority levels
below the current processor priority level. Level 1 interrupts, however, are not maskable and, therefore,
can always interrupt the processor.
Table 7-6. Interrupt Mask Bit Definition in the Status Register
I1    I0    Exceptions Permitted    Exceptions Masked
0     0     (Reserved)              (Reserved)
0     1     IPL 0, 1                None
1     0     (Reserved)              (Reserved)
1     1     IPL 1                   IPL 0
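The masking rule in Table 7-6 can be written as a short predicate. This Python sketch is illustrative; the function name is hypothetical, and it assumes the third and fourth columns of the table list the permitted and masked priority levels, respectively.

```python
def interrupt_permitted(i1, i0, ipl):
    """Return True if an interrupt at the given IPL may interrupt the core."""
    if ipl not in (0, 1):
        raise ValueError("IPL must be 0 or 1")
    if (i1, i0) == (0, 1):
        return True            # IPL 0 and IPL 1 are both permitted
    if (i1, i0) == (1, 1):
        return ipl == 1        # only non-maskable level 1 is permitted
    raise ValueError("reserved I1/I0 combination")
```

Level 1 requests are accepted for both defined settings, which matches the statement that level 1 interrupts are not maskable.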
[Figure 7-2: example interrupt priority register (IPR). Fields: IRQA Mode, IRQB Mode, (Reserved), and Channel 6 IPL through Channel 0 IPL. Reserved bits (marked *) read as zero and should be written with zero for future compatibility. (AA0057)]
In the example interrupt priority register (IPR), shown in Figure 7-2, the interrupt for each on-chip
peripheral device (channels 0–6) and for each external interrupt source (IRQA, IRQB), can be enabled or
disabled under software control. The IPR also specifies the trigger mode of the external interrupt sources.
Figure 7-3 shows how it might be programmed for different interrupts.
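[Figure 7-3: interrupt programming options. An enable setting of 0 disables the interrupt (no IPL); a setting of 1 enables it at IPL 0. For the external interrupts, the IAL0/IAL1 and IBL0/IBL1 bits also select the trigger mode: 0 = level sensitive, 1 = edge sensitive. (AA0058)]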
When an interrupt request is recognized and accepted by the DSC core, a two-word JSR instruction is
fetched from the interrupt vector table. Because the program flow is directed to a different starting address
within the table for each different interrupt, the interrupt structure can be described as “vectored.” A
vectored interrupt structure has low execution overhead. If it is known beforehand that certain interrupts
will not be used or enabled, those locations within the table can instead be used for program or data
storage.
Main Interrupt
Program Service Routine
Fetches Fetches
II (NOP)
n6
No Fetch ii1
No Fetch ii2
ii3
ii4
ii5
This interrupt can be used as a diagnostic tool to allow the programmer to examine the stack and locate the
illegal instruction, or the application program can be restarted with the hope that the failure was a soft
error. The ILLEGAL instruction, found in Appendix A, “Instruction Set Details,” is useful for testing the
illegal interrupt service routine to verify that it can recover correctly from an illegal instruction. Note that
the illegal instruction trap does not fire for all invalid opcodes.
interrupts are prioritized according to the IPLs shown in Table 7-7, and the interrupt source with the
highest priority is selected. The interrupt vector corresponding to that source is then placed on the program
address bus so that the program controller can fetch the interrupt instruction.
Table 7-7. Fixed Priority Structure Within an IPL
Level 1 (Non-maskable)
Illegal instruction —
HWS overflow —
OnCE trap —
Lower SWI —
Level 0 (Maskable)
Interrupts from a given source are not buffered. The processor will not arbitrate a new interrupt from the
same source until after it fetches the second word of the interrupt vector of the current interrupt.
An internal interrupt-acknowledge signal clears the appropriate interrupt-pending flag for DSC core
interrupts. Some peripheral interrupts may also be cleared by the internal interrupt-acknowledge signal, as
defined in their specifications. Peripheral interrupt requests that need a read/write action to some register
do not receive the internal interrupt-acknowledge signal, and their interrupt requests will remain pending
until their registers are read/written. Further, if the interrupt comes from an IRQ pin and is programmed as
level triggered, the interrupt request will not be cleared. The acknowledge signal will be generated after the
interrupt vectors have been generated, not before.
If more than one interrupt is pending when an instruction is executed, the processor will first service the
interrupt with the highest priority level. When multiple interrupt requests with the same IPL are pending, a
second fixed-priority structure within that IPL determines which interrupt the processor will service. For
two interrupts programmed at the same priority level (non-maskable or level 0), Table 7-7 shows the
exception priorities within the same priority level. The information in this table only applies when two
interrupts arrive simultaneously or where two interrupts are simultaneously pending.
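The two-step arbitration described above (highest IPL first, then the fixed order within that IPL) can be sketched as follows. This Python model is illustrative only: the function name is hypothetical, the level 1 order assumes that Table 7-7 lists sources from highest to lowest priority, and the level 0 order is device-specific, so the caller supplies it.

```python
# Assumed order, highest priority first, as listed for level 1 in Table 7-7.
LEVEL1_ORDER = ["Illegal instruction", "HWS overflow", "OnCE trap", "SWI"]

def arbitrate(pending, level0_order):
    """Pick the interrupt to service from a dict {source: IPL}.

    level0_order lists level 0 sources from highest to lowest priority;
    it is a caller-supplied assumption, not taken from this section.
    """
    if not pending:
        return None
    top_ipl = max(pending.values())
    order = LEVEL1_ORDER if top_ipl == 1 else level0_order
    candidates = [s for s, ipl in pending.items() if ipl == top_ipl]
    return min(candidates, key=order.index)
```

For example, with an SWI and an IRQA request pending at the same time, the model selects the SWI because level 1 always outranks level 0.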
Whenever a level 0 interrupt has been recognized and exception processing begins, the DSP56800
interrupt controller changes the interrupt mask bits in the program controller’s SR to allow only level 1
interrupts to be recognized. This prevents another level 0 interrupt from interrupting the interrupt service
routine in progress. If an application requires that a level 0 interrupt can interrupt the current interrupt
service routine, it is necessary to use one of the techniques discussed in Section 8.10.1, “Setting Interrupt
Priorities in Software,” on page 8-30.
[Figure: interrupt service flow (AA0069). (a) Instruction fetches from memory: the interrupt is synchronized and recognized as pending while the main program (n1, n2) runs; a JSR and its jump address are fetched from the interrupt vector table; the interrupt subroutine (ii2, ii3, ii4, ..., iin) runs, with interrupts re-enabled partway through, and ends with an explicit return from interrupt (should be RTI); the PC then resumes operation in the main program. (b) Program controller pipeline. Legend: i = interrupt; ii = interrupt instruction word; n = normal instruction word.]
Figure 7-5 demonstrates the interrupt pipeline. The point at which interrupts are re-enabled and subsequent
interrupts are allowed is shown to illustrate the non-interruptible nature of the early instructions in the long
interrupt service routine.
Reset is a special exception, which will normally contain only a JMP instruction at the exception start
address.
There is only one case in which the stacked address will not point to the illegal instruction. If the illegal
instruction follows an REP instruction (see Figure 7-6), the processor will effectively execute the illegal
instruction as a repeated NOP, and the interrupt vector will then be inserted in the pipeline. In this
illustration, the first instruction (n7 in Figure 7-6) following an illegal instruction (n6) is lost as a
consequence of the illegal opcode. The second instruction following an illegal instruction will be the next
instruction that will be fetched, decoded, and executed normally (n8).
In DO loops, if the illegal instruction is in the loop address (LA) location and the instruction preceding it
(that is, at LA-1) is being interrupted, the loop counter (LC) will be decremented as if the loop had reached
the LA instruction. When the interrupt service ends and the instruction flow returns to the loop, the
instruction after the illegal instruction will be fetched (since it is the next sequential instruction in the
flow).
Interrupts Re-enabled
i = Interrupt
ii = Interrupt Instruction Word
n = Normal Instruction Word
i% = Interrupt Rejected
(b) Program Controller Pipeline AA0071
Figure 7-8 shows a wait instruction being fetched, decoded, and executed. It is fetched as n3 in this
example and, during decode, is recognized as a wait instruction. The following instruction (n4) is aborted,
and the internal clock is disabled from all internal circuitry except the internal peripherals. The processor
stays in this state until an interrupt or reset is recognized. The response time is variable due to the timing of
the interrupt with respect to the internal clock.
i = Interrupt
ii = Interrupt Instruction Word
n = Normal Instruction Word Only Internal Peripherals
Receive Clock
AA0074
Figure 7-8 shows the result of an interrupt bringing the processor out of the wait state. The two appropriate
interrupt vectors are fetched and put in the instruction pipe. The next instruction fetched is n4, which had
been aborted earlier. Instruction execution proceeds normally from this point.
Figure 7-9 shows an example of the wait instruction being executed at the same time that an interrupt is
pending. Instruction n4 is aborted, as in the preceding example. The wait instruction causes a
five-instruction-cycle delay from the time it is decoded, after which the interrupt is processed normally.
The internal clocks are not turned off, and the net effect is that of executing eight NOP instructions
between the execution of n2 and ii1.
[Figure: STOP instruction sequence. Signal shown: IRQA.
Fetch    n3  n4    -     -     -     -     n4
Decode   n2  STOP  -     -     -     -
Execute  n1  n2    STOP  STOP  STOP  STOP
Annotations: Clock Stopped; Cycle Count Started; 131,072 T or 16 T.]
Figure 7-11 shows the system being restarted through asserting the IRQA signal. If the exit from the stop
state was caused by a low level on the IRQA pin, then the processor will service the highest priority
pending interrupt. If no interrupt is pending, then the processor resumes at the instruction following the
STOP instruction that brought the processor into the stop state.
[Figure: STOP instruction sequence followed by an IRQA interrupt. Signal shown: IRQA.
Fetch    n3  n4    -     -     -     -     ii1
Decode   n2  STOP  -     -     -     -
Execute  n1  n2    STOP  STOP  STOP  STOP
Stop Cycle Count: 1 2 3 4 5 6 7 8 9 10 11 12 (13)
Legend: IRQA = Interrupt Request A Signal; n = Normal Instruction Word; ii = Interrupt Instruction Word.
Annotations: Clock Stopped; Cycle Count Started; 131,072 T or 16 T; Resume Stop Cycle Count 6, Interrupts Enabled.]
An IRQA deasserted before the end of the stop cycle count will not be recognized as pending. If IRQA is
asserted when the stop cycle count completes, then an IRQA interrupt will be recognized as pending and
will be arbitrated with any other interrupts.
Specifically, when IRQA is asserted, the internal clock generator is started and begins a delay determined
by the SD bit of the OMR. When the chip uses the internal clock oscillator, the SD bit should be set to zero
to allow a longer delay time of 128K T cycles (131,072 T cycles), so that the clock oscillator may stabilize.
When the chip uses a stable external clock, the SD bit may be set to one to allow a shorter (16 T cycle)
delay time and a faster startup of the chip.
For example, assume that the SD equals 0 so that the 128K T counter is used. During the 128K T count the
processor ignores interrupts until the last few counts and, at that time, begins to synchronize them. At the
end of the 128K T cycle delay period, the chip restarts instruction processing, completes stop cycle 4
(interrupt arbitration occurs at this time), and executes stop cycles 7, 8, 9, and 10. (It takes 17 T from the
end of the 128K T delay to the first instruction fetch.) If the IRQA signal is released (pulled high) after a
minimum of 4T but after fewer than 128K T cycles, no IRQA interrupt will occur, and the instruction
fetched after stop cycle 8 will be the next sequential instruction (n4 in Figure 7-10). An IRQA interrupt
will be serviced as shown in Figure 7-11 if the following conditions are true:
1. The IRQA signal had previously been initialized as level sensitive.
2. IRQA is held low from the end of the 128K T cycle delay counter to the end of stop cycle
count 8.
3. No interrupt with a higher interrupt level is pending.
If IRQA is not asserted during the last part of the STOP instruction sequence (stop cycles 6, 7, and 8) and if no
interrupts are pending, the processor will refetch the next sequential instruction (n4). If the IRQA
signal is asserted during that period, as in Figure 7-11, the processor will recognize the interrupt and fetch and
execute the JSR instruction located at P:$0010 and P:$0011 (the IRQA interrupt vector locations).
To ensure servicing IRQA immediately after leaving the stop state, the following steps must be taken
before the execution of the STOP instruction:
1. Define IRQA as level sensitive; an edge-triggered interrupt will not be serviced.
2. Ensure that no stack error is pending.
3. Execute the STOP instruction and enter the stop state.
4. Recover from the stop state by asserting the IRQA pin and holding it asserted for the entire
clock recovery time. If it is low, the IRQA vector will be fetched.
5. The exact elapsed time for clock recovery is unpredictable. The external device that asserts
IRQA must wait for some positive feedback, such as a specific memory access or a change
in some predetermined I/O pin, before deasserting IRQA.
The STOP sequence totals 131,104 T cycles (if the SD equals 0) or 48 T cycles (if the SD equals 1) in
addition to the period with no clocks from the stop fetch to the IRQA vector fetch (or next instruction).
However, there is an additional delay if the internal oscillator is used. An indeterminate period of time is
needed for the oscillator to begin oscillating and then stabilize its amplitude. The processor will still count
131,072 T cycles (or 16 T cycles), but the period of the first oscillator cycles will be irregular; thus, an
additional period of 19,000 T cycles should be allowed for oscillator irregularity (the specification
recommends a total minimum period of 150,000 T cycles for oscillator stabilization). If an external
oscillator is used that is already stabilized, no additional time is needed.
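The cycle counts quoted above can be tabulated with a short calculation. The following Python sketch is illustrative only: the function and constant names are hypothetical, and the 32 T overhead is inferred from the difference between the quoted totals and the delay counts (131,104 - 131,072 and 48 - 16). It does not account for the clock-stopped period or for PLL lock time.

```python
# Illustrative model of the STOP recovery cycle counts quoted in the text.
# All names are hypothetical; only the numeric values come from the manual.

LONG_DELAY_T = 131_072   # SD = 0: 128K T delay so the internal oscillator can stabilize
SHORT_DELAY_T = 16       # SD = 1: short delay for a stable external clock
STOP_OVERHEAD_T = 32     # inferred: 131,104 - 131,072 = 48 - 16 = 32
OSC_MARGIN_T = 19_000    # extra allowance for irregular first oscillator cycles


def stop_delay_t(sd_bit: int) -> int:
    """Return the delay-counter length in T cycles selected by the SD bit."""
    if sd_bit not in (0, 1):
        raise ValueError("SD is a single bit")
    return LONG_DELAY_T if sd_bit == 0 else SHORT_DELAY_T


def stop_sequence_t(sd_bit: int, internal_oscillator: bool = False) -> int:
    """Return the counted T cycles for the STOP sequence.

    The clock-stopped period is excluded. When the internal oscillator is
    used, the recommended allowance for irregular start-up cycles is added.
    """
    total = stop_delay_t(sd_bit) + STOP_OVERHEAD_T
    if internal_oscillator:
        total += OSC_MARGIN_T
    return total


if __name__ == "__main__":
    for sd in (0, 1):
        print(f"SD={sd}: {stop_sequence_t(sd)} T")
```

With SD equal to 0 and the internal oscillator in use, the delay count plus the allowance (131,072 + 19,000 T) exceeds the 150,000 T minimum that the text recommends for oscillator stabilization.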
The PLL may or may not be disabled when the chip enters the stop state. If it is disabled and will not be
re-enabled when the chip leaves the stop state, the number of T cycles will be much greater because the
PLL must regain lock.
If the STOP instruction is executed when the IRQA signal is asserted, the clock generator will not be
stopped, but the four-phase clock will be disabled for the duration of the 128K T cycle (or 16 T cycle)
delay count. In this case the STOP instruction looks like a 131,072 T + 35 T cycle (or 51 T cycle) NOP,
since the STOP instruction itself is eight instruction cycles long (32 T) and synchronization of IRQA is 3
T, totaling 35 T.
A stack error interrupt that is pending before the processor enters the stop state is not cleared and will
remain pending. During the clock-stabilization delay in stop mode, any edge-triggered IRQ interrupts are
cleared and ignored.
If RESET is used to restart the processor (see Figure 7-12), the 128K T cycle delay counter would not be
used, all pending interrupts would be discarded, and the processor would immediately enter the Reset
processing state as described in Section 7.1, “Reset Processing State.” For example, the stabilization time
recommended in DSP56824 Technical Data for the clock (RESET should be asserted for this time) is only
50 T for a stabilized external clock, but is the same 150,000 T for the internal oscillator. These stabilization
times are recommended and are not imposed by internal timers or time delays. The DSC fetches
instructions immediately after exiting reset. If the user wishes to use the 128K T (or 16 T) delay counter, it
can be started by asserting IRQA for a short time (about two clock cycles).
[Figure: STOP instruction sequence recovering with RESET. Signal shown: RESET. Annotations: Processor Enters Reset State; Processor Leaves Reset State.]
Operation                          Description
JRSET, JRCLR                       Jumps if all selected bits in the bit field are set or clear
BR1SET, BR1CLR                     Branches if at least one selected bit in the bit field is set or clear
JR1SET, JR1CLR                     Jumps if at least one selected bit in the bit field is set or clear
JVS, JVC, BVS, BVC                 Jumps or branches if the overflow bit is set or clear
JPL, JMI, JES, JEC, JLMS, JLMC,    Jumps or branches on other condition codes
BPL, BMI, BES, BEC, BLMS, BLMC
NEG                                Negates another data ALU register, an AGU register, or a memory location
Accumulator sign extend            Sign extends the accumulator into the A2 or B2 portion
Accumulator unsigned load          Zeros the accumulator LSP and extension register
; JRCLR Operation
; Emulated in 5 Icyc (4 Icyc if false), 4 Instruction Words
BFTSTL #xxxx,X:<ea> ; 16-bit mask allowed
JCS label ; 16-bit jump address allowed
; BR1CLR Operation
; Emulated in 5 Icyc (4 Icyc if false), 3 Instruction Words
BFTSTH #xxxx,X:<ea> ; 16-bit mask allowed
BCC label ; 7-bit signed PC relative offset allowed
; JR1CLR Operation
; Emulated in 5 Icyc (4 Icyc if false), 4 Instruction Words
BFTSTH #xxxx,X:<ea> ; 16-bit mask allowed
JCC label ; 16-bit jump address allowed
; BVC Operation
; Emulated in 5 Icyc (4 Icyc if false), 3 Instruction Words
BFTSTH #$0002,SR ; Test V bit in SR
BCC label ; 7-bit signed PC relative offset allowed
; BES Operation
; Emulated in 5 Icyc (4 Icyc if false), 3 Instruction Words
BFTSTH #$0020,SR ; Test E bit in SR
BCS label ; 7-bit signed PC relative offset allowed
Similar code can be written for JMI, JEC, JES, JLMC, JLMS, BPL, BMI, BEC, BLMC, and BLMS. The
JLMS and JLMC are used for “jump if limit set” and “jump if limit clear,” respectively; this is done to
avoid any confusion with the JLS (“jump if lower or same”) instruction.
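The emulation sequences above all rely on how BFTSTH and BFTSTL set the carry bit: BFTSTH sets C when every bit selected by the mask is set, and BFTSTL sets C when every selected bit is clear. The following Python sketch models only that carry behavior and the resulting branch decision. The function names are hypothetical, and instruction timing, addressing modes, and the other condition codes are not modeled.

```python
# Behavioral sketch of the bit-field test emulations (hypothetical helper names).

def bftsth_carry(mask: int, value: int) -> bool:
    """Carry after BFTSTH: set if all bits selected by the mask are set."""
    mask &= 0xFFFF
    return (value & mask) == mask


def bftstl_carry(mask: int, value: int) -> bool:
    """Carry after BFTSTL: set if all bits selected by the mask are clear."""
    mask &= 0xFFFF
    return (value & mask) == 0


def jrclr_taken(mask: int, value: int) -> bool:
    """JRCLR emulation: BFTSTL followed by JCS (jump if carry set)."""
    return bftstl_carry(mask, value)


def br1clr_taken(mask: int, value: int) -> bool:
    """BR1CLR emulation: BFTSTH followed by BCC (branch if carry clear)."""
    return not bftsth_carry(mask, value)


def bvc_taken(sr: int) -> bool:
    """BVC emulation: BFTSTH #$0002,SR followed by BCC."""
    return not bftsth_carry(0x0002, sr)


def bes_taken(sr: int) -> bool:
    """BES emulation: BFTSTH #$0020,SR followed by BCS."""
    return bftsth_carry(0x0020, sr)


if __name__ == "__main__":
    print(jrclr_taken(0x00F0, 0x0F0F))   # all masked bits clear, so the jump is taken
    print(br1clr_taken(0x0003, 0x0001))  # one masked bit clear, so the branch is taken
```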
The NEG instruction can be used directly, executing in one instruction cycle, in cases where it is already
known that the least significant portion (LSP) of an accumulator is $0000. This is true immediately after a
value is moved to the A or B accumulator from memory or a register, as shown in the following code:
; Example of 1 Icyc NEGW Operation
; Works because A0 is already equal to $0000
MOVE X:(R0),A ; Move a 16-bit value to an accumulator,
; clearing A0 register
NEG A ; Now negates upper 20 bits of accumulator
; since A0 = 0
The technique shown in the following code can be used for cases when 16-bit data is being processed and
when it can be guaranteed that the LSP or extension register of the accumulator contains no required
information:
; 16-bit NEGW Operation
; Operates on MSP, Forces EXT to sign extension, LSP to $0, 2 Icyc
MOVE A1,A ; Force A2 to sign extension,
; force A0 cleared
NEG A ; Now negates upper 20 bits of accumulator
; since A0 = 0
The following technique may be used for the case where the CC bit in the OMR is set to a 1, the LSP may not
be $0000, and the user is not interested in the values in the accumulator extension registers:
; 16-bit NEGW Operation
; CC bit must be set, operates on MSP, doesn’t affect A0, 2 Icyc
NOT A ; One’s-complement of A1, A2 unchanged
INCW A ; Increment to get two’s-complement,
; A2 may be incorrect
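The NOT/INCW sequence works because the two's complement of a 16-bit word equals its one's complement plus one. The following Python sketch checks that identity on the MSP only. The function names are hypothetical, and the extension register, LSP, and condition codes are deliberately not modeled.

```python
# Sketch of the 16-bit negate performed on the MSP by NOT followed by INCW.

def negw_via_not_incw(a1: int) -> int:
    """Return the 16-bit two's complement of a1 computed as (~a1) + 1."""
    a1 &= 0xFFFF
    ones_complement = ~a1 & 0xFFFF         # NOT A: one's complement of A1
    return (ones_complement + 1) & 0xFFFF  # INCW A: carry out of bit 15 is ignored here


def to_signed16(word: int) -> int:
    """Interpret a 16-bit word as a signed two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


if __name__ == "__main__":
    for value in (0x0001, 0x1234, 0x7FFF):
        result = negw_via_not_incw(value)
        print(f"${value:04X} -> ${result:04X} ({to_signed16(result)})")
```

As with any two's-complement negate, $8000 maps to itself because +32768 is not representable in 16 bits.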
; MAX operation
; Emulated at 4 Icyc
CMP X0,A
TLT X0,A ; (can also use TLE if desired)
; MIN Operation
; Emulated at 4 Icyc
CMP Y0,A
TGT Y0,A ; (can also use TGE if desired)
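The MAX and MIN sequences pair a compare with a conditional transfer: the transfer overwrites the accumulator only when the comparison shows that the other operand is the better candidate. The following Python sketch expresses the same decision on plain integers. The function names are hypothetical, and register widths, saturation, and condition-code side effects are not modeled.

```python
# Sketch of the compare-then-conditional-transfer pattern used for MAX and MIN.

def emulated_max(a: int, x0: int) -> int:
    """CMP X0,A then TLT X0,A: A receives X0 only if A < X0."""
    a_less_than_x0 = a < x0          # condition established by CMP X0,A
    return x0 if a_less_than_x0 else a


def emulated_min(a: int, y0: int) -> int:
    """CMP Y0,A then TGT Y0,A: A receives Y0 only if A > Y0."""
    a_greater_than_y0 = a > y0       # condition established by CMP Y0,A
    return y0 if a_greater_than_y0 else a


if __name__ == "__main__":
    print(emulated_max(3, 7), emulated_min(3, 7))
```

Using TLE or TGE instead changes only the tie case, where both operands are equal and the result is the same value either way.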
; ASR12 Operation
; Emulated in 8 Icyc, 8 Instruction Words
ASL A
ASL A
ASL A
ASL A
PUSH A1 ; (PUSH is a 2-word, 2 Icyc macro)
MOVE A2,A
POP A0
; ASR13 Operation
; Emulated in 7 Icyc, 7 Instruction Words
ASL A
ASL A
ASL A
PUSH A1 ; (PUSH is a 2-word, 2 Icyc macro)
MOVE A2,A
POP A0
; ASR14 Operation
; Emulated in 6 Icyc, 6 Instruction Words
ASL A
ASL A
PUSH A1 ; (PUSH is a 2-word, 2 Icyc macro)
MOVE A2,A
POP A0
; ASR15 Operation
; Emulated in 5 Icyc, 5 Instruction Words
ASL A
PUSH A1 ; (PUSH is a 2-word, 2 Icyc macro)
MOVE A2,A
POP A0
; ASR16 Operation
; Emulated in 2 Icyc, 2 Instruction Words
MOVE A1,A0 ; (Assumes EXT contains sign extension)
MOVE A2,A1
; ASR17 Operation
; Emulated in 3 Icyc, 3 Instruction Words
ASR A
MOVE A1,A0 ; (Assumes EXT contains sign extension)
MOVE A2,A1
; ASR18 Operation
; Emulated in 4 Icyc, 4 Instruction Words
ASR A
ASR A
MOVE A1,A0 ; (Assumes EXT contains sign extension)
MOVE A2,A1
; ASR19 Operation
; Emulated in 5 Icyc, 5 Instruction Words
ASR A
ASR A
ASR A
MOVE A1,A0 ; (Assumes EXT contains sign extension)
MOVE A2,A1
; ASR20 Operation
; Emulated in 6 Icyc, 6 Instruction Words
ASR A
ASR A
ASR A
ASR A
MOVE A1,A0 ; (Assumes EXT contains sign extension)
MOVE A2,A1
; ASL17 Operation
; Emulated in 5 Icyc, 5 Instruction Words
ASL A
PUSH A1 ; (PUSH is a 2-word, 2 Icyc macro)
MOVE A0,A
POP A2
; ASL18 Operation
; Emulated in 6 Icyc, 6 Instruction Words
ASL A
ASL A
PUSH A1 ; (PUSH is a 2-word, 2 Icyc macro)
MOVE A0,A
POP A2
; ASL19 Operation
; Emulated in 7 Icyc, 7 Instruction Words
ASL A
ASL A
ASL A
PUSH A1 ; (PUSH is a 2-word, 2 Icyc macro)
MOVE A0,A
POP A2
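The ASR16 through ASR20 sequences combine single-bit shifts with a 16-bit shift performed by moving accumulator portions. The following Python sketch models a 36-bit accumulator and checks that approach against an ordinary arithmetic right shift. It is a simplified model under two assumptions that are not confirmed here: reading A2 yields a value sign-extended to 16 bits, and the extension register already holds the sign extension (as the listings require). All names are hypothetical.

```python
# Sketch of ASR16..ASR20 emulation on a 36-bit accumulator (A2:A1:A0).

MASK36 = (1 << 36) - 1


def to_signed36(value: int) -> int:
    """Interpret a 36-bit pattern as a signed integer."""
    value &= MASK36
    return value - (1 << 36) if value & (1 << 35) else value


def split(acc: int):
    """Split a 36-bit pattern into (A2, A1, A0)."""
    acc &= MASK36
    return (acc >> 32) & 0xF, (acc >> 16) & 0xFFFF, acc & 0xFFFF


def join(a2: int, a1: int, a0: int) -> int:
    """Assemble (A2, A1, A0) into a 36-bit pattern."""
    return ((a2 & 0xF) << 32) | ((a1 & 0xFFFF) << 16) | (a0 & 0xFFFF)


def sext4_to_16(a2: int) -> int:
    """Assumed behavior: a read of A2 is sign-extended from 4 to 16 bits."""
    a2 &= 0xF
    return (a2 | 0xFFF0) if a2 & 0x8 else a2


def asr_by_moves(acc: int, n: int) -> int:
    """Shift right by n (16..20): n - 16 ASR steps, then the two MOVEs."""
    if not 16 <= n <= 20:
        raise ValueError("this sketch covers ASR16 through ASR20 only")
    acc = (to_signed36(acc) >> (n - 16)) & MASK36  # repeated ASR A
    a2, a1, a0 = split(acc)
    a0 = a1                    # MOVE A1,A0
    a1 = sext4_to_16(a2)       # MOVE A2,A1
    return join(a2, a1, a0)


if __name__ == "__main__":
    value = (-0x12345678) & MASK36
    for n in range(16, 21):
        print(n, to_signed36(asr_by_moves(value, n)))
```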
8.4 Division
It is possible to perform fractional or integer division on the DSP56800 core. There are several questions to
consider when implementing division on the DSC core:
• Are both operands always guaranteed to be positive?
• Are operands fractional or integer?
• Is only the quotient needed, or is the remainder needed as well?
• Will the calculated quotient fit in 16 bits in integer division?
• Are the operands signed or unsigned?
• How many bits of precision are in the dividend?
• What about overflow in fractional and integer division?
• Will there be “integer division” effects?
NOTE:
In a division equation, the “dividend” is the numerator, the “divisor” is the
denominator, and the “quotient” is the result.
Once all these questions have been answered, it is possible to select the appropriate division algorithm. The
fractional algorithms support a 32-bit signed dividend, and the integer algorithms support a 31-bit signed
dividend. All algorithms support a 16-bit divisor.
Note that the most general division algorithms are the fractional and integer algorithms for four-quadrant
division that generate both a quotient and a remainder. These take the largest number of instruction cycles
to complete and use the most registers.
For extended precision division, where the number of quotient bits required is more than 16, the DIV
instruction and routines presented in this section are no longer applicable. For further information on
division algorithms, consult the following references (or others as required):
Theory and Application of Digital Signal Processing, Lawrence R. Rabiner and Bernard Gold
(Prentice-Hall: 1975), pages 524–530.
Computer Architecture and Organization, John Hayes (McGraw-Hill: 1978), pages 190–199.
NOTE:
The REP instruction is not interruptible; therefore, if the user requires an
interruptible division sequence, it is advisable to use the DO
instruction or to unroll the REP sequences.
; Setup
MOVE B,Y1 ; Copy dividend to Y1
ABS B ; Force the dividend positive
TSTW B ; TSTW always clears carry bit and more efficient
; than using BFCLR #$0001,SR
; Division Operation
REP #16 ; Carry bit must be clear for first DIV
DIV X0,B ; Form positive quotient in B0
; Setup
ASL B ; Shift of dividend required for integer division
MOVE B,Y1 ; Save Sign Bit of dividend (B1) in MSB of Y1
ABS B ; Force the dividend positive
TSTW B ; TSTW always clears carry bit and more efficient
; than using BFCLR #$0001,SR
; Division Operation
REP #16 ; Carry bit must be clear for first DIV
DIV X0,B ; Form positive quotient in B0
; Setup
MOVE B1,Y1 ; Save dividend sign to identify quadrant
MOVE B1,N ; Save dividend sign - remainder must have same
ABS B ; Force dividend positive
TSTW B ; TSTW always clears carry bit and more efficient
; than using BFCLR #$0001,SR
; Division Operation
REP #16 ; Carry bit must be clear for first DIV
DIV X0,B ; Form positive quotient in B0
; Setup
MOVE A2,A ; Set up accumulator with sign of remainder
MOVE Y1,A0 ; Move fractional remainder to LSP
MAC Y0,X0,A ; Multiply quotient with divisor and add remainder
; Accumulator contains original dividend value
; Setup
ASL B ; Shift of dividend required for integer division
MOVE B1,Y1 ; Save dividend sign to identify quadrant
MOVE B1,N ; Save dividend sign - remainder must have same
ABS B ; Force dividend positive
TSTW B ; TSTW always clears carry bit and more efficient
; than using BFCLR #$0001,SR
; Division Operation
REP #16 ; Carry bit must be clear for first DIV
DIV X0,B ; Form positive quotient in B0
; Setup
MOVE A2,A ; Set up accumulator with sign of remainder
MOVE Y1,A0 ; Move integer remainder to LSP
ASL A ; Correct remainder for fractional operation
MAC Y0,X0,A ; Multiply quotient with divisor and add remainder
ASR A ; Correct result to obtain integer value
; Accumulator contains original dividend value
When remainders are computed, the results can be easily checked by multiplying the quotient by the divisor
and adding the remainder to the product, as shown at the end of the 4-Quadrant algorithms with remainder.
The final answer should be the same as the original dividend.
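The check described above can be expressed on plain integers. The following Python sketch is not a model of the DIV instruction or of the fractional scaling in the listings. It assumes integer operands, a quotient truncated toward zero, and a remainder that takes the sign of the dividend (as the listing comments indicate). The function names are hypothetical.

```python
# Sketch of four-quadrant integer division with a remainder check.

def divide_with_remainder(dividend: int, divisor: int):
    """Return (quotient, remainder) with the quotient truncated toward zero."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient = abs(dividend) // abs(divisor)   # divide magnitudes, as the listings do
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient                   # restore the quotient sign by quadrant
    remainder = dividend - quotient * divisor  # same sign as the dividend, or zero
    return quotient, remainder


def check_division(dividend: int, divisor: int, quotient: int, remainder: int) -> bool:
    """Verify that quotient * divisor + remainder reproduces the dividend."""
    return quotient * divisor + remainder == dividend


if __name__ == "__main__":
    for dividend, divisor in ((7, 2), (-7, 2), (7, -2), (-7, -2)):
        q, r = divide_with_remainder(dividend, divisor)
        print(dividend, divisor, q, r, check_division(dividend, divisor, q, r))
```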
Another use of the PUSH instruction is for temporary storage. Sometimes a temporary variable is required,
such as in swapping two registers. There are two techniques for doing this, the first using an unused
register and the second using a location on the stack. The second technique uses the PUSH instruction
macro and works whenever there are no other registers available. The two techniques are shown in the
following code:
; Swapping two registers (X0, R0) using an Available Register (N)
; 3 Icyc, 3 Instruction Words
MOVE X0,N ; X0 -> TEMP
MOVE R0,X0 ; R0 -> X0
MOVE N,R0 ; TEMP -> R0
8.6 Loops
The DSP56800 core contains a powerful and flexible hardware DO loop mechanism. It allows loop
counts of up to 8,192 iterations, allows a large number of instructions (a maximum of 64K) to reside within
the body of the loop, and allows hardware DO loops to be interrupted. In addition, loops execute correctly
from both on-chip and off-chip program memory, and it is possible to single step through the instructions in
the loop using the OnCE port for emulation.
The DSP56800 core also contains a hardware REP loop mechanism for simple, fast looping on a single
instruction. It is well suited to simple nesting when the inner loop contains only a single instruction. For a
REP loop, the instruction to be repeated is fetched only once from program memory, which reduces
activity on the buses. This is particularly beneficial when executing code from off-chip
program memory. However, REP loops are not interruptible.
NOTE:
This technique should not be used for the REP instruction because it will
destroy the value of the LC register if done by a REP instruction nested
within a hardware DO loop.
8.6.4.1 Recommendations
For nested looping it is recommended that the innermost loop be a hardware DO loop when appropriate
and that all outer loops be implemented as software loops. Even though it is possible to nest hardware DO
loops, it is better to implement all outer loops using software looping techniques for two reasons:
1. The DSP56800 allows only two nested hardware DO loops.
2. The execution time of an outer hardware loop is comparable to the execution time of a
software loop.
Likewise, there is little difference in code size between a software loop and an outer loop implemented
using the hardware DO mechanism.
The hardware nesting capability of DO loops should instead be used for efficient interrupt servicing. It is
recommended that the main program and all subroutines use no nested hardware DO loops. It is also
recommended that software looping be used whenever there is a JSR instruction within a loop and the
called subroutine requires the hardware DO loop mechanism. If these two rules are followed, then it can be
guaranteed that no more than one hardware DO loop is active at a time. If this is the case, then the second
HWS location is always available to ISRs for faster interrupt processing. This significantly reduces the
amount of code required to free up and restore the hardware looping resources such as the HWS when
entering and exiting an ISR, since it is already known upon entering the ISR that a HWS location is
available.
If this technique is used, the ISRs should not themselves be interruptible, or, if they can be interrupted,
then any ISR that can interrupt an ISR already in progress must save off one HWS location. See
Section 8.12, “Freeing One Hardware Stack Location.”
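The reasoning behind this recommendation can be illustrated with a small model of the two-location hardware stack. The following Python sketch is a simplification with hypothetical names: it tracks only how many HWS locations are occupied, raises an exception where real hardware would require a location to be saved first, and does not model the LA and LC registers, which software must still save when nesting.

```python
# Sketch of why limiting the main program to one active hardware DO loop
# always leaves one HWS location free for an ISR.

class HardwareStackModel:
    """Two-location model of the hardware stack (HWS) used by DO loops."""

    DEPTH = 2  # the text states that only two nested hardware DO loops are allowed

    def __init__(self):
        self._entries = []

    def enter_do_loop(self, owner: str) -> None:
        """Occupy one HWS location for a hardware DO loop."""
        if len(self._entries) >= self.DEPTH:
            raise OverflowError(f"no HWS location free for {owner}")
        self._entries.append(owner)

    def exit_do_loop(self) -> str:
        """Release the most recently occupied HWS location."""
        return self._entries.pop()

    @property
    def free_locations(self) -> int:
        return self.DEPTH - len(self._entries)


def isr_can_loop_without_saving(active_main_loops: int) -> bool:
    """Return True if an ISR can start a DO loop without first freeing the HWS."""
    hws = HardwareStackModel()
    for index in range(active_main_loops):
        hws.enter_do_loop(f"main loop {index + 1}")
    try:
        hws.enter_do_loop("ISR loop")
    except OverflowError:
        return False
    return True


if __name__ == "__main__":
    for loops in (0, 1, 2):
        print(loops, isr_can_loop_without_saving(loops))
```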
The following code shows the recommended nesting technique:
Loop Technique    Number of Icyc to Set Up Loop    Additional Number of Icyc Executed Each Loop    Total Number of Instruction Words
It is recommended that the nesting of hardware DO loops not be used for implementing nested loops.
Instead, it is recommended that all outer loops in a nested looping scheme be implemented using software
looping techniques. Likewise, it is recommended that software looping techniques be used when a loop
contains a JSR and the called routine contains many instructions or contains a hardware DO loop.
;
; ------ Technique 1 ------
;
;
; ------ Technique 2 ------
;
PUSH LC ; Save outer loop registers if nested loop
PUSH LA
DO #LoopCnt,LABEL
; (instructions)
Bcc OVER ; 3 Icyc for each iteration
ENDDO ; 6 Icyc if loop terminates when false
BRA LABEL
OVER:
; (instructions)
LABEL:
POP LA ; Restore outer loop registers if nested loop
POP LC
ROUTINE1:
MOVE #5,N ; Allocate room for local variables
LEA (SP)+N
; (instructions)
MOVE X:(SP-9),R0 ; Get pointer variable
MOVE X:(SP-7),B ; Get second data variable
MOVE X:(R0),X0 ; Get data pointed to by pointer variable
ADD X0,B
MOVE B,X:(SP-8) ; Store sum in first data variable
; (instructions)
MOVE #-5,N
LEA (SP)+N
RTS
In a similar manner it is also possible to allocate space and to access variables that are locally used by a
subroutine, referred to as local variables. This is done by reserving stack locations above the location that
stores the return address stacked by the JSR instruction. These locations are then accessed using the
DSP56800’s stack addressing modes. For the case of local variables, the value of the stack pointer is
updated to accommodate the local variables. For example, if five local variables are to be allocated, then
the stack pointer is increased by the value of five to allocate space on the stack for these local variables.
When large numbers of variables are allocated on the stack, it is often more efficient to use the (SP)+N
addressing mode.
It is possible to support passed parameters and local variables for a subroutine at the same time. In this case
the program first pushes all passed parameters onto the stack (see Figure 8-1) using the technique outlined
in Section 8.5, “Multiple Value Pushes.” Then the JSR instruction is executed, which pushes the return
address and the SR onto the stack. Upon being entered, the subroutine first allocates space for local
variables by updating the SP. Then, both passed parameters and local variables can be accessed with the
stack addressing modes.
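The SP-relative offsets used in ROUTINE1 follow from the order in which items are placed on the stack. The following Python sketch computes those offsets under stated assumptions: the stack grows toward higher addresses, SP points at the most recently stored word, every item occupies one word, parameters are pushed in order, and JSR stacks the return address followed by the SR. The function name and the labels in the returned dictionary are hypothetical, and the numbering of locals is arbitrary.

```python
# Sketch of SP-relative offsets for passed parameters and local variables.

def frame_offsets(num_params: int, num_locals: int) -> dict:
    """Return a map from item label to its offset relative to SP.

    Layout from low to high addresses: param1..paramN, return address,
    SR, local1..localM. SP points at the last local.
    """
    offsets = {}
    for index in range(num_locals):
        # Locals occupy the top num_locals words; the last one sits at SP + 0.
        offsets[f"local{index + 1}"] = -(num_locals - 1 - index)
    offsets["SR"] = -num_locals
    offsets["return_address"] = -(num_locals + 1)
    for index in range(num_params):
        # The first parameter pushed lies deepest in the stack.
        depth_below_return = num_params - 1 - index
        offsets[f"param{index + 1}"] = -(num_locals + 2 + depth_below_return)
    return offsets


if __name__ == "__main__":
    for label, offset in frame_offsets(3, 5).items():
        print(f"{label:15s} X:(SP{offset:+d})" if offset else f"{label:15s} X:(SP)")
```

With three parameters and five locals, the parameters land at SP-9, SP-8, and SP-7, which matches the offsets that ROUTINE1 uses.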
[Figure: stack contents in X data memory, showing the Return Address and the Status Register.]
; Enter Section with Tight Loop - R3 and A can now be used by tight loop
MOVE #$C000,R3
CLR A
MOVE X:(R0)+,Y0 X:(R3)+,X0
REP #32
MAC X0,Y0,A X:(R0)+,Y0 X:(R3)+,X0
MOVE A,X:(R2)+ ; store result
In the preceding example there are four PUSH instruction macros in a row. For more efficient and compact
code, use the technique outlined in Section 8.5, “Multiple Value Pushes.” In certain cases it may also be
possible to store critical information within the first 64 locations of X data memory, on the top of the stack,
or in an unused register such as N when an extra location is required within a tight loop itself.
8.10 Interrupts
The interrupt mechanism on the DSP56800 is simple, yet flexible. There are two levels of interrupts:
maskable and non-maskable. All maskable interrupts on the chip can be masked at one spot in the SR.
Likewise, individual peripherals can be individually masked within one register, the interrupt
priority register (IPR), or at the peripheral itself. It is beneficial to have a single register in which all
maskable interrupts can be individually masked, because this gives the user the capability to set up
interrupt priorities within software.
When programming interrupts, it is necessary to complete the following tasks correctly:
1. Initialize and program the peripheral, enabling interrupts within the peripheral.
2. Program the IPR to enable interrupts on that particular interrupt channel.
3. Enable interrupts in the SR.
; (interrupt code)
; Note: Can use any core register in <register>, e.g. MOVE #LABEL,X0
MOVE <register>,X:(SP)+ ; Push address of target code location
MOVE SR,X:(SP) ; Push SR onto stack last
RTS ; Will return to address specified in <register> and
; correct the SP register to its original value
LEA (R2)+
MOVE X:(R2)+,HWS ; Puts one value onto stack and sets LF bit
BRCLR #$8000,X:(R2),OVER
; If NL bit set, then push a value onto HWS
LEA (R2)+
MOVE X:(R2)+,HWS
OVER:
[Figure: JTAG/OnCE port block diagram. Blocks shown: JTAG Test Access Port Controller; OnCE Command, Status & Control; External Interface; Breakpoint Logic, Trace Logic, and Event Counter; Pipeline Registers; PAB FIFO History Buffer. Buses shown: XAB1, PAB, PDB, PGDB.]
As already noted, the JTAG module is the master. It enables interaction with the debug services provided
by the OnCE, and its external serial interface is used by the OnCE port for sending and receiving
debugging commands and data.
[Figure: JTAG block diagram. Blocks shown: Decode; ID Register; Bypass Register; TAP Controller. Signals shown: TDO, TMS, TCK, JTAG Reset, To OnCE Port, From OnCE Port.]
The serial interface supports communications with the host development or test system. It is implemented
as a serial interface to occupy as few external pins on the device as possible. Consult the device’s user’s
manual for a full description of the interface signals. All JTAG and OnCE commands and data are sent
over this interface from the host system. The JTAG interface is also used by the OnCE port when it is
active. In this mode, the JTAG acts as the OnCE port’s interface controller, and transparently passes all
communications through to the OnCE port.
Commands sent to the JTAG module are decoded and processed by the command decoder. Commands for
the JTAG port are completely independent from the DSP56800 instruction set, and are executed in parallel
by the JTAG logic.
Registers in the JTAG module hold chip identification information and the information gathered by
boundary scan operations. The ID register contains the industry-standard Freescale identification
information, which is unique for each Freescale DSC. The boundary scan register holds a snapshot of the
device’s pins when sampled by the JTAG port.
[Figure: OnCE block diagram. Blocks shown: OnCE Command Decoder; OnCE Command, Stat. Reg; Breakpoint Register; MUX; Breakpoint and Trace Logic; Trace Count Reg.; PDB Register; Pipeline Registers; PAB Address FIFO. Buses shown: XAB1, PAB, PDB, PGDB.]
Together, these sub-modules provide a full-featured emulation and debug environment. Communication
with the OnCE port module is handled via the JTAG port, which may thus be considered the primary
communications sub-module for the OnCE port, although it operates independently. The operations of the
OnCE port occur independently of the main DSP56800 core logic, and require no core resources.
BRA JMP
Sequential program flow can be assumed between recorded instructions, so it is possible for the user to
reconstruct the program flow extending back through quite a large number of instructions. To complete the
execution history, the first location of the FIFO always holds the address of the last executed instruction,
regardless of whether or not it caused a change of program flow.
A.1 Notation
Each instruction description contains notation used to abbreviate certain operands and operations. The
symbols and their respective descriptions are listed in Table A-1 through Table A-7 on page A-4.
Table A-1 shows the register set available for the most important move instructions. Sometimes the
register field is broken into two different fields — one where the register is used as a source and the other
where it is used as a destination. This is important because a different notation is used when an
accumulator is being stored without saturation. In addition, see the register fields in Table A-2 on
page A-2, which are also used in move instructions as sources and destinations within the AGU.
Table A-1. Register Fields for General-Purpose Writes and Reads
HHH    A, B, A1, B1, X0, Y0, Y1    Seven data ALU registers: two accumulators, two 16-bit MSP portions of the accumulators, and three 16-bit data registers
Y1, Y0, X0
OMR, SR
LA, LC
HWS
Rn     R0-R3, SP                   Five AGU registers available as pointers for addressing and as sources and destinations for move instructions
Rj     R0, R1, R2, R3              Four pointer registers available as pointers for addressing
Table A-3 shows the register set available for use in data ALU arithmetic operations. The most common
field used in this table is FDD.
Table A-3. Data ALU Registers
FDD     A, B, X0, Y0, Y1      Five data ALU registers: two 36-bit accumulators and three 16-bit data registers accessible during data ALU operations
F1DD    A1, B1, X0, Y0, Y1    Five data ALU registers: two 16-bit MSP portions of the accumulators and three 16-bit data registers accessible during data ALU operations
~F,F                          ~F,F refers to either of the two valid accumulator combinations: A,B or B,A
F1      A1, B1                The 16-bit MSP portions of the two accumulators, accessible as source operands in parallel move instructions
Address operands used in the instruction field sections of the instruction descriptions are given in
Table A-4. Addressing mode operators that are accepted by the assembler for specifying a specific
addressing mode are shown in Table A-5.
Symbol Description
ea Effective address
X: X memory reference
Symbol Description
Miscellaneous operand notation, including generic source and destination operands and immediate data
specifiers, is summarized in Table A-6.
Table A-6. Miscellaneous Operands
Symbol Description
#xx Immediate short data (7 bits for MOVEI, 6 bits for DO/REP)
Symbol Description
r Rounding constant
1. For instruction names that contain parentheses, such as DEC(W) or IMPY(16), the portion
within the parentheses is optional.
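[Figure: DSP56800 core programming model. Data ALU input registers: X0 and Y (Y1, Y0), each portion 16 bits (bits 15-0). Accumulator registers: A (A2, A1, A0) and B (B2, B1, B0), 36 bits each, with the extension register in bits 35-32, the MSP in bits 31-16, and the LSP in bits 15-0. AGU registers: R0-R3, SP, N, M01. Program controller registers: PC, SR (MR and CCR), OMR, LA, and LC (Loop Counter, bits 12-0). The software stack is located in X memory.]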
Status Register (SR): Read/Write, Reset = $0300
MR = bits 15-8, CCR = bits 7-0
Bit     15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
Field   LF  *   *   *   *   *   I1  I0  SZ  L   E   U   N   Z   V   C
LF = Loop Flag
I1,I0 = Interrupt Mask
SZ = Size
L = Limit
E = Extension
U = Unnormalized
N = Negative
Z = Zero
V = Overflow
C = Carry
* Indicates reserved bits, which are read as zero and should be written with zero for future compatibility
The C, V, Z, N, U, and E bits are true condition code bits that reflect the condition of the result of a data
ALU operation. These condition code bits are not affected by address ALU calculations or by data
transfers over the CGDB. The N, Z, and V condition code bits are updated by the TSTW instruction, which
can operate on both memory and registers. The L bit is a latching overflow bit that indicates that an
overflow has occurred in the data ALU or that limiting has occurred when moving an accumulator register
to memory. The SZ bit is a latching bit that indicates the size of an accumulator when it is moved to data
memory.
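The bit positions in the status register layout can be decoded with simple shifts and masks. The following Python sketch uses the field positions from the SR layout (LF in bit 15, I1 and I0 in bits 9 and 8, and SZ through C in bits 7 through 0). The dictionary and function names are hypothetical.

```python
# Sketch of decoding the 16-bit status register (SR) into named fields.

SR_FIELDS = {
    "LF": 15,  # Loop Flag
    "I1": 9,   # Interrupt Mask, upper bit
    "I0": 8,   # Interrupt Mask, lower bit
    "SZ": 7,   # Size
    "L": 6,    # Limit
    "E": 5,    # Extension
    "U": 4,    # Unnormalized
    "N": 3,    # Negative
    "Z": 2,    # Zero
    "V": 1,    # Overflow
    "C": 0,    # Carry
}

RESERVED_MASK = 0x7C00  # bits 14-10: read as zero, write as zero
SR_RESET_VALUE = 0x0300


def decode_sr(sr: int) -> dict:
    """Return a map from field name to its bit value (0 or 1)."""
    sr &= 0xFFFF
    return {name: (sr >> bit) & 1 for name, bit in SR_FIELDS.items()}


def clear_reserved(sr: int) -> int:
    """Force the reserved bits to zero before writing a value to the SR."""
    return sr & 0xFFFF & ~RESERVED_MASK


if __name__ == "__main__":
    fields = decode_sr(SR_RESET_VALUE)
    print({name: bit for name, bit in fields.items() if bit})
```

Decoding the reset value $0300 shows only I1 and I0 set. The masks $0002 and $0020 select the V and E bits, respectively.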
Notation Description
=0 Cleared.
=1 Set.
(number) Set according to the special computation defined by the note with the corresponding number. The notes
may be found immediately after Table A-9.
T L bit can be set if limiting occurs when reading an accumulator during a parallel move or by the instruction
itself. An example of the latter case is BFCHG #$8000,A, which must first read the A accumulator
before performing the bit-manipulation operation.
CT L bit can be set if overflow has occurred in the result or if limiting occurs when an accumulator is being
read.
The condition code computation shown in Table A-9 may differ from that defined in the opcode
descriptions; see Section A.7, “Instruction Descriptions.” This indicates that the standard definition may be
used to generate the specific condition code result. For example, the Z flag computation for the CLR
instruction is shown as the standard definition, while the opcode description indicates that the Z flag is
always set. Table A-9 gives the chip implementation viewpoint, while the opcode descriptions give the
user viewpoint.
Instruction SZ L E U N Z V C Comments
ADD * CT *A *A *A *A *A *A
ASR * T *A *A *A *A =0 (3)
Bcc — — — — — — — —
BFCHG — T — — — — — (4)
BFCLR — T — — — — — (4)
BFSET — T — — — — — (4)
BFTSTH — T — — — — — (4)
BFTSTL — T — — — — — (5)
BRA — — — — — — — —
BRCLR — T — — — — — (5)
BRSET — T — — — — — (4)
CMP * CT *A *A *A *A *A *A
DEBUG — — — — — — — —
DEC(W) * CT *B *B *B *B *B *B
INCW * CT *B *B *B *B *B *B
Jcc — — — — — — — —
JMP — — — — — — — —
JSR — — — — — — — —
LEA — — — — — — — —
MAC * CT *A *A *A *A *A —
MACR * CT *A *A *A *A *A —
MACSU — C *A *A *A *A *A —
MOVE * T — — — — — —
(10) (10) (10) (10) (10) (10) (10) (10) NA unless SR is the destination in the instruction
MPY * CT *A *A *A *A *A — V cleared
MPYR * CT *A *A *A *A *A — V cleared
MPYSU — C *A *A *A *A *A — V cleared
NEG * CT *A *A *A *A *A *A
NOP — — — — — — — —
OR — — — — *16 *16 =0 —
POP — — — — — — — —
REP — T — — — — — —
RTS — — — — — — — —
STOP — — — — — — — —
SUB * CT *A *A *A *A *A *A
Tcc — — — — — — — —
TFR * T — — — — — —
WAIT — — — — — — — —
NOTES:
1. V is set if the MSB of the destination operand (bit 35 for an accumulator or bit 31 for the Y
register) is changed as a result of the left shift; V is cleared otherwise.
2. C is set if the MSB of the source operand (bit 35 for an accumulator or bit 31 for the Y
register) is set and is cleared otherwise.
3. C is set if bit 0 of the source operand is set and is cleared otherwise.
4. C is set if all bits specified by the mask are set and is cleared otherwise. Bits that are not set
in the mask should be ignored. If a bit-field instruction is performed on the status register,
all bits in this register selected by the bit field’s mask can be affected.
5. C is set if all bits specified by the mask are cleared and is cleared otherwise. Ignore bits that
are not set in the mask. Note that if a bit-field instruction is performed on the status register,
all bits in this register selected by the bit field’s mask can be affected.
6. C is set if the MSB of the result is cleared (bit 35 for an accumulator or bit 31 for the Y
register). The C bit is cleared if the MSB of the result is set.
7. For the accumulators, C is set if bit 31 of the source operand is set and is cleared otherwise.
For the Y1, Y0, and X0 registers, C is set if bit 15 of the source operand is set and is cleared
otherwise.
8. For the accumulators, C is set if bit 16 of the source operand is set and is cleared otherwise.
For the Y1, Y0, and X0 registers, C is set if bit 0 of the source operand is set and is cleared
otherwise.
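The carry rules in notes 4 and 5 can be sketched in Python as follows. This is only an illustrative model, not the chip implementation; the function names are hypothetical, and the sample values are taken from the BFTSTH and BFTSTL examples in the instruction descriptions.

```python
WORD_MASK = 0xFFFF  # bit-field operands are 16-bit words

def carry_all_set(value: int, mask: int) -> int:
    """Note 4: C = 1 only if every bit selected by the mask is set."""
    selected = mask & WORD_MASK
    return int((value & selected) == selected)

def carry_all_clear(value: int, mask: int) -> int:
    """Note 5: C = 1 only if every bit selected by the mask is clear."""
    return int((value & mask & WORD_MASK) == 0)

print(carry_all_set(0x0FF0, 0x0310))    # 1 (bits 4, 8, and 9 are all set)
print(carry_all_clear(0x18EC, 0x0310))  # 1 (bits 4, 8, and 9 are all clear)
```

Bits outside the mask never influence the result, which matches the instruction in both notes to ignore them.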
Symbol Description
Mnemonic  Instruction Words  Clock Cycles    Mnemonic  Instruction Words  Clock Cycles
ADC       1                  2               LSRR      1                  2
CLR       1                  2+mv            OR        1                  2
DO        2                  6               ROL       1                  2
ENDDO     1                  2               ROR       1                  2
ILLEGAL   1                  4               SBC       1                  2
LSR       1                  2
1. This MOVE applies only to the case where two reads are performed in parallel from the X memory.
2. The STOP instruction disables the internal clock oscillator. After the clock is turned on, an internal
counter counts 65,536 cycles before enabling the clock to the internal DSC circuits.
3. The WAIT instruction takes a minimum of 16 cycles to execute when an internal interrupt is pending at
the time the WAIT instruction is executed.
X: (X memory move) 0 ax
Register → register 0
X memory ↔ register ea + ax
Register ↔ P memory ap
Note: The “ap” term represents the wait states spent when accessing the program memory
during DATA read or write operations and does not refer to instruction fetches.
NOTE:
All two-word jumps execute three program memory fetches to refill the
pipeline, one of them being the instruction word located at the jump
instruction’s second-word address + 1. If the jump instruction was fetched
from a program memory segment with wait states, another “ap” should be
added to account for that third fetch.
RTI, RTS 2 * ap + 2 * ax
NOTE:
The term “2 * ap” represents the two instruction fetches done by the
RTI/RTS instruction to refill the pipeline. The ax term represents fetching
the return address from the software stack when the stack pointer points to
external X memory, and the 2 * ax term includes both this fetch and the
fetch of the SR as performed by the RTI and RTS instructions.
Register 0
X memory ea + ax
No update 0 0
Post-increment by 1 0 0
Post-decrement by 1 0 0
Indexed by offset N 0 2
Special
Immediate data 1 2
Absolute address 1 2
Implicit 0 0
X: Int — — 0 — — —
X: Ext — — wx1 — — —
P: — Int — — 0 — —
P: — Ext — — wp2 — —
IO: — — Int — — 0 —
X:X: Int:Int — — — — — 0
X:X: Ext:Int — — — — — wx
X:X: I/O:Int — — — — — 0
Problem
Calculate the number of DSP56800 instruction program words and the number of oscillator clock cycles
required for the following instruction:
MACR X0,Y0,A X:(R0)+,Y0 X:(R3)+,X0
Where the following conditions are true:
• Operating mode register (OMR) = $02 (normal expanded memory map).
• External X memory accesses require zero wait states (the external memory needs no wait states,
and BCR contains the value $00).
• R0 address register = $C000 (external X memory).
• R3 address register = $0052 (internal X memory).
Solution
To determine the number of instruction program words and the number of oscillator clock cycles required
for the given instruction, the user should perform the following steps:
1. Look up the number of instruction program words and the number of oscillator clock cycles
required for the opcode-operand portion of the instruction in Table A-11 on page A-18.
According to Table A-11 on page A-18, the MACR instruction will require one instruction
program word and will execute in (2 + mv) oscillator clock cycles. The term “mv”
represents the additional instruction program words (if any) and the additional oscillator
clock cycles (if any) that may be required over and above those needed for the basic MACR
instruction due to the parallel move portion of the instruction.
2. Evaluate the “mv” term using Table A-12 on page A-19.
The parallel move portion of the MACR instruction consists of an XX memory read.
According to Table A-12 on page A-19, the parallel move portion of the instruction will
require mv = axx additional oscillator clock cycles. The term “axx” represents the number
of additional oscillator clock cycles (if any) that are required to access two operands in the
X memory.
3. Evaluate the “axx” term using Table A-20 on page A-22.
The parallel move portion of the MACR instruction consists of an XX memory read.
According to Table A-20 on page A-22, the term “axx” depends upon where the
referenced X memory locations are located in the DSP56800 memory space. External X
memory accesses may require additional oscillator clock cycles depending on the memory
device’s speed. Here we assume external X memory accesses require wx = 0 wait states,
that is, no additional oscillator clock cycles. For this example, the second X memory
reference is assumed to be an internal reference, while the first X memory reference is
assumed to be an external reference. Thus, according to Table A-20 on page A-22, the XX
memory reference in the parallel move portion of the MACR instruction will require
axx = wx = 0 additional oscillator clock cycles.
4. Compute the final results.
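Thus, based upon the assumptions given for Table A-11 on page A-18, the instruction
MACR X0,Y0,A X:(R0)+,Y0 X:(R3)+,X0
will require one instruction program word and will execute in (2 + mv) = (2 + axx) =
(2 + wx) = (2 + 0) = 2 oscillator clock cycles.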
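The MACR evaluation above can be checked with a short Python sketch. It assumes only what the steps state: the base cost is 2 cycles, mv = axx, and axx equals wx when the first X memory read is external and 0 when both reads are internal. The function name and parameters are hypothetical.

```python
def macr_dual_read_cycles(first_read_external: bool, wx: int) -> int:
    """Oscillator clock cycles for MACR with a dual X memory read."""
    axx = wx if first_read_external else 0  # X:X: Ext:Int costs wx, Int:Int costs 0
    mv = axx                                # parallel move term
    return 2 + mv

# R0 = $C000 (external), R3 = $0052 (internal), wx = 0 wait states
print(macr_dual_read_cycles(first_read_external=True, wx=0))  # 2
```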
Problem
Calculate the number of DSP56800 instruction program words and the number of oscillator clock cycles
required for the following instruction:
JEQ $2000
Where the following conditions are true:
• OMR = $02 (normal expanded memory map).
• External P memory accesses require four wait states.
Solution
To determine the number of instruction program words and the number of oscillator clock cycles required
for the given instruction, the user should perform the following steps:
1. Look up the number of instruction program words and the number of oscillator clock cycles
required for the opcode-operand portion of the instruction in Table A-11 on page A-18.
According to Table A-11 on page A-18, the Jcc instruction will require two instruction
program words and will execute in (4 + jx) oscillator clock cycles. The term “jx” represents
the number of additional oscillator clock cycles (if any) required for a jump-type
instruction.
2. Evaluate the “jx” term using Table A-16 on page A-20.
According to Table A-16 on page A-20, the Jcc instruction will require 2 + 2 * ap
additional oscillator clock cycles if the “ea” condition is true; otherwise, 2 * ap if the
condition is false. The term “ap” represents the number of additional oscillator clock
cycles (if any) that are required to access a P memory operand. Note that the “+ (2 * ap)”
term represents the two program memory instruction fetches executed at the end of a
one-word jump instruction to refill the instruction pipeline.
3. Evaluate the “ap” term using Table A-20 on page A-22.
According to Table A-20 on page A-22, the term “ap” depends upon where the referenced
P memory location is located in the 16-bit DSC memory space. External memory accesses
require additional oscillator clock cycles according to the number of wait states required.
Here we assume that external P memory accesses require wp = 4 wait states or additional
oscillator clock cycles. For this example the P memory reference is assumed to be an
external reference. Thus, according to Table A-20 on page A-22, the Jcc instruction will
use the value ap = wp = 4 oscillator clock cycles.
4. Compute the final results.
Thus, based upon the assumptions given for Table A-11 on page A-18, the instruction
JEQ $2000
Problem
Calculate the number of DSP56800 instruction program words and the number of oscillator clock cycles
required for the following instruction:
RTS
According to Table A-11 on page A-18, the RTS instruction will require one instruction
program word and will execute in (10 + rx) oscillator clock cycles. The term “rx” represents
the number of additional oscillator clock cycles (if any) required for an RTS instruction.
2. Evaluate the “rx” term using Table A-17 on page A-21.
According to Table A-17 on page A-21, the RTS instruction will require rx = 2 * ap + 2 *
ax additional oscillator clock cycles. In this case “ax = 0” because the instruction accesses
the stack on internal memory. The term “ap” represents the number of additional oscillator
clock cycles (if any) that are required to access a P memory operand. The term “(2 * ap)”
represents the two program memory instruction fetches executed at the end of an RTS
instruction to refill the instruction pipeline.
3. Evaluate the “ap” term using Table A-20 on page A-22.
According to Table A-20 on page A-22, the term “ap” depends upon where the referenced
P memory location is located in the 16-bit DSC memory space. External memory accesses
may require additional oscillator clock cycles, according to the memory device’s speed.
Here we assume that external P memory accesses require wp = 4 wait states or additional
oscillator clock cycles. For this example the P memory reference is assumed to be an
internal reference. This means that the return address ($0100) pulled from the system
stack by the RTS instruction is in internal P memory. Thus, according to Table A-20 on
page A-22, the RTS instruction will use the value ap = 0 additional oscillator clock cycles.
4. Compute the final results.
Thus, based upon the assumptions given for Table A-11 on page A-18, the instruction
RTS
will require one instruction program word and will execute in (10 + rx) = (10 + (2 * ap) +
(2 * ax)) = (10 + (2 * 0) + (2 * 0)) = 10 oscillator clock cycles.
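The RTS arithmetic above can be written as a small Python sketch. It encodes only the formula quoted in this example, 10 + rx with rx = 2 * ap + 2 * ax; the function name is hypothetical.

```python
def rts_cycles(ap: int, ax: int) -> int:
    """Oscillator clock cycles for RTS: 10 + rx, where rx = 2*ap + 2*ax."""
    rx = 2 * ap + 2 * ax
    return 10 + rx

# Return address in internal P memory (ap = 0), stack in internal X memory (ax = 0)
print(rts_cycles(ap=0, ax=0))  # 10
```

With the same formula, a return into external P memory with four wait states (ap = 4) and an internal stack would cost rts_cycles(4, 0) = 18 cycles.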
Example:
ABS A X:(R0)+,Y0 ; take ABS value, move data into Y0,
; update R0
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $F:FFFF:FFF2. Since this is a negative
number, the execution of the ABS instruction takes the two’s-complement of that value and returns
$0:0000:000E.
Note: When the D operand equals $8:0000:0000 (-16.0 when interpreted as a decimal fraction), the ABS
instruction will cause an overflow to occur since the result cannot be correctly expressed using the
standard 36-bit, fixed-point, two’s-complement data representation. Data limiting does not occur; that
is, A is not set to the limiting value of $7:FFFF:FFFF but remains unchanged.
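A minimal Python model of the ABS behavior described above, including the overflow case from the note. The helper names are hypothetical, and reporting the overflow through V is an assumption based on the note's wording.

```python
ACC_MASK = (1 << 36) - 1   # 36-bit accumulator
SIGN_BIT = 1 << 35

def abs36(acc: int):
    """Return (result, v) for a 36-bit two's-complement accumulator value."""
    acc &= ACC_MASK
    if acc == SIGN_BIT:                  # $8:0000:0000 has no positive counterpart
        return acc, 1                    # overflow; the value is left unchanged
    if acc & SIGN_BIT:
        return (-acc) & ACC_MASK, 0      # two's complement of a negative value
    return acc, 0

def fmt(acc: int) -> str:
    """Format as $A2:A1:A0."""
    return f"${acc >> 32:X}:{(acc >> 16) & 0xFFFF:04X}:{acc & 0xFFFF:04X}"

result, v = abs36(0xFFFFFFFF2)
print(fmt(result), v)  # $0:0000:000E 0
```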
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Parallel Moves:
Operation      Source                      Destination
ABS F          X:(Rn)+, X:(Rn)+N           X0, Y1, Y0, A, B, A1, B1
               X0, Y1, Y0, A, B, A1, B1    X:(Rn)+, X:(Rn)+N
(F = A or B)
1. This instruction occupies only 1 program word and executes in 1 instruction cycle for every
addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register as the destination
of the parallel read operation. Memory writes are allowed in this case.
Usage: This instruction is typically used in multi-precision addition operations (see Section 3.3.8,
“Multi-Precision Operations,” on page 3-23) when it is necessary to add together two numbers that are
larger than 32 bits (such as 64-bit or 96-bit addition).
Example:
ADC Y,A
Before Execution After Execution
SR 0301 SR 0300
Explanation of Example:
Prior to execution, the 32-bit Y register, comprised of the Y1 and Y0 registers, contains the value
$2000:8000, and the 36-bit accumulator contains the value $0:2000:8000. In addition, C is set to one.
The ADC instruction automatically sign extends the 32-bit Y register to 36 bits and adds this value to
the 36-bit accumulator. In addition, C is added into the LSB of this 36-bit addition. The 36-bit result
is stored back in the A accumulator, and the condition codes are set correctly. The Y1:Y0 register pair
is not affected by this instruction.
Note: C is set correctly for multi-precision arithmetic, using long word operands only when the extension
register of the destination accumulator (A2 or B2) contains sign extension of bit 31 of the destination
accumulator (A or B).
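The ADC example can be reproduced with the following Python sketch. It assumes that C is the carry out of bit 35 of the 36-bit sum; the function names are hypothetical.

```python
ACC_MASK = (1 << 36) - 1

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def adc(y32: int, acc36: int, carry_in: int):
    """Add the sign-extended 32-bit Y register and C to a 36-bit accumulator."""
    total = (sign_extend(y32, 32) & ACC_MASK) + (acc36 & ACC_MASK) + carry_in
    return total & ACC_MASK, total >> 36   # (result, carry out of bit 35)

result, c = adc(0x20008000, 0x020008000, 1)
print(f"{result:09X}", c)  # 040010001 0, that is, A = $0:4001:0001 with C cleared
```

The cleared carry agrees with the SR change from $0301 to $0300 shown in the example.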
LF * * * * * I1 I0 SZ L E U N Z V C
Usage: This instruction can be used for both integer and fractional two’s-complement data.
Example:
ADD X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 16-bit add, update
; Y0,X0,R0,R3
X0 FFFF X0 FFFF
Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $FFFF, and the 36-bit A accumulator
contains the value $0:0100:0000. The ADD instruction automatically appends the 16-bit value in the X0
register with 16 LS zeros, sign extends the resulting 32-bit long word to 36 bits, and adds the result to
the 36-bit A accumulator. Thus, 16-bit operands are always added to the MSP of A or B (A1 or B1),
with the result correctly extending into the extension register (A2 or B2). A 16-bit operand can be
added to the LSP of A or B (A0 or B0) by loading the operand into Y0, loading Y1 with the sign
extension of Y0 to form a 32-bit long word, and then executing an ADD Y,A or ADD Y,B instruction.
Similarly, the second accumulator can also be used as the source operand.
Note: C is set correctly using word or long word source operands if the extension register of the destination
accumulator (A2 or B2) contains sign extension from bit 31 of the destination accumulator (A or B).
C is always set correctly by using accumulator source operands.
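The operand formation described for ADD X0,A can be sketched in Python as follows. The sketch assumes C is the carry out of bit 35; the function names are hypothetical.

```python
ACC_MASK = (1 << 36) - 1

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def add_word(x16: int, acc36: int):
    """Add a 16-bit source to the MSP of a 36-bit accumulator."""
    long_word = (x16 & 0xFFFF) << 16                 # append 16 LS zeros
    source = sign_extend(long_word, 32) & ACC_MASK   # sign extend to 36 bits
    total = source + (acc36 & ACC_MASK)
    return total & ACC_MASK, total >> 36             # (result, carry out of bit 35)

result, c = add_word(0xFFFF, 0x001000000)
print(f"{result:09X}", c)  # 000FF0000 1, that is, A = $0:00FF:0000
```

Here $FFFF acts as -1 in the MSP, so A1 drops from $0100 to $00FF.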
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
F1,DD
A,B
B,A
Y,A
Y,B
FDD,X:aa 6 2
A,B A
B,A B
A1
B1
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
(F = A or B) A1
B1
1. This instruction occupies only 1 program word and executes in 1 instruction cycle
for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register as
the destination of the parallel read operation. Memory writes are allowed in this case.
1. This parallel instruction is not allowed when the XP bit in the OMR is set (that is, when the instructions
are executing from data memory).
2. This instruction occupies only 1 program word and executes in 1 instruction cycle for every addressing
mode.
Timing: 2 + mv oscillator clock cycles for ADD instructions with a single or dual parallel move.
Refer to previous table for ADD instructions without a parallel move.
Memory: 1 program word for ADD instructions with a single or dual parallel move.
Refer to previous table for ADD instructions without a parallel move.
Usage: This instruction is used for the logical AND of two registers; the ANDC instruction is appropriate to
AND a 16-bit immediate value with a register or memory location.
Example:
AND X0,A ; AND X0 with A1
X0 7F00 X0 7F00
Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $7F00, and the 36-bit A accumulator
contains the value $6:1234:5678. The AND X0,A instruction logically ANDs the 16-bit value in the X0
register with bits 31–16 of the A accumulator (A1) and stores the 36-bit result in the A accumulator.
Bits 35–32 in the A2 register and bits 15–0 in the A0 register are not affected by this instruction.
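A short Python sketch of the AND X0,A example, showing that only A1 (bits 31–16) changes. The function name is hypothetical.

```python
ACC_MASK = (1 << 36) - 1
A1_FIELD = 0xFFFF << 16   # bits 31-16 of the accumulator

def and_word(x16: int, acc36: int) -> int:
    """AND a 16-bit register with A1, leaving A2 and A0 untouched."""
    a1 = (acc36 >> 16) & 0xFFFF
    new_a1 = a1 & x16 & 0xFFFF
    return (acc36 & ACC_MASK & ~A1_FIELD) | (new_a1 << 16)

print(f"{and_word(0x7F00, 0x612345678):09X}")  # 612005678, that is, A = $6:1200:5678
```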
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
F1,DD
Description: Logically AND a 16-bit immediate data value with the destination operand, and store the results back
into the destination. C is also modified as described in the following discussion. This instruction
performs a read-modify-write operation on the destination and requires two destination accesses.
Example:
ANDC #$5555,X:$A000 ; AND with immediate data
SR 0301 SR 0300
Explanation of Example:
Prior to execution, the 16-bit X memory location X:$A000 contains the value $C3FF. Execution of the
instruction tests the state of bits 0, 2, 4, 6, 8, 10, 12, and 14 in X:$A000 and clears C (because not all
of the tested bits were set in the destination X:$A000). The result of the logical AND is written back to
the tested location.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Description: Arithmetically shift the destination operand (D) 1 bit to the left, and store the result in the destination.
The MSB of the destination prior to the execution of the instruction is shifted into C, and a zero is
shifted into the LSB of the destination. A duplicate destination is not allowed when ASL is used in
conjunction with a parallel read.
Implementation Note:
When a 16-bit register is specified as the operand for ASL, this instruction is actually assembled as an
LSL with the same register argument.
Example:
ASL A X:(R3)+N,Y0 ; multiply A by 2,
; update R3,Y0
SR 0300 SR 0373
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $A:0123:0123. Execution of the
ASL A instruction shifts the 36-bit value in the A accumulator 1 bit to the left and stores the result
back in the A accumulator. C is set by the operation because bit 35 of A was set prior to the execution
of the instruction. The V bit of CCR (bit 1) is also set because bit 35 of A has changed during the
execution of the instruction. The U bit of CCR (bit 4) is set because the result is not normalized, the E bit
of CCR (bit 5) is set because the signed integer portion of the result is in use, and the L bit of CCR (bit
6) is set because an overflow has occurred.
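The C and V results in this ASL example can be reproduced with a small Python model. It covers only the shift itself and the C and V bits as described above; the function name is hypothetical.

```python
ACC_MASK = (1 << 36) - 1

def asl36(acc: int):
    """Shift a 36-bit accumulator left by 1; return (result, c, v)."""
    acc &= ACC_MASK
    result = (acc << 1) & ACC_MASK
    c = acc >> 35                     # old bit 35 is shifted into C
    v = (acc >> 35) ^ (result >> 35)  # V is set if bit 35 changed
    return result, c, v

result, c, v = asl36(0xA01230123)
print(f"{result:09X}", c, v)  # 402460246 1 1, that is, A = $4:0246:0246
```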
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Parallel Moves:
Operation      Source                      Destination
ASL F          X:(Rn)+, X:(Rn)+N           X0, Y1, Y0, A, B, A1, B1
               X0, Y1, Y0, A, B, A1, B1    X:(Rn)+, X:(Rn)+N
(F = A or B)
1. This instruction occupies only 1 program word and executes in 1 instruction cycle
for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this case.
Example:
ASLL Y1,X0,A
Before Execution After Execution
Y1 AAAA Y1 AAAA
X0 0004 X0 0004
Explanation of Example:
Prior to execution, the Y1 register contains the value to be shifted ($AAAA) and the X0 register
contains the amount by which to shift ($0004). The contents of the destination register are not important
prior to execution because they have no effect on the calculated value. The ASLL instruction
arithmetically shifts the value $AAAA four bits to the left and places the result in the destination register A.
Since the destination is an accumulator, the extension word (A2) is filled with sign extension, and the
LSP (A0) is set to zero.
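A rough Python sketch of the ASLL example with an accumulator destination. It assumes the shift count is taken from the four LSBs of the second operand and that A2 is the sign extension of the shifted 16-bit result; the function name is hypothetical.

```python
def asll_to_acc(src16: int, shift_reg: int) -> int:
    """Shift a 16-bit value left and place it in a 36-bit accumulator."""
    count = shift_reg & 0xF                   # four LSBs give the shift amount
    a1 = (src16 << count) & 0xFFFF            # shifted 16-bit result goes to A1
    a2 = 0xF if a1 & 0x8000 else 0x0          # extension word = sign extension
    return (a2 << 32) | (a1 << 16)            # A0 is set to zero

print(f"{asll_to_acc(0xAAAA, 0x0004):09X}")  # FAAA00000, that is, A = $F:AAA0:0000
```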
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Instruction Fields:
ASLL Y1,X0,FDD 2 1
     Y0,X0,FDD
     Y1,Y0,FDD
     Y0,Y0,FDD
     A1,Y0,FDD
     B1,Y1,FDD
Arithmetic shift left of the first operand by the value specified in the four LSBs of the second
operand; places the result in FDD.
Description: Arithmetically shift the destination operand (D) 1 bit to the right and store the result in the destination
accumulator. The LSB of the destination prior to the execution of the instruction is shifted into C, and
the MSB of the destination is held constant. A duplicate destination is not allowed when ASR is used
in conjunction with a parallel read.
Example:
ASR B X:(R2)+,Y0 ; divide B by 2,
; update R2, load Y0
Before Execution After Execution
SR 0300 SR 0329
Explanation of Example:
Prior to execution, the 36-bit B accumulator contains the value $A:A864:A865. Execution of the
ASR B instruction shifts the 36-bit value in the B accumulator 1 bit to the right and stores the result
back in the B accumulator. C is set by the operation because bit 0 of B was set prior to the execution
of the instruction. The N bit of CCR (bit 3) is also set because bit 35 of the result in B is set. The E bit
of CCR (bit 5) is set because the signed integer portion of B is used by the result.
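The ASR example can be checked with the following Python sketch, which holds bit 35 constant and shifts bit 0 into C as the description states. The function name is hypothetical.

```python
ACC_MASK = (1 << 36) - 1
SIGN_BIT = 1 << 35

def asr36(acc: int):
    """Shift a 36-bit accumulator right by 1; return (result, c)."""
    acc &= ACC_MASK
    result = (acc >> 1) | (acc & SIGN_BIT)  # MSB is held constant
    return result, acc & 1                  # old bit 0 is shifted into C

result, c = asr36(0xAA864A865)
print(f"{result:09X}", c)  # D54325432 1, that is, B = $D:5432:5432
```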
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Parallel Moves:
Operation      Source                      Destination
ASR F          X:(Rn)+, X:(Rn)+N           X0, Y1, Y0, A, B, A1, B1
               X0, Y1, Y0, A, B, A1, B1    X:(Rn)+, X:(Rn)+N
(F = A or B)
1. This instruction occupies only 1 program word and executes in 1 instruction cycle
for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this case.
Usage: This instruction is typically used for multi-precision arithmetic right shifts.
Example:
ASRAC Y1,X0,A ; right shift Y1 by X0 and
; accumulate in A
Before Execution After Execution
Y1 C003 Y1 C003
X0 0004 X0 0004
Explanation of Example:
Prior to execution, the Y1 register contains the value to be shifted ($C003), and the X0 register contains
the amount by which to shift ($0004). The ASRAC instruction arithmetically shifts the value $C003
four bits to the right and accumulates this result with the value already in the destination register A.
Since the destination is an accumulator, the extension word (A2) is filled with sign extension.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Example:
ASRR Y1,X0,A ; right shift of 16-bit Y1 by X0
Y1 AAAA Y1 AAAA
X0 0004 X0 0004
Explanation of Example:
Prior to execution, the Y1 register contains the value to be shifted ($AAAA) and the X0 register
contains the amount by which to shift ($0004). The contents of the destination register are not important
prior to execution because they have no effect on the calculated value. The ASRR instruction
arithmetically shifts the value $AAAA four bits to the right and places the result in the destination register A.
Since the destination is an accumulator, the extension word (A2) is filled with sign extension, and the
LSP (A0) is set to zero.
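A rough Python sketch of the ASRR example with an accumulator destination. It assumes the shift count comes from the four LSBs of the second operand and that A2 is the sign extension of the shifted 16-bit result; the function name is hypothetical.

```python
def asrr_to_acc(src16: int, shift_reg: int) -> int:
    """Arithmetically shift a 16-bit value right into a 36-bit accumulator."""
    count = shift_reg & 0xF                          # four LSBs give the shift amount
    signed = (src16 & 0x7FFF) - (src16 & 0x8000)     # 16-bit two's-complement value
    a1 = (signed >> count) & 0xFFFF                  # Python >> keeps the sign
    a2 = 0xF if a1 & 0x8000 else 0x0                 # extension word = sign extension
    return (a2 << 32) | (a1 << 16)                   # A0 is set to zero

print(f"{asrr_to_acc(0xAAAA, 0x0004):09X}")  # FFAAA0000, that is, A = $F:FAAA:0000
```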
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
ASRR Y1,X0,FDD 2 1
     Y0,X0,FDD
     Y1,Y0,FDD
     Y0,Y0,FDD
     A1,Y0,FDD
     B1,Y1,FDD
Arithmetic shift right of the first operand by the value specified in the four LSBs of the second
operand; places the result in FDD.
Example:
CMP X0,A
BNE LABEL ; branch to label if Z condition clear
INCW A
INCW A
LABEL:
ADD B,A
Explanation of Example:
In this example, if the Z bit is zero when executing the BNE instruction, program execution skips the
two INCW instructions and continues with the ADD instruction. If the specified condition is not true,
no branch is taken, the program counter is incremented by one, and program execution continues with
the first INCW instruction. The Bcc instruction uses a PC-relative offset of two for this example.
Restrictions:
A Bcc instruction used within a DO loop cannot begin at the LA or LA-1 within that DO loop.
A Bcc instruction cannot be repeated using the REP instruction.
1. The clock-cycle count depends on whether the branch is taken. The first value applies if the branch is taken, and the
second applies if it is not.
Usage: This instruction is very useful in performing I/O and flag bit manipulation.
Example:
BFCHG #$0310,X:<<$FFE2 ;test and change bits 4, 8, and 9
;in a peripheral register
SR 0001 SR 0000
Explanation of Example:
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $0010. Execution of the
instruction tests the state of the bits 4, 8, and 9 in X:$FFE2; does not set C (because not all of the bits
specified in the immediate mask were set); and then complements the bits.
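The test-then-complement behavior of this BFCHG example can be modeled in a few lines of Python. This is only a sketch of the behavior described above; the function name is hypothetical.

```python
def bfchg(value: int, mask: int):
    """Test the masked bits, then invert them; return (new_value, c)."""
    mask &= 0xFFFF
    c = int((value & mask) == mask)        # C = 1 only if all selected bits were set
    return (value ^ mask) & 0xFFFF, c      # complement the selected bits

new_value, c = bfchg(0x0010, 0x0310)
print(f"{new_value:04X}", c)  # 0300 0
```

Bit 4 was set and is cleared, while bits 8 and 9 were clear and are set, so X:$FFE2 ends up holding $0300.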
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
BFCHG #<MASK16>,DDDDD 4 2
      #<MASK16>,X:(R2+xx) 6 2
BFCHG tests all bits selected by the 16-bit immediate mask. If all selected bits are set, then the C bit
is set. Otherwise it is cleared. Then it inverts all selected bits.
Usage: This instruction is very useful in performing I/O and flag bit manipulation.
Example:
BFCLR #$0310,X:<<$FFE2 ; test and clear bits 4, 8, and 9 in
; an on-chip peripheral register
SR 0001 SR 0000
Explanation of Example:
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $7F95. Execution of the
instruction tests the state of the bits 4, 8, and 9 in X:$FFE2; clears C (because not all of the bits
specified in the immediate mask were set); and then clears the bits.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
BFCLR #<MASK16>,DDDDD 4 2
      #<MASK16>,X:(R2+xx) 6 2
BFCLR tests all bits selected by the 16-bit immediate mask. If all selected bits are set, then the C bit
is set. Otherwise it is cleared. Then it clears all selected bits.
Usage: This instruction is very useful in performing I/O and flag bit manipulation.
Example:
BFSET #$F400,X:<<$FFE2
SR 0000 SR 0000
Explanation of Example:
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $8921. Execution of the
instruction tests the state of bits 10, 12, 13, 14, and 15 in X:$FFE2; does not set C (because not all of
the bits specified in the immediate mask were set); and then sets the bits.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
BFSET #<MASK16>,DDDDD 4 2
      #<MASK16>,X:(R2+xx) 6 2
BFSET tests all bits selected by the 16-bit immediate mask. If all selected bits are set, then the C bit
is set. Otherwise it is cleared. Then it sets all selected bits.
Usage: This instruction is very useful for testing I/O and flag bits.
Example:
BFTSTH #$0310,X:<<$FFE2 ; test high bits 4, 8, and 9 in
; an on-chip peripheral register
SR 0000 SR 0001
Explanation of Example:
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $0FF0. Execution of the
instruction tests the state of bits 4, 8, and 9 in X:$FFE2 and sets C (because all of the bits specified in
the immediate mask were set).
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
BFTSTH #<MASK16>,DDDDD 4 2
       #<MASK16>,X:(R2+xx) 6 2
BFTSTH tests all bits selected by the 16-bit immediate mask. If all selected bits are set, then the C bit
is set. Otherwise it is cleared.
Usage: This instruction is very useful for testing I/O and flag bits.
Example:
BFTSTL #$0310,X:<<$FFE2 ; test low bits 4, 8, and 9
Before Execution After Execution
SR 0000 SR 0001
Explanation of Example:
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $18EC. Execution of the
instruction tests the state of bits 4, 8, and 9 in X:$FFE2 and sets C (because all of the bits specified in
the immediate mask were clear).
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
BFTSTL #<MASK16>,DDDDD 4 2
       #<MASK16>,X:(R2+xx) 6 2
       #<MASK16>,X:(SP-xx) 6 2
       #<MASK16>,X:aa 4 2
       #<MASK16>,X:<<pp 4 2
       #<MASK16>,X:xxxx 6 3
BFTSTL tests all bits selected by the 16-bit immediate mask. If all selected bits are clear, then the C
bit is set. Otherwise it is cleared.
All registers in DDDDD are permitted except HWS.
X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing): <aa>
on page 4-22.
X:<<pp represents a 6-bit absolute I/O address. Refer to I/O Short Address (Direct Addressing): <pp>
on page 4-23.
Example:
BRA LABEL
INCW A
INCW A
LABEL
ADD B,A
Explanation of Example:
In this example, program execution skips the two INCW instructions and continues with the ADD
instruction. The BRA instruction uses a PC-relative offset of two for this example.
Restrictions:
A BRA instruction used within a DO loop cannot begin at the LA or LA-1 within that DO loop.
A BRA instruction cannot be repeated using the REP instruction.
Instruction Fields:
Example:
BRCLR #$0013,X:<<$FFE2,LABEL
INCW A
INCW A
LABEL:
ADD B,A
SR 0000 SR 0001
Explanation of Example:
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $18EC. Execution of the
instruction tests the state of bits 4, 1, and 0 in X:$FFE2 and sets C (because all of the bits specified in
the immediate mask were clear). Since C is set, program execution is transferred to the address offset
from the current program counter by the displacement specified in the instruction (the two INCW
instructions are not executed).
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
BRCLR #<MASK8>,DDDDD,<OFFSET7> 10/8 2
      #<MASK8>,X:(R2+xx),<OFFSET7> 12/10 2
      #<MASK8>,X:(SP-xx),<OFFSET7> 12/10 2
      #<MASK8>,X:aa,<OFFSET7> 10/8 2
      #<MASK8>,X:<<pp,<OFFSET7> 10/8 2
      #<MASK8>,X:xxxx,<OFFSET7> 12/10 3
BRCLR tests all bits selected by the immediate mask. If all selected bits are clear, then the carry bit
is set and a PC relative branch occurs. Otherwise it is cleared and no branch occurs.
All registers in DDDDD are permitted except HWS.
MASK8 specifies a 16-bit immediate value where either the upper or lower 8 bits contain all zeros.
AA specifies a 7-bit PC relative offset.
1. The first cycle count refers to the case when the condition is true and the branch is taken. The second cycle count
refers to the case when the condition is false and the branch is not taken.
Example:
BRSET #$00F0,X:<<$FFE2,LABEL
INCW A
INCW A
LABEL:
ADD B,A
SR 0000 SR 0001
Explanation of Example:
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $0FF0. Execution of the
instruction tests the state of bits 4, 5, 6, and 7 in X:$FFE2 and sets C (because all of the bits specified
in the immediate mask were set). Since C is set, program execution is transferred to the address offset
from the current program counter by the displacement specified in the instruction (the two INCW
instructions are not executed).
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
BRSET #<MASK8>,DDDDD,<OFFSET7> 10/8 2
      #<MASK8>,X:(R2+xx),<OFFSET7> 12/10 2
      #<MASK8>,X:(SP-xx),<OFFSET7> 12/10 2
      #<MASK8>,X:aa,<OFFSET7> 10/8 2
      #<MASK8>,X:<<pp,<OFFSET7> 10/8 2
      #<MASK8>,X:xxxx,<OFFSET7> 12/10 3
BRSET tests all bits selected by the immediate mask. If all selected bits are set, then the carry bit is
set and a PC relative branch occurs. Otherwise it is cleared and no branch occurs.
All registers in DDDDD are permitted except HWS.
MASK8 specifies a 16-bit immediate value where either the upper or lower 8 bits contain all zeros.
AA specifies a 7-bit PC relative offset.
1. The first cycle count refers to the case when the condition is true and the branch is taken. The second cycle count
refers to the case when the condition is false and the branch is not taken.
Implementation Note:
When a 16-bit register is used as the operand for CLR, this instruction is actually assembled as a
MOVE #0,<register> instruction. It will disassemble as a MOVE instruction.
Example:
CLR A A,X:(R0)+ ; save A into X data memory before
; clearing it
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $2:3456:789A. Execution of the
CLR A instruction clears the 36-bit A accumulator to zero.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Parallel Moves:
Operation      Source                      Destination
CLR F          X:(Rn)+, X:(Rn)+N           X0, Y1, Y0, A, B, A1, B1
               X0, Y1, Y0, A, B, A1, B1    X:(Rn)+, X:(Rn)+N
(F = A or B)
1. This instruction occupies only 1 program word and executes in 1 instruction
cycle for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this
case.
Usage: This instruction can be used for both integer and fractional two’s-complement data.
Note: When a word is specified as the source, it is sign extended and zero filled to form a valid 36-bit oper-
and. In order for C to be set correctly as a result of the subtraction, the destination must be properly
sign extended. The destination can be improperly sign extended by writing A1 or B1 explicitly prior
to executing the compare, so that A2 or B2, respectively, may not represent the correct sign extension.
This note particularly applies to the case in which the source is extended to compare 16-bit operands,
such as X0 with A1.
Example:
CMP Y0,A X0,X:(R1)+N ; compare Y0 and A, save X0,
; update R1
Before Execution     After Execution
Y0  0024             Y0  0024
SR  0300             SR  0319
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $0:0020:0000, and the 16-bit Y0 reg-
ister contains the value $0024. Execution of the CMP Y0,A instruction automatically appends the
16-bit value in the Y0 register with 16 LS zeros, sign extends the resulting 32-bit long word to 36 bits,
subtracts the result from the 36-bit A accumulator, and updates the CCR (leaving the A accumulator
unchanged).
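The flag update in this example can be checked with a small model. The following Python sketch is illustrative only: the helper names are hypothetical, and the definitions used for E (bits 35-31 not all equal), U (bits 31 and 30 equal), and C (borrow out of bit 35) are assumptions based on the usual meaning of those CCR bits. The L bit is left as passed in. Under those assumptions the model yields the CCR byte $19 found in the example's final SR value of $0319.

```python
MASK36 = (1 << 36) - 1

def sign_extend_word(word):
    """Place a 16-bit word in bits 31:16 and sign extend it to 36 bits."""
    value = (word & 0xFFFF) << 16
    if word & 0x8000:
        value |= 0xF << 32
    return value

def cmp_word(acc, word, ccr=0x00):
    """Model CMP <word>,<acc>: subtract, compute flags, leave acc unchanged."""
    src = sign_extend_word(word)
    result = (acc - src) & MASK36
    top5 = result >> 31                          # bits 35..31 of the result
    sa, ss, sr = acc >> 35, src >> 35, result >> 35
    e = 0 if top5 in (0x00, 0x1F) else 1         # extension in use (assumed rule)
    u = 1 if ((result >> 31) & 1) == ((result >> 30) & 1) else 0
    n = sr                                       # sign of the 36-bit result
    z = 1 if result == 0 else 0
    v = 1 if (sa != ss and sr != sa) else 0      # signed overflow of acc - src
    c = 1 if acc < src else 0                    # borrow out of bit 35
    flags = (e << 5) | (u << 4) | (n << 3) | (z << 2) | (v << 1) | c
    return (ccr & 0xC0) | flags                  # SZ and L kept as passed in

print(hex(cmp_word(0x0_0020_0000, 0x0024)))
```

The accumulator argument is never modified, which mirrors the statement that CMP leaves A unchanged.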
Instruction Fields:
F1,DD
A,B
B,A
A,B A
B,A B
A1
B1
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
(F = A or B) A1
B1
1. This instruction occupies only 1 program word and executes in 1 instruction cycle for ev-
ery addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register as the
destination of the parallel read operation. Memory writes are allowed in this case.
Example:
DECW A X:(R2)+,X0 ; Decrement the 20 MSBs of A and then
; update R2,X0
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $0:0001:0033. Execution of the
DECW A instruction decrements by one the upper 20 bits of the A accumulator.
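As a minimal sketch of the word decrement described here (the function name is hypothetical, and condition-code updates are not modeled), the upper 20 bits can be treated as one field while the low word passes through untouched:

```python
def decw(acc):
    """Model DECW: subtract one from bits 35:16, leave bits 15:0 untouched."""
    upper = ((acc >> 16) - 1) & 0xFFFFF      # 20-bit field: extension + MSP
    return (upper << 16) | (acc & 0xFFFF)

print(f"{decw(0x0_0001_0033):09X}")
```

With the example's starting value $0:0001:0033, the model leaves $0033 in the low word and decrements only the upper field.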
Data ALU Operation        Parallel Memory Move
Operation    Operands     Source                      Destination
DEC or DECW  F            X:(Rn)+, X:(Rn)+N           X0, Y1, Y0, A, B, A1, B1
                          X0, Y1, Y0, A, B, A1, B1    X:(Rn)+, X:(Rn)+N
(F = A or B)
1. This instruction occupies only 1 program word and executes in 1 instruction cycle for every
addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register as the
destination of the parallel read operation. Memory writes are allowed in this case.
If D[35] ⊕ S[15] = 1
Then shift D2:D1:D0 left 1 bit, with C entering bit 0; D1 + S → D1
Else shift D2:D1:D0 left 1 bit, with C entering bit 0; D1 - S → D1
Description: This instruction is a divide iteration that is used to calculate 1 bit of the result of a division. After the
correct number of iterations, this instruction will divide the destination operand (D)—dividend or nu-
merator—by the source operand (S)—divisor or denominator—and store the result in the destination
accumulator. The 32-bit dividend must be a positive value that is correctly sign extended to 36 bits and
that is stored in the full 36-bit destination accumulator. The 16-bit divisor is a signed value and is
stored in the source operand. (The division of signed numbers is handled using the techniques docu-
mented in Section 8.4, “Division,” on page 8-13.) This instruction can be used for both integer and
fractional division. Each DIV iteration calculates 1 quotient bit using a non-restoring division algo-
rithm (see the description that follows). After the execution of the first DIV instruction, the destination
operand holds both the partial remainder and the formed quotient. The partial remainder occupies the
high-order portion of the destination accumulator and is a signed fraction. The formed quotient occu-
pies the low-order portion of the destination accumulator (A0 or B0) and is a positive fraction. One bit
of the formed quotient is shifted into the LSB of the destination accumulator at the start of each DIV
iteration. The formed quotient is the true quotient if the true quotient is positive. If the true quotient is
negative, the formed quotient must be negated. For fractional division, valid results are obtained only
when |D| < |S|. This condition ensures that the magnitude of the quotient is less than one (that is, it is
fractional) and precludes division by zero.
The DIV instruction calculates 1 quotient bit based on the divisor and the previous partial remainder.
To produce an N-bit quotient, the DIV instruction is executed N times, where N is the number of bits
of precision that is desired in the quotient (1 ≤ N ≤ 16). Thus, for a full-precision (16-bit) quotient,
16 DIV iterations are required. In general, executing the DIV instruction N times produces an N-bit
quotient and a 32-bit remainder, which has (32 – N) bits of precision and whose N MSBs are zeros.
The partial remainder is not a true remainder and must be corrected (due to the non-restoring nature of
the division algorithm) before it may be used. Therefore, once the divide is complete, it is necessary
to reverse the last DIV operation and restore the remainder to obtain the true remainder.
The DIV instruction uses a non-restoring division algorithm that consists of the following operations:
1. Compare the source and destination operand sign bits. An exclusive OR operation is performed on
bit 35 of the destination operand and bit 15 of the source operand.
2. Shift the partial remainder and the quotient. The 36-bit destination accumulator is shifted 1 bit to the
left. C is moved into the LSB (bit 0) of the accumulator.
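The iteration can be sketched in Python as follows. This is a model under stated assumptions, not a definitive description of the hardware: after the shift, the sign-extended divisor is added to or subtracted from the upper 20 bits according to the sign comparison, and C is assumed to become the complement of bit 35 of the new partial remainder. The function names, and the setup of clearing C before sixteen iterations for positive fractional operands, are likewise illustrative; Section 8.4 remains the reference for complete routines.

```python
MASK36 = (1 << 36) - 1

def div_step(acc, src, carry):
    """One DIV iteration: 36-bit accumulator, 16-bit signed divisor, carry bit."""
    signs_differ = ((acc >> 35) ^ (src >> 15)) & 1     # D[35] XOR S[15]
    acc = ((acc << 1) | carry) & MASK36                # shift left, C enters bit 0
    divisor = src | (0xF0000 if src & 0x8000 else 0)   # sign extend to 20 bits
    upper = acc >> 16
    upper = (upper + divisor if signs_differ else upper - divisor) & 0xFFFFF
    acc = (upper << 16) | (acc & 0xFFFF)
    carry = 1 - (acc >> 35)                            # assumed: C = NOT bit 35
    return acc, carry

def divide_fractional(dividend, divisor, iterations=16):
    """Positive fractional divide: returns the formed quotient (low word)."""
    acc, carry = dividend & MASK36, 0                  # C cleared before first DIV
    for _ in range(iterations):
        acc, carry = div_step(acc, divisor & 0xFFFF, carry)
    return acc & 0xFFFF

print(hex(divide_fractional(0x0_2000_0000, 0x4000)))   # 0.25 / 0.5
```

The upper word left behind by the loop is a partial remainder only and still needs the correction described above before it can be used.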
Before Execution     After Execution
SR  0301             SR  0301
Explanation of Example:
This example shows only a single iteration of the division instruction. Please refer to Section 8.4 for a
complete description of a division algorithm.
Instruction Fields:
During the first instruction cycle, the DO instruction’s source operand is loaded into the 13-bit LC reg-
ister, and the second location in the HWS receives the contents of the first location. The LC register
stores the remaining number of times the DO loop will be executed and can be accessed from inside
the DO loop as a loop count variable subject to certain restrictions. The DO instruction allows all reg-
isters on the DSC core to specify the number of loop iterations, except for the following: M01, HWS,
OMR, and SR. If immediate short data is instead used to specify the loop count, the 6 LSBs of the LC
register are loaded from the instruction, and the upper 7 MSBs are cleared.
During the second instruction cycle, the current contents of the PC are pushed onto the HWS. The DO
instruction’s destination address (shown as “expr”) is then loaded into the LA register. This 16-bit op-
erand is located in the instruction’s 16-bit absolute address extension word (as shown in the opcode
section). The value in the PC pushed onto the HWS is the address of the first instruction following the
DO instruction (that is, the first actual instruction in the DO loop). At the bottom of the loop, when it
is necessary to return to the top for another loop pass, this value is read (that is, copied but not pulled)
from the top of the HWS and loaded into the PC.
During the third instruction cycle, the LF is set. The PC is repeatedly compared with LA to determine
if the last instruction in the loop has been fetched. If LA equals PC, the last instruction in the loop has
been fetched and the LC is tested. If LC is not equal to one, it is decremented by one, and top of HWS
is loaded into the PC to fetch the first instruction in the loop again. If LC equals one, the end-of-loop
processing begins.
During the end-of-loop processing, the NL bit is written into the LF, and the NL bit is cleared. The
contents of the second HWS location are written into the first HWS location. Instruction fetches now
continue at the address of the instruction that follows the last instruction in the DO loop.
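The loop-count test described above can be sketched as a short Python model. The function name is hypothetical, and the model assumes that the LC decrement wraps within 13 bits; it counts only how many times the loop body runs and ignores the HWS, LA, LF, and NL updates.

```python
def do_loop_passes(loop_count):
    """Count loop-body passes for a DO loop whose 13-bit LC starts at loop_count."""
    lc = loop_count & 0x1FFF          # LC is a 13-bit register
    passes = 0
    while True:
        passes += 1                   # body executes until PC reaches LA
        if lc == 1:                   # LC equals one: end-of-loop processing
            break
        lc = (lc - 1) & 0x1FFF        # otherwise decrement and return to loop top
    return passes

print(do_loop_passes(5), do_loop_passes(1), do_loop_passes(0))
```

Because the test is "LC equals one" rather than "LC equals zero", a starting count of zero in this model runs through the full 13-bit range before terminating.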
DO loops can also be nested as shown in Section 8.6, “Loops,” on page 8-20. When DO loops are nest-
ed, the end-of-loop addresses must also be nested and are not allowed to be equal. The assembler gen-
erates an error message when DO loops are improperly nested.
Note: Due to pipelining, if an address register (R0–R3, SP, or M01) is changed using a move-type instruction
(LEA, Tcc, MOVE, MOVEC, MOVEP, or parallel move), the new contents of the destination address
register will not be available for use during the following instruction (that is, there is a single instruc-
tion cycle pipeline delay). This restriction also applies to the situation in which the last instruction in
a DO loop changes an address register and the first instruction at the top of the DO loop uses that same
address register. The top instruction becomes the following instruction due to the loop construct.
Note: If the A or B accumulator is specified as a source operand, and the data from the accumulator indicates
that extension is used, the value to be loaded into the LC register will be limited to a 16-bit maximum
positive or negative saturation constant. If positive saturation occurs, the limiter places $7FFF onto the
bus, and the lower 13 bits of this value are all ones. The thirteen ones are loaded into the LC register,
giving the maximum unsigned loop count allowed. If negative saturation occurs, the limiter places
$8000 onto the bus, and the lower 13 bits of this value are all zeros. The thirteen zeros are loaded into
the LC register, specifying a loop count of zero. The A and B accumulators remain unchanged.
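A minimal Python sketch of this behavior follows. The function name is hypothetical, and "extension is used" is assumed to mean that bits 35 through 31 of the accumulator are not all equal; the accumulator itself is only read.

```python
def loop_count_from_accumulator(acc):
    """Return the 13-bit LC value loaded by DO when a 36-bit accumulator is the source."""
    top5 = (acc >> 31) & 0x1F                 # bits 35..31
    if top5 in (0x00, 0x1F):                  # extension not in use
        word = (acc >> 16) & 0xFFFF           # bits 31:16 pass through unchanged
    elif (acc >> 35) & 1:                     # negative: limiter drives $8000
        word = 0x8000
    else:                                     # positive: limiter drives $7FFF
        word = 0x7FFF
    return word & 0x1FFF                      # only the 13 LSBs reach LC

print(hex(loop_count_from_accumulator(0x1_0000_0000)))
```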
Note: If LC is zero upon entering the DO loop, the loop is executed 2^13 times. To avoid this, use the software
technique outlined in Section 8.6, “Loops,” on page 8-20.
Proper DO loop operation is not guaranteed if an instruction starting at the LA-2, LA-1, or LA specifies
one of the program controller registers SR, SP, LA, LC, or (implicitly) PC as a destination register.
Similarly, the HWS register may not be specified as a source or destination register in an instruction
starting at the LA-2, LA-1, or LA. Additionally, the HWS register cannot be specified as a source reg-
ister in the DO instruction itself, and LA cannot be used as a target for jumps to subroutine (that is, JSR
to LA). A DO instruction cannot be repeated using the REP instruction.
The following instructions cannot begin at the indicated position(s) near the end of a DO loop:
At the LA-1:
ENDDO
Single-word instructions that read LC, SP, or HWS
At the LA:
Any two-word instruction (this restriction applies to the situation in which the DSC
simulator’s single-line assembler is used to change the last instruction in a DO loop from
a one-word instruction to a two-word instruction)
Similarly, since the DO instruction accesses the program controller registers, the DO instruction must
not be immediately preceded by any of the following instructions:
Other Restrictions:
DO HWS,xxxx
JSR to (LA) whenever the LF is set
A DO instruction cannot be repeated using the REP instruction
Example:
         DO     Y0,ENDLP       ; execute loop ending at ENDLP (Y0) times
         MOVEC  LC,A           ; get current value of loop counter
         CMP    Y1,A           ; compare loop counter with Y1
         JNE    CONTINU        ; jump if LC not equal to Y1
         ENDDO                 ; equal, restore all DO registers
         JMP    ENDLP          ; jump to ENDLP, continue after loop
CONTINU:                       ; LC not equal to Y1, continue loop
         MOVE   #1,X:$4000     ; (last instruction in DO loop)
ENDLP:   MOVE   #$1234,X0      ; (first instruction AFTER DO loop)
Explanation of Example:
This example illustrates the use of the ENDDO instruction to terminate the current DO loop. The value
of the LC is compared with the value in the Y1 register to determine if execution of the DO loop should
continue. The ENDDO instruction updates certain program controller registers but does not automat-
ically jump past the end of the DO loop. Thus, if this action is desired, a JMP/BRA instruction (that is,
JMP ENDLP as shown previously) must be included after the ENDDO instruction to transfer program
control to the first instruction past the end of the DO loop.
Note: The ENDDO instruction updates the program controller registers appropriately but does not automat-
ically jump past the end of the loop. If desired, this must be done explicitly by the programmer.
Restrictions:
Due to pipelining and the fact that the ENDDO instruction accesses the program controller registers,
the ENDDO instruction must not be immediately preceded by any of the following instructions:
MOVEC to SR or HWS
MOVEC from HWS
Any bit-field instruction on the SR
Also, the ENDDO instruction cannot be the next-to-last instruction in a DO loop (at the LA-1).
Condition Codes Affected:
The condition codes are not affected by this instruction.
Instruction Fields:
ENDDO 2 1 Remove one value from the hardware stack and update
the NL and LF bits appropriately
Note: Does not branch to the end of the loop
Usage: This instruction is used for the logical exclusive OR of two registers. If it is desired to exclusive OR a
16-bit immediate value with a register or memory location, then the EORC instruction is appropriate.
Example:
EOR Y1,B ; Exclusive_OR Y1 with B1
Before Execution     After Execution
Y1  FF00             Y1  FF00
Explanation of Example:
Prior to execution, the 16-bit Y1 register contains the value $FF00, and the 36-bit B accumulator con-
tains the value $5:5555:6789. The EOR Y1,B instruction logically exclusive ORs the 16-bit value in
the Y1 register with bits 31–16 of the B accumulator (B1) and stores the 36-bit result in the B accumu-
lator. The lower word of the accumulator (B0) and the extension bits (B2) are not affected by the op-
eration.
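The effect on the three accumulator portions can be illustrated with a one-line model. This is a sketch with a hypothetical function name; condition codes are not modeled.

```python
def eor_word(acc, word):
    """Model EOR <word>,<acc>: XOR into bits 31:16 only; B2 and B0 are untouched."""
    return acc ^ ((word & 0xFFFF) << 16)

print(f"{eor_word(0x5_5555_6789, 0xFF00):09X}")
```

For the values in this example, only the middle word changes ($5555 becomes $AA55), while the extension nibble and the low word are carried through.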
F1,DD
Description: Logically exclusive OR a 16-bit immediate data value with the destination operand (D) and store the
results back into the destination. C is also modified as described below. This instruction performs a
read-modify-write operation on the destination and requires two destination accesses.
Example:
EORC #$0FF0,X:<<$FFE0 ; Exclusive OR with immediate data
Before Execution     After Execution
SR  0301             SR  0300
Explanation of Example:
Prior to execution, the 16-bit X memory location X:$FFE0 contains the value $5555. Execution of the
instruction performs a logical XOR of the 16-bit immediate data value ($0FF0) with the destination
contents ($5555). In this case, it tests the 8 bits [11:4] in X:$FFE0 and writes back the result ($5AA5)
to destination X:$FFE0. The C bit is cleared because not all of the tested bits were set.
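The read-modify-write and the carry test can be sketched together in Python. The function name is hypothetical, and the rule that C is set only when every bit selected by the mask was set beforehand is taken from the explanation of this example.

```python
def eorc(mask, value):
    """Return (new_value, carry) for EORC #mask applied to a 16-bit destination."""
    carry = 1 if (value & mask) == mask else 0   # all tested bits were set
    return (value ^ mask) & 0xFFFF, carry

print(eorc(0x0FF0, 0x5555))
```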
If the ILLEGAL instruction is in a DO loop at the LA and the instruction at the LA-1 is being inter-
rupted, then LC will be decremented twice due to the same mechanism that causes LC to be decrement-
ed twice if JSR, REP,… are located at the LA.
Since REP is not interruptible, repeating an ILLEGAL instruction results in the interrupt not being tak-
en until after completion of the REP. After servicing the interrupt, program control will return to the
address of the second word following the ILLEGAL instruction. Of course, the ILLEGAL interrupt
service routine should abort further processing, and the processor should be re-initialized.
Usage: The ILLEGAL instruction provides a means for testing the interrupt service routine executed upon dis-
covering an illegal instruction. This allows a user to verify that the interrupt service routine can cor-
rectly recover from an illegal instruction and restart the application. The ILLEGAL instruction is not
used in normal programming.
Example:
ILLEGAL
Explanation of Example: See the previous description.
Usage: This instruction is useful in general computing when it is necessary to multiply two integers and the
nature of the computation can guarantee that the result fits in a 16-bit destination. In this case, it is bet-
ter to place the result in the MSP (A1 or B1) of an accumulator, because more instructions have access
to this portion than to the other portions of the accumulator.
Note: No overflow control or rounding is performed during integer multiply instructions. The result is always
a 16-bit signed integer result that is sign extended to 20 bits.
Example:
IMPY16 Y0,X0,A ; form 16-bit product
Before Execution     After Execution
X0  0003             X0  0003
Y0  0004             Y0  0004
Explanation of Example:
Prior to execution, the data ALU registers X0 and Y0 contain, respectively, two 16-bit signed integer
values ($0003 and $0004). The contents of the destination accumulator are not important prior to
execution. Execution of the IMPY16 Y0,X0,A instruction integer multiplies X0 and Y0 and stores the
result ($000C) in A1. A0 remains unchanged, and A2 is sign extended.
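A short Python model of this behavior is given below as a sketch; the function names are hypothetical. It keeps only the low 16 bits of the signed product (no overflow control), sign extends that word into the extension bits, and preserves the low word of the accumulator.

```python
def impy16(x, y, acc):
    """Model IMPY16: 16-bit integer product into A1, sign into A2, A0 preserved."""
    def signed16(word):
        return word - 0x10000 if word & 0x8000 else word
    product = (signed16(x) * signed16(y)) & 0xFFFF   # truncated, no saturation
    ext = 0xF if product & 0x8000 else 0x0           # sign extend to 20 bits
    return (ext << 32) | (product << 16) | (acc & 0xFFFF)

print(f"{impy16(0x0003, 0x0004, 0x0_0000_1234):09X}")
```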
Example:
INCW A X:(R0)+,X0 ; Increment the 20 MSBs of A
; update X0 and R0
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $0:0001:0033. Execution of the
INCW A instruction increments by one the upper 20 bits of the A accumulator.
Parallel Moves:
Data ALU Operation        Parallel Memory Move
Operation    Operands     Source                      Destination
INC or INCW  F            X:(Rn)+, X:(Rn)+N           X0, Y1, Y0, A, B, A1, B1
                          X0, Y1, Y0, A, B, A1, B1    X:(Rn)+, X:(Rn)+N
(F = A or B)
1. This instruction occupies only 1 program word and executes in 1 instruction cycle for every
addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register as the
destination of the parallel read operation. Memory writes are allowed in this case.
Example:
CMP X0,A
JCS LABEL ; jump to label if carry bit is set
INCW A
INCW A
LABEL:
ADD B,A
Explanation of Example:
In this example, if C is one when executing the JCS instruction, program execution skips the two
INCW instructions and continues with the ADD instruction. If the specified condition is not true, no
jump is taken, the program counter is incremented by one, and program execution continues with the
first INCW instruction. The Jcc instruction uses a 16-bit absolute address for this example.
Restrictions:
A Jcc instruction used within a DO loop cannot begin at the LA or LA-1 within that DO loop.
A Jcc instruction cannot be repeated using the REP instruction.
The condition codes are tested but not modified by this instruction.
Instruction Fields:
1. The clock-cycle count depends on whether the branch is taken. The first value applies if the branch is taken, and the
second applies if it is not.
Example:
JMP LABEL
Explanation of Example:
In this example, program execution is transferred to the address represented by LABEL. The DSC core
supports up to 16-bit program addresses.
Restrictions:
A JMP instruction used within a DO loop cannot begin at the LA within that DO loop.
A JMP instruction cannot be repeated using the REP instruction.
Instruction Fields:
Example:
JSR LABEL ; jump to absolute address of a
; subroutine indicated by LABEL
Explanation of Example:
In this example, program execution is transferred to the subroutine at the address represented by
LABEL. The DSC core supports up to 16-bit program addresses.
Restrictions:
A JSR instruction used within a DO loop cannot begin at the LA within that DO loop.
A JSR instruction used within a DO loop cannot specify the LA as its target.
A JSR instruction cannot be repeated using the REP instruction.
Instruction Fields:
JSR <ABS16> 8 2 Push return address and status register and jump to 16-bit
target address
Example:
LEA (R0)+N ; update R0 using (R0)+N
Before Execution     After Execution
R0  8001             R0  8C02
N   0C01             N   0C01
Explanation of Example:
Prior to execution, the 16-bit address register R0 contains the value $8001, the 16-bit address register
N contains the value $0C01, and the 16-bit modulo register M01 contains the value $1000. Execution
of the LEA (R0)+N instruction adds the contents of the R0 register to the contents of the N register
and stores the resulting updated address in the R0 address register. The addition is performed using
modulo arithmetic since it is done with the R0 register and M01 is not equal to $FFFF. No wraparound
occurs during the addition because the result falls within the boundaries of the modulo buffer.
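The boundary check behind this statement can be sketched in Python. The model is an assumption-laden illustration rather than a description of the AGU hardware: it assumes the modulus is M01 + 1, that the buffer's lower boundary is the current address with its k low bits cleared (k being the smallest value with 2^k at least the modulus), and that the offset magnitude does not exceed the modulus. The function name is hypothetical.

```python
def lea_post_update(r, n, m01):
    """Model LEA (Rn)+N for an address register governed by M01."""
    offset = n - 0x10000 if n & 0x8000 else n      # N is a signed offset
    if m01 == 0xFFFF:                              # linear arithmetic
        return (r + offset) & 0xFFFF
    modulus = m01 + 1
    k = (modulus - 1).bit_length()                 # smallest k with 2**k >= modulus
    lower = r & ~((1 << k) - 1) & 0xFFFF           # buffer base address
    upper = lower + m01                            # last address in the buffer
    result = r + offset
    if result > upper:                             # wrap past the top
        result -= modulus
    elif result < lower:                           # wrap past the bottom
        result += modulus
    return result & 0xFFFF

print(hex(lea_post_update(0x8001, 0x0C01, 0x1000)))
```

With M01 = $1000 and R0 = $8001, the assumed buffer spans $8000 through $9000, so the sum $8C02 stays inside it and no wraparound is applied.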
C ← D1 ← 0 (1-bit left shift of D1; D2 and D0 unchanged)
Description: Logically shift 16 bits of the destination operand (D) by 1 bit to the left, and store the result in the des-
tination. If the destination is a 36-bit accumulator, the result is stored in the MSP of the accumulator
(FF1 portion), and the remaining portions of the accumulator are not modified. The MSB of the desti-
nation (bit 31 if the destination is a 36-bit accumulator) prior to the execution of the instruction is shift-
ed into C, and zero is shifted into the LSB of D1 (bit 16 if the destination is a 36-bit accumulator). The
result is not affected by the state of the saturation bit (SA).
Example:
LSL B ; shift 1 bit left
Before Execution     After Execution
SR  0300             SR  0305
Explanation of Example:
Prior to execution, the 36-bit B accumulator contains the value $6:8000:00AA. Execution of the
LSL B instruction shifts the 16-bit value in the B1 register 1 bit to the left and stores the result back
in the B1 register. C is set by the operation because bit 31 of B (the MSB of B1) was set prior to the
execution of the instruction. The Z bit of CCR (bit 2) is also set because the result in B1 is zero.
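A minimal Python sketch of the shift, using a hypothetical function name, returns the new accumulator value together with the C and Z outcomes described here. Other condition codes are not modeled.

```python
def lsl(acc):
    """Model LSL on a 36-bit accumulator: shift bits 31:16 left by one."""
    msp = (acc >> 16) & 0xFFFF
    carry = msp >> 15                       # old bit 31 goes to C
    msp = (msp << 1) & 0xFFFF               # zero enters bit 16
    zero = 1 if msp == 0 else 0             # Z reflects the 16-bit result
    result = (acc & 0xF_0000_FFFF) | (msp << 16)
    return result, carry, zero

print(lsl(0x6_8000_00AA))
```

For B = $6:8000:00AA the extension nibble ($6) and the low word ($00AA) are preserved, and both C and Z come back set, matching the CCR change from $00 to $05 in the example.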
Implementation Note:
This instruction is actually implemented by the assembler using the ASLL instruction. It will disas-
semble as ASLL.
Example:
LSLL Y1,X0,Y1 ; left shift of 16-bit Y1 by X0
Before Execution     After Execution
X0  0004             X0  0004
Explanation of Example:
Prior to execution, the Y1 register contains the value to be shifted ($AAAA) and the X0 register con-
tains the amount to shift by ($0004). The contents of the destination register are not important prior to
execution because they have no effect on the calculated value. The LSLL instruction logically shifts
the value $AAAA four bits to the left and places the result in the destination register Y1.
Operation  Operands     Cycles   Words
LSLL       Y1,X0,DD     2        1
           Y0,X0,DD
           Y1,Y0,DD
           Y0,Y0,DD
           A1,Y0,DD
           B1,Y1,DD
Comments:
Logical shift left of the first operand by value specified in four LSBs of the second operand;
places result in DD.
Use ASLL when left shifting is desired on one of the two accumulators.
0 → D1 → C (1-bit right shift of D1; D2 and D0 unchanged)
Description: Logically shift 16 bits of the destination operand (D) by 1 bit to the right, and store the result in the
destination. If the destination is a 36-bit accumulator, the result is stored in the MSP of the accumulator
(FF1 portion), and the remaining portions of the accumulator are not modified. The LSB of the desti-
nation (bit 16 if the destination is a 36-bit accumulator) prior to the execution of the instruction is shift-
ed into C, and zero is shifted into the MSB of D1 (bit 31 if the destination is a 36-bit accumulator). The
result is not affected by the state of the saturation bit (SA).
Example:
LSR B ; divide B1 by 2 (B1 considered unsigned)
Before Execution     After Execution
SR  0300             SR  0305
Explanation of Example:
Prior to execution, the 36-bit B accumulator contains the value $F:0001:00AA. Execution of the
LSR B instruction shifts the 16-bit value in the B1 register 1 bit to the right and stores the result back
in the B1 register. C is set by the operation because bit 16 of B was set prior to the execution of the
instruction. The Z bit of CCR (bit 2) is also set because the result in B1 is zero.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
N — Always cleared
Z — Set if the MSP of result or all bits of 16-bit register result are zero
V — Always cleared
C — Set if bit 16 of accumulator or bit 0 of 16-bit register was set prior to the execution
of the instruction
Instruction Fields:
Example:
LSRAC Y1,X0,A ; 16-bit add
Before Execution After Execution
Y1 C003 Y1 C003
X0 0004 X0 0004
Explanation of Example:
Prior to execution, the Y1 register contains the value to be shifted ($C003), the X0 register contains
the amount by which to shift ($0004), and the destination accumulator contains $0:0000:0099. The
LSRAC instruction logically shifts the value $C003 four bits to the right and accumulates this result
with the value already in the destination register A. Since the destination is an accumulator, the exten-
sion word (A2) is filled with sign extension.
Example:
LSRR Y1,X0,A ; right shift of 16-bit Y1 by X0
Before Execution     After Execution
Y1  AAAA             Y1  AAAA
X0  0004             X0  0004
Explanation of Example:
Prior to execution, the Y1 register contains the value to be shifted ($AAAA), and the X0 register con-
tains the amount by which to shift ($0004). The contents of the destination register are not important
prior to execution because they have no effect on the calculated value. The LSRR instruction logically
shifts the value $AAAA four bits to the right and places the result in the destination register (A). Since
the destination is an accumulator, the extension word (A2) is filled with sign extension, and the LSP
(A0) is set to zero.
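The accumulator case can be sketched in Python as follows; the function name is hypothetical. The model uses the four LSBs of the shift operand, fills the extension bits from the sign of the 16-bit result, and clears the low word. For any nonzero shift count the MSB of the shifted word is zero, so the extension bits come out as zero.

```python
def lsrr(value, shift):
    """Model LSRR with an accumulator destination."""
    count = shift & 0xF                      # four LSBs of the second operand
    msp = (value & 0xFFFF) >> count          # logical (zero-fill) right shift
    ext = 0xF if msp & 0x8000 else 0x0       # sign extension of the 16-bit result
    return (ext << 32) | (msp << 16)         # LSP (A0) is cleared

print(f"{lsrr(0xAAAA, 0x0004):09X}")
```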
Operation  Operands      Cycles   Words
LSRR       Y1,X0,FDD     2        1
           Y0,X0,FDD
           Y1,Y0,FDD
           Y0,Y0,FDD
           A1,Y0,FDD
           B1,Y1,FDD
Comments:
Logical shift right of the first operand by value specified in four LSBs of the second operand;
places result in FDD (when result is to an accumulator F, zero extends into F2).
Usage: This instruction is used for multiplication and accumulation of fractional data or integer data when a
full 32-bit product is required (see Section 3.3.5.2, “Integer Multiplication,” on page 3-20). When the
destination is a 16-bit register, this instruction is useful only for fractional data.
Example:
MAC X0,Y1,A X:(R1)+,Y1 X:(R3)+,X0
Before Execution     After Execution
X0  4000             X0  2000
Y1  0AA0             Y1  0450
Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $4000, the 16-bit Y1 register contains the
value $0AA0, and the 36-bit A accumulator contains the value $0:0003:0003. Execution of the
MAC X0,Y1,A instruction multiplies the 16-bit signed value in the X0 register by the 16-bit signed
value in Y1, adds the resulting 32-bit product to the 36-bit A accumulator, and stores the result
($0:0553:0003) into the A accumulator. In parallel, X0 and Y1 are updated with new values fetched
from data memory, and the two address registers (R1 and R3) are post-incremented by one.
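The arithmetic in this example can be reproduced with a small Python model. It is a sketch only: the function names are hypothetical, the fractional product is formed as the signed integer product shifted left by one bit, and neither saturation nor the parallel moves are modeled.

```python
MASK36 = (1 << 36) - 1

def signed16(word):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return word - 0x10000 if word & 0x8000 else word

def mac(x, y, acc):
    """Model fractional MAC: acc + ((x * y) << 1), wrapped to 36 bits."""
    product = (signed16(x) * signed16(y)) << 1    # fractional alignment
    return (acc + product) & MASK36

print(f"{mac(0x4000, 0x0AA0, 0x0_0003_0003):09X}")
```

Here $4000 represents 0.5, so the product is half of $0AA0 ($0550 in the upper word), and adding it to $0:0003:0003 gives the stated result.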
Parallel Moves:
Data ALU Operation Parallel Memory Move
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
(F = A or B) A1
B1
1. This instruction occupies only 1 program word and executes in 1 instruction cy-
cle for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this
case.
1. This parallel instruction is not allowed when the XP bit in the OMR is set (that is, when the instructions
are executing from data memory).
2. This instruction occupies only 1 program word and executes in 1 instruction cycle for every addressing
mode.
Timing: 2 + mv oscillator clock cycles for MAC instructions with parallel move
2 oscillator clock cycles for MAC without parallel move
Usage: This instruction is used for the multiplication, accumulation, and rounding of fractional data.
Example:
MACR -X0,Y1,A
Before Execution     After Execution
X0  4000             X0  4000
Y1  C000             Y1  C000
Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $4000, the 16-bit Y1 register contains the
value $C000, and the 36-bit A accumulator contains the value $0:0003:8000. Execution of the
MACR -X0,Y1,A instruction multiplies the 16-bit signed value in the X0 register by the 16-bit
signed value in Y1 and subtracts the resulting 32-bit product from the 36-bit A accumulator, rounds
the result, and stores the result ($0:2004:0000) into the A accumulator. In this example, the default
rounding (convergent rounding) is performed.
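The rounding step is the subtle part of this example, so a Python sketch may help. The function names are hypothetical, and convergent rounding is modeled as round-half-to-even on the upper 20 bits, which is the assumption that reproduces the stated result; saturation is not modeled.

```python
MASK36 = (1 << 36) - 1

def signed16(word):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return word - 0x10000 if word & 0x8000 else word

def round_convergent(acc):
    """Round a 36-bit value to bits 35:16 (ties to even) and clear bits 15:0."""
    upper, lower = acc >> 16, acc & 0xFFFF
    if lower > 0x8000 or (lower == 0x8000 and upper & 1):
        upper += 1
    return (upper << 16) & MASK36

def macr_negated(x, y, acc):
    """Model MACR -x,y,acc: subtract the fractional product, then round."""
    product = (signed16(x) * signed16(y)) << 1
    return round_convergent((acc - product) & MASK36)

print(f"{macr_negated(0x4000, 0xC000, 0x0_0003_8000):09X}")
```

Before rounding the sum is $0:2003:8000, an exact tie. Because $2003 is odd, the tie rounds up to $2004; a tie on an even upper word such as $2002 would round down instead.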
Instruction Fields:
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
(F = A or B) A1
B1
1. This instruction occupies only 1 program word and executes in 1 instruction cy-
cle for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this
case.
1. This parallel instruction is not allowed when the XP bit in the OMR is set (that is, when the instructions
are executing from data memory).
2. This instruction occupies only 1 program word and executes in 1 instruction cycle for every addressing
mode.
Timing: 2 + mv oscillator clock cycles for MACR instructions with a parallel move
2 oscillator clock cycles for MACR instructions without parallel move
Example:
MACSU X0,Y0,A
Before Execution     After Execution
X0  3456             X0  3456
Y0  8000             Y0  8000
Explanation of Example:
The 16-bit X0 register contains the value $3456 and the 16-bit Y0 register contains the value $8000.
Execution of the MACSU X0,Y0,A instruction multiplies the 16-bit signed value in the X0 register
by the 16-bit unsigned value in Y0, and then adds the result to the A accumulator and stores the signed
result back into the A accumulator. If this were a MAC instruction, Y0 ($8000) would equal -1.0, and
the multiplication result would be $F:CBAA:0000. Since this is a MACSU instruction, Y0 is consid-
ered unsigned and equals +1.0. This gives a multiplication result of $0:3456:0000.
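The difference between the two interpretations of Y0 can be shown side by side. The following Python sketch uses hypothetical names and starts from an accumulator value of zero (an assumption, since the example does not state the initial contents of A), so the returned value is just the product.

```python
MASK36 = (1 << 36) - 1

def signed16(word):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return word - 0x10000 if word & 0x8000 else word

def multiply_accumulate(x, y, acc, y_unsigned):
    """Fractional multiply-accumulate with y treated as signed or unsigned."""
    y_value = (y & 0xFFFF) if y_unsigned else signed16(y)
    product = (signed16(x) * y_value) << 1        # fractional alignment
    return (acc + product) & MASK36

print(f"{multiply_accumulate(0x3456, 0x8000, 0, True):09X}")   # MACSU view
print(f"{multiply_accumulate(0x3456, 0x8000, 0, False):09X}")  # MAC view
```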
Since the on-chip peripheral registers are accessed as locations in X data memory, there are many move
instructions that can access these peripheral registers. Also, the case of “No Move Specified” for arith-
metic operations optionally allows a parallel move.
When a 36-bit accumulator (A or B) is specified as a source operand (S), there is a possibility that the
data may be limited. If the data out of the accumulator indicates that the accumulator extension bits are
in use, and the data is to be moved into a 16-bit destination, the value stored in the destination is limited
to a maximum positive or negative saturation constant to minimize truncation error. Limiting does not
occur if an individual 16-bit accumulator register (A1, A0, B1, or B0) is specified as a source operand
instead of the full 36-bit accumulator (A or B). This limiting feature allows block floating-point oper-
ations to be performed with error detection since the L bit in the CCR is latched (that is, sticky).
When a 36-bit accumulator (A or B) is specified as a destination operand (D), any 16-bit source data
to be moved into that accumulator is automatically extended to 36 bits by sign extending the MSB of
the source operand (bit 15) and appending the source operand with 16 LS zeros. The automatic sign
extension and zeroing features may be circumvented by specifying the destination register to be one
of the individual 16-bit accumulator registers (A1 or B1).
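The two destination choices can be contrasted with a short Python sketch. The function names are hypothetical, and only the data placement is modeled.

```python
def move_word_to_accumulator(word):
    """16-bit source into a full 36-bit accumulator: sign extend, zero the LSP."""
    ext = 0xF if word & 0x8000 else 0x0           # copy bit 15 into bits 35:32
    return (ext << 32) | ((word & 0xFFFF) << 16)  # 16 LS zeros appended

def move_word_to_a1(acc, word):
    """16-bit source into A1 only: A2 and A0 keep their previous contents."""
    return (acc & 0xF_0000_FFFF) | ((word & 0xFFFF) << 16)

print(f"{move_word_to_accumulator(0x8000):09X}")
print(f"{move_word_to_a1(0x0_1111_2222, 0x8000):09X}")
```

Writing A1 alone leaves the extension bits stale, which is how an accumulator can end up improperly sign extended.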
The MOVE, MOVE(C), MOVE(I), MOVE(M), MOVE(P), and MOVE(S) descriptions are found on
the following pages. Detailed descriptions of the two parallel move types are covered under the MOVE
instruction. The Tcc and TFR descriptions are covered in their respective sections.
Seventeen data ALU instructions allow the capability of specifying an optional single parallel move.
These data ALU instructions have been selected for optimal performance on the critical sections of fre-
quently used DSC algorithms. A summary of the different data ALU instructions, registers used for the
memory move, and addressing modes available for the single parallel move is shown in Table 6-35,
“Data ALU Instructions — Single Parallel Move,” on page 6-29.
If the arithmetic operation of the instruction specifies a given source register (S) or destination register
(D), that same register or portion of that register may be used as a source in the parallel data bus move
operation. This allows data to be moved in the same instruction in which it is being used as a source
operand by a data ALU operation. That is, duplicate sources are allowed within the same instruction.
Examples of duplicate sources include the following:
Description: If the arithmetic operation portion of the instruction specifies a given destination accumulator, that
same accumulator or portion of that accumulator may not be specified as a destination in the parallel
data bus move operation. Thus, if the opcode-operand portion of the instruction specifies the 36-bit A
or B accumulator as its destination, the parallel data bus move portion of the instruction may not spec-
ify A0/B0, A1/B1, A2/B2, or A/B as its destination. That is, duplicate destinations are not allowed
within the same instruction. Examples of duplicate destinations include the following:
Exceptions:
The TST and CMP instructions allow both the accumulator and its lower portion (A and A0, B and B0)
to be the parallel move destination even if this accumulator is used by the data ALU operation. These
instructions do not have a true destination.
R3 00FF R3 0103
N 0004 N 0004
Explanation of Example:
Prior to execution, the 16-bit R3 address register contains the value $00FF, the A accumulator contains
the value $0:5555:3333, and the 16-bit X memory location X:$00FF contains the value $1234.
Execution of the parallel move portion of the instruction, A,X:(R3)+N, uses the R3 address register to
move the contents of the A1 register (as they were before the left shift) into the 16-bit X memory
location (X:$00FF). R3 is then updated by the value in the N register.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
SZ — Set according to the standard definition of the SZ bit during parallel move
L — Set if data limiting has occurred during parallel move
X0 X:(Rj)+
ADD X0,F Y1 X:(Rj)+N
SUB Y1,F Y0
CMP Y0,F
A
TFR A,B B
B,A A1
B1
ABS F
ASL
ASR
CLR
RND
TST
INC or INCW
DEC or DECW
NEG (F = A or B) (Rj = R0-R3)
1. These instructions occupy only 1 program word and execute in 1 instruction cycle for
every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register as the
destination of the parallel read operation. Memory writes are allowed in this case.
Six data ALU instructions (ADD, MAC, MACR, MPY, MPYR, and SUB) can specify an optional dual
memory read. In addition, MOVE can be specified. These data ALU instructions have been selected
for optimal performance on the critical sections of frequently used DSC algorithms. A summary of
the different data ALU instructions, the registers used for the memory move, and the addressing modes
available for the dual parallel read is shown in Table 6-36, “Data ALU Instructions — Dual Parallel
Read,” on page 6-30. When the MOVE instruction is selected, only the dual memory accesses occur;
no arithmetic operation is performed.
Example:
MPYR X0,Y0,A X:(R0)+,Y0 X:(R3)+,X0
X0 4000 X0 CCCC
Y0 5555 Y0 BBBB
Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $4000, and the 16-bit Y0 register contains
the value $5555. Execution of the parallel move portion of the instruction,
X:(R0)+,Y0 X:(R3)+,X0, moves the 16-bit value in the X memory location X:(R0) into the
register Y0, moves the 16-bit value in the X memory location X:(R3) into the register X0, and
post-increments by one the 16-bit values in the R0 and R3 address registers. The multiplication is
performed with the old values of X0 and Y0, and the result is rounded using the convergent rounding
algorithm before it is stored in the accumulator.
Note: The second X data memory parallel read using the R3 address register can never access off-chip mem-
ory or on-chip peripherals. It can only access on-chip X data memory.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
ADD X0,F
SUB Y1,F
Y0,F
(F = A or B)
1. These parallel instructions are not allowed when the XP bit in the OMR is set (that is, when the instruc-
tions are executing from data memory).
2. These instructions occupy only 1 program word and execute in 1 instruction cycle for every addressing
mode.
If the HWS is specified as a destination operand, the contents of the first HWS location are copied into
the second one, and the LF and NL bits are updated accordingly. If the HWS is specified as a source
operand, the contents of the second HWS location are copied into the first one, and the LF and NL bits
are updated accordingly. This allows more efficient manipulation of the HWS.
When a 36-bit accumulator (A or B) is specified as a source operand, there is a possibility that the data
may be limited. If the data out of the shifter indicates that the accumulator extension register is in use,
and the data is to be moved into a 16-bit destination, the value stored in the destination is limited to a
maximum positive or negative saturation constant to minimize truncation error. Limiting does not oc-
cur if an individual 16-bit accumulator register (A1, A0, B1, or B0) is specified as a source operand
instead of the full 36-bit accumulator (A or B). This limiting feature allows block floating-point oper-
ations to be performed with error detection since the L bit in the CCR is latched (that is, sticky).
When a 36-bit accumulator (A or B) is specified as a destination operand, any 16-bit source data to be
moved into that accumulator is automatically extended to 36 bits by sign extending the MSB of the
source operand (bit 15) and appending the source operand with 16 LS zeros. The automatic sign ex-
tension and zeroing features may be circumvented by specifying the destination register to be one of
the individual 16-bit accumulator registers (A1 or B1).
Note: Due to pipelining, if an address register (Rj, SP, or M01) is changed with a MOVE or bit-field instruc-
tion, the new contents will not be available for use as a pointer until the second following instruction.
If the SP is changed, no PUSH or POP instructions are permitted until the second following instruction.
Note: If the N address register is changed with a MOVE instruction, this register’s contents will be available
for use on the immediately following instruction. In this case the instruction that writes the N address
register will be stretched one additional instruction cycle. This is true for the case when the N register
is used by the immediately following instruction; if N is not used, then the instruction is not stretched
an additional cycle. If the N address register is changed with a bit-field instruction, the new contents
will not be available for use until the second following instruction.
LC 0100 LC 0100
X0 0123 X0 0100
Explanation of Example:
Execution of the MOVEC instruction moves the contents of the program controller’s 13-bit LC register
into the data ALU’s 16-bit X0 register.
Example:
MOVEC X:$CC00,N ; move X data memory value into the
; N register
N 0123 N 0100
Explanation of Example:
Execution of the MOVEC instruction moves the contents of the X data memory at location $CC00 into
the AGU’s 16-bit N register.
Example:
MOVEC R2,X:(R3+$3072) ; move R2 register into X data
; memory
Before Execution After Execution
R2 AAAA R2 AAAA
Explanation of Example:
Prior to execution, R3 contains the value $1000. Execution of the MOVEC instruction moves the
AGU’s 16-bit R2 register contents into the X data memory at the location $4072 ($1000 + $3072).
Restrictions:
A MOVEC instruction used within a DO loop that specifies the HWS as the source or that specifies
the SR or HWS as the destination cannot begin at the LA-2, LA-1, or LA within that DO loop.
A MOVEC instruction that specifies the HWS as the source or as the destination cannot be used im-
mediately before a DO instruction.
A MOVEC instruction that specifies the HWS as the source or that specifies the SR or HWS as the
destination cannot be used immediately before an ENDDO instruction.
A MOVEC instruction that specifies the SR, HWS, or SP as the destination cannot be used immedi-
ately before an RTI or RTS instruction.
A MOVEC HWS,HWS instruction is illegal and cannot be used.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
If D is the SR:
Instruction Fields:
X:(Rn+N) DDDDD 4 1 —
DDDDD X:(Rn+N) 4 1
DDDDD X:(Rn+xxxx) 6 2
Example:
MOVEI #<$FFC7,X0 ; moves negative value into X0
X0 1234 X0 FFC7
Explanation of Example:
Prior to execution, X0 contains the value $1234. Execution of the instruction moves the value $FFC7
into X0.
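The short immediate form used in this example holds a signed 7-bit value, so only immediates from -64 to 63 fit. The sketch below illustrates the range check and the sign extension back to 16 bits. The function names and the idea of a standalone 7-bit "field" are hypothetical conveniences, not the documented opcode layout.

```python
def encode_imm7(value16):
    """Return the 7-bit field for a 16-bit immediate, or raise if it does not fit."""
    value16 &= 0xFFFF
    signed = value16 - 0x10000 if value16 & 0x8000 else value16
    if not -64 <= signed <= 63:
        raise ValueError("immediate does not fit in a signed 7-bit field")
    return signed & 0x7F


def decode_imm7(field):
    """Sign-extend a 7-bit field to the 16-bit value loaded into the register."""
    field &= 0x7F
    signed = field - 0x80 if field & 0x40 else field
    return signed & 0xFFFF
```

$FFC7 is -57, which fits: it encodes as the 7-bit pattern $47 and decodes back to $FFC7. A value such as $C33C does not fit and would need the 16-bit immediate form.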
Example:
MOVEI #$C33C,X:$A009 ; moves 16-bit value directly into a
; memory location
Explanation of Example:
Prior to execution, the X data memory location $A009 contains the value $1234. Execution of the in-
struction moves the value $C33C into this memory location.
Note: The MOVEP and MOVES instructions also provide a mechanism for loading 16-bit immediate values
directly into the last 64 and first 64 locations, respectively, in X data memory.
MOVE or MOVEI   #<-64-63>   A, B, A1, B1, X0, Y0, Y1, R0-R3, N   2   1   Signed 7-bit integer data (the data is
                put in the lowest 7 bits of the word portion of any accumulator, the upper 9 bits and
                extension register are sign extended, and the LSP portion is set to “0”)
X:(SP-xx) 6 2
X:xxxx 6 3
When a 36-bit accumulator (A or B) is specified as a source operand, there is a possibility that the data
may be limited. If the data out of the shifter indicates that the accumulator extension register is in use,
and the data is to be moved into a 16-bit destination, the value stored in the destination is limited to a
maximum positive or negative saturation constant to minimize truncation error. Limiting does not oc-
cur if an individual 16-bit accumulator register (A1, A0, B1, or B0) is specified as a source operand
instead of the full 36-bit accumulator (A or B). This limiting feature allows block floating-point oper-
ations to be performed with error detection since the L bit in the CCR is latched (that is, sticky).
When a 36-bit accumulator (A or B) is specified as a destination operand, any 16-bit source data to be
moved into that accumulator is automatically extended to 36 bits by sign extending the MSB of the
source operand (bit 15) and appending the source operand with 16 LS zeros. The automatic sign ex-
tension and zeroing features may be circumvented by specifying the destination register to be one of
the individual 16-bit accumulator registers (A1 or B1).
Example:
MOVEM P:(R2)+N,A ; move P:(R2) into A,
; update R2 with N
R2 $0077 R2 $007A
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $A:1234:5678, R2 contains the value
$0077, the N register contains the value $0003, and the 16-bit program memory location P:(R2) con-
tains the value $0116. Execution of the MOVEM instruction moves the 16-bit program memory loca-
tion P:(R2) into the 36-bit A accumulator. R2 is then post-incremented by N.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
1. These instructions are not allowed when the XP bit in the OMR is set (that is, when the instructions are executing
from data memory).
When a 36-bit accumulator (A or B) is specified as a source operand, there is a possibility that the data
may be limited. If the data out of the shifter indicates that the accumulator extension register is in use,
and the data is to be moved into a 16-bit destination, the value stored in the destination is limited to a
maximum positive or negative saturation constant to minimize truncation error. Limiting does not oc-
cur if an individual 16-bit accumulator register (A1, A0, B1, or B0) is specified as a source operand
instead of the full 36-bit accumulator (A or B). This limiting feature allows block floating-point oper-
ations to be performed with error detection since the L bit in the CCR is latched (that is, sticky).
When a 36-bit accumulator (A or B) is specified as a destination operand, any 16-bit source data to be
moved into that accumulator is automatically extended to 36 bits by sign extending the MSB of the
source operand (bit 15) and appending the source operand with 16 LS zeros. The automatic sign ex-
tension and zeroing features may be circumvented by specifying the destination register to be one of
the individual 16-bit accumulator registers (A1 or B1).
Usage: This MOVEP instruction provides a more efficient way of accessing the last 64 locations in X memory,
which may be allocated to memory-mapped peripheral registers. If peripheral registers are located
outside the X:$FFC0-X:$FFFF range, use another suitable addressing mode. Consult the specific
DSP56800-based device’s user manual for information on where in the memory map peripheral
registers are located.
Example:
MOVEP R1,X:<<$FFE2 ; write to location X:$FFE2
Before Execution After Execution
R1 5555 R1 5555
Explanation of Example:
Prior to execution, the peripheral location <<$FFE2 contains the value $0123. Execution of the
MOVEP R1,X:<<$FFE2 instruction moves the value $5555 contained in the R1 register into the lo-
cation.
Example:
MOVEP #$0342,X:$FFE4 ; moves 16-bit value into
; peripheral location $FFE4
Explanation of Example:
Prior to execution, the word at X data memory location $FFE4 contains the value $AAAA. Execution
of the instruction moves the value $0342 into this location. Note that $FFE4 is recognized as a periph-
eral mapped register.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
1. The MOVEP instruction provides a more efficient way of accessing the last 64 locations in X memory, which may
be allocated to memory-mapped peripheral registers. If peripheral registers are located outside the X:$FFC0-X:$FFFF
range, use another suitable addressing mode. Consult the specific DSP56800-based device’s user manual for information
on where in the memory map peripheral registers are located.
When a 36-bit accumulator (A or B) is specified as a source operand, there is a possibility that the data
may be limited. If the data out of the shifter indicates that the accumulator extension register is in use,
and the data is to be moved into a 16-bit destination, the value stored in the destination is limited to a
maximum positive or negative saturation constant to minimize truncation error. Limiting does not oc-
cur if an individual 16-bit accumulator register (A1, A0, B1, or B0) is specified as a source operand
instead of the full 36-bit accumulator (A or B). This limiting feature allows block floating-point oper-
ations to be performed with error detection since the L bit in the CCR is latched (that is, sticky).
When a 36-bit accumulator (A or B) is specified as a destination operand, any 16-bit source data to be
moved into that accumulator is automatically extended to 36 bits by sign extending the MSB of the
source operand (bit 15) and appending the source operand with 16 LS zeros. The automatic sign ex-
tension and zeroing features may be circumvented by specifying the destination register to be one of
the individual 16-bit accumulator registers (A1 or B1).
Example:
MOVES X:<$0034,Y1 ; read X:$0034 into Y1
Y1 0123 Y1 5555
Explanation of Example:
Prior to execution, X:$0034 contains the value $5555 and Y1 contains the value $0123. Execution of
the instruction moves the value $5555 into the Y1 register.
Example:
MOVES #$0342,X:$24 ; moves 16-bit value directly
; into memory location
Explanation of Example:
Prior to execution, the X data memory location $24 contains the value $AAAA. The MOVES
instruction zero-extends the value $24 to form the memory address $0024. Execution of the instruction
moves the value $0342 into this location. Note that address $24 is recognized as a candidate for short
addressing.
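The two short addressing forms can be compared with a small sketch. MOVES reaches the first 64 locations of X data memory by zero-extending a short address, and MOVEP reaches the last 64. The 6-bit field width and the ones-extension used for MOVEP are assumptions inferred from the 64-location ranges; the function names are hypothetical.

```python
def moves_address(field6):
    """Zero-extend a 6-bit short address: X:$0000 through X:$003F."""
    return field6 & 0x3F


def movep_address(field6):
    """Assumed ones-extension of a 6-bit short address: X:$FFC0 through X:$FFFF."""
    return 0xFFC0 | (field6 & 0x3F)


def short_form_for(address):
    """Name the short-address instruction that can reach a 16-bit X address, if any."""
    address &= 0xFFFF
    if address <= 0x003F:
        return "MOVES"
    if address >= 0xFFC0:
        return "MOVEP"
    return None
```

Under these assumptions X:$0034 and X:$0024 are reachable with MOVES, X:$FFE2 and X:$FFE4 with MOVEP, and X:$A009 with neither.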
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Usage: This instruction is used for multiplication of fractional data or integer data when a full 32-bit product
is required (see Section 3.3.5.2, “Integer Multiplication,” on page 3-20). When the destination is a
16-bit register, this instruction is useful only for fractional data.
Example:
MPY X0,Y1,A ; multiply X0 by Y1
X0 4000 X0 4000
Y1 F456 Y1 F456
Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $4000 (0.5), the 16-bit Y1 register contains
the value $F456 (-0.09112), and the 36-bit A accumulator contains the value $0:1000:0000 (0.125).
Execution of the MPY X0,Y1,A instruction multiplies the 16-bit signed value in the X0 register by
the 16-bit signed value in Y1 and stores the result ($F:FA2B:0000) into the A accumulator,
X0 * Y1 = -0.04556 (truncated here to 5 decimal places).
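The arithmetic in this example can be checked with a short model of a signed fractional multiply. It is a sketch, not the data ALU implementation: the function names are hypothetical, and the accumulator is modeled as a 36-bit integer.

```python
MASK36 = (1 << 36) - 1


def to_signed16(word):
    """Interpret a 16-bit word as a two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mpy(x, y):
    """Signed fractional multiply: the integer product is shifted left by one
    bit to align the fractional point, then sign extended to 36 bits."""
    product = to_signed16(x) * to_signed16(y)
    return (product << 1) & MASK36


def as_fraction(word):
    """Value of a 16-bit word read as a signed fraction in [-1.0, 1.0)."""
    return to_signed16(word) / 32768.0
```

Here `mpy(0x4000, 0xF456)` gives $F:FA2B:0000, and `as_fraction(0xFA2B)` is approximately -0.04556, matching the explanation above.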
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Parallel Moves:
Data ALU Operation Parallel Memory Move
1
Operation Operands Source Destination2
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
(F = A or B) A1
B1
1. This instruction occupies only 1 program word and executes in 1 instruction cycle
for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this case.
1. This parallel instruction is not allowed when the XP bit in the OMR is set (that is, when the instructions
are executing from data memory).
2. This instruction occupies only 1 program word and executes in 1 instruction cycle for every addressing
mode.
Timing: 2 + mv oscillator clock cycles for MPY instructions with parallel move
2 oscillator clock cycles for MPY instructions without parallel move
Description: Multiply the two signed 16-bit source operands, round the 32-bit fractional product, and place the re-
sult in the destination (D). Both source operands must be located in the FF1 portion of an accumulator
or in X0, Y0, or Y1. The fractional product is sign extended before the rounding operation, and the
result is then stored in the destination. If the destination is one of the 16-bit registers, only the high-or-
der 16 bits of the rounded fractional result are stored. This instruction uses the rounding technique that
is selected by the R bit in the OMR. When the R bit is cleared (default mode), convergent rounding is
selected; when the R bit is set, two’s-complement rounding is selected. Refer to Section 3.5, “Round-
ing,” on page 3-30 for more information about the rounding modes. Note that the rounding operation
will always zero the LSP of the result if the destination (D) is an accumulator.
Usage: This instruction is used for multiplication and rounding of fractional data.
Example:
MPYR -X0,Y1,A ; multiply X0 by Y1 and
; negate the product
X0 4000 X0 4000
Y1 F456 Y1 F456
Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $4000 (0.5), the 16-bit Y1 register contains
the value $F456 (-0.09112), and the 36-bit A accumulator contains the value $0:1000:1234
(0.12500). Execution of the MPYR -X0,Y1,A instruction multiplies the 16-bit signed value in the X0
register by the 16-bit signed value in Y1, negates the product, rounds the result, and stores the result
($0:05D5:0000) into the A accumulator; -X0 * Y1 = 0.04556 (truncated here to 5 decimal places). In
this example, the default rounding (convergent rounding) is performed.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Instruction Fields:
Operation Operands C W Comments
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
(F = A or B) A1
B1
1. This instruction occupies only 1 program word and executes in 1 instruction cycle
for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this case.
1. This parallel instruction is not allowed when the XP bit in the OMR is set (that is, when the instructions
are executing from data memory).
2. This instruction occupies only 1 program word and executes in 1 instruction cycle for every addressing
mode.
Timing: 2 + mv oscillator clock cycles for MPYR instructions with parallel move
2 oscillator clock cycles for MPYR instructions without parallel move
Usage: In addition to single-precision multiplication of a signed value times unsigned value, this instruction
is also used for multi-precision multiplications, as shown in Section 3.3.8.2, “Multi-Precision Multi-
plication,” on page 3-23.
Example:
MPYSU X0,Y0,A
X0 3456 X0 3456
Y0 8000 Y0 8000
Explanation of Example:
The 16-bit X0 register contains the value $3456, and the 16-bit Y0 register contains the value $8000.
Execution of the MPYSU X0,Y0,A instruction multiplies the 16-bit signed value in the X0 register
by the 16-bit unsigned value in Y0 and stores the signed result into the A accumulator. If this were an
MPY instruction, Y0 ($8000) would equal -1.0, and the multiplication result would be
$F:CBAA:0000. Since this is an MPYSU instruction, Y0 is considered unsigned and equals +1.0. This
gives a multiplication result of $0:3456:0000.
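The difference between the signed and the signed-times-unsigned interpretation can be reproduced with a brief sketch. The function names are hypothetical and the 36-bit accumulator is modeled as a plain integer.

```python
MASK36 = (1 << 36) - 1


def to_signed16(word):
    """Interpret a 16-bit word as a two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mpy(x, y):
    """Both operands signed (MPY-style fractional multiply)."""
    return ((to_signed16(x) * to_signed16(y)) << 1) & MASK36


def mpysu(signed_src, unsigned_src):
    """First operand signed, second operand unsigned (MPYSU-style)."""
    return ((to_signed16(signed_src) * (unsigned_src & 0xFFFF)) << 1) & MASK36
```

With X0 = $3456 and Y0 = $8000, `mpysu` returns $0:3456:0000 and `mpy` returns $F:CBAA:0000, the two results quoted in the explanation.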
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Usage: This instruction is used for negating a 36-bit accumulator. It can also be used to negate a 16-bit value
loaded in the MSP of an accumulator if the LSP of the accumulator is $0000 (see Section 8.1.6, “Un-
signed Load of an Accumulator,” on page 8-7).
Example:
NEG B X0,X:(R3)+ ; 0-B → B, save X0, update R3
SR 0300 SR 0309
Explanation of Example:
Prior to execution, the 36-bit B accumulator contains the value $0:1234:5678. The NEG B instruction
takes the two’s-complement of the value in the B accumulator and stores the 36-bit result back in the
B accumulator.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Parallel Moves:
NEG F X:(Rn)+ X0
X:(Rn)+N Y1
Y0
A
B
A1
B1
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
(F = A or B) A1
B1
1. This instruction occupies only 1 program word and executes in 1 instruction cycle
for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this case.
Example:
NOP ; increment the program counter
Explanation of Example:
The NOP instruction increments the PC and completes any pending pipeline actions.
NOP 2 1 No operation
Example:
TST A
REP #31 ; maximum number of iterations
; (31) needed
NORM R0,A ; perform one normalization
; iteration
R0 0000 R0 FFF1
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $0:0000:8000, and the 16-bit R0 ad-
dress register contains the value $0000. The repetition of the NORM R0,A instruction normalizes the
value in the 36-bit accumulator and stores the resulting number of shifts performed during that normal-
ization process in the R0 address register. A negative value reflects the number of left shifts performed,
while a positive value reflects the number of right shifts performed during the normalization process.
In this example, 15 left shifts are required for normalization.
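The repeated normalization can be sketched as a loop. The per-iteration rule used here is an assumption about NORM, not taken from this page: shift right and increment Rn when the extension is in use, shift left and decrement Rn when the value is unnormalized and nonzero, otherwise do nothing. The flag definitions (E from bits 35-31, U from bits 31 and 30) and all names are likewise illustrative.

```python
MASK36 = (1 << 36) - 1


def norm_step(acc, rn):
    """One assumed NORM iteration on a 36-bit accumulator and a 16-bit Rn."""
    top5 = (acc >> 31) & 0x1F
    extension_in_use = top5 not in (0x00, 0x1F)
    unnormalized = ((acc >> 31) & 1) == ((acc >> 30) & 1)
    if extension_in_use:
        sign = acc & (1 << 35)
        acc = (acc >> 1) | sign          # arithmetic shift right
        rn = (rn + 1) & 0xFFFF
    elif unnormalized and acc != 0:
        acc = (acc << 1) & MASK36        # arithmetic shift left
        rn = (rn - 1) & 0xFFFF
    return acc, rn


def normalize(acc, rn=0, iterations=31):
    """Model REP #31 followed by NORM Rn,A."""
    for _ in range(iterations):
        acc, rn = norm_step(acc & MASK36, rn)
    return acc, rn
```

Starting from A = $0:0000:8000 and R0 = $0000, the model performs 15 left shifts and then idles, leaving A = $0:4000:0000 and R0 = $FFF1 (-15).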
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Example:
NOT A A,X:(R2)+ ; save A1 and take the 1’s complement
; of A1
SR 0300 SR 0300
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $5:1234:5678. The NOT A instruction
takes the one’s-complement of bits 31–16 of the A accumulator (A1) and stores the result back in the
A1 register. The remaining A accumulator bits are not affected.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Implementation Note:
This instruction is an alias to the BFCHG instruction, and assembles as BFCHG with the 16-bit imme-
diate mask set to $FFFF. This instruction will disassemble as a BFCHG instruction.
Description: Take the one’s complement of the destination operand (D), and store the result in the destination. This
instruction is a 16-bit operation. If the destination is a 36-bit accumulator, the one’s-complement is
performed on bits 31–16 of the accumulator. The remaining bits of the destination accumulator are not
affected. C is also modified as described in the following discussion.
Example:
NOTC R2
R2 CAA3 R2 355C
SR 3456 SR 3456
Explanation of Example:
Prior to execution, the R2 register contains the value $CAA3. Execution of the instruction
complements the value in R2, giving $355C. C is modified as described in the following discussion.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Usage: This instruction is used for the logical OR of two registers. If it is desired to OR a 16-bit immediate
value with a register or memory location, then the ORC instruction is appropriate.
Example:
OR Y1,B ; OR Y1 with B
Y1 FF00 Y1 FF00
Explanation of Example:
Prior to execution, the 16-bit Y1 register contains the value $FF00, and the 36-bit B accumulator con-
tains the value $0:1234:5678. The OR Y1,B instruction logically ORs the 16-bit value in the Y1 reg-
ister with B1 and stores the 36-bit result in the B accumulator.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
F1,DD
Description: Logically OR a 16-bit immediate data value with the destination operand (D) and store the results back
into the destination. C is also modified as described in the following discussion. This instruction
performs a read-modify-write operation on the destination and requires two destination accesses.
Example:
ORC #$5050,X:<<$7C30 ; OR with immediate data
SR 0300 SR 0300
Explanation of Example:
Prior to execution, the 16-bit X memory location X:$7C30 contains the value $00AA. Execution of the
instruction tests the state of bits 14, 12, 6, and 4 in X:$7C30; does not set C (because these bits were
not all set); and then sets the bits.
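The test-then-set behavior in this example can be sketched as below. The rule for C (set only when every bit selected by the mask was already set) is an assumption consistent with the explanation above, and the function name is hypothetical.

```python
def orc(mask, value):
    """Model a read-modify-write OR on a 16-bit location.

    Returns (new value, C). C reflects the state of the masked bits
    before they are set.
    """
    mask &= 0xFFFF
    value &= 0xFFFF
    carry = (value & mask) == mask
    return value | mask, carry
```

For the mask $5050 and the old contents $00AA, the model returns $50FA with C clear; applying the same mask a second time would leave $50FA unchanged and set C.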
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Implementation Note:
This instruction is implemented by the assembler using either a MOVE or LEA instruction, depending
on the form. When a destination register is specified, a MOVE X:(SP)-,<register> instruction
is assembled. When no destination register is specified, POP assembles as LEA (SP)-. The instruc-
tion will always disassemble as either MOVE or LEA.
Example:
POP LC
LC 0099 LC AAAA
SP 0100 SP 00FF
Explanation of Example:
Prior to execution, the LC register contains the value $0099, and the SP contains the value $0100. The
POP instruction reads from the location in X data memory pointed to by the SP and places this value
in the LC register. The SP is then decremented after the read from memory.
Any register on the DSC core can be used with the REP instruction to specify the number of loop
iterations, except for the following: M01, HWS, OMR, and SR. If immediate short data is instead used
to specify the loop count, the 6 LSBs of the LC register are loaded from the instruction and the 7 MSBs
are cleared.
Note: If the A or B accumulator is specified as a source operand, and the data out of the accumulator indicates
that extension is in use, the value to be loaded into the LC register will be limited to a 16-bit maximum
positive or negative saturation constant. If positive saturation occurs, the limiter places $7FFF onto the
bus, and the lower 13 bits of this value are all ones. The 13 ones are loaded into the LC register as the
maximum unsigned positive loop count allowed. If negative saturation occurs, the limiter places $8000
onto the bus, and the lower 13 bits of this value are all zeros. The 13 zeros are loaded into the LC reg-
ister, specifying a loop count of zero. The A and B accumulators remain unchanged.
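The ways of loading the 13-bit LC register can be sketched as follows. The function names are hypothetical. For an accumulator whose extension is not in use, the model takes the lower 13 bits of the MSP; that detail, and the bits 35-31 test for extension in use, are assumptions rather than statements from the text.

```python
LC_MASK = 0x1FFF  # LC is 13 bits wide


def lc_from_immediate(imm):
    """Immediate short data: 6 LSBs come from the instruction, 7 MSBs are cleared."""
    return imm & 0x3F


def lc_from_register(reg16):
    """A 16-bit register source: the lower 13 bits are loaded into LC."""
    return reg16 & LC_MASK


def lc_from_accumulator(acc):
    """An accumulator source passes through the limiter before LC is loaded."""
    top5 = (acc >> 31) & 0x1F
    if top5 not in (0x00, 0x1F):                      # extension in use
        bus = 0x8000 if (acc >> 35) & 1 else 0x7FFF   # saturation constant
    else:
        bus = (acc >> 16) & 0xFFFF
    return bus & LC_MASK
```

Positive saturation therefore yields the maximum count $1FFF, and negative saturation yields a count of zero, as described in the note.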
Note: Once in progress, the REP instruction and the REP loop may not be interrupted until completion of the
REP loop.
Restrictions:
The REP instruction can repeat any single word instruction except the REP instruction itself and any
instruction that changes program flow. The following instructions are not allowed to follow a REP in-
struction:
Also, a REP instruction cannot be the last instruction in a DO loop (at the LA). The assembler will
generate an error if any of the preceding instructions are found immediately following a REP instruc-
tion.
X0 0003 X0 0003
Y1 0000 Y1 0003
LC 00A5 LC 00A5
Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $0003, and the 13-bit LC register contains
the value $00A5. Execution of the REP X0 instruction takes the lower 13 bits of the value in the X0
register and stores it in the 13-bit LC register. Then, the single word INCW instruction immediately
following the REP instruction is repeated $0003 times. The contents of the LC register before the REP
loop are restored upon exiting the REP loop.
Example:
REP X0 ; repeat (X0) times
INCW Y1 ; increment the Y1 register
ASL Y1 ; multiply the Y1 register by 2
X0 0000 X0 0000
Y1 0005 Y1 000A
LC 00A5 LC 00A5
Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $0000, and the 13-bit LC register contains
the value $00A5. Execution of the REP X0 instruction takes the lower 13 bits of the value in the X0
register and stores it in the 13-bit LC register. Since the loop count is zero, the single word INCW in-
struction immediately following the REP instruction is skipped and execution continues with the ASL
instruction. The contents of the LC register before the REP loop are restored upon exiting the REP
loop.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Example:
RND A ; round A accumulator into
; A2:A1, zero A0
A2 A1 A0 A2 A1 A0
Before Execution After Execution
A2 A1 A0 A2 A1 A0
A2 A1 A0 A2 A1 A0
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $5:1236:789A for Case I, the value
$0:1236:8000 for Case II, and the value $0:1235:8000 for Case III. Execution of the RND A instruction
rounds the value in the A accumulator into the MSP of the A accumulator (A1) and then zeros the LSP
of the A accumulator (A0). The example assumes that convergent rounding is selected. Case II is the
special case that distinguishes convergent rounding from two’s-complement rounding, since the LSB
of the MSP is cleared after the rounding operation is performed.
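The three cases can be worked through with a short model of the two rounding modes. It is a sketch under stated assumptions: the function name is hypothetical, and overflow out of the 36-bit accumulator simply wraps.

```python
MASK36 = (1 << 36) - 1


def rnd(acc, convergent=True):
    """Round a 36-bit accumulator into its upper 20 bits and zero the LSP.

    Both modes add $8000 to the LSP. Convergent rounding additionally
    clears bit 16 when the LSP was exactly $8000 (round half to even).
    """
    acc &= MASK36
    exactly_half = (acc & 0xFFFF) == 0x8000
    acc = (acc + 0x8000) & MASK36
    if convergent and exactly_half:
        acc &= ~(1 << 16)
    return acc & ~0xFFFF
```

With convergent rounding, Case I ($5:1236:789A) becomes $5:1236:0000, and Cases II and III ($0:1236:8000 and $0:1235:8000) both become $0:1236:0000. With two’s-complement rounding, Case II would instead become $0:1237:0000.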
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
RND F 2 1 Round
Parallel Moves:
RND F X:(Rn)+ X0
X:(Rn)+N Y1
Y0
A
B
A1
B1
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
(F = A or B) A1
B1
1. This instruction occupies only 1 program word and executes in 1 instruction cy-
cle for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this
case.
C Unch. Unchanged
D2 D1 D0
Description: Logically shift 16 bits of the destination operand (D) 1 bit to the left, and store the result in the desti-
nation. If the destination is a 36-bit accumulator, the result is stored in the MSP of the accumulator
(FF1 portion), and the remaining portions of the accumulator are not modified. The MSB of the desti-
nation (bit 31 for accumulators or bit 15 for registers) prior to the execution of the instruction is shifted
into C, and the previous value of C is shifted into the LSB of the destination (bit 16 if the destination
is a 36-bit accumulator). The result is not affected by the state of the saturation bit (SA).
Example:
ROL B ; rotate B1 left 1 bit
SR 0001 SR 0000
Explanation of Example:
Prior to execution, the 36-bit B accumulator contains the value $F:0001:00AA. Execution of the
ROL B instruction shifts the 16-bit value in the B1 register 1 bit to the left, shifting bit 31 into C, ro-
tating C into bit 16, and storing the result back in the B1 register.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
ROL FDD 2 1 Rotate 16-bit register left by 1 bit through the carry bit
C Unch. Unchanged
D2 D1 D0
Description: Logically shift 16 bits of the destination operand (D) 1 bit to the right, and store the result in the des-
tination. If the destination is a 36-bit accumulator, the result is stored in the MSP of the accumulator
(FF1 portion), and the remaining portions of the accumulator are not modified. The LSB of the desti-
nation (bit 16 for a 36-bit accumulator) prior to the execution of the instruction is shifted into C, and
the previous value of C is shifted into the MSB of the destination (bit 31 for a 36-bit accumulator). The
result is not affected by the state of the saturation bit (SA).
Example:
ROR B ; rotate B1 right 1 bit
SR 0000 SR 0005
Explanation of Example:
Prior to execution, the 36-bit B accumulator contains the value $F:0001:00AA. Execution of the
ROR B instruction shifts the 16-bit value in the B1 register 1 bit to the right, shifting bit 16 into C,
rotating C into bit 31, and storing the result back in the B1 register.
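The same example can be traced with a minimal Python sketch of the right rotate. The function name and the single-integer accumulator layout are illustrative assumptions; only the bit movement described above is modeled.

```python
MASK36 = (1 << 36) - 1

def ror_through_carry(acc36: int, carry: int) -> tuple[int, int]:
    """Rotate bits 31..16 of a 36-bit accumulator right by 1 through C."""
    msp = (acc36 >> 16) & 0xFFFF                  # FF1 portion
    new_carry = msp & 1                           # old bit 16 moves into C
    new_msp = (msp >> 1) | ((carry & 1) << 15)    # old C moves into bit 31
    untouched = acc36 & MASK36 & ~(0xFFFF << 16)  # extension and LSP are kept
    return untouched | (new_msp << 16), new_carry

# Values from the example: B = $F:0001:00AA with C = 0 (SR = $0000)
result, c = ror_through_carry(0xF000100AA, 0)
print(f"{result:09X} C={c}")   # F000000AA C=1
```

B1 becomes $0000 and C becomes 1, so both Z and C are set, which agrees with the SR value of $0005 after execution.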
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
ROR FDD 2 1 Rotate 16-bit register right by 1 bit through the carry bit
Example:
RTI ; pull the SR and PC registers
; from the stack
SR 0309 SR 1300
SP 0100 SP 00FE
Explanation of Example:
The RTI instruction pulls the 16-bit PC and the 16-bit SR from the stack and updates the system SP.
Program execution continues at $754C.
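The stack movement in this example can be sketched as follows. This Python model assumes that the SR sits at the top of the stack with the PC directly below it, and that each pull is followed by a decrement of SP; the stack contents shown are assumed values chosen to match the example ($1300 and $754C), not values stated in the text.

```python
def rti(stack: dict[int, int], sp: int) -> tuple[int, int, int]:
    """Pull SR, then PC, from a stack that shrinks toward lower addresses."""
    sr = stack[sp]          # assumed: SR is on top of the stack
    sp -= 1
    pc = stack[sp]          # assumed: PC is stored just below the SR
    sp -= 1
    return sr, pc, sp & 0xFFFF

# Assumed stack image before the RTI executes
stack = {0x0100: 0x1300, 0x00FF: 0x754C}
sr, pc, sp = rti(stack, 0x0100)
print(f"SR={sr:04X} PC={pc:04X} SP={sp:04X}")
```

Two words are pulled, so SP moves from $0100 to $00FE as the example shows.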
Restrictions:
Due to pipelining in the program controller and the fact that the RTI instruction accesses certain pro-
gram controller registers, the RTI instruction must not be immediately preceded by any of the follow-
ing instructions:
MOVE(C) to the SP
Any bit-field instruction performed on the SR
An RTI instruction cannot be the last instruction in a DO loop (at the LA).
An RTI instruction cannot be repeated using the REP instruction.
Condition Codes Affected:
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
All bits — Set according to the value pulled from the stack
Instruction Fields:
Example:
RTS ; pull SR (and discard it) &
; pull PC from the stack
SR 8009 SR 8009
SP 0100 SP 00FE
Explanation of Example:
The example makes the assumption that during entry of the subroutine, only the LF bit (SR bit 15) is
on. During execution of the subroutine, the C and N bits were set. To perform the return, RTS pops the
16-bit PC from the software stack, and updates the SP. Program execution continues at $754C.
Restrictions:
Due to pipelining in the program controller and the fact that the RTS instruction accesses certain pro-
gram controller registers, the RTS instruction must not be immediately preceded by the following in-
struction:
MOVE(C) to the SP
An RTS instruction cannot be the last instruction in a DO loop (at the LA).
An RTS instruction cannot be repeated using the REP instruction.
Manipulation of bits 10-14 in the stack location corresponding to the SR register may generate unwant-
ed behavior. These bits will read as zero during DSC read operations and should be written as zero to
ensure future compatibility.
Condition Codes Affected:
The condition codes are not affected by this instruction.
Instruction Fields:
Usage: This instruction is typically used in multi-precision subtraction operations (see Section 3.3.8.1,
“Multi-Precision Addition and Subtraction,” on page 3-23) when it is necessary to subtract two num-
bers that are larger than 32 bits, such as 64-bit or 96-bit subtraction.
Example:
SBC Y,A
SR 0301 SR 0310
Explanation of Example:
Prior to execution, the 32-bit Y register (composed of the Y1 and Y0 registers) contains the value
$3FFF:FFFE, and the 36-bit A accumulator contains the value $0:4000:0000. In addition, C is set to one.
The SBC instruction automatically sign extends the 32-bit Y register to 36 bits and subtracts this value
from the 36-bit accumulator. In addition, C is subtracted from the LSB of this 36-bit subtraction. The
36-bit result is stored back in the A accumulator, and the condition codes are set accordingly. The Y1:Y0
register pair is not affected by this instruction.
Note: C is set correctly for multi-precision arithmetic using long-word operands only when the extension reg-
ister of the destination accumulator (A2 or B2) contains sign extension of bit 31 of the destination ac-
cumulator (A or B).
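The arithmetic of the example can be reproduced with a short Python sketch. It models only the 36-bit result, not the condition codes; the helper names are assumptions made for illustration.

```python
MASK36 = (1 << 36) - 1

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def sbc(acc36: int, y32: int, carry: int) -> int:
    """Subtract the sign-extended 32-bit Y and the carry bit from a 36-bit accumulator."""
    difference = sign_extend(acc36, 36) - sign_extend(y32, 32) - (carry & 1)
    return difference & MASK36

# Values from the example: A = $0:4000:0000, Y = $3FFF:FFFE, C = 1
print(f"{sbc(0x040000000, 0x3FFFFFFE, 1):09X}")   # 000000001
```

The result, $0:0000:0001, is a small positive value whose bits 31 and 30 are equal, which is consistent with the U bit being set in the final SR value of $0310.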
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Description: Enter the stop processing state. All activity in the processor is suspended until the RESET pin is as-
serted, the IRQA pin is asserted, or an on-chip peripheral asserts a signal to exit the stop processing
state. The stop processing state is a very low-power standby mode where all clocks to the DSC core,
as well as the clocks to many of the on-chip peripherals such as serial ports, are gated off. It is still
possible for timers to continue to run in stop state. In these cases the timers can be individually powered
down at the peripheral itself for lower power consumption. The clock oscillator can also be disabled
for lowest power consumption.
When the exit from the stop state is caused by a low level on the RESET pin, then the processor enters
the reset processing state. The time to recover from the stop state using RESET will depend on a clock
stabilization delay controlled by the stop delay (SD) bit in the OMR.
When the exit from the stop state is caused by a low level on the IRQA pin, then the processor will
service the highest priority pending interrupt and will not service the IRQA interrupt unless it is highest
priority. The interrupt will be serviced after an internal delay counter counts 524,284 clock phases (that
is, [2^19 - 4]T) or, if the SD bit is set to one, 28 clock phases (that is, [2^5 - 4]T). During this clock
stabilization count delay, all peripherals and external interrupts are cleared and re-enabled/arbitrated
at the start of the 17T period following the count interval. The processor will resume program execu-
tion at the instruction following the STOP instruction (the one that caused the entry into the stop state)
after the interrupts have been serviced or, if no interrupt was pending, immediately after the delay
count plus 17T. If the IRQA pin is asserted when the STOP instruction is executed, the internal delay
counter will be started. Refer to Section 7.5, “Stop Processing State,” on page 7-19 for details on the
stop mode.
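The two delay values quoted above follow directly from the counter lengths. A minimal Python sketch of the arithmetic (the function names are illustrative assumptions):

```python
def stop_delay_phases(sd_bit: int) -> int:
    """Clock phases counted by the internal delay counter when leaving stop via IRQA."""
    return (2**5 - 4) if sd_bit else (2**19 - 4)

def resume_phases_no_interrupt(sd_bit: int) -> int:
    """Delay count plus the 17T period, for the case where no interrupt is pending."""
    return stop_delay_phases(sd_bit) + 17

print(stop_delay_phases(0), stop_delay_phases(1))   # 524284 28
```

With SD cleared the counter dominates the exit time; with SD set, the fixed 17T period is a large share of the total.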
Restrictions:
A STOP instruction cannot be repeated using the REP instruction.
A STOP instruction cannot be the last instruction in a DO loop (that is, at the LA).
Example:
STOP ; enter low-power standby mode
Explanation of Example:
The STOP instruction suspends all processor activity until the processor is reset or interrupted as pre-
viously described. The STOP instruction puts the processor in a low-power standby mode. No new in-
structions are fetched until the processor exits the STOP processing state.
Timing: The STOP instruction disables internal distribution of the clock. The time to exit the stop state depends
on the value of the SD bit.
Usage: This instruction can be used for both integer and fractional two’s-complement data.
Example:
SUB X0,A X:(R2)+N,X0 ; 16-bit subtract, load X0,
; update R2
X0 0003 X0 3456
Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $0003 and the 36-bit A accumulator con-
tains the value $0:0058:1234. The SUB instruction automatically appends the 16-bit value in the X0
register with 16 LS zeros, sign extends the resulting 32-bit long word to 36 bits, and subtracts the result
from the 36-bit A accumulator. Thus, 16-bit operands are always subtracted from the MSP of A or B
(A1 or B1) with the results correctly extending into the extension register (A2 or B2).
Operands of 16 bits can be subtracted from the LSP of A or B (A0 or B0). This can be achieved using
the Y register. When loading the 16-bit operand into Y0 and loading Y1 with the sign extension of Y0,
a 32-bit word is formed. Executing a SUB Y,A or SUB Y,B instruction generates the desired opera-
tion. Similarly, the second accumulator can also be used for the source operand.
Note: Bit C is set correctly using word or long word source operands if the extension register of the destina-
tion accumulator (A2 or B2) contains sign extension from bit 31 of the destination accumulator (A or
B). C is always set correctly using accumulator source operands.
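The operand alignment described in the explanation can be sketched in Python. The model covers only the data ALU result of SUB X0,A (not the parallel move or the condition codes), and the helper names are assumptions.

```python
MASK36 = (1 << 36) - 1

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def sub_word(acc36: int, word16: int) -> int:
    """Subtract a 16-bit register from the MSP of a 36-bit accumulator."""
    operand = sign_extend(word16, 16) << 16      # append 16 LS zeros, keep the sign
    return (sign_extend(acc36, 36) - operand) & MASK36

# Values from the example: A = $0:0058:1234, X0 = $0003
print(f"{sub_word(0x000581234, 0x0003):09X}")   # 000551234
```

The LSP ($1234) is unchanged because the aligned operand has zeros in its lower 16 bits.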
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
SUB DD,FDD 2 1 36-bit subtract of two registers. 16-bit source registers are
first sign extended internally and concatenated with 16
F1,DD zero bits to form a 36-bit operand.
A,B
B,A
Y,A
Y,B
A,B A
B,A B
A1
B1
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
(F = A or B) A1
B1
1. This instruction occupies only 1 program word and executes in 1 instruction cy-
cle for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this
case.
1. This parallel instruction is not allowed when the XP bit in the OMR is set (that is, when the instructions
are executing from data memory).
2. This instruction occupies only 1 program word and executes in 1 instruction cycle for every addressing
mode.
Timing: 2 + mv oscillator clock cycles for SUB instructions with a parallel move
Refer to previous tables for SUB instructions without a parallel move
Example:
SWI ; begin SWI exception processing
Explanation of Example:
The SWI instruction suspends normal instruction execution and initiates SWI exception processing.
Restrictions:
A SWI instruction cannot be repeated using the REP instruction.
Condition Codes Affected:
The condition codes are not affected by this instruction.
Instruction Fields:
Usage: When used after the CMP instruction, the Tcc instruction can perform many useful functions such as
a “maximum value” or “minimum value” function. The desired value is stored in the destination accu-
mulator. If address register R0 is used as an address pointer into an array of data, the address of the
desired value is stored in the address register R1. The Tcc instruction may be used after any instruction
and allows efficient searching and sorting algorithms.
Note: This instruction is considered to be a move-type instruction. Due to pipelining, if an address register
(R0 or R1 for the Tcc instruction) is changed using a move-type instruction, the new contents of the
destination address register will not be available for use during the following instruction (that is, there
is a single-instruction-cycle pipeline delay).
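The "maximum value" use mentioned above can be sketched at a high level. In this Python model the variable names stand in for the registers named in the text (A for the running maximum, R0 for the array pointer, R1 for the saved address); the choice of a "greater than" condition and the function name are illustrative assumptions.

```python
def find_max(data: list[int]) -> tuple[int, int]:
    """Return (maximum value, index of maximum), mimicking CMP followed by Tcc."""
    a = data[0]      # accumulator A holds the best value so far
    r1 = 0           # R1 holds the address of the best value
    for r0, b in enumerate(data):   # R0 walks the array, B holds the candidate
        if b > a:                   # CMP sets the codes; Tcc transfers only if true
            a, r1 = b, r0           # B -> A and R0 -> R1 in one step
    return a, r1

print(find_max([3, -7, 12, 5, 12]))   # (12, 2)
```

Because the transfer of the value and of the pointer happen together, the search needs no branch inside the loop on the real device.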
A B (No transfer)
B A (No transfer)
B A R0 R1
Note: The Tcc instruction does not allow the following condition codes: HI, LS, NN, and NR.
Usage: This instruction is very similar to a MOVE instruction but has two uses. First, it can be used to perform
a 36-bit transfer of one accumulator to another. Second, when used with a parallel move, this instruc-
tion allows a register move and a memory move to occur simultaneously in 1 instruction that executes
in 1 instruction cycle.
Example:
TFR B,A X:(R0)+,Y1 ; move B to A and update Y1, R0
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $3:0123:0123 and the 36-bit B accu-
mulator contains the value $A:CCCC:EEEE. Execution of the TFR B,A instruction moves the 36-bit
value in B into the 36-bit A accumulator.
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
B,A
Parallel Moves:
Data ALU Operation Parallel Memory Move
A,B A
B,A B
A1
B1
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
A1
(F = A or B) B1
1. This instruction occupies only 1 program word and executes in 1 instruction cy-
cle for every addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register
as the destination of the parallel read operation. Memory writes are allowed in this
case.
Example:
TST A X:(R0)+N,B ; set condition codes for the
; value in A, update B & R0
SR 0300 SR 0338
Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $8:0203:0000, and the 16-bit SR con-
tains the value $0300. Execution of the TST A instruction compares the value in the A register with
zero and updates the CCR accordingly. The contents of the A accumulator are not affected.
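The SR change in this example can be reproduced with a small Python sketch. The bit definitions used here are assumptions based on the usual meaning of the CCR bits (E set when bits 35 to 31 are not all alike, U set when bits 31 and 30 are equal, N taken from bit 35, Z set for an all-zero accumulator); V and C are simply modeled as cleared, and the L and SZ bits are ignored.

```python
MASK36 = (1 << 36) - 1

def tst_ccr(acc36: int) -> int:
    """Return the E, U, N, Z bits (CCR bits 5..2) for a 36-bit accumulator test."""
    acc36 &= MASK36
    n = (acc36 >> 35) & 1
    z = int(acc36 == 0)
    top5 = (acc36 >> 31) & 0x1F                  # bits 35..31
    e = int(top5 not in (0x00, 0x1F))            # extension register in use
    u = int(((acc36 >> 31) & 1) == ((acc36 >> 30) & 1))   # unnormalized
    return (e << 5) | (u << 4) | (n << 3) | (z << 2)

# Value from the example: A = $8:0203:0000, SR = $0300 before execution
print(f"{0x0300 | tst_ccr(0x802030000):04X}")   # 0338
```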
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
Parallel Moves:
TST F X:(Rn)+ X0
X:(Rn)+N Y1
Y0
A
B
A1
B1
X0 X:(Rn)+
Y1 X:(Rn)+N
Y0
A
B
(F = A or B) A1
B1
1. This instruction occupies only 1 program word and executes in 1 instruction cycle for ev-
ery addressing mode.
2. The destination of the data ALU operation is not allowed to be the same register as the
destination of the parallel read operation. Memory writes are allowed in this case.
Example:
TSTW X:$0007 ; set condition codes using X:$0007
SR 0300 SR 0308
Explanation of Example:
Prior to execution, location X:$0007 contains the value $FC00 and the 16-bit SR contains the value
$0300. Execution of the instruction compares the value in memory location X:$0007 with zero and up-
dates the CCR accordingly. The value of location X:$0007 is not affected.
Note: This instruction does not set the same set of condition codes that the TST instruction does. Both in-
structions correctly set the V, N, Z, and C bits, but TST sets the E bit and TSTW does not. This is a
16-bit test operation when done on an accumulator (A or B), where limiting is performed if appropriate
when reading the accumulator.
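For comparison with TST, the 16-bit test in the example can be sketched in Python. Only N and Z are modeled; the remaining bits are treated as cleared, which is an assumption that happens to match the example.

```python
def tstw_ccr(word16: int) -> int:
    """Return the N and Z bits (CCR bits 3 and 2) for a 16-bit word test."""
    word16 &= 0xFFFF
    n = word16 >> 15            # sign bit of the word
    z = int(word16 == 0)
    return (n << 3) | (z << 2)

# Values from the example: X:$0007 = $FC00, SR = $0300 before execution
print(f"{0x0300 | tstw_ccr(0xFC00):04X}")   # 0308
```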
MR CCR
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LF * * * * * I1 I0 SZ L E U N Z V C
TSTW  DDDDD (except HWS)  2  1  Test 16-bit word in register. All registers allowed except HWS. Limiting performed if an accumulator is specified and the extension register is in use.
      X:(Rn+N)            4  1
      X:(Rn+xxxx)         6  2
      X:(R2+xx)           4  1
      X:(SP-xx)           4  1
      X:aa                2  1  X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing): <aa> on page 4-22.
      X:<<pp              2  1
When an unmasked interrupt or external (hardware) processor reset occurs, the processor leaves the
wait state and begins exception processing of the unmasked interrupt or reset condition.
Restrictions:
A WAIT instruction cannot be the last instruction in a DO loop (at the LA).
A WAIT instruction cannot be repeated using the REP instruction.
Example:
WAIT ; enter low-power mode,
; wait for interrupt
Explanation of Example:
The WAIT instruction suspends normal instruction execution and waits for an unmasked interrupt or
external reset to occur. No new instructions are fetched until the processor exits the wait processing
state.
Timing: If an internal interrupt is pending during the execution of the WAIT instruction, the WAIT instruction
takes a minimum of 32T cycles to execute.
If no internal interrupt is pending when the WAIT instruction is executed, the period that the DSC is
in the wait state equals the sum of the period before the interrupt or reset causing the DSC to exit the
wait state and a minimum of 28T cycles to a maximum of 31T cycles (see the appropriate data sheet).
In each code example, the number of program words that each instruction occupies and the execution time
(in instruction cycles) for each are listed in the comments and summed at the end.
Table B-1 shows the number of program words and instruction cycles for each benchmark. All the I/O
accesses in this chapter are assumed to be implemented using I/O short addressing mode.
Table B-1. Benchmark Summary
Benchmark                     Execution Time (# Icyc)   Program Length (# Words)
N Complex Multiplies          6N                        18
Vector Multiply-Accumulate    2N                        14
Energy in a Signal            1N                        7
page 132
opt cc
; Global Definition for Loop Count
N_ EQU 100 ; loop count in various benchmarks
; Peripheral addr are dependent on device implementation (assumes short addr mode)
InputValue EQU $FFC8 ; I/O peripheral address used for input
Output EQU $FFC9 ; I/O peripheral address used for output
opt cc
MOVE #N_,N ; 2 2 load size of vector
MOVE #A_Vec1,R0 ; 2 2 pointer to vector
MOVE #B_Vec1,R3 ; 2 2 pointer to vector
CLR A X:(R0)+,Y0 ; 1 1 clear & load 1st element in A
MOVE X:(R3)+,X0 ; 1 1 load 1st element in B
REP N ; 1 3 repeat until done
MAC Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 correlation and load new val
RND A ; 1 1 rounding the result
; _________
; Total: 11 1N+12
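The multiply-accumulate loop in this listing can be modeled in Python to check expected results off-target. This is a sketch under stated assumptions: operands are 16-bit signed fractions (Q15), each product is shifted left by one bit to form a Q31 value, and neither the final RND step nor 36-bit overflow is modeled. The function names are illustrative.

```python
def to_signed16(word: int) -> int:
    """Interpret a 16-bit word as a two's-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def fractional_mac_sum(vec_a: list[int], vec_b: list[int]) -> int:
    """Sum of Q15 x Q15 products, returned as a Q31 integer."""
    acc = 0
    for a, b in zip(vec_a, vec_b):
        acc += (to_signed16(a) * to_signed16(b)) << 1   # fractional multiply
    return acc

# 0.5*0.5 + 0.25*0.5 = 0.375, which is $3000:0000 in Q31
print(f"{fractional_mac_sum([0x4000, 0x2000], [0x4000, 0x4000]):08X}")   # 30000000
```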
$$ y(k) = \sum_{n=1}^{N} a_n \, y(k-n) + \sum_{m=0}^{M} b_m \, x(k-m) $$
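The difference equation above can be evaluated directly as a reference when checking a filter implementation. This is a minimal floating-point Python sketch, not the benchmark code; the function name and the convention that `a[0]` is unused (so that `a[n]` lines up with the index n in the equation) are assumptions.

```python
def iir_direct(a: list[float], b: list[float], x: list[float]) -> list[float]:
    """Evaluate y(k) = sum_{n=1..N} a[n]*y(k-n) + sum_{m=0..M} b[m]*x(k-m)."""
    y: list[float] = []
    for k in range(len(x)):
        # Feedback terms use earlier outputs; samples before k = 0 count as zero
        acc = sum(a[n] * y[k - n] for n in range(1, len(a)) if k - n >= 0)
        # Feedforward terms use the current and earlier inputs
        acc += sum(b[m] * x[k - m] for m in range(len(b)) if k - m >= 0)
        y.append(acc)
    return y

# Impulse response of y(k) = 0.5*y(k-1) + x(k)
print(iir_direct([0.0, 0.5], [1.0], [1.0, 0.0, 0.0, 0.0]))   # [1.0, 0.5, 0.25, 0.125]
```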
The Biquad Direct Form II realization of the IIR filter above can be described in the following two
equations:
N
[Figure AA0079: FFT butterfly, X = A + B·W^k and Y = A - B·W^k, shown with the X-memory layout of the operands (ar/xr and ai/xi addressed by r0, r2; br/yr and bi/yi addressed by r3, r1) and the twiddle-factor values cos(2πk/N) and -sin(2πk/N).]
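The butterfly relations X = A + B·W^k and Y = A - B·W^k can be checked with a short Python sketch. It uses complex floating-point arithmetic rather than the separate real and imaginary words of the memory map, and it assumes W^k = cos(2πk/N) - j·sin(2πk/N), which is consistent with the twiddle-factor values listed in the figure.

```python
import cmath

def butterfly(a: complex, b: complex, k: int, n: int) -> tuple[complex, complex]:
    """Radix-2 butterfly: returns (A + B*W^k, A - B*W^k)."""
    w = cmath.exp(-2j * cmath.pi * k / n)   # cos(2*pi*k/N) - j*sin(2*pi*k/N)
    t = b * w                               # one complex multiply is shared
    return a + t, a - t

x, y = butterfly(1 + 0j, 1 + 0j, 0, 8)
print(x, y)
```

Computing the product B·W^k once and reusing it for both outputs is what keeps the butterfly at a single complex multiply.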
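[Figure AA0080: LMS adaptive filter signal flow, showing the filter output y(n), the desired signal d(n), and the error e(n).]
[Figure AA0081: X-memory map. r0 addresses the sample buffer x(n), x(n-1), ..., x(n-N+1); r3 and r1 address the coefficients c0, c1, c2, ..., c(N-1).]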
opt cc
PUSH M01 ; 2 2 save addr mode state
MOVE #X_Vec7,R0 ; 2 2 start of X
MOVE #N_-1,M01 ; 2 2 modulo N_
MOVE M01,Y1 ; 1 1 initialize REP loop count
MOVE #-2,N ; 1 1 adjustment for filtering
MOVEP X:InputValue,Y0 ; 1 1 get input sample
MOVE #Coeff,R3 ; 2 2 start of coefficients
CLR A Y0,X:(R0)+ ; 1 1 save input in x(n),incr R0
MOVE X:(R3)+,X0 ; 1 1 X0=c[0] and incr R3
REP Y1 ; 1 3 do fir
MAC Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 accum & update x[i] and c[i]
MACR Y0,X0,A ; 1 1 last tap
MOVEP A,X:Output ; 1 1 output fir if desired
; (Get d(n), subtract fir output, multiply by “u”, put the result in y1.
; This section is application dependent.)
MOVE #Coeff,R3 ; 2 2 start of coefficients
MOVE R3,R1 ; 1 1 start of coefficients
MOVE X:(R0)+,Y0 ; 1 1 Y0=x(n) and incr R0
MOVE X:(R3)+,A ; 1 1 a=c[0] and incr R3
DO #NTaps,EndDO1_7_1 ; 2 3 update coefficients
MACR X0,Y0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 A=c[i]*x[n-i] Y0=x[n-i-1]
; X0=c[i+1]
TFR X0,A A,X:(R1)+ ; 1 1 A=c[i+1], save c[i]*x[n-i]
_COEFF_UPDATE1_7_1:
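The filtering and coefficient-update passes in this listing follow the general shape of an LMS step. The Python sketch below shows the standard floating-point form of that step for reference; it is not a translation of the listing, and the function and parameter names (including `mu` for the step size the comments call "u") are assumptions.

```python
def lms_step(coeffs: list[float], history: list[float],
             desired: float, mu: float) -> tuple[float, float, list[float]]:
    """One LMS iteration: filter, form the error, then update every coefficient."""
    y = sum(c * x for c, x in zip(coeffs, history))   # FIR output y(n)
    e = desired - y                                   # error e(n) = d(n) - y(n)
    gain = mu * e                                     # scaled error, computed once
    updated = [c + gain * x for c, x in zip(coeffs, history)]
    return y, e, updated

# history holds x(n), x(n-1); the coefficients start at zero
print(lms_step([0.0, 0.0], [1.0, 0.5], 1.0, 0.5))   # (0.0, 1.0, [0.5, 0.25])
```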
[Figure AA0082: X-memory map. r0 addresses the sample buffer x(n), x(n-1), ..., x(n-N+1); r1 and r3 address the double-precision coefficients c0_H, c0_L, c1_H, c1_L, ...]
opt cc
PUSH M01 ; 2 2 save addr mode state
MOVE #X_Vec7,R0 ; 2 2 start of X
MOVE #N_-1,M01 ; 2 2 modulo N_
MOVE M01,Y1 ; 1 1 initialize REP loop count
MOVE #2,N ; 1 1 adjustment for filtering
MOVEP X:InputValue,Y0 ; 1 1 get input sample
MOVE #Coeff,R3 ; 2 2 start of coefficients
CLR A Y0,X:(R0)+ ; 1 1 save input in x(n),incr R0
MOVE X:(R3)+N,X0 ; 1 1 X0=c[0,H] and incr R3
DO Y1,Do_FIR ; 2 3 do fir
MAC X0,Y0,A X:(R0)+,Y0 ; 1 1 accum & update x[i]
MOVE X:(R3)+N,X0 ; 1 1 update c[i,H]
Do_FIR:
MACR X0,Y0,A ; 1 1 last tap
MOVEP A,X:Output ; 1 1 output fir if desired
; (Get d(n), subtract fir output, multiply by “u”, put the result in x0.
; This section is application dependent.)
MOVE #Coeff,R3 ; 2 2 start of coefficients
MOVE R3,R1 ; 1 1 start of coefficients
MOVE X:(R0)+,y0 ; 1 1 Y0=x(n) and incr R0
MOVE X:(R3)+,A ; 1 1 a=c[0,H] and incr R3
MOVE X:(R3)+,A0 ; 1 1 a0=c[0,L] and incr R3
DO #NTaps,EndDO1_7_2 ; 2 3 update coef.
MAC X0,Y0,A X:(R0)+,Y0 ; 1 1 u e(n) x(n)+c; fetch x(n)
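[Figure AA0083: X-memory map. r0 addresses the sample buffer x(n), x(n-1), ..., x(n-N+1); the coefficients are stored as c0_H, c0_L, c1_H, c1_L, with r1 and r3 shown at c0_L.]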
Figure B-5. LMS Adaptive Filter — Double Precision Delayed Memory Map
[Figure AA0084: vector multiply-accumulate, c_i = a_i + y0 × b_i for i = 1 to 3. X-memory map: r0 addresses a1, a2, a3; r3 addresses b1, b2, b3; r1 addresses c1, c2, c3.]
opt cc
; Y0 is assumed to have been initialized with the multiplier scalar value
MOVEI #N_,N ; 2 2 vector size
MOVE #A_Vec8,R0 ; 2 2 point to vec a
MOVE #B_Vec8,R3 ; 2 2 point to vec b
MOVE #C_Vec8,R1 ; 2 2 point to vec c
CLR A X:(R3)+,X0 ; 1 1 X0=b RESULT:A=0
MOVE X:(R0)+,A ; 1 1 A=a
DO N,EndDO1_8 ; 2 3 repeat prod N times
MAC y0,x0,a X:(R0)+,Y1 X:(R3)+,X0 ; 1 1 A=Y0*b+a load nxt a,b
TFR y1,a A,X:(R1)+ ; 1 1 A=next(a) STORE c
EndDO1_8:
; ____________
; Total: 14 2N+13
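The operation this benchmark performs, c = a + y0 × b applied element by element, can be written as a short Python reference. Floating-point values are used for clarity, so fractional scaling and saturation are not modeled; the function name is an assumption.

```python
def vector_scale_add(a: list[float], b: list[float], y0: float) -> list[float]:
    """Return c with c[i] = a[i] + y0 * b[i] for every element."""
    return [ai + y0 * bi for ai, bi in zip(a, b)]

print(vector_scale_add([1.0, 2.0, 3.0], [4.0, 8.0, 12.0], 0.5))   # [3.0, 6.0, 9.0]
```

In the listing, the accumulator is preloaded with a[i] so that a single MAC produces each output element, which is why the loop body is two instructions long.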
opt cc
MOVE #A_Vec9,R0 ; 2 2 point to signal a
MOVEI #N_,N ; 2 2 load vector size
CLR A X:(R0)+,Y0 ; 1 1 clear and load 1st val
DO N,EndDO1_9 ; 2 3 repeat size N times
MAC Y0,Y0,A X:(R0)+,Y0 ; 1 1 square value & load nxt
EndDO1_9:
; __________
; Total: 8 1N+8
; if vector pointer located inside 128 addresses, use MOVES X:<AA,R0 (1cyc, 1wrd)
; if vector size is less than 63, initializing N is not required.
A second option replaces the DO instruction with REP. This sequence cannot be interrupted while the
repeated MAC instruction is executing.
opt cc
MOVE #A_Vec9,R0 ; 2 2 point to signal a
MOVEI #N_,N ; 2 2 load vector size
CLR A X:(R0)+,Y0 ; 1 1 clear and load 1st val
REP N ; 1 3 repeat size N times
MAC Y0,Y0,A X:(R0)+,Y0 ; 1 1 square value & load nxt
; __________
; Total: 7 1N+8
; if vector pointer located inside 128 addresses, use MOVES X:<AA,R0 (1cyc, 1wrd)
; if vector size is less than 63, initializing N is not required.
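Both versions of this benchmark accumulate the squares of the samples. A minimal Python model follows, assuming 16-bit signed fractional (Q15) samples and a Q31 accumulator, with overflow into the extension bits left unmodeled; the function names are illustrative.

```python
def to_signed16(word: int) -> int:
    """Interpret a 16-bit word as a two's-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def signal_energy_q31(samples: list[int]) -> int:
    """Sum of squared Q15 samples, returned as a Q31 integer."""
    acc = 0
    for word in samples:
        s = to_signed16(word)
        acc += (s * s) << 1      # fractional multiply: Q15 x Q15 -> Q31
    return acc

# 0.5^2 + (-0.5)^2 + 0.25^2 = 0.5625, which is $4800:0000 in Q31
print(f"{signal_energy_q31([0x4000, 0xC000, 0x2000]):08X}")   # 48000000
```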
[Figure AA0085: 3 × 3 matrix times vector, c = A × b. X-memory map: r3 addresses the matrix a11, a12, a13, a21, a22, a23, a31, a32, a33; r0 addresses b1, b2, b3; r2 addresses c1, c2, c3.]
opt cc
MOVE #A_Matrx10,R3 ; 2 2 point to mat a
MOVE #B_Vec10,R0 ; 2 2 point to vec b
MOVE #2,M01 ; 2 2 mod 3 addr on R0
MOVE #C_Vec10,R2 ; 2 2 point to vec c
MOVE X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 y0=b1; x0=a11
MPY Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 a11*b1
MAC Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 +a12*b2
MACR Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 +a13*b3
MOVE A,X:(R2)+ ; 1 1 store c1
MPY Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 a21*b1
MAC Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 +a22*b2
MACR Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 +a23*b3
MOVE A,X:(R2)+ ; 1 1 store c2
MPY Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 a31*b1
MAC Y0,X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 1 1 +a32*b2
MACR Y0,X0,A ; 1 1 +a33*b3->c3
MOVE A,X:(R2)+ ; 1 1 store c3
; ____________
; Total: 21 21
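The access pattern of this listing can be mirrored in a small Python reference: the matrix pointer advances linearly through nine words, while the vector pointer wraps every three elements (the effect of the modulo setting on R0). Plain integers are used here, so fractional scaling and rounding are not modeled, and the function name is an assumption.

```python
def mat_vec3(a_flat: list[int], b: list[int]) -> list[int]:
    """Multiply a row-major 3x3 matrix by a 3-element vector."""
    c = []
    r3 = 0                      # linear pointer into the matrix
    r0 = 0                      # modulo-3 pointer into the vector
    for _ in range(3):
        acc = 0
        for _ in range(3):
            acc += a_flat[r3] * b[r0]
            r3 += 1
            r0 = (r0 + 1) % 3   # wraps back to b1 after b3
        c.append(acc)
    return c

print(mat_vec3([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 0, -1]))   # [-2, -2, -2]
```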
[Figure: N × N matrix multiplication, (a11 .. aNN) × (b11 .. bNN). X-memory map: r3 addresses the A matrix starting at a11; the remainder of the figure was not recovered.]
opt cc
; This algorithm utilizes hardware nesting looping; user care necessary on next loop.
; The main assumption: no hardware loops active when this function is called.
MOVE #A_Matrx11,R3 ; 2 2 point to A[1,1]
MOVE R3,Y1 ; 1 1 save pntr to A[1,1]
MOVE #B_Matrx11,R0 ; 2 2 point to B[1,1]
MOVE R0,R1 ; 1 1 save pntr to B[1,1]
MOVE #C_Matrx11,R2 ; 2 2 point to C[1,1] (result)
MOVE #ROW_SIZE11,B ; 1 1 number of rows (N x N)
MOVE B,N ; 1 1 number of repetitions N
DO N,Traverse_A_rows_1 ; 2 3 do all rows
PUSH LC ; 2 2 save LC to allow nesting
PUSH LA ; 2 2 save LA to allow nesting
DO N,Traverse_B_columns_1 ; 2 3 compute a row in C
MOVE Y1,R3 ; 1 1 1st element in A row
MOVE R1,R0 ; 1 1 1st element in B column
CLR A X:(R3)+,X0 ; 1 1 clr sum, get elemnt in A
MOVE X:(R0)+N,Y0 ; 1 1 element in B, next B row
REP #ROW_SIZE11-1 ; 1 3 sum products except last
MAC Y0,X0,A X:(R0)+N,Y0 X:(R3)+,X0 ; 1 1 traverse B rows, A col
; FOR FRACTIONAL ELEMENTS, THE FOLLOWING TWO INSTRUCTIONS ARE REQUIRED
The next version uses a software outer loop, which avoids nesting hardware loops.
opt cc
; This algorithm utilizes software outer loop avoiding nesting and saving LC,LA regs.
; The main assumption: no hardware nesting loops active when function is called.
MOVE #A_Matrx11,R3 ; 2 2 point to A[1,1]
MOVE R3,Y1 ; 1 1 save pntr to A[1,1]
MOVE #B_Matrx11,R0 ; 2 2 point to B[1,1]
MOVE R0,R1 ; 1 1 save pntr to B[1,1]
MOVE #C_Matrx11,R2 ; 2 2 point to C[1,1] (result)
MOVE #ROW_SIZE11,B ; 1 1 number of rows (N x N)
MOVE B,N ; 1 1 number of repetitions N
MOVES N,X:RowsCnt ; 1 1 number of A rows to do
Traverse_A_rows_2:
DO N,Traverse_B_columns_2 ; 2 3 compute a row in C
MOVE Y1,R3 ; 1 1 1st element in A row
MOVE R1,R0 ; 1 1 1st element in B column
CLR A X:(R3)+,X0 ; 1 1 clr sum, get elemnt in A
MOVE X:(R0)+N,Y0 ; 1 1 element in B, next B row
REP #ROW_SIZE11-1 ; 1 3 sum products except last
MAC Y0,X0,A X:(R0)+N,Y0 X:(R3)+,X0 ; 1 1 traverse B rows, A col
; FOR FRACTIONAL ELEMENTS, THE FOLLOWING TWO INSTRUCTIONS ARE REQUIRED
; THE MACR DOES THE FINAL ACCUMULATION WITH ROUNDING FOR FRACTIONAL RESULTS
MACR Y0,X0,A X:(R1)+,X0 ; 1 1 last sum, next col in R1
MOVE A,X:(R2)+ ; 1 1 save result in C row
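The pointer strides used in these listings (a step of 1 along a row of A and a step of N down a column of B) can be checked with a Python reference. This is a sketch with plain integers and row-major flat lists; rounding and fractional scaling are not modeled, and the names are illustrative.

```python
def mat_mul(a: list[int], b: list[int], n: int) -> list[int]:
    """Multiply two row-major n x n matrices stored as flat lists."""
    c = []
    for row in range(n):
        for col in range(n):
            r3 = row * n        # first element of the A row, stride 1
            r0 = col            # first element of the B column, stride n
            acc = 0
            for _ in range(n):
                acc += a[r3] * b[r0]
                r3 += 1
                r0 += n
            c.append(acc)       # results are written sequentially, like X:(R2)+
    return c

print(mat_mul([1, 2, 3, 4], [5, 6, 7, 8], 2))   # [19, 22, 43, 50]
```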
The image is an array of 128 pixels x 128 pixels. To provide boundary conditions for the FIR filtering, the
image is surrounded by a set of zeros such that the image is actually stored as a 130x130 array (see
Figure B-10).
[Figure B-10 (AA0088): the stored 130 × 130 array, consisting of the 128 × 128 image area surrounded by a border of zeros.]
The image (with boundary) is stored in row-major order. The first element of the array is
image(1,1), followed by image(1,2). The last element of the first row is image(1,130), followed by the
beginning of the next row, image(2,1). These are stored sequentially in the array Image (“im” in the
instruction comments) in data memory. For example:
• Image(1,1) maps to index 0.
• Image(1,130) maps to index 129.
• Image(2,1) maps to index 130.
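The three mappings listed above follow from the row-major rule with a row length of 130. A minimal Python sketch (the function name and the constant name are assumptions):

```python
ROW_LENGTH = 130   # 128 image pixels plus one boundary zero on each side

def image_index(row: int, col: int) -> int:
    """Map 1-based image(row, col) to its 0-based offset in the array."""
    return (row - 1) * ROW_LENGTH + (col - 1)

print(image_index(1, 1), image_index(1, 130), image_index(2, 1))   # 0 129 130
```

Moving down one row at the same column therefore adds 130 to the index.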
See Table B-2 for the definitions of R0, R2, and R3.
Although many other implementations are possible, this is a realistic type of image environment where the
actual size of the image may not be an exact power of two. Other possibilities include storing a 128x128
image but computing only a 127x127 result, computing a 128x128 result without boundary conditions but
throwing away the pixels on the border, and so on.
Table B-2. Variable Descriptions
Variable Description
R0 image(n,m) image(n,m+1) image(n,m+2)
image(n+130,m) image(n+130,m+1) image(n+130,m+2)
image(n+2*130,m) image(n+2*130,m+1) image(n+2*130,m+2)
R2 output image
R3 FIR coefficients
[Figure AA0089: oscillator block diagram with two delay elements (T) and output sin(w0 t); y1 = 2*sin(πFs/F0), where F0 = oscillation frequency and Fs = sampling frequency.]
opt cc
CLR B ; 1 1 integration initial value
MOVE #$4000,A ; 2 2 initial value, tone amplitude
MOVE #0,N ; 1 1 set for no post-increment
[Figure AA0090: oscillator block diagram with two delay elements (T) and output sin(w0 t); y0 = 2*cos(2πFs/F0), where F0 = oscillation frequency and Fs = sampling frequency.]
opt cc
CLR A ; 1 1 integration initial value
MOVE #$4000,Y1 ; 2 2 initial value, tone amplitude
MOVE #$6D4B,Y0 ; 2 2 2*sin(pi*Fs/Fo)
MOVE #$1,R1 ; 1 1 arbitrary location for store
MOVE #DummyLoc13,R0 ; 1 1 temporary location to swap val
MOVE #0,N ; 1 1 set for no post-increment
DO X0,EndDO1_13_2 ; 2 3 repeat x0 times
MAC -Y1,Y0,A ; 1 1 1st integration
NEG A Y1,X:(R1)+N ; 1 1 correct and store a result
MAC Y1,Y0,A ; 1 1 2nd integration
MOVE A,X:(R0)+N ; 1 1 use TempLoc for swapping values
TFR Y1,A X:(R0)+N,Y1 ; 1 1 prepare for next integration
EndDO1_13_2:
MOVE Y1,X:(R1) ; 1 1 final approx stored
; ___________
; Total: 16 5N+12
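Tracing the loop body suggests that each pass computes A = 2·Y0·Y1 - A (one negated MAC, a NEG, and a second MAC) and then swaps A with Y1, which corresponds to the two-term recurrence y[n+1] = 2k·y[n] - y[n-1]. The Python sketch below models that recurrence in floating point. How the coefficient relates to F0 and Fs is left as the source states it; for the usage example the sketch simply takes k = cos(ω) and seeds the recurrence with two sine samples, both of which are assumptions rather than the listing's initial values.

```python
import math

def oscillator(k: float, y_prev: float, y_curr: float, count: int) -> list[float]:
    """Generate `count` samples of y[n+1] = 2*k*y[n] - y[n-1]."""
    out = []
    for _ in range(count):
        out.append(y_curr)                                  # store the current sample
        y_prev, y_curr = y_curr, 2.0 * k * y_curr - y_prev  # update, then swap roles
    return out

w = 2 * math.pi * 0.05                     # assumed angular step per sample
samples = oscillator(math.cos(w), -math.sin(w), 0.0, 40)
print(max(abs(s - math.sin(n * w)) for n, s in enumerate(samples)) < 1e-9)
```

Seeded this way, the recurrence reproduces sin(nω) without evaluating a sine inside the loop, which is the appeal of this kind of tone generator.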
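[Figure: filter signal flow from x(n) to y(n) with coefficients k0, k1, k2 and delay elements (T) producing x(n-1) and x(n-2). X-memory map: r3 addresses k0, k1, k2; r0 addresses the sample buffer holding x(n), x(n-1), x(n-2).]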
A/D analog-to-digital
AS accumulator shifter
D/A digital-to-analog
E extension bit
FIFO first-in-first-out
GT greater than
HI high
HS high or same
IC integrated circuit
I/O input/output
L limit bit
LIFO last-in-first-out
LT less than
MAC multiply-accumulate
MR mode register
MS most significant
PC program counter
R rounding bit
SA saturation bit
SP stack pointer
SR status register
SZ size bit
TO trace occurrence
U unnormalized bit
V overflow bit
X external
Z zero bit
E-mail:
support@freescale.com