The ARM processor has 31 general-purpose registers, including a program counter, and 6 status registers. All registers are 32 bits wide, but only 12 of the 32 bits are implemented in the status registers. Of these, 15 general-purpose registers (R0 to R14), one or two status registers and the program counter are visible at any time.
The general-purpose registers can be divided into two categories, unbanked (R0 to R7) and banked (R8 to R14). The unbanked registers refer to the same physical registers in all processor modes whereas the physical register of a banked register depends on the specific processor mode.
Although the registers are general-purpose, some of them have a special use in applications. The program counter, for example, is a general-purpose register like any other and can be referred to as R15.
As specified in the ARM calling standard, the first four registers R0-R3 are temporary registers used for passing parameters to subroutines. If a subroutine takes more than four parameters, the rest are passed on the stack. These registers therefore cannot be expected to keep their values across subroutine calls.
The rest of the unbanked registers, R4-R7, should always be saved by a subroutine before it uses them. The same goes for the banked registers R8-R12. Their state is expected to be preserved across subroutine calls, with the exception of R12 when using GCC. GCC treats R12 as a temporary register and does not preserve its state across subroutine calls; the ARM procedure call standard designates R12 as IP, the intra-procedure-call scratch register.
Some registers have a special use, namely R13, R14 and R15. R13 is used as the stack pointer, R14 as the link register and R15 as the program counter. Of these, only the program counter has restrictions on its use; the stack pointer and link register can be used as general-purpose registers at any time, but their state should be saved prior to use.
R13, R14 and R15 also have predefined alternative names which can be used. The stack pointer can be referred to as SP, the link register as LR and the program counter as PC.
The link register has a special use in the ARM architecture. In each processor mode, the mode's own version of LR holds the return address of a subroutine; if an exception occurs, the appropriate exception mode's version of LR is set to the exception return address. When a subroutine call is performed by a BL or BLX instruction, LR is set to the subroutine return address. The subroutine ends by copying LR to the program counter. This is normally done in one of the following two ways:
Executing either of these instructions:
MOV PC, LR
BX LR
Storing LR on the stack at subroutine entry:
STMFD SP!, {<registers>, LR}
and loading it straight to the program counter:
LDMFD SP!, {<registers>, PC}
According to the ARM calling standard, when a subroutine is executed, the parameters passed to it have been stored in registers R0-R3 and on the stack. If the subroutine is going to use registers other than R0-R3, their state should be saved prior to use. Multiple registers can be saved with a single instruction, which should be used when more than two registers need to be stored.
Example 1, subroutine uses registers R0-R6 so registers R4-R6 should be stored on entry:
STMFD SP!, {R4-R6} ; Stores registers R4-R6 into stack
and restored on exit:
LDMFD SP!, {R4-R6} ; Loads values from stack into registers R4-R6
MOV PC, LR ; Returns from subroutine
Example 2, subroutine uses registers R0-R6 and the link register so the registers R4-R6 and the link register should be stored on entry:
STMFD SP!, {R4-R6, LR} ; Stores registers R4-R6 and LR into stack
and restored on exit:
LDMFD SP!, {R4-R6, PC} ; Loads values from stack into registers R4-R6 and program counter
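The save and restore pair in Example 2 can be traced with a small Python model. This is only a sketch of the full-descending stack behaviour described above, not ARM code: the helper names (`stmfd`, `ldmfd`, `call_and_return`), the dictionary-based memory and the sample register values are all assumptions made for illustration.

```python
# Register numbering decides the order in which a register list is stored.
REG_INDEX = {f"R{i}": i for i in range(13)}
REG_INDEX.update(SP=13, LR=14, PC=15)

def stmfd(regs, memory, reg_list):
    """Model STMFD SP!, {reg_list} on a full descending stack."""
    ordered = sorted(reg_list, key=REG_INDEX.get)
    regs["SP"] -= 4 * len(ordered)  # "!" writes the lowered address back to SP
    for offset, name in enumerate(ordered):
        # The lowest-numbered register goes to the lowest address.
        memory[regs["SP"] + 4 * offset] = regs[name]

def ldmfd(regs, memory, reg_list):
    """Model LDMFD SP!, {reg_list}: pop words in the same order."""
    ordered = sorted(reg_list, key=REG_INDEX.get)
    base = regs["SP"]
    for offset, name in enumerate(ordered):
        regs[name] = memory[base + 4 * offset]
    regs["SP"] = base + 4 * len(ordered)

def call_and_return():
    """Trace Example 2 with made-up register contents."""
    regs = dict.fromkeys(REG_INDEX, 0)
    regs.update(R4=44, R5=55, R6=66, SP=0x8000, LR=0x1234)
    memory = {}
    stmfd(regs, memory, ["R4", "R5", "R6", "LR"])   # subroutine entry
    regs.update(R4=0, R5=0, R6=0, LR=0xDEAD)        # body clobbers the registers
    ldmfd(regs, memory, ["R4", "R5", "R6", "PC"])   # subroutine exit
    return regs
```

After `call_and_return()`, R4-R6 hold their original values, SP is back where it started, and PC holds the value LR had on entry. That is why loading the saved LR slot into PC restores the registers and returns in a single instruction.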
In the ARM architecture, almost all instructions can be executed conditionally, and data-processing instructions can be chosen to update the status flags or leave them untouched. Many different addressing modes are also available. ARM assembler instructions mostly take one of these forms:
<opcode>{<cond>}{S} <Rd>, <Rn>, <addressing_mode>
<opcode>{<cond>}{S} <Rd>, <addressing_mode>
where Rd is the destination register, Rn is the source register, <cond> is an optional condition code and S is the status register update flag.
Examples:
ADD R0, R0, #1
ADD R0, R1, #1
ADDS R2, R1, R1, LSL #8
MOVVS R0, #0
LDR R0, [R3] ; Address in R3 has to be word (4-byte) aligned
ADD R0, R0, R1, LSL R2
LDRH R5, [R4], R6 ; Address in R4 has to be halfword (2-byte) aligned
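To make the S suffix and the condition field concrete, here is a rough Python model of the ADDS / MOVVS pair from the examples. It is a sketch only: `lsl`, `adds` and `run` are hypothetical helper names, registers are modelled as unsigned 32-bit integers, and the flag dictionary stands in for the N, Z, C and V bits of the status register.

```python
MASK = 0xFFFFFFFF

def lsl(value, amount):
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK

def adds(a, b):
    """Return (result, flags) for a 32-bit add that updates the flags."""
    total = a + b
    result = total & MASK
    flags = {
        "N": result >> 31 == 1,                         # result is negative
        "Z": result == 0,                               # result is zero
        "C": total > MASK,                              # unsigned carry out
        "V": ((a ^ result) & (b ^ result)) >> 31 == 1,  # signed overflow
    }
    return result, flags

def run(r0, r1):
    """Model: ADDS R2, R1, R1, LSL #8 followed by MOVVS R0, #0."""
    r2, flags = adds(r1, lsl(r1, 8))
    if flags["V"]:   # MOVVS executes only when the overflow flag is set
        r0 = 0
    return r0, r2
```

For example, `run(7, 1)` leaves R0 untouched because 1 + 256 does not overflow, while `run(7, 0x40400000)` clears R0 because the sum of two positive operands becomes negative as a signed 32-bit value.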
Some considerations are important when using ARM assembler. The architecture has some oddities as well as normal instruction timings which have to be considered.
The ARM architecture has an unusual way of encoding immediate values in operations. An immediate value is stored in the instruction itself as an 8-bit constant and a 4-bit field giving a right rotation to apply to that constant. The rotation is always an even number of bits (0, 2, 4, 6, ..., 26, 28, 30). Not all immediate values are therefore acceptable in instructions; for the rest, one can load the value into a register from the literal pool and use the register in place of the immediate value:
MOV R0, #0x1000
MOV R0, #0xFFFFFFFF ; Legal: the assembler encodes this as MVN R0, #0
LDR R0, =0x1004 ; MOV R0, #0x1004 is illegal
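The encoding rule can be checked mechanically. The following Python sketch tests whether a 32-bit value is an 8-bit constant rotated right by an even amount; the function names are illustrative, and `fits_mov_or_mvn` reflects the assumption that the assembler may substitute MVN with the inverted constant when MOV cannot encode the value directly.

```python
def rotate_left(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF

def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    # Undo each possible even right rotation by rotating left; if the result
    # fits in 8 bits, that rotation and constant encode the value.
    return any(rotate_left(value, rot) < 0x100 for rot in range(0, 32, 2))

def fits_mov_or_mvn(value):
    """True if MOV can take the value directly, or MVN its bitwise inverse."""
    return is_arm_immediate(value) or is_arm_immediate(~value)
```

With this check, `is_arm_immediate(0x1000)` is true and `is_arm_immediate(0x1004)` is false, matching the examples above. `0xFFFFFFFF` fails the direct check but passes `fits_mov_or_mvn`, because its inverse is zero.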
Loading from the literal pool can cause a cache miss. Where possible, avoid it by building the value in a register with several instructions:
MOV R0, #0x1000
ORR R0, R0, #4
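One way to find such an instruction sequence is to split the constant into pieces that each fit the 8-bit rotated form. The Python sketch below uses a simple greedy split from the lowest set bit upwards. It is an illustration rather than what any particular assembler or compiler does: the function names are hypothetical, and the greedy split does not exploit wrap-around rotations, so it does not always find the shortest sequence.

```python
def split_into_immediates(value):
    """Greedily split a 32-bit value into 8-bit chunks at even bit offsets."""
    value &= 0xFFFFFFFF
    chunks = []
    while value:
        lowest = (value & -value).bit_length() - 1  # index of the lowest set bit
        shift = lowest & ~1                         # round down to an even offset
        chunk = value & (0xFF << shift)             # take up to 8 bits from there
        chunks.append(chunk)
        value &= ~chunk                             # clear the bits just taken
    return chunks

def emit_load(reg, value):
    """Return assembler text that builds value in reg without a literal pool."""
    chunks = split_into_immediates(value) or [0]
    lines = [f"MOV {reg}, #{chunks[0]:#x}"]
    lines += [f"ORR {reg}, {reg}, #{chunk:#x}" for chunk in chunks[1:]]
    return lines
```

For `0x1004` this yields a MOV of `0x4` followed by an ORR of `0x1000`, the same two constants as in the example above in the opposite order. Any 32-bit value needs at most four instructions this way, so the trade-off against a single literal-pool load depends on the constant.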
Almost all instructions in the ARM architecture take a single clock cycle, but some instructions have a result delay: the processor will stall if the next instruction tries to use the result of the current instruction.
Loading a value from memory has a result delay of one cycle, so the instruction immediately after a memory load should not use the loaded value:
LDR R0, [R1]
ADD R0, R0, #1 ; 1 cycle stall
SUB R2, R2, R3 ; Instructions take 4 cycles in total
One should use this instead:
LDR R0, [R1]
SUB R2, R2, R3
ADD R0, R0, #1 ; Instructions take 3 cycles in total
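The cycle counts quoted for the two orderings can be reproduced with a toy timing model in Python. It is a deliberate simplification that assumes one instruction issues per cycle and that the only hazard is a result delay on the destination register; real pipelines have further interlocks. The tuple layout and the name `count_cycles` are illustrative.

```python
def count_cycles(program):
    """Count cycles for a list of (dest, sources, result_delay) instructions."""
    cycle = 0
    ready_at = {}  # register -> first cycle at which its value can be read
    for dest, sources, delay in program:
        # Stall until every source register is available.
        start = max([cycle] + [ready_at.get(src, 0) for src in sources])
        cycle = start + 1               # the instruction itself takes one cycle
        ready_at[dest] = cycle + delay  # result is usable delay cycles later
    return cycle

LDR = ("R0", ["R1"], 1)        # LDR R0, [R1]   (one-cycle result delay)
ADD = ("R0", ["R0"], 0)        # ADD R0, R0, #1
SUB = ("R2", ["R2", "R3"], 0)  # SUB R2, R2, R3

stalled = count_cycles([LDR, ADD, SUB])
reordered = count_cycles([LDR, SUB, ADD])
```

Here `stalled` comes out as 4 and `reordered` as 3, matching the comments in the two listings: moving the independent SUB into the load's delay slot hides the stall.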
The multiplication and multiply-accumulate instructions suffer from the same kind of result delay, only larger. The result delay of a multiplication is between 1 and 3 clock cycles, depending on the bit pattern of the multiplier operand. This should be taken into account when using multiplication instructions, just as when loading a value from memory.
Unlike most architectures, ARM can execute almost every instruction conditionally, rather than having to compare and then branch around code. One should try to take advantage of this fact:
MOV R0, R1 ; Copy R1 into R0
CMP R0, #4 ; Compare R0 against the value 4
BEQ label ; If R0 is 4 jump to label
MOV R2, #0 ; Otherwise load R2 with zero
B done ; Skip the R0 = 4 case
label MOV R2, #1 ; Load R2 with one
done ; Execution continues here
can be written more efficiently, without the unnecessary branches:
MOV R0, R1 ; Copy R1 into R0
CMP R0, #4 ; Compare R0 against the value 4
MOVNE R2, #0 ; If R0 is not 4, load R2 with zero
MOVEQ R2, #1 ; If R0 is 4, load R2 with one