Leros: The Return of the Accumulator Machine
An FPGA-optimized tiny processor core for utility functions (e.g., SW UART). The challenge is to keep resource usage below 500 logic cells (LC) and use just 2 RAM blocks. The processor is named after the Greek island Leros, where the architecture was designed.
The Leros project is hosted on GitHub at https://github.com/leros-dev.
Leros is documented in the following publications and documents:
- Martin Schoeberl. Leros: A Tiny Microcontroller for FPGAs. In Proceedings of the 21st International Conference on Field Programmable Logic and Applications (FPL 2011), Chania, Crete, Greece, September 2011.
- James Caska and Martin Schoeberl. Java dust: How small can embedded Java be? In Proceedings of the 9th International Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2011), York, UK, ACM, September 2011.
- Morten Borup Petersen. A Compiler Backend and Toolchain for the Leros Architecture. BSc thesis, Technical University of Denmark, 2019.
- Martin Schoeberl and Morten Borup Petersen. Leros: The return of the accumulator machine. Architecture of Computing Systems - ARCS 2019 - 32nd International Conference, Proceedings, 115-127, May, 2019.
A work-in-progress handbook is available as LaTeX sources at Leros Handbook
Architecture
Leros is an accumulator machine with a register file. Memory is accessed via indirect load and store instructions.
Leros Aims
An accumulator instruction that does less than a typical RISC instruction is probably more RISC than the typical load/store register-based RISC architecture.
- A simple architecture
- Results in a cheap FPGA implementation
- Easy to use in teaching
- Just a few instructions
- Different bit widths
  - 16-bit version for tiny microcontroller
  - 32-bit version as a reasonable target for C
  - 64-bit version for a Linux port
Further aims:
- Serve as an example of a small Chisel project for the Chisel book
- Use in teaching in fall 2018
- Provide virtual memory (with paging) for a Linux port
- Use for student projects
- Use for manycore experiments (NoC with more than 9 cores on a DE2-115)
Instruction Set Architecture
The instructions of Leros can be categorized into the following types:
- ALU operation with the accumulator and an immediate
- ALU operation with the accumulator and a register
- Load and store
- Indirect load and store
- Conditional branches
- Jump and link
- Arithmetic shift right
- (Input and output)
List of Instructions
Opcode | Function | Description |
---|---|---|
add | A = A + Rn | Add register Rn to A |
addi | A = A + i | Add immediate value i to A (sign extend i) |
sub | A = A - Rn | Subtract register Rn from A |
subi | A = A - i | Subtract immediate value i from A (sign extend i) |
sra | A = A >> 1 | Arithmetic shift right of A by one bit |
and | A = A and Rn | And register Rn with A |
andi | A = A and i | And immediate value i with A |
or | A = A or Rn | Or register Rn with A |
ori | A = A or i | Or immediate value i with A |
xor | A = A xor Rn | Xor register Rn with A |
xori | A = A xor i | Xor immediate value i with A |
load | A = Rn | Load register Rn into A |
loadi | A = i | Load immediate value i into A (sign extend i) |
loadhi | A$_{31-8}$ = i | Load immediate into second byte (sign extend i) |
loadh2i | A$_{31-16}$ = i | Load immediate into third byte (sign extend i) |
loadh3i | A$_{31-24}$ = i | Load immediate into fourth byte (sign extend i) |
store | Rn = A | Store A into register Rn |
jal | PC = A, Rn = PC + 2 | Jump to A and store return address in Rn |
ldaddr | AR = Rn | Load address register AR with Rn |
loadind | A = mem[AR+(i << 2)] | Load a word from memory into A |
loadindb | A = mem[AR+i]$_{7-0}$ | Load a byte from memory into A, sign-extending it |
storeind | mem[AR+(i << 2)] = A | Store A into memory |
storeindb | mem[AR+i]$_{7-0}$ = A | Store a byte into memory |
br | PC = PC + o | Branch |
brz | if A == 0 PC = PC + o | Branch if A is zero |
brnz | if A != 0 PC = PC + o | Branch if A is not zero |
brp | if A >= 0 PC = PC + o | Branch if A is zero or positive |
brn | if A < 0 PC = PC + o | Branch if A is negative |
scall | scall A | System call (simulation hook) |
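The Function column above can be read as a small register-transfer specification. The following Python sketch interprets a handful of those rows for a 32-bit accumulator. It is an illustration only, not the Leros simulator: the `run` function, the tuple-based program format, and the `sext8` helper are hypothetical names introduced here, and the 32-bit width and 8-bit immediates are assumptions taken from the table and the encoding description.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit accumulator (Leros32)

def sext8(i):
    """Sign-extend an 8-bit immediate to a Python int."""
    i &= 0xFF
    return i - 0x100 if i & 0x80 else i

def run(program, regs=None):
    """Execute (mnemonic, operand) pairs and return (A, registers)."""
    a = 0
    regs = dict(regs or {})
    for op, n in program:
        if op == "loadi":        # A = i (sign extended)
            a = sext8(n) & MASK32
        elif op == "loadhi":     # A[31:8] = i (sign extended), A[7:0] kept
            a = ((sext8(n) << 8) | (a & 0xFF)) & MASK32
        elif op == "addi":       # A = A + i
            a = (a + sext8(n)) & MASK32
        elif op == "add":        # A = A + Rn
            a = (a + regs.get(n, 0)) & MASK32
        elif op == "sub":        # A = A - Rn
            a = (a - regs.get(n, 0)) & MASK32
        elif op == "sra":        # A = A >> 1, replicating the sign bit
            a = (a >> 1) | (a & 0x80000000)
        elif op == "load":       # A = Rn
            a = regs.get(n, 0)
        elif op == "store":      # Rn = A
            regs[n] = a
        else:
            raise ValueError(f"unsupported mnemonic: {op}")
    return a, regs

# Build the constant 0x1234 in A, keep it in R1, then compute R1 - 1.
program = [
    ("loadi", 0x34),   # A = 0x00000034
    ("loadhi", 0x12),  # A = 0x00001234
    ("store", 1),      # R1 = 0x1234
    ("loadi", -1),     # A = 0xFFFFFFFF
    ("add", 1),        # A = 0x1233 (wraps around at 32 bits)
]
a, regs = run(program)
print(hex(a), hex(regs[1]))  # 0x1233 0x1234
```

The `loadi`/`loadhi` pair shows how a constant wider than 8 bits is assembled one byte at a time, which is why the table lists separate instructions for the upper bytes.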
Instruction Encoding
Instructions are 16 bits wide. The upper byte encodes the instruction; the lower byte contains an immediate value, a register number, or a branch offset (branch offsets also use some bits of the upper byte).
+--------+--------+
|iiiiiiii|nnnnnnnn|
+--------+--------+
For example, 00001001.00000010 is an add immediate instruction that adds 2 to the accumulator, whereas 00001000.00000011 adds the content of R3 to the accumulator. For branches, we use 3 of the instruction bits for larger offsets.
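To make the two example encodings concrete, the following Python sketch splits them into their upper and lower bytes and packs them into 16-bit words. The helper names `parse_fields` and `to_word` are hypothetical and exist only for this illustration.

```python
def parse_fields(text):
    """Parse 'uuuuuuuu.llllllll' into (upper_byte, lower_byte) integers."""
    upper, lower = text.split(".")
    if len(upper) != 8 or len(lower) != 8:
        raise ValueError("expected two 8-bit fields")
    return int(upper, 2), int(lower, 2)

def to_word(upper, lower):
    """Pack the two bytes into one 16-bit instruction word."""
    return (upper << 8) | lower

addi_2 = parse_fields("00001001.00000010")  # add immediate 2
add_r3 = parse_fields("00001000.00000011")  # add register R3

print(addi_2, hex(to_word(*addi_2)))  # (9, 2) 0x902
print(add_r3, hex(to_word(*add_r3)))  # (8, 3) 0x803
```

The two upper bytes differ only in their lowest bit, which is the bit that distinguishes the immediate form from the register form.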
The following table shows all currently defined instructions (21, if you include all conditional branch variations).
Not all instruction bits are currently used (unused bits are marked with -).
Bit 0 of the instruction byte selects between an immediate operand and a register operand. The following list is the complete instruction set.
+--------+----------+
|00000---| nop |
|000010-0| add |
|000010-1| addi |
|000011-0| sub |
|000011-1| subi |
|00010---| sra |
|00011---| - |
|00100000| load |
|00100001| loadi |
|00100010| and |
|00100011| andi |
|00100100| or |
|00100101| ori |
|00100110| xor |
|00100111| xori |
|00101001| loadhi |
|00101010| loadh2i |
|00101011| loadh3i |
|00110---| store |
|001110-?| out |
|000001-?| in |
|01000---| jal |
|01001---| - |
|01010---| ldaddr |
|01100-00| ldind |
|01100-01| ldindb |
|01100-10| ldindh |
|01110-00| stind |
|01110-01| stindb |
|01110-10| stindh |
|1000nnnn| br |
|1001nnnn| brz |
|1010nnnn| brnz |
|1011nnnn| brp |
|1100nnnn| brn |
|11111111| scall |
+--------+----------+
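The table above can be turned into a decoder by treating every character other than 0 and 1 as a don't-care bit. The Python sketch below does this with mask/value pairs. It is a hedged illustration, not the decoder used in the hardware: the names `TABLE`, `PATTERNS`, and `decode` are hypothetical, the two rows marked `-` (undefined) are omitted, and because the `nop` and `in` patterns overlap as written, the sketch assumes the pattern with more fixed bits takes priority.

```python
TABLE = """
00000--- nop
000010-0 add
000010-1 addi
000011-0 sub
000011-1 subi
00010--- sra
00100000 load
00100001 loadi
00100010 and
00100011 andi
00100100 or
00100101 ori
00100110 xor
00100111 xori
00101001 loadhi
00101010 loadh2i
00101011 loadh3i
00110--- store
001110-? out
000001-? in
01000--- jal
01010--- ldaddr
01100-00 ldind
01100-01 ldindb
01100-10 ldindh
01110-00 stind
01110-01 stindb
01110-10 stindh
1000nnnn br
1001nnnn brz
1010nnnn brnz
1011nnnn brp
1100nnnn brn
11111111 scall
"""

def compile_pattern(pattern):
    """Return (mask, value); only '0' and '1' positions are fixed bits."""
    mask = value = 0
    for ch in pattern:
        mask <<= 1
        value <<= 1
        if ch in "01":
            mask |= 1
            value |= int(ch)
    return mask, value

PATTERNS = []
for line in TABLE.split("\n"):
    if line.strip():
        bits, name = line.split()
        PATTERNS.append(compile_pattern(bits) + (name,))

def decode(opcode):
    """Map the upper instruction byte to a mnemonic, or None if undefined."""
    matches = [(bin(mask).count("1"), name)
               for mask, value, name in PATTERNS
               if opcode & mask == value]
    # Assumption: when two patterns match, the more specific one wins.
    return max(matches)[1] if matches else None

print(decode(0b00001001), decode(0b00001000))  # addi add
```

A lookup like this is also a quick way to spot overlapping or unassigned encodings while the instruction set is still changing.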
Comments
loadh only makes sense for immediate values.
Can easily be extended to 64 bits when ignoring the immediate bit. Load function from ALU could be dropped.
The load address instruction and the indirect load/store that follows it should be emitted as a pair, as they are dependent. Possible interrupts should be disabled between those two instructions.
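The dependency between the address register and the indirect access can be illustrated with a small Python model. This is a sketch under assumptions: the `Core` class is hypothetical, memory is modelled as a dictionary from word-aligned byte addresses to words, and the "interrupt" is simply another `ldaddr` executed between the pair.

```python
class Core:
    """Tiny model of A, AR, the register file, and word-addressed memory."""

    def __init__(self, regs, words):
        self.a = 0
        self.ar = 0
        self.regs = dict(regs)
        self.mem = dict(words)  # byte address -> word (word-aligned keys)

    def ldaddr(self, n):
        self.ar = self.regs[n]               # AR = Rn

    def loadind(self, i):
        self.a = self.mem[self.ar + (i << 2)]  # A = mem[AR + (i << 2)]

    def storeind(self, i):
        self.mem[self.ar + (i << 2)] = self.a  # mem[AR + (i << 2)] = A

core = Core(regs={1: 0x100, 2: 0x200},
            words={0x100: 10, 0x104: 20, 0x108: 30, 0x208: 77})

# Intended pair: AR = R1, then load word 2 relative to AR.
core.ldaddr(1)
core.loadind(2)
print(core.a)  # 30

# Same pair, but other code overwrites AR in between.
core.ldaddr(1)
core.ldaddr(2)   # stands in for an interrupt handler using AR
core.loadind(2)
print(core.a)  # 77
```

The second sequence reads from 0x208 instead of 0x108, which is the failure the pairing rule is meant to prevent.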
ldindb/ldindh sign extend.
Why do we have a nop? addi 0 can serve as nop if needed.
Getting Started
To run all tests, execute:
sbt test
More targets (e.g., synthesize for an FPGA) can be found in the Makefile.
LLVM Toolchain
Initially, pull and build leros-llvm by executing the build.sh script in the root repository directory.
The LLVM toolchain provides all the binary utilities from GNU Binutils. The following examples show how the toolchain may be used during development.
Note: If an LLVM installation is already present on your machine, ensure that the executables within the build directory of the Leros toolchain are executed instead of the LLVM executables accessible through the PATH.
To compile a C source file for the Leros architecture, execute:
clang -target leros32 -c foo.c -o foo.o
This will create an unlinked ELF object file containing Leros machine code.
To check that actual Leros instructions were emitted, llvm-objdump may be used to disassemble the object file:
llvm-objdump -d foo.o
The Leros toolchain assumes a number of constants to be present in certain registers when compiling a C program. These registers are initialized in the Leros crt0.leros.c file. For more information on crt0 files, refer to: https://en.wikipedia.org/wiki/Crt0.
The crt0 object file as well as the runtime library functions are built by the build.sh script and placed inside the toolchain library folders. These object files are automatically linked whenever the Leros linker is used.
For compiling a Leros program and linking it with the crt0 object file, execute:
clang -target leros32 foo.c -o foo.out
This will emit an executable ELF file, which may be executed by the Leros simulator (https://github.com/leros-dev/leros-sim).
If a flat binary version of an executable is needed, llvm-objcopy may be used:
llvm-objcopy -O binary foo.out foo.bin
This will dump all of the ELF sections to a flat binary file, suitable for running on simulators or for initializing hardware ROMs. Note that this will emit the various program sections at some default address. When executing on hardware, it may be desirable to place code at a specific address. For this, a linker script is needed.
As an example, suppose a program's entry point (and .text segment) should be emitted at address 0x0.
A linker script for this may be:
/* file: leros.ld */
ENTRY(_start)
SECTIONS
{
. = 0x0;
.text : { *(.text) }
}
Here, we refer to the _start symbol specified in the crt0.leros.c file, and specify that the .text section (the instructions of the program) is to be emitted starting at address 0x0.
The linker script may be passed as an argument to the linker through clang, by specifying:
clang -target leros32 -Xlinker leros.ld foo.c -o foo.out
The flat binary may then be extracted from the foo.out ELF file.
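Once a flat binary such as foo.bin exists, it usually has to be converted into whatever format the ROM initialization or simulator expects. The Python sketch below splits a byte image into 16-bit words and formats them as hex strings. The helper names are hypothetical, and the little-endian default is an assumption made for illustration, not something stated by the toolchain documentation; pass `byteorder="big"` if the target expects the opposite order.

```python
def to_words(image, byteorder="little"):
    """Split a flat binary image into 16-bit words.

    Assumption: two bytes per word in the given byte order. An odd-length
    image is padded with one zero byte.
    """
    image = bytes(image)
    if len(image) % 2:
        image += b"\x00"
    return [int.from_bytes(image[k:k + 2], byteorder)
            for k in range(0, len(image), 2)]

def to_hex_lines(image, byteorder="little"):
    """Format each 16-bit word as a 4-digit hex string, one per line."""
    return [f"{word:04x}" for word in to_words(image, byteorder)]

# Example with an in-memory image; for a real file, read it with
# open("foo.bin", "rb").read() instead.
image = bytes([0x02, 0x09, 0x03, 0x08])
print(to_hex_lines(image))  # ['0902', '0803']
```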
For compiling a Leros program to assembly, execute:
clang -target leros32 -S foo.c -o foo.s
Leros Versions and Compilers
Initial Version
The initial version of Leros was designed as a 16-bit accumulator machine and written in VHDL. Besides assembly programming, a Java JVM for microcontrollers has been ported to Leros. A software simulator written in Java is also available.
Current Development Version
To provide a reasonable target for C programs, we will extend Leros to 32 bits and rewrite the hardware description in Chisel. We will try to make Leros configurable as either 16 or 32 bits. LLVM will be adapted for Leros32, and feedback from this compiler backend may result in changes to the instruction set. This may break compatibility with the VHDL version of Leros and the Java compiler.
We aim to provide enough documentation and simulators so that this version can be used in the teaching of basic computer architecture.