Aphrodite Research Notes
A copy of this document can be found in the Aphrodite respository. It contains my research notes from the research project, conducted in summer 2022 under Professor Calvin Deutschbein.
2022-05-25
Starting from Ubuntu 20.04
Following https://github.com/pulp-platform/pulp-runtime/blob/master/README.md
$ sudo apt install git python3-pip gawk texinfo libgmp-dev libmpfr-dev libmpc-dev
$ sudo pip3 install pyelftools
Cloning repositories into jldey@ubuntu:~/Repos/pulp
$ git clone --recursive https://github.com/pulp-platform/pulp-riscv-gnu-toolchain
so now we should have the toolchain installed, if all’s well. Now, toolchain dependencies:
$ sudo apt-get install autoconf automake autotools-dev curl libmpc-dev libmpfr-dev
libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc
zlib1g-dev
2022-06-10
https://youtu.be/27tndT6cBH0
https://pulp-platform.org/docs/riscv_workshop_zurich/schiavone_wosh2019_tutorial.pdf
- energy-efficient, open-source programmable microcontroller
- use in IoT devices
- “Parallel Ultra Low Power”
- PULPissimo architecture
- RISC-V based microcontroller with peripherals
2022-06-16
meeting notes:
Adding PATH variables
export PATH=$PATH:/opt/riscv/bin
echo PATH
before and after changes- save PATH before changing it (in case of bricked systems)
export OLDPATH=$PATH
echo $PATH >oldpath.txt
- get either 32-bit or 64-bit working with some extensions
- focus on 32-bit, no extensions rv32i
- Look at Qemu – emulation environment (runs test suite)
work progress
adding to PATH
- old path:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
$ export PATH=$PATH:/opt/riscv/bin
PATH modified successfully.
$ ./configure --prefix=/opt/riscv
- this is telling me that there is no such file or directory as
‘configure’—did I add the wrong thing to PATH?
- -> try changing the PATH addition to the directory of the installation
$ make linux
2022-06-21
The problem wasn’t my PATH. What I needed to do was:
$ cd Repos/pulp/pulp-riscv-gnu-toolchain
This allowed me to run the configure command that was in the repository that we cloned earlier. Hopefully, if this completes successfully, I will have configured the toolchain.
It did not complete successfully. I may have run out of disk space.
New vm created, with 100GB disk space. Here’s hoping.
[If this still doesn’t work, try appending /opt/riscv/bin
to the beginning
of PATH, not the end.]
IT WORKED!!! The successful one was the Newlib version, which uses the commands:
$ ./configure --prefix=/opt/riscv
$ make
2022-06-22
Installing and configuring Qemu: https://wiki.qemu.org/Hosts/Linux
$ sudo apt-get install git libglib2.0-dev libfdt-dev libpixman-1-dev zlib1g-dev ninja-build
$ sudo apt-get install git-email
$ sudo apt-get install libaio-dev libbluetooth-dev libcapstone-dev libbrlapi-dev libbz2-dev
$ sudo apt-get install libcap-ng-dev libcurl4-gnutls-dev libgtk-3-dev
$ sudo apt-get install libibverbs-dev libjpeg8-dev libncurses5-dev libnuma-dev
$ sudo apt-get install librbd-dev librdmacm-dev
$ sudo apt-get install libsasl2-dev libsdl2-dev libseccomp-dev libsnappy-dev libssh-dev
$ sudo apt-get install libvde-dev libvdeplug-dev libvte-2.91-dev libxen-dev liblzo2-dev
$ sudo apt-get install valgrind xfslibs-dev
$ git clone git://git.qemu-project.org/qemu.git
$ # Switch to the QEMU root directory.
$ cd qemu
$ # Prepare a native debug build.
$ mkdir -p bin/debug/native
$ cd bin/debug/native
$ # Configure QEMU and start the build.
$ ../../../configure --enable-debug
$ make
$ # Return to the QEMU root directory.
$ cd ../../..
$ bin/debug/native/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
I have the test BIOS working. Now, to see what we can do with qemu and the PULP toolchain.
$ ./configure --prefix=$RISCV --disable-linux --with-arch=rv64ima # or --with-arch=rv32ima
$ make newlib
$ make check-gcc-newlib
$ ./configure --prefix=$RISCV
$ make linux
This step is resulting in errors. I might need to add the following qemu stuff to PATH, although I’m not sure what directory I’ll find it in.
$ # Need qemu-riscv32 or qemu-riscv64 in your `PATH`.
$ make check-gcc-linux
2022-07-08
parallel process: generate traces from anything on qemu
look into verilator
PROCESS OVERVIEW (high-level):
- Get RISC-V/PULPissimo running in emulation or simulation
- Use simulation tools to generate logs/outputs of as much information about register transfers as possible
- Parse those logs and outputs into a trace
IMMEDIATE GOAL: emulate the RISC-V ISA and run Linux.
2022-07-11
Simulation vs. Emulation
- simulation runs like a whole processor, with wires and all
- emulation recreates an ISA
- Verify qemu works with a bare-metal C program
- Try running Linux on the virt RISC-V
I took this test program as an example of C written for bare metal.
Task 1. was proving too much to wrap my head around today, so I went back to the QEMU docs to try and boot Fedora. Lo and behold, I got it to work, with one minor alteration (RULE #1, never trust the README):
login: root
password: fedora_rocks!
Let’s try some basic things here:
$ ls
$ mkdir jldey
$ cd jldey
all seemed to work as expected.
$ echo "Hello World!"
$ echo $PATH > path.txt
$ cat path.txt
also worked as expected.
Shutdown:
$ /sbin/shutdown -h
See the session transcript for more details.
2022-07-12
getting register values through DEBUG ports
- a Qemu setting (debug? trace?)
Try new risc-v C compilers
- research error 306 on the PULP riscv-gcc
Inspired by our meeting this morning, I did some background reading on RISC, MIPS, and ARM.
Why would Intel choose not to use a RISC approach?
CISC architecture allows for “higher programming productivity” in Assembly. It also means fewer instructions will be used to compile high-level programs.
fewer instructions –> less heat produced per time This is an electrical engineering result, which is not apparent from our level of research (ISA level).
Microcoding allows emulation of wider word length— allowing greater compatibility between different products in a family —but so does the RISC-V standard
RISC is better from a CS standpoint, CISC is better for the e. engineers and the physicists.
Bash tutorial: figuring out if there’s a new file
$ ls -al > before.txt
$ ./compile
$ ls -al > after.txt
$ diff before.txt after.txt > diff.txt
$ cat diff.txt
2022-07-13
Memory sinkhole–differences between CISC and RISC designs
Configuring the new toolchain
Since the PULP toolchain is for PULP architecture and not the Qemu virt
machine, I grabbed the riscv-gnu-toolchain instead:
$ git clone https://github.com/riscv/riscv-gnu-toolchain --recursive
Despite what the README would have you believe, the --recursive
is
necessary to clone the submodules.
Next, as recommended, I installed dependencies.
$ sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev
I got the riscv-gnu-toolchain built successfully using the following commands:
$ ./configure --prefix=/opt/riscv --enable-multilib
$ sudo make
To verify that it built and configured correctly, I tried compiling my hello.c
from my example bare metal C for ARM. That attempt produced the following
result:
jldey@ubuntu:~/Repos/riscv-gnu-toolchain/riscv-gcc$ chmod 777 compile -v
jldey@ubuntu:~/Repos/riscv-gnu-toolchain/riscv-gcc$ ./compile ~/Desktop/hello.c
/home/jldey/Desktop/hello.c: 1: Syntax error: "(" unexpected
I have never in my life been so happy to encounter a compile error. Now, we work on finding out what the hell went wrong.
2022-07-13
Test program: change the value of a register and detect that it was changed:
- less complex: integer addition
- more complex: “Hello World!”
use an assembly program to generate a trace
- ANYTHING running on Qemu (asm or C)
- Get register values out of Qemu
This RISC-V bare-metal assembly “Hello World!” referenced the riscv-probe, which is a “simple machine mode program to probe RISC-V control and status registers.” I believe this will be important for generating traces. But first, I had to install its dependencies, starting with riscv-tools, which keeps failing to build on my system.
$ sudo apt-get install autoconf automake autotools-dev curl libmpc-dev libmpfr-dev libgmp-dev libusb-1.0-0-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev device-tree-compiler pkg-config libexpat-dev
$ git clone https://github.com/riscv-software-src/riscv-tools.git
$ cd riscv-tools
$ git submodule update --init --recursive
$ export RISCV=/home/jldey/Repos/riscv-toolchain/ # install directory for the toolchain
$ ./build.sh
The build fails at the Configuring project riscv-pk
step, with the following
error codes:
Configuring project riscv-pk
Building project riscv-pk
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
make: *** [Makefile:319: file.o] Error 1
I’m not sure what that means, so we’ll just move on for the time being. Revisited on 2022-07-18
Running the assembly Hello World
After copying the included assembly into a file called hello.s
, I created the
makefile and linker script
link.lds
as specified.
It turns out that I had not, in fact, configured my toolchain correctly. I
again tried the ./configure
and make
steps for the riscv-gnu-toolchain
,
still with /opt/riscv/bin
as my install directory.
I had to try this several times, getting unhelpful Error 1
and Error 2
messages in the make
process. On the advice of the README, I cleared out my
install directory before trying to install again into the same directory. I
tried the following, and this seemed to work:
$ ./configure --prefix=/opt/riscv --enable-multilib
$ sudo make linux
The process teminated, and I verified that there were executable files in
/opt/riscv/bin
.
2022-07-15
I finally figured out how to invoke my newly configured riscv-gcc
, with the
following command:
$ riscv64-unknown-linux-gnu-gcc --help
This produced a help menu. I next tried to compile hello.c
, and recieved this
error message:
$ riscv64-unknown-linux-gnu-gcc ~/Desktop/hello.c
riscv64-unknown-linux-gnu-gcc: error trying to exec 'cc1': execvp: No such file or directory
I then invoked verbose mode with the -v
flag and tried compiling again, with
this output:
Using built-in specs.
COLLECT_GCC=riscv64-unknown-linux-gnu-gcc
Target: riscv64-unknown-linux-gnu
Configured with: /home/jldey/Repos/pulp/pulp-riscv-gnu-toolchain/riscv-gcc/configure --target=riscv64-unknown-linux-gnu --prefix= --with-sysroot=/sysroot --with-newlib --without-headers --disable-shared --disable-threads --with-system-zlib --enable-tls --enable-languages=c --disable-libatomic --disable-libmudflap --disable-libssp --disable-libquadmath --disable-libgomp --disable-nls --disable-bootstrap --enable-checking=yes --disable-multilib --with-abi=lp64d --with-arch=rv64imafdc
Thread model: single
gcc version 7.1.1 20170509 (GCC)
COLLECT_GCC_OPTIONS='-v' '-march=rv64imafdc' '-mabi=lp64d'
cc1 -quiet -v -iprefix /usr/bin/../lib/gcc/riscv64-unknown-linux-gnu/7.1.1/ /home/jldey/Desktop/hello.c -quiet -dumpbase hello.c -march=rv64imafdc -mabi=lp64d -auxbase hello -version -o /tmp/ccuBhus5.s
riscv64-unknown-linux-gnu-gcc: error trying to exec 'cc1': execvp: No such file or directory
From this output, it seems that my ./configure --enable-multilib
flag had not,
for whatever reason, configured multilib support. I did find it somewhat odd
that there were no riscv32-...
executables in /opt/riscv/bin
, which was the
first sign that it didn’t configure multilib. Since the riscv-probe
requires a
multilib installation, I’ll have to try again (but I’ll use a different install
directory, because I have a working riscv-gcc
right now, even if it’s only
for the 64-bit architecture. –> It seems that I didn’t read far enough on the
toolchain README, and the Linux multilib cross-compiler uses the prefix
riscv64-unknown-linux-gnu-
, although it can target both 32-bit and 64-bit
systems. Since I knew my 64-bit virt
emulator was working, I opted to use that
for my target.
Let’s try to build our Assembly hello world, instead of C. Since I’ve tested the
virt
machine in Qemu, I’ll try targeting it with my assembler and linker. To
do this, I looked up the UART0
address for virt
, which I found on line 85 of
qemu/hw/riscv/virt.c
:
static const MemMapEntry virt_memmap[] = {
...
[VIRT_UART0] = { 0x10000000, 0x100 },
...
};
In accordance with this, I changed line 5 of hello.s
to be
lui t0, 0x10000
since that is the line that loads our UART0 address into register t0. As noted
in my annotations of this assembly,
lui
loads the given 20-bit immediate into the upper 20 bits of the destination
register rd
, in this case, t0
.
EXIT QEMU: Ctrl+A
then x
To run the program, I had to change the Makefile so that it would invoke my Linux glibc multilib cross-compiler rather than the Newlib 32-bit cross-compiler used in the example:
hello: hello.o link.lds
riscv64-unknown-linux-gnu-ld -T link.lds -o hello hello.o
hello.o: hello.s
riscv64-unknown-linux-gnu-as -o hello.o hello.s
clean:
rm hello hello.o
This Makefile takes hello.s
as input, then runs it through the assembler,
which produced the ELF file hello.o
, and finally ran hello.o
through the
linker, to produce the linked ELF file hello
. Now, we are ready to run our
program in Qemu, which I did with the following:
$ cd ~/Repos/Qemu
$ qemu-system-riscv64 -machine virt -nographic -kernel ~/Desktop/hello
This produced the following output:
qemu-system-riscv64: warning: No -bios option specified. Not loading a firmware.
qemu-system-riscv64: warning: This default will change in a future QEMU release. Please use the -bios option to avoid breakages when this happens.
qemu-system-riscv64: warning: See QEMU's deprecation documentation for details.
Hello
This only printed “Hello” once, presumably since the virt
machine only has one
core, while the SiFive board used in the example has two cores. After printing
“Hello”, the processor sat there running the infinite loop that finished our
assembly program. Without this infinite loop, the program counter would continue
incrementing, which could lead it to execute illegal instructions. Or, in more
relevant security terms, it could start leaking memory.
To terminate Qemu, I pressed Ctrl+A
, released, then pressed X
. This sequence
of keystrokes produced the output:
QEMU: Terminated
This gives us a process to execute arbitrary RISC-V assembly programs using our Qemu emulator. Next step: generate a trace.
2022-07-18
Building riscv-tools
so that we can use riscv-probe
to “probe control and
status registers.” After several failed attempts, I realized that my $RISCV
environment variable should be set to the directory where my toolchain is
installed, i.e. /opt/riscv
. I also noticed that I needed to change some
--host
flags in build.sh
to reflect my installation of the Linux glibc
toolchain (prefix riscv64-unknown-linux-gnu-
, rather than the Newlib toolchain
with prefix riscv64-unknown-elf-
).
Since I installed my toolchain in a directory that’s restricted, I had to use the following command to run the build script:
$ sudo -E ./build.sh
I got a compiler error in riscv-pk/pk/pk.c
:
../pk/pk.c: In function 'rest_of_boot_loader':
../pk/pk.c:139:3: error: both arguments to '__builtin___clear_cache' must be pointers
139 | __clear_cache(0, 0);
| ^~~~~~~~~~~~~~~~~~~
make: *** [Makefile:319: pk.o] Error 1
The function signature as given by documentation is the following:
void __builtin___clear_cache(void *begin, void *end);
From the code, I cannot determine what cache is supposed to be cleared here. I
suppose I could try casting the zeroes to int *
type:
__clear_cache((int *)0, (int *)0);
Switching Gears
Let’s look into Qemu features that might allow us to look at register values.
Monitor
The -monitor
flag lets us view the register contents at a fixed point in time,
by executing info registers
in the monitor console. However, I do not believe
this tool allows us to periodically write register values to a file. Unless, of
course, I could configure a character device driver that sends monitor info to a
log file. That seems like more trouble than it’s worth, for a result that’s less
than optimal.
gdb (GNU Debugger)
Launching our emulator with the -s
and -S
flags makes it listen for an
incoming TCP connection on port 1234, from gdb, and wait until gdb tells it to
launch the kernel.
gdb seems to be having trouble connecting to the runtime with our assembly program. I’ll try connecting it to the runtime with Fedora.
That also seemed to fail. This might be the fact that I’m in the wrong directory
to look at the vmlinux
executable.
warning: Architecture rejected target-supplied description
warning: No executable has been specified and target does not support
determining executable automatically. Try using the "file" command.
Truncated register 37 in remote 'g' packet
Everything I’ve tried creates these error messages.
2022-07-19
riscv-tools
According to an issue report on the GH repo that mentioned the same compile
error I got, the riscv-tools
repository is no longer maintained, so I’m going
to download and compile a new version of riscv-pk
, the step which keeps
failing.
$ git clone https://github.com/riscv-software-src/riscv-pk.git
$ mkdir build
$ cd build
$ ../configure --prefix=$RISCV --host=riscv64-unknown-linux-gnu
$ make
$ make install
Config failed on the up-to-date version. Reviewing the log file indicates that my version of the riscv-gcc is different than what this package expects.
2022-07-20
Not much progress was made today. I worked on creating the Fedora boot demo with
info registers
accesses, and starting to parse through bits of the QEMU source
to attempt some fprintf hacking.
2022-07-21
- Demo Done!
-
QEMU sprintf hacking Not close enough for comfort
For tomorrow:
- QEMU wrapper
QEMU wrapper
Using the -S
flag prevents the CPU from starting, and requires a cont
command to proceed with execution. This will be useful to include in my wrapper
script.
Other useful monitor commands:
stop
: stops execution of VMc
/cont
: resumes executionlogfile
: write logs to the specified file (instead of default)
Wrapper goals:
- Start QEMU with a linked ELF as input
- start the VM paused (
-S
) -monitor stdio
so program can write commands to monitor
- start the VM paused (
- Ping monitor every so often (specify as commandline option?)
- QEMU “single-step” mode (take the first N cycles)
- build a simple character driver to use instead of
stdio
? - write this output to a trace file
- Terminate VM
- send
quit
command (or simplyq
) to monitor- quit condition?
- timeout -> fixed time? -> based on last output change (pc?)
- user-specified?
-> if the program is reading/writing to the
monitor console, does that mean the user can issue a
quit
? -> does the user ping the program, or the monitor?
- quit condition?
- send
2022-07-22
GOAL: Get a Python script that starts QEMU and can write commands to monitor.
Investigating singlestep
There is a monitor option that runs in “single-step” mode. I have attempted to run it on my Hello World program, with no detectable result so far.
$ qemu-system-riscv64 -machine virt -kernel ~/Desktop/riscv-hello-world/asm/hello -monitor stdio -S
\[...\]
(qemu) singlestep
(qemu) singlestep
(qemu) info registers
pc 0000000000001000
\[...\]
f31/ft11 0000000000000000
(qemu) c
After the c
command instructing the VM to continue, the entire “Hello” string
printed in the VM window, and there was no indication that the execution had
paused for single-step execution.
Looking more deeply at QEMU debugging options, it might be the case that -d cpu
will write the CPU registers “after the execution of each code block.” This
could be very helpful.
The -d cpu
option does print the same CSRs as monitor shows, but doesn’t
include FPRs. The -D ./<filename>
flag saves a log in the current directory.
Thus,
qemu-system-riscv64 -machine virt -kernel ~/Desktop/riscv-hello-world/asm/hello -monitor stdio -s -S -d cpu -D ./hello_log.txt
produces the log hello.txt. A similar effect can be achieved with
logfile ./hello_log.txt
in the monitor.
2022-07-25
Writing the Python wrapper script
It looks like the subprocess module for Python is the best way to run Qemu inside a Python script, as it gives us flexibility of where to route the child process’s stdin, stdout, and stderr.
That is, assuming I can get it to run Qemu correctly with the flags I need. I’ve
gotten os.system
to function as expected, but am yet to figure out how to do
the same thing with a subprocess.
Looking at the documentation, I discovered that subprocess.call("*",shell=True)
functions as basically a 1:1 replacement for os.system
.
2022-07-29
Understanding Subprocess
After lots of reading and tinkering with the subprocess
module, I have
discovered a few things.
# The following results in undefined behavior when called on the list args
# but executes as expected when called on args joined into a single string.
s = sp.run(shlex.join(args), shell=True, text=True, env=env)
The following are equivalent (although they have slightly different return values):
os.system(" ".join(args))
retcode = sp.call(" ".join(args), shell=True)
s = sp.run(" ".join(args), shell=True, text=True)
Using shell=True
leaves a potential vulnerability to shell injection attacks,
although using shlex.join()
mitigates this risk by using a properly shell-
escaped command.
It may be necessary/good practice to copy the environment variables into a
dictionary before opening the subprocess, and passing that dictionary as the
env
argument.
subprocess.run()
waits for the child process to complete before moving on to
the next line of Python code, however subprocess.Popen()
“executes a child
program in a new process.” The new Popen
object can then be sent input from
the parent and have its output read by the parent.
The child process’s stdin
, stdout
, and stderr
can all be sent to an
existing file object, or through a pipe. For extra flexibility, we will use a
pipe. subprocess.PIPE
is a file object that can handle I/O to and from the
child.
Entering Ctrl+D
into the terminal without an active process closes the window,
causing a loss of the information that was contained in it.
Popen.communicate()
waits for the child process to terminate before sending
and receiving data. I believe I will want to use the Popen
object’s I/O
streams to send and recieve data.
2022-08-01
- Troubleshoot concurrency
- Work QEMU with concurrency know-how
- Failing that, write a script that runs QEMU with logging enabled, terminates QEMU, and parses the output to a Daikon trace.
Subprocess is not the right package to use for spawning interactive programs inside Python. See this article about why you shouldn’t use Popen pipes with interactive child processes.
Instead, we’ll use the pexpect
package.
This is so much easier to work with than subprocess
was. I already
have a small QEMU-formatted trace, after about 20
minutes of reading and 2 minutes of writing a couple lines of code.
2022-08-02
Daikon
Daikon takes two files as input: a .decls and a .dtrace:
Decls file
A .decls containes a header and timepoints.
Timepoints
Timepoints contain:
- its type
- elements of state (variables)
- variable kind (variable)
- declaration and representation type
- comparability
- CSRs, GPRs, and FPRs have their own respective comparabilities.
- variable name corresponds to register name
Per Calvin’s recommendation: declare all variables the same way at
tick():::ENTER
and tick():::EXIT0
. Additionally, give the same values at
enter and exit.
-> Can we have no exit values?
dtrace files
A .dtrace consists of a bunch of program points.
Program point
Program points have the following:
- a name, with enter or exit
- a nonce, which is monotonically increasing (1-indexed)
- every register
- name
- value
- the (believed) hardcoded value 1
There must be a line of whitespace between program points, or Daikon will error. Extra whitespace might be allowed, but that minimum must be present.
qtrace -> 1 dtrace, 1 decls
Priorities:
- Get Daikon working and parse qtraces into dtraces
- Run Fedora on SiFive emulation and see if it logs different CSRs. -> it doesn’t.
Writing qToDaikon
This script needs to take a qtrace, preferably either a monitor trace or a QEMU log file, and change it into a dtrace. At the heart of this problem is parsing our register values into an internal data structure, such as a list. Conveniently, a lot of that can be handled by this one line of code:
re.findall("\w+\s+[0-9a-f]{16}|\w+\s+[0-9a-f]x[0-9a-f]",line)
This uses a regular expression to match the labels and registers from the qtrace format into a list of strings, where each string has a register label/value pair that essentially forms a program point. This is something we can quickly comb through and write to a dtrace file.
2022-08-03
Installing Daikon
Following the Daikon install instructions, I unpacked the Daikon distribution, like so:
$ mkdir ~/Repos/Daikon
$ cd ~/Repos/Daikon
$ wget http://plse.cs.washington.edu/daikon/download/daikon-5.8.12.tar.gz
$ tar zxf daikon-5.8.12.tar.gz
This created a daikon-5.8.12/
subdirectory. I then executed the following:
$ export DAIKONDIR=~/Repos/Daikon/daikon-5.8.12
$ source $DAIKONDIR/scripts/daikon.bashrc
This output the error message:
Cannot infer JAVA_HOME; please set it. Aborting daikon.bashrc .
I realized I had to install the Java Development Kit, which I did with the following:
$ sudo apt install default-jdk
and verified installation with:
$ java -version
Then, I checked that DAIKONDIR
was still set correctly with echo
, and I
repeated the source
command transcribed above. This no longer returned an
error message, and I checked that it had set the environment variable
JAVA_HOME
, as a sanity check that it had done its work, and sure enough, I got
the filepath I expected.
Continuing qToDaikon
While troubleshooting the above and waiting for installations to finish, I wrote
the parts of qToDaikon.py
that open a .dtrace
file for writing, and parse
the input into a 2D list of timepoints by register name/value strings. Next, we
have to take this 2D list and parse the strings inside it, then write those
register names and values into a .dtrace
file in the proper format.
2022-08-04
In our qscript
output, we notice that we are logging duplicate register values
at many different timepoints, because we are receiving info registers
outputs
more often than the values are actually updated. To find the number of unique
timesteps, we compare the value of every register at the current timepoint to
the corresponding value at the previous timepoint using list equality. We only
append a timepoint to the list of timepoints if it passes this uniqueness check.
Now we have a toolchain to run QEMU, generate a trace, and parse that trace into
Daikon format. The next step is to generate the .decls
, which I did with the
script make_decls.py
, which takes lists of the CSRs, GPRs, and FPRs, and gives
each category its own comparability. This should only need to be generated
once, and I’ve included the resulting universal.decls
here, under src/utils.
Our next step will be to combine qscript.py
and qToDaikon.py
into a single
tool, which both runs QEMU on an ELF file, and generates a .dtrace
. Now that
we have all the pieces in place, this should go smoothly.
2022-08-05
Now having a .decls
and a .dtrace
, I ran Daikon on these files:
java daikon.Daikon ~/Desktop/riscv-tracegen/src/utils/universal.decls ~/Desktop/riscv-tracegen/src/utils/20220801-151842.dtrace
After a couple of failed attempts, because I had forgotten to include proper
whitespace in my .dtrace
, this run produced the output (a list of properties)
found in hello_props.txt. The properties discovered here are
exactly what we would expect: most of the registers are equal to each other with
value 0, and a few were changed to one nonzero value (except the program counter
pc
, which had two nonzero values, which, again, is as expected. The registers
were also grouped in the categories I specified with the comparability
property, CSRs, GPRs, and FPRs.
Aphrodite
Our overall tool, combining the trace generation from qscript
and the .dtrace
formatting from qToDaikon
, called Aphrodite, takes a RISC-V executable and a
timeout in minutes, executing QEMU and writing a well-formatted dtrace. I have
verified Aphrodite’s correctness with our bare-metal hello world program, and
will now use Aphrodite to boot Linux with a 20-minute timeout, to see how much
data it generates. After that, I’ll run it with a longer timeout (hopefully, 2
hours).
After much debugging, I was able to get Aphrodite to successfully boot Linux. Much to my surprise, the boot process completed in less than an hour, and I was able to log in and do a couple basic shell commands at the terminal, before engaging the shutdown sequence. This resulted in a 15.7 MB trace that contains 6,484 program points.
Next Steps
- Figure out what special FP values are (l. 23-34)
- Analyze daikon_output.txt lines 23-34 (special values) and 42-58 (CSR ordering).
- Run Daikon again comparing two program points to each other (relabel every
other tick to be entry/exit of previous. (changes in trace encoding)
- zero/nonzero conditions that describe CWE
- Conditionals in Daikon (split-info file)
- Non-hardware CWEs
Invoking Daikon with a split-info file (.spinfo):
java -cp $DAIKONDIR/daikon.jar daikon.Daikon Foo.decls Foo.spinfo Foo.dtrace