Virgil is a fully self-hosted language. Not only the entire compiler, but also the entire runtime system and garbage collector (GC) are written in Virgil! Other than a small amount of custom assembly generated by the compiler for startup, there is no code in any other language needed to run a natively-compiled Virgil program. But Virgil is a type-safe, memory-safe language, so how is this possible?
The answer is that, on native targets, Virgil provides an unsafe Pointer type and associated operations that allow the Virgil runtime system and GC to be written in Virgil (rather than in a lower-level language or directly in assembly).
What Pointer is used for in Virgil
There are three main uses of Pointer in Virgil’s supporting code.
First, Virgil provides the System component on all targets, which gives a very rudimentary system call layer to do I/O and manipulate files.
On native targets, the System component is written in Virgil source code and uses Pointers and kernel system calls to implement its methods.
Second, the (very thin) Virgil runtime provides source-level stack traces upon program errors such as NullCheckException.
These stack traces are printed out by the runtime system, which is also written in Virgil.
The runtime system walks the call stack and uses metadata from the compiler to reconstruct source locations for each native frame.
Third, and most complex, the Virgil garbage collector traces and copies the program’s heap, including live objects, as necessary during program execution. It also uses metadata from the compiler, such as the location of root references in native frames and the layout of heap objects, to do this.
What Pointer is not used for
Pointer is an all-powerful mechanism to read and write the native process’s memory.
There are no safety checks.
In theory, it gives the ability to read or mutate any data in the program; we could build the “perfect” data structures laid out exactly how we want in memory.
In practice, applications should avoid using Pointers and custom data structures for performance tuning in Virgil.
Rather, applications should use the safer, more convenient constructs like classes, closures, ADTs, arrays, etc.
Pointers are only for interfacing with lower-level software like an operating system kernel.
How Pointers work in Virgil
In Virgil, pointers are untyped, raw byte addresses. They have these simple rules:
- There is one Pointer type per target.
- The Pointer type can be used anywhere any other Virgil type can be used.
- The size of a Pointer is exactly the target address size (either 32 or 64 bits).
- Pointer values are not scanned or relocated by the garbage collector.
- Pointers support load and store operations to read/write Virgil values directly from/to memory.
Pointer.SIZE and Pointer.NULL
There are two important constants that are members of the Pointer type.
var x: int = Pointer.SIZE; // the size, in bytes, of pointers on this target
var y: Pointer = Pointer.NULL; // the null pointer, i.e. address 0
Pointer arithmetic
As byte addresses, pointers support addition of a signed integer offset, and the subtraction of two pointers. Since pointer size is target-specific, the type of the offset, or the result of subtracting two pointers, is different on different targets.
var n = Pointer.NULL; // null pointer
var p1 = n + 66; // add 66 (bytes) to a pointer
var p2 = n + 68L; // add 68 (bytes) to a pointer (offset can be {long} on 64-bit)
var diffI: int = p1 - n; // difference between pointers is of type {int} on 32-bit targets
var diffL: long = p2 - n; // difference between pointers is of type {long} on 64-bit targets
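As a small illustration of combining Pointer.SIZE with pointer arithmetic, the sketch below computes the address of a pointer-sized slot inside a block of memory. The helper slotAddress and the variable names are hypothetical, and the code assumes a native target. It uses Pointer.atContents (described later in this document) to obtain a valid address, and stores and loads a Pointer value through it.

```virgil
// Hypothetical helper: address of the {index}-th pointer-sized slot after {base}.
def slotAddress(base: Pointer, index: int) -> Pointer {
	return base + index * Pointer.SIZE; // offsets are always in bytes
}
def main() {
	var table = Array<byte>.new(4 * Pointer.SIZE); // room for four pointer-sized slots
	var base = Pointer.atContents(table);          // address of table[0]
	var slot2 = slotAddress(base, 2);
	slot2.store<Pointer>(Pointer.NULL);            // write a pointer value into slot 2
	if (slot2.load<Pointer>() == Pointer.NULL) System.puts("slot 2 holds null\n");
}
```

Scaling by Pointer.SIZE rather than a hard-coded 4 or 8 keeps the same source usable on both 32-bit and 64-bit targets.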
Pointer comparison
As (unsigned) byte addresses, pointers can be compared with the familiar equality and inequality operators.
However, a Pointer cannot be directly compared to an integer.
var p: Pointer;
var q: Pointer;
var r1 = (p == q); // equality comparison between two pointers
var r2 = (p != q); // not equal comparison
var r3 = (p < q); // less than
var r4 = (p <= q); // less than or equal
var r5 = (p > q); // greater than
var r6 = (p >= q); // greater than or equal
var x = (p == 99); // ERROR: cannot compare pointer to integer
load and store operations on Pointer
Virgil pointers are more than addresses.
They support unchecked (potentially unsafe) access to memory with individual loads and stores.
With the load and store methods on a pointer, your program can read or write any Virgil value at that address.
Both methods have a type parameter indicating the type of the value to be loaded or stored.
With this mechanism, we can not only read or write primitive values through a pointer, but also other values, like pointers, or references (!).
This is dangerous: it can not only interpret data as the wrong type, but also damage the heap, leading to a crash later.
Avoid dangerous loads and stores that subvert the type system.
var p: Pointer;
var x: int = p.load<int>(); // load an int (i32) from {p}
p.store<int>(33); // store an int into {p}
var y: string = p.load<string>(); // unchecked, raw reference load, dangerous!
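To show load and store working against memory that is actually valid, here is a minimal sketch that swaps two ints through raw pointers. The helper swapInts is hypothetical, the code assumes a native target, and it relies on Pointer.atContents (described later in this document) to get an address inside a byte array.

```virgil
// Hypothetical helper: swaps the 32-bit ints stored at addresses {a} and {b}.
def swapInts(a: Pointer, b: Pointer) {
	var tmp = a.load<int>();
	a.store<int>(b.load<int>());
	b.store<int>(tmp);
}
def main() {
	var buf = Array<byte>.new(8);    // room for two 4-byte ints
	var p = Pointer.atContents(buf); // address of buf[0]
	var q = p + 4;                   // address of buf[4]
	p.store<int>(11);
	q.store<int>(22);
	swapInts(p, q);
	System.puti(p.load<int>());      // prints the int now stored at {p}
	System.ln();
}
```

Because pointers are not relocated by the garbage collector, a sketch like this is only reasonable while nothing allocates between taking the pointer and using it.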
cmpswp on Pointer
Some native targets have an instruction for compare-and-swap, often used in implementing locks or other concurrent utilities.
Rather than offering higher-level mechanisms for concurrency (so far), Virgil just exposes this operation as a method on Pointer.
Eventually, Virgil will have higher-level concurrency constructs that are implemented with compare-and-swap.
var p: Pointer;
p.store<int>(33);
var x: bool = p.cmpswp(33, 44); // returns true if {33} was atomically swapped to {44}
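As a rough sketch of how a lock might be layered on cmpswp, the code below spins on a 32-bit lock word where 0 means free and 1 means held. The names acquire and release and the lock-word encoding are assumptions made for illustration. This is not how Virgil's runtime implements anything, and a real lock would also have to consider memory ordering on the target.

```virgil
// Hypothetical spin lock over the int at {lock}: 0 = free, 1 = held.
def acquire(lock: Pointer) {
	// retry until this caller is the one that swaps 0 to 1
	while (!lock.cmpswp(0, 1)) { }
}
def release(lock: Pointer) {
	lock.store<int>(0); // mark the lock free again
}
def main() {
	var word = Array<byte>.new(4);       // zero-initialized, so the lock starts free
	var lock = Pointer.atContents(word); // address of the lock word
	acquire(lock);
	// ... critical section would go here ...
	release(lock);
}
```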
Outside of implementing Virgil’s own runtime system and garbage collector, the most common use case for Pointer is to interface to lower-level software like kernels for doing I/O.
In these cases, we typically use a Virgil array (often Array<byte>) as the underlying memory for exchange.
To get a pointer directly into the beginning of the contents of an array (i.e. element 0), we can use Pointer.atContents.
def STDIN = 0;
def SYS_read = 3; // number of the read system call on 32-bit x86 Linux
var buf = Array<byte>.new(128);
// call Linux kernel to read directly into {buf}
Linux.syscall(SYS_read, (STDIN, Pointer.atContents(buf), buf.length));
Virgil supports experiments in new virtual machine designs. As it turns out, such virtual machines may need unsafe access to particular kinds of heap objects. Virgil is prototyping a mechanism for obtaining pointers into the middle of heap objects. Since pointers are not relocated by the GC, this is very dangerous. These operations are documented here for completeness.
class C(x: int) { }
var c = C.new(33);
var ptr_c = Pointer.atObject(c); // points at "beginning" of {c} object
var ptr_x = Pointer.atField(c.x); // points directly at {x} field of {c}
var a = Array<int>.new(3);
var ptr_length = Pointer.atLength(a); // points at the {length} of {a}
var ptr_a_0 = Pointer.atElement(a, 0); // points at {a[0]}
var ptr_a_1 = Pointer.atElement(a, 1); // points at {a[1]}
var elem_size = ptr_a_1 - ptr_a_0; // computes element size
Where the Pointer type is exposed
The compiler exposes the Pointer type and associated operations only on these native targets:
- x86-darwin: 32-bit, Pointer.SIZE == 4
- x86-linux: 32-bit, Pointer.SIZE == 4
- x86-64-linux: 64-bit, Pointer.SIZE == 8
- wasm: 32-bit, Pointer.SIZE == 4
These targets are not “native” and do not have the Pointer type: