Testing

Testing is built into Sailfin — there is no external framework to install, no import testing at the top of your file, and no runner to configure. The test keyword is part of the language itself.

This design reflects a core belief: the closer tests live to the code they verify, the more likely they are to stay up-to-date. Tests are first-class citizens, not afterthoughts.

fn add(a: number, b: number) -> number {
    return a + b;
}

test "add returns the sum of two integers" {
    assert add(2, 3) == 5;
}

Run it:

sfn test

Output when it passes:

PASS add returns the sum of two integers

Output when it fails:

FAIL add returns the sum of two integers
assertion failed: add(2, 3) == 5
left: 4
right: 5
--> src/math.sfn:8:5

A test block has three parts: the keyword test, a string name, and a body enclosed in braces.

test "descriptive name here" {
    // assertions and logic
}

Test names are documentation. A reader scanning a test file should be able to understand the intended behavior of the system just by reading the test names. Write them as statements of fact about what the code does:

// Good: describes observable behavior
test "parse_int returns error variant on empty string" { ... }
test "array push increases length by one" { ... }
test "Config.load reads from XDG_CONFIG_HOME when set" { ... }

// Avoid: too vague
test "parse_int works" { ... }
test "test1" { ... }
test "edge case" { ... }

assert takes a boolean expression. If the expression evaluates to false, the test fails immediately with the failing expression printed as-is.

test "string length" {
    let s = "hello";
    assert s.length == 5;
}

assert is a statement, not a function call — there are no parentheses around the expression. When the assertion fails, Sailfin prints both sides of the comparison so you can see the mismatch without adding debug output yourself.

You can include multiple assertions in one test:

test "normalise_email lowercases and trims" {
    let result = normalise_email("  Alice@Example.COM  ");
    assert result == "alice@example.com";
    assert result.length == 17;
}

When there are multiple assertions, execution stops at the first failure. If you want to verify several independent properties, consider separate tests — each will give a clear, isolated failure message.
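
As a sketch of that advice, the test above could be split so that each property fails on its own. The input strings are illustrative placeholders, and normalise_email is assumed to behave as the test names describe.

```sailfin
// Each test checks one property, so a failure names the broken behavior.
test "normalise_email lowercases the address" {
    // Placeholder input with no surrounding whitespace: only casing is exercised
    assert normalise_email("Alice@Example.COM") == "alice@example.com";
}

test "normalise_email trims surrounding whitespace" {
    // Placeholder input that is already lowercase: only trimming is exercised
    assert normalise_email("  alice@example.com  ") == "alice@example.com";
}
```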

Coming in 1.0: Richer assertion helpers such as assert_eq, assert_ne, assert_contains, and assert_throws are on the roadmap. For now, express these as boolean expressions following assert.
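
Until those helpers land, the same checks can be sketched with plain boolean expressions. This example relies only on constructs that appear elsewhere in this guide: the == operator, the ! operator, the length property, and a contains method on strings. The test name and values are illustrative.

```sailfin
test "greeting mentions the recipient" {
    let greeting = "hello, world";
    // Stands in for assert_eq: compare with ==
    assert greeting.length == 12;
    // Stands in for assert_contains: assert the result of contains
    assert greeting.contains("world");
    // Negative check: negate the boolean with !
    assert !greeting.contains("goodbye");
}
```

A stand-in for assert_throws needs a try/catch block; that pattern is covered in the section on testing thrown errors.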

Tests for pure functions need no effects. These are the easiest to write, the fastest to run, and the most valuable to have.

fn clamp(value: number, min: number, max: number) -> number {
    if value < min { return min; }
    if value > max { return max; }
    return value;
}

test "clamp returns value when within range" {
    assert clamp(5, 0, 10) == 5;
}

test "clamp returns min when value is below range" {
    assert clamp(-3, 0, 10) == 0;
}

test "clamp returns max when value is above range" {
    assert clamp(99, 0, 10) == 10;
}

test "clamp handles equal min and max" {
    assert clamp(7, 5, 5) == 5;
}

Tests declare effects exactly like functions do. This is intentional: the compiler enforces capability discipline everywhere, including test code. A test that calls fs.read without declaring ![io] is a compile error, not a runtime surprise.

test "config file loads successfully" ![io] {
    let content = fs.read("tests/fixtures/sample.toml");
    assert content.length > 0;
}

test "HTTP endpoint returns 200" ![io, net] {
    let response = http.get("https://httpbin.org/get");
    assert response.status == 200;
}

The effects listed in the test signature are exactly the capabilities that test is granted. This makes it easy to see, at a glance, which tests touch the filesystem or network.

In a large test suite, tests that declare ![io] or ![net] are the ones that may be slow, may require fixtures, and may fail due to environment issues. Tests with no effects are pure: they cannot reach the filesystem or network, so they are typically fast and produce the same result on every run. Effect annotations let tools (and people) reason about this without reading the test body.
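
One way to act on this, sketched under assumptions: when the logic under test only needs a string, pass the string inline instead of reading it from disk, and the test needs no ![io]. Config.parse and config.sections are borrowed from the fixture example later in this guide. That Config.parse declares no effects of its own, and that this TOML snippet yields exactly one section, are both assumptions.

```sailfin
// No effects declared: the source text is inline, so nothing touches the filesystem.
test "config parser reads a single section from inline source" {
    let source = "[server]\nport = 8080\n";
    let config = Config.parse(source);
    assert config.sections.length == 1;
}
```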

For small modules, put tests in the same file as the code they test. This is the simplest option and keeps tests close to their subject:

src/math.sfn
fn factorial(n: number) -> number {
    if n <= 1 { return 1; }
    return n * factorial(n - 1);
}

test "factorial of zero is one" {
    assert factorial(0) == 1;
}

test "factorial of five is 120" {
    assert factorial(5) == 120;
}

For larger modules, or when the test file would dwarf the implementation, use a separate *_test.sfn file. The naming convention is <module>_test.sfn:

src/
├── parser.sfn
├── parser_test.sfn
├── typecheck.sfn
└── typecheck_test.sfn

For integration and end-to-end tests that don’t belong to a single module, use a tests/ directory:

project/
├── src/
│   ├── parser.sfn
│   └── parser_test.sfn    # unit tests, co-located
└── tests/
    ├── unit/
    │   └── math_test.sfn
    ├── integration/
    │   └── pipeline_test.sfn
    └── e2e/
        └── full_run_test.sfn

This mirrors the structure used in the Sailfin compiler itself:

compiler/
├── src/
│   ├── parser.sfn
│   └── lexer.sfn
└── tests/
    ├── unit/
    │   ├── parser_test.sfn
    │   └── lexer_test.sfn
    ├── integration/
    │   └── effect_checker_test.sfn
    └── e2e/
        └── full_pipeline_test.sfn

Run tests with sfn test:

# Run all tests discovered from the current directory
sfn test
# Run tests in a specific file
sfn test src/parser_test.sfn
# Run tests in a directory (recursive)
sfn test tests/unit/
# Note: --filter is not yet supported; run a specific file to narrow scope
sfn test src/math_test.sfn

Projects built with the standard Sailfin Makefile have these targets:

make test # Full suite: unit + integration + e2e
make test-unit # Unit tests only
make test-integration # Integration tests only
make test-e2e # End-to-end tests only

A run prints one line per test, details for each failure, and a summary line:

PASS factorial of zero is one
PASS factorial of five is 120
FAIL factorial of negative number returns one
assertion failed: factorial(-1) == 1
left: -1
right: 1
--> src/math.sfn:24:5
3 tests, 2 passed, 1 failed

Tests run in declaration order within a file. When running multiple files, the order between files is alphabetical.

When you have many similar cases to verify, use an array of test-case structs and loop over them. This avoids repetitive test bodies and makes it easy to add new cases.

struct ParseCase {
    input: string;
    expected: number;
    should_fail: boolean;
}

struct ParseError {
    message: string;
}

test "parse_int handles all cases" {
    let cases: ParseCase[] = [
        ParseCase { input: "0", expected: 0, should_fail: false },
        ParseCase { input: "42", expected: 42, should_fail: false },
        ParseCase { input: "-7", expected: -7, should_fail: false },
        ParseCase { input: "", expected: 0, should_fail: true },
        ParseCase { input: "abc", expected: 0, should_fail: true },
        ParseCase { input: "2147483648", expected: 0, should_fail: true },
    ];
    for c in cases {
        let result = parse_int(c.input); // returns number | ParseError
        match result {
            ParseError { message } => assert c.should_fail,
            _ => {
                assert !c.should_fail;
                assert result == c.expected;
            },
        }
    }
}
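
The same pattern works for functions that cannot fail. A minimal sketch, reusing the clamp function defined earlier in this guide; ClampCase is a hypothetical struct introduced only for this example.

```sailfin
// Hypothetical case struct for this example
struct ClampCase {
    value: number;
    expected: number;
}

test "clamp maps values into the range 0 to 10" {
    let cases: ClampCase[] = [
        ClampCase { value: -3, expected: 0 },
        ClampCase { value: 0, expected: 0 },
        ClampCase { value: 5, expected: 5 },
        ClampCase { value: 10, expected: 10 },
        ClampCase { value: 99, expected: 10 },
    ];
    for c in cases {
        assert clamp(c.value, 0, 10) == c.expected;
    }
}
```

The trade-off is that execution stops at the first failing assertion, so one failing case hides the cases after it. Separate tests remain the better choice when each case deserves its own failure message.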

Always test the boundaries of your domain. For a function that works on collections, test the empty case, the one-element case, and a representative multi-element case.

fn median(values: number[]) -> number { ... }

test "median of empty array returns 0.0" {
    assert median([]) == 0.0;
}

test "median of single element returns that element" {
    assert median([7.0]) == 7.0;
}

test "median of even-length array averages middle two" {
    assert median([1.0, 2.0, 3.0, 4.0]) == 2.5;
}

test "median of odd-length array returns middle element" {
    assert median([1.0, 3.0, 5.0]) == 3.0;
}
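
Boundaries also exist where a function switches branches. As a sketch, the factorial function from earlier in this guide has a base case at n <= 1, so the inputs on either side of that condition are worth pinning down. The test names are illustrative.

```sailfin
test "factorial of one is one" {
    // Largest input still handled by the base case (n <= 1)
    assert factorial(1) == 1;
}

test "factorial of two is two" {
    // Smallest input that takes the recursive branch
    assert factorial(2) == 2;
}
```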

When a function returns a tagged enum, match against it directly. This gives you a clearer failure message than unwrapping.

enum Direction {
    North,
    South,
    East,
    West,
}

test "parse_direction recognises north" {
    let result = parse_direction("north"); // returns Direction | ParseError
    match result {
        Direction.North => { /* pass */ },
        ParseError { message } => assert false, // unexpected error
        _ => assert false, // wrong variant
    }
}

Integration tests verify that multiple components work correctly together, often using real effects like the filesystem or network. They are slower than unit tests and may require setup.

test "round-trip: write then read returns original content" ![io] {
    let path = "tests/fixtures/temp_roundtrip.txt";
    let original = "hello, world\n";
    try {
        fs.write(path, original);
        let recovered = fs.read(path);
        assert recovered == original;
    } finally {
        fs.delete(path);
    }
}

The finally block runs even if an assertion fails, ensuring the temporary file is cleaned up regardless of the test outcome.

Put static test files in a tests/fixtures/ directory and read them in tests that need them:

test "config parser handles multi-section TOML" ![io] {
    let source = fs.read("tests/fixtures/multi_section.toml");
    let config = Config.parse(source);
    assert config.sections.length == 3;
}

Keep fixtures small and purpose-built. A fixture that exists to test one behavior should not be reused for a different test if the two tests might need to diverge.

Sailfin does not have beforeEach/afterEach hooks. Instead, extract setup into a helper function and call it at the top of each test that needs it. Use try/finally for teardown:

fn create_temp_dir() -> string ![io, rand] {
    let path = "tests/temp/{{rand.uuid()}}";
    fs.mkdir(path);
    return path;
}

test "compiler emits expected IR" ![io, rand] {
    let dir = create_temp_dir();
    try {
        let source = "fn main() ![io] { print(\"hi\"); }";
        let out_path = "{{dir}}/out.sfn-asm";
        compile_to_file(source, out_path);
        let ir = fs.read(out_path);
        assert ir.contains("define void @main");
    } finally {
        fs.remove_all(dir);
    }
}

To verify that a function throws under expected conditions, wrap the call in a try/catch block inside the test. The test fails if the exception is not thrown.

test "divide throws on zero divisor" {
    let threw = false;
    try {
        let _ = divide(10, 0);
    } catch (err) {
        threw = true;
    }
    assert threw;
}

To verify that the right kind of error is thrown, dispatch on the error’s shape inside the catch block using match:

test "parse_config throws ParseError on malformed input" {
    let got_parse_error = false;
    try {
        let _ = parse_config("{{ not valid toml }}}");
    } catch (err) {
        match err {
            ParseError { message } => { got_parse_error = true; },
            _ => assert false, // unexpected error shape
        }
    }
    assert got_parse_error;
}

Coming in 1.0: Typed catch clauses — catch (err: ParseError) { ... } — are on the roadmap.

Functions that return T | ErrorType don’t throw — they return an error value in the union. Test these by matching against the result:

test "parse_int returns success for valid input" {
    let result = parse_int("42");
    match result {
        ParseError { message } => assert false,
        _ => assert result == 42,
    }
}

test "parse_int returns error for non-numeric input" {
    let result = parse_int("not a number");
    match result {
        ParseError { message } => assert message.contains("not a number"),
        _ => assert false,
    }
}

Current status: model and prompt blocks parse correctly today but do not execute. Model invocation is planned for after the 1.0 release. This section describes the intended testing story.

When model execution lands, prompt-based functions will require ![model] effects and will be testable using a seed parameter for reproducibility:

// Planned: parses today; model invocation lands post-1.0
test "summarize returns a short string" ![model] {
    let result = summarize("A very long article about the history of programming languages...");
    assert result.length < 200;
}

Each model call will produce a generation card containing the seed used. Fixing the seed is intended to make output reproducible across runs: the same prompt with the same seed against the same model version should return the same response.

The workflow will look like:

  1. Run the test once with a known seed to capture the expected output.
  2. Pin the expected output and seed in the test.
  3. Future runs compare against the pinned output — any change is a signal to review.

This is deliberately different from mocking: you’re testing the real model behavior, pinned to a specific configuration.

Coverage tooling is planned. The goal is line-level coverage that understands effect boundaries — so you can ask “do I have coverage for all my ![net] paths?” not just “which lines were executed?”.

For now, use the discipline of writing tests first to drive coverage naturally.

Test names are documentation. Someone reading your test file shouldn’t need to open the implementation to understand what the code is supposed to do.

One behavior per test. When a test verifies a single behavior, the failure message is precise. When a test verifies five behaviors, a failure tells you something broke but not what.

// Prefer: one behavior per test
test "trim removes leading whitespace" {
    assert trim(" hello") == "hello";
}

test "trim removes trailing whitespace" {
    assert trim("hello ") == "hello";
}

// Avoid: too many behaviors in one test
test "trim works" {
    assert trim(" hello") == "hello";
    assert trim("hello ") == "hello";
    assert trim(" hello ") == "hello";
    assert trim("") == "";
    assert trim("no spaces") == "no spaces";
}

Test the interface, not the implementation. If your test breaks when you refactor internals without changing behavior, the test is testing the wrong thing.

Keep unit tests fast. A test suite that takes two minutes to run will be skipped. Unit tests should take milliseconds. If a test is slow, look for ![io] or ![net] effects — those are your slow paths. Separate them into an integration test suite that you run less frequently.

Write a test for every bug fix. Before fixing a bug, write a test that reproduces it. Then fix the bug. Then confirm the test passes. This prevents regressions and documents the exact scenario that was broken.
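
A sketch of that workflow, built on a hypothetical bug report: suppose trim was reported to return whitespace-only input unchanged. The regression test is written first, fails against the broken implementation, and stays in the suite after the fix. The bug itself is invented for illustration; trim is the function used in the examples above.

```sailfin
// Regression test for a hypothetical bug report:
// trim returned whitespace-only input unchanged.
test "trim returns empty string for whitespace-only input" {
    assert trim("   ") == "";
}
```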

Use try/finally for cleanup. Any test that creates temporary files, starts servers, or modifies shared state should clean up in a finally block. This ensures cleanup happens even when assertions fail.