Type-Safe Configs in C99: Why I Prefer Code-Gen over Parsing

In the world of C, configuration is often a “stringly-typed” nightmare.

Most developers reach for a JSON or YAML parser. At runtime, you load a file, traverse a tree of generic Value nodes, and manually convert strings to integers while praying you didn't miss a null check. If a user provides a string where a port number should be, your application might crash, or worse, start up in a "half-broken" state that only fails hours later.

I built cfgsafe to treat configuration as a compiled asset rather than a runtime mystery. It uses a declarative schema to generate a strongly-typed C99 single-header library, moving the work of writing validation code out of your application and into the build step. The checks still run when the config is loaded, but you no longer write them by hand.

cfgsafe

A declarative schema-driven configuration engine for C99, generating strongly-typed and memory-safe code.


Manual LoC: Adding an Integer Field (with Range Validation)

The delta here isn’t just about brevity; it’s about defensive programming. In a standard JSON library, adding a single validated integer requires:

Updating the C struct.
Fetching the object key.
Validating the existence of the key.
Validating the value type (is it actually an int?).
Validating the business logic (is it in range?).
Writing 3 separate error-handling branches.

With cfgsafe, you write one line in the schema. The other 15 lines of boilerplate move into the generated header, where they are emitted mechanically from the schema instead of being rewritten by hand for every field.
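To make the checklist above concrete, here is a sketch of the hand-written path for a single `port` field. It does not use a real JSON library: a flat key/value array (`kv_t`) and the helpers `kv_get` and `load_port` are hypothetical stand-ins for whatever parser you would actually use. The structure is the point: fetch, existence check, type check, range check, and three separate error branches.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parsed config tree: flat key/value strings.
   A real JSON/YAML library would return generic nodes instead. */
typedef struct { const char *key; const char *value; } kv_t;

typedef enum { PORT_OK, PORT_MISSING, PORT_NOT_INT, PORT_OUT_OF_RANGE } port_status_t;

static const char *kv_get(const kv_t *kvs, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(kvs[i].key, key) == 0) return kvs[i].value;
    return NULL;
}

/* Hand-written validation for ONE integer field. */
static port_status_t load_port(const kv_t *kvs, size_t n, long *out) {
    const char *raw = kv_get(kvs, n, "port");         /* fetch the key */
    if (raw == NULL) return PORT_MISSING;             /* branch 1: key absent */

    char *end;
    errno = 0;
    long v = strtol(raw, &end, 10);
    if (errno != 0 || end == raw || *end != '\0')     /* branch 2: not an integer */
        return PORT_NOT_INT;

    if (v < 1 || v > 65535) return PORT_OUT_OF_RANGE; /* branch 3: out of range */

    *out = v;
    return PORT_OK;
}
```

Multiply this by every field in a real service and the case for generating it becomes clear.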

The Workflow: 3 Steps to Type-Safety

cfgsafe replaces runtime guesswork with a predictable build-time pipeline.

Define: Create a .schema file using the cfgsafe DSL. Here you define your types, constraints (ranges, regex), and data sources (Env, CLI).
Generate: Run the cfg-gen tool. It parses your schema and outputs a single config.h header containing a native C struct and a robust validation engine.
Load: Call the generated Config_load function in your main(). It atomically resolves all values from files, env vars, and CLI flags (passing argc and argv), validates them, and populates your struct.

1. The Philosophy: Schema-First Design

Instead of writing code to parse data, you write a schema to describe data. cfgsafe uses a custom DSL that allows you to define constraints directly in the definition.

// app.schema
schema Database {
  host: string { default: "localhost" }
  port: int    { default: 5432, range: 1..65535, env: "DB_PORT" }
  password: string { secret: true, required: true }
}

schema Config {
  service_name: string { required: true, pattern: "^[a-z0-9-]+$" }
  db: Database {}
}

By defining properties like range, pattern, or env in the schema, you eliminate hundreds of lines of manual validation logic. The generator (cfg-gen) emits the code that reads the environment variables and matches the regex, so these checks run inside Config_load before your application ever sees the data.
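What might that emitted code look like? The article doesn't show cfg-gen's output for these constraints, so the following is only a guess at its shape. The function names `Config_check_service_name` and `Database_check_port` are hypothetical. The sketch uses the POSIX `regcomp`/`regexec` API for the `pattern` constraint and a plain comparison for `range`.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shape of validators a generator could emit for:
     service_name: string { required: true, pattern: "^[a-z0-9-]+$" }
     port: int { default: 5432, range: 1..65535 }
   Illustrative only; not cfg-gen's real output. */

static bool Config_check_service_name(const char *value) {
    if (value == NULL) return false;                 /* required: true */
    regex_t re;
    /* Compiled per call for brevity; real generated code would likely compile once. */
    if (regcomp(&re, "^[a-z0-9-]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    int rc = regexec(&re, value, 0, NULL, 0);        /* 0 means the pattern matched */
    regfree(&re);
    return rc == 0;
}

static bool Database_check_port(int64_t value) {
    return value >= 1 && value <= 65535;             /* range: 1..65535 */
}
```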

2. AOT Strong Typing: No More String Lookups

When you run cfg-gen app.schema, it outputs a native C struct. This is the “magic” of the AOT approach. You no longer access your configuration through string lookups like json_object_get_int(obj, "port"). You access it as a native member:

// Generated header snippet
#include <stdint.h>  // int64_t

typedef struct {
    const char* host;
    int64_t port;
    const char* password;
} Database_t;

typedef struct {
    const char* service_name;
    Database_t db;
} Config_t;

This provides three immediate benefits:

Compile-time Safety: If you typo cfg.db.prt, the compiler refuses to build the app.
Zero Overhead: Accessing a struct member is a single memory offset. There is no hash map lookup or string comparison at runtime.
IDE Autocomplete: Your editor knows exactly what fields exist and what their types are.
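The "single memory offset" claim can be checked with standard C. The sketch below mirrors the generated structs shown above (they are copied here only so the example compiles on its own) and uses `offsetof` to show that the position of `cfg.db.port` is a constant fixed at compile time. The helpers `port_offset` and `read_port` are illustrative names, not part of cfgsafe.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the generated structs from the snippet above. */
typedef struct {
    const char* host;
    int64_t port;
    const char* password;
} Database_t;

typedef struct {
    const char* service_name;
    Database_t db;
} Config_t;

/* The byte offset of cfg.db.port is a compile-time constant:
   the offset of `db` inside Config_t plus the offset of `port` inside Database_t. */
static size_t port_offset(void) {
    return offsetof(Config_t, db) + offsetof(Database_t, port);
}

/* Reading the field is a load from (base address + that constant).
   Writing cfg->db.prt here would be rejected by the compiler. */
static int64_t read_port(const Config_t *cfg) {
    return cfg->db.port;
}
```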

3. The Memory Model: Solving “Death by a Thousand free()”

Managing string lifetimes in C is notoriously difficult. When you parse a JSON file, you often end up with dozens of tiny allocations scattered across the heap. Freeing them correctly—especially in nested structures—is a common source of memory leaks.

cfgsafe uses an internal memory pool. During the Config_load phase, all strings, arrays, and nested objects are allocated into a single, contiguous block of memory.

#define CONFIG_IMPLEMENTATION
#include <stdio.h>   // fprintf, stderr
#include "config.h"

int main(int argc, char** argv) {
  Config_t cfg;
  cfg_error_t err;

  // One call to load, validate, and resolve CLI/Env/INI
  if (Config_load(&cfg, "config.ini", argc, (const char**)argv, &err) != CFG_SUCCESS) {
    fprintf(stderr, "Config Error: %s at %s\n", err.message, err.field);
    return 1;
  }

  // ... use cfg natively ...

  // One call to free every single allocation associated with the config
  Config_free(&cfg);
  return 0;
}

By centralizing the memory management, we ensure that “half-allocated” configs are impossible. If validation fails halfway through, the library cleans up the pool automatically before returning an error.
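The article doesn't show the pool's internals, so here is a generic bump-pointer arena that illustrates the idea: one `malloc`, many small carve-outs, one `free`. The type `pool_t` and the functions `pool_init`, `pool_alloc`, `pool_strdup`, and `pool_destroy` are hypothetical names for this sketch, not cfgsafe's API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Minimal bump-pointer pool: illustrative only. */
typedef struct {
    char  *base;
    size_t used;
    size_t cap;
} pool_t;

static int pool_init(pool_t *p, size_t cap) {
    p->base = malloc(cap);
    p->used = 0;
    p->cap  = (p->base != NULL) ? cap : 0;
    return p->base != NULL;
}

static void *pool_alloc(pool_t *p, size_t n) {
    /* Round up to pointer-size alignment; a production pool would
       align to the strictest type it actually stores. */
    size_t align = sizeof(void *);
    size_t start = (p->used + align - 1) & ~(align - 1);
    if (start > p->cap || n > p->cap - start) return NULL;  /* pool exhausted */
    p->used = start + n;
    return p->base + start;
}

static char *pool_strdup(pool_t *p, const char *s) {
    size_t len = strlen(s) + 1;                 /* include the terminator */
    char *dst = pool_alloc(p, len);
    if (dst != NULL) memcpy(dst, s, len);
    return dst;
}

/* Releases every string and object carved from the pool in one call. */
static void pool_destroy(pool_t *p) {
    free(p->base);
    p->base = NULL;
    p->used = p->cap = 0;
}
```

Because nothing inside the pool is freed individually, a failed load only has to call the equivalent of `pool_destroy` once, however far parsing got.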

4. Layered Resolution and Security

Modern applications rarely get their config from just one place. cfgsafe implements a strict, predictable precedence, from highest to lowest:

CLI Arguments (e.g., --db.port 8080)
Environment Variables (e.g., DB_PORT=5432)
INI File (e.g., port = 3000)
Schema Defaults
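A minimal sketch of that precedence for one field follows. It assumes each source reports "no value" as `NULL`. The helpers `cli_lookup` and `resolve` are hypothetical, and `cli_lookup` only handles the space-separated `--flag value` form shown in the list above.

```c
#include <stddef.h>
#include <string.h>

/* Scan argv for "--<flag> <value>"; returns the value or NULL. */
static const char *cli_lookup(int argc, const char **argv, const char *flag) {
    for (int i = 1; i + 1 < argc; i++) {
        if (strncmp(argv[i], "--", 2) == 0 && strcmp(argv[i] + 2, flag) == 0)
            return argv[i + 1];
    }
    return NULL;
}

/* First non-NULL source wins: CLI > environment > INI file > schema default. */
static const char *resolve(const char *cli, const char *env,
                           const char *file, const char *dflt) {
    if (cli  != NULL) return cli;
    if (env  != NULL) return env;
    if (file != NULL) return file;
    return dflt;
}
```

In a loader, a call might look like `resolve(cli_lookup(argc, argv, "db.port"), getenv("DB_PORT"), ini_port, "5432")`, where `ini_port` is a hypothetical value read from the INI file.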

This layering is baked into the generated code. The library also gives the secret property special handling: fields marked as secret are automatically redacted from any auto-generated debug logs or error messages, preventing API keys or passwords from leaking into stdout.
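One plausible way to implement redaction is to have the generator emit per-field metadata and route every debug or error message through a single formatter. The sketch below assumes that design. `field_meta_t` and `format_field` are hypothetical names, and the `[REDACTED]` marker is an arbitrary choice, not cfgsafe's actual output.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-field metadata a generator could emit alongside the struct. */
typedef struct {
    const char *name;
    bool secret;     /* mirrors `secret: true` in the schema */
} field_meta_t;

/* Formats "name = value" for debug output, masking secret fields. */
static int format_field(char *buf, size_t len,
                        const field_meta_t *meta, const char *value) {
    const char *shown = meta->secret ? "[REDACTED]"
                      : (value != NULL ? value : "(unset)");
    return snprintf(buf, len, "%s = %s", meta->name, shown);
}
```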

5. Built-in Validation Primitives

The generator produces specialized validation routines for every field. For example, if you use the exists property on a string, cfgsafe will verify that the path exists on the filesystem during the load phase:

schema Logs {
  path: string { exists: true, default: "/var/log/myapp" }
}

The generated code will perform an access() check (on Unix) or GetFileAttributes (on Windows) before considering the configuration “valid.” This “fail-fast” behavior ensures that your application doesn’t start up only to crash the first time it tries to write to a non-existent log directory.
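A sketch of such an `exists: true` check is below, using the two calls named above: POSIX `access()` with `F_OK`, and `GetFileAttributesA` on Windows. The name `path_exists` is hypothetical, and this is not cfgsafe's actual generated code.

```c
#include <stdbool.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Returns true if something exists at `path` (file or directory). */
static bool path_exists(const char *path) {
    if (path == NULL) return false;
#ifdef _WIN32
    return GetFileAttributesA(path) != INVALID_FILE_ATTRIBUTES;
#else
    return access(path, F_OK) == 0;   /* F_OK: test for existence only */
#endif
}
```

A load-time check catches misconfiguration early, but it cannot guarantee the path still exists later, so code that writes to it should still handle errors.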

6. Comparison: Why AOT Beats Runtime Parsing

Most C developers default to libconfig or json-c. While these libraries are mature, they operate entirely at runtime, which introduces a “safety gap” between your config file and your code.

Feature	libconfig	cfgsafe
Type Safety	Runtime `config_lookup_int`	Compile-time Structs
Lookup Cost	Runtime string-path search	O(1) Memory Offset
Memory Management	Manual / Recursive	Atomic Pool
Validation	Manual (Post-parsing)	Declarative (In-Schema)
Precedence	Manual Merging	Automatic (CLI > Env > File)

With libconfig, you access data like this:

int port;
if (!config_lookup_int(&cfg, "db.port", &port)) {
    // Handle error...
}

If you typo "db.port" as "db.prt", the compiler can’t help you. You won’t know until the program runs and fails. Furthermore, libconfig doesn’t know that port should be between 1 and 65535—you have to write that if (port < 1 || port > 65535) check yourself, every time.

cfgsafe closes this gap by moving the knowledge of your data structure into the compilation phase. By the time your main() receives the Config_t struct, every field has already been validated against your schema’s constraints. If the data was invalid, Config_load would have already returned an error, preventing your app from entering an inconsistent state.

Conclusion: Treating Config as Code

In systems programming, we value predictability and performance. We use compilers to catch our mistakes before they hit production. Why should our configuration be any different?

By moving to a code-generation workflow, we stop treating configuration as a “bag of strings” and start treating it as a first-class citizen of our type system. cfgsafe isn’t just about parsing; it’s about building more resilient systems by ensuring that once your main() function starts executing, your environment is guaranteed to be valid, typed, and secure.