Parsing Algorithm
CCL Parsing Algorithm
Section titled “CCL Parsing Algorithm”CCL is parsed through recursive descent to a fixed point. The algorithm is simple:
- Parse text into key-value entries
- Build nested objects from indented entries
- Recursively parse values that contain more CCL
- Stop when no more CCL syntax remains (fixed point)
Two parser architectures realize this recipe — a greedy recursive-descent (“pacman”) parser and a line-oriented indent-stack parser. Both produce identical trees from the same input; the choice is about implementation size, streaming behavior, and worst-case complexity. See Choosing a strategy below for a comparison.
Rules every parser must implement
Section titled “Rules every parser must implement”The following rules are strategy-agnostic — every CCL parser must honor them regardless of which architecture it adopts.
Input Format
Section titled “Input Format”CCL consists of key-value pairs separated by =:
key = valueanother = value with spacesnested = child = nested value sibling = another nestedParse Entries
Section titled “Parse Entries”Find the = delimiter and split into key and value. The strategy used depends on the delimiter_first_equals or delimiter_prefer_spaced behavior (see Behavior Reference):
"key = value" → Entry {key: "key", value: "value"}"a = b = c" → Entry {key: "a", value: "b = c"} (delimiter_first_equals)"a = b = c" → Entry {key: "a=b", value: "c"} (delimiter_prefer_spaced, spaced = preferred)Key whitespace rules:
- Trim all whitespace from keys (including newlines):
" key "→"key" - Keys can span multiple lines — see Multi-Line Keys below
Value whitespace rules:
- Trim leading whitespace on first line:
key = value→"value" - Trim trailing whitespace on final line
- Preserve internal structure (newlines + indentation for continuation lines)
Indentation tracking:
- Determine the baseline indentation (N) for the current parsing context
- For each subsequent line, compare its indentation to N:
indent > N→ continuation line (part of value)indent ≤ N→ new entry starts
- Which characters count as whitespace depends on parser behavior:
tabs_as_whitespace: spaces and tabs are whitespacetabs_as_content: only spaces are whitespace; tabs are content
Special keys:
- Empty key
= value→ list item - Comment entry
/ = text→ key is/, value istext
Multi-Line Keys
Section titled “Multi-Line Keys”When a line has no = delimiter, it is the beginning of a multi-line key. The OCaml reference implementation’s parser reads many (not_char '=') which naturally consumes across line boundaries until = is found — multi-line keys are a natural consequence of the parsing algorithm.
The rule is:
- Buffer the line’s content (trimmed).
- Continue reading until a line containing
=is found. - The buffered text plus any text before
=on the final line forms the key (with all whitespace including newlines collapsed and trimmed).
Examples:
key= valLine 1 has no = → buffer "key". Line 2 has = with nothing before it → key is "key", value is "val".
long keyname = AliceLines are consumed until = is found. The text before = spans both lines → key is "long key name", value is "Alice".
Implementation approaches:
- Parser combinator (as in the OCaml reference): Reading
many (not_char '=')naturally consumes across line boundaries until=is found, making multi-line keys implicit. - Explicit buffering: Buffer lines without
=and consume the buffer when a line containing=is found, prepending the buffer to the key.
Build Hierarchy
Section titled “Build Hierarchy”build_hierarchy always returns a map (object/dict). Multiple entries with the same key (including empty key "" for list items) accumulate into a list stored under that key. See AI Implementation Guide: build_hierarchy for the full algorithm and type details, and Bare List Hierarchy Representation for the canonical output shape when entries have empty keys.
Indentation determines structure. Example:
parent = child = nested sibling = anotherThe parser records parent’s indentation level (0). Lines child and sibling have greater indentation (2), so they become part of parent’s value:
Entry {key: "parent", value: "child = nested\nsibling = another"}Recursive Parsing (Fixed Point)
Section titled “Recursive Parsing (Fixed Point)”Parse values that contain CCL syntax:
value: "child = nested\nsibling = another"→ Contains '=' → Parse recursively→ Results in: {child: "nested", sibling: "another"}Fixed-point termination:
- Parse value as CCL
- If parsing yields structure → recurse on nested values
- If value has no ’=’ → stop (fixed point reached)
- Prevents infinite recursion: plain strings have no structure to parse
Complete Example
Section titled “Complete Example”Input:
database = host = localhost port = 5432
users = = alice = bobParse entries:
Entry {key: "database", value: "host = localhost\nport = 5432"}Entry {key: "users", value: "= alice\n= bob"}Recursive parsing:
database.value contains '=' → parse recursively: Entry {key: "host", value: "localhost"} Entry {key: "port", value: "5432"}
users.value contains '=' → parse recursively: Entry {key: "", value: "alice"} Entry {key: "", value: "bob"}Build objects:
{ "database": { "host": "localhost", "port": "5432" }, "users": ["alice", "bob"]}Fixed point reached: “localhost”, “5432”, “alice”, “bob” contain no ’=’ → stop.
Choosing a strategy
Section titled “Choosing a strategy”| Criterion | Pacman (OCaml) | Indent-stack (TS, Gleam) |
|---|---|---|
| Time | O(N·D) typical; O(N²) worst | O(N) |
| Memory | O(N) (holds value slices recursively) | O(D) baseline state + O(N) flat entries |
| Streaming input | Not streaming-friendly | Natural (tokenize line-by-line) |
| Implementation size | Very compact (~91 lines with combinators) | Medium (hundreds of lines) |
| Self-recursive grammar | Yes (parser is its own sub-parser) | No (two phases) |
| Multi-line keys | Natural fallout of many (not_char '=') | Feasible (Gleam does it with explicit buffering) |
| Good fit for | Combinator-heavy languages, compact implementations | Large/streaming inputs, predictable worst case |
Pick pacman when implementation size wins and you have good combinator support. Pick indent-stack when streaming or worst-case guarantees matter, or when your language’s ecosystem leans imperative or state-machine-friendly. Both satisfy the rules in Rules every parser must implement; validate conformance against ccl-test-data.
Error Handling Essentials
Section titled “Error Handling Essentials”Malformed input:
- Line with no ’=’ → part of a multi-line key; buffer and continue reading until
=is found - Inconsistent indentation → use explicit indentation counting
- Empty lines → ignore
Edge cases:
- Keys with ’=’ in them → depends on delimiter strategy. With
delimiter_first_equals(default), the first ’=’ is always the split point so keys cannot contain ’=’. Withdelimiter_prefer_spaced,=is preferred, allowing ’=’ in keys. See Behavior Reference - Values with ’=’ → fine, parse recursively
- Unicode in keys/values → valid, CCL is UTF-8
- CRLF vs LF → CCL treats only LF as a newline, so CRs present are preserved as-is
Why This Is Core CCL
Section titled “Why This Is Core CCL”From the blog post:
“The simplest possible config language is just key-value pairs. That’s it.”
The recursive fixed-point algorithm is how those key-value pairs create nested structure. The OCaml type definition makes this explicit:
type t = Fix of t Map.Make(String).tThis says: “A CCL value is a fixed point of a map from strings to CCL values.”
The recursion is the structure. The fixed point is the termination.
What’s NOT in This Algorithm
Section titled “What’s NOT in This Algorithm”These are library conveniences, not core CCL:
- Type conversion: “5432” → integer (user’s job)
- Validation: checking required keys (user’s job)
- Dotted key expansion:
foo.bar→ nested object (optional) - Pretty printing: formatting output (implementation detail)
Core CCL is: parse key-value pairs recursively until fixed point.