Analysis Architecture
=====================

SqC uses a multi-pass analysis architecture:

::

    Source Files
         |
         v
    [Tree-sitter Parser] --> AST (per-file)
         |
         v
    [Pre-scan Pass]      --> Cross-file context (function defs, summaries, macros,
         |                   struct types, global states, call graph, call-site args)
         v
    [CFG Construction]   --> Per-function control-flow graphs
         |
         v
    [Dataflow Analysis]  --> Null state, value range, reaching defs, init state
         |
         v
    [Rule Evaluation]    --> 285 CERT C rules applied to AST + CFG + context
         |
         v
    [Suppression Filter] --> Hash-based + wildcard (glob/prefix) suppression
         |
         v
    [Export]             --> CSV, XLSX, JSON, SARIF

Analysis Modules
----------------

**Tree-sitter parsing** (``src/analyze/mod.rs``). Fast, incremental, error-tolerant C parsing.
Each ``.c`` file is parsed into an AST; the orchestrator coordinates prescan, CFG construction,
dataflow, and per-rule evaluation with optional Rayon parallelism.

**Cross-file pre-scan** (``src/analyze/prescan.rs``, ``context.rs``). Walks ``-d`` directories
collecting function definitions, header prototypes, function summaries, call graphs, macro
constants/aliases, struct field types, global constants, and global pointer null states. A
second pass aggregates call-site argument null states and propagates transitive frees through
parameter pass-through chains (max 8 iterations). Results are stored in ``ProjectContext`` and
optionally cached to binary (``--save-prescan`` / ``--load-prescan``). Consumed by 15+ rules.

**Function summaries** (``src/analyze/function_summary.rs``).
Lightweight inter-procedural summaries computed during prescan: ``frees_params``,
``can_return_null``, ``returns_allocation``, ``checks_null_params``, ``modifies_params``,
``dereferences_params``, ``never_returns``, ``callsite_param_null_states`` (aggregated from all
call sites), ``callsite_param_field_null_states`` (struct field propagation),
``callsite_param_pointee_null_states`` (pointer-to-pointer propagation), ``return_range``
(inter-procedural value-range analysis), ``param_passthroughs`` (transitive free tracking).
Consumed by 7 rules.

**Control-flow graphs** (``src/analyze/cfg.rs``). Per-function CFG with basic blocks, typed
edges (Fallthrough, TrueBranch, FalseBranch, BackEdge, Return, Break, Continue, Goto), and
``condition_range`` metadata for path-sensitive edge refinement. Optional macro-constant-aware
construction for dead-branch elimination. Consumed by 8 rules (INT30/31/32/33/34-C,
EXP33/34-C, MEM01-C).

**Null state dataflow** (``src/analyze/null_state.rs``). Forward dataflow on the CFG with a
NullState lattice (Unknown → DefinitelyNull / PossiblyNull / NotNull). Edge refinement on
branch conditions supports compound ``||`` / ``&&`` expressions. Seeded from global pointer
states, call-site parameter states, and function summaries. Primary consumer: EXP34-C; also
used by API00-C.

**Value range analysis** (``src/analyze/value_range.rs``). CFG-based forward value-range
dataflow for integer variables. Tracks a ``TypedRange`` (interval + signedness/bit-width) per
variable. Handles sequential assignments, conditional narrowing, loop bounds, and early-return
guards. Uses inter-procedural return ranges from function summaries. Consumed by
INT30/31/32/33/34-C.

**Constant evaluation** (``src/analyze/const_eval.rs``). Syntactic constant folding of
``#define`` macro constants and arithmetic expressions. Includes built-in C99 standard-header
limit macros (LP64 model). ``try_evaluate_range()`` computes value ranges from constants +
variables + loop bounds via ancestor walks.
Consumed by 11 rules (INT, ENV, ERR, FIO, FLP, STR families).

**Reaching definitions** (``src/analyze/dataflow.rs``). Standard iterative worklist algorithm
computing which definitions (Declaration, Assignment, Parameter, NullAssignment, FreeCall,
NullableCall) reach each program point. Supports use-after-free and null dereference queries.
Primary consumer: MEM01-C.

**Initialization state** (``src/analyze/init_state.rs``). Forward dataflow tracking
initialization status with malloc-aware semantics (Uninitialized, MaybeUninitialized,
Initialized, MallocUninitialized, MallocInitialized). Detects partial-init patterns in loops.
Primary consumer: EXP33-C.

**Standard function database** (``src/utility/cert_c/std_functions.rs``). ~370 C11, POSIX, and
Windows API functions recognized to suppress false positives on standard library calls
(DCL31-C, DCL07-C).

**Suppression system** (``src/analyze/suppression.rs``). Inline ``// SQC-SUPPRESS`` comments
and ``.sqc-suppress.toml`` files. SHA-256 hash-based point suppressions and glob/prefix
wildcard suppressions.

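
The NullState lattice described under **Null state dataflow** can be illustrated with a
minimal sketch. This is not SqC's code. The join rule (``Unknown`` as the "no information yet"
element, ``PossiblyNull`` as the result of disagreeing paths), the two-variant ``Edge`` enum,
and the helpers ``refine_on_truthiness`` and ``merge_paths`` are assumptions made for
illustration; only the state names and the TrueBranch/FalseBranch edge names come from the
text above.

.. code-block:: rust

   /// Hypothetical mirror of the lattice named in the text; the real
   /// definition in null_state.rs may differ.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub enum NullState {
       Unknown,
       DefinitelyNull,
       PossiblyNull,
       NotNull,
   }

   impl NullState {
       /// Join two states at a CFG merge point.
       /// Assumed ordering: Unknown is bottom, PossiblyNull is top.
       pub fn join(self, other: NullState) -> NullState {
           use NullState::*;
           match (self, other) {
               // No information on one side: keep the other side.
               (Unknown, s) | (s, Unknown) => s,
               // Both paths agree.
               (a, b) if a == b => a,
               // Paths disagree, or one is already PossiblyNull.
               _ => PossiblyNull,
           }
       }
   }

   /// Outgoing edge of a test such as `if (p)` (subset of the CFG edge kinds).
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub enum Edge {
       TrueBranch,
       FalseBranch,
   }

   /// Refine the state of `p` along one edge of the test `if (p)`.
   pub fn refine_on_truthiness(edge: Edge) -> NullState {
       match edge {
           Edge::TrueBranch => NullState::NotNull,
           Edge::FalseBranch => NullState::DefinitelyNull,
       }
   }

   /// Fold the states arriving from several predecessor blocks.
   pub fn merge_paths(states: &[NullState]) -> NullState {
       states.iter().copied().fold(NullState::Unknown, NullState::join)
   }

Under these assumptions, a pointer that is ``NotNull`` on one path and ``DefinitelyNull`` on
another merges to ``PossiblyNull``, while a block with a single ``NotNull`` predecessor keeps
``NotNull``.
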

Current Capabilities
--------------------

==================================== =====================================================
Capability                           Implementation
==================================== =====================================================
Local variable/type inference        Per-function ``collect_variable_types``
Preprocessor block traversal         ``preproc_*`` node recursion
Standard function database           ~370 C11/POSIX/Windows functions
Cross-file function scanning         ``-d`` flag pre-scan with binary cache
CFG construction                     Per-function with ``condition_range`` metadata
Reaching definitions                 Iterative worklist dataflow (MEM01-C)
Inter-procedural summaries           Null returns, freed params, no-return, return ranges,
                                     dereferences, pass-throughs
CFG-based null state dataflow        Forward dataflow with NullState lattice, compound
                                     condition support, global/call-site seeding
Value range analysis                 CFG-based forward dataflow, inter-procedural return
                                     ranges, type-aware intervals
Initialization state analysis        Forward dataflow with malloc-aware semantics
Constant evaluation                  Macro resolution, built-in limits, sizeof types
Call-site null propagation           Aggregated argument states across all callers
Transitive free propagation          Parameter pass-through chains (MEM31-C)
Global pointer null state            Cross-file extern pointer tracking (EXP34-C)
Struct field type resolution         Prescan-collected struct definitions
Taint tracking                       Intra-function (FIO30-C, STR02-C)
Dead-branch elimination              Macro-constant-aware CFG construction
==================================== =====================================================

Known Limitations
-----------------

============================== ====================================================
Gap                            Impact
============================== ====================================================
No preprocessor expansion      Macros appear as function calls; partially mitigated
                               by ``collect_macro_aliases``
No alias analysis              Pointer aliasing unresolved; field-scoped alias
                               collection causes cross-function issues
No symbolic execution          Complex path conditions not evaluated
No SSA form                    No use-def chains beyond reaching definitions
VRA intra-procedural only      Inter-procedural argument ranges and field-sensitive
                               VRA not implemented; return ranges available
Limited taint tracking         Intra-function only (STR02-C, FIO30-C);
                               cross-function taint for injection CWEs planned
Struct field tracking limited  Prescan-visible structs only (INT32-C/INT30-C);
                               no field-level free or null tracking
No ownership model             Cross-function memory ownership untracked;
                               limits MEM31-C/MEM30-C precision
============================== ====================================================

Architectural Ceiling
---------------------

Current TP rate: **67.5%** (Juliet, v0.3.119, 74 CWEs). The remaining gaps are concentrated in
CWEs requiring deeper analysis:

- **CWE-190/191** (integer overflow/underflow): 60.9%/55.3% vs clang-tidy 94%. Requires more
  complete value-range propagation and bounds-check recognition.
- **CWE-369** (divide by zero): 56.0% vs clang-tidy 94.7%. Requires stronger zero-value
  tracking through assignments.
- **CWE-476** (null dereference): 61.9% vs clang-tidy 94.3%. Requires deeper inter-procedural
  null propagation and alias analysis.
- **CWE-121** (stack buffer overflow): 57.5% vs clang-tidy 86.6%. Requires symbolic buffer
  size tracking across assignments.

Alias analysis and field-sensitive value tracking are the two capabilities most likely to lift
the ceiling. Each would require significant architectural investment but could push the TP
rate toward 75% or higher.

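
The CWE-190/191 item names value-range propagation and bounds-check recognition. The sketch
below shows what those two steps amount to on a single addition. It is an illustration only:
the field layout of ``TypedRange`` and the methods ``of_type``, ``narrow_upper``, and
``add_may_overflow`` are hypothetical and are not taken from ``value_range.rs``. Bit widths up
to 64 are assumed so that the ``i128`` arithmetic cannot itself overflow.

.. code-block:: rust

   /// Hypothetical stand-in for the TypedRange named earlier: a closed
   /// interval plus the signedness and bit width of the C type.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub struct TypedRange {
       pub lo: i128,
       pub hi: i128,
       pub signed: bool,
       pub bits: u32,
   }

   impl TypedRange {
       /// Full range of a C integer type (assumes 1 <= bits <= 64).
       pub fn of_type(signed: bool, bits: u32) -> Self {
           let (lo, hi) = if signed {
               (-(1i128 << (bits - 1)), (1i128 << (bits - 1)) - 1)
           } else {
               (0, (1i128 << bits) - 1)
           };
           TypedRange { lo, hi, signed, bits }
       }

       /// Bounds-check recognition: narrow after a guard like `if (x <= bound)`.
       pub fn narrow_upper(self, bound: i128) -> Self {
           TypedRange { hi: self.hi.min(bound), ..self }
       }

       /// True when `self + other` can leave the representable range of the type.
       pub fn add_may_overflow(self, other: TypedRange) -> bool {
           let ty = TypedRange::of_type(self.signed, self.bits);
           self.lo + other.lo < ty.lo || self.hi + other.hi > ty.hi
       }
   }

With these definitions, an unconstrained 32-bit signed value plus one may overflow, whereas the
same value narrowed by a guard to at most 1000 cannot. A checker that misses the guard reports
a false positive, and one that loses the range entirely has nothing to report.
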

Competitor Landscape
--------------------

5-tool comparison on 15 overlapping Juliet CWEs (28,488 files):

============== ============== ========= ==================================== ===========
Tool           Detection Rate FP Rate   Analysis Depth                       Price
============== ============== ========= ==================================== ===========
clang-tidy     91.6%          0.8%      AST + path-sensitive                 Free
**SqC**        67.5%          32.5%     AST + CFG + inter-procedural         --
Frama-C        61.0%          39.0%     Abstract interpretation              Free
Infer          43.6%          56.4%     Separation logic                     Free
cppcheck       36.4%          63.6%     Data-flow                            Free
============== ============== ========= ==================================== ===========

*SqC results from v0.3.119 Juliet benchmark (74 CWEs). Competitor figures from prior study on
15 overlapping CWEs.*

SqC achieves 100% precision (zero FP) on 34 CWEs including CWE-690, CWE-761, CWE-78, and
CWE-416. Broadest CWE coverage (74+ CWEs benchmarked vs clang-tidy's 15).

**Key context**: Tools on average find ~20% of weaknesses in Juliet (ISSTA2022). Even
commercial tools miss 27% (Goseva2015). Industry FP target for adoption is 10--20%. See
:doc:`bibliography` for full references.