Analysis Architecture
SqC uses a multi-pass analysis architecture:
Source Files
|
v
[Tree-sitter Parser] --> AST (per-file)
|
v
[Pre-scan Pass] --> Cross-file context (function defs, summaries, macros, struct types, global states, call graph, call-site args)
|
v
[CFG Construction] --> Per-function control-flow graphs
|
v
[Dataflow Analysis] --> Null state, value range, reaching defs, init state
|
v
[Rule Evaluation] --> 285 CERT C rules applied to AST + CFG + context
|
v
[Suppression Filter] --> Hash-based + wildcard (glob/prefix) suppression
|
v
[Export] --> CSV, XLSX, JSON, SARIF
Analysis Modules
- Tree-sitter parsing (`src/analyze/mod.rs`). Fast, incremental, error-tolerant C parsing. Each `.c` file is parsed into an AST; the orchestrator coordinates prescan, CFG construction, dataflow, and per-rule evaluation with optional Rayon parallelism.
- Cross-file pre-scan (`src/analyze/prescan.rs`, `context.rs`). Walks `-d` directories collecting function definitions, header prototypes, function summaries, call graphs, macro constants/aliases, struct field types, global constants, and global pointer null states. Second pass aggregates call-site argument null states and propagates transitive frees through parameter pass-through chains (max 8 iterations). Results stored in `ProjectContext`, optionally cached to binary (`--save-prescan`/`--load-prescan`). Consumed by 15+ rules.
- Function summaries (`src/analyze/function_summary.rs`). Lightweight inter-procedural summaries computed during prescan: `frees_params`, `can_return_null`, `returns_allocation`, `checks_null_params`, `modifies_params`, `dereferences_params`, `never_returns`, `callsite_param_null_states` (aggregated from all call sites), `callsite_param_field_null_states` (struct field propagation), `callsite_param_pointee_null_states` (pointer-to-pointer propagation), `return_range` (VRA inter-procedural), `param_passthroughs` (transitive free tracking). Consumed by 7 rules.
- Control-flow graphs (`src/analyze/cfg.rs`). Per-function CFG with basic blocks, typed edges (Fallthrough, TrueBranch, FalseBranch, BackEdge, Return, Break, Continue, Goto), and `condition_range` metadata for path-sensitive edge refinement. Optional macro-constant-aware construction for dead-branch elimination. Consumed by 8 rules (INT30/31/32/33/34-C, EXP33/34-C, MEM01-C).
- Null state dataflow (`src/analyze/null_state.rs`). Forward dataflow on CFG with NullState lattice (Unknown → DefinitelyNull / PossiblyNull / NotNull). Edge refinement on branch conditions supports compound `||`/`&&` expressions. Seeded from global pointer states, call-site parameter states, and function summaries. Primary consumer: EXP34-C; also used by API00-C.
- Value range analysis (`src/analyze/value_range.rs`). CFG-based forward value-range dataflow for integer variables. Tracks `TypedRange` (interval + signedness/bit-width) per variable. Handles sequential assignments, conditional narrowing, loop bounds, and early-return guards. Inter-procedural return ranges from function summaries. Consumed by INT30/31/32/33/34-C.
- Constant evaluation (`src/analyze/const_eval.rs`). Syntactic constant folding of `#define` macro constants and arithmetic expressions. Includes built-in C99 `<limits.h>`/`<stdint.h>` macros (LP64 model). `try_evaluate_range()` computes value ranges from constants + variables + loop bounds via ancestor walks. Consumed by 11 rules (INT, ENV, ERR, FIO, FLP, STR families).
- Reaching definitions (`src/analyze/dataflow.rs`). Standard iterative worklist algorithm computing which definitions (Declaration, Assignment, Parameter, NullAssignment, FreeCall, NullableCall) reach each program point. Supports use-after-free and null dereference queries. Primary consumer: MEM01-C.
- Initialization state (`src/analyze/init_state.rs`). Forward dataflow tracking initialization status with malloc-aware semantics (Uninitialized, MaybeUninitialized, Initialized, MallocUninitialized, MallocInitialized). Detects partial-init patterns in loops. Primary consumer: EXP33-C.
- Standard function database (`src/utility/cert_c/std_functions.rs`). ~370 C11, POSIX, and Windows API functions recognized to suppress false positives on standard library calls (DCL31-C, DCL07-C).
- Suppression system (`src/analyze/suppression.rs`). Inline `// SQC-SUPPRESS` comments and `.sqc-suppress.toml` files. SHA-256 hash-based point suppressions and glob/prefix wildcard suppressions.
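
As a rough illustration of the null state lattice described above, the Rust sketch below shows how states might be joined where control-flow paths merge and how a branch on `p != NULL` could refine them. The enum layout, the choice of `PossiblyNull` as the result of joining conflicting states, and the function names `join` and `refine_on_not_null_test` are assumptions made for this example; they are not taken from `src/analyze/null_state.rs`.

```rust
/// Hypothetical four-point lattice mirroring the state names listed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NullState {
    /// No information yet (assumed to act as the bottom element).
    Unknown,
    DefinitelyNull,
    NotNull,
    /// Assumed top element: null on some paths, non-null on others.
    PossiblyNull,
}

/// Join two states at a control-flow merge point.
pub fn join(a: NullState, b: NullState) -> NullState {
    use NullState::*;
    match (a, b) {
        // Unknown carries no information, so the other side wins.
        (Unknown, s) | (s, Unknown) => s,
        // Identical facts on both paths are preserved.
        (x, y) if x == y => x,
        // Conflicting facts collapse to PossiblyNull.
        _ => PossiblyNull,
    }
}

/// State of `p` on the edge leaving a `p != NULL` test.
/// `true_branch` selects the TrueBranch edge; otherwise the FalseBranch edge.
pub fn refine_on_not_null_test(true_branch: bool) -> NullState {
    if true_branch {
        NullState::NotNull
    } else {
        NullState::DefinitelyNull
    }
}
```

Under these assumptions, a pointer that is `NotNull` on one incoming path and `DefinitelyNull` on the other becomes `PossiblyNull` after the merge, which is the kind of state a rule such as EXP34-C would want to flag at a dereference.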
Current Capabilities
| Capability | Implementation |
|---|---|
| Local variable/type inference | Per-function |
| Preprocessor block traversal | |
| Standard function database | ~370 C11/POSIX/Windows functions |
| Cross-file function scanning | |
| CFG construction | Per-function with |
| Reaching definitions | Iterative worklist dataflow (MEM01-C) |
| Inter-procedural summaries | Null returns, freed params, no-return, return ranges, dereferences, pass-throughs |
| CFG-based null state dataflow | Forward dataflow with NullState lattice, compound condition support, global/call-site seeding |
| Value range analysis | CFG-based forward dataflow, inter-procedural return ranges, type-aware intervals |
| Initialization state analysis | Forward dataflow with malloc-aware semantics |
| Constant evaluation | Macro resolution, built-in limits, sizeof types |
| Call-site null propagation | Aggregated argument states across all callers |
| Transitive free propagation | Parameter pass-through chains (MEM31-C) |
| Global pointer null state | Cross-file extern pointer tracking (EXP34-C) |
| Struct field type resolution | Prescan-collected struct definitions |
| Taint tracking | Intra-function (FIO30-C, STR02-C) |
| Dead-branch elimination | Macro-constant-aware CFG construction |
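
The "iterative worklist dataflow" used for reaching definitions can be sketched in a few lines of Rust. This is a textbook formulation under simplifying assumptions, not the code in `src/analyze/dataflow.rs`: the `Block` type and `reaching_definitions` function are hypothetical, and each definition is reduced to a `(variable, id)` pair rather than the richer kinds (Declaration, Assignment, FreeCall, and so on) that SqC tracks.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical basic block: the definitions it makes, in order,
/// and the indices of its successor blocks.
pub struct Block {
    pub defs: Vec<(&'static str, u32)>,
    pub succs: Vec<usize>,
}

/// For each block, compute the set of definitions reaching its entry.
pub fn reaching_definitions(blocks: &[Block]) -> Vec<BTreeSet<(&'static str, u32)>> {
    let n = blocks.len();
    let mut in_sets = vec![BTreeSet::new(); n];
    // Start with every block queued so each is processed at least once.
    let mut worklist: VecDeque<usize> = (0..n).collect();

    while let Some(b) = worklist.pop_front() {
        // OUT[b] = GEN[b] ∪ (IN[b] − KILL[b])
        let mut out = in_sets[b].clone();
        for &(var, id) in &blocks[b].defs {
            // A new definition kills earlier definitions of the same variable.
            out.retain(|&(v, _)| v != var);
            out.insert((var, id));
        }
        for &s in &blocks[b].succs {
            let before = in_sets[s].len();
            in_sets[s].extend(out.iter().copied());
            if in_sets[s].len() != before {
                // IN[s] grew, so s must be revisited.
                worklist.push_back(s);
            }
        }
    }
    in_sets
}
```

In a diamond-shaped CFG where one arm reassigns `x` and the other does not, both definitions of `x` reach the merge block. A query such as "can a freed definition reach this use?" then reduces to a set lookup.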
Known Limitations
| Gap | Impact |
|---|---|
| No preprocessor expansion | Macros appear as function calls; partially mitigated by |
| No alias analysis | Pointer aliasing unresolved; field-scoped alias collection causes cross-function issues |
| No symbolic execution | Complex path conditions not evaluated |
| No SSA form | No use-def chains beyond reaching definitions |
| VRA intra-procedural only | Inter-procedural argument ranges and field-sensitive VRA not implemented; return ranges available |
| Limited taint tracking | Intra-function only (STR02-C, FIO30-C); cross-function taint for injection CWEs planned |
| Struct field tracking limited | Prescan-visible structs only (INT32-C/INT30-C); no field-level free or null tracking |
| No ownership model | Cross-function memory ownership untracked; limits MEM31-C/MEM30-C precision |
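
In place of a full ownership model, the prescan propagates frees through parameter pass-through chains with a bounded number of iterations. The Rust sketch below illustrates that idea under stated assumptions: the `Summary` struct is a hypothetical, stripped-down stand-in that borrows the field names `frees_params` and `param_passthroughs` from the function summaries, and `propagate_frees` is an invented name. The real data layout and algorithm may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified function summary with only the fields needed here.
#[derive(Default, Debug)]
pub struct Summary {
    /// Indices of parameters this function frees, directly or transitively.
    pub frees_params: BTreeSet<usize>,
    /// (own parameter index, callee name, callee parameter index).
    pub param_passthroughs: Vec<(usize, String, usize)>,
}

/// Iteration cap, matching the "max 8 iterations" stated for the prescan.
const MAX_ITERATIONS: usize = 8;

/// Propagate `frees_params` backwards along pass-through edges until nothing
/// changes or the cap is hit. Returns the number of passes performed.
pub fn propagate_frees(summaries: &mut BTreeMap<String, Summary>) -> usize {
    for pass in 1..=MAX_ITERATIONS {
        // Collect new facts first so the map is not mutated while borrowed.
        let mut new_facts: Vec<(String, usize)> = Vec::new();
        for (name, summary) in summaries.iter() {
            for (own, callee, callee_param) in &summary.param_passthroughs {
                let callee_frees = summaries
                    .get(callee)
                    .map_or(false, |c| c.frees_params.contains(callee_param));
                if callee_frees && !summary.frees_params.contains(own) {
                    new_facts.push((name.clone(), *own));
                }
            }
        }
        if new_facts.is_empty() {
            return pass; // fixed point reached
        }
        for (name, param) in new_facts {
            if let Some(s) = summaries.get_mut(&name) {
                s.frees_params.insert(param);
            }
        }
    }
    MAX_ITERATIONS
}
```

With a chain such as `outer -> wrapper -> release`, each pass lifts the "frees parameter" fact one level up the chain. The cap bounds the work, at the cost of missing frees buried deeper than the iteration limit allows.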
Architectural Ceiling
Current TP rate: 67.5% (Juliet, v0.3.119, 74 CWEs). The remaining gaps are concentrated in CWEs requiring deeper analysis:
- CWE-190/191 (integer overflow/underflow): 60.9%/55.3% vs clang-tidy 94%. Requires more complete value-range propagation and bounds-check recognition.
- CWE-369 (divide by zero): 56.0% vs clang-tidy 94.7%. Requires stronger zero-value tracking through assignments.
- CWE-476 (null dereference): 61.9% vs clang-tidy 94.3%. Requires deeper inter-procedural null propagation and alias analysis.
- CWE-121 (stack buffer overflow): 57.5% vs clang-tidy 86.6%. Requires symbolic buffer size tracking across assignments.
Alias analysis and field-sensitive value tracking are the two capabilities most likely to lift the ceiling. Each would require significant architectural investment but could push the TP rate toward 75% or higher.
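
To make the zero-value tracking gap concrete, the Rust sketch below shows how an interval can prove a divisor non-zero after a guard, and how that proof is lost when paths merge. It is a minimal model under assumptions: `Range`, `may_be_zero`, `narrow_gt`, and `join` are hypothetical names, and the sketch omits the signedness and bit-width that the real `TypedRange` carries.

```rust
/// Hypothetical closed integer interval [lo, hi].
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Range {
    pub lo: i64,
    pub hi: i64,
}

impl Range {
    /// True if zero lies inside the interval, i.e. a division by a value
    /// in this range cannot be proven safe.
    pub fn may_be_zero(self) -> bool {
        self.lo <= 0 && 0 <= self.hi
    }

    /// Narrow along the true edge of `x > bound`.
    /// Returns `None` when no value in the range satisfies the condition.
    pub fn narrow_gt(self, bound: i64) -> Option<Range> {
        // `checked_add` handles `bound == i64::MAX`, where nothing is greater.
        let lo = self.lo.max(bound.checked_add(1)?);
        if lo <= self.hi {
            Some(Range { lo, hi: self.hi })
        } else {
            None
        }
    }

    /// Join at a merge point: the smallest interval covering both inputs.
    pub fn join(self, other: Range) -> Range {
        Range {
            lo: self.lo.min(other.lo),
            hi: self.hi.max(other.hi),
        }
    }
}
```

For example, narrowing `[-5, 10]` on `x > 0` yields `[1, 10]`, which excludes zero. Joining that with `[-2, -1]` from another path gives `[-2, 10]`, which includes zero again even though neither input did. Plain intervals lose this kind of fact, so stronger tracking would likely need a richer domain or path sensitivity.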
Competitor Landscape
5-tool comparison on 15 overlapping Juliet CWEs (28,488 files):
| Tool | Detection Rate | FP Rate | Analysis Depth | Price |
|---|---|---|---|---|
| clang-tidy | 91.6% | 0.8% | AST + path-sensitive | Free |
| SqC | 67.5% | 32.5% | AST + CFG + inter-procedural | – |
| Frama-C | 61.0% | 39.0% | Abstract interpretation | Free |
| Infer | 43.6% | 56.4% | Separation logic | Free |
| cppcheck | 36.4% | 63.6% | Data-flow | Free |
SqC results from v0.3.119 Juliet benchmark (74 CWEs). Competitor figures from prior study on 15 overlapping CWEs.
SqC achieves 100% precision (zero FP) on 34 CWEs, including CWE-690, CWE-761, CWE-78, and CWE-416. It also has the broadest CWE coverage (74+ CWEs benchmarked vs clang-tidy’s 15).
Key context: On average, tools find ~20% of weaknesses in Juliet (ISSTA2022). Even commercial tools miss 27% (Goseva2015). The industry FP target for adoption is 10–20%. See Bibliography for full references.