Architecture and Internals
LicenseGuard is organized into focused modules.
Module map
- licenseguard.resolver - requirements parsing and installed dependency graph walk
- licenseguard.license_detection - extract and normalize license metadata
- licenseguard.license_tokens - OR/AND tokenization and drift comparisons
- licenseguard.policy - built-in and file-backed classification rules
- licenseguard.scan - orchestrates end-to-end scan and report assembly
- licenseguard.pypi - optional latest-release metadata retrieval + cache
- licenseguard.webapp - FastAPI app and embedded dashboard
- licenseguard.cli - command entrypoint and UX
Design principles
- Prefer installed-runtime truth over declared intent
- Keep warning behavior explicit and non-blocking where possible
- Separate parsing, resolution, classification, and presentation concerns
- Keep the frontend framework-free for operational simplicity
Resolution model
Resolver behavior is installed-only:
- Parse direct roots from requirements.txt
- Build an installed distribution map from importlib.metadata.distributions()
- BFS through requires_dist, following only dependencies present in the installed map
- Skip missing dependencies silently
This keeps output aligned with the actual runtime environment.
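The installed-only walk can be sketched as a plain breadth-first search over an installed map. This is a minimal sketch under stated assumptions: requirement strings are reduced to their leading project name and normalized per PEP 503, and environment markers and extras are not evaluated (a dependency is followed whenever it happens to be installed). `ResolvedPackage`'s fields, `installed_map`, `requirement_name`, and `resolve` are hypothetical names, not LicenseGuard's actual API.

```python
from __future__ import annotations

import re
from collections import deque
from dataclasses import dataclass
from importlib.metadata import distributions

# Leading project name of a requirement such as "requests[socks]>=2; python_version>'3'".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def canonicalize(name: str) -> str:
    """PEP 503 normalization so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req: str) -> str | None:
    """Return the normalized project name, or None for comments, options, blank lines."""
    match = _NAME_RE.match(req)
    return canonicalize(match.group(1)) if match else None


@dataclass(frozen=True)
class ResolvedPackage:
    # Hypothetical fields; the real record may carry more (license, direct flag, ...).
    name: str
    version: str
    depth: int


def installed_map() -> dict[str, tuple[str, list[str]]]:
    """Map normalized name -> (version, Requires-Dist entries) for the current runtime."""
    result: dict[str, tuple[str, list[str]]] = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:
            result[canonicalize(name)] = (dist.version, list(dist.requires or []))
    return result


def resolve(
    roots: list[str], installed: dict[str, tuple[str, list[str]]]
) -> list[ResolvedPackage]:
    """Breadth-first walk from direct roots, following only installed dependencies."""
    seen: dict[str, ResolvedPackage] = {}
    queue: deque[str] = deque()
    for root in roots:
        name = requirement_name(root)
        if name and name in installed and name not in seen:
            seen[name] = ResolvedPackage(name, installed[name][0], 0)
            queue.append(name)
    while queue:
        current = queue.popleft()
        for req in installed[current][1]:
            dep = requirement_name(req)
            # Installed-only: missing or already-visited dependencies are skipped silently.
            if dep is None or dep not in installed or dep in seen:
                continue
            seen[dep] = ResolvedPackage(dep, installed[dep][0], seen[current].depth + 1)
            queue.append(dep)
    return list(seen.values())
```

Against a live environment this would be called as `resolve(lines, installed_map())`, where `lines` are the raw lines of requirements.txt; comment and option lines such as `-r base.txt` yield no name and are ignored. Passing a hand-built map instead keeps the walk testable.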
Why installed-only matters
Installed-only scanning avoids false positives from:
- stale or partially applied requirements.txt files
- optional declarations not present in the runtime
- unresolved transitive declarations from uninstalled packages
Scan pipeline
- Load roots and resolve packages
- Build row data from installed distributions
- Classify status and reason via policy logic
- Append unpinned direct-dependency warnings to the row's reason field
- Optionally enrich rows with PyPI drift fields
- Build summary and return report
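The steps above can be sketched as a single function that turns resolved packages into rows plus a summary. This is a minimal sketch under stated assumptions: the toy `POLICY` allow/deny sets, the `status` values, the rule that only an exact `==` specifier counts as pinned, and the `enrich` hook standing in for PyPI drift enrichment are all hypothetical, not LicenseGuard's actual policy or row schema.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Optional

# Hypothetical built-in policy; real rules may also come from a policy file.
POLICY = {
    "allow": {"MIT", "BSD-3-Clause", "Apache-2.0"},
    "deny": {"AGPL-3.0-only"},
}


def classify(license_id: str) -> tuple[str, str]:
    """Return (status, reason) for a normalized license identifier."""
    if license_id in POLICY["deny"]:
        return "denied", f"{license_id} is on the deny list"
    if license_id in POLICY["allow"]:
        return "allowed", ""
    return "review", f"{license_id} is not covered by policy"


def is_pinned(requirement: str) -> bool:
    """Assumption: only an exact '==' specifier (markers ignored) counts as pinned."""
    return "==" in requirement.split(";", 1)[0]


def build_report(
    packages: list[dict],
    direct_reqs: dict[str, str],
    enrich: Optional[Callable[[dict], dict]] = None,
) -> dict:
    rows = []
    for pkg in packages:
        status, reason = classify(pkg["license"])
        req = direct_reqs.get(pkg["name"])
        if req is not None and not is_pinned(req):
            # The warning is appended to the reason; status is left unchanged.
            note = "unpinned direct dependency"
            reason = f"{reason}; {note}" if reason else note
        row = {**pkg, "status": status, "reason": reason}
        if enrich is not None:
            row.update(enrich(row))  # e.g. PyPI drift fields when drift mode is on
        rows.append(row)
    summary = dict(Counter(row["status"] for row in rows))
    return {"rows": rows, "summary": summary}
```

Appending the unpinned warning to `reason` while leaving `status` untouched mirrors the non-blocking warning principle listed under Design principles.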
Data contracts
- Resolver returns ResolvedPackage records
- Scan converts records into serializable row dictionaries
- Summary is derived from status counts over current row set
- Web and CLI consume the same scan result shape
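One way to keep both front ends on the same contract is to flatten each record into a plain dictionary once and let the CLI and web layers only render it. In this sketch the `ResolvedPackage` fields, `to_row`, `summarize`, `render_cli`, and `render_json` are hypothetical; only the idea of a single serializable result shape with a summary derived from the current rows comes from the text above.

```python
from __future__ import annotations

import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ResolvedPackage:
    # Hypothetical record fields.
    name: str
    version: str
    license: str


def to_row(pkg: ResolvedPackage, status: str, reason: str) -> dict:
    """Flatten a record into a JSON-serializable row dictionary."""
    return {**asdict(pkg), "status": status, "reason": reason}


def summarize(rows: list[dict]) -> dict[str, int]:
    """Recompute status counts from whatever rows are current."""
    return dict(Counter(row["status"] for row in rows))


def render_cli(result: dict) -> str:
    """Plain-text table for the terminal, followed by a summary line."""
    lines = [f"{r['name']:<12} {r['version']:<10} {r['status']}" for r in result["rows"]]
    lines.append(", ".join(f"{k}={v}" for k, v in sorted(result["summary"].items())))
    return "\n".join(lines)


def render_json(result: dict) -> str:
    """Body a web endpoint could return or offer as a download."""
    return json.dumps(result, sort_keys=True)
```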
Web architecture
The web app is a local FastAPI service with in-memory session state:
- Current requirements path
- Current policy config/path
- Drift flags and cache path
- Last scan result for download endpoints
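The session state listed above could live in one mutable object that route handlers read and update. A minimal sketch, kept free of FastAPI so it runs standalone; `SessionState`, its field names, and `download_json` are hypothetical, and JSON is assumed as the download format.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class SessionState:
    # Hypothetical field names mirroring the state list above.
    requirements_path: Path = Path("requirements.txt")
    policy_path: Optional[Path] = None
    drift_enabled: bool = False
    cache_path: Optional[Path] = None
    last_result: Optional[dict[str, Any]] = None

    def record_scan(self, result: dict[str, Any]) -> None:
        """Keep the most recent scan so download endpoints can serve it."""
        self.last_result = result

    def download_json(self) -> bytes:
        """Payload for a download endpoint; raises if nothing was scanned yet."""
        if self.last_result is None:
            raise LookupError("no scan has been run in this session")
        return json.dumps(self.last_result, indent=2).encode("utf-8")


# One process-wide instance; its contents are lost when the local server restarts.
STATE = SessionState()
```

Because the instance lives in process memory, restarting the local server discards the last scan; persisting it is listed under Extension points.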
The frontend is a single HTML page whose vanilla JS and CSS are embedded in webapp.py.
Performance notes
- Distribution map lookup is cached in resolver
- UI uses lazy section rendering and paged row expansion
- Drift mode introduces network latency and optional disk cache behavior
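Drift lookups are the network-bound step, so an on-disk cache keeps repeated scans fast. This sketch assumes a JSON file keyed by package name with a time-to-live; the `fetch` callable stands in for an HTTP request to PyPI's JSON API (`https://pypi.org/pypi/<name>/json`, whose `info.version` field holds the latest release). The cache layout, TTL, `latest_release`, and `drift_fields` are hypothetical.

```python
from __future__ import annotations

import json
import time
from pathlib import Path
from typing import Callable, Optional

Fetcher = Callable[[str], dict]  # package name -> PyPI-style JSON document


def latest_release(
    name: str,
    fetch: Fetcher,
    cache_path: Optional[Path] = None,
    ttl_seconds: float = 24 * 3600,
    now: Callable[[], float] = time.time,
) -> str:
    """Return the latest version for `name`, consulting a JSON disk cache first."""
    cache: dict = {}
    if cache_path is not None and cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))
    entry = cache.get(name)
    if entry is not None and now() - entry["fetched_at"] < ttl_seconds:
        return entry["version"]  # fresh cache hit: no network round trip
    version = fetch(name)["info"]["version"]
    if cache_path is not None:
        cache[name] = {"version": version, "fetched_at": now()}
        cache_path.write_text(json.dumps(cache), encoding="utf-8")
    return version


def drift_fields(installed_version: str, latest: str) -> dict:
    """Row fields a drift-enabled scan might add (hypothetical names)."""
    # Plain string comparison; a real check would parse and order versions.
    return {"latest_version": latest, "outdated": installed_version != latest}
```

Injecting `fetch` and `now` keeps the cache logic testable without network access; in production `fetch` would perform the HTTP GET and decode the JSON body.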
Extension points
Common enhancement paths:
- Add policy presets per organization
- Add SARIF or SPDX export formatters
- Add persistent report storage backend
- Add richer diffing between consecutive scans