autocodebook - Automatic Codebook and Tracking for 'Spark' and 'dplyr'
Pipelines
Wraps 'dplyr' verbs (mutate(), summarise(), filter()) to
automatically capture variable metadata (type, source columns,
categories, and source code), producing a codebook and an
eligibility-tracking table without manual documentation.
Works with both 'sparklyr' (tbl_spark) and local data frames.
Adds big-data optimizations (caching, assume-unique counting,
checkpointing) and a standardized report module with an
eligibility flowchart, editable codebook export (HTML, DOCX,
XLSX), and cross-sectional or longitudinal variable inspection.
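
The general idea of wrapping a verb to record metadata can be illustrated with a minimal sketch for local data frames. It does not use the package's own functions: tracked_mutate(), tracked_filter(), the .label argument, and the "codebook" and "eligibility" attributes are hypothetical names chosen for illustration, and the package's actual interface and storage may differ.

```r
library(dplyr)

# Hypothetical wrapper (not the autocodebook API): runs mutate() and records,
# for each new variable, its type, source columns, and defining code.
tracked_mutate <- function(.data, ...) {
  quos <- rlang::enquos(...)
  out <- mutate(.data, !!!quos)
  rows <- lapply(names(quos), function(nm) {
    expr <- rlang::quo_get_expr(quos[[nm]])
    data.frame(
      variable = nm,
      type     = class(out[[nm]])[1],
      sources  = paste(intersect(all.vars(expr), names(.data)), collapse = ", "),
      code     = paste(deparse(expr), collapse = " "),
      stringsAsFactors = FALSE
    )
  })
  # Append to any codebook already attached to the input; carry the log along.
  attr(out, "codebook") <- rbind(attr(.data, "codebook"), do.call(rbind, rows))
  attr(out, "eligibility") <- attr(.data, "eligibility")
  out
}

# Hypothetical wrapper: runs filter() and logs row counts before and after.
tracked_filter <- function(.data, ..., .label) {
  out <- filter(.data, ...)
  step <- data.frame(step = .label, n_before = nrow(.data),
                     n_after = nrow(out), stringsAsFactors = FALSE)
  attr(out, "codebook") <- attr(.data, "codebook")
  attr(out, "eligibility") <- rbind(attr(.data, "eligibility"), step)
  out
}

patients <- data.frame(id = 1:5, age = c(12, 25, 40, 17, 63))

result <- patients %>%
  tracked_filter(age >= 18, .label = "adults only") %>%
  tracked_mutate(age_group = ifelse(age >= 40, "40+", "18-39"))

attr(result, "codebook")     # one row describing age_group
attr(result, "eligibility")  # one row for the "adults only" step
```

This sketch relies on nrow() and column access on an in-memory data frame, so it would not work unchanged on a tbl_spark, where counting rows triggers a Spark job; that cost is presumably what the caching and assume-unique counting options are meant to reduce.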