roll_rate_analysis package
Module contents
Roll rate analysis for credit risk scorecards.
- class roll_rate_analysis.MOMRollRateTable(month_i: LazyFrame | DataFrame | str | Path, month_i_plus_1: LazyFrame | DataFrame | str | Path, *, unique_key_col: str, delinquency_col: str, max_delq: int = 6, binary_cols: Sequence[str] = ())
Bases: object
Month-over-month roll rate table for two consecutive monthly snapshots.
Parameters
- month_i:
Data for month i. Accepts a polars LazyFrame/DataFrame or a path/string pointing to a CSV file.
- month_i_plus_1:
Data for month i+1. Accepts the same types as month_i.
- unique_key_col:
Name of the account identifier column. Must exist in both inputs.
- delinquency_col:
Name of the delinquency column (integer months past due). Must exist in both inputs.
- max_delq:
Largest delinquency level kept as its own row/column. Anything above it rolls into the N+ bucket.
- binary_cols:
Optional binary indicator columns to append to the matrix, listed in descending priority: when several indicators are set for the same account, the first one listed wins. Each indicator adds one extra row and one extra column.
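The month-over-month matrix comes down to pairing each account's state in month i with its state in month i+1 and counting the pairs. The sketch below shows that idea in plain Python rather than polars so the logic stays visible. It is not the package's implementation: the dict-of-records inputs, the function name mom_transition_counts, the inner join on the account key, capping levels at max_delq so the top level doubles as the N+ bucket, and letting a set binary indicator override the delinquency level are all assumptions.

```python
from collections import Counter


def mom_transition_counts(month_i, month_i_plus_1, *, delinquency_col="delq",
                          max_delq=6, binary_cols=()):
    """Count (from_state, to_state) pairs for accounts present in both months.

    Hypothetical stand-in for MOMRollRateTable: each month is a dict mapping
    account id -> record dict instead of a polars frame.
    """

    def state(record):
        # Assumption: a set indicator overrides the delinquency level, and the
        # indicators are checked in list order, so the first one set wins ties.
        for col in binary_cols:
            if record.get(col):
                return col
        # Assumption: levels above max_delq are capped, so the top level acts
        # as the N+ bucket.
        return min(record[delinquency_col], max_delq)

    counts = Counter()
    # Assumption: inner join on the account key; accounts seen in only one
    # month are ignored.
    for key, record in month_i.items():
        next_record = month_i_plus_1.get(key)
        if next_record is not None:
            counts[(state(record), state(next_record))] += 1
    return counts
```

Each key of the returned Counter is one cell of the matrix; pivoting the from_state values into rows and the to_state values into columns gives the full table.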
Use
>>> from roll_rate_analysis import MOMRollRateTable
>>> table = MOMRollRateTable(
...     "jan.csv", "feb.csv",
...     unique_key_col="id", delinquency_col="delq", max_delq=6,
... )
>>> matrix = table.compute()  # polars.DataFrame, the full transition matrix
>>> reduced = table.reduce()  # polars.DataFrame, roll_down / stable / roll_up
compute() and reduce() are idempotent; the matrix is cached after the first call. Both return polars DataFrames whose first column (from_state) holds the row label.
- property matrix: DataFrame
Return the cached transition matrix, computing it on first access.
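reduce() collapses the matrix into three outcomes per starting state. A plausible reading, sketched below, is that roll_down counts moves to a lower delinquency state, stable counts accounts that stay at the same state, and roll_up counts moves to a higher state. The function name reduce_matrix, its (from_state, to_state) count input, and the use of raw counts rather than shares are assumptions; only integer states are handled, so binary indicator rows would need their own ordering rule.

```python
def reduce_matrix(counts):
    """Collapse (from_state, to_state) counts into roll_down/stable/roll_up.

    Hypothetical helper: `counts` maps (from_state, to_state) -> number of
    accounts, with integer delinquency states. Rows come back sorted by
    from_state.
    """
    rows = {}
    for (src, dst), n in counts.items():
        row = rows.setdefault(src, {"roll_down": 0, "stable": 0, "roll_up": 0})
        if dst < src:
            row["roll_down"] += n   # cured towards a lower delinquency level
        elif dst == src:
            row["stable"] += n      # stayed at the same level
        else:
            row["roll_up"] += n     # rolled forward into a worse level
    return [{"from_state": src, **rows[src]} for src in sorted(rows)]
```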
- class roll_rate_analysis.SnapshotRollRateTable(snapshot: LazyFrame | DataFrame | str | Path, observation: Sequence[LazyFrame | DataFrame | str | Path], performance: Sequence[LazyFrame | DataFrame | str | Path], *, unique_key_col: str, delinquency_col: str, max_delq: int = 6, detailed: bool = False, granularity: int = 1, keep_cols: Sequence[str] | None = None)
Bases: object
Roll rate table for a snapshot month with observation and performance windows.
For every account in the snapshot, the observation window is reduced to its maximum delinquency across the supplied observation files, and similarly for the performance window. The resulting transition matrix has rows indexed by the observation max-delinquency and columns indexed by the performance max-delinquency.
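The reduction described above amounts to taking each account's worst delinquency in each window and then counting (observation max, performance max) pairs over the accounts in the snapshot. This plain-Python sketch uses lists of row dicts in place of polars frames; the helper names window_max and snapshot_transition_counts, the capping at max_delq, and skipping snapshot accounts that are missing from a window are assumptions rather than documented behaviour.

```python
from collections import Counter


def window_max(frames, key_col, delinquency_col):
    """Worst delinquency per account across a window of monthly frames.

    Each frame is a list of row dicts, a stand-in for a polars frame.
    """
    worst = {}
    for rows in frames:
        for row in rows:
            key, value = row[key_col], row[delinquency_col]
            worst[key] = max(worst.get(key, value), value)
    return worst


def snapshot_transition_counts(snapshot_keys, observation, performance, *,
                               key_col="id", delinquency_col="delq",
                               max_delq=6):
    """Count (observation max, performance max) pairs over the snapshot."""
    obs = window_max(observation, key_col, delinquency_col)
    perf = window_max(performance, key_col, delinquency_col)
    counts = Counter()
    # The snapshot defines the account universe. Accounts missing from either
    # window are skipped here; the package may handle them differently.
    for key in snapshot_keys:
        if key in obs and key in perf:
            # Assumption: cap at max_delq so the top level acts as the N+ bucket.
            counts[(min(obs[key], max_delq), min(perf[key], max_delq))] += 1
    return counts
```

In polars the same shape would typically be a group-by on the account key with a max aggregation per window, followed by joins onto the snapshot keys.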
Parameters
- snapshot:
Data for the snapshot month; it defines the account universe. Accepts a polars LazyFrame/DataFrame or a path/string pointing to a CSV file.
- observation:
Sequence of frames or paths forming the observation window.
- performance:
Sequence of frames or paths forming the performance window.
- unique_key_col:
Name of the account identifier column. Must exist in every input.
- delinquency_col:
Name of the delinquency column. Must exist in every observation and performance frame.
- max_delq:
Largest delinquency level kept as its own row/column. Anything above it rolls into the N+ bucket.
- detailed:
Split delinquency levels 3 and 4 into granularity sub-rows showing how many times the account hit that level during the observation window.
- granularity:
Number of sub-rows per detailed level. Must be ≥ 2 when detailed is True.
- keep_cols:
Optional column whitelist applied to each observation/performance frame before joining (memory optimisation). Must include delinquency_col.
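One reading of detailed mode is that an account whose observation-window maximum is 3 or 4 is placed in a sub-row according to how many months it sat at exactly that level, with counts at or above granularity sharing the last sub-row. The sketch below encodes that reading; the helper names and the label format (for example "4_2+") are hypothetical. Under the same reading, each of the two split levels turns one row into granularity rows, which gives 2 * (granularity - 1) extra rows beyond max_delq + 1.

```python
DETAILED_LEVELS = (3, 4)  # the levels the docs say are split in detailed mode


def detailed_row_label(history, granularity):
    """Hypothetical row label for one account in detailed mode.

    history: the account's monthly delinquency values over the observation
    window.
    """
    worst = max(history)
    if worst not in DETAILED_LEVELS:
        return str(worst)
    # How many observation months the account spent at its worst level.
    hits = sum(1 for value in history if value == worst)
    # Counts at or above `granularity` share the last sub-row, marked with '+'.
    bucket = min(hits, granularity)
    suffix = "+" if bucket == granularity else ""
    return f"{worst}_{bucket}{suffix}"


def extra_rows(detailed, granularity):
    """Rows added on top of max_delq + 1 under the reading above."""
    if not detailed:
        return 0
    return len(DETAILED_LEVELS) * (granularity - 1)
```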
Use
>>> from roll_rate_analysis import SnapshotRollRateTable
>>> table = SnapshotRollRateTable(
...     "snap.csv",
...     ["obs1.csv", "obs2.csv"],
...     ["perf1.csv", "perf2.csv"],
...     unique_key_col="id",
...     delinquency_col="delq",
...     detailed=True,
...     granularity=2,
... )
>>> matrix = table.compute()  # polars.DataFrame, the full transition matrix
>>> reduced = table.reduce()  # polars.DataFrame, roll_down / stable / roll_up
compute() and reduce() are idempotent; the matrix is cached after the first call.
- property extra_rows: int
Number of additional rows beyond max_delq + 1 due to detailed mode.
- property matrix: DataFrame
Return the cached transition matrix, computing it on first access.
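Both classes build the matrix once and hand back the cached result afterwards, whether it is reached through compute() or the matrix property. The skeleton below illustrates that compute-once pattern; the class name CachedMatrixTable, the build callable and the builds counter are hypothetical and exist only to make the caching observable.

```python
class CachedMatrixTable:
    """Hypothetical skeleton of the compute-once pattern described above."""

    def __init__(self, build):
        self._build = build   # callable that produces the (expensive) matrix
        self._matrix = None   # filled on the first compute() call
        self.builds = 0       # how many times the expensive step actually ran

    def compute(self):
        # Only the first call does the work; later calls return the cache,
        # which is what makes repeated calls idempotent.
        if self._matrix is None:
            self._matrix = self._build()
            self.builds += 1
        return self._matrix

    @property
    def matrix(self):
        # Property access goes through compute(), so it shares the same cache.
        return self.compute()
```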
roll_rate_analysis.mom module
Month-over-month roll rate table.
- class roll_rate_analysis.mom.MOMRollRateTable(month_i: LazyFrame | DataFrame | str | Path, month_i_plus_1: LazyFrame | DataFrame | str | Path, *, unique_key_col: str, delinquency_col: str, max_delq: int = 6, binary_cols: Sequence[str] = ())
Documented in full as roll_rate_analysis.MOMRollRateTable under Module contents above.
roll_rate_analysis.snapshot module
Snapshot roll rate table over observation and performance windows.
- class roll_rate_analysis.snapshot.SnapshotRollRateTable(snapshot: LazyFrame | DataFrame | str | Path, observation: Sequence[LazyFrame | DataFrame | str | Path], performance: Sequence[LazyFrame | DataFrame | str | Path], *, unique_key_col: str, delinquency_col: str, max_delq: int = 6, detailed: bool = False, granularity: int = 1, keep_cols: Sequence[str] | None = None)
Documented in full as roll_rate_analysis.SnapshotRollRateTable under Module contents above.