Transformers
scikit-learn transformers for the data.
from latent_calendar.datasets import load_online_transactions
from latent_calendar.transformers import create_raw_to_vocab_transformer
df = load_online_transactions()
transformers = create_raw_to_vocab_transformer(id_col="Customer ID", timestamp_col="InvoiceDate")
df_wide = transformers.fit_transform(df)
CalandarTimestampFeatures
Bases: BaseEstimator, TransformerMixin
Create the day of week and proportion into day columns.
Source code in latent_calendar/transformers.py
transform(X, y=None)
Create two new columns: the day of week and the proportion into the day.
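
The two columns can be pictured with a short pandas sketch. This is not the library's implementation: the helper `add_calendar_features`, the output column names `day_of_week` and `prop_into_day`, and the sample rows are all assumptions made for illustration.

```python
import pandas as pd


def add_calendar_features(df: pd.DataFrame, timestamp_col: str) -> pd.DataFrame:
    """Hypothetical helper: add day-of-week and proportion-into-day columns."""
    ts = df[timestamp_col].dt
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    return df.assign(
        day_of_week=ts.dayofweek,  # pandas convention: Monday=0 ... Sunday=6
        prop_into_day=seconds / 86_400,  # 0.0 at midnight, 0.5 at noon
    )


# Made-up sample data: a Monday morning and a Wednesday evening.
events = pd.DataFrame(
    {"InvoiceDate": pd.to_datetime(["2024-01-01 06:00", "2024-01-03 18:00"])}
)
features = add_calendar_features(events, "InvoiceDate")
```

The original columns are kept and the two derived columns are appended, which mirrors the "create new columns" behavior described above.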
HourDiscretizer
Bases: BaseEstimator, TransformerMixin
Discretize the hour column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | str | The name of the column to discretize. | 'hour' |
minutes | int | The number of minutes to discretize by. | 60 |
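
One way to read "discretize by `minutes`" is flooring each hour value to the start of its bin. The sketch below assumes the hour column holds fractional hours of the day (for example 13.75 for 13:45); the helper `discretize_hour` is hypothetical and may differ from what the library does internally.

```python
def discretize_hour(hour: float, minutes: int = 60) -> float:
    """Hypothetical helper: floor a fractional hour to the start of its bin."""
    bin_size = minutes / 60  # bin width expressed in hours
    return (hour // bin_size) * bin_size


hourly = discretize_hour(13.75)  # 60-minute bins
half_hourly = discretize_hour(13.75, minutes=30)  # 30-minute bins
quarter_hourly = discretize_hour(13.75, minutes=15)  # 15-minute bins
```

Smaller values of `minutes` give a finer calendar grid at the cost of more columns later on.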
LongToWide
Bases: BaseEstimator, TransformerMixin
Unstack the last index level, assumed to be the vocab, into columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col | str | The name of the column to unstack. | 'num_events' |
as_int | bool | Whether to cast the values to int. | True |
minutes | int | The number of minutes to discretize by. | 60 |
multiindex | bool | Whether the columns are a multiindex. | True |
transform(X, y=None)
Unstack the last index level, assumed to be the vocab, into columns.
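
The long-to-wide step can be sketched with plain pandas. The index names, vocab strings, and counts below are made up, and the sketch does not reproduce the `minutes` or `multiindex` handling of the transformer.

```python
import pandas as pd

# Long format: one row per (id, vocab) pair, with vocab as the last index level.
long = pd.DataFrame(
    {
        "id": ["a", "a", "b"],
        "vocab": ["00 09", "01 14", "00 09"],
        "num_events": [2, 1, 5],
    }
).set_index(["id", "vocab"])

# Unstack the last index level into columns; (id, vocab) pairs with no rows become 0.
wide = long["num_events"].unstack(fill_value=0).astype(int)
```

Here `wide` has one row per id and one column per vocab value, which is the shape the `col` and `as_int` parameters describe.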
RawToVocab
Bases: BaseEstimator, TransformerMixin
Transform timestamp-level data into id-level data with vocab columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id_col | str | The name of the id column. | required |
timestamp_col | str | The name of the timestamp column. | required |
minutes | int | The number of minutes to discretize by. | 60 |
additional_groups | list[str] \| None | Additional columns to group by. | None |
cols | list[str] \| None | Additional columns to sum. | None |
as_multiindex | bool | Whether to return columns as a multiindex. | True |
VocabAggregation
Bases: BaseEstimator, TransformerMixin
NOTE: The grouping columns remain as the index of the result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
groups | list[str] | The columns to group by. | required |
cols | list[str] \| None | Additional columns to sum. | None |
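
As a rough picture of the aggregation, the pandas sketch below counts events per group and sums one extra column. The column names, the choice of `["id", "vocab"]` as groups, and the `num_events` output name are assumptions for illustration rather than the library's code.

```python
import pandas as pd

# Made-up event-level data: one row per event.
events = pd.DataFrame(
    {
        "id": ["a", "a", "a", "b"],
        "vocab": ["00 09", "00 09", "01 14", "00 09"],
        "amount": [10.0, 5.0, 2.5, 7.0],
    }
)

groups = ["id", "vocab"]  # assumed grouping columns
agg = (
    events.assign(num_events=1)
    .groupby(groups)
    .agg(num_events=("num_events", "sum"), amount=("amount", "sum"))
)
```

Because `groupby` is called without `as_index=False`, the grouping columns end up in the index of `agg` rather than as regular columns.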
VocabTransformer
Bases: BaseEstimator, TransformerMixin
Create a vocab column from the day of week and hour columns.
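
A vocab value combines the day of week and the hour into a single token. The zero-padded `"DD HH"` format and the helper `make_vocab` below are assumptions chosen for illustration; the library's actual token format may differ.

```python
def make_vocab(day_of_week: int, hour: int) -> str:
    """Hypothetical helper: build a zero-padded 'DD HH' token."""
    return f"{day_of_week:02d} {hour:02d}"


monday_morning = make_vocab(0, 9)
sunday_night = make_vocab(6, 23)
```

Zero-padding keeps the tokens sortable as strings, so columns built from them stay in calendar order.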
create_raw_to_vocab_transformer(id_col, timestamp_col, minutes=60, additional_groups=None, as_multiindex=True)
Wrapper to create the transformer from the configuration options.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id_col | str | The name of the id column. | required |
timestamp_col | str | The name of the timestamp column. | required |
minutes | int | The number of minutes to discretize by. | 60 |
additional_groups | list[str] \| None | Additional columns to group by. | None |
as_multiindex | bool | Whether to return columns as a multiindex. | True |
Returns:
Type | Description |
---|---|
RawToVocab | A transformer that transforms timestamp level data into id level data with vocab columns. |
create_timestamp_feature_pipeline(timestamp_col, discretize=True, minutes=60, create_vocab=True)
Create a pipeline that creates features from the timestamp column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timestamp_col | str | The name of the timestamp column. | required |
discretize | bool | Whether to discretize the hour column. | True |
minutes | int | The number of minutes to discretize by. Ignored if discretize is False. | 60 |
create_vocab | bool | Whether to create the vocab column. | True |
Returns:
Type | Description |
---|---|
Pipeline | A pipeline that creates features from the timestamp column. |
Example
Create features for the online transactions dataset.
prop_into_day(dt)
Return the proportion into the day from a datetime-like object.
0.0 is midnight and 1.0 is midnight again.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dt | datetime \| DatetimeProperties | datetime like object | required |
Returns:
Type | Description |
---|---|
float \| Series | numeric value(s) between 0.0 and 1.0 |
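
The mapping from clock time to a proportion can be checked by hand with a small standard-library sketch. The function `prop_into_day_sketch` is a hypothetical re-implementation for a single `datetime`, not the library's own code, and it does not cover the pandas `DatetimeProperties` case.

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def prop_into_day_sketch(dt: datetime) -> float:
    """Hypothetical helper: fraction of the day elapsed at `dt`."""
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1e6
    return seconds / SECONDS_PER_DAY


midnight = prop_into_day_sketch(datetime(2024, 1, 1, 0, 0))
noon = prop_into_day_sketch(datetime(2024, 1, 1, 12, 0))
evening = prop_into_day_sketch(datetime(2024, 1, 1, 18, 0))
```

Midnight maps to 0.0, noon to 0.5, and 18:00 to 0.75; the value approaches 1.0 just before the next midnight.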