Replace numerically coded variables with human-readable values
- ccao.vars_recode(data: DataFrame, cols: List[str] | None = None, code_type: str = 'long', as_factor: bool = True, dictionary: DataFrame | None = None) → DataFrame
Replace numerically coded variables with human-readable values.
The system of record stores characteristic values in a numerically encoded format. This function translates those values into a human-readable format. For example, EXT_WALL = 2 becomes EXT_WALL = "Masonry". The values and their translations are defined by a dictionary, which can be supplied through the dictionary parameter. The default dictionary is vars_dict.
Options for code_type are:
- "long", which transforms EXT_WALL = 1 to EXT_WALL = "Frame"
- "short", which transforms EXT_WALL = 1 to EXT_WALL = "FRME"
- "code", which keeps the original values (useful for removing improperly coded values; see the note below)
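To make the three code_type options concrete, here is a minimal pandas sketch of the per-variable lookup they imply. It does not call ccao. The toy dictionary, its column names (var_name, var_code, var_value, var_value_short), the short label "MASR", and the helper recode_series are all hypothetical, chosen for illustration; the real vars_dict layout may differ.

```python
import pandas as pd

# Hypothetical toy dictionary: the column names and the "MASR" short label are
# illustrative assumptions, not taken from the real vars_dict.
toy_dict = pd.DataFrame(
    {
        "var_name": ["EXT_WALL", "EXT_WALL"],
        "var_code": [1, 2],
        "var_value": ["Frame", "Masonry"],
        "var_value_short": ["FRME", "MASR"],
    }
)

# Which dictionary column supplies the output for each code_type (assumed layout)
VALUE_COLUMN = {"long": "var_value", "short": "var_value_short", "code": "var_code"}


def recode_series(values: pd.Series, var_name: str, dictionary: pd.DataFrame,
                  code_type: str = "long") -> pd.Series:
    """Translate the codes of one variable using the toy dictionary (sketch only)."""
    if code_type not in VALUE_COLUMN:
        raise ValueError(f"code_type must be one of {sorted(VALUE_COLUMN)}")
    rows = dictionary[dictionary["var_name"] == var_name]
    # Build a code -> output lookup; codes missing from the dictionary map to NaN
    lookup = dict(zip(rows["var_code"], rows[VALUE_COLUMN[code_type]]))
    return values.map(lookup)


ext_wall = pd.Series([1, 2, 1], name="EXT_WALL")
for code_type in ("long", "short", "code"):
    print(code_type, recode_series(ext_wall, "EXT_WALL", toy_dict, code_type).tolist())
```

Under these assumptions, "long" yields Frame/Masonry, "short" yields FRME/MASR, and "code" returns the original codes unchanged.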
- Parameters:
data (pandas.DataFrame) – A pandas DataFrame with columns to have values replaced.
cols (list[str]) – A list of column names to be transformed, or None to select all columns.
code_type (str) – The recoding type. See the options described above.
as_factor (bool) – If True, re-encoded values will be returned as categorical variables (pandas Categorical). If False, re-encoded values will be returned as plain strings.
dictionary (pandas.DataFrame) – A pandas DataFrame representing the dictionary used to translate encodings. If None, the default dictionary vars_dict is used.
- Raises:
ValueError – If the dictionary is missing required columns or if invalid input is provided.
- Returns:
The input DataFrame with re-encoded values for the specified columns.
- Return type:
pandas.DataFrame
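The parameters interact in ways that are easy to misread: cols=None selects every column, as_factor chooses between a categorical and a plain result, and a malformed dictionary raises ValueError. The following is a minimal pandas sketch of those documented semantics, not the ccao implementation. The function name sketch_vars_recode and the dictionary columns var_name, var_code, var_value, and var_value_short are hypothetical. Unlike the real function, the sketch requires an explicit dictionary instead of falling back to vars_dict, and it simply leaves columns with no dictionary entries unchanged.

```python
import pandas as pd

# Assumed dictionary layout (hypothetical; the real vars_dict may differ)
REQUIRED_COLUMNS = {"var_name", "var_code", "var_value", "var_value_short"}
VALUE_COLUMN = {"long": "var_value", "short": "var_value_short", "code": "var_code"}


def sketch_vars_recode(data: pd.DataFrame, dictionary: pd.DataFrame, cols=None,
                       code_type: str = "long", as_factor: bool = True) -> pd.DataFrame:
    """Hypothetical re-implementation of the documented parameter semantics."""
    missing = REQUIRED_COLUMNS - set(dictionary.columns)
    if missing:
        raise ValueError(f"dictionary is missing columns: {sorted(missing)}")
    if code_type not in VALUE_COLUMN:
        raise ValueError(f"invalid code_type: {code_type!r}")

    out = data.copy()
    # cols=None selects every column of the input
    selected = list(data.columns) if cols is None else list(cols)
    for col in selected:
        rows = dictionary[dictionary["var_name"] == col]
        if rows.empty:
            continue  # no dictionary entries: leave the column unchanged
        lookup = dict(zip(rows["var_code"], rows[VALUE_COLUMN[code_type]]))
        recoded = data[col].map(lookup)
        # as_factor=True -> pandas Categorical; False -> plain (non-categorical) values
        out[col] = recoded.astype("category") if as_factor else recoded
    return out


toy_dict = pd.DataFrame(
    {
        "var_name": ["EXT_WALL", "EXT_WALL"],
        "var_code": [1, 2],
        "var_value": ["Frame", "Masonry"],
        "var_value_short": ["FRME", "MASR"],
    }
)
data = pd.DataFrame({"EXT_WALL": [1, 2], "PIN": ["A", "B"]})

result = sketch_vars_recode(data, toy_dict)
print(result["EXT_WALL"].dtype)  # category
print(result["PIN"].tolist())    # ['A', 'B']
```

Passing cols=["EXT_WALL"], code_type="short", and as_factor=False to the sketch returns plain "FRME"/"MASR" values instead of a categorical column.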
Note
Values which are in the data but are NOT in the dictionary will be converted to NaN.
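This behavior is what makes code_type="code" useful for cleaning: mapping each known code to itself keeps valid codes and turns everything else into NaN. A minimal pandas sketch of that idea, using a hypothetical set of valid codes in place of the real dictionary:

```python
import pandas as pd

# Hypothetical set of valid codes for one variable; each code maps to itself,
# roughly mirroring a recode with code_type="code".
valid_codes = {1: 1, 2: 2, 3: 3}

# Observed data containing one improperly coded value (9)
ext_wall = pd.Series([1, 9, 3, 2], name="EXT_WALL")

# Valid codes survive unchanged; the unmapped 9 becomes NaN
cleaned = ext_wall.map(valid_codes)
print(int(cleaned.isna().sum()), "improperly coded value(s) set to NaN")
```

Because pandas represents the missing value as NaN, the cleaned integer column is upcast to float in this sketch.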
- Example:
import ccao

sample_data = ccao.sample_athena

# Defaults to `long` code type
ccao.vars_recode(data=sample_data)

# Recode to `short` code type
ccao.vars_recode(data=sample_data, code_type="short")

# Recode only specified columns (cols takes a list of column names)
ccao.vars_recode(data=sample_data, cols=["GAR1_SIZE"])