The ADDCHARS SQL table includes individual rows listing the PIN, start date, and characteristic updates associated with a 288 Home Improvement Exemption. This data format is difficult to work with and complicated by the fact that multiple 288s can be active at the same time for different periods, and some columns from ADDCHARS add to existing characteristics, while some overwrite existing characteristics. Additionally, 288s can be active for multi-code properties, meaning that one building on a PIN could have an improvement while the others remain untouched.

The goal of this function is to transform these single row records into a sparse data frame which lists a row and characteristic update per PIN per YEAR.

The transformation caused by this function is most easily visualized with a mock dataset.

The base ADDCHARS data looks like this:

qu_pinyearqu_townqu_upload_datequ_sqft_bldqu_rooms
123452013771307022000
123452015771507033001

This function will transform it into this:

qu_pinyearqu_sqft_bldqu_rooms
1234520132000
1234520142000
1234520155001
1234520165001
1234520175001
1234520183001
1234520193001
1234520203001

Each PIN will have a row for each year that a 288 is active and a column for each characteristic specified by additive_source and replacement_source. This table can be easily left joined to modeling data to update characteristic values.

chars_sparsify(
  data,
  pin_col,
  year_col,
  town_col,
  upload_date_col,
  additive_source,
  replacement_source
)

Arguments

data

A data frame containing ADDCHARS columns.

pin_col

A column name specifying the PIN column.

year_col

A column name specifying the year column.

town_col

A column name specifying the town column. This is used to determine the length of the 288, given the starting date specified by year_col.

upload_date_col

A column name specifying the upload date of the 288, this is use to determine which replacement characteristics take precedence if multiple 288s were filed in the same year (the latest one is chosen).

additive_source

A tidyselect selection of columns which contains additive characteristic values. These are values such as square feet which get ADDED to existing characteristics. For example, if qu_sqft_bld is equal to 100, then 100 sqft get added to the existing square footage of the property. The easiest way to specify all additive columns is to use the built-in crosswalk, any_of(ccao::chars_cols$add$source).

replacement_source

A tidyselect selection of columns which contains replacement characteristic values. These are values such as number of rooms which get OVERWRITE existing characteristics. For example, if QU_ROOMS is equal to 3, then the property will be update to have 3 total rooms. The easiest way to specify all replacement columns is to use the built-in crosswalk, any_of(ccao::chars_cols$replace$source).

Value

A sparsified data frame of characteristic updates per PIN per year. See example.

Note

Use dplyr/tidyselect syntax for specifying column names. For example, use qu_sqft_bld instead of "qu_sqft_bld", or use a tidyselect function such as starts_with("qu_").

See also

Other chars_funs: chars_288_active(), chars_fix_age(), chars_update()

Examples


# Load the included example dataset
data("chars_sample_hie")

chars_sparsify(
  chars_sample_hie[1:3, ],
  pin_col = pin,
  year_col = year,
  town_col = as.character(qu_town),
  upload_date_col = qu_upload_date,
  additive_source = any_of(chars_cols$add$source),
  replacement_source = any_of(chars_cols$replace$source)
)
#> # A tibble: 11 × 31
#>    pin             year qu_rooms qu_beds qu_full_bath qu_half_bath qu_sqft_bld
#>    <chr>          <int>    <dbl>   <dbl>        <dbl>        <dbl>       <dbl>
#>  1 10254170360000  2017        1       1            1            0         483
#>  2 10254170360000  2018        1       1            1            0         483
#>  3 10254170360000  2019        1       1            1            0         483
#>  4 10254170360000  2020        1       1            1            0         483
#>  5 13253230040000  2017        4       2            2            0        1170
#>  6 13253230040000  2018        4       2            2            0        1170
#>  7 13253230040000  2019        4       2            2            0        1170
#>  8 13253230040000  2020        4       2            2            0        1170
#>  9 13253230040000  2021        0       0            0            0           0
#> 10 13253230040000  2022        0       0            0            0           0
#> 11 13253230040000  2023        0       0            0            0           0
#> # ℹ 24 more variables: qu_lnd_sqft <dbl>, qu_no_com_unit <dbl>,
#> #   qu_type_of_res <chr>, qu_use <chr>, qu_exterior_wall <chr>, qu_roof <chr>,
#> #   qu_basement_type <chr>, qu_basement_finish <chr>, qu_heat <chr>,
#> #   qu_air <chr>, qu_fire_place <dbl>, qu_attic_type <chr>,
#> #   qu_attic_finish <chr>, qu_type_plan <chr>, qu_type_design <chr>,
#> #   qu_construct_quality <chr>, qu_garage_const <chr>,
#> #   qu_garage_attached <chr>, qu_garage_area <chr>, qu_porch <chr>, …