Boosted trees via LightGBM — train

A wrapper around lgb.train for fitting tree-based models, with some additional advanced options.

train_lightgbm(
  x,
  y,
  num_iterations = 10,
  max_depth = 17,
  num_leaves = 31,
  link_max_depth = FALSE,
  add_to_linked_depth = 2L,
  categorical_feature = NULL,
  weight = NULL,
  validation = 0,
  sample_type = "random",
  early_stop = NULL,
  max_bin = NULL,
  feature_pre_filter = FALSE,
  free_raw_data = TRUE,
  verbose = 0,
  save_tree_error = FALSE,
  ...
)

Arguments

x: A matrix of predictors.
y: A numeric vector of outcome data.
num_iterations: Integer value for the number of iterations (trees) to grow.
max_depth: Integer value for the maximum leaf distance from the root node.
num_leaves: Integer value for the maximum possible number of leaves in one tree.
link_max_depth: Logical, default FALSE. When TRUE and max_depth is unconstrained (-1), max_depth is set to floor(log2(num_leaves)) + add_to_linked_depth.
add_to_linked_depth: Integer value to add to max_depth when it is linked to num_leaves.
categorical_feature: A character vector of feature names or an integer vector with the indices of the features.
weight: A numeric vector of sample weights. Should be the same length as the number of rows of x.
validation: A number on [0, 1). The proportion of data in x and y that is held out for performance assessment and early stopping.
sample_type: The sampling method for the validation set. Can be either "random" (a completely random sample of rows) or "recent" (the last rows of the data, in the proportion specified by validation).
early_stop: An integer or NULL. If an integer, it is the number of iterations without improvement before stopping. Must be set when validation is > 0.
max_bin: Maximum number of bins into which feature values are bucketed.
feature_pre_filter: Tell LightGBM to ignore the features that are unsplittable based on min_data_in_leaf.
free_raw_data: LightGBM constructs its data format, called a "Dataset", from tabular data. By default, that Dataset object on the R side does not keep a copy of the raw data. This reduces LightGBM's memory consumption, but it means that the Dataset object cannot be changed after it has been constructed. If you'd prefer to be able to change the Dataset object after construction, set free_raw_data = FALSE. Useful for debugging.
verbose: Integer. < 0: Fatal, = 0: Error (Warning), = 1: Info, > 1: Debug.
save_tree_error: Logical. Whether to use the training set to compute the error of each tree, which is stored in the record_evals attribute. This parameter is mutually exclusive with validation and early_stop, because it could otherwise override the set used for validation.
...: Engine arguments, hyperparameters, etc. that are passed on to lgb.train.
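
The rule described for link_max_depth and add_to_linked_depth can be sketched in base R as follows. The helper name linked_max_depth is hypothetical and is not part of the package; it only mirrors the documented formula and may differ from the actual implementation.

```r
# Hypothetical helper (not exported by the package): mirrors the rule
# documented for link_max_depth and add_to_linked_depth.
linked_max_depth <- function(num_leaves,
                             max_depth = -1L,
                             link_max_depth = FALSE,
                             add_to_linked_depth = 2L) {
  # Relink only when requested and when max_depth is unconstrained (-1).
  if (isTRUE(link_max_depth) && max_depth == -1L) {
    return(as.integer(floor(log2(num_leaves)) + add_to_linked_depth))
  }
  # Otherwise the user-supplied max_depth is left untouched.
  as.integer(max_depth)
}

linked_max_depth(num_leaves = 31L, link_max_depth = TRUE)
#> [1] 6
linked_max_depth(num_leaves = 31L, max_depth = 17L, link_max_depth = TRUE)
#> [1] 17
```

A binary tree of depth d holds at most 2^d leaves, so floor(log2(num_leaves)) alone can be too shallow to reach num_leaves (depth 4 allows only 16 leaves, fewer than 31). The added offset presumably provides that headroom.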

Value

A fitted lgb.Booster object.
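
As an illustration of how the validation and sample_type arguments interact, the sketch below picks the rows that would be held out under each sampling method. The helper validation_rows is hypothetical, and the rounding of the validation set size (floor) is an assumption; the package may split the data differently.

```r
# Hypothetical helper (not part of the package): returns the row indices
# that would be held out for validation.
validation_rows <- function(n, validation = 0,
                            sample_type = c("random", "recent")) {
  sample_type <- match.arg(sample_type)
  stopifnot(validation >= 0, validation < 1)

  # Assumption: the validation set size is rounded down.
  n_val <- floor(n * validation)
  if (n_val == 0) {
    return(integer(0))
  }

  if (sample_type == "random") {
    # A completely random sample of rows.
    sort(sample.int(n, n_val))
  } else {
    # "recent": the last n_val rows, in their original order.
    seq.int(n - n_val + 1L, n)
  }
}

validation_rows(10, validation = 0.2, sample_type = "recent")
#> [1]  9 10
length(validation_rows(10, validation = 0.2, sample_type = "random"))
#> [1] 2
```

The "recent" option suits data ordered in time, where holding out the latest rows avoids evaluating on observations that precede the training data.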