A wrapper around lgb.train for fitting tree-based models, with some additional advanced options.

train_lightgbm(
  x,
  y,
  num_iterations = 10,
  max_depth = 17,
  num_leaves = 31,
  link_max_depth = FALSE,
  add_to_linked_depth = 2L,
  categorical_feature = NULL,
  weight = NULL,
  validation = 0,
  sample_type = "random",
  early_stop = NULL,
  max_bin = NULL,
  feature_pre_filter = FALSE,
  free_raw_data = TRUE,
  verbose = 0,
  save_tree_error = FALSE,
  ...
)

Arguments

x

A matrix of predictors.

y

A numeric vector of outcome data.

num_iterations

Integer value for the number of iterations (trees) to grow.

max_depth

Integer value for the maximum leaf distance from the root node.

num_leaves

Integer value for the maximum possible number of leaves in one tree.

link_max_depth

Logical, default FALSE. When TRUE, and when max_depth is unconstrained (set to -1), max_depth will be set to floor(log2(num_leaves)) + add_to_linked_depth.

add_to_linked_depth

Integer value to add to max_depth when it is linked to num_leaves.
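As a rough illustration of the linking rule described above, the arithmetic can be sketched as follows. The helper linked_max_depth is hypothetical and only mirrors the documented formula; it is not the wrapper's internal code.

```r
# Hypothetical helper mirroring the documented rule:
# max_depth = floor(log2(num_leaves)) + add_to_linked_depth
linked_max_depth <- function(num_leaves, add_to_linked_depth = 2L) {
  as.integer(floor(log2(num_leaves))) + as.integer(add_to_linked_depth)
}

# With the defaults from the usage above (num_leaves = 31, add_to_linked_depth = 2L)
linked_max_depth(31L)        # 6
# A larger tree with a smaller offset
linked_max_depth(255L, 1L)   # 8
```

Under these assumptions, a tree with up to 31 leaves would be capped at depth 6 when the link is active.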

categorical_feature

A character vector of feature names, or an integer vector with the indices of the features, that should be treated as categorical.

weight

A numeric vector of sample weights. Should be the same length as the number of rows of x.

validation

A number on [0, 1). validation is the proportion of the data in x and y that is held out for performance assessment and early stopping.

sample_type

The sampling method for the validation set. Can be either "random" (a completely random sample of rows) or "recent" (the last rows of the data, in the proportion specified by validation).
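The difference between the two sampling methods can be sketched with a small helper that picks the held-out row indices. The function validation_rows is hypothetical, and the rounding with floor() is an assumption; the wrapper's actual splitting code may differ.

```r
# Hypothetical helper: returns the row indices held out for validation.
# Assumes the number of validation rows is floor(n * validation).
validation_rows <- function(n, validation = 0.2,
                            sample_type = c("random", "recent")) {
  sample_type <- match.arg(sample_type)
  stopifnot(validation >= 0, validation < 1)
  n_val <- floor(n * validation)
  if (n_val == 0) {
    return(integer(0))          # validation = 0: nothing is held out
  }
  if (sample_type == "random") {
    sort(sample.int(n, n_val))  # completely random rows
  } else {
    seq.int(n - n_val + 1, n)   # the last n_val rows
  }
}

set.seed(1)
validation_rows(10, 0.2, "recent")   # rows 9 and 10
validation_rows(10, 0.2, "random")   # two randomly chosen rows
```

The "recent" option only makes sense when the rows of x and y are ordered, for example by time.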

early_stop

An integer or NULL. If an integer, it is the number of iterations without improvement before stopping. Must be set when validation is > 0.
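To make "iterations without improvement" concrete, here is a minimal sketch of the stopping rule applied to a made-up vector of validation errors. The helper stop_iteration and the error values are hypothetical; LightGBM applies its own early-stopping logic internally during training.

```r
# Hypothetical helper: given per-iteration validation errors (lower is better),
# return the iteration at which training would stop.
stop_iteration <- function(errors, early_stop) {
  best <- Inf
  since_best <- 0L
  for (i in seq_along(errors)) {
    if (errors[i] < best) {
      best <- errors[i]       # new best score: reset the counter
      since_best <- 0L
    } else {
      since_best <- since_best + 1L
      if (since_best >= early_stop) {
        return(i)             # no improvement for early_stop iterations
      }
    }
  }
  length(errors)              # never triggered: all iterations are run
}

# Made-up validation errors for illustration only
errors <- c(0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.38)
stop_iteration(errors, early_stop = 2L)   # 5
stop_iteration(errors, early_stop = 3L)   # 7
```

A larger early_stop value tolerates longer plateaus, at the cost of growing more trees before stopping.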

max_bin

Max number of bins that feature values will be bucketed in.

feature_pre_filter

Tell LightGBM to ignore the features that are unsplittable based on min_data_in_leaf.

free_raw_data

LightGBM constructs its data format, called a "Dataset", from tabular data. By default, that Dataset object on the R side does not keep a copy of the raw data. This reduces LightGBM's memory consumption, but it means that the Dataset object cannot be changed after it has been constructed. If you'd prefer to be able to change the Dataset object after construction, set free_raw_data = FALSE. This is useful for debugging.

verbose

Integer. < 0: Fatal, = 0: Error (Warning), = 1: Info, > 1: Debug.

save_tree_error

Logical. Whether to use the training set to compute errors for each tree, which are stored in the record_evals attribute. This parameter is mutually exclusive with validation and early_stop, because otherwise it can override the set used for validation.

...

Engine arguments, hyperparameters, etc. that are passed on to lgb.train.

Value

A fitted lgb.Booster object.