GAM
-
class pygam.pygam.GAM(lam=0.6, max_iter=100, n_splines=25, spline_order=3, penalties='auto', tol=0.0001, distribution='normal', link='identity', callbacks=['deviance', 'diffs'], fit_intercept=True, fit_linear=False, fit_splines=True, dtype='auto', constraints=None, verbose=False)
Bases: pygam.core.Core
Generalized Additive Model
Parameters: - callbacks (list of strings or list of CallBack objects,) – default: [‘deviance’, ‘diffs’] Names of callback objects to call during the optimization loop.
- constraints (str or callable, or iterable of str or callable, default: None) –
Names of constraint functions to call during the optimization loop.
Must be in {‘convex’, ‘concave’, ‘monotonic_inc’, ‘monotonic_dec’, ‘circular’, ‘none’}.
If None, then the model will apply no constraints.
If only one str or callable is specified, then it is copied for all features.
- distribution (str or Distribution object, default: 'normal') – Distribution to use in the model.
- link (str or Link object, default: 'identity') – Link function to use in the model.
- dtype (str in {'auto', 'numerical', 'categorical'}, or list of str, default: ‘auto’) –
String describing the data-type of each feature.
‘numerical’ is used for continuous-valued data-types, like in regression.
‘categorical’ is used for discrete-valued data-types, like in classification.
If only one str is specified, then it is copied for all features.
- lam (float or iterable of floats > 0, default: 0.6) –
Smoothing strength; must be a positive float, or one positive float per feature.
Larger values enforce stronger smoothing.
If only one float is specified, then it is copied for all features.
- fit_intercept (bool, default: True) –
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
NOTE: the intercept receives no smoothing penalty.
- fit_linear (bool or iterable of bools, default: False) –
Specifies if a linear term should be added to any of the feature functions. Useful for including pre-defined feature transformations in the model.
If only one bool is specified, then it is copied for all features.
NOTE: Many constraints are incompatible with an additional linear fit. E.g. if a non-zero linear function is added to a periodic spline function, it will cease to be periodic. The same can happen to a monotonic spline function, which may cease to be monotonic.
- fit_splines (bool or iterable of bools, default: True) –
Specifies if a smoother should be added to any of the feature functions. Useful for defining feature transformations a-priori that should not have splines fitted to them.
If only one bool is specified, then it is copied for all features.
NOTE: fit_splines supersedes n_splines, i.e. if n_splines > 0 and fit_splines = False, no splines will be fitted.
- max_iter (int, default: 100) – Maximum number of iterations allowed for the solver to converge.
- penalties (str or callable, or iterable of str or callable, default: ‘auto’) –
Type of penalty to use for each feature.
Each penalty should be in {‘auto’, ‘none’, ‘derivative’, ‘l2’}.
If ‘auto’, then the model will use 2nd derivative smoothing for features of dtype ‘numerical’, and L2 smoothing for features of dtype ‘categorical’.
If only one str or callable is specified, then it is copied for all features.
- n_splines (int, or iterable of ints, default: 25) –
Number of splines to use in each feature function; must be non-negative. If only one int is specified, then it is copied for all features.
Note: this value is set to 0 if fit_splines is False
- spline_order (int, or iterable of ints, default: 3) –
Order of spline to use in each feature function; must be non-negative. If only one int is specified, then it is copied for all features.
Note: if a feature is of type categorical, spline_order will be set to 0.
- tol (float, default: 1e-4) – Tolerance for stopping criteria.
- verbose (bool, default: False) – whether to show pyGAM warnings
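Several parameters above (lam, n_splines, spline_order, dtype, penalties, constraints, fit_linear, fit_splines) accept either a single value or one value per feature. The following is a minimal sketch of that "copied for all features" rule in plain Python. The helper `expand_per_feature` is hypothetical and only illustrates the rule; it is not pyGAM's implementation.

```python
def expand_per_feature(value, m_features):
    """Copy a single setting to every feature, or check a per-feature list.

    Hypothetical helper for illustration only (not part of pyGAM).
    """
    # A string is iterable, but it counts as one setting (e.g. 'auto').
    if isinstance(value, (str, bytes)) or not hasattr(value, "__iter__"):
        return [value] * m_features
    values = list(value)
    if len(values) != m_features:
        raise ValueError(
            "expected %d values, got %d" % (m_features, len(values))
        )
    return values


lam = expand_per_feature(0.6, 3)                # one float, three features
dtype = expand_per_feature("auto", 3)           # one str, three features
n_splines = expand_per_feature([25, 10, 5], 3)  # already one value per feature
```

Treating a string as a single setting matters here, because otherwise 'auto' would be split into four one-character entries.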
-
coef_
array, shape (n_classes, m_features) – Coefficient of the features in the decision function. If fit_intercept is True, then self.coef_[0] will contain the bias.
-
statistics_
dict – Dictionary containing model statistics like GCV/UBRE scores, AIC/c, parameter covariances, estimated degrees of freedom, etc.
-
logs_
dict – Dictionary containing the outputs of any callbacks at each optimization loop.
The logs are structured as {callback: […]}
References
Simon N. Wood, 2006. Generalized Additive Models: an introduction with R.
Hastie, Tibshirani, Friedman. The Elements of Statistical Learning. http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
Paul Eilers & Brian Marx, 2015. International Biometric Society: A Crash Course on P-splines. http://www.ibschannel2015.nl/project/userfiles/Crash_course_handout.pdf
-
confidence_intervals(X, width=0.95, quantiles=None)
Estimate confidence intervals for the model.
Parameters: - X (array-like of shape (n_samples, m_features)) – input data matrix
- width (float on [0,1], default: 0.95) – width of the confidence interval
- quantiles (array-like of floats in (0, 1), default: None) – instead of specifying the prediction width, one can specify the quantiles, so width=.95 is equivalent to quantiles=[.025, .975]
Returns: intervals
Return type: np.array of shape (n_samples, 2 or len(quantiles))
Notes
- Wood 2006, section 4.9
- Confidence intervals based on section 4.8 rely on large sample results to deal with non-Gaussian distributions, and treat the smoothing parameters as fixed, when in reality they are estimated from the data.
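The equivalence between width and quantiles stated above is simple arithmetic: a symmetric interval of a given width leaves half of the remaining probability in each tail. A minimal sketch, where `width_to_quantiles` is a hypothetical helper name and not a pyGAM function:

```python
def width_to_quantiles(width):
    """Return the symmetric pair of quantiles that spans `width`.

    Hypothetical helper for illustration only (not part of pyGAM).
    """
    if not 0.0 <= width <= 1.0:
        raise ValueError("width must lie in [0, 1]")
    tail = (1.0 - width) / 2.0  # probability mass left in each tail
    return [tail, 1.0 - tail]


quantiles = width_to_quantiles(0.95)
print([round(q, 3) for q in quantiles])  # [0.025, 0.975]
```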
-
deviance_residuals(X, y, weights=None, scaled=False)
Compute the deviance residuals of the model.
These are analogous to the residuals of an OLS.
Parameters: - X (array-like) – input data array of shape (n_samples, m_features)
- y (array-like) – output data vector of shape (n_samples,)
- weights (array-like of shape (n_samples,) or None, default: None) – sample weights. If None, defaults to an array of ones.
- scaled (bool, default: False) – whether to scale the deviance by the (estimated) distribution scale
Returns: deviance_residuals – with shape (n_samples,)
Return type: np.array
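To see why deviance residuals are analogous to OLS residuals, consider the normal distribution, whose unit deviance is (y - mu)^2. The deviance residual sign(y - mu) * sqrt(deviance) then reduces to the ordinary residual y - mu. The sketch below assumes a normal distribution, unit sample weights, and that scaling means dividing the deviance by the scale; the function name is hypothetical and this is not pyGAM's code.

```python
import math


def normal_deviance_residuals(y, mu, scale=None):
    """Deviance residuals for a normal distribution with unit weights.

    Hypothetical helper for illustration only. If `scale` is given, the
    unit deviance is divided by it (an assumption about what "scaled" means).
    """
    residuals = []
    for y_i, mu_i in zip(y, mu):
        unit_deviance = (y_i - mu_i) ** 2  # normal unit deviance
        if scale is not None:
            unit_deviance /= scale
        sign = math.copysign(1.0, y_i - mu_i)
        residuals.append(sign * math.sqrt(unit_deviance))
    return residuals


y = [1.0, 2.0, 4.0]
mu = [1.5, 2.0, 3.0]
unscaled = normal_deviance_residuals(y, mu)           # equals y - mu
scaled = normal_deviance_residuals(y, mu, scale=4.0)  # equals (y - mu) / 2
```

For other distributions the unit deviance has a different form, so the residuals no longer coincide with y - mu.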
-
fit(X, y, weights=None)
Fit the generalized additive model.
Parameters: - X (array-like, shape (n_samples, m_features)) – Training vectors, where n_samples is the number of samples and m_features is the number of features.
- y (array-like, shape (n_samples,)) – Target values (integers in classification, real numbers in regression). For classification, labels must correspond to classes.
- weights (array-like of shape (n_samples,) or None, default: None) – sample weights. If None, defaults to an array of ones.
Returns: self – Returns fitted GAM object
Return type:
-
generate_X_grid(n=500)
Create a grid of X data.
The array is sorted by feature and uniformly spaced, so the marginal and joint distributions are likely wrong.
Parameters: n (int, default: 500) – number of data points to create
Returns:
Return type: np.array of shape (n, n_features)
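As a rough illustration of what "sorted by feature and uniformly spaced" means, the sketch below builds such a grid in plain Python. It assumes the grid for each feature runs from that feature's observed minimum to its maximum; the bounds pyGAM actually uses may differ, and `uniform_grid` is a hypothetical name.

```python
def uniform_grid(X, n=500):
    """Build an (n, n_features) grid with evenly spaced values per feature.

    Hypothetical helper for illustration only. Bounds are taken from the
    observed min and max of each column, which is an assumption.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    grid_columns = []
    for column in zip(*X):  # iterate over features
        lo, hi = min(column), max(column)
        step = (hi - lo) / (n - 1)
        grid_columns.append([lo + i * step for i in range(n)])
    # Row i pairs the i-th grid value of every feature.
    return [list(row) for row in zip(*grid_columns)]


X = [[0.0, 10.0], [1.0, 30.0], [0.5, 20.0]]
grid = uniform_grid(X, n=5)
```

Because row i pairs the i-th grid value of every feature, all features increase together along the grid, which is why the joint distribution of the grid does not resemble that of real data.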
-
gridsearch(X, y, weights=None, return_scores=False, keep_best=True, objective='auto', progress=True, **param_grids)
Performs a grid search over a space of parameters for a given objective.
Warning
gridsearch is lazy and will not remove useless combinations from the search space, e.g.
>>> n_splines=np.arange(5,10), fit_splines=[True, False]
will result in 10 loops, of which 5 are equivalent, because n_splines has no effect when fit_splines == False.
It is not recommended to search over a grid that alternates between known scales and unknown scales, as the scores of the candidate models will not be comparable.
Parameters: - X (array) – input data of shape (n_samples, m_features)
- y (array) – label data of shape (n_samples,)
- weights (array-like of shape (n_samples,) or None, default: None) – sample weights. If None, defaults to an array of ones.
- return_scores (boolean, default: False) – whether to return the hyperparameters and score for each element in the grid
- keep_best (boolean, default: True) – whether to keep the best GAM as self
- objective (string, default: 'auto') – metric to optimize. Must be in [‘AIC’, ‘AICc’, ‘GCV’, ‘UBRE’, ‘auto’]. If ‘auto’, then grid search will optimize GCV for models with unknown scale and UBRE for models with known scale.
- progress (bool, default: True) – whether to display a progress bar
- **param_grids (dict, default {'lam': np.logspace(-3, 3, 11)}) –
Pairs of parameters and iterables of floats, or parameters and iterables of iterables of floats.
If iterable of iterables of floats, the outer iterable must have length m_features.
The method will make a grid of all the combinations of the parameters and fit a GAM to each combination.
Returns: if return_scores == True –
- model_scores : dict
Contains each fitted model as keys and corresponding objective scores as values
else – self, ie possibly the newly fitted model
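The size of the search space, and the redundancy described in the warning, can be checked with a short sketch. It enumerates the Cartesian product of the example grid with `itertools.product`, using `range(5, 10)` as a stand-in for `np.arange(5, 10)`, and then counts how many combinations are actually distinct once n_splines is ignored for fit_splines=False. This only mirrors the behavior described in the text; how gridsearch builds its grid internally is not shown here.

```python
import itertools

# Example grid from the warning; range(5, 10) stands in for np.arange(5, 10).
param_grids = {
    "n_splines": range(5, 10),
    "fit_splines": [True, False],
}

names = sorted(param_grids)
combos = [
    dict(zip(names, values))
    for values in itertools.product(*(param_grids[name] for name in names))
]

# n_splines has no effect when fit_splines is False, so collapse those cases.
effective = {
    (c["fit_splines"], c["n_splines"] if c["fit_splines"] else 0)
    for c in combos
}

# The documented default grid, np.logspace(-3, 3, 11), written out by hand.
default_lam_grid = [10 ** (-3 + 6 * i / 10) for i in range(11)]

print(len(combos), len(effective))  # 10 6
```

Dropping n_splines from the grid whenever fit_splines=False is searched avoids fitting the same model five times.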
-
loglikelihood(X, y, weights=None)
Compute the log-likelihood of the dataset using the current model.
Parameters: - X (array-like of shape (n_samples, m_features)) – containing the input dataset
- y (array-like of shape (n,)) – containing target values
- weights (array-like of shape (n,)) – containing sample weights
Returns: log-likelihood – containing log-likelihood scores
Return type: np.array of shape (n,)
-
partial_dependence(X=None, feature=-1, width=None, quantiles=None)
Computes the feature functions for the GAM and possibly their confidence intervals.
If both width=None and quantiles=None, then no confidence intervals are computed.
Parameters: - X (array or None, default: None) – input data of shape (n_samples, m_features). If None, an equally spaced grid of 500 points is generated for each feature function.
- feature (array-like of ints, default: -1) – feature for which to compute the partial dependence functions. If feature == -1, then all features are selected, excluding the intercept. If feature == ‘intercept’ and gam.fit_intercept is True, then the intercept’s partial dependence is returned.
- width (float on (0, 1), default: None) – width of the confidence interval. If None, defaults to 0.95.
- quantiles (array-like of floats on (0, 1), default: None) – instead of specifying the prediction width, one can specify the quantiles, so width=.95 is equivalent to quantiles=[.025, .975]. If None, defaults to width.
Returns: - pdeps (np.array of shape (n_samples, len(feature)))
- conf_intervals (list of length len(feature)) – containing np.arrays of shape (n_samples, 2 or len(quantiles))
-
predict(X)
Predict the expected value of the target given the model and input X. Often this is done via the expected value of the GAM given input X.
Parameters: X (array-like of shape (n_samples, m_features), default: None) – containing the input dataset
Returns: y – containing predicted values under the model
Return type: np.array of shape (n_samples,)
-
predict_mu(X)
Predict the expected value of the target given the model and input X.
Parameters: X (array-like of shape (n_samples, m_features), default: None) – containing the input dataset
Returns: y – containing expected values under the model
Return type: np.array of shape (n_samples,)
-
sample(X, y, quantity='y', sample_at_X=None, weights=None, n_draws=100, n_bootstraps=1, objective='auto')
Simulate from the posterior of the coefficients and smoothing params.
Samples are drawn from the posterior of the coefficients and smoothing parameters given the response in an approximate way. The GAM must already be fitted before calling this method; if the model has not been fitted, then an exception is raised. Moreover, it is recommended that the model and its hyperparameters be chosen with gridsearch (with the parameter keep_best=True) before calling sample, so that the result of that gridsearch can be used to generate useful response data and so that the model’s coefficients (and their covariance matrix) can be used as the first bootstrap sample.
These samples are drawn as follows. Details are in the reference below.
1. n_bootstraps many “bootstrap samples” of the response (y) are simulated by drawing random samples from the model’s distribution evaluated at the expected values (mu) for each sample in X.
2. A copy of the model is fitted to each of those bootstrap samples of the response. The result is an approximation of the distribution over the smoothing parameter lam given the response data y.
3. Samples of the coefficients are simulated from a multivariate normal using the bootstrap samples of the coefficients and their covariance matrices.
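The three steps can be sketched on a deliberately tiny model. The toy below is not pyGAM's implementation: it assumes a one-coefficient model y = b * x with normal noise of known standard deviation sigma, has no smoothing parameter to re-estimate, and spreads the draws over the bootstrap fits in round-robin order (an assumption). All function names are hypothetical. As described above, the already fitted model serves as the first bootstrap sample.

```python
import random


def fit_slope(x, y, sigma):
    """Least-squares fit of y = b * x; returns (b, variance of b)."""
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    return b, sigma ** 2 / sxx


def sample_coef(x, y, sigma, n_draws=100, n_bootstraps=1, seed=0):
    """Toy version of the three sampling steps for a single coefficient."""
    rng = random.Random(seed)
    b_hat, var_hat = fit_slope(x, y, sigma)
    fits = [(b_hat, var_hat)]  # the fitted model is the first bootstrap sample
    mu = [b_hat * xi for xi in x]
    for _ in range(n_bootstraps - 1):
        # Step 1: simulate a response from the model's distribution at mu.
        y_star = [rng.gauss(m, sigma) for m in mu]
        # Step 2: refit a copy of the model to the simulated response.
        fits.append(fit_slope(x, y_star, sigma))
    draws = []
    for i in range(n_draws):
        # Step 3: draw a coefficient from a normal centred on a bootstrap
        # fit (univariate here, multivariate in the general case).
        b, var_b = fits[i % len(fits)]
        draws.append(rng.gauss(b, var_b ** 0.5))
    return draws


x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
draws = sample_coef(x, y, sigma=0.1, n_draws=100, n_bootstraps=3)
```

In a real GAM, step 2 would also re-select the smoothing parameter for each bootstrap response, which is what makes each bootstrap sample expensive.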
Notes
A gridsearch is done n_bootstraps many times, so keep n_bootstraps small. Make n_bootstraps < n_draws to take advantage of the expensive bootstrap samples of the smoothing parameters.
For now, the grid of lam values is the default of gridsearch. Until randomized grid search is implemented, it is not worth setting n_bootstraps to a value greater than one because the smoothing parameters will be identical in each bootstrap sample.
Parameters: - X (array of shape (n_samples, m_features)) – empirical input data
- y (array of shape (n_samples,)) – empirical response vector
- quantity ({'y', 'coef', 'mu'}, default: 'y') – What quantity to return pseudorandom samples of. If sample_at_X is not None and quantity is either ‘y’ or ‘mu’, then samples are drawn at the values of X specified in sample_at_X.
- sample_at_X (array of shape (n_samples_to_simulate, m_features) or None, default: None) –
Input data at which to draw new samples.
Only applies for quantity equal to ‘y’ or to ‘mu’. If None, then sample_at_X is replaced by X.
- weights (np.array of shape (n_samples,)) – sample weights
- n_draws (positive int, default: 100) – The number of samples to draw from the posterior distribution of the coefficients and smoothing parameters
- n_bootstraps (positive int, default: 1) – The number of bootstrap samples to draw from simulations of the response (from the already fitted model) to estimate the distribution of the smoothing parameters given the response data. If n_bootstraps is 1, then only the already fitted model’s smoothing parameter is used, and the distribution over the smoothing parameters is not estimated using bootstrap sampling.
- objective (string, default: 'auto') – metric to optimize in grid search. Must be in [‘AIC’, ‘AICc’, ‘GCV’, ‘UBRE’, ‘auto’]. If ‘auto’, then grid search will optimize GCV for models with unknown scale and UBRE for models with known scale.
Returns: draws – Simulations of the given quantity using samples from the posterior distribution of the coefficients and smoothing parameter given the response data. Each row is a pseudorandom sample.
If quantity == ‘coef’, then the number of columns of draws is the number of coefficients (len(self.coef_)).
Otherwise, the number of columns of draws is the number of rows of sample_at_X if sample_at_X is not None or else the number of rows of X.
Return type: 2D array of length n_draws
References
Simon N. Wood, 2006. Generalized Additive Models: an introduction with R. Section 4.9.3 (pages 198–199) and Section 5.4.2 (pages 256–257).