PoissonGAM

class pygam.pygam.PoissonGAM(lam=0.6, max_iter=100, n_splines=25, spline_order=3, penalties='auto', dtype='auto', tol=0.0001, callbacks=['deviance', 'diffs'], fit_intercept=True, fit_linear=False, fit_splines=True, constraints=None, verbose=False)

Bases: pygam.pygam.GAM

Poisson GAM

This is a GAM with a Poisson error distribution, and a log link.
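As a minimal sketch of what the log link implies (this is not pyGAM's internal code): the linear predictor, called `eta` here as an assumption, can be any real number, and the expected count is recovered as `exp(eta)`, which is always positive. The function names below are hypothetical.

```python
import math

def log_link(mu):
    """Link function: maps a positive mean to the linear-predictor scale."""
    return math.log(mu)

def inverse_log_link(eta):
    """Inverse link: maps any real linear predictor to a positive mean."""
    return math.exp(eta)

# Any real-valued predictor yields a strictly positive expected count.
means = [inverse_log_link(eta) for eta in (-2.0, 0.0, 1.5)]
```

This is why a log link suits count data: the fitted mean can never become negative, whatever the spline terms contribute.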

Parameters:

callbacks (list of strings or list of CallBack objects, default: [‘deviance’, ‘diffs’]) – Names of callback objects to call during the optimization loop.
constraints (str or callable, or iterable of str or callable, default: None) –
Names of constraint functions to call during the optimization loop.

Must be in {‘convex’, ‘concave’, ‘monotonic_inc’, ‘monotonic_dec’, ‘circular’, ‘none’}.

If None, then the model will apply no constraints.

If only one str or callable is specified, then it is copied for all features.
dtype (str in {'auto', 'numerical', 'categorical'}, or list of str, default: ‘auto’) –
String describing the data-type of each feature.

‘numerical’ is used for continuous-valued data-types, like in regression.

‘categorical’ is used for discrete-valued data-types, like in classification.

If only one str is specified, then it is copied for all features.
lam (float or iterable of floats > 0, default: 0.6) –
Smoothing strength; must be a positive float, or one positive float per feature.

Larger values enforce stronger smoothing.

If only one float is specified, then it is copied for all features.
fit_intercept (bool, default: True) –
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

NOTE: the intercept receives no smoothing penalty.
fit_linear (bool or iterable of bools, default: False) –
Specifies if a linear term should be added to any of the feature functions. Useful for including pre-defined feature transformations in the model.

If only one bool is specified, then it is copied for all features.

NOTE: Many constraints are incompatible with an additional linear fit.

e.g. if a non-zero linear function is added to a periodic spline function, it will cease to be periodic. The same can happen with a monotonic spline function, which may cease to be monotonic.
fit_splines (bool or iterable of bools, default: True) –
Specifies if a smoother should be added to any of the feature functions. Useful for defining feature transformations a-priori that should not have splines fitted to them.

If only one bool is specified, then it is copied for all features.

NOTE: fit_splines supersedes n_splines, i.e. if n_splines > 0 and fit_splines = False, no splines will be fitted.
max_iter (int, default: 100) – Maximum number of iterations allowed for the solver to converge.
penalties (str or callable, or iterable of str or callable, default: ‘auto’) –
Type of penalty to use for each feature.

Each penalty should be in {‘auto’, ‘none’, ‘derivative’, ‘l2’}.

If ‘auto’, then the model will use 2nd derivative smoothing for features of dtype ‘numerical’, and L2 smoothing for features of dtype ‘categorical’.

If only one str or callable is specified, then it is copied for all features.
n_splines (int, or iterable of ints, default: 25) –
Number of splines to use in each feature function; must be non-negative. If only one int is specified, then it is copied for all features.

Note: this value is set to 0 if fit_splines is False.
spline_order (int, or iterable of ints, default: 3) –
Order of spline to use in each feature function; must be non-negative. If only one int is specified, then it is copied for all features.

Note: if a feature is of type categorical, spline_order will be set to 0.
tol (float, default: 1e-4) – Tolerance for stopping criteria.
verbose (bool, default: False) – Whether to show pyGAM warnings.
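Several of the parameters above (lam, dtype, constraints, penalties, n_splines, spline_order, fit_linear, fit_splines) accept either a single value or one value per feature. The following is only an illustrative sketch of that "copied for all features" rule; `expand_per_feature` is a hypothetical helper, not part of pyGAM, and the library's own validation may differ.

```python
def expand_per_feature(value, m_features):
    """Hypothetical helper: copy a single setting to every feature,
    or check that an iterable has exactly one entry per feature."""
    # Strings are iterable, so scalar-like values are tested for first.
    if value is None or callable(value) or isinstance(value, (str, bool, int, float)):
        return [value] * m_features
    values = list(value)
    if len(values) != m_features:
        raise ValueError(
            "expected %d values, got %d" % (m_features, len(values))
        )
    return values

lam = expand_per_feature(0.6, 3)  # [0.6, 0.6, 0.6]
dtype = expand_per_feature(['numerical', 'categorical', 'auto'], 3)
```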

coef_: array, shape (n_classes, m_features) – Coefficient of the features in the decision function. If fit_intercept is True, then self.coef_[0] will contain the bias.

statistics_: dict – Dictionary containing model statistics like GCV/UBRE scores, AIC/c, parameter covariances, estimated degrees of freedom, etc.

logs_: dict – Dictionary containing the outputs of any callbacks at each optimization loop.

The logs are structured as {callback: […]}

References

Simon N. Wood, 2006 Generalized Additive Models: an introduction with R

Hastie, Tibshirani, Friedman The Elements of Statistical Learning http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

Paul Eilers & Brian Marx, 2015 International Biometric Society: A Crash Course on P-splines http://www.ibschannel2015.nl/project/userfiles/Crash_course_handout.pdf

fit(X, y, exposure=None, weights=None)

Fit the generalized additive model.

Parameters:

X (array-like, shape (n_samples, m_features)) – Training vectors, where n_samples is the number of samples and m_features is the number of features.
y (array-like, shape (n_samples,)) – Target values (integers in classification, real numbers in regression). For classification, labels must correspond to classes.
exposure (array-like, shape (n_samples,), or None, default: None) – Exposures. If None, defaults to an array of ones.
weights (array-like, shape (n_samples,), or None, default: None) – Sample weights. If None, defaults to an array of ones.

Returns:	self – Returns fitted GAM object
Return type:	object

gridsearch(X, y, exposure=None, weights=None, return_scores=False, keep_best=True, objective='auto', **param_grids)

Performs a grid search over a space of parameters for a given objective.

NOTE: the gridsearch method is lazy and will not remove useless combinations from the search space, e.g.

>>> n_splines=np.arange(5,10), fit_splines=[True, False]

will result in 10 loops, of which 5 are equivalent, because n_splines has no effect when fit_splines == False.

It is not recommended to search over a grid that alternates between known scales and unknown scales, as the scores of the candidate models will not be comparable.
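The lazy search space described in the note above can be reproduced with a plain Cartesian product. This is a sketch under the assumption that the grid is built as "all combinations of the parameters"; it uses `range(5, 10)` in place of `np.arange(5, 10)` to stay within the standard library, and it does not call pyGAM.

```python
from itertools import product

# The grid from the note: 5 values of n_splines, 2 values of fit_splines.
param_grids = {
    'n_splines': range(5, 10),
    'fit_splines': [True, False],
}

names = list(param_grids)
grid = [dict(zip(names, combo)) for combo in product(*param_grids.values())]

# Combinations with fit_splines=False ignore n_splines, so they all
# describe the same model and are fitted redundantly.
redundant = [g for g in grid if not g['fit_splines']]
distinct = [g for g in grid if g['fit_splines']]
```

Here `grid` has 10 entries, of which the 5 in `redundant` are equivalent to one another. Pruning such combinations before calling gridsearch avoids the repeated fits.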

Parameters:

X (array) – input data of shape (n_samples, m_features)
y (array) – label data of shape (n_samples,)
exposure (array-like, shape (n_samples,), or None, default: None) – Exposures. If None, defaults to an array of ones.
weights (array-like, shape (n_samples,), or None, default: None) – Sample weights. If None, defaults to an array of ones.
return_scores (boolean, default: False) – Whether to return the hyperparameters and score for each element in the grid.
keep_best (boolean, default: True) – Whether to keep the best GAM as self.
objective (string, default: 'auto') – Metric to optimize. Must be in [‘AIC’, ‘AICc’, ‘GCV’, ‘UBRE’, ‘auto’]. If ‘auto’, then grid search will optimize GCV for models with unknown scale and UBRE for models with known scale.
**param_grids (dict, default: {'lam': np.logspace(-3, 3, 11)}) –
Pairs of parameters and iterables of floats, or parameters and iterables of iterables of floats.

If iterable of iterables of floats, the outer iterable must have length m_features.

The method will make a grid of all the combinations of the parameters and fit a GAM to each combination.

Returns:

if return_scores == True –

model_scores : dict

Contains each fitted model as keys and corresponding objective scores as values.
else – self, i.e. possibly the newly fitted model

loglikelihood(X, y, exposure=None, weights=None)

Compute the log-likelihood of the dataset using the current model.

Parameters:

X (array-like of shape (n_samples, m_features)) – The input dataset.
y (array-like of shape (n,)) – Target values.
exposure (array-like, shape (n_samples,), or None, default: None) – Exposures. If None, defaults to an array of ones.
weights (array-like of shape (n,)) – Sample weights.

Returns:	log-likelihood – containing log-likelihood scores
Return type:	np.array of shape (n,)
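For reference, the per-sample Poisson log-likelihood has the closed form y·log(mu) − mu − log(y!). The sketch below evaluates it with the standard library only; `poisson_loglikelihood` is a hypothetical function, it takes the predicted means `mu` directly, and it leaves out exposure and weights, so it is not a reproduction of this method.

```python
import math

def poisson_loglikelihood(y, mu):
    """Per-sample Poisson log-likelihood: y*log(mu) - mu - log(y!)."""
    # lgamma(y + 1) equals log(y!) for non-negative integer counts.
    return [
        yi * math.log(mi) - mi - math.lgamma(yi + 1)
        for yi, mi in zip(y, mu)
    ]

# Hypothetical observed counts and model means, for illustration only.
scores = poisson_loglikelihood(y=[0, 2, 5], mu=[1.0, 2.0, 4.0])
total = sum(scores)
```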

predict(X, exposure=None)

Predict the expected value of the target given the model and input X. Often this is done via the expected value of the GAM given input X.

Parameters:

X (array-like of shape (n_samples, m_features), default: None) – The input dataset.
exposure (array-like, shape (n_samples,), or None, default: None) – Exposures. If None, defaults to an array of ones.

Returns:	y – containing predicted values under the model
Return type:	np.array of shape (n_samples,)
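A common convention for exposure in count models is to treat the model output as a rate and scale it by the exposure, so the expected count is exposure × exp(eta). The sketch below illustrates that convention only; `predict_counts` and `eta` are hypothetical names, and whether pyGAM computes predictions in exactly this way is an assumption.

```python
import math

def predict_counts(eta, exposure=None):
    """Expected counts: exposure times the rate exp(eta).
    exposure defaults to ones, mirroring the documented default."""
    if exposure is None:
        exposure = [1.0] * len(eta)
    return [e * math.exp(h) for h, e in zip(eta, exposure)]

eta = [0.0, 0.0]  # hypothetical linear-predictor values
unit_exposure = predict_counts(eta)               # [1.0, 1.0]
with_exposure = predict_counts(eta, [1.0, 3.0])   # [1.0, 3.0]
```

Under this convention, a sample observed for three times as long is expected to produce three times as many events at the same rate.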