tslearn.clustering.KernelKMeans

class tslearn.clustering.KernelKMeans(n_clusters=3, kernel='gak', max_iter=50, tol=1e-06, n_init=1, kernel_params=None, sigma=1.0, n_jobs=None, verbose=0, random_state=None)
Kernel k-means.
Parameters:  n_clusters : int (default: 3)
Number of clusters to form.
 kernel : string, or callable (default: “gak”)
The kernel should either be “gak”, in which case the Global Alignment Kernel from [2] is used, or a value that is accepted as a metric by scikit-learn’s pairwise_kernels.
 max_iter : int (default: 50)
Maximum number of iterations of the k-means algorithm for a single run.
 tol : float (default: 1e-6)
Inertia variation threshold. If at some point, inertia varies less than this threshold between two consecutive iterations, the model is considered to have converged and the algorithm stops.
 n_init : int (default: 1)
Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
 kernel_params : dict or None (default: None)
Kernel parameters to be passed to the kernel function. None means no kernel parameter is set. For the Global Alignment Kernel, the only parameter of interest is sigma. If set to ‘auto’, it is computed based on a sampling of the training set (cf. tslearn.metrics.sigma_gak). If no specific value is set for sigma, it defaults to 1.
 sigma : float or “auto” (default: 1.0)
Bandwidth parameter for the Global Alignment Kernel. If set to ‘auto’, it is computed based on a sampling of the training set (cf. tslearn.metrics.sigma_gak).
Deprecated since version 0.4: Setting sigma directly as a parameter for KernelKMeans and GlobalAlignmentKernelKMeans is deprecated in version 0.4 and will be removed in 0.6. Use kernel_params instead.
 n_jobs : int or None, optional (default=None)
The number of jobs to run in parallel for GAK cross-similarity matrix computations. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See scikit-learn’s Glossary for more details.
 verbose : int (default: 0)
If nonzero, joblib progress messages are printed.
 random_state : integer or numpy.random.RandomState, optional
Generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
Attributes:  labels_ : numpy.ndarray
Labels of each point
 inertia_ : float
Sum of distances of samples to their closest cluster center (computed using the kernel trick).
 sample_weight_ : numpy.ndarray
The weight given to each sample from the data provided to fit.
 n_iter_ : int
The number of iterations performed during fit.
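The phrase "computed using the kernel trick" can be illustrated with a short NumPy sketch. This is not tslearn's implementation: the helper name `kernel_distances_to_centers` is hypothetical, sample weights are ignored, and a linear kernel is used only because it lets the result be checked against explicit Euclidean distances. The idea is that the squared distance from a sample to a cluster center in feature space can be written using Gram matrix entries alone.

```python
import numpy as np

def kernel_distances_to_centers(K, labels, n_clusters):
    """Squared feature-space distance from each sample to each cluster center.

    Only the Gram matrix K (n x n) is used:
    ||phi(x_i) - m_c||^2 = K_ii - 2 * mean_j K_ij + mean_{j,l} K_jl,
    where j and l range over the members of cluster c.
    """
    n = K.shape[0]
    dist = np.empty((n, n_clusters))
    for c in range(n_clusters):
        mask = labels == c
        dist[:, c] = (
            np.diag(K)
            - 2.0 * K[:, mask].mean(axis=1)
            + K[np.ix_(mask, mask)].mean()
        )
    return dist

rng = np.random.default_rng(0)
X = rng.standard_normal((12, 4))
labels = np.arange(12) % 3      # arbitrary assignment, no empty cluster
K = X @ X.T                     # linear kernel, so feature space is input space

dist = kernel_distances_to_centers(K, labels, n_clusters=3)

# Explicit computation for comparison (only possible for the linear kernel).
centers = np.stack([X[labels == c].mean(axis=0) for c in range(3)])
explicit = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

# Unweighted inertia: sum of each sample's distance to its closest center.
inertia = dist.min(axis=1).sum()
print(np.allclose(dist, explicit))
```

With a non-linear kernel such as GAK the centers cannot be written down explicitly, which is why only the Gram matrix form is available.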
Notes
The training data are saved to disk when this model is serialized, which may result in a large model file if the training dataset is large.
References
[1] Kernel k-means, Spectral Clustering and Normalized Cuts. Inderjit S. Dhillon, Yuqiang Guan, Brian Kulis. KDD 2004.
[2] Fast Global Alignment Kernels. Marco Cuturi. ICML 2011.
Examples
>>> import numpy
>>> from tslearn.clustering import KernelKMeans
>>> from tslearn.generators import random_walks
>>> X = random_walks(n_ts=50, sz=32, d=1)
>>> gak_km = KernelKMeans(n_clusters=3, kernel="gak", random_state=0)
>>> gak_km.fit(X)  # doctest: +ELLIPSIS
KernelKMeans(...)
>>> print(numpy.unique(gak_km.labels_))
[0 1 2]
Methods
fit(X[, y, sample_weight])  Compute kernel k-means clustering.
fit_predict(X[, y])  Fit kernel k-means clustering using X and then predict the closest cluster each time series in X belongs to.
from_hdf5(path)  Load model from an HDF5 file.
from_json(path)  Load model from a JSON file.
from_pickle(path)  Load model from a pickle file.
get_params([deep])  Get parameters for this estimator.
predict(X)  Predict the closest cluster each time series in X belongs to.
set_params(**params)  Set the parameters of this estimator.
to_hdf5(path)  Save model to an HDF5 file.
to_json(path)  Save model to a JSON file.
to_pickle(path)  Save model to a pickle file.

fit(X, y=None, sample_weight=None)
Compute kernel k-means clustering.
Parameters:  X : array-like of shape=(n_ts, sz, d)
Time series dataset.
 y
Ignored
 sample_weight : array-like of shape=(n_ts, ) or None (default: None)
Weights to be given to time series in the learning process. By default, all time series weights are equal.

fit_predict(X, y=None)
Fit kernel k-means clustering using X and then predict the closest cluster each time series in X belongs to.
It is more efficient to use this method than to sequentially call fit and predict.
Parameters:  X : array-like of shape=(n_ts, sz, d)
Time series dataset to predict.
 y
Ignored
Returns:  labels : array of shape=(n_ts, )
Index of the cluster each sample belongs to.

classmethod from_hdf5(path)
Load model from an HDF5 file. Requires h5py (http://docs.h5py.org/).
Parameters:  path : str
Full path to file.
Returns:  Model instance

classmethod from_json(path)
Load model from a JSON file.
Parameters:  path : str
Full path to file.
Returns:  Model instance

classmethod from_pickle(path)
Load model from a pickle file.
Parameters:  path : str
Full path to file.
Returns:  Model instance

get_params(deep=True)
Get parameters for this estimator.
Parameters:  deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : dict
Parameter names mapped to their values.

predict(X)
Predict the closest cluster each time series in X belongs to.
Parameters:  X : array-like of shape=(n_ts, sz, d)
Time series dataset to predict.
Returns:  labels : array of shape=(n_ts, )
Index of the cluster each sample belongs to.

set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters:  **params : dict
Estimator parameters.
Returns:  self : estimator instance
Estimator instance.

to_hdf5(path)
Save model to an HDF5 file. Requires h5py (http://docs.h5py.org/).
Parameters:  path : str
Full file path. File must not already exist.
Raises:  FileExistsError
If a file with the same path already exists.