Model-based clustering based on parameterized finite Gaussian mixture models. Models are estimated by EM algorithm initialized by hierarchical model-based agglomerative clustering. The optimal model is then selected according to BIC.

KalinNonchev/mclustpy

mclustpy

mclustpy is a Python function for clustering data with the Mclust algorithm from the R package mclust. It takes a 2D numpy array and returns a dictionary of the values computed by Mclust.

Installation

mclustpy requires the following dependencies:

  • numpy
  • rpy2

To install mclustpy, you can use pip:

pip install mclustpy
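
Since rpy2 runs an embedded R, it can help to confirm that the pieces are visible before the first call. The following is a minimal sketch, not part of mclustpy: the helper name check_environment is hypothetical, and looking for an R executable on PATH is an assumption about how R is usually located.

```python
import importlib.util
import shutil


def check_environment(modules=("numpy", "rpy2", "mclustpy")):
    """Hypothetical helper: report importable modules and whether R is on PATH."""
    # find_spec returns None when a top-level module cannot be found.
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # Assumption: R is discoverable through PATH (it may also be set via R_HOME).
    status["R executable"] = shutil.which("R") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A "missing" entry only means the item was not found from the current interpreter; it does not say why.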

Usage

from mclustpy import mclustpy
import numpy as np

data = np.random.rand(1000, 10)
print(data.shape)  # (1000, 10)

res = mclustpy(data, G=9, modelNames='EEE', random_seed=2020)

The mclustpy function takes the following parameters:

  • data: a 2D numpy array of data to be clustered.
  • G: an integer specifying the maximum number of mixture components to be considered (default is 9).
  • modelNames: a string specifying the model types to be considered (default is 'EEE').
  • random_seed: an integer specifying the random seed for reproducibility (default is 2020).
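
Because data must be a 2D numpy array, it can be worth normalizing the input first. The sketch below is illustrative only: prepare_data is a hypothetical helper, not part of mclustpy, and rejecting NaN or infinite values is an assumption about sensible input rather than documented behavior.

```python
import numpy as np


def prepare_data(data):
    """Hypothetical helper: coerce input to a 2D float array of shape (n, d)."""
    arr = np.asarray(data, dtype=float)
    if arr.ndim == 1:
        # Treat a flat vector as n observations of a single variable.
        arr = arr.reshape(-1, 1)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2D array, got {arr.ndim} dimensions")
    if not np.isfinite(arr).all():
        # Assumption: non-finite values are rejected up front.
        raise ValueError("data contains NaN or infinite values")
    return arr


data = prepare_data([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
print(data.shape)  # (3, 2)
```

The returned array could then be passed as the data argument in the call shown under Usage.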

The function returns a dictionary containing the following output values:

  • call: the function call used to run the Mclust algorithm.
  • data: the input data as an R matrix.
  • modelName: the model name(s) selected by the algorithm.
  • n: the number of observations in the data.
  • d: the number of variables in the data.
  • G: the number of mixture components selected by the algorithm.
  • BIC: the Bayesian Information Criterion (BIC) values for all models considered.
  • loglik: the log-likelihood of the selected model.
  • df: the number of estimated parameters in the selected model.
  • bic: the BIC value of the selected model.
  • icl: the Integrated Completed Likelihood (ICL) value of the selected model.
  • hypvol: the hypervolume parameter for the noise component, if one is used; otherwise NULL.
  • parameters: the estimated parameters for each component in the selected model.
  • z: the posterior probabilities of assignment to each component for each observation.
  • classification: the classification of each observation under the selected model.
  • uncertainty: a measure of uncertainty in the classification of each observation.
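
Several of these values are related in a simple way. As a rough sketch with made-up numbers (not output from mclustpy), the code below derives a classification and an uncertainty from a posterior matrix like z, and computes a BIC in the mclust convention, 2 * loglik - df * log(n), where larger is better. The array values and the function name mclust_bic are hypothetical; the 1-based labels assume that R's component numbering is passed through unchanged.

```python
import math

import numpy as np

# Hypothetical posterior probabilities: 4 observations, 3 components.
z = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.40, 0.35, 0.25],
    [0.10, 0.10, 0.80],
])

# Hard assignment: the component with the largest posterior probability.
# R numbers components from 1, so shift the zero-based argmax by one.
classification = z.argmax(axis=1) + 1

# Uncertainty: one minus the largest posterior probability in each row.
uncertainty = 1.0 - z.max(axis=1)


def mclust_bic(loglik, df, n):
    """BIC in the mclust convention (larger values indicate a better model)."""
    return 2.0 * loglik - df * math.log(n)


print(classification)  # [1 2 1 3]
print(uncertainty)
print(mclust_bic(loglik=-100.0, df=5, n=100))
```

The third observation is the least clear-cut: its largest posterior probability is only 0.40, so its uncertainty is the highest of the four.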

For more information, see the original mclust page.

License Notice:

This package, mclustpy, is licensed under the MIT License. However, it depends on the R package mclust, which is licensed under the GNU General Public License (GPL ≥2). Users must ensure that they comply with the GPL when using mclustpy.
