Context :
SelvarClustIndep is an object-oriented C++ program devoted to variable selection in model-based clustering.
It implements a greedy algorithm for the SRUW model proposed by C. Maugis, G. Celeux and M.-L. Martin-Magniette in [1] and [2], which modifies the method of Raftery and Dean [3] and improves on our SelvarClust algorithm [4]. The SRUW model takes into account three possible variable roles: relevant, redundant and independent.
The software handles datasets whose observations are described by quantitative variables. It returns a clustering of the data and the selected model, which consists of the number of clusters, the mixture form, the variance matrix forms of the linear regression and of the independent Gaussian density, and the variable partition.
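For orientation, the SRUW model can be summarized by the density below. This is a sketch of the formulation in [1], not output of the software: V = (S, R, U, W) is the variable partition (S relevant clustering variables, R ⊆ S the regressors, U redundant variables, W independent variables), φ is a Gaussian density, m constrains the mixture variances Σ_k, r the regression covariance Ω and ℓ (written l in (K,m,r,l)) the independent variance τ. Refer to [1] for the exact definitions.

```latex
f(x \mid K, m, r, \ell, V) =
  \underbrace{\sum_{k=1}^{K} p_k \, \phi\left(x^{S} \mid \mu_k, \Sigma_k\right)}_{\text{clustering on } S}
  \times
  \underbrace{\phi\left(x^{U} \mid a + x^{R}\beta, \, \Omega\right)}_{\text{regression of } U \text{ on } R \subseteq S}
  \times
  \underbrace{\phi\left(x^{W} \mid \gamma, \, \tau\right)}_{\text{independent variables } W}
```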
Main references :
- [1] Maugis, C., Celeux, G. and Martin-Magniette, M.-L. (2009) Variable selection in model-based clustering: A general variable role modeling. Computational Statistics and Data Analysis, 53, 3872-3882.
- [2] Maugis, C., Celeux, G. and Martin-Magniette, M.-L. (2008) Variable selection in model-based clustering: A general variable role modeling. INRIA Research Report RR-6744.
- [3] Raftery, A.E. and Dean, N. (2006) Variable selection for model-based clustering. Journal of the American Statistical Association, 101, 168-178.
- [4] Maugis, C., Celeux, G. and Martin-Magniette, M.-L. (2009) Variable selection for clustering with Gaussian mixture models. Biometrics, 65, 701-709.
Installation in Linux:
- SelvarClustIndep requires the Mixmod software (version 2.1.1).
First install Mixmod (see the Mixmod Quick Start for installation help).
In what follows, mixmodDir denotes the full path of the directory where Mixmod is installed.
- Declare the path of Mixmod by running the following command in the bash shell:
export PATH=mixmodDir/Mixmod/BIN:$PATH
- Download the following .zip archive, which contains the .cpp and .hpp source files and the Makefile for Linux:
SelvarClustIndep.zip
Unzip SelvarClustIndep.zip into a directory. In what follows, the full path of this directory is called SelvarClustIndepDir.
Compile by running the command make in SelvarClustIndepDir; this creates the executable SelvarClustIndep. You can then declare this executable in the bash shell with the command:
export PATH=SelvarClustIndepDir:$PATH
Arguments and Usage in Linux:
To run the SelvarClustIndep algorithm, use the following command:
nohup ./SelvarClustIndep Arg1 Arg2 Arg3 Arg4 Arg5 Arg6
with the following arguments :
Arg1 : path of the file containing the data (example: /home/example/Data.txt)
Arg2 : path of the file containing the numbers of clusters to consider (example: /home/example/NbClusters.txt)
       NbClusters.txt contains a single column giving the numbers of Gaussian mixture components to consider.
Arg3 : path of the file containing the Gaussian mixture forms to consider (example: /home/example/MixtureForms.txt)
       MixtureForms.txt contains a single column giving the number of each mixture form to consider, according to the correspondence table.
Arg4 : path of the file containing the forms of the regression covariance matrix to consider (example: /home/example/RegForms.txt)
       RegForms.txt contains a single column giving the number of each form to consider (1: spherical, 2: diagonal, 3: general).
Arg5 : path of the file containing the forms of the variance matrix of the independent Gaussian density to consider (example: /home/example/IndepForms.txt)
       IndepForms.txt contains a single column giving the number of each form to consider (1: spherical, 2: diagonal).
Arg6 : path of the directory where the results will be saved (example: /home/Results)
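The script below is a minimal sketch of how the Arg2 to Arg5 files could be prepared and the program launched. It assumes that "a column" means plain text with one integer per line; the working directory ./selvar_example, the helper check_codes and the mixture-form values are hypothetical placeholders (the real mixture-form codes come from the correspondence table). Only the documented ranges for Arg4 (1 to 3) and Arg5 (1 to 2) are checked.

```bash
#!/usr/bin/env bash
# Sketch: write the model-list files (Arg2..Arg5) and launch SelvarClustIndep.
# Assumption: each file holds one integer per line. Data.txt must be supplied
# by the user in the working directory; it is not created here.
set -euo pipefail

workdir="${1:-./selvar_example}"   # hypothetical working directory
mkdir -p "$workdir/Results"

# Arg2: numbers of Gaussian mixture components to consider.
printf '%s\n' 2 3 4 > "$workdir/NbClusters.txt"
# Arg3: PLACEHOLDER mixture-form codes; replace with values from the correspondence table.
printf '%s\n' 1 2 > "$workdir/MixtureForms.txt"
# Arg4: regression covariance forms (1: spherical, 2: diagonal, 3: general).
printf '%s\n' 1 2 3 > "$workdir/RegForms.txt"
# Arg5: independent variance forms (1: spherical, 2: diagonal).
printf '%s\n' 1 2 > "$workdir/IndepForms.txt"

# Hypothetical helper: fail if FILE contains a line that does not fully match REGEX.
check_codes() {   # usage: check_codes FILE REGEX
  if grep -qvxE "$2" "$1"; then
    echo "invalid code in $1" >&2
    return 1
  fi
}
check_codes "$workdir/RegForms.txt"   '[123]'
check_codes "$workdir/IndepForms.txt" '[12]'

# Launch in the background only if the executable is on PATH
# (when stdout is a terminal, nohup writes the output to nohup.out).
if command -v SelvarClustIndep >/dev/null 2>&1; then
  nohup SelvarClustIndep "$workdir/Data.txt" "$workdir/NbClusters.txt" \
    "$workdir/MixtureForms.txt" "$workdir/RegForms.txt" \
    "$workdir/IndepForms.txt" "$workdir/Results" &
else
  echo "SelvarClustIndep is not on PATH; input files written to $workdir" >&2
fi
```

Validating the codes before launching avoids starting a long run with an out-of-range form number.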
Results :
After running the SelvarClustIndep algorithm, the directory given in Arg6 contains the following files:
- SelectedModel.txt : the model selected by the SelvarClustIndep algorithm.
- res.txt : the variable partition and the criterion value for each model (K,m,r,l), where K is the number of clusters, m the mixture form, r the form of the regression covariance matrix and l the form of the variance matrix of the independent Gaussian density.
- MAPlabelFinal.txt : the data clustering obtained with the MAP (maximum a posteriori) rule.
- probaposterior.txt : the posterior (conditional) probabilities of cluster membership for each individual.
- mixtureparameters.txt : the proportion, the mean vector and the variance matrix of each mixture component.
- regressionparameters.txt : the intercept, the coefficient matrix and the covariance matrix of the regression of the redundant variables U on a subset R of the relevant clustering variables S.
- indepparameters.txt : the estimated parameters of the independent Gaussian density.
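MAPlabelFinal.txt is produced by the program itself; the function below only illustrates the MAP rule: each individual is assigned to the cluster with the largest posterior probability. It assumes (this layout is not documented here) that probaposterior.txt has one row per individual and one whitespace-separated column per cluster; map_labels is a hypothetical name, clusters are numbered from 1 and ties go to the lowest index.

```bash
# Sketch of the MAP rule on a posterior-probability matrix.
# Assumption: rows = individuals, columns = P(cluster k | x_i), k = 1..K.
map_labels() {   # usage: map_labels probaposterior.txt
  awk 'NF > 0 {
    best = 1
    for (k = 2; k <= NF; k++)
      if ($k + 0 > $best + 0) best = k   # keep the first maximum on ties
    print best
  }' "$1"
}
```

For instance, `map_labels Results/probaposterior.txt > labels.txt` writes one cluster number per line, which can be compared with MAPlabelFinal.txt if the two files use the same layout.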
Examples :
Three examples are provided. DATA.zip contains the files needed to run SelvarClustIndep with the command
nohup ./SelvarClustIndep DATAxxx.txt k.txt m.txt reg.txt indep.txt Resultsxxx/
where xxx is 1, 2 or 3. DATA1.txt, DATA2.txt and DATA3.txt contain datasets simulated according to Scenarios 1, 5 and 6, respectively (see Section "Seven simulated situations" in [1]).
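The three examples can be run in turn with a small loop. This is a sketch assuming that DATA.zip has been unzipped into the current directory (so DATA1.txt to DATA3.txt, k.txt, m.txt, reg.txt and indep.txt are present) and that SelvarClustIndep is on PATH. Whether the program creates the results directory itself is not stated, so the sketch creates it first; run_examples and the DRY_RUN switch are hypothetical.

```bash
# Hypothetical helper: run the three bundled examples one after the other.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_examples() {
  local i
  local -a cmd
  for i in 1 2 3; do
    mkdir -p "Results$i"
    cmd=(SelvarClustIndep "DATA$i.txt" k.txt m.txt reg.txt indep.txt "Results$i/")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      nohup "${cmd[@]}"   # runs in the foreground, so examples run sequentially
    fi
  done
}
```

Running the examples sequentially keeps one computation at a time on the machine; appending `&` to the nohup line would run them in parallel instead.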
Installation in Windows:
- SelvarClustIndep requires the Mixmod software (version 2.1.1).
First install Mixmod in the folder C:\Program Files\Mixmod (the installation folder must be named Mixmod). See the Mixmod Quick Start for installation help.
- Declare the path of Mixmod: from the desktop, right-click My Computer and click Properties. In the System Properties window, click the Advanced tab, then, in the Advanced section, click the Environment Variables button.
Finally, in the Environment Variables window, select the Path variable in the System variables section and click Edit. Append a semicolon followed by the path:
C:\Program Files\Mixmod\BIN
- Download the following executable into a directory; in what follows, the full path of this directory is called SelvarClustIndepDir.
SelvarClustIndepWindows.exe
Arguments and Usage in Windows:
To run the SelvarClustIndep algorithm, use the following command:
SelvarClustIndepDir\SelvarClustIndepWindows.exe Arg1 Arg2 Arg3 Arg4 Arg5 Arg6
with the following arguments :
Arg1 : path of the file containing the data
Arg2 : path of the file containing the numbers of clusters to consider
       NbClusters.txt contains a single column giving the numbers of Gaussian mixture components to consider.
Arg3 : path of the file containing the Gaussian mixture forms to consider
       MixtureForms.txt contains a single column giving the number of each mixture form to consider, according to the correspondence table.
Arg4 : path of the file containing the forms of the regression covariance matrix to consider
       RegForms.txt contains a single column giving the number of each form to consider (1: spherical, 2: diagonal, 3: general).
Arg5 : path of the file containing the forms of the variance matrix of the independent Gaussian density to consider
       IndepForms.txt contains a single column giving the number of each form to consider (1: spherical, 2: diagonal).
Arg6 : path of the directory where the results will be saved
Results :
After running the SelvarClustIndep algorithm, the directory given in Arg6 contains the following files:
- SelectedModel.txt : the model selected by the SelvarClustIndep algorithm.
- res.txt : the variable partition and the criterion value for each model (K,m,r,l), where K is the number of clusters, m the mixture form, r the form of the regression covariance matrix and l the form of the variance matrix of the independent Gaussian density.
- MAPlabelFinal.txt : the data clustering obtained with the MAP (maximum a posteriori) rule.
- probaposterior.txt : the posterior (conditional) probabilities of cluster membership for each individual.
- mixtureparameters.txt : the proportion, the mean vector and the variance matrix of each mixture component.
- regressionparameters.txt : the intercept, the coefficient matrix and the covariance matrix of the regression of the redundant variables U on a subset R of the relevant clustering variables S.
- indepparameters.txt : the estimated parameters of the independent Gaussian density.
Bugs and Feedback - Contacts
Send an e-mail with the subject "Bugs-SelvarClustIndep" to cathy.maugis -AT- insa-toulouse.fr