A new algorithm and theory for penalized regression-based clustering

Chong Wu, Sunghoon Kwon, Xiaotong T Shen, Wei Pan

Research output: Contribution to journalArticle

1 Citation (Scopus)

Abstract

Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit, as demonstrated in Pan et al. (2013). In this paper, we develop a more efficient algorithm for scalable computation and a new theory of clustering consistency for the method. This algorithm, called DC-ADMM, combines difference of convex (DC) programming with the alternating direction method of multipliers (ADMM). This algorithm is shown to be more computationally efficient than the quadratic penalty based algorithm of Pan et al. (2013) because of the former's closed-form updating formulas. Numerically, we compare the DC-ADMM algorithm with the quadratic penalty algorithm to demonstrate its utility and scalability. Theoretically, we establish a finitesample mis-clustering error bound for penalized regression based clustering with the L0 constrained regularization in a general setting. On this ground, we provide conditions for clustering consistency of the penalized clustering method. As an end product, we put R package prclust implementing PRclust with various loss and grouping penalty functions available on GitHub and CRAN.

Original languageEnglish (US)
JournalJournal of Machine Learning Research
Volume17
StatePublished - Oct 1 2016

Fingerprint

Penalized Regression
Method of multipliers
Alternating Direction Method
Clustering
Grouping
Penalty
Convex Programming
Pursuit
Penalty Function
Clustering Methods
Convex optimization
Error Bounds
Updating
Regularization
Scalability
Closed-form
Efficient Algorithms
Demonstrate

Keywords

  • Alternating direction method of multipliers (ADMM)
  • Clustering consistency
  • Difference of convex (DC) programming
  • Truncated L1-penalty (TLP)

Cite this

A new algorithm and theory for penalized regression-based clustering. / Wu, Chong; Kwon, Sunghoon; Shen, Xiaotong T; Pan, Wei.

In: Journal of Machine Learning Research, Vol. 17, 01.10.2016.

Research output: Contribution to journalArticle

@article{25a22f3fd65140f8ba3301584d6de569,
title = "A new algorithm and theory for penalized regression-based clustering",
abstract = "Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit, as demonstrated in Pan et al. (2013). In this paper, we develop a more efficient algorithm for scalable computation and a new theory of clustering consistency for the method. This algorithm, called DC-ADMM, combines difference of convex (DC) programming with the alternating direction method of multipliers (ADMM). This algorithm is shown to be more computationally efficient than the quadratic penalty based algorithm of Pan et al. (2013) because of the former's closed-form updating formulas. Numerically, we compare the DC-ADMM algorithm with the quadratic penalty algorithm to demonstrate its utility and scalability. Theoretically, we establish a finitesample mis-clustering error bound for penalized regression based clustering with the L0 constrained regularization in a general setting. On this ground, we provide conditions for clustering consistency of the penalized clustering method. As an end product, we put R package prclust implementing PRclust with various loss and grouping penalty functions available on GitHub and CRAN.",
keywords = "Alternating direction method of multipliers (ADMM), Clustering consistency, Difference of convex (DC) programming, Truncated L1-penalty (TLP)",
author = "Chong Wu and Sunghoon Kwon and Shen, {Xiaotong T} and Wei Pan",
year = "2016",
month = "10",
day = "1",
language = "English (US)",
volume = "17",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",

}

TY - JOUR

T1 - A new algorithm and theory for penalized regression-based clustering

AU - Wu, Chong

AU - Kwon, Sunghoon

AU - Shen, Xiaotong T

AU - Pan, Wei

PY - 2016/10/1

Y1 - 2016/10/1

N2 - Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit, as demonstrated in Pan et al. (2013). In this paper, we develop a more efficient algorithm for scalable computation and a new theory of clustering consistency for the method. This algorithm, called DC-ADMM, combines difference of convex (DC) programming with the alternating direction method of multipliers (ADMM). This algorithm is shown to be more computationally efficient than the quadratic penalty based algorithm of Pan et al. (2013) because of the former's closed-form updating formulas. Numerically, we compare the DC-ADMM algorithm with the quadratic penalty algorithm to demonstrate its utility and scalability. Theoretically, we establish a finitesample mis-clustering error bound for penalized regression based clustering with the L0 constrained regularization in a general setting. On this ground, we provide conditions for clustering consistency of the penalized clustering method. As an end product, we put R package prclust implementing PRclust with various loss and grouping penalty functions available on GitHub and CRAN.

AB - Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit, as demonstrated in Pan et al. (2013). In this paper, we develop a more efficient algorithm for scalable computation and a new theory of clustering consistency for the method. This algorithm, called DC-ADMM, combines difference of convex (DC) programming with the alternating direction method of multipliers (ADMM). This algorithm is shown to be more computationally efficient than the quadratic penalty based algorithm of Pan et al. (2013) because of the former's closed-form updating formulas. Numerically, we compare the DC-ADMM algorithm with the quadratic penalty algorithm to demonstrate its utility and scalability. Theoretically, we establish a finitesample mis-clustering error bound for penalized regression based clustering with the L0 constrained regularization in a general setting. On this ground, we provide conditions for clustering consistency of the penalized clustering method. As an end product, we put R package prclust implementing PRclust with various loss and grouping penalty functions available on GitHub and CRAN.

KW - Alternating direction method of multipliers (ADMM)

KW - Clustering consistency

KW - Difference of convex (DC) programming

KW - Truncated L1-penalty (TLP)

UR - http://www.scopus.com/inward/record.url?scp=84995388348&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84995388348&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84995388348

VL - 17

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -