Theory of Computing
-------------------
Title : Correlation Clustering with a Fixed Number of Clusters
Authors : Ioannis Giotis and Venkatesan Guruswami
Volume : 2
Number : 13
Pages : 249-266
URL : http://www.theoryofcomputing.org/articles/v002a013
Abstract
--------
We continue the investigation of problems concerning
"correlation clustering" or "clustering with qualitative information,"
which is a clustering formulation that has been studied recently
(Bansal, Blum, Chawla (2004); Charikar, Guruswami, Wirth (FOCS'03);
Charikar, Wirth (FOCS'04); Alon et al. (STOC'05)).
In this problem, we are given a complete graph on n nodes (which
correspond to items to be clustered) whose edges are labeled +
(for similar pairs of items) and - (for dissimilar pairs of
items). Thus our input consists only of qualitative information on
similarity and no quantitative distance measure between items. The
quality of a clustering is measured in terms of its number of
agreements, which is simply the number of edges it correctly
classifies, that is, the number of - edges whose endpoints
it places in different clusters plus the number of + edges both of
whose endpoints it places within the same cluster.
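As an illustration of this objective, here is a minimal Python sketch that counts agreements and disagreements for a given clustering. The input encoding is an assumption made for the example, not taken from the paper: nodes are 0..n-1, edge signs are stored in a dict keyed by pairs (u, v) with u < v, and a clustering is a list mapping each node to a cluster index. The function names are hypothetical. Since every edge is either an agreement or a disagreement, the two counts always sum to n(n-1)/2.

```python
def agreements(signs, clustering):
    # signs: hypothetical encoding, dict (u, v) with u < v -> "+" or "-"
    # clustering: list where clustering[i] is the cluster index of node i
    count = 0
    for (u, v), sign in signs.items():
        same_cluster = clustering[u] == clustering[v]
        # A "+" edge agrees when both endpoints share a cluster;
        # a "-" edge agrees when its endpoints are separated.
        if (sign == "+") == same_cluster:
            count += 1
    return count


def disagreements(signs, clustering):
    # Every edge of the complete graph either agrees or disagrees.
    return len(signs) - agreements(signs, clustering)


signs = {(0, 1): "+", (0, 2): "-", (1, 2): "-"}
print(agreements(signs, [0, 0, 1]), disagreements(signs, [0, 0, 1]))
print(agreements(signs, [0, 0, 0]), disagreements(signs, [0, 0, 0]))
```

With nodes 0 and 1 together and node 2 apart, all three edges agree (the script prints `3 0`); putting every node in one cluster leaves only the + edge correct (`1 2`).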
In this paper, we study the problem of finding clusterings that maximize
the number of agreements, and the complementary minimization version
where we seek clusterings that minimize the number of disagreements.
We focus on the situation when the number of clusters is stipulated
to be a small constant k. Our main result is that for every k,
there is a polynomial time approximation scheme for both maximizing
agreements and minimizing disagreements. (The problems are NP-hard
for every k ≥ 2.) The main technical work is for the minimization
version, as the PTAS for maximizing agreements follows along the lines
of the property tester for Max k-CUT by Goldreich, Goldwasser, Ron (1998).
In contrast, when the number of clusters is not specified, the problem
of minimizing disagreements was shown to be APX-hard (Charikar, Guruswami,
Wirth (FOCS'03)), even though the maximization version admits a PTAS.
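To make the fixed-k minimization problem concrete, the sketch below solves it exactly by exhaustive search over all k^n assignments of nodes to cluster indices. This is not the paper's PTAS. It is only a reference solver for tiny instances, and its exponential running time is consistent with the problem being NP-hard for every k ≥ 2. The function name and the edge encoding (a dict keyed by pairs (u, v) with u < v) are hypothetical. The sketch also assumes that "k clusters" means at most k, so some cluster indices may go unused.

```python
from itertools import product


def min_disagreements_bruteforce(n, signs, k):
    # Exhaustive search: try every map from nodes 0..n-1 to cluster indices 0..k-1.
    # Assumption: clusters may be empty, i.e. "at most k" clusters.
    best_cost, best_assignment = None, None
    for assignment in product(range(k), repeat=n):
        # An edge disagrees when its sign does not match whether its
        # endpoints were placed in the same cluster.
        cost = sum(
            1
            for (u, v), sign in signs.items()
            if (sign == "+") != (assignment[u] == assignment[v])
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, list(assignment)
    return best_cost, best_assignment


# Triangle with two + edges and one - edge: no clustering satisfies every edge.
triangle = {(0, 1): "+", (1, 2): "+", (0, 2): "-"}
cost, assignment = min_disagreements_bruteforce(3, triangle, 2)
print(cost, assignment)
```

For the triangle shown, every clustering violates at least one edge, so the minimum number of disagreements is 1. The same loop, taking a maximum of agreements instead, would solve the maximization version exactly on such small inputs.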