TY - JOUR
T1 - Enumerating all maximal biclusters in numerical datasets
AU - Veroneze, Rosana
AU - Banerjee, Arindam
AU - Von Zuben, Fernando J.
N1 - Publisher Copyright: © 2016 Elsevier Inc.
PY - 2017/2/10
Y1 - 2017/2/10
N2 - Biclustering has proved to be a powerful data analysis technique due to its wide success in various application domains. However, the existing literature presents efficient solutions only for enumerating maximal biclusters with constant values, or heuristic-based approaches that cannot find all biclusters or even guarantee the maximality of the obtained biclusters. Here, we present a general family of biclustering algorithms for enumerating all maximal biclusters with (i) constant values on rows, (ii) constant values on columns, or (iii) coherent values. Versions for perfect and for perturbed biclusters are provided. Our algorithms have four key properties (only the algorithm for perturbed biclusters with coherent values fails to exhibit the first property): they are (1) efficient (take polynomial time per pattern), (2) complete (find all maximal biclusters), (3) correct (all biclusters satisfy the user-defined measure of similarity), and (4) non-redundant (all the obtained biclusters are maximal and the same bicluster is not enumerated twice). They are based on a generalization of an efficient formal concept analysis algorithm called In-Close2. Experimental results point to the necessity of having efficient enumerative biclustering algorithms and provide valuable insight into the scalability of our family of algorithms and its sensitivity to user-defined parameters.
AB - Biclustering has proved to be a powerful data analysis technique due to its wide success in various application domains. However, the existing literature presents efficient solutions only for enumerating maximal biclusters with constant values, or heuristic-based approaches that cannot find all biclusters or even guarantee the maximality of the obtained biclusters. Here, we present a general family of biclustering algorithms for enumerating all maximal biclusters with (i) constant values on rows, (ii) constant values on columns, or (iii) coherent values. Versions for perfect and for perturbed biclusters are provided. Our algorithms have four key properties (only the algorithm for perturbed biclusters with coherent values fails to exhibit the first property): they are (1) efficient (take polynomial time per pattern), (2) complete (find all maximal biclusters), (3) correct (all biclusters satisfy the user-defined measure of similarity), and (4) non-redundant (all the obtained biclusters are maximal and the same bicluster is not enumerated twice). They are based on a generalization of an efficient formal concept analysis algorithm called In-Close2. Experimental results point to the necessity of having efficient enumerative biclustering algorithms and provide valuable insight into the scalability of our family of algorithms and its sensitivity to user-defined parameters.
KW - Efficient enumeration
KW - Maximal biclusters
KW - Multiple types of biclusters
KW - Numerical datasets
KW - Perfect and perturbed biclusters
UR - http://www.scopus.com/inward/record.url?scp=84996504056&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84996504056&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2016.10.029
DO - 10.1016/j.ins.2016.10.029
M3 - Article
AN - SCOPUS:84996504056
SN - 0020-0255
VL - 379
SP - 288
EP - 309
JO - Information Sciences
JF - Information Sciences
ER -