Data-Adaptive Active Sampling for Efficient Graph-Cognizant Classification

Dimitris Berberidis, Georgios B. Giannakis

Research output: Contribution to journalArticle

1 Citation (Scopus)

Abstract

This paper deals with active sampling of graph nodes representing training data for binary classification. The graph may be given or constructed using similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels across neighboring nodes are correlated according to a categorical Markov random field (MRF). This model is further relaxed to a Gaussian (G)MRF with labels taking continuous values - an approximation that not only mitigates the combinatorial complexity of the categorical model, but also offers optimal unbiased soft predictors of the unlabeled nodes. The proposed sampling strategy is based on querying the node whose label disclosure is expected to inflict the largest change on the GMRF, and in this sense it is the most informative on average. Connections are established to other sampling methods including uncertainty sampling, variance minimization, and sampling based on the Σ-optimality criterion. A simple yet effective heuristic is also introduced for increasing the exploration capabilities of the sampler, and reducing bias of the resultant classifier, by adjusting the confidence on the model label predictions. The novel sampling strategies are based on quantities that are readily available without the need for model retraining, rendering them computationally efficient and scalable to large graphs. Numerical tests using synthetic and real data demonstrate that the proposed methods achieve accuracy that is comparable or superior to the state of the art even at reduced runtime.

Original languageEnglish (US)
Article number8445612
Pages (from-to)5167-5179
Number of pages13
JournalIEEE Transactions on Signal Processing
Volume66
Issue number19
DOIs
StatePublished - Oct 1 2018

Fingerprint

Sampling
Labels
Classifiers

Keywords

  • Active learning
  • classification
  • expected change
  • graph-based

Cite this

Data-Adaptive Active Sampling for Efficient Graph-Cognizant Classification. / Berberidis, Dimitris; Giannakis, Georgios B.

In: IEEE Transactions on Signal Processing, Vol. 66, No. 19, 8445612, 01.10.2018, p. 5167-5179.

Research output: Contribution to journalArticle

@article{6667b29c9db044c683d7c761f727285a,
title = "Data-Adaptive Active Sampling for Efficient Graph-Cognizant Classification",
abstract = "This paper deals with active sampling of graph nodes representing training data for binary classification. The graph may be given or constructed using similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels across neighboring nodes are correlated according to a categorical Markov random field (MRF). This model is further relaxed to a Gaussian (G)MRF with labels taking continuous values - an approximation that not only mitigates the combinatorial complexity of the categorical model, but also offers optimal unbiased soft predictors of the unlabeled nodes. The proposed sampling strategy is based on querying the node whose label disclosure is expected to inflict the largest change on the GMRF, and in this sense it is the most informative on average. Connections are established to other sampling methods including uncertainty sampling, variance minimization, and sampling based on the Σ-optimality criterion. A simple yet effective heuristic is also introduced for increasing the exploration capabilities of the sampler, and reducing bias of the resultant classifier, by adjusting the confidence on the model label predictions. The novel sampling strategies are based on quantities that are readily available without the need for model retraining, rendering them computationally efficient and scalable to large graphs. Numerical tests using synthetic and real data demonstrate that the proposed methods achieve accuracy that is comparable or superior to the state of the art even at reduced runtime.",
keywords = "Active learning, classification, expected change, graph-based",
author = "Dimitris Berberidis and Giannakis, {Georgios B.}",
year = "2018",
month = "10",
day = "1",
doi = "10.1109/TSP.2018.2866812",
language = "English (US)",
volume = "66",
pages = "5167--5179",
journal = "IEEE Transactions on Signal Processing",
issn = "1053-587X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "19",

}

TY - JOUR

T1 - Data-Adaptive Active Sampling for Efficient Graph-Cognizant Classification

AU - Berberidis, Dimitris

AU - Giannakis, Georgios B.

PY - 2018/10/1

Y1 - 2018/10/1

N2 - This paper deals with active sampling of graph nodes representing training data for binary classification. The graph may be given or constructed using similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels across neighboring nodes are correlated according to a categorical Markov random field (MRF). This model is further relaxed to a Gaussian (G)MRF with labels taking continuous values - an approximation that not only mitigates the combinatorial complexity of the categorical model, but also offers optimal unbiased soft predictors of the unlabeled nodes. The proposed sampling strategy is based on querying the node whose label disclosure is expected to inflict the largest change on the GMRF, and in this sense it is the most informative on average. Connections are established to other sampling methods including uncertainty sampling, variance minimization, and sampling based on the Σ-optimality criterion. A simple yet effective heuristic is also introduced for increasing the exploration capabilities of the sampler, and reducing bias of the resultant classifier, by adjusting the confidence on the model label predictions. The novel sampling strategies are based on quantities that are readily available without the need for model retraining, rendering them computationally efficient and scalable to large graphs. Numerical tests using synthetic and real data demonstrate that the proposed methods achieve accuracy that is comparable or superior to the state of the art even at reduced runtime.

AB - This paper deals with active sampling of graph nodes representing training data for binary classification. The graph may be given or constructed using similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels across neighboring nodes are correlated according to a categorical Markov random field (MRF). This model is further relaxed to a Gaussian (G)MRF with labels taking continuous values - an approximation that not only mitigates the combinatorial complexity of the categorical model, but also offers optimal unbiased soft predictors of the unlabeled nodes. The proposed sampling strategy is based on querying the node whose label disclosure is expected to inflict the largest change on the GMRF, and in this sense it is the most informative on average. Connections are established to other sampling methods including uncertainty sampling, variance minimization, and sampling based on the Σ-optimality criterion. A simple yet effective heuristic is also introduced for increasing the exploration capabilities of the sampler, and reducing bias of the resultant classifier, by adjusting the confidence on the model label predictions. The novel sampling strategies are based on quantities that are readily available without the need for model retraining, rendering them computationally efficient and scalable to large graphs. Numerical tests using synthetic and real data demonstrate that the proposed methods achieve accuracy that is comparable or superior to the state of the art even at reduced runtime.

KW - Active learning

KW - classification

KW - expected change

KW - graph-based

UR - http://www.scopus.com/inward/record.url?scp=85052698722&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85052698722&partnerID=8YFLogxK

U2 - 10.1109/TSP.2018.2866812

DO - 10.1109/TSP.2018.2866812

M3 - Article

AN - SCOPUS:85052698722

VL - 66

SP - 5167

EP - 5179

JO - IEEE Transactions on Signal Processing

JF - IEEE Transactions on Signal Processing

SN - 1053-587X

IS - 19

M1 - 8445612

ER -