Node classification in networks is an important problem in a wide variety of social networking domains. In many real applications, such as product recommendation, the class of interest may be very rare. In such scenarios, it is often very difficult to learn the most relevant node classification characteristics, both because of the paucity of training data and because of poor connectivity among rare-class nodes in the network structure. Node classification methods depend crucially on structural homophily, so a lack of connectivity among rare-class nodes can create significant challenges. However, many such social networks are content-rich, and this content can be leveraged to compensate for the lack of structural connectivity among rare-class nodes. While content-centric and semi-supervised methods have been used earlier to address the paucity of labeled data, the rare-class scenario has not been investigated in this context. In fact, we are not aware of any classification method that is tailored to rare-class detection in networks. This paper presents a spectral approach for rare-class detection that uses a distance-preserving transform to combine the structural information in the network with the available content. We show the advantage of this approach over traditional methods for collective classification.
|Original language||English (US)|
|Title of host publication||SIAM International Conference on Data Mining 2015, SDM 2015|
|Editors||Suresh Venkatasubramanian, Jieping Ye|
|Publisher||Society for Industrial and Applied Mathematics Publications|
|Number of pages||9|
|State||Published - 2015|
|Event||SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada|
Duration: Apr 30 2015 → May 2 2015
Bibliographical note
Publisher Copyright:
Copyright © SIAM.