Wednesday, July 31, 2013

1307.8012 (R. J. Lyon et al.)

A Study on Classification in Imbalanced and Partially-Labelled Data Streams    [PDF]

R. J. Lyon, J. M. Brooke, J. D. Knowles, B. W. Stappers
The domain of radio astronomy is currently facing significant computational challenges, foremost amongst which are those posed by the development of the world's largest radio telescope, the Square Kilometre Array (SKA). Preliminary specifications for this instrument suggest that the final design will incorporate between 2000 and 3000 individual 15 metre receiving dishes, which together can be expected to produce a data rate of many TB/s. Given such a high data rate, it becomes crucial to consider how this information will be processed and stored to maximise its scientific utility. In this paper, we consider one possible data processing scenario for the SKA, for the purposes of an all-sky pulsar survey. In particular we treat the selection of promising signals from the SKA processing pipeline as a data stream classification problem. We consider the feasibility of classifying signals that arrive via an unlabelled and heavily class imbalanced data stream, using currently available algorithms and frameworks. Our results indicate that existing stream learners exhibit unacceptably low recall on real astronomical data when used in standard configuration; however, good false positive performance and comparable accuracy to static learners, suggests they have definite potential as an on-line solution to this particular big data challenge.
View original: http://arxiv.org/abs/1307.8012

No comments:

Post a Comment