James P. Long, Noureddine El Karoui, John A. Rice, Joseph W. Richards, Joshua S. Bloom
Efficient and automated classification of periodic variable stars is becoming
increasingly important as the scale of astronomical surveys grows. Several
recent papers have used methods from machine learning and statistics to
construct classifiers on databases of labeled, multi-epoch sources with the
intention of using these classifiers to automatically infer the classes of
unlabeled sources from new surveys. However, the same source observed with two
different synoptic surveys will generally yield different derived metrics
(features) from the light curve. Since such features are used in classifiers,
this survey-dependent mismatch in feature space will typically lead to degraded
classifier performance. In this paper we show how and why feature distributions
change using OGLE and Hipparcos light curves. To overcome survey
systematics, we apply a method, noisification, which attempts to
empirically match distributions of features between the labeled sources used to
construct the classifier and the unlabeled sources we wish to classify. Results
from simulated and real-world light curves show that noisification can
significantly improve classifier performance. In a three-class problem using
light curves from Hipparcos and OGLE, noisification reduces the
classifier error rate from 27.0% to 7.0%. We recommend that noisification be
used for upcoming surveys such as Gaia and LSST and describe some of the
promises and challenges of applying noisification to these surveys.
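
The idea of matching feature distributions can be illustrated with a small sketch. It is not the authors' implementation: it assumes, purely for illustration, that the survey mismatch comes from the target survey having fewer epochs and larger photometric noise, and the names noisify, amplitude, n_epochs and sigma are hypothetical. A well-sampled labeled light curve is degraded to resemble the target survey before features are derived from it.

```python
import numpy as np


def noisify(times, mags, n_epochs, sigma, rng):
    """Degrade one well-sampled light curve toward a sparser, noisier survey.

    Hypothetical helper (an assumption, not taken from the paper): keeps
    n_epochs randomly chosen observations and adds Gaussian noise with
    standard deviation sigma (in magnitudes).
    """
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    n_keep = min(n_epochs, times.size)
    # Subsample epochs without replacement, keeping them in time order.
    idx = np.sort(rng.choice(times.size, size=n_keep, replace=False))
    noisy_mags = mags[idx] + rng.normal(0.0, sigma, size=n_keep)
    return times[idx], noisy_mags


def amplitude(mags):
    """Toy feature: half the 5th-95th percentile magnitude range."""
    lo, hi = np.percentile(mags, [5, 95])
    return 0.5 * (hi - lo)


rng = np.random.default_rng(0)

# Synthetic, densely sampled periodic source standing in for a labeled curve.
t = np.sort(rng.uniform(0.0, 100.0, size=500))
m = 0.3 * np.sin(2 * np.pi * t / 2.5)

# Degrade it to an assumed target cadence and noise level.
t_noisy, m_noisy = noisify(t, m, n_epochs=60, sigma=0.1, rng=rng)

# The same feature computed before and after degradation generally differs,
# which is the kind of survey-dependent shift described above.
print(amplitude(m), amplitude(m_noisy))
```

In such a setup the classifier would be trained on features computed from the degraded labeled curves rather than the original ones, so that training and target features are drawn from more similar distributions.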
View original:
http://arxiv.org/abs/1201.4863