Scalable k-NN based text clustering