Estimate best K for K-Means in parallel

The gap statistic is often used to determine the best number of clusters. A local (non-parallel) implementation of the gap statistic is available here: https://github.com/echen/gap-statistic. Because the computation is slow, it is a natural candidate for parallelization. I implemented a parallelized version based on that source code: library(plyr) library(ggplot2) # Calculate log(sum_i(within-cluster_i sum of squares around cluster_i …
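Since the excerpt above is cut off, here is a minimal, self-contained sketch of the general idea. It is not the code from the post or from the linked repository. It assumes base R's parallel::mclapply and stats::kmeans, a uniform reference distribution over each feature's range, and the usual "smallest k with Gap(k) >= Gap(k+1) - s(k+1)" selection rule. The function names (log_wk, gap_for_k, gap_statistic_parallel, choose_k) are hypothetical.

```r
library(parallel)

# Log of the total within-cluster sum of squares for a k-means fit.
log_wk <- function(x, k, nstart = 10) {
  fit <- stats::kmeans(x, centers = k, nstart = nstart, iter.max = 50)
  log(fit$tot.withinss)
}

# Gap value and its standard-error term for a single k.
# Assumption: reference data are drawn uniformly over each column's range.
gap_for_k <- function(k, x, n_ref) {
  ref <- replicate(n_ref, {
    x_ref <- apply(x, 2, function(col) stats::runif(length(col), min(col), max(col)))
    log_wk(x_ref, k)
  })
  sd_ref <- sqrt(mean((ref - mean(ref))^2))
  data.frame(k = k,
             gap = mean(ref) - log_wk(x, k),
             s = sd_ref * sqrt(1 + 1 / n_ref))
}

# Evaluate every candidate k in a separate forked worker.
# mclapply cannot fork on Windows, so this falls back to one core there.
gap_statistic_parallel <- function(x, k_max = 6, n_ref = 10, n_workers = 2) {
  x <- as.matrix(x)
  cores <- if (.Platform$OS.type == "windows") 1L else n_workers
  rows <- mclapply(seq_len(k_max), gap_for_k, x = x, n_ref = n_ref,
                   mc.cores = cores)
  do.call(rbind, rows)
}

# Smallest k whose gap is within one standard-error term of the next gap.
choose_k <- function(gaps) {
  n <- nrow(gaps)
  ok <- gaps$gap[-n] >= gaps$gap[-1] - gaps$s[-1]
  if (any(ok)) gaps$k[which(ok)[1]] else gaps$k[n]
}

# Toy data: three well-separated 2-D blobs of 50 points each.
set.seed(1)
centers <- rbind(c(0, 0), c(6, 6), c(0, 6))
x <- do.call(rbind, lapply(1:3, function(i) {
  matrix(rnorm(100, sd = 0.5), ncol = 2) +
    matrix(centers[i, ], nrow = 50, ncol = 2, byrow = TRUE)
}))

gaps <- gap_statistic_parallel(x, k_max = 6, n_ref = 10, n_workers = 2)
best_k <- choose_k(gaps)
print(gaps)
print(best_k)
```

Parallelizing over k keeps each worker independent, because every k needs its own k-means fits on both the observed and the reference data. The reference samples are random, so the gap values vary slightly from run to run.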

How to view all columns of a data.frame in R

If you view a data.frame with a large number of columns in R, the UI shows only the first several columns. After reading the post referenced below, I found that calling utils::View(your_data_frame) displays all the columns in a new window. Reference: http://stackoverflow.com/questions/19341853/r-view-does-not-display-all-columns-of-data-frame
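As a quick illustration of the call above, the following sketch builds a hypothetical 200-column data frame (wide_df is a made-up name) and opens it with utils::View. The call is wrapped in interactive() because View needs a graphical session and would fail in a batch script.

```r
# Build a small data frame with many columns (3 rows, 200 columns).
wide_df <- as.data.frame(matrix(seq_len(3 * 200), nrow = 3))
ncol(wide_df)

# Calling View through the utils namespace uses R's own data viewer
# rather than any replacement installed by the front end.
if (interactive()) {
  utils::View(wide_df)
}
```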