The gap statistic is often used to determine the best number of clusters. A local implementation of the gap statistic is available here: https://github.com/echen/gap-statistic. Because the computation is tedious, it is often desirable to parallelize it to speed things up. I implemented a parallelized version based on that source code: library(plyr) library(ggplot2) # Calculate log(sum_i(within-cluster_i sum of squares around cluster_i …
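As a rough illustration of where the parallelism can go, here is a minimal sketch of a gap statistic computation. It is not the post's code: it swaps plyr for base R's `parallel::mclapply`, uses `kmeans` as the clustering method, and the names `log_wk`, `reference_sample`, `gap_statistic`, `k_max`, `B`, and `cores` are all hypothetical. The reference data sets are independent of each other, so that loop is the natural one to distribute.

```r
library(parallel)

# log of the pooled within-cluster sum of squares for k clusters
# (kmeans is an assumption; any clustering method could be substituted)
log_wk <- function(x, k, nstart = 5) {
  log(kmeans(x, centers = k, nstart = nstart)$tot.withinss)
}

# One reference data set: uniform draws over each column's observed range
reference_sample <- function(x) {
  apply(x, 2, function(col) runif(length(col), min(col), max(col)))
}

gap_statistic <- function(x, k_max = 6, B = 20, cores = 1) {
  x <- as.matrix(x)
  ks <- seq_len(k_max)
  observed <- sapply(ks, function(k) log_wk(x, k))

  # The B reference data sets are independent, so they can run in parallel.
  # mclapply forks processes; cores > 1 is not supported on Windows.
  ref <- mclapply(seq_len(B), function(b) {
    ref_x <- reference_sample(x)
    sapply(ks, function(k) log_wk(ref_x, k))
  }, mc.cores = cores)
  ref <- do.call(rbind, ref)  # B rows, k_max columns

  gap <- colMeans(ref) - observed
  # Population standard deviation (divides by B), then the simulation-error factor
  sd_k <- apply(ref, 2, sd) * sqrt((B - 1) / B)
  data.frame(k = ks, gap = gap, s = sd_k * sqrt(1 + 1 / B))
}

# Usage on two synthetic, well-separated groups
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
res <- gap_statistic(x, k_max = 4, B = 10)
print(res)
```

With `cores = 1` this behaves like a plain `lapply`. Raising `cores` on a Unix-like system spreads the reference fits across processes, and reproducible random numbers then require `RNGkind("L'Ecuyer-CMRG")` before `set.seed`.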
Category Archives: R
How to view all columns of a data.frame in R
If you want to view a data.frame with a large number of columns in R, you will only see the first several columns in the UI. After reading this post, I found that calling utils::View(your_data_frame) lets you view all the columns in a new window. Reference: http://stackoverflow.com/questions/19341853/r-view-does-not-display-all-columns-of-data-frame
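A minimal sketch of the call, assuming a made-up data frame named `wide_df` with 200 columns. The `interactive()` guard is an addition here, since `View` needs a graphical session and would fail in a batch script.

```r
# A hypothetical wide data frame: 5 rows, 200 columns
wide_df <- as.data.frame(matrix(seq_len(1000), nrow = 5, ncol = 200))

# Call the utils version explicitly so that any viewer an IDE has put
# in its place is bypassed; only do so when a GUI is available
if (interactive()) {
  utils::View(wide_df)
}
```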
R sapply
Background: As I remembered it, `sapply` is a function that takes a vector as input and returns another vector as the result. Today I am sharing a “bizarre” behavior of it. Later I will explain the reason behind this weird behavior. Details: Let’s first look at the following line, from which I scratched …
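The excerpt stops before showing the line in question, so the following is not the post's example. It is a small sketch of one well-documented reason `sapply` does not always return a vector: it simplifies its result only when the individual results have a common shape. The variable names are made up for illustration.

```r
# Every result is a single number: simplified to a plain vector
squares <- sapply(1:3, function(i) i^2)

# Every result has length 2: simplified to a matrix with 2 rows and 3 columns
pairs <- sapply(1:3, function(i) c(i, i^2))

# Results have different lengths: nothing to simplify, so a list comes back
ragged <- sapply(1:3, function(i) seq_len(i))

# vapply declares the expected shape up front and raises an error on a mismatch
checked <- vapply(1:3, function(i) i^2, numeric(1))

str(squares)
str(pairs)
str(ragged)
```

When the return type matters, `vapply` (or `lapply` for a guaranteed list) avoids the surprise, because the shape no longer depends on the data.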