Adaptive Regularization of Weight Vectors — Mathematical derivation

In this post I present my understanding of the paper Adaptive Regularization of Weight Vectors: http://papers.nips.cc/paper/3848-adaptive-regularization-of-weight-vectors.pdf The paper aims to address the shortcomings of a previous algorithm, confidence weighted (CW) learning, by introducing the algorithm Adaptive Regularization of Weights (AROW). CW and AROW are both online learning algorithms, meaning updates happen after observing each …
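
The core of AROW is easy to state: the learner keeps a Gaussian over weight vectors with mean mu and covariance Sigma, and on each example it shifts the mean to fix the mistake while shrinking the covariance along the example's direction. Below is a minimal sketch from my reading of the paper; the function name arow_update and the default regularizer r are my own choices.

import numpy as np

def arow_update(mu, Sigma, x, y, r=1.0):
    # One AROW update for a labeled example (x, y), with y in {-1, +1}.
    margin = y * mu.dot(x)        # signed margin under the current mean weights
    if margin >= 1.0:             # no hinge loss suffered: leave parameters alone
        return mu, Sigma
    Sx = Sigma.dot(x)
    beta = 1.0 / (x.dot(Sx) + r)  # step size shrinks when confidence x'Sx is high
    alpha = (1.0 - margin) * beta # scaled by the hinge loss actually suffered
    mu = mu + alpha * y * Sx      # move the mean toward classifying (x, y) correctly
    Sigma = Sigma - beta * np.outer(Sx, Sx)  # become more confident along x
    return mu, Sigma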

Math Derivation for Bayesian Probabilistic Matrix Factorization Model

In this post I try to derive the mathematical formulas that appear in the paper Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo. We will use exactly the same notation as the paper. The part I am interested in is the inference part (Section 3.3). Sample $latex U_i$: in Gibbs sampling, …
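
Concretely, the conditional distribution from which Gibbs sampling draws $latex U_i$ is itself Gaussian. Writing it out as I recall it from the paper (so treat the exact form below as my reconstruction rather than a quotation), with $latex \alpha$ the observation precision and $latex I_{ij}$ indicating whether rating $latex R_{ij}$ is observed:

p(U_i \mid R, V, \mu_U, \Lambda_U) = \mathcal{N}\big(U_i \mid \mu_i^*, [\Lambda_i^*]^{-1}\big)

\Lambda_i^* = \Lambda_U + \alpha \sum_{j=1}^{M} I_{ij} V_j V_j^T

\mu_i^* = [\Lambda_i^*]^{-1} \Big( \alpha \sum_{j=1}^{M} I_{ij} V_j R_{ij} + \Lambda_U \mu_U \Big)

Because both the prior over $latex U_i$ and the likelihood of the observed ratings are Gaussian, this posterior is available in closed form, which is exactly what makes Gibbs sampling practical here.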

Estimate best K for K-Means in parallel

The gap statistic is often used to determine the best number of clusters. Please see a single-machine implementation of the gap statistic here: https://github.com/echen/gap-statistic. It is often desirable to parallelize such a tedious job to boost speed. I implemented a parallelized version based on that source code: library(plyr) library(ggplot2) # Calculate log(sum_i(within-cluster_i sum of squares around cluster_i …
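
The post's implementation is in R; purely to illustrate why this job parallelizes so well, here is a hypothetical Python sketch of the same idea. Every (dataset, K) clustering run is independent, so the observed data and all uniform reference datasets can be farmed out to a worker pool (the gap below is the usual mean reference log-dispersion minus the observed one).

import numpy as np
from multiprocessing import Pool
from sklearn.cluster import KMeans

def log_wk(args):
    # log of the total within-cluster sum of squares for one (dataset, K) job.
    X, k = args
    return np.log(KMeans(n_clusters=k, n_init=10).fit(X).inertia_)

def gap_statistic(X, ks, n_refs=10, seed=0):
    rng = np.random.default_rng(seed)
    # Uniform reference datasets drawn over the bounding box of X.
    refs = [rng.uniform(X.min(axis=0), X.max(axis=0), size=X.shape)
            for _ in range(n_refs)]
    with Pool() as pool:  # run under `if __name__ == "__main__":` on some platforms
        obs = pool.map(log_wk, [(X, k) for k in ks])
        ref = pool.map(log_wk, [(R, k) for R in refs for k in ks])
    ref = np.array(ref).reshape(n_refs, len(ks))
    return ref.mean(axis=0) - np.array(obs)  # gap(K): larger means K fits better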

How to make match prediction using TrueSkill?

TrueSkill [1] is a widely accepted algorithm for matchmaking, in which player skills are initialized with a random variable following Gaussian distribution and then get updated through team-based match outcomes. Eventually, with sufficient amount of matches, player skill random variables will converge to some means with very small variance. There are multiple open sourced libraries …
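
For the prediction itself, the quantity you usually want is the probability that one player beats another. Under the TrueSkill model each performance is skill plus Gaussian noise with scale beta, so the difference of performances is Gaussian and the win probability is a single normal-CDF evaluation. A minimal sketch (win_probability is my own helper name; beta = 25/6 is the usual default parameterization):

import math

def win_probability(mu1, sigma1, mu2, sigma2, beta=25.0 / 6.0):
    # P(player 1 beats player 2) for skills N(mu1, sigma1^2) and N(mu2, sigma2^2).
    # Performance difference ~ N(mu1 - mu2, sigma1^2 + sigma2^2 + 2*beta^2).
    denom = math.sqrt(2 * beta ** 2 + sigma1 ** 2 + sigma2 ** 2)
    return 0.5 * (1 + math.erf((mu1 - mu2) / (denom * math.sqrt(2))))

# Example: a slightly stronger, well-estimated player vs. a fresh one.
print(win_probability(28.0, 1.0, 25.0, 25.0 / 3.0))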

Optimization Overview

In this post, I am summarizing all I know about optimization. Optimization becomes a hard, central problem once you define your model with some kind of objective function over a bunch of parameters and start to wonder how to learn the values of those parameters that minimize (or maximize) the objective function. Unconstrained Optimization …
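
The simplest tool for the unconstrained case is plain gradient descent, which much of what follows builds on. A minimal sketch on a toy quadratic (the objective, step size, and iteration count are purely illustrative):

import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Minimize an objective by repeatedly stepping against its gradient.
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Toy objective f(x) = ||x - [1, 2]||^2, whose gradient is 2 * (x - [1, 2]).
target = np.array([1.0, 2.0])
print(gradient_descent(lambda x: 2 * (x - target), x0=[0.0, 0.0]))  # -> ~[1, 2]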

How to understand Eigenvector/Eigenvalue?

This topic has been on my wish list for a long time. I will try to articulate my understanding of eigenvectors and eigenvalues after digesting informative materials from different sources. First of all, we can consider multiplication between a square matrix $latex A$ and a vector $latex x$ as a transformation of $latex x$ such that $latex x$ is rotated and scaled. By …
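
A quick numerical check of this picture: an eigenvector is exactly a vector that the matrix scales without rotating, and numpy makes that easy to verify (the matrix below is an arbitrary example):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # an arbitrary symmetric example matrix
eigvals, eigvecs = np.linalg.eig(A)

v = eigvecs[:, 0]                    # eigenvectors are the columns of eigvecs
lam = eigvals[0]
print(A @ v)    # same direction as v, only scaled: equals lam * v
print(lam * v)  # matches the line above up to floating-point error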

BFGS and L-BFGS materials

A good explanation of BFGS and L-BFGS can be found in this textbook: https://www.dropbox.com/s/qavnl2hr170njbd/NumericalOptimization2ndedJNocedalSWright%282006%29.pdf?dl=0 In my own words: gradient descent and its many variants have proven to learn parameters well in practical unconstrained optimization problems. However, gradient descent may converge too slowly or too erratically depending on how you set the learning rate. Remember that the optimum in unconstrained …
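
To try these methods without implementing the inverse-Hessian approximation yourself, scipy exposes L-BFGS through scipy.optimize.minimize. A small sketch on the classic Rosenbrock function:

import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])  # the standard hard starting point for Rosenbrock
# L-BFGS-B keeps only a short history of gradient differences to approximate
# the inverse Hessian, so it never stores the full matrix.
result = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")
print(result.x)  # should land close to the true minimum at [1, 1]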

Recommendation System Review

Traditional Collaborative Filtering: each user is represented by a vector of length O(N), where N is the number of items, so the user-item matrix has size O(MN), where M is the number of users. For a user we want to recommend items to, we scan through the user-item matrix and find the k most similar users. Based on the item vectors …
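
A minimal sketch of that user-based scheme (cosine similarity and all variable names are my own choices; zero entries mean "not rated"):

import numpy as np

def recommend(user_item, user, k=2, top_n=3):
    # User-based CF: score items via the k most similar users' ratings.
    R = np.asarray(user_item, dtype=float)        # M users x N items
    norms = np.linalg.norm(R, axis=1) + 1e-12
    sims = (R @ R[user]) / (norms * norms[user])  # cosine similarity to `user`
    sims[user] = -np.inf                          # exclude the user themself
    neighbors = np.argsort(sims)[-k:]             # indices of the k nearest users
    scores = sims[neighbors] @ R[neighbors]       # similarity-weighted ratings
    scores[R[user] > 0] = -np.inf                 # drop items the user already rated
    return np.argsort(scores)[::-1][:top_n]

ratings = [[5, 0, 3, 0],
           [4, 0, 3, 1],
           [0, 5, 0, 4]]
print(recommend(ratings, user=0))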

My Lempel-Ziv Compressor

The Lempel-Ziv algorithm is a widely known compression algorithm. Its compression rate is proved to asymptotically reach the per-symbol entropy of the sequence being compressed when the sequence is long enough, i.e., $latex \lim_{n \rightarrow \infty} \frac{\ell(x_1^n)}{n} = H$, where $latex H$ is the per-symbol entropy of the sequence $latex x$ and $latex \ell(x_1^n)$ is the length of the compressed output for the first $latex n$ symbols. If, for example, you have a sequence of letters …
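
A toy version of the idea, in the LZ78 style of the family: parse the input into phrases the compressor has not seen before, and emit each phrase as (index of its longest known prefix, one new character). Repetitive input collapses quickly because the learned phrases keep getting longer.

def lz78_compress(text):
    # LZ78: emit (phrase_index, next_char); index 0 is the empty phrase.
    dictionary = {"": 0}
    phrase, output = "", []
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                    # keep extending a known phrase
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # learn the new phrase
            phrase = ""
    if phrase:                              # flush a trailing known phrase
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

print(lz78_compress("abababab"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (0, 'b')]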