BFGS and L-BFGS materials

Good explanation for BFGS and L-BFGS can be found in this textbook: https://www.dropbox.com/s/qavnl2hr170njbd/NumericalOptimization2ndedJNocedalSWright%282006%29.pdf?dl=0 In my own words: Gradient descent and its various variants have been proved to learn parameters well in practical unconstrained optimization problems. However, it may converge too slowly or too roughly depending on how you set learning rate. Remember the optimum in unconstrained …

Recommendation System Review

Traditional Collaborative Filtering each user is represented by a O(N) length vector. N is the number of items. Therefore, user-item matrix has size O(MN). M is the number of users. for a user we want to recommend items for, we scan through user-item matrix, and find most similar k users. Based on the item vectors …

My Lempel-Ziv Compressor

Lempel-Ziv algorithm is a widely known compression algorithm. Its compression rate is proved to asymptotically reach the entropy of per symbol in the sequence to be compressed, when the length of the sequence is long enough.i.e., , where is per symbol entropy in the sequence . If, for example, you have a sequence of letters …

Find Kth Smallest Element in Two Sorted Arrays

Problem Given K and two sorted arrays with length M and N, find the K-th smallest element among all elements in the two arrays.   Code import sys class Solution(object): def findKthSmallest(self, nums1, nums2, k): “”” :type nums1: sorted List[int] :type nusm2: sorted List[int] :type k: int :rtype: int “”” if not (0 <k <= …

Solve Travelling Salesman Problem using DP

Claim In this post, I am using the same notation as in the video below and the wiki page (https://en.wikipedia.org/wiki/Held%E2%80%93Karp_algorithm)   Our problem is given a matrix costs in which costs[i][j] is the traveling cost going from node i to node j, we are asked to return the minimum cost for traveling from node 0 and visit …

You are almost always getting significant p-value in large dataset

Recently, an exploding reproducibility research paper [1] shows that more than half of past published papers in the main psychological journals can’t be replicated to generate significance findings. This makes me dig a little further on how to conduct statistical tests. Let me show you my little experiments in `R` first. # remove(list=ls()) set.seed(19910715) n1 …

Minimum Description Length (MDL): a scoring function to learn Bayesian Network Structure

Bayesian Network Augmented Naive-Bayes (BAN) is an improved version of Naive-Bayes, in which every attribute may have at most one other attribute as its parent other than the class variable . For example (figure from [1]): Though BAN provides a more flexible structure compared to Naive Bayes, the structure (i.e., dependence/independence relationships between attributes) needs …