How to understand Eigenvector/Eigenvalue?

This topic has been in my wish list for a long time. I will try to articulate my understanding of eigenvector and eigenvalue after digestion informative materials from different sources.

First of all, we can consider multiplication between a square matrix A \in \mathbb{R}^{k \times k} and a vector \vec{v} \in \mathbb{R}^k as a transformation of \vec{v} such that \vec{v} is rotated and scaled. By definition, eigenvectors of A are vectors on the directions on which A has invariant rotation transformation, i.e., A\vec{v}=\lambda\vec{v}.

What kind of matrices have eigenvectors? Only square matrices are possible to have eigenvectors. This is because:

Screenshot from 2016-01-04 13:59:31

ref: http://math.stackexchange.com/a/583976/235140

However, not all square matrices have eigenvectors: http://math.stackexchange.com/a/654713/235140

Two useful interpretations of eigenvectors if you regard a square matrix as rotation + scale transformation:

  1.  no other vectors when acted by this matrix can be stretched as much as eigenvectors. (ref: http://math.stackexchange.com/a/243553/235140)
  2. eigenvectors keep the same direction after the transformation. (ref: https://www.quora.com/What-do-eigenvalues-and-eigenvectors-represent-intuitively)

Now, let’s see the scenarios in computer science realm where eigenvectors/eigenvalues play a role.

1.  PCA

Now we look at all kinds of matrics including square matrices. One way to understand a matrix is that it represents a coordination system. Suppose P \in \mathbb{R}^{m \times k} is a matrix and \vec{x} \in \mathbb{R}^k is a column vector, then one way to interpret P \vec{x}=\vec{y} is that every row of the matrix P is a base vector of its coordination system. Every component of \vec{y} is a dot product of a row of P and \vec{x}, which can be seen as a kind of projection from \vec{x} on the corresponding row of P.

If now we have a transformation matrix P \in \mathbb{R}^{m \times k} in which rows are base vectors of some coordination system, X \in \mathbb{X}^{k \times n} is data matrix where n is the number of data points and k is the number of dimensions, then Y=PX, Y \in \mathbb{R}^{m \times n} is a dimension reduced matrix of X such that every column of Y is representation of every column of X using base vectors in P.

PCA is trying to find such P that not only reduces the dimension of X but also satisfies other properties. And what it concludes is that P is a matrix in which rows are eigenvectors of X‘s covariance matrix. (ref: https://czxttkl.com/?p=339https://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf)

2.  PageRank / Stationary distribution of Markov chains

Both try to find eigenvectors with eigenvalues equal to 1: P\vec{x} = \vec{x}. In PageRank, P is a fractional PageRank constribution link matrix in which P_{ij} is the page rank contributed to webpage j when webpage i links it. \vec{x} is the PageRank value vector for all web pages that we want to calculate. When calculating stationary distribution of Markov States, P is simply a transition matrix denoting pairwise transition probabilities between nodes while \vec{x} is the ultimate stationary distribution of nodes.

Leetcode 111: Minimum Depth of Binary Tree

https://leetcode.com/problems/minimum-depth-of-binary-tree/

Given a binary tree, find its minimum depth.

The minimum depth is the number of nodes along the shortest path from the root node down to the nearest leaf node.

 

Code

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution(object):
    def minDepth(self, root):
        """
        :type root: TreeNode
        :rtype: int
        """
        if root is None:
            return 0
            
        if root.left is None and root.right is None:
            return 1
        
        if root.left is None and root.right is not None:
            return self.minDepth(root.right) + 1
        
        if root.right is None and root.left is not None:
            return self.minDepth(root.left) + 1
        
        return min(self.minDepth(root.left), self.minDepth(root.right))+1
        

 

Idea

There are 5 conditions to consider:

  1. if root is None, the current depth is zero
  2. if both root’s left and right child are none, the current root is a leaf node which has depth 1
  3. if only one of root.left or root.right child is none, the depth depends on the depth of the non-none child.
  4. if both root.left and root.right child are not none, then the minimum depth is the min of minDepth(root.left) and minDepth(root.right) plus 1

 

Of course, you can take a few logical steps to further reduce lines of codes:

class Solution:
    # @param root, a tree node
    # @return an integer    
    def minDepth(self, root):
        if root == None:
            return 0
        if root.left==None or root.right==None:
            return self.minDepth(root.left)+self.minDepth(root.right)+1
        return min(self.minDepth(root.right),self.minDepth(root.left))+1

 

Leetcode 179: Largest Number

https://leetcode.com/problems/largest-number/

Given a list of non negative integers, arrange them such that they form the largest number.

For example, given [3, 30, 34, 5, 9], the largest formed number is 9534330.

Note: The result may be very large, so you need to return a string instead of an integer.

 

Code

class Solution(object):
    def largestNumber(self, nums):
        """
        :type nums: List[int]
        :rtype: str
        """
        return ''.join(sorted(map(str, nums),cmp=lambda x,y:cmp(y+x, x+y))).lstrip('0') or '0'

 

Idea

Efficiently using `sorted` function makes the code very short. When concatenated string starts with zeros, you should strip all left zeros. The sort will take O(NlogN) time.

 

Reference:

https://leetcode.com/discuss/21550/my-3-lines-code-in-java-and-python

Leetcode 112: Path Sum

https://leetcode.com/problems/path-sum-ii/

Given a binary tree and a sum, determine if the tree has a root-to-leaf path such that adding up all the values along the path equals the given sum.

For example:
Given the below binary tree and sum = 22,

              5
             / \
            4   8
           /   / \
          11  13  4
         /  \      \
        7    2      1

return true, as there exist a root-to-leaf path 5->4->11->2 which sum is 22.

 

Code

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution(object):
    def hasPathSum(self, root, sum):
        """
        :type root: TreeNode
        :type sum: intyy
        :rtype: List[List[int]]
        """
        if root is None: return False
        if root.left is None and root.right is None: return sum==root.val
        
        left = self.hasPathSum(root.left, sum-root.val) if root.left else False
        right = self.hasPathSum(root.right, sum-root.val) if root.right else False
        return left or right

 

Idea

Use DFS to traverse nodes.

 

 

Leetcode 113: Path Sum II

https://leetcode.com/problems/path-sum-ii/

Given a binary tree and a sum, find all root-to-leaf paths where each path’s sum equals the given sum.

For example:
Given the below binary tree and sum = 22,

              5
             / \
            4   8
           /   / \
          11  13  4
         /  \    / \
        7    2  5   1

return

[
   [5,4,11,2],
   [5,8,4,5]
]

 

Code

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution(object):
    def pathSum(self, root, sum):
        """
        :type root: TreeNode
        :type sum: int
        :rtype: List[List[int]]
        """
        if root is None: return []
        self.res = []
        self._pathSum(root, sum, [])
        return self.res
        
    def _pathSum(self, root, sum, l):
        l += [root.val]
        if root.left is None and root.right is None: 
            if sum==root.val:
                self.res += [l[:]]
        
        if root.left: self._pathSum(root.left, sum-root.val, l) 
        if root.right: self._pathSum(root.right, sum-root.val, l) 
        l.pop()
        
        
        

 

Idea

Use DSP to traverse every node in the tree. Pass the path from the root to the current node as a parameter in the traversal. Whenever it reaches a leaf, count if all nodes in the path sum to the target number. When a node finishes traversing its left node and right node, it should be popped out of the path.

Line 24: when you add current path (`l`) into `self.res`, you need to make a deep copy of `l`: `l[:]`

Leetcode 242: valid anagram

https://leetcode.com/problems/valid-anagram/

Given two strings s and t, write a function to determine if t is an anagram of s.

For example,
s = “anagram”, t = “nagaram”, return true.
s = “rat”, t = “car”, return false.

Note:
You may assume the string contains only lowercase alphabets.

Follow up:
What if the inputs contain unicode characters? How would you adapt your solution to such case?

 

 

Code

class Solution(object):
    def isAnagram(self, s, t):
        """
        :type s: str
        :type t: str
        :rtype: bool
        """
        if s is None and t is not None: return False
        if t is None and s is not None: return False
        if s is None and t is None: return True
        if len(s) != len(t): return False
        
        l = [0] * 26
        for c in s:
            l[ord(c) - ord('a')] += 1
        for c in t:
            l[ord(c) - ord('a')] -= 1
            
        for v in l:
            if v: return False
            
        return True

 

Idea

Create a dictionary counting occurrences of the 26 letters in `s` and `t`. You need O(N) time and O(N) space.

You can also use (in-place) sorting algorithms to solve this problem. You need O(NlogN) time and no space.

 

Reference

https://leetcode.com/discuss/50580/python-solutions-sort-and-dictionary

Leetcode 115: Distinct Subsequences

https://leetcode.com/problems/distinct-subsequences/

Given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Here is an example:
S = "rabbbit", T = "rabbit"

Return 3.

 

The question is a bit ambiguous. Please see the following link for clarification.

https://leetcode.com/discuss/599/task-clarification

 

Code

class Solution(object):
    def numDistinct(self, s, t):
        """
        :type s: str
        :type t: str
        :rtype: int
        """
        if s is None or t is None:
            return 0
        if len(t) > len(s):
            return 0
            
        num = [[1] + [0] * (len(t)) for i in range(len(s)+2)]   # a (len(s)+1)*(len(t)+1) matrix

        for i in xrange(1, len(s)+1):
            for j in xrange(1, min(i, len(t))+1):
                num[i][j] = num[i-1][j] 
                if s[i-1] == t[j-1]: num[i][j] += num[i-1][j-1]

        return num[len(s)][len(t)] 
        

 

Idea

Use dynamic programing to tackle this problem. Create an array, `num`, in which `num[i][j]` denotes the number of distinct subsequences same to `t[:j]` in `s[:i]`.  For `num[i][j] = 0` where `i <j` because you can’t find subsequences longer than the full sequence. `num[i][0]` is always 1 because every full sequence has a subsequence of empty string. 

Therefore, the array `num` is initialized as:

dise(1)

Since we are using dynamic programming, we need to find the formula to update `num[i][j]`. We propose that `num[i][j]=num[i-1][j] + (s[i-1]==t[j-1]?num[i-1][j-1]:0)`. In other words, the  distinct subsequences equal to `t[:j]` in `s[:i]` come from two parts:

  1. the distinct subsequences equal to `t[:j]` in `s[:i-1]`. This is equivalent to discard `s[i-1]` and check the number of distinct subsequences in `s[:i-1]`.
  2. if `s[i-1]==t[j-1]`, the distinct subsequences equal to `t[:j-1]` in `s[i-1]`. This is equivalent to discard `s[i-1]` and `t[j-1]` and compare the two remaining strings.

The update rule can be visualized as:

dise(3)

Overall, this algorithm takes O(N^2) time to update the full `num` array and takes O(N^2) space to store `num` array. (Suppose O(N) is the length of `s` and `t`).

 

More

2D space in dynamic programming can often be reduced to 1D space. Our code updates` num` in a “column-first-row-after” manner. Essentially, we only need to have the previous row vector to update the current row vector. Note that `num[i][j]` needs the values from `num[i-1][j-1]` and `num[i-1][j]`. In order to not erase necessary values in `num[i-1][j]` and `num[i-1][j-1]`, we need to update the row vector backwards.

class Solution(object):
    def numDistinct(self, s, t):
        """
        :type s: str
        :type t: str
        :rtype: int
        """
        if s is None or t is None:
            return 0
        if len(t) > len(s):
            return 0
            
        num = [1] + [0] * (len(t)) 

        for i in xrange(1, len(s)+1):
            for j in xrange(min(i, len(t)), 0, -1):
                if s[i-1] == t[j-1]: num[j] += num[j-1]

        return num[len(t)] 
        

Now, the algorithm takes O(N^2) time and only O(N) space.

 

 

Create 2D array in Python

I used to create 2D zero array by the following way, for example:

arr = [[0] * 3] * 4

However, `arr` actually references the list [0,0,0] 4 times. If you set `arr[1][1] = 5`, for example, you will find all “other” lists in `arr` have 5 then.

>>> arr[1][1] = 5
>>> arr
[[0, 5, 0], [0, 5, 0], [0, 5, 0], [0, 5, 0]]

 

Therefore, the best way to create a 2D array is either using `numpy` or:

arr = [[0]*3 for i in range(4)]

 

Reference

http://stackoverflow.com/a/6667529/1758727

Leetcode 65: valid number

https://leetcode.com/problems/valid-number/

65. Valid Number

Total Accepted: 38767 Total Submissions: 328141 Difficulty: Hard

Validate if a given string is numeric.

Some examples:
"0" => true
" 0.1 " => true
"abc" => false
"1 a" => false
"2e10" => true

Note: It is intended for the problem statement to be ambiguous. You should gather all requirements up front before implementing one.

Update (2015-02-10):
The signature of the C++ function had been updated. If you still see your function signature accepts a const char * argument, please click the reload button to reset your code definition.

 

Code

class Solution(object):
    def isNumber(self, s):
        s = s.strip()
        return self._isNumber(s, ["+", "-", "."], True, True)

    def _isNumber(self, s, valid_first_chars = [], check_dot=True, check_e=True):
        if not s: return False
        if s.isdigit(): return True

        if (not s[0] in map(str, range(10)) + valid_first_chars):
            return False

        if s[0] in valid_first_chars:
            if (s[0] == "+" or s[0] == "-") and self._isNumber(s[1:], ["."], check_dot, check_e):
                return True
            if s[0] == "." and self._isNumber(s[1:], [], False, check_e):
                return True
        
        if check_e:
            es = s.split("e")
            if len(es) > 2: return False
            if len(es)==2 and self._isNumber(es[0], ["+", "-", "."], True, False) \
                and self._isNumber(es[1], ["+", "-"], False, False):
                return True            

        if check_dot:
            dots = s.split(".")
            if len(dots) > 2: return False
            if len(dots)==2 and self._isNumber(dots[0], ["+", "-"], False, False) \
                and (self._isNumber(dots[1], [], False, False) or dots[1] == ''):
                return True 

        return False

        

 

Idea

My (also most people’s) way is going straight to handle different cases for parsing numbers. 

  1. The most trivial case: an empty string or a null variable can’t be a valid number  (line 7)
  2. Use string.isdigit() to check if a string contains pure number (without ‘+’, ‘-‘, ‘.’ or ‘e’).   (line 8)
  3. a valid number must start with 0~9 or “+”, “-” or “.”  (line 10-11)
  4. if `s` starts with “+” or “-“, then `s[1:]` should be a valid number without prefix as “+” or “-“. However `s[1:]` can start with `.`, for example, “+.43” is a valid number  (line 14-15)
  5. if `s` starts with “.”, then `s[1:]` should be a pure valid number without any possibility to have prefix such as “+”, “-” and “.”  (line 16-17)
  6. if you want to check if `s` is in scientific notation, the part before “e” should be a valid number with at most one prefix (“+”, “-” or “.”). The part after `e` can’t start with “.” but can start with “+” or “-“.  (line 22-24)
  7. if you want to check if `s` is in a float notation, the part before “.” should be a valid number without any dot in it. The part after “.” should not have any dot in it either. It should not have any sign (“+” or “-“) after “e”.   (line 29-31)

 

Reference

An exhausted list of test case:

Screenshot from 2015-12-30 13:54:40

 

Use SSH tunnel and Firefox for proxy

1.Use a specified port (e.g., 12345) for SSH SOCKS server:

ssh -D 12345 user@host.domain

2. Go to the setting menu in Firefox, then Advanced->Network->Connection->Settings

Screenshot from 2015-12-12 01:06:48

3.Check the “Manual proxy configuration”, fill in SOCKS host (127.0.0.1) and port (your specified port, 12345 in our example)

4. Also, type in “about:config” in the address bar in Firefox, change “network.proxy.socks_remote_dns” to true.

 

Reference

https://www.linode.com/docs/networking/ssh/setting-up-an-ssh-tunnel-with-your-linode-for-safe-browsing