czxttkl – Page 29

Graph Model Recap

I wrote a post before talking about sum-product algorithm: http://maider.blog.sohu.com/307377602.html

At that time, I had fresh memory on how graph model works. However, today when I read the post again, I am lost again. Therefore, I want to recap the basic motivations of graph models. This post can serve as an intro to the old post. I think remembering the motivations for an algorithm is really more important than details in it.

What does a Graph Model represent?

A graph model represents a joint distribution of all variables appearing in the a graph. There are three types of graph models, one is directed, one is undirected graph model, and the other is factor graph. For me, I find directed model is really intuitive because you can tell casual relationship from it. (See below the directed graph of the sex-gender-disease example I made in the sum-product algorithm post.) I am less familiar with undirected graph model, though. The factor graph is usually converted from directed/undirected graph model for inference learning purpose. So no matter what is your original graph model (directed or undirected), all learning algorithms take place at corresponding factor graphs.

A summary from Bishop’s book:

Another summary from his paper:

Factor Graph

A factor graph consists of multiple factors, each is a clique connecting a fraction of total variables. Say you have want to calculate marginal distribution $latex p(x_i)$ from the joint distribution $latex p(x_1, x_2, \cdots, x_N)$. The naive way to calculate that is:

get all possible configuration of $latex (x_1, x_2, \cdots, x_{i-1}, x_{i+1}, \cdots, x_N)$.
for each configuration $latex \hat{x}$, you calculate the joint probability of $latex p(x_i, \hat{x})$
Sum all the joint probabilities under all configurations, which equals to $latex p(x_i)$.

If $latex x_1$ can take $latex |x_1|$ different values, $latex x_2$ can take $latex |x_2|$ different values, …, we will need to sum up $latex \prod\limits_{j \neq i} |x_j|$ joint probabilities.

Factor graph helps eliminate unnecessary computation. For example, if we have only four variables and the joint distribution is $latex p(x_1, x_2, x_3, x_4) = p(x_1)p(x_2|x_1)p(x_3|x_2)p(x_4|x_3) &s=2$, then $latex p(x_4)=\Big[\sum\limits_{x_3}p(x_4|x_3)\big[\sum\limits_{x_2}p(x_3|x_2)[\sum\limits_{x_1}p(x_1)p(x_2|x_1)]\big]\Big] &s=2$. Now you only need to compute addition operation $latex |x_1||x_2| + |x_2| + |x_3| $ times.

What does Sum-Product algorithm do?

Sum-Product algorithm can easily calculate the marginal probability of any variable in the model. It, however, is not an inference learning algorithm. (i.e., it is not used for learning parameters in a graph model.)

What do inference algorithms do?

Inference algorithms learn the parameters of graph models. After that, you can use sum product algorithm to calculate marginal probability of any variable in the model.

Use PDB to check variables before crashes

1. Use `python -i your_script.py` to execute your program with interactive mode. This means, after your program finishes executing, or your program crashes in the midway, you will enter a python shell.

2. Suppose your script has a bug so that you enter the python shell after it crashes. Now you can play with pdb by:

import pdb
pdb.pm()
# print your variables or do whatever you want

Leetcode 209: Minimum Size Subarray Sum

https://leetcode.com/problems/minimum-size-subarray-sum/

Minimum Size Subarray Sum

Total Accepted: 20591 Total Submissions: 84896 Difficulty: Medium

Given an array of n positive integers and a positive integer s, find the minimal length of a subarray of which the sum ≥ s. If there isn’t one, return 0 instead.

For example, given the array [2,3,1,2,4,3] and s = 7,
the subarray [4,3] has the minimal length under the problem constraint.

click to show more practice.

Credits:
Special thanks to @Freezen for adding this problem and creating all test cases.

Code

import sys

class Solution(object):
    def minSubArrayLen(self, s, nums):
        """
        :type s: int
        :type nums: List[int]
        :rtype: int
        """
        p1 = p2 = 0
        cum = 0
        min_len = sys.maxint        

        for num in nums:
            cum += num
            if cum >=s:
                min_len = min(min_len, p1-p2+1)
                while p2 < p1:
                    if cum-nums[p2] >=s:
                        cum = cum-nums[p2]
                        p2 += 1
                        min_len = min(min_len, p1-p2+1)
                    else:
                        break
            p1 += 1

        return 0 if min_len == sys.maxint else min_len

Idea

Remember that all numbers are positive. You can have a fast pointer `p1` and a slow pointer `p2`. You first make the fast pointer to traverse the numbers until sum from nums[p2] to nums[p1] is more than s. Then, you start to move `p2` forward until the sum from nums[p2] to nums[p1] is no more than s anymore. In total, `p1` and `p2` both traverse at most O(N) distance. So the time complexity of the algorithm is O(N). Since we don’t require additional space, the space complexity of the alogrithm is O(1).

Leetcode 208: Implement Trie (Prefix Tree)

https://leetcode.com/problems/implement-trie-prefix-tree/

Implement Trie (Prefix Tree)

Total Accepted: 18703 Total Submissions: 75061 Difficulty: Medium

Implement a trie with insert, search, and startsWith methods.

Note:
You may assume that all inputs are consist of lowercase letters a-z.

Code

class TrieNode(object):
    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.nodes = dict()
        self.is_word = False 


class Trie(object):

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """
        Inserts a word into the trie.
        :type word: str
        :rtype: void
        """
        p = self.root
        for idx, ch in enumerate(word):
            if not p.nodes.get(ch, None):         
                p.nodes[ch] = TrieNode()
            p = p.nodes[ch]
            if idx==len(word)-1:
                p.is_word = True

    def search(self, word):
        """
        Returns if the word is in the trie.
        :type word: str
        :rtype: bool
        """
        p = self.root
        for idx, ch in enumerate(word):
            if not p.nodes.get(ch, None):
                return False
            p = p.nodes[ch]
            if idx==len(word)-1 and not p.is_word:
                return False
        
        return True
        

    def startsWith(self, prefix):
        """
        Returns if there is any word in the trie
        that starts with the given prefix.
        :type prefix: str
        :rtype: bool
        """
        p = self.root
        for idx, ch in enumerate(prefix):
            if not p.nodes.get(ch, None):
                return False
            p = p.nodes[ch]
            
        return True
        

# Your Trie object will be instantiated and called as such:
# trie = Trie()
# trie.insert("somestring")
# trie.search("key")

Idea

See idea about Trie:

http://www.geeksforgeeks.org/trie-insert-and-search/

https://leetcode.com/discuss/34840/maybe-the-code-is-not-too-much-by-using-next-26-c

Leetcode 220: Contains Duplicate III

https://leetcode.com/problems/contains-duplicate-iii/

Contains Duplicate III

Total Accepted: 15083 Total Submissions: 90517 Difficulty: Medium

Given an array of integers, find out whether there are two distinct indices i and j in the array such that the difference between nums[i] and nums[j] is at most t and the difference between i and j is at most k.

Code

import sys

class Solution(object):
    def containsNearbyAlmostDuplicate(self, nums, k, t):
        """
        :type nums: List[int]
        :type k: int
        :type t: int
        :rtype: bool
        """
        if not nums or k <=0 or t<0:
            return False

        def tran_num(org_num):
            return org_num + sys.maxint     
            
        def to_bucket(num_tran):
            return num_tran / (t+1)

        bucket_dict = {}
        for idx, num in enumerate(nums):
            num_tran = tran_num(num)
            bucket = to_bucket(num_tran)        
            
            if bucket_dict.get(bucket) or \
                bucket_dict.get(bucket+1, num_tran+t+1) - num_tran <=t or \
                num_tran - bucket_dict.get(bucket-1, num_tran-t-1) <=t:
                return True
                
            if idx >=k:
                old_bucket = to_bucket(tran_num(nums[idx-k]))
                bucket_dict.pop(old_bucket)
            
            bucket_dict[bucket] = num_tran

        return False

Idea (https://leetcode.com/discuss/38206/ac-o-n-solution-in-java-using-buckets-with-explanation)

We want to check if a number differs at most t with any of previous k elements by checking if any two numbers fall into the same bucket we shall create.

To do so, we first need to add every number by `sys.maxint`. This is to make sure every number can be transformed into an non-negative number so that we don’t need to worry about how to make different bucketing rules for negative and positive numbers.

For one transformed number, `num_tran`, we assign it to bucket `num_tran/(t+1)`. If two numbers fall into the same bucket, they must be of at most t difference. However, two numbers can also differ at most t if they are from two adjacent buckets. That’s why we have three condition checkings at line 25-27.

bucket_dict always has at most K elements. We maintain that by always popping out idx-k element. (line 30-32)

The time complexity is O(N) and the space complexity is O(K).

Idea II

You can also maintain a balanced binary search tree of size K. Then you can always get the closest value of nums[i] (current visiting number) in the binary search tree in O(logK). If the closest value is within t of nums[i]’s range, you can return True. Since you need to traverse all numbers in nums, and for each number you need to search in O(logK) time, the time complexity is O(NlogK) and the space complexity is O(K).

Reference:

https://leetcode.com/discuss/38177/java-o-n-lg-k-solution

Leetcode 261: Graph Valid Tree

https://leetcode.com/problems/graph-valid-tree/

Graph Valid Tree

Total Accepted: 2794 Total Submissions: 10778 Difficulty: Medium

Given n nodes labeled from 0 to n - 1 and a list of undirected edges (each edge is a pair of nodes), write a function to check whether these edges make up a valid tree.

For example:

Given n = 5 and edges = [[0, 1], [0, 2], [0, 3], [1, 4]], return true.

Given n = 5 and edges = [[0, 1], [1, 2], [2, 3], [1, 3], [1, 4]], return false.

Hint:

Given n = 5 and edges = [[0, 1], [1, 2], [3, 4]], what should your return? Is this case a valid tree? Show More Hint

Note: you can assume that no duplicate edges will appear in edges. Since all edges are undirected, [0, 1] is the same as [1, 0] and thus will not appear together inedges.

Overall idea

We need to check two properties to determine whether a set of edges form a valid tree:

it has n-1 edges
it is acyclic

Code 1 (Union Find)

class Solution(object):
    def validTree(self, n, edges):
        """
        :type n: int
        :type edges: List[List[int]]
        :rtype: bool
        """
        union_arr = range(n)
        def find_union(p):
            if  union_arr[p] == p:
                return p
            else:
                return find_union(union_arr[p])        
                
        for p1, p2 in edges:
            p1_set = find_union(p1)        
            p2_set = find_union(p2)
            if p1_set == p2_set:
                return False
            union_arr[p1_set] = p2_set
        
        return len(edges) == n-1

Idea

This algorithm uses an idea called union find. You first initialize each node so that each node itself forms a node set. (We use `union_arr` to record which set a node belongs to). As we traverse all edges, we will find connected components. The union find algorithm makes sure that every node in a connected component will point to a same node set by using `find_union` function. Therefore, if we see a new edge with two points in the same node set, we will return False because the edge makes a cycle in the graph. If no cycle is found, we will finally check if there are exactly `n-1` edges to form a tree rather than disjoint parts in the graph. (line 22)

Reference

https://leetcode.com/discuss/52563/ac-java-union-find-solution

http://www.geeksforgeeks.org/union-find/

Updated 2016/09/30:

The union find algorithm can also be used to find connected components in an undirected graph. See leetcode 130: Surrounding Regions.

Code 2 (DFS)

class Solution(object):
    def validTree(self, n, edges):
        """
        :type n: int
        :type edges: List[List[int]]
        :rtype: bool
        """
        adj = {s:[] for s in range(n)}
        for p1, p2 in edges:
            adj[p1] += [p2]
            adj[p2] += [p1]
        stk = [0]
        while stk:
            cur = stk.pop()
            stk.extend(adj.pop(cur, []))
        return len(edges)==n-1 and not adj

Idea

We start to visit all nodes from node 0. (line 12) If we finish traversing all reachable nodes but there are still some adjacency matrix entry left then we know the given edges actually form multiple separate graphs. Therefore we should return False.

Code 3(BFS)

from collections import deque
class Solution(object):
    def validTree(self, n, edges):
        """
        :type n: int
        :type edges: List[List[int]]
        :rtype: bool
        """
        adj = {s:[] for s in range(n)}
        for p1, p2 in edges:
            adj[p1] += [p2]
            adj[p2] += [p1]
        q = deque()
        q.append(0)
        while q:
            cur = q.popleft()
            q.extend(adj.pop(cur, []))
        return len(edges)==n-1 and not adj

Idea

Similar idea as in code 2. But you implement the traversal of nodes using `deque`.

Leetcode 99: Validate Binary Search Tree

https://leetcode.com/problems/validate-binary-search-tree/

Validate Binary Search Tree

Total Accepted: 65929 Total Submissions: 326515 Difficulty: Medium

Given a binary tree, determine if it is a valid binary search tree (BST).

Assume a BST is defined as follows:

The left subtree of a node contains only nodes with keys less than the node’s key.
The right subtree of a node contains only nodes with keys greater than the node’s key.
Both the left and right subtrees must also be binary search trees.

confused what "{1,#,2,3}" means? > read more on how binary tree is serialized on OJ.

OJ’s Binary Tree Serialization:

The serialization of a binary tree follows a level order traversal, where ‘#’ signifies a path terminator where no node exists below.

Here’s an example:

The above binary tree is serialized as "{1,2,3,#,#,4,#,#,5}".

Code

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution(object):
    def isValidBST(self, root):
        """
        :type root: TreeNode
        :rtype: bool
        """
        return self.helper(root, None, None)
        
    def helper(self, root, min, max):
            if not root:
                return True

            if (min is not None and root.val <= min) or (max is not None and root.val >= max):
                return False
            
            return self.helper(root.left, min, root.val) and self.helper(root.right, root.val, max)

Idea

Just check the basic rule of BST: all nodes in a node’s left subtree should be less than the node’s value. all nodes in a node’s right subtree should be larger than the node’s value.

Another Code

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution(object):
    def isValidBST(self, root):
        """
        :type root: TreeNode
        :rtype: bool
        """
        self.prev = None
        return self.inorder(root)

    def inorder(self, root):
        if not root:
            return True
        if not self.inorder(root.left):
            return False
        if self.prev and self.prev.val >= root.val:
            return False
        self.prev = root
        return self.inorder(root.right)

Idea

We use another property of BST: the inorder traversal of BST should return us an ascending order of numbers. We can do inorder traversal with a variable (`prev` in our code) recording which node we visited before we are visiting the current node. Whenever we find `prev.val` is larger than the current node’s value, we report invalidity of the tree.

leetcode 61: Rotate List

https://leetcode.com/problems/rotate-list/

Rotate List

Total Accepted: 50875 Total Submissions: 233034 Difficulty: Medium

Given a list, rotate the list to the right by k places, where k is non-negative.

For example:
Given 1->2->3->4->5->NULL and k = 2,
return 4->5->1->2->3->NULL.

Code

# Definition for singly-linked list.
# class ListNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.next = None

class Solution(object):
    def rotateRight(self, head, k):
        """
        :type head: ListNode
        :type k: int
        :rtype: ListNode
        """
        if not head:
            return None
            
        len_cnt = 0
        tail_reach = False
        fast = head
        while k:
            len_cnt += 1
            if not fast.next:
                tail_reach = True
                fast = head
            else:
                fast = fast.next
            k -= 1
            if tail_reach:
                k %= len_cnt
        
        if fast == head:
            return head
        
        slow = head
        while fast.next:
            fast = fast.next
            slow = slow.next
        
        fast.next = head
        new_head = slow.next
        slow.next = None
        return new_head

Idea

Use `fast` pointer to traverse `k` nodes. Then use a slow pointer to traverse from head. Both fast and slow pointers traverse one node by one node, until fast pointer reaches the end (line 35-37). Therefore, the slow pointer traverses exactly len(nums)-k nodes. Its next node should be the new head.

More succinct code can be found: https://discuss.leetcode.com/topic/14470/my-clean-c-code-quite-standard-find-tail-and-reconnect-the-list

Leetcode 26: Remove Duplicates from Sorted Array

https://leetcode.com/problems/remove-duplicates-from-sorted-array/

Remove Duplicates from Sorted Array

Total Accepted: 87967 Total Submissions: 280545 Difficulty: Easy

Given a sorted array, remove the duplicates in place such that each element appear only once and return the new length.

Do not allocate extra space for another array, you must do this in place with constant memory.

For example,
Given input array nums = [1,1,2],

Your function should return length = 2, with the first two elements of nums being 1 and 2 respectively. It doesn’t matter what you leave beyond the new length.

Code

class Solution(object):
    def removeDuplicates(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        if not nums:
            return 0
        if len(nums)==1:
            return 1
            
        prev = nums[0]
        del_cnt = 0
        for i in xrange(1, len(nums)):
            idx = i- del_cnt
            if nums[idx] == prev:
                nums.pop(idx)
                del_cnt += 1
            else:
                prev = nums[idx]
            
        return len(nums)

Idea

My code doesn’t only return the length of non-duplicated elements but also pop duplicated elements in the original array.

Code that does not need to pop: https://discuss.leetcode.com/topic/3102/my-solution-time-o-n-space-o-1

Leetcode 53: Maximum Subarray

https://leetcode.com/problems/maximum-subarray/

Maximum Subarray

Total Accepted: 79595 Total Submissions: 227725 Difficulty: Medium

Find the contiguous subarray within an array (containing at least one number) which has the largest sum.

For example, given the array [−2,1,−3,4,−1,2,1,−5,4],
the contiguous subarray [4,−1,2,1] has the largest sum = 6.

click to show more practice.

More practice:

If you have figured out the O(n) solution, try coding another solution using the divide and conquer approach, which is more subtle.

Code

import sys
class Solution(object):
    def maxSubArray(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
            
        max_sum = -sys.maxint
        tmp_sum = 0
        
        for i in xrange(0, len(nums)):
            if tmp_sum + nums[i] >= 0:
                tmp_sum += nums[i]
                max_sum = max(max_sum, tmp_sum)
            else:
                tmp_sum = 0
                # for corner case where the array only contains negative nums
                max_sum = max(max_sum, nums[i])
                
        return max_sum

Idea

We create a variable `tmp_sum` to record what is the maximum sum you have seen. If you visit a positive number, you can definitely add it to `tmp_sum`; if you visit a negative number, as long as `tmp_sum` is still larger than 0, you can add it also. If a negative number makes `tmp_sum` to negative, that means the maximum sum cannot happen by including this negative number and any arbitrary numbers before it.

Another idea is to use divide and conquer:

https://leetcode.com/discuss/694/how-solve-maximum-subarray-using-divide-and-conquer-approach

Step1. Select the middle element of the array. So the maximum subarray may contain that middle element or not.

Step 2.1 If the maximum subarray does not contain the middle element, then we can apply the same algorithm to the the subarray to the left of the middle element and the subarray to the right of the middle element.

Step 2.2 If the maximum subarray does contain the middle element, then the result will be simply the maximum suffix subarray of the left subarray plus the maximum prefix subarray of the right subarray

Step 3 return the maximum of those three answer.

Here is a sample code for divide and conquer solution. Please try to understand the algorithm before look at the code:

class Solution {
public:
    int maxSubArray(int A[], int n) {
        // IMPORTANT: Please reset any member data you declared, as
        // the same Solution instance will be reused for each test case.
        if(n==0) return 0;
        return maxSubArrayHelperFunction(A,0,n-1);
    }
    
    int maxSubArrayHelperFunction(int A[], int left, int right) {
        if(right == left) return A[left];
        int middle = (left+right)/2;
        int leftans = maxSubArrayHelperFunction(A, left, middle);
        int rightans = maxSubArrayHelperFunction(A, middle+1, right);
        int leftmax = A[middle];
        int rightmax = A[middle+1];
        int temp = 0;
        for(int i=middle;i>=left;i--) {
            temp += A[i];
            if(temp > leftmax) leftmax = temp;
        }
        temp = 0;
        for(int i=middle+1;i<=right;i++) {
            temp += A[i];
            if(temp > rightmax) rightmax = temp;
        }
        return max(max(leftans, rightans),leftmax+rightmax);
    }
};

The divide and conquer algorithm takes O(nlogn) time and O(n) space just as merge sort.