czxttkl – Page 34

Leetcode 139: WordBreak

https://leetcode.com/problems/word-break/

Word Break

Total Accepted: 65299 Total Submissions: 278950 Difficulty: Medium

Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

For example, given
s = "leetcode",
dict = ["leet", "code"].

Return true because "leetcode" can be segmented as "leet code".

Let’s say we have K words in dictionary and s has length N.

DP Idea

You create a boolean array `f` of length `len(s)+1`. “`f[i] == True`” means substring `s[0:i-1]` can be matched successfully.

Jave code (Ref: https://leetcode.com/discuss/18904/java-implementation-using-dp-in-two-ways):

public class Solution {
    public boolean wordBreak(String s, Set<String> dict) {

        boolean[] f = new boolean[s.length() + 1];
        Arrays.fill(f, false);

        f[0] = true;

        //Second DP
        for(int i=1; i <= s.length(); i++){
            for(int j=0; j < i; j++){
                if(f[j] && dict.contains(s.substring(j, i))){
                    f[i] = true;
                    break;
                }
            }
        }

        return f[s.length()];
    }
}

This algorithm takes O(N^2) time complexity and O(N) space.

Python Code Using DP (Ref: https://leetcode.com/discuss/23605/simple-dp-solution-in-python-with-description)

def word_break(s, words):
    d = [False] * len(s)    
    for i in range(len(s)):
        for w in words:
            if w == s[i-len(w)+1:i+1] and (d[i-len(w)] or i-len(w) == -1):
                d[i] = True
    return d[-1]

This algorithm improves the time complexity to O(NK) but still uses O(N) space.

Another improved Python Code

class Solution(object):
    def wordBreak(self, s, wordDict):
        """
        :type s: str
        :type wordDict: Set[str]
        :rtype: bool
        """
  
        break_idx = []
        for i in xrange(len(s)):
            if s[0:i+1] in wordDict:
                break_idx.append(i)
            else:
                for b in break_idx:
                    if s[b+1:i+1] in wordDict:
                        break_idx.append(i)
                        break
        
        return len(break_idx)>0 and break_idx[-1]== len(s)-1

This algorithm works better if you know there are only a few positions in `s` that can be segmented into words. (i.e., you know d[i]==True for only a small number of i, in the previous algorithm)

BFS Idea:

https://leetcode.com/discuss/8479/a-solution-using-bfs

But it is essentially similar to the DP idea. The time complexity is O(N^2).

Code

from collections import deque

class Solution(object):
    def wordBreak(self, s, wordDict):
        """
        :type s: str
        :type wordDict: Set[str]
        :rtype: bool
        """
        l = deque([0])
        visited = set()
        while l:
            n = l.popleft()
            for word in wordDict:
                start = n
                end = start + len(word)
                if s[start:end] == word and end not in visited:
                    if end == len(s):
                        return True
                    l.append(end)
                    visited.add(end)
        
        return False

You can also use Trie (prefix tree) to solve this problem. (TODO)

Leetcode 23: Merge k sorted lists

https://leetcode.com/problems/merge-k-sorted-lists/

Merge k Sorted Lists

Total Accepted: 59265 Total Submissions: 278075 Difficulty: Hard

Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.

Code 1:

import heapq
# Definition for singly-linked list.
# class ListNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.next = None

class Solution(object):
    def mergeKLists(self, lists):
        """
        :type lists: List[ListNode]
        :rtype: ListNode
        """
        dummy = ListNode(-1)
        cur = dummy
        if not lists:
            return None

        h = [(lists[i].val, lists[i]) for i in xrange(len(lists)) if lists[i] ]

        heapq.heapify(h)

        while len(h) > 0:
            t = heapq.heappop(h)
            cur.next = t[1]
            cur = cur.next
            if t[1].next:
                heapq.heappush(h, (t[1].next.val, t[1].next)) 

        return dummy.next

Use a min-heap to store k lists’ heads. Build a heap requires O(k) time. Every time you pop the least value from the heap, and push into that value’s next node into the heap. Since there are N elements in total and each heap push takes O(logK) time, the algorithm takes O(K+NlogK) time (O(K) is for heapify) but also O(K) additional space.

See [1] and [2] why the time complexity of heapify is O(K).

[1] http://stackoverflow.com/questions/9755721/how-can-building-a-heap-be-on-time-complexity

[2] https://www.cs.bgu.ac.il/~ds122/wiki.files/Presentation09.pdf

Code 2:

Use O(NlogK) time and there is no additional space required. Everytime you merge k/2 pairs lists. Reference: https://leetcode.com/discuss/20774/sharing-straightforward-solution-without-structure-vector

class Solution {
public:
    ListNode *mergeTwoLists(ListNode* l1, ListNode* l2) {
        if (NULL == l1) return l2;
        else if (NULL == l2) return l1;
        if (l1->val <= l2->val) {
            l1->next = mergeTwoLists(l1->next, l2);
            return l1;
        }
        else {
            l2->next = mergeTwoLists(l1, l2->next);
            return l2;
        }
    }
    ListNode *mergeKLists(vector<ListNode *> &lists) {
        if (lists.empty()) return NULL;
        int len = lists.size();
        while (len > 1) {
            for (int i = 0; i < len / 2; ++i) {
                lists[i] = mergeTwoLists(lists[i], lists[len - 1 - i]);
            }
            len = (len + 1) / 2;
        }

        return lists.front();
    }
};

Similar Idea using Java:

public class Solution {
    public ListNode mergeTwoLists(ListNode l1, ListNode l2) {
        if (l1 == null) return l2;
        if (l2 == null) return l1;

        ListNode head=null;
        ListNode former=null;
        while (l1!=null&&l2!=null) {
            if (l1.val>l2.val) {
                if (former==null) former=l2; else former.next=l2;
                if (head==null) head=former; else former=former.next;
                l2=l2.next;
            } else {
                if (former==null) former=l1; else former.next=l1;
                if (head==null) head=former; else former=former.next;
                l1=l1.next;
            }
        }
        if (l2!=null) l1=l2;
        former.next=l1;
        return head;

    }

    public ListNode mergeKLists(List<ListNode> lists) {
        if (lists.size()==0) return null;
        if (lists.size()==1) return lists.get(0);
        if (lists.size()==2) return mergeTwoLists(lists.get(0), lists.get(1));
        return mergeTwoLists(mergeKLists(lists.subList(0, lists.size()/2)), 
            mergeKLists(lists.subList(lists.size()/2, lists.size())));
    }
}

https://leetcode.com/discuss/9279/a-java-solution-based-on-priority-queue

Reference:

Leetcode 269: Alien Dictionary

https://leetcode.com/problems/alien-dictionary/

Alien Dictionary

Total Accepted: 1390 Total Submissions: 8315 Difficulty: Hard

There is a new alien language which uses the latin alphabet. However, the order among letters are unknown to you. You receive a list of words from the dictionary, where words are sorted lexicographically by the rules of this new language. Derive the order of letters in this language.

For example,
Given the following words in dictionary,

[
  "wrt",
  "wrf",
  "er",
  "ett",
  "rftt"
]

The correct order is: "wertf".

Note:

You may assume all letters are in lowercase.
If the order is invalid, return an empty string.
There may be multiple valid order of letters, return any one of them is fine.

Code

from sets import Set

class Solution(object):
    def alienOrder(self, words):
        """
        :type words: List[str]
        :rtype: str
        """
        if not words:
            return ""

        if len(words) == 1:
            return "".join([s for s in Set(words[0])])

        # len(words) >=2
        adj_dict = self.build_graph(words)
        visit_dict = dict([(k,0) for k in adj_dict.keys()])
        res = []
        self.valid = True
        
        for c in adj_dict.keys():
            self.topo_sort(c, visit_dict, adj_dict, res)
        return "".join([s for s in res]) if self.valid else ""

    def build_graph(self, words):
        adj_dict = {}

        def add_node(c):
            if adj_dict.get(c, None) == None:
                adj_dict = {}

        def add_link(src, des):
            adj_dict[src][des] = None

        for i in xrange(len(words)-1):
            w1 = words[i]
            w2 = words[i+1]
            start2 = 0
            for j in xrange(min(len(w1), len(w2))):
                if w1[j] == w2[j]:
                    add_node(w1[j])
                else:
                    add_node(w1[j])
                    add_node(w2[j])
                    add_link(w2[j], w1[j]) 
                    start2 = j+1
                    break   
            for j in xrange(start2, len(w1)):
                add_node(w1[j])             
            for j in xrange(start2, len(w2)):
                add_node(w2[j])
                
        return adj_dict

    def topo_sort(self, c, visit_dict, adj_dict, res):
        if self.valid and not visit_dict:
            visit_dict = 1
            for adj_c in adj_dict.keys():
                if visit_dict[adj_c] == 1:
                    self.valid = False
                    return
                if not visit_dict[adj_c] and self.valid:
                    self.topo_sort(adj_c, visit_dict, adj_dict, res)
            
            visit_dict = 2
            res.append(c)

Idea

Build a graph first, in which nodes are letters and a link from letter A to letter B means A should be latter than B in the final order. Then use topological sort to output nodes in order. https://en.wikipedia.org/wiki/Topological_sorting

Review: The Google File System

Original paper: http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf

Cons and pros on Quora: https://www.quora.com/Reviews-of-Google-File-System

Paper discussion: http://pages.cs.wisc.edu/~swift/classes/cs736-fa08/blog/2008/11/the_google_file_system.html

A blog discussion: http://kaushiki-gfs.blogspot.com/

A ppt discussion: http://www.slideshare.net/tutchiio/gfs-google-file-system

A QA series: http://pages.cs.wisc.edu/~thanhdo/qual-notes/fs/fs4-gfs.txt

GFS wiki: http://google-file-system.wikispaces.asu.edu/

2.3

GFS uses one-master-mutiple-chunkservers mode. Client will query the master about the metadata of files, which reside in the master’s memory hence query is fast. The real data operation takes place at chunkservers. The client can cache metadata of chunkservers so interaction with the master can be reduced.

2.5

A relative large chunk size (64MB) can reduce interaction with the master further: many read/write can happen on the same chunk and network overhead can be reduced by “keeping a persistent TCP connection to the chunkserver over an extended period of time”. The only shortage happens when small executable files, usually taking only one chunk, are requested from many clients at the same time and thus pressure the few chunk servers. Imagine the normal usage case, where a large file usually takes several chunks. Different clients may access different chunks therefore no particular chunk servers become overloaded.

2.7

GFS has a relaxed consistency model. GFS guarantees that:For example, `Record Append` make sure the content to be appended will be appended at least once. The append offset is chosen by GFS to be at the end of the file but not fixed as in normal POSIX. In most cases, after concurrent successes of `Record Append`, the file region will be defined, i.e. all users can see the same content which have already had all their contents appended. Application should deal with duplication or any other inconsistency itself: http://www.cs.cornell.edu/courses/cs6464/2009sp/lectures/15-gfs.pdf. Also from Section 3 you will know that GFS uses leases to ensure that all replicas execute mutations in the same order to achieve consistency.

That means, GFS is not transparent to applications. Applications need to deal with duplication and validity of data.

4.1 Namespace Management and Locking

You first need to know read lock and write lock:

http://www.coderanch.com/t/418888/java-EJB-SCBCD/certification/Difference-Read-Lock-Write-Lock

https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock

In a nutshell, write lock can only be acquired exclusively. Read lock can be acquired concurrently by many clients.

There is no per-directory data structure in GFS. With prefix compression (something like we see from information theory), the whole namespace tree is loaded in memory. Each node in the namespace tree has an associated read-write lock. The paper gives the following example to illustrate how locks are acquired when two operations, snapshot and file creation, happen at the same time:

Note that snapshot operation needs to get write lock of `/home/user` because during the snapshot you can’t let other client to get any lock of `/home/user`. For example, if another client can get read lock of `/home/user`, then if it subsequently gets a write lock of `/home/user/foo`, it can create files `/home/user` during `/home/user` is being snapshotted. In the consequence, there might be data inconsistency between `/home/user` and `/save/user`.

At the end

Actually I get very vague design information from the paper. I guess it is because I have shallow knowledge in large distributed system design. I will come back and update this post in the future. Let me use one slide from others to conclude it now:

Leetcode 146: LRU cache

https://leetcode.com/problems/lru-cache/

LRU Cache

Total Accepted: 50542 Total Submissions: 329169 Difficulty: Hard

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations: get and set.

get(key) – Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
set(key, value) – Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.

Code:

class Node(object):
    def __init__(self, val, key=None):
        self._prev = None
        self._next = None
        self._val = val
        self._key = key
    
    @property
    def val(self):
        return self._val
    
    @val.setter
    def val(self, value):
        self._val = value
        
    @property
    def key(self):
        return self._key

    @property
    def prev(self):
        return self._prev 

    @prev.setter
    def prev(self, node):
        self._prev = node     
   
    @property
    def next(self):
        return self._next

    @next.setter
    def next(self, node):
        self._next = node

class LRUCache(object):

    def __init__(self, capacity):
        """
        :type capacity: int
        """
        self.capacity = capacity
        self.table = {}
        self.dummy_head = Node(-1)
        self.dummy_tail = Node(-1)
        self.dummy_head.next = self.dummy_tail
        self.dummy_tail.prev = self.dummy_head

    def get(self, key):
        """
        :rtype: int
        """
        if self.table.get(key, None):
            node = self.table.get(key)
            self.del_node(node)
            self.tail_node(node)
            return node.val
        else:
            return -1

    def set(self, key, value):
        """
        :type key: int
        :type value: int
        :rtype: nothing
        """
        if self.table.get(key, None):
            node = self.table.get(key)
            node.val = value
            self.del_node(node)
            self.tail_node(node)            
        else:
            node = Node(value, key)
            self.table[key] = node
            self.tail_node(node)

            if len(self.table) > self.capacity: 
                node_del = self.dummy_head.next
                self.del_node(node_del)
                self.table.pop(node_del.key, None)

    def del_node(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def tail_node(self, node):
        node.prev = self.dummy_tail.prev
        node.next = self.dummy_tail
        self.dummy_tail.prev.next = node
        self.dummy_tail.prev = node

Idea:

First, we want to create a double linked list storing caches. Using this data structure, we can insert and delete cache in O(1) time. To be more convenient, we have a dummy head and dummy tail at the beginning and the end of the double linked list. All caches will be inserted between dummy_head and dummy_tail. The cache closer to dummy_head is less recently used. The cache closer to dummy_tail is more recently used. In `LRU`, we also have a `self.table` hash table to record key-node mapping.

Now, when you set a cache, you check whether it is stored in the double linked list before. If it is (line 67), we update its value. Then we remove it from the double linked list temporarily and then add it to the tail. If the cache is not stored before, you have to add its key to self.table as well as add it to the tail. However, this may cause the cache size overflow. To deal with it, you need to remove the next element dummy_head points to.

A caveat is that `get` function will not simply return a cache’s value but also needs to put that cache to the tail of the double linked list (line 55-56).

Reference:

https://leetcode.com/discuss/20139/java-hashtable-double-linked-list-with-touch-of-pseudo-nodes

Leetcode 271: Encode and Decode Strings

https://leetcode.com/problems/encode-and-decode-strings/

Encode and Decode Strings

Total Accepted: 1272 Total Submissions: 4999 Difficulty: Medium

Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.

Machine 1 (sender) has the function:

string encode(vector<string> strs) {
  // ... your code
  return encoded_string;
}

Machine 2 (receiver) has the function:

vector<string> decode(string s) {
  //... your code
  return strs;
}

So Machine 1 does:

string encoded_string = encode(strs);

and Machine 2 does:

vector<string> strs2 = decode(encoded_string);

strs2 in Machine 2 should be the same as strs in Machine 1.

Implement the encode and decode methods.

Note:

The string may contain any possible characters out of 256 valid ascii characters. Your algorithm should be generalized enough to work on any possible characters.
Do not use class member/global/static variables to store states. Your encode and decode algorithms should be stateless.
Do not rely on any library method such as eval or serialize methods. You should implement your own encode/decode algorithm.

Code:

class Codec:
    def encode(self, strs): 
        return ''.join(s.replace('|', '||') + ' | ' for s in strs) 
    def decode(self, s): 
        return [t.replace('||', '|') for t in s.split(' | ')[:-1]]

Idea:

Use ” | ” as a delimiter of strings. If any string has “|” in it, you need to replace it with “||” in advance. When decoding, you first get ” | ” splitted words. Then, if there is any “||” in splitted words, you need to replace it back with “|”. Don’t forget to exclude the last element of `s.split(‘ | ‘)` since it is an empty string.

Reference:

https://leetcode.com/discuss/54910/1-liners-ruby-python

Leetcode 15: 3Sum

https://leetcode.com/problems/3sum/

3Sum

Total Accepted: 78227 Total Submissions: 457872 Difficulty: Medium

Given an array S of n integers, are there elements a, b, c in S such that a + b + c = 0? Find all unique triplets in the array which gives the sum of zero.

Note:

Elements in a triplet (a,b,c) must be in non-descending order. (ie, a ≤ b ≤ c)
The solution set must not contain duplicate triplets.

    For example, given array S = {-1 0 1 2 -1 -4},

    A solution set is:
    (-1, 0, 1)
    (-1, -1, 2)

Code

class Solution(object):
    def threeSum(self, nums):
        """
        :type nums: List[int]
        :rtype: List[List[int]]
        """
        if not nums or len(nums) < 3:
            return []
            
        res = []
        nums.sort()
        
        for i in xrange(len(nums)-2):
            if i > 0 and nums[i-1] == nums[i]:
                continue
            
            target = -nums[i]
            left = i + 1
            right = len(nums) - 1
            while left < right:
                if nums[left] + nums[right] == target:
                    res.append([nums[i], nums[left], nums[right]])
                    left_val = nums[left]
                    right_val = nums[right]
                    left += 1
                    right -= 1
                    while left < right and nums[left]==left_val:
                        left += 1
                    while right > left and nums[right]==right_val:
                        right -= 1
                elif nums[left] + nums[right] < target:
                    left = left + 1
                else:
                    right = right - 1
                    
        return res

Idea:

You sort the array first in O(nlogn) time. Then you have a pointer `i` starting from 0 and ending at len(nums)-3. The number pointed by `i` denotes the candidate for the first number in a 3sum triplet. You want to run a 2sums algorithm on all elements on the right of `i` with target equal to `-nums[i]`: you create a `left` pointer right next to `i` and a `right` pointer at the end of nums. You move `left` to right or move `right` to left according to `nums[left] + nums[right] ? target`. You should take care of duplication, either for `i` (line 14 -15) and for `left` and `right` (line 27-30)

The time complexity is O(N^2) and space complexity is O(1).

Reference:

Leetcode 1: Two sums

https://leetcode.com/problems/two-sum/

Two Sum

Total Accepted: 141545 Total Submissions: 770154 Difficulty: Medium

Given an array of integers, find two numbers such that they add up to a specific target number.

The function twoSum should return indices of the two numbers such that they add up to the target, where index1 must be less than index2. Please note that your returned answers (both index1 and index2) are not zero-based.

You may assume that each input would have exactly one solution.

Input: numbers={2, 7, 11, 15}, target=9
Output: index1=1, index2=2

Code:

class Solution(object):
    def twoSum(self, nums, target):
        """
        :type nums: List[int]
        :type target: int
        :rtype: List[int]
        """
        dict = {}
        for idx, vlu in enumerate(nums):
            idx2 = dict.get(target - vlu, -1)
            if idx2 < 0:
                dict[vlu] = idx
            else:
                return [idx2+1, idx+1]

Idea:

O(N) space. O(N) time. Read my another post on 3Sum, which sorts array in place in O(nlogn) time but costs no additional space. (3Sum algorithm takes O(N^2) overall)

UPDATE (2016/2/13):
The return format had been changed to zero-based indices. Please read the above updated description carefully.

Leetcode 33: Search in Rotated Sorted Array

https://leetcode.com/problems/search-in-rotated-sorted-array/

Search in Rotated Sorted Array

Total Accepted: 72911 Total Submissions: 252621 Difficulty: Hard

Suppose a sorted array is rotated at some pivot unknown to you beforehand.

(i.e., 0 1 2 4 5 6 7 might become 4 5 6 7 0 1 2).

You are given a target value to search. If found in the array return its index, otherwise return -1.

You may assume no duplicate exists in the array.

Code

class Solution(object):
    def search(self, nums, target):
        """
        :type nums: List[int]
        :type target: int
        :rtype: int
        """
        if not nums:
            return -1
        
        left, right = 0, len(nums)-1                    
        while left <= right:
            mid = (left + right) / 2
            if nums[mid] == target:
                return mid
            if nums[left] <=nums[mid]:
            # left ok
                if nums[left] <= target < nums[mid]:
                    right = mid - 1
                else:
                    left = mid + 1
            else:
            # right ok
                if nums[right] >= target > nums[mid]:
                    left = mid + 1
                else:
                    right = mid - 1

        return -1

Idea

It is a variant of binary search. You have `left`, `right` and `mid` three pointers. You only return exact index of `mid` if you find `nums[mid] == target`. Otherwises you return -1. To identify whether you need to search in `[left, mid-1]` or `[mid+1, right]`, you need to determine `nums[left] <= nums[mid]` (line 16). If it is True, that means `nums[left:mid]` is intact from rotation. If it is False, that means `nums[mid+1:right]` is intact by rotation. Then you determine you need to search in the intact part of the rotated part (line 18 and 24).

Reference:

Leetcode 272: Closest Binary Search Tree Value II

https://leetcode.com/problems/closest-binary-search-tree-value-ii/

Closest Binary Search Tree Value II

Total Accepted: 1212 Total Submissions: 4499 Difficulty: Hard

Given a non-empty binary search tree and a target value, find k values in the BST that are closest to the target.

Note:

Given target value is a floating point.
You may assume k is always valid, that is: k ≤ total nodes.
You are guaranteed to have only one unique set of k values in the BST that are closest to the target.

Follow up:
Assume that the BST is balanced, could you solve it in less than O(n) runtime (where n = total nodes)?

Code

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution(object):
    def closestKValues(self, root, target, k):
        """
        :type root: TreeNode
        :type target: float
        :type k: int
        :rtype: List[int]
        """
        stk1 = []
        stk2 = []
        self.inorder(root, False, target, stk1)
        self.inorder(root, True, target, stk2)
        
        res = []
        for _ in xrange(k):
            if not stk1:
                res.append(stk2.pop())
            elif not stk2:
                res.append(stk1.pop())
            else:
                if abs(stk1[len(stk1)-1] - target) < abs(stk2[len(stk2)-1] - target):
                    res.append(stk1.pop())
                else:
                    res.append(stk2.pop())
        return res
        
    def inorder(self, root, reverse, target, stk):
        if root is None:
            return 
        self.inorder(root.right, reverse, target, stk) if reverse else self.inorder(root.left, reverse, target, stk)
        # The first inequality is less than or equal, the second inequality must be larger than (without equal).
        # Or the first is less than, the second is larger than or equal to
        if not reverse and target <= root.val:
            return
        if reverse and target > root.val:
            return
        stk.append(root.val)
        self.inorder(root.left, reverse, target, stk) if reverse else self.inorder(root.right, reverse, target, stk)

Idea:

If we do inorder depth-first traverse (left->root->right) of a binary search tree, we can get an ascending sorted list of elements. If we do reverse inorder depth-first traverse (right->root->left) of a binary search tree, we can get a descending sorted list.

So we create two stacks to store. Stack 1 records the ascending sorted list but it terminates before the element which <= target. (line 40). Stack 2 records the descending sorted list but it terminates at the element which < target. (line 42). (You can also make the first inequality with less than, and the second inequality with larger than or equal to. The main point is that the two stacks shouldn’t contain any element at the same time.)

At last you peek at stack 1 and stack 2. You need to pop element into `res` list from whichever stack has the top element closer to target.

Reference:

https://leetcode.com/discuss/55240/ac-clean-java-solution-using-two-stacks