Leetcode 148: Sort List

Total Accepted: 54042 Total Submissions: 237747 Difficulty: Medium

Sort a linked list in O(n log n) time using constant space complexity.

Code:

# Definition for singly-linked list.
# class ListNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.next = None

class Solution(object):
    def sortList(self, head):
        """
        :type head: ListNode
        :rtype: ListNode
        """
        if head is None or head.next is None:
            return head
        slow= head
        fast = head.next
        while fast.next and fast.next.next:
            slow = slow.next
            fast = fast.next.next
        
        s2 = self.sortList(slow.next)
        slow.next = None
        s1 = self.sortList(head)
        return self.merge(s1, s2)
        
        
    def merge(self, left, right):
        head = ListNode(0)
        c = head

        while left and right:
            if right.val < left.val:
                c.next = right
                c = right
                right = right.next
            else: 
                c.next = left
                c = left
                left = left.next
        while left:
                c.next = left
                c = left
                left = left.next
        while right:
                c.next = right
                c = right
                right = right.next
                
                
        return head.next

 

Idea: 

Essentially, it is a merge sort. And we apply the same idea as seen in Find LinkedList Cycle that we create a slow point and a fast point: slow point moves to next node every time, while fast point moves to point next node of next node every time. When the fast point hits the end, slow point stops at the middle of the linked list. Then we apply merge sort starting at slow.next. After that, we want to apply merge sort again in the first half linkedlist, from head to slow. So we need to set slow.next to None.

 
 

Leetcode 257: Binary Tree Path

https://leetcode.com/problems/binary-tree-paths/

Binary Tree Paths

Total Accepted: 15076 Total Submissions: 68979 Difficulty: Easy

Given a binary tree, return all root-to-leaf paths.

For example, given the following binary tree:

   1
 /   \
2     3
 \
  5

All root-to-leaf paths are:

["1->2->5", "1->3"]

 

Code1:

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution(object):
    def binaryTreePaths(self, root):
        """
        :type root: TreeNode
        :rtype: List[str]
        """
        if root is None:
            return []
            
        self.res = [] if root.left or root.right else [str(root.val)]
        if root.left:
            self.helper(str(root.val), root.left) 
        if root.right:
            self.helper(str(root.val), root.right)
        return self.res

    def helper(self, path, node):
        if node.left is None and node.right is None:
            self.res.append(path + "->" + str(node.val))
        else:
            if node.left:
                self.helper(path+ "->" + str(node.val), node.left)
            if node.right:
                self.helper(path+ "->" + str(node.val), node.right)

 

Code2:

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution(object):
    def binaryTreePaths(self, root):
        """
        :type root: TreeNode
        :rtype: List[str]
        """
        res = []
        if root is None:
            return res

        if root.left is None and root.right is None:
            res.append(str(root.val))
        else:
            if root.left:
                res.extend([str(root.val) + "->" + p for p in self.binaryTreePaths(root.left)])
            if root.right:
                res.extend([str(root.val) + "->" + p for p in self.binaryTreePaths(root.right)])
        print res
        return res

 

Code 1 is more optimized than code 2 in that code 2 will always return `res` list as result hence waste a lot of space. All in all it is just a DFS algorithm.

 

Leetcode 187: Repeated DNA Sequences

https://leetcode.com/problems/repeated-dna-sequences/

Repeated DNA Sequences

Total Accepted: 26334 Total Submissions: 125642 Difficulty: Medium

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

 

Code:

class Solution(object):
    def findRepeatedDnaSequences(self, s):
        """
        :type s: str
        :rtype: List[str]
        """
        ecd = 0
        res = []
        d = dict()
        for i in xrange(len(s)):
            ecd = ecd << 3
            ecd = ecd | ord(s[i]) & 7
            ecd = ecd & int('0b'+'1'*30, 2) 
            
            if d.get(ecd) is None:
                d[ecd] = 1
            elif d[ecd] == 1 and i >= 9:
                res.append(s[i-9:i+1])
                d[ecd] = 2
        return res

 

Idea:

Since we only have “A”, “C”, “G” and “T”, we can encode each character using the last three bits of its binary representation. We want to encode every 10 character substring eventually. We create a variable `ecd`, which should only have 30 bits valid (10 character*3 bits/character). Since python can allow you to have very large integer, you need to mask the last 30 bits. (Line 13) For every reading bit, you left shift the previous code 3 bits (line 11), intercept this character’s last 3 bits and then fill the last 3 empty bits (line 12).

Then it is easy to find whether one encode has been saved or not. If the encode already exists as a key in the dictionary, you only need to report it once.

Reference:

https://leetcode.com/discuss/24478/i-did-it-in-10-lines-of-c

Leetcode 236: Lowest Common Ancestor of a Binary Tree

https://leetcode.com/problems/lowest-common-ancestor-of-a-binary-tree/

Lowest Common Ancestor of a Binary Tree

Total Accepted: 17453 Total Submissions: 63885 Difficulty: Medium

Given a binary tree, find the lowest common ancestor (LCA) of two given nodes in the tree.

According to the definition of LCA on Wikipedia: “The lowest common ancestor is defined between two nodes v and w as the lowest node in T that has both v and w as descendants (where we allow a node to be a descendant of itself).”

        _______3______
       /              \
    ___5__          ___1__
   /      \        /      \
   6      _2       0       8
         /  \
         7   4

For example, the lowest common ancestor (LCA) of nodes 5 and 1 is 3. Another example is LCA of nodes 5 and 4 is 5, since a node can be a descendant of itself according to the LCA definition.

Code:

# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None

class Solution(object):
    def lowestCommonAncestor(self, root, p, q):
        if root is None:
            return None
        if root == p or root == q:       
            return root
        left = self.lowestCommonAncestor(root.left, p, q)
        right = self.lowestCommonAncestor(root.right, p, q)
        if left and right:
            return root
        elif left:
            return left
        elif right:
            return right
        else:
            None

 

Idea:

lowestCommonAncestor will do the following judgement:

  1. if applying lowestCommonAncestor on root.left returns non-None value, it means at least p or q is in its left subtree.
  2. if appling lowestCommonAncestor on root.right returns non-None value, it means at least p or q is in its right subtree.
  3. if both root’s left and root’s right return non-None values, then p and q are in its left subtree and right subtree. So the common ancestor should be root. 
  4. if only one of root’s left and root’s right returns non-None value, that means both p and q is in that subtree. In this case, just return whichever non-None value you get.
  5. If lowestCommonAncestor returns none, that means neither p or q is in root and its subtrees.

 

Leetcode 226: Invert Binary Tree

Total Accepted: 39609 Total Submissions: 100664 Difficulty: Easy

Invert a binary tree.

     4
   /   \
  2     7
 / \   / \
1   3 6   9

to

     4
   /   \
  7     2
 / \   / \
9   6 3   1

Trivia:
This problem was inspired by this original tweet by Max Howell:

Google: 90% of our engineers use the software you wrote (Homebrew), but you can’t invert a binary tree on a whiteboard so fuck off.

 

Code

# Definition for a binary tree node.
# class TreeNode(object):
# def __init__(self, x):
# self.val = x
# self.left = None
# self.right = None

class Solution(object):
    def invertTree(self, root):
        """
        :type root: TreeNode
        :rtype: TreeNode
        """
        if root is None:
            return

        root.left, root.right = root.right, root.left

        if root.left:
            self.invertTree(root.left)
        if root.right:
            self.invertTree(root.right) 

        return root

 

 

 

Leetcode 5: Longest Palindromic Substring

Longest Palindromic Substring

Total Accepted: 71879 Total Submissions: 343504 Difficulty: Medium

Given a string S, find the longest palindromic substring in S. You may assume that the maximum length of S is 1000, and there exists one unique longest palindromic substring.

Code:

class Solution(object):
    def longestPalindrome(self, s):
        """
        :type s: str
        :rtype: str
        """
        if len(s) == 0:
            return ""

        start = 0
        max_len = 1        

        for i in xrange(len(s)):
            if i-max_len-1 >=0 and s[i-max_len-1:i+1] == s[i-max_len-1:i+1][::-1]:
                start = i - max_len - 1
                max_len = max_len + 2
                continue

            if i-max_len >=0 and s[i-max_len:i+1] == s[i-max_len:i+1][::-1]:
                start = i - max_len
                max_len = max_len + 1

        return s[start:start+max_len]

 

Idea: 

The optimal solution is O(N^2) time complex. So nothing very special. You traverse every position in `s`. You always compare if s[i-max_len-1:i+1] reads same as s[i-max_len-1:i+1][::-1]. See the picture below.

Untitled Diagram

We first test whether s[i-max_len-1:i+1] is palindrome then s[i-max_len:i+1] because the next palindrome can possibly be longer than the previous palindrome by at most 2 characters. 

 

Reference:

https://leetcode.com/discuss/21332/python-o-n-2-method-with-some-optimization-88ms

Compile Hadoop 2.7 on Eclipse on Ubuntu 15.04

  1. Install libtool: sudo apt-get install libtool 
  2. Install protoc 2.5: http://stackoverflow.com/questions/29797763/how-do-i-install-protobuf-2-5-on-arch-linux-for-compiling-hadoop-2-6-0-using-mav
  3. Git clone hadoop-common and compile using Maven then import into Eclipse: https://wiki.apache.org/hadoop/EclipseEnvironment
  4. Make sure Eclipse uses officially supported JRE: https://wiki.apache.org/hadoop/HadoopJavaVersions, http://help.eclipse.org/mars/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Ftasks%2Ftask-add_new_jre.htm
  5. You may still find some class cannot be resolved, or some import package can’t be resolved. This is due to that Maven generates some java files that Eclipse doesn’t know how to link. The only way to do that is to move those mvn generated files (usually in `some nested directory/target/generated-sources or generated-test-sources`) to the correct place as indicated from the Eclipse error message.
  6. For other miscellaneous errors you can refer to http://diary-of-programmer.blogspot.com/2014/11/hadoop-tips-hadoop-build-error-related-findbugs-protobuf-avro.html
  7. Run the word count example to test!

Python multiprocessing map function with shared memory object as additional parameter

Suppose you want to create a bunch of processes to consume a list of elements:

import multiprocessing
def consume(ele):
    return ele ** 2    

if __name__ == '__main__': 
    m_list = [9,8,7,6,5]
    pool = multiprocessing.Pool(processes=300)
    result = pool.map(consume, m_list)
    print result

 

Now, if you want to have a dictionary to be shared across processes, and pass it as an additional parameter to `consume`:

import multiprocessing
from multiprocessing import Manager
import functools

def consume(ele, dic):
    if dic.get(ele) is None:
        dic[ele] = ele ** 2
    return ele ** 2    

if __name__ == '__main__': 
    m_list = [9,8,7,6,5]
    # Use proc manager to create a shared memory dict across processes
    proc_manager = Manager()
    m_dict = proc_manager.dict([(s, s**2) for s in [3,4,5,6]])
    pool = multiprocessing.Pool(processes=300)
    result = pool.map(functools.partial(consume, dic=m_dict), m_list)
    print result
    print m_dict

 

Notice that we use functools to add additional parameter. We use Manager to create a shared memory object across processes.

 

p.s.: This post may be helpful to understand how (when) processes are forked in Python: http://stackoverflow.com/questions/8640367/python-manager-dict-in-multiprocessing/9536888#9536888

Very important to set verbose output for Ubuntu User

I often mess up with Ubuntu System so much that I can’t manage to boot the system. I always end up with being stuck at the splash screen with no clue what is going on in the background. So, I suggest that for every ubuntu user you should turn on verbose output during the boot. This can be achieved by:

  1. open /etc/default/grub, change quiet splash to empty string for `GRUB_CMDLINE_LINUX_DEFAULT`.
    # GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
    GRUB_CMDLINE_LINUX_DEFAULT=""
  2.  Update to take effect:

    sudo update-grub

     

Reference:

http://askubuntu.com/questions/248/how-can-i-show-or-hide-boot-messages-when-ubuntu-starts

Leetcode 4: Median of Two Sorted Arrays

https://leetcode.com/problems/median-of-two-sorted-arrays/

Median of Two Sorted Arrays

Total Accepted: 65901 Total Submissions: 385447 Difficulty: Hard

There are two sorted arrays nums1 and nums2 of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Let me first show the code and then show the idea.

import sys

class Solution(object):
    def findMedianSortedArrays(self, nums1, nums2):
        """
        :type nums1: List[int]
        :type nums2: List[int]
        :rtype: float
        """
        if len(nums1) == 0 and len(nums2) == 0:
           return None
        	
        if len(nums1) < len(nums2):
           nums1, nums2 = nums2, nums1

        if len(nums2) == 0:        
           return (nums1[(len(nums1)-1)/2] + nums1[len(nums1)/2]) / 2.0

        jmin, jmax = 0, 2*len(nums2)        
        while jmin <= jmax:
            jmid = (jmin + jmax) / 2
            imid = len(nums1) + len(nums2) - jmid
            i_left = nums1[(imid-1)/2] if imid else - sys.maxint
            i_right = nums1[imid/2] if 2*len(nums1) - imid else sys.maxint
            j_left = nums2[(jmid-1)/2] if jmid else - sys.maxint
            j_right = nums2[jmid/2] if 2*len(nums2) - jmid else sys.maxint
            if i_left > j_right:
                jmin = jmid + 1
            elif j_left > i_right:
                jmax = jmid - 1
            else: // i_left <= j_right and j_left <= i_right               
                return (max(i_left, j_left) + min(i_right, j_right)) / 2.0 

 

Idea

The length of nums1 or nums2 can be odd or even. The total length of nums1 and nums2 can also be odd or even. To calculate median, if the list has N (N is odd) elements, you should return the very middle element list[N/2]. If N is even, you should return the average of the two middle elements: (list[(N-1)/2] + list[N/2]) / 2. All these considerations will make your brain twisted. So first of all we need to come up with a more general notation system to deal with either odd or even numbers.

The notation system can be better illustrated in the following two examples:

original array: 99, 100, 101

original index: 0, 1, 2

expanded array: #, 99, #, 100, #, 101, #

expanded index: 0, 1, 2, 3, 4, 5, 6

median cut original index: 1 

median cut expanded index: 3

median: 100

 

original array: 99, 100, 101, 102

original index: 0, 1, 2, 3

expanded array: #, 99, #, 100, #, 101, #, 102, #

expanded index: 0, 1, 2, 3, 4, 5, 6, 7, 8

median cut original index: 2,3 

median cut expanded index: 4

median: (100 + 101)/2

The two arrays are expanded by adding `#` between numbers as well as at the beginning and at the end of the array. You may find that ‘#’ always have odd index in the expanded array. Numbers always have even index in the expanded array. Every position in the expanded array denotes a possible cut. A cut at position `c` (0-based, c!=0 and c!=len(arr)) in the expanded array will return:

  1. If you cut at `#`, then the cut will return the average of two numbers on its left and right. i.e., (ori[(c-1)/2] + ori)/2
  2. If you cut at a number in the expanded array, you return that number directly. Or equivalently, you can imagine you cut on that number so the number is copied and splitted both on the cut’s left side and right side. So you still return the average of two (same) numbers on its left and right, i.e., (ori[(c-1)/2] + ori)/2, since ori[(c-1)/2] and ori always point to that number in this case.

The expanded array always has odd number length. If we know where we should cut in the expanded array to find the median of the original array (say position `c`), you can always return (ori[(c-1)/2] + ori)/2. You can think that the largest value on the left of cut `c` in the original array is always ori[(c-1)/2]. The smallest value on the right of cut `c` in the original array is always ori. 

Now go back to our problem. We have nums1 and nums2. We make sure that nums1 always has no shorter length than nums2 in order to make us focus only on one case. (line 13-14.) 

From now on we suppose that nums1 and nums2 have already been expanded. So the two expanded arrays have 2*N1+1 and 2*N2+1 length respectively. (N1 >= N2). The total length of nums1 and nums2 is thus 2*N1+2*N2+2, an even number.  The whole algorithm’s idea is to find two cuts, cut_i and cut_j, in nums1 and nums2 so that the median of the two lists resides in the middle of the two cuts. Such two cuts, cut_i and cut_j, will have two following properties:

  1. There should be totally (N1+N2) elements on the left of cut_i in nums1 and on the left of cut_j in nums2. There should also be totally (N1+N2) elements on the right of cut_1 in nums1 and on the left of cut_j in nums2. So once cut_i’s or (cut_j’s) index is determined, the other is determined also.
  2. The median is between the two cuts’ values. So, the largest element in nums1 which is on the left of cut_i (nums1[(cut_i-1)/2]), should be no larger than the smallest element on the right of cut_j in nums2 (nums2[cut_j/2]). The largest element on the left of cut_j in nums2, nums2[(cut_j-1)/2], should be no larger than the smallest element on the right of cut_i in nums1 (nums1[cut_i/2]).  

Using these two properties, you can start to binary search a cut in nums2, and consequently determine the cut in nums1. If either of the two properties doesn’t hold, you need to adjust the cut in nums2 in a binary search manner. However, we shouldn’t first start binary search a cut in nums1 (the longer list) and then determine the cut in nums2 (the shorter list). Because you may find a cut in nums1 on the left of which there already are more elements than N1+N2. In that case, you can’t even determine where is the cut in nums2.

The only corner cases you need to handle is when any cut is at index 0 or end index. But such cases can be handled intuitively: if you cut at index 0, your left value should be -inf. If you cut at end index, your right value should be inf.

 

 

Reference

https://leetcode.com/discuss/41621/very-concise-iterative-solution-with-detailed-explanation