Let’s say we have predictor variables (features) denoted as $latex X \in \mathbb{R}^n$ and a response variable (label) $latex y$ whose underlying random variable is $latex Y$. If we want to fit an Ordinary Least Squares (OLS) regression such that $latex y=WX+\epsilon$, where $latex \epsilon$ is an error term, then we make the following assumptions:
- strict exogeneity: $latex E(\epsilon|X)=0$ (the error is not correlated with the data itself, i.e., the error comes from outside the system)
- spherical errors: $latex Var(\epsilon|X)=\sigma^2 I_n$ (the error variance is the same for every observation, with no correlation between observations)
- … More on Wikipedia
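To keep this baseline concrete, here is a minimal sketch of an OLS fit on simulated data (Python/NumPy; the data and variable names are made up for illustration):

```python
import numpy as np

# Simulate y = X w + eps with spherical errors and recover w by least squares.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
w_true = np.array([1.5, -2.0])
eps = rng.normal(scale=0.5, size=n)   # same variance for every observation
y = X @ w_true + eps

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)                          # close to [1.5, -2.0]
```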
If we think $latex Y$, rather than growing linearly in $latex X$, actually grows exponentially, then we have $latex E[\log(Y)|X] = WX$. We can treat $latex y' = \log(y)$ as a new response and fit an OLS regression on it. However, the problem is that when $latex WX$ is smaller than 0, the predicted $latex \hat{y}'$ is also smaller than 0, which makes no sense given it is supposed to be the log of count data (and $latex \log(0)$ is undefined in the first place, so zero counts cannot even be transformed).
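To see this concretely, here is a sketch (assuming strictly positive simulated responses so the log is defined) where OLS on $latex \log(y)$ happily predicts negative values wherever $latex WX < 0$:

```python
import numpy as np

# OLS on log-transformed y: predictions turn negative when X w < 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1))
y = np.exp(1.0 * X[:, 0]) * rng.lognormal(sigma=0.2, size=300)  # strictly positive

w_hat, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

x_new = np.array([[-2.0]])   # a point where w_hat @ x < 0
print(x_new @ w_hat)         # predicted log(y) is negative here
```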
Although we can ignore this in practice if the fit is already good, the alternative is to fit a Poisson regression: $latex Y|X \sim \text{Poisson}(e^{WX})$, i.e.,
$latex \lambda := E(Y|X;W)=e^{WX} \\ p(y|X;W)=\frac{\lambda^y}{y!}e^{-\lambda}= \frac{e^{yWX} \cdot e^{-e^{WX}}}{y!} &s=2$
We can learn $latex W$ by optimizing it to maximize the overall likelihood of the training data.
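Concretely, taking logs of the density above, the objective over $latex m$ training examples $latex (X_i, y_i)$ is

$latex \ell(W) = \sum_{i=1}^{m} \left( y_i WX_i - e^{WX_i} - \log(y_i!) \right) &s=2$

Here is a minimal sketch of such a fit using statsmodels, which maximizes this likelihood internally (the simulated data and names are my own):

```python
import numpy as np
import statsmodels.api as sm

# Simulate y ~ Poisson(exp(X w)) and recover w by maximum likelihood.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
w_true = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ w_true))

result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)                  # estimates close to [0.5, -0.3]
```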
The advantage of using the Poisson distribution/regression is that it automatically deals with the non-negativity property: no matter how small $latex WX$ is, the predicted mean $latex e^{WX}$ is always positive. This can be illustrated through the following plot from Wikipedia:
(Read the y-axis title as $latex p(y|X;W)$.) We find that if $latex \lambda$ is small (e.g., close to 1), the probability mass of $latex p(y|X;W)$ is concentrated near 0 while remaining strictly positive. When $latex \lambda$ is close to 4, $latex p(y|X;W)$ is a skewed peak. As $latex \lambda$ becomes larger, the shape of the distribution of $latex p(y|X;W)$ approaches that of a normal distribution.
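Since the original figure lives on Wikipedia, here is a short sketch that reproduces it with SciPy for a few representative values of $latex \lambda$:

```python
import numpy as np
from scipy.stats import poisson
import matplotlib.pyplot as plt

# Poisson PMF for several rates: concentrated near 0, skewed, then near-normal.
ks = np.arange(0, 21)
for lam in (1, 4, 10):
    plt.plot(ks, poisson.pmf(ks, lam), marker="o", label=f"$\\lambda = {lam}$")
plt.xlabel("y")
plt.ylabel("$p(y|X;W)$")
plt.legend()
plt.show()
```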
The strong assumption brought by Poisson regression is that $latex E(Y|X;W)=Var(Y|X;W)$. If this assumption does not hold, you may get a poor fit. For example, there is a phenomenon called overdispersion: $latex Var(Y|X;W)>E(Y|X;W)$, typically because the data contain more extreme values than the model can account for. People have applied various models, such as the negative binomial model or the zero-inflated model, to fit the data when Poisson regression does not suit. See: http://www.ats.ucla.edu/stat/stata/seminars/count_presentation/count.htm
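A quick diagnostic sketch (assuming the mean model itself is fine): the Pearson chi-square statistic divided by the residual degrees of freedom should be near 1 for a well-fitting Poisson model, and well above 1 under overdispersion:

```python
import numpy as np
import statsmodels.api as sm

# Simulate overdispersed counts (Var = mu + mu^2/2) but fit a Poisson model.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 1))
mu = np.exp(0.8 * X[:, 0])
y = rng.negative_binomial(n=2, p=2 / (2 + mu))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.pearson_chi2 / fit.df_resid)   # well above 1 => overdispersion
```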
Now, let’s look at one of these models, the negative binomial model. In this model, the log of the conditional mean of $latex Y$ is accounted for by both $latex WX$ and an unobserved error term $latex \epsilon$:
$latex E[Y|X, \epsilon;W] = e^{WX+\epsilon}= e^{WX} \cdot e^{\epsilon} = \mu \cdot \tau \\ p(y|X,\tau;W)= \frac{e^{-\mu\tau} \cdot (\mu\tau)^y}{y!}&s=2$
If we assume $latex \tau$ follows a gamma distribution, i.e., $latex \tau \sim \text{Gamma}(\theta, \theta)$ (so that $latex E[\tau]=1$ and $latex Var[\tau]=1/\theta$), then:
$latex p(y|X;W) = \int_{0}^{\infty} p(y|X, \tau;W) p(\tau;\theta) d\tau \\= \frac{\Gamma(y+\theta)}{y!\Gamma(\theta)}(\frac{\theta}{\theta+\mu})^\theta (\frac{\mu}{\theta+\mu})^y &s=2$
Integrating out $latex \tau$ results in a negative binomial distribution, with:
$latex E[Y|X] = \mu \\ Var[Y|X] = \mu + \frac{\mu^2}{\theta} &s=2$
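These two moments can be verified with the laws of total expectation and total variance, using $latex E[\tau]=1$ and $latex Var[\tau]=1/\theta$ from the gamma assumption:

$latex E[Y|X] = E_\tau[E[Y|X,\tau]] = E_\tau[\mu\tau] = \mu \\ Var[Y|X] = E_\tau[Var(Y|X,\tau)] + Var_\tau(E[Y|X,\tau]) = \mu + \mu^2 Var[\tau] = \mu + \frac{\mu^2}{\theta} &s=2$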
Therefore, the negative binomial model bypasses the restriction that $latex E[Y|X] = Var[Y|X]$ by introducing a new parameter $latex \theta$, which makes the model more flexible when fitting the data.
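As a sketch, statsmodels can fit this model directly. Note that its `NegativeBinomial` model is parameterized by $latex \alpha = 1/\theta$ (so $latex Var = \mu + \alpha\mu^2$) and estimates $latex \alpha$ jointly with $latex W$:

```python
import numpy as np
import statsmodels.api as sm

# Fit a negative binomial regression to overdispersed counts (theta = 2).
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 1))
mu = np.exp(0.8 * X[:, 0])
y = rng.negative_binomial(n=2, p=2 / (2 + mu))

nb_fit = sm.NegativeBinomial(y, X).fit(disp=0)
print(nb_fit.params)   # [w, alpha]; alpha should be near 1/theta = 0.5
```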
Although Poisson regression originally assumes $latex y$ takes non-negative integer values (that’s why we often hear that Poisson regression is for count data), there are techniques (e.g., quasi-likelihood estimation) that allow $latex y$ to take positive real values. See http://stats.stackexchange.com/questions/70054/how-is-it-possible-that-poisson-glm-accepts-non-integer-numbers
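For instance, here is a sketch relying on the fact that the GLM fit only uses the mean-variance relationship, not the integer-valued PMF itself (so statsmodels accepts positive real responses for the Poisson family):

```python
import numpy as np
import statsmodels.api as sm

# Poisson GLM on positive real-valued y: IRLS only needs E[y|x] = exp(x w).
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 1))
y = np.exp(0.7 * X[:, 0]) * rng.lognormal(sigma=0.1, size=300)  # positive reals

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)    # close to 0.7
```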
References:
http://stats.stackexchange.com/questions/3024/why-is-poisson-regression-used-for-count-data
http://maider.blog.sohu.com/304621504.html
http://www.karlin.mff.cuni.cz/~pesta/NMFM404/NB.html