Leetcode 141: Linked List Cycle


 

Given a linked list, determine if it has a cycle in it.

Follow up:
Can you solve it without using extra space?

https://leetcode.com/problems/linked-list-cycle/

 

Naive: O(N) space, O(N) time

# Definition for singly-linked list.
# class ListNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.next = None

class Solution(object):
    def hasCycle(self, head):
        """
        :type head: ListNode
        :rtype: bool
        """
        if not head or not head.next:
            return False
            
        dic = {id(head): True}
        
        while head.next is not None:
            head = head.next
            if id(head) in dic:
                return True
            dic[id(head)] = True
        return False
        

 

Use two pointers, one fast (advancing two nodes per step) and one slow (advancing one node per step). If there is a cycle, once both pointers are inside it the fast pointer gains one node on the slow pointer at every step, so the gap between them keeps shrinking and the two must eventually meet.

# Definition for singly-linked list.
# class ListNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.next = None

class Solution(object):
    def hasCycle(self, head):
        """
        :type head: ListNode
        :rtype: bool
        """
        if not head or not head.next:
            return False
        
        slow_p = fast_p = head
        while fast_p and fast_p.next:
            slow_p, fast_p = slow_p.next, fast_p.next.next
            if slow_p is fast_p:
                return True
        return False
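
A quick sanity check, assuming the commented-out `ListNode` definition above is available (the node values here are arbitrary):

a, b, c = ListNode(1), ListNode(2), ListNode(3)
a.next, b.next, c.next = b, c, b   # c points back to b, forming a cycle

print(Solution().hasCycle(a))      # True
c.next = None                      # break the cycle
print(Solution().hasCycle(a))      # False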
            

 

 

Leetcode 282: Expression Add Operators


https://leetcode.com/problems/expression-add-operators/

Given a string that contains only digits 0-9 and a target value, return all possibilities to add operators +, -, or * between the digits so they evaluate to the target value.

Examples:

"123", 6 -> ["1+2+3", "1*2*3"] 
"232", 8 -> ["2*3+2", "2+3*2"]
"00", 0 -> ["0+0", "0-0", "0*0"]
"3456237490", 9191 -> []

 

Code:

class Solution(object):
    def addOperators(self, num, target):
        """
        :type num: str
        :type target: int
        :rtype: List[str]
        """
        def solve(target, pos, negate, prod):
            # Return all expressions over num[pos:] that evaluate to target,
            # where the first number is scaled by negate * prod (the pending
            # sign and accumulated product from the operators before it).
            expr = []

            for i in xrange(pos, len(num)):
                # numbers other than "0" itself must not start with "0"
                if i != pos and num[pos] == "0":
                    break

                if i == len(num) - 1:
                    # last digit: the remaining number must close out the target
                    if negate * prod * int(num[pos:i+1]) == target:
                        expr.extend([num[pos:i+1]])
                    break
                
                add_expr = solve(target - prod * negate * long(num[pos:i+1]), i+1, 1, 1)
                expr.extend([num[pos:i+1] + "+" + e for e in add_expr])

                sub_expr = solve(target - prod * negate * long(num[pos:i+1]), i+1, -1, 1)     
                expr.extend([num[pos:i+1] + "-" + e for e in sub_expr])

                mul_expr = solve(target, i+1, 1, prod * negate * long(num[pos:i+1]))
                expr.extend([num[pos:i+1] + "*" + e for e in mul_expr])

            return expr
        
        return solve(target, 0, 1, 1)
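
Trying the examples from the problem statement (Python 2, since the solution uses `xrange` and `long`); the results are sorted here because the solution's output order depends on its search order:

sol = Solution()
print(sorted(sol.addOperators("123", 6)))        # ['1*2*3', '1+2+3']
print(sorted(sol.addOperators("232", 8)))        # ['2*3+2', '2+3*2']
print(sol.addOperators("3456237490", 9191))      # []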

 

Idea:

Overall we use recursion (DFS) to solve this problem. 

We need to determine where to put the first operator. An operator can be inserted between any two digits, so output expressions can contain multi-digit numbers such as `12+123*0`. That is why we need the loop `for i in xrange(pos, len(num)):` to try out every possible position for the first operator.

We also need to avoid putting invalid numbers into the expression, namely multi-digit numbers that start with "0". That is handled by the `if i != pos and num[pos] == "0": break` check.

The main part is the `solve` function. It tracks the current position we are looking at (`pos`), whether the previous operator was a minus (`negate`), and the accumulated product of the preceding numbers when the previous operator was a multiplication (`prod`). The process tries each operator and number split in turn, evaluating the candidate expression as it goes.

 

If the next operation after the first number (A) is `+`, for example:

A+B*C = target

A+B+C = target

then we can equivalently solve (B C) with target = target - A. But we must account for any minus or multiplication operators before A: its actual contribution is prod * negate * A, so we reframe target = target - prod * negate * A.

 

If the next operation after the first number (A) is `-`, for example,

A – B*C = target

A – B+C = target

then we can equivalently solve (-B C) with target = target - A. Here we negate B by passing negate = -1 into the `solve` function.

 

If the next operation after the first number (A) is `*`, for example,

A*B*C = target

A*B+C = target

then we need to record A as the current accumulated product (`prod`).
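
As a concrete trace, take num = "232" with target 8 and the candidate expression `2*3+2`: the top-level call consumes "2" and takes the `*` branch, calling solve(8, 1, 1, 2); that call consumes "3" and takes the `+` branch, calling solve(8 - 2*1*3, 2, 1, 1) = solve(2, 2, 1, 1); the final digit "2" satisfies 1 * 1 * 2 == 2, so the full expression is emitted.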

 

 

 

How to remotely access mysql via ssh

You need to let mysql listen on a publicly reachable address (or the specific IP address you want to restrict it to). By default mysql only accepts connections from localhost.

To configure that, first locate the configuration file that mysql reads:

mysql --help --verbose | more

# You will find a paragraph like:

Default options are read from the following files in the given order:
/etc/my.cnf /etc/mysql/my.cnf /usr/etc/my.cnf ~/.my.cnf 

For example, `/etc/mysql/my.cnf` contains `bind-address = 127.0.0.1`. Change it to `bind-address = 0.0.0.0`.

 

Then, suppose you have a mysql user named `root`. Go into the mysql shell and grant the user `root` the privilege to connect from your server machine's Internet IP address:

# substitute 1.2.3.4 by the server machine's Internet IP address
GRANT ALL PRIVILEGES ON *.* TO 'root'@'1.2.3.4' IDENTIFIED BY 'PASSWORD' WITH GRANT OPTION;
FLUSH PRIVILEGES;

 

Restart mysql so the changes take effect:

sudo /etc/init.d/mysqld restart

 

Now, ssh tunnel on the machine you are using:

# 3306 is the default mysql port.
# An explanation of ssh tunnels: http://www.revsys.com/writings/quicktips/ssh-tunnel.html
ssh -f -l <usr_name> -L 3306:<server_ip_addr>:3306 <server_ip_addr> -N

 

Finally, connect to mysql shell on the machine you are using:

mysql -u root -p -h 127.0.0.1
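
With the tunnel up, you can also connect programmatically. A minimal sketch in Python, assuming the `MySQLdb` (python-mysqldb) package is installed:

import MySQLdb

# The tunnel forwards local port 3306 to the server, so connect to 127.0.0.1.
conn = MySQLdb.connect(host="127.0.0.1", port=3306, user="root", passwd="PASSWORD")
cur = conn.cursor()
cur.execute("SELECT VERSION()")
print(cur.fetchone())
conn.close()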

 

 

Enable GPU for Theano

1. Install Theano

http://deeplearning.net/software/theano/install.html

 

2. Use the following script to test that Theano works at least in CPU mode:

'''
test whether theano is using cpu or gpu
'''
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 10000

rng = numpy.random.RandomState(22)     
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r

if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'
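
When Theano runs on the GPU, the ops printed by `toposort()` are `GpuElemwise` rather than `T.Elemwise` instances, which is exactly what the final check above relies on; the loop should also run noticeably faster.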

 

3. Install CUDA:

Follow instructions here: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/#axzz3l18NxDB5

You need to first install the CUDA toolkit, then run `sudo apt-get update` followed by `sudo apt-get install cuda`.

Also need to follow the post-installation steps: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#post-installation-actions

 

4.  Add the following .theanorc file in your home directory:

[global]
floatX = float32
# the device option below enables GPU usage
device = gpu0

[nvcc]
fastmath = True
http://deeplearning.net/software/theano/library/config.html

 

5.  Retry the script in 2.
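
You can also enable the GPU for a single run without editing `.theanorc`, by setting the `THEANO_FLAGS` environment variable, e.g. `THEANO_FLAGS=device=gpu0,floatX=float32 python test_script.py`, where `test_script.py` stands for the script from step 2.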

You almost always get significant p-values in large datasets

Recently, a much-discussed reproducibility study [1] showed that more than half of the findings published in major psychology journals could not be replicated with statistically significant results.

This made me dig a little further into how statistical tests should be conducted. Let me show you a little experiment in `R` first.

#
remove(list=ls())
set.seed(19910715)
n1 = 1000000
n2 = 1000000
mean1 = 0.01
mean2 = 0

grp1 <- rnorm(n1, mean=mean1, sd=1)
grp2 <- rnorm(n2, mean=mean2, sd=1)
t_test <- t.test(grp1, grp2)

> t_test

    Welch Two Sample t-test

data:  grp1 and grp2
t = 5.8024, df = 2e+06, p-value = 6.538e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.005435609 0.010980841
sample estimates:
  mean of x   mean of y 
0.009316992 0.001108767 

What I did above was generate two groups of samples from normal distributions whose means differ by only 0.01. Then I ran a two-sided t-test of whether the means of the two samples differ. The p-value, 6.54e-09, indicates drastic significance.

True, the groups are indeed different, since they come from two normal distributions with close but distinct means. However, in the imperfect real world, almost any two groups of samples will differ at least slightly in their means, so with enough data it is very easy to obtain a significant result. That is why reporting the p-value by itself is not enough.
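
What is missing is a measure of how large the difference actually is, i.e., the effect size. For two groups, Cohen's d is simply the difference of the sample means divided by the pooled standard deviation: $latex d = (\bar{x}_1 - \bar{x}_2) / s_{pooled}$. Both groups here have standard deviation 1, so d is essentially the raw mean difference, which you can verify against the `pes` output below.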

According to [2], you should always report an effect size together with the p-value. You can use `pes` from the `compute.es` package to calculate Cohen's d, one type of effect size (also called the standardized mean difference):

library(compute.es)
pes_ <- pes(p=t_test$p.value, n.1=n1, n.2=n2, dig = 10, tail="two")

> pes_
Mean Differences ES: 
 
 d [ 95 %CI] = 0.008205837 [ 0.005434016 , 0.01097766 ]     # This is Cohen's d
  var(d) = 2e-06 
  p-value(d) = 6.5e-09 
  U3(d) = 50.32736 % 
  CLES(d) = 50.23148 % 
  Cliff's Delta = 0.004629622 
 
 g [ 95 %CI] = 0.008205834 [ 0.005434014 , 0.01097765 ] 
  var(g) = 2e-06 
  p-value(g) = 6.5e-09 
  U3(g) = 50.32736 % 
  CLES(g) = 50.23148 % 
 
 Correlation ES: 
 
 r [ 95 %CI] = 0.004102884 [ 0.002716995 , 0.005488758 ] 
  var(r) = 5e-07 
  p-value(r) = 6.5e-09 
 
 z [ 95 %CI] = 0.004102907 [ 0.002717001 , 0.005488813 ] 
  var(z) = 5e-07 
  p-value(z) = 6.5e-09 
 
 Odds Ratio ES: 
 
 OR [ 95 %CI] = 1.014995 [ 1.009905 , 1.020111 ] 
  p-value(OR) = 6.5e-09 
 
 Log OR [ 95 %CI] = 0.01488374 [ 0.009856215 , 0.01991127 ] 
  var(lOR) = 6.5798e-06 
  p-value(Log OR) = 6.5e-09 
 
 Other: 
 
 NNT = 433.793 
 Total N = 2e+06

The result turns out to be a very small effect size (only 0.0082), as you can see by comparing it against the conventional thresholds for Cohen's d:

Conventional thresholds for Cohen's d: small = 0.2, medium = 0.5, large = 0.8.

The power of the test is determined once you specify the sample size, significance level and effect size ([3], [5]). So I used the R package `pwr` to calculate the power, at the conventional 0.05 significance level.

# 
library(pwr)
pwr.t.test(n=n1, d=pes_$d, sig.level = 0.05, type="two.sample", alternative="two.sided")

     Two-sample t test power calculation 

              n = 1e+06
              d = 0.008205837
      sig.level = 0.05
          power = 0.9999391            # What it returns
    alternative = two.sided

NOTE: n is number in *each* group

The power is 0.9999391, which means you are almost certain not to miss a true difference. Although our effect size is tiny, we have a huge sample size `n`. That is why the power is so high.

 

Some observations:

1. Keeping the other parameters the same, decreasing `sig.level` causes the power to decrease. Intuitively, if you pay a high price to avoid Type I error, you inevitably introduce some additional probability of missing true effects (Type II error).

2. Keeping the other parameters the same, decreasing `n` causes the power to decrease. Conversely, given enough data you will detect essentially any true effect.

3. Keeping the other parameters the same, decreasing `d` causes the power to decrease. Intuitively, a very large effect size makes a true effect much easier to detect.
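
To make observation 2 concrete: keeping d = 0.2 and sig.level = 0.05 but shrinking n to 100 per group drops the power to roughly 0.29, which you can check with `pwr.t.test(n=100, d=0.2, sig.level=0.05, type="two.sample")`.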

 

The `pwr` package can also help you determine the effect size you want, given `n`, `sig.level` and `power`:

pwr.t.test(n=n1, sig.level=0.05, power=0.9999391, type="two.sample", alternative="two.sided")

     Two-sample t test power calculation 

              n = 1e+06
              d = 0.008220784             # What it returns
      sig.level = 0.05
          power = 0.9999391
    alternative = two.sided

NOTE: n is number in *each* group

You can also use `pwr.t.test` to determine sample sizes. For example, if you want a significance level of 0.05, a power of 0.8 and a small effect size (0.2), then you will need at least 394 samples in each group (393.4 rounded up).

> pwr.t.test(d=0.2, sig.level=0.05, power=0.8, type="two.sample", alternative="two.sided")

Two-sample t test power calculation 

              n = 393.4057
              d = 0.2
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

The `pwr` package also has a function that tells you the conventional values for the different scales of effect size:

> cohen.ES(test="t", size=c("small", "medium", "large"))

     Conventional effect size from Cohen (1982) 

           test = t
           size = small
    effect.size = 0.2
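
Calling `cohen.ES` with `size="medium"` or `size="large"` returns 0.5 and 0.8 respectively, matching the conventional thresholds listed earlier.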

 

References

[1] Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

[2] Sullivan, G. M., & Feinn, R. (2012). Using effect size-or why the P value is not enough. Journal of graduate medical education, 4(3), 279-282.

[3] http://www.statmethods.net/stats/power.html

[4] http://www.ats.ucla.edu/stat/r/dae/t_test_power3.htm

[5] http://www.quora.com/Why-dont-people-care-much-about-power-1-Type-II-error-of-a-hypothesis-test

How to write a multi-lined subscript in LaTeX?

In LaTeX, what is the best way to write a multi-lined formula in a subscript? Use `\atop`!

For example,

\begin{align*}
P(M_n=1) =\sigma(&w_s(\sum\limits_{j_r \in T_{nr}}\vec{U_{j_r}} \cdot \vec{C_{j_r}}  - \sum\limits_{j_b \in T_{nb}}\vec{U_{j_b}} \cdot \vec{C_{j_b}} ) + \\ &w_b(\sum\limits_{j_{r_1}, j_{r_2} \in T_{nr} \atop j_{r_1} \neq j_{r_2} }\left\Vert\vec{C_{j_{r_1}}} - \vec{C_{j_{r_2}}}\right\Vert^2 - \sum\limits_{j_{b_1}, j_{b_2} \in T_{nb} \atop j_{b_1} \neq j_{b_2} }\left\Vert\vec{C_{j_{b_1}}} - \vec{C_{j_{b_2}}} \right\Vert^2) + b)
\end{align*}

will result in:

[Rendered output of the formula above]
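
With the amsmath package, `\substack` achieves the same multi-line subscript and is generally preferred over the plain-TeX `\atop`; a minimal sketch of one of the sums above:

\sum\limits_{\substack{j_{r_1}, j_{r_2} \in T_{nr} \\ j_{r_1} \neq j_{r_2}}}\left\Vert\vec{C_{j_{r_1}}} - \vec{C_{j_{r_2}}}\right\Vert^2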

How to write multi-lined formula?

In LaTeX, what is the best way to write a multi-lined formula? The `\begin{align*}...\end{align*}` block supports `\\` line breaks inside it.

For example,

\begin{align*}
P(M_n=1) =\sigma(&w_s(\sum\limits_{j_r \in T_{nr}}\vec{U_{j_r}} \cdot \vec{C_{j_r}}  - \sum\limits_{j_b \in T_{nb}}\vec{U_{j_b}} \cdot \vec{C_{j_b}} ) + \\ &w_b(\sum\limits_{j_{r_1}, j_{r_2} \in T_{nr} \atop j_{r_1} \neq j_{r_2} }\left\Vert\vec{C_{j_{r_1}}} - \vec{C_{j_{r_2}}}\right\Vert^2 - \sum\limits_{j_{b_1}, j_{b_2} \in T_{nb} \atop j_{b_1} \neq j_{b_2} }\left\Vert\vec{C_{j_{b_1}}} - \vec{C_{j_{b_2}}} \right\Vert^2) + b)
\end{align*}

will result in:

[Rendered output of the formula above]

Minimum Description Length (MDL): a scoring function to learn Bayesian Network Structure

Bayesian Network Augmented Naive-Bayes (BAN) is an improved version of Naive-Bayes, in which every attribute $latex X_i$ may have at most one other attribute as a parent in addition to the class variable $latex C$. For example (figure from [1]):

[Figure: an example BAN structure (from [1])]

Though BAN provides a more flexible structure than Naive Bayes, the structure (i.e., the dependence/independence relationships between attributes) needs to be learnt. [2] proposed a structure-learning method based on a scoring function called Minimum Description Length (MDL). Basically, given a training data set $latex D$, you start from a random network structure $latex g_1$ and calculate its $latex MDL(g_1|D)$. We set the current best network structure $latex g^* = g_1$. Then for any other possible structure $latex g_i$, $latex g_i$ becomes the new $latex g^*$ if $latex MDL(g_i|D) \leq MDL(g^*|D)$. In other words, we want to find a network structure that minimizes $latex MDL(g|D)$.
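
The search procedure just described is easy to sketch in code. A minimal illustration in Python, where `score_mdl(g, D)` is a hypothetical helper computing $latex MDL(g|D)$ and `candidates` is an iterable of alternative structures:

def learn_structure(g1, D, candidates, score_mdl):
    # keep the lowest-MDL structure seen, starting from the initial guess g1
    best, best_score = g1, score_mdl(g1, D)
    for g in candidates:
        s = score_mdl(g, D)
        if s <= best_score:
            best, best_score = g, s
    return best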

 

Now let’s look at the exact formula of $latex MDL(g|D)$:

$latex MDL(g|D) = \frac{\log N}{2}|g| - LogLikelihood(g|D)&s=2$

 

The second term of $latex MDL(g|D)$, the negative log likelihood of $latex g$ given the data $latex D$, is easy to understand: we would like the likelihood of the structure $latex g$ given $latex D$ to be as large as possible, and since we are minimizing $latex MDL(g|D)$ we write it in negative log likelihood form. However, the negative log likelihood alone is not enough as a scoring function, since it strongly favors overly sophisticated structures (a.k.a. overfitting). That is why the first term $latex \frac{\log N}{2}|g|$ comes into play to regularize the overall structure:

$latex |g|$ is the number of parameters in the network, $latex N$ is the size of the dataset, and $latex \frac{\log N}{2}$ is the length in bits needed to encode each parameter. So $latex \frac{\log N}{2}|g|$ penalizes networks with many parameters. Taking the network in the figure above as an example, suppose attribute $latex x_1$ has two possible values, i.e., $latex x_1 = \{a_1, a_2\}$. Similarly, $latex x_2 = \{b_1, b_2, b_3\}$, $latex x_3 = \{c_1,c_2\}$, $latex x_4 = \{d_1, d_2, d_3, d_4\}$, and $latex C=\{o_1, o_2\}$. Then the attributes $latex x_1, x_2, x_3, x_4$ can take $latex 2\times 3 \times 2 \times 4=48$ combinations of values. For each combination, one parameter represents the joint probability $latex P(x_1, x_2, x_3, x_4, C=o_1)$, and one minus it gives the joint probability $latex P(x_1, x_2, x_3, x_4, C=o_2)$. Hence $latex |g| = (|C|-1) \cdot (|x_1| |x_2| |x_3| |x_4|) = 48$. Why can we use $latex \frac{\log N}{2}$ bits to encode each parameter? Please refer to section 2 in [3] and my reading note on it at the end of this post.

 

Back in [2], you will find that the negative log likelihood function in $latex MDL$ is replaced by a mutual information form. That transformation is equivalent, as proved in [4].

 

My reading note on Section 2 in [3]

Up to equation (3), the author is just describing the context, which is not too hard to understand. Equation (3) basically says there are $latex k(g)$ parameters in the network, which is exactly what I explained two paragraphs earlier.

Then, the author gave equation (4) (5) (6):

[Equations (4), (5) and (6) from [3]]

 

Here, $latex \theta$ is a vector of length $latex k(g)$. Each component of the vector $latex \theta$ is denoted $latex \theta[q,s,g]$, which refers to the conditional probability of observing $latex q \in A_{N+1}$ and $latex s \in S(g)$ given a model $latex g$. For a given network $latex g$, $latex \sum_{q}\sum_{s}\theta[q,s,g] =1$.

 

Then, $latex w(\theta)$ is the prior distribution over the vector $latex \theta$. The author supposes it is a Dirichlet distribution, so $latex w(\theta)$ equals:

[The Dirichlet density for $latex w(\theta)$, from [3]]

 

For any particular value of $latex \theta$, the posterior probability is $latex w(\theta)P_\theta(y^n|x^n) &s=2$. The true marginal probability $latex P(y^n|x^n)$ is obtained by integrating over all possible values of $latex \theta$:

[The marginal likelihood integral over $latex \theta$, from [3]]

 

Then the author says the description length of $latex y^n|x^n$ is just its negative log, as in equation (4). I don't quite get this step. What I can think of is that the negative log likelihood is attractive for being monotonically decreasing: the higher the probability $latex P(y^n|x^n)$, the more desirable the structure. Since we want to minimize MDL anyway, why not define it via this negative log form?

 

Then the author claimed that:

[The asymptotic approximation from [3] that yields a $latex \frac{\log N}{2}$ per-parameter code length]

This is why we can use $latex \frac{\log N}{2}$ bits to encode each parameter.

 

Reference

[1] Cheng, J., & Greiner, R. (1999, July). Comparing Bayesian network classifiers. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence (pp. 101-108). Morgan Kaufmann Publishers Inc.

[2] Hamine, V., & Helman, P. (2005). Learning optimal augmented Bayes networks. arXiv preprint cs/0509055.

[3] Suzuki, J. (1999). Learning Bayesian belief networks based on the minimum description length principle: basic properties. IEICE transactions on fundamentals of electronics, communications and computer sciences, 82(10), 2237-2245.

[4] Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine learning, 29(2-3), 131-163.

Other references, though not used in the post, are useful in my opinion:

[5] De Campos, L. M. (2006). A scoring function for learning bayesian networks based on mutual information and conditional independence tests. The Journal of Machine Learning Research, 7, 2149-2187.

[6] Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14(5), 465-471.

[7] Liu, Z., Malone, B., & Yuan, C. (2012). Empirical evaluation of scoring functions for Bayesian network model selection. BMC bioinformatics, 13(Suppl 15), S14.

A viewer for Python Dict

Just found a beautiful viewer library that helps you view Python dict objects.

Here is the code:

"""
Created on Fri Feb 22 12:52:28 2013

@author: kranzth
"""
from traits.api \
    import HasTraits, Instance, Str, on_trait_change

from traitsui.api \
    import View, VGroup, Item, ValueEditor, TextEditor

from copy import deepcopy

class DictEditor(HasTraits):
    SearchTerm = Str()
    Object = Instance( object )

    def __init__(self, obj, **traits):
        super(DictEditor, self).__init__(**traits)
        self._original_object = obj
        self.Object = self._filter(obj)

    def trait_view(self, name=None, view_elements=None):
        return View(
          VGroup(
            Item( 'SearchTerm',
                  label      = 'Search:',
                  id         = 'search',
                  editor     = TextEditor(),
                  #style      = 'custom',
                  dock       = 'horizontal',
                  show_label = True
            ),
            Item( 'Object',
                  label      = 'Debug',
                  id         = 'debug',
                  editor     = ValueEditor(),
                  style      = 'custom',
                  dock       = 'horizontal',
                  show_label = False
            ),
          ),
          title     = 'Dictionary Editor',
          width     = 800,
          height    = 600,
          resizable = True,
        )

    @on_trait_change("SearchTerm")
    def search(self):
        self.Object = self._filter(self._original_object, self.SearchTerm)

    def _filter(self, object_, search_term=None):
        def has_matching_leaf(obj):
            if isinstance(obj, list):
                return any(
                        map(has_matching_leaf, obj))
            if isinstance(obj, dict):
                return any(
                        map(has_matching_leaf, obj.values()))
            else:
                try:
                    # a leaf matches if its string form equals the search term
                    return str(obj) == search_term
                except ValueError:
                    return False

        obj = deepcopy(object_)
        if search_term is None:
            return obj

        if isinstance(obj, dict):
            for k in obj.keys():
                if not has_matching_leaf(obj[k]):
                    del obj[k]

            for k in obj.keys():
                if isinstance(obj[k], dict):
                    obj[k] = self._filter(obj[k], search_term)
                elif isinstance(obj[k], list):
                    obj[k] = filter(has_matching_leaf, obj[k])

        return obj



def build_sample_data():
    def make_one_level_dict():
        return dict(zip(range(100),
                        range(100,150) + map(str,range(150,200))))

    my_data = make_one_level_dict()
    my_data[11] = make_one_level_dict()
    my_data[11][11] = make_one_level_dict()
    return my_data

# Test
if __name__ == '__main__':
    my_data = {"map":"dafda", "23423":"da324"}
    b = DictEditor(my_data)
    b.configure_traits()

 

You need to install `traitsui` in advance.

sudo apt-get install python-traitsui

 

A sample of the outcome (not bad, huh?):

[Screenshot: the resulting Dictionary Editor window]

How to run a PPTP VPN server?

Setting up a VPN with the PPTP protocol is known to be weak security-wise. Yet it is an easy and quick way to set up a virtual private network (VPN).

The procedure to set up a PPTP VPN server can be found here:  https://help.ubuntu.com/community/PPTPServer

The way to connect to a PPTP server on a client machine can be found here: https://my.hostvpn.com/knowledgebase/19/Setup-a-PPTP-VPN-Connection-in-Linux-Ubuntu.html