Life-saving tips for PyDev

A year ago I wrote a very useful post about 20 life-saving tips in Eclipse: http://maider.blog.sohu.com/281903297.html. Those tips focus more on efficiency during Java development. This post collects several tips that are particularly useful for Python developers. I will keep updating it whenever I find something useful.

1. Press `F2` to run a single line in “.py” files

This is useful when you are editing a Python source file and just want to tentatively try running one line.

2. You often need to switch between the Editor View and the Console View. `Ctrl + F7` lets you switch between Views.

 

How do gradients vanish in a multi-layer neural network?

Background

This post reviews how we update weights using backpropagation in a neural network. The goal of the review is to illustrate a notorious phenomenon in training multi-layer neural networks (MLNNs), called the “vanishing gradient”.

Start

Let’s suppose that we have a very simple NN structure, with only one unit in each hidden layer, the input layer, and the output layer. Each unit uses the sigmoid function as its activation function:

[Figure: a chain network with one unit per layer, $latex x_i \to h_1 \to h_2 \to h_3 \to o$, connected by weights $latex w_0, w_1, w_2, w_3$]

Also suppose we have training data: $latex (x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$. Therefore, we have $latex h_1 = \frac{1}{1+e^{-w_0x_i}}$, $latex h_2 = \frac{1}{1+e^{-w_1h_1}}$, $latex h_3 = \frac{1}{1+e^{-w_2h_2}} $, $latex o = \frac{1}{1+e^{-w_3h_3}}$

Our cost function, which we want to minimize, is as follows:

$latex C=\sum\limits_{i=1}^n \left\Vert y_i - o_i\right\Vert^2$

 

If we are going to update $latex w_0$ after a round of inputting all the training data, we use the following rule (with $latex \epsilon$ as the learning rate):

$latex w_0 \leftarrow w_0 - \epsilon\nabla_{w_0}C$.

By the chain rule of derivatives, we equivalently have:

$latex \nabla_{w_0}C = \sum\limits_{i=1}^n\frac{\partial C}{\partial o} \cdot \frac{\partial o}{\partial h_3} \cdot \frac{\partial h_3}{\partial h_2} \cdot \frac{\partial h_2}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_0}$

$latex =\sum\limits_{i=1}^n -2(y_i - o_i) \cdot \delta'(w_3h_3) \cdot w_3 \cdot \delta'(w_2h_2) \cdot w_2 \cdot \delta'(w_1h_1) \cdot w_1 \cdot \delta'(w_0x_i) \cdot x_i$

$latex =\sum\limits_{i=1}^n -2(y_i - o_i)\cdot w_3 w_2 w_1 x_i \cdot \delta'(w_3h_3) \cdot\delta'(w_2h_2) \cdot\delta'(w_1h_1)\cdot \delta'(w_0x_i)$

where $latex \delta'(t) = \left(\frac{1}{1+e^{-t}}\right)' = \delta(t)\left(1-\delta(t)\right)$ is the derivative of the sigmoid function $latex \delta(t)=\frac{1}{1+e^{-t}}$.

 

We should be very familiar with the shape of the sigmoid function as well as that of its derivative.

[Figure: plots of the sigmoid function and its derivative; the derivative peaks at x = 0]

From the two plots above, we know that the largest value of the sigmoid derivative is 0.25, attained at x=0. Let’s go back to $latex \nabla_{w_0}C$ of the multi-layer neural network, which is the gradient with respect to the weight farthest away from the output unit. It contains a product of four sigmoid derivatives, namely $latex \delta'(w_3h_3) \cdot\delta'(w_2h_2) \cdot\delta'(w_1h_1)\cdot \delta'(w_0x_i)$, each of which cannot exceed 0.25, so the product cannot exceed $latex 0.25^4 \approx 0.0039$. As a result, $latex \nabla_{w_0}C$ becomes very tiny unless the weights are large enough to compensate.
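To see this concretely, here is a minimal Python sketch that multiplies the chain-rule factors above for a single training example. The weights (all set to 0.5) and the example $latex (x, y)$ are made up purely for illustration:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def dsigmoid(t):
    s = sigmoid(t)
    return s * (1.0 - s)  # maximum value is 0.25, at t = 0

# Hypothetical weights w0..w3 and one training example (x, y).
w = [0.5, 0.5, 0.5, 0.5]
x, y = 1.0, 0.0

# Forward pass through the chain x -> h1 -> h2 -> h3 -> o.
h = [x]
for wi in w:
    h.append(sigmoid(wi * h[-1]))
o = h[-1]

# Gradient of C = (y - o)^2 with respect to w0, by the chain rule.
grad_w0 = -2.0 * (y - o)                 # dC/do
for i in reversed(range(1, len(w))):     # do/dh3 * dh3/dh2 * dh2/dh1
    grad_w0 *= dsigmoid(w[i] * h[i]) * w[i]
grad_w0 *= dsigmoid(w[0] * x) * x        # dh1/dw0

print(grad_w0)  # roughly 5e-4: four sigmoid derivatives (each <= 0.25) multiply together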

That’s how the vanishing gradient happens! It is the phenomenon whereby the weights in the early layers of a multi-layer neural network cannot be updated effectively.

 

Reference

http://neuralnetworksanddeeplearning.com/chap5.html

http://www-labs.iro.umontreal.ca/~bengioy/dlbook/mlp.html

How to view all columns of a data.frame in R

If you want to view a data.frame with a large number of columns in R, you will only see the first several columns in the viewer. After reading the post referenced below, I found that calling `utils::View(your_data_frame)` opens a new window in which you can view all the columns.

Reference:

http://stackoverflow.com/questions/19341853/r-view-does-not-display-all-columns-of-data-frame

Logical deduction in NP-completeness proof

Maybe this sounds simple to computer scientists, but I just want to back up some of my own logical deductions about NP-completeness proofs in case I deal with such proofs in the future.

In an NP-completeness proof, we always want to find a polynomial-time reduction (transformation) from problem A to problem B, denoted as:

$latex A \le_p B$

We say that there is an algorithm which, for each instance $latex I$ of problem A, constructs an instance of problem B in time polynomial with respect to the input size.

If $latex B \in P$, then $latex A \in P$. Think about it by contradiction: if $latex A$ is an NP-complete problem, a problem as hard as the “hardest problems” in the world, and suddenly we have a polynomial-time transformation from it to a problem $latex B$ which is in P, then why not apply the transformation and then solve $latex B$, both in polynomial time, every time we are asked to solve $latex A$?

However, it is easy (and wrong) to think that if $latex B \in \text{NP-complete}$ then $latex A \in \text{NP-complete}$. It is wrong because I can transform even a problem $latex A$ that is in P, at my will, to a problem as tricky as the NP-complete ones. So having $latex B \in \text{NP-complete}$ and a polynomial-time transformation from $latex A$ to $latex B$ can’t guarantee $latex A$ to be anything.

If $latex A \in \text{NP-complete}$, then $latex B$ is at least as hard, i.e., NP-hard; if $latex B$ is additionally in NP, then $latex B \in \text{NP-complete}$. Initially I thought this was counter-intuitive. After all, by the definition of a polynomial-time reduction, every instance of $latex A$ is mapped to an instance of $latex B$, but not every instance of $latex B$ is assured to be mapped onto. Thinking about it in the following way makes more sense to me: at least some instances of $latex B$ are the images of instances of $latex A$, so those instances of $latex B$ are as hard as NP-complete problems. Even if all the other instances of $latex B$ could be solved in polynomial time, problem $latex B$ as a whole would still be NP-hard.
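To summarize the three cases in standard notation (writing $latex A \le_p B$ for a polynomial-time reduction from $latex A$ to $latex B$):

$latex A \le_p B \text{ and } B \in P \;\Rightarrow\; A \in P$

$latex A \le_p B \text{ and } B \in \text{NP-complete} \;\not\Rightarrow\; A \in \text{NP-complete}$

$latex A \le_p B,\; A \in \text{NP-complete} \text{ and } B \in \text{NP} \;\Rightarrow\; B \in \text{NP-complete}$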

All in all, I think all my logical deductions hold if the following diagram of complexity classes is accurate:

[Figure: the standard Euler diagram of complexity classes, with P contained in NP and NP-complete at the intersection of NP and NP-hard]

Otherwise, all the NP-completeness proofs in this world would need to be taken with a grain of salt.


Guitar Chord Music Theory

It’s a bit of a shame that I only started to learn the music theory behind guitar chords FIVE years after I picked up the guitar. In this post I will share how I learned guitar chords from the beginning.

 

1. Know the mapping from letter names to do-re-mi-fa-sol-la-si

C is do, D is re, E is mi, and so on.

 

2. Know every note on the neck of a guitar.

Ask yourself questions such as: what is the note on the 2nd fret of the 6th string? How about the note on the 12th fret of the 3rd string?

This page might be helpful: http://basicsofguitar.blogspot.com/2009/07/how-to-know-your-fretboard-well.html 

 

3. Know what a major scale is.

A major scale is a scale that starts from some note (the root), with each following note a fixed interval above the previous one.

The so-called fixed intervals are:

full, full, half, full, full, full, half

full = 2 semitones, half = 1 semitone

For example,

C  C#  D  D#  E  F  F#  G  G#  A  A#  B  C ...

   D♭      E♭        G♭     A♭     B♭

In the first line, C, C#, D, D#, ….., A, A#, B make up a chromatic scale. (A chromatic scale, by definition, contains 12 notes with a semitone between consecutive notes.) An octave is the range from one C to the next C, which spans 12 semitones. Physically, the frequency of a note is half the frequency of the same note in the next octave (for example, A4 = 440 Hz and A5 = 880 Hz).

C, D, E, F, G, A, B are the notes of the C major scale: D is 2 semitones higher than C, E is 2 semitones higher than D, but F is only 1 semitone higher than E, and so on, which is exactly the interval pattern we stated before: full, full, half, full, full, full, half.

On the second line, we list D♭, E♭, G♭, A♭ and B♭ because D♭ is the same note as C#, E♭ is the same note as D#, G♭ is the same note as F#, and so on.
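As a quick sanity check, here is a small Python sketch (the names are my own, and it spells everything with sharps) that derives a major scale from the full/half interval pattern:

```python
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#",
             "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # full = 2 semitones, half = 1

def major_scale(root):
    i = CHROMATIC.index(root)
    notes = [root]
    for step in MAJOR_STEPS:
        i = (i + step) % 12      # walk up the chromatic scale
        notes.append(CHROMATIC[i])
    return notes                 # 8 notes, ending an octave above the root

print(major_scale("C"))  # ['C', 'D', 'E', 'F', 'G', 'A', 'B', 'C']
print(major_scale("G"))  # ['G', 'A', 'B', 'C', 'D', 'E', 'F#', 'G']
```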

 

Good references for learning scales:

http://basicsofguitar.blogspot.com/2010/04/how-to-figure-out-notes-for-major-scale.html

http://musictheorysite.com/major-scales/list-of-all-major-scales

 

4. I believe people remember the notes on the guitar neck in terms of the C major scale. That is, the open strings from the 1st to the 6th are called E, B, G, D, A and E. So we will just use this scale from now on.

 

5. A chord often consists of three notes; such a three-note chord is called a triad.

The first note is called the root. The first letter in a chord name denotes the root of the chord. For example, Cmaj is a major chord with root C; similarly, Amin is a minor chord with root A. The second note is called the third. The third usually determines the type of the chord (whether it is minor or major or something else). The third note is called the fifth.

A major chord has an interval of 4 semitones between the root and the third, and 3 semitones between the third and the fifth. A minor chord has an interval of 3 semitones between the root and the third, and 4 semitones between the third and the fifth.

So looking at the following scale again.

C  C#  D  D#  E  F  F#  G  G#  A  A#  B  C ...

   D♭      E♭        G♭     A♭     B♭

Cmaj is a major chord consisting of C (root), E (third) and G (fifth), because E is 4 semitones higher than C and G is 3 semitones higher than E.

Cmin is a minor chord consisting of C (root), D# (third) and G (fifth), because D# is 3 semitones higher than C and G is 4 semitones higher than D#.
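The same table lookup works for triads. Here is a short Python sketch (again my own names, sharps-only spelling) that builds major and minor triads from the semitone intervals above:

```python
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#",
             "G", "G#", "A", "A#", "B"]

def triad(root, intervals):
    i = CHROMATIC.index(root)
    return [CHROMATIC[(i + offset) % 12] for offset in (0,) + intervals]

def major_chord(root):
    return triad(root, (4, 7))  # third 4 semitones up, fifth 7 (= 4 + 3) up

def minor_chord(root):
    return triad(root, (3, 7))  # third 3 semitones up, fifth 7 (= 3 + 4) up

print(major_chord("C"))  # ['C', 'E', 'G']
print(minor_chord("C"))  # ['C', 'D#', 'G']  (D# = Eb, spelled with sharps here)
print(minor_chord("A"))  # ['A', 'C', 'E']
```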

Besides major and minor chords, there are other types of chords; please see more info below. These are the websites I referred to while learning chords.

http://www.essentialguitar.com/

http://www.fretjam.com/guitar-chord-theory.html

 

6. After learning all this, the first thing to do is to memorize the major chords rooted at different notes. Then, if you are asked for another type of chord with the same root, you can simply move your fingers accordingly from the major chord.

 

A general way to put a slash through symbols in LaTeX

You know, in LaTeX you can type

 \neq

to produce the inequality symbol:

$latex \neq$

 

Of course, you don’t want to memorize a separate LaTeX command for every slashed symbol. So here is a general way to put a slash through any symbol in LaTeX: add `\not` in front of the normal symbol.

Hence, `\not\rightsquigarrow` becomes $latex \not\rightsquigarrow$, `\not\subseteq` becomes $latex \not\subseteq$, and so on.
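For completeness, here is a minimal compilable example; note that $latex \rightsquigarrow$ requires the amssymb package:

```latex
\documentclass{article}
\usepackage{amssymb} % provides \rightsquigarrow, among other symbols
\begin{document}
$a \neq b$, $A \not\subseteq B$, $x \not\rightsquigarrow y$
\end{document}
```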

Proof that BFS finds shortest distance

Intuitively, Breadth-First Search (BFS) finds the shortest distance from a source $latex s$ to every other reachable vertex. However, proving this intuition takes a bit of hard work. After some searching, I found two ideas for proving it; both use induction, and both require a fairly long proof. (A small BFS sketch appears after the list below.)

  1. The first idea is based on CLRS, pages 597-600. The second reference below discusses the same idea.
  2. The second idea is from a PDF file from NYU (the third reference).
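For concreteness, here is a minimal Python sketch of the BFS in question (the graph and the helper name are made up for illustration); the claim proved in the references is that the recorded dist values are exactly the shortest distances:

```python
from collections import deque

def bfs_distances(adj, s):
    """adj maps each vertex to a list of neighbors.
    Returns dist[v] = number of edges on a shortest s-v path (None if unreachable)."""
    dist = {v: None for v in adj}
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:        # first time v is discovered
                dist[v] = dist[u] + 1  # this is the shortest distance, per the proof
                queue.append(v)
    return dist

# Example: the shortest a-e path has 2 edges, via either c or b-d.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": ["e"], "e": []}
print(bfs_distances(adj, "a"))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 2}
```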

References:

Here I am showing my proof, a small variant of the one in CLRS:

Continue reading “Proof that BFS finds shortest distance”