How does gradient vanish in Multi-Layer Neural Network?

Background

This post reviews how we update weights using the back propagation approach in a neural network. The goal of the review is to illustrate a notorious phenomenon in training MLNN, called “gradient vanish”.

Start

Let’s suppose that we have a very simple NN structure, with only one unit in each hidden layer, input layer and output layer. Each unit has sigmoid function as its activation function:

Untitled

Also suppose we have training data: $latex (x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$. Therefore, we have $latex h_1 = \frac{1}{1+e^{-w_0x_i}}$, $latex h_2 = \frac{1}{1+e^{-w_1h_1}}$, $latex h_3 = \frac{1}{1+e^{-w_2h_2}} $, $latex o = \frac{1}{1+e^{-w_3h_3}}$

Our cost function, which we want to minimize, is as follows:

$latex C=\sum\limits_{i=1}^n \left\Vert y_i – o_i\right\Vert^2$ 

 

If we are going to update $latex w_0$ after a round of inputting all the training data, we use the following rule (with $latex \epsilon$ as the learning rate):

$latex w_0 – \epsilon\nabla_{w_0}C$. 

Due to the chain rule of derivatives, we can have equivalently:

$latex \nabla_{w_0}C = \sum\limits_{i=1}^n\frac{\partial C_x}{\partial o} \cdot \frac{\partial o}{\partial h_3} \cdot \frac{\partial h_3}{\partial h_2} \cdot \frac{\partial h_2}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_0}&s=3$

$latex =\sum\limits_{i=1}^n2(y_i – o_i) \cdot \delta'(w_3h_3) \cdot (-w_3) \delta'(w_2h_2) \cdot(-w_2) \cdot \delta'(w_1h_1) \cdot (-w_1) \cdot \delta'(w_0x_i) \cdot (-x_i)$

$latex =\sum\limits_{i=1}^n2(y_i – o_i)\cdot (-w_3) \cdot (-w_2) \cdot (-w_1) \cdot (-w_0) \cdot (-x_i) \cdot \delta'(w_3h_3) \cdot\delta'(w_2h_2) \cdot\delta'(w_1h_1)\cdot \delta'(w_0x_i)$

,where $latex \delta'(t) = (\frac{1}{1+e^{-t}})’$ is the derivative of sigmoid function.

 

We should be very familiar with the shape of sigmoid function as well as its derivative.

sigmoidc (1)1

From the two plots above, we know that the largest derivative of sigmoid function is 0.25 and it takes place at x=0. Let’s go back to $latex \nabla_{w_0}C$ of the multiple-layer neural network,  which is the gradient descent of the weight associated with the layer farthest away from the output unit. It has four sigmoid derivatives, namely $latex \delta'(w_3h_3) \cdot\delta'(w_2h_2) \cdot\delta'(w_1h_1)\cdot \delta'(w_0x_i)$, each of which cannot overseed 0.25. As a result, $latex \nabla_{w_0}C$ would become very tiny.

That’s how gradient vanishing happens! It’s the phenomenon that certain weights in a multi-layer neural network cannot get updated effectively. 

 

Reference

http://neuralnetworksanddeeplearning.com/chap5.html

http://www-labs.iro.umontreal.ca/~bengioy/dlbook/mlp.html

How to view all columns of a data.frame in R

If you want to view a large data.frame with a large number of columns in R, you will only see first several columns in the UI. After reading this post, I found that if using utils::View(your_data_frame) , you can view all the columns in a new window.

Reference:

http://stackoverflow.com/questions/19341853/r-view-does-not-display-all-columns-of-data-frame

Logical deduction in NP-completeness proof

Maybe it sounds simple to computer scientists, I just want to backup some logical deduction of myself in NP-completeness proofs in case I will deal with such proofs in the future.

In NP-completeness proof, we always want to find a polynomial-time reduction (transformation) from problem A to problem B, denoted as:

Capture

We say that there is an algorithm which for each instance $latex I$ of Problem A constructs Problem B in time which is polynomial with respect to the input size.

If $latex B \in P$, then $latex A \in P$. Think about it in contradiction: if $latex A$ is a NP-complete problem, a problem as hard as the “hardest problems” in the world, and suddenly we have a polynomial-time to transform it to problem B which is in P, then why not applying such transformation then solving B, both in polynomial time, every time we are given to solve $latex A$? 

However, it is easy (and wrong) to think that if $latex B \in \text{NP-complete}$ then $latex A \in \text{NP-complete}$. It is wrong because I can even transform $\latex A$, a problem in P,  by my will to any problem as tricky as NP-complete problems. So having $latex B \in \text{NP-complete}$ and a polynomial time transformation from $latex A$ to $latex B$ can’t guarantee $latex A$ to be anything.

If $latex A \in \text{NP-complete}$, then $latex B \in \text{NP-complete}$. Initially I think this is counter-intuitive. After all, by definition of polynomial time reduction, every instance of $latex A$ is mapped to $latex B$, but not every instance of $latex B$ can be assured to be mapped. Thinking in the following way makes me more sense: at least some instances of $latex B$ are mapped from every instance of $latex A$. So the mapped instances of $latex B$ are as hard as NP-complete problems. If the other instances of $latex B$ can be solved in P, problem $latex B$ itself still falls in the $latex NP-complete$ category.

All in all, I think all my logical deduction holds if the following diagram is true:

414px-Complexity_classes.svg

Otherwise, all the NP-complete proofs in this world will need to take with a grain of salt. 

 

 

 

Guitar Chord Music Theory

It’s a bit shame that I just start to learn music theory of guitar chords after FIVE years since I picked up guitar. In this post I will share how I have learned guitar chords from beginning.

 

1. know mappings from alphabets to do-rei-mi-fa-so-la-si

C is DO, D is REI, E is MI, so on so forth.

 

2. know every note on the neck of a guitar.

Ask yourself questions such as, what is the note on the 2nd fret on the 6-th string? How about the note on the 12th fret on 3rd string?

This page might be helpful: http://basicsofguitar.blogspot.com/2009/07/how-to-know-your-fretboard-well.html 

 

3. know what is a major scale?

A major scale is a scale that starts from some note, from which the following notes have fixed intervals.

The so called fixed intervals are:

full, full, half, full, full, full, half

full = 2 semi-tone, half = 1 semi-tone

For example,

C  C#  D  D#  E  F  F#  G  G#  A  A#  B  C ...

   D♭      E♭        G♭     A♭     B♭

In the first line, C, C3, D, D#,…..,A, A#, B make up a chromatic scale. (A chromatic scale, by definition, contains 12 notes with semitone as interval.) An octave is the range from the first C to the next C, which spans 12 semitones. Physically, the frequency of a note is half as the frequency of the note in the next octave. 

C, D, E, F, G, A, B are the notes in C major scale. Because D is 2 semi-tones higher than C, E is 2semi-tones higher than D, but F is 1semi-tones higher than E, so on so forth, which is exactly the intervals we said before: full, full, half, full, full, full, half.

On the second line, we list D, E, G, A and B because D is the same note as C#, E is the same note as D#, G is the same note as F#, so on so forth.

 

good references to learn scale:

http://basicsofguitar.blogspot.com/2010/04/how-to-figure-out-notes-for-major-scale.html

http://musictheorysite.com/major-scales/list-of-all-major-scales

 

4. I believe people remember notes on the guitar neck based on the C major scale. That is, the open strings from the 1st to 6th strings are called E, B, G, D, A and E.  So we will just use this scale from now on.

 

5. A chord often consists of three notes, which are called triads.

The first triad is called the root. The first letter in a chord denotes the root of this chord. For example, Cmaj means this is the major chord, with the root as C. Similarly, Amin denotes the minor chord with the root as A. The second triad is called the third. The third often determines the type of this chord (whether it is a minor or major or something else.) The third triad is called the fifth.

A major chord contains the triads with 4 semitones interval between the root and the third, and with 3 semitones interval between the third the fifth. A minor chord contains the triads with 3 semitones interval between the root and the third, and with 4 semitones interval between the third and the fifth.

So looking at the following scale again.

C  C#  D  D#  E  F  F#  G  G#  A  A#  B  C ...

   D♭      E♭        G♭     A♭     B♭

Cmaj will be a major chord consisting C(root), E(the third), and G(the fifth) because E is 4 semitones higher than C and G is 3 semitones higher than E.

Cmin will a minor chord consisting C(root), D#(the third) and G(the fifth) because D# is 3 semitones higher than C and G is 4 semitones higher than D#.

Besides major and minor chords, there are other types of chords. Please see more info below. These are the websites I am referring to along the way I learned chords.

http://www.essentialguitar.com/

http://www.fretjam.com/guitar-chord-theory.html

 

6. After learning these, all you have to do first is to memorize all major chords rooting at different notes. Then, if you are asked to give other types of chords with same root, you can just move your finger accordingly from the major chord.

 

A general way to add backslash on symbols in latex

You know, in latex you can type

 \neq

 to make an inequation symbol:

`$latex \neq$`

 

Of course, you don’t want to remember how to type in latex all symbols with backslash on them. So here is a general way to add backslash on any symbol in latex: add `\not` in front of a normal symbol. 

Hence, `\not\rightsquigarrow` becomes $latex \not\rightsquigarrow$. `\not\subseteq` becomes $latex \not\subseteq$. so on so forth.

Proof that BFS finds shortest distance

Intuitively, Breadth-First Search (BFS) can find shortest distances from a source $latex s&s=4$ to any other reachable vertex. However, proving this intuition needs a bit hard work. After some searches, I found that there are two ideas to prove it. The two ideas both use induction to prove it. Both ideas require a bit long proof.

  1. The first idea is based on CLRS Page 597-600. The second reference below is talking about the same idea.
  2. The second idea is from a pdf file from NYU.  (the third reference)

References:

Here I am showing my proof, a small variant of CLRS:

Continue reading “Proof that BFS finds shortest distance”

How to migrate WordPress from one host from another?

WordPress really SUCK! It doesn´t provide a viable, simple and easy solution for export/import of your whole site. Here I am sharing how I managed to migrate my WordPress site from an old host to a new one.

  • make sure Apache2, PHP, MySQL, and PHP MySQL extension work in perfect on your new host. Here is one instruction: http://www.tecmint.com/install-wordpress-on-ubuntu-16-04-with-lamp/
  • bear in mind that WordPress saves all static contents (style, theme, uploaded images, etc) in your file system. So make a copy of WordPress folder on the old host located in /var/www/html/wordpress_folder.
  • since WordPress stores all dynamic contents in mysql database (e.g., users, articles), you need to export that database to ¨.sql¨ file. Let’s suppose in /var/www/html/<wordpress_folder>/wp-config.php, we have:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    
    // ** MySQL settings - You can get this info from your web host ** //
    /** The name of the database for WordPress */
    define('DB_NAME', 'your_dbname');
     
    /** MySQL database username */
    define('DB_USER', 'your_usrname');
     
    /** MySQL database password */
    define('DB_PASSWORD', 'your_password');
     
    /** MySQL hostname */
    define('DB_HOST', 'your_hostname');

    Then, on the machine that hosts your wordpress website, you can do the export by:

    mysqldump -u your_usrname -p -h your_hostname your_dbname > wordpress_bakup.sql

    which will complete after receiving your_password.

  • Open the exported ¨.sql¨ file, replace all the old site urls to the new site urls, then save the ¨.sql¨ file. For example, in `vim` you can use:
     :%s/<old_ip_addr>/<new_ip_addr>/g
  • on the new host, put the old host static content under the same path relative to html root directory.
  • on the new host, import the ¨.sql¨ to a newly created database with the same name. (You may need to create a database called your_dbname first. This can be done my first logging in mysql by <span class="lang:default decode:true crayon-inline">mysql -u usr -p</span> , then create database your_dbname;  )
    mysql -u usr -p your_dbname < your_bak.sql
  • change ¨wp-config.php¨ accordingly so that DB password and username match those on the new host.
1
2
3
4
5
/** MySQL database username */
define('DB_USER', 'usr');
 
/** MySQL database password */
define('DB_PASSWORD', 'pwd');
  •  change the permission of `wp-content/uploads`, `wp-content/upgrades`, `wp-
  • content /plugins` folders to 777.
  •  add “define(‘FS_METHOD’, ‘direct’);” in the `wp-config.php`.

That´s it. Enjoy it.

—————————————————————————————————–

Updated on Aug 24th, 2015:

If you ever encounter file permission errors during updates or anything else, change permission of the whole wordpress directory as well as the files in it to 777. Do the update. And then change the permissions back as per the official document suggests:

For Directories:

find /path/to/your/wordpress/install/ -type d -exec chmod 755 {} \;

For Files:

find /path/to/your/wordpress/install/ -type f -exec chmod 644 {} \;

—————————————————————————————————–

Updated on April 8th, 2019

Another way to migrate wordpress:

Use Tools -> Export on the old site, and Use Tools->Import on the new site. Up to this point, the new site should have the identical content as in the old site. To update post contents that contain the old site’s URLs, use phpAdmin or mysqldump to export the database to an SQL file. In this file, replace the old site’s URLs with the new ones. Import the updated SQL file again using phpAdmin or mysqldump.

—————————————————————————————————-

Updated on May 6th, 2019

I just found this plugin is very useful

All-in-One WP Migration

How to make contents wider in WordPress Theme?

It again proved how silly WordPress is: I can’t find any theme with wide content area. Why do we have to waste almost a third of page area in EVERY WordPress theme? I read many articles about how to make custom CSS to widen the content column but no magic happened.

So I decided to modify `style.css` which is styling my theme, Twenty-fifteen. (Don’t have time to back up the theme or create a child theme.) Modifying `style.css` is easy but bear in mind whenever it gets updated it gets overwritten. So you need to modify `style.css` after every update.

As you can see on my site, my article has wide area so it has more contents to offer within one screen. You can download my ‘style.css’ here: https://dl.dropboxusercontent.com/u/5055823/style.css