Change mongodb data path to portable hard drive

Background

I have a 1TB portable USB hard drive on which I want to host MongoDB data. The OS is Ubuntu 15.04.

 

Procedure

stop your mongodb service

sudo service mongod stop

 

edit `/etc/fstab` so that the computer automatically mounts your portable drive at `/media/mongodb` every time the OS boots up (you can find the drive’s UUID with `sudo blkid`, and test the new entry without rebooting via `sudo mount -a`):

UUID=018483db-1b1c-4dd2-a373-58a6aa5a10be /               ext4    errors=remount-ro 0       1
# swap was on /dev/sdb5 during installation
UUID=ade6aa02-8f10-4251-9804-fe81850451a7 none            swap    sw              0       0

# This line is added by you
# You must mount the portable drive under /media/mongodb
UUID=324fa516-3dba-4537-8e82-2a74ea20c4c6 /media/mongodb ext4  defaults  0   0

 

move the old data directory to `/media/mongodb` using `cp …` (on Ubuntu, MongoDB’s default data directory is `/var/lib/mongodb`)

 

make sure the new data path has mongodb:mongodb as owner and group.

sudo chown -R mongodb:mongodb /media/mongodb/your_data_dir

 

change the mongodb conf file via `vim /etc/mongod.conf`. Set the data path to the new location: `storage.dbPath` in the YAML-style config, or `dbpath` in the old-style config, e.g. `/media/mongodb/your_data_dir`.

 

restart mongodb

sudo service mongod restart

 

check if everything works by either:

sudo service mongod status

or:

tail /var/log/mongodb/mongodb.log (mongodb log path)
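If you also want a quick programmatic check, here is a minimal pymongo sketch (my own addition; it assumes pymongo is installed, and the host/port are placeholders for whatever is set in your mongod.conf):

from pymongo import MongoClient

client = MongoClient("localhost", 27017)   # adjust host/port to match mongod.conf
print(client.server_info()["version"])     # raises an error if mongod is not reachable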

 

Reference

http://stackoverflow.com/questions/24781972/changing-mongod-dbpath-raise-permission-error

Create mongodb replica sets on different machines

Important prerequisites:

  1. Confirm connectivity between machines you want to set as replica set
  2. Confirm that the same version of MongoDB is installed on each machine. I tried one machine with Mongo 2.6 and another with Mongo 3.0 and totally failed. Then I installed Mongo 3.0 on every machine.
  3. Don’t set up authentication on any machine until you have finished configuring the replica set. I spent a lot of time boggled by authentication issues.
  4. Suppose we have machine A, which has a complete copy of the data. All other machines, say B, C, D, …, need to sync data from A. Make sure all machines besides A start with an empty MongoDB database.

 

Now let’s walk through an example where machine A has a complete copy of the data and machine B is empty and needs to sync from A.

By default, mongod reads `/etc/mongod.conf` every time it starts. There are two formats for the config file: one is the “setting = value” format (http://docs.mongodb.org/v2.4/reference/configuration-options/) and the other is YAML (http://docs.mongodb.org/manual/reference/configuration-options/). The former format was used before Mongo 2.6 but remains compatible with later versions.

I use the same config file (in YAML format) on A and B:

storage:
 dbPath: "/media/mongodb/mongodb"
 directoryPerDB: false
 journal:
  enabled: true
systemLog:
 destination: file
 path: "/var/log/mongodb/mongodb.log"
 logAppend: true
 timeStampFormat: iso8601-utc
replication:
 oplogSizeMB: 10240
 replSetName: "rs0"
net:
 port: <your port>

(Getting the config file format right is very important. A typo in the config file can sometimes cause a mongod instance to fail to start while leaving no trace in the logs.)

 

To make the config file take effect, you need to restart mongodb on both machines:

sudo service mongod restart

 

Now go to A’s mongo shell and initiate the replica set:

rsconfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "machine_B_host:port", priority: 1 },
    { _id: 1, host: "machine_A_host:port", priority: 2 }
  ]
}
rs.initiate(rsconfig)

Remember to replace machine A’s and B’s address:port. If you want A to be PRIMARY as much as possible, set its priority higher than B’s. The command should return OK. You can also use `rs.reconfig(rsconfig)` later if you want to modify your configuration.
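If you want to double check the replica set state from a script, here is a small pymongo sketch (my own addition; the hosts are placeholders, and it assumes pymongo is installed and authentication is not enabled yet):

from pymongo import MongoClient

# connect through any member; replicaSet must match replSetName in mongod.conf
client = MongoClient("machine_A_host:port", replicaSet="rs0")
status = client.admin.command("replSetGetStatus")
for member in status["members"]:
    print(member["name"], member["stateStr"])   # expect one PRIMARY and one SECONDARY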

 

The first time you start mongod on B, you will find that you can’t query anything. For example:

show dbs
2015-07-09T22:52:00.208-0400 E QUERY    Error: listDatabases failed:{ "note" : "from execCommand", "ok" : 0, "errmsg" : "not master" }

That’s fine, since B is a secondary server. In order to allow querying on B, you need to run `rs.slaveOk()` on B first.

 

 

So far, we have set up two machines as a replica set. But they can easily be accessed by anyone. For security reasons, we need to use a keyfile as a shared authentication secret on both machines.

 

We first stop the mongod instance:

sudo service mongod stop

 

Generate a keyfile following http://docs.mongodb.org/manual/tutorial/generate-key-file/, then push this file to both machines. I recommend putting it in the data directory, with `mongodb:mongodb` as owner and group and 600 permissions.
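If you would rather generate the keyfile with Python instead of openssl, a rough sketch looks like this (the path is my own assumption and should match the keyFile path you will put in mongod.conf; MongoDB only requires the content to be a base64 string of 6 to 1024 characters):

import base64, os

key = base64.b64encode(os.urandom(512))           # 512 random bytes -> 684 base64 characters
path = "/media/mongodb/mongodb/mongodb-keyfile"   # assumption: keyfile lives in the data directory
with open(path, "wb") as f:
    f.write(key)
os.chmod(path, 0o600)                             # mongod rejects keyfiles readable by group/others

Remember to chown the file to mongodb:mongodb afterwards, just like the data directory.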

 

Now add the security section to `/etc/mongod.conf`:

storage:
 dbPath: "/media/mongodb/mongodb"
 directoryPerDB: false
 journal:
  enabled: true
systemLog:
 destination: file
 path: "/var/log/mongodb/mongodb.log"
 logAppend: true
 timeStampFormat: iso8601-utc
replication:
 oplogSizeMB: 10240
 replSetName: "rs0"
net:
 port: <your port>
security:
 keyFile: "/media/mongodb/mongodb/mongodb-keyfile"

 

As the documentation suggests, specifying a keyFile path automatically enables authentication. So before you start the mongod instances with the keyFile, you should add a user and password to your mongo databases. For example:

use products
db.createUser({
    user: "accountUser",
    pwd: "password",
    roles: [ "readWrite"]
})

 

Now start mongodb again:

sudo service mongod start

 

Reference

http://blog.51yip.com/nosql/1580.html

Microsoft Word Tips

Tips again. This time it’s for Microsoft Word.

 

1. Ctrl+Shift+8

Show non-printing characters. You can also toggle it on manually in the menu.


 

2. Pay attention to the small expansion arrow at the corner of each ribbon group (the dialog launcher). It contains more options.


Or you can find the paragraph settings under Page Layout in the ribbon.


 

3. Sometimes you will find that some paragraphs don’t follow tightly after the previous paragraph. Select these paragraphs, right click, go to Paragraph, then Line and Page Breaks, and check “Keep with next”.


 

4. The best way to insert figures together with their captions is to insert them as a whole text box. When I insert a table, I always put it together with its caption (if needed) in a text box whose border is set to no color.


 

5. Export Word to PDF without markup. First, switch to Simple Markup mode. Second, go to the tracking options, then “Advanced Options”, and set “Changed lines” to none; this step makes sure that markup lines at the page borders will not show up in the PDF. Third, go to the Export menu; in the pop-up window where you choose where to save the PDF, click “Options” and select “Document” under “Publish what”.


 

6. Sometimes the tables or figures in the document change their positions in “All Markup” mode. This is because their positioning is not set correctly. Go to the object’s layout options and make sure either the horizontal or the vertical positioning is relative to Margin instead of the other options.


 

 

 

 

My previous tip posts:

Regex: http://maider.blog.sohu.com/304850962.html

Eclipse: http://maider.blog.sohu.com/281903297.html

Git: https://czxttkl.com/?p=56

Install Latex distribution, editor and packages on Ubuntu

1. Install TeX Live. This is the LaTeX environment.

https://www.tug.org/texlive/quickinstall.html

We recommend using the install-tl script to install TeX Live. TeX Live is usually installed at “/usr/local/texlive/<year>/bin/x86_64-linux/….”

Although you can install a vanilla TeX Live via sudo apt-get install texlive, we often observe that it misses advanced accessories such as fonts.

 

2. Install Texmaker

http://www.xm1math.net/texmaker/

*** Note: On Ubuntu, Texmaker can be installed via apt-get install texmaker***

3. Open Texmaker, Options -> Configure Texmaker:

Set the pdflatex location to your TeX Live installation. For example, mine is “/usr/local/texlive/2015/bin/x86_64-linux/pdflatex” -interaction=nonstopmode %.tex

4. Use `F1` to compile your tex file. Done.

5. In some versions of Ubuntu, certain shortcuts of Texmaker don’t work. To address that, you can do three steps:

a. sudo apt-get remove appmenu-qt5

b. reinstall Texmaker with Qt4-supported version: http://www.xm1math.net/texmaker/download.html

c. Keep Texmaker closed while going to ~/.config/xm1/ and editing texmaker.ini so that no key is mapped to two different actions. (E.g., QuickBuild and PdfLatex are often mapped to the same key, F1; eliminate such duplicates.)

6. To support PDF–TeX synchronization (features like jumping from a line in the PDF to the corresponding line in the tex file, or showing a red rectangle at the current editing position), you need to add “-synctex=1” to the pdflatex command:

"/usr/local/texlive/2016/bin/x86_64-linux/pdflatex" -interaction=nonstopmode -synctex=1 %.tex

 

To install CTAN packages:

Download the package files, which usually contain a .ins and a .dtx file.

Run latex ***.ins. This will generate a ***.sty file in the same folder.

Copy the .sty file into texmf/tex/latex/local.

 

LDA Recap

It’s been half a year since I last studied the LDA (Latent Dirichlet Allocation) model. I found that learning something without writing it down can be frustrating when you start to review it after a while. Sometimes it is a bit embarrassing that you feel you didn’t learn it at all.

Back to the topic: this post records what I learned about LDA. It should be a starting point for anyone who wants to learn graphical models.

 

The generative process of LDA

LDA is quite different from the usual discriminative algorithms in that it is a generative model. In a generative model, you conjure up an assumption of how the data was generated. Specifically, in LDA a collection of $latex |D|$ documents with $latex |K|$ known topics and a vocabulary of size $latex |V|$ is generated in the following way. (To be clear, the vocabulary is the set of distinct words that appear at least once in the documents, so the total word count of the $latex |D|$ documents, $latex |W|$, is equal to or larger than $latex |V|$.)

1. For $latex k = 1 \cdots K$ topics:    $latex \qquad \phi^{(k)} \sim Dirichlet(\beta) &s=2$

$latex \beta$ is a vector of length $latex |V|$ which controls how the word distribution of topic $latex k$, $latex \phi^{(k)} &s=2$, is generated. This step essentially says that, since you know you have $latex K$ topics, you should expect $latex K$ different word distributions. But how do you determine these word distributions? You draw from the Dirichlet distribution $latex Dirichlet(\beta)$ $latex K$ times to form the $latex K$ word distributions.

2. Now you have $latex K$ word distributions. Next you want to generate documents.

For each document $latex d \in D$:

(a) $latex \theta_d \sim Dirichlet(\alpha) &s=2$

You first need to determine its topics. In fact, document $latex d$ has a topic distribution $latex \theta_d$. This matches our common sense: even if a document has a strong single theme, it more or less covers many different topics. How do you determine the topic distribution $latex \theta_d$ for document $latex d$? You draw $latex \theta_d$ from another Dirichlet distribution, $latex Dirichlet(\alpha)$.

(b) For each word $latex w_i \in d$:

i. $latex z_i \sim Discrete(\theta_d) &s=2$

ii. $latex w_i \sim Discrete(\phi^{(z_i)}) &s=2$

This step says that, before each word in document $latex d$ is written, the word is first assigned to one single topic $latex z_i$. The topic assignment of each word $latex w_i$ follows the topic distribution $latex \theta_d$ you got in (a). Once you know which topic the word belongs to, you finally draw the word from the word distribution $latex \phi^{(z_i)}$.

Two more notes:

1. $latex \alpha$ and $latex \beta$ are normally called hyperparameters.

2. This is why the Dirichlet distribution is often called a distribution over distributions. In LDA, the word distribution of a specific topic, or the topic distribution of a specific document, is drawn from a predefined Dirichlet distribution. Words are finally generated from the drawn word distribution, not from the Dirichlet distribution itself. (Likewise, it is from the drawn topic distribution that each word in a document is assigned to a sampled topic.)

 

What do we want after defining this generative process?

What we observe, or know in advance, are $latex \alpha$, $latex \beta$ and $latex \vec{w}$. $latex \vec{w}$ is the vector of all observed words across all documents. What we don’t know, and thus want to infer, are $latex \theta$, $latex \phi$ and $latex \vec{z}$. $latex \theta$ is a matrix containing the topic distributions of all documents. $latex \phi$ is a matrix containing the word distributions of all topics. $latex \vec{z}$ is a vector of length $latex |W|$ representing each word’s topic. So ideally, we want to know:

$latex p(\theta, \phi, \vec{z} | \vec{w}, \alpha, \beta) = \frac{p(\theta, \phi, \vec{z}, \vec{w} | \alpha, \beta)}{p(\vec{w} | \alpha, \beta)} = \frac{p(\phi | \beta) p(\theta | \alpha) p(\vec{z}|\theta) p(\vec{w} | \phi_z)}{\iiint p(\phi | \beta) p(\theta | \alpha) p(\vec{z}|\theta) p(\vec{w} | \phi_z) d\theta \, d\phi \,d\vec{z}} &s=4$

Here, $latex p(\vec{w} | \phi_z)$ is the conditional probability of generating $latex \vec{w}$ given each word’s topic-specific word distribution.

The crux of the difficulty is the intractability of the integral $latex \iiint p(\phi | \beta) p(\theta | \alpha) p(\vec{z}|\theta) p(\vec{w} | \phi_z) d\theta \, d\phi \,d\vec{z}$. There are a number of approximate inference techniques that serve as alternatives for estimating $latex \vec{z}$, $latex \theta$ and $latex \phi$, including variational inference and Gibbs sampling. Following http://u.cs.biu.ac.il/~89-680/darling-lda.pdf, we are going to talk about how to use Gibbs sampling to estimate $latex \vec{z}$, $latex \theta$ and $latex \phi$.

 

Collapsed Gibbs Sampling for LDA

The idea of Gibbs sampling is that we can find a $latex \vec{z}$ close enough to a sample from the actual but intractable joint distribution $latex p(\theta, \phi, \vec{z} | \vec{w}, \alpha, \beta)$ by repeatedly sampling from the conditional distribution $latex p(z_i|\vec{z}^{(-i)}) &s=2$. For example,

Loop:

1st word $latex w_1$ in $latex \vec{w}$, $latex z_1 \sim p(z_1 | z_2, z_3, \cdots, z_{|W|}) &s=2$

2nd word $latex w_2$ in $latex \vec{w}$, $latex z_2 \sim p(z_2 | z_1, z_3, \cdots, z_{|W|}) &s=2$

 …

The last word $latex w_{|W|}$ in $latex \vec{w}$, $latex z_{|W|} \sim p(z_{|W|} | z_1, z_2, \cdots, z_{|W|-1}) &s=2$

After we get $latex \vec{z} &s=2$, we can infer $latex \theta &s=2$ and $latex \phi &s=2$ according to the Bayesian posterior probability formula $latex p(A|B) = \frac{p(B|A)p(A)}{p(B)}$ (p(A), as a prior distribution, gets updated after observing event B). For example, suppose $latex X_d$ denotes the event of having observed the topics $latex \vec{z}_d$ of the $latex \vec{w}_d$ words in document $latex d$. As we know from above, $latex p(\theta_d) = Dirichlet(\theta_d|\alpha)$. So $latex \theta_d$ can be estimated after observing $latex X_d$ (such estimation is also called MAP: Maximum A Posteriori):

$latex p(\theta_d | X_d) = \frac{p(\theta_d) p(X_d|\theta_d)}{p(X_d)} = \frac{p(\theta_d) p(X_d|\theta_d)}{\int p(\theta_d) p(X_d|\theta_d) d\theta_d} =Dirichlet(\theta_d|\alpha + \vec{z_d})&s=2$

$latex \phi$ can be inferred similarly using the Bayesian posterior probability formula. So it turns out that Gibbs sampling only samples $latex \vec{z}$, yet it ends up also giving us $latex \theta$ and $latex \phi$. This technique is called collapsed Gibbs sampling: rather than sampling all unknown parameters, we only sample one single parameter on which the estimation of the other parameters relies.
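Concretely, using the count variables defined in the table further below, the standard posterior estimates (my own addition for completeness, following the usual LDA references) are:

$latex \hat{\theta}_{d,k} = \frac{n_{d,k} + \alpha_k}{n_{d,\cdot} + \sum_{k'=1}^{K}\alpha_{k'}}, \qquad \hat{\phi}_{k,w} = \frac{n_{k,w} + \beta_w}{n_{k,\cdot} + \sum_{w'=1}^{|V|}\beta_{w'}} &s=2$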

 

How do we get the conditional probability $latex p(z_i | \vec{z}^{(-i)}) &s=3$?

$latex p(z_i | \vec{z}^{(-i)}) $ is the conditional probability that word $latex w_i$ belongs to topic $latex z_i$ given the topics of all other words in the documents. Since $latex \vec{w}$ is observable, and $latex \alpha$ and $latex \beta$ are hyperparameters, $latex p(z_i | \vec{z}^{(-i)}) = p(z_i | \vec{z}^{(-i)}, \vec{w}, \alpha, \beta) &s=2$.

Hence we have:

[Screenshot: derivation of $latex p(z_i | \vec{z}^{(-i)}, \vec{w}, \alpha, \beta)$, equations (3)–(6)]
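Since the screenshot is not reproduced here, a rough reconstruction of the chain it shows, based on the two notes below and the standard derivation in the Darling tutorial, is:

$latex p(z_i | \vec{z}^{(-i)}, \vec{w}, \alpha, \beta) = \frac{p(\vec{z}, \vec{w} | \alpha, \beta)}{p(\vec{z}^{(-i)}, \vec{w} | \alpha, \beta)} = \frac{p(\vec{z}, \vec{w} | \alpha, \beta)}{p(\vec{z}^{(-i)}, \vec{w}^{(-i)} | \alpha, \beta)\, p(w_i | \alpha, \beta)} \propto \frac{p(\vec{z}, \vec{w} | \alpha, \beta)}{p(\vec{z}^{(-i)}, \vec{w}^{(-i)} | \alpha, \beta)} &s=2$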

Note:

1. From (3) to (4), we move $latex \vec{w}$ from behind the conditioning bar to the front because it is easier to write down the exact form of $latex p(\vec{z}, \vec{w}|\alpha, \beta)$:

$latex p(\vec{z}, \vec{w}|\alpha, \beta)=\iint p(\phi|\beta)p(\theta|\alpha)p(\vec{z}|\theta)p(\vec{w}|\phi_z) d\theta d\phi &s=2$.

 2. From (4) to (5), we break $latex p(\vec{z}^{(-i)}, \vec{w} | \alpha, \beta) &s=2$ down into $latex p(\vec{z}^{(-i)}, \vec{w}^{(-i)}| \alpha, \beta)\, p(w_i|\alpha, \beta) &s=2$ because word $latex w_i$ is generated without relying on $latex \vec{z}^{(-i)} &s=2$ or $latex \vec{w}^{(-i)} &s=2$:

$latex p(w_i) = \iiint p(w_i | \phi_{z_i}) p(z_i | \theta) p(\theta | \alpha) p(\phi | \beta) d\theta d\phi dz_i&s=2$

In fact, $latex p(w_i)$ can be treated as a constant that the Gibbs sampler can ignore. That is why the equality sign `=` in (5) changes to the proportionality symbol $latex \propto$ in (6).

 

Before we proceed, we list a table of variables we will use:

$latex n_{d,k}$  The number of times words in document $latex d$ are assigned to topic $latex k$
$latex n_{k,w}$  The number of times word $latex w$ is assigned to topic $latex k$
$latex n_{d,\cdot}$  The number of words in document $latex d$
$latex n_{k,\cdot}$  The number of words belonging to topic $latex k$

 

Now let’s see how to calculate $latex p(\vec{z}, \vec{w}| \alpha, \beta) &s=2$:

[Screenshot: derivation of the closed form of $latex p(\vec{z}, \vec{w}| \alpha, \beta)$]
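The screenshot is not reproduced here, but the well-known closed form it arrives at (writing $latex \vec{n}_d$ for the vector of per-topic counts in document $latex d$, $latex \vec{n}_k$ for the vector of per-word counts in topic $latex k$, and $latex B(\cdot)$ for the multivariate Beta function from property 1 below) is:

$latex p(\vec{z}, \vec{w} | \alpha, \beta) = \prod_{d=1}^{|D|} \frac{B(\vec{n}_d + \alpha)}{B(\alpha)} \prod_{k=1}^{K} \frac{B(\vec{n}_k + \beta)}{B(\beta)} &s=2$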

After that, we can have: [Screenshot: the collapsed Gibbs sampling update, equations (9)–(10)]
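For reference, the update those equations arrive at is the standard collapsed Gibbs sampling formula (the superscript $latex (-i)$ means word $latex i$ is excluded from the counts in the table above):

$latex p(z_i = k | \vec{z}^{(-i)}, \vec{w}, \alpha, \beta) \propto (n_{d,k}^{(-i)} + \alpha_k)\,\frac{n_{k,w_i}^{(-i)} + \beta_{w_i}}{n_{k,\cdot}^{(-i)} + \sum_{w=1}^{|V|} \beta_w} &s=2$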

 

In the equations above, we applied the following properties:

1. For $latex \vec{\alpha} = (\alpha_1, \alpha_2, \dotsc, \alpha_K)$ and $latex \vec{t}=(t_1, t_2, \dotsc, t_K)$ with $latex t_i \ge 0$ and $latex \sum_{i=1}^K t_i = 1 &s=2$:

$latex B(\vec{\alpha})=\frac{\prod^K_{i=1}\Gamma(\alpha_i)}{\Gamma(\sum^K_{i=1}\alpha_i)}=\int \prod_{i=1}^K t_i^{\alpha_i-1}\, d\vec{t} &s=2$, where the integral is taken over the simplex of such $latex \vec{t}$.

2. $latex \Gamma(n) = (n-1)!$

3. From (9) to (10): since we are sampling $latex z_i$ among topics 1 to K, no matter which topic $latex z_i$ is, the factor $latex \frac{\Gamma(\sum^K_{k=1}(n_{d,k}^{(-i)}+ \alpha_k))}{\Gamma(\sum^K_{k=1}(n_{d,k} + \alpha_k))} &s=3$ is always $latex \frac{1}{\sum_{k=1}^K (n_{d,k}^{(-i)} + \alpha_k)} &s=3$, which the Gibbs sampler can ignore.

By now, the Gibbs sampler knows everything it needs to calculate the conditional probability $latex p(z_i | \vec{z}^{(-i)}) $ by counting $latex n_{d,k} &s=2$, $latex n_{k,w} &s=2$, $latex n_{d,\cdot} &s=2$ and $latex n_{k,\cdot} &s=2$, and by knowing $latex \alpha$ and $latex \beta$.

 

 

 Pseudo-code

 [Screenshot: pseudo-code of collapsed Gibbs sampling for LDA]
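The pseudo-code screenshot is not reproduced here, so below is my own minimal Python sketch of the same collapsed Gibbs procedure (symmetric scalar alpha and beta are an assumption made for brevity; the count variables mirror the table above):

import numpy as np

def lda_collapsed_gibbs(docs, K, V, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Rough collapsed Gibbs sampler for LDA.
    docs: list of documents, each a list of word ids in [0, V).
    Returns estimates of theta (D x K) and phi (K x V)."""
    rng = np.random.RandomState(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))       # n_{d,k}: words in document d assigned to topic k
    n_kw = np.zeros((K, V))       # n_{k,w}: times word w is assigned to topic k
    n_k = np.zeros(K)             # n_{k,.}: words assigned to topic k
    z = []                        # topic assignment of every word occurrence

    for d, doc in enumerate(docs):                  # random initialization
        z_d = []
        for w in doc:
            k = rng.randint(K)
            z_d.append(k)
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1   # the "(-i)" counts
                # p(z_i = k) proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + V * beta)
    return theta, phi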

 

References

http://u.cs.biu.ac.il/~89-680/darling-lda.pdf

https://gist.github.com/mblondel/542786

http://blog.csdn.net/v_july_v/article/details/41209515

http://www.mblondel.org/journal/2010/08/21/latent-dirichlet-allocation-in-python/

Collection of statistical hypothesis tests

This post is a collection of hypothesis test methodologies. The full collection is listed here: http://www.biostathandbook.com/testchoice.html. My post just goes over several hypothesis tests that are relevant to my research.

 

One-way ANOVA: http://www.biostathandbook.com/onewayanova.html

If you have one measurement variable and one nominal variable, and the nominal variable separates subjects into multiple groups, you want to test whether the means of the measurement variable are the same across the different groups. Sometimes the measurement variable is called the dependent variable and the nominal variable the independent variable.
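A quick way to run it in Python is scipy.stats.f_oneway (my own illustration; the group data below are made up):

from scipy import stats

# hypothetical measurements for three groups defined by the nominal variable
group_a = [5.1, 4.9, 5.3, 5.0]
group_b = [5.8, 6.1, 5.9, 6.0]
group_c = [5.2, 5.0, 5.4, 5.1]
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)   # a small p-value suggests the group means differ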

 

Two-way ANOVA: http://www.biostathandbook.com/twowayanova.html

When you have one measurement variable but two nominal variables, you need to use two-way ANOVA. It tests whether the means of the measurement variable are equal for different values of the first nominal variable, and whether they are equal for different values of the second nominal variable. Additionally, it tests whether there is an interaction between the two nominal variables, i.e., whether the effect of one nominal variable on the means of the measurement variable depends on the value of the other.

 

MANOVA: https://www.researchgate.net/post/What_is_the_difference_between_ANOVA_MANOVA

When you have multiple measurement variables, you can use MANOVA to test whether the measurement variables are influenced by one or more nominal variables simultaneously.

 

Paired t-test (dependent t-test): http://www.biostathandbook.com/pairedttest.html

It tests whether the means of two paired populations differ significantly. What are paired and unpaired populations? See the example here. It assumes that the differences of the pairs are normally distributed.
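In Python this is scipy.stats.ttest_rel (my own illustration; the before/after numbers are made up):

from scipy import stats

# hypothetical before/after measurements on the same five subjects
before = [72, 75, 68, 80, 77]
after = [70, 74, 65, 78, 75]
t_stat, p_value = stats.ttest_rel(before, after)
print(t_stat, p_value)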

 

Wilcoxon signed-rank test: https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test

It is a non-parametric test that achieves the same goal as the paired t-test: it compares two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ, in other words, whether two paired populations differ significantly. (It doesn’t need to assume that the differences of the pairs are normally distributed like the paired t-test does, and unlike the sign test it takes the magnitudes of the differences into account.)
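The scipy counterpart is scipy.stats.wilcoxon, which takes the same kind of paired data (again, made-up numbers):

from scipy import stats

before = [72, 75, 68, 80, 77]
after = [70, 74, 65, 78, 75]
statistic, p_value = stats.wilcoxon(before, after)   # signed-rank test on the paired differences
print(statistic, p_value)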

 

Sign test: https://en.wikipedia.org/wiki/Sign_test

Use it if you want to test whether the paired differences are significant when only their signs matter and their magnitudes do not.

 

Kendall rank correlation coefficient (Kendall $latex \tau$): https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient

It is a test on rank correlation (association) between two measured quantities. See one usage in “Deep Neural Networks for Optimal Team Composition”
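In Python, scipy.stats.kendalltau computes both the coefficient and a p-value (made-up rankings below):

from scipy import stats

x = [1, 2, 3, 4, 5, 6]         # e.g., predicted ranks
y = [2, 1, 3, 5, 4, 6]         # e.g., observed ranks
tau, p_value = stats.kendalltau(x, y)
print(tau, p_value)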

 

Spearman’s rank correlation coefficient: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

Similar to Kendall rank correlation coefficient.

 

One way Chi Square Test: http://www.okstate.edu/ag/agedcm4h/academic/aged5980a/5980/newpage28.htm

Test whether the observed counts falling into categories meet the expected counts.
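In Python this is scipy.stats.chisquare; the observed and expected counts must sum to the same total (made-up counts below):

from scipy import stats

observed = [18, 22, 20]        # counts observed in three categories
expected = [20, 20, 20]        # counts expected under the hypothesis
chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p_value)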

 

More non-parametric statistical method: https://en.wikipedia.org/wiki/Nonparametric_statistics#Methods

 

Factor Analysis and PCA

I’ve been scratching my head for a day (06/24/2015), but it turns out I am still baffled by the concept of Factor Analysis. So first of all, let me list what I’ve learned so far today.

PCA and Factor Analysis can be deemed similar in that they are both dimension reduction techniques. I’ve talked about PCA in an old post. But it can’t be emphasized enough how often people misunderstand and misuse PCA. Several good questions about PCA have been raised: apply PCA on the covariance matrix or the correlation matrix? More on here and here.

 

By far the most sensible writing about Factor Analysis is on Wikipedia: https://en.wikipedia.org/wiki/Factor_analysis. Remember the following expressions behind Factor Analysis:

[Screenshot: the Factor Analysis model equations]
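(The screenshot is not reproduced here; roughly, the model on that page says each observed variable is a linear combination of latent factors plus noise:)

$latex x - \mu = L F + \epsilon, \qquad E[F] = 0, \quad Cov(F) = I, \quad Cov(\epsilon) = \Psi \text{ (diagonal)} &s=2$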

 

There has been a debate over the differences between PCA and Factor Analysis for a long time. My own view is that the objective functions they try to optimize are different. Now let’s discuss an example within a specific context. Suppose we have a normalized data set $latex Z_{p \times n}$ ($latex n$ is the number of data points; $latex p$ is the number of features). The loadings and latent factor scores of Factor Analysis on $latex Z_{p \times n}$ are $latex L_{p \times k}$ and $latex F_{k \times n}$ ($latex k$ is the number of latent factors). The error term is $latex \epsilon_{p \times n}$, with $latex Cov(\epsilon) = Cov(\epsilon_1, \epsilon_2, \cdots, \epsilon_p)=Diag(\Psi_1, \Psi_2, \cdots, \Psi_p) = \Psi$ ($latex \epsilon_i$ is a row vector of $latex \epsilon_{p \times n}$, indicating that the error of each feature is independently distributed). Hence we have $latex Z=LF+\epsilon$.

Then,

[Screenshot: derivation of $latex Cov(Z)$ for Factor Analysis]

Here we use the property $latex Cov(F)=I$, because our assumption in Factor Analysis is that the latent factor scores are uncorrelated and normalized.
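Spelled out (my reconstruction of what the screenshot derives, assuming $latex F$ and $latex \epsilon$ are uncorrelated):

$latex Cov(Z) = Cov(LF + \epsilon) = L\,Cov(F)\,L^T + Cov(\epsilon) = LL^T + \Psi &s=2$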

 

Now we apply PCA on the same data set $latex Z_{p \times n}$. PCA aims to find a linear transformation $latex P_{k \times p}$ such that $latex PZ=Y$, where $latex Cov(Y)$ is a diagonal matrix. But there is no requirement that $latex Cov(Y)=I$. Let’s say there exists a matrix $latex Q$ s.t. $latex QP=I$. Then we have:

[Screenshot: derivation of $latex Cov(Z)$ under PCA]

But don’t forget that in PCA we impose $latex PP^T=I$. So actually $latex Q=P^T$. In other words, $latex Cov(Z)=P^T\,Cov(Y)\,P$.
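Spelled out (my reconstruction, using $latex Z = QPZ = QY$):

$latex Cov(Z) = Cov(QY) = Q\,Cov(Y)\,Q^T &s=2$, which with $latex Q = P^T$ gives $latex P^T\,Cov(Y)\,P$.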

 

From the comparison above, we can see that only when $latex Cov(Y) \approx I$ in PCA and $latex \Psi \approx 0$ in FA can the loadings of PCA and FA be similar. Therefore I don’t agree with the post here: http://stats.stackexchange.com/questions/123063/is-there-any-good-reason-to-use-pca-instead-of-efa/123136#123136
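As a quick sanity check of this comparison, here is my own illustration with scikit-learn (the synthetic data and all parameter choices are mine, not from the discussion above):

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.RandomState(0)
L = rng.normal(size=(5, 2))                      # true loadings: 5 features, 2 factors
F = rng.normal(size=(2, 200))                    # latent factor scores
Z = (L @ F + 0.3 * rng.normal(size=(5, 200))).T  # samples x features, with per-feature noise

pca = PCA(n_components=2).fit(Z)
fa = FactorAnalysis(n_components=2).fit(Z)
print(pca.components_)        # PCA loadings: directions of maximal variance
print(fa.components_)         # FA loadings; per-feature noise is in fa.noise_variance_
print(fa.noise_variance_)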

 

 

P.S. The solution for $latex L$ and $latex F$ in either FA or PCA is not unique. Taking FA as an example, if you have already found such $latex L$ and $latex F$, and you have an orthogonal matrix $latex Q$ s.t. $latex QQ^T = I$, then $latex Z=LF + \epsilon = LQQ^TF + \epsilon = (LQ)(Q^TF) + \epsilon = L’F’ + \epsilon$. Or you can always set $latex L’ = -L$ and $latex F’ = -F$ so that $latex Z=LF+\epsilon = (-L)\cdot(-F)+\epsilon$. This is intuitive: it says we can always find a set of opposite factors and assign negated loadings to describe the same data.

Unclean uninstallation of fcitx causes login loop on Ubuntu

Whether you realize it or not, fcitx, a popular input method, can cause the notorious login loop issue on Ubuntu if it is not properly uninstalled (i.e., every time you type your password and hit enter, you are asked to type your password again on the same login screen). Here is a post about how to fully uninstall fcitx, as well as Sogou Pinyin, on Ubuntu.

http://jingyan.baidu.com/article/9faa723154c3dc473d28cb41.html

 

You may also encounter the “fcitx-skin-sogou” problem afterwards. Here is a solution:

http://forum.ubuntu.org.cn/viewtopic.php?t=416810&p=3000862

 

You may then face the login loop issue, which in this case can be solved by reinstalling ubuntu-session and ubuntu-desktop.

http://askubuntu.com/questions/283985/unable-to-load-session-ubuntu-in-12-04

 

And after many glitches I ended up using IBus. Shiiiiiit.

alsamixer automatically mutes headphones and speakers

http://askubuntu.com/questions/280566/my-headphones-mute-alsamixer-when-i-plug-them-in-hp-dv6-12-04

http://askubuntu.com/questions/541847/is-there-any-way-to-save-alsamixer-settings-other-than-alsactl-store

The two links above are popular posts about dealing with alsamixer’s auto-mute problem. However, neither of them helped in my situation.

 

My solution

1. Type the following command:

sudo alsamixer

2. Use the `RightArrow` key to navigate right until you see “Auto-Mute”

3. Use `UpArrow` or `DownArrow` key to set it to “Disabled”

[Screenshot: alsamixer with Auto-Mute set to Disabled]

4. Press `Esc` to quit. Hopefully the audio won’t be muted after your next reboot!

 

Update 06/08/2017:

The solution is totally different if the problem is that, when you plug in your headphones, both the headphones and the speakers output sound.

  1. In the terminal, open alsamixer.
  2. Use RightArrow to navigate to the rightmost control until you see the Auto-Mute option, then use UpArrow to select “Line Out”.
  3. Also make sure the other volume controls look like the screenshot below.

[Screenshot: alsamixer volume control settings]

 

reference: https://askubuntu.com/questions/150887/sound-from-both-headphones-and-speakers

 

Learn LSTM in RNN

Long Short-Term Memory is claimed to be capable of predicting time series when there are long time lags of unknown size between important events. However, as of June 2015, not many clear tutorials could be found on the Internet. I am going to list a collection of materials I came across. Probably I will write a tutorial myself soon.

Wikipedia: https://en.wikipedia.org/wiki/Long_short_term_memory

Hochreiter, 1997. Long Short-Term Memory. http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf. This seems to be the very first paper on LSTM in the RNN context. I can’t understand it well, however.

Felix Gers’s phd thesis. http://www.felixgers.de/papers/phd.pdf. Not very clear though.

The clearest entry-level tutorial for me: http://www.willamette.edu/~gorr/classes/cs449/lstm.html. It illustrates why LSTM is called LSTM.

Alex Graves. 2014. Generating Sequences With Recurrent Neural Networks. http://arxiv.org/pdf/1308.0850v5.pdf. This paper shows how an RNN with LSTM can be used to generate sequences.