## Text Mining Question Bank

**1. Natural Language Processing**

- Give 5 examples each of holonyms, hyponyms, hypernyms, metonyms, meronyms, homonyms, synonyms, and polysemes.
- Draw the Venn diagram of Spellings-Meanings-Pronunciations.
- Why are context-free grammars context-free?
- What is the difference between an RTN and an ATN?
- Give examples of Prepositional Phrases.
- Compare CFG and ATN.
- Give 5 examples each of anaphora, cataphora, endophora, and exophora.
- Give 5 examples each of NP ellipsis and VP ellipsis.
- Write a CFG and an ATN for each of the following:
  - “Tech Companies queue up for Open Source Professionals”.
  - I love my language.
  - Patriotism is not about watching cricket matches together.
  - AMD’s microcode is more richer than Intel.
  - Ron Weasley should marry Hermione Granger.
  - Krishna is a metonym for uncertainty.
  - PMPO is 8 times that of RMS power measured for a 1KHz signal with an amplitude of 1V.

- What are the named entities in each of the following?
  - “Open Source helps Life Spring Hospitals”
  - I want to work for Burning Glass Technologies Inc.
  - The university life at SRM is very informal.
  - AMD Phenom 5500 Black Edition can be unleashed to 4 cores.
  - Hail Hitler!
  - Anushka is taller than Surya.

- Do NP chunking on each of the following:
  - Tips and Tools for measuring the world and beating the odds
  - The crazy frog is an awesome song
  - Time flies like an arrow.
  - Thevaram was written by Appar.
  - Text mining is awfully interesting.
  - I need to get placed in a good company.

- Write a regular expression for replacing the beginning and end of all the lines in a text file with the strings “ ” and “ ” respectively.
- Write a regular expression for capturing Indian mobile numbers, landline numbers, and Indian PIN codes with the maximum possible inherent validation.
- Write a regular expression for capturing the vehicle numbers, PAN numbers, and passport numbers in a newspaper article.
- Identify rules for capturing dates and discriminating between job dates, education dates, and dates of birth.
- Give examples of noun stemming in English and in one of {Tamil, Telugu, Hindi}. Transliterate the Indian-language examples.
- Give examples of verb stemming in English and in one of {Tamil, Telugu, Hindi}. Transliterate the Indian-language examples.
- How does a spell checker work?
- Take some arbitrary texts and summarize each of them into a line or two. Justify your choice of words and sentences in the summary.
- Show some examples for word-by-word, sentence-by-sentence, context-by-context machine translation.
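
As a starting point for the regular-expression questions above, here is a minimal Python sketch covering only mobile numbers and PIN codes. It is not a model answer. The format rules are assumptions made for illustration: a PIN code is taken to be six digits not starting with 0, and a mobile number is taken to be ten digits starting with 6 to 9, optionally prefixed by `+91` or `0`. The names `PIN_RE`, `MOBILE_RE`, and the sample `text` are hypothetical.

```python
import re

# Assumption: a PIN code is exactly six digits and never starts with 0.
# The lookarounds stop the pattern from matching inside a longer digit run.
PIN_RE = re.compile(r"(?<!\d)[1-9]\d{5}(?!\d)")

# Assumption: a mobile number is ten digits starting with 6-9,
# optionally preceded by the prefix +91 (with a space or hyphen) or 0.
# The capturing group keeps only the ten-digit subscriber number.
MOBILE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?|0)?([6-9]\d{9})(?!\d)")

# Made-up sample text, not taken from any real article.
text = ("Call +91 9876543210 or 09123456789. Office PIN: 600036. "
        "Invalid: 012345 and 5123456789.")

mobiles = MOBILE_RE.findall(text)
pins = PIN_RE.findall(text)

print(mobiles)
print(pins)
```

The "inherent validation" here comes from the character classes and the digit-boundary lookarounds. Landline numbers, vehicle numbers, PAN numbers, and passport numbers are left out because their formats would need further assumptions.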

**2. Information Extraction & Statistical NLP**

- If Prob(A) is 0.4 and Prob(B) is 0.6, what are Prob(A, B), Prob(A | B), Prob(A ∪ B), Prob(A - B), and Prob(A ∩ B)? If some data is missing, assume a reasonable value for it.
- Let A be a random variable with instances a1, a2, a3, a4, a5. If P(a1) = 1.8e-4, P(a2) = 5.2e-8, P(a3) = 0.042, P(a4) = 0.00052, and P(a5) = 0.2, compute the sum Σ P(A) and the product Π P(A) without mathematical underflow.
- Give real-life examples of first-order Markov processes.
- Give real life examples of Expectation-Maximization.
- If p = [[0.1 0.3 0.2 0.4], [0.3 0.4 0.2 0.1], [0.3 0.3 0.1 0.3], [0.2 0.4 0.1 0.3]] is the state transition probability matrix for the 4 states {A, B, C, D} of an HMM, calculate P(A->B->C->D).
- Based on question 5 of this section, check whether the probability of a state sequence is commutative (e.g., is P(A->B->C) = P(C->B->A)?).
- Let the observation probabilities be [[.2 .4 .1 .3], [.6 .1 .0 .3], [.0 .0 .0 1.0], [.1 .1 .1 .7], [.4 .4 .1 .1]] for the observations {i, j, k, l, m} in the states of question 5. Compute P(O = {k, l}).
- Annotate the items in question 9 of Section 1 and build the state transition, observation, and initial probability matrices.
- Show that using forward probabilities reduces the time complexity of the evaluation problem.
- Show that using forward-backward probabilities reduces the time complexity of the decoding problem.
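
For the underflow question in this section, one common approach is to work in log space: a product becomes a sum of logarithms, and a sum can be computed with the log-sum-exp trick. The Python sketch below uses the five probabilities given in that question. The helper names `log_product` and `log_sum` are hypothetical, and this is only one possible approach, not the expected answer.

```python
import math

def log_product(probs):
    """Return log(p1 * p2 * ... * pn), computed as a sum of logs."""
    return sum(math.log(p) for p in probs)

def log_sum(log_probs):
    """Return log(exp(x1) + ... + exp(xn)) using the log-sum-exp trick."""
    m = max(log_probs)  # factor out the largest term to keep exp() in range
    return m + math.log(sum(math.exp(x - m) for x in log_probs))

# Probabilities a1..a5 as given in the question.
probs = [1.8e-4, 5.2e-8, 0.042, 0.00052, 0.2]
log_probs = [math.log(p) for p in probs]

log_pi = log_product(probs)     # log of the product
log_sigma = log_sum(log_probs)  # log of the sum

print(log_pi, log_sigma)
```

With only five values, the direct product is still representable as a float. The log-space form matters when many small probabilities are multiplied, as happens along a long HMM state sequence.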
