# Text Mining with MATLAB®

## Rafael E. Banchs

Language: English

Pages: 356

ISBN: 1461441501

*Text Mining with MATLAB* provides a comprehensive introduction to text mining using MATLAB. It is designed to help text mining practitioners, as well as readers with little or no experience in text mining, become familiar with MATLAB and its more complex applications.

The first part provides an introduction to basic procedures for handling and operating on text strings. It then reviews the major mathematical modeling approaches, describing statistical and geometrical models along with the main dimensionality reduction methods. Finally, it presents specific applications such as document clustering, classification, search, and terminology extraction.

All descriptions are supported by fully reproducible practical examples. Further reading, additional exercises, and projects are proposed at the end of each chapter for readers interested in conducting further experimentation.

```
This is the 1st string
>> list{2}   % retrieves the second string
ans =
This is the 2nd one
>> list{3}   % retrieves the third string
ans =
And the 3rd
```

Additionally, each character or substring within the strings can be retrieved in the same way as with character arrays:

```
>> % retrieves the first four characters in the first string
>> list{1}(1:4)
ans =
This
>> % retrieves the last three characters in the second string
>> list{2}(end-2:end)
ans =
one
```
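The retrievals above can be reproduced in a self-contained script. The contents of `list` are reconstructed from the outputs shown in the transcript; the comparison between `()` and `{}` indexing and the `cellfun` call are illustrative additions, not part of the original example.

```matlab
% Cell array of strings, reconstructed from the outputs shown above
list = {'This is the 1st string', 'This is the 2nd one', 'And the 3rd'};

second = list{2};            % curly braces return the char array itself
asCell = list(2);            % parentheses return a 1x1 cell containing it

firstFour = list{1}(1:4);          % 'This'
lastThree = list{2}(end-2:end);    % 'one'

% Unlike rows of a char matrix, strings in a cell array may differ in length
lens = cellfun(@length, list);     % [22 19 11]
```

The distinction between `list(2)` and `list{2}` matters in practice: only the brace form yields a character array that can be indexed with `(1:4)` or `(end-2:end)`.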

| Expression | Type | Description |
|---|---|---|
| `\<rexp` | Anchor | Matches `rexp` if it occurs at the start of a word |
| `rexp\>` | Anchor | Matches `rexp` if it occurs at the end of a word |
| `^rexp` | Anchor | Matches `rexp` if it occurs at the start of the string |
| `rexp$` | Anchor | Matches `rexp` if it occurs at the end of the string |
| `rexp(?=test)` | Look-ahead | Matches `rexp` if it is followed by expression `test` |
| `rexp(?!test)` | Look-ahead | Matches `rexp` if it is not followed by expression `test` |
| `(?<=test)rexp` | Look-behind | Matches `rexp` if it is preceded by expression `test` |
| `(?<!test)rexp` | Look-behind | Matches `rexp` if it is not preceded by expression `test` |
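The operators in the table can be tried out with `regexp`, which by default returns the starting index of each match. The sketch below applies each operator to the sample string 'This is SECTION 4.1'; the variable names are hypothetical and chosen only for readability.

```matlab
% Sample string; regexp is case-sensitive by default
str = 'This is SECTION 4.1';

% Plain match: 'is' inside 'This' and the standalone word 'is'
plain = regexp(str, 'is');                 % [3 6]

% \<rexp : 'is' only where it starts a word
wordStart = regexp(str, '\<is');           % 6

% ^rexp and rexp$ : anchored to the start / end of the whole string
atStart = regexp(str, '^This');            % 1
atEnd   = regexp(str, '\d$');              % 19

% rexp(?=test) : 'SECTION' only when followed by a space and a digit
ahead = regexp(str, 'SECTION(?= \d)');     % 9

% rexp(?!test) : 'is' only when NOT followed by ' SECTION'
notAhead = regexp(str, 'is(?! SECTION)');  % 3

% (?<=test)rexp and (?<!test)rexp : digits preceded / not preceded by '.'
behind    = regexp(str, '(?<=\.)\d');      % 19
notBehind = regexp(str, '(?<!\.)\d');      % 17
```

Note that the look-around part (`test`) constrains the match without being included in it, which is why the returned indexes point at `rexp` itself.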

the starting indexes of the occurrences found, if any. If no occurrences are found, the function returns an empty array. Consider, for instance, the following string:

```
>> string = 'This is SECTION 4.1';
```

Now, suppose we want to match occurrences of the character sequence 'is'. We can do so by using the function strfind as follows:

```
>> indexes = strfind(string,'is')
indexes =
     3     6
```

We can also determine the resulting number of occurrences of a pattern within a text string
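Since `strfind` returns one index per occurrence, counting occurrences reduces to counting the returned indexes. A minimal sketch follows; the variable is renamed `str` (a hypothetical choice) to avoid shadowing MATLAB's built-in `string` function, and the case-insensitive and overlapping-match lines are illustrative additions.

```matlab
str = 'This is SECTION 4.1';     % same text as the 'string' variable above

indexes = strfind(str, 'is');    % [3 6]
count = numel(indexes);          % 2 occurrences; numel of an empty result is 0

% strfind is case-sensitive, so lower-case the text for a case-insensitive count
ciCount = numel(strfind(lower(str), 'section'));   % 1

% No match returns an empty array
none = strfind(str, 'xyz');      % isempty(none) is true

% strfind reports overlapping occurrences
overlap = strfind('abababa', 'aba');   % [1 3 5]
```

Using `numel` rather than testing the result directly handles the no-match case uniformly, since the empty array simply yields a count of zero.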

experimental dataset for the sake of clarity and to keep the mathematical formulation at a basic level. Refer to the further reading section for references on other commonly used topic models.

$$p(z|d) = \frac{1}{\gamma}\, p(d|z)\, p(z) \approx \frac{1}{\gamma}\, p(z) \prod_{n} p(w_n|z) \tag{7.34}$$

where $1/\gamma$ is a normalization factor that ensures that the resulting values of $p(z|d)$ constitute actual probabilities, i.e. $\sum_z p(z|d) = 1$. In the second step (M-step), we estimate new values
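Equation (7.34) can be evaluated numerically for a single document. In the sketch below the topic prior, word distributions and document are made-up values for illustration only. The product over words is computed as a sum of logarithms, a common precaution (not stated in the text) against numerical underflow for long documents; subtracting the maximum before exponentiating leaves the normalized result unchanged.

```matlab
% Hypothetical toy model: 2 topics, 3-word vocabulary
pz  = [0.5 0.5];          % prior p(z), one column per topic
pwz = [0.6 0.1;           % p(w|z): rows are words w1..w3, columns are topics
       0.3 0.2;           % each column sums to 1
       0.1 0.7];
doc = [1 1 2];            % document d = "w1 w1 w2" as word indices

% Unnormalized log posterior: log p(z) + sum_n log p(w_n|z)
logp = log(pz) + sum(log(pwz(doc, :)), 1);

% Normalize (the 1/gamma factor); subtract the max first to avoid underflow
logp = logp - max(logp);
pzd  = exp(logp) / sum(exp(logp));   % p(z|d), sums to 1
```

For these values the unnormalized scores are 0.054 and 0.001, so $\gamma = 0.055$ and topic 1 receives almost all of the posterior mass.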

vector space model, as they clearly imply that this kind of

Document collection:

- da: w1 w1 w1
- db: w2 w1 w2 w2 w1
- dc: w2 w2
- de: w3 w3 w3 w2 w3 w2
- df: w3 w3 w3
- dg: w3 w1 w1
- dh: w1 w3 w1 w2 w2

Term-document matrix:

| | da | db | dc | de | df | dg | dh |
|----|----|----|----|----|----|----|----|
| w1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| w2 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| w3 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |

[Document space: documents da to dh plotted as points in the space spanned by the axes w1, w2 and w3.]

Fig. 8.1 Illustrative example of the document space for a sample document collection
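The term-document matrix of Fig. 8.1 can be derived from its document collection. The sketch below assumes whitespace-separated tokens and uses hypothetical variable names. It first counts term occurrences; because the matrix in the figure contains only 0s and 1s even for documents that repeat a term (e.g. da contains w1 three times), it then binarizes the counts to presence/absence.

```matlab
vocab = {'w1', 'w2', 'w3'};
docs  = {'w1 w1 w1', 'w2 w1 w2 w2 w1', 'w2 w2', 'w3 w3 w3 w2 w3 w2', ...
         'w3 w3 w3', 'w3 w1 w1', 'w1 w3 w1 w2 w2'};   % documents da ... dh

% Raw counts: rows are terms, columns are documents
counts = zeros(numel(vocab), numel(docs));
for j = 1:numel(docs)
    tokens = strsplit(docs{j}, ' ');
    for i = 1:numel(vocab)
        counts(i, j) = sum(strcmp(tokens, vocab{i}));
    end
end

% Binary (presence/absence) weighting, as in the figure
tdm = double(counts > 0);
```

Keeping the raw `counts` alongside `tdm` makes it easy to switch to frequency-based weighting, which places documents such as da and dh at different positions along the same axes.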