MoRe than woRds, Text and Context: Language Analytics in Finance with R

Sanjiv R. Das, Karthik Mokashi - Santa Clara University

Post-tutorial notes

The materials used in the tutorial are available here.

Tutorial Description

This tutorial surveys the technology and empirics of text analytics with a focus on finance applications. We present various tools of information extraction and basic text analytics. We survey a range of techniques of classification and predictive analytics, and metrics used to assess the performance of text analytics algorithms. We then review the literature on text mining and predictive analytics in finance, and its connection to networks, covering a wide range of text sources such as blogs, news, web posts, corporate filings, etc. We end with textual content presenting forecasts and predictions about future directions. The tutorial will use the R programming language throughout and present many hands-on examples.

Tutorial Outline

The following major topics will be covered, time permitting:

What is Text Mining? Data and Algorithms.
Text Extraction.
String handling and Parsing. The stringr package.
Text Cleaning.
Using the XML package.
Text Response to News.
Text Handling, and Regular Expressions.
Text Mining Using the tm package. Stopwords, Stemming, etc.
Using Term Frequency - Inverse Document Frequency (TF-IDF).
The wordcloud package.
Extracting Text from Web Sources with APIs: Twitter, Facebook, Yelp, LinkedIn, etc.
Functions on Text.
Cosine Similarity for Text Analysis.
Dictionaries and Lexicons.
Mood Scoring, using Harvard Inquirer.
Text Classification. Bayes Classifier, SVMs, Fisher's Discriminant, using Adjectives and Adverbs, Vector-Distance Classifier, etc.
Metrics: Confusion Matrix, Accuracy, False Positives, Disagreement measure, Precision, Recall, etc.
Grading Text: Readability, etc.
Text Summarization.
Text Mining Applications in Finance: stock trading with Tweets, Predicting Earnings, Commercial applications, Business Applications for Finance Firms.
Using Text Analytics to construct Financial Networks.
Latent Semantic Analysis.
Topic Analysis: Latent Dirichlet Allocation.
Using the rvest package.
Using the RSelenium package.
The future?
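
As a small illustration of two of the topics above, TF-IDF weighting and cosine similarity, here is a minimal sketch in base R. It is not taken from the tutorial materials: the three toy documents are invented, the tokenizer is a simple split on non-letters, and the IDF form log(N / df) is one common variant that we assume here. The tm package covered in the tutorial offers its own weighting functions, which may use a different normalization.

```r
# Toy corpus (invented for illustration only)
docs <- c(
  d1 = "stock prices rose after strong earnings",
  d2 = "earnings were strong and stock prices rose",
  d3 = "the central bank left interest rates unchanged"
)

# Tokenize: lowercase, then split on runs of non-letters
tokens <- strsplit(tolower(docs), "[^a-z]+")
vocab  <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per term, one column per document
tf <- vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
)
rownames(tf) <- vocab
colnames(tf) <- names(docs)

# Inverse document frequency (assumed variant: log(N / df))
idf   <- log(length(docs) / rowSums(tf > 0))
tfidf <- tf * idf   # idf is recycled down each column, i.e. per term

# Cosine similarity between two weighted document vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

sim_12 <- cosine(tfidf[, "d1"], tfidf[, "d2"])  # documents sharing most terms
sim_13 <- cosine(tfidf[, "d1"], tfidf[, "d3"])  # documents sharing no terms
print(round(c(d1_d2 = sim_12, d1_d3 = sim_13), 3))
```

Documents d1 and d2 share most of their words, so their similarity is positive, while d1 and d3 share none and score exactly zero. Terms that appear in every document would receive an IDF of zero under this variant and drop out of the comparison.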

Background Knowledge

This tutorial is useful for both new and advanced R users who are interested in specific aspects of text mining. Attendees will learn various techniques and terminology related to this area of data science.

Since this will be hands-on, come with your WiFi-ready laptop and work with us as we explain the various concepts. The program files will be made available online at the tutorial. You will need an installation of R, of course. RStudio, as always, is helpful to have. And if possible, download the packages listed above and install them.
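
One way to check which packages still need installing is sketched below. The package list is our reading of the packages named in the outline (stringr, XML, tm, wordcloud, rvest, RSelenium) and may not be the complete set used in the tutorial; the install call is left commented out so that the snippet only reports what is missing.

```r
# Packages named in the tutorial outline (assumed list; adjust as needed)
pkgs <- c("stringr", "XML", "tm", "wordcloud", "rvest", "RSelenium")

# requireNamespace() returns TRUE if a package is installed and loadable
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)

# Uncomment to install whatever is missing from CRAN:
# install.packages(missing_pkgs)
```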

Download Materials

Materials for the tutorial may be downloaded here.