City of Raleigh Budget Sentiment Analysis

Posted on April 29, 2018 | 1 minute read

Package Import

Load the necessary packages.

library(tidyverse)
library(pdftools)
library(tidytext)
library(knitr)
library(kableExtra)

Retrieve File

Download the adopted budget PDF from the City of Raleigh website, read it in as a character vector with one element per page, and delete the downloaded file from the working directory.

download.file("https://cityofraleigh0drupal.blob.core.usgovcloudapi.net/drupal-prod/COR11/FY2018AdoptedBudget20160612.pdf",
              "FY2018AdoptedBudget.pdf",
              mode = "wb")

txt = pdf_text("FY2018AdoptedBudget.pdf")

unlink("FY2018AdoptedBudget.pdf")
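The download, read, delete pattern above can also be wrapped in a small helper that works on a temporary file, so nothing is left behind even if reading fails. This is a minimal sketch rather than the code used for this post: `read_remote_file()` is a hypothetical name, and the function that reads the file is passed in as an argument.

```r
# Hypothetical helper: download `url` to a temporary file, apply `reader`
# to it, and delete the file afterwards even if `reader` throws an error.
read_remote_file = function(url, reader, fileext = ".pdf") {
  tmp = tempfile(fileext = fileext)
  on.exit(unlink(tmp), add = TRUE)
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  reader(tmp)
}

# Intended usage for this post (needs network access and pdftools),
# where budget_url is assumed to hold the URL passed to download.file() above:
# txt = read_remote_file(budget_url, pdftools::pdf_text)
```

Using `tempfile()` keeps the working directory untouched, and `on.exit()` takes the place of the explicit `unlink()` call.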

Create Data Frame

Create a character vector of page numbers, bind it to the extracted text in a data frame, and finally “unnest” the page text into individual words, one word per row.

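# One page number per element of txt (pdf_text() returns one string per page)
page = as.character(seq_along(txt))

# Keep both columns as character so unnest_tokens() can tokenize txt directly
df = data.frame(page, txt, stringsAsFactors = FALSE)

# Split each page into lowercase words, one row per word
budget_words = df %>%
  unnest_tokens(word, txt)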
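To see what the unnesting step does, here is a small sketch on two made-up "pages" instead of the budget PDF. The names `demo_txt`, `demo_df`, and `demo_words` are illustrative only.

```r
library(dplyr)
library(tidytext)

# Two made-up "pages" standing in for the output of pdf_text()
demo_txt = c("The City budget grows.", "Debt service falls.")

demo_df = data.frame(page = as.character(seq_along(demo_txt)),
                     txt = demo_txt,
                     stringsAsFactors = FALSE)

# unnest_tokens() lowercases the text, strips punctuation, and returns
# one row per word while keeping the page column
demo_words = demo_df %>%
  unnest_tokens(word, txt)

demo_words
```

The result has seven rows: four words tagged with page "1" and three with page "2".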

Cleaning

Remove stop words and save the result as a cleaned object, join the NRC sentiment lexicon to it, and count the matched words by page and sentiment.

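# Drop common stop words ("the", "of", ...) using tidytext's stop_words
cleaned = budget_words %>%
  anti_join(stop_words, by = "word")

# Keep only words found in the NRC lexicon; a word can carry several sentiments
sentiment = cleaned %>%
  inner_join(get_sentiments("nrc"), by = "word")

# Count matched words per page and sentiment
sent_count = sentiment %>%
  group_by(page, sentiment) %>%
  summarise(sent_count = n()) %>%
  ungroup() %>%
  mutate(page = as.integer(page))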
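The same three steps can be tried on toy data without downloading a lexicon. In this sketch, `toy_stop_words` and `toy_lexicon` are made-up stand-ins for `stop_words` and `get_sentiments("nrc")`, and `count()` is used as a shorthand for the group, summarise, ungroup sequence.

```r
library(dplyr)

# Made-up tokens standing in for budget_words
demo_words = tibble(
  page = c("1", "1", "1", "1", "2", "2", "2"),
  word = c("the", "debt", "tax", "grant", "of", "debt", "council")
)

# Toy stand-ins for tidytext's stop_words and get_sentiments("nrc")
toy_stop_words = tibble(word = c("the", "of"))
toy_lexicon = tibble(
  word = c("debt", "tax", "grant", "council"),
  sentiment = c("negative", "negative", "trust", "trust")
)

demo_count = demo_words %>%
  anti_join(toy_stop_words, by = "word") %>%   # drop stop words
  inner_join(toy_lexicon, by = "word") %>%     # keep words with a sentiment
  count(page, sentiment, name = "sent_count") %>%
  mutate(page = as.integer(page))

demo_count
```

Page 1 ends up with two negative words and one trust word, and page 2 with one of each. The toy lexicon gives each word a single sentiment; in the real NRC lexicon a word can appear under several sentiments, so it may be counted more than once.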

Visualize

Negative Word Table

| Word | Word Count |
|---|---|
| APPROPRIATION | 77 |
| BONDS | 50 |
| DEBT | 219 |
| EMERGENCY | 67 |
| EXPENDITURE | 123 |
| FEE | 136 |
| INCOME | 65 |
| RISK | 31 |
| TAX | 153 |
| WASTE | 122 |

Trust Word Table

| Word | Word Count |
|---|---|
| BUDGET | 400 |
| CENTER | 196 |
| COUNCIL | 173 |
| GRANT | 85 |
| IMPROVEMENT | 93 |
| MANAGEMENT | 165 |
| ORDINANCE | 88 |
| PLANNING | 77 |
| RESOURCES | 105 |
| SYSTEM | 101 |
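The code that built these tables is not shown in the post. One way a table of this shape could be produced is sketched below, assuming each table lists the most frequent words for a single sentiment, uppercased and sorted alphabetically; that is a guess from the output, and `demo_sentiment` is made-up data standing in for the `sentiment` object.

```r
library(dplyr)

# Made-up rows standing in for the `sentiment` object built earlier
demo_sentiment = tibble(
  word = c("debt", "debt", "tax", "risk", "debt", "tax", "budget", "council"),
  sentiment = c(rep("negative", 6), "trust", "trust")
)

neg_table = demo_sentiment %>%
  filter(sentiment == "negative") %>%
  count(word, sort = TRUE) %>%      # most frequent words first
  head(10) %>%                      # keep at most ten words
  mutate(word = toupper(word)) %>%
  arrange(word)                     # alphabetical order

# Render with knitr, which the post already loads
knitr::kable(neg_table, col.names = c("Word", "Word Count"))
```

Swapping `"negative"` for `"trust"` in the filter would give the corresponding trust table.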
