# Parsing Functions in edgarWebR

#### 2017-10-12

New to edgarWebR 0.2.0 are functions for parsing SEC documents. While there are good R packages for XBRL processing, there is a gap when it comes to extracting information from the other document types available on EDGAR. edgarWebR currently provides functions for two of those:

• parse_submission() - Processes a raw SGML filing into its component documents. These are the ‘Complete submission text file’ on filing pages. Similar to zip files, they contain all the files included in a particular submission.
• parse_filing() - Processes a narrative filing (e.g. 10-K, 10-Q) into paragraphs annotated with part and item numbers. In a submission with many files, this is the main form.
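For context, a complete submission text file is plain text in which each component file is wrapped in `<DOCUMENT>` ... `</DOCUMENT>` tags, with one-line headers such as `<TYPE>`, `<SEQUENCE>`, `<FILENAME>` and `<DESCRIPTION>`, followed by the file's content between `<TEXT>` and `</TEXT>`. The sketch below is a hypothetical, simplified splitter in base R that illustrates this layout. It is not how parse_submission() is implemented; the function name `split_submission()` and the sample text are made up for illustration, and it assumes well-formed input and does not decode uuencoded content.

```r
# Hypothetical sketch only: NOT edgarWebR's parse_submission().
# Splits an EDGAR-style complete submission text file into one row per
# <DOCUMENT> block, using the same column names parse_submission() reports.
split_submission <- function(raw) {
  # Each component file sits between <DOCUMENT> and </DOCUMENT>;
  # (?s) lets '.' match newlines and '.*?' stops at the first closing tag
  docs <- regmatches(raw, gregexpr("(?s)<DOCUMENT>.*?</DOCUMENT>", raw,
                                   perl = TRUE))[[1]]

  # Value of a one-line header tag such as "<TYPE>10-Q" (NA if absent)
  header_value <- function(doc, tag) {
    open <- paste0("<", tag, ">")
    hit <- regmatches(doc, regexpr(paste0(open, "[^\n]*"), doc))
    if (length(hit) == 0) NA_character_ else trimws(sub(open, "", hit, fixed = TRUE))
  }
  headers <- function(tag) {
    vapply(docs, header_value, character(1), tag = tag, USE.NAMES = FALSE)
  }

  data.frame(
    SEQUENCE = headers("SEQUENCE"),
    TYPE = headers("TYPE"),
    DESCRIPTION = headers("DESCRIPTION"),
    FILENAME = headers("FILENAME"),
    # Everything between <TEXT> and </TEXT>, minus the surrounding newlines
    TEXT = sub("(?s).*<TEXT>\n?(.*?)\n?</TEXT>.*", "\\1", docs, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

# Made-up two-document submission for illustration
raw <- paste(
  "<SEC-DOCUMENT>",
  "<DOCUMENT>", "<TYPE>10-Q", "<SEQUENCE>1", "<FILENAME>example10q.htm",
  "<DESCRIPTION>10-Q", "<TEXT>", "<HTML>quarterly report</HTML>", "</TEXT>",
  "</DOCUMENT>",
  "<DOCUMENT>", "<TYPE>EX-31.1", "<SEQUENCE>2", "<FILENAME>exhibit311.htm",
  "<DESCRIPTION>EX-31.1", "<TEXT>", "<HTML>certification</HTML>", "</TEXT>",
  "</DOCUMENT>",
  "</SEC-DOCUMENT>",
  sep = "\n"
)
parsed <- split_submission(raw)
parsed[, c("SEQUENCE", "TYPE", "FILENAME")]
```

The real files are far larger and may contain non-text documents, which is one reason to rely on parse_submission() rather than a hand-rolled regex.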

This vignette will show how to use both functions to find the risks reported in a company’s recent filing.

## Find a Submission

Using edgarWebR functions, we’ll first look up a recent filing.

library(edgarWebR)
ticker <- "STX"

filings <- company_filings(ticker, type = "10-Q", count = 40)
# Specifying the type provides all forms that start with 10-, so we need to
# manually filter.
filings <- filings[filings$type == "10-Q", ]
# We're only interested in a particular filing
filing <- filings[filings$filing_date == "2017-10-27", ]
filing$md_href <- paste0("[Link](", filing$href, ")")
knitr::kable(filing[, c("type", "filing_date", "accession_number", "size",
"md_href")],
col.names = c("Type", "Filing Date", "Accession No.", "Size", "Link"),
digits = 2,
format.args = list(big.mark = ","))
| | Type | Filing Date | Accession No. | Size | Link |
|---|---|---|---|---|---|
| 7 | 10-Q | 2017-10-27 | 0001193125-17-323042 | 6 MB | Link |
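The exact-match filter above matters because a type query can return related forms as well, such as amended quarterly reports filed as 10-Q/A. A small offline illustration, using a hypothetical one-column data frame in place of real company_filings() output:

```r
# Hypothetical stand-in for company_filings() output: only the `type`
# column matters here, and the values are invented for illustration
filings <- data.frame(type = c("10-Q", "10-Q/A", "10-Q", "10-Q/A"),
                      stringsAsFactors = FALSE)

# An exact match keeps only the original quarterly reports
exact <- filings[filings$type == "10-Q", , drop = FALSE]
# A prefix match would also keep the amendments
prefix <- filings[startsWith(filings$type, "10-Q"), , drop = FALSE]

nrow(exact)   # 2
nrow(prefix)  # 4
```

`drop = FALSE` is only needed because this toy data frame has a single column; the real filings data frame has many.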

## Get the Complete Submission File

We’ll next get the list of files and find the link to the complete submission.

docs <- filing_documents(filing$href)
doc <- docs[docs$description == 'Complete submission text file', ]
doc$md_href <- paste0("[Link](", doc$href, ")")

knitr::kable(doc[, c("seq", "description", "document", "size",
                     "md_href")],
             col.names = c("Sequence", "Description", "Document", "Size",
                           "Link"),
             digits = 2,
             format.args = list(big.mark = ","))
| | Sequence | Description | Document | Size | Link |
|---|---|---|---|---|---|
| 12 | NA | Complete submission text file | 0001193125-17-323042.txt | 6,983,971 | Link |

Normally, we would use filing_documents() to get to the 10-Q directly, but here we’ll use the complete submission file to demonstrate the parse_submission() function. You would want the complete submission file if you need the full list of files (in this case there are more than 80 files in the submission, but only 10 are listed on the website and therefore available to filing_documents()), or if efficiency matters because you are downloading all of the documents anyway, since this one file contains them all.

## Parse the Complete Submission File

Now that we have the link to the complete submission file, we can parse it into components.

parsed_docs <- parse_submission(doc$href)
knitr::kable(head(parsed_docs[, c("SEQUENCE", "TYPE", "DESCRIPTION", "FILENAME")]),
             col.names = c("Sequence", "Type", "Description", "Document"),
             digits = 2,
             format.args = list(big.mark = ","))

| Sequence | Type | Description | Document |
|---|---|---|---|
| 1 | 10-Q | 10-Q | d432283d10q.htm |
| 2 | EX-10.1 | EX-10.1 | d432283dex101.htm |
| 3 | EX-10.3 | EX-10.3 | d432283dex103.htm |
| 4 | EX-10.4 | EX-10.4 | d432283dex104.htm |
| 5 | EX-31.1 | EX-31.1 | d432283dex311.htm |
| 6 | EX-31.2 | EX-31.2 | d432283dex312.htm |

For comparison, here is the end of the full list. Note, for instance, the Excel file (Financial_Report.xlsx), which isn’t available on the SEC site.

knitr::kable(tail(parsed_docs[, c("SEQUENCE", "TYPE", "DESCRIPTION", "FILENAME")]),
             col.names = c("Sequence", "Type", "Description", "Document"),
             digits = 2,
             format.args = list(big.mark = ","))

| | Sequence | Type | Description | Document |
|---|---|---|---|---|
| 82 | 82 | XML | IDEA: XBRL DOCUMENT | R65.htm |
| 83 | 83 | EXCEL | IDEA: XBRL DOCUMENT | Financial_Report.xlsx |
| 84 | 84 | XML | IDEA: XBRL DOCUMENT | Show.js |
| 85 | 85 | XML | IDEA: XBRL DOCUMENT | report.css |
| 86 | 87 | XML | IDEA: XBRL DOCUMENT | FilingSummary.xml |
| 87 | 89 | ZIP | IDEA: XBRL DOCUMENT | 0001193125-17-323042-xbrl.zip |

The 10-Q filing document is sequence 1, and the full text of each document is in the TEXT column.

# NOTE: the filing document is not always #1, so it is safer to select it by
# its type and description rather than by sequence number
filing_doc <- parsed_docs[parsed_docs$TYPE == '10-Q' &
                          parsed_docs$DESCRIPTION == '10-Q', 'TEXT']
substr(filing_doc, 1, 80)
#> [1] "<HTML><HEAD>\n<TITLE>10-Q</TITLE>\n</HEAD>\n <BODY BGCOLOR=\"WHITE\">\n<h5 align=\"left"

We can see that this contains the raw document. For document types that are not plain text, e.g. the XBRL zip file, the content is uuencoded and would need further processing.

## Parse the Filing Document

Fortunately, edgarWebR functions that take URLs also accept a string containing the document. So while we could have passed parse_filing() the URL of the online document, we can simply pass in the full string we already have.

doc <- parse_filing(filing_doc, include.raw = TRUE)
unique(doc$part.name)
#> [1] ""        "PART I"  "PART II"
unique(doc$item.name)
#>  [1] ""
#>  [2] "ITEM 1. FINANCIAL STATEMENTS"
#>  [3] "ITEM 2. MANAGEMENT'S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS OF OPERATIONS"
#>  [4] "ITEM 3. QUANTITATIVE AND QUALITATIVE DISCLOSURES ABOUT MARKET RISK"
#>  [5] "ITEM 4. CONTROLS AND PROCEDURES"
#>  [6] "ITEM 1. LEGAL PROCEEDINGS"
#>  [7] "ITEM 1A. RISK FACTORS"
#>  [8] "ITEM 2. UNREGISTERED SALES OF EQUITY SECURITIES AND USE OF PROCEEDS"
#>  [9] "ITEM 3. DEFAULTS UPON SENIOR SECURITIES"
#> [10] "ITEM 4. MINE SAFETY DISCLOSURES"
#> [11] "ITEM 5. OTHER INFORMATION"
#> [12] "ITEM 6. EXHIBITS"
head(doc[grepl("market risk", doc$item.name, ignore.case = TRUE), "text"], 3)
#> [1] "ITEM 3. QUANTITATIVE AND QUALITATIVE DISCLOSURES ABOUT MARKET RISK"
#> [2] "We have exposure to market risks due to the volatility of interest rates, foreign currency exchange rates, credit rating changes, equity and bond markets. A portion of these risks may be hedged, but fluctuations could impact our results of operations, financial position and cash flows."
#> [3] "Interest Rate Risk. Our exposure to market risk for changes in interest rates relates primarily to our investment portfolio. As of September 29, 2017, we had no available-for-sale securities that had been in a continuous unrealized loss position for a period greater than 12 months. The Company determined no available-for-sale securities were other-than-temporarily impaired as of September 29, 2017. We currently do not use derivative financial instruments in our investment portfolio."
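The part.name and item.name columns used in these filters tag every paragraph with the part and item it falls under. One simple way to produce that kind of annotation is to detect heading paragraphs with a regular expression and carry the most recent heading forward. The sketch below shows that idea only; it is a hypothetical illustration, not how parse_filing() works internally, and `annotate_items()` and the sample paragraphs are made up. It assumes headings are upper case and start their paragraph, as in the output above.

```r
# Hypothetical sketch only: NOT how parse_filing() works internally.
# Tags each paragraph with the most recent PART / ITEM heading before it.
annotate_items <- function(paragraphs) {
  n <- seq_along(paragraphs)
  # Assumes upper-case headings at the start of a paragraph
  is_part <- grepl("^PART [IVX]+", paragraphs)
  is_item <- grepl("^ITEM [0-9]+[A-Z]?\\.", paragraphs)

  # Position of the latest heading at or before each paragraph (0 = none yet)
  part_idx <- cummax(ifelse(is_part, n, 0))
  item_idx <- cummax(ifelse(is_item, n, 0))
  # An item heading from an earlier part does not carry into a new part
  item_idx[item_idx < part_idx] <- 0

  label <- function(idx) ifelse(idx == 0, "", paragraphs[pmax(idx, 1)])
  data.frame(part.name = label(part_idx), item.name = label(item_idx),
             text = paragraphs, stringsAsFactors = FALSE)
}

paras <- c("Cover page", "PART I", "ITEM 1. FINANCIAL STATEMENTS",
           "Balance sheet text", "PART II", "ITEM 1A. RISK FACTORS",
           "Risk factor text")
annotated <- annotate_items(paras)
# Same style of filter as above, applied to the toy annotation
annotated[grepl("risk factors", annotated$item.name, ignore.case = TRUE), "text"]
```

Resetting the item when a new part begins mirrors the output above, where item numbers restart in PART II (e.g. both parts have an ITEM 1).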
Now the document is ready for whatever further processing we want. As a quick example, we’ll pull out all the italicized risk headings.

risks <- doc[grepl("market risk", doc$item.name, ignore.case = TRUE), "raw"]
# Keep paragraphs containing italics, then drop everything outside <i>...</i>
risks <- risks[grep("<i>", risks)]
risks <- gsub("^.*<i>|</i>.*$", "", risks)
risks <- gsub("\n", " ", risks)
risks
#> [1] "Interest Rate Risk"             "Foreign Currency Exchange Risk"
#> [3] "Derivatives and Hedging. "      "Other Market Risks"
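To see what each step of that pipeline does without downloading the filing, here is the same sequence of grep() and gsub() calls applied to a few made-up raw paragraphs (hypothetical HTML, not taken from the STX filing):

```r
# Hypothetical raw paragraphs in the style of the `raw` column
raw <- c(
  "<p>We have exposure to market risks.</p>",
  "<p><i>Interest Rate\nRisk</i>. Our exposure relates to our portfolio.</p>",
  "<p><b>Note:</b> <i>Other Market Risks</i> are described below.</p>"
)

risks <- raw[grep("<i>", raw)]              # keep paragraphs with italics
risks <- gsub("^.*<i>|</i>.*$", "", risks)  # drop text before <i> and from </i> on
risks <- gsub("\n", " ", risks)             # join headings split across lines
risks
# c("Interest Rate Risk", "Other Market Risks")
```

Because `^.*<i>` is greedy, a paragraph with several italic runs keeps only the last one; that is fine when each paragraph carries a single italicized heading, as in the filing above.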

This is a fairly simplistic example, but it should serve as a good starting point for processing filings.

To install the released version of edgarWebR from CRAN, or the development version from GitHub:

install.packages("edgarWebR")
# install.packages("devtools")
devtools::install_github("mwaldstein/edgarWebR")