## Parsing from JSON

There are already a couple of packages to parse JSON (e.g. rjson or jsonlite). But for me they all had some drawbacks when working with JSON columns from a database:

• bad/no vectorisation: RJSONIO and rjson silently use only the first element of the input; jsonlite collapses the input, producing invalid JSON and then complains about trailing garbage.
• handling of missing values: NA and NULL input produce unclear errors and one cannot control how to handle them.
• syntax: the camel case syntax doesn’t fit to the rest of the tidyverse.

jsontools relies on jsonlite for parsing from JSON and conversion to JSON. For parsing it has the following functions:

• parse_json() to parse a character vector of length 1 (i.e. a scalar).
• parse_json_vector() to parse a character vector of any length.

As an example assume we got the following JSON vector

library(jsontools)

customer_infos <- json2(c(
'{"name": "Peter", "age": 19, "premium": true}',
'{"name": "Daniel", "age": 41}',
NA,
'{"name": "Pablo", "age": 27, "premium": false}'
))

customer_infos
#> {"name": "Peter", "age": 19, "premium": true}
#> {"name": "Daniel", "age": 41}
#> NA
#> {"name": "Pablo", "age": 27, "premium": false}

By default the NA produces an error

parse_json_vector(customer_infos)
#> Error: input is NA.
#> To use a default value use the argument .na.

but with the .na argument we can specify which value should be used instead:

parse_json_vector(customer_infos, .na = NULL) %>% str()
#> List of 4
#>  $:List of 3 #> ..$ name   : chr "Peter"
#>   ..$age : int 19 #> ..$ premium: logi TRUE
#>  $:List of 2 #> ..$ name: chr "Daniel"
#>   ..$age : int 41 #>$ : NULL
#>  $:List of 3 #> ..$ name   : chr "Pablo"
#>   ..$age : int 27 #> ..$ premium: logi FALSE

Note that the default for simplifyDataFrame and simplifyMatrix has been changed to FALSE. While they can be quite nice for interactive parsing of JSON they do not allow you to control how the result should look like. For a more controlled conversion of the resulting list to a data frame have a look at the tibblify package.

## Conversion to JSON

• no vectorised version: One cannot easily convert every element of a vector/list to JSON and get back a JSON vector.
• convert every row of a data frame: it is not so easy and/or slow to convert every row of a data frame into JSON.

For conversion there are the following functions:

• format_json(): convert an R object to JSON.
• format_json_vector(): convert each element of an R object to JSON and return them as a JSON vector.
• format_json_rowwise(): convert each row of a data frame to JSON.
• write_json(): convert an R object to JSON and write it to disk.

Let’s look at our customer informations from above. Assume we have converted it into a tibble:

customer_infos_df <- tibble::tibble(
name = c("Peter", "Daniel", NA, "Pablo"),
age = c(19L, 41L, NA, 27L),
premium = c(TRUE, NA, NA, FALSE)
)

and filled the missing fields:

customer_infos_df[3, ] <- tibble::tibble(
name = "Michael",
age = 38,
)
customer_infos_df$premium[2] <- TRUE customer_infos_df #> # A tibble: 4 x 3 #> name age premium #> <chr> <int> <lgl> #> 1 Peter 19 TRUE #> 2 Daniel 41 TRUE #> 3 Michael 38 FALSE #> 4 Pablo 27 FALSE We can now easily convert it back into a JSON vector format_json_rowwise(customer_infos_df) #> {"name":"Peter","age":19,"premium":true} #> {"name":"Daniel","age":41,"premium":true} #> {"name":"Michael","age":38,"premium":false} #> {"name":"Pablo","age":27,"premium":false} Compare this to format_json() which turns the data frame into an array of objects: format_json(customer_infos_df) #> [{"name":"Peter","age":19,"premium":true},{"name":"Daniel","age":41,"premium":true},{"name":"Michael","age":38,"premium":false},{"name":"Pablo","age":27,"premium":false}] To format every element of a list to JSON instead of the whole list at once use format_json_list(): customer_infos_list <- parse_json_vector(customer_infos, .na = NULL) str(customer_infos_list) #> List of 4 #>$ :List of 3
#>   ..$name : chr "Peter" #> ..$ age    : int 19
#>   ..$premium: logi TRUE #>$ :List of 2
#>   ..$name: chr "Daniel" #> ..$ age : int 41
#>  $: NULL #>$ :List of 3
#>   ..$name : chr "Pablo" #> ..$ age    : int 27

format_json_list(customer_infos_list)
#> {"name":["Daniel"],"age":[41]}
#> {}
#> {"name":["Pablo"],"age":[27],"premium":[false]}

Notice that each value is wrapped in a JSON array? If you do not want this you have three possibilities:

• use json_u() on the elements you do not want in an array.
• use the argument auto_unbox = TRUE to only wrap elements of length > 1 in an array.
• put the data in a data frame.

While auto_unbox = TRUE is quite handy you should only use it in non-interactive scripts as otherwise the output JSON might change the format. For example we a value customers in a list. Usually, customers is a vector of length > 1.

x <- list(customers = 1:2)
format_json(x, auto_unbox = TRUE)
#> {"customers":[1,2]}

But in some rare cases there is only a single customer

x <- list(customers = 1)
format_json(x, auto_unbox = TRUE)
#> {"customers":1}

and instead of an array we get a scalar integer at the key “customers”.