The *SafeGraphR* package is now fully functional and has entered beta

Exciting news for a Friday afternoon: the SafeGraphR package is now fully functional and has entered beta! The package is designed to make it easy to download, read in, and process SafeGraph patterns and stay-at-home data.

Features:

  1. Read and compile files (from AWS or from the SafeGraph Shop) into R, including the ability to expand JSON columns and aggregate to different levels while minimizing the memory footprint (plus, do multiple aggregations/expansions in a single file read to save time; see the first sketch after this list)
  2. Produce a POI-NAICS link file
  3. Perform normalization: to sample size, adjusting for sampling-rate differences, hierarchical Bayes shrinkage, seven-day moving averages, and scaling relative to a reference date or to the previous year (see the second sketch after this list)
  4. Helper functions and data sets make it easy to pull state and county FIPS codes from census block group codes, to link those FIPS codes to the actual names of the places, and to put names on the NAICS codes.
  5. You can find installation information, as well as two informative vignettes that walk you through working with patterns and stay-at-home data respectively, at the package's website: Package for Processing and Analyzing SafeGraph Data • SafeGraphR
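To give a flavor of the read-and-expand workflow, here's a minimal sketch. The function and data set names (`read_many_patterns()`, `fips_to_names`) come from the package, but treat the exact argument names here as assumptions and check the vignettes for the authoritative versions:

```r
# A minimal sketch of the patterns workflow; argument names are
# illustrative, not authoritative. See the vignettes.
library(SafeGraphR)
library(data.table)

# Read every patterns file in a (hypothetical) folder, pulling
# state/county FIPS out of the CBG codes and expanding the JSON
# visits_by_day column to one row per day, all in a single pass.
patterns <- read_many_patterns(
  dir        = 'patterns/',
  gen_fips   = TRUE,
  by         = c('state_fips', 'county_fips'),
  expand_int = 'visits_by_day'
)

# Built-in data sets put names on the codes.
data(fips_to_names)
patterns <- merge(patterns, fips_to_names,
                  by = c('state_fips', 'county_fips'))
```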
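And a similarly hedged sketch of the normalization helpers: `ma()`, `hb_shrink()`, and `scale_to_date()` are the package's names for the moving-average, shrinkage, and date-scaling steps, but the exact calls below are illustrative assumptions on a made-up county-by-day table:

```r
library(SafeGraphR)
library(data.table)

# Assume dt is a county-by-day data.table with columns
# state_fips, county_fips, date, visits, devices (all hypothetical).

# Seven-day moving average of visits within each county.
dt[, visits_ma := ma(visits, n = 7), by = .(state_fips, county_fips)]

# Hierarchical Bayes shrinkage of each county's visit rate toward
# the cross-county mean on each date, to stabilize small samples.
dt[, visits_shrunk := hb_shrink(visits, devices), by = date]

# Scale each county's smoothed series relative to a base date.
dt <- scale_to_date(dt, adj_vars = 'visits_ma',
                    date = as.Date('2020-01-15'),
                    by = c('state_fips', 'county_fips'))
```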

Let me know about any issues you run into.

Really nice work! Are the JSON expansion utilities able to handle the variable-size fields now? Thanks!

That's unfortunately not there yet, but it's on my list for an update.

10-4. What is your strategy? I’ve been lapplying over rows… inefficient, but feasible for subsets. I’d like to contribute if you find it useful.

I might try to make the expansion produce something in a mergeable format and then let merge do the work, roughly along the lines of the sketch below.
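To make that concrete, here's a rough, package-agnostic sketch of the idea in plain data.table + jsonlite: expand each row's JSON into long (key, value) pairs, then let keyed merges and aggregations do the heavy lifting instead of a per-row lapply:

```r
library(data.table)
library(jsonlite)

# Toy data: a JSON column whose keys differ from row to row,
# like visitor_home_cbgs in the patterns files (one blob per POI).
dt <- data.table(
  poi  = c('a', 'b'),
  json = c('{"420101":5,"420102":3}', '{"420102":7}')
)

# Expand every row into long (poi, key, value) format in one go.
long <- dt[, {
  parsed <- fromJSON(json)
  .(key = names(parsed), value = unlist(parsed))
}, by = poi]

# Merges and aggregations now work on the long table directly,
# with no row-by-row looping.
long[, .(total = sum(value)), by = key]
```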

K. I’ll let you know if I write anything useful. Thanks again. Also, your affiliation is changing soon, right? Congrats

That’s right, I’ll be at Seattle University. Thanks!

@Jude_Bayham_Colorado_State_U Check the most recent SafeGraphR update: it can now handle unequal categories across rows. I haven’t checked how slow it gets on the big data (unfortunately it does have to expand the data all the way before collapsing, rather than piece by piece as before), but it routes everything through data.table, so it should at least be faster than lapplying.

Great! @Nick_H-K_Seattle_University, I’ll let you know if I have any issues/comments. I’m sure it will benchmark better than my solution. Thanks!