A few general questions about core and pattern data

Guanting_Yi · May 3, 2022, 1:07pm

Hello SafeGraph, I have a few questions about some features of the data, and I would appreciate your help. The questions are:

How did SafeGraph sample the cellphone signals? Did SafeGraph randomly sample 10% of devices nationwide, or did it use stratified sampling at some geographic level?
How does SafeGraph assign the category tags to POIs in the core dataset? Are the tags generated by an algorithm based on the name or other information, or are they assigned manually by SafeGraph staff?

Thanks!
Guanting

Jeff_Ho_SafeGraph · May 3, 2022, 3:18pm

The panel is not obtained through stratified sampling. See this FAQ for more details.

That said, we compared our sampling rate by geography to the census data here, so you can get a sense of the representativeness of the panel.
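To make the comparison concrete, here is a minimal sketch of how a per-geography sampling rate could be computed and compared against the overall rate. The function name, the dictionary inputs, and the counts are all hypothetical and made up for illustration; they are not SafeGraph's actual schema or method. In practice the device counts would come from the panel data and the population counts from the census.

```python
def sampling_rates(devices_by_geo, population_by_geo):
    """Return (per-geography rates, overall rate) as devices / population.

    Only geographies present in both inputs with a nonzero population
    are included. Both arguments are plain dicts keyed by a geography ID.
    """
    rates = {}
    total_devices = 0
    total_population = 0
    for geo, devices in devices_by_geo.items():
        population = population_by_geo.get(geo)
        if not population:  # skip missing or zero-population areas
            continue
        rates[geo] = devices / population
        total_devices += devices
        total_population += population
    overall = total_devices / total_population if total_population else 0.0
    return rates, overall


# Hypothetical, made-up counts for illustration only
devices = {"A": 50, "B": 120, "C": 8}
population = {"A": 1000, "B": 1500, "C": 400}

rates, overall = sampling_rates(devices, population)
for geo in sorted(rates):
    # A ratio near 1 means the area is sampled at about the overall rate;
    # well below 1 means under-sampled, well above 1 means over-sampled.
    print(geo, round(rates[geo], 3), round(rates[geo] / overall, 2))
```

If the ratios vary widely across geographies, weighting by the inverse of the local sampling rate is one common way to adjust counts, though whether that is appropriate depends on the analysis.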

Category tags are assigned through a mix of algorithms and fixed tags (e.g., for well-known brands). See more detail here

Guanting_Yi · May 3, 2022, 8:48pm

Thanks Jeff! Could you elaborate a little more about how the algorithms assign the category tags in general? Thank you!

Jeff_Ho_SafeGraph · May 3, 2022, 9:38pm

I can share only what’s on the docs page: the assignment depends on the POI name and other descriptive metadata.

Guanting_Yi · May 4, 2022, 3:24pm

I am not sure what metadata means in this context. Could you share a bit more? Thank you.

Jeff_Ho_SafeGraph · May 5, 2022, 11:37pm

This may be a helpful document, which outlines SafeGraph’s sourcing methodology: SafeGraph's Data Sourcing Process

In this case, the metadata is taken from POI descriptors across the various sources. I can’t provide much more context, since it would be too specific to individual sources; however, based on the methodology above, you can probably infer what some of the metadata features would be.