I have some questions when exploring the monthly pattern data in 2018

Dear all, I have some questions about the 2018 monthly patterns data. I found that several parks and food stores have just one or two visits in one or two months (not January or December) and zero visits in the remaining months of the year. The visits to these POIs look suspicious to me. However, it is not easy to pinpoint a threshold value below which to drop them. For example, the yearly visits to food stores in Chicago, summed from the monthly visits, range from 1 to over 20,000 in a nearly continuous sequence (e.g., 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13… 41, 42…). So how do you decide which visits are valid, or what the acceptable minimum visit count is?
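To make the pattern described above concrete, here is a minimal sketch of how such POIs could be listed. It assumes the monthly data has already been reduced to `(poi_id, month, visits)` tuples for one year; the function names, the IDs, and the `max_active_months` cutoff are hypothetical and not part of any SafeGraph tooling.

```python
from collections import defaultdict


def summarize_poi_year(rows):
    """Aggregate (poi_id, month, visits) tuples into per-POI yearly stats.

    month is 1-12; visits is the monthly visit count for that POI.
    """
    monthly = defaultdict(lambda: [0] * 12)
    for poi_id, month, visits in rows:
        monthly[poi_id][month - 1] += visits

    summary = {}
    for poi_id, counts in monthly.items():
        summary[poi_id] = {
            "yearly_visits": sum(counts),
            # Number of months with at least one recorded visit.
            "active_months": sum(1 for c in counts if c > 0),
        }
    return summary


def flag_sparse(summary, max_active_months=2):
    """Return POI ids that have visits in at most max_active_months months."""
    return sorted(
        poi_id
        for poi_id, stats in summary.items()
        if stats["active_months"] <= max_active_months
    )


# Made-up illustration data, not real SafeGraph records.
rows = [
    ("poi_a", 3, 1), ("poi_a", 7, 2),
    ("poi_b", 1, 40), ("poi_b", 2, 35), ("poi_b", 3, 50), ("poi_b", 4, 45),
]
summary = summarize_poi_year(rows)
sparse = flag_sparse(summary)
```

Counting active months alongside the yearly total separates "few visits spread over the year" from "one or two isolated months", which a single yearly threshold cannot do.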

I’ve seen this problem come up before in the context of sports stadiums. One possible reason for it occurring is if there’s something odd with the nearby cell tower, or if the POI is not correctly included in the data at the time (but may be added later which is why it’s in the database). I’m not sure there’s a good fix, but you might try looking for massive variation in visits over time (like a 99% dropoff not just explained by the pandemic) and seeing if those locations look like an error. Maybe someone else has a better solution.

@Zhou_University_of_Toronto – can you let us know which Placekeys or SGPIDs you are seeing this issue with? @ross_epstein_safegraph @Bryan_SafeGraph

Thanks Nick @Nick_H-K_Seattle_University. One approach I have used so far is to pick a value that is not too low (i.e., yearly visits = 700 for food stores and yearly visits > 12 for parks) and use it directly as the threshold.
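For reference, a per-category cutoff like this could be sketched as below. The category labels and function name are hypothetical. Treating 700 as an inclusive minimum is an assumption, and "> 12" is written as a minimum of 13, which is equivalent for whole-number visit counts.

```python
# Minimum yearly visits per category (inclusive).
# food_store: 700 taken as an inclusive minimum (assumption).
# park: "> 12" expressed as >= 13 for integer counts.
MIN_YEARLY_VISITS = {"food_store": 700, "park": 13}


def keep_poi(category, yearly_visits, minimums=MIN_YEARLY_VISITS):
    """Return True if the POI meets its category's minimum yearly visits.

    Categories without a configured minimum are kept unchanged.
    """
    minimum = minimums.get(category)
    if minimum is None:
        return True
    return yearly_visits >= minimum


# Made-up illustration data: (poi_id, category, yearly_visits).
pois = [
    ("store_1", "food_store", 650),
    ("store_2", "food_store", 20000),
    ("park_1", "park", 12),
    ("park_2", "park", 13),
]
kept = [poi_id for poi_id, category, visits in pois if keep_poi(category, visits)]
```

Keeping the cutoffs in one dictionary makes it easy to rerun the analysis with different values and see how sensitive the results are to the choice.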

Hi @Auren_Hoffman_SafeGraph, since there are quite a few Placekeys, if you don’t mind, could you send me your email privately? Then I can send you all the strange IDs I have found so far.

you can send right to me! [email protected]

Sure. I will send it to you soon.

i received your files, thank you! We also recently released the backfill, take a look here: https://docs.safegraph.com/changelog/december-2020-release-notes. I wonder if this helps.

Hi, so can I just go to the Data Catalogue and redownload it?


@Zhou_University_of_Toronto - the team took a quick look here. We recommend disregarding POIs whose counts are not high enough or look like outliers. Maybe the restaurants did not actually exist yet in 2018?

and for parks:

• A few of these POIs are misclassified as parks - like “Lincoln Park Archery Club” (sg:c3036af9ff144e8ba5ff354c2f451506) - which has much lower visits than the average park so stands out
• There look to be a few duplicate POIs for “Bill Jarvis Migratory Bird Sanctuary” (sg:1784c9785f104667b6b412defedac0b7) - these may be worth ignoring
In general, visits to parks can be erratic b/c of events held in parks in normal times, so definitely need to keep that in mind.

Thanks to you and your team for the work.