Revealing SafeGraph’s Secret Method For Getting Accurate Store Visits From GPS Data

June 29, 2019

Store Visit Attribution Challenges & SafeGraph’s Approach To Solving Them

Burger King recently ran a promotion that went viral: a Whopper for a single penny, ordered through its app. The one catch was that the offer was valid only if the order was placed from a McDonald’s restaurant.

1 Cent Burger King Whoppers…But Only “At” McDonald's!

To make this marketing stunt work, Burger King had to get a user’s location and unlock the promotion only when an algorithm could resolve that the GPS data represented a real visit to a McDonald’s location. The general process of determining from location data whether a device visited a particular place, brand, or type of store is known as “Visit Attribution”.

Visit Attribution Connects Digital Data With Physical World Actions

Burger King isn’t the only company trying to do store visit attribution. Using location data for visit attribution is one of the fastest growing areas of big data analytics. The ability to connect digital data with physical world actions is valuable to many businesses and organizations, from ad-tech to sociology.

Store Visit Attribution Takes GPS Data And Turns It Into Store Visit Insights

Ad-tech companies are interested in online-to-offline attribution so that they can measure an ad campaign’s impact on store footfall traffic. That’s one reason why Snap acquired Placed for $135 million: to prove that Snap’s sponsored geofilters drive store visits.

Retail analytics firms use GPS data to create store visit insights, which they use to analyze trade areas and determine where else shoppers go in aggregate. Retail brands then use these store visit patterns to decide where to build future stores (site selection).

Researchers and policy-makers can even use visit attribution to uncover societal problems, such as where food deserts exist, and to help promote voting, analyze natural disasters, or better plan cities.

“Open-sourcing” SafeGraph’s Secrets On Accurate Visit Attribution

Unfortunately, creating a great store visit attribution algorithm is difficult. SafeGraph’s engineering & data-science team built a high-precision solution to this problem. If you want to jump straight to the solution, check out our detailed 23-page technical whitepaper which explains our successful approach. But first, we want to elaborate on exactly why this problem is so challenging and our motivation for revealing our method publicly.

Overcoming Noisy GPS Data Is Challenging

The first stumbling block to creating a good visit attribution algorithm is dealing with the inherent noisiness of GPS data. GPS drift, caused by buildings or heavy cloud cover, can cause a whole series of GPS points to be shifted from their true location. There can also be GPS sinks, which pool multiple GPS pings to the same spot. As shown below, this makes determining a store visit complicated.

Store Visit Attribution Takes GPS Data And Turns It Into Store Visit Insights
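One simple way to spot candidate GPS sinks is to look for exact coordinates shared by pings from several different devices. The sketch below is only an illustration of that idea, not SafeGraph's method: the function name `find_sinks`, the `(device_id, lat, lon)` tuple layout, and the `min_devices` and `precision` defaults are all assumptions chosen for the example.

```python
from collections import defaultdict


def find_sinks(pings, min_devices=3, precision=6):
    """Return coordinates reported by at least `min_devices` distinct devices.

    pings: iterable of (device_id, lat, lon) tuples (hypothetical layout).
    min_devices and precision are assumed thresholds, not SafeGraph values.
    """
    devices_at = defaultdict(set)
    for device_id, lat, lon in pings:
        # Round so that coordinates differing only by float noise are pooled.
        key = (round(lat, precision), round(lon, precision))
        devices_at[key].add(device_id)
    return {coord for coord, devices in devices_at.items()
            if len(devices) >= min_devices}


pings = [
    ("a", 39.8283, -98.5795),
    ("b", 39.8283, -98.5795),
    ("c", 39.8283, -98.5795),
    ("a", 40.7128, -74.0060),
    ("b", 34.0522, -118.2437),
]
suspect_coordinates = find_sinks(pings)
```

Pings at a flagged coordinate could then be dropped or down-weighted before any visits are computed. A busy real location can also attract many devices, so a threshold like this would need tuning against real data.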

Another common occurrence is when two consecutive GPS pings are relatively close together yet still implausibly far apart for the time elapsed between them. These jumps might occur if the GPS receiver in your phone lags or if a nearby skyscraper interferes with the location signal. They are problematic because a legitimate visit gets split into multiple smaller fragments, and an incorrect visit can be created in the area the ping jumped to.
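A common way to catch such jumps is to compute the speed implied by consecutive pings and flag any ping that could only be reached at an implausible speed. The following is a minimal sketch of that idea, not a description of SafeGraph's pipeline. The `(timestamp_seconds, lat, lon)` tuple layout, the function names, and the 50 m/s threshold are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_jumps(pings, max_speed_mps=50.0):
    """Return indices of pings reached at an implausible speed.

    pings: list of (timestamp_seconds, lat, lon), sorted by time.
    Speed is measured from the last ping that was NOT flagged, so one bad
    ping does not also condemn the good ping that follows it.
    max_speed_mps is an assumed threshold, not a SafeGraph value.
    """
    if not pings:
        return []
    flagged = []
    last_good = pings[0]
    for i in range(1, len(pings)):
        t, lat, lon = pings[i]
        dt = t - last_good[0]
        dist = haversine_m(last_good[1], last_good[2], lat, lon)
        if dt <= 0 or dist / dt > max_speed_mps:
            flagged.append(i)
        else:
            last_good = pings[i]
    return flagged


pings = [
    (0, 37.0000, -122.0),
    (10, 37.0001, -122.0),   # about 11 m from the previous ping
    (20, 37.0500, -122.0),   # about 5.5 km in 10 s: implausible
    (30, 37.0002, -122.0),   # back near the true location
]
jump_indices = flag_jumps(pings)
```

Measuring from the last unflagged ping is a design choice: comparing strictly adjacent pings would flag both the jump away and the jump back, discarding a good point.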

Place Data Is Fundamental For Creating Accurate Store Visits

Perfectly clean GPS data still isn’t enough to make visit attribution an easy problem to solve. If you want to translate GPS coordinates into places visited, you need to know about every place a person could go and where these places are located. In our experience working with many customers, partners, and our internal R&D team, the #1 challenge for accurate visit attribution from mobile GPS data is not access to accurate mobile GPS data, but access to accurate Places data. Accurate visit attribution solutions therefore require an extensive Point of Interest (POI) dataset with data on each place’s location, brand, type of place, and open hours.

Polygons Are Significantly Better Than Centroids

Most POI solutions on the market today only have a store’s address and location represented as a single latitude and longitude point (a centroid). Having just the centroid causes problems because, even with clean GPS data, drawing an arbitrary radius around a store’s centroid doesn’t provide sufficient location context for accurate attribution.

Using centroids often leads to picking up visits from people walking on sidewalks next to stores. Centroids also fail in cases like strip malls, where stores are densely packed, leading to many overlapping store radii. Another problem with using store centroids, pictured below, occurs when large stores are surrounded by smaller stores. Visits systematically get attributed to smaller stores that are close by but were not actually visited.

Centroids With Radius Fail When Small Stores Are Close To Large Stores
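The failure is easy to reproduce with a toy example. The sketch below uses made-up planar coordinates in metres and hypothetical store names: a large store 100 m wide centred at (0, 0), and a small shop whose centroid sits at (60, 0) just beyond the large store's wall. The attribution rule (closest centroid within a fixed radius) is an assumed stand-in for a typical centroid-based approach.

```python
import math


def attribute_by_centroid(ping, centroids, radius_m):
    """Return the name of the closest centroid within radius_m, or None."""
    best_name, best_dist = None, radius_m
    for name, (cx, cy) in centroids.items():
        dist = math.hypot(ping[0] - cx, ping[1] - cy)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


# Hypothetical layout, metres in a local planar frame.
centroids = {"big_box": (0.0, 0.0), "small_shop": (60.0, 0.0)}

# This ping is 5 m inside the big store's east wall, 45 m from its centroid,
# but only 15 m from the small shop's centroid.
ping_inside_big_box = (45.0, 0.0)

print(attribute_by_centroid(ping_inside_big_box, centroids, radius_m=50.0))
# prints: small_shop
```

Shrinking the radius does not help: a radius small enough to exclude the neighbouring shop would also miss most of the large store's own floor area.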

Having a store’s centroid isn’t enough for accurate visit attribution. As illustrated below, having a POI’s exact building footprint (polygon) solves many of the shortcomings of using centroids with radii.

Centroids With Radius Fail When Small Stores Are Close To Large Stores

Having POI polygons also makes it easy to filter out people who walk past a store but never actually enter the building. It makes accurately attributing visits in dense areas like strip malls more feasible since the store building footprints won’t overlap. The benefits of polygons over centroids are what motivated SafeGraph to make SafeGraph Places: our dataset of over 5 million POIs with exact polygons, brand information, and other business listing information.
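To make the polygon advantage concrete, here is a minimal sketch of attribution by building footprint using the standard ray-casting point-in-polygon test. The store names, the rectangular footprints, and the planar metre coordinates are invented for illustration. A production system would more likely use a geometry library and real footprints in geographic coordinates.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon: list of (x, y) vertices in order, without repeating the first.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def attribute_by_polygon(ping, footprints):
    """Return the first footprint containing the ping, or None."""
    for name, polygon in footprints.items():
        if point_in_polygon(ping[0], ping[1], polygon):
            return name
    return None


# Hypothetical footprints, metres in a local planar frame.
footprints = {
    "big_box": [(-50, -50), (50, -50), (50, 50), (-50, 50)],
    "small_shop": [(52, -8), (68, -8), (68, 8), (52, 8)],
}

inside_big_box = attribute_by_polygon((45, 0), footprints)
on_sidewalk = attribute_by_polygon((45, -55), footprints)
```

Here the ping at (45, 0) resolves to `big_box` even though the small shop's centre is closer, and the sidewalk ping at (45, -55) resolves to no store at all.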

Matching GPS Data to Places To Create Visits Needs Machine Learning

Now, with cleaned GPS data and an accurate POI dataset, there is still a final hurdle to creating a robust visit attribution algorithm. You need a method to join these two datasets to produce accurate visits. This problem isn’t trivial: you need to cluster GPS data points intelligently, account for store open hours, and decide which nearby POI a GPS cluster actually belongs to. Applying machine learning to this problem requires encoding all your information into features and then training a machine learning algorithm on those features to produce visits.
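As a rough illustration of what encoding information into features can look like, the sketch below turns one (GPS cluster, candidate POI) pair into a few numeric features: distance, dwell time, and the share of the dwell that falls within open hours. The `Cluster` and `Poi` classes, their fields, and the specific features are hypothetical choices for this example. They are not the feature set described in the whitepaper.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    x: float           # cluster centre, metres in a local planar frame
    y: float
    start_hour: float  # local time, hours since midnight
    end_hour: float


@dataclass
class Poi:
    name: str
    x: float
    y: float
    open_hour: float
    close_hour: float


def encode_features(cluster, poi):
    """Encode one (cluster, candidate POI) pair as a numeric feature dict.

    Assumes the dwell and the open hours do not cross midnight.
    """
    distance_m = ((cluster.x - poi.x) ** 2 + (cluster.y - poi.y) ** 2) ** 0.5
    dwell_h = cluster.end_hour - cluster.start_hour
    overlap_h = max(
        0.0,
        min(cluster.end_hour, poi.close_hour)
        - max(cluster.start_hour, poi.open_hour),
    )
    open_fraction = overlap_h / dwell_h if dwell_h > 0 else 0.0
    return {
        "distance_m": distance_m,
        "dwell_minutes": dwell_h * 60,
        "open_fraction": open_fraction,
    }


cluster = Cluster(x=0.0, y=0.0, start_hour=8.5, end_hour=9.5)
cafe = Poi(name="cafe", x=3.0, y=4.0, open_hour=9.0, close_hour=17.0)
features = encode_features(cluster, cafe)
```

In this example the device dwelled for an hour, but the cafe was open for only the second half of it, so `open_fraction` is 0.5. A trained model could learn to treat low values as evidence against a visit.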

SafeGraph invested significantly in solving this matching problem, and we developed a powerful method of matching GPS data to places to create visits. Our most successful approach combines a DBSCAN clustering algorithm, augmented to consider time in addition to space, with a machine-learning algorithm that compares the likelihood of each candidate place against the other local candidates (rather than making a separate binary visit/no-visit classification for each place). We’ve shared a technical deep dive into SafeGraph’s approach to matching in the second half of the 23-page visit attribution whitepaper.
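The two ideas in this paragraph can be sketched generically. The code below is not SafeGraph's implementation (the whitepaper has the actual details). It is a plain DBSCAN in which two pings count as neighbours only if they are close in both space and time, followed by a softmax that scores candidate POIs relative to one another. The function names, the `(t, x, y)` layout in seconds and planar metres, the default thresholds, and the example scores are all assumptions.

```python
import math


def st_neighbors(points, i, eps_m, eps_s):
    """Indices of points within eps_m metres AND eps_s seconds of points[i]."""
    t, x, y = points[i]
    return [j for j, (tj, xj, yj) in enumerate(points)
            if abs(tj - t) <= eps_s and math.hypot(xj - x, yj - y) <= eps_m]


def st_dbscan(points, eps_m=30.0, eps_s=600.0, min_pts=3):
    """Label each (t, x, y) point with a cluster id, or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = st_neighbors(points, i, eps_m, eps_s)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # noise reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = st_neighbors(points, j, eps_m, eps_s)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep growing
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


def rank_candidates(scores):
    """Softmax over raw scores for the candidate POIs near ONE cluster."""
    peak = max(scores.values())
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


points = [
    (0, 0, 0), (60, 2, 1), (120, 1, 2), (180, 0, 1),  # morning stay
    (18000, 500, 500),                                # stray ping
    (36000, 1, 1), (36060, 0, 2), (36120, 2, 0),      # evening, same spot
]
labels = st_dbscan(points)
probabilities = rank_candidates({"cafe": 2.0, "bank": 0.5, "gym": -1.0})
```

With these inputs the morning and evening stays at the same spot receive different cluster ids, where a purely spatial DBSCAN would merge them into one visit. The softmax forces the candidates for a cluster to compete: the probabilities sum to 1, so a strong neighbour lowers every other candidate's share.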

Our Motivation For Revealing Our Secret Visit Attribution Method

But why share all the secret sauce we’ve developed?

We are firm believers that innovation requires open access to information, that data can transform society for the better, and that SafeGraph is just a data company. Over the 2+ years that we have been working on visit attribution from mobile GPS data, the #1 limitation we’ve encountered in-house and in the industry is access to accurate and complete Places data. This is why we built SafeGraph Places, why we are so proud of it, and why we want to share our visit attribution methodology, which builds on top of SafeGraph Places data.

We want to enable everyone to leverage SafeGraph Places to its maximum potential. We want better ads, deeper sociological and economic understanding of human behavior, and a safer world. We hope this detailed technical whitepaper inspires progress and innovation in all these areas and beyond.

To learn all the details of how to do visit attribution from GPS & IP-address data, download our whitepaper or get in touch with our team.
