Introducing SafeGraph Places: The Source of Truth about Physical Places

July 25, 2018 | By Auren Hoffman

Our goal is to be the source of truth about physical places in the world

It turns out that getting basic, truthful information about a place is really hard. It is hard even to find a good source of which stores are where. We know … because when we started SafeGraph we tried to buy it. We evaluated over 20 vendors, and none of them offered high-quality data. A high-quality places dataset needs to:

1) Have accurate polygons (rather than random centroids)

2) Eliminate noise (e.g. PO Boxes)

3) Remove duplicates and inaccurate (or outdated) places

Enter SafeGraph Places.

In our current version (v1.1), SafeGraph Places consists of almost every place in the U.S. where one can spend money. We’re working on having every place in the U.S. where you can spend time (including office buildings, homes, parks, schools, etc.). And eventually, our goal is to be able to describe every place in the world.

SafeGraph Places 1.1 is a curated dataset that includes:

  • 5MM+ points of interest (POI) covering all places where someone can spend money (including all key brands as well as “mom & pop” shops)
  • Accurate polygons for every place
  • Additional critical information such as major brand (McDonald’s, Starbucks, etc.), name (Tampa Marriott Westshore, Dominique Ansel Bakery, etc.), hours of operation, street address, and category data

Big companies are basing their business on SafeGraph Places.

One of our early customers is one of the largest mobile carriers in the U.S. They run their location stack on top of SafeGraph Places. Before choosing SafeGraph, they evaluated a dozen vendors over a period of many months — going through the data programmatically and by hand.

SafeGraph algorithmically combines the best of all sources

There are thousands of sources of data about a place. Our challenge is to merge this data together and use the best attributes of each source.

SafeGraph ingests data from thousands of diverse sources that together represent billions of discrete pieces of information about places of interest. Our system programmatically ingests, compares, validates, and merges this data, then draws precise polygons. We use unique, advanced truth data to continually improve the accuracy of Places, ultimately resulting in a map of Places of Interest that best represents truth.

Those sources include satellite imagery data and municipal records, which help generate the most accurate understanding of a store’s footprint.
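To make the idea of "using the best attributes of each source" concrete, here is a minimal sketch of attribute-level merging. It is only an illustration: the source names, the per-attribute ranking in SOURCE_PRIORITY, and the merge_records function are all hypothetical, and the real pipeline is certainly more involved than a fixed ranking.

```python
from typing import Dict, Optional

# Hypothetical ranking of sources per attribute (an assumption for this sketch,
# not SafeGraph's actual configuration). Earlier entries are trusted more.
SOURCE_PRIORITY = {
    "address": ["municipal_records", "brand_website", "web_listing"],
    "hours": ["brand_website", "web_listing", "municipal_records"],
}


def merge_records(records: Dict[str, Dict[str, Optional[str]]]) -> Dict[str, Optional[str]]:
    """For each attribute, keep the value from the most trusted source that has one."""
    merged: Dict[str, Optional[str]] = {}
    for attribute, ranked_sources in SOURCE_PRIORITY.items():
        merged[attribute] = None
        for source in ranked_sources:
            value = records.get(source, {}).get(attribute)
            if value:  # skip sources that are missing or have an empty value
                merged[attribute] = value
                break
    return merged


# Two sources describing the same place, each with a different strength.
records = {
    "web_listing": {"address": "1 Main St", "hours": "9-5"},
    "municipal_records": {"address": "1 Main Street"},
}
print(merge_records(records))
```

In this toy example the address comes from the municipal record and the hours come from the web listing, because each attribute is resolved independently.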

Getting to the truth requires crazy machine learning

Simple things, like merging data from different sources, turn out to not be so simple. Semantic brand detection and hierarchy are important.

We distinguish true brands from POI with merely similar names (e.g. Lee’s Sandwiches vs. Lee’s Deli).

Understanding hierarchical relationships like native substores (e.g. Walmart Vision Center) vs. foreign substores (e.g. CVS inside Target) enables us to better filter or keep POI.
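One way to picture the native vs. foreign distinction is as a parent link plus a brand comparison. The sketch below assumes each POI already carries a resolved brand and a pointer to the store that contains it; the Poi class and substore_kind function are hypothetical names for illustration, and the hard part in practice (discovering the parent and the brand in the first place) is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Poi:
    name: str
    brand: str
    parent: Optional["Poi"] = None  # the store this POI sits inside, if any


def substore_kind(poi: Poi) -> str:
    """Classify a POI as standalone, a native substore, or a foreign substore."""
    if poi.parent is None:
        return "standalone"
    # Same brand as the enclosing store: native. Different brand: foreign.
    return "native" if poi.brand == poi.parent.brand else "foreign"


walmart = Poi("Walmart", brand="Walmart")
vision_center = Poi("Walmart Vision Center", brand="Walmart", parent=walmart)
target = Poi("Target", brand="Target")
cvs = Poi("CVS", brand="CVS", parent=target)

for poi in (walmart, vision_center, cvs):
    print(poi.name, "->", substore_kind(poi))
```

With a label like this attached to each record, a downstream rule can decide per use case whether a substore is kept as its own place or folded into its parent.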

Spatial transformation and interpolation are important (and really hard). We intelligently partition an overall building shape into substores (think of a strip mall). We also strive to understand spatial relationships of substores within malls, stadiums, airports, and more.

You cannot have great location data without great polygons

Getting precise maps is really, really hard

To really understand a place, you need to know its dimensions or shape (in geospatial parlance, that’s called a “polygon”). Essentially, it is a map that describes a place.

SafeGraph has detailed polygons for all 5+ million places we currently track (places in the U.S. where you can spend money).

Centroids & radiuses have baked-in errors

Relying on centroids can significantly reduce accuracy. Circles drawn around centroids overlap with their neighbors, each place needs a different radius that is hard to calculate, and the “centroid” is usually not at the true center of the place.

…but polygons can represent the truth
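A small worked example shows why a centroid plus a radius goes wrong for a long, thin store. This is a sketch under simplifying assumptions: coordinates are planar and in meters (real data would be latitude and longitude), the store is a made-up 100 m by 20 m rectangle, and in_radius and in_polygon are illustrative helpers, with in_polygon using the standard ray-casting test.

```python
import math


def in_radius(point, centroid, radius):
    """Centroid approach: a visit counts if the point is within `radius` of the centroid."""
    return math.dist(point, centroid) <= radius


def in_polygon(point, polygon):
    """Polygon approach: ray-casting point-in-polygon test.

    Casts a horizontal ray to the right of the point and counts how many
    polygon edges it crosses; an odd count means the point is inside.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical store footprint: 100 m wide, 20 m deep.
store = [(0, 0), (100, 0), (100, 20), (0, 20)]
centroid, radius = (50, 10), 30

parking_lot = (50, 35)  # outside the building, 25 m from the centroid
far_aisle = (5, 10)     # inside the building, 45 m from the centroid

print(in_radius(parking_lot, centroid, radius), in_polygon(parking_lot, store))
print(in_radius(far_aisle, centroid, radius), in_polygon(far_aisle, store))
```

The radius test counts the parking-lot point as a visit and misses the shopper at the far end of the store, while the polygon test gets both right. Growing the radius to cover the whole store only makes the false positives worse.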

SafeGraph Places is built to answer: did a visit occur?

Traditional POI data or “business listing” vendors advertise 20+ million places. However, these typically include a lot of extraneous material that actually obscures the POI that people really go to. Extraneous POI includes ATMs, P.O. Boxes, kiosks, and departments such as “Walmart Photo” that are listed separately from the “Walmart” at the same location. SafeGraph Places does the hard work to remove all the junk, and exposes only those things that matter for visit attribution: the places that people actually visit.
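As a rough illustration of that clean-up step, the sketch below drops records in noise categories and drops department listings that share a brand and address with the main store. Everything here is an assumption made for the example: the record fields, the NOISE_CATEGORIES labels, and the rule that the main store is the record whose name equals its brand.

```python
# Assumed category labels for noise; real category taxonomies will differ.
NOISE_CATEGORIES = {"atm", "po_box", "kiosk"}


def visitable(pois):
    """Return only the POI that people would actually visit (toy rules)."""
    # Main stores: records whose name is exactly the brand name.
    main_stores = {
        (p["brand"], p["address"]) for p in pois if p["name"] == p["brand"]
    }
    kept = []
    for p in pois:
        if p["category"] in NOISE_CATEGORIES:
            continue  # ATMs, P.O. Boxes, kiosks
        is_department = (
            p["name"] != p["brand"] and (p["brand"], p["address"]) in main_stores
        )
        if is_department:
            continue  # e.g. "Walmart Photo" listed alongside "Walmart"
        kept.append(p)
    return kept


listings = [
    {"name": "Walmart", "brand": "Walmart", "address": "10 Elm Rd", "category": "department_store"},
    {"name": "Walmart Photo", "brand": "Walmart", "address": "10 Elm Rd", "category": "photo"},
    {"name": "Bank ATM", "brand": None, "address": "10 Elm Rd", "category": "atm"},
    {"name": "Joe's Diner", "brand": None, "address": "12 Elm Rd", "category": "restaurant"},
]
print([p["name"] for p in visitable(listings)])
```

Independent shops with no brand pass through untouched, since they never match a main store at the same address.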

Focus on making sure we have ALL the stores

Traditional POI vendors usually only have 80% of a given brand’s stores. We ensure that we have close to 100% of every brand’s stores in SafeGraph Places. We’ve also focused on the long tail of “Mom and Pop” stores. In addition, we go to great lengths to keep our POI data fresh as stores open, move, and close.

SafeGraph Places has been our secret … until now

While we have great customers for SafeGraph Places, we are only now marketing it broadly. Our initial customers include some of the most advanced geospatial companies (and they helped us make the product better). Now we are opening up SafeGraph Places to a wider set of customers.

You can also get data directly from SafeGraph. Since you got to the end of this blog post, use the discount code “SpringIntoSafeGraph” to get $100 of data for free: https://shop.safegraph.com/


Auren Hoffman
CEO, SafeGraph. Fmr CEO of LiveRamp. Follow @auren