It turns out that getting basic truthful information about a place is really hard. It is hard even to find a good source for which stores are where. We know … because when we started SafeGraph we tried to buy it. We evaluated over 20 vendors and none of them were high quality. High-quality places data needs to:
1) Have accurate polygons (over random centroids)
2) Eliminate noise (e.g. P.O. boxes)
3) Delete duplicates and inaccurate (or outdated) places
In our current version (v 1.1), SafeGraph Places consists of almost every place in the U.S. where one can spend money. We’re working on having every place in the U.S. where you can spend time (including office buildings, homes, parks, schools, etc.). And eventually, our goal is to be able to describe every place in the world.
SafeGraph Places 1.1 is curated:
One of our early customers is one of the largest mobile carriers in the U.S. They run their location stack on top of SafeGraph Places. Before choosing SafeGraph, they evaluated a dozen vendors over a period of many months — going through the data programmatically and by hand.
There are thousands of sources of data about a place. Our challenge is to merge this data together and use the best attributes of each source.
SafeGraph ingests data from thousands of diverse sources that together represent billions of discrete pieces of information about places of interest. Our system programmatically ingests, compares, validates, and merges data, and then draws precise polygons. We leverage unique, advanced truth data to continually improve the accuracy of Places, ultimately resulting in a map of Places of Interest that best represents truth.
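To make "use the best attributes of each source" concrete, here is a minimal sketch of attribute-level merging. It is not SafeGraph's pipeline: the source names, the `SOURCE_PRIORITY` ranking, and the `merge_records` helper are all hypothetical, and a real system would also need validation and conflict resolution.

```python
from typing import Dict, Optional

# Hypothetical trust ranking per attribute: sources listed first win.
SOURCE_PRIORITY = {
    "name": ["brand_feed", "web_listing", "directory"],
    "address": ["directory", "brand_feed", "web_listing"],
}


def merge_records(records: Dict[str, Dict[str, Optional[str]]]) -> Dict[str, Optional[str]]:
    """Merge one place's records (source name -> attributes) into one record.

    For each attribute, take the first non-empty value in priority order.
    """
    merged: Dict[str, Optional[str]] = {}
    for attribute, ranked_sources in SOURCE_PRIORITY.items():
        merged[attribute] = None
        for source in ranked_sources:
            value = records.get(source, {}).get(attribute)
            if value:  # skip missing or empty values
                merged[attribute] = value
                break
    return merged


# Two illustrative (made-up) source records for the same place.
example = {
    "web_listing": {"name": "Lee's Sandwiches", "address": "1 Main St"},
    "directory": {"name": "LEES SANDWICHES INC", "address": "1 Main Street, Suite 4"},
}
merged_example = merge_records(example)
```

In this example the name comes from the web listing (no brand feed is present) and the address comes from the directory, so each attribute is taken from whichever available source ranks highest for it.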
Simple things, like merging data from different sources, turn out to not be so simple. Semantic brand detection and hierarchy are important.
We distinguish true brands from POI with merely similar names (e.g. Lee’s Sandwiches vs. Lee’s Deli).
Understanding hierarchical relationships like native substores (e.g. Walmart Vision Center) vs. foreign substores (e.g. CVS inside Target) enables us to better filter or keep POI.
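As a rough illustration of both ideas, the sketch below matches names against a small brand list and then labels a substore as native or foreign relative to its parent. The `KNOWN_BRANDS` list and both helper functions are hypothetical; real brand detection is semantic and far more involved than matching on whole brand names.

```python
# Hypothetical brand list; a real one would be much larger and curated.
KNOWN_BRANDS = ["Walmart", "Target", "CVS", "Lee's Sandwiches"]


def brand_of(name):
    """Return the known brand that `name` equals or begins with, else None.

    Matching on the whole brand name (not a shared first word) keeps
    "Lee's Deli" from being mistaken for "Lee's Sandwiches".
    """
    lowered = name.lower()
    # Try longer brand names first so the most specific match wins.
    for brand in sorted(KNOWN_BRANDS, key=len, reverse=True):
        b = brand.lower()
        if lowered == b or lowered.startswith(b + " "):
            return brand
    return None


def classify_substore(parent_name, child_name):
    """Label a store-within-a-store as 'native', 'foreign', or 'unknown'."""
    parent_brand = brand_of(parent_name)
    child_brand = brand_of(child_name)
    if parent_brand is None or child_brand is None:
        return "unknown"
    return "native" if parent_brand == child_brand else "foreign"
```

With this list, `classify_substore("Walmart", "Walmart Vision Center")` gives `"native"`, `classify_substore("Target", "CVS")` gives `"foreign"`, and `brand_of("Lee's Deli")` finds no brand at all.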
Spatial transformation and interpolation are important (and really hard). We intelligently partition an overall building shape into substores (think of a strip mall). We also strive to understand spatial relationships of substores within malls, stadiums, airports, and more.
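For intuition only, here is a deliberately naive version of the strip mall case: a rectangular building footprint is cut into side-by-side units in proportion to each store's frontage. The `partition_strip` function, the planar coordinates, and the frontage weights are assumptions for illustration; real buildings are rarely axis-aligned rectangles, and this is not how SafeGraph draws its polygons.

```python
def partition_strip(min_x, min_y, max_x, max_y, frontages):
    """Split an axis-aligned rectangle into adjacent store polygons.

    `frontages` holds one relative width per store, left to right.
    Each polygon is returned as a list of (x, y) corners.
    """
    total = sum(frontages)
    building_width = max_x - min_x
    polygons = []
    left = min_x
    for frontage in frontages:
        right = left + building_width * frontage / total
        polygons.append([(left, min_y), (right, min_y), (right, max_y), (left, max_y)])
        left = right  # the next store starts where this one ends
    return polygons


# A 100 x 20 building shared by three stores; the third is twice as wide.
units = partition_strip(0, 0, 100, 20, [1, 1, 2])
```

The three resulting polygons span x from 0 to 25, 25 to 50, and 50 to 100, so together they tile the building with no gaps or overlaps.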
To really understand a place, you need to know its dimensions or shape (in geospatial parlance, that’s called a “polygon”). Essentially, it is a map that describes a place.
SafeGraph has detailed polygons for all 5+ million places we currently track (places in the U.S. where you can spend money).
Relying on centroids can significantly reduce accuracy. Centroid-based footprints overlap, have different radii that are hard to calculate, and the “centroid” is usually not at the center of the place.
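A small sketch shows why this matters. It compares a centroid-plus-radius test with a point-in-polygon test (the standard ray casting method) for a long, narrow store. The store shape, the 30-unit radius, and the sample points are made up, and the coordinates are treated as planar units rather than latitude and longitude.

```python
import math


def in_radius(x, y, cx, cy, radius):
    """Centroid approach: is the point within `radius` of the centroid?"""
    return math.hypot(x - cx, y - cy) <= radius


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical long, narrow store: 100 wide, 20 deep, centroid at (50, 10).
store = [(0, 0), (100, 0), (100, 20), (0, 20)]
far_end_of_store = (95, 10)   # inside the store, far from the centroid
parking_lot = (50, 35)        # outside the store, close to the centroid
```

With a radius of 30 around the centroid, the point at the far end of the store is missed while the parking lot point is counted as a visit. The polygon test gets both right.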
Traditional POI data or “business listing” vendors advertise 20+ million places. However, these typically include a lot of extraneous material that actually obscures the POI that people really go to. Extraneous POI include ATMs, P.O. boxes, kiosks, and sub-listings such as “Walmart Photo” (listed alongside a “Walmart” at the same location). SafeGraph Places does the hard work to remove all the junk, and exposes only those things that matter for visit attribution: the places that people actually visit.
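The kind of filtering described here can be sketched in a few lines. This is an illustration under simplifying assumptions, not SafeGraph's method: the record fields, the `NOISE_CATEGORIES` labels, and the name-prefix rule for sub-listings are all hypothetical, and the rule would also drop native substores that one might want to keep.

```python
# Hypothetical category labels for listings that people do not "visit".
NOISE_CATEGORIES = {"atm", "po_box", "kiosk"}


def filter_poi(records):
    """Drop noise categories and sub-listings that share a parent's address."""
    kept = [r for r in records if r["category"] not in NOISE_CATEGORIES]

    # Collect the names that remain at each address.
    names_by_address = {}
    for r in kept:
        names_by_address.setdefault(r["address"], set()).add(r["name"])

    result = []
    for r in kept:
        others = names_by_address[r["address"]] - {r["name"]}
        # "Walmart Photo" is dropped when plain "Walmart" is at the same address.
        if any(r["name"].startswith(parent + " ") for parent in others):
            continue
        result.append(r)
    return result


# Made-up listings for illustration.
listings = [
    {"name": "Walmart", "category": "department_store", "address": "1 Main St"},
    {"name": "Walmart Photo", "category": "photo", "address": "1 Main St"},
    {"name": "Chase ATM", "category": "atm", "address": "1 Main St"},
    {"name": "Joe's Deli", "category": "restaurant", "address": "2 Main St"},
]
visitable = [r["name"] for r in filter_poi(listings)]
```

Of the four listings, only "Walmart" and "Joe's Deli" survive: the ATM is removed by category, and "Walmart Photo" is removed because its parent is listed at the same address.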
Traditional POI vendors usually only have 80% of a given brand’s stores. We ensure that we have close to 100% of every brand’s stores in SafeGraph Places. We’ve also focused on the long tail of “Mom and Pop” stores. In addition, we go to great lengths to keep our POI data fresh as stores open, move, and close.
While we have great customers for SafeGraph Places, we are only now marketing it broadly. Our initial customers include some of the most advanced geospatial companies (and they helped us make the product better). Now we are opening up SafeGraph Places to a wider set of customers.
You can also get data directly from SafeGraph. Since you got to the end of this blog post, use the discount code “SpringIntoSafeGraph” to get $100 of data for free: https://shop.safegraph.com/