How Our Definition of a “Place” Has Evolved

May 1, 2023
by
Bryan Bonack

A brief story about SafeGraph’s rapid evolution as a Data-as-a-Service company

Only seven years ago, SafeGraph opened its doors as one of the few data-only companies in existence, which let us position ourselves uniquely as a Data-as-a-Service (DaaS) provider. Our mission has always been to democratize access to clean, accurate, and highly granular data about physical places (also known as location-based or geospatial data), but at the time we captured only a few million places across the U.S.

Fast-forward to the present day. SafeGraph's Places database boasts rich attributes for over 41 million locations across more than 220 countries and territories worldwide. While expanding the scope and depth of our data has been crucial for growing our business, it hasn't come without obstacles. With each period of expansion, we've had to continuously reevaluate and even challenge our own understanding of what qualifies as a "place" to meet ever-evolving customer needs.

As we continue to progress on our mission, we believe it’s important to share some of the hard-earned lessons we’ve picked up along the way.

Scaling our data alongside ever-growing customer expectations

When we first entered the market, we filled a gap that desperately needed filling. It seemed nearly impossible to access good, clean, and accurate geospatial data in the U.S. without spending hours just to make it usable. So, our initial focus was on making our datasets immediately usable for our customers once delivered.

A tireless quest to scale and grow our dataset

However, once customers got a taste of just how refreshing it was to work with clean, accurate, and regularly updated datasets, they started asking questions about how we could expand our offering to make our data applicable to even more use cases. As they became increasingly curious about what else could be done using SafeGraph data, every question served as a catalyst for exploring new ways of expanding our dataset—both in terms of adding more physical places and appending more robust metadata to those places. 

For example, when we first started out, we were primarily interested in where people spent their money (i.e., retailers and restaurants), and only in the U.S. Then, we expanded into places where people spend time but not necessarily money, like parks, schools, offices, warehouses, and manufacturing facilities. Next, we homed in on small-footprint POIs like electric vehicle (EV) charging stations and ATMs. And finally, we started appending new attributes, like “store ID” (to help customers match places data to transaction data) and “category_tags” (to help customers isolate places based on narrow text descriptors), and expanded our coverage globally.

It was important for us to move fast and position SafeGraph as the go-to source for comprehensive and accurate geospatial data capturing the ins and outs of real world places. We also knew that if we couldn’t provide our customers with all the data they needed, they’d look elsewhere. So in many ways, we became voracious data scavengers to ensure that we had our customers’ backs at all times. 

Staying laser-focused on data accuracy 

But scaling our scope to address customer needs is only one side of the equation. As the dataset grows, it becomes far more complex, and that growth must be balanced by rigorous quality assurance to maintain our market promise as a high-quality data vendor. So much so that we began asking ourselves, “What is a real place?”

Knowing that businesses of all shapes and sizes open and close every single day, we constantly have to double-check that the places in our dataset are actually “open for business” in the real world. We also know that adding more data sources to achieve the desired coverage exposes us to lower-quality records that may not be legitimate places (for example, online shops with no physical address, or names of events at venues, like “Phantom of the Opera”).

The consistent precision of our datasets and our customers’ trust in our ability to deliver quality data go hand in hand. We really can’t have one without the other. We also need to think about the negative impact, or “cost,” that a false positive in the data could have on businesses that look to our data as a source of truth. For us, making those kinds of errors isn’t acceptable. And although no dataset can be 100% accurate at all times, our team goes above and beyond to ensure precision, and we deliver it.

How the SafeGraph Places dataset is built

If you’ve ever wondered what steps we take to put together the SafeGraph Places dataset, here’s a sneak peek into the process: 

  • First, we capture baseline information about physical places—including name and address—from open, accessible sources that are known for their accuracy. A great example of this is the Starbucks Store Locator. Because Starbucks’ goal here is obviously to ensure that customers can find an open store nearby, it wouldn’t make sense for this POI data to be out-of-date. This is what we’d call a “high veracity” data source.

  • Now, once we’ve established that a given POI does, in fact, exist in the real world, we’ll pull and layer on whatever other raw metadata we can find to give those locations added dimension. This can include geographical coordinates, business operating hours, POI description, store footprint geometry, and so on. Basically, we want to paint the most complete and comprehensive picture possible of every POI in our dataset. And to maintain the accuracy of our data, we refresh tens of thousands of different data sources each month to keep the raw information associated with our POIs updated.

  • Here’s the “fun” part. Knowing that our customers come to us to access crystal clean data, we process these raw digital signals in a Spark pipeline, using geographic heuristics and machine learning to clean them up. We do this, first and foremost, to de-duplicate POIs that pop up across various data sources and, secondarily, to attach only the most accurate and precise pieces of metadata to our Places schema for any given POI.

    As part of this quality assurance process, we use several layers of geocoding to ensure the geographical coordinates (latitude/longitude) are based on a POI’s rooftop, standardize POI names and addresses, and finally bring in machine learning to infer the appropriate category description based on the metadata associated with each POI. 
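
To make the de-duplication step a little more concrete, here is a minimal sketch of one possible geographic heuristic: treat two raw records as the same POI when their normalized names match and their coordinates fall within a small distance of each other. This is an illustration only, not our production pipeline. It runs in plain Python rather than Spark, and the record fields, the source labels, the sample coordinates, and the 50-meter threshold are all assumptions made up for the example.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class RawPoi:
    """A raw record from one data source (hypothetical fields for illustration)."""
    source: str
    name: str
    lat: float
    lon: float


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def deduplicate(records, max_distance_m: float = 50.0):
    """Greedy de-duplication: keep the first record seen for each POI.

    A later record counts as a duplicate when its normalized name equals
    that of an already kept record and it lies within max_distance_m of it.
    The threshold is an assumed value, not a documented one.
    """
    kept = []
    for rec in records:
        key = normalize_name(rec.name)
        is_duplicate = any(
            normalize_name(k.name) == key
            and haversine_m(rec.lat, rec.lon, k.lat, k.lon) <= max_distance_m
            for k in kept
        )
        if not is_duplicate:
            kept.append(rec)
    return kept


# Made-up sample records: the first two describe the same storefront.
records = [
    RawPoi("store_locator", "Starbucks", 25.7617, -80.1918),
    RawPoi("web_listing", "STARBUCKS!", 25.76172, -80.19181),
    RawPoi("web_listing", "Starbucks", 25.7907, -80.1300),
]
unique_pois = deduplicate(records)
```

In this sketch the order of the input matters: listing records from a high-veracity source first means those are the ones that survive. A pairwise comparison like this is also quadratic in the number of records, so a pipeline at real scale would need some form of spatial bucketing before comparing candidates.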

Of course, there’s a lot of other work that happens in between each of these steps to ensure that the final product is exactly what our customers want and need from us. But if there’s one thing to take away from this, it’s that we make it a priority to go above and beyond to ensure we provide our customers with the most accurate and comprehensive geospatial data available.

No one does “places” better than SafeGraph

What is a “place,” you might ask? As you can see, that answer is different today than it was four years ago, and it will inevitably evolve again to adapt to ever-changing customer needs and expectations. What we initially believed to be a “place” in geospatial terms has grown into something far bigger and with greater reach than we could have ever imagined—including many things we haven’t even started to tackle today (or even know we need to tackle yet). 

What we do know is that, however this definition may change, we will remain committed to our promise of providing our customers with complete, accurate, up-to-date data on physical places. For this reason, we’re pretty convinced that no one does “places” better than SafeGraph. 

---

What’s your definition of a place? Share your thoughts with us today.
