
Why Transparency Matters: Becoming the Most Transparent Data Company

January 24, 2022
by
SafeGraph

SafeGraph’s goal is to be the source of truth for data on physical places. To achieve this goal, we’re laser-focused on curating the most accurate, precise, and up-to-date geospatial datasets to power location analytics at large corporations, small businesses, and academic institutions alike. 

But we recognize that truth is aspirational. In a world that changes rapidly and constantly, building truth sets for physical places is a tall order, and one that we will never get 100% right. While we will always strive to get as close to 100% as possible, we will also always be transparent about where our data comes from and how accurate it is. Part of this commitment to transparency means providing open access to our data, as well as to our sourcing process, schema, fill rates, and bugs.

How Transparency is Core to Our Mission of Becoming the Source of Truth for Physical Places

Since our beginning, we’ve been committed to transparency and providing access to high-quality places data without compromising consumer privacy. We’ve always been focused on data tied to latitude and longitude coordinates, never tied to people. And we’ve always been up-front about any limitations to the data we provide.

Our data journey

An integral part of analyzing locations is understanding how people interact with them. That’s why we continue to partner with other data providers to enrich our places data with additional consumer-related attributes, such as anonymized and aggregated mobility counts and transactions. 

Our very first dataset was aggregated and anonymized mobile pings curated from apps where consumers opted in to share their location. At the time, we didn’t provide context as to what was happening at the locations of those pings, just the latitude, longitude, and timestamp. Many of our customers were open with us about the limitation of this: it was not helpful to see where people were traveling without also knowing what was there. So we decided to source points of interest (POI) data to add the necessary context.

But we quickly realized that high-quality POI data was extremely difficult to source. The few companies that did provide it at the time did not update it frequently enough to accurately represent a changing world, often relied on inaccurate geocodes, and didn’t offer the level of detail about each place that users needed to derive valuable insight from the data.

So we decided to build it ourselves. In 2018 we officially shifted our strategy from focusing on device locations to curating place locations. We maintained our original commitment to transparency with this shift in product strategy. Even though we decided to build our own POI database to be more accurate than others on the market, we recognized that we would never be 100% correct; places change way too much for that to ever be the case. We continue to publish our bugs and their fixes every month to keep our users updated with both the good and the bad.

Why is Transparency Important?

Organizations often use data for mission-critical analytics. They need to be able to trust both the data itself and the company that produces it.

If there is uncertainty about the data itself, the analytics produced with it are called into question. Whether the analysis is used for making an important business decision, advising a client, or informing consumers about something impacting them, the outcome needs to be trustworthy to be valuable. Transparency about what the data is, whether good or bad, serves as the foundation of that trust.

The same is true for data providers. If a company is not trusted, neither is the product nor the service it delivers. Being transparent about bugs is part of building trust in a company, but so is transparency around how the product is built in the first place. This is especially true for companies whose products may include sensitive information. End users don’t want to be associated with a company that has a bad reputation. 

For SafeGraph specifically, our data is often just one ingredient in a larger solution. If the integrity of that data is called into question, so is the end solution. We work hard to uphold the integrity of our data, and in turn protect the work of our clients.

How is SafeGraph Transparent?

We don’t hide anything about our data. We publish our data schema publicly, along with our bug fixes and release notes. We refresh our data every month to ensure it's an accurate reflection of a dynamically changing world. While we strive to create the best, most accurate places data possible, we know we’ll never be perfect. We value feedback from our users, and make it easy to report errors in our data so we can fix them.

SafeGraph also makes it easy to access our data. We provide data free to academics for use in research and education, and are committed to being open about what data we offer and how it’s built. Our goal is to make our data open to anyone who needs to use it.

The integrity of our company and data curation is a key value to each one of our employees, and we are proud to continue partnering with organizations who feel the same. In location data, privacy is critical to transparency. That’s why we are up-front about what data we build, how we build it, and what we do to protect consumer privacy. 

Other companies offer POI, mobility, and transaction data, but what makes SafeGraph different is our commitment to transparency and consumer data privacy. Because our focus is entirely on places, we never work with individual consumer data. Instead, we curate aggregated and anonymized consumer behavior data to provide a general idea of visit volume and frequency to specific places. 

What kind of data does SafeGraph build?

Part of being transparent is being very clear on the type of data we sell. SafeGraph provides data about physical places. SafeGraph’s point of interest (POI) database includes details about specific locations, such as lat/long coordinates, open/close dates, and NAICS codes. Geometry for SafeGraph POIs denotes structural boundaries and building relationships for accurate proximity analysis and geofencing. SafeGraph also provides foot traffic insights created from aggregated and anonymized mobile location data to deliver mobility insights without personally identifiable information (PII). We also recently launched SafeGraph Spend to provide anonymized and aggregated debit/credit card transactions at individual businesses. We apply consumer protection methodology to our foot traffic and transaction datasets since we are not in the business of providing information on how individuals behave, but rather how specific locations relate to trends in consumer behavior.

Does SafeGraph collect data?

We create our datasets from a combination of machine learning, web crawling, and third-party licensing. More specifically, our Core and Geometry datasets are built using open store locators, publicly available APIs, and licensed third-party data. We also apply our own machine learning techniques to infer additional attributes about a place, or determine its shape. 

To build our Patterns dataset, we only work with partners that source data from mobile applications whose users have opted in to sharing their location; SafeGraph does not collect consumer data itself. Using our own Core and Geometry datasets, we then attribute visits to specific places. Our Patterns dataset also provides insights into where people travel from to get to a specific place, and where else they go. We aggregate origin information at the Census Block Group (CBG) level and apply differential privacy techniques to enable analytics at an optimal geographic scale.
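To make the idea of CBG-level aggregation with differential privacy more concrete, here is a minimal sketch in Python. It is an illustration only, not SafeGraph's actual pipeline: the function name `noisy_cbg_counts`, the `epsilon` and `min_count` parameters, and the sample CBG codes are all assumptions made for this example. It counts visitors by home CBG, adds Laplace noise to each count, and withholds counts that fall below a threshold.

```python
import random
from collections import Counter


def noisy_cbg_counts(visitor_home_cbgs, epsilon=1.0, min_count=2, seed=None):
    """Aggregate visitor origins by CBG and add Laplace noise to each count.

    Hypothetical helper for illustration; parameter names and defaults are
    assumptions, not documented SafeGraph settings.
    """
    rng = random.Random(seed)
    counts = Counter(visitor_home_cbgs)

    # Assumes each visitor contributes to exactly one CBG, so the
    # sensitivity of every count is 1 and the noise scale is 1 / epsilon.
    rate = epsilon

    released = {}
    for cbg, n in sorted(counts.items()):
        # The difference of two i.i.d. exponential draws with the same rate
        # is Laplace-distributed with location 0 and scale 1 / rate.
        noise = rng.expovariate(rate) - rng.expovariate(rate)
        noisy = round(n + noise)
        # Withhold small counts rather than publishing them.
        if noisy >= min_count:
            released[cbg] = noisy
    return released


# Made-up 12-digit CBG codes, one entry per visitor.
sample = ["060750101001"] * 8 + ["060750101002"] * 5 + ["060750102001"]
print(noisy_cbg_counts(sample, epsilon=1.0, min_count=2, seed=7))
```

A smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy. A production mechanism would also need to account for CBGs with zero observed visitors, which this sketch skips for brevity.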

SafeGraph Spend is created using licensed third-party credit and debit transaction data, aggregated and anonymized at the store level. The purpose of the Spend dataset is not to understand the spending habits of any one individual, but rather how aggregated transactions relate to physical stores, geographic regions and types of places.
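As a rough illustration of what "aggregated and anonymized at the store level" can mean in practice, the sketch below rolls individual transactions up into per-store totals and discards any card-level identifier. The field names (`store_id`, `card_id`, `amount`), the function `aggregate_by_store`, and the `min_transactions` threshold are hypothetical and chosen for this example; they are not taken from the SafeGraph Spend schema.

```python
from collections import defaultdict


def aggregate_by_store(transactions, min_transactions=5):
    """Roll up transactions to per-store totals, dropping card identifiers.

    Hypothetical helper for illustration only; field names and the
    threshold are assumptions.
    """
    totals = defaultdict(lambda: {"transaction_count": 0, "total_spend": 0.0})
    for txn in transactions:
        row = totals[txn["store_id"]]
        row["transaction_count"] += 1
        row["total_spend"] += txn["amount"]
        # txn["card_id"] is deliberately never copied into the output.

    # Only publish stores with enough transactions that no single
    # purchase stands out in the aggregate.
    return {
        store: {
            "transaction_count": row["transaction_count"],
            "total_spend": round(row["total_spend"], 2),
        }
        for store, row in totals.items()
        if row["transaction_count"] >= min_transactions
    }


# Made-up records: six purchases at one store, one purchase at another.
records = [
    {"store_id": "store-a", "card_id": f"card-{i}", "amount": 10.0}
    for i in range(6)
] + [{"store_id": "store-b", "card_id": "card-9", "amount": 42.5}]

print(aggregate_by_store(records))
```

The output describes stores rather than shoppers, which matches the stated purpose of the dataset: understanding how transactions relate to places, not to individuals.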

You can read more about the data sourcing process for all of our datasets here.

Does SafeGraph have an SDK?

SafeGraph does not have an SDK or any software available in app stores. 

How does SafeGraph protect consumer privacy when building its data?

From the beginning of our data sourcing process, SafeGraph never deals with personally identifiable information. The only data we see and work with is anonymized and aggregated movements of people who have opted in to sharing their location with our partner organizations. To further ensure consumer privacy, we apply differential privacy techniques within our methodology. This means that even for neighborhoods with smaller populations or places with fewer visitors, neither we nor our users can derive personally identifiable information from any of our data.

Does SafeGraph sell personally identifiable information (PII)?

SafeGraph does not provide device-level or PII data in any capacity, and instead focuses on volume aggregations by geography to show how populations move in relation to physical places. We chose this strategy because it ultimately leads to higher data quality and resiliency while protecting individual privacy.

SafeGraph Provides Open Access to Data

One of SafeGraph’s core values is ensuring data is not hoarded by a few companies or individuals. We believe data should be open to all who need or want to use it to power innovation and make the world better. This commitment to data accessibility has been a core pillar of SafeGraph since before our product strategy pivot in 2018.

We strive to provide an environment where data scientists, academics, and businesses can work together using location data to create new insights. Our community of over 15,000 data scientists collaborates on innovative ways to use location data for the betterment of society and improvements to business strategies.

SafeGraph’s Commitment to Transparency

At SafeGraph, we recognize that we’re just an ingredient in the larger solutions our clients are powering with data. To protect the integrity of those solutions and the decisions made from them, our number one priority is to be a trusted partner they can rely on. Our commitment to transparency builds trust among our users, and increases the usability and accessibility of our data, ultimately democratizing the use of reliable geospatial data for critical problem solving. These values will always be front and center in our strategy as we continue to grow.

If you’d like to learn more about SafeGraph data, check out our free datasets or schedule a demo with one of our experts. We’re here to help. 
