
Comparing SafeGraph and OpenStreetMap: The Hidden Cost of Free Data

July 13, 2022
by Briana Brown

When sourcing points of interest (POI) data, many organizations first look at free, open source options like OpenStreetMap (OSM) before choosing to pay a provider. We compared a sample of OSM data to the same query of SafeGraph data to assess differences in accessibility, coverage, completion, and usability. A sample of data from each provider was obtained for Dollar General stores in Little Rock, Arkansas to mimic the user experience of someone analyzing that specific market and brand.

TL;DR

  • Acquiring SafeGraph data for Dollar General store locations in Little Rock was more streamlined than searching multiple tools for a full dataset from OSM. However, OSM data is free, while SafeGraph data comes at a monetary cost.
  • SafeGraph data contained 100% of the locations reported on Dollar General's website for Little Rock, while the best query tool we could find for OSM data provided only 26% of store locations and in differing formats.
  • While SafeGraph and OpenStreetMap provide a similar number of attribute columns about each place (28 and 24 respectively), SafeGraph's fill rate for the given sample was 95.6% while OSM's was 39.8%. SafeGraph also provided transparent and open documentation about each column and potential fill rates, while OSM data did not come with accompanying documentation.
  • Conducting common POI workflows using each dataset to reflect the user experience, we found that SafeGraph data produced more comprehensive and trustworthy results based on the higher coverage and completion rates. Analysis with SafeGraph data also took less time due to the availability of transparent documentation outlining each field, and the easier acquisition process when compared to OSM.

Read about our analysis in depth below.

Comparing SafeGraph and OpenStreetMap

Organizations large and small are increasingly realizing the importance of geospatial data. Whether they are building a consumer-facing application guiding people on where to go, or developing an internal analytics tool that informs strategic decisions, product managers and engineers are turning to POI data as a key ingredient in their solutions. 

But these builders are tasked with more than just populating maps with points. They are responsible for powering a positive user experience and delivering accurate information for their end users to glean insights from. The ability to provide trustworthy data in apps, platforms, and tools is the most critical goal of a product builder, regardless of whether the end user is a consumer navigating to a store or a real estate developer choosing a new store location. It doesn’t matter how quickly or inexpensively a product is built if the end users can’t rely on it to solve their challenges.

That’s why product builders are frequently turning away from open source data. While open source data appears to be free, it does come at a cost when considering the negative effect poor data quality has on the user experience - not to mention the time and resources required to get open source data in a usable format. 

To better understand the differences in open source vs curated geospatial data, we conducted a study on SafeGraph and OpenStreetMap data. The goal of our study was to explore, quantify, and articulate both the technical differences between the two data sources and the impact using one over the other would have on common POI data workflows. 

Our study focuses on Dollar General Stores in Little Rock, Arkansas and uses the Dollar General website’s store locator as a source of truth to compare each dataset to. We assessed the following aspects of data quality in our research:

  • Accessibility: The time and number of steps needed to work with each data source, from acquiring the data to deriving results.
  • Coverage: The data’s ability to reflect real-world truth and the impact that ability has on performing common POI workflows and analysis.
  • Completeness: The level of attribution associated with individual POIs in each dataset and the impact that attribution has on adding context for visualizations and analytics.
  • Usability: The overall quality of each dataset as an input into location-based products, taking into consideration the need to clean, manipulate, or augment the data to be fit-for-purpose.

OpenStreetMap vs SafeGraph: Data Access

To begin our analysis of each data source, we examine how accessible SafeGraph and OpenStreetMap POI datasets are. 

Accessing SafeGraph data

From the SafeGraph website, anyone can get in contact with a data expert that will help guide them through the data procurement process. Data can be delivered through a CSV or directly to a warehouse or analytics environment such as AWS, Snowflake, or CARTO.

We worked with a CSV file of Dollar General locations in Little Rock, Arkansas from the SafeGraph July 2022 release. Records were filtered from the full release using the "brands," "city," and "region" columns. Seventeen POIs were identified with a "brands" value of Dollar General, a "city" value of Little Rock, and a "region" value of AR.
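To make the filtering step concrete, here is a minimal sketch of an exact-match filter on the three columns named above. The sample rows and the "location_name" column are made up for illustration; this is not the actual SafeGraph file or our exact query.

```python
import csv
import io

# Hypothetical sample rows; a real release contains many more columns and records.
SAMPLE_CSV = """location_name,brands,city,region
Dollar General,Dollar General,Little Rock,AR
Dollar General,Dollar General,North Little Rock,AR
Dollar Tree,Dollar Tree,Little Rock,AR
Dollar General,Dollar General,Little Rock,AR
"""


def filter_pois(csv_text, brand, city, region):
    """Return rows whose brands, city, and region columns match exactly."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if row["brands"] == brand
        and row["city"] == city
        and row["region"] == region
    ]


matches = filter_pois(SAMPLE_CSV, "Dollar General", "Little Rock", "AR")
print(len(matches))
```

An exact match on "city" is strict: a store whose record carries a neighboring city name is excluded even if it sits inside the market being studied.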

SafeGraph data for Dollar General stores in Little Rock, AR; July 2022; visualized in QGIS.

Accessing OpenStreetMap data

There are multiple programs available for extracting OSM data, and several of them were explored as options for this report.

The first tool we tested was the Humanitarian OpenStreetMap Team (HOT OSM) export tool. After identifying an area of interest and configuring the export to extract commercial shops, we produced a geopackage and visualized it in QGIS. Filtering the point and polygon results down to Dollar General stores left one point and three polygons. Neither the point nor the polygon file had significant attribution, so we decided to look elsewhere.

BBBike is an alternative tool for extracting OSM data. It extracts all OSM data present for a given area, so the results included buildings, places, waterways, railroads, roads, and natural features. The "buildings" shapefile proved to be the most useful, but it included only two attributes (“name” and “type”), and many of their values were missing. A filter for Dollar General in the “name” field resulted in three polygons.

Ultimately, the best tool for extracting the appropriate data proved to be Overpass Turbo. Once the area of Little Rock, AR was selected, the query wizard was used to generate statements to extract nodes (points), ways (lines), and relations tagged with the appropriate attribution (“Dollar General”). Results yielded two Dollar General point locations in the area of interest and three polygons, for a total of five distinct Dollar General locations. The field list included details like shop type, street address, business hours, website, and phone number.
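For readers who have not used Overpass before, the sketch below builds the kind of Overpass QL statement the query wizard produces. It is an approximation, not the exact query we ran: the choice of the "name" tag, the bounding box coordinates, and the function name are all assumptions made for illustration.

```python
def build_overpass_query(name, bbox, timeout=25):
    """Build an Overpass QL query for nodes, ways, and relations with a name tag.

    bbox is (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bbox
    area = f"({south},{west},{north},{east})"
    selectors = "\n".join(
        f'  {element}["name"="{name}"]{area};'
        for element in ("node", "way", "relation")
    )
    # "out center" asks for a single center coordinate for ways and relations.
    return (
        f"[out:json][timeout:{timeout}];\n"
        f"(\n{selectors}\n);\n"
        "out center;"
    )


# Rough, illustrative bounding box around Little Rock, AR (an assumption).
LITTLE_ROCK_BBOX = (34.62, -92.52, 34.82, -92.15)
query = build_overpass_query("Dollar General", LITTLE_ROCK_BBOX)
print(query)
```

The printed text can be pasted into the Overpass Turbo editor. A bounding box is used here for simplicity; selecting the city as a named area is another option.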

OpenStreetMap data for Dollar General stores in Little Rock, AR; July 2022; visualized in the Overpass Turbo user interface.

OSM vs SafeGraph data access comparison results

In terms of acquisition, the SafeGraph process is much more streamlined than the OSM extraction workflow. We had to try multiple tools to identify the best source of OSM data, and each one has a learning curve for new users. The price to acquire the data from SafeGraph is marginal, especially when considering the time spent working with Overpass Turbo, HOT OSM, and/or BBBike.

It is also notable that the SafeGraph data can be easily visualized as both points and polygons, because the attribution for each point includes geometry (delivered in a column as well-known text, or WKT). Both point and polygon data can be exported from OSM as well, but the two sets of results do not necessarily align and vary in completeness, so more time must be spent searching for and cleaning the data needed for the final output.

In terms of accessibility, SafeGraph data proved to be easier and more efficient to acquire than OSM POI data. While SafeGraph data did come at a monetary cost, it took far less time to search for and download than the OSM data, which was also less complete.

OpenStreetMap vs SafeGraph: Data Coverage

To measure and compare the coverage of POI data from SafeGraph and OSM, we use Dollar General’s online store locator as a source of truth. When searching the store locator for "Little Rock, Arkansas" using their default geographic filter of 10 miles, 27 POIs appear on their map.

Dollar General store locator results for Little Rock, AR; July 2022.

Because the SafeGraph and OSM data were filtered and acquired using the city name of Little Rock rather than a radius like the Dollar General website, we then filtered the store locator records by city name. Within the default 10-mile radius, fifteen POIs had an address string that included Little Rock as the city name.

SafeGraph data coverage

Comparing the SafeGraph data for Dollar General stores in Little Rock, we were able to easily match 13 of the 17 SafeGraph POIs to locations on Dollar General’s website. For the remaining four Dollar General POIs cited in the SafeGraph data, we did some digging to identify any discrepancies.

Four additional POIs identified in SafeGraph’s data that were not initially matched via Dollar General’s store locator.

After researching further, we were able to see that all 17 of the SafeGraph Dollar General POIs were indeed listed as operational stores on Dollar General’s store locator. Three did not originally show up because they fall outside of the default 10 mile radius, and one had a different city name listed on Dollar General’s site.

Additionally, two store locations listed on Dollar General’s website with a city name of Little Rock were not included in the SafeGraph dataset we downloaded. We investigated and found that similar discrepancies in how the city name is attributed were the reason. The SafeGraph database does include all Dollar General POIs in Little Rock as listed on the site’s store locator, but our city-name filter did not capture them all in the initial download.

Two additional POIs identified with Dollar General's store locator were not in the initial SafeGraph data download due to differences in address strings.

OpenStreetMap data coverage

Of the OSM data acquisition methods used, Overpass Turbo provided the most Dollar General locations, with the results including two points and three polygons for a total of five stores.  All five stores were present on Dollar General’s store locator site.

Comparing these results to the 19 Dollar General locations identified on the company website (the 17 identified by SafeGraph plus the remaining two not included in our original SafeGraph query), we can see that OpenStreetMap’s data covers only 26% (5 of 19) of the Dollar General POIs in Little Rock.
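The coverage figure is simple arithmetic, sketched below with placeholder store IDs standing in for the 19 locations on the store locator. The IDs and the function name are illustrative; only the counts come from our study.

```python
def coverage(found_ids, reference_ids):
    """Share of reference locations that also appear in the dataset."""
    reference = set(reference_ids)
    return len(set(found_ids) & reference) / len(reference)


# Placeholder IDs standing in for the 19 stores on the store locator.
reference_stores = {f"store_{i}" for i in range(1, 20)}
osm_stores = {f"store_{i}" for i in range(1, 6)}  # 5 found via Overpass Turbo
safegraph_stores = set(reference_stores)  # all 19 after reconciliation

print(f"OSM coverage: {coverage(osm_stores, reference_stores):.1%}")
print(f"SafeGraph coverage: {coverage(safegraph_stores, reference_stores):.1%}")
```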

OSM vs SafeGraph data coverage comparison results

SafeGraph data was found to be much more comprehensive than OSM for Dollar General stores in Little Rock. Not only are all store locations represented in the SafeGraph dataset compared to only 26% in OSM, but the SafeGraph data includes both a point and a polygon for each POI. The OSM data is a mix of the two geometry types, which is not ideal for users looking to develop uniform visualizations or analytics tools.

Interactive map: SafeGraph and OSM point data for Dollar General stores in Little Rock.

Zoom in to see differences in polygon coverage between SafeGraph and OSM. For OSM features where queries only returned polygons, we generated points for an easier visualization of the differences between the two providers.

OpenStreetMap vs SafeGraph: Data Completion

While locating points on a map is a critical function, much of the value from POI data lies in the attributes or columns associated with each location or row. Now that we have acquired the Dollar General POI data and measured its coverage of store locations, we will assess the level of detail associated with each location through the provided data attributes.

SafeGraph data fill rate

SafeGraph provides 28 columns of attributes for each location record or row. For the 17 Dollar General POIs identified by the original query, only 21 of the 476 attributes were incomplete (meaning the data was 95.6% complete). Some did contain "NULL" values, but those are accounted for in the product documentation (for example, a value of "NULL" in the “closed_on” field indicates that the business has not yet closed). The attribution for all Dollar General stores was almost entirely complete and thorough, including not just location information, but phone numbers, business hours, open dates, category, and other relevant details about that place. SafeGraph also provides clear reasons why some fields are not populated, for example erring on the side of caution rather than sharing false information about a place.
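The fill rate above can be checked with a few lines of arithmetic, and the same idea generalizes to any table of rows. The sketch below is illustrative only: the helper name and the toy rows are made up, and it treats every empty cell as missing, which may differ from how documented "NULL" values were counted in our study.

```python
def fill_rate(rows, columns):
    """Fraction of cells that hold a non-empty value."""
    total = len(rows) * len(columns)
    filled = sum(
        1
        for row in rows
        for col in columns
        if row.get(col) not in (None, "")
    )
    return filled / total


# Reproducing the figure in the text: 17 POIs x 28 columns, 21 empty cells.
total_cells = 17 * 28
reported_rate = (total_cells - 21) / total_cells
print(total_cells, f"{reported_rate:.1%}")

# Tiny made-up example showing the function on row data.
toy_rows = [
    {"name": "Dollar General", "phone": "", "hours": "8-22"},
    {"name": "Dollar General", "phone": None, "hours": ""},
]
print(fill_rate(toy_rows, ["name", "phone", "hours"]))
```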

SafeGraph data attributes for Dollar General stores in Little Rock, AR; July 2022; visualized in QGIS.

OpenStreetMap data fill rate

In terms of completion, the OSM data included a total of 24 fields, and none of the five features were entirely complete. Among the five of them, over half the fields were left empty, for a completion rate of 39.8%. The attribution for these features is sporadically complete; some features included addresses, websites, phone numbers, and open hours, but most were nearly void of attribution. OSM data does not provide documentation to explain why some of these fields contain "NULL" values.

OSM data attributes for Dollar General stores in Little Rock, AR; July 2022; visualized in QGIS.

OSM vs SafeGraph data fill rate comparison results

Overall, the number of columns in both datasets is comparable, in the sense that both SafeGraph and OSM have fields for details of the business: name, phone, website, open hours, and street address. However, because OSM is a crowd-sourced database, it is left up to chance whether or not these fields are filled in. Some features were complete with much of this attribution, but most of them had significant gaps. In contrast, SafeGraph had a 95.6% completion rate, providing more context about each location than the OSM data. The open accessibility and transparency of SafeGraph's documentation also made it much easier to identify why certain fields were incomplete, whereas no context around data completion is available for OSM data.

OpenStreetMap vs SafeGraph: Usability

Once we compared the level of detail provided by each dataset, we set out to determine which one was more fit-for-purpose for common POI-related workflows. Using both the SafeGraph and OSM datasets, we conducted hot spot, proximity, and trade area analyses. These methods are often baked into tools in location-based analytics platforms that perform site selection, competitive intelligence, visit attribution, and other key functions. The quality of the data baked into a tool can drastically impact the output of these functions, so we set out to see how using SafeGraph data versus OSM data affected the results.

The following technical analyses were performed to determine the value of each dataset to common workflows:

  • Hot spot analysis was conducted using kernel density estimation in QGIS, which statistically calculates areas with concentrations of a specific feature.
  • Proximity analysis was performed using an aggregated distance matrix generated from both point and hexagonal grids overlaid on the area of interest in QGIS. Each hexagon was then symbolized by color based on proximity to a Dollar General location.
  • Trade area analysis was conducted using Voronoi cells in QGIS, polygons whose size depends on the spatial distribution of features.

SafeGraph vs OpenStreetMap data usability

Hot spot analysis

The hot spot analysis conducted on the 17 SafeGraph features from the original query identified one significant cluster in the center of the city of Little Rock. With this information surfaced in a business intelligence tool, the company could decide to construct or eliminate Dollar General stores based on existing hot spots. Similarly, a competitor could choose to open a store in an underserved area.
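To illustrate what kernel density estimation does with point data, here is a minimal sketch using a quartic kernel, one common kernel shape. The kernel, the search radius, and the coordinates are assumptions for illustration; they are not the settings or locations from our QGIS analysis.

```python
import math


def quartic_density(x, y, points, radius):
    """Unnormalized kernel density at (x, y) using a quartic kernel.

    Each point within `radius` contributes (1 - (d / radius)^2)^2,
    where d is the planar distance to that point.
    """
    total = 0.0
    for px, py in points:
        d = math.hypot(x - px, y - py)
        if d < radius:
            total += (1 - (d / radius) ** 2) ** 2
    return total


# Made-up projected coordinates (e.g. kilometers), not real store locations.
stores = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (6.0, 6.0)]
cluster = quartic_density(0.0, 0.0, stores, radius=2.0)   # inside the cluster
isolated = quartic_density(6.0, 6.0, stores, radius=2.0)  # at the lone store
gap = quartic_density(3.0, 3.0, stores, radius=2.0)       # between the two
print(cluster, isolated, gap)
```

Locations surrounded by several nearby points score higher than locations near a single point, which is what makes a cluster show up as a hot spot. With only a handful of points, most of the surface stays at or near zero.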

In order to conduct the hot spot analysis with the OSM data, the polygon features needed to be converted into points and merged with the existing point data. However, even then, there were not enough features in the dataset to conduct a reliable analysis using kernel density estimation, so a manual hot spot analysis had to be conducted instead. The results identified no significant hot spots that could be used for decision making.
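Converting polygons to points usually means taking each polygon's centroid. The sketch below shows one way to do that with the standard area-weighted centroid formula. It is a simplified stand-in for the QGIS conversion we used, and the footprint coordinates are made up.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if needed
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area), and 6 * area equals 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)


# Made-up building footprint and existing point, in planar coordinates.
footprint = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
existing_points = [(10.0, 10.0)]
merged_points = existing_points + [polygon_centroid(footprint)]
print(merged_points)
```

The function assumes a simple polygon with non-zero area; a degenerate footprint would need separate handling.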

Kernel density estimation analysis with SafeGraph data vs OSM data; July 2022; visualized in QGIS.

Proximity analysis

To measure the proximity of Dollar General stores to various parts of Little Rock, statistics were calculated on hexagons that define distance values. According to the SafeGraph data, there is nowhere in the city of Little Rock that is more than 5.9 miles away from a Dollar General store location; on average, a Dollar General store is within 1.9 miles of any given point in Little Rock.

Using the same proximity analysis methodology with the OSM data, we found that, on average, a Dollar General store is within 3.51 miles of any given point within the city of Little Rock. Additionally, the furthest an individual can be from a Dollar General store is 9.97 miles. These distances differ substantially from those generated using SafeGraph data and show how the data used as input to a product or tool can drastically change the results (and therefore the overall user experience).
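The average and maximum distances come from measuring, for each sample location, the distance to its nearest store. Here is a minimal sketch of that calculation using great-circle distance. The sample points stand in for hexagon centers, and all coordinates are made up; this is not the QGIS distance matrix workflow itself.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def proximity_stats(sample_points, stores):
    """Mean and maximum distance from each sample point to its nearest store."""
    nearest = [
        min(haversine_miles(lat, lon, s_lat, s_lon) for s_lat, s_lon in stores)
        for lat, lon in sample_points
    ]
    return sum(nearest) / len(nearest), max(nearest)


# Made-up coordinates: three sample points on one meridian and a single store.
sample_points = [(34.70, -92.30), (34.75, -92.30), (34.80, -92.30)]
stores = [(34.70, -92.30)]
mean_miles, max_miles = proximity_stats(sample_points, stores)
print(round(mean_miles, 2), round(max_miles, 2))
```

Removing stores from the input can only increase each nearest-store distance, which is why a dataset with missing locations produces larger averages and maximums.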

Aggregated distance matrix analysis with SafeGraph data vs OSM data; July 2022; visualized in QGIS.

Trade area analysis

The final analysis conducted generated service areas for each Dollar General location with Voronoi polygons. Using the SafeGraph data originally queried for, 17 service areas were generated for Dollar General that extend even beyond the Little Rock city limits.

Because of the limited number of POIs in the OSM data, the resulting service areas generated with that data do not cover the entirety of the city of Little Rock. Additionally, because the Dollar General store locations from this dataset were more dispersed, without the presence of hot spots, the range of sizes of the Voronoi cells was much smaller. A user relying on the trade areas generated from the OSM data would risk misallocating resources based on false information and would not be taking into consideration cannibalization from other Dollar General stores.
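A Voronoi cell is simply the set of locations closer to one store than to any other. The sketch below approximates that on a grid by assigning each cell to its nearest store and counting cells per store. The store names, positions, and grid size are made up, and this rasterized approach is a simplification of the Voronoi polygons generated in QGIS.

```python
from collections import Counter


def nearest_store(point, stores):
    """Name of the store closest to `point` (planar squared distance)."""
    px, py = point
    return min(
        stores,
        key=lambda name: (stores[name][0] - px) ** 2 + (stores[name][1] - py) ** 2,
    )


def trade_area_sizes(stores, width, height):
    """Count unit grid cells assigned to each store (a rasterized Voronoi diagram)."""
    cells = ((x + 0.5, y + 0.5) for x in range(width) for y in range(height))
    return Counter(nearest_store(cell, stores) for cell in cells)


# Made-up store positions on a 10 x 10 grid.
two_stores = {"A": (2.0, 5.0), "B": (8.0, 5.0)}
three_stores = {**two_stores, "C": (4.0, 5.0)}

sizes_two = trade_area_sizes(two_stores, 10, 10)
sizes_three = trade_area_sizes(three_stores, 10, 10)
print(dict(sizes_two))
print(dict(sizes_three))
```

When store C is left out of the input, the areas credited to A and B are larger than they should be, which is how missing locations hide cannibalization between nearby stores.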

Voronoi polygon analysis with SafeGraph data vs OSM data; July 2022; visualized in QGIS.

OSM vs SafeGraph data quality comparison results

Overall, SafeGraph data appears to be very nearly complete and accurate, accounting for 100% of the stores that Dollar General itself reports on its website. The attribution for all features was 95.6% complete and thorough, including not just location information, but phone numbers, business hours, and opening dates. It was sufficient to conduct the three chosen analyses, and according to all three analytical outputs, there is a cluster of Dollar General stores in the center of the city.

Approximately 74% of the Dollar General locations in the city (14 of 19) were missing from the OSM dataset, and the acquisition process was lengthy and disjointed. Additionally, the attribution for these features is sporadically complete; some features included addresses, websites, phone numbers, and open hours, but most were nearly void of attribution entirely. The results of the analyses did not identify any hot spots or discrepancies in service region size. However, because of these data gaps, very little confidence can be placed in the results of the analyses. Products built with the OSM data would require supplemental POI sources to ensure the output of their users' analyses is reliable and trustworthy.

While this study only compares SafeGraph and OSM data for one brand in one geographic location, the results can be used to infer what a larger analysis and the subsequent results would look like. This research was intended to recreate the user experience of someone performing specific analyses on a particular brand in a defined geographic region, but future comparisons could explore different markets, brands, or place categories to see how the two providers differ in other scenarios.

Our overall recommendation? If you are looking for free data, OpenStreetMap is available and will allow you to put points on a map. But if you are looking to cost-effectively reflect real-world truth and embed reliable data into your products, SafeGraph is the right provider for you.

Open source data may be free, but our analysis of its accessibility, coverage, completeness, and overall usability shows that there is a true hidden cost. Ready to get started with clean, accurate, and up-to-date data? Get in touch with the SafeGraph team. We’re here to help.
