When sourcing points of interest (POI) data, many organizations first look at free, open source options like OpenStreetMap (OSM) before choosing to pay a provider. We compared a sample of OSM data to the same query of SafeGraph data to assess differences in accessibility, coverage, completion, and usability. A sample of data from each provider was obtained for Dollar General stores in Little Rock, Arkansas to mimic the user experience of someone analyzing that specific market and brand.
Read about our analysis in depth below.
Organizations large and small are increasingly realizing the importance of geospatial data. Whether they are building a consumer-facing application guiding people on where to go, or developing an internal analytics tool that informs strategic decisions, product managers and engineers are turning to POI data as a key ingredient in their solutions.
But these builders are tasked with more than just populating maps with points. They are responsible for powering a positive user experience and delivering accurate information that their end users can draw insights from. Providing trustworthy data in apps, platforms, and tools is the most critical goal of a product builder, regardless of whether the end user is a consumer navigating to a store or a real estate developer choosing a new store location. It doesn’t matter how quickly or inexpensively a product is built if the end users can’t rely on it to solve their challenges.
That’s why product builders are frequently turning away from open source data. While open source data appears to be free, it does come at a cost when considering the negative effect poor data quality has on the user experience, not to mention the time and resources required to get open source data into a usable format.
To better understand the differences in open source vs curated geospatial data, we conducted a study on SafeGraph and OpenStreetMap data. The goal of our study was to explore, quantify, and articulate both the technical differences between the two data sources and the impact using one over the other would have on common POI data workflows.
Our study focuses on Dollar General stores in Little Rock, Arkansas and uses the Dollar General website’s store locator as the source of truth against which each dataset is compared. We assessed four aspects of data quality in our research: accessibility, coverage, completion, and usability.
To begin our analysis of each data source, we examine how accessible SafeGraph and OpenStreetMap POI datasets are.
From the SafeGraph website, anyone can get in contact with a data expert that will help guide them through the data procurement process. Data can be delivered through a CSV or directly to a warehouse or analytics environment such as AWS, Snowflake, or CARTO.
We worked with a CSV file of Dollar General locations in Little Rock, Arkansas from the SafeGraph July 2022 release. Records were filtered from the full release using the "brands," "city," and "region" columns. Seventeen POIs were identified with a "brands" value of Dollar General, "city" value of Little Rock, and "region" value of AR.
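The three-column filter described above can be sketched with Python's standard library. This is a minimal illustration, not the exact procedure used in the study: only the "brands," "city," and "region" column names come from the text, while the `location_name` column and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical sample rows for illustration; only the "brands", "city",
# and "region" column names are taken from the text.
SAMPLE_CSV = """location_name,brands,city,region
Dollar General,Dollar General,Little Rock,AR
Dollar General,Dollar General,North Little Rock,AR
Family Dollar,Family Dollar,Little Rock,AR
Dollar General,Dollar General,Little Rock,CA
"""


def filter_pois(csv_text, brand, city, region):
    """Return rows whose brands, city, and region columns match exactly."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if row["brands"] == brand
        and row["city"] == city
        and row["region"] == region
    ]


matches = filter_pois(SAMPLE_CSV, "Dollar General", "Little Rock", "AR")
print(len(matches))
```

Because the comparison on "city" is an exact string match, a store whose record carries a neighboring city name is excluded even if it sits close to the city limits.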
There are multiple programs available for extracting OSM data, and several of them were explored as options for this report.
First to be tested was the Humanitarian OpenStreetMap Team (HOT OSM) export tool. By identifying an area of interest and setting the configurations to extract commercial shops, a geopackage was produced and visualized in QGIS. The output, both point and polygon, was filtered down to include only Dollar General stores, leaving one point and three polygons. Neither the point nor the polygon file had significant attribution, so we decided to look elsewhere.
BBBike is an alternate tool for extracting OSM data. This tool extracts all OSM data present for a given area, so the results included buildings, places, waterways, railroads, roads, and natural features. The "buildings" shapefile proved to be the most useful, but it included only two attributes (“name” and “type”), and many of their values were missing. A filter for Dollar General in the “name” field resulted in three polygons.
Ultimately, the best tool for extracting the appropriate data proved to be Overpass Turbo. Once the area of Little Rock, AR was selected, the query wizard was used to generate statements to extract nodes (points), ways (lines), and relations tagged with the appropriate attribution (“Dollar General”). Results yielded two Dollar General point locations in the area of interest and three polygons, for a total of five distinct Dollar General locations. The field list included details like shop type, street address, business hours, website, and phone number.
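For readers unfamiliar with Overpass, a query of the kind described above might look like the one built by the sketch below. The exact statements generated by the query wizard are not reproduced in this article, so this is an assumption: it matches on the `name` tag (OSM features may instead carry the brand in a `brand` tag) and looks up the search area by name, which can match more than one place called Little Rock. The helper name `build_overpass_query` is hypothetical, and the code only builds the query text without sending it anywhere.

```python
def build_overpass_query(area_name, name_value):
    """Build Overpass QL text that selects nodes, ways, and relations by name tag."""
    return (
        "[out:json][timeout:25];\n"
        # Look up the search area by its name tag (an assumption; this may
        # match several areas that share the same name).
        f'area["name"="{area_name}"]->.searchArea;\n'
        "(\n"
        f'  node["name"="{name_value}"](area.searchArea);\n'
        f'  way["name"="{name_value}"](area.searchArea);\n'
        f'  relation["name"="{name_value}"](area.searchArea);\n'
        ");\n"
        # "out center" adds a single center coordinate to ways and relations.
        "out center;\n"
    )


query = build_overpass_query("Little Rock", "Dollar General")
print(query)
```

The printed text can be pasted into the Overpass Turbo editor. Requesting `out center` is one way to get a point for features that are mapped only as building outlines.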
In terms of acquisition, the SafeGraph process is much more streamlined than the OSM extraction workflow; multiple different tools were used to extract OSM data in order to identify the best source, and each one has a bit of a learning curve for new users. The price to acquire the data from SafeGraph is marginal, especially when considering time spent utilizing Overpass Turbo, HOT OSM, and/or BBBike.
It is also notable that the SafeGraph data can be easily visualized as both points and polygons, because the attribution for each point includes geometry (delivered in a column as well-known text, or WKT). Both point and polygon data can be exported from OSM as well, but the two sets of results do not necessarily align and vary in completeness, so more time must be spent searching for and cleaning the data needed for the final output.
In terms of accessibility, SafeGraph data proved to be easier and more efficient to acquire than OSM POI data. While SafeGraph data did come at a monetary cost, searching for and downloading the OSM POIs took significantly more time and yielded less complete data.
To measure and compare the coverage of POI data from SafeGraph and OSM, we use Dollar General’s online store locator as a source of truth. When searching the store locator for "Little Rock, Arkansas" using their default geographic filter of 10 miles, 27 POIs appear on their map.
Because the SafeGraph and OSM data is being filtered and acquired using the city name of Little Rock and not a radius like the Dollar General website, we then filtered the store locator records by city name. Within the default 10-mile radius, fifteen POIs had an address string that included Little Rock as the city name.
Comparing the SafeGraph data for Dollar General stores in Little Rock, we were able to easily match 13 of the 17 SafeGraph POIs to locations on Dollar General’s website. For the remaining four Dollar General POIs cited in the SafeGraph data, we did some digging to identify any discrepancies.
After researching further, we were able to see that all 17 of the SafeGraph Dollar General POIs were indeed listed as operational stores on Dollar General’s store locator. Three did not originally show up because they fall outside of the default 10 mile radius, and one had a different city name listed on Dollar General’s site.
Additionally, two store locations with a city name of Little Rock did appear on Dollar General’s website that were not included in the SafeGraph dataset. We did another investigation to see why and found that similar discrepancies in how the city name is attributed were the reason. The SafeGraph database does include all Dollar General POIs in Little Rock as listed on the site’s store locator, but our data acquisition filtering methodology required us to investigate a little further.
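The matching exercise described above amounts to a set comparison between two lists of store addresses. A minimal sketch follows; the function names and the sample addresses are hypothetical, and the simple normalization shown here would not resolve a store that is listed under a different city name in each source.

```python
def normalize(address):
    """Lowercase, drop commas, and collapse whitespace so near-identical strings match."""
    return " ".join(address.lower().replace(",", " ").split())


def reconcile(dataset_addresses, reference_addresses):
    """Split addresses into matched, dataset-only, and reference-only sets."""
    dataset = {normalize(a) for a in dataset_addresses}
    reference = {normalize(a) for a in reference_addresses}
    return {
        "matched": dataset & reference,
        "dataset_only": dataset - reference,
        "reference_only": reference - dataset,
    }


# Hypothetical addresses, for illustration only.
dataset = [
    "100 Main St, Little Rock",
    "200 Oak Ave,  Little Rock",
    "300 Pine Rd, Little Rock",
]
reference = [
    "100 main st, little rock",
    "200 Oak Ave, Little Rock",
    "400 Elm St, Little Rock",
]

result = reconcile(dataset, reference)
for bucket, addresses in result.items():
    print(bucket, sorted(addresses))
```

Records that land in the dataset-only or reference-only buckets are the ones that need the kind of manual follow-up described above, such as checking the search radius or the city name on each record.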
Of the OSM data acquisition methods used, Overpass Turbo provided the most Dollar General locations, with the results including two points and three polygons for a total of five stores. All five stores were present on Dollar General’s store locator site.
Comparing these results to the 19 Dollar General locations identified on the company website (the 17 identified by SafeGraph plus the remaining two not included in our original SafeGraph query), we can see that OpenStreetMap covers only 26% (5 of 19) of the Dollar General POIs in Little Rock. The five OSM locations themselves were all confirmed on the store locator; the shortfall is in coverage.
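The counts reported in this section can be tallied in a few lines. All numbers below come from the text; the variable names are ours.

```python
# SafeGraph POIs, broken down as described in the text.
matched_directly = 13        # matched to the store locator right away
outside_default_radius = 3   # beyond the locator's default 10-mile radius
different_city_name = 1      # listed under another city name on the locator
safegraph_total = matched_directly + outside_default_radius + different_city_name

# Stores on the locator that the original SafeGraph query did not return.
missed_by_query = 2
reference_total = safegraph_total + missed_by_query

# OSM features returned by Overpass Turbo (two points plus three polygons).
osm_total = 2 + 3
osm_coverage = osm_total / reference_total

print(safegraph_total, reference_total)  # 17 19
print(f"{osm_coverage:.1%}")             # 26.3%
```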
SafeGraph data was found to be much more comprehensive than OSM for Dollar General stores in Little Rock. Not only are all store locations represented in the SafeGraph dataset compared to only 26% in OSM, but the SafeGraph data includes both a point and a polygon for each POI. The OSM data is a mix of the two geometry types, which is not ideal for users looking to develop uniform visualizations or analytics tools.
[Interactive map: SafeGraph and OSM point data for Dollar General stores in Little Rock.]
[Interactive map: SafeGraph and OSM polygon coverage. For OSM features where queries only returned polygons, we generated points for an easier visualization of the differences between the two providers.]
While locating points on a map is a critical function, much of the value from POI data lies in the attributes or columns associated with each location or row. Now that we have acquired the Dollar General POI data and measured its coverage of store locations, we will assess the level of detail associated with each location through the provided data attributes.
SafeGraph provides 28 columns of attributes for each location record or row. For the 17 Dollar General POIs identified by the original query, only 21 of the 476 attributes were incomplete (meaning the data was 95.6% complete). Some did contain "NULL" values, but those are accounted for in the product documentation (for example, a value of "NULL" in the “closed_on” field indicates that the business has not yet closed). The attribution for all Dollar General stores was almost entirely complete and thorough, including not just location information, but phone numbers, business hours, open dates, category, and other relevant details about that place. SafeGraph also provides clear reasons why some fields are not populated, for example erring on the side of caution rather than sharing false information about a place.
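The completion rate quoted above is the share of populated cells across all records and columns. The sketch below reproduces the arithmetic from the stated counts and shows one way to compute the same measure from records. The `completion_rate` helper and the toy records are hypothetical, and the helper treats every empty or missing value as incomplete, which may differ from how documented "NULL" values were handled in the study.

```python
def completion_rate(records, fields):
    """Share of cells that are populated, treating None and "" as missing."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in (None, "")
    )
    return filled / total


# Arithmetic from the counts stated in the text.
safegraph_cells = 28 * 17                    # columns x POIs
safegraph_rate = (safegraph_cells - 21) / safegraph_cells
print(safegraph_cells)                       # 476
print(f"{safegraph_rate:.1%}")               # 95.6%

# Toy records, for illustration only.
toy_records = [
    {"name": "Store A", "phone": ""},
    {"name": "Store B", "phone": "555-0100"},
]
toy_rate = completion_rate(toy_records, ["name", "phone"])
print(toy_rate)
```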
In terms of completion, the OSM data included a total of 24 fields, and none of the five features were entirely complete. Among the five of them, over half the fields were left empty, for a completion rate of 39.8%. The attribution for these features is sporadically complete; some features included addresses, websites, phone numbers, and open hours, but most were nearly void of attribution. OSM data does not provide documentation to explain why some of these fields contain "NULL" values.
Overall, the number of columns in both datasets is comparable, in the sense that both SafeGraph and OSM have fields for details of the business: name, phone, website, open hours, and street address. However, because OSM is a crowd-sourced database, it is left up to chance whether or not these fields are filled in. Some features were complete with much of this attribution, but most of them had significant gaps. In contrast, SafeGraph had a 95.6% completion rate, providing more context about each location than the OSM data. The open accessibility and transparency of SafeGraph's documentation also made it much easier to identify why certain fields were incomplete, whereas no context around data completion is available for OSM data.
Once we compared the level of detail provided by each dataset, we set out to determine which one was more fit-for-purpose for common POI-related workflows. Using both the SafeGraph and OSM datasets, we conducted hot spot, proximity, and trade area analyses. These methods are often baked into tools in location-based analytics platforms that perform site selection, competitive intelligence, visit attribution, and other key functions. The quality of the data behind a tool can drastically change the output of these functions, so we set out to see how much the results differ when SafeGraph data is used instead of OSM data.
Three technical analyses were performed to determine the value of each dataset to common workflows: a hot spot analysis, a proximity analysis, and a trade area (service area) analysis.
The hot spot analysis conducted on the 17 SafeGraph features from the original query identified one significant cluster in the center of the city of Little Rock. With this information available in a business intelligence tool, the company could decide to construct or eliminate Dollar General stores based on existing hot spots. Similarly, a competitor could choose to open a store in an underserved area.
In order to conduct the hot spot analysis with the OSM data, the polygon features needed to be converted into points and merged with the existing point data. However, even then, there were not enough features in the dataset to conduct a reliable analysis using kernel density estimation, so a manual hot spot analysis had to be conducted instead. The results identified no significant hot spots that could be used for decision making.
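The article does not say how the OSM polygons were converted into points. One common choice, assumed here, is the area-weighted centroid of each building footprint. The sketch below treats coordinates as planar, which is a reasonable approximation for building-sized shapes; the function name and the sample rectangle are hypothetical.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if the caller left it open

    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical rectangular footprint, for illustration only.
footprint = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
centroid = polygon_centroid(footprint)
print(centroid)  # (2.0, 1.0)
```

Once every polygon is reduced to a point this way, the points can be merged with the features that were already mapped as nodes.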
To measure the proximity of Dollar General stores to various parts of Little Rock, statistics were calculated on hexagons that define distance values. According to the SafeGraph data, there is nowhere in the city of Little Rock that is more than 5.9 miles away from a Dollar General store location; on average, any point in Little Rock is within 1.9 miles of a Dollar General store.
Applying the same proximity methodology to the OSM data, we found that, on average, any given point within the city of Little Rock is within 3.51 miles of a Dollar General store, and the furthest an individual can be from a store is 9.97 miles. These distances are nearly double those generated using SafeGraph data, and reflect how the data used as input to a product or tool can drastically change the results (and therefore the overall user experience).
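A proximity summary of this kind can be sketched as follows: for each grid cell, find the distance to the nearest store, then report the mean and the maximum across cells. This is an illustration under stated assumptions rather than the study's implementation. It uses great-circle (haversine) distance, stands in plain cell-center coordinates for the hexagons, and all coordinates shown are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def proximity_stats(cell_centers, stores):
    """Return (mean, max) of each cell's distance to its nearest store."""
    nearest = [
        min(haversine_miles(clat, clon, slat, slon) for slat, slon in stores)
        for clat, clon in cell_centers
    ]
    return sum(nearest) / len(nearest), max(nearest)


# Hypothetical cell centers and store location, for illustration only.
cells = [(34.70, -92.30), (34.80, -92.30)]
stores = [(34.70, -92.30)]
mean_miles, max_miles = proximity_stats(cells, stores)
print(round(mean_miles, 2), round(max_miles, 2))
```

With fewer stores in the input, each cell's nearest store is further away on average, so both statistics grow as coverage drops.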
The final analysis generated service areas for each Dollar General location with Voronoi polygons. Using the SafeGraph data from the original query, 17 service areas were generated for Dollar General that extend even beyond the Little Rock city limits.
Because of the limited number of POIs in the OSM data, the resulting service areas generated with that data do not cover the entirety of the city of Little Rock. Additionally, because the Dollar General store locations from this dataset were more dispersed, without the presence of hot spots, the range of sizes of the Voronoi cells was much smaller. A user relying on the trade areas generated from the OSM data would risk misallocating resources based on false information and failing to account for cannibalization from other Dollar General stores.
Overall, SafeGraph data appears to be very nearly complete and accurate, accounting for 100% of the stores that Dollar General itself reports on its website. The attribution for all features was 95.6% complete and thorough, including not just location information, but phone numbers, business hours, and opening dates. It was sufficient to conduct the three chosen analyses, and according to all three analytical outputs, there is a cluster of Dollar General stores in the center of the city.
Approximately 74% (14 of 19) of the Dollar General locations in the city were missing from the OSM dataset, and the acquisition process was lengthy and disjointed. Additionally, the attribution for these features is sporadically complete; some features included addresses, websites, phone numbers, and open hours, but most were nearly void of attribution entirely. The results of the analyses did not identify any hot spots or discrepancies in service region size. However, because of these data gaps, very little confidence can be placed in the results of the analyses. Products built with the OSM data would require supplemental POI sources to ensure the output of their users' analyses is reliable and trustworthy.
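A Voronoi service area is simply the set of locations that are closer to one store than to any other. The sketch below approximates that idea on a grid by assigning each sample point to its nearest site and counting samples per site. It is not an exact polygon construction such as a GIS tool would produce; the function names and coordinates are hypothetical, and distances are planar.

```python
def nearest_site(point, sites):
    """Index of the site closest to point, using squared planar distance."""
    px, py = point
    return min(
        range(len(sites)),
        key=lambda i: (sites[i][0] - px) ** 2 + (sites[i][1] - py) ** 2,
    )


def discrete_voronoi(sites, xmin, ymin, xmax, ymax, steps):
    """Count grid samples per site, approximating each cell's share of the area."""
    counts = [0] * len(sites)
    for i in range(steps):
        for j in range(steps):
            # Sample at the center of each grid square.
            x = xmin + (i + 0.5) * (xmax - xmin) / steps
            y = ymin + (j + 0.5) * (ymax - ymin) / steps
            counts[nearest_site((x, y), sites)] += 1
    return counts


# Two hypothetical sites placed symmetrically in a unit square.
sites = [(0.25, 0.5), (0.75, 0.5)]
counts = discrete_voronoi(sites, 0.0, 0.0, 1.0, 1.0, steps=10)
print(counts)  # [50, 50]
```

Adding or removing a site changes the cells of its neighbors, which is why missing store locations distort the trade areas of the stores that remain.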
While this study only compares SafeGraph and OSM data for one brand in one geographic location, the results suggest what a larger analysis might find. This research was intended to recreate the user experience of someone performing specific analyses on a particular brand in a defined geographic region, but future comparisons could explore different markets, brands, or place categories to see how the two providers differ in other scenarios.
Our overall recommendation? If you are looking for free data, OpenStreetMap is available and will allow you to put points on a map. But if you are looking to cost-effectively reflect real-world truth and embed reliable data into your products, SafeGraph is the right provider for you.
Open source data may be free, but our analysis of its accessibility, coverage, completeness, and overall usability reveals a real hidden cost. Ready to get started with clean, accurate, and up-to-date data? Get in touch with the SafeGraph team. We’re here to help.