One of the biggest obstacles to innovation in the machine learning industry is the limited availability of clean, high-quality datasets. That’s why SafeGraph is committed to putting as much data as possible in front of businesses and researchers.
One way SafeGraph has already made data more accessible is by offering points-of-interest, store visitor insights, and foot-traffic data for purchase at low cost through a self-serve data portal.
Open Census Data is another step towards democratizing access to high-quality data.
While census data is offered for free on the Census Bureau website, it isn’t as open and easy to use as it may look at first glance.
Nick Singh, who heads growth at SafeGraph, explains:
“Accessing data on the Census Bureau website is a cumbersome process. The UI is confusing to use. You have to do a sequence of steps 50 times for each of the 50 states to get the data at the lowest granularity. Easy bulk access isn’t supported.”
The challenge with downloading Census data is exemplified by the GIS StackExchange question: “Where to get 2010 Census Block data?”.
The most upvoted answer leads with:
“It is on the new version of American Factfinder and don’t feel bad, even Census Bureau employees are confounded by the new site.”
The answer goes on to list 8 steps, and this only gets you part of the data.
SafeGraph’s Open Census Data contains 7,500+ demographic attributes (such as income, age, and education) at the Census Block Group (CBG) level. All data from the American Community Survey is available in bulk with a clean schema and is joined with CBG geometry.
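The hierarchy behind "Census Block Group level" is encoded in each CBG’s standard 12-digit GEOID: state (2 digits), county (3), tract (6), and block group (1). The minimal Python sketch below assumes the bulk files key rows by this GEOID; the function names `parse_cbg` and `rollup_to_county` are hypothetical and not part of any SafeGraph tooling. It splits an ID into its parts and rolls CBG-level counts up to counties.

```python
from collections import defaultdict


def parse_cbg(geoid) -> dict:
    """Split a 12-digit CBG GEOID into its FIPS components.

    Layout: state (2 digits) + county (3) + tract (6) + block group (1).
    The value is zero-padded to 12 characters because IDs stored as
    numbers lose the leading zero of states such as Alabama ("01").
    """
    s = str(geoid).zfill(12)
    if len(s) != 12 or not s.isdigit():
        raise ValueError(f"not a 12-digit CBG GEOID: {geoid!r}")
    return {
        "state": s[:2],
        "county": s[2:5],
        "tract": s[5:11],
        "block_group": s[11],
    }


def rollup_to_county(counts_by_cbg: dict) -> dict:
    """Sum an additive count (e.g. total population) from CBGs to counties.

    Medians and percentages cannot be summed like this; they would need
    to be recomputed from their underlying counts.
    """
    totals = defaultdict(int)
    for geoid, count in counts_by_cbg.items():
        parts = parse_cbg(geoid)
        totals[parts["state"] + parts["county"]] += count
    return dict(totals)


print(parse_cbg(10010201001))
# {'state': '01', 'county': '001', 'tract': '020100', 'block_group': '1'}
```

Zero-padding inside `parse_cbg` is a deliberate choice: spreadsheet tools often read GEOIDs as integers, which silently breaks joins against CBG geometry unless the leading zero is restored.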
SafeGraph’s open census dataset includes the following components:
This dataset can help you answer questions such as:
The schema and documentation for the dataset can be found here.
SafeGraph has also created an interactive map that illustrates the data and allows for easy exploration.
In addition to the Census data, SafeGraph has included a version of SafeGraph Patterns aggregated to the neighborhood (Census Block Group) level. This dataset answers questions such as:
The free open neighborhood analytics dataset is a less granular version of SafeGraph’s premium Places Patterns dataset. The premium Patterns dataset reports data at the “place” (store location) level, while the free neighborhood analytics dataset reports data at the Census Block Group (CBG) level; a CBG generally contains between 600 and 3,000 people.
So instead of reporting the distance traveled to a store or the top related brands for a specific place, as the Places Patterns dataset does, the free neighborhood insights show how far people traveled to reach a neighborhood and the brand preferences of a whole community.
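To make the change in granularity concrete, here is a minimal sketch that rolls hypothetical place-level records up to the CBG containing each place. It is not how SafeGraph builds its neighborhood dataset. The record fields (`cbg`, `visits`, `mean_distance_km`) and the function name are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical place-level records; the field names are illustrative only.
places = [
    {"place_id": "p1", "cbg": "060371011101", "visits": 100, "mean_distance_km": 2.0},
    {"place_id": "p2", "cbg": "060371011101", "visits": 300, "mean_distance_km": 6.0},
    {"place_id": "p3", "cbg": "060371011102", "visits": 50, "mean_distance_km": 10.0},
]


def neighborhood_rollup(records):
    """Aggregate place-level visits to the CBG that contains each place.

    The CBG mean distance is weighted by visits, so it equals the mean
    over all visits to places in that CBG (assuming each place's mean
    was itself computed over its own visits).
    """
    visits = defaultdict(int)
    weighted_km = defaultdict(float)
    for r in records:
        visits[r["cbg"]] += r["visits"]
        weighted_km[r["cbg"]] += r["visits"] * r["mean_distance_km"]
    return {
        cbg: {"visits": v, "mean_distance_km": weighted_km[cbg] / v if v else None}
        for cbg, v in visits.items()
    }


for cbg, stats in neighborhood_rollup(places).items():
    print(cbg, stats)
```

Weighting by visits keeps busy places from being averaged on equal footing with quiet ones. If the place-level figure is a median rather than a mean, this weighted average is only an approximation, because medians do not combine this way.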
Ryan Fox Squire, who works on Product and Data Science at SafeGraph, points out an interesting use case for real estate analytics:
“One use case is a team of people at a big retail company using the data to decide where they should open a new store for their company. A big part of these analyses involves demographic data for candidate neighborhoods. Traditionally, you only look at what are the demographics of people who live physically near the new location.
But SafeGraph’s neighborhood insights in combination with census data lets you analyze not only who lives in this locality but also who travels to be near the candidate location. For example, most people spend a lot of time and money in places near where they work during the day and not near their homes.
Without knowing which communities people commute to, you have an incomplete picture for retail analyses. Neighborhood patterns and open census data help give one the whole picture.”
Another use case comes from Neoway, which combined SafeGraph Patterns with the Open Census data to advise a major beverage maker on optimizing its product mix at restaurants, bars, and stores based on each location’s unique profile and demographic mix.
A data scientist on Kaggle used the neighborhood insights to identify the most popular brands visited across all neighborhoods.
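The Kaggle analysis’s exact method isn’t described here. One plausible approach is to sum each brand’s visits over every neighborhood and rank the totals. The sketch below assumes per-CBG brand visit counts shaped as a dict of dicts. That shape, the brand names, and the function name `top_brands` are all hypothetical.

```python
from collections import Counter


def top_brands(brand_visits_by_cbg: dict, n: int = 3):
    """Rank brands by total visits summed over every neighborhood (CBG)."""
    totals = Counter()
    for brand_counts in brand_visits_by_cbg.values():
        totals.update(brand_counts)  # adds counts rather than replacing them
    return totals.most_common(n)


# Hypothetical per-CBG brand visit counts.
example = {
    "060371011101": {"Brand A": 40, "Brand B": 25},
    "060371011102": {"Brand B": 30, "Brand C": 10},
    "060371011103": {"Brand A": 5, "Brand C": 12},
}
print(top_brands(example))
# [('Brand B', 55), ('Brand A', 45), ('Brand C', 22)]
```

Ranking by total visits favors brands that are strong in a few dense neighborhoods. Counting the number of CBGs in which a brand appears would instead measure how widely its popularity is spread.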
SafeGraph Places contains building footprint data and business listing information for 6 million places (points of interest) in the U.S., covering almost every place you can spend money, from top retail brands to small mom-and-pop stores.
SafeGraph Places Patterns is a dataset of insights about visitors to these points of interest, such as distance traveled and top home CBG. By combining US Census demographic data with Places Patterns, one can get detailed demographic insights on a given store’s visitors.
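The join described above can be sketched in Python: weight each visitor home CBG’s census figure by how many of the store’s visitors came from it. This makes several assumptions. The home-CBG field is taken to be a JSON object mapping CBG to visitor count. The census attribute (here a hypothetical `pct_bachelors`) is taken to be a percentage of residents. Visitors from a CBG are assumed to resemble its residents. The function name `visitor_share` is illustrative.

```python
import json


def visitor_share(visitor_home_cbgs: str, pct_by_cbg: dict):
    """Estimate the share of a store's visitors with some census attribute.

    visitor_home_cbgs: JSON object mapping home CBG -> visitor count
        (assumed format for the Places Patterns home-CBG field).
    pct_by_cbg: percentage of residents with the attribute in each CBG
        (hypothetical column from the census data).

    Assumes visitors from a CBG resemble that CBG's residents. Returns
    (estimated percentage, fraction of visitors whose CBG was matched).
    """
    home_counts = json.loads(visitor_home_cbgs)
    total = sum(home_counts.values())
    matched = {cbg: n for cbg, n in home_counts.items() if cbg in pct_by_cbg}
    matched_total = sum(matched.values())
    if matched_total == 0:
        return None, 0.0
    estimate = sum(n * pct_by_cbg[cbg] for cbg, n in matched.items()) / matched_total
    return estimate, matched_total / total


visitors = '{"060371011101": 30, "060371011102": 10, "999999999999": 10}'
pct_bachelors = {"060371011101": 40.0, "060371011102": 20.0}
print(visitor_share(visitors, pct_bachelors))
# (35.0, 0.8)
```

Returning the matched fraction alongside the estimate makes it visible when many visitors come from CBGs missing from the census table, in which case the estimate describes only part of the store’s audience.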
Both datasets are available to preview & purchase on SafeGraph’s Shop.