SafeGraph is thrilled to announce an exciting partnership with AWS and Databricks that makes insights about the physical world easier to access than ever.
Today Amazon launches AWS Data Exchange, a new platform for sharing data. SafeGraph is honored to be a founding data partner for the AWS Data Exchange, launching today with over 20 powerful datasets available for free or for purchase (you need to sign in to your AWS account to see the listings).
To show the power of SafeGraph data from AWS Data Exchange inside Databricks, we’ve created a Databricks demonstration notebook with ready-to-run code (.dbc download here).
Also, if you want to learn more about using SafeGraph data in Databricks, register for our upcoming webinar.
SafeGraph is just a data company. That’s all we do.
SafeGraph has three primary datasets:
AWS is one of the most important cloud services companies in the world. Making SafeGraph data available in the AWS Data Exchange is 100% aligned with the SafeGraph mission to democratize access to data.
The AWS Data Exchange is now hosting more than 20 datasets from SafeGraph, including:
Databricks is a unified analytics platform that lets data science, data engineering and business analytics teams collaborate and derive value from data at scale.
At its core, the Databricks platform is powered by Apache Spark and Delta Lake in a cloud-native architecture, which gives users virtually unlimited horsepower to acquire, clean, transform, combine and analyze datasets within minutes from a notebook interface, in their language of choice (Python, Scala, SQL or R). Because Databricks is a managed platform, customers do not have to become big data DevOps gurus to power their analytical needs, which reduces the administrative burden, costs and risks of their data-driven projects.
We’ve created this Databricks notebook (.dbc download here) and published this blog post so that you can hit the ground running with SafeGraph data from AWS Data Exchange in Databricks. For detailed instructions on setting up Databricks and loading SafeGraph data, see the sister post on the Databricks blog.
To demonstrate the power of SafeGraph data inside Databricks, we are highlighting three datasets from SafeGraph currently available for free inside AWS Data Exchange.
Getting your data running in Databricks is just a few clicks away. We’ve published full step-by-step instructions for loading SafeGraph data into Databricks from AWS Data Exchange on the Databricks blog.
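However the files are loaded, several Patterns columns hold nested values rather than plain numbers. The plain-Python sketch below is an illustration, not the Databricks loading procedure: it assumes the CSV stores popularity_by_hour, popularity_by_day, related_same_day_brand, related_same_month_brand and visitor_home_cbgs as JSON-encoded strings, and the helper name load_patterns_rows is hypothetical. In a Spark notebook, pyspark.sql.functions.from_json would typically play the same decoding role.

```python
import csv
import io
import json

# Assumption for this sketch: these Patterns columns arrive as JSON-encoded strings.
JSON_COLUMNS = ("popularity_by_hour", "popularity_by_day",
                "related_same_day_brand", "related_same_month_brand",
                "visitor_home_cbgs")

def load_patterns_rows(csv_text):
    """Hypothetical helper: parse Patterns-style CSV text into dicts,
    decoding the JSON-encoded columns into Python lists and dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in JSON_COLUMNS:
            if row.get(col):  # skip columns that are absent or empty
                row[col] = json.loads(row[col])
        rows.append(row)
    return rows
```

Decoding once at load time keeps the downstream analysis code free of string parsing.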
Once you have SafeGraph data loaded into Databricks, many exciting answers about consumer behavior are at your fingertips. To see these implemented in code, check out the accompanying Databricks demonstration notebook.
With a few lines of code you can examine the relative popularity of individual Starbucks locations, as well as the average popularity by hour across all Starbucks locations nationwide.
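As a rough sketch of the hourly calculation (not the notebook's code), assume each row is a dict whose popularity_by_hour has already been decoded into a list of 24 values, one per hour of the day (assumed to be local time). The function names are hypothetical. The national profile is the element-wise mean across POIs, and the peak hours are the indices of the largest means.

```python
def average_hourly_popularity(rows):
    """Element-wise mean of the 24-value popularity_by_hour arrays across POIs."""
    totals = [0.0] * 24
    for row in rows:
        for hour, value in enumerate(row["popularity_by_hour"][:24]):
            totals[hour] += value
    return [t / len(rows) for t in totals]

def peak_hours(hourly, k=2):
    """Hours (0-23) with the k largest average values, busiest first."""
    return sorted(range(len(hourly)), key=hourly.__getitem__, reverse=True)[:k]
```

Averaging raw values gives busier stores more weight; normalizing each POI's array to shares first would weight every location equally.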
The data shows that traffic ramps up during the morning and peaks around 12 pm to 1 pm.
We can ask the same question about which days of the week are popular. Looking at 20 randomly selected Starbucks locations, we see that on average no day is strongly preferred over the others. However, some POI do show interesting weekday vs. weekend differences.
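One way to sketch the day-of-week comparison, assuming popularity_by_day is a dict keyed by English day names: convert each POI's counts into shares of its weekly total so that busy and quiet stores count equally, then average the shares over a random sample of 20 POIs. The seeded sampling and the function names are illustrative choices, not taken from the notebook.

```python
import random

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def day_shares(popularity_by_day):
    """Convert one POI's day -> visits dict into day -> share of its weekly visits."""
    total = sum(popularity_by_day.get(d, 0) for d in DAYS)
    if total == 0:
        return {d: 0.0 for d in DAYS}
    return {d: popularity_by_day.get(d, 0) / total for d in DAYS}

def mean_day_shares(rows, sample_size=20, seed=0):
    """Average day shares over a reproducible random sample of POIs."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    shares = [day_shares(r["popularity_by_day"]) for r in sample]
    return {d: sum(s[d] for s in shares) / len(shares) for d in DAYS}
```

A flat week shows up as every share sitting close to 1/7.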
We can examine one of these POI and compare it to the national average.
This data shows that, nationally, the busiest days of the week at Starbucks are Wednesdays and Thursdays, although this is a mild preference. In contrast, safegraph_place_id sg:68513387500e48eb87d719207d058309 shows a very different pattern and is significantly less popular on weekends than on weekdays.
To see where this POI is located, you can read its (latitude, longitude) from the SafeGraph dataset and search for it in Google Maps. It turns out that this particular Starbucks is on the campus of the Boston University School of Law. Presumably, the absence of classes on weekends causes this large weekday vs. weekend difference.
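A simple way to quantify the contrast is a weekend-to-weekday ratio: mean visits per weekend day divided by mean visits per weekday, computed for one POI and averaged over all POIs in the input. This is a hypothetical metric for illustration (the notebook may measure it differently) and assumes each row carries safegraph_place_id and a decoded popularity_by_day dict.

```python
import math

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
WEEKEND = ("Saturday", "Sunday")

def weekend_ratio(popularity_by_day):
    """Mean visits per weekend day divided by mean visits per weekday."""
    weekday_mean = sum(popularity_by_day.get(d, 0) for d in WEEKDAYS) / len(WEEKDAYS)
    weekend_mean = sum(popularity_by_day.get(d, 0) for d in WEEKEND) / len(WEEKEND)
    return weekend_mean / weekday_mean if weekday_mean else math.nan

def compare_to_national(rows, place_id):
    """Return (ratio for one POI, mean ratio over POIs that have weekday traffic)."""
    ratios = {r["safegraph_place_id"]: weekend_ratio(r["popularity_by_day"]) for r in rows}
    valid = [v for v in ratios.values() if not math.isnan(v)]
    national = sum(valid) / len(valid) if valid else math.nan
    return ratios[place_id], national
```

A ratio near 1 means weekends look like weekdays; a campus location would be expected to sit well below 1.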
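To turn coordinates into a clickable map link, a small sketch can build a Google Maps search URL using the documented Maps URLs format (https://www.google.com/maps/search/?api=1&query=...). The function name is hypothetical, and the latitude and longitude values are assumed to come from the POI's latitude and longitude columns.

```python
from urllib.parse import urlencode

def google_maps_link(latitude, longitude):
    """Build a Google Maps search URL for a (latitude, longitude) pair."""
    query = f"{latitude},{longitude}"
    return "https://www.google.com/maps/search/?" + urlencode({"api": 1, "query": query})
```

urlencode percent-encodes the comma, which Google Maps accepts in the query parameter.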
SafeGraph reports, for each POI, the median distance visitors travel from home (measured from their home census block group). Using this, we can construct a histogram across Starbucks locations showing how far people travel to visit Starbucks.
This data shows that most Starbucks locations draw visitors who live less than 10 kilometers away. However, there is a long, thin tail of Starbucks locations where the median distance from home is hundreds of kilometers. These locations are likely in high-tourist or high-commute areas (such as airports) where most visitors do not live nearby.
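The histogram itself is a simple binning step. This sketch assumes the median distance is reported in meters (the distance_from_home column) and uses hypothetical 10 km bins up to 100 km, with a final overflow bin for everything farther away.

```python
def distance_histogram(distances_m, bin_km=10, max_km=100):
    """Count POIs per bin of median distance from home.

    Input distances are assumed to be in meters; bins are bin_km wide and the
    last bin collects every value at or beyond max_km.
    """
    n_bins = max_km // bin_km
    counts = [0] * (n_bins + 1)
    for d in distances_m:
        if d is None:
            continue  # POIs without a reported distance are skipped
        idx = min(int((d / 1000) // bin_km), n_bins)
        counts[idx] += 1
    return counts
```

The overflow bin keeps the long tail visible without stretching the x-axis to hundreds of kilometers.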
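To pull out the long tail, one can compute the share of locations under a "local" threshold and list those above a "far" threshold. The 10 km and 100 km cutoffs and the function name are illustrative assumptions; each row is assumed to have safegraph_place_id and distance_from_home in meters.

```python
def summarize_travel(rows, near_km=10, far_km=100):
    """Return (share of POIs under near_km, IDs of POIs at or beyond far_km, farthest first)."""
    km = {
        r["safegraph_place_id"]: r["distance_from_home"] / 1000
        for r in rows
        if r.get("distance_from_home") is not None
    }
    near_share = sum(1 for v in km.values() if v < near_km) / len(km)
    far_tail = sorted((pid for pid, v in km.items() if v >= far_km), key=lambda p: -km[p])
    return near_share, far_tail
```

The far-tail IDs are natural candidates for a map lookup to check whether they sit in airports or transit hubs.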
The columns related_same_month_brand and related_same_day_brand report an index of how frequently visitors to a POI also visit other brands (relative to the average visitor rate to that brand). Here we look at which other brands are frequently visited by Starbucks customers. The larger the index, the more frequently Starbucks customers visit that brand.
Although Starbucks is a national chain, cross-brand shopping is highly influenced by local geography. Here we show the top 5 cross-shopping brands for Starbucks customers in California, New York and Texas. Only McDonald’s is in the top 5 in all 3 states.
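A minimal ranking sketch, assuming the related-brand column is decoded into a brand -> index dict per POI: average each brand's index across the Starbucks POIs and keep the top k. Treating a brand absent from a POI's dict as an index of 0 is an assumption made here for simplicity, as is excluding Starbucks itself; the function name is hypothetical.

```python
from collections import defaultdict

def top_related_brands(rows, column="related_same_month_brand", k=5, exclude=("Starbucks",)):
    """Mean index per brand across POIs (absent brands count as 0), top k first."""
    sums = defaultdict(float)
    for row in rows:
        for brand, index in row[column].items():
            if brand not in exclude:
                sums[brand] += index
    means = {brand: total / len(rows) for brand, total in sums.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Passing column="related_same_day_brand" gives the same-day view instead of the same-month one.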
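The per-state comparison can be sketched by grouping POIs by state before ranking, then intersecting the top-k lists. This assumes a region column holding the state code and a decoded related-brand dict per POI; both the column choice and the function names are hypothetical.

```python
from collections import defaultdict

def top_brands_by_state(rows, k=5, column="related_same_month_brand"):
    """Top-k brands per state by mean index (brands absent at a POI count as 0)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        state = row["region"]
        counts[state] += 1
        state_sums = sums[state]  # creates the state's entry even if its dict is empty
        for brand, index in row[column].items():
            state_sums[brand] += index
    result = {}
    for state, brand_sums in sums.items():
        ranked = sorted(brand_sums, key=lambda b: brand_sums[b] / counts[state], reverse=True)
        result[state] = ranked[:k]
    return result

def shared_brands(top_by_state):
    """Brands that appear in every state's top-k list."""
    sets = [set(v) for v in top_by_state.values()]
    return set.intersection(*sets) if sets else set()
```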
You can use SafeGraph data from AWS Data Exchange in Databricks to analyze the customer demographics of individual POI or brands. For a deep dive on the methodology, along with a more complete statistical analysis, see this workbook.
Here we analyze Starbucks customer demographics along the race dimension using data available from SafeGraph in AWS Data Exchange. This analysis could be repeated for any demographic information tracked by the Census and reported at the census block group level, including ethnicity, educational attainment, household income and much, much more.
To do this analysis we will use:
The baseline demographics of the United States are shown as a reference. SafeGraph Patterns data shows interesting differences between the census-area demographics of Starbucks customers and those of the overall US population.
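The core of this kind of profile is a weighted average: each visitor is assumed to carry the demographic mix of their home census block group. The sketch below assumes visitor_home_cbgs is a CBG -> visitor-count dict and that a separate, hypothetical cbg_group_fraction mapping gives the Census fraction of each CBG's residents in one group. It illustrates the idea only; the workbook's full method also corrects for sampling bias.

```python
def visitor_demographic_share(visitor_home_cbgs, cbg_group_fraction):
    """Estimate the share of visitors belonging to one demographic group.

    visitor_home_cbgs: {cbg_id: visitor count} for a POI or brand.
    cbg_group_fraction: {cbg_id: fraction of CBG residents in the group} (Census).
    """
    weighted = total = 0.0
    for cbg, visitors in visitor_home_cbgs.items():
        if cbg not in cbg_group_fraction:
            continue  # CBGs without census data are left out of both sums
        weighted += visitors * cbg_group_fraction[cbg]
        total += visitors
    return weighted / total if total else float("nan")
```

Summing visitor_home_cbgs across all Starbucks POIs before calling the function gives a brand-level profile.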
Importantly, these differences are not due to geographic sampling bias in the SafeGraph dataset. The SafeGraph dataset does have some small geographic biases (for a full report, see “What about bias in the SafeGraph dataset?”), but we are able to measure and correct for these small effects. For details on this calculation, see the Databricks demonstration notebook. For a thorough discussion of this methodology, see A Workbook to Analyze Demographic Profiles from SafeGraph Patterns Data.
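As a generic illustration of such a correction (a simple post-stratification-style reweighting, not necessarily the calculation in the notebook), each home CBG's visitor count can be scaled by that CBG's census population divided by the number of SafeGraph devices residing there, so under-sampled CBGs count for more. The devices_residing input is assumed to come from SafeGraph's home panel summary; the function name and the rescaling to the original total are hypothetical choices.

```python
def bias_corrected_counts(visitor_home_cbgs, census_population, devices_residing):
    """Reweight visitor counts by each CBG's population-to-panel ratio.

    CBGs missing either number (or with zero devices) are dropped. The result
    is rescaled so it sums to the same total as the input counts that were kept.
    """
    raw = {}
    for cbg, visitors in visitor_home_cbgs.items():
        pop = census_population.get(cbg)
        dev = devices_residing.get(cbg)
        if pop and dev:
            raw[cbg] = visitors * pop / dev
    if not raw:
        return {}
    scale = sum(visitor_home_cbgs[c] for c in raw) / sum(raw.values())
    return {cbg: value * scale for cbg, value in raw.items()}
```

Because only relative weights matter for a demographic share, the rescaling mostly keeps the corrected counts interpretable as visitor totals.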
That's it. That's all we do. We want to understand the physical world and power innovation through open access to geospatial data. We believe data should be an open platform, not a trade secret. Information should not be hoarded so that only a few can innovate.