Comparing SafeGraph Aggregate Spend to the Census Monthly Retail Trade Survey

May 13, 2022
Garrett Hoffman

The Census Monthly Retail Trade Survey (MRTS) provides estimates of year-over-year (YOY) percent change in retail spending, by region and across eleven 3-digit NAICS categories. This analysis examines the relationship between the Census MRTS YOY percent change estimates and the YOY percent change in SafeGraph aggregate Spend from 2020 to 2021.

SafeGraph Spend is our newest product, detailing anonymized consumer transactions at many POIs from our Places data. It's a great way of measuring economic activity and understanding how consumers interact with businesses. We will be looking at Region and NAICS aggregated Sum Spend, sub-selected from branded POIs in our Spend product, to help understand where we find agreement and disagreement with the Census Monthly Retail Trade Survey.

Observations for both Census MRTS and SafeGraph cover 11 NAICS codes and 51 US regions (the 50 states plus Washington, DC). Each observation represents the year-over-year change from 2020 to 2021, for both Census and SafeGraph Spend, for a single NAICS by Region by Month combination.
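As a minimal sketch of how one such observation could be derived, the snippet below computes the YOY percent change for each NAICS by Region by Month combination. The dictionary layout, the NAICS codes used as keys, and the spend totals are all hypothetical values chosen for illustration; they are not taken from SafeGraph or Census data.

```python
def yoy_pct_change(current: float, prior: float) -> float:
    """Percent change of `current` relative to the same month one year earlier."""
    if prior == 0:
        raise ValueError("prior-year value must be non-zero")
    return 100.0 * (current - prior) / prior


# Hypothetical monthly spend totals keyed by (region, naics, year, month).
spend = {
    ("VA", "447", 2020, 4): 100.0,
    ("VA", "447", 2021, 4): 140.0,
    ("VA", "448", 2020, 4): 50.0,
    ("VA", "448", 2021, 4): 45.0,
}

# One observation per (region, naics, month): 2021 measured against 2020.
yoy = {
    (region, naics, month): yoy_pct_change(value, spend[(region, naics, 2020, month)])
    for (region, naics, year, month), value in spend.items()
    if year == 2021 and (region, naics, 2020, month) in spend
}

for key, value in sorted(yoy.items()):
    print(key, f"{value:+.1f}%")
```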

Histogram of Raw Data


At first glance, SafeGraph Spend data includes more negative YOY percent change observations than the Census MRTS. SafeGraph data also has more observations above 700%.

There are strong seasonal trends in outlier observations for both the SafeGraph and Census samples. Since COVID-19 and related lockdown policies upended much of American life starting in 2020, we are analyzing a time when mobility and spending experienced large and swift changes. This can be clearly seen across both data sources. We will go deeper into the details of these outliers in the 'Seasonal Trends in Year-Over-Year Outliers' section of the blog.

To move forward with this analysis, the data needed to be trimmed to remove outliers. We removed any observation from either source with a YOY change estimate outside the mean plus or minus two standard deviations. Some especially extreme values in both samples could unnecessarily skew the mean and standard deviation, so we first trimmed off observations where either estimate was greater than or equal to 700%. While 700% itself is somewhat arbitrary, it represents an area far out on the tail of both distributions. This gave us more workable mean and standard deviation values to trim by.
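The two-step trim described above can be sketched roughly as follows. Several details here are assumptions, since the post does not specify them: the sample standard deviation is used, the bounds are computed separately for each source, and the trim is applied in a single pass. The function name and the row layout are hypothetical.

```python
from statistics import mean, stdev


def trim_outliers(rows, cap=700.0, n_sd=2.0):
    """Trim (census_yoy, safegraph_yoy) pairs, both in percent.

    Step 1: drop rows where either estimate is >= `cap`.
    Step 2: drop rows where either estimate falls outside
            mean +/- n_sd * standard deviation of its own source,
            with both statistics computed on the capped data.
    """
    capped = [r for r in rows if r[0] < cap and r[1] < cap]

    bounds = []
    for i in (0, 1):  # 0 = Census, 1 = SafeGraph
        values = [r[i] for r in capped]
        m, s = mean(values), stdev(values)  # sample SD (an assumption)
        bounds.append((m - n_sd * s, m + n_sd * s))

    return [
        r for r in capped
        if all(bounds[i][0] <= r[i] <= bounds[i][1] for i in (0, 1))
    ]


# Made-up data: ten well-behaved rows, one moderate outlier, two extreme rows.
rows = [(x, x) for x in (-10, -5, 0, 5, 10, -10, -5, 0, 5, 10)]
rows += [(100, 0), (800, 5), (5, 750)]
trimmed = trim_outliers(rows)
print(len(rows), "->", len(trimmed))
```

Applying the 700% cap first keeps the two extreme rows from inflating the standard deviation, so the moderate outlier at 100% is still caught by the second step.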

Overall Accuracy Measurements

Now let's look at how well the two datasets match. We measure accuracy in two ways:

  1. Directional Accuracy: When the Census data show positive YOY growth (2021 retail trade increased over 2020), how often does the SafeGraph data also show a positive value? (and vice versa for negative values)
  2. Relative Ranking: What is the rank correlation between the SafeGraph and the Census data, and is it statistically significant?

SafeGraph Aggregate Spend has the same directional YOY percent change as the Census MRTS about 65% of the time. The most common disagreement, covering 31% of observations, occurs where SafeGraph Spend indicates negative growth while Census MRTS indicates positive growth.

Across observations, when SafeGraph aggregate Spend predicts YOY growth for a Region by NAICS combination, Census MRTS also predicts YOY growth almost 95% of the time. While SafeGraph Spend is more likely to predict economic contraction in general, Census directional results almost always agree when we see positive growth in Spend.
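The directional measures discussed above could be computed along these lines. This is a sketch on made-up pairs, not the notebook's code; treating a change of exactly 0% as "not positive" is an assumption, and the function and key names are hypothetical.

```python
def directional_summary(pairs):
    """Summarize sign agreement for (census_yoy, safegraph_yoy) pairs."""
    n = len(pairs)
    # Share of observations where both sources point the same way.
    same_direction = sum((c > 0) == (s > 0) for c, s in pairs) / n
    # Share where SafeGraph shows contraction but Census shows growth.
    sg_down_census_up = sum(c > 0 and s <= 0 for c, s in pairs) / n
    # Among observations with positive SafeGraph growth,
    # the share where Census growth is also positive.
    sg_up = [c for c, s in pairs if s > 0]
    census_up_given_sg_up = (
        sum(c > 0 for c in sg_up) / len(sg_up) if sg_up else float("nan")
    )
    return {
        "same_direction": same_direction,
        "sg_down_census_up": sg_down_census_up,
        "census_up_given_sg_up": census_up_given_sg_up,
    }


# Hypothetical (census_yoy, safegraph_yoy) pairs in percent.
pairs = [(5, 3), (5, -2), (-4, -1), (6, -3), (-2, 1)]
summary = directional_summary(pairs)
print(summary)
```

The third measure is conditional on SafeGraph showing growth, which is why it can be much higher than the overall agreement rate.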

Overall, Census MRTS and SafeGraph Spend share a correlation of 0.39, a good start. However, as we will see, this overall number hides some important nuances in the underlying relationship. In several places, SafeGraph Spend and Census MRTS show different levels but similar ranks of NAICS category growth within a given region and month. In fact, they share a Spearman rank-order correlation of 0.53. When it comes to the highest and lowest growth sectors of a state, the two sources very commonly identify the same top-performing business categories.
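To illustrate why a rank-order correlation can exceed a correlation on levels, here is a small sketch for a single hypothetical region-month. It computes Spearman's coefficient as the Pearson correlation of average ranks, which is the standard definition; the helper names and the four NAICS values are invented, and the post does not say how the notebook pools results across region-months.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical YOY values for four NAICS categories in one region-month.
census = [5.0, 10.0, 20.0, 40.0]
safegraph = [-30.0, -10.0, 0.0, 150.0]

print("pearson:", round(pearson(census, safegraph), 3))
print("spearman:", round(spearman(census, safegraph), 3))
```

In this toy case the two sources disagree on levels, and even on sign for two categories, yet they order the categories identically, so the rank correlation is perfect while the correlation on levels is not.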

Correlation by Region


Average correlations by region are strong across the United States: 39 out of 51 regions have an average correlation across NAICS categories greater than 0.5, led by correlations of 0.8 in CO, GA, and MD.

There are also some outlier regions with low correlations. Idaho and Iowa both have negative average correlations. This is likely because we sub-selected Spend at branded POIs for this analysis, and those states have a greater proportion of retail trade measured at non-branded POIs.

Overall, there is an easily noticeable relationship between SafeGraph and Census estimates across the United States.

Correlation by NAICS Category


The strongest NAICS correlations are among clothing, electronics & appliance, and furniture stores. All three have correlations greater than 0.45. While some categories still have room for improvement, most NAICS categories show evidence of a relationship between the two samples.

Some NAICS categories are ambiguously defined, and there can be legitimate disagreement over which category a given brand or store belongs in. This may be partially responsible for the lack of agreement between Census MRTS and SafeGraph in NAICS categories such as 'General Merchandise Stores', where the correlation is low.

Virginia YOY Change by NAICS

How does the comparison look for a particular state?

Virginia change by NAICS

Zooming in on a single region, both SafeGraph Spend and Census MRTS estimate high YOY growth across most included NAICS categories. In particular, both show YOY spend growth greater than 35% at gas stations. It's clear that driving behavior changed dramatically in this area from 2020 to 2021.

For many regions of interest, there is strong alignment between SafeGraph and Census MRTS, and the weakest agreement tends to be in NAICS categories that are the most poorly defined ("Miscellaneous Store Retailers", "General Merchandise Stores", "Motor Vehicle and Parts Dealers", and "Building Materials and Supplies Dealers").

Seasonal Trends in Year-Over-Year Outliers

Recall that each observation analyzed here is itself a YOY change measurement relative to the year prior. We would therefore expect some impact due to volatility in the reference month's numbers. For reference months with particularly low retail trade or Spend numbers, this could lead to greater variability when calculating YOY change.
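A small arithmetic sketch shows the effect of a depressed reference month. The numbers are invented for illustration: the same absolute increase produces a far larger YOY percent change when the 2020 baseline is unusually low.

```python
def yoy_pct_change(current: float, prior: float) -> float:
    """Percent change of `current` relative to the reference value `prior`."""
    return 100.0 * (current - prior) / prior


# Same absolute increase of 50 (arbitrary units), two different 2020 baselines.
depressed_base = yoy_pct_change(current=60.0, prior=10.0)      # very low reference month
typical_base = yoy_pct_change(current=1050.0, prior=1000.0)    # ordinary reference month

print(f"low reference month:     {depressed_base:+.0f}%")
print(f"typical reference month: {typical_base:+.0f}%")
```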

Here we explore seasonality in the YOY growth observations that were removed as outliers.

removed observations

Most of the outliers that were removed were in March, April, and May.

Remember that we are looking at year-over-year changes from 2020 to 2021. 2020 was an odd year, especially due to the global spread of COVID-19 and the resulting lockdowns. Most observations that were trimmed, or omitted by the Census, fell in the spring, precisely when the United States was undergoing massive social change and COVID-19 lockdown policies were in full effect. Going forward, as we get further from the beginning of the pandemic, the impact of this will likely fade from both sources.

Key Findings and Takeaways

There is a lot that can be explored with SafeGraph Spend and the Census Monthly Retail Trade Survey. Broadly, the Census and SafeGraph are showing a shifting economy. Strong indicators point to major shifts in consumer purchasing habits over 2020 - 2021 in tandem with major disruptions in American life.


  • SafeGraph Spend and Census MRTS share an average correlation greater than 0.5 in 39 out of 51 US regions
  • Clothing, electronics & appliance, and furniture stores show the strongest relationship over this time period. Each of these NAICS categories has an average correlation greater than 0.45
  • When SafeGraph Spend estimates positive YOY growth, Census MRTS predicts positive growth almost 95% of the time
  • SafeGraph Spend and the Census MRTS share a rank-order correlation of 0.53 for NAICS within each region-month, demonstrating alignment of the top and bottom growth NAICS each month within each state
  • The majority of outlier estimates, for both SafeGraph and Census MRTS, are in March, April, or May, yet another example of the significant disruption that occurred over this period in 2020

Given the granularity of SafeGraph Spend data, all the way down to our POIs, there are nearly limitless details that can be derived and analyzed. Looking through this macro lens shows promising alignment to another source of truth, and there is more to discover under the hood.

For more details on the code behind this analysis, check out the Google Notebook.

