Understanding the relationship between store foot traffic and transactions has been top of mind for organizations in recent months, particularly as consumer behavior continues to shift in response to COVID-19. While it makes sense that increased visits to a store will most likely mean increased financial transactions there, it’s impossible to know for sure unless you analyze the two datasets together and look for a correlation.
Investigating a possible correlation between foot traffic and transaction data has multiple purposes for companies. Most importantly, understanding the relationship between how many people visit a store and how many purchases take place uncovers insights that can inform how the store operates in the future. Advertising campaigns, store inventory planning, and site selection workflows can all be improved with up-to-date mobility and transaction data, especially if there is a correlation between the two.
But correlating foot traffic and transaction data can also help data scientists understand if one dataset can be used as a proxy for another. If there is a clear relationship between the two, data science models can be developed that use only one of the datasets, but provide insights into both foot traffic and transactions.
To help you get started correlating foot traffic data with transaction data, we created a Google Colab notebook. We also break down the main steps here:
Before getting started, it’s essential to know what data you are working with. In this notebook, we explore fictional transaction data and SafeGraph Patterns data.
Our hypothetical transaction data includes where and when transactions take place, spend per transaction, and counts of total transactions and customers. However, keep in mind that transaction data from third parties may be incomplete (for example, it may not specify exactly where a transaction took place), and not every transaction dataset will include all of the fields above.
Store IDs often appear in transaction data; however, these IDs need to be joined to an actual address to correctly attribute transactions to stores. SafeGraph Places has a store_id column, which enables a join between transaction data and specific stores, also known as points of interest (POIs).
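In pandas, this attribution step is a straightforward merge on the shared store_id column. The sketch below uses made-up records and assumed column names (only store_id is taken from the text):

```python
import pandas as pd

# Hypothetical transaction records keyed by store_id
transactions = pd.DataFrame({
    "store_id": ["#12345", "#67890"],
    "spend": [4.50, 7.25],
})

# Hypothetical slice of SafeGraph Places holding the address of each store
places = pd.DataFrame({
    "store_id": ["#12345", "#67890"],
    "street_address": ["100 W Main St", "200 N State St"],
})

# Left join so every transaction keeps a row even if a store is missing from Places
poi_transactions = transactions.merge(places, on="store_id", how="left")
```

A left join is a reasonable default here: it preserves unmatched transactions so you can inspect how many failed to attribute to a POI.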
Once transaction data is attributed to POIs, we can compare it to SafeGraph Patterns, which provides data on monthly and weekly visits to millions of POIs across the US and Canada. For this particular analysis, we look at Dunkin’ stores in Chicago in 2019.
Because both the transaction dataset and SafeGraph Patterns include a store_id column, you can join them on that field. Alternatively, you can join them using Placekey, a unique identifier for a place.
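The Placekey-based join looks the same as the store_id join, just on a different key. A minimal sketch, with made-up Placekey values and assumed column names (raw_visit_counts does appear in SafeGraph Patterns):

```python
import pandas as pd

# Hypothetical monthly aggregates per store; the Placekey strings here are
# placeholders for illustration, not real Placekeys
transactions = pd.DataFrame({
    "placekey": ["aaa-bbb@ccc", "ddd-eee@fff"],
    "total_transactions": [120, 85],
})

patterns = pd.DataFrame({
    "placekey": ["aaa-bbb@ccc", "ddd-eee@fff"],
    "raw_visit_counts": [340, 210],
})

# Inner join keeps only stores present in both datasets
joined = transactions.merge(patterns, on="placekey", how="inner")
```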
Other transaction datasets may not include their own store_id column, in which case you will need to extract it from the transaction description column. Remember that store_id formatting varies between brands (for example, #12345 vs. F12345) as well as within brands (for example, store IDs of different lengths). Because of this, you should use a robust method to extract the store_id. Regular expressions are ideal because they let you specify a pattern and extract only the important parts.
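A short sketch of that extraction step. The pattern below is an assumption covering the two formats mentioned above (# or F followed by digits); you would adjust it per brand:

```python
import re
from typing import Optional

# Matches store-id-like tokens such as "#12345" or "F12345" of any length;
# this pattern is illustrative and should be tuned to the brands you work with
STORE_ID_RE = re.compile(r"(?:#|F)\d+")

def extract_store_id(description: str) -> Optional[str]:
    """Pull the first store-id-like token out of a free-text transaction description."""
    match = STORE_ID_RE.search(description)
    return match.group(0) if match else None

extract_store_id("DUNKIN #12345 CHICAGO IL")     # → "#12345"
extract_store_id("DUNKIN F6789 DEBIT PURCHASE")  # → "F6789"
```

Applied to a pandas column, the same pattern works via `Series.str.extract` with a capture group.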
If you decide to join using Placekey, you can use the free API to assign Placekeys to your transaction data.
Once the transaction and mobility data are joined, you can graph them to better visualize their relationship.
For example, you can see how the total number of visits compares to the total number of transactions across all POIs in the study area.
Similarly, you can visualize how the total number of visitors compares to the total number of customers who made a transaction at all POIs in the study area.
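The two comparisons above can be sketched with a basic matplotlib line chart. The monthly totals below are fabricated for illustration, and the column names are assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly totals summed across all POIs in the study area
monthly = pd.DataFrame({
    "month": ["2019-01", "2019-02", "2019-03", "2019-04"],
    "total_visits": [3400, 3100, 3600, 3900],
    "total_transactions": [1200, 1100, 1300, 1450],
})

fig, ax = plt.subplots()
ax.plot(monthly["month"], monthly["total_visits"], label="Total visits")
ax.plot(monthly["month"], monthly["total_transactions"], label="Total transactions")
ax.set_xlabel("Month")
ax.set_ylabel("Count")
ax.legend()
fig.savefig("visits_vs_transactions.png")
```

Swapping in visitor and customer counts gives the second comparison with no other changes.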
It is also helpful to look at the relationship between transactions and foot traffic at specific stores, instead of at a more macro level. The following graph shows the number of transactions at a single Dunkin' store, identified by its Placekey.
Once you have visualized the data, you can start to determine if the datasets are correlated or not.
Correlation is a measure of the statistical relationship between two variables, expressed as a value between -1 and 1. A value close to zero represents a weak or nonexistent relationship between the two variables. A value close to 1 represents a strong positive relationship: as one variable increases, so does the other. A value close to -1 represents a strong inverse relationship: as one variable increases, the other decreases.
The acceptable level of correlation will depend on factors such as the use case and the availability and reliability of other data.
We can analyze the correlation between transaction data and Patterns data to measure how confident we feel about using one of the datasets as a proxy when the other dataset isn't available, as well as to better understand how each factor affects a business.
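With the joined data in a DataFrame, computing the (Pearson) correlation coefficient is one line in pandas. The per-store values below are fabricated for illustration:

```python
import pandas as pd

# Hypothetical per-store monthly visit and transaction counts
df = pd.DataFrame({
    "visits": [300, 280, 350, 400, 320],
    "transactions": [110, 100, 130, 150, 118],
})

# Pearson correlation coefficient, between -1 and 1
r = df["visits"].corr(df["transactions"])
```

`Series.corr` also accepts `method="spearman"` if you prefer a rank-based measure that is less sensitive to outliers.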
In addition, we can dive deeper to identify conditions that make the two datasets ideal proxies. For example, you might notice that transaction data has a strong relationship with Patterns data when there are at least 100 transactions per month.
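One way to probe such a condition is to filter the data by the candidate threshold and recompute the correlation on the subset. The 100-transactions-per-month cutoff is the example from the text; the data and column names are assumptions:

```python
import pandas as pd

# Hypothetical per-store monthly data
df = pd.DataFrame({
    "monthly_transactions": [40, 250, 180, 60, 400],
    "monthly_visits": [500, 700, 520, 300, 1100],
})

# Restrict to store-months meeting the example threshold of 100 transactions
busy = df[df["monthly_transactions"] >= 100]
r_busy = busy["monthly_visits"].corr(busy["monthly_transactions"])
```

Comparing `r_busy` against the correlation on the full DataFrame shows whether the relationship tightens above the threshold.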