SafeGraph data is all about examining foot traffic at a given location. However, what do you do when a single piece of land can be rightfully claimed by two different locations?
It's not that unusual for one place to be two places. Take, for example, the Claire's in your nearby mall. Is a visitor there in the Claire's, or in the mall? Really, they're in both.
SafeGraph addresses this problem using the concept of parent and child locations. A parent is an overarching location that contains other locations within it; common types of parents include malls, airports, and shopping centers. A child is a location that sits inside a parent. You can see the full list of location types that can be parents, and some more information about parent/child relationships, here.
Dealing with parent and child locations can be an important part of working with SafeGraph data, even if you're not interested in distinguishing them, since if you're not careful you could end up double-counting foot traffic: once for the parent, and once for the child.
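So some important questions:
1. How can we look in our Shop data and figure out where we have parent and child locations?
2. How can we work with data that has parent and child locations?
3. How can we think about the spatial orientation of parent and child locations?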
Let's start by opening up our Shop data and loading it in.
This particular store data was constructed using the Match Service, where we found matches for a bunch of child POIs. It contains a set of customer_ columns holding the child data used for matching, and then another set of columns containing the match data, i.e. the parent.
Store data that was acquired in a different way, for example getting all POIs in a certain city, might be structured slightly differently, so the next steps might not be necessary for you.
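A rough sketch of that separation step, assuming the file is a CSV and that every column supplied to the Match Service starts with customer_ as described above. The inline CSV, its values, and every column name beyond the customer_ prefix are made up for illustration; in practice you would point pd.read_csv at your own download.

```python
import io
import pandas as pd

# Stand-in for the downloaded file (hypothetical columns and values).
# With a real file this would be something like pd.read_csv("your_file.csv").
raw_csv = io.StringIO(
    "customer_location_name,customer_street_address,placekey,location_name\n"
    "Shop A,1 Main St,key-1,Example Marketplace\n"
    "Shop B,2 Main St,key-2,Example Marketplace\n"
)
matched = pd.read_csv(raw_csv)

# Columns we supplied for matching carry the customer_ prefix;
# everything else came back from the match.
customer_cols = [c for c in matched.columns if c.startswith("customer_")]
match_cols = [c for c in matched.columns if not c.startswith("customer_")]

customer_data = matched[customer_cols]
match_data = matched[match_cols]
```

Splitting on the prefix rather than listing columns by hand keeps the step working if the set of matched columns changes.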
So, question 1 as above:
1. How can we look in our Shop data and figure out where we have parent and child locations?
We've got the Shop data. We separated out all the match data, but for a given set of POIs that contains both parents and children, we can figure out where we have parent and child locations by looking at the parent_placekey column. This column is missing for any location without a parent. For child locations, it tells you which location is the parent.
Children are the rows that have a non-missing parent_placekey.
Parents are the rows whose placekey appears as the parent_placekey of some other row.
Let's find this in our data here:
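One way this lookup might be written in pandas is sketched below. The tiny table and its Placekey values are invented stand-ins; only the placekey and parent_placekey column names come from the text above.

```python
import pandas as pd

# Toy data with made-up identifiers: one parent, two children, one standalone POI.
pois = pd.DataFrame({
    "placekey": ["parent-1", "child-a", "child-b", "solo-1"],
    "parent_placekey": [None, "parent-1", "parent-1", None],
})

# Children: rows with a non-missing parent_placekey.
is_child = pois["parent_placekey"].notna()

# Parents: rows whose placekey shows up as some other row's parent_placekey.
is_parent = pois["placekey"].isin(pois["parent_placekey"].dropna())

children = pois[is_child]
parents = pois[is_parent]
```

Note that a parent only shows up in parents if its own row is present in the data; a child whose parent was not pulled will still be flagged as a child.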
Now it just so happens that the data we've taken from the Shop for this demonstration includes exactly one parent, the Pleasant Valley Marketplace in Virginia Beach, and all of its children.
With our data loaded, and the parents and children identified, we can move on to our next question:
2. How can we work with data that has parent and child locations?
This requires us to think about what kind of parent/child relationship we have. The most important distinction is whether or not the child is enclosed within its parent.
The enclosed column tells you whether a child is enclosed within its parent. This is something like that Claire's in the mall. It's really inside that mall, as opposed to a burger joint in an outdoor strip mall, which is in that strip mall but maybe also sort of its own space. When a location like Claire's is enclosed, it can be difficult to tell the difference between a device being in Claire's and being near Claire's inside the mall.
The way we deal with parent and child locations differs considerably based on whether the children are enclosed or not.
In the case of Pleasant Valley Marketplace, the children are enclosed (enclosed == True).
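A quick check of the enclosed column could look like the sketch below. The rows are invented; the example assumes enclosed is stored as a boolean column on the child rows.

```python
import pandas as pd

# Toy child rows with made-up identifiers.
children = pd.DataFrame({
    "placekey": ["child-a", "child-b", "child-c"],
    "parent_placekey": ["parent-1", "parent-1", "parent-1"],
    "enclosed": [True, True, True],
})

# How many children fall into each enclosed category?
enclosed_counts = children["enclosed"].value_counts()

# True only if every child is enclosed.
all_enclosed = bool(children["enclosed"].all())
print(enclosed_counts)
```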
Children that are enclosed (enclosed == True) are basically not distinguished from their parents. They do not have their own separate foot traffic data. Sometimes they have their own polygons, but sometimes they're just a part of the parent polygon, although they may have their own latitude/longitude data.
Children that are not enclosed (enclosed == False) have parents but also act as independent locations. We track visitor data like visits_per_day separately for those locations, and they have their own polygon data in the polygon_wkt column.
The Pleasant Valley Marketplace is full of enclosed children, so we'll talk about how to handle that first.
How can we handle data from enclosed children? Well, we can ignore the children's foot traffic data, since it doesn't really have any. We can take the parent foot traffic data we see, and that covers the entire region.
However, while we don't have traffic data for the children, we do have plenty of other information about them from the core information columns.
For example, maybe we're interested in the kinds of businesses and locations that are inside the shopping center.
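As a sketch of that kind of summary, the example below counts children by business category. It assumes a category column named top_category is among the core columns in your file; that name and the rows are assumptions, not taken from the data shown here.

```python
import pandas as pd

# Toy child rows; the category labels are illustrative only.
children = pd.DataFrame({
    "location_name": ["Shop A", "Shop B", "Shop C"],
    "top_category": [
        "Restaurants and Other Eating Places",
        "Clothing Stores",
        "Restaurants and Other Eating Places",
    ],
})

# Count how many children fall into each category, most common first.
category_counts = children["top_category"].value_counts()
print(category_counts)
```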
3. How can we think about the spatial orientation of parent and child locations?
In the case of enclosed-children data where the children have their own polygons, you can handle spatial orientation as normal. Simply look at the polygons!
But what if they don't, as in this data? At this point, you're stuck with just the parent polygon. But you can still do a little something with the children, because they will have latitude and longitude data that you can work with.
So we'll start by mapping out the parent polygon. The polygon_wkt column is information about the POI's polygon in WKT format.
We can add the child locations on as points (see this guide).
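The sketch below shows one possible way to turn these pieces into geometry using the shapely library. The polygon and coordinates are invented, and it assumes the child coordinates live in columns named latitude and longitude; it is not necessarily how the original map was drawn.

```python
import pandas as pd
from shapely import wkt
from shapely.geometry import Point

# Made-up parent polygon in WKT; a real one would come from polygon_wkt.
parent_wkt = "POLYGON ((0 0, 4 0, 4 2, 0 2, 0 0))"
parent_polygon = wkt.loads(parent_wkt)

# Made-up child coordinates (assumed column names).
children = pd.DataFrame({
    "location_name": ["Shop A", "Shop B"],
    "latitude": [1.0, 1.0],
    "longitude": [1.0, 3.0],
})

# WKT orders coordinates as x y, i.e. longitude first, then latitude.
child_points = [
    Point(lon, lat)
    for lon, lat in zip(children["longitude"], children["latitude"])
]

# Sanity check: do the child points fall inside the parent polygon?
children["inside_parent"] = [parent_polygon.contains(p) for p in child_points]
```

From here, the outline in parent_polygon.exterior.xy and the child coordinates can be handed to whatever plotting library you prefer.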
This looks like the kind of place where there's a long row of stores down the center, surrounded by parking. Let's make sure that makes sense. First of all... does the polygon include parking? We can check in the includes_parking_lot column.
Yep! That's a parking lot. Knowing that we have a parking lot is important for interpreting foot traffic data. For example, foot traffic to a McDonald's means something very different depending on whether or not we pick up the drive-thru.
So we have a long row of stores and a parking lot. Is "one long row of stores" what it actually looks like? We can take a quick check of Google Maps to see:
That looks right to me! What is notable is that Pleasant Valley turns out to be an outdoor shopping center, which goes to show that sometimes these kinds of locations can also be "enclosed."
Let's look at another part of the data, picking out a parent location that has non-enclosed children. Unlike the first data set, this one was not produced using the Match Service, but rather by pulling all locations in a certain ZIP code. This lets us see, and understand how to import, this alternate data structure.
This time we'll be working with the Shoppes at Lac de Ville, which is in Rochester, New York. This is the Placekey ID [email protected], and so to get both the parent and child data, we can look for that code in either the parent_placekey column or the placekey column.
And are these actually non-enclosed child locations? Let's make sure.
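The filter and the check might be sketched as follows. The Placekey values are made-up stand-ins (the real identifier is not reproduced here), and the parent row's enclosed value is set to False in the toy data purely to keep the column boolean.

```python
import pandas as pd

# Hypothetical stand-in for the shopping center's Placekey.
target_placekey = "parent-1"

# Toy pull of all POIs in an area, with made-up identifiers.
pois = pd.DataFrame({
    "placekey": ["parent-1", "child-a", "child-b", "other-1"],
    "parent_placekey": [None, "parent-1", "parent-1", "parent-2"],
    "enclosed": [False, False, False, True],
})

# Keep the parent row itself plus every row that names it as parent.
center = pois[
    (pois["placekey"] == target_placekey)
    | (pois["parent_placekey"] == target_placekey)
]

# Check the enclosed flag on the children only.
children = center[center["parent_placekey"] == target_placekey]
enclosed_counts = children["enclosed"].value_counts()
print(enclosed_counts)
```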
All false! That's what we were expecting.
The first thing to be aware of when dealing with non-enclosed data is that unless we're careful, we'll double-count foot traffic. Foot traffic that shows up for a child will also show up for its parent. So we will want to drop one or the other if we're going to be aggregating things up and don't want to double-count.
How can we tell if we have double-counting going on? Well, it should be going on any time you have a non-enclosed child location. But it's especially easy to see if the parent location doesn't have any visitors outside of its children, as is the case here. We can add up the daily visits for the parent, and for all the children, and should get the exact same values.
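A minimal sketch of that comparison is below. It assumes the daily visits are stored as a JSON-encoded list in a column called visits_per_day (the name used earlier in this guide); the exact column name and format in your file may differ, and the numbers are invented.

```python
import json
import pandas as pd

# Toy data: one parent and two non-enclosed children, three days of visits.
patterns = pd.DataFrame({
    "placekey": ["parent-1", "child-a", "child-b"],
    "parent_placekey": [None, "parent-1", "parent-1"],
    "visits_per_day": ["[5, 7, 9]", "[2, 3, 4]", "[3, 4, 5]"],
})

# Parse each JSON string into a Python list of daily counts.
daily = patterns["visits_per_day"].apply(json.loads)
is_child = patterns["parent_placekey"].notna()

# Sum visits day by day, separately for the parent and for all children.
parent_daily = [sum(day) for day in zip(*daily[~is_child])]
children_daily = [sum(day) for day in zip(*daily[is_child])]

# To avoid double-counting when aggregating, keep only one side.
parents_only = patterns[~is_child]
```

If the two lists match, as they do in this toy case, every parent visit is already accounted for by a child, so summing parent and children together would count each visit twice.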
They're exactly the same! Clearly if we want to work with foot traffic data, we'll need to only use one or the other.
Next we can ask how to deal with the spatial arrangement of our data. This time, we have data where each place of interest has its own polygon in the polygon_wkt column.
Each POI will have its own geometry whenever we have non-enclosed children, but keep in mind this will also sometimes be the case with enclosed children. Just be sure to check whether it's there!
When it comes to polygons in close proximity like this, sometimes we can be certain of how well we have the shape down, and other times we can't. For this we'd want to look at the polygon_class column. Ideally we want this to be an OWNED_POLYGON indicating that we can map the location to a specific polygon. Otherwise, there might be a little uncertainty. What do we have here?
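One way to tabulate this is to label each row as parent or child and count polygon classes within each group. The sketch below uses invented rows; the role column is a helper introduced only for this example.

```python
import pandas as pd

# Toy data: one parent and three children with made-up identifiers.
center = pd.DataFrame({
    "placekey": ["parent-1", "child-a", "child-b", "child-c"],
    "parent_placekey": [None, "parent-1", "parent-1", "parent-1"],
    "polygon_class": [
        "OWNED_POLYGON", "OWNED_POLYGON", "SHARED_POLYGON", "SHARED_POLYGON",
    ],
})

# Helper column (hypothetical): child if the row has a parent, else parent.
center["role"] = center["parent_placekey"].notna().map(
    {True: "child", False: "parent"}
)

# Count rows for each combination of role and polygon class.
class_counts = center.groupby(["role", "polygon_class"]).size()
print(class_counts)
```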
We have a single parent location that is an OWNED_POLYGON, as well as 13 children OWNED_POLYGONs. In addition, we have 19 children SHARED_POLYGONs, which means that multiple POIs have ended up sharing the same polygon; these POIs can't be distinguished, or they may literally share the same space. More detail here.
As you might expect, the child polygons sit inside of the parent polygon. Using the same methods as before, this time we can actually get the internal structure of the location.
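A sketch of how the containment and coverage could be checked with shapely follows. The polygons are invented rectangles, and the area ratio is computed in raw coordinate units, so treat it as a rough share rather than a true measurement.

```python
from shapely import wkt
from shapely.ops import unary_union

# Made-up parent polygon and two adjacent child polygons in WKT.
parent = wkt.loads("POLYGON ((0 0, 10 0, 10 6, 0 6, 0 0))")
child_wkts = [
    "POLYGON ((1 4, 4 4, 4 5, 1 5, 1 4))",
    "POLYGON ((4 4, 8 4, 8 5, 4 5, 4 4))",
]
children = [wkt.loads(w) for w in child_wkts]

# Does every child polygon sit inside the parent polygon?
all_inside = all(parent.contains(child) for child in children)

# Merge the children and measure how much of the parent they leave uncovered.
covered = unary_union(children)
uncovered_share = parent.difference(covered).area / parent.area
print(all_inside, round(uncovered_share, 3))
```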
We can see the exact structure taken up by the children. It doesn't fill the whole parent space! And yet, every single visit to the parent was accounted for by a child. What gives? Well, all that blank space is parking lot, as we can tell from a quick glance at the Google Maps shot of the location:
And while the parent location includes the parking lot...
The children don't...
And so what's happening? SafeGraph is willing to count visits to parent POIs that aren't to any children, and there are areas here that are part of the parent but not the children. But in this case we can see that we aren't counting any visits from the parking lot to the parent POI (or perhaps there weren't any, but that seems unlikely). Good to know!
So there we have it! Some reminders: