Working with Locations Inside Other Locations

March 10, 2021
Nick Huntington-Klein

SafeGraph data is all about examining foot traffic at a given location. However, what do you do when a single piece of land can be rightfully claimed by two different locations?

It's not that unusual for one place to be two places. Take the Claire's in your nearby mall. Is a visitor there in the Claire's? In the mall? Really, they're in both.

SafeGraph addresses this problem using the concept of parent and child locations. Some overarching location that contains many other locations within it is a parent. Common types of parents include malls, airports, and shopping centers. You can see the full list of location types that can be parents, and some more information about parent/child relationships, here. The location that's inside of some sort of parent is a child.

This blog will show you how to identify parent and child locations in data that comes from the SafeGraph Shop, and what to do about them. When you're ready to get started, check out this notebook.

Dealing with parent and child locations is an important part of working with SafeGraph data, even if you're not interested in distinguishing them. If you're not careful, you could end up double-counting foot traffic: once for the parent, and once for the child.

So some important questions:

  1. How can we look in our Shop data and figure out where we have parent and child locations?
  2. How can we work with data that has parent and child locations?
  3. How can we think about the spatial orientation of parent and child locations?

Let's start by opening up our Shop data and loading it in.

This particular store data was constructed using the Match Service, where we found matches for a set of child POIs. It contains a set of customer_ columns holding the child data used to match, and then another set of columns containing the match data, i.e. the parent.
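As a rough sketch of how those two sets of columns could be pulled apart with pandas: the tiny DataFrame below is made up for illustration, and the specific customer_ column names are assumptions. Only the customer_ prefix itself comes from the data described here.

```python
import pandas as pd

# Hypothetical stand-in for a Match Service export; a real file has many more columns.
shop = pd.DataFrame({
    "customer_location_name": ["Shop A", "Shop B"],      # assumed column name
    "customer_street_address": ["1 Main St", "2 Main St"],  # assumed column name
    "placekey": ["pk-parent", "pk-parent"],               # made-up placekey
    "location_name": ["Example Plaza", "Example Plaza"],
})

# Columns whose names start with "customer_" hold the input used for matching.
customer_cols = [c for c in shop.columns if c.startswith("customer_")]
customer_data = shop[customer_cols]

# Everything else is the matched data; the same match can repeat across rows,
# so drop the duplicates.
match_data = (
    shop.drop(columns=customer_cols)
    .drop_duplicates()
    .reset_index(drop=True)
)

print(customer_data)
print(match_data)
```

With a real file you would replace the hand-built DataFrame with something like pd.read_csv on your Shop download.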

Store data that was acquired in a different way, for example getting all POIs in a certain city, might be structured slightly differently, so the next steps might not be necessary for you.

So, question 1 as above:

1. How can we look in our Shop data and figure out where we have parent and child locations?

We've got the Shop data, and we've separated out the match data. For a given set of POIs that contains both parents and children, we can figure out which is which by looking at the parent_placekey column. This column is missing for any location without a parent. For child locations, it tells you which location is the parent.

Children are any rows with a non-missing parent_placekey.

And parents are any rows whose placekey appears as the parent_placekey of some other row.

Let's find this in our data here:
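A minimal sketch of those two rules in pandas. The placekeys and location names are made up; only the placekey and parent_placekey column names come from the data.

```python
import pandas as pd

# Toy rows with made-up placekeys; real rows come from the Shop file.
poi = pd.DataFrame({
    "placekey": ["pk-mall", "pk-store-1", "pk-store-2", "pk-standalone"],
    "parent_placekey": [None, "pk-mall", "pk-mall", None],
    "location_name": ["Example Mall", "Store 1", "Store 2", "Standalone Diner"],
})

# A child is any row with a non-missing parent_placekey.
poi["is_child"] = poi["parent_placekey"].notna()

# A parent is any row whose placekey shows up as another row's parent_placekey.
poi["is_parent"] = poi["placekey"].isin(poi["parent_placekey"].dropna())

print(poi[["location_name", "is_parent", "is_child"]])
```

Note that a location can be neither (the standalone diner here), so the two flags are worth keeping separate rather than treating "not a child" as "parent".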

Now it just so happens that the data we've taken from the Shop for this demonstration includes exactly one parent: the Pleasant Valley Marketplace in Virginia Beach, and all of its children.

With our data loaded, and the parents and children identified, we can move on to our next question:

2. How can we work with data that has parent and child locations?

This requires us to think about what kind of parent/child relationship we have. The most important distinction is whether or not the child is enclosed within its parent.

The enclosed column tells you whether a child is enclosed within its parent. This is something like that Claire's in the mall. It's really inside that mall, as opposed to a burger joint in an outdoor strip mall, which is in that strip mall but maybe also sort of its own space. When a location like Claire's is enclosed, it can be difficult to tell the difference between a device being in Claire's and being near Claire's inside the mall.

The way we deal with parent and child locations differs considerably based on whether the children are enclosed or not.

In the case of Pleasant Valley Marketplace, the children are enclosed (enclosed == True).

Children that are enclosed (enclosed == True) are basically not distinguished from their parents. They do not have their own separate foot traffic data. Sometimes they have their own polygons, but sometimes they're just a part of the parent polygon, although they may have their own latitude/longitude data.

Children that are not enclosed (enclosed == False) have parents but also act as independent locations. We track visitor data like visits_per_day separately for those locations, and they have their own polygon data in the polygon_wkt column.
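To see which kind of children you have, you can split on the enclosed column. This is only a sketch with made-up rows, and it assumes enclosed has already been read in as proper True/False values; in a real CSV the column may contain missing values for POIs without a parent and need cleaning first.

```python
import pandas as pd

# Made-up children of two hypothetical parents.
children = pd.DataFrame({
    "location_name": ["Kiosk", "Food Court Stall", "Corner Bank"],
    "parent_placekey": ["pk-mall", "pk-mall", "pk-plaza"],
    "enclosed": [True, True, False],
})

# Enclosed children share their parent's foot traffic.
enclosed_children = children[children["enclosed"]]

# Non-enclosed children have their own foot traffic and polygons.
non_enclosed_children = children[~children["enclosed"]]

print(children["enclosed"].value_counts())
```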

The Pleasant Valley Marketplace is full of enclosed children, so we'll talk about how to handle that first.

Working with Enclosed Children

How can we handle data from enclosed children? Well, we can ignore the children's foot traffic data, since they don't really have any. We can take the parent foot traffic data we see, and that covers the entire region.

However, while we don't have traffic data for the children, we do have plenty of other information about them from the core information columns.

placekey parent_placekey location_name
0 [email protected] [email protected] Western Union
1 [email protected] [email protected] Sharps Barbershop
2 [email protected] [email protected] Cajun Seafood & Wings
4 [email protected] [email protected] Mamma Mia Pizzeria
5 [email protected] [email protected] Kingdom World Outreach Center
6 [email protected] [email protected] Saigon 1
7 [email protected] [email protected] DHL
8 [email protected] [email protected] Sally's Bakery & Grocery
9 [email protected] [email protected] Smoke Shack
10 [email protected] [email protected] Dolphin Laundromat
11 [email protected] [email protected] State Farm
12 [email protected] [email protected] Krossroads Cafe and Tavern
13 [email protected] [email protected] Allstate Insurance
14 [email protected] [email protected] Iglesia Cristiana Rios de Agua Viva de Virginia Beach
15 [email protected] [email protected] Family Dollar Stores
16 [email protected] [email protected] Tung Hoi Chinese Restaurant
17 [email protected] [email protected] Fj Beauty Studios
18 [email protected] [email protected] Adamo's New York Pizzeria
19 [email protected] [email protected] Food Lion
20 [email protected] [email protected] Tokyo Express

For example, maybe we're interested in the kinds of businesses and locations that are inside the shopping center.
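One way to get that kind of tally is to count rows by category pair. A small sketch with invented rows; top_category and sub_category are the column names shown in the output, everything else is made up.

```python
import pandas as pd

# Invented children of a single shopping center.
children = pd.DataFrame({
    "location_name": ["Pizza Place", "Noodle House", "Corner Barber"],
    "top_category": [
        "Restaurants and Other Eating Places",
        "Restaurants and Other Eating Places",
        "Personal Care Services",
    ],
    "sub_category": [
        "Full-Service Restaurants",
        "Full-Service Restaurants",
        "Barber Shops",
    ],
})

# Number of child POIs in each (top_category, sub_category) pair.
category_counts = children.groupby(["top_category", "sub_category"]).size()

print(category_counts)
```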

top_category sub_category
Activities Related to Credit Intermediation Other Activities Related to Credit Intermediation 1
Agencies, Brokerages, and Other Insurance Related Activities Insurance Agencies and Brokerages 2
Bakeries and Tortilla Manufacturing Retail Bakeries 1
Couriers and Express Delivery Services Couriers and Express Delivery Services 1
Drycleaning and Laundry Services Drycleaning and Laundry Services (except Coin-Operated) 1
General Merchandise Stores, including Warehouse Clubs and Supercenters All Other General Merchandise Stores 1
Grocery Stores Supermarkets and Other Grocery (except Convenience) Stores 1
Other Miscellaneous Store Retailers Tobacco Stores 1
Personal Care Services Barber Shops 1
Personal Care Services Beauty Salons 1
Religious Organizations Religious Organizations 2
Restaurants and Other Eating Places Full-Service Restaurants 7

3. How can we think about the spatial orientation of parent and child locations?

In the case of enclosed-children data where the children have their own polygons, you can handle spatial orientation as normal. Simply look at the polygons!

But what if they don't, as in this data? At this point, you're stuck with just the parent polygon. But you can still do a little something with the children, because they will have latitude and longitude data that you can work with.

So we'll start by mapping out the parent polygon. The polygon_wkt column is information about the POI's polygon in WKT format.

We can add the child locations on as points (see this guide).
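If you want a quick sanity check without a mapping library, here is a sketch that parses a simple WKT polygon by hand and tests whether each child's longitude/latitude falls inside the parent polygon using ray casting. The coordinates are invented, the helper names are hypothetical, and the parser only handles a plain single-ring POLYGON; in practice a geometry library such as shapely (shapely.wkt.loads) is the more usual tool.

```python
def parse_wkt_polygon(wkt):
    """Return the outer ring of a simple 'POLYGON ((x y, ...))' string as (lon, lat) tuples."""
    inner = wkt.strip()[len("POLYGON"):].strip()
    # Keep only the first (outer) ring; holes are ignored in this sketch.
    outer = inner.lstrip("(").split(")")[0]
    return [tuple(float(v) for v in pair.split()) for pair in outer.split(",")]


def point_in_polygon(lon, lat, ring):
    """Ray casting: count how many edges a ray heading east from the point crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Made-up parent polygon and child points (longitude, latitude).
parent_ring = parse_wkt_polygon("POLYGON ((0 0, 4 0, 4 2, 0 2, 0 0))")
child_points = {"Shop A": (1.0, 1.0), "Shop B": (3.5, 0.5), "Stray point": (5.0, 1.0)}

for name, (lon, lat) in child_points.items():
    print(name, point_in_polygon(lon, lat, parent_ring))
```

A child point landing outside its parent polygon would be worth a closer look before trusting the geometry.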

This looks like the kind of place where there's a long row of stores down the center, surrounded by parking. Let's make sure that makes sense. First of all... does the polygon include parking? We can check in the includes_parking_lot column.
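A small sketch of that check, with invented rows. It assumes includes_parking_lot holds True/False values, and it finds the parent by its missing parent_placekey, which only works here because the toy data contains no standalone POIs.

```python
import pandas as pd

# Invented rows: one parent and two children.
poi = pd.DataFrame({
    "location_name": ["Example Plaza", "Shop 1", "Shop 2"],
    "parent_placekey": [None, "pk-plaza", "pk-plaza"],
    "includes_parking_lot": [True, False, False],
})

# In this toy data the parent is the only row without a parent_placekey.
parent_rows = poi[poi["parent_placekey"].isna()]
child_rows = poi[poi["parent_placekey"].notna()]

parent_has_parking = bool(parent_rows["includes_parking_lot"].all())
any_child_has_parking = bool(child_rows["includes_parking_lot"].any())

print(parent_has_parking, any_child_has_parking)
```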

Yep! That's a parking lot. Knowing that we have a parking lot is important for interpreting foot traffic data. For example, foot traffic to a McDonald's means something very different depending on whether or not we pick up the drive-thru.

So we have a long row of stores and a parking lot. Is "one long row of stores" what it actually looks like? We can take a quick check of Google Maps to see:

Source: Google Maps

That looks right to me! What is notable is that Pleasant Valley turns out to be an outdoor shopping center, which goes to show that sometimes these kinds of locations can also be "enclosed."

Working with Non-Enclosed Children

Let's look at another part of the data, picking out a parent location that has non-enclosed children. Unlike the first data set, this one was not produced using the Match service, but rather by pulling all locations in a certain zip code. This means we can see, and understand how to import, this alternate structure of data.

This time we'll be working with the Shoppes at Lac de Ville, which is in Rochester, New York. This is the Placekey ID [email protected], and so to get both the parent and child data, we can look for that code in either the parent_placekey column or the placekey column.
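A sketch of that lookup, using a made-up key in place of the real Placekey:

```python
import pandas as pd

# Invented rows standing in for "all POIs in a zip code".
poi = pd.DataFrame({
    "placekey": ["pk-plaza", "pk-shop-1", "pk-shop-2", "pk-elsewhere"],
    "parent_placekey": [None, "pk-plaza", "pk-plaza", None],
    "location_name": ["Example Plaza", "Shop 1", "Shop 2", "Unrelated Cafe"],
})

parent_key = "pk-plaza"  # hypothetical; substitute the parent's actual Placekey

# Keep the parent itself plus every row that names it as its parent.
family = poi[(poi["placekey"] == parent_key) | (poi["parent_placekey"] == parent_key)]

print(family)
```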

placekey parent_placekey location_name bucketed_dwell_times
43 [email protected] NaN Shoppes At Lac De Ville {"<5":34,"5-10":421,"11-20":274,"21-60":430,"61-120":133,"121-240":57,">240":177}
56 [email protected] [email protected] Allstate Insurance NaN
59 [email protected] [email protected] Parker Robt E III DDS {"<5":1,"5-10":2,"11-20":1,"21-60":14,"61-120":8,"121-240":1,">240":0}
97 [email protected] [email protected] Silk {"<5":2,"5-10":9,"11-20":7,"21-60":2,"61-120":1,"121-240":0,">240":0}
153 [email protected] [email protected] Citizens Bank {"<5":0,"5-10":1,"11-20":0,"21-60":0,"61-120":1,"121-240":0,">240":0}
187 [email protected] [email protected] Ritz Stacey M Od {"<5":1,"5-10":3,"11-20":3,"21-60":25,"61-120":12,"121-240":2,">240":5}
224 [email protected] [email protected] Mobile Notary Service NaN
227 [email protected] [email protected] Joseph I Mann MD Greater Rochester Neurology {"<5":0,"5-10":5,"11-20":1,"21-60":16,"61-120":5,"121-240":2,">240":0}
229 [email protected] [email protected] Project Leannation {"<5":2,"5-10":23,"11-20":4,"21-60":8,"61-120":0,"121-240":1,">240":0}
283 [email protected] [email protected] Visionary Eye Associates {"<5":0,"5-10":2,"11-20":1,"21-60":3,"61-120":1,"121-240":0,">240":0}
286 [email protected] [email protected] Julian's Dry Cleaners NaN
296 [email protected] [email protected] Rochester Eye Associates NaN
301 [email protected] [email protected] Mesquite Grill {"<5":0,"5-10":4,"11-20":4,"21-60":4,"61-120":3,"121-240":1,">240":11}
395 [email protected] [email protected] Dollar General {"<5":0,"5-10":31,"11-20":25,"21-60":37,"61-120":10,"121-240":15,">240":28}
396 [email protected] [email protected] Bolsa Nails {"<5":0,"5-10":0,"11-20":0,"21-60":0,"61-120":0,"121-240":1,">240":0}
426 [email protected] [email protected] Thimble Tailoring & Clothier NaN
439 [email protected] [email protected] M&T Bank {"<5":0,"5-10":1,"11-20":0,"21-60":3,"61-120":1,"121-240":0,">240":1}
463 [email protected] [email protected] Liberty Wine & Liquor {"<5":0,"5-10":2,"11-20":0,"21-60":0,"61-120":0,"121-240":0,">240":0}
471 [email protected] [email protected] Feet First Shoes and Pedorthics {"<5":1,"5-10":3,"11-20":7,"21-60":2,"61-120":0,"121-240":1,">240":0}
514 [email protected] [email protected] CVS {"<5":0,"5-10":14,"11-20":11,"21-60":10,"61-120":3,"121-240":2,">240":0}
558 [email protected] [email protected] Boomtown Cafe {"<5":0,"5-10":20,"11-20":8,"21-60":14,"61-120":3,"121-240":1,">240":4}
580 [email protected] [email protected] Evangelisti Reconstructive & Plastic Surgery {"<5":0,"5-10":1,"11-20":0,"21-60":9,"61-120":4,"121-240":0,">240":0}
618 [email protected] [email protected] Oreck {"<5":1,"5-10":0,"11-20":1,"21-60":0,"61-120":0,"121-240":0,">240":0}
625 [email protected] [email protected] Dupont David OD {"<5":1,"5-10":3,"11-20":3,"21-60":11,"61-120":7,"121-240":0,">240":0}
779 [email protected] [email protected] Amaya Indian Cuisine {"<5":2,"5-10":22,"11-20":5,"21-60":17,"61-120":3,"121-240":7,">240":18}
785 [email protected] [email protected] Stephen Evangelisti {"<5":0,"5-10":0,"11-20":0,"21-60":0,"61-120":0,"121-240":0,">240":1}
835 [email protected] [email protected] MacGregor's Grill & Tap {"<5":6,"5-10":73,"11-20":38,"21-60":21,"61-120":8,"121-240":2,">240":1}
854 [email protected] [email protected] Paislee Boutique {"<5":0,"5-10":5,"11-20":4,"21-60":4,"61-120":0,"121-240":0,">240":0}
857 [email protected] [email protected] Tops Friendly Markets {"<5":15,"5-10":147,"11-20":123,"21-60":158,"61-120":17,"121-240":11,">240":45}
868 [email protected] [email protected] Brighton Towne Dental {"<5":1,"5-10":1,"11-20":2,"21-60":10,"61-120":4,"121-240":2,">240":26}
889 [email protected] [email protected] United States Postal Service (USPS) NaN
900 [email protected] [email protected] Rita's Italian Ice {"<5":1,"5-10":24,"11-20":11,"21-60":29,"61-120":8,"121-240":4,">240":35}
902 [email protected] [email protected] CaminoByTheWay {"<5":0,"5-10":25,"11-20":15,"21-60":33,"61-120":34,"121-240":4,">240":2}

And are these actually non-enclosed child locations? Let's make sure.

All false! That's what we were expecting.

The first thing to be aware of when dealing with non-enclosed data is that unless we're careful, we'll double-count foot traffic. Foot traffic that shows up for a child will also show up for its parent. So we will want to drop one or the other if we're going to be aggregating things up and don't want to double-count.

How can we tell if we have double-counting going on? Well, it should be going on any time you have a non-enclosed child location. But it's especially easy to see if the parent location doesn't have any visitors outside of its children, as is the case here. We can add up the daily visits for the parent, and for all the children, and should get the exact same values.
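Here is one way that comparison might look. This sketch assumes the daily visits are stored as a JSON-encoded list in a column called visits_by_day; both the column name and the format are assumptions, so adjust them to match your file. The numbers are invented.

```python
import json

import pandas as pd

# Made-up rows: one parent and two non-enclosed children, three days each.
poi = pd.DataFrame({
    "placekey": ["pk-plaza", "pk-shop-1", "pk-shop-2"],
    "parent_placekey": [None, "pk-plaza", "pk-plaza"],
    "visits_by_day": ["[5, 7, 3]", "[2, 4, 1]", "[3, 3, 2]"],  # assumed format
})

# Decode each JSON string into a list of daily counts.
daily = poi["visits_by_day"].apply(json.loads)

is_parent = poi["placekey"] == "pk-plaza"
parent_visits = daily[is_parent].iloc[0]

# Add up the children day by day.
child_visits = [sum(day) for day in zip(*daily[~is_parent].tolist())]

comparison = pd.DataFrame({
    "parent_visits": parent_visits,
    "child_visits": child_visits,
})
comparison["matches"] = comparison["parent_visits"] == comparison["child_visits"]

print(comparison)
```

If the parent also records visits that belong to none of its children, parent_visits will be larger than child_visits on some days rather than equal.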

parent_visits child_visits
0 25 25.0
1 28 28.0
2 28 28.0
3 54 54.0
4 77 77.0
5 55 55.0
6 54 54.0
7 58 58.0
8 29 29.0
9 30 30.0
10 67 67.0
11 73 73.0
12 61 61.0
13 55 55.0
14 58 58.0
15 33 33.0
16 22 22.0
17 43 43.0
18 83 83.0
19 51 51.0
20 50 50.0
21 66 66.0
22 50 50.0
23 25 25.0
24 56 56.0
25 58 58.0
26 62 62.0
27 53 53.0
28 65 65.0
29 36 36.0
30 21 21.0

They're exactly the same! Clearly if we want to work with foot traffic data, we'll need to only use one or the other.
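A sketch of the two ways to avoid the double count before aggregating. The rows and the total_visits column are hypothetical, and the enclosed flag is assumed to be True/False for children and missing otherwise.

```python
import pandas as pd

# Invented rows; total_visits is a hypothetical per-POI visit total.
poi = pd.DataFrame({
    "placekey": ["pk-plaza", "pk-shop-1", "pk-shop-2", "pk-standalone"],
    "parent_placekey": [None, "pk-plaza", "pk-plaza", None],
    "enclosed": [None, False, False, None],
    "total_visits": [100, 60, 40, 25],
})

is_non_enclosed_child = poi["enclosed"].eq(False)

# Option A: keep the children, drop any parent that has non-enclosed children.
parents_to_drop = poi.loc[is_non_enclosed_child, "parent_placekey"].unique()
keep_children = poi[~poi["placekey"].isin(parents_to_drop)]

# Option B: keep the parents, drop the non-enclosed children.
keep_parents = poi[~is_non_enclosed_child]

print(poi["total_visits"].sum())            # naive total, counts the plaza twice
print(keep_children["total_visits"].sum())  # children plus standalone POIs
print(keep_parents["total_visits"].sum())   # parents plus standalone POIs
```

The two options agree here only because the toy parent has no visits beyond its children. When a parent does record extra visits, Option A loses them, so Option B is the safer choice for area-wide totals.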

Next we can ask how to deal with the spatial arrangement of our data. This time, we have data where each place of interest has its own polygon in the polygon_wkt column.

56 POLYGON ((-77.59417363840089 43.119868567405916, -77.59414413410173 43.119943943129186, -77.59391480523095 43.119885208807354, -77.5939416273211 43.11981472754674, -77.59417363840089 43.119868567405916))
59 POLYGON ((-77.59143161215052 43.120892209242385, -77.5912596878754 43.120816922141515, -77.591356 43.120596, -77.591542 43.120639, -77.59143161215052 43.120892209242385))
97 POLYGON ((-77.59414341246315 43.11994394312916, -77.59407635723778 43.120070221730195, -77.59401466643044 43.120098609906954, -77.593844346158 43.12004868586318, -77.59391274248787 43.119885208807375, -77.59414341246315 43.11994394312916))
153 POLYGON ((-77.59200493412783 43.120714745000875, -77.59228924828341 43.12079207746439, -77.59237105565836 43.120632273247, -77.59284446554949 43.120730896725995, -77.59273985939791 43.12101428539153, -77.59253094468227 43.12150846710204, -77.59250412259212 43.12151042486173, -77.5924638894569 43.12157307313886, -77.59240219864955 43.121608312766554, -77.59197036299815 43.12150455158246, -77.5919971850883 43.12144777652045, -77.59196231637111 43.12144777652045, -77.59193549428096 43.12150259382258, -77.59167800221553 43.12143211442511, -77.59185502801051 43.121030771864206, -77.59177456174007 43.121015109662125, -77.59183088812938 43.120928967478925, -77.59188453230968 43.12094854525849, -77.59200493412783 43.120714745000875))
187 POLYGON ((-77.5918583166116 43.120005479690604, -77.59162764663633 43.119952618856665, -77.59174566383298 43.11967656709296, -77.59197901601728 43.11972747034871, -77.5918583166116 43.120005479690604))
224 POLYGON ((-77.59200493412783 43.120714745000875, -77.59228924828341 43.12079207746439, -77.59237105565836 43.120632273247, -77.59284446554949 43.120730896725995, -77.59273985939791 43.12101428539153, -77.59253094468227 43.12150846710204, -77.59250412259212 43.12151042486173, -77.5924638894569 43.12157307313886, -77.59240219864955 43.121608312766554, -77.59197036299815 43.12150455158246, -77.5919971850883 43.12144777652045, -77.59196231637111 43.12144777652045, -77.59193549428096 43.12150259382258, -77.59167800221553 43.12143211442511, -77.59185502801051 43.121030771864206, -77.59177456174007 43.121015109662125, -77.59183088812938 43.120928967478925, -77.59188453230968 43.12094854525849, -77.59200493412783 43.120714745000875))
227 POLYGON ((-77.59164308521562 43.12040763170624, -77.59147215835709 43.120331919138046, -77.59159 43.120058, -77.591775 43.120101, -77.59164308521562 43.12040763170624))
229 POLYGON ((-77.59423801141725 43.11974620402106, -77.59417363840089 43.11987052521808, -77.59394430953012 43.11981276973279, -77.59399124818788 43.1196972585986, -77.59423801141725 43.11974620402106))
283 POLYGON ((-77.591946 43.121896, -77.591944 43.121862, -77.591803 43.121865, -77.591801 43.121798, -77.591748 43.121784, -77.591736 43.121808, -77.591592 43.12177, -77.591574 43.121808, -77.591446 43.121774, -77.591542 43.121581, -77.591813 43.121654, -77.591804 43.121671, -77.591936 43.121668, -77.591937 43.121699, -77.592063 43.121696, -77.592065 43.121762, -77.592282 43.121757, -77.592288 43.121889, -77.591946 43.121896))

Each POI having its own geometry is going to be the case whenever we have non-enclosed children, but keep in mind it will also sometimes be the case with enclosed children. Just be sure to check whether it's there!

When it comes to polygons in close proximity like this, sometimes we can be certain of how well we have the shape down, and other times we can't. For this we'd want to look at the polygon_class column. Ideally we want this to be an OWNED_POLYGON indicating that we can map the location to a specific polygon. Otherwise, there might be a little uncertainty. What do we have here?
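A quick way to answer that is to count polygon_class values separately for the parent and its children. The sketch below uses invented rows and also flags POIs whose polygon_wkt is identical to another row's, which is one visible symptom of a shared polygon.

```python
import pandas as pd

# Invented rows: one parent and three children, two of which share a polygon.
poi = pd.DataFrame({
    "location_name": ["Example Plaza", "Bank", "Notary", "Cafe"],
    "parent_placekey": [None, "pk-plaza", "pk-plaza", "pk-plaza"],
    "polygon_class": ["OWNED_POLYGON", "SHARED_POLYGON", "SHARED_POLYGON", "OWNED_POLYGON"],
    "polygon_wkt": [
        "POLYGON ((0 0, 9 0, 9 9, 0 9, 0 0))",
        "POLYGON ((1 1, 2 1, 2 2, 1 2, 1 1))",
        "POLYGON ((1 1, 2 1, 2 2, 1 2, 1 1))",
        "POLYGON ((3 3, 4 3, 4 4, 3 4, 3 3))",
    ],
})

# In this toy data, the row without a parent_placekey is the parent.
poi["role"] = poi["parent_placekey"].isna().map({True: "parent", False: "child"})

# Count polygon classes within each role.
class_counts = poi.groupby(["role", "polygon_class"]).size()

# Rows whose polygon text also appears on another row.
shares_polygon = poi["polygon_wkt"].duplicated(keep=False)

print(class_counts)
print(poi.loc[shares_polygon, "location_name"].tolist())
```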

We have a single parent location that is an OWNED_POLYGON, as well as 13 children OWNED_POLYGONs. In addition, we have 19 children SHARED_POLYGONs, which means that multiple POIs have ended up sharing the same polygon. These POIs can't be distinguished, or they may literally share the same space. More detail here.

As you might expect, the child polygons sit inside of the parent polygon. Using the same methods as before, this time we can actually get the internal structure of the location.
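Beyond drawing the map, you can put a rough number on how much of the parent the children cover. The sketch below parses simple WKT polygons by hand and compares areas with the shoelace formula. The coordinates and helper names are invented, the areas are in squared degrees (so only the ratio means anything, and only over a small area), and overlapping child polygons would be over-counted; a geometry library would handle all of this more carefully.

```python
def parse_wkt_polygon(wkt):
    """Return the outer ring of a simple 'POLYGON ((x y, ...))' string as (x, y) tuples."""
    inner = wkt.strip()[len("POLYGON"):].strip()
    # Keep only the first (outer) ring; holes are ignored in this sketch.
    outer = inner.lstrip("(").split(")")[0]
    return [tuple(float(v) for v in pair.split()) for pair in outer.split(",")]


def ring_area(ring):
    """Shoelace formula for a closed ring (first point repeated at the end)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Made-up geometry: a parent and three children, two sharing one polygon.
parent_wkt = "POLYGON ((0 0, 10 0, 10 4, 0 4, 0 0))"
child_wkts = [
    "POLYGON ((1 1, 4 1, 4 3, 1 3, 1 1))",
    "POLYGON ((5 1, 9 1, 9 3, 5 3, 5 1))",
    "POLYGON ((1 1, 4 1, 4 3, 1 3, 1 1))",  # shared polygon, same as the first
]

# Count each distinct polygon once so shared polygons are not added twice.
unique_child_wkts = set(child_wkts)

parent_area = ring_area(parse_wkt_polygon(parent_wkt))
child_area = sum(ring_area(parse_wkt_polygon(w)) for w in unique_child_wkts)
share_covered = child_area / parent_area

print(f"Children cover about {share_covered:.0%} of the parent polygon")
```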

We can see the exact structure taken up by the children. It doesn't fill the whole parent space! And yet, every single visit to the parent was accounted for by a child. What gives? Well, all that blank space is parking lot, as we can tell from a quick glance at the Google Maps shot of the location:

Source: Google Maps

And while the parent location includes the parking lot...

The children don't...

And so what's happening? SafeGraph is willing to count visits to parent POIs that aren't to any children, and there are areas here that are part of the parent but not the children. But in this case we can see that we aren't counting any visits from the parking lot to the parent POI (or perhaps there weren't any, but that seems unlikely). Good to know!

Wrapping Up

So there we have it! Some reminders:

  • Parent locations like malls and airports have child locations inside of them
  • Locations can include or exclude parking lots
  • Children can be enclosed or non-enclosed
  • Enclosed children don't get their own foot traffic data
  • Non-enclosed children do, and if you're aggregating up you want to drop either the parents or the non-enclosed children or else you'll double-count
  • Some enclosed children don't have their own polygons
  • Some enclosed children, and all non-enclosed children, should have their own polygons
  • The parent polygon can be bigger than the full list of its children, but sometimes this additional area doesn't record visits (sometimes it does, though)

Ready to get started? Use our Locations Inside Other Locations Notebook to run through this analysis.

Nick Huntington-Klein
Contributor @ SafeGraph