Why aren't the contents of matching Core Places and Patterns columns identical?

My data query from shop.safegraph.com included both core places and weekly patterns data joined into a single table, rather than two separate tables. This wouldn’t be inherently problematic, but this joined table included duplicate columns that are attributed to both source tables, e.g. sg_c__parent_placekey and sg_wp__parent_placekey. Again, not necessarily a problem, but in some records, the data in these columns do not match! Could someone shed some light on why this might be, and what the appropriate resolution is? Happy to share some examples if needed.

@cpjackson Can you please share some examples where you are seeing this?

Absolutely! Here are a few rows of weekly patterns whose parent placekeys disagree (only by one character, but still!)

  date_range_start   date_range_end   placekey              sg_c__parent_placekey     sg_wp__parent_placekey
  ------------------ ---------------- --------------------- ------------------------- --------------------------
  2021-11-29         2021-12-06       [email protected]   [email protected]       [email protected]
  2022-01-17         2022-01-24       [email protected]   [email protected]       [email protected]
  2021-12-27         2022-01-03       [email protected]   [email protected]       [email protected]
  2021-12-06         2021-12-13       [email protected]   [email protected]       [email protected]
  2022-01-03         2022-01-10       [email protected]   [email protected]       [email protected]

I haven’t yet reviewed my full dataset and every possibly duplicated column, so this is just one example.
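One way to review every duplicated column at once is to pair each `sg_c__` column with its `sg_wp__` twin and count the rows where they disagree. The following is only a minimal sketch in plain Python. It assumes the joined table has been loaded as a list of dicts (for example with `csv.DictReader`), and the helper names `duplicated_column_pairs` and `count_mismatches` are made up for illustration. The sample rows are not real records: the first is assembled from values quoted in this thread, the second is hypothetical.

```python
from collections import Counter

CORE_PREFIX = "sg_c__"       # prefix on Core Places columns in the joined table
PATTERNS_PREFIX = "sg_wp__"  # prefix on Weekly Patterns columns


def duplicated_column_pairs(columns):
    """Return (core_col, patterns_col) pairs that share the same base name."""
    available = set(columns)
    pairs = []
    for col in columns:
        if col.startswith(CORE_PREFIX):
            twin = PATTERNS_PREFIX + col[len(CORE_PREFIX):]
            if twin in available:
                pairs.append((col, twin))
    return pairs


def count_mismatches(rows):
    """Count, per base column name, the rows where the two copies differ."""
    counts = Counter()
    for row in rows:
        for core_col, wp_col in duplicated_column_pairs(list(row)):
            # Treat None and "" as the same missing value; ignore padding.
            core_value = (row[core_col] or "").strip()
            wp_value = (row[wp_col] or "").strip()
            if core_value != wp_value:
                counts[core_col[len(CORE_PREFIX):]] += 1
    return counts


# Illustrative rows only (see the note above).
rows = [
    {
        "sg_c__location_name": "Suzy's Freeze",
        "sg_wp__location_name": "Suzys Freeze",
        "sg_c__safegraph_brand_ids": "SG_BRAND_228837c9ee696b9e",
        "sg_wp__safegraph_brand_ids": "",
    },
    {
        "sg_c__location_name": "Example Cafe",   # hypothetical
        "sg_wp__location_name": "Example Cafe",  # hypothetical
        "sg_c__safegraph_brand_ids": "",
        "sg_wp__safegraph_brand_ids": None,
    },
]

for base_name, n in sorted(count_mismatches(rows).items()):
    print(base_name, n)
```

Comparing after trimming whitespace and treating empty and missing values as equal is a design choice here; a stricter comparison would flag more rows.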

@cpjackson Thanks for sharing. Adding some context about Placekey that might be helpful. I’ll still look into this a bit more but also sharing a link for Placekey feedback at the bottom of this post.

The What part of a Placekey is split into two triplets (for example: 223-227) where the first triplet is a serial index of an address located in the Where part of the Placekey, and the second triplet is a serial index of POIs located at that address. These are referred to as the address encoding, and POI encoding, respectively.

The POI encoding is optional, while the address encoding will always be present if the Placekey has a What part. If the POI does not have an actual address (like some parks or monuments), the address encoding will be “zzz”. A What part is only unique within its Where part, so the same What part can appear under different Where parts. Two businesses should still have distinct Placekeys even if they are right next to each other.
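To see which piece of two nearly identical Placekeys actually differs, the structure described above can be pulled apart in a few lines. This is a rough sketch, not official Placekey tooling: it assumes the usual `what@where` layout with triplets separated by hyphens, the function names `split_placekey` and `differing_parts` are invented for this example, and the Where part `abc-def-ghi` is a placeholder rather than a real location.

```python
def split_placekey(placekey):
    """Split a Placekey string into address encoding, POI encoding, and Where part."""
    what, sep, where = placekey.rpartition("@")
    if not sep:
        raise ValueError(f"not a Placekey (missing '@'): {placekey!r}")
    address_encoding = None
    poi_encoding = None
    if what:
        triplets = what.split("-")
        address_encoding = triplets[0]  # "zzz" when the POI has no address
        # The POI encoding is optional, so the second triplet may be absent.
        poi_encoding = triplets[1] if len(triplets) > 1 else None
    return {
        "address_encoding": address_encoding,
        "poi_encoding": poi_encoding,
        "where": where,
    }


def differing_parts(placekey_a, placekey_b):
    """List the named parts in which two Placekeys differ."""
    parts_a = split_placekey(placekey_a)
    parts_b = split_placekey(placekey_b)
    return [name for name in parts_a if parts_a[name] != parts_b[name]]


# "223-227" is the What part from the example above; the Where part is a placeholder.
print(split_placekey("223-227@abc-def-ghi"))
print(differing_parts("223-227@abc-def-ghi", "223-228@abc-def-ghi"))
```

Applied to a pair of mismatched parent Placekeys, `differing_parts` would show whether the one-character difference sits in the address encoding, the POI encoding, or the Where part.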

Would you be able to provide the feedback here so our product team can take a closer look?

I’m happy to send feedback to the Placekey team as well, but the issue is not limited to just the Parent Placekey columns. Here’s an example with some mismatches in the location names:

  date_range_start   date_range_end   placekey              sg_c__location_name                             sg_wp__location_name
  ------------------ ---------------- --------------------- ----------------------------------------------- ------------------------------------------
  2020-05-04         2020-05-11       [email protected]   Suzy's Freeze                                   Suzys Freeze
  2020-10-19         2020-10-26       [email protected]   John Van DDS Dental Arts of Mountain View       Dental Arts of Mountain View
  2021-10-04         2021-10-11       [email protected]   Lee Gary D DDS Dental Associates of Riverside   Dental Associates of Riverside
  2022-01-03         2022-01-10       [email protected]   Ucb Clark Kerr Campus Childrens Center          Ucb Clark Kerr Campus Children's Center
  2019-07-22         2019-07-29       [email protected]   Sunset 80s                                      Whisky A Go Go
  2019-12-02         2019-12-09       [email protected]   The Garage On Pico                              The Garage
  2018-04-23         2018-04-30       [email protected]   The Universal Church                            Tne Universal Church
  2020-02-10         2020-02-17       [email protected]   Nueva School The                                The Nueva School
  2018-11-12         2018-11-19       [email protected]   MediCann                                        Medicann
  2019-01-21         2019-01-28       [email protected]   The Universal Church                            Tne Universal Church

Here are some mismatches in the street addresses:

  date_range_start   date_range_end   placekey              sg_c__street_address          sg_wp__street_address
  ------------------ ---------------- --------------------- ----------------------------- -------------------------------
  2019-07-15         2019-07-22       [email protected]   675 E Grand Blvd              675 E Grand Blvd Ste 105
  2019-04-22         2019-04-29       [email protected]   5151 Murphy Canyon Rd         5151 Murphy Canyon Rd Ste 200
  2018-03-05         2018-03-12       [email protected]   671 Via Alondra Ste 804       671 Via Alondra
  2020-05-11         2020-05-18       [email protected]   6850 Five Star Blvd           6850 Five Star Blvd Ste 1
  2020-08-17         2020-08-24       [email protected]   5151 Murphy Canyon Rd         5151 Murphy Canyon Rd Ste 200
  2019-01-28         2019-02-04       [email protected]   675 E Grand Blvd              675 E Grand Blvd Ste 105
  2020-07-27         2020-08-03       [email protected]   6850 Five Star Blvd           6850 Five Star Blvd Ste 1
  2021-01-18         2021-01-25       [email protected]   2000 Notre Dame Blvd          2000 Notre Dame Blvd Ste 100
  2019-11-11         2019-11-18       [email protected]   3998 Vista Way                3998 Vista Way Ste 108
  2020-02-17         2020-02-24       [email protected]   11101 W Olympic Blvd # 300    11101 W Olympic Blvd

Finally, some mismatches in the Brand IDs:

  date_range_start   date_range_end   placekey              sg_c__safegraph_brand_ids                   sg_wp__safegraph_brand_ids
  ------------------ ---------------- --------------------- ------------------------------------------- -------------------------------------------
  2020-06-22         2020-06-29       [email protected]   SG_BRAND_32c968bb1e341ec64fd75b776fdcc269   SG_BRAND_6cdab01ceb7ce1a3aa0ed865233aa6cb
  2019-09-30         2019-10-07       [email protected]   SG_BRAND_228837c9ee696b9e                   
  2020-01-06         2020-01-13       [email protected]   SG_BRAND_47e3fe60626ba5ec                   
  2020-12-07         2020-12-14       [email protected]   SG_BRAND_32c968bb1e341ec64fd75b776fdcc269   SG_BRAND_6cdab01ceb7ce1a3aa0ed865233aa6cb
  2019-11-18         2019-11-25       [email protected]   SG_BRAND_e410894e61868017                   
  2021-01-18         2021-01-25       [email protected]   SG_BRAND_47e3fe60626ba5ec                   
  2020-12-21         2020-12-28       [email protected]   SG_BRAND_d1b8667db4d35ac0                   
  2018-10-15         2018-10-22       [email protected]   SG_BRAND_32c968bb1e341ec64fd75b776fdcc269   SG_BRAND_6cdab01ceb7ce1a3aa0ed865233aa6cb
  2018-12-24         2018-12-31       [email protected]   SG_BRAND_47e3fe60626ba5ec                   
  2019-06-17         2019-06-24       [email protected]   SG_BRAND_228837c9ee696b9e     

My guess is that these Weekly Patterns data were once merged with an older version of Core Places, and then, upon my data request, merged again with the current version. Is that right?

To be fair, I don’t think any of these columns will impact my analysis. But I want to make sure I’m starting out in the right place.

Hi @Hayden_SafeGraph, I just wanted to see if you had any additional guidance about this?