I made a notebook to demonstrate/test the new Python client for the Placekey API

Hi all, I made a notebook to demonstrate/test the new Python client for the Placekey API. The notebook uses a dataset consisting of ~24,000 lat/lon coordinates of cellular towers. First, the requests are sent in bulk to the Python client using PlacekeyAPI.lookup_placekeys(). The rate limit is hit, an error message is given, and the completed queries are returned. The error message is helpful, but because of the rate limit not all of the entries are assigned a placekey. Second, I wrote a simple wrapper function getPlacekeys() to send the requests in batches of 50 with a 1 second sleep between batches. All entries are assigned a placekey, but the approach is quite a bit slower.
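For reference, a minimal sketch of what a batching wrapper like that could look like. The name getPlacekeys() comes from the post above, but the body, parameter names, and the pk_api client object are my assumptions, not the actual notebook code:

```python
import time

def getPlacekeys(pk_api, places, batch_size=50, pause=1.0):
    """Send lookup requests in batches, sleeping between batches
    to stay under the API rate limit.

    pk_api is assumed to be an object exposing lookup_placekeys(batch),
    e.g. an instance of placekey.api.PlacekeyAPI.
    """
    results = []
    for i in range(0, len(places), batch_size):
        batch = places[i:i + batch_size]
        results.extend(pk_api.lookup_placekeys(batch))
        # No need to sleep after the final batch
        if i + batch_size < len(places):
            time.sleep(pause)
    return results
```

This trades throughput for completeness: every query eventually gets a response, at the cost of the sleep overhead between batches.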

I’m sure the wrapper function could be further optimized, but this demonstrates the tradeoff. Perhaps in the future the Python client could automatically slow down to avoid rate limit issues? Either way, the Python client makes things much more straightforward!

@Ryan_Fox_Squire_SafeGraph

@Ryan_Kruse_MN_State Thanks for laying this out clearly. Super helpful.

The client is supposed to rate-limit itself, and it will eventually stop retrying. This is desirable in case the API goes down. But it shouldn’t be encountered in general usage, so maybe the parameters need to be adjusted or there is some other bug.

So what you are seeing is not expected. Our engineering team is looking into it and will provide updates. Your notebook will be a good test to see if we fix it.

cc: @Russ_Thompson_SafeGraph

@Ryan_Kruse_MN_State We’ve published an updated version of placekey-py that fixes the rate limiting issue. Let us know if you give it a spin or run into any other issues.

@Russ_Thompson_SafeGraph Awesome. I just tried again, and I’m getting some strange behavior… The rate limit issue is handled and all placekeys are returned, which is great. However, it seems like the query_id values returned are not the same as the originals that were sent with the request. For example, after query 100, the next query ID is 201 (when it should have been 101). The largest query ID I requested with was 23400, but the largest returned by the client/API was 46898.

This makes it more difficult to merge the placekeys back with the original data.
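To illustrate why the query_id matters here, this is roughly how one would join the responses back onto the original rows. This is a hypothetical helper (merge_placekeys is not part of the client); it just assumes both the original queries and the responses carry a 'query_id' field:

```python
def merge_placekeys(original_rows, responses):
    """Join API responses back onto the original query rows by query_id.

    Assumes each original row and each response dict has a 'query_id' key;
    rows with no matching response get placekey=None.
    """
    by_id = {r["query_id"]: r.get("placekey") for r in responses}
    return [dict(row, placekey=by_id.get(row["query_id"])) for row in original_rows]
```

If the client rewrites the query_id values (e.g. returning 201 where 101 was sent), a join like this silently fails to match, which is exactly the problem described above.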

Do the queries you’re sending have query_id defined as part of the address data?

@Russ_Thompson_SafeGraph Yes, here is an example of the format of each query:

```python
{'latitude': 39.9372222222,
 'longitude': -88.3119444444,
 'query_id': '101'}
```

That’s a fun bug. The code interpolates query_id when it’s not provided in the query, but the check for that is currently whether or not the response query_id is numeric. Thanks for catching it.
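A tiny sketch of why that heuristic misfires (the function name is hypothetical; this is just the check as described, not the client’s actual code). Any user who supplies numeric-looking string IDs gets misclassified as not having provided one:

```python
def looks_autogenerated(query_id):
    # Buggy heuristic: treat any numeric query_id as client-generated.
    # A user-supplied ID like '101' is indistinguishable from one the
    # client interpolated, so it gets (wrongly) replaced.
    return query_id.isnumeric()
```

A non-numeric ID like 'tower-101' would pass through untouched, which is why the bug only shows up with numeric string IDs.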

No problem! Let me know if you’d like me to run the set of queries again.

@Russ_Thompson_SafeGraph will you let us know when the bug is fixed and we can test again?

Sure. The PR is out for review, so the update should be available soon.

@Ryan_Kruse_MN_State we think that this should be fixed in v0.0.8, which is now published. When you have a chance, could you try re-running this notebook and confirming that things work correctly? Then we can finally wrap up this project and update the public-facing notebook with the new functionality.

Again the goal is to show that using the placekey python package to hit the placekey API performs as well or better than the custom function we originally wrote.

@Ryan_Fox_Squire_SafeGraph I ran it again and it successfully handles the rate limit while returning 100% of the placekeys. So the bug is fixed :+1:

@Ryan_Kruse_MN_State great! So is this ready for me to push as an update to the official tutorial notebook on GitHub?

@Ryan_Fox_Squire_SafeGraph no, I just need to remove a couple code blocks then it’ll be ready. I’ll let you know when, won’t be long

@Ryan_Fox_Squire_SafeGraph What dataset would you like in the demo? The notebook currently uses the cell towers, which only have latitude and longitude, so it only returns the “where” portion of the placekey, which is probably not ideal for a demo.

@Ryan_Kruse_MN_State Ideally we can use the same as the original:

https://github.com/Placekey/placekey-notebooks/blob/main/notebooks/Adding_Placekey_to_your_POI_dataset_using_python_and_the_Placekey_API.ipynb

I was hoping all I’ll need to do is update the getPlacekeys() function, or maybe change some other things in Output 55, but keep everything else the same and just re-run it.

does that make sense?

I haven’t looked in detail at your notebook, but I’m hoping I can make some surgical changes to the current notebook and keep everything else the same.

@Ryan_Fox_Squire_SafeGraph Here is the updated notebook: https://colab.research.google.com/drive/1lQTlxLWJKP4ZlG1SEtpcVlA33ePGO0PR?usp=sharing

I think this should be ready to go in terms of replacing the other notebook. Feel free to run through it and let me know of any changes you’d like me to make

@Ryan_Kruse_MN_State OK thanks, will look at this today and let you know if I have any questions.

@Russ_Thompson_SafeGraph I noticed that the new placekey API function lookup_placekeys has an optional parameter of batch_size. https://placekey.github.io/placekey-py/placekey.html#placekey.api.PlacekeyAPI.lookup_placekeys

Is this a parameter that a user would/should use? I’m wondering if there is any additional advice I should call out about this parameter in the notebook.

Since the client auto-handles rate limiting etc., is there a practical impact to a user tweaking this parameter? Is there a use case where we would recommend a user change batch_size?

cc: @Ryan_Kruse_MN_State

Generally, I wouldn’t recommend users tweak that setting.