Anybody know of an easy way that I could aggregate the census_block_group values in a dataframe of social distancing data (12 digits) to the tract (11 digits) with pandas and python?

Rohan_Bansal · April 22, 2020, 12:00am

Does anybody know of an easy way to aggregate the census_block_group values (12 digits) in a dataframe of social distancing data up to the tract level (11 digits) with pandas and Python? Basically, I want to merge rows whose census_block_group values share the same first 11 digits.

Ryan_Fox_Squire_SafeGraph · April 23, 2020, 12:18am

@Rohan_Bansal

Just create a new column holding the first 11 characters of the census_block_group value, like this:

df['tract'] = df['census_block_group'].str.slice(start=0, stop=11)

then pd.merge() using that column
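A minimal sketch of the tract column step, assuming the CBG ids may have been read in as integers and lost their leading zeros. The sample values and the `device_count` column are made up for illustration; reading the file with `dtype={"census_block_group": str}` avoids the problem in the first place.

```python
import pandas as pd

# Hypothetical sample: ids parsed as integers, so the first two
# have lost their leading zero (11 digits instead of 12).
df = pd.DataFrame({
    "census_block_group": [10010201001, 10010201002, 360610001001],
    "device_count": [10, 20, 5],
})

# Convert to string and restore the 12-digit width before slicing.
df["census_block_group"] = df["census_block_group"].astype(str).str.zfill(12)

# The tract id is the first 11 characters of the 12-character CBG id.
df["tract"] = df["census_block_group"].str.slice(start=0, stop=11)
```

Without the `zfill` step, slicing an 11-character id would return the whole id and the tract values would be wrong for states whose FIPS code starts with 0.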

Ryan_Fox_Squire_SafeGraph · April 23, 2020, 12:19am

Rohan_Bansal · April 23, 2020, 12:19am

Yeah, but will the merge add up the values in the other columns?

Rohan_Bansal · April 23, 2020, 12:20am

For example, device counts should be summed so that they represent the entire tract.

Ryan_Fox_Squire_SafeGraph · April 23, 2020, 12:20am

Oh, sorry.

Before you join, do:

tract_df = df.groupby(['tract']).sum().reset_index()

Ryan_Fox_Squire_SafeGraph · April 23, 2020, 12:20am

Just group by the tract column.
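A small sketch of the groupby step, assuming only plain count columns should be summed. Depending on the pandas version, calling `.sum()` on every column may drop or concatenate non-numeric columns, so this version selects the columns explicitly. The column names and values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical CBG-level rows that already carry a tract column.
df = pd.DataFrame({
    "tract": ["01001020100", "01001020100", "36061000100"],
    "census_block_group": ["010010201001", "010010201002", "360610001001"],
    "device_count": [10, 20, 5],
    "completely_home_device_count": [4, 6, 1],
})

# Only additive columns belong here; medians or percentages
# cannot be combined by summing.
count_cols = ["device_count", "completely_home_device_count"]

# One row per tract, with the selected counts summed.
tract_df = df.groupby("tract", as_index=False)[count_cols].sum()
```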

Rohan_Bansal · April 23, 2020, 12:21am

OK, and for the dictionary or list values in that dataframe, will they be appended rather than added element-wise?

Ryan_Fox_Squire_SafeGraph · April 23, 2020, 12:22am

@Rohan_Bansal I guess not. If you want to re-aggregate the JSON or array columns from CBG to tract, then you will need to explode them out first at the CBG level.
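One way to do the explode-then-aggregate step is sketched below, assuming the column holds JSON strings that map bucket names to counts. The bucket names and numbers are made up, and this is only one possible approach.

```python
import json
import pandas as pd

# Hypothetical CBG-level rows in the same tract.
df = pd.DataFrame({
    "tract": ["01001020100", "01001020100"],
    "bucketed_distance_traveled": [
        '{"0": 3, "1-1000": 2}',
        '{"0": 1, ">50000": 4}',
    ],
})

# Parse each JSON string into a dict.
parsed = df["bucketed_distance_traveled"].apply(json.loads)

# Explode: one column per bucket, one row per CBG.
# Buckets missing from a CBG are treated as 0.
buckets = pd.DataFrame(parsed.tolist(), index=df.index).fillna(0).astype(int)

# Sum each bucket element-wise within a tract.
tract_buckets = buckets.groupby(df["tract"]).sum()

# Optionally pack each tract row back into a JSON string.
tract_json = tract_buckets.apply(
    lambda row: json.dumps({k: int(v) for k, v in row.items()}), axis=1
)
```

Array-valued columns could be handled the same way by expanding each list into positional columns before grouping.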

Ryan_Fox_Squire_SafeGraph · April 23, 2020, 12:23am

Rohan_Bansal · April 23, 2020, 12:24am

Ok, thanks for the information and help.

Rohan_Bansal · April 23, 2020, 2:10am

@Ryan_Fox_Squire_SafeGraph any idea why I’m getting "the JSON object must be str, bytes or bytearray, not float"?

Rohan_Bansal · April 23, 2020, 2:11am

When I do:

Rohan_Bansal · April 23, 2020, 2:11am

parsed_bucket_distance = nyc.bucketed_distance_traveled.apply(lambda x: json.loads(x))

Ryan_Fox_Squire_SafeGraph · April 28, 2020, 9:14pm

@Rohan_Bansal are you still struggling with this, or were you able to resolve it?

If you are still struggling, can you confirm what nyc['bucketed_distance_traveled'].dtype gives you? It’s weird that it thinks this column is a float instead of a string.

Rohan_Bansal · April 28, 2020, 9:30pm

I think it was just null values. I had to remove those rows, and then the function worked properly.