Anybody know of an easy way that I could aggregate the census_block_group values in a dataframe of social distancing data (12 digits) to the tract (11 digits) with pandas and python?

does anybody know of an easy way that I could aggregate the census_block_group values in a dataframe of social distancing data (12 digits) to the tract (11 digits) with pandas and python. Basically merge rows that have the same first 11 digits of their census_block

@Rohan_Bansal

just create a new column reading the first 11 chars from the CBG file like this

df['tract'] = df['census_block_group'].str.slice(start=0, stop=11)

then pd.merge() using that column

Yeah, but will the merge add values for other columns.

for example, device counts should be added to represent the entire tract

oh, sorry.

before you join do

tract_df = df.groupby(['tract']).sum().reset_index()

just groupby the tract_id

ok, and for dictionary or list values in that dataframe, they will be appended not added element wise?

@Rohan_Bansal i guess not. If you want to re-aggregate the JSON or array columns from CBG to tract, then you will need to explode them out first at the CBG level.

Ok, thanks for the information and help.

@Ryan_Fox_Squire_SafeGraph any idea why I’m getting the JSON object must be str, bytes or bytearray, not float

When I do:

parsed_bucket_distance = nyc.bucketed_distance_traveled.apply(lambda x: json.loads(x))

@Rohan_Bansal are you still struggling with this, or were you able to resolve?

If you are still struggling, can you confirm what type(nyc['bucketed_distance_traveled']) gives you? It’s weird that it thinks this column is a float instead of a string

I think it just null values. had to remove rows and the function worked properly