does anybody know of an easy way that I could aggregate the census_block_group values in a dataframe of social distancing data (12 digits) to the tract (11 digits) with pandas and python. Basically merge rows that have the same first 11 digits of their census_block
just create a new column reading the first 11 chars from the CBG file like this
df['tract'] = df['census_block_group'].str.slice(start=0, stop=11)
then pd.merge()
using that column
Yeah, but will the merge add values for other columns.
for example, device counts should be added to represent the entire tract
oh, sorry.
before you join do
tract_df = df.groupby(['tract']).sum().reset_index()
just groupby the tract_id
ok, and for dictionary or list values in that dataframe, they will be appended not added element wise?
@Rohan_Bansal i guess not. If you want to re-aggregate the JSON or array columns from CBG to tract, then you will need to explode them out first at the CBG level.
Ok, thanks for the information and help.
@Ryan_Fox_Squire_SafeGraph any idea why I’m getting the JSON object must be str, bytes or bytearray, not float
When I do:
parsed_bucket_distance = nyc.bucketed_distance_traveled.apply(lambda x: json.loads(x))
@Rohan_Bansal are you still struggling with this, or were you able to resolve?
If you are still struggling, can you confirm what type(nyc['bucketed_distance_traveled'])
gives you? It’s weird that it thinks this column is a float instead of a string
I think it just null values. had to remove rows and the function worked properly