How are others handling their monthly data ingest and updates?

Steve_Lee · October 18, 2021, 7:36pm

Hi SafeGraph Community! I am curious to learn how others are handling their monthly data ingest and updates. What processes do people use to track updates in the most recent releases and make the corresponding changes to their databases?

This topic was automatically generated from Slack. You can find the original thread here.

Niki_Kaz · October 18, 2021, 7:36pm

Ohhh interesting point! Might be good to pull some power users from the Community on this.

@Mohsen_Bahrami_MIT any thoughts on this? I know you’re frequently pulling data from the Catalog.

@J_Hathaway you might have some good input on this? Especially with the volumes of data you’re working with.

@Kate_Schertz_University_of_Chicago do you have any suggestions for Steve?

@Christian_Gunning_University_of_Georgia any feedback for Steve on this?

Niki_Kaz · October 18, 2021, 7:37pm

Looping in a few others that might have input:

@Yang_Wang_Temple_University I know you’re a heavy AWS CLI user. Thought of you when Steve posted this question.

@Aaron_Yelowitz_University_of_Kentucky are you still pulling data frequently for the Texas work we’ve previously discussed? You might have some thoughts that Steve would find valuable.

@Charisse_Madlock_Brown_University_of_Tennessee_Health_Scienc I thought you might be a good person to comment on this too. Any thoughts are appreciated!

Mohsen_Bahrami_MIT · October 18, 2021, 7:37pm

@Steve_Lee we don’t use any automated method for this, but manually check for updates monthly through the Catalog.

Kate_Schertz_University_of_Chicago · October 18, 2021, 7:37pm

I also don’t have an automated process. My analyses are all based on historical data so I pulled the July backfill using Cloudberry to access the AWS bucket.

J_Hathaway · October 18, 2021, 7:37pm

I am just starting down this part of the journey. I will most likely be asking @Steve_Lee for his thoughts in January…

Yang_Wang_Temple_University · October 18, 2021, 7:37pm

I’m using a shell script that runs aws s3 sync once a week for all the relevant folders, and then I process the data with some Python code tailored to my usage.
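
For readers who want a starting point, here is a minimal sketch of the weekly sync step described above, written in Python rather than shell. It is not Yang's actual script. The bucket name, folder names, and local directory are hypothetical placeholders, and the sketch assumes the AWS CLI is installed and already configured with credentials.

```python
import subprocess
from pathlib import Path

# Hypothetical placeholders: replace with the bucket and folders you actually use.
BUCKET = "s3://example-bucket"
FOLDERS = ["patterns", "core_poi"]
LOCAL_ROOT = Path("data")


def build_sync_command(bucket, folder, local_root):
    """Return the `aws s3 sync` argument list for one folder."""
    source = f"{bucket.rstrip('/')}/{folder}/"
    target = str(Path(local_root) / folder)
    return ["aws", "s3", "sync", source, target]


def sync_folders(bucket, folders, local_root, runner=subprocess.run):
    """Sync each folder in turn and return the commands that were run.

    `runner` is injectable so the function can be dry-run without the AWS CLI.
    """
    commands = []
    for folder in folders:
        cmd = build_sync_command(bucket, folder, local_root)
        # With the default runner, check=True raises CalledProcessError
        # if the AWS CLI exits with a non-zero status.
        runner(cmd, check=True)
        commands.append(cmd)
    return commands
```

Calling `sync_folders(BUCKET, FOLDERS, LOCAL_ROOT)` runs one sync per folder. Because `aws s3 sync` only copies files that are new or changed, rerunning it on a schedule (for example, a weekly cron job) picks up new releases without downloading everything again. Any downstream processing would then run against the local copy.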