Recently, “data” has become an ugly word. Cue up news on Cambridge Analytica or GDPR. But like many technical advancements, data can be used both to benefit society and, by bad actors, to attack it.
Data about people, when used correctly, can significantly increase our understanding of society and has the potential to improve the lives of everyone in the world. Better understanding of data can lead to more effective cancer treatments, less traffic in cities, greater understanding of income inequality, a deeper appreciation of why people vote, and much, much more.
Professor Raj Chetty (of Stanford University) has been publishing some of the most interesting economics papers of this decade, thanks to his unique access to IRS data. Chetty has access to almost 30 years of tax records. Through this data, Chetty and his colleagues have been able to study income mobility in the United States in a way that was never possible before.
Of course, the IRS data is extremely privacy-sensitive (it includes the tax returns of everyone in America) and must be handled with care. Chetty has to take many precautions when using the data, including accessing it only in IRS clean rooms. Given how sensitive the data is, these precautions are essential.
The collaboration between Chetty and the IRS has had a significant impact on our understanding of society. For example, Chetty found that the share of U.S. children earning more than their parents dropped from around 90% for children born in the 1940s to just over 50% for those born in the 1980s.
The journal Science published an incredibly interesting paper by Professor Keith Chen (of UCLA). Chen is interested in how increasing political polarization in America affects quality of life. He used geospatial mobility data from anonymized devices (sourced from SafeGraph), tied it to public precinct-level voting data, and analyzed the travel patterns of anonymous people over the Thanksgiving immediately following the 2016 election.
He found that families whose members came from both heavily Democratic and heavily Republican precincts had significantly shorter Thanksgivings than families that gathered from politically aligned precincts. The effect was stronger where political ad spending was heavier, suggesting a link between political polarization and weaker family ties across party lines. This has fascinating implications for how national politics shapes the private lives of citizens.
Mobility data, even when appropriately anonymized, can be extremely sensitive. Like the IRS data, it can be abused, so it needs to be controlled, respected, and kept secure. Also like the IRS data, mobility data can deliver benefits to society large enough to justify the hard work of enabling its use while guarding against abuse.
While IRS and mobility data are sensitive, health care data is even more personal. And while this data has many legal protections (such as HIPAA), many institutions hold deep medical data tied to identifiable individuals. If used properly, health care data can save lives and massively enhance quality of life. But the data could also be used to take advantage of patients … so protections and controls are paramount.
IBM Watson has made strides in analyzing oncology data for good. IBM partnered with Memorial Sloan Kettering, one of the top medical institutions in the world, to analyze cancer treatments and outcomes. The hope is that the data will help doctors tailor treatments to individual cases and achieve the best possible outcomes.
Flatiron Health (recently acquired by Roche for $1.9 billion) has also made significant strides in improving cancer treatment. In 2014 the company raised $130 million and used the bulk of it to acquire an oncology CRM software company that also develops electronic medical record systems. Flatiron uses the proprietary data from those systems to improve the lives of potentially millions of people.
Elsewhere in health care, Datavant is working to use health data to help bring life-saving drugs to market faster. (Conflict note: I am an investor in Datavant.)
Some of the most interesting data is individual credit card data. About a half-dozen companies sell credit card data (many at the individual level). This data, of course, is very private, and we would never want an individual’s data to become public.
Credit card data offers significant insight into society. We can learn a great deal even from the mundane, like Second Measure’s finding that more people buy flowers for Mother’s Day than for Valentine’s Day. (Conflict note: I am an investor in Second Measure.) More interestingly, spending data can help predict flu and other epidemic patterns in specific areas. This could allow for better deployment of public health resources and save many lives.
We must re-envision credit card data as a powerful tool to improve society (and not just to optimize the betting algorithms of quant funds on Wall Street).
In 2017, Professor Susan Athey (of Stanford University) wrote a seminal piece in Science on how societies can better use data to help people. Greater access to data has amazing potential to make cities more efficient, improve health care, and help policymakers grow the economy. Large, deep data sets about people could unlock some of society’s greatest secrets (like how to build a successful marriage, raise children more effectively, and increase happiness and altruism).
The opportunities for benefit are massive, but to reap them we also need to build protections. We can develop smart policies that both protect people’s privacy and enable innovation. Blunt instruments like GDPR will almost certainly have a chilling effect on innovation and make the data monopolies even more powerful. We need open information to promote innovation, along with strong protections for user privacy.
The benefits of accessing and aggregating deep data about people are too large to ignore, so the great challenge is to figure out how to enable Data for Good while also protecting individual privacy and defending against bad actors.
The goal when dealing with data is not only to protect privacy. That is a worthy goal that can be achieved with the right security and legislation. The real goal is a much harder one: to enable society to take advantage of all this data while simultaneously protecting individual privacy. This goal will take some of the world’s most innovative minds to achieve. But it might be the most important goal to work on in our lifetime … so I have hope that we will make a lot of progress over the next decade.