Key Takeaways
- Responsible access to large datasets can unlock breakthroughs in economics, healthcare, and public policy.
- Longitudinal tax data has reshaped our understanding of income mobility in the United States.
- Healthcare data, when securely analyzed, can improve cancer treatment and accelerate life-saving drug development.
- Credit card and spending data reveal real-time patterns that support public health and economic planning.
- The central challenge is balancing innovation with privacy safeguards, not choosing one over the other.
Recently, “data” has been an ugly word. Cue up the news on Cambridge Analytica or GDPR. But like many technical advancements, data can be used both to benefit society and, by bad actors, to attack it.
Data is currently being used to advance our understanding of society
Data about people, when used correctly, can significantly increase our understanding of society and has the potential to better the lives of everyone in the world. A better understanding of data can lead to more effective cancer treatments, less traffic in cities, a greater understanding of income inequality, a deeper appreciation of why people vote, and much more.
Chetty’s IRS partnership leads to insights in income mobility
Professor Raj Chetty (of Stanford University) has published some of the most interesting economics papers of this decade because of his unique access to IRS data. Chetty has access to almost 30 years of tax records. With this data, Chetty and his colleagues have been able to study income mobility in the United States in a way that was never possible before.
Of course, the IRS data is extremely privacy-sensitive (it includes the tax returns of everyone in America) and must be handled with care. Chetty has to take many precautions when using the data, including accessing it only in IRS clean rooms. Given the sensitivity of the data, these precautions are essential.
The collaboration between Chetty and the IRS has had a significant impact on our understanding of society. For example, Chetty found that the share of US children earning more than their parents dropped from around 90% for children born in the 1940s to just over 50% for children born in the 1980s.
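To make that statistic concrete, here is a minimal sketch of how a "share of children earning more than their parents" figure can be computed from linked parent and child incomes. The records, the `ParentChildPair` type, and the `share_earning_more` function are all made up for illustration. This is not IRS data and not the researchers' actual method, which involves far more careful measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentChildPair:
    """Hypothetical linked record; incomes assumed to be inflation-adjusted."""
    parent_income: float
    child_income: float


def share_earning_more(pairs):
    """Return the fraction of pairs where the child out-earns the parent."""
    if not pairs:
        raise ValueError("need at least one parent-child pair")
    better_off = sum(1 for p in pairs if p.child_income > p.parent_income)
    return better_off / len(pairs)


# Invented example records, purely for illustration.
pairs = [
    ParentChildPair(parent_income=40_000, child_income=52_000),
    ParentChildPair(parent_income=55_000, child_income=50_000),
    ParentChildPair(parent_income=30_000, child_income=45_000),
    ParentChildPair(parent_income=70_000, child_income=70_000),
]

share = share_earning_more(pairs)
print(f"{share:.0%} of children earn more than their parents")
```

In this toy sample, two of the four children out-earn their parents, so the share is 50%. The hard part in practice is linking parents to children across decades of records, which is why long-running tax data matters so much.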

Saving lives in cancer treatments using data
Like IRS data, mobility data has the potential to be extremely sensitive. Analyzing it can bring benefits to society large enough to justify the hard work of aggregating it and guarding against possible abuse.
While IRS and mobility data are sensitive, healthcare data is even more personal. And while this data has many protections (like HIPAA), many institutions hold deep medical data tied to the identities of specific people. If used properly, healthcare data can save lives and massively enhance quality of life. But the data can also be used to take advantage of patients, so protection and controls are paramount.
IBM Watson has made a lot of strides in analyzing oncology data for good. IBM partnered with Sloan Kettering, one of the top medical institutions in the world, to analyze cancer treatments and outcomes. The hope is that the data will help doctors optimize treatments for individual cases to lead to the very best outcomes.
Flatiron Health (recently acquired by Roche for $1.9 billion) has also made significant strides in improving cancer treatments. In 2014, the company raised $130 million and used the bulk of it to acquire an oncology CRM software company that also develops electronic medical records systems. It is using this proprietary data to positively impact the lives of potentially millions of people.
In other areas of the healthcare field, Datavant is working to use health data to help life-saving drugs come to market faster (conflict note: I am an investor in Datavant).
Credit card data can help us understand society
Some of the most interesting data is individual credit card data. There are about a half-dozen companies that sell credit card data (many at the individual level). This data, of course, is very private and we would never want an individual’s data to become public.
Credit card data offers significant insight into society. We can learn a great deal even from the mundane, like Second Measure’s finding that more people buy flowers on Mother’s Day than on Valentine’s Day (conflict note: I am an investor in Second Measure). More interestingly, we can predict flu and epidemic patterns in specific areas based on spending data. This could allow for better deployment of public health resources and save many lives.
We must re-envision credit card data as a powerful tool to improve society (and not just to optimize the betting algorithms of quant funds on Wall Street).
Using big data for policy problems
In 2017, Professor Susan Athey (of Stanford University) wrote a seminal piece in Science on how societies can make better use of data to help people. More access to data has amazing potential to transform the efficiency of cities, provide better health care, and help policy makers grow the economy. Large, deep data sets about people have the potential to unlock some of society’s greatest secrets (like how to have a successful marriage, raise children more effectively, increase happiness, and increase altruism).
The opportunities for benefits are massive, but to reap them, we also need to build protections. We can develop smart policies that both protect people’s privacy and enable innovation: open information to promote innovation, paired with protections for user privacy.
The benefits of accessing and aggregating deep data about places and people are too large to ignore, so the great challenge is to figure out how to enable the use of data for good while also protecting individual privacy and defending against bad actors.
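One simple example of such a protection is to publish only aggregates and to suppress any group too small to hide an individual. The sketch below is an illustration under stated assumptions: the `aggregate_with_suppression` function, the threshold of 3, and the records are all hypothetical, and suppression alone is not a complete privacy guarantee.

```python
from collections import defaultdict

# Hypothetical threshold: groups with fewer people than this are not reported.
MIN_GROUP_SIZE = 3


def aggregate_with_suppression(records, min_group_size=MIN_GROUP_SIZE):
    """Average values per group, dropping groups below the size threshold.

    `records` is an iterable of (group, value) pairs, for example
    (zip_code, weekly_spend). Only group-level averages are returned,
    never individual values.
    """
    values_by_group = defaultdict(list)
    for group, value in records:
        values_by_group[group].append(value)

    return {
        group: sum(values) / len(values)
        for group, values in values_by_group.items()
        if len(values) >= min_group_size
    }


# Invented records, purely for illustration.
records = [
    ("94107", 20.0),
    ("94107", 30.0),
    ("94107", 40.0),
    ("10001", 55.0),  # a group of one: reporting it would expose an individual
]

summary = aggregate_with_suppression(records)
print(summary)
```

Here the "94107" group has three records and is reported as an average, while the single "10001" record is withheld. Real systems layer techniques like this with access controls and secure environments, as in the IRS clean rooms described earlier.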
The ultimate goal in dealing with data is to enable society to take advantage of all this data while simultaneously protecting individual privacy. Achieving this goal will take some of the world’s most innovative minds. But it might be the most important goal to work on in our lifetime, so I have hope that we will make a lot of progress over the next decade.
FAQs
1. How can data improve society?
When used responsibly, data enables better public policy, improved healthcare outcomes, smarter urban planning, and deeper economic research.
2. How did IRS data help research income mobility?
Long-term tax records allowed researchers to measure intergenerational income trends with far greater accuracy than surveys.
3. Can healthcare data really improve cancer treatment?
Yes. Large clinical datasets help identify treatment patterns, optimize care plans, and speed up drug development.
4. What insights can credit card data provide?
Spending data can reveal consumer trends, detect early signs of epidemics, and inform economic and policy decisions.
5. How can society balance data innovation with privacy?
Through strict safeguards, anonymization, secure access environments, and clear regulatory frameworks that protect individuals while enabling research.