Susan Athey: Tech Economists, Machine Learning, and Causation

July 29, 2021

Episode transcript

[Auren Hoffman] Welcome to World of DaaS, a show for data enthusiasts. I'm your host, Auren Hoffman, CEO of SafeGraph. For more conversations, videos, and transcripts, visit safegraph.com/podcasts.

Hello, fellow data nerds. My guest today is Susan Athey. Susan is the economics of technology professor at Stanford University's Graduate School of Business and a research associate at the National Bureau of Economic Research. Susan was formerly the chief economist at Microsoft. She's also on the board of directors of Lending Club, Expedia, Ripple, Rover, Turo, Innovations for Poverty Action, Proof School, and Research Improving People’s Lives. Whoa, that's a lot of boards. Susan, welcome to World of DaaS.

[Susan Athey] Thanks for having me here. Really excited to be here.

[Auren Hoffman] Okay, now I have a bunch of things I want to jump into with you because you're an expert on a lot of things dealing with data and how to make data kind of sing, but also how to do it in the right way. And one of the things that I think is worth diving into is that a lot of people are doing a lot of A/B tests. And sometimes we end up with this kind of local maximum, rather than actually finding the global maximum through A/B tests. Are there some simple heuristics that we can use to guard against that? Have you thought about that?

[Susan Athey] Yeah. So figuring out the right KPIs to optimize for is one of the most powerful things to do, I think. So when I first got to Microsoft, and I saw the A/B testing platforms in the late 2000s, it was still like “Wow, look at this thing. This is awesome”. But then I immediately realized that we were kind of optimizing for the wrong things. And so I asked the question “why are they putting up too many ads, or too many irrelevant ads?”. I went around and talked to people and everybody understood the costs: if an advertiser gets an ad click, and nobody buys anything, they're not going to pay very much for that click. So getting an irrelevant click just devalues the click and ultimately lowers the equilibrium prices. But if everybody believes this, why wasn't the system changing? And then I just started understanding that everybody was optimizing for the A/B testing platform. If you don't get through the A/B testing platform, your algorithm isn't shipped, and no engineer gets rewarded unless their algorithm is shipped.

[Auren Hoffman] It was optimizing for the click rather than from the backend feedback from the merchant, whether that click was actually valuable or not.

[Susan Athey] Exactly. So of course if you're doing an A/B test on something like an advertising platform, the revenue that you get in the treatment group, it's real revenue, right? If you change the relevance of the ads, you can count up the revenue that you get. But if you throw away bad ads, and just put up fewer ads, that's gonna be bad for revenue in a one day test. Because the advertisers don't even have time to see that you've got a better conversion rate. So there's no way the bids can respond. So getting rid of bad ads is always bad in the short run, basically. So there was no way that the A/B testing platform, as it was set up, if it just looked at short term revenue, could ever reward getting rid of bad ads. So that kind of insight then was like this “aha moment” that says, “gosh, it's not just that this is a little bit of a mistake, like the entire system is constructed to not get the right answer”. And then I started asking around, and finding out that like everybody, my friends at other firms too, they all have the same problem, that the A/B testing platforms were pushing them to be very short term focused. And the mechanism like in all the search engines kind of had a similar problem. In other marketplaces, the feedback effects are different. But this idea that in order to have data driven innovation, you want to do lots of A/B tests, they need to be short term. If they're short term, then you're gonna just by definition leave out feedback effects, long term quality effects, anything that's not captured in these short term experiments.

[Auren Hoffman] So obviously, the hard part is actually figuring out what to optimize. Microsoft was able to hire you, you won the John Bates Clark medal. Most of us can’t hire someone who won the John Bates Clark medal. How do we all figure that out in the right way? Do we just have to just constantly struggle or is there some sort of hack to get to the answer faster?

[Susan Athey] There are a number of hacks, but I guess the first one is just realizing it's not that the Microsoft engineers were dumb when I got there. It actually wasn't that I was some rocket scientist who figured something out. It's actually just intrinsic, you have a tradeoff. And so I think one thing economists are good at is just saying “there's a tradeoff, guys”. You want to optimize for the long run objective, but those things are noisy, innovation slows down and you may not get a signal. But if you optimize for a short term objective, then you're gonna go in the wrong direction. So there's a tradeoff.

[Auren Hoffman] Do people in their head know they were optimizing for short term objectives? Didn't they know how to optimize for the longer term, or they just didn't ask those types of questions?

[Susan Athey] Framing the question, I think sometimes engineers can think, “well, there's a lot of approximations”. So you know, it's just kind of noise. And of course, it's not perfect. So I think one thing is just saying, “actually, no, it's not just that there's noise. It's not just that our metrics are imperfect, but actually, there's a whole category of interventions that we will never pass”. It's not just “we'll make mistakes”. It's like we're systematically going to miss those. This isn't rocket science necessarily, it's kind of common sense of, “alright, now, what do we do about that?” What if there is a whole category of interventions that will never pass our A/B test? So I think one of my first ideas is to say, “Alright, let's recognize that there's some kinds of interventions where the short term and the long term aren't misaligned and so our current system is kind of fine”. So you should just charge ahead with that, no problem. But then in the case of search engines, say, things like ad relevance are not going to be well reflected in the short term experiments. So we need something different for that. And there's different ways you can do something different. So one thing is that you can just put constraints. So you can say, “All right, we're going to still use the short term ship criteria, but if we see some signal, that we're putting up too many bad ads, because we know that's not captured in the short term metric, if you are hurting ad relevance, then you know, we may not ship”. So it's just a constraint, we can't directly measure the long term, but we know that degrading conversion rate is bad and has a long term effect that isn't captured.
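
[Illustrative sketch, not part of the recording] The constraint idea above, keeping the short term ship criterion but refusing to ship when a known blind spot gets worse, could be sketched roughly as follows in Python. The function name, the metric names, and the tolerance are hypothetical choices made for illustration, not anything described as Microsoft's actual ship rule.

```python
def should_ship(revenue_lift, conversion_rate_change, max_conversion_drop=0.0):
    """Return True only if the short-term metric improves AND the guardrail holds.

    revenue_lift: relative change in short-term revenue, treatment vs. control.
    conversion_rate_change: relative change in advertiser conversion rate,
        used here as a stand-in signal for ad relevance.
    max_conversion_drop: hypothetical tolerance for degrading conversions.
    """
    passes_short_term = revenue_lift > 0
    passes_guardrail = conversion_rate_change >= -max_conversion_drop
    return passes_short_term and passes_guardrail


# Revenue is up 2%, but conversion rate fell 5%: blocked by the guardrail.
print(should_ship(revenue_lift=0.02, conversion_rate_change=-0.05))

# Revenue is up 1% and conversions are unchanged: allowed to ship.
print(should_ship(revenue_lift=0.01, conversion_rate_change=0.0))
```

The guardrail does not measure the long term effect; it only blocks changes that move a known leading indicator the wrong way.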

[Auren Hoffman] Sometimes they're just kind of like self learning over time and they can move pretty quickly, right?

[Susan Athey] Well, so that's one reason actually that firms don't want to have self learning systems. I'll come back to these other tactics, but just to jump ahead, when you go to think “I'm going to put in a bandit, or contextual bandit, or a reinforcement learning algorithm”, at that point, you have to commit to the metrics, and the system will optimize without you stopping it. And if you have bad metrics, that's a bad idea.

[Auren Hoffman] So that can work maybe really well in, like, labeling data or something, like if I'm trying to figure out whether this is a cat or this is a dog? Where does self learning work and where should we be more wary of it?

[Susan Athey] Well, it works well when your metrics capture everything you care about. And so actually, both for the A/B testing and for the reinforcement learning, one way to avoid bad outcomes is to only have a choice set that you know is acceptable. So for example, suppose you were doing marketing emails or ad copy. Well, you don't want the AI to make up the ad copy, because they could start writing stuff that was really bad for your reputation, or racist, or God knows what an AI would make up. So humans restrict what the acceptable alternatives are. But then if you're sure, as a human, that all of the alternatives are acceptable, and you really just want to maximize clicks, then reinforcement learning can fire away, as long as you're sure it's not going to go in the wrong direction, and you really care about clicks. So it's basically restricting the set of alternatives. And one way to do that is just to say, again, our short term A/B testing platform, our reinforcement learning only solves one kind of problem. And for other problems, where I know my metrics are flawed, then I'm going to use other tactics.
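
[Illustrative sketch, not part of the recording] One way to picture restricting the choice set is a bandit that can only ever pick among options a human has already approved. The sketch below uses a simple epsilon-greedy rule; the headline names, click probabilities, and parameters are made up for illustration, and a production system would more likely use a more careful algorithm.

```python
import random


def run_bandit(approved_arms, click_prob, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit that only chooses among human-approved arms.

    approved_arms: list of option names a human has signed off on.
    click_prob: dict mapping option name to a simulated true click probability.
    Returns a dict with the number of times each approved arm was shown.
    """
    rng = random.Random(seed)
    pulls = {arm: 0 for arm in approved_arms}
    clicks = {arm: 0 for arm in approved_arms}

    def observed_rate(arm):
        # Arms that have never been shown are tried first.
        return clicks[arm] / pulls[arm] if pulls[arm] else float("inf")

    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.choice(approved_arms)        # explore
        else:
            arm = max(approved_arms, key=observed_rate)  # exploit
        pulls[arm] += 1
        if rng.random() < click_prob[arm]:
            clicks[arm] += 1
    return pulls


# Hypothetical headlines. "misleading" would win on clicks, so it is
# excluded by a human before the algorithm ever sees it.
candidates = {"plain": 0.05, "friendly": 0.08, "misleading": 0.20}
approved = ["plain", "friendly"]

pulls = run_bandit(approved, candidates)
print(pulls)
```

The algorithm still maximizes clicks, but only within the set a person has judged acceptable.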

[Auren Hoffman] Should the button be on the left or the right or something like that, something more simple.

[Susan Athey] Exactly. But even something like email marketing, your reinforcement learning or your bandits might discover the more misleading headline. The more tricky the headline is, the more people will click on it, but then once they open it, the sales per email-open will go down. So you might think they're all acceptable, but then the algorithm is still going to learn something that doesn't actually work. So there's a few other tactics you can do about this. One is you can try to get the engineers to pay more attention to it. Basically, you can go ahead and ship something that's a new algorithm that works or go ahead and launch something, but then have a holdout set, and claw back the bonuses. If the thing doesn't look good, after a few months you might decide which marketing campaign to use first on email opens or clicks. But then if the sales conversion rate goes down later, you reward them on the sales. So you accelerate your decisions, but you withhold the rewards. And what that's going to do is it's going to motivate people to look harder to do that extra work to ensure that their algorithms or their marketing or whatever doesn't have these unintended consequences, which is really maybe a subjective extra layer of analytics.

[Auren Hoffman] I could see how this can work really well, especially at a smaller company where you can kind of understand everything, and you can kind of map it out and you could have a team of 20 people and they all kind of like understand the steps and “Okay, there's an algorithm here, there’s an algorithm here”. As you get to a much larger organization, whether it's Microsoft or another organization, there could be 1000s of people involved in a given process, there could be all these steps and if I was somewhere along the process, I might not even know whether the step that came before me was an algorithm or a person. How does one even map that out? Is there like a painting I could make so everyone could be on the same page as me? How does one even know?

[Susan Athey] The thing is, you don't want to stop the decentralized innovation of having teams being able to operate, and you want to be able to make hundreds of product decisions in a week if you're a search engine. So there's a few other tactics you can think about. One is peer review. You will not have everybody have to review something, but one team has to present to another team and explain why the category of innovations they're considering has effects that are well captured by the metrics, and they have to convince another team that that's true. And actually, even just having people write down and say “the thing we're doing, you know, it doesn't have any effect on ad relevance, it doesn't have this kind of effect, it doesn't have that kind of effect. So we believe, as a team, that the metrics we normally use capture it well”. Another way to say it is “oh, I have something that's going to improve ad relevance. And I know it's going to look bad in revenue in the short term. But my team believes that we need to use these supplementary metrics, or we need to do a longer term test, or some other way to show how good my experiment is, if it wasn't reflected in the current metrics”.

[Auren Hoffman] If you think of a big engineering org, is it easier to do this if all the engineering algorithms are in some sort of microservices environment, so they've got inputs in and outputs out? And with that, obviously, it still is really hard, but does that make it at least somewhat easier to start to think about these things?

[Susan Athey] There is a little bit of needing to do some big picture thinking, but I think almost every system has the main short term, long term tradeoffs. And if you ask people about them, they'll know what they are, they'll know that if you're doing email optimization, there's a lot of pressure to get clicky headlines, we all know from seeing news headlines, or what we see on Facebook what the pressures are. So in any particular domain, I think most people kind of know, qualitatively, what's going wrong, and where the A/B tests work and where they don't. So I think you can actually pretty easily delineate the categories of things, and then kind of have rules about them. Ad relevance and things that affect advertiser behavior in the search engine, those are things that were long term. So those are some of the tactics: peer review, and categories of experiment types. And then another thing is to change the metric. So clicks are just often bad. So having some kind of quality adjustment on the clicks is a very simple fix, like adjusting clicks by dwell time. So instead of optimizing newspaper headlines for clicks, you could optimize them for the amount of time spent on the page. That starts to incorporate quality. You may not be able to fully relate it to the newspaper’s long term subscription value, but at that point, you're using subjective information that says if people spend more time reading my articles, that's a measure of user satisfaction.
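
[Illustrative sketch, not part of the recording] A quality adjustment on clicks can be as simple as counting only the clicks that are followed by some minimum dwell time. The sketch below assumes a hypothetical 30 second cutoff and made-up dwell times for two headlines; the right cutoff would depend on the product.

```python
def quality_adjusted_clicks(dwell_seconds, min_dwell=30.0):
    """Count only clicks whose dwell time reaches min_dwell seconds.

    dwell_seconds: one dwell time per click, in seconds.
    min_dwell: hypothetical threshold separating engaged visits from bounces.
    """
    return sum(1 for d in dwell_seconds if d >= min_dwell)


# Made-up dwell times (seconds) for the clicks each headline received.
clickbait_headline = [3, 5, 4, 2, 6, 40, 3, 4]
straight_headline = [45, 120, 60, 8, 90]

# Raw clicks favor the clickbait headline: 8 vs. 5.
print(len(clickbait_headline), len(straight_headline))

# Quality-adjusted clicks favor the straight headline: 1 vs. 4.
print(quality_adjusted_clicks(clickbait_headline),
      quality_adjusted_clicks(straight_headline))
```

The same experiment data ranks the two headlines in opposite order depending on which metric is used.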

[Auren Hoffman] In some ways, it's almost about what time horizon you look at. And obviously, the longer the time horizon, the more relevant the information, but then also, the longer the feedback loops are. It'd be great to have a 100 year time horizon, but it'd be really hard to iterate with 100 year clicks or something like that. Is that really what our main constraint is? Just how long the time is, and with a short time we can iterate faster, but we might be iterating in the wrong way. In some ways, if you think like brand reputation from email, that could be many months before we would have a sense of how that would go, and that would really hurt our iteration cycles if we had to wait five months before we could get the feedback.

[Susan Athey] Yes, exactly. I think the time one is the easiest to understand. And I have this paper with Guido Imbens and Raj Chetty where we actually explicitly model how many quarters you would have to follow the subjects before you were able to get a good estimate of the benefit of a training program. And so this one was like 36 quarters, and we could figure it out after six quarters, but two quarters wasn't enough, but six quarters was good. So this is actually really an optimization problem of how long you want to wait. If you think about it in terms of Facebook ads or something there's a click, there's a dwell time, there's a visit to a page, there's how long you spend on the pages. There's whether you put something into Checkout, there's how much you spend, there's how much you spend over two, four, or six months. So you can basically think about solving an optimization problem trading off the issues of bias, of time delay and also noisiness. But there's other dimensions besides just time. Going back to the search ads, conversion tracking, not everybody has conversion tracking on. So you could try to optimize for conversions, but then you just have missing data. These are like different settings, maybe you know the customer ID for some customers, but you don't know them for others. Missing data is another example. It's not the time dimension, but there might be better measures that are only available for a subset of your customers, for example.

[Auren Hoffman] Let's say we pick a time horizon of a day or something like that? And then we find out a year later that there's something flawed about it, right? You mentioned Raj Chetty, one of his more famous papers is there was some government experiment, I think, in the 80s, where they moved a bunch of low income people to a higher income area, and they paid for their housing. And they ended up chalking up as a failed experiment, because these people did not really improve their life situation. But when Chetty many years later ran the numbers, he found that their kids actually did improve greatly from it. But you know, this is like many decades later to find out this information. So it kind of depends on your time horizon of looking at this, right? And then once you figure that out, like how do you go back, and then say “Okay, now we've got a year out, we can go back. We obviously can't rerun what we did a year ago, but we can now learn from that, let's say Bing Ads, or something like that”. So are there many different time horizons, things that are going on at once? And they can inform each other?

[Susan Athey] Yeah, so I have a series of papers about this with Raj, and I've also encountered this in many different organizations. In principle, you either have some existing historical data set and you go back in time, and it may not have an experiment in it, but you might be able to build a predictive model, or maybe even a causal model using historical data. And that might be a much bigger data set, for example. But then you construct from that the metrics you optimize in the short run. I've talked to firms that have said, and some do this, where they have some very long term holdout sets. I think Facebook famously had a holdout set that didn't see ads for many years, so you have some kind of very long term experiment that allows you to understand mechanisms, and then you use that long term experiment to figure out what your short term metrics are. But the problem with doing that is, if you don't have it already, and you haven't kept the data or you don't have something, and you tell a firm “okay, let's run a long term experiment for two years. And then two years from now, we'll make good decisions”, nobody wants to do that.
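
[Illustrative sketch, not part of the recording] The idea of using historical data to construct a short run metric can be pictured with a toy example: fit a model of the long term outcome on a short term signal using old data, then score a new short experiment by its predicted long term outcome. Everything below is a simplifying assumption for illustration (a single short term signal, a straight-line fit, invented numbers); it is not the method from the papers mentioned, and it only works if the short term signal really carries the effect on the long term outcome.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: a first-week engagement score per user
# and that same user's long term value observed much later.
hist_short_term = [1.0, 2.0, 3.0, 4.0, 5.0]
hist_long_term = [12.0, 14.0, 16.0, 18.0, 20.0]
slope, intercept = fit_line(hist_short_term, hist_long_term)


def predicted_long_term(short_term_values):
    """Average predicted long term outcome for a group of users."""
    preds = [intercept + slope * v for v in short_term_values]
    return sum(preds) / len(preds)


# A new, short experiment only observes the first-week score.
control_short_term = [2.0, 3.0, 4.0]
treatment_short_term = [3.0, 4.0, 5.0]

estimated_long_term_effect = (
    predicted_long_term(treatment_short_term)
    - predicted_long_term(control_short_term)
)
print(estimated_long_term_effect)
```

The short experiment is then judged on the predicted long term outcome rather than on the raw short term number.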

[Auren Hoffman] I imagine the Facebook experiment was 0.1%, some small percentage of their users or something like that. And so if you could start to think about how to run these experiments, you could learn a lot from a very small decrease in income or something.

[Susan Athey] Yeah. And so I think a lot of times the economist or the causal inference person in one of these firms will push very hard to have long running experiments or ongoing randomization that allow you to answer these kinds of questions. But they are expensive and for smaller firms, 0.1% isn't enough. I think Facebook did this when they were very young, and they didn't have any ad revenue, so they weren't gonna have to explain to investors why their ad revenue fell. One piece of advice I have, and actually, I got this from multiple people and Sheryl knew it from her time at Google, is when you go into these things, set them up before you have revenue, because no manager ever wants to pull back. You can get people to get used to a slightly smaller amount, and have that learning, but it's hard to claw back and give up revenue. It's like a drug, you're addicted to it. It's like opioids, you can't claw back to get that learning.

[Auren Hoffman] Is there some sort of way of actually picking the right measure so we can actually accelerate AI?

[Susan Athey] There's hacks that are specific, that aren't kind of worth talking about in a general audience, but this idea of quality-adjusted clicks is a pretty generalizable thing.

[Auren Hoffman] Is it kind of slowly adding a little bit more qualitative measurements to things, it's never going to be perfect, but that we get a little bit better each week or each month or whatever it is?

[Susan Athey] Yes. The way that I've handled this when I had lots of time and lots of resources is you really start and say “Well, what are our objectives?” And then which ones are well captured? And which ones are not well captured by short term metrics? Then you have to brainstorm: for the ones that are not well captured by short term metrics, is there a hack? Like putting quality adjustment on the click? Or is it that I need to constrain my AI or constrain my system not to do certain things that I just can't measure? So if I talk to a newspaper, I might say, I just can't tell a really scammy article in the data in a day. And people click on spammy articles all the time, New York Times customers will still click on really bad articles. So what you have to do then is say: “All right, you can only propose articles that are consistent with the New York Times brand, I'm going to tell you what the brand is and I'm going to trust you to create articles that are consistent with our brand. And, if you don't, you're not going to have a long term career here”. Then the system only gets high quality stuff, it optimizes, and then maybe, if the system never picks an article about Syria, you constrain it. You've got to show something about Syria, that's who we are as a company. So there's a little bit of this leadership, both in terms of constraining the choice set, as well as possibly forcing the system to do things that you can't otherwise measure.

[Auren Hoffman] In some ways, it's kind of like managing a big sales team, in that you put together these complex commission structures. And they always find a way to optimize something that isn't necessarily best for the business, but best for their own income. And then every quarter, you have to make all these like little adjustments or change the spiffs, or do these other things to get back to the right equilibrium. Instead of algorithms, these are very smart people that end up optimizing for what they care about, which is maybe their paycheck. Instead, it's similar to what the algorithm is optimizing for, but maybe not like the long term best interest of the company.

[Susan Athey] I love that analogy. As an economist, when I came in and saw the A/B testing platform, I said “Oh my gosh, this is like one big incentive system”. And I think the ML people were thinking it was a statistical system. But I was like, “no, it's an incentive system”. And sometimes it was the engineers optimizing. And sometimes the engineers didn't even understand, but they tried 10 things and the one that looked best was the one that optimized for these things. And so there's a whole economics literature on just what you said, where the salespeople gaming commissions is Exhibit A of the economics literature. And so when I teach about this now to executives, I say “Look, you're used to setting up a sort of data driven firm, you gave each business unit some KPIs, and you realized they were going to game it, but you had subjective evaluation”. Plus, people are smart, but they're not that smart. So it takes them a while to game and it takes people a while to copy the gaming. Plus, they kind of know that if everybody sees them gaming, like nobody wants to hire them in the next firm, there's lots of these implicit things. But the AI, or the machine learning engineer who doesn't actually understand their algorithm, there's no brakes on that system. It just optimizes, it goes super fast, it goes right off that cliff, it doesn't care that you burned a lot of resources in one dimension. This is the biggest limit to this whole misguided image that self governing AI is going to take over the world or even take over very much at all. The reason that can't be true is just the same reason that you couldn't govern a firm in the past purely on KPIs, because you can never measure everything that you care about.

[Auren Hoffman] Okay, we built these great machine learning systems. And we want to, hopefully, use them to make better decisions. Let's say I'm an investor and an investor could be picking stocks, or I could be just deciding which particular projects to fund more internally, whatever it is. And I'm using these machine learning systems to help me make the best types of decisions. How are we doing today on that, and how do you expect we'll be doing over time? What are the big things that we need to overcome to get to the next level?

[Susan Athey] The big tech optimists just believe that we're just moving towards a world where entire jobs are going to be replaced by AI, we're just going to take humans out of the loop altogether. And it's just a matter of time before we get there. You all see these great curves where performance is exploding and we're beating a human expert here, and we're beating a human expert there. So it's just a matter of time before that all happens. But again, I think you don't see economists, or people who have designed incentive systems really saying that, because they understand that the big constraint in the end is not just the computing, it's that you can't measure all the things you care about in the right time scale. And so there's actually a relatively limited number of things where you could just let the machines go. Now, if you realize that that's the issue, though, of course, you can still see a great benefit to AI, but you see it more in terms of automating things that help humans make decisions, rather than just replacing humans, in most cases. Now, there's some things where what humans were doing was copying numbers out of one spreadsheet and sticking them in another spreadsheet, and the humans are gone.

[Auren Hoffman] They use the radiologists as an example. Can a machine sometimes make a better prediction of percentage that this is cancer or something like that, than the human?

[Susan Athey] Sure, and they could, it's just you can’t just take the human out of the loop. It is a routine enough decision. But a more common situation is that there's something where it takes the human a long time to gather a wide range of noisy signals. And the decision is sort of low enough value that it's not worth it for the human to gather all of that. It should be different. But things like “should I send a social worker to investigate a family after a complaint of abuse? Or, should I let somebody out on bail?” Those types of decisions, in principle, a lot of human effort should go into each one. But in practice, it doesn't. You have overworked people who are spending a small number of minutes, and so even though all the information might be in the file, or it might be gettable through other data sources, the human doesn't take the time to do that. But then it's not like you want to completely outsource to the AI whether somebody gets out on bail, or whether a child gets taken away. But what these systems can do is they can help prioritize so they can say “Hey, I've gathered all this information, and there's a signal here that says this person's in a risky situation, maybe it's because there's a new adult in the home, who is actually socially connected to gangs”. And so that just raises a flag and says “alright, this is one where we need to look into it. Now, maybe this person is no longer part of gangs, and it's all fine”. It's a flag that allows the human to come in. A human would have to be a very sophisticated Bayesian to incorporate all these weak signals of this information and just doesn't have time to do it. But you still don't want to outsource, you can just assist a human. And I think that's a really easy thing to see happening, and it is happening. But to completely take the human out of the loop, you have to deal with a lot of edge cases and things that are hard to measure that you just may not ever be able to solve.

[Auren Hoffman] One of the cool things about machine learning and basically running tons of experiments at once is that we can more quickly see interesting things in the world, because we can maybe run a million experiments, whereas before we could just run one. But of course, one of the pitfalls of that is that we might learn things from experiments that aren't true, the correlation versus causation phenomenon that might happen. How do we guard against that? And especially in your profession within academic circles, we could end up finding all these random correlations in society that aren't actually accurate. How do we actually make sure that we're running these experiments in a more appropriate way, so we actually get real learning from them?

[Susan Athey] Let me actually step back and answer an easier question. What we teach people today at Stanford and pretty much every other place, if somebody just wants to learn machine learning, we put them through a set of classes, which teaches them a cookbook, so basically says: here's a problem, you can use logistic regression and then a random forest if you're an old lame person, but if you're really cool, then you'll use a neural net. Then we have them download a bunch of datasets and run the models. So they come out super confident that all those old fashioned people were wrong about everything and now they can solve every problem. One of the ways that we miseducate people is that in order to make it sound sexy, we use a language of decision making when we set up the problems, but then all they do is run prediction algorithms. And as a result of that, they don't really learn how to think about cause and effect or even think about what you can learn from datasets and what you can't. Because if you think about it, if you spend a bunch of time downloading a classification data set and using a neural net to tell a cat from a dog, you can immediately see from that dataset whether you got the right answer. So a lot of the value added at that point is engineering. And it's kind of a cookbook, you just compare, like you hold out a test set, and you can just see how well you did. And there's a right answer, basically, that is like telling cats and dogs apart. That's all you need to figure out. Now, if you ask a question like “which customer should I prioritize?” That would be a business question. “Who should my salespeople call?” Well, if you took prediction methods to the problem, what you would do is just build a prediction model and say, I'm gonna build a prediction model of who's gonna quit the firm. And then I'll tell the salespeople to call the people who look like they're gonna churn.
That'll help because certainly, the people who aren't going to churn, probably you don't need to call them. But if you instead run a randomized experiment, and call some people and don't call others, then you could use a machine learning model. By the way, this is the kind of stuff I've developed in my research, you could use machine learning models for heterogeneous treatment effects to see for whom the call helped. It turns out that the people who are going to churn, in general, and across lots of different applications, are not the people you want to call. Because some people are going to churn no matter what you do, they're moving away, they don't need your product anymore. So you want to find the people who are at risk of churning, but also who are amenable to a phone call. But if you think about it, you can't really learn that very well from past historical data, because you probably had some system for calling people in the past and you never tried calling the different people. So you really need to run an experiment.
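
[Illustrative sketch, not part of the recording] The difference between predicting churn and estimating who is helped by a call can be shown with a tiny randomized experiment. The sketch below computes a difference in retention rates, called versus not called, within each customer segment. The segments and counts are invented, and a per-segment difference in means is a much simpler stand-in for the machine learning methods for heterogeneous treatment effects mentioned above.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Retention rate when called minus retention rate when not called.

    records: iterable of (segment, was_called, retained) tuples from a
    randomized experiment in which calls were assigned at random.
    """
    # counts[segment][was_called] = [number retained, number of customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, was_called, retained in records:
        cell = counts[segment][was_called]
        cell[0] += int(retained)
        cell[1] += 1
    return {
        segment: c[True][0] / c[True][1] - c[False][0] / c[False][1]
        for segment, c in counts.items()
    }


def make_group(segment, was_called, n_retained, n_total):
    """Build n_total records, the first n_retained of which are retained."""
    return [(segment, was_called, i < n_retained) for i in range(n_total)]


# Invented experiment results for three hypothetical segments.
records = (
    make_group("moving_away", True, 1, 10) + make_group("moving_away", False, 1, 10)
    + make_group("on_the_fence", True, 8, 10) + make_group("on_the_fence", False, 5, 10)
    + make_group("loyal", True, 10, 10) + make_group("loyal", False, 10, 10)
)

uplift = uplift_by_segment(records)
best_segment = max(uplift, key=uplift.get)
print(uplift)
print(best_segment)
```

In this made-up data a churn prediction model would rank "moving_away" first, since 90% of those customers leave, yet calling them changes nothing; the experiment points to "on_the_fence" instead.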

[Auren Hoffman] The data itself is really important. Obviously, if you're not collecting certain pieces of data, or you just don't know certain things, or that it's wrong, it puts you in a very interesting place, which is not going to be where you actually want to go.

[Susan Athey] Yeah, I think that's actually a common way to frame it, that you've collected the wrong data. But it's really that you haven't experimented. Suppose in the past you would call only the big accounts; then you would have no idea what would happen if you called small accounts. You have to have somehow either had a natural experiment or run an experiment if you really want to know the treatment effect of calling on a particular subset of people. The reason people use these predictive models, and if you're a startup selling churn prediction software you would tell people it was about prioritization, is that most firms haven't run experiments in the past. So they haven't generated the data that would actually allow them to learn who they should call and who they shouldn't. And it actually takes time and effort to set up an experiment and actually optimize for the right thing.
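
[Illustration] The "only big accounts were called" problem can be checked directly in a historical log. The sketch below is a hedged example with a made-up log and hypothetical field names: for each account size it counts called and uncalled accounts, and flags a group as estimable only when both kinds appear. This is a necessary check only; even where both kinds appear, past calls that were not randomized can still give a biased comparison.

```python
from collections import Counter

# Hypothetical historical log of (account_size, was_called).
# Past policy: only big accounts were ever called.
history = (
    [("big", True)] * 40
    + [("big", False)] * 10
    + [("small", False)] * 50
)

def coverage(log):
    """For each account size, report whether called and uncalled accounts both exist."""
    counts = Counter(log)
    report = {}
    for size in sorted({s for s, _ in log}):
        called = counts[(size, True)]
        not_called = counts[(size, False)]
        report[size] = {
            "called": called,
            "not_called": not_called,
            # Without both groups there is nothing to compare a call against.
            "effect_estimable": called > 0 and not_called > 0,
        }
    return report

report = coverage(history)
for size, row in report.items():
    print(size, row)
```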

[Auren Hoffman] Let's say we get to feed in all this data, which is the beauty of what we have now: we can start feeding it in. Let's say we have this huge panel of people, and we can see all the internet sites they go to and everywhere they went, and every medical treatment they ever had, and everything they ever ate and how often they went to the gym. You can imagine all this amazing data, and they all opted in, it was great. They wanted to do this for science. How would we even know if chocolate cake was good or bad for you? Warren Buffett drinks Coca-Cola and eats Dairy Queen every single day, and he's doing great at 90 years old. How would I know what I should do in health or in society? How do we actually tease these things out to unlock the true secrets of society?

[Susan Athey] That's a really important question. And the answer is you can't, actually. If I had a sensor on 100, 1,000, or even 10,000 people and just saw them walk around, that data is actually useless for understanding cause and effect, unless there was some kind of randomized or natural experiment in the background. So one of the things economists did before machine learning was spend a lot of time thinking about finding natural experiments. So if we wanted to do something like understand the effect of military service on labor market outcomes, you would go and think “Okay, what was something that randomly assigned people to military service?” And then you'd have to find a time when there was a draft, or a draft lottery number that did that prioritization. There are other types of things, where the quarter of birth is going to determine whether you're old or young for your grade in school, so you can sort of figure out whether you get extra years of schooling. This observation has been around for a very long time. But I think with the new generation of people who just came in with machine learning, because we were teaching them how to do neural nets, we stopped teaching them how to think about drawing inferences from observational data. And we stopped teaching them that there are situations where you just can't. There are theorems that say you literally cannot answer that question, even if you had infinite data. So you have to learn to think about when you could and when you couldn't answer certain questions.
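
[Illustration] The draft lottery idea can be sketched with simulated data. This is a minimal sketch under stated assumptions, not a reproduction of any published study: the numbers are invented, the effect of service on earnings is assumed to be the same for everyone, and the lottery is assumed to affect earnings only through service. An unobserved trait `u` drives both who serves and what they earn, so the naive comparison of servers and non-servers is biased. The Wald (instrumental variables) estimate divides the lottery's effect on earnings by its effect on service.

```python
import random

def simulate(n, true_effect=-2.0, seed=1):
    """Simulate (lottery, served, earnings) rows. All parameters are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)            # unobserved trait: affects service and earnings
        z = rng.random() < 0.5         # lottery: randomly draft-eligible or not
        # People with low u volunteer more often; the lottery adds service at random.
        p_serve = 0.1 + (0.5 if z else 0.0) + (0.2 if u < 0 else 0.0)
        d = rng.random() < p_serve     # served or not
        y = 10 + 3 * u + true_effect * d + rng.gauss(0, 1)  # earnings
        rows.append((z, d, y))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def naive_difference(rows):
    """Average earnings of those who served minus those who did not."""
    return mean(y for _, d, y in rows if d) - mean(y for _, d, y in rows if not d)

def wald_estimate(rows):
    """Lottery's effect on earnings divided by the lottery's effect on service."""
    reduced_form = mean(y for z, _, y in rows if z) - mean(y for z, _, y in rows if not z)
    first_stage = mean(d for z, d, _ in rows if z) - mean(d for z, d, _ in rows if not z)
    return reduced_form / first_stage

rows = simulate(200_000)
naive = naive_difference(rows)
wald = wald_estimate(rows)
print(f"true effect: -2.00  naive: {naive:.2f}  lottery-based: {wald:.2f}")
```

Without the lottery column there is no way to separate the effect of service from the effect of `u` in this data, however many rows are collected.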

[Auren Hoffman] Okay, this is interesting, but also depressing at the same time, because I really want to find out some of these big secrets of the world. One of the things that you're kind of a pioneer on is not only being an academic economist, but being an economist actually at a company, in your case at Microsoft. And I know that you've mentored many other economists who have gone into many other types of companies that are out there. I think I have a pretty good understanding of what economists do at a tech firm and why employers hire them. But I don't know what an employer can do to set it up so that they can be successful there. Is there a certain type of environment to set up for an economist, as opposed to an engineer or product manager or more of a classic type of person you would expect to be at a company?

[Susan Athey] Yes, that's been one of the really interesting things. I was one of the first tech economists and I actually stopped being chief economist at Microsoft many years ago. So just to say I'm not there anymore, they're actually on their second chief economist after me. In the early days, it was hard to get really high-quality economists to invest in this as a career. But now it's really been professionalized, and so you can get a lot more people to go into it. Amazon has been the most successful at this and they actually have hundreds of PhD economists; they're one of the biggest recruiters and biggest employers. I gave them a lot of advice early on when they were starting, but they really succeeded beyond my wildest expectations. There, they actually have different models, but they have a community of economists and then they also work with academic economists who come in and work maybe one day a week, or come and spend a year there to help mentor the young people and help them kind of mature. So one reason that it's actually harder to get economists and have them grow and be successful is that their work is not very cookbook, and it's hard to know if they did it right. If you hire a machine learning person, and their classification thing doesn't work, or if they can't tell fraud from not fraud, or cats from dogs, you can just see that it's wrong. But if what you're trying to do is to use observational, nonexperimental past data to determine whether you should have introduced this new product, or whether something has been good for your brand, or some of these bigger picture questions, it's hard to know whether you got it right, and there's a lot of judgment to it. So younger people need more care and feeding. They are also typically less trained in engineering and they tend to move more slowly.
So it tends to work well with the very young people if you are able to nurture them, and also combine them with data scientists or data engineers who can help them move faster. But there are different types of economists. When I was at Microsoft, and on my boards now, I do lots of different things. Strategy is one thing, and economists can be really good at strategy: who should you buy? What's the ultimate industry structure going to look like? What's gonna be the value of this additional data? So those are kind of bigger picture strategic projects. Whereas at Amazon, people do things like estimating price elasticities.

[Auren Hoffman] There are a lot of really valuable people who in some ways are much more valuable to a firm one day a week than five days a week, right? And indeed, when you were at Microsoft, I assume you were one day a week or maybe two days a week at the most or something, right? But I don't think most companies are set up well to take advantage. They have consultants, of course, that come in, or they use lawyers for a project type of thing. But an ongoing, one-day-a-week relationship is less common. Are there certain ways that companies can be better set up to take advantage of these experts?

[Susan Athey] This has become more common. Also, computer science experts do this as well, because there's such a shortage of AI and ML talent that companies do whatever they can to get intellectual leadership, basically. The way that it worked best for me at Microsoft, and in other arrangements that I have, is that there was a team with a really good project manager and then I was kind of the architect. And frankly, if I was full time at a company, I don't know that my life would be that different. I would just have more projects, because I wouldn't be doing the coding in any case, except occasionally. So in the end, suppose that as an architect, or an intellectual leader of a project, we would have two one-hour a week check-ins and then engineers would go off and work, data scientists would go and code and then we would meet again. So really, one day a week versus five days a week isn't that different for that kind of work as long as you have really good project managers who really understand what you're doing, or where you're on the same wavelength and you really are compatible. I think that the “one day a week” thing can be harder for the big decisions. So the fact that I was very immersed with Microsoft was really helpful when we said “All right, like, how much should we pay for Yahoo?” That's a super strategic decision and you need trust, context, and to be able to drop everything and work on it. But for repetitive type things where you're trying to understand alternative pricing models, or you're trying to do new product development and there's an economic element, I think somebody can meet twice a week with a really good team and add a lot of value. And as I said, that's kind of what you might do if you were full time anyways. So I think firms have done really well in getting top talent. Frankly, the best academics are often not very good at execution. They don't know how to manage people, and they don't get stuff done on time. 
So you don't really want them. You want them paired with a project manager.

[Auren Hoffman] I've got a couple of personal questions, one of which I've been meaning to ask you for a long time. But you recently founded the Golub Capital Social Impact Lab at Stanford Business School. What is that and why did you do it?

[Susan Athey] Well, so we've talked a lot about the interesting problems that I've worked on with all of these different tech firms. When I stopped being chief economist at Microsoft, I still had that bug to want to do the same kind of research. So first of all, I had like 20 papers in my head I wanted to write, about short-term/long-term outcomes, about heterogeneous treatment effects. And so I started writing those methods. I felt like I needed to still be using them in practice, but I knew I wanted to publish even more from them. I realized that a great alignment was to work with social impact organizations, because they were happy for you to publish, and they were even more desperate for the talent and the help. So there was this great match and alignment of interest. Once I started doing that, I realized that I could actually make this a thing. So I have this whole university full of engineers and computer scientists who are desperate to apply their work to something important, rather than do a summer internship targeting ads. No offense to the ads, but why not do something really fun while you're in school? The social impact firms can't really hire that talent because it's hard for them to screen and half of them don't work out; they come for a short internship, they don't do anything. So the Golub Capital Social Impact Lab partners with social sector organizations to make them more effective. I have long-term relationships, so we get over all the contracting stuff. I'm kind of a human capital intermediary where I screen people and supervise them.

[Auren Hoffman] You’re kind of like Accenture for the social impact companies.

[Susan Athey] Exactly. Accenture publishes case studies, but we're much more focused on creating generalizable insight. It's also incredibly educational for these young people, and then some of them will go out and decide “no, I want to do an EdTech company, I'm going to do a training tech company. Because I've seen what this is like, and I'm going to do that instead of going to one of the big tech companies”.

[Auren Hoffman] We've been friends for 14+ years or so. And you're one of the best people I know at being a personal CEO, at basically getting leverage in your life. Are there one, two or three things you could tell us as a personal CEO, what your hacks are and that we can learn from you about what to do?

[Susan Athey] I would say I'm constantly a work in progress. One of my insights, a while ago, when I was young: I used to just feel like “Oh, I didn't get x done, or I'm procrastinating on x, so I'm a bad person”. So I should tell myself “bad, bad Susan, you work harder, be better”. And then at some point, I realized, “no, I should just realize that x is hard for me”. So I need to come up with a different structure, a different system that gets x done. Sometimes I used to meet with a friend in a coffee shop, and we would both do the thing that we didn't want to do, but that was important for us to do, where we were procrastinating or it was painful. And now of course, I can hire people or use people on my team. So it's just sort of figuring out what it is that you do naturally and don't need a lot of structure on, and then giving yourself time to do the things that are joyful and self-motivating, and then figuring out a way to either delegate or force the things that are important for you to do, but that you naturally don't do, and making sure they get done. And then there are the things that kind of clog up your brain and are the gating factor for getting more done. Just find a way to delegate those.

[Auren Hoffman] One of the things I think you're good at is, and maybe it's your economist brain, but I think in your personal life, you're good at, in some ways, using money for leverage as well. So there's this thing, I'm willing to part with $20 on that, maybe I don't want to part with $20, but I'm willing to part with that, and some other service or person or automation is happy to take my $20 and do this thing that I don't want. I think most of us are trained to see them in business, but we don't see them in our life as well. Is that fair? Is it because you're an economist that you can figure these out?

[Susan Athey] There's some bit of a moral thing, actually. Both my parents grew up in rural Alabama, and we never hired anybody to do anything my entire time growing up. We fixed everything ourselves, and cleaned the house and everything ourselves. So it was pretty hard. Thankfully, my parents were actually super supportive of me outsourcing things. But from a business perspective, you're right, it makes sense. When I talk to especially women and young mothers, I try to really get people to think about it rationally and say, “your kid has no idea who cut up the carrots. If you're cutting up the carrots at midnight, while your kid is asleep, that is something somebody else can be doing. And if you can, then do that.” And also, if you can, just spend a lot of money. There's a period of time where you just feel like you're gonna die as a young parent; it's never enough, every minute somebody is sad because you're not doing something. So you hide in the bathroom and cry in the two minutes you have. Somehow kids agree and understand that you need to go to the bathroom, so that's the one time they give you like two minutes. But besides that, you don't get any slack. So I say, “get a cheaper car, get a cheaper house, and spend up to the entire amount that you're making while you have young kids so that you can get through that, because if you stay in the labor force, and you stay in the job that is the best job you can have, where you have the most expertise, then later on when they're school age, you'll have more power, you'll have more autonomy over your time, if you've got a really good job and you're valued there. And then later on, it won't be so hard anymore”. So just spending the money to not be miserable, and to be a functional, sort of alive, happy human being and get another hour with your kid when you're not folding laundry or cutting carrots or whatever it is, is essential.
If people don't spend that, then maybe they'll drop out of the labor force, which is crazy, because then you're making nothing for the next 20 years. So you should at least spend up to the money that you have just to get through the time period and make sure that you're emotionally present with your kids instead of chopping carrots and doing laundry during the one hour that you potentially have with them while they're awake.

[Auren Hoffman] I think this is amazing. The last question, which we ask all of our guests: if you could go back in time, what advice do you wish you could give the younger Susan?

[Susan Athey] Well, I think I actually mentioned it earlier. And this is now what I tell my younger colleagues too: instead of giving yourself a value judgment on your weaknesses, just approach them as problems to be solved, and sort of take a more objective approach to yourself. So instead of thinking that you have to do everything, that you're a bad person if you don't do something, that you're stupid if you can't do something, and then trying to hide it from everybody, get past it. Instead realize no human being can do everything. Everybody has weaknesses, everybody procrastinates. If you're sitting in a room and you don't understand something, probably somebody else in the room doesn't understand it either. So kind of getting comfortable with that, being honest with yourself.

[Auren Hoffman] There are some things you're bad at, you want to go to mediocre, there's some things you're mediocre at, you want to go to good. There's some things you're good at, you want to go to great. You have limited time to invest, how do you know where to invest?

[Susan Athey] I think it's really hard to make yourself do stuff and be really great at it.

[Auren Hoffman] So just go where you're excited to invest your time anyway or something.

[Susan Athey] Yeah, people were like, “Susan, why do you work so hard? How could you have all this stuff? And why are you still working at midnight or something?” And I'm like, I'm excited about what I'm doing. But I'm worse than most people at making myself do things I don't want to do. People think that I'm somehow a workaholic, but I'm not. I just get really excited about stuff. And so basically, the more stuff in my life that I'm excited about, the more productive I'm going to be, but I'm worse than most people about procrastinating on things I don't want to do. And so I just sort of feel like it's about picking the thing that inspires you, that's fun to do. It's just really hard to make yourself do stuff. Now that said, there's a lot of shit work. Young people think “Oh, I'm going to find a job that's perfect”. Every single job has work that you hate. My job does too, and I hate big parts of my work. And I still have to do it, and I have to do it on time. But I'm not going to make myself miserable because I'm bad at it. Instead, I'm going to put structures in and say “okay, I'm going to do that from 9 to 10 today and I'm going to show somebody what it is at the end”. Because if not, I'm going to surf the web from 9 to 10 instead of doing the thing that I hate. So it's finding the hacks.

[Auren Hoffman] You’ve been completely awesome. Thank you so much. Where can people find you on the internet, Susan?

[Susan Athey] You can google me or search for me on Bing. As with most professors, you can find me everywhere. So yeah, “Susan Athey Stanford” will get you to me.

[Auren Hoffman] Awesome. All right. Well, thank you very much. Thank you for being on World of DaaS. All right. That was good. That was a lot of fun. If you're in the shower, at some point, you're like, I hated this thing. I wanted to change this or whatever, let us know. And, it was really good. I really enjoyed it.

[Susan Athey] Yeah, me too. That was fun. You're so good at this, Auren.

[Auren Hoffman] Work in progress, I am starting to learn. Alright, thanks. Thanks again. I'll let you know when it goes live, probably a month-ish from now. But we will definitely give you lead time to know when all that stuff is going to happen. Great.

[Music playing]

[Auren Hoffman] Thanks for listening. If you enjoyed this show, consider rating this podcast and leaving a review. For more World of DaaS (DaaS is D-A-A-S), you can subscribe on Spotify or Apple Podcasts. Also check out YouTube for the videos. You can find me on Twitter at @auren (A-U-R-E-N). I’d love to hear from you.

Susan Athey, Professor of Economics at Stanford University, talks with World of DaaS host Auren Hoffman. She is also a research associate at the National Bureau of Economic Research and was formerly the first Chief Economist at Microsoft. Susan is also on the Board of Directors of Lending Club, Expedia, Ripple, Turo, and more. Susan and Auren dive into the role of tech economists, machine learning, and causal inference.

World of DaaS is brought to you by SafeGraph & Flex Capital. For more episodes, visit safegraph.com/podcasts.
