Academic Distinctions: A Podcast to Make Sense of American Education

007: How much research is enough to know...anything?

Zac Chase & Stephanie Melville Season 1 Episode 7

Stephanie and Zac talk with Dr. Maggie Beiting-Parrish about the possible intended and unintended consequences of John Hattie's work and what we should be asking when we see research in education and beyond.

Stephanie:

Hey friends, welcome back to Academic Distinctions. This is part two of our episode on John Hattie's work. In the first part, we talked about why our brains love lists and order and what makes things sticky. And we introduced Hattie's meta-meta analysis and its impact. This time, we're digging into why some say the math might not really math. And we'll talk to Dr. Maggie Beiting-Parrish, whose research specializes in quantitative methods, to hopefully learn a little bit more about whether or not any of this means, well...

Zac:

And we're back. So eventually, all of this love of Hattie kind of broke down. And stats and researchers... Nope. So eventually, this love affair with Hattie woke up the next morning, looked across the pillow and said, something's different. More specifically and less rom-commy, statisticians and researchers looked at the math and noticed there were some problems.

Stephanie:

Yeah. So here's the thing. Most teachers, most people have little to no experience or exposure to two things. Number one, research methodology. Research follows the scientific method that we all learn in school, right, Zac?

Zac:

You have a hypothesis, you design an experiment, you conduct the experiment, and you compare what happened to what you thought would happen. And then you run it again with a different group. Lather, rinse, repeat, and repeat, and repeat, and repeat, and repeat, and repeat.

Stephanie:

And the other is statistics and probability.

Zac:

Studying the likelihood that stuff happens, or happens by chance, and then using those numbers to make predictions about what might happen if you do this, that, or the other.

Stephanie:

So to get us through the rest of this conversation, we're going to provide listeners with some terms someone needs to know in an age of research and effect size. So the first term is sample size. How many things were selected to look at? In Hattie's meta-analyses, this could be either the number of studies he looked at or the number of students the studies looked at.

Zac:

The next one is the term statistically significant. And this is the likelihood that something is happening because of something you did or an intervention, or if it's just happening by total chance.

Stephanie:

And the last one is effect size. In this case, Hattie used Cohen's d. It compares two groups, one group that got a treatment or intervention and one that did not. And when you run that calculation, you get a number. That number is essentially ascribed to one of four categories, which are closely aligned to probably not likely, maybe a little likely, sort of likely, and probably likely. But the actual number doesn't mean anything, unlike in math and stats, where numbers usually have a context associated with them.
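For readers who want to see the calculation behind that number, here is a minimal sketch of Cohen's d in Python. It uses the common textbook form (the difference between the two group means divided by the pooled standard deviation). The scores are invented for illustration, and nothing here is taken from Hattie's own data or procedure.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    # statistics.variance is the sample variance (it divides by n - 1).
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical test scores, made up for this example only.
treatment_scores = [78, 85, 90, 72, 88, 81]
control_scores = [70, 75, 80, 68, 77, 74]

d = cohens_d(treatment_scores, control_scores)
print(round(d, 2))
```

The result is expressed in standard deviations rather than in points or grade levels, which is one reason the number is hard to read without knowing what was measured and on whom.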

Zac:

And that's really important. We're not saying that effect size is bad. It means what it means, but it doesn't mean something exact.

Stephanie:

Yeah. So what was not so great was how Hattie determined effect size. Hattie took a ton of studies and put them all together into one big group and basically said, aha, these things work better than these other things. And I know this because of my calculation of Cohen's d.

Zac:

Which sounds, especially to me, as somebody who has minimal stats training and a degree in English and other things with words that are not numbers, like it isn't a big problem. So Stephanie, what is the big problem?

Stephanie:

So there were a mixture of different types of data, different experimental methods, different sample sizes.

Zac:

That's a lot of difference. And different feels like it's a problem when we're trying to think about things being the same.

Stephanie:

Right. So you've heard of the saying, an apple a day keeps the doctor away, right?

Zac:

Yes, I did. And did you know that it has its origins in a Welsh saying from the 1860s, which goes something like, eat an apple on going to bed, and you'll keep the doctor from earning his bread. Did you know that?

Stephanie:

I didn't. I did not. Thank you for bringing that to my attention.

Zac:

My guess is that you were not looking for history of adages, but maybe you mean there are a few things left out of consideration in An Apple a Day Keeps the Doctor Away. Like, I might eat an apple every day, but it's not going to keep me from going to the doctor if I have some other kind of health issue that apples don't fix or probably don't fix. Like, say, if I fall and break my leg.

Stephanie:

Right. Rubbing an apple on it is not going to help. They aren't super fortified with calcium, so it's not going to aid in keeping your bones strong. Apples don't help there.

Zac:

So what does this mean? Why are we talking about apples and doctors?

Stephanie:

Yeah. So let's pretend you want to test the apple a day theory. You first need to design an experiment. Let's say your first experiment gathers 200 people and you force everyone to eat an apple every single day for a year. And then you total up the number of times folks went to the doctor. That might not hit a home run.

Zac:

Because you have nothing to compare it to. You only know about people who ate apples every day.

Stephanie:

Right. So let's go back to the drawing board. And now let's say you have 200 people, but you randomly sort them into one of two groups. 100 folks who have to eat an apple every single day for a year. And then 100 folks who cannot eat an apple at all for a year. Is that a better design?

Zac:

It sounds better because then you are making a better comparison. I know what happens if I eat apples versus if I don't eat apples. And because everybody's randomly assigned, I can say these groups are probably similar. Math, math, math, math, math.
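The random sorting described here can be sketched in a few lines of Python. The participant names, the `randomly_assign` helper, and the seed value are all hypothetical; the point is only that chance, not the participants, decides who lands in which group.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)  # a seed makes the shuffle repeatable
    shuffled = list(participants)  # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 200 made-up participants, as in the apple example.
people = [f"person_{i}" for i in range(200)]
apple_group, no_apple_group = randomly_assign(people, seed=7)
print(len(apple_group), len(no_apple_group))  # prints: 100 100
```

Because nobody chooses their own group, differences such as access to a doctor or general health should be spread roughly evenly across both groups.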

Stephanie:

Yeah. Yeah, sure. That's great. So let's return to the drawing board again and say...

Zac:

Hold on. What? There's a lot on this drawing board. Let me erase it.

Stephanie:

Oh, okay. All right. Let me know when you're ready.

Zac:

Yeah, go ahead.

Stephanie:

Okay. So let's say you have 200 people. And you let them pick the groups they join. Five of them choose to eat bananas every day. 20 choose to eat apples. 100 choose to eat oranges. And the remaining 75 choose to eat fruit salad. All five of the people in the banana group never went to the doctor. What can you tell me?

Zac:

I feel like you're trying to get me to be like, bananas are the best. Because on paper, it would look like a banana every day kept me from going bananas... to the doctor. But I feel like it's also a trick question. Isn't it possible that the people in the banana group also had other things in common? Maybe they didn't have access to a doctor, no health insurance. Maybe they were on a multivitamin. Maybe they were just healthier in general. There are a lot of things at play here. If I have asthma and I'm allergic to bananas, but I hadn't been allergic to bananas and that was the group I chose, I would have impacted the number. I don't know. It doesn't make sense.

Stephanie:

Yep, yep. So keep that feeling in mind. And what I want you to do is take all the experiments we just designed and combine the findings and say broadly, bananas have the best impact on your health, not apples. In fact, we're going to use Cohen's d and say the effect size is 0.92, which is considered, just for whoever's wondering, a pretty likely value. How do you feel about that?

Zac:

I don't feel great about it because I don't know who was in the groups. Like if we're still on the fruit issue, like, are we saying everybody had a banana? Everybody had an orange, no bananas, no oranges, all apples.

Stephanie:

Yeah. So that's the issue in a nutshell. The issue we have is not with effect size, but with what Hattie used to get it and the reporting of it in his chart. We don't know enough about the designs of the experiments, whether they were good or bad, how the data were collected, the sample sizes, or even what the possible responses could have been. He just combined them all into one big, beautiful meta-meta analysis. And when statisticians called him out on it, he basically said, well, I didn't think I was going to include that list at first. And then I did, almost as an afterthought. And sorry, I can't help it if you didn't understand the things I said the way I meant them to be said.
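To make the lumping problem concrete, here is a small Python sketch with three invented studies. It compares a plain average of their effect sizes with an average weighted by sample size. The study names, effect sizes, and sample sizes are hypothetical. Weighting by sample size is a simplification chosen for illustration (formal meta-analyses typically weight by the precision of each estimate), and this is not a reconstruction of how Hattie combined his studies.

```python
# Three made-up studies: effect size (d) and number of participants (n).
studies = [
    {"name": "tiny self-selected study", "d": 0.92, "n": 5},
    {"name": "mid-sized study", "d": 0.30, "n": 100},
    {"name": "large randomized study", "d": 0.10, "n": 2000},
]


def unweighted_mean_d(studies):
    """Average the effect sizes as if every study counted equally."""
    return sum(s["d"] for s in studies) / len(studies)


def sample_weighted_mean_d(studies):
    """Average the effect sizes, giving bigger studies more say."""
    total_n = sum(s["n"] for s in studies)
    return sum(s["d"] * s["n"] for s in studies) / total_n


print(round(unweighted_mean_d(studies), 2))       # prints: 0.44
print(round(sample_weighted_mean_d(studies), 2))  # prints: 0.11
```

The same three studies produce very different headline numbers depending on how they are combined, and neither number says anything about whether the underlying designs were sound.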

Zac:

Also, I think it's important to point out, as we talked about with G, that that list is the thing that gets shared the most from this work. People are familiar with, if not all 138 or 200, however many there are in the current incarnation of Hattie, people are familiar with that list, and they love the order and hierarchy that it provides. So even if you didn't intend it, this seems like what ended up happening.

Stephanie:

Right. And honestly, you know, some folks have said it's good enough, that even if you take out the bad studies, the effect sizes are relatively unchanged, so it doesn't matter. There's a list, there are numbers, there are rankings, that should be enough.

Zac:

But at the end of the day, where we had our apples, teachers do not have time for this. We might look at Hattie's book or we might just see the slides where we capture the top of that list in our professional learning. And our school administrator says, do these 10 things. This is all we're going to do. But they don't have time for it. So when we come back, we're going to talk with Maggie Beiting-Parrish to help us make sense of education research, whether Hattie's work was bad or good, and what the unintended consequences may have been.

And we're back, and we brought some reinforcements. It is my pleasure to welcome to the podcast Maggie Beiting-Parrish, PhD in Educational Psychology with a focus on quali... Nope, not qualitative. People would have been angry. All the scientists in the world would be like, who's she? Quantitative, meaning math, methods. I say it means math. I know it doesn't mean that. I know I'm simplifying things. I'm just a tiny brain. Maggie, you listened to that last segment and our discussion of Hattie and his research and where he may have gone off the rails. What do you think of Hattie?

Maggie:

So I think in general, it's important to gather together all of the research that exists around different topics, especially since they're so far-flung across different countries and contexts. This was a really great attempt to combine it and kind of gather all this different research and all these different domains and subjects into one sort of easy-to-use place. I thought that was actually a really noble thing to do here, especially since it combines the work of like 50,000-some studies or something. So I think that's good. I think some of the long downstream consequences of this work are where the problems start to come in.

Stephanie:

What consequences? What do you, what do you, what, what?

Maggie:

Well, I think especially in some PDs and things, people just sort of took his ranked list of the effect sizes and have just sort of treated that as sort of like a 138-line-long 10 commandments and just kind of worked on the list. Like, this is the best thing we can do, so let's just work down the list. Anything towards the bottom that has like a negative effect size, or a very small one, people just choose to kind of remove from their practice or kind of downgrade or don't focus on. And I think that ranking might have created a sort of false belief for a lot of people about what's the best and what isn't, or this belief that something in education could be the best practice.

Zac:

And I'm going to talk about that a little bit when I get to my thing on this question. Actually, a good thing about what you just said was a bad thing, but we'll hold on. Stephanie, you've now like, showered in the thoughtfulness of Hattie and what works and why we like it and where the math went off the rails. And as a person who works in data literacy, where are you coming out on this Hattie discussion?

Stephanie:

I just have a hard time with it. You know, I was a classroom teacher when Hattie's work came out. And it was one of those things where I got to sit in PD all over the place being told that I need to focus on these moves because these moves were the ones that were going to work. And as a person who likes data and data science and statistics, it's like, no, that's not what this says. That's not what this says. Like, there's good there. Like, there's something to point at. But when you look at it and you point at it as though it's going to work because it worked in this particular circumstance, and you walk around as though, like what you were saying, Maggie, it's the 10 commandments, the gospel truth, it's like, that's not how data works.

Zac:

And I think anybody who's been in a classroom as a student, or ever moved from one school to the other as a student, or gone from high school to college knows that location matters when we're thinking about how teachers teach and how students learn. So I think that for me, the big aha is, oh, we weren't comparing things that were necessarily the same. And so that makes me worry a little bit about exactly what, Maggie, you just said about, well, let's rank it and do these top 10, right? So again, I was a teacher when Hattie's work came out as well. And that was how it was shown to us. And we had the 20, right? These are the 20 practices we're really going to focus on this year. And it became really difficult to do anything that wasn't those 20. So I think that was an issue.

Maggie:

Yeah. So I think a big thing is Hattie is located in New Zealand. And so a lot of the studies he pulls in, you know, some of course represent American classrooms, but they also represent New Zealand classrooms, Australian classrooms, schools in Japan and South America, like all different places. And it's across all different contexts and all different groups of students. So yeah, you have to take those overall average effect sizes with a pretty big grain of salt, because you're lumping in students who might be from very different contexts, age groups, grade levels, subjects, classrooms, all different teaching styles, all different things, and kind of distilling down to that one finding, that one number, based on just one set of findings without taking the whole larger context into consideration.

Zac:

I think he tries to answer that a little bit in the introduction where he talks about, you know, I started writing this book here. Then I was in Australia. Then I was in North Carolina. And then I was in New Zealand. So I think he tries to answer that by saying, so I've seen education in a lot of places. And I believe that that is true. And people have kind of belabored that point. But I think that what you're saying is well taken. So those are kind of the flaws, right? Those are the kind of worries and where we feel like we should itch and chafe and push back against some of this. I wonder if we can think about the positives in this space. So for me, I think about a saying I heard from Richard Elmore, which was language is culture and culture is language. And so as much as I just said, we were given the top 20 and said to do these things. If I were a school that had or an institution that had a broken culture or no culture or a lot of really new people to do a job, being able to look at those 20 things and have a shared definition of what we are working toward, I think would be a really, really helpful thing to say, all right, what is it I'm supposed to be doing if I'm new at this? Or what are the expectations of us in this space? And what's the common language so that when we come together in a faculty meeting, what can we all talk about doing? And if those 20 are the thing that we're focusing on, that does give me a chance to build a common language and thereby a common culture.

Stephanie:

Yeah. So I think what's nice about this work is that it really kind of brought to light the magnitude of research that had been done in education. You know, when I worked in the Department of Education, it was really surprising how much research had been done and funded. And like, it never went anywhere into the classrooms. Like, it wasn't until I worked for the US Department of Education that I even knew the What Works Clearinghouse existed within the Institute of Education Sciences. And so knowing that there was this magnitude of ed research that had been compiled into a singular report, essentially, as something that could be used as, you know, like a rudder, if you will, for schools that needed to know where to go, I think was a pretty positive aspect of it.

Maggie:

I think additionally, one of the really strong positives is that... I think there's a lot of, you know, popular discourse right now that's like, everything's the teacher's fault and everything they do is their own fault. And I think what's nice about this is it kind of breaks down the different aspects of learning, whether it's the teacher, the curriculum, the student aspects, the school aspects. Like, it kind of brings in all these different elements of the whole picture of what it means to learn in a classroom. And while it does treat those kind of separately as their own individual pieces, it does show that there are all these different aspects that go into teaching and learning that aren't just, it's a teacher and she's not doing well. And so I thought that was great. And also it gives a lot of different examples and different kinds of approaches you can use.

Zac:

I know that we are focusing on the positives right now, but you just brought up something we haven't talked about at all in this episode that is a thing that drives me completely banana pants. And that is the teacher as the most important aspect in a classroom, and the way that, even though you said there are a lot of different factors, that finding has been used as a weapon. Right. Like, you are the most important or effective or impactful factor in a classroom as a teacher. So it isn't as though teachers did not know that. And I think that when we look at effect size and we don't know what those numbers mean and we see that it's bigger than the other numbers, we think that that is the thing, that if we could just change how teachers do this or think about these things, then everything else will come out in the wash. If you're listening to this podcast and you are not in education, every teacher you know or have ever met has struggled with the fact that, yes, they are the most important factor in a student's education, but they are not the only factor and they cannot on their own overcome things. If you scroll down to the bottom and see the negative effect sizes, one of those pieces is depression, right? Depression has a negative effect size on student learning.

Stephanie:

Shocking.

Zac:

Well, yeah, this is not a surprising finding. I don't think anybody looked at the findings and was like, oh, really? But my question then becomes (I think it's like a negative 0.6, I don't remember, I'm not looking at it right now), can that positive teacher expectation of a student overcome depression? Can a positive teacher expectation overcome poverty, hunger, anxiety? Like, I think that is an issue that gets lost in this ranking. I have a question for you, Maggie. Talk to me about negative effect sizes.

Maggie:

Sure. So, in the site, they did find a few. A lot of them weren't very strong negative effect sizes, but there were some. Essentially, like you mentioned with your depression example, they are factors that they have found have a net... If a group of students has depression versus a group of students who doesn't, the kids with depression would have lower achievement levels than the group of kids with no depression. And so another example they gave was the effect of long summer vacation on student learning. Again, this is a very, very small negative effect size. It's like negative 0.09, so basically zero. But again, that would say that the more summer vacation you have, there is a negative impact on your academic achievement, at least at the beginning of the school year when they measured it. But even that was sort of problematic because that value is only based on one meta-analysis, and that only had a few studies within it to begin with. And so again, if you just took that one value, you would say, oh, I guess we should shorten summer break for everybody. But if you really thought about it, it's like, that's not very negative or very strong. It's very close to zero. And so we shouldn't cancel all of summer break just because this value was right on the bubble of zero.

Stephanie:

It kind of is interesting to me to hear that, you know, an extended summer break has only a minimal negative effect size, for how often we hear about the summer slide, right? And how to mitigate summer slide. It's like, well, is it really that drastic of a slide at that point? Like, if it's only tiny, are we freaking out about nothing? Or is it just a result of, you know, not having enough data to back it up? Would there have been a greater negative effect size if there was more data to go behind it?

Maggie:

Right, exactly. And I think that meta-analysis was in the mid 90s. And so I would be curious to see if you updated that now, how that value would change if it would change at all. I suspect it would.
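One way to get a feel for how small negative 0.09 is: under the simplifying assumption that both groups' scores are normally distributed with the same spread, an effect size d can be converted into the chance that a randomly chosen student from one group outscores a randomly chosen student from the other. The sketch below uses that standard conversion. The function name is made up, and the normality assumption is an illustration rather than a claim about the summer vacation data.

```python
from math import erf


def probability_of_superiority(d):
    """Chance that a random member of group A outscores a random member of
    group B, assuming normal scores with equal spread in both groups.

    This equals Phi(d / sqrt(2)), which simplifies to 0.5 * (1 + erf(d / 2)).
    """
    return 0.5 * (1 + erf(d / 2))


p = probability_of_superiority(-0.09)
print(round(p, 3))
```

Under those assumptions the result sits just below one half, so the two groups are close to a coin flip. That matches the description of the value as basically zero.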

Zac:

Okay, so those are positives. Mostly, we tried to stay positive. It was a little difficult. Cynics that we are. I think it's interesting as we go through the book, and in fact, in the first edition, Hattie says, this is what I'm not trying to do, right? So there's at least a nod to unintended consequences. And I keep thinking about the law of unintended consequences and our conversation with G at the top of the episode and what people latch onto, you know, what is the worst consequence of our best idea? So Maggie, again, not ragging on him, but what is the possible harm of all of this?

Maggie:

I think one of the possible harms is you're combining a lot of studies together to get these sort of average effect sizes, this Cohen's d here, that may not actually be super related to each other in any other real form. There's a lot of different contexts and a lot of different aspects and, as I mentioned before, different countries, different grade levels, different subjects, all different things, all being combined into one place. And then furthermore, I think this maybe was one of the purposes of creating this, but in education research in general, replication rarely ever happens. In fact, one study found that only about 0.13% of studies are replicated in educational research. And so I think part of the logic of why this was created was to help bring research together, even if it wasn't exactly the same study over and over, at least to try and find these sort of general patterns across the same types of approaches and learning curricula and things like that.

Zac:

Stephanie, as somebody who thinks about data science a lot, Maggie just said these things aren't replicated. Why do I care that education research is not replicated?

Stephanie:

I mean, part of the design of the data science process is that it's iterative, right? You ask the question, you collect the data, you analyze the data, you come up with your findings, and then you go back to the drawing board. You do it again. It's not just a one-time thing, right? So if I'm in a classroom and I'm teaching a lesson to students, one of the things that I need to do is consistently check for understanding. If I wait for the very last day of the unit to give the assessment and, you know, two-thirds of my kids don't pass it, that says something. But if I'm doing repeated checks for understanding, that's going to give me a better insight as to what my kids need assistance with, right? And that's why it's so problematic that we're not repeating things, but then they get touted as what works after that singular study. If you can't replicate it, then how do you know that it works? And I feel like that's a big thing in ed research in general, right? We do these studies, we do this research in a particular set of classrooms, in a particular location, and then we don't go and try to replicate it in a different state or a different district or different grade level. Right. But suddenly this is labeled as something that is beneficial to all students. Good teaching is good teaching. Like, well, okay, but every classroom is different.

Zac:

In a larger context, this makes me think of the initial conversations we were having about a decade and a half ago around climate change. Right. Those who criticized climate change activists and those claiming we needed to take action said, well, yeah, but you're just one study, right? And so what the scientific community did was say, we're going to do these studies over and over and over again, and we're going to make them comparable so that what we think here in Greenland or Denmark or the US or a country named here, we're able to replicate those findings so that we have greater certainty of what's going to happen. And as much as there still are people who don't want to pay attention to that evidence, as there are in any field or on any topic, more people have signed over to the, oh, we should probably do something about that. And so replicating that when we're talking about the education of human beings also seems important.

Maggie:

But it's also really difficult to do. I'm sure anyone who's taught has the experience of teaching a lesson in the morning to one group of students and it went beautifully and went excellently. And then it's the exact same lesson, just a different group of students in the afternoon and it bombs out terribly. So even with the same teacher, same lesson, same school, same classroom, controlling as many factors as you can, doing that exact same lesson might be very different in the afternoon from the morning or even Tuesday versus Thursday or something like that. So this is very difficult to do because there are so many factors for when you're teaching children.

Zac:

Okay. So, data scientists, researchers, friends of mine, what would we recommend or what would you recommend to folks who are thinking about or interpreting educational research or research in general? Let's use education as the lens, but I think these are probably going to be applicable in a larger sense as well. What are three things we should keep in mind when we hear "research says"?

Maggie:

I think the biggest one is who is the sample and how big is it? Like, who are these students? How big was your sample? Was there a diversity of backgrounds and ages and different things? Or is it all a very homogenous sample? I think that's a big one. Because I think if it's a very homogenous, very small sample, then the generalizability of those findings is pretty limited. And, you know, you can say research says, but really it's just, you know, Miss Green's third grade classroom research says that. Actually, it's not applicable to everybody. I think the other thing to think about is how did they actually define what they were looking at in this study? So what I mean by that is, if we're saying academic achievement, what do you mean by that? Is it just kids' end-of-the-year state test scores? Is it some kind of end-of-unit assessment? Is it some kind of exit ticket? How are you defining the things you're actually studying? And does it make sense with what the actual findings and results are saying? And then the third thing would be, I think, the overall study design and research design. A lot of things in learning take a long time to actually really sink in for students. And was the experiment even long enough to really teach what you needed to teach and actually find what you wanted to find?

Stephanie:

I love what you just said. How often have I been in a classroom or in a district setting where there's been some kind of initiative that has been put into place as being the thing that we have to do? And then it changes the next year because we didn't get the results that we wanted right away, right? Sometimes the change takes three to five years and we have to give it an opportunity to make that shift, you know? So I know that's not fully in terms of what we're talking about here, but this sweeping generalization and how long of a period of time has passed in order for things to be internalized, I feel like there is a connection to a lot of that. But also, I have a question, if you don't mind expanding a little bit in terms of this sweeping generalization to the larger field of learning. To me, that sounds like extrapolation. Is that what you mean?

Maggie:

Yes, I think a lot of times with these very small, you know, 30-student classrooms, you look at two or three classrooms, and you're applying that to like, oh, all students learn best by using note cards. And that's sort of one of the findings you find. It's like, well, for those, you know, 60 or 90 students, that was a helpful vocabulary intervention. But can you really extrapolate that to all kids across all grades for all subjects? Like, probably not.
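A rough way to see why a couple of classrooms cannot carry a sweeping claim is to look at how uncertain an effect size is at different sample sizes. The sketch below uses a widely cited large-sample approximation for the standard error of Cohen's d. The effect size of 0.5 and the group sizes are hypothetical, and the 1.96 multiplier assumes an approximately normal sampling distribution.

```python
from math import sqrt


def se_of_d(d, n1, n2):
    """Approximate standard error of Cohen's d for two independent groups."""
    return sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))


def approx_95_interval(d, n1, n2):
    """Approximate 95% interval: d plus or minus 1.96 standard errors."""
    margin = 1.96 * se_of_d(d, n1, n2)
    return d - margin, d + margin


# Hypothetical: the same observed effect (d = 0.5) at two very different scales.
small = approx_95_interval(0.5, 30, 30)      # one classroom per group
large = approx_95_interval(0.5, 3000, 3000)  # thousands of students per group
print(tuple(round(x, 2) for x in small))
print(tuple(round(x, 2) for x in large))
```

With 30 students per group the interval is wide enough to include values near zero as well as very large effects, while the larger sample pins the estimate down much more tightly. Even a tight interval only describes the kinds of students who were actually in the sample.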

Zac:

And I know what it means, but just, you know, for people who don't know what it means, extrapolate is not a word that a lot of people use every day. What does that mean?

Maggie:

So it's basically like taking from the smaller sample of research that you're doing and then sort of making a larger claim about how learning is done, for example, or what's the best practice for this curriculum or whatever it is.

Zac:

So if something is true in Montana, it may not be true in the Bronx.

Stephanie:

Yep.

Zac:

Gotcha. Maggie, thank you so much. This has been great and helpful.

Stephanie:

I'm so glad that you joined us, Maggie.

Maggie:

Thank you for having me.

Zac:

And thank you everybody for listening to this episode of Academic Distinctions. Hopefully we've added some context and created some understanding around the work of Hattie.

Stephanie:

Thank you so much for joining us today on this episode of Academic Distinctions. We, your pod hosts, believe in you and know from Hattie's list that your learning will be improved if you talk about what you learned today. So share with your friends, your family, your doctors, your pets, anyone who will listen. Follow us on Instagram at academicdistinctionspod. Find us on Bluesky at Fixing Schools or find us on Facebook. As always, this is your call to action to share the podcast, like us and subscribe. You can find us online at academicdistinctions.com buzzsprout.com. If you have a question for the pod or a topic you'd like us to dig into, email us at mail@academicdistinctions.com. Until next week, friends. This podcast is underwritten by the Federation of American Scientists. Find out more at fas.org.
