Toward better measurement in K-12 education research

Billy Buchanan, Director of Data, Research, and Accountability, Fayette County Public Schools, Lexington, Kentucky, expresses frustration with the disconnect between the large and important goals of education research, on one hand, and the gaps in measurement and statistical training, on the other.

Buchanan writes:

I don’t think that every classroom educator, instructional coach, principal, or central office administrator needs to be an expert on measurement. I do, however, think that if we are training individuals to be researchers (e.g., PhDs) we have a duty to make sure they are able to conduct the best possible research, understand the various caveats and limitations to their studies, and especially understand how measurement – as the foundational component of all research across all disciplines (yes, even qualitative research) – affects the inferences derived from their data. . . .

In essence, if a researcher wants to use an existing vetted measurement tool for research purposes, they should already have access to the technical documentation and can provide the information up front so as our staff reviews the request we can also evaluate whether or not the measurement tool is appropriate for the study. If a researcher wants to use their own measure, we want them to be prepared to provide sufficient information about the validity of their measurement tool so we can ensure that they publish valid results from their study; this also has an added benefit for the researcher by essentially motivating them to generate another paper – or two or three – from their single study.

He provides some links to resources, and then he continues:

I would like to encourage other K-12 school districts to join with us in requiring researchers to do the highest quality research possible. I, for one, at least feel that the students, families, and communities that we serve deserve nothing less than the best and believe if you feel the same that your organization would adopt similar practices. You can find our policies here: Fayette County Public Schools Research Requests. Or if you would like to use our documentation/information and/or contribute to what we currently have, you can submit an issue to our documentation’s GitHub Repository (https://github.com/fcps/research-request).

53 thoughts on “Toward better measurement in K-12 education research”

    • It could be fairly long, or short, depending on how you define an improvement in learning. For example, there is an intervention program named Math Recovery for which there is some modicum of evidence related to potential effectiveness (https://www.evidenceforessa.org/programs/math/elementary/math-recoveryr-intervention). However, the measure being used for the outcome (WJ-III for short) is not aligned to any content/performance standards used in any states. Bearing that in mind, would you say that this is evidence of an improvement in learning if the improvement studied is measuring a different – but related – construct to the construct that should be affected? This also affects the ability of researchers to replicate the results if the outcome varies between states; this is probably another reason why for larger multi-state/multi-site studies the researchers may select a measure that is not in operational use in the educational organization.

      The other consideration is that not all education research is directly aimed at learning. For example, if someone were studying a behavioral intervention, it would be reasonable to assume it may have tertiary or quaternary effects on learning outcomes, but they would be much more challenging to detect reliably without a massive sample. If that intervention has the intended effect it would still be good to know in either case.

      • What I have in mind is the kind of intervention that has shown effectiveness in real world applications. An intervention such as Math Recovery doesn’t count. A tutoring program requiring almost daily tutoring sessions for 12 weeks and providing a moderate temporary boost that goes away in one year does not seem worth it.

        Measuring real world impact of an education intervention is difficult given all the confounders and disagreement on how best to measure learning outcomes. However, interventions that cost billions of dollars should have effects strong enough to show up in imperfect measures such as standardized test scores under reasonably careful analysis.

        • Reading Recovery (the reading analog to Math Recovery) gets significant grant funding on a regular basis; so I’m not exactly sure what you mean by effectiveness in real world applications, when both Reading and Math Recovery are deployed in similar ways at fairly large scale across the US. There are ways to deal with confounding that are fairly well established in the literature, so I’m not completely convinced that confounding is the true culprit here. That all said, I think your bigger point about cost effectiveness and efficiency is definitely valid. However, before making statements about cost effectiveness we need to know whether or not the interventions work. In this case, Reading and Math Recovery might not provide the greatest yield for the investment needed to implement them.

        • “before making statements about cost effectiveness we need to know whether or not the interventions work.”

          I don’t think that’s true at all. In fact, the core quantity to estimate in these issues is expected improvement on some achievement scale per dollar spent (let’s say expressed dimensionlessly as a fraction of the same quantity for some standard treatment).

          The idea that first you establish a positive effect (say by p less than 0.05 in a sufficiently powered study) and then you establish the cost effectiveness in some second study is basically a technique to get grant funds, not an effective decision making technique. It lets you argue that we should spend money studying things without first considering if it’s cost effective to even bother with.

        • Daniel,

          So would a cost estimate be valuable for an intervention that doesn’t work? Wouldn’t the decision – in an ideal world at least – be to not use an ineffective intervention regardless of the cost? Or to put it another way, what value would be gained from estimating the cost effectiveness of an intervention that lacks evidence of having an effect?

          I’m not arguing for Frequentist or Bayesian approaches or that the cost effectiveness needs to be established in a second study. If in the initial study there is evidence of a causal effect it seems much easier to quantify the cost per unit change, but if there is no evidence of an effect why would anyone be inclined to invest further resources attempting to quantify the cost effectiveness of something that doesn’t have an effect? Would there even be a valid estimator of the cost effectiveness of an ineffective intervention or would it be written off as a loss by and large?

        • Sometimes reasonable use of background knowledge can be worked into assumptions that facilitate a reasonable sense of expected cost/benefit before the first study is ever done.

          I was with a group that did this for randomised clinical trials back in the 1980s, where we even supplied a program to allow others to specify/estimate a prior for a treatment effect based on past study outcomes of “similar” treatments, and an adoption rate function based on estimated effectiveness/side effects from the prior predictive estimates that would flow from the study design.

          Actually it was a lot of fun, and for drug trials, given the huge cost of adoption and the horrendous cost of adopting treatments that are harmful, large trials were almost always the most cost-effective. OK, the funding agencies did not like the result at all.

          But for one treatment for infection prevention in childbirth, similar ideas led to a very large cost/effective ratio, as the treatment was very inexpensive and there were very few concerns about harms or side effects. If I remember correctly the treatment was just adopted with a registry to track harms or side effects.

          Also, Daniel might be right – the cost/benefit of intervention studies in education may not be that attractive.

        • “So would a cost estimate be valuable for an intervention that doesn’t work?”

          Absolutely, and not just a cost estimate for the intervention but a cost estimate for the study required to effectively estimate the effectiveness.

          A tool that is useful for understanding these kinds of issues is taking things towards limits. So for example suppose you are proposing an intervention that costs $1B per child and you believe that it will raise all the subjects’ SAT scores to 1600 uniformly.

          I can tell you already that whether it works or not even taking the time to write up the grant is a waste of resources. The value of raising one child’s SAT scores to 1600 is undoubtedly 3 decimal orders of magnitude less than the cost at least.

          On the other hand suppose you have a scheme that will take $1 per student and will on average move SAT scores by on the order of 1 point. All it will take is to first establish effectiveness using a $100M grant for a 10 year period to study 100k children longitudinally. Again, I can reject your proposal, even if it turns out to raise SAT scores by 9 points, substantially more than the expected amount. There MUST be substantially better uses of $100M in education. The cost to establish the usefulness of the intervention is simply too high.
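The arithmetic behind these two limiting cases can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the $1M value placed on a perfect score is an assumed placeholder chosen to match the "3 decimal orders of magnitude" remark, not an estimate taken from the comment.

```python
# Sketch of the two limiting cases above. All names are hypothetical.

def cost_to_value_ratio(cost_per_child: float, value_per_child: float) -> float:
    """Dollars spent per dollar of assumed benefit."""
    return cost_per_child / value_per_child


def study_cost_per_participant(study_budget: float, n_children: int) -> float:
    """Dollars of study budget spent for each child enrolled in the study."""
    return study_budget / n_children


# Case 1: $1B per child. ASSUMPTION: a perfect SAT score is worth at most
# $1M to the child, i.e. three orders of magnitude below the cost.
case1 = cost_to_value_ratio(cost_per_child=1e9, value_per_child=1e6)

# Case 2: a $1 intervention evaluated with a $100M study of 100k children.
case2 = study_cost_per_participant(study_budget=100e6, n_children=100_000)

print(f"Case 1: ${case1:,.0f} spent per $1 of assumed benefit")
print(f"Case 2: ${case2:,.0f} of study cost per child, for a $1 intervention")
```

Under these assumptions both cases come out at about a thousand to one, which is the sense in which either proposal could be rejected before any data are collected.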

        • @Daniel Lakeland,

          In both of the examples above you’ve assumed that there is some marginal return to the intervention (e.g., either raising scores to 1600 uniformly, or getting a 1 point increase on average). If, however, we knew that a third intervention would not change SAT scores at all, how would you attribute cost effectiveness to it beyond writing it off as a total loss? In many instances around the US districts are using interventions without having an understanding of whether or not they have any effect, but I’ve never met an education administrator who would say they would be comfortable investing a non-trivial amount of money on an intervention that wouldn’t have an effect (barring field trips which tend to be more about exposure than any dramatic shifts in learning).

        • @Keith O’Rourke,

          Do you happen to still have the tool that you mentioned available? I think it’d be great if there were some efficient ways to create some type of “what-if” engines that educational leaders could use as part of their decision making.

        • Billy: It was 30+ years ago.

          This is the only written record – Are Clinical Trials a Cost-effective Investment? Allan S. Detsky, JAMA 262(13).

          You might wish to look at more recent work (don’t know if it’s better): Value of Information Analysis in Oncology: The Value of Evidence and Evidence of Value http://ascopubs.org/doi/full/10.1200/jop.2013.001108

          And one of the earlier write ups on this topic – Yates, F. (1952). ‘Principles governing the amount of experimentation in developmental work’. Nature, 170, 138-140.

        • Of course few propose interventions that are known to make kids do worse, or to spend money when the effect is known to be zero. But my point is that we rarely know what the effect size will be until we’ve spent some money, maybe a lot of money, on studying it. But if we think about costs before we’ve even done the initial study to establish a positive effect, we can already do a cost effectiveness calculation. Does it make sense to even spend the money to find out how big the effect really is given what we know about the range of plausible effect sizes? Few $1 interventions exist that bring kids reliably from SAT of 1000 to SAT of 1600. If something could be slightly negative to slightly positive, and it costs a bunch of money to even find out, and there are better things to do with the money than even find out… we should spend our money on those other things. It’s fine to say “we don’t really know how effective handing out toothbrushes at schools really is at reducing dental caries, but given that the effect is probably around X size, has individual variation of s, and costs D dollars per student in the study, spending enough money to find out how big the effect size is already wastes money compared to just spending that same money to do alternative proposal Z.”

          Cost effectiveness of a study, and of an intervention, should be evaluated *before* we do the study and potentially waste the money. This is in essence the Bayesian alternative to what Frequentists would do for a “power analysis”. In Bayesian Decision Theory you actually decide whether a study is good or not based on whether it is expected to produce a desirable net outcome (measured in Dollars for example), rather than whether it’s expected to detect an effect with a p value less than 0.05 in 80% of study instances. If it costs you a billion dollars to find out that handing out toothbrushes has a tiny negative effect, you’d be better off not finding out.
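One simple way to make this comparison concrete is the expected value of perfect information (EVPI), which bounds from above what any study of the effect could be worth for a given adopt / don't-adopt decision. The sketch below is not the calculation from the comment above: the function name, the three-point prior, and every dollar figure are invented for illustration.

```python
# Expected value of perfect information (EVPI) for an adopt / don't-adopt
# decision, using a small discrete prior. All numbers are made up for
# illustration; none come from the discussion above.

def expected_value_of_perfect_information(prior, value_per_unit,
                                           cost_per_student, n_students):
    """prior: list of (effect_size, probability) pairs summing to 1."""
    def net_benefit(effect):
        # Dollar value of adopting for everyone if the true effect is `effect`.
        return n_students * (value_per_unit * effect - cost_per_student)

    # Decide now, on prior information only: adopt only if it pays on average.
    value_decide_now = max(0.0, sum(p * net_benefit(e) for e, p in prior))
    # Decide after learning the true effect: adopt only where it pays.
    value_with_info = sum(p * max(0.0, net_benefit(e)) for e, p in prior)
    return value_with_info - value_decide_now


# Hypothetical prior: the effect could be slightly negative to slightly positive.
prior = [(-1.0, 0.25), (0.0, 0.50), (1.0, 0.25)]
evpi = expected_value_of_perfect_information(
    prior, value_per_unit=10.0, cost_per_student=2.0, n_students=100_000)
study_cost = 1_000_000.0  # hypothetical price of the study

print(f"EVPI = ${evpi:,.0f}; study cost = ${study_cost:,.0f}")
print("Run the study" if evpi > study_cost else "Skip the study")
```

With these made-up inputs even a perfect study would be worth less than it costs, so the money is better spent elsewhere. A real study yields imperfect information and is therefore worth less than the EVPI.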

        • @Daniel,

          Thanks for the clarification. I get where you’re coming from at this point and hadn’t been thinking about it from the same perspective initially. It definitely makes sense for sure. I guess the big challenge now is still monetizing outcomes consistently/reliably to do more of this work a priori while avoiding some of the challenges with misidentification (e.g., if the potential cost benefit is truly greater than what would be estimated a priori).

          @Keith,

          Thanks for the reference. I think something like the approach you mentioned combined with what Daniel discussed could provide a really interesting framework to support decision making about programs before using them.

        • Daniel:

          Thank you for these insights. I have often wondered lately, when reading about education experiments and “reforms,” whether it would have been better just to give all the money that was spent studying the theory directly to the intended beneficiaries instead. A lot of this work has the odor of “full employment for the (upper middle class) researchers,” as you suggest.

        • Kyle,

          As I mentioned there are way too many researchers; variable in quality obviously. And yes it is to secure employment for mostly the beneficiaries of upper middle class upbringing. Any way we cast questions relating to K-12 education invariably leads to queries about pedagogy in higher education and media culture.

        • @Kyle C,

          In some instances it makes complete sense to bypass trying something out. The shift from phonics based reading/literacy to whole word approaches was a great example. Phonological, phonemic, and phonetic skills are all – to some degree – prerequisites of whole word reading; it would be like trying to teach students multiplication if they’ve never experienced addition in any form. Sure it might be possible, but I think there would be consensus that the prerequisite skills are needed to do things more efficiently. One of the bigger challenges, however, is building research fluency and smart consumerism in the K-12 space. Outside of individuals who might work in research offices or who’ve gone through a doctoral program, consumption of research is not emphasized strongly enough in the preservice pipeline – on average – to equip a critical mass of professionals in the education sector with the skills to carefully evaluate research. What ends up happening is that sales reps will throw marketing materials that make all sorts of causal claims in front of potential buyers while appealing to their applied training; basically show them something that seems like it would be easy for most classroom educators to implement and then throw out some wild causal claims about the product’s effectiveness. If we built up a critical mass of professionals with that skill set it wouldn’t prevent all suboptimal decisions from being made, but could potentially avoid some of the worst product/service decisions.

          @Sameera,

          I’m not sure how media culture would fit into things, but there are definitely challenges in the pre-service pipeline. For example, how many undergraduate programs require pre-service educators to have even an intro psychometrics course? The course wouldn’t need to be applied and wouldn’t need to dig into the underlying math much, but could provide a foundation for understanding concepts like measurement invariance/DIF/DTF, what test characteristic curves and test information functions are and their implications for inference about test takers, what equating/linking is and isn’t, etc… I’m not sure that there are too many researchers or that their motivation is to secure the benefits of upper middle class upbringing. Some researchers genuinely care about improving conditions for children and making systems more efficient and effective.

  1. A big problem here is researchers’ investment in a particular result. Many know from the start what they hope to demonstrate: that charter schools do or don’t “work,” that teachers’ education levels do or do not affect the quality of their instruction, that group work helps or hinders learning, that “growth mindset” interventions boost or do not boost test performance, etc. Such expectations can skew the research itself, since the truth may be provisional and conditional. So yes, I agree heartily with Buchanan that “[those involved in teacher preparation] have a duty to make sure [educators] are able to conduct the best possible research, understand the various caveats and limitations to their studies, and especially understand how measurement – as the foundational component of all research across all disciplines (yes, even qualitative research) – affects the inferences derived from their data.” (The text in brackets is my own tentative interpretation, since I was not entirely certain who he meant by “we” and “they.”)

  2. I’m not sure though that the statistics community agrees on what constitutes good measurement, which, in turn, makes it difficult to settle on any one measurement tool. Perhaps there could be a course and lecture series that specifically focuses on what has or has not worked in measurement, so that future researchers can avoid some of the pitfalls faced in the past.

    • Psychometrics has a fair amount of history. While there aren’t any single overall measures that are universally accepted as a type of gold standard for all purposes there is a fair amount of consensus on important factors in developing sound measurement tools; the Standards for Educational and Psychological Testing provide a fairly good framework to start from.

      • Billy,

        Thanks. In rereading your LinkedIn article, I don’t doubt that there is a gap between measurement/statistical training and the larger goals of education research. The reality is that there are statisticians who may not understand the statistics concepts you listed in the article. Such is the discussion that is going on among the thought leaders in statistics and epidemiology, to name two fields.

        • That definitely makes sense. Any suggestions on how we could point others in the right direction if they aren’t already familiar with some of these concepts?

        • The right directionS (or better the least wrong directionS) of empirical inquiry will continue to change as fields develop.

          “we should try to avoid anything that resembles a prescriptive approach to inference that instructs scientists THIS IS HOW WE DO IT” http://statmodeling.stat.columbia.edu/2018/10/12/limitations-of-limitations-of-bayesian-leave-one-out-cross-validation-for-model-selection/#comment-902963

          On the other hand, nothing could be worse than giving up.

        • Billy,

          I have been thinking about the very same question you raise because, from my understanding, a large percentage of researchers have some quantitative background. But subsets may not be apprised of the controversies over uses of some measurement tools. Researchers apply them without understanding the conceptual/theoretical basis for their applications. Admittedly the conceptual/theoretical may take a decade for researchers to grasp. Andrew’s blog is one venue that should be recommended to researchers. I’m sure some percent already peruse this blog.

          I would hope that the statistical societies are providing more venues to discuss the issues you raise.

          My hunch is that some small % of statisticians will be at the helm of statistical reforms. There are way too many researchers to begin with. I would prefer to hear more from what kids think about their education.

          Bottom line, we need better instructors in statistics particularly.

      • Billy:

        It is this consensus around a false belief in the precision that is simply not possible which is a fundamental problem in psychology and directly related to the problems in replication we see today. There appears to be a naive belief that the crude assessment of broad concepts can be made precise by increasing the size of the training sample. And while there is in fact consensus, it is not universally shared: http://statmodeling.stat.columbia.edu/2016/09/22/why-is-the-scientific-replication-crisis-centered-on-psychology/#comment-902494

        • The same can be said of the naive belief that pre-registration can solve these problems. It appears to be an attempt to dress crudeness and substantial uncertainty in the scientific garb of apparent precision and certainty.

        • Hi Curious,

          I’m not sure if you’re referencing the number of participants or number of items in your statement above. After reading the comment that you linked to as well as the comments here, what suggestions do you have to improve things? While challenging assumptions and understanding is great, what is even better is to improve the field. To that end, I fail to see anything in your comments that would achieve that goal (e.g., improving the field). If I have missed something please feel free to redirect my attention to what I may have overlooked.

  3. I have to say that I am a bit surprised that no one has commented about how we are getting researchers to preregister their plans as part of our approval process.

    • Preregistration is the devil’s plaything, a terrible technique to keep people from having to do the hard work of rethinking p values and moving to a Bayesian decision theoretic framework. It’s designed as a fortification to hide behind and an armory to shoot the enemy with.

      ;-)

      • I think preregistration is important regardless of whether or not individuals are using frequentist or Bayesian methods. From my perspective preregistration is fundamentally about replicability and isn’t a terribly high threshold; usually anything asked for preregistration is what someone would need to do during their dissertation (e.g., specifying methods, models, data collection, business rules, etc… a priori). That said, do you see preregistration having different utility based on the statistical approach that people are taking to the data? One of the reasons why the documentation for our request process lives in GitHub is to promote discussions like this along with evolving and updating the process with time in a transparent way. When we’ve had questions from researchers about the process, we’ve encouraged them to submit an issue and/or pull request with suggestions to improve the process overall.

      • @Daniel

        I disagree. Even if you don’t use p-values I’d trust the work more if I know it was pre-registered. It’s about not shifting the goalposts. It’s about knowing exactly what intervention you set out to test.

        I don’t like the scenario where all studies are only exploratory.

      • So, obviously the winking smiley was intended to convey that I was being a little extreme on purpose. I think there’s a lot of happy talk about preregistration and I wanted to push back against it a bit. I don’t have a problem with people stating their hypothesis before collecting data, but I do have a problem with what I perceive to be the next step: claiming people’s analyses are invalid because they didn’t pre-register their analysis, or claiming that you can’t or shouldn’t publish an analysis because it isn’t the one you pre-registered, etc.

        I think a much better alternative to preregistration is a comprehensive Bayesian model in which all the main plausible mechanisms are included. If we want to decide what’s going on, we should compare all the possibilities simultaneously and see if there’s one that does a substantially better job than the others.

        Another important alternative to preregistration is to demand that data be made public. Then if there are alternative models to be considered that the original researcher didn’t consider, a third party can run an analysis comparing them.

        Both of these kinds of transparency are substantially better than preregistration, and extremely substantially better than the cargo-cult version of preregistration: “you didn’t preregister this so it’s not science / invalid / ignorable” type stuff.

        • + 1 to the spirit of Daniel Lakeland’s comment. There is an unfortunate tendency to impose crude binary decision rules in science – although increasingly I think this is about attentional filtering as much as it is about scientific judgement. Replacing dumb reliance on p values as a filter, with p-values + pre-registration IS one step forward, but provides fodder for NOT fixing the problems…

        • Basically, pulling off the kinds of analyses that Daniel, or for that matter, Andrew or myself or really most folks here, would want to see, requires a good deal of time and energy spent both in learning the mechanics, and the reasoning behind the mechanics, in addition to greater time and energy in any given application. But, as is well-known here, the incentives in much of the game favor quantity over quality, especially as responsibilities pile up. I am glad I did much of my “extra-curricular stats” learning while still a graduate student! Those of us who insist on the careful, thoughtful approach definitely take a hit, at some level. I am willing to absorb it for the sake of integrity, but I increasingly appreciate why the simple filtering games of significance/not-significant, pre-registered/not-pre-registered have come to such prominence in many academic circles.

        • Chris:

          Agree – and I had a strange position that allowed me ample “extra-curricular stats” learning time or at least a director who realised it was in their interest. Although it cost me after I did a PhD afterwards and funding agencies disallowed me from applying for funding as a new researcher status because of that.

          Now there likely is also the challenge that those seeking statistical expertise do not want to have to grasp any of it but rather just want assurance they are working with an actual expert who is behaving in good faith http://statmodeling.stat.columbia.edu/2018/01/23/better-enable-others-avoid-misled-trying-learn-observations-promise-not-transparent-open-sincere-honest/

        • Keith,

          Those are the categories of concerns I’ve been juggling, in an informal way. I couldn’t have explained them as cogently and elegantly as you have. You have had decades of experience with the epistemic environment. I come at it from an altogether different domain. But I see the same or similar concerns across different disciplines. Thanks for referencing that article.

        • I definitely agree with what you are saying here. It isn’t our intention to claim that if a study isn’t preregistered that it isn’t and/or can’t be valid. It is a combination of getting potential research partners to specify exactly what their intentions are and to guard against p-hacking. If someone approached us saying that they intended to do some type of data mining we could be ok with that so long as they weren’t trying to make a bunch of causal claims. Exploratory analysis can definitely be useful/helpful and we definitely don’t want to get in the way of that. Basically we are addressing the transparency issue to the extent allowable under current law (i.e., FERPA). One of the bigger challenges that I would see with analyzing random data sets in education that are stripped of all identifiable information is that it can prevent others from modeling non-ignorable policy constraints that may exist in some locations. That said, I definitely appreciate the clarification and expansion.

  4. (Did all the comments on this page get deleted? I can see a dozen in the RSS but none actually on this page now.)

    yyw: education is not my specialty but the largest meta-analyses I can think of on effects of education are:

    – “The Production of Human Capital in Developed Countries: Evidence from 196 Randomized Field Experiments” http://scholar.harvard.edu/files/fryer/files/handbook_fryer_03.25.2016.pdf , Fryer 2016

    – “Randomized Controlled Trials Commissioned by the Institute of Education Sciences Since 2002: How Many Found Positive Versus Weak or No Effects?” http://coalition4evidence.org/wp-content/uploads/2013/06/IES-Commissioned-RCTs-positive-vs-weak-or-null-findings-7-2013.pdf , Coalition for Evidence-Based Policy 2013

    Taking a cynical Bayesian meta-analytic point of view, we see loads of small effects which should be shrunk closer to zero and lots of signs of small-study biases.
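The shrinkage mentioned here can be illustrated with the simplest normal-normal model, in which each study's estimate is pulled toward zero in proportion to how noisy it is. This is a minimal sketch under stated assumptions, not the analysis behind the comment: the zero-centred prior, the `prior_sd` value, and both example studies are hypothetical, and the model does nothing to correct small-study bias itself.

```python
# Normal-normal shrinkage of a single study's effect estimate toward zero.
# ASSUMPTIONS: true effects ~ Normal(0, prior_sd^2); the estimate is normally
# distributed around the true effect with the reported standard error.

def shrink_toward_zero(estimate: float, std_error: float, prior_sd: float) -> float:
    """Posterior mean of the true effect given one noisy estimate."""
    weight = prior_sd**2 / (prior_sd**2 + std_error**2)
    return weight * estimate


# Two invented studies reporting the same effect (in SD units) with
# different precision, under a sceptical prior of 0.10 SD.
small_study = shrink_toward_zero(estimate=0.30, std_error=0.15, prior_sd=0.10)
large_study = shrink_toward_zero(estimate=0.30, std_error=0.05, prior_sd=0.10)

print(f"small study: 0.30 -> {small_study:.3f}")
print(f"large study: 0.30 -> {large_study:.3f}")
```

The noisier study is shrunk much harder, so a literature dominated by small studies with modest reported effects ends up close to zero under a prior like this one.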

    • Gwern,

      Thank you for the links. Fryer seemed to have done a lot of work for his meta-analysis. A few thoughts:
      1) He could have used a good statistician to help with producing more informative graphs (Figure 3 is a big missed opportunity) and pruning forking paths.
      2) I wish he focused on long term (2 years past intervention at least) outcomes for early childhood interventions.
      3) The interventions that stood out were high dosage tutoring and managed PD (reading). When I looked closely at managed PD, however, it turned out to be programs like Reading Recovery, which combines a 1-year specialized training for teachers with high dosage tutoring of students. It’s not clear how much the 1-year training contributes since all RCTs cited used no tutoring instead of tutoring with regular teachers as control.
      4) I wish there was more of an effort to reconcile and explain conflicting results instead of just focusing on an average effect across studies and its statistical significance and calling it a day.

      • yyw,

        Thanks for looking at the resources that gwern listed. I hadn’t come across those, but your points in numbers 3 and 4 are definitely valuable. I had drawn out a workflow to help others with combing through the literature to figure out whether or not the resource might provide evidence of effectiveness and one of the cautionary notes I added to the document was to think carefully about the comparison/control group. I definitely want to ask around the office about whether or not anyone is implementing any form of tutoring without having the Reading/Math Recovery PD. I think part of the reason for some of the differences in study findings is related to different measures being used for outcomes. The proximity of the items to the taught curricula can make an instrument much more sensitive to changes that would not appear in measures designed to be highly distal from taught curricula (e.g., state assessments).

        Robert Slavin has also worked on quite a few meta-analyses for IES/NCES. John Hattie and Robert Marzano have as well, but their work tends to be more focused on instructional practices and not necessarily interventions.

      • yyw
        > more of an effort to reconcile and explain conflicting results instead of just focusing on an average effect
        That is the major downfall IMO of most papers about doing meta-analysis – not sure why it’s downplayed so much – my guess is avoidance of clear indications of limitations.

        In the wiki entry on meta-analysis I put the reconcile and explain conflicting results topic as primary and at least by the second sentence. Over the years it has been moved lower and prefaced with words suggesting it’s an ancillary consideration.

        Currently it’s this last sentence of the first paragraph “In addition to providing an estimate of the unknown common truth, meta-analysis has the capacity to contrast results from different studies and identify patterns among study results, sources of disagreement among those results, or other interesting relationships that may come to light in the context of multiple studies.”

  5. Since this has attracted so much criticism, let me pile on (and the entire post and discussion did not appear in my browser until this morning). An area about the new policy that concerns me greatly is the lack of any mention of publicly providing data used in analysis – in fact, there is only extensive discussion of the need to protect privacy. Yes that is important – but it becomes a shield from having anyone be able to really examine your results. Too much “research” leads to policy conclusions that cannot be tested or examined by other researchers since the data is not available. It’s great to focus on measurement – long overdue. Preregistration is also welcome, although, obviously with a number of counter-arguments. But let’s not forget about public access to data used in studies. Without that, I’m afraid it will be difficult to move the needle on research quality, however good the intentions behind these policies.

    • While I can understand why you would view the privacy protection issues as a concern from the research world, the Family Educational Rights and Privacy Act (20 U.S.C. § 1232g; 34 CFR Part 99) is unavoidable in the realm of education. If nothing else, by essentially pre-registering their study with our office my team and I should be able to replicate the researcher’s findings. Until/unless federal law changes there isn’t much that we can do about publicly releasing data.

      When I was working on my dissertation with one of the restricted use NAEP data sets I was actually surprised at the time that it was a restricted use data set since there wasn’t any personally identifiable information contained in it. Some places (e.g., Massachusetts) took the opposite approach and would previously release extraordinarily granular data but would either omit location information (e.g., schools/districts) or would omit demographics in their public assessment files.

      • FERPA, like HIPAA, was well intentioned (maybe), but has proven to be a curse on the crusade for open data. The ultimate irony is that so much of our data is shared without our knowledge or agreement, but those laws are immediately invoked when anyone uses the data for research purposes. I am not doubting the importance of protecting medical and educational information, but do you really think that my medical information is being kept private? I get phone calls every day from people marketing medical services to me and they know plenty about me. At times, it feels like these laws are protecting the rights of others to use my data without my knowledge – not the other way around. Forgive the tirade – I’m well aware of the need for these laws, but they are invoked far too often in my opinion.

  6. I’m surprised at what seems to be brushing off what is for me yyw’s most important point, namely
    “#2: I wish he focused on long term (2 years past intervention at least) outcomes for early childhood interventions.”
    Perhaps it is not widely appreciated that, generally speaking, interventions tend to wash out rather than lead to cumulative gains over the long term (as was optimistically expected a few decades ago). There has been some talk through the grapevine to the effect that interventions are but temporary environment-changers that do little more than temporarily accelerate a child’s attainment toward a more-or-less fixed ceiling (of course it matters whether it is “more” or “less”). I don’t think anyone knows at the moment how much to make of this pessimistic view in the absence of definitive evidence on truly long-term (> 6 years; ideally, to adulthood) and wide-ranging (primary and secondary) outcomes. These are not easy studies to carry out, and they may even be difficult to pitch convincingly in the context of reasonable overriding concerns regarding equal opportunity, access to services, the role of social inequalities etc.
    But it is a nagging point; eventually, someone will have to figure it out, one way or another, whether for scientific or for financial reasons. It seems to me it had better be the good guys doing the job and drawing out the implications within a humanistic framework.

    • @Athanassios Protopapas,

      I have regularly thought about this and was thinking about how I could respond to further the discussion started by yyw. I think the way you have framed things definitely helps. To add a bit of context, Montgomery County, MD got all sorts of attention roughly 5 years ago by claiming they could accurately predict whether or not students would drop out based on measures in Kindergarten. While this isn’t in reference to an intervention, I have the same concerns about it that you and yyw mentioned regarding interventions: decaying effects. At the moment, the best K-12 institutions can do for outcome measures following graduation are tracking whether or not students enroll in any IHEs, whether they remain enrolled until they graduate, and I think the degree they graduate with. But aside from that there aren’t longer term outcomes available to the K-12 side. That said, USED released grants under the previous administration to fund “Statewide Longitudinal Data Systems” with the intent of addressing these types of issues. While the SLDS grant recipients are definitely able to study broader state policy changes and their implications, my experience is that they rarely possess the detailed data necessary to support more robust longitudinal analyses of outcomes from district and/or school based interventions. Even in cases where they do have access to those data, there tend to be non-trivial missing data related problems due to changes in content/performance standards (which subsequently result in changes to the measures used for academic outcomes). That said, I definitely think it makes a lot of sense to incorporate longitudinal evaluation designs into the work as well. If you, or others, wouldn’t mind providing some additional feedback, how would you balance the value of understanding the longer term effects of an intervention with the potential need/desire to bring a program to scale?
      Given some of Daniel Lakeland’s comments above, it’d also be interesting to figure out a way to include the decay of the effects into a cost-benefit framework.
