First of all, while I am NOT an expert statistician, I do know more than the average person and have had formal training/education in the subject. (I've had some undergraduate and graduate stats classes, used statistics professionally & throughout undergrad, etc.) So I would love to have a "pure" (read: non-education related) statistician go over the data, but I think I have made a decent start here.
If there's one thing I hate hearing anymore in education it is "data," because most of the data they have is crap and/or they have no idea what to do with it. In the case of Marzano, I often hear "him" (his company & his salespeople) going on about how much research he has. It's true, his database has 1036 studies in it (I copied and pasted these into a text file, then imported it into Excel as a "delimited" file). But let's take a closer look, because as just about everyone with any connection to the professoriate knows, the quality of educational research is (often extremely) suspect. This is largely due to the difficulties of doing research on kids (particularly longitudinal research), but there are other problems as well.
If you sort the data by p-values, you quickly start to see some problems. Marzano's own website declares,
"Basically, if the value in this column is less than .05, the effect size reported for the study can be considered statistically significant at a significance level of 5% (α = .05). In other words, a reasonable inference can be made that the reported effect size is probably not a function of random factors; rather, the reported effect size represents a real change in student learning."
So sort the data by the p-value, delete the studies with a p-value greater than 0.05, and look what happens: you're down to 285 studies from the initial 1036. That means that by his own criterion, only 27.5% of his studies are statistically significant (at α = 0.05). (Or, put another way: for nearly three quarters of the studies, the reported effect can't be distinguished from random fluctuation between the control and experimental groups at that significance level.) Of the remaining 285 studies, 101 have a p-value of zero; I assume this means that either it wasn't reported OR it was such a great experiment that they were able to calculate the p-value down to less than 0.001. The latter is unrealistic. (For example, one study has just 4 data points and an "effect size" of 9.25, which is grossly unrealistic. I don't see how any self-respecting statistician could use or report this, to be blunt.) So incomplete/unrealistic data in my book gets thrown out--we're down to 185 studies (17.9% of his database).
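For anyone who wants to repeat these two steps outside of Excel, here is a minimal Python sketch. The rows are invented stand-ins (the second one is loosely modeled on the 4-data-point study mentioned above), and the field names `n`, `p`, and `effect_size` are my assumptions, not the real column headers in the database.

```python
# Invented example rows; the real database has 1036 entries.
# Field names ("n", "p", "effect_size") are assumptions for illustration.
studies = [
    {"id": 1, "n": 25, "p": 0.03, "effect_size": 0.40},
    {"id": 2, "n": 4, "p": 0.0, "effect_size": 9.25},
    {"id": 3, "n": 60, "p": 0.21, "effect_size": 0.10},
    {"id": 4, "n": 30, "p": 0.049, "effect_size": 0.55},
]

ALPHA = 0.05


def significant(rows, alpha=ALPHA):
    """Delete rows whose p-value is greater than alpha (keep p <= alpha)."""
    return [r for r in rows if r["p"] <= alpha]


def drop_zero_p(rows):
    """Delete rows whose p-value is recorded as exactly zero."""
    return [r for r in rows if r["p"] != 0]


step1 = significant(studies)
step2 = drop_zero_p(step1)
print([r["id"] for r in step1])
print([r["id"] for r in step2])
```

On the real data, the same two filters would be applied to all 1036 rows after exporting the spreadsheet to whatever format you prefer.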
We're not done yet. Here are a few other data points I'm going to throw out because I find them too suspect to be reliable for district-wide policy-setting decisions: studies involving fewer than 18 students (n), of which there are 79. Admittedly, this cutoff is somewhat arbitrary, but I could probably defend their exclusion* far better than anyone could defend their inclusion. That brings us down to 106 studies (10.2% of the total). I'm going to stop there, but notice that there are also 16 studies that are incomplete; they have no unit length. Another 21 studies lasted less than a week. Two studies have control groups of fewer than 10 students, which "seems" too low (one is 4, the other 9). So even this remaining 10% is somewhat dubious. But it's not the ten percent that really bothers me, it's the 90%, because Marzano's work--which sadly is influencing policy--is based on all of this bad data. One other quick question: How much time was spent on each of these 1000+ studies? (I'm guessing not a lot; see below.)
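To make the running tally easy to check, here is a small Python sketch that takes the counts quoted above as given and recomputes each stage's share of the 1036 studies. The stage labels and the `share` helper are my own naming, nothing from Marzano's site.

```python
TOTAL = 1036  # studies in the database, as counted above

# Studies left after each cut, using the counts quoted in this post.
stages = [
    ("p-value no greater than 0.05", 285),
    ("p-value not zero", 185),
    ("at least 18 students", 106),
]


def share(count, total=TOTAL):
    """Percentage of the full database, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in stages:
    print(f"{label}: {count} studies, {share(count)}% of {TOTAL}")
```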
In other words, the policy is based on research and the research relies on unreliable data.
So does it surprise anyone that all of these new policies only seem to make things worse?
Also worth reading: "Marzano - A Successful Fraud", a review of Marzano et al.'s Classroom Instruction that Works...
To quote the Amazon.com review:
A. Every single reference I checked was itself dubious or misrepresented by the authors.
B. Some of the references were on topics unrelated to the instructional strategies cited.
B. [sic] Some of the numbers from published data were altered to better conform to the author's point of view.
C. Some of the references themselves presented provisional conclusions based on weak results, but were given complete credence by Marzano et al.
D. The authors took weak data from several studies, each based on averaging the results from studies assumed to use similar methods and subject cohorts, and averaged these, compounding the statistical weaknesses. This is especially shocking given that no credible researcher would combine results from studies by different groups that clearly use different methodologies and subject cohorts.
* My rationale for excluding studies of fewer than 18 students: This is smaller than a typical class and, more importantly, probably fewer than necessary for reliable statistical analysis. (I was always told to use at least 30 data points, but that was in a field--Science/Engineering--that has much more rigorous standards than the social sciences, let alone education.)