Background

This report examines evidence that the STEMscopes Math curriculum is effective at raising student math achievement. We use a post-facto quasi-experimental design (QED) with a matched control group to evaluate the association between STEMscopes Math and math achievement for 3rd and 4th graders in Texas. This type of evidence is consistent with Tier Two evidence under the Every Student Succeeds Act (ESSA). QEDs with matching attempt to overcome the hurdle of non-random assignment. In this case, the schools that chose to purchase and use STEMscopes Math during the 2021-2022 school year may differ in some way (e.g., serve different student populations) from the schools that did not. In addition to the main analyses, which test whether schools that use STEMscopes Math have higher student passing rates than schools that use other math curricula, we include follow-up analyses focused on subgroup findings.

Thus our main research questions are:

  1. What is the impact of the STEMscopes Math curriculum on STAAR Math school-level proficiency rates (i.e., the percent of students who met grade level expectations or above) for Grades 3 and 4?
  2. Do impacts differ across subgroups (i.e., race/ethnicity, gender, EL status)?

Results

To examine whether STEMscopes Math was associated with higher elementary school performance on the STAAR math test, we conducted multiple regression analyses. Our 3rd grade matched sample included 1,096 schools, and our 4th grade matched sample included 1,094 schools. Our main analyses predicted the percent of students who met grade level expectations or above on the 2022 STAAR math test (the outcome typically used by the state of Texas) from a binary variable indicating whether a school was a STEMscopes school or a non-STEMscopes school, along with covariates (see Methods). Results for both grades were significant.

In 3rd grade, on average, schools that purchased and used STEMscopes had a 1.96 point increase in the percent of students who met grade level expectations or above (non-STEMscopes: 36.08%, STEMscopes: 38.04%; B = 1.96, p < 0.05, ES = 0.12); see Figure 1. We estimate that approximately 870 additional students in STEMscopes schools met grade level expectations or above relative to non-STEMscopes schools, among the students who were tested (1.96% × 44,467 tested students in STEMscopes schools).
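The head-count estimate above follows from simple arithmetic: a percentage-point difference applied to the number of tested students. The sketch below reproduces it using the two figures reported in this section; the function name is ours and purely illustrative.

```python
# Figures reported in this section for 3rd grade.
POINT_INCREASE = 1.96      # percentage-point difference (B)
TESTED_STUDENTS = 44_467   # tested students in STEMscopes schools


def additional_students(points: float, n_tested: int) -> float:
    """Convert a percentage-point difference into a head count."""
    return points / 100 * n_tested


estimate = additional_students(POINT_INCREASE, TESTED_STUDENTS)
print(f"Estimated additional students: {estimate:.1f}")
```

The result is a little over 871 students, which the report rounds to approximately 870.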

In 4th grade, on average, schools that purchased and used STEMscopes had a 2.22 point increase in the percent of students who met or exceeded grade level expectations (non-STEMscopes: 35.74%, STEMscopes: 37.96%; B = 2.22, p < 0.05, ES = 0.13); see Figure 2. Based on this weighted estimate, we estimate that approximately 990 additional 4th graders in STEMscopes schools, relative to non-STEMscopes schools, may have met grade level expectations or above on the 2022 STAAR.

In addition to the main analyses, we included several follow-up analyses focused on passing rates among different subgroups of students. There were significant findings among economically disadvantaged students (B = 1.96, p < .05, ES = 0.12), such that economically disadvantaged students in STEMscopes Math schools outperformed economically disadvantaged students in non-STEMscopes schools. There was also a trend-level difference for females in STEMscopes schools (B = 1.72, p =, ES = 0.10).

Figure: Third Grade Students Who Met Grade Level Expectations

Next, we ran two multivariate multi-outcome regressions. The state of Texas organizes its publicly available STAAR outcome data into subgroup columns (e.g., “percent of White/Caucasian students in a given school who meet benchmark” is one outcome column, while “percent of African American/Black students in a given school who meet benchmark” is a separate column). This “wide” format necessitates the use of multi-outcome models to make comparisons across racial/ethnic groups. The first multi-outcome model focused on three racial/ethnic categories, as the other categories had too much missing/masked data (see Methods). We focused on the percent of African American/Black, Hispanic, and White/Caucasian students at a given school who “met grade level or above” on their 3rd grade STAAR math assessment. African American/Black students in STEMscopes schools outperformed their counterparts in non-STEMscopes schools (B = 3.30, p < .05, ES = 0.19), as did Hispanic students in STEMscopes versus non-STEMscopes schools (B = 3.25, p < .05, ES = 0.20). We did not see significant differences between STEMscopes and non-STEMscopes schools for White/Caucasian students (B = -0.71, p = 0.56, ES = 0.04). However, because we included all three outcomes in the same model, we were able to use Wald tests to examine whether STEMscopes had a differential impact on STAAR outcomes by race/ethnicity. We ran two tests, comparing the White/Caucasian parameter estimate to the African American/Black parameter estimate (W(1) = 6.15, p < .05) and to the Hispanic parameter estimate (W(1) = 10.16, p < .01). In both cases, the Wald tests were significant (see the Methods section for more details).

The significant Wald tests indicate that African American/Black and Hispanic student passing rates in STEMscopes schools not only differ significantly from those of their counterparts in non-STEMscopes schools, but that the STEMscopes estimates for these groups also differ significantly from the estimate for White/Caucasian students. Put another way, African American/Black and Hispanic students appear to benefit more from STEMscopes than their White/Caucasian counterparts (see Figure 1).

The second multivariate multi-outcome model focused on English Language Learners (ELLs) and non-English Language Learners (non-ELLs). Overall, there was a significant point increase in the percent of ELL students who met grade level expectations or above in STEMscopes schools compared to ELLs in non-STEMscopes schools (B = 4.01, p < .01, ES = 0.21). Compared to non-ELLs in STEMscopes schools, there was a trend-level positive difference (W(1) = 3.05, p = .08), with ELL students trending towards benefiting more from STEMscopes Math’s high quality learning opportunities than their non-ELL peers.

The 4th grade follow-up analyses indicated a significant point increase for economically disadvantaged students (B = 2.49, p < .05, ES = 0.14) and females (B = 2.07, p < .05, ES = 0.11) relative to economically disadvantaged students and females, respectively, in non-STEMscopes schools. Results were not significantly different for ELL students across schools (B = 1.36, p = .35, ES = 0.07).

Figure: Fourth Grade Students Who Met Grade Level Expectations

Similar to the 3rd grade analyses, we ran a multivariate multi-outcome model testing STAAR outcomes across three race/ethnicity categories. Results indicated a significant point increase in the percent of Hispanic students who met or exceeded grade level expectations in STEMscopes Math schools relative to their peers in non-STEMscopes Math schools (B = 2.78, p < .05, ES = 0.17). Results did not significantly differ for African American/Black students (B = 0.80, p = .63, ES = 0.04) or White/Caucasian students (B = 0.35, p = .79, ES = 0.02) in STEMscopes versus non-STEMscopes schools. As a reminder, the multivariate multi-outcome models also allow us to compare parameter estimates via the Wald test to investigate whether STEMscopes Math differentially impacted students of different racial/ethnic groups. There was a trend-level difference between the estimate for Hispanic students and the estimate for White/Caucasian students (W(1) = 3.05, p = .08). This suggests that Hispanic students are trending towards benefiting more from STEMscopes Math’s high quality curricular materials.

Methods

In this section, we provide details about study procedures including the data sources, variables used, and participating districts.

Data sources
Data for this study came from two sources. First, schools that used STEMscopes Math for 3rd and 4th grade in the 2021-2022 school year were identified through the STEMscopes analytics platform. Within the analytics reports, we used the number of 3rd and 4th grade scopes accessed as a metric of use. If a school showed any 3rd or 4th grade math scope usage, it was considered a STEMscopes Math school.

Second, school demographic data and school performance on the State of Texas Assessments of Academic Readiness (STAAR) were accessed through the Texas Education Agency website. For 3rd grade, we used the previous year's 3rd grade STAAR data file (spring 2021 results) and focused on each school's percent of students who met grade level expectations or above on STAAR math as a baseline measure of math achievement. Specifically, the state of Texas sets four levels of math achievement: grade standards not met, approaching grade level expectations, grade level expectations met, and mastery of grade level standards. We use the prior year's 3rd grade results as a proxy for prior achievement (as STAAR does not have 2nd grade results). For 4th grade, we used the spring 2021 3rd grade percent of students who met grade level expectations or above on the STAAR math test (by school) as a baseline measure of math achievement. The 3rd grade scores from 2021 were based (approximately) on the same pool of students who took the 4th grade STAAR math assessment in 2022.

We also downloaded 2021-2022 school year school enrollment data, including enrollment by racial/ethnic subgroups, as well as the ‘special populations’ file that includes students with economic disadvantage, English language learners (ELLs), students who were part of the gifted and talented program, students who received special education services, number of students considered Title 1, as well as several variables indicating what district a school was located in, what county, and what region. Data were cleaned such that counts were converted to percentages (e.g., number of students in a given racial category/ total number of students). These variables were used to match STEMscopes and non-STEMscopes schools (see participants section below for details on matching). Once matching was complete and baseline analyses were conducted, we downloaded the 2022 STAAR data including subgroups and the number of students tested per school. We used the percent of students in each school who met grade level expectations or above on the spring 2022 STAAR as the main outcome variable.
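As an illustration of the count-to-percentage cleaning step described above, the following minimal sketch converts a subgroup count to a percent of total enrollment and passes masked values through as missing. The function name and the example school are hypothetical and are not taken from the state files.

```python
from typing import Optional


def to_percent(count: Optional[int], total: Optional[int]) -> Optional[float]:
    """Return count as a percent of total, or None if either value is
    missing (masked by the state) or the total is zero."""
    if count is None or total is None or total == 0:
        return None
    return 100.0 * count / total


# Hypothetical school: 130 ELL students out of 520 enrolled.
ell_percent = to_percent(130, 520)
# A masked subgroup count stays missing rather than becoming 0%.
masked_percent = to_percent(None, 520)
print(ell_percent, masked_percent)
```

Keeping masked counts as missing (rather than zero) matters because the masked cells are later handled by imputation.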

Participants
In the 2021-2022 school year, 548 public Texas schools used STEMscopes for 3rd grade and 547 used it for 4th grade (STEMscopes is also used by numerous private and parochial schools in Texas). Overall, the Texas Education Agency website reports up to 4,679 schools that may have 2022 3rd grade data and 4,658 schools that may have 4th grade scores. There was missingness in all publicly available state files (see the Missing Data section below); thus, these figures reflect the number of schools that submitted data, not necessarily scores. Using the 2022 number of schools, we estimate that schools that used STEMscopes Math in 3rd and 4th grade represent approximately 11.7% of Texas public schools with state testing data. As part of Tables 1 & 2 below, we present baseline statistics for all Texas public schools as well as for our matched samples by grade (see Matching section). These tables show that Texas schools that purchased and used STEMscopes Math (and consequently the schools in the matched control sample) tended to be numerically larger, more economically disadvantaged, and to include more African American and Hispanic students, English language learners, and students with special needs relative to Texas public schools as a whole.
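The 11.7% figure can be checked directly from the school counts reported in this paragraph. The short sketch below does so for each grade; the variable names are ours.

```python
# School counts reported in the Participants paragraph.
stemscopes_schools = {"grade_3": 548, "grade_4": 547}
state_schools = {"grade_3": 4_679, "grade_4": 4_658}

# Share of Texas public schools (with possible 2022 data) using STEMscopes Math.
share = {
    grade: 100.0 * stemscopes_schools[grade] / state_schools[grade]
    for grade in stemscopes_schools
}

for grade, pct in share.items():
    print(f"{grade}: {pct:.1f}%")
```

Both grades round to 11.7%.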

Missing Data
As a privacy measure, state data do not include a numeric value for any variable to which fewer than 10 students contributed data. This led to missing data by design (that is, we know what caused the missingness). In addition to masking these small categories, for race/ethnicity data the state of Texas often masks two columns to discourage “figuring out” masked values from the other categories. Hence the race/ethnicity file had the most missing data. Please see Table 2 for percent missing by covariate. For the current study, we use variables that have up to 50% missing. To account for missing data in the covariates, we used multiple imputation by chained equations (MICE) with the ‘mice’ package in R (5 imputations, 20 iterations per imputation). The results reported in the main regression analyses use these imputed covariate data.
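To make the masking rule and the 50% inclusion threshold concrete, the sketch below applies a "fewer than 10" suppression rule to a list of counts and computes the resulting percent missing. It illustrates the rule as described in this section and is not the state's actual procedure; the counts are hypothetical, and the imputation step itself (done with ‘mice’ in R) is not reproduced here.

```python
from typing import List, Optional

MASK_THRESHOLD = 10          # counts below this are suppressed
MAX_PERCENT_MISSING = 50.0   # variables above this are not used


def mask_small_counts(counts: List[int]) -> List[Optional[int]]:
    """Replace any count below the threshold with None (masked)."""
    return [c if c >= MASK_THRESHOLD else None for c in counts]


def percent_missing(values: List[Optional[int]]) -> float:
    """Percent of entries that are masked."""
    return 100.0 * sum(v is None for v in values) / len(values)


# Hypothetical subgroup counts for eight schools.
raw_counts = [42, 7, 15, 3, 120, 9, 58, 11]
masked = mask_small_counts(raw_counts)
missing_pct = percent_missing(masked)
usable = missing_pct <= MAX_PERCENT_MISSING
print(masked, missing_pct, usable)
```

In this toy example three of eight counts are masked (37.5%), so the variable would still be retained and its missing values imputed.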

Covariates are used in up to two ways in the analyses. “Regional covariates,” that is, school location within a district, county, and region (no missingness), are used only in the school matching process to help ensure that schools are as similar as possible across as many variables as possible. (We have data suggesting that schools in the same district are more similar than schools in different districts; see analyses below. We assume there may also be county and regional effects on schools.) Baseline math scores, race/ethnicity percentages, ELL percentages, and school size were used in both the matching analyses and the 3rd grade substantive analyses. Preliminary analyses indicated that the percent of gifted students and the percent of special education students were not significantly related to any 3rd grade outcomes, so these variables were trimmed from the 3rd grade models presented in this report. They were still included in the 4th grade analyses because, at that grade level, significant relationships began to emerge. The variable with the most missingness that was used in all analyses was the racial/ethnic percentage covariate “Two or more races/ethnicities” for 3rd and 4th grade.

Matching
To match schools based on the data available from the Texas Education Agency, we matched as closely as possible across 17 school demographic and achievement variables, including the percent of students “approaching” and the percent that “met” grade level standards or above on the 2021 STAAR math test, school size, the percent of students classified as economically disadvantaged, the percent of ELL students, the percent of students in each race/ethnicity category (i.e., Asian, Black/African American, Hispanic/Latinx, White/Caucasian, Two or more races/ethnicities, and Native Hawaiian/Pacific Islander), the percent of gifted students, the percent of students receiving special education services, and variables indicating district, county, and region. We used the ‘MatchIt’ package in R with nearest neighbor matching. This form of matching is also known as “greedy” matching. It uses all treated units (here, all STEMscopes schools) and runs through the list of schools, selecting the closest eligible control school to pair with each STEMscopes Math school. Nearest neighbor matching also requires a distance measure to define which control school is closest to each STEMscopes Math school. We use the propensity score difference, the most common distance metric, which is the difference between the propensity scores of each STEMscopes Math school and control school (Stuart, 2010).
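The greedy nearest neighbor procedure described above can be sketched in a few lines. This is a simplified illustration, not the MatchIt implementation: it assumes propensity scores have already been estimated, matches 1:1 without replacement, and processes treated schools in the order given. The school IDs and scores are hypothetical.

```python
from typing import Dict


def greedy_nearest_neighbor(
    treated: Dict[str, float], controls: Dict[str, float]
) -> Dict[str, str]:
    """Pair each treated school with the closest unused control school,
    where distance is the absolute propensity score difference."""
    available = dict(controls)  # copy so the caller's dict is untouched
    pairs: Dict[str, str] = {}
    for t_id, t_score in treated.items():
        if not available:
            break  # no controls left to match
        closest = min(available, key=lambda c: abs(available[c] - t_score))
        pairs[t_id] = closest
        del available[closest]  # matching without replacement
    return pairs


# Hypothetical propensity scores.
stemscopes = {"T1": 0.62, "T2": 0.60, "T3": 0.30}
comparison = {"C1": 0.61, "C2": 0.58, "C3": 0.33, "C4": 0.10}
matches = greedy_nearest_neighbor(stemscopes, comparison)
print(matches)
```

Note the "greedy" behavior: T2 is closest to C1, but C1 was already taken by T1, so T2 is paired with the next closest control, C2. Earlier choices are never revisited.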

Baseline Equivalence
For all covariates (the variables used for matching), including baseline math performance, there was only one significant difference between matched groups: school size in 3rd grade was significantly higher in STEMscopes schools (see Tables 1 & 2). What Works Clearinghouse (WWC) standards note that baseline differences greater than 0.05 and less than 0.25 standard deviations must be controlled for statistically. For 3rd grade school size, the effect size (the difference in estimates divided by the pooled standard deviation) was 0.16. Thus, we control for this statistically. In fact, following the advice of Stuart (2010), we include all covariates (apart from collinear variables; see below) in the final analyses as a complementary approach to matching and a more stringent test of effects. This adjustment satisfies the WWC standard for baseline equivalence, and it is needed because several variables had effect sizes greater than or equal to 0.05.
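The baseline equivalence check can be expressed as a standardized difference plus a threshold rule. The sketch below is illustrative only: the group means come from Table 1, but the standard deviation and group sizes are hypothetical (the report does not list them), and the exact handling of the 0.05 and 0.25 boundaries is our assumption.

```python
import math


def pooled_sd(sd_a: float, n_a: int, sd_b: float, n_b: int) -> float:
    """Pooled standard deviation of two groups."""
    return math.sqrt(
        ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    )


def standardized_difference(mean_a: float, mean_b: float, sd: float) -> float:
    """Absolute difference in means divided by the pooled SD."""
    return abs(mean_a - mean_b) / sd


def wwc_category(effect_size: float) -> str:
    """Classify a baseline difference (boundary handling is an assumption)."""
    if effect_size <= 0.05:
        return "satisfies baseline equivalence"
    if effect_size <= 0.25:
        return "requires statistical adjustment"
    return "does not satisfy baseline equivalence"


# School size means from Table 1; SD of 210 and n = 548 per group are hypothetical.
sd = pooled_sd(210.0, 548, 210.0, 548)
d = standardized_difference(542.7, 508.8, sd)
print(round(d, 2), wwc_category(d))

# Effect sizes as reported in Table 1.
print(wwc_category(0.16))  # 3rd grade school size
print(wwc_category(0.03))  # baseline "approaches" rate
```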

Table 1: Baseline comparison for 3rd grade Texas students, matched STEMscopes and non-STEMscopes schools

| Variable | State total | Sample total | Non-STEMscopes | STEMscopes (raw) | t-value | p-value | Effect size | Percent missing |
|---|---|---|---|---|---|---|---|---|
| Baseline school 2021 3rd grade Math “approaches” grade level | 61% | 55.07% | 54.78% | 55.36% (55.25%) | 0.47 | 0.64 | 0.03 | 1% |
| Baseline school 2021 3rd grade Math “meets” grade level | 30% | 24.08% | 24.01% | 24.14% (24.07%) | 0.13 | 0.90 | 0.01 | 1% |
| School size | 526.4 | 525.8 | 508.8 | 542.7 (542.4) | 2.63 | 0.01 | 0.16 | 1% |
| Percent economically disadvantaged students | 60.61% | 70.16% | 70.29% | 70.03% (70.16%) | 0.19 | 0.85 | 0.01 | 1% |
| Percent Black/African American students | 12.79% | 14.60% | 14.62% | 14.57% (18.13%) | 0.04 | 0.97 | 0.00 | 24.9% |
| Percent Latino/Hispanic students | 52.71% | 55.99% | 56.13% | 55.85% (56.26%) | 0.16 | 0.87 | 0.01 | 5.75% |
| Percent Asian students | 4.82% | 2.67% | 2.64% | 2.70% (3.49%) | 0.21 | 0.83 | 0.01 | 44.1% |
| Percent White/Caucasian students | 26.30% | 24.50% | 24.39% | 24.60% (28.23%) | 0.14 | 0.89 | 0.01 | 17.4% |
| Percent Two or more races/ethnicities students | 2.89% | 3.17% | 3.09% | 3.25% (3.86%) | 0.93 | 0.35 | 0.06 | 45.9% |
| Percent Native Hawaiian/Pacific Islander students | 0.16% | 0.00% | 0.00% | 0.00% (0.00%) | 0.53 | 0.59 | 0.03 | 31.4% |
| Percent of English Language Learners (ELLs) | 21.66% | 25.20% | 25.08% | 25.32% (26.57%) | 0.20 | 0.84 | 0.01 | 10.0% |
| Percent gifted students | 8.02% | 5.96% | 6.02% | 5.89% (6.17%) | 0.42 | 0.68 | 0.03 | 18.1% |
| Percent special education students | 11.70% | 13.11% | 13.12% | 13.10% (12.83%) | 0.08 | 0.94 | 0.00 | 2.9% |
| Percent of Title 1 students | 64.25% | 89.74% | 90.24% | 89.23% (90.86%) | 0.56 | 0.58 | 0.03 | 7.4% |

Please note: Sample columns include imputed data unless specified (specifically, the parenthetical values in the STEMscopes column represent raw, not imputed, data). “Region variables” are not included in the table. They include no missing data and are nominally numeric in nature. When included in the MatchIt program, the nearest neighbor specification would prioritize schools with the same number in these categories (e.g., same number = same district), but those numbers do not have quantifiable meaning.

Table 2: Baseline comparison for 4th grade Texas students, matched STEMscopes and non-STEMscopes schools

| Variable | State total | Sample total | Non-STEMscopes | STEMscopes (raw) | t-value | p-value | Effect size | Percent missing |
|---|---|---|---|---|---|---|---|---|
| Baseline school 2021 3rd grade Math “approaches” grade level | 61% | 53.40% | 53.50% | 53.30% (55.66%) | 0.16 | 0.87 | 0.01 | 1% |
| Baseline school 2021 3rd grade Math “meets” grade level | 30% | 29.30% | 29.38% | 29.22% (24.17%) | 0.14 | 0.87 | 0.01 | 1% |
| School size | 526.4 | 534.6 | 526.9 | 542.4 (541.7) | 1.16 | 0.25 | 0.07 | 1% |
| Percent economically disadvantaged students | 60.61% | 70.67% | 71.14% | 70.19% (70.09%) | 0.67 | 0.50 | 0.04 | 1% |
| Percent Black/African American students | 12.79% | 14.11% | 13.54% | 14.68% (18.18%) | 1.05 | 0.29 | 0.06 | 24.9% |
| Percent Latino/Hispanic students | 52.71% | 56.61% | 57.27% | 55.95% (56.25%) | 0.77 | 0.44 | 0.05 | 5.8% |
| Percent Asian students | 4.82% | 2.69% | 2.46% | 2.93% (3.44%) | 1.62 | 0.11 | 0.10 | 44.1% |
| Percent White/Caucasian students | 26.30% | 24.38% | 24.56% | 24.19% (21.55%) | 0.25 | 0.80 | 0.02 | 17.4% |
| Percent Two or more races/ethnicities students | 2.89% | 3.13% | 3.05% | 3.22% (3.92%) | 1.05 | 0.29 | 0.06 | 45.9% |
| Percent Native Hawaiian/Pacific Islander students | 0.16% | 0.00% | 0.00% | 0.00% (0.00%) | 0.19 | 0.85 | 0.01 | 31.4% |
| Percent of English Language Learners (ELLs) | 21.66% | 25.89% | 26.09% | 25.68% (26.63%) | 0.32 | 0.75 | 0.02 | 10% |
| Percent gifted students | 8.02% | 5.83% | 5.83% | 5.83% (6.21%) | 0.02 | 0.98 | 0.00 | 18.1% |
| Percent special education students | 11.70% | 13.04% | 12.99% | 13.09% (13.08%) | 0.34 | 0.74 | 0.02 | 2.9% |
| Percent of Title 1 students | 64.25% | 90.38% | 90.50% | 90.26% (90.53%) | 0.14 | 0.89 | 0.01 | 7.4% |

Please note: Sample columns include imputed data unless specified (specifically the parenthetical values in the STEMscopes column represent raw, not imputed data). “Region variables” are not included in the table. They include no missing data and are nominally numeric in nature. When included in the MatchIt program, the nearest neighbor specification would prioritize schools with the same number in these categories (e.g., so same number = same district) but those numbers do not have quantifiable meaning.

Method continued

Planned analyses

Substantive analyses were conducted with the lavaan structural equation modeling package in R because this package includes estimation with full information maximum likelihood (FIML) to handle missing data, can estimate multivariate multi-outcome models, and can account for data clustering effects. Specifically, as mentioned above, as a privacy measure, state data do not include a numeric value for any variable to which fewer than 10 students contributed data. This led to missing data in the STAAR math outcome data. We used FIML to handle the missing data so that the final analysis was not biased by the missing outcome data.

School data are also nested, with schools nested within districts. Districts tend to have overarching policies and procedures that affect student learning (and schools within a district tend to serve the same underlying population), so we would expect schools in the same district to be more similar than schools in different districts, and this may need to be considered in analyses. In the current analyses, we calculated the intra-class correlation (ICC) for the main STAAR math outcome: percent of students who met grade level expectations or above for both 3rd and 4th grade. The ICC ranges from 0 to 1, with larger numbers indicating a greater effect of “district” on outcomes and a need to correct for clustering. For 3rd grade, the ICC was 0.28, and for 4th grade it was 0.41, suggesting a need for corrections. We could not use a true “multi-level model” as many districts in our sample only included one school, and multi-level models require clusters to include at least two units (in this case schools). However, the lavaan package includes a standard error correction for clustered data and also allows for the estimation of a random effect associated with a district. Essentially, this parameter accounts for the potential effect of district within the model, but “random” indicates that the parameters were allowed to vary across districts (as we may expect that districts with many schools had a larger effect on the sample’s outcome estimation than districts with only one school). Of note, although we include the random effect of the district in all models, it was never a significant predictor of outcomes.
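For readers unfamiliar with the ICC, the sketch below computes it with the one-way ANOVA estimator, which compares between-district and within-district variation. The report does not state which ICC estimator was used, so this choice is our assumption, and the district data are hypothetical.

```python
from typing import Dict, List


def icc_oneway(groups: Dict[str, List[float]]) -> float:
    """One-way ANOVA estimate of the intra-class correlation.
    Allows unequal group sizes; needs at least one group with 2+ units."""
    k = len(groups)
    sizes = [len(v) for v in groups.values()]
    n_total = sum(sizes)
    grand_mean = sum(sum(v) for v in groups.values()) / n_total

    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )
    ss_within = sum(
        (x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Average group size, adjusted for unequal group sizes.
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical percent-met rates for schools in three districts.
districts = {"A": [30.0, 32.0], "B": [40.0, 42.0], "C": [50.0, 52.0]}
icc = icc_oneway(districts)
print(round(icc, 3))
```

In this toy example nearly all variation lies between districts, so the ICC is close to 1; the reported values of 0.28 and 0.41 indicate a more moderate, but still notable, district effect.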

As a stringent test of the effects of STEMscopes, we include multiple covariates in all analyses, including the baseline 2021 STAAR math percent of students who met grade level expectations or above by school, school size, percent of economically disadvantaged students, percent of students by race/ethnicity, and percent of ELL students. For 4th grade, we also include the percent of students in the gifted and talented program and in special education services, as in 4th grade we saw relationships emerge between math outcomes and these variables. Of note, in both the 3rd and 4th grade analyses, we could not include the percent of White/Caucasian students in the same model as the percent of Hispanic students or the percent of economically disadvantaged students, as these covariates were highly correlated (r’s > 0.70) and including them together would introduce multicollinearity to the model. (Please note that although we present models without percent White/Caucasian included, we also ran models that separately included percent White/Caucasian, and all models had similar results.)
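The collinearity screen described above amounts to computing pairwise correlations between covariates and flagging any pair whose absolute correlation exceeds 0.70. A minimal sketch follows; the school percentages are hypothetical and the helper names are ours.

```python
import math
from typing import List

COLLINEARITY_CUTOFF = 0.70  # threshold cited in the text


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def too_collinear(x: List[float], y: List[float]) -> bool:
    """True if the two covariates should not share a model."""
    return abs(pearson_r(x, y)) > COLLINEARITY_CUTOFF


# Hypothetical school-level percentages for five schools.
pct_white = [10.0, 20.0, 30.0, 40.0, 50.0]
pct_hispanic = [80.0, 70.0, 55.0, 45.0, 30.0]

r = pearson_r(pct_white, pct_hispanic)
flagged = too_collinear(pct_white, pct_hispanic)
print(round(r, 3), flagged)
```

The absolute value is used because a strong negative correlation (as between two subgroup percentages that sum toward 100) is just as problematic as a strong positive one.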

In addition to the multiple regressions used for the main outcome analyses, we also used multivariate multi-outcome models to test differences between categories of covariates: specifically, differences in the impact of STEMscopes Math by race/ethnicity category and by English language learner status. These models simultaneously estimate the parameters of all variables on more than one outcome, as specified. For example, the race/ethnicity model included three outcomes: the percent of African American/Black students who met grade level standards or above, as well as the percent of Hispanic students and the percent of White/Caucasian students who met grade level standards or above.

Once the multi-outcome models were estimated, we calculated Wald tests to evaluate potential differences between the estimated parameters associated with STEMscopes Math for each outcome. Wald tests are used to assess the addition of a constraint on the statistical parameters of a model. In the current models, we introduce a constraint that sets the STEMscopes Math parameter equal across the outcomes associated with students of different races/ethnicities. If the Wald test is not significant, the added constraint (that the parameters are equal) does not introduce misfit to the model and thus should be included. In other words, a non-significant result means that the parameters may differ numerically but not statistically. A significant Wald test, however, indicates that the constraint adds model misfit and that the parameters should be allowed to vary (not constrained to be equal). Put another way, we can consider the parameters to differ significantly from each other. It is in this last sense, that a significant Wald test indicates that parameters differ significantly from each other, that the test helps us determine whether STEMscopes Math has significantly different effects across categories of outcomes.
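As a rough illustration of the logic, a Wald test of the equality of two coefficients divides the squared difference between the estimates by the variance of that difference and refers the result to a chi-square distribution with 1 degree of freedom. The sketch below is not the lavaan implementation; the coefficient variances in the first example are hypothetical because the report does not list standard errors. The second part converts the Wald statistics reported in the Results section into p-values.

```python
import math


def wald_equality(b1: float, b2: float,
                  var1: float, var2: float, cov12: float = 0.0) -> float:
    """Wald statistic for H0: b1 == b2 (1 degree of freedom)."""
    return (b1 - b2) ** 2 / (var1 + var2 - 2.0 * cov12)


def chi2_pvalue_1df(w: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(w / 2.0))


# Hypothetical estimates and variances, for illustration only.
w_example = wald_equality(3.0, 1.0, var1=0.5, var2=0.5)
print(w_example, round(chi2_pvalue_1df(w_example), 3))

# Wald statistics reported in the Results section.
reported = {
    "White vs. African American/Black (3rd)": 6.15,
    "White vs. Hispanic (3rd)": 10.16,
    "ELL vs. non-ELL (3rd)": 3.05,
}
p_values = {label: chi2_pvalue_1df(w) for label, w in reported.items()}
for label, p in p_values.items():
    print(f"{label}: p = {p:.3f}")
```

The computed p-values are consistent with the thresholds reported in the text (p < .05, p < .01, and p = .08, respectively).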

Conclusion

This report provides evidence that schools that used STEMscopes Math in the 2021-2022 school year had higher 3rd and 4th grade rates of students meeting or exceeding grade level expectations than matched schools that did not use STEMscopes Math, controlling for previous-year achievement and several important demographic variables. Specifically, the 3rd grade rate of students meeting or exceeding grade level standards was an estimated 1.96 points higher in STEMscopes schools, corresponding to approximately 870 additional students passing among the students who were tested in the STEMscopes schools. The 4th grade passing rate was an estimated 2.22 points higher in STEMscopes schools, corresponding to approximately 990 additional 4th graders relative to the non-STEMscopes schools. Taken together, these findings provide consistent support for the effectiveness of the STEMscopes Math curriculum.

Work Cited
Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1-21.