No country for young kids? The effects of school starting age throughout childhood and beyond

Being the youngest in a cohort entails many penalties. Using administrative data on every public-school student in Portugal, we show that although performance gains from being 1 year older fade quickly from primary education to high school, age-related penalties persist through a combination of grade retention, educational tracking and testing policies. Those who start school younger are more likely to repeat grades and ultimately drop out of school. Older entrants are more likely to enroll in scientific curricula in high school, are more successful at accessing public higher education and enroll in more selective undergraduate courses. JEL Classification: H75, I21, J13.


I. Introduction
Every year millions of children around the world enter school for the first time. However, school-entry age regulations mean that they begin formal schooling at very different stages of their social, emotional and cognitive development. That these differences play a significant role in explaining individual outcomes, and academic success in particular, is a well-established empirical fact.$^{1}$ Children who start school older typically show higher cognitive capacity, as measured by standardized achievement tests (e.g. Bedard and Dhuey, 2006; Puhani and Weber, 2007; McEwan and Shapiro, 2008; Elder and Lubotsky, 2009; Cascio and Schanzenbach, 2016; Attar and Cohen-Zada, 2018).
Less clear is how given institutional features change the extent of these effects throughout a child's schooling career and into adulthood. Measured cognitive differences tend to fade as children age, but effects may persist through other mechanisms. In countries where grade retention is a typical strategy, younger entrants are likelier to repeat grades (e.g. McEwan and Shapiro, 2008). In systems that track students into different curricular offers, older students are likelier to choose or be tracked into an academically oriented offer (Puhani and Weber, 2007; Schneeweis and Zweimüller, 2014; Attar and Cohen-Zada, 2018).
Recent literature shows that age differences affect individual and social well-being through mechanisms other than academic success. Younger students are more likely to be classified as having learning disabilities and attention deficit disorders (Dhuey and Lipscomb, 2010; Elder and Lubotsky, 2009; Evans et al., 2010; Mühlenweg et al., 2012), are less persistent and more irritable (Mühlenweg et al., 2012), are significantly more likely to suffer from bullying or victimization (Mühlenweg and Puhani, 2010a) and are less likely to hold leadership positions in high school (Dhuey and Lipscomb, 2008). Younger entrants are also shown to have a higher propensity to commit crimes as teenagers (Landersø et al., 2017), as well as a higher likelihood of juvenile delinquency (Cook and Kang, 2016) or of being incarcerated for juvenile crime (Dhuey et al., 2017). The impact on long-term outcomes is somewhat more ambiguous. While some find a causal link between starting school later and higher wages (Fredriksson and Öckert, 2014; Kawaguchi, 2011) or the likelihood of becoming a corporate CEO (Du et al., 2012), others do not find long-term effects on prime-age earnings (Black et al., 2011; Dobkin and Ferreira, 2010).
We contribute to the literature by empirically estimating the impact of school starting age on a wide range of individual outcomes, from early childhood until the end of upper secondary education. For that purpose, we use de-identified longitudinal administrative records of every student enrolled in public schools in Portugal. To the best of our knowledge, we provide the first plausibly causal evidence about the impact of school starting age on student outcomes in Portugal,$^{3}$ a school system with idiosyncratic characteristics: contrary to most countries, beyond a binding enrollment cutoff at 1 January, parents whose children are born as early as 16 September have leeway to legally postpone their child's entry into school.
Our identification strategy exploits variation in school starting age around cutoff discontinuities using exact birth dates (as in McEwan and Shapiro, 2008; Evans et al., 2010; Dobkin and Ferreira, 2010; Peña, 2017; Attar and Cohen-Zada, 2018). The use of exact birth dates enables us to avoid biases induced by seasonal patterns present in coarser measures, such as quarter or month of birth (as noted in Buckles and Hungerman, 2013). Resting on a well-identified set of falsifiable assumptions, our design is analogous to a local randomized experiment (Lee and Lemieux, 2010). Our estimates rely on local polynomial methods, in accordance with the growing methodological consensus on their adequacy in regression discontinuity designs (Gelman and Imbens, 2017; Imbens and Lemieux, 2008).$^{4}$ Given the longitudinal nature of our samples, and in order to allay concerns with non-compliance, we estimate local average treatment effects within the context of a fuzzy regression discontinuity design, going beyond intent-to-treat effects where possible.
We find that being 1 year older when entering school leads to significant gains in measured cognitive capacity at grades 4, 6 and 9. In grade 4, Math and Language exam scores are, on average, higher by 0.27 and 0.36 standard deviations (σ), respectively.
However, the gains in test scores from starting school later fade quickly. By the end of grade 9 we estimate the impact to be about 0.20σ in Language and 0.16σ in Math. The rate at which local average treatment effects fall is consistent with the hypothesis that differences in cognitive maturity when taking the test, rather than school entry postponement, are driving the results. As we only have metrics of individual outcomes measured at the same time, not at the exact same age, the estimates combine the effects of school starting age with those of age-at-test.
Despite the decline in the achievement premium, we also find that the effects of school starting age differences persist well beyond elementary education. In a country where grade retention remains ubiquitous as a remedial strategy, older entrants have a 5 percentage point (pp) lower probability of repeating at least one grade in primary education, and a 4pp lower probability by the end of grade 9. Younger students are doubly penalized: even conditional on exam performance, they continue to be significantly more likely to be retained in the same grade. Furthermore, older students have a lower probability (-2pp) of dropping out of school by the end of grade 9. Significantly, intent-to-treat effects show that those eligible to start school one year later are more likely to enroll in the academic track (2pp) in high school and, conditional on being enrolled in the academic track, opt more often for scientific curricula (3pp), have higher application scores to access public higher education (0.10σ) and enroll in more selective undergraduate courses (0.13σ). On the other hand, we find no evidence of differences in demand for college seats, enrollment in STEM courses, or first-choice application success.
Section II further describes the empirical context of our findings. Section III describes the empirical strategy, Section IV details the data used in the analysis and Section V shows the empirical validity of our strategy. Finally, Sections VI, VII and VIII describe all findings in greater detail. Section IX concludes.

II. Institutional Setting
Portugal's compulsory education laws make enrollment in first grade mandatory for every child who is at least 6 years old by 15 September. Yet considerable leeway is given to parents wanting to enroll children who turn 6 after this date. Students born between 16 September and 31 December of a given calendar year are deemed conditional and can still enroll if parents so request and places are available in classes already created at the school.$^{5}$ The existing rules implicitly generate a second, more binding cutoff at 1 January, as children born at the beginning of the next calendar year enroll in the following school year. Children must thus be at least 5.7 years old by 15 September when starting school. Since most conditional students (86%) in Portugal do not defer enrollment to the following year, a child born at the beginning of January typically enrolls in first grade 1 year later than peers born at the end of December.
The Portuguese school system is organized in three sequential levels: early childhood education and care, basic education and upper secondary education. Basic education covers the initial nine grades of schooling and is divided into three study cycles of various lengths. The first cycle comprises the four initial grades of primary education, and teaching of most subjects is under the purview of a single teacher. The second cycle has a length of two academic years. The third cycle of basic education, comprising grades 7 through 9, corresponds to lower secondary education. At completion of basic education, typically at age 15, students transition to upper secondary education (high school). Upper secondary education is divided into general academic and vocational pathways. In the general academic track, students can select one of four concentrations: science and technology, social and economic sciences, languages and humanities, or visual arts. The vocational track, on the other hand, offers a plethora of denominations, with curricula geared toward earlier integration in the labor market. Compulsory schooling laws in the country determine that students should be enrolled in school until the end of the academic year in which they turn 18, or until high school graduation if that occurs before the age of 18.
Students in the Portuguese school system are evaluated through teacher testing and national exams or assessments. Barring accommodations for specific student needs, national exams in Portuguese Language and Math are taken by every student in the system at the end of fourth grade (until 2015) and ninth grade. Children sit the exam on exactly the same date and at the same time, answering the same questions. Exams are then graded anonymously by randomly allocated teachers from schools other than the one in which the student is enrolled. At the end of grade 6, national assessment tests follow a similar procedure. In order to graduate from high school's general academic track, students must also sit national exams, typically completing two course-specific exams in grade 11 and another two in grade 12, in most cases Portuguese Language and Math. Students can only gain admission to college if they have a passing grade in both grade 12 exams.
Admission to public higher education in Portugal is centrally governed. Candidates are publicly listed by the government according to candidates' ranked preferences, application scores and available capacity. Application scores combine high school GPA and exam scores, and depend on the college and department to which the student applies. In final application scores, high school GPA must carry a weight of at least 50 percent. However, each tertiary education institution can set the weight of exam scores within a band of 35 to 50 percent of the total application score.

III. Empirical Strategy
We start with the basic relationship of interest, captured by the linear model in Equation 1, where $Y_{ig}$ is the outcome of interest of student $i$ measured by the end of school year $g$, $A_i$ is the age of student $i$ measured in decimal years as of 15 September in the year she first enrolled in first grade, $X_i$ is a vector of individual and family background characteristics measured during the year student $i$ first enrolled in first grade, $\varphi_c$ are cohort fixed effects and $\epsilon_i$ is an idiosyncratic error term. In this setting, $\beta$ represents the marginal effect of delaying enrollment by 1 year. Nonetheless, Equation 1 does not account for school starting age being likely correlated with characteristics of students and their families that are not typically observed in the data, such as learning maturity.
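A sketch of Equation 1 as implied by the definitions in the paragraph above, reconstructed from the text rather than taken from the original typesetting. The coefficient vector $\gamma$ on the controls and the absence of a separate intercept are assumptions; the remaining symbols are those defined in the text.

```latex
Y_{ig} = \beta A_i + X_i'\gamma + \varphi_c + \epsilon_i \tag{1}
```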
To overcome endogeneity concerns, we exploit exogenous variation induced by the school starting age regulations. In Section V we present evidence that there is a sharp discontinuity at 1 January, and a kink at the 16 September cutoff. Therefore, our main identification strategy relies on comparing outcomes of students who are born before the 1 January cutoff with those of students who are born on or after that date and are induced to enroll in the following school year.
We interpret our regression discontinuity results in light of a potential outcomes framework (Hahn et al., 2001). Provided that other characteristics associated with the outcomes of interest are continuous at the cutoff and birth dates around the cutoff are as good as randomly assigned, the outcomes of those born before the cutoff provide a convincing counterfactual for those that are born after the cutoff had they enrolled one year earlier.
In order to estimate our coefficients we first construct a variable $B_i$ with 366 unique integer values (allowing for leap years) that identify the birthday of student $i$ in the calendar year, as in McEwan and Shapiro (2008). We standardize $B_i$ as a distance (in days) to the cutoff of 1 January ($B_i = 0$) and make it run from 1 July ($B_i = -184$) to 30 June ($B_i = 181$), so that the discontinuity lies at about mid-range of the running variable. Based on it we define an indicator variable, $\tau_i = 1(B_i \geq 0)$, which identifies the values of $B_i$ that are equal to or exceed the enrollment cutoff of 1 January.
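The construction of the running variable described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and mapping every birthday onto a leap reference year (2000) is an assumption made here so that all 366 calendar days receive a distinct value.

```python
from datetime import date

# Assumption: a leap reference year, so that 29 February has its own value.
REF_YEAR = 2000


def running_variable(birth: date) -> int:
    """Distance in days from the birthday to the 1 January cutoff (0 on 1 January)."""
    birthday = date(REF_YEAR, birth.month, birth.day)
    if birth.month >= 7:
        # 1 July to 31 December: negative distance to the next 1 January.
        return (birthday - date(REF_YEAR + 1, 1, 1)).days
    # 1 January to 30 June: non-negative distance from 1 January.
    return (birthday - date(REF_YEAR, 1, 1)).days


def post_cutoff(b: int) -> int:
    """Indicator equal to 1 when the birthday falls on or after the cutoff."""
    return int(b >= 0)
```

Under these assumptions, a child born on 16 September is assigned a value of -107 and a child born on 1 January a value of 0, in line with the values quoted in the text.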
We estimate the discontinuity using both local-linear and local-quadratic regression methods, through a reduced form weighted least squares estimator,$^{6}$ in which $f(B_i)$ is, depending on the specification, a piecewise linear or quadratic function of the running variable interacted with the cutoff, and $K_h(\tau_i, B_i)$ is a triangular weighting kernel function with bandwidth $h$. The bandwidth $h$ here denotes the window of birth dates ($B_i$) to the left and to the right of the cutoff used to estimate the coefficient of interest $\alpha$, and $N(h)$ [...]. The triangular kernel assigns zero weight to all observations outside the interval defined by the bandwidth, $[-h, h]$, and positive weights to all observations within, with weights declining symmetrically and linearly as the value of the running variable gets farther away from the cutoff. In order to avoid imposing an ad hoc bandwidth length, we use data-driven methods to select the optimal bandwidth for each regression. In particular, we use an upgraded version of the mean square error (MSE) optimal bandwidth selectors discussed in Imbens and Kalyanaraman (2012) (for the linear case) and Calonico et al. (2014b) (for the quadratic case) and derived in Calonico et al. (2018).$^{7}$
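To make the estimator described above concrete, the sketch below fits a kernel-weighted local polynomial on each side of the cutoff and returns the estimated jump. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bandwidth is supplied by the user rather than chosen by the MSE-optimal selectors cited in the text, covariates and cohort fixed effects are left out, and no standard errors are computed.

```python
def triangular_weight(b, h):
    """Triangular kernel weight: 1 - |b|/h inside the bandwidth, 0 outside."""
    return max(0.0, 1.0 - abs(b) / h)


def solve(a, c):
    """Solve the linear system a x = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    m = [row[:] + [c[i]] for i, row in enumerate(a)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for j in range(k, n + 1):
                m[r][j] -= factor * m[k][j]
    x = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(m[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (m[k][n] - tail) / m[k][k]
    return x


def rd_jump(y, b, h, degree=1):
    """Kernel-weighted least squares estimate of the jump in y at b = 0."""
    rows, ys, ws = [], [], []
    for yi, bi in zip(y, b):
        w = triangular_weight(bi, h)
        if w == 0.0:
            continue  # observation lies outside the bandwidth
        tau = 1.0 if bi >= 0 else 0.0
        # Intercept, jump, and polynomial terms with separate slopes on each side.
        row = [1.0, tau]
        for p in range(1, degree + 1):
            row += [bi ** p, tau * bi ** p]
        rows.append(row)
        ys.append(yi)
        ws.append(w)
    k = len(rows[0])
    # Normal equations of weighted least squares: (X'WX) coef = X'Wy.
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, ws)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * yi for r, yi, w in zip(rows, ys, ws)) for i in range(k)]
    return solve(xtwx, xtwy)[1]
```

Setting `degree=1` corresponds to the local-linear case and `degree=2` to the local-quadratic case. In applied work a dedicated package would normally handle bandwidth selection and inference; the hand-rolled solver here only keeps the example free of dependencies.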
Since existing rules in Portugal provide leeway for parents to delay children's enrollment between 16 September and 31 December (covering our control group), and since some parents may not comply with the more binding cutoff of 1 January, $\alpha$ captures an intent-to-treat effect. Therefore, we estimate local average treatment effects (LATE) through a fuzzy regression discontinuity design, using the indicator of being born after the cutoff ($\tau_i$) as an instrument for school starting age ($A_i$). Estimates of parameter $\theta$ in the first-stage Equation 4 identify the estimated proportion of the population complying with delayed school entrance close to the cutoff. Estimates of $\beta$, in Equation 5, identify the local average treatment effects for those complying with the cutoff (Imbens and Angrist, 1994), as further discussed in Section V. For precision, we also control for a vector of individual and family background characteristics ($X_i$) and cohort fixed effects ($\varphi_c$).
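A sketch of the two-stage system that the paragraph above describes, written to be consistent with the symbols in the text ($\theta$ in the first stage, $\beta$ in the second). This is a reconstruction under assumptions, not the original typesetting: the control coefficients $\delta$ and $\gamma$, the error terms $\nu_i$ and $\epsilon_i$, and the use of a separate polynomial $g(\cdot)$ in the second stage are notation introduced here for illustration.

```latex
\begin{align}
A_i    &= \theta\,\tau_i + f(B_i) + X_i'\delta + \varphi_c + \nu_i \tag{4} \\
Y_{ig} &= \beta\,\hat{A}_i + g(B_i) + X_i'\gamma + \varphi_c + \epsilon_i \tag{5}
\end{align}
```

Here $\hat{A}_i$ denotes the fitted value of school starting age from the first stage, and both stages would be estimated on observations within the bandwidth.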
Importantly, the plausibly causal interpretation of the effects rests on the assumption that parents do not systematically time births relative to the cutoff. In the next Section we provide evidence that there does not seem to be systematic manipulation of the running variable close to the cutoff. Additionally, we run a balancing analysis of observable socioeconomic characteristics of the students around the cutoff. Finally, our working assumption is that precise birth timing around the cutoffs does not introduce sharp differences in unobserved characteristics that affect our outcomes of interest.

A. Data Description
We use a de-identified administrative dataset (MISI)$^{8}$ containing detailed information on every student enrolled in Portugal from 2006 to 2017. A unique student identifier tracks students throughout grades, even as they change schools, giving us a panel dataset of every student from the first time they are observed in the education system until the most recent observation.$^{9}$ We focus on non-adult students enrolled in the regular public system of education.$^{10}$
We merge MISI data with two other administrative datasets (ENEB and ENES) containing comprehensive information on student achievement in school and national exams.
Standardized achievement testing in Portugal has not been a stable policy, so we can only recover a few years of outcome data. For the period with available data, we can gather grade 4 national exam scores for exams sat between 2013 and 2015, grade 6 scores for the period 2012-2015, and grade 9, 11 and 12 information for the entire period of the dataset (2007-2017). Our first analytical dataset contains student-level information at grade 1 as well as student outcomes for those who sat at least one Math or Language national exam in grade 4, grade 6 or grade 9, a total of 660 573 children. For reduced form estimates of upper secondary and post-secondary outcomes, we use a second analytical dataset in which over 630 thousand students are followed from grade 9 until the end of grade 12.$^{11}$

B. Main Variables
Outcomes ($Y_{ig}$). The main outcome variables include scores in Language and Math national exams by the end of grades 4, 6 and 9, as well as in other course-specific and track-specific national exams by the end of grades 11 and 12. We also consider grade retention, dropout and graduation indicators. For the transition to high school, we also construct dummies for whether the student, conditional on having completed grade 9, enrolled in the general academic track and whether, conditional on having opted for the general academic track, she pursued a science-oriented curriculum in high school. Post-secondary outcomes include the student's application score and indicators for whether she applied to college, was rejected from college, enrolled in an academic degree or enrolled in a STEM course, as well as the selectivity of the college degree in which she enrolled.
Age variables. We define school starting age ($A_i$), our main regressor of interest, as the exact student age as of 15 September of the school year in which the student is first observed in first grade. By construction, unit variations in this variable represent a 1-year variation in the age at which the student is first enrolled in grade 1. The exact date of birth ($B_i$), on the other hand, is measured in days as a continuous variable spanning all 366 possible days of the calendar year (see Section III). It is the running variable in our regression discontinuity strategy, from which we also extract the relevant post-cutoff indicator for the analysis ($\tau_i$).
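The definition of school starting age above can be illustrated with a short sketch. The function name is hypothetical, and converting days to decimal years by dividing by 365.25 is an assumption; the text does not state which conversion the authors use.

```python
from datetime import date

# Assumption: average year length used to convert days into decimal years.
DAYS_PER_YEAR = 365.25


def school_starting_age(birth: date, entry_year: int) -> float:
    """Age in decimal years on 15 September of the year of first-grade entry."""
    return (date(entry_year, 9, 15) - birth).days / DAYS_PER_YEAR
```

Under this convention, a child born on 31 December 2000 who enters first grade in 2006 is about 5.7 years old on 15 September, which agrees with the minimum entry age quoted in Section II, while a child born one day later who enters in 2007 is almost exactly one year older at entry.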
Student characteristics ($X_i$). We construct a vector of variables for several observable characteristics of students, such as indicators for gender, first generation immigrant status, access to a personal computer at home, receipt of school social support (a proxy for household financial constraints), unemployment status of the child's father, and a proxy for the level of education in the household.
Table 1 documents summary statistics of student characteristics and outcomes for each full grade sample, and separately for students born within 60 days before and after the 1 January cutoff.$^{12}$ The groups born before and after the cutoff seem to be relatively homogeneous with respect to their observable individual characteristics, at least at a significance level of 1 percent. The prevalence of female students, students with access to a computer at home, first generation immigrants, recipients of school social support and students whose father was unemployed when they entered school is similar across samples, as would be expected if students were randomly allocated around the cutoff. At less stringent significance levels, however, statistical differences may be found in some of the covariates. Even so, in Appendix Table 5 and Appendix Figure 12 we report estimates that provide suggestive evidence of the continuity of each of these covariates at the cutoff, as further discussed in the following section.

C. Descriptive Statistics
The Portuguese education system is characterized by having few immigrant students (2%) and relatively few households in which at least one of the parents has some sort of higher education (18-22%). A relatively high proportion of students received school social support 13 when in grade 1 (16-39%). Differences across grade samples are explained by a steady increase in the rate of identification of students in need of social support in the most recent years of the data.
On the other hand, for all outcome variables and the main regressor, differences in means between the samples born within 60 days before and within 60 days after the cutoff are always statistically significant at the 1 percent level. The size of these differences already helps anticipate the size of some of our estimates. Students born within 60 days after the cutoff are, on average, about 0.7 years older when starting school than those born within 60 days before the cutoff. Likewise, children born after the cutoff have a 0.23 standard deviation (σ) advantage in Math and 0.26σ in Language by the end of grade 4. Even if these differences tend to dissipate quickly from grade to grade, we still reject a null difference in unconditional achievement in both subjects by the end of grade 9. Likewise, children born just after the cutoff are less likely to repeat in all of the observed grades.

A. Do Parents Plan Birth Dates Strategically?
We begin by presenting evidence in support of our identification strategy. Assignment to our virtual control and treatment groups might not be as good as random if strategic parents, aware of the benefits, plan births to occur before or after the cutoff. Figure 1 assesses this possibility through a local cubic density manipulation test (Cattaneo et al., 2018; McCrary, 2008), in which the shaded confidence intervals overlap at the cutoff (p-value = 0.191, so we cannot reject the null hypothesis of equal density 4 days before and after the cutoff).$^{14}$
Given this evidence, our virtual assignment mechanism appears to be valid.

B. Compliance
At what age do students start school in Portugal? Figure 2 depicts the average school starting age (SSA) of students within each birth date bin, measured as the distance in days relative to the 1 January cutoff ($B_i = 0$). The solid lines are fitted values from a piecewise quadratic spline. The figure shows a small discrete jump in school starting age at 16 September ($B_i = -107$). As discussed in Section II, there is leeway for parents to delay their child's SSA from this date onwards. SSA jumps slightly and, through the end of the calendar year, decreases at a relatively slower rate than before 16 September. A sharp discontinuity follows at the 1 January cutoff: students born on or just after the latter cutoff are about 0.7 years older on average when starting school than those born just before it. The fact that the difference in school starting age at the cutoff between the two groups is less than 1 year confirms imperfect compliance with the quasi-experimental design and motivates regression discontinuity estimates that take into account left-side non-compliance. About 14% of the students born on or after 16 September defer entrance to the following school year. Appendix Figure 11 reveals the declining rates of compliance for our different samples. Compliant students are those who start school in the year they turn 6 years old and do not defer entrance before reaching the 1 January cutoff. On the other hand, our strategy cannot be applied at the 16 September cutoff, as compliance with such a quasi-experiment would be too low to yield well-powered estimates (i.e., the vast majority of parents do not delay their child's entrance to school).
Table 2 presents point estimates of the school starting age discontinuity at the cutoff for 30- and 60-day fixed bandwidths, as well as for a varying MSE-optimal bandwidth, for each of the grade samples (one per row). The results are analogous to the first stage of our two-stage fuzzy discontinuity design (Equation 4). The first column reports estimates for a local baseline specification, where school starting age is regressed on a quadratic function of the running variable, weighted by a triangular kernel, for a bandwidth of 30 days before and after the cutoff, separately for each grade sample. In accordance with Figure 2, we find a sharp and precisely estimated discontinuity at the cutoff. Column 2 reports results from a regression controlling for all student covariates presented in Table 1 and cohort fixed effects. Consistent with the evidence that student characteristics are continuous at the cutoff, there is no sizable change in the coefficient. For a longer 60-day bandwidth, results are more precise but tell a similar story, with no significant changes to the point estimates (Columns 3-4, Table 2). Columns 5 and 6 display results for bandwidths selected through data-driven methods. Results are the most precise, with coefficients varying between 0.69 and 0.74 across specifications and samples. Local polynomial estimates thus corroborate the raw differences in means in Table 1: students born just after the cutoff are, on average, about 8 months older when starting school. However, among the compliant subpopulation this difference will evidently be 12 months, which allows us to identify the impact of starting school 1 year later.

C. Continuity
Is our virtual control group a reasonable counterfactual to older entrants? In Table 1 above, we present suggestive evidence that the characteristics of students born before and after 1 January are relatively similar close to the cutoff. The fact that point estimates in Table 2 do not change considerably after controlling for covariates further suggests that the groups are balanced in terms of observable characteristics.
Although suggestive, these results are not yet sufficient to show convincingly that covariates are continuous at the cutoff. As discussed in Section II, compulsory schooling rules in Portugal provide considerable leeway for parents to delay school entrance for children born between 16 September and 31 December. If parents who delay entry differ significantly in their characteristics from those who opt not to defer, then our local average treatment effect (LATE) estimates, if not our intent-to-treat estimates, could be biased by compositional effects.
We test whether our identification strategy survives continuity concerns along predetermined characteristics of treated children by running regressions analogous to those in Equations 4 and 5. Here, however, we regress each observable covariate on school starting age, with the cutoff as the excluded instrument. In Appendix Table 5, we summarize per-sample two-stage least squares estimates for local-quadratic specifications, with and without student controls, and a 60-day fixed bandwidth. We find precisely estimated null differences between complier groups, except for grade 4 sample students who receive school social support (p-value < .01). However, the qualitative implication of this result is clear: since grade 4 students born on or after the 1 January cutoff are likelier to be recipients of school social support, a characteristic predictive of lower achievement, our estimated LATE, if positive, will if anything underestimate the true average treatment effect at the cutoff.

VI. Impacts Throughout Basic Education
A. Student Achievement
ITT. Figure 3 depicts sharp discontinuities in student achievement in Math and Language by birth date cell, for each grade sample. Fourth graders born right after the cutoff are expected to score substantially higher in each of the exams. These differences tend to vanish as students become older. Columns 1 and 2 of Table 3 present reduced form estimates, based on local-quadratic regressions using an MSE-optimal bandwidth specification, controlling for student characteristics and cohort fixed effects. On average, students assigned to treatment perform consistently better. Subject-specific differences seem to be important. By the end of grade 9, while being assigned to treatment confers an estimated 0.09σ premium in Math relative to being born just before the cutoff, in Language the difference is 0.14σ. Nonetheless, left-side non-compliance is likely biasing downward the causal impact estimated through our reduced form model.
LATE. Table 3 also summarizes the local average treatment effects (LATE) from regressions of Math and Language student achievement on SSA. School starting age is instrumented by the cutoff, and the specification includes a piecewise quadratic function of birth dates, our running variable. In accordance with the literature, we adjust standard errors for clustering within birth date cells. Column 3 presents per-grade estimates without any covariates. According to this baseline specification, entering school 1 year older entails an average benefit of about 28 percent of a standard deviation in grade 4 Math performance. When including student covariates and cohort fixed effects, the effect size is slightly reduced to 0.27σ, and precision improves through smaller standard errors (Column 4). Comparison between these two columns suggests that our identification strategy could allow us to forgo the inclusion of covariates without substantial changes in the estimated coefficients.
Columns 5 and 6 present the effects for Language achievement. The local-quadratic specifications reveal a larger effect of about 0.36σ at grade 4, with standard errors clustered at the birthday level of 0.04σ. The estimated LATE declines at a relatively fast rate, especially between the end of grades 6 and 9. Whereas the estimated coefficients in Columns 4 and 6 drop to 0.26σ in Math and 0.29σ in Language, respectively, by grade 6, by the end of grade 9 the local-quadratic estimates are as small as 0.16σ for Math and 0.20σ for Language.
Heterogeneity. We apply the same strategy to subsamples of our data in order to look for suggestive evidence of heterogeneous effects. Figure 4 summarizes LATE estimates for different subsamples from a local-quadratic specification with student controls and cohort fixed effects. The top and bottom panels show the effects and 95% confidence intervals on grade 4, 6 and 9 Math and Language exam scores, respectively.
Gains in grade 4 Math are larger for girls by about 0.11σ relative to boys. For comparison, the gap between boys and girls in the Math exam in our grade 4 sample is 0.13σ in favor of the former. However, for Math achievement, we fail to find statistically significant differences in the effect sizes across subgroups. Regarding Language performance, statistically significant benefits by grade 9 seem to be driven by students who are male, who do not benefit from social support and who live in households where at least one of the parents or legal guardians has a higher education degree, suggesting that relatively well-off male students tend to benefit slightly more in Language from entering school 1 year later. However, the precision of these estimates does not allow us to uncover statistically significant differences in effects across subgroups.

B. Grade Retention
ITT and LATE. Table 4 presents ITT and LATE of starting school 1-year older on the probability of repeating a grade, using local-quadratic specifications and controlling for cohort fixed effects. As with achievement, older entrants benefit substantially with respect to grade retention. Column 1 shows that students born after the cutoff have a significantly lower probability of repeating a grade at least once, in all grades. Restricting our attention to compliers (Column 4), the benefit becomes even larger. Compliant students born just after the cutoff are 5.3 percentage points (pp) less likely to have repeated at least once by grade 4, relative to an average of 13.7 percent among those born within 2 months before the cutoff (Table 1). ITT and LATE on grade retention display a different persistence pattern over time than impacts on student achievement do: instead of continually decreasing, effects become slightly larger in magnitude by grade 6 (−7.9pp), quickly fading by grade 9 (−4.4pp).
Given the estimated effects on student achievement, results in Columns 1 and 4 are not surprising. Indeed, decisions to retain students rely strongly on students' achievement and cognitive capacity. As older students perform better, they are also less likely to be penalized by retention.
However, are younger children more likely to repeat in spite of lower achievement, or mostly because of it? Table 4 shows that, even controlling for Math or Language achievement (Columns 2-3, 5-6), older entrants are still significantly less likely to repeat a grade, even if the effect quickly approaches zero as students age. The results suggest that the impact of SSA on grade retention also operates through other mechanisms. Younger students are thus doubly penalized, through lower achievement in Math and Language as well as a higher likelihood of retention.
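The relation between the ITT and LATE columns of Table 4 can be made concrete with a small sketch. With a single excluded instrument (the post-cutoff indicator), the instrumental-variable estimate equals the reduced-form jump in the outcome divided by the first-stage jump in the probability of starting school older, provided both are estimated with the same specification and bandwidth. The function name `wald_late` and the numbers below are purely hypothetical and are not taken from the paper's tables.

```python
def wald_late(reduced_form: float, first_stage: float) -> float:
    """Scale an intent-to-treat effect by the jump in treatment take-up at the cutoff."""
    if abs(first_stage) < 1e-12:
        raise ValueError("first stage is zero: the cutoff does not shift treatment")
    return reduced_form / first_stage

# Hypothetical inputs: eligibility lowers retention by 3 pp and raises
# the share of children starting school a year older by 60 pp.
itt = -0.03
compliance_jump = 0.60
late = wald_late(itt, compliance_jump)
print(f"LATE = {late:.3f}")
```

Because the compliance jump is below one, the LATE is larger in magnitude than the ITT, which matches the pattern described for Columns 1 and 4.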

C. Student Attainment
Differences in school starting age also have an impact on basic education attainment outcomes. Figure 6 depicts reduced form estimates and 95% confidence intervals for the estimated impact of SSA on a series of attainment outcomes, across different specifications.
Because our data do not allow us to observe whether high school students complied with the virtual assignment mechanism, we restrict our interpretation to ITT. Children eligible to enter school 1-year later are less likely (−2pp) to drop out, compared with an overall average of 12% of students who drop out of school by grade 9.
SSA significantly impacts enrollment patterns in high school too. Older entrants are likelier to select a general academic track in high school (2pp)-a pathway selected by 72% of the students in our data. Figure 6 also shows that-conditional on having selected a general academic track in high school-students induced by the cutoff to start school 1-year later are also likelier (3.5pp) to enroll in the science and technology stream, a pathway that leads to STEM higher education courses and is selected by 59% of our sample.

A. Student Achievement and Retention
The patterns in student achievement observed throughout basic education persist until high school. Because we do not observe sufficient school years in our data, we can only estimate ITT for students that can be followed from grade 9 at most until grade 12. Although we lack the statistical power to estimate precise nulls, achievement gains tend to be statistically insignificant for most subjects (see Appendix Figure 13). Nonetheless, we can safely affirm that older entrants have higher achievement in Physics and Biology by the end of grade 11, and in Language and Biology by the end of grade 12, on the order of 5 to 10 percent of a standard deviation. Importantly, these results, albeit deriving from reduced forms, are lower bounds on the true LATE, as by high school the proportion of compliers in the control group that was not lost to grade retention, dropout or a move to the vocational track is smaller than the same proportion in the treatment group.
Analogous to the impact on basic education outcomes, among students who have proceeded to upper secondary education, older entrants are less likely to be retained in grade. In Appendix Figure 14 we also show reduced form estimates of the impact of SSA on grade retention throughout upper secondary education. Students induced to start school 1-year later due to the 1 January cutoff are less likely to be retained in grade.

B. College Applications
Do the effects of SSA persist along other margins even after high school graduation? We look into multiple outcomes on the applications of high school graduates to public colleges in Portugal. The left panel of Figure 7 depicts reduced form estimates and their respective 95% confidence intervals for a series of binary outcomes. SSA does not seem to affect college seat demand. Students born just after the 1 January cutoff are not more or less likely to apply to higher education. Although with slightly less precision, we also fail to find significant effects on rejection rates, as well as on success rates in the first of the three application phases to public higher education in Portugal. Although differences in SSA seem to play a role in preferences for scientific subjects by the end of basic education, we fail to find such differences by the end of high school. Our ITT estimates suggest that older entrants are not more likely than their younger peers to be accepted into academic universities vis-à-vis polytechnic institutions. Likewise, no significant differences between groups are uncovered for enrollment patterns in Science, Technology, Engineering and Mathematics (STEM) majors.
But in which other ways may older and younger college applicants be different? The right panel of Figure 7 depicts other margins through which SSA effects may persist after high school graduation. Significantly, we find that higher education candidates born just after the 1 January cutoff have higher application scores (0.1σ). Higher application scores enlarge the option set of candidates, as well as the chances of being admitted. Evidence of this same phenomenon is the fact that, because SSA effects on the high school GPA and some national exam scores in upper secondary education are still prevalent, older entrants also enroll in more selective courses. In our most conservative point estimates, students born just after the cutoff enroll in courses 0.11σ more selective than others 16 .

VIII. Robustness and Placebos
An important concern about our estimated LATE in Table 3 is that, despite controlling for cohort fixed effects, results may be driven by students that are retained in grade, clustered just before the cutoff. As shown before, repeaters are disproportionately concentrated before the enrollment cutoff, which can bias estimates by introducing compositional effects. To address this concern, our first set of robustness checks restricts the main regressions to students that never repeated a grade. Appendix Table 6 presents results analogous to those in Table 3, only considering students that never repeated a grade. There are no statistically significant changes in the size of the effects, and certainly no changes in the qualitative interpretation of the restricted model. For each grade sample, even for the subset of relatively higher achieving never-repeaters, LATE estimates are identical, which allays concerns about significant sample and attrition biases introduced by grade retention patterns.
A second concern is with patterns in birth dates reflecting parental characteristics that are not perceivable solely by inspecting the distribution of births across the calendar year. As in other countries, scheduled birth-giving and hospital service adjustments cause the frequency of births to decrease during weekends in Portugal. If the enrollment cutoff coincidentally falls close to weekends then differences at the cutoff may introduce some correlation with the characteristics of parents. However, controlling for weekday indicators produces no changes in the point estimates, suggesting this is not a problem in our analysis (see Appendix Table 7).
A third source of concern relates to the method itself. Since local regression estimates are sensitive to the choice of bandwidth, the optimal data-driven bandwidth could be systematically biasing our results. The choice of bandwidth typically entails a trade-off: opting for a larger bandwidth includes more valid observations and increases precision; however, if it is too wide our local specification may not be adequate. Appendix Figures 15, 16 and 17 show, through randomization inference, that our estimates are unlikely to be due to chance. All our estimates are at the end of the right tail of the placebo distributions and are in line with the asymptotic p-values. We can also see that the coefficients at the true cutoff (represented by the gray dashed line in Figure 15), except at grade 9 Math, are clear outliers among the placebos (spread out along each x-axis).
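The logic of the placebo-cutoff exercise can be sketched in a few lines of Python. This is a simplified illustration and not the rdpermute implementation used in the paper: it assumes a local-linear fit with a fixed, hypothetical bandwidth (instead of varying MSE-optimal bandwidths), simulated data, and a p-value defined as the share of candidate cutoffs, the true one included, whose estimate is at least as extreme as the estimate at the true cutoff. The function names are illustrative.

```python
import numpy as np

def local_linear_jump(b, y, cutoff, bandwidth):
    """Jump in y at `cutoff` from a triangular-kernel local linear fit."""
    d = np.asarray(b, dtype=float) - cutoff   # distance to the candidate cutoff
    y = np.asarray(y, dtype=float)
    keep = np.abs(d) < bandwidth
    d, y = d[keep], y[keep]
    root_w = np.sqrt(1.0 - np.abs(d) / bandwidth)
    tau = (d >= 0).astype(float)
    X = np.column_stack([np.ones_like(d), tau, d, tau * d])
    beta, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return beta[1]

def placebo_p_value(b, y, placebo_cutoffs, bandwidth):
    """Estimate at the true cutoff (0) and its rank-based p-value among placebos."""
    true_est = local_linear_jump(b, y, 0.0, bandwidth)
    placebo = np.array([local_linear_jump(b, y, c, bandwidth) for c in placebo_cutoffs])
    n_extreme = np.sum(np.abs(placebo) >= abs(true_est))
    return true_est, (1 + n_extreme) / (1 + placebo.size)

# Simulated data (hypothetical): a jump of 0.3 at the true cutoff only.
rng = np.random.default_rng(1)
b = rng.integers(-200, 200, size=80_000)
y = 0.3 * (b >= 0) + 0.001 * b + rng.normal(0.0, 1.0, size=b.size)
cutoffs = [c for c in range(-100, 100) if c != 0]   # 199 placebo cutoffs
est, p = placebo_p_value(b, y, cutoffs, bandwidth=30)
print(est, p)
```

Placebo cutoffs close to the true one use windows that contain the real discontinuity, so their estimates are not pure noise; a more careful implementation would account for this.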

IX. Conclusion
An exogenous one-year variation in school starting age has significant effects on primary level student outcomes. Students that are induced to delay enrollment in first grade for one year improve their performance in 4th grade national exams by about 0.3 standard deviations (σ) in Math and almost 0.38σ in Language in Portugal 18 . Heterogeneity across groups is also limited. Students from more disadvantaged backgrounds-i.e. that receive school social support and have less educated parents-seem to benefit slightly more in terms of achievement in Math. Older girls also benefit slightly more than older boys.
In any case, delayed entrance is homogeneously beneficial to students across identified socioeconomic groups, with overlapping confidence intervals precluding us from drawing further conclusions about these patterns. Importantly, we find that the cognitive premium by the end of elementary education persists across all groups, but quickly fades throughout lower and upper secondary education.
But through which causal mechanism do our local average treatment effects fade as students age? Since we do not have a second source of exogenous variation, we cannot separate the 'age-at-test' effects, reflecting cognitive maturity differences, from differential 'exposure to schooling' effects. Our results are thus best interpreted as absolute age effects.
If underlying causal mechanisms in Portugal are no different than in other contexts (e.g. Crawford et al., 2007;Black and Devereux, 2011;Fredriksson and Öckert, 2014;Peña, 2017;Cornelissen and Dustmann, 2019), 'age-at-test' effects may tend to dominate and lead to null or even negative impacts of delayed school entrance in the long-run.
However, certain institutional mechanisms make school starting age matter to the individual through other margins besides measurable achievement differences. Students that enter school one year later are less likely to repeat a grade, a pattern that persists well into high school. Conditional on achievement-arguably the most determinant factor in retention decisions-older entrants still are less likely to repeat. Likewise, we find that older entrants are less likely to drop out of basic education. SSA is important in yet other ways. Students predicted to be older entrants into school-even if being exposed to schooling later-are more likely to enroll in a general academic track in high school and, conditional on it, to opt for high school concentrations dominated by scientific courses.
Our intent-to-treat effects also show that older students have higher application scores to access public higher education (0.1σ) and enroll in more selective undergraduate courses (0.11σ). However, we find no evidence of differences in the demand for college seats, enrollment in STEM courses, or first-choice application success.
How relevant are these findings for policy and individual decision-making? Our empirical strategy-besides evidence that birth dates are not manipulated around the cutoff and that covariates are balanced independently of treatment assignment-gives us confidence that our LATE and ITT are at least internally valid. However, it is well known that RD estimates are local to the cutoff and that direct extrapolation requires relatively strong assumptions about the homogeneity of treatment (Imbens and Lemieux, 2008). We can affirm that, due to their reliability, our estimates fall with a high degree of confidence within a short interval of true estimates of the causal effects of delayed school entrance in regions of the running variable (in this case, birth dates) where parents can more easily choose to delay children. In this sense, parents can be reasonably confident that delaying school entrance of their children within reasonably close distance of the cutoff, even if not exactly at it, will, on average, yield the described benefits. Nevertheless, choice prescription needs to be nuanced. Even if restricting our attention to short-run benefits, the policy response, if intended to improve social well-being, may be at odds with the optimal choice by parents. Strategic parents will tend to respond to evidence of benefits of being relatively older by delaying enrollment of their children. For the individual child this can signify an advantage in her school success that may (or may not) spill over into adult outcomes. However, variance-increasing effects of delaying entrance on social welfare may lead to relatively more unequal outcomes across children from different backgrounds. In the case of Portugal, if strategic parents become more responsive to evidence of relative gains to older children, this may lead to an advantage that can be perceived as unfair to those constrained in their choice.
As the legal option to defer entrance is mostly granted to those whose children are born between 16 September and 31 December, this leads to an unequal distribution of choice. Moreover, if even for parents of conditional children access to information and good professional judgment is unequally distributed, early enrollees-i.e. those that do not delay school entrance-may be disproportionately penalized. Taking our evidence at face value, early enrollees will have lower achievement and will also be more likely to repeat at least once during primary education, offsetting the potential future gain of entering the labor market one year earlier.
On the other hand, if school capacity constraints force parents to delay their children's entrance into formal schooling, another mechanism through which students may be delayed in the Portuguese education system, this too may have unintended consequences for ensuring equal opportunities. Children who start school a year later will typically remain in pre-school environments whose quality for learning will be more strongly correlated with family background. Many have argued that, insofar as pre-schooling conditions are unequal, delaying public schooling may well reproduce and amplify such initial conditions (Deming and Dynarski, 2008). Both parents and policymakers should thus appropriately weigh costs associated with an additional year of childcare outside formal schooling environments and shorter work careers. As an alternative to changing school entry laws, policymakers can also consider other ex ante measures, namely early childhood interventions aimed at addressing school readiness gaps across children from different socioeconomic groups.

Figure 6. Impact of SSA on Student Attainment
Notes: Figure is based on cohorts of students that entered a regular curriculum program in a public school in continental Portugal and were at least 5 years-old and at most 8 years-old when first enrolled in grade 1. Each marker represents a point estimate of the impact of school starting age on the probabilities of dropping out from school, opting for an academic track in high school and, conditional on having opted for an academic track in high school, having selected the science and technology stream. Point estimates are coefficients of local regressions, where the post-cutoff indicator (τ ) is the regressor of interest. Regressions include a piecewise linear or a piecewise quadratic function of birth dates (B) interacted with τ , depending on the specification indicated. A triangular kernel with data-driven optimal bandwidths is used. Regressions also control for cohort fixed effects as well as all other individual covariates reported in Table 1. Horizontal bars represent 95% confidence intervals for clustered standard errors at the birthday-level.

Figure 7. Impact of SSA on College Application Outcomes
Notes: Figure is based on cohorts of students that entered a regular curriculum program in a public school in continental Portugal and were at least 5 years-old and at most 8 years-old when first enrolled in grade 1. Point estimates are coefficients of local polynomial regressions. The regressor of interest is the post-cutoff indicator (τ ). In the left panel the dependent variables, where indicated, are dummies equal to one if the student applied to a public college, enrolled in the academic track in college, enrolled in a STEM course, failed to enroll in any course due to rejection, or was enrolled in application phase 1. In the right panel the dependent variables, where indicated, are standardized variables for college application scores and an index of course selectivity. Course selectivity is measured as an index of i) percentile rank of the pair college-course in terms of the application scores of accepted candidates, ii) percentile rank of the standard deviation of application scores of accepted candidates, and iii) acceptance rate of applications of each course in each higher education institution. The values of the latent variable are predicted through principal factor analysis, with results being later standardized to have mean zero and standard deviation of one. All regressions include a piecewise quadratic function of birth dates (B) interacted with τ . A triangular kernel with data-driven optimal bandwidths is used. Regressions also control for cohort fixed effects as well as all other individual covariates in Table 1. The horizontal bars represent 95% confidence intervals for clustered standard errors at the birthday-level.

Notes
1 The literature on the effects of school starting age emerges from such diverse contexts as the United States (e.g. Dhuey and Lipscomb, 2010;Elder and Lubotsky, 2009;Evans et al., 2010;Dobkin and Ferreira, 2010;Cascio and Schanzenbach, 2016;Cook and Kang, 2018)
5 Enrollment for conditionals is not the default option; an enrollment requirement by parents is a necessary condition to starting the school year before turning 6-years old. However, it is not a sufficient one, as it also depends on school capacity constraints.
6 Gelman and Imbens (2017) show why regression discontinuity estimation through local low-order polynomial approximations should be preferred to global polynomial regressions.
9 A student's track is lost when she moves abroad, drops from the education system altogether, or dies. We may also lose track of students if these move to a different public or private school and the matching algorithm is unable to correctly assign the unique identifier to new instances of the same student in the system.
education and training or artistic courses outside the scope of the regular curriculum between grades 1 and 12.
11 Appendix A documents in more detail all data treatment, including attrition rates for each of the cohorts.
12 The choice of a 60-days window to each side of the cutoff is justified by full coverage of the optimal bandwidths used for estimation later on.
13 School social support (ASE) in Portugal is tied to household financial constraints. Students identified as ASE have half- to fully-reduced-price meals at school, textbooks and school material.
14 The relatively sharp decline in density before the cutoff occurs during the Christmas period, with parents appearing to opt out of having children during the period. Although a full analysis of parental preferences for birth dates is beyond our scope, an analysis of the histogram of births across the whole year reveals clear seasonality in the data. In particular, we find that parents prefer to have children during September, 20 September being the most common birthday in our data.
171 319 (27%) applied to public higher education after graduating from high school. In the case of college application outcomes, we benefited from a hand-collected dataset that was available online for a given period of time and could be linked with the existing administrative datasets. We also trim the dataset for students with outlier ages by only keeping children that were at least 5 years old and at most 8 years old when first enrolled in Grade 1.
Course selectivity is measured as an index of i) percentile rank of the higher education institution-course pair in terms of the application scores of accepted candidates, ii) percentile rank of the standard deviation of application scores of accepted candidates, and iii) acceptance rate of applications. The values of the latent variable are predicted through principal factor analysis, with results being later standardized to have mean zero and standard deviation of one.
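The construction of such an index can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the first principal component of the standardized components is used here as a simple stand-in for principal factor analysis, the sign is fixed so that a higher index corresponds to higher application scores of accepted candidates, and the six courses in the example are invented. The function names are hypothetical.

```python
import numpy as np

def percentile_rank(x):
    """Rank of each value as a share of the sample (ties broken by position)."""
    x = np.asarray(x, dtype=float)
    ranks = x.argsort().argsort() + 1
    return ranks / x.size

def selectivity_index(score_mean, score_sd, acceptance_rate):
    """Standardized one-dimensional summary of the three selectivity components."""
    comps = np.column_stack([
        percentile_rank(score_mean),
        percentile_rank(score_sd),
        np.asarray(acceptance_rate, dtype=float),
    ])
    z = (comps - comps.mean(axis=0)) / comps.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)       # eigenvalues in ascending order
    loading = eigvecs[:, -1]                # first principal component
    if loading[0] < 0:                      # orient: higher scores -> higher index
        loading = -loading
    raw = z @ loading
    return (raw - raw.mean()) / raw.std()   # mean zero, standard deviation one

# Hypothetical courses: mean and SD of accepted scores, and acceptance rates.
index = selectivity_index(
    score_mean=[120, 150, 180, 135, 160, 170],
    score_sd=[20, 15, 8, 18, 12, 10],
    acceptance_rate=[0.9, 0.6, 0.1, 0.8, 0.4, 0.3],
)
print(index.round(2))
```

In this invented example the third course, with the highest accepted scores and the lowest acceptance rate, receives the highest index value.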
To construct the proxy variable for the household level of education we minimize problems with missing values by including the maximum level of education (in terms of years of study required) in the household (from either the father, the mother, or the legal guardian of the child, in case information for any of the parents is not available). All these characteristics are measured as of the school year in which the student is first enrolled in Grade 1.
In Figures 8, 9 and 10 below, we document the rates of attrition in our analysis datasets, for each of the three samples and years at which the students started school.
For longer panels (such as in Grades 6 and 9 samples) we tend to follow fewer students. In the administrative data, the track of a student is lost when she moves abroad, drops from the education system altogether, or dies. We may also lose track of students if these move to a different public or private school and the matching algorithm is unable to correctly assign the unique identifier to new instances of the same student in the system.
The main outcome variables in our regressions are constructed from the score points of students in Grades 4, 6 and 9 Math and Language national exams, as a proxy for student cognitive ability. Students sit, or sat, national exams or standardized achievement tests at the end of each of these grades. To the best of our knowledge, this is the only reliable assessment of cognitive ability that was systematically administered to a large cohort of children in Portugal. Exams are anonymized and scored by randomly selected evaluating teachers, who are not teachers of the students whose achievement is being scored.
The major advantage is that these tests were sat by the universe of eligible children by Grades 4, 6 and 9. Due to policy changes, Grade 4 exams were discontinued from the beginning of the 2015/16 school year onwards. Additionally, the score scale at which ability was measured has changed from school year 2012/2013 onwards, with the previous discrete scale (0-5) being insufficient to retain relevant variation across students. Given
Exam scores below 50 are considered a fail, and students get a level 2, a "negative" level, as it can lead to retention in the same grade. The raw distribution of exam scores strongly suggests that graders tend to upgrade scores that fall within a region of 5 points
While this sort of manipulation may be beneficial to students at the margin of threshold levels, it can significantly bias the analysis if one is to take unit changes in the exam score as informative of cognitive differences between individuals. In order to circumvent this concern, we collapse the 0-100 scale into a 0-20 scale. Such a scale has a couple of advantages. First, it still retains informative variation across students. Second, it almost entirely eliminates score manipulation bias and reduces noise. In the new scale, 5, instead of 1, underlying exam points are now considered informative of differences in student abilities. In other words, students that have 47, 48, 49, or 50 are considered within the
Course- and track-specific exams include the Grade 11 subject exams of Physics, Biology, Geometry, Economics, Philosophy and Geography, as well as Grade 12 subject exams of Language, Math A, Physics, Biology, History, Geometry, Drawing and Economics.
Notes: All coefficients are estimates of local regressions. The excluded instrument is the post-cutoff indicator (τ ). Both first and second stage regressions include a piecewise linear or quadratic function of birth dates (B) interacted with τ depending on the specification indicated. A triangular kernel with a data-driven MSE-optimal bandwidth is used. All regressions also control for cohort fixed effects, day of the week in which the student is born, as well as individual covariates in the form of indicator variables for gender (1 if female), immigrant status (1 if first generation immigrant), recipiency of school social support (1 if receiver), dad's unemployment status (1 if unemployed), access to computer at home (1 if yes), and fine-grained descriptions of the maximum level of education taken by the guardians of the child (e.g. primary education, lower secondary, bachelor degree, etc). Robust standard errors clustered at the birthday level are presented in parentheses.
Notes: All coefficients are estimates of local-quadratic regressions. The excluded instrument is the post-cutoff indicator (τ ). Both first and second stage regressions include a piecewise quadratic function of birth dates (B) interacted with τ , depending on the specification indicated. As indicated, a triangular kernel with a 30-days, 60-days and a data-driven MSE-optimal choice of bandwidth, allowing bandwidths before and after the cutoff to differ, are used. All regressions also control for cohort fixed effects as well as individual covariates in the form of indicator variables for gender (1 if female), immigrant status (1 if first generation immigrant), recipiency of school social support (1 if receiver), dad's unemployment status (1 if unemployed), access to computer at home (1 if yes), and fine-grained descriptions of the maximum level of education taken by the guardians of the child (e.g. primary education, lower secondary, bachelor degree, etc). Robust standard errors clustered at the birthday level are presented in parentheses.

C. Supplemental Figures

Figure 11. Compliance rates with the 1 January cutoff by grade sample
Notes: Figure is based on cohorts of students that entered a regular curriculum program in a public school in continental Portugal between 2007 and 2013, and were at least 5 years-old and at most 8 years-old when first enrolled in Grade 1. Lines represent the trend (by day of birth) in compliance rates, i.e., the ratio of the number of students who did not defer school entrance to the next school year, given the modal school starting age for students born on a given day, to the number of students born in each birth-date bin, by grade sample. The first dashed line represents the 16 September cutoff, while the second dashed line represents the 1 January cutoff.

Figure 13. Impact of delayed entrance eligibility on student achievement in Grades 11 and 12 national exams by subject -ITT
Notes: The left-hand panel refers to exams in grade 11. The right-hand panel refers to exams in grade 12. Point estimates are coefficients of local regressions. The regressor of interest is the post-cutoff indicator (τ ). All regressions include a piecewise quadratic function of birth dates (B) interacted with τ . A triangular kernel with data-driven optimal bandwidths is used. Regressions also control for cohort fixed effects as well as all other individual covariates. The bars represent 95% confidence intervals for clustered standard errors at the birthday-level.

Figure 14. Impact of school starting age on grade retention throughout high school -ITT
Notes: Point estimates are coefficients of local regressions. The regressor of interest is the post-cutoff indicator (τ ). All regressions include a piecewise quadratic function of birth dates (B) interacted with τ . A triangular kernel with data-driven optimal bandwidths is used. Regressions also control for cohort fixed effects as well as all other individual covariates.
The shaded area represents 95% confidence intervals for clustered standard errors at the birthday-level.

For each placebo cutoff the specification estimated is the one in Column 1 (for Math) and Column 2 (for Language) in Table 3, with varying MSE-optimal bandwidths. The vertical red line represents the position of the coefficient of the true cutoff at the distribution of placebo cutoffs. Randomization-based p-values, computed through the software package rdpermute and according to Ganong and Jäger (2018), are presented under the asymptotic p-values for the preferred specification at the true cutoff.

Table 4, with varying MSE-optimal bandwidths. The vertical red line represents the position of the coefficient of the true cutoff at the distribution of placebo cutoffs. Randomization-based p-values, computed through the software package rdpermute and according to Ganong and Jäger (2018), are presented under the asymptotic p-values for the preferred specification at the true cutoff.

Figure 17. Intent-to-treat effects by potential cutoff, subject and grade
Notes: Figure is based on cohorts of students that entered a regular curriculum program in a public school in continental Portugal between 2007 and 2013, and were at least 5 years-old and at most 8 years-old when first enrolled in Grade 1. Each panel shows estimated intent-to-treat effects for a total of 200 potential cutoffs (including the true one) along the y-axis. Cutoffs considered are all where B_i ∈ {−100, ..., 99}, spread along the x-axis. For each placebo cutoff the specification estimated is the one in Column 1 (for Math) and Column 2 (for Language) in Table 3, with varying MSE-optimal bandwidths. The vertical dashed line represents the position of the coefficient of the true cutoff at the distribution of placebo cutoffs.