Health risks and genetic architecture of objectively measured multidimensional sleep health
Study participants
The UK Biobank is a large prospective cohort study of over 500,000 middle-aged adults in Great Britain69. The initial assessment was conducted during 2006–2010. The UK Biobank has received approval from the North West Multi-centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB). Written informed consent was obtained from all participants. Between May 2013 and December 2015, about 240,000 participants were invited to wear the Axivity AX3 wrist-worn triaxial accelerometer on their dominant wrists for seven consecutive days. A total of 103,720 participants returned their activity monitors with data covering at least three complete 24-h periods. After excluding those who withdrew from the study, we obtained raw activity monitor data for 103,626 participants (data-field 90001) in the form of binary Continuous Wave Accelerometer (cwa) files. After applying data quality control procedures recommended by the UK Biobank accelerometer working group and following data processing approaches used in previous studies based on UK Biobank accelerometry data, a total of 85,233 participants were retained for the final analysis67,68 (Supplementary Fig. 18).
For the whole-exome and whole-genome sequence analyses (UK Biobank Field #23158 and #24304), a series of quality control procedures were implemented to filter out low-quality variants and samples (see the corresponding section below for details). We did not use any study design that required randomization or blinding.
Accelerometer-based sleep variables
The GGIR R package was utilized to extract raw accelerometer data, enabling thorough quantitative assessments of sleep patterns70. This tool uses an algorithm validated against polysomnograms in an external cohort to accurately detect sleep periods without the need for a sleep diary, thus minimizing potential biases11. Median z-angle changes over 5-min rolling windows across a 24-h period were calculated to ensure activity monitor orientation insensitivity. Inactivity bouts lasting ≥30 min were documented, with consecutive bouts <60 min apart combined into blocks. The sleep period time-window (SPT-window) was defined by the onset and conclusion of the longest continuous inactivity block.
Building on the SPT, variables such as sleep onset time, wake-up time, sleep midpoint, sleep duration, and sleep efficiency were derived. Utilizing the time series of raw accelerometer data and an extended cosine model, circadian rhythm analyses were conducted to obtain metrics such as intra-daily variability (IV), inter-daily stability (IS), relative amplitude (RA), amplitude, acrophase, midline estimating statistic of rhythm (MESOR), and pseudo-F71,72. Detailed definitions and significance of the sleep variables extracted in this study are provided in Supplementary Data 37. Sleep variables derived from the SPT-window underwent thorough screening to exclude outliers, ensuring the inclusion of valid data for statistical analysis. Based on the data quality metrics provided by the UK Biobank accelerometer working group, the exclusion criteria were as follows: (1) data not well calibrated; (2) data with poor wear time; (3) unreliable data size; (4) data affected by daylight savings crossover; (5) data not calibrated on own data; (6) data with interrupted recording periods >0; and (7) data with excessively numerous recording errors (>Q3 + 1.5 × IQR). Individuals were additionally excluded if their mean sleep duration was shorter than 3 h or longer than 12 h, if they had fewer than 5 or >30 mean sleep episodes per night, or if they had fewer than 3 valid days of data67,68. Detailed configuration parameters for GGIR used in this study can be found in Supplementary Data 38.
Although accelerometry algorithms provide daily sleep-wake characteristics, we focused on summary measures across the observation period to examine habitual sleep patterns. These summary measures are commonly used in sleep health research due to their interpretability and their effectiveness as predictors of key health outcomes. As accelerometers only capture lack of movement, GGIR refers to classified sleep periods as sustained inactivity bouts (SIB), making it unable to accurately identify naps. Therefore, in the absence of sleep diary entries during accelerometer wear, nap-related variables cannot be extracted. We selected accelerometer summary variables for EFA based on their relevance to current research on sleep health and their clinical or scientific significance6,72.
Phenotypes and mortality
This work used data provided by patients and collected by the NHS as part of their care and support. We leveraged the full available historical EHR data spanning multiple decades to identify the earliest diagnosis dates for each participant and phenotype. Diagnosis events were assessed in subjects starting 1 year after accelerometer monitoring through linkage to EHRs. Events extracted from hospital inpatient data, death register, and cancer register were categorized into phenotypes (phecodes) using their corresponding International Classification of Diseases (ICD) codes with the Phecode Map 1.221,22. The phecode map provides exclusion criteria for each phenotype, identifying similar conditions that may suggest the likelihood of undiagnosed patients with the phenotype under consideration. An example of applying phecode exclusion criteria is demonstrated in a type 2 diabetes study using EHRs21. To define cases of type 2 diabetes, patients with ICD codes mapping to phecode 250.2 “Type 2 diabetes” were included. For the control cohort, only participants without phenotypes in the “Diabetes” group (phecodes 249-250.99) were included, preventing contamination by diseases such as “Type 1 diabetes” (phecode 250.1) and “Secondary diabetes mellitus” (phecode 249). Additionally, participants with signs and symptoms commonly associated with type 2 diabetes, like “Abnormal glucose” (phecode 250.4), were excluded to avoid including those who may be undiagnosed. In this study, all subjects meeting any exclusion criteria for a phenotype were excluded from the analysis of that phenotype to avoid including likely or potential prior cases. Detailed exclusion criteria for each selected phenotype are provided in Supplementary Data 39. Subjects excluded from one phecode analysis were not excluded from others unless they also met the exclusion criteria for those phecodes.
Among the 502,250 participants in the UK Biobank, we identified 1695 specific phecodes. Of the 85,233 subjects meeting the inclusion criteria for this study, we retained samples where genetic sex matched the recorded sex and excluded individuals with mismatched sex-specific diagnosis codes or those who withdrew. Consequently, 1633 disease phenotypes were extracted from EHRs at the time of data download on January 8, 2024, and grouped according to the phecode map, with an average follow-up of 7.92 years. To exclude participants with subclinical disease at the time of accelerometer monitoring, we limited enrollment to those whose first diagnostic event occurred at least 1 year after accelerometer wear, resulting in 526 phecodes with at least 200 cases.
Date of death and primary cause of death were obtained through linkage to national death registries. Cardiovascular disease (CVD) mortality was defined by ICD-10 codes: I00-I99.
Covariates
Most participants completed a touchscreen questionnaire and underwent anthropometric assessments at initial recruitment. Subsequently, a subset of included participants in the present study also engaged in first repeat assessments, imaging visit, and first repeat imaging visit (n = 7464, n = 22,822, and n = 2611, respectively). The time differences and number of participants between accelerometry and the four recruitment assessments for the included study participants are detailed in Supplementary Data 40 and Supplementary Fig. 19. Covariates included age, sex, ethnicity (white or non-white), TDI, education (no qualification, any other qualification, degree or above), diet (poor, ideal), physical activity, smoking status (never, previous, current), alcohol consumption (never, previous, current), self-rated health (excellent/good, fair/poor), body mass index (BMI), and season (spring for March to May, summer for June to August, autumn for September to November, and winter for December to February; UK Meteorological Office definitions) (Supplementary Data 41). A healthy diet was defined as consuming at least 5 out of 10 recommended food groups9,73. Intake goals for each food group are detailed in Supplementary Data 42. BMI categories were classified as normal/underweight (<25 kg/m²), overweight (25–30 kg/m²), and obese (≥30 kg/m²). Physical activity was quantified using accelerometer-derived moderate-to-vigorous physical activity (MVPA) duration in minutes, with a threshold of 100 mg for MVPA classification.
Covariate data from the interview conducted closest in time prior to the accelerometry assessment were used as the baseline for this analysis. Exceptions included self-reported sex and TDI, which were obtained only at baseline, as well as self-reported ethnicity (assumed unchanged). Smoking status, alcohol consumption, and education remained stable between baseline and additional visits, while diet, self-rated health, and BMI exhibited some variability over time (Supplementary Figs. 20–22). In the final analysis, time-varying covariates comprised 90.53% from the initial assessment, 7.94% from the first repeat assessment, and 1.53% from the imaging visit (Supplementary Data 43). The number and combinations of missing covariates are shown in Supplementary Data 44 and Supplementary Fig. 23. The highest proportion of missing covariates does not exceed 1% (0–0.74%). We performed multiple imputations to assign missing covariate values using the mice package in R.
Statistical analyses
EFA was employed to extract representative sleep domains using selected sleep variables from accelerometer. We used the principal function in the psych package with a “quartimax” rotation (an orthogonal rotation), without imposing specific domains on the data. The number of factors was determined by examining eigenvalues, visually inspecting scree plots, and considering our hypotheses. Variables were assigned to a factor if they had a loading >0.50. If a variable loaded on multiple factors, we assigned it to the factor with the highest loading. If the highest and second highest loadings were close, we considered the factor’s interpretability for the assignment. Subsequently, we used the standardized factor scores (mean of 0 and standard deviation of 1) derived from the factor analysis as new variables for LPA using the mclust package in R to identify distinct sleep patterns in the participants. The number of latent profiles was selected based on the integrated completed likelihood criterion.
In the PheWAS analysis, a Cox proportional hazards model was conducted for each phecode with at least 200 cases. This threshold was determined through a power analysis via simulation to detect a 0.2 log (hazard ratio) effect size with 80% power74. The endpoint was the diagnosis of the phenotype, with censoring by death or the end of data collection. The timescale used was years since the accelerometer measurement. The independent variable was the sleep profile identified from LPA. To test the proportionality of hazards assumption, Schoenfeld residuals were examined using the cox.zph function. For variables displaying significant (P < 0.05) non-constant violations, a second accelerated failure time model was rerun and compared with the original model to determine the optimal fit. Bonferroni and FDR adjusted significance thresholds were utilized. To check for sex- and age-specific effects, we further conducted subgroup analyses based on sex and age (<65 years, ≥65 years).
To examine the associations between upstream environmental factors and sleep profiles, we used multivariable logistic regression analysis with USP as the dependent variable.
We also conducted several sensitivity analyses. First, we analyzed the relationship between USP and outcomes in a population without missing covariates. Second, we further excluded individuals with a history of shift work from the complete data set. Third, to address potential reverse causality, we excluded individuals who died within 1 year after wearing the accelerometer and analyzed the relationship between sleep profile and risks of all-cause and CVD mortality. Fourth, we applied restricted cubic spline models to examine the shape and strength of associations between each of the five sleep domains constituting USP and 526 disease phenotypes, allowing for potential non-linear relationships. Fifth, we derived a continuous healthy sleep score by linearly combining the harmonized factor scores of the five USP domains, weighted by their proportion of variance explained in the factor analysis. We then conducted PheWAS using Cox models to estimate hazard ratios per standard deviation increase in the healthy sleep score, with covariate adjustment consistent with the binary USP analyses. Finally, follow-up analyses of genome-wide significant loci from the primary analyses were conducted, excluding shift workers and users of sleep or psychiatric medications. These analyses incorporated additional covariate adjustments for smoking status, alcohol consumption, marital status, education, season of accelerometer wear, TDI, BMI, sleep apnea, sleepiness, and ease of getting up, in addition to the initial adjustments for age, sex, and 10 principal components75,76,77. Definitions of additional covariates and the list of medications can be found in the Supplementary Note. All P-values were two-sided and analyses were conducted using R v4.3.3.
Calculating PGS
PGS for USP were derived using individual-level genotype data from the UK Biobank dataset ukb22418. Variant-level quality control (QC) was performed using PLINK, excluding SNPs with minor allele frequency (MAF) < 1%, minor allele count (MAC) < 100, genotype missingness >10%, or Hardy-Weinberg equilibrium \(P < 1\times 10^-15\). Individuals with >10% missing genotypes were also excluded.
Following QC, the dataset was randomly partitioned into a training set (80%) for GWAS and a tuning set (20%) for PGS parameter optimization. GWAS was conducted in the training set using Regenie78 under a logistic regression framework with Firth correction to account for case-control imbalance. GWAS summary statistics from this analysis were used to construct the USP PGS using a clumping and thresholding (C + T) approach implemented in RICE79. Linkage disequilibrium (LD) clumping was performed in PLINK80 with an r² threshold of 0.1 within a 500 kb window. PRSs were generated at nine GWAS P-value thresholds: \(P < 5\times 10^-8\), \(5\times 10^-7\), \(5\times 10^-6\), \(5\times 10^-5\), \(5\times 10^-4\), \(5\times 10^-3\), 0.05, 0.5, and 1.0. The optimal P-value threshold (\(P < 0.05\)) for inclusion of SNPs in the final score was selected based on performance in the tuning set. The final PGS was then computed in the study cohort and used to assess the association between polygenic risk of USP and phenotypes identified in the PheWAS analysis.
MR
We conducted two-sample MR analyses using the TwoSampleMR81 R package to investigate the potential causal effects of USP-associated genetic variants on the USP-linked phenotypes. Genetic instruments for the exposure (USP) were derived from our single variant analysis based on whole genome sequencing (WGS) data from the UK Biobank (Field #24304), as described below. For the outcomes, GWAS summary statistics were downloaded from the GWAS Catalog. To ensure consistency with the disease phenotype definitions used in our PheWAS, we selected outcome GWASs based on phecode-matched traits and included only those conducted in European ancestry populations82. Genetic variants reaching a P < 5 × 10⁻6 threshold in the USP WGS single variant analysis were first clumped to ensure independence (r² < 0.01 within 10 Mb, using 1000 Genomes EUR as the reference panel). The alleles of variants were harmonized between the exposure and outcome datasets using the default allele alignment method (action = 2) to account for palindromic variants based on allele frequencies.
For each of the phenotype, we applied five MR methods: IVW, MR Egger, weighted median, weighted mode, and simple mode. IVW served as the primary analysis method, while the others were used as sensitivity analyses to assess the robustness of the results under different assumptions about instrument validity. We further conducted heterogeneity testing using Cochran’s Q test, evaluated horizontal pleiotropy via the MR Egger intercept, and calculated the mean F-statistic to assess instrument strength. Additionally, leave-one-out analyses were performed to identify influential variants that might disproportionately affect the results.
Whole exome sequence analysis of coding variants associated with USP
We used the PLINK format files for WES data of UK Biobank participants (UK Biobank Field #23158). Quality control measures were performed in the following steps. We first removed the variants with Hardy-Weinberg Equilibrium \(P < 1\times 10^-15\). Second, we removed variants for which > 10% of all genotypes for that variant had a read depth < 10 (ukb23158_500k_OQFE.90pct10dp_qc_variants.txt). We finally excluded the variants with > 10% missing genotypes and the samples with 10% missing genotypes.
We used STAARpipeline to perform genetic association analysis of USP. STAARpipeline is a regression-based framework that allows for adjustment of covariates, population structure, and relatedness by fitting linear and logistic mixed models for quantitative and dichotomous traits27,28. Specifically, we fitted a logistic mixed model adjusting for age, sex, the first 10 ancestral principal components to account for population structure, and a variance component for a sparse genetic relatedness matrix to account for sample relatedness83.
For single variant analysis, we calculated individual P-values of variants with MAC ≥ 40. We first used the normal approximation to calculate the P-value, and when it is <0.05, we applied the saddlepoint approximation to recalculate it84,85. The gene-centric coding analysis of variants, including both single-nucleotide variants (SNVs) and indels, provided seven coding functional categories of protein coding genes, including putative loss of function (stop gain, stop loss and splice) variants, missense variants, disruptive missense variants, putative loss of function and disruptive missense variants, synonymous variants, protein-truncating RVs (stop gain, stop loss, splice, frameshift deletion and frameshift insertion), and protein-truncating RVs and disruptive missense RVs. The putative loss of function, missense, synonymous, and protein-truncating RVs were defined by GENCODE VEP categories86,87. The disruptive variants were further defined by MetaSVM88, which measures the deleteriousness of missense mutations. For each variant set, we calculated the STAAR-Burden P-value. Same as the single variant analysis, we first used the normal approximation to calculate the P-value, and when it is <0.05, we applied the saddlepoint approximation to recalculate it.
Whole genome sequence analysis of noncoding variants associated with USP
We used the pVCF format files for WGS data of UK Biobank participants (UK Biobank Field #24304)89. We followed the same quality control procedure in previous study of UK Biobank WGS data90. We kept all variants with pass indicated by QC label and AAScore >0.5, where AAScore was generated by GraphTyper, the software used by the UK Biobank to perform genotype calling.
We used STAARpipeline27 to perform genetic association analysis of USP. We fitted the null model in the same way as the WES analysis. For single variant analysis, we calculated individual P-values of variants with MAC ≥ 40. The gene-centric noncoding analysis provided eight genetic categories of SNVs, including promoter or enhancer overlaid with CAGE or DHS sites, UTR, upstream, downstream of protein coding genes, and noncoding RNA genes. The promoter RVs were defined as RVs in the +/- 3-kilobase (kb) window of transcription start sites with the overlap of CAGE sites or DHS sites. The enhancer RVs were defined as RVs in GeneHancer predicted regions with the overlap of CAGE sites or DHS sites91,92,93,94. We defined the UTR, upstream, downstream, and ncRNA RVs by GENCODE Variant Effect Predictor (VEP) categories86,87. For the UTR mask, we included RVs in both 5’ and 3’ UTR regions. For the ncRNA mask, we included the exonic and splicing ncRNA rare SNVs. We considered the protein-coding gene for the first seven categories provided by Ensembl95 and the ncRNA genes provided by GENCODE86,87. We incorporated nine annotation Principal Components (aPCs)28 and three integrative scores (CADD96, LINSIGHT97, and FATHMM-XF98) as weights in constructing STAAR-Burden statistics28.
Genome build
All genome coordinates are given in NCBI GRCh38/UCSC hg38.
Replication of the USP framework
To further examine the robustness and generalizability of the USP framework, we validated our findings in an independent multi-ethnic cohort, the MESA (BioLINCC Application ID: 15959). We selected the MESA cohort for validation due to its diverse demographic composition and independent data collection setting, which provided a meaningful test of the framework’s robustness across different populations and measurement environments. This cohort included 2237 participants who wore wrist-worn actigraphy devices (Actiwatch Spectrum, Philips Respironics) for up to seven consecutive days between 2010–2012. All recordings were scored by trained technicians at the Boston Sleep Reading Center. Institutional Review Boards of all participating MESA sites approved the study, and all participants signed informed consent. The validation analysis included 2152 participants aged 54–93 years with complete data. The sample comprised 37.4% white, 11.3% Chinese, 27.8% African American, and 23.4% Hispanic individuals.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
link
