Read in the Nightingale Health data using the read_nightingale
function. Here we read in the example data provided with the package
and convert it directly into an Omiprep S7 object.

str(mydata)
#> <omiprep::Omiprep>
#> @ data : num [1:150, 1:229, 1] 9.59e-11 3.97e-10 6.30e-11 7.68e-11 1.40e-10 ...
#> .. - attr(*, "dimnames")=List of 3
#> .. ..$ : chr [1:150] "sam_1" "sam_2" "sam_3" "sam_4" ...
#> .. ..$ : chr [1:229] "XXL-VLDL-P" "XXL-VLDL-L" "XXL-VLDL-PL" "XXL-VLDL-C" ...
#> .. ..$ : chr "input"
#> @ samples :'data.frame': 150 obs. of 254 variables:
#> .. $ sample_id : chr "sam_1" "sam_2" "sam_3" "sam_4" ...
#> .. $ informed_sample_type : chr NA NA NA NA ...
#> .. $ sample_excluded : chr NA NA NA NA ...
#> .. $ sample_notes : chr NA NA NA NA ...
#> .. $ edta_plasma : chr NA NA NA NA ...
#> .. $ citrate_plasma : chr NA NA NA NA ...
#> .. $ low_ethanol : chr NA NA NA NA ...
#> .. $ medium_ethanol : chr NA NA NA NA ...
#> .. $ high_ethanol : chr NA NA NA NA ...
#> .. $ isopropyl_alcohol : chr NA NA NA NA ...
#> .. $ 1methyl2pyrrolidone : chr NA NA NA NA ...
#> .. $ polysaccharides : chr NA NA NA NA ...
#> .. $ aminocaproic_acid : chr NA NA NA NA ...
#> .. $ low_glucose : chr NA NA NA NA ...
#> .. $ high_lactate : chr NA NA NA NA ...
#> .. $ high_pyruvate : chr NA NA NA NA ...
#> .. $ low_glutamine__high_glutamate : chr NA NA NA NA ...
#> .. $ gluconolactone : chr NA NA NA NA ...
#> .. $ low_protein : chr NA NA NA NA ...
#> .. $ unexpected_amino_acid_signals : chr NA NA NA NA ...
#> .. $ unidentified_macromolecules : chr NA NA NA NA ...
#> .. $ unidentified_small_molecule (a): chr NA NA NA NA ...
#> .. $ unidentified_small_molecule (b): chr NA NA NA NA ...
#> .. $ unidentified_small_molecule (c): chr NA NA NA NA ...
#> .. $ below_limit_of_quantification : chr NA NA NA NA ...
#> .. $ xxlvldlp : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xxlvldll : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xxlvldlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xxlvldlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xxlvldlce : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xxlvldlfc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xxlvldltg : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xlvldlp : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xlvldll : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xlvldlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xlvldlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xlvldlce : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xlvldlfc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xlvldltg : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lvldlp : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lvldll : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lvldlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lvldlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lvldlce : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lvldlfc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lvldltg : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ mvldlp : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ mvldll : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA 1 ...
#> .. $ mvldlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ mvldlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA 1 NA ...
#> .. $ mvldlce : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ mvldlfc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ mvldltg : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ svldlp : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ svldll : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ svldlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ svldlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA 1 NA ...
#> .. $ svldlce : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ svldlfc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA 1 NA ...
#> .. $ svldltg : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xsvldlp : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA 1 NA NA ...
#> .. $ xsvldll : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xsvldlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xsvldlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA 1 NA ...
#> .. $ xsvldlce : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xsvldlfc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xsvldltg : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ idlp : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA 1 NA NA NA NA NA ...
#> .. $ idll : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ idlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA 1 NA NA NA NA NA NA NA ...
#> .. $ idlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ idlce : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ idlfc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ idltg : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lldlp : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lldll : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA 1 NA NA NA NA ...
#> .. $ lldlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA 1 ...
#> .. $ lldlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lldlce : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lldlfc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ lldltg : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ mldlp : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA 1 NA NA NA NA NA NA ...
#> .. $ mldll : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
#> .. $ mldlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA 1 NA NA NA NA ...
#> .. $ mldlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ mldlce : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
#> .. $ mldlfc : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
#> .. $ mldltg : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ sldlp : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ sldll : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
#> .. $ sldlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ sldlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ sldlce : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA 1 NA ...
#> .. $ sldlfc : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
#> .. $ sldltg : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xlhdlp : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA 1 NA NA NA NA ...
#> .. $ xlhdll : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xlhdlpl : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. $ xlhdlc : Factor w/ 1 level "Value was rejected by automatic sample and measurement quality control": NA NA NA NA NA NA NA NA NA NA ...
#> .. [list output truncated]
#> @ features :'data.frame': 229 obs. of 6 variables:
#> .. $ feature_id : chr "XXL-VLDL-P" "XXL-VLDL-L" "XXL-VLDL-PL" "XXL-VLDL-C" ...
#> .. $ csv_column_name: chr "XXL_VLDL_P" "XXL_VLDL_L" "XXL_VLDL_PL" "XXL_VLDL_C" ...
#> .. $ biomarker_name : chr "Concentration of chylomicrons and extremely large VLDL particles" "Total lipids in chylomicrons and extremely large VLDL" "Phospholipids in chylomicrons and extremely large VLDL" "Cholesterol in chylomicrons and extremely large VLDL" ...
#> .. $ unit : chr "mol/l" "mmol/l" "mmol/l" "mmol/l" ...
#> .. $ group : chr "Lipoprotein subclasses" "Lipoprotein subclasses" "Lipoprotein subclasses" "Lipoprotein subclasses" ...
#> .. $ subgroup : chr "Chylomicrons and extremely large VLDL" "Chylomicrons and extremely large VLDL" "Chylomicrons and extremely large VLDL" "Chylomicrons and extremely large VLDL" ...
#> @ exclusions :List of 2
#> .. $ samples :List of 5
#> .. ..$ user_excluded : chr(0)
#> .. ..$ extreme_sample_missingness : chr(0)
#> .. ..$ user_defined_sample_missingness : chr(0)
#> .. ..$ user_defined_sample_totalpeakarea: chr(0)
#> .. ..$ user_defined_sample_pca_outlier : chr(0)
#> .. $ features:List of 4
#> .. ..$ user_excluded : chr(0)
#> .. ..$ extreme_feature_missingness : chr(0)
#> .. ..$ user_defined_feature_missingness: chr(0)
#> .. ..$ user_defined_feature_skewness : chr(0)
#> @ feature_summary: num[0 , 0 , 0 ]
#> @ sample_summary : num[0 , 0 , 0 ]

Perform the QC steps using the quality_control function.
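
The str() output above shows the data stored as a three-dimensional array (samples by features by layer) whose third dimension currently holds a single layer named "input". As a rough illustration of that layout, and of what adding a second layer could look like, here is a minimal base R sketch. It does not use the package: the small dimensions, the random values, and the object names `input`, `layered`, and `qc_layer` are assumptions made only for this example.

```r
# Stand-in for the @data property: samples x features x layers.
# The values and the small dimensions are made up for illustration.
set.seed(1)
input <- array(
  runif(12),
  dim = c(4, 3, 1),
  dimnames = list(
    paste0("sam_", 1:4),
    c("XXL-VLDL-P", "XXL-VLDL-L", "XXL-VLDL-PL"),
    "input"
  )
)

# Append a copy of the "input" slice as a second layer named "qc".
layered <- array(
  c(input, input),
  dim = c(4, 3, 2),
  dimnames = c(dimnames(input)[1:2], list(c("input", "qc")))
)

# Select one layer by name; the result is an ordinary samples x features matrix.
qc_layer <- layered[, , "qc"]

dim(layered)
dimnames(layered)[[3]]
```

Addressing a layer by name in this way is presumably what the source_layer = "input" argument in the call that follows refers to, although how the package handles layers internally is not shown here.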
mydata <- mydata |>
  quality_control(
    source_layer = "input",
    sample_missingness = 0.2,
    feature_missingness = 0.2,
    total_sum_abundance_sd = 5,
    outlier_udist = 5,
    outlier_treatment = "leave_be",
    winsorize_quantile = 1.0,
    tree_cut_height = 0.5,
    pc_outlier_sd = 5,
    feature_selection = "max_var_exp",
    features_exclude_but_keep = NULL,
    cores = 1
  )
#>
#> ── Starting Omics QC Process ───────────────────────────────────────────────────
#> ℹ Validating input parameters
#> ✔ Validating input parameters [6ms]
#>
#> ℹ Sample & Feature Summary Statistics for raw data
#> ℹ Number of informative PCs (Scree acceleration factor): 2
#> ✔ Sample & Feature Summary Statistics for raw data [2.1s]
#>
#> ℹ Copying input data to new 'qc' data layer
#> ✔ Copying input data to new 'qc' data layer [16ms]
#>
#> ℹ Assessing for extreme sample missingness >=80% - excluding 0 sample(s)
#> ✔ Assessing for extreme sample missingness >=80% - excluding 0 sample(s) [17ms]
#>
#> ℹ Assessing for extreme feature missingness >=80% - excluding 0 feature(s)
#> ✔ Assessing for extreme feature missingness >=80% - excluding 0 feature(s) [17m…
#>
#> ℹ Assessing for sample missingness at specified level of >=20% - excluding 0 sa…
#> ✔ Assessing for sample missingness at specified level of >=20% - excluding 0 sa…
#>
#> ℹ Assessing for feature missingness at specified level of >=20% - excluding 0 f…
#> ✔ Assessing for feature missingness at specified level of >=20% - excluding 7 f…
#>
#> ℹ Calculating total sum abundance outliers at +/- 5 Sdev - excluding 0 sample(s)
#> ✔ Calculating total sum abundance outliers at +/- 5 Sdev - excluding 0 sample(s…
#>
#> ℹ Running sample data PCA outlier analysis at +/- 5 Sdev
#> ✔ Running sample data PCA outlier analysis at +/- 5 Sdev [18ms]
#>
#> ℹ Sample PCA outlier analysis - re-identify feature independence and PC outlier…
#> ℹ Number of informative PCs (Scree acceleration factor): 2
#> ! The stated max PCs [max_num_pcs=10] to use in PCA outlier assessment is greater than the number of available informative PCs [2]
#> ✔ Sample PCA outlier analysis - re-identify feature independence and PC outlier…
#>
#> ℹ Creating final QC dataset...
#> ℹ Number of informative PCs (Scree acceleration factor): 2
#> ── Step timings ──
#> step                        seconds  pct
#> validation                     0.00  0.0
#> summarise_raw                  2.09 32.2
#> copy_layer                     0.00  0.0
#> extreme_sample_missingness     0.00  0.0
#> extreme_feature_missingness    0.00  0.0
#> sample_missingness             0.00  0.0
#> total_sum_abundance            0.02  0.3
#> summarise_pca                  2.07 31.9
#> summarise_final                2.11 32.5
#> total                          6.49 99.9
#> ✔ Creating final QC dataset... [2.2s]
#>
#> ℹ 'Omics QC Process Completed
#> ✔ 'Omics QC Process Completed [14ms]

summary(mydata)
#> Omiprep Object Summary
#> --------------------------
#> Samples : 150
#> Features : 229
#> Data Layers : 2
#> Layer Names : input, qc
#>
#> Sample Summary Layers : input, qc
#> Feature Summary Layers: input, qc
#>
#> Sample Annotation (metadata):
#> Columns: 256
#> Names : sample_id, informed_sample_type, sample_excluded, sample_notes, edta_plasma, citrate_plasma, low_ethanol, medium_ethanol, high_ethanol, isopropyl_alcohol, 1methyl2pyrrolidone, polysaccharides, aminocaproic_acid, low_glucose, high_lactate, high_pyruvate, low_glutamine__high_glutamate, gluconolactone, low_protein, unexpected_amino_acid_signals, unidentified_macromolecules, unidentified_small_molecule (a), unidentified_small_molecule (b), unidentified_small_molecule (c), below_limit_of_quantification, xxlvldlp, xxlvldll, xxlvldlpl, xxlvldlc, xxlvldlce, xxlvldlfc, xxlvldltg, xlvldlp, xlvldll, xlvldlpl, xlvldlc, xlvldlce, xlvldlfc, xlvldltg, lvldlp, lvldll, lvldlpl, lvldlc, lvldlce, lvldlfc, lvldltg, mvldlp, mvldll, mvldlpl, mvldlc, mvldlce, mvldlfc, mvldltg, svldlp, svldll, svldlpl, svldlc, svldlce, svldlfc, svldltg, xsvldlp, xsvldll, xsvldlpl, xsvldlc, xsvldlce, xsvldlfc, xsvldltg, idlp, idll, idlpl, idlc, idlce, idlfc, idltg, lldlp, lldll, lldlpl, lldlc, lldlce, lldlfc, lldltg, mldlp, mldll, mldlpl, mldlc, mldlce, mldlfc, mldltg, sldlp, sldll, sldlpl, sldlc, sldlce, sldlfc, sldltg, xlhdlp, xlhdll, xlhdlpl, xlhdlc, xlhdlce, xlhdlfc, xlhdltg, lhdlp, lhdll, lhdlpl, lhdlc, lhdlce, lhdlfc, lhdltg, mhdlp, mhdll, mhdlpl, mhdlc, mhdlce, mhdlfc, mhdltg, shdlp, shdll, shdlpl, shdlc, shdlce, shdlfc, shdltg, xxlvldlpl_pct, xxlvldlc_pct, xxlvldlce_pct, xxlvldlfc_pct, xxlvldltg_pct, xlvldlpl_pct, xlvldlc_pct, xlvldlce_pct, xlvldlfc_pct, xlvldltg_pct, lvldlpl_pct, lvldlc_pct, lvldlce_pct, lvldlfc_pct, lvldltg_pct, mvldlpl_pct, mvldlc_pct, mvldlce_pct, mvldlfc_pct, mvldltg_pct, svldlpl_pct, svldlc_pct, svldlce_pct, svldlfc_pct, svldltg_pct, xsvldlpl_pct, xsvldlc_pct, xsvldlce_pct, xsvldlfc_pct, xsvldltg_pct, idlpl_pct, idlc_pct, idlce_pct, idlfc_pct, idltg_pct, lldlpl_pct, lldlc_pct, lldlce_pct, lldlfc_pct, lldltg_pct, mldlpl_pct, mldlc_pct, mldlce_pct, mldlfc_pct, mldltg_pct, sldlpl_pct, sldlc_pct, sldlce_pct, sldlfc_pct, sldltg_pct, xlhdlpl_pct, xlhdlc_pct, xlhdlce_pct, 
#> xlhdlfc_pct, xlhdltg_pct, lhdlpl_pct, lhdlc_pct, lhdlce_pct, lhdlfc_pct, lhdltg_pct, mhdlpl_pct, mhdlc_pct, mhdlce_pct, mhdlfc_pct, mhdltg_pct, shdlpl_pct, shdlc_pct, shdlce_pct, shdlfc_pct, shdltg_pct, vldl_size, ldl_size, hdl_size, totalc, vldlc, remnantc, ldlc, hdlc, hdl2c, hdl3c, totalce, totalfc, totaltg, vldltg, ldltg, hdltg, dag, dagtg, totalpg, tgpg, pc, sm, cho, apoa1, apob, apobapoa1, totalfa, falen, unsat, dha, la, cla, omega3, omega6, pufa, mufa, sfa, dha_pct, la_pct, cla_pct, omega3_pct, omega6_pct, pufa_pct, mufa_pct, sfa_pct, glc, lac, cit, ala, gln, his, ile, leu, val, phe, tyr, acetate, bohbut, crea, alb, glyca, reason_excluded, excluded
#>
#> Feature Annotation (metadata):
#> Columns: 8
#> Names : feature_id, csv_column_name, biomarker_name, unit, group, subgroup, reason_excluded, excluded
#>
#> Exclusion Codes Summary:
#>
#> Sample Exclusions:
#> Exclusion | Count
#> -----------------
#> user_excluded | 0
#> extreme_sample_missingness | 0
#> user_defined_sample_missingness | 0
#> user_defined_sample_totalpeakarea | 0
#> user_defined_sample_pca_outlier | 0
#>
#> Feature Exclusions:
#> Exclusion | Count
#> -----------------
#> user_excluded | 0
#> extreme_feature_missingness | 0
#> user_defined_feature_missingness | 7
#> user_defined_feature_skewness | 0
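
The seven features listed under user_defined_feature_missingness were removed by the feature_missingness = 0.2 threshold, which the QC log reports as ">=20%". As a rough illustration of what such a threshold means, the base R sketch below computes the fraction of missing values per feature and per sample on a toy matrix and flags anything at or above 0.2. It is an assumption about the general idea, not the package's implementation: the matrix `x` and the names `threshold`, `features_flagged`, and `samples_flagged` are hypothetical, and the package may apply its checks sequentially rather than independently as done here.

```r
# Toy matrix: 8 samples (rows) x 8 features (columns); values are made up.
x <- matrix(
  1,
  nrow = 8, ncol = 8,
  dimnames = list(paste0("sam_", 1:8), paste0("feat_", 1:8))
)
x[1, "feat_2"]   <- NA  # feat_2: 1 of 8 values missing (12.5%)
x[1:2, "feat_3"] <- NA  # feat_3: 2 of 8 values missing (25%)
x[1:4, "feat_4"] <- NA  # feat_4: 4 of 8 values missing (50%)

threshold <- 0.2  # same value as the missingness arguments in the QC call

# Fraction of missing values per feature (column) and per sample (row).
feature_missing <- colMeans(is.na(x))
sample_missing  <- rowMeans(is.na(x))

# Flag anything at or above the threshold, mirroring the ">=" in the log.
features_flagged <- names(feature_missing)[feature_missing >= threshold]
samples_flagged  <- names(sample_missing)[sample_missing >= threshold]

features_flagged
samples_flagged
```

In this toy case feat_2 stays below the threshold and is kept, while feat_3 and feat_4 are flagged; the same comparison applied to rows flags the two samples with the most missing values.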