---
title: "Import Metabolon Metabolomic Data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Import Metabolon Metabolomic Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Import Metabolon data

Read in the Metabolon data using the `read_metabolon` function. Here we read in the example data provided with the package as a list object.

```{r setup}
library(metaboprep)

# example file
filepath <- system.file("extdata", "metabolon_v1.2_example.xlsx", package = "metaboprep")

# import data as a list object rather than directly as a Metaboprep object
dat <- read_metabolon(filepath, sheet = 'OrigScale', return_Metaboprep = FALSE)
```

## Quick look at data structure of the imported data

```{r data_str}
str(dat)
```

## Create Metaboprep object

Once imported, we pass the data to the `Metaboprep()` function to build the `Metaboprep` class object.

```{r metaboprep_obj}
## This step could be avoided by setting return_Metaboprep = TRUE in the read_metabolon() call above.
mydata <- Metaboprep(data = dat$data, features = dat$features, samples = dat$samples)
```

## Quick summary of the metaboprep object

```{r summary_before_qc}
summary(mydata)
```

## Identify the Xenobiotics to exclude from the QC steps

Use the feature data just imported to identify xenobiotic metabolites. It may be best to exclude these features from the quality-control (QC) process. Xenobiotics typically exhibit much higher levels of missingness than endogenous metabolites, and including them in QC can result in excessive exclusion of both features and samples. Excluding them from the QC filtering steps allows you to retain these features in the final dataset.

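
To make the reasoning concrete, here is a minimal base-R sketch on made-up toy data. It is only an illustration of the idea, not how metaboprep implements its QC: the matrix `toy`, the vector `exempt`, and the strict `>` comparison against a 0.2 threshold are all assumptions chosen for the example. It shows how a feature-missingness filter would drop sparsely detected features unless they are exempted from the filter.

```{r toy_exempt_filter}
# Hypothetical toy data: 6 samples x 4 features (all values are made up).
# matrix() fills column by column, so each row of numbers below is one feature.
toy <- matrix(
  c(1.2, 0.9, 1.1, 1.0, 1.3, 1.1,   # endo_1: complete
    2.1, NA,  2.0, 2.2, 1.9, 2.3,   # endo_2: 1 of 6 missing
    NA,  NA,  0.4, NA,  NA,  NA,    # xeno_1: 5 of 6 missing
    NA,  0.7, NA,  NA,  0.6, NA),   # xeno_2: 4 of 6 missing
  nrow = 6,
  dimnames = list(paste0("s", 1:6), c("endo_1", "endo_2", "xeno_1", "xeno_2"))
)

# Fraction of missing values per feature (column).
feature_missingness <- colMeans(is.na(toy))

# Illustrative assumptions: features to exempt and a missingness threshold.
exempt <- c("xeno_1", "xeno_2")
threshold <- 0.2

fails_filter <- feature_missingness > threshold

# Features a plain missingness filter would remove.
dropped_without_exemption <- names(feature_missingness)[fails_filter]

# Features retained when the exempt features bypass the filter.
kept_with_exemption <- names(feature_missingness)[
  !fails_filter | names(feature_missingness) %in% exempt
]

dropped_without_exemption
kept_with_exemption
```

In this toy case both sparse features fail the filter on their own, while exempting them keeps all four features in the dataset.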
```{r xeno_identification}
xenos <- mydata@features[!is.na(mydata@features$super_pathway) &
                           mydata@features$super_pathway == "Xenobiotics",
                         "feature_id"]

## how many xenobiotics identified
length(xenos)
```

## QC Metabolon

Perform the QC steps using the `quality_control` function, specifying the xenobiotics to exclude from the QC steps.

```{r qc}
## Given the high missingness in Metabolon data,
## we suggest using the `least_missingness` feature selection method
## to identify the principal variables that are then
## used to construct the PCs.
mydata <- mydata |>
  quality_control(source_layer        = "input",
                  sample_missingness  = 0.2,
                  feature_missingness = 0.2,
                  total_peak_area_sd  = 5,
                  outlier_udist       = 5,
                  outlier_treatment   = "leave_be",
                  winsorize_quantile  = 1.0,
                  tree_cut_height     = 0.5,
                  pc_outlier_sd       = 5,
                  feature_selection   = "least_missingness", ## We suggest using `least_missingness` when working with data, like Metabolon, with high missingness. Default is "max_var_exp".
                  features_exclude_but_keep = xenos ## exclude xenobiotics from QC, but retain them in the final dataset
  )
```

## Quick summary of the metaboprep object following QC

```{r summary_after_qc}
summary(mydata)
```
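
## A note on the `!is.na()` guard in the xenobiotic selection

The xenobiotic selection earlier in this vignette combines `!is.na(super_pathway)` with the equality test. The following base-R sketch, on a hypothetical feature table with made-up IDs, illustrates why such a guard matters for a plain `data.frame`. Whether the `features` slot of a `Metaboprep` object behaves exactly like a plain `data.frame` is an assumption here, not something this vignette confirms.

```{r toy_na_guard}
# Hypothetical feature table (IDs and pathways are made up for illustration).
toy_features <- data.frame(
  feature_id    = c("f1", "f2", "f3", "f4"),
  super_pathway = c("Lipid", "Xenobiotics", NA, "Xenobiotics"),
  stringsAsFactors = FALSE
)

# Without the guard, comparing NA yields NA, and an NA row index
# returns a row of NAs, so the result contains a spurious NA.
unguarded <- toy_features[toy_features$super_pathway == "Xenobiotics", "feature_id"]

# With the guard, rows with a missing super_pathway are simply not selected.
guarded <- toy_features[!is.na(toy_features$super_pathway) &
                          toy_features$super_pathway == "Xenobiotics", "feature_id"]

unguarded
guarded
```

A spurious `NA` in the selected IDs would also inflate the count reported by `length()`, so the guarded form is the safer pattern when some features lack pathway annotation.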