Database Open Access
CGMacros: a scientific dataset for personalized nutrition and diet monitoring
Ricardo Gutierrez-Osuna, David Kerr, Bobak Mortazavi, Anurag Das
Published: Jan. 28, 2025. Version: 1.0.0
When using this resource, please cite:
Gutierrez-Osuna, R., Kerr, D., Mortazavi, B., & Das, A. (2025). CGMacros: a scientific dataset for personalized nutrition and diet monitoring (version 1.0.0). PhysioNet. https://doi.org/10.13026/3z8q-x658.
Anurag Das, David Kerr, Namino Glanz, Wendy Bevier, Rony Santiago, Ricardo Gutierrez-Osuna, and Bobak Mortazavi, "CGMacros: a scientific dataset for personalized nutrition and diet monitoring," Scientific Data (under review)
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
We present CGMacros, a dataset containing multimodal information from two continuous glucose monitors (CGM), food macronutrients, food photographs, and physical activity, in addition to anonymized participant demographics, anthropometric measurements and health parameters from blood analyses and gut microbiome profiles. CGMacros contains data from 45 study participants (15 healthy adults, 16 with pre-diabetes, and 14 with Type 2 diabetes) who consumed meals with varying and known macronutrient compositions in a free-living setting for ten consecutive days. To our knowledge, this is the first database of its kind to be made publicly available. CGMacros, and larger publicly available datasets that we hope may follow, are essential to democratize academic research in personalized nutrition and algorithmic approaches to automated diet monitoring.
Background
Poor dietary habits are a major contributor to the development of chronic diseases such as type 2 diabetes, obesity, heart disease, and some cancers [1-3]. A recent survey examining food consumption across 195 countries estimated that improving diet could prevent one in every five deaths globally [4]. Using several dietary risk factors (e.g., diet high in sodium, or low in fiber), the study concluded that poor diet was responsible for more deaths than any other risk factor globally, including tobacco smoking. Therefore, monitoring food intake is an important step toward maintaining a healthy diet and preventing chronic diseases later in life. Current methods for diet monitoring are based on self-report measures (e.g., food journals and mobile apps). However, these methods often require prolonged manual entry, which is cumbersome and error-prone [5-7].
The broad availability of wearable sensors and fitness trackers has led to the development of automated methods to detect and recognize moments of food intake. These sensors include smartwatches and smart utensils with embedded accelerometers that track hand-to-mouth gestures as a proxy for detecting eating instances [7, 8]. Datasets based on these technologies have been publicly released to advance the field of nutrition monitoring [9-12]. However, these datasets generally contain data from one or a limited set of sensing modalities, which limits their application to specific use cases such as activity recognition of eating moments or image recognition from food photographs. Also lacking are publicly available datasets that pair food macronutrients with the associated glucose measurements from continuous glucose monitors (CGMs) [13]. For example, in a study on personalized nutrition, Zeevi et al. [14] recorded CGM responses of 800 participants for one week while participants kept detailed records of their diet. Using this extensive dataset, the authors trained a machine-learning (ML) model that was able to predict the glucose response to a meal for each participant based on individual factors, such as anthropometric variables, blood panels, and gut microbiome, and used it to develop personalized diets. However, this dataset is not publicly available. In what can be thought of as the "inverse" model of Zeevi et al. [14], we have shown that the shape of the postprandial glucose response (PPGR) to a meal can be used to estimate the macronutrient composition of the meal using ML models [15-17], opening the possibility of tracking diet automatically using CGMs. The CGMacros dataset being submitted to PhysioNet is the result of our efforts and aims to democratize academic research in personalized nutrition and algorithmic approaches to automated diet monitoring.
Methods
Participants were recruited at Sansum Diabetes Research Institute (SDRI), in Santa Barbara, CA. On day 1 of the study, potential participants cleared an initial screening and signed a consent form (Advarra IRB Pro00049227; ClinicalTrials.gov NCT04991142). As part of the screening process, we measured the participant's body mass index (BMI), glycated hemoglobin (HbA1c), fasting glucose, fasting insulin, triglyceride, and cholesterol levels. At this time, we also recorded their demographic information (age, gender, and race). After the initial screening, an Abbott FreeStyle Libre Pro CGM (15-min sampling period) and a Dexcom G6 Pro CGM (5-min sampling) were placed on the participant's upper arm and abdomen, respectively. Both CGMs were blinded to prevent glucose readings from influencing participants. Participants were also provided with a Fitbit smartwatch (Fitbit Sense) to log exercise, and were trained to use the MyFitnessPal mobile app to log their meals and take pictures of their foods using the WhatsApp mobile app.
Each participant recorded their meals for 10 days, including breakfast, lunch and dinner. Breakfasts consisted of protein shakes with varying amounts of carbohydrates, protein, fat, and fiber. Lunches were ordered from a local, fast-casual restaurant chain (Chipotle Mexican Grill). The breakfast and lunch meals were designed to cover a range of macronutrient contents. For dinners, participants ate foods of their own choice. To minimize interference in glucose responses from prior meals, participants were instructed to eat lunch at least three hours after breakfast, with only water or coffee (without sugar) in between, and dinner at least three hours after lunch. They also took photographs of the meals before and after eating, from which we extracted the meal timestamps and the proportion of the meal they consumed. Stool samples were collected at the start of the study and analyzed using a Viome microbiome kit (Viome Life Sciences, Inc.).
NOTE: Since full dates are considered protected health information (PHI) per HIPAA regulations, dates for CGM recordings and food photographs have been time-shifted by +/- N days (365<N<720).
Data Description
Forty-five participants completed our study, aged 18-69 years and with body mass index (BMI) of 21-46 kg/m². All participants were recruited between 2021 and 2024. Out of 45 participants, 15 had no pre-existing diabetes (HbA1c<5.7%), 16 had pre-diabetes (5.7% ≤HbA1c≤6.4%), and 14 had type 2 diabetes (T2D) (HbA1c>6.4%).
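The HbA1c thresholds above can be written as a small lookup. This is a minimal sketch: the function name and the group labels are illustrative choices, not identifiers taken from the dataset.

```python
def glycemic_group(hba1c_percent):
    """Map an HbA1c value (in %) to the three study groups described above."""
    if hba1c_percent < 5.7:
        return "healthy"
    if hba1c_percent <= 6.4:
        return "pre-diabetes"
    return "type 2 diabetes"


# Boundary values fall into the groups stated in the text.
for value in (5.2, 5.7, 6.4, 6.5):
    print(value, glycemic_group(value))
```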
The dataset consists of 45 main CSV files (one per participant) and three supplementary CSV files. The main files (cgm#.csv) contain CGM and fitness tracker readings at 1-minute intervals, one row per measurement (plus a heading), and one column per variable (# denotes participant number). At the appropriate time stamp (i.e., row), we also report the total caloric content and carbohydrate, protein, fat, and fiber amounts of each meal, the type of meal (breakfast, lunch, dinner) and a path to the file containing the corresponding photograph (also included in the dataset). Data from each participant spans approximately ten days.
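Because meal information appears only on the rows whose time stamp matches a meal, most rows carry empty meal fields. The sketch below shows one way to pull out just the meal rows using the standard library. The column names and the sample values are hypothetical placeholders; the actual headers are documented in the data dictionary files.

```python
import csv
import io

# Synthetic stand-in for one participant file. Column names are assumptions.
SAMPLE = """Timestamp,Glucose,Meal Type,Carbs,Protein,Fat,Fiber
2020-05-01 08:00:00,95,,,,,
2020-05-01 08:01:00,96,breakfast,24,22,10.5,3
2020-05-01 08:02:00,97,,,,,
"""


def meal_rows(handle):
    """Yield only the rows where a meal was logged (non-empty meal type)."""
    for row in csv.DictReader(handle):
        if row["Meal Type"].strip():
            yield row


meals = list(meal_rows(io.StringIO(SAMPLE)))
for meal in meals:
    print(meal["Timestamp"], meal["Meal Type"], meal["Carbs"])
```

To read a real file, pass an open file handle (opened with newline="") in place of the io.StringIO object, and adjust the column names to match the data dictionary.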
The supplementary files (bio.csv, microbes.csv, gut_health_test.csv) contain demographics (age, gender, ethnicity), anthropometric measurements (height, weight, BMI), blood analytics (HbA1c, fasting glucose, insulin, triglyceride, cholesterol, high-density lipoprotein (HDL), non-HDL, low-density lipoprotein (LDL), and very low-density lipoprotein (VLDL) levels), three finger-stick glucose measurements, and the microbiome profile of each study participant, all taken on the first day, along with the corresponding date and time stamp. For each participant, Viome provides two reports: the first lists all the bacteria present in the stool sample, and the second provides digestive health scores and recommendations that Viome generates (recommendations are not included in this dataset). We combined the Viome bacteria reports of the 45 participants and generated an indicator variable as a separate column for each of 1,979 bacteria, denoting whether it was present (1) or absent (0) in the corresponding Viome report. From the report of 22 gut health scores generated by Viome, we derived an ordinal variable for each test, coded as Good, Average, or Not Optimal. These scores are Viome's estimates of gut health based on the bacteria identified, and include an estimate of overall gut health (Tily et al., 2022). Examples of such tests/scores include an overall Gut Health test, Metabolic Fitness, Inflammatory Activity, Digestive Efficiency, and Gut Active Microbial Diversity, among others, which summarize the presence or absence of bacteria listed in the first report.
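The presence/absence encoding and the ordinal score coding described above can be illustrated as follows. This is a sketch under stated assumptions: the taxon names and participant IDs are made up, and the integer values assigned to the three score labels are an arbitrary choice, since the text specifies only the labels and their order.

```python
# Hypothetical per-participant lists of detected bacteria (placeholder names).
reports = {
    1: ["taxon_a", "taxon_b"],
    2: ["taxon_b"],
}

# One column per taxon seen in any report, in a stable sorted order.
taxa = sorted({name for detected in reports.values() for name in detected})

# 1 if the taxon appears in the participant's report, otherwise 0.
indicator = {
    pid: [1 if name in detected else 0 for name in taxa]
    for pid, detected in reports.items()
}

# Ordinal coding of the three labels; the integers are an assumption.
SCORE_ORDER = {"Not Optimal": 0, "Average": 1, "Good": 2}


def encode_score(label):
    """Convert a gut health label to its assumed ordinal rank."""
    return SCORE_ORDER[label]


print(taxa)
print(indicator)
```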
Folder Structure:
- Top level: Contains three CSV files, a Readme file, a Jupyter notebook, and one sub-folder per participant. The CSV files are bio.csv, which contains the demographics of all participants; gut_health_test.csv, which contains the Viome gut health scores for all participants; and microbes.csv, which contains the Viome gut microbe detections for all participants. The Jupyter notebook is described in the Usage Notes section below. The individual participant folders are described next.
- Second-level folders CGMacros-0XX (where XX is the participant ID; there is one folder per participant, with IDs in the range 1-49): Each includes a CGMacros-0XX.csv file (with the same participant ID as the folder) that contains CGM readings, Fitbit readings, meal information, and paths to meal photographs. Additionally, each folder has its own photos subfolder, which contains the before and after photos of all meals listed in the CGMacros-0XX.csv file.
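The folder layout above can be turned into paths programmatically. The sketch below assumes that folder and file names follow the pattern CGMacros-0XX with a zero-padded, three-digit participant ID, and that the photo subfolder is literally named photos; both should be checked against the extracted archive. The function name and the data root are hypothetical.

```python
from pathlib import Path


def participant_paths(root, participant_id):
    """Return (csv_path, photos_dir) for one participant under `root`."""
    stem = f"CGMacros-{participant_id:03d}"  # e.g. 7 -> "CGMacros-007"
    folder = Path(root) / stem
    return folder / f"{stem}.csv", folder / "photos"


csv_path, photos_dir = participant_paths("data", 7)
print(csv_path.as_posix())
print(photos_dir.as_posix())
```

Since there are 45 participants but IDs range up to 49, some IDs have no folder; checking csv_path.exists() before reading avoids errors on the missing ones.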
Usage Notes
The dataset includes a Jupyter notebook, parse_data.ipynb, that parses the dataset and builds a machine-learning model (XGBoost) to predict the area under the curve (AUC) and incremental AUC (iAUC) of the postprandial glucose response to each meal as a function of meal macronutrients, demographics and health parameters (e.g., body mass index, HbA1c). This notebook may be used as an example of how to generate personalized nutrition programs that minimize the postprandial glucose response of meals, in a manner akin to Zeevi et al. [14].
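For readers unfamiliar with the two target quantities, the sketch below computes AUC and iAUC with the trapezoidal rule. It is not taken from parse_data.ipynb. It assumes that the baseline is the glucose value at meal time and that values below baseline are clipped to zero before integration, which is one common convention among several; the example glucose values are invented.

```python
def trapezoid_auc(times_min, glucose):
    """Area under the glucose curve using the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        total += 0.5 * (glucose[i] + glucose[i - 1]) * dt
    return total


def incremental_auc(times_min, glucose, baseline=None):
    """Area above baseline; baseline defaults to the first (meal-time) sample."""
    if baseline is None:
        baseline = glucose[0]
    # Clip excursions below baseline to zero (an approximation at crossings).
    above = [max(g - baseline, 0.0) for g in glucose]
    return trapezoid_auc(times_min, above)


# Invented one-hour response sampled every 15 minutes (mg/dL).
times = [0, 15, 30, 45, 60]
values = [100, 130, 150, 120, 100]
print(trapezoid_auc(times, values))    # 7500.0 mg/dL x min
print(incremental_auc(times, values))  # 1500.0 mg/dL x min
```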
A second potential application of the CGMacros dataset is the development of algorithms to detect the onset of meals by identifying the early rise of postprandial glucose responses. Note that CGMacros provides ground truth for the start and end of each meal (i.e., the timestamps of the food photographs).
A third potential application of CGMacros is the development of "inverse" metabolic models that estimate the macronutrient composition of meals by analyzing postprandial glucose responses, as described in our prior work [15-17].
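As a starting point for the meal-onset idea above, the sketch below flags the first sample of each sustained glucose rise. It is a naive threshold heuristic, not a method from the cited work; the function name, the 10 mg/dL rise, and the 3-sample window are arbitrary assumptions that would need tuning against the photograph timestamps.

```python
def detect_rise_onsets(glucose, min_rise=10.0, window=3):
    """Return indices where glucose rises by >= min_rise over `window` samples."""
    onsets = []
    n = len(glucose)
    i = 0
    while i + window < n:
        if glucose[i + window] - glucose[i] >= min_rise:
            onsets.append(i)
            # Skip the rest of this excursion: advance while glucose keeps climbing.
            i += window
            while i + 1 < n and glucose[i + 1] > glucose[i]:
                i += 1
        else:
            i += 1
    return onsets


# Invented trace (mg/dL): flat, then a single rise and fall.
trace = [100, 100, 101, 100, 104, 110, 118, 125, 128, 126, 120, 112, 105, 100]
print(detect_rise_onsets(trace))  # [3]
```

Detected indices can then be compared with the meal rows in the participant file to measure how early or late the heuristic fires.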
Ethics
The study was approved by Advarra Institutional Review Board (Advarra IRB Pro00049227; ClinicalTrials.gov NCT04991142). Participants signed a consent form prior to the start of the study.
Acknowledgements
This work was supported by National Science Foundation award No. 2014475.
Conflicts of Interest
Dr. Mortazavi discloses a relationship with McAndrews, Held, and Malloy Ltd and Kirkland & Ellis, LLP for expert testimony.
References
1. Sami W, Ansari T, Butt NS, Hamid MRA. Effect of diet on type 2 diabetes mellitus: A review. Int J Health Sci (Qassim). 2017;11(2):65-71.
2. Key TJ, Allen NE, Spencer EA, Travis RC. The effect of diet on risk of cancer. The Lancet. 2002;360(9336):861-8.
3. Zhang FF, Cudhea F, Shan Z, Michaud DS, Imamura F, Eom H, et al. Preventable Cancer Burden Associated With Poor Diet in the United States. JNCI Cancer Spectrum. 2019;3(2).
4. Neuhouser ML. The importance of healthy dietary patterns in chronic disease prevention. Nutrition Research. 2019;70:3-6.
5. Epstein DA, Cordeiro F, Fogarty J, Hsieh G, Munson SA. Crumbs: Lightweight Daily Food Challenges to Promote Engagement and Mindfulness. Proceedings of the SIGCHI conference on human factors in computing systems CHI Conference. 2016;2016:5632-44.
6. Shay LE, Seibert D, Watts D, Sbrocco T, Pagliara C. Adherence and weight loss outcomes associated with food-exercise diary preference in a military weight management program. Eating Behaviors. 2009;10(4):220-7.
7. Bell BM, Alam R, Alshurafa N, Thomaz E, Mondol AS, de la Haye K, et al. Automatic, wearable-based, in-field eating detection approaches for public health research: a scoping review. npj Digital Medicine. 2020;3(1):38.
8. Bedri A, Li D, Khurana R, Bhuwalka K, Goel M. Fitbyte: Automatic diet monitoring in unconstrained situations using multimodal sensing on eyeglasses. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems; 2020.
9. Kyritsis K, Diou C, Delopoulos A. A data driven end-to-end approach for in-the-wild monitoring of eating behavior using smartwatches. IEEE Journal of Biomedical and Health Informatics. 2020;25(1):22-34.
10. Mirtchouk M, Merck C, Kleinberg S. Automated estimation of food type and amount consumed from body-worn audio and motion sensors. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing; 2016.
11. Shen Y, Salley J, Muth E, Hoover A. Assessing the accuracy of a wrist motion tracking method for counting bites across demographic and food variables. IEEE Journal of Biomedical and Health Informatics. 2016;21(3):599-606.
12. Rouast PV, Heydarian H, Adam MT, Rollo ME. Oreba: A dataset for objectively recognizing eating behavior and associated intake. IEEE Access. 2020;8:181955-63.
13. Das A, Kerr D, Glanz N, Bevier W, Santiago R, McCrory M, et al. CGMacros: a scientific dataset for personalized nutrition and diet monitoring. Scientific Data (under review). 2024.
14. Zeevi D, Korem T, Zmora N, Israeli D, Rothschild D, Weinberger A, et al. Personalized Nutrition by Prediction of Glycemic Responses. Cell. 2015;163(5):1079-94.
15. Mortazavi BJ, Gutierrez-Osuna R. A Review of Digital Innovations for Diet Monitoring and Precision Nutrition. Journal of Diabetes Science and Technology. 2023;17(1):217-23.
16. Das A, Mortazavi B, Sajjadi S, Chaspari T, Ruebush LE, Deutz NE, et al. Predicting the Macronutrient Composition of Mixed Meals From Dietary Biomarkers in Blood. IEEE Journal of Biomedical and Health Informatics. 2022;26(6):2726-36.
17. Zhang L, Huang S, Das A, Do E, Glantz N, Bevier W, et al. Joint Embedding of Food Photographs and Blood Glucose for Improved Calorie Estimation. 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI); 2023 Oct 15-18.
Access
Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/3z8q-x658
DOI (latest version):
https://doi.org/10.13026/8mak-rs10
Topics:
diabetes
continuous glucose monitors
obesity
machine learning
postprandial glucose response
food macronutrients
metabolic models
food photographs
personalized nutrition
Project Website:
https://psi.engr.tamu.edu/publications/?tgid=40&yr=&type=&usr=&auth=
Files
Total uncompressed size: 627.9 MB.
Access the files
- Download the ZIP file (627.1 MB)
- Download the files using your terminal:
  wget -r -N -c -np https://physionet.org/files/cgmacros/1.0.0/
Name | Size | Modified |
---|---|---|
CGMacros_dateshifted365.zip | 626.7 MB | 2024-12-12 |
DataDictionary.pdf | 945.3 KB | 2024-11-21 |
DataDictionary_Bio.csv | 2.0 KB | 2024-12-12 |
DataDictionary_CGMacros-00X.csv | 1.6 KB | 2024-12-12 |
DataDictionary_Gut_Health_Test.csv | 4.3 KB | 2024-12-12 |
DataDictionary_Microbes.csv | 234.8 KB | 2024-12-12 |
LICENSE.txt | 0 B | 2025-01-24 |
SHA256SUMS.txt | 632 B | 2025-01-28 |
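The file listing includes SHA256SUMS.txt, which can be used to check downloads for corruption. The sketch below assumes the file uses the conventional sha256sum layout of one "digest, whitespace, file name" pair per line; this has not been confirmed against the actual file. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text):
    """Parse '<hex digest> <file name>' lines into a {name: digest} dict."""
    expected = {}
    for line in text.splitlines():
        if line.strip():
            digest, name = line.split(maxsplit=1)
            # sha256sum marks binary-mode entries with a leading '*'.
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify(folder):
    """Return the names of listed files whose digest does not match."""
    folder = Path(folder)
    expected = parse_sums((folder / "SHA256SUMS.txt").read_text())
    return [name for name, digest in expected.items()
            if sha256_of(folder / name) != digest]
```

Calling verify on the download folder returns an empty list when every listed file matches its recorded digest.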