This was a modelling study combining hierarchical clustering analysis with linear programming to design nutritionally adequate, health-promoting, climate-friendly and culturally acceptable diets. Self-selected diets were derived from the nationally representative Swedish dietary survey Riksmaten Vuxna 2010–11 (Riksmaten Adults) [18]. The data, which were collected between May 2010 and May 2011 by the Swedish Food Agency, are publicly available in fully anonymised form [19]. Briefly, a web-based 4-day food diary was completed by 1797 adults aged 18–80 years, who recorded all foods and drinks consumed over four consecutive days. Participants could choose from more than 1900 food items and dishes and several portion sizes. The study sample was 56% female, and the mean age was 48 years. Information on income and other sociodemographic factors was also gathered. A more detailed description of the material and methods used for this study can be found in the Supplementary Information.
Nutritional composition
Energy and nutrient intakes of the edible parts of foods as eaten (e.g., cooked pasta) were automatically calculated through linkage with the Swedish Food Agency’s Food composition database, version Riksmaten Vuxna 2010–11.
Climate footprints
The carbon dioxide equivalents (CO2eq) of foods were derived from the Climate Database developed and maintained by the Research Institutes of Sweden (RISE) [20], which is linked to the Swedish Food Agency’s Food composition database. The database includes CO2eq estimates for 2078 food items, following life-cycle assessment standards [21, 22] and taking Swedish production and consumption patterns into consideration [20]. The CO2eq estimates cover the impact of carbon dioxide (CO2), methane (CH4) and nitrous oxide (N2O), weighted by their respective global-warming potentials over a 100-year period using factors recommended by the IPCC [23]. The CO2eq data did not account for packaging, transportation from stores to households, meal preparation or food waste.
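The weighting of the three gases into a single CO2eq figure can be illustrated with a minimal sketch. The GWP100 factors below are an assumption (IPCC AR5 values without climate-carbon feedbacks); the text only states that IPCC-recommended factors were used, and the emission figures in the example are hypothetical.

```python
# ASSUMED 100-year global-warming potentials (IPCC AR5, no climate-carbon
# feedbacks). The source does not state which IPCC factors were applied.
GWP100 = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}


def co2_equivalents(emissions_g, gwp=GWP100):
    """Return total g CO2eq for a dict mapping gas name -> grams emitted."""
    return sum(gwp[gas] * grams for gas, grams in emissions_g.items())


# Hypothetical life-cycle emissions for one food item (grams of each gas).
example_emissions = {"CO2": 500.0, "CH4": 10.0, "N2O": 1.0}
total_co2eq = co2_equivalents(example_emissions)
print(total_co2eq)
```

In this hypothetical case the two non-CO2 gases contribute more than half of the total despite their small mass, which is why the choice of weighting factors matters.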
Cost of foods
The webpage “Matpriskollen” [24], which compares food prices among twelve of Sweden’s largest food retailers, was used to estimate the price of each food in the year 2020. An average price was calculated for each food item from all available prices for that item (including low-price, conventional and organic varieties).
Grouping of foods
For analytical and descriptive purposes, foods were grouped into 24 food categories, based on the categorisations used in the RISE Climate Database: Red meat (including red meat dishes); Processed meat (both red meat and poultry); Poultry (including poultry-based dishes); Seafood (including fish, mussels and crabs, and seafood dishes); Offal; Dairy (e.g., milk and cheese); Eggs; Pasta and rice dishes with meat/fish (e.g., composite dishes like lasagne); Pasta and rice dishes with dairy/eggs (e.g., composite dishes like vegetarian lasagne); Vegetable oils; Vegetables (whole vegetables and a few vegetable-based dishes); Potatoes (including potato-based dishes); Pulses (beans, lentils, peas and chickpeas); Fruits and berries (including smoothies); Nuts and seeds; Meat alternatives (e.g., soy mince); Dairy alternatives (e.g., oat milk); Mixed/animal fats (added fats such as butter and margarine-butter mixes); Cereals/grains (including, e.g., breakfast cereals and pasta); Rice; Savoury snacks; Sugar and sweets (including chocolate); Drinks other than milk; and Other (e.g., seasonings and sauces). Further details on the categorisation can be found elsewhere [20].
The foods in the baseline and optimised diets were additionally regrouped to be comparable with the EAT-Lancet Commission’s food categorisation [1], namely: Whole grains (rice, wheat, corn and other); Tubers or starchy vegetables (including potatoes); Vegetables; Fruits; Dairy foods (whole milk or equivalents, including butter); Beef, lamb and pork; Chicken and other poultry; Eggs; Fish; Legumes; Nuts; Added fats (unsaturated oils and saturated oils); and Added sugars. Each food was assigned either to the category of its most dominant component or split across categories in proportion to the ingredient shares in its recipe.
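The two allocation rules mentioned here (dominant component versus proportional shares) might be sketched as follows. The function names and the recipe shares are hypothetical and only serve to illustrate the arithmetic; the actual recipes are not given in the text.

```python
def allocate_by_shares(dish_grams, recipe_shares):
    """Split a composite dish across categories in proportion to recipe shares."""
    total = sum(recipe_shares.values())
    return {cat: dish_grams * share / total for cat, share in recipe_shares.items()}


def allocate_to_dominant(dish_grams, recipe_shares):
    """Assign the whole dish to the category with the largest recipe share."""
    dominant = max(recipe_shares, key=recipe_shares.get)
    return {dominant: dish_grams}


# Hypothetical weight shares for a composite dish (not taken from the study).
shares = {"Beef, lamb and pork": 0.25, "Dairy foods": 0.35,
          "Whole grains": 0.20, "Vegetables": 0.20}

print(allocate_by_shares(200.0, shares))
print(allocate_to_dominant(200.0, shares))
```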
Cluster analysis
Cluster analysis was performed to identify dominant eating patterns in the Swedish population. Firstly, the R package clValid [25] was applied to the dietary data to compare multiple clustering algorithms and clustering methods simultaneously. By comparing the discriminatory power of different calculation paths, clValid identified hierarchical clustering as the best-fitting clustering algorithm for our data. It also proposed using Canberra distances with Ward’s method in the hierarchical clustering, as this combination gave the highest value of Dunn’s Index (the ratio of the smallest distance between observations not in the same cluster to the largest intra-cluster distance). Secondly, the NbClust package in R [26] (which uses 30 different indices to suggest the best clustering approach and number of clusters, based on all combinations of self-organising clusters, distance measures and clustering methods) was used to determine the optimal number of clusters when combining Canberra distances with Ward’s method (the results suggested 2 or 3 clusters, visualised in Supplementary Fig. 1). Following these initial exploratory analyses, the data were scaled and hierarchical clustering using Ward’s method and Canberra distances was applied to the dietary data. Based on the outputs from NbClust, three clusters were chosen for this analysis.
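To make the two quantities named above concrete, the sketch below computes a Canberra distance and Dunn’s Index for a small, hypothetical data set with known cluster labels. It is written in plain Python for illustration only; the study itself used the R packages clValid and NbClust, and this sketch does not reproduce their implementations (for instance, it treats 0/0 terms as zero, which is one common convention).

```python
from itertools import combinations


def canberra(x, y):
    """Canberra distance: sum of |a - b| / (|a| + |b|) over coordinates."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:  # ASSUMED convention: a 0/0 term contributes nothing
            total += abs(a - b) / denom
    return total


def dunn_index(points, labels, dist=canberra):
    """Smallest between-cluster distance divided by largest within-cluster distance.

    Requires at least two clusters and one cluster with two or more points.
    """
    between, within = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        (within if labels[i] == labels[j] else between).append(d)
    return min(between) / max(within)


# Hypothetical energy-adjusted intakes (g/MJ) of two food groups, four people.
points = [[1.0, 1.0], [1.0, 3.0], [9.0, 1.0], [9.0, 3.0]]
labels = [0, 0, 1, 1]
print(dunn_index(points, labels))
```

A higher value indicates compact, well-separated clusters, which is the criterion described above for preferring the Canberra and Ward combination.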
Food groups that were consumed by less than 75% of the population were excluded from the clustering to avoid bias arising from missing data. Two exceptions were made, for the food groups Pulses and Nuts and seeds, since these are seen as indicators of both climate friendliness and healthy eating [1]. Hence, the following food groups were included in the clustering: Red meat, Processed meat, Vegetables, Fruits and berries, Dairy, Pulses, Nuts and seeds, Seafood, Mixed/animal fats, Sugar and sweets, Rice, Potatoes, Cereals/grains, Eggs, and Poultry. Whole grains were also included in the clustering, although they were not classified as a food group in the food consumption survey. For the clustering procedure, intakes of food groups were standardised for individual energy intake (g/MJ) to account for heterogeneous energy intake.
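The two preprocessing steps described in this paragraph, the consumer-share filter and the energy standardisation, could look roughly like the sketch below. The function names, the data layout (one dict of grams per food group per participant) and the example diaries are hypothetical.

```python
def energy_adjust(intake_g, energy_mj):
    """Convert absolute intakes (g/day) to energy-adjusted intakes (g/MJ)."""
    return {group: grams / energy_mj for group, grams in intake_g.items()}


def groups_to_cluster(diaries, min_share=0.75,
                      always_keep=("Pulses", "Nuts and seeds")):
    """Keep food groups eaten by at least min_share of participants.

    Groups listed in always_keep are retained regardless of consumer share,
    mirroring the two exceptions described in the text.
    """
    n = len(diaries)
    groups = {g for d in diaries for g in d}
    kept = []
    for g in sorted(groups):
        share = sum(1 for d in diaries if d.get(g, 0) > 0) / n
        if share >= min_share or g in always_keep:
            kept.append(g)
    return kept


# Hypothetical mean daily intakes (g) for four participants.
diaries = [
    {"Vegetables": 120, "Rice": 50, "Offal": 10},
    {"Vegetables": 80, "Rice": 30, "Pulses": 20},
    {"Vegetables": 60, "Rice": 40},
    {"Vegetables": 100},
]
print(groups_to_cluster(diaries))
print(energy_adjust({"Vegetables": 180.0}, energy_mj=9.0))
```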
Comparing the clusters
Clusters were compared post hoc on the basis of the energy-adjusted intake of the food groups included in the cluster analysis (g/MJ), age (y), income (SEK), sex (male/female), and CO2eq (g/MJ). The Kruskal–Wallis test was used to determine whether clusters differed significantly with regard to food groups, CO2eq and income, since these variables were not normally distributed. Age was normally distributed and was therefore assessed with analysis of variance. Sex (a categorical variable) was assessed using Pearson’s chi-squared test. For the non-normally distributed variables, Dunn’s (1964) test for multiple comparisons (with the Benjamini-Hochberg correction) was used as a post-hoc test to identify which clusters differed significantly. Tukey’s honestly significant difference test was applied as a post-hoc test for the normally distributed variable. Statistical significance was set at P ≤ 0.05. Both the cluster analysis and all statistical computations were performed in R version 4.1.1 [27].
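The Benjamini-Hochberg step applied to the pairwise comparisons can be sketched in a few lines. This illustrates only the correction, expressed as adjusted p-values, not Dunn’s test itself; the input p-values are hypothetical, and the study’s own computations were done in R.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for the three pairwise cluster comparisons.
raw = [0.01, 0.04, 0.03]
print(benjamini_hochberg(raw))
```

With three clusters there are three pairwise comparisons per variable, so each adjusted value is compared with the 0.05 threshold stated above.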
The healthiness of the three clusters was assessed with a previously developed healthy eating index relevant to the Swedish context, SHEIA15 [28]. Accordingly, the ratios between the baseline intake and the recommended intake of nine dietary components were calculated (Supplementary Table 1) and summed to a total score. Ratios <0 and >1 were recoded to zero and one, respectively, giving a possible range of 0–9. As previously suggested [28], the summed ratios were categorised into three defined levels: low (<4 points), medium (4–7 points), and high (>7 points).
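The scoring logic described here (ratio, truncation to the 0–1 range, summation, three levels) can be sketched as below. The component names and values are placeholders, not the actual SHEIA15 components, and the sketch assumes a plain intake-to-recommendation ratio; how SHEIA15 handles components that should be limited rather than increased is not described in this text.

```python
def component_score(intake, recommended):
    """Ratio of intake to recommendation, truncated to the range 0 to 1."""
    return max(0.0, min(1.0, intake / recommended))


def index_score(intakes, recommendations):
    """Sum of truncated ratios over all components in `recommendations`."""
    return sum(component_score(intakes[k], recommendations[k])
               for k in recommendations)


def score_level(total):
    """Categorise a total score as low (<4), medium (4 to 7) or high (>7)."""
    if total < 4:
        return "low"
    if total <= 7:
        return "medium"
    return "high"


# Placeholder components and values (hypothetical, three instead of nine).
intakes = {"component_a": 250.0, "component_b": 10.0, "component_c": 30.0}
recommended = {"component_a": 500.0, "component_b": 10.0, "component_c": 25.0}
total = index_score(intakes, recommended)
print(total, score_level(total))
```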
Optimisation
The chosen optimisation method, linear programming (LP), has been successfully applied to optimise diets towards a given goal while considering a multitude of (sometimes conflicting) constraints [6, 29]. Briefly, it is the application of an algorithm for either maximising or minimising a specific linear objective function (the variable being optimised), subject to a set of linear constraints (predetermined requirements that should be met) on a list of decision variables (in this case, the absolute amount of each individual food item) [30]. A feasible solution is found when all constraints are met. If the selected constraints are too strict, the algorithm cannot provide a solution, i.e., the mathematical problem has no feasible solution. The constraints that limit how far the objective function can be minimised or maximised (i.e., those met at exactly 100% of their predetermined limit) are considered “active constraints” [31]. Linear optimisation was performed with the CBC (COIN-OR Branch and Cut) solver included in OpenSolver version 2.9.0, an add-in for Excel® 2016 [32].
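The notions of feasibility and active constraints can be illustrated with a small checker that evaluates a candidate diet against a list of linear constraints. It does not solve an LP (the study used the CBC solver via OpenSolver); the constraint names, coefficients and limits are hypothetical.

```python
def evaluate_constraints(x, constraints, tol=1e-6):
    """Check a candidate solution against linear constraints.

    constraints: list of (name, coefficients, sense, limit) tuples, where
    sense is "<=", ">=" or "==". Returns (feasible, active_names); a
    constraint is active when it is met at exactly its limit (within tol).
    """
    feasible = True
    active = []
    for name, coeffs, sense, limit in constraints:
        value = sum(c * xi for c, xi in zip(coeffs, x))
        if sense == "<=":
            ok = value <= limit + tol
        elif sense == ">=":
            ok = value >= limit - tol
        else:
            ok = abs(value - limit) <= tol
        feasible = feasible and ok
        if ok and abs(value - limit) <= tol:
            active.append(name)
    return feasible, active


# Hypothetical two-food diet (amounts in 100 g) and three constraints.
constraints = [
    ("protein_min", [10.0, 5.0], ">=", 25.0),
    ("sodium_max", [1.0, 3.0], "<=", 6.0),
    ("energy_equal", [3.0, 2.0], "==", 8.0),
]
print(evaluate_constraints([2.0, 1.0], constraints))
print(evaluate_constraints([1.0, 1.0], constraints))
```

In the first call all constraints hold and two of them sit exactly on their limit; in the second, the protein minimum is violated, so the candidate is infeasible.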
We optimised the average diet of the total study sample (n = 1797, i.e., the “TotPop” diet) as well as the average diet of each of the three clusters (Table 1). The relative deviation (RD) from the reported intake of each food item was calculated as RD = (w_opt − w_rep)/w_rep, where w_opt is the food weight in the optimised diet and w_rep is the reported intake. As the objective function of all LP models, we chose the minimisation of the total relative deviation (TRD) from the baseline diet [10, 11]. This objective function was implemented to maximise the similarity between the baseline and the optimised diet solutions. The decision variables were the amounts of individual food items in the diet of the total study sample or of each cluster. All optimisations applied dietary reference values (DRVs), covering the nutritional needs of 97.5% of the population and based on the Nordic Nutrition Recommendations 2012 [33], as obligatory constraints (Supplementary Table 2). Where the DRVs differed by sex, the nutritional constraints were weighted according to the DRVs and population size of the sex groups in the study sample. Total daily energy (kcal) was set to equal the baseline energy intake of the total population or of the respective cluster in all models (Supplementary Table 2). All models were also constrained to meet the Swedish Food Based Dietary Guidelines (FBDGs) (Table 1) [34]. Individual food items were allowed to be reduced to 0 g; however, they were not allowed to increase by more than 200% relative to their respective baseline weight. This upper limit was applied to all foods except those belonging to the food groups Pulses, Nuts and seeds, Dairy alternatives, Meat alternatives and Vegetable oils. Because of their plausible role in a healthy and environmentally friendly diet, and because some of them have only recently appeared on the market, foods in these groups were allowed to increase without limit.
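Two of the constraint-building steps described above can be sketched as follows. Both rest on interpretations that the text does not spell out: the sex-weighted DRV is assumed to be a sample-size-weighted mean, and "increase by more than 200%" is assumed to mean an upper bound of three times the baseline weight. All names and example values are hypothetical.

```python
# Food groups exempt from the upper limit (named as in the food grouping).
UNBOUNDED_GROUPS = {"Pulses", "Nuts and seeds", "Dairy alternatives",
                    "Meat alternatives", "Vegetable oils"}


def sex_weighted_drv(drv_female, drv_male, n_female, n_male):
    """ASSUMED weighting: mean of the sex-specific DRVs weighted by group size."""
    return (drv_female * n_female + drv_male * n_male) / (n_female + n_male)


def item_bounds(baseline_g, food_group, max_increase=2.0):
    """Return (lower, upper) bounds in grams for one food item.

    ASSUMPTION: a 200% increase means upper = baseline * (1 + 2.0).
    upper is None for the groups allowed to increase without limit.
    """
    if food_group in UNBOUNDED_GROUPS:
        return (0.0, None)
    return (0.0, baseline_g * (1.0 + max_increase))


# Hypothetical sex-specific reference values in a sample that is 56% female.
print(sex_weighted_drv(15.0, 9.0, n_female=56, n_male=44))
print(item_bounds(50.0, "Red meat"))
print(item_bounds(10.0, "Pulses"))
```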
Table 1 Characteristics of all applied models.
In a first set of models, all aforementioned constraints were applied, but with no upper limit on the associated greenhouse gas emissions (GHGE). The second set of models additionally included a limit on total diet-related CO2eq: these models were constrained to contain no more than 1570 g of CO2eq per day. The cost of the baseline and optimised diets was calculated separately and was not included as a constraint in the models. The average relative deviation (ARD) from the baseline food consumption (i.e., the TRD divided by the total number of food items included in the model) was calculated as an output and used as a proxy for the similarity between the baseline and the optimised food consumption, and as an assumed indicator of cultural acceptability. Active nutrient constraints (those met at exactly 100% of the applied limit [31]) were identified for each solution. A more detailed description of the optimisation procedure can be found in the Supplementary Information.
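The output measures can be illustrated with a short sketch that computes the relative deviation per food, the TRD and ARD, and checks a diet against the CO2eq limit. It assumes that absolute values of the relative deviations are summed to obtain the TRD, which the text does not state explicitly; the food names, weights and CO2eq intensities are hypothetical.

```python
def relative_deviation(w_opt, w_rep):
    """RD = (w_opt - w_rep) / w_rep; requires a non-zero reported intake."""
    return (w_opt - w_rep) / w_rep


def total_relative_deviation(optimised, reported):
    """ASSUMPTION: TRD is the sum of absolute relative deviations."""
    return sum(abs(relative_deviation(optimised[f], reported[f]))
               for f in reported)


def average_relative_deviation(optimised, reported):
    """ARD = TRD divided by the number of food items in the model."""
    return total_relative_deviation(optimised, reported) / len(reported)


def within_co2eq_cap(diet_g, co2eq_per_g, cap_g_per_day=1570.0):
    """Return (meets_cap, total_co2eq) for a diet given in grams per day."""
    total = sum(diet_g[f] * co2eq_per_g[f] for f in diet_g)
    return total <= cap_g_per_day, total


# Hypothetical two-food diet (g/day) and CO2eq intensities (g CO2eq per g).
reported = {"food_a": 100.0, "food_b": 50.0}
optimised = {"food_a": 50.0, "food_b": 75.0}
intensity = {"food_a": 20.0, "food_b": 2.0}

print(average_relative_deviation(optimised, reported))
print(within_co2eq_cap(reported, intensity))
print(within_co2eq_cap(optimised, intensity))
```

In this toy case the baseline diet exceeds the limit while the optimised diet meets it at an ARD of 0.5, i.e., an average change of 50% per food item.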