The results of this study allow us to demonstrate and quantify the importance of the specialist’s experience for age estimation in forensic anthropology. A high level of experience and expertise is essential for labeling components correctly. Thus, component-based methods outperform phase-based methods only when used by highly experienced practitioners. Methods based on explainable artificial intelligence techniques produce estimates comparable to those of human practitioners, regardless of the practitioners' experience, and could therefore help solve this problem. However, these methods are currently limited by component labeling systems with poor replicability.
Table 1 shows a marked difference between novice and experienced practitioners. Experienced practitioners obtained good agreement (K ≥ 0,6) for practically all the variables analyzed except the ventral bevel. In contrast, novice practitioners obtained poor or moderate agreement for almost all variables (K < 0,4). This difference in performance between the two groups indicates that these traits are difficult to identify, even though the novice practitioners had been trained beforehand and were supported by the detailed atlas with definitions and prototypical images for each trait.
The linear weighted kappa coefficients indicate that analyzing components individually, rather than assessing developmental phases macroscopically, yields only a moderate reduction in intra- and inter-observer error among experienced practitioners. This is not the case for novice observers, who are more consistent when analyzing phases than components. A novice practitioner can plausibly learn to distinguish younger from older pubic bones after a few weeks of training, but may need considerably more practice to identify variables such as macroporosity, bony nodules, and dorsal plateaus reliably.
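For readers less familiar with the agreement statistic used here, the linear weighted kappa can be sketched in a few lines of Python. This is a minimal, standard-library illustration, not the software used to produce Table 1. It assumes each trait is coded as an ordinal integer from 0 to k − 1, and computes κw = 1 − Σ wij·oij / Σ wij·eij, where oij are observed joint proportions, eij are the proportions expected from the two observers' marginals, and wij = |i − j| / (k − 1). The function names and the example scores are hypothetical. The helper `agreement_band` maps K onto the bands used in this discussion (moderate for 0,4 to 0,6, good for 0,6 to 0,8); the labels "poor" below 0,4 and "very good" from 0,8 upwards are our assumption.

```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, n_levels):
    """Linear weighted Cohen's kappa for two lists of ordinal scores.

    Scores are assumed to be integers coded 0 .. n_levels - 1 (hypothetical
    coding; adapt to the levels defined in the atlas for each trait).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    if n_levels < 2:
        raise ValueError("an ordinal trait needs at least two levels")
    if not all(0 <= s < n_levels for s in (*rater_a, *rater_b)):
        raise ValueError("scores must lie in 0 .. n_levels - 1")
    n = len(rater_a)
    joint = Counter(zip(rater_a, rater_b))  # observed cross-tabulation
    marg_a = Counter(rater_a)               # marginal counts, observer A
    marg_b = Counter(rater_b)               # marginal counts, observer B

    observed = expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            # Linear disagreement weight: 0 on the diagonal, 1 for the extremes.
            w = abs(i - j) / (n_levels - 1)
            observed += w * joint[(i, j)] / n
            expected += w * (marg_a[i] / n) * (marg_b[j] / n)
    if expected == 0.0:
        # Both observers used one and the same level: kappa is undefined.
        return float("nan")
    return 1.0 - observed / expected


def agreement_band(kappa):
    """Map K onto the agreement bands used in the discussion (labels assumed)."""
    if kappa < 0.4:
        return "poor"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "good"
    return "very good"


# Hypothetical scores of one three-level trait by two observers on ten bones.
obs_1 = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
obs_2 = [0, 1, 1, 1, 2, 1, 0, 1, 2, 2]
k = linear_weighted_kappa(obs_1, obs_2, n_levels=3)
print(round(k, 3), agreement_band(k))  # 0.625 good
```

When every level occurs in the data, the result should match scikit-learn's `cohen_kappa_score(..., weights="linear")`; the explicit loop is kept here only to make the weighting visible.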
The findings of this study may call into question the conclusions previously drawn by numerous authors who claim that component-based systems are more objective and therefore offer more accurate estimates [4, 5, 6, 26, 31]. For example, Shirley et al. [5] compared the inter-observer error incurred when evaluating phases (with the Suchey-Brooks method) and when evaluating components. For this purpose, two expert observers evaluated 30 pubic bones of individuals aged 24 to 93 years. The results showed lower agreement, and thus higher inter-observer error, for phases (linear weighted kappa of 0,68) than for components (K ranging from 0,73 to 0,98). Although other prior studies do not make a direct comparison, as Shirley et al. [5] did, they have argued for the merits of components over phases on the basis of factors such as the large number of traits analyzed in each phase, their considerable variability, or the subjectivity involved in evaluating them [4, 6, 26, 31]. However, the results of the present study suggest that this conclusion may apply only to practitioners with extensive experience. Including novice practitioners is therefore essential for accurately assessing the replicability of new methods.
The high agreement achieved by both experienced and novice practitioners in identifying the formation of the upper and lower edges is noteworthy. These results corroborate those of most previous studies [4, 5, 6]. Although these studies used statistical analyses different from ours, they confirm that these traits exhibit low inter-observer variability. The upper and lower edges are easily identifiable traits of the pubic symphysis, with only two levels (see Fig. 2), and are fundamental for estimating age-at-death in young individuals. The difficulty in assessing the changes that occur in the ventral bevel has also been reported by other researchers [4, 31]. The definition of this variable is probably unclear or difficult to interpret; we therefore propose that future studies either revise its definition or eliminate it.
Finally, the new variable proposed for the analysis of the dorsal groove showed good agreement among experienced practitioners (0,6 ≤ K < 0,8) but proved difficult for novice practitioners to identify (K < 0,4). Despite this difficulty, we consider that this variable represents an advance over the “dorsal margin formation” variable originally defined by Todd [1]. Shirley et al. [5] also assessed inter-observer error for the dorsal margin as defined by Brooks and Suchey [2], obtaining moderate agreement for this variable (K = 0,4–0,6). Other components related to the dorsal margin, such as lipping or the decomposition described by other authors [26], are characteristic of older ages. In contrast, a preliminary assessment by the authors of this study suggests that the presence of the dorsal groove is characteristic of individuals of intermediate age (approximately 30 to 40 years). Distinguishing age groups at intermediate ages is a major challenge for most age-at-death estimation methods. For instance, Brooks and Suchey [2] provide notably broad intervals for the intermediate phases, and Schmitt et al. [8] reach similar conclusions. Contemporary AI-driven methods show a similar pattern: Castillo et al. [26] obtain their best results with their proposed S4 model, which distinguishes only three age groups, namely individuals under 30 years, those between 30 and 70 years, and those over 70 years, so that the intermediate group spans four decades. Future studies are therefore needed to evaluate more precisely the usefulness of the dorsal groove for age-at-death estimation, particularly in intermediate age groups, for which lower accuracy has been reported to date [8].
Our study does not show a significant reduction in error when age-at-death estimation by phases, following the macroscopic approach, is compared with methods based on component analysis (Tables 2–9), a result shared by other studies [4, 15, 26]. Indeed, estimation by phases following the schemes of the traditional methodology can sometimes produce a smaller error than estimation by components [31], as we observed for experienced practitioner 2. We contend that neither the subjectivity of traditional methods, seen as a defect, nor the ease of identifying components, seen as a virtue, is sufficient grounds for declaring one approach better than the other. It should be noted that the observer’s experience has a significantly greater effect on traditional phase-based methods than on component analysis, because phase-based estimation is carried out from a holistic perspective that considers all the variables together. For instance, such an assessment can take into account factors such as bone weight, which is strongly linked to osteopenia and osteoporosis, as well as to age [10].
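Comparisons of this kind rest on per-individual error measures. As a hedged illustration, not a reproduction of the statistics reported in the tables, the following Python sketch computes the mean absolute error (MAE) and the mean signed error (bias) of two sets of point estimates for the same individuals, one phase-based and one component-based. It assumes that each phase-based interval has already been reduced to a point estimate, for instance the mean age of the assigned phase; that reduction, the function name `estimation_errors`, and the ages in the usage lines are hypothetical.

```python
from statistics import mean


def estimation_errors(true_ages, estimated_ages):
    """Mean absolute error (MAE) and mean signed error (bias), in years."""
    if len(true_ages) != len(estimated_ages) or not true_ages:
        raise ValueError("need two non-empty age lists of equal length")
    # Positive differences mean over-estimation, negative under-estimation.
    diffs = [est - true for true, est in zip(true_ages, estimated_ages)]
    return {"mae": mean(abs(d) for d in diffs), "bias": mean(diffs)}


# Hypothetical known ages and point estimates for the same five individuals.
known = [24, 37, 45, 58, 71]
by_phase = [25, 45, 40, 50, 62]       # e.g. mean age of the assigned phase
by_components = [27, 40, 49, 55, 60]  # e.g. output of a component-based model

for name, est in (("phases", by_phase), ("components", by_components)):
    err = estimation_errors(known, est)
    print(f"{name}: MAE = {err['mae']:.1f} y, bias = {err['bias']:+.1f} y")
```

Reporting both measures helps separate imprecision (MAE) from systematic under- or over-estimation (bias) when the two approaches are compared for each practitioner.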
In every discipline, experience makes work more efficient; in forensic anthropology, however, it is currently arguably the most crucial factor. Although reliable estimates can be obtained when sufficient experience is available, demonstrating and replicating these results remains a significant challenge [32]. As stated by Schmitt et al. [8], methods should give the same result regardless of the observer using them.
A preliminary examination of the findings presented in this technical note suggests that the rule-based explainable machine learning techniques for component analysis described previously [27] may offer a solution. The automated algorithm estimates age-at-death with a precision similar to that achieved by the four practitioners through macroscopic assessment, regardless of their level of experience. In addition, as shown in [27], its overall error is competitive with that of similar automatic methods proposed in other studies [17, 19, 20].
However, as shown in Table 1, whether the estimation is performed by practitioners or by the artificial intelligence system, the main problem remains the component labeling process. As described in [27], the automatic rule-based method was designed following a rigorous validation process, which provides a clear understanding of the performance of the generated rule base. Nevertheless, given the strong influence of the practitioner’s experience on labeling shown by our findings, it can be postulated that the resulting method is tailored to the observer who labeled the pubis sample.
Artificial intelligence appears to be a promising way to improve age estimation methods, although the component labeling process still needs to be improved. We propose the following lines of future research:
Promote computer applications that support semi-automatic labeling of traits and enable more thorough training of specialists through numerous images and examples.
Train the algorithm on a component labeling dataset produced by as many practitioners as possible, avoiding “methods tailored to the observer”.
Ensure that the algorithms used report specific estimation errors for each age group.
Ensure that automatic methods always give the same result regardless of who applies them. Such methods should assist the observer rather than replace them, since it is the observer’s experience that makes it possible to interpret the results, detect errors, and select the best methods for each case.
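The proposal to report specific estimation errors for each age group can be illustrated with a short Python sketch. The default cut-offs of 30 and 70 years simply reuse the three groups of the S4 model of Castillo et al. [26] mentioned above; these cut-offs, the function name `mae_by_age_group`, and the example ages are assumptions, and narrower bands (for instance around the 30 to 40 years at which the dorsal groove seems to appear) can be passed instead.

```python
from bisect import bisect_right
from statistics import mean


def mae_by_age_group(true_ages, estimated_ages, edges=(30, 70)):
    """Mean absolute error per age group.

    `edges` must be sorted in ascending order. With the default (30, 70)
    the groups are <30, 30-70 (30 inclusive, 70 exclusive) and >=70 years.
    """
    if len(true_ages) != len(estimated_ages):
        raise ValueError("age lists must have equal length")
    errors = {}  # group index -> list of absolute errors
    for true, est in zip(true_ages, estimated_ages):
        group = bisect_right(edges, true)  # index of the band containing `true`
        errors.setdefault(group, []).append(abs(est - true))

    def label(g):
        if g == 0:
            return f"<{edges[0]}"
        if g == len(edges):
            return f">={edges[-1]}"
        return f"{edges[g - 1]}-{edges[g]}"

    # (number of individuals, MAE) per group, youngest group first;
    # groups with no individuals are simply omitted.
    return {label(g): (len(errs), mean(errs)) for g, errs in sorted(errors.items())}


# Hypothetical known ages and estimates for four individuals.
print(mae_by_age_group([25, 35, 45, 75], [30, 45, 40, 60]))
```

Reporting the group size next to each error matters because small intermediate groups can make a per-group error look better or worse than it really is.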
The main limitation of this study is the number of observers: although comparable to that of similar studies [5, 6, 11], it is insufficient to claim that the results are reproducible. Labeling large samples of dry bones is inherently time-consuming; labeling the pubis collection used in this study took several months of dedicated work. Furthermore, observers must work on-site, where the collection is housed. For professionals unable to work full-time on this task, the process can therefore take years. To increase the number of observers in future studies, our team is developing specific software tools and implementing image analysis methods to streamline this task. This will allow more observers to analyze the sample without having to spend several months working where the collection is kept.
Finally, in future studies we intend to exploit the high research potential of the Granada pubis collection, given the large amount of ante mortem information available. A study with aims similar to this one should be carried out with female subjects. In addition, examining the impact of traumatic or pathological changes on age estimation would be a valuable addition to future studies. These approaches could improve the ability of the machine learning model to interpret such cases.