Agreement between assessors in using the shunt algorithm for frontoethmoidal encephalocele with cerebrospinal fluid circulation disorder

Introduction: Frontoethmoidal encephalocele (FEE) is a neural tube formation disorder. Hydrocephalus and intracranial cysts are the most common accompanying abnormalities in FEE. A high rate of shunt complications led to the development of the shunt algorithm for frontoethmoidal encephalocele (SAFE) to assess whether a shunt is needed. Method: This was a cross-sectional study of 10 cases assessed using the SAFE algorithm. Each case was assessed by two assessors from each of three experience groups (neurosurgical residents who had passed the neuropediatric division, chief residents, and neurosurgeons) using a double-blind method. Results: The median age was ten months; 60% of the patients were female, 50% had not undergone shunt insertion, and 90% had undergone FEE reconstruction. Fleiss' kappa showed low interrater agreement for the total score (κ = 0.037; 95% CI 0.035 to 0.039; p = 0.254). Among the six SAFE components, moderate and statistically significant κ values were found for cerebrospinal fluid (CSF) accumulation (κ = 0.460; 95% CI 0.456 to 0.463; p = 0.001) and FEE volume (κ = 0.450; 95% CI 0.447 to 0.454; p = 0.001). Agreement on shunt insertion was fair (κ = 0.250; 95% CI 0.245 to 0.255; p = 0.002). Agreement in patients who had shunts was moderate (κ = 0.411; 95% CI 0.403 to 0.418; p < 0.001), and agreement in patients who were not shunted was low (κ = 0.089; 95% CI 0.082 to 0.97; p = 0.439). Conclusion: Assessor agreement using SAFE in FEE patients with CSF circulation abnormality was low and not statistically significant. No component reached an optimal agreement value. The components closest to moderate agreement were CSF accumulation and FEE volume, both of which were statistically significant.


INTRODUCTION
Frontoethmoidal encephalocele (FEE) is a neural tube formation disorder with herniation of intracranial structures (meninges, brain parenchyma, cerebrospinal fluid) through a congenital defect in the anterior cranium. 1-2 FEE is commonly found in Southeast Asian countries such as Thailand, Malaysia, Myanmar, Cambodia, and Indonesia. 3 Surgical intervention in children with FEE should be carried out to repair facial deformities and visual field defects, to prevent the FEE from enlarging as more intracranial tissue passes through the defect, and to reduce the risk of central nervous system infection due to rupture and ulceration of the FEE. 4 Of all documented FEE cases, 20-25% were born with one or more intracranial abnormalities. Intracranial abnormalities in FEE can be divided into two groups: FEE with cerebrospinal fluid (CSF) circulation disorders and FEE without CSF circulation disorders. 4-6 Prior to 2009, almost every case of FEE with CSF circulation disorders underwent shunt placement (except for small cysts). Shunt placement is often accompanied by complications such as infection, exposed shunts, and overdrainage of CSF. These complications mainly occur in patients with malnutrition, poor hygiene, and middle to lower economic status. 7 The pediatric neurosurgery division at our hospital developed a shunt algorithm for frontoethmoidal encephalocele (SAFE) in 2010. SAFE was compiled based on the morphological and radiological features of FEE patients. 7 SAFE consists of a scoring table and a flowchart that together determine whether a shunt should be placed. The scoring table has six components arranged as ordinal variables, an arrangement intended to make it easier for physicians to assess patients with FEE. Each component has a value of up to 2, with a total value of 0 to 11. This score is then applied to the SAFE flowchart to determine whether shunt placement is needed.
To date, the use of SAFE remains limited, and further studies are needed to verify the agreement among physicians so that SAFE can be widely used in the future. We hope that the assessment of FEE patients with CSF circulation disorders will have a better algorithm, making it easier for physicians to determine the need for shunt placement, and that the algorithm can be used safely in the management of FEE patients.

SAFE score assessment
The assessment was carried out on the six components of the SAFE score: location of the CSF accumulation, CSF on the cele outlet, presence of membranous covering skin, volume of the FEE, defect diameter, and length of the cele outlet. The values were summed for each patient based on clinical and radiological photographs.
The overall agreement value between all assessors was calculated using the Fleiss' kappa method. The agreement value was interpreted using the following ranges of kappa: 9,10 < 0.20, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and 0.81-1.00, near-perfect agreement.
The overall assessor agreement value with the Fleiss' kappa method showed low agreement between assessors (Table 3). The total score had low agreement, with κ = 0.037 (95% CI 0.035-0.039), p = 0.254. Fleiss' kappa was also calculated for each component. CSF accumulation had the highest κ among the components, with κ = 0.460 (95% CI 0.456-0.463), p = 0.001, corresponding to moderate agreement. Moderate agreement was also found for FEE volume, with κ = 0.450 (95% CI 0.447-0.454), p = 0.001. Defect diameter showed fair agreement, with κ = 0.333 (95% CI 0.329-0.337), p = 0.001.
The other three components of the algorithm showed low agreement scores, and two of them were not statistically significant. The length of the outlet had a value of κ = 0.148 (95% CI 0.144-0.152), p = 0.020. CSF on the outlet had a value of κ = 0.070 (95% CI 0.066-0.074), p = 0.058. Presence of membranous skin had a value of κ = 0.083 (95% CI 0.078-0.088), p = 0.307.
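For readers unfamiliar with the statistic, the following is a minimal sketch of how a Fleiss' kappa point estimate can be computed from a table of ratings. It is not the SPSS procedure used in this study: the function name `fleiss_kappa` and the toy ratings are illustrative assumptions, and confidence intervals and p-values are not computed.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for `ratings`: one inner list per case, one label per assessor."""
    if not ratings:
        raise ValueError("at least one case is required")
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]  # category counts per case
    categories = {label for row in ratings for label in row}

    # Observed agreement: mean proportion of agreeing assessor pairs per case.
    per_case = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement from the overall share of each category.
    shares = [
        sum(c[cat] for c in counts) / (n_cases * n_raters) for cat in categories
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (hypothetical): two cases, three assessors, scores 0 or 1.
toy_ratings = [[0, 0, 1], [1, 1, 0]]
toy_kappa = fleiss_kappa(toy_ratings)
```

In this toy example the assessors agree less often than chance would predict, so the estimate is negative. In the study itself, each row would hold the six assessors' scores for one case on one SAFE component.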

METHODS
This was a cross-sectional study in which each case was assessed by two assessors from each of three experience groups. Neither the researcher nor the assessors knew which case was being assessed (double-blind). The study sample comprised FEE patients with CSF circulation disorders collected from medical record data from 2015-2017. The number of cases was determined from a sample size estimation table for the intraclass correlation coefficient (ICC), 8 with the number of assessors set at two, an expected ICC of 0.9, and a 95% confidence interval (CI) with alpha ±0.1. Cases were selected by the pediatric neurosurgery division team; the researcher and the assessors had no prior knowledge of the cases (double-blind).
The scores of each assessor were collected, then a descriptive analysis was carried out, followed by the SAFE assessment reliability test with the Kappa value calculation to assess the agreement between two assessors. The agreement on the results was determined by the Kappa value calculated using the SPSS software.
The strength of agreement was interpreted from the Kappa value as recommended by Landis and Koch: a value < 0 indicates no agreement, 0.00-0.20 none to slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect agreement. A Kappa value between 0.61 and 1.00 was considered reliable.
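The interpretation bands above can be written as a small lookup, shown here as a sketch. The function name `interpret_kappa` is an illustrative assumption, as is the choice to treat each upper bound as inclusive (the published bands leave small gaps, for example between 0.20 and 0.21).

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch agreement label."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "no agreement"
    # Assumption: upper bounds are inclusive, so 0.205 counts as "fair".
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def is_reliable(kappa: float) -> bool:
    """Reliability threshold stated in the text: kappa between 0.61 and 1.00."""
    return 0.61 <= kappa <= 1.0
```

For example, the values reported in this study for the total score (κ = 0.037) and for CSF accumulation (κ = 0.460) fall into the "none to slight" and "moderate" bands respectively, and neither meets the reliability threshold.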

Patient characteristics
Scoring was carried out on 10 randomly selected cases. The patient age data were not normally distributed (p = 0.012), owing to the number of cases and the wide age range (Table 1). The characteristics of the samples can be seen in Table 2.
Table 5. Frequency distribution of agreement scores based on Cohen's kappa.

The value of each component of the SAFE score
Each component in the SAFE score assessment has its own score range. These ranges are: location of the CSF accumulation, 0-2; CSF on the outlet, 0-2; presence of membranous skin, 0-1; FEE volume, 0-2; defect diameter, 0-2; and length of the outlet, 0-2. The total SAFE score was obtained by adding the values of the components listed on the scoring sheet, and the mode, median, maximum, and minimum values were recorded. In addition, the agreement between each pair of assessors was analyzed using Cohen's kappa for the six assessment components above.
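As an illustration of the scoring arithmetic only, the component ranges above can be encoded and summed as follows. The dictionary keys and the function name `safe_total` are hypothetical labels chosen for this sketch, and the SAFE flowchart itself is not reproduced because its decision rules are not given in this text.

```python
# Score ranges per SAFE component, as listed in the text (min, max).
SAFE_RANGES = {
    "csf_accumulation_location": (0, 2),
    "csf_on_outlet": (0, 2),
    "membranous_skin": (0, 1),
    "fee_volume": (0, 2),
    "defect_diameter": (0, 2),
    "outlet_length": (0, 2),
}


def safe_total(scores: dict) -> int:
    """Validate one assessor's component scores and return their sum."""
    if set(scores) != set(SAFE_RANGES):
        raise ValueError("scores must cover exactly the six SAFE components")
    for name, value in scores.items():
        low, high = SAFE_RANGES[name]
        if not (isinstance(value, int) and low <= value <= high):
            raise ValueError(f"{name} must be an integer from {low} to {high}")
    return sum(scores.values())


# Hypothetical example: the maximum score on every component.
max_scores = {name: high for name, (low, high) in SAFE_RANGES.items()}
max_total = safe_total(max_scores)
```

The maximum total of 11 follows from five components scored 0-2 and one scored 0-1.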
Cohen's kappa for each component of the SAFE algorithm was calculated for each pair of assessors, followed by calculation of the mean value of κ to obtain a single index of agreement between assessors. The frequency distributions of the agreement scores for the total scores (Table 4) and for the six components of the SAFE score (Table 5) are shown in the following tables.
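The pairwise procedure described above can be sketched as follows, assuming unweighted Cohen's kappa and a simple arithmetic mean over all assessor pairs. The function names and the toy scores are illustrative assumptions, not the SPSS output of this study.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two assessors' scores for the same cases."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("both assessors must score the same non-empty case list")
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each assessor's own score distribution.
    p_expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both assessors use one score only")
    return (p_observed - p_expected) / (1.0 - p_expected)


def mean_pairwise_kappa(scores_by_assessor):
    """Mean Cohen's kappa over every pair of assessors (a single summary index)."""
    pairs = list(combinations(scores_by_assessor, 2))
    if not pairs:
        raise ValueError("at least two assessors are required")
    kappas = [
        cohen_kappa(scores_by_assessor[i], scores_by_assessor[j]) for i, j in pairs
    ]
    return sum(kappas) / len(kappas)


# Toy data (hypothetical): three assessors scoring four cases on one component.
toy_scores = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}
toy_mean = mean_pairwise_kappa(toy_scores)
```

With six assessors this yields 15 pairwise kappa values per component, which is the kind of set a frequency distribution of agreement categories would be built from.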

SAFE score value and shunting
The analysis of scores between all assessors in patients with and without shunt insertion, based on the total SAFE value and the value of each component, can be seen in Table 6 and Table 7.
The Fleiss' kappa analysis showed low agreement between assessors for patients both with and without shunt insertion, with almost all components showing a low agreement value. In patients who had shunt insertion, fair agreement was found for FEE volume with κ = 0.271 (95% CI 0.266-0.276), p = 0.001. Moderate agreement was found for defect diameter with κ = 0.599 (95% CI 0.593-0.604), p = 0.001, and substantial agreement was found for CSF accumulation with κ = 0.684 (95% CI 0.678-0.691), p = 0.001.
Agreement on determining the need for a shunt was also analyzed with Fleiss' kappa (Table 8). Agreement on shunt insertion across all patients was fair, with κ = 0.250 (95% CI 0.245-0.255), p = 0.002. Agreement in patients who had a shunt was moderate, with κ = 0.411 (95% CI 0.403-0.418; p < 0.001), and agreement in patients who did not have a shunt was low, with κ = 0.089 (95% CI 0.082-0.97; p = 0.439).

DISCUSSION
FEE is commonly found in Southeast Asia, with an incidence of 1 in 5,000 live births. The lesion is also significantly correlated with low economic class in the pediatric population. 11 The etiology of FEE is still not fully understood. Neural tube defects such as spina bifida could lead to frontoethmoidal meningoencephalocele. Aung and Hta reported a possible correlation between folate deficiency and this condition, although there have not been enough reports on the correlation between maternal folate level and this condition. 3 Hoving and Vermeij-Keers proposed a theory of pathogenesis originating from the disruption of neural components from the ectodermal layer in the late phase of neurulation, that is, during closure of the rostral neuropore, together with a mesodermal defect in the area of disrupted separation. 12 The patients selected in this study had a median age of 10 months, ranging from 10 days to 5 years. A study of an FEE population in Myanmar showed a similar result, with the highest number of subjects under 2 years of age. 3 A similar age group was observed in a larger study in India, in which the youngest subject was 1 day old and the oldest was 6 years old. 13 Another study reported a slight difference, in which the median age of the samples was 5.3 years, ranging from 2 months to 24 years. 12 The selected subjects were predominantly female. This proportion differed from other studies in which males were the dominant population of the sample. 3,12 A study by Marshall et al. also showed a larger proportion of male subjects compared to our study. 14 Shunting was performed on five patients, and one patient also received FEE reconstruction. Almost all patients underwent surgery for the FEE lesion. FEE reconstruction was performed to close the defect and was performed early to give a chance for normal growth. 15 The FEE reconstruction approach should take cosmetic considerations into account by avoiding incisions on the face, which could cause obvious surgical scars. However, the surgical approach should also be comprehensive, intra- or transcranial, along with craniofacial reconstruction. 5,15

SAFE scoring
Assessment of FEE patients using SAFE scoring aims to facilitate decision-making for shunt insertion. Although hydrocephalus is a rare occurrence in anterior encephalocele, 10-15% of the incidence of hydrocephalus is associated with this defect. 16-18 A ventriculoperitoneal shunt is performed to divert CSF in cases of FEE with hydrocephalus. 19 Different assessors can produce different results, so a measure of agreement is needed. The agreement value is the inter-rater reliability value. This value is a matter of concern in most studies because differences in understanding and interpretation can result in different outcomes. 20 Conversely, bias in judgment can produce consistent differences: even when assessments are perfectly correlated, consistent differences mean that agreement among the assessors is poor. The Kappa index is the most popular measure of agreement. It addresses this problem by assessing both the bias and the concordance between assessor ratings. 21 The agreement value between assessors differed for each component of the SAFE algorithm, tending toward low values with a spread from low to moderate. Agreement was better for the CSF accumulation assessment, although it remained in the moderate agreement group. 22 This showed that optimal agreement had not been achieved when the SAFE algorithm was applied by the assessors.
The frequency distribution of Cohen's kappa agreement values showed the highest frequency in the group with low agreement values, followed by moderate agreement values. This showed that considerable differences remained even in the SAFE component with the highest agreement value. For the CSF accumulation component, moderate agreement values were found between assessors of the same level, involving the chief residents and the specialist doctors. The CSF accumulation assessment may therefore be influenced by the level of the doctor. However, this variation could not be eliminated because no training was held before the CSF accumulation component was assessed. Training increases the uniformity of scores even among assessors with minimal knowledge of the field under study. 23 The overall agreement for CSF on the outlet was low, with κ = 0.070 (95% CI 0.066 to 0.074), p = 0.058 based on Fleiss' kappa calculations. This low overall agreement showed the difficulty of applying the SAFE algorithm to assess this component. It may be because the assessment was made not from the original image but from a photograph with lower sharpness than the original. Differences in understanding between assessors could not be eliminated because uniform training was not provided prior to the assessment.
The difficulty in the assessment of CSF on the outlet was the use of conventional film as reference. Radiological prints consisting of multiple cuts cause a lack of uniform interpretation. Other issues that emerged include inconsistent radiological sources from the same hospital and different instruments. Centralized use of digital data that can be processed with the help of computers will provide more accurate radiological results. 24 One of the lowest agreement values was found in the membranous-covering skin component, with κ = 0.083 (95% CI 0.078 to 0.088), p = 0.307. This may also be because the assessment was made not from the original image but from a photograph with lower sharpness than the original. The disagreement indicated the need for a more careful evaluation. Assessment of membranous skin covering requires good clinical experience and understanding, and without it, assessors may interpret and understand the membranes themselves differently.
The assessment made from clinical photographs can also contribute to differences in the scores given. The sharpness and the angle of clinical photographs can affect the assessment compared with direct examination.
The overall FEE volume assessment showed moderate agreement based on the Fleiss' kappa value. Assessment using photographs of CT scan images can be less accurate than direct assessment of the CT images on a computer. A more accurate volume measurement can be obtained with the aid of a computer. Reiner et al. 24 reported that the accuracy of CT scan assessment increases when the assessment is performed on a computer workstation with a picture archiving and communication system (PACS) compared to conventional radiological film interpretation.
Components that can be objectively assessed with the aid of a computer will produce a more uniform output. Bias between assessors can be reduced by standardized tools such as distance measurement programs, so that defect diameters can be measured with minimal variation. 24 The defect diameter assessment using the conventional method showed the highest distribution of values at a low agreement value (33.33%) and a fair agreement value (20%). These results indicated high variation in scoring between assessors.
Assessment with digital data using computerized measurement can overcome differences between assessors, as in the research by Jain et al., in which digital images were a valid and reliable alternative to conventional film. 25 The accuracy of this assessment will increase with a centralized system in which digital data can be processed to their full potential, along with more objective computerized measurements. 24 The lack of a uniform measurement method, which caused difficulties for the assessors, also affected the results obtained. Adding uniform measurement guidance, such as more specific instructions on how to measure the defect diameter and the use of 3D reconstruction as a reference for defect diameter measurement, should be considered.
Clinical experience contributed significantly to the assessment of the length of the outlet in the SAFE algorithm, as shown by the moderate mean agreement value. Diagnostic accuracy reflects how well a system or test predicts the presence or absence of a condition, or how well a modality measures the level or magnitude of the condition. The assessors' perceptions are the basis of data interpretation and contribute to the diagnostic results. The tool used to assess diagnostic performance is integral to studying image perception in medicine. 26 Shunt insertion was performed on five FEE patients according to clinical considerations and the SAFE algorithm. In these five patients, the best concordance was found in the specialist doctor group, in which almost all the specialists agreed to shunt all five patients. This supports the influence of knowledge level on the SAFE algorithm assessment. Sattler et al. stated the importance of equalizing understanding among assessors. 23 The Fleiss' kappa analysis of the SAFE score in patients who had shunt insertion showed sub-optimal agreement among the assessors. This reflected the accumulation of various problems arising from both the SAFE algorithm and the limitations of the research, such as the absence of training and of digital data for radiological and clinical images, which would have made the assessment easier. Therefore, the agreement value of using the SAFE algorithm in this study was still not optimal. 23-25

CONCLUSIONS
The agreement values in the assessment of CSF on the outlet and the presence of membranous skin using SAFE were low and not significant. The agreement values for CSF accumulation and FEE volume were moderate and significant. The agreement value for defect diameter was fair and significant. The agreement value for the length of the outlet was low and significant.