Novel computational image analysis to predict regional nodal disease for early-stage oral cancer

Objective: Nodal disease (N+) for early-stage oral squamous cell carcinoma (OSCC) is the most significant prognostic factor for survival. There is a lack of effective predictor to justify prophylactic neck dissection. Quantitative tissue pathology (QTP) has shown its promise in providing an objective means for diagnosis and prognosis of many cancer types. We conducted a pilot study on the utilization of QTP to evaluate risk of nodal disease. Study design: Retrospective case-control study Subjects and methods: Histological sections from 15 primary tumors of clinically node-negative (N0) patients were stained with Feulgen-Thionin followed by acquisition of digital images and image processing to measure the mean and variance of nuclear phenotype and tissue architecture features from 45,253 nuclei of 45 tumor nests. Association between features and nodal disease outcome (N0 or N+) was investigated using nested analysis of variance adjusted by patient. Ability to discriminate between N0 and N+ was analyzed using multivariate logistic regression and receiver operating characteristics (ROC) curve analysis. P-value<0.05 (2-sided) was considered significant. Results: The N+ group presented higher mean values of chromatin condensation levels and cell density compared to those of the N0 group. ROC curve showed a strong discriminative ability of chromatin condensation levels between the N+ and N0 groups with a sensitivity and specificity of 1.0 and 0.75, respectively. Conclusion: This study reports the first-ever data on QTP as a risk assessment tool for nodal disease in early-stage OSCCs. Such computational imaging analysis potentially provides a new objective approach to predict regional nodal disease.


Introduction
For patients with primary oral squamous cell carcinoma (OSCC) the most significant prognostic factor is the spreading of tumor to lymph nodes of the neck where the presence of tumor in any single node can reduce survival by 45-50% [1][2][3]. Thus, all neck nodes must be clinically assessed prior to treatment planning. For clinically node-negative (cN0) patients, most clinicians advocate elective neck dissection (END) to minimize the chance of tumour spread [4]. However, to avoid overtreating ~75% patients who would never develop nodal disease, some clinicians prefer to wait and dissect only when nodal disease actually occurs; and for tumors that eventually metastasize, patients may miss the window of early intervention leading to under-treatment or death. The decision on how to manage neck nodes remains controversial. In current practice, the decision of END is most often based on indicators from histological descriptions of the surgical samples. However, this pathology interpretation can be subjective and may often lead to over treatment [5,6]. In our population-based retrospective study, we found that the tumor depth of invasion (DOI), a commonly used marker for the justification of END, is a poor indicator of nodal disease for earlystage cN0 OSCCs [7]. To avoid over treating those who will not develop nodal disease or under treating those who will benefit from early intervention, effective markers are urgently needed for the prediction of lymph node status for cN0 OSCC patients.
Quantitative tissue phenotyping (QTP) by computational image analysis has become a novel emerging means of obtaining objective information concerning diagnosis and prognosis of many cancer types [8][9][10][11][12][13]. During carcinogenesis, squamous cells go through progressive changes in appearance due to accumulation of genetic and epigenetic alterations, making them phenotypically different from normal cells. In our previous study, Guillaud et al. developed a scoring system (nuclear phenotype score, NPS) for premalignant lesions on cancer progression by recognizing nuclear phenotype that are distinctively different between normal, mild/moderate dysplasia, severe dysplasia/ carcinoma in situ, or SCC [14]. We have also used this technique to assess surgical margins to predict the chance of recurrence [15]. In this case control study, we aim to use QTP to measure NPS of tumor nests of primary OSCC tumors of cN0 patients to investigate its potential to predict nodal status during follow-up.

Patient and surgical specimen
From patients enrolled in the Canadian Optically-guided approach for Oral Lesions Surgical (COOLs) Trial [7,16,17], we identified 15 who had no previous history of OSCC, received surgical excision with curative intent, and were periodically followed up with thoroughly annotated information on demographics, clinic-pathological tumor characteristics, and clinical outcome on survival or nodal disease. The average age was 58.7±11.9 years and the male to female ratio was 1:1. Of the 15 patients, 8 remained N0 with a median follow-up time of 4.1 years, and 7 were found to be N+ after surgery or during followup. There was no difference in clinical or pathological attributes of the primary tumors between the two groups (Table 1).
Two serial sections (4-µm thickness) from formalin-fixed paraffinembedded (FFPE) primary tumors were obtained, one was stained with Hematoxylin and Eosin (HE) for visualizing cell type and tissue structure, and the other one was stained with Feulgen-Thionin (FT) ( Figure 1A and 1B, respectively), using previously described protocol [18], for nuclear content.

Image acquisition, segmentation and processing
Stained H&E slides were imaged at 20X magnification with Pannoramic MIDI © and reviewed on Pannoramic Viewer © (3DHISTECH Ltd., Budapest, Hungary). Stained FT slides were imaged at 5 focal-planes every 500 µm using a modified version of our Cyto-Savant™ imaging system with bright-field microscopy (600 ± 5 nm illumination) and a charge-coupled digital camera which had a resolution of 1280 by 1024 (0.8 numerical aperture, effective pixel spacing in the sample plane was 0.27µm) [18].
Once digitized images were obtained, each pair of HE and FT images was put through an image analysis pipeline involving the following steps. First, an experienced pathologist (CFP) identified and demarcated tumor nests, the region of interests (ROIs), by drawing around the nest border on the HE images and matched the same ROIs on the FT images. Second, the best in-focus FT-stained image of each nucleus within each ROI was automatically selected and segmented using the DUnit Program © (Integrative Oncology, BC Cancer Agency, Vancouver, Canada) [14,19] ( Figure 1C). Third, all segmented nuclei were manually evaluated and filtered to exclude: nuclei that were not squamous epithelial cells (granulocytes, lymphocytes, or fibroblasts); and 'unacceptable' nuclei that were overlapping with neighboring nuclei, incorrectly segmented, or out of focus. In the end, only nuclei of cancerous squamous cells were used for analysis.

QTP feature extraction
The DUnit program © automatically calculates 104 nuclear phenotype features and 16 tissue architectural features [20]. Nuclear phenotype features describe morphology, including the size, shape, and boundary variation of a nucleus based on the number of pixels (1 pixel = 0.116µm 2 ) occupied by a nucleus; photometric based optical density for each pixel and chromatin texture evaluated using in fractal dimension calculations, Run_length, Markovian, and discrete texture features for each segmented nucleus. Tissue architecture was computed by constructing a Voronoi tessellation and Delaunay Triangulation, using the centers of gravity of the nuclei as seeds. Together, they measure the spatial distribution of cells and the distance between neighbouring cells thus portraying tissue organization. Mean and standard deviation of each feature was calculated within each ROI. The complete list of 120 features is given in Appendix Table 1.

Layering of tumor nests as ROI
In addition to investigating each tumor nest as a unit, we also  performed layer-based analysis [21]. By generating a Voronoi tessellation, which geometrically partitions each nest into regions based on the positions of the nuclei, we generated successive layers for each nest in the following manner. All nuclei whose corresponding Voronoi polygons touched the nest border were assigned to Layer#1 (the "outermost layer"); Layer#2 consisted nuclei which were not in Layer#1 and had a Voronoi neighbour in Layer#1; Layer#3 consisted nuclei which were not in Layer#1 or Layer#2 and had an neighbour in Layer#2. Higher number layers were defined similarly ( Figure 1D and 1F). We combined each layer with its subsequent layer for the purpose of obtaining more representative measurements considering: 1) the possibility that these tumor nests are tangentially sectioned and that a segmented nucleus from a 2-dimension image gives only partial phenotypic information of a whole nucleus; and 2) the less abundant well-segmented nuclei in the layers. For example, we treated Layer#1 and Layer#2 or Layer#2 and Layer#3 as a single ROIs (Layer#1-2 or Layer#2-3) by adding the nuclei from the 2 individual layers.

Statistical analysis
Each feature was compared between the N0 and N+ groups, with adjustment for patient-tumor nest effect by performing nested analysis of variance (nANOVA) tests. Distribution of each feature was illustrated by box-and-whisker plots for the two groups. For evaluating the ability of each feature in discriminating ROIs into the correct nodal status group, we first performed a forward stepwise linear discrimination process to select the feature(s) with the highest discriminative power. Due to the relatively large number of variables and small number of ROIs, we stopped the selection process at maximum of two features to avoid over fitting. Selected feature(s) were then tested for classification performance analysis by receiver operating characteristic (ROC) curve, with area under the curve (AUC) as a representation the predictability for nodal status. For combinations of two selected features, a fitted probability of being N+ based on the model and features was computed. A threshold value with the highest sum of sensitivity and specificity was then calculated. All statistical analysis and plots were produced using R software (v3.2.3). As these were single variable and unpaired comparisons, all P values were uncorrected with P<0.05 considered to be statistically significant.

Results
A total of 45 tumor nests were identified with 23 (51%) from N0 and 22 (49%) from N+ groups (Table 2). These nests composed of 410 layers (N0, n=199; N+, n=211) and 45,253 segmented nuclei that were included for calculation of QTP features. There was no difference in the average number and pixel areas of the tumor nest per patient, the average number of layers per nest, or the average nuclei per nest between the two nodal status groups (Table 2).

Association between QTP feature and lymph node status
Tumor nests as ROIs: Among the 104 nuclear phenotypic features, 6 discrete chromatin texture features were significantly different between the two groups; they all indicate a significant higher fraction of high or medium condensation level of chromatin regions in the N+ group (Table 3 and Figure 2).

Combined tumor nest layers as ROIs:
In Layer#1-2, the N+ group exhibited significantly higher fraction of medium and high density chromatin condensation states (Lowvsmed_dnaMean, P<0.001; Lowvshigh_dnaMean, P=0.03). There was also higher value in dispersion of chromatin of high condensation state in the N+ group (High_den_objMean, P=0.003) ( Figure 4A).
In Layer #2-3, a similar observation was also seen where the N+ group had higher fraction of medium-density chromatin region  Abbreviation: N0, node-negative; N+, node-positive; SD, standard deviation ( Figure 4B). We also observed higher average values in run_length features which describe the size of chromatin clumps where the larger the run_length values, the bigger the clumps and vice versa.

Discriminative ability on nodal status
The observed differences in the features described above provided initial data supporting QTP analysis for differentiating N+ from N0. Thus, we next investigated whether these 120 features can be used to predict N+ status. Linear discrimination analysis with forward stepwise selection procedure was performed to determine feature(s) that achieved the best discriminatione between the two groups among a) tumor nests and b) combined layers.
For both types of ROIs, feature selection from within the 120 features consistently isolated Lowvsmed_dnaMean as the most Liu KYP (2016) Novel computational image analysis to predict regional nodal disease for early-stage oral cancer independent discriminative feature. For tumor nests, combination of Lowvsmed_dnaMean and VorPeriStdv gave an AUC of 85%. The threshold value of fitted probability based on the model on selected features was 0.41 with sensitivity of 82% and specificity of 78%. Combined Layer#1-2 showed the highest AUC of 94% ( Figure 5) when the combination of Lowvsmed_dnaMean and Mh_av_dstStdv features were used with threshold value of fitted probability of 0.28 achieving a sensitivity of 100% and specificity of 75%.

Discussion
Assessment and judgement of risk for nodal metastasis has always been a challenge for clinicians when facing cN0 OSCC patients. With its robust and objective nature, we report the first-ever use of QTP for quantitative analysis of tumour nests for predicting nodal disease. Comparison of QTP features between the two groups highlighted a reoccurring theme where the N+ nests demonstrated a higher ratio of medium to low condensation regions of chromatin and high cell density.
Nuclear chromatin alteration has been extensively studied as one of the regulators for genomic activity and functions which lead to aberrant chromatin remodeling that is seen in many human diseases and cancers [12,[22][23][24]. During normal cell cycle, the chromatin de-condenses and becomes loosely packed euchromatin, exposing DNA and allowing gene activity and transcription. When activity is not needed, chromatin becomes tightly packed condensed heterochromatin. Taken together, changes in condensation states of chromatin occur throughout cell division. Therefore, it could be hypothesized that the observed higher fraction of medium-condensed chromatin reflects higher proliferation activity in N+ tumor cells.
The intention for studying and comparing nest layers was motivated by the increasing evidence on the role of tumor microenvironment in initiation, growth, and spread of tumor. Our study also found that medium-to-low condensation ratio was significantly higher in nuclei of the outer layers of N+ tumor nests than that of the N0 group. Tumor nests are bordered by a well-defined boundary; and immediately inside this boundary are most likely where proliferative tumor cells reside. Indeed, the progression to cancer and metastasis require not only genomic and morphological changes but also cascades of events and responses with local microenvironment leading to loss of polarity, disruption in epithelial compartment, invasion, and angiogenesis [25][26][27][28]. This particular finding could indicate an interaction of these outer layer cells with tumor microenvironment which may play important roles in nodal metastasis. The current QTP system has the capacity to analyze in situ-stained specimens. From this point and moving forward, we plan to integrate tumor QTP features with profiles of surrounding stromal cells, including inflammatory cell types, especially for areas at the invasive front. This may help us explain the higher state of chromatin condensation seen in N+ group as well as elucidate the possible anti-cancer or pro-cancer interaction between the N+ and tumor microenvironment.
Regarding the tissue architecture of these tumor nests, we observed a strong correlation between cell density and nodal status, as characterized by smaller Voronoi polygon area and shorter distance between neighbour nuclei. From this, we attempted to infer NC (nucleus-cytoplasm) ratio, a measure commonly used in the conventional pathology, by associating cytoplasmic area with Voronoi polygon area. In normal cells, the nuclei size decrease as the cell matures; thus the NC ratio decreases as well. This suggests that an increased NC ratio in matured cells indicate atypical growth pattern which is the general observation in premalignant or malignant cells [29]. In our pilot cohort, the average nuclear area did not differ between N+ and N0 groups; however, the average polygon area measured was significantly smaller in N+ group. If we take the N0 group as the reference, i.e., set the reference NC ratio to 1, to indicate non-metastatic potential, a smaller cytoplasm area with the same nuclear area would mean a larger NC ratio. Analogously, we can suggest that NC ratio of the outermost layers may be associated with nodal status. Indeed, we observed N+ group had an increase in NC ratio, but the difference was not significant (P=0.11).
In addition, fractal dimension (FD), which measures the extent of irregularity and complexity in nuclear structure has been associated with malignant, less differentiated tumors, and poor prognosis in many cancer types [10,[30][31][32][33][34]. Although there was no difference in FD between N0 and N+ groups, we observed a higher mean of FD in nuclei of poorly differentiated tumors. Thus, this feature may potentially be used to compensate the subjectivity in determining tumor differentiation which could be associated with nodal disease outcome.
The most important finding of our study is however, the value of QTP analysis as a predictive and risk assessment aid in pathology. Two features, both describing chromatin texture, demonstrated outstanding ability discriminating N+ from N0 nest layers with AUC of 94%. This is much better compared to using the combination of differentiation and DOI, [7] which has shown 63% of AUC with 67% specificity and 54% sensitivity.
This pilot study is limited by the small number of patients included. Although there was no difference in demographics and clinicpathological characteristics, the observed association and the predictive ability of QTP features requires further validation in a larger sample size cohort with prospective recruitment, such as the COOLS study [16]. Nevertheless, despite the small number of patients, the number of tumor nuclei and layers of tumor nests are enough for meaningful statistical results.
In conclusion, the results of our study encourage the potential value of bringing computational imaging analysis as an adjunct tool for pathologists in assessing patients who may be at high risk of developing nodal disease. Additional studies are warranted with increased number of patients to corroborate the findings provided in this work.