Improve Anomaly Detection in Low-resolution and Noisy WSI Images Using Transfer Learning

Wafaa A. Al-Olofi, Muhammad A. Rushdi, Muhammad A. Islam, Ahmed M. Badawi

Department of Biomedical Engineering, Cairo University, Cairo, Egypt

Abstract: Whole Slide Imaging (WSI) is a recent technology in medical pathology practice. A computerized system scans, stitches, and stores glass pathology slides as digital images organized in a multi-resolution pyramid. Because the images must retain a large amount of tissue detail, a single slide can reach gigabytes in size. Digital WSI therefore brings major challenges in data storage, image analysis, and transmission (e.g., telepathology and interoperability). Digitizing tissue specimens enables computational analysis, but classification remains difficult: lymph node interval cancer patterns must be distinguished despite the variation in tissue structure between normal and anomalous biopsies. This variation, together with the huge image size, makes accurate and reliable automated breast cancer image analysis a challenge.
We investigate the effects of scaling and noise on anomaly detection using computerized image analysis tools that perform objective quantification of hematoxylin and eosin (H&E) stains and lesion analysis on digital slides. First, we analyze the effect of scaling in order to develop a Computer Aided Diagnosis (CAD) system based on an effective set of texture features drawn from local binary pattern (LBP) variants. We first apply these features to train classifiers over five scaled levels of the multi-resolution pyramid. We then restrict the CAD system to the highest magnification level and examine it in more detail to find the best scale (window size), so that breast cancer anomalies can be detected early with lower computational cost and better accuracy. We also propose a learning-based approach that finds the scale mappings between WSI levels using partial least-squares (PLS) regression. The learned scale mapping can be used to detect anomalies in lower-resolution, lower-magnification images. Second, we explore the effect of different levels of Gaussian noise on anomaly detection and apply de-noising with PLS, Block Matching 3D (BM3D), and PLS combined with BM3D to show how different de-noising techniques reduce the impact of noise on anomaly detection.
In this paper, we work on the detected relevant lesions. The characteristic features of lesions and of normal tissue are fed to a classification step that uses Support Vector Machine (SVM) and k-nearest neighbors (KNN) classifiers to produce class probability maps. The features are normalized, and Principal Component Analysis (PCA) reduces their dimensionality, which makes classification faster and more cost-effective and yields a more accurate subset of texture features for the selected regions of interest. The final diagnosis result is evaluated with the Receiver Operating Characteristic (ROC) curve, giving an area under the curve of 0.99974, and the SVM classifier reaches an accuracy of 99.3966% under 10-fold cross-validation.

Keywords: Whole Slide Imaging, multi-resolution pyramid construction, lymph node interval cancer, local binary pattern, Computer Aided Diagnosis


Whole Slide Imaging (WSI) is a novel technology that is expected to become, in the near future, a useful tool in most routine microscopy applications, and it is a key component of telepathology. WSI is also called virtual microscopy because the images are viewed without a microscope or glass slides. Virtual slides have a multi-layer structure that stores multiple resolutions at each image location, up to the highest resolution acquired, so that specific regions of interest can be shown in more detail. Scans may be acquired at 2x, 10x, 20x, 40x, 60x, and 100x, or at a combination of magnifications, so that different areas of interest and the fine details of different diseases can be presented across multiple digital slides. The scaled levels make it possible to switch interactively between magnifications and to display the high-resolution level with its H&E tissue stains1,2. Colour in digital pathology can demonstrate the pathology findings and provide objective quantitative measures that characterize spatial regions of the tissue at every pyramid level. The image pyramid consists of a base level, which is the largest, followed by a series of successively smaller sub-images. Each level is divided into tiles and has half the extent of the previous level along the x and y coordinates, down to the smallest sub-image, which corresponds to the lowest resolution level. WSI is thus designed for use in pathology applications.

Our work focuses on whole-slide image diagnosis, specifically the detection of metastases in digitized lymph node images, which is one of the most important prognostic variables in breast cancer. Prognosis is poorer when cancer has spread to the lymph nodes. The diagnostic procedure for pathologists is, however, tedious, time-consuming, and prone to misinterpretation. A computer aided diagnosis (CAD) system can improve metastasis detection: automatic detection of lymph node metastases has great potential to help the pathologist reach the right decision. Within the past few years, the field has been moving towards the grand goal of analyzing whole-slide images to detect or grade cancer, predict prognosis, or identify metastases 3.

Computer aided diagnosis can be cast as texture classification, which involves two phases: the learning phase and the recognition phase. In the learning phase, the target is to build a model for the texture content of each texture class present in the training data, which generally comprises images with known class labels. The texture content of the training images is captured with the chosen texture analysis method, which yields a set of textural features for each image. These features characterize the textural properties of the images, such as spatial structure, contrast, roughness, and orientation. In the recognition phase, the texture content of the unknown sample is first described with the same texture analysis method. Then the textural features of the sample are compared to those of the training images with a classification algorithm, and the sample is assigned to the category with the best match. Optionally, if the best match is not sufficiently good according to some predefined criteria, the unknown sample can be rejected instead.

The WSI pyramid can be treated as a computer vision problem of associating data or knowledge across different domains, where we consider images of different resolutions as different domains. For example, cross-resolution image classification uses training data captured at high resolution (HR), so that the designed features or classifiers can be applied to recognize test data at low resolution (LR). With the goal of transferring knowledge from the source domain to the target domain, recent developments in transfer learning have shown promising results for cross-domain recognition problems. Transfer learning can be applied in many applications, from medical imaging diagnosis systems to material and texture classification.

In recent years, digital images in medical histopathology have been the subject of much research with promising results, contributing to the analysis and detection of anomalous specimens by improving the understanding of the properties of possibly unhealthy tissue during diagnosis. There is thus a large body of literature on medical image analysis based on different types of features. Some of these features are imported from the fundamentals of computer vision, such as texture, morphometric, and intensity-based features 3, or wavelet, co-occurrence, and first-order statistics features 4, while others capture the color distribution over nuclei and cytoplasm according to the H&E stain 5. Some studies propose more efficient solutions, such as multi-resolution or multi-scale detection, to reduce computational cost. These approaches usually begin on a subsampled version of the slide and increase the resolution on regions of interest until reaching sufficient confidence. This methodology attempts to mimic the way pathologists analyze whole slides.
A. Basavanhally et al. 3 analyzed WSIs by splitting the image and integrating features from multiple fields of view (FOVs) of various sizes in H&E-stained breast cancer histopathology slides. However, automatically selecting an optimal FOV size is not straightforward. Morphological features describing nuclear architecture and textural features were extracted from FOVs of varying sizes, feature selection was performed individually at each FOV size, and the important features were identified to distinguish low- from high-grade, low- from intermediate-grade, and intermediate- from high-grade patients. Finally, the classifications of the individual FOVs were combined into a prediction for the entire WSI. Unsurprisingly, the highest performance was obtained when distinguishing low- from high-grade patients.
M. Peikari et al. 4 proposed a texture technique to examine the relation between resolution and detection accuracy, since low resolution may miss some details. (1) They used a greedy approach that randomly selects small patches for analyzing whole-slide images. (2) Image patches were divided into smaller tiles. (3) Gaussian-like texture filters were applied to the tiles; the filter responses from each tile were combined and statistical measures were derived from their histograms of responses. (4) The features extracted from the tiles were combined into one histogram of words per image patch, and a support vector machine classifier was trained on these histograms to distinguish between clinically relevant and irrelevant patches. In experiments on 5151 image patches from 10 patient cases with 65 tissue slides, the proposed texture technique outperformed two previously proposed color- and intensity-based methods, with an area under the ROC curve of 0.87.

M. Valkonen et al. 5 randomly sampled tissue blocks of 200 × 200 pixels from normal and cancerous regions in both channels. The extracted features comprised texture features, for example contrast, correlation, and energy calculated from the gray level co-occurrence matrix (GLCM), and local binary patterns (LBP), as well as nuclei density features, which included descriptors related to the inter-nuclei distance inside the sample block, such as the mean, maximum, minimum, and standard deviation. The 104 feature vector representations were fed to a random forest model as training data. In the classification stage, the reported AUC was 0.84–0.91 for tumor vs. normal tissue detection.

We focus on regions of interest (ROIs) cropped from WSIs of biopsies or specimens, because this approach highlights which types of features are most suitable and useful for classification by the computer aided diagnosis (CAD) system. Our CAD system classifies tissue as cancerous or normal. The labels follow the pathologists' judgement: each case is assessed on multiple WSIs and labeled as malignant or not, using the records of the pathologists' review of the WSIs in the Camelyon16 challenge.
We then propose a learning-based approach that relates low-resolution images to high-resolution ones, exploiting the pyramid structure. We investigate a method for learning the mapping from low-resolution image features to high-resolution image features, namely Partial Least-Squares (PLS) regression. We study the effect of scale on classification performance, for example enhancing level 3 using levels 0, 1, and 2, and demonstrate how super-resolution in the feature domain affects classification performance under non-optimal scale variations. Finally, we add Gaussian noise at different levels and apply PLS, BM3D, and the combination of both.

Computer aided diagnosis may be especially beneficial in developing countries, where expert pathologists are scarce.


2.1 Dataset Description

The Camelyon16 dataset was released as a challenge on WSI virtual microscopy, supported by the 2016 IEEE International Symposium on Biomedical Imaging (ISBI-2016). It consists of 400 whole slide images (WSIs) of sentinel lymph nodes collected by two institutions: Radboud University Medical Center (Nijmegen, the Netherlands) and the University Medical Center Utrecht (Utrecht, the Netherlands)3. The ground truth for the slides containing metastases is given in two formats: .xml files that list the vertices of the annotated contours locating the cancer metastases, following a pathologist's delineation of regions of suspicious tissue on the lymph node WSIs, and binary WSI masks that mark the location of the cancer metastases18.

Our dataset contains 80 WSIs, divided into 2983 patches: 1476 patches extracted from 40 normal images and 1507 patches extracted from 40 tumor images. The patches are then divided into 1990 images for training and 994 images for testing. These training and testing counts depend on splitting all data into seventy-five percent normal and twenty-five percent tumor.

We used the OpenSlide interface, a vendor-neutral C library for reading and manipulating digital slides of diverse vendor formats. The library is extensible and easily interfaced to various programming languages. An application written to the OpenSlide interface can transparently handle multiple vendor formats. OpenSlide is in use today by many academic and industrial organizations world-wide, including many research sites in the United States that are funded by the National Institutes of Health6.

Classification steps

Our histopathological image classification uses a supervised learning model with manually selected regions of interest whose class labels are provided by pathologists. The identification and localization of diagnostically relevant regions of interest have thus emerged as an important initial step in whole slide image analysis. We also want to determine which types of features are most suitable and useful for classification. For example, first-order statistical features (mean, standard deviation, smoothness, entropy, etc.), Gray Level Co-occurrence Matrix (GLCM) features, and Gray Level Run Length Matrix (GLRLM) features have been used extensively in medical image analysis, and we compare them with three extensions of the Local Binary Pattern (LBP). These features have been shown to be effective for easier discrimination problems, such as separating healthy tissue from invasive cancerous lesions. The information they capture is then fed to the supervised KNN and SVM classifiers, as shown in figure 2.1.

Figure 2.1 A generic WSI analysis pipeline. The processed WSI analysis result is used as a second opinion supporting the physician’s diagnosis in the computer aided diagnosis7.

We use three different types of texture features to represent the information in the gray image intensities which are:
Rotation Invariant Co-occurrence among Local Binary Patterns (RIC-LBP)8: RIC-LBP features are an extension of LBP descriptors that can describe complex textures by capturing the spatial relations among adjacent LBPs, evaluating the histogram of LBP pairs considered at different orientations. RIC-LBP achieves local rotation invariance by attaching the same label to all co-occurring LBP pairs that are rotation equivalent. The major benefits of these descriptors are invariance to local and global rotations, so that the orientation of the cell image does not affect its classification, and greater robustness than the traditional LBP to uniform changes in pixel intensity, at a low computational cost for the LBP pairs.
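In the original LBP algorithm, let I be an image intensity and r = (x, y) be a position vector in I. The LBP at r is defined as follows:

$$\mathrm{LBP}(\mathbf{r}) = \sum_{i=0}^{N-1} \mathrm{sgn}\big(I(\mathbf{r} + \Delta\mathbf{s}_i) - I(\mathbf{r})\big)\, 2^{i} \qquad (2.1)$$

$$\mathrm{sgn}(x) = \begin{cases} 1, & \text{if } x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

where N is the number of neighbor pixels and $\Delta\mathbf{s}_i$ is the displacement vector from the center pixel to the i-th neighboring pixel, given by $\Delta\mathbf{s}_i = (s\cos\theta_i,\ s\sin\theta_i)$, where $\theta_i = \frac{2\pi}{N} i$ and s is a scale parameter of LBP.
For an LBP pair, the simplest way to include rotation invariance is to assign a rotation-invariant label to each LBP pair:

$$P_{\theta}(\mathbf{r}, \Delta\mathbf{r}_{\theta}) = \big(\mathrm{LBP}_{\theta}(\mathbf{r}),\ \mathrm{LBP}_{\theta}(\mathbf{r} + \Delta\mathbf{r}_{\theta})\big) \qquad (2.2)$$

$$\mathrm{LBP}_{\theta}(\mathbf{r}) = \sum_{i=0}^{N-1} \mathrm{sgn}\big(I(\mathbf{r} + \Delta\mathbf{s}_{i,\theta}) - I(\mathbf{r})\big)\, 2^{i} \qquad (2.3)$$

$$\Delta\mathbf{s}_{i,\theta} = \big(s\cos(\theta_i + \theta),\ s\sin(\theta_i + \theta)\big)$$

where θ indicates the rotation angle of an entire LBP pair. To obtain rotation invariance, the same label is given to the pairs at θ = 0°, 45°, 90°, 135°, and 180° when their binary patterns are rotation equivalent.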
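
As an illustration of the basic LBP in Eq. (2.1) only (not of the rotation-invariant pairing), the following is a minimal pure-Python sketch. The function name `lbp_code`, the list-of-lists image layout, and the rounding of neighbor positions to the nearest pixel are assumptions made for this example and are not taken from the original implementation.

```python
import math

def lbp_code(image, x, y, n_neighbors=8, scale=1):
    """Basic LBP code at pixel (x, y) of a 2-D list of gray levels.

    The pixel must lie at least `scale` pixels away from the image border.
    """
    center = image[y][x]
    code = 0
    for i in range(n_neighbors):
        theta = 2.0 * math.pi * i / n_neighbors
        # Displacement (s cos(theta_i), s sin(theta_i)), rounded to the pixel grid
        # (an assumption; interpolation is another common choice).
        dx = int(round(scale * math.cos(theta)))
        dy = int(round(scale * math.sin(theta)))
        neighbor = image[y + dy][x + dx]
        # sgn(.) is 1 when the neighbor is at least as bright as the center.
        if neighbor - center >= 0:
            code += 2 ** i
    return code

flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
print(lbp_code(flat, 1, 1))  # every neighbor equals the center, so all 8 bits are set
```

A histogram of these codes over a patch would then serve as the texture descriptor of that patch.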

Multi-scale Co-occurrence Local Binary Pattern (MCLBP)9: MCLBP is a globally rotation invariant texture feature built on multi-scale co-occurrence LBP groups. All co-occurrence patterns (among LBP points) are arranged into groups according to the properties of the co-patterns, and MCLBP texture features are extracted from each group by encoding functions. The major benefit of the MCLBP framework is that it effectively captures the strong correlation between different LBP scales around the same central point while remaining globally rotation invariant (GRI). A pattern is calculated for each point at each scale. The multi-scale co-occurrence of LBP^U (MCLBP) can then be denoted as follows:

$$\mathrm{CoLBP}^{UU}(s_1, s_2) = \big[\mathrm{LBP}^{U}(s_1, i),\ \mathrm{LBP}^{U}(s_2, i)\big]_{co} \qquad (2.4)$$

where i can be fixed to any number between 0 and n-1, $s_1$ and $s_2$ denote the two scales, and co is a co-occurrence operator.
Image rotation will change one pattern in one group to another pattern in the same group, but the cumulative probability of all patterns in a group will not change.

Median Robust Extended Local Binary Pattern (MRE-LBP)13: MRE-LBP is a powerful texture feature scheme that modifies the LBP descriptor by replacing the individual pixel intensities at the sampling points with simple kernel filter responses centered on the sampling positions. MRE-LBP has three descriptors, ELBP_CI, ELBP_NI, and ELBP_RD, which capture information from the intensity of the center pixel, the intensities of its neighboring pixels, and the radial differences, respectively. Its major benefits are robustness to noise, obtained with a nonlinear median filter over the pixels, together with rotation invariance and speed.

1) Center pixel representation:

$$\mathrm{RELBP\_CI}(X_c) = s\big(\phi(X_{c,w}) - \mu_w\big) \qquad (2.5)$$

where $\phi(X_{c,w})$ is the result of applying the filter $\phi(\cdot)$ to $X_{c,w}$, the local patch of size w × w centered at the center pixel $X_c$, and $\mu_w$ denotes the mean of $\phi(X_{c,w})$ over the whole image.

2) Neighbor representation:

$$\mathrm{RELBP\_NI}_{r,p}(X_c) = \sum_{n=0}^{p-1} s\big(\phi(X_{r,p,w_r,n}) - \mu_{r,p,w_r}\big)\, 2^{n} \qquad (2.6)$$

where $X_{r,p,w_r,n}$ denotes a patch of size $w_r \times w_r$ centered on $X_{r,p,n}$.

3) Radial difference representation:

$$\mathrm{RELBP\_RD}_{r,r-1,p,w_r,w_{r-1}}(X_c) = \sum_{n=0}^{p-1} s\big(\phi(X_{r,p,w_r,n}) - \phi(X_{r-1,p,w_{r-1},n})\big)\, 2^{n} \qquad (2.7)$$

where $X_{r,p,w_r,n}$ and $X_{r-1,p,w_{r-1},n}$ denote the patches centered at the neighboring pixels $X_{r,p,n}$ and $X_{r-1,p,n}$, respectively.

After feature extraction, we normalized all feature types by feature scaling, so that all vector values lie between -1 and 1. Normalization simplifies the coefficient values. We then applied Principal Component Analysis (PCA), a statistical dimensionality reduction procedure that represents the d-dimensional data in a lower-dimensional space, which decreases time and space complexity.
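
The scaling to the range -1 to 1 can be sketched as a column-wise min-max rescaling. This is a minimal sketch under that assumption; the text does not state which scaling formula was used, and the function name `scale_features` and the handling of constant columns are choices made for this example.

```python
def scale_features(rows):
    """Rescale each feature column linearly to the range [-1, 1]."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    scaled = []
    for r in rows:
        out = []
        for j in range(n_cols):
            span = highs[j] - lows[j]
            # A constant column carries no information; map it to 0 (assumption).
            out.append(0.0 if span == 0 else 2.0 * (r[j] - lows[j]) / span - 1.0)
        scaled.append(out)
    return scaled

features = [[0, 10], [5, 10], [10, 10]]
print(scale_features(features))
```

In a train/test setting, the column minima and maxima would normally be computed on the training patches only and reused for the test patches.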

Classification techniques

Applying the SVM classifier: we applied an SVM classifier with a non-linear radial basis function (RBF) kernel, since SVMs perform well in texture classification problems. We used the LIBSVM library14, and the optimal values of the parameters, namely gamma and the cost, were estimated empirically by grid search.

Applying the KNN classifier: we applied a KNN classifier, since KNN performs well in texture classification problems. We used the Matlab functions fitcknn and predict, with the number of neighbors set to k = 5.
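
To make the k = 5 voting rule concrete, the following is a minimal pure-Python sketch of k-nearest-neighbor prediction. It is not the Matlab fitcknn implementation used in the experiments; the function name `knn_predict`, the Euclidean distance, and the toy data are assumptions for illustration.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=5):
    """Label a query vector by majority vote among its k nearest training vectors."""
    # Squared Euclidean distance preserves the neighbor ordering.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy feature vectors: -1 marks normal patches, 1 marks cancerous patches.
train_x = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
           [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5]]
train_y = [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
print(knn_predict(train_x, train_y, [0.2, 0.2]))
```

With two classes, an odd k such as 5 avoids tied votes.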

Experiment 1

Our approach starts by processing whole slide images that are split into five levels. The Level 5 window, which is the lowest resolution, is used to detect regions of interest, and the same patches are then projected onto the higher resolutions L3, L2, L1, and L0. In other words, we use the lower-resolution level to crop the same region of the sample image at each level up to the highest one (L0), as shown in figure 2.2. In addition, we aim to obtain specific features at the different magnification levels, each of which carries different characteristics. We do this by learning features from the data at each level to train our classifiers on exactly the same regions of interest, tracking the data at more than one resolution level. This lets us identify the level that most efficiently distinguishes the data, that is, the resolution and magnification level that gives the best performance with the lowest time consumption.

Evaluating our SVM classifier by the accuracy obtained on the training set carries the risk of memorization (overfitting), so the system should be evaluated on a separate data set that is not used in training. For instance, one approach divides the data set into two disjoint sets and uses one to train and the other to test the system. When it is not feasible to set aside a significant portion of the data as the testing set, k-fold cross-validation can be used instead: all original patches are randomly divided into k groups, (k-1) groups are used to train the system, and the remaining group is used to estimate the error rate. This procedure is repeated k times so that each group is used once for testing.
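
The k-fold procedure described above can be sketched as an index splitter. This is a minimal sketch, not the code used in the experiments; the function name `k_fold_indices` and the fixed random seed are assumptions for illustration.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # random assignment of patches to groups
    folds = [indices[i::k] for i in range(k)]  # k disjoint groups of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in k_fold_indices(25, k=5):
    print(len(train), len(test))
```

Each patch index appears in exactly one test group, so every patch is used once for testing and k-1 times for training.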

Figure 2.2 Multiple magnifications at the different resolutions of the WSI pyramid structure. The WSI is split into five levels with different scales, where L0 has a 512 × 512 window size, L1 has a 256 × 256 window size, L2 has a 128 × 128 window size, L3 has a 64 × 64 window size, and L5 has a 16 × 16 window size.
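
The window sizes in Figure 2.2 halve from one level to the next, so a window located at a low-resolution level can be projected onto L0 by multiplying by a power of two. The sketch below assumes a downsample factor of exactly 2 between consecutive levels, as implied by the listed window sizes; the function names are hypothetical, and for a real slide the downsample factors should be read from the file metadata.

```python
def window_size(level, base=512):
    """Window side length at a pyramid level, assuming halving per level."""
    return base // (2 ** level)

def to_level0(x, y, size, level):
    """Project a square window (top-left x, y and side length) from `level` onto L0."""
    factor = 2 ** level  # assumed downsample factor relative to L0
    return x * factor, y * factor, size * factor

print([window_size(level) for level in (0, 1, 2, 3, 5)])
print(to_level0(3, 4, 16, 5))
```

A 16 × 16 window at L5 therefore covers the same tissue region as a 512 × 512 window at L0.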

Experiment 2

Here we examine the effect of scale at the highest magnification level L0 by splitting it into windows of several sizes and relating three quantitative measures: classification accuracy, time, and number of features.

Experiment 3

We exploit the existence of high-resolution (HR) images to learn the mapping from the low-resolution (LR) feature space to the HR feature space. This yields a new latent space in which these features are highly correlated with each other, and the features of an LR input image can then be projected directly onto the corresponding HR features. In this way we can enhance classification performance on images taken under low-resolution conditions.

Suppose that we have n observations (each row holds the input features of one patch), each a p-dimensional vector (the feature length), forming our LR-feature matrix. Correspondingly, we have n observations lying in a q-dimensional space, forming our HR-feature matrix. Let X be the regressor (input) matrix and Y be the response (output) matrix, where each row contains one observation, so that X and Y are (n×p) and (n×q) matrices, respectively. PLS models X and Y such that:

$$X = T P^{T} + E \qquad (a)$$
$$Y = U Q^{T} + F \qquad (b)$$
$$U = T B + H \qquad (c)$$

T and U are (n×d) matrices of the d extracted PLS scores or latent projections. The (p×d) matrix P and the (q×d) matrix Q represent matrices of loadings and the (n×p) matrix E, (n × q) matrix F and (n×d) matrix H are the residual matrices. B is a (d×d) diagonal matrix which relates the latent scores of X and Y. So the response variables can be predicted using the multivariate regression formula (which can be derived by substituting Eqn. (c) in Eqn. (b), then Eqn. (a) in Eqn. (b)) as:

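$$\hat{Y} = X\,(P^{T})^{+}\, B\, Q^{T} \qquad (2.8)$$

where $\hat{Y}$ is the predicted response matrix, which is our output, and $(P^{T})^{+}$ denotes the pseudoinverse of $P^{T}$.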
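
The substitution that leads to Eq. (2.8) can be written out step by step. This is a sketch that neglects the residual matrices E, F, and H and assumes that P has full column rank, so that $P^{T}(P^{T})^{+} = I_d$; neither assumption is stated explicitly in the text.

```latex
\begin{align*}
X &\approx T P^{T}
  &&\Rightarrow\quad T \approx X\,(P^{T})^{+}
  && \text{from (a), with } P^{T}(P^{T})^{+} = I_d \\
Y &\approx U Q^{T} \approx T B\, Q^{T}
  && && \text{substituting (c) into (b)} \\
\hat{Y} &= X\,(P^{T})^{+}\, B\, Q^{T}
  && && \text{substituting } T \text{ from (a)}
\end{align*}
```

The dimensions agree: $(n \times p)(p \times d)(d \times d)(d \times q)$ gives an $(n \times q)$ matrix, the same size as Y.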

T (n×r) = X-scores; U (n×r) = Y-scores
P (p×r) = X-loadings; Q (1×r) = Y-loadings
E (n×p) = X-residuals; F (n×1) = Y-residuals

We also studied another way to enhance the classifier, in which the learning-based PLS approach 15, 16 is used to map features from the low-resolution feature (LR-feature) domain to a high-resolution feature (HR-feature) or super-resolved feature (SR-feature) domain, so that classification can be performed in the SR-feature domain rather than in the low-resolution domain. Figure 2.3 shows the steps to obtain B and then calculate the predicted Y.

Figure 2.3 Steps of PLS algorithm.
Block-matching (BM) 18 is a particular way of grouping: the image is divided into patches, and patches similar to a reference patch are stacked, ordered by their proximity to the reference, along a new artificial dimension, forming a volume with one more dimension (a 3-D array). Two filters are then applied to the transformed 3-D block: wavelet hard-thresholding and a Wiener filter. The underlying concept is to collect similar d-dimensional fragments of a given signal into a (d+1)-dimensional data structure that we term a "group." In the case of images, for example, the signal fragments can be arbitrary 2-D neighborhoods (e.g., image patches or blocks), and a group is a 3-D array formed by stacking together similar image neighborhoods.

Step 1) Basic estimate.

a) Block-wise estimates. For each block in the noisy image, do the following.

i) Grouping. Find blocks that are similar to the currently processed one and then stack them together in a 3-D array (group)
ii) Collaborative hard-thresholding. Apply a 3-D transform to the formed group, attenuate the noise by hard-thresholding of the transform coefficients, invert the 3-D transform to produce estimates of all grouped blocks, and return the estimates of the blocks to their original positions.

b) Aggregation. Because the blocks overlap, one patch may be a member of more than one group. Aggregation therefore computes the basic estimate of the true image by weighted averaging of all the obtained overlapping block-wise estimates.

Step 2) Final estimate. Repeat the steps above using Wiener filtering as the de-noising strategy in place of hard-thresholding.
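
The grouping step (finding blocks similar to a reference block) can be sketched as follows. This covers only the block-matching part, not the 3-D transform, thresholding, or aggregation, and it is a simplified illustration: it searches the whole image and keeps a fixed number of best matches, whereas practical BM3D implementations typically restrict the search to a neighborhood and apply a distance threshold. All function names are hypothetical.

```python
def block(image, top, left, size):
    """Extract a size x size block from a 2-D list of gray levels."""
    return [row[left:left + size] for row in image[top:top + size]]

def block_distance(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def group_similar_blocks(image, ref_top, ref_left, size, max_blocks=4):
    """Return the positions of the blocks most similar to the reference block."""
    ref = block(image, ref_top, ref_left, size)
    height, width = len(image), len(image[0])
    candidates = []
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            d = block_distance(ref, block(image, top, left, size))
            candidates.append((d, top, left))
    candidates.sort()  # smallest distance first; the reference itself has distance 0
    return [(top, left) for _, top, left in candidates[:max_blocks]]

image = [[1, 2, 1, 2],
         [3, 4, 3, 4],
         [1, 2, 1, 2],
         [3, 4, 3, 4]]
print(group_similar_blocks(image, 0, 0, 2))
```

Stacking the returned blocks along a third axis gives the 3-D group on which the collaborative filtering operates.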

Results and discussion

The supervised KNN and SVM classifiers are trained with labeled data, and the classification process splits the original data into a learning (training) stage and a testing stage. First, in the training stage, the system learns to differentiate between normal and abnormal WSI tissue from examples of normal tissue and of suspicious tissue or abnormal lesions.
Second, in the testing stage, the accuracy of the classification system is measured by presenting test patches of unknown tissue, both normal and abnormal, and computing how often the system's decision is correct. In our work we pooled all normal and abnormal patches and drew random samples together with their labels; this was repeated 10 times, and the 10 accuracies were averaged. The training set comprised two-thirds of the patches (1990 patches), and the testing set comprised the remaining one-third (994 patches). The training and testing sets were then fed to the KNN and SVM classifiers. The test stage should measure how well the system will work on unknown samples in the future, so the test set must consist of patches that are independent of those used in the training stage.
In our proposed CAD system, we labeled cancerous patches as positive (1) and normal patches as negative (-1), and the decision for a detection result can be either correct (true) or incorrect (false). We therefore measured the detection performance of each classifier by computing its sensitivity, specificity, and accuracy from the confusion matrix, which holds the four possible outcomes of an evaluation: TP, FP, TN, and FN. A test with perfect sensitivity and specificity yields only TP or TN and never FP or FN. These measures were computed for both KNN and SVM over the 4 multi-scale levels with different magnifications and resolutions, and they demonstrate the degree of success of the diagnostic system.
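
The three measures follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/total) with the label convention above; the function name `detection_metrics` and the toy labels are assumptions for illustration.

```python
def detection_metrics(true_labels, predicted_labels):
    """Sensitivity, specificity, and accuracy for labels in {1, -1}."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if p == 1 and t == 1:
            tp += 1      # cancerous patch detected as cancerous
        elif p == 1 and t == -1:
            fp += 1      # normal patch flagged as cancerous
        elif p == -1 and t == -1:
            tn += 1      # normal patch recognized as normal
        else:
            fn += 1      # cancerous patch missed
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

truth = [1, 1, 1, -1, -1, -1, -1, 1]
preds = [1, 1, -1, -1, -1, 1, -1, 1]
print(detection_metrics(truth, preds))
```

The function assumes both classes occur in the true labels; otherwise one of the denominators would be zero.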

The SVM is the best classifier. It gives its best performance and area under the curve at the highest resolution and magnification level, L0, and its performance decreases with each lower magnification level of the pyramid. The first three levels, L0, L1, and L2, all recognize normal and abnormal tissue reasonably well, with L0 giving the best accuracy, whereas L3 gives lower accuracy and confuses the two tissue types, as shown in figure 3.1 below.

Figure 3.1 Best accuracy of the three extracted texture feature types fed to the SVM classifier over the examined scaled levels: L0, L1, L2, and L3.

After we extracted the features for every patch, they formed a feature matrix whose rows correspond to the observations (the 2983 patches used in our experiment) and whose columns correspond to the extracted features.

We then applied normalization as a necessary pre-processing step before the training stage. Our CAD system keeps the different collected features on the same scale so that no important information is lost, which helps it recognize the tissue type well. We therefore normalize the features to the common range -1 to 1 and then feed them to the classifier in the training stage, which reduces the bias in the feature weights. The major advantages of feature normalization are that training becomes faster and that the feature weights are better balanced, which leads to a more precise classification.
After that, we reduced the length of each feature type by applying PCA, an important dimensionality reduction step before feeding the features into the classifier. Its main advantages are easier feature visualization and understanding, lower storage requirements, less training computation, and better prediction performance. The purpose of this step is to obtain the features that can distinguish the normal pattern from the abnormal pattern and to select the most significant features. In other words, the discriminative power of the features increases.
The output of PCA depends on sorting the eigenvectors in descending order of their eigenvalues, from the largest to the smallest: eigenvector e1 with the first eigenvalue k1, eigenvector e2 with the second eigenvalue k2, and so on.
The k eigenvectors with the largest eigenvalues are chosen. In practice, k is chosen by inspecting the eigenvalue spectrum so that the retained components describe most of the information in the data. The first principal component is orthogonal to the second, and so on.
The goal is to eliminate features that give little or no additional information beyond that subsumed by the remaining features. Only a few features may be useful or optimal, while most may contain irrelevant or redundant information that can degrade the classifier's performance.

The eigenvalues are arranged in descending order, so that the first principal component corresponds to the largest eigenvalue and captures most of the variance in the data, and so on. The number of principal components retained is determined by the percentage of variance that should be preserved.
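The variance-based choice of the number of components can be sketched as follows. This is only an illustration: the function name `components_for_variance` and the example threshold are assumptions, and the eigenvalues are assumed to come from a PCA that has already been computed.

```python
def components_for_variance(eigenvalues, keep=0.95):
    """Return the smallest number of leading components whose eigenvalues
    account for at least `keep` (a fraction) of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= keep:
            return count
    return len(ordered)
```

With eigenvalues 4, 3, 2, and 1, the first two components hold 70% of the variance, so preserving 80% requires three components.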

Experiment 2 result
We used our CAD system to determine a suitable scale (window size) within the highest magnification level (L0). We split L0 into windows of size 1024, 512, 256, and 128 to find which scale gives the best performance at low time cost with the most effective feature type, as shown in Figure 3.2. We studied the relation between testing time and CAD system accuracy with the feature length held constant at 200.
With the feature length fixed to the first 200 features, we found that as few as 60 features describe the data well enough to keep the classifier accuracy constant. Considering testing time across scales, a window size of 256 is sufficient: it captures the details in the patch well enough to come close to the classifier performance at a window size of 1024. Among the feature types, MC-LBP gives the best performance and the shortest running time, with a maximum average accuracy over 10 trials of 95.1659, compared with 92.3854 for RIC-LBP and 93.1669 for MRE-LBP.

Figure 3.2 Behaviour of the 200-length MC-LBP feature vector at different patch scales.

Figure 3.3 ROC curves for all extracted features fed to the SVM classifier at the 1024 window size.
Experiment 3 result
With a description of Partial Least Squares (PLS) regression
We extracted 2983 patches (1476 normal, 1507 abnormal) from each of levels 0, 1, 2, 3, and 4, with sizes 512×512, 256×256, 128×128, 64×64, and 32×32 respectively. We ensured that each patch covers the same tissue spot across all levels. The data were divided into two halves: the first is used to train the mapping algorithm (PLS), and the second to evaluate it. In these experiments we used the texture features called Multi-Scale Co-occurrence of Local Binary Patterns (MC-LBP), an extension of the well-known LBP texture descriptors that achieves good performance in texture classification.
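For readers unfamiliar with LBP, the basic 8-neighbour operator that MC-LBP extends can be sketched as below. This is the plain LBP descriptor only, not MC-LBP itself; the function names, the neighbour ordering, and the list-of-lists image representation are assumptions made for illustration.

```python
def lbp_code(image, r, c):
    """Basic 8-neighbour LBP code of the pixel at (r, c)."""
    center = image[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit  # set the bit when the neighbour is not darker
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a patch."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

The histogram serves as the texture feature vector of the patch; a perfectly flat patch produces only the code 255.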

The mapping was learned on the first half of the data, from a lower resolution to a higher resolution (for example L4 to L0 and L4 to L1), to study how mapping to a near level versus a far level affects the performance of the algorithm. The other half of the data was then used to evaluate the PLS mapping: the features of level 4 (low-resolution patches) were projected into the feature domains of the corresponding higher-resolution patches at levels 0, 1, and 2 using the projection matrices learned in the previous step, and the inferred higher-resolution features were then used to train and test an SVM classifier. Figure 3.4 shows the mapping experiment as a simple block diagram, and Table 3.1 reports its results.
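The train-then-project procedure can be sketched with a small PLS regression written in NumPy. This is a hedged illustration under stated assumptions, not the authors' code: the function names `fit_pls_mapping` and `apply_pls_mapping` are hypothetical, the weights are obtained from an SVD of the cross-covariance matrix (one common PLS2 variant), and the number of latent components is a free parameter that the text does not specify.

```python
import numpy as np


def fit_pls_mapping(X_low, Y_high, n_components):
    """Fit a PLS regression from low-resolution features (rows of X_low)
    to the matching high-resolution features (rows of Y_high)."""
    x_mean, y_mean = X_low.mean(axis=0), Y_high.mean(axis=0)
    X, Y = X_low - x_mean, Y_high - y_mean
    W, P, Q = [], [], []
    for _component in range(n_components):
        # Weight vector: direction in X with maximal covariance with Y.
        u, _s, _vt = np.linalg.svd(X.T @ Y, full_matrices=False)
        w = u[:, 0]
        t = X @ w                  # latent scores
        tt = t @ t
        p = X.T @ t / tt           # X loadings
        q = Y.T @ t / tt           # Y loadings
        X = X - np.outer(t, p)     # deflate both blocks
        Y = Y - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.column_stack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T   # projection (regression) matrix
    return B, x_mean, y_mean


def apply_pls_mapping(X_low, model):
    """Project low-resolution features into the high-resolution feature domain."""
    B, x_mean, y_mean = model
    return (X_low - x_mean) @ B + y_mean
```

In the setting described above, the model would be fitted on the first half of the L4 and L0 feature pairs, and `apply_pls_mapping` would then be applied to the L4 features of the second half before training and testing the SVM on the projected features.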

Figure 3.4 PLS mapping between different scaled levels of the WSI pyramid structure.

Table 3.1 Effect of PLS mapping on the classification accuracy of the lower-resolution levels.

| Level mapping | Window size | ACC of low-res level | ACC of high-res level | ACC after PLS mapping |
|---|---|---|---|---|
| L4→L0 | 1024 | 89.21 ± 0.91 | 98.75 ± 0.99 | 91.83 ± 0.94 |
| | 512 | | 96.72 ± 0.98 | 90.89 ± 0.927 |
| L3→L0 | 1024 | 92.56 ± 0.94 | 98.75 ± 0.99 | 96.16 ± 0.97 |
| | 512 | | 96.72 ± 0.98 | 94.41 ± 0.96 |
| L2→L0 | 1024 | 92.38 ± 0.94 | 98.75 ± 0.9944 | 95.77 ± 0.97 |
| | 512 | | 96.72 ± 0.98 | 95.48 ± 0.97 |
| L1→L0 | 1024 | 94.24 ± 0.96 | 98.75 ± 0.99 | 96.2 ± 0.97 |
| | 512 | | 96.72 ± 0.98 | 96.38 ± 0.98 |
| L4→L1 | | 89.21 ± 0.91 | 94.24 ± 0.96 | 90.59 ± 0.92 |
| L3→L1 | | 92.56 ± 0.94 | 94.24 ± 0.96 | 94.36 ± 0.96 |
| L2→L1 | | 92.38 ± 0.94 | 94.24 ± 0.96 | 95.49 ± 0.97 |
| L4→L2 | | 89.21 ± 0.91 | 92.38 ± 0.94 | 90.12 ± 0.92 |
| L3→L2 | | 92.56 ± 0.94 | 92.38 ± 0.94 | 94.52 ± 0.96 |

Here, we report classifier performance with confidence intervals: rather than presenting just a single error score, a confidence interval is calculated and presented as part of the model skill.

A confidence interval is comprised of two things:

Range. This is the lower and upper limit on the skill that can be expected on the model.

Probability. This is the probability that the skill of the model will fall within the range.

In general, the confidence interval for the classification error can be calculated as follows:

classification error = incorrect predictions / total predictions

confidence interval = error ± const × sqrt(error × (1 − error) / n)    (3.1)

with const = 1.96 for a 95% confidence level.

Where error is the classification error, const is a constant value that defines the chosen probability, sqrt is the square root function, and n is the number of observations (rows) used to evaluate the model.
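Equation (3.1) can be evaluated with a few lines of code. The sketch below is illustrative: the function name `error_confidence_interval` is an assumption, and clipping the bounds to the interval from 0 to 1 is an added convenience that is not part of the equation.

```python
import math


def error_confidence_interval(n_incorrect, n_total, const=1.96):
    """Confidence interval of the classification error as in Eq. (3.1).
    const = 1.96 corresponds to a 95% confidence level."""
    error = n_incorrect / n_total
    half_width = const * math.sqrt(error * (1.0 - error) / n_total)
    # Clip to the valid range of an error rate (assumption, not in Eq. 3.1).
    return max(0.0, error - half_width), min(1.0, error + half_width)
```

For example, 10 incorrect predictions out of 100 give an error of 0.1 with a half-width of 1.96 × 0.03 = 0.0588, that is, an interval of roughly 0.041 to 0.159.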

We then added Gaussian noise at levels G = 0.01, 0.04, 0.08, 0.1, 0.4, 0.8, 1, 1.4, 1.8, and 2 to study how the distortion affects the patch features, as shown in Figure 3.5, and the performance of the SVM classifier, as shown in Table 3.2. The two de-noising algorithms (PLS and BM3D) were applied individually and in combination (BM3D+PLS).
We observed that, for the 256 window size (a different data collection from the previous experiment), PLS feature mapping improves the classifier accuracy on noisy features more than the BM3D algorithm does, while the combination of both gives the best performance, particularly for very noisy features.
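The noise injection step can be sketched as follows. This is a minimal illustration under explicit assumptions: pixel intensities are taken to lie between 0 and 1, the noise level G is interpreted as the variance of zero-mean Gaussian noise (the text does not state whether G is a variance or a standard deviation), and the function name `add_gaussian_noise` is hypothetical.

```python
import random


def add_gaussian_noise(image, variance, seed=None):
    """Add zero-mean Gaussian noise to a patch given as a list of rows of
    intensities in [0, 1], clipping the result back to [0, 1]."""
    rng = random.Random(seed)
    sigma = variance ** 0.5  # assumption: the noise level is a variance
    noisy = []
    for row in image:
        noisy.append([min(1.0, max(0.0, px + rng.gauss(0.0, sigma)))
                      for px in row])
    return noisy
```

Sweeping `variance` over the levels listed above and re-extracting the features from each noisy patch would reproduce the structure of the experiment, with the de-noising step inserted before classification.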

Figure 3.5 Patches with Gaussian noise added at different densities.

Table 3.2 Effect of added Gaussian noise and of feature enhancement with PLS, BM3D, and the combination PLS+BM3D.

| (header missing) | (header missing) | (header missing) | res/noisy | High res/clean | Gaussian noise density |
|---|---|---|---|---|---|
| 99.0332 | 96.0222 | 98.6808 | 93.6254 | 98.8419 | 0.01 |
| 98.9728 | 95.7200 | 98.9627 | 92.6083 | 98.8419 | 0.04 |
| 98.6606 | 95.9718 | 98.7210 | 93.8469 | 98.8419 | 0.08 |
| 98.1672 | 95.7704 | 98.3082 | 93.8671 | 98.8419 | 0.1 |
| 97.7744 | 94.2900 | 96.0020 | 84.3706 | 98.8419 | 0.4 |
| 97.5126 | 92.7291 | 93.8066 | 63.5045 | 98.8419 | 0.8 |
| 97.2608 | 92.8802 | 92.5881 | 55.4884 | 98.8419 | 1 |
| 96.9587 | 92.7593 | 90.8459 | 50.3223 | 98.8419 | 1.4 |
| 95.1057 | 89.5166 | 88.1470 | 49.6073 | 98.8419 | 1.8 |
| 93.5851 | 87.8046 | 87.6234 | 49.5871 | 98.8419 | 2 |

Figure 3.6 Effect of the de-noising algorithms over the range of Gaussian noise levels.

This paper focused on the development of a CAD system, defined here as an approach that obtains a meaningful subset of features for the diagnosis and analysis of histopathology of interval cancer in lymph nodes from WSI pathology lesions.
Our CAD system is based on extracting the most effective sets of texture features and feeding the resulting feature vectors to classifiers that distinguish the breast cancer pattern in H&E-stained tissue. A feature selection stage using PCA reduces the number of features, which improves prediction performance and makes the system faster and more cost-effective. The reduced features are passed to supervised KNN and SVM classifiers, which are effective tools for accurately detecting and classifying breast abnormalities. The supervised SVM classifier gave the best detection result, with a sensitivity of 0.9917 and a specificity of 0.9881.
Based on the results of this study, further research could pursue the following ideas:
Developing an automatic segmentation algorithm to detect cancer tissue and split it into small patches.
Using recent machine learning techniques such as convolutional neural networks (CNNs), which are better suited to the huge size of WSI images.
Extracting additional effective types of features.
Improving the computing response time for the huge digital pathology images, which occupy multiple gigabytes as large virtual slides, by means of compression techniques or prioritization algorithms.

Pathology WSIs can support a detection-based cancer grading workflow in a semi-automated CAD diagnosis system, increasing the precision and accuracy of tissue type identification.



1 W. Slide, O. U. Medicine, L. A. Hassell, O. U. Physicians, and O. City, “Introduction of Whole Slide Imaging in Pathology at Oklahoma University ( OU ) Medical Center.”
2 “Virtual Slide?: Real Microscopy – Redefined SLIDE INTO THE FUTURE ….”
3 P. Filipczuk, M. Kowal, and A. Obuchowicz, “Multi-label fast marching and seeded watershed segmentation methods for diagnosis of breast cancer cytology.,” Conf. Proc. IEEE Eng. Med. Biol. Soc., vol. 2013, pp. 7368–7371, 2013.
4 S. Doyle, A. Madabhushi, M. Feldman, and J. Tomaszeweski, “A Boosting Cascade for Automated Detection of Prostate Cancer from Digitized Histology,” pp. 504–511, 2006.
5 C. Bahlmann et al., “Automated detection of diagnostically relevant regions in H&E stained digital pathology slides,” Proc. SPIE, Med. Imaging, vol. 8315, p. 831504, 2012.
6 A. Basavanhally et al., “Multi-field-of-view framework for distinguishing tumor grade in ER+ breast cancer from entire histopathology slides,” IEEE Trans. Biomed. Eng., vol. 60, no. 8, pp. 2089–2099, 2013.
7 M. Peikari, M. J. Gangeh, J. Zubovits, G. Clarke, and A. L. Martel, “Triaging diagnostically relevant regions from pathology whole slides of breast cancer: A texture based approach,” IEEE Trans. Med. Imaging, vol. 35, no. 1, pp. 307–315, 2016.
8 M. Valkonen, K. Kartasalo, K. Liimatainen, M. Nykter, and L. Latonen, “Metastasis Detection from Whole Slide Images Using Local Features and Random Forests,” no. 8, pp. 1–11, 2017.
9 P. Informatics, “Journal of Pathology Informatics,” vol. 4, no. 1, 2013.
10 B. Braithwaite, K. Haataja, T. Ikonen, and P. Toivanen, “Current Analysis Approaches and Performance Needs for Whole Slide Image Processing in Breast Cancer Diagnostics,” pp. 1–7, 2015.
11 R. Nosaka and K. Fukui, “HEp-2 cell classification using rotation invariant co-occurrence among local binary patterns,” Pattern Recognit., vol. 47, no. 7, pp. 2428–2436, 2014.
12 X. Qi, L. Shen, G. Zhao, Q. Li, and M. Pietikäinen, “Globally rotation invariant multi-scale co-occurrence local binary pattern,” Image Vis. Comput., vol. 43, pp. 16–26, 2015.
13 L. Liu, S. Lao, P. W. Fieguth, Y. Guo, X. Wang, and M. Pietikäinen, “Median Robust Extended Local Binary Pattern for Texture Classification,” IEEE Trans. Image Process., vol. 25, no. 3, pp. 1368–1381, 2016.
14 C.-C. Chang and C.-J. Lin, “LIBSVM,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 1–27, Apr. 2011.

15 “Partial Least Squares (PLS),” Squares#SIMPLS.

16 Asaad, M., 2017, “Learning-based Features Super-Resolution For Low-Resolution Image Classification “, MSc. Thesis, Faculty of Engineering, Cairo University, Giza, Egypt.

17 K. Dabov, A. Foi, and V. Katkovnik, “Image denoising by sparse 3D transformation-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 1–16, 2007.

18 Camelyon 2016. January 2017, .

