where ρ is the sparsity constraint parameter, which ranges from 0 to 1, and β controls the weight of the sparsity penalty term. The KL divergence KL(ρ ∥ ρ̂_j) reaches its minimum value when ρ = ρ̂_j, where ρ̂_j is the average activation of hidden unit j over the training inputs. After learning the optimal values of W and b1 by applying the sparse auto-encoder to the unlabeled data, x_u, we evaluate the feature representation a = h_{W,b1}(x_l) for the labeled data, (x_l, y). This new feature representation a is used together with the label vector y. The classification task is performed with soft-max regression in the second stage, as shown in Figure 2(b).
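To make the sparsity penalty concrete, the KL term above can be computed as in the following minimal sketch (the function name and the example activation values are illustrative, not from the paper):

```python
import numpy as np

def kl_sparsity_penalty(rho, rho_hat):
    """KL divergence between the target sparsity rho and the average
    hidden activations rho_hat (one entry per hidden unit j)."""
    rho_hat = np.asarray(rho_hat, dtype=float)
    return np.sum(
        rho * np.log(rho / rho_hat)
        + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))
    )

rho = 0.05  # target average activation (sparsity constraint parameter)
# The penalty is exactly zero when every unit's average activation equals rho...
print(kl_sparsity_penalty(rho, [0.05, 0.05, 0.05]))  # -> 0.0
# ...and grows as the activations drift away from the target.
print(kl_sparsity_penalty(rho, [0.2, 0.01, 0.05]) > 0)  # -> True
```

In the full objective this term is added to the reconstruction cost with weight β, which is why larger β pushes the average activations harder toward ρ.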
3.2 NSL-KDD Dataset
The NSL-KDD dataset is an improved and reduced version of the KDD Cup 99 dataset, developed to address the drawbacks of KDD Cup 99. It has the following characteristics:
1. Unwanted and redundant records are removed, so that the classifiers produce unbiased results.
2. An adequate number of records is available in the training and testing datasets, which is reasonable and makes it practical to run experiments on the full set.
3. For each difficulty-level group, the number of selected records is inversely proportional to the percentage of records in the original KDD dataset.
The NSL-KDD dataset comprises 41 features, with a class attribute assigned to each record denoting either an attack type or normal traffic. The dataset contains five classes of traffic, which are further sorted into one normal class and four attack classes. The four attack classes are:
Denial of Service Attack (DoS)
Probing Attack (Probe)
Remote to Local Attack (R2L)
User to Root Attack (U2R)
The total number of records in the training dataset is 125,973, while the testing dataset contains 22,544 records. These features include the basic features derived directly from a TCP/IP connection. Table 1 shows the traffic distribution of NSL-KDD in the multi-class setting.
4. Proposed methodology
The NSL-KDD dataset, which contains several types of attributes with different value ranges, must be preprocessed before self-taught learning can be applied to it. Nominal attributes are converted into discrete attributes using 1-to-n encoding. As discussed earlier, self-taught learning involves two stages: feature learning and final classification.
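The 1-to-n encoding of nominal attributes mentioned above can be sketched as follows; this is an illustrative implementation (the helper name and the example protocol values are ours, not the paper's code):

```python
import numpy as np

def one_to_n_encode(values):
    """1-to-n (one-hot) encode a nominal attribute: each distinct
    value becomes its own binary column."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)), dtype=int)
    for row, v in enumerate(values):
        encoded[row, index[v]] = 1  # set the column for this value
    return categories, encoded

# protocol_type is one of NSL-KDD's nominal attributes.
cats, enc = one_to_n_encode(["tcp", "udp", "icmp", "tcp"])
print(cats)          # ['icmp', 'tcp', 'udp']
print(enc.tolist())  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Applying this to every nominal attribute expands the 41 original features into a longer, fully numeric attribute list.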
A sigmoid function is used to compute the output values during the feature-learning stage, which yields values between 0 and 1. Because the output values must match the input-layer values in this stage, the input values also have to be normalized to the range 0 to 1. We therefore perform max-min normalization on the new attribute list. With these new attributes, the unlabeled NSL-KDD training data is fed to a sparse auto-encoder to obtain a new learned feature representation. This representation is then applied to the same training data for classification using a soft-max regression classifier. For proper performance of both feature learning and classifier training, the unlabeled and labeled data originate from the same source, i.e., the NSL-KDD training data.
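The preprocessing-plus-STL flow described above can be sketched end to end as follows. This is a toy illustration under stated assumptions: the auto-encoder weights W and b1 are random placeholders standing in for the trained values, and small random data replaces NSL-KDD; only the max-min normalization, the hidden-layer map h_{W,b1}, and the soft-max stage are implemented directly.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def max_min_normalize(X):
    """Rescale each attribute (column) linearly to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard constant columns
    return (X - lo) / span

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

# Toy stand-in for NSL-KDD: 40 records, 6 numeric attributes, 2 classes.
X_raw = rng.normal(size=(40, 6)) * 10.0
y = rng.integers(0, 2, size=40)

# Step 1: max-min normalization so inputs lie in [0, 1].
X = max_min_normalize(X_raw)

# Step 2: feature learning. W and b1 would come from training the sparse
# auto-encoder on the unlabeled data; random placeholders are used here.
n_hidden = 3
W = rng.normal(scale=0.1, size=(n_hidden, X.shape[1]))
b1 = np.zeros(n_hidden)
A = sigmoid(X @ W.T + b1)  # a = h_{W,b1}(x), the learned representation

# Step 3: soft-max regression on (a, y) via batch gradient descent.
Y = np.eye(2)[y]                     # one-hot labels
Theta = np.zeros((n_hidden, 2))
for _ in range(200):
    P = softmax(A @ Theta)
    Theta -= 0.5 * A.T @ (P - Y) / len(A)  # gradient step on cross-entropy

probs = softmax(A @ Theta)
print(probs.shape)  # (40, 2); each row is a class distribution summing to 1
```

Note that both stages consume the same source data, mirroring the paper's setup in which the unlabeled and labeled sets are both drawn from the NSL-KDD training data.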