A network intrusion detection system (NIDS) is installed at a predetermined point in the network to analyze traffic from all connected devices.
It monitors all subnet traffic and compares it to a database of known threats. An alert can be sent to the administrator whenever an attack is detected or abnormal behaviour is observed.
Dataset:
https://www.kaggle.com/sampadab17/network-intrusion-detection
The dataset at the URL above contains a wide variety of intrusions simulated in a military network environment. It was produced by setting up an environment that mimics a typical US Air Force LAN and capturing the raw TCP/IP dump data for that network.
The LAN was operated as if it were a real environment and was subjected to multiple attacks. A connection is a sequence of TCP packets that starts and ends at well-defined times, during which data flows from a source IP address to a target IP address under a well-defined protocol.
Each connection is labeled either as normal or as an attack, with exactly one specific attack type. Each connection record is around 100 bytes long.
For each TCP/IP connection, 41 features (3 qualitative and 38 quantitative) are extracted from the normal and attack data.
The class variable has two categories:
• Normal
• Anomalous
Data Visualization:
From the graph above, we can infer that the data is almost balanced.
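Class balance can also be checked numerically rather than read off a plot. The following is a minimal sketch; the helper name `class_balance` and the toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the total number of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical toy labels, used only to illustrate the calculation.
toy_labels = ["normal", "anomaly", "normal", "anomaly", "normal"]
shares = class_balance(toy_labels)
print(shares)
```

Shares close to 0.5 for both classes indicate a balanced binary problem, in which case plain accuracy is a reasonable headline metric.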
Histograms of the values in each column are displayed below:
Data Pre-Processing:
We need to standardize the input data because the features span very different ranges.
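The article does not say which scaler was used. As a sketch, assuming z-score standardization (subtract the mean, divide by the population standard deviation, which is also what scikit-learn's `StandardScaler` computes), a single numeric column could be scaled as follows. The function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale a numeric column to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical byte counts spanning several orders of magnitude.
src_bytes = [0, 200, 400, 10000]
scaled = standardize(src_bytes)
print(scaled)
```

In practice the mean and standard deviation should be computed on the training split only and then reused for the validation and test splits, so that no information leaks from held-out data.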
Next, we have to encode the categorical attributes.
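One common way to do this is label encoding, where each distinct category is replaced by an integer. The sketch below assumes that approach; the article does not name the encoder, and the helper names and sample values are illustrative only.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer, assigned in sorted order."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def encode(values, mapping):
    """Replace every category with its integer code."""
    return [mapping[v] for v in values]


# Illustrative values for a categorical column such as protocol_type.
protocol_type = ["tcp", "udp", "icmp", "tcp"]
mapping = fit_label_encoder(protocol_type)
encoded = encode(protocol_type, mapping)
print(mapping, encoded)
```

Label encoding imposes an arbitrary order on the categories. One-hot encoding avoids that at the cost of more input columns, which matters for a column with many distinct values.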
Feature Selection:
Feature selection has a significant influence on model performance, because the attributes used for training largely determine the results a model can achieve. The 15 features selected for this model are:
['src_bytes', 'dst_bytes', 'logged_in', 'count', 'srv_count', 'same_srv_rate', 'diff_srv_rate', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'protocol_type', 'service', 'flag']
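Once the feature list is fixed, every record has to be reduced to exactly these columns, in the same order, before it reaches the model. The sketch below shows that step only; it does not reproduce how the features were chosen, which the article does not describe. The helper `select_features` and the sample record are hypothetical.

```python
SELECTED_FEATURES = [
    "src_bytes", "dst_bytes", "logged_in", "count", "srv_count",
    "same_srv_rate", "diff_srv_rate", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "protocol_type", "service", "flag",
]


def select_features(record):
    """Return the selected feature values of one record, in a fixed order."""
    missing = [name for name in SELECTED_FEATURES if name not in record]
    if missing:
        raise KeyError(f"record is missing features: {missing}")
    return [record[name] for name in SELECTED_FEATURES]


# Hypothetical record: every selected feature set to 0, plus one extra column.
record = {name: 0 for name in SELECTED_FEATURES}
record["duration"] = 12  # not in the selected list, so it is dropped
row = select_features(record)
print(len(row))
```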
Model Building:
A neural network is built as follows.
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_18 (Dense)             (None, 8)                 128
_________________________________________________________________
dense_19 (Dense)             (None, 8)                 72
_________________________________________________________________
dense_20 (Dense)             (None, 1)                 9
=================================================================
Total params: 209
Trainable params: 209
Non-trainable params: 0
_________________________________________________________________
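The parameter counts in the summary can be verified by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below reproduces the numbers, assuming 15 input features; that input size is not printed in the summary and is inferred here from 128 = (15 + 1) * 8. The helper name is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units


# 15 inputs (inferred), then layers of 8, 8, and 1 units as in the summary.
layer_sizes = [15, 8, 8, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

With only 209 trainable parameters, the network is very small, which is consistent with the short per-epoch training times in the log.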
Model Training:
The model is fitted on the pre-processed data for 20 epochs.
Epoch 1/20 395/395 [==============================] - 1s 2ms/step - loss: 0.3710 - accuracy: 0.8672 - val_loss: 0.1879 - val_accuracy: 0.9286
Epoch 2/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1695 - accuracy: 0.9361 - val_loss: 0.1595 - val_accuracy: 0.9378
Epoch 3/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1579 - accuracy: 0.9429 - val_loss: 0.1499 - val_accuracy: 0.9398
Epoch 4/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1513 - accuracy: 0.9451 - val_loss: 0.1443 - val_accuracy: 0.9422
Epoch 5/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1466 - accuracy: 0.9463 - val_loss: 0.1410 - val_accuracy: 0.9466
Epoch 6/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1435 - accuracy: 0.9475 - val_loss: 0.1380 - val_accuracy: 0.9548
Epoch 7/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1411 - accuracy: 0.9500 - val_loss: 0.1359 - val_accuracy: 0.9554
Epoch 8/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1392 - accuracy: 0.9501 - val_loss: 0.1383 - val_accuracy: 0.9466
Epoch 9/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1380 - accuracy: 0.9525 - val_loss: 0.1324 - val_accuracy: 0.9594
Epoch 10/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1350 - accuracy: 0.9530 - val_loss: 0.1322 - val_accuracy: 0.9554
Epoch 11/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1336 - accuracy: 0.9542 - val_loss: 0.1301 - val_accuracy: 0.9596
Epoch 12/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1319 - accuracy: 0.9558 - val_loss: 0.1286 - val_accuracy: 0.9604
Epoch 13/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1308 - accuracy: 0.9555 - val_loss: 0.1285 - val_accuracy: 0.9588
Epoch 14/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1306 - accuracy: 0.9559 - val_loss: 0.1268 - val_accuracy: 0.9602
Epoch 15/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1282 - accuracy: 0.9574 - val_loss: 0.1266 - val_accuracy: 0.9622
Epoch 16/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1269 - accuracy: 0.9569 - val_loss: 0.1271 - val_accuracy: 0.9576
Epoch 17/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1269 - accuracy: 0.9564 - val_loss: 0.1310 - val_accuracy: 0.9530
Epoch 18/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1266 - accuracy: 0.9573 - val_loss: 0.1229 - val_accuracy: 0.9640
Epoch 19/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1241 - accuracy: 0.9591 - val_loss: 0.1251 - val_accuracy: 0.9602
Epoch 20/20 395/395 [==============================] - 1s 2ms/step - loss: 0.1231 - accuracy: 0.9577 - val_loss: 0.1213 - val_accuracy: 0.9620
The network reaches a training accuracy of about 96% (0.9577 in the final epoch, with a validation accuracy of 0.9620).
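Validation accuracy fluctuates from epoch to epoch in the training log, so the final epoch is not necessarily the best one. As a small sketch, the best epoch can be picked out of the logged `val_accuracy` values; the numbers below are copied from the log, while the helper `best_epoch` is hypothetical.

```python
# val_accuracy per epoch, copied from the training log (epochs 1 to 20).
val_accuracy = [
    0.9286, 0.9378, 0.9398, 0.9422, 0.9466, 0.9548, 0.9554, 0.9466,
    0.9594, 0.9554, 0.9596, 0.9604, 0.9588, 0.9602, 0.9622, 0.9576,
    0.9530, 0.9640, 0.9602, 0.9620,
]


def best_epoch(history):
    """Return (1-based epoch number, value) of the highest entry."""
    index = max(range(len(history)), key=history.__getitem__)
    return index + 1, history[index]


epoch, value = best_epoch(val_accuracy)
print(epoch, value)  # 18 0.964
```

Saving the weights from the best validation epoch, rather than the last one, is a common way to make use of this information.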
The accuracy-versus-epochs graph for the model is plotted below:
The loss-versus-epochs graph is plotted below:
Evaluation:
Scores:
============================== ANN Model Test Results ==============================
Model Accuracy: 0.9657316750463085
Classification report:
              precision    recall  f1-score   support
           0       0.98      0.95      0.96      3498
           1       0.95      0.98      0.97      4060
    accuracy                           0.97      7558
   macro avg       0.97      0.96      0.97      7558
weighted avg       0.97      0.97      0.97      7558
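The precision, recall, and F1 columns of a classification report all derive from three counts per class: true positives, false positives, and false negatives. The sketch below shows the formulas; the counts are hypothetical and are not the confusion matrix of this model, which the article does not report.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 for one class from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

For an intrusion detector, recall on the attack class is usually the number to watch, since a missed attack tends to cost more than a false alarm.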
Accuracy on the test data is about 96.6%.
Predictions for Test data:
Anomaly Anomaly Normal Anomaly Anomaly Normal Normal Normal Normal Normal Normal Normal Anomaly Anomaly Normal Normal Normal Normal . . . Normal Normal Normal Normal Normal Anomaly Anomaly Anomaly Normal Normal Normal Anomaly Anomaly Normal Normal Normal Normal Anomaly Anomaly Normal Normal Normal Normal Anomaly Anomaly
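A network with a single output unit produces one score per connection, which then has to be turned into a label. The sketch below assumes a sigmoid output thresholded at 0.5, with class 1 meaning "Normal" and class 0 meaning "Anomaly". Both the threshold and the class-to-label mapping are assumptions, since the article does not state them, and the helper `to_labels` is hypothetical.

```python
def to_labels(probabilities, threshold=0.5, positive="Normal", negative="Anomaly"):
    """Map model output scores to class labels using a fixed threshold."""
    return [positive if p >= threshold else negative for p in probabilities]


# Hypothetical model outputs for four connections.
labels = to_labels([0.03, 0.97, 0.50, 0.49])
print(labels)
```

Moving the threshold away from 0.5 trades false alarms against missed attacks, so it is worth tuning on validation data rather than leaving it at the default.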
Conclusion:
A deep learning model that detects intrusions in networks is built, reaching an accuracy of about 96.6% on the test data.
Platform: cAInvas
Written By: Dheeraj Perumandla