Detecting Java Overflow Attacks with a Neural Network
1. Data Collection and Cleaning
We use the portion of the ADFA-LD dataset that relates to Java overflow attacks.
ADFA-LD records system-call traces. Each system call is identified by its syscall number, so a sequence of system calls becomes a sequence of syscall numbers:
```
6 6 63 6 42 120 6 195 120 6 6 114 114 1 1 252 252 252 1 1 1 1 1 1 1 1 1 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 1 1 252 1 1 1 ...
... 252 252 252 252 252 252 252 252 252 252 1 252 1 1 252 1 252 252 252 1 252 252 252 1 1 252 252 252 252 252 252 252 252 252 252 252 252 252 252 1 1 252
```

Each trace file contains one long sequence of syscall numbers like this.
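Reading one of these trace files boils down to splitting on whitespace and converting each token to an integer. A minimal sketch (the `load_trace` helper and its file path are illustrative, not part of the original code):

```python
def load_trace(path):
    """Read a whitespace-separated syscall-number trace into a list of ints."""
    with open(path) as f:
        return [int(tok) for tok in f.read().split()]

# The same parsing step on an in-memory excerpt of the trace above:
excerpt = "6 6 63 6 42 120 6 195 120"
numbers = [int(tok) for tok in excerpt.split()]
print(numbers)  # [6, 6, 63, 6, 42, 120, 6, 195, 120]
```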
Normal-state trace files (833 samples):

```
Training_Data_Master/
├── UTD-0001.txt
├── ...
├── UTD-0833.txt
```

Attack-state trace files (here we need to walk the directory tree recursively to find every file):

```
Attack_Data_Master/
├── Adduser_1
│   ...
│   └── UAD-Adduser-1-2783.txt
├── ...
├── Hydra_FTP_1
│   ...
│   └── UAD-Hydra-FTP-1-9186.txt
├── ...
├── Hydra_SSH_1
│   ...
│   └── UAD-Hydra-SSH-1-6011.txt
├── ...
├── Java_Meterpreter_1        (124 samples)
│   ...
│   └── UAD-Java-Meterpreter-1-961.txt
├── ...
├── Meterpreter_1
│   ...
│   └── UAD-Meterpreter-1-2783.txt
├── ...
├── Web_Shell_1               (118 samples)
│   ...
│   └── UAD-WS1-961.txt
├── ...
```

2. Featurization
Each file records a long string of syscall numbers; the files differ in length, and each contains many repeated numbers. We apply a set-of-words transformation to the sample collection. Put simply: collect the set of distinct numbers that occur across all samples, then for each sample set a feature to 1 if the corresponding number appears in it and to 0 if it does not. This representation discards ordering, so contextual and sequence semantics are lost.
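The idea in plain Python, as a toy sketch (this is not the code used below, just the concept):

```python
# Three "traces" of different lengths
traces = [[6, 6, 63, 120], [6, 252], [1, 1, 252, 120]]

# Vocabulary: every distinct syscall number across all samples
vocab = sorted({n for t in traces for n in t})

# Set-of-words vector: 1 if the number occurs in the trace, else 0
vectors = [[1 if n in set(t) else 0 for n in vocab] for t in traces]

print(vocab)    # [1, 6, 63, 120, 252]
print(vectors)  # [[0, 1, 1, 1, 0], [0, 1, 0, 0, 1], [1, 0, 0, 1, 1]]
```

Note that all three variable-length traces end up as vectors of the same width, which is exactly what the classifier needs.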
For background on set-of-words processing, see: Introduction to Feature Extraction for Machine Learning.
We collected 833 normal-trace samples, labeled 0, and 124 Java overflow (Java Meterpreter) attack samples, labeled 1.
```python
vectorizer = CountVectorizer(min_df=1)  # set-of-words style vectorizer
# Turn samples with differing numbers of tokens into feature vectors
# that all have the same number of columns
x = vectorizer.fit_transform(x)
# Convert the sparse result (from a list of strings) to a dense 2-D array;
# each row becomes a fixed-width vector of token counts
# (pass binary=True to CountVectorizer for a strict 0/1 encoding)
x = x.toarray()
```

3. Training
Instantiate the neural-network classifier with two hidden layers:
```python
mlp = MLPClassifier(hidden_layer_sizes=(130, 70),  # two hidden layers: 130 and 70 neurons
                    max_iter=10,            # maximum number of iterations
                    alpha=1e-4,             # L2 regularization strength
                    solver='sgd',           # optimizer: stochastic gradient descent
                    verbose=10,             # print progress to stdout
                    tol=1e-4,               # optimization tolerance
                    random_state=1,         # random-number-generator seed
                    learning_rate_init=.1)  # initial learning rate for weight updates
```

4. Validation
Because the dataset is small, we use 10-fold cross-validation:
```python
score = cross_val_score(mlp, x, y, n_jobs=-1, cv=10)
```

Accuracy: 87%
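As the training log below shows, with `learning_rate_init=.1` on raw count features, SGD diverges in several folds (the loss explodes before early stopping kicks in). Two common remedies are to standardize the inputs and to lower the learning rate. A hedged sketch on synthetic stand-in data (the shapes and labels here are illustrative, not the ADFA-LD features):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the count features (random data, random labels)
rng = np.random.RandomState(1)
X = rng.randint(0, 5, size=(200, 30)).astype(float)
y = rng.randint(0, 2, size=200)

clf = make_pipeline(
    StandardScaler(),  # zero-mean / unit-variance inputs tame SGD updates
    MLPClassifier(hidden_layer_sizes=(130, 70),
                  solver='sgd',
                  learning_rate_init=0.01,  # 10x smaller than in the article
                  max_iter=100,
                  random_state=1),
)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())  # finite losses, no divergence (chance-level on random labels)
```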
```
Iteration 1, loss = 11.38788075
Iteration 1, loss = 10.51482083
Iteration 1, loss = 10.39080226
Iteration 1, loss = 11.32852644
Iteration 2, loss = 32.20600360
Iteration 2, loss = 58.44084813
Iteration 2, loss = 7.32748794
Iteration 2, loss = 80.82413497
Iteration 3, loss = 7424.23267185
Iteration 3, loss = 9424.08875251
Iteration 3, loss = 22610434422.25117111
Iteration 3, loss = 2840272.43090724
Iteration 4, loss = 29906.60187593
Training loss did not improve more than tol=0.000100 for two consecutive epochs. Stopping.
...
0.8735988650750587
```

The folds run in parallel, so their log lines interleave; in several folds the loss explodes before early stopping fires, yet the mean accuracy over the 10 folds still comes out at about 87%.

5. Complete Code
```python
import os
import re

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier


def load_one_file(filename):
    # Each ADFA-LD trace file holds a single line of syscall numbers
    with open(filename) as f:
        line = f.readline()
        line = line.strip('\n')
    return line


def load_adfa_training_files(rootdir):
    # Normal traces: every file directly under Training_Data_Master, label 0
    x = []
    y = []
    for name in os.listdir(rootdir):
        path = os.path.join(rootdir, name)
        if os.path.isfile(path):
            x.append(load_one_file(path))
            y.append(0)
    return x, y


def dirlist(path, allfile):
    # Recursively collect every file path under `path`
    for filename in os.listdir(path):
        filepath = os.path.join(path, filename)
        if os.path.isdir(filepath):
            dirlist(filepath, allfile)
        else:
            allfile.append(filepath)
    return allfile


def load_adfa_java_files(rootdir):
    # Java Meterpreter attack traces, label 1
    x = []
    y = []
    for file in dirlist(rootdir, []):
        if re.match(r"../data/ADFA-LD/Attack_Data_Master/Java_Meterpreter_\d+/UAD-Java-Meterpreter*", file):
            x.append(load_one_file(file))
            y.append(1)
    return x, y


if __name__ == '__main__':
    x1, y1 = load_adfa_training_files("../data/ADFA-LD/Training_Data_Master/")
    x2, y2 = load_adfa_java_files("../data/ADFA-LD/Attack_Data_Master/")
    x = x1 + x2
    y = y1 + y2

    vectorizer = CountVectorizer(min_df=1)
    x = vectorizer.fit_transform(x)
    x = x.toarray()

    mlp = MLPClassifier(hidden_layer_sizes=(130, 70), max_iter=10, alpha=1e-4,
                        solver='sgd', verbose=10, tol=1e-4, random_state=1,
                        learning_rate_init=.1)

    score = cross_val_score(mlp, x, y, n_jobs=-1, cv=10)
    print(np.mean(score))
```

Summary
That is the complete walkthrough of detecting Java overflow attacks with a neural network; hopefully it helps you solve the problems you ran into.