Detecting Java Overflow Attacks with a Neural Network
1. Data Collection and Cleaning
We use the portion of the ADFA-LD dataset that relates to Java overflow attacks.
ADFA-LD records system-call traces. Each system call is identified by its syscall number, so a sequence of system calls becomes a sequence of syscall numbers:
```
6 6 63 6 42 120 6 195 120 6 6 114 114 1 1 252 252 252 1 1 1 1 1 1 1 1 1 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 1 1 252 1 1 1 ...
... 252 252 252 252 252 252 252 252 252 252 1 252 1 1 252 1 252 252 252 1 252 252 252 1 1 252 252 252 252 252 252 252 252 252 252 252 252 252 252 1 1 252
```

Each trace file contains one long sequence of syscall numbers like this.
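Reading one of these trace files boils down to splitting on whitespace and converting each token to an integer. A minimal sketch (the `load_trace` helper and its file path are illustrative, not part of the original code):

```python
def load_trace(path):
    """Read a whitespace-separated syscall-number trace into a list of ints."""
    with open(path) as f:
        return [int(tok) for tok in f.read().split()]

# The same parsing step on an in-memory excerpt of the trace above:
excerpt = "6 6 63 6 42 120 6 195 120"
numbers = [int(tok) for tok in excerpt.split()]
print(numbers)  # [6, 6, 63, 6, 42, 120, 6, 195, 120]
```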
Normal-state trace files (833 samples):

```
Training_Data_Master/
├── UTD-0001.txt
├── ...
├── UTD-0833.txt
```

Attack-state trace files (here we need to walk the directory tree recursively to find every file):

```
Attack_Data_Master/
├── Adduser_1
│   ...
│   └── UAD-Adduser-1-2783.txt
├── ...
├── Hydra_FTP_1
│   ...
│   └── UAD-Hydra-FTP-1-9186.txt
├── ...
├── Hydra_SSH_1
│   ...
│   └── UAD-Hydra-SSH-1-6011.txt
├── ...
├── Java_Meterpreter_1        (124 samples)
│   ...
│   └── UAD-Java-Meterpreter-1-961.txt
├── ...
├── Meterpreter_1
│   ...
│   └── UAD-Meterpreter-1-2783.txt
├── ...
├── Web_Shell_1               (118 samples)
│   ...
│   └── UAD-WS1-961.txt
├── ...
```

2. Featurization
Each file records a long string of syscall numbers; the files differ in length, and each contains many repeated numbers. We apply a set-of-words transformation to the sample collection. Put simply: collect the set of distinct numbers that occur across all samples, then for each sample set a feature to 1 if the corresponding number appears in it and to 0 if it does not. This representation discards ordering, so contextual and sequence semantics are lost.
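The idea in plain Python, as a toy sketch (this is not the code used below, just the concept):

```python
# Three "traces" of different lengths
traces = [[6, 6, 63, 120], [6, 252], [1, 1, 252, 120]]

# Vocabulary: every distinct syscall number across all samples
vocab = sorted({n for t in traces for n in t})

# Set-of-words vector: 1 if the number occurs in the trace, else 0
vectors = [[1 if n in set(t) else 0 for n in vocab] for t in traces]

print(vocab)    # [1, 6, 63, 120, 252]
print(vectors)  # [[0, 1, 1, 1, 0], [0, 1, 0, 0, 1], [1, 0, 0, 1, 1]]
```

Note that all three variable-length traces end up as vectors of the same width, which is exactly what the classifier needs.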
For background on set-of-words processing, see: Introduction to Feature Extraction for Machine Learning.
We collected 833 normal-trace samples, labeled 0, and 124 Java overflow (Java Meterpreter) attack samples, labeled 1.
```python
vectorizer = CountVectorizer(min_df=1)  # set-of-words style vectorizer
# Turn samples with differing numbers of tokens into feature vectors
# that all have the same number of columns
x = vectorizer.fit_transform(x)
# Convert the sparse result (from a list of strings) to a dense 2-D array;
# each row becomes a fixed-width vector of token counts
# (pass binary=True to CountVectorizer for a strict 0/1 encoding)
x = x.toarray()
```

3. Training
Instantiate the neural-network classifier with two hidden layers:
```python
mlp = MLPClassifier(hidden_layer_sizes=(130, 70),  # two hidden layers: 130 and 70 neurons
                    max_iter=10,            # maximum number of iterations
                    alpha=1e-4,             # L2 regularization strength
                    solver='sgd',           # optimizer: stochastic gradient descent
                    verbose=10,             # print progress to stdout
                    tol=1e-4,               # optimization tolerance
                    random_state=1,         # random-number-generator seed
                    learning_rate_init=.1)  # initial learning rate for weight updates
```

4. Validation
Because the dataset is small, we use 10-fold cross-validation:
```python
score = cross_val_score(mlp, x, y, n_jobs=-1, cv=10)
```

Accuracy: 87%
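As the training log below shows, with `learning_rate_init=.1` on raw count features, SGD diverges in several folds (the loss explodes before early stopping kicks in). Two common remedies are to standardize the inputs and to lower the learning rate. A hedged sketch on synthetic stand-in data (the shapes and labels here are illustrative, not the ADFA-LD features):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the count features (random data, random labels)
rng = np.random.RandomState(1)
X = rng.randint(0, 5, size=(200, 30)).astype(float)
y = rng.randint(0, 2, size=200)

clf = make_pipeline(
    StandardScaler(),  # zero-mean / unit-variance inputs tame SGD updates
    MLPClassifier(hidden_layer_sizes=(130, 70),
                  solver='sgd',
                  learning_rate_init=0.01,  # 10x smaller than in the article
                  max_iter=100,
                  random_state=1),
)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())  # finite losses, no divergence (chance-level on random labels)
```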
```
Iteration 1, loss = 11.38788075
Iteration 1, loss = 10.51482083
Iteration 1, loss = 10.39080226
Iteration 1, loss = 11.32852644
Iteration 2, loss = 32.20600360
Iteration 2, loss = 58.44084813
Iteration 2, loss = 7.32748794
Iteration 2, loss = 80.82413497
Iteration 3, loss = 7424.23267185
Iteration 3, loss = 9424.08875251
Iteration 3, loss = 22610434422.25117111
Iteration 3, loss = 2840272.43090724
Iteration 4, loss = 29906.60187593
Training loss did not improve more than tol=0.000100 for two consecutive epochs. Stopping.
...
0.8735988650750587
```

The folds run in parallel, so their log lines interleave; in several folds the loss explodes before early stopping fires, yet the mean accuracy over the 10 folds still comes out at about 87%.

5. Complete Code
```python
import os
import re

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier


def load_one_file(filename):
    # Each ADFA-LD trace file holds a single line of syscall numbers
    with open(filename) as f:
        line = f.readline()
        line = line.strip('\n')
    return line


def load_adfa_training_files(rootdir):
    # Normal traces: every file directly under Training_Data_Master, label 0
    x = []
    y = []
    for name in os.listdir(rootdir):
        path = os.path.join(rootdir, name)
        if os.path.isfile(path):
            x.append(load_one_file(path))
            y.append(0)
    return x, y


def dirlist(path, allfile):
    # Recursively collect every file path under `path`
    for filename in os.listdir(path):
        filepath = os.path.join(path, filename)
        if os.path.isdir(filepath):
            dirlist(filepath, allfile)
        else:
            allfile.append(filepath)
    return allfile


def load_adfa_java_files(rootdir):
    # Java Meterpreter attack traces, label 1
    x = []
    y = []
    for file in dirlist(rootdir, []):
        if re.match(r"../data/ADFA-LD/Attack_Data_Master/Java_Meterpreter_\d+/UAD-Java-Meterpreter*", file):
            x.append(load_one_file(file))
            y.append(1)
    return x, y


if __name__ == '__main__':
    x1, y1 = load_adfa_training_files("../data/ADFA-LD/Training_Data_Master/")
    x2, y2 = load_adfa_java_files("../data/ADFA-LD/Attack_Data_Master/")
    x = x1 + x2
    y = y1 + y2

    vectorizer = CountVectorizer(min_df=1)
    x = vectorizer.fit_transform(x)
    x = x.toarray()

    mlp = MLPClassifier(hidden_layer_sizes=(130, 70), max_iter=10, alpha=1e-4,
                        solver='sgd', verbose=10, tol=1e-4, random_state=1,
                        learning_rate_init=.1)

    score = cross_val_score(mlp, x, y, n_jobs=-1, cv=10)
    print(np.mean(score))
```

Summary
That is the complete walkthrough of detecting Java overflow attacks with a neural network; hopefully it helps you solve the problems you ran into.