This article, collected and compiled by 生活随笔, explains how to generate one-hot encodings for biological sequences.
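- Define a class that builds the one-hot encoding of a sequence
- Read one description (header) line and one sequence line at a time
- Write the encodings to a CSV file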
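The core of the first step, turning each base into a four-element indicator vector, can be sketched in plain NumPy. This is a minimal illustration rather than the article's method: it assumes a DNA string over exactly A, C, G and T (either case), and the names `ALPHABET`, `BASE_TO_INDEX` and `one_hot_dna` are hypothetical.

```python
import numpy as np

# Assumed alphabet: the four DNA bases, in a fixed column order.
ALPHABET = "ACGT"
BASE_TO_INDEX = {base: i for i, base in enumerate(ALPHABET)}


def one_hot_dna(sequence: str) -> np.ndarray:
    """Return a (len(sequence), 4) integer array with a single 1 per row."""
    # Unknown characters (e.g. "N") raise KeyError in this sketch.
    indices = np.array([BASE_TO_INDEX[base] for base in sequence.upper()], dtype=int)
    # Row i of the 4x4 identity matrix is the one-hot vector for index i.
    return np.eye(len(ALPHABET), dtype=int)[indices]


print(one_hot_dna("GATTACA").shape)  # (7, 4)
```

Fixing the alphabet up front keeps the four columns in the same order for every sequence, which matters once many sequences are written into one file.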
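```python
import pandas as pd
from numpy import array
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Fixed nucleotide alphabet, so every sequence gets the same 4 columns in the
# same order (A, C, G, T), even when one of the bases is absent from it.
ALPHABET = ["A", "C", "G", "T"]


class hot_dna:
    def __init__(self, fasta):
        # A record is a header line (starting with ">") plus a sequence line;
        # a bare string without a header is treated as the sequence itself.
        if fasta.startswith(">"):
            lines = fasta.splitlines()
            name = lines[0]
            sequence = lines[1]
        else:
            name = "unknown_sequence"
            sequence = fasta
        # Drop stray whitespace/line endings and normalise soft-masked bases.
        sequence = sequence.strip().upper()
        seq_array = array(list(sequence))

        # Map each base to an integer (A=0, C=1, G=2, T=3); any other
        # character, such as "N", makes transform() raise a ValueError.
        label_encoder = LabelEncoder()
        label_encoder.fit(ALPHABET)
        integer_encoded_seq = label_encoder.transform(seq_array)

        # Turn the integers into one-hot rows. `sparse_output` replaces the old
        # `sparse` argument (scikit-learn >= 1.2; `sparse` was removed in 1.4).
        onehot_encoder = OneHotEncoder(
            categories=[list(range(len(ALPHABET)))], sparse_output=False
        )
        integer_encoded_seq = integer_encoded_seq.reshape(len(integer_encoded_seq), 1)
        onehot_encoded_seq = onehot_encoder.fit_transform(integer_encoded_seq)

        self.name = name
        self.sequence = sequence
        self.integer = integer_encoded_seq
        self.onehot = onehot_encoded_seq


inputfile = "H_sapiens_acc_sample__len398_pos.fasta"
savefile = "SpliceRover_H_sapiens_acc_pos.csv"

with open(inputfile, "r") as f:
    data = f.readlines()

# Each record is assumed to be exactly two lines: header, then sequence.
for index in range(0, len(data) - 1, 2):
    fasta = data[index] + data[index + 1]
    my_hottie = hot_dna(fasta)
    onehot = pd.DataFrame(my_hottie.onehot)
    # Append this sequence's L x 4 matrix (one row per base) to the CSV;
    # delete the CSV before re-running, or the rows are appended again.
    onehot.to_csv(savefile, index=False, header=False, mode="a")
```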
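The script appends each sequence's L × 4 matrix to the same CSV, one row per base, with nothing marking where one sequence ends and the next begins, so reading the file back needs the sequence length. A minimal sketch, assuming every sequence has exactly 398 bases (a guess taken only from `len398` in the input file name) and the A, C, G, T column order that a sorted `LabelEncoder` produces; `SEQ_LEN`, `reshape_stacked`, `load_onehot_csv` and `decode` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Assumed sequence length, inferred only from the input file name
# "H_sapiens_acc_sample__len398_pos.fasta"; adjust it for other data.
SEQ_LEN = 398


def reshape_stacked(flat: np.ndarray, seq_len: int = SEQ_LEN) -> np.ndarray:
    """Split a (n_sequences * seq_len, 4) stack into (n_sequences, seq_len, 4)."""
    if flat.shape[0] % seq_len != 0:
        raise ValueError("row count is not a multiple of seq_len")
    return flat.reshape(-1, seq_len, flat.shape[1])


def load_onehot_csv(path: str, seq_len: int = SEQ_LEN) -> np.ndarray:
    """Read the headerless CSV written by the script and regroup it by sequence."""
    flat = pd.read_csv(path, header=None).to_numpy()
    return reshape_stacked(flat, seq_len)


def decode(onehot: np.ndarray, alphabet: str = "ACGT") -> str:
    """Turn one (seq_len, 4) one-hot matrix back into bases via argmax."""
    return "".join(alphabet[i] for i in np.argmax(onehot, axis=1))
```

For example, `load_onehot_csv("SpliceRover_H_sapiens_acc_pos.csv")` would return an array of shape `(n_sequences, 398, 4)`, a (batch, length, channels) layout commonly fed to 1D convolutional models, and running `decode` on one slice should reproduce that sequence's bases, which makes a cheap sanity check of the encoding.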