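Genetic Programming Algorithm
Suppose a bank wants to pick out the good customers among the people applying for loans. How can this be done?
Suppose the following training data is available:
| ID | Number of children | Salary | Marital status | Good customer? |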
| ID-1 | 2 | 45000 | Married | 0 |
| ID-2 | 0 | 30000 | Single | 1 |
| ID-3 | 1 | 40000 | Divorced | 1 |
| … | … | … | … | … |
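Suppose the following rule is learned from the training data:
IF (number of children (NOC) = 2) AND (salary (S) > 80000) THEN good customer ELSE bad customer.
This rule can be represented as a tree, as follows.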
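As a rough illustration of the rule-as-tree idea, the sketch below writes the rule as nested tuples and evaluates it recursively. The tuple layout, the operator names (`and`, `eq`, `gt`), the field names (`noc`, `salary`), and the two sample customers are assumptions made for this example; they are not part of the original code.

```python
# The rule as a nested tuple: (operator, left, right).
# Leaves are either field names (strings) or numeric constants.
RULE = ("and", ("eq", "noc", 2), ("gt", "salary", 80000))

def evaluate(tree, customer):
    """Recursively evaluate a rule tree against one customer record."""
    if isinstance(tree, tuple):
        op, left, right = tree
        a = evaluate(left, customer)
        b = evaluate(right, customer)
        if op == "and":
            return a and b
        if op == "eq":
            return a == b
        if op == "gt":
            return a > b
        raise ValueError("unknown operator: %s" % op)
    if isinstance(tree, str):
        return customer[tree]  # leaf: look up the customer's field
    return tree                # leaf: numeric constant

def classify(customer):
    return "good customer" if evaluate(RULE, customer) else "bad customer"

# Hypothetical customers, not rows from the table above.
first = classify({"noc": 2, "salary": 90000})
second = classify({"noc": 2, "salary": 45000})
```

Because the rule is just data, a different rule can be tried by building a different tuple, which is the property genetic programming relies on.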
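Genetic programming builds on genetic algorithms. A traditional genetic algorithm represents a gene as a fixed-length linear string, whereas genetic programming uses trees whose depth and width can vary. Trees can easily express arithmetic expressions, logical expressions, programs, and so on. For example:
(1) An arithmetic expression
represented as a tree:
(2) A logical expression: (x ∧ true) ? ((x ∨ y) ∨ (z ? (x ∧ y))). It can be expressed as the tree:
(3) A program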
i =1;
while (i < 20){
i = i +1
}
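can be represented as:
Because genetic programming expresses genes as trees, it is better suited to problems with complex structure, and its range of applications is much wider than that of genetic algorithms. The bank's search for good customers at the start of this article is one such example.
One of the simplest examples of genetic programming is trying to construct a simple mathematical function. Suppose we have a table of inputs and outputs, as follows: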
| x | y | Result |
| 2 | 7 | 21 |
| 8 | 5 | 83 |
| 8 | 4 | 81 |
| 7 | 9 | 75 |
| 7 | 4 | 65 |
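The underlying function is actually x*x+x+2*y+1. We now want to construct a function that fits the data in the table above.
First, build the data to fit. Define the following functions.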
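from random import randint

def examplefun(x, y):
    # target function that the evolved tree should reproduce
    return x * x + x + 2 * y + 1

def constructcheckdata(count=10):
    # build `count` random samples of the form {'x': ..., 'y': ..., 'result': ...}
    checkdata = []
    for i in range(0, count):
        dic = {}
        x = randint(0, 10)
        y = randint(0, 10)
        dic['x'] = x
        dic['y'] = y
        dic['result'] = examplefun(x, y)
        checkdata.append(dic)
    return checkdata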
The nodes of a tree fall into three kinds: functions, variables, and constants. Define three classes to wrap them:
class funwrapper:
    # wraps a function together with its number of arguments and a display name
    def __init__(self, function, childcount, name):
        self.function = function
        self.childcount = childcount
        self.name = name

class variable:
    def __init__(self, var, value=0):
        self.var = var
        self.value = value
        self.name = str(var)
        self.type = "variable"
    def evaluate(self):
        # fixed: the original returned the nonexistent attribute self.varvalue
        return self.value
    def setvar(self, value):
        self.value = value
    def display(self, indent=0):
        print('%s%s' % (' ' * indent, self.var))

class const:
    def __init__(self, value):
        self.value = value
        self.name = str(value)
        self.type = "constant"
    def evaluate(self):
        return self.value
    def display(self, indent=0):
        print('%s%d' % (' ' * indent, self.value))
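The text does not show which functions are wrapped, so here is a minimal sketch of what a function set might look like. The class is repeated so the block runs on its own; the names `addwrapper`, `subwrapper`, `mulwrapper`, and `funwraplist` contents are assumptions. Each wrapped function takes a single list argument because `node.eval()` passes the list of child results.

```python
class funwrapper:
    # same shape as the class in the text, repeated to keep this block self-contained
    def __init__(self, function, childcount, name):
        self.function = function
        self.childcount = childcount
        self.name = name

# Hypothetical function set: each function receives a list of evaluated children.
addwrapper = funwrapper(lambda args: args[0] + args[1], 2, "add")
subwrapper = funwrapper(lambda args: args[0] - args[1], 2, "subtract")
mulwrapper = funwrapper(lambda args: args[0] * args[1], 2, "multiply")
funwraplist = [addwrapper, subwrapper, mulwrapper]

# Apply every wrapped function to the same pair of child values.
results = {fw.name: fw.function([7, 3]) for fw in funwraplist}
```

Keeping `childcount` next to the function lets the tree builder know how many children to generate for each function node.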
Now a tree can be built from these nodes.
from PIL import Image, ImageDraw  # only needed by drawnode() and drawtree()

class node:
    def __init__(self, type, children, funwrap, var=None, const=None):
        self.type = type
        self.children = children
        self.funwrap = funwrap
        self.variable = var
        self.const = const
        self.depth = self.refreshdepth()
        self.value = 0
        self.fitness = 0
    def eval(self):
        if self.type == "variable":
            return self.variable.value
        elif self.type == "constant":
            return self.const.value
        else:
            # evaluate every child once, then apply the function to the list of results
            result = [c.eval() for c in self.children]
            return self.funwrap.function(result)
    def getfitness(self, checkdata):
        # checkdata is a list of dicts like {"x": 1, "result": 3}
        diff = 0
        for data in checkdata:
            # set the variable values before evaluating
            self.setvariablevalue(data)
            diff += abs(self.eval() - data["result"])
        self.fitness = diff
    def setvariablevalue(self, value):
        if self.type == "variable":
            if self.variable.var in value:
                self.variable.setvar(value[self.variable.var])
            else:
                print("There is no value for variable: %s" % self.variable.var)
            return
        if self.type == "constant":
            pass
        if self.children:  # function node
            for child in self.children:
                child.setvariablevalue(value)
    def refreshdepth(self):
        if self.type == "constant" or self.type == "variable":
            return 0
        else:
            depth = []
            for c in self.children:
                depth.append(c.refreshdepth())
            return max(depth) + 1
    def __lt__(self, other):
        # replaces the Python 2 only __cmp__/cmp pair; trees sort by fitness
        return self.fitness < other.fitness
    def display(self, indent=0):
        if self.type == "function":
            print((' ' * indent) + self.funwrap.name)
        elif self.type == "variable":
            print((' ' * indent) + self.variable.name)
        elif self.type == "constant":
            print((' ' * indent) + self.const.name)
        if self.children:
            for c in self.children:
                c.display(indent + 1)
    ## for drawing the node
    def getwidth(self):
        if self.type == "variable" or self.type == "constant":
            return 1
        else:
            result = 0
            for i in range(0, len(self.children)):
                result += self.children[i].getwidth()
            return result
    def drawnode(self, draw, x, y):
        if self.type == "function":
            allwidth = 0
            for c in self.children:
                allwidth += c.getwidth() * 100
            left = x - allwidth // 2
            # draw the function name
            draw.text((x - 10, y - 10), self.funwrap.name, (0, 0, 0))
            # draw the children
            for c in self.children:
                wide = c.getwidth() * 100
                draw.line((x, y, left + wide // 2, y + 100), fill=(255, 0, 0))
                c.drawnode(draw, left + wide // 2, y + 100)
                left = left + wide
        elif self.type == "variable":
            draw.text((x - 5, y), self.variable.name, (0, 0, 0))
        elif self.type == "constant":
            draw.text((x - 5, y), self.const.name, (0, 0, 0))
    def drawtree(self, jpeg="tree.png"):
        w = self.getwidth() * 100
        h = self.depth * 100 + 120
        img = Image.new('RGB', (w, h), (255, 255, 255))
        draw = ImageDraw.Draw(img)
        self.drawnode(draw, w // 2, 20)
        img.save(jpeg, 'PNG')

Here getfitness() computes the fitness: after the variables are assigned, it sums the absolute differences between the tree's value and the correct values in the data set. eval() computes the value of the tree once the variables have been assigned. A constructed tree looks like the figure below, which can be produced with drawtree().
The mathematical expression of this tree is x*x-3*x.
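To make the fitness measure concrete, the sketch below scores two candidates against the five rows of the input/output table given earlier. It uses plain Python functions instead of node trees, and the helper name `fitness` is an assumption for this example, not the author's API.

```python
# The five sample rows from the input/output table in the text.
checkdata = [
    {"x": 2, "y": 7, "result": 21},
    {"x": 8, "y": 5, "result": 83},
    {"x": 8, "y": 4, "result": 81},
    {"x": 7, "y": 9, "result": 75},
    {"x": 7, "y": 4, "result": 65},
]

def fitness(candidate, checkdata):
    """Sum of absolute errors of candidate(x, y) over all rows; 0 means a perfect fit."""
    return sum(abs(candidate(d["x"], d["y"]) - d["result"]) for d in checkdata)

# The true function fits every row exactly.
target_error = fitness(lambda x, y: x * x + x + 2 * y + 1, checkdata)
# The expression x*x-3*x from the text is still far from the target.
candidate_error = fitness(lambda x, y: x * x - 3 * x, checkdata)
```

On these rows the true function scores 0 and x*x-3*x scores 191, so a smaller fitness means a better tree in this example.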
Next, the program can be built from these trees. The initial population is generated randomly.
def _maketree(self, startdepth):
    if startdepth == 0:
        # make a new tree: the root is always a function
        nodepattern = 0  # function
    elif startdepth == self.maxdepth:
        nodepattern = 1  # variable or constant
    else:
        nodepattern = randint(0, 1)
    if nodepattern == 0:
        childlist = []
        selectedfun = randint(0, len(self.funwraplist) - 1)
        for i in range(0, self.funwraplist[selectedfun].childcount):
            child = self._maketree(startdepth + 1)
            childlist.append(child)
        return node("function", childlist, self.funwraplist[selectedfun])
    else:
        if randint(0, 1) == 0:  # variable
            selectedvariable = randint(0, len(self.variablelist) - 1)
            return node("variable", None, None, variable(self.variablelist[selectedvariable]), None)
        else:
            selectedconstant = randint(0, len(self.constantlist) - 1)
            return node("constant", None, None, None, const(self.constantlist[selectedconstant]))
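A depth of 0 means that a new tree is being built from scratch. When the tree reaches the maximum depth, the node that is grown must be a variable or a constant.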
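The depth rule can be checked with a small standalone sketch. It uses nested tuples instead of the node class, and the function names, arities, variable and constant lists, and `MAXDEPTH` value are all assumptions chosen for illustration.

```python
import random

FUNCTIONS = {"add": 2, "subtract": 2, "multiply": 2}  # name -> number of children (assumed)
VARIABLES = ["x", "y"]
CONSTANTS = [1, 2, 3]
MAXDEPTH = 3

def maketree(rng, depth=0):
    """Grow a random tree as nested tuples; leaves are variable names or ints."""
    if depth == 0:
        makefunction = True              # the root is always a function
    elif depth == MAXDEPTH:
        makefunction = False             # forced leaf at the depth limit
    else:
        makefunction = rng.random() < 0.5
    if makefunction:
        name = rng.choice(sorted(FUNCTIONS))
        children = tuple(maketree(rng, depth + 1) for _ in range(FUNCTIONS[name]))
        return (name,) + children
    if rng.random() < 0.5:
        return rng.choice(VARIABLES)
    return rng.choice(CONSTANTS)

def treedepth(tree):
    """Depth of a tree: 0 for a leaf, 1 + deepest child for a function node."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(treedepth(c) for c in tree[1:])

# Grow 50 trees from different seeds to check the depth bounds.
trees = [maketree(random.Random(seed)) for seed in range(50)]
```

Every generated tree has a function at the root and never grows deeper than `MAXDEPTH`, which keeps the initial population bounded in size.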
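Of course, there is more to the program than this: trees must also be mutated and crossed over. Mutation works by selecting a node and generating a new tree to replace it. Not every node is mutated; each one is mutated only with a small probability. Mutation is implemented as follows: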
from copy import deepcopy
from random import random

def mutate(self, tree, probchange=0.1, startdepth=0):
    if random() < probchange:
        # replace this node with a freshly generated subtree
        return self._maketree(startdepth)
    else:
        result = deepcopy(tree)
        if result.type == "function":
            result.children = [self.mutate(c, probchange, startdepth + 1) for c in tree.children]
        return result
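The effect of `probchange` is easiest to see at its two extremes. The sketch below is a simplified stand-in that works on nested tuples and, for brevity, replaces a mutated node with a random leaf rather than a whole new subtree; `randomleaf` and the sample tree are assumptions for this example.

```python
import random

def randomleaf(rng):
    # hypothetical leaf generator: a variable name or a small constant
    return rng.choice(["x", "y", 1, 2, 3])

def mutate(tree, rng, probchange=0.1):
    """Return a copy of tree in which each visited node is replaced with probability probchange."""
    if rng.random() < probchange:
        return randomleaf(rng)
    if isinstance(tree, tuple):
        # keep the function name and recurse into the children
        return (tree[0],) + tuple(mutate(c, rng, probchange) for c in tree[1:])
    return tree

original = ("add", ("multiply", "x", "x"), ("multiply", 2, "y"))
unchanged = mutate(original, random.Random(0), probchange=0.0)  # never mutates
replaced = mutate(original, random.Random(0), probchange=1.0)   # the root itself is replaced
```

With a probability of 0 the tree comes back unchanged, and with a probability of 1 the root is replaced immediately, so useful values lie in between and are usually small.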
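Crossover works as follows: two good individuals are selected from the population, and a node of one tree is replaced by a node of the other, producing new trees. The function below returns one child per call.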
from copy import deepcopy
from random import random, choice

def crossover(self, tree1, tree2, probswap=0.8, top=1):
    if random() < probswap and not top:
        # swap in a copy of the other tree's subtree (never at the root)
        return deepcopy(tree2)
    else:
        result = deepcopy(tree1)
        if tree1.type == "function" and tree2.type == "function":
            result.children = [self.crossover(c, choice(tree2.children), probswap, 0)
                               for c in tree1.children]
        return result
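Both mutation and crossover involve selecting a tree from the current population. A common selection method is tournament selection: pick several trees at random, then take the one with the best fitness. Another is roulette-wheel selection, which picks at random in proportion to each tree's share of the population's total fitness, so the larger a tree's fitness, the more likely it is to be chosen. The roulette-wheel function is listed below.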
from random import random

def roulettewheelsel(self, reverse=False):
    if reverse == False:
        # favor small fitness: the weights (1 - f / allfitness) sum to size - 1
        allfitness = 0
        for i in range(0, self.size):
            allfitness += self.population[i].fitness
        randomnum = random() * (self.size - 1)
        check = 0
        for i in range(0, self.size):
            # float() avoids integer division under Python 2
            check += (1.0 - self.population[i].fitness / float(allfitness))
            if check >= randomnum:
                return self.population[i], i
    if reverse == True:
        # favor large fitness: the weights f / allfitness sum to 1
        allfitness = 0
        for i in range(0, self.size):
            allfitness += self.population[i].fitness
        randomnum = random()
        check = 0
        for i in range(0, self.size):
            check += self.population[i].fitness * 1.0 / allfitness
            if check >= randomnum:
                return self.population[i], i
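When the parameter reverse is False, a smaller fitness means a better tree; otherwise, a larger fitness is better. In this example, good trees are selected for mutation and crossover so that good genes are carried into the next generation. When mutation and crossover have produced new child trees, poor trees are selected and eliminated.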
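Tournament selection is mentioned above but not listed, so here is a minimal sketch of it. It works on a plain list of fitness values and returns an index; the function name `tournamentsel` and its parameters are assumptions for this example, not part of the original program.

```python
import random

def tournamentsel(fitnesses, rng, tournamentsize=3, minimize=True):
    """Draw tournamentsize distinct indices at random and return the one with the best fitness."""
    contenders = rng.sample(range(len(fitnesses)), tournamentsize)
    pick = min if minimize else max
    return pick(contenders, key=lambda i: fitnesses[i])

fitnesses = [191, 40, 0, 75, 12]

# When the tournament includes the whole population, the overall best always wins.
winner = tournamentsel(fitnesses, random.Random(42), tournamentsize=len(fitnesses))
worst = tournamentsel(fitnesses, random.Random(42), tournamentsize=len(fitnesses), minimize=False)
```

A small tournament size keeps some randomness in the choice, while a larger one increases the pressure toward the best trees.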
Now the evolution environment can be built.
def envolve(self, maxgen=100, crossrate=0.9, mutationrate=0.1):
    for i in range(0, maxgen):
        print("generation no. %d" % i)
        child = []
        for j in range(0, int(self.size * self.newbirthrate / 2)):
            parent1, p1 = self.roulettewheelsel()
            parent2, p2 = self.roulettewheelsel()
            newchild = self.crossover(parent1, parent2)
            child.append(newchild)  # generate new tree
            parent, p3 = self.roulettewheelsel()
            newchild = self.mutate(parent, mutationrate)
            child.append(newchild)
        # replace bad trees with the children
        # (iterating over len(child) avoids an IndexError when size * newbirthrate is odd)
        for j in range(0, len(child)):
            replacedtree, replacedindex = self.roulettewheelsel(reverse=True)
            self.population[replacedindex] = child[j]
        # refresh all trees' fitness
        for k in range(0, self.size):
            self.population[k].getfitness(self.checkdata)
            self.population[k].depth = self.population[k].refreshdepth()
            if self.minimaxtype == "min":
                if self.population[k].fitness < self.besttree.fitness:
                    self.besttree = self.population[k]
            elif self.minimaxtype == "max":
                if self.population[k].fitness > self.besttree.fitness:
                    self.besttree = self.population[k]
        print("best tree's fitness.. %s" % self.besttree.fitness)
        # added to match the stopping rule stated in the text: stop on a perfect fit
        if self.minimaxtype == "min" and self.besttree.fitness == 0:
            break
    self.besttree.display()
    self.besttree.drawtree()
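In each generation, a fraction newbirthrate of the poorly performing old trees is eliminated and the same number of new trees is produced. After each iteration the fitness values are compared and the best tree is selected. Iteration stops when the fitness equals zero, meaning the correct mathematical expression has been found, or when the maximum number of iterations is exceeded.
Some other code details are omitted here. A free tutorial can be downloaded here: http://www.gp-field-guide.org.uk/
The full code can be downloaded here: http://wp.me/pGEU6-z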
Technorati Tags: genetic programming, algorithms
Reposted from: https://www.cnblogs.com/zgw21cn/archive/2009/11/07/1598238.html