A 12-Point Quick-Start Guide to TensorFlow 2.0, Taught by the "Father of Keras" | Highly Upvoted Post
How do you do machine learning research with TensorFlow 2.0 + Keras?
Google deep learning researcher and "father of Keras" François Chollet posted a Twitter thread summarizing a crash course on doing deep learning research with TensorFlow 2.0 + Keras.
In the guide, Chollet lays out 12 essential rules, each short and practical. The thread drew nearly 3K likes and over a thousand retweets.
Without further ado, let's see how the master makes the complex simple:
The 12 Essential Rules
1) The first thing you need to learn about is the Layer. A Layer encapsulates some state (weights) and some computation.
import tensorflow as tf
from tensorflow.keras.layers import Layer


class Linear(Layer):
    """y = w.x + b"""

    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_dim, units), dtype='float32'),
            trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(units,), dtype='float32'),
            trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


# Instantiate our layer.
linear_layer = Linear(4, 2)

# The layer can be treated as a function.
# Here we call it on some data.
y = linear_layer(tf.ones((2, 2)))
assert y.shape == (2, 4)

# Weights are automatically tracked under the `weights` property.
assert linear_layer.weights == [linear_layer.w, linear_layer.b]
2) The add_weight method is a shortcut for creating weights.
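For illustration, here is a minimal sketch (not part of the original thread) of the same Linear layer rewritten with add_weight; the shapes and initializers mirror the example above:

class Linear(Layer):
    """y = w.x + b, using add_weight instead of raw tf.Variable."""

    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        # add_weight creates the variable and registers it with the layer
        # in a single call.
        self.w = self.add_weight(shape=(input_dim, units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(units,),
                                 initializer='zeros',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b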
3) It's good practice to create weights in a separate build method, which is called lazily with the shape of the first input the layer sees. Calling add_weight there means you no longer have to specify input_dim.
class Linear(Layer):
    """y = w.x + b"""

    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


# Instantiate our lazy layer.
linear_layer = Linear(4)

# This will also call `build(input_shape)` and create the weights.
y = linear_layer(tf.ones((2, 2)))
4) You can automatically retrieve the gradients of a layer's weights by calling the layer inside a GradientTape. With those gradients, you can update the weights either via an optimizer or by hand. Of course, you can also modify the gradients before using them.
# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

# Instantiate our linear layer (defined above) with 10 units.
linear_layer = Linear(10)

# Instantiate a logistic loss function that expects integer targets.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

# Iterate over the batches of the dataset.
for step, (x, y) in enumerate(dataset):

    # Open a GradientTape.
    with tf.GradientTape() as tape:

        # Forward pass.
        logits = linear_layer(x)

        # Loss value for this batch.
        loss = loss_fn(y, logits)

    # Get gradients of weights wrt the loss.
    gradients = tape.gradient(loss, linear_layer.trainable_weights)

    # Update the weights of our linear layer.
    optimizer.apply_gradients(zip(gradients, linear_layer.trainable_weights))

    # Logging.
    if step % 100 == 0:
        print(step, float(loss))
5) Weights created by a layer can be either trainable or non-trainable; they are exposed in trainable_weights and non_trainable_weights respectively. Here is a layer with a non-trainable weight:
class ComputeSum(Layer):
    """Returns the sum of the inputs."""

    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        # Create a non-trainable weight.
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)),
                                 trainable=False)

    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total


my_sum = ComputeSum(2)
x = tf.ones((2, 2))

y = my_sum(x)
print(y.numpy())  # [2. 2.]

y = my_sum(x)
print(y.numpy())  # [4. 4.]

assert my_sum.weights == [my_sum.total]
assert my_sum.non_trainable_weights == [my_sum.total]
assert my_sum.trainable_weights == []
6) Layers can be recursively nested to create bigger computation blocks. Each layer tracks the weights of its sublayers, both trainable and non-trainable.
# Let's reuse the Linear class
# with a `build` method that we defined above.

class MLP(Layer):
    """Simple stack of Linear layers."""

    def __init__(self):
        super(MLP, self).__init__()
        self.linear_1 = Linear(32)
        self.linear_2 = Linear(32)
        self.linear_3 = Linear(10)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.linear_2(x)
        x = tf.nn.relu(x)
        return self.linear_3(x)


mlp = MLP()

# The first call to the `mlp` object will create the weights.
y = mlp(tf.ones(shape=(3, 64)))

# Weights are recursively tracked.
assert len(mlp.weights) == 6
7) Layers can create losses during the forward pass. This is especially handy for regularization losses.
class ActivityRegularization(Layer):
    """Layer that creates an activity sparsity regularization loss."""

    def __init__(self, rate=1e-2):
        super(ActivityRegularization, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # We use `add_loss` to create a regularization loss
        # that depends on the inputs.
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs


# Let's use the loss layer in an MLP block.

class SparseMLP(Layer):
    """Stack of Linear layers with a sparsity regularization loss."""

    def __init__(self):
        super(SparseMLP, self).__init__()
        self.linear_1 = Linear(32)
        self.regularization = ActivityRegularization(1e-2)
        self.linear_3 = Linear(10)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.regularization(x)
        return self.linear_3(x)


mlp = SparseMLP()
y = mlp(tf.ones((10, 10)))

print(mlp.losses)  # List containing one float32 scalar
8) These losses are cleared by the top-level layer at the start of each forward pass, so they don't accumulate: layer.losses only contains the losses created during the last forward pass. In a training loop, you would typically sum them into the main loss before computing gradients.
# Losses correspond to the *last* forward pass.
mlp = SparseMLP()
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1  # No accumulation.

# Let's demonstrate how to use these losses in a training loop.

# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

# A new MLP.
mlp = SparseMLP()

# Loss and optimizer.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

for step, (x, y) in enumerate(dataset):
    with tf.GradientTape() as tape:

        # Forward pass.
        logits = mlp(x)

        # External loss value for this batch.
        loss = loss_fn(y, logits)

        # Add the losses created during the forward pass.
        loss += sum(mlp.losses)

    # Get gradients of weights wrt the loss.
    gradients = tape.gradient(loss, mlp.trainable_weights)

    # Update the weights of our linear layer.
    optimizer.apply_gradients(zip(gradients, mlp.trainable_weights))

    # Logging.
    if step % 100 == 0:
        print(step, float(loss))
9) Running eagerly is great for debugging, but you will get better performance by compiling your computation into static graphs. Static graphs are a researcher's best friend: you can compile any function by wrapping it in the tf.function decorator.
# Prepare our layer, loss, and optimizer.
mlp = MLP()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

# Create a training step function.

@tf.function  # Make it fast.
def train_on_batch(x, y):
    with tf.GradientTape() as tape:
        logits = mlp(x)
        loss = loss_fn(y, logits)
    gradients = tape.gradient(loss, mlp.trainable_weights)
    optimizer.apply_gradients(zip(gradients, mlp.trainable_weights))
    return loss


# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

for step, (x, y) in enumerate(dataset):
    loss = train_on_batch(x, y)
    if step % 100 == 0:
        print(step, float(loss))
10) Some layers, in particular BatchNormalization and Dropout, behave differently during training and inference. For such layers, the standard pattern is to expose a training (boolean) argument in call. By exposing it there, you enable the built-in training and evaluation loops to run the layer correctly in both training and inference.
class Dropout(Layer):

    def __init__(self, rate):
        super(Dropout, self).__init__()
        self.rate = rate

    @tf.function
    def call(self, inputs, training=None):
        # Note that the tf.function decorator enables us
        # to use imperative control flow like this `if`,
        # while defining a static graph!
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs


class MLPWithDropout(Layer):

    def __init__(self):
        super(MLPWithDropout, self).__init__()
        self.linear_1 = Linear(32)
        self.dropout = Dropout(0.5)
        self.linear_3 = Linear(10)

    def call(self, inputs, training=None):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.dropout(x, training=training)
        return self.linear_3(x)


mlp = MLPWithDropout()
y_train = mlp(tf.ones((2, 2)), training=True)
y_test = mlp(tf.ones((2, 2)), training=False)

The real crux here is how, in Keras + TensorFlow 2.0, we handle layers whose behavior differs between training and testing, and how the two training approaches, model.fit() and tf.function, differ; in the end, model.fit() appears to hide quite a few surprising behaviors.
The practical recommendations boil down to:
- 1) Subclass Model, and handle the training flag in call() so that layers such as BatchNormalization and Dropout behave differently in each mode
- 2) Build the Model with the Functional API or Sequential and set tf.keras.backend.set_learning_phase(True), but remember to switch the phase back when testing (see the sketch below)
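A rough illustration of recommendation 2, as a sketch only: it assumes TF 2.0, where tf.keras.backend.set_learning_phase is still available (it was deprecated in later releases), and that layers fall back to the global learning phase when no explicit training argument is passed.

import tensorflow as tf
from tensorflow.keras import layers

# A Sequential model containing a layer whose behavior depends
# on the learning phase (Dropout).
model = tf.keras.Sequential([
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10),
])

# Training: set the global learning phase to 1 so Dropout stays active
# when the model is called without an explicit `training` argument.
tf.keras.backend.set_learning_phase(1)
y_train = model(tf.ones((2, 16)))

# Testing: switch the phase back to 0 so Dropout becomes a no-op.
tf.keras.backend.set_learning_phase(0)
y_test = model(tf.ones((2, 16)))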
11) You have access to plenty of built-in layers, from Dense, Conv2D and LSTM to fancier ones like Conv2DTranspose and ConvLSTM2D. Learn to reuse this built-in functionality.
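For instance, here is a sketch (not from the original thread; the class name BuiltinMLP is made up for illustration) of the MLP from rule 6 rebuilt on top of the built-in Dense layer:

from tensorflow.keras import layers

class BuiltinMLP(Layer):
    """Same stack as the MLP in rule 6, but reusing the built-in Dense layer."""

    def __init__(self):
        super(BuiltinMLP, self).__init__()
        self.dense_1 = layers.Dense(32, activation='relu')
        self.dense_2 = layers.Dense(32, activation='relu')
        self.dense_3 = layers.Dense(10)

    def call(self, inputs):
        x = self.dense_1(inputs)
        x = self.dense_2(x)
        return self.dense_3(x)


mlp = BuiltinMLP()
y = mlp(tf.ones((3, 64)))
assert y.shape == (3, 10)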
12) To build deep learning models, you don't always have to use object-oriented programming. Every layer you've seen so far can also be composed functionally, like this:
# We use an `Input` object to describe the shape and dtype of the inputs.
# This is the deep learning equivalent of *declaring a type*.
# The shape argument is per-sample; it does not include the batch size.
# The functional API focuses on defining per-sample transformations.
# The model we create will automatically batch the per-sample transformations,
# so that it can be called on batches of data.
inputs = tf.keras.Input(shape=(16,))

# We call layers on these "type" objects
# and they return updated types (new shapes/dtypes).
x = Linear(32)(inputs)  # We are reusing the Linear layer we defined earlier.
x = Dropout(0.5)(x)  # We are reusing the Dropout layer we defined earlier.
outputs = Linear(10)(x)

# A functional `Model` can be defined by specifying inputs and outputs.
# A model is itself a layer like any other.
model = tf.keras.Model(inputs, outputs)

# A functional model already has weights, before being called on any data.
# That's because we defined its input shape in advance (in `Input`).
assert len(model.weights) == 4

# Let's call our model on some data, for fun.
y = model(tf.ones((2, 16)))
assert y.shape == (2, 10)

# You can pass a `training` argument in `__call__`
# (it will get passed down to the Dropout layer).
y = model(tf.ones((2, 16)), training=True)
That's the Functional API. It's more concise and easier to use than subclassing, though it can only be used to define DAGs of layers.
Master these 12 rules and you can implement most deep learning research. Pretty neat, right?
Links
Finally, here is Chollet's original Twitter thread:
https://twitter.com/fchollet/status/1105139360226140160
Google Colab notebook:
https://colab.research.google.com/drive/17u-pRZJnKN0gO5XZmq8n5A2bKGrfKEUg#scrollTo=rwREGJ7Wiyl9