Paper:《Graph Structure of Neural Networks》翻譯與解讀
Contents
《Graph Structure of Neural Networks》翻譯與解讀
Abstract
1. Introduction
2. Neural Networks as Relational Graphs
2.1. Message Exchange over Graphs
2.2. Fixed-width MLPs as Relational Graphs
2.3. General Neural Networks as Relational Graphs
3. Exploring Relational Graphs
3.1. Selection of Graph Measures
3.2. Design of Graph Generators
3.3. Controlling Computational Budget
4. Experimental Setup
4.1. Base Architectures
4.2. Exploration with Relational Graphs
5. Results
5.1. A Sweet Spot for Top Neural Networks
5.2. Neural Network Performance as a Smooth Function over Graph Measures
5.3. Consistency across Architectures
5.4. Quickly Identifying a Sweet Spot
5.5. Network Science and Neuroscience Connections
6. Related Work
7. Discussions
8. Conclusion
Acknowledgments
《Graph Structure of Neural Networks》翻譯與解讀
Original paper:
https://arxiv.org/pdf/2007.06559.pdf
https://arxiv.org/abs/2007.06559
| Comments: | ICML 2020 [Submitted on 13 Jul 2020] |
| Subjects: | Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI); Machine Learning (stat.ML) |
| Cite as: | arXiv:2007.06559 [cs.LG] (or arXiv:2007.06559v1 [cs.LG] for this version) |
Abstract
| Neural networks are often represented as graphs of connections between neurons. However, despite their wide use, there is currently little understanding of the relationship between the graph structure of the neural network and its predictive performance. Here we systematically investigate how does the graph structure of neural networks affect their predictive performance. To this end, we develop a novel graph-based representation of neural networks called relational graph, where layers of neural network computation correspond to rounds of message exchange along the graph structure. Using this representation we show that: (1) a "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance; (2) neural network's performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph; (3) our findings are consistent across many different tasks and datasets; (4) the sweet spot can be identified efficiently; (5) top-performing neural networks have graph structure surprisingly similar to those of real biological neural networks. Our work opens new directions for the design of neural architectures and the understanding on neural networks in general. | 神經網絡通常用神經元之間的連接圖來表示。然而,盡管它們被廣泛使用,目前人們對神經網絡的圖結構與其預測性能之間的關系知之甚少。在這里,我們系統(tǒng)地研究如何圖結構的神經網絡影響其預測性能。為此,我們開發(fā)了一種新的基于圖的神經網絡表示,稱為關系圖,其中神經網絡計算的層對應于沿著圖結構的消息交換輪數(shù)。利用這種表示法,我們證明:
(1)關系圖存在一個能顯著提升神經網絡預測性能的“最佳區(qū)域”(sweet spot);(2)神經網絡的性能近似是其關系圖的聚類系數(shù)和平均路徑長度的平滑函數(shù);(3)我們的發(fā)現(xiàn)在許多不同的任務和數(shù)據(jù)集上是一致的;(4)這個最佳區(qū)域可以被高效地識別;(5)表現(xiàn)最好的神經網絡,其圖結構與真實生物神經網絡驚人地相似。我們的工作為神經結構設計以及對神經網絡的總體理解開辟了新的方向。 |
1. Introduction
| Deep neural networks consist of neurons organized into layers and connections between them. Architecture of a neural network can be captured by its “computational graph” where neurons are represented as nodes and directed edges link neurons in different layers. Such graphical representation demonstrates how the network passes and transforms the information from its input neurons, through hidden layers?all the way to the output neurons (McClelland et al., 1986). While it has been widely observed that performance of neural networks depends on their architecture (LeCun et al., 1998; Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2015; He et al., 2016), there is currently little systematic understanding on the relation between a neural network’s accuracy and its underlying graph structure. This is especially important for the neural architecture search, which today exhaustively searches over all possible connectivity patterns (Ying et al., 2019). From this perspective, several open questions arise:
Establishing such a relation is both scientifically and practically important because it would have direct consequences on designing more efficient and more accurate architectures. It would also inform the design of new hardware architectures that execute neural networks. Understanding the graph structures that underlie neural networks would also advance the science of deep learning. However, establishing the relation between network architecture and its accuracy is nontrivial, because it is unclear how to map a neural network to a graph (and vice versa). The natural choice would be to use computational graph representation but it has many limitations: (1) lack of generality: Computational graphs are constrained by the allowed graph properties, e.g., these graphs have to be directed and acyclic (DAGs), bipartite at the layer level, and single-in-single-out at the network level (Xie et al., 2019). This limits the use of the rich tools developed for general graphs. (2) Disconnection with biology/neuroscience: Biological neural networks have a much richer and less templatized structure (Fornito et al., 2013). There are information exchanges, rather than just single-directional flows, in the brain networks (Stringer et al., 2018). Such biological or neurological models cannot be simply represented by directed acyclic graphs. | 深層神經網絡由神經元組成,這些神經元被組織成層,并在層與層之間建立聯(lián)系。神經網絡的結構可以通過它的“計算圖”來捕獲,其中神經元被表示為節(jié)點,有向邊連接不同層次的神經元。這樣的圖形表示演示了網絡如何傳遞和轉換來自輸入神經元的信息,通過隱藏層一直到輸出神經元(McClelland et al., 1986)。雖然已廣泛觀察到神經網絡的性能取決于其結構(LeCun et al., 1998;Krizhevsky等,2012;Simonyan & Zisserman, 2015;Szegedy等,2015;對于神經網絡的精度與其底層圖結構之間的關系,目前尚無系統(tǒng)的認識。這對于神經結構搜索來說尤為重要,如今,神經結構搜索遍尋所有可能的連通性模式(Ying等人,2019)。從這個角度來看,幾個開放的問題出現(xiàn)了:
建立這樣的關系在科學上和實踐上都很重要,因為它將直接影響到設計更高效、更精確的架構。它還將指導執(zhí)行神經網絡的新硬件架構的設計。理解神經網絡底層的圖結構也將推進深度學習科學。然而,建立網絡架構與其準確度之間的關系并非易事,因為還不清楚如何將神經網絡映射到圖(反之亦然)。最自然的選擇是使用計算圖表示,但它有很多局限:(1)缺乏普遍性:計算圖受到其允許的圖屬性的約束,例如這些圖必須是有向無環(huán)圖(DAG),在層級別是二部圖,在網絡級別是單入單出的(Xie et al., 2019)。這限制了為一般圖開發(fā)的豐富工具的使用。(2)與生物學/神經科學脫節(jié):生物神經網絡具有更豐富、模板化程度更低的結構(Fornito et al., 2013)。大腦網絡中存在著信息交換,而不僅僅是單向流動(Stringer et al., 2018)。這樣的生物或神經模型不能簡單地用有向無環(huán)圖來表示。 |
| Here we systematically study the relationship between the graph structure of a neural network and its predictive performance. We develop a new way of representing a neural network as a graph, which we call relational graph. Our?key insight is to focus on message exchange, rather than just on directed data flow. As a simple example, for a fixedwidth fully-connected layer, we can represent one input channel and one output channel together as a single node, and an edge in the relational graph represents the message exchange between the two nodes (Figure 1(a)). Under this formulation, using appropriate message exchange definition, we show that the relational graph can represent many types of neural network layers (a fully-connected layer, a convolutional layer, etc.), while getting rid of many constraints of computational graphs (such as directed, acyclic, bipartite, single-in-single-out). One neural network layer corresponds to one round of message exchange over a relational graph, and to obtain deep networks, we perform message exchange over the same graph for several rounds. Our new representation enables us to build neural networks that are richer and more diverse and analyze them using well-established tools of network science (Barabasi & Psfai ′ , 2016). We then design a graph generator named WS-flex that allows us to systematically explore the design space of neural networks (i.e., relation graphs). Based on the insights from neuroscience, we characterize neural networks by the clustering coefficient and average path length of their relational graphs (Figure 1(c)). Furthermore, our framework is flexible and general, as we can translate relational graphs into diverse neural architectures, including Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), ResNets, etc. with controlled computational budgets (Figure 1(d)). | 本文系統(tǒng)地研究了神經網絡的圖結構與其預測性能之間的關系。我們提出了一種將神經網絡表示為圖的新方法,即關系圖。我們的重點是關注消息交換,而不僅僅是定向數(shù)據(jù)流。作為一個簡單的例子,對于固定寬度的全連接層,我們可以將一個輸入通道和一個輸出通道一起表示為單個節(jié)點,關系圖中的一條邊表示兩個節(jié)點之間的消息交換(圖1(a))。在此公式下,利用適當?shù)南⒔粨Q定義,我們表明關系圖可以表示多種類型的神經網絡層(全連通層、卷積層等),同時擺脫了計算圖的許多約束(如有向、無環(huán)、二部圖、單入單出)。一個神經網絡層對應于在關系圖上進行一輪消息交換,為了獲得深度網絡,我們在同一圖上進行幾輪消息交換。我們的新表示使我們能夠構建更加豐富和多樣化的神經網絡,并使用成熟的網絡科學工具對其進行分析(Barabasi & Psfai, 2016)。 然后我們設計了一個名為WS-flex的圖形生成器,它允許我們系統(tǒng)地探索神經網絡的設計空間。關系圖)?;谏窠浛茖W的見解,我們通過聚類系數(shù)和關系圖的平均路徑長度來描述神經網絡(圖1(c))。此外,我們的框架是靈活和通用的,因為我們可以將關系圖轉換為不同的神經結構,包括多層感知器(MLPs)、卷積神經網絡(CNNs)、ResNets等,并控制計算預算(圖1(d))。 |
| Using standard image classification datasets CIFAR-10 and ImageNet, we conduct a systematic study on how the architecture of neural networks affects their predictive performance. We make several important empirical observations:
Our results have implications for designing neural network architectures, advancing the science of deep learning and improving our understanding of neural networks in general. | 使用標準圖像分類數(shù)據(jù)集CIFAR-10和ImageNet,我們對神經網絡的結構如何影響其預測性能進行了系統(tǒng)研究。我們做了幾個重要的經驗觀察:
我們的結果對設計神經網絡架構、推進深度學習科學以及加深對神經網絡的總體理解都具有重要意義。 |
| Figure 1: Overview of our approach. (a) A layer of a neural network can be viewed as a relational graph where we connect nodes that exchange messages. (b) More examples of neural network layers and relational graphs. (c) We explore the design space of relational graphs according to their graph measures, including average path length and clustering coefficient, where the complete graph corresponds to a fully-connected layer. (d) We translate these relational graphs to neural networks and study how their predictive performance depends on the graph measures of their corresponding relational graphs. 圖1:我們方法的概述。(a)神經網絡的一層可以看作是一個關系圖,在這里我們連接交換消息的節(jié)點。(b)神經網絡層和關系圖的更多例子。(c)我們根據(jù)關系圖的圖度量來探索關系圖的設計空間,包括平均路徑長度和聚類系數(shù),其中完全圖對應一個全連通層。我們將這些關系圖轉換為神經網絡,并研究它們的預測性能如何取決于對應關系圖的圖度量。 | |
2. Neural Networks as Relational Graphs
| To explore the graph structure of neural networks, we first introduce the concept of our relational graph representation and its instantiations. We demonstrate how our representation can capture diverse neural network architectures under a unified framework. Using the language of graph in the context of deep learning helps bring the two worlds together and establish a foundation for our study. | 為了探討神經網絡的圖結構,我們首先介紹關系圖表示的概念及其實例。我們將演示如何在統(tǒng)一框架下捕獲不同的神經網絡架構。在深度學習的背景下使用graph語言有助于將這兩個世界結合起來,為我們的研究奠定基礎。 |
2.1. Message Exchange over Graphs
| We start by revisiting the definition of a neural network from the graph perspective. We define a graph G = (V, E) by its node set V = {v_1, ..., v_n} and edge set E ⊆ {(v_i, v_j) | v_i, v_j ∈ V}. We assume each node v has a node feature scalar/vector x_v. |
Table 1: Diverse neural architectures expressed in the language of relational graphs. These architectures are usually implemented as complete relational graphs, while we systematically explore more graph structures for these architectures. 表1:用關系圖語言表示的各種神經結構。這些架構通常被實現(xiàn)為完整的關系圖,而我們系統(tǒng)地為這些架構探索更多的圖結構。
Figure 2: Example of translating a 4-node relational graph to a 4-layer 65-dim MLP. We highlight the message exchange for node x_1. Using different definitions of x_i, f_i(·), AGG(·) and R (those defined in Table 1), relational graphs can be translated to diverse neural architectures. 圖2:將4節(jié)點關系圖轉換為4層65-dim MLP的示例。我們重點展示節(jié)點 x_1 的消息交換。使用 x_i、f_i(·)、AGG(·) 和 R(表1中定義)的不同定義,關系圖可以轉換為不同的神經結構。
| We call graph G a relational graph, when it is associated with message exchanges between neurons. Specifically, a message exchange is defined by a message function, whose input is a node’s feature and output is a message, and an aggregation function, whose input is a set of messages and output is the updated node feature. At each round of message exchange, each node sends messages to its neighbors, and aggregates incoming messages from its neighbors. Each message is transformed at each edge through a message function f(·), then they are aggregated at each node via an aggregation function AGG(·). Suppose we conduct R rounds of message exchange, then the r-th round of message exchange for a node v can be described as Equation 1 provides a general definition for message exchange. In the remainder of this section, we discuss how this general message exchange definition can be instantiated as different neural architectures. We summarize the different instantiations in Table 1, and provide a concrete example of instantiating a 4-layer 65-dim MLP in Figure 2. | 我們稱圖G為關系圖,當它與神經元之間的信息交換有關時。具體來說,消息交換由消息函數(shù)和聚合函數(shù)定義,前者的輸入是節(jié)點的特性,輸出是消息,后者的輸入是一組消息,輸出是更新后的節(jié)點特性。在每一輪消息交換中,每個節(jié)點向它的鄰居發(fā)送消息,并聚合從它的鄰居傳入的消息。每個消息通過消息函數(shù)f(·)在每個邊進行轉換,然后通過聚合函數(shù)AGG(·)在每個節(jié)點進行聚合。假設我們進行了R輪的消息交換,那么節(jié)點v的第R輪消息交換可以描述為 公式1提供了消息交換的一般定義。在本節(jié)的其余部分中,我們將討論如何將這個通用消息交換定義實例化為不同的神經結構。我們在表1中總結了不同的實例,并在圖2中提供了實例化4層65-dim MLP的具體示例。 |
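Equation 1 itself did not survive in this excerpt; from the prose, the r-th round of message exchange updates node v as x_v^{(r+1)} = AGG^{(r)}( { f_v^{(r)}(x_u^{(r)}), ∀u ∈ N(v) } ), where N(v) is v's neighborhood. Below is a minimal Python sketch of one such round; the per-node linear message function and the mean aggregator are illustrative placeholders rather than the paper's exact choices.

```python
import numpy as np

def message_exchange_round(x, adj, W):
    """One round of message exchange over a relational graph.

    x:   (n, d) node features
    adj: (n, n) symmetric 0/1 adjacency; keep 1s on the diagonal so a node
         also receives its own message (as in the complete-graph MLP case)
    W:   (n, d, d) per-node weights, a placeholder message function f_v
    """
    x_new = np.zeros_like(x)
    for v in range(x.shape[0]):
        neighbors = np.nonzero(adj[v])[0]
        # message function f_v applied to each neighbor's feature
        msgs = [W[v] @ x[u] for u in neighbors]
        # aggregation function AGG: a simple mean (an assumption)
        if msgs:
            x_new[v] = np.mean(msgs, axis=0)
    return x_new
```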
2.2. Fixed-width MLPs as Relational Graphs
| A Multilayer Perceptron (MLP) consists of layers of computation units (neurons), where each neuron performs a weighted sum over scalar inputs and outputs, followed by some non-linearity. Suppose the r-th layer of an MLP takes x^{(r)} as input and x^{(r+1)} as output, then a neuron computes x_i^{(r+1)} = σ( Σ_j w_{ij}^{(r)} x_j^{(r)} ), where w_{ij}^{(r)} is a trainable weight and σ is the non-linearity. | 一個多層感知器(MLP)由多層計算單元(神經元)組成,每個神經元對標量輸入做加權求和,再經過非線性激活。假設MLP的第r層以 x^{(r)} 為輸入、x^{(r+1)} 為輸出,則神經元的計算為 x_i^{(r+1)} = σ( Σ_j w_{ij}^{(r)} x_j^{(r)} ),其中 w_{ij}^{(r)} 為可訓練權重,σ 為非線性激活。 |
| The above discussion reveals that a fixed-width MLP can be viewed as a complete relational graph with a special message exchange function. Therefore, a fixed-width MLP is a special case under a much more general model family, where the message function, aggregation function, and most importantly, the relation graph structure can vary. This insight allows us to generalize fixed-width MLPs from using complete relational graph to any general relational graph G. Based on the general definition of message exchange in Equation 1, we have: | 上面的討論表明,可以將固定寬度的MLP視為具有特殊消息交換功能的完整關系圖。因此,固定寬度的MLP是更為通用的模型系列中的一種特殊情況,其中消息函數(shù)、聚合函數(shù)以及最重要的關系圖結構可能會發(fā)生變化。 這使得我們可以將固定寬度MLPs從使用完全關系圖推廣到任何一般關系圖g。根據(jù)公式1中消息交換的一般定義,我們有: |
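The generalized update (Equation 3) is also missing from this excerpt; the prose implies that node i now sums only over its neighbors, x_i^{(r+1)} = σ( Σ_{j ∈ N(i)} w_{ij}^{(r)} x_j^{(r)} ). A hedged sketch of that generalization as an adjacency-masked weight matrix, assuming scalar node features and a ReLU non-linearity:

```python
import numpy as np

def relational_mlp_layer(x, W, adj):
    """Generalized MLP layer: node i only sums over its neighbors j in the
    relational graph, instead of over all nodes.

    x:   (n,) scalar feature per node
    W:   (n, n) weight matrix of a fixed-width MLP layer
    adj: (n, n) 0/1 adjacency with self-loops; an all-ones adj recovers the
         ordinary fully-connected layer (complete relational graph)
    """
    return np.maximum((W * adj) @ x, 0.0)  # ReLU(sum over neighbors of w_ij * x_j)
```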
2.3. General Neural Networks as Relational Graphs
| The graph viewpoint in Equation 3 lays the foundation of representing fixed-width MLPs as relational graphs. In this section, we discuss how we can further generalize relational graphs to general neural networks. Variable-width MLPs as relational graphs. An important design consideration for general neural networks is that layer width often varies throughout the network. For example, in CNNs, a common practice is to double the layer width (number of feature channels) after spatial down-sampling. | 式3中的圖觀點為將定寬MLP表示為關系圖奠定了基礎。在這一節(jié)中,我們將討論如何進一步將關系圖推廣到一般的神經網絡。 可變寬度MLP作為關系圖。對于一般的神經網絡,一個重要的設計考慮是層寬在整個網絡中經常變化。例如,在CNN中,常用的做法是在空間下采樣后將層寬(特征通道數(shù))增加一倍。 |
| Note that under this definition, the maximum number of nodes of a relational graph is bounded by the width of the narrowest layer in the corresponding neural network (since the feature dimension for each node must be at least 1). | 注意,在這個定義下,關系圖的最大節(jié)點數(shù)以對應神經網絡中最窄層的寬度為界(因為每個節(jié)點的特征維數(shù)必須至少為1)。 |
| Modern neural architectures as relational graphs. Finally, we generalize relational graphs to represent modern neural architectures with more sophisticated designs. For?example, to represent a ResNet (He et al., 2016), we keep the residual connections between layers unchanged. To represent neural networks with bottleneck transform (He et al., 2016), a relational graph alternatively applies message exchange with 3×3 and 1×1 convolution; similarly, in the efficient computing setup, the widely used separable convolution (Howard et al., 2017; Chollet, 2017) can be viewed as alternatively applying message exchange with 3×3 depth-wise convolution and 1×1 convolution. Overall, relational graphs provide a general representation for neural networks. With proper definitions of node features and message exchange, relational graphs can represent diverse neural architectures, as is summarized in Table 1. | 作為關系圖的現(xiàn)代神經結構。最后,我們推廣了關系圖,用更復雜的設計來表示現(xiàn)代神經結構。例如,為了表示ResNet (He et al., 2016),我們保持層之間的剩余連接不變。為了用瓶頸變換表示神經網絡(He et al., 2016),關系圖交替應用3×3和1×1卷積的消息交換;同樣,在高效的計算設置中,廣泛使用的可分離卷積(Howard et al., 2017;Chollet, 2017)可以看作是3×3深度卷積和1×1卷積交替應用消息交換。 總的來說,關系圖提供了神經網絡的一般表示。通過正確定義節(jié)點特性和消息交換,關系圖可以表示不同的神經結構,如表1所示。 |
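To make the variable-width case concrete, the sketch below partitions a layer's channels into nearly equal groups, one group per relational-graph node, so the same graph can be instantiated at different widths. The rounding rule is one natural choice, not necessarily the authors' exact scheme.

```python
def split_channels(width, num_nodes):
    """Partition `width` channels into `num_nodes` contiguous, nearly equal
    groups, so each relational-graph node owns one group of channels."""
    base, rem = divmod(width, num_nodes)
    return [base + 1 if i < rem else base for i in range(num_nodes)]

print(split_channels(65, 4))    # [17, 16, 16, 16], as in the Figure 2 example
print(split_channels(512, 64))  # 64 groups of 8 channels
```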
3. Exploring Relational Graphs
| In this section, we describe in detail how we design and explore the space of relational graphs defined in Section 2, in order to study the relationship between the graph structure of neural networks and their predictive performance. Three main components are needed to make progress: (1) graph measures that characterize graph structural properties, (2) graph generators that can generate diverse graphs, and (3) a way to control the computational budget, so that the differences in performance of different neural networks are due to their diverse relational graph structures. | 在本節(jié)中,我們將詳細描述如何設計和探索第二節(jié)中定義的關系圖空間,以研究神經網絡的圖結構與其預測性能之間的關系。要取得進展,需要三個主要組件:(1)能夠刻畫圖結構性質的圖度量;(2)能夠生成多樣化圖的圖生成器;(3)一種控制計算預算的方法,使不同神經網絡之間的性能差異僅來自它們各自不同的關系圖結構。 |
3.1. Selection of Graph Measures
| Given the complex nature of graph structure, graph measures are often used to characterize graphs. In this paper, we focus on one global graph measure, average path length, and one local graph measure, clustering coefficient. Notably, these two measures are widely used in network science (Watts & Strogatz, 1998) and neuroscience (Sporns, 2003; Bassett & Bullmore, 2006). Specifically, average path length measures the average shortest path distance between any pair of nodes; clustering coefficient measures the proportion of edges between the nodes within a given node’s neighborhood, divided by the number of edges that could possibly exist between them, averaged over all the nodes. There are other graph measures that can be used for analysis, which are included in the Appendix. | 由于圖結構的復雜性,圖測度通常被用來刻畫圖的特征。本文主要研究了一個全局圖測度,即平均路徑長度,和一個局部圖測度,即聚類系數(shù)。值得注意的是,這兩種方法在網絡科學(Watts & Strogatz, 1998)和神經科學(Sporns, 2003;巴西特和布爾莫爾,2006年)。具體來說,平均路徑長度度量任意對節(jié)點之間的平均最短路徑距離;聚類系數(shù)度量給定節(jié)點鄰域內節(jié)點之間的邊的比例,除以它們之間可能存在的邊的數(shù)量,平均到所有節(jié)點上。還有其他可以用于分析的圖表度量,包括在附錄中。 |
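Both measures are standard and can be computed directly with networkx; the snippet below is an illustrative check on a 64-node small-world graph, not code from the paper.

```python
import networkx as nx

def graph_measures(G):
    """Average shortest path length L and average clustering coefficient C
    of a connected, undirected graph."""
    return nx.average_shortest_path_length(G), nx.average_clustering(G)

# Example: a 64-node Watts-Strogatz small-world graph
G = nx.watts_strogatz_graph(n=64, k=8, p=0.1, seed=0)
L, C = graph_measures(G)
print(f"L = {L:.2f}, C = {C:.2f}")
```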
3.2. Design of Graph Generators
| Given selected graph measures, we aim to generate diverse graphs that can cover a large span of graph measures, using a graph generator. However, such a goal requires careful generator designs: classic graph generators can only generate a limited class of graphs, while recent learning-based graph generators are designed to imitate given exemplar graphs (Kipf & Welling, 2017; Li et al., 2018b; You et al., 2018a;b; 2019a). Limitations of existing graph generators. To illustrate the limitation of existing graph generators, we investigate the following classic graph generators: (1) Erdős–Rényi (ER) model that can sample graphs with given node and edge number uniformly at random (Erdős & Rényi, 1960); (2) Watts-Strogatz (WS) model that can generate graphs with small-world properties (Watts & Strogatz, 1998); (3) Barabási–Albert (BA) model that can generate scale-free graphs (Albert & Barabási, 2002); (4) Harary model that can generate graphs with maximum connectivity (Harary, 1962); (5) regular ring lattice graphs (ring graphs); (6) complete graphs. For all types of graph generators, we control the number of nodes to be 64, enumerate all possible discrete parameters and grid search over all continuous parameters of the graph generator. We generate 30 random graphs with different random seeds under each parameter setting. In total, we generate 486,000 WS graphs, 53,000 ER graphs, 8,000 BA graphs, 1,800 Harary graphs, 54 ring graphs and 1 complete graph (more details provided in the Appendix). In Figure 3, we can observe that graphs generated by those classic graph generators have a limited span in the space of average path length and clustering coefficient. | 給定選定的圖度量,我們的目標是使用圖生成器生成能夠覆蓋大范圍圖度量的多樣化的圖。然而,這樣的目標需要仔細的生成器設計:經典的圖生成器只能生成有限類別的圖,而最近的基于學習的圖生成器被設計用來模仿給定的范例圖(Kipf & Welling, 2017;Li et al., 2018b;You et al., 2018a;b;2019a)。 現(xiàn)有圖生成器的局限性。為了說明現(xiàn)有圖生成器的局限,我們考察以下經典圖生成器:(1)Erdős–Rényi(ER)模型,可以隨機均勻地采樣具有給定節(jié)點數(shù)和邊數(shù)的圖(Erdős & Rényi, 1960);(2)Watts-Strogatz(WS)模型,能夠生成具有小世界特性的圖(Watts & Strogatz, 1998);(3)Barabási–Albert(BA)模型,可以生成無標度圖(Albert & Barabási, 2002);(4)Harary模型,可以生成具有最大連通性的圖(Harary, 1962);(5)正則環(huán)格圖(環(huán)圖);(6)完全圖。對于所有類型的圖生成器,我們將節(jié)點數(shù)控制為64,枚舉所有可能的離散參數(shù),并對圖生成器的所有連續(xù)參數(shù)進行網格搜索。我們在每個參數(shù)設置下用不同的隨機種子生成30個隨機圖。總共生成486,000個WS圖、53,000個ER圖、8,000個BA圖、1,800個Harary圖、54個環(huán)圖和1個完全圖(詳情見附錄)。在圖3中,我們可以看到經典圖生成器生成的圖在平均路徑長度和聚類系數(shù)的空間中覆蓋范圍有限。 |
| WS-flex graph generator. Here we propose the WS-flex graph generator that can generate graphs with a wide coverage of graph measures; notably, WS-flex graphs almost encompass all the graphs generated by classic random generators mentioned above, as is shown in Figure 3. The WS-flex generator generalizes the WS model by relaxing the constraint that all the nodes have the same degree before random rewiring. Specifically, WS-flex generator is parametrized by node n, average degree k and rewiring probability p. The number of edges is determined as e = ⌊n·k/2⌋. Specifically, WS-flex generator first creates a ring graph where each node connects to ⌊e/n⌋ neighboring nodes; then the generator randomly picks e mod n nodes and connects each node to one closest neighboring node; finally, all the edges are randomly rewired with probability p. We use WS-flex generator to smoothly sample within the space of clustering coefficient and average path length, then sub-sample 3942 graphs for our experiments, as is shown in Figure 1(c). | WS-flex圖生成器。在這里,我們提出了WS-flex圖生成器,它可以生成在圖度量上覆蓋范圍很廣的圖;值得注意的是,WS-flex圖幾乎包含了上面提到的經典隨機生成器生成的所有圖,如圖3所示。WS-flex生成器是對WS模型的推廣,它放寬了隨機重連前所有節(jié)點具有相同度數(shù)的約束。具體來說,WS-flex生成器由節(jié)點數(shù)n、平均度k和重連概率p參數(shù)化,邊的數(shù)量確定為 e = ⌊n·k/2⌋。WS-flex生成器首先創(chuàng)建一個環(huán)圖,其中每個節(jié)點連接到⌊e/n⌋個相鄰節(jié)點;然后生成器隨機選取 e mod n 個節(jié)點,將每個節(jié)點連接到一個最近的相鄰節(jié)點;最后,以概率p隨機重連所有的邊。我們使用WS-flex生成器在聚類系數(shù)和平均路徑長度的空間內平滑采樣,然后再從中子采樣3942個圖用于我們的實驗,如圖1(c)所示。 |
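The following is one possible reading of the WS-flex procedure described above (ring lattice with ⌊e/n⌋ neighbors per node, e mod n extra nearest-neighbor edges, then rewiring every edge with probability p); the authors' reference implementation may differ in details such as how the extra edges and rewiring targets are drawn.

```python
import random
import networkx as nx

def ws_flex(n, k, p, seed=0):
    """Sketch of a WS-flex generator: n nodes, average degree k, rewiring
    probability p, with e = floor(n * k / 2) edges in total."""
    rng = random.Random(seed)
    e = int(n * k / 2)
    G = nx.empty_graph(n)
    # ring lattice: connect each node to its floor(e/n) nearest clockwise neighbors
    per_node = e // n
    for i in range(n):
        for d in range(1, per_node + 1):
            G.add_edge(i, (i + d) % n)
    # distribute the remaining e mod n edges, one extra nearest-neighbor edge each
    for i in rng.sample(range(n), e % n):
        G.add_edge(i, (i + per_node + 1) % n)
    # rewire each edge with probability p to a random non-adjacent target
    for u, v in list(G.edges()):
        if rng.random() < p:
            w = rng.randrange(n)
            if w != u and not G.has_edge(u, w):
                G.remove_edge(u, v)
                G.add_edge(u, w)
    return G
```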
3.3. Controlling Computational Budget
| To compare the neural networks translated by these diverse graphs, it is important to ensure that all networks have approximately the same complexity, so that the differences in performance are due to their relational graph structures. We use FLOPS (# of multiply-adds) as the metric. We first compute the FLOPS of our baseline network instantiations (i.e. complete relational graph), and use them as the reference complexity in each experiment. As described in Section 2.3, a relational graph structure can be instantiated as a neural network with variable width, by partitioning dimensions or channels into disjoint set of node features. Therefore, we can conveniently adjust the width of a neural network to match the reference complexity (within 0.5% of baseline FLOPS) without changing the relational graph structures. We provide more details in the Appendix. | 為了比較由這些不同的圖轉換而來的神經網絡,確保所有網絡具有近似相同的復雜度非常重要,這樣性能上的差異才是由它們的關系圖結構造成的。我們使用FLOPS(乘加運算次數(shù))作為度量標準。我們首先計算基線網絡實例(即完全關系圖)的FLOPS,并將其作為每個實驗的參考復雜度。如2.3節(jié)所述,通過將維度或通道劃分為互不相交的節(jié)點特征集合,可以將關系圖結構實例化為寬度可變的神經網絡。因此,我們可以方便地調整神經網絡的寬度,以匹配參考復雜度(在基線FLOPS的0.5%以內),而不改變關系圖結構。我們在附錄中提供了更多細節(jié)。 |
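A hedged sketch of the width-matching step: given a model builder and a FLOPS counter (both assumed helpers with hypothetical names), pick the width whose cost lands within 0.5% of the complete-graph reference.

```python
def match_flops(build_model, count_flops, target_flops, candidate_widths, tol=0.005):
    """Return the layer width whose FLOPS is closest to `target_flops` and
    within `tol` (0.5%) of it. `build_model(width)` and `count_flops(model)`
    are caller-supplied helpers (hypothetical names)."""
    best = None
    for w in candidate_widths:
        err = abs(count_flops(build_model(w)) - target_flops) / target_flops
        if err <= tol and (best is None or err < best[1]):
            best = (w, err)
    if best is None:
        raise ValueError("no candidate width is within 0.5% of the reference FLOPS")
    return best[0]
```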
4. Experimental Setup
| Considering the large number of candidate graphs (3942 in total) that we want to explore, we first investigate graph structure of MLPs on the CIFAR-10 dataset (Krizhevsky, 2009) which has 50K training images and 10K validation images. We then further study the larger and more complex task of ImageNet classification (Russakovsky et al., 2015), which consists of 1K image classes, 1.28M training images and 50K validation images. | 考慮到我們想要探索的候選圖數(shù)量很大(總共3942個),我們首先在CIFAR-10數(shù)據(jù)集(Krizhevsky, 2009)上研究MLPs的圖結構,該數(shù)據(jù)集有50K訓練圖像和10K驗證圖像。然后,我們進一步研究了更大、更復雜的ImageNet分類任務(Russakovsky et al., 2015),包括1K圖像類、1.28M訓練圖像和50K驗證圖像。 |
4.1. Base Architectures
| For CIFAR-10 experiments, we use a 5-layer MLP with 512 hidden units as the baseline architecture. The input of the MLP is a 3072-d flattened vector of the (32×32×3) image, the output is a 10-d prediction. Each MLP layer has a ReLU non-linearity and a BatchNorm layer (Ioffe & Szegedy, 2015). We train the model for 200 epochs with batch size 128, using cosine learning rate schedule (Loshchilov & Hutter, 2016) with an initial learning rate of 0.1 (annealed to 0, no restarting). We train all MLP models with 5 different random seeds and report the averaged results. | 在CIFAR-10實驗中,我們使用一個具有512個隱藏單元的5層MLP作為基線架構。MLP的輸入是(32×32×3)圖像展平后的3072維向量,輸出是10維的預測。每個MLP層帶有ReLU非線性和BatchNorm層(Ioffe & Szegedy, 2015)。我們以128的批量大小訓練200個epoch,使用余弦學習率調度(Loshchilov & Hutter, 2016),初始學習率為0.1(退火到0,不重啟)。我們用5個不同的隨機種子訓練所有MLP模型,并報告平均結果。 |
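A sketch of this CIFAR-10 baseline in PyTorch, following the hyperparameters stated above; details the text does not specify (e.g., SGD momentum, weight decay) are assumptions.

```python
import torch
import torch.nn as nn

def mlp5(width=512, num_classes=10):
    """5-layer MLP baseline: 3072-d input, Linear+BatchNorm+ReLU hidden blocks,
    10-d output (one plain reading of the description above)."""
    dims = [3072, width, width, width, width]
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(d_in, d_out), nn.BatchNorm1d(d_out), nn.ReLU()]
    layers.append(nn.Linear(width, num_classes))
    return nn.Sequential(*layers)

model = mlp5()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum assumed
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=200, eta_min=0.0)
# train for 200 epochs with batch size 128, calling sched.step() once per epoch
```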
| For ImageNet experiments, we use three ResNet-family architectures, including (1) ResNet-34, which only consists of basic blocks of 3×3 convolutions (He et al., 2016); (2) ResNet-34-sep, a variant where we replace all 3×3 dense convolutions in ResNet-34 with 3×3 separable convolutions (Chollet, 2017); (3) ResNet-50, which consists of bottleneck blocks (He et al., 2016) of 1×1, 3×3, 1×1 convolutions. Additionally, we use EfficientNet-B0 architecture (Tan & Le, 2019) that achieves good performance in the small computation regime. Finally, we use a simple 8- layer CNN with 3×3 convolutions. The model has 3 stages with [64, 128, 256] hidden units. Stride-2 convolutions are used for down-sampling. The stem and head layers are?the same as a ResNet. We train all the ImageNet models for 100 epochs using cosine learning rate schedule with initial learning rate of 0.1. Batch size is 256 for ResNetfamily models and 512 for EfficientNet-B0. We train all ImageNet models with 3 random seeds and report the averaged performance. All the baseline architectures have a complete relational graph structure. The reference computational complexity is 2.89e6 FLOPS for MLP, 3.66e9 FLOPS for ResNet-34, 0.55e9 FLOPS for ResNet-34-sep, 4.09e9 FLOPS for ResNet-50, 0.39e9 FLOPS for EffcientNet-B0, and 0.17e9 FLOPS for 8-layer CNN. Training an MLP model roughly takes 5 minutes on a NVIDIA Tesla V100 GPU, and training a ResNet model on ImageNet roughly takes a day on 8 Tesla V100 GPUs with data parallelism. We provide more details in Appendix. | 在ImageNet實驗中,我們使用了三種resnet系列架構,包括(1)ResNet-34,它只包含3×3卷積的基本塊(He et al., 2016);(2) ResNet-34-sep,將ResNet-34中的所有3×3稠密卷積替換為3×3可分離卷積(Chollet, 2017);(3) ResNet-50,由1×1,3×3,1×1卷積的瓶頸塊(He et al., 2016)組成。此外,我們使用了efficiency - net - b0架構(Tan & Le, 2019),在小計算環(huán)境下取得了良好的性能。最后,我們使用一個簡單的8層CNN, 3×3卷積。模型有3個階段,隱含單元為[64,128,256]。Stride-2卷積用于下采樣。莖和頭層與ResNet相同。我們使用初始學習率為0.1的余弦學習率計劃對所有ImageNet模型進行100個epoch的訓練。批大小是256的ResNetfamily模型和512的效率網- b0。我們用3個隨機種子訓練所有的ImageNet模型,并報告平均性能。所有的基線架構都有一個完整的關系圖結構。MLP的參考計算復雜度是2.89e6失敗,ResNet-34的3.66e9失敗,ResNet-34-sep的0.55e9失敗,ResNet-50的4.09e9失敗,EffcientNet-B0的0.39e9失敗,8層CNN的0.17e9失敗。在NVIDIA Tesla V100 GPU上訓練一個MLP模型大約需要5分鐘,而在ImageNet上訓練一個ResNet模型大約需要一天,在8個Tesla V100 GPU上使用數(shù)據(jù)并行。我們在附錄中提供更多細節(jié)。 |
| Figure 4: Key results. The computational budgets of all the experiments are rigorously controlled. Each visualized result is averaged over at least 3 random seeds. A complete graph with C = 1 and L = 1 (lower right corner) is regarded as the baseline. (a)(c) Graph measures vs. neural network performance. The best graphs significantly outperform the baseline complete graphs. (b)(d) Single graph measure vs. neural network performance. Relational graphs that fall within the given range are shown as grey points. The overall smooth function is indicated by the blue regression line. (e) Consistency across architectures. Correlations of the performance of the same set of 52 relational graphs when translated to different neural architectures are shown. (f) Summary of all the experiments. Best relational graphs (the red crosses) consistently outperform the baseline complete graphs across different settings. Moreover, we highlight the “sweet spots” (red rectangular regions), in which relational graphs are not statistically worse than the best relational graphs (bins with red crosses). Bin values of 5-layer MLP on CIFAR-10 are average over all the relational graphs whose C and L fall into the given bin 圖4:關鍵結果。所有實驗的計算預算都是嚴格控制的。每個可視化結果至少在3個隨機種子上取平均值。以C = 1, L = 1(右下角)的完整圖作為基線。(a)(c)圖形測量與神經網絡性能。最好的圖表明顯優(yōu)于基線完整的圖表。(b)(d)單圖測量與神經網絡性能。在給定范圍內的關系圖顯示為灰色點。整體平滑函數(shù)用藍色回歸線表示。(e)架構之間的一致性。當轉換到不同的神經結構時,同一組52個關系圖的性能的相關性被顯示出來。(f)總結所有實驗。在不同的設置中,最佳關系圖(紅色叉)的表現(xiàn)始終優(yōu)于基線完整圖。此外,我們強調了“甜蜜點”(紅色矩形區(qū)域),其中關系圖在統(tǒng)計上并不比最佳關系圖(紅色叉的箱子)差。CIFAR-10上5層MLP的Bin值是C和L屬于給定Bin的所有關系圖的平均值 | |
4.2. Exploration with Relational Graphs
| For all the architectures, we instantiate each sampled relational graph as a neural network, using the corresponding definitions outlined in Table 1. Specifically, we replace all the dense layers (linear layers, 3×3 and 1×1 convolution layers) with their relational graph counterparts. We leave?the input and output layer unchanged and keep all the other designs (such as down-sampling, skip-connections, etc.) intact. We then match the reference computational complexity for all the models, as discussed in Section 3.3. | 對于所有的架構,我們使用表1中列出的相應定義將每個抽樣的關系圖實例化為一個神經網絡。具體來說,我們將所有的稠密層(線性層、3×3和1×1卷積層)替換為對應的關系圖。我們保持輸入和輸出層不變,并保持所有其他設計(如下采樣、跳接等)不變。然后我們匹配所有模型的參考計算復雜度,如3.3節(jié)所討論的。 |
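One way to realize "replacing a dense layer with its relational-graph counterpart" is to mask the weight matrix block-wise according to the graph's adjacency; the sketch below covers the linear-layer case, assumes equal channel groups with self-loops on every node, and illustrates the idea rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class RelationalLinear(nn.Linear):
    """Linear layer masked by a relational graph: output channel group i may
    only receive messages from input channel group j if adj[i, j] == 1."""

    def __init__(self, in_features, out_features, adj):
        super().__init__(in_features, out_features)
        n = adj.shape[0]  # assumes in_features and out_features divide evenly by n
        mask = torch.kron(adj.float(), torch.ones(out_features // n, in_features // n))
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)
```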
| For CIFAR-10 MLP experiments, we study 3942 sampled relational graphs of 64 nodes as described in Section 3.2. For ImageNet experiments, due to high computational cost, we sub-sample 52 graphs uniformly from the 3942 graphs. Since EfficientNet-B0 is a small model with a layer that has only 16 channels, we can not reuse the 64-node graphs sampled for other setups. We re-sample 48 relational graphs with 16 nodes following the same procedure in Section 3. | 對于CIFAR-10 MLP實驗,我們研究了包含64個節(jié)點的3942個抽樣關系圖,如3.2節(jié)所述。在ImageNet實驗中,由于計算量大,我們從3942個圖中均勻地抽取52個圖。因為efficient - b0是一個只有16個通道的層的小模型,我們不能在其他設置中重用64節(jié)點圖。我們按照第3節(jié)中相同的步驟,對48個有16個節(jié)點的關系圖重新采樣。 |
5. Results
| In this section, we summarize the results of our experiments and discuss our key findings. We collect top-1 errors for all the sampled relational graphs on different tasks and architectures, and also record the graph measures (average path length L and clustering coefficient C) for each sampled graph. We present these results as heat maps of graph measures vs. predictive performance (Figure 4(a)(c)(f)). | 在本節(jié)中,我們將總結我們的實驗結果并討論我們的主要發(fā)現(xiàn)。我們收集了不同任務和架構上的所有抽樣關系圖的top-1錯誤,并記錄了每個抽樣圖的圖度量(平均路徑長度L和聚類系數(shù)C)。我們將這些結果作為圖表測量與預測性能的熱圖(圖4(a)(c)(f))。 |
5.1. A Sweet Spot for Top Neural Networks
| Overall, the heat maps of graph measures vs. predictive performance (Figure 4(f)) show that there exist graph structures that can outperform the complete graph (the pixel on bottom right) baselines. The best performing relational graph can outperform the complete graph baseline by 1.4% top-1 error on CIFAR-10, and 0.5% to 1.2% for models on ImageNet. Notably, we discover that top-performing graphs tend to cluster into a sweet spot in the space defined by C and L (red rectangles in Figure 4(f)). We follow these steps to identify a sweet spot: (1) we downsample and aggregate the 3942 graphs in Figure 4(a) into a coarse resolution of 52 bins, where each bin records the performance of graphs that fall into the bin; (2) we identify the bin with best average performance (red cross in Figure 4(f)); (3) we conduct one-tailed t-test over each bin against the best-performing bin, and record the bins that are not significantly worse than the best-performing bin (p-value 0.05 as threshold). The minimum area rectangle that covers these bins is visualized as a sweet spot. For 5-layer MLP on CIFAR-10, the sweet spot is C ∈ [0.10, 0.50], L ∈ [1.82, 2.75]. | 總的來說,圖度量與預測性能的熱圖(圖4(f))表明,存在能夠超過完全圖基線(右下角的像素)的圖結構。表現(xiàn)最好的關系圖在CIFAR-10上的top-1誤差比完全圖基線低1.4%,在ImageNet上的各模型低0.5%到1.2%。值得注意的是,我們發(fā)現(xiàn)表現(xiàn)最好的圖往往聚集在由C和L定義的空間中的一個最佳點(sweet spot)內(圖4(f)中的紅色矩形)。我們按照以下步驟來確定最佳點:(1)我們將圖4(a)中的3942個圖下采樣并匯總為52個粗分辨率的bin,每個bin記錄落入該bin的圖的性能;(2)我們確定平均性能最佳的bin(圖4(f)中的紅色叉號);(3)我們對每個bin與性能最佳的bin進行單尾t檢驗,并記錄沒有顯著差于最佳bin的那些bin(以p值0.05為閾值)。覆蓋這些bin的最小面積矩形被可視化為最佳點區(qū)域。對于CIFAR-10上的5層MLP,最佳點為C∈[0.10, 0.50],L∈[1.82, 2.75]。 |
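A sketch of the binning-plus-significance procedure, using a one-sided Welch t-test from scipy; how graphs are assigned to the 52 coarse bins is left to the caller, since the exact grid is not spelled out in this excerpt.

```python
import numpy as np
from scipy import stats

def sweet_spot_bins(bin_ids, errors, alpha=0.05):
    """Given a coarse bin id and a top-1 error per relational graph (numpy
    arrays), return the best bin and all bins that are not statistically
    worse than it (one-sided Welch t-test, p >= alpha)."""
    groups = {b: errors[bin_ids == b] for b in np.unique(bin_ids)}
    best = min(groups, key=lambda b: groups[b].mean())
    keep = []
    for b, errs in groups.items():
        # H1: this bin's mean error is greater than the best bin's
        _, p = stats.ttest_ind(errs, groups[best], equal_var=False, alternative="greater")
        if p >= alpha:
            keep.append(b)
    return best, keep
```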
5.2. Neural Network Performance as a Smooth Function over Graph Measures
| In Figure 4(f), we observe that neural network’s predictive performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph. Keeping one graph measure fixed in a small range (C ∈ [0.4, 0.6], L ∈ [2, 2.5]), we visualize network performances against the other measure (shown in Figure 4(b)(d)). We use second degree polynomial regression to visualize the overall trend. We observe that both clustering coefficient and average path length are indicative of neural network performance, demonstrating a smooth U-shape correlation | 在圖4(f)中,我們觀察到神經網絡的預測性能近似是其聚類系數(shù)和關系圖平均路徑長度的平滑函數(shù)。將一個圖度量固定在一個小范圍內(C∈[0.4,0.6],L∈[2,2.5]),我們將網絡性能與另一個度量進行可視化(如圖4(b)(d)所示)。我們使用二次多項式回歸來可視化總體趨勢。我們觀察到,聚類系數(shù)和平均路徑長度都是神經網絡性能的指標,呈平滑的u形相關 |
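The quadratic trend lines can be reproduced with a plain least-squares fit; the snippet below is illustrative.

```python
import numpy as np

def quadratic_trend(measure, error, num_points=100):
    """Second-degree polynomial regression of top-1 error against one graph
    measure (C or L), for visualizing the U-shaped trend."""
    coeffs = np.polyfit(measure, error, deg=2)
    xs = np.linspace(min(measure), max(measure), num_points)
    return xs, np.polyval(coeffs, xs)
```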
5.3. Consistency across Architectures
| Figure 5: Quickly identifying a sweet spot. Left: The correlation between sweet spots identified using fewer samples of relational graphs and using all 3942 graphs. Right: The correlation between sweet spots identified at the intermediate training epochs and the final epoch (100 epochs). | 圖5:快速確定最佳點。左圖:使用較少的關系圖樣本和使用全部3942幅圖識別出的甜點之間的相關性。右圖:中間訓練時期和最后訓練時期(100個時期)確定的甜蜜點之間的相關性。 |
| Given that relational graph defines a shared design space across various neural architectures, we observe that relational graphs with certain graph measures may consistently perform well regardless of how they are instantiated. Qualitative consistency. We visually observe in Figure 4(f) that the sweet spots are roughly consistent across different architectures. Specifically, if we take the union of the sweet spots across architectures, we have C ∈ [0.43, 0.50], L ∈ [1.82, 2.28] which is the consistent sweet spot across architectures. Moreover, the U-shape trends between graph measures and corresponding neural network performance, shown in Figure 4(b)(d), are also visually consistent. Quantitative consistency. To further quantify this consistency across tasks and architectures, we select the 52 bins?in the heat map in Figure 4(f), where the bin value indicates the average performance of relational graphs whose graph measures fall into the bin range. We plot the correlation of the 52 bin values across different pairs of tasks, shown in Figure 4(e). We observe that the performance of relational graphs with certain graph measures correlates across different tasks and architectures. For example, even though a ResNet-34 has much higher complexity than a 5-layer MLP, and ImageNet is a much more challenging dataset than CIFAR-10, a fixed set relational graphs would perform similarly in both settings, indicated by a Pearson correlation of 0.658 (p-value < 10?8 ). | 假設關系圖定義了跨各種神經結構的共享設計空間,我們觀察到,無論如何實例化,具有特定圖度量的關系圖都可以始終執(zhí)行得很好。 定性的一致性。在圖4(f)中,我們可以直觀地看到,不同架構之間的甜點點基本上是一致的。具體來說,如果我們取跨架構的甜蜜點的并集,我們有C∈[0.43,0.50],L∈[1.82,2.28],這是跨架構的一致的甜蜜點。此外,圖4(b)(d)所示的圖測度與對應的神經網絡性能之間的u形趨勢在視覺上也是一致的。 量化一致性。為了進一步量化跨任務和架構的一致性,我們在圖4(f)的熱圖中選擇了52個bin,其中bin值表示圖度量在bin范圍內的關系圖的平均性能。我們繪制52個bin值在不同任務對之間的相關性,如圖4(e)所示。我們觀察到,具有特定圖形的關系圖的性能度量了不同任務和架構之間的關聯(lián)。例如,盡管ResNet-34比5層MLP復雜得多,ImageNet是一個比ciremote -10更具挑戰(zhàn)性的數(shù)據(jù)集,一個固定的集合關系圖在兩種設置中表現(xiàn)相似,通過0.658的Pearson相關性表示(p值< 10?8)。 |
5.4. Quickly Identifying a Sweet Spot
| Training thousands of relational graphs until convergence might be computationally prohibitive. Therefore, we quantitatively show that a sweet spot can be identified with much less computational cost, e.g., by sampling fewer graphs and training for fewer epochs. How many graphs are needed? Using the 5-layer MLP on CIFAR-10 as an example, we consider the heat map over 52 bins in Figure 4(f) which is computed using 3942 graph samples. We investigate if a similar heat map can be produced with much fewer graph samples. Specifically, we sub-sample the graphs in each bin while making sure each bin has at least one graph. We then compute the correlation between the 52 bin values computed using all 3942 graphs and using sub-sampled fewer graphs, as is shown in Figure 5 (left). We can see that bin values computed using only 52 samples have a high 0.90 Pearson correlation with the bin values computed using full 3942 graph samples. This finding suggests that, in practice, much fewer graphs are needed to conduct a similar analysis. | 將數(shù)千個關系圖訓練到收斂,其計算開銷可能高得難以承受。因此,我們定量地表明,可以用小得多的計算成本來確定最佳點,例如通過采樣更少的圖和訓練更少的epoch。 需要多少個圖?以CIFAR-10上的5層MLP為例,我們考慮圖4(f)中52個bin上的熱圖,該熱圖使用3942個圖樣本計算得到。我們研究是否可以用少得多的圖樣本得到類似的熱圖。具體來說,我們對每個bin中的圖進行子采樣,同時確保每個bin至少有一個圖。然后,我們計算使用全部3942個圖與使用子采樣后更少的圖所得到的52個bin值之間的相關性,如圖5(左)所示。我們可以看到,僅使用52個樣本計算的bin值與使用全部3942個圖樣本計算的bin值之間有高達0.90的Pearson相關性。這一發(fā)現(xiàn)表明,在實踐中,進行類似的分析所需的圖要少得多。 |
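The subsampling check can be reproduced roughly as follows: recompute each bin's mean error from a small random subsample and correlate it with the full-sample bin values, as in Figure 5 (left). The sampling details are assumptions.

```python
import numpy as np
from scipy import stats

def subsample_correlation(bin_ids, errors, graphs_per_bin=1, seed=0):
    """Pearson correlation between bin values computed from all graphs and
    from a random subsample with at least one graph per bin."""
    rng = np.random.default_rng(seed)
    full, small = [], []
    for b in np.unique(bin_ids):
        errs = errors[bin_ids == b]
        full.append(errs.mean())
        pick = rng.choice(errs, size=min(graphs_per_bin, len(errs)), replace=False)
        small.append(pick.mean())
    r, _ = stats.pearsonr(full, small)
    return r
```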
5.5. Network Science and Neuroscience Connections
| Network science. The average path length that we measure characterizes how well information is exchanged across the network (Latora & Marchiori, 2001), which aligns with our definition of relational graph that consists of rounds of message exchange. Therefore, the U-shape correlation in Figure 4(b)(d) might indicate a trade-off between message exchange efficiency (Sengupta et al., 2013) and capability of learning distributed representations (Hinton, 1984). Neuroscience. The best-performing relational graph that we discover surprisingly resembles biological neural networks, as is shown in Table 2 and Figure 6. The similarities are in two-fold: (1) the graph measures (L and C) of top artificial neural networks are highly similar to biological neural networks; (2) with the relational graph representation, we can translate biological neural networks to 5-layer MLPs, and found that these networks also outperform the baseline complete graphs. While our findings are preliminary, our approach opens up new possibilities for interdisciplinary research in network science, neuroscience and deep learning. | 網絡科學。我們測量的平均路徑長度表征了信息在網絡中交換的良好程度(Latora & Marchiori, 2001),這與我們對包含輪消息交換的關系圖的定義一致。因此,圖4(b)(d)中的u形相關性可能表明消息交換效率(Sengupta et al., 2013)和學習分布式表示的能力(Hinton, 1984)之間的權衡。神經科學。我們發(fā)現(xiàn)的性能最好的關系圖與生物神經網絡驚人地相似,如表2和圖6所示。相似點有兩方面:
雖然我們的發(fā)現(xiàn)還處于初步階段,但我們的方法為網絡科學、神經科學和深度學習領域的跨學科研究開辟了新的可能性。 |
6. Related Work
| Neural network connectivity. The design of neural network connectivity patterns has been focused on computational graphs at different granularity: the macro structures, i.e. connectivity across layers (LeCun et al., 1998; Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2015; He et al., 2016; Huang et al., 2017; Tan & Le, 2019), and the micro structures, i.e. connectivity within a layer (LeCun et al., 1998; Xie et al., 2017; Zhang et al., 2018; Howard et al., 2017; Dao et al., 2019; Alizadeh et al., 2019). Our current exploration focuses on the latter, but the same methodology can be extended to the macro space. Deep Expander Networks (Prabhu et al., 2018) adopt expander graphs to generate bipartite structures. RandWire (Xie et al., 2019) generates macro structures using existing graph generators. However, the statistical relationships between graph structure measures and network predictive performances were not explored in those works. Another related work is Cross-channel Communication Networks (Yang et al., 2019) which aims to encourage the neuron communication through message passing, where only a complete graph structure is considered. | 神經網絡的連通性。神經網絡連通性模式的設計一直關注于不同粒度的計算圖:宏觀結構,即跨層連通性(LeCun et al., 1998;Krizhevsky等,2012;Simonyan & Zisserman, 2015;Szegedy等,2015;He et al., 2016;黃等,2017;Tan & Le, 2019)和微觀結構,即層內的連通性(LeCun et al., 1998;謝等,2017;張等,2018;Howard等人,2017;Dao等,2019年;Alizadeh等,2019)。我們目前的研究重點是后者,但同樣的方法可以擴展到宏觀空間。深度擴展器網絡(Prabhu et al., 2018)采用擴展器圖生成二部圖結構。RandWire (Xie等人,2019)使用現(xiàn)有的圖生成器生成宏結構。然而,圖結構測度與網絡預測性能之間的統(tǒng)計關系并沒有在這些工作中探索。另一項相關工作是跨通道通信網絡(Yang et al., 2019),旨在通過消息傳遞促進神經元通信,其中只考慮了完整的圖結構。 |
| Neural architecture search. Efforts on learning the connectivity patterns at micro (Ahmed & Torresani, 2018; Wortsman et al., 2019; Yang et al., 2018), or macro (Zoph & Le, 2017; Zoph et al., 2018) level mostly focus on improving learning/search algorithms (Liu et al., 2018; Pham et al., 2018; Real et al., 2019; Liu et al., 2019). NAS-Bench101 (Ying et al., 2019) defines a graph search space by enumerating DAGs with constrained sizes (≤ 7 nodes, cf. 64-node graphs in our work). Our work points to a new path: instead of exhaustively searching over all the possible connectivity patterns, certain graph generators and graph measures could define a smooth space where the search cost could be significantly reduced. | 神經結構搜索。在micro學習連接模式的努力(Ahmed & Torresani, 2018;Wortsman等人,2019年;Yang et al., 2018),或macro (Zoph & Le, 2017;Zoph等,2018)水平主要關注于改進學習/搜索算法(Liu等,2018;Pham等人,2018年;Real等人,2019年;Liu等,2019)。NAS-Bench101 (Ying et al., 2019)通過枚舉大小受限的DAGs(≤7個節(jié)點,我們的工作cf. 64節(jié)點圖)來定義圖搜索空間。我們的工作指向了一個新的路徑:不再對所有可能的連通性模式進行窮舉搜索,某些圖生成器和圖度量可以定義一個平滑的空間,在這個空間中搜索成本可以顯著降低。 |
7. Discussions
| Hierarchical graph structure of neural networks. As the first step in this direction, our work focuses on graph structures at the layer level. Neural networks are intrinsically hierarchical graphs (from connectivity of neurons to that of layers, blocks, and networks) which constitute a more complex design space than what is considered in this paper. Extensive exploration in that space will be computationally prohibitive, but we expect our methodology and findings to generalize. Efficient implementation. Our current implementation uses standard CUDA kernels thus relies on weight masking, which leads to worse wall-clock time performance compared with baseline complete graphs. However, the practical adoption of our discoveries is not far-fetched. Complementary to our work, there are ongoing efforts such as block-sparse kernels (Gray et al., 2017) and fast sparse ConvNets (Elsen et al., 2019) which could close the gap between theoretical FLOPS and real-world gains. Our work might also inform the design of new hardware architectures, e.g., biologically-inspired ones with spike patterns (Pei et al., 2019). | 神經網絡的層次圖結構。作為這個方向的第一步,我們的工作集中在層層次上的圖結構。神經網絡本質上是層次化的圖(從神經元的連通性到層、塊和網絡的連通性),它構成了比本文所考慮的更復雜的設計空間。在那個空間進行廣泛的探索在計算上開銷過高,但我們預期我們的方法和發(fā)現(xiàn)可以推廣。 高效的實現(xiàn)。我們目前的實現(xiàn)使用標準CUDA內核,因此依賴于權重掩碼(weight masking),這導致實際運行時間比基線完全圖更差。然而,實際應用我們的發(fā)現(xiàn)并不牽強。作為我們工作的補充,還有一些正在進行的工作,如塊稀疏核(Gray et al., 2017)和快速稀疏卷積網絡(Elsen et al., 2019),它們可以縮小理論FLOPS和實際收益之間的差距。我們的工作也可能為新的硬件架構的設計提供啟發(fā),例如帶有脈沖(spike)模式的受生物學啟發(fā)的架構(Pei et al., 2019)。 |
| Prior vs. Learning. We currently utilize the relational graph representation as a structural prior, i.e., we hard-wire the graph structure on neural networks throughout training. It has been shown that deep ReLU neural networks can automatically learn sparse representations (Glorot et al., 2011). A further question arises: without imposing graph priors, does any graph structure emerge from training a (fully-connected) neural network? Figure 7: Prior vs. Learning. Results for 5-layer MLPs on CIFAR-10. We highlight the best-performing graph when used as a structural prior. Additionally, we train a fullyconnected MLP, and visualize the learned weights as a relational graph (different points are graphs under different thresholds). The learned graph structure moves towards the “sweet spot” after training but does not close the gap. | 之前與學習。我們目前利用關系圖表示作為結構先驗,即。,在整個訓練過程中,我們將圖形結構硬連接到神經網絡上。已有研究表明,深度ReLU神經網絡可以自動學習稀疏表示(Glorot et al., 2011)。一個進一步的問題出現(xiàn)了:在不強加圖先驗的情況下,訓練一個(完全連接的)神經網絡會產生任何圖結構嗎? ? ? ? ? ? ? ? ? 圖7:先驗與學習。5層MLPs在CIFAR-10上的結果。當使用結構先驗時,我們會突出顯示表現(xiàn)最佳的圖。此外,我們訓練一個完全連接的MLP,并將學習到的權重可視化為一個關系圖(不同的點是不同閾值下的圖)。學習后的圖結構在訓練后向“最佳點”移動,但并沒有縮小差距。 |
| As a preliminary exploration, we “reverse-engineer” a trained neural network and study the emerged relational graph structure. Specifically, we train a fully-connected 5-layer MLP on CIFAR-10 (the same setup as in previous experiments). We then try to infer the underlying relational graph structure of the network via the following steps: (1) to get nodes in a relational graph, we stack the weights from all the hidden layers and group them into 64 nodes, following the procedure described in Section 2.2; (2) to get undirected edges, the weights are summed by their transposes; (3) we compute the Frobenius norm of the weights as the edge value; (4) we get a sparse graph structure by binarizing edge values with a certain threshold. We show the extracted graphs under different thresholds in Figure 7. As expected, the extracted graphs at initialization follow the patterns of E-R graphs (Figure 3(left)), since weight matrices are randomly i.i.d. initialized. Interestingly, after training to convergence, the extracted graphs are no longer E-R random graphs and move towards the sweet spot region we found in Section 5. Note that there is still a gap?between these learned graphs and the best-performing graph imposed as a structural prior, which might explain why a fully-connected MLP has inferior performance. In our experiments, we also find that there are a few special cases where learning the graph structure can be superior (i.e., when the task is simple and the network capacity is abundant). We provide more discussions in the Appendix. Overall, these results further demonstrate that studying the graph structure of a neural network is crucial for understanding its predictive performance. | ? 作為初步的探索,我們“逆向工程”一個訓練過的神經網絡和研究出現(xiàn)的關系圖結構。具體來說,我們在CIFAR-10上訓練了一個完全連接的5層MLP(與之前的實驗相同的設置)。然后嘗試通過以下步驟來推斷網絡的底層關系圖結構:(1)為了得到關系圖中的節(jié)點,我們將所有隱含層的權值進行疊加,并按照2.2節(jié)的步驟將其分組為64個節(jié)點;(2)對權值的轉置求和,得到無向邊;(3)計算權值的Frobenius范數(shù)作為邊緣值;(4)通過對具有一定閾值的邊值進行二值化,得到一種稀疏圖結構。 ? 我們在圖7中顯示了在不同閾值下提取的圖。正如預期的那樣,初始化時提取的圖遵循E-R圖的模式(圖3(左)),因為權重矩陣是隨機初始化的。有趣的是,經過收斂訓練后,提取的圖不再是E-R隨機圖,而是朝著我們在第5節(jié)中發(fā)現(xiàn)的最佳點區(qū)域移動。請注意,在這些學習圖和作為結構先驗的最佳性能圖之間仍然存在差距,這可能解釋了為什么完全連接的MLP性能較差。 在我們的實驗中,我們也發(fā)現(xiàn)在一些特殊的情況下學習圖結構是更好的。當任務簡單且網絡容量充足時)。我們在附錄中提供了更多的討論??偟膩碚f,這些結果進一步證明了研究神經網絡的圖結構對于理解其預測性能是至關重要的。 ? ? |
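A sketch of the four "reverse-engineering" steps described above, for a fixed-width MLP whose hidden weight matrices are square; the block grouping and the relative threshold are assumptions made for illustration.

```python
import numpy as np

def extract_relational_graph(hidden_weights, num_nodes=64, rel_threshold=0.1):
    """(1) group each square hidden weight matrix into num_nodes x num_nodes
    blocks, (2) symmetrize by adding the transpose, (3) use the Frobenius norm
    of each block, summed over layers, as the edge value, (4) binarize with a
    threshold (here a fraction of the largest edge value, an assumption)."""
    n = num_nodes
    edge = np.zeros((n, n))
    for W in hidden_weights:          # e.g. the 512x512 hidden layers of the MLP
        S = W + W.T                   # undirected edges
        g = S.shape[0] // n           # channels per node
        for i in range(n):
            for j in range(n):
                block = S[i * g:(i + 1) * g, j * g:(j + 1) * g]
                edge[i, j] += np.linalg.norm(block)   # Frobenius norm
    adj = (edge >= rel_threshold * edge.max()).astype(int)
    np.fill_diagonal(adj, 0)
    return adj
```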
| Unified view of Graph Neural Networks (GNNs) and general neural architectures. The way we define neural networks as a message exchange function over graphs is partly inspired by GNNs (Kipf & Welling, 2017; Hamilton et al., 2017; Velickovi ˇ c et al. ′ , 2018). Under the relational graph representation, we point out that GNNs are a special class of general neural architectures where: (1) graph structure is regarded as the input instead of part of the neural architecture; consequently, (2) message functions are shared across all the edges to respect the invariance properties of the input graph. Concretely, recall how we define general neural networks as relational graphs: Therefore, our work offers a unified view of GNNs and general neural architecture design, which we hope can bridge the two communities and inspire new innovations. On one hand, successful techniques in general neural architectures can be naturally introduced to the design of GNNs, such as separable convolution (Howard et al., 2017), group normalization (Wu & He, 2018) and Squeeze-and-Excitation block (Hu et al., 2018); on the other hand, novel GNN architectures (You et al., 2019b; Chen et al., 2019) beyond the commonly used paradigm (i.e., Equation 6) may inspire more advanced neural architecture designs. | 圖形神經網絡(GNNs)和一般神經結構的統(tǒng)一視圖。我們將神經網絡定義為圖形上的信息交換功能的方式,部分受到了gnn的啟發(fā)(Kipf & Welling, 2017;Hamilton等,2017;Velickoviˇc et al .′, 2018)。在關系圖表示下,我們指出gnn是一類特殊的一般神經結構,其中:(1)將圖結構作為輸入,而不是神經結構的一部分;因此,(2)消息函數(shù)在所有邊之間共享,以尊重輸入圖的不變性。具體地說,回想一下我們是如何將一般的神經網絡定義為關系圖的: 因此,我們的工作提供了一個關于gnn和一般神經結構設計的統(tǒng)一觀點,我們希望能夠搭建這兩個社區(qū)的橋梁,激發(fā)新的創(chuàng)新。一方面,一般神經結構中的成功技術可以自然地引入到gnn的設計中,如可分離卷積(Howard et al., 2017)、群歸一化(Wu & He, 2018)和擠壓-激勵塊(Hu et al., 2018);另一方面,新的GNN架構(You et al., 2019b;陳等人,2019)超越了常用的范式(即,(6)可以啟發(fā)更先進的神經結構設計。 |
8. Conclusion
| In sum, we propose a new perspective of using relational graph representation for analyzing and understanding neural networks. Our work suggests a new transition from studying conventional computation architecture to studying graph structure of neural networks. We show that well-established graph techniques and methodologies offered in other science disciplines (network science, neuroscience, etc.) could contribute to understanding and designing deep neural networks. We believe this could be a fruitful avenue of future research that tackles more complex situations. | 最后,我們提出了一種利用關系圖表示來分析和理解神經網絡的新觀點。我們的工作提出了一個新的過渡,從研究傳統(tǒng)的計算結構到研究神經網絡的圖結構。我們表明,在其他科學學科(網絡科學、神經科學等)中提供的成熟的圖形技術和方法可以有助于理解和設計深度神經網絡。我們相信這將是未來解決更復雜情況的研究的一個富有成效的途徑。 |
Acknowledgments
| This work is done during Jiaxuan You’s internship at Facebook AI Research. Jure Leskovec is a Chan Zuckerberg Biohub investigator. The authors thank Alexander Kirillov, Ross Girshick, Jonathan Gomes Selman, Pan Li for their helpful discussions. | 這項工作是在You Jiaxuan在Facebook AI Research實習期間完成的。Jure Leskovec是陳-扎克伯格生物中心的調查員。作者感謝Alexander Kirillov, Ross Girshick, Jonathan Gomes Selman和Pan Li的討論。 |