Published: 2022-08-18 18:42
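1.1 Basic Concepts about Graphs
1.2 Graphs, Nodes, and Edges
1.3 Node and Edge Features
1.4 Creating Graphs from External Sources
1.5 Heterogeneous Graphs
1.6 Using DGLGraph on a GPU
2.1 Built-in Functions and Message Passing APIs
2.2 Writing Efficient Message Passing Code
2.3 Message Passing on Part of a Graph
2.4 Using Edge Weights in Message Passing
2.5 Message Passing on Heterogeneous Graphs
3.1 Constructor of a DGL NN Module
3.2 Writing the forward Function of a DGL NN Module
3.3 The GraphConv Module on Heterogeneous Graphs
4.1 The DGLDataset Class
4.2 Downloading Raw Data (Optional)
4.3 Processing Data
4.4 Saving and Loading Data
4.5 Loading OGB Datasets with the ogb Package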
5.1 Node Classification/Regression
One of the most popular and widely adopted tasks for graph neural networks is node classification, where each node in the training/validation/test set is assigned a ground truth category from a set of predefined categories. Node regression is similar, where each node in the training/validation/test set is assigned a ground truth number.
To classify nodes, a graph neural network performs message passing, as discussed in Chapter 2: Message Passing, to use not only each node's own features but also the features of its neighboring nodes and edges. Message passing can be repeated for multiple rounds to incorporate information from a larger neighborhood.
DGL provides a few built-in graph convolution modules that can perform one round of message passing. In this guide, we choose dgl.nn.pytorch.SAGEConv (also available in MXNet and TensorFlow), the graph convolution module for GraphSAGE.
Deep learning models on graphs usually need a multi-layer graph neural network that performs multiple rounds of message passing. This can be achieved by stacking graph convolution modules, so that the output of one module becomes the input of the next.
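To make the effect of stacking concrete, here is a minimal library-free sketch. It is not DGL's SAGEConv: the function names, the scalar weights `w_self` and `w_neigh`, and the toy path graph are all assumptions made for illustration. It shows that one round only reaches direct neighbors, while two stacked rounds reach two-hop neighbors.

```python
def mean_aggregate_layer(neighbors, feats, w_self=1.0, w_neigh=1.0):
    """One simplified GraphSAGE-style round (hypothetical, scalar weights).

    neighbors: dict mapping node -> list of neighbor nodes
    feats:     dict mapping node -> list of floats
    """
    out = {}
    for v, h_v in feats.items():
        nbrs = neighbors.get(v, [])
        dim = len(h_v)
        if nbrs:
            # Mean of the neighbors' features, one dimension at a time.
            h_n = [sum(feats[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            h_n = [0.0] * dim
        # Combine the node's own feature with the neighbor mean, then apply ReLU.
        out[v] = [max(0.0, w_self * a + w_neigh * b) for a, b in zip(h_v, h_n)]
    return out


def stacked_layers(neighbors, feats, num_layers=2):
    """Apply the layer repeatedly; each round widens the receptive field by one hop."""
    h = feats
    for _ in range(num_layers):
        h = mean_aggregate_layer(neighbors, h)
    return h


# Path graph 0 - 1 - 2; only node 0 carries a nonzero feature.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: [1.0], 1: [0.0], 2: [0.0]}

one_round = stacked_layers(neighbors, feats, num_layers=1)
two_rounds = stacked_layers(neighbors, feats, num_layers=2)
```

After one round node 2 still sees nothing of node 0, while after two rounds its representation is nonzero. A real module would use learned weight matrices instead of the fixed scalars assumed here.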
Note that such a model can be used not only for node classification, but also for obtaining hidden node representations for other downstream tasks such as 5.2 Edge Classification/Regression, 5.3 Link Prediction, or 5.4 Graph Classification.
For a complete list of built-in graph convolution modules, please refer to dgl.nn.
For more details on how DGL neural network modules work and how to write a custom neural network module with message passing, please refer to the example in Chapter 3: Building GNN Modules.
Training on the full graph simply involves a forward propagation of the model described above, then computing the loss by comparing the predictions against the ground truth labels on the training nodes.
This section uses a DGL built-in dataset, dgl.data.CiteseerGraphDataset, to show a training loop. The node features and labels are stored on its graph instance, and the training/validation/test split is also stored on the graph as boolean masks. This is similar to what you have seen in Chapter 4: Graph Data Pipeline.
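The role of the boolean masks can be sketched without any library. The snippet below is an illustration under assumptions, not the guide's training loop: the function names and the tiny hand-written logits are hypothetical. It shows the loss being computed only on nodes selected by the training mask, and accuracy only on nodes selected by the validation mask.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_entropy(all_logits, labels, mask):
    """Mean cross-entropy over the nodes whose mask entry is True."""
    losses = [
        -math.log(softmax(lg)[y])
        for lg, y, m in zip(all_logits, labels, mask)
        if m
    ]
    return sum(losses) / len(losses)


def masked_accuracy(all_logits, labels, mask):
    """Fraction of correctly classified nodes among those selected by the mask."""
    hits = [
        lg.index(max(lg)) == y
        for lg, y, m in zip(all_logits, labels, mask)
        if m
    ]
    return sum(hits) / len(hits)


# Model outputs for four nodes and two classes (made-up numbers).
logits = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0], [1.0, 0.0]]
labels = [0, 1, 1, 1]
train_mask = [True, True, False, False]
val_mask = [False, False, True, True]

train_loss = masked_cross_entropy(logits, labels, train_mask)
val_acc = masked_accuracy(logits, labels, val_mask)
```

In full-graph training the forward pass still runs over every node; the masks only decide which predictions enter the loss or the evaluation metric.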
If your graph is heterogeneous, you may want to gather messages from neighbors along all edge types. You can use the module dgl.nn.pytorch.HeteroGraphConv (also available in MXNet and TensorFlow) to perform message passing on all edge types, combining a different graph convolution module for each edge type.
A heterogeneous graph convolution module of this kind first performs a separate graph convolution on each edge type, then sums the message aggregations from each edge type as the final result for all node types.
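The per-edge-type computation and the final sum can be sketched in plain Python. This is not HeteroGraphConv itself: the helper names, the node and edge type names, and the use of one parameter-free mean aggregation for every edge type are assumptions made for illustration (a real module would hold separate learned weights per edge type).

```python
def mean_conv(edges, src_feats, num_dst):
    """Mean of source features over the incoming edges of one relation.

    edges:     list of (src_id, dst_id) pairs
    src_feats: list of feature lists for the source node type
    """
    dim = len(src_feats[0])
    sums = [[0.0] * dim for _ in range(num_dst)]
    counts = [0] * num_dst
    for u, v in edges:
        counts[v] += 1
        for i in range(dim):
            sums[v][i] += src_feats[u][i]
    # Destination nodes without incoming edges keep a zero vector.
    return [[(s / c if c else 0.0) for s in row] for row, c in zip(sums, counts)]


def hetero_conv_sum(relations, feats):
    """Run mean_conv per relation, then sum the results per destination node type.

    relations: {(src_type, edge_type, dst_type): edge list}
    feats:     {node_type: list of feature lists}
    """
    out = {}
    for (src_type, _edge_type, dst_type), edges in relations.items():
        result = mean_conv(edges, feats[src_type], len(feats[dst_type]))
        if dst_type not in out:
            out[dst_type] = result
        else:
            out[dst_type] = [
                [a + b for a, b in zip(r1, r2)]
                for r1, r2 in zip(out[dst_type], result)
            ]
    return out


# Hypothetical graph with two node types and three edge types.
feats = {"user": [[1.0], [3.0]], "item": [[10.0]]}
relations = {
    ("user", "follows", "user"): [(0, 1)],
    ("item", "bought-by", "user"): [(0, 0), (0, 1)],
    ("user", "buys", "item"): [(0, 0), (1, 0)],
}
h = hetero_conv_sum(relations, feats)
```

Here "user" nodes receive messages through two edge types, so their outputs are the sum of two aggregations, while "item" nodes receive messages through only one.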
5.2 Edge Classification/Regression
Sometimes you wish to predict the attributes on the edges of the graph, or even whether an edge exists or not between two given nodes. In that case, you would like to have an edge classification/regression model.
As a demonstration, this section uses a randomly generated graph for edge prediction.
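One way to build such a toy graph with only the standard library is sketched below. The function name, the dictionary layout, and the sizes (100 nodes, 500 random pairs stored in both directions, 10-dimensional features, roughly 60% of edges marked for training) are all assumptions chosen for illustration; in DGL the same data would live on a graph object.

```python
import random


def make_random_edge_graph(num_nodes=100, num_pairs=500, feat_dim=10,
                           train_frac=0.6, seed=0):
    """Random graph with node features, edge features, edge labels, and a train mask."""
    rng = random.Random(seed)
    src = [rng.randrange(num_nodes) for _ in range(num_pairs)]
    dst = [rng.randrange(num_nodes) for _ in range(num_pairs)]
    # Store every sampled pair in both directions.
    edges = list(zip(src + dst, dst + src))
    num_edges = len(edges)
    return {
        "edges": edges,
        "node_feat": [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
                      for _ in range(num_nodes)],
        "edge_feat": [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
                      for _ in range(num_edges)],
        # Real-valued labels, as one would use for edge regression.
        "edge_label": [rng.gauss(0.0, 1.0) for _ in range(num_edges)],
        # Boolean mask selecting the training edges.
        "train_mask": [rng.random() < train_frac for _ in range(num_edges)],
    }


g = make_random_edge_graph()
```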
From the previous section you have learned how to do node classification with a multilayer GNN. The same technique can be applied for computing a hidden representation of any node. The prediction on edges can then be derived from the representation of their incident nodes.
The most common case of computing the prediction on an edge is to express it as a parameterized function of the representation of its incident nodes, and optionally the features on the edge itself.
Assuming that you compute the node representation with the model from the previous section, you only need to write another component that computes the edge prediction with the apply_edges() method.
For instance, to compute a score for each edge for edge regression, you can take the dot product of the incident node representations on each edge.
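The dot-product score can be written out in a few lines of plain Python. This is a sketch of the computation only, with a hypothetical function name and made-up representations; in DGL the text above points to apply_edges() for the same job.

```python
def dot_product_scores(edges, h):
    """Score each edge (u, v) as the dot product of h[u] and h[v]."""
    return [sum(a * b for a, b in zip(h[u], h[v])) for u, v in edges]


# Node representations, assumed to come from some GNN.
h = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [0.0, -1.0]}
edges = [(0, 1), (1, 2), (2, 0)]

scores = dot_product_scores(edges, h)
```

The result holds one scalar per edge, in the same order as the edge list.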
One can also write a prediction function that predicts a vector for each edge with an MLP. Such a vector can be used in further downstream tasks, e.g. as the logits of a categorical distribution.
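A minimal sketch of such a predictor, assuming the MLP takes the concatenation of the two incident node representations as input. The function names, the two-layer shape, and the hand-picked weights are illustrative assumptions; a real model would learn the weights.

```python
def linear(x, weight, bias):
    """Affine map; weight is a list of rows with shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def mlp_edge_predictor(edges, h, w1, b1, w2, b2):
    """Two-layer MLP applied to concat(h[u], h[v]) for every edge (u, v)."""
    outputs = []
    for u, v in edges:
        x = h[u] + h[v]  # list concatenation
        hidden = [max(0.0, z) for z in linear(x, w1, b1)]  # ReLU
        outputs.append(linear(hidden, w2, b2))
    return outputs


h = {0: [1.0, 2.0], 1: [3.0, 4.0]}
edges = [(0, 1), (1, 0)]

# Fixed weights chosen by hand: 4 inputs -> 2 hidden units -> 3 output classes.
w1 = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b2 = [0.0, 0.0, 0.5]

edge_logits = mlp_edge_predictor(edges, h, w1, b1, w2, b2)
```

Unlike the dot product, this predictor is not symmetric: the edges (0, 1) and (1, 0) receive different output vectors because the concatenation order differs.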
Given the node representation computation model and an edge predictor model, we can easily write a full-graph training loop where we compute the prediction on all edges.
A typical example takes SAGE from the previous section as the node representation computation model and DotProductPredictor as the edge predictor model.
This setup assumes that the training/validation/test edge sets are identified by boolean masks on edges. It also does not include early stopping or model saving.
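The shape of such a loop can be sketched without a deep learning framework. The example below is heavily simplified and rests on stated assumptions: node representations are treated as fixed (so only their per-edge dot products appear), the predictor has a single hypothetical scalar weight `w`, and the gradient of the mean squared error is written by hand. What it does show is that only edges selected by the training mask contribute to the update.

```python
def train_edge_regressor(dots, labels, train_mask, lr=0.01, epochs=200):
    """Fit prediction = w * dot on the masked training edges with gradient descent."""
    w = 0.0
    train = [(d, y) for d, y, m in zip(dots, labels, train_mask) if m]
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (w * d - y) * d for d, y in train) / len(train)
        w -= lr * grad
    return w


# Per-edge dot products of incident node representations (made-up values).
dots = [1.0, 2.0, 3.0, 4.0]
# Labels follow y = 2 * dot on the training edges; the last edge is held out
# and deliberately inconsistent, so it would distort w if it leaked into training.
labels = [2.0, 4.0, 6.0, 100.0]
train_mask = [True, True, True, False]

w = train_edge_regressor(dots, labels, train_mask)
```

In a real model the node representations would be recomputed by the GNN in every epoch and updated through backpropagation together with the predictor.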
Edge classification on heterogeneous graphs is not very different from that on homogeneous graphs. If you wish to perform edge classification on one edge type, you only need to compute the node representations for all node types, and predict on that edge type with the apply_edges() method.
For example, to make DotProductPredictor work on one edge type of a heterogeneous graph, you only need to specify the edge type in the apply_edges() method.
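The idea of restricting prediction to one edge type can be sketched as follows. This is plain Python rather than DGL code, and the function name, the dictionary layout, and the node and edge type names are hypothetical. Representations are needed for every node type involved, but scores are produced only for the selected edge type.

```python
def score_one_edge_type(edges_by_type, h, canonical_etype):
    """Dot-product scores for the edges of a single (src_type, edge_type, dst_type)."""
    src_type, _edge_type, dst_type = canonical_etype
    return [
        sum(a * b for a, b in zip(h[src_type][u], h[dst_type][v]))
        for u, v in edges_by_type[canonical_etype]
    ]


# Representations for all node types, assumed to come from a heterogeneous GNN.
h = {"user": [[1.0, 0.0], [0.0, 2.0]], "item": [[3.0, 4.0]]}
edges_by_type = {
    ("user", "clicks", "item"): [(0, 0), (1, 0)],
    ("user", "follows", "user"): [(0, 1)],
}

click_scores = score_one_edge_type(edges_by_type, h, ("user", "clicks", "item"))
```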
5.3 Link Prediction
In some other settings you may want to predict whether an edge exists between two given nodes. Such a model is called a link prediction model.
A GNN-based link prediction model represents the likelihood of connectivity between two nodes $u$ and $v$ as a function of $h_u^{(L)}$ and $h_v^{(L)}$, their node representations computed from the multi-layer GNN.
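Written as a formula, with $y_{u,v}$ denoting the score of the pair and $\phi$ the scoring function (both symbols are notation assumed here for illustration):

```latex
y_{u,v} = \phi\left(h_u^{(L)},\, h_v^{(L)}\right)
```

The dot product of the two representations is one simple choice of $\phi$; an MLP over their concatenation is another.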
Training a link prediction model involves comparing the scores between nodes connected by an edge against the scores between an arbitrary pair of nodes. For example, given an edge connecting $u$ and $v$, we encourage the score between node $u$ and $v$ to be higher than the score between node $u$ and a sampled node $v'$ from an arbitrary noise distribution $v' \sim P_n(v)$. This methodology is called negative sampling.
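A minimal sketch of negative sampling, assuming the noise distribution is uniform over all nodes. The function name and the parameter `k` (negatives per positive edge) are illustrative choices, not part of the text above.

```python
import random


def sample_negative_edges(edges, num_nodes, k, seed=0):
    """For each positive edge (u, v), draw k pairs (u, v') with v' uniform over nodes."""
    rng = random.Random(seed)
    negatives = []
    for u, _v in edges:
        for _ in range(k):
            negatives.append((u, rng.randrange(num_nodes)))
    return negatives


pos_edges = [(0, 1), (1, 2), (2, 3)]
neg_edges = sample_negative_edges(pos_edges, num_nodes=10, k=2)
```

A sampled pair may occasionally coincide with a real edge. This sketch does not filter such cases; whether that matters depends on the graph's density.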
Many loss functions achieve this behavior when minimized. Common choices include cross-entropy loss, BPR loss, and margin loss.
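Three commonly used losses of this kind are cross-entropy, BPR, and margin loss. The sketch below writes them for a single positive edge and its sampled negatives; the function names, the argument layout, and the default margin of 1.0 are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy_loss(pos, negs):
    """-log sigmoid(pos) - sum over negatives of log(1 - sigmoid(neg))."""
    return -math.log(sigmoid(pos)) - sum(math.log(1.0 - sigmoid(n)) for n in negs)


def bpr_loss(pos, negs):
    """Sum over negatives of -log sigmoid(pos - neg)."""
    return sum(-math.log(sigmoid(pos - n)) for n in negs)


def margin_loss(pos, negs, margin=1.0):
    """Sum over negatives of max(0, margin - pos + neg)."""
    return sum(max(0.0, margin - pos + n) for n in negs)


# pos: score of an observed edge (u, v); negs: scores of sampled pairs (u, v').
pos_score = 3.0
neg_scores = [0.0, 2.5]

ce = cross_entropy_loss(pos_score, neg_scores)
bpr = bpr_loss(pos_score, neg_scores)
margin = margin_loss(pos_score, neg_scores)
```

All three decrease as the positive score rises above the negative scores. The margin loss becomes exactly zero once every negative is at least `margin` below the positive.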
5.4 Graph Classification
Instead of a single large graph, sometimes the data comes as multiple graphs, for example a list of different types of communities of people. By characterizing the friendships among people in the same community as a graph, one gets a list of graphs to classify. In this scenario, a graph classification model can help identify the type of each community, i.e. classify each graph based on its structure and overall information.
The major difference between graph classification and node classification or link prediction is that the prediction result characterizes the property of the entire input graph. One can perform the message passing over nodes/edges just like the previous tasks, but also needs to retrieve a graph-level representation.
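Obtaining a graph-level representation is often called a readout. A minimal sketch, assuming a mean readout over node features and a batch layout where every node carries the index of the graph it belongs to (the function name and that layout are illustrative assumptions):

```python
def mean_readout(node_feats, graph_ids, num_graphs):
    """Average the node features of each graph in a batch.

    node_feats: list of feature lists, one per node in the batch
    graph_ids:  index of the graph each node belongs to
    Assumes every graph has at least one node.
    """
    dim = len(node_feats[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for feat, gid in zip(node_feats, graph_ids):
        counts[gid] += 1
        for i, x in enumerate(feat):
            sums[gid][i] += x
    return [[s / c for s in row] for row, c in zip(sums, counts)]


# Two graphs in one batch: nodes 0 and 1 belong to graph 0, node 2 to graph 1.
node_feats = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
graph_ids = [0, 0, 1]

graph_repr = mean_readout(node_feats, graph_ids, num_graphs=2)
```

Each graph ends up with one fixed-size vector regardless of how many nodes it has, which is what a downstream classifier needs. Sum and max are other common readout choices.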
The graph classification pipeline proceeds as follows:
From left to right, the common practice is: