Partition a Knowledge Graph¶
For distributed training, a user needs to partition a graph beforehand. DGL-KE provides a partition tool dglke_partition
, which partitions a given knowledge graph into N
parts with the METIS partition algorithm. This partition algorithm reduces the number of edge cuts between partitions to reduce network communication in the distributed training. For a cluster of P
machines, we usually split a graph into P
partitions and assign a partition to a machine as shown in the figure below.
Arguments¶
The command line provides the following arguments:
--data_path DATA_PATH
The name of the knowledge graph stored under data_path. If it is one ofthe builtin knowledge grpahs such as FB15k, DGL-KE will automatically download the knowledge graph and keep it under data_path.--dataset DATA_SET
The name of the knowledge graph stored under data_path. If it is one of the builtin knowledge grpahs such asFB15k
,FB15k-237
,wn18
,wn18rr
, andFreebase
, DGL-KE will automatically download the knowledge graph and keep it under data_path.--format FORMAT
The format of the dataset. For builtin knowledge graphs, the format is determined automatically. For users own knowledge graphs, it needs to beraw_udd_{htr}
orudd_{htr}
.raw_udd_
indicates that the user’s data use raw ID for entities and relations andudd_
indicates that the user’s data uses KGE ID.{htr}
indicates the location of the head entity, tail entity and relation in a triplet. For example,htr
means the head entity is the first element in the triplet, the tail entity is the second element and the relation is the last element.--data_files [DATA_FILES ...]
A list of data file names. This is required for training KGE on their own datasets. If the format israw_udd_{htr}
, users need to provide train_file [valid_file] [test_file]. If the format isudd_{htr}
, users need to provide entity_file relation_file train_file [valid_file] [test_file]. In both cases, valid_file and test_file are optional.--delimiter DELIMITER
Delimiter used in data files. Note all files should use the same delimiter.-k NUM_PARTS
or--num-parts NUM_PARTS
The number of partitions.