Evaluation on Pre-Trained Embeddings

dglke_eval loads pre-trained embeddings and evaluates their quality with a link prediction task on the test set.
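
As a rough illustration of what a link prediction evaluation measures, the sketch below ranks each positive triplet against a set of negative triplets and summarizes the ranks as MR, MRR and HITS@k. It is a simplified stand-in, not DGL-KE's implementation: the function names rank_of_positive and summarize, the convention that a higher score means a more plausible triplet, and the toy scores are all assumptions made for illustration. The optional mask loosely mirrors the filtering that --no_eval_filter turns off.

```python
def rank_of_positive(pos_score, neg_scores, is_known_positive=None):
    """Return the 1-based rank of a positive triplet among its negatives.

    Assumption: a higher score means a more plausible triplet.
    is_known_positive marks negatives that are really known positive
    triplets; they are dropped before ranking (the "filtered" setting).
    """
    if is_known_positive is None:
        is_known_positive = [False] * len(neg_scores)
    kept = [s for s, known in zip(neg_scores, is_known_positive) if not known]
    # Each negative that scores strictly higher pushes the positive down by one.
    return 1 + sum(1 for s in kept if s > pos_score)


def summarize(ranks, ks=(1, 3, 10)):
    """Aggregate ranks into mean rank, mean reciprocal rank and HITS@k."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        metrics[f"HITS@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Toy scores, invented for illustration only.
ranks = [
    rank_of_positive(0.9, [0.1, 0.95, 0.3]),                        # unfiltered
    rank_of_positive(0.9, [0.1, 0.95, 0.3], [False, True, False]),  # filtered
    rank_of_positive(0.2, [0.5, 0.6, 0.7, 0.1]),
]
metrics = summarize(ranks)
print(ranks, metrics)
```

Filtering matters because a "negative" that is in fact a true triplet would otherwise unfairly lower the rank of the positive being tested, as the first two toy cases show.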

Arguments

The command line provides the following arguments:

  • --model_name {TransE, TransE_l1, TransE_l2, TransR, RESCAL, DistMult, ComplEx, RotatE} The models provided by DGL-KE.
  • --data_path DATA_PATH The path of the directory from which DGL-KE loads knowledge graph data.
  • --dataset DATASET The name of the knowledge graph stored under data_path. If it is one of the built-in knowledge graphs such as FB15k, DGL-KE will automatically download the knowledge graph and keep it under data_path.
  • --format FORMAT The format of the dataset. For built-in knowledge graphs, the format is determined automatically. For users’ own knowledge graphs, it needs to be raw_udd_{htr} or udd_{htr}. raw_udd_ indicates that the user’s data uses raw IDs for entities and relations, and udd_ indicates that the user’s data uses KGE IDs. {htr} indicates the positions of the head entity, tail entity and relation in a triplet. For example, htr means the head entity is the first element in the triplet, the tail entity is the second element and the relation is the last element.
  • --data_files [DATA_FILES ...] A list of data file names. This is used when users supply their own datasets. If the format is raw_udd_{htr}, users need to provide train_file [valid_file] [test_file]. If the format is udd_{htr}, users need to provide entity_file relation_file train_file [valid_file] [test_file]. In both cases, valid_file and test_file are optional.
  • --delimiter DELIMITER The delimiter used in data files. Note that all files should use the same delimiter.
  • --model_path MODEL_PATH The place where models are saved.
  • --batch_size_eval BATCH_SIZE_EVAL Batch size used for evaluation and testing.
  • --neg_sample_size_eval NEG_SAMPLE_SIZE_EVAL Negative sampling size for testing.
  • --neg_deg_sample_eval Negative sampling proportional to vertex degree for testing.
  • --hidden_dim HIDDEN_DIM Hidden dimension size used by relation and entity embeddings.
  • -g GAMMA or --gamma GAMMA The margin value in the score function. It is used by TransX and RotatE.
  • --eval_percent EVAL_PERCENT Randomly sample some percentage of edges for evaluation.
  • --no_eval_filter Disable filtering of positive edges from randomly constructed negative edges during evaluation.
  • --gpu [GPU ...] A list of GPU IDs, e.g. 0 1 2 4.
  • --mix_cpu_gpu Train a knowledge graph embedding model with both CPUs and GPUs. The embeddings are stored in CPU memory and the training is performed on GPUs. This is usually used for training large knowledge graph embeddings.
  • -de or --double_ent Double the entity dimension for complex numbers. It is used by RotatE.
  • -dr or --double_rel Double the relation dimension for complex numbers.
  • --num_proc NUM_PROC The number of processes to train the model in parallel. In multi-GPU training, the number of processes by default is set to match the number of GPUs. If set explicitly, the number of processes needs to be divisible by the number of GPUs.
  • --num_thread NUM_THREAD The number of CPU threads to train the model in each process. This argument is used for multi-processing training.
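
To make the {htr} ordering in --format concrete, here is a minimal sketch of how one line of a user-defined data file could be mapped to a (head, relation, tail) tuple. The helper parse_triplet is hypothetical and is not part of DGL-KE; the tab default for the delimiter and the sample lines are likewise assumptions for illustration.

```python
def parse_triplet(line, fmt, delimiter="\t"):
    """Split one data-file line into (head, relation, tail).

    fmt is a format string such as "raw_udd_htr" or "udd_hrt"; its last
    component gives the column order of head (h), relation (r) and tail (t).
    """
    order = fmt.rsplit("_", 1)[-1]  # "raw_udd_htr" -> "htr"
    if sorted(order) != ["h", "r", "t"]:
        raise ValueError(f"unrecognized column order in format: {fmt!r}")
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    by_role = dict(zip(order, fields))  # e.g. {"h": ..., "t": ..., "r": ...}
    return by_role["h"], by_role["r"], by_role["t"]


# htr: head first, tail second, relation last (raw string IDs).
print(parse_triplet("Paris\tFrance\tcapital_of", "raw_udd_htr"))
# hrt with a comma delimiter and integer-like KGE IDs.
print(parse_triplet("0,5,3", "udd_hrt", delimiter=","))
```

Reading the order from the format string keeps a single parsing path for every permutation of h, r and t.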

Examples

The following command evaluates pre-trained KG embeddings on multiple CPU cores:

dglke_eval --model_name TransE_l2 --dataset FB15k --hidden_dim 400 --gamma 19.9 --batch_size_eval 16 \
--num_thread 1 --num_proc 8 --model_path ~/my_task/ckpts/TransE_l2_FB15k_0/

We can also use GPUs in our evaluation tasks:

dglke_eval --model_name TransE_l2 --dataset FB15k --hidden_dim 400 --gamma 19.9 --batch_size_eval 16 \
--gpu 0 1 2 3 4 5 6 7 --model_path ~/my_task/ckpts/TransE_l2_FB15k_0/
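
When evaluation runs are launched from a script, the command above can be assembled programmatically. The sketch below is one possible way to do that, using only flags listed under Arguments; build_eval_command is a hypothetical helper, not a DGL-KE API, and it only builds the argument list without running anything. It also checks the --num_proc constraint described above before the command is launched.

```python
def build_eval_command(model_name, dataset, hidden_dim, gamma, model_path,
                       batch_size_eval=16, gpus=(), num_proc=None):
    """Assemble a dglke_eval argument list (hypothetical helper)."""
    gpus = list(gpus)
    # Per the --num_proc description, an explicit process count must be
    # divisible by the number of GPUs.
    if gpus and num_proc is not None and num_proc % len(gpus) != 0:
        raise ValueError("num_proc must be divisible by the number of GPUs")
    cmd = [
        "dglke_eval",
        "--model_name", model_name,
        "--dataset", dataset,
        "--hidden_dim", str(hidden_dim),
        "--gamma", str(gamma),
        "--batch_size_eval", str(batch_size_eval),
    ]
    if gpus:
        cmd += ["--gpu"] + [str(g) for g in gpus]
    if num_proc is not None:
        cmd += ["--num_proc", str(num_proc)]
    cmd += ["--model_path", model_path]
    return cmd


cmd = build_eval_command(
    "TransE_l2", "FB15k", 400, 19.9,
    "~/my_task/ckpts/TransE_l2_FB15k_0/",
    gpus=range(8),
)
# None of the arguments contain spaces, so a plain join is readable here.
print(" ".join(cmd))
```

Note that if the list is passed directly to a process launcher rather than a shell, the leading ~ in the model path is not expanded automatically.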