Format of Output
Different DGL-KE command-line tools produce different outputs. The tools depend on each other's outputs as follows:
- dglke_dist_train depends on the output of dglke_partition.
- dglke_eval depends on the output (trained embeddings) of the training command dglke_train or dglke_dist_train.
- dglke_predict and dglke_emb_sim depend on the output (trained embeddings) of the training command dglke_train or dglke_dist_train, as well as on the ID mapping files.
Output format of dglke_partition
dglke_partition partitions a graph into parts. It generates N partition directories according to the input argument -k N. For example, when -k is set to 4, it generates four directories: partition_0, partition_1, partition_2, and partition_3.
The detailed format of each partition_n directory is used only by dglke_dist_train and is outside the scope of this section. Please refer to the distributed training section for more details.
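Because the directory names follow a fixed pattern, a quick sanity check after partitioning can be sketched in a few lines. These are hypothetical helpers, not part of DGL-KE, and output_dir is an assumption standing for wherever dglke_partition wrote its partitions, which this section does not specify.

```python
import os


def partition_dirs(k):
    """Directory names expected from `dglke_partition -k k` (partition_0 .. partition_{k-1})."""
    return [f"partition_{i}" for i in range(k)]


def missing_partitions(output_dir, k):
    """Return the expected partition directories that do not exist under output_dir.

    output_dir is a hypothetical parameter: the location of the partitions is assumed.
    """
    return [d for d in partition_dirs(k)
            if not os.path.isdir(os.path.join(output_dir, d))]


print(partition_dirs(4))  # ['partition_0', 'partition_1', 'partition_2', 'partition_3']
```

An empty list from missing_partitions means every expected partition_n directory is present.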
Output format of dglke_train and dglke_dist_train
The outputs of dglke_train and dglke_dist_train are almost the same, so this section describes the output of dglke_train. There are four outputs:
- Trained Embeddings: The saved model. For most models, such as TransE, RESCAL, DistMult, ComplEx, and RotatE, there are two files: <dataset_name>_<model>_entity.npy for the entity embeddings and <dataset_name>_<model>_relation.npy for the relation embeddings. Both are saved as NumPy arrays. For TransR, there is one additional output that saves the projection matrix.
- config.json: The config file records all the details of the training configuration as well as the locations of the ID mapping files generated by dglke_train. The fields of the config file are shown below:

  | Field Name | Explanation |
  |---|---|
  | neg_sample_size | int value of param --neg_sample_size |
  | max_train_step | int value of param --max_step |
  | double_ent | bool value of param --double_ent |
  | rmap_file | relation ID mapping file name |
  | lr | float value of param --lr |
  | neg_adversarial_sampling | bool value of param --neg_adversarial_sampling |
  | gamma | float value of param --gamma |
  | adversarial_temperature | float value of param --adversarial_temperature |
  | batch_size | int value of param --batch_size |
  | regularization_coef | float value of param --regularization_coef |
  | model | model name |
  | dataset | dataset name |
  | emb_size | embedding dimension size |
  | regularization_norm | int value of param --regularization_norm |
  | double_rel | bool value of param --double_rel |
  | emap_file | entity ID mapping file name |

- Training Log: The log printed to stdout. If --test is set, the final test results (MR, MRR, Hit@1, Hit@3, and Hit@10) are also printed.
- ID Mapping Files (Optional): If the input data is in the Raw User Defined Knowledge Graph format, that is, all triplets use the raw ID space, the training script converts the IDs and generates two ID mapping files:
  - entities.tsv, for entity ID mapping, in the format KGE_entity_ID\tRaw_entity_Name, for example:
    0\tBeijing
    1\tChina
  - relations.tsv, for relation ID mapping, in the format KGE_relation_ID\tRaw_relation_name, for example:
    0\tis_capital_of
    1\tlocated_at
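As a sketch of how these outputs fit together, the hypothetical helpers below load config.json, locate the two embedding files from its dataset and model fields, and read an entities.tsv or relations.tsv mapping file. They assume that save_dir (a hypothetical name) holds both config.json and the .npy files, and that the dataset and model values in config.json are exactly the strings used in the <dataset_name>_<model> file prefix; check both against your own training run. TransR's extra projection-matrix file is not handled.

```python
import json
import os

import numpy as np


def embedding_paths(save_dir, dataset, model):
    """Build the <dataset_name>_<model>_entity.npy / _relation.npy paths."""
    prefix = os.path.join(save_dir, f"{dataset}_{model}")
    return prefix + "_entity.npy", prefix + "_relation.npy"


def load_training_output(save_dir):
    """Load config.json and the entity/relation embeddings from save_dir.

    Assumption: config["dataset"] and config["model"] match the file-name prefix.
    """
    with open(os.path.join(save_dir, "config.json"), encoding="utf-8") as f:
        config = json.load(f)
    ent_path, rel_path = embedding_paths(save_dir, config["dataset"], config["model"])
    return config, np.load(ent_path), np.load(rel_path)


def load_id_map(tsv_path):
    """Read entities.tsv or relations.tsv (KGE_ID<TAB>Raw_Name) into {kge_id: raw_name}."""
    id_to_name = {}
    with open(tsv_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            kge_id, name = line.rstrip("\n").split("\t", 1)
            id_to_name[int(kge_id)] = name
    return id_to_name
```

With these helpers, row i of the entity embedding array corresponds to the raw entity name load_id_map(...)[i] under the assumed layout.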
Output format of dglke_eval
dglke_eval has only one output: the test result, including MR, MRR, Hit@1, Hit@3, and Hit@10.
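These are the standard link-prediction ranking metrics. Given the rank of the true entity for each test triplet (1 is best), MR is the mean rank, MRR is the mean reciprocal rank, and Hit@k is the fraction of ranks no greater than k. A minimal sketch, assuming you already have a list of 1-based ranks; how dglke_eval produces and filters those ranks internally is not shown here.

```python
def ranking_metrics(ranks):
    """Compute MR, MRR, Hit@1, Hit@3, and Hit@10 from a list of 1-based ranks."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank (lower is better)
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank (higher is better)
    }
    for k in (1, 3, 10):
        # Fraction of test triplets whose true entity ranks in the top k.
        metrics[f"Hit@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


print(ranking_metrics([1, 2, 5, 20]))
```

For the ranks [1, 2, 5, 20], MR is 7.0 and Hit@10 is 0.75, since three of the four ranks are at most 10.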
Output format of dglke_predict
The output of dglke_predict is a list of top-ranked candidate (h, r, t) triplets together with their prediction scores. By default, the output is written to result.tsv in the format 'src\trel\tdst\tscore'.
Example output:
src rel dst score
6 0 15 -2.39380
8 0 14 -2.65297
2 0 14 -2.67331
9 0 18 -2.86985
8 0 20 -2.89651
If the input data of dglke_predict uses raw IDs, dglke_predict also converts the output results to raw IDs.
Example output:
head rel tail score
08847694 _derivationally_related_form 09440400 -7.41088
08847694 _hyponym 09440400 -8.99562
02537319 _derivationally_related_form 01490112 -9.08666
02537319 _hyponym 01490112 -9.44877
00083809 _derivationally_related_form 05940414 -9.88155
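To post-process result.tsv, a small reader can be sketched with Python's csv module. This hypothetical helper assumes the file is tab-separated with a header row, as in the examples above. The optional id_maps argument (also hypothetical) maps column names to {kge_id: raw_name} dictionaries, such as those read from entities.tsv and relations.tsv; it is only useful for output that still uses KGE IDs, since dglke_predict already converts the results when the input uses raw IDs.

```python
import csv


def read_result_tsv(path, id_maps=None):
    """Read a tab-separated result file with a header row into a list of dicts.

    id_maps: optional {column_name: {kge_id: raw_name}}; listed columns are
    converted from integer KGE IDs to raw names (a hypothetical convenience).
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: treat quote characters in raw names as ordinary text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            row["score"] = float(row["score"])
            for column, mapping in (id_maps or {}).items():
                row[column] = mapping[int(row[column])]
            rows.append(row)
    return rows
```

The column names come from the header line, so the same reader works for the src/rel/dst and head/rel/tail layouts.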
Output format of dglke_emb_sim
The output of dglke_emb_sim is a list of top-ranked candidate (left, right) pairs together with their embedding similarity scores. By default, the output is written to result.tsv in the format 'left\tright\tscore'.
Example output:
left right score
6 15 0.55512
1 12 0.33153
7 20 0.27706
7 19 0.25631
7 13 0.21372
If the input data of dglke_emb_sim uses raw IDs, dglke_emb_sim also converts the output results to raw IDs.
Example output:
left right score
_hyponym _hyponym 0.99999
_derivationally_related_form _derivationally_related_form 0.99999
_hyponym _also_see 0.58408
_hyponym _member_of_domain_topic 0.44027
_hyponym _member_of_domain_region 0.30975
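The idea of ranking (left, right) pairs by embedding similarity can be illustrated with cosine similarity. This is a hypothetical sketch, not dglke_emb_sim's implementation: it assumes cosine similarity (the similarity function used by dglke_emb_sim is not described in this section) and two embedding matrices, for example rows taken from the <dataset_name>_<model>_relation.npy file, with no all-zero rows. Under cosine similarity, an embedding paired with itself always scores 1.0 up to rounding.

```python
import numpy as np


def top_k_cosine_pairs(left_emb, right_emb, k):
    """Return the k (left_idx, right_idx, score) pairs with the highest cosine similarity.

    Assumes cosine similarity and that no row is all zeros (division by zero otherwise).
    """
    # Normalize rows so that a dot product equals cosine similarity.
    left = left_emb / np.linalg.norm(left_emb, axis=1, keepdims=True)
    right = right_emb / np.linalg.norm(right_emb, axis=1, keepdims=True)
    sim = left @ right.T  # shape: (num_left, num_right)
    # Sort all pair scores in descending order over the flattened matrix.
    flat = np.argsort(-sim, axis=None)[:k]
    rows, cols = np.unravel_index(flat, sim.shape)
    return [(int(r), int(c), float(sim[r, c])) for r, c in zip(rows, cols)]
```

Computing the full similarity matrix is simple but uses memory proportional to num_left × num_right, so it suits small candidate sets.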