Format of Output
----------------

Different DGL-KE command line tools produce different output data. They depend on each other as follows:

* ``dglke_dist_train`` depends on the output of ``dglke_partition``.
* ``dglke_eval`` depends on the output (Trained Embeddings) of the training command ``dglke_train`` or ``dglke_dist_train``.
* ``dglke_predict`` and ``dglke_emb_sim`` depend on the output (Trained Embeddings) of the training command ``dglke_train`` or ``dglke_dist_train``, as well as the ID mapping file.

Output format of dglke_partition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``dglke_partition`` partitions a graph into parts. It generates N partition directories according to the input argument ``-k N``. For example, when ``-k`` is set to 4, it generates 4 directories: ``partition_0``, ``partition_1``, ``partition_2``, and ``partition_3``.

The detailed format of each ``partition_n`` is used by ``dglke_dist_train`` only and is outside the scope of this section. Please refer to the distributed training section for more details.

Output format of dglke_train and dglke_dist_train
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The outputs of ``dglke_train`` and ``dglke_dist_train`` are almost the same. This section explains the output of ``dglke_train``. There are four outputs:

* **Trained Embeddings**: The saved model. For most models, such as ``TransE``, ``RESCAL``, ``DistMult``, ``ComplEx``, and ``RotatE``, there are two files: **\_\_entity.npy** for the entity embedding and **\_\_relation.npy** for the relation embedding. Both are saved NumPy arrays. For ``TransR``, there is one additional output file that stores the projection matrix.
* **config.json**: The config file records all the details of the training configuration as well as the locations of the ID mapping files generated by ``dglke_train``.

  The fields of the config file are shown below:

  ==========================  ====================================================
  Field Name                  Explanation
  ==========================  ====================================================
  neg_sample_size             int value of param ``--neg_sample_size``
  max_train_step              int value of param ``--max_step``
  double_ent                  bool value of param ``--double_ent``
  rmap_file                   **relation ID mapping file name**
  lr                          float value of param ``--lr``
  neg_adversarial_sampling    bool value of param ``--neg_adversarial_sampling``
  gamma                       float value of param ``--gamma``
  adversarial_temperature     float value of param ``--adversarial_temperature``
  batch_size                  int value of param ``--batch_size``
  regularization_coef         float value of param ``--regularization_coef``
  model                       model name
  dataset                     dataset name
  emb_size                    embedding dimension size
  regularization_norm         int value of param ``--regularization_norm``
  double_rel                  bool value of param ``--double_rel``
  emap_file                   **entity ID mapping file name**
  ==========================  ====================================================

* **Training Log**: The output log printed to stdout. If ``--test`` is set, the final test result (``MR``, ``MRR``, ``Hit@1``, ``Hit@3``, ``Hit@10``) is also output.
* **ID mapping Files (Optional)**: If the input data is in the **Raw User Defined Knowledge Graph** format, that is, all triplets use the Raw ID space, the training script performs the ID conversion and generates two ID mapping files:

  - ``entities.tsv``, for the entity ID mapping, in the format ``KGE_entity_ID\tRaw_entity_Name``, for example the lines ``0\tBeijing`` and ``1\tChina``.
  - ``relations.tsv``, for the relation ID mapping, in the format ``KGE_relation_ID\tRaw_relation_name``, for example the lines ``0\tis_capital_of`` and ``1\tlocated_at``.

Output format of dglke_eval
~~~~~~~~~~~~~~~~~~~~~~~~~~~

``dglke_eval`` has only one output: the testing result, including ``MR``, ``MRR``, ``Hit@1``, ``Hit@3``, and ``Hit@10``.

Output format of dglke_predict
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of ``dglke_predict`` is a list of top-ranked candidate (h, r, t) triplets together with their prediction scores.
The output is written to ``result.tsv`` by default, in the format ``src\trel\tdst\tscore``. An example output is::

    src  rel  dst  score
    6    0    15   -2.39380
    8    0    14   -2.65297
    2    0    14   -2.67331
    9    0    18   -2.86985
    8    0    20   -2.89651

If the input data of ``dglke_predict`` is in Raw IDs, ``dglke_predict`` also reports the output in Raw IDs. An example output is::

    head      rel                           tail      score
    08847694  _derivationally_related_form  09440400  -7.41088
    08847694  _hyponym                      09440400  -8.99562
    02537319  _derivationally_related_form  01490112  -9.08666
    02537319  _hyponym                      01490112  -9.44877
    00083809  _derivationally_related_form  05940414  -9.88155

Output format of dglke_emb_sim
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of ``dglke_emb_sim`` is a list of top-ranked candidate (left, right) pairs together with their embedding similarity scores. The output is written to ``result.tsv`` by default, in the format ``left\tright\tscore``. An example output is::

    left  right  score
    6     15     0.55512
    1     12     0.33153
    7     20     0.27706
    7     19     0.25631
    7     13     0.21372

If the input data of ``dglke_emb_sim`` is in Raw IDs, ``dglke_emb_sim`` also reports the output in Raw IDs. An example output is::

    left      right                         score
    _hyponym  _hyponym                      0.99999
    _derivationally_related_form  _derivationally_related_form  0.99999
    _hyponym  _also_see                     0.58408
    _hyponym  _member_of_domain_topic       0.44027
    _hyponym  _member_of_domain_region      0.30975
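
Both ``dglke_predict`` and ``dglke_emb_sim`` write tab-separated results, so a single small reader can cover both. The following is a minimal sketch and not part of DGL-KE: the helper name ``read_result_tsv`` is hypothetical, and the code assumes the file begins with a header row and contains a ``score`` column, as the example outputs in this section suggest. It parses an in-memory sample rather than a real ``result.tsv``.

.. code-block:: python

    import csv
    import io


    def read_result_tsv(stream):
        """Parse a tab-separated result stream into a list of dicts.

        Hypothetical helper: assumes a header row and a ``score`` column.
        """
        reader = csv.DictReader(stream, delimiter="\t")
        rows = []
        for row in reader:
            # IDs stay as strings (they may be Raw IDs); only the score is numeric.
            row["score"] = float(row["score"])
            rows.append(row)
        return rows


    # Sample data copied from the dglke_emb_sim example in this section.
    sample = "left\tright\tscore\n6\t15\t0.55512\n1\t12\t0.33153\n"
    rows = read_result_tsv(io.StringIO(sample))

    # Pick the highest-scoring pair.
    best = max(rows, key=lambda r: r["score"])
    print(best["left"], best["right"], best["score"])

To read an actual result file, the same function could be called on ``open("result.tsv", newline="")``. Keeping IDs as strings avoids assumptions about whether the tool was run with KGE IDs or Raw IDs.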