Format of Output
----------------

The DGL-KE command-line tools produce different outputs, and some tools consume the output of others:

  * ``dglke_dist_train`` depends on the output of ``dglke_partition``
  * ``dglke_eval`` depends on the output (Trained Embeddings) of the training command ``dglke_train`` or ``dglke_dist_train``
  * ``dglke_predict`` and ``dglke_emb_sim`` depend on the output (Trained Embeddings) of the training command ``dglke_train`` or ``dglke_dist_train``, as well as the ID mapping files.

Output format of dglke_partition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``dglke_partition`` partitions a graph into parts. It generates N partition directories according to the input argument ``-k N``. For example, when ``-k`` is set to 4, it generates 4 directories: ``partition_0``, ``partition_1``, ``partition_2``, and ``partition_3``.

The detailed format of each ``partition_n`` directory is used by ``dglke_dist_train`` only and is out of scope here. Please refer to the distributed training section for more details.
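
Before launching ``dglke_dist_train``, it can be useful to confirm that all partition directories exist. The following is a minimal sketch based only on the directory naming described above. The helper names and the ``output_dir`` argument are hypothetical, and the directory that ``dglke_partition`` actually writes into is not specified here, so pass whatever location applies to your setup.

```python
import os


def expected_partition_dirs(output_dir, k):
    """Return the directory paths partition_0 .. partition_{k-1} under output_dir."""
    return [os.path.join(output_dir, f"partition_{i}") for i in range(k)]


def missing_partition_dirs(output_dir, k):
    """Return the expected partition directories that do not exist on disk."""
    return [d for d in expected_partition_dirs(output_dir, k) if not os.path.isdir(d)]
```

An empty list from ``missing_partition_dirs(output_dir, 4)`` means that all four directories produced with ``-k 4`` were found under ``output_dir``.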

Output format of dglke_train and dglke_dist_train
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The outputs of ``dglke_train`` and ``dglke_dist_train`` are almost the same,
so this section describes the output of ``dglke_train`` only.

Basically there are four outputs:

  * **Trained Embeddings**: The saved model. For most models, such as ``TransE``, ``RESCAL``, ``DistMult``, ``ComplEx``, and ``RotatE``, there are two files: **<dataset_name>\_<model>\_entity.npy** for the entity embeddings and **<dataset_name>\_<model>\_relation.npy** for the relation embeddings. Both are saved NumPy array files. For ``TransR``, there is one additional output that stores the projection matrix.
  * **config.json**: The config file records all the details of the training configuration as well as the locations of the ID mapping files generated by ``dglke_train``. The fields of the config file are shown below:

  ========================== ====================================================
  Field Name                 Explanation
  ========================== ====================================================
  neg_sample_size            int value of param ``--neg_sample_size``
  max_train_step             int value of param ``--max_step``
  double_ent                 bool value of param ``--double_ent``
  rmap_file                  **relation ID mapping file name**
  lr                         float value of param ``--lr``
  neg_adversarial_sampling   bool value of param ``--neg_adversarial_sampling``
  gamma                      float value of param ``--gamma``
  adversarial_temperature    float value of param ``--adversarial_temperature``
  batch_size                 int value of param ``--batch_size``
  regularization_coef        float value of param ``--regularization_coef``
  model                      model name
  dataset                    dataset name
  emb_size                   embedding dimension size
  regularization_norm        int value of param ``--regularization_norm``
  double_rel                 bool value of param ``--double_rel``
  emap_file                  **entity ID mapping file name**
  ========================== ====================================================
  
  * **Training Log**: The output log printed to stdout. If ``--test`` is set, the final test results (``MR``, ``MRR``, ``Hit@1``, ``Hit@3``, ``Hit@10``) are also printed.
  * **ID mapping Files (Optional)**: If the input data is in the **Raw User Defined Knowledge Graph** format, that is, all triplets use the Raw ID space, the training script converts the IDs and generates two ID mapping files:

    - ``entities.tsv``, for entity ID mapping in the format ``KGE_entity_ID\tRaw_entity_Name``, for example::

        0\tBeijing
        1\tChina

    - ``relations.tsv``, for relation ID mapping in the format ``KGE_relation_ID\tRaw_relation_name``, for example::

        0\tis_capital_of
        1\tlocated_at
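
The training outputs described above can be read back with the Python standard library. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the ``<dataset_name>`` and ``<model>`` placeholders in the embedding file names correspond to the ``dataset`` and ``model`` fields of ``config.json``. It takes explicit paths because whether ``emap_file`` and ``rmap_file`` hold absolute or relative paths is not specified here.

```python
import csv
import json


def load_train_config(path):
    """Load a config.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_id_mapping(path):
    """Read tab-separated `KGE_ID<TAB>raw_name` lines into {kge_id: raw_name}."""
    mapping = {}
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quote characters in raw names untouched.
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) >= 2:  # skip blank or malformed lines
                mapping[int(row[0])] = row[1]
    return mapping


def embedding_file_names(config):
    """Build the entity/relation file names from the config (assumed mapping)."""
    prefix = f"{config['dataset']}_{config['model']}"
    return f"{prefix}_entity.npy", f"{prefix}_relation.npy"
```

The resulting ``.npy`` files can then be opened with ``numpy.load``, and row ``i`` of the entity array would pair with ``load_id_mapping(...)[i]``, assuming the rows are indexed by KGE ID.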


Output format of dglke_eval
~~~~~~~~~~~~~~~~~~~~~~~~~~~

``dglke_eval`` has a single output: the test results, including ``MR``, ``MRR``, ``Hit@1``, ``Hit@3``, and ``Hit@10``.

Output format of dglke_predict
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of ``dglke_predict`` is a list of top-ranked candidate (h, r, t) triplets together with their prediction scores. By default, the output is written to ``result.tsv`` in the format ``src\trel\tdst\tscore``.

An example output is::

    src  rel  dst  score
    6    0    15   -2.39380
    8    0    14   -2.65297
    2    0    14   -2.67331
    9    0    18   -2.86985
    8    0    20   -2.89651

If the input data of ``dglke_predict`` is in Raw IDs, ``dglke_predict`` also reports the output in Raw IDs.

An example output is::

    head      rel                           tail      score
    08847694  _derivationally_related_form  09440400  -7.41088
    08847694  _hyponym                      09440400  -8.99562
    02537319  _derivationally_related_form  01490112  -9.08666
    02537319  _hyponym                      01490112  -9.44877
    00083809  _derivationally_related_form  05940414  -9.88155
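
A result file in this layout can be parsed with the standard ``csv`` module. The sketch below is a minimal example under two assumptions: the file is tab-separated as described above, and it may start with a header line such as the ones shown in the examples. The function name ``read_predictions`` is hypothetical. IDs are kept as strings so that Raw IDs with leading zeros, such as ``08847694``, are preserved.

```python
import csv


def read_predictions(path):
    """Return (head, rel, tail, score) tuples in file order."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for fields in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(fields) != 4:
                continue  # skip blank or malformed lines
            try:
                score = float(fields[3])
            except ValueError:
                continue  # non-numeric score column: treat the line as a header
            rows.append((fields[0], fields[1], fields[2], score))
    return rows
```

The same approach would apply to the three-column output of ``dglke_emb_sim`` after adjusting the expected number of fields.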

Output format of dglke_emb_sim
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of ``dglke_emb_sim`` is a list of top-ranked candidate (left, right) pairs together with their embedding similarity scores. By default, the output is written to ``result.tsv`` in the format ``left\tright\tscore``.

An example output is::

    left    right   score
    6       15      0.55512
    1       12      0.33153
    7       20      0.27706
    7       19      0.25631
    7       13      0.21372

If the input data of ``dglke_emb_sim`` is in Raw IDs, ``dglke_emb_sim`` also reports the output in Raw IDs.

An example output is::

    left                          right                           score
    _hyponym                      _hyponym                        0.99999
    _derivationally_related_form  _derivationally_related_form    0.99999
    _hyponym                      _also_see                       0.58408
    _hyponym                      _member_of_domain_topic         0.44027
    _hyponym                      _member_of_domain_region        0.30975