Predict entities/relations in triplets ====================================== `dglke_predict` predicts missing entities or relations in a triplet. Blow shows an example that predicts top 5 most likely destination entities for every given source node and relation:: src rel dst score 1 0 12 -5.11393 1 0 18 -6.10925 1 0 13 -6.66778 1 0 17 -6.81532 1 0 19 -6.83329 2 0 17 -5.09325 2 0 18 -5.42972 2 0 20 -5.61894 2 0 12 -5.75848 2 0 14 -5.94183 Currently, it supports six models: TransE_l1, TransE_l2, RESCAL, DistMult, ComplEx, and RotatE. Arguments --------- Four arguments are required to provide basic information for predicting missing entities or relations: * ``--model_path``, The path containing the pretrained model, including the embedding files (.npy) and a config.json containing the configuration of training the model. * ``--format``, The format of the input data, specified in ``h_r_t``. Ideally, user should provides three files, one for head entities, one for relations and one for tail entities. But we also allow users to use *\** to represent *all* of the entities or relations. For example, ``h_r_*`` requires users to provide files containing head entities and relation entities and use all entities as tail entities; ``*_*_t`` requires users to provide a single file containing tail entities and use all entities as head entities and all relations. The supported formats include ``h_r_t``, ``h_r_*``, ``h_*_t``, ``*_r_t``, ``h_*_*``, ``*_r_*``, ``*_*_t``. * ``--data_files`` A list of data file names. This is used to provide necessary files containing the input data according to the format, e.g., for ``h_r_t``, the three input files are required and they contain a list of head entities, a list of relations and a list of tail entities. For ``h_*_t``, two files are required, which contain a list of head entities and a list of tail entities. * ``--raw_data``, A flag indicates whether the input data specified by --data_files use the raw Ids or KGE Ids. If True, the input data uses Raw IDs and the command translates IDs according to ID mapping. If False, the data use KGE IDs. Default False. Task related arguments: * ``--exec_mode``, How to calculate scores for triplets and calculate topK. Default 'all'. * ``triplet_wise``: head, relation and tail lists have the same length N, and we calculate the similarity triplet by triplet: result = topK([score(h_i, r_i, t_i) for i in N]), the result shape will be (K,). * ``all``: three lists of head, relation and tail ids are provided as H, R and T, and we calculate all possible combinations of all triplets (h_i, r_j, t_k): result = topK([[[score(h_i, r_j, t_k) for each h_i in H] for each r_j in R] for each t_k in T]), and find top K from the triplets * ``batch_head``: three lists of head, relation and tail ids are provided as H, R and T, and we calculate topK for each element in head: result = topK([[score(h_i, r_j, t_k) for each r_j in R] for each t_k in T]) for each h_i in H. It returns (sizeof(H) * K) triplets. * ``batch_rel``: three lists of head, relation and tail ids are provided as H, R and T, and we calculate topK for each element in relation: result = topK([[score(h_i, r_j, t_k) for each h_i in H] for each t_k in T]) for each r_j in R. It returns (sizeof(R) * K) triplets. * ``batch_tail``: three lists of head, relation and tail ids are provided as H, R and T, and we calculate topK for each element in tail: result = topK([[score(h_i, r_j, t_k) for each h_i in H] for each r_j in R]) for each t_k in T. It returns (sizeof(T) * K) triplets. * ``--topk``, How many results are returned. Default: 10. * ``--score_func``, What kind of score is used in ranking. Currently, we support two functions: ``none`` (score = $x$) and ``logsigmoid`` ($score = log(sigmoid(x))$). Default: 'none'. * ``--gpu``, GPU device to use in inference. Default: -1 (CPU) Input/Output related arguments: * ``--output``, the output file to store the result. By default it is stored in result.tsv * ``--entity_mfile``, The entity ID mapping file. Required if Raw ID is used. * ``--rel_mfile``, The relation ID mapping file. Required if Raw ID is used. Examples -------- The following command predicts the K most likely relations and tail entities for each head entity in the list using a pretrained TransE_l2 model (--exec_mode ‘batch_head’). In this example, the candidate relations and the candidate tail entities are given by the user.:: # Using PyTorch Backend dglke_predict --model_path ckpts/TransE_l2_wn18_0/ --format 'h_r_t' --data_files head.list rel.list tail.list --score_func logsigmoid --topK 5 --exec_mode 'batch_head' # Using MXNet Backend MXNET_ENGINE_TYPE=NaiveEngine DGLBACKEND=mxnet dglke_predict --model_path ckpts/TransE_l2_wn18_0/ --format 'h_r_t' --data_files head.list rel.list tail.list --score_func logsigmoid --topK 5 --exec_mode 'batch_head' The output is as:: src rel dst score 1 0 12 -5.11393 1 0 18 -6.10925 1 0 13 -6.66778 1 0 17 -6.81532 1 0 19 -6.83329 2 0 17 -5.09325 2 0 18 -5.42972 2 0 20 -5.61894 2 0 12 -5.75848 2 0 14 -5.94183 ... The following command finds the most likely combinations of head entities, relations and tail entities from the input lists using a pretrained DistMult model:: # Using PyTorch Backend dglke_predict --model_path ckpts/DistMult_wn18_0/ --format 'h_r_t' --data_files head.list rel.list tail.list --score_func none --topK 5 # Using MXNet Backend MXNET_ENGINE_TYPE=NaiveEngine DGLBACKEND=mxnet dglke_predict --model_path ckpts/DistMult_wn18_0/ --format 'h_r_t' --data_files head.list rel.list tail.list --score_func none --topK 5 The output is as:: src rel dst score 6 0 15 -2.39380 8 0 14 -2.65297 2 0 14 -2.67331 9 0 18 -2.86985 8 0 20 -2.89651 The following command finds the most likely combinations of head entities, relations and tail entities from the input lists using a pretrained TransE_l2 model and uses Raw ID (turn on --raw_data):: # Using PyTorch Backend dglke_predict --model_path ckpts/TransE_l2_wn18_0/ --format 'h_r_t' --data_files raw_head.list raw_rel.list raw_tail.list --topK 5 --raw_data --entity_mfile data/wn18/entities.dict --rel_mfile data/wn18/relations.dict # Using MXNet Backend MXNET_ENGINE_TYPE=NaiveEngine DGLBACKEND=mxnet dglke_predict --model_path ckpts/TransE_l2_wn18_0/ --format 'h_r_t' --data_files raw_head.list raw_rel.list raw_tail.list --topK 5 --raw_data --entity_mfile data/wn18/entities.dict --rel_mfile data/wn18/relations.dict The output is as:: head rel tail score 08847694 _derivationally_related_form 09440400 -7.41088 08847694 _hyponym 09440400 -8.99562 02537319 _derivationally_related_form 01490112 -9.08666 02537319 _hyponym 01490112 -9.44877 00083809 _derivationally_related_form 05940414 -9.88155