## Blog: Knowledge Graph(KG) for Recommendation System

This is an introduction on how to integrate Knowledge Graph with Recommendation System.

### 1. Introduction

### 1.1 What’s KG

Knowledge Graph is a semantic network in NLP proposed by Google. It could be simplified as a graph with some nodes and edges, in it we call the node as Entity and the edge as Relation.

In real world, the **entity** could be a person, a company or a concept, and the **relation** is just the relationship between different entities.

The following graph is a simple ideal KG, which shows the relationships between different entities.

### 1.2 Why we need

For Recommendation System, the **basic step** is to extract as many as **valuable candidates** before evaluating and sending to users.

Commonly, the hashing-based and quantitations-based models are implemented for generating **structural similarly **candidates, and most of them are built via word embeddings, which is a generalized embeddings instead of domain specific embeddings, so its performance won’t be good enough if we focus on some specialized fields.

KG is built as semantic network, which means we could calculate the **semantic similarities** between the entities. And semantic similar entities will offer some different kinds of candidates for Recommendation System.

Besides, KG is helpful to provide **recommendation explanation** and solve the problem of **data sparsity.**

### 1.3 How this work

#### 1.3.1 Knowledge Graph Embedding(KGE)

One approach by applying KG into RS is using the KGE(Knowledge Graph Embedding). By replacing the FastText embedding, or using KGE as additional feature vectors, we could improve the performance of recommendation.

The following graph shows common RS engine, and another 2 engine combining with KG:

Note:

GE(Graph Embedding) could be taken as a special case of KGE when the Entities are only persons/companies, the relationship is limited to one type, and the embedding algorithms will only handle with one to one relationship representation.

#### 1.3.2 Knowledge Inference

Suppose we built a KG as the following picture shows.

In this KG, there are 4 companies, and they are:

- SalesSeek
- SalesForce
- Zoom
- Slack

All of them provides some software/tools for clients. It’s hard to inference the similarity based on the users usage because all the users(xxx@abc.com & yyy@opq.com) are using them.

However, we could easily found the difference of the 4 companies by their product purpose. SalesSeek & Salesforce focus on sales fields while Zoom & Slack are more used as team work tool.

Thus, we could found Zoom & Slack are more related to each other, and SalesSeek & Salesforce could be taken as another group.

The following graph shows the inference result:

### 2. Architecture

Looking inside of KG, there are 3 basic layer and they are:

- storage layer: determine the graph data storage
- data layer: determine the triple data structure and the entity & relation set
- calculation layer: the optional algorithms for recommendation based on KG

I will give detailed specification in Section #Roadmap. See the following graph for an overview:

### 3. KG Roadmap

### 3.1 Roadmap Overview

### 3.2 Triple Generation

**Triple Definition**

KG is consist of Entities and Relations, and a triple is a basic operator to describe a KG.

A triple could be defined as :

For any KG, the E and R should be limited. Unlimited E or R will only increase the calculation complexity and is not applicable to business application.

Thus, before generating a triple, the **first step** is to set the possible choices in E and R.

**Entity Set**

For us, there are 3 kinds of entity we need generate:

- physical entity: a person, a company, or a product.
- concepts entity: the concepts of specialties domains, industry categories, technologies, location, and etc.

For physical entity, we can simply take its ID as a single entity.

For concepts entities, we need do determine what the concepts should be concluded in.

In all, for every filed we need to use in KG, we need **build the category label/tag/structure of the field**, and usually we need Classification or Entity Resolution models for generating the tags or class labels.

**Relation Set**

For the structured data, the relation set could be very simple, which means, we could directly use the fields as relationship type.

For unstructured data, it takes some extra effort to generate the relation set. This will be discussed in another article.

### 3.3 KGE

#### 3.3.1 TransE[1]

TransE is proposed on NIPS, and it’s a classic embedding algorithm for handling data with multiple relations.

Let use (h,r,t) represent the triple, and the key idea of TransE is:

Its optimization target is maximize the margin loss of distance between positive data and negative sampling data. The loss function is:

For details, check the Reference [1].

### 3.3.2 TransH[2]

Different with TransE, TransH is trying to projecting the vector into a hyperplane, and then training the same loss.

The key idea of TransH is:

Read Reference [2] for details.

### 3.3.3 TransR[3]

Similar to TransE, but TransR try to build the relation in different space for different relation type. Which means:

For every relation type r, there’s a specified transformed matrix named Mr, and the key idea is:

Read Reference [3] for details.

### 3.3.4 Other Embedding Algorithms

Besides the above approaches, there still are approaches such as TransD, which is doing similar thing as the 3 algorithms listed above.

In another hand, as I mentioned GE before, we could extract a subgraph of KG to build a ‘Capture Network’(maybe a ‘social network’). After that, we could using some embedding algorithm like Deep-walk[5], GCNN, SDNE[6], etc to represent the entity in a subgraph.

### 3.4 Knowledge Inference

Knowledge Inference models are more like a end to end model instead of the embedding representation via KGE.

### 3.4.1 Deep Knowledge-Aware Network(DKN)[7]

DKN is based on KGE algorithms, it need KGE generates vectors for entities and relations, and then use CNN & Attention Network to evaluate history records.

For the vectors, it combines the vector from different embedding algorithm, and every vector will be taken as a channel for CNN. The embedding algorithms it mentioned are:

- word embedding
- entity embedding
- relation embedding

For more details, read Reference [7].

### 3.4.2 Ripple Network[8]

Ripple Network is another kinds of knowledge inference algorithm which focus on the interest broadcast path.

It supposes that users interest are like ripple which expanding from a point to outside, and try to evaluate the best path in KG for recommendation.

Following graph is from the paper, which visualizes the interest expanding process on KG. For details you could read Reference [8].

### 3.4.3 More Inference Algorithms

There are more inference algorithms, such as Collaborative Knowledge Base Embedding[9], Multi-task Learning for KG enhanced Recommendation[10], MetaGraph[11], and etc.

They are all listed in the section References.

### 4. References

- Bordes, Antoine, et al. “Translating embeddings for modeling multi-relational data.” Advances in neural information processing systems. 2013.
- Wang, Zhen, et al. “Knowledge graph embedding by translating on hyperplanes.” Twenty-Eighth AAAI conference on artificial intelligence. 2014.
- Lin, Yankai, et al. “Learning entity and relation embeddings for knowledge graph completion.” Twenty-ninth AAAI conference on artificial intelligence. 2015.
- Ji, Guoliang, et al. “Knowledge graph embedding via dynamic mapping matrix.” Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Vol. 1. 2015.
- Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. “Deepwalk: Online learning of social representations.” Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014.
- Wang, Daixin, Peng Cui, and Wenwu Zhu. “Structural deep network embedding.” Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2016.
- Wang, Hongwei, et al. “DKN: Deep knowledge-aware network for news recommendation.” Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2018.
- Wang, Hongwei, et al. “RippleNet: Propagating user preferences on the knowledge graph for recommender systems.” Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 2018.
- Zhang, Fuzheng, et al. “Collaborative knowledge base embedding for recommender systems.” Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2016.
- Wang, Hongwei, et al. “Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation.” arXiv preprint arXiv:1901.08907 (2019).
- Zhao, Huan, et al. “Meta-graph based recommendation fusion over heterogeneous information networks.” Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017.

*Source: Artificial Intelligence on Medium*