DaumKakao
S2Graph: A large-scale graph database with HBase
Doyoung Yoon x Taejin Chin
DaumKakao: A Mobile Lifestyle Platform
1. KakaoTalk
   a. Mobile messenger replacing SMS
   b. 'KaTalkHe' is used as a verb in Korea, like 'Googling'
   c. 96% of Korean smartphone users use KakaoTalk
   d. 170M users worldwide
   e. 3B messages / day
DaumKakao: A Mobile Lifestyle Platform
[Figure: services grouped into six platforms: Social, Contents, Commerce, Marketing, Local, and Personal. Services shown: KakaoTalk, Media Daum, KakaoPick, Yellow ID, Daum Map, KakaoHome, KakaoStory, KakaoGame, Gift Shop, Plus Friend, KakaoPlace, Sol Calendar, Digital Item Store, KakaoGroup, KakaoStyle, Story Plus, Sol Mail, Daum Cafe, KakaoTopic, Daum Cloud, Zap, KakaoPage, Sol Group, KakaoMusic, Daum tvPot, Daum Webtoon. Callouts: 96% of Korean smartphone users use the KakaoTalk messenger (170 million users worldwide); biggest mobile SNS in Korea.]
Our Social Graph
[Figure: a user connected to other entities by labeled edges: Listen, Advertise, Coupon, Message, Emoticon, Like, Friend, Pick, Eat, Write, Play, View, Comment, Read, Present, Search, Group. Each edge carries an affinity score and label-specific properties such as count, price, withFriend, rating, length, level, keyword, and size.]
Our Social Graph
[Figure: the same graph with concrete values. Vertices: Music ID 603, Ad ID 603, Message ID 201, Item ID 13, Post ID 97, Game ID 1984. Edges and properties: Listen (count: 6), Advertise (ctr: 0.32), Message (length: 9), Pick, Friend (withFriend: 3), Write (length: 3), Play (level: 6), Search (keyword: "HBase"), Comment (length: 15). Every edge also carries an affinity value between 1 and 9.]
Technical Challenges
1. Large, constantly changing social graph
   a. Scale (more than):
      Social network: 10 billion edges, 200 million vertices, 50 million updates on existing edges
      User activities: 400 million new edges per day
Technical Challenges (cont)
2. Low latency for breadth-first search traversal on connected data
   a. Performance requirements
      Peak graph-traversing queries per second: 20,000
      Response time: 100 ms
Technical Challenges (cont)
3. Updates should be applied to the graph in real time for viral effect
[Diagram: an activity spreads from Person A to Person B to Person C to Person D through Post, Comment, Sharing, and Mention, with each hop marked "Fast".]
Technical Challenges (cont)
4. Support for dynamic ranking logic
   a. Push strategy: hard to change data ranking logic dynamically
   b. Pull strategy: can try various data ranking logic
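The pull strategy can be illustrated with a minimal sketch: because ranking happens at read time, the scoring function is just a parameter that can be swapped per request. The edge fields and the `rank_pull` helper below are hypothetical and are not part of S2Graph.

```python
from typing import Callable, Dict, List

# Hypothetical edge records; the field names are illustrative only.
Edge = Dict[str, float]


def rank_pull(edges: List[Edge], score: Callable[[Edge], float], limit: int) -> List[Edge]:
    """Rank edges at read time with whatever scoring function the caller supplies."""
    return sorted(edges, key=score, reverse=True)[:limit]


edges = [
    {"id": 1, "affinity": 3, "timestamp": 20140502},
    {"id": 2, "affinity": 9, "timestamp": 20140712},
    {"id": 3, "affinity": 6, "timestamp": 20141116},
]

# Two ranking policies over the same stored data; nothing is rewritten on disk.
by_affinity = rank_pull(edges, lambda e: e["affinity"], limit=2)
by_recency = rank_pull(edges, lambda e: e["timestamp"], limit=2)
```

With a push strategy the ranked result would already be materialized per reader, so changing the policy would mean rewriting that stored data.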
Before
[Diagram: Messaging, SNS, and Blog app servers each connect directly to several data stores: friend relationships, SNS feeds, blog user activities, and messaging.]
Each app server should know each DB's sharding logic.
Highly inter-connected architecture
After
[Diagram: Messaging, SNS, and Blog app servers all connect to a single S2Graph DB.]
Stateless app servers
S2Graph: Distributed Online GraphDB
1. Low-latency
2. Graph-traversable
3. Scalable
4. Eventually consistent
5. Asynchronous, non-blocking
Why We Chose HBase
1. High availability
2. Scalability
3. Low latency
4. High concurrency
5. Fault tolerance
6. Integration with HDFS
7. Distributed operation
The Data Model
1. Columns
2. Labels
3. Directions
4. Index Properties
5. Non-index Properties
[Figure: example property graph with vertices 1 to 5 and edge labels "comment", "created", and "know". Callouts: vertex 1 out edges; vertex 2 in edges; edge 2 label, source vertex, and target vertex; edge 5 properties (date = 20150507); vertex 4 id; vertex 4 properties (name = "josh", age = 32).]
How to store the data - Edge Logical View
1. Snapshot edges: up-to-date status of each edge

   row \ column      Tgt Vertex ID1   Tgt Vertex ID2   Tgt Vertex ID3
   Src Vertex ID1    Properties       Properties       Properties
   Src Vertex ID2    Properties       Properties       Properties

   a. Fetching an edge between two specific vertices
   b. Lookup table to reach indexed edges for update, increment, and delete operations
How to store the data - Edge Logical View
2. Indexed edges: edges with index

   row \ column      Index Values | Tgt Vertex ID1   Index Values | Tgt Vertex ID2
   Src Vertex ID1    Non-index Properties            Non-index Properties

   a. Fetches edges originating from a certain vertex in order of index
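The relationship between the two logical views can be sketched with plain in-memory structures. This is only an illustration of the idea, not S2Graph's implementation: the `EdgeStore` class, the single `time` index property, and ascending index order are all assumptions.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class EdgeStore:
    """Toy model of snapshot edges plus indexed edges (hypothetical)."""

    def __init__(self) -> None:
        # snapshot[src][tgt] -> all properties: the current state of the edge.
        self.snapshot: Dict[int, Dict[int, dict]] = {}
        # indexed[src] -> list of (index_value, tgt), kept sorted by index value.
        self.indexed: Dict[int, List[Tuple[int, int]]] = {}

    def upsert(self, src: int, tgt: int, props: dict, index_key: str = "time") -> None:
        old = self.snapshot.get(src, {}).get(tgt)
        row = self.indexed.setdefault(src, [])
        if old is not None:
            # The snapshot acts as the lookup table: it tells us which
            # indexed entry belongs to the previous version of this edge.
            row.remove((old[index_key], tgt))
        bisect.insort(row, (props[index_key], tgt))
        self.snapshot.setdefault(src, {})[tgt] = dict(props)

    def get_edge(self, src: int, tgt: int) -> Optional[dict]:
        # Point lookup between two specific vertices uses the snapshot.
        return self.snapshot.get(src, {}).get(tgt)

    def get_edges(self, src: int, limit: int) -> List[int]:
        # Scan from one source vertex in index order uses the indexed edges.
        return [tgt for _, tgt in self.indexed.get(src, [])[:limit]]


store = EdgeStore()
store.upsert(1, 10, {"time": 20140712})
store.upsert(1, 11, {"time": 20140502})
store.upsert(1, 10, {"time": 20141116})  # the update moves the indexed entry
```

After the third call the indexed row for vertex 1 still has two entries, because the stale entry was found through the snapshot and removed.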
How to store the data - Edge Physical View - table schema
1. Snapshot Edge
   a. Rowkey

      Field             Size
      Murmur Hash       16 bit
      Src Vertex ID     variable length
      Label ID          30 bit
      Direction         2 bit
      Index Sequence    7 bit
      Is Inverted       1 bit

   Vertex IDs can be encoded with an 8 bit header + byte array (long, integer, short, byte, string)
How to store the data - Edge Physical View - table schema
1. Snapshot Edge
   b. Qualifier: Target Vertex ID (variable length)
   c. Value: All Property Key Value Pairs (variable length)
How to store the data - Edge Physical View - table schema
2. Indexed Edge
   a. Rowkey

      Field             Size
      Murmur Hash       16 bit
      Src Vertex ID     variable length
      Label ID          30 bit
      Direction         2 bit
      Index Sequence    7 bit
      Is Inverted       1 bit

   Vertex IDs can be encoded with an 8 bit header + byte array (long, integer, short, byte, string)
How to store the data - Edge Physical View - table schema
2. Indexed Edge
   b. Qualifier: Index Property Values (variable length) + Tgt Vertex ID (variable length)
   c. Value: Non-index Property Key Value Pairs (variable length)
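The rowkey layout shared by snapshot and indexed edges can be sketched as a byte-packing routine. Several details here are assumptions made for illustration: the hash is taken over the encoded source vertex ID, `zlib.crc32` stands in for Murmur hash (which is not in the Python standard library), the header byte `0x01` means "64-bit long", and the direction codes are invented.

```python
import struct
import zlib

DIRECTIONS = {"out": 0, "in": 1}  # assumed numeric codes for the 2-bit field


def encode_vertex_id(vertex_id: int) -> bytes:
    # 8-bit header + byte array; header 0x01 = big-endian 64-bit long (assumed).
    return struct.pack(">Bq", 0x01, vertex_id)


def edge_rowkey(src_id: int, label_id: int, direction: str,
                index_seq: int, is_inverted: bool) -> bytes:
    if not 0 <= label_id < (1 << 30):
        raise ValueError("label_id must fit in 30 bits")
    if not 0 <= index_seq < (1 << 7):
        raise ValueError("index_seq must fit in 7 bits")
    src = encode_vertex_id(src_id)
    # Stand-in for the 16-bit Murmur hash prefix that spreads rows across regions.
    salt = zlib.crc32(src) & 0xFFFF
    # 30 + 2 + 7 + 1 = 40 bits, packed into 5 bytes.
    packed = (
        (label_id << 10)
        | (DIRECTIONS[direction] << 8)
        | (index_seq << 1)
        | int(is_inverted)
    )
    return struct.pack(">H", salt) + src + packed.to_bytes(5, "big")


key = edge_rowkey(src_id=1, label_id=5, direction="out", index_seq=0, is_inverted=False)
```

Under these assumptions the key is 16 bytes long: 2 for the hash prefix, 9 for the encoded vertex ID, and 5 for the packed label, direction, index sequence, and inverted flag.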
How to store the data - Vertex Logical View
1. Vertex: up-to-date status of each vertex

   row \ column    Property Key1   Property Key2
   Vertex ID1      Value1          Value2
   Vertex ID2      Value1          Value2
How to store the data - Vertex Physical View - table schema
1. Vertex: up-to-date status of each vertex
   a. Rowkey: Murmur Hash (16 bit) + Column ID (integer, 32 bit) + Vertex ID (variable length)
   b. Qualifier: Property Key (byte, 8 bit)
   c. Value: Property Value (variable length)
How to read the data - GetEdges
Using a custom query DSL on top of HTTP

    curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
    {
      "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
      "steps": [
        [{"label": "friends", "direction": "out", "limit": 100}],
        [{"label": "hear", "direction": "out", "limit": 10}]
      ]
    }
    '

Steps = a list of Step
Step = contains the labels to traverse and how to rank them in the result

[Figure: Step 1 follows "friends" edges from User 1 to friend 1 and friend 2. Step 2 follows "hear" edges (time: 20140502, 20140712, 20141116) to the songs "Don't let go", "let it be", and "let it go".]
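How a list of steps drives a traversal can be sketched against a toy in-memory graph. This is a simplified model rather than the S2Graph engine: direction and ranking are ignored, `limit` is applied per source vertex and label (an assumption), and the assignment of songs to friends is invented for the example.

```python
from typing import Dict, List, Tuple

# graph[(source_vertex, label)] -> target vertices, already in index order.
Graph = Dict[Tuple[str, str], List[str]]


def get_edges(graph: Graph, src_vertices: List[str], steps: List[List[dict]]) -> List[str]:
    """Breadth-first traversal: each step expands the whole current frontier."""
    frontier = list(src_vertices)
    for step in steps:
        next_frontier: List[str] = []
        for vertex in frontier:
            for spec in step:  # a step may name several labels
                targets = graph.get((vertex, spec["label"]), [])
                for target in targets[: spec["limit"]]:
                    if target not in next_frontier:
                        next_frontier.append(target)
        frontier = next_frontier
    return frontier


graph: Graph = {
    ("user1", "friends"): ["friend1", "friend2"],
    ("friend1", "hear"): ["Don't let go", "let it be"],
    ("friend2", "hear"): ["let it go"],
}
steps = [
    [{"label": "friends", "limit": 100}],
    [{"label": "hear", "limit": 10}],
]
songs = get_edges(graph, ["user1"], steps)
```

Dropping the second step returns the friend list itself, which mirrors how a one-step query differs from a two-step one.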
How to read the data - GetEdges Example
Friend list

    curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
    {
      "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
      "steps": [
        [{"label": "friends", "direction": "out", "limit": 100}]
      ]
    }
    '

[Figure: "friends" edges from User 1 to friend 1 and friend 2.]
How to read the data - GetEdges Example
Songs my friends have listened to

    curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
    {
      "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
      "steps": [
        [{"label": "friends", "direction": "out", "limit": 50, "scoring": {"score": 1.0}}],
        [{"label": "listen", "direction": "out", "limit": 10}]
      ]
    }
    '

[Figure: "friends" edges from User 1 to friend 1 and friend 2, then "hear" edges (time: 20140502, 20140712, 20141116) to the songs "Don't let go", "let it be", and "let it go".]

Reference: https://github.com/daumkakao/s2graph#1-definition
How to read the data - GetEdges Example
Songs similar to the songs I have listened to

    curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
    {
      "srcVertices": [{"serviceName": "s2graph", "columnName": "account_id", "id": 1}],
      "steps": [
        [{"label": "listen", "direction": "out", "limit": 50}],
        [{"label": "similar_song", "direction": "out", "limit": 10, "scoring": {"score": 1.0}}]
      ]
    }
    '

[Figure: "hear" edges (time: 20140502, 20140712, 20141116) from User 1 to the songs "Don't let go", "let it be", and "let it go", then "similar_song" edges (similarity: 0.3, 0.4, 0.6) to "let it bleed", "Hey jude", and "Do you wanna build a snowman?".]
How to read the data - GetVertices

    curl -XPOST localhost:9000/graphs/getVertices -H 'Content-Type: Application/json' -d '
    [
      {"serviceName": "s2graph", "columnName": "account_id", "ids": [1, 2, 3]},
      {"serviceName": "kakaomusic", "columnName": "user_id", "ids": [1, 2, 3]}
    ]
    '

[Figure: User 1 {created_at:20070812, updated_at:20150507}; User 2 {created_at:201206132, updated_at:20140505}]
How to write the data - Insert

    curl -XPOST localhost:9000/graphs/edges/insert -H 'Content-Type: Application/json' -d '
    [
      {"from": 1, "to": 2, "label": "graph_test", "props": {"time": -1, "weight": 10}, "timestamp": 1417616431}
    ]
    '

[Figure: an edge from User 1 to User 2.]
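The same insert can be prepared from application code. The sketch below only builds the HTTP request with the Python standard library and does not send it; the `build_insert_request` helper is hypothetical, and the endpoint, payload shape, and default host are taken from the curl example.

```python
import json
import urllib.request
from typing import List


def build_insert_request(edges: List[dict],
                         base_url: str = "http://localhost:9000") -> urllib.request.Request:
    """Build (but do not send) a POST to the edge insert endpoint."""
    body = json.dumps(edges).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/graphs/edges/insert",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


edge = {
    "from": 1,
    "to": 2,
    "label": "graph_test",
    "props": {"time": -1, "weight": 10},
    "timestamp": 1417616431,
}
request = build_insert_request([edge])
# urllib.request.urlopen(request) would send it to a running server.
```

Serializing with `json.dumps` avoids hand-written JSON mistakes such as a trailing comma after the last edge.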