daumkakao S2Graph: A large-scale graph database with HBase
Reference
1. HBase Conference 2015
   a. http://www.slideshare.net/HBaseCon/use-cases-session-5
   b. https://vimeo.com/128203919
2. Deview 2015
3. ApacheCon Big Data Europe
   a. http://sched.co/3ztM
4. GitHub: https://github.com/daumkakao/s2graph
Our Social Graph
[Figure: a user connected to other vertices by activity edges, each edge carrying an affinity score plus its own properties. Edge labels: Listen, Advertise, Coupon, Message, Emoticon, Like, Friend, Style, Eat, Write, Play, View, Comment, Read, Present, Search, Group. Edge properties shown: count, price, share, rating, length, level, keyword, size.]
Our Social Graph
[Figure: the same graph with concrete values. Vertices: Music ID 603, Ad ID 603, Message ID 201, Item ID 13, Post ID 97, Game ID 1984. Edges: Listen (count: 6), Advertise (ctr: 0.32), Message (length: 9), Style (share: 3), Friend, Write (length: 3), Play (level: 6), Search (keyword: "HBase"), Comment (length: 15). Each edge carries an affinity value between 1 and 9.]
Technical Challenges
1. Large social graph, constantly changing
   a. Scale, more than:
      social network: 10 billion edges, 200 million vertices, 50 million updates on existing edges
      user activities: over 1 billion new edges per day
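To put the daily figure in per-second terms, here is a small worked calculation. It assumes the load is spread evenly over the day, which the slide does not claim; real traffic would peak well above this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Lower bound taken from the slide: "over 1 billion new edges per day".
new_edges_per_day = 1_000_000_000

# Assumption: uniform load over the day (an average, not a peak).
avg_new_edges_per_second = new_edges_per_day / SECONDS_PER_DAY

print(f"{avg_new_edges_per_second:,.0f} new edges per second on average")
```

So the store has to absorb well over ten thousand edge inserts per second on average before counting updates to existing edges.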
Technical Challenges (cont)
2. Low latency for breadth-first search traversal on connected data
   a. Performance requirement:
      peak graph-traversing queries per second: 20,000
      response time: 100 ms
Technical Challenges (cont)
3. Realtime update capabilities for viral effects
[Figure: Person A, Person B, Person C, and Person D linked by three arrows, each labeled "Fast". Activities shown: Post, Comment, Sharing, Mention.]
Technical Challenges (cont)
4. Support for dynamic ranking logic
   a. Push strategy: hard to change data ranking logic dynamically.
   b. Pull strategy: enables users to try out various data ranking logics.
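The difference can be sketched in a few lines. With a pull strategy, ranking happens at read time, so the scoring function is just a parameter of the query. The edge records and the `pull_rank` helper below are hypothetical and only illustrate the idea; they are not S2Graph's API.

```python
from typing import Callable, Dict, List

Edge = Dict[str, float]  # hypothetical edge record: property name -> value


def pull_rank(edges: List[Edge], score: Callable[[Edge], float], k: int) -> List[Edge]:
    """Rank edges at query time with whatever scoring function the caller supplies."""
    return sorted(edges, key=score, reverse=True)[:k]


edges = [
    {"id": 1, "affinity": 0.9, "timestamp": 100},
    {"id": 2, "affinity": 0.2, "timestamp": 300},
    {"id": 3, "affinity": 0.5, "timestamp": 200},
]

# Two ranking logics over the same stored data, chosen per request.
by_affinity = pull_rank(edges, lambda e: e["affinity"], k=2)
by_recency = pull_rank(edges, lambda e: e["timestamp"], k=2)

print([e["id"] for e in by_affinity])  # [1, 3]
print([e["id"] for e in by_recency])   # [2, 3]
```

A push strategy would have materialized one of these orderings at write time, so switching to the other would require rewriting stored data.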
Before
[Figure: Messaging, SNS, and Blog app servers each connect directly to several databases: Friend relationship, SNS feeds, Blog user activities, Messaging.]
Each app server must know each DB's sharding logic.
Highly inter-connected architecture.
After
[Figure: Messaging, SNS, and Blog app servers all connect to a single S2Graph DB.]
Stateless app servers.
daumkakao What is S2Graph?
What is S2Graph?
Storage-as-a-Service + Graph API = Realtime Breadth First Search
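As a rough illustration of what a step-wise breadth-first search means here, the sketch below walks an in-memory adjacency map one step at a time. The `Graph` type, the `traverse` function, and the vertex names are hypothetical, and the sketch assumes each step's limit applies per source vertex; it says nothing about how S2Graph stores or fetches edges in HBase.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory store: (edge label, source vertex) -> target vertices
Graph = Dict[Tuple[str, str], List[str]]


def traverse(graph: Graph, start: List[str], steps: List[Tuple[str, int]]) -> List[str]:
    """Follow one edge label per step, keeping at most `limit` edges per source vertex."""
    frontier = list(start)
    for label, limit in steps:
        next_frontier: List[str] = []
        seen = set()
        for vertex in frontier:
            for target in graph.get((label, vertex), [])[:limit]:
                if target not in seen:  # avoid visiting a vertex twice in one step
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return frontier


graph: Graph = {
    ("user_chat_rooms", "user:1"): ["room:a", "room:b"],
    ("chat_room_messages", "room:a"): ["msg:1", "msg:2", "msg:3"],
    ("chat_room_messages", "room:b"): ["msg:4"],
}

result = traverse(graph, ["user:1"], [("user_chat_rooms", 100), ("chat_room_messages", 2)])
print(result)  # ['msg:1', 'msg:2', 'msg:4']
```

Each step expands the whole frontier before the next step begins, which is what makes the traversal breadth-first.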
Example: Messenger Data Model
[Figure: a user Participates in a Chat Room; the Chat Room Contains messages.]
Recent messages in my chat rooms.
SELECT b.*
FROM user_chat_rooms a, chat_room_messages b
WHERE a.user_id = 1
  AND a.chat_room_id = b.chat_room_id
  AND b.created_at >= yesterday
Example: Messenger Data Model
Recent messages in my chat rooms.
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "user_chat_rooms", "direction": "out", "limit": 100}],
    [{"label": "chat_room_messages", "direction": "out", "limit": 10, "where": "created_at >= yesterday"}]
  ]
}
'
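Since every example in this deck posts a JSON body of the same shape, a small helper can assemble it. The sketch below only builds and serializes the payload; `build_query` is a hypothetical name, and the field names are copied from the slide rather than from any API reference.

```python
import json
from typing import Any, Dict, List


def build_query(user_id: int, steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble a getEdges-style body: one source vertex, one edge label per step."""
    return {
        "srcVertices": [
            {"serviceName": "s2graph", "columnName": "user_id", "id": user_id}
        ],
        # Each step is itself a list, as in the slide's payload.
        "steps": [[step] for step in steps],
    }


payload = build_query(1, [
    {"label": "user_chat_rooms", "direction": "out", "limit": 100},
    {"label": "chat_room_messages", "direction": "out", "limit": 10,
     "where": "created_at >= yesterday"},
])

body = json.dumps(payload, indent=2)
print(body)
```

Building the body with `json.dumps` also guarantees straight quotes and balanced brackets, which hand-written payloads easily get wrong.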
Example: News Feed (cont)
[Figure: Friends edges lead from me to my friends; create/like/share edges lead from friends to Post 1, Post 2, Post 3.]
Posts that my friends interacted with.
SELECT a.*, b.*
FROM friends a, user_posts b
WHERE a.user_id = b.user_id
  AND b.updated_at >= yesterday
  AND b.action_type IN ('create', 'like', 'share')
Example: News Feed (cont)
Posts that my friends interacted with.
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "friends", "direction": "out", "limit": 100}],
    [{"label": "user_posts", "direction": "out", "limit": 10, "where": "created_at >= yesterday"}]
  ]
}
'
Example: Recommendation (User-based CF) (cont)
[Figure: Similar Users edges (computed in batch) lead from me to similar users; user-product interaction edges (click/buy/like/share) lead from those users to Product 1, Product 2, Product 3.]
Products that similar users interacted with recently.
SELECT a.*, b.*
FROM similar_users a, user_products b
WHERE a.sim_user_id = b.user_id
  AND b.updated_at >= yesterday
Example: Recommendation (User-based CF) (cont)
Products that similar users interacted with recently.
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
  "filterOut": {
    "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
    "steps": [[{"label": "user_products_interact"}]]
  },
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "similar_users", "direction": "out", "limit": 100, "where": "similarity > 0.2"}],
    [{"label": "user_products_interact", "direction": "out", "limit": 10, "where": "created_at >= yesterday and price >= 1000"}]
  ]
}
'
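The `filterOut` block in the query above reads as a second traversal whose results are removed from the main result, so products the user already interacted with are not recommended again. The sketch below shows that reading as a plain set difference; it is an assumption about the semantics drawn from the slide, and `filter_out` is a hypothetical helper rather than S2Graph code.

```python
from typing import Iterable, List


def filter_out(candidates: Iterable[str], already_seen: Iterable[str]) -> List[str]:
    """Drop candidates that also appear in the filterOut result, preserving order."""
    excluded = set(already_seen)
    kept: List[str] = []
    for candidate in candidates:
        if candidate not in excluded:
            excluded.add(candidate)  # also de-duplicates repeated candidates
            kept.append(candidate)
    return kept


# Products reached through similar users (main query) ...
from_similar_users = ["p1", "p2", "p3", "p2"]
# ... and products I already interacted with (filterOut query).
my_products = ["p2"]

recommended = filter_out(from_similar_users, my_products)
print(recommended)  # ['p1', 'p3']
```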
Example: Recommendation (Item-based CF) (cont)
[Figure: user-product interaction edges (click/buy/like/share) lead from me to Product 1, Product 2, Product 3; Similar Products edges (computed in batch) lead from each of those to further products.]
Products that are similar to those I have shown interest in.
SELECT a.*, b.*
FROM similar_ a, user_products b
WHERE a.sim_user_id = b.user_id
  AND b.updated_at >= yesterday
Example: Recommendation (Item-based CF) (cont)
Products that are similar to those I have shown interest in.
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "user_products_interact", "direction": "out", "limit": 100, "where": "created_at >= yesterday and price >= 1000"}],
    [{"label": "similar_products", "direction": "out", "limit": 10, "where": "similarity > 0.2"}]
  ]
}
'
Example: Recommendation (Content + Most popular) (cont)
[Figure: user-product interaction edges (click/buy/like/share) lead from me to Product 1, Product 2, Product 3; the products map to Category 1 and Category 2; each category has a TopK (k=1) product per time unit (day), shown as Product 10 and Product 20 for Today and Yesterday.]
Daily top product per category among the products that I liked.
SELECT c.*
FROM user_products a, product_categories b, category_daily_top_products c
WHERE a.user_id = 1
  AND a.product_id = b.product_id
  AND b.category_id = c.category_id
  AND c.time BETWEEN yesterday AND today
Example: Recommendation (Content + Most popular) (cont)
Daily top product per category among the products that I liked.
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "user_products_interact", "direction": "out", "limit": 100, "where": "created_at >= yesterday and price >= 1000"}],
    [{"label": "product_cates", "direction": "out", "limit": 3}],
    [{"label": "category_products_topK", "direction": "out", "limit": 10}]
  ]
}
'
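The last step relies on a precomputed "top K products per category per day" edge. The slides do not say how that edge is produced, so the sketch below is only one plausible way to derive it from raw interaction events; the event tuples and `top_k_per_category` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, str]  # hypothetical: (category, day, product)


def top_k_per_category(events: Iterable[Event], k: int = 1) -> Dict[Tuple[str, str], List[str]]:
    """Count interactions per (category, day) and keep the k most frequent products."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for category, day, product in events:
        counts[(category, day)][product] += 1
    return {
        key: [product for product, _ in counter.most_common(k)]
        for key, counter in counts.items()
    }


events = [
    ("Category1", "today", "Product10"),
    ("Category1", "today", "Product10"),
    ("Category1", "today", "Product20"),
    ("Category1", "yesterday", "Product20"),
]

top = top_k_per_category(events, k=1)
print(top[("Category1", "today")])      # ['Product10']
print(top[("Category1", "yesterday")])  # ['Product20']
```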
Example: Recommendation (Spreading Activation) (cont)
[Figure: user-product interaction edges (click/buy/like/share) between users and Product 1, Product 2, Product 3.]
Products interacted with by users who interacted with the products I interacted with.
SELECT b.product_id, count(*)
FROM user_products a, user_products b
WHERE a.user_id = 1
  AND a.product_id = b.product_id
GROUP BY b.product_id
Example: Recommendation (Spreading Activation) (cont)
Products interacted with by users who interacted with the products I interacted with.
curl -XPOST localhost:9000/graphs/getEdges -H 'Content-Type: Application/json' -d '
{
  "srcVertices": [{"serviceName": "s2graph", "columnName": "user_id", "id": 1}],
  "steps": [
    [{"label": "user_products_interact", "direction": "out", "limit": 100, "where": "created_at >= yesterday and price >= 1000"}],
    [{"label": "user_products_interact", "direction": "in", "limit": 10, "where": "created_at >= today"}],
    [{"label": "user_products_interact", "direction": "out", "limit": 10, "where": "created_at >= 1 hour ago"}]
  ]
}
'
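The three steps (out, in, out) can be mimicked on a toy in-memory map to show how scores accumulate. This is a sketch under simplifying assumptions: the data is invented, the time and price filters are ignored, each co-interacting user contributes one vote per product, and the user's own products are excluded from the result. `spreading_activation` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical data: user -> products they interacted with
interactions: Dict[str, List[str]] = {
    "me": ["p1", "p2"],
    "u2": ["p1", "p3"],
    "u3": ["p2", "p3", "p4"],
    "u4": ["p5"],
}


def spreading_activation(data: Dict[str, List[str]], user: str) -> List[Tuple[str, int]]:
    """Score products reached via users who share at least one product with `user`."""
    mine = set(data.get(user, []))            # step 1: my products ("out")
    scores: Counter = Counter()
    for other, products in data.items():
        if other == user:
            continue
        if mine & set(products):              # step 2: users on my products ("in")
            for product in set(products):     # step 3: their products ("out")
                if product not in mine:
                    scores[product] += 1
    # Sort by score (descending), then by product name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranked = spreading_activation(interactions, "me")
print(ranked)  # [('p3', 2), ('p4', 1)]
```

Products reached through more co-interacting users end up with higher scores, which is the ranking signal this pattern provides.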