
POLARDB for MyRocks: Extending shared storage to MyRocks, by Zhang, Yuan (presentation slides)

Presented by Yuan Zhang (Alibaba Cloud), April 2018. The speaker is a database engineer who has worked at Alibaba for 5 years, focusing on MySQL and MyRocks.


  1. POLARDB for MyRocks: Extending shared storage to MyRocks. Zhang, Yuan, Alibaba Cloud, Apr 2018

  2. About me
     • Yuan Zhang
     • Database engineer
     • Has worked at Alibaba for 5 years
     • Focus on MySQL & MyRocks
     • Email: zhangyuan.zy@alibaba-inc.com
     MORE THAN JUST CLOUD

  3. Agenda
     • Background
     • Basic Architecture
     • Implementation details
     • Performance Improvement
     • Future plan

  4. Background: Why POLARDB for MyRocks
     MyRocks + Polarstore
     Benefits from MyRocks
     • Great space efficiency, better compression
     • Great write efficiency, lower write amplification
     • Fast data loading
     • Compatible with MySQL
     Benefits from shared storage (Polarstore)
     • Data consistency is ensured
     • Read nodes can be added immediately, without a full copy of the data

  5. Basic Architecture
     Primary
     • Accepts read/write workload
     Replica
     • Accepts read workload only
     • Shares SST/WAL files with the Primary

  6. Let’s Begin: Prepare for RocksDB WAL replication
     • Based on AliSQL 5.7
     • Port MyRocks from Facebook
     • Only the RocksDB and MyISAM engines are supported
     • Convert system tables to RocksDB

  7. Convert system tables to RocksDB: Prepare for RocksDB WAL replication
     • Convert system tables to RocksDB
     • Exceptions: mysql.slow_log and mysql.general_log are stored on local disk, so the Primary and each Replica have their own mysql.slow_log and mysql.general_log tables.

  8. RocksDB WAL/Manifest replication: Architecture

  9. RocksDB WAL/Manifest replication: Asynchronous replication
     WAL Replication
     • Replay PUT/DELETE/MERGE
     Manifest Replication
     • Replay flush & compaction
     WAL and Manifest Coordination
     • Apply a VEdit only when applied LSN > VEdit LSN
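The coordination rule on this slide can be sketched as a small queue on the Replica. This is only an illustration under assumptions: `VEdit`, `ReplicaApplier`, and the `id` payload are hypothetical names, not RocksDB or POLARDB types, and the sketch assumes manifest edits arrive in LSN order.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a manifest record; not a RocksDB type.
struct VEdit {
  uint64_t lsn;  // WAL position associated with the edit
  int id;        // illustrative payload
};

class ReplicaApplier {
 public:
  // Record how far WAL replay has progressed.
  void AdvanceAppliedLsn(uint64_t lsn) {
    if (lsn > applied_lsn_) applied_lsn_ = lsn;
  }

  // Queue an edit read from the manifest (assumed to arrive in LSN order).
  void Enqueue(const VEdit& edit) { pending_.push_back(edit); }

  // Apply every queued edit for which applied LSN > VEdit LSN,
  // and return the ids of the edits that were applied.
  std::vector<int> ApplyReady() {
    std::vector<int> applied;
    while (!pending_.empty() && applied_lsn_ > pending_.front().lsn) {
      applied.push_back(pending_.front().id);
      pending_.pop_front();
    }
    return applied;
  }

 private:
  uint64_t applied_lsn_ = 0;
  std::deque<VEdit> pending_;
};
```

Edits that are not yet covered by WAL replay simply stay queued until a later `AdvanceAppliedLsn` call passes them.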

  10. RocksDB WAL/Manifest replication: Control Primary WAL and SST file deletion
     WAL deletion: the original WAL deletion logic would cause a Replica to lose WAL files it still needs
     • Lm: min_log_number on the Primary
     • Ln: min_log_number across all Replicas
     • new_min_log_number = min(Lm, Ln)
     • A WAL can be deleted when its number < new_min_log_number
     SST deletion: the original SST deletion logic would leave a Replica unable to find an SST, causing it to crash
     • min_version_number: the minimal version number a Replica is using
     • An SST can be deleted only when it will no longer be used by the Primary or any Replica
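A minimal sketch of the two deletion rules follows. The function names are hypothetical. The WAL rule is taken directly from the slide; the SST rule assumes that each obsolete SST records the version in which it was removed from the LSM tree, which the slide does not state.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// new_min_log_number = min(Lm, Ln), where Ln is taken over all Replicas.
uint64_t NewMinLogNumber(uint64_t primary_min_log,
                         const std::vector<uint64_t>& replica_min_logs) {
  uint64_t result = primary_min_log;
  for (uint64_t ln : replica_min_logs) result = std::min(result, ln);
  return result;
}

// A WAL file is deletable once its number is below the combined minimum.
bool CanDeleteWal(uint64_t wal_number, uint64_t new_min_log_number) {
  return wal_number < new_min_log_number;
}

// Assumption: an SST leaves the tree at `removed_in_version`, so a reader
// still on an older version may need to open it.
bool CanDeleteSst(uint64_t removed_in_version, uint64_t min_version_in_use) {
  return min_version_in_use >= removed_in_version;
}
```

With the Primary at log 12 and Replicas at logs 9 and 15, only WAL files numbered below 9 are deletable.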

  11. DDL & Cache replication: Architecture

  12. DDL Replication: Remove frm, par files
     Frm, par files
     • Table metadata information
     • If the Primary and Replica share frm and par files, DDL replication must be synchronous
     Remove frm, par files
     • Store their contents in RocksDB
     • A Replica can read multiple versions of a table schema
     • DDL replication is asynchronous

  13. DDL Replication: Remove frm, par files
     DDL replication is asynchronous
     • Multiple table schema versions in RocksDB
     • Row data also has different versions

  14. DDL Replication
     An MDL lock protects DDL operations on the Primary. The same lock is also needed when the Replica replays DDL.
     Primary
     • Log MDL lock start and end.
     Replica
     • Replay MDL lock start
       A. Lock MDL
     • Replay MDL lock end
       A. Update the table cache in MyRocks
       B. Unlock MDL
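The Replica-side replay order can be sketched as follows. `ReplicaDdlReplayer` is a hypothetical class: a set of table names stands in for real MDL locks, and an integer schema version stands in for the MyRocks table cache entry.

```cpp
#include <map>
#include <set>
#include <string>

class ReplicaDdlReplayer {
 public:
  // Replay "MDL lock start": take the (simulated) lock so readers wait.
  void ReplayStart(const std::string& table) { locked_.insert(table); }

  // Replay "MDL lock end": refresh the cached schema first, then unlock,
  // so no reader sees the old cache entry after the lock is released.
  void ReplayEnd(const std::string& table, int new_schema_version) {
    table_cache_[table] = new_schema_version;
    locked_.erase(table);
  }

  bool IsLocked(const std::string& table) const {
    return locked_.count(table) != 0;
  }

  // Returns 0 when the table has no cached schema yet.
  int CachedVersion(const std::string& table) const {
    std::map<std::string, int>::const_iterator it = table_cache_.find(table);
    return it == table_cache_.end() ? 0 : it->second;
  }

 private:
  std::set<std::string> locked_;
  std::map<std::string, int> table_cache_;
};
```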

  15. Cache Replication: ACL, Procedure, Query cache replication
     Primary
     • Log cache changes (ACL, Procedure) in the RocksDB WAL
     Replica
     • Replay the change from the WAL and invalidate the cache

  16. Index Statistics Replication
     Persistent
     • Partial index statistics are persisted in each SST
     • Total index statistics are stored in INDEX_STATISTICS
     Memory
     • Rdb_key_def::m_stats
     Update
     • Analyze table
     • Flush memtable
     • Compact
     The Replica listens for PUT operations on INDEX_STATISTICS and reloads the statistics into memory.

  17. New Log Format: Log changes for replication
     Log types
     • DDL (START, END)
     • Cache change, ACL/Proc
     Log format
     • PUT/DELETE
     Log store location
     • __system__ column family

  18. New Log Format: New type in data dictionary
     // Data dictionary types
     enum DATA_DICT_TYPE {
       DDL_ENTRY_INDEX_START_NUMBER = 1,
       INDEX_INFO = 2,
       CF_DEFINITION = 3,
       BINLOG_INFO_INDEX_NUMBER = 4,
       DDL_DROP_INDEX_ONGOING = 5,
       INDEX_STATISTICS = 6,
       MAX_INDEX_ID = 7,
       DDL_CREATE_INDEX_ONGOING = 8,
       POLAR_LOG = 100, // for polar replication
       END_DICT_INDEX_ID = 255
     };
     enum POLAR_LOG_TYPE {
       TABLE_DDL = 1,
       CACHE_CHANGE = 2,
       ……
       END_POLAR_ROCK_TYPE = 255
     };

  19. New Log Format: New type in data dictionary
     DDL_START
     • type: PUT
     • key: POLAR_LOG+TABLE_DDL+dbname.tablename
     • value: NULL
     DDL_END
     • type: DELETE
     • key: POLAR_LOG+TABLE_DDL+dbname.tablename
     • value: NULL
     CACHE_CHANGE
     • type: PUT
     • key: POLAR_LOG+CACHE_CHANGE+ACL/Proc
     • value: NULL
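One way such keys could be assembled is sketched below. The enum values come from the data dictionary slide; the 4-byte big-endian encoding of each prefix and the helper names `AppendBigEndian32` and `MakePolarLogKey` are assumptions made for illustration, since the slides do not specify the byte layout.

```cpp
#include <cstdint>
#include <string>

// Values as listed on the data dictionary slide.
const uint32_t POLAR_LOG = 100;
enum POLAR_LOG_TYPE { TABLE_DDL = 1, CACHE_CHANGE = 2 };

// Assumed encoding: each numeric prefix is stored as 4 big-endian bytes.
void AppendBigEndian32(std::string* out, uint32_t value) {
  for (int shift = 24; shift >= 0; shift -= 8) {
    out->push_back(static_cast<char>((value >> shift) & 0xff));
  }
}

// Builds POLAR_LOG + <log type> + <suffix>, e.g. suffix "dbname.tablename".
std::string MakePolarLogKey(POLAR_LOG_TYPE type, const std::string& suffix) {
  std::string key;
  AppendBigEndian32(&key, POLAR_LOG);
  AppendBigEndian32(&key, static_cast<uint32_t>(type));
  key += suffix;
  return key;
}
```

Under this layout DDL_START would be a PUT of `MakePolarLogKey(TABLE_DDL, "dbname.tablename")` with an empty value, and DDL_END a DELETE of the same key.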

  20. New Log Format: Problems
     DDL_START and DDL_END must be a pair.
     DDL_START
     • type: PUT
     • key: POLAR_LOG+TABLE_DDL+dbname.tablename
     • value: NULL
     DDL_END
     • type: DELETE
     • key: POLAR_LOG+TABLE_DDL+dbname.tablename
     • value: NULL
     Problem 1: Primary Crash
     • If the Primary crashes after DDL_START, it resends DDL_START when it restarts, and the DDL_END for the previous DDL_START is lost.
     • The Replica replays the first DDL_START and holds the MDL lock, which no DDL_END will ever unlock.

  21. New Log Format: Problems
     DDL_START and DDL_END must be a pair.
     Problem 1: Primary Crash
     • If the Primary crashes after DDL_START, it resends DDL_START when it restarts, and the DDL_END for the previous DDL_START is lost.
     • The Replica replays the first DDL_START and holds the MDL lock, which no DDL_END will ever unlock.
     Solution
     • On restart, the Primary scans RocksDB for TABLE_DDL records. If one is found, the Primary resends DDL_END, and the Replica releases the old lock.
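The Primary-side fix can be sketched as a prefix scan at startup. Everything here is a simplification under assumptions: an ordered `std::map` stands in for the `__system__` column family, the text prefix `kDdlPrefix` stands in for the POLAR_LOG+TABLE_DDL key prefix, and `RecoverPrimary` is a hypothetical name.

```cpp
#include <map>
#include <string>
#include <vector>

enum class OpType { kPut, kDelete };

struct LogOp {
  OpType type;
  std::string key;
};

// Hypothetical readable stand-in for the POLAR_LOG+TABLE_DDL prefix.
const std::string kDdlPrefix = "POLAR_LOG/TABLE_DDL/";

// On restart, every surviving TABLE_DDL record is a DDL_START whose
// DDL_END was never written. Emit the missing DDL_END (a DELETE) for each.
std::vector<LogOp> RecoverPrimary(std::map<std::string, std::string>& dict) {
  std::vector<LogOp> emitted;
  std::map<std::string, std::string>::iterator it = dict.lower_bound(kDdlPrefix);
  while (it != dict.end() &&
         it->first.compare(0, kDdlPrefix.size(), kDdlPrefix) == 0) {
    LogOp op = {OpType::kDelete, it->first};
    emitted.push_back(op);
    it = dict.erase(it);  // erase returns the next iterator
  }
  return emitted;
}
```

The returned operations are what a Replica would then replay as DDL_END to release the stale lock.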

  22. New Log Format: Problems
     DDL_START and DDL_END must be a pair.
     Problem 2: Replica Crash
     • If the Replica crashes after DDL_START, it continues by replaying DDL_END when it restarts.
     • But the lock taken for DDL_START no longer exists after the restart, so the Replica replays DDL_END to unlock an MDL lock that does not exist.

  23. New Log Format: Problems
     DDL_START and DDL_END must be a pair.
     Problem 2: Replica Crash
     • If the Replica crashes after DDL_START, it continues by replaying DDL_END when it restarts.
     • But the lock taken for DDL_START no longer exists after the restart, so the Replica replays DDL_END to unlock an MDL lock that does not exist.
     Solution
     • On restart, the Replica scans RocksDB for TABLE_DDL records. If one is found, the Replica replays DDL_START to take the lock again.
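A sketch of the Replica-side fix, under the same simplifications: a `std::map` stands in for the `__system__` column family, `kDdlPrefix` is a hypothetical readable prefix, and a set of table names stands in for held MDL locks.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical readable stand-in for the POLAR_LOG+TABLE_DDL prefix.
const std::string kDdlPrefix = "POLAR_LOG/TABLE_DDL/";

// On restart, re-take a lock for every outstanding TABLE_DDL record,
// which is equivalent to replaying its DDL_START again.
std::set<std::string> RecoverReplicaLocks(
    const std::map<std::string, std::string>& dict) {
  std::set<std::string> locks;
  std::map<std::string, std::string>::const_iterator it =
      dict.lower_bound(kDdlPrefix);
  for (; it != dict.end() &&
         it->first.compare(0, kDdlPrefix.size(), kDdlPrefix) == 0;
       ++it) {
    locks.insert(it->first.substr(kDdlPrefix.size()));
  }
  return locks;
}

// Replaying DDL_END releases the lock; returns false if none was held.
bool ReplayDdlEnd(std::set<std::string>& locks, const std::string& table) {
  return locks.erase(table) == 1;
}
```

Without the recovery scan, the first `ReplayDdlEnd` after a restart would return false, which is the situation described in Problem 2.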

  24. MVCC: MVCC based on RocksDB snapshot
     Keep a consistent snapshot in Replica
     • A Replica can no longer read a record once the Primary has compacted it away
     Control compaction in Primary
     • Compaction on the Primary must take the Replica’s snapshots into account
     • Only delete a record when sequence >= Sn, where Sn is the last sequence in the Replica
     • The Primary’s snapshot list is merged with the Replica’s snapshot list.
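The last bullet, merging the snapshot lists, can be illustrated with a small sketch. It covers only the list merge and uses the general snapshot-visibility rule for dropping an overwritten version; it does not model the Sn condition, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Sorted union of the snapshot sequence numbers held on both sides.
std::vector<uint64_t> MergeSnapshots(std::vector<uint64_t> primary,
                                     std::vector<uint64_t> replica) {
  std::sort(primary.begin(), primary.end());
  std::sort(replica.begin(), replica.end());
  std::vector<uint64_t> merged;
  std::set_union(primary.begin(), primary.end(),
                 replica.begin(), replica.end(),
                 std::back_inserter(merged));
  return merged;
}

// A version written at old_seq and overwritten at new_seq may be dropped
// only if no snapshot s satisfies old_seq <= s < new_seq, because such a
// snapshot still needs to read the old version.
bool CanDropOldVersion(uint64_t old_seq, uint64_t new_seq,
                       const std::vector<uint64_t>& sorted_snapshots) {
  std::vector<uint64_t>::const_iterator it = std::lower_bound(
      sorted_snapshots.begin(), sorted_snapshots.end(), old_seq);
  return it == sorted_snapshots.end() || *it >= new_seq;
}
```

With a Primary snapshot at 30 and a Replica snapshot at 12, a version written at 10 and overwritten at 20 looks droppable from the Primary's list alone, but not from the merged list.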

  25. MVCC: MVCC based on RocksDB snapshot. Keep a consistent snapshot in Replica

  26. Performance Improvement: Optimize write performance
     • Async-commit
     • Optimize auto_increment

  27. Performance Improvement: Async-commit. Original pipeline write

  28. Performance Improvement: Async-commit. Async-commit pipeline

  29. Performance Improvement: Optimize write performance
     Optimize auto_increment
     • A write needs a uniqueness check
     • Do a Get first, then write
     • Get is expensive
     In practice, most auto_increment uniqueness checks are unnecessary, especially when every auto_increment value is generated automatically.

  30. Performance Improvement: Optimize write performance
     Optimize auto_increment
     • max_specify_pk: the maximum auto_increment value specified by the user
     • If pk > max_specify_pk, skip the unique check
     • If pk <= max_specify_pk, the unique check is needed
     max_specify_pk is updated whenever the user specifies an auto_increment value
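The rule can be sketched as a small policy object. `UniqueCheckPolicy` is a hypothetical name, and the sketch adds one assumption the slide does not state: a value specified by the user is always checked, so only generated values can skip the Get.

```cpp
#include <cstdint>

class UniqueCheckPolicy {
 public:
  // Assumption: user-specified values are always checked. Generated
  // values are checked only when they do not exceed max_specify_pk.
  bool NeedsCheck(uint64_t pk, bool user_specified) const {
    return user_specified || pk <= max_specify_pk_;
  }

  // Track the largest auto_increment value the user has specified.
  void OnInsert(uint64_t pk, bool user_specified) {
    if (user_specified && pk > max_specify_pk_) max_specify_pk_ = pk;
  }

  uint64_t max_specify_pk() const { return max_specify_pk_; }

 private:
  uint64_t max_specify_pk_ = 0;
};
```

A table whose keys are all generated never raises `max_specify_pk`, so every insert skips the Get.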

  31. Future
     Feature
     • Online DDL
     • Multiple-Master
     Performance
     • Compaction optimization

  32. Q&A
