radosgw bucket index sharding
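Each key takes up roughly 200 B in the index of its dir/bucket. When a dir/bucket
holds a very large number of keys, the dir object becomes very large. Not only does
the OSD holding that dir object use a lot of memory, but all writes to the object
are also blocked while the dir object is being migrated [1].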

    [root@yhg-2 cmds]# rados -p .rgw.buckets.index listomapvals .dir.yhg-yhg.5457.29 --cluster yhg | more
    afuhan
    value: (198 bytes) :
    0000 : 08 03 c0 00 00 00 06 00 00 00 61 66 75 68 61 6e : ..........afuhan
    0010 : 02 00 00 00 00 00 00 00 01 04 03 75 00 00 00 01 : ...........u....
    0020 : 09 00 00 00 00 00 00 00 fc b7 10 57 00 00 00 00 : ...........W....
    0030 : 20 00 00 00 62 62 62 38 61 61 65 35 37 63 31 30 :  ...bbb8aae57c10
    0040 : 34 63 64 61 34 30 63 39 33 38 34 33 61 64 35 65 : 4cda40c93843ad5e
    0050 : 36 64 62 38 03 00 00 00 78 78 31 11 00 00 00 5a : 6db8....xx1....Z
    0060 : 6f 6e 65 20 75 73 65 72 20 66 6f 72 20 79 68 67 : one user for yhg
    0070 : 18 00 00 00 61 70 70 6c 69 63 61 74 69 6f 6e 2f : ....application/
    0080 : 6f 63 74 65 74 2d 73 74 72 65 61 6d 09 00 00 00 : octet-stream....
    0090 : 00 00 00 00 00 00 00 00 00 00 00 00 01 01 02 00 : ................
    00a0 : 00 00 01 02 04 0f 00 00 00 79 68 67 2d 79 68 67 : .........yhg-yhg
    00b0 : 2e 31 34 31 30 32 2e 37 00 00 00 00 00 00 00 00 : .14102.7........
    00c0 : 00 00 00 00 00 00 : ......

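Starting with Hammer, the bucket index shard feature is available: the index object
of a bucket is split into multiple objects. The feature cannot be applied to buckets
that already exist.

Bucket index sharding touches four kinds of operations [2]:

- bucket creation
- put/copy/delete of objects
  (put is used as the example in the analysis)
- bucket listing/stats/fix indexing
  (list is used as the example in the analysis)
- bucket index log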
1. Configuring index shards

There are two ways to configure it.

Via the region map:

    # fetch the region configuration
    [root@yhg-2 cmds]# radosgw-admin region get --cluster yhg > /tmp/region
    # change the value of 'bucket_index_max_shards' to 4
    [root@yhg-2 cmds]# vim /tmp/region
    # update the region configuration
    [root@yhg-2 cmds]# radosgw-admin region put --cluster yhg < /tmp/region
    # push the change into the region map
    [root@yhg-2 cmds]# radosgw-admin regionmap update --cluster yhg

Via the cluster configuration file:

    [client.radosgw.yhg-yhg-yhg-2]
    rgw frontends = "civetweb port=80"
    rgw bucket index max shards = 4

.. Note:: The radosgw instance must be restarted for the setting to take effect.
2. Verification

Create a bucket named mmm.

    # look up the bucket id
    [root@yhg-2 cmds]# radosgw-admin metadata get bucket:mmm --cluster yhg | grep bucket_id
    "bucket_id": "yhg-yhg.14236.1"
    # inspect the bucket instance info
    [root@yhg-2 cmds]# radosgw-admin metadata get bucket.instance:mmm:yhg-yhg.14236.1 --cluster yhg | grep shard
    "num_shards": 4,
    "bi_shard_hash_type": 0
    # list the .dir object shards
    [root@yhg-2 cmds]# rados -p .rgw.buckets.index ls --cluster yhg | grep "yhg-yhg.14236.1"
    .dir.yhg-yhg.14236.1.2
    .dir.yhg-yhg.14236.1.0
    .dir.yhg-yhg.14236.1.1
    .dir.yhg-yhg.14236.1.3

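3. Choosing a value for bucket_index_max_shards

According to [1], this value should not be set too high, otherwise bucket list
operations suffer badly. The right value has to be determined by testing.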
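
To get a feel for the trade-off before testing, the per-key figure from the introduction can be turned into a rough size estimate. The sketch below is only back-of-the-envelope arithmetic: the 200-byte constant is the approximate figure quoted in this article, the function names are hypothetical, and an even spread of keys across shards is assumed.

```cpp
#include <cstdint>

// Approximate index cost per key, taken from the ~200 B figure in the text.
// This is an estimate for sizing, not a constant defined by Ceph.
const std::uint64_t kApproxBytesPerKey = 200;

// Estimated omap payload of one unsharded index object.
std::uint64_t estimate_index_bytes(std::uint64_t num_keys) {
    return num_keys * kApproxBytesPerKey;
}

// Estimated payload of the fullest shard, assuming keys hash evenly.
// num_shards == 0 is treated as "sharding disabled" (a single object).
std::uint64_t estimate_bytes_per_shard(std::uint64_t num_keys,
                                       std::uint64_t num_shards) {
    const std::uint64_t shards = (num_shards == 0) ? 1 : num_shards;
    const std::uint64_t keys_per_shard = (num_keys + shards - 1) / shards;  // round up
    return keys_per_shard * kApproxBytesPerKey;
}
```

Under these assumptions a bucket with 10 million keys carries about 2 GB of index data in a single object, or about 500 MB per object with 4 shards. The estimate says nothing about listing cost, which is the reason [1] advises against very large shard counts.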
4. create bucket: initializing each index shard

Creating a bucket requires initializing the bucket index object. With index sharding
enabled, several index shard objects are created. They are named .dir_XXX.$NUM, where
.dir_XXX stands for '.dir.' followed by the bucket marker.

    int RGWRados::init_bucket_index(rgw_bucket& bucket, int num_shards)
    {
      librados::IoCtx index_ctx; // context for new bucket

      // Open an IoCtx on the cluster's bucket.index_pool for the operations
      // that follow. In this example bucket.index_pool is .rgw.buckets.index.
      int r = open_bucket_index_ctx(bucket, index_ctx);
      if (r < 0)
        return r;

      // Build the name of the bucket's .dir_XXX object,
      // e.g. .dir.yhg-yhg.5457.24, where '.dir' is the common prefix and
      // yhg-yhg.5457.24 is the bucket id/marker.
      string dir_oid = dir_oid_prefix;
      dir_oid.append(bucket.marker);

      // Build the map of .dir_XXX objects. Without index sharding the map
      // holds a single entry, <0, '.dir_XXX'>. With index sharding, .dir_XXX
      // is split into num_shards objects named .dir_XXX.$NUM.
      map<int, string> bucket_objs;
      get_bucket_index_objects(dir_oid, num_shards, bucket_objs);

      // Invoke the OSD-side cls operation 'rgw bucket_init_index',
      // i.e. run CLSRGWConcurrentIO().
      return CLSRGWIssueBucketIndexInit(index_ctx, bucket_objs, cct->_conf->rgw_bucket_index_max_aio)();
    }

With index shards set to 4, bucket_objs holds four entries, .dir_XXX.0 through .dir_XXX.3.
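
The naming rule used when the index objects are created can be sketched in isolation. This is a minimal illustration built only from the behaviour described in the code comments of this section; `index_object_names` is a hypothetical helper, not the signature of Ceph's `get_bucket_index_objects`.

```cpp
#include <map>
#include <string>

// Builds the shard-id -> object-name map described in the text.
// num_shards <= 0 models "sharding disabled": one entry <0, ".dir.<marker>">.
std::map<int, std::string> index_object_names(const std::string& marker,
                                              int num_shards) {
    const std::string base = ".dir." + marker;  // common prefix + bucket marker
    std::map<int, std::string> objs;
    if (num_shards <= 0) {
        objs[0] = base;
        return objs;
    }
    for (int i = 0; i < num_shards; ++i) {
        objs[i] = base + "." + std::to_string(i);  // .dir.<marker>.<shard id>
    }
    return objs;
}
```

Called with the marker `yhg-yhg.14236.1` and 4 shards, the sketch yields the same four object names that `rados ls` reported in the verification step.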
4.1 CLSRGWConcurrentIO()

The cls operation 'rgw bucket_init_index' is invoked on the bucket's .dir_XXX objects.

    // cls/rgw/cls_rgw_client.h:246
    int operator()() {
      int ret = 0;
      iter = objs_container.begin();
      for (; iter != objs_container.end() && max_aio-- > 0; ++iter) {
        // Ends up in issue_bucket_index_init_op, i.e. the
        // cluster-side cls operation 'rgw bucket_init_index'.
        ret = issue_op(iter->first, iter->second);
        ...

4.2 The OSD-side cls operation 'rgw bucket_init_index'

It initializes the rgw_bucket_dir_header of the index object.

    // cls/rgw/cls_rgw.cc
    int rgw_bucket_init_index(cls_method_context_t hctx, bufferlist *in, bufferlist *out)

rgw_bucket_dir_header is persisted as the omap header. Its contents are as follows
(read this together with the encode method):

    // the key is header
    struct rgw_bucket_dir_header {
      map<uint8_t, rgw_bucket_category_stats> stats;
      uint64_t tag_timeout;
      uint64_t ver;          // the only field set by this operation (to 1)
      uint64_t master_ver;
      string max_marker;

5. put object: how the bucket index shard is determined

After the data chunks of an object have been written to the data pool, the bucket
index has to be updated. At the step "parse the op information in `in` and write the
new key into the omap of the bucket dir object", the operations are:

    // RGWRados::Bucket::UpdateIndex::prepare
    r = index_op.prepare(CLS_RGW_OP_ADD);

    // RGWRados::Bucket::UpdateIndex::complete
    r = index_op.complete(poolid, epoch, size,
                          ut, etag, content_type, &acl_bl,
                          meta.category, meta.remove_objs);

Both steps need to know which bucket index object to use. The shard is chosen while
the BucketShard object is initialized: prepare and complete both call
get_bucket_shard() to obtain the BucketShard. The initialization of BucketShard bs
is analyzed below.

    // rgw/rgw_rados.h
    int get_bucket_shard(BucketShard **pbs) {
      if (!bs_initialized) {
        int r = bs.init(bucket_info.bucket, obj);
        ...

    // rgw/rgw_rados.cc
    int RGWRados::open_bucket_index_shard(rgw_bucket& bucket, librados::IoCtx& index_ctx,
        const string& obj_key, string *bucket_obj, int *shard_id)
      ...
      // Set up the IO context of the pool that holds the index base object and
      // return the assembled index base object name, .dir.${bucket.marker}.
      string bucket_oid_base;
      int ret = open_bucket_index_base(bucket, index_ctx, bucket_oid_base);
      ...
      // Read the bucket description from the bucket meta object
      // (.bucket.meta.${bucket.name}:${bucket.marker}).
      // Get the bucket info
      RGWBucketInfo binfo;
      ret = get_bucket_instance_info(obj_ctx, bucket, binfo, NULL, NULL);
      ...
      // Compute the shard id with a simple hash and assemble the name of the
      // bucket index object, e.g. .dir.yhg-yhg.14236.1.1
      ret = get_bucket_index_object(bucket_oid_base, obj_key, binfo.num_shards,
          (RGWBucketInfo::BIShardsHashType)binfo.bucket_index_shard_hash_type, bucket_obj, shard_id);

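
The "simple hash" step can be illustrated with a small stand-alone sketch: hash the object key, reduce it modulo the shard count, and append the result to the base object name. The hash below is 32-bit FNV-1a, chosen only as a stand-in. Ceph uses its own string hash, so the shard ids produced here will not match a real cluster. `pick_index_object`, `ShardChoice` and the use of -1 for the unsharded case are likewise assumptions made for the illustration.

```cpp
#include <cstdint>
#include <string>

// Stand-in hash (32-bit FNV-1a). NOT the hash Ceph uses; illustration only.
std::uint32_t demo_hash(const std::string& key) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

struct ShardChoice {
    int shard_id;            // -1 when the bucket index is not sharded (assumed convention)
    std::string bucket_obj;  // name of the index object to update
};

// Maps an object key to one index shard of the bucket.
ShardChoice pick_index_object(const std::string& oid_base,
                              const std::string& obj_key,
                              std::uint32_t num_shards) {
    if (num_shards == 0) {
        return {-1, oid_base};  // single index object, no suffix
    }
    const int sid = static_cast<int>(demo_hash(obj_key) % num_shards);
    return {sid, oid_base + "." + std::to_string(sid)};
}
```

The property that matters is that the mapping is deterministic: put, copy and delete of the same key always reach the same shard, so no lookup table is needed.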
The resulting BucketShard:

    (gdb) print *bs
    $62 = {
      store = 0x3301c70,
      bucket = {
        name = "mmm",
        data_pool = "dpool1",
        data_extra_pool = ".rgw.buckets.extra",
        index_pool = ".rgw.buckets.index",
        marker = "yhg-yhg.14236.1",
        bucket_id = "yhg-yhg.14236.1",
        oid = ".bucket.meta.mmm:yhg-yhg.14236.1"
      },
      shard_id = 1,
      index_ctx = {
        io_ctx_impl = 0x7f5074008d10
      },
      bucket_obj = ".dir.yhg-yhg.14236.1.1"
    }

The contents of binfo are shown below. For index sharding, the fields of interest
are num_shards and bucket_index_shard_hash_type.

    (gdb) print binfo
    $53 = {
      bucket = {
        name = "mmm",
        data_pool = "dpool1",
        data_extra_pool = ".rgw.buckets.extra",
        index_pool = ".rgw.buckets.index",
        marker = "yhg-yhg.14236.1",
        bucket_id = "yhg-yhg.14236.1",
        oid = ".bucket.meta.mmm:yhg-yhg.14236.1"
      },
      owner = "xx1",
      flags = 0,
      region = "yhg",
      creation_time = 1461035367,
      placement_rule = "default-placement",
      has_instance_obj = true,
      objv_tracker = {
        read_version = {ver = 1, tag = "_TrpC7B0VOdoBEkokzucAQtd"},
        write_version = {ver = 0, tag = ""}
      },
      ep_objv = {ver = 0, tag = ""},
      quota = {
        max_size_kb = -1,
        max_objects = -1,
        enabled = false,
        max_size_soft_threshold = -1,
        max_objs_soft_threshold = -1
      },
      num_shards = 4,
      bucket_index_shard_hash_type = 0 '\000',
      static NUM_SHARDS_BLIND_BUCKET = 4294967295
    }

**Summary**

Whenever BucketShard information is needed, the pattern is:

    BucketShard bs(this);
    int ret = bs.init(bucket, obj_instance);

6. bucket listing: getting the list of all objects

    // rgw/rgw_rados.cc
    /**
     * get listing of the objects in a bucket.
     * bucket: bucket to list contents of
     * max: maximum number of results to return
     * prefix: only return results that match this prefix
     * delim: do not include results that match this string.
     *     Any skipped results will have the matching portion of their name
     *     inserted in common_prefixes with a "true" mark.
     * marker: if filled in, begin the listing with this object.
     * result: the objects are put in here.
     * common_prefixes: if delim is filled in, any matching prefixes are placed
     *     here.
     */
    int RGWRados::Bucket::List::list_objects(int max, vector<RGWObjEnt> *result,
                                             map<string, bool> *common_prefixes,
                                             bool *is_truncated)

    int RGWRados::cls_bucket_list(rgw_bucket& bucket, rgw_obj_key& start, const string& prefix,
                                  uint32_t num_entries, bool list_versions, map<string, RGWObjEnt>& m,
                                  bool *is_truncated, rgw_obj_key *last_entry,
                                  bool (*force_check_filter)(const string& name))
      ...
      // key   - oid (for different shards if there is any)
      // value - list result for the corresponding oid (shard), it is filled by the AIO callback
      map<int, string> oids; // keyed by shard id
      // The results listed by CLSRGWIssueBucketList are stored in list_results.
      map<int, struct rgw_cls_list_ret> list_results;

      // oids holds the names of the bucket index shard objects, e.g.
      // (gdb) print oids
      // $75 = std::map with 4 elements = {
      //   [0] = ".dir.yhg-yhg.14236.1.0",
      //   [1] = ".dir.yhg-yhg.14236.1.1",
      //   [2] = ".dir.yhg-yhg.14236.1.2",
      //   [3] = ".dir.yhg-yhg.14236.1.3"
      // }
      //
      // RGWRados::open_bucket_index (rgw_rados.cc:4551) calls
      // get_bucket_index_objects to obtain the list of oids
      // (the names of the bucket index shards).
      int r = open_bucket_index(bucket, index_ctx, oids);
      if (r < 0)
        return r;

      cls_rgw_obj_key start_key(start.name, start.instance);

      // For every object in oids, call issue_op
      // (see cls/rgw/cls_rgw_client.h:246).
      // issue_op invokes the OSD-side cls function 'rgw bucket_list'.
      r = CLSRGWIssueBucketList(index_ctx, start_key, prefix, num_entries, list_versions,
                                oids, list_results, cct->_conf->rgw_bucket_index_max_aio)();

Example contents of list_results:

    (gdb) print list_results
    $85 = std::map with 4 elements = {
      [0] = {
        dir = {
          header = {
            stats = std::map with 1 elements = {
              [1 '\001'] = {total_size = 18, total_size_rounded = 8192, num_entries = 2}
            },
            tag_timeout = 0, ver = 7, master_ver = 0, max_marker = "00000000006.219.3"
          },
          m = std::map with 2 elements = {
            ["h4h4"] = {
              key = {name = "h4h4", instance = ""},
              ver = {pool = 1, epoch = 8},
              locator = "", exists = true,
              meta = {
                category = 1 '\001', size = 9,
                mtime = {tv = {tv_sec = 1461035810, tv_nsec = 0}},
                etag = "bbb8aae57c104cda40c93843ad5e6db8",
                owner = "xx1", owner_display_name = "Zone user for yhg",
                content_type = "application/octet-stream", accounted_size = 9
              },
              pending_map = std::multimap with 0 elements,
              index_ver = 4, tag = "yhg-yhg.14236.5", flags = 0, versioned_epoch = 0
            },
            ["sbsb"] = {
              key = {name = "sbsb", instance = ""},
              ver = {pool = 1, epoch = 99},
              locator = "", exists = true,
              meta = {
                category = 1 '\001', size = 9,
                mtime = {tv = {tv_sec = 1461055490, tv_nsec = 0}},
                etag = "bbb8aae57c104cda40c93843ad5e6db8",
                owner = "xx1", owner_display_name = "Zone user for yhg",
                content_type = "application/octet-stream", accounted_size = 9
              },
              pending_map = std::multimap with 0 elements,
              index_ver = 6, tag = "yhg-yhg.14236.29", flags = 0, versioned_epoch = 0
            }
          }
        },
        is_truncated = false
      },
      [1] = {
        dir = {
          header = {
            stats = std::map with 1 elements = {
              [1 '\001'] = {total_size = 10485760, total_size_rounded = 10485760, num_entries = 1}
            },
            tag_timeout = 0, ver = 11, master_ver = 0, max_marker = "00000000010.67.3"
          },
          m = std::map with 1 elements = {
            ["tttt"] = {
              key = {name = "tttt", instance = ""},
              ver = {pool = 1, epoch = 13},
              locator = "", exists = true,
              meta = {
                category = 1 '\001', size = 10485760,
                mtime = {tv = {tv_sec = 1461132772, tv_nsec = 0}},
                etag = "219c7b0c38567750b218389f15c57e82",
                owner = "xx1", owner_display_name = "Zone user for yhg",
                content_type = "application/octet-stream", accounted_size = 10485760
              },
              pending_map = std::multimap with 0 elements,
              index_ver = 10, tag = "yhg-yhg.14236.49", flags = 0, versioned_epoch = 0
            }
          }
        },
        is_truncated = false
      },
      [2] = {
        dir = {
          header = {
            stats = std::map with 1 elements = {
              [1 '\001'] = {total_size = 9, total_size_rounded = 4096, num_entries = 1}
            },
            tag_timeout = 0, ver = 11, master_ver = 0, max_marker = "00000000010.101.3"
          },
          m = std::map with 1 elements = {
            ["h5h5"] = {
              key = {name = "h5h5", instance = ""},
              ver = {pool = 1, epoch = 34},
              locator = "", exists = true,
              meta = {
                category = 1 '\001', size = 9,
                mtime = {tv = {tv_sec = 1461053473, tv_nsec = 0}},
                etag = "bbb8aae57c104cda40c93843ad5e6db8",
                owner = "xx1", owner_display_name = "Zone user for yhg",
                content_type = "application/octet-stream", accounted_size = 9
              },
              pending_map = std::multimap with 0 elements,
              index_ver = 10, tag = "yhg-yhg.14236.26", flags = 0, versioned_epoch = 0
            }
          }
        },
        is_truncated = false
      },
      [3] = {
        dir = {
          header = {
            stats = std::map with 0 elements,
            tag_timeout = 0, ver = 1, master_ver = 0, max_marker = ""
          },
          m = std::map with 0 elements
        },
        is_truncated = false
      }
    }

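
Because every shard returns its own sorted map, the gateway has to combine the per-shard results into one ordered listing before answering the client. The sketch below shows one straightforward way to do that, under the assumption that each key lives in exactly one shard. `merge_listings` and `ShardListing` are hypothetical names, and the code is not the merge logic of `RGWRados::cls_bucket_list`.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Listing of one shard: object name -> object size, kept sorted by std::map.
using ShardListing = std::map<std::string, std::uint64_t>;

// Merges per-shard listings into one name-ordered result of at most
// max_entries names. *is_truncated reports whether more names were available.
std::vector<std::string> merge_listings(
        const std::map<int, ShardListing>& list_results,
        std::size_t max_entries,
        bool* is_truncated) {
    ShardListing merged;
    for (const auto& shard : list_results) {
        merged.insert(shard.second.begin(), shard.second.end());
    }
    std::vector<std::string> names;
    for (const auto& entry : merged) {
        if (names.size() == max_entries) break;
        names.push_back(entry.first);
    }
    if (is_truncated != nullptr) {
        *is_truncated = merged.size() > max_entries;
    }
    return names;
}
```

Fed with the four shards from the gdb dump, the sketch returns h4h4, h5h5, sbsb, tttt. It also hints at why a large shard count slows listing down: every shard has to be queried even when the caller wants only a few entries.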
rgw bucket_list
The listing result is stored in struct rgw_cls_list_ret.

    struct rgw_cls_list_ret
    {
      rgw_bucket_dir dir;
      bool is_truncated;
      ...
    };

    struct rgw_bucket_dir {
      struct rgw_bucket_dir_header header;
      std::map<string, struct rgw_bucket_dir_entry> m;
      ...
    };

    struct rgw_bucket_dir_header {
      map<uint8_t, rgw_bucket_category_stats> stats;
      uint64_t tag_timeout;
      uint64_t ver;
      uint64_t master_ver;
      string max_marker;
      ...
    };

    struct rgw_bucket_category_stats {
      uint64_t total_size;
      uint64_t total_size_rounded;
      uint64_t num_entries;
      ...
    };

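
Each shard header carries statistics for its own keys only, so bucket-wide figures have to be summed across shards. A minimal sketch of that aggregation follows. `CategoryStats` mirrors the three fields of rgw_bucket_category_stats shown in the listing, while `sum_shard_stats` is a hypothetical helper rather than a Ceph function.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Mirrors the three counters of rgw_bucket_category_stats.
struct CategoryStats {
    std::uint64_t total_size;
    std::uint64_t total_size_rounded;
    std::uint64_t num_entries;
};

// Per-shard header statistics: category -> counters.
using HeaderStats = std::map<std::uint8_t, CategoryStats>;

// Adds up the per-category counters of all shard headers.
HeaderStats sum_shard_stats(const std::vector<HeaderStats>& shard_headers) {
    HeaderStats total;
    for (const HeaderStats& header : shard_headers) {
        for (const auto& entry : header) {
            // operator[] value-initializes a new element, so counters start at 0.
            CategoryStats& acc = total[entry.first];
            acc.total_size += entry.second.total_size;
            acc.total_size_rounded += entry.second.total_size_rounded;
            acc.num_entries += entry.second.num_entries;
        }
    }
    return total;
}
```

With the header values from the list_results dump (shards 0 to 3), category 1 adds up to 4 entries and 10485787 bytes (10498048 bytes rounded).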

    int rgw_bucket_list(cls_method_context_t hctx, bufferlist *in, bufferlist *out)
      ...
      // Read the omap header of the bucket index object.
      // (gdb) print new_dir.header
      // $2 = {
      //   stats = std::map with 1 elements = {
      //     [1 '\001'] = {
      //       total_size = 18,
      //       total_size_rounded = 8192,
      //       num_entries = 2
      //     }
      //   },
      //   tag_timeout = 0,
      //   ver = 7,
      //   master_ver = 0,
      //   max_marker = "00000000006.219.3"
      // }
      //
      struct rgw_cls_list_ret ret;
      struct rgw_bucket_dir& new_dir = ret.dir;
      int rc = read_bucket_header(hctx, &new_dir.header);
      ...
      map<string, bufferlist> keys;
      string start_key;
      encode_list_index_key(hctx, op.start_obj, &start_key); // has no real effect here
      // Read the k/v entries of the bucket index object's omap.
      rc = get_obj_vals(hctx, start_key, op.filter_prefix, op.num_entries + 1, &keys);

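
The call above asks for `op.num_entries + 1` entries. A common reason for reading one entry more than requested is to learn whether the listing is truncated without a second round trip. That interpretation is an assumption here, since the excerpt does not show how is_truncated is computed. The sketch uses a plain `std::map` as a stand-in for the omap, and `list_page` and `ListPage` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct ListPage {
    std::vector<std::string> names;  // at most num_entries names
    bool is_truncated;               // true if at least one more match exists
};

// Returns up to num_entries keys that start with filter_prefix. One extra key
// is fetched so that truncation can be detected, then dropped again.
ListPage list_page(const std::map<std::string, std::string>& omap,
                   const std::string& filter_prefix,
                   std::size_t num_entries) {
    std::vector<std::string> fetched;
    // Keys sharing a prefix are contiguous in a sorted map.
    for (auto it = omap.lower_bound(filter_prefix);
         it != omap.end() && fetched.size() < num_entries + 1; ++it) {
        if (it->first.compare(0, filter_prefix.size(), filter_prefix) != 0) break;
        fetched.push_back(it->first);
    }
    ListPage page;
    page.is_truncated = fetched.size() > num_entries;
    if (page.is_truncated) fetched.pop_back();  // discard the probe entry
    page.names = fetched;
    return page;
}
```

For shard 0 of the example bucket (keys h4h4 and sbsb), a request for one entry returns h4h4 with is_truncated set, and a request for two entries returns both keys with is_truncated cleared.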
Finally, keys is parsed into new_dir.m and returned to the rgw instance, as shown below:

    (gdb) print new_dir.m
    $5 = std::map with 2 elements = {
      ["h4h4"] = {
        key = {name = "h4h4", instance = ""},
        ver = {pool = 1, epoch = 8},
        locator = "", exists = true,
        meta = {
          category = 1 '\001', size = 9,
          mtime = {tv = {tv_sec = 1461035810, tv_nsec = 0}},
          etag = "bbb8aae57c104cda40c93843ad5e6db8",
          owner = "xx1", owner_display_name = "Zone user for yhg",
          content_type = "application/octet-stream", accounted_size = 9
        },
        pending_map = std::multimap with 0 elements,
        index_ver = 4, tag = "yhg-yhg.14236.5", flags = 0, versioned_epoch = 0
      },
      ["sbsb"] = {
        key = {name = "sbsb", instance = ""},
        ver = {pool = 1, epoch = 99},
        locator = "", exists = true,
        meta = {
          category = 1 '\001', size = 9,
          mtime = {tv = {tv_sec = 1461055490, tv_nsec = 0}},
          etag = "bbb8aae57c104cda40c93843ad5e6db8",
          owner = "xx1", owner_display_name = "Zone user for yhg",
          content_type = "application/octet-stream", accounted_size = 9
        },
        pending_map = std::multimap with 0 elements,
        index_ver = 6, tag = "yhg-yhg.14236.29", flags = 0, versioned_epoch = 0
      }
    }

References

[1] `RadosGW Big Index <http://cephnotes.ksperis.com/blog/2015/05/12/radosgw-big-index/>`_
[2] `Rgw bucket index scalability <http://tracker.ceph.com/projects/ceph/wiki/Rgw_-_bucket_index_scalability>`_