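HugePages (Large Memory Pages)
Linux HugePage Features  http://blog.csdn.net/leshami/article/details/8777639
    HugePage refers to managing memory in large pages. Compared with the traditional scheme of regular 4 KB pages, HugePage manages large memory (8 GB and above) more efficiently. This article describes what HugePage is and some of its characteristics.

1. Introduction of HugePages
    The operating system accesses data in physical memory far faster than it reads or writes data on disk, but physical memory is limited, which gives rise to the concepts of physical memory and virtual memory. Virtual memory is a strategy for compensating for insufficient physical memory: it is a block of logical memory carved out of disk space. This disk space is called virtual memory on Windows and swap space on Linux.

    To manage this large memory (physical memory + virtual memory), most operating systems use segmentation or paging. Segmentation is coarse-grained, while paging is fine-grained and avoids wasting memory space. Correspondingly, there are the concepts of physical addresses and virtual addresses. Under either scheme, the CPU must translate a virtual address into a physical memory address before it can actually access memory. To make this translation efficient, the CPU caches the most recent virtual-to-physical address mappings in a mapping table that it maintains. To make memory access as fast as possible, this table should hold as many mappings as possible.

    Linux memory management uses paging. To make full use of physical memory, the kernel uses an LRU algorithm to swap infrequently used pages out of physical memory into virtual memory at appropriate times, and keeps frequently used information in physical memory. By default each Linux page is 4K, which means that if physical memory is large, the mapping table has very many entries, and this hurts the CPU's lookup efficiency. Because the memory size is fixed, the only way to reduce the number of entries is to increase the page size. This is how HugePages came about: instead of the traditional small pages, large pages of 2 MB, 4 MB, 16 MB and so on are used, and the number of mapping entries drops markedly. If the system has a large amount of physical memory (more than 8G), HugePages should be used whether the operating system is 32-bit or 64-bit.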
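To make the effect of page size concrete, the following minimal Python sketch counts how many pages (and therefore mapping entries) are needed to cover a given amount of memory. The helper name pages_needed and the 8 GiB figure are illustrative assumptions, not something taken from the kernel.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(memory_bytes: int, page_bytes: int) -> int:
    """Return how many pages of page_bytes are needed to cover memory_bytes."""
    # Ceiling division: a partially filled last page still needs an entry.
    return -(-memory_bytes // page_bytes)


if __name__ == "__main__":
    memory = 8 * GIB  # illustrative size; any value works
    for label, size in (("4K", 4 * KIB), ("2M", 2 * MIB), ("4M", 4 * MIB), ("16M", 16 * MIB)):
        print(f"{label:>3} pages: {pages_needed(memory, size)}")
```

For 8 GiB this gives 2,097,152 entries with 4K pages and 4,096 entries with 2M pages, a 512-fold reduction.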

2. HugePage terminology
Page Table:
    A page table is the data structure of a virtual memory system in an operating system to store the mapping between virtual addresses and physical addresses. This means that on a virtual memory system, the memory is accessed by first accessing a page table and then accessing the actual memory location implicitly.
    As described above, a page table is a memory management mechanism that maps virtual addresses to physical addresses. A memory access therefore consults the page table first and then, following the mapping in it, moves implicitly to the physical address to read or write the data.

TLB:
    A Translation Lookaside Buffer (TLB) is a buffer (or cache) in a CPU that contains parts of the page table. This is a fixed size buffer being used to do virtual address translation faster.
    It is a fixed-size cache in the CPU that holds part of the page table's mappings and is used to translate virtual addresses to physical addresses quickly.

hugetlb:
    This is an entry in the TLB that points to a HugePage (a large/big page larger than regular 4K and predefined in size). HugePages are implemented via hugetlb entries, i.e. we can say that a HugePage is handled by a "hugetlb page entry". The "hugetlb" term is also (and mostly) used synonymously with a HugePage (See Note 261889.1). In this document the term "HugePage" is going to be used but keep in mind that mostly "hugetlb" refers to the same concept.
    hugetlb is an entry in the TLB that points to a HugePage (a page larger than the regular 4K, of predefined size). HugePages are implemented through hugetlb entries; in other words, a HugePage is handled by a hugetlb page entry.

hugetlbfs:
    This is a new in-memory filesystem like tmpfs and is presented by 2.6 kernel. Pages allocated on hugetlbfs type filesystem are allocated in HugePages.
    It is a new in-memory filesystem similar to tmpfs, introduced with the 2.6 kernel.

3. Common misconceptions
WRONG: HugePages is a method to be able to use large SGA on 32-bit VLM systems
RIGHT: HugePages is a method to have larger pages where it is useful for working with very large memory. It is useful in both 32- and 64-bit configurations

WRONG: HugePages cannot be used without USE_INDIRECT_DATA_BUFFERS
RIGHT: HugePages can be used without indirect buffers. 64-bit systems do not need to use indirect buffers to have a large buffer cache for the RDBMS instance and HugePages can be used there too.

WRONG: hugetlbfs means hugetlb
RIGHT: hugetlbfs is a filesystem type **BUT** hugetlb is the mechanism employed in the back where hugetlb can be employed WITHOUT hugetlbfs

WRONG: hugetlbfs means hugepages
RIGHT: hugetlbfs is a filesystem type **BUT** HugePages is the mechanism employed in the back (synonymously with hugetlb) where HugePages can be employed WITHOUT hugetlbfs.
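
4. Regular Pages and HugePages
a. Regular Pages
The figure below shows two different processes. Each process accesses memory by first consulting its local page table. The local page table in turn references the system-wide page table (part of whose mappings are cached in the TLB described above), and the entries in the system-wide table finally point to the actual physical addresses. The physical page size in the figure is 4 KB. Process 1 and process 2 both point to page2 in the system-wide table, that is, to the same physical address. This situation arises with the shared memory used by the Oracle SGA.

b. Huge Pages
In the figure below, both the local page table and the system page table carry the huge page attribute, so any page in a page table may be a regular page or a huge page. Likewise, process 1 and process 2 share Hpage2. In the figure, the regular page size of physical memory is 4 KB and the huge page size is 4 MB.
-- Author : Robinson
-- Blog   : http://blog.csdn.net/robinson_0612

5. Size of a huge page
The size of a huge page depends on the kernel version of the operating system in use and on the hardware platform.
Use $ grep Hugepagesize /proc/meminfo to check the huge page size.
The table below lists the huge page sizes commonly used on different platforms.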
HW Platform                      Source Code Tree    Kernel 2.4    Kernel 2.6
-----------------------------    ----------------    ----------    ----------
Linux x86 (IA32)                 i386                4 MB          4 MB *
Linux x86-64 (AMD64, EM64T)      x86_64              2 MB          2 MB
Linux Itanium (IA64)             ia64                256 MB        256 MB
IBM Power Based Linux (PPC64)    ppc64/powerpc       N/A **        16 MB
IBM zSeries Based Linux          s390                N/A           1 MB
IBM S/390 Based Linux            s390                N/A           N/A

6. Advantages of using huge pages
For systems with large memory and a large SGA, using HugePages can greatly improve Oracle database performance.
a. Not swappable
HugePages are not swappable. Therefore there is no page-in/page-out mechanism overhead. HugePages are universally regarded as pinned.
No swapping is needed: pages are never swapped in or out because of a memory shortage.

b. Relief of TLB pressure
HugePages use fewer pages to cover the physical address space, so the amount of "bookkeeping" (mapping from virtual to physical addresses) decreases and fewer entries are required in the TLB.
TLB entries cover a larger part of the address space when HugePages are used, so there are fewer TLB misses before the entire SGA, or most of it, is mapped.
Fewer TLB entries for the SGA also means more for other parts of the address space.
This relieves TLB pressure, that is, the pressure on the address mappings the CPU cache can hold. With huge pages, fewer virtual addresses have to be managed for the same amount of memory.
Each TLB entry covers more address space, so the CPU's addressing capability improves accordingly.

c. Decreased page table overhead
Each page table entry can be as large as 64 bytes, and if we are trying to handle 50GB of RAM, the page table will be approximately 800MB in size, which in practice will not fit in the 880MB lowmem (in 2.4 kernels; the page table is not necessarily in lowmem in 2.6 kernels) considering the other uses of lowmem. When 95% of memory is accessed via 256MB hugepages, this can work with a page table of approximately 40MB in total. See also Document 361468.1.
This lowers page table overhead. With regular pages, each entry needs 64 bytes, so managing the entries for 50 GB of memory takes 800 MB:
(50*1024*1024) KB / 4 KB * 64 bytes / 1024 / 1024 = 800 MB.
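The arithmetic above can be reproduced with a short Python sketch. The function names are illustrative, and the sketch assumes that a huge page entry costs the same 64 bytes as a regular entry, which the text does not state.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
ENTRY_BYTES = 64  # per-entry size quoted in the text


def page_table_bytes(memory_bytes: int, page_bytes: int, entry_bytes: int = ENTRY_BYTES) -> int:
    """Size of a flat page table that maps memory_bytes with pages of page_bytes."""
    entries = -(-memory_bytes // page_bytes)  # ceiling division
    return entries * entry_bytes


def mixed_page_table_bytes(memory_bytes: int, huge_percent: int, huge_page_bytes: int,
                           regular_page_bytes: int = 4 * KIB) -> int:
    """Page table size when huge_percent of memory uses huge pages (assumed same entry size)."""
    huge_part = memory_bytes * huge_percent // 100
    regular_part = memory_bytes - huge_part
    return (page_table_bytes(huge_part, huge_page_bytes)
            + page_table_bytes(regular_part, regular_page_bytes))


if __name__ == "__main__":
    all_regular = page_table_bytes(50 * GIB, 4 * KIB)
    mostly_huge = mixed_page_table_bytes(50 * GIB, 95, 256 * MIB)
    print(f"4K pages only:       {all_regular / MIB:.1f} MiB")
    print(f"95% in 256MB pages:  {mostly_huge / MIB:.2f} MiB")
```

With these inputs the regular-page table is exactly 800 MiB, and the mixed case is dominated by the remaining 5% of memory on 4K pages (40 MiB), with the huge page entries adding only a few kilobytes.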

d. Eliminated page table lookup overhead
Since the pages are not subject to replacement, page table lookups are not required.

e. Faster overall memory performance
On virtual memory systems each memory operation is actually two abstract memory operations. Since there are fewer pages to work on, the possible bottleneck on page table access is clearly avoided.

7. Risks of not configuring huge pages correctly
When managing large memory (>8GB), if huge pages are not configured, or are configured incorrectly, the following hidden problems may arise:
    HugePages not used at all (HugePages_Total = HugePages_Free), wasting the memory configured for them
    Poor database performance
    System running out of memory or excessive swapping
    Some or any database instance cannot be started
    Crucial system services failing (e.g.: CRS)

8. Configuration steps for 2.6 kernels
The kernel parameter used for HugePages is vm.nr_hugepages, which is based on the number of pages. SLES9, RHEL4 and Asianux 2.0 are examples of distributions with the 2.6 kernel. For the configuration, follow the steps below:
    a. Start instance(s)
    b. Calculate nr_hugepages using the script from Document 401749.1
    c. Set kernel parameter:
            # sysctl -w vm.nr_hugepages=
         and make sure that the parameter is persistent across reboots, e.g. on SLES9:
            # chkconfig boot.sysctl on
    d. Check available hugepages:
            $ grep Huge /proc/meminfo
    e. Restart instances
    f. Check available hugepages:
            $ grep Huge /proc/meminfo
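
9. Notes
a. HugePages use shared memory. They are allocated dynamically and reserved during operating system startup, because they are never swapped out.
b. Because they are never swapped out, memory used by HugePages cannot be used by other processes. Set the value sensibly to avoid wasting memory.
c. On a server that runs only Oracle, it is enough to size HugePages to the SGA (the sum of the SGAs of all instances).
d. If HugePages are increased, physical memory is added, a new instance is added to the server, or the SGA changes, the required HugePages should be recalculated and set again.
e. Reference: HugePages on Linux: What It Is... and What It Is Not... [ID 361323.1]
f. For how to configure HugePages, see: Configuring HugePages on Linux
Configuring HugePages on Linux  http://blog.csdn.net/leshami/article/details/8788825
    HugePages replace the traditional 4 KB memory pages with large pages, so that fewer virtual addresses have to be managed and the mapping from virtual to physical addresses is faster. Because these pages are never swapped in or out, overall memory performance also improves. Configuring and using HugePages is recommended especially for memory above 8GB and for a large Oracle SGA size. This article describes how to configure HugePages on x86_64 Linux.
    For the characteristics of HugePages, see: Linux HugePage Features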

1. Why configure HugePages?
a. Larger Page Size and Fewer Pages:
    The default page size is 4K whereas the HugeTLB size is 2048K. That means the system needs to handle 512 times fewer pages.
b. No Page Table Lookups:
    Since HugePages are not subject to replacement (unlike regular pages), page table lookups are not required.
c. Better Overall Memory Performance:
    On virtual memory systems (any modern OS) each memory operation is actually two abstract memory operations. With HugePages, since there are fewer pages to work on, the possible bottleneck on page table access is clearly avoided.
d. No Swapping:
    Swapping must be avoided altogether on Linux (see Document 1295478.1). HugePages are not swappable (whereas regular pages are). Therefore there is no page replacement mechanism overhead. HugePages are universally regarded as pinned.
e. No 'kswapd' Operations:
    kswapd will get very busy if there is a very large area to be paged (i.e. 13 million page table entries for 50GB memory) and will use an incredible amount of CPU resource. When HugePages are used, kswapd is not involved in managing them. See also Document 361670.1

2. Configuring HugePages
  All the steps for configuring HugePages are listed below.
a. Check whether HugePages are currently configured
  In the output below the HugePages values are all 0, which shows that HugePages are not configured. The output also shows that Hugepagesize is 2MB.
  $ grep Huge /proc/meminfo
  HugePages_Total:   0
  HugePages_Free:    0
  HugePages_Rsvd:    0
  Hugepagesize:     2048 kB
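When this check has to be scripted, the same fields can be read programmatically. The following is a minimal Python sketch that parses text in the /proc/meminfo format; the function name parse_hugepage_fields and the embedded sample are illustrative assumptions.

```python
def parse_hugepage_fields(meminfo_text: str) -> dict:
    """Return the numeric value of every /proc/meminfo field whose name contains 'Huge'."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        name = name.strip()
        # Mirror `grep Huge`: keep only lines whose field name contains "Huge".
        if not sep or "Huge" not in name:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name] = int(parts[0])  # a trailing unit such as "kB" is dropped
    return fields


# Sample copied from the output shown above.
SAMPLE = """\
HugePages_Total:   0
HugePages_Free:    0
HugePages_Rsvd:    0
Hugepagesize:     2048 kB
"""

if __name__ == "__main__":
    print(parse_hugepage_fields(SAMPLE))
```

On a Linux host the function can be fed the live file, for example parse_hugepage_fields(open("/proc/meminfo").read()).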

b. Change the memlock limit for users
  This is done by editing the /etc/security/limits.conf configuration file.
  The value is usually set slightly below the installed system memory. For example, with 64GB of system memory, the following settings can be used:
  *   soft   memlock    60397977
  *   hard   memlock    60397977
  The unit of these settings is KB, and they do not degrade system performance. At a minimum, the value must be slightly larger than the sum of all SGAs on the system.
  Use ulimit -l to verify the setting.
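The example value 60397977 matches 90% of 64 GB expressed in KB. The sketch below reproduces it in Python. The 90% ratio is inferred from that number rather than stated by the source, and the function names are hypothetical.

```python
def memlock_limit_kb(ram_gib: int, percent: int = 90) -> int:
    """memlock value in KB as a percentage of installed RAM (the 90% default is an assumption)."""
    ram_kb = ram_gib * 1024 * 1024
    return ram_kb * percent // 100  # integer arithmetic avoids float rounding


def limits_conf_lines(limit_kb: int, domain: str = "*") -> list:
    """Build the soft and hard memlock lines in the layout used in limits.conf."""
    return [f"{domain}   {kind}   memlock    {limit_kb}" for kind in ("soft", "hard")]


if __name__ == "__main__":
    for line in limits_conf_lines(memlock_limit_kb(64)):
        print(line)
```

Whatever ratio is chosen, the result should still exceed the combined size of all SGAs on the system, as noted above.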

c. Disable AMM (Oracle 11g)
  If the Oracle version is 10g, skip this step.
  If the Oracle version is 11g, AMM (Automatic Memory Management) must be disabled because it is incompatible with HugePages.
    ALTER SYSTEM RESET memory_target SCOPE=SPFILE;
    ALTER SYSTEM RESET memory_max_target SCOPE=SPFILE;
    ALTER SYSTEM SET sga_target=g SCOPE=SPFILE;
    ALTER SYSTEM SET pga_aggregate_target=g SCOPE=SPFILE;
    SHUTDOWN IMMEDIATE;
    STARTUP;

d. Calculate the value of vm.nr_hugepages
  Use the hugepages_settings.sh script provided by Oracle to calculate the value of vm.nr_hugepages.
  Before running the script, make sure that all Oracle instances are started, and ASM as well if it is present.
  $ ./hugepages_settings.sh
  ...
  Recommended setting: vm.nr_hugepages = 1496
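The idea behind such a calculation can be sketched in a few lines of Python. This is not Oracle's hugepages_settings.sh. It assumes the input is the list of shared memory segment sizes in bytes (for example as listed by ipcs -m while the instances are running) and simply rounds each segment up to whole huge pages. The actual script may add headroom, so its recommendation can differ.

```python
def nr_hugepages_for_segments(segment_bytes, hugepage_kb: int = 2048) -> int:
    """Huge pages needed so that every shared memory segment fits in whole pages."""
    page_bytes = hugepage_kb * 1024
    # Round each segment up separately: segments do not share a huge page.
    return sum(-(-size // page_bytes) for size in segment_bytes)


if __name__ == "__main__":
    # Hypothetical segment sizes: one 2 GiB SGA segment and one small segment.
    segments = [2 * 1024**3, 4096]
    print("vm.nr_hugepages =", nr_hugepages_for_segments(segments))
```

The hugepage_kb default of 2048 corresponds to the Hugepagesize reported on x86_64 in step a.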

e. Edit /etc/sysctl.conf to set the vm.nr_hugepages parameter
  Add the following line to /etc/sysctl.conf:
  vm.nr_hugepages = 1496
  Then load the setting:
  # sysctl -p

f. Stop all instances and reboot the server
  The steps above apply the change dynamically, but the HugePages allocation takes effect only after the server is rebooted.

g. Verify the configuration
  The HugePages values change dynamically as instances on the server are stopped and started.
  Normally HugePages_Free should be smaller than HugePages_Total, and HugePages_Rsvd should be non-zero while HugePages are in use.
  $ grep Huge /proc/meminfo
  HugePages_Total:   131
  HugePages_Free:     20
  HugePages_Rsvd:     20
  Hugepagesize:     2048 kB

  In the case below, after the only instance on the server has been shut down, HugePages_Rsvd is zero and HugePages_Free equals HugePages_Total.
  $ grep Huge /proc/meminfo
  HugePages_Total:   131
  HugePages_Free:    131
  HugePages_Rsvd:      0
  Hugepagesize:     2048 kB
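The three situations seen so far (nothing configured, configured but idle, in use) can be told apart from the counters alone. Here is a minimal Python sketch. The state names and function names are illustrative, not kernel or Oracle terminology.

```python
def hugepages_state(total: int, free: int, rsvd: int) -> str:
    """Classify the HugePages counters reported in /proc/meminfo."""
    if total == 0:
        return "not configured"
    if free == total and rsvd == 0:
        return "configured but unused"
    return "in use"


def touched_pages(total: int, free: int) -> int:
    """Pages that are currently backed by physical memory and handed out."""
    return total - free


if __name__ == "__main__":
    for counters in ((0, 0, 0), (131, 20, 20), (131, 131, 0)):
        print(counters, "->", hugepages_state(*counters))
```

For the first verification output above (131 total, 20 free) the sketch reports "in use" with 111 pages touched. For the output taken after the instance was shut down it reports "configured but unused".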

3. Notes on using HugePages
  HugePages should be reconfigured in the following three situations:
    a. Physical memory is added or removed
    b. An instance is added to or removed from the server
    c. The SGA size of an instance increases or decreases
  If HugePages are not adjusted, the following problems may occur:
    a. Poor database performance
    b. Running out of memory or excessive use of swap space
    c. Database instances cannot be started
    d. Critical system services fail

4. Troubleshooting common HugePages problems
Symptom A:
    System is running out of memory or swapping
Possible Cause:
    Not enough HugePages to cover the SGA(s), so the area reserved for HugePages is wasted while the SGAs are allocated through regular pages.
Troubleshooting Action:
    Review your HugePages configuration to make sure that all SGA(s) are covered.

Symptom B:
    Databases fail to start
Possible Cause:
    memlock limits are not set properly
Troubleshooting Action:
    Make sure the settings in limits.conf apply to the database owner account.

Symptom C:
    One of the databases fails to start while another is up
Possible Cause:
    The SGA of the specific database could not find available HugePages and the remaining RAM is not enough.
Troubleshooting Action:
    Make sure that the RAM and HugePages are enough to cover all your database SGAs

Symptom D:
    Cluster Ready Services (CRS) fail to start
Possible Cause:
    HugePages configured too large (maybe larger than installed RAM)
Troubleshooting Action:
    Make sure the total SGA is less than the installed RAM and re-calculate HugePages.

Symptom E:
    HugePages_Total = HugePages_Free
Possible Cause:
    HugePages are not used at all. No database instances are up, or they are using AMM.
Troubleshooting Action:
    Disable AMM and make sure that the database instances are up.

Symptom F:
    Database started successfully but performance is slow
Possible Cause:
    The SGA of the specific database could not find available HugePages and therefore the SGA is handled by regular pages, which leads to slow performance
Troubleshooting Action:
    Make sure that there are enough HugePages to cover all your database SGAs
Reference: [ID 361468.1]
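Several of the causes above come down to comparing three quantities: the size of the HugePages pool, the combined SGA size, and the installed RAM. The following Python sketch performs those comparisons. The function name, its parameters and the wording of the findings are hypothetical, and it covers only the sizing-related causes, not memlock or AMM.

```python
def check_hugepages_sizing(sga_bytes, hugepages_total: int, hugepage_kb: int, ram_bytes: int) -> list:
    """Return a list of sizing problems; an empty list means the sizes are consistent."""
    pool_bytes = hugepages_total * hugepage_kb * 1024
    sga_total = sum(sga_bytes)
    findings = []
    if pool_bytes > ram_bytes:
        findings.append("HugePages pool is larger than installed RAM")
    if sga_total >= ram_bytes:
        findings.append("total SGA is not smaller than installed RAM")
    if pool_bytes < sga_total:
        findings.append("not enough HugePages to cover all SGAs")
    return findings


if __name__ == "__main__":
    GIB = 1024**3
    # Hypothetical host: 8 GiB RAM, 131 pages of 2048 kB, one 1 GiB SGA.
    for finding in check_hugepages_sizing([1 * GIB], 131, 2048, 8 * GIB):
        print(finding)
```

In the hypothetical case shown, 131 pages of 2048 kB give a pool of 262 MiB, which cannot cover a 1 GiB SGA, so the sketch reports the third finding only.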

5. Script for calculating the vm.nr_hugepages value
HugePages on Linux: What It Is... and What It Is Not... (Doc ID 361323.1)
In this Document
- Purpose
- Scope
- Details
    - Introduction
    - Common Misconceptions
    - Regular Pages and HugePages
    - HugePages in 2.4 Kernels
    - Some HugePages Facts/Features
    - Advantages of HugePages Over Normal Sharing Or AMM (see below)
    - The Size of a HugePage
    - HugePages Reservation
    - HugePages and Oracle 11g Automatic Memory Management (AMM)
    - What if Not Enough HugePages Configured?
    - What if Too Much HugePages Configured?
    - Parameters/Setup
    - Notes on HugePages in General
- References
APPLIES TO:
Oracle Database - Enterprise Edition
Linux OS - Version Enterprise Linux 3.0 to Oracle Linux 6.5 with Unbreakable Enterprise Kernel [3.8.13] [Release RHEL3 to OL6U5]
IBM S/390 Based Linux (31-bit)
IBM: Linux on POWER Big Endian Systems
Linux x86-64
Linux Itanium
Linux x86
IBM: Linux on System z
***Checked for relevance on 20-May-2013***
PURPOSE
This document describes the HugePages feature in the Linux kernel available for 32-bit and 64-bit architectures. There has been some confusion among the terms and uses related to HugePages. This document should clarify the misconceptions about the feature.
SCOPE
Information in this document is useful for Linux system administrators and Oracle database administrators working with system administrators.
This document covers information about the HugePages concept that applies to very large memory (VLM) (>= 4GB) systems for 32-bit and 64-bit architectures, including some configuration information and references.
DETAILS
Introduction
HugePages is a feature integrated into the Linux kernel with release 2.6. This feature basically provides the alternative to the 4K page size (16K for IA64) providing bigger pages.
Regarding the HugePages, there are some other similar terms that are being used like, hugetlb, hugetlbfs. Before proceeding into the details of HugePages, see the definitions below:
- Page Table: A page table is the data structure of a virtual memory system in an operating system to store the mapping between virtual addresses and physical addresses. This means that on a virtual memory system, the memory is accessed by first accessing a page table and then accessing the actual memory location implicitly.
- TLB: A Translation Lookaside Buffer (TLB) is a buffer (or cache) in a CPU that contains parts of the page table. This is a fixed size buffer being used to do virtual address translation faster.
- hugetlb: This is an entry in the TLB that points to a HugePage (a large/big page larger than regular 4K and predefined in size). HugePages are implemented via hugetlb entries, i.e. we can say that a HugePage is handled by a "hugetlb page entry". The "hugetlb" term is also (and mostly) used synonymously with a HugePage (See Note 261889.1). In this document the term "HugePage" is going to be used but keep in mind that mostly "hugetlb" refers to the same concept.
- hugetlbfs: This is a new in-memory filesystem like tmpfs and is presented by 2.6 kernel. Pages allocated on hugetlbfs type filesystem are allocated in HugePages.
Common Misconceptions
| WRONG: HugePages is a method to be able to use large SGA on 32-bit VLM systems | RIGHT: HugePages is a method to have larger pages where it is useful for working with very large memory. It is useful in both 32- and 64-bit configurations |
| WRONG: HugePages cannot be used without USE_INDIRECT_DATA_BUFFERS | RIGHT: HugePages can be used without indirect buffers. 64-bit systems do not need to use indirect buffers to have a large buffer cache for the RDBMS instance and HugePages can be used there too. |
| WRONG: hugetlbfs means hugetlb | RIGHT: hugetlbfs is a filesystem type **BUT** hugetlb is the mechanism employed in the back where hugetlb can be employed WITHOUT hugetlbfs |
| WRONG: hugetlbfs means hugepages | RIGHT: hugetlbfs is a filesystem type **BUT** HugePages is the mechanism employed in the back (synonymously with hugetlb) where HugePages can be employed WITHOUT hugetlbfs. |
Regular Pages and HugePages
This section aims to give a general picture about memory access in virtual memory systems and how pages are referenced.
When a single process works with a piece of memory, the pages that the process uses are referenced in a local page table for that process. The entries in this table also contain references to the System-Wide Page Table, which in turn holds references to actual physical memory addresses. So, in theory, a user mode process (e.g. an Oracle process) follows its local page table to reach the system page table and can then reference the actual physical memory virtually. As you can see below, it is also possible (and very common for Oracle RDBMS due to SGA use) that two different O/S processes point to the same entry in the system-wide page table.
When HugePages are in play, the usual page tables are still employed. The basic difference is that the entries in both the process page table and the system page table have attributes about huge pages. So any page in a page table can be a huge page or a regular page.
HugePages in 2.4 Kernels
The HugePages feature is backported to some 2.4 kernels. Kernel versions 2.4.21-* have this feature (see Note 311504.1 for the distributions with 2.4.21 kernels), but it is implemented in a different way. The feature is completely available. The difference from the 2.6 implementation is the organization within the source code and the kernel parameters that are used for configuring HugePages. See the Parameters/Setup section below.
Some HugePages Facts/Features
- HugePages can be allocated on-the-fly but they must be reserved during system startup. Otherwise the allocation might fail as the memory is already paged in 4K mostly.
- HugePage sizes vary from 2MB to 256MB based on kernel version and HW architecture (See related section below.)
- HugePages are not subject to reservation / release after the system startup unless there is system administrator intervention, basically changing the hugepages configuration (i.e. number of pages available or pool size)
Advantages of HugePages Over Normal Sharing Or AMM (see below)
- Not swappable: HugePages are not swappable. Therefore there is no page-in/page-out mechanism overhead. HugePages are universally regarded as pinned.
- Relief of TLB pressure:
    - HugePages use fewer pages to cover the physical address space, so the amount of "bookkeeping" (mapping from virtual to physical addresses) decreases and fewer entries are required in the TLB
    - TLB entries cover a larger part of the address space when HugePages are used, so there are fewer TLB misses before the entire SGA, or most of it, is mapped
    - Fewer TLB entries for the SGA also means more for other parts of the address space
- Decreased page table overhead: Each page table entry can be as large as 64 bytes, and if we are trying to handle 50GB of RAM, the page table will be approximately 800MB in size, which in practice will not fit in the 880MB lowmem (in 2.4 kernels; the page table is not necessarily in lowmem in 2.6 kernels) considering the other uses of lowmem. When 95% of memory is accessed via 256MB hugepages, this can work with a page table of approximately 40MB in total. See also Document 361468.1.
- Eliminated page table lookup overhead: Since the pages are not subject to replacement, page table lookups are not required.
- Faster overall memory performance: On virtual memory systems each memory operation is actually two abstract memory operations. Since there are fewer pages to work on, the possible bottleneck on page table access is clearly avoided.
The Size of a HugePage
The size of a single HugePage varies according to:
- Kernel version/linux distribution
- HW Platform
The actual size of the HugePage on a specific system can be checked by:
        $ grep Hugepagesize /proc/meminfo
The table below shows the sizes of HugePages on different configurations. Note that these are general numbers taken from the most recent versions of the kernels. For a specific kernel source package, you can check for the HPAGE_SIZE macro value (based on HPAGE_SHIFT) for a different (more recent) kernel source tree.
| HW Platform | Source Code Tree | Kernel 2.4 | Kernel 2.6 and later |
| Linux x86 (IA32) | i386 | 4 MB | 2 MB |
| Linux x86-64 (AMD64, EM64T) | x86_64 | 2 MB | 2 MB |
| Linux Itanium (IA64) | ia64 | 256 MB | 256 MB |
| IBM Power Based Linux (PPC64) | ppc64/powerpc | N/A ** | 16 MB |
| IBM zSeries Based Linux | s390 | N/A | 1 MB |
| IBM S/390 Based Linux | s390 | N/A | N/A |
* Some older packaging for the 2.6.5 kernel on SLES8 (like 2.6.5-7.97) can have 2 MB Hugepagesize.
** Oracle RDBMS is also not certified in this configuration. See Document 341507.1
HugePages Reservation
The HugePages reservation feature is fully implemented in 2.6.17 kernel, and thus EL5 (based on 2.6.18) has this feature. The alloc_huge_page() is improved for this. (See kernel source mm/hugetlb.c)
From /usr/share/doc/kernel-doc-2.6.18/Documentation/vm/hugetlbpage.txt:
This feature in the Linux kernel enables the Oracle Database to be able to allocate hugepages for the sublevels of the SGA on-demand. The same behaviour is expected for various Oracle Database versions that are certified on EL5.
HugePages and Oracle 11g Automatic Memory Management (AMM)
AMM and HugePages are not compatible. AMM must be disabled on 11g to be able to use HugePages. See Document 749851.1 for further information.
What if Not Enough HugePages Configured?
Configuring your Linux OS for HugePages is a delicate process: if it is not done properly, the system may experience serious problems. If you do not have enough HugePages configured, you may encounter:
- HugePages not used at all (HugePages_Total = HugePages_Free), wasting the memory configured for them
- Poor database performance
- System running out of memory or excessive swapping
- Some or any database instance cannot be started
- Crucial system services failing (e.g.: CRS)
To help avoid such situations, Bug 10153816 was filed to introduce a database initialization parameter in 11.2.0.2 (use_large_pages) to help manage which SGAs will use huge pages and potentially give warnings or not start up at all if they cannot get those pages.
What if Too Much HugePages Configured?
It is of course technically possible to configure more than needed. When that is done, the unused part of HugePages allocation will not be available for other purposes on the system and memory shortage can be encountered. Please make sure to configure only for needed amount of hugepages.
Parameters/Setup
The following is a minimal list of documents providing general guidelines to configure HugePages for more than one Oracle RDBMS instance:
- Document 317055.1 How to Configure RHEL 3.0 32-bit for Very Large Memory with ramfs and hugepages
- Document 317067.1 How to Configure Asianux 1.0 32-bit for Very Large Memory with ramfs and hugepages
- Document 317141.1 How to Configure RHEL 4 32-bit for Very Large Memory with ramfs and hugepages
- Document 317139.1 How to Configure SuSE SLES 9 32-bit for Very Large Memory with ramfs and hugepages
- Document 361468.1 HugePages on 64-bit Linux
For all configurations, make sure that the environment variable DISABLE_HUGETLBFS is unset. If it is set (specifically to 1), it will disable the use of HugePages by the Oracle database.
HugePages should be reconfigured when any of the following changes:
- Amount of RAM installed for the Linux OS changed
- New database instance(s) introduced
- SGA size / configuration changed for one or more database instances
Otherwise, one or more of the following may occur:
- Poor database performance
- System running out of memory or excessive swapping
- Database instances cannot be started
- Crucial system services failing
Kernel Version 2.4
The kernel parameter used for HugePages is vm.hugetlb_pool which is based on MB of memory. RHEL3, Asianux 1.0, SLES8 (Service Pack 3 and over) are examples of distributions with the 2.4 kernels with HugePages support. For the configuration, follow steps below:
1. Start database instance(s)
2. Calculate hugetlb_pool using script from Note 401749.1
3. Shutdown database instances
4. Set kernel parameter:
        # sysctl -w vm.hugetlb_pool=
    and make sure that the parameter is persistent across reboots, e.g. on Asianux 1.0 by editing /etc/sysctl.conf and adding/modifying as below:
        vm.hugetlb_pool=
5. Check available hugepages:
        $ grep Huge /proc/meminfo
6. Restart database instances
7. Check available hugepages:
        $ grep Huge /proc/meminfo
Notes:
- If the setting of hugetlb_pool is not effective, you will need to reboot the server to make HugePages allocation during system startup.
- The HugePages are allocated in a lazy fashion, so the "Hugepages_Free" count drops as the pages get touched and are backed by physical memory. The idea is that it's more efficient in the sense that you don't use memory you don't touch.
- If you had set the instance initialization parameter PRE_PAGE_SGA=TRUE (for suitable settings see Document 30793.1), all of the pages would be allocated from HugePages up front.
Kernel Version 2.6
The kernel parameter used for HugePages is vm.nr_hugepages, which is based on the number of pages. SLES9, RHEL4 and Asianux 2.0 are examples of distributions with the 2.6 kernel. For the configuration, follow the steps below:
1. Start database instance(s)
2. Calculate nr_hugepages using script from Document 401749.1
3. Shutdown database instances
4. Set kernel parameter:
        # sysctl -w vm.nr_hugepages=
    and make sure that the parameter is persistent across reboots, e.g. on SLES9:
        # chkconfig boot.sysctl on
5. Check available hugepages:
        $ grep Huge /proc/meminfo
6. Restart database instances
7. Check available hugepages:
        $ grep Huge /proc/meminfo
Notes:
- If the setting of nr_hugepages is not effective, you will need to reboot the server to make HugePages allocation during system startup.
- The HugePages are allocated in a lazy fashion, so the "Hugepages_Free" count drops as the pages get touched and are backed by physical memory. The idea is that it's more efficient in the sense that you don't use memory you don't touch.
- If you had set the instance initialization parameter PRE_PAGE_SGA=TRUE (for suitable settings see Document 30793.1), all of the pages would be allocated from HugePages up front.
Notes on HugePages in General
- The userspace application that employs HugePages should be aware of permission implications. Permissions on HugePages segments in memory can strictly impose certain requirements. For example, per Bug 6620371, the Linux x86-64 port of Oracle RDBMS until 11g set the shared memory flags to hugetlb, read and write by default. But that should depend on the configuration environment, and with Patch 6620371 on 10.2 and with 11g, the read and write permissions are set based on the internal context.
REFERENCES
NOTE:261889.1- Bigpages vs. Hugetlb on RedHat Linux
NOTE:317141.1- How to Configure RHEL/OL 4 32-bit for Very Large Memory with ramfs and HugePages
BUG:10153816- WHEN USE_LARGE_PAGES=ONLY AND NO HUGEPAGES EXIST STARTUP FAILS NO DIAGNOSTIC
NOTE:1392497.1- USE_LARGE_PAGES To Enable HugePages
NOTE:311504.1- QREF: Linux Kernel Version Nomenclature
NOTE:317055.1- How to Configure RHEL 3.0 32-bit for Very Large Memory and HugePages
NOTE:317067.1- How to Configure Asianux 1.0 32-bit for Very Large Memory with ramfs and hugepages
NOTE:452326.1- Linux Kernel Lowmem Pressure Issues and Related Kernel Structures
NOTE:317139.1- How to Configure SuSE SLES 9 / 10 32-bit for Very Large Memory with ramfs and HugePages
NOTE:341507.1- Oracle Database Server on Linux on IBM POWER
NOTE:1557478.1- ALERT: Disable Transparent HugePages on SLES11, RHEL6, OL6 and UEK2 Kernels
NOTE:401749.1- Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration
11gR2
G Very Large Memory and HugePages
This chapter guides Linux system administrators to configure very large memory configurations and HugePages on Linux systems.
This chapter contains the following sections:
- Very Large Memory on Linux x86
- Overview of HugePages
G.1 Very Large Memory on Linux x86
Very Large Memory (VLM) configurations allow a 32-bit Oracle Database to access more than 4GB RAM that is traditionally available to Linux applications. The Oracle VLM option for 32-bit creates a large database buffer cache using an in-memory file system (/dev/shm). Other parts of the SGA are allocated from regular memory. VLM configurations improve database performance by caching more database buffers in memory, which significantly reduces the disk I/O compared to configurations without VLM. This chapter shows how to increase the SGA memory using VLM on a 32-bit computer.
Note:
The contents documented in this section apply only to 32-bit Linux operating systems. With a 64-bit architecture, VLM support is available natively. All 64-bit Linux operating systems use the physical memory directly, as the maximum available virtual address space is 16 EB (exabyte = 2^60 bytes).
This section includes the following topics:
- Implementing VLM on 32-bit Linux
- Prerequisites for Implementing VLM
- Methods To Increase SGA Limits
- Configuring Very Large Memory for Oracle Database
- Restrictions Involved in Implementing Very Large Memory
G.1.1?Implementing VLM on 32-bit Linux
With 32-bit architectures, VLM is accessed through a VLM window of a specific size. The VLM window is a data structure in the process address space that provides access to the whole virtual address space from a window of a specific size. On 32-bit Linux, you must set the parameter USE_INDIRECT_DATA_BUFFERS=TRUE, and mount a shmfs or tmpfs or ramfs type of in-memory filesystem over /dev/shm to increase the usable address space.
G.1.2 Prerequisites for Implementing VLM
The following are some of the prerequisites for implementing VLM on a 32-bit operating system:
- The computer on which Oracle Database is installed must have more than 4GB of memory.
- The computer must be configured to use a kernel with PAE support upon startup.
- The USE_INDIRECT_DATA_BUFFERS=TRUE parameter must be present in the initialization parameter file for the database instance that uses VLM support.
- Initialization parameters DB_BLOCK_BUFFERS and DB_BLOCK_SIZE must be set to values you have chosen for the Oracle Database.
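The first two prerequisites can be checked from the shell. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo and /proc/cpuinfo. The helper names mem_total_kb, has_pae and more_than_4gb are hypothetical and are not part of any Oracle tooling. Note that the pae CPU flag only shows that the processor supports PAE; whether the running kernel was built with PAE support has to be confirmed separately.

```shell
#!/bin/bash
# Hypothetical helpers (not Oracle tooling) for the memory and PAE prerequisites.

# Print MemTotal in KB from a meminfo-style file (default: /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}" 2>/dev/null
}

# Succeed if the CPU flags in a cpuinfo-style file (default: /proc/cpuinfo)
# include the word "pae". This reflects hardware capability only.
has_pae() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw pae
}

# Succeed if the given amount of memory in KB is more than 4 GB.
more_than_4gb() {
    [ "$1" -gt $((4 * 1024 * 1024)) ]
}

total_kb=$(mem_total_kb)
if more_than_4gb "${total_kb:-0}"; then
    echo "Memory: ${total_kb} KB (more than 4 GB)"
else
    echo "Memory: ${total_kb:-unknown} KB (4 GB or less)"
fi

if has_pae; then
    echo "CPU reports the pae flag"
else
    echo "CPU does not report the pae flag"
fi
```

Both file-reading helpers accept an alternative file path as their first argument, which makes it possible to try them against a saved copy of /proc/meminfo or /proc/cpuinfo from another server.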
G.1.3 Methods To Increase SGA Limits
In a typical 32-bit Linux kernel, one can create an SGA of up to 2.4GB. Using a Linux Hugemem kernel enables the creation of an SGA of up to 3.2GB. To go beyond 3.2GB on a 32-bit kernel, you must use the VLM feature.
The following are the methods to increase SGA limits on a 32-bit computer:
- Hugemem Kernel
- Hugemem Kernel with Very Large Memory
G.1.3.1 Hugemem Kernel
Red Hat Enterprise Linux 4 and Oracle Linux 4 include a new kernel known as the Hugemem kernel. The Hugemem kernel feature is also called a 4GB-4GB Split Kernel as it supports a 4GB per process user space (versus 3GB for the other kernels), and a 4GB direct kernel space. Using this kernel enables RHEL 4/Oracle Linux 4 to run on systems with up to 64GB of main memory. The Hugemem kernel is required to use all the memory in system configurations containing more than 16GB of memory. The Hugemem kernel can run configurations with less memory.
A classic 32-bit 4GB virtual address space is split 3GB for user processes and 1GB for the kernel. The new scheme (4GB/4GB) permits 4GB of virtual address space for the kernel and almost 4GB for each user process. Due to this scheme with hugemem kernel, 3.2GB of SGA can be created without using the indirect data buffer method.
Note:
Red Hat Enterprise Linux 5/Oracle Linux 5 and Red Hat Enterprise Linux 6/Oracle Linux 6 on 32-bit do not have the hugemem kernel. They support only the 3GB user process/1GB kernel split. They have a PAE kernel that supports systems with more than 4GB of RAM, and reliably up to 16GB. Because of the 3GB/1GB split, the system may run out of lowmem if the system's load consumes lots of lowmem. There is no equivalent of the hugemem kernel in Enterprise Linux 5, so the recommendation is either to use Enterprise Linux 4 with hugemem or to move to 64-bit.
On large computers, the better stability of the Hugemem kernel outweighs the performance overhead of address space switching.
Run the following command to determine if you are using the Hugemem kernel:
$ uname -r
2.6.9-5.0.3.ELhugemem
G.1.3.2 Hugemem Kernel with Very Large Memory
If you use only Hugemem kernels on 32-bit systems, then the SGA size can be increased but not significantly. Refer to the section "Hugemem Kernel" for more information.
Note:
Red Hat Enterprise Linux 5/Oracle Linux 5 and Red Hat Enterprise Linux 6/Oracle Linux 6 do not support the hugemem kernel. They support a PAE kernel that can be used to implement Very Large Memory (VLM) as long as the physical memory does not exceed 16GB.
This section shows how the SGA can be significantly increased by using Hugemem kernel with VLM on 32-bit systems.
The SGA can be increased to about 62GB (depending on block size) on a 32-bit system with 64GB RAM. A processor feature called Physical Address Extension (PAE) permits you to physically address 64GB of RAM. Since PAE does not enable a process or program to either address more than 4GB directly, or have a virtual address space larger than 4GB, a process cannot attach to shared memory directly. To address this issue, a shared memory filesystem (memory-based filesystem) must be created which can be as large as the maximum allowable virtual memory supported by the kernel. With a shared memory filesystem, processes can dynamically attach to regions of the filesystem, allowing applications like Oracle to have virtually a much larger shared memory on 32-bit operating systems. This is not an issue on 64-bit operating systems.
VLM moves the database buffer cache part of the SGA from the System V shared memory to the shared memory filesystem. It is still considered one large SGA but it consists now of two different operating system shared memory entities. VLM uses 512MB of the non-buffer cache SGA to manage VLM. This memory area is needed for mapping the indirect data buffers (shared memory filesystem buffers) into the process address space since a process cannot attach to more than 4GB directly on a 32-bit system.
Note:
USE_INDIRECT_DATA_BUFFERS=TRUE must be present in the initialization parameter file for the database instance that uses Very Large Memory support. If this parameter is not set, then Oracle Database 11g Release 2 (11.2) or later behaves in the same way as previous releases.
You must also manually set the initialization parameters DB_BLOCK_BUFFERS and SHARED_POOL_SIZE to values you have chosen for an Oracle Database. Automatic Memory Management (AMM) cannot be used. The initialization parameter DB_BLOCK_SIZE sets the block size and in combination with DB_BLOCK_BUFFERS determines the buffer cache size for an instance.
For example, if the non-buffer cache SGA is 2.5GB, then you will only have 2GB of non-buffer cache SGA for shared pool, large pool, and redo log buffer since 512MB is used for managing VLM. It is not recommended to use VLM if buffer cache size is less than 512MB.
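The sizing rules above reduce to two small calculations, sketched here with shell integer arithmetic. The function names and the DB_BLOCK_BUFFERS and DB_BLOCK_SIZE values are assumptions chosen only for illustration.

```shell
#!/bin/bash
# Illustrative arithmetic only; the parameter values below are assumptions.

# Buffer cache size in MB = DB_BLOCK_BUFFERS * DB_BLOCK_SIZE (bytes) / 1 MB.
# Usage: buffer_cache_mb DB_BLOCK_BUFFERS DB_BLOCK_SIZE
buffer_cache_mb() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

# Non-buffer cache SGA (in MB) left for the shared pool, large pool, and
# redo log buffer after 512 MB is set aside for managing VLM.
# Usage: usable_non_buffer_sga_mb NON_BUFFER_SGA_MB
usable_non_buffer_sga_mb() {
    echo $(( $1 - 512 ))
}

db_block_buffers=2621440   # assumed example value
db_block_size=8192         # assumed example value (8 KB blocks)

echo "Buffer cache: $(buffer_cache_mb "$db_block_buffers" "$db_block_size") MB"
echo "Usable non-buffer cache SGA: $(usable_non_buffer_sga_mb 2560) MB"
```

With these assumed values the buffer cache comes to 20480 MB (20GB), and a 2.5GB (2560 MB) non-buffer cache SGA leaves 2048 MB, which matches the 2GB figure in the example above.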
In RHEL 4/ Oracle Linux 4 there are two different memory file systems that can be used for VLM:
- tmpfs or shmfs: mount a shmfs with a certain size to /dev/shm, and set the correct permissions. For tmpfs you do not need to specify a size. Memory allocated by tmpfs or shmfs is pageable.
For example:
Example Mount shmfs:
# mount -t shm shmfs -o size=20g /dev/shm
Edit /etc/fstab:
shmfs /dev/shm shm size=20g 0 0
OR
Example Mount tmpfs:
# mount -t tmpfs tmpfs /dev/shm
Edit /etc/fstab:
none /dev/shm tmpfs defaults 0 0
- ramfs: ramfs is similar to shmfs, except that pages are not pageable or swappable. This approach provides the commonly desired effect. ramfs is created by:
umount /dev/shm
mount -t ramfs ramfs /dev/shm
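To see which of these file systems is currently mounted on /dev/shm, the mount table can be read directly. The sketch below assumes a Linux /proc/mounts file in the usual "device mountpoint fstype options" layout; fs_type_of and describe_shm_fs are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: print the filesystem type mounted on a mount point,
# reading a mounts-style file (default: /proc/mounts).
# If several file systems are stacked on the mount point, the last one wins.
# Usage: fs_type_of MOUNT_POINT [MOUNTS_FILE]
fs_type_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/mounts}" 2>/dev/null
}

# Describe the type according to the distinction drawn above.
# Usage: describe_shm_fs FSTYPE
describe_shm_fs() {
    case "$1" in
        ramfs)      echo "ramfs: not pageable or swappable" ;;
        tmpfs|shm)  echo "$1: pageable" ;;
        "")         echo "nothing mounted" ;;
        *)          echo "$1: not one of the VLM memory file systems" ;;
    esac
}

describe_shm_fs "$(fs_type_of /dev/shm)"
```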
G.1.4 Configuring Very Large Memory for Oracle Database
Complete the following procedure to configure Very Large Memory on Red Hat Enterprise Linux 4/Oracle Linux 4 using ramfs:
Log in as a root user:
sudo -sh
Password:
Edit the /etc/rc.local file and add the following entries to it to configure the computer to mount ramfs over the /dev/shm directory, whenever you start the computer:
umount /dev/shm
mount -t ramfs ramfs /dev/shm
chown oracle:oinstall /dev/shm
In the preceding commands, oracle is the owner of Oracle software files and oinstall is the group for the Oracle owner account. If the new configuration disables the /etc/rc.local file, or you start an instance of Oracle database using a Linux service script present under the /etc/init.d directory, then you can add those entries in the service script too.
Note, this configuration will make ramfs ready even before your system autostarts crucial Oracle Database instances. The commands can also be included in your startup scripts. It is important that you test the commands extensively by repeated restart action, after you complete configuring the computer using the following steps:
Restart the server.
Log in as a root user.
Run the following command to check if the /dev/shm directory is mounted with the ramfs type:
# mount | grep shm
ramfs on /dev/shm type ramfs (rw)
Run the following command to check the permissions on the /dev/shm directory:
# ls -ld /dev/shm
drwxr-xr-x 3 oracle oinstall 0 Jan 13 12:12 /dev/shm
Edit the /etc/security/limits.conf file and add the following entries to it to increase the max locked memory limit:
* soft memlock 3145728
* hard memlock 3145728
Switch to the oracle user:
# su - oracle
Run the following command to check the max locked memory limit:
$ ulimit -l
3145728
Complete the following procedure to configure instance parameters for Very Large Memory:
Replace the DB_CACHE_SIZE, DB_xK_CACHE_SIZE, sga_target, and memory_target parameters with DB_BLOCK_BUFFERS parameter.
Add the USE_INDIRECT_DATA_BUFFERS=TRUE parameter.
Configure SGA size according to the SGA requirements.
Remove SGA_TARGET, MEMORY_TARGET, or MEMORY_MAX_TARGET parameters, if set.
Start the database instance.
Run the following commands to check the memory allocation:
$ ls -l /dev/shm
$ ipcs -m
See Also:
"Configuring HugePages on Linux" section for more information about HugePages.
G.1.5 Restrictions Involved in Implementing Very Large Memory
Following are the limitations of running a computer in the Very Large Memory mode:
- You cannot use Automatic Memory Management (AMM) while implementing VLM using ramfs, because AMM works on dynamic SGA tuning. With AMM swapping is possible. For example, you can unmap the unused SGA space and map it to PGA. Dynamic SGA and multiple block size are not supported with Very Large Memory because ramfs is not swappable. To enable Very Large Memory, you must ensure that you set the value of MEMORY_TARGET to zero.
- VLM can be implemented only if Database Buffer Cache size is greater than 512MB.
G.2 Overview of HugePages
HugePages is a feature integrated into the Linux kernel 2.6. Enabling HugePages makes it possible for the operating system to support memory pages greater than the default (usually 4KB). Using very large page sizes can improve system performance by reducing the amount of system resources required to access page table entries. HugePages is useful for both 32-bit and 64-bit configurations. HugePage sizes vary from 2MB to 256MB, depending on the kernel version and the hardware architecture. For Oracle Databases, using HugePages reduces the operating system maintenance of page states, and increases Translation Lookaside Buffer (TLB) hit ratio.
This section includes the following topics:
- Tuning SGA With HugePages
- Configuring HugePages on Linux
- Restrictions for HugePages Configurations
G.2.1 Tuning SGA With HugePages
Without HugePages, the operating system keeps each 4KB of memory as a page, and when it is allocated to the SGA, then the lifecycle of that page (dirty, free, mapped to a process, and so on) is kept up to date by the operating system kernel.
With HugePages, the operating system page table (virtual memory to physical memory mapping) is smaller, since each page table entry points to a page of 2MB to 256MB. Also, the kernel has fewer pages whose lifecycle must be monitored.
Note:
2MB size of HugePages is available with Linux x86-64, Linux x86, and IBM: Linux on System z.
The following are the advantages of using HugePages:
- Increased performance through increased TLB hits.
- Pages are locked in memory and are never swapped out, which guarantees that shared memory like SGA remains in RAM.
- Contiguous pages are preallocated and cannot be used for anything else but for System V shared memory (for example, SGA).
- Less bookkeeping work for the kernel for that part of virtual memory due to larger page sizes.
G.2.2 Configuring HugePages on Linux
Complete the following steps to configure HugePages on the computer:
Edit the memlock setting in the /etc/security/limits.conf file. The memlock setting is specified in KB and is set slightly less than the installed RAM. For example, if you have 64GB RAM installed, add the following entries to increase the max locked memory limit:
* soft memlock 60397977
* hard memlock 60397977
You can also set the memlock value higher than your SGA requirements.
Log in as the oracle user again and run the ulimit -l command to verify the new memlock setting:
$ ulimit -l
60397977
Run the following command to display the value of Hugepagesize variable:
$ grep Hugepagesize /proc/meminfo
Complete the following procedure to create a script that computes recommended values for hugepages configuration for the current shared memory segments:
Note:
Following is an example that may require modifications.
Create a text file named hugepages_settings.sh.
Add the following content in the file:
#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`
# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk {'print $2'}`
# Start from 1 pages to be on the safe side and guarantee 1 free HugePage
NUM_PG=1
# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | awk {'print $5'} | grep "[0-9][0-9]*"`
do
   MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
   if [ $MIN_PG -gt 0 ]; then
      NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
   fi
done
# Finish with results
case $KERN in
   '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
          echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
   '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
    *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac
# End
Run the following command to change the permission of the file:
$ chmod +x hugepages_settings.sh
Run the hugepages_settings.sh script to compute the values for hugepages configuration:
$ ./hugepages_settings.sh
Set the following kernel parameter:
# sysctl -w vm.nr_hugepages=value_displayed_in_step_5
To make the value of the parameter available every time you restart the computer, edit the /etc/sysctl.conf file and add the following entry:
vm.nr_hugepages=value_displayed_in_step_5
Restart the server.
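The per-segment arithmetic that hugepages_settings.sh performs can be traced by hand. The sketch below re-implements only that calculation, using shell integer arithmetic instead of bc and taking the segment sizes as arguments instead of reading them from ipcs -m. The function name recommended_nr_hugepages and the segment sizes are assumptions chosen for illustration.

```shell
#!/bin/bash
# Re-trace the arithmetic of hugepages_settings.sh for given segment sizes.
# Usage: recommended_nr_hugepages HUGEPAGE_SIZE_KB [SEGMENT_BYTES...]
recommended_nr_hugepages() {
    local hpg_sz_kb=$1 num_pg=1 seg_bytes min_pg
    shift
    for seg_bytes in "$@"; do
        # Whole HugePages that fit in the segment (integer division).
        min_pg=$(( seg_bytes / (hpg_sz_kb * 1024) ))
        if [ "$min_pg" -gt 0 ]; then
            # As in the script, add one extra page per counted segment.
            num_pg=$(( num_pg + min_pg + 1 ))
        fi
    done
    echo "$num_pg"
}

# Assumed example: 2048 KB HugePages, one 4 GB segment and one 1 MB segment.
recommended_nr_hugepages 2048 4294967296 1048576   # prints 2050
```

The 4GB segment contributes 2048 pages plus one extra, on top of the initial page, giving 2050. The 1MB segment is smaller than one HugePage, so the script's test skips it.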
Note:
To check the available hugepages, run the following command:
$ grep Huge /proc/meminfo
G.2.3 Restrictions for HugePages Configurations
Following are the limitations of using HugePages:
- Automatic Memory Management (AMM) and HugePages are not compatible. When you use AMM, the entire SGA memory is allocated by creating files under /dev/shm. When Oracle Database allocates SGA with AMM, HugePages are not reserved. To use HugePages on Oracle Database 12c, you must disable AMM.
- If you are using VLM in a 32-bit environment, then you cannot use HugePages for the Database Buffer cache. You can use HugePages for other parts of the SGA, such as shared_pool, large_pool, and so on. Memory allocation for VLM (buffer cache) is done using shared memory file systems (ramfs/tmpfs/shmfs). Memory file systems do not reserve or use HugePages.
- HugePages are not subject to allocation or release after system startup, unless a system administrator changes the HugePages configuration, either by modifying the number of pages available, or by modifying the pool size. If the space required is not reserved in memory during system startup, then HugePages allocation fails.
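Whether the kernel actually reserved the requested pool can be checked by comparing the value written to /etc/sysctl.conf with HugePages_Total in /proc/meminfo. The following is a minimal sketch under that assumption; meminfo_field and check_pool are hypothetical helper names, and the sample figures are made up so that the example is self-contained.

```shell
#!/bin/bash
# Hypothetical helper: print one numeric field from a meminfo-style file.
# Usage: meminfo_field NAME [FILE]
meminfo_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}" 2>/dev/null
}

# Compare the requested pool size with what the kernel actually reserved.
# Returns 0 if the pool is at least as large as requested, 1 otherwise.
# Usage: check_pool REQUESTED_PAGES [FILE]
check_pool() {
    local requested=$1 total
    total=$(meminfo_field HugePages_Total "${2:-/proc/meminfo}")
    total=${total:-0}
    if [ "$total" -ge "$requested" ]; then
        echo "ok: $total of $requested HugePages reserved"
    else
        echo "short: only $total of $requested HugePages reserved"
        return 1
    fi
}

# Self-contained demonstration with made-up sample data.
sample=$(mktemp)
printf 'HugePages_Total:    1024\nHugePages_Free:     1024\nHugepagesize:       2048 kB\n' > "$sample"
check_pool 2050 "$sample" || echo "the pool is smaller than requested"
rm -f "$sample"
```

On a live system the first argument would be the vm.nr_hugepages value placed in /etc/sysctl.conf, and the file argument can be omitted so that /proc/meminfo is read.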
12cR1
G HugePages
This chapter provides an overview of HugePages and guides Linux system administrators to configure HugePages on Linux.
G.1?Overview of HugePages
HugePages is a feature integrated into the Linux kernel 2.6. Enabling HugePages makes it possible for the operating system to support memory pages greater than the default (usually 4 KB). Using very large page sizes can improve system performance by reducing the amount of system resources required to access page table entries. HugePages is useful for both 32-bit and 64-bit configurations. HugePage sizes vary from 2 MB to 256 MB, depending on the kernel version and the hardware architecture. For Oracle Databases, using HugePages reduces the operating system maintenance of page states, and increases Translation Lookaside Buffer (TLB) hit ratio.
Note:
Transparent HugePages is currently not an alternative to manually configuring HugePages.
This section includes the following topics:
- Tuning SGA With HugePages
- Configuring HugePages on Linux
- Restrictions for HugePages Configurations
- Disabling Transparent HugePages
G.1.1 Tuning SGA With HugePages
Without HugePages, the operating system keeps each 4 KB of memory as a page. When it allocates pages to the database System Global Area (SGA), the operating system kernel must continually update its page table with the page lifecycle (dirty, free, mapped to a process, and so on) for each 4 KB page allocated to the SGA.
With HugePages, the operating system page table (virtual memory to physical memory mapping) is smaller, because each page table entry is pointing to pages from 2 MB to 256 MB.
Also, the kernel has fewer pages whose lifecycle must be monitored. For example, if you use HugePages with 64-bit hardware, and you want to map 256 MB of memory, you may need one page table entry (PTE). If you do not use HugePages, and you want to map 256 MB of memory, then you must have 256 MB * 1024 KB/4 KB = 65536 PTEs.
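The page table arithmetic above can be reproduced for any page size. This is a small sketch using shell integer arithmetic; the function name pte_count is a hypothetical helper, and the 2 MB case is added for comparison.

```shell
#!/bin/bash
# Number of pages, and therefore page table entries, needed to map a region.
# Usage: pte_count REGION_MB PAGE_KB
pte_count() {
    echo $(( $1 * 1024 / $2 ))
}

echo "4 KB pages:   $(pte_count 256 4) PTEs"        # 65536
echo "2 MB pages:   $(pte_count 256 2048) PTEs"     # 128
echo "256 MB pages: $(pte_count 256 262144) PTEs"   # 1
```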
HugePages provides the following advantages:
- Increased performance through increased TLB hits
- Pages are locked in memory and never swapped out, which provides RAM for shared memory structures such as SGA
- Contiguous pages are preallocated and cannot be used for anything else but for System V shared memory (for example, SGA)
- Less bookkeeping work for the kernel for that part of virtual memory because of larger page sizes
G.1.2 Configuring HugePages on Linux
Complete the following steps to configure HugePages on the computer:
Run the following command to determine if the kernel supports HugePages:
$ grep Huge /proc/meminfo
Some Linux systems do not support HugePages by default. For such systems, build the Linux kernel using the CONFIG_HUGETLBFS and CONFIG_HUGETLB_PAGE configuration options. CONFIG_HUGETLBFS is located under File Systems and CONFIG_HUGETLB_PAGE is selected when you select CONFIG_HUGETLBFS.
Edit the memlock setting in the /etc/security/limits.conf file. The memlock setting is specified in KB, and the maximum locked memory limit should be set to at least 90 percent of the current RAM when HugePages memory is enabled and at least 3145728 KB (3 GB) when HugePages memory is disabled. For example, if you have 64 GB RAM installed, then add the following entries to increase the maximum locked-in-memory address space:
* soft memlock 60397977
* hard memlock 60397977
You can also set the memlock value higher than your SGA requirements.
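The value 60397977 in the entries above is 90 percent of 64 GB expressed in KB, rounded down. A minimal sketch of that calculation follows; the function name memlock_kb_for_ram is a hypothetical helper.

```shell
#!/bin/bash
# memlock value in KB as 90 percent of RAM, rounded down.
# Usage: memlock_kb_for_ram RAM_KB
memlock_kb_for_ram() {
    echo $(( $1 * 90 / 100 ))
}

ram_kb=$(( 64 * 1024 * 1024 ))   # 64 GB expressed in KB = 67108864
memlock_kb_for_ram "$ram_kb"     # prints 60397977
```

On a live system the RAM figure could be taken from the MemTotal line of /proc/meminfo, which is typically somewhat lower than the installed RAM because the kernel reserves part of it.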
Log in as oracle user again and run the ulimit -l command to verify the new memlock setting:
$ ulimit -l
60397977
Run the following command to display the value of Hugepagesize variable:
$ grep Hugepagesize /proc/meminfo
Complete the following procedure to create a script that computes recommended values for hugepages configuration for the current shared memory segments:
Create a text file named hugepages_settings.sh.
Add the following content in the file:
#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`
# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk {'print $2'}`
# Start from 1 pages to be on the safe side and guarantee 1 free HugePage
NUM_PG=1
# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | awk {'print $5'} | grep "[0-9][0-9]*"`
do
   MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
   if [ $MIN_PG -gt 0 ]; then
      NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
   fi
done
# Finish with results
case $KERN in
   '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
          echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
   '2.6'|'3.8') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
    *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac
# End
Run the following command to change the permission of the file:
$ chmod +x hugepages_settings.sh
Run the hugepages_settings.sh script to compute the values for hugepages configuration:
$ ./hugepages_settings.sh
Note:
Before running this script, ensure that all the applications that use hugepages are running.
Set the following kernel parameter, where value is the HugePages value that you determined in step 7:
# sysctl -w vm.nr_hugepages=value
To ensure that HugePages is allocated after system restarts, add the following entry to the /etc/sysctl.conf file, where value is the HugePages value that you determined in step 7:
vm.nr_hugepages=value
Run the following command to check the available hugepages:
$ grep Huge /proc/meminfo
Restart the instance.
Run the following command to check the available hugepages (1 or 2 pages free):
$ grep Huge /proc/meminfo
Note:
If you cannot set your HugePages allocation using nr_hugepages, then your available memory may be fragmented. Restart your server for the HugePages allocation to take effect.
G.1.3 Restrictions for HugePages Configurations
HugePages has the following limitations:
- You must unset both the MEMORY_TARGET and MEMORY_MAX_TARGET initialization parameters. For example, to unset the parameters for the database instance, use the command ALTER SYSTEM RESET.
- Automatic Memory Management (AMM) and HugePages are not compatible. When you use AMM, the entire SGA memory is allocated by creating files under /dev/shm. When Oracle Database allocates SGA with AMM, HugePages are not reserved. To use HugePages on Oracle Database 12c, you must disable AMM.
- If you are using VLM in a 32-bit environment, then you cannot use HugePages for the Database Buffer cache. You can use HugePages for other parts of the SGA, such as shared_pool, large_pool, and so on. Memory allocation for VLM (buffer cache) is done using shared memory file systems (ramfs/tmpfs/shmfs). Memory file systems do not reserve or use HugePages.
- HugePages are not subject to allocation or release after system startup, unless a system administrator changes the HugePages configuration, either by modifying the number of pages available, or by modifying the pool size. If the space required is not reserved in memory during system startup, then HugePages allocation fails.
- Ensure that HugePages is configured properly as the system may run out of memory if excess HugePages is not used by the application.
- If there is insufficient HugePages when an instance starts and the initialization parameter use_large_pages is set to only, then the database fails to start and an alert log message provides the necessary information on HugePages.
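A rough pre-start check for the last restriction is to compare the number of free HugePages with the number needed to hold the SGA. The sketch below is an approximation only: it assumes the SGA size is known in bytes, ignores reserved pages (HugePages_Rsvd), and does not model how the database splits the SGA into shared memory segments. The helper names and the 10 GB and 4096-page figures are made up for illustration.

```shell
#!/bin/bash
# HugePages needed to hold a region, rounded up to whole pages.
# Usage: hugepages_needed SGA_BYTES HUGEPAGE_SIZE_KB
hugepages_needed() {
    local page_bytes=$(( $2 * 1024 ))
    echo $(( ($1 + page_bytes - 1) / page_bytes ))
}

# Succeed if the free pages cover the region.
# Usage: enough_free_hugepages SGA_BYTES HUGEPAGE_SIZE_KB FREE_PAGES
enough_free_hugepages() {
    [ "$3" -ge "$(hugepages_needed "$1" "$2")" ]
}

sga_bytes=$(( 10 * 1024 * 1024 * 1024 ))   # assumed 10 GB SGA
echo "Pages needed: $(hugepages_needed "$sga_bytes" 2048)"   # 5120

# Assumed HugePages_Free value of 4096, as reported by /proc/meminfo.
if enough_free_hugepages "$sga_bytes" 2048 4096; then
    echo "enough free HugePages"
else
    echo "too few free HugePages for this SGA"
fi
```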
G.1.4 Disabling Transparent HugePages
Transparent HugePages memory is enabled by default with Red Hat Enterprise Linux 6, SUSE 11, and Oracle Linux 6 with earlier releases of Oracle Linux Unbreakable Enterprise Kernel 2 (UEK2) kernels. Transparent HugePages memory is disabled by default in later releases of UEK2 kernels.
Transparent HugePages can cause memory allocation delays at runtime. To avoid performance issues, Oracle recommends that you disable Transparent HugePages on all Oracle Database servers. Oracle recommends that you instead use standard HugePages for enhanced performance.
Transparent HugePages memory differs from standard HugePages memory because the kernel khugepaged thread allocates memory dynamically during runtime. Standard HugePages memory is pre-allocated at startup, and does not change during runtime.
To check if Transparent HugePages is enabled, run one of the following commands as the root user:
Red Hat Enterprise Linux kernels:
# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
Other kernels:
# cat /sys/kernel/mm/transparent_hugepage/enabled
The following is a sample output that shows Transparent HugePages is being used as the [always] flag is enabled.
[always] never
Note:
If Transparent HugePages is removed from the kernel then the /sys/kernel/mm/transparent_hugepage or /sys/kernel/mm/redhat_transparent_hugepage files do not exist.
To disable Transparent HugePages perform the following steps:
Add the following entry to the kernel boot line in the /etc/grub.conf file:
transparent_hugepage=never
For example:
title Oracle Linux Server (2.6.32-300.25.1.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.25.1.el6uek.x86_64 ro root=LABEL=/ transparent_hugepage=never
        initrd /initramfs-2.6.32-300.25.1.el6uek.x86_64.img
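After the server is restarted with the new boot line, the active Transparent HugePages mode can be read back from the same files used in the check earlier in this section. In those files the active mode is the word in square brackets. The following is a minimal sketch; thp_active_mode is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: extract the active mode (the bracketed word) from the
# contents of a transparent_hugepage "enabled" file, e.g. "[always] never".
# Usage: thp_active_mode "FILE CONTENTS"
thp_active_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' <<< "$1"
}

# Look in both locations and report the first one that exists.
# Nothing is printed if Transparent HugePages is not present in the kernel.
for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_active_mode "$(cat "$f")")"
        break
    fi
done
```

A reported mode of never indicates that the boot parameter took effect.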
Source: ITPUB blog, link: http://blog.itpub.net/26736162/viewspace-2134314/. If you repost this article, please credit the source; otherwise legal liability will be pursued.