openstack: the truth about its logical architecture
Reposted from: http://blog.csdn.net/u010305706/article/details/52206175
Don't assume you really understand OpenStack! And before you start flaming me: I never claimed to really understand OpenStack either.
I genuinely wanted to understand OpenStack, but where do you start? As a programmer, the first thought is of course the code. Code talks: everything else can be hand-waved, but code is real. Besides, I had read the Lucene and Hadoop source code in depth before, so I assumed that starting from the code, the principles behind it would become clear.
So I got to work. The way I like to read code is by scenario: in Lucene, tracing the indexing path and the search path; in Hadoop, tracing a file write and a Map-Reduce job. So in OpenStack I decided to trace the whole process of creating a virtual machine.
Fortunately many people have walked this path before, so I figured it would not be that hard.
I especially recommend Request Flow for Provisioning Instance in Openstack (http://ilearnstack.com/2013/04/26/request-flow-for-provisioning-instance-in-openstack/); if it is blocked for you, I reposted it as [Repost] Request Flow for Provisioning Instance in Openstack.
Once the journey actually began, however, I found that OpenStack involves far more than Python code; a lot of surrounding knowledge is needed to understand it.
The OpenStack community is powerful, with experts from every school showing off their moves and constantly contributing all kinds of plugins and modules:
- Many modules: besides the basic IaaS components (keystone, nova, glance, neutron, cinder), many people want to add new modules to OpenStack, and they spring up like mushrooms: Telemetry (Ceilometer), Orchestration (Heat), Database Service (Trove), Data processing (Sahara), Bare metal (Ironic), Queue service (Marconi), Key management (Barbican), DNS Services (Designate), Deployment (TripleO)... far too many to study. So let me rein in my ambitions and focus on the IaaS layer; this article does not cover those modules.
- Many plugins: besides the open-source plugins that OpenStack supports, every vendor rushes to develop its own plugin, as if afraid of being shut out of the OpenStack community. I have no budget to buy and test so many vendors' devices and plugins, so I only use the free, open-source defaults such as KVM, LVM, and Open vSwitch; this article does not cover vendor plugins either.
All right, I have shrunk the scope as far as it will go. Now let's begin the journey of creating a virtual machine in OpenStack.
I learn greedily: whatever knowledge point I meet first in a step, I study right there. The same point may appear in other modules later, and by then it will already have been visited. I am not too greedy, though: when something is overly convoluted or obscure, I prune it without hesitation.
1. Keystone
Step 1: Any client that wants to access any service must first obtain a Token from Keystone.
Remember when the Token used to be a short UUID, for example "aef56cc3d1c9192b0257fba1a420fc37"?
Later it turned into a long string of who-knows-what:
MIIDsAYJKoZIhvcNAQcCoIIDoTCCA50CAQExCTAHBgUrDgMCGjCCAokGCSqGSIb3DQEHAaCCAnoEggJ2ew0KICAgICJhY2Nlc3MiOiB7DQogICAgICAgICJtZXRhZGF0YSI6IHsNCiAgICAgICAgICAgIC4uLi5tZXRhZGF0YSBnb2VzIGhlcmUuLi4uDQogICAgICAgIH0sDQogICAgICAgICJzZXJ2aWNlQ2F0YWxvZyI6IFsNCiAgICAgICAgICAgIC4uLi5lbmRwb2ludHMgZ29lcyBoZXJlLi4uLg0KICAgICAgICBdLA0KICAgICAgICAidG9rZW4iOiB7DQogICAgICAgICAgICAiZXhwaXJlcyI6ICIyMDEzLTA1LTI2VDA4OjUyOjUzWiIsDQogICAgICAgICAgICAiaWQiOiAicGxhY2Vob2xkZXIiLA0KICAgICAgICAgICAgImlzc3VlZF9hdCI6ICIyMDEzLTA1LTI1VDE4OjU5OjMzLjg0MTgxMSIsDQogICAgICAgICAgICAidGVuYW50Ijogew0KICAgICAgICAgICAgICAgICJkZXNjcmlwdGlvbiI6IG51bGwsDQogICAgICAgICAgICAgICAgImVuYWJsZWQiOiB0cnVlLA0KICAgICAgICAgICAgICAgICJpZCI6ICI5MjVjMjNlYWZlMWI0NzYzOTMzZTA4YTRjNDE0M2YwOCIsDQogICAgICAgICAgICAgICAgIm5hbWUiOiAidXNlciINCiAgICAgICAgICAgIH0NCiAgICAgICAgfSwNCiAgICAgICAgInVzZXIiOiB7DQogICAgICAgICAgICAuLi4udXNlcmRhdGEgZ29lcyBoZXJlLi4uLg0KICAgICAgICB9DQogICAgfQ0KfQ0KMYH/MIH8AgEBMFwwVzELMAkGA1UEBhMCVVMxDjAMBgNVBAgTBVVuc2V0MQ4wDAYDVQQHEwVVbnNldDEOMAwGA1UEChMFVW5zZXQxGDAWBgNVBAMTD3d3dy5leGFtcGxlLmNvbQIBATAHBgUrDgMCGjANBgkqhkiG9w0BAQEFAASBgEh2P5cHMwelQyzB4dZ0FAjtp5ep4Id1RRs7oiD1lYrkahJwfuakBK7OGTwx26C+0IPPAGLEnin9Bx5Vm4cst/0+COTEh6qZfJFCLUDj5b4EF7r0iosFscpnfCuc8jGMobyfApz/dZqJnsk4lt1ahlNTpXQeVFxNK/ydKL+tzEjg
What is inside it? It cannot just be random garbage.
That led me to this article:
Understanding OpenStack Authentication: Keystone PKI (https://www.mirantis.com/blog/understanding-openstack-authentication-keystone-pki/)
[Repost] Understanding OpenStack Authentication: Keystone PKI
Only then did I understand the process.
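To make step 1 concrete, here is a minimal sketch (my own, not from the original article) of how a client might request a token from the legacy Keystone v2.0 API using the Python requests library; the controller URL, tenant, and credentials are placeholders:

import json
import requests

# Placeholder endpoint and credentials: adjust to your own deployment.
KEYSTONE_URL = "http://controller:5000/v2.0/tokens"
payload = {
    "auth": {
        "tenantName": "demo",
        "passwordCredentials": {"username": "demo", "password": "secret"},
    }
}
resp = requests.post(KEYSTONE_URL, data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
body = resp.json()
token = body["access"]["token"]["id"]        # with PKI this is the long CMS blob shown above
catalog = body["access"]["serviceCatalog"]   # endpoints of nova, glance, neutron, ...
print(token[:32], len(catalog))

Every later call to nova, glance, neutron, or cinder then carries this token in the X-Auth-Token header.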
Understanding that diagram really requires some background in information security; all the CAs, Certificates, and Keys can make your head spin.
So I read Chapter 8, Cryptography, of Principles of Information Security (4th ed.) and Information Security: Principles and Practice, and things started to make sense.
The following blog post also describes the related concepts in a very visual way:
How digital certificates work (http://blog.sina.com.cn/s/blog_44ee37cd01016r1h.html)
These concepts are prerequisites for understanding SSL and HTTPS, and when deploying OpenStack it is best to put all services behind HTTPS as well.
These two articles will help you understand SSL better:
http://httpd.apache.org/docs/2.2/ssl/ssl_intro.html
http://www.codeproject.com/Articles/326574/An-Introduction-to-Mutual-SSL-Authentication
To use SSL, two tools are essential: openssl and certtool. openssl is the more commonly used one, while certtool is the officially recommended tool for configuring remote libvirt connections.
For openssl, I recommend the following link:
http://pages.cs.wisc.edu/~zmiller/ca-howto/ (if it is blocked, see the mirrored "How To Setup a CA")
OpenSSL certificate operations
For certtool, the official libvirt documentation is very concrete and well illustrated:
http://wiki.libvirt.org/page/TLSSetup
Besides authentication, Keystone also handles authorization.
There are many kinds of access control (http://en.wikipedia.org/wiki/Access_control); OpenStack uses Role Based Access Control (RBAC).
In v2, each service carries its own policy.json file, so access-control decisions are made by each service itself. Later, in v3, policies can also be created in the database in addition to policy.json, so Keystone can manage them centrally.
Recommended reading:
Customizing OpenStack RBAC policies
[Repost] Customizing OpenStack RBAC policies
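As a rough illustration (a fragment written in the style of nova's policy.json, not copied from any particular release), each rule maps an API action to a boolean expression over the caller's roles and project:

{
    "admin_or_owner": "is_admin:True or project_id:%(project_id)s",
    "admin_api": "is_admin:True",
    "default": "rule:admin_or_owner",
    "compute:create": "",
    "compute:delete": "rule:admin_or_owner",
    "compute_extension:admin_actions:migrate": "rule:admin_api"
}

An empty string means any authenticated user may call the action; stricter rules are expressed in terms of roles, the project, or other named rules.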
Mandatory Access Control (MAC) is also used in OpenStack: AppArmor controls libvirt's access to files on the host. When you run virsh commands and find that you are root yet still lack permission, AppArmor is most likely the reason.
Recommended: http://ubuntuforums.org/showthread.php?t=1008906
[Repost] Introduction to AppArmor
User management is another major job of Keystone.
In v2 the structure is simple; a single triangle (user, tenant, role) explains it.
Keystone v3 introduces many more concepts and is relatively complex, with little documentation. I recommend the following article:
http://www.florentflament.com/blog/setting-keystone-v3-domains.html
[Repost] Setting Keystone v3 domains
I also drew a diagram to help understand the process:
Keystone v3 domains: usage scenarios
2. nova-api
Step 3: nova-api receives the request
nova-api does not accept requests unconditionally; rate limits must be set. The default implementation lives in the ratelimit middleware.
Sometimes, however, we want distributed rate limiting, and Turnstile is a good choice for that:
https://github.com/klmitch/turnstile
http://pypi.python.org/pypi/turnstile
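The idea behind both the default middleware and Turnstile is essentially a token bucket kept per client or per API class. A minimal, self-contained sketch of the concept (not the actual middleware code):

import time

class TokenBucket(object):
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.last = time.time()

    def allow(self):
        now = time.time()
        # Refill according to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=10, capacity=10)
print(bucket.allow())   # True until the burst is used up

A distributed limiter such as Turnstile keeps the bucket state in a shared store (Redis, in Turnstile's case) so that several API workers enforce one global limit.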
Step 4: Validate the Token
Step 5: Check the Policy
These two steps were already covered in the Keystone section.
Step 6: Check quotas
Nova, Neutron, and Cinder each have their own quotas, which can be managed from the command line:
# nova -h | grep quota
    quota-class-show    List the quotas for a quota class.
    quota-class-update  Update the quotas for a quota class.
    quota-defaults      List the default quotas for a tenant.
    quota-delete        Delete quota for a tenant/user so their quota will
    quota-show          List the quotas for a tenant/user.
    quota-update        Update the quotas for a tenant/user.
# nova quota-show
+-----------------------------+-------+
| Quota                       | Limit |
+-----------------------------+-------+
| instances                   | 10    |
| cores                       | 20    |
| ram                         | 51200 |
| floating_ips                | 10    |
| fixed_ips                   | -1    |
| metadata_items              | 128   |
| injected_files              | 5     |
| injected_file_content_bytes | 10240 |
| injected_file_path_bytes    | 255   |
| key_pairs                   | 100   |
| security_groups             | 10    |
| security_group_rules        | 20    |
+-----------------------------+-------+
# cinder -h | grep quota
    quota-class-show    List the quotas for a quota class.
    quota-class-update  Update the quotas for a quota class.
    quota-defaults      List the default quotas for a tenant.
    quota-show          List the quotas for a tenant.
    quota-update        Update the quotas for a tenant.
    quota-usage         List the quota usage for a tenant.
# cinder quota-show 1779b3bc725b44b98726fb0cbdc617b1
+-----------+-------+
|  Property | Value |
+-----------+-------+
| gigabytes |  1000 |
| snapshots |    10 |
|  volumes  |    10 |
+-----------+-------+
# neutron -h | grep quota
  quota-delete                    Delete defined quotas of a given tenant.
  quota-list                      List quotas of all tenants who have non-default quota values.
  quota-show                      Show quotas of a given tenant
  quota-update                    Define tenant's quotas not to use defaults.
# neutron quota-show 1779b3bc725b44b98726fb0cbdc617b1
+---------------------+-------+
| Field               | Value |
+---------------------+-------+
| floatingip          | 50    |
| network             | 10    |
| port                | 50    |
| router              | 10    |
| security_group      | 10    |
| security_group_rule | 100   |
| subnet              | 10    |
+---------------------+-------+
Recommended reading:
OpenStack Nova basics: Quota (quota management)
http://www.sebastien-han.fr/blog/2012/09/19/openstack-play-with-quota/
Step 7: Create the Instance record in the database
For Nova's database schema, see the following article:
http://www.prestonlee.com/2012/05/03/openstack-nova-essex-mysql-database-schema-diagram-and-sql/
MySQL is one of the most important components in OpenStack, so High Availability is a must in production.
MySQL HA can be achieved in several ways:
http://dev.mysql.com/doc/mysql-ha-scalability/en/index.html
| Requirement | MySQL Replication | MySQL with DRBD with Corosync and Pacemaker | MySQL Cluster |
| Availability | | | |
| Platform Support | All Supported by MySQL Server | Linux | All Supported by MySQL Cluster |
| Automated IP Failover | No | Yes | Depends on Connector and Configuration |
| Automated Database Failover | No | Yes | Yes |
| Automatic Data Resynchronization | No | Yes | Yes |
| Typical Failover Time | User / Script Dependent | Configuration Dependent, 60 seconds and Above | 1 Second and Less |
| Synchronous Replication | No, Asynchronous and Semisynchronous | Yes | Yes |
| Shared Storage | No, Distributed | No, Distributed | No, Distributed |
| Geographic redundancy support | Yes | Yes, via MySQL Replication | Yes, via MySQL Replication |
| Update Schema On-Line | No | No | Yes |
| Scalability | | | |
| Number of Nodes | One Master, Multiple Slaves | One Active (primary), one Passive (secondary) Node | 255 |
| Built-in Load Balancing | Reads, via MySQL Replication | Reads, via MySQL Replication | Yes, Reads and Writes |
| Supports Read-Intensive Workloads | Yes | Yes | Yes |
| Supports Write-Intensive Workloads | Yes, via Application-Level Sharding | Yes, via Application-Level Sharding to Multiple Active/Passive Pairs | Yes, via Auto-Sharding |
| Scale On-Line (add nodes, repartition, etc.) | No | No | Yes |
For a systematic study of MySQL replication, I recommend this book:
《MySQL High Availability Tools for Building Robust Data Centers》
Another option is MySQL + Galera, which gives an Active/Active MySQL setup.
See these two articles:
http://www.sebastien-han.fr/blog/2012/04/08/mysql-galera-cluster-with-haproxy/
http://www.sebastien-han.fr/blog/2012/04/01/mysql-multi-master-replication-with-galera/
Another common HA technology is Pacemaker.
At the bottom is the messaging layer, corosync/openais,
which handles communication between the nodes of the cluster.
Above it is the Resource Allocation Layer, which contains the following components:
CRM (Cluster Resource Manager)
The overall manager; every operation on a resource goes through it. Every machine runs a CRM.
CIB (Cluster Information Base)
The CIB is managed by the CRM. It is an in-memory XML database holding the cluster's configuration and state: nodes, resources, constraints, and their relationships. Any configuration we query comes from the CIB.
DC (Designated Coordinator)
Every node runs a CRM, and one of them is elected DC, the brain of the whole cluster. The CIB controlled by the DC is the master CIB; all the others are replicas.
PE (Policy Engine)
When the DC needs to make a cluster-wide change, the PE first computes the target state from the current state and configuration, and generates the sequence of actions that takes the cluster from the initial state to that target state. The PE runs only on the DC.
LRM (Local Resource Manager)
Manages resources locally: it calls the resource agents to start, stop, and monitor resources, and reports the results back to the CRM.
Above that is the Resource Layer,
which contains the resource agents. A resource agent is usually a shell script used to start, stop, and monitor a resource.
To understand Pacemaker, I recommend reading the SUSE High Availability Guide:
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha/book_sleha.html
I took some notes and ran some experiments; see:
High Availability notes (1): Environment
High Availability notes (2): Architecture
High Availability notes (3): Configuration
Step 8: Build filter_properties for the nova scheduler
Step 9: Send an RPC to nova-conductor
An article about nova-conductor:
http://cloudystuffhappens.blogspot.com/2013/04/understanding-nova-conductor-in.html
In OpenStack, RPC messages are sent through RabbitMQ.
RabbitMQ can be made highly available with Pacemaker, or you can build a native RabbitMQ cluster.
For learning RabbitMQ, the first recommendation is of course RabbitMQ in Action.
I took some notes as well:
RabbitMQ in Action (1): Understanding messaging
RabbitMQ in Action (2): Running and administering Rabbit
RabbitMQ in Action (5): Clustering and dealing with failure
I have not finished the whole book yet, so please bear with me.
As for how OpenStack itself uses RabbitMQ, a very good article is:
Nova source-code analysis: RabbitMQ in Nova
I also walked through the code of the RPC call path:
RabbitMQ RPC code analysis in OpenStack
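Conceptually, an RPC cast is nothing more than a message on a queue naming a method and its arguments. A minimal sketch using the pika client directly (OpenStack itself goes through its oslo messaging layer rather than raw pika, and the queue name and payload below are simplified placeholders):

import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="conductor")

# A fire-and-forget "cast": the method name plus its arguments, serialized as JSON.
message = {"method": "build_instances", "args": {"instance_uuid": "fake-uuid"}}
channel.basic_publish(exchange="",
                      routing_key="conductor",
                      body=json.dumps(message))
connection.close()

An RPC call works the same way, except that the sender also declares a reply queue and blocks until the consumer publishes the result back to it.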
Step 10: nova-conductor builds the request_spec for the scheduler
Step 11: nova-conductor sends an RPC to nova-scheduler
3. nova-scheduler
Choosing a physical host on which to create the virtual machine is what we call scheduling.
The classic picture of the nova scheduler is as follows:
Filter first, then weight. In fact, scheduling already started to get involved much earlier in the flow.
Step 13: Filter the hosts
Filtering is driven mainly by two variables, request_spec and filter_properties, both of which were prepared in the earlier steps.
Each filter simply uses this information, together with the HostState statistics gathered by the HostManager, to select the matching hosts.
The first piece of information in request_spec is the image's properties. This matters especially when you support several hypervisors: a Xen image, a KVM image, and a Hyper-V image are all different, so how do you guarantee an image runs on the right hypervisor? That is what the hypervisor_type image property is for.
Please read the following article:
http://www.cloudbase.it/filtering-glance-images-for-hyper-v/
Image properties also include min_ram and min_disk; only instances with enough memory and disk qualify.
A flavor can carry extra_specs, a set of key-value pairs (represented in the data structures by the instance_type variable). There you can record requirements of the flavor beyond plain resource sizes, and the filters can then use them.
Host aggregates divide the hosts into groups, and the groups can be defined by different metadata attributes, for example high-performance versus low-performance hosts.
In the OpenStack documentation, this example nicely shows how host aggregates and flavor extra_specs work together:
http://docs.openstack.org/trunk/config-reference/content/section_compute-scheduler.html
Example: Specify compute hosts with SSDs
This example configures the Compute service to enable users to request nodes that have solid-state drives (SSDs). You create a fast-io host aggregate in the nova availability zone, add the ssd=true key-value pair to the aggregate, and then add the node1 and node2 compute nodes to it.
$ nova aggregate-create fast-io nova
+----+---------+-------------------+-------+----------+
| Id | Name    | Availability Zone | Hosts | Metadata |
+----+---------+-------------------+-------+----------+
| 1  | fast-io | nova              |       |          |
+----+---------+-------------------+-------+----------+

$ nova aggregate-set-metadata 1 ssd=true
+----+---------+-------------------+-------+-------------------+
| Id | Name    | Availability Zone | Hosts | Metadata          |
+----+---------+-------------------+-------+-------------------+
| 1  | fast-io | nova              | []    | {u'ssd': u'true'} |
+----+---------+-------------------+-------+-------------------+

$ nova aggregate-add-host 1 node1
+----+---------+-------------------+------------+-------------------+
| Id | Name    | Availability Zone | Hosts      | Metadata          |
+----+---------+-------------------+------------+-------------------+
| 1  | fast-io | nova              | [u'node1'] | {u'ssd': u'true'} |
+----+---------+-------------------+------------+-------------------+

$ nova aggregate-add-host 1 node2
+----+---------+-------------------+----------------------+-------------------+
| Id | Name    | Availability Zone | Hosts                | Metadata          |
+----+---------+-------------------+----------------------+-------------------+
| 1  | fast-io | nova              | [u'node1', u'node2'] | {u'ssd': u'true'} |
+----+---------+-------------------+----------------------+-------------------+

Use the nova flavor-create command to create the ssd.large flavor with an ID of 6, 8 GB of RAM, 80 GB root disk, and four vCPUs.

$ nova flavor-create ssd.large 6 8192 80 4
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+-------------+
| ID | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public | extra_specs |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+-------------+
| 6  | ssd.large | 8192      | 80   | 0         |      | 4     | 1           | True      | {}          |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+-------------+

Once the flavor is created, specify one or more key-value pairs that match the key-value pairs on the host aggregates. In this case, that is the ssd=true key-value pair. Setting a key-value pair on a flavor is done using the nova flavor-key command.
$ nova flavor-key ssd.large set ssd=true
Once it is set, you should see the extra_specs property of the ssd.large flavor populated with a key of ssd and a corresponding value of true.
$ nova flavor-show ssd.large
+----------------------------+-------------------+
| Property                   | Value             |
+----------------------------+-------------------+
| OS-FLV-DISABLED:disabled   | False             |
| OS-FLV-EXT-DATA:ephemeral  | 0                 |
| disk                       | 80                |
| extra_specs                | {u'ssd': u'true'} |
| id                         | 6                 |
| name                       | ssd.large         |
| os-flavor-access:is_public | True              |
| ram                        | 8192              |
| rxtx_factor                | 1.0               |
| swap                       |                   |
| vcpus                      | 4                 |
+----------------------------+-------------------+

Now, when a user requests an instance with the ssd.large flavor, the scheduler only considers hosts with the ssd=true key-value pair. In this example, these are node1 and node2.
Another use of aggregates is to keep Xen and KVM pools separate, which helps Xen live migration.
Yet another is to separate Windows and Linux pools: Windows requires licenses while most Linux distributions do not, and Windows is licensed per physical host rather than per VM, so it pays to pack Windows VMs onto as few physical hosts as possible.
Inside filter_properties, scheduler_hints is a JSON object in which arbitrary values can be set for the filters to use.
For example, the JsonFilter:
The JsonFilter allows a user to construct a custom filter by passing a scheduler hint in JSON format. The following operators are supported:
- =
- <
- >
- in
- <=
- >=
- not
- or
- and
The filter supports the following variables:
- $free_ram_mb
- $free_disk_mb
- $total_usable_ram_mb
- $vcpus_total
- $vcpus_used
Using the nova command-line tool, use the --hint flag:
$ nova boot --image 827d564a-e636-4fc4-a376-d36f7ebe1747 --flavor 1 --hint query='[">=","$free_ram_mb",1024]' server1

With the API, use the os:scheduler_hints key:
{
    "server": {
        "name": "server-1",
        "imageRef": "cedef40a-ed67-4d10-800e-17455edce175",
        "flavorRef": "1"
    },
    "os:scheduler_hints": {
        "query": "[>=,$free_ram_mb,1024]"
    }
}
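If none of the stock filters fit, you can also write your own. A toy sketch, assuming the BaseHostFilter plugin interface of that era (host_passes receives a HostState and the filter_properties dict); the hint name min_free_ram_mb is made up for illustration:

from nova.scheduler import filters


class MinFreeRamFilter(filters.BaseHostFilter):
    """Only pass hosts with at least the amount of free RAM asked for in a hint."""

    def host_passes(self, host_state, filter_properties):
        hints = filter_properties.get("scheduler_hints") or {}
        min_free = int(hints.get("min_free_ram_mb", 0))
        return host_state.free_ram_mb >= min_free

The class is then listed in scheduler_available_filters / scheduler_default_filters in nova.conf so the scheduler loads it.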
We can also pin an instance to a specific physical host with --availability-zone <zone-name>:<host-name>.
Step 14: Weight and sort the eligible hosts
Once the hosts have been filtered, the next operation is weighting.
Weighting can take many variables into account. In general, memory and disk are the first that must be satisfied, while CPU and network I/O are secondary. For cheaper flavors, satisfying memory and disk is usually enough; for more expensive flavors, CPU and network I/O must be guaranteed as well.
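Weighers follow a similar plugin pattern. A minimal sketch, assuming the BaseHostWeigher interface (roughly what the stock RAM weigher does: prefer the host with the most free memory):

from nova.scheduler import weights


class FreeRamWeigher(weights.BaseHostWeigher):
    """Give hosts with more free RAM a higher weight."""

    def _weigh_object(self, host_state, weight_properties):
        return host_state.free_ram_mb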
Step 15: nova-scheduler sends an RPC to the chosen host
4. nova-compute
Step 17: After nova-compute receives the request, it claims the resources needed to create the VM through the Resource Tracker.
Step 18: Call the Neutron API to configure the network; the VM is now in the Networking state.
Note that although this step configures the network, it mainly prepares data structures; no real devices are created yet.
Because we specified which private network the VM should join when creating it, all of that information has to be ready before the real devices are created.
The knowledge point here is network creation, although that step should really have been completed before the VM is created.
In the simplest scenario, the network is created with the following script:
#!/bin/bash
TENANT_NAME="openstack"
TENANT_NETWORK_NAME="openstack-net"
TENANT_SUBNET_NAME="${TENANT_NETWORK_NAME}-subnet"
TENANT_ROUTER_NAME="openstack-router"
FIXED_RANGE="192.168.0.0/24"
NETWORK_GATEWAY="192.168.0.1"
PUBLIC_GATEWAY="172.24.1.1"
PUBLIC_RANGE="172.24.1.0/24"
PUBLIC_START="172.24.1.100"
PUBLIC_END="172.24.1.200"
TENANT_ID=$(keystone tenant-list | grep " $TENANT_NAME " | awk '{print $2}')
(1) TENANT_NET_ID=$(neutron net-create --tenant_id $TENANT_ID $TENANT_NETWORK_NAME --provider:network_type gre --provider:segmentation_id 1 | grep " id " | awk '{print $4}')
(2) TENANT_SUBNET_ID=$(neutron subnet-create --tenant_id $TENANT_ID --ip_version 4 --name $TENANT_SUBNET_NAME $TENANT_NET_ID $FIXED_RANGE --gateway $NETWORK_GATEWAY --dns_nameservers list=true 8.8.8.8 | grep " id " | awk '{print $4}')
(3) ROUTER_ID=$(neutron router-create --tenant_id $TENANT_ID $TENANT_ROUTER_NAME | grep " id " | awk '{print $4}')
(4) neutron router-interface-add $ROUTER_ID $TENANT_SUBNET_ID
(5) neutron net-create public --router:external=True
(6) neutron subnet-create --ip_version 4 --gateway $PUBLIC_GATEWAY public $PUBLIC_RANGE --allocation-pool start=$PUBLIC_START,end=$PUBLIC_END --disable-dhcp --name public-subnet
(7) neutron router-gateway-set ${TENANT_ROUTER_NAME} public
After this flow, the virtual network is logically connected to the physical network.
The underlying devices, however, are created by concrete commands, which I have summarized here:
The commands Neutron runs when creating a network
For more complex scenarios, see this article:
Multiple routers and multiple networks
Step 19: Generate the MAC address
Step 20: Get the DHCP server configuration
Step 21: Get the network information
Step 22: Get the security group information
Step 23: With all this information, create the Port object. It will back a tap device, but the real device has still not been created.
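What nova asks Neutron to do in steps 19-23 boils down to a single create_port call. A hedged sketch with python-neutronclient (the credentials and network UUID are placeholders; nova additionally fills in device_id, device_owner, and the security groups):

from neutronclient.v2_0 import client

neutron = client.Client(username="demo", password="secret",
                        tenant_name="openstack",
                        auth_url="http://controller:5000/v2.0")

port = neutron.create_port({
    "port": {
        "network_id": "NETWORK-UUID",      # the private network chosen at boot time
        "name": "demo-port",
    }
})
# Neutron picks the MAC address and a fixed IP from the subnet's allocation pool.
print(port["port"]["mac_address"], port["port"]["fixed_ips"])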
Step 24: Call the libvirt driver to create the virtual machine
5. Image
Before creating an instance you of course need an image, and images turned out to be a surprisingly deep subject.
In OpenStack, with KVM, two image formats are mainly used: raw and qcow2.
The raw format is simple and easy to convert to other formats. It needs filesystem support for sparse files, and its performance is relatively high.
qcow2 grows dynamically and, compared with raw, has the following advantages:
- Small file size, even on filesystems without sparse-file support
- Copy on write
- Snapshot
- Compression
- Encryption
For the format details and characteristics, see the following articles:
QEMU KVM libvirt notes (4): images
[Repost] The QCOW2 Image Format
There are several ways to create an image.
One way is virt-install: point the hard disk at an image file, boot a VM from a CD-ROM, go through the normal installation, and once the OS is installed, post-process and compress the image with qemu-img to get the final image.
Reference:
QEMU KVM libvirt notes (1)
A more advanced method now exists: libguestfs. Its virt-builder tool can easily produce the image you want on top of an existing distribution image.
Reference:
libguestfs notes (3): the virt commands
Of course, an image that is usable inside OpenStack is far more than just an installed operating system.
The OpenStack Virtual Machine Image Guide describes in detail what a Linux image needs:
- Disk partitions and resize root partition on boot (cloud-init)
- No hard-coded MAC address information
- Ensure ssh server runs
- Disable firewall
- Access instance by using ssh public key (cloud-init)
- Use cloud-init to fetch the public key
- Process user data and other metadata (cloud-init)
- Ensure image writes boot log to console
A few more I would add:
- Contains an MBR and a bootloader, so it can boot on its own
- Has virtio disk and network drivers
- Uses DHCP to obtain its IP
Once a Linux image has been built, it should always be tested:
- Can you ssh in with the key?
- Does file injection work?
- Did cloud-init run?
- Was the filesystem resized to the flavor's disk size?
- Is the hostname set to the instance name?
- Is the timezone correct?
- Is /etc/fstab clean and correct?
- Is the VM's boot log written to the console correctly?
- Is /tmp clean?
- Can you take a snapshot?
- After attaching block storage, is it visible and usable?
Windows images are far more complicated; Windows is really not cloud-friendly:
- First install Windows with virt-install.
- To use virtio, the Windows virtio drivers must be installed, and even then you still have to update the driver in Device Manager.
- Remote Access must be enabled, otherwise there is no remote desktop.
- Install an SSH server as well; sometimes scripts need to run automatically after the VM is created, which cannot be done over remote desktop.
- Install cloudbase-init, the Windows counterpart of cloud-init.
- Don't forget to run Windows Update, otherwise many security holes remain.
- Add a serial console device in Device Manager.
- Windows disks are usually huge; running the Disk Cleanup tool frees some space.
- Microsoft's SDelete tool can zero unallocated space so the disk image compresses better.
- Don't forget the licensing question.
- Sometimes when Windows configures the network it pops up a dialog asking whether this is a home, work, or public network, which can leave network configuration stuck; disable it in the registry.
- Run sysprep.
For cloud-init, see the following articles:
http://cloudinit.readthedocs.org/en/latest/index.html
http://www.scalehorizontally.com/2013/02/24/introduction-to-cloud-init/
On Ubuntu, cloud-init mainly consists of the following pieces.
Its configuration lives under /etc/cloud; the default cloud.cfg looks like this:
root@dfasdfsdafasdf:/etc/cloud# cat cloud.cfg
# The top level settings are used as module
# and system configuration.
# A set of users which may be applied and/or used by various modules
# when a 'default' entry is found it will reference the 'default_user'
# from the distro configuration specified below
users:
   - default
# If this is set, 'root' will not be able to ssh in and they
# will get a message to login instead as the above $user (ubuntu)
disable_root: true
# This will cause the set+update hostname module to not operate (if true)
preserve_hostname: false
# Example datasource config
# datasource:
#    Ec2:
#      metadata_urls: [ 'blah.com' ]
#      timeout: 5 # (defaults to 50 seconds)
#      max_wait: 10 # (defaults to 120 seconds)
# The modules that run in the 'init' stage
cloud_init_modules:
- migrator
- seed_random
- bootcmd
- write-files
- growpart
- resizefs
- set_hostname
- update_hostname
- update_etc_hosts
- ca-certs
- rsyslog
- users-groups
- ssh
# The modules that run in the 'config' stage
cloud_config_modules:
# Emit the cloud config ready event
# this can be used by upstart jobs for 'start on cloud-config'.
- emit_upstart
- disk_setup
- mounts
- ssh-import-id
- locale
- set-passwords
- grub-dpkg
- apt-pipelining
- apt-configure
- package-update-upgrade-install
- landscape
- timezone
- puppet
- chef
- salt-minion
- mcollective
- disable-ec2-metadata
- runcmd
- byobu
# The modules that run in the 'final' stage
cloud_final_modules:
- rightscale_userdata
- scripts-vendor
- scripts-per-once
- scripts-per-boot
- scripts-per-instance
- scripts-user
- ssh-authkey-fingerprints
- keys-to-console
- phone-home
- final-message
- power-state-change
# System and/or distro specific settings
# (not accessible to handlers/transforms)
system_info:
   # This will affect which distro class gets used
   distro: ubuntu
   # Default user name + that default users groups (if added/used)
   default_user:
     name: ubuntu
     lock_passwd: True
     gecos: Ubuntu
     groups: [adm, audio, cdrom, dialout, dip, floppy, netdev, plugdev, sudo, video]
     sudo: ["ALL=(ALL) NOPASSWD:ALL"]
     shell: /bin/bash
   # Other config here will be given to the distro class and/or path classes
   paths:
      cloud_dir: /var/lib/cloud/
      templates_dir: /etc/cloud/templates/
      upstart_dir: /etc/init/
   package_mirrors:
     - arches: [i386, amd64]
       failsafe:
         primary: http://archive.ubuntu.com/ubuntu
         security: http://security.ubuntu.com/ubuntu
       search:
         primary:
           - http://%(ec2_region)s.ec2.archive.ubuntu.com/ubuntu/
           - http://%(availability_zone)s.clouds.archive.ubuntu.com/ubuntu/
         security: []
     - arches: [armhf, armel, default]
       failsafe:
         primary: http://ports.ubuntu.com/ubuntu-ports
         security: http://ports.ubuntu.com/ubuntu-ports
   ssh_svcname: ssh
Its working directory is /var/lib/cloud:
root@dfasdfsdafasdf:/var/lib/cloud/instance# ls
boot-finished     datasource  obj.pkl  sem            user-data.txt.i  vendor-data.txt.i
cloud-config.txt  handlers    scripts  user-data.txt  vendor-data.txt
Then there is the cloud-init command itself:
/usr/bin/cloud-init
If you open it, you will find it is a Python script. Running /usr/bin/cloud-init init runs the modules listed under cloud_init_modules; let's take resizefs as an example.
/usr/bin/cloud-init calls main_init, which in turn calls run_module_section.
That drops into the Python code, so the other part of cloud-init is its Python package:
/usr/lib/python2.7/dist-packages/cloudinit
Inside it we find the file /usr/lib/python2.7/dist-packages/cloudinit/config/cc_resizefs.py,
which contains:
def _resize_btrfs(mount_point, devpth):  # pylint: disable=W0613
    return ('btrfs', 'filesystem', 'resize', 'max', mount_point)
def _resize_ext(mount_point, devpth):  # pylint: disable=W0613
    return ('resize2fs', devpth)
def _resize_xfs(mount_point, devpth):  # pylint: disable=W0613
    return ('xfs_growfs', devpth)
def _resize_ufs(mount_point, devpth):  # pylint: disable=W0613
    return ('growfs', devpth)
Ha, we have finally found where the resize actually happens.
Having covered creating an image, we also need to understand modifying one; our file injection is exactly a modification of the image.
There are three ways: mounting a loop device, using QEMU's network block device (qemu-nbd), or, the most advanced, using libguestfs.
I summarized them in one article:
How to modify an image file
For qemu-nbd there is:
QEMU KVM libvirt notes (6): Network Block Device
For libguestfs I also wrote some notes:
libguestfs notes (1): Architecture
libguestfs notes (2): the guestfish command
libguestfs notes (3): the virt commands
For file injection there is:
nova file injection
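To make the file-injection idea concrete, here is a minimal libguestfs sketch (the disk path and key are placeholders; nova's own injection code is more involved and also handles partitions and labels):

import guestfs

g = guestfs.GuestFS()
g.add_drive_opts("/var/lib/nova/instances/INSTANCE_UUID/disk",
                 format="qcow2", readonly=0)
g.launch()
root = g.inspect_os()[0]          # locate the guest's root filesystem
g.mount(root, "/")
g.write("/root/.ssh/authorized_keys", "ssh-rsa AAAA... injected-key\n")
g.umount_all()
g.shutdown()
g.close()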
Snapshots come in several kinds; see:
QEMU KVM libvirt notes (5): snapshots
Snapshot Types
External Snapshot management
[Repost] External (and live) snapshots with libvirt
[Repost] Snapshotting with libvirt for qcow2 images
Step 26: Download the image from Glance as the base
Step 27: Create a qcow2 image backed by the base image
Step 28: Resize the image to the flavor's disk size (this says nothing about the filesystem size inside)
Step 29: Configure the configuration drive
Step 30: Perform file injection
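Steps 26-28 correspond roughly to the following qemu-img operations, shown here through subprocess as a hedged sketch (the paths are placeholders; nova keeps the shared base under _base and a small per-instance overlay named disk):

import subprocess

BASE = "/var/lib/nova/instances/_base/IMAGE_CHECKSUM"     # cached base image (placeholder)
DISK = "/var/lib/nova/instances/INSTANCE_UUID/disk"       # per-instance overlay (placeholder)

# Step 27: create a qcow2 overlay whose backing file is the shared base image.
subprocess.check_call(["qemu-img", "create", "-f", "qcow2",
                       "-o", "backing_file=%s" % BASE, DISK])

# Step 28: grow the virtual disk to the flavor's root-disk size; the filesystem
# inside is only resized later, by the guest itself (growpart / cloud-init).
subprocess.check_call(["qemu-img", "resize", DISK, "20G"])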
6. Libvirt
With libvirt, before a virtual machine can be started it must first be defined, as an XML file.
List all instances:
# virsh list
Id    Name                           State
----------------------------------------------------
10    instance-00000006              running
# virsh dumpxml instance-00000006
<domain type='kvm' id='10'>
  <name>instance-00000006</name>
  <uuid>73b896bb-7c7d-447e-ab6a-c4089532f003</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>OpenStack Foundation</entry>
      <entry name='product'>OpenStack Nova</entry>
      <entry name='version'>2014.1.1</entry>
      <entry name='serial'>80590690-87d2-e311-b1b0-a0481cabdfb4</entry>
      <entry name='uuid'>73b896bb-7c7d-447e-ab6a-c4089532f003</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/73b896bb-7c7d-447e-ab6a-c4089532f003/disk'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <interface type='bridge'>
      <mac address='fa:16:3e:ae:f4:17'/>
      <source bridge='qbrc51a349e-87'/>
      <target dev='tapc51a349e-87'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='file'>
      <source path='/var/lib/nova/instances/73b896bb-7c7d-447e-ab6a-c4089532f003/console.log'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <serial type='pty'>
      <source path='/dev/pts/20'/>
      <target port='1'/>
      <alias name='serial1'/>
    </serial>
    <console type='file'>
      <source path='/var/lib/nova/instances/73b896bb-7c7d-447e-ab6a-c4089532f003/console.log'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='apparmor' relabel='yes'>
    <label>libvirt-73b896bb-7c7d-447e-ab6a-c4089532f003</label>
    <imagelabel>libvirt-73b896bb-7c7d-447e-ab6a-c4089532f003</imagelabel>
  </seclabel>
</domain>
We can see that it defines the virtualization type (kvm), the vCPUs, memory, disk, pty, and so on. Note the network interface: it is a tap device attached to a qbr bridge.
There are many kinds of virtualization; see the following articles:
Virtualization technologies
[Repost] Virtualization Basics
Once the VM is running, looking at its process shows an extremely long list of parameters:
# ps aux | grep instance-00000006
libvirt+ 22200  6.3  0.4 5464532 282888 ?        Sl   09:51   0:09 qemu-system-x86_64 -enable-kvm -name instance-00000006 -S -machine pc-i440fx-trusty,accel=kvm,usb=off -cpu SandyBridge,+erms,+smep,+fsgsbase,+pdpe1gb,+rdrand,+f16c,+osxsave,+dca,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 2048 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 73b896bb-7c7d-447e-ab6a-c4089532f003 -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=2014.1.1,serial=80590690-87d2-e311-b1b0-a0481cabdfb4,uuid=73b896bb-7c7d-447e-ab6a-c4089532f003 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000006.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/73b896bb-7c7d-447e-ab6a-c4089532f003/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:ae:f4:17,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/73b896bb-7c7d-447e-ab6a-c4089532f003/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
But who can explain what all these parameters do?
Please read the following articles carefully:
QEMU KVM libvirt notes (3): Storage Media
QEMU KVM libvirt notes (7): Hardware virtualization
QEMU KVM libvirt notes (8): Paravirtualized virtio devices
The -machine parameter selects the machine (bus) architecture; you can list the choices with qemu-system-x86_64 -machine ?, and the default is the value shown above.
accel=kvm means KVM is used for the virtualization.
The -cpu parameter selects the processor model and its flags; list them with qemu-system-x86_64 -cpu ?.
-smp describes symmetric multiprocessing:
-smp 1,sockets=1,cores=1,threads=1
QEMU emulates a processor with 1 vCPU: one socket, one core, one thread.
What do socket, core, and thread mean?
(1) socket is the number of CPU sockets on the motherboard, what admins call "ways".
(2) core is what we usually call "cores", as in dual-core or quad-core.
(3) thread is the number of hardware threads per core, i.e. hyper-threading.
As a concrete example, on a 2-way, 4-core, hyper-threaded server (usually 2 threads per core), cat /proc/cpuinfo shows 2*4*2 = 16 processors, which many people casually call 16 cores.
SMBIOS (System Management BIOS) describes the hardware of an x86 machine, including the BIOS and motherboard; here everything is filled in by OpenStack, so it is fake.
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000006.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control
These two go together: they expose the monitor over a UNIX socket, so the monitor can be driven through virsh.
-rtc is the system clock, and -no-hpet means the more precise HPET timer is not used.
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 is the USB controller, attached to PCI bus 0 at device 1, function 2.
The next two go together:
-drive file=/var/lib/nova/instances/73b896bb-7c7d-447e-ab6a-c4089532f003/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
This is the hard disk: the drive points at the file, and the device uses virtio, sitting on PCI bus 0 as device 4, function 0.
The next two go together:
-netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:ae:f4:17,bus=pci.0,addr=0x3
This is the NIC: it uses a tap device, and the device uses virtio, sitting on PCI bus 0 as device 3, function 0.
The next two go together:
-chardev file,id=charserial0,path=/var/lib/nova/instances/73b896bb-7c7d-447e-ab6a-c4089532f003/console.log
-device isa-serial,chardev=charserial0,id=serial0
This is a chardev that redirects the guest's serial log to console.log.
The next two go together; they are a pty:
-chardev pty,id=charserial1
-device isa-serial,chardev=charserial1,id=serial1
This is the graphics card:
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2
This is the memory balloon:
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
All of these hang off the PCI bus; with virsh # qemu-monitor-command instance-00000024 --hmp "info pci" you can list every device on the PCI bus.
Many of them are paravirtualized devices, which improves performance:
[Repost] KVM VirtIO paravirtualized drivers: why they matter
Virtio: An I/O virtualization framework for Linux
QEMU KVM libvirt notes (8): Paravirtualized virtio devices
[Repost] Virtio balloon
Besides full virtualization and paravirtualization of devices, QEMU and KVM also have their own mechanisms for networking:
QEMU KVM libvirt notes (9): network
QEMU Networking
Virtual Networking
Likewise for storage:
QEMU KVM libvirt notes (11): Managing Storage
The last thing to cover in this section is how libvirt manages virtual machines.
There is a powerful tool called the monitor that supports many operations, essentially the machine's management console; it can also be driven through virsh. See QEMU KVM libvirt notes (2).
The most important command-line tool is virsh; see QEMU KVM libvirt notes (10): Managing Virtual Machines with libvirt.
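Everything virsh does goes through the libvirt API, which is also available from Python. A small sketch (the domain name matches the example above; error handling omitted):

import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-00000006")

state, max_mem, mem, vcpus, cpu_time = dom.info()
print("state=%d vcpus=%d mem=%d KiB" % (state, vcpus, mem))

# The same XML that `virsh dumpxml` prints.
print(dom.XMLDesc(0)[:200])
conn.close()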
7. Neutron
This step connects the instance to the network devices that have already been created.
Step 33: Create the qbr Linux bridge
Step 34: Create the veth pair, qvo and qvb
Step 35: Add qvb to qbr
Step 36: Add qvo to br-int
The wiring looks elaborate but orderly. To understand why it is done this way, you need to understand the network-device architecture inside Neutron.
It was drawn out long ago, as in the figures below.
On the network node:
On the compute node:
At this point many people's heads explode: why does OpenStack create so many virtual NICs, how are they related, and what on earth are dl_vlan and mod_vlan_vid?
See the article: the fundamentals of Neutron.
Different Neutron private networks are isolated from one another. Three tenant-isolation technologies are commonly used, VLAN, GRE, and VXLAN, each with its pros and cons.
How VLANs work
A virtual LAN (VLAN) is a group of networking devices in the same broadcast domain.
There are two kinds of VLAN:
Static VLAN/Port-based VLAN
- manually assign a port on a switch to a VLAN using an Interface Subconfiguration mode command.
Dynamic VLANs
- the switch automatically assigns the port to a VLAN using information from the user device, such as its MAC address, IP address, or even directory information (a user or group name, for instance).
- The switch then consults a policy server, called a VLAN membership policy server (VMPS), which contains a mapping of device information to VLANs.
There are two kinds of connection:
Access-Link Connections
- a device that has a standardized Ethernet NIC that understands only standardized Ethernet frames
- Access-link connections can only be associated with a single VLAN.
Trunk Connections
- trunk connections are capable of carrying traffic for multiple VLANs.
IEEE's 802.1Q
Advantages:
Increased performance
- reducing collisions
- limiting broadcast traffic
- Less need to be routed
Improved manageability
- Manage logical groups
Increased security options
- packets only to other members of the VLAN.
Disadvantages:
- Limited number of VLANs (roughly 4000 in theory, often closer to 1000 in practice)
- Number of MAC addresses supported in switches is limited
How GRE works
Generic Routing Encapsulation (GRE) is a tunneling protocol that can encapsulate a wide variety of network layer protocols inside virtual point-to-point links over an Internet Protocol internetwork.
Header
Advantages:
Resolve the VLAN and MAC limitations by encapsulating communications within p2p 'tunnels' which hid the guest MAC information exposing only the MAC addresses of host systems.
L2 to L3, after leaving the encapsulated L2 virtual network, the traffic is forwarded to a gateway which can de-encapsulate traffic and route it out onto the leveraged unencapsulated network.
Disadvantages:
Point to point tunnel
Pool extensibility
Few switches can understand GRE Header, so load distribution and ACL (both depends on IPs and ports) can not be applied
How VXLAN works
Allow for virtual machines to live in two disparate networks yet still operate as if they were attached to the same L2.
Components:
- Multicast support, IGMP and PIM
- VXLAN Network Identifier (VNI): 24-bit segment ID
- VXLAN Gateway
- VXLAN Tunnel End Point (VTEP)
- VXLAN Segment/VXLAN Overlay Network
Advantages:
Address 4K VLAN Limitation
Solves mac address scaling issues
Better scalability and failover
Disadvantages:
VXLAN expects multicast to be enabled on physical networks, and it does MAC flooding to learn end points.
But IP multicast is usually disabled
Need MAC preprovisioning via a SDN Controller
Software VTEPs may have performance issue
In OpenStack, many of Neutron's networking features are implemented with Open vSwitch, so I studied Open vSwitch specifically; see the following articles:
OpenFlow study notes
Open vSwitch notes (1)
Open vSwitch notes (2)
Open vSwitch notes (3)
Open vSwitch notes (4)
[Repost] Comparing sFlow and NetFlow in a vSwitch
[Repost] Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX
Open vSwitch notes (5)
Open vSwitch notes (6)
Open vSwitch notes (7)
Open vSwitch notes (8)
Open vSwitch notes (9)
Open vSwitch study notes
For managing the network there are many good tools:
[Repost] iptables
Notes on the HTB Linux queuing discipline manual - user guide
iproute2 study notes
tcpdump
[Repost] Packet capture and analysis with tcpdump on Linux
[Repost] IPTables for KVM Host
[Repost] Firewall and network filtering in libvirt
[Repost] XEN, KVM, Libvirt and IPTables
http://tldp.org/HOWTO/Traffic-Control-HOWTO/
http://rlworkman.Net/howtos/iptables/iptables-tutorial.html
8. KVM
This step starts the virtual machine, just like virsh start. After the VM boots, there is still plenty left to do.
Step 38: Obtain an IP from the DHCP server
The database often already shows an IP for the VM, so people assume the VM actually got that IP, and then they cannot connect and have no idea where to start. The IP shown in the UI and the IP the VM really obtained from DHCP are two different things.
Step 39: cloud-init contacts the metadata server and injects the key
The metadata server has a fairly involved architecture, and cloud-init's connection to it can easily fail. When it fails, the key is not injected, which is why you often see that the IP pings but ssh still refuses you.
http://niusmallnan.github.io/_build/html/_templates/openstack/metadata_server.html
I also recommend Kong Lingxian's blog:
[OpenStack] Using metadata in OpenStack (1)
[OpenStack] Using metadata in OpenStack (2)
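From inside the guest, what cloud-init does in step 39 can be reproduced by hand. A quick sketch (169.254.169.254 is the fixed metadata address, not a placeholder):

import requests

# EC2-compatible path: the SSH public key to inject.
key = requests.get(
    "http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key").text

# OpenStack-native path: the full metadata document as JSON.
meta = requests.get(
    "http://169.254.169.254/openstack/latest/meta_data.json").json()

print(meta.get("hostname"), key[:20])

If these URLs hang or return errors from inside the VM, key injection will fail even though the rest of the networking looks fine.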
Step 40: Watch the boot process through VNC
VNC is a wonderful thing, especially when the VM has not obtained an IP: you can connect through VNC, log in with a username and password, and debug why DHCP failed.
VNC is itself fairly complex; recommended article:
The basics of the nova VNC proxy
Step 41: Add a floating IP so you can SSH in through it.
For the floating IP to work, the iptables NAT rules must be right, and br-ex must be configured correctly too.
Step 42: The VM can reach the outside network
For that, both the gateway and the DNS server must be configured correctly.
When the VM's network is broken it is a real headache; I suggest debugging along the following flow.
"Basically usable" means simple commands run fine; but if you want to run real programs inside, KVM performance has to be looked after.
Recommended: http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaat/liaatkvm.htm
Moreover, VMs share the physical machine's resources, so we must control VM QoS, which can be done with cgroups.
Recommended reading:
[Repost] Quality Of Service In OpenStack
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/
A VM runs on one physical machine; when resources get tight it may need to be live-migrated to another machine:
QEMU KVM libvirt notes (12): Live Migration
9. Cinder
Once the VM is created, we often attach a volume to it (or boot from a volume), so that the data on the volume does not disappear when the VM does.
Step 44: The Cinder API receives the request to create a volume
Step 45: The Cinder scheduler picks one of several cinder-volume backends, again filtering first and then weighting, based on total capacity or on how much has already been allocated
Step 46: cinder-volume creates an iSCSI target
Step 47: cinder-volume creates an LVM volume and adds it to the iSCSI target
Step 48: The compute node connects to the iSCSI target, so the volume appears on the compute node
Step 49: Attach the volume on the compute node to the virtual machine
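Seen from the client side, steps 44-49 are just two API calls. A hedged sketch with the cinderclient and novaclient libraries of that era (credentials, the server UUID, and the device name are placeholders):

from cinderclient.v1 import client as cinder_client
from novaclient.v1_1 import client as nova_client

cinder = cinder_client.Client("demo", "secret", "openstack",
                              "http://controller:5000/v2.0")
nova = nova_client.Client("demo", "secret", "openstack",
                          "http://controller:5000/v2.0")

# Steps 44-47: the scheduler picks a backend, an LVM volume and iSCSI target are created.
vol = cinder.volumes.create(size=1, display_name="data-volume")

# Steps 48-49: the compute node logs in to the target and attaches it to the VM.
nova.volumes.create_server_volume("SERVER-UUID", vol.id, "/dev/vdb")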
On Cinder:
Manually reproducing the cinder volume attach process
nova boot from volume
Cinder architecture
For LVM, recommended reading:
LVM study notes
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Logical_Volume_Manager_Administration/
[Repost] The Device Mapper mechanism in the Linux kernel
http://www.ibm.com/developerworks/cn/linux/l-devmapper/
For iSCSI, I recommend:
Using iSCSI On Ubuntu 10.04 (Initiator And Target)
Linux tgtadm: Setup iSCSI Target ( SAN )
Cinder with other storage backends
Summary