當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

SPL 关联优化技巧

發布時間：2023/12/31 编程问答 27 豆豆

生活随笔收集整理的這篇文章主要介紹了 SPL 关联优化技巧小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

1.????????? 小維表

用SPL進行關聯，當維表不大時可以讀入內存。

1.1?????????? 單字段主鍵

如果維表是單字段主鍵，可以使用switch做連接。例如有訂單、客戶、雇員三個表存儲在集文件中，表結構如下：

Order	Customer	Employee
orderID	customerID	employeeID
customerID	name	name
employeeID	City	title
……	……	country
		……

現在把訂單表和客戶表、雇員表進行關聯：

	A
1	=file("order.btx").cursor@b()
2	=file("customer.btx").import@b()
3	=file("employee.btx").import@b()
4	=A1.switch(customerID,A2: ? customerID;employeeID,A3: employeeID)
5	=A4.new(orderID, ? customerID.name, employeeID.name:name)

A1：訂單表數據很多，所以用游標。

A2：客戶表數據少，全部裝入內存，并且建立索引。

A3：同理雇員表也做維表內存化。

A4：用switch做關聯，根據客戶ID字段關聯訂單表和客戶表，以及根據employeeID字段關聯訂單表和雇員表。

從A4可以看到，switch可以一次處理多個關聯計算。

當維表的主鍵是序號時還可以用序號定位。

	A
1	=file("order.btx").cursor@b()
2	=file("customer.btx").import@b().index()
3	=file("employee.btx").import@b()
4	=A1.switch(customerID,A2: ? customerID; employeeID D,A3:#)
5	=A4.new(orderID, ? customerID.name, employeeID.name:name)

A5：雇員表的employeeID字段是從1開始的自然數，所以可以做外鍵序號化。

如果維表的主鍵不是序號值，就無法直接使用外鍵序號化進行性能優化。比如customerID字段的值就是由字母構成的。這時，可以把維表的主鍵轉換成序號后再使用外鍵序號化。

首先把客戶表的客戶ID轉換為序號：

	A
1	=file("customer.btx").import@b()
2	=A1.derive(#:newCustomerID)
3	=file("newAndOldCustomerID.btx").export@b(A2, ? newCustomerID, customerID)
4	=file("newCustomer.btx").export@b(A2, ? newCustomerID: customerID, name,city)

序號化后的客戶保存到了集文件newCustomer.btx中。其中newAndOldCustomerID.btx里保存的是新舊客戶ID的對應關系。

然后再把訂單表的customerID進行序號化：

	A
1	=file("newAndOldCustomerID.btx").import@b()
2	=file("order.btx").cursor@b()
3	=A2.switch(customerID,A1: ? customerID)
4	=A3.run(customerID. ? newCustomerID: customerID)
5	=file("newOrder.btx").export@ba(A4)

序號化后的訂單保存到了集文件訂單.btx中。

這時對于customerID字段，也可以通過序號化進行連接了。

	A
1	=connect("demo")
2	=file("newOrder.btx").cursor@b()
3	=file("newCustomer.btx").import@b()
4	=file("employee.btx").import@b()
5	=A2.switch(customerID,A3:#; ? employeeID,A4:#)
6	=A5.new(orderID, ? customerID.name,employeeID.name:name)

1.2?????????? 多字段主鍵

當維表的主鍵是多個字段的時候，要使用join做連接。例如有學生表（Students）和班級表（Classes），學生表的專業號和班級號為外鍵字段，分別指向班級表的聯合主鍵（專業號，班級號），表結構如下：

Student	Class
studentId	majorId
name	classId
majorId	teacher
classId

現在要查詢學生的學號、姓名、專業、班級和班主任：

	A
1	=file("student.btx").import@b()
2	=file("class.btx").import@b().keys(majorId,classId)
3	=A1.join(majorId:classId,A2,teacher)

A2：導入班級數據，并且設置主鍵為majorId和classId；

A3：join() 函數進行雙字段的主鍵關聯，將班主任信息添加到學生信息中。

2.????????? 大維表

如果維表無法裝入內存，而事實表可以裝入內存，則可以使用joinx函數進行關聯。此時維表要按主鍵有序存儲，可分段集文件或組表均可，后者效率更高。

例如有退貨表、產品表兩個表存儲在集文件中，表結構如下：

Returns	Product
orderID	productID
productID	name
price	price
quantity	category
date	……
……

這里退貨表對關聯字段producID是無序的，產品表是按照producID字段有序的。計算目標是獲得每一筆退貨記錄的產品類別，需要把退貨表和產品表做關聯：

	A
1	=file("returns.btx").import@b()
2	=file("product.btx")
3	=A1.joinx@q(productID,A2:productID,categroy: ? categroy;)
4	=A3.fetch()

A1：把退貨表裝入內存；

A2：給出產品表的文件對象；

A3：使用joinx函數進行關聯。

如果事實表對關聯字段也是有序的，就可以加上@c，進一步提速。例如計算每一筆退貨記錄的客戶ID，就要把退貨表和訂單表做關聯：

	A
1	=file("returns.btx").import@b()
2	=file("order.btx")
3	=A1.joinx@qc(orderID,A2:orderID,customerID: ? customerID;)
4	=A3.fetch()

實際上，上面兩個例子可以一次完成：

	A
1	=file("returns.btx").import@b()
2	=file("order.btx")
3	=file("product.btx")
4	=A1.joinx@qc(orderID,A2:orderID,customerID:customerID; ? productID,A3:productID,category: category;)
5	=A4.fetch()

A4：做了兩次關聯，退貨表表先跟訂單表關聯，得到的結果再跟產品表做關聯。這里退貨表對關聯字段orderID是有序的，所以可以加@c，但要寫在最前面。

對于事實表無法裝入內存的情況，可以使用游標。例如有訂單明細的表結構如下：

orderDetails

orderID

productID

price

quantity

date

……

訂單明細保存在組表里，現在要計算某個月銷售的產品的種類情況，需要把訂單明細表的單月數據和產品表進行關聯，：

	A
1	=file("orderDetails.ctx").create().cursor(;year(date)==2018&& ? month(date)==11)
2	=file("product.btx")
3	=A1.joinx@q(productID,A2:productID,categroy: ? categroy;)
4	=A3.fetch()

A1：2018年11月的訂單明細仍然無法裝入內存，使用游標訪問。

A2：給出產品表的文件對象；

A3：使用joinx函數進行關聯。

joinx支持事實表是游標的情況，但不宜太大，否則效率就會比較低了。

3.????????? 同維主子表

對于按主鍵有序存儲的主子表可以使用joinx實現有序歸并連接。訂單表、訂單明細表已經是對訂單ID字段有序的，計算每個客戶的總消費額：

	A
1	=file("order.btx").cursor@b()
2	=file("orderDetail.btx").cursor@b()
3	=joinx(A1:order,orderID;A2: detail,orderID)
4	=A3.groups(order.customerID:customerID;sum(detail.price*detail.quantity):amount ? )

有序歸并還可以和游標外鍵一起使用，例如計算消費總金額大于1000的客戶名：

	A
1	=file("order.btx").cursor@b()
2	=file("orderDetail.btx").cursor@b()
3	=file("customer.btx").import@b()
4	=A1.switch@i(customerID, A3: customerID)
5	=joinx(A4:order,orderID;A2:detail,orderID)
6	=A5.groups(order.customerID.name:customer; sum(detail.price*detail.quantity):amount ? ).select(amount>100000)

如果主子表不是按住鍵有序時則要事先排序才能使用這種方法。

4.????????? 并行歸并

對于同維的主子組表，可以并行進行關聯。例如訂單和訂單明細這兩個組表，分段字段都是訂單ID。計算消費總金額大于1000的客戶名，仍然使用joinx進行關聯：

	A
1	=file("order.ctx").create().cursor@m(;;4)
2	=file("orderDetail.ctx").create().cursor(;;A1)
3	=joinx(A1:order,orderID;A2:detail ,orderID)
4	=A3.groups(order.customerID:customer; ? sum(detail.price*detail.quantity):amount).select(amount>100000)

A1：得到訂單的多路游標，這里使用的游標路數是4。

A2：根據多路游標A1的值，得到訂單明細的多路游標，游標路數也是4。

實際測試的結果表明，使用4線程并行后速度快了大約2.5倍。

需要注意的是，分段字段要在產生組表時就指定：

	A
1	=file("order.ctx").create(#orderID,customerID,employeeID;orderID)
2	=file("orderDetail.ctx").create(#orderID,productID,price,quantity,date;orderID)

這樣追加的數據會按orderID字段分段，不會把orderID同值的記錄分到兩段中。

5.????????? 主子表合一

組表支持主子表保存在同一文件中，合一后取出關聯數據的性能會更高。例如可以把訂單作為主表保存到組表文件order.ctx，再在主表上創建一個附表，命名為detail，把訂單明細保存到這個附表上，這時再計算消費總金額大于1000的客戶名，就是這樣：

	A
1	=file("order.ctx").create().attach(detail)
2	=A1.cursor(orderID,customerID,price,quantity)
3	=A2.groups(customerID:customer; sum(price*quantity):amount).select(amount>100000)

A1：打開附表訂單明細；

A2：建立附表游標；

A3：進行分組，并計算消費總額。

這種方式也支持并行，只要把A2稍微修改就可以：

	A
1	=file("order.ctx").create().attach(detail)
2	=A1.cursor@m(orderID,customerID,price,quantity;;4)
3	=A2.groups(customerID:customer; sum(price*quantity):amount).select(amount>100000)

A2：建立附表的多路游標，路數為4；

根據實際測試的結果，使用附表比使用joinx更快。主子表是1：N的關系，當N越大時，讀取速度的提升就越大；當主表的主鍵是多字段時，讀取速度提升的越明顯。

當訂單是1億條，每條對應大約10條訂單明細時，本案例實際測試結果：

耗時（秒）
2 個組表 joinx	組表合一	組表合一并行（4 線程）
781	602	368

使用主子表合一后，不僅有速度上的優化，在空間上也更小了，合一后的組表比兩個組表要節省20%左右的硬盤空間。

轉載于:https://blog.51cto.com/12749034/2387247

總結

以上是生活随笔為你收集整理的SPL 关联优化技巧的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

技巧
SPL

上一篇：缓存三大问题及解决方案
下一篇：影石 Insta360 推出全景运动相机