10亿数据导入oracle方案
????方案:由于數(shù)據(jù)量過大,而且里面有重復(fù)數(shù)據(jù),直接導(dǎo)入數(shù)據(jù)庫超慢,所以需要建立臨時(shí)表,然后去重插入最終表。臨時(shí)表、最終表分區(qū),都不建立主鍵和索引,設(shè)置臨時(shí)表和最終表為nologging+并行,使用sqlload直路徑+并行把數(shù)據(jù)導(dǎo)入臨時(shí)表,然后重構(gòu)主鍵為本地索引(為了以后查詢數(shù)據(jù)速度快些),臨時(shí)表對每個(gè)分區(qū)去重后插入最終表,最終表重構(gòu)主鍵和索引,設(shè)置最終表為logging+noparallel,刪除臨時(shí)表。
????步驟:
? ? 1. 創(chuàng)建臨時(shí)表
-- 創(chuàng)建臨時(shí)表 create table ESB_CUSTOMERNO_RELATION_OLD (CUSTOMERNO VARCHAR2(40) not null,IDTYPE VARCHAR2(10) not null,CUSTOMERID VARCHAR2(20) not null,CREDATE DATE,UPDDATE DATE,CUSTOMERSEQ VARCHAR2(2) not null ) partition by hash (CUSTOMERID) (partition P01tablespace TCBUCC_DATA_P01,......partition P42tablespace TCBUCC_DATA_P42 );? ? 2. 導(dǎo)入數(shù)據(jù)到臨時(shí)表? ? 2.1 txt文件格式如下:
10000201044324534026|a|320223196301195428|1|20180504120000|20180504120000 10000201020567704042|a|320222198002261889|1|20180504120000|20180504120000 ...... 10000201012050396509|a|640221197309130625|1|20180504120000|20180504120000? ? 2.2 ctl文件格式如下:
load data CHARACTERSET AL32UTF8 infile 'esb_customerno_relation_1_01.txt' ...... infile 'esb_customerno_relation_1_55.txt' append into table esb_customerno_relation_old FIELDS TERMINATED BY '|' TRAILING NULLCOLS (CUSTOMERNO, IDTYPE, CUSTOMERID, CUSTOMERSEQ, CREDATE date 'yyyy/mm/dd hh24:mi:ss', UPDDATE date 'yyyy/mm/dd hh24:mi:ss' )
? ??如果IDTYPE為空則默認(rèn)賦值a,則把IDTYPE改為IDTYPE "nvl(:IDTYPE,'a')"
????如果CUSTOMERSEQ全部賦"3",則把CUSTOMERSEQ改為CUSTOMERSEQ "3"
? ? 2.3 sqlload命令
nohup sqlldr root/123456@127.0.0.1:1521/root control=esb_customerno_relation.ctl direct=true parallel=true errors=99999999 readsize=209715200 bindsize=33554432 > esb_customerno_relation.out 2>&1 &? ??9.6億條數(shù)據(jù)總共耗時(shí):80分鐘
? ??
????3. 創(chuàng)建最終表
-- 創(chuàng)建最終表 create table ESB_CUSTOMERNO_RELATION (CUSTOMERNO VARCHAR2(40) not null,IDTYPE VARCHAR2(10) not null,CUSTOMERID VARCHAR2(20) not null,CREDATE DATE,UPDDATE DATE,CUSTOMERSEQ VARCHAR2(2) not null ) partition by hash (CUSTOMERID) (partition P01tablespace TCBUCC_DATA_P01,......partition P42tablespace TCBUCC_DATA_P42 );? ? 4. 把臨時(shí)表和最終表設(shè)為nologging 并行,并給臨時(shí)表創(chuàng)建本地索引
? ? 4.1?查看cpu信息,在cmd窗口運(yùn)行,據(jù)說并行度不要超過cpu個(gè)數(shù),查詢結(jié)果為16核
? ??show parameters cpu;
????4.2 設(shè)置nologging 并行
alter table ESB_CUSTOMERNO_RELATION nologging parallel 16; alter table ESB_CUSTOMERNO_RELATION_OLD nologging parallel 16;???? 4.3?為臨時(shí)表建立本地索引,(CUSTOMERID, CUSTOMERNO, CUSTOMERSEQ)為最終表的聯(lián)合主鍵create index IDX_ESB_CUSTOMERNO_OLD_01 on ESB_CUSTOMERNO_RELATION_OLD (CUSTOMERID, CUSTOMERNO, CUSTOMERSEQ) local nologging parallel 16;
?遇到問題:
????解決方式:增加撤銷表空間和臨時(shí)表空間大小
????增加撤銷表空間:
alter tablespace UNDOTBS1 add tempfile '/oracle/oradata/undotbs02.dbf' size 67m reuse autoextend on next 100m maxsize 32767m;????增加臨時(shí)表空間:
alter tablespace temp add tempfile '/oracle/oradata/temp02.dbf' size 67m reuse autoextend on next 100m maxsize 32767m;? 耗時(shí):16分鐘
????5. 對每個(gè)分區(qū)去重并插入最終表
declareinsert_sql varchar2(200);cursor cur_dupl isselect s.partition_namefrom user_tab_partitions swhere s.table_name = 'ESB_CUSTOMERNO_RELATION_OLD'; beginfor partition_record in cur_dupl loopinsert_sql := 'insert into esb_customerno_relation p(customerno, customerid, customerseq, idtype, credate, upddate)select ss.customerno,ss.customerid,ss.customerseq,ss.idtype,ss.credate,ss.upddatefrom (select s.customerno,s.customerid,s.customerseq,s.idtype,s.credate,s.upddate,row_number() over(partition by s.customerid, s.customerno, s.customerseq order by rownum) as rnfrom esb_customerno_relation_old partition(' ||partition_record.partition_name || ') s) sswhere rn = 1';dbms_output.put_line('執(zhí)行語句:' || insert_sql);execute immediate insert_sql;commit;dbms_output.put_line(partition_record.partition_name || '分區(qū)插入完成');end loop; exceptionwhen others thendbms_output.put_line('sqlerrm-->' || sqlerrm);rollback; end;??? 耗時(shí):160分鐘
? ? 6. 為最終表創(chuàng)建主鍵及索引
????6.1?為最終表創(chuàng)建主鍵和索引 全局索引建議分區(qū),此處由于某些原因未分區(qū)
create unique index PK_ESB_CUSTOMERNO_RELATION on ESB_CUSTOMERNO_RELATION(CUSTOMERID, CUSTOMERNO, CUSTOMERSEQ) local nologging parallel 16; alter table ESB_CUSTOMERNO_RELATION add constraint PK_ESB_CUSTOMERNO_RELATION primary key (CUSTOMERID, CUSTOMERNO, CUSTOMERSEQ); create index IDX_ESB_CUSTOMERNO_RELATION_01 on ESB_CUSTOMERNO_RELATION (UPDDATE) tablespace tcbucc_index nologging parallel 16;? 耗時(shí):26分鐘
????6.2?修改最終表的索引和表為logging,noparallel??
alter index PK_ESB_CUSTOMERNO_RELATION logging noparallel; alter index IDX_ESB_CUSTOMERNO_RELATION_01 logging noparallel; alter table ESB_CUSTOMERNO_RELATION logging noparallel;????7.?刪除中間表
drop table ESB_CUSTOMERNO_RELATION_OLD;? 總共耗時(shí):接近5小時(shí)
? ? 8. 一些sql語句
總結(jié)
以上是生活随笔為你收集整理的10亿数据导入oracle方案的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 联想Y471A蓝牙功能启用
- 下一篇: CentOS6.5X86_64系统定制文