快速批量导入庞大数据到SQL SERVER数据库(ADO.NET)
原文地址:http://www.cnblogs.com/chenxizhang/archive/2008/11/11/1331060.html
如果你需要在程序中批量插入成千上萬行的數(shù)據(jù),你會(huì)怎么編寫代碼呢?最近在幫朋友調(diào)優(yōu)這個(gè)的時(shí)候,總結(jié)了幾種方法,并對(duì)其進(jìn)行比較。
大概的界面如下,我模擬了一個(gè)客戶資料表.
數(shù)據(jù)我是放在一個(gè)XML文件的,大約6734行。類似下面的格式
<?xml version="1.0" encoding="utf-8" ?> <root> <Customers> <CustomerID>ALFKI</CustomerID> <CompanyName>Sina</CompanyName> <ContactName>Maria Anders</ContactName> <ContactTitle>Sales Representative</ContactTitle> <Address>Obere Str. 57</Address> <City>Berlin</City> <PostalCode>12209</PostalCode> <Country>Germany</Country> <Phone>030-0074321</Phone> <Fax>030-0076545</Fax> </Customers> <Customers> <CustomerID>ANATR</CustomerID> <CompanyName>Ana Trujillo Emparedados y helados</CompanyName> <ContactName>Ana Trujillo</ContactName> <ContactTitle>Owner</ContactTitle> <Address>Avda. de la Constitución 2222</Address> <City>México D.F.</City> <PostalCode>05021</PostalCode> <Country>Mexico</Country> <Phone>(5) 555-4729</Phone> <Fax>(5) 555-3745</Fax> </Customers> <Customers> <CustomerID>ANTON</CustomerID> <CompanyName>Antonio Moreno Taquería</CompanyName> <ContactName>Antonio Moreno</ContactName> <ContactTitle>Owner</ContactTitle> <Address>Mataderos 2312</Address> <City>México D.F.</City> <PostalCode>05023</PostalCode> <Country>Mexico</Country> <Phone>(5) 555-3932</Phone> </Customers> </root>?
下面首先在服務(wù)器稍微準(zhǔn)備一下環(huán)境
USE [tempdb] GOSET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Customers]( [CustomerID] [nchar](5) NOT NULL, [CompanyName] [nvarchar](40) NOT NULL, [ContactName] [nvarchar](30) NULL, [ContactTitle] [nvarchar](30) NULL, [Address] [nvarchar](60) NULL, [City] [nvarchar](15) NULL, [Region] [nvarchar](15) NULL, [PostalCode] [nvarchar](10) NULL, [Country] [nvarchar](15) NULL, [Phone] [nvarchar](24) NULL, [Fax] [nvarchar](24) NULL ) ON [PRIMARY] CREATE PROCEDURE [dbo].[usp_InsertCustomer] @CustomerID nchar(5), @CompanyName nvarchar(40), @ContactName nvarchar(30), @ContactTitle nvarchar(30), @Address nvarchar(60), @City nvarchar(15), @Region nvarchar(15), @PostalCode nvarchar(10), @Country nvarchar(15), @Phone nvarchar(24), @Fax nvarchar(24) AS SET NOCOUNT ON INSERT INTO [dbo].[Customers] ( [CustomerID], [CompanyName], [ContactName], [ContactTitle], [Address], [City], [Region], [PostalCode], [Country], [Phone], [Fax] ) VALUES ( @CustomerID, @CompanyName, @ContactName, @ContactTitle, @Address, @City, @Region, @PostalCode, @Country, @Phone, @Fax ) ? ? 我們?cè)趖empdb中創(chuàng)建了一個(gè)表和一個(gè)存儲(chǔ)過程 ? 首先,我們把數(shù)據(jù)加載到一個(gè)DataSet DataSet ds = new DataSet();private void btLoadData_Click(object sender, EventArgs e){string dataFile = "CustomersData.xml"; ds.ReadXml(dataFile); bindingSource1.DataSource = ds; bindingSource1.DataMember = "Customers"; dataGridView1.DataSource = bindingSource1; }然后,我們第一個(gè)測試代碼是遍歷這個(gè)DataSet,每一行提交一次
private string GetConnectionString(){return "server=(local);database=tempdb;integrated security=true;";}/// <summary> /// 直接遍歷,一個(gè)一個(gè)的提交給服務(wù)器。時(shí)間為265毫秒左右 /// 每一行都需要寫日志 /// </summary> /// <param name="sender"></param> /// <param name="e"></param> private void btOneByOne_Click(object sender, EventArgs e) { SqlConnection conn = new SqlConnection(GetConnectionString()); SqlCommand cmd = conn.CreateCommand(); cmd.CommandText = "usp_InsertCustomer"; cmd.CommandType = CommandType.StoredProcedure; conn.Open(); TimeSpan startTime = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime; foreach (DataRow row in ds.Tables[0].Rows) { cmd.Parameters.Clear(); SqlParameter[] param = new SqlParameter[]{ new SqlParameter("@CustomerID",row[0].ToString()), new SqlParameter("@CompanyName",row[1].ToString()), new SqlParameter("@ContactName",row[2].ToString()), new SqlParameter("@ContactTitle",row[3].ToString()), new SqlParameter("@Address",row[4].ToString()), new SqlParameter("@City",row[5].ToString()), new SqlParameter("@Region",row[6].ToString()), new SqlParameter("@PostalCode",row[7].ToString()), new SqlParameter("@Country",row[8].ToString()), new SqlParameter("@Phone",row[9].ToString()), new SqlParameter("@Fax",row[10].ToString()) }; cmd.Parameters.AddRange(param); cmd.ExecuteNonQuery(); } conn.Close(); TimeSpan duration = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime.Subtract(startTime); MessageBox.Show("已經(jīng)全部插入成功,所用時(shí)間為" + duration.Milliseconds.ToString() + "毫秒"); }?
接下來,我們使用ADO.NET內(nèi)置的一個(gè)DataAdapter來提交
/// <summary>/// 這是使用Adapter的方式,其實(shí)還是遍歷,而且語法也沒有簡單/// 同時(shí),速度甚至更慢。時(shí)間為650毫秒左右/// 每一行都需要寫日志/// </summary> /// <param name="sender"></param> /// <param name="e"></param> private void btUseAdapter_Click(object sender, EventArgs e) { SqlDataAdapter adapter = new SqlDataAdapter(); adapter.AcceptChangesDuringUpdate = false;//為了演示目的,把這個(gè)開關(guān)關(guān)掉,以免它在更新完成后把數(shù)據(jù)集標(biāo)記為未更改 adapter.UpdateBatchSize = 10;//這個(gè)好像也沒有什么用 SqlConnection conn = new SqlConnection(GetConnectionString()); SqlCommand insertCommand = conn.CreateCommand(); insertCommand.CommandText = "usp_InsertCustomer"; insertCommand.CommandType = CommandType.StoredProcedure; insertCommand.UpdatedRowSource = UpdateRowSource.None; insertCommand.Parameters.Add("@CustomerID", SqlDbType.NChar, 5, "CustomerID"); insertCommand.Parameters.Add("@CompanyName", SqlDbType.NVarChar, 40, "CompanyName"); insertCommand.Parameters.Add("@ContactName", SqlDbType.NVarChar, 30, "ContactName"); insertCommand.Parameters.Add("@ContactTitle", SqlDbType.NVarChar, 30, "ContactTitle"); insertCommand.Parameters.Add("@Address", SqlDbType.NVarChar, 60, "Address"); insertCommand.Parameters.Add("@City", SqlDbType.NVarChar, 15, "City"); insertCommand.Parameters.Add("@Region", SqlDbType.NVarChar, 15, "Region"); insertCommand.Parameters.Add("@PostalCode", SqlDbType.NVarChar, 10, "PostalCode"); insertCommand.Parameters.Add("@Country", SqlDbType.NVarChar, 15, "Country"); insertCommand.Parameters.Add("@Phone", SqlDbType.NVarChar, 24, "Phone"); insertCommand.Parameters.Add("@Fax", SqlDbType.NVarChar, 24, "Fax"); adapter.InsertCommand = insertCommand; TimeSpan startTime = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime; adapter.Update(ds,"Customers"); TimeSpan duration = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime.Subtract(startTime); MessageBox.Show("已經(jīng)全部插入成功,所用時(shí)間為" + duration.Milliseconds.ToString() + "毫秒"); }?
最后,我們找到了最快的方法
?
/// <summary>/// 使用新的API,批量導(dǎo)入,這個(gè)速度很快,大約26毫秒,很顯然,這種方式只寫一次日志,不會(huì)為每一行寫日志/// </summary>/// <param name="sender"></param>/// <param name="e"></param> private void btBCP_Click(object sender, EventArgs e) { using (SqlConnection conn = new SqlConnection(GetConnectionString())) { SqlBulkCopy bcp = new SqlBulkCopy(conn); bcp.DestinationTableName = "Customers"; bcp.BatchSize = 100;//這是批尺寸可以調(diào)整 for (int i = 0; i < 11; i++) { bcp.ColumnMappings.Add(i, i); } TimeSpan startTime = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime; conn.Open(); bcp.WriteToServer(ds.Tables[0]); TimeSpan duration = System.Diagnostics.Process.GetCurrentProcess().UserProcessorTime.Subtract(startTime); MessageBox.Show("已經(jīng)全部插入成功,所用時(shí)間為" + duration.Milliseconds.ToString() + "毫秒"); } }?
還有一種辦法是通過在服務(wù)器OPENXML,因?yàn)閄ML反復(fù)處理效率很差,所以就沒有測試了,可以斷定它肯定比其他幾種還要慢。
另外提示一下,如果不用編程的方式,那么有其他三個(gè)可能的途徑去做這個(gè)事情
1. BCP工具(這是一個(gè)命令行,可以做導(dǎo)入和導(dǎo)出,不過來源文件如果不規(guī)范,那么可能很費(fèi)勁)
2. BULK INSERT語句(這是一個(gè)T-SQL語句,只能做導(dǎo)入,我們上面使用的SQLBULKCopy應(yīng)該和他很類似)
3. XML Bulk Load(這是一套COM的對(duì)象模型,適合導(dǎo)入XML文檔)
?
還有,在做大量的數(shù)據(jù)導(dǎo)入和導(dǎo)出時(shí),可以考慮微軟為SQL Server配套的SSIS(Integration Service)
?
總結(jié)
以上是生活随笔為你收集整理的快速批量导入庞大数据到SQL SERVER数据库(ADO.NET)的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: DICOM医学图像处理:开源库mDCM与
- 下一篇: 关于redis