[Hadoop Reading Notes] Chapter 15: A small Sqoop 1.4.6 experiment - importing MySQL data into Hive
Installing Hive
1. Download hive-2.1.1 (matched with Hadoop 2.7.3).
2. Unpack it to the directory:
/wdcloud/app/hive-2.1.1
3. Configure the environment variables.
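The exact variables are not shown in the post; a minimal sketch, assuming a typical ~/.bashrc entry and the install path above:

export HIVE_HOME=/wdcloud/app/hive-2.1.1
export PATH=$PATH:$HIVE_HOME/bin

Run source ~/.bashrc afterwards so the shell picks up the new variables.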
4. On MySQL, create the metastore database hive_metastore with latin encoding, then grant privileges.
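The create statement itself is not reproduced in the post; a minimal sketch of what this step implies (the exact charset name latin1 is an assumption):

create database hive_metastore default character set latin1;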
grant all on hive_metastore.* to 'root'@'%' IDENTIFIED BY 'weidong' with grant option;
flush privileges;
5. Create a new hive-site.xml with the following contents:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--><configuration>
<!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
<!-- WARNING!!! Any changes you make to this file will be ignored by Hive. -->
<!-- WARNING!!! You must make your changes in hive-site.xml instead. -->
<!-- Hive Execution Parameters -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.200.250:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>weidong</value>
</property>
<property>
<name>datanucleus.schema.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehouse</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/hive/warehouse</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/wdcloud/app/hive-2.1.1/logs</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>/wdcloud/app/hbase-1.1.6/lib</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.200.123:9083</value>
</property>
</configuration>
6. Configure hive-env.sh.
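The file contents are not reproduced in the post; a minimal sketch of the entries usually set in hive-env.sh, assuming the paths used elsewhere in this post:

export HADOOP_HOME=/wdcloud/app/hadoop-2.7.3
export HIVE_CONF_DIR=/wdcloud/app/hive-2.1.1/conf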
7. Enable logging:
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
8. Copy the MySQL Connector/J jar into Hive's lib directory.
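The exact jar version is not recorded; a hedged example, assuming a locally downloaded Connector/J 5.1.x jar (Sqoop needs the same driver under its own lib directory, if it is not already there):

cp mysql-connector-java-5.1.40-bin.jar /wdcloud/app/hive-2.1.1/lib/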
9. Start the metastore service:
hive --service metastore &
The first start hit an error. Starting it again succeeded without errors, and the required tables were generated in the metastore database.
Debug-mode command: hive -hiveconf hive.root.logger=DEBUG,console
Client connection: connect with the Hive CLI to verify the installation.
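A minimal check from a client machine, assuming the hive CLI is on its PATH and its hive.metastore.uris points at the metastore started above:

hive
hive> show databases;
hive> show tables;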
OK, Hive is now installed. Let's use Sqoop to import MySQL data into Hive:
sqoop import --connect jdbc:mysql://192.168.200.250:3306/sqoop --table widgets_copy -m 1 --hive-import --username root -P
The first run hit an error. After applying the fix and running it again, the import succeeded. The backend log:
[hadoop@hadoop-allinone-200-123 conf]$ sqoop import --connect jdbc:mysql://192.168.200.250:3306/sqoop --table widgets_copy -m 1 --hive-import --username root -P
17/01/24 03:42:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
17/01/24 03:42:50 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/01/24 03:42:50 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/01/24 03:42:51 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/24 03:42:51 INFO tool.CodeGenTool: Beginning code generation
17/01/24 03:42:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `widgets_copy` AS t LIMIT 1
17/01/24 03:42:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `widgets_copy` AS t LIMIT 1
17/01/24 03:42:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /wdcloud/app/hadoop-2.7.3
Note: /tmp/sqoop-hadoop/compile/4a89a67225918969c1c0f4c7c13168e9/widgets_copy.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/24 03:42:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/4a89a67225918969c1c0f4c7c13168e9/widgets_copy.jar
17/01/24 03:42:54 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/01/24 03:42:54 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/01/24 03:42:54 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/01/24 03:42:54 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/01/24 03:42:54 INFO mapreduce.ImportJobBase: Beginning import of widgets_copy
17/01/24 03:42:54 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/wdcloud/app/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/wdcloud/app/hbase-1.1.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/01/24 03:42:55 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/01/24 03:42:57 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/01/24 03:42:57 INFO client.RMProxy: Connecting to ResourceManager at hadoop-allinone-200-123.wdcloud.locl/192.168.200.123:8032
17/01/24 03:43:01 INFO db.DBInputFormat: Using read commited transaction isolation
17/01/24 03:43:01 INFO mapreduce.JobSubmitter: number of splits:1
17/01/24 03:43:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1485230213604_0011
17/01/24 03:43:03 INFO impl.YarnClientImpl: Submitted application application_1485230213604_0011
17/01/24 03:43:03 INFO mapreduce.Job: The url to track the job: http://hadoop-allinone-200-123.wdcloud.locl:8088/proxy/application_1485230213604_0011/
17/01/24 03:43:03 INFO mapreduce.Job: Running job: job_1485230213604_0011
17/01/24 03:43:16 INFO mapreduce.Job: Job job_1485230213604_0011 running in uber mode : false
17/01/24 03:43:16 INFO mapreduce.Job: map 0% reduce 0%
17/01/24 03:43:28 INFO mapreduce.Job: map 100% reduce 0%
17/01/24 03:43:28 INFO mapreduce.Job: Job job_1485230213604_0011 completed successfully
17/01/24 03:43:28 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=138211
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=169
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=8081
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=8081
		Total vcore-milliseconds taken by all map tasks=8081
		Total megabyte-milliseconds taken by all map tasks=8274944
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=117
		CPU time spent (ms)=2630
		Physical memory (bytes) snapshot=178216960
		Virtual memory (bytes) snapshot=2903285760
		Total committed heap usage (bytes)=155713536
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=169
17/01/24 03:43:28 INFO mapreduce.ImportJobBase: Transferred 169 bytes in 31.7543 seconds (5.3221 bytes/sec)
17/01/24 03:43:28 INFO mapreduce.ImportJobBase: Retrieved 4 records.
17/01/24 03:43:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `widgets_copy` AS t LIMIT 1
17/01/24 03:43:29 WARN hive.TableDefWriter: Column price had to be cast to a less precise type in Hive
17/01/24 03:43:29 WARN hive.TableDefWriter: Column design_date had to be cast to a less precise type in Hive
17/01/24 03:43:29 INFO hive.HiveImport: Loading uploaded data into Hive (loads the data generated on HDFS into Hive)
17/01/24 03:43:41 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
17/01/24 03:43:41 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/wdcloud/app/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
17/01/24 03:43:41 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/wdcloud/app/hbase-1.1.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
17/01/24 03:43:41 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/wdcloud/app/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
17/01/24 03:43:41 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
17/01/24 03:43:41 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
17/01/24 03:43:42 INFO hive.HiveImport:
17/01/24 03:43:42 INFO hive.HiveImport: Logging initialized using configuration in file:/wdcloud/app/hive-2.1.1/conf/hive-log4j2.properties Async: true
17/01/24 03:43:55 INFO hive.HiveImport: OK
17/01/24 03:43:55 INFO hive.HiveImport: Time taken: 3.687 seconds
17/01/24 03:43:55 INFO hive.HiveImport: Loading data to table default.widgets_copy
17/01/24 03:43:56 INFO hive.HiveImport: OK
17/01/24 03:43:56 INFO hive.HiveImport: Time taken: 1.92 seconds
17/01/24 03:43:57 INFO hive.HiveImport: Hive import complete.
17/01/24 03:43:57 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory. (after the load into Hive succeeds, the intermediate data on HDFS is removed)
If a previous run failed, the next run will report an error:
ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory xxx already exists
In that case, running hadoop fs -rmr xxx to remove the directory is enough.
Alternatively, add the parameter --hive-overwrite (overwrite existing data in the Hive table) on the command line.
This parameter overwrites any data that already exists in the Hive table, so even after a failed run the re-run will overwrite it automatically, as in the example below.
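For instance, the same import as before with the flag added (connection details unchanged from the earlier command):

sqoop import --connect jdbc:mysql://192.168.200.250:3306/sqoop --table widgets_copy -m 1 --hive-import --hive-overwrite --username root -P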
View the imported data.
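A minimal way to inspect the result from the Hive CLI (table name as created by the import above):

hive -e "select * from widgets_copy;"

Inside the CLI, describe widgets_copy; also shows the column types after the precision-losing casts warned about in the log.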
Further reading: a detailed guide to the Sqoop 1.4.4 import and export tools:
http://blog.csdn.net/wangmuming/article/details/25303831
Attached below is the final hive-site.xml configuration file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
  <!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
  <!-- WARNING!!! Any changes you make to this file will be ignored by Hive. -->
  <!-- WARNING!!! You must make your changes in hive-site.xml instead. -->
  <!-- Hive Execution Parameters -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.200.250:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>weidong</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/hive/warehouse</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/hive/warehouse</value>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/wdcloud/app/hive-2.1.1/logs</value>
  </property>
  <property>
    <name>hive.aux.jars.path</name>
    <value>/wdcloud/app/hbase-1.1.6/lib</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.200.123:9083</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
</configuration>
The next step is to work out how to run these data-sync and incremental-sync jobs on a schedule.
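One possible approach (a sketch, not from the original post): define a saved Sqoop job that performs an incremental append import, then trigger it from cron; a saved job records the --last-value between runs. The job name, password-file path, check column id, sqoop install path and schedule below are all assumptions, and whether --incremental append combines cleanly with --hive-import should be verified against the Sqoop 1.4.6 documentation:

sqoop job --create widgets_sync -- import --connect jdbc:mysql://192.168.200.250:3306/sqoop --table widgets_copy --hive-import --username root --password-file /user/hadoop/mysql.pwd --incremental append --check-column id --last-value 0 -m 1

# crontab entry: run the saved job every night at 01:00
0 1 * * * /wdcloud/app/sqoop-1.4.6/bin/sqoop job --exec widgets_sync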