站内搜索

Alex的Hadoop菜鸟教程:第9课Sqoop1从Hbase或者Hive导出mysql_MySQL

今天讲讲怎么用sqoop将Hbase或者Hive的东西导出到mysql。不过事先要告诉大家

目前sqoop没有办法把数据直接从Hbase导出到mysql。必须要通过Hive建立2个表,一个外部表是基于这个Hbase表的,另一个是单纯的基于hdfs的hive原生表,然后把外部表的数据导入到原生表(临时),然后通过hive将临时表里面的数据导出到mysql

数据准备

mysql建立空表


CREATE TABLE `employee` (   `rowkey` int(11) NOT NULL,  `id` int(11) NOT NULL,  `name` varchar(20) NOT NULL,      PRIMARY KEY (`id`)    ) ENGINE=MyISAM  DEFAULT CHARSET=utf8; 



注意:因为大家习惯性的把hive表用于映射Hbase的rowkey的字段命名为key,所以在建立mysql的table的时候有可能也建立对应的key字段,但是key是mysql的保留字,会导致insert语句无法插入的问题

Hbase建立employee表

建立employee表,并插入数据
hbase(main):005:0> create 'employee','info'0 row(s) in 0.4740 seconds=> Hbase::Table - employeehbase(main):006:0> put 'employee',1,'info:id',10 row(s) in 0.2080 secondshbase(main):008:0> scan 'employee'ROW                                      COLUMN+CELL                                                                                                            1                                       column=info:id, timestamp=1417591291730, value=1                                                                      1 row(s) in 0.0610 secondshbase(main):009:0> put 'employee',1,'info:name','peter'0 row(s) in 0.0220 secondshbase(main):010:0> scan 'employee'ROW                                      COLUMN+CELL                                                                                                            1                                       column=info:id, timestamp=1417591291730, value=1                                                                       1                                       column=info:name, timestamp=1417591321072, value=peter                                                                1 row(s) in 0.0450 secondshbase(main):011:0> put 'employee',2,'info:id',20 row(s) in 0.0370 secondshbase(main):012:0> put 'employee',2,'info:name','paul'0 row(s) in 0.0180 secondshbase(main):013:0> scan 'employee'ROW                                      COLUMN+CELL                                                                                                            1                                       column=info:id, timestamp=1417591291730, value=1                                                                       1                                       column=info:name, timestamp=1417591321072, value=peter                                                                 2                                       column=info:id, timestamp=1417591500179, value=2                                                                       2                                       column=info:name, timestamp=1417591512075, value=paul                                                                 2 row(s) in 0.0440 seconds

建立Hive外部表

hive 有分为原生表和外部表,原生表是以简单文件方式存储在hdfs里面,外部表依赖别的框架,比如Hbase,我们现在建立一个依赖于我们刚刚建立的employee hbase表的hive 外部表
hive> CREATE EXTERNAL TABLE h_employee(key int, id int, name string)     > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, info:id,info:name")    > TBLPROPERTIES ("hbase.table.name" = "employee");OKTime taken: 0.324 secondshive> select * from h_employee;OK1	1	peter2	2	paulTime taken: 1.129 seconds, Fetched: 2 row(s)

建立Hive原生表

这个hive原生表只是用于导出的时候临时使用的,所以取名叫 h_employee_export,字段之间的分隔符用逗号
CREATE TABLE h_employee_export(key INT, id INT, name STRING)ROW FORMAT DELIMITED FIELDS TERMINATED BY '/054';

我们去看下实际存储的文本文件是什么样子的
$ hdfs dfs -cat /user/hive/warehouse/h_employee_export/000000_01,1,peter2,2,paul


源Hive表导入数据到临时表


第一步先将数据从 h_employee(基于Hbase的外部表)导入到 h_employee_export(原生Hive表)
hive> insert overwrite table h_employee_export select * from h_employee;
hive> select * from h_employee_export;OK1	1	peter2	2	paulTime taken: 0.359 seconds, Fetched: 2 row(s)

我们去看下实际存储的文本文件长什么样子
$ hdfs dfs -cat /user/hive/warehouse/h_employee_export/000000_01,1,peter2,2,paul



从Hive导出数据到mysql


$ sqoop export --connect jdbc:mysql://localhost:3306/sqoop_test --username root --password root --table employee --m 1 --export-dir /user/hive/warehouse/h_employee_export/Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.Please set $HCAT_HOME to the root of your HCatalog installation.Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.Please set $ACCUMULO_HOME to the root of your Accumulo installation.14/12/05 08:49:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.0.114/12/05 08:49:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.14/12/05 08:49:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.14/12/05 08:49:35 INFO tool.CodeGenTool: Beginning code generation14/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 114/12/05 08:49:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 114/12/05 08:49:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduceNote: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.java uses or overrides a deprecated API.Note: Recompile with -Xlint:deprecation for details.14/12/05 08:49:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wlsuser/compile/d16eb4166baf6a1e885d7df0e2638685/employee.jar14/12/05 08:49:39 INFO mapreduce.ExportJobBase: Beginning export of employee14/12/05 08:49:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar14/12/05 08:49:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative14/12/05 08:49:43 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps14/12/05 08:49:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.111.78.111:803214/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 114/12/05 08:49:45 INFO input.FileInputFormat: Total input paths to process : 114/12/05 08:49:45 INFO mapreduce.JobSubmitter: number of splits:114/12/05 08:49:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406097234796_003714/12/05 08:49:46 INFO impl.YarnClientImpl: Submitted application application_1406097234796_003714/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/14/12/05 08:49:46 INFO mapreduce.Job: Running job: job_1406097234796_003714/12/05 08:49:59 INFO mapreduce.Job: Job job_1406097234796_0037 running in uber mode : false14/12/05 08:49:59 INFO mapreduce.Job:  map 0% reduce 0%14/12/05 08:50:10 INFO mapreduce.Job:  map 100% reduce 0%14/12/05 08:50:10 INFO mapreduce.Job: Job job_1406097234796_0037 completed successfully14/12/05 08:50:10 INFO mapreduce.Job: Counters: 30	File System Counters		FILE: Number of bytes read=0		FILE: Number of bytes written=99761		FILE: Number of read operations=0		FILE: Number of large read operations=0		FILE: Number of write operations=0		HDFS: Number of bytes read=166		HDFS: Number of bytes written=0		HDFS: Number of read operations=4		HDFS: Number of large read operations=0		HDFS: Number of write operations=0	Job Counters 		Launched map tasks=1		Data-local map tasks=1		Total time spent by all maps in occupied slots (ms)=8805		Total time spent by all reduces in occupied slots (ms)=0		Total time spent by all map tasks (ms)=8805		Total vcore-seconds taken by all map tasks=8805		Total megabyte-seconds taken by all map tasks=9016320	Map-Reduce Framework		Map input records=2		Map output records=2		Input split bytes=144		Spilled Records=0		Failed Shuffles=0		Merged Map outputs=0		GC time elapsed (ms)=97		CPU time spent (ms)=1360		Physical memory (bytes) snapshot=167555072		Virtual memory (bytes) snapshot=684212224		Total committed heap usage (bytes)=148897792	File Input Format Counters 		Bytes Read=0	File Output Format Counters 		Bytes Written=014/12/05 08:50:10 INFO mapreduce.ExportJobBase: Transferred 166 bytes in 27.0676 seconds (6.1328 bytes/sec)14/12/05 08:50:10 INFO mapreduce.ExportJobBase: Exported 2 records.




注意
在这段日志中有这样一句话
14/12/05 08:49:46 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1406097234796_0037/

意思是你可以用浏览器访问这个地址去看下任务的执行情况,如果你的任务长时间卡主没结束就是出错了,可以去这个地址查看详细的错误日志

查看结果
mysql> select * from employee;+--------+----+-------+| rowkey | id | name  |+--------+----+-------+|      1 |  1 | peter ||      2 |  2 | paul  |+--------+----+-------+2 rows in set (0.00 sec)mysql> 

导入成功