Spring XD for Data Ingestion

Published: 2023/12/3

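Spring XD is a powerful tool: an installable set of Spring Boot services that run standalone, on top of YARN, or on top of EC2. It also includes an admin UI website and a command-line tool for job and stream management, and its services work with a variety of data sources.

For ideal usage, it should run inside an Apache Spark or Hadoop cluster. In the first section we will set up XD to run on a CentOS/RHEL machine with the necessary data services. These provide the infrastructure XD needs to run, as well as the services used for data ingestion. You can integrate your existing RDBMS, MongoDB, Kafka, Apache Spark, Hadoop, REST, RabbitMQ and other services.

You can also install XD on Mac, Windows and other Linux distributions. For basic usage on a developer machine, just download Spring XD from the Spring.IO website and run xd/xd/bin/xd-singlenode, which is sufficient for running data ingestion.

1. Spring XD Setup

First, let's install Spring XD on your Linux server, taking note of its runtime requirements. If you do not have the required services, the XD download includes versions of them for you to run.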

References:

http://docs.spring.io/spring-xd/docs/current/reference/html/#_redhat_centos_installation
https://github.com/spring-projects/spring-xd/wiki/Running-Distributed-Mode
https://github.com/spring-projects/spring-xd/wiki/XD-Distributed-Runtime

Requirements:

Apache Zookeeper 3.4.6
Redis
RDBMS (MySQL, PostgreSQL, Apache Derby, etc.)

Enrichers:

GemFire (highly recommended for an in-memory data grid)
GemFire XD (highly recommended for an in-memory database)
RabbitMQ (highly recommended)
Apache YARN

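Installing MySQL on CentOS/RHEL/Fedora

A relational database is needed to store your job information. An in-memory RDBMS can be used, but for real usage you should use a persistent RDBMS. If you already have an RDBMS that is accessible from your XD cluster, you can use that. I prefer an open source database dedicated to XD; you can install MySQL or PostgreSQL for this.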

sudo yum install mysql-server

http://dev.mysql.com/downloads/repo/yum/

Installing PostgreSQL (do not install it on the same machine as Greenplum)

sudo yum install postgresql-server

Installing Redis

http://redis.docs.pivotal.io/doc/2x/index.html#getting-started/src/install.html#topic_q3g_vzs_yn

(See also the RabbitMQ section.)

wget -q -O - http://packages.pivotal.io/pub/rpm/rhel6/app-suite/app-suite-installer | sh
sudo yum install pivotal-redis
sudo service pivotal-redis-6379 start
sudo chkconfig --level 35 pivotal-redis-6379 on

Installing RabbitMQ

RabbitMQ is required even if you have another message queue. A single node is sufficient, but XD needs it for communication. I highly recommend a real RabbitMQ cluster, as it fits most streaming needs.

http://rabbitmq.docs.pivotal.io/doc/33/index.html#getstart/src/install-getstart.html

sudo wget -q -O - packages.pivotal.io | sh
sudo wget -q -O - http://packages.pivotal.io/pub/rpm/rhel6/app-suite/app-suite-installer | sh

Depending on permissions, you may have to send the installer to a file, chmod it to 700, and run it via sudo ./installer.sh.

sudo yum search pivotal
# expected match: pivotal-rabbitmq-server.noarch: The RabbitMQ server
sudo yum install pivotal-rabbitmq-server
sudo rabbitmq-plugins enable rabbitmq_management

If you are running other services on this machine, there may be port conflicts.

sudo /sbin/service rabbitmq-server start
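The port-conflict warning can be checked before the broker is started. The following is a minimal sketch, not part of the original setup. It assumes RabbitMQ's default ports (5672 for AMQP, 15672 for the management plugin) and that the `ss` utility is installed; the function names `port_in_listing` and `check_rabbit_ports` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if TCP port $1 appears as a local listening
# port in `ss -ltn`-style output read from stdin.
port_in_listing() {
    # Column 4 of `ss -ltn` is "Local Address:Port"; compare the part after
    # the last colon with the requested port. The header line is skipped.
    awk -v port="$1" '
        NR > 1 { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
        END { exit(found ? 0 : 1) }'
}

# Assumed defaults: 5672 (AMQP) and 15672 (management UI).
check_rabbit_ports() {
    listing=$(ss -ltn 2>/dev/null)
    for port in 5672 15672; do
        if printf '%s\n' "$listing" | port_in_listing "$port"; then
            echo "port $port is already in use"
        fi
    done
}
```

Calling `check_rabbit_ports` prints one line per occupied port and nothing when both are free.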

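Installing Spring XD

The easiest way to install is with Pivotal's official RHEL releases, since they are certified. You do not need to be a customer to use them. There are a number of other ways to download and install XD, but this is the easiest on RHEL because it sets XD up as services.

sudo wget -q -O - http://packages.pivotal.io/pub/rpm/rhel6/app-suite/app-suite-installer | sh
sudo yum install spring-xd

Recommendation

It is also recommended to deploy XD nodes and DataNodes in the same container and to use data partitioning. This speeds up data processing and ingestion.

Setting Up the Job Database

Change the datasource by picking one of the following for the easiest setup. The job database is where Spring XD stores job information and metadata. It is required, but it will hold only a very small amount of data.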

/opt/pivotal/spring-xd/xd/config

#spring:
#  datasource:
#    url: jdbc:mysql://mysqlserver:3306/xdjobs
#    username: xdjobsschema
#    password: xdsecurepassword
#    driverClassName: com.mysql.jdbc.Driver
#    validationQuery: select 1

#Config for use with Postgres - uncomment and edit with relevant values for your environment
#spring:
#  datasource:
#    url: jdbc:postgresql://postgresqlserver:5432/xdjobs
#    username: xdjobsschema
#    password: xdsecurepassword
#    driverClassName: org.postgresql.Driver
#    validationQuery: select 1
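For the MySQL option, the uncommented block would look like the output of the sketch below. It only prints YAML to standard output and does not touch any file. The function name `render_mysql_datasource` and its argument order are illustrative, and the host, schema and credentials are the placeholder values from the commented template, not real settings.

```shell
#!/bin/sh
# Hypothetical helper: print the uncommented MySQL datasource block.
# Arguments: host, database, username, password.
render_mysql_datasource() {
    cat <<EOF
spring:
  datasource:
    url: jdbc:mysql://$1:3306/$2
    username: $3
    password: $4
    driverClassName: com.mysql.jdbc.Driver
    validationQuery: select 1
EOF
}

# Example with the placeholder values from the template.
render_mysql_datasource mysqlserver xdjobs xdjobsschema xdsecurepassword
```

Review the output and merge it by hand into the configuration file in that directory. YAML is indentation-sensitive, so the nesting under `spring:` has to be preserved.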

Test that Spring XD single node works:

cd /opt/pivotal/spring-xd/xd/bin
./xd-singlenode --hadoopDistro phd20

If you are using a Hadoop distribution other than Pivotal HD 2.0, you can specify it with this flag or leave the flag off.

Test that the Spring XD shell works:

cd /opt/pivotal/spring-xd/shell/bin
./xd-shell --hadoopDistro phd20

The shell has help and shortcuts; just start typing and Tab will complete names and parameters for you.

Set the environment variable for Spring XD

export XD_HOME=/opt/pivotal/spring-xd/xd
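A quick sanity check can catch a mistyped path before anything is started. This is a minimal sketch that assumes the launcher lives at bin/xd-singlenode under XD_HOME, as in the single-node test earlier; `looks_like_xd_home` is an illustrative name.

```shell
#!/bin/sh
# Hypothetical check: succeeds if the directory in $1 looks like an XD home,
# i.e. it contains an executable bin/xd-singlenode launcher.
looks_like_xd_home() {
    [ -n "$1" ] && [ -d "$1" ] && [ -x "$1/bin/xd-singlenode" ]
}

if looks_like_xd_home "${XD_HOME:-}"; then
    echo "XD_HOME ok: $XD_HOME"
else
    echo "XD_HOME is unset or does not contain bin/xd-singlenode" >&2
fi
```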

For default access, I use:

/opt/pivotal/spring-xd/shell/bin/xd-shell --hadoopDistro phd20

To test the containers and admin server for distributed Spring XD (DIRT):

sudo service spring-xd-admin start
sudo service spring-xd-container start

For testing Spring XD:

http://blog.pivotal.io/pivotal/products/spring-xd-for-real-time-analytics
https://github.com/spring-projects/spring-xd-samples

Some Spring XD shell commands for testing:

hadoop config fs --namenode hdfs://pivhdsne:8020
admin config server http://localhost:9393
runtime containers
runtime modules
hadoop fs ls /xd/
stream create ticktock --definition "time | log"
stream deploy ticktock
stream list

Check the web UI:

http://localhost:9393/admin-ui/#/streams/definitions

2. Spring XD Job and Stream with SQL

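Note: the full field lists are abbreviated to save space; you must list all of the fields you are working with.

First we create a simple filejdbc Spring job that loads the raw tilde-delimited file into HAWQ. The fields all arrive as TEXT columns, which may be fine for some purposes but not for our needs. We also create an XD stream with a custom sink (see the XML; no coding required) that runs a SQL command to insert from this table and convert the values into other HAWQ types, such as numbers and times.

We trigger the secondary stream with a command-line REST POST, but we could have used a timed trigger or many other ways (automated, scripted or manual) to kick it off. You could also create a custom XD job that does the type casting and some manipulation, or do it through a Groovy script transform. There are many options in XD.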

jobload.xd

job create loadjob --definition "filejdbc --resources=file:/tmp/xd/input/files/*.* --names=time,userid,dataname,dataname2,dateTimeField,lastName,firstName,city,state,address1,address2 --tableName=raw_data_tbl --initializeDatabase=true --driverClassName=org.postgresql.Driver --delimiter=~ --dateFormat=yyyy-MM-dd-hh.mm.ss --numberFormat=%d --username=gpadmin --url=jdbc:postgresql:gpadmin" --deploy
stream create --name streamload --definition "http | hawq-store" --deploy
job launch loadjob
clear
job list
stream list

The job loads the file into a raw HAWQ table in which every column is text.

The stream is triggered by a web page hit or a command-line call (it needs hawq-store). It performs the insert into the real table and truncates the temp table.

triggerrun.sh (Bash shell script for testing)

curl -s -H "Content-Type: application/json" -X POST -d "{id:5}" http://localhost:9000
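Note that `{id:5}` is not strictly valid JSON because the key is unquoted. The hawq-store sink runs a fixed query and does not read the payload, so this is harmless for a trigger, but a quoted form is safer if a downstream module ever parses the body. A small sketch, in which `build_payload` and `trigger_stream` are illustrative names and the default URL is the one from the command above:

```shell
#!/bin/sh
# Hypothetical helper: build a JSON body with a quoted key for a numeric id.
build_payload() {
    printf '{"id":%d}' "$1"
}

# Hypothetical helper: POST the payload to the stream's http source.
# $1 is the id, $2 an optional URL (defaults to the one in triggerrun.sh).
trigger_stream() {
    curl -s -H "Content-Type: application/json" -X POST \
        -d "$(build_payload "$1")" "${2:-http://localhost:9000}"
}
```

With the stream deployed, `trigger_stream 5` sends `{"id":5}` to the http source.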

Add the spring-integration-jdbc jar to /opt/pivotal/spring-xd/xd/lib

hawq-store.xml (Spring Integration / XD configuration)

/opt/pivotal/spring-xd/xd/modules/sink/hawq-store.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:int="http://www.springframework.org/schema/integration"
    xmlns:int-jdbc="http://www.springframework.org/schema/integration/jdbc"
    xmlns:jdbc="http://www.springframework.org/schema/jdbc"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/integration
        http://www.springframework.org/schema/integration/spring-integration.xsd
        http://www.springframework.org/schema/integration/jdbc
        http://www.springframework.org/schema/integration/jdbc/spring-integration-jdbc.xsd">

    <!-- Messages from the http source arrive on this channel -->
    <int:channel id="input" />

    <!-- Each message triggers the insert-and-cast query; the payload itself is not used -->
    <int-jdbc:outbound-channel-adapter channel="input"
        query="insert into real_data_tbl(time, userid, firstname, ...) select cast(time as datetime), cast(userid as numeric), firstname, ... from dfpp_networkfillclicks"
        data-source="dataSource" />

    <bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="org.postgresql.Driver"/>
        <property name="url" value="jdbc:postgresql:gpadmin"/>
        <property name="username" value="gpadmin"/>
        <property name="password" value=""/>
    </bean>
</beans>

createtable.sql

CREATE TABLE raw_data_tbl (
    time text,
    userid text,
    ...
    somefield text
)
WITH (APPENDONLY=true)
DISTRIBUTED BY (time);

3. Spring XD Scripts for the Shell

My general setup script (I save it in setup.xd and load it with script --file setup.xd):

hadoop config fs --namenode hdfs://localhost:8020
admin config server http://localhost:9393
hadoop fs ls /
stream list
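When the same setup is needed for several environments, the script file can be generated instead of hand-edited. The following is a minimal sketch that assumes only the namenode and admin server URLs vary between environments; `write_setup_script` is an illustrative name, and the XD shell command for pointing at HDFS is spelled `hadoop config fs`.

```shell
#!/bin/sh
# Hypothetical helper: write an XD shell setup script to the file in $1.
# $2 is the HDFS namenode URL, $3 the XD admin server URL.
write_setup_script() {
    cat > "$1" <<EOF
hadoop config fs --namenode $2
admin config server $3
hadoop fs ls /
stream list
EOF
}

# Example: regenerate setup.xd in the current directory.
write_setup_script ./setup.xd hdfs://localhost:8020 http://localhost:9393
```

The generated file is then loaded from inside xd-shell in the same way as the hand-written one.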

Script for loading a file into GemFire XD via Spring XD

stream create --name fileload --definition "file --dir=/tmp/xd/input/load --outputType=text/plain | jdbc --tableName=APP.filetest --columns=id,name" --deploy

4. Spring XD Configuration for GemFire XD

Copy the GemFire XD JDBC driver to Spring XD (you may need tools.jar as well):

cp /usr/lib/gphd/Pivotal_GemFireXD_10/lib/gemfirexd-client.jar /opt/pivotal/spring-xd/xd/lib/

Modify the sink's JDBC properties to point to your GemFire XD. If you are using the Pivotal HD VM and installed Spring XD with yum (sudo yum update spring-xd), this is the location:

/opt/pivotal/spring-xd/xd/config/modules/sink/jdbc/jdbc.properties

url = jdbc:gemfirexd://localhost:1527
username = gfxd
password = gfxd
driverClassName = com.pivotal.gemfirexd.jdbc.ClientDriver

For the peer client driver you need more files from the GemFire XD lib directory (.so binaries); symlinking them is probably a good idea.

5. GemFire XD Setup

gfxd
connect client 'localhost:1527';
create table filetest (id int, name varchar(100)) REPLICATE PERSISTENT;
select id, kind, netservers from sys.members;
select * from filetest;

Spring XD commands

stream list

Shows your streams.

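6. Ingesting Data from RabbitMQ into an RDBMS via Spring XD

A simple stream that reads from a Rabbit queue named "rq" and sends each message to a SQL database with the columns "message" and "host", creating a new table named "rq".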

stream create --name rq --definition "rabbit --outputType=text/plain | jdbc --columns='message,host' --initializeDatabase=true" --deploy

7. Ingesting Data from a REST API into HDFS via Spring XD

stream create --name hdfssave --definition "http | hdfs" --deploy
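To exercise this stream, something has to POST to the http source. The sketch below sends a few lines and then lists the stream's output directory. It rests on two assumptions that should be checked against your configuration: the http source listens on port 9000, as in the earlier trigger script, and the hdfs sink writes under /xd/<stream name>, which matches the /xd/ listing used earlier. The function names are illustrative.

```shell
#!/bin/sh
# Assumed convention: the hdfs sink writes under /xd/<stream name>.
hdfs_dir_for_stream() {
    printf '/xd/%s' "$1"
}

# POST each line of stdin as one message to the http source (URL in $1).
post_lines() {
    while IFS= read -r line; do
        curl -s -X POST -d "$line" "$1"
    done
}

# Example (needs a running XD and a Hadoop client, so it is left commented out):
# printf 'one\ntwo\n' | post_lines http://localhost:9000
# hadoop fs -ls "$(hdfs_dir_for_stream hdfssave)"
```

The sink may buffer its output, so the data might not be visible as a finished file until the sink rolls the file over or the stream is undeployed.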

Translated from: https://www.javacodegeeks.com/2015/03/spring-xd-for-data-ingestion.html