Java dynamic JSON ingestion: generating a schema dynamically from a JSON file
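Some points of interest:
1) You don't need a DataFrame to load your JSON schema. The schema is loaded and processed on the driver, because there is no need to distribute it, which would only add unnecessary overhead.
2) I build a list of StructField objects from the JColumn objects and pass it to StructType to construct the schema dynamically.
3) inferSchema should be false, because we define the schema explicitly.
4) I assume your database table uses the string "null" to represent null values.
5) To adjust the type mapping, modify typeMapping.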
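Point 5 concerns the `typeMapping` lookup in the code that follows: `typeMapping(c._type)` throws a bare `NoSuchElementException` for any type name the map lacks (for example `timestamp`). The pure-Scala sketch below shows one way to report the offending column instead. To run without Spark on the classpath it represents each Spark type by its DDL name (`INT`, `DOUBLE`, ...); the `ColumnDef` class and `resolveType` helper are hypothetical names used only for illustration.

```scala
// Hypothetical stand-in for the two JColumn fields that matter for type resolution.
final case class ColumnDef(name: String, typeName: String)

object TypeResolution {
  // Sketch of typeMapping, with Spark types written as DDL names so no Spark dependency is needed.
  val typeMapping: Map[String, String] = Map(
    "double"  -> "DOUBLE",
    "integer" -> "INT",
    "string"  -> "STRING",
    "date"    -> "DATE",
    "bool"    -> "BOOLEAN"
  )

  // Returns Right(type) when the name is known, otherwise Left(message naming the column).
  def resolveType(col: ColumnDef): Either[String, String] =
    typeMapping.get(col.typeName)
      .toRight(s"column '${col.name}': unsupported type '${col.typeName}'")

  def main(args: Array[String]): Unit = {
    val cols = List(ColumnDef("id", "integer"), ColumnDef("ts", "timestamp"))
    cols.map(resolveType).foreach(println)
  }
}
```

Collecting all `Left` values before building the schema surfaces every unsupported type in one pass, rather than failing on the first one.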
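import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.json4s._
import org.json4s.native.JsonMethods

// One column definition as it appears in schema.json
case class JColumn(trim: Boolean, name: String, nullable: Boolean, id: Option[String], position: BigInt, table: String, _type: String, primaryKey: Boolean)

// In spark-shell a session already exists; getOrCreate() returns it
val spark = SparkSession.builder().getOrCreate()

// Read and parse the schema file on the driver; no DataFrame is involved
val path = """your_path\schema.json"""
val input = scala.io.Source.fromFile(path)
val json = try JsonMethods.parse(input.reader()) finally input.close()

// JSON type name -> Spark SQL type; extend this map to support more types
val typeMapping: Map[String, DataType] = Map(
  "double" -> DoubleType,
  "integer" -> IntegerType,
  "string" -> StringType,
  "date" -> DateType,
  "bool" -> BooleanType)

implicit val formats: Formats = DefaultFormats
val schema = json.extract[List[JColumn]]
//schema.foreach(c => println(s"name:${c.name} type:${c._type} isnullable:${c.nullable}"))

// One StructField per column definition, in the order they appear in the file
val fields = schema.map(c => StructField(c.name, typeMapping(c._type), c.nullable, Metadata.empty))

// .csv(...) uses Spark's built-in CSV source (Spark 2.x+), so no format(...) call is needed
val in_emp = spark.read
  .schema(StructType(fields))
  .option("inferSchema", "false")             // the schema is given explicitly
  .option("dateFormat", "yyyy.MM.dd")
  .option("header", "false")
  .option("delimiter", ",")
  .option("nullValue", "null")                // the literal token null means NULL
  .option("treatEmptyValuesAsNulls", "true")  // option of the older com.databricks spark-csv package; the built-in source does not use it
  .csv("""your_path\employee.csv""")

in_emp.printSchema()
in_emp.show()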
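The JSON carries a `position` field that the code above never reads; the columns simply follow array order. If the file might list columns out of order, one option is to sort by `position` first. The sketch below builds a DDL-formatted schema string (for example `` `id` INT, `salary` DOUBLE ``) instead of `StructField` objects, without needing Spark to run. The `Col` class and `toDdl` helper are hypothetical, and this string form leaves out the `nullable` flag.

```scala
// Hypothetical, trimmed-down JColumn: only the fields the DDL string needs.
final case class Col(name: String, typeName: String, position: Int)

object DdlSchema {
  // Same JSON type names as typeMapping, mapped to Spark SQL DDL type names.
  private val ddlTypes: Map[String, String] = Map(
    "double" -> "DOUBLE", "integer" -> "INT", "string" -> "STRING",
    "date" -> "DATE", "bool" -> "BOOLEAN")

  // Sorts columns by declared position and renders them as "`name` TYPE" pairs.
  // Backticks protect names containing spaces or reserved words; embedded backticks are doubled.
  def toDdl(cols: Seq[Col]): String =
    cols.sortBy(_.position)
      .map(c => s"`${c.name.replace("`", "``")}` ${ddlTypes(c.typeName)}")
      .mkString(", ")

  def main(args: Array[String]): Unit = {
    val cols = Seq(
      Col("department", "string", 3), Col("id", "integer", 0),
      Col("dob", "date", 2), Col("salary", "double", 1))
    println(toDdl(cols))
  }
}
```

Since Spark 2.3, `DataFrameReader.schema` also accepts such a string, so the result could be used as `spark.read.schema(ddl).csv(path)` as an alternative to assembling a `StructType` by hand.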
I tested with the following schema:
[
  {
    "trim": true,
    "name": "id",
    "nullable": true,
    "id": null,
    "position": 0,
    "table": "employee",
    "_type": "integer",
    "primaryKey": true
  },
  {
    "trim": true,
    "name": "salary",
    "nullable": true,
    "id": null,
    "position": 1,
    "table": "employee",
    "_type": "double",
    "primaryKey": false
  },
  {
    "trim": true,
    "name": "dob",
    "nullable": true,
    "id": null,
    "position": 2,
    "table": "employee",
    "_type": "date",
    "primaryKey": false
  },
  {
    "trim": true,
    "name": "department",
    "nullable": true,
    "id": null,
    "position": 3,
    "table": "employee",
    "_type": "string",
    "primaryKey": false
  }
]
and the following data (employee.csv):
1211,3500.0,null,marketing
1212,3000.0,2016.12.08,IT
1213,4000.0,2017.10.20,HR
1214,3000.0,2017.10.20,finance
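To make the `nullValue` and `dateFormat` options concrete: with the settings above, the literal token `null` in the first row becomes a missing value, and `2016.12.08` is read with the pattern `yyyy.MM.dd`. The pure-Scala sketch below imitates that per-field conversion for this four-column layout using `java.time`. It is not Spark's CSV parser, and the `Employee` class and `parseLine` helper are hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical typed row for the employee.csv layout (id, salary, dob, department).
final case class Employee(id: Option[Int], salary: Option[Double],
                          dob: Option[LocalDate], department: Option[String])

object EmployeeParser {
  private val NullToken = "null"                                  // mirrors .option("nullValue", "null")
  private val DateFmt = DateTimeFormatter.ofPattern("yyyy.MM.dd") // mirrors .option("dateFormat", "yyyy.MM.dd")

  // The null token and empty fields become None; anything else is converted.
  private def field[A](raw: String)(convert: String => A): Option[A] =
    if (raw == NullToken || raw.isEmpty) None else Some(convert(raw))

  // Splits on commas (keeping trailing empty fields) and converts each column by its type.
  def parseLine(line: String): Employee = {
    val Array(id, salary, dob, dept) = line.split(",", -1).map(_.trim)
    Employee(
      field(id)(_.toInt),
      field(salary)(_.toDouble),
      field(dob)(LocalDate.parse(_, DateFmt)),
      field(dept)(s => s))
  }

  def main(args: Array[String]): Unit = {
    println(parseLine("1211,3500.0,null,marketing"))
    println(parseLine("1212,3000.0,2016.12.08,IT"))
  }
}
```

Treating empty fields as missing here corresponds to what `treatEmptyValuesAsNulls` was meant to do in the spark-csv package.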