Соотнесите вложенную строку JSON с помощью pyspark
Входные данные:
{"id": "188325809", "version": "HEAD", "created_at": "2019-08-08T19:32:04Z", "contact": "{\"id\": \"188325809\", \"phone\": [{\" number\": \"77730455\", \" code\": \"353\", \" altNo\": false}, {\" number\": \"77730466\", \" code\": \"353\", \" altNo\": false}], \"fax\": [{\"faxNumber\": \"77730998\", \" code\": \"353\"}, {\"faxNumber\": \"77730889\", \" code\": \"353\"}]}"}
Я
Используйте Pyspark для
Расплющить его, как ,
{id: "188325809", "version": "HEAD","created_at": "2019-08-08T19:32:04Z","number": "77730455", "code": "353","altno":"false","faxNumber": "77730998","code": "353"} {id: "188325809", "version": "HEAD","created_at": "2019-08-08T19:32:04Z","number": "77730466", "code": "353","altno":"false","faxNumber": "77730889","code": "353"}
Что я уже пробовал:
ds.printSchema()
root |-- created_at: string (nullable = true) |-- id: string (nullable = true) |-- contact: string (nullable = true) |-- version: string (nullable = true)
df.withColumn('contact',explode(split('contact','number'))).show()
+--------------------+---------+--------------------+-------+ | created_at| id| contact |version| +--------------------+---------+--------------------+-------+ |2019-08-08T19:32:04Z|188325809|{"id": "1883258... | HEAD| |2019-08-08T19:32:04Z|188325809|": "77730455", "i...| HEAD| |2019-08-08T19:32:04Z|188325809|": "77730466", "i...| HEAD| +--------------------+---------+--------------------+-------+
df.withColumn('contact',explode(split('contact',' '))).show()
+--------------------+---------+--------------------+-------+ | created_at| id | contact |version| +--------------------+---------+--------------------+-------+ |2019-08-08T19:32:04Z|188325809| {"id": | HEAD| |2019-08-08T19:32:04Z|188325809| "188325809",| HEAD| |2019-08-08T19:32:04Z|188325809| "phone": | HEAD| |2019-08-08T19:32:04Z|188325809| [{"number": | HEAD| |2019-08-08T19:32:04Z|188325809| "77730455",| HEAD| |2019-08-08T19:32:04Z|188325809| "code": | HEAD| |2019-08-08T19:32:04Z|188325809| "353",| HEAD| |2019-08-08T19:32:04Z|188325809| "altno":| HEAD| |2019-08-08T19:32:04Z|188325809| false},| HEAD| |2019-08-08T19:32:04Z|188325809| {"number": | HEAD| |2019-08-08T19:32:04Z|188325809| "77730466",| HEAD| |2019-08-08T19:32:04Z|188325809| "code": | HEAD| |2019-08-08T19:32:04Z|188325809| "353",| HEAD| |2019-08-08T19:32:04Z|188325809| "altno": | HEAD| |2019-08-08T19:32:04Z|188325809| false}],| HEAD| |2019-08-08T19:32:04Z|188325809| "fax":| HEAD| |2019-08-08T19:32:04Z|188325809| [{"faxNumber":| HEAD| |2019-08-08T19:32:04Z|188325809| "77730998",| HEAD| |2019-08-08T19:32:04Z|188325809| "code": | HEAD| |2019-08-08T19:32:04Z|188325809| "353"},| HEAD| +--------------------+---------+--------------------+-------+
только показывает первые 20 записей