I need to create an Avro file, and for that I need two things:
1) JSON
2) An Avro schema
Of these two, I already have the JSON:
{"web-app": {
"servlet": [
{
"servlet-name": "cofaxCDS",
"servlet-class": "org.cofax.cds.CDSServlet",
"init-param": {
"configGlossary:installationAt": "Philadelphia, PA",
"configGlossary:adminEmail": "ksm@pobox.com",
"configGlossary:poweredBy": "Cofax",
"configGlossary:poweredByIcon": "/images/cofax.gif",
"configGlossary:staticPath": "/content/static",
"templateProcessorClass": "org.cofax.WysiwygTemplate",
"templateLoaderClass": "org.cofax.FilesTemplateLoader",
"templatePath": "templates",
"templateOverridePath": "",
"defaultListTemplate": "listTemplate.htm",
"defaultFileTemplate": "articleTemplate.htm",
"useJSP": false,
"jspListTemplate": "listTemplate.jsp",
"jspFileTemplate": "articleTemplate.jsp",
"cachePackageTagsTrack": 200,
"cachePackageTagsStore": 200,
"cachePackageTagsRefresh": 60,
"cacheTemplatesTrack": 100,
"cacheTemplatesStore": 50,
"cacheTemplatesRefresh": 15,
"cachePagesTrack": 200,
"cachePagesStore": 100,
"cachePagesRefresh": 10,
"cachePagesDirtyRead": 10,
"searchEngineListTemplate": "forSearchEnginesList.htm",
"searchEngineFileTemplate": "forSearchEngines.htm",
"searchEngineRobotsDb": "WEB-INF/robots.db",
"useDataStore": true,
"dataStoreClass": "org.cofax.SqlDataStore",
"redirectionClass": "org.cofax.SqlRedirection",
"dataStoreName": "cofax",
"dataStoreDriver": "com.microsoft.jdbc.sqlserver.SQLServerDriver",
"dataStoreUrl": "jdbc:microsoft:sqlserver://LOCALHOST:1433;DatabaseName=goon",
"dataStoreUser": "sa",
"dataStorePassword": "dataStoreTestQuery",
"dataStoreTestQuery": "SET NOCOUNT ON;select test='test';",
"dataStoreLogFile": "/usr/local/tomcat/logs/datastore.log",
"dataStoreInitConns": 10,
"dataStoreMaxConns": 100,
"dataStoreConnUsageLimit": 100,
"dataStoreLogLevel": "debug",
"maxUrlLength": 500}},
{
"servlet-name": "cofaxEmail",
"servlet-class": "org.cofax.cds.EmailServlet",
"init-param": {
"mailHost": "mail1",
"mailHostOverride": "mail2"}},
{
"servlet-name": "cofaxAdmin",
"servlet-class": "org.cofax.cds.AdminServlet"},
{
"servlet-name": "fileServlet",
"servlet-class": "org.cofax.cds.FileServlet"},
{
"servlet-name": "cofaxTools",
"servlet-class": "org.cofax.cms.CofaxToolsServlet",
"init-param": {
"templatePath": "toolstemplates/",
"log": 1,
"logLocation": "/usr/local/tomcat/logs/CofaxTools.log",
"logMaxSize": "",
"dataLog": 1,
"dataLogLocation": "/usr/local/tomcat/logs/dataLog.log",
"dataLogMaxSize": "",
"removePageCache": "/content/admin/remove?cache=pages&id=",
"removeTemplateCache": "/content/admin/remove?cache=templates&id=",
"fileTransferFolder": "/usr/local/tomcat/webapps/content/fileTransferFolder",
"lookInContext": 1,
"adminGroupID": 4,
"betaServer": true}}],
"servlet-mapping": {
"cofaxCDS": "/",
"cofaxEmail": "/cofaxutil/aemail/*",
"cofaxAdmin": "/admin/*",
"fileServlet": "/static/*",
"cofaxTools": "/tools/*"},
"taglib": {
"taglib-uri": "cofax.tld",
"taglib-location": "/WEB-INF/tlds/cofax.tld"}}}
But how do I create an Avro schema from it?
I'm looking for a programmatic way to do this, since I will have many schemas and cannot write an Avro schema by hand every time.
I checked 'avro-tools-1.8.1.jar', but it cannot create an Avro schema directly from JSON.
I'm looking for a JAR or Python code that can convert JSON -> Avro schema. It is OK if the data types are not perfect (strings, integers and floats are good enough to start with).
You can use the Kite SDK's JsonUtil to infer an Avro schema from a JSON input.
Example:
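// Requires the Kite SDK on the classpath; JsonUtil lives in org.kitesdk.data.spi.
import org.kitesdk.data.spi.JsonUtil;

// Sample JSON document to infer the schema from.
String json = "{\n" +
" \"id\": 1,\n" +
" \"name\": \"A green door\",\n" +
" \"price\": 12.50,\n" +
" \"tags\": [\"home\", \"green\"]\n" +
"}\n";
// Parse the JSON, infer a record schema named "myschema", and print it.
String avroSchema = JsonUtil.inferSchema(JsonUtil.parse(json), "myschema").toString();
System.out.println(avroSchema);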
Result:
{
"type":"record",
"name":"myschema",
"fields":[
{
"name":"id",
"type":"int",
"doc":"Type inferred from '1'"
},
{
"name":"name",
"type":"string",
"doc":"Type inferred from '\"A green door\"'"
},
{
"name":"price",
"type":"double",
"doc":"Type inferred from '12.5'"
},
{
"name":"tags",
"type":{
"type":"array",
"items":"string"
},
"doc":"Type inferred from '[\"home\",\"green\"]'"
}
]
}
You can find the Maven dependency here.
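Since the question also asks for Python, the same kind of inference can be sketched with the standard library alone. This is a minimal illustration, not Kite's algorithm: the helper names `avro_name` and `infer_type` are hypothetical, integers are mapped to "long" (Kite reports "int" above), arrays are assumed to be homogeneous and are typed from their first element, and keys are rewritten because Avro names may contain only letters, digits and underscores.

```python
import json
import re


def avro_name(key):
    """Turn a JSON key into a legal Avro name ([A-Za-z_][A-Za-z0-9_]*)."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", key)
    # Avro names must not start with a digit (or be empty).
    return name if re.match(r"[A-Za-z_]", name) else "_" + name


def infer_type(value, name):
    """Infer an Avro type for a parsed JSON value (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: every element has the type of the first one;
        # empty arrays fall back to string items.
        items = infer_type(value[0], name) if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        fields = []
        for key, item in value.items():
            field = avro_name(key)
            # Nested records are named after their path to keep names unique.
            fields.append({"name": field,
                           "type": infer_type(item, f"{name}_{field}")})
        return {"type": "record", "name": avro_name(name), "fields": fields}
    raise TypeError(f"unsupported JSON value: {value!r}")


sample = '{"id": 1, "name": "A green door", "price": 12.50, "tags": ["home", "green"]}'
schema = infer_type(json.loads(sample), "myschema")
print(json.dumps(schema, indent=2))
```

For the JSON in the question, keys such as "web-app" or "configGlossary:adminEmail" would come out as "web_app" and "configGlossary_adminEmail". Note that the "servlet" array there holds objects with different fields, so typing it from the first element alone would not describe the other entries; merging the fields of all elements into optional (union with "null") fields would be needed for that case.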
If you want to avoid creating a dedicated Avro schema for every JSON format, you can use the rec-avro package.
It lets you take any Python data structure, including parsed XML or JSON, and store it in Avro without needing a dedicated schema.
I tested it with Python 3.
You can install it with pip3 install rec-avro, or see the code and docs at https://github.com/bmizhen/rec-avro
I posted a JSON-to-Avro code example here: https://stackoverflow.com/a/55444481/6654219
This online tool works well if you just want to paste in JSON and copy out the generated Avro schema:
https://toolslick.com/generation/metadata/avro-schema-from-json
Source: https://stackoverflow.com/questions/46556614/is-there-a-way-to-programmatically-convert-json-to-avro-schema