How to convert a 500GB SQL table into Apache Parquet?

断了今生、忘了曾经 submitted on 2019-12-03 13:10:10

Apache Spark can be used to do this:

1. Load your table from MySQL via JDBC
2. Save it as a Parquet file

Example:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Read the table over JDBC, then write it out as Parquet
df = spark.read.jdbc("YOUR_MYSQL_JDBC_CONN_STRING", "YOUR_TABLE",
                     properties={"user": "YOUR_USER", "password": "YOUR_PASSWORD"})
df.write.parquet("YOUR_HDFS_FILE")
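For a 500GB table, a plain read like the one above goes through a single JDBC connection and a single partition, which is slow and can run out of memory. A minimal sketch of a partitioned read, assuming a hypothetical numeric primary key column id and made-up bounds and partition counts (adjust these to your own schema and cluster):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Split the read across many parallel JDBC connections by range on "id"
df = spark.read.jdbc(
    url="jdbc:mysql://your-host:3306/your_db",   # placeholder connection string
    table="YOUR_TABLE",
    column="id",              # assumed numeric column to partition on
    lowerBound=1,             # assumed min value of that column
    upperBound=500000000,     # assumed max value of that column
    numPartitions=200,        # tune to your cluster size
    properties={"user": "YOUR_USER", "password": "YOUR_PASSWORD"},
)
df.write.parquet("hdfs:///path/to/output")

You also need the MySQL JDBC driver jar on the Spark classpath (for example via the --jars option of spark-submit).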

Use Sqoop (which stands for SQL to Hadoop). A short excerpt from the documentation:

You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle or a mainframe into the Hadoop Distributed File System (HDFS).
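As a rough illustration (the connection string, credentials, paths, and mapper count below are placeholders, and the exact flags depend on your Sqoop version), an import that writes Parquet directly might look like:

sqoop import \
  --connect jdbc:mysql://your-host/your_db \
  --username YOUR_USER \
  --password YOUR_PASSWORD \
  --table YOUR_TABLE \
  --as-parquetfile \
  --target-dir /user/you/your_table_parquet \
  --num-mappers 8

The --num-mappers value controls how many parallel map tasks read from MySQL, which matters at this table size.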
