How to create tables automatically from any text file in Snowflake?

↘锁芯ラ submitted on 2021-02-11 05:55:38

Question


Is there any tool or method that creates tables automatically based on text files?

I have 100+ CSV files, and each file has a different number of columns. It would be a lot of work to create every table definition manually in Snowflake first and then load the data. I am looking for a way to load the data without creating the tables beforehand.

Please let me know if anyone knows how to tackle this. Thanks!


Answer 1:


Data processing frameworks such as Spark and Pandas have readers that can parse CSV header lines and form schemas with inferred data types (not just strings). You can leverage this to create new tables.
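As a small illustration of that inference, the sketch below reads an in-memory CSV with Pandas and inspects the column types it picked. The sample data and column names are made up for the example; nothing here touches Snowflake.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for one of the CSV files
SAMPLE_CSV = (
    "id,name,price,signup_date\n"
    "1,alice,9.99,2021-01-05\n"
    "2,bob,12.50,2021-02-11\n"
)

# Default options: the first row becomes the header, types are inferred per column
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
inferred = {col: str(dtype) for col, dtype in df.dtypes.items()}
print(inferred)

# Dates are NOT detected by default; they stay as text unless requested
df_dates = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["signup_date"])
date_is_parsed = pd.api.types.is_datetime64_any_dtype(df_dates["signup_date"])
print(date_is_parsed)
```

Integer and float columns are typed without any hints, while date columns need `parse_dates` if they should not end up as text columns in the created table.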

The following example, provided as an illustration:

  • Uses Pandas's SQL write capability with the Snowflake Connector for Python (via SQLAlchemy)
  • Assumes you need one new table per file
  • Assumes the filename portion of the input file path is the table name
  • Assumes the CSVs are of standard formatting, and have column name headers
  • Creates all tables under the same database and schema name

import sqlalchemy as sql
import pandas as pd
import os

# Set up a SQLAlchemy Engine object
# This will provide a connection pool for Pandas to use later
engine = sql.create_engine(
    'snowflake://{u}:{p}@{a}/{d}/{s}?warehouse={w}&role={r}'.format(
        u='USERNAME',
        p='PASSWORD',
        a='account.region',
        r='ROLE_NAME',
        d='DATABASE',
        s='SCHEMA',
        w='WAREHOUSE_NAME',
    )
)

# List of (n) input CSV file paths
csv_input_filepaths = [
    '/tmp/test1.csv',
    '/tmp/test2.csv',
    '/tmp/test3.csv',
]

try:
    # Process each path
    for path in csv_input_filepaths:

        # Use filename component of path as tablename
        # '/tmp/test1.csv' creates table named 'test1', etc.
        filename, _ext = os.path.splitext(os.path.basename(path))

        # By default, Pandas reads the first row as the header
        # and infers each column's type from the data
        data = pd.read_csv(path)

        # Stores into Snowflake, creating the table first
        # (the default if_exists='fail' raises ValueError if the table already exists)
        # Default args would also write the DataFrame index as a column, so we disable that
        data.to_sql(filename, engine, index=False)

finally:
    # Tear down all connections gracefully pre-exit
    engine.dispose()
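One thing to watch when re-running a script like this: `to_sql` defaults to `if_exists='fail'`, so a second run stops at the first table that already exists. The sketch below shows that behaviour and the `if_exists='replace'` alternative. It uses an in-memory SQLite connection purely as a local stand-in for the Snowflake engine (an assumption made so the snippet runs without credentials); the table name and data are made up.

```python
import io
import sqlite3

import pandas as pd

# Stand-in for the Snowflake engine so the sketch runs locally
conn = sqlite3.connect(":memory:")
try:
    data = pd.read_csv(io.StringIO("id,price\n1,9.99\n2,12.50\n"))

    # First run: the table does not exist yet, so it is created
    data.to_sql("test1", conn, index=False)

    # Second run with default arguments: the table exists, so Pandas refuses
    try:
        data.to_sql("test1", conn, index=False)
        second_run_failed = False
    except ValueError:
        second_run_failed = True

    # Explicitly drop and recreate the table instead
    data.to_sql("test1", conn, index=False, if_exists="replace")
    row_count = conn.execute("SELECT COUNT(*) FROM test1").fetchone()[0]
finally:
    conn.close()

print(second_run_failed, row_count)
```

`if_exists='append'` is the third option, useful when several files share one layout and should land in the same table.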

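The example assumes every file name is already usable as a table name. With 100+ files that may not hold (spaces, dashes, leading digits), so a small helper along these lines could normalise names first. `table_name_from_path` is a hypothetical helper, not part of Pandas or the Snowflake connector; it only enforces the rule for unquoted Snowflake identifiers (start with a letter or underscore, then letters, digits, or underscores), and lowercasing is a choice made here for consistency.

```python
import os
import re


def table_name_from_path(path):
    """Derive a table name that is safe to use unquoted from a file path."""
    # '/tmp/test1.csv' -> 'test1'
    stem, _ext = os.path.splitext(os.path.basename(path))

    # Replace anything that is not a letter, digit, or underscore
    name = re.sub(r"[^A-Za-z0-9_]", "_", stem)

    # Unquoted identifiers must not start with a digit (or be empty)
    if not re.match(r"[A-Za-z_]", name):
        name = "_" + name

    return name.lower()


print(table_name_from_path("/tmp/test1.csv"))
print(table_name_from_path("/tmp/2021 Sales-Report.csv"))
```

Different files can collapse to the same name after normalisation, so it is worth checking for duplicates before loading.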

Source: https://stackoverflow.com/questions/61921349/how-to-create-table-automatically-based-on-any-text-file-in-snowflake
