How can I import data from ASCII (ISO/IEC 8859-1) to my Rails/PGSQL database?

Backend · 3 answers · 1195 views
Asked by 我寻月下人不归, 2020-12-08 12:23

I am trying to use data from the USDA found here: http://www.ars.usda.gov/Services/docs.htm?docid=23634

ASCII (8.6Mb) - This file contains the SR26 data…

3 Answers
  • 2020-12-08 12:46

They have a version of that database for Microsoft Access; if you have it, you could import the data into Access and then convert it to MySQL using Access To MySQL.

  • 2020-12-08 13:00

    Using the tips from @craig-ringer I've created a script to migrate the original USDA Nutrition Database files to PostgreSQL.

    It is very rough at the moment, but it should help you get started.

    Check it out here: https://github.com/rlucha/usda-nutrition-database-migration

  • 2020-12-08 13:08

    The zip contains a number of files:

      inflating: DATA_SRC.txt            
      inflating: DATSRCLN.txt            
      inflating: DERIV_CD.txt            
      inflating: FD_GROUP.txt            
      inflating: FOOD_DES.txt            
      inflating: FOOTNOTE.txt            
      inflating: LANGDESC.txt            
      inflating: LANGUAL.txt             
      inflating: NUT_DATA.txt            
      inflating: NUTR_DEF.txt            
      inflating: sr26_doc.pdf            
      inflating: SRC_CD.txt              
      inflating: WEIGHT.txt         
    

    each of which appears to be in a bizarre almost-CSV-like format, e.g. NUTR_DEF.txt:

    ~287~^~g~^~GALS~^~Galactose~^~2~^~2100~
    ~291~^~g~^~FIBTG~^~Fiber, total dietary~^~1~^~1200~
    

    plus sr26_doc.pdf, the documentation.

    Creating table definitions

    So what you need to do here is create SQL table definitions for the database - with one table for each input file. You need the CREATE TABLE command for this; see the PostgreSQL documentation.

    Page 35 of the PDF should help you - "Figure 1. Relationships among files in the USDA National Nutrient Database for Standard Reference". The following pages describe the file formats, telling you what each column means. You can write CREATE TABLE statements based on this description.

    Here's an example, for FOOD_DES.txt (food description), the first entry.

    CREATE TABLE food_des (
        "NDB_No"      varchar(5) NOT NULL PRIMARY KEY,
        "FdGrp_Cd"    varchar(4) NOT NULL,
        "Long_Desc"   varchar(200) NOT NULL,
        "Shrt_Desc"   varchar(60) NOT NULL,
        "ComName"     varchar(100),
        "ManufacName" varchar(65),
        "Survey"      varchar(1),
        "Ref_desc"    varchar(135),
        "Refuse"      smallint,
        "SciName"     varchar(65),
        "N_Factor"    NUMERIC(4,2),
        "Pro_Factor"  NUMERIC(4,2),
        "Fat_Factor"  NUMERIC(4,2),
        "CHO_Factor"  NUMERIC(4,2)
    );
    

    That's a pretty literal copy of the description. It's not how I'd design the table.

    I've used NUMERIC, an arbitrary-precision decimal type, for the non-integer columns to preserve exactness. If performance matters more than exactness, you can use float4 instead.

    For relationships, you use FOREIGN KEY constraints - just colname coltype REFERENCES othertable(othercol) is sufficient to create one.
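    As a concrete sketch of that, here is how the food-group relationship could be declared. The `fd_group` column layout is taken from the SR26 documentation's description of FD_GROUP.txt, but treat this as an illustration rather than a complete schema; the remaining `food_des` columns are elided.

    ```sql
    -- The lookup table must exist before anything can reference it.
    CREATE TABLE fd_group (
        "FdGrp_Cd"   varchar(4) NOT NULL PRIMARY KEY,
        "FdGrp_Desc" varchar(60) NOT NULL
    );

    CREATE TABLE food_des (
        "NDB_No"   varchar(5) NOT NULL PRIMARY KEY,
        -- REFERENCES creates the foreign key constraint inline:
        "FdGrp_Cd" varchar(4) NOT NULL REFERENCES fd_group("FdGrp_Cd")
        -- ... remaining columns as in the definition above ...
    );
    ```

    With that in place, PostgreSQL will reject any `food_des` row whose `"FdGrp_Cd"` has no matching row in `fd_group`, so load the lookup tables first.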

    Important: I double quoted the column names to preserve the same names as in the definitions. That means you have to double quote them every time you refer to them, e.g. SELECT "NDB_No" FROM food_des;. If you don't want that, just leave off the double quotes - or pick different names. You don't have to stick to the clumsy abbreviated column names they used, and it's quite reasonable to write:

    CREATE TABLE food_description (
        ndb_no              varchar(5) NOT NULL PRIMARY KEY,
        foodgroup_code      varchar(4) NOT NULL,
        long_description    varchar(200) NOT NULL,
        short_description   varchar(60) NOT NULL,
        common_name         varchar(100),
        manufacturer_name   varchar(65),
    

    etc. Similarly, if you're working with Rails, you can convert the table definitions to follow Rails's conventions, especially if you then intend to do the data loading via Rails.

    Loading data

    If these were sane, sensible delimited files you could then just load each table using the psql command \copy, or PgAdmin-III's "import" option.

    Thankfully, it actually is CSV; they've just chosen totally bizarre delimiter and quote characters. Import via psql with:

    \copy food_des FROM 'FOOD_DES.txt' (FORMAT CSV, DELIMITER '^', QUOTE '~');
    

    or the equivalent in whatever tool you use to talk to PostgreSQL.

    The results are a sensible looking table:

    craig=> select * from food_des limit 2;
     NDB_No | FdGrp_Cd |         Long_Desc          |        Shrt_Desc         | ComName | ManufacName | Survey | Ref_desc | Refuse | SciName | N_Factor | Pro_Factor | Fat_Factor | CHO_Factor 
    --------+----------+----------------------------+--------------------------+---------+-------------+--------+----------+--------+---------+----------+------------+------------+------------
     01001  | 0100     | Butter, salted             | BUTTER,WITH SALT         |         |             | Y      |          |      0 |         |     6.38 |       4.27 |       8.79 |       3.87
     01002  | 0100     | Butter, whipped, with salt | BUTTER,WHIPPED,WITH SALT |         |             | Y      |          |      0 |         |     6.38 |       4.27 |       8.79 |       3.87
    (2 rows)
    

    Similarly, if using Rails you can use whatever Rails CSV library you want and bulk-load into models.
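    Ruby's standard CSV library handles these delimiters directly via its col_sep and quote_char options, so no custom parsing is needed. A minimal sketch (the helper name and the commented-out FoodDescription model are hypothetical, not part of the USDA files or Rails):

    ```ruby
    require 'csv'

    # Yield each row of a USDA SR26 file as an array of strings.
    # The files are ISO-8859-1 encoded, '^'-delimited, and '~'-quoted.
    def each_usda_row(path)
      File.foreach(path, encoding: 'ISO-8859-1') do |line|
        yield CSV.parse_line(line.chomp, col_sep: '^', quote_char: '~')
      end
    end

    # Hypothetical bulk load into a Rails model:
    # each_usda_row('FOOD_DES.txt') do |row|
    #   FoodDescription.create!(ndb_no: row[0], foodgroup_code: row[1],
    #                           long_description: row[2],
    #                           short_description: row[3])
    # end
    ```

    For large files like NUT_DATA.txt you'd probably still prefer \copy for speed, but this approach is convenient when you want validations and callbacks to run.
    
    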
