Python to parse html data and store into the database

Submitted by 心已入冬 on 2020-01-07 08:32:58

Question


This has been troubling me for two days. I am new to Python, and I want to parse the HTML data from the following link: http://movie.walkerplus.com/list/2015/12/

and then store the data in a PostgreSQL database named movie_db. It has a table named films, which was created with the following command:

CREATE TABLE films (
title       varchar(128) NOT NULL,
description varchar(256) NOT NULL,
directors   varchar(128)[],
roles       varchar(128)[]
);

I have parsed the data into four lists: title, description, director, and roles. For example, title = ['a', ..., 'b'], description = ['c', ..., 'f'], director = ['d', ..., 'g'], roles = [['f', 'g', 't'], ..., ['h', 't', 'u']].

sql = ("INSERT INTO films (title, description, directors, roles) "
       "VALUES (%s, %s, %s, %s);")

for obj in zip(t, des, dirt, r):
    cur.execute(cur.mogrify(sql, obj))
conn.commit()

This raises the following error:

 psycopg2.DataError: malformed array literal: "サム・メンデス"

LINE 1: ...ームズ・ボンドの戦いを描く『007』シリーズ第24作', 'サム・メ...
                                                         ^
DETAIL:  Array value must start with "{" or dimension information.     

Answer 1:


I know this error. It means you are trying to insert plain string values into array columns. You can verify the generated SQL as shown below.

# mogrify returns the query string exactly as it would be sent to the database
sql2 = cur.mogrify(sql, obj)
print(sql2)
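The same mismatch can also be spotted without a database connection by checking the Python types of each zipped row. The following is a minimal sketch, not part of the original answer: the sample values are made up, and `ARRAY_COLUMNS` and `non_list_array_values` are hypothetical names introduced only for illustration.

```python
# Hypothetical sample data shaped like the lists described in the question.
titles = ["a", "b"]
descriptions = ["c", "f"]
director = ["d", "g"]                       # flat list: one string per film
roles = [["f", "g", "t"], ["h", "t", "u"]]  # list of lists: one list per film

# Positions of the varchar[] columns in each row tuple (assumed column order).
ARRAY_COLUMNS = {2: "directors", 3: "roles"}


def non_list_array_values(row):
    """Return the names of array columns whose value is not a Python list."""
    return [name for pos, name in sorted(ARRAY_COLUMNS.items())
            if not isinstance(row[pos], list)]


for row in zip(titles, descriptions, director, roles):
    # Any column name reported here would be sent as a plain string.
    print(row, "->", non_list_array_values(row))
```

With data shaped like this, only directors is reported, which matches the value shown in the error message.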

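Your director values fetched from the HTML form a flat list of strings, so after the zip call each obj carries the director as a plain string, while the directors column is an array (varchar(128)[]). The same applies to roles if its elements are plain strings rather than lists.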

In your case you seem to be inserting only one row, so there is probably no need to zip.

I am not familiar with the API you used, but can you print the values parsed from the HTML before inserting them? Then I can provide the exact SQL required.
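One way to avoid the mismatch described above is to wrap each director string in a list before the insert, since psycopg2 adapts Python lists to PostgreSQL arrays. The sketch below is an illustration under assumptions, not the answerer's exact code: `build_rows` and `insert_films` are hypothetical helper names, the sample values are made up, and `conn` is assumed to be an already open psycopg2 connection to movie_db.

```python
SQL = ("INSERT INTO films (title, description, directors, roles) "
       "VALUES (%s, %s, %s, %s);")


def build_rows(titles, descriptions, director, roles):
    """Pair up the parsed lists, wrapping each director string in a list."""
    return [(t, des, [d], list(r))
            for t, des, d, r in zip(titles, descriptions, director, roles)]


def insert_films(conn, rows):
    """Insert the rows through an open connection (assumed, not created here)."""
    cur = conn.cursor()
    for row in rows:
        # Parameters are passed straight to execute; mogrify is not needed.
        cur.execute(SQL, row)
    conn.commit()
    cur.close()


# Hypothetical sample data shaped like the lists described in the question.
rows = build_rows(["a", "b"], ["c", "f"], ["d", "g"],
                  [["f", "g", "t"], ["h", "t", "u"]])
print(rows)
```

Calling `insert_films(conn, rows)` would then send each director as a one-element array. If a film has several directors, the parsing step would need to collect them into one list per film instead.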

Edit: About the syntax for the new array

The directors array is built with a shorthand syntax that creates a new list in which each element is itself a list. Written in a more readable form, it is equivalent to the following:

director = ['tom', 'jack', 'john']
directors = []

# Wrap each name in its own one-element list.
for d in director:
    elem_as_list = []
    elem_as_list.append(d)
    directors.append(elem_as_list)

print(director)
print(directors)
print(type(director[0]))
print(type(directors[0]))

Here is the output (Python 3):

['tom', 'jack', 'john']
[['tom'], ['jack'], ['john']]
<class 'str'>
<class 'list'>
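The shorthand referred to above does not appear in this copy of the answer. It is presumably a list comprehension, so the following sketch shows one under that assumption, reusing the sample names from the loop version.

```python
director = ["tom", "jack", "john"]

# List comprehension: wrap every name in its own one-element list.
directors = [[d] for d in director]

print(directors)  # [['tom'], ['jack'], ['john']]
```

Both forms produce the same nested list. The comprehension is just more compact.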


Source: https://stackoverflow.com/questions/37566197/python-to-parse-html-data-and-store-into-the-database
