Excel CSV output appearing wrong after calling Python File

落花浮王杯 提交于 2019-12-11 06:03:42

问题


Currently struggling with the following output in .csv where their is various random character within the is the players names and values where there shouldn't be

(I've given a picture below of the output)

I'm wondering where I'm going wrong in the code where I'm struggling to eliminate the random characters

I'm trying to remove the characters below such as Â, Ã, ©, ‰ and so on. Any suggestions?

Python Code

#importing

import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {'User-Agent': 
       'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like 
Gecko) Chrome/47.0.2526.106 Safari/537.36'}

#calling websites
page = "https://www.transfermarkt.co.uk/transfers/transferrekorde/statistik/top/plus/0/galerie/0?saison_id=2000"
pageTree = requests.get(page, headers=headers)
pageSoup = BeautifulSoup(pageTree.content, 'html.parser')

#calling players names
Players = pageSoup.find_all("a", {"class": "spielprofil_tooltip"})
#Let's look at the first name in the Players list.
Players[0].text

#calling value of players
Values = pageSoup.find_all("td", {"class": "rechts hauptlink"})
#Let's look at the first name in the Values list.
Values[0].text

PlayersList = []
ValuesList = []

for i in range(0,25):
   PlayersList.append(Players[i].text)
   ValuesList.append(Values[i].text)

df = pd.DataFrame({"Players":PlayersList,"Values":ValuesList})

df.to_csv('2000.csv', index=False)

df.head()

====================================================================

My Excel output


回答1:


...
utf8_bom = '\xEF\xBB\xBF'
with open('2000.csv', 'w') as csv_file:
    csv_file.write(utf8_bom)
    df.to_csv(csv_file, index=False, mode='a')

Explanation: The BOM is the byte order mark (q.v.). If Excel finds it at the beginning of the CSV file, it uses it to determine the encoding, which in your case is UTF-8 (the default encoding – correctly – for Python 3).


EDIT

As Mark Tolonen pointed out, the compact version of the above is the following code:

df.to_csv('2000.csv', encoding='utf-8-sig', index=False)

The -sig in the name of the encoding stands for “signature”, i.e., the BOM at the beginning which is used by Microsoft software to detect the encoding. See also the Encodings and Unicode section of the codecs manual.




回答2:


You system seems to be writing the file encoded as UTF-8. Excel expects UTF-8 files to have a BOM signature, else it assumes a text file is encoded in a locale-specific ANSI encoding. This is for backward compatibility due to Windows existing before UTF-8 did.

Python has an encoding that writes the UTF-8 BOM signature, utf-8-sig, so simply use:

df.to_csv('2000.csv', encoding='utf-8-sig', index=False)



回答3:


UPDATE:

I've fixed this situation from the following answer in the link below..

https://stackoverflow.com/a/6488070/10675615

  1. Save the exported file as a csv in the cmd prompt
  2. Open Excel
  3. Import the data using Data-->Import External Data/ Get Text/CSV --> Import Data
  4. Select the file type of "csv" and browse to your file
  5. In the import wizard change the File_Origin to "65001 UTF" (or choose the correct language character identifier)
  6. Change the Delimiter to comma
  7. Select where to import to and Finish This way the special characters should show correctly.

**



来源:https://stackoverflow.com/questions/53797028/excel-csv-output-appearing-wrong-after-calling-python-file

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!