Django: sqlite encoding of filenames

廉价感情. Submitted on 2020-01-07 04:51:07

Question


I am writing a management command (run via manage.py importfiles) that imports a given directory structure from the real file system into my self-written file storage in Django.

import os

def _handle_directory(self, directory_path, directory):
    # Walk the tree below directory_path and store every file that is found.
    for root, subFolders, files in os.walk(directory_path):
        for filename in files:
            path = os.path.join(root, filename)
            # Binary mode, so that non-text files such as PDFs are read unchanged on Windows.
            with open(path, 'rb') as f:
                file_wrapper = FileWrapper(f)
                self.cnt_files += 1
                new_file = File(directory=directory, filename=filename,
                                file=file_wrapper, uploader=self.uploader)
                new_file.save()

The full model is available on GitHub, and the full command is currently available on gist.github.com.

If you do not want to check the model: the attribute file of my File class is a FileField.

Copying the files seems to work, thanks to pajton. Nevertheless, I now receive a new exception. I think there is a problem with the sqlite encoding, but I do not know how to fix it. The value of sys.getfilesystemencoding() is mbcs.

Traceback (most recent call last):
  File ".\manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "C:\Python27\lib\site-packages\django\core\management\base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "C:\Python27\lib\site-packages\django\core\management\base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "D:\Development\github\Palco\engine\filestorage\management\commands\importfiles.py", line 63, in handle
    self._handle_directory(args[0], root)
  File "D:\Development\github\Palco\engine\filestorage\management\commands\importfiles.py", line 75, in _handle_directory
    new_file.save()
  File "D:\Development\github\Palco\engine\filestorage\models.py", line 155, in save
    super(File, self).save(*args, **kwargs)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 545, in save
    force_update=force_update, update_fields=update_fields)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 573, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 635, in _save_table
    forced_update)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 679, in _do_update
    return filtered._update(values) > 0
  File "C:\Python27\lib\site-packages\django\db\models\query.py", line 507, in _update
    return query.get_compiler(self.db).execute_sql(None)
  File "C:\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 976, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "C:\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 782, in execute_sql
    cursor.execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 69, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\backends\sqlite3\base.py", line 450, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

I changed filename in several ways, but the result is always wrong. I also tried literal values like 'foo' and u'foo', as well as different combinations of .encode(), .decode() and unidecode. :(

I am pretty sure that the problem lies with the filename. I printed the current values of filename, and the exception occurs whenever the filename contains non-ASCII characters.
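The error message recommends switching to Unicode strings. A minimal sketch of that idea, assuming the filenames arrive as byte strings in the file system encoding: the helper names to_text and iter_text_filenames are hypothetical and not part of the original command.

```python
import os
import sys


def to_text(name, encoding=None):
    """Return name as a text (Unicode) string.

    Byte strings are decoded with the given encoding, or with the
    file system encoding if none is given. Text strings pass through.
    """
    if isinstance(name, bytes):
        return name.decode(encoding or sys.getfilesystemencoding())
    return name


def iter_text_filenames(directory_path):
    """Yield (path, filename) pairs as text for every file below directory_path."""
    # Passing a text path to os.walk makes it return text names as well.
    for root, _subfolders, files in os.walk(to_text(directory_path)):
        for filename in files:
            yield os.path.join(root, filename), to_text(filename)
```

With such a helper, only text strings would reach the model fields, so the database driver never has to guess the encoding of a byte string.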

Update 1: I followed pajton's advice and logged the SQL queries. This is the result (the first line is the output of print filename). D:\temp\prak-gdv-abgabe is the argument I passed to the command.

Eigene L÷sung.pdf
(0.000) QUERY = u'BEGIN' - PARAMS = (); args=None
(0.000) QUERY = u'INSERT INTO "filestorage_file" ("directory_id", "filename", "file", "size", "content_type", "uploader_id", "datetime", "sha512") VALUES (%s, %s, %s, %s, %s, %s, %s, %s)' - PARAMS = (164, u'Eigene L\xf6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', None, None, 8, u'2014-02-26 23:21:17.735000', None); args=[164, 'Eigene L\xc3\xb6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', None, None, 8, u'2014-02-26 23:21:17.735000', None]
(0.000) QUERY = u'BEGIN' - PARAMS = (); args=None
(0.000) QUERY = u'UPDATE "filestorage_file" SET "directory_id" = %s, "filename" = %s, "file" = %s, "size" = NULL, "content_type" = %s, "uploader_id" = %s, "datetime" = %s, "sha512" = NULL WHERE "filestorage_file"."id" = %s ' - PARAMS = (164, u'D:\\Temp\\prak-gdv-abgabe\\Protokoll\\Eigene L\ufffdsung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', u'application/pdf', 8, u'2014-02-26 23:21:17.735000', 156); args=(164, 'D:\\Temp\\prak-gdv-abgabe\\Protokoll\\Eigene L\xf6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', 'application/pdf', 8, u'2014-02-26 23:21:17.735000', 156)
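The byte values in this log can be reproduced in isolation. The sketch below assumes that the Windows ANSI code page behind mbcs is cp1252 and that the console uses cp850. Both are guesses about the machine, not facts from the log.

```python
# The name as a text string, as in the PARAMS of the INSERT.
name = u'Eigene L\xf6sung.pdf'

# One byte per character in the assumed ANSI code page: the UPDATE args.
ansi_bytes = name.encode('cp1252')

# Two bytes for the umlaut in UTF-8: the INSERT args.
utf8_bytes = name.encode('utf-8')

# Interpreting the ANSI bytes as UTF-8 fails on 0xF6; with error
# replacement the result contains U+FFFD, as in the UPDATE PARAMS.
misread = ansi_bytes.decode('utf-8', 'replace')

# A cp850 console shows the byte 0xF6 as a division sign, which would
# explain the printed "Eigene L÷sung.pdf".
console_view = ansi_bytes.decode('cp850')
```

Under these assumptions, the INSERT receives a valid UTF-8 byte string, while the UPDATE receives the full path as a byte string in the ANSI code page. That byte string is not valid UTF-8, which matches the point where the exception is raised.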

Update 2: (2014-02-27 11:10 UTC) The encoding of my sqlite database is UTF-8 as verified by PRAGMA encoding;.
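For reference, the same check can be done from Python with the standard sqlite3 module. This is only a sketch against a throwaway in-memory database; the table name demo_file is made up for the example and is not the real filestorage_file table.

```python
import sqlite3

# A fresh in-memory database stands in for the real database file.
conn = sqlite3.connect(':memory:')

# PRAGMA encoding reports the text encoding used by the database.
encoding = conn.execute('PRAGMA encoding').fetchone()[0]

# A text (Unicode) filename is stored and read back unchanged.
conn.execute('CREATE TABLE demo_file (filename TEXT)')
conn.execute('INSERT INTO demo_file (filename) VALUES (?)',
             (u'Eigene L\xf6sung.pdf',))
stored = conn.execute('SELECT filename FROM demo_file').fetchone()[0]
```

This suggests that the database encoding itself is not the problem as long as the parameters are passed as text strings.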

I checked the records of my database.

   Id   |   filename                                        |   sha512      |   size
    1   |   D:\Temp\prak-gdv-abgabe\Liesmich.html           |   ffeb8c3d5   |   5927
    2   |   D:\Temp\prak-gdv-abgabe\Liesmich.md             |   d206d241f   |   407
    3   |   D:\Temp\prak-gdv-abgabe\Liesmich.txt            |   d206d241f   |   407
    4   |   D:\Temp\prak-gdv-abgabe\Linux\GDV_Praktikum.bin |   5fc5749ee   |   166925
    5   |   Eigene Lösung.pdf                               |               |

It is very interesting that the failing entry (id 5) has the expected filename but no sha512 or size values, while the other entries have the expected sha512 and size values but not the expected filename (they contain the full path instead). It seems that the custom save() method of my File class is part of the problem, but I don't understand why these strange things happen.


Answer 1:


Well, I found a ... solution. I improved the custom save() method of my File model. It no longer fires three or more saves, only one. And, this is the important change, it updates only the three fields I check in my custom save method. My save method now looks like this:

import hashlib
import mimetypes

def save(self, *args, **kwargs):
    # Save first, so that the uploaded file has been written by the storage.
    super(File, self).save(*args, **kwargs)
    do_update = False
    # Fill in only the metadata that is still missing.
    if not self.content_type:
        self.content_type = mimetypes.guess_type(self.file.name)[0]
        do_update = True
    if not self.sha512:
        self.sha512 = hashlib.sha512(self.file.read()).hexdigest()
        do_update = True
    if not self.size:
        self.size = self.file.size
        do_update = True

    if do_update:
        # Write only the three computed fields; filename is left untouched.
        super(File, self).save(update_fields=['content_type', 'sha512', 'size'], *args, **kwargs)

Now the files are imported as expected!
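The save method above reads the whole file into memory with self.file.read(). For large files, the three values could instead be computed in chunks. The following is only a sketch outside of Django; compute_metadata is a hypothetical helper, not part of the model.

```python
import hashlib
import io
import mimetypes


def compute_metadata(fileobj, name, chunk_size=64 * 1024):
    """Return content_type, sha512 and size for a binary file object."""
    digest = hashlib.sha512()
    size = 0
    fileobj.seek(0)
    # Read fixed-size chunks until read() returns an empty byte string.
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        digest.update(chunk)
        size += len(chunk)
    # Rewind, so that later readers start at the beginning again.
    fileobj.seek(0)
    return {
        'content_type': mimetypes.guess_type(name)[0],
        'sha512': digest.hexdigest(),
        'size': size,
    }


# Usage with an in-memory stand-in for an uploaded file.
demo = io.BytesIO(b'%PDF-1.4 demo')
metadata = compute_metadata(demo, u'Eigene L\xf6sung.pdf')
```

The returned dictionary uses the same three field names that are passed to update_fields, so the values could be assigned to the model before the second save.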



Source: https://stackoverflow.com/questions/22038762/django-sqlite-encoding-of-filenames
