I'm using Qt 5.2.1 to implement a program that reads in data from a file (could be a few bytes to a few GB) and visualises that data in a way that's dependent on every byte.
If you are planning to edit 10 GB files, forget about QTextEdit. This ui->hexTextView->insertPlainText
will simply eat all available memory before you have read 1/10 of the file. IMO you should use QTableView
to present and edit the data. To do that you should subclass QAbstractTableModel
. Present 16 bytes per row: the first 16 columns in hex form and the next column in ASCII form. This shouldn't be too complex. Just read the documentation of QAbstractTableModel
carefully. Caching the data will be the most important part here. If I have time, I will give a code example.
Forget about using multiple threads. This is a bad case for such a thing, and most probably you will create lots of problems related to synchronization.
OK, I had some time; here is code which works (I've tested it, it runs smoothly):
#include <QObject>
#include <QFile>
#include <QQueue>

class LargeFileCache : public QObject
{
    Q_OBJECT
public:
    explicit LargeFileCache(QObject *parent = 0);

    char geByte(qint64 pos);
    qint64 FileSize() const;

public slots:
    void SetFileName(const QString& filename);

private:
    static const int kPageSize;

    struct Page {
        qint64 offset;
        QByteArray data;
    };

private:
    int maxPageCount;
    qint64 fileSize;
    QFile file;
    QQueue<Page> pages;
};
#include <QAbstractTableModel>

class LargeFileCache;

class LageFileDataModel : public QAbstractTableModel
{
    Q_OBJECT
public:
    explicit LageFileDataModel(QObject *parent);

    // QAbstractTableModel
    int rowCount(const QModelIndex &parent) const;
    int columnCount(const QModelIndex &parent) const;
    QVariant data(const QModelIndex &index, int role) const;

public slots:
    void setFileName(const QString &fileName);

private:
    LargeFileCache *cachedData;
};
#include "lagefiledatamodel.h"
#include "largefilecache.h"

static const int kBytesPerRow = 16;

LageFileDataModel::LageFileDataModel(QObject *parent)
    : QAbstractTableModel(parent)
{
    cachedData = new LargeFileCache(this);
}

int LageFileDataModel::rowCount(const QModelIndex &parent) const
{
    if (parent.isValid())
        return 0;
    // round up so a trailing partial row is still shown
    return (cachedData->FileSize() + kBytesPerRow - 1)/kBytesPerRow;
}

int LageFileDataModel::columnCount(const QModelIndex &parent) const
{
    if (parent.isValid())
        return 0;
    return kBytesPerRow;
}

QVariant LageFileDataModel::data(const QModelIndex &index, int role) const
{
    if (index.parent().isValid())
        return QVariant();
    if (index.isValid()) {
        if (role == Qt::DisplayRole) {
            qint64 pos = index.row()*kBytesPerRow + index.column();
            if (pos >= cachedData->FileSize())
                return QString();
            return QString::number((unsigned char)cachedData->geByte(pos), 16);
        }
    }
    return QVariant();
}

void LageFileDataModel::setFileName(const QString &fileName)
{
    beginResetModel();
    cachedData->SetFileName(fileName);
    endResetModel();
}
#include "largefilecache.h"

const int LargeFileCache::kPageSize = 1024*4;

LargeFileCache::LargeFileCache(QObject *parent)
    : QObject(parent)
    , maxPageCount(1024)
    , fileSize(0) // keep geByte() safe before a file has been set
{
}

char LargeFileCache::geByte(qint64 pos)
{
    if (pos < 0 || pos >= fileSize)
        return 0;
    // Search the cached pages; on a hit, move the page to the back of the
    // queue so the front always holds the least recently used page.
    for (int i = 0, n = pages.size(); i < n; ++i) {
        qint64 k = pos - pages.at(i).offset;
        if (k >= 0 && k < pages.at(i).data.size()) {
            pages.enqueue(pages.takeAt(i));
            return pages.back().data.at(k);
        }
    }
    // Cache miss: read one page-aligned block from the file.
    Page newPage;
    newPage.offset = (pos/kPageSize)*kPageSize;
    file.seek(newPage.offset);
    newPage.data = file.read(kPageSize);
    // Enqueue at the back (most recently used), so that eviction from the
    // front drops the least recently used page, not the one just read.
    pages.enqueue(newPage);
    while (pages.count() > maxPageCount)
        pages.dequeue();
    return newPage.data.at(pos - newPage.offset);
}

qint64 LargeFileCache::FileSize() const
{
    return fileSize;
}

void LargeFileCache::SetFileName(const QString &filename)
{
    file.close();
    pages.clear(); // drop pages cached for the previous file
    file.setFileName(filename);
    file.open(QFile::ReadOnly);
    fileSize = file.size();
}
It is shorter than I expected, and it needs some improvement, but it should be a good base.
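If it helps to see the caching strategy above in isolation, here is a minimal sketch without Qt. `PageCache`, the `std::string` standing in for the file, and the tiny page/capacity defaults are all invented for illustration; the move-to-back queue and front eviction mirror the logic of `LargeFileCache::geByte`.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>

// Minimal LRU page cache: pages live in a deque, hits are promoted to the
// back (most recently used), eviction removes the front (least recently used).
class PageCache {
public:
    explicit PageCache(std::string fileContents, int pageSize = 4, int maxPages = 2)
        : file_(std::move(fileContents)), pageSize_(pageSize), maxPages_(maxPages) {}

    char getByte(std::int64_t pos) {
        if (pos < 0 || pos >= static_cast<std::int64_t>(file_.size()))
            return 0; // out of range, same policy as the Qt version
        // Look for a cached page containing pos; promote it on a hit.
        for (std::size_t i = 0; i < pages_.size(); ++i) {
            std::int64_t k = pos - pages_[i].offset;
            if (k >= 0 && k < static_cast<std::int64_t>(pages_[i].data.size())) {
                Page hit = pages_[i];
                pages_.erase(pages_.begin() + i);
                pages_.push_back(hit);
                return hit.data[static_cast<std::size_t>(k)];
            }
        }
        // Miss: "read" one aligned page from the backing store.
        Page p;
        p.offset = (pos / pageSize_) * pageSize_;
        p.data = file_.substr(static_cast<std::size_t>(p.offset), pageSize_);
        pages_.push_back(p);
        while (static_cast<int>(pages_.size()) > maxPages_)
            pages_.pop_front(); // evict the least recently used page
        return p.data[static_cast<std::size_t>(pos - p.offset)];
    }

    std::size_t cachedPages() const { return pages_.size(); }

private:
    struct Page { std::int64_t offset = 0; std::string data; };
    std::string file_;
    int pageSize_;
    int maxPages_;
    std::deque<Page> pages_;
};
```

With a 12-byte "file", 4-byte pages and room for 2 pages, reading bytes 0, 5 and 9 loads three pages and evicts the first, leaving exactly two cached.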
First of all, you don't have any multithreading in your app at all. Your FileReader
class is a subclass of QThread
, but that does not mean that all FileReader
methods will be executed in another thread. In fact, all your operations are performed in the main (GUI) thread.
FileReader
should be a QObject
subclass, not a QThread
subclass. Then you create a plain QThread
object and move your worker (reader) to it using QObject::moveToThread
. You can read about this technique here.
Make sure you have registered the FileReader::State
type using qRegisterMetaType
. This is necessary for Qt signal-slot connections to work across different threads.
An example:
HexViewer::HexViewer(QWidget *parent) :
    QMainWindow(parent),
    _ui(new Ui::HexViewer),
    _fileReader(new FileReader())
{
    qRegisterMetaType<FileReader::State>("FileReader::State");

    QThread *readerThread = new QThread(this);
    readerThread->setObjectName("ReaderThread");
    connect(readerThread, SIGNAL(finished()),
            _fileReader, SLOT(deleteLater()));
    _fileReader->moveToThread(readerThread);
    readerThread->start();

    _ui->setupUi(this);
    ...
}
void HexViewer::on_quitButton_clicked()
{
    _fileReader->thread()->quit();
    _fileReader->thread()->wait();
    qApp->quit();
}
Also, it is not necessary to allocate the data on the heap here:
while (!inFile.atEnd())
{
    QByteArray *qa = new QByteArray(inFile.read(DATA_SIZE));
    qDebug() << "emitting dataRead()";
    emit dataRead(qa);
}
QByteArray
uses implicit sharing. This means that its contents are not copied again and again when you pass a QByteArray
object between functions in read-only mode.
Change the code above to this and forget about manual memory management:
while (!inFile.atEnd())
{
    QByteArray qa = inFile.read(DATA_SIZE);
    qDebug() << "emitting dataRead()";
    emit dataRead(qa);
}
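The copy-on-write idea behind QByteArray's implicit sharing can be sketched in plain C++. `SharedBytes` is a hypothetical class built on `std::shared_ptr`, not Qt's actual implementation; it only illustrates why passing such objects by value is cheap.

```cpp
#include <memory>
#include <string>

// Sketch of implicit sharing: copies share one buffer via a reference
// count, and the buffer is duplicated only when a shared copy is written.
class SharedBytes {
public:
    explicit SharedBytes(std::string s)
        : data_(std::make_shared<std::string>(std::move(s))) {}

    // Copying only bumps a reference count; no byte buffer is duplicated.
    long useCount() const { return data_.use_count(); }
    const std::string& bytes() const { return *data_; }

    // A write detaches: the buffer is copied only if it is actually shared.
    void append(char c) {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::string>(*data_); // deep copy on demand
        data_->push_back(c);
    }

private:
    std::shared_ptr<std::string> data_;
};
```

Copying a `SharedBytes` raises the use count to 2 without touching the bytes; appending to one copy detaches it, leaving the other copy unchanged.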
But anyway, the main problem is not with multithreading. The problem is that the QTextEdit::insertPlainText
operation is not cheap, especially when you have a huge amount of data. FileReader
reads file data pretty quickly and then floods your widget with new portions of data to display.
It must also be noted that you have a very inefficient implementation of HexViewer::loadData
. You insert the text data char by char, which makes QTextEdit
constantly redraw its contents and freezes the GUI.
You should prepare the resulting hex string first (note that the data parameter is not a pointer anymore):
void HexViewer::loadData(QByteArray data)
{
    QString tmp = data.toHex();

    QString hexString;
    hexString.reserve(tmp.size() * 1.5);

    const int hexLen = 2;
    for (int i = 0; i < tmp.size(); i += hexLen)
    {
        hexString.append(tmp.mid(i, hexLen) + " ");
    }

    _ui->hexTextView->insertPlainText(hexString);
}
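The same "hex pairs separated by spaces" formatting can be written in plain C++, which makes the string-building cost easy to reason about in isolation. `toSpacedHex` is a hypothetical stand-in for the Qt version above, with the output reserved up front exactly as the Qt code does.

```cpp
#include <string>

// Format a byte buffer as lowercase hex pairs separated by single spaces,
// reserving the output once instead of growing it byte by byte.
std::string toSpacedHex(const std::string& data) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(data.size() * 3); // "xx " per byte, reserved up front
    for (unsigned char b : data) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0f]);
        out.push_back(' ');
    }
    if (!out.empty())
        out.pop_back(); // drop the trailing space
    return out;
}
```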
Anyway, the bottleneck of your application is not file reading but QTextEdit
updating. Loading data in chunks and then appending it to the widget using QTextEdit::insertPlainText
will not speed anything up. For files smaller than 1 MB it is faster to read the whole file at once and then set the resulting text on the widget in a single step.
I suppose you can't easily display huge texts larger than several megabytes using the default Qt widgets. This task requires a non-trivial approach that in general has nothing to do with multithreading or asynchronous data loading. It's all about creating some tricky widget which won't try to display its huge contents at once.
This seems like a case where you would want a producer-consumer setup with semaphores. There is a very specific example which can walk you through implementing it properly. You need one more thread apart from your main thread to make this work.
The setup should be: before calling QSemaphore::acquire()
on the GUI side, a check with QSemaphore::available() should be made in order to avoid blocking the GUI. That pretty much covers moving your data from the file reader to your widget, but it does not cover how to actually paint this data. To achieve this you can consume the data within a paint event, by overriding the paint event of HexViewer and reading what has been put in the queue. A more elaborate approach would be to write an event filter.
On top of this, you may want to have a maximum number of bytes read, after which HexViewer is explicitly signalled to consume the data.
Notice that this solution is completely asynchronous, thread-safe and ordered, since none of your data is pushed to HexViewer; the HexViewer only consumes it when it needs to display it on the screen.
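The producer-consumer pattern described above can be sketched without Qt using a bounded queue guarded by a mutex and condition variables, which is the standard-library equivalent of the QSemaphore acquire/release pattern. `BoundedQueue` and its capacity are invented for illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Bounded producer-consumer queue: the producer blocks when the queue is
// full, the consumer blocks when it is empty, and ordering is preserved.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    void push(std::string chunk) { // producer side
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [&] { return q_.size() < capacity_; });
        q_.push(std::move(chunk));
        notEmpty_.notify_one();
    }

    std::string pop() { // consumer side
        std::unique_lock<std::mutex> lock(m_);
        notEmpty_.wait(lock, [&] { return !q_.empty(); });
        std::string chunk = std::move(q_.front());
        q_.pop();
        notFull_.notify_one();
        return chunk;
    }

private:
    std::size_t capacity_;
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
    std::queue<std::string> q_;
};
```

A reader thread would `push` file chunks while the widget `pop`s them when it is ready to paint; with a capacity of 2, a producer pushing three chunks blocks until the consumer drains one.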
For a hex viewer, I don't think you're on the right track at all, unless you expect it to be used mostly on systems with SCSI or RAID arrays for speed. Why load gigabytes of data at a time at all? A file access to fill up a text box happens pretty fast these days. Granted, Notepad++ has an excellent hex viewer plugin, and you have to load the file first; but that's because the file may be edited, and that's the way NPP works.
I think you would likely end up subclassing a text box, fetching just enough data to fill it, or even splurging and loading 500 KB of data before and after the current position. Say you are starting at byte zero: load enough data for your display, and maybe some extra besides, but set the scrollbar policy to always visible. Then intercept the scroll events by subclassing QTextBox and writing your own scrollContentsBy() and changeEvent() and/or paint() event.
Even more simply, you could just create a QTextBox with no scrollbars at all, and a QVerticalScrollbar beside it. Set its range and starting value. Then respond to the valueChanged() event and change the contents of the QTextBox. That way the user doesn't have to wait for a long disk read before starting to edit, and it will be a lot easier on resources (i.e. memory, so that if a lot of apps are open they don't get swapped out to disk). Subclassing these things sounds hard, but it is often easier than it looks, and there are usually good examples of somebody doing something similar.
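The scrollbar-driven approach above boils down to a small piece of arithmetic: mapping the scrollbar value (one tick per 16-byte row) to the byte range that must be fetched and shown. `visibleRange` and its parameters are hypothetical helpers, not part of any Qt API.

```cpp
#include <cstdint>

struct ByteRange { std::int64_t offset; std::int64_t length; };

// Map a scrollbar value (one tick per row) to the byte range to load,
// clamping the range at the end of the file.
ByteRange visibleRange(std::int64_t scrollValue, int visibleRows,
                       std::int64_t fileSize, int bytesPerRow = 16) {
    std::int64_t offset = scrollValue * bytesPerRow;
    if (offset > fileSize)
        offset = fileSize;
    std::int64_t length = static_cast<std::int64_t>(visibleRows) * bytesPerRow;
    if (offset + length > fileSize)
        length = fileSize - offset; // partial last screen
    return {offset, length};
}
```

A valueChanged() handler would call this, read only `length` bytes at `offset`, and replace the text box contents.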
If you have multiple threads reading a file, by contrast, you may have one reading from the beginning, another from the middle, and another towards the end. A single read head will be jumping around trying to satisfy all requests, and will therefore operate less efficiently. If it is an SSD instead, non-linear reads won't hurt you, but they won't help you either. If you'd rather accept a perhaps noticeable loading time so that the user can then scroll around freely a little faster (a textbox full of data really doesn't take very long to load, after all), you might have a single thread read the file in the background while the main one keeps processing the event loop. More simply yet, just read in blocks of n megabytes at a time while opening the whole file at once, and after every block call qApp->processEvents();
to let the GUI respond to any events that may have transpired in the meantime.
If you do believe it will most likely be used on a SCSI or RAID array, then multithreaded reading may make sense. A SCSI drive can have multiple read heads, and some RAID arrays are set up to spread their data across multiple disks for speed. Note that you are better off with a single reader thread if the RAID array is set up to keep multiple identical copies of the data for redundancy. When I went to implement multithreading, I found the lightweight model proposed here most helpful: QThread: You were not doing so wrong. I did have to use Q_DECLARE_METATYPE on the result structure, define a constructor, destructor, and move operator for it (I used memmove), and call qRegisterMetaType() on both the structure and the vector holding the results, for it to return the results correctly. You pay the price of locking the vector in order to return the results, but the actual overhead of that didn't seem to be much at all. Shared memory might be worth pursuing in this context too, but perhaps each thread could have its own, so you won't need to lock out reads from other threads' results in order to write it.