Downloading whole websites with k6

Question

I'm currently evaluating whether k6 fits our load testing needs. We have a fairly traditional website architecture that uses Apache web servers with PHP and a MySQL database. Sending plain HTTP requests with k6 looks simple enough, and I think we will be able to test all major functionality with it, as we don't rely on JavaScript that much and most pages are static.

However, I'm unsure how to deal with resources (stylesheets, images, etc.) that are referenced in the HTML returned in the responses. We need to load them as well, because fetching them sometimes triggers database queries, which must be part of the load test.

Is there some out-of-the-box functionality in k6 that allows you to load all the resources like a browser would? I'm aware that k6 does NOT render the page and I don't need it to. I only need to request all the resources inside the HTML.

Answer 1:

You basically have two options, both with their caveats:

Record your session. You can either export a HAR file directly from the browser or use the recorder extension made for your browser (there is one for Firefox and one for Chrome). Both should be usable without a k6 cloud account: you just need to set them to download the HAR, and they will download it automatically (and somewhat silently) when you hit stop. Then convert the HAR into a script with either the HAR converter built into k6 (which is deprecated, but still works) or the newer har-to-k6 tool.

This method is particularly good if you have a lot of pages and/or resources, and it even works for single-page applications, because it simply takes what the browser requested as a HAR and transforms it into a script. If nothing dynamic needs to be entered (such as a username and password), the final script can be used as is most of the time.

The biggest problem with this approach is that whenever you add a CSS file you need to redo the whole exercise. This is even more problematic if your CSS/JS file names change every time their contents change, or something like that. That is what the next method is good for:

Use parseHTML, then find the elements you care about and make a request for each of them.

import http from "k6/http";
import { parseHTML } from "k6/html";

export default function () {
    const res = http.get("https://stackoverflow.com");
    const doc = parseHTML(res.body);
    // Log the href of every <link> element on the page.
    doc.find("link").toArray().forEach(function (item) {
        console.log(item.attr("href"));
        // make an http.get for each href here,
        // or add them to an array and send one http.batch request
    });
}

will produce

INFO[0001] https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d
INFO[0001] https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a
INFO[0001] https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a
INFO[0001] /opensearch.xml
INFO[0001] https://cdn.sstatic.net/Shared/stacks.css?v=53507c7c6e93
INFO[0001] https://cdn.sstatic.net/Sites/stackoverflow/primary.css?v=d3fa9a72fd53
INFO[0001] https://cdn.sstatic.net/Shared/Product/product.css?v=c9b2e1772562
INFO[0001] /feeds
INFO[0001] https://cdn.sstatic.net/Shared/Channels/channels.css?v=f9809e9ffa90

As you can see, some of the URLs are relative rather than absolute, so you will need to handle that. Also, in this example only some of the links are CSS, so more filtering is probably needed. The downside is that you have to write this code yourself, and if a relative link or some other new case shows up you need to handle it. Luckily k6 is scriptable, so you can reuse the code :D.
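As a rough illustration of the two follow-up steps mentioned above (resolving relative URLs and keeping only the stylesheets), here is a minimal sketch in plain JavaScript. The names resolveUrl and stylesheetUrls are made up for this example and are not part of k6. It assumes you have collected each link's rel and href (for instance with item.attr("rel") and item.attr("href")) into plain objects, and it deliberately does not normalise "../" segments.

```javascript
// Hypothetical helper (not a k6 API): resolve an href from the page against
// the URL of the page itself. Handles absolute, protocol-relative,
// root-relative and path-relative references; "../" is not normalised.
function resolveUrl(href, baseUrl) {
  // Already absolute, e.g. "https://cdn.sstatic.net/Shared/stacks.css"
  if (/^[a-z][a-z0-9+.-]*:/i.test(href)) {
    return href;
  }
  // Split the base URL into scheme, host and path (query/fragment dropped).
  const match = /^([a-z][a-z0-9+.-]*:)\/\/([^\/?#]+)([^?#]*)/i.exec(baseUrl);
  if (!match) {
    throw new Error("base URL must be absolute: " + baseUrl);
  }
  const scheme = match[1];
  const origin = scheme + "//" + match[2];
  const path = match[3] || "/";

  if (href.startsWith("//")) return scheme + href; // protocol-relative
  if (href.startsWith("/")) return origin + href;  // root-relative
  // Otherwise relative to the directory of the current page.
  const dir = path.slice(0, path.lastIndexOf("/") + 1);
  return origin + dir + href;
}

// Hypothetical helper: keep only <link rel="stylesheet"> entries and return
// their absolute URLs. `links` is an array like [{ rel: "...", href: "..." }].
function stylesheetUrls(links, baseUrl) {
  return links
    .filter((l) => l.href && /(^|\s)stylesheet(\s|$)/i.test(l.rel || ""))
    .map((l) => resolveUrl(l.href, baseUrl));
}
```

With this, resolveUrl("/opensearch.xml", "https://stackoverflow.com") gives https://stackoverflow.com/opensearch.xml, and the array returned by stylesheetUrls can be mapped to ['GET', url] entries for a single http.batch call.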

Answer 2:

I've followed Михаил Стойков's suggestion and written my own function to load resources. Maybe it helps some future readers. You can choose how the resources are loaded with options.concurrentResourceLoading: either as one batch request or as sequential GETs.

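import http from 'k6/http';
import { check } from 'k6';

// `options`, `resolveUrl` and `createHeader` are not shown in this answer;
// they must be defined elsewhere in the script.

/**
 * Requests every resource referenced by an href or src attribute in an HTML response.
 * @param {http.RefinedResponse<http.ResponseType>} response
 */
export function getResources(response) {
    const resources = [];
    const doc = response.html(); // parse the body once

    // Everything with an href except plain links (<a>), e.g. <link> stylesheets and icons.
    doc.find('*[href]:not(a)').each((index, element) => {
        resources.push(element.attributes().href.value);
    });
    // Everything with a src, e.g. <img> and <script>.
    doc.find('*[src]:not(a)').each((index, element) => {
        resources.push(element.attributes().src.value);
    });

    const statusCheck = {
        'resource returns status 200': (r) => r.status === 200,
    };

    if (options.concurrentResourceLoading) {
        // Load all resources in one batch request.
        const responses = http.batch(
            resources.map((r) => {
                return ['GET', resolveUrl(r, response.url), null, { headers: createHeader() }];
            })
        );
        // Check each resource response, not the original page response.
        responses.forEach((res) => {
            check(res, statusCheck);
        });
    } else {
        // Load the resources one after another.
        resources.forEach((r) => {
            const res = http.get(resolveUrl(r, response.url), {
                headers: createHeader(),
            });
            check(res, statusCheck);
        });
    }
}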
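
One thing to watch with the broad href/src selectors is that they can return the same URL several times (the log output in the first answer lists the apple-touch-icon twice) and can also pick up values that are not separate HTTP requests, such as data: URIs or in-page anchors. The following is only a sketch of how the list could be cleaned up before it is requested. The name toFetchableUrls is hypothetical, and the resolve parameter stands for whatever function plays the role of resolveUrl in your script.

```javascript
// Hypothetical helper (not a k6 API): turn raw href/src values into a list of
// unique, absolute URLs that are worth requesting.
// `resolve(value, pageUrl)` must return an absolute URL for a raw value.
function toFetchableUrls(rawValues, pageUrl, resolve) {
  const seen = new Set();
  const urls = [];
  rawValues.forEach((raw) => {
    const value = (raw || "").trim();
    // Skip empty values, in-page anchors and schemes that are not plain requests.
    if (
      value === "" ||
      value.startsWith("#") ||
      /^(data|javascript|mailto|tel|blob|about):/i.test(value)
    ) {
      return;
    }
    const url = resolve(value, pageUrl);
    // Request each distinct URL only once per page.
    if (!seen.has(url)) {
      seen.add(url);
      urls.push(url);
    }
  });
  return urls;
}
```

In getResources, the result of toFetchableUrls(resources, response.url, resolveUrl) could be used in place of the raw resources array. Whether deduplication is what you want depends on the test: a browser with a warm cache also requests each resource only once, but a test that is meant to hit the server for every reference would skip this step.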

Source: https://stackoverflow.com/questions/60927653/downloading-whole-websites-with-k6

Tags

load-testing