How to do multiprocessing in FastAPI

Submitted by 99封情书 on 2020-11-28 07:58:50

Question


While serving a FastAPI request, I have a CPU-bound task to do on every element of a list. I'd like to do this processing on multiple CPU cores.

What's the proper way to do this within FastAPI? Can I use the standard multiprocessing module? All the tutorials/questions I found so far only cover I/O-bound tasks like web requests.


Answer 1:


TL;DR

You can use loop.run_in_executor with a ProcessPoolExecutor to run a function in a separate process.

# this snippet must run inside an async def (it needs a running event loop);
# it assumes import asyncio and import concurrent.futures at the top of the module
loop = asyncio.get_event_loop()
with concurrent.futures.ProcessPoolExecutor() as pool:
    result = await loop.run_in_executor(pool, cpu_bound_func)  # wait for the result

Executing on the fly

The easiest and most native way to execute a function in a separate process and immediately wait for the result is to use loop.run_in_executor with a ProcessPoolExecutor.

As in the example below, the pool can be created at application startup; do not forget to shut it down on application exit. The number of processes in the pool is set with the max_workers parameter of the ProcessPoolExecutor constructor. If max_workers is None or not given, it defaults to the number of processors on the machine.
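
For example, a hedged one-liner (the worker count of four is just an assumed value, not something the answer prescribes):

from concurrent.futures.process import ProcessPoolExecutor

pool = ProcessPoolExecutor(max_workers=4)  # cap the pool at four worker processes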

The disadvantage of this approach is that the request handler (path operation) has to wait for the computation to complete in the separate process while the client connection remains open; if the connection is lost for some reason, the result has nowhere to be returned.

import asyncio
from concurrent.futures.process import ProcessPoolExecutor
from fastapi import FastAPI

from calc import cpu_bound_func

app = FastAPI()


async def run_in_process(fn, *args):
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(app.state.executor, fn, *args)  # wait and return result


@app.get("/{param}")
async def handler(param: int):
    res = await run_in_process(cpu_bound_func, param)
    return {"result": res}


@app.on_event("startup")
async def on_startup():
    app.state.executor = ProcessPoolExecutor()


@app.on_event("shutdown")
async def on_shutdown():
    app.state.executor.shutdown()
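
The examples import cpu_bound_func from a calc module that the answer does not show. It can be any function defined at module level, so that it is picklable and importable by the worker processes. A hypothetical placeholder, just to make the snippets runnable:

# calc.py - a hypothetical stand-in for the real CPU-bound work
def cpu_bound_func(param: int) -> int:
    # deliberately heavy loop to simulate CPU-bound work
    total = 0
    for i in range(10_000_000):
        total += i % (param + 1)
    return total

With this in place, the example above can be served with uvicorn (e.g. uvicorn app:app if the file is called app.py) and queried at /{param}.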

Move to background

Usually, CPU-bound tasks are executed in the background. FastAPI offers the ability to run background tasks after the response has been returned; inside such a task you can start and asynchronously await the result of your CPU-bound computation.

In this case, for example, you can immediately return a response of "Accepted" (HTTP code 202) and a unique task ID, continue calculations in the background, and the client can later request the status of the task using this ID.

BackgroundTasks provides a few useful properties: you can add several tasks (including from dependencies), and inside them you can use resources obtained from dependencies, which are cleaned up only after all tasks have completed; exceptions raised along the way can still be handled correctly. A short sketch of adding a task from a dependency follows below.
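
As an illustration, here is a minimal standalone sketch of a background task registered from a dependency (the log file and function names are hypothetical, not from the original answer):

from fastapi import BackgroundTasks, Depends, FastAPI

app = FastAPI()


def write_log(message: str) -> None:
    # executed after the response has been sent
    with open("log.txt", mode="a") as log:
        log.write(message + "\n")


async def audit(background_tasks: BackgroundTasks) -> None:
    # a background task registered inside a dependency; it runs together
    # with any tasks added by the path operation itself
    background_tasks.add_task(write_log, "request received")


@app.get("/ping", dependencies=[Depends(audit)])
async def ping():
    return {"pong": True}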

Below is an example with minimal task tracking. A single running instance of the application is assumed.

import asyncio
from concurrent.futures.process import ProcessPoolExecutor
from http import HTTPStatus

from typing import Dict, Optional
from uuid import UUID, uuid4

from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel, Field

from calc import cpu_bound_func


class Job(BaseModel):
    uid: UUID = Field(default_factory=uuid4)
    status: str = "in_progress"
    result: Optional[int] = None


app = FastAPI()
jobs: Dict[UUID, Job] = {}


async def run_in_process(fn, *args):
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(app.state.executor, fn, *args)  # wait and return result


async def start_cpu_bound_task(uid: UUID, param: int) -> None:
    jobs[uid].result = await run_in_process(cpu_bound_func, param)
    jobs[uid].status = "complete"


@app.post("/new_cpu_bound_task/{param}", status_code=HTTPStatus.ACCEPTED)
async def task_handler(param: int, background_tasks: BackgroundTasks):
    new_task = Job()
    jobs[new_task.uid] = new_task
    background_tasks.add_task(start_cpu_bound_task, new_task.uid, param)
    return new_task


@app.get("/status/{uid}")
async def status_handler(uid: UUID):
    return jobs[uid]


@app.on_event("startup")
async def startup_event():
    app.state.executor = ProcessPoolExecutor()


@app.on_event("shutdown")
async def on_shutdown():
    app.state.executor.shutdown()
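
A possible client-side interaction with this service, as a hedged sketch (host, port and the parameter value 42 are assumptions):

# a hypothetical client session against the service above
import time

import requests

job = requests.post("http://localhost:8000/new_cpu_bound_task/42").json()
# e.g. {"uid": "...", "status": "in_progress", "result": None}

while job["status"] != "complete":
    time.sleep(1)
    job = requests.get(f"http://localhost:8000/status/{job['uid']}").json()

print(job["result"])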

Add durability

We could improve the previous approach by saving task information in some kind of database, so that it is not lost if the application restarts. This also makes the approach more scalable: the work can be shared between several running instances of the application. For this you can use, for example, arq, a fairly simple asyncio-oriented task queue ("Job queues and RPC in python with asyncio and redis").

This is how it might look in a simplified form.

# app.py
from http import HTTPStatus
from fastapi import FastAPI
from arq import create_pool
from arq.connections import RedisSettings


app = FastAPI()


@app.post("/new_cpu_bound_task/{param}", status_code=HTTPStatus.ACCEPTED)
async def task_handler(param: int):
    job = await app.state.arq.enqueue_job('start_cpu_bound_task', param)
    app.state.jobs[job.job_id] = job
    return {"uid": job.job_id,
            "info": await job.info()
            }


@app.get("/status/{uid}")
async def status_handler(uid: str):
    return {"uid": uid,
            "info": await app.state.jobs[uid].info()
            }


@app.on_event("startup")
async def startup_event():
    app.state.arq = await create_pool(RedisSettings())
    app.state.jobs = {}

To start the worker: arq worker.WorkerSettings

# worker.py
import asyncio
from concurrent.futures.process import ProcessPoolExecutor
from calc import cpu_bound_func


async def start_cpu_bound_task(ctx, param) -> int:
    loop = asyncio.get_event_loop()
    executor = ctx["executor"]
    return await loop.run_in_executor(executor, cpu_bound_func, param)  # wait and return result


async def on_startup(ctx):
    ctx['executor'] = ProcessPoolExecutor()


async def on_shutdown(ctx):
    ctx['executor'].shutdown()


class WorkerSettings:
    functions = [start_cpu_bound_task]
    on_startup = on_startup
    on_shutdown = on_shutdown
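
Putting it together (these operational details are not spelled out in the original answer, so treat them as assumptions): install the queue with pip install arq, make sure a Redis instance is reachable at the default address used by RedisSettings(), start the worker with arq worker.WorkerSettings, and serve the API with, for example, uvicorn app:app.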

More powerful solutions

All of the above examples were pretty simple, but if you need a more powerful system for heavy distributed computing, you can look at message brokers such as RabbitMQ, Kafka or NATS, and at libraries built on top of them, such as Celery.
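
For a rough idea of what a Celery variant might look like, here is a minimal hedged sketch (the broker/backend URLs, module name and task name are assumptions, not part of the original answer):

# tasks.py - a hypothetical Celery version of the same idea
from celery import Celery

from calc import cpu_bound_func

celery_app = Celery(
    "tasks",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)


@celery_app.task
def cpu_bound_task(param: int) -> int:
    return cpu_bound_func(param)

A FastAPI handler would then call cpu_bound_task.delay(param) and return the id of the resulting AsyncResult, which the client can poll, while a separate worker process (started with celery -A tasks worker) does the actual computation.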



Source: https://stackoverflow.com/questions/63169865/how-to-do-multiprocessing-in-fastapi
