Asynchronous Task Queues


Let's start with a few basic definitions:

Broker

The storage for the tasks themselves

This can be implemented using any sort of persistence tool

In Django the most common ones in use are RabbitMQ and Redis

Producer

The code that adds tasks to the queue to be executed later.

This is application code, the stuff that makes up a Django project

Worker

The code that takes tasks from the broker and performs them

Usually there is more than one worker

Most commonly each worker runs as a daemon under supervision

Serverless

A third-party service

Takes over the roles of both Broker and Worker

Usually provided by services such as AWS Lambda

Code runs in stateless, event-triggered compute containers

Fully managed by a third party
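
To make the Broker, Producer, and Worker roles concrete, here is a minimal sketch using only the Python standard library. An in-memory `queue.Queue` stands in for a real broker such as RabbitMQ or Redis, and `send_welcome_email`, `producer`, `worker`, and `run_demo` are hypothetical names chosen for illustration. It is not how any particular task queue library is implemented.

```python
import queue
import threading

# Broker: an in-memory queue stands in for RabbitMQ or Redis (assumption).
broker = queue.Queue()
results = []


def send_welcome_email(user_id):
    """Hypothetical task body; a real one would actually send an email."""
    results.append(f"welcome email sent to user {user_id}")


def producer(user_ids):
    """Producer: application code that adds tasks to the broker."""
    for user_id in user_ids:
        broker.put((send_welcome_email, (user_id,)))


def worker():
    """Worker: takes tasks from the broker and performs them."""
    while True:
        item = broker.get()
        if item is None:  # sentinel telling this worker to stop
            break
        func, args = item
        func(*args)


def run_demo(user_ids, worker_count=2):
    """Start several workers, queue the tasks, then shut the workers down."""
    results.clear()
    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in workers:
        thread.start()
    producer(user_ids)
    for _ in workers:
        broker.put(None)  # one stop sentinel per worker
    for thread in workers:
        thread.join()
    return sorted(results)
```

Calling `run_demo([1, 2, 3])` queues three tasks and lets two workers drain them. With a real broker the producer and the workers would run in separate processes, usually on separate machines.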

When Do You Need a Task Queue

Results take time to process: a task queue should probably be used.

Users can and should see results immediately: a task queue should not be used.

Scenario: Use a task queue?
Sending bulk email: Yes
Modifying files (including images): Yes
Fetching large amounts of data from third-party APIs: Yes
Inserting or updating a lot of records into a table: Yes
Updating a user profile: No
Adding a blog or CMS entry: No
Performing time-intensive calculations: Yes
Sending or receiving of webhooks: Yes

Recommendations:

➤ Sites with small-to-medium amounts of traffic may never need a task queue for any of these
actions.

➤ Sites with larger amounts of traffic may discover that nearly every user action requires use of a
task queue.

Choosing the Right Task Queue Software

Software, with pros and cons:

Celery

Pros: A Django and Python standard, many different storage types, flexible, full-featured, great for high volume

Cons: Challenging setup, steep learning curve for anything but the basic stuff

Django Channels

Pros: De facto Django standard, flexible, easy to use, adds websocket support to Django

Cons: No retry mechanism, Redis-only

AWS Lambda

Pros: Flexible, scalable, easy setup

Cons: API call can be slow, requires external logging services, adds complexity, requires creating REST API for notifications

Redis-Queue, Huey, other Django-friendly queues

Pros: Lower memory footprint than Celery, relatively easy setup

Cons: Not as many features as Celery, usually Redis-only, smaller communities

django-background-tasks

Pros: Very easy setup, easy to use, works on Windows, good for small volume or batch jobs, uses Django ORM for backend

Cons: Uses Django ORM for backend, absolutely terrible for medium-to-high volume

Implementing Task Queues

  • Treat Tasks Like Views: keep task logic in dedicated functions so it is easy to reuse and debug

    You can put your task code into a function, put that function into a helper module, and then call that function from a task function.

  • Tasks Aren’t Free: keep in mind the server resources that tasks consume

  • Only Pass JSON-Serializable Values to Task Functions

Don’t pass in complex objects

1 Passing in an object representing persistent data. For example, ORM instances can cause a race
  condition. This is when the underlying persistent data changes before the task is run. Instead,
  pass in a primary key or some other identifier that can be used to call fresh data.
  
  In short: if you pass a reference to an object, the object may change before the task runs.

2 Passing in complex objects that have to be serialized into the task queue is time and memory
  consuming. This is counter-productive to the benefits we’re trying to achieve by using a task
  queue.
  
  In short: serializing complex objects costs extra time.
  

3 We’ve found debugging JSON-serializable values easier than debugging more complex objects.


4 Depending on the task queue in use, only JSON-serializable primitives are accepted.
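
The points above (a thin task function that calls a helper, and passing a primary key instead of an object) can be sketched in plain Python. The `USERS` dict is a hypothetical stand-in for a database table, and `enqueue` only simulates a queue that accepts JSON-serializable arguments. Neither is the API of a real task queue library.

```python
import json

# Hypothetical stand-in for a database table, keyed by primary key.
USERS = {1: {"name": "Ann", "email": "ann@example.com"}}


def build_greeting(user):
    """Helper holding the real logic; callable from a task, a view, or a test."""
    return f"Hello {user['name']} <{user['email']}>"


def greet_user_task(user_id):
    """Thin task function: receives a primary key and loads fresh data when run."""
    user = USERS[user_id]
    return build_greeting(user)


def enqueue(task_name, *args):
    """Simulate a queue that only accepts JSON-serializable arguments."""
    return json.dumps({"task": task_name, "args": args})


# Queue the task with a primary key, not with the user record itself.
message = enqueue("greet_user_task", 1)

# The record changes after the task is queued but before it runs.
USERS[1]["email"] = "ann@new.example.com"

# Later, a worker decodes the message and runs the task against fresh data.
payload = json.loads(message)
output = greet_user_task(*payload["args"])
```

Because only the key travelled through the queue, the task sees the updated email address. Passing an arbitrary object to `enqueue` fails at `json.dumps`, which is the kind of early error that keeps complex objects out of the queue.
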

  • Write Tasks as Idempotent Whenever Possible

    An idempotent task can be run multiple times and still produce the same result.

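  • Don’t Keep Important Data in Your Queue: guard against task failures and allow for retries

    Store a flag that records whether the task has succeeded, and update it once the task finishes.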
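
One possible way to combine the last two points is sketched below: the success flag lives with the data rather than in the queue, and the task checks it before doing any work, so a repeated run is harmless. `ORDERS`, `CHARGES`, `charge_order_task`, and `find_unfinished` are hypothetical names for illustration. A real project would use a database table and would also need to handle concurrent runs.

```python
# Hypothetical stand-in for a database table of orders.
ORDERS = {
    42: {"amount": 100, "charged": False},
    43: {"amount": 250, "charged": False},
}
CHARGES = []  # records every charge that was actually made


def charge_order_task(order_id):
    """Idempotent task: running it again after success changes nothing."""
    order = ORDERS[order_id]
    if order["charged"]:
        return "already charged"
    CHARGES.append((order_id, order["amount"]))
    order["charged"] = True  # flag is set only after the work succeeded
    return "charged"


def find_unfinished():
    """Rebuild the list of pending work from the data, not from the queue."""
    return [order_id for order_id, order in ORDERS.items() if not order["charged"]]


first = charge_order_task(42)
second = charge_order_task(42)  # e.g. a retry or a duplicate delivery
pending = find_unfinished()
```

If the queue is lost or flushed, `find_unfinished()` still tells you which orders need to be queued again.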

How to Guard Against Task Failures

  • Use Flower to manage and monitor tasks and workers

pypi.python.org/pypi/flower

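  • Log thoroughly

A popular error-tracking framework: Sentry

Log management

As an application grows more complex, its log output ends up all over the place: some via print, some written to stdout, some to stderr, some through logging, and some in custom xxx.log files.
As a result, the platform's application logs are scattered across many locations and cannot be managed centrally. More than one programming language may be in use as well, so standardizing and unifying the logs is no easy task.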

Sentry

  • Periodically clean up dead tasks

  • Configure failure handling for tasks explicitly

    ➤ Max retries for a task
    ➤ Retry delays
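
To show what the two settings mean, here is a minimal plain-Python sketch of a retry loop. `run_with_retries` and `flaky_task` are hypothetical names, and the loop only illustrates the idea. Task queue libraries ship their own retry support, which should be preferred in practice.

```python
import time


def run_with_retries(task, *args, max_retries=3, retry_delay=0.01):
    """Run task; on failure wait retry_delay seconds and retry.

    After max_retries failed retries the last exception is re-raised.
    """
    attempt = 0
    while True:
        try:
            return task(*args)
        except Exception:
            if attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(retry_delay)


calls = []


def flaky_task(value):
    """Hypothetical task that fails twice before succeeding."""
    calls.append(value)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return value * 2


result = run_with_retries(flaky_task, 21, max_retries=3, retry_delay=0.01)
```

Here the task succeeds on its third attempt, within the retry limit. A task that keeps failing is attempted `max_retries + 1` times in total before the error propagates, at which point it should be logged.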
