先来看几个基础的定义:
Broker
The storage
for the tasks
themselves
This can be implemented using any sort of persistence tool
In Django the most common ones in use are RabbitMQ and Redis
Producer
The code
that adds tasks to the queue to be executed later.
This is application code, the stuff that makes up a Django project
Worker
The code
that takes tasks from the broker and performs them
Usually there is more than one worker
Most commonly each worker runs as a daemon under supervision
Serverless
a third-party
service
Serverless takes over the role of Broker and Worker
Usually provided by services such as AWS Lambda
run in stateless compute containers that are event-triggered
fully managed by a 3rd party
什么时候需要任务队列
Results take time to process: Task queue should probably be used.
Users can and should see results immediately: Task queue should not be used
情境 | 是否使用 |
---|---|
Sending bulk email | 是 |
Modifying files (including images) | 是 |
Fetching large amounts of data from third-party APIs |
是 |
Inserting or updating a lot of records into a table | 是 |
Updating a user profile | 否 |
Adding a blog or CMS entry | 否 |
Performing time-intensive calculations | 是 |
Sending or receiving of webhooks | 是 |
建议:
➤ Sites with small-to-medium amounts of traffic may never need a task queue for any of these
actions.
➤ Sites with larger amounts of traffic may discover that nearly every user action requires use of a
task queue.
选择适合的任务队列软件
软件名称 | 优点 | 缺点 |
---|---|---|
Celery | A Django and Python standard , many different storage types , flexible, full-featured, great for high volume |
Challenging setup, steep learning curve for anything but the basic stuff |
DjangoChannels | Defacto Django standard , flexible, easy-to-use, adds websocket support to Django |
No retry mechanism , Redis-only |
AWSLambda | Flexible, scalable, easy setup | API call can be slow, requires external logging services, adds complexity, requires creating REST API for notifications |
Redis-Queue Huey, other Django-friendly queues | Lower memory footprint than Celery, relatively easy setup | Not as many features as Celery, usually Redis-only, smaller communities |
django-background-tasks | Very easy setup, easy to use, works on Windows, good for small volume or batch jobs, uses Django ORM for backend |
Uses Django ORM for backend, absolutely terrible for medium-to-high volume |
实现任务队列
-
将任务放在指定的函数中, 方便重复使用以及debug
Treat Tasks Like Views
can put your task code into a function, put that function into a helper module, and then call that function from a task function.
-
注意任务对服务器的消耗
Tasks Aren’t Free
-
Json序列化的参数
Only Pass JSON-Serializable Values to Task Functions
Don’t pass in complex objects
1 Passing in an object representing persistent data. For example, ORM instances can cause a race
condition. This is when the underlying persistent data changes before the task is run. Instead,
pass in a primary key or some other identifier that can be used to call fresh data.
传递一个对象的引用, 中途可能会变化
2 Passing in complex objects that have to be serialized into the task queue is time and memory
consuming. This is counter-productive to the benefits we’re trying to achieve by using a task
queue.
序列化复杂的对象会更加耗时
3 We’ve found debugging JSON-serializable values easier than debugging more complex objects.
4 Depending on the task queue in use, only JSON-serializable primitives are accepted.
-
尽可能使任务幂等
Write Tasks as Idempotent Whenever Possible
you can run the task multiple times and get the same result
-
防止任务失败, 重试机制
Don’t Keep Important Data in Your Queue
增加一个是否执行成功的标记, 在任务之后修改
- 使用
flower
管理, 追踪task
和worker
- 做好 logging 记录
流行的异常记录框架
- Sentry
日志管理
应用越做越复杂,输出日志五花八门,有print的,有写stdout的,有写stderr的, 有写logging的,也有自定义xxx.log的。
那么这将导致平台应用日志分布在各个地方,无法统一管理。而且可能用的还不止一种开发语言,想规范和统一日志不是一件容易的事。
-
定期清理无效的任务
-
手动设置 任务异常
➤ Max retries for a task ➤ Retry delays