first stack post ever 🫡.
I'm having an issue with async tasks where my function seems to be executed multiple times for a single task.
Here are my cluster settings:

    Q_CLUSTER = {
        'name': 'qcluster',
        'workers': 4,
        'threads': 1,
        'recycle': 500,
        'timeout': 30,
        'retry': 300,
        'max_attempts': 3,
        'queue_limit': 8,
        'bulk': 1,
        'save_limit': 10000,
        'orm': 'default',
        'sync': bool(IS_TEST),
    }

I have 3 models: one storing incoming reports, an items table where each incoming report is categorised into an item, and a hits table that records the number of reports for each item per day.
reports - items : many-to-many
items - hits : one-to-many

    def map_report(report_id: int) -> str:
        """Map report to item."""
        logger.warning('Starting map report: (%s)', report_id)
        report = Report.objects.get(pk=report_id)
        item = matcher(report)
        Hit.increment_item_hits(item=item)
        logger.warning('Finished map report: (%s)', report_id)
        return f'Item id: {item.pk}'

matcher returns either a new or an existing item that maps to the report.
I then call this with:

    for report in Report.objects.all():
        async_task(map_report, report_id=report.pk)

When I run this against test data of 1000 reports it works correctly. However, when I run it against a larger amount of data, like 5000 reports, tasks seem to run multiple times. Under successful tasks the correct number of tasks is shown, but the logs show map_report being run multiple times for the same task, which makes my hit counts end up higher than expected.
Also, if I run it synchronously I get the correct number of hits, but running it asynchronously makes the hit counts larger than expected, since map_report is being called too many times.
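For reference, here is a minimal sketch of how the repeated runs can be counted from the log output. It assumes each relevant log line contains the 'Starting map report: (%s)' message emitted by map_report; the function name find_repeated_reports and the sample lines are hypothetical and only there for illustration, they are not my real logs.

```python
import re
from collections import Counter

# Matches the message logged at the start of map_report, e.g.
# "Starting map report: (42)". Any prefix added by the log formatter is ignored.
START_PATTERN = re.compile(r"Starting map report: \((\d+)\)")


def find_repeated_reports(log_lines):
    """Return {report_id: start_count} for report ids started more than once."""
    starts = Counter()
    for line in log_lines:
        match = START_PATTERN.search(line)
        if match:
            starts[int(match.group(1))] += 1
    return {report_id: count for report_id, count in starts.items() if count > 1}


# Hypothetical log excerpt: report 2 is started twice.
sample_log = [
    "WARNING Starting map report: (1)",
    "WARNING Finished map report: (1)",
    "WARNING Starting map report: (2)",
    "WARNING Starting map report: (2)",
    "WARNING Finished map report: (2)",
]

repeated = find_repeated_reports(sample_log)
print(repeated)
```

Any report id that shows up in the result was started more than once, which is what I see with the larger data set.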
I have seen similar issues where the solution was to increase the cluster's retry setting, but I believe I have it set up so that a task would definitely time out before it is retried.
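To make that reasoning explicit, here is a small sketch that checks the relationship between the two settings. It only compares the retry and timeout values from my Q_CLUSTER settings; the helper name retry_margin is hypothetical, and the check says nothing about how long a task may wait before a worker picks it up.

```python
def retry_margin(q_cluster):
    """Return how many seconds 'retry' exceeds 'timeout' by.

    A positive result means a running task should hit its timeout
    before the retry interval elapses.
    """
    return q_cluster["retry"] - q_cluster["timeout"]


# Only the two relevant values from the cluster settings above.
settings_subset = {"timeout": 30, "retry": 300}

margin = retry_margin(settings_subset)
print(margin)  # 270
```

With my values the margin is 270 seconds, which is why I don't think a retry value that is too low is the cause here.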
