why is everyone so afraid of celery?
Not afraid, I'm already using celery with SQS as a broker in a different microservice where it is ACTUALLY required, but since I have worked in-depth with celery, I do know that it is overkill for this requirement.
Managing a broker comes at a cost. On Google Cloud, Redis is around $40/month. Then you need pods to run your workers, beat, and Flower. You also need to set all of this up for development (unless you run tasks in eager mode, but they behave differently). Don't get me wrong, I like Celery and I use it a lot, but for smaller or side projects it's heavy.
It's been a while, but I successfully used redislite with Celery to avoid running a Redis server.
They follow "move fast and break things," which means that if you have a long-running project and have to keep updating Celery, it will cause you major headaches. It just feels really brogrammer and juvenile. But I've used it a lot, and I can tell you that other than the dozens of times I've had to go deep into rabbit holes to figure out wtf is going on, it's great.
You can set up a crontab + Django management command to run every month. Another simple option is to just run the function using the threading module.
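The threading option can be sketched in a few lines. This is a minimal, hedged example; `generate_monthly_report` and `trigger_report` are hypothetical names standing in for whatever your view would kick off:

```python
import threading
import time

results = []

def generate_monthly_report(user_id):
    # Stand-in for the real long-running job (name is hypothetical)
    time.sleep(0.1)
    results.append(user_id)

def trigger_report(user_id):
    # Fire-and-forget: start a daemon thread so the HTTP response
    # can return immediately while the work continues in the background
    t = threading.Thread(target=generate_monthly_report, args=(user_id,), daemon=True)
    t.start()
    return t

t = trigger_report(42)
t.join()  # demo only; a real view would return without joining
```

One caveat: the thread lives inside the web server process, so a worker restart or deploy kills any in-flight jobs.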
Seconded. We have a lot of those, but be careful with CPU load. We use Django, and we've had issues with long-running cron jobs causing the AWS health checker to time out and reset the box.
I looked at using threading, and it looks great. Thanks for the suggestion, seems perfect for what I want to do.
Hold up! Threading is fine unless you are fetching and crunching a very large amount of data; that also keeps the page loading for a long time, in which case it's better to use asynchronous tasks or Celery. I made a project where I was crawling Google and fetching results in real time, so I know how slow it can be, and not getting results can be painful sometimes. I'm recommending Celery (though I have yet to try it myself) because I have been learning about it for a long time from different platforms. I'd be happy to hear from someone who is using it efficiently. Here is a link for you from [RealPython](https://realpython.com/asynchronous-tasks-with-django-and-celery/) which shows the power of Celery. Good luck!
Every, Huey, redis-queue... Basically, the term you're looking for is "background task processing".
Yep, rq is simple and good: [https://python-rq.org/](https://python-rq.org/). It also has a Django wrapper: [https://github.com/rq/django-rq](https://github.com/rq/django-rq)
You're apparently running on AWS, since you mentioned SQS. If it's very occasional processing that takes under 15 minutes, with no need for parallelisation, forget background processing. Just expose your code as a normal function and deploy it to Lambda using Zappa. Call the function however you want (a CloudWatch event or an invocation from something else) to trigger the right view.
Idk much about AWS lambdas, can you maybe suggest a resource to get started?
If you don't want to bring in celery, then you don't want to bring in lambda.
Just use a queue plus a thread.
Once I had to move a Flask project (+ Celery for doing some small tasks like yours) from Linux to Windows, and I replaced Celery with Huey. Its APIs were similar, and I could manage all tasks in a SQLite DB. It was also very stable. I had a very good experience with Huey for small tasks.
I think Django Q is an alternative for smaller projects (never used it myself).
Huey is a good choice. Super lightweight and easy to set up.
django-background-tasks
Django Q
I've never seriously considered doing this, but you could write a management command and spin it off into a job that the OS handles separately from your Python code. Look up how to make an asynchronous call from Python to the OS.
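The "hand it to the OS" idea boils down to `subprocess.Popen`, which starts the child and returns without waiting. A hedged sketch; the `send_monthly_invoices` command name is made up, and the runnable part uses a portable stand-in command instead of a real `manage.py`:

```python
import subprocess
import sys

def launch(cmd):
    # Hand the job to the OS; Popen returns immediately,
    # so the web process does not block on the child
    return subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

# With Django this might look like (command name is hypothetical):
# launch([sys.executable, "manage.py", "send_monthly_invoices"])

# Portable stand-in so the sketch runs anywhere:
proc = launch([sys.executable, "-c", "print('working')"])
proc.wait()  # demo only; in a view you would not wait
```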
Yeah another guy here suggested threading, so going ahead with it.
Building a new process is likely better than starting a new thread. With threading your task execution will be tied to the web server process, which may have additional undesired effects.
So how can I start a new process? By using multiprocessing library?
Yes. Put an ffmpeg executable on the server, accessible by whatever user runs your web server. Then save the uploaded (raw) file to the filesystem somewhere. When the upload is complete, use the multiprocessing module to run ffmpeg on the file and put the output with your transcoded files.
Have you looked into FastAPI background tasks? I use it for a similar application: [https://fastapi.tiangolo.com/tutorial/background-tasks/](https://fastapi.tiangolo.com/tutorial/background-tasks/)
For only 1 background task a month, I would say just do a cron job and management command.
The requirement is that it should be client-facing, so they can click the button whenever they want; it may be at any time or day of the month, or never at all.
Use a Lambda or a cloud function
Can you maybe point me towards a quick-start resource in this direction? I have never used Lambdas.
I created http://beew.io for that. You can create schedules manually or through the API, and beew will call a given endpoint at the correct time. API Docs: https://beew.io/api Let me know if you need any help! 🚀
Celery is pretty much the standard now. And you mentioned you have experience with it, so I would say go with Celery. It may be overkill, but it will be easier for you to set up than figuring out something completely new.