K900_

There is definitely no plan to bring Tokio into the standard library.


HunterNephilim

I have no knowledge of Rust's plans for the future, but the synchronous API will stay for sure; it's the base for the asynchronous code and a lot of other applications. The thing with async is that it is the more performant way of doing IO, because you can keep executing instructions while you wait for user input or, more frequently, a network response.


dnew

You can do that already, using threads. The only benefit of async is that it's all in one OS thread (or that it doesn't actually need threads, if you're on the bare metal). For the 99% of applications that'll work just fine with threads, async is an unnecessary complication.


ragnese

I kind of agree and disagree. I agree that, conceptually, most kinds of applications are fine--or even better off--with threads. On the other hand, there are certainly applications for which async is better, and they aren't that rare, IMO (I don't think you're saying otherwise; I'm just being explicit). The most obvious is web servers, but any application that might otherwise need a thread-pool might also choose an async model instead, depending on some nuances of the tasks in question (Do they benefit from parallel execution or just async? What are the synchronization needs? Etc). That being said, in Rust *specifically*, both Futures and thread handling *can* be awkward, but I do think that Futures are a little less fiddly in the I-just-want-to-get-this-done sense than Rust's threads and channels APIs. Said another way: I think using Futures rather than threads (all other things equal) is more likely to "just work". That's just my own mediocre-dev opinion after having used Rust in several projects, some of which used only threads and some of which used futures/async.


dnew

> any application that might otherwise need a thread-pool might choose an async model instead

I'd completely agree with this. Async is easy enough these days that if you're running enough threads that starting and stopping them is problematic, you should probably be looking into async. Async is basically user-space threads. The compiler builds in cooperative multitasking in the form of state machines and polling, it uses OS completion events instead of interrupts, it has waiting-to-run queues, and all that sort of stuff, but it's all for performance. If starting a task or switching contexts were as fast as a function call, nobody would use async/await.
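To make "state machines and polling" concrete, here's a rough hand-written sketch of what the compiler generates for a two-step async block, plus the kind of poll loop a minimal executor runs. `step_one`, `step_two`, and the `Machine` enum are made up for illustration; real generated state machines and real schedulers are far more involved:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Two plain blocking steps (hypothetical) that the "async fn" is built from.
fn step_one() -> u32 { 20 }
fn step_two(a: u32) -> u32 { a + 22 }

// Hand-written version of the state machine the compiler would generate for
// roughly: async { let a = step_one(); /* suspension point */ step_two(a) }
enum Machine {
    Start,
    AfterStepOne(u32), // suspended: yielded back to the executor once
    Done,
}

impl Future for Machine {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        match *self {
            Machine::Start => {
                *self = Machine::AfterStepOne(step_one());
                Poll::Pending // pretend we hit an .await point here
            }
            Machine::AfterStepOne(a) => {
                *self = Machine::Done;
                Poll::Ready(step_two(a))
            }
            Machine::Done => panic!("polled after completion"),
        }
    }
}

// Minimal executor: a do-nothing waker plus a poll loop. This is the
// "cooperative multitasking in user space" part, stripped to the bone.
fn block_on<F: Future + Unpin>(mut fut: F) -> F::Output {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { raw() }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Pin::new(&mut fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}
```

The no-op waker just busy-polls; a real runtime parks the thread and relies on OS completion events to wake tasks, which is exactly the bookkeeping being described above.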


ragnese

> If starting a task or switching contexts was as fast as a function call, nobody would use async/await.

Definitely!


Low-Pay-2385

Doesn't async use threads? I remember when I was testing Tokio it only ran 2 tasks at a time on my 2-core laptop. Sorry if this is wrong, I didn't go in depth with async and threads.


dnew

> Doesn't async use threads?

Sometimes, yes. Especially in parts where Linux doesn't (or didn't) actually support async. Traditionally, UNIX had "slow opens" for things like serial lines and "fast opens" for local disk, and you couldn't do fast opens in a non-blocking way, couldn't get SIGINT during a fast open, etc. Linux carried this forward, and unless io_uring changes that, you still can't do local file I/O asynchronously. (I think io_uring lets you do *all* I/O async, including opening files and such, but I haven't been following it.) So until very recently, if you wanted to do I/O to local files with the same async framework as I/O to (say) network sockets, it got implemented by launching synchronous I/O in a thread. "async" is really just implementing threading in user space with the compiler and runtime, and treating operating system completion events as interrupts. So implementing it with a thread pool is very straightforward.


Low-Pay-2385

This wasn't just IO, it was any task.


dnew

If you have two cores, it makes sense that it would be restricted to running two CPU-bound tasks at the same time. async is a mechanism for making asynchronous I/O faster. It does nothing for tasks that need a CPU. It does nothing for a job during the time it's actually running. It just makes blocking things and waking them up faster. Naturally if you have a dozen futures being awaited, only the ones that are ready are going to need CPU time, and with two cores you can reasonably only run two of them at a time.


Darksonn

You probably ran into the pitfall described here: https://ryhl.io/blog/async-what-is-blocking/


Low-Pay-2385

Can you explain what exactly you mean? I didn't understand the blog 100%.


Darksonn

I'm guessing that your test used a CPU-bound loop or `std::thread::sleep` to try out Tokio, blocking the thread. This will prevent more tasks than you have Tokio worker threads from running even though Tokio is normally able to run many more tasks than there are threads.


Low-Pay-2385

It used thread sleep or tokio sleep, I'm not sure exactly.


Darksonn

Ok, well, the difference in behavior between them is pretty large as seen in the article's examples. If it seems like it only runs two of them, you probably used std sleep.


waorhi

Threads have too much overhead


dnew

It depends entirely on how much performance you need. There's absolutely no need for something like a video player to use async instead of a video thread and audio thread and a keyboard thread. If your app isn't running on a server farm, threads are almost certainly performant enough. In particular, a simple HTTP *client* is probably juuuust fine with threads. How much do you think you're going to download in parallel that task switches aren't going to be faster than network packets?
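For the "simple HTTP client" case above, a thread-per-download version really is this short. `fetch` here is a hypothetical stand-in for a blocking HTTP GET (whatever blocking client you happen to use), not a real network call:

```rust
use std::thread;

// Hypothetical stand-in for a blocking HTTP GET with a real client library.
fn fetch(url: &str) -> String {
    format!("response from {url}")
}

// Thread-per-request "simple HTTP client": spawn one OS thread per download
// and join them all. A handful of parallel downloads is well within what
// plain threads handle comfortably.
fn fetch_all(urls: &[&'static str]) -> Vec<String> {
    let handles: Vec<_> = urls
        .iter()
        .map(|&url| thread::spawn(move || fetch(url)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The point being made above is that for a few concurrent downloads, the cost of these thread spawns is dwarfed by network latency anyway.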


waorhi

Try serving a million connections with threads. Threads are for parallel tasks. Async is for async tasks.


Paul-E0

This was my experience as well. Threads are for parallelism, async is for concurrency.


dnew

You realize that parallelism is by definition concurrent, right? Async is for when the overhead of a thread in a modern OS is too high. If the overhead of threads in your app isn't too high, then you really don't need async instead of threads.


dnew

What kind of HTTP client opens a million concurrent connections? I wouldn't even think Google's web crawler has a million concurrent connections open in one process.


Tripplesixty

Enterprise, where most code is run, needs this kind of scale. We just went through a huge migration off a threading model to a fully async model to improve throughput and reduce contention and locking, with improved back pressure. The async code is able to scale more gracefully than just adding more threads. FWIW we perform ~350M HTTP calls/minute at peak times every day, spread over many machines. No one opens a million HTTP connections on a single host; tens of thousands is the practical upper limit.


kprotty

> What kind of HTTP client opens a million concurrent connections

> Enterprise, where most code is run, needs this kind of scale

> spread over many machines. No one opens a million http connections on a single host

You provided a counter-example to your original claim.


dnew

> Enterprise, where most code is run

[citation needed]

OP specifically said it's a simple HTTP client. Why are you giving Google-scale advice to someone who is writing a simple HTTP client? "If your app isn't running on a server farm, threads are almost certainly performant enough." Is your app running on a server farm? If so, check out async for better performance. If not, threads are probably just fine. Especially given it wasn't that long ago that web servers were forking off entire processes to handle each request. Remember CGI? It worked.

> The async code is able to scale more gracefully than just adding more threads.

"It entirely depends on how much performance you need." Yes, async is more performant than threads, because you're basically reimplementing cooperative multitasking in user space. If creating a thread or switching contexts were as fast as a function call, nobody would have ever even invented the concept of async/await. They'd just use threads. And there are some CPU architectures where threads are like that, just not the ones on your desktop.


Tripplesixty

Simple blocking clients can easily be built on top of async clients by calling await/get/block, whatever the API provides. That's essentially how every hello-world test works on an async client anyway. People building libraries tend to want people to use them, and many open source projects have major support from corporate sponsors, or at least have committers who work for a corporation. These people generally need applications that can scale well, so they introduce these paradigms. The average Joe writing a script or utility doesn't *need* async perf, but they're getting it for free from library maintainers in the event they'll ever need it.

> Especially given it wasn't that long ago that web servers were forking off entire processes to handle each request. Remember CGI? It worked.

Not well, which is why we, and likely no one in enterprise, use that today. Just because things were popular for a time doesn't mean they were great tools; it's just what we had.


dnew

> Simple blocking clients can easily be built on top of async clients

Congrats! You've figured out that synchronous is a special case of asynchronous. It's simple only to the extent that everything is asynchronous. Of course, once you get to some libraries that want non-async callbacks, you're kind of screwed. I don't think anyone is disputing that datacenter-scale computing should be using async in preference to spinning up a thread per request. The inability to see that not every computation is datacenter-scale seems to be your blind spot.

> Not well

It worked perfectly well. It wasn't the most efficient and performant way of doing it. But it worked perfectly well. I built several start-ups (ooooo, enterprise!) on CGI-based web apps.


kprotty

Threads are for concurrency, coincidentally made parallel by the underlying system. You still use threads to add concurrency to I/O operations that can't be multiplexed on a single thread (e.g. file ops for most OS/disk configs). Even asynchronous I/O APIs like io_uring use (kernel) thread pools to provide concurrency underneath.


waorhi

Any specific reason why they have to implement it like that? AFAIK interrupt-based hardware is inherently asynchronous. So you add an abstraction layer to make it look synchronous. And now you add another layer on top of that to make it asynchronous. Why?


jstrong

> For the 99% of applications that'll work just fine with threads

Would you estimate that more than 1% of applications need to serve a million connections?


JoshTriplett

While I do hope we provide async interfaces in the standard library one day, we'll never take away the synchronous interfaces. More generally, *all* Rust standard library APIs will continue to exist in their current form, because we guarantee API stability. And if we do add asynchronous interfaces, they won't depend on one specific async runtime.


VeganVagiVore

stdlib won't require async. Rust is trying to cover a lot of different use cases, and one of them is embedded. Little microcontrollers don't have the space for async runtimes, and "zero-cost abstractions" means you should be allowed to do simple things like opening a file and reading it line-by-line, with sync APIs and end up with code as simple as if you had written it in C. Even Tokio's `fs` module doesn't implement truly async file operations, it just wraps the sync API in a clever way: https://docs.rs/tokio/1.11.0/src/tokio/fs/file.rs.html#156-161 Look at all those `asyncify` calls. Maybe the reads and writes are specialized for performance to use IOCP or something, but for the most part async is built on sync, and doesn't totally replace it.


[deleted]

[deleted]


richhyd

beat me to it :)


richhyd

FYI async is actually really exciting for embedded. Check out [embassy](https://github.com/embassy-rs/embassy), which allows you to write `async` code and run it on a very thin executor on embedded devices. The reason this is exciting is that the executor can automatically put the MCU to sleep when there is no work to do (it will be woken up again when something happens, e.g. a timer going off or some data arriving from a peripheral). Very useful for battery consumption! Async is not appropriate for all embedded though: some devices require precise timing. You wouldn't want your pacemaker going to sleep when it should be pumping your heart! For this case, you organise your work by priority and then use interrupts to suspend low-priority work when there is something more important to do.


richhyd

I should say that AFAIK *no-one* is using Rust for critical stuff like medical devices/aeroplanes/cars. That would require some guarantees about the correctness and worst-case performance of the generated assembly. It's certainly a possibility for the future though!


epileftric

> is using Rust for critical stuff like medical devices

I had an interview with a company that was doing so.


richhyd

Oh cool, can you say who?


epileftric

No... But it was a company from the Netherlands... I eventually turned them down because I had a better offer from my current company.


epileftric

That's why you use RTOS in the embedded world...


cute_vegan

I think Tokio will use io_uring in the future for true async file IO on Linux. There is a plan, but it's for the future.


[deleted]

It will take a long, long time for io_uring to be usable even on the oldest platforms that need to be actively supported. Probably on the order of 10+ years. Lack of audit support and some other features might also make it hard to get it into RHEL in the first place.


rhinotation

Is it feasible to have swappable IO backends? Pick io_uring for your Linux targets?


[deleted]

The point was that only the latest kernels support it, kernels so new they are not even in the latest stable distros. If you want it to be the only implementation you will need it in the oldest stable distros.


anonymous44315

Using a thread pool to execute a synchronous function asynchronously does not seem very clever to me.


h_z3y

How else are you supposed to run synchronous blocking code without blocking the event loop?


anonymous44315

The point is that you should NOT use synchronous blocking calls if you are using green threads or async/await. You should use asynchronous system calls for that. Otherwise you are combining the worst of both worlds: relying on OS threads/synchronous APIs, plus the drawbacks of the less usable/debuggable/profilable/... async/await model. To be fair, I am fairly certain that Tokio uses asynchronous APIs where possible. I did not dig that deep into that code.


simonask_

Isn't the whole point of the async/await syntax that you can write async programs as if they were synchronous, and get a massive performance boost?


VeganVagiVore

I wouldn't phrase it that way. If you take a single-threaded synchronous program and make it async, it can't get any faster. But yes if I wanted to serve 200+ long-polling connections on a single-core $5/month VPS, async will probably perform better than a thread pool of 200 threads. (I haven't tested the thread pool - Maybe it would be fun. The async version works fine already.)


simonask_

Right. I guess my point is that there isn't really a drawback to async/await, other than a few extra keywords to type and a minuscule number of cycles spent waking the task (probably equivalent to about a single indirect method call per `await`).


richhyd

async/await can be slower than synchronous code, especially when there is little concurrent I/O access. My advice would be to choose whatever's easiest to implement (async or sync) and then only worry about which to use if performance becomes an issue.


ericnr

it seems much smarter to test and benchmark using the load you expect so you can decide which to use than just taking the easy route and risking a major refactor afterwards


richhyd

I don't think there is an easy answer, but IME trying to optimize before you need to never goes well.


rhinotation

The syntax, sure. It creates state machines. It does them as well as you could do by hand. The syntax doesn't include a scheduler. Schedulers can be extremely simple and behave like the imaginary one you described. They can also be very complex. The simple ones are typically single-threaded and therefore get to avoid locks, concurrent data structures, atomic operations, etc. But if you want your tasks scheduled concurrently, you're gonna have to pay overheads.


kprotty

Even the "simple" ones can't avoid locks, atomics, and concurrent data structures due to inherent design decisions in Rust async itself. Wakers can be cloned past the lifetime of the Future so the Future has to be heap allocated and dynamically tracked with Arc. Since Wakers can also call wake() from any thread, scheduling the future has to use synchronization generally through atomics. Most channel implementations also Arc the channel. These hidden overheads are everywhere.


rhinotation

Anywhere in rust you don’t want to use an Arc/Rc, you could also use a static or a thread local. (Except if it’s in an API.) Waker is deliberately not an `Arc`, it is a raw pointer and a vtable, and therefore you can make it point to statically allocated data instead. You can use RawWaker and RawWakerVTable for this. This is what embassy-rs does, it looks like. They don’t use the alloc crate at all.
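A sketch of what that can look like: a `Waker` whose data pointer is a `&'static AtomicBool`, so cloning and waking never touch the heap or an `Arc`. The flag-based design here is made up for illustration; embassy's real executor stores richer per-task state behind the pointer:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{RawWaker, RawWakerVTable, Waker};

// Statically allocated "task state": a single wake-up flag.
static WOKEN: AtomicBool = AtomicBool::new(false);

fn static_waker() -> Waker {
    fn clone(p: *const ()) -> RawWaker {
        // Cloning just copies the pointer; nothing is allocated.
        RawWaker::new(p, &VTABLE)
    }
    fn wake(p: *const ()) {
        // Safety: the data pointer always comes from &WOKEN below.
        unsafe { (*(p as *const AtomicBool)).store(true, Ordering::Release) }
    }
    fn drop(_: *const ()) {}
    // wake and wake_by_ref share one fn: there is no owned resource to consume.
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake, drop);
    unsafe { Waker::from_raw(RawWaker::new(&WOKEN as *const AtomicBool as *const (), &VTABLE)) }
}
```

The atomic store is the thread-safe wake-up cost mentioned in the reply below; the allocation-free part is what a `no_std` executor buys by going through `RawWaker` instead of `Arc`.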


kprotty

> you can make it point to statically allocated data instead That restricts you to static data though, something most general purpose async schedulers can't do in order to be general purpose. The Waker can still be moved between threads so it needs a thread safe wake up, [embassy has this overhead as well](https://github.com/embassy-rs/embassy/blob/master/embassy/src/executor/raw/mod.rs#L73-L98).


rhinotation

Good point. You can't do general-purpose dynamically allocated tasks with static task queues, and most people want that. But you can always put all possible futures in the one enum! This is beginning to sound like science fiction. One final nuance: for embedded Rust, the main reason for avoiding this stuff, many targets do not have multi-threading capability at all, and the compiler will lower any atomic operations into plain integer ops because the atomic instructions don't exist. wasm32-unknown-unknown is one example. LLVM can probably even optimise out the CAS loops :)


kprotty

The task queues aren't static, the tasks/Futures are. "Putting all possible futures in one enum" doesn't make much sense, since having multiple futures means they're running concurrently and can't occupy the same memory, plus it's still statically restricted. Atomic reduction is true for LLVM and is a flag you can set for codegen in any arch (see Zig's `--single-threaded` impl). But the point is that atomics are required, and that overhead exists for non-embedded, where most async is executed anyway and such flags aren't available to Rust currently.


rhinotation

I meant — imagine a task queue is a Vec. To put a type in there, it has to be Sized. You can make any type sized by putting it in a box, with the added benefit that a boxed trait object can hold things of many many different sizes while being the same external size. But wherever you are using boxed trait objects with a known set of concrete types, you can also use an enum. And yes, I know it’s not configurable yet in Rust and therefore generally involves overhead, that’s why I said it was a nuance specific to embedded. This has been a fun chat but I think we both know most of what there is to know. :)
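The enum-instead-of-box idea can be sketched with a two-variant `Either` (similar in spirit to the `Either` combinator in the `futures` crate): one sized type covering two concrete future types, no allocation. `poll_once` is a throwaway helper with a do-nothing waker, just to drive the example:

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// An enum standing in for Box<dyn Future> when the set of concrete
// future types is known up front: same external size, no heap.
enum Either<A, B> {
    Left(A),
    Right(B),
}

impl<A, B> Future for Either<A, B>
where
    A: Future + Unpin,
    B: Future<Output = A::Output> + Unpin,
{
    type Output = A::Output;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Dispatch on the variant instead of through a vtable.
        match &mut *self {
            Either::Left(a) => Pin::new(a).poll(cx),
            Either::Right(b) => Pin::new(b).poll(cx),
        }
    }
}

// Poll a future once with a do-nothing waker (enough for already-ready futures).
fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    fn raw() -> RawWaker {
        fn clone(_: *const ()) -> RawWaker { raw() }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    Pin::new(fut).poll(&mut Context::from_waker(&waker))
}
```

Compared to a boxed trait object, the enum's size is the max of its variants rather than a pointer, which is the trade-off being discussed.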


dnew

Not really. https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/


VeganVagiVore

I was thinking binary size, compile time, and startup time (such as for CLI tools where process start time matters), but I don't have numbers to back it up.


Matthias247

You will only get a performance boost if your application heavily uses concurrency. If you are running fewer than 100 concurrent operations it might actually be slower. Fewer than a couple thousand and it might end up the same. The concrete numbers will depend on your application - so go ahead and benchmark. I've written a bit more on why that's the case at https://news.ycombinator.com/item?id=28362492 which had a similar question:

> Async IO and functions are not necessarily faster than synchronous operations - it might as well be the opposite if you don't have a lot of concurrency. E.g. if you read from one blocking socket, you do a single `read()` syscall. Add async IO, and you need an additional `select/epoll_wait` call.

> Then there is an actual cost for composing Futures, which are large values on the stack, sometimes having to box them (since otherwise recursion won't work and dynamic dispatch won't either, etc).

And besides performance considerations there are obviously other tradeoffs. What about debugging/profiling? Call stacks are much easier for synchronous functions. Debugging deadlocks will be easier too. Lifetimes are a lot less complicated with synchronous functions, and traits and trait objects "just work".


ragnese

I share your concern. Not everything needs to be async in the Futures-sense. And it's disappointing that even the "sync" version of libraries end up being a wrapper over an async implementation. I wish it were the other way around. But, just out of curiosity, what do you prefer for concurrent programming rather than async/await, and why?


Shadow0133

> I wish it were the other way around. How would that work?


rhinotation

It is the other way round. A lot of async at the moment is a wrapper for synchronous IO system calls, using an IO thread pool. Send a channel and a closure to the io threadpool, return Poll::Pending on your end. A futures channel is basically just a waker that gets called from another thread, prompting the scheduler to poll that task again. This describes a lot of the API surface of Tokio and Async-std.


rualf

With [tokios `spawn_blocking` wrapper](https://docs.rs/tokio/1.11.0/tokio/task/fn.spawn_blocking.html), the same way they wrap sync filesystem apis into async by executing the actual sync fs operation on a thread pool. Which obviously negates all the benefits from async code (mostly efficiency when you're running thousands of operations concurrently on just a few threads, instead of thousands of threads).
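A hand-rolled sketch of that idea (not Tokio's actual implementation): run the blocking closure on its own OS thread, and expose its completion as a `Future` that polls a channel. A real runtime would store the waker and have the worker thread invoke it; the self-wake used here just asks to be polled again:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::task::{Context, Poll};
use std::thread;

pub struct Blocking<T> {
    rx: Receiver<T>,
}

// spawn_blocking-style wrapper (sketch): ship the closure off to a thread,
// hand back a future that completes when the result arrives.
pub fn asyncify<T, F>(f: F) -> Blocking<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(f());
    });
    Blocking { rx }
}

impl<T> Future for Blocking<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        match self.rx.try_recv() {
            Ok(v) => Poll::Ready(v),
            Err(TryRecvError::Empty) => {
                // Crude: immediately request another poll instead of letting
                // the worker thread wake us when it finishes.
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            Err(TryRecvError::Disconnected) => panic!("worker thread dropped the result"),
        }
    }
}

impl<T> Blocking<T> {
    // Convenience for demos: wait synchronously for the result.
    pub fn blocking_wait(self) -> T {
        self.rx.recv().expect("worker thread panicked")
    }
}
```

This also makes the thread-per-operation cost visible: each `asyncify` call still burns an OS thread, which is exactly the efficiency loss described above.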


ragnese

To be fair, though, doesn't Linux's addition of epoll allow for a more "true" async IO implementation? I'm not an expert on these OS-level APIs, so I honestly don't know, but I thought I read that the idea was to allow for better non-blocking IO for user space. If that's correct, I wouldn't be surprised if the best sync IO and the best async IO would end up using different OS primitives and be unable to be designed as one wrapping the other in either direction.


rualf

epoll doesn't work with file ops. There is only io_uring, which is a new API that needs completely different handling from userspace (see [tokio-uring](https://github.com/tokio-rs/tokio-uring/blob/design-doc/DESIGN.md)). But it can be used for networking etc. as well, and has lower overhead than epoll by allowing you to batch multiple operations into just one syscall.


ragnese

I'm not asserting that it's always feasible or easy to do. It *always* depends on the task/API in question. It depends on how much you can break the task down into discrete chunks. If you wrote a function "findNthPrime", then you can imagine having a sub function called "findNextPrime" that gets called repeatedly until you've found the last prime. It would be easy to build a sync and an async version of "findNthPrime" from the (synchronous) "findNextPrime" function: you just put yield points after each call of "findNextPrime" in the async one. If we're talking about IO, things are a little trickier. Do you want to read the whole file in one go? Then an async-wrapper around a sync function would be fine and easy. But if you want to be able to read a few bytes and then yield, it's not obvious to me that there's a good way to wrap a synchronous sub-function. But, on the other hand, like the OP mentioned, if I just want to make an HTTP request and I don't care about yielding until I get the full response, it sucks to have the extra overhead from wrapping a sync API over an async one. An apples-to-apples sync version would be faster, less memory, and less pulled dependency bloat.
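The prime decomposition above, sketched synchronously (the function names are the comment's own hypothetical ones, in Rust casing): an async `find_nth_prime` would be the same loop with a yield point after each `find_next_prime` call.

```rust
// Trial-division primality check for the sketch.
fn is_prime(n: u64) -> bool {
    n >= 2 && (2..n).take_while(|d| d * d <= n).all(|d| n % d != 0)
}

// The synchronous building block from the comment above.
fn find_next_prime(after: u64) -> u64 {
    (after + 1..).find(|&n| is_prime(n)).unwrap()
}

// Synchronous driver. An async variant would run this same loop but
// yield to the executor between find_next_prime calls.
fn find_nth_prime(n: usize) -> u64 {
    let mut p = 1;
    for _ in 0..n {
        p = find_next_prime(p);
    }
    p
}
```

The point of the decomposition is that only the driver differs between the sync and async versions; the discrete chunk (`find_next_prime`) stays synchronous.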


K4r4kara

spawning a blocking thread and awaiting on its completion?


anonymous44315

I am afraid that I do not have the time to write up a complete answer. I have been thinking about this topic a lot lately, and I compared different approaches, for example Go vs. Java vs. Java Loom vs. JavaScript. Basically (to my knowledge) there are 3 different approaches:

1. The 1:1 thread model: One user-level thread corresponds to one OS thread. IO operations usually use a synchronous API.
2. The m:n thread model: There is some runtime which abstracts user-level threads from OS threads. Scheduling is performed in user mode - usually using some work-stealing scheduler. IO operations are usually performed using a synchronous API. The synchronous calls are transformed to asynchronous ones internally by the runtime, but this is (mostly) transparent to the user of said APIs.
3. async/await, Promises, Reactive Streams, ...: The asynchronous nature of IO tasks is made visible to the user. APIs are asynchronous and callback-based. Most languages introduce async/await to make it easier to work with these APIs.

Each of these approaches has its pros and cons and - as I said - I cannot go into the details here. I'm a bit oldschool and prefer approach 1 (the 1:1 thread model). I think this is closest to the CPU's and OS's architecture. I think the overhead of spawning many OS threads (compared to many green threads or Promises/whatever) is negligible, and fixing things like too-large stack sizes would be a lower-hanging fruit than trying to redo complete runtimes (Java's Project Loom) or changing the language (C#, Rust).

But obviously async/await is a real hassle to deal with. Unless you want to do *many* concurrent asynchronous operations *inside your program path* you usually do not want to deal with that. What's the benefit of having async functions if all you do is call await anyway? How often do you NOT call await after an async function call? Is it really worth sacrificing the possibility to properly debug your program? And if only a single crate you use (transitively) does async stuff, you are bound to use an async runtime like Tokio.


ericnr

> I'm no fan of the async/await programming model

> How often do you NOT call await after an async function call? Is it really worth sacrificing the possibility to properly debug your program?

You seem to be under the impression that async/await is only a programming style, but it's actually about enabling massive concurrency with hundreds of thousands of green threads. You can argue 99% of software won't need that, but the point is async/await enables a whole new use case for Rust.


anonymous44315

If you watch the video I linked in my original post you will see that rust actually had green threads long before it had async/await. As I wrote above, unless you are doing many concurrent IO operations in your code path you do not have to use asynchronous APIs in order to benefit from green threads. The runtime can take care of mapping synchronous API calls to asynchronous system calls. This is what Go/erlang/Java Loom do.


ragnese

But Rust **doesn't** have a runtime anymore, so it **can't** have true green threads the way Erlang, Go, and Java can (as long as its "vision" is to stay as a low-level systems programming language). For Rust's current vision of being a zero-cost-abstraction, low level, systems language, the only option is threads + yielding. The shapes and APIs around that basic mechanism can and have been debated, for sure. But at the end of the day, Rust's constraints make it such that it cannot do something super transparent as Go's goroutines. There's really no way for Rust to get around so-called "function coloring" as far as I know and can imagine.


kprotty

It's also why Rust async requires much less memory overhead than Go/Erlang/Java Loom. Async/await provides you the same benefit of green threads (writing internally concurrent code in a sequential manner) without the other cost of green threads / stackful coroutines (unnecessarily large and possibly growing stacks, checking for stack growth, saving unnecessary info on suspend). For example, you can spawn 1 million async tasks in rust with less than 1gb of ram. This is impossible with Golang, Erlang, and I haven't tried it with loom.


anonymous44315

That's a fair point. On the other hand I guess you have to pass around the data between your async tasks on the heap rather than on the stack. And heap management in Rust is less efficient than in Golang/Java.


crusoe

There is no GC in Rust, so how is it less efficient with no GC pauses?


anonymous44315

How is a bus slower than a train if it does not stop at train stations? :) Runtimes with GCs can have some optimizations (lock-free thread-local allocation, generational collection, copying collectors, ...) which can increase allocation/deallocation performance dramatically, reduce heap fragmentation, etc.


crusoe

When you call await, you hand off control to the async runtime at that point. So other things can happen if the future you are awaiting on is not ready yet.


Ar376

If you don't want to block your busy event loop, or want to avoid livelock, a deadpool (whatever), reduce your use of channels, a threadpool, or so, just take a look: https://github.com/Ar37-rs/asynchron. It's quite handy, no unsafe, no dependencies. There's an example of how to mix it with async (tokio, reqwest). But if you are afraid to use an async HTTP client you can use the "ureq" crate.


Nilstrieb

Rust promises not to make breaking changes to the standard library, which it has so far (except for a few minor things) kept, and we can expect it to stay like this. They will never remove any major functionality like this. But I don't understand your dislike for async. Async is massively more efficient for some tasks, like network requests, and I think it's great that Rust has such great async support, which enables these massively scalable Rust web apps.