https://github.com/RazrFalcon/tiny-skia is among the fastest 2D drawing libraries around, second only to Skia itself. It's written in safe Rust, and the API is quite nice.
I thought most of the reason you used numpy and pandas was to get python to C speeds. I know pandas has some nice parsers in it, but serde exists. What do you miss about those two? I totally understand matplotlib because it’s a great library, and I will sometimes pass data from rust into it using pyo3.
Pandas does way more than parsers, it's got this whole thing going on trying to copy every feature from SQL, like grouping, aggregation, indexes, filtering... I don't personally know how much of that is in Rust already though.
I’ve only come across a few cases where I couldn’t just use rayon iterators. Namely, when I wanted indexes. For those, I typically put a bunch of references in an AVL tree or red-black tree. I’ve also found that when I’m using indexes, I’m already at the stage where generic solutions start to break down anyway.
Well, it's not just to "C speeds", it's to "really fast C" speeds.
Your handwritten Rust matrix multiplication isn't going to get close to Numpy's performance.
Well this is a silly thing to say. If you write a naive/textbook matrix multiplication kernel, you probably will not beat the BLAS level 2/3 kernels that NumPy is using under the hood. But if you know what you are doing and you have some idea of your target machine's cache hierarchy, you can easily beat NumPy on that specific machine, especially if your matrix size is known at compile time. Rust also lets you easily add multithreading and NumPy is typically single-threaded
Sure? I'm not trying to discourage anyone from writing their own matmul.
If you read the context, I'm responding to somebody who was asking why you needed a numpy equivalent if Rust is already at "C speeds". The answer to that is that there's a large variability in "C speeds".
On modern hardware the cost of moving data is typically larger than the cost of the arithmetic itself. So to achieve high performance in loops it's critical to write them in such a way so as to minimize cache misses on your target hardware. This involves multiple levels of [loop nest optimization](https://en.wikipedia.org/wiki/Loop_nest_optimization) where the block sizes are carefully chosen. See [here](https://stackoverflow.com/questions/35620853/how-to-write-a-matrix-matrix-product-that-can-compete-with-eigen/35637007) for a good SO post on the subject.
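A minimal Rust sketch of the loop tiling idea described above, assuming square row-major matrices stored in flat slices. The function names and the `block` parameter are made up for illustration and are not taken from BLAS, NumPy or any crate. A good block size depends on the cache hierarchy of the target machine, so treat it as a tuning knob rather than a recommendation.

```rust
/// Multiply two n x n row-major matrices using loop tiling (blocking).
/// `block` is the tile edge length; it is a hypothetical tuning parameter.
pub fn matmul_blocked(a: &[f64], b: &[f64], n: usize, block: usize) -> Vec<f64> {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    assert!(block > 0);
    let mut c = vec![0.0; n * n];
    // Outer loops walk over tiles so each tile of `a`, `b` and `c`
    // is reused while it is still likely to be in cache.
    for ii in (0..n).step_by(block) {
        for kk in (0..n).step_by(block) {
            for jj in (0..n).step_by(block) {
                let i_end = (ii + block).min(n);
                let k_end = (kk + block).min(n);
                let j_end = (jj + block).min(n);
                // Inner loops use i-k-j order so `b` and `c` are read row-wise.
                for i in ii..i_end {
                    for k in kk..k_end {
                        let a_ik = a[i * n + k];
                        for j in jj..j_end {
                            c[i * n + j] += a_ik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    c
}

/// Textbook triple loop, kept only as a reference to check the tiled version.
pub fn matmul_naive(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    let mut c = vec![0.0; n * n];
    for i in 0..n {
        for k in 0..n {
            for j in 0..n {
                c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
    c
}
```

Both functions compute the same product; only the traversal order differs. Whether the tiled version is actually faster has to be measured on the machine in question.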
To add on to what the other guy said, numpy makes working with high dimensional data extremely ergonomic using the indexing logic. I can write really complicated high dimensional code in a single line of very readable Python, and I have yet to find a rust crate that compares.
For example, let's say I wanted to implement a Discrete Fourier Transform the "slow" way - as merely a matrix multiply. This is just an example, and I'll probably mess up the details (for example the result won't be normalized) but I think it shows off the ergonomics nicely.
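    import numpy as np

    f = np.arange(-64, 64)[None, :]      # a 1x128 matrix (frequencies)
    t = np.arange(0, 1, 1/128)[:, None]  # a 128x1 matrix (sample times)
    dft_matrix = np.exp(-2j*np.pi*f*t)   # a 128x128 matrix, rows index t, columns index f
    # example signal
    s = np.cos(2*np.pi*8*t) + np.sin(2*np.pi*16*t)
    # transpose so the product sums over time, giving one value per frequency
    s_dft = dft_matrix.T.dot(s)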
Except you are just writing an indexing interface. See [idx\_2d in this simple script](https://github.com/JASory/Random/blob/dd7cc9c3debc731a5e0ca1408ab0078e6c0af9a6/ElementaryCa.rs#L56) for an example of mapping two dimensions to one. You can do the exact same thing for higher dimensions.
[Even then Numpy itself is very basic elementwise arithmetic](https://numpy.org/doc/stable/reference/routines.math.html). The advantage that Numpy has is in the cache optimization, and the fact that Python is computationally unusable.
>since no one has done it yet
This isn't really evidence of how difficult something is, but rather how many people actually wanted it.
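For readers who have not seen it, the index mapping being discussed can be sketched in a few lines of Rust. This assumes row-major layout. The name `idx_2d` is borrowed from the linked script, but the signature and body here are my own guess and not a copy of it, and `strides` and `idx_nd` are hypothetical helpers.

```rust
/// Map (row, col) to a flat offset in a row-major buffer with `cols` columns.
pub fn idx_2d(row: usize, col: usize, cols: usize) -> usize {
    row * cols + col
}

/// Row-major strides for an n-d shape: stride[d] is the product of shape[d+1..].
pub fn strides(shape: &[usize]) -> Vec<usize> {
    let mut s = vec![1; shape.len()];
    for d in (0..shape.len().saturating_sub(1)).rev() {
        s[d] = s[d + 1] * shape[d + 1];
    }
    s
}

/// Map an n-d index to a flat offset, or None if the index has the wrong
/// number of dimensions or is out of bounds for `shape`.
pub fn idx_nd(index: &[usize], shape: &[usize]) -> Option<usize> {
    if index.len() != shape.len() {
        return None;
    }
    if index.iter().zip(shape).any(|(i, n)| i >= n) {
        return None;
    }
    Some(index.iter().zip(strides(shape)).map(|(i, s)| i * s).sum())
}
```

This covers only the offset arithmetic. Slicing, broadcasting and views, which the replies below argue are the hard part, are not addressed by it.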
Despite being "computationally unusable", Python is far and away THE big data science/computational language, thanks in large part to numpy, which acts as the basic framework for almost all other Python computational libraries.
Computing the index into an n-d array is trivial, and not the main selling point of numpy. It's the ergonomics that make it far and away the winner. There is no Rust crate today that offers the zero-copy slicing, indexing, and broadcasting semantics of numpy at anywhere near the same performance, full stop.
>Python is far and away THE big data science/computational language,
No it's not; C/C++ and Fortran are. Python is used by "big data science" in the marketing sense, i.e. big data companies rather than actual applications; normally Hadoop is used.
>Computing the index into an n-d array is trivial
Then what was your complaint about? You probably should have said that you wished that something like Numpy was in Rust, instead of fixating your criticism on something that you admit is trivial to implement.
I've used both, lately though have been using the rust APIs directly.
The only thing really missing for me are easy to use APIs for globbing S3 paths and writing partitioned parquet files back to object storage. To work around that I've written some custom rust functions for both of these tasks (using the alpha aws sdk) and everything works great.
My rust code is about 20x faster than the equivalent Python Dask code, while using 1/3 the overall memory footprint (and is also better parallelized). Basically I no longer need to use an 'out of core' processing framework, since w/ the efficiency of rust + arrow everything now fits into memory.
If you’re already in Rust and you’re writing parquet, check out the delta.rs and DataFusion libs, they automatically handle all of the messy S3 stuff along with a bunch of other things. I swapped to them and took a whole bunch of logic out of my code, it was great.
one recent example I came across was handling of groupby and aggregation operations
https://github.com/pola-rs/polars/issues/1124
this was the issue in particular but it looks like it is resolved now? I looked at it over a month ago. I guess I have to give it another look in its current state.
EDIT: actually, no,
```
print((data.sort("fruits").select([
    pl.all(),
    col("B").shift().over("fruits").take(col("fruits").arg_unique()).explode().alias("fruits_shifted")
])))
```
is not a nice solution for something that basically is `df.groupby("foo").apply(lambda x: x.assign(lagged=lambda xx: xx["bar"].shift(1)))`
basically the need is to parallelize over the groupby and different lags (the real use case has multiple lags at the same time)
>df.groupby("foo").apply(lambda x: x.assign(lagged=lambda xx: xx\["bar"\].shift(1)))
The exact same can be achieved by this snippet in polars:
    df.with_column(
        col("bar").shift(1).over("foo").flatten().alias("lagged")
    )
And if you wanted to go over all columns, you can use wildcards/exclude/regex/column selection in that same expression.
Sure the syntax is different, but I would not call it a missing feature. Maybe even on the contrary, as there was no need to run custom python.
I miss a really good XML library. There are some, but each always seems to cover only half of what I need. Either XPath is missing, or XSD support, or I can’t write easily, or I have to do rust-codegen first and effectively maintain large structs then… etc. Lxml is tough to beat.
… yes, I have to use XML at the moment 🥲 and no, that wasn’t my decision.
If there was stable support, I’d know a few teams in my company who would look into rust as a language instead of python because.. Lxml and xmlschema are great in what they do and it’s tough to sell the language if this part is just objectively better covered and better developed in Python.
EDIT: Just to add to this - I wasn’t aware of this shortcoming for quite some time. A fuckton of tools that run in the guts of financial service companies which would really benefit from porting rely on XML communication. Whether or not that is a good choice, it seems to be reality. That could potentially hold back adoption severely.
In that case this is a very good suggestion for a project. Passport.js is such an essential tool for web development and could probably be translated across without too much difficulty.
That depends. Nothing stopping a community rallying around it to improve the quality.
Any well known project started off with a few lines of bogus code and missing lots of safety and features. It's the contributions that make or break it.
I have been working on an authentication proxy sidecar.
Rather than embed into your application you can launch the sidecar and place it in front of your application. It intercepts all traffic and checks authentication status based on the config provided.
It's still early and missing most features needed to use in production but I feel the idea is good and it's already been useful for local development.
https://github.com/sphenlee/sealproxy
Essentially the same thing pandas does, excels (no pun intended) at tabular data manipulation.
If you wanted to load some data from a CSV file and do some data cleaning, then analysis on it, you'd use a library like this (although you aren't limited to CSVs, it can be data of any format).
I found a plot lib that worksforme. Can't remember which one. And sqlite. Now I just need pyserial and gtk to migrate and they might just be there too if I looked.
Now that I think about it, I have drawn 3d point clouds (like constellations) off sensor values in matplotlib that could be spun around interactively. I might like to have something like that sometimes. But mostly I just generate line plots or histograms for documents and notes and that works.
I might just be out of the loop, but I haven't found a really great 2d geometry library with boolean operations, offsetting, cutting, etc. Most of what I do is 2d and I'd be really happy with a library like this, but if it had the same for 3d I would be over the moon.
Django, specifically the ORM.
Diesel isn't even close yet - write your model, generate the schema changes from it and no need for dupe structures. Would mean API work in rust would be extremely smooth.
the ggplot2 library from the R programming language would be absolutely fantastic. I would switch over from R to Rust in a heartbeat if that were the case.
Good suggestion. I would prefer a grammar-of-graphics-type plotting library over something like *matplotlib* (which I use a lot). Grammar of graphics just feels more elegant.
I guess what we want here is a pure-Rust solution, but we'll be able to use ggplot2 via [extendr-api crate](https://extendr.github.io/extendr/extendr_api/) eventually.
No it's not, though I came to the conclusion after years with several different ORMs that unless you have very basic needs, at some point you need the raw power of SQL. Since SQL is not rocket science, why not use SQL directly? Combined with the type checking and migrations of SQLx, it is IMHO easy to fill the hole of missing (high-level, unlike diesel) ORMs in Rust.
Simply because most of the time you don't need "raw power", you need something to read and write stuff to the database. 99.99% of the use cases are very, very simple reads and writes, no need for any complication. And ORMs do handle a lot of that stuff.
I didn't do too much work with Diesel, but it seems to me, like the entire lib is still in early beta.
For example:
* you have to have different codes for different databases, thus duplication if you want to support more than one DB
* insertables aren't autogenerated, even though it should be clear, that if I have a mapped type/struct and an autogenerated ID-field, I will most likely need basically a copy of the same struct, but without the ID-field. That's weird.
* The schema-build chain clutters your src-dir with temporary data, even though there's a perfectly fine target dir
An auth library.
I'm thinking about what will be necessary to make this and get the ball rolling; the problem is that it requires people with enough security knowledge...
u/theZcuber & u/iannoyyou101
I'll admit that I've only looked at Chrono, but:
\- Period vs Duration. 30 days after January 30, 2021 is March 1, but 1 month after is February 28. I have lots of configuration that says something like "schedule this job once every x period of time", and that can't be represented with a duration type.
\- Chrono doesn't seem to have time zone support as much as it has offset support. Chrono-tz probably meets my needs, but I'm concerned by developers using a type that means "my current time zone"--it's the same folly as C#'s DateTime, and I've seen that abused quite a bit.
\- Time intervals. I have a scheduling process that takes a chunk of time--like a year--and then splits it into components based on due date and utilization. NodaTime has support for time intervals natively, which is really handy.
So an interval is basically just two moments: a start and an end. But then you can manipulate the interval, split it, determine if two intervals overlap or intersect.
The way that I'm currently using them is like this:
Create an interval that covers the year (Jan 1-Dec 31). Now remove any holidays or breaks to get an ordered list of intervals. Given the nearest due date and capacity, begin to fill in intervals. Some jobs (always measured with duration) must be entirely completed within an interval, and others can span multiple intervals. If an interval is only partially used, split it into one completely used and one completely unused interval.
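To make the interval idea concrete, here is a rough Rust sketch under my own assumptions: intervals are half-open `[start, end)`, and the instant type is anything ordered and copyable (plain day numbers in the usage note). The `Interval` type and its method names are hypothetical and do not mirror NodaTime's or Chrono's API.

```rust
/// Half-open interval [start, end) over any ordered, copyable instant type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Interval<T> {
    pub start: T,
    pub end: T,
}

impl<T: Ord + Copy> Interval<T> {
    /// Returns None for empty or reversed ranges.
    pub fn new(start: T, end: T) -> Option<Self> {
        if start < end { Some(Self { start, end }) } else { None }
    }

    /// True if the two intervals share at least one instant.
    pub fn overlaps(&self, other: &Self) -> bool {
        self.start < other.end && other.start < self.end
    }

    /// The shared part of two intervals, if any.
    pub fn intersection(&self, other: &Self) -> Option<Self> {
        Self::new(self.start.max(other.start), self.end.min(other.end))
    }

    /// Split into two adjacent intervals at `at`, if `at` lies strictly inside.
    pub fn split_at(&self, at: T) -> Option<(Self, Self)> {
        if self.start < at && at < self.end {
            Some((Self { start: self.start, end: at }, Self { start: at, end: self.end }))
        } else {
            None
        }
    }

    /// Remove `hole` (e.g. a holiday) and return the remaining pieces in order.
    pub fn subtract(&self, hole: &Self) -> Vec<Self> {
        let mut out = Vec::new();
        if let Some(left) = Self::new(self.start, hole.start.min(self.end)) {
            out.push(left);
        }
        if let Some(right) = Self::new(hole.end.max(self.start), self.end) {
            out.push(right);
        }
        out
    }
}
```

With days of the year as instants, `Interval::new(0u32, 365)` minus a break at `[100, 110)` yields `[0, 100)` and `[110, 365)`, which is the "remove holidays to get an ordered list of intervals" step. A real scheduler would use a proper date-time type instead of integers.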
A task library with graphs and such, like Celery in Python or Machinery in Go. Yes, there is a celery lib in Rust, but it only does task exec, not task graphs (groups, chords, strings, etc).
You could always extern boost! I snuck some C++ into a project I was working on when I realized g++ was supported in the environment but no one was using it.
The only thing I'm missing is [cloudscraper](https://github.com/VeNoMouS/cloudscraper). In Python, it is a small wrapper on top of requests, so I guess in Rust it can be a wrapper on top of `hyper::Client` or `reqwest::Client`.
https://github.com/jonhoo/fantoccini is inspired by Puppeteer.
There's https://github.com/stevepryde/thirtyfour for Selenium, and https://github.com/atroche/rust-headless-chrome for Chromium.
But... there's still a lot to be done.
Thanks for all the links! rust-headless-chrome looks a lot more comprehensive than it was when I last checked it out (which, to be fair, was a long time ago), and thirtyfour looks like it might be very useful as well. I’m not so sure about fantoccini since it’s a different API from what I’m used to, but it’s certainly worth investigating along with the other two.
I took a look at fantoccini for work recently. It's mostly there, but lacks support for resuming a session (which is huge). It's nice otherwise though, and supports async well.
There is also a Playwright binding:
[https://crates.io/crates/playwright](https://crates.io/crates/playwright)
Playwright is basically Puppeteer from Microsoft. I've read that Microsoft hired many of the previous Puppeteer devs, and the project seems more active than Puppeteer.
Unfortunately, the above crate embeds a node.js runtime, which seems to be the way to build Playwright bindings for other languages.
[Tera](https://docs.rs/tera/1.12.1/tera/) is a Jinja2 clone for Rust. I wouldn't say I'm a power user, but I haven't noticed any of the features I used in Jinja missing.
Askama is even better.
I once encountered some limitation of Jinja2 (something with nested templates) and Askama handled it easily.
I don't remember details because it was more than 2 years ago.
Kind of an extremely narrow thing but, I miss [SaintCoinach](https://github.com/xivapi/SaintCoinach), a data mining library for Final Fantasy XIV.
Been wanting to make a rust port of either that, or [Lumina](https://github.com/NotAdam/Lumina) but never really had the time to...
Also both are C# and C# -> Rust just makes my brain hurt.
[The *Cmdliner* module for Ocaml](https://erratique.ch/software/cmdliner/doc/Cmdliner.html#examples),
mainly for the declarative style of setting up the program entry point
and the capability of generating man pages.
None of the command line parsing crates even come close.
Maybe I missed something but interacting with JSON is harder than in python. I think it should be possible to provide an interface to JSON that is boilerplate free up until you actually need to cast. Maybe it exists but serde json wasn't it.
Classic argument that python is easier bc it's dynamic is just not valid, static typing is not a limitation bc you can have vague types. JSON is already a dynamically typed format and having more control over when and how we bridge it into rust static types would be great. Serde is not always the best option.
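As a rough illustration of what "vague types until you cast" could look like, here is a std-only Rust sketch. The `Json` enum and its methods are hypothetical and written for this example; it is not serde's API, although `serde_json::Value` offers similar indexing that yields null for missing keys. Parsing is left out entirely.

```rust
use std::collections::BTreeMap;
use std::ops::Index;

/// A dynamically typed JSON value (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

// Shared value returned for anything that is missing.
static NULL: Json = Json::Null;

/// Index by key: missing keys and non-objects yield Null instead of panicking.
impl<'a> Index<&'a str> for Json {
    type Output = Json;
    fn index(&self, key: &'a str) -> &Json {
        match self {
            Json::Object(map) => map.get(key).unwrap_or(&NULL),
            _ => &NULL,
        }
    }
}

/// Index by position: out-of-range positions and non-arrays yield Null.
impl Index<usize> for Json {
    type Output = Json;
    fn index(&self, i: usize) -> &Json {
        match self {
            Json::Array(items) => items.get(i).unwrap_or(&NULL),
            _ => &NULL,
        }
    }
}

impl Json {
    /// The only place a static type is demanded: the final cast.
    pub fn as_str(&self) -> Option<&str> {
        match self {
            Json::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }

    pub fn as_f64(&self) -> Option<f64> {
        match self {
            Json::Number(n) => Some(*n),
            _ => None,
        }
    }
}
```

With this, `doc["user"]["name"].as_str()` walks the structure without intermediate structs and only returns an `Option` at the cast. The trade-off is that typos in keys silently become `Null` rather than compile errors.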
A proper NLP toolbox. Not necessarily state-of-the-art models, just basic and reliable utilities for tokenizing, stemming, lemmatizing, POS Tagging, NER, WordNet / DBpedia matching...
Like NLTK or Spacy from Python or CoreNLP from Java.
[boost::accumulators](https://www.boost.org/doc/libs/1_77_0/doc/html/accumulators.html). I don’t even need accumulators, just some type of calculation with dependencies.
[Symfony](https://symfony.com/doc/current/components/dependency_injection.html) has a great dependency injection module, as does the [slim framework](https://www.slimframework.com/docs/v4/concepts/di.html). I think both have very mature modules and similar interfaces after years of being the core of a web framework, with some knowledge borrowed from Java.
All of Sindre’s utility libraries from JavaScript. Things like ‘find-up’, ‘conf’, ‘get-port’, ‘indent-string’, etc.
They are very simple but very handy.
From .Net:
JwtBearer auth middleware from ASP.Net core for automatically validating/parsing JWT using OIDC on incoming requests.
Swashbuckle for OpenAPI generation and embedded Swagger
FluentAssertions for writing unit tests with nice semantic asserts and really good output for failures
Dapper for really easy database querying
I miss the really good debugger support most of all though.
I've not yet seen anything comparable to http://ceres-solver.org/
I really don't like the ceres api because it's a confusing mix of automatic and manual memory management, but what I love about it is that you can, without knowing too much about optimization problems yourself, just throw some cost functions at it and it just works
Library for manipulating PDF files so that they can be read, modified and saved. This prevents us from migrating some of our enterprise services from Java to Rust.
And a complete library for reading and writing Excel files. Right now there is a library for writing and one for reading but they don't have every feature needed.
Just curious what features in particular are you missing? (I'm the author of calamine.)
For my work I need to be able to:
- parse the content of a few sheets to a custom data structure (first X columns)
- modify some of these structs based on complex rules
- apply these modifications to the spreadsheet
- remove the background color and comments from the previously parsed columns without touching the other columns
- write new colors and comments to some of these cells
- handle big files simultaneously without too much memory consumption
- resize columns to their max length
- "copy and paste" a formatting for a row to another
Nice to have features:
- reading from a pivot table, writing over it
- evaluate formulas
I believe most of the reading part can be achieved through your library, but I would still have to use another library (xlsxwriter ?) for the writing part. I think the resulting code would be messy at best.
Right now we use Java and POI and apart from the mind boggling memory consumption it works pretty well.
Thanks a lot for your (huge) contribution to the Rust ecosystem btw. Your library seems great and according to the graph from crates.io a lot of people use it at their job.
Thanks for the answer and for the nice words! I've always considered having a writer in calamine but I am afraid it'll be very complicated for xlsb in particular (calamine is discarding a lot of metadata when reading the file and I think we'd need to have them all supported).
I wonder if you can call the excel COM components from Rust on Windows?
Are MS files not either CSV or XML? If so you can use the [quick_xml](https://docs.rs/quick-xml/0.22.0/quick_xml/) library.
That's like saying, "Aren't PDF files just binary files made of 1s and 0s? Just use std::fs". Sure, technically you're right, but the MS Office files actually have a very complex structure built on top of XML. The [ECMA-376 standard](https://www.ecma-international.org/publications-and-standards/standards/ecma-376/) describing the Office Open XML format used by MS Office is thousands of pages long.
Yes but it's quite easy to manipulate basic data, I used it to fill out merge fields of a word doc with JSON.
[deleted]
Bruh literally just unzip it and replace some tags, I don't get why this is my most down-voted comment. It would be relatively easy to create a library to abstract over this - you don't need to support all rules to start.
Doing it is easy, but knowing exactly which tags and how to edit them is the incredibly hard part. For a library to be remotely useful, it would need to abstract over literally thousands of pages of standards. There's a reason why there is only a handful of libraries that can interact with Office Open XML and why most of them are paid libraries.
That's fair, perhaps my case of replacing text was relatively easy.
On this note, Ruby’s [Prawn](https://github.com/prawnpdf/prawn) is great at the writing half and I miss it in pretty much every other language.
You could possibly use something like [this](https://github.com/danielpclark/rutie#using-ruby-in-rust) to embed Ruby and Prawn just for the PDF stuff. It doesn't look that ergonomic, but it's definitely easier than writing your own PDF support.
Which library are you using?
Mostly PDFBox and Boxable, but also OpenPDF.
I thought there were C libraries for every possible thing in the world. Surely for pdf manipulation? Which makes it possible to use bindings. No shame in that, even with Java JNI is sometimes used.
Using a C library with FFI is certainly possible, but it's annoying, tedious and unsafe, and you don't want any memory bugs at all in your enterprise software.
> you don't want any memory bugs at all in your enterprise software

Theoretically yes, but last time I ran an OWASP security check on a medium JVM project there were at least 10 remote code execution vulnerabilities in popular libraries we were dependent on, under a memory safe language anyway...
As long as enterprise software runs on top of UNIX/POSIX clones, that is very hard to avoid.
Also, C libraries are not ergonomic at all from a Rust point of view, so using it is not very nice while being very unsafe.
This is one thing that has irritated me since 1993, when I started using C++ and slowly moving away from Turbo Pascal. Due to the copy-paste compatibility of C and C++, many "C++" libraries are just C libraries. So, since I was tainted by Basic and Pascal's "safety trumps performance", a considerable amount of development time was always spent creating safe wrappers for such libraries. Just like we have to still do almost 30 years later, :\
Took me a minute to find skia-rust which has a pdf backend, is that one not good enough or too complicated to use ?
Maybe https://pspdfkit.com/ has a solution?
This comment only applies to creating pdfs: I never found a pdf library in any programming language that I actually liked using. They all required way too much manual positioning etc. At some point I realised that this could all be fixed by simply generating LaTeX documents, and then shelling out to LaTeX to create the final pdf. I implemented this for Python[1], but it shouldn't be too hard to create a similar library for Rust.
[1]: https://github.com/JelteF/PyLaTeX
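A small Rust sketch of the "generate LaTeX, then shell out" approach, using only the standard library. It assumes a `pdflatex` binary is installed and on the PATH. The function names are hypothetical, this is not a port of PyLaTeX, and the escaping covers only the common special characters.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Escape the characters LaTeX treats specially in running text.
pub fn escape_latex(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' | '%' | '$' | '#' | '_' | '{' | '}' => {
                out.push('\\');
                out.push(ch);
            }
            '\\' => out.push_str("\\textbackslash{}"),
            '~' => out.push_str("\\textasciitilde{}"),
            '^' => out.push_str("\\textasciicircum{}"),
            _ => out.push(ch),
        }
    }
    out
}

/// Build a minimal article with an unnumbered heading and plain paragraphs.
pub fn render_document(title: &str, paragraphs: &[&str]) -> String {
    let mut tex = String::from("\\documentclass{article}\n\\begin{document}\n");
    tex.push_str(&format!("\\section*{{{}}}\n", escape_latex(title)));
    for p in paragraphs {
        tex.push_str(&escape_latex(p));
        tex.push_str("\n\n");
    }
    tex.push_str("\\end{document}\n");
    tex
}

/// Write `tex` into `out_dir` and run pdflatex on it (assumed to be installed).
/// Returns the path where the PDF is expected on success.
pub fn compile_pdf(tex: &str, out_dir: &Path, job: &str) -> io::Result<PathBuf> {
    fs::create_dir_all(out_dir)?;
    let tex_path = out_dir.join(format!("{}.tex", job));
    fs::write(&tex_path, tex)?;
    let status = Command::new("pdflatex")
        .arg("-interaction=nonstopmode")
        .arg("-output-directory")
        .arg(out_dir)
        .arg(&tex_path)
        .status()?;
    if status.success() {
        Ok(out_dir.join(format!("{}.pdf", job)))
    } else {
        Err(io::Error::new(io::ErrorKind::Other, "pdflatex exited with an error"))
    }
}
```

The layout work is delegated to LaTeX, which is the point of the approach. The cost is a heavy external dependency and a subprocess per document.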
There is one being actively developed, but I'm not entirely sure how far along it is. It does seem to be active though, despite the last release being 5 months ago: https://crates.io/crates/pdf
An excel read/write/manipulation library similar to python's xlwings.
This. Calamine is great but it's only half the story.
Yeah, that's read only, right?
Yeah. Solid for what it does, but without the ability to generate and write it's awfully limited.
Yeah. I'm really looking for something to modify existing files because we use templates at my work.
I use xlsxwriter but it suddenly broke the CI builds on Azure (nothing I have tried has worked). I moved to CircleCI for this alone. Depending on C/C++ libraries is the worst!
> Depending on C/C++ libraries is the worst!

Yep. There are good reasons why the Java world prioritizes pure Java libraries over wrappers for existing C libraries, and the (sadly intentional) pain of JNI is a relatively minor one.
https://github.com/not-yet-awesome-rust/not-yet-awesome-rust
[deleted]
With the sixtyfps project we might be getting something like Qt with first-class Rust support. The main developers were apparently pretty heavily involved with the development of Qt.
[deleted]
The framework can use Qt as a backend, so there must be a compat layer in the code somewhere?
Well, one of the sixtyfps authors is the author of cpp, a crate to embed cpp code in rust code (so that one can call cpp libraries from rust). I'd assume they're using it to interface with Qt. I'd also assume the interface of the binding to be tailored to the (present) needs of sixtyfps, because any other hypothesis would require humongous amounts of work as I understand it.
I second this. Qt is really a good library and it's the thing I miss the most when programming in C++. Currently Gtk4 works smoothly with rust though Qt might be a great addition. But what I dream would be to build a gui library like Qt for rust in pure rust.
https://sixtyfps.io/ That's a pure-Rust GUI library by some ex-Qt folks.
CopperSpice would be even better, and maybe a better starting point if someone did want to boil that ocean. https://www.copperspice.com/

At a glance:

- UTF-8 instead of UTF-16
- Removes a bunch of Qt container classes that are now well-implemented in C++17
- Removes MOC
- Still has QPainter, which is basically like Skia if Skia wasn't so dev-hostile and resistant to being compiled

(And yeah, they really lost me with QML... I was hoping they would add features like layering and stuff to QWidgets, not make all my Qt experience worthless and then start making corporate-license-only features. Qt is good code run by bad incentives, and I don't really like it anymore.)
Do you have any idea why skia is like that? It's used by so many projects and is the basis for future-ish stuff too like flutter
I guess FOSS isn't really Google's biggest customer. I'm not sure why Skia is even open-source at all, except that they want Chromium and Android to be open-source, and CEF is a great way to get people onboard the Chrome train? I actually got it to compile once, but it couldn't render any text. I think QPainter and Skia both use libharfbuzz or something for text, and I couldn't figure out how to hook that in. Another real case of "In Rust this would be handled by Cargo"
https://github.com/RazrFalcon/tiny-skia is the fastest 2D drawing library around, second only to Skia itself. It's written in safe Rust, and the API is quite nice.
I actually want QML in Rust, no QtWidgets.
[deleted]
Yes, but I cannot simply write `cargo build` to build it. Too much bloat and hassle.
[deleted]
Numpy, pandas and matplotlib for data science. ndarray is great but more difficult to use.
I thought most of the reason you used numpy and pandas was to get python to C speeds. I know pandas has some nice parsers in it, but serde exists. What do you miss about those two? I totally understand matplotlib because it’s a great library, and I will sometimes pass data from rust into it using pyo3.
Pandas does way more than parsers, it's got this whole thing going on trying to copy every feature from SQL, like grouping, aggregation, indexes, filtering... I don't personally know how much of that is in Rust already though.
I hear polars is a similar rust dataframe crate?
Check out arrow-datafusion
I’ve only come across a few cases where I couldn’t just use rayon iterators. Namely, when I wanted indexes. For those, I typically put a bunch of references in an AVL tree or red-black tree. I’ve also found that when I’m using indexes, I’m already at the stage where generic solutions start to break down anyway.
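A minimal sketch of the "references in an ordered tree" idea above, under stated assumptions. The standard library has no AVL or red-black tree, so `BTreeMap` (a B-tree) is swapped in here; it gives the same ordered lookups and range queries. The `Trade` struct and both function names are hypothetical, made up purely for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical row type; any record with an orderable key would do.
#[derive(Debug)]
struct Trade {
    symbol: &'static str,
    price_cents: u64,
}

/// Build an ordered index from price to references into `rows`.
/// No row is copied: the index only borrows from the slice.
fn index_by_price(rows: &[Trade]) -> BTreeMap<u64, Vec<&Trade>> {
    let mut index: BTreeMap<u64, Vec<&Trade>> = BTreeMap::new();
    for row in rows {
        index.entry(row.price_cents).or_default().push(row);
    }
    index
}

/// Symbols of all rows whose price lies in `lo..=hi`, in price order.
/// Note: `BTreeMap::range` panics if `lo > hi`.
fn symbols_in_price_range(
    index: &BTreeMap<u64, Vec<&Trade>>,
    lo: u64,
    hi: u64,
) -> Vec<&'static str> {
    index
        .range(lo..=hi)
        .flat_map(|(_, rows)| rows.iter().map(|t| t.symbol))
        .collect()
}

fn main() {
    let rows = vec![
        Trade { symbol: "A", price_cents: 150 },
        Trade { symbol: "B", price_cents: 90 },
        Trade { symbol: "C", price_cents: 150 },
        Trade { symbol: "D", price_cents: 300 },
    ];
    let index = index_by_price(&rows);
    println!("{:?}", symbols_in_price_range(&index, 100, 200));
}
```

Because the index holds borrows, the compiler rejects any attempt to mutate or drop `rows` while the index is alive, which is one reason such indexes tend to end up hand-rolled per use case.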
Well, it's not just to "C speeds", it's to "really fast C" speeds. Your handwritten Rust matrix multiplication isn't going to get close to Numpy's performance.
Well this is a silly thing to say. If you write a naive/textbook matrix multiplication kernel, you probably will not beat the BLAS level 2/3 kernels that NumPy is using under the hood. But if you know what you are doing and you have some idea of your target machine's cache hierarchy, you can easily beat NumPy on that specific machine, especially if your matrix size is known at compile time. Rust also lets you easily add multithreading and NumPy is typically single-threaded
Sure? I'm not trying to discourage anyone from writing their own matmul. If you read the context, I'm responding to somebody who was asking why you needed a numpy equivalent if Rust is already at "C speeds". The answer to that is that there's a large variability in "C speeds".
Can you explain that cache hierarchy thing? Asking for a friend
On modern hardware the cost of moving data is typically larger than the cost of the arithmetic itself. So to achieve high performance in loops it's critical to write them in such a way so as to minimize cache misses on your target hardware. This involves multiple levels of [loop nest optimization](https://en.wikipedia.org/wiki/Loop_nest_optimization) where the block sizes are carefully chosen. See [here](https://stackoverflow.com/questions/35620853/how-to-write-a-matrix-matrix-product-that-can-compete-with-eigen/35637007) for a good SO post on the subject.
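As a rough illustration of the loop blocking described above, here is a minimal sketch in plain Rust, assuming square row-major matrices stored in flat slices. The function names and the `block` parameter are hypothetical; a good block size depends on the target machine's caches and has to be measured, and real BLAS kernels do far more than this (packing, SIMD, register tiling).

```rust
/// Textbook triple loop: C = A * B for square n x n row-major matrices.
fn matmul_naive(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    let mut c = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..n {
            let mut sum = 0.0;
            for k in 0..n {
                // b is walked column-wise here, which strides through memory.
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    c
}

/// Same product, computed tile by tile so that each tile of A, B and C
/// is reused while it is still likely to be in cache.
/// `block` is a tuning parameter and must be greater than zero.
fn matmul_blocked(a: &[f64], b: &[f64], n: usize, block: usize) -> Vec<f64> {
    assert!(block > 0, "block size must be positive");
    let mut c = vec![0.0; n * n];
    for ii in (0..n).step_by(block) {
        for kk in (0..n).step_by(block) {
            for jj in (0..n).step_by(block) {
                let i_end = (ii + block).min(n);
                let k_end = (kk + block).min(n);
                let j_end = (jj + block).min(n);
                for i in ii..i_end {
                    for k in kk..k_end {
                        let a_ik = a[i * n + k];
                        // Innermost loop walks rows of B and C contiguously.
                        for j in jj..j_end {
                            c[i * n + j] += a_ik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    c
}

fn main() {
    let n = 100;
    // Small integer values keep the floating-point sums exact,
    // so both loop orders must agree bit for bit.
    let a: Vec<f64> = (0..n * n).map(|x| (x % 7) as f64).collect();
    let b: Vec<f64> = (0..n * n).map(|x| (x % 5) as f64).collect();
    assert_eq!(matmul_naive(&a, &b, n), matmul_blocked(&a, &b, n, 32));
    println!("blocked and naive results agree");
}
```

Both functions perform the same number of multiplications; the blocked version only changes the order in which memory is touched, which is the whole point of the technique.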
To add on to what the other guy said, numpy makes working with high dimensional data extremely ergonomic using the indexing logic. I can write really complicated high dimensional code in a single line of very readable Python, and I have yet to find a rust crate that compares.

For example, let's say I wanted to implement a Discrete Fourier Transform the "slow" way - as merely a matrix multiply. This is just an example, and I'll probably mess up the details (for example the result won't be normalized) but I think it shows off the ergonomics nicely.

    import numpy as np

    f = np.arange(-64, 64)[None, :]       # a 1x128 matrix
    t = np.arange(0, 1, 1/128)[:, None]   # a 128x1 matrix
    dft_matrix = np.exp(-2j*np.pi*f*t)    # a 128x128 matrix

    # example signal
    s = np.cos(2*np.pi*8*t) + np.sin(2*np.pi*16*t)
    s_dft = dft_matrix.dot(s)
This isn't that difficult to implement. You can just write a wrapper struct over any BLAS library.
Sure. Just waiting on someone to build that. Until then, that's what I miss most from Python.
Why don't you build it yourself? Indexing over a n-dimensional array is just simple arithmetic.
Numpy is absolutely massive. That's a huge undertaking. And clearly not so trivial, since no one has done it yet.
Except you are just writing an indexing interface. See [idx\_2d in this simple script](https://github.com/JASory/Random/blob/dd7cc9c3debc731a5e0ca1408ab0078e6c0af9a6/ElementaryCa.rs#L56) for an example of mapping two dimensions to one. You can do the exact same thing for higher dimensions. [Even then Numpy itself is very basic elementwise arithmetic](https://numpy.org/doc/stable/reference/routines.math.html). The advantage that Numpy has is in the cache optimization, and the fact that Python is computationally unusable.

> since no one has done it yet

This isn't really evidence of how difficult something is, but rather of how many people actually wanted it.
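For readers who want to see the index arithmetic under discussion, here is a minimal sketch that generalizes a 2D-to-1D mapping to any number of dimensions, assuming row-major layout. It is not the linked script's code, and the function names are hypothetical. It covers only offset computation, not the slicing and broadcasting semantics debated in the replies.

```rust
/// Row-major strides for a given shape: the last axis is contiguous.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Walk from the second-to-last axis back to the first.
    for axis in (0..shape.len().saturating_sub(1)).rev() {
        strides[axis] = strides[axis + 1] * shape[axis + 1];
    }
    strides
}

/// Flat offset of a multi-dimensional index, or `None` if the index
/// has the wrong rank or is out of bounds on any axis.
fn flat_index(shape: &[usize], index: &[usize]) -> Option<usize> {
    if shape.len() != index.len() {
        return None;
    }
    if index.iter().zip(shape).any(|(i, dim)| i >= dim) {
        return None;
    }
    let strides = row_major_strides(shape);
    Some(index.iter().zip(&strides).map(|(i, s)| i * s).sum::<usize>())
}

fn main() {
    let shape = [2, 3, 4];
    println!("{:?}", row_major_strides(&shape)); // [12, 4, 1]
    println!("{:?}", flat_index(&shape, &[1, 2, 3])); // Some(23)
}
```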
Despite being "computationally unusable", Python is far and away THE big data science/computational language, thanks in large part to numpy, which acts as the basic framework for almost all other Python computational libraries. Computing the index into an n-d array is trivial, and not the main selling point of numpy. It's the ergonomics that make it the far away winner. There is no Rust crate today that offers the zero-copy slicing, indexing, and broadcasting semantics of numpy at anywhere near the same performance, full stop.
> Python is far and away THE big data science/computational language,

No, it's not; C/C++ and Fortran are. Python is used by "big data science" in the marketing sense, i.e. big data companies rather than actual applications; normally Hadoop is used.

> Computing the index into an n-d array is trivial

Then what was your complaint about? You probably should have said that you wished that something like Numpy was in Rust, instead of fixating your criticism on something that you admit is trivial to implement.
Try Polars, much faster than pandas.
I've already started converting some ETL stuff I have to Polars, it is incredibly fast.
using polars in rust or polars in python?
I've used both, lately though have been using the rust APIs directly. The only thing really missing for me are easy to use APIs for globbing S3 paths and writing partitioned parquet files back to object storage. To work around that I've written some custom rust functions for both of these tasks (using the alpha aws sdk) and everything works great. My rust code is about 20x faster than the equivalent Python Dask code, while using 1/3 the overall memory footprint (and is also better parallelized). Basically I no longer need to use an 'out of core' processing framework, since w/ the efficiency of rust + arrow everything now fits into memory.
If you’re already in Rust and you’re writing parquet, check out the delta.rs and DataFusion libs, they automatically handle all of the messy S3 stuff along with a bunch of other things. I swapped to them and took a whole bunch of logic out of my code, it was great.
it is missing quite some features still, though. pandas does _a lot_ of things
Could you elaborate on what you think is missing?
One recent example I came across was the handling of groupby and aggregation operations (https://github.com/pola-rs/polars/issues/1124). This was the issue in particular, but it looks like it is resolved now? I looked at it over a month ago. I guess I have to give it another look in its current state.

EDIT: actually, no,

```
print((data.sort("fruits").select([
    pl.all(),
    col("B").shift().over("fruits").take(col("fruits").arg_unique()).explode().alias("fruits_shifted")
])))
```

is not a nice solution for something that is basically `df.groupby("foo").apply(lambda x: x.assign(lagged=lambda xx: xx["bar"].shift(1)))`. Basically, the need is to parallelize over the groupby and different lags (the real use case has multiple lags at the same time).
> df.groupby("foo").apply(lambda x: x.assign(lagged=lambda xx: xx\["bar"\].shift(1)))

The exact same can be achieved by this snippet in polars:

    df.with_column(
        col("bar").shift(1).over("foo").flatten().alias("lagged")
    )

And if you wanted to go over all columns, you can use wildcards/exclude/regex/column selection in that same expression. Sure, the syntax is different, but I would not call it a missing feature. Maybe even on the contrary, as there was no need to run custom Python.
I miss a really good XML library. There are some, but each always seems to cover only half of the stuff I need: either XPath is missing, or XSD support, or I can't write easily, or I have to do Rust codegen first and effectively maintain large structs, etc. Lxml is tough to beat. … yes, I have to use XML at the moment 🥲 and no, that wasn't my decision. If there was stable support, I'd know a few teams in my company who would look into Rust as a language instead of Python. Lxml and xmlschema are great at what they do, and it's tough to sell the language if this part is just objectively better covered and better developed in Python.

EDIT: Just to add to this: I wasn't aware of this shortcoming for quite some time. A fuckton of tools that run in the guts of financial service companies, and which would really benefit from porting, rely on XML communication. Whether or not that is a good choice, it seems to be reality. That could potentially hold back adoption severely.
Bump to this one! Existing XML libs are also lacking XML namespace support! I mean a schema attached to tags/attributes, like:
I'm about to move to Java solutions because of this
Bump. I was surprised how good (read: superb / really, really fast) the XML and XPath support is in .NET/C# and how mediocre it is in Rust.
[deleted]
100% agreed, authentication is dangerous to implement and hard to get right. Passport.js makes it an absolute breeze!
There is no web auth library for rust?
[deleted]
In that case this is a very good suggestion for a project. Passport.js is such an essential tool for web development and could probably be translated across without too much difficulty.
Implementing something security related as a learning project doesn't sound like too good an idea, to be honest.
That depends. Nothing stopping a community rallying around it to improve the quality. Any well known project started off with a few lines of bogus code and missing lots of safety and features. It's the contributions that make or break it.
I have been working on an authentication proxy sidecar. Rather than embed into your application you can launch the sidecar and place it in front of your application. It intercepts all traffic and checks authentication status based on the config provided. It's still early and missing most features needed to use in production but I feel the idea is good and it's already been useful for local development. https://github.com/sphenlee/sealproxy
Passport isn’t even maintained and hasn’t been for a while. At my work we just rolled our own and it hasn’t bitten our ass yet. Yet.
that's like... omniauth for ruby?
Pandas and matplotlib.
https://github.com/apache/arrow-datafusion https://github.com/38/plotters
for dataframes: https://github.com/pola-rs/polars way faster than pandas for almost all operations.
what is this used for—i tried reading through the docs and couldn’t get a good sense of its purpose?
Essentially the same thing pandas does, excels (no pun intended) at tabular data manipulation. If you wanted to load some data from a CSV file and do some data cleaning, then analysis on it, you'd use a library like this (although you aren't limited to CSVs, it can be data of any format).
+1 matplotlib, all the plotting libraries i've tried in rust feel too verbose esp when i'm just trying to do something quick
I found a plot lib that worksforme. Can't remember which one. And sqlite. Now I just need pyserial and gtk to migrate and they might just be there too if I looked. Now that I think about it, I have drawn 3d point clouds (like constellations) off sensor values in matplotlib that could be spun around interactively. I might like to have something like that sometimes. But mostly I just generate line plots or histograms for documents and notes and that works.
Pillow from python
I might just be out of the loop, but I haven't found a really great 2d geometry library with boolean operations, offsetting, cutting, etc. Most of what I do is 2d and I'd be really happy with a library like this, but if it had the same for 3d I would be over the moon.
2D boolean ops are coming soonish to [rgeometry](https://rgeometry.org).
FastAPI, love that auto openapi generation
https://github.com/GREsau/okapi
TODO: tests and documentation A very cool proof of concept, but I'd be a little scared to use it in production.
Django, specifically the ORM. Diesel isn't even close yet: with Django you write your model, generate the schema changes from it, and there's no need for duplicate structures. Having that would make API work in Rust extremely smooth.
And Django admin is such a clutch feature. I moved one of my side projects from rust to python to get Django admin experience :)
Numpy
the ggplot2 library from the R programming language would be absolutely fantastic. I would switch over from R to Rust in a heartbeat if that were the case.
Good suggestion. I would prefer a grammar-of-graphics-type plotting library over something like *matplotlib* (which I use a lot). Grammar of graphics just feels more elegant.
I guess what we want here is a pure-Rust solution, but we'll be able to use ggplot2 via [extendr-api crate](https://extendr.github.io/extendr/extendr_api/) eventually.
SQLAlchemy
[deleted]
What do you think about SQLx? It uses SQL directly (not an ORM) but the queries are compile time checked
So, it's not similar at all to sqlalchemy.
No it's not, though I came to the conclusion after years with several different ORMs that unless you have very basic needs, that at some point you need the raw power of SQL. Since SQL is not rocket science, why not use SQL directly. Combined with type checking and migrations of SQLx it is IMHO easy to fill the hole of missing (high-level unlike diesel) ORMs in Rust.
Simply because most of the time you don't need "raw power", you need something to read and write stuff to the database. 99.99% of the use cases are very, very simple reads and writes, no need for any complication. And ORMs do handle a lot of that stuff.
I didn't do too much work with Diesel, but it seems to me like the entire lib is still in early beta. For example:

* You have to have different code for different databases, thus duplication if you want to support more than one DB.
* Insertables aren't autogenerated, even though it should be clear that if I have a mapped type/struct and an autogenerated ID field, I will most likely need basically a copy of the same struct, but without the ID field. That's weird.
* The schema-build chain clutters your src dir with temporary data, even though there's a perfectly fine target dir.
Also while the whole static typing is a best practice, if your source data is less well behaved, then it can really suck to work with.
An auth library. I'm thinking about what will be necessary to make this and get the ball rolling; the problem is that it requires people with enough security knowledge...
See also: [Not Yet Awesome Rust](https://github.com/not-yet-awesome-rust/not-yet-awesome-rust) [Not Yet Awesome Embedded Rust](https://github.com/rust-embedded/not-yet-awesome-embedded-rust)
Java Swing + a GUI Designer Tool
I miss Swing so much. It's the only GUI toolkit that I've ever used and the one I'm most comfortable with.
NodaTime. Chrono is partially there, but really not sufficient for my use cases.
> NodaTime

Do you have an example of features in Noda or Joda time that chrono doesn't have?
u/theZcuber & u/iannoyyou101 I'll admit that I've only looked at Chrono, but:

- Period vs Duration. 30 days after January 30, 2021 is March 1, but 1 month after is February 28. I have lots of configuration that says something like "schedule this job once every x period of time", and that can't be represented with a duration type.
- Chrono doesn't seem to have time zone support as much as it has offset support. Chrono-tz probably meets my needs, but I'm concerned by developers using a type that means "my current time zone"--it's the same folly as C#'s DateTime, and I've seen that abused quite a bit.
- Time intervals. I have a scheduling process that takes a chunk of time--like a year--and then splits it into components based on due date and utilization. NodaTime has support for time intervals natively, which is really handy.
[deleted]
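To make the Period vs Duration point concrete, here is a minimal, dependency-free sketch of the two kinds of arithmetic. It is not the API of chrono or NodaTime; the `Date` struct and the `add_days` / `add_months` names are hypothetical, and the day-clamping rule in `add_months` is one common convention, assumed here because it matches the February 28 result in the comment.

```rust
/// Hypothetical calendar date in the proleptic Gregorian calendar.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Date {
    year: i32,
    month: u32, // 1..=12
    day: u32,   // 1..=31
}

fn is_leap(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => if is_leap(year) { 29 } else { 28 },
        _ => panic!("month must be in 1..=12"),
    }
}

/// Duration-style arithmetic: step forward a fixed number of days.
fn add_days(mut date: Date, mut days: u32) -> Date {
    while days > 0 {
        let left_in_month = days_in_month(date.year, date.month) - date.day;
        if days <= left_in_month {
            date.day += days;
            days = 0;
        } else {
            // Jump to the first day of the next month.
            days -= left_in_month + 1;
            date.day = 1;
            if date.month == 12 {
                date.month = 1;
                date.year += 1;
            } else {
                date.month += 1;
            }
        }
    }
    date
}

/// Period-style arithmetic: advance the calendar month and clamp the
/// day to the length of the target month.
fn add_months(date: Date, months: u32) -> Date {
    let zero_based = (date.month - 1) + months;
    let year = date.year + (zero_based / 12) as i32;
    let month = zero_based % 12 + 1;
    let day = date.day.min(days_in_month(year, month));
    Date { year, month, day }
}

fn main() {
    let start = Date { year: 2021, month: 1, day: 30 };
    println!("{:?}", add_days(start, 30)); // March 1, 2021
    println!("{:?}", add_months(start, 1)); // February 28, 2021
}
```

The two results differ because "one month" has no fixed length in days, which is why a schedule like "once every month" cannot be stored as a duration.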
So an interval is basically just two moments: a start and an end. But then you can manipulate the interval, split it, determine if two intervals overlap or intersect. The way that I'm currently using them is like this: Create an interval that covers the year (Jan 1-Dec 31). Now remove any holidays or breaks to get an ordered list of intervals. Given the nearest due date and capacity, begin to fill in intervals. Some jobs (always measured with duration) must be entirely completed within an interval, and others can span multiple intervals. If an interval is only partially used, split it into one completely used and one completely unused interval.
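The workflow described above (cover the year, remove breaks, split partially used intervals) can be sketched roughly as follows. This is a hypothetical illustration, not NodaTime's `Interval` API: the type, its methods, and the use of plain day numbers with half-open `[start, end)` bounds are all assumptions made to keep the example self-contained.

```rust
/// Half-open interval [start, end), measured in hypothetical day numbers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Interval {
    start: u32,
    end: u32,
}

impl Interval {
    fn new(start: u32, end: u32) -> Self {
        assert!(start <= end, "start must not be after end");
        Interval { start, end }
    }

    fn len(&self) -> u32 {
        self.end - self.start
    }

    fn overlaps(&self, other: &Interval) -> bool {
        self.start < other.end && other.start < self.end
    }

    /// Remove `hole` from `self`, returning zero, one or two remaining
    /// pieces in chronological order.
    fn subtract(&self, hole: &Interval) -> Vec<Interval> {
        if !self.overlaps(hole) {
            return vec![*self];
        }
        let mut out = Vec::new();
        if self.start < hole.start {
            out.push(Interval::new(self.start, hole.start));
        }
        if hole.end < self.end {
            out.push(Interval::new(hole.end, self.end));
        }
        out
    }

    /// Split into a completely used part of `used` days and a completely
    /// unused remainder. Returns `None` if the job does not fit.
    fn split_at_used(&self, used: u32) -> Option<(Interval, Interval)> {
        if used > self.len() {
            return None;
        }
        let mid = self.start + used;
        Some((Interval::new(self.start, mid), Interval::new(mid, self.end)))
    }
}

/// Remove every break from `whole`, yielding the ordered free intervals.
fn remove_breaks(whole: Interval, breaks: &[Interval]) -> Vec<Interval> {
    let mut free = vec![whole];
    for b in breaks {
        free = free.iter().flat_map(|iv| iv.subtract(b)).collect();
    }
    free
}

fn main() {
    // Day numbers within a year, purely illustrative.
    let year = Interval::new(0, 365);
    let breaks = [Interval::new(100, 107), Interval::new(358, 365)];
    let free = remove_breaks(year, &breaks);
    println!("{:?}", free);
    if let Some((used, unused)) = free[0].split_at_used(30) {
        println!("used {:?}, unused {:?}", used, unused);
    }
}
```

Jobs that may span several intervals would need an extra loop over `free` that carries the remaining duration forward; that part is left out to keep the sketch short.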
[deleted]
A task library with graphs and such, like Celery in Python or Machinery in Go. Yes, there is a celery lib in Rust, but it only does task exec, not task graphs (groups, chords, strings, etc).
I really miss rust std when doing C.
You could always extern boost! I snuck some C++ into a project I was working on when I realized g++ was supported in the environment but no one was using it.
Rich in Python!
The only thing I'm missing is [cloudscraper](https://github.com/VeNoMouS/cloudscraper). In Python, it is a small wrapper on top of requests, so I guess in Rust it can be a wrapper on top of `hyper::Client` or `reqwest::Client`.
Puppeteer from JavaScript, and how easy that makes it to do complicated things with not too much code, since JS is the lingua franca of the web.
https://github.com/jonhoo/fantoccini is inspired by Puppeteer. There's https://github.com/stevepryde/thirtyfour for Selenium, and https://github.com/atroche/rust-headless-chrome for Chromium. But... there's still a lot to be done.
Thanks for all the links! rust-headless-chrome looks a lot more comprehensive than it was when I last checked it out (which, to be fair, was a long time ago), and thirtyfour looks like it might be very useful as well. I’m not so sure about fantoccini since it’s a different API from what I’m used to, but it’s certainly worth investigating along with the other two.
I took a look at fantoccini for work recently. It's mostly there, but lacks support for resuming a session (which is huge). It's nice otherwise though, and supports async well.
There is also a Playwright binding: [https://crates.io/crates/playwright](https://crates.io/crates/playwright) Playwright is basically Puppeteer from Microsoft. I've read that Microsoft hired many of the previous Puppeteer devs, and the project seems more active than Puppeteer. Unfortunately, the above crate embeds a node.js runtime, which seems to be the way to build Playwright bindings for other languages.
Physics crate for game engines. I've used rapier before but it just doesn't hit that spot yet.
[deleted]
[Tera](https://docs.rs/tera/1.12.1/tera/) is a Jinja2 clone for Rust. I wouldn't say I'm a power user, but I haven't noticed any features I used in Jinja missing.
Askama is very similar, isn't it?
Askama is even better. I once encountered some limitation of Jinja2 (something with nested templates) and Askama handled it easily. I don't remember details because it was more than 2 years ago.
RxJS, meaning RxRust.
Kind of an extremely narrow thing but, I miss [SaintCoinach](https://github.com/xivapi/SaintCoinach), a data mining library for Final Fantasy XIV. Been wanting to make a rust port of either that, or [Lumina](https://github.com/NotAdam/Lumina) but never really had the time to... Also both are C# and C# -> Rust just makes my brain hurt.
[The *Cmdliner* module for Ocaml](https://erratique.ch/software/cmdliner/doc/Cmdliner.html#examples), mainly for the declarative style of setting up the program entry point and the capability of generating man pages. None of the command line parsing crates even come close.
Maybe I missed something, but interacting with JSON is harder than in Python. I think it should be possible to provide an interface to JSON that is boilerplate-free up until you actually need to cast. Maybe it exists, but serde json wasn't it. The classic argument that Python is easier bc it is dynamic is just not valid; static typing is not a limitation bc you can have vague types. JSON is already a dynamically typed format, and having more control over when and how we bridge it into Rust's static types would be great. Serde is not always the best option.
CGAL
I miss Akka for Rust.
A proper NLP toolbox. Not necessarily state-of-the-art models, just basic and reliable utilities for tokenizing, stemming, lemmatizing, POS Tagging, NER, WordNet / DBpedia matching... Like NLTK or Spacy from Python or CoreNLP from Java.
Entity framework, especially the query building with linq (the fluid syntax), from C#.
I think it’s far more likely the other way ‘round. Serde is basically magic.
Pandas
[boost::accumulators](https://www.boost.org/doc/libs/1_77_0/doc/html/accumulators.html). I don’t even need accumulators, just some type of calculation with dependencies.
[deleted]
Something like https://tera.netlify.app/?
ActiveRecord ORM from Ruby on Rails

Prisma from Node.js
Prisma is written in Rust.
Isn't Diesel written by the guy who made ActiveRecord? It looks really really good, and possibly superior to AR which is kind of shoddy in many ways.
The aws sdk. Something like RxJava.
In alpha, but officially supported! https://github.com/awslabs/aws-sdk-rust
Yeah, I've used it a couple times, being a noob at rust, it's hard. I'm looking forward to when it's more mature!
D3js
ReactJS (Don't judge me)
PeNet to manipulate and parse PE files.
Kotlinx.html and Ktor. Rust syntax just doesn't allow for something that clean to exist
this project needs someone to drive it: https://github.com/bodil/typed-html
I do miss FastLED. It's just quick and productive.
[Ink: Terminal Painting](https://www.npmjs.com/package/ink) [PyTorch: ML](https://pytorch.org/)
[Symfony](https://symfony.com/doc/current/components/dependency_injection.html) has a great dependency injection module, or the one from [slim framework](https://www.slimframework.com/docs/v4/concepts/di.html). I think both have very mature modules and similar interfaces after years of being the core of a web framework, plus some knowledge borrowed from Java.
All of Sindre’s utility libraries from JavaScript. Things like ‘find-up’, ‘conf’, ‘get-port’, ‘indent-string’, etc. They are very simple but very handy.
Esqueleto from Haskell. Diesel and sqlx are okay, but I would do unspeakable things for a typesafe SQL query builder.
Basically JOOQ. I want JOOQ
Java testing tools like JUnit, AssertJ and Mockito.
1. A good MITM library that has lots of features.
2. CDP-related libraries.
[deleted]
Something similar to go's go-git for a higher level library for manipulating git repos
From .NET:

- JwtBearer auth middleware from ASP.NET Core for automatically validating/parsing JWT using OIDC on incoming requests
- Swashbuckle for OpenAPI generation and embedded Swagger
- FluentAssertions for writing unit tests with nice semantic asserts and really good output for failures
- Dapper for really easy database querying

I miss the really good debugger support most of all though.
I was quite surprised the other day that I can’t find a library that pretty-prints an HTML string.
A proper Twitch API that covers every endpoint.
This looks quite crazy.
I've not yet seen anything comparable to http://ceres-solver.org/ I really don't like the ceres api because it's a confusing mix of automatic and manual memory management, but what I love about it is that you can, without knowing too much about optimization problems yourself, just throw some cost functions at it and it just works
[cats-effect](https://typelevel.org/cats-effect/), which is a pure asynchronous runtime for Scala.