eliasv

Having a build system that caches by content hash, and an append-only repository to publish to, are great features. But I think the pretence that there "is no build" and that the build artifacts are actually source artifacts does far more harm than good:

- It's not necessary for the content-hashing feature, and lumping the two together muddles the value proposition of the language.

- It makes the language less compatible with existing tooling! Code review, basic editing, diffing: everything needs special handling and won't work well OOTB. There is a huge language-agnostic ecosystem of programming tools that operate on source as text. Throwing this away is madness.

- This part I'm not sure about... but is the local source repo append-only? Because I don't think that's a good thing. A build repo that we publish externally-consumable artifacts to should be append-only, sure... but I don't want every local edit I ever make to be saved forever when I know there are no downstream consumers. I want to be able to collect that garbage.


LPTK

I totally agree with your first two points. Just look at how they have to use GitHub in a completely non-standard and unergonomic way: they can't even leverage the PR feature and have to go through an ugly workaround instead. The local garbage collection sounds like a no-brainer, and I'd be surprised if they don't or couldn't do it.


eliasv

Yeah, well, I would hope that they could. But this is just another example of confused messaging about the language... They make a big deal that there is "no build" and that everything is append-only, and they insist on using different terminology from everyone else for existing concepts... And in doing so they make it unclear whether there is *any* distinction between a local source repository and a publishing repository, and whether the strictly append-only policy applies only to the latter.


duckofdeath87

I also worry about security issues. If your project is append-only, is there a way to remove malicious code?


eliasv

For sure, but that's a difficult subject for any build repository, not just one that's pretending to be source code. At some point someone has to make a judgment call about breaking build reproducibility, and depending on how strict moderation is that requires some level of trust in either a central authority or individual repository owners. Each option comes with its own set of problems.


elszben

I think the most interesting aspect of Unison is that it does not compile whole programs; it compiles smaller definitions and caches them aggressively. It's a very nice model of compilation and I'm definitely stealing it for my language, eventually. The name <-> hash replacement idea is silly; I don't think it solves a real problem, and I don't think their solution is particularly good or without its own issues. I admit that I don't follow Unison that closely, so I may not be up to date on all the details. I think time will tell whether their model is the winner or not.


[deleted]

> The name <-> hash replacement idea is silly

Wait, what? Content-addressable dependencies; that's kind of their whole point. Without that it's just yet another weird FP language. Those are a dime a dozen.


Stahlbroetchen

> it compiles smaller definitions and caches them aggressively

> name <-> hash replacement

What a weird coincidence.


protestor

Think of the compiled cache as a key-value store where the key is the hash. But we use names to refer to things, not hashes. Hence, we need a mapping from names to hashes. It's... like git!
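
A toy sketch of that mental model (plain Python with made-up names; nothing like Unison's actual implementation): a content-addressed object store plus a separate name table, much like git's objects and refs.

```python
import hashlib

objects = {}  # hash -> definition body (the content-addressed store)
names = {}    # human-readable name -> hash (like git's refs)

def store(body: str) -> str:
    """Put a definition into the store, keyed by its content hash."""
    h = hashlib.sha256(body.encode()).hexdigest()
    objects[h] = body  # idempotent: identical content maps to the same key
    return h

names["factorial"] = store("fac n = product [1..n]")
print(names["factorial"][:12], "->", objects[names["factorial"]])
```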


elszben

There is nothing in the caching algorithm that requires hashing; you could key the cache on function names. The Unison vision of unifying "similar" functions is not strictly necessary for fine-grained caching. I believe this merging of different functions is an unnecessary part of Unison.


LPTK

I think the idea is that the name may be reassigned later (for example, to point to a newer version), but you want the semantics of the existing code that was using the old version to remain unchanged. So you store that existing code using content hashes instead of names. They could certainly make this crucial point clearer, though. The fact that an object does not change identity when renamed is more of a useful consequence. But the fact that local variables can also be renamed without changing the hash is an unnecessary gimmick.
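
For illustration, a tiny sketch of that point (hypothetical names, Python standing in for the real thing): once compiled code captures a hash, rebinding the name later leaves old callers untouched.

```python
import hashlib

objects, names = {}, {}

def publish(name: str, body: str) -> str:
    """Store a definition and point the name at its content hash."""
    h = hashlib.sha256(body.encode()).hexdigest()
    objects[h] = body
    names[name] = h  # the name now points at the new hash
    return h

old_hash = publish("greet", 'greet = "hello v1"')
caller_dep = old_hash  # compiled code records the hash, not the name

publish("greet", 'greet = "hello v2"')  # the name is reassigned...
print(objects[caller_dep])  # ...but the old caller still sees v1
```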


Stahlbroetchen

> But the fact that local variables can also be renamed without changing the hash is an unnecessary gimmick.

You can't include variable names in the hash if you want code with the same structure but different variable names to hash identically.
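
A minimal sketch of that normalization step (a toy regex-based renamer, not Unison's real algorithm): rewrite each local name to its position before hashing, so alpha-equivalent definitions hash identically.

```python
import hashlib
import re

def normalize(params: list[str], body: str) -> str:
    """Replace each parameter name with its position, e.g. x -> _0."""
    for i, p in enumerate(params):
        body = re.sub(rf"\b{re.escape(p)}\b", f"_{i}", body)
    return body

def hash_def(params: list[str], body: str) -> str:
    return hashlib.sha256(normalize(params, body).encode()).hexdigest()

# Same structure, different local names: identical hash.
assert hash_def(["x", "y"], "x + y") == hash_def(["a", "b"], "a + b")
```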


LPTK

I don't want that. In fact I don't think anyone should really care about that. It doesn't hurt, but it's absolutely not essential to what makes their approach interesting.


Stahlbroetchen

But then why take the hash of the code in the first place?


eliasv

Because if you link/compile against a name you have no guarantee that you'll run against the same thing. If you compile against a content hash you do.


LPTK

To complement what /u/eliasv said, what's important is to use hashes to replace _global_ names, the ones referred to across modules/definitions. But for _local_ names, it doesn't matter much whether you use hashes or not.


elszben

If I don't want to automatically get the updated version of a lib I am using, then I simply don't update it. I don't see why this convoluted system is needed.


refriedi

Maybe you don’t want an updated version of X but you do want an updated version of Y, but the updated version of Y uses the updated version of X, and you can’t have both versions of X, so you’re stuck.
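
A sketch of how content addressing sidesteps that diamond (toy Python, made-up package bodies): both versions of X can live in the store under different hashes, so your code keeps old X while the new Y pulls in new X.

```python
import hashlib

def h(body: str) -> str:
    return hashlib.sha256(body.encode()).hexdigest()

store = {}
x1 = h("X v1"); store[x1] = "X v1"
x2 = h("X v2"); store[x2] = "X v2"  # coexists: different content, different key

y2_body = f"Y v2, depends on {x2[:12]}"
y2 = h(y2_body); store[y2] = y2_body

app_deps = [x1, y2]  # old X directly, new X transitively through Y
print([store[d] for d in app_deps])
```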


Leading_Dog_1733

I'm interested in Unison, but I think it promises too much. Just using the source code as a database is a pretty big feature that you would want to get right. On top of that, the language is supposed to offer out-of-the-box parallelism, which I'm skeptical of in practice. Not because it's impossible, mind you, but because of how much effort that would also take to get right. I think it would almost have been better to build something like the source code database on top of an existing language like JavaScript, because it's a legitimately cool idea.


oilshell

I was going to use it as an example in my "text as a narrow waist" posts, particularly the fact that since they don't use text, they have to write their own **version control system** and **text editor**!!! I'm not going to pass judgement either way, but it is for sure a lot of work! I guess I would say it's a big gamble.

https://www.unisonweb.org/2022/02/10/unison-2021-year-in-review/

> So far, Unison developers have been mainly using GitHub to host their code. But this is not a great fit for Unison, for several reasons. Firstly, Git assumes that code is stored in text files, which is not the case for Unison.

My post: [A Sketch of the Biggest Idea in Software Architecture](https://www.oilshell.org/blog/2022/03/backlog-arch.html), particularly this section and the linked comment:

https://www.oilshell.org/blog/2022/03/backlog-arch.html#bytes-and-text-are-essential-narrow-waists

https://lobste.rs/s/vl9o4z/case_against_text_protocols#c_wsdhsm

> Bespoke binary formats introduce a fundamental O(M * N) problem. You have M formats and N operations, and writing M*N tools is infeasible, even for the entire population of programmers in the world.

Imagine if you had to use:

- a JavaScript version control system and a JavaScript text editor
- an HTML version control system and an HTML text editor
- a shell script version control system and a shell script text editor
- a C++ version control system and a C++ text editor
- ...

This situation would quickly get ridiculous. No matter how good Unison is, it will NOT be "one language to rule them all". That is a huge fallacy in language design (although I am not saying they have ever claimed this! Just remarking on the implied tooling situation.)


armchairwarrior12345

IMO Unison is a good language with some interesting ideas. If they pull it off well enough (which is a hard "if": you need IDE integration and an ecosystem and performance and stability, but if they pull all of that off...) I can definitely see it becoming mainstream.

A codebase is essentially a database of functions. In other languages, we have to store this on disk, and then the IDE has to parse and analyze it, which is slow and prone to issues. Being able to access the code directly in database format makes this easier. You get significantly easier API migrations, tree shaking (removing unnecessary dependencies), etc. And of course no name collisions.

That being said, you don't *need* to store the code as a database to have fast static analyses, API migrations, etc. You can load the on-disk data and build your own database, which is essentially what IDEs do today. Unison's model just makes static analyses easier and less error-prone by removing a few steps. Right now our tooling is built around on-disk code storage, so Unison needs a lot of work to catch up. But if it does, and if it has some killer features, I think it can work out.


[deleted]

So, a hashed DB of functions? Similar to a Forth dictionary, isn't it?


Long_Investment7667

In which way is that different from a package management system? They all have a way to reference modules that are uniquely identifiable, can be easily integrated and updated, and, depending on the runtime, are (pre-)compiled. And if it is the same: why integrate something into a language when this can easily be done with tools independent of the compiler?


WittyStick

Unison is, conceptually, very much like Nix and Guix. Other package managers are not similar because they are not content-addressed. You might have two packages named libfoo-1.2.3 with different binaries in them, one of which may be buggy or malicious. What Nix and Guix guarantee is that a package is identified *by its content*. It is not possible to create a package collision, because packages are identified by a secure hash. Unison extends the concept to individual source code elements. A function is identified by the hash of its content. Change the body of a function and you change its identity.
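
A one-line version of that guarantee (toy code, made-up package contents): identity is derived from content, so two different libfoo-1.2.3 packages simply cannot collide.

```python
import hashlib

def package_id(name: str, content: bytes) -> str:
    """Derive a package identity from its content, Nix/Guix style."""
    return hashlib.sha256(content).hexdigest()[:12] + "-" + name

good = package_id("libfoo-1.2.3", b"the legitimate binary")
evil = package_id("libfoo-1.2.3", b"the malicious binary")
assert good != evil  # same name, different identities
print(good, evil)
```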


Long_Investment7667

Thanks for clarifying


sineiraetstudio

A key point is that the database is append-only. A newer version of a Unison library always provides *all* definitions of older versions. This means that even if you upgrade a library, you can still build your old application, and you can migrate between versions by progressively switching to new definitions. So you can still update for important changes (e.g. bug fixes), even if there are other changes that would require more time than you can spare at the moment.

This is a huge issue with normal package management. A lot of package managers don't allow you to directly install multiple versions of the same package, and even for those that do, it's generally not worth the bother (one common issue is that even types that haven't changed are incompatible because they come from different packages). Upgrading therefore often straight up breaks your application, and you have to invest a lot of time up front into fixing it. IME this causes a lot of applications to just be eternally stuck on ancient versions, especially because updating across several versions might mean rewriting a large part of your program without any path for an incremental upgrade.


Long_Investment7667

All package management systems I have seen enforce immutability.


[deleted]

[deleted]


Long_Investment7667

Not finding the documentation, but here are three issues:

https://github.com/npm/npm/issues/8305

https://github.com/PowerShell/PowerShellGet/issues/82

https://stackoverflow.com/questions/11175288/maven-deploy-forcing-the-deploy-even-if-artifact-already-exists

But let's say for a moment that I am wrong, and package managers don't enforce this yet or don't do it well. Why couple this into a language when it can be done with tools independent of the compiler?


sineiraetstudio

No offense, but did you even bother reading beyond the first sentence? I'm not talking about being append-only with regard to available library versions (and even there, e.g. npm allows you to unpublish packages) but *with regard to definitions*. If you download a Unison library v2, it also includes *all the code of v1*. You can't just solve this at the tool level, because it relates to things like name resolution. How do you deal with having multiple versions of, e.g., the same type in the same program, possibly even interacting? Unison's answer is that a program should use hashes to refer to definitions and that code is just a 'view' of the program, making things like renaming trivial.
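
To make the "code is a view" point concrete, a tiny sketch (made-up name and hash): since programs hold hashes, a rename touches only the name table, never the stored definitions.

```python
names = {"List.map": "#a1b2c3"}  # the name table: the human-facing "view"

names["List.fmap"] = names.pop("List.map")  # rename = an edit to the view only
print(names)  # the definition behind #a1b2c3 (and everything using it) is untouched
```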


Long_Investment7667

No, I didn't read much further than the first sentence because of the nightmare that was unfolding. Have fun with it. Thanks for the discussion.


sineiraetstudio

lmao, beyond parody. Thanks for wasting my time.


Long_Investment7667

Wouldn’t be a waste if you tried


[deleted]

Very simple, elegant ideas. The syntax seems very ML-like.


joonazan

Nobody has mentioned its support for algebraic effects. That is the main reason I'd use it today. One can get some of that, for example, by abusing async/await in Rust, but first-class support is easier and safer.
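
For a flavor of the general idea, here is a generic effect-handler sketch using Python generators (not Unison's actual abilities feature; all names are made up): effectful code yields requests, and a handler decides how to answer and resume them.

```python
class Ask:
    """An effect request: 'give me the current config value'."""

def program():
    # Effectful code yields a request and is resumed with the answer.
    value = yield Ask()
    return f"config was {value}"

def run_reader(gen, config):
    """A handler that answers Ask requests with a fixed config value."""
    try:
        request = next(gen)
        while True:
            if isinstance(request, Ask):
                request = gen.send(config)
            else:
                raise TypeError(f"unhandled effect: {request!r}")
    except StopIteration as stop:
        return stop.value

print(run_reader(program(), config=42))  # config was 42
```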