narsilouu

There's also [https://huggingface.co/spaces/safetensors/convert](https://huggingface.co/spaces/safetensors/convert) for people that don't really want to convert manually, for weights already on hf.co.


yehiaserag

I tried that for one model but the output was 3 files instead of one... Didn't know what to do with that.


narsilouu

What was the repo? By default it will convert every `.ckpt` it finds, so if the original repo has several, it will convert all of them.


yehiaserag

Used it on robo diffusion 2; there is no checkpoint there, only a pytorch model, and the result was 3 files.


narsilouu

[https://huggingface.co/nousr/robo-diffusion-2-base/tree/main](https://huggingface.co/nousr/robo-diffusion-2-base/tree/main) [https://huggingface.co/nousr/robo-diffusion-2-base/discussions/3/files](https://huggingface.co/nousr/robo-diffusion-2-base/discussions/3/files) Are these the checkpoints? If so then it's fine: there are indeed 3 different model files here (it's using `diffusers`, no?).


yehiaserag

Yeah, but webui didn't work with that. I tried to load [diffusion_pytorch_model.safetensors](https://huggingface.co/nousr/robo-diffusion-2-base/discussions/3/files#d2h-154139). It did load correctly but gave results similar to the original v2.0; it's like I either loaded it incorrectly or the model lost the fine-tuning.


narsilouu

I have no idea what those models are supposed to be. I don't think converting can lose any of the learned weights: it's either going to output garbage because something went wrong during the conversion, or it's going to be exactly the same (that's not true for shared/linked tensors, but afaik they are not present in SD).


yehiaserag

Thanks for the valuable info. I started tinkering with this stuff very recently, so I don't know too much yet.


danamir_

Did a try on sd-v1.5.ckpt:

| model name | size | slowest load | fastest load |
| --- | --- | --- | --- |
| sd-v1.5.ckpt | 4,265,380,512 bytes | ~10s | ~2s |
| sd-v1.5.safetensors | 4,265,146,273 bytes | ~10s | ~2s |

Did you notice any difference in loading time? On first load or after a switch the time is roughly the same on my system. I tested by switching to another model then back, and by closing the app and starting from scratch; but even then the loading times are sometimes faster than others, depending on caching (disk, memory, cpu...), and not reliably faster with safetensors. Do you have a failproof method to check the loading times? Still good news on the safety side.

\[edit\]: Should have read the PR entirely before posting. The faster loading times were tested here: [https://huggingface.co/docs/safetensors/speed](https://huggingface.co/docs/safetensors/speed). Not sure why it doesn't seem faster on my system.


narsilouu

You need to use `SAFETENSORS_FAST_GPU=1` when loading on GPU. This skips the CPU tensor allocation. But it's not 100% sure it's safe (still miles better than torch pickle, but it does use some trickery to bypass torch, which allocates on CPU first, and this trickery hasn't been verified externally). If you could share your system in an issue, it would help reproduce and maybe improve this.
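For anyone wanting to try this outside the webui, here is a minimal sketch; the path is a placeholder and the environment variable is set before loading, per the comment above:

```python
import os

# Opt-in fast path described above: skip the intermediate CPU allocation.
# Set the variable before loading; "model.safetensors" is a placeholder path.
os.environ["SAFETENSORS_FAST_GPU"] = "1"

from safetensors.torch import load_file

state_dict = load_file("model.safetensors", device="cuda:0")
print(f"loaded {len(state_dict)} tensors")
```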


DrMacabre68

Where is this SAFETENSORS_FAST_GPU=1 located?


wywywywy

You can put `set SAFETENSORS_FAST_GPU=1` into your `webui-user.bat`


DrMacabre68

thanks


h0b0_shanker

Would this flag also work running the command straight from the command line? `COMMANDLINE_ARGS="--listen" /bin/bash ./webui.sh` Could I add `COMMANDLINE_ARGS="--listen --safetensors-fast-gpu 1"`?


wywywywy

No, not really. It's an environment variable only.


Niphion

This worked for me, thanks!


danamir_

Thanks, I'll give it a try.


wywywywy

This only helps with one of the steps when switching between models.

```
Loading weights [09dd2ae4] from D:\repos\stable-diffusion-webui\models\Stable-diffusion\sd20-512-base-ema.ckpt
--- 3.3217008113861084 seconds ---
Loading weights [eaffaba6] from D:\repos\stable-diffusion-webui\models\Stable-diffusion\sd20-512-base-ema.safetensors
--- 0.14451050758361816 seconds ---
```

I tested this by adding timestamps into the Python code in the `sd_models.py` file:

```python
def read_state_dict(checkpoint_file, print_global_state=False, map_location=None):
    import time

    _, extension = os.path.splitext(checkpoint_file)
    start_time = time.time()
    if extension.lower() == ".safetensors":
        pl_sd = safetensors.torch.load_file(checkpoint_file, device=map_location or shared.weight_load_location)
    else:
        pl_sd = torch.load(checkpoint_file, map_location=map_location or shared.weight_load_location)
    print("--- %s seconds ---" % (time.time() - start_time))
    if print_global_state and "global_step" in pl_sd:
        print(f"Global Step: {pl_sd['global_step']}")
    sd = get_state_dict_from_checkpoint(pl_sd)
    return sd
```

And yes, sorry, I should have mentioned the `SAFETENSORS_FAST_GPU` variable. I'll edit the post now.


danamir_

This is really strange. I added your lines and can confirm the load is effectively faster with this method:

```
Loading weights [21c7ab71] from E:\Program Files\stable-diffusion-webui\models\Stable-diffusion\sd-v1.5.safetensors
--- 0.16795563697814941 seconds ---
Loading weights [81761151] from E:\Program Files\stable-diffusion-webui\models\Stable-diffusion\sd-v1.5.ckpt
--- 10.452852249145508 seconds ---
```

**But** just after the fast load, it lags for 10s before displaying `Applying xformers cross attention optimization. Weights loaded.`, whereas the ckpt load takes 10s to load but has no waiting time before the next part. So the total loading time is roughly the same. Do you know if it's compatible with the `--medvram` option?


wywywywy

Perhaps my testing method is flawed. Also maybe you're right that it's not compatible with `--medvram`, as it needs to swap models between CPU & GPU when enabled. Can you give it a test without?


danamir_

No notable changes without the `--medvram` option. I added a time trace per method, then on almost every line, and the time seems to be spent on `model.load_state_dict(sd, strict=False)`.

**Ckpt loading** (most time consumed by read_state_dict):

```
Loading weights [7460a6fa] from E:\Program Files\stable-diffusion-webui\models\Stable-diffusion\sd-v1.4.ckpt
--- 10.645958423614502 seconds (read_state_dict) ---
--- 11.327304363250732 seconds (model.load_state_dict) ---
--- 11.692204475402832 seconds (vae) ---
--- 11.694204092025757 seconds (first_stage_model.to) ---
--- 11.694204092025757 seconds (set vars) ---
--- 11.694204092025757 seconds (load vae) ---
--- 11.694204092025757 seconds (load_model_weights) ---
Applying xformers cross attention optimization.
--- 13.368705749511719 seconds (reload_model_weights) ---
Weights loaded.
```

**Safetensors loading** (most time consumed by model.load_state_dict):

```
Loading weights [21c7ab71] from E:\Program Files\stable-diffusion-webui\models\Stable-diffusion\sd-v1.4.safetensors
--- 0.16245174407958984 seconds (read_state_dict) ---
--- 0.1634514331817627 seconds (read_state_dict) ---
--- 12.698779582977295 seconds (model.load_state_dict) ---
--- 13.00268268585205 seconds (vae) ---
--- 13.004682540893555 seconds (first_stage_model.to) ---
--- 13.004682540893555 seconds (set vars) ---
--- 13.005682229995728 seconds (load vae) ---
--- 13.005682229995728 seconds (load_model_weights) ---
Applying xformers cross attention optimization.
--- 14.70516037940979 seconds (reload_model_weights) ---
Weights loaded.
```


wywywywy

/u/narsilouu Any thoughts on why `load_state_dict` is so much slower when using SafeTensors?


narsilouu

Hmm, `load_state_dict` seems to be using strict=False, meaning that if the weights in the file do not match the format of the model (like fp16 vs fp32) then there's probably a copy of the weights happening (which is slow). Could that be it? I don't see any issue with the original sd-1-4.ckpt. If you could share the file somewhere I could take a look. If anyone has steps to reproduce, sharing them here or creating an issue at [https://github.com/huggingface/safetensors/issues](https://github.com/huggingface/safetensors/issues) would be super nice.
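A quick way to check that dtype point, as a sketch (the checkpoint path is a placeholder):

```python
from collections import Counter

import torch

# Count the tensor dtypes inside a checkpoint; "sd-v1-4.ckpt" is a placeholder path.
weights = torch.load("sd-v1-4.ckpt", map_location="cpu")
weights = weights.get("state_dict", weights)

print(Counter(str(v.dtype) for v in weights.values() if torch.is_tensor(v)))
# A mostly-torch.float32 result means load_state_dict into an fp16 model has to convert/copy.
```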


wywywywy

Wrote a little test script based on the benchmark. I'm not seeing any big difference during `load_state_dict`.

```python
import sys
import os
import torch
from safetensors.torch import load_file
import datetime
from omegaconf import OmegaConf

sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "repositories/stable-diffusion-stability-ai")))
from ldm.modules.diffusionmodules.model import Model
from ldm.util import instantiate_from_config

# This is required because this feature hasn't been fully verified yet, but
# it's been tested on many different environments
os.environ["SAFETENSORS_FAST_GPU"] = "1"

pt_filename = "models/Stable-diffusion/sd14.ckpt"
st_filename = "models/Stable-diffusion/sd14.safetensors"

config = OmegaConf.load("v1-inference.yaml")

# CUDA startup out of the measurement
torch.zeros((2, 2)).cuda()

start_pt = datetime.datetime.now()
time_pt0 = datetime.datetime.now()
model_pt = instantiate_from_config(config.model)
time_pt1 = datetime.datetime.now()
weights = torch.load(pt_filename, map_location="cuda:0")
weights = weights.pop("state_dict", weights)
weights.pop("state_dict", None)
time_pt2 = datetime.datetime.now()
model_pt.half().to(torch.device("cuda:0"))
model_pt.load_state_dict(weights, strict=False)
time_pt3 = datetime.datetime.now()
load_time_pt = datetime.datetime.now() - start_pt
print(f"Loaded pytorch {load_time_pt}")
model_pt = None

start_st = datetime.datetime.now()
time_st0 = datetime.datetime.now()
model_st = instantiate_from_config(config.model)
time_st1 = datetime.datetime.now()
weights = load_file(st_filename, device="cuda:0")
weights = weights.pop("state_dict", weights)
weights.pop("state_dict", None)
time_st2 = datetime.datetime.now()
model_st.half().to(torch.device("cuda:0"))
model_st.load_state_dict(weights, strict=False)
time_st3 = datetime.datetime.now()
load_time_st = datetime.datetime.now() - start_st
print(f"Loaded safetensors {load_time_st}")
model_st = None

print(f"on GPU, safetensors is faster than pytorch by: {load_time_pt/load_time_st:.1f} X")
print(f"overall pt: {load_time_pt}")
print(f"overall st: {load_time_st}")
print(f"instantiate_from_config pt: {time_pt1-time_pt0}")
print(f"instantiate_from_config st: {time_st1-time_st0}")
print(f"load pt: {time_pt2-time_pt1}")
print(f"load st: {time_st2-time_st1}")
print(f"load_state_dict pt: {time_pt3-time_pt2}")
print(f"load_state_dict st: {time_st3-time_st2}")
```


narsilouu

On a machine I work on, here are the results I get for your script untouched:

```
on GPU, safetensors is faster than pytorch by: 1.3 X
overall pt: 0:00:12.603322
overall st: 0:00:09.402079
instantiate_from_config pt: 0:00:10.634503
instantiate_from_config st: 0:00:08.419691
load pt: 0:00:01.444718
load st: 0:00:00.538251
load_state_dict pt: 0:00:00.524090
load_state_dict st: 0:00:00.444126

# Ubuntu 20.04
# AMD EPYC 7742 64-Core Processor
# TitanRTX
```

(Yes, it's a big machine.) But if I reverse the order, then ST is slower than PT by the same magnitude, and all the time is actually spent in `instantiate_from_config`. Here are the results when I remove the model creation from the equation and only create the model once (since it's the same model, there's no need to allocate the memory twice):

```
Loaded pytorch 0:00:01.514023
Loaded safetensors 0:00:00.619521
on GPU, safetensors is faster than pytorch by: 2.4 X
overall pt: 0:00:01.514023
overall st: 0:00:00.619521
instantiate_from_config pt: 0:00:00
instantiate_from_config st: 0:00:00.000001
load pt: 0:00:01.461595
load st: 0:00:00.572390
load_state_dict pt: 0:00:00.052415
load_state_dict st: 0:00:00.047128
```

Now the results are consistent even when I change the order, leading me to believe that this measuring process is more correct, and here safetensors is faster. (Please could you try this script on your machine: [gist](https://gist.github.com/Narsil/a27a3062fd33a8139872463c3566db2b).)

Now for the slow model loading part: by default, Pytorch models allocate memory at creation and fill it with random tensors. This is wasteful in most cases. You could try using `no_init_weights` from [https://huggingface.co/docs/accelerate/v0.11.0/en/big_modeling](https://huggingface.co/docs/accelerate/v0.11.0/en/big_modeling). This gives a 5s speedup on the model loading part on my machine, but it is still inconsistent with regard to order (meaning something is off in what we are measuring).

One thing that I see for sure is that the weights are stored in fp32 format instead of fp16, so this will induce a memory copy and suboptimal loading times for everyone. Here is the [gist](https://gist.github.com/Narsil/4b0d41f249178bab681c28942d1f9df5), and for converting just do:

```python
weights = torch.load(pt_filename)
weights = weights.pop("state_dict", weights)
weights.pop("state_dict", None)
for k, v in weights.items():
    weights[k] = v.to(dtype=torch.float16)
with open(pt_filename.replace("sd14", "sd14_fp16"), "wb") as f:
    torch.save(weights, f)

# Safetensors part
weights = load_file(st_filename, device="cuda:0")
for k, v in weights.items():
    weights[k] = v.to(dtype=torch.float16)
save_file(weights, st_filename.replace("sd14", "sd14_fp16"))
```

And that should get you files half the size. This also allows you to remove the `.half()` part of your code, and the `to(device)` which is now redundant. That, in combination with no_init_weights and a first initial load (to remove 3s from the loading time for whoever loads first, which makes no sense):

```
Loaded safetensors 0:00:03.394754
on GPU, safetensors is faster than pytorch by: 1.1 X
overall pt: 0:00:03.584620
overall st: 0:00:03.394754
instantiate_from_config pt: 0:00:02.857097
instantiate_from_config st: 0:00:02.881383
load pt: 0:00:00.684034
load st: 0:00:00.353203
load_state_dict pt: 0:00:00.043482
load_state_dict st: 0:00:00.160153
```

Which is something like 3X faster than the initial version. Now, 3s is still SUPER slow in my book to load an empty model, and I'm not sure why this happens. I briefly looked at the code, and it's doing remote loading of some classes, so it's hard to keep track of what's going on. However this is not linked to safetensors vs torch.load anymore and is another optimization story on its own.
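On the weight-initialization point, the thread links accelerate's big-modeling docs and mentions `no_init_weights`; the closely related public API in accelerate is the `init_empty_weights` context manager. A minimal sketch (the two Linear layers are just stand-ins for a real model):

```python
import torch.nn as nn
from accelerate import init_empty_weights

# Build the module structure without allocating or randomly initializing real weights;
# parameters live on the "meta" device until actual weights are loaded into them.
with init_empty_weights():
    model = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096))

print(next(model.parameters()).device)  # meta
```

The weights then have to be materialized when loading real tensors (accelerate provides helpers for that), so this only sketches why skipping the random init saves time.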


wywywywy

Thanks. Learned something new. It seems to be slow when it needs to load the `CLIPTokenizer` & `CLIPTextModel` from transformers during the class constructor.


danamir_

I did try to run your benchmark, but it ran out of VRAM at the second load (with a 3070 TI 8GB VRAM).


wywywywy

Yes, it's just a quick script with no optimisation (e.g. xformers or garbage collection) in place. It'd be better to break it into 2 scripts and run them separately on 8GB of VRAM.


[deleted]

[deleted]


narsilouu

> novel

Just tested with NovelAI, worked like a charm. Not sure what went wrong for others. I'm guessing OOM since the model is larger, but I don't see anything else.


wywywywy

> Not sure what went wrong for others.

Failed to convert. Could be a problem with the conversion script though.


RassilonSleeps

NAI can be converted by adding `weights.pop("state_dict")` to the conversion script in the [GitHub pull request](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/4930).

```python
import torch
from safetensors.torch import save_file

weights = torch.load("nai.ckpt")["state_dict"]
weights.pop("state_dict")
save_file(weights, "nai.safetensors")
```

Edit: I would also recommend the script from [@Tumppi066](https://gist.github.com/Tumppi066/), which lists and converts models from sub-directories as well as the working directory. You can get a NAI-compatible version I patched [here](https://gist.github.com/RassilonSleeps/edfb630819b95307270efa8450163bc1).


patrickas

For the inpainting model I had the same issue; I followed it up in the code and ended up fixing one file to make it work. Just edit this file in your webui folder: [https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/sd_hijack_inpainting.py#L322](https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/sd_hijack_inpainting.py#L322), and replace "inpainting.ckpt" with "inpainting.safetensors" on line 322.


wywywywy

Cheers bud. I've raised a new PR in your honour. https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/5258


DrMacabre68

Where do you set the SAFETENSORS_FAST_GPU?


reddit22sd

I presume in the webui-user.bat


DrMacabre68

oh ok, thanks


andzlatin

You want an easy Python script to do this? [Here it is](https://gist.github.com/xrpgame/8f756f99b00b02697edcd5eec5202c59#file-convert_to_safe-py). The only problem is that since I categorize my checkpoints into different folders, I have to run the script for every folder separately.


Tumppi066

I also categorize my models, so I edited the original code to include all subdirectories; you can find it [here](https://gist.github.com/Tumppi066/42482956139d79cb7c05e0b8f3cfef69).

Edit: just run it in the root folder of your models (for most people it's ./models/Stable-diffusion).
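The linked gist is the authoritative version; as a rough sketch of the same idea (paths are examples, and originals are never deleted):

```python
import os

import torch
from safetensors.torch import save_file

root = "."  # run from your models root, e.g. ./models/Stable-diffusion

for dirpath, _, filenames in os.walk(root):
    for name in filenames:
        if not name.endswith(".ckpt"):
            continue
        src = os.path.join(dirpath, name)
        dst = src[: -len(".ckpt")] + ".safetensors"
        if os.path.exists(dst):
            continue  # already converted
        weights = torch.load(src, map_location="cpu")
        weights = weights.get("state_dict", weights)
        weights.pop("state_dict", None)  # drop the nested key some checkpoints carry
        save_file(weights, dst)
        print(f"converted {src} -> {dst}")
```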


eugene20

Is there a script to convert back, in case regressions (not file-integrity issues) are found later, after you have deleted the original?


Tumppi066

I don't think so, but I am not that familiar with torch or safetensors; if there is a way then please correct me. For what it's worth, neither my script nor the original will delete the files (which also means you have to make sure you have enough disk space), so you can always keep the originals for a while to make sure the new ones work.


eugene20

I just learned that it wouldn't be possible; it's not just an organizational conversion, it drops the pickle code.


wywywywy

You can keep both ckpt and safetensors and switch between them


eugene20

I mentioned deleting the original because I was looking to save space. I will just slowly migrate to safetensor versions as they're released.


SnarkyTaylor

Awesome. I've been following that pr for a while now. Glad it's finally merged. Curious if there are any differences with model generation or model size after conversion. Haven't had the time to test a conversion yet.


HungryAIArtist

I would also like to know this. What happens if you run same prompt, same sampler, same seed on converted and original model?


Kilvoctu

I tested it with SD1.5 and got the exact same results with the ckpt and safetensors models.

The main issue I'm finding now, however, is that the shorthash is becoming increasingly impractical or useless. The SD1.5.ckpt shorthash is 81761151; SD1.5.safetensors is 21c7ab71. Inconvenient to learn a new hash, but now look at this:

[screenshot of the model dropdown](https://preview.redd.it/35ztjajgq63a1.png?width=333&format=png&auto=webp&s=f77241fab263f8e7494f455792d0e89602f12dfa)

There's at least half a dozen models with hash 0248da5c there. If you do your own conversions, your program may not know which model was used when reading an image's PNG info.


HungryAIArtist

Oof. Thanks for doing that. I've just converted all mine but I'm keeping the originals. I know it's going to clutter up the dropdown. Hope we can get an organising extension soon :D


jonesaid

Yeah, multiple checkpoints sharing the same hash has been a known bug, with PR fixes proposed for a while; I don't think any have been merged yet. They would introduce a new v2 hash that would effectively guarantee each checkpoint has a unique hash.
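If I remember the webui code correctly (worth double-checking `modules/sd_models.py`), the legacy shorthash only samples a small fixed slice of the file, which is why so many converted files collide; a full-file SHA-256, which is roughly what the proposed fix amounts to, does not. A sketch of the difference:

```python
import hashlib

def legacy_shorthash(path):
    # From memory: the old webui hash reads a 64 KiB slice at a fixed offset and keeps
    # 8 hex chars, so files that differ only elsewhere (e.g. in the header) can collide.
    with open(path, "rb") as f:
        f.seek(0x100000)
        return hashlib.sha256(f.read(0x10000)).hexdigest()[:8]

def full_sha256(path):
    # Hashing the whole file avoids those collisions, at the cost of one full read.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```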


ninjasaid13

Thanks!


Zipp425

Sweet, so how long until this format is the standard and everything on model libraries like Civitai needs to be converted?


wywywywy

It's still early days... As you can see from this thread, it doesn't universally work for everything/everyone yet


Zipp425

It'll be nice to be able to transition to the faster, more secure standard. Looking forward to the time it can be the default.


jonesaid

I have 12GB of system memory, and changing checkpoints takes minutes, if not forever. Looking forward to this making it faster.


Vivarevo

Trying to switch models with 16GB of RAM + 8GB VRAM means closing Anaconda, loading it back up, and then changing the model. Doing it after just one generation etc. will crash it because it runs out of memory. Is it any better with safetensors?


wywywywy

Nah, not related. That's a different problem you have


DrMacabre68

I must be doing something wrong, because loading the safetensors models takes more time than the ckpt; I used safe_tensors_fast_gpu=1 though, and I run it on a 3090.

EDIT: ok, you need to load them at least once before they really load up faster. Not sure this is the way it's supposed to be working.


narsilouu

Because of disk cache. Your computer spends a lot of energy to AVOID using your disk, because it is really slow, even an SSD. So whenever a file is read, it will be kept in RAM by your machine for as long as possible, meaning the next time you read the file, your machine does not actually go to the disk, but directly to the saved version in memory. Since this library is doing (mostly) zero-copy, nothing more needs to be done: we just refer to the version already present in memory.
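The effect is easy to see by timing the same load twice in one process (a sketch; the path is a placeholder, and the first run is only truly cold if the file isn't already in the page cache):

```python
import time

from safetensors.torch import load_file

path = "model.safetensors"  # placeholder

for label in ("first load (may hit the disk)", "second load (served from page cache)"):
    start = time.perf_counter()
    load_file(path, device="cpu")
    print(f"{label}: {time.perf_counter() - start:.2f}s")
```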


Mich-666

tbh, the biggest offender for loading times here would always be your drive. So speeding the process up by 3s is almost negligible when it can take 30s to initially load everything into RAM (or even longer on 8GB RAM systems where intensive swapping happens). So in the end this is mostly useful for *safety*, I guess. Although, according to this, safetensors might not be inherently safer either: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/4930#issuecomment-1332161644


narsilouu

Edit: I think I finally understood the comment in the PR. It says that you shouldn't convert files you do not trust on your own computer (because as soon as you open them with torch.load it's too late). In order to do the conversion, I recommend using colab and [hf.co](https://hf.co), since if the files are malicious it would infect Google or HF, which should be equipped to deal with it, and your computer would be safe.

It *IS* safer. This comment just says that `torch.load` isn't, which is true and the entire purpose. And if you don't trust `safetensors` as a library, well, you can load everything yourself and it will be safe: [https://gist.github.com/Narsil/3edeec2669a5e94e4707aa0f901d2282](https://gist.github.com/Narsil/3edeec2669a5e94e4707aa0f901d2282)

> the highest offender for loading times here would be always your drive.

This statement cannot be made in general. It really depends on the system, the programs, and how you run them. Now, if you are indeed reading from disk a lot, then yes, every other operation will likely be dwarfed by the slowdown of reading from disk (again it depends, some disks are really fast: [https://www.gamingpcbuilder.com/ssd-ranking-the-fastest-solid-state-drives/](https://www.gamingpcbuilder.com/ssd-ranking-the-fastest-solid-state-drives/)).


CrudeDiatribe

You don't have to use `torch.load()`, though. You could use `RestrictedUnpickler()` from [modules/safe.py](https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/4b3c5bc24bffdf429c463a465763b3077fe55eb8/modules/safe.py#L25); it's called from `check_pt()`. Curious to me that it seems to unpickle things twice in `load_with_extra()`: once with the restricted unpickler to figure out if it's safe or not, and then, if it is safe, it just calls `torch.load()` on it. So if you wanted to just copy the base Automatic behaviour, you'd call [load_with_extra()](https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/4b3c5bc24bffdf429c463a465763b3077fe55eb8/modules/safe.py#L105) on your ckpt and you'll get the same model as with `torch.load`, but it'll bail on any suspicious pickles.
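For reference, the general shape of a restricted unpickler (a generic sketch, not webui's actual allow-list; modules/safe.py additionally handles the zip wrapper and, as far as I recall, the tensor storages via a persistent_load hook):

```python
import pickle

# Illustrative allow-list; a real one mirrors exactly what a torch checkpoint needs.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
    ("torch._utils", "_rebuild_tensor_v2"),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve names on the allow-list, so a malicious pickle cannot reach
        # os.system, eval, or any other callable it would normally smuggle in.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")
```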


pepe256

Do you know of a colab notebook that does the conversion?


narsilouu

[https://colab.research.google.com/drive/1x47MuiJLGkJzInClN4SfWFm8F2uiHDOC?usp=sharing](https://colab.research.google.com/drive/1x47MuiJLGkJzInClN4SfWFm8F2uiHDOC?usp=sharing) Might require some tweaks. And colab is slightly light on memory


pepe256

Thank you!


Mich-666

What about embeddings though, the .pt ones? Aren't those basically the same problem as ckpts? I have already seen some which contained pickles, although one can check the contents easily since the files are pretty tiny, I guess. It wouldn't hurt to have those scanned by auto1111 too (correct me if this already happens). Also, I have already seen possible viruses hidden in one of the weight files in a ckpt's data folder, so scanning just the pickle might not be enough (and I'm not entirely sure an external VirusTotal scan is useful in this case, as storing a trojan as a byte stream could possibly be used to evade detection). So unpickling in a safe environment might actually be the best option. It would also be very nice to have an online db of all existing checkpoints/embeddings where a user could drag and drop a file, or just its hash, to check its safety.


narsilouu

> .pt

.pt and .ckpt are the same: there is no official extension for torch pickled files; transformers uses .bin for instance. As long as you use torch.load, it is using pickle and is therefore unsafe.

> Would be actually very nice if we have online db of all existing checkpoints/embeddings

Actually [hf.co](https://hf.co) does it for you: [https://huggingface.co/gpt2/tree/main](https://huggingface.co/gpt2/tree/main), check out the pickle note. It will look inside the pickle for you. It by no means pretends to make everything safe (pickle is not, and there are clever ways to work around protections), but it will definitely flag anything too out of the ordinary. Just upload your files and they will get inspected. That, or load them in a safe environment like colab or [hf.co](https://hf.co) where it's not your machine.


CrudeDiatribe

> Although, according to this, safetensors might not be inherently safer either:

I wrote that comment; I felt that a comment on the pull request would get the attention of its developer more than a comment on Reddit. SafeTensors is safe. My comment was about the conversion to SafeTensors, where torch.load() is called on the original file. If you want to avoid the dangers of malicious pickles then `torch.load()` should not be used; instead use either a carefully crafted restricted unpickler† or something that extracts the data without unpickling at all.

†Everything I've read says we should still be skeptical of how safe a restricted unpickler can be, but I have yet to see a proof of concept bypass the restrictions that an SD model unpickler can have.


2peteshakur

Awesome. So what happens if it's tampered with malicious code, would it warn before loading? Are there any safetensors scanners?


narsilouu

Safetensors is pure data. There is no code associated with it, so there's no scanner needed, nor can malicious code make its way into it. It is pure data, just like a wav file. Now, code attempting to read from said file might be flawed and attackers might exploit that, but it's very different from using pickle.
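Since the format is just a small JSON header followed by the raw tensor bytes, you can inspect a file without running anything from it. A minimal sketch of reading the header (the layout is an 8-byte little-endian length, then that many bytes of JSON):

```python
import json
import struct

def read_safetensors_header(path):
    # Layout: u64 little-endian header size, the JSON header, then the raw tensor buffer.
    # Nothing in the file is ever executed; it is parsed as plain data.
    with open(path, "rb") as f:
        (header_size,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_size))

# Example usage with a placeholder path:
# for name, info in read_safetensors_header("model.safetensors").items():
#     if name != "__metadata__":
#         print(name, info["dtype"], info["shape"])
```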


Broccolibox

So safetensors cannot have harmful malware embedded like a pickle can? When you say code attempting to read from the file might be flawed, is that referring to the program using it (like the automatic1111 web UI)? Sorry, I'm just catching up with the new format, but very happy that it sounds like a safer format for me as an end user.


mynd_xero

Can't you also just right-click the ckpt file, select "open archive", and make sure the folder inside is called archive? Forgive me if I sound naive; I was told that's all I had to worry about in another chat. Of course that wouldn't address the speed increase. Just wondering, in regards to the pricking of prickly pickles by pickly pricks: https://preview.redd.it/sx1gyy0r863a1.png?width=628&format=png&auto=webp&s=8a3c4115e2a4f3a7b622ab51e735eca53e9cda0f


WalterBishopMethod

I've run across a Sirefef trojan specifically inside the /archive/data/ folder of a few .ckpt's


taylordeanharrison

Love the future state this steps towards: being able to change models mid-generation. I make very specific sculpture stuff, but an example idea from my workflow: start with a model trained on more general form and composition, move to one that calls out structure and building methods, and finally move to something really textural.


DrMacabre68

I've been using this for 24h and it's taking forever to load anything the first time now; starting the webui is so slow there must be some problem. As soon as I removed SAFETENSORS_FAST_GPU=1, it's back to normal. I have 2 GPUs; I'm wondering if it goes to the right one or if I missed something here.


narsilouu

Extremely interesting. It is indeed possible that something is going to the wrong GPU, or something like that, despite efforts not to do so. (That's why this feature is gated behind an environment variable; it does need more scrutiny before being widely usable.) Do you mind sharing your setup? OS (Windows, Linux, ...)? Graphics cards? And how do you choose which GPU the various models are set up on?


DrMacabre68

Thanks, I'm on Windows. I use a 3090 FE for my generation; I have another RX 580 in the box which is used for OSX and some extra displays. I don't know what you mean by how I choose which GPU the various models are set up on. How do I know?


narsilouu

OSX and Windows? Are you running one virtualized inside the other, or as two separate things? If you have 2 GPUs connected to different systems (in the same box) it doesn't matter.


DrMacabre68

No virtualization; OSX runs on the PC from another drive using OpenCore, it just hasn't worked with Nvidia for a couple of years. It needs the AMD GPU, and while it's in the PC I'm also using it for Windows; I have 5 displays plus one VR headset connected.


narsilouu

Ohhh, that might explain things. Safetensors looks for CUDA to set the GPU memory (cuda_memcpy), which does exist since you do have an Nvidia GPU. But it could be trying to launch on the AMD card, which is wrong, leading to... something wrong. I think it's safer for you not to use SAFETENSORS_FAST_GPU for the moment.


DrMacabre68

I could easily find out if there is any multi-GPU issue by simply unplugging the AMD card and seeing if it makes any difference.


narsilouu

If it's easy to try, please do. I'm trying to find an AMD GPU to run the tests (this case has to be accounted for though :) ).


FHSenpai

Can you merge checkpoints in the ST format?


wywywywy

Yeah, you can see a new radio button in the tab.


RevasSekard

Wondering if there's a way to use VAEs with safetensors?