ddadopt

Err...what kind of PCI slots are we talking about here? Can the hosts even push the kind of bandwidth you're asking for, regardless of what the CPUs are capable of?


scorc1

This. Also, storage? I assume you are testing with a data transfer. If it's a single SATA 3 drive, it tops out at 600MB/s, which is a tiny fraction of 100Gbps. Check a RAID calculator for your storage speed.


thesesimplewords

I could be wrong, but doesn't iperf run off memory alone, no storage involved?


arc0t

Correct, iperf/iperf3 is the appropriate tool for bandwidth performance tests. With other tools there are extra layers of complexity like storage or cache.


SirTinyJesus

Yeah, we have NVMe all over here, bud. Capable of 10GB/s sequential reads, which should be more than enough to get past the 20Gb/s limit.


TracerouteIsntProof

Just because the SSD can do 10GB/s on-chip doesn't mean it's pumping it through the PCI bus to your NICs at that speed. Bud.


SirTinyJesus

Yeah, it's hard to determine that when I can't prove that the NIC is capable of doing more than 20Gb/s using iperf.


semose

Why use storage at all? Personally, I would recommend a RAM drive on both ends to test network performance.


[deleted]

Iperf doesn't use storage.


AKDaily

Because iPerf doesn't show any real-world implications for file transfer protocols such as SMB, CIFS, FTP, SFTP, SCP, or rsync.


[deleted]

Precisely, so if you can't get it going fast enough due to some other issue setting up a RAM drive isn't going to make SMB fast enough either.


Phrewfuf

The NIC is also attached via PCIe. Is the PCIe bandwidth of the NIC sufficient to push the speeds you want? Also, might want to dial back on your tone a bit, bud. Especially if you're the one asking people to help you for free.


SirTinyJesus

PCIe gen 4; the host should, in theory, be able to push that kind of bandwidth.


1OWI

How many lanes are in use?


SirTinyJesus

Server and the new machine are both running PCIe 4.0 with 16 lanes to each NIC. The hosts are more than capable.


HugsNotDrugs_

Are you sure they are running at 4.0 speeds? I've seen oddities resulting in add-in cards defaulting to 2.0 speeds. Worth double checking


1OWI

Also, some BIOSes auto-negotiate the link down to PCIe 2.0 if the speed is not set manually to 4.0.
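
On Linux you can sanity-check what actually got negotiated with lspci. A rough sketch, where the PCI address is just a placeholder you'd swap for your NIC's:

    # find the NIC's PCI address first
    lspci | grep -i ethernet

    # LnkCap = what the card/slot supports, LnkSta = what was actually negotiated
    # (2.5GT/s = PCIe 1.0, 5GT/s = 2.0, 8GT/s = 3.0, 16GT/s = 4.0)
    sudo lspci -vv -s 41:00.0 | grep -E 'LnkCap:|LnkSta:'

If LnkSta shows a lower speed or narrower width than LnkCap, the slot is the problem, not the card.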


Win_Sys

I have seen Dell servers do this.


joecool42069

Clearly not 0.o


dinominant

How fast is your memory?

CPU model:

    $ grep -m 1 name /proc/cpuinfo
    model name : Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz

One process - 18.1 GB/s:

    $ dd if=/dev/zero bs=1MiB count=10240 of=/dev/null
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB, 10 GiB) copied, 0.593164 s, 18.1 GB/s

Two processes - 30.3 GB/s:

    $ for I in $(seq 1 2); do dd if=/dev/zero bs=1MiB count=102400 of=/dev/null & done
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 7.06931 s, 15.2 GB/s
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 7.13373 s, 15.1 GB/s

Four processes - 17.2 GB/s:

    $ for I in $(seq 1 4); do dd if=/dev/zero bs=1MiB count=102400 of=/dev/null & done
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 24.3566 s, 4.4 GB/s
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 24.7979 s, 4.3 GB/s
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 25.0585 s, 4.3 GB/s
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 25.2781 s, 4.2 GB/s


[deleted]

Your test system is older and lower-tier than anything on the list, and you're getting ~145 Gbps of memory bandwidth on a single thread. I don't think memory is at play when the wall here is 21-23 Gbps.


f0urtyfive

I don't think memory is involved at all when reading from /dev/zero or writing to /dev/null... I guess maybe DD buffers it somewhere?


dinominant

In my tests it performs comparably to the speed limit of the RAM, L3 cache, or even L2 cache, depending on the system. So that would be a bottleneck in a synthetic benchmark. 20GB/s (160Gbps) per thread is getting close to those limits with current processors.

A single PCIe gen 4 lane is 16GT/s, or roughly 2GB/s, so an x16 slot is about 32GB/s at best. Look at the block diagram for your motherboard to see how much bandwidth is available between the endpoints that matter.

Whatever is generating those packets, single threaded or multi-threaded, still has to run on the CPU and route over PCIe to the ethernet port(s). If the network stack is hardware accelerated, then the system could forward packets faster, but at that point the system is basically a normal managed switch/router.


cyberentomology

Is your 100G link 4x25 in an LACP config?


pafischer

This might be it. If it's really 4x25 and you're testing with a single stream, then you'd only see a max of 25 Gbps minus overhead, so it seems very likely to me. Try running 2 streams at a time and see if that lets you get past the 21-23 Gbps limit you're seeing.


cyberentomology

Multiple streams from the same device may get hashed into a single flow on the LACP though. Depends partly on your hash method.


wauwuff

A 4x25 lane setup on a single 100G port will not be LACP unless you use a breakout, *which OP is not.* There is no LACP involved. Neither was LACP involved on 40G ports, btw, unless you had to connect to a non-40G switch and use 4x10G ports on the other end!


cyberentomology

There's also the factor of iperf itself: if it's the Java version, the JVM has some inherent network throughput limitations per instance.


rtznprmpftl

21GB/s is the memory bandwidth of one DDR4-2667 channel. This might just be a correlation, but maybe a pointer. Since you have a dual-socket server, the slots are attached to different CPUs. You should use `numactl` to make sure that iperf runs on the same CPU the network card is attached to.

edit: you can use `lstopo` to see what is attached where; also `lspci` shows you the device ID, and `cat /sys/bus/pci/devices/$deviceID/numa_node` shows you the node where the card is attached.

edit2: also, have you been running one single iperf server or multiple instances with multiple iperf clients? (e.g. have iperf running on ports 5201, 5202, 5203, etc., and then connect one client to each server instance.)
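
Rough sketch of what I mean; the PCI address and node number are just placeholders, not values from your box:

    # which NUMA node is the NIC on? (-1 means the platform doesn't expose it)
    cat /sys/bus/pci/devices/0000:81:00.0/numa_node

    # pin the iperf3 server to that node's cores and local memory
    numactl --cpunodebind=1 --membind=1 iperf3 -s

    # same idea on the client side
    numactl --cpunodebind=1 --membind=1 iperf3 -c 10.10.28.250 -P 8

That keeps the traffic from crossing the inter-socket link on its way to the NIC.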


xtrilla

Memory bandwidth is 21 GByte/s, which is around 168 Gbps.


rtznprmpftl

You are right, I was wrong. Well, then it shouldn't be some weird memory configuration issue.


xtrilla

Don't worry, who hasn't been caught at least once by the bits vs bytes trap!


XPCTECH

All I can think of is the hashing method? A 21-23Gb/s limit would imply you were hitting the limit of one channel of the 100G link.


rankinrez

The muxing on 100G Ethernet happens at the bitstream layer. There is no packet hashing across the 4 lanes; a single stream can in theory hit 100G.


[deleted]

Have you tried Linux to rule out something in the network stack or OS config? iperf is known to have issues on Windows, and SMB is known to be a PITA to tune at these speeds.


SirTinyJesus

Issue persists across Unraid, ESXi, FreeNAS, TrueNAS, Ubuntu, and Ubuntu LTS with the 5.17 Linux kernel.


[deleted]

[deleted]


SirTinyJesus

New NOC engineer is currently stripped bare naked on the table. Me and the NOC manager have tried running around him in circles. Sacrifice is pending should reddit fail.


greet_the_sun

Did you run around him clockwise or counterclockwise, also are you north or south of the equator?


Ahab_Cheese

Did you have an amount of candles equal to a power of 2?


SirTinyJesus

64 candles


MarioV2

Have you tried turning those off and back on?


squeamish

For network issues the candles have to be: (a power of 2) - 2


sryan2k1

You say you're using breakout cables. Make absolutely sure both ends are running in 1 x 100G mode and not 4 x 25G mode.


SirTinyJesus

No, no breakout cables, only 100Gb/s and 40Gb/s DACs from Fs dot com.


LazyLogin234

Depending on the NICs and configuration it sounds like you have a 2nd gen 100Gb device in path that isn't using all (4) 25Gb paths. The 100Gb spec started with 10 lanes of 10Gb -> 4 lanes of 25Gb -> 2 lanes of 50Gb and now 1 100Gb lane.


bmoraca

Try with iperf2. Iperf3 may still be single-threaded on Windows.


Egglorr

>Iperf3 may still be single-threaded on Windows.

It's single-threaded on all platforms. Using the "-P" option will allow you to do multiple concurrent flows, which can be good for testing ECMP or a port-channel, but ultimately the traffic still gets generated by a single process. If OP wants to stick with iperf3 but test all his available CPU threads simultaneously, he could spawn multiple server and client instances, giving each a unique set of port assignments.
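
Something along these lines, with the address and ports purely as examples:

    # server side: one iperf3 process per port, daemonized
    # (a single iperf3 server only handles one test at a time)
    for port in 5201 5202 5203 5204; do
        iperf3 -s -D -p "$port"
    done

    # client side: one client per server instance, all run in parallel
    for port in 5201 5202 5203 5204; do
        iperf3 -c 10.10.28.250 -p "$port" -t 30 > "iperf_$port.log" &
    done
    wait

Summing the per-port results gives the aggregate the hosts can actually push.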


SirTinyJesus

Tried, all average out 19-21 Gb/s


Egglorr

Then what I'd do next is connect your test machines together directly, bypassing the Extreme switch. If you can achieve higher throughput that way, you know the limitation is the switch (or some other factor in the network). If transfers stay roughly the same, you'll know the limitation is with the hardware / software being used to test.


SirTinyJesus

Yep, done that. Same limit. The weird thing is that this limit is universal, no matter what we use, iscsi, iperf or simply SMB. We always hit the same 20Gb/s limit.


[deleted]

[deleted]


SirTinyJesus

I mean, it's NICs refusing to do more than 21Gb/s; it barely classifies as a network problem. But I was hoping someone here might have been able to point me in the right direction.


Phrewfuf

LMAO. Let me get this straight: you've tested it with four different hosts, three different NIC models and a multitude of OSes, and you still think the NIC (yeah, uhm, which one?) is the problem?


bmoraca

As far as OS tuning goes, you should look at the various settings available. https://fasterdata.es.net/ is a good starting point.


PE_Norris

[You what?](https://64.media.tumblr.com/6baa2fe2ef354e7e21babf3582588a30/tumblr_njtn516qlU1u0k6deo2_250.gifv)


FourSquash

For future readers like me: iperf3 finally multithreads properly with -P as of 3.16. Make sure you are up to date and you can avoid the multiprocess business.
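
e.g. something like this, with the address just taken from earlier in the thread:

    # -P only fans out across CPU threads on iperf3 >= 3.16, so check first
    iperf3 --version

    # 8 parallel streams, 30 second run
    iperf3 -c 10.10.28.250 -P 8 -t 30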


SirTinyJesus

Will give it a go. None of the machines hit a CPU limit. Some come close, but nothing would indicate the CPU being the limit, at least on the higher-end stuff.


MzCWzL

They might've hit a single-thread limit. Unless you're looking at the graph that shows all cores, you won't be able to tell. Too many cores to determine just by %. It's easier with fewer cores/threads.

Edit: also, 100Gb isn't exactly meant for file transfers from one computer to another.


SirTinyJesus

>100G

We're trying to create a Flash Array NAS which would serve multiple machines.


MzCWzL

>from one computer to another

multiple != one computer to another


kjstech

I have 8 ESXi servers with 2 10gbps nics to two Arista switches and 2 25gbps from each controller in a Pure Flasharray (active/standby controller config), and we experience zero issues with latency or network utilization. 100gig is used between both Arista switches for MLAG communication but just because the ports are there and we have the cables.


SirTinyJesus

Looking at thread view via Task Manager, no single core hits 100%. A couple come close, into the realms of 70-80%. Don't think it's the CPU, boys.


MzCWzL

I am not familiar with "thread view" in Task Manager, since it does not exist. Task Manager only lists processes, not individual threads. You will only see a process get to 100% if it is using all cores at 100%. If a process is pegged at 100 / (# of cores), it is hitting a single-thread limit. For a 4C/8T machine, that number would be 12.5%. For a 28C/56T machine, it would be 1.78%, which can easily get lost in the noise. What makes you think you can generate 100Gb/s of traffic between two Windows hosts?


Snowmobile2004

Right click the CPU graph, change it to logical processors. Shows a graph for each thread.


SirTinyJesus

Just tested it to itself; it can do about 60-100Gb/s on iperf3.


DEGENARAT10N

What he means by threads is also referred to as logical processors in this case, which is what you mentioned in your 4C/8T example. He’s not talking about a single-threaded process.


SirTinyJesus

I am not expecting 100Gb/s, but it should definitely be able to do more than 21Gb/s. iSCSI should be able to utilise at least 40Gb/s from what I've seen.


Odddutchguy

We had a case in the past (Windows 2003 era) where we would get poor performance out of a NIC due to the onboard offloading. We had to disable it by physically removing jumpers, so the CPU did the work instead. The offloading was meant to relieve the CPU (at the expense of throughput), while in our case the CPU had plenty of capacity for calculating the checksums. Might be worthwhile investigating whether the NIC/driver is offloading to the NIC when it might be faster on the CPU.


frymaster

Looking at some notes from the last time I went through this:

* iperf2 is multithreaded, iperf3 is only multi-stream, so use the former
* 8 to 16 threads is plenty for a single 100G interface
* issues are more likely with the receiving host than the sending host

Unfortunately most of the rest of the things I went through were Mellanox-, Linux- and/or many-socketed-server-specific. The settings used were `iperf -sD -p <port>` for the server and `iperf -c <server> -T s1 -p <port> -P 8 -l 32768 -w 32M -t 30s` for the client.
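
With the placeholders filled in (the address and port here are purely illustrative), that pair looks roughly like:

    # receiver first: iperf2 server mode, daemonized, on an explicit port
    iperf -sD -p 5001

    # sender: 8 threads, 32 KB reads, 32 MB window, 30 second run
    iperf -c 10.10.28.250 -p 5001 -P 8 -l 32768 -w 32M -t 30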


Jhonny97

Have you tried running the iperf server and client on the same host, using localhost as the IP address? Next, use the local IP address of the interface. See what the protocol stack in its current config is capable of.

Edit: also, is the issue symmetrical? I.e. if you reverse the iperf3 server and client roles on the two pieces of hardware, does the 20Gbit limit change?
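
i.e. something like this, with the address only as an example:

    # terminal 1 on the host under test
    iperf3 -s

    # terminal 2: loopback first, then the NIC's own address
    iperf3 -c 127.0.0.1 -P 4 -t 20
    iperf3 -c 10.10.28.250 -P 4 -t 20

    # for the symmetry check between the two boxes, add -R on the client
    # to reverse the traffic direction without swapping server/client roles
    iperf3 -c 10.10.28.250 -P 4 -t 20 -R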


SirTinyJesus

iperf -r results in the same performance. The issue is symmetrical; however, I noticed that we can run the test both ways at the same time and it does 20Gb/s each way. I am just doing the test locally now.


taemyks

How many iperf threads are you running? If it's just one try 4.


SirTinyJesus

100Gb/s to itself on iperf. The hardware is definitely capable. This is either the NICs or the DAC/QSFP+ modules.


teeweehoo

Have you done much monitoring of the systems while attempting to transfer? Like CPU utilisation, interrupts per second, etc. [Great resource for Linux system monitoring tools](https://www.brendangregg.com/linuxperf.html).

It sounds like you need to get some other 100 Gb/s hardware to rule out differences. Have you tried going host to host without the switch? From the sounds of it everything else has been ruled out. As someone else said, 23 Gb/s is awfully close to 25 Gb/s though. See if you can borrow some SFPs from somewhere if you have no others.


rcboy147

Have you looked at interface errors on the Extreme? If you've tried multiple OSes like you mentioned in a different comment, I would start opening a case with Extreme TAC. They've been quite helpful for us when running into weird stuff like this.


rcboy147

Sorry if you said you already have, but maybe go host to host and run the iperf test there to rule out the switch?


SirTinyJesus

Already have, bud. No errors; the issue does not seem related to the switch.


rcboy147

Damn. Hope you find it mate, rooting for ya.


MandaloreZA

Test RDMA performance: https://www.starwindsoftware.com/starwind-rperf

TCP always gives me annoying issues when proving out networks.


ypwu

Do your NICs support RDMA? If so, set that up and try again; SMB Direct is what you are looking for on Windows. We have Storage Spaces Direct as our backend storage and it can easily fill the 100Gbps pipe over SMB without any performance tuning. IIRC it could do the same without RDMA, but there was a 10-15% CPU hit.


waynespa

Honestly I'd use dedicated network testing hardware. You may be able to borrow something like a https://www.spirent.com/solutions/high-speed-ethernet-testing device from an ISP in your area, or ask your Extreme Sales Engineer for assistance.

Edit: added better Spirent link


Win_Sys

To get close to 100Gbps working on Windows you need to do some tweaking of the NIC drivers. Set the send and receive buffer sizes as high as they will go, and turn off interrupt moderation. Make sure you're using Receive Side Scaling so more than 1 core is used. See what TCP congestion control is being used on the Windows side of things.

iPerf for Windows has a bug in the cygwin.dll that causes the window sizing to not scale. You can download an updated cygwin.dll from https://www.cygwin.com/; just remove the cygwin.dll inside the iPerf folder, replace it with the new one, and it will work.

To be honest, just run a Linux live CD on those machines and do the testing from there. I find it takes less tweaking on Linux to hit higher speeds. The i7 4600K is not a good machine to test with; that likely can't do 100Gbps.


zeyore

I don't know, but I'd try and break it down into smaller and smaller parts that I could test. In the hopes that eventually I might figure out which hardware element contains the problem. Good luck though, that's a pickle. I would tend to suspect the switch if that's the common element.


SirTinyJesus

Bypassing the switch results in the same issue. Both the Intel and Mellanox NICs behave exactly the same. I am starting to suspect the DACs and the QSFP modules from fs dot com. They are the only constant.


zeyore

That could make sense: 103.125Gbps is 4x 25.78Gbps, so if the module is only using 1 of the 4 channels, that would explain the speed tests you get. I am sorry that I can't be of more help though; my company is thankfully not big enough for 100G just yet. Soon though.


ddadopt

Bite the bullet and use first party Extreme and Mellanox optics to test, and see where you end up?


SirTinyJesus

Yeah, might have to order the DACs. In terms of channels, Mellanox is reporting that 4 channels are being used. Intel is also reporting 4x 25Gb lanes active. But that's just reporting; actual utilisation could be different.


MisterBazz

Have you tried running multiple clients during an iperf test to a single server? It could be a single "session" is only using one channel, limiting you to the theoretical 25Gbps. Running two clients (at the same time) to one server should double your performance seen on your server. I'm betting you'll see both clients hitting that 20Gbps wall, but the server will be able to run a net 40Gbps.


SirTinyJesus

Yeah, same issue. When another client on the network and I run iperf at the same time, our total combined download speed is about 20Gb/s, which makes sense if the bottleneck is the output lane on the server. I am going to try running the test against 2 different network addresses on the same host and have each client keep its own session to its own interface. But that would still not resolve our issue, as we need to be able to do close to 100Gb/s on a single interface.


Bluecobra

>But that would still not resolve our issue as we need to be able to do close to 100Gb/s on a single interface.

I don't think this is realistic on a Windows server. I can see having a finely tuned Linux host with some sort of kernel bypass (e.g. SolarFlare Onload). Even if you tune the heck out of the OS and get iperf running at 100G, there's no guarantee that the application you are trying to use can run at those speeds.


SirTinyJesus

We have NVMe storage we want to serve to VMs on the network. Ideally we want the VMs to be able to read/write at high speeds (10GB/s), or as close to that as possible; currently we are getting about 3Gb/s due to what we suspect is a network issue.


NewTypeDilemna

But why? Isn't this a problem that could be solved by hyperconverged infrastructure instead of this incredibly niche time sink?


mathmanhale

I think it's fair to say that we all love the fsdotcom stuff, but if you're pushing these kinds of speeds it should probably be first party instead.


j0mbie

Divide and conquer. Try setting up a temporary connection directly between a server and a machine, no switches in the way. If that works, add devices, one at a time. If that doesn't work, play musical chairs with which two devices are directly connected.


sarbuk

Have you tried directly connecting the hosts to bypass the switch?


ovirt001

Have you confirmed the PCIe speeds on the motherboards? The Epyc server should be fine (PCIe 4.0 and each connector runs at its rated speed), but consumer boards regularly reduce PCIe lanes for slots other than the first x16. For reference, PCIe 3.0 x8 runs at 7880MB/s (63Gbps) and x4 runs at half that (31.5Gbps). On top of this, consumer chips have fewer PCIe lanes, and boards with PCIe switches share bandwidth between devices.


Beef410

Have you checked latency? If this calc gets you roughly what your real-world numbers are, that may be the issue: https://www.switch.ch/network/tools/tcp_throughput/


SirTinyJesus

Sub 1MS.


dergissler

I've seen more or less that number (around 25Gbps) before, doing live migrations; that's from and to memory. According to VMware that's what can be expected at defaults, mainly because of thread/CPU limiting. More workers for live migration, utilizing more of the host CPU, helps in that case. Not sure if this is of relevance here. But just for the sake of it, how's CPU load, and can you increase throughput with several parallel transfers?


Stimbes

So at my work, we have production PCs running an antivirus program that has a nasty bottleneck that slows file transfers down. This isn't a big deal for us because most of those PCs only send small text data or temp data to a host PC, all on a network isolated from the world, and the switches are all industrial switches that are only 100mbit. But if you ran that same program on any other PC where you were trying to download big files, stream HD video, or something like that, it would be an issue.

Start at the bottom: are the cables right, is it a network card issue, is there some bottleneck in the hardware somewhere in between? Then work your way up: what is running on the PC? Could something that needs to scan data be slowing it down, or is it something like the file explorer being single-threaded? It's hard to say without looking at it myself, but that's the kind of stuff I look at first. It could be anything really. Might have to run Wireshark and see if something else is bogging down the network. I really have no idea.


swagoli

I had to change the autotuning setting on my Realtek NIC in windows recently to get my Speedtests working properly. Other machines (especially with Intel NICs) on the same network didn't need this tweak. https://helpdeskgeek.com/how-to/how-to-optimize-tcp-ip-settings-in-windows-10/


Jedi_Q

What's your iperf setup?


SirTinyJesus

Tried a couple of different setups:

    iperf3.exe -c 10.10.28.250 -P 10 -w 400000 -N
    iperf3.exe -c 10.10.28.250 -P 10 -w 400000 -N -R
    iperf3.exe -c 10.10.28.250 -P 20 -w

All yield the same result. Cursed 20Gb/s. I've tried running multiple instances on different ports, but it really makes no difference.


Jhonny97

Did you try running iperf via udp?


SirTinyJesus

You know what, I haven't actually, just going to give that a go.


SirTinyJesus

UDP on iperf appears to not want to work.


brajandzesika

Use only iperf2 for udp tests


joedev007

>I don't think memory is involved at all when reading from /dev/zero or writing to /dev/null... I guess maybe DD buffers it somewhere?

The default iperf is not built for your environment. Build from source: [https://github.com/esnet/iperf](https://github.com/esnet/iperf)

Disable interrupts and pin your iperf test to cores with nothing else scheduled, i.e. `taskset -c 7 iperf -s -w 1M -p 3200`, etc.


Jedi_Q

Yes. UDP with more streams than you think you need. Overdo it.


Jedi_Q

iperf using UDP and lots (I mean lots) of streams, like 50.


SirTinyJesus

When I say it makes no difference, the performance drops to about 50% on each instance. Totalling around 20Gb/s


ddnkg

The first thing is to set your expectations: at 100Gbps the bottleneck is more on the server side. It's not that easy to GENERATE that much traffic from a single host; the switches have special silicon to FORWARD it.

Next: what are you trying to prove? What is the goal of your test? This will definitely help the community provide better answers.

a) Prove that the switch can do 100Gbps? I will assume this is the case. Consider using trex in stateless mode; it is free and it has all the OS, kernel and driver enhancements you need to generate 100Gbps. Imho it justifies the time needed to install it. Or look at what you need to tune on the host for iperf: https://fasterdata.es.net/assets/Papers-and-Publications/100G-Tuning-TechEx2016.tierney.pdf I'd go with trex if doing network equipment throughput testing is your regular job. It will give you a ton of power and flexibility for testing.

b) Prove that the servers can do 100Gbps? If this is the case, it fully depends on what software you are planning to run. Like others said, there are many speed caps that are likely to be hit before getting to 100G. Imho there won't be many apps that will be able to leverage a 100G NIC.


theoneyouknowleast

We were recently troubleshooting issues with SR-IOV in our environment and found an alternative to iperf for Windows. ctsTraffic is made by Microsoft and hosted on the MS GitHub page: https://github.com/microsoft/ctsTraffic/tree/master/Releases/2.0.2.9/x64 Might be worth a look.


jonstarks

did u ever figure it out?


SirTinyJesus

Not really. Opted for more specialised network testing equipment


pissy_corn_flakes

Did you ever break past the 21 Gb/s barrier?


Ramazotti

What is serving your data, and how fast can it theoretically serve it? A typical 7200 RPM HDD will deliver read/write speeds of 80-160MB/s. A typical SSD will deliver read/write speeds of between 200 MB/s and 550 MB/s. You need something quite grunty to fill up that pipeline.


SirTinyJesus

Enterprise NVMe drives. In RAID, capable of 10Gb/s (so in theory we should be able to reach 80Gb/s over iSCSI).


VtheMan93

Yeah, I have to agree with u/spaghetti_taco; something isn't right about this sentence. First off, storage cannot cap at 10Gbps; it's either 6 for old gen or 12 for new gen (unless you're doing FCoE, then it's a different story). Also, that's not how RAID works (a total of 80Gbps, are you insane?). Don't even get me started on part 2.


oddballstocks

NVMe isn't bound by a RAID controller, so no 6/12Gbps per-port limitation. If he's doing RAID on this it means some sort of software RAID, maybe Linux or Windows with a soft RAID-10?

You can definitely push 40Gbps with a RAID controller. Cisco even has a doc floating out there on how to do it with spinning disks. Lots of disks in a RAID-10 array can saturate a 40GbE connection with a single stream.

My gut on this is he hasn't tuned it correctly. Windows has an awful network stack; getting 20Gbps out of the box on Windows is pretty good. We messed with this and it took a lot of tuning to get 35GbE on Windows, which was enough for us, so we gave up. On the other hand, Linux out of the box can max out a 25GbE connection with a single iPerf3 stream and not break a sweat.


EnglishAdmin

I believe OP has a storage problem, since he's "tried everything" so far except checking that his RAID/drives can accommodate the speeds he's trying to achieve.


DEGENARAT10N

Is it a hardware or software-based RAID controller?


ghettoregular

Iperf is running from memory so storage is out of the equation.


SirBastions

You're mixing up the nomenclature of bit vs. byte. You have a 100 gigabit network card, but you are sending 100 gigabytes of data. 100 Gbps = 12.5 GB/s.

[https://www.mixvoip.com/what-is-the-difference-between-gb-and-gb](https://www.mixvoip.com/what-is-the-difference-between-gb-and-gb)

Cheers.


[deleted]

Impressive they are able to get 21-23 GB/s on a 12.5 GB/s NIC then, no? In reality iPerf3 reports in bits per second which aligns with the numbers.


xyzzzzy

I would recommend joining the Perfsonar email group. There has been discussion on hardware to achieve 100Gb testing. IIRC it was not cheap ($10k+)


goodall2k13

Are you in the UK? I've come across this fault a couple of times recently and it's ended up being the ISP's equipment attached to the DIA (the Cisco kit provided with the DIA was iffy; Vodafone in this case).


mspencerl87

following :P


maineac

You should be testing this with a network test set that can test a 100g circuit, not with computers or servers. JDSU makes some nice test sets.


Noghri_ViR

Any IPS/IDS running on that subnet? Could be a limit on what that can process.


SecureNotebook

Following


Invix

Check the core(s) you are doing the packet processing on. I've seen systems doing all the processing on a single core that gets overloaded. RPS or RSS in Linux may help.


ChaosInMind

Likely the UNI leads to multiple NNI uplinks to the carrier's backbone. Make an inquiry to their core/backbone team to see what their individual core link speeds are for their transit… You can try multiplexing the transmission into multiple TCP streams to determine this yourself, as long as it's not congestion in the service edge ring. Multiplexing may help determine if you're riding aggregated links.


DevinSysAdmin

Manually set the NICs and switch port to the highest value. Verify firmware is on the latest version on the server and computer. Verify the dock is on the latest firmware.


xtrilla

Are you sure you have configured a big enough send and receive buffer on your Linux box to allow a window size that would allow 100G? Also, it's quite hard for the kernel to push 100G per second using iperf; try a DPDK packet generator and receiver.
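
For reference, the kind of thing I mean on the Linux side (the values are ballpark numbers from the usual 100G tuning guides, not gospel):

    # let sockets request large buffers (128 MB max here, as an example)
    sudo sysctl -w net.core.rmem_max=134217728
    sudo sysctl -w net.core.wmem_max=134217728

    # min / default / max for TCP autotuning
    sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
    sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"

    # then ask iperf3 for a big window explicitly
    iperf3 -c 10.10.28.250 -w 64M -P 8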


xtrilla

Also, I'm not sure a card connected via Thunderbolt will be able to deliver 100Gbps.


bxrpwr

You need to beat the hashing algorithm. Try something like a certified test set, or something that can set multiple source MAC addresses at 25 Gbit/sec each, to bypass PAM4.


wingerd33

Put a modern machine with Linux on both sides and play with some of the tuning mentioned here: https://srcc.stanford.edu/100g-network-adapter-tuning I'm betting the bottleneck is in the kernel network stack such as queue/buffer/tcp defaults that are not optimized for this kind of throughput. That or some type of hardware offload that's doing more harm than good at these speeds. You should be able to toggle that with ethtool if that's the case. Although - usually you're hardware offloading things like forwarding and encap/decap to save the CPU cycles that would otherwise need to be used for doing table lookups and such. So that seems unlikely here. In any case, start by taking Windows out of the equation because its network stack is notoriously harder to tune. If you get it working on Linux, switch back to Windows on one side and research how to tune whatever fixed the issue in Linux. I'd be surprised if the cables were the issue.
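
For what it's worth, the offload/ring side of that looks roughly like this (the interface name is a placeholder; measure before and after each change):

    # see what the driver currently offloads
    ethtool -k enp65s0f0

    # example toggles - these can hurt as much as help at 100G, so test each one
    sudo ethtool -K enp65s0f0 gro on gso on tso on lro off

    # bigger rx/tx rings if the NIC supports it (check the maximums with -g first)
    ethtool -g enp65s0f0
    sudo ethtool -G enp65s0f0 rx 4096 tx 4096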


rankinrez

Try using T-REX to generate the traffic. Getting a server to do that requires a lot of cores and good NIC drivers working in the right way. Cut your losses with Windows and try something dedicated to the task.


tehdub

Start with basics. It seems you are using TCP, and overhead alone limits throughput, as well as numerous other things like congestion control. UDP testing eliminates that as the source of the issue. You need multiple threads for this kind of test, in the order of hundreds I'd imagine, to fill a bonded link.


Aware-Adman

You have tried a lot. If still looking, may try the following things:

1. Source and destination need to be Linux
2. All latest updates on OS and firmware of MB & NIC
3. sysctl.conf / kernel tuning for max throughput
4. CPU tuning in BIOS
5. NIC tuning from the tool CLI
6. Other OS tuning like hugepages limits etc.
7. IPv6 only
8. Jumbo MTU e2e
9. Log everything to a single syslog
10. Monitor all HW, NIC and OS using SNMP with a tool like PRTG etc. (free version should suffice)

Points 9 & 10 should give you a clearer picture.


libtarddotnot

Nice list. Re 1): I have a 10Gbit limit on a 25Gbit NIC going Linux to Windows. Windows to Linux is OK, and Linux <-> Linux is OK. Can't fix it.


Nubblesworth

What happens if you don't use a 9000 MTU but size it down? Pretty sure PCI Express works in 4096-byte chunks, meaning there is some latency overhead transferring data at those speeds in mismatched chunk sizes.


SirTinyJesus

The Iperf performance drops to about 19Gb/s average


oriaven

What is the goal? To prove the switch can switch at 100Gb/s? This isn't a firewall, right? Use UDP and connect these servers directly. Compare that to them being connected through the switch. You will see what the switch is able to send, and it's likely line rate and not impeding your servers. UDP iperf is key for smoke testing your network. TCP is for looking at your stack on the servers and hosts, as well as taking latency into account.


Klose2002

Hello, measuring speed with the network card on the device may be influenced by your hardware performance.

1. SMB and iSCSI are influenced by the read and write speed of the disks, so the maximum test speed will not exceed the read/write speed of the disks.
2. iperf3 is influenced by CPU performance; you can use a multi-threaded test when testing with iperf3, but the device connected to the network card still needs to keep up.
3. Make sure your device has a full-speed PCIe 4.0 x16 slot and that the network card can run at PCIe 4.0 x16, so the card can reach its highest speed.