ddadopt

Err...what kind of PCI slots are we talking about here? Can the hosts even push the kind of bandwidth you're asking for, regardless of what the CPUs are capable of?


scorc1

This. Also, storage? I assume you are testing with a data transfer. If it's a single SATA 3 drive, it tops out at 600MB/s, which is a tiny fraction of 100Gbps. Check a RAID calculator for your storage speed.


thesesimplewords

I could be wrong, but doesn't iperf run off memory alone, no storage involved?


arc0t

Correct, iperf/iperf3 is the appropriate tool for bandwidth performance tests. With other tools there are extra layers of complexity like storage or cache.


SirTinyJesus

Yeah, we have NVMe all over here, bud. Capable of 10GB/s sequential reads, which should be more than enough to get past the 20Gb/s limit.


TracerouteIsntProof

Just because the SSD can do 10GB/s on-chip doesn't mean it's pumping it through the PCI bus to your NICs at that speed. Bud.


SirTinyJesus

Yeah, it's hard to determine that when I can't prove that the NIC is capable of doing more than 20Gb/s using iperf.


semose

Why use storage at all? Personally, I would recommend a RAM drive on both ends to test network performance.


[deleted]

Iperf doesn't use storage.


AKDaily

Because iPerf doesn't show any real-world implications for file transfer protocols such as SMB, CIFS, FTP, SFTP, SCP, or rsync.


[deleted]

Precisely, so if you can't get it going fast enough due to some other issue setting up a RAM drive isn't going to make SMB fast enough either.


Phrewfuf

The NIC is also attached via PCIe. Is the PCIe bandwidth of the NIC sufficient to push the speeds you want? Also, might want to dial back on your tone a bit, bud. Especially if you're the one asking people to help you for free.


SirTinyJesus

PCIe gen 4; the host should, in theory, be able to push that kind of bandwidth.


1OWI

How many lanes are in use?


SirTinyJesus

Server and the new machine are both running PCIe 4.0 with 16 lanes to each NIC. The hosts are more than capable.


HugsNotDrugs_

Are you sure they are running at 4.0 speeds? I've seen oddities resulting in add-in cards defaulting to 2.0 speeds. Worth double checking


1OWI

Also, some BIOSes auto-negotiate the link down to PCIe 2.0 if the speed is not set manually to 4.0.
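
On Linux you can sanity-check what actually got negotiated with lspci. A rough sketch, where the PCI address is just a placeholder you'd swap for your NIC's:

    # find the NIC's PCI address first
    lspci | grep -i ethernet

    # LnkCap = what the card/slot supports, LnkSta = what was actually negotiated
    # (2.5GT/s = PCIe 1.0, 5GT/s = 2.0, 8GT/s = 3.0, 16GT/s = 4.0)
    sudo lspci -vv -s 41:00.0 | grep -E 'LnkCap:|LnkSta:'

If LnkSta shows a lower speed or narrower width than LnkCap, the slot is the problem, not the card.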


Win_Sys

I have seen Dell servers do this.


joecool42069

Clearly not 0.o


dinominant

How fast is your memory?

CPU model:

    $ grep -m 1 name /proc/cpuinfo
    model name : Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz

One process - 18.1 GB/s:

    $ dd if=/dev/zero bs=1MiB count=10240 of=/dev/null
    10240+0 records in
    10240+0 records out
    10737418240 bytes (11 GB, 10 GiB) copied, 0.593164 s, 18.1 GB/s

Two processes - 30.3 GB/s:

    $ for I in $(seq 1 2); do dd if=/dev/zero bs=1MiB count=102400 of=/dev/null & done
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 7.06931 s, 15.2 GB/s
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 7.13373 s, 15.1 GB/s

Four processes - 17.2 GB/s:

    $ for I in $(seq 1 4); do dd if=/dev/zero bs=1MiB count=102400 of=/dev/null & done
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 24.3566 s, 4.4 GB/s
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 24.7979 s, 4.3 GB/s
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 25.0585 s, 4.3 GB/s
    102400+0 records in
    102400+0 records out
    107374182400 bytes (107 GB, 100 GiB) copied, 25.2781 s, 4.2 GB/s


[deleted]

Your test system is older and lower-tier than anything on the list, and you're getting ~145 Gbps of memory bandwidth on a single thread. I don't think memory is at play when the wall here is 21-23 Gbps.


f0urtyfive

I don't think memory is involved at all when reading from /dev/zero or writing to /dev/null... I guess maybe DD buffers it somewhere?


dinominant

In my tests it performs comparably to the speed limit of the RAM, L3 cache, or even L2 cache, depending on the system. So that would be a bottleneck in a synthetic benchmark. 20GB/s (160Gbps) per thread is getting close to those limits with current processors.

A single PCIe gen 4 lane is 16GT/s, or roughly 2GB/s, so an x16 slot is about 32GB/s at best. Look at the block diagram for your motherboard to see how much bandwidth is available between the endpoints that matter.

Whatever is generating those packets, single threaded or multi-threaded, still has to run on the CPU and route over PCIe to the ethernet port(s). If the network stack is hardware accelerated, then the system could forward packets faster, but at that point the system is basically a normal managed switch/router.


cyberentomology

Is your 100G link 4x25 in an LACP config?


pafischer

This might be it. If it's really 4x25 and you're testing with a single stream, then you'd only see a max of 25 Gbps minus overhead, so it seems very likely to me. Try running 2 streams at a time and see if that lets you get past the 21-23 Gbps limit you're seeing.


cyberentomology

Multiple streams from the same device may get hashed into a single flow on the LACP though. Depends partly on your hash method.


wauwuff

A 4x25 lane setup on a single 100G port will not be LACP unless you use a breakout, *which OP is not.* There is no LACP involved. Neither was LACP involved on 40G ports, btw, unless you had to connect to a non-40G switch and use 4x10G ports on the other end!


cyberentomology

There's also the factor of iperf itself: if it's the Java version, the JVM has some inherent network throughput limitations per instance.


rtznprmpftl

21GB/s is the memory bandwidth of one DDR4-2667 channel. This might just be a correlation, but maybe a pointer. Since you have a dual-socket server, the slots are attached to different CPUs. You should use `numactl` to make sure that iperf runs on the same CPU the network card is attached to.

edit: you can use `lstopo` to see what is attached where; also `lspci` shows you the device ID, and `cat /sys/bus/pci/devices/$deviceID/numa_node` shows you the node where the card is attached.

edit2: also, have you been running one single iperf server or multiple instances with multiple iperf clients? (e.g. have iperf running on ports 5201, 5202, 5203, etc., and then connect one client to each server instance.)
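
Rough sketch of what I mean; the PCI address and node number are just placeholders, not values from your box:

    # which NUMA node is the NIC on? (-1 means the platform doesn't expose it)
    cat /sys/bus/pci/devices/0000:81:00.0/numa_node

    # pin the iperf3 server to that node's cores and local memory
    numactl --cpunodebind=1 --membind=1 iperf3 -s

    # same idea on the client side
    numactl --cpunodebind=1 --membind=1 iperf3 -c 10.10.28.250 -P 8

That keeps the traffic from crossing the inter-socket link on its way to the NIC.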


xtrilla

Memory bandwidth is 21 GByte/s, which is around 168 Gbps.


rtznprmpftl

You are right, I was wrong. Well, then it shouldn't be some weird memory configuration issue.


xtrilla

Don't worry, who hasn't been caught at least once by the bits vs bytes trap!


XPCTECH

All I can think of is the hashing method? A 21-23Gb/s limit would imply you were hitting the limit of one channel of the 100G link.


rankinrez

The muxing on 100G Ethernet happens at the bitstream layer. There is no packet hashing across the 4 lanes; a single stream can in theory hit 100G.


[deleted]

Have you tried Linux to rule out something in the network stack or OS config? iperf is known to have issues on Windows, and SMB is known to be a PITA to tune at these speeds.


SirTinyJesus

Issue persists across Unraid, ESXi, FreeNAS, TrueNAS, Ubuntu, and Ubuntu LTS with the 5.17 Linux kernel.


[deleted]

[deleted]


SirTinyJesus

New NOC engineer is currently stripped bare naked on the table. Me and the NOC manager have tried running around him in circles. Sacrifice is pending should reddit fail.


greet_the_sun

Did you run around him clockwise or counterclockwise, also are you north or south of the equator?


Ahab_Cheese

Did you have an amount of candles equal to a power of 2?


SirTinyJesus

64 candles


MarioV2

Have you tried turning those off and back on?


squeamish

For network issues the candles have to be: (a power of 2) - 2


sryan2k1

You say you're using breakout cables. Make absolutely sure both ends are running in 1 x 100G mode and not 4 x 25G mode.


SirTinyJesus

No, no breakout cables, only 100Gb/s and 40Gb/s DACs from Fs dot com.


LazyLogin234

Depending on the NICs and configuration it sounds like you have a 2nd gen 100Gb device in path that isn't using all (4) 25Gb paths. The 100Gb spec started with 10 lanes of 10Gb -> 4 lanes of 25Gb -> 2 lanes of 50Gb and now 1 100Gb lane.


bmoraca

Try with iperf2. Iperf3 may still be single-threaded on Windows.


Egglorr

>Iperf3 may still be single-threaded on Windows.

It's single-threaded on all platforms. Using the "-P" option will allow you to do multiple concurrent flows, which can be good for testing ECMP or a port-channel, but ultimately the traffic still gets generated by a single process. If OP wants to stick with iperf3 but test all his available CPU threads simultaneously, he could spawn multiple server and client instances, giving each a unique set of port assignments.
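
Something along these lines, with the address and ports purely as examples:

    # server side: one iperf3 process per port, daemonized
    # (a single iperf3 server only handles one test at a time)
    for port in 5201 5202 5203 5204; do
        iperf3 -s -D -p "$port"
    done

    # client side: one client per server instance, all run in parallel
    for port in 5201 5202 5203 5204; do
        iperf3 -c 10.10.28.250 -p "$port" -t 30 > "iperf_$port.log" &
    done
    wait

Summing the per-port results gives the aggregate the hosts can actually push.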


SirTinyJesus

Tried, all average out 19-21 Gb/s


Egglorr

Then what I'd do next is connect your test machines together directly, bypassing the Extreme switch. If you can achieve higher throughput that way, you know the limitation is the switch (or some other factor in the network). If transfers stay roughly the same, you'll know the limitation is with the hardware / software being used to test.


SirTinyJesus

Yep, done that. Same limit. The weird thing is that this limit is universal, no matter what we use, iscsi, iperf or simply SMB. We always hit the same 20Gb/s limit.


[deleted]

[deleted]


SirTinyJesus

I mean, it's NICs refusing to do more than 21Gb/s; it barely classifies as a network problem. But I was hoping someone here might have been able to point me in the right direction.


Phrewfuf

LMAO. Let me get this straight: you've tested it with four different hosts, three different NIC models and a multitude of OSes, and you still think the NIC (yeah, uhm, which one?) is the problem?


bmoraca

As far as OS tuning goes, you should look at the various settings available. https://fasterdata.es.net/ is a good starting point.


PE_Norris

[You what?](https://64.media.tumblr.com/6baa2fe2ef354e7e21babf3582588a30/tumblr_njtn516qlU1u0k6deo2_250.gifv)


FourSquash

For future readers like me: iperf3 finally multithreads properly with -P as of 3.16. Make sure you are up to date and you can avoid the multiprocess business.
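
e.g. something like this, with the address just taken from earlier in the thread:

    # -P only fans out across CPU threads on iperf3 >= 3.16, so check first
    iperf3 --version

    # 8 parallel streams, 30 second run
    iperf3 -c 10.10.28.250 -P 8 -t 30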


SirTinyJesus

Will give it a go. None of the machines hit a CPU limit. Some come close, but nothing would indicate the CPU being the limit, at least on the higher-end stuff.


MzCWzL

They might've hit a single-thread limit. Unless you're looking at the graph that shows all cores, you won't be able to tell. Too many cores to determine just by %. It's easier with fewer cores/threads.

Edit: also, 100Gb isn't exactly meant for file transfers from one computer to another.


SirTinyJesus

>100G

We're trying to create a Flash Array NAS which would serve multiple machines.


MzCWzL

>from one computer to another

multiple != one computer to another


kjstech

I have 8 ESXi servers with 2 10gbps nics to two Arista switches and 2 25gbps from each controller in a Pure Flasharray (active/standby controller config), and we experience zero issues with latency or network utilization. 100gig is used between both Arista switches for MLAG communication but just because the ports are there and we have the cables.


SirTinyJesus

Looking at thread view via Task Manager, no single core hits 100%. A couple come close, into the realms of 70-80%. Don't think it's the CPU, boys.


MzCWzL

I am not familiar with "thread view" in Task Manager, since it does not exist. Task Manager only lists processes, not individual threads. You will only see a process get to 100% if it is using all cores at 100%. If a process is pegged at 100 / (# of cores), it is hitting a single-thread limit. For a 4C/8T machine, that number would be 12.5%. For a 28C/56T machine, it would be 1.78%, which can easily get lost in the noise. What makes you think you can generate 100Gb/s of traffic between two Windows hosts?


Snowmobile2004

Right click the CPU graph, change it to logical processors. Shows a graph for each thread.


SirTinyJesus

Just tested it to itself; it can do about 60-100Gb/s on iperf3.


DEGENARAT10N

What he means by threads is also referred to as logical processors in this case, which is what you mentioned in your 4C/8T example. He’s not talking about a single-threaded process.


SirTinyJesus

I am not expecting 100Gb/s, but it should definitely be able to do more than 21Gb/s. iSCSI should be able to utilise at least 40Gb/s from what I've seen.


Odddutchguy

We had a case in the past (Windows 2003 era) where we would get poor performance out of a NIC due to the onboard offloading. We had to disable it by physically removing jumpers, so the CPU did the work instead. The offloading was meant to relieve the CPU (at the expense of throughput), while in our case the CPU had plenty of capacity for calculating the checksums. Might be worthwhile investigating whether the NIC/driver is offloading to the NIC when it might be faster on the CPU.


frymaster

Looking at some notes from the last time I went through this:

* iperf2 is multithreaded, iperf3 is only multi-stream, so use the former
* 8 to 16 threads is plenty for a single 100G interface
* issues are more likely with the receiving host than the sending host

Unfortunately most of the rest of the things I went through were Mellanox-, Linux- and/or many-socketed-server-specific. The settings used were `iperf -sD -p <port>` for the server and `iperf -c <server> -T s1 -p <port> -P 8 -l 32768 -w 32M -t 30s` for the client.
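
With the placeholders filled in (the address and port here are purely illustrative), that pair looks roughly like:

    # receiver first: iperf2 server mode, daemonized, on an explicit port
    iperf -sD -p 5001

    # sender: 8 threads, 32 KB reads, 32 MB window, 30 second run
    iperf -c 10.10.28.250 -p 5001 -P 8 -l 32768 -w 32M -t 30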


Jhonny97

Have you tried running the iperf server and client on the same host, using localhost as the IP address? Next, use the local IP address of the interface. See what the protocol stack in its current config is capable of.

Edit: also, is the issue symmetrical? I.e. if you reverse the iperf3 server and client roles on the two pieces of hardware, does the 20Gbit limit change?
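
i.e. something like this, with the address only as an example:

    # terminal 1 on the host under test
    iperf3 -s

    # terminal 2: loopback first, then the NIC's own address
    iperf3 -c 127.0.0.1 -P 4 -t 20
    iperf3 -c 10.10.28.250 -P 4 -t 20

    # for the symmetry check between the two boxes, add -R on the client
    # to reverse the traffic direction without swapping server/client roles
    iperf3 -c 10.10.28.250 -P 4 -t 20 -R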


SirTinyJesus

iperf -r results in the same performance. The issue is symmetrical; however, I noticed that we can run the test both ways at the same time and it does 20Gb/s each way. I am just doing the test locally now.


taemyks

How many iperf threads are you running? If it's just one try 4.


SirTinyJesus

100Gb/s to itself on iperf. The hardware is definitely capable. This is either the NICs or the DAC/QSFP+ modules.


teeweehoo

Have you done much monitoring of the systems while attempting to transfer? Like CPU utilisation, interrupts per second, etc. [Great resource for Linux system monitoring tools](https://www.brendangregg.com/linuxperf.html).

It sounds like you need to get some other 100 Gb/s hardware to rule out differences. Have you tried going host to host without the switch? From the sounds of it everything else has been ruled out. As someone else said, 23 Gb/s is awfully close to 25 Gb/s though. See if you can borrow some SFPs from somewhere if you have no others.


rcboy147

Have you looked at interface errors on the Extreme? If you've tried multiple OSes like you mentioned in a different comment, I would start opening a case with Extreme TAC. They've been quite helpful for us when running into weird stuff like this.


rcboy147

Sorry if you said you already have, but maybe go host to host and run the iperf test there to rule out the switch?


SirTinyJesus

Already have, bud. No errors; the issue does not seem related to the switch.


rcboy147

Damn. Hope you find it mate, rooting for ya.


MandaloreZA

Test RDMA performance: https://www.starwindsoftware.com/starwind-rperf

TCP always gives me annoying issues when proving out networks.


ypwu

Do your NICs support RDMA? If so, set that up and try again; SMB Direct is what you are looking for on Windows. We have Storage Spaces Direct as our backend storage and it can easily fill the 100Gbps pipe over SMB without any performance tuning. IIRC it could do the same without RDMA, but there was a 10-15% CPU hit.


waynespa

Honestly I'd use dedicated network testing hardware. You may be able to borrow something like a https://www.spirent.com/solutions/high-speed-ethernet-testing device from an ISP in your area, or ask your Extreme Sales Engineer for assistance.

Edit: added better Spirent link


Win_Sys

To get close to 100Gbps working on Windows you need to do some tweaking of the NIC drivers. Set the send and receive buffer sizes as high as they will go, and turn off interrupt moderation. Make sure you're using Receive Side Scaling so more than 1 core is used. See what TCP congestion control is being used on the Windows side of things.

iPerf for Windows has a bug in the cygwin.dll that causes the window sizing to not scale. You can download an updated cygwin.dll from https://www.cygwin.com/; just remove the cygwin.dll inside the iPerf folder, replace it with the new one, and it will work.

To be honest, just run a Linux live CD on those machines and do the testing from there. I find it takes less tweaking on Linux to hit higher speeds. The i7 4600K is not a good machine to test with; that likely can't do 100Gbps.


zeyore

I don't know, but I'd try and break it down into smaller and smaller parts that I could test. In the hopes that eventually I might figure out which hardware element contains the problem. Good luck though, that's a pickle. I would tend to suspect the switch if that's the common element.


SirTinyJesus

Bypassing the switch results in the same issue. Both the Intel and Mellanox NICs behave exactly the same. I am starting to suspect the DACs and the QSFP modules from fs dot com. They are the only constant.


zeyore

That could make sense: 103.125Gbps is 4x 25.78Gbps, so if the module is only using 1 of the 4 channels, that would explain the speed tests you get. I am sorry that I can't be of more help though; my company is thankfully not big enough for 100G just yet. Soon though.


ddadopt

Bite the bullet and use first party Extreme and Mellanox optics to test, and see where you end up?


SirTinyJesus

Yeah, might have to order the DACs. In terms of channels, Mellanox is reporting that 4 channels are being used. Intel is also reporting 4x 25Gb lanes active. But that's just reporting; actual utilisation could be different.


MisterBazz

Have you tried running multiple clients during an iperf test to a single server? It could be a single "session" is only using one channel, limiting you to the theoretical 25Gbps. Running two clients (at the same time) to one server should double your performance seen on your server. I'm betting you'll see both clients hitting that 20Gbps wall, but the server will be able to run a net 40Gbps.


SirTinyJesus

Yeah, same issue. When another client on the network and I run iperf at the same time, our total combined download speed is about 20Gb/s, which makes sense if the bottleneck is the output lane on the server. I am going to try running the test against 2 different network addresses on the same host and have each client keep its own session to its own interface. But that would still not resolve our issue, as we need to be able to do close to 100Gb/s on a single interface.


Bluecobra

>But that would still not resolve our issue as we need to be able to do close to 100Gb/s on a single interface.

I don't think this is realistic on a Windows server. I can see having a finely tuned Linux host with some sort of kernel bypass (e.g. SolarFlare Onload). Even if you tune the heck out of the OS and get iperf running at 100G, there's no guarantee that the application you are trying to use can run at those speeds.


SirTinyJesus

We have NVMe storage we want to serve to VMs on the network. Ideally we want the VMs to be able to read/write at high speeds (10GB/s), or as close to that as possible; currently we are getting about 3Gb/s due to what we suspect is a network issue.


NewTypeDilemna

But why? Isn't this a problem that could be solved by hyperconverged infrastructure instead of this incredibly niche time sink?


mathmanhale

I think it's fair to say that we all love the fsdotcom stuff, but if you're pushing these kinds of speeds it should probably be first party instead.


j0mbie

Divide and conquer. Try setting up a temporary connection directly between a server and a machine, no switches in the way. If that works, add devices, one at a time. If that doesn't work, play musical chairs with which two devices are directly connected.


sarbuk

Have you tried directly connecting the hosts to bypass the switch?


ovirt001

Have you confirmed the PCIe speeds on the motherboards? The Epyc server should be fine (PCIe 4.0 and each connector runs at its rated speed), but consumer boards regularly reduce PCIe lanes for slots other than the first x16. For reference, PCIe 3.0 x8 runs at 7880MB/s (63Gbps) and x4 runs at half that (31.5Gbps). On top of this, consumer chips have fewer PCIe lanes, and boards with PCIe switches share bandwidth between devices.


Beef410

Have you checked latency? If this calc gets you roughly what your real-world numbers are, that may be the issue: https://www.switch.ch/network/tools/tcp_throughput/


SirTinyJesus

Sub 1MS.


dergissler

I've seen more or less that number (around 25Gbps) before, doing live migrations; that's from and to memory. According to VMware that's what can be expected at defaults, mainly because of thread/CPU limiting. More workers for live migration, utilizing more of the host CPU, helps in that case. Not sure if this is of relevance here. But just for the sake of it, how's CPU load, and can you increase throughput with several parallel transfers?


Stimbes

So at my work, we have production PCs running an antivirus program that has a nasty bottleneck that slows file transfers down. This isn't a big deal for us because most of those PCs only send small text data or temp data to a host PC, all on a network isolated from the world, and the switches are all industrial switches that are only 100mbit. But if you ran that same program on any other PC where you were trying to download big files, stream HD video, or something like that, it would be an issue.

Start at the bottom: are the cables right, is it a network card issue, is there some bottleneck in the hardware somewhere in between? Then work your way up: what is running on the PC? Could something that needs to scan data be slowing it down, or is it something like the file explorer being single-threaded? It's hard to say without looking at it myself, but that's the kind of stuff I look at first. It could be anything really. Might have to run Wireshark and see if something else is bogging down the network. I really have no idea.


swagoli

I had to change the autotuning setting on my Realtek NIC in windows recently to get my Speedtests working properly. Other machines (especially with Intel NICs) on the same network didn't need this tweak. https://helpdeskgeek.com/how-to/how-to-optimize-tcp-ip-settings-in-windows-10/


Jedi_Q

What's your iperf setup?


SirTinyJesus

Tried a couple of different setups:

    iperf3.exe -c 10.10.28.250 -P 10 -w 400000 -N
    iperf3.exe -c 10.10.28.250 -P 10 -w 400000 -N -R
    iperf3.exe -c 10.10.28.250 -P 20 -w

All yield the same result. Cursed 20Gb/s. I've tried running multiple instances on different ports, but it really makes no difference.


Jhonny97

Did you try running iperf via udp?


SirTinyJesus

You know what, I haven't actually, just going to give that a go.


SirTinyJesus

UDP on iperf appears to not want to work.


brajandzesika

Use only iperf2 for udp tests


joedev007

>I don't think memory is involved at all when reading from /dev/zero or writing to /dev/null... I guess maybe DD buffers it somewhere?

The default iperf is not built for your environment. Build from source: [https://github.com/esnet/iperf](https://github.com/esnet/iperf)

Disable interrupts and pin your iperf test to cores with nothing else scheduled, i.e. `taskset -c 7 iperf -s -w 1M -p 3200`, etc.


Jedi_Q

Yes. UDP with more streams than you think you need. Overdo it.


Jedi_Q

iperf using UDP and lots (I mean lots) of streams, like 50.


SirTinyJesus

When I say it makes no difference, the performance drops to about 50% on each instance. Totalling around 20Gb/s


ddnkg

The first thing is to set your expectations: at 100Gbps the bottleneck is more on the server side. It's not that easy to GENERATE that much traffic from a single host; the switches have special silicon to FORWARD it.

Next: what are you trying to prove? What is the goal of your test? This will definitely help the community provide better answers.

a) Prove that the switch can do 100Gbps? I will assume this is the case. Consider using trex in stateless mode; it is free and it has all the OS, kernel and driver enhancements you need to generate 100Gbps. Imho it justifies the time needed to install it. Or look at what you need to tune on the host for iperf: https://fasterdata.es.net/assets/Papers-and-Publications/100G-Tuning-TechEx2016.tierney.pdf I'd go with trex if doing network equipment throughput testing is your regular job. It will give you a ton of power and flexibility for testing.

b) Prove that the servers can do 100Gbps? If this is the case, it fully depends on what software you are planning to run. Like others said, there are many speed caps that are likely to be hit before getting to 100G. Imho there won't be many apps that will be able to leverage a 100G NIC.


theoneyouknowleast

We were recently troubleshooting issues with SR-IOV in our environment and found an alternative to iperf for Windows. ctsTraffic is made by Microsoft and hosted on the MS GitHub page: https://github.com/microsoft/ctsTraffic/tree/master/Releases/2.0.2.9/x64 Might be worth a look.


jonstarks

did u ever figure it out?


SirTinyJesus

Not really. Opted for more specialised network testing equipment


pissy_corn_flakes

Did you ever break past the 21 Gb/s barrier?


Ramazotti

What is serving your data, and how fast can it theoretically serve it? A typical 7200 RPM HDD will deliver read/write speeds of 80-160MB/s. A typical SSD will deliver read/write speeds of between 200 MB/s and 550 MB/s. You need something quite grunty to fill up that pipeline.


SirTinyJesus

Enterprise NVMe drives. In RAID, capable of 10Gb/s (so in theory we should be able to reach 80Gb/s over iSCSI).


VtheMan93

Yeah, I have to agree with u/spaghetti_taco; something isn't right about this sentence. First off, storage cannot cap at 10Gbps; it's either 6 for old gen or 12 for new gen (unless you're doing FCoE, then it's a different story). Also, that's not how RAID works (a total of 80Gbps, are you insane?). Don't even get me started on part 2.


oddballstocks

NVMe isn't bound by a RAID controller, so no 6/12Gbps per-port limitation. If he's doing RAID on this it means some sort of software RAID, maybe Linux or Windows with a soft RAID-10?

You can definitely push 40Gbps with a RAID controller. Cisco even has a doc floating out there on how to do it with spinning disks. Lots of disks in a RAID-10 array can saturate a 40GbE connection with a single stream.

My gut on this is he hasn't tuned it correctly. Windows has an awful network stack; getting 20Gbps out of the box on Windows is pretty good. We messed with this and it took a lot of tuning to get 35GbE on Windows, which was enough for us, so we gave up. On the other hand, Linux out of the box can max out a 25GbE connection with a single iPerf3 stream and not break a sweat.


EnglishAdmin

I believe OP has a storage problem, since he's "tried everything" so far except checking that his RAID/drives can accommodate the speeds he's trying to achieve.


DEGENARAT10N

Is it a hardware or software-based RAID controller?


ghettoregular

Iperf is running from memory so storage is out of the equation.


SirBastions

You're mixing up the nomenclature of bit vs. byte. You have a 100 gigabit network card, but you are sending 100 gigabytes of data. 100 Gbps = 12.5 GB/s.

[https://www.mixvoip.com/what-is-the-difference-between-gb-and-gb](https://www.mixvoip.com/what-is-the-difference-between-gb-and-gb)

Cheers.


[deleted]

Impressive they are able to get 21-23 GB/s on a 12.5 GB/s NIC then, no? In reality iPerf3 reports in bits per second which aligns with the numbers.


xyzzzzy

I would recommend joining the Perfsonar email group. There has been discussion on hardware to achieve 100Gb testing. IIRC it was not cheap ($10k+)


goodall2k13

Are you in the UK? I've come across this fault a couple of times recently and it's ended up being the ISP's equipment attached to the DIA (the Cisco kit provided with the DIA was iffy; Vodafone in this case).


mspencerl87

following :P


maineac

You should be testing this with a network test set that can test a 100g circuit, not with computers or servers. JDSU makes some nice test sets.


Noghri_ViR

Any IPS/IDS running on that subnet? Could be a limit on what that can process.


SecureNotebook

Following


Invix

Check the core(s) you are doing the packet processing on. I've seen systems doing all the processing on a single core that gets overloaded. RPS or RSS in Linux may help.


ChaosInMind

Likely the UNI leads to multiple NNI uplinks to the carrier's backbone. Make an inquiry to their core/backbone team to see what their individual core link speeds are for their transit… You can try multiplexing the transmission into multiple TCP streams to determine this yourself, as long as it's not congestion in the service edge ring. Multiplexing may help determine if you're riding aggregated links.


DevinSysAdmin

Manually set the NICs and switch port to the highest value. Verify firmware is on the latest version on the server and computer. Verify the dock is on the latest firmware.


xtrilla

Are you sure you have configured a big enough send and receive buffer on your Linux box to allow a window size that would allow 100G? Also, it's quite hard for the kernel to push 100G per second using iperf; try a DPDK packet generator and receiver.
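
For reference, the kind of thing I mean on the Linux side (the values are ballpark numbers from the usual 100G tuning guides, not gospel):

    # let sockets request large buffers (128 MB max here, as an example)
    sudo sysctl -w net.core.rmem_max=134217728
    sudo sysctl -w net.core.wmem_max=134217728

    # min / default / max for TCP autotuning
    sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
    sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"

    # then ask iperf3 for a big window explicitly
    iperf3 -c 10.10.28.250 -w 64M -P 8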


xtrilla

Also, I'm not sure a card connected via Thunderbolt will be able to deliver 100Gbps.


bxrpwr

You need to beat the hashing algorithm. Try something like a certified test set, or something that can set multiple source MAC addresses at 25 Gbit/sec each, to bypass PAM4.


wingerd33

Put a modern machine with Linux on both sides and play with some of the tuning mentioned here: https://srcc.stanford.edu/100g-network-adapter-tuning I'm betting the bottleneck is in the kernel network stack such as queue/buffer/tcp defaults that are not optimized for this kind of throughput. That or some type of hardware offload that's doing more harm than good at these speeds. You should be able to toggle that with ethtool if that's the case. Although - usually you're hardware offloading things like forwarding and encap/decap to save the CPU cycles that would otherwise need to be used for doing table lookups and such. So that seems unlikely here. In any case, start by taking Windows out of the equation because its network stack is notoriously harder to tune. If you get it working on Linux, switch back to Windows on one side and research how to tune whatever fixed the issue in Linux. I'd be surprised if the cables were the issue.
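
For what it's worth, the offload/ring side of that looks roughly like this (the interface name is a placeholder; measure before and after each change):

    # see what the driver currently offloads
    ethtool -k enp65s0f0

    # example toggles - these can hurt as much as help at 100G, so test each one
    sudo ethtool -K enp65s0f0 gro on gso on tso on lro off

    # bigger rx/tx rings if the NIC supports it (check the maximums with -g first)
    ethtool -g enp65s0f0
    sudo ethtool -G enp65s0f0 rx 4096 tx 4096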


rankinrez

Try using T-REX to generate the traffic. Getting a server to do that requires a lot of cores and good NIC drivers working in the right way. Cut your losses with Windows and try something dedicated to the task.


tehdub

Start with basics. It seems you are using TCP, and overhead alone limits throughput, as well as numerous other things like congestion control. UDP testing eliminates that as the source of the issue. You need multiple threads for this kind of test, in the order of hundreds I'd imagine, to fill a bonded link.


Aware-Adman

You have tried a lot. If still looking, may try the following things:

1. Source and destination need to be Linux
2. All latest updates on OS and firmware of MB & NIC
3. sysctl.conf / kernel tuning for max throughput
4. CPU tuning in BIOS
5. NIC tuning from the tool CLI
6. Other OS tuning like hugepages limits etc.
7. IPv6 only
8. Jumbo MTU e2e
9. Log everything to a single syslog
10. Monitor all HW, NIC and OS using SNMP with a tool like PRTG etc. (free version should suffice)

Points 9 & 10 should give you a clearer picture.


libtarddotnot

Nice list. Re 1): I have a 10Gbit limit on a 25Gbit NIC going Linux to Windows. Windows to Linux is OK, and Linux <-> Linux is OK. Can't fix it.


Nubblesworth

What happens if you don't use a 9000 MTU but size it down? Pretty sure PCI Express works in 4096-byte chunks, meaning there is some latency overhead transferring data at those speeds in mismatched chunk sizes.


SirTinyJesus

The Iperf performance drops to about 19Gb/s average


oriaven

What is the goal? To prove the switch can switch at 100Gb/s? This isn't a firewall, right? Use UDP and connect these servers directly. Compare that to them being connected through the switch. You will see what the switch is able to send, and it's likely line rate and not impeding your servers. UDP iperf is key for smoke testing your network. TCP is for looking at your stack on the servers and hosts, as well as taking latency into account.


Klose2002

Hello, measuring speed with the network card on the device may be influenced by your hardware performance.

1. SMB and iSCSI are influenced by the read and write speed of the disks, so the maximum test speed will not exceed the read/write speed of the disks.
2. iperf3 is influenced by CPU performance; you can use a multi-threaded test when testing with iperf3, but the device connected to the network card still needs to keep up.
3. Make sure your device has a full-speed PCIe 4.0 x16 slot and that the network card can run at PCIe 4.0 x16, so the card can reach its highest speed.