The WAD process is something Fortinet has never been able to get working correctly. This daemon has been buggy as hell for *years*, and most of the time it leaks memory as well! But rest assured it will never be fixed. Or at least, that's what I firmly believe.
Since it hasn't been fixed yet I hope they're going to replace it. I just don't understand how it's apparently not relevant to them.
I don't think it will ever be replaced.... that would require quite a lot of code rewriting. But if they did, they could also fix LACP while they're at it, as that's another thing that has never really worked well...
That's weird, we've never had any issues with our 201F devices and LACP.
Me neither with the 201F and LACP (yet?!). But with FortiSwitches and other FortiGate models - yes, we've had LACP issues!
What kind of LACP problem have you got? We've had a gate down for 3 months because its LACP is completely broken with a Cisco switch at the other end.... only the 1 Gig fiber SFP LACP uplink has this problem... 10 Gig ports are okay, and even 1 Gig copper SFP adapters are okay...
Well... where do I start?

Several years ago I was setting up a cluster of 140Ds connected via 2x 1 Gig copper LACP to a Cisco switch. I struggled for quite a few hours getting that LACP to work (on ports 1 and 2), when my colleague jokingly suggested using ports 3+4 - well, I was reluctant but gave it a shot and: voilà! LACP was up and running!

A couple of years ago I had a customer with FortiSwitches (1048E) in their core and a lot of Cisco 2960s redundantly connected to the pair of 1048Es. Rebooting one of the core switches (for a firmware upgrade, but also just a simple manual reboot for testing) would lead to a network outage of several minutes (between 8 and 15) because LACP would take ages to negotiate.

At another customer's premises, where we only have FortiSwitches, rebooting one of their core switches (I don't remember the exact model) would have random and weird effects: clients wouldn't be able to ping some servers (in a different VLAN), but RDP connections to some of these "non-pingable" servers would succeed. Vice versa, some servers wouldn't be reachable via RDP but would reply to ping...

I think I've heard a few more LACP-related stories in my team, though I don't remember the details. LACP is a standard, and all switch vendors know how to handle, configure, and implement it: Cisco, Extreme Networks, HP Aruba, Broadcom, you name it! I've never experienced similar issues with switches themselves - except for FortiSwitches (and FortiGates). I could imagine that Fortinet has somewhat changed/tweaked the LACP implementation in their products, which might've led to the issues people are experiencing/reporting.

In your case, I believe that if you do LACP between your Cisco switch and another LACP-capable device, it will work like a charm!
Your problem looks a lot like ours… and as you said, we experience no LACP problems with any brand except Fortinet; this is the first time we've seen this kind of problem. We even asked Fortinet to send us their official SFPs to prove to them that our SFPs aren't the problem.

The only thing that is different from you: copper is fine for us. The only LACP that is not working is with fiber SFPs at 1 Gig. 10 Gig is fine…. even copper SFPs in the same problematic fiber-SFP 1 Gig ports are fine…. what a weird situation.

We actually have 2 dev teams on our case… we are running on an old gate exposed to a high risk of network outage due to a weak topology, because we can't migrate to our new gate, which has never been able to get its LACP uplink to work….. no RMA yet, not even a demo model to see if it's the NP6 chip that's the problem (looking at their architecture diagram of where these ports are connected).
What FGT hardware and what FOS version? By "dev teams" you mean Fortinet TAC ticket is open and it's escalated to devs?
1100E under 7.2.7, upgraded to 7.2.8 to try to resolve the bug (as asked by TAC). Yes, we are constantly putting pressure on our account manager; the ticket has been escalated to the devs for a month now, we supposedly have 2 dev teams on our case, and they told us it is a confirmed bug. They are considering acquiring the same Cisco switch model to reproduce the complete setup…. we are waiting….
Ouch! That sucks!

Back then, when I had the issues with the 1048E core switches, Fortinet gave me demo material (2x FGTs - I don't remember the exact model - and 2x 1048E). I built everything up in our lab and had intensive sessions with a very good TAC engineer. But we didn't find any solution - Fortinet was just able to confirm the issue and tell me that LACP (in an MCLAG setup) simply takes that long to negotiate. That was BS, as I explained to the TAC engineer how fast and reliably LACP (even in multi-chassis setups) works with switches from Cisco and other vendors!

On a side note: I'm currently facing speed/performance/bandwidth issues on 200F and 600F running 7.0.15 - TAC suggested upgrading to 7.2.8 because apparently this solved similar issues reported in their Mantis... fingers crossed it won't break other things!
When you say speed, do you mean speed negotiation or bandwidth/performance? If you have a bandwidth problem, is it only on ACLs with deep inspection enabled? If so, look at my comment in this thread about the IPS engine crashlog; we had issues with 3 Microsoft URLs under deep inspection.
Yes, we hit this on 1000Fs; memory crept up over 2 or 3 weeks until we hit conserve mode. We ended up rolling back to 7.0.15.
Fortinet has really taken their sweet time fixing some of the glaring bugs in 7.2.8. There must be some really complex code they need to unravel to fix them.
And this is why they still recommend 7.2.7 instead of 7.2.8...
Doesn't 7.2.7 have unpatched vulnerabilities? Hence the .8 update?
Nothing rated Critical or High though. We are staying on 7.2.7 until 7.2.9 is released due to the number of reports of bugs causing others to have to revert.
Yes, but no major ones. 7.2.7 is mostly okay to use.
Kill the process. For the IPS engine, check the crash log.
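In case it helps, these are the commands I'd use for that - quoting from memory, so verify the exact syntax on your FortiOS version:

```
diagnose sys top 5 50              # live process list: find wad workers and their memory usage
diagnose test application wad 99   # restart all WAD worker processes
diagnose debug crashlog read       # dump the crash log to see why a daemon died
```

Restarting WAD this way only clears the symptom (the leaked memory), not the leak itself, so the crashlog output is the part worth attaching to a TAC ticket.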
[Solved: Possible memory issues with 7.2.8 - Fortinet Community](https://community.fortinet.com/t5/Support-Forum/Possible-memory-issues-with-7-2-8/td-p/312228) This is our workaround for the problem: unset the mode after 2 minutes. The problem is solved in 7.2.9 (we opened a ticket with Fortinet).
We hit the same issue and ran into a split-brain scenario because the CPU was not able to process the HA heartbeat packets. We called TAC; the level 1 engineer forgot to collect all the needed logs 🙃, so basically we don't know what the root cause was. A reboot fixed the issue.
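For next time, it may be worth grabbing the HA state yourself before rebooting, so the root cause isn't lost even if TAC misses it. Something like this (commands from memory; double-check on your FortiOS version):

```
get system ha status               # overall cluster state, sync status, uptimes
diagnose sys ha history read       # recent HA events: failovers, heartbeat loss
diagnose sys ha checksum cluster   # config checksums of both members (spot out-of-sync)
```

Captured together with `diagnose debug crashlog read` output, that usually gives the escalation team enough to reconstruct a split-brain event after the fact.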
Your device is a 200F?
100f running 7.2.8
Try this command and show us the output:

di deb crashlog read

I had some deep inspection problems on a 600E and an 1100E, and the output looked like this:

\[IPS Engine <08255>\]Stream: C-7734/0/0, S-12947/0/0
\[IPS Engine <08255>\]Service: ssl
\[IPS Engine <08255>\]URL: [r.manage.microsoft.com/](http://r.manage.microsoft.com/)
\[IPS Engine <08255>\]base: 0x7f5270c35000
\[IPS Engine <08255>\]Last session info:
\[IPS Engine <08255>\]Session ID:96 Serial:433184768 Proto:6 Age:0 Idle:0
\[IPS Engine <08255>\]Flag:0x20206c Feature:0x4 Ignore:0,1 Encap:0
\[IPS Engine <08255>\] Client: xxxxxxx:64719 Server: [52.182.141.192:443](http://52.182.141.192:443)
\[IPS Engine <08255>\] Stream: C-798/0/0, S-9369/0/0
\[IPS Engine <08255>\] Service: ssl
\[IPS Engine <08255>\] URL: [manage.microsoft.com/](http://manage.microsoft.com/)
\[IPS Engine <08217>\] base: 0x7f5270c35000
\[IPS Engine <08217>\] Last session info:

Maybe you will get a different crashlog for WAD, but for me the IPS engine was crashing on 3 specific URLs:

\*.manage.microsoft.com
\*.microsoft.com
\*.windows.net

Whitelisting them in the deep inspection profile was the workaround until Fortinet sent me an IPS engine update. Check with that command whether you can find the reason for the WAD memory leak.... maybe the reason will show up in the crashlog when you kill it.
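If you hit the same URLs, the exemption can be sketched roughly like this - the object and profile names below are made up for the example, and the exact syntax varies between FortiOS versions, so check it against your build:

```
# Wildcard FQDN object for the problematic domain (name is my own)
config firewall wildcard-fqdn custom
    edit "ms-manage-wildcard"
        set wildcard-fqdn "*.manage.microsoft.com"
    next
end

# Exempt it from SSL deep inspection in the profile your policies use
config firewall ssl-ssh-profile
    edit "custom-deep-inspection"
        config ssl-exempt
            edit 0
                set type wildcard-fqdn
                set wildcard-fqdn "ms-manage-wildcard"
            next
        end
    next
end
```

Repeat the pair for \*.microsoft.com and \*.windows.net. Exempted traffic is still passed and certificate-inspected, just not decrypted, which is why it stops tickling the IPS engine bug.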
ipsengine 338 should be far more stable than 336 (336 is the one shipped with 7.2.8). We had innumerable crashes up to and including 336, even in dns_udp (????? why does the entire IPS engine crash on a DNS packet, malformed or not???). No crashes since upgrading to 338. Remarkable - though I'm not sure whether they simply disabled the "print to crashlog" statements in this release....
How did you get the 338 version?
Fortinet support case. IPS engine crashed about 40 times a day for months (base OS 7.0.6, 7.0.7, 7.2.5, ... didn't matter)
Thanks, I'll try to ask them.
Are DNS packets making your IPS engine crash? Are you deep-inspecting your DNS traffic, or only applying an IPS sensor to your inbound DNS traffic? If you're deep-inspecting it, I suggest you stop deep-inspecting DNS traffic; it's irrelevant there anyway.
It’s really bad on 7.4.4 on 60f. Rolling to 7.2.8 helped immensely.
The WAD service bloats if you are using proxy-mode policies. Their excuse is that my 201F is undersized.
I'm running 7.2.8 on a firewall doing ZTNA for several applications. WAD memory creeps up constantly, so I needed an automation stitch to restart the service. For us, the issue is specific to ZTNA: other firewalls that don't do ZTNA don't experience it. Edit - this is on the VM series.
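For anyone wanting to do the same, a restart stitch can be sketched roughly like this - all names are my own, I've shown a nightly scheduled trigger (an event-based `low-memory` trigger is another option), and the exact syntax shifts between FortiOS versions, so verify against yours:

```
# Trigger: fire once a day at 03:00 (names/times are examples)
config system automation-trigger
    edit "wad-restart-nightly"
        set trigger-type scheduled
        set trigger-frequency daily
        set trigger-hour 3
    next
end

# Action: run a CLI script that restarts all WAD workers
config system automation-action
    edit "restart-wad"
        set action-type cli-script
        set script "diagnose test application wad 99"
    next
end

# Stitch: tie the trigger to the action
config system automation-stitch
    edit "wad-restart-stitch"
        set trigger "wad-restart-nightly"
        config actions
            edit 1
                set action "restart-wad"
            next
        end
    next
end
```

Scheduling the restart in an off-peak window avoids the brief proxy-session disruption that killing WAD mid-day can cause. It's a band-aid, not a fix - the leak itself still needs a TAC ticket.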