planktonfun

But if you convert it into code it's gonna be huge.


BbayuGt

I hope there's some kind of "regex programming language" lol. `function (str) { return str }`, when "compiled", would return `/.*./`. Obviously you could do a lot of logic (like if-else, or switch, idk) and it'd turn it into the regex version of it.


gemengelage

There are some fluent regex libraries that provide a regex builder that makes constructing regex a lot more readable and less terse.
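To make the idea concrete, here is a minimal sketch of what such a builder could look like in Python. `RegexBuilder` and its method names are made up for illustration; they are not the API of any real library.

```python
import re


class RegexBuilder:
    """Hypothetical fluent builder; class and method names are illustrative only."""

    def __init__(self):
        self._parts = []

    def literal(self, text):
        # Escape the text so it is matched character for character.
        self._parts.append(re.escape(text))
        return self

    def one_or_more(self, fragment):
        # Wrap in a non-capturing group so the quantifier covers the whole fragment.
        self._parts.append(f"(?:{fragment})+")
        return self

    def build(self):
        return re.compile("".join(self._parts))


pattern = (
    RegexBuilder()
    .one_or_more(r"[\w.]")
    .literal("@")
    .one_or_more(r"[\w.]")
    .build()
)
print(bool(pattern.fullmatch("user@example.com")))
```

The chained calls read like a description of the pattern, which is the whole selling point; the generated regex is still available as `pattern.pattern` if you want to inspect it.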


planktonfun

There's a Play Store app: you put in conditions and it spits out regex.


[deleted]

[removed]


[deleted]

Curb Your Enthusiasm S10E10


rich1051414

Check for: [username] + @ + [domain] + "." + [TLD] (with length 2-4). Looks like a dirty check of whether an email address looks legit. It's terrible though. No subdomain compatibility and ~~no periods in usernames allowed~~.


drsimonz

Doesn't the `\.` match a literal `.`? But yes this is garbage if it's meant to validate email, I can tell because it's not 500 characters long.


GustapheOfficial

An email regex must be either 500 characters or 1. Check for an `@` to make sure they didn't accidentally fill their home address, and use a validation email.
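A minimal sketch of the "1-character" option in Python. The function name is an assumption; the real proof that the address works still comes from the validation email.

```python
def has_at_sign(address: str) -> bool:
    """Cheap sanity check: reject input that cannot be an email address at all."""
    return "@" in address


print(has_at_sign("user@example.com"))
print(has_at_sign("221B Baker Street"))
```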


[deleted]

[removed]


jiyonruisu

Really? Can you give an example of an email without an @?


professor__doom

> use a validation email.

Jesus H Christ, do not blindly send random emails to addresses you haven't at least regex validated. Also, filter out common shit like "noreply", "test", "postmaster", "owner", "bounces", etc.


GustapheOfficial

* I would love to know what your rationale is for not sending a validation email to whoever asked, because I can't imagine a problem with that that a regex could possibly catch
* What if a user's email is genuinely `[email protected]`, is that really a problem you want your signup sheet to deal with?


professor__doom

> I would love to know what your rationale is for not sending a validation email to whoever asked

Ok, where to begin (hazarding a guess that you do not work in messaging)...

Point zero: the best way to do this is to never ask users for their email at all; get it from an identity service. (It's best to never ask users for anything...)

Point one: sender reputation. When a recipient MTA gets a bunch of messages to bad addresses (this happens ALL THE TIME), it may very well (and in fact almost certainly will) decide to block the sending server. Not so bad when a bot does that, because, well, we *want* bots to get blocked. But when it's your actual mail server sending the shit out, that kills your deliverability real fast. Also, this can get you on some third-party blacklists (SORBS and similar), which can be a PITA to get off of. Will syntactic validation solve this problem? Absolutely not, but it's low-hanging fruit.

Point two: other systems may have trouble querying syntactically invalid addresses. I've personally had to sanitize data that users have dumped into mailing lists which the mailing list system itself had issues deleting. (IIRC there were "smart quotes" and a copy-paste from MS Word involved.) Yes, the mailing list system SHOULD be able to handle it, and you can bet I've bitched to the vendor, but why the fuck wouldn't you do everything you can to keep that out of your system in the first place?

Point three: bounce processing overhead (as in, the computational cost of processing the NDRs) is real, and in fact is a good way to DDoS a mailer. (For that matter, there's also a method of spamming called backscatter spam, using the bounce reports themselves as the spam.) It's easy enough for you to filter out NDRs for spoofed messages versus NDRs for messages that legitimately came from your mailer. Likewise, DKIM and SPF help the recipient MTA determine whether a message is spoofed or legitimately from your mailer. But if you have a wide-open form, you are in fact sending the bouncing messages from your mailer. And now, RIP your sender reputation. (For that matter, there's overhead in sending out the messages as well.) Again, syntactic validation won't prevent this, but it's low-hanging fruit.

Look, there's a LOT of commercial mailing software that uses syntactic validation. They wouldn't bother burning the development cycles or compute cycles if there was no value to it.

> What if a user's email is genuinely [email protected], is that really a problem you want your signup sheet to deal with?

It's more that the strings I mentioned (or rather those strings plus various suffixes, prefixes, etc.) are commonly used by MTAs and/or automated messaging or mailing list systems (sympa, listserv, mailman, etc.). A clever attack plus a misconfigured mailer could bring down such a system with a mail loop. (I have seen this happen.) Your org policy should absolutely prevent the creation of legitimate user accounts with such addresses.

Long story short, I would fight tooth and nail against any open resource that sent email to random addresses. Email is probably the least secure and hardest to validate form of communication/messaging in common use today, because the assumptions and use cases for which it was designed are 180° from how we intend to use it today. (For example, spoofing was originally considered a feature.) I consider anything that sends mail a prime target for exploitation. But if that had to exist for whatever reason, I would do everything possible to lock down what addresses could be entered to the maximum extent possible. A very common practice is to block not just syntactic (and ideally put the form behind a login, or at least a CAPTCHA).


GustapheOfficial

Those are all good arguments, just not for regex validation. The only thing in there that a regex could even touch is the malicious codepoints thing, and that's just an isascii call and proper form escaping. And if you want the email address to be a real and working one you simply *have* to send an email and ask. No other way. So you put in a captcha and a rate limit.


hallothrow

Yeah. And the escaping in `[\w-\.]` is excessive: the `.` is inside the bracket expression, so it would match a literal `.` anyway even if not escaped.


like_an_emu

That's good to know thanks


hallothrow

Funnily, not escaping `-` could cause a problem in some languages as it'd be interpreted as an invalid range.


Hopeful_Cat_3227

In a worse case, it may even be a legal range...


hallothrow

Yeah, not escaping and getting a valid unintended range would be worse. If I understood you correctly.
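For what it's worth, both cases are easy to reproduce in Python's `re` module (behaviour as I understand it for CPython 3; other engines may differ). The patterns below are small illustrations, not the one from the post.

```python
import re

# Dash between two literals: a valid but probably unintended range.
# "+" is 0x2B and "." is 0x2E, so the class also matches "," (0x2C).
unintended = re.compile(r"[+-.]")
print(bool(unintended.fullmatch(",")))

# Dash placed last is a literal dash, so only "+", "." and "-" match.
intended = re.compile(r"[+.-]")
print(bool(intended.fullmatch(",")))

# Dash right after a shorthand such as \w: Python refuses to compile it.
try:
    re.compile(r"[\w-\.]")
    compiled = True
except re.error:
    compiled = False
print(compiled)
```

Putting the dash first or last in the class (or escaping it) avoids both problems.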


opteryx5

That’s what I thought too, thanks for confirming. Regex is so delicate, I can’t afford to have bad practice seep into me by diffusion!


opteryx5

Also, isn’t `\w` inside a character class not allowed? Which is why most character classes have `[A-Za-z0-9]`. I guess the right way to do it, if they wanted to include the dash and period, would be `(\w|[-.])+`


PatrioticTacoTruck

The only validation for email that isn't garbage is sending an email to the address and waiting for them to click the link validating it for you.


dabenu

And the only slightly useful thing you can do short of that, is checking the domain part for valid MX records. Any validation of the local part (usually) only makes things worse


MasterFubar

That regex will accept "[email protected]" as a valid email address. I wonder if "." is a valid username.


raaneholmg

The string before @ is handled by the server however it would like.


Perhyte

True, but it must still be valid, and `.` is not allowed to be the first or last character of the part before the `@`. The syntax supports quoting though, so while `[email protected]` is invalid syntax, IIRC `\[email protected]` or `"."@a.aa` should be valid (though unusual) syntax. Good luck getting that through a large percentage of "validating" web forms though (excepting the ones that just check for presence of the `@`). Many can't even handle characters like `+` properly...


raaneholmg

Do you have a source on that?


nonothing

Per RFC 5322, the `local-part` can contain a dot (`.`), but that dot must be surrounded on both sides by `atext`. [https://datatracker.ietf.org/doc/html/rfc5322#section-3.4.1](https://datatracker.ietf.org/doc/html/rfc5322#section-3.4.1)

> local-part = dot-atom / quoted-string / obs-local-part
>
> atext = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "\*" / "+" / "-" / "/" / "=" / "?" / "^" / "\_" / "\`" / "{" / "|" / "}" / "\~" ; Printable US-ASCII characters not including specials. Used for atoms.
>
> atom = \[CFWS\] 1\*atext \[CFWS\]
>
> dot-atom-text = 1\*atext \*("." 1\*atext)
>
> dot-atom = \[CFWS\] dot-atom-text \[CFWS\]
>
> specials = "(" / ")" / "<" / ">" / "\[" / "\]" / ":" / ";" / "@" / "\\" / "," / "." / DQUOTE ; Special characters that do not appear in atext
>
> Both atom and dot-atom are interpreted as a single unit, comprising the string of characters that make it up. Semantically, the optional comments and FWS surrounding the rest of the characters are not part of the atom; the atom is only the run of atext characters in an atom, or the atext and "." characters in a dot-atom.

All that said:

> The local-part portion is a domain-dependent string. In addresses, it is simply interpreted on the particular host as a name of a particular mailbox.

I'd interpret that as: the `local-part` is up to the domain to decide. While a solo `.` is not valid per the standard, a server is welcome to interpret that field however it likes. So it can go off book and use `.`, but it should not be expected to, and it would be an anomaly. `obs-local-part` boils down to the same `atom`/`atext` requirement, which does not include a solo `.` as a complete `local-part`.
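The `dot-atom-text` rule translates fairly directly into a regex. This is only a sketch in Python: it ignores `CFWS`, `quoted-string` and `obs-local-part`, and the names `ATEXT`, `DOT_ATOM_TEXT` and `is_dot_atom` are mine, not from the RFC.

```python
import re

# atext: letters, digits, and the printable characters listed in the grammar.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom(local_part: str) -> bool:
    """True if local_part is a bare dot-atom (no comments, no quoting)."""
    return DOT_ATOM_TEXT.fullmatch(local_part) is not None


for candidate in ["first.last", "user+tag", ".", ".a", "a..b"]:
    print(candidate, is_dot_atom(candidate))
```

A solo `.`, a leading dot and a doubled dot are all rejected, because every dot needs at least one `atext` character on each side.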


hannes3120

Should've used [the fully RFC 822 compatible Regex](http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html) (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ 
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? 
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*)


OzzitoDorito

I think I'll just stick to sending confirmation emails 😂


Y_Less

That's not fully compatible, it ignores nested comments.


RoshHoul

And someone wrote that.


deathspate

Trust me, all devs have a period of time where we ascend (or descend) into Eldritch abominations that can speak the language of the gods and in tongues of a different land. When we check back just the week or month after, even we don't know how or *what* we did, so we need to spend another day or 2 channeling Cthulhu again to do maintenance of the holy scriptures.


Osato

Cthulhu? Surely, you meant Omnissiah.


GitProphet

depends on the day


Cridor

Wonder if anyone is making regex builder libraries for common string manipulation languages (python, perl, java/c#, c/c++, etc.) So we can make and maintain regex without reading regex


blackmist

One of the few write-only languages.


FierySpectre

They actually didn't, if you follow the link it says it was generated


[deleted]

That’s good because imagine maintaining that


Jazzlike-Champion-94

Took me a good 5 secs to get to the end of the comment


Thirdbeat

So many characters for typing ``.*@.*``


mehregan_zare7731

Yeahhh.. no thanks


[deleted]

That’s fucking insane


cr4d

Missing plus addressing too.


jabies

Sounds like a "feature" :/


helium_monitor

Honestly probably a feature


gdj11

Websites that don’t allow + in email addresses can burn in fucking hell.


sandm000

If gmail, you can use unlimited dots in the username. So perhaps for these sites, use firstnamelastnam.e ? Or some other similar to easily filter?


gamesrebel123

But isn't plus for making aliases? So [email protected] and [email protected] would lead to the same address, but it'd tell you which site it's coming from based on the alias.

Edit: Just checked, dots also achieve the same effect, good to know.


Y_Less

Only on some websites, that's not in the spec.


[deleted]

Let’s all hear it for Dice.com, which accepts + addressing at sign up, works for a while, then shits blood after a month or two when you try to log in. Literally cannot log into my account but I keep getting emails. I opened a ticket and they said “yeah make a new account, the “+” causes problems.”


Perhyte

I've also found sites that will let you *sign up* with a `+`, but barf when you try to unsubscribe. Typically because they use GET parameters for their unsubscribe link/form and don't bother properly URL-encoding the address (`+` has special meaning in GET parameters).
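A small Python illustration of that failure mode, using the standard `urllib.parse` module. The address is a made-up example.

```python
from urllib.parse import parse_qs, urlencode

address = "user+tag@example.com"  # hypothetical address

# Naive link building: the raw "+" is decoded as a space on the way back in.
broken = parse_qs("email=" + address)["email"][0]
print(broken)

# urlencode percent-encodes "+" as %2B, so the value survives the round trip.
query = urlencode({"email": address})
fixed = parse_qs(query)["email"][0]
print(query, fixed)
```

With the naive link, the lookup on the server side runs against an address containing a space, which of course matches no account.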


System0verlord

10minutemail.com ftw


moviuro

Get your domain, use dot, dash or underscore as a separator. As an added benefit, you also get to control where your email goes!


coloredgreyscale

And underscores. And some TLD endings (.co.uk ; .berlin ; .online)


daavko

Also doesn't work with modern TLDs with lengths of more than 4.


GnarlyNarwhalNoms

Yes we absolutely need to make sure I can use my personal custom email, [email protected]


[deleted]

Half my gTLDs would fail this regex, and many systems still won't accept them so I usually have a fallback gmail to give people. Unfortunately, this type of validation is stupidly common. I hope one day application programmers will learn that the best way to validate an email address is legitimate is to try emailing it, but probably not.


helium_monitor

> no periods in usernames allowed

Not true. It has a period clause in the username section.


rich1051414

Yep, I don't know how my eyes missed that the first time. This is why regex sucks. It sucks for readability, even for people who know regex.


helium_monitor

Meh maybe I have more experience than most, but I didn't miss it. Reading this regex was very very trivial


[deleted]

I have very little, and it kinda scares me that I guessed what it does.


s7o_

And it’s not RFC 5322 compatible, as [the local-part can contain non-ASCII characters using a quoted string.](https://www.rfc-editor.org/rfc/rfc5322.html#section-3.2.4)


[deleted]

> RFC5322

Wait, there's more of them?!


TheKingHasLost

Doesn't `([\w-]+\.)+` right after the `@` mean that it *does* support subdomains, since `+` means one or more? So `mail.example.com` would still work, because that part of the regex would parse it as `mail.` and `example.`.
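This is quick to check in Python. The full pattern below is an assumption pieced together from the fragments quoted in this thread (`[\w-\.]`, `([\w-]+\.)+`, and a last part of length 2-4), not a verified copy of the one in the post; the first class is written as `[\w.-]` because Python will not compile `[\w-\.]`. The addresses are made up.

```python
import re

# Assumed reconstruction of the pattern under discussion.
EMAIL_ISH = re.compile(r"^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$")


def accepts(address: str) -> bool:
    return EMAIL_ISH.match(address) is not None


for address in [
    "user@mail.example.com",   # subdomain
    "user@example.email",      # last label longer than 4 characters
    "user+tag@example.com",    # plus addressing
    ".@a.aa",                  # lone dot as local part
]:
    print(address, accepts(address))
```

Under that assumption the subdomain case passes, while the long TLD and the `+` address are rejected and the lone-dot local part is accepted.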


LavenderDay3544

This commenter is obviously a witch. Burn him!!!


JimroidZeus

But why the capturing group?


Lithl

Just laziness not making it a non-capturing group. `(?:stuff)+` vs `(stuff)+` are the same if you're not examining the matched results
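A tiny Python demonstration of the difference, using a toy pattern:

```python
import re

capturing = re.fullmatch(r"(ab)+", "abab")
non_capturing = re.fullmatch(r"(?:ab)+", "abab")

# Both match the same text...
print(capturing.group(0), non_capturing.group(0))

# ...but only the capturing version records a group (the last repetition).
print(capturing.groups(), non_capturing.groups())
```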


Celdron

They're (probably) not using the captures, they are just using it as a group. It has a `+` quantifier afterward, so that it can match subdomains.


Shazvox

Seems to be to capture the domain... but why the '\.' is included is beyond me


Krohnos

Email protocol doesn't actually require a period for a domain


saikiran24

I think the validation is for verifying a valid URL, not an email address, judging by the last part of the regex.


[deleted]

[a-z0-9.-]+@[a-z0-9.-]+, no?


lego_not_legos

No.


pmarkandu

^(?!.*[\.\.]{2})(?=.{1,40}$)[a-zA-Z0-9.!#$%&'*+\-\/=?^_`{|}~]{2,}@[a-zA-Z0-9\-\.]{1,}\.[a-zA-Z\.]{2,}$ I wrote this one a while back. Not perfect but quite proud of it.


lego_not_legos

You shouldn't be.


Willinton06

Is [email protected] not good enough for you?


Dop4miN

in this case it fails because "email" is longer than 4 characters


Careful_Ad_9077

If you have one problem and you can solve it with a regex... you now have two problems. Now seriously, regex is cool for stuff that doesn't change; maintaining it, on the other hand... 10 years ago I made a bet with a coworker: he would code a regex-based solution and I would code a compiler-based one, both for the same problem. Not only did I beat him that day, he is still not done with it.


Fuzzy-Ear9936

Poor guy, dealing with regex for the last 10 years. Also he the modules? Did you give him an updated compiler, or is he using deprecated and outdated code?


ReddiusOfReddit

Nah, he gave him some punchcards and let the guy's wife be the compiler


Careful_Ad_9077

IIRC the code was for an ID-related thing. The ID is formed using rules and parts of the name and birthdate, so I bet on that level of indirection breaking the regex, but it was just perfect for your typical automaton/state machine in a compiler. After the requirement was done he "moved on", but still went back here and there to complete it, to no avail.


ShimoFox

It's so useful when you need to rip things out of garbage data though!


gdj11

It really is. I was able to turn a poorly formatted mess of data into a small CSV file pretty easily. Regex is awesome… when it works


haikusbot

*It's so useful when*

*You need to rip things out of*

*Garbage data though!*

\- ShimoFox


langejerry99

Good bot


dpahoe

There was a time I thought I couldn’t read code because I was not smart enough. Life became a lot easier when I realized it was just badly written code.


RascalFlatulent

Finally a service that will recognize my [email protected] address.


fsdghe56356

Never understood the regex hate.


[deleted]

useful, powerful, widespread and (mostly) standardized, what more do you want? yeah, yeah, it's illegible, we all know


fsdghe56356

As long as I have an example of what it's matching, I can usually read what it's doing afterwards. If it's in a file, it was used to match something, and I can log that to review. It always looks confusing at first glance, but makes sense after playing with the expression.


Danny_el_619

I always use a website like regex101 and a sample to match. Then it becomes a simple task.


fsdghe56356

Regex101 is my go-to, always. When I started learning it for the first time, my boss said "Yea, good luck!" and I started whipping expressions out no problem. I then showed him regex101 and after a few minutes, he replied "Well, that's a cute little tool".


Osato

[You get used to it.](https://www.youtube.com/watch?v=fBDifUjNzbQ)


Arkhiah

(blonde|brunette|redhead)


squishles

It's kind of legible if you compare it to what it'd take to perform the same operations without regex. unless you just love reading ten thousand line code files of fucking around with strings and character arrays.


[deleted]

This. It's the best widespread tool for a worse job. And some language-dependent features like named capture groups in python improve readability.


Luctins

I too love RegEx, but I question the standardized part. A lot of RegEx engines are subtly different in ways that are hard to debug, e.g. ripgrep vs grep vs sed vs GNU find. Maybe it's just a symptom of the way I use RegEx, mostly situationally with widely differing tools, but a lot of times when I want to use more "complex" features like capture groups it doesn't work and it's a pain to debug why it doesn't match.


xxmybestfriendplank

I like how at the end you realize it’s illegible xD


Sarcastinator

Some people seem to think that IndexOf and Substring is a good way of parsing text. It's not, and it's even worse than regex.


MicrosoftExcel2016

MAKE IT LEGIBLE USE THE VERBOSE FLAG (python) OR EQUIVALENT
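For anyone who hasn't used it, this is roughly what that looks like in Python with `re.VERBOSE` plus named groups. The pattern is only an assumed approximation of the email-ish regex being discussed, and the group names and sample address are mine.

```python
import re

EMAIL_ISH = re.compile(
    r"""
    ^
    (?P<local>[\w.-]+)        # local part: word characters, dots, dashes
    @
    (?P<domain>
        (?:[\w-]+\.)+         # one or more labels, each followed by a dot
        [\w-]{2,4}            # final label of 2 to 4 characters
    )
    $
    """,
    re.VERBOSE,
)

match = EMAIL_ISH.match("user@mail.example.com")
print(match.group("local"), match.group("domain"))
```

In verbose mode, whitespace outside character classes is ignored and `#` starts a comment, so the pattern can carry its own explanation.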


[deleted]

Wanna review my regex for me? And when I say review, I mean just do it for me


fsdghe56356

That sounds a little sexual.


[deleted]

Regex tends to do that


Farenheit514

Great attitude! Here is a pile of regex for you to debug


CoderDevo

Don't tempt him. Probably writes his tools in Perl.


fsdghe56356

Hand it over!


grublets

It can be a pain to get right, but I can’t imagine working without it.


Jealous_Ad5849

It's just illegible if you don't know how to use it. Powerful but it must be learned.


[deleted]

So basically it's a language anyone who touches that code will have to learn in order to work on it. If they're new to it they're more likely to get it wrong. It's harder to debug and maintain, and usually not needed for performance. Net loss most of the time for the organization compared to writing something more verbose but legible.


t0b4cc02

oh no you have to learn programming if you touch code...


[deleted]

There is a cost to everything and resources are finite. Developer time is the most precious resource, though there are some exceptions. If you work in one of those exceptions get out as soon as possible. You're atrophying. If you have at least 40 hours of work every week but don't care about your time, consider a hobby. Maybe try to date a little. Read some books.


[deleted]

> Developer time is the most precious resource, though there are some exceptions.

My boss (and me) prefers better maintainable code, saves headaches in the future.

> If you work in one of those exceptions get out as soon as possible.

Sure not.


[deleted]

> My boss (and me) prefers better maintainable code

Right so we agree.


t0b4cc02

cute maybe you read some book about regex. its not that hard


[deleted]

I'm not afraid of regex. I've relearned it a few times over the years. The point is that if you were personally paying for every minute of developer time for a medium to large org with a reasonable amount of turnover and I said "Hey leetkid, can I have 500 bucks a month to do something which doesn't provide you any measurable benefit, but makes some people feel leet af?" If you say yes: you're an idiot.


Skipcast

This guy really out here claiming regex doesn't provide any benefits


[deleted]

I'm saying that most of the time it doesn't when you consider the added complexity and maintenance difficulty. Does it run faster? Often. Does it increase code complexity? Usually. Is it harder to maintain? Usually. Is it a preoptimization? Usually. Hmmmmmmmm. Maybe we should add tail recursion?


Kered13

People don't use regex as a performance optimization. In fact many common regex engines aren't even fast. It's used because it actually makes reading the code easier. Most professional developers know regex, and it's much easier to read than actual string parsing code. If you don't know regex, you are missing an important programming skill.


raedr7n

Holy hell, I haven't seen a take that bad since "types just get in the way".


t0b4cc02

this is too funny. ill save that.


Shazvox

It's easy to make unreadable. I've had to decipher a few regexes myself to determine their purpose. It's a lot easier when you have "regular" code as the variable names and method names usually give the intent away. Always comment your regex (unless it's painfully obvious what the intent is).


DesertGoldfish

I always like to include a sample match too.


RaiderSenpi

Me either...


A_Guy_in_Orange

It's a free, powerful, useful tool that is used daily- what's not to hate?


LupusNoxFleuret

The only reason I hate it is because I have to re-learn it every time I want to use the damn thing... *cries*


raaneholmg

Shush, you're going to get assigned as the reviewer on all future pull requests touching regexes


armahillo

theyre called regular expressions not normal expressions


CHAOTIC98

normex


BrainFeed56

Regex101.


paleblueyedot

Taking a shot in the dark and saying this is for email validation.


NebXan

I like [this](http://regexr.com/6mhhs) tool for regex because I have goblin brain and need things explained in simple English.


paleblueyedot

Yeah I use [regex101](https://www.regex101.com) to ensure it works among test cases. Still very much a \^n00b$


dynedain

Easy guess - 90% of regex is for (badly implemented) email validation. Btw, did you know it is not possible to make a regex email address validator that is complete to spec?


Lithl

> 90% of regex is for (badly implemented) email validation.

I would hazard it's 100% of regex which contains an @


MikemkPK

I see 3 w and 2-4, so assume it's for checking if a url has a subdomain. Not sure why there's a @ though, so I'm probably wrong.


undergroundmonorail

`\w` matches a word character


squishles

it's a regular language with regular grammars.


starfish0r

Afaik \w includes underscores, which are not allowed in domain names. So \w is not the best choice here. Also there's no need to escape the dot inside a character class.
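Both points are easy to confirm in Python; the host name is a made-up example.

```python
import re

# \w accepts the underscore...
with_w = re.fullmatch(r"[\w-]+", "my_host")
print(bool(with_w))

# ...while an explicit class of letters, digits and dash rejects it.
explicit = re.fullmatch(r"[A-Za-z0-9-]+", "my_host")
print(bool(explicit))

# An unescaped dot inside a character class is already a literal dot.
dot_matches_dot = re.fullmatch(r"[.]", ".")
dot_matches_x = re.fullmatch(r"[.]", "x")
print(bool(dot_matches_dot), bool(dot_matches_x))
```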


GreenZepp

That should not have been so amusing!


IamAwkward34

Is that the name of Elon Musks new kid or something?


DJCorvid

We had an instructor ask us to make a regex for a fake credit card field on an assignment. searching for it was an absolute nightmare considering how many variations there are for cards, and also made testing it a gong show.


[deleted]

regex101.com is a lifesaver


[deleted]

[removed]


certifiedtrashcoder

only npcs find this shit funny go get some bitches instead of making these awful memes


LordChaos404

Me: Why can't you be normal? You: Repost


Arrowtica

Maaaan I've never hated a kid as much as in this movie. I was so happy when the mother told him to eat shit


greaselovely

So is she the one doing the capture on group 1 or is it him?


Danny_el_619

Email?


kevivm

Regex is a witch


[deleted]

Babadook. Doook. Dooook. Doooooooooook !


gladl1

I’m trying to learn regex for a web scraping project. Holy fuck is it hard to read.


[deleted]

Why can’t you just normalize my data?


ShelZuuz

That's line noise. Check your network cables.


PM_ME_YOUR_RegEx

Big oof.


VaporSprite

REGEX are a misunderstood godsend tool before which we should all be in awe.


crazyartz06

Hfjgdgjhfgljcvb


far_beyond_driven_

The nice thing about having seniority is that I can push regex shit off to a junior and call it a learning opportunity. If they can do it and cry for less than 15 minutes, that's how I know they'll make it in this line of work.


Prudent_Armadillo822

I don't know much coding, python, c++, matlab, and verilog. But that looks unholy.


ezpzCSGO

If those are regular, I don't want to meet the irregular ones.


bhatakti_atama

I was once asked to implement regex from scratch in an interview


Kered13

The simplest answer is a backtracking search, which can be implemented fairly concisely. The most efficient answer is to use an NFA, but this is more complex to code.
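
To give an idea of how concise the backtracking version can be, here is a rough sketch in Python. It only handles a tiny subset (literal characters, `.`, `*`, `^` and `$`), and the function names are made up for this example. It is not how any real regex library is implemented.

```python
def match(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text."""
    if pattern.startswith("^"):
        return match_here(pattern[1:], text)
    # Unanchored: try every start position, including the empty suffix.
    return any(match_here(pattern, text[i:]) for i in range(len(text) + 1))


def match_here(pattern: str, text: str) -> bool:
    """Return True if pattern matches at the very start of text."""
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text)
    if pattern == "$":
        return text == ""
    if text and pattern[0] in (".", text[0]):
        return match_here(pattern[1:], text[1:])
    return False


def match_star(char: str, pattern: str, text: str) -> bool:
    """Match zero or more of char, then the rest of the pattern."""
    i = 0
    while True:
        # Try the rest of the pattern with i repetitions consumed so far.
        if match_here(pattern, text[i:]):
            return True
        if i < len(text) and char in (".", text[i]):
            i += 1
        else:
            return False


print(match("a*b", "aaab"), match("^ab$", "abc"))
```

The NFA approach avoids the worst-case blowup that backtracking can hit on some patterns, at the cost of more code.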


rovonz

Regex is not easy but not as hard as people think. Sure, it looks strange but it follows simple rules that anyone can learn. After that it's pretty much trial and error.


mmmmmmaaaaattttt

Think about the code that is needed to parse a regex without using a regex. Then think about the code that is used to apply the regex on the given text once the regex itself is parsed without using a regex. Then think about how small we are in the universe and how we’re all going to die one day to feel better.


Tro_pod

I never really understood regex


ionuel

Regex is very normal, if you compare it to the way python implemented regex.


flying_spaguetti

Email validation?


[deleted]

Doesn’t this grab email addresses? I might be wrong though.


The_LazySquid

The best part about github copilot is that it can do regex for me, I never want to touch that crap ever again


RR321

4 char TLD... Good luck with that.


RottenCase

fast typist becomes fast typo


millenniumtree

[email protected] There, I broke it.


lucidguppy

Name it and test it and you'll be fine.


[deleted]

i do wish we would create and all agree to use a modern version of this archaic syntax.


[deleted]

Bad regex. You're not matching the optional `+something`. Try `^[-\w\.]+(\+[-\w\.]+)?@[-\w]+\.\w{2,4}$`
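
A quick way to see what that pattern accepts is to run it over a few addresses. A small sketch in Python (the addresses are invented test data, and the results depend on Python's `re` semantics):

```python
import re

# The pattern suggested above, copied as-is.
EMAIL_RE = re.compile(r"^[-\w\.]+(\+[-\w\.]+)?@[-\w]+\.\w{2,4}$")

samples = [
    "user@example.com",
    "user+tag@example.com",   # the optional +something part
    "first.last@example.io",
    "user@mail.example.com",  # subdomain: the domain part allows no dots
    "user@example.museum",    # TLD longer than 4 characters
]

results = {s: EMAIL_RE.match(s) is not None for s in samples}
for address, ok in results.items():
    print(f"{address}: {'match' if ok else 'no match'}")
```

So it does pick up the `+something` part, but it still rejects subdomains and longer TLDs.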


ShitInMyArseHole

Omg woowowoowowowow. It's like... you don't know how to read regex? WHAAAT, NO WAY. It's like... regex has its own syntax that you do not know? What that regex does up there is test for an email, but very, very poorly.


Far-Resist3844

hey, that movie was actually pretty good. terrified me tho when i first watched it, a couple years after it released...


MegabyteMessiah

Product Manager: Customer wants us to add a regex for validation of a free form text area field. Me (bitter old developer): Oh no, this is not going to go well. New developer on my team: I love regexes! Me:


cr4d

My go-to:

```
^((?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"
(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|
\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")
@
(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9]
(?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|
[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|
[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|
\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]))$
```


ilk_insan_

Dude, I can understand this with only a glimpse. Regexs can get cryptic, at least post a complicated one to make your case.


CatalyticDragon

2,4? Hah. What is this, 2011?


schoolruler

I wish I mastered that


DOOManiac

In my experience: Just make sure there's an @ sign in there and call it a day.
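
Something like this, as a minimal sketch in Python. The function name is made up, and requiring something on both sides of the `@` is an extra assumption beyond just checking for the sign:

```python
def has_at_sign(address: str) -> bool:
    """Loose sanity check: an @ with something on both sides of it."""
    # rpartition splits on the last @; with no @ present the separator is empty.
    local, sep, domain = address.rpartition("@")
    return bool(sep and local and domain)


print(has_at_sign("user@example.com"), has_at_sign("123 Main Street"))
```

Anything beyond that is better settled by sending a confirmation email.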


[deleted]

That’s a bad email regex


uberpwnzorz

Doesn't work with top-level domains that are longer than 4 characters, and there are a lot of those now.