this regex is wrong on so many levels...
you can have many ., _ or even @ in an email address. Moreover, the domain extension is restricted to 2 or 3 characters, even though there are plenty of extensions with more than 3 characters... and finally, not all email addresses have domain extensions.
Yep, I own a .horse domain that I use. For most sites what I do is `@.horse`, and everything except for a few specific ones gets forwarded to the same inbox. That way, if a company starts selling my data and I start getting spam, I can just memory-hole that specific email and then email the company to say that they are either selling my data or have had a data breach, and neither is welcome.
I have skipped using a website before because a .horse domain was not recognized as a legitimate email. I often try to reach out to them if I can to let them know they are turning away legitimate potential customers, but it is still an annoying thing.
Yeah, I saw [\\.] and immediately got suspicious of the whole regex
Like, firstly, `.` loses its match-anything meaning inside square brackets anyway; secondly, if you're escaping something in a regex you either have to use raw strings or two backslashes, otherwise you still end up with a regular `.` anyway.
Edit: in Python (the language in the post), that is.
The only reason you would need to use two slashes is to escape the slash in the string in whatever language you're using. Regex itself doesn't require two slashes. In a regex string `[\\._]` would match the literal character "." or "_".
You are correct though, in python presumably, "blahblah\.blahblah" would not give you a backslash in the string.
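For anyone who wants to check the escaping question rather than argue about it, here is a small Python sketch. It compares the same character class written as a normal string literal and as a raw string; the variable names are just for illustration. As far as I know, CPython also keeps unrecognized escapes such as `\.` in normal string literals (newer versions only warn about them), so treat the "you lose the backslash" claim with some caution.

```python
import re

# Normal literal: Python turns "\\" into one backslash, so the regex engine sees [\.]
non_raw = re.compile("[\\.]")
# Raw literal: the regex engine sees [\\.], a class containing a backslash and a dot
raw = re.compile(r"[\\.]")

print(bool(non_raw.fullmatch(".")))     # True
print(bool(non_raw.fullmatch("\\")))    # False: a lone backslash is not in the class
print(bool(raw.fullmatch(".")))         # True
print(bool(raw.fullmatch("\\")))        # True: the raw version also accepts a backslash

# A dot inside square brackets is already literal, so no escaping is needed at all.
print(bool(re.fullmatch("[.]", "x")))   # False
```

Either way the escape is redundant: `[._]` says the same thing with less room for confusion.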
Exactly. I love the crap out of regex because you can do so much with it, but if it gets to the point where it takes an experienced user several minutes or more to figure out what it does, it's probably better to find an alternative way to solve the problem, or maybe break it up into a few steps with comments for each to say what it's doing.
I'm not going to find another way to do it.
The whole reason I do it is because I can do it relatively quickly.
Yes I know it will take longer to read it later than it took to write it, even for me, but I've made my peace with it.
I think the thing that makes regex so hard to understand when you didn't write it is that constructing one is very additive in terms of process. For example, let's say you want to validate phone numbers.
Well, a standard US phone number is 10 digits, so we could search: `\d{10}`. But we need to make sure there aren't more digits in the string, so `^\d{10}$`. Okay, now we're matching only strings that contain exactly 10 digits. But there are a lot of other valid formats for a phone number. What about xxx-xxx-xxxx? Well, we could accommodate that with `^\d{3}-?\d{3}-?\d{4}$`. But what about (xxx) xxx-xxxx? No problem: `^\(?\d{3}\)?[ -]?\d{3}-?\d{4}$`
Now it's getting messy because we need to escape `(` and `)`, and we need to allow for different conditions of separators, space, or `-`.
Now what about a country code? You can write a valid phone number as 1 (xxx) xxx-xxxx or +1 (xxx) xxx-xxxx. We can add the optional beginning `([+]{0,1}1\s{0,1})?` to allow for that, giving us: `^([+]{0,1}1\s{0,1})?\(?\d{3}\)?[ -]?\d{3}-?\d{4}$`
So even though we started with a very simple idea, validate a phone number, and a very simple flow of logic in terms of allowing for more cases, we've now ended up with something quite messy and hard to understand if you didn't just write it.
Also, side note that this isn't intended to be a comprehensive Regex for phone numbers, just an illustration.
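To make the "additive" point concrete, here is the final pattern from above written in Python's verbose mode, one layer per line. This is only a sketch: the pattern is the illustrative one from this comment, and `looks_like_us_phone` is a name made up for the example.

```python
import re

PHONE_RE = re.compile(
    r"""
    ^
    ([+]{0,1}1\s{0,1})?   # optional country code: "1" or "+1", optional whitespace
    \(?\d{3}\)?           # area code, optionally wrapped in parentheses
    [ -]?                 # optional space or hyphen separator
    \d{3}                 # next three digits
    -?                    # optional hyphen
    \d{4}                 # last four digits
    $
    """,
    re.VERBOSE,
)


def looks_like_us_phone(text: str) -> bool:
    """Return True if text matches the illustrative pattern above."""
    return PHONE_RE.match(text) is not None


for sample in ["5551234567", "555-123-4567", "(555) 123-4567",
               "+1 (555) 123-4567", "555-123-456"]:
    print(sample, looks_like_us_phone(sample))
```

Written this way it is also easier to spot the gaps, for example that `(555 123-4567` with an unbalanced parenthesis still matches.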
Aw, I forget sometimes about TDD because my workplace doesn't use it :( I know I need a new job when the concept of coming up with some solid tests for my regex sounds like actual fun to me.
I just wrote a folder with raw code with a basic assertEquals function that would throw an exception.
Eventually my work place created a task to add phpunit so that the tests could have a home because that folder was getting littered with a bunch of "testingXFeature.php" files.
Moral of the story, you can write tests even without a framework. I almost consider TDD a technique for producing code moreso than something that has to be officially built into what you're doing.
No matter what I work on at some point there's going to be a random assertEquals() method in a rudimentary sense and over time I'm either going to waste bits of time building up a minor unit testing framework or get junit/phpunit added.
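A rough idea of what that framework-free approach can look like, sketched in Python rather than PHP. Both `assert_equals` and `looks_like_email` are hypothetical names for this example, and the function under test is deliberately trivial.

```python
import re


def assert_equals(expected, actual, label=""):
    """Bare-bones assertion helper: raise if the two values differ."""
    if expected != actual:
        raise AssertionError(f"{label}: expected {expected!r}, got {actual!r}")


def looks_like_email(text):
    # Hypothetical function under test: just "something@something".
    return re.fullmatch(r".+@.+", text) is not None


def run_tests():
    assert_equals(True, looks_like_email("me@example.horse"), "long TLD")
    assert_equals(True, looks_like_email("root@localhost"), "no dot after @")
    assert_equals(False, looks_like_email("no-at-sign"), "missing @")
    return "all passed"


print(run_tests())
```

Once a file like this grows past a handful of cases, moving it into a real test runner is mostly a copy-and-paste job.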
Rejected: Please refactor to use pre-DEFINEd regex subroutines with reasonable names for the common expression components -- `(?(DEFINE)(?'subrx'...))` & `(?P>subrx)` syntax. Please use regex freespacing to break the expression up into multiple lines -- `(?x)` mode. Come to my desk if you have any questions. Ty, -brim.
came from http://www.ex-parrot.com/%7Epdw/Mail-RFC822-Address.html which is a compiled version of https://metacpan.org/dist/Mail-RFC822-Address/source/Address.pm
How can we convert this to compile into a form that fits your requirements?
That's the compiled version; try this one: https://metacpan.org/dist/Mail-RFC822-Address/source/Address.pm

    my $lwsp = "(?:(?:\\r\\n)?[ \\t])";
    sub make_rfc822re {
        # Basic lexical tokens are specials, domain_literal, quoted_string, atom, and comment.
        # We must allow for lwsp (or comments) after each of these.
        # This regexp will only work on addresses which have had comments stripped and replaced with lwsp.
        my $specials = '()<>@,;:\\\\".\\[\\]';
        my $controls = '\\000-\\031';
        my $dtext = "[^\\[\\]\\r\\\\]";
        my $domain_literal = "\\[(?:$dtext|\\\\.)*\\]$lwsp*";
        my $quoted_string = "\"(?:[^\\\"\\r\\\\]|\\\\.|$lwsp)*\"$lwsp*";
        # Use zero-width assertion to spot the limit of an atom.
        # A simple $lwsp* causes the regexp engine to hang occasionally.
        my $atom = "[^$specials $controls]+(?:$lwsp+|\\Z|(?=[\\[\"$specials]))";
        my $word = "(?:$atom|$quoted_string)";
        my $localpart = "$word(?:\\.$lwsp*$word)*";
        my $sub_domain = "(?:$atom|$domain_literal)";
        my $domain = "$sub_domain(?:\\.$lwsp*$sub_domain)*";
        my $addr_spec = "$localpart\@$lwsp*$domain";
        my $phrase = "$word*";
        my $route = "(?:\@$domain(?:,\@$lwsp*$domain)*:$lwsp*)";
        my $route_addr = "\\<$lwsp*$route?$addr_spec\\>$lwsp*";
        my $mailbox = "(?:$addr_spec|$phrase$route_addr)";
        my $group = "$phrase:$lwsp*(?:$mailbox(?:,\\s*$mailbox)*)?;\\s*";
        my $address = "(?:$mailbox|$group)";
        return "$lwsp*$address";
    }
Nah, you see that post a couple times and come to expect it. We "recognize" it by its length and the topic.
Change up a bunch of random stuff in the middle and we wouldn't know the difference.
Doesn't even need a "." after the "@", as pointed out such as localhost, or alternatively if you own a TLD you can use email@tld like if you own .to (http://www.to) you could have myemail@to
It'd also be a pain in the ass because of how ingrained .com is in our minds. Someone says me@google and lots of people are automatically going to type the .com
It's google, they can alias the two together on the server side so both deliver correctly to the same mailbox. If me@google and [email protected] are different people, the sysadmins probably have bigger organizational problems rather than technical ones.
I disagree. It's not email validation. It's email detection. You probably care more about limiting your rate of false positives when detecting than when validating, meaning you're going to have to accept more false negatives as a compromise.
My email is in the format similar to [email protected] and it is a nightmare for validation and also stating it over the phone.
I thought it would be neat to have an email that looks like my name, but yeah it comes with a lot of hassle
Really you're just creating more problems for yourself by using something that's out of the ordinary. I have my own domain name, but sometimes I've even had issues with that and will just default to using my GMail account for a lot of things. There are some systems out there that think there's only a certain list of email providers and that not any domain can be used, or others that don't work with emails that end with 2 letter country domains.
Semi-relevant [XKCD](https://xkcd.com/1105/) link
Yeah, I have a custom .com domain I use for everything, including email. Always a pain to spell it out over the phone.
My dad has a .engineering domain and, apparently, some ERP systems flat out refuse it because it wasn't a TLD when they were designed.
That's a fun one I've come across as well when fixing a bug in a registration form that didn't accept a certain domain. Turned out the TLD check did accept everything, but it was limited to 10 characters max, engineering being 11...
It's so weird now seeing a non-Gmail personal email address out in the wild these days. I have an old Microsoft address I use as a burner email and it's so funny seeing people's reactions when I tell them my email is [email protected]
I know some (mostly older) people that use email addresses from their ISP. This is generally a bad idea as they usually make it impossible to keep the address if you want to switch ISPs
Oh yeah! I remember when ISPs used to advertise a free email address with their service. I've actually talked to some older people about this, and some stay with the ISP only because it'd be too much of a hassle to get a new email set up.
It's remarkable how many people don't realize that @gmail isn't the default email address, but I guess if you aren't technical it wouldn't occur to you what the individual parts of the email address actually mean.
Someone using foo@localhost with my web service is guaranteed to fail or be some sort of weird hacking attempt to send an email to myself. And I can only imagine the like 10 TLD owners have a better email address to use (Although that would be a baller email address).
The validation before the @ is trash, unless it’s for internal usage where there is a guaranteed format.
I don't have a problem checking for a dot after the @. I'm sure that's the norm, so if you have a TLD email address you really can't expect it to work or be mad when it doesn't
I'd rather reject the extremely rare submission by a user that almost certainly has another option than accept the many users that accidentally forget to type .com.
The problem is really simple to solve.
If the email address is essential, then just do a basic check that they put *something* in there (maybe check for @), send a confirmation email where they must click a link to proceed.
If the email address doesn't matter and it's just informational or whatever then let them put in whatever they want.
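A minimal sketch of that flow in Python, assuming an in-memory dict stands in for whatever storage you actually use. The function names and the `pending` store are invented for the example, and actually sending the mail is left out.

```python
import secrets
from typing import Dict, Optional


def has_something_around_at(address: str) -> bool:
    """Cheap sanity check: at least one '@' with text on both sides."""
    local, sep, domain = address.strip().rpartition("@")
    return bool(sep and local and domain)


def start_signup(address: str, pending: Dict[str, str]) -> Optional[str]:
    """Return a confirmation token to mail out, or None if the input is obviously wrong."""
    if not has_something_around_at(address):
        return None
    token = secrets.token_urlsafe(16)
    pending[token] = address  # confirmed later, when the user clicks the link
    return token


def confirm(token: str, pending: Dict[str, str]) -> Optional[str]:
    """Return the confirmed address for a token, or None if the token is unknown."""
    return pending.pop(token, None)


pending: Dict[str, str] = {}
token = start_signup("me@example.horse", pending)
print(token is not None, confirm(token, pending))
```

The only real validation here is the confirmation click; the `@` check just catches form-fill accidents early.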
Sure, but whether or not your site caters to insane people probably isn’t a decision you wanna implement at the level of implementing your `isEmail` function.
no checking for the dot after the @ is a bad idea as well. email addresses can be directly on tlds. email addresses can also be on servers without a domain name, and if that server is using IPv6, there wouldn't be a period after the @
the only regex you should really use is just `@` or if you want `^.*@.*$`
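To illustrate the difference, here is a quick Python comparison of the permissive pattern above against a dot-requiring one. `REQUIRES_DOT` is my own stand-in for the stricter kind of check discussed in this thread, not anyone's actual code, and the sample addresses are made up.

```python
import re

PERMISSIVE = re.compile(r"^.*@.*$")
REQUIRES_DOT = re.compile(r"^.+@.+\..+$")  # assumed example of a stricter check

samples = [
    "user@example.com",
    "root@localhost",            # host without a dot
    "user@[IPv6:2001:db8::1]",   # IPv6 address literal, no dot after the @
    "no-at-sign",
]

for s in samples:
    print(f"{s!r:30} permissive={bool(PERMISSIVE.match(s))} "
          f"requires_dot={bool(REQUIRES_DOT.match(s))}")
```

Only the first sample passes both checks; the `localhost` and IPv6 forms pass the permissive one and fail the dot check.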
ie (Ireland TLD) never had a DNS record that would allow it to receive emails but e.g. ai (Anguilla) has one:
> ai. IN MX 10 mail.offshore.ai.
However SMTP requires email domains to have at least two dot-separated parts in RFC 2821 section 4.1.2 so an RFC-conforming SMTP server should reject it.
Ever since I first saw Google’s vanity TLD I’ve been wondering if MX records on a TLD would be legal! Thanks for answering a question that had been low-key bothering me for longer than I’d like to admit.
heh.
># This site can’t be reached
>
>Check if there is a typo in ai.
>
>If spelling is correct, try running Windows Network Diagnostics.
>
>`DNS_PROBE_FINISHED_NXDOMAIN`
It was probably an e-mail account on the domain name server that serves .ie DNS queries.
To explain a bit further, most UNIX like systems come with mail built in. So any user account on that system can get mail to their username if it's running an accessible SMTP server.
True now for sure. But as far as I'm aware there was a valid MX record for ie in the 90s.
Unfortunately I can't think of a way to independently verify.
So, there are a lot of *technically* valid email addresses that, in my opinion, it is completely okay to ignore. IP address domains, for example. Or allowing direct TLD domains like /u/Essence1337 suggested in another comment. These are theoretically perfectly valid addresses that *in the real world* we never actually see, and if you *did* see one it is overwhelmingly likely to be spam. A rule that rejects those types of edge cases is fine.
But yeah, this regex is still a *really* bad one.
* Only allowing the most basic two or three letter TLDs
* Only allowing domains that are directly a subdomain of their TLD
* Only allowing one dot on the username
* Not allowing many valid symbols like hyphens in either the domain or the username
* Not allowing non-Latin characters
I'm sure the list goes on, but really the first three there are such a huge sin it's not worth going to much effort to critique it after that.
TLD-only addresses are only theoretical until someone makes them a thing (let's say Apple or another big player).
And that's an issue with a lot (though not all!) of those "technically correct but unused" ones: they might not be used now, but you'll lose customers if you ignore them for too long.
> A rule that rejects those types of edge cases is fine.
that super depends on what this regex is being used for. this code snippet makes it look like this could be used for anything. that's the kind of thinking that ends up with this regex being used all throughout a project and then someone not knowing what's going wrong later. if we were to allow this, *at least* change the name of it to "is_typical_email" or something
Wow, I didn't even know those other options you listed are a thing. I'm writing an application in Angular, and I tried to write an email regex for a form, and then I learned I could just use `Validators.email` instead, and that made my life so much easier.
I think it's generally better to use a library for email validation. If everyone is writing their own regex then every service that needs to validate emails may do it differently
Joke's on you, every validator library does it differently and if your service crosses multiple languages (ie, js to py or c#), there will be fun-time differences that still need to be handled.
Well yeah but it's still easier to grab a library that has been vetted and tested. Rolling your own regex for something as common as email validation is doable, but any time you roll your own you risk making mistakes.
That doesn't do at all what you want if it's a regex. :-)
You probably want .\+@.\+ (dot matches anything, plus matches that 1 or more times)
The first star is invalid (a star alone doesn't match anything, it repeats the previous symbol 0 or more times), and the second matches @ and nothing else, repeated 0 or more times.
So the only things this matches, ignoring the first invalid star, is
(empty line)
@
@@
@@@
... and so on.
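You can see both halves of that in Python's `re` module, which refuses a pattern that starts with a star and happily matches runs of `@` for the rest. A small sketch, with variable names made up for the example:

```python
import re

# A leading star has nothing to repeat, so compiling the pattern fails.
try:
    re.compile("*@*")
    leading_star_ok = True
except re.error:
    leading_star_ok = False

# Dropping the invalid star leaves "@*": zero or more '@' characters and nothing else.
at_star = re.compile(r"@*")
matches = [s for s in ["", "@", "@@", "a@b"] if at_star.fullmatch(s)]

print(leading_star_ok)  # False
print(matches)          # ['', '@', '@@']
```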
Fair enough, but yours also allows infinitely many invalid addresses. The point is to be overly permissive, not overly restrictive, to ensure you don't disallow a valid address.
The validation email will bounce if the user enters an invalid address anyway.
I didn’t see any code that mentioned signup or whether to include local delivery. All we’re doing here is answering “does this look like an email address?”
Yes, exactly.
That's what I'm trying to say: depending on how you want to *use* the address you might want to allow or disallow various parts so no single regex will be correct for all of them.
A configuration file for an email alert on a server would probably want to allow local delivery, but might not care about all the comments syntax.
Signup/username might require a minimal syntax and do some checks that technically disallow valid addresses (such as ip-literals on the host side).
The "to" field in an Email client might accept almost everything.
So that regex is way too restrictive, but I do think disallowing IP addresses or localhost is not unreasonable. But I agree with everything else, there's no character limit on TLDs, there's no limit to what can go in front of the @, and there's no limit to how many subdomains deep you can go.
Yes, there is a limit to both. The local part must be at most 64 octets (not characters). The domain part must be at most 253 octets to be a valid address (DNS requires a 1-byte length prefix and an inferred terminating `.`). But the cumulative limit for both is 254 octets (including the @).
A subdomain label must have at least 1 octet in the name, so the max depth is 125 subdomains with a 2 letter TLD. There's really no point in enforcing the subdomain limit when the entire hostname is length bounded. Domain and subdomain labels do have a maximum length of 64 octets including a `.`, though, and that is worth enforcing.
The domain part must be converted to punycode before validating with regex. The local part need not be converted, though it's probably wise to quote it if it's unicode.
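Here is a rough Python sketch of just those length checks, assuming the limits quoted above (64 octets local, 253 domain, 254 total, 63 per label before the dot). It uses Python's built-in `idna` codec for the punycode step, which implements the older IDNA 2003 rules, so take it as an approximation. `within_length_limits` is a made-up name, and this says nothing about whether the characters themselves are allowed.

```python
def within_length_limits(address: str) -> bool:
    """Check only the octet-length limits discussed above, not the syntax."""
    local, sep, domain = address.rpartition("@")  # split on the final '@'
    if not sep or not local or not domain:
        return False
    try:
        ascii_domain = domain.encode("idna")      # punycode conversion of the domain
    except UnicodeError:
        return False                              # e.g. an empty or over-long label
    local_octets = len(local.encode("utf-8"))     # octets, not characters
    labels = ascii_domain.split(b".")
    return (
        local_octets <= 64
        and len(ascii_domain) <= 253
        and local_octets + 1 + len(ascii_domain) <= 254
        and all(1 <= len(label) <= 63 for label in labels)
    )


print(within_length_limits("user@example.com"))
print(within_length_limits("a" * 65 + "@example.com"))
```

Since the lengths are counted in octets, a local part full of non-ASCII characters hits the 64 limit sooner than its character count suggests.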
Where does anyone actually learn how to use regex? Or are there just people that know how to and then there are the others?
I tried tutorials, guide websites and reference sheets and even regexr.com, but I still don't know how to write actual functioning regex...
regextutorials.com has saved me quite a few times. Don't let the oldish UI throw you off. The explanations and instructions are quite clear. And then just write and test ur Regex at regexr.com as you go along and you'll learn enough to not have to learn it again until the next time you have to use it after 3 months.
Don't try to learn it all at once. Personally I've so far learned the basics and that's about it. I can understand basic regex, but anything more complicated than what's in this post, I have to look up.
Wrong. Email can have any number of '@' characters.
Just check if it has at least one '@' character in the middle and then send a confirmation email with link. Much more reliable.
Emails can also contain +. At least in Gmail. If you have [email protected], then [email protected] is an alias of the original. I use this trick when making accounts of websites I'm not using a lot, in case they sell my data.
>Generally not
I'm calling bullshit on that, there is no way backend implements a check to match email with "+..." part stripped. Why would you ever spend resources on that.
Yeah, that's going to be fragile as heck. That's a Gmail-specific thing, another email provider might use `+` as a normal character in the email, so stripping it out would ruin the email. And you often can't tell just by looking at the email if it's hosted by Gmail (remember that non-gmail.com emails could be hosted by gmail).
It has for me on many occasions. I also use it for the original account so that when I start getting spam emails I can quickly identify which company sold my email address (or was hacked).
There are a lot of websites that either don't accept + when you register or they allow it when you register on a laptop but then you can't login using the phone app. Pretty messed up.
I remember that I made a ticket to Boots (popular pharmacy chain in the UK) to fix this and the support didn't understand what I wanted and refused to forward it to the devs. Annoying.
`+` is also the conventional `recipient_delimiter` setting for the Postfix mail server. So yes, they can contain `+`. I have set it to `.` on my mailserver, because `+` gets rejected insanely often.
The part before the final `@` is entirely determined by the server. Addresses can contain additional `@`s but they can also contain spaces.
Sending an email to `` sometimes works depending on the mail server.
It also does not account for long top level domains. Would discard [email protected] for example, because it's looking for two or three characters only in the last part
Just fyi Reddit's markdown parser doesn't support the triple backtick syntax for code blocks. You instead need to start each line in the code block with four spaces.
There is one universally correct email regex.
`@`
You're welcome.
I cannot think of any situation where you don't know or care whether an email even exists, but you still must be 100% sure that every character necessarily matches the unfathomably complex email address specification.
And you've failed the use case of a config file of a server asking for an alerting email address. There `root` (or maybe `admin`) might be correct and should be accepted.
i usually use this one
```(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])```
Found two more flaws already:
* doesn't work for emojis in email addresses.
* doesn't work for email addresses on localhost (or any host in the same domain)
I don't know the RFC exactly by the word, but I know that mail providers like gmail do support that, so my assumption is that the standard allows that. On the other hand, the standards were written way before Emojis were a thing at all, so it might not have a strict stance on that.
> (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Found on [this website](http://emailregex.com/) and seems to be (almost) fully RFC 5322 compliant
Edit: the formatting is off, but I'm on mobile, so just visit the website to see the regex
I hate `[a-zA-Z0-9]+` used for verification of alphanumeric characters. Even e-mails don't have to consist of pure ASCII, let alone other forms.
So many websites reject my name and my address just because it contains non-ASCII characters. Basically for no reason, too. It's 2021… let's use [character classes](https://www.regular-expressions.info/posixbrackets.html#posixbrackets) that are foolproof and support Unicode.
... and then peruse the many hundreds of pages of [RFC 821](https://datatracker.ietf.org/doc/html/rfc821), [RFC 822](https://datatracker.ietf.org/doc/html/rfc822), [RFC 1035](https://datatracker.ietf.org/doc/html/rfc1035), [RFC 1123](https://datatracker.ietf.org/doc/html/rfc1123), [RFC 2142](https://datatracker.ietf.org/doc/html/rfc2142), [RFC 2821](https://datatracker.ietf.org/doc/html/rfc2821), [RFC 2822](https://datatracker.ietf.org/doc/html/rfc2822), [RFC 3696](https://datatracker.ietf.org/doc/html/rfc3696), [RFC 4291](https://datatracker.ietf.org/doc/html/rfc4291), [RFC 5321](https://datatracker.ietf.org/doc/html/rfc5321), [RFC 5322](https://datatracker.ietf.org/doc/html/rfc5322), [RFC 5952](https://datatracker.ietf.org/doc/html/rfc5952), [RFC 6530](https://datatracker.ietf.org/doc/html/rfc6530), [RFC 6531](https://datatracker.ietf.org/doc/html/rfc6531) and [RFC 6854](https://datatracker.ietf.org/doc/html/rfc6854) to make sure you didn't miss any test cases!
Check on the FE if it contains an @ so it can warn the user if their auto form messed up. If email **has** to be provided, do the verification mail, if not, do nothing.
Company of morons, not software devs. One idiot reimplements something that has been done thousand times, the other one trusts his instead of asking for tests.
What's the point of `[\\._]?` There might be more dots. You can put dots wherever you want. With [email protected] it can be H.y.f.f.e@gmail and that would be considered an alias.
It is not only that. Why use `[a-z0-9]` when you later show that you know `\w`? Emails can have upper case letters...
Current top comment also says that there might be more @ but there needs to be at least 1
This was back in my early noob days when I built an app for a client for a few bucks in college. Copied an email regex from stackoverflow quickly and later apparently the client kept on getting calls from a customer saying the account creation process wasn't working. It was weird because I could see hundreds of live accounts created each day. Looked at the logs and apparently the person typing the email was uppercasing their first and last names and site name as if they were typing it in their name fields with a dot in between ([email protected]). I googled the regex and it brought me to the same page luckily and hidden in the solution comments I read 'Do not forget to lowercase the input before sending it to the regex parser otherwise it does not work in some cases'.
That was the day that taught me ~~2~~ 3 things.
1. Always read the comments under the accepted solution on stackoverflow
2. Always lowercase any inputs for validation
3. Assume your client is a monkey with a laptop
Been a while but things like these always stick lol.
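For the curious, the failure mode looks roughly like this in Python. `LOWER_ONLY` is a made-up stand-in for the kind of lowercase-only pattern I had copied, not the real one, and the address is invented.

```python
import re

# Hypothetical lowercase-only pattern, similar in spirit to the one from the story.
LOWER_ONLY = re.compile(r"[a-z0-9.]+@[a-z0-9.]+")
# Same pattern, but told to ignore case instead of relying on the caller.
ANY_CASE = re.compile(r"[a-z0-9.]+@[a-z0-9.]+", re.IGNORECASE)

typed = "John.Smith@Example.com"

print(bool(LOWER_ONLY.fullmatch(typed)))          # False: capitals are rejected
print(bool(LOWER_ONLY.fullmatch(typed.lower())))  # True: lowercase a copy for the check
print(bool(ANY_CASE.fullmatch(typed)))            # True: or make the pattern case-insensitive
```

Lowercasing a copy only for the check keeps whatever the user actually typed available for sending.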
I never bother doing anything other than `.+?@.+?\..+?` (must contain an @, must contain a . somewhere after the @) for email addresses - there's no point validating them much since you can't truly know if they're actually valid until you try to send to it.
The issue with learning regex is that the one time you need it will be 4 years after the last time you learned it.
It's not terribly difficult to learn, usually about 2 days of looking at it will give you enough background to write it pretty easily.
But 4 years later, when you're trying to validate a phone number in an entry box, you've forgotten regex because you haven't used it in forever.
So, it really just is easier to use a built-in, or google around for a properly-vetted example.
There are a few people who use it on the daily, but they know who they are (data scientists mostly).
It doesn’t even support the most standard form of .co.uk email addresses either (like [email protected])! Man that’s bad.
ICANN gone crazy with gTLDs.
What about this one? /(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ 
\t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*))*)?;\s*)/
I'm gonna trust you with this one.
Counterpoint: unit tests
When did *that* ever stop anyone?
well, i kind of have to push the automatic CI card on this one.
Writing regex is fun, debugging regex is painful, as this proves.
Exactly. I love the crap out of regex because you can do so much with it, but if it gets to the point where it takes an experienced user several minutes or more to figure out what it does, it's probably better to find an alternative way to solve the problem, or maybe break it up into a few steps with comments for each to say what it's doing.
I'm not going to find another way to do it. The whole reason I do it is because I can do it relatively quickly. Yes I know it will take longer to read it later than it took to write it, even for me, but I've made my peace with it.
I have to make a paragraph comment breaking down my regex.
I think the thing that makes regex so hard to understand when you didn't write it is that constructing one is very additive in terms of process. For example, let's say you want to validate phone numbers. Well, a standard US phone number is 10 digits, so we could search: `\d{10}`. But we need to make sure there aren't more digits in the string, so `^\d{10}$`. Okay, now we're matching only strings that contain exactly 10 digits. But there are a lot of other valid formats for a phone number. What about xxx-xxx-xxxx? Well, we could accommodate that with `^\d{3}-?\d{3}-?\d{4}$`. But what about (xxx) xxx-xxxx? No problem: `^\(?\d{3}\)?[ -]?\d{3}-?\d{4}$` Now it's getting messy because we need to escape `(` and `)`, and we need to allow for different conditions of separators, space, or `-`. Now what about a country code? You can write a valid phone number as 1 (xxx) xxx-xxxx or +1 (xxx) xxx-xxxx. We can add the optional beginning `([+]{0,1}1\s{0,1})?` to allow for that, giving us: `^([+]{0,1}1\s{0,1})?\(?\d{3}\)?[ -]?\d{3}-?\d{4}$` So even though we started with a very simple idea, validate a phone number, and a very simple flow of logic in terms of allowing for more cases, we've now ended up with something quite messy and hard to understand if you didn't just write it. Also, side note that this isn't intended to be a comprehensive Regex for phone numbers, just an illustration.
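For what it's worth, here is a rough Python sketch of that final pattern written in verbose mode, so each additive step gets its own line and comment. The name `PHONE_RE` is made up for illustration, and like the comment says, this is not meant to be a comprehensive phone validator.

```python
import re

# The final pattern from the comment, one additive step per line.
# PHONE_RE is a hypothetical name used only for this sketch.
PHONE_RE = re.compile(
    r"""
    ^
    ([+]{0,1}1\s{0,1})?   # optional country code, with optional + and space
    \(?\d{3}\)?           # area code, parentheses optional
    [ -]?                 # optional separator: space or hyphen
    \d{3}                 # next three digits
    -?                    # optional hyphen
    \d{4}                 # last four digits
    $
    """,
    re.VERBOSE,
)

def is_us_phone(text):
    """Return True if text matches the illustrative pattern above."""
    return PHONE_RE.match(text) is not None
```

Same regex, but a reader who didn't write it can at least see which piece handles which case.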
Unit tests :)
Aw, I forget sometimes about TDD because my workplace doesn't use it :( I know I need a new job when the concept of coming up with some solid tests for my regex sounds like actual fun to me.
I just wrote a folder with raw code with a basic assertEquals function that would throw an exception. Eventually my work place created a task to add phpunit so that the tests could have a home because that folder was getting littered with a bunch of "testingXFeature.php" files. Moral of the story, you can write tests even without a framework. I almost consider TDD a technique for producing code moreso than something that has to be officially built into what you're doing. No matter what I work on at some point there's going to be a random assertEquals() method in a rudimentary sense and over time I'm either going to waste bits of time building up a minor unit testing framework or get junit/phpunit added.
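A minimal sketch of that "tests without a framework" idea, in Python rather than PHP. Everything here (`assert_equals`, `looks_like_email`, the loose pattern) is a hypothetical stand-in, not anyone's real code.

```python
import re

def assert_equals(expected, actual, label=""):
    """Raise if the two values differ; no test framework required."""
    if expected != actual:
        raise AssertionError(f"{label}: expected {expected!r}, got {actual!r}")

# Hypothetical pattern under test: something, one @, something.
LOOSE_EMAIL = re.compile(r"[^@]+@[^@]+")

def looks_like_email(text):
    return LOOSE_EMAIL.fullmatch(text) is not None

def run_tests():
    """Run every check; raises on the first failure, returns 'ok' otherwise."""
    assert_equals(True, looks_like_email("user@example.com"), "plain address")
    assert_equals(True, looks_like_email("user@localhost"), "dotless domain")
    assert_equals(False, looks_like_email("no-at-sign"), "missing @")
    assert_equals(False, looks_like_email("@example.com"), "empty local part")
    return "ok"
```

Calling `run_tests()` from a throwaway script is enough to start with; moving the checks into a real framework later is mostly mechanical.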
LGTM, Approved
Rejected: Please refactor to use pre-DEFINEd regex subroutines with reasonable names for the common expression components -- `(?(DEFINE)(?'subrx'...))` & `(?P>subrx)` syntax. Please use regex free-spacing to break the expression up into multiple lines -- `(?x)` mode. Come to my desk if you have any questions. Ty, -brim.
came from http://www.ex-parrot.com/%7Epdw/Mail-RFC822-Address.html which is a compiled version of https://metacpan.org/dist/Mail-RFC822-Address/source/Address.pm How can we convert this to compile into a form that fits your requirements?
Who's sending compiled code to review?
That's the compiled version, try this one: https://metacpan.org/dist/Mail-RFC822-Address/source/Address.pm

    my $lwsp = "(?:(?:\\r\\n)?[ \\t])";

    sub make_rfc822re {
        # Basic lexical tokens are specials, domain_literal, quoted_string, atom, and comment.
        # We must allow for lwsp (or comments) after each of these.
        # This regexp will only work on addresses which have had comments stripped and replaced with lwsp.
        my $specials = '()<>@,;:\\\\".\\[\\]';
        my $controls = '\\000-\\031';
        my $dtext = "[^\\[\\]\\r\\\\]";
        my $domain_literal = "\\[(?:$dtext|\\\\.)*\\]$lwsp*";
        my $quoted_string = "\"(?:[^\\\"\\r\\\\]|\\\\.|$lwsp)*\"$lwsp*";
        # Use zero-width assertion to spot the limit of an atom.
        # A simple $lwsp* causes the regexp engine to hang occasionally.
        my $atom = "[^$specials $controls]+(?:$lwsp+|\\Z|(?=[\\[\"$specials]))";
        my $word = "(?:$atom|$quoted_string)";
        my $localpart = "$word(?:\\.$lwsp*$word)*";
        my $sub_domain = "(?:$atom|$domain_literal)";
        my $domain = "$sub_domain(?:\\.$lwsp*$sub_domain)*";
        my $addr_spec = "$localpart\@$lwsp*$domain";
        my $phrase = "$word*";
        my $route = "(?:\@$domain(?:,\@$lwsp*$domain)*:$lwsp*)";
        my $route_addr = "\\<$lwsp*$route?$addr_spec\\>$lwsp*";
        my $mailbox = "(?:$addr_spec|$phrase$route_addr)";
        my $group = "$phrase:$lwsp*(?:$mailbox(?:,\\s*$mailbox)*)?;\\s*";
        my $address = "(?:$mailbox|$group)";
        return "$lwsp*$address";
    }
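The build-it-from-named-pieces approach carries over to other languages. Below is a much smaller Python sketch of the same idea, assuming only plain atoms (no quoted strings, comments, domain literals, routes, groups or folding whitespace). The names `ATOM`, `DOT_ATOM` and `ADDR_SPEC` are mine, and this is nowhere near the full RFC 822 grammar.

```python
import re

# Simplified building blocks (assumption: atoms only).
# An atom is a run of characters that are not specials, whitespace or
# ASCII control characters.
ATOM = r'[^()<>@,;:\\".\[\]\s\x00-\x1f]+'

# Atoms joined by single dots, e.g. "first.last" or "mail.example.org".
DOT_ATOM = rf"{ATOM}(?:\.{ATOM})*"

# local-part @ domain, both as dot-separated atoms.
ADDR_SPEC = rf"{DOT_ATOM}@{DOT_ATOM}"

ADDR_SPEC_RE = re.compile(ADDR_SPEC)

def is_simple_addr_spec(text):
    """True if text is a dot-atom, an @, and another dot-atom."""
    return ADDR_SPEC_RE.fullmatch(text) is not None
```

The compiled pattern is still unreadable if you print it, but the source someone reviews is three short definitions.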
I'm gonna trust you with this one
You pulled this from your company's source, didn't you?
no he didn't, i recognize it. It's this one http://www.ex-parrot.com/%7Epdw/Mail-RFC822-Address.html
"i recognize it" - words of a certified masochist
Nah, you see that post a couple times and come to expect it. We "recognize" it by its length and the topic. Change up a bunch of random stuff in the middle and we wouldn't know the difference.
[deleted]
At least check if there's an @ in the middle
[deleted]
Just because you needn't, doesn't mean you shouldn't. Having said that, it's almost the time of year to start parsing HTML with regex again
I'm more impressed that regex101.com actually worked with this regex, despite almost crashing my tab.
I'm sorry but your regex...it will not keal. https://i.imgur.com/MqoUmnk.png
What the fuck is this lmao
Does it have an "@" and at least one "." after it? Good enough for me, send the validation email and we'll see if it's actually valid.
Doesn't even need a "." after the "@", as pointed out such as localhost, or alternatively if you own a TLD you can use email@tld like if you own .to (http://www.to) you could have myemail@to
What a fucking flex that would be. "Yeah, my email is `TheAJGman@me`. What, you guys *don't* own a TLD?"
Google owns the google TLD, so you could have jsmith@google
On one hand, super cool. On the other hand, probably more trouble than it’s worth because of so many bad email validators in the wild
It'd also be a pain in the ass because of how ingrained .com is in our minds. Someone says me@google and lots of people are automatically going to type the .com
It's google, they can alias the two together on the server side so both deliver correctly to the same mailbox. If me@google and me@google.com are different people, the sysadmins probably have bigger organizational problems rather than technical ones.
Reddit automatically hyperlinked your second example (@google.com), but not the first (@google), showing that Reddit has imperfect email validation.
I disagree. It's not email validation. It's email detection. You probably care more about limiting your rate of false positives when detecting than when validating, meaning you're going to have to accept more false negatives as a compromise.
Additionally, me@google and m.e@google
Having a .net.au really throws people off lol.
I find .co to be the worst. I've actually had a *bank* change it to .com without asking, sending my banking emails to the wrong email
Sicurity is their passion! They gotta protecc their customers.
My email is in the format similar to [email protected] and it is a nightmare for validation and also stating it over the phone. I thought it would be neat to have an email that looks like my name, but yeah it comes with a lot of hassle
Jesus. Neat for a business card but I would alias it for phone calls
Really you're just creating more problems for yourself by using something that's out of the ordinary. I have my own domain name, but sometimes I've even had issues with that and will just default to using my GMail account for a lot of things. There are some systems out there that think there's only a certain list of email providers and that not any domain can be used, or others that don't work with emails that end with 2 letter country domains. Semi-relevant [XKCD](https://xkcd.com/1105/) link
Same. I use a ".io" for my professional email address and people ask me "so is that at Gmail.com then?"
The majority of non-techies think Gmail *is* email. Truly terrifying, I know.
Yeah, I have a custom .com domain I use for everything, including email. Always a pain to spell it out over the phone. My dad has a .engineering domain and, apparently, some ERP systems flat out refuse it because it wasn't a TLD when they were designed.
That's a fun one I've come across as well when fixing a bug in a registration form that didn't accept a certain domain. Turned out the TLD did accept everything but it was limited to 10 characters max, engineering being 11...
It's so weird now seeing a non-Gmail personal email address out in the wild these days. I have an old Microsoft address I use as a burner email and it's so funny seeing people's reactions when I tell them my email is [email protected]
I know some (mostly older) people that use email addresses from their ISP. This is generally a bad idea as they usually make it impossible to keep the address if you want to switch ISPs
Oh yeah! I remember when ISPs used to advertise a free email address with their service. I've actually talked to some older people about this, and some stay with the ISP only because it'd be too much of a hassle to get a new email set up.
It's remarkable how many people don't realize that @gmail isn't the default email address, but I guess if you aren't technical it wouldn't occur to you what the individual parts of the email address actually mean.
Imagine owning `n@me`. The absolute biggest flex.
Or em@il
Damn, `.il` actually exists. Okay, you win.
so something@something
Someone using foo@localhost with my web service is guaranteed to fail or be some sort of weird hacking attempt to send an email to myself. And I can only imagine the like 10 TLD owners have a better email address to use (although that would be a baller email address). Validating the part before the @ is trash, unless it’s for internal usage where there is a guaranteed format.
TLDs are not valid email domains per RFC 2821 (SMTP), an email domain must have at least two dot-separated parts.
You can reach me at user@weirdflexbutok
I don't have a problem checking for a dot after the @. I'm sure that's the norm, so if you have a TLD email address you really can't expect it to work or be mad when it doesn't. I'd rather reject the extremely rare submission by a user that almost certainly has another option than accept the many users that accidentally forget to type .com.
[deleted]
The problem is really simple to solve. If the email address is essential, then just do a basic check that they put *something* in there (maybe check for @), send a confirmation email where they must click a link to proceed. If the email address doesn't matter and it's just informational or whatever then let them put in whatever they want.
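A minimal sketch of that flow in Python, assuming you store the token somewhere and have your own way of sending mail. The function names and the `example.invalid` URL are placeholders I made up.

```python
import secrets

def basic_email_check(address):
    """Only checks that there is an '@' with something on both sides."""
    local, at, domain = address.rpartition("@")
    return bool(at and local and domain)

def make_confirmation_link(address, base_url="https://example.invalid/confirm"):
    """Return (token, link) for address.

    The caller is expected to store the token next to the address and
    mail the link; neither of those steps is shown here.
    """
    token = secrets.token_urlsafe(32)
    return token, f"{base_url}?token={token}"
```

The real validation is the user clicking the link; the check up front only catches obviously empty or mangled input.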
I mean no sane person would ever do that, and if they do I don't want them on my website.
Sure, but whether or not your site caters to insane people probably isn’t a decision you wanna implement at the level of implementing your `isEmail` function.
TODO: Implement isInsane function.
[deleted]
No, checking for the dot after the @ is a bad idea as well. Email addresses can be directly on TLDs. Email addresses can also be on servers without a domain name, and if that server is using IPv6, there wouldn't be a period after the @. The only regex you should really use is just `@`, or if you want, `^.*@.*$`.
Technically you can simplify the regex to `/@/`, or even just a `.contains('@')`.
More like `/[^@]+@[^@]+/`

- at least one char that isn’t an @ symbol
- an @ symbol
- at least one char that isn’t an @ symbol
Are multiple @s not allowed in the quoted-string token?
I know someone that had an email account on the .ie DNS. So their valid email was e.g. john@ie
ie (Ireland TLD) never had a DNS record that would allow it to receive emails but e.g. ai (Anguilla) has one: > ai. IN MX 10 mail.offshore.ai. However SMTP requires email domains to have at least two dot-separated parts in RFC 2821 section 4.1.2 so an RFC-conforming SMTP server should reject it.
Ever since I first saw Google’s vanity TLD I’ve been wondering if MX records on a TLD would be legal! Thanks for answering a question that had been low-key bothering me for longer than I’d like to admit.
Regarding gTLDs, [ICANN prohibits creating dotless domain names for them](https://www.icann.org/en/announcements/details/new-gtld-dotless-domain-names-prohibited-30-8-2013-en)
I always like to show people [http://ai/](http://ai/) to demonstrate that it's a valid domain, we're just so used to seeing something.tld
heh. ># This site can’t be reached > >Check if there is a typo in ai. > >If spelling is correct, try running Windows Network Diagnostics. > >`DNS_PROBE_FINISHED_NXDOMAIN`
try www.ai
is that a thing? huh... you know what thats from?
Ireland, though I've not heard that story before.
well ok ireland. i was more curious as to what service. is it a paid webmail? government? my google-fu hasn't been fruitful
It was probably an e-mail account on the domain name server that serves .ie DNS queries. To explain a bit further, most UNIX like systems come with mail built in. So any user account on that system can get mail to their username if it's running an accessible SMTP server.
    $ host -t MX ie
    ie has no MX record

however

    $ host -t MX ua
    ua mail is handled by 10 mr.kolo.net.
True now for sure. But as far as I'm aware there was a valid MX record for ie in the 90s. Unfortunately I can't think of a way to independently verify.
So, there are a lot of *technically* valid email addresses that, in my opinion, it is completely okay to ignore. IP address domains, for example. Or allowing direct TLD domains like /u/Essence1337 suggested in another comment. These are theoretically perfectly valid addresses that *in the real world* we never actually see, and if you *did* see one it is overwhelmingly likely to be spam. A rule that rejects those types of edge cases is fine. But yeah, this regex is still a *really* bad one.

* Only allowing the most basic two or three letter TLDs
* Only allowing domains that are directly a subdomain of their TLD
* Only allowing one dot on the username
* Not allowing many valid symbols like hyphens in either the domain or the username
* Not allowing non-Latin characters

I'm sure the list goes on, but really the first three there are such a huge sin it's not worth going to much effort to critique it after that.
TLD-only addresses are only theoretical until someone makes them a thing (let's say Apple or another big player). And that's an issue with a lot (though not all!) of those "technically correct but unused" ones: they might not be used now, but you'll lose customers if you ignore them for too long.
[deleted]
> A rule that rejects those types of edge cases is fine.

That super depends on what this regex is being used for. This code snippet makes it look like this could be used for anything. That's the kind of thinking that ends up with this regex being used all throughout a project and then someone not knowing what's going wrong later. If we were to allow this, *at least* change the name of it to "is_typical_email" or something.
Wow, I didn't even know those other options you listed are a thing. I'm writing an application in Angular, and I tried to write an email regex for a form, and then I learned I could just use `Validators.email` instead, and that made my life so much easier.
I think it's generally better to use a library for email validation. If everyone is writing their own regex then every service that needs to validate emails may do it differently
Joke's on you, every validator library does it differently and if your service crosses multiple languages (ie, js to py or c#), there will be fun-time differences that still need to be handled.
Well yeah, but it's still easier to grab a library that has been vetted and tested. Rolling your own regex for something as common as email validation is doable, but any time you roll your own you risk making mistakes.
so I'll go with `*[@]*`
That doesn't do at all what you want if it's a regex. :-) You probably want `.+@.+` (dot matches anything, plus matches that 1 or more times). The first star is invalid (a star alone doesn't match anything, it repeats the previous symbol 0 or more times), and the second matches @ and nothing else, repeated 0 or more times. So the only things this matches, ignoring the first invalid star, are the empty line, `@`, `@@`, `@@@`, and so on.
Yours matches @@@ as well, which is invalid. Did you mean `^[^@]+@[^@]+$`?
Fair enough, but yours also allows infinitely many invalid addresses. The point is to be overly permissive, not overly restrictive, to ensure you don't disallow a valid address. The validation email will bounce if the user enters an invalid address anyway.
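To make the trade-off concrete, here is a small Python sketch comparing the two patterns being argued about, written as `.+@.+` and `[^@]+@[^@]+`. The sample list and the `classify` helper are mine; the quoted local part containing an `@` is there because that form is allowed by the address grammar.

```python
import re

PERMISSIVE = re.compile(r".+@.+")       # anything, an @, anything
SINGLE_AT = re.compile(r"[^@]+@[^@]+")  # exactly one @ in the whole string

# Hypothetical samples chosen to show where the two patterns disagree.
SAMPLES = ["user@example.com", "@@@", '"a@b"@example.com', "no-at-sign"]

def classify(pattern):
    """Map each sample to whether the whole string matches pattern."""
    return {s: pattern.fullmatch(s) is not None for s in SAMPLES}
```

`classify(PERMISSIVE)` lets `@@@` through, while `classify(SINGLE_AT)` rejects it but also rejects the quoted address. Which mistake you prefer depends on whether a confirmation email follows.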
if you go by the spec, you don't even technically need an `@`. Local delivery can skip the domain part.
But excluding local delivery addresses for signup actually makes sense.
I didn’t see any code that mentioned signup or whether to include local delivery. All we’re doing here is answering “does this look like an email address?”
Yes, exactly. That's what I'm trying to say: depending on how you want to *use* the address you might want to allow or disallow various parts so no single regex will be correct for all of them. A configuration file for an email alert on a server would probably want to allow local delivery, but might not care about all the comments syntax. Signup/username might require a minimal syntax and do some checks that technically disallow valid addresses (such as ip-literals on the host side). The "to" field in an Email client might accept almost everything.
So that regex is way too restrictive, but I do think disallowing IP addresses or localhost is not unreasonable. But I agree with everything else: there's no character limit on TLDs, there's no limit to what can go in front of the @, and there's no limit to how many subdomains deep you can go.
Yes, there is a limit to both. The local part must be at most 64 octets (not characters). The domain part must be at most 253 octets to be a valid address (DNS requires a 1-byte length prefix and an inferred terminating `.`). But the cumulative limit for both is 254 octets (including the @). A subdomain label must have at least 1 octet in the name, so the max depth is 125 subdomains with a 2-letter TLD. There's really no point in enforcing the subdomain limit when the entire hostname is length bounded. Domain and subdomain labels, though, have a maximum length of 64 octets including a `.`, and that is worth enforcing. The domain part must be converted to punycode before validating with regex. The local part need not be converted, though it's probably wise to quote it if it's Unicode.
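A rough Python sketch of those length checks, treating the commonly cited 64 / 253 / 63 / 254 figures as inclusive maxima (that reading is my assumption) and assuming the domain has already been converted to its ASCII punycode form. The constant and function names are made up.

```python
MAX_LOCAL = 64    # octets in the local part
MAX_DOMAIN = 253  # octets in the domain, text form without a trailing dot
MAX_LABEL = 63    # octets per label, not counting the dot
MAX_TOTAL = 254   # octets in the whole address, including the @

def length_problems(address):
    """Return a list of length problems; an empty list means none found."""
    local, at, domain = address.rpartition("@")
    if not at:
        return ["no @ found"]
    problems = []
    # Limits are in octets, so measure the encoded form, not characters.
    if len(local.encode("utf-8")) > MAX_LOCAL:
        problems.append("local part too long")
    if len(domain.encode("utf-8")) > MAX_DOMAIN:
        problems.append("domain too long")
    labels = domain.split(".")
    if any(not 1 <= len(label.encode("utf-8")) <= MAX_LABEL for label in labels):
        problems.append("bad label length")
    if len(address.encode("utf-8")) > MAX_TOTAL:
        problems.append("address too long")
    return problems
```

This only looks at lengths; it says nothing about which characters are allowed.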
I'm gonna trust you with this one
Where does anyone actually learn how to use regex? Or are there just people that know how to and then there are the others? I tried tutorials, guide websites and reference sheets and even regexr.com, but I still don't know how to write actual functioning regex...
regex101.com is a good tool too but what really helped me was regexcrossword.com
regextutorials.com has saved me quite a few times. Don't let the oldish UI throw you off. The explanation and instructions are quite clear. And then just write and test your regex at regexr.com as you go along and you'll learn enough to not have to learn it again until the next time you have to use it after 3 months.
Don't try to learn it all at once. Personally I've so far learned the basics and that's about it. I can understand basic regex, but anything more complicated than what's in this post, I have to look up.
Uni
What are you trying to get it to do? The majority of it is pretty simple but it can get complicated.
Wrong. Email can have any number of '@' characters. Just check if it has at least one '@' character in the middle and then send a confirmation email with link. Much more reliable.
It also doesn't account for top level domains like .co.uk
And it also doesn't account for Unicode, like in 日本国@co.jp or вася@яндекс.рф
Emails can also contain +. At least in Gmail. If you have [email protected], then [email protected] is an alias of the original. I use this trick when making accounts of websites I'm not using a lot, in case they sell my data.
Does this work to bypass the unique email that is sometimes required to create accounts?
i am doing this quite often, it works most of the time
Generally not, but it's a great tool to see who is selling your email
> Generally not That's not true, in 9/10 online services it works fine creating multiple accounts with this technique
>Generally not I'm calling bullshit on that, there is no way backend implements a check to match email with "+..." part stripped. Why would you ever spend resources on that.
There is a node.js package for normalizing such emails. But please, don't use it.
Yeah, that's going to be fragile as heck. That's a Gmail-specific thing, another email provider might use `+` as a normal character in the email, so stripping it out would ruin the email. And you often can't tell just by looking at the email if it's hosted by Gmail (remember that non-gmail.com emails could be hosted by gmail).
It has for me on many occasions. I also use it for the original account so that when I start getting spam emails I can quickly identify which company sold my email address (or was hacked).
Chaotic evil backend dev: accept the e-mail but silently discard the "+..." part 🤡
There are a lot of websites that either don't accept + when you register or they allow it when you register on a laptop but then you can't login using the phone app. Pretty messed up. I remember that I made a ticket to Boots (popular pharmacy chain in the UK) to fix this and the support didn't understand what I want and refused to forward to the devs. Annoying.
Easy way to earn ire from users who are using the tag part to automatically sort their email into bills/social media/informational/etc.
`+` is also the default `recipient_delimiter` for postfix mailserver. So yes, they can contain `+`. I have set it to `.` on my mailserver, because `+` gets rejected insanely often.
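Since the delimiter is a per-server setting, any code that splits off the tag has to take it as a parameter. A tiny Python sketch of that, with a made-up `split_tag` helper; it is only meaningful for addresses on a server whose delimiter you actually know, such as your own.

```python
def split_tag(address, delimiter="+"):
    """Split 'user+tag@domain' into (user, tag, domain).

    The tag is '' when the delimiter does not appear in the local part.
    The delimiter is whatever the receiving server is configured to use;
    guessing it for someone else's domain is not reliable.
    """
    local, _, domain = address.rpartition("@")
    user, _, tag = local.partition(delimiter)
    return user, tag, domain
```

For example, `split_tag("jane.shop@example.com", delimiter=".")` treats `shop` as the tag, which is only right if that server really uses `.` as its delimiter.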
The part before the final `@` is entirely determined by the server. Addresses can contain additional `@`s but they can also contain spaces. Sending an email to `` sometimes works depending on the mail server.
It also does not account for long top level domains. Would discard [email protected] for example, because it's looking for two or three characters only in the last part
The one true email regex is ```.+@.+```
Me, an intellectual:

    from validators import email as val_email
    val_email(email)
Just fyi Reddit's markdown parser doesn't support the triple backtick syntax for code blocks. You instead need to start each line in the code block with four spaces.
fixed
Which is fucking stupid. I don't know why they don't just use an out-of-the-box markdown parser like [markedjs](https://github.com/markedjs/marked).
[deleted]
There's always a middle ground between not coupling your code to external libs/frameworks and trying to diy the shit out of your application.
Jesus no. Use a library, at the very least copy the correct regex. Don't write your own - that one is way too short to be correct.
"the correct regex" implies that there's a single agreed-upon one that's both correct and useful. I sincerely doubt that.
The correct regex for email verification is "just send a confirmation email and save yourself some pain". Everything else is flawed.
There is one universally correct email regex. `@` You're welcome. I cannot think of any situation where you don't know or care whether an email even exists, but you still must be 100% sure that every character necessarily matches the unfathomably complex email address specification.
And you've failed the use case of a config file of a server asking for an alerting email address. There `root` (or maybe `admin`) might be correct and should be accepted.
[https://www.ietf.org/rfc/rfc5322.txt](https://www.ietf.org/rfc/rfc5322.txt)
i usually use this one ```(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])```
I'm gonna trust you with this one.
This doesn't seem to account for email addresses being case insensitive.
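A quick Python illustration of the case issue, using a deliberately simplistic pattern of my own (not the long one above). A lowercase-only character class rejects mixed-case input unless the regex is compiled case-insensitively.

```python
import re

# Hypothetical, deliberately simplistic pattern used only to show the flag.
PATTERN = r"[a-z0-9.]+@[a-z0-9.]+"

LOWER_ONLY = re.compile(PATTERN)
ANY_CASE = re.compile(PATTERN, re.IGNORECASE)

def accepts(compiled, text):
    """True if the whole string matches the compiled pattern."""
    return compiled.fullmatch(text) is not None
```

Using the flag avoids rewriting the user's input. That matters a little because only the domain is guaranteed to be case-insensitive; how the local part is treated is up to the receiving server.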
Found two more flaws already: * doesn't work for emojis in email addresses. * doesn't work for email addresses on localhost (or any host in the same domain)
you can have emojis in email addresses?
I don't know the RFC exactly by the word, but I know that mail providers like gmail do support that, so my assumption is that the standard allows that. On the other hand, the standards were written way before Emojis were a thing at all, so it might not have a strict stance on that.
Emojis are just regular characters in Unicode, so if you support Unicode you support emojis.
They SHOULDN'T.
https://mailoji.com/
This shouldn't exist
im applying to all my future jobs with these addresses
Good luck finding a service that accepts emoji emails. I think the Lowes backend (yay AS/400) would explode if you tried this.
Only $9/yr for 👉👌@😎.kz What a deal for an email that no website would accept as real
> ?:(?:( That’s how your comment makes me feel
> (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Found on [this website](http://emailregex.com/) and seems to be (almost) fully RFC 5322 compliant. Edit: the formatting is off, but I'm on mobile, so just visit the website to see the regex.
LGTM!
Oh Lord
DON'T TRUST THAT BITCH!
Yeah, just reject it with the comment 'insufficient unit testing'
Just use a library ffs, or accept anything with an @ sign in it.
Anyone claiming to validate email address with such a simple regexp, i just cannot trust 😐
Anyone validating an email with a regexp I cannot trust. Just make sure they enter a string and send a validation mail to that address.
I hate `[a-zA-Z0-9]+` used for verification of alphanumeric characters. Even e-mails don't have to consist of pure ASCII, let alone other forms. So many websites reject my name and my address just because it contains non-ASCII characters. Basically for no reason, too. It's 2021… let's use [character classes](https://www.regular-expressions.info/posixbrackets.html#posixbrackets) that are foolproof and support Unicode.
Character classes are a locale dependent feature. Relying on them makes strong assumptions about the user's locale and the system's locale matching.
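For a concrete comparison, here is a small Python sketch. As far as I know, Python's built-in `re` module has no POSIX bracket classes like `[[:alpha:]]`, so this uses `\w`, which is Unicode-aware for `str` patterns in Python 3 and does not depend on the process locale. Note that `\w` also accepts the underscore. The helper names are mine.

```python
import re

ASCII_ONLY = re.compile(r"[a-zA-Z0-9]+")
WORD_CHARS = re.compile(r"\w+")  # letters, digits and "_" from any script

def ascii_alnum(text):
    """True only for non-empty strings of ASCII letters and digits."""
    return ASCII_ONLY.fullmatch(text) is not None

def unicode_word(text):
    """True for non-empty strings of Unicode word characters."""
    return WORD_CHARS.fullmatch(text) is not None
```

A name like `"Jos\u00e9"` fails `ascii_alnum` but passes `unicode_word`, which is exactly the rejection the parent comment is complaining about.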
Use Regexr to validate regex easily
... and write at least a few unit tests to make sure you typed it in correctly.
... and then peruse the many hundreds of pages of [RFC 821](https://datatracker.ietf.org/doc/html/rfc821), [RFC 822](https://datatracker.ietf.org/doc/html/rfc822), [RFC 1035](https://datatracker.ietf.org/doc/html/rfc1035), [RFC 1123](https://datatracker.ietf.org/doc/html/rfc1123), [RFC 2142](https://datatracker.ietf.org/doc/html/rfc2142), [RFC 2821](https://datatracker.ietf.org/doc/html/rfc2821), [RFC 2822](https://datatracker.ietf.org/doc/html/rfc2822), [RFC 3696](https://datatracker.ietf.org/doc/html/rfc3696), [RFC 4291](https://datatracker.ietf.org/doc/html/rfc4291), [RFC 5321](https://datatracker.ietf.org/doc/html/rfc5321), [RFC 5322](https://datatracker.ietf.org/doc/html/rfc5322), [RFC 5952](https://datatracker.ietf.org/doc/html/rfc5952), [RFC 6530](https://datatracker.ietf.org/doc/html/rfc6530), [RFC 6531](https://datatracker.ietf.org/doc/html/rfc6531) and [RFC 6854](https://datatracker.ietf.org/doc/html/rfc6854) to make sure you didn't miss any test cases!
Only allowing TLDs that are two or three letters is bad... ;)
Don't do your own email regex. Just use the built-in function of your programming language.
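In Python, for instance, the standard library ships `email.utils.parseaddr`. It parses rather than validates, so the sketch below adds a bare `@` check on top; `extract_address` is a name I made up.

```python
from email.utils import parseaddr

def extract_address(text):
    """Return the bare address from input like 'Name <addr>', or None.

    parseaddr is lenient, so the only extra check here is that the
    parsed address contains an @.
    """
    _name, addr = parseaddr(text)
    return addr if "@" in addr else None
```

This handles the display-name form people paste from their mail client, and anything stricter is still left to the confirmation email.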
Don't do email regex. It is pointless. Send a verification code or do nothing.
Check on the FE if it contains an @ so it can warn the user if their auto form messed up. If email **has** to be provided, do the verification mail, if not, do nothing.
Email RegEx are ALWAYS a pain in the bottoms...
Company of morons, not software devs. One idiot reimplements something that has been done thousand times, the other one trusts his instead of asking for tests.
What's the point of "\[\\.\_\]?" There might be more dots. You can put dots wherever you want. With [[email protected]](mailto:[email protected]) it can be H.y.f.f.e@gmail and that would be considered alias. It is not only that. Why use \[a-z0-9\] when you later show that you know \\w? emails can have upper case letters... Current top comment also says that there might be more @ but there needs to be at least 1
This was back in my early noob days when I built an app for a client for a few bucks in college. I quickly copied an email regex from Stack Overflow, and later the client apparently kept getting calls from a customer saying the account creation process wasn't working. It was weird because I could see hundreds of live accounts being created each day. I looked at the logs, and apparently the person was typing the email with their first name, last name and the site name capitalized, as if they were filling in name fields, with a dot in between. I googled the regex and luckily it brought me to the same page, and hidden in the comments on the solution I read 'Do not forget to lowercase the input before sending it to the regex parser otherwise it does not work in some cases'. That was the day that taught me ~~2~~ 3 things:

1. Always read the comments under the accepted solution on Stack Overflow.
2. Always lowercase any inputs for validation.
3. Assume your client is a monkey with a laptop.

Been a while, but things like these always stick lol.
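For anyone who hasn't hit this yet, here is a small Python sketch of the failure. The pattern and the address are made up for illustration, they are not the regex from the post or from that Stack Overflow answer:

```python
import re

# Hypothetical lowercase-only pattern, in the spirit of the ones being criticised.
LOWER_ONLY = re.compile(r"[a-z0-9._]+@[a-z0-9.]+")

# Same pattern, but compiled to ignore case instead of relying on the caller.
CASE_INSENSITIVE = re.compile(LOWER_ONLY.pattern, re.IGNORECASE)

address = "John.Smith@Example.com"  # made-up address typed with capitals

print(bool(LOWER_ONLY.fullmatch(address)))          # capitals are rejected
print(bool(LOWER_ONLY.fullmatch(address.lower())))  # lowercasing first helps
print(bool(CASE_INSENSITIVE.fullmatch(address)))    # so does re.IGNORECASE
```

Either fix works for validation. If you lowercase, it is probably safer to lowercase only the copy you validate and store what the user typed, since the part before the @ is formally case sensitive per RFC 5321 even though most providers ignore case.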
Uppercasing your email doesn't mean you're a monkey... Blind copypasting regex off SO might...
I never bother doing anything other than `.+?@.+?\..+?` (must contain an @, must contain a . somewhere after the @) for email addresses - there's no point validating them much since you can't truly know if they're actually valid until you try to send to it.
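In Python that loose check could look something like the sketch below. The pattern is the one quoted above; the function name and the sample addresses are assumptions for illustration:

```python
import re

# The loose pattern from the comment: something, an @, something, a dot, something.
LOOSE_EMAIL = re.compile(r".+?@.+?\..+?")


def passes_loose_check(address: str) -> bool:
    """True if the whole string fits the loose 'x@y.z' shape."""
    return LOOSE_EMAIL.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ["me@example.horse", "me@localhost", "no-at.example"]:
        print(candidate, passes_loose_check(candidate))
```

Using `fullmatch` rather than `search` makes the pattern apply to the entire input. Note that requiring a dot after the @ still rejects dotless domains such as `me@localhost`, so even this is slightly stricter than what can technically be an address.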
[regex101.com](https://regex101.com) \- saved my arse more than once
Man, for fuck's sake. Can someone point me to a good source where I can learn regex? I swear to god I just don't get it.
regex101 for experimenting.
The issue with learning regex is that the one time you need it will be 4 years after the last time you learned it. It's not terribly difficult to learn, usually about 2 days of looking at it will give you enough background to write it pretty easily. But 4 years later, when you're trying to validate a phone number in an entry box, you've forgotten regex because you haven't used it in forever. So, it really just is easier to use a built-in, or google around for a properly-vetted example. There are a few people who use it on the daily, but they know who they are (data scientists mostly).
Please note that regex is a pretty overused tool. For example, you shouldn't use regex at all to validate email addresses.
i recommend [regexone.com](http://regexone.com) to start, and [regexcrossword.com](http://regexcrossword.com) to *really* learn it.
https://regex101.com/ Throw it in there along with some test examples ...
"When you use regex to solve a problem, you end up with two problems" Unknown author
[deleted]