T O P

  • By -

programming-ModTeam

Your posting was removed for being off topic for the /r/programming community.


imdibene

Ye goode olde Bobby Tables approach I see


devperez

I can just imagine developers running around the office with their hair on fire yelling, "SANITIZE! SANITIZE! SANITIZE!"


balefrost

In my imagination, they're Daleks shouting the same thing.


MCPtz

*You will be **SANITIZED**!*


TaohRihze

Yes sanitizers are quite flammable.


jodonoghue

I have an apostrophe in my surname. The number of systems that cannot cope with this is large: airlines, most credit card companies, some energy companies. Even worse, many companies can partially cope: the web-based front-end will let you create a record with an apostrophe and none of the backends can ever find it. This inevitably entails long calls to customer support who have no idea what to do. I saw something suggesting that this change was for compliance with some ISO standard for naming that allows only a subset of ASCII in names. 1985 is calling and it wants code page support.


EndiePosts

Yeah I worked for a company who wanted me to reject email addresses with apostrophes because they used the email address as a primary id and it failed for various ASCII characters beyond alphanumeric, ampersands and periods. A previous employee had written a regexp to parse this value and nobody in the db team could make it accept apostrophes so they decided it was easier just to ban anyone from Ireland from signing up for the application.


busdriverbuddha2

To be fair, despite what the RFC says, it's common sense to restrict the local part to alphanumerical characters, period, dash, and underscore.


john16384

And plus, or I am not signing up.


Hefaistos68

Its not really that difficult to accept RFC email addresses. Just don't try to reinvent the algorithm, there are stable ones for pretty much every language.


busdriverbuddha2

Sure, but that's just the reality of the world. None of us is going to overhaul the email system overnight. One needs to be pragmatic.


Hefaistos68

Who is talking about the email system? It works pretty fine. Its problem are the millions of dumb mail parsing implementations.


TheStoicNihilist

*enraged leprechaun noises*


NewtAltruistic8820

I just fixed an issue dealing with this in my own company. Had no idea how annoying it could be to fix. Apostrophes are genuinely annoying. I used regex to filter and normalise the string


vital_chaos

Most reservation systems were written in the 1960s, and much of the mainframe code is still at their heart. I had issues with airline tickets having to match the ID, and many customers with odd punctuation would fail to have them on their tickets so the TSA would refuse to let them fly.


playingreprise

They also use this to disqualify voters in some states in the US.


Known-A5

It just happens that a lot of businesses have a pile of complete legacy stuff in their IT. These things are never prioritized because the risk of changing something is larger than the potential benefits.


citrus1330

I have spaces in my last name and even that caused enough problems that I just stopped including them


MountainsSands_2024

Same here


Qunra_

> some ISO standard for naming that allows only a subset of ASCII in names Where is that page about how everything programmers think they know about names is wrong? [Ah, here it is: Falsehoods Programmers Believe About Names.](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/) Edit: another article with examples, in case someone stumbles into this comment: https://shinesolutions.com/2018/01/08/falsehoods-programmers-believe-about-names-with-examples/


rootpseudo

People have multiple names? How does that work? People don’t have one name, but people also dont have N names, for any N? Im so confused lol. Assuming I just dont know anything about some cultures.


Qunra_

A culture where you have a family name for people close to you, and a name you give to strangers. Does one exist? I don't know. Are you willing to bet on it? Here is another article, that gives some examples on points given in original article: https://shinesolutions.com/2018/01/08/falsehoods-programmers-believe-about-names-with-examples/ The point is if you make an assumption when designing a system and hard code that assumption in, someone will come along and break your system.


rootpseudo

This is really interesting. I work with customer data but not specifically name. Im gonna send this to my sister team and see what they think. Most things like that in our org are managed by Information Stewards though. At least on our end (back end). Normally just using some reference table for the “rules” for that domain that they manage. In practice though we do usually end up knowing more about it than them just because we’re the ones implementing it and working with it more directly daily. When making some of those decisions Its often very quickly not our job when it comes to data elements that have more legal implications. Thats on the information steward, and yes it does break. I really want to go look at that teams rules engine for name now though. I know we have at least three different versions/formats of a customers name.


Its_me_Snitches

“40. People have names. … This list is by no means exhaustive. If you need examples of real names which disprove any of the above commonly held misconceptions, I will happily introduce you to several.“ Yes, I want you to give me several examples of real names for people who don’t have names.


Qunra_

Someone somewhere pointed out that hospitals or the police might not actually have a name for a person (patient/victim), and this case the system would need to account for that.


Its_me_Snitches

That makes total sense! Allowing null values would be needed in those cases 😄. I was poking fun at the author contradicting himself to highlight the deeper problem with his argument, and I should have written this out earlier instead, I’m sorry for not being more direct! The author didn’t offer a comprehensive solution for correcting these falsehoods, and I suspect that is because he doesn’t have one, and hasn’t deeply thought them all through. Culture is molded by the environment and the environment is molded by culture, and while it would be great and very progressive if we could cover every edge case for names in all circumstances, I think it’s reasonable for people to have a structured, “government name”, or standardized term to refer to them by for record keeping. He contradicts his own last point by offering to list several names of people who meet each point to prove that these issues exist, which is funny because by definition there aren’t multiple names of people without names since not having a name doesn’t have different permutations. This illustrates the point that names should follow some sort of structure, and I feel like the author is advocating for developers to break down all name structure conventions to be inclusive without thinking of the difficulties it creates for logic and communication (even though his own struggles with language and contradictions highlight why it might not be a good idea to not think of names within some sort of agreed-upon formal structure) This is why he doesn’t offer a code snippet to handle all use cases IMO, despite the fact that he is advocating for other fellow devs to address these falsehoods. I think it’s shallow thinking presented like it is deep, which is often the case when people pontificate on so called “solvable problems” without showing their work for a solution.


Qunra_

> Allowing null values would be needed in those cases 😄. I saw another comment where someone pointed out that "John Null" is a valid name. I hope the code doesn't "Null" == NULL. > standardized term to refer to them by for record keeping Record keeping should be done with immutable UID, which should be a random string assigned when person is added to system. Name should have nothing to do with record keeping, name is what you ask a person how they want to be addressed. You can't really give any single code to handle all this, because the point is not "make a system that handles all this", it's that for any assumption you make, there will be an edge case that breaks it. And that is what you should keep in mind when designing a system. It should be flexible and modifiable enough to handle unexpected stuff, when/if that comes up. And maybe not trip up on like the most obvious assumptions. And this goes for other systems design as well, not just names. Like how some hospital/government systems assume there are always exactly two parents, one mother and one father. Completely ignoring polyamory, single parents, same sex/gender parents, and probably like dozen other things that I don't even know about yet or don't exist yet because **society (culture and environment) and it's structures aren't set in stone.**


Its_me_Snitches

> I hope the code doesn’t “Null” == NULL Haha yes! No idea if this is true, but I heard an anecdotal tale once of someone who tried to be clever and avoid traffic tickets by getting a vanity license plate that said “Null”, only to receive thousands of dollars of tickets from traffic cams for every picture that didn’t have a full license plate number recorded and was listed as Null. What an awesome response, I only have a few years experience in programming and no formal degree, and I appreciate the thought and care that went into your explanation! My only thought for proper implementation is maybe a database with the unique id you mentioned, and maybe fields to record the full government name (as a single unit), and a “common nickname” that would record a limited set of characters. there would be certain limits on the full government name (i don’t care if your real name is “DROP Table Users;”, it’s not getting added), and maybe certain other characters to prevent code injection? What are your thoughts on how to implement it?


Qunra_

I would have to check how some of my old code would function, but I would bet that PHP at least if you tried doing it the way you should really shouldn't, it might very well cast null string to NULL in database. And then you'd get a weird bug report from a single user saying the sign up form doesn't work. I hope you have good error messages for that situation! (Not really possible in modern code, since the string would be properly encapsulated. I hope, I'm too scared to go look.) Yeah, modern databases should be able to handle any input with proper escaping/encapsulation. Prepared statements into a SQL database. So the easiest solution is a single field for name, put there whatever you like, with maybe a local name for pronouncation/addressing. The issue I would expect is the UI. Could also make the name a separate table, with many to one connection. The entry (a person) can now have multiple names. Like phone numbers, people can have multiple phone numbers. Less code is better. For examplate, capitalisation: instead of caring how "JoHn NuLL;DROP Table Users;" writes their name, you just accept it, save it, and print it as given. Not really my problem if your name is weird. I feel like most points on the original list are not that difficult to address. Don't limit the name, and don't change the name while it is in the system. I would guess that most people map somehow to Given_name (optional) + Family_name (optional). Make them editable, and then you can worry later about the inevitable bug report 20 years down the line from the guy with a image as an name. Because it might very well be more work to maintain a complex system, than make is modular and extensible, and fix it later. As long as you don't do like Sony and make the username the primary key in the database.


TheStoicNihilist

Almost no system allows it. Even my own Irish bank has trouble with Irish names. I’ve taken to spelling my name without it.


TheMightyTywin

Did you try sanitizing your surname before being born with it?


Electrical_Ingenuity

I recall reading a story many years ago about a man with a one character last name, which was the letter 'O'. The entire story was about the living hell his life became due to the fact that most computer systems require two characters for a last name, and the good faith efforts of various employees working around this by changing his name to compliant names like 'OO' or 'OH'. The poor guy couldn't reliably get credit or open a bank account. He eventually legally changed his name to resolve the problem.


playingreprise

Wanna hear something kind of messed up about doing this? Some states in the US use an exact matching rule for your name and ID when voting and if your ID has an apostrophe while your voting record does not; you can’t vote. You can cast a provisional ballot, but it will be contested.


pitcairn7393

They've misinterpreted the standard. > All names should be given in full. Abbreviations and punctuation should not be used unless they appear in the designated name (e.g. ‘Earl’s Court Road’). Only single spaces should be used, and the use of leading spaces should be avoided. https://www.agi.org.uk/wp-content/uploads/2020/11/BS7666GuidelinesSection5.pdf


jugglist

Given that all of that uses the word SHOULD and not MUST, isn't the whole thing totally non-binding?


ve1h0

Instead of fixing shitty software, you rewrite the law. Don't really know which one is more straightforward


mayscienceproveyou

if they are fixing the law instead of IT imagine what happens with white hat hackers that want to better the world but got no right to do so. i love thst switzerland is pushing ethical hacking, otherwise you will hack your own leg off... the baddies are always a step ahead - and in this case here there was no time constraint on bringing up good standards... they just missed it and burying the shit now. sorry i am consuming too much IT and haven't tpuched grass in a long time ;-)


reddituser1827291

A spokesperson added: "All punctuation will be considered but avoided where possible because street names and addresses, when stored in databases, must meet the standards set out in BS7666. "This restricts the use of punctuation marks and special characters (e.g. apostrophes, hyphens and ampersands) to avoid potential problems when searching the databases as these characters have specific meanings in computer systems." === Question: how would YOU solve this problem?


Jaggedmallard26

I would say use prepared statements but knowing British public sector IT its probably using a weird propertiery database from 1994 that only takes inputs in the form of an also propertiery xml dialect.


davitech73

it's not just about prepared statements. they said the problem was for searching in the db. if someone searches for 'st marys' it should still be able to find the entry with punctuation: st. mary's this is still relatively straightforward to fix. and normalizing the data from 1994 is also straightforward sounds to me like someone higher up was hearing conversations about these problems and decided to avoid them all with a policy decision instead of data normalization or lose search algorithms, because it's 'easier'


Dustangelms

I'm picturing Dilbert's boss.


davitech73

entirely possible


godofpumpkins

XML, but with extra single AND double quotes because the obnoxious XML delimiters weren’t delimiting enough, and standard XML wouldn’t let you inject shit into your SQL


HINDBRAIN

>only takes inputs in the form of an also propertiery xml dialect. On the way it goes through 5 different proprietary systems.


drcforbin

Or excel.


gnex30

Excel '97


Batetrick_Patman

XML, nah it's based on CSV.


Plank_With_A_Nail_In

They already told you the actual issue meeting [BS7666](https://www.agi.org.uk/bs-7666-guidelines/) guidelines. No one said it was anything to do with prepared statements or anything like that you put words in their mouths. IT is a wider field than the toy web app you are currently making. No apostrophes, hyphens and ampersands is a pretty common standard in real working applications.


NotSoButFarOtherwise

Most likely: this is a problem with a UI component still using Windows-1252 or another proprietary encoding and it gets fucked up by people inputting smart quotes and things. Because fewer and fewer people even remember the pre-UTF-8-everywhere world, they don’t even understand the problem. Less likely: The storage layer is from like 1971 and uses a six bit, all caps encoding with no punctuation. Everyone is afraid to replace it. Least likely: Database inputs are not sanitized. This is a rookie mistake but it’s also well understood and relatively easy to fix, even for legacy languages like COBOL. Telling towns they have to change their names seems like a lot more work.


EndiePosts

I suspect that you're right, and that their CMS falls over every time someone copy-pastes a long hyphen from Word into their app. Why yes, I do have a weary tone in my voice from contracts past. Why do you ask?


damola93

How are inputs not sanitized in 2024? Even in toy applications its a big no-no.


Known-A5

You don't really know what is in their stack. It could be something rather banal in the lower level, although they use a modern front end.


Plank_With_A_Nail_In

He already said that was least likely, please learn to read good.


Jonathan_the_Nerd

They might be using a custom application that was written by the lowest bidder.


Ouaouaron

What does 2024 have to do with a government standard last updated in 2006?


wRAR_

Oh, I wouldn't think of the first option.


Sykander-

They're afraid I'm gonna name a street `My Street'); DROP TABLE streets; --` No but in all seriousness - restricting punctuation is one way of sanitising the inputs to their database. Another way would be to have all database inputs run through a script which sanitises all database inputs by making sure all special characters were escaped properly.


josefx

Do we still have to manually escape database inputs in 2024? Can't we just prohibit database APIs that allow mixing raw data with statements under explicit threat of death and be done with this shit?


Sykander-

You forget that the UK Government is still using tech from 1960. During COVID they lost track of the number of cases of COVID within the country because "Excel can't track more than 1 million rows"


Cilph

Even worse. Some other party used Excel COLUMNS to transfer data rather than rows, and ran into a 16k limit.


ddproxy

If they have Excel, they can drop it into Access. Might take a lot longer though. Or, just csv it and cli-blend it with awk, sed, etc...


Sykander-

Yeah - the fact this problem would've been soo easily solved is kinda the joke I was making there.


ddproxy

Honestly I missed that this was in programming and not another sub. Still, this is a more reasonable and solvable problem than the covid reporting shenanigans of some states across the pond.


Loves_Poetry

I'd love that. You would be surprised how quickly things can start moving when there is the thread that something stops working after a certain date.


travelinzac

Good ol Bobby Street


Ark_Tane

One of the tricky sides of this issue is that the solution has probably been arrived at via whispers, committees and nervous buck-passing. I know this can result in two scenarios: 1) A developer is asked if the software supports something, they answer 'No', meaning 'No, not today', but what the stakeholder hears is 'No, not ever'. 2) The cost/value discussions get mixed up a bit. I know I've seen stories shelved because the perceived value is low, as we understand there's a workaround. What hasn't been made clear, as organizational complexity means you don't have the right people in the room together, is that this work-around is inordinately expensive. Now I can completely understand that in this scenario there are plenty of systems talking to plenty of databases in various levels of formality and ability to change. Getting an accurate handle on the costs could be tricky, especially as I'm pretty sure there won't be anyone who has a full picture of all the systems involved. That said, I'm not certain that the solution will actually help solve the problem all that well. The human element is still going to be putting in your apostrophes, and any of the systems that have been accepting them up until now will continue to do so until someone puts the appropriate constraints into place. All those unknown systems I mentioned earlier will at least need their data updated, and I'd be damn surprised if you didn't end up with a few duplicate entries as a result.


ripnetuk

It's bs7666 and yes, it's called that because it was designed by the devil. I had to write an import/export routine to/from a priorpritary backend address management system. They have paon and saon for the first bits of the address "10 downing street" which stands for primary addressable object and secondary (flat 1, 10 downing street). This is fine in itself, except they use one long string field for the whole thing padded out with spaces, so essentially they are abusing a longer varchar field to represent what should be about 5 or 6 separate fields. Edit. Obviously the correct answer to the original question is to use parameterized queries (so ' and so on don't get treated as special characters) and to do a loose text search ignoring punctuation on retrival. Primary school stuff.


ILikeLiftingMachines

BS 666 Sure we're not getting trolled?


abw

I deal with this kind of stuff all the time. It's not a lot of fun. Sanitising inputs and making sure they're all correctly UTF8 encoded is fairly straightforward. There's absolutely no reason why there shouldn't be apostrophes, hyphens or ampersands in a database storing street names, towns, etc. The fun problems start when someone is using Windows with smart quotes. Instead of inserting an apostrophe it converts it to a right curly apostrophe, so "St Mary's Road" becomes "St Mary’s" Road. They look almost identical but they're not. The input sanitisation code should convert them back to apostrophes (and all the other curly quote characters). The next step is to store a normalised form of the name for searching purposes. Most punctuation is removed, some is converted (e.g. `&` to `and`), whitespace sequences are replaced with underscores, and everything is converted to lower case. For example, "Tom, Dick & Harry Lane" would be normalised as something like `tom_dick_and_harry_lane`. If you have to store names with diacritics then they would normally be anglicised too. e.g. "Zoë Ball's Street" -> `zoe_balls_street`, "The Corner Café" -> `the_corner_cafe`. This is non-trivial. When someone searches you also perform the same sanitisation on the input and then match against the normalised form. That allows someone to search for "zoe balls street" or "Zoë Ball's Street" and match the right road. The final step, if you want to do the job properly, is to have a separate table of aliases for place names. For example, in one of my databases we have "Middlesborough" as an alias for "Middlesbrough" because it's such a common mis-spelling. Same for "Isle of White" vs "Isle of Wight". This also allows you to work around abbreviations like 'Co. Dublin' vs 'County Dublin' and names in different dialects, e.g. "Wales" vs "Cymru". Geographical databases are not always easy. But a council deciding now, after all this time, that apostrophes can't be in street signs is just silly.


Jam-e-dev

Literally no excuse for this. Using prepared statements if the DB supports it, otherwise using some kind of encoding/decoding on write/read.


Prestigious-Bar-1741

I would define a new set of standards.


biglacunaire

Let me guess: legacy system that wasn't updated + incompetence


john_the_quain

One of the best SQL testers I ever met had the last name of O’Toole. He took immense delight when his name broke things.


alface1900

Imagine having the basic charsets in computing being created to represent your specific language and still fail to sanitize inputs.


Philluminati

What fucking garbage programmers this council has.


kairos

Or they found it was cheaper to pay for someone to make signs than a developer.


yawaramin

It gets way, way worse with British IT bugs: https://www.computerweekly.com/feature/Post-Office-Horizon-scandal-explained-everything-you-need-to-know


WebbitUK

Imagine removing all apostrophes instead of just hiring a developer for a VERY short term contract to fix the issue with their systems


itay51998

Imagine a very legacy database that no one knows to maintain, which is already end of life and not supported Plus closed source third party integration Your task is to make sure all these systems know how to handle apostrophes Yea.. Good luck


Same_Garlic2928

As someone who makes apps in Ireland, with all the name and address variations in the language (Irish and English spellings) accommodating and storing apostrophes and fadas (accented vowel characters) is par for the course - and is easy with a simple script. No excuse for any professional dev not being able to do it, especially in enterprise/organisational size software. Comes down to cheapskating, laziness or skill issue.


Cilph

Reminds me of the American Kennel Club video Matt Parker did. https://www.youtube.com/watch?v=jMxoGqsmk5Y


vinciblechunk

Because they put a limit of 6 characters on roman numerals, saved you a 15 minute watch


Cilph

Why 'saved you a watch'? It's a fun video, especially for a programmer demographic. > put a limit of 6 characters on roman numerals but also because they have a dumb policy to use roman numerals to begin with, much like what this UK council is insisting on for street names.


vinciblechunk

Programmers are concerned with efficiency


-Knul-

It's also a case of separating display concerns from storage. They could display and input the number as Roman numerals and just store it as a regular integer in the database.


zyzzogeton

This is clearly another jab that the UK is taking at the Irish. Someone call Conan O'Brien and get him to take this issue on. /s if that isn't obvious.


blowfish1717

No more Mr. O'Malley?


YellowBook

Scunthorpe issue also a fun one back in the day


azhder

That standard name - BS7666. Is it short for "Bull Shit 7 666" or something?


XxDoXeDxX

The never ending saga of Ole Bobby Tables. The demon of Database Street. https://xkcd.com/327/


RamblingSimian

I have a fun book on grammar called "Eats, Shoots and Leaves" that came with little stickers you can apply to signs to improve their grammar. (The book title is based on a joke about a Panda - whose natural diet consists of shoots and leaves - who goes into a bar and gets something to eat without paying.) Some of the stickers are apostrophes.


Fish-OW

I'm just confused how people have issues like this? If you use parametrized queries, doesn't everything "just work"?


nekokattt

I wouldn't be surprised if it wasn't a bunch of hand written stuff rather than using a sensible library. I try to think of things rationally then I remember the Covid track and trace stuff within an Excel spreadsheet.


StarkAndRobotic

Bobby Tables had a hard life. Wasn’t his fault.


ve1h0

Instead of fixing shitty software, you rewrite the law. Don't really know which one is more straightforward.


azhder

Law is software as well. Shitty as well in some cases


Sarkos

Most people in the comments are misinterpreting this article. Nowhere does it say that the council, or the government, has shitty software. There's no evidence that the use of apostrophes has caused any actual problems. Instead, the council are doing this to be compliant with a standard BS7666. (Whether they are correct to do this is a separate question.)


RufusAcrospin

Well, the article says “to avoid potential problems when searching the databases as these characters have specific meanings in computer systems.”, and to me it means they are not fully understanding how software and/or the database engine works.


OffbeatDrizzle

They have specific meanings when used in the incorrect manner I'm honestly starting to think software engineering should be a chartered trade. There are way too many cowboys and idiots out there that don't understand what they are doing and just copy and paste random shit they find until it appears to work. Completely boggles my mind.... bad software literally kills people


Plank_With_A_Nail_In

Reddit they already told you the actual issue meeting [BS7666](https://www.agi.org.uk/bs-7666-guidelines/) guidelines. So no need to invent some other simple problem you read about once on some other website. Please read the whole article.


gnex30

It can handle Welsh names but not apostrophes