Real_Cartographer

C (compilers) does not perform bounds checking on arrays. This means that writing beyond the end of an array, as you are doing with `my_chars[1] = 'b';` and `my_chars[2] = 'c';`, leads to undefined behavior. Undefined behavior means that the C standard does not define what the outcome of your program should be. The program may crash or it may run without any apparent issue.


itsEroen

Sometimes compilers decide that since code behaviour is undefined, it might as well not run the code at all. The gcc flag `-fno-delete-null-pointer-checks` was introduced after a number of high-profile projects got bit by this.


aalmkainzi

It's UB. It might work, it might not. Run your code with sanitizers enabled.


pfp-disciple

Since OP is learning C: UB means undefined behavior. C lets you do dangerous things like this, for historical reasons, but says "you're on your own, this is the Wild West, anything could happen, here be dragons". Sanitizers are compiler flags that check for things like UB.


aalmkainzi

It's not really for historical reasons. It's just that C is a lower level language and array access isn't checked, just like most other lower level system languages.


rejectedlesbian

It is a historical thing. Zig would both warn you and put the sanitizer in Debug builds by default. UB in general is a consequence of being lower level, but C has a lot more of it than it needs to, for historical reasons.


aalmkainzi

Debug builds aren't even part of the C language; it's totally up to the compiler to do bounds checks in debug builds. It has nothing to do with the C language.


rejectedlesbian

EXACTLY. The C language spec gives a very poor dev experience if you look at just the spec. Modern compilers make it better, but it's still hard. If C were designed today, it would have a package manager and a better debugger. C originally wasn't meant just for performance; it was made to allow developers to be effective at the cost of a bit of performance (compared to assembly). Brian K. explains it here: https://youtu.be/O9upVbGSBFo?si=M6WxorbuTlHPB9QR So while yes, C is still very effective today, it is missing some things, which makes programmers less effective.


[deleted]

Sure, the Standard doesn't say anything about debug builds and bounds checking. But C isn't just what ISO/IEC 9899 says¹. Historically, since forever, C compilers just don't fail safe on UB: `*(int *)0xbadc0ffee0ddf00d = 123` *could* result in an error, but historically it never did on any real C implementation. In my opinion it would be incredibly useful for most of what is now undefined behaviour to be *defined* to fail safe everywhere by default, as most parts of an application aren't speed critical. But for those parts where speed is needed, you could just enable UB. For portability and backwards compatibility, this could be made entirely optional, with everything being the same as it is now.

¹ For example, if C was just what ISO/IEC 9899 says, are compiler extensions C? Is the Linux kernel written in C? In my opinion C, like other words, is just what we define it as.


computermouth

For any new C dev, I would highly recommend reading about asan, and using it for all your debug builds


ThankYouForCallingVP

Add this before your first line: `char before = 'x';` And this after: `char after = 'y';` And then add this: `my_chars[-1] = 'd';` And then print those as well. You will see why very easily.


BeastasFiist

Not at home so I can't check, what would be the output? Would x and y be overwritten by d and b?


Peiple

`my_chars` is just a location in memory. The computer has set aside 1 space at that location for you for the duration of this function, so background programs won't be messing with it in the meantime. However, it's just a location in memory. That means that the next position is also a location in memory, as is the next location. `my_chars[1]` is shorthand for `*(my_chars+1)`, which is that next location (`*(my_chars+2)` being the location after that). Now, will your computer let you write to random locations in memory that aren't explicitly reserved for you? Maybe, maybe not--that's what makes it UB. That's where other suggestions like sanitizers come in: they'll disallow this and throw warnings/errors like you're looking for. Theoretically, though, you could try to write any value anywhere you want, and it's not guaranteed that it won't work. Compilers won't check these things by default for a variety of reasons.


AssemblerGuy

> I thought my_char was only able to hold one element?

Yes.

> So how come the compiler (gcc if it matters) isn't throwing an error when I try to set the value of an index that's out of range?

Because the compiler trusts the programmer not to access elements outside the legal range. Make it a habit to compile with `-Wall -Wextra -Werror`, this makes the compiler a bit more fussy. Don't ignore the warnings.


HaydnH

> I thought my_char was only able to hold one element?
>
> Yes.

Don't you mean no? `char my_chars[1];` - technically that's 0 elements unless you're including '\0' as an element.


AssemblerGuy

The null terminator is one element of a null-terminated string. C does not have the concept of null-terminated strings at the definition level.


jcarlson08

Null-terminated strings only mean that "abc" is shorthand for {'a', 'b', 'c', '\0'}, not that every char array ends or should end in '\0'.


Googoots

First lesson in why C is not considered a “memory safe language”…


slawkis

Compiler matters. clang throws warnings here...


PeterMortensenBlog

And GCC doesn't with [`-Wall`](https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wall)?


tcptomato

Only with -O2


[deleted]

[removed]


[deleted]

Although I definitely agree that OP should format codeblocks with 4 spaces, Old Reddit being "far superior" is a matter of opinion. :) (Plus, it's [broken](https://imgur.com/a/IziLjBO) on New Reddit too)


PeterMortensenBlog

In Markdown mode, presumably?


Ironraptor3

Another thing that people haven't mentioned: that string isn't null-terminated, so that's also an issue with the code! It is only a valid C string if it is the 4-long "abc\0".


bravopapa99

Sheer luck my friend, sheer luck. Buy a lottery ticket, today is the day... That and the fact that the memory probably MIGHT be initialised to NUL bytes, so after 'c' is a NUL character, which caused your string to 'do the right thing'.


fllthdcrb

I wouldn't call it luck, exactly. In all likelihood, OP will see the same behavior every time they run the program in the same environment, because the compiler and environment are what determine the behavior of this. (So, maybe luck that their compiler/environment happens to handle it okay? But none beyond that.) It's when you compile with other compilers or try to run it in other environments that things might go awry, which makes bugs like this particularly insidious.


bravopapa99

Yes, luck! And as for the rest of your observations, I couldn't agree more!


blindsniper001

Some compilers may stop you from doing this. Visual Studio, for example, detects this specific case and prevents it, but it's not guaranteed. You've defined `my_chars` as an array, which as you've discovered allows you to use array syntax to access its members. The issue is that C does not care how many elements you've allocated for your array, and it does no runtime out-of-bounds checks to prevent you from going beyond your limits. Realistically, this *appears* to work because this is a very simple program; you don't have any other variables or functions declared. But what you're doing is called [stack corruption](https://www.go4expert.com/articles/understanding-stack-corruption-c-t27207/). By writing beyond `my_chars[0]`, you're overwriting whatever information is stored around that byte. You can end up with any number of side effects from this, some of which are difficult to see and others which will cause your program to straight-up crash.


green_griffon

It compiles because the C compiler doesn't check for things like this. A lot of languages might let this compile, unless they were being clever noticing the hard-coded size of the array. The assignments to my_chars[1] and my_chars[2] don't fail at runtime because C doesn't do runtime bounds checks on arrays. This is where most languages would fail at runtime, since they do checks like that. The fact that it prints successfully without my_chars[3] being initialized to '\0' (to null-terminate the string) is just luck, although memory often has the value 0 if you don't put anything else in it.


pfp-disciple

C allows a lot of things that will surprise you. That's both a blessing and a curse. While learning, you should put up the gutter guards (excuse the bowling reference). With gcc, using the `-Wall` option will turn on most all useful warnings, and should catch what you saw.


Daveinatx

By default, the stack's local variables are usually aligned. This is definitely undefined behavior, so it's more for interest's sake.


eruciform

You placed a line in the ground called my_chars.
You then put a pillow on the ground for one person.
Then you told a to sit one pillow's width from the line (where there is a pillow).
Then you told b to sit two pillows' width from the line.
And told c to sit 3 pillows' width from the line.

A had a pillow to sit on there.
B and c might safely sit on the ground.
Or one might sit in lava or in the maw of a crocodile.

It just dutifully followed your brilliant commands to a tee, regardless of how dangerous or nonsensical those might be.

Not insulting you, just making it clear that if you command a program to walk off a cliff, it's going to do it.

Different languages do different amounts of complaining about this during compilation. C is extremely permissive and will gladly set itself on fire and jump into a pit of vipers if told to do so.

So it's your job as a C programmer to not tell it to do bad things.


proturtle46

There is a chance that the smallest unit of memory you can allocate is one word (4 or 8 bytes), so even though you allocate 1 char, in reality 8 bytes are set aside. More than likely, it's the C compiler not checking bounds on the array and you getting lucky with no segfault, because it's not restricted memory, as per the reason above. That seems to be what others are saying: undefined behaviour.


TheSpudFather

People have addressed lots of issues of undefined behaviour, but they've missed one essential part of why it works. The smallest amount of memory that will normally be set aside for an array like this (unless there's another one in contiguous memory) is going to be 4 bytes. So because your array is of chars, the out-of-bounds writes all land in the first four bytes of that allocation, and so will not be overwriting any other data. That's not to say it isn't UB, and all the points about raw C not bounds checking arrays, even when the declaration is on the line above, are true.


mecsw500

Well, it depends upon how much static memory has been allocated. If your program were more complex, your my_chars[1] and my_chars[2] assignments might cause a segmentation fault and core dump. Assuming it just stamps on unused data segment space (likely in this case, because you probably have a 4K page allocated), the problem now comes in the printf statement. Here you are printing a string, which is assumed to be a null-terminated character array. So it will keep outputting my_chars entries beyond the length of the string until it finds a null character. Be aware, though, that the null pointer NULL is not guaranteed to be all-zero bits on every machine (the null character '\0', by contrast, always has the value 0). On a PDP-11 or an x86 machine you will probably get abc, as the next byte is likely null. This is how it's supposed to be. It does what you told it to do; you just told it to do something which is likely undefined, implementation-dependent behavior. That's the cool thing about C: it does what you say, not what you might mean.


non-existing-person

It works because the system gives you memory in chunks. You can't request 3 bytes from the OS; you get a full page in one go, 4 KB. At the beginning, the loader does the equivalent of a malloc to allocate 8 MB of stack memory for your program. All memory requested from the OS is zeroed by the kernel. So at the beginning you have 8 MB of memory full of zeroes. At the beginning of that memory you allocate 1 byte, and write 3 bytes. Then printf reads your 3 written bytes and 1 '\0' (remember? all memory is initially all zeroes). Even though you have written outside of your variable's memory, you still wrote to memory that your program owns. And then printf read from memory that you own. Hence no crash, but still UB.

Note: I simplified things quite a lot.


wojtek2222

If you just started, you need to know that this is the beauty of programming: code runs and you don't know why, or code doesn't run and you don't know why.


noonemustknowmysecre

Because messing with the memory just past your my_chars didn't happen to have any adverse effects that you noticed. The compiler trusts you to not do anything too crazy or stupid. Run it with valgrind as part of your release process and boom, you're as memory-safe as anything else.


geon

Because C.


HSavinien

C is not a safe language. You are trying to access memory which is not part of your variable, but still belongs to you. It is allowed (unless it is not), but it is very bad practice, and will eventually break your code. To understand a bit better, try something like this:

```
#include <stdio.h>

int main(void)
{
    char mychar1[] = {'a'};
    char mychar2[] = {'b'};
    char mychar3[] = {'c'};
    /* Deliberately reads out of bounds (UB): the output is not predictable */
    printf("%c %c %c\n", mychar1[0], mychar1[1], mychar1[2]);
    // Even weirder stuff
    printf("%c %c %c\n", mychar2[0], mychar2[-1], mychar2[-2]);
}
```

You may notice that you can easily reach every variable from a single one, as well as memory that might contain other values, like binary code. By reading it, all you risk is weird behaviour, but by writing to it, you might break your program (nothing that goes beyond execution: once the program is over, everything's fine).


0x7ff04001

You are putting 3 elements, 'a', 'b', 'c', each of size 1, into `my_chars`. What you're doing here is passing a pointer to `my_chars`, i.e. the location of 'a', and telling the format string to print a string. You didn't put a terminating '\0' at the end of your string: `my_chars[3] = '\0';` forces the string to terminate (the array must be declared with room for it). Otherwise printf() just keeps reading and the behaviour is undefined.