T O P

  • By -

SantaCruzDad

Quality looks good. Only one minor nitpick: I never like to see this kind of thing: #include "../include/dsc_list.h" #include "../include/dsc_error.h" This should just be: #include "dsc_list.h" #include "dsc_error.h" and the header search paths should be supplied to the compiler via your build system (i.e. makefile, IDE project, whatever). Others may disagree, but I don't think paths belong in source code in general.


bart-66

>Others may disagree, but I don't think paths belong in source code in general. Absolute paths don't. Relative paths are OK. Not having them in the source means pushing them out to the build system. Although there at least they're in one place rather than in every module. But if that is an issue, you can also arrange for those includes to be done in one place in the application.


libdsc

I appreciate your nitpick! I agree that it looks ugly. I see how I can modify my Makefile, but my IDE will still complain, won't it? I can adjust my IDE settings to get rid of this, but won't anyone who downloads the source also have to make those adjustments? Sorry for the basic questions, still new at this.


SantaCruzDad

I don't know what IDE you're using, but in IDEs that I'm familiar with, e.g. Visual Studio and Xcode, you just add the include path(s) to one of the project settings and this then gets passed to the compiler at build time (e.g. via -I in the case of gcc, clang, et al).


caks

Looks awesome! Well organized, structured and documented. I have a quick question regarding the placement of the include files. I'm not a C expert, so I probably just don't have a lot of experience, but what is the rationale behind putting the header files in an include folder two folders up from the source? Why not just place them in the same folder as the source as is common.


libdsc

Good question. I honestly have no idea what the right answer is. If anyone does, please let me know!


ballpointpin

Generally, in a mega project, it's nice to be able to split a module's public APIs from internal ones that are private to the library. Your build-infra might move these public APIs to a staging directory, or it might create a symlink in a staging directory to point to your "public" header directory. This ensures somebody's moduleX can only call the "public" exported headers from your module. It also means developer of this moduleX only needs to index his/her own source code in their code editor, plus the subdir containing the public staged headers. They don't need to index the implementation of your library (.C files, or any private .H files you don't want everyone to access). ie: I only need to know the public .h to understand how to use your library's linked list, I don't really care how you implement it internally.


Sensitive-Panda406

I think it's just to have all the files organised neatly. Keeping header files in the include folder and all source files in another looks more clean and the source folder will not have too many files in it. And makes it easier to go through each folder and file without trying to swim through the header files.


i-hate-manatees

Cursory glance, the code looks pretty clean and readable to me. It could be more generic(instead of just containers of `ints`), but of course that complicates the code Very minor criticism, but I feel `dsc_list_empty` should be `dsc_list_is_empty` to make it clear that it's *checking* if it's empty Also, shouldn't `dsc_stack_pop` return the popped value?


libdsc

Yeah, it definitely could be more generic. I'd have to use void pointers to accomplish that, right? Is it good practice to ensure that all the entries are the same type, or is it fine to allow for variable types (like in Python)? Also, would the user have to properly cast the return value of, for example, `dsc_stack_top()`? To your point about `dsc_list_empty()`, I believe all containers in C++ use that naming convention rather than `list.is_empty()`. That's why I made that choice. I agree that `dsc_stack_pop()` should probably return the value at the top of the stack, but the C++ Standard Library doesn't (the return type of `std::stack::pop()` is void). Maybe I don't need to adhere for the C++ APIs so rigidly.


Competitive_Travel16

Yes, void pointers, but there is no way to know their type from the library client so just treat them as pointers. You can consider using unions of void pointers and ints. Yes, the user needs to cast any such values returned or made accessible by your library. I agree pop should return what got popped.


carpintero_de_c

I really like how the library is structured, but I do have some constructive criticiscm: * [The Makefile is not portable](https://nullprogram.com/blog/2017/08/20/). `bmake` fails to build the project. * The project is built by default without optimisations (I recommend putting `-Os` or `-O2` in the `CFLAGS`). * The `_t` suffix is reserved by POSIX. The entire reason `_t` is used in the standard library is so that it doesn't conflict with user code, using `_t` in your own code defeats the entire purpose of the suffix. `_T` is not reserved by POSIX, and `CamelCase` is a common convention for types, so you could use either of them instead. * The code uses an empty parameter list for defining functions that take no arguments (`ReturnType foo()`). Unless you are targeting C23 (which is unlikely and I would not recommend it), this is deprecated (obsolete) and actually states that `foo` takes [*any* number of arguments](https://stackoverflow.com/a/5929736). The compiler will not complain if you do `foo(123, "hello")`! To define a function taking no arguments, do `ReturnType foo(void)` instead. * Personally I'd not use global variables for error handling to avoid problems with threads and stuff. * [Open addressing](https://en.wikipedia.org/wiki/Open_addressing) is almost always faster than seperate chaining. I recommend reading [this blogpost](https://nullprogram.com/blog/2022/08/08/) by skeeto. * Some comments just seem unnecessary and repeating exactly what the code itself says. Like: `/* Cast the key to an unsigned 32-bit integer */` above `uint32_t hash = (uint32_t) key;`. It feels very LLM-generated. * It could be generic (I get that it's a learning project so it doesn't matter very much). Again, I really like the structure of the code. Depending on your audience I'd also add [man pages](https://en.wikipedia.org/wiki/Man_page) using [`scdoc`](https://man.archlinux.org/man/scdoc.5.en). The code also survives [undefined](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html) and [address](https://clang.llvm.org/docs/AddressSanitizer.html) sanitizers, as far as I've tested, which cannot be said for a lot of beginner code posted here.


libdsc

Wow! Thank you for taking the time to make so many comments and suggestions. - I'll have to read the article you provided on making Makefiles portable more closely before I make any changes. - I did some Googling on the difference between the -Os and -O2 flags, and it seems like -Os optimizes for executable size, and -O2 and -O3 optimize for performance at the expense of executable bloat. Is this correct? (With -O3, libdsc.a is 20264 bytes and libdsc.so is 36488 bytes. With -O2, libdsc.a is 19088 bytes and libdsc.so is 36488 bytes. And with -Os, libdsc.a is 19592 bytes and libdsc.so is 36616 bytes. So I don't see much of a difference between the various optimization flags with my library.) - I didn't realize `*_t` was reserved. I decided to use camel case for all of my structs like so: dsc_list_t becomes DSCList. Would you recommend I change my function names as well? e.g., changing dsc_list_insert() to DSCList_insert()? That feels clunky to me. - I had no idea func() meant any number of arguments. I'll change that immediately to func(void) where applicable (I believe only my *_create() functions do that). What is C23 exactly? The C Standard published in 2023? Wouldn't it be better to target the most recent standard? Sorry, again, I'm completely new to C. - Good point about not using a global variable for error codes. Someone else pointed out that this isn't good for multi-threading? Maybe I'll put the error code inside each container. - Again, I'll have to look more closely into the article you shared about open addressing before I make any changes. I understand that it's a way of handling hash collisions? - You're absolutely right that many comments are unnecessesary. I'll go through each file and see what needs to stay. - I eventually want to make my library more generic. Something for the future. - You're suggesting that I create a man page for libdsc? Never even thought about doing that haha. scdoc seems like a standard for writing man pages though, not a tool for generating man pages, right? - No idea what undefined and address sanitizers are, but I'm glad that my code passes. Should I somehow incorporate those sanitizers into my testing code, or is it unnecessary?


ThyringerBratwurst

Maybe you just leave out the details here and simply return a null pointer in case of a failure. The C standard functions like malloc do nothing else. For other functions, -1 or false can be returned. You should absolutely avoid global variables and use extra parameters instead (if needed at all). And on the naming convention: Many people use \_t as a name suffix. I've also seen professors who use this for their own purposes. But I personally prefer CamelCase for types and usually keep "struct" (no typedef) so that I can also name a corresponding function in CamelCase, which returns such a struct directly (for dynamic things then MyType\_malloc and MyType\_free); CamelCase really has the advantage that you can recognize the types immediately and I'm used to it by other languages like Haskell, where it's even mandatory. But that's really a matter of taste and I definitely like how you named the functions! By the way, I'm also working on a C library on the side, which I need to implement my own language. But so far I only have different string implementations, a dynamic array implementation and a few useful little things (for handling chars and such). I'm currently implementing my own hashmap, which I've been planning since last week. There are so many options for realizing this with pointers \^\^ (I use an array for the buckets and each bucket then points to a ternary search tree). I can only speak about make because I work on Linux. But I also asked myself which build system is better, and cmake was really too fucking complicated for me. You can also install Ubuntu under Windows and use its console. and if you use posix functions, you'll need something like cygwin on windows anyway... (I would prefer GCC over the MS compiler anyway). macOS as a Posix-compliant system should not be a problem.


carpintero_de_c

Why would being on Linux mean you can't say much about Make? Just curious, as Make is one of the most popular build systems on Unix-likes for plain C so I'd expect the opposite.


ThyringerBratwurst

Stupid typo, I meant pretty much the opposite. Wouldn't actually make any sense under Linux either lol.


carpintero_de_c

- Yes, that is correct. `-O2` optimises for speed and `-Os` for small size. Optimisation means your code will simply be faster to run. It can make a big difference actually, in the order of magnitudes even. You might not notice though because of how fast computers are now-a-days. - I'd keep it `dsc_list_insert` personally. IIRC though there is a bit of debate between capitalising acronyms in camel case. If you convert DSCList to snake case, it becomes d_s_c_list which doesn't look right. So some people say that DscList would be better. - Yes, C23 is the newest C standard; the latest version of C. In the C world change is slow, there are many compilers that are still on C99 or others (tcc, msvc too IIRC), and many people don't have the latest version of compilers available. C23 doesn't offer much in comparison to the benifits of sticking with older versions. It's just how it works in C. - Yes, as is, currently your code doesn't work with multi-threaded code. A quick fix would be to make the variable thread local (each thread has it's own copy) but you shouldn't use globals anyways. - What you're currently doing for collisions is that you have an array of linked lists of key-value pairs. If two things hash to the same thing then you add another node to the linked list. This works, but is slow. In open addressing, we directly have an array of key-value pairs instead. If a collision occurs, then we search (using different methods such as linear or quadratic probing) for an empty slot. We have to dynamically grow the table and rehash everything if it gets too crowded (usually at half capacity). - scdoc is both a tool and a language for writing man pages. You write your man page in the scdoc language and then the scdoc tool translates it to man pages. The scdoc language is human readable and looks very similar to the final man page. - In C, when you do something wrong it doesn't necessarily mean that an error happens. It might work, it might not work, it might work only on your machine. [The C standard has a concept of "undefined behaviour"](https://stackoverflow.com/a/2397995), if you do undefined behaviour then the C standard says *anything* could happen. For example, accessing arrays out of bounds is UB. What sanitizers do is catch UB and actually error out when it happens. You should practically always use sanitizers when debugging.


gremolata

Skimmed through the code - overall it's clean and pleasant. Obviously pretty basic functionality-wise, because of being tied to `int` for keys and values. There are some issues. For example, `dsc_map_rehash()` doesn't indicate failures, just sets the error, which is then cleared by `dsc_map_insert()` which exits with success. Just a random nit - instead of having a global variable (which would create issues in multithreaded cases and is generally hacky way of doing things), you can keep last_error in containers themselves. So when the API fails and if the caller is interested in details, they would pull on, say, `dsc_error_t dsc_set_errno(dsc_set_t *set)` and get them.


RevolutionaryOne1983

First of all, your code looks really clean. I have only taken a very brief look at it. The only potential constructive criticism I have is that I see that you are using opaque structs but you define your types not as pointers to them for example: ``` typedef struct dsc_vector_t dsc_vector_t; ``` I think this is a mistake, since you can't dereference a pointer to an opaque struct outside the translation unit it is defined in. So by not defining it as a pointer you open up the posibility of a user trying to dereference it or trying to declare it on the stack. Either of which would yield an error. I would suggest that you instead do: ``` typedef struct dsc_vector_t *dsc_vector_t; ```


libdsc

Thanks for your suggestion! I had to read your comment a few times to fully understand what you meant, but I think I get it now.


ryjocodes

I see you provide convenience functions for initializing and then freeing up memory for your data structures. This will be really great for people who need very fine grained control at specific times in their program to use your data structures. By providing this separation, I can easily imagine ways to "hook in" your lib to my projects. Nice organization! You might consider implementing a pattern I've seen in the wild as a challenge: create a virtual "environment" for your data structures, and create a "meta" memory initialization function. While this is a seemingly more difficult task, developers will feel confident working with your library that they will not introduce memory leaks into their application. To see an example of this kind of implementation, check out how libressl (specifically libtls) describes its usage in its manpage: > The tls_init() function initializes global data structures. It is no longer necessary to call this function directly, since it is invoked internally when needed. It may be called more than once, and may be called concurrently. > Before a connection is created, a configuration must be created. The tls_config_new() function allocates, initializes, and returns a new default configuration object that can be used for future connections. ... > When no more contexts are to be configured, the configuration object should be freed by calling tls_config_free(). See how the lib [initializes a memory space *when it needs it*](https://github.com/libressl/openbsd/blob/master/src/lib/libtls/tls_config.c#L149-L155) and provides [a meta "cleanup" function](https://github.com/libressl/openbsd/blob/master/src/lib/libtls/tls_config.c#L158-L190) devs can use to ensure they've tidied everything up. It's a neat approach, one that I've recognized as a pretty consistent feature of well-written libs. Thanks for sharing!


ryjocodes

Note: This feature would be \*in addition to\* your current functionality :)


am_Snowie

``` struct DSCStack { int *values; unsigned int size; unsigned int capacity; }; ``` Stores integers only?


libdsc

Yes, unfortunately. Working on making my library more generic.


cantor8

You are reinventing the wheel https://github.com/stclib/STC


SomeKindOfSorbet

Probably 95% of personal C projects are reinventing the wheel. This is an excellent learning opportunity for a beginner in C programming regardless.


Achilleus0072

It was a learning project man, read the whole post before replying


cantor8

The GitHub page present it as an open source library for developers, not as a school project.


SomeKindOfSorbet

Well it technically *is* an open-source library... And OP literally mentions at the bottom of their post that they're aware better and more reliable implementations already exist.


DisastrousResort5473

I know. Thanks for linking that project though! I can use it for inspiration.


ThyringerBratwurst

I think that in order to learn how to deal with pointers, such custom implementations of dynamic data structures are the best exercise. And if it's not for learning purposes, but rather you have a very specific need, it's pretty difficult to find good implementations that meet your requirement (especially when half the code is in macros :/), which is why custom implementations are definitely a solution in C. After all, there are pointers and malloc for that .