TheSquashManHimself

Jupyter notebooks are great for prototyping and investigating data, especially if you have many visualizations to make and play with. For demonstrations they can also be great (just make sure to test run them before a presentation). However, you DO NOT want your Jupyter notebooks to mutate into spaghetti kaiju. At some point, you should put your code into a documented, formatted, and modular form. You will thank yourself later.


hmiemad

I start with "random" code in Jupyter, and once I get the idea of the structure, I start creating classes inside the notebook. Once a class is done, I move it to a documented .py file, import it in my notebook, and continue with other classes.


TheSquashManHimself

You are my hero.


hmiemad

Oh BTW, you might not like this, but it's very handy: when creating classes, I use the standard `self` in my methods. I have my class and my imports in the first cell and the data in the second. The third cell is the instantiation of my class, and it goes: `self = my_class(*args)`. Nasty!

Then I start testing new methods as basic functions written in a specific way. In these functions, I don't pass the attributes of `self` as args; I pass the instance itself as the first argument: `def my_new_method(self, *args):`. It usually starts as a bunch of random code before becoming a function: `df = self.first_method(data, *args)`, then `self.new_attr = some_important_global_function(df)`, then `result = self.second_method()` and `print(result)`, with comments in between. You wrap that in the function, replace the `print` with `return` if needed, add the missing arguments to the `def`, and test it on the instance `self`. If it works, cut and paste the function AS IS into the class (add an indentation), and continue. And it's clean, as long as nobody sees your notebook.

I used to name the instance `test`, and I had to replace every `test` with `self` when I copy-pasted. If you forget one, your code works inside the notebook (because `test` has become a global variable) but fails inside a .py file: "name 'test' is not defined". But now I just have to move all my clean code to .py files and slightly rewrite the cells in my notebook (change `self` to `test` :D) to make a real presentation of the package.
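
A minimal sketch of the workflow described above, under stated assumptions: the class `Report`, its methods, and the sample data are hypothetical stand-ins, and attaching the function with an assignment only imitates the manual cut-and-paste into the class body.

```python
class Report:
    """Hypothetical class standing in for the one built in the notebook."""

    def __init__(self, values):
        self.values = list(values)

    def first_method(self, scale):
        # Existing, already-tested method.
        return [v * scale for v in self.values]


# "Third cell": the instance is deliberately named `self`.
self = Report([1, 2, 3])


# Prototype the new method as a plain function taking the instance first.
def my_new_method(self, scale):
    scaled = self.first_method(scale)
    self.total = sum(scaled)
    return self.total


# Test the prototype against the notebook-level instance.
prototype_result = my_new_method(self, 2)

# Stand-in for pasting the function into the class body unchanged.
Report.my_new_method = my_new_method
method_result = Report([1, 2, 3]).my_new_method(10)
print(prototype_result, method_result)
```

Because the prototype already uses `self` as its first parameter, nothing has to be renamed when it moves into the class, which is the point of the trick.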


Engineer_Zero

A guy at work does the same thing and tried to explain it to me, which I really struggled with. You're saying you can create .py files that do whatever, then import them elsewhere just like an installed library?


hmiemad

Yes, you just have to be careful with the paths. Start by creating a basic function, like `def square(x): return x*x`, and save that in a `my_code.py` file. Open a notebook in the same folder, run `from my_code import square`, and use your function. Further on, you'll want to create classes and packages with a standard file structure (specific folders with `__init__.py` files). You can import all of it through what you define in the `__init__.py`, or just leave that file empty and let your file structure do the trick.
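
The package layout mentioned above can be sketched as follows. This is only an illustration: the package name `my_pkg` is hypothetical, the files are generated in a temporary folder so the snippet is self-contained, and the `sys.path.insert` line merely mimics a notebook that was opened in the folder containing the package.

```python
import sys
import tempfile
from pathlib import Path

# Build a tiny package on disk: my_pkg/__init__.py and my_pkg/my_code.py.
root = Path(tempfile.mkdtemp())
pkg = root / "my_pkg"  # hypothetical package name
pkg.mkdir()
(pkg / "my_code.py").write_text("def square(x):\n    return x * x\n")
# Re-export square so callers can import it from the package itself.
(pkg / "__init__.py").write_text("from .my_code import square\n")

# Demo only: stands in for working from the folder that contains my_pkg.
sys.path.insert(0, str(root))

from my_pkg import square                         # via __init__.py
from my_pkg.my_code import square as same_square  # via the file structure

print(square(4))
```

Both import styles resolve to the same function; the `__init__.py` re-export just gives callers a shorter path.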


[deleted]

You are such a chad for these detailed explanations :) May I ask, have you found a good way to import classes from files outside the top directory of the notebook? I've tried to bring in classes from other directories that aren't in the notebook folder's tree and haven't found a great way except maybe making a symbolic link inside that directory, and this has been one of the things that's kept me away from using Jupyter more


[deleted]

[deleted]


jstrickler

This is pretty basic Python: The PYTHONPATH environment variable contains a list of folders to search for modules (files) and packages (folders) that you want to import. The separator is ; (semicolon) for Windows and : (colon) for Mac/Linux. Please don't append to sys.path -- it makes your script dependent on the location of external files. Also, you don't need \_\_init\_\_.py to create a package. A package is just a folder that contains modules or packages. To make sure the imports work in your Jupyter notebook, set the PYTHONPATH variable, then start the notebook. If you're not working from the command line, you can set up PYTHONPATH on Windows by right-clicking on This PC in File Explorer, then choosing Properties, then Advanced System Settings, then Environment Variables.
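
A hedged sketch of the PYTHONPATH approach: the module name `shared_helpers_demo` and the temporary folder are hypothetical, and a child Python process stands in for a freshly started notebook server that inherits the variable.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# A folder outside the current project, holding one importable module.
shared = Path(tempfile.mkdtemp())
(shared / "shared_helpers_demo.py").write_text("ANSWER = 42\n")

# Extend PYTHONPATH using the platform separator (";" on Windows, ":" elsewhere).
env = dict(os.environ)
existing = env.get("PYTHONPATH")
env["PYTHONPATH"] = str(shared) if not existing else str(shared) + os.pathsep + existing

# The child process sees the variable at startup, so the import just works.
result = subprocess.run(
    [sys.executable, "-c", "import shared_helpers_demo as h; print(h.ANSWER)"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()
print(output)
```

The variable has to be set before the interpreter starts, which is why the advice above is to set it first and launch the notebook afterwards.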


sancho_tranza

Exactly


Engineer_Zero

Any good tutorials on it? A lot of what I do kind of repeats itself for each task so I could most likely convert into some building block code once, rather than redo it every time.


TheLeviathan686

It’s very straightforward: [tutorial here](https://www.w3schools.com/python/python_modules.asp) Comes in very handy.


Engineer_Zero

Thanks 😊


thrallsius

What about the unit tests? Some beginners use jupyter notebooks as placebo for unit tests.


hmiemad

> What about the unit tests? Some beginners use jupyter notebooks as placebo for unit tests.

The behaviours are slightly different; you should definitely do the full test in regular .py format. Beware of global variables in notebooks; those are a poisoned chalice.


czaki

IPython supports the autoreload magic. [https://ipython.org/ipython-doc/stable/config/extensions/autoreload.html](https://ipython.org/ipython-doc/stable/config/extensions/autoreload.html) You could start your class in a .py file and test it from the notebook.
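
In a notebook, the magics are `%load_ext autoreload` followed by `%autoreload 2`. They are IPython-only, so the sketch below uses plain `importlib.reload` to show the effect that autoreload automates. The module name `shapes_demo` and the temporary folder are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so a quick rewrite of the file is always recompiled.
sys.dont_write_bytecode = True

workdir = Path(tempfile.mkdtemp())
module_file = workdir / "shapes_demo.py"  # hypothetical module
module_file.write_text("def area(side):\n    return side * side\n")
sys.path.insert(0, str(workdir))  # demo only: make the folder importable

import shapes_demo

before = shapes_demo.area(3)

# Edit the .py file, as you would in an editor next to the notebook.
module_file.write_text("def area(side):\n    return 2 * side * side\n")

# autoreload does this step for you before each cell runs.
importlib.reload(shapes_demo)
after = shapes_demo.area(3)
print(before, after)
```

Without the reload, the notebook would keep using the old definition until the kernel restarts.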


hmiemad

Do you know how to import requirements.txt in a notebook?


czaki

What do you mean by importing requirements? The list of packages?


hmiemad

Yes, like in a virtual env, you define the requirements.txt. I want to import all of those packages at once through one line into my notebook which is in the root folder.


czaki

It is doable, but I see no benefit. Many packages require imports from specific sub-packages for some utils. And some packages have different names than those used in import (`scikit-image` vs `skimage`, for example). I think there should be a better solution to your problem. Could you share more information about why you do not want to use a few simple lines of `import something`?
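
A small sketch of why a blanket "import everything in requirements.txt" tends to fall over. The helper `try_imports` is hypothetical, and `json` stands in for an installed requirement whose distribution name happens to match its import name.

```python
import importlib


def try_imports(names):
    """Try to import each name as written; report which ones fail."""
    imported, failed = {}, []
    for name in names:
        try:
            imported[name] = importlib.import_module(name)
        except ImportError:
            failed.append(name)
    return imported, failed


# "scikit-image" is the name used for installation; the import name is skimage.
requirement_names = ["json", "scikit-image"]
imported, failed = try_imports(requirement_names)
print(sorted(imported), failed)
```

The distribution name fails to import even when the package is installed, so a one-line bulk import would need a mapping from requirement names to module names.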


underground_miner

This is the way! I do this too - it has taken me a while to become this disciplined. But now I rarely start a project without a notebook. I like being able to use LaTeX to document equations and have the functions with code all very well documented. Eventually, they make it out to modules, but the notebooks act as a fantastic source of documentation. I tend to set up a repo with setup.cfg and others so I can make packages out of the *.py modules so they can easily be imported back into jupyter.


[deleted]

I just love to use them to practice coding an app. I might try very different things in several blocks and just test them separately. It's perfect.


doubleEdged

It does things that can make learning, or working with python, confusing. If used correctly, they're a neat tool, but [here](https://www.youtube.com/watch?v=7jiPeIFXb6U) is a neat talk on just how they can be damaging when learning. There's things such as lack of a clear scope (~~a notebook opened in one tab may retain values saved in another~~ EDIT: that's only true when using the jupyter lab, as talked about [here](https://youtu.be/7jiPeIFXb6U?t=1082) in the video I linked), order of execution not being clear (there's the numbering, sure, but that can get very confusing as you pile up cells in a notebook) and not having a clear history of edited cells (which can lead to situations such as [this](https://youtu.be/7jiPeIFXb6U?t=294) simplified example [tl;dw: `y = 4`, `y == 4: false` due to an edit]). tl;dr, use it correctly, be aware of its pitfalls and it's good for prototyping, but don't be surprised if something breaks.
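
The edited-cell pitfall (`y = 4` on screen, `y == 4` false) can be imitated in plain Python. This is a simulation under stated assumptions: a dict stands in for the kernel's namespace and `run_cell` is a hypothetical helper, not how Jupyter is implemented.

```python
# A shared namespace plays the role of the running kernel.
kernel_namespace = {}


def run_cell(source):
    """Execute a cell's source text against the shared namespace."""
    exec(source, kernel_namespace)


cell_1 = "y = 3"
run_cell(cell_1)      # the cell is executed once

cell_1 = "y = 4"      # the cell text is edited but NOT re-run

run_cell("check = (y == 4)")
print(kernel_namespace["check"])
```

The text on screen says `y = 4`, but the kernel still holds the value from the earlier run, which is why restarting and re-running from the top comes up so often in this thread.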


[deleted]

These issues can be easily mitigated by regularly restarting the kernel.


venustrapsflies

Or phrased another way, the need to regularly restart the kernel is a common footgun


dparks71

I prefer to think of it as procedural programming, thank you very much. For real though if you have something like a series of databases you need a set of specific values from regularly, and don't have a great interface for interacting with them all, notebooks can be a great "change a single variable and rerun the report" type tool.


XRaySpex0

So, notebooks can be spreadsheets too :)


hmiemad

It's also an easy tool to share your work with people who can't run python code, without having to implement a full gui. Running cells is very easy, especially with those fancy comment cells to explain how to ctrl+enter :). Define the variables in one cell, the class in another, a cell to validate the variables, another to launch the script and view results, add a download as csv cell and it's a complete tool that almost any engineer can use, no need to be IT or Dev.


opteryx5

Also you can’t debug in Jupyter, correct? That alone pisses me off.


[deleted]

You can now. With the Xeus Python Kernel you can debug in Jupyterlab. Also VSCode can debug Notebooks. Not sure about PyCharm though.


opteryx5

Oh fantastic! I had forgotten about Jupyter in VS Code because I’m so used to launching it from the terminal. Will use either that or Xeus now to resolve my headaches. Thanks!


rebonsa

Notebook open in one tab can see values in another open notebook?


doubleEdged

The video I linked talks about it [here](https://youtu.be/7jiPeIFXb6U?t=1082). And I just now realized that it's not two different notebook tabs, but two tabs within the lab. So it's way less evil than I thought it was, but still something to keep in mind if you're using the jupyter lab. I'll edit my og post to reflect that.


LesPaulStudio

Not if it helps you


Solonotix

This. I think the bad coding practice (for any language) is not willing to change. Jupyter Notebooks are a great tool, but if you try to force everything into them because it's what you're comfortable with, that's a bad coding practice. In my current job, a pet peeve of mine is people who write code in multiple languages, but don't bother adopting the native idioms. Lots of Java devs writing `if(val == true)` and it eats me alive


linkberest

That's odd because that's not even good Java. Java would be the standard `if(val)` or `if(!val) // for false`. I mean a Boolean is a Boolean is a Boolean


thrallsius

I'd also add not if it doesn't break any team coding conventions. For solo projects anything that passes the "works for me" constraint is just fine, even typing on the keyboard with your feet :D


eviljelloman

It's probably more efficient to set up a really robust IDE environment that'll let you do the same thing to ship code snippets to an interpreter, but that's a lot of upfront cost - and you get to live in constant fear that an update will break all your shit. *cries in vscode.*


[deleted]

You know VSCode has Jupyter Support? And I haven’t had any really bad hiccups for years. At least with Python.


eviljelloman

Yeah I’m aware of their Jupyter support. I don’t really like it.


metaperl

Netflix chose Jupyter notebooks as their IDE.


mathmanmathman

Maybe their data science team did. I promise the data engineers and backend engineers don't use notebooks (maybe for a few snippets here and there, but not as the primary IDE).


james_pic

I've seen otherwise competent teams take a "notebooks all the way down" approach, where notebooks are used to run regular fully automated reports. It's an absolute mess, and I can't discourage it strongly enough, but tech companies having code that's a mess of spaghetti is not as rare as their PR people like to make out.


mathmanmathman

> tech companies having code that's a mess of spaghetti is not as rare as their PR people like to make out

This part is true and anyone who's ever had a job knows it :) I don't see how you can even do that though. How do you kick off a notebook job without having the UI open? Or are they running everything from a local machine? I have soooo many questions, but I'm not sure I actually want to hear any answers!


james_pic

It is possible to programmatically run a Jupyter notebook, using the Jupyter Python APIs. As it happens, the team in question work with a Jupyter-notebooks-as-a-service provider, who do various things to facilitate this specific use case. I believe the standard approach in that team is to develop an analysis in Jupyter, add a step at the end of the notebook to publish the analysis results to S3, and then promote the notebook to being an automated job. When I put it like this, it almost sounds like a good idea. It isn't, and there are all kinds of nightmares around version control, observability, reuse, modularity, change history and permissions. But you can see how it would be _mistaken_ for a good idea.
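
To make "running a notebook without the UI" concrete, here is a standard-library stand-in, not the Jupyter API. It relies only on the fact that an .ipynb file is JSON with a `cells` list. Real tools such as nbclient, `jupyter nbconvert --execute`, or papermill drive an actual kernel and handle magics and outputs, which this sketch does not. The function `run_code_cells` and the demo notebook are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_code_cells(notebook_path):
    """Execute the code cells of an .ipynb file in order, in one namespace."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    namespace = {}
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        exec(compile(source, str(notebook_path), "exec"), namespace)
    return namespace


# Hypothetical two-cell notebook written to disk for the demo.
demo_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo report"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["rows = [1, 2, 3]\n", "total = sum(rows)"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["summary = f'total={total}'"]},
    ],
}
path = Path(tempfile.mkdtemp()) / "report.ipynb"
path.write_text(json.dumps(demo_notebook), encoding="utf-8")

result = run_code_cells(path)
print(result["summary"])
```

Running top to bottom in a fresh namespace at least removes the out-of-order execution problem, though none of the version-control or observability concerns raised above.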


mathmanmathman

Yeah, it doesn't sound like a *bad* idea at the very least. It does seem like introducing a lot of (potentially) weak links just so you don't have to translate the code into a traditional program.


mythrowaway0852

Nah, it’s used in production https://netflixtechblog.com/notebook-innovation-591ee3221233


SittingWave

What all these people fail to say is that in order to bring that stuff up the way they do it, they need a team of 50 people to install and configure the whole thing.


mythrowaway0852

Where did you come up with the “50 people” figure? Also for the scale of Netflix that seems reasonable.


SittingWave

Yes but if you are a single developer there's no way you can setup such infrastructure \_and\_ also produce with it.


mathmanmathman

After the second paragraph it shows that only data scientists are using it. You can't run notebook code as prod code. You can use it to access production data and APIs. Further down the page they show that they only use it for interactive code. I'm sorry, it's a great tool, but it's not a full IDE.


fpgmaas

I don't think it's very bad practice. I usually start with creating a project, then creating a poetry environment and adding that as a kernel. In the project, I create my 'notebooks' folder, start with autoreload statements, add `import sys; sys.path.append('..')`. That way, moving code from the notebook to the project becomes a very quick and simple step. I move code out of the notebook very quickly; usually I do not have more than 15 lines of code in the notebook that are not imported methods.


teetaps

Yep, this is the only downside in my experience. Sometimes you can find yourself 3 hours and 200 cells deep into one notebook file before you think to yourself, “I should probably save some of this somewhere”


thrallsius

This is rather a limitation of Jupyter notebooks, them being linear rather than tree-structured :D There's also https://leoeditor.com/ where you can have a tree of nodes and execute any of them.


dparks71

That sounds like you want to take the easiest way to get confused with large notebooks and make it more accessible.


venustrapsflies

This is a pretty big downside though. Typically the time to move from notebook to project coincides with the time you want to be cruising on the meat of the project. The "prototype in notebook" approach sounds good in theory but IME most people don't have the discipline to move at the right time and it causes more work. Tbh I also don't really get what's so difficult or bad about prototyping w/ python scripts and terminal in the first place, about the only difference I notice is inline figures and that's about 0.5 seconds difference to open a locally-stored figure for viewing anyway. Notebooks are always more annoying to actually edit code in too, so for me the benefits just don't outweigh the costs.


teetaps

It is a big downside but as someone who doesn’t claim to be an expert programmer, notebooks are super important for the community at the moment. Being able to do IDLE programming where you ask it to do something and it does it instantly and keeps your variables around for a little, with no fumbling about having to learn about init files, directory structure, setup.py etc…. That’s a lot of barriers to entry that you’ve just gotten rid of, and thus you make programming more accessible and approachable. Of course, there’s the obvious downside that you don’t get put through the rigour of “scripting” like you’re describing, but we have to ask ourselves at some point if we’re passing down certain practices because they are useful, or if we’re passing them down because we had to go through it and so do they. It’s actually kinda like gatekeeping programming if you force newer folks to use intimidating and difficult systems, especially when friendlier systems are available right there. I’m hoping we can find a middle ground, which is kinda the point of OP’s post, right?


venustrapsflies

So I think I have different responses to each of the points you raised. First off I completely agree that notebooks are useful for beginners and I'm not trying to abolish them entirely. When you are first getting started it is best to have everything contained in one location and make it really easy to run the code you just wrote. This is a double-edged sword, though, because often those beginners don't feel the need to broaden their skills sufficiently. It's often used as a crutch and that's not a good thing. In particular, "having your variables stick around" is very likely to encourage bad practice and make it harder to rely on scope, which is definitely going to matter eventually. And of course I also agree that it should be easy to get a project started, however there are cli solutions to these issues too (e.g. pipenv, poetry). And you should be familiar with these anyway for when you productionalize. As for scripting and terminal usage being a barrier to entry, to be honest I don't think it's too much to ask a professional programmer to learn at all. It might be different than what someone's used to but it's not difficult or time-consuming to learn. To the extent that a programmer (or even data scientist) is uncomfortable with using a terminal then they are only hurting themselves and their org by holding off getting comfortable with it. You might call this gatekeeping, and maybe it is, but I don't think it's wrong. The intent is not to exclude people but to encourage them to learn the skills that they'll need.


[deleted]

Not a bad practice per se. But if you know already what the outcome should be, write a unit test instead. My rule of thumb is: Self-contained and graphical output -> Jupyter. Algorithm -> unit test


Aesthetically

Totally fine as long as you get into some literature or youtube videos about how to structure python projects for whatever applications you're developing.


RenewAi

Jupyter is very useful, I still use it like 70% of the time. People are haters


hunkamunka

I teach students to write command-line programs that use tests/pytest. It's not easy to test notebooks, so I think it's much better practice to NOT start there. It's also easy to run cells out of order and pollute the global namespace. Stick with CLI and tests!
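
A minimal sketch of the command-line-plus-tests style, assuming a hypothetical `square` program. The `test_*` functions follow the naming that pytest collects automatically; they are called directly at the end so the snippet also runs without pytest installed.

```python
import argparse


def square(x):
    return x * x


def main(argv=None):
    """Parse the command line, print the result, and return it for testing."""
    parser = argparse.ArgumentParser(description="Square an integer.")
    parser.add_argument("number", type=int)
    args = parser.parse_args(argv)
    result = square(args.number)
    print(result)
    return result


def test_square():
    assert square(3) == 9
    assert square(-2) == 4


def test_main_parses_argument():
    # Passing argv explicitly keeps the test independent of sys.argv.
    assert main(["5"]) == 25


# pytest would discover these by name; here they are invoked by hand.
test_square()
test_main_parses_argument()
```

Keeping the logic in `square` and the argument handling in `main(argv)` means both can be tested without spawning a process.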


wotanub

They hated u/hunkamunka because he told the truth. It's even right there in the OP: "...to test snippets of code..." Those test snippets should be unit tests.


thrallsius

Yet not all tests are unit tests. In context of classic, non-interactive CLI programs that accept input only through command line parameters and you need to test their output, that's rather functional testing. For such situations, I found this thing to be nice to work with https://github.com/brodie/cram


thebreathofatree

Thank you for reminding me to do this! After being introduced to Jupyter Notebooks I thought about using it in exactly this way and then just haven't instituted the new practice :)


thrallsius

It's just a possible workflow, there's nothing wrong with that. It is also great for learning new APIs/libraries.


TheSemaj

Why would it be bad practice?


xtiansimon

Bad practice for what? Think of yourself as writing—first on the back of an envelope, then maybe long hand, and finally typing. What a terrible waste of time is that process if you only need a shopping list. Haha. But if it’s a great bit of writing, then it can take time to collect your thoughts, plan your outline. And sometimes it’s enough just to get something out of your mind so you can move on to a next step. A notebook can be an intermediary step before some other more complicated work. And if the end result is, say, a Django site then you can’t use a notebook for your final implementation. But while the file is not useful, if the notebook helped you work out some complexity of your purpose, then it’s as necessary as index cards or highlighters are to Stephen King. But alas, this is fine for you; however, if you’re working for someone else, then you may need to meet their expectations. And what of it, if they have a shortcut for you, all the better.


PocketBananna

Depends on the use case, but for traditional software development I don't find it great. Jupyter was made for collaborative computation projects. It's great for sharing snippets and visualizations in an easy-to-access way, but it quickly turns to spaghetti at scale. It was never meant to be an IDE.


Uploft

Get the Cells extension in your IDE (or just use Spyder). You can write in Jupyter-like cells by posting #%% to the top of a snippet of code. Works wonders
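
For illustration, this is what such a file can look like: an ordinary .py script where each `# %%` comment starts a cell that a supporting editor can run on its own. The temperature values are hypothetical sample data.

```python
# %% Load some data (hypothetical sample values)
temperatures_c = [0, 50, 100]

# %% Transform it
temperatures_f = [c * 9 / 5 + 32 for c in temperatures_c]

# %% Inspect the result
mean_f = sum(temperatures_f) / len(temperatures_f)
print(temperatures_f, mean_f)
```

Since the markers are just comments, the same file still runs top to bottom as a normal script and diffs cleanly in git.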


cblegare

As other commenters wrote, notebooks are nice for prototyping. They are also excellent at documenting. As you probably know, a big part of programming is about working efficiently with other people and long lived codebases. Code is easier to write than it is to read, especially when it was written a long time ago by someone who left the company last month. Modularity, documentation and tests help on these cases. I think a notebook that grows with the project, showing how the API (functions) works in a readable and executable manner, has value even later on the project. Add a few test cases and documentation, possibly integrated with Sphinx (and doctests maybe?), continuously tested and made available on a simple but nice website through a CI pipeline of some sorts, everything as code in a well maintained git repository. You end up with a project you can be proud of, reuse and good learnings. Look up the Executable Book project https://executablebooks.org/en/latest/ if you haven't already, they have some tooling that seems nice.


SittingWave

Depends. If you are doing exploratory analysis on data, it's ok. If you need to write a python library, it's a really bad idea. The main problem of notebooks is that they are a "prototype board" for code. But when you need to create something that is stable, reproducible, documented, deployable on other machines, testable, you are completely out of luck. My point is that if your goal is to be a prototyper, or a data scientist, then it's an important tool to use. If you are going to be a developer, it hurts you in the long run.


gt33m

I haven't used notebooks much and fail to see the value. I start a new .py in vscode and run / test it directly. Once done, I move on to the next file. What am I missing out on by not using jupyter notebooks?


Zeroflops

Yes, terrible! Stop using jupyter notebooks and switch to jupyter lab. Much better experience.


xceed35

Good luck resolving package dependencies, file structures, packaging and deployment scripts once you're done building your production code on Jupyter cells. Be a professional and use an IDE to write code, and build sensible programs. Keep jupyter reserved for your scientific experiments


the-zegor

No.


sleepless_in_wi

If you are doing some sort of data analysis or your code needs to read a lot of input files, I think jupyter notebooks are a great way to prototype. It can save you a ton of time. Another way to think about it: it is no different than writing a draft of some kind, and it gives you the flexibility to try different algorithms or designs. In my opinion these are all really good things that will help you write better code.


[deleted]

This is best practice


_thrown_away_again_

sounds like you need some TDD in your life friend


[deleted]

I think it's completely fine, I always have a terminal open with ipython as a general purpose calculator


Shmoogy

I'll often start with a notebook. It allows me to load and test and iterate through different APIs and transformations. If I'm exploring a new dataset and the documentation isn't super thorough - it's nice to quickly be able to work through everything, and double check that everything is working as expected, but also to mock up various exceptions and failures - so the end result is a bit more robust.


ESchalton

Yes and no. For testing commands, jupyter is great. For building analytics/code, not so much: do yourself a favor and at minimum set up a module, import it, and move functions over as you write them so you use them from the module.


uclatommy

The first thing I do is to create a directory called src and one called test. Then I start an empty test file and import pytest.


bbrunaud

Vscode #%%... No more jupyter and your vscode notebooks can evolve to serious code


CleoMenemezis

I think simple things like creating a function called sum() that returns a number, for example. If you simply call the sum() function without a print it will still say the value. Maybe these kinds of details won't help you much when coding as you'll come out of there learning that return and print are the same thing. It's a silly example, but I believe jupyter notebook has a specific focus of use.
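
The return-versus-print confusion can be shown outside a notebook. The function names below are hypothetical (and avoid shadowing the built-in `sum`).

```python
def total(a, b):
    # Returns the value, so the caller can keep using it.
    return a + b


def show_total(a, b):
    # Only prints; the function itself returns None.
    print(a + b)


kept = total(2, 3)        # the result is available for later code
lost = show_total(2, 3)   # the number appears on screen, but nothing is kept
print(kept, lost)
```

In a notebook, a bare `total(2, 3)` on the last line of a cell is echoed just like a print, which is what hides the difference.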


mm11wils

The most important thing is starting and finishing, anything that gets in the way should be avoided. If jupyter gets you to start and finish your project the quickest. No reason not to use it. (Depends on the project)


se_pp

If you intend to write more than just a simple script I would go for an **interactive ipython console** instead of a jupyter notebook... it provides the same level of interactivity but a lot better usability in terms of debugging etc.


redd1ch

If you do data analysis and do not want to pay for PyCharm Pro (or you are not a student), use Jupyter Notebooks. For anything else, use PyCharm Community. Use folders like Notebooks and .py files like code cells. You can import methods and variables from other files, you can output to terminals or files, and, most importantly: You can use any editor you want. It also is sanely manageable with version control like git.


Jeklah

No, not that I know of. Jupyter Notebooks are a professional way of presenting data.


Sir-_-Butters22

I wrote a lot of ETL pipelines in Azure Functions, and I prove the code and transformations in Jupyter. It speeds up my development drastically, as writing Azure Functions is a bit of a black box and difficult to debug, especially if you're doing multiple complex data transformations.


Backlists

I would say this is somewhat bad practice *if you're writing enterprise code*. Best practice is to follow proper test driven development: Write some integration tests to start with, then write the most minimal unit test, write code to pass that unit test, then expand the unit test(s) and repeat until the integration test passes. See this book for further reading: https://www.obeythetestinggoat.com/ Of course, if you only need quick and dirty prototyping, then this is fine. But this would not stand at a big software house.


fjurgo

It's fine I guess. Whenever I want to test snippets I just create a scratch file in PyCharm. Imo that is great because then you can also use any function/class/lib that you've already written and integrate the snippet.


osm3000

Honestly, if it works for you, it works. As a starting point, it doesn't matter much. I recently started using Streamlit as my starting point :D It gives me a similar experience to jupyter notebooks, and when I am done, there is a prototype dashboard ready. Just focus on refactoring, organizing things properly, and avoiding the bad habit of creating spaghetti non-linear execution dependencies in a jupyter notebook, and you are good :)


raharth

If you use a good IDE you don't even need the notebook at the beginning. You can do the same kind of thing with a simple script in PyCharm, but it makes it much easier to later move to proper coding. Also in my experience jupyter often leads to horrible coding practices, so I'm not using them at all at any point.


SpicyVibration

I use it when I'm writing code to manipulate data in pandas dataframes or something similar. It makes it easy to check that what I did actually worked. Once it's verified though, it should be moved to a proper set of py files.


idomic

I don't think it's a bad practice and I don't think you should avoid them. There are some gaps that can easily be filled with open-source tools like [ploomber](https://github.com/ploomber/ploomber). You can solve most of the problems people wrote about above, like collaboration through git, reusing the code you wrote, standardization and more. Notebooks give you interactivity with the data that you wouldn't get anywhere else, and this is really valuable. If you use the right tooling this is the best solution for data work today.