How to Create a Python Package

Getting started with Python is remarkably easy. For that reason, it is often the first programming language people learn. From my perspective, Python is the most versatile and user-friendly language for beginners. For me, it was the first real language I learned in university while studying mechanical engineering. In talking with professors from all over the globe at conferences such as SciPy, they generally say that a basic foundation in Python is a must for all of their students; undergrad, graduate, and PhD students alike.

Now, getting started in Python and getting started building meaningful tools and applications in Python are two completely different beasts. Actually sitting down and writing my first Python package wasn't something I did until after doing Python development here and there for a few years. I had contributed to other people's Python packages, written lots of Python code that runs on things like robots, and built tools for automated testing. But writing a package of my own was somewhat terrifying. I'm not exactly sure why, but I think it was mostly because I was afraid of admitting to myself that there was something incredibly basic that I didn't know. In the engineering world, accepting your weaknesses is a hard thing to do; but what you do about them is one of the biggest measures of an engineer. That was all true until the day I said, "Hey, I can figure this out and it's nothing to be ashamed of."

Hopefully you have come across this post in search of the same thing I originally set out to find some time ago, which is to do just what the title says: create a distributable Python package. We are technically going a little deeper than just creating a directory and putting Python code into it; we will also cover how to make this package accessible and installable by anyone connected to the internet. The assumption we will be working under is that you are already familiar with Git, GitHub, and Python in general, to the point that if the directions simply say "create a GitHub repo for this package," you'd know what to do. The reason I'm extending the basic "how to create a package" is that without people being able to install it with something like a conda install or pip install, it's not really a deployed package, IMHO. Without further ado, let's get started.

Step 1: Create your package structure.
Now, before you jump straight to doing computer-y things, let's grab a notebook and a writing utensil of your choosing. You'd be surprised how much pain and frustration you can save yourself by just taking a step back from your computer, pulling out your notebook/sketchbook, and writing out exactly what it is you want to do and approximately how you want to do it. This is an exercise I used to really hate as a young engineer because I just wanted to start coding and doing the cool stuff. Fortunately, I had a great manager who was much wiser and more patient than I was. That was the whole reason he was where he was and I was where I was; he was an architect, and I would say I was just a hacker at that time. So take some time and really think about how you want to accomplish what you are setting out to do. Think about what the shortcomings of your architecture are, and ask some honest questions about whether this is the best you can do.

Once you have written out what you are going to do and how to do it, the next thing is to think about naming conventions. Following standards is important, and when we fail to do so, we invite chaos. The standard naming convention for Python packages is a fairly concise name, all in lowercase. For more information on Python best practices, check out PEP 8 (https://www.python.org/dev/peps/pep-0008/), which outlines the standards put out by the original creators of Python on how best to write clean Python code.

After you have your general architecture written out and a good name, it's time to get back to our computers and start doing computer-y things. Here we will start to lay out the structure of our package. (Remember, Python packages contain modules, i.e. Python files, all organized by the use of directories.) Within the package, you can have a higher-level structure of modules separated by directories.

For the demonstration, we will be creating a very sparse package called "firstpack" which will demonstrate package structure and contain some class definitions and methods that do pretty simple stuff. We won't be demonstrating anything overly fancy here; all of this is to show how easy it is to create your own Python package. All of the demo code is readily available via our GitHub page and can be found here: https://github.com/ideaGarageIO/PythonTutorials/tree/master/PackageCreation/Lesson1/FirstPack

To start, we will first create the following directory structure:
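
The original post showed the layout as an image; here is an approximate reconstruction based on the description that follows (exact file names may differ from the repo):

FirstPack/                  <- parent directory (not following PEP 8)
├── firstpack/              <- the package itself (PEP 8: concise, lowercase)
│   ├── __init__.py
│   ├── categoryX/
│   │   └── __init__.py
│   ├── classes/
│   │   └── __init__.py
│   └── methods/
│       └── __init__.py
├── docs/
├── tests/
├── setup.py                <- written in Step 3
└── FirstPack.ipynb         <- the demo notebook from the repo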

As you can see, there is a parent directory (not following PEP 8), which is just the directory in which we are building our package; a directory for the package itself that does follow PEP 8 standards (this is the thing we will be building); some directories (categoryX, classes, and methods); and then one called tests. If you pull down the code posted on GitHub, you will also have the .ipynb (IPython Notebook) file. If you don't know about Jupyter Notebooks, check out our article on the subject here.

If you chose to go the route of defining your own structure based on your earlier planning exercise, just note that the most important thing is to ensure that every directory in your package contains an __init__.py (double-underscore init double-underscore dot py). This Python file can be completely blank; it serves the purpose of indicating to the Python interpreter that the directory contains Python code. (https://docs.python.org/3/tutorial/modules.html#packages)
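
Once the __init__.py files are in place, the interpreter treats the directories as importable packages. As a quick illustration (the module name here is hypothetical, not from the actual repo):

# with firstpack/__init__.py and firstpack/classes/__init__.py in place,
# subpackages can be imported like any other module
import firstpack
from firstpack.classes import someclass  # hypothetical module name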

For now, don't worry too much about the contents of the files shown. FYI: everything in the 'docs' directory is a text file, and the MANIFEST file will be auto-generated when we build our package.

Step 2: Write your code

Now that everything is roughed out, we can start writing our code. As indicated before, there's nothing really fancy in what I have written. In the classes directory, there are a couple of Python files containing class definitions that essentially just print statements when they are initialized. Then in the methods directory we do some silly stuff. I could write out all the code here, but that would be a waste of everyone's time. If you want to see exactly what I'm doing, check out the (admittedly useless) code over at the GitHub repo: https://github.com/ideaGarageIO/PythonTutorials/tree/master/PackageCreation/Lesson1/FirstPack
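
To give a flavor without reproducing the repo, here is a minimal sketch of the kind of thing firstpack contains: a class that prints when initialized and a trivial function (file and symbol names here are illustrative, not the actual repo contents):

# firstpack/classes/greeter.py (hypothetical file)
class Greeter:
    """A toy class that announces itself when initialized."""
    def __init__(self, name="world"):
        print("Greeter initialized for {}".format(name))

# firstpack/methods/math_utils.py (hypothetical file)
def add(a, b):
    """Trivially add two numbers."""
    return a + b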

Step 3: Write your setup.py file.

Now that you have the core of your package good to go, it's time to do the work to make it installable. This is done via a Python script called setup.py. This is a basic script that imports a setup function from one of two possible packages: setuptools (the preferred packaging package within the Python community) or distutils (essentially deprecated but still part of the standard library for Python). As to why setuptools isn't part of the standard library, I don't have the slightest clue; if you know the reasoning, leave it in the comments below, because I'd love to know why.

For our purposes, we are going to use setuptools because it comes as part of the Anaconda distribution of Python. The equivalent function can otherwise be imported as: from distutils.core import setup

Because I will be using setuptools, my code will look like the following:


"""
setup.py install file for the firstpack package
"""

from setuptools import setup

setup(name='firstpack',
      version='0.0.1',
      description='How to create a python package',
      url="https://github.com/ideaGarageIO/PythonTutorials/tree/master/PackageCreation/Lesson1/FirstPack",
      author='Cameron Owens',
      author_email='[email protected]',
      license='BSD 3-Clause',
      packages=['firstpack', 'firstpack.classes', 'firstpack.methods', 'firstpack.categoryX'],
      )


Here we just import setup and then feed it some basic arguments. There are several more arguments you can pass to this function, but we will save those for a later post.

Once you have all that in place, uploading your package to PyPI for distribution becomes quite trivial. Uploading it to PyPI allows it to be distributed through pip install. The next section is based on the assumption that you have a PyPI account. If you don't, simply go to https://pypi.org/ and click on the register button in the top right. You'll have to submit some basic info like name, email, username, and password. After you do that, you'll have to verify your account.

Step 4: Upload your package

After you set up your PyPI account, we can move on to publishing this brand new package. The simplest and most straightforward method can be done with a single command:

python setup.py register sdist upload

Obviously, this should be run from the directory where your setup.py file lives in order to be able to call the script. What this does is: 1) trigger the login sequence (requires your PyPI credentials), 2) put your package into a tarball file, and then 3) upload it to PyPI.
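
Note that newer versions of PyPI have moved away from the combined register/upload flow, so if that one-liner gives you trouble, a commonly used alternative (not the flow described above) is to build the source distribution and upload it with the twine package:

# build the source distribution (creates dist/firstpack-0.0.1.tar.gz)
python setup.py sdist

# upload with twine (pip install twine); it will prompt for your PyPI credentials
twine upload dist/*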

As a wise man once said, this all made a short story long. Well, hopefully not. Hopefully this was valuable in showing how to easily create and distribute a Python package. To learn how to distribute it via conda/conda-forge, you'll have to stay tuned for our next post on the subject, coming to a browser near you soon!

Why You Need to Use Python Virtual Environments

Today we are going to talk about one of the most important, and very much overlooked, things for Python developers: the use of what are called Python virtual environments. This is a tool I didn't really learn about or find any value in until I had been working as an engineer for about two years and had done a fair amount of software development. It wasn't until one of the PyTexas conferences I attended that I realized the immense value in using Python environments. If you have heard about them but never really found a use for them, hopefully there is something of value for you in this post. If this is something completely new or foreign to you, at the end we give a brief tutorial on how to use them.

Before we can properly get started: everything I'll be showing off uses the Anaconda distribution of Python, which has its own syntax for creating Python environments. If you are not familiar with Anaconda, check out our post about why we prefer Anaconda over other distributions. Nothing shared in this post is specific to Anaconda, but some of the syntax demonstrated is specific to the Anaconda ecosystem; we'll try to point out the differences where possible. Without further ado, let's start talking about what Python environments are and why you should really be using them.

If you have ever done web development or developed software designed to run on multiple platforms, you know how important it is to have multiple test environments. For example, when developing web apps, you should be testing on multiple browsers (Chrome behaves differently than, say, Internet Exploder... er, Explorer), multiple operating systems (this should be a given), and multiple devices, to cover all the possible screen sizes and interfaces a user might use to interact with your application. Well, in Python there is something very similar, but specifically for development, called environments. For the majority of the Python world, most people will be using either virtualenv or conda to create and manage their Python environments.

The main purpose of Python virtual environments is to manage project dependencies. With a Python virtual environment, you create an isolated environment in which you can install the Python packages required for your specific project without impacting other virtual environments or the root Python path. The reasons you want this are twofold. First, it makes creating documentation for your project simple and straightforward: when you are listing your dependencies, all you have to do is list the packages in the environment associated with the project. Second, it keeps your Python system as a whole clean. This last one may just be me being a little neurotic about keeping everything super organized, but it makes it easier to sleep at night knowing exactly what I have installed on my system.

The next big reason for working with Python virtual environments is being able to quickly spin up the same Python development environment somewhere else, without going through the tedium of starting from scratch. Let's take the purely theoretical example that, as a developer, you have NEVER encountered: you are collaborating with someone on a project and you want to ensure you are using the same tools. GitHub and similar services make it really easy to share the source code you are developing, via all the features we have become accustomed to in our version control system of choice. But how do you make sure that you are both using the same packages, let alone the same versions of said packages? Enter virtual environments. Here you are able to bundle up your virtual environment and send it to a friend, colleague, or coworker to ensure that you are both using the same setup. Besides being incredibly practical for ensuring consistency, virtual environments are incredibly handy when debugging or validating something under specific circumstances. We will talk more about how to share your virtual environments in the tutorial section below.

The next really good reason to use virtual environments is to preserve known-good working environments. Fortunately for me, I have never really run into the problem of upgrading a package and having it become incompatible with much of what I'm working on. That said, I know it is a real thing, because it has happened to me in different scenarios with C++ and the like during the early days of ROS. Actually, I take that back: switching some stuff from Python 2.7 to 3.4 was a little rough, but that was to be expected, as it was a big change between the two. Anyway, little is as frustrating as having everything break down simply because you ran an update/upgrade. Isn't that the whole point of staying up to date; having stuff fixed, not broken? In an ideal world, that's how it would go. However, if a piece of code was written by a human, it has bugs in it somewhere, and finding those bugs can be done either by users or by developers. Sadly, a lot of stuff gets out into the wild without having been tested all that much. Dedicated Python virtual environments make it possible to have the best of both worlds: in one virtual environment, you can have things set up exactly as needed to run some weird piece of code that requires certain versions of Python packages, while at the same time having another virtual environment with the latest and greatest of everything.

All in all, Python virtual environments are really something you should be using if you haven't already adopted them as part of your development workflow. Pretty much whenever I start a new project that requires Python development, I create a dedicated virtual environment just for that project.

For those who are completely unfamiliar with how to use virtual environments, on to the tutorial section. Before we dive in, you'll either want to install Anaconda Python (as alluded to above) or you'll need to pip install virtualenv. For instructions on how to do either of these, check out the following:

Anaconda Python Installation

virtualenv Installation
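
As a minimal sketch of the typical conda workflow (the environment name, Python version, and package names below are placeholders):

# create an isolated environment with a specific Python version
conda create --name myproject python=3.6

# activate it (on older versions of conda: source activate myproject)
conda activate myproject

# install packages into just this environment
conda install numpy pandas

# export the environment so a collaborator can recreate it exactly
conda env export > environment.yml
conda env create -f environment.yml

The rough virtualenv equivalents are virtualenv myproject to create the environment, source myproject/bin/activate to activate it, and pip freeze > requirements.txt plus pip install -r requirements.txt to share and recreate it.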

Why Anaconda Python is my Choice

“Anaconda Python is my distribution choice of Python. There are many like it, but this one is mine.”

(See Rifleman’s Creed)

Without triggering a flame war here, I will start off by saying that there are many distributions of Python available, and all of them have their pros and cons (see the Python distribution list). That said, I personally prefer Anaconda for the reasons I'll outline below. Generally speaking, the preference of one tool over another is purely subjective. If you don't believe me, just search "Vim vs Emacs" and get lost in a debate that has been going on for literally decades over personal preference.

1 Package Management

One of the biggest reasons I love the Anaconda/conda ecosystem is its package management tool. It's fairly well curated and provides a simple way to install pre-compiled binaries of Python packages. Most everything that everyone is using these days is just a simple conda install <package name>. If you didn't catch that, conda is the package manager included when you install Anaconda Python.

2 It’s Batteries Included+

When talking about Python, a lot of people used to say it was a "batteries included" programming language, referring to the fact that Python handles things like garbage collection and dynamic typing and ships with a rich standard library. Well, Anaconda takes that to a whole new level by providing not only Python itself but also a whole lot of the Python packages used by the overwhelming majority of data scientists. A lot of the time, just installing Anaconda will give you everything you need to do what you are wanting to do, without searching for and installing the required packages.

3 Conda isn’t just a package manager

As part of the Anaconda installer, you get a really cool CLI tool called conda. As alluded to above, this tool handles package management for you using the Anaconda repos of pre-compiled binaries. But that's not all it does; it also serves as an environment manager. If you have never used virtualenv or virtualenvwrapper, this may be a little lost on you, so go do some research on Python virtual environments. There are some shortcomings with virtualenv, such as the sharing of your virtual environments. For me personally, conda environments are much easier to work with and much more feature-rich.

4 They have GUI tools to make life easier for you

Now, I generally try to avoid GUI tools where and when I can. Call me a purist, but there is something that is so much faster about using a well-documented CLI. (Emphasis on 'well documented.') However, if you are not like me, there is fortunately a GUI tool (Anaconda Navigator) for people on Windows and Mac OS X to make life a bit easier. This GUI allows you to manage all of your environments as well as the other Python-related apps that come with Anaconda.

5 Comes with Jupyter Notebooks

One of the best things that comes with Anaconda when you install it is Jupyter Notebook. Technically this should fall under the "batteries included+" section, but Jupyter Notebook isn't so much a package as it is an application. It is an immensely powerful tool that anyone writing Python code should be using to prototype their next new algorithm or to interactively munge a particular data set.


Anaconda Python, simply put, makes your life much easier.

5 Reasons You Should Be Using Jupyter Notebooks

It seems like everyone is getting into Python these days. Honestly, I can't blame them. You can do just about anything in Python. I have created GUI applications using PySide. I have written ROS code in Python. I once gave a presentation at PyTexas about how you could talk to an Arduino using nothing but Python, and you can even now put Python directly onto a microcontroller. Websites completely written in Python? Yep, that's possible, and we often use it for our clients.

As the XKCD comic says of Python, "It's so simple."

XKCD 353: Python (https://xkcd.com/353/)

But in all seriousness, Python is an incredibly powerful tool. One of the biggest questions we get from people looking to become coders, or just looking to add more tools to their toolbox, is "what is the best IDE for Python?"

That is a bit of a loaded question, because it all depends on the person and on what exactly you are doing. It's like asking what the best flavor of ice cream is; it's purely subjective. That is, unless you are talking about what pairs well with peach cobbler, which is ALWAYS Blue Bell Vanilla Bean.

As for IDEs, I personally love using Atom, Sublime, and even Visual Studio from time to time. However, when it comes to Python development, 80-90% of my time is spent using a tool called Jupyter Notebook. What are Jupyter Notebooks, you say? The short story is that it is an incredibly awesome and powerful tool that can really help improve your workflow and save you time.

For those of you unfamiliar with Jupyter Notebooks, here are my “5 Reasons You Should Be Using Jupyter Notebooks.”


1. When you are sketching out designs for that new project, you use a notebook/notepad/piece of scratch paper. Why is coding treated differently?
Jupyter Notebooks is the latest iteration of a project started by Fernando Perez called IPython. The original project was essentially aimed at providing an interactive Python interpreter that kept a history of everything that occurred. What was unique was that it wasn't just a history to slowly scroll through; it was an indexable history, meaning you could ask, say, for the result of command 8. Eventually this grew into the web browser, and it has evolved into what we know today as Jupyter Notebooks. The best way to describe the tool to someone who has never seen or used it is that it is quite literally a programmer's notebook, in which you run code in individual cells, allowing you to slowly build up your algorithm, classes, or utilities in a way that is very organic. The biggest thing I'd say is wrong with writing code directly in an IDE is the lack of interactivity. Sure, you can run things in debug mode and slowly work up from there, but that doesn't really apply when you are just prototyping things out. In Jupyter Notebooks, this is highlighted by the concept of cells: blocks of plain text, Markdown/HTML, or Python/Julia/R code that can be run individually.
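
As a quick sketch of that indexable history in an IPython session (the prompt numbers are just whatever your session assigns):

In [1]: 2 + 3
Out[1]: 5

In [2]: Out[1] * 10   # reuse the result of command 1
Out[2]: 50

In [3]: _1            # underscore shorthand for the same result
Out[3]: 5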


2. Did I Forget to Mention Inline Graphs?
When you are working in a data science position or performing data analysis, one of the most important things you have to do is convey the meaning of the data. An incredibly powerful thing about Jupyter Notebooks is the ability to combine code with images, graphics, and graphs using packages like matplotlib and plotly. There's something awesome about having a section of code that acts on something like a SQL database, munging data, and then generates a graph shown immediately underneath the code responsible for performing the operation. Great examples of data visualization within notebooks can be seen over at the Jupyter Notebooks gallery on GitHub.
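
A minimal sketch of what this looks like in a notebook cell, assuming matplotlib is installed:

# render figures inline, directly below the cell that produced them
%matplotlib inline
import matplotlib.pyplot as plt

data = [1, 4, 9, 16, 25]
plt.plot(data)
plt.title("The graph renders right under this code")
plt.show()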


3. They are Easily Shared
One of the reasons Jupyter Notebooks have become so popular, beyond those listed above, is that they are easily shared in the same state you last left them in, and can be rendered directly on services like GitHub. If you haven't taken a look at the official Jupyter Notebook gallery, go see the collection of amazing notebooks to understand what I'm talking about. There you can see notebooks in action and even run the code directly in them. Sure, you can share a piece of code very easily in the modern world via services such as GitHub, and people can get exact copies of what you have shared with the world, but what makes Jupyter Notebooks unique is being able to share them in a way that lets people look at both the code and the results of executing that code. Because of this feature, they have become increasingly popular among lecturers, professors, and students for submitting work and sharing results. If you fall into this category, you should also take a look at an amazing project that is part of the Jupyter ecosystem: nbgrader.


4. You Can Easily Build Dashboards and Widgets in Them
As more and more applications and services move to the web (yes, web apps have been the promise of developers since the late 90s), you often need to quickly prototype interfaces to arrive at the most optimal solution. Because Jupyter Notebooks support extensions and interactive widgets, they can be an amazing platform for building these tools: they are already running within a web browser, they support doing things in a mix of Python and JavaScript (JavaScript is required when you develop your own widgets), and there is already a wide range of widgets you can use as a starting point for your custom interface. For examples and documentation, head over to http://jupyter.org/widgets
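
As a small taste, the ipywidgets package lets you wire a slider to a Python function in a couple of lines; a minimal sketch, assuming ipywidgets is installed in your notebook environment:

from ipywidgets import interact

def scale(x):
    # any function of the widget value; the output re-renders on every change
    return x * 2

# renders an integer slider from 0 to 10 below the cell and calls scale() live
interact(scale, x=(0, 10))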


5. You Can Run Notebooks Remotely on Your Own Server
Because notebooks run inside a web browser, there is some dark magic required to support multiple Python kernels (these link to your Python environments). How all of that works is a bit outside the scope of this post, but the reason I bring it up is that you can run Jupyter Notebooks remotely on a dedicated server. Just as a lot of developers keep certain virtual machines or servers dedicated to development in order to keep their environments clean, you can do essentially the same thing with Jupyter Notebook. If you have run Jupyter Notebooks in the past, you were most likely using the default configuration of launching a notebook server locally. However, you can have a dedicated server managing Python environments and hosting notebooks remotely. If you have ever taken a Udacity class that used Python, this is what you were using. What makes this powerful is that it allows people with access to the notebook server to run and execute Python code without having anything installed locally, because it is all accessed over the web.
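
One common pattern for this (the user, host, and port below are placeholders): start the server headless on the remote machine, then tunnel to it over SSH and browse to localhost:

# on the remote server: run without opening a browser, on a chosen port
jupyter notebook --no-browser --port=8888

# on your local machine: forward that port, then open http://localhost:8888
ssh -N -L 8888:localhost:8888 user@remote-server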

Although this didn’t quite make it to my top 5 reasons but another reason you should take a look at using Jupyter Notebooks is because all of the cool kids are using them. Now before you start thinking that I’m advocating to do something just because others are doing it hear me out. When you are trying to learn about what the makers and shakers are working on, it is incredibly valuable to be familiar with the tools they are using when you are going through their work. I don’t know about you but I have mad respect for people like DynamicPaige, James Powell, and Jake VanDerPlas and every time I have seen them talk about what they are working on or give a presentation at a Python conference, it has always been through a Jupyter Notebook. Now you should never blindly follow anyone but it is immensely useful to be familiar with the tools and tricks people are using when you are trying to emulate and learn from them.


Now, there are a lot of people out there who really don't like Jupyter Notebooks, and I really think they are trying to use them in scenarios they were never designed for. Case in point: if you read "5 Reasons Why Jupyter Notebooks Suck" by Alexander Meuller, he talks about things like the inability to test notebooks or to use code checkers, etc. The thing I think most people miss is right there in the name: Jupyter Notebooks. When you were in school, did you ever turn in your notebook or rough drafts as the final product? I would hope not. Jupyter Notebooks are not designed for production, but for everything that leads up to writing production code.

Jupyter Notebooks are truly one of my favorite tools and I use them for all kinds of things. If you haven’t given them a try, I’d really recommend it.

For more information, take a look at the YouTube videos provided by the Software Carpentry Foundation. These videos are generally hosted on Enthought's YouTube account, as most of the presentations are given at SciPy, which is hosted by Enthought.