Notes on Working with Python Packages and Git Repos in a Cloud Account

For some Python packages, we prefer to work with the Git repos in the cloud, but there can be some issues in setting it up.

This brief article discusses some issues we have had with Python packages on cloud web servers (web hosting in a data center) where we can't become root and have to live with various site customizations. For certain Python packages, we want to work with the Git repo so that we have access to the latest release and experimental branches, and can apply patches with ease.

An example of this is the Django web framework. When we set up our web project on a particular data center Linux machine and configured it for Django, we found that the provider installed a Python egg in a library path picked up during initialization (and added to sys.path).

If you're not familiar with sys.path, it is a Python list of strings that specifies the search path for modules.

Although you can find your site package directories through site.getsitepackages() and site.getusersitepackages(), additional paths may be added by the sitecustomize module, which is imported during initialization. The Python site documentation states the module "can perform arbitrary site-specific customizations", and we have found this to be the case.
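To see what initialization actually did on a given host, a minimal sketch like the following can help. It only uses the standard site and sys modules; the helper name describe_search_path is our own, not part of any library, and it assumes an interpreter where site.getsitepackages() is available.

```python
import site
import sys


def describe_search_path():
    """Return (site_dirs, extras): the standard site directories, and the
    sys.path entries that are not among them (stdlib paths, or paths added
    by sitecustomize or .pth files)."""
    site_dirs = list(site.getsitepackages()) + [site.getusersitepackages()]
    extras = [p for p in sys.path if p and p not in site_dirs]
    return site_dirs, extras


site_dirs, extras = describe_search_path()
print("site directories:")
for path in site_dirs:
    print("   ", path)
print("other sys.path entries:")
for path in extras:
    print("   ", path)
# sitecustomize is optional; this only reports whether one was imported.
print("sitecustomize imported:", "sitecustomize" in sys.modules)
```

Entries in the second list that you do not recognize are good candidates for paths added by the provider.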

One useful option when experimenting with the interaction of the site module and related path manipulations is to invoke python with the "-S" option. From the python docs: "Disable the import of the module site and the site-dependent manipulations of sys.path that it entails. Also disable these manipulations if site is explicitly imported later."
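As a rough way to see what the site machinery contributes, the sketch below starts two child interpreters, one with "-S" and one without, and compares their sys.path. The helper name child_sys_path is hypothetical, and the sketch assumes Python 3.7 or later for subprocess.run(capture_output=True).

```python
import json
import subprocess
import sys


def child_sys_path(*flags):
    """Start a child interpreter with the given flags and return its sys.path."""
    code = "import json, sys; print(json.dumps(sys.path))"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


with_site = child_sys_path()
without_site = child_sys_path("-S")

# Entries present only when the site module runs, typically site-packages
# directories and anything added by .pth files or sitecustomize.
for entry in with_site:
    if entry not in without_site:
        print("added by site:", entry)
```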

One brute-force option when using a clone of a Git repo is to add it to the path manually through sys.path. This also provides an opportunity to show the use of "-S", as shown below:

$ python3 -S
Python 3.5.2 (default, Sep  4 2016, 00:03:21) 
>>> import sys
>>> import django
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'django'
>>> sys.path.append('/home/user/build/django-trunk')
>>> import django
>>> django.VERSION
(1, 11, 0, 'final', 1)
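Note that sys.path.append() places the clone last, so a copy of the same package installed earlier on the path (such as a provider-installed egg) would still be imported first. The sketch below illustrates the alternative of inserting at the front and then confirming where the module was loaded from. The package name demo_repo_pkg and the temporary directory are stand-ins for a real clone, not anything from the setup above.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Stand-in for a cloned repo: a temporary directory holding a tiny package.
repo_dir = Path(tempfile.mkdtemp())
pkg_dir = repo_dir / "demo_repo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("VERSION = (0, 0, 1)\n")

# Before the path change, the package cannot be found.
print("found before:", importlib.util.find_spec("demo_repo_pkg") is not None)

# Insert at the front so this copy wins over any copy later on sys.path.
sys.path.insert(0, str(repo_dir))
importlib.invalidate_caches()

module = importlib.import_module("demo_repo_pkg")
# __file__ shows which copy was actually imported.
print("loaded from:", module.__file__)
print("version:", module.VERSION)
```

Checking module.__file__ after an import is a quick way to verify that the clone, and not an installed copy, is in use.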

So assuming you have a handle on where your modules are being found and imported, let's turn to the simple task of working with a cloned Git repo instead of an installed Python package, as shown below:

$ cd /build # this is where we clone our repos
$ git clone https://github.com/django/django.git
Cloning into 'django'...
$ cd django
$ git checkout 1.11

Next, cd to a directory where your account's site packages are imported (for example, the one reported by site.getusersitepackages()) and create a django.pth file. In the file, add a single line containing the path to where you cloned the Django repo.

The django.pth file and its contents will direct Python to add the repo to sys.path during site initialization so it can be found during an import.
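The mechanism can be tried out without touching a real site directory. The sketch below is an illustration under stated assumptions: it uses two temporary directories as stand-ins for the site directory and the clone, writes a hypothetical demo.pth file, and calls site.addsitedir(), which processes ".pth" files in the directory it is given.

```python
import site
import sys
import tempfile
from pathlib import Path

# Stand-ins: one temporary directory plays the site directory, the other
# plays the cloned repo. On a real host these would be an actual site
# directory and the path of the clone.
site_dir = Path(tempfile.mkdtemp())
repo_dir = Path(tempfile.mkdtemp())

# A .pth file holds one path per line; paths that do not exist are skipped.
(site_dir / "demo.pth").write_text(str(repo_dir) + "\n")

# addsitedir() adds site_dir to sys.path and processes its .pth files.
# During normal startup the site module does this for each site directory.
site.addsitedir(str(site_dir))

print("repo on sys.path:", str(repo_dir) in sys.path)
```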

Now that you are working from a Git repo in the cloud for your Python package, you can easily update or patch it, try different branches from the remote origin, and create your own branches for experiments.

However, when updating a Python repo, we have found the need to clear the __pycache__ folders since the ".pyc" files may be stale and cause issues. For Django with Apache, we have found errors like the following after a repo update and restart:

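ImportError: bad magic number in 'django.contrib.sitemaps.models': b'\x03\xf3\r\n'

The magic number shown is the one used by Python 2.7 bytecode, so a ".pyc" file compiled by a different interpreter was left behind. We resolve it by deleting every ".pyc" file under the repo, which also empties the __pycache__ folders:

$ cd /build/django # location of our Django Git repo
$ find . -name '*.pyc' -delete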
$ find . -name '*.pyc'   
$ # they're all gone
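Where find is not convenient, roughly the same cleanup can be sketched in Python. The function name clear_bytecode is our own, and the temporary tree below is only a stand-in for a real clone. Unlike the find command above, this version also removes the __pycache__ directories themselves.

```python
import shutil
import tempfile
from pathlib import Path


def clear_bytecode(root):
    """Remove __pycache__ directories and stray *.pyc files under root.

    Returns the number of directories and files removed.
    """
    root = Path(root)
    removed = 0
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed += 1
    # Legacy .pyc files that sit next to the sources.
    for pyc in list(root.rglob("*.pyc")):
        pyc.unlink()
        removed += 1
    return removed


# Demonstration on a throwaway tree standing in for a repo.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "pkg" / "__pycache__").mkdir(parents=True)
(demo_root / "pkg" / "__pycache__" / "models.cpython-35.pyc").write_bytes(b"")
(demo_root / "pkg" / "models.pyc").write_bytes(b"")
(demo_root / "pkg" / "models.py").write_text("# source file is kept\n")

print("removed:", clear_bytecode(demo_root))
print("remaining .pyc files:", list(demo_root.rglob("*.pyc")))
```

Python recompiles the bytecode on the next import, so the only cost is a slightly slower first start.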
