Contributing to the source code
Which files do I have to focus on?
Modifying the source code can seem daunting at first: this repo contains a lot of directories, subdirectories and files. Don't fear: this section will clarify which are the "important" files, i.e. the few files that actually do something — most of the files in here exist just because they're required by Django, or because they are static files needed by the HTML interface.
So here are the files that actually count:
- DBNsim/DBNtrain/views.py is the Python interface between a client and the server; this file, along with dbntrain.js, is the one you will be working on most of the time.
- DBNsim/DBNtrain/templates/DBNtrain/index.html is the Django HTML template that forms the main interface.
- DBNsim/DBNtrain/static/dbntrain.js is the JavaScript source code that brings to life the HTML interface; it contains several JavaScript functions that are event-oriented (but not object-oriented at all).
- DBNsim/DBNtrain/static/main.css is (currently) the only CSS stylesheet that affects the HTML interface.
- DBNsim/DBNlogic/nets.py is the Python module that manages the network classes (currently DBN and RBM).
- DBNsim/DBNlogic/sets.py is the Python module that manages the available training datasets.
- DBNsim/DBNlogic/train.py is the Python module where strategy objects for training the networks reside.
- DBNsim/DBNlogic/util.py is a Python module that stores useful functions needed by the other modules.
Structure of the repository
To have an idea of the big picture, the repo is structured like this. The DBNsim directory contains the source code, while the docs directory contains all the documentation needed by the users, administrators and developers. (By "users" we mean the end users of the web app, while the administrators are users of the server (those serving DBNsim from a machine) and developers maintain the source code.)
The DBNsim directory contains two main subdirectories, which are actually Python packages (they have a __init__.py file that marks them as packages). They are:
- DBNlogic, the business logic of the application;
- DBNtrain, the application logic that makes DBNsim a web app.
More specifically, DBNlogic is a Python package that is totally independent from the rest of the application: you can use it as a command line library for handling and training DBNs, if you like. DBNtrain uses DBNlogic for satisfying the clients' requests and is itself a Python package.
DBNtrain is centered around the views module, where the developer has to define a function for each request (URL) that the server can accept; adding a function to views.py requires adding a line in urls.py, so that Django knows that a particular URL has to be mapped to this new Python function.
Things to do
If you want some ideas about how to improve DBNsim, here are some:
- Configure DBNsim to run on a production-stable WSGI server. The default Django development server is not meant for production: it has not been through security audits or performance tests.
- Find a more convenient way to manage the jobs run by clients on the server. Currently, DBNsim requires that each client be assigned a random 10-character key, so that the server can update the right training job when the client asks (for example) to perform one training epoch of a job. A much more professional way to do this would be to use WebSockets, for example.
- Tidy the JavaScript source code, maybe distributing it across two or three distinct JS modules.
- Where possible, replace jQuery with vanilla JavaScript, as jQuery was used only to quickly prototype the application but tends to clutter the UI, slowing down the app on older browsers.
- Improve the use of Cytoscape.js, maybe adding some calls to cy.batch instead of manually iterating through a large number of nodes and edges. Currently, the graph drawing efficiency is reasonable but can definitely be improved.
- Add some more consistency checks to the JS code that handles the HTML forms and the buttons. I set up Travis CI to systematically test the Python back end, but there are no automatic tests for the front end, so it isn't possible to systematically catch hidden bugs.
- Repair the "upload DBN" button. Currently, it is only possible to download a DBN, not to upload one.
- Remove Gnumpy and try to use another library for GPU computing, as Gnumpy only works for Python 2. After this, upgrading DBNsim to Python 3 should be easy.