{ "info": { "author": "Jillian Anderson, Joel Becker, Steve McColl, John McLevey", "author_email": "janderes@uwaterloo.ca, jbwbecker@uwaterloo.ca, s2mccoll@uwaterloo.ca", "bugtrack_url": null, "classifiers": [], "description": "Overview\r\n============\r\n\r\n``gitnet`` is a Python 3 package with tools for collecting, cleaning, and exporting datasets from local Git repositories, as well as creating network datasets and visualizations. The primary purpose of ``gitnet`` is to provide scholarly tools to study the collaboration structure of free and open source software development projects, but it may also be of use to organizations, project managers, and curious coders.\r\n\r\n``gitnet`` is currently in active development by the University of Waterloo's NetLab_. The current build offers flexible tools for working with local Git repositories. Future iterations will include support for creating networks using issue report and pull request data, tools for analyzing contributors' communication networks, reproducible data collection, and more tools for increased flexibility. If you are curious about the project, want tips on how to use ``gitnet``, have found a bug, or wish to request a feature, please feel free to email a contributor or submit an issue report.\r\n\r\n.. _NetLab: http://networkslab.org/\r\n\r\nA Quick (Meta) Example\r\n-------------------------------\r\n\r\n``gitnet`` makes it easy to collect, clean, and visualize local Git repositories. 
Here, we used it to create a network visualization of contributions to `.py` files in our Git repository.\r\n\r\n\r\n::\r\n\r\n import gitnet as gn\r\n\r\n gn_log = gn.get_log(\"Users/localpath/gitnet\")\r\n gn_log = gn_log.ignore(\"\\.py$\",ignoreif = \"no match\")\r\n\r\n gn_net = gn_log.network(\"author/file\")\r\n gn_net.quickplot(\"plot.pdf\", layout = \"spring\", colours = \"simple\")\r\n\r\nThis snippet imports ``gitnet``, creates a ``CommitLog`` from our local repository, uses a regular expression to ignore files with names that do not end with ``.py``, creates a ``MultiGraphPlus`` object using presets for a bipartite author/file network, and saves a basic visualization of the network. (By default, author nodes are coloured white and Python files are coloured light red.) The result looks like this:\r\n\r\n.. image:: resources/gitnet_plot_py.png\r\n\r\nAdditionally, you can export data retrieved by ``gitnet`` in either ``graphml`` or plaintext edgelist format. This data can then be used in the statistical programming language R to create visualizations like this one:\r\n\r\n.. image:: resources/gitnet_plot_r.png\r\n\r\nRetrieving Data\r\n---------------------------\r\n\r\nCurrently, only local Git retrieval is supported. Use the `get_log()` function to create a ``CommitLog`` object by passing a file path for the Git repository.\r\n\r\n``my_log = gn.get_log(\"Users/localpath/my_repository\")``\r\n\r\nThe Log Class\r\n-------------------\r\n\r\nThe core data class for all data collected by ``gitnet`` is a ``Log``. A ``Log`` contains a core dataset of records, attributes documenting its retrieval, and a number of methods to explore, clean, and export the data it contains. In practice, users will generally use a subclass of the ``Log`` class, with extra features appropriate for the source of their data (e.g. 
the ``Log`` subclass for Git commit data is called ``CommitLog``, and has methods for generating author-file networks, ignoring files by extension, and so on.)\r\n\r\nThe core dataset is a dictionary of dictionaries, held in `log.collection`. Every `Log` is subscriptable, so you can access individual records directly by their identifiers (e.g. their commit hash).\r\n\r\nThe basic methods available for `Log` and all its subclasses are as follows:\r\n\r\n+-----------------------+----------------------------------------------------------------------+\r\n| Method | Purpose |\r\n+=======================+======================================================================+\r\n| `.attributes()` | Produces a list of all the tags in the collection. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.describe()` | Prints a detailed, subclass-specific summary of the `Log` |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.browse()` | Interactively prints the content of each record in the collection. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.filter()` | Selectively remove records using some matching criteria. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.tsv()` | Export a tab delimited spreadsheet containing the collected data. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.df()` | Create a `Pandas` dataframe object using the collected data. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.vector()` | Create a list of all values with a specified tag. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.replace_val()` | Replace a specified tag value. 
|\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.generate_edges()` | Creates network edges by record. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.write_edges()` | Writes an edgelist (with attributes) to a file. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.generate_nodes()` | Creates a dictionary of network nodes. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.write_nodes()` | Writes a list of nodes (with attributes) to a file. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.generate_network()` | Creates a network, producing a `MultiGraphPlus` object. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n\r\nThe CommitLog Subclass\r\n-----------------------------\r\n\r\nGit commit log datasets are stored as a ``CommitLog``, which inherits all the features of a ``Log`` as well as the following methods:\r\n\r\n\r\n+-----------------------+----------------------------------------------------------------------+\r\n| Method | Purpose |\r\n+=======================+======================================================================+\r\n| `.describe()` | A `CommitLog` specific summary, which overrides `Log` describe. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.ignore()` | Removes files matching a regular expression from all records. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.network()` | Contains preset options for generating networks from a `CommitLog`. 
|\r\n+-----------------------+----------------------------------------------------------------------+\r\n\r\n\r\nThe MultiGraphPlus Class\r\n----------------------------\r\n\r\nWhen you create a network using ``gitnet``, it is represented as a ``MultiGraphPlus`` object, which is a subclass of the networkx_ class for undirected graphs with duplicate edges, the ``MultiGraph``. ``MultiGraphPlus`` inherits all the features of a ``MultiGraph``, and so can be used with all ``networkx`` functions that have ``MultiGraph`` support. In addition, ``MultiGraphPlus`` defines a number of new methods to make working with ``gitnet`` networks more convenient. The methods unique to ``MultiGraphPlus`` are:\r\n\r\n.. _networkx: https://pypi.python.org/pypi/networkx/\r\n\r\n+-----------------------+----------------------------------------------------------------------+\r\n| Method | Purpose |\r\n+=======================+======================================================================+\r\n| `.describe()` | A description of the network. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.quickplot()` | Presets for plotting networks in one line of code. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.node_attributes()` | Adds node attributes, with prebuilt or custom helper functions. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.node_merge()` | Merges two nodes. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.collapse_edges()` | Simplifies a network by merging edges which occur between node pairs.|\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.write_graphml()` | Exports the network as a GraphML file. 
|\r\n+-----------------------+----------------------------------------------------------------------+\r\n| `.write_tnet()` | Exports the network as tnet edgelist for use in R. |\r\n+-----------------------+----------------------------------------------------------------------+\r\n\r\nCustom Data Sources\r\n-------------------------\r\n\r\nIf you want to use the features of `gitnet` for an unsupported data source, it is easy to initialize a `Log` object with a custom dataset. First, convert your data into a dictionary of dictionaries, for example:\r\n\r\n::\r\n\r\n data = {\"id1\":{\"attr1\":val1,...,\"attrn\":valn},\r\n :\"idm\":{\"attr1\":val1,...,\"attrn\":valn}}\r\n\r\nThen, initialize a `Log` with the dictionary of dictionaries.\r\n\r\n::\r\n\r\n my_log = Log(data)\r\n\r\n\r\nIf you wish to request or contribute support for a new data source, please contact the developers. Further documentation can be found here_.\r\n\r\n.. _here: http://networkslab.org/gitnet/page/documentation/\r\n\r\n\r\nProject Status\r\n------------------\r\n\r\n- Gitnet is currently beta-0.1.1.\r\n\r\nTo-Do\r\n--------------\r\n\r\nAs a project in development, Gitnet will have a list of potential issues, updates, and features.\r\nAny external requests and issue reports can be made on our GitHub project page.\r\nWe appreciate any comments from developers and researchers who stumble upon our work.\r\n\r\n- Solve problems related to the pygraphviz dependency on Windows. Some users may encounter difficulty running `graph.quickplot()` as a result.\r\n - May not be possible given the general inaccessibility of the graphviz software interface.\r\n- Increase efficiency of internal log parsing. Some large projects can take up to several minutes to process.\r\n - Currently in progress. Some significant improvements have been made, but they are not yet part of any official release.\r\n- Include remote log extraction. 
One of the biggest caveats of gitnet is that you have to spend a significant amount of time downloading large projects.\r\n- Include additional export options for users of other visualization packages and for those who want to export dynamic network data.\r\n- Include additional custom classes for more VCS types and mailing lists.", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/networks-lab/gitnet", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://networkslab.org/gitnet", "keywords": "", "license": "GPL", "maintainer": "", "maintainer_email": "", "name": "gitnet", "package_url": "https://pypi.org/project/gitnet/", "platform": "", "project_url": "https://pypi.org/project/gitnet/", "project_urls": { "Download": "https://github.com/networks-lab/gitnet", "Homepage": "http://networkslab.org/gitnet" }, "release_url": "https://pypi.org/project/gitnet/0.1.1/", "requires_dist": [ "bash", "matplotlib", "networkx" ], "requires_python": "", "summary": "A data extraction and network generation tool for local git repositories.", "version": "0.1.1" }, "last_serial": 2285682, "releases": { "0.1.1": [] }, "urls": [] }