{ "info": { "author": "redhat-performance", "author_email": "browbench@redhat.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "Scribe\n======\n\nPython Library for generating docs conforming to common data model .\n\nscribe is a python library to process data collected and conform it to\ncommon data model to facilitate data indexing into elasticsearch.\nCurrently scribe only supports data collected through stockpile but\nit's very easy to integrate other data sources.\n\nIt is suggested to use a venv to install and run scribe.\n\n::\n\n python3 -m venv /path/to/new/virtual/environment\n source /path/to/new/virtual/environment/bin/activate\n pip install mutate\n # For testing purposes you can do clone the repo first\n # and then install as a local project\n git clone https://github.com/redhat-performance/scribe.git\n pip install -e scribe/\n\nNote: we're creating a python3 venv here as scribe is written in python3\nand is currently incompatible with python2\n\nUsing scribe as a python library\n--------------------------------\n\nScribe is provided by the python library mutate, which helps\nto transcribe an input document to a series of documents that conform\nto CDM. It can be done as follows.\n\n::\n\n from mutate.transcribe import scribe\n for scribe_object in scribe('/tmp/stockpile.json','stockpile'):\n print(scribe_object)\n\nContributing(create patchsets)\n------------------------------\n\nPlease visit http://gerrithub.io/ and Sign In with your github account.\nMake sure to import your ssh keys.\n\nNow, clone the github repository\n\n::\n\n $ git clone https://github.com/redhat-performance/scribe.git\n\nMake sure, you've git-review installed, following should work.\n\n::\n\n $ sudo pip install git-review\n\nTo set up your cloned repository to work with Gerrit\n\n::\n\n $ git review -s\n\nIt's suggested to create a branch to do your work, name it something\nrelated to the change you'd like to introduce.\n\n::\n\n $ git branch my_special_enhancement\n $ git checkout !$\n\nMake your changes and then commit them using the instructions below.\n\n::\n\n $ git add /path/to/files/changed\n $ git commit -m \"your commit title\"\n\nUse a descriptive commit title followed by an empty space. You should\ntype a small justification of what you are changing and why.\n\nNow you're ready to submit your changes for review:\n\n::\n\n $ git review\n\nIf you want to make another patchset from the same commit you can use\nthe amend feature after further modification and saving. Make sure to be\non same branch, and if don't have the branch please follow next set of\ninstructions\n\n::\n\n $ git add /path/to/files/changed\n $ git commit --amend\n $ git review\n\nIf you want to submit a new patchset from a different location (perhaps\non a different machine or computer for example) you can clone the repo\nagain (if it doesn't already exist) and then use git review against your\nunique Change-ID:\n\n::\n\n $ git review -d Change-Id\n\nChange-Id is the change id number as seen in Gerrit and will be\ngenerated after your first successful submission. So, in case of\nhttps://review.gerrithub.io/#/c/redhat-performance/scribe/+/425014/\n\nYou can either do git review -d 425014 as it's the number or you can do\ngit review -d If0b7b4f30615e46f009759b32a3fc533e811ebdc where\nIf0b7b4f30615e46f009759b32a3fc533e811ebdc is the change-id present\n\nMake the changes on the branch that was setup by using the git review -d\n(the name of the branch is along the lines of\nreview/username/branch\\_name/patchsetnumber).\n\nAdd the files to git and commit your changes using,\n\n::\n\n $ git commit --amend\n\nYou can edit your commit message as well in the prompt shown upon\nexecuting above command.\n\nFinally, push the patch for review using,\n\n::\n\n $ git review\n\nAdding Depends-On to commit message\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA lot of times, especially when adding a new module.. Changes made to\nscribe, will not ensure the CI to work propery until the respective\nchanges to stockpile are merged. In this case, to ensure CI doesn't work\nwith master branch of stockpile but rather the patchset you submitted to\nstockpile, You can use the depends-on functionality.\n\nTo add Depends-On functionality, please copy the Change-Id of the\npatchset you submitted to stockpile, and add it to the commit message at\nthe end like this.\n\nNote: Please add it after Change-Id in commit message.\n\nThe commit message should look like:\n\n::\n\n Your commit message\n\n Change-Id: I9bc121f076b8625da88705c9d96bd00117f94c22\n\n Depends-On: {Change-Id of the review submitted to stockpile}\n\nSay for example, you're working on adding a module to process satellite\ndata, the CI won't be able to test it because stockpile doesn't have a\nsatellite collection yet. However, because you've a commit thats still\nyet to be merged like\nhttps://review.gerrithub.io/#/c/redhat-performance/stockpile/+/425015/\n\nYou can still ensure and verify stockpile-scribe workflow by adding\nDepends-On to your commit message in scribe, so commit message will look\nlike:\n\n::\n\n Adding satellite Module to work with stockpile.\n\n Change-Id: some_random_change_id_generated_after_git_review\n\n Depends-On: I66329511b38a558ce61efb7edb4c3be18625b252\n\nNote that the change ID in Depends-On is the same one in\nhttps://review.gerrithub.io/#/c/redhat-performance/stockpile/+/425015/\n\nFor another example look at:\nhttps://review.gerrithub.io/#/c/redhat-performance/scribe/+/425969/\n\nContributing(making changes)\n----------------------------\n\nScribe package is basically made of two modules:\n\n1. scribes\n2. scribe\\_modules\n\nThese 2 modules serve different purpose, scribes are for reading the\ninput data and pre-processing them into a structure that can be used to\ncreate scribe\\_modules\n\nThe pre-processed dictionary structure can look like this:\n\n.. code:: json\n\n {\n \"scribe_module_1\": [\n {\n \"host\": \"localhost\",\n \"value\": \"sample_value_1\"\n },\n {\n \"host\": \"host1\",\n \"value\": \"sample_value_2\"\n },\n {\n \"host\": \"host2\",\n \"value\": \"sample_value_3\"\n }\n ],\n \"scribe_module_2\": [\n {\n \"host\": \"host2\",\n \"value\": {\n \"field1\": \"sample_filed1_value_3\",\n \"field2\": \"sample_field2_value_3\"\n }\n },\n {\n \"host\": \"host1\",\n \"value\": [\n {\n \"field1\": \"sample_filed1_value_1\",\n \"field2\": \"sample_field2_value_1\"\n },\n {\n \"field1\": \"sample_filed1_value_2\",\n \"field2\": \"sample_field2_value_2\"\n }\n ]\n }\n ]\n }\n\nBasically the dictionary needs to have first level keys that you've\nwritten 'scribe\\_modules', match the name of the file in\nscribe\\_modules/ . The children of each of the module in dictionary\nshould have the 2 keys - 'host' and 'value'. the value for the key\n'value' can be either a dictionary or a list of dictionary\n\nPlease note that the value for the key 'value' will be the one passed to\nthe scribe\\_modules while creating the object.\n\nSo let's take the simple example of scribe\\_module\\_2 for host2, just\none object would be created and the value passed would be\n\n.. code:: json\n\n {\n \"field1\": \"sample_filed1_value_3\",\n \"field2\": \"sample_field2_value_3\"\n }\n\nAnd like wise for host1, there will be 2 objects created.\n\nfor object 1, following value would be passed:\n\n.. code:: json\n\n {\n \"field1\": \"sample_filed1_value_1\",\n \"field2\": \"sample_field2_value_1\"\n }\n\nfor object 2, following value would be passed:\n\n.. code:: json\n\n {\n \"field1\": \"sample_filed1_value_2\",\n \"field2\": \"sample_field2_value_2\"\n }\n\nWhile for scribe\\_module\\_1 for host1, the value that will be passed\nwould be: \"sample\\_value\\_2\"\n\nAdding new scribes\n~~~~~~~~~~~~~~~~~~\n\nSteps to extend scribe to work with a new input-type 'example1' would\ninvolve:\n\n1. Creating 'example1.py' in the 'mutate/scribes/' directory. The\n sample code would look like:\n\n.. code:: python\n\n\n from . import ScribeBaseClass\n\n\n class Example1(ScribeBaseClass):\n\n def example1_build_initial_dict(self):\n output_dict = {}\n Example1_data = load_file(self._path)\n # .... some sort of data manipulation\n # .... to build the output_dict\n return output_dict\n\n def __init__(self, path=None, source_type=None):\n ScribeBaseClass.__init__(self, source_type=source_type, path=path)\n self._dict = self.example1_build_initial_dict()\n\n def emit_scribe_dict(self):\n return self._dict\n\nNote the following:\n\na) from . import ScribeBaseClass needs to be present as we are\n inheriting from the ScribeBaseClass\n\nb) class Example1(ScribeBaseClass) is where inheritance occurs, ensure\n that '(ScribeBaseClass)' is present when you write the class\n definition\n\nc) The first letter in classname must be uppercase that's how factory\n method is defined.\n\nd) The \\_\\_init\\_\\_ function first calls the parent's \\_\\_init\\_\\_\n function and passes the default arguments which are path and\n source\\_type, however more can be added. and they won't be needed to\n passed on to parent class's \\_\\_init\\_\\_ function.\n\ne) emit\\_scribe\\_dict is an abstractmethod and thus it needs to be\n defined in any other class that is written. However the method itself\n can be changed but it should return the dictionary object as\n described above.\n\n2. Add the module to choices list in scribe.py at L14, currently it\n looks like choices=['stockpile'], because at the time of creating\n this documentation only stockpile data could be transcribed using\n scribe.\n\nAdding new scribe\\_modules\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSteps to extend scribe\\_modules to work with a new module\n'scribe\\_module\\_1' would involve:\n\n1. Adding a new class 'scribe\\_module\\_1.py' to directory\n 'mutate/scribe\\_modules'. It'd probably look something like this:\n \\`\\`\\`python\n\nfrom . import ScribeModuleBaseClass\n\nclass Scribe\\_module\\_1(ScribeModuleBaseClass):\n\n::\n\n def __init__(self, input_dict=None, module_name=None, host_name=None,\n input_type=None, scribe_uuid=None):\n ScribeModuleBaseClass.__init__(self, module_name=module_name,\n input_dict=input_dict,\n host_name=host_name,\n input_type=input_type,\n scribe_uuid=scribe_uuid)\n if input_dict:\n new_dict = {}\n # ... this is where transformation occurs\n # ... can call other member functions of class\n # ... can set the entities of the class object like\n self.entity_1 = input_dict\n\n # This isn't needed here, as it's how the __iter__ function is defined\n # in the parent class and it's not an abstractmethod, so only if you'd\n # like to change how __iter__ method should work for your class, you\n # should add the following next lines.\n # Not recommended, unless you know what you're doing\n def __iter__(self):\n # ... your definition of how to make it iterable\n\n\\`\\`\\`\n\nNote the following important things:\n\na) from . import ScribeModuleBaseClass needs to be present as we are\n inheriting from the ScribeModuleBaseClass\n\nb) class Example1(ScribeModuleBaseClass) is where inheritance occurs,\n ensure that '(ScribeModuleBaseClass)' is present when you write the\n class.\n\nc) The first letter in classname must be uppercase that's how factory\n method is defined.\n\nd) The \\_\\_init\\_\\_ function first calls the parent's \\_\\_init\\_\\_\n function and passes the default arguments which are module\\_name,\n input\\_dict, host\\_name, input\\_type and scribe\\_uuid. Please note\n that no more arguments can be passed.\n\ne) setting the new entities should be done inside the \\_\\_init\\_\\_\n function only, but the user has flexibility of calling another method\n from either same class or from lib/util.py to do transformation.\n\n2. Add schema for the new class 'example1.yml' to the directory\n 'mutate/schema'. Scribe currently uses cerberus to validate the\n iterable produced by the scribe\\_modules subclass. Please look at\n http://docs.python-cerberus.org/en/stable/validation-rules.html for\n more information on how to write the schema for your class's output.\n\nNote: The name of the yml file should match that of the scribe\\_modules\nclass that you create it for. Thus, for 'example1' class the file should\nbe named 'example1.yml'\n\nData Model and ES templates\n---------------------------\n\nDirectory 'mutate/schema' will essentially contain the data model.\nWork needs to be done so that these yml files can be used to create\ntemplates for elasticsearch. It's on the line of the ViaQ's\nelasticsearch templates work.\n\nPlease refer https://github.com/ViaQ/elasticsearch-templates for more\ninfo on how templates can be created.\n\nDo note that, currently ViaQ/elasticsearch-templates doesn't support\ncreating templates from the schema files present in 'mutate/schema'\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/redhat-performance/scribe", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "mutate", "package_url": "https://pypi.org/project/mutate/", "platform": "", "project_url": "https://pypi.org/project/mutate/", "project_urls": { "Homepage": "https://github.com/redhat-performance/scribe" }, "release_url": "https://pypi.org/project/mutate/0.0.1.dev13/", "requires_dist": [ "pyyaml (>=3.12)", "argparse (>=1.4.0)", "cerberus (>=1.2)", "setuptools" ], "requires_python": "", "summary": "Data processing tool to provide a CDM adhering readily indexable JSON docs", "version": "0.0.1.dev13" }, "last_serial": 4826657, "releases": { "0.0.1.dev13": [ { "comment_text": "", "digests": { "md5": "8db6b20150f949a727b983fd3fc47402", "sha256": "f3792ee6677153d74cad5a515cd6636326fdda2a5472ed539305a74b026f63b1" }, "downloads": -1, "filename": "mutate-0.0.1.dev13-py3-none-any.whl", "has_sig": false, "md5_digest": "8db6b20150f949a727b983fd3fc47402", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 27337, "upload_time": "2019-02-15T19:14:00", "url": "https://files.pythonhosted.org/packages/f8/44/9eac04b5271530f8e80b52044cdc122dceaca54d0ee65dd917451543edaa/mutate-0.0.1.dev13-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0ce382b58a6920fcce0ff1fcdf93de79", "sha256": "c3768905871f6576872972cc7c626db231d67528b8c28b16c9b671cdbe0a91cb" }, "downloads": -1, "filename": "mutate-0.0.1.dev13.tar.gz", "has_sig": false, "md5_digest": "0ce382b58a6920fcce0ff1fcdf93de79", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20281, "upload_time": "2019-02-15T19:14:03", "url": "https://files.pythonhosted.org/packages/71/ed/abec74bfa477746cfaeaaa7a159a211190a56fbaa3599ebd82087131e986/mutate-0.0.1.dev13.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8db6b20150f949a727b983fd3fc47402", "sha256": "f3792ee6677153d74cad5a515cd6636326fdda2a5472ed539305a74b026f63b1" }, "downloads": -1, "filename": "mutate-0.0.1.dev13-py3-none-any.whl", "has_sig": false, "md5_digest": "8db6b20150f949a727b983fd3fc47402", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 27337, "upload_time": "2019-02-15T19:14:00", "url": "https://files.pythonhosted.org/packages/f8/44/9eac04b5271530f8e80b52044cdc122dceaca54d0ee65dd917451543edaa/mutate-0.0.1.dev13-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0ce382b58a6920fcce0ff1fcdf93de79", "sha256": "c3768905871f6576872972cc7c626db231d67528b8c28b16c9b671cdbe0a91cb" }, "downloads": -1, "filename": "mutate-0.0.1.dev13.tar.gz", "has_sig": false, "md5_digest": "0ce382b58a6920fcce0ff1fcdf93de79", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20281, "upload_time": "2019-02-15T19:14:03", "url": "https://files.pythonhosted.org/packages/71/ed/abec74bfa477746cfaeaaa7a159a211190a56fbaa3599ebd82087131e986/mutate-0.0.1.dev13.tar.gz" } ] }