{ "info": { "author": "Piero Molino", "author_email": "piero.molino@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "Ludwig\n======\n\nIntroduction\n------------\n\nLudwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code.\n\nAll you need to provide is a CSV file containing your data, a list of columns to use as inputs, and a list of columns to use as outputs; Ludwig will do the rest.\nSimple commands can be used to train models both locally and in a distributed way, and to use them to predict on new data.\n\nA programmatic API is also available in order to use Ludwig from your Python code.\nA suite of visualization tools allows you to analyze models' training and test performance and to compare them.\n\nLudwig is built with extensibility principles in mind and is based on data type abstractions, making it easy to add support for new data types as well as new model architectures.\n\nIt can be used by practitioners to quickly train and test deep learning models as well as by researchers to obtain strong baselines to compare against and have an experimentation setting that ensures comparability by performing standard data preprocessing and visualization.\n\nLudwig provides a set of model architectures that can be combined together to create an end-to-end model for a given use case. 
As an analogy, if deep learning libraries provide the building blocks to make your building, Ludwig provides the buildings to make your city, and you can choose among the available buildings or add your own building to the set of available ones.\n\nThe core design principles we baked into the toolbox are:\n- No coding required: no coding skills are required to train a model and use it for obtaining predictions.\n- Generality: a new data type-based approach to deep learning model design that makes the tool usable across many different use cases.\n- Flexibility: experienced users have extensive control over model building and training, while newcomers will find it easy to use.\n- Extensibility: easy to add new model architectures and new feature data types.\n- Understandability: deep learning model internals are often considered black boxes, but we provide standard visualizations to understand their performance and compare their predictions.\n- Open Source: Apache License 2.0\n\n\nInstallation\n------------\n\nLudwig's basic requirements are the following:\n\n- tensorflow\n- numpy\n- pandas\n- scipy\n- scikit-learn\n- Cython\n- h5py\n- tabulate\n- tqdm\n- PyYAML\n- absl-py\n\nLudwig has been developed and tested with Python 3 in mind.\nIf you don\u2019t have Python 3 installed, install it by running:\n\n```\nsudo apt install python3 # on ubuntu\nbrew install python3 # on mac\n```\n\nYou may want to use a virtual environment to maintain an isolated [Python environment](https://docs.python-guide.org/dev/virtualenvs/).\n\n```\nvirtualenv -p python3 venv\n```\n\nIn order to install Ludwig just run:\n\n```\npip install ludwig\n```\n\nor install it by building the source code from the repository:\n\n```\ngit clone git@github.com:uber/ludwig.git\ncd ludwig\nvirtualenv -p python3 venv\nsource venv/bin/activate\npip install -r requirements.txt\npython setup.py install\n```\n\nThis will install only Ludwig's basic requirements; different feature types require different 
dependencies.\nWe divided them as different extras so that users could install only the ones they actually need.\n\nText features extra packages can be installed with `pip install ludwig[text]` and include:\n\n- spacy\n- bert-tensorflow\n\nIf you intend to use text features and want to use [spaCy](http://spacy.io) based language tokenizers, install language specific models with:\n```\npython -m spacy download \n```\nMore details in the [User Guide](https://uber.github.io/ludwig/user_guide/#spacy-based-word-format-options).\n\nImage features extra packages can be installed with `pip install ludwig[image]` and include:\n\n- scikit-image\n\nAudio features extra packages can be installed with `pip install ludwig[audio]` and include:\n\n- soundfile\n\nVisualization extra packages can be installed with `pip install ludwig[viz]` and include:\n\n- matplotlib\n- seaborn\n\nModel serving extra packages can be installed with `pip install ludwig[serve]` and include:\n\n- fastapi\n- uvicorn\n- pydantic\n- python-multipart\n\nAny combination of extra packages can be installed at the same time with `pip install ludwig[extra1,extra2,...]`, for instance `pip install ludwig[text,viz]`.\nThe full set of dependencies can be installed with `pip install ludwig[full]`.\n\nBeware that in the `requirements.txt` file the `tensorflow` package is the regular one, not the GPU enabled one.\nTo install the GPU enabled one, uninstall `tensorflow` and replace it with `tensorflow-gpu` after installation.\n\nIf you want to train Ludwig models in a distributed way, you need to also install the `horovod` and the `mpi4py` packages.\nPlease follow the instructions on [Horovod's repository](https://github.com/uber/horovod) to install it.\n\n\nBasic Principles\n----------------\n\nLudwig provides two main functionalities: training models and using them to predict.\nIt is based on datatype abstraction, so that the same data preprocessing and postprocessing will be performed on different datasets that 
share data types and the same encoding and decoding models developed for one task can be reused for different tasks.\n\nTraining a model in Ludwig is pretty straightforward: you provide a CSV dataset and a model definition YAML file.\n\nThe model definition contains a list of input features and output features. All you have to do is specify the names of the columns in the CSV that are inputs to your model along with their datatypes, and the names of the columns in the CSV that will be outputs, that is, the target variables the model will learn to predict.\nLudwig will compose a deep learning model accordingly and train it for you.\n\nCurrently the available datatypes in Ludwig are:\n\n- binary\n- numerical\n- category\n- set\n- bag\n- sequence\n- text\n- timeseries\n- image\n- audio\n- date\n- h3\n- vector\n\nThe model definition can contain additional information, in particular how to preprocess each column in the CSV, which encoder and decoder to use for each one, feature hyperparameters and training parameters.\nThis allows ease of use for novices and flexibility for experts.\n\n\nTraining\n--------\n\nFor example, given a text classification dataset like the following:\n\n| doc_text | class |\n|---------------------------------------|----------|\n| Former president Barack Obama ... | politics |\n| Juventus hired Cristiano Ronaldo ... | sport |\n| LeBron James joins the Lakers ... | sport |\n| ... | ... 
|\n\nyou want to learn a model that uses the content of the `doc_text` column as input to predict the values in the `class` column.\nYou can use the following model definition:\n\n```yaml\n{input_features: [{name: doc_text, type: text}], output_features: [{name: class, type: category}]}\n```\n\nand start the training by typing the following command in your console:\n\n```\nludwig train --data_csv path/to/file.csv --model_definition \"{input_features: [{name: doc_text, type: text}], output_features: [{name: class, type: category}]}\"\n```\n\nwhere `path/to/file.csv` is the path to a UTF-8 encoded CSV file containing the dataset in the previous table.\nLudwig will perform a random split of the data, preprocess it, build a WordCNN model (the default for text features) that decodes output classes through a softmax classifier, and train the model on the training set until the accuracy on the validation set stops improving.\nTraining progress will be displayed in the console, but TensorBoard can also be used.\n\nIf you prefer to use an RNN encoder and increase the number of epochs you want the model to train for, all you have to do is to change the model definition to:\n\n```yaml\n{input_features: [{name: doc_text, type: text, encoder: rnn}], output_features: [{name: class, type: category}], training: {epochs: 50}}\n```\n\nRefer to the [User Guide](https://uber.github.io/ludwig/user_guide/) to find out all the options available to you in the model definition and take a look at the [Examples](https://uber.github.io/ludwig/examples/) to see how you can use Ludwig for several different tasks.\n\nAfter training, Ludwig will create a directory under `results` containing the trained model with its hyperparameters and summary statistics of the training process.\nYou can visualize them using one of the several visualization options available in the `visualize` tool, for instance:\n\n```\nludwig visualize --visualization learning_curves --training_statistics 
path/to/training_statistics.json\n```\n\nThe command will display a graph that looks like the following, where you can see loss and accuracy as functions of the training iteration number:\n\n![Learning Curves](docs/images/getting_started_learning_curves.png \"Learning Curves\")\n\nSeveral visualizations are available; please refer to [Visualizations](https://uber.github.io/ludwig/user_guide/#visualizations) for more details.\n\n\nDistributed Training\n--------------------\n\nYou can distribute the training of your models using [Horovod](https://github.com/uber/horovod), which allows you to train on a single machine with multiple GPUs as well as on multiple machines with multiple GPUs.\nRefer to the [User Guide](https://uber.github.io/ludwig/user_guide/#distributed-training) for more details.\n\n\nPredict\n-------\n\nIf you have new data and you want your previously trained model to predict target output values, you can type the following command in your console:\n\n```\nludwig predict --data_csv path/to/data.csv --model_path /path/to/model\n```\n\nRunning this command will return model predictions and some test performance statistics if the dataset contains ground truth information to compare to.\nThose can be visualized by the `visualize` tool, which can also be used to compare the performance and predictions of different models, for instance:\n\n```\nludwig visualize --visualization compare_performance --test_statistics path/to/test_statistics_model_1.json path/to/test_statistics_model_2.json\n```\n\nwill return a bar plot comparing the models on different measures:\n\n![Performance Comparison](docs/images/compare_performance.png \"Performance Comparison\")\n\nA handy `ludwig experiment` command that performs training and prediction one after the other is also available.\n\n\nProgrammatic API\n----------------\n\nLudwig also provides a simple programmatic API that allows you to train or load a model and use it to obtain predictions on new data:\n\n```python\nfrom ludwig.api 
import LudwigModel\n\n# train a model\nmodel_definition = {...}\nmodel = LudwigModel(model_definition)\ntrain_stats = model.train(training_dataframe)\n\n# or load a model\nmodel = LudwigModel.load(model_path)\n\n# obtain predictions\npredictions = model.predict(test_dataframe)\n\nmodel.close()\n```\n\n`model_definition` is a dictionary containing the same information as the YAML file.\nMore details are provided in the [User Guide](https://uber.github.io/ludwig/user_guide/) and in the [API documentation](https://uber.github.io/ludwig/api/).\n\n\nExtensibility\n-------------\n\nLudwig is built from the ground up with extensibility in mind.\nIt is easy to add an additional datatype that is not currently supported by adding a datatype-specific implementation of abstract classes which contain functions to preprocess the data, encode it, and decode it.\n\nFurthermore, new models, with their own specific hyperparameters, can be easily added by implementing a class that accepts tensors (of a specific rank, depending on the datatype) as inputs and provides tensors as output.\nThis encourages reuse and sharing new models with the community.\nRefer to the [Developer Guide](https://uber.github.io/ludwig/developer_guide/) for further details.\n\n\nFull documentation\n------------------\n\nYou can find the full documentation [here](http://uber.github.io/ludwig/).", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://ludwig.ai", "keywords": "ludwig deep learning deep_learning machine machine_learning natural language processing computer vision", "license": "Apache 2.0", "maintainer": "", "maintainer_email": "", "name": "ludwig", "package_url": "https://pypi.org/project/ludwig/", "platform": "", "project_url": "https://pypi.org/project/ludwig/", "project_urls": { "Homepage": "https://ludwig.ai" }, "release_url": "https://pypi.org/project/ludwig/0.2.1/", 
"requires_dist": null, "requires_python": ">=3", "summary": "A deep learning experimentation toolbox", "version": "0.2.1" }, "last_serial": 5965847, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "68a1123334315c02490234114dd479d9", "sha256": "c7ebd8463d4142bcbfe5375fb65dfa8f3c66f6142c6bc43b2ea98db1bc1035e5" }, "downloads": -1, "filename": "ludwig-0.1.0.tar.gz", "has_sig": false, "md5_digest": "68a1123334315c02490234114dd479d9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 121114, "upload_time": "2019-02-11T03:39:13", "url": "https://files.pythonhosted.org/packages/7d/13/6ed1b17cfeb19fe89c1e8f622d67b97228bd870caf97e7ee9074e083ce67/ludwig-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "d1a361eb153b23db2fa2ac20028bb885", "sha256": "fc75ee19331cae770be47dd09a9fbfc8b30c03b759a4b5ba225b9339e647c6e3" }, "downloads": -1, "filename": "ludwig-0.1.1.tar.gz", "has_sig": false, "md5_digest": "d1a361eb153b23db2fa2ac20028bb885", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 129955, "upload_time": "2019-04-09T06:04:14", "url": "https://files.pythonhosted.org/packages/cd/a2/9f7f1952398e5aeb2f39579616fab8c3fada84a956ba6c855e6bc30a99f1/ludwig-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "4774905b9d2844fa6c4c77698bca0afa", "sha256": "398b0489bb6e558467dea5aa98043ebe7452a2c1520e4bcaf657c636905004ab" }, "downloads": -1, "filename": "ludwig-0.1.2.tar.gz", "has_sig": false, "md5_digest": "4774905b9d2844fa6c4c77698bca0afa", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 134062, "upload_time": "2019-04-30T07:32:38", "url": "https://files.pythonhosted.org/packages/18/22/c16ee80fb46e50e11b01c233e5440697ae9f02f99b950411151e1b5efbe6/ludwig-0.1.2.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "ec59d44f3d9e414a6da80adafdf236eb", "sha256": 
"8672d2914d9fae27ed5c132d8f7e99538b87089a111350ccc72cac31fd9d0c0e" }, "downloads": -1, "filename": "ludwig-0.2.tar.gz", "has_sig": false, "md5_digest": "ec59d44f3d9e414a6da80adafdf236eb", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 157266, "upload_time": "2019-07-24T07:34:00", "url": "https://files.pythonhosted.org/packages/41/d2/df3890a7582c7fa8642c0d7fbf1b2de55123b7d008d172a77f10fba1ecb6/ludwig-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "c361c7881572ce6b39bc58c7954e9535", "sha256": "e32ddb7ee43237ae8f0027e7d10ede539bab94e2df49883fd0c7a267b409698e" }, "downloads": -1, "filename": "ludwig-0.2.1.tar.gz", "has_sig": false, "md5_digest": "c361c7881572ce6b39bc58c7954e9535", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 164474, "upload_time": "2019-10-12T23:59:51", "url": "https://files.pythonhosted.org/packages/9c/59/59a658a5846cda34fc3832992e28e65d727c54a576ff48bcb3c90c3072e8/ludwig-0.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c361c7881572ce6b39bc58c7954e9535", "sha256": "e32ddb7ee43237ae8f0027e7d10ede539bab94e2df49883fd0c7a267b409698e" }, "downloads": -1, "filename": "ludwig-0.2.1.tar.gz", "has_sig": false, "md5_digest": "c361c7881572ce6b39bc58c7954e9535", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 164474, "upload_time": "2019-10-12T23:59:51", "url": "https://files.pythonhosted.org/packages/9c/59/59a658a5846cda34fc3832992e28e65d727c54a576ff48bcb3c90c3072e8/ludwig-0.2.1.tar.gz" } ] }