Metadata-Version: 1.0
Name: authorid
Version: 0.0.1
Summary: A utility to determine the author of a document using linguistic analysis
Home-page: UNKNOWN
Author: Manan Shah
Author-email: manan.shah.777@gmail.com
License: MIT
Description: Description
        ===========
        
        A package that uses linguistic analysis in order to determine the author of a document. Currently version 0.0.1.
        
        Installation
        ------------
        
        1. Install the package from the PyPi homepage (run `setup.py install`) 
        2. Import the package in your project (`from authorid import authorid`)
        3. You're good to go! You can now run individual functions (i.e. `authorid.run()`) by calling the package name. 
        
        Usage
        -----
        
        `authorid` is a package meant to help analyze linguistic features of files and determine their potential author, provided a list of attributes. 
        In order to invoke the primary utility, simply run `authorid.run()` and you will be prompted for a file containing the mystery text. After analysis
        is complete, the program will print a signature for that file, and prompt for a directory where `.stats` files are stored. 
        
        If this is the first time you are running `authorid`, exit out of the main program now, and copy the signature list to another file, ending with the 
        signature `.stats`. A sample .stats file may look like the following: 
        
        ``
        first last
        4.41553119311
        0.0563451817574
        0.02229943808
        16.8869087498
        2.54817097682
        ``
        
        Remember to order the information correctly in order to ensure optimal results. Complete this step for various files, and when you have a directory containing
        your made `.stats` files (this process will be automated in `0.0.2`), run `authorid.run()` once more, this time with a mystery file and providing the directory with
        your STATS files. The program will compare signatures with those in the list and provide a "best match" author.
        
        Other utility functions are also available, which are listed below (also open sourced on GitHub):
        
        ``
        def clean_up(s)
        def average_word_length(text)
        def type_token_ratio(text)
        def hapax_legomana_ratio(text)
        def split_on_separators(original, separators)
        def average_sentence_length(text)
        def avg_sentence_complexity(text)
        def get_valid_filename(prompt)
        def read_directory_name(prompt)
        def compare_signatures(sig1, sig2, weight)
        def read_signature(filename)
        def run()
        ``
        
        Note that `text` is a list of strings. 
Keywords: linguistics author finding
Platform: UNKNOWN
