cranial/examples/README.md
To get the data for these examples, go to https://www.kaggle.com/snapcrack/all-the-news/home
cranial/examples/gensim_lda.ipynb
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Train an LDA model\n", "\n", "- download data from ..\n", "- use spacy to tokenize and leave only nouns\n", "- train a gensim dictionary\n", "- train gensim LDA" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import csv\n", "csv.field_size_limit(100000000)\n", "\n", "import glob\n", "import os\n", "import sys\n", "from toolz.functoolz import compose" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# turn on logging to see progress\n", "os.environ['CRANIAL_LOGLEVEL'] = \"INFO\"\n", "\n", "from cranial.re_iter import ReMap, ReChain, ReFilter, Progress, ReBatch, DiskCache, ReZip\n", "from cranial.models.spacy_tokenizers import SpacyWrapper\n", "from cranial.models.gensim_models import GensimDictionary, GensimTFIDF, GensimLDA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get the list of data files" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['data/articles1.csv', 'data/articles2.csv', 'data/articles3.csv']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "files = glob.glob('data/*.csv')\n", "files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check the header" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['', 'id', 'title', 'publication', 'author', 'date', 'year', 'month', 'url', 'content']\n" ] } ], "source": [ "with open(files[0]) as f:\n", "    print(f.readline().strip().split(','))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define helper functions" ] }, { "cell_type": "code", 
"execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def read_csv(fname):\n", " \"\"\"Read a csv file and output each row as a dictionary\"\"\"\n", " with open(fname) as f:\n", " reader = csv.reader(f)\n", " header = next(reader)\n", " for line in reader:\n", " yield dict(zip(header, line))\n", " \n", "def to_tokens_list(doc):\n", " \"\"\"Take only nouns, remove stop words, and lemmatize\"\"\"\n", " return [t.lemma_ for t in doc if t.pos_ == 'NOUN' and not t.is_stop]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Instantiate spacy model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start spacy model with in and out fields defined since each data point is a dictionary and we need to tokenize only text in \"content\" field.\n", "\n", "Alternatively, if each data point was a text, then in and out fields could be left as None.\n", "```python\n", "spacy_tokenizer = SpacyWrapper(lang='en', batch_size=1000)\n", "```" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-07-05T17:31:52PDT - spacy_tokenizers.py - INFO - loading spacy...\n" ] } ], "source": [ "spacy_tokenizer = SpacyWrapper(lang='en', in_field='content', out_field='doc', batch_size=1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define transformations of iterators" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# file names tranformed into iterators of rows in each file\n", "out = ReMap(read_csv, files)\n", "\n", "# all individual rows iterators are chained together\n", "records = ReChain(out, name='chain rows from files')\n", "\n", "# spacy creates a 'doc' key in each tranformed row wich containes spacy-parsed document\n", "out = spacy_tokenizer.itransform(records)\n", "\n", "# print out how many rows has been tranformed\n", "out = Progress(out, max_period=5000, name='OUT')" ] }, { "cell_type": "code", 
"execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# convert into a list of tokens\n", "tokens = ReMap(lambda rec: to_tokens_list(rec['doc']), out)\n", "\n", "# store each row to disk to avoid upstream re-runs (spacy is computationally expensive)\n", "tokens = DiskCache(tokens)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Instantiate gensim dictionary and train it" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-07-05T17:31:53PDT - gensim_models.py - INFO - Init gensim dictionary with params:\n", "{'no_below_raw': 0, 'no_above_raw': 1.0, 'max_n_raw': 100000, 'no_below': 10, 'no_above': 0.1, 'max_n': 10000, 'dict_filter_every': 50000}\n", "2018-07-05T17:31:53PDT - gensim_models.py - INFO - Building gensim dictionary...\n", "2018-07-05T17:31:53PDT - re_iter.py - INFO - Disk Cache:\tStart iter number 1\n", "2018-07-05T17:31:53PDT - re_iter.py - INFO - Disk Cache:\tSaving iterable to 4476141b-8009-4d46-a7c7-d8707b251d1c\n", "2018-07-05T17:31:53PDT - re_iter.py - INFO - reMap:\tStart iter number 1\n", "2018-07-05T17:31:53PDT - re_iter.py - INFO - OUT:\tStart iter number 1\n", "2018-07-05T17:31:53PDT - re_iter.py - INFO - reGenerate:\tStart iter number 1\n", "2018-07-05T17:31:53PDT - re_iter.py - INFO - chain rows from files:\tStart iter number 1\n", "2018-07-05T17:31:53PDT - re_iter.py - INFO - reMap:\tStart iter number 1\n", "2018-07-05T17:31:53PDT - re_iter.py - INFO - chain rows from files:\tStart iter number 2\n", "2018-07-05T17:31:53PDT - re_iter.py - INFO - reMap:\tStart iter number 2\n", "2018-07-05T17:35:44PDT - re_iter.py - INFO - OUT yielded 1 items\n", "2018-07-05T17:35:44PDT - re_iter.py - INFO - OUT yielded 2 items\n", "2018-07-05T17:35:44PDT - re_iter.py - INFO - OUT yielded 5 items\n", "2018-07-05T17:35:44PDT - re_iter.py - INFO - OUT 
yielded 10 items\n", "2018-07-05T17:35:44PDT - re_iter.py - INFO - OUT yielded 20 items\n", "2018-07-05T17:35:44PDT - re_iter.py - INFO - OUT yielded 50 items\n", "2018-07-05T17:35:45PDT - re_iter.py - INFO - OUT yielded 100 items\n", "2018-07-05T17:35:45PDT - re_iter.py - INFO - OUT yielded 200 items\n", "2018-07-05T17:35:45PDT - re_iter.py - INFO - OUT yielded 500 items\n", "2018-07-05T17:35:45PDT - re_iter.py - INFO - OUT yielded 1000 items\n", "2018-07-05T17:39:43PDT - re_iter.py - INFO - OUT yielded 2000 items\n", "2018-07-05T17:51:57PDT - re_iter.py - INFO - OUT yielded 5000 items.\tspeed now 4.15\tEMA speed 4.15\n", "2018-07-05T18:07:02PDT - re_iter.py - INFO - OUT yielded 10000 items.\tspeed now 5.52\tEMA speed 4.29\n", "2018-07-05T18:15:17PDT - re_iter.py - INFO - OUT yielded 15000 items.\tspeed now 10.09\tEMA speed 4.87\n", "2018-07-05T18:24:00PDT - re_iter.py - INFO - OUT yielded 20000 items.\tspeed now 9.57\tEMA speed 5.34\n", "2018-07-05T18:32:45PDT - re_iter.py - INFO - OUT yielded 25000 items.\tspeed now 9.52\tEMA speed 5.76\n", "2018-07-05T18:41:40PDT - re_iter.py - INFO - OUT yielded 30000 items.\tspeed now 9.35\tEMA speed 6.12\n", "2018-07-05T18:53:11PDT - re_iter.py - INFO - OUT yielded 35000 items.\tspeed now 7.24\tEMA speed 6.23\n", "2018-07-05T19:05:22PDT - re_iter.py - INFO - OUT yielded 40000 items.\tspeed now 6.84\tEMA speed 6.29\n", "2018-07-05T19:16:09PDT - re_iter.py - INFO - OUT yielded 45000 items.\tspeed now 7.72\tEMA speed 6.43\n", "2018-07-05T19:24:11PDT - re_iter.py - INFO - OUT yielded 50000 items.\tspeed now 10.38\tEMA speed 6.83\n", "2018-07-05T19:29:05PDT - gensim_models.py - INFO - Current dictionary: Dictionary(58696 unique tokens: ['access', 'administration', 'advocate', 'ally', 'appeal']...)\n", "2018-07-05T19:29:05PDT - gensim_models.py - INFO - Filtering at 50000 documents\n", "2018-07-05T19:29:06PDT - gensim_models.py - INFO - Now dictionary: Dictionary(58696 unique tokens: ['access', 'administration', 'advocate', 
'ally', 'appeal']...)\n", "2018-07-05T19:49:11PDT - re_iter.py - INFO - OUT yielded 55000 items.\tspeed now 3.33\tEMA speed 6.48\n", "2018-07-05T20:02:56PDT - re_iter.py - INFO - OUT yielded 60000 items.\tspeed now 6.07\tEMA speed 6.44\n", "2018-07-05T20:10:20PDT - re_iter.py - INFO - OUT yielded 65000 items.\tspeed now 11.26\tEMA speed 6.92\n", "2018-07-05T20:21:32PDT - re_iter.py - INFO - OUT yielded 70000 items.\tspeed now 7.43\tEMA speed 6.97\n", "2018-07-05T20:37:15PDT - re_iter.py - INFO - OUT yielded 75000 items.\tspeed now 5.31\tEMA speed 6.80\n", "2018-07-05T20:49:12PDT - re_iter.py - INFO - OUT yielded 80000 items.\tspeed now 6.97\tEMA speed 6.82\n", "2018-07-05T20:56:43PDT - re_iter.py - INFO - OUT yielded 85000 items.\tspeed now 11.10\tEMA speed 7.25\n", "2018-07-05T21:04:20PDT - re_iter.py - INFO - OUT yielded 90000 items.\tspeed now 10.93\tEMA speed 7.62\n", "2018-07-05T21:12:12PDT - re_iter.py - INFO - OUT yielded 95000 items.\tspeed now 10.60\tEMA speed 7.91\n", "2018-07-05T21:27:23PDT - re_iter.py - INFO - OUT yielded 100000 items.\tspeed now 5.49\tEMA speed 7.67\n", "2018-07-05T21:30:26PDT - gensim_models.py - INFO - Current dictionary: Dictionary(88609 unique tokens: ['access', 'administration', 'advocate', 'ally', 'appeal']...)\n", "2018-07-05T21:30:26PDT - gensim_models.py - INFO - Filtering at 100000 documents\n", "2018-07-05T21:30:27PDT - gensim_models.py - INFO - Now dictionary: Dictionary(88609 unique tokens: ['access', 'administration', 'advocate', 'ally', 'appeal']...)\n", "2018-07-05T21:42:11PDT - re_iter.py - INFO - OUT yielded 105000 items.\tspeed now 5.62\tEMA speed 7.47\n", "2018-07-05T21:55:11PDT - re_iter.py - INFO - OUT yielded 110000 items.\tspeed now 6.41\tEMA speed 7.36\n", "2018-07-05T22:09:12PDT - re_iter.py - INFO - OUT yielded 115000 items.\tspeed now 5.95\tEMA speed 7.22\n", "2018-07-05T22:20:53PDT - re_iter.py - INFO - OUT yielded 120000 items.\tspeed now 7.13\tEMA speed 7.21\n", "2018-07-05T22:32:37PDT - re_iter.py - 
INFO - OUT yielded 125000 items.\tspeed now 7.10\tEMA speed 7.20\n", "2018-07-05T22:51:52PDT - re_iter.py - INFO - OUT yielded 130000 items.\tspeed now 4.33\tEMA speed 6.91\n", "2018-07-05T23:12:42PDT - re_iter.py - INFO - OUT yielded 135000 items.\tspeed now 4.00\tEMA speed 6.62\n", "2018-07-05T23:30:24PDT - re_iter.py - INFO - OUT yielded 140000 items.\tspeed now 4.71\tEMA speed 6.43\n", "2018-07-05T23:37:51PDT - re_iter.py - INFO - reMap:\tFinished iter number 2\ttotal items: 5\ttotal time: 21958.2 sec\n", "2018-07-05T23:37:51PDT - re_iter.py - INFO - chain rows from files:\tFinished iter number 2\ttotal items: 284570\ttotal time: 21958.2 sec\n", "2018-07-05T23:39:38PDT - re_iter.py - INFO - reMap:\tFinished iter number 2\ttotal items: 6\ttotal time: 22064.3 sec\n", "2018-07-05T23:39:38PDT - re_iter.py - INFO - chain rows from files:\tFinished iter number 2\ttotal items: 285140\ttotal time: 22064.3 sec\n", "2018-07-05T23:39:38PDT - re_iter.py - INFO - reGenerate:\tFinished iter number 1\ttotal items: 142570\ttotal time: 22064.4 sec\n", "2018-07-05T23:39:38PDT - re_iter.py - INFO - OUT:\tFinished iter number 1\ttotal items: 142570\ttotal time: 22064.4 sec\n", "2018-07-05T23:39:38PDT - re_iter.py - INFO - reMap:\tFinished iter number 1\ttotal items: 142570\ttotal time: 22064.4 sec\n", "2018-07-05T23:39:38PDT - re_iter.py - INFO - Disk Cache:\tSaved iterable to 4476141b-8009-4d46-a7c7-d8707b251d1c, size 219,687,347\n", "2018-07-05T23:39:38PDT - re_iter.py - INFO - Disk Cache:\tFinished iter number 1\ttotal items: 142570\ttotal time: 22064.4 sec\n", "2018-07-05T23:39:39PDT - gensim_models.py - INFO - Final raw dictionary: Dictionary(100000 unique tokens: ['access', 'administration', 'advocate', 'ally', 'appeal']...)\n", "2018-07-05T23:39:39PDT - gensim_models.py - INFO - Final dictionary: Dictionary(10000 unique tokens: ['access', 'advocate', 'ally', 'appeal', 'appropriation']...)\n" ] } ], "source": [ "gensim_dict = GensimDictionary({\n", " 'no_below_raw': 0,\n", " 
'no_above_raw': 1.,\n", " 'max_n_raw': 100000,\n", " 'no_below': 10,\n", " 'no_above': 0.1,\n", " 'max_n': 10000,\n", " 'dict_filter_every': 50000,\n", "})\n", "\n", "gensim_dict = gensim_dict.train(tokens)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Convert tokens into Bag-of-Words representation" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "bow = gensim_dict.itransform(tokens)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Instantiate and train gensim LDA model" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-07-05T23:39:39PDT - gensim_models.py - INFO - Init gensim LDA with params:\n", "{'num_topics': 100}\n", "2018-07-05T23:39:39PDT - re_iter.py - INFO - GensimDictionary:\tStart iter number 1\n", "2018-07-05T23:39:39PDT - re_iter.py - INFO - Disk Cache:\tStart iter number 2\n", "2018-07-05T23:39:39PDT - re_iter.py - INFO - Disk Cache:\tReading saved iterable from 4476141b-8009-4d46-a7c7-d8707b251d1c\n", "2018-07-05T23:39:57PDT - re_iter.py - INFO - Disk Cache:\tFinished iter number 2\ttotal items: 142570\ttotal time: 17.6 sec\n", "2018-07-05T23:39:57PDT - re_iter.py - INFO - GensimDictionary:\tFinished iter number 1\ttotal items: 142570\ttotal time: 17.6 sec\n", "2018-07-05T23:39:57PDT - re_iter.py - INFO - GensimDictionary:\tStart iter number 2\n", "2018-07-05T23:39:57PDT - re_iter.py - INFO - Disk Cache:\tStart iter number 3\n", "2018-07-05T23:39:57PDT - re_iter.py - INFO - Disk Cache:\tReading saved iterable from 4476141b-8009-4d46-a7c7-d8707b251d1c\n", "/Users/merekhinsky/miniconda3/lib/python3.6/site-packages/gensim/models/ldamodel.py:775: RuntimeWarning: divide by zero encountered in log\n", " diff = np.log(self.expElogbeta)\n", "2018-07-05T23:41:04PDT - re_iter.py - INFO - Disk Cache:\tFinished iter number 3\ttotal items: 142570\ttotal time: 67.2 sec\n", 
"2018-07-05T23:41:04PDT - re_iter.py - INFO - GensimDictionary:\tFinished iter number 2\ttotal items: 142570\ttotal time: 67.2 sec\n" ] } ], "source": [ "g_lda = GensimLDA(lda_params={'num_topics': 100}, id2word=gensim_dict.state.model.id2token)\n", "g_lda = g_lda.train(bow)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convert BOW representation to LDA sparse vectors and join with original data" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "vectors = g_lda.itransform(bow)\n", "\n", "# zip together with original records\n", "final = ReZip(records, vectors)\n", "\n", "# and add vectors to records\n", "final = ReMap(lambda x: {'lda': x[1], **x[0]}, final)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-07-05T23:41:31PDT - re_iter.py - INFO - reMap:\tStart iter number 1\n", "2018-07-05T23:41:31PDT - re_iter.py - INFO - re-zip:\tStart iter number 1\n", "2018-07-05T23:41:31PDT - re_iter.py - INFO - chain rows from files:\tStart iter number 3\n", "2018-07-05T23:41:31PDT - re_iter.py - INFO - reMap:\tStart iter number 3\n", "2018-07-05T23:41:31PDT - re_iter.py - INFO - GensimLDA:\tStart iter number 1\n", "2018-07-05T23:41:31PDT - re_iter.py - INFO - GensimDictionary:\tStart iter number 3\n", "2018-07-05T23:41:31PDT - re_iter.py - INFO - Disk Cache:\tStart iter number 4\n", "2018-07-05T23:41:31PDT - re_iter.py - INFO - Disk Cache:\tReading saved iterable from 4476141b-8009-4d46-a7c7-d8707b251d1c\n", "2018-07-05T23:45:07PDT - re_iter.py - INFO - reMap:\tFinished iter number 3\ttotal items: 3\ttotal time: 215.9 sec\n", "2018-07-05T23:45:07PDT - re_iter.py - INFO - chain rows from files:\tFinished iter number 3\ttotal items: 142570\ttotal time: 215.9 sec\n", "2018-07-05T23:45:07PDT - re_iter.py - INFO - re-zip:\tFinished iter number 1\ttotal items: 142570\ttotal time: 215.9 sec\n", "2018-07-05T23:45:07PDT - 
re_iter.py - INFO - reMap:\tFinished iter number 1\ttotal items: 142570\ttotal time: 215.9 sec\n" ] } ], "source": [ "# trigger all these final calculations\n", "final = [_ for _ in final]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Look at results" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'lda': [(14, 0.07707414031028748),\n", " (18, 0.48351725935935974),\n", " (21, 0.2209157645702362),\n", " (33, 0.15834416449069977),\n", " (55, 0.05055266246199608)],\n", " '': '146023',\n", " 'id': '218073',\n", " 'title': 'What U.S. Muslims fear from Trump',\n", " 'publication': 'Washington Post',\n", " 'author': 'Naureen Shah',\n", " 'date': '2016-12-30',\n", " 'year': '2016.0',\n", " 'month': '12.0',\n", " 'url': 'https://web.archive.org/web/20161231004909/https://www.washingtonpost.com/opinions/gen-kelly-has-talked-about-human-rights-will-trump-listen/2016/12/30/ebabbcea-c928-11e6-bf4b-2c064d32a4bf_story.html\\n',\n", " 'content': ' Naureen Shah is director of security and human rights at Amnesty International USA. The Obama administration is dismantling a homeland security program created to track immigrants from countries in an attempt to prevent Donald Trump from fulfilling his campaign promise to create a Muslim registry. As an American Muslim and human rights advocate, I am hoping against hope that retired Gen. John F. Kelly, the homeland security secretary nominee, will not reassemble the program. Kelly is not an obvious champion of human rights. As head of U. S. Southern Command, Kelly oversaw Guantanamo, where he frequently dismissed human rights concerns. Dozens of people languished in detention without charge, and many were after going on hunger strikes. But he could be our best hope in the Trump administration. While at Southern Command, Kelly invited critiques from human rights groups. 
Every year, he asked Amnesty International and other organizations to join him for a frank roundtable discussion. After one meeting, he took me aside to explain his point of view and hear me out. Dialogue and decency: In today’s political climate, these are as rare as unicorns. And they matter. If I could talk to Kelly today, I think he’d listen. I would tell him that people are afraid. Activists worry that if they speak out, the government could retaliate or put them under surveillance. Trump’s idle tweets about stripping people of citizenship for are eerily reminiscent of foreign dictators threatening to jail people for peaceful dissent. People like me — ordinary Americans with Muslim names and ancestry from countries — fear being put on a watchlist, barred entry into the United States, even banned because of who we are. Many people — African Americans, Jewish Americans, Muslim Americans, immigrants who’ve spent most of their adult lives here — spent the holidays swapping stories of threats, harassment and even violent attacks by fellow Americans who think the election has given them license to act on hatred. I believe Kelly would listen to me, not because he has ever agreed with me, but because he has been willing to talk. And a top national security official who values dialogue over diatribes is what we need to put the brakes on Trump’s most frightening counterterrorism proposals. Kelly must not revive NSEERS (the National Security Registration System). He is a smart man — he knows that a special registry would make for bad counterterrorism. Law enforcement officials need people to trust them and tip them off, not fear and avoid them. A special registration would send shockwaves through immigrant communities, inviting uncertainty and anxiety, more fear of law enforcement and less safety. Unlike some of Trump’s other national security advisers, though, Kelly does not appear to be infected with bizarrely virulent prejudice. 
And more than anything, the proposed Muslim ban, internment and special registration proposals are about prejudice — not safety. They cater to bigotry and fear, which fly in the face of our country’s most precious values. They tear at the seams of our commonality by implying that only some people are included in the ideals of liberty and justice. They drive people even farther apart from each other, after an election that already has left us fragmented. It may be naive to think that Kelly — or anyone else in the Trump administration — would risk his career to stand in the way of rights proposals. But many of these proposals, only a short while ago, would have been considered unimaginable. They threaten to return this country to the grimmest chapters of our history, like the mass imprisonment of U. S. citizens and noncitizens of Japanese descent. They are the stuff of dystopic novels, of nightmares. Kelly could reject the bigotry and irrationality of these proposals, and senators at his confirmation hearing should call on him to do so. The next secretary of homeland security can refuse to carry forward Trump’s policies, and also decline to cooperate with the FBI or any other agency on the surveillance of activists, immigrants or particular communities. Perhaps most important, the general could use his position to counteract advisers who may tell Trump that he needn’t listen to the millions of Americans who support human rights and civil liberties. Kelly was always willing to listen to the human rights community. Now, I’m hoping that the will listen to him. Read more on this issue: Josh Rogin: A workable Homeland Security plan for Trump Carter and Schulman: Trump is surrounding himself with generals. That’s dangerous. 
The Post’s View: Trump has made some dangerous appointments The Post’s View: Trump’s election threatens human rights around the world Jackson Diehl: Trump’s coming war against Islam '}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "final[-10]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.045*\"immigration\" + 0.036*\"immigrant\" + 0.019*\"border\" + 0.015*\"migrant\" + 0.012*\"deportation\" + 0.011*\"crime\" + 0.011*\"enforcement\" + 0.008*\"asylum\" + 0.008*\"citizen\" + 0.007*\"refugee\"\n", "0.020*\"rule\" + 0.020*\"bill\" + 0.016*\"judge\" + 0.015*\"ban\" + 0.011*\"ruling\" + 0.010*\"legislation\" + 0.010*\"governor\" + 0.008*\"lawmaker\" + 0.008*\"justice\" + 0.007*\"regulation\"\n", "0.017*\"march\" + 0.013*\"protest\" + 0.012*\"town\" + 0.012*\"hall\" + 0.012*\"senator\" + 0.012*\"activist\" + 0.010*\"corruption\" + 0.008*\"protester\" + 0.008*\"crowd\" + 0.007*\"dinner\"\n" ] } ], "source": [ "print(g_lda.state.model.print_topic(18))\n", "print(g_lda.state.model.print_topic(21))\n", "print(g_lda.state.model.print_topic(33))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }
cranial/examples/w2v.ipynb
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Train a CBOW word vectors model" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and 
matplotlib\n" ] } ], "source": [ "%pylab inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import csv\n", "csv.field_size_limit(100000000)\n", "\n", "import glob\n", "import os\n", "import sys" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from toolz.functoolz import compose" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "os.environ['CRANIAL_LOGLEVEL'] = \"INFO\"\n", "from cranial.re_iter import ReMap, ReChain, ReFilter, Progress, ReBatch, DiskCache, ReZip\n", "from cranial.models.tokenizers import MosesTokenizer\n", "from cranial.models.gensim_models import GensimDictionary\n", "from cranial.model_base import StatefulModel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the list of files" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['', 'id', 'title', 'publication', 'author', 'date', 'year', 'month', 'url', 'content']\n" ] } ], "source": [ "files = glob.glob('data/*.csv')\n", "\n", "with open(files[0]) as f:\n", "    print(f.readline().strip().split(','))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Helper function that returns a generator of parsed rows from a file" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def read_csv(fname):\n", "    with open(fname) as f:\n", "        reader = csv.reader(f)\n", "        header = next(reader)\n", "        for line in reader:\n", "            yield dict(zip(header, line))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Instantiate 
objects and define the transformation pipeline" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# this will create a pathos process pool of size 4 and use it to process individual items\n", "mt = MosesTokenizer('path_to_/mosesdecoder/', proc_type='sub', n_proc=4)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# file name -> generator of lines (records)\n", "out = ReMap(read_csv, files)\n", "\n", "# chain together all records from all generators\n", "records = ReChain(out, name='chain rows from files')\n", "\n", "# record -> text\n", "out = ReMap(lambda rec: rec['content'].lower(), records)\n", "\n", "# create batches of texts and join each batch into a single string, using 4 newlines as the separator\n", "out = ReBatch(out, batch_size=2000)\n", "out = ReMap(lambda batch: '\\n\\n\\n\\n'.join(batch), out)\n", "\n", "# use the moses tokenizer wrapper to convert text into a string where all tokens are separated by spaces\n", "out = mt.itransform(out)\n", "\n", "# split the batched strings on the 4-newline separator and chain all results together\n", "out = ReMap(lambda s: s.split('\\n\\n\\n\\n'), out)\n", "out = ReChain(out)\n", "\n", "# log the number of texts processed so far\n", "out = Progress(out, max_period=10000, name='OUT')\n", "\n", "# store intermediate results to disk to avoid costly re-runs\n", "out = DiskCache(out)\n", "\n", "# text -> list of tokens\n", "tokens = ReMap(lambda s: s.split(), out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train a dictionary\n", "\n", "Use a gensim dictionary to control the size of the vocabulary and to convert tokens into integer IDs" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2018-07-03T08:49:06PDT - gensim_models.py - INFO - Init gensim dictionary with params:\n", "{'no_below_raw': 0, 'no_above_raw': 1.0, 'max_n_raw': 100000, 
'no_below': 10, 'no_above': 1.0, 'max_n': 10000, 'dict_filter_every': 50000}\n", "2018-07-03T08:49:06PDT - gensim_models.py - INFO - Building gensim dictionary...\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - reMap:\tStart iter number 1\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - Disk Cache:\tStart iter number 1\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - Disk Cache:\tSaving iterable to 1bb3959f-7758-4d72-b7ea-6c0f127b8a4c\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - OUT:\tStart iter number 1\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - :\tStart iter number 1\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - reMap:\tStart iter number 1\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - MosesTokenizer sub:\tStart iter number 1\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - Trying to terminate previous pool\n", "2018-07-03T08:49:06PDT - re_iter.py - WARNING - Is this the first time creating a pool...\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - reMap:\tStart iter number 1\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - reBatch:\tStart iter number 1\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - reMap:\tStart iter number 1\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - chain rows from files:\tStart iter number 1\n", "2018-07-03T08:49:06PDT - re_iter.py - INFO - reMap:\tStart iter number 1\n", "2018-07-03T08:49:17PDT - re_iter.py - INFO - OUT yielded 1 items\n", "2018-07-03T08:49:17PDT - re_iter.py - INFO - OUT yielded 2 items\n", "2018-07-03T08:49:17PDT - re_iter.py - INFO - OUT yielded 5 items\n", "2018-07-03T08:49:17PDT - re_iter.py - INFO - OUT yielded 10 items\n", "2018-07-03T08:49:17PDT - re_iter.py - INFO - OUT yielded 20 items\n", "2018-07-03T08:49:17PDT - re_iter.py - INFO - OUT yielded 50 items\n", "2018-07-03T08:49:17PDT - re_iter.py - INFO - OUT yielded 100 items\n", "2018-07-03T08:49:17PDT - re_iter.py - INFO - OUT yielded 200 items\n", "2018-07-03T08:49:17PDT - re_iter.py - INFO - OUT yielded 500 items\n", 
"2018-07-03T08:49:17PDT - re_iter.py - INFO - OUT yielded 1000 items\n", "2018-07-03T08:49:18PDT - re_iter.py - INFO - OUT yielded 2000 items\n", "2018-07-03T08:49:18PDT - re_iter.py - INFO - OUT yielded 5000 items\n", "2018-07-03T08:49:24PDT - re_iter.py - INFO - OUT yielded 10000 items.\tspeed now 552.01\tEMA speed 552.01\n", "2018-07-03T08:49:39PDT - re_iter.py - INFO - OUT yielded 20000 items.\tspeed now 655.83\tEMA speed 562.39\n", "2018-07-03T08:49:51PDT - re_iter.py - INFO - OUT yielded 30000 items.\tspeed now 884.54\tEMA speed 594.61\n", "2018-07-03T08:50:04PDT - re_iter.py - INFO - OUT yielded 40000 items.\tspeed now 716.34\tEMA speed 606.78\n", "2018-07-03T08:50:26PDT - re_iter.py - INFO - OUT yielded 50000 items.\tspeed now 468.52\tEMA speed 592.95\n", "2018-07-03T08:50:37PDT - gensim_models.py - INFO - Current dictionary: Dictionary(215204 unique tokens: ['$', ',', '.', '13', '20']...)\n", "2018-07-03T08:50:37PDT - gensim_models.py - INFO - Filtering at 50000 documents\n", "2018-07-03T08:50:38PDT - gensim_models.py - INFO - Now dictionary: Dictionary(100000 unique tokens: ['$', ',', '.', '13', '20']...)\n", "2018-07-03T08:50:47PDT - re_iter.py - INFO - OUT yielded 60000 items.\tspeed now 463.07\tEMA speed 579.97\n", "2018-07-03T08:51:07PDT - re_iter.py - INFO - OUT yielded 70000 items.\tspeed now 513.42\tEMA speed 573.31\n", "2018-07-03T08:51:23PDT - re_iter.py - INFO - OUT yielded 80000 items.\tspeed now 629.87\tEMA speed 578.97\n", "2018-07-03T08:51:41PDT - re_iter.py - INFO - OUT yielded 90000 items.\tspeed now 548.88\tEMA speed 575.96\n", "2018-07-03T08:51:56PDT - re_iter.py - INFO - OUT yielded 100000 items.\tspeed now 660.14\tEMA speed 584.38\n", "2018-07-03T08:52:03PDT - gensim_models.py - INFO - Current dictionary: Dictionary(238507 unique tokens: ['$', ',', '.', '13', '20']...)\n", "2018-07-03T08:52:03PDT - gensim_models.py - INFO - Filtering at 100000 documents\n", "2018-07-03T08:52:04PDT - gensim_models.py - INFO - Now dictionary: 
Dictionary(100000 unique tokens: ['$', ',', '.', '13', '20']...)\n", "2018-07-03T08:52:13PDT - re_iter.py - INFO - OUT yielded 110000 items.\tspeed now 591.67\tEMA speed 585.11\n", "2018-07-03T08:52:30PDT - re_iter.py - INFO - OUT yielded 120000 items.\tspeed now 578.40\tEMA speed 584.44\n", "2018-07-03T08:52:58PDT - re_iter.py - INFO - OUT yielded 130000 items.\tspeed now 367.97\tEMA speed 562.79\n", "2018-07-03T08:53:10PDT - re_iter.py - INFO - reMap:\tFinished iter number 1\ttotal items: 3\ttotal time: 243.7 sec\n", "2018-07-03T08:53:10PDT - re_iter.py - INFO - chain rows from files:\tFinished iter number 1\ttotal items: 142570\ttotal time: 243.7 sec\n", "2018-07-03T08:53:10PDT - re_iter.py - INFO - reMap:\tFinished iter number 1\ttotal items: 142570\ttotal time: 243.7 sec\n", "2018-07-03T08:53:18PDT - re_iter.py - INFO - OUT yielded 140000 items.\tspeed now 476.87\tEMA speed 554.20\n", "2018-07-03T08:53:30PDT - re_iter.py - INFO - reBatch:\tFinished iter number 1\ttotal items: 72\ttotal time: 264.1 sec\n", "2018-07-03T08:53:30PDT - re_iter.py - INFO - reMap:\tFinished iter number 1\ttotal items: 72\ttotal time: 264.1 sec\n", "2018-07-03T08:53:30PDT - re_iter.py - INFO - MosesTokenizer sub:\tFinished iter number 1\ttotal items: 72\ttotal time: 264.1 sec\n", "2018-07-03T08:53:30PDT - re_iter.py - INFO - reMap:\tFinished iter number 1\ttotal items: 72\ttotal time: 264.1 sec\n", "2018-07-03T08:53:30PDT - re_iter.py - INFO - :\tFinished iter number 1\ttotal items: 142570\ttotal time: 264.1 sec\n", "2018-07-03T08:53:30PDT - re_iter.py - INFO - OUT:\tFinished iter number 1\ttotal items: 142570\ttotal time: 264.1 sec\n", "2018-07-03T08:53:30PDT - re_iter.py - INFO - Disk Cache:\tSaved iterable to 1bb3959f-7758-4d72-b7ea-6c0f127b8a4c, size 669,756,849\n", "2018-07-03T08:53:30PDT - re_iter.py - INFO - Disk Cache:\tFinished iter number 1\ttotal items: 142570\ttotal time: 264.1 sec\n", "2018-07-03T08:53:30PDT - re_iter.py - INFO - reMap:\tFinished iter number 1\ttotal 
items: 142570\ttotal time: 264.1 sec\n", "2018-07-03T08:53:33PDT - gensim_models.py - INFO - Final raw dictionary: Dictionary(100000 unique tokens: ['$', ',', '.', '13', '20']...)\n", "2018-07-03T08:53:34PDT - gensim_models.py - INFO - Final dictionary: Dictionary(10000 unique tokens: ['$', ',', '.', '13', '20']...)\n" ] } ], "source": [ "gensim_dict = GensimDictionary({\n", " 'no_below_raw': 0,\n", " 'no_above_raw': 1.,\n", " 'max_n_raw': 100000,\n", " 'no_below': 10,\n", " 'no_above': 1.,\n", " 'max_n': 10000,\n", " 'dict_filter_every': 50000,\n", "})\n", "\n", "gensim_dict = gensim_dict.train(tokens)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define transformation from tokens to IDs " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# use gensim dictionary to convert tokens to IDs, \n", "# each item in the resulting iterator is an original \n", "# text document represented as a list of integers\n", "ids = ReMap(lambda d: gensim_dict.state.model.doc2idx(d), tokens)\n", "\n", "# randomly keep about 10% of documents per pass so that each pass is faster\n", "# and more passes can be made; this improves randomization of training data\n", "ids = ReFilter(lambda _: np.random.rand() > 0.9, ids)\n", "\n", "# shuffle documents with buffer 20k\n", "ids = ReBatch(ids, batch_size=1000, only_full=True, shuffle=True, buffer_size=20000)\n", "ids = ReChain(ids)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define transformations from tokenized documents to training examples" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# set context window size (number of tokens on each side of a center token)\n", "ws = 3\n", "\n", "# make (context window, center) pairs\n", "pairs = ReMap(lambda l: [(l[i:i+ws] + l[i+ws+1:i+2*ws+1], l[i+ws]) for i in range(len(l) - 2*ws)], ids)\n", "pairs = ReChain(pairs)\n", "\n", "# shuffle examples with buffer 100k and assemble into batches\n", "pairs 
= ReBatch(pairs, batch_size=256, only_full=True, shuffle=True, buffer_size=100000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transform batches into pytorch tensors" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def collate_fn(batch):\n", " \"\"\"Convert a batch of python lists of integers into pytorch tensors\"\"\"\n", " # shift all IDs by 1 to adjust for unknown words which have ID = -1 in gensim\n", " x = torch.LongTensor([pair[0] for pair in batch]) + 1 \n", " y = torch.LongTensor([pair[1] for pair in batch]) + 1\n", " return (x, y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batches = ReMap(collate_fn, pairs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define pytorch model" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "class CBOW(nn.Module):\n", "\n", " def __init__(self, vocab_size, embedding_dim):\n", " super().__init__()\n", " self.emb = nn.EmbeddingBag(vocab_size, embedding_dim, scale_grad_by_freq=True)\n", " self.lin = nn.Linear(embedding_dim, vocab_size)\n", " self.lin.weight = self.emb.weight\n", "\n", " def forward(self, x):\n", " out = self.emb(x)\n", " out = self.lin(out)\n", " out = F.log_softmax(out, dim=1)\n", " return out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Instantiate and train" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "cbow = CBOW(len(gensim_dict.state.model)+1, 128)\n", "loss_fn = nn.NLLLoss()\n", "optimizer = optim.Adam(cbow.parameters(), lr=0.001)\n", "losses = []" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "for ep in range(5):\n", " for i, (x, y) in enumerate(Progress(batches, max_period=1000, name='batches')):\n", " optimizer.zero_grad()\n", " out = cbow(x)\n", " loss = loss_fn(out, y)\n", " 
loss.backward()\n", " optimizer.step()\n", " losses.append(loss.item())\n", " if i % 100 == 0:\n", " print(i, '\\t', loss.item())" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAD8CAYAAACGsIhGAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl8VOW9x/HPL4GwE3ZB9kVBRBBF1CIirgFrrVdF0apVXOrVW63WCtVbUVGpVlttveJGqVbFrVYoKlWKAooiyC47BAhbwhbCErI99485iZPJTE4CSc4k+b5fr7wy88xzzvk9mcl85yxzjjnnEBERKU1C0AWIiEj8U1iIiIgvhYWIiPhSWIiIiC+FhYiI+FJYiIiIL4WFiIj4UliIiIgvhYWIiPiqE3QBFaVVq1auS5cuQZchIlKtLFiwYKdzrrVfvxoTFl26dGH+/PlBlyEiUq2Y2cay9NNmKBER8aWwEBERXwoLERHxpbAQERFfCgsREfGlsBAREV9xHRZm1s3MXjWz94KuRUSkNvMNCzOrb2bzzGyxmS03s4ePdGFmNtHM0s1sWZTHUsxslZmtNbPRAM659c65UUe6vLLYezCHaUu2VeYiRESqvbKsWRwGznXO9QNOBlLM7IzwDmbWxsyaRLT1iDKvSUBKZKOZJQLPA8OA3sBIM+tdphEcpTve/I473vyOLXsPVcXiRESqJd+wcCH7vbt1vR8X0W0I8KGZ1Qcws1uA56LMaxawO8piBgJrvTWJHGAycGlZBmBml5jZS5mZmWXpXsLm3aGQyM0rOKLpRURqgzLtszCzRDNbBKQDnzrnvgl/3Dn3LvAJMNnMrgVuAkaUo472wOaw+2lAezNraWYTgP5mNibahM65qc65W5OTk8uxuLDpvdwzO6LJRURqhTKdG8o5lw+cbGbNgA/MrI9zbllEnyfNbDLwAtA9bG2kLKK9VTvn3C7gF+WYT7k5V1iA0kJEJJZyHQ3lnNsLfE70/Q6DgT7AB8BD5awjDegYdr8DsLWc8zgiRWGhrBARiaksR0O19tYoMLMGwPnAyog+/YGXCe1nuBFoYWbjylHHt8BxZtbVzJKAq4Ep5Zj+iLVLrg8oLERESlOWNYt2wEwzW0LoTf1T59y/Ivo0BK50zq1zzhUANwAlTntrZm8Bc4GeZpZmZqMAnHN5wJ3AdGAF8I5zbvmRDqo85m/cA8CcNTurYnEiItWS7z4L59wSoL9Pny8j7ucSWtOI7DeylHl8BHzkV09lyc3X0VAiIrHE9Te4q1LkscAiIvIDhYUnv0BxISISi8LCo6wQEYlNYeEpUFqIiMSksPDkKSxERGJSWHgyD+UGXYKISNxSWHgmfLEu6BJEROKWwsKTlKg/hYhILLX+HXLwca0AOKFdE5+eIiK1V60Pi7reGoX2b4uIxFbrw6Lw/IEFTmkhIhKLwsI73eyhnPyAKxERiV+1PiyWpO0FYP3OAwFXIiISv2p9WBzWtbdFRHzV+rC4uG+7oEsQEYl7tT4s9h7MCboEEZG4V+vDwtD1VEVE/NT6sOh9bNOgSxARiXu1PiyOa9M46BJEROJerQ+LunVq/Z9ARMRXrX+n1AkERUT81fp3yuQGdYMuQUQk7tX6sEhM0
NFQIiJ+FBYKCxERX7U+LBJMYSEi4qfWh4XWLERE/NX6sOjUomHQJYiIxL1aHxZasxAR8Vfrw0JERPwpLERExJfCQkREfCksRETEl8JCRER8xXVYmFk3M3vVzN4LuhYRkdrMNyzMrKOZzTSzFWa23MzuOtKFmdlEM0s3s2VRHksxs1VmttbMRgM459Y750Yd6fJERKRilGXNIg+41zl3AnAGcIeZ9Q7vYGZtzKxJRFuPKPOaBKRENppZIvA8MAzoDYyMXIaIiATHNyycc9ucc995t7OAFUD7iG5DgA/NrD6Amd0CPBdlXrOA3VEWMxBY661J5ACTgUvLMxAREak85dpnYWZdgP7AN+Htzrl3gU+AyWZ2LXATMKIcs24PbA67nwa0N7OWZjYB6G9mY2LUdImZvZSZmVmOxYmISHmUOSzMrDHwPnC3c25f5OPOuSeBbOAF4CfOuf3lqCPaOTecc26Xc+4Xzrnuzrknok3onJvqnLs1OTm5HIsTEZHyKFNYmFldQkHxhnPuHzH6DAb6AB8AD5WzjjSgY9j9DsDWcs7jqO09mFPVixQRqRbKcjSUAa8CK5xzz8To0x94mdB+hhuBFmY2rhx1fAscZ2ZdzSwJuBqYUo7pK0TmodyqXqSISLVQljWLQcB1wLlmtsj7GR7RpyFwpXNunXOuALgB2Bg5IzN7C5gL9DSzNDMbBeCcywPuBKYT2oH+jnNu+RGP6ght2XuoqhcpIlIt1PHr4JybQ/R9CuF9voy4n0toTSOy38hS5vER8JFfPZUpOzc/yMWLiMStuP4Gd1VbsHFP0CWIiMQlhUWYvQe1z0JEJBqFRZh+HZoFXYKISFxSWITJyS8IugQRkbiksAgz6avUoEsQEYlLCoswa9PL86VzEZHaQ2EhIiK+FBYiIuJLYSEiIr4UFkC75PpBlyAiEtcUFiIi4kthASQmlHrqKxGRWk9hgcJCRMSPwgK4+ayuQZcgIhLXFBZA72ObBl2CiEhcU1gALRvVC7oEEZG4prAA8gpc0CWIiMQ1hQWQV6CzzYqIlEZhAeTla81CRKQ0CgugZeOkoEsQEYlrCgugXXKDoEsQEYlrCgsREfGlsBAREV8KCxER8aWwEBERXwqLCNm5+UGXICISdxQWEdam7w+6BBGRuKOwiGA6W7mISAkKCxER8aWwEBERXwqLCLPX7Ay6BBGRuKOwiDD+45VBlyAiEncUFiIi4kthISIivhQWIiLiS2EhIiK+FBYiIuJLYSEiIr4UFiIi4kthISIivhQWIiLiS2EhIiK+FBYiIuJLYSEiIr4UFp7/ObdH0CWIiMQthYXHdIk8EZGYFBaehLCsSNtzMLhCRETikMLCkxi2ZrFyW1aAlYiIxB+FhSchbNUir8AFWImISPxRWHhaNEoqup2vsBARKUZh4TmtS/Oi23kFBQFWIiISfxQWnu6tGxfd3rEvO8BKRETij8LCE37o7LwNewKsREQk/igsovhsxY6gSxARiSsKCxER8aWwEBERXwoLERHxpbAQERFfCgsREfGlsBAREV8KCxER8aWwiCEnT6f8EBEppLCIwaGTCYqIFFJYiIiIL4WFiIj4isuwMLNuZvaqmb0XVA3LtmQGtWgRkbhTZWFhZhPNLN3MlkW0p5jZKjNba2ajAZxz651zo6qqtmgysg4HuXgRkbhSlWsWk4CU8AYzSwSeB4YBvYGRZta7CmuK6ZlPVwddgohI3KiysHDOzQJ2RzQPBNZ6axI5wGTg0qqqKdI1p3cqur16x/6gyhARiTtB77NoD2wOu58GtDezlmY2AehvZmNiTWxmt5rZfDObn5GRcdTF1E0w/04iIrVQnYCXH+3d2TnndgG/8JvYOfcS8BLAgAEDjvqLEV1bNTraWYiI1EhBr1mkAR3D7ncAtgZUC6d3axnUokVE4lrQYfEtcJyZdTWzJOBqYEpQxSSYNkOJiERTlYfOvgXMBXqaWZqZjXLO5QF3AtOBFcA7zrnlVVVTpMSgo
1NEJE5V2T4L59zIGO0fAR9VVR2liVyzyM7Np37dxICqERGJH/osHSZyB3ev//0koEpEROKLwiKMaZ+FiEhUCgsREfGlsBAREV8KCxER8aWwEBERX9U+LMzsEjN7KTNT158QEaks1T4snHNTnXO3JicnB12KiEiNVe3DQkREKp/Cwsfm3QeDLkFEJHAKCx+Dn5wZdAkiIoFTWIiIiC+FRYRWjZOCLkFEJO4oLCL87IzOQZcgIhJ3FBYRbj+ne9AliIjEHYVFhHp1dP0KEZFICosyWLMjK+gSREQCpbAog5mr0oMuQUQkUNU+LKri3FAZWYcrbd4iItVBtQ+Lqjg31MuzN1TavEVEqoNqHxZVxTkXdAkiIoFRWJTRvuy8oEsQEQmMwiKKE49tWqLNLIBCRETihMIiiuevOaVE23vz0wKoREQkPigsoujSqlGJtkf+9X0AlYiIxAeFRTnk5RcEXYKISCAUFuXQ44GPgy5BRCQQCgsREfGlsCinZz5dHXQJIiJVTmFRTs/NWMMLn6+joEBf0hOR2kNhcQR+/8lK7nzru6DLEBGpMgqLI/TR0u1auxCRWkNhcRQUFSJSW1T7sKisU5Q/N7K/b5+0PQcrdJkiIvGq2odFZZ2i/Cf9jvXtM+SpzzmUk1+hyxURiUd1gi6gunvuP2toXK8OGVmHmfRVKtN+eRYnHlt519YQEQmCwqIUbZrUI93nKnkvfL6u2P33F2yhW6vG7Nx/mI4tGlZmeSIiVabab4aqTI9c2ueIprv19fkMfnImWdm5MY+YWrhpD/M27D6a8kREqozCohQpfdqWe5r1O/cze81OAE4a+28e/2hF1H6X/d9XjHhx7lHVJyJSVRQWFezzVRnF7r8yZwO/entR0RrGtsxD2ikuItWO9llUgQ8WbgFgxbZ9rNyeVSHz/Oz7HXRu2ZDjjmlSIfMTESmN1ix81EmomOupfrBwS9SgGPHiXH751sIS7f9aspXMg7lF979ev4v73l3M4bzQWsnNr83ngj/OqpDapHr7Zv0uVmzbF3QZUsMpLHy0alyvUuc/b8NupizeyrBnZ9NtzDQO5uSxcdcB7nxzIb+cvJCCAkdefgFXv/Q17y5I461vNhULkaq0YOMefvzn2WTnajNaPLnqpa8Z9uzsoMuI6tvU3YG9XqViKSx81EmsmDULPyu27aPAQe/fTS/abPXF6gwu+cucYhddGjv1e/o98u+i+/9aspVznprJ2CnLeWza90xfvh2A9xaksXHXATbvLv4t822Zh0jPyj6iGh+ZupxlWypuU1q47ZnZCqEaJi+/gCsnzOX6id8EXUrcys0vYOyU5ezaX/oh+vFAYeFj1Fldq3yZf/psTdHt5VtL37xw55sLSd11kElfpfLy7A3c9voCJn25gV+/u5ghT33O4Cdncv97S7hh4jzmp+7mzCf+w8DHZhSbR0GBo8voaXQZPY39h/NwznHHG9/xxeriO+vXpO8H4KfPf8m4KNckv3vyQs59+nPf8a1NzyoWDGt2ZHHGEzO47fUFHMzJI/8oT9A4e00Gq8oQaMu2ZNJl9DQWbCx+CPP2zGwu+uMstmceWajGsnDTHm55bX6x8f3x09W8Mnt91P4zV6WzbEvFnsamsny4aAtrdkT/my/zeQ1XtfEfr2Takm1BlwHAv5fvYNJXqTw8tfj/06QvN5T4oBfNba/P54Jnvqis8opRWPioXzcx6BLKbWzEC+/t+Zv5YnUGV0z44VDdp6avZNSkbznr9/9h3LQfDu8dO2U5Xcd8xLSl27hh4jwAVu/IosvoaRwMO4rrlTkbcM7x0dJtdBk9jUemfs8/F21lfcYBdu0/HPO8WSc9NJ3zn5nF9a/Oo6DA8eIX64r2vXyxOoPev5vOLycv5OVZ63nwn0sBcM7xxeqMoiPKDhzOKwq3bZmHiubtnGPHvmyue3UeF/2p5P6cKYu3MnfdLnLzC3h1zgZmrkwHYOribYz/eCWpOw9QUOB4c94mVu3IY
vK3m2L+jfcfzisKmfDg2384j537D5Mbdr32sVOWM2t1Bne+uZBPv99RrOZnZ6wp9vd3zrFh5wFemrWOG//6LT/+85yix/zWvHbsC4VbXn4B05dvx7noobsv++g2C42YMJfX5qYWa7tr8qKY+9Bi1RHpg4VpXPXiXLbsPVSsPSevgFGTvuX7CgqdCV+s4443S15iYP/hPF74fF2pH1Yysg5z6V/msD0zm7z8AibO2cCfZ6wpNsav1u6k25hp7D2YA4Sej90HcqLOr8CbLj9s+syDuYyd+j3XvPK171imL99R9CGussX10VBm1gj4PyAH+Nw590ZV11DG13m18/zMH755PvHLDUW331uQVqzfh4u2cNfkRVHn0XXMR1Hnceq4zwB46bpTef3rjUXfOzk2uT5Zh/MAmJe6m26//Yhopi3ZVvTJ76T2ydz//tKix964+XSufeWHzRpnPvEfAE7u2IxFm/eWmFdWdi7XT5zHn646uehAggcvPoFx01bQpklof9Skr1KB0JvIfRf1LBZKAK/MXk/XVo1YkpZJ3w7JjPrb/KL5n39CGz5bkc4ndw+m5zFN6PPQ9KLH6iYaufmu2DIA1qbvp2mDujStX7dYW8OkRGauSueBD5aVGMctr83n0+930POYJvz95tNJblCXR/61nL9//UOgnf74DE5o15Q6CcZSb41k9m+G0rR+Xb7btIehvdowb8NuRrw4l+dG9qd760Ys37qP7ZnZ3DioC03q12Xx5r10b9OYRkmJrMs4QJum9Whavy4/+csc9h3K5ZROzZmXupt5qbtpUDeRrq0aMaBLi2K1bs/MZtHmPbT2/r4FDvo9/G8yD+Uy494htG/WgBkr0jm9WwtGTfqWey7sSW5eAb96ezEAP584j0/vGVI0v++37WPGynQy9h/mlsHdyMg6zHkntKFzy0Yl/k4A6zP2896CNO67qCdmsTcj5+YXUDcxgT0HclixfR/XvBx6Xe3LzmXjrgM8e3V/9hzM4eqXvmZ9xgHmjjmX9+ansTgtkzOemEG75Pps89Y+/+vUDqxN348BL85aR4GDpVsy+WTZdt74JvQcrXgkhQZJoQ+fh3LyqV+35Gf1ZVsyi14rm3eHQnPTroPsOnCYE49NZsrirfy4b7sSH2KXpO2lb4dmMcdaEawsqW9mzYBXgD6Ezsx9k3Ou3N8oM7OJwI+BdOdcn4jHUoBngUTgFefceDO7DtjrnJtqZm87566KNe8BAwa4+fPnx3r4iKVnZTP82dns3B/9k4FIZfv0V2dX2JFvrZvUIyPGKWw+uXswKX8quaP8zVtOL3ojjebeC47n6XJcbrh/p2Ys3FQy2MPVSTD6dkjmgt5t6dchmWte+YZurRqxfueBEn3PPr41r900EAjtkyv8AHFOz9bsO5TLd5v20qBuIt8+eD4HD+cx8PHQZtif/6gLtw3pVtS/LG4Z3JWXZ28otc+gHi35cu0u7hjavdiHso/vGsxLs9bzz0VbcA4aJiUytGcbpi3dRstGSTw3sn+xD0JQ9r/tz3/UhbE/ObHM4whnZguccwN8+5UxLP4GzHbOvWJmSUBD59zesMfbAIecc1lhbT2cc2sj5nM2sB94LTwszCwRWA1cAKQB3wIjgUuBj51zi8zsTefcNbFqrKywKPTXLzeU2K4oIvHjuZH9ox6GXhtURVj47rMws6bA2cCrAM65nPCg8AwBPjSz+t40twDPRc7LOTcLiHZCpIHAWufceudcDjCZUFCkAR1Kq7WyrmcRqUur6Ku8IhIfamtQABzMyav0ZZRlB3c3IAP4q5ktNLNXvH0JRZxz7wKfAJPN7FrgJmBEOepoD2wOu5/mtf0DuNzMXgCmRpuwsq5nEaljc51BVkTi0zvz0/w7HaWyhEUd4BTgBedcf+AAMDqyk3PuSSAbeAH4iXOuPLvoo+2Fcs65A865G51ztwexcztcjzaNg1y8iEigyhIWaUCac65wz8t7hMKjGDMbTGgH+AfAQ+WsI
w3oGHa/A7C1nPOodA8MP4EG1fBQWhGRo+UbFs657cBmM+vpNZ0HFNvTa2b9gZcJ7We4EWhhZuPKUce3wHFm1tXbgX41MKUc01eJW87uxopHU4IuQ0SkypX1S3n/A7xhZkuAk4HHIx5vCFzpnFvnnCsAbgA2Rs7EzN4C5gI9zSzNzEYBOOfygDuB6cAK4B3n3PIjGZCIiFS8Mn0pzzm3CIh5aJVz7suI+7mE1jQi+40sZR4fAdG/pSUiIoHS6T6OwM/O6ARA6viLubhvu4CrERGpfGX6Ul51UNlfygvnnKPAQaJ3rYucvAKOf/Bjn6lERCpP6viLj2i6CvtSnpRkZkVBAZBUJ4FV47TjW0RqLoVFBalXJ5HL+rcPugwRkUqhsKhAf7iyH8sevijoMkREKpzCogIlJhiN69XhmRH9AGhc74eDzf7v2hLfYxQRqTYUFpWgg3ceqTZNQ+fzT0pMoFfbJgAM69M2sLpERI6UwqISFJ4SpFfbJjz60z58ft85dGvdmL/eeBpPj+jH12POY+nYC4v6r3lsGH+4sl+J+Zx9fOuo879jaPfKKVxEJIa4vlJedXVSh2SevKIvKX3aFrsa2tCebQBomBT6s/92eC+en7mOuokJDPGCoWWjJHYdyKF760a8dtNAMrIOc9pjnxXNo/DwuPCLqoiIVDatWVSSEQM6FguKaG49uzuLHwqtYdTzLrHYoUVD/nbTQCbfeiYQurLZ4ocu5NKTj+WJ/zqpaNoNTwznilM7lJhnq8b1OK5NYxp6l2/8m3cFsctPKdm3NEl1Ehh7SW/aJdeP2WfCz04t1zxFpPpSWMSJpvXr8vL1A5h4wwCGHN+66PrFAMkN6vLs1f0ZObBTUZuZ8asLjuesHq2KzefyU9rz6T1D+OyeIfz1xtMYcnxrUsdfzFNX9OXJy/uWWO6JxzYtuv3+7T/i3V+EQuqk9sn8fFBXvrhvaLH+V4YFVIq3/yV8Hn4m3Xha0e2//vy0Unr+4Jim9Ti1c3Mm/OxUNjwxvNS+t5/TnZ+efGyJ9s4ti1+PpF/Hyr1esdRe3Vv7Xyhtws+O/oCXDs0bHPU8ykNhEUcu6H0MLRvX8+/oad+sAX+/+XQgdCTW4t9dyP0pvQA4tlmDos1eAAkJxojTOjLj3iH07ZDMSe1DF4vqeUyToj4ntPvhduFXDpPqJPDCtadwzwXHc16vNjzxXycx4Wen8uEdgwD47J6zmXzrGZzcsRkvXz+A24Z0i1nvr84/nnN6tuGTuwez7OGLGNqrDX+5pj8AIwaEQuj+lF588N8/KjZdgQsFWUqftpgZG54YztNX9uOu844rsYz7U3rx2GWhNbCnrujLjHuHMPXOs/jivqHMuT8UfGf1aMW7t53JqnEpXHN6J24+q2uJ+Xz/yEUsePB8Fjx4flHQNG9Yck1x+EltYwbYH68quR8KYPW4Ybx43am8c9uZ3DSo+LL7d2rG30edzoMXn8BNg7ryt5sGMuXOQaSc2DbqN3RTx1/M27eeEXU54Xq3a0rXVo1Y+WgKqeMvZt4D50Xt9/vLT+L9289k7phzSR1/MWseG8a4n/aJ2hdg8e8ujPlYuJEDO7HhieGsf3w4fTv8cKGyP1zZj+dG9i912tTxFzP7N0NL7RPuwYtP4P3bi7+G6nmv42jaNq3PK9eHvsD8aMRYH7/sJP73x72L7k/75Vm8fH3sLzuf26sN0345mG4+gXFh77asHjes6P4ndw+m8Hu+hQfDAMx/8Pyo098yuCtz7j+36H5p/3cVRaf7qAE27TpIw3qJtCpH0AB8tXYn/Ts158NFWxg7dTnLH05hwcY9jHhxLgM6N+e9iH+4slq+NZMT2jbl9jcWsPtADovTMpnzm6G0aRp7k1akt+ZtYsw/lgKhT2oz7j0nZt8uo6cBcE7P1ky6cWCp89206yDHNqtPncTin5PeX5DGj3q05LMV6Qzu0arYZXT/9lUqD01Zzp9H9qd5wyQGdm3Bpt0H6NHmh3/qw
hrCrXt8OJO+SiUp0Xh3QRp/uupk9hzM4dTOLYr1e2Tq94wc2JHjwoI7lv9+YwE9j2nKXeeXDMrwGlLHX8y6jP2c9/QXRfejWZexn8O5BTRvVJfcPEenlrGvCFk4/xn3DqF9swbU9w7kyM7Np25iAl+t28mWPYcY7T1v4R79aR+uO6MzAHn5BfR4IHR6nJWPppCVncdpj31WtL/umtM78fhlJ5Hyp1nkFzg+vWdI1DEWTp+UmEC33/5wDtLwsabtOUizhknFDmNfm57F+c/MolXjJHbuz2FQj5a8cXPxwC1czoYnhmNmDBr/H7bsPcTs3wwlPeswl7/wFf07NWN0Si8e/OcyurduzCfLt/PYZX249vTOZOfm0+t/PwHg9VEDue7VecXmX1hjl9HT6NC8AXPuP5es7Fye/WwN96X0ZOribbRLrs+gHq1YsHE3l78wFwi9xj9flVFU17vzNzNu2gpm/vocWjRKivnclaasp/uo9mFhZpcAl/To0eOWNWvWBF1OtZedm8+1r3zDwz85kT7tK/dStX4K/2HP7NaSt0r59PzmN5v47QdLmX732fRs6/+GW14FBY6v1u1iUI+WmEW7qOMPtSYlJvDUlX2Zn7qHRy49MWb/ynDPO4v4x3dbuOHMzjx8aegTcnZuPs5Bg6Sjv2jXgo17+Hr9Lu4Y2iNmH+cc7y1Io1/HZlz4x1lF7V+OPpf2zaJvNik8iKNV4yTmP3hBqTUUFDj+tXQbJ7RtwsrtWVzSL7TJcdHmvbRv1oA6CUbzMr5pOud4dsYarhnYqcQHmdU7sliXvp9hJ4VOFHr+M1+wNn0/c+4fSqOkOvR/9FOevrIfl3ubZbNz83ltbio3Depa9EFk4pwNNK5fhxEDQtd1K3yNnNalOe/+IvRBbM+BHOrVTSg66CUItSYsCtXmNYua6sF/LuXvX2/ijG4tinb4x6vX56bSukn9ov04QZmxYgdnHdeKenWCv6Ljxl0HaJhUp9j+t2h27T/MqeM+o1OLhswqx+amqrR590E+XLSFO4b2OOIPAGvTs/j39zu4akDHcm1urmxlDQsdOitx65K+x/L3rzfFxRufn+vO7BJ0CQCcd8IxQZdQpHNL/x29AC0b1+O3w3uRcmL8nu6/Y4uG3HluyU1/5dGjTZNimy6rG4WFxK3TurTgzqE9uP5HnYMuRSrZrWfri6bxTmEhcSshwfj1RT39O4pIpdOhsyIi4kthISIivhQWIiLiS2EhIiK+FBYiIuJLYSEiIr4UFiIi4kthISIivmrMuaHMLAPYeISTtwJ2VmA58aQmjw1q9vg0tuqpuo2ts3Mu+jWcw9SYsDgaZja/LCfSqo5q8tigZo9PY6ueaurYtBlKRER8KSxERMSXwiLkpaALqEQ1eWxQs8ensVVPNXJs2mchIiK+tGYhIiK+an1YmFmKma0ys7VmNjroekpjZqlmttTMFpnZfK+thZl9ambOSCw/AAAEBklEQVRrvN/NvXYzs+e8cS0xs1PC5nOD13+Nmd0Q1n6qN/+13rSVdgFpM5toZulmtiysrdLHEmsZVTC2sWa2xXvuFpnZ8LDHxnh1rjKzi8Lao742zayrmX3jjeFtM0vy2ut599d6j3ephLF1NLOZZrbCzJab2V1ee7V/7koZW4147o6ac67W/gCJwDqgG5AELAZ6B11XKfWmAq0i2p4ERnu3RwO/924PBz4GDDgD+MZrbwGs934392439x6bB5zpTfMxMKwSx3I2cAqwrCrHEmsZVTC2scCvo/Tt7b3u6gFdvddjYmmvTeAd4Grv9gTgdu/2fwMTvNtXA29XwtjaAad4t5sAq70xVPvnrpSx1Yjn7qj/PkEXEOjgQy/I6WH3xwBjgq6rlHpTKRkWq4B23u12wCrv9ovAyMh+wEjgxbD2F722dsDKsPZi/SppPF0o/oZa6WOJtYwqGFusN5xirzlguve6jPra9N5AdwJ1Il/DhdN6t+t4/aySn8MPgQtq0nMXZWw18
rkr709t3wzVHtgcdj/Na4tXDvi3mS0ws1u9tmOcc9sAvN9tvPZYYyutPS1Ke1WqirHEWkZVuNPbFDMxbBNKecfWEtjrnMuLaC82L+/xTK9/pfA2lfQHvqGGPXcRY4Ma9twdidoeFtG2ycfz4WGDnHOnAMOAO8zs7FL6xhpbedvjQU0YywtAd+BkYBvwtNdekWOrsnGbWWPgfeBu59y+0rrGqClun7soY6tRz92Rqu1hkQZ0DLvfAdgaUC2+nHNbvd/pwAfAQGCHmbUD8H6ne91jja209g5R2qtSVYwl1jIqlXNuh3Mu3zlXALxM6LmD8o9tJ9DMzOpEtBebl/d4MrC7osdiZnUJvZm+4Zz7h9dcI567aGOrSc/d0ajtYfEtcJx3hEISoR1LUwKuKSoza2RmTQpvAxcCywjVW3gkyQ2EtrPitV/vHY1yBpDprbpPBy40s+be6vSFhLabbgOyzOwM7+iT68PmVVWqYiyxllGpCt/kPJcReu4K67naOxqmK3AcoR28UV+bLrRReyZwRZQxhI/tCuA/Xv+KHIcBrwIrnHPPhD1U7Z+7WGOrKc/dUQt6p0nQP4SO1lhN6OiFB4Kup5Q6uxE6qmIxsLywVkLbNWcAa7zfLbx2A573xrUUGBA2r5uAtd7PjWHtAwj9I6wD/kIl7mAD3iK0Sp9L6FPVqKoYS6xlVMHYXvdqX0LojaFdWP8HvDpXEXYEWqzXpvdamOeN+V2gntde37u/1nu8WyWM7SxCm0eWAIu8n+E14bkrZWw14rk72h99g1tERHzV9s1QIiJSBgoLERHxpbAQERFfCgsREfGlsBAREV8KCxER8aWwEBERXwoLERHx9f/Parzisj2YywAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "semilogy(losses)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do a little bit of words math" ] }, { "cell_type": "code", "execution_count": 151, "metadata": {}, "outputs": [], "source": [ "v1 = sum([cbow.emb.weight[gensim_dict.state.model.token2id[w] + 1] for w in 'washington'.split()])\n", "v2 = sum([cbow.emb.weight[gensim_dict.state.model.token2id[w] + 1] for w in 'america'.split()])\n", "v3 = sum([cbow.emb.weight[gensim_dict.state.model.token2id[w] + 1] for w in 'russia'.split()])\n", "\n", "vq = v1 - v2 + v3" ] }, { "cell_type": "code", "execution_count": 152, "metadata": {}, "outputs": [], "source": [ "distances = torch.sum((cbow.emb.weight - vq)**2, dim=1)" ] }, { "cell_type": "code", "execution_count": 153, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(28.6929) \t washington\n", "tensor(32.9521) \t moscow\n", "tensor(33.3162) \t post\n", "tensor(35.9955) \t russia\n", "tensor(40.3321) \t post.\n", "tensor(40.5119) \t sanctions\n", "tensor(41.0595) \t russian\n", "tensor(41.3179) \t c.\n", "tensor(41.9050) \t kislyak\n", "tensor(43.6825) \t sergey\n", "tensor(44.4279) \t russia.\n", "tensor(44.6247) \t tehran\n", "tensor(44.7495) \t d.\n", "tensor(45.2658) \t ambassador\n", "tensor(46.1342) \t diplomatic\n" ] } ], "source": [ "ds, ixs = torch.sort(distances, 0)\n", "for i, d, j in zip(range(15), ds, ixs):\n", " j = j.item()\n", " if j > 0:\n", " print(d, '\\t', gensim_dict.state.model.id2token[j-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Moscow in the second place - good enough." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 } PK!m N7N7cranial/model_base.py""" base classes for models """ from abc import ABCMeta, abstractmethod import os from collections import OrderedDict from cranial.common import logger from cranial.re_iter import ReMap log = logger.get(name='model_base', var='MODELS_LOGLEVEL') # streaming log # Optional packages try: import dill as pickle except ImportError: import pickle try: import numpy as np except ImportError as e: log.info("Failed to import optional package: {}: {}.".format(type(e), e)) class NoMatch(Exception): pass class State(metaclass=ABCMeta): """ An object with save() & load() methods and any other properties. Notice that `Foo` below does NOT inherit from State. >>> class Foo(): ... def save(self): ... return ... def load(self): ... return self ... 
>>> isinstance(Foo(), State) True >>> isinstance(tuple(), State) False >>> s = State() >>> s.quest = 'grail' >>> str(s) 'quest = len 5 grail' >>> from tempfile import mkstemp >>> fh, tmpfile = mkstemp() >>> s.save(tmpfile) >>> t = State.load(tmpfile) >>> t.quest == s.quest True """ def __str__(self): ss = [] for attr in dir(self): if (not attr.startswith("_")) and (not hasattr(getattr(self, attr), '__call__')): attr_obj = getattr(self, attr) try: if isinstance(attr_obj, np.ndarray): attr_obj = "arr {} {}".format(attr_obj.shape, attr_obj) else: raise NoMatch() except (NoMatch, NameError) as e: if hasattr(attr_obj, "__len__"): attr_obj = "len {} {}".format(len(attr_obj), attr_obj) ss.append("{} = {}".format(attr, attr_obj)[:200]) return '\n'.join(ss) def save(self, fpath:str=None, connector=None) -> None: """ For now this just pickles the state into a file or puts it into a connector stream Parameters ---------- fpath direct path to a pickled file, or a `name` argument to pass to connector. In the latter case this name=fpath string will be appended after '/' separator to the base_address of the connector. For example, if connector.base_address = 'some/path' then the final destination will be 'some/path/fpath'. Note, fpath can be None only if connector is given and its base_address is a full file path connector An optional connector object, if None, then state will be saved to a file """ assert (fpath is not None) or (connector is not None), "either fpath or connector should be given" try: if connector is not None: log.info("Trying to save using {} to fpath={}".format( connector, fpath)) connector.put(source=pickle.dumps(self), name=fpath) log.info("Saved using {} to fpath={}".format(connector, fpath)) else: log.info("Trying to save {}".format(fpath)) with open(fpath, 'wb') as f: pickle.dump(self, f) log.info("Saved {}".format(fpath)) except pickle.PicklingError as e: log.error( """Pickling Error: Consider installing the `dill` module. 
It will automagically be used as a replacement for `pickle` when available.""") raise e @classmethod def load(cls, fpath:str=None, connector=None) -> 'State': """ Loads pickled state from file or a stream and returns it Parameters ---------- fpath direct path to a pickled file, or a `name` argument to pass to connector. In the latter case this name=fpath string will be appended after '/' separator to the base_address of the connector. For example, if connector.base_address = 'some/path' then the final destination will be 'some/path/fpath'. Note, fpath can be None only if connector is given and its base_address is a full file path connector An optional connector object, if None, then state will be loaded from a file Returns ------- loaded state object """ assert (fpath is not None) or (connector is not None), "either fpath or connector should be given" if connector is not None: assert not connector.do_read, "connector should not be in read mode (should return only a readable buffer)" assert connector.binary, "connector should be in binary mode" log.info("Trying to load using {} from fpath={}".format(connector, fpath)) obj = pickle.loads(connector.get(name=fpath).read()) log.info("Loaded using {} from fpath={}".format(connector, fpath)) else: log.info("Trying to load {}".format(fpath)) with open(fpath, 'rb') as f: obj = pickle.load(f) log.info("Loaded {}".format(fpath)) return obj @classmethod def __subclasshook__(cls, ClassObject): """This hook causes `isinstance(x, State)` to be True for any x which is an object or class having both save() and load() methods.""" if cls is State: if any("save" in B.__dict__ and "load" in B.__dict__ for B in ClassObject.__mro__): return True return NotImplemented class ModelBase(metaclass=ABCMeta): """A model that does not have a state, just implements a data transformation method. It still can have some class attributes that are temporary and do not need to be stored or modified >>> class Foo(): ... def transform(self, record): ... 
return ... def itransform(self, **params): ... return ... >>> isinstance(Foo(), ModelBase) True >>> isinstance(tuple(), ModelBase) False """ name = 'ModelBase' def __init__(self, **kwargs): self.proc_type = kwargs.pop('proc_type', None) self.n_proc = kwargs.pop('n_proc', 1) self.per_proc_buffer = kwargs.pop('per_proc_buffer', 1) for k, v in kwargs.items(): setattr(self, k, v) @abstractmethod def transform(self, record): """ A method to transform input data, one data-point/row/features/example at a time. Parameters ---------- record a single example of the data Returns ------- transformed data """ return record def itransform(self, iterable, iter_name=None): iter_name = self.__class__.__name__ if iter_name is None else iter_name return ReMap(iterable_input=iterable, fn=self.transform, name=iter_name, proc_type=self.proc_type, n_proc=self.n_proc, per_proc_buffer=self.per_proc_buffer) @classmethod def __subclasshook__(cls, ClassObject): if cls is ModelBase: if any("transform" in B.__dict__ and "itransform" in B.__dict__ for B in ClassObject.__mro__): return True return NotImplemented class StatefulModel(ModelBase, metaclass=ABCMeta): """ A BaseModel that needs a state, which needs to be trained. Trained state also needs to be saved/loaded into/from a file >>> class Foo(ModelBase): ... state = 1 ... def transform(self, record): ... return ... def train(self, iter): ... return ... 
>>> isinstance(Foo(), StatefulModel) True >>> isinstance(tuple(), StatefulModel) False """ name = 'StatefulModel' def __init__(self, **kwargs): """ When defining a specific model - set its default name to something appropriate for that model """ super(StatefulModel, self).__init__(**kwargs) self.state = State() @abstractmethod def train(self, iterable): """ method that modifies the state and returns self Parameters ---------- iterable data to use for training Returns ------- self """ return self def save(self, fpath:str=None, connector=None): """ A method to save the state Parameters ---------- fpath file path, or stream address, to save model's state into, if None, then self.name will be used as file path connector An optional connector object, if not None the pickled state will be put into that connector and fpath (or the self.name) will be added to the connector's base_address. For example, if connector.base_path = 'some/path', then final destination will be 'some/path/fpath' """ fpath = self.name if fpath is None else fpath self.state.save(fpath=fpath, connector=connector) def load(self, fpath:str=None, connector=None): """ a method to load model state from a file or a connector Parameters ---------- fpath file path, or stream address, to load model's state from, if None, then self.name will be used as file path appending '/'+self.name, if a direct file path desired, set append_name=False connector An optional connector object, if not None the pickled state will be read from that connector and fpath (or the self.name) will be added to the connector's base_address. 
For example, if connector.base_path = 'some/path', then final destination will be 'some/path/fpath' Returns ------- self """ fpath = self.name if fpath is None else fpath self.state = self.state.load(fpath=fpath, connector=connector) return self @classmethod def __subclasshook__(cls, ClassObject): if cls is StatefulModel: if any("train" in B.__dict__ and issubclass(B, ModelBase) and hasattr(B, 'state') for B in ClassObject.__mro__): return True return NotImplemented class ComposedModel(ModelBase): """ WORK IN PROGRESS a model that is composed of other models I would like to move towards a direction where there is a list (OrderedDict) of steps defined and then - make_transform will be standard and will just take those steps in order and compose - save/load will also go through steps and call save/load if they exist """ steps = OrderedDict() step_setups = OrderedDict() name = 'ComposedModel' def transform(self, record): raise Exception("model composition was not complete") @abstractmethod def _make_transform(self): """ This defines what composed transform should be like. 
Even though most of the time it's going to be a list of functions composed together, I am leaving this undefined for additional flexibility for things like composed_transform(x) = f1(f2(f1(x))) """ # theoretically this stuff should work, # but for now it is just an idea expressed in code, and the method is left abstract transform_steps = [] for name, step in self.steps.items(): if callable(step): fn = step elif isinstance(step, ModelBase): fn = step.transform else: raise Exception("step should be either callable or an instance of ModelBase") setup = self.step_setups.get(name) if setup is None: pass else: fn = setup.get('modifier', lambda x, **kwargs: x)(fn, **setup.get('kwargs', {})) transform_steps.append(fn) pass def save(self, fpath=None, connector=None): """ A master save method that saves all stateful step-models Parameters ---------- fpath a root path for the whole model, a directory with overall model name will be created at this location connector Optional connector, if given model will be saved to connector, if not, then model will save to disk at the specified `fpath` location """ for name, step in self.steps.items(): # cannot just check for Stateful, because of the onlineLearningWrapper if isinstance(step, ModelBase) and hasattr(step, 'save'): step_path = os.path.join(fpath, step.name) if connector is None else fpath step.save(fpath=step_path, connector=connector) def load(self, fpath=None, connector=None): """ A master load method that loads all stateful step-models Parameters ---------- fpath a root path for the whole model, a directory with overall model name should exist at this location connector Optional connector, if given model will be loaded from connector, if not, then model will try to load from disk at the specified `fpath` location """ for name, step in self.steps.items(): # cannot just check for Stateful, because of the onlineLearningWrapper if isinstance(step, ModelBase) and hasattr(step, 'load'): step_path = os.path.join(fpath, step.name) if 
connector is None else fpath self.steps[name] = step.load(fpath=step_path, connector=connector) self._make_transform() def _prepend_names(self): """ Must be run after steps are set up, otherwise save/load will have wrong paths """ for name, step in self.steps.items(): if isinstance(step, ModelBase) and not isinstance(step, ComposedModel): step.name = self.name + '/' + step.name self.steps[name] = step PK!3))cranial/models/gensim_models.py""" This file has primitive models that wrap around gensim common models such as LSI, TFIDF, etc... """ import gensim as g import os from cranial.re_iter import ReMap, DiskCache from cranial.model_base import StatefulModel, ModelBase from cranial.common import logger log = logger.get(name='gensim_models', var='MODELS_LOGLEVEL') # streaming log class GensimDictionary(StatefulModel): name = 'gensim_dictionary' def __init__(self, dict_params: dict, **kwargs): """ Wraps around gensim's Dictionary Parameters ---------- dict_params parameters that control dictionary building and filtering. This must have `no_below_raw`, `no_above_raw`, `max_n_raw`, `no_below`, `no_above`, `max_n` and `dict_filter_every`, and may optionally have `bad_tokens` kwargs any other kwargs to be passed to parent class __init__ """ super(GensimDictionary, self).__init__(**kwargs) self.params = dict_params self.state.model = None log.info("Init gensim dictionary with params:\n{}".format(dict_params)) def transform(self, record): """ Transforms a list of tokens into a bag-of-words (BoW) document Parameters ---------- record list of tokens Returns ------- BoW document: list of tuples (token_id, count) """ return self.state.model.doc2bow(record) def train(self, iterable): """ each item in iterable is a list of tokens """ log.info("Building gensim dictionary...") self.state.model = g.corpora.Dictionary() batch = [] for i, doc in enumerate(iterable): batch.append(doc) # occasionally dump into dictionary, if i > 0 and i % (self.params['dict_filter_every'] // 5) == 0: self.state.model.add_documents(batch) batch = [] # occasionally filter if i > 0 and i % 
self.params['dict_filter_every'] == 0:
                log.info("Current dictionary: {}".format(self.state.model))
                log.info("Filtering at {} documents".format(i))
                self.state.model.filter_extremes(no_below=self.params['no_below_raw'],
                                                 no_above=self.params['no_above_raw'],
                                                 keep_n=self.params['max_n_raw'])
                self.state.model.compactify()
                log.info("Now dictionary: {}".format(self.state.model))

        # finalize
        self.state.model.add_documents(batch)
        self.state.model.filter_extremes(no_below=self.params['no_below_raw'],
                                         no_above=self.params['no_above_raw'],
                                         keep_n=self.params['max_n_raw'])
        self.state.model.compactify()
        log.info("Final raw dictionary: {}".format(self.state.model))
        self._reduce_dictionary()
        self.state.model.id2token = {v: k for k, v in self.state.model.token2id.items()}
        return self

    def _reduce_dictionary(self):
        '''
        Make a smaller version of the dictionary: optionally drop the tokens listed in
        params['bad_tokens'], then apply the final no_below / no_above / max_n filter
        '''
        # optionally remove certain tokens
        if self.params.get('bad_tokens') is not None:
            bad_ids = [self.state.model.token2id[t] for t in self.params.get('bad_tokens')
                       if t in self.state.model.token2id.keys()]
            self.state.model.filter_tokens(bad_ids=bad_ids)

        # apply new filter
        self.state.model.filter_extremes(no_below=self.params['no_below'],
                                         no_above=self.params['no_above'],
                                         keep_n=self.params['max_n'])
        self.state.model.compactify()
        log.info("Final dictionary: {}".format(self.state.model))


class GensimSimilarity(StatefulModel):
    name = 'gensim_similarity'

    def __init__(self, sim_params: dict, **kwargs):
        """
        Wraps around gensim's Similarity index

        Parameters
        ----------
        sim_params
            kwargs to pass to gensim's Similarity initialization
            This must have `output_prefix` and `num_features`
        kwargs
            any other kwargs to be passed to parent class __init__
        """
        super(GensimSimilarity, self).__init__(**kwargs)
        self.params = sim_params
        self.state.model = None
        self.state.doc_index = []

    def transform(self, record):
        """
        Record is a gensim doc - list of tuples (dim_id, score)
        """
        if self.state.doc_index:
            return [(self.state.doc_index[ix], float(sc)) for ix, sc in self.state.model[record]]
        else:
            return [(int(ix), float(sc)) for ix, sc in self.state.model[record]]

    def train(self, iterable):
        """
        Either each item is a tuple (some_str_ID, doc), or just a doc,
        where doc is a list of tuples (dim_id, score)
        """
        iterable = DiskCache(iterable)
        self.state.doc_index = [itm[0] for itm in iterable]
        # ReMap takes the function first, then the iterable
        corpus = ReMap(lambda itm: itm[1], iterable)
        self.state.model = g.similarities.Similarity(corpus=corpus, **self.params)
        return self

    def update(self, iterable):
        """
        Either each item is a tuple (some_str_ID, doc), or just a doc,
        where doc is a list of tuples (dim_id, score)
        """
        iterable = DiskCache(iterable)
        self.state.doc_index.extend([itm[0] for itm in iterable])
        # ReMap takes the function first, then the iterable
        corpus = ReMap(lambda itm: itm[1], iterable)
        self.state.model.add_documents(corpus)
        return self


class GensimLSI(StatefulModel):
    name = 'gensim_lsi'

    def __init__(self, lsi_params: dict, id2word: dict = None, **kwargs):
        """
        Wraps around gensim's LSI model

        Parameters
        ----------
        lsi_params
            kwargs to pass to gensim's LSI model initialization
        id2word
            id2word to pass to gensim's LSI model initialization, separate from lsi_params
            because id2word needs to be obtained by training a dictionary
        kwargs
            any other kwargs to be passed to parent class __init__
        """
        super(GensimLSI, self).__init__(**kwargs)
        self.params = lsi_params
        self.id2word = id2word
        self.state.model = None
        self.state.topic_names = []
        log.info("Init gensim LSI with params:\n{}".format(self.params))

    def transform(self, record):
        return [(int(ix), float(sc)) for ix, sc in self.state.model[record]]

    def train(self, iterable):
        self.state.model = g.models.LsiModel(corpus=iterable, id2word=self.id2word, **self.params)

        # need topic names
        self.state.topic_names = []
        for i in range(self.state.model.num_topics):
            vals = self.state.model.show_topic(i, 100)
            v0 = abs(vals[0][1])
            vals = [w for w, v in vals if abs(v) > 0.1 * v0][:20]
            name = ' '.join(vals)
            self.state.topic_names.append(name)
        return self


class GensimTFIDF(ModelBase):
    name = 'gensim_tfidf'

    def __init__(self, gensim_dictionary=None, **kwargs):
        """
        Wraps around gensim's TFIDF model for the sake of standardization

        Parameters
        ----------
        gensim_dictionary
            gensim's native dictionary, that's enough to make a TFIDF model
        kwargs
            any other kwargs to be passed to parent class __init__
        """
        super(GensimTFIDF, self).__init__(**kwargs)
        self.gensim_dictionary = gensim_dictionary
        self.tfidf = g.models.TfidfModel(dictionary=gensim_dictionary)

    def transform(self, record):
        return self.tfidf[record]


class GensimLDA(StatefulModel):
    name = 'gensim_lda'

    def __init__(self, lda_params: dict, id2word: dict = None, **kwargs):
        """
        Wraps around gensim's LDA model

        Parameters
        ----------
        lda_params
            kwargs to pass to gensim's LDA model initialization
        id2word
            id2word to pass to gensim's LDA model initialization, separate from lda_params
            because id2word needs to be obtained by training a dictionary
        kwargs
            any other kwargs to be passed to parent class __init__
        """
        super(GensimLDA, self).__init__(**kwargs)
        self.params = lda_params
        self.id2word = id2word
        self.state.model = None
        self.state.topic_names = []
        self.state.token2topics = {}
        log.info("Init gensim LDA with params:\n{}".format(self.params))

    def transform(self, record):
        return [(int(ix), float(sc)) for ix, sc in self.state.model[record]]

    def train(self, iterable):
        self.state.model = g.models.LdaMulticore(corpus=iterable, id2word=self.id2word, **self.params)

        # need topic names
        self.state.topic_names = []
        for i in range(self.state.model.num_topics):
            vals = self.state.model.show_topic(i, 100)
            v0 = abs(vals[0][1])
            vals = [w for w, v in vals if abs(v) > 0.1 * v0][:20]
            name = ' '.join(vals)
            self.state.topic_names.append(name)

        # need token2topics (token str -> list of topic names)
        for i in range(self.state.model.num_terms):
            topics = self.state.model.get_term_topics(i)
            if len(topics) > 0:
                self.state.token2topics[self.state.model.id2word[i]] = [self.state.topic_names[t[0]] for t in topics]
        return self


def norm_gensim_vec(g_vec):
    """
    Normalize a gensim-style vector
    """
    norm_sq = sum([t[1] ** 2 for t in g_vec])
    if norm_sq == 0:
        return g_vec
    norm = norm_sq ** 0.5
    return [(id_, val / norm) for id_, val in g_vec]


def get_gensim_vec_to_list_fn(dims=None):
    """
    creates a function to use in map that transforms a gensim-style vector to just a vector:
        [(0, val_0), (1, val_1), ..., (n, val_n)] --> [val_0, val_1, ..., val_n]

    Parameters
    ----------
    dims
        give number of dimensions if gensim vector is sparse, to make sure the resulting vector is dense

    Returns
    -------
        a function that maps a gensim-style vector to a list of values
    """
    if dims is None:
        def gensim_vec_to_list(g_vec):
            return [val for _, val in g_vec]
    else:
        def gensim_vec_to_list(g_vec):
            vec = [0] * dims
            for i, val in g_vec:
                vec[i] = val
            return vec

    return gensim_vec_to_list

# cranial/models/nlp.py
"""
nlp models
"""
import collections
import os
import numpy as np
from collections import Counter

from cranial.common import logger
from cranial.model_base import StatefulModel

log = logger.get(name='nlp_models', var='MODELS_LOGLEVEL')


class BasicDictionary(StatefulModel):
    name = 'basic_dictionary'

    def __init__(self, no_below_raw, no_above_raw, max_num_raw, no_below, no_above, max_num,
                 filter_at=100000, token_is_tuple=False, protected_tokens=None, **kwargs):
        """
        A custom class for creating a dictionary of tokens from given texts

        When creating a dictionary from a long list of texts, the dictionary is filtered at
        intermediate points; how often this happens is set by filter_at.
        Parameters
        ----------
        no_below_raw
            min token frequency (float 0 to 1 or int number of documents) to keep during intermediate filtering
        no_above_raw
            max token frequency (float 0 to 1) to keep during intermediate filtering
        max_num_raw
            max number of most frequent tokens to keep during intermediate filtering
        no_below
            min token frequency (float between 0 and 1, or an int number of documents it occurred in)
            to be kept in the final dictionary
        no_above
            max token frequency (a float between 0 and 1) to be kept in the final dictionary
        max_num
            max number of tokens in the final dictionary
        filter_at
            frequency of intermediate filtering specified in numbers of training items (docs) passed through
        token_is_tuple
            set True to account for Dandelion annotations where each 'token' in a document is a tuple (token, score)
        protected_tokens
            list of tokens that should always be in the dictionary, essentially added even if
            not encountered in documents
        kwargs
            additional kwargs passed to parent class constructor
        """
        super(BasicDictionary, self).__init__(**kwargs)
        self.no_below_raw = no_below_raw
        self.no_above_raw = no_above_raw
        self.max_num_raw = max_num_raw
        self.no_below = no_below
        self.no_above = no_above
        self.max_num = max_num
        self.filter_at = filter_at
        self.token_is_tuple = token_is_tuple
        self.protected_tokens = protected_tokens
        self.state.frequency = Counter()
        self.state.doc_frequency = Counter()
        self.state.id2token = []
        self.state.token2id = {}
        self.state.size = 0
        self._num_docs = 0

    def __repr__(self):
        return 'BasicDictionary with {} tokens:\t"{}"'.format(
            len(self.state.frequency),
            '", "'.join([itm[0] for itm in collections.Counter(self.state.frequency).most_common(20)])
        )

    def train(self, iterable):
        """
        iterable is a list of lists of tokens

        Parameters
        ----------
        iterable

        Returns
        -------
            self
        """
        for self._num_docs, doc in enumerate(iterable):
            if self.token_is_tuple:
                doc = [t[0] for t in doc]
            if self.protected_tokens is not None:
                doc = [t for t in doc if t not in self.protected_tokens]
            self.state.frequency.update(doc)
            self.state.doc_frequency.update(set(doc))
            if self._num_docs > 0 and self._num_docs % self.filter_at == 0:
                log.info(self.__repr__())
                log.info("filtering at {}".format(self._num_docs))
                self._filter_tokens(self.no_above_raw, self.no_below_raw, self.max_num_raw)
        # enumerate leaves the last index here; add one to get the document count
        self._num_docs += 1
        log.info(self.__repr__())
        log.info("Total docs: {}.\tFinal filter:".format(self._num_docs))
        self._filter_tokens(self.no_above, self.no_below, 0)  # will decrease number in the next step
        max_num = self.max_num if self.max_num > 0 else len(self.state.frequency)
        self.state.id2token = [t for t, v in self.state.frequency.most_common(max_num)]
        self.state.frequency = {t: self.state.frequency[t] for t in self.state.id2token}  # fix counters
        self.state.doc_frequency = {t: self.state.doc_frequency[t] for t in self.state.id2token}  # fix counters
        self.state.token2id = {t: i for i, t in enumerate(self.state.id2token)}
        if self.protected_tokens is not None:
            self.state.token2id.update({t: t for t in self.protected_tokens})
        self.state.size = len(self.state.id2token)
        return self

    def _filter_tokens(self, no_above, no_below, max_num):
        """
        helper method to filter dictionary
        """
        max_freq = int(self._num_docs * no_above)
        if max_freq > 0:
            log.info("filtering for frequency <= {}".format(max_freq))
            bad_tokens = [t for t, v in self.state.doc_frequency.items() if v > max_freq]
            for t in bad_tokens:
                self.state.doc_frequency.pop(t)
                self.state.frequency.pop(t)

        min_freq = no_below * self._num_docs if no_below < 1 else no_below
        if max_num > 0 and max_num < len(self.state.frequency):
            freqs = sorted(self.state.frequency.values(), reverse=True)
            min_freq = max(freqs[max_num], max_freq)

        if min_freq > 0:
            log.info("filtering for frequency > {}".format(min_freq))
            bad_tokens = [t for t, v in self.state.frequency.items() if v < min_freq]
            for t in bad_tokens:
                self.state.doc_frequency.pop(t)
                self.state.frequency.pop(t)

    def transform(self, record):
        """
        record is
        a list of tokens, transform into a list of IDs

        Parameters
        ----------
        record
            list of tokens

        Returns
        -------
            list of IDs
        """
        if len(record) == 0:
            return []
        if self.token_is_tuple:
            return [(self.state.token2id[t], v) for t, v in record if t in self.state.token2id.keys()]
        if isinstance(record[0], str):
            return [self.state.token2id[t] for t in record if t in self.state.token2id.keys()]
        elif isinstance(record[0], (list, tuple, np.ndarray)):
            return [self.transform(sub_list) for sub_list in record]

# cranial/models/reporting.py
from cranial.messaging import Async_WrapperPool
from cranial.model_base import ModelBase
from abc import ABCMeta, abstractmethod


class ReporterBase(ModelBase, metaclass=ABCMeta):
    name = "metric reporter"

    def __init__(self, apply_funcs, **kwargs):
        """
        This is just a step to be used in a model (or dataset or consumer) transformations chain

        because calculations of metrics will happen in global scope, make metrics as simple as possible
        to avoid slowing down the whole model

        Parameters
        ----------
        apply_funcs
            single or a list of functions that will be calculated for every data record
            usually this should be at least a function that serializes data because only str/bytes can be sent
        kwargs
            passed to a parent class
        """
        super(ReporterBase, self).__init__(**kwargs)
        self.apply_funcs = apply_funcs if isinstance(apply_funcs, list) else [apply_funcs]

    @abstractmethod
    def report(self, record):
        """
        calculate metric and define how to send, messenger.notify or notifier.send, etc...
        Parameters
        ----------
        record
            a current data record
        """
        values = [fn(record) for fn in self.apply_funcs]
        pass

    def transform(self, record):
        """
        This is the transformation for this step, all it does is report the current data record and then
        returns back what it got

        Parameters
        ----------
        record
            incoming data

        Returns
        -------
            same record
        """
        self.report(record)
        return record


class MessengerReporter(ReporterBase):
    name = "messenger metric reporter"

    def __init__(self, apply_funcs, messenger, **kwargs):
        """
        reporter that uses messenger

        Parameters
        ----------
        apply_funcs
            optional single or a list of functions that will be calculated for every data record
            usually this should be at least a function that serializes data because only str/bytes can be sent
        messenger
            messenger that will send messages through all of its notifiers for every incoming data
        kwargs
            passed to a parent class
        """
        super(MessengerReporter, self).__init__(apply_funcs, **kwargs)
        self.messenger = messenger

    def report(self, record):
        for fn in self.apply_funcs:
            self.messenger.notify(fn(record))


class NotifierReporter(ReporterBase):
    name = "notifier metric reporter"

    def __init__(self, apply_funcs, notifier, address, endpoint, n_threads=None, **kwargs):
        """
        reporter that uses a single notifier

        Parameters
        ----------
        apply_funcs
            optional single or a list of functions that will be calculated for every data record
            usually this should be at least a function that serializes data because only str/bytes can be sent
        notifier
            notifier that will send messages
        address
            address to use in notifier.send()
        endpoint
            endpoint to use in notifier.send()
        n_threads
            if not none, make sending async with specified number of threads
        kwargs
            passed to a parent class
        """
        super(NotifierReporter, self).__init__(apply_funcs, **kwargs)
        self.notifier = notifier
        self.n_threads = n_threads
        self.address = address
        self.endpoint = endpoint

        # choose which reporting method to use
        if n_threads is not None and (not isinstance(self.notifier,
                                                     Async_WrapperPool)):
            self.notifier = Async_WrapperPool(notifier, n_threads=self.n_threads)

    def report(self, record):
        for fn in self.apply_funcs:
            self.notifier.send(message=fn(record), address=self.address, endpoint=self.endpoint)

# cranial/models/spacy_tokenizers.py
"""
tokenizers that use spacy
"""
import spacy

from cranial.common import logger
from cranial.re_iter import ReGenerator
from cranial.model_base import ModelBase
from cranial.models.tokenizers import add_n_grams

log = logger.get(name='tokenizers_spacy', var='MODELS_LOGLEVEL')  # streaming log


class SpacyWrapper(ModelBase):
    name = 'spacy_wrapper'

    def __init__(self, lang='en', in_field=None, out_field=None, batch_size=10000, n_threads=1, **spacy_load_params):
        """
        Use spaCy to transform text records into spacy document objects.

        Parameters
        ----------
        lang
            name of the spaCy model to load
        in_field
            key that holds the text when each record is a dictionary, leave as None if each record is a text
        out_field
            key under which the parsed doc is stored in each record, must be given together with in_field
        batch_size
            batch size passed to spaCy's pipe()
        n_threads
            number of threads passed to spaCy's pipe()
        spacy_load_params
            any other kwargs, passed to the parent class __init__ and to spacy.load()
        """
        super().__init__(**spacy_load_params)
        self.lang = lang
        self.in_field = in_field
        self.out_field = out_field
        assert (self.in_field is None and self.out_field is None) or \
               (self.in_field is not None and self.out_field is not None)
        self.batch_size = batch_size
        self.n_threads = n_threads
        log.info("loading spacy...")
        self.nlp = spacy.load(lang, **spacy_load_params)

    def transform(self, record: str):
        """
        transform one text into spacy doc

        Parameters
        ----------
        record
            text to transform

        Returns
        -------
            spacy doc
        """
        # spaCy-fy
        return self.nlp(record)

    def itransform(self, iterable, iter_name=None):
        """
        use spacy built-in multiprocessing to transform an iterable of texts

        Parameters
        ----------
        iterable

        iter_name

        Returns
        -------
            generator
        """
        if self.in_field is None:
            texts = iterable
            return ReGenerator(
                lambda: (doc for doc in
                         self.nlp.pipe(texts, batch_size=self.batch_size, n_threads=self.n_threads)))
        else:
            texts = (itm[self.in_field] for itm in iterable)
            return ReGenerator(
                lambda: (
                    {**d, self.out_field:
doc} for d, doc in zip(iterable, self.nlp.pipe(texts, batch_size=self.batch_size, n_threads=self.n_threads)) ) )PK! Izr]],cranial/models/tests/data/dandelion_nex.json{"dandelion": {"field_1": {"annotations": [{"categories": ["Some Category 1", "Some category"], "alternateLabels": ["alt label", "another alt label"], "label": "label_1", "confidence": 0.75}, {"label": "label_1", "confidence": 0.8, "types": ["http://dbpedia.org/ontology/Person", "http://dbpedia.org/ontology/Agent"]}, {"label": "label_3", "confidence": 0.5, "lod": {"wikipedia": "http://en.wikipedia.org/wiki/Salary", "dbpedia": "http://dbpedia.org/resource/Salary"}}]}, "field_2": {"annotations": [{"categories": ["Some Category 2", "The category"], "alternateLabels": ["third alt label"], "label": "label_1", "confidence": 0.9}, {"label": "label_1", "confidence": 0.5, "types": ["http://dbpedia.org/ontology/Person", "http://dbpedia.org/ontology/Agent"]}, {"label": "label_3", "confidence": 0.4, "lod": {"wikipedia": "http://en.wikipedia.org/wiki/good"}}]}}}PK!a{{(cranial/models/tests/data/just_texts.txttrump fires comey as fbi director; justice department blames mishandling of clinton's emails last year president trump fired fbi director james b. comey on tuesday, stunning washington with a decision that came as the bureau investigates whether the president's associates had colluded with russian agents to influence the 2016 presidential election. . the abrupt ouster was needed to allow a "new beginning" at the bureau, trump said. the firing caught comey by surprise as he spoke to fbi agents at an event in los angeles. “he was caught flat-footed,” a senior fbi official told reporters before comey headed back to washington, skipping a scheduled recruiting event in hollywood. the dismissal drew immediate calls from senior democrats for an independent prosecutor to oversee the criminal inquiry. even some members of the president's own party expressed concern. sen. 
richard burr (r-n.c.), the head of the senate intelligence committee, said he was “troubled by the timing and reasoning” of the dismissal, which he said “further confuses an already difficult investigation.” comey's departure, he added, was “a loss for the bureau and the nation.” trump said he had relied on the recommendation of the new deputy atty. gen. rod rosenstein, a career prosecutor who is overseeing the fbi's handling of the russia investigation because atty. gen. jeff sessions has stepped aside from any role in it. in a memorandum to sessions, which was released by the white house, rosenstein harshly criticized comey for his actions beginning last july, when comey held a news conference to announce that the fbi would not seek charges against clinton in the email investigation but also denounced her conduct. that was a serious misjudgment, rosenstein said, adding, "the goal of a federal criminal investigation is not to announce our thoughts at a press conference." comey's actions were “a textbook example of what federal prosecutors and agents are taught not to do,” he wrote. rosenstein said comey made the problems worse with his decision in late october — 11 days before the election — to disclose that the fbi had reopened its investigation of clinton after finding state department emails on a computer belonging to former rep. anthony weiner, the subject of a separate investigation and the estranged husband of clinton's aide huma abedin. after a week, the fbi determined that those emails added no significant new evidence to the case. clinton has blamed the comey letter for contributing to her defeat, although polling evidence on that point is unclear. trump loudly praised comey's announcement at the time. republicans insisted that both the fbi and congressional investigations of russia's actions would continue without white house interference. several stressed the need for trump to appoint an independent figure to head the fbi. 
"his removal at this particular time will raise questions," sen. bob corker (r-tenn.) said in a notable understatement. for the last 10 months, comey has come under sharp and widespread criticism from figures in both parties for his handling of two investigations connected to the election — the counterintelligence investigation of russia's role and the inquiry into democratic nominee hillary clinton's email practices while she was secretary of state. despite their criticism, democrats said that nothing comey did last year justified trump's firing him now. in statements, leading democratic lawmakers called his ouster during the ongoing russia investigation “outrageous” and said it was “not what an innocent person would do.” they warned that comey's dismissal could lead to a white house effort to shut down the fbi investigation. “no one should accept president trump's absurd justification” for the firing, declared sen. patrick j. leahy (d-vt.), the former head of the senate judiciary committee. “the president has removed the sitting fbi director in the midst of one of the most critical national security investigations in the history of our country — one that implicates senior officials in the trump campaign and administration. this is nothing less than nixonian,” leahy said. senate democratic leader charles e. schumer of new york said he told trump, who called to notify him before making the firing public, "you're making a very big mistake." although the fbi director serves a fixed 10-year term, which is supposed to insulate him from political pressure, previous presidents of both parties have taken the position that as an officer of the executive branch, the director can be fired by the president. 
comey told the senate judiciary committee last week that the criticism he had received for his actions in the campaign had been “painful.” “i've gotten all kinds of rocks thrown at me, and this has been really hard, but i think i've done the right thing at each turn,” he testified. he added that he welcomed an fbi inspector general's review of his conduct, which was announced in january. but comey argued that he had no choice but to disclose the renewed investigation just before an election and not "conceal" it. rosenstein sharply disagreed. prosecutors should never disclose nonpublic information about investigations, he wrote. "silence is not concealment." given comey's errors and his refusal to admit that they were mistakes, rosenstein continued, "the fbi is unlikely to regain public and congressional trust until it has a director who understands the gravity of the mistakes and pledges never to repeat them." sessions, in a letter to trump, said that he was recommending comey's dismissal "for the reasons expressed by the deputy attorney general" and in order for the department to "clearly reaffirm its commitment to longstanding principles" of proper conduct by investigators. trump, in a letter to comey informing him of his dismissal, said he had accepted the recommendation. he added that he "greatly appreciate[d] you informing me on three separate occasions, that i am not under investigation." white house press secretary sean spicer announced the decision to reporters tuesday evening, saying that trump had "accepted the recommendation of the attorney general and the deputy attorney general regarding the dismissal of the director of the federal bureau of investigation." in a statement, the white house quoted trump as saying that “the fbi is one of our nation's most cherished and respected institutions and today will mark a new beginning for our crown jewel of law enforcement." a search for a new permanent fbi director will begin immediately, the statement said. 
but nominating and ultimately confirming a new fbi director in such a politically toxic environment will be an extraordinarily difficult task. democrats will intensely scrutinize any trump pick in part because of the president's comments and those of his administration about the judiciary and investigatory agencies. just tuesday, the white house questioned the public assertions and private actions of former deputy atty. gen. sally yates, who testified monday about concerns she had raised to trump officials that then-national security advisor michael flynn had been compromised through misleading public statements about his interactions with russian officials. spicer suggested yates was acting as a pro-clinton partisan, and said without evidence that she was "widely rumored to play a large role" in a clinton administration. sen. dianne feinstein of california, the top democrat on the senate judiciary committee, said trump called her at 5:30 p.m. to relay his decision. "the next fbi director must be strong and independent and will receive a fair hearing," she said. obama nominated comey in 2013 to replace robert mueller, who had served beyond the typical 10-year term of an fbi director in part because of the difficulty in finding a replacement amid continuing national security threats. comey was easily confirmed by the senate. in choosing comey, a republican, the obama administration highlighted his credentials as a federal prosecutor and his apolitical manner. during his time as acting attorney general under president george w. bush, comey had threatened to resign rather than bow to administration pressure to authorize secret surveillance of telephone calls by the national security agency without judicial approval. but the campaign tested his reputation for nonpartisanship. prominent democrats faulted comey for not disclosing the extent of the trump-russia inquiry during the campaign, in contrast with his very public role in discussing the clinton investigation. 
trump at first appeared inclined to keep comey in his position. just two days after his inauguration, trump singled him out during a gathering with law enforcement officials in the blue room, shaking his hand and patting him on the back. "he's become more famous than me," the president quipped. speaking to reporters at his afternoon briefing, spicer hedged on whether comey still had the president's confidence after the fbi confirmed he had given inaccurate testimony to a congressional panel this week about the clinton email investigation. "i have not asked the president since the last time we spoke about this," spicer said. he returned to the briefing room hours later with news of the firing. times staff writers noah bierman, evan halper, lisa mascaro and joseph tanfani contributed to this report. fbi director james comey fired mt. hebron's sydney robinson down the stretch to win the girls 200 meter dash during the howard county track and field championships at river hill high school on tuesday, may 9. wilde lake's christian saulsbury, left, leads atholton's brandon houston in the final of the boys 200 meter dash during the howard county track and field championships at river hill high school on tuesday, may 9. washington, dc - january 22: u.s. president donald trump (c) shakes hands with james comey, director washington, dc - january 22: u.s. president donald trump (c) shakes hands with james comey, director of the federal bureau of investigation (fbi), during an inaugural law enforcement officers and first responders reception in the blue room of the white house on january 22, 2017 in washington, dc. trump today mocked protesters who gathered for large demonstrations across the u.s. and the world on saturday to signal discontent with his leadership, but later offered a more conciliatory tone, saying he recognized such marches as a "hallmark of our democracy." 
(photo by andrew harrer-pool/getty images) ** outs - elsent, fpg, cm - outs * nm, ph, va if sourced by ct, la or mod ** trump's shifting stances on fbi director james comey over the past year, president trump has been all over the map when it comes to his feelings about fbi director james comey . first, comey was allegedly corrupt. then, he was gutsy. then, he was respectable. and then, suddenly, he was no longer fit to hold office. july 5, 2016 trump says the system is ‘rigged' after comey announces no charges against clinton last year, comey found himself in an unusual position as his special agents investigated hillary clinton over her use of a private email server while she was secretary of state. normally, the justice department decides whether to bring criminal charges in a case. but u.s. attorney general loretta lynch had to distance herself from the email investigation after a brief and ill-advised meeting at an airport with clinton's husband, former president bill clinton, raised questions about her neutrality. lynch said she would accept whatever recommendations were made by career prosecutors and comey. that's what led comey to take the unusual step of publicly announcing in july why he was recommending that prosecutors file no charges against clinton. the fbi investigation focused on whether clinton had improperly shared classified information over the server, and although he concluded that the democratic candidate shouldn't face any charges, he was hardly complimentary. “although we did not find clear evidence that secretary clinton or her colleagues intended to violate laws governing the handling of classified information, there is evidence that they were extremely careless in their handling of very sensitive, highly classified information,” comey said . trump, who would later embrace anti-clinton cries from his supporters to “lock her up,” was disappointed in comey's announcement, tweeting the hashtag #riggedsystem. 
https://twitter.com/realdonaldtrump/status/750353319084843008 oct. 17, 2016 trump alleges ‘collusion' and ‘corruption at the highest level' of the government as the presidential campaign entered its closing stretch, trump's allegations of corruption grew deeper after the fbi released documents that showed officials discussing a possible “quid pro quo” between the fbi and the state department over clinton's private email server. as the los angeles times reported then : according to the documents, which were based on interviews with fbi agents, a high-ranking state department official allegedly sought to pressure the bureau into changing the classification of an email related to the benghazi attack in exchange for agreeing to help place more fbi agents in places like iraq, where they are restricted. officials said that no deal was made, and that the classification of the email was not changed -- nor were more fbi agents sent to iraq. but trump depicted it as a sign of “corruption at the highest level.” "this is very big, and frankly it's unbelievable,” trump said in a video statement. “what was just found out is the department of justice, the state department, and the fbi colluded - got together - to make hillary clinton look less guilty and look a letter than she looks. this is one of the big breaking stories of our time, in my opinion." trump later suggested in an oct. 27 interview with abc news that comey didn't just make a “mistake,” but that “something happened.” https://twitter.com/realdonaldtrump/status/788123233442824192 oct. 28, 2016 after comey re-opens clinton investigation, trump says: ‘i have great respect for the fbi' and comey ‘brought back his reputation' after comey informed congress — in a now-infamous letter -- that the fbi was reopening the investigation into clinton's emails, trump exulted in front of a roaring campaign crowd. 
“the f.b.i.,” trump said — pausing as the crowd cheered -- “after discovering new emails is re-opening their investigation into hillary clinton.” trump added: “i have great respect for the fbi for righting this wrong.” trump said later: “it took guts for director comey to make the move that he made in light of the kind of opposition he had where they're trying to protect her from criminal prosecution.” trump said of comey: “i was not his fan, but i'll tell you what: what he did, he brought back his reputation. he brought it back.” jan. 24, 2017 now president, trump bro-hugs comey, decides to keep him on: ‘he's become more famous than me' in an interview with “60 minutes” shortly after winning the election in november, trump declined to say whether he would retain comey. “i haven't made up my mind,” trump said, adding: “i respect him a lot. i respect the fbi a lot.” after trump took the oath of office in january, the white house announced that he planned to keep comey. the fbi chief had been part of a delegation of intelligence and security officials who went to trump tower in december to brief the president-elect on evidence that russia had interfered in the election in an effort to help trump win. in a public reception at the white house jan. 22, two days after his inauguration, trump greeted comey warmly — telling an audience, “he's become more famous than me .” he shook comey's hand and gave him a gentle bro-hug, patting him on the back. may 2, 2017 trump turns on comey again: ‘comey was the best thing that ever happened to hillary clinton' as the months wore on, comey cast a shadow over the trump administration. 
in march, he testified before congress that the fbi was still investigating russia's interference in the presidential election — adding: “that includes investigating the nature of any links between individuals associated with the trump campaign and the russian government, and whether there was any coordination between the campaign and russia's efforts.” on may 2, hillary clinton said during an appearance in new york that comey's october bombshell — his disclosure, shortly before the election, that he was reopening the fbi's email inquiry — cost her the presidency. trump took to twitter. https://twitter.com/realdonaldtrump/status/859601184285491201 https://twitter.com/realdonaldtrump/status/859604996236742656 asked about the president's tweets the next day, white house press secretary sean spicer said: “the president has confidence in the director. but i think, clearly, his point was after some of the comments that were made yesterday regarding the reason for the outcome of the election, i think he just wanted to make it clear what exactly happened.” may 9, 2017 trump fires comey in a surprise move, trump fired comey a week later. trump cited letters from the attorney general jeff sessions and the deputy atty. gen. rod j. rosenstein recommending that comey be removed. the meat of the case against comey came from rosenstein, who wrote that comey had acted inappropriately by going public with his reasons for not pursuing criminal charges against clinton. rosenstein said comey had laid out “his version of the facts for the news media as if it were a closing argument, but without a trial." in trump's letter of dismissal to comey, he added a personal twist: "while i greatly appreciate you informing me, on three separate occasions, that i am not under investigation, i nevertheless concur with the judgment of the department of justice that you are not able to effectively lead the bureau.” trump added: “i wish you the best of luck in your future endeavors." 
boyle heights, ca november 8, 2016 -- people voting inside the restaurant el mercado de los angeles, located in the boyle heights district of the city of los angeles, ca november 8, 2016. (francine orr/ los angeles times) bethany webb, an organizer of indivisible oc, holds a milk carton with congressman dana rohrabacher's picture on it while protesting outside his office in huntington beach on tuesday, may 9. mike fowler protests outside congressman dana rohrabacher's office in huntington beach on tuesday, may 9. alex mathews takes video of protesters outside of congressman dana rohrabacher's office in huntington beach on tuesday, may 9. members of indivisible oc protest outside congressman dana rohrabacher's office in huntington beach on tuesday, may 9. demonstrators protest rep. rohrabacher's support of american health care act about 30 demonstrators, some holding signs that read “putin's favorite congressman,” “dump dana 2018” and “just say no to trumpcare,” gathered outside rep. dana rohrabacher's office in downtown huntington beach on tuesday afternoon to protest the republican congressman's support for president trump's policies. 
the gathering was organized by indivisible oc 48, a left-leaning group of constituents in rohrabacher's 48th congressional district who have planned protests outside his office at 101 main st. since trump's inauguration. they started at the office and marched down the street to the pier. indivisible oc's aim has been to coax the congressman into holding a town hall meeting to discuss what indivisible members say are troubling issues stemming from the white house: the attempt to repeal the affordable care act , trump's attitude toward russia, and a perceived general “nastiness” toward minorities and the working class. “it's appalling,” said newport beach resident jim percival. “if you're representing the people you should make yourself available instead of hiding. people like rohrabacher don't give a hoot about the little people.” in february, a scuffle at rohrabacher's office resulted in injury to a staff member and a 2-year-old girl, who was knocked in the head by a door. rohrabacher responded to the incident by saying the activists were involved in “political thuggery.” since then, activists said they have remained outside of his office during their demonstrations. a few people tuesday held milk cartons bearing a picture of rohrabacher and reading “have you seen dana?” while activists expressed disdain over the current administration, the house of representatives' recent passage of the american health care act, which would repeal and replace the affordable care act, drew fresh ire from the group. people chanted “shame on you” and “when healthcare is under attack, what do we do? stand up, fight back.” if the american health care act is approved by the senate and signed into law, it could lead to an estimated 24 million fewer americans with health insurance and could affect those on medicaid and with employer-provided health insurance, the los angeles times reported. 
rohrabacher, who supported the act, wrote in a press release following his vote that the affordable care act made healthcare too expensive for many americans and officials needed to address a “looming crisis.” “the republican healthcare proposal takes us in the right direction,” he wrote. “what we sent to the senate may not satisfy everyone, but it's vastly superior to the failing obamacare monstrosity.” rohrabacher could not be reached for further comment tuesday. aaron mccall, an organizer with indivisible oc 48, said he's primarily concerned about the bill's effect on individuals with preexisting conditions and the increasing cost of health insurance premiums. “my entire family has preexisting conditions,” he said. “this is going to affect the people in this district immensely.” demonstrators were expected to gather again at the pier tuesday evening to host a “die-in,” in which individuals would lie on yoga mats with tombstones noting their preexisting conditions and eventual causes of death. linda clough of costa mesa said trump and rohrabacher's views on climate change concern her the most. rohrabacher has been openly skeptical of global warming, disputing scientists' theory that man-made carbon emissions are primarily to blame. clough, who was an activist during the vietnam war era, said she was motivated by trump's election to get involved with her local indivisible oc chapter. “i've seen a lot in my lifetime,” she said, “but i've never been as terrified as i am now with this administration.” hannah.fry@latimes.com twitter: @hannahfrytcn leading off: mattingly on ejection streak, k-rod loses job a look at what's happening all around the majors today: ___ trifecta? miami's don mattingly has managed in just three innings over the first two games of a series against st. louis because of a pair of ejections. mattingly was thrown out by hunter wendelstedt in the second inning monday, then got tossed arguing balls and strikes with andy fletcher on tuesday. 
of course, mattingly is likely frustrated with more than just crew chief joe west's umpiring unit — his marlins entered tuesday on a 5-12 slide. trout update the angels are still hopeful of avoiding a disabled list stint for injured star mike trout . the reigning al mvp sat out a fourth straight game tuesday with a tight left hamstring, but planned to hit in the cage and play catch. he also plans to test the leg running and doing work in the outfield wednesday. trout is trying to dodge what would be the first trip to the dl in his career. be stephen strasburg and the nl east -leading nationals face the orioles in the third leg of a four-game series. strasburg (3-1, 2.66 era) will be making his first start outside of the division this season. he labored through 5 2/3 scoreless innings against philadelphia in his last start, walking four and throwing 119 pitches in a 4-2 win. wade miley (1-1, 2.27) was pulled from his previous start after two outs when he was struck by successive line drives. before that, he had walked 11 batters over 12 innings in his last two starts. closed out francisco rodriguez is out as tigers closer, replaced at the back of the bullpen by justin wilson . rodriguez ranks fourth all-time with 437 career saves, but he's blown four leads in 11 opportunities this season, including two over the weekend in oakland. the 35-year-old rodriguez trails lee smith by 41 saves for third on the career list, but he's struggled this year behind a fastball that's dropped to an average of 88 mph. first time hanley ramirez is set to make his first defensive appearance this season with the red sox playing an interleague series in milwaukee. ramirez wasn't in the lineup for tuesday's opener, but boston manager john farrell says he plans to start ramirez at first base in game 2. ramirez was boston's everyday first baseman last season but has been limited to serving as designated hitter this year because of a lingering right shoulder injury. 
welcome aboard tommy milone will start for the mets against san francisco following a whirlwind few days. the left-hander was claimed off waivers from milwaukee on sunday and traveled from his california home back to milwaukee following the claim, then arrived at citi field about 90 minutes before monday's game. milone hasn't pitched since a relief appearance april 29. new york will be milone's fifth major league team in seven years. ___ more ap baseball: https://apnews.com/tag/mlbbaseball copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. watch stephen colbert host a 'daily show' reunion on his 'late show' it was a rare tv reunion tuesday as stephen colbert played host to a gang of fellow "daily show" alums on a special edition of cbs' "the late show." along with jon stewart , former longtime anchor of the comedy central fake newscast, colbert welcomed samantha bee, john oliver , ed helms and rob corddry, all of whom, like colbert, sharpened their satirical skills and won fans as "daily show" correspondents before heading out on their own. colbert joked, "we haven't aged a day." a comic sketch flashed back to spoof colbert's departure from "the daily show" in 2005. "i can't believe you're leaving right in the middle of the george w. bush administration," bee told him. "there's never gonna be another president this good for comedy. this guy does something ridiculous like at least once every month!" "i can't believe you're leaving us," echoed helms. "it's like beyonce leaving destiny's child. we're never gonna hear from her again." "we're a family," said stewart, pretending to choke up, "but i guess i'm realizing that you'll all spread your wings and leave me." later, during his guest spot, stewart told colbert, "i've been reading about you." referring to colbert's attention-grabbing joke last week at the expense of president donald trump , he said, "you have a potty mouth." 
"that i do," colbert replied. "but might i say, i learned it from you, dad." growing serious, stewart told colbert, "the things that you say, even if they're crass, even if they in some ways are not respectful enough to the office of the presidency, can insult. but he can injure. for the life of me, i do not understand why in this country we try and hold comedians to a standard we do not hold our leaders to." stewart, who left "the daily show" in 2015, said he misses that platform. "the process of making the show somehow became intwined with my process of making sense of things that i didn't understand," he said, "so i miss that." later, colbert gathered all of his guests to chat in a semicircle of chairs. "this arrangement we have right now," he said, "is exactly something we would have made fun of on 'the daily show': it looks like a morning show." currently, stewart is developing a project for hbo, where oliver hosts "last week tonight." bee hosts "full frontal" on tbs. helms scored with "the hangover" and its sequels and "the office." corddry created and starred in the comedy series "childrens hospital." west hartford hosts cultural celebration dancers from the coogan irish dance school performed. west hartford hosts cultural celebration binaksha gharti magar performed a nepali style dance. pick 6: ou's brown, texas' williams among top ol in '17 two offensive linemen were selected in the first round of last month's nfl draft , the fewest since 1965. in 2017, the big guys up front are positioned to bounce back. with the help of former auburn guard and espn and sec network analyst cole cubelic, a look at some of the draft-eligible offensive linemen that will be vital to their team's performance in 2017 — and possible first-round picks in 2018. mike mcglinchey, ot, and quinton nelson, og, notre dame . the fighting irish are hoping the left side of their line can help lead a turnaround from last season's 4-8 mess. 
mcglinchey is a mountain at 6-foot-8 and 312 pounds, with potential to be the first offensive lineman taken in next year's draft. the 6-5, 330-pound nelson is a dominant run blocker. "he's an absolute mauler," cubelic said. orlando brown, ot, oklahoma the sooners have to replace some top-flight talent at running back and wide receiver, but the offensive line should make life easy on the new playmakers. brown is the leader. at 6-7 and 340, brown took a big leap forward from his redshirt freshman season to last year in fundamentals and technique. athletically, he has all the tools. "your elite prototypical tackle frame," cubelic said. mitch hyatt, ot, clemson hyatt was a huge recruit who has been starting since his freshman season. he has made steady progress from solid in 2015 to very good in 2016. this season the tigers expect him to be one of the best offensive linemen in the country and a leader of an offense that will have plenty of new faces at the skill positions. martinas rankin, ot, mississippi state rankin is a junior college transfer who redshirted his first season with the bulldogs in 2015. he needed some time to develop last season, but by the end of 2016 he was showing star potential. frank ragnow, c, arkansas how important is frank ragnow to the razorbacks? arkansas coach bret bielema held his center out of full-contact scrimmages this spring to preserve a key asset. ragnow is a three-year starter who passed on a chance to jump into the nfl draft after last season when he graded out as the top offensive lineman in the country by pro football focus. connor williams, ot, texas williams has grown into his massive frame during his two seasons at texas while retaining quickness and athleticism. the 6-6, 320-pound junior is the best player on an offensive line that has experienced players but needs better overall play. remarkably, texas has not had an offensive lineman drafted since 2008. williams is likely to break that string. 
___ extra point six more draft-eligible offensive linemen looking to dominate in 2017: trey adams, ot, washington; mason cole, c, michigan; tyrell crosby, ot, oregon; martez ivey, og, florida; jamarco jones, ot, ohio state; brock ruble, ot, florida state. ___ follow ralph d. russo at www.twitter.com/ralphdrussoap ___ more college football coverage: http://collegefootball.ap.org/ copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. west hartford hosts cultural celebration hello! west hartford's board members include bepsie perry, manuela canales, clare taylor neseralla, daliah fazuia, sajuti sajitnu, tony magno, and mirtica aldave. west hartford hosts cultural celebration hall high school student samantha acosta sang the national anthem. west hartford hosts cultural celebration a large crowd turned out to hello! west hartford's sixth annual cultural celebration. west hartford hosts cultural celebration sinfronteras performed south american music. west hartford hosts cultural celebration hall high school's international aid club was in attendance. west hartford hosts cultural celebration a large crowd turned out to hello! west hartford's sixth annual cultural celebration. west hartford hosts cultural celebration conard high school student grace andrews (center) and hall high school student yasmin albur were this year's recipients of the dr. karen l. list global ambassador awards. west hartford hosts cultural celebration hello! west hartford hosted its sixth annual cultural celebration on april 24 at the town hall. pérez and moustakas pound the rays venezuelan catcher salvador pérez and third baseman mike moustakas each homered in the kansas city royals' 7-6 win over the tampa bay rays. 
pérez (7) sent the ball out of the park in the sixth inning with one runner on base, connecting off starter matt andriese when the pitcher had one out in the inning. moustakas (8) connected in the 12th inning with the bases empty and sealed the royals' win. the victory went to reliever jake junis (1-0), who worked one inning, issued two walks and struck out two. dominican closer kelvin herrera (5) worked one inning, issued a walk, struck out two and earned the save. the loss went to venezuelan closer diego moreno (0-1), who in one inning allowed two hits, a home run and a run. capitals, ducks confront game 7 demons vs. penguins, oilers game 7 may be the most exciting phrase in sports to a lot of people. probably not for the washington capitals and anaheim ducks. the capitals have lost six of nine game 7s in the alex ovechkin era, and the ducks have lost five in a row with stars ryan getzlaf and corey perry, including a heartbreaker in each of the last four years. wednesday night is the chance for each team to confront its game 7 demons as washington hosts the defending stanley cup champion pittsburgh penguins and anaheim hosts the edmonton oilers with spots in the conference finals at stake. "i don't know whether from coaching or playing whether you get into a mental block or not," said bruce boudreau, who coached in game 7 four times with the capitals and four times with the ducks. "i think washington for sure is due to win. i've said it for four years in anaheim we're due to win, but in the end your best players have got to be your best players." for the capitals, that means more production from ovechkin, nicklas backstrom and evgeny kuznetsov, and strong goaltending from braden holtby when the puck drops against the penguins (7:30 p.m. et, nbcsn) for the chance to meet the ottawa senators in the east final. 
in those nine game 7s, ovechkin has three goals and three assists, and at the moment he is earning praise from teammates and coach barry trotz in this series for accepting a demotion to the third line. getzlaf and perry have combined for only seven points in six chances in game 7 going into another one at home against edmonton (10 p.m. et, nbcsn). goaltender john gibson was pulled from his only game 7 start in 2014 after allowing four goals on 18 shots, and he's coming off another hook after three goals on six shots in a 7-1 drubbing in game 6 on sunday. ducks coach randy carlyle didn't blame gibson and said it's about the entire team being better. "obviously there's more at stake when it's the final game," said carlyle, who won the cup in 2007 but hasn't won a game 7 since 2006. "now it boils down to one. ... i'm sure that you could poll 100 people, and 99 of them would say they'd rather play at home. it's our turn to serve, and holding serve means that we go on. if we don't hold serve, then it's not what we're looking for." boudreau, who is 1-7 in his nhl coaching career in game 7 after success in that spot in the minors, thinks goaltending will be the difference. trotz doesn't think it'll have anything to do with history. "i don't know if there's any hump to get over," said trotz, who is 1-1 with the capitals in two game 7 opportunities in 2015. "i just think with this group that i've been with, our game 7s have been pretty solid. you're not going to win every one. but i thought our game was really, really quite good in both those game 7s." whether it was marc-andre fleury stopping ovechkin on a breakaway in 2009, jaroslav halak stopping 41 of 42 shots in 2010, losing by one goal to the new york rangers in 2012, getting shut out by the rangers in 2013 or losing in overtime at the rangers in 2015, game 7 just hasn't been kind to the capitals. "at the end of the day they're a different team," said adam oates, who coached the capitals' 5-0 game 7 loss in 2012. 
"i think they're the better team right now, so hopefully they play that way. based on (monday) night i don't see any reason why they won't." beating the penguins emphatically 5-2 in game 6 in pittsburgh is why boudreau believes the capitals will win game 7. their last game 7 victory at home came in the first round in 2009 with boudreau behind the bench when sergei fedorov scored the winner to knock off the rangers. "i've got to believe that (the momentum from game 6 is) going to roll over, that they're finally sick and tired of hearing that they haven't gone to the third round and will break through," boudreau said. defensive crosby concussed on may 1, sidney crosby returned five days later and didn't look at his best monday after crashing headfirst into the boards. but crosby and coach mike sullivan said he was checked by a doctor and not put into the nhl's concussion protocol. crosby continues to play but isn't pleased that he keeps getting questioned. "you talk to the doctor," crosby said tuesday. "we can sit here and i can explain for 10 minutes what concussion protocol is and all that stuff but i don't really want to do it." holy fleury fleury has allowed nine goals on 58 shots the past two games after nine on 142 shots the first four games against the capitals. some of that is the play in front of him, especially with pittsburgh defenseman trevor daley missing game 6 with a lower-body injury. if the capitals have found a hole on the goaltender's glove side they're not saying. "we're just trying to shoot where there is open spot," kuznetsov said. mr. game 7 justin williams has a chance to add to his legend as "mr. game 7" in his first chance with washington. a three-time cup winner, williams is 7-0 with seven goals and seven assists in seven chances in game 7. unfamiliar oilers this is the first nhl game 7 for connor mcdavid, leon draisaitl, cam talbot and the majority of the oilers, who as a franchise last played in one in 2006. 
that was a loss to the carolina hurricanes in game 7 of the cup final that, you guessed it, williams sealed with an empty-netter. seven degrees of game 7 the hurricanes' coach in that game 7 against edmonton was none other than peter laviolette, who's waiting for the ducks or oilers in the western conference final with the nashville predators. laviolette's predators are looking to become the third eighth seed in the salary-cap era that began in 2005-06 to reach the cup final. the first two? the '06 oilers he beat and the 2012 los angeles kings, who got 15 points during their run from williams. ___ ap hockey writer greg beacham and ap sports writer will graves in pittsburgh contributed. ___ follow hockey writer stephen whyno on twitter at http://www.twitter.com/swhyno ___ more ap nhl: http://apnews.com/tag/nhlhockey . copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. ainge's leadership in spotlight as celtics chase 18th title when you work for the boston celtics, there's no need for daily reminders about your motivations. they're constantly casting shadows overhead. "there's only one goal in boston," celtics coach brad stevens recently said. "there's 17 banners hanging above us. we only go for one goal here." and few people are as keenly aware of that chase as danny ainge. boston's front office chief has already been to the nba's mountaintop; first as a player winning a pair of championships in the 80s and then as the architect of the big 3 that brought the celtics their 17th nba championship in 2008. he's since progressed from kevin garnett's triumphant "anything's possible" declaration in '08, to believing in the possibilities of the present with a corps of young talent that's two wins away from returning the franchise to the conference finals for the first time since 2012. 
but with the east's top seed locked in 2-2 tie with washington, the outcome of this series could go a long way toward affirming the recent moves ainge has made, or exposing the holes that still exist as his team chases banner no. 18. ainge, who has shown willingness to make big moves, stood pat at february's trade deadline despite holding a wealth of coveted assets. it's a decision ainge hasn't second-guessed. "make no mistake that we did try to improve our team," ainge said. "but we do have a lot of confidence in our team and the guys that don't get a chance to play. it doesn't seem really fair when we have guys that are healthy and that we like and aren't even getting on the court to bring in other guys just because they're playing and everyone assumes they're better." after brushing off initial overtures, ainge was lured back to boston as the president of basketball operations in 2003 with the endorsement of none other than celtics' legend red auerbach. the way ainge once told the story to a church group, auerbach called ainge "the luckiest guy i know" in recommending him to owners wyc grousbeck and steve pagliuca. while ainge has acknowledged some fortunate outcomes to get the celtics back to this point, such as the rise of isaiah thomas into an all-star, there have also been plenty of pivotal moves by ainge. one of the league's most-tenured front office heads, he has recently used that experience to his advantage. it started with the hiring of stevens, then just a 37-year-old college coach at butler, in 2013. that was followed by the trade of garnett and paul pierce to brooklyn, which netted the celtics three first-round picks and the right to swap picks with the nets this season. that led to the drafting of rookie jaylen brown last summer and a wealth of possibilities with this year's brooklyn pick, which has the best odds of being no. 1 overall. 
it's a bargaining chip that only still exists because of ainge's decision to not make any trade deadline deals in each of the last two years. though boston did miss on wooing kevin durant to town last summer, it was able to land big man al horford, who has since credited ainge's vision as one of the major factors that swayed him to sign. horford's addition has not only provided the celtics with needed veteran leadership, but he's been one of the sustaining elements for a group that was able to rally behind thomas after his sister's sudden passing on the eve of the playoffs. "i think that that showed the character of all the players involved," ainge said. "i think the first two games (of chicago series) there was a little bit of a cloud because one of our family was hurting really, really bad...it was like no one knew how to really react to the whole situation. credit goes to isaiah, first and foremost, for inspiring his team and the team for fighting for isaiah." tnt analyst and former nba coach kevin mchale played alongside ainge as a player and later saw him in action as an executive. he said while the word patient didn't use to be one he'd have used to describe his friend, it is one example of how his style has evolved. mchale said ainge has also shown a willingness to bring tools like analytics into how he looks at his roster — even if at the end of the day he ultimately still relies on his instincts. "he's not a pigeonhole guy," mchale said. "he uses everything and i think you have to...he has a good eye for grit and toughness. none of that analytics can show that." whether it's luck or skill, ainge isn't waiting for an 18th banner to fall in his lap. "it's like being a player...you don't sit around and wait for luck," ainge said. "you work your way into having good fortune go your way. you behave and act a certain way with integrity and character so that when opportunities present themselves you're ready." 
___ more nba basketball: https://apnews.com/tag/nbabasketball copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. wilkerson predicts the jets will improve their win total new york jets defensive end muhammad wilkerson predicted that this year the team will improve on the 5-11 record it had last season. "there is no doubt, the jets will have a winning record and will do more and better things than last year, i can guarantee that," wilkerson told a group of reporters. wilkerson attended the gridiron celebration in new york, where active and retired players were present. the jets finished last in the east division of the american conference (afc). "i can assure you that the jets will have more wins than last season," wilkerson said. he added: "we have studied what happened to us last season and we will be able to correct mistakes; i can tell you that we will have a winning record." la olympic organizers putting their plans on display los angeles olympic organizers are putting their plans on display at a time of uncertainty in the race for the 2024 games. members of the international olympic committee are in southern california this week to inspect stadiums and arenas that could become future olympic venues. but there's a big unknown. los angeles and paris are the only two bidders left for the 2024 games that will be awarded in september at a meeting of olympic leaders in peru. the ioc is considering a proposal to use that meeting to award the next two olympics — 2024 and 2028. that means one to each city. like paris, l.a. says it's only interested in 2024. members of the ioc will be in southern california for several days of meetings and tours, including stops at the rose bowl and the los angeles memorial coliseum. the contest for the 2024 games has been messy. 
the race began with five cities, but rome, hamburg, germany, and budapest, hungary, all pulled out. the ioc is eager to keep costs in check after decades of runaway spending, and l.a. has made its lean budget a selling point. the l.a. bid requires no new construction of permanent venues. it projects spending $5.3 billion, which would be around one-third of what tokyo is expected to spend for 2020. copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. senators back in conference finals for 1st time in 10 years erik karlsson and the senators have goals this season beyond the eastern conference finals. still, ottawa planned to enjoy the achievement for a night before getting back to work. after all, it's been a while. karlsson had a goal and an assist to help the senators reach the third round of the playoffs for the first time in a decade by eliminating the new york rangers 4-2 in game 6 on tuesday night. "we have a long way to go here," karlsson said. "we're going to enjoy this for a little bit and again, get back to work as soon as tomorrow." ottawa will face either pittsburgh or washington in its first trip to the third round since going to the stanley cup final in 2007. the penguins and capitals will play game 7 of their series wednesday night. after outlasting boston in six games during a first-round series featuring four ot finishes, the resilient senators outworked the rangers at madison square garden. ottawa won all three games at home — each by one goal — including two that went to overtime after the senators tied it in the closing minutes. that included game 2, in which jean-gabriel pageau scored twice in the final 3 1/2 minutes of regulation, and then won it with his fourth of the game in the second extra period. 
the rangers tied the series with two dominant 4-1 wins at madison square garden, but the senators returned home and won game 5 in overtime to take the lead and then finished off the rangers in new york. "over the course of the series, we were the better team for three games," karlsson said. "and the fourth, we had pageau." after their successful rallies against new york in ottawa, the senators had to hold off the desperate rangers near the end of game 6. chris kreider scored early in the third period to make it 3-2, but the senators held firm from there. craig anderson finished with 37 saves, and pageau clinched the series with an empty-netter with 6.2 seconds left for his seventh goal of the postseason. "the four games we lost in this series, it's as simple as them making one more play defensively or one more play offensively," rangers coach alain vigneault said. "we were in all of those games, we didn't make the defensive play when we needed to and we didn't make the offensive play to bury them. you have to give them a lot of credit. they played well and they deserved to win." ottawa led 2-0 after 20 minutes despite being outshot 13-10. new york had three power plays and more scoring chances, but the senators were aggressive on defense while blocking nine shots in the opening period — and 20 for the game. whatever got past the defense was stopped by anderson. "i thought the players have shown again, character," ottawa coach guy boucher said. "we didn't have two good games here and so i think the players reloaded emotionally, mentally and physically real well for home and then we wanted to do the same for this game. ... the players were extremely poised. they looked really rested and had a lot of energy." mike hoffman put ottawa in front 4:27 into the game when he deflected a shot from karlsson past henrik lundqvist for his fourth of the playoffs. it came on the senators' second shot on goal of the game and marked the first time ottawa scored first in the series. 
mark stone doubled the lead with 5:16 left in the first with his fourth of postseason. mika zibanejad got the rangers on the scoreboard with about 6 1/2 minutes left in the second, but karlsson beat lundqvist on the blocker side with 4:07 remaining in the period to restore the senators' two-goal lead. ___ follow vin cherwoo at www.twitter.com/vincherwooap ___ more ap hockey: https://www.apnews.com/tag/nhlhockey copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. 'project keep me safe' caps autism awareness month in suffield suffield first responders closed out autism awareness month with project keep me safe, a community gathering on the town green promoted by april 28. the event included an ice cream social and a chance to meet with police, fire and ems personnel. spurs overcome leonard injury, harden to beat rockets in ot the san antonio spurs were in a tough spot after kawhi leonard hurt his ankle in the second half. that's when danny green and the spurs' supporting cast stepped up. green scored seven of his 16 points in overtime, helping san antonio top james harden and the houston rockets 110-107 on tuesday night to take a 3-2 lead in their second-round playoff series. harden had 33 points, 10 rebounds and 10 assists. he had a chance at a potential tying 3-pointer in the final seconds of ot, but was blocked from behind by manu ginobili . "i remember saying to a couple of the guys this is what we live for, these moments, to play in these situations," spurs point guard patty mills said. "game 5, at home, you just try to soak it up and play hard. i guess that's where all the passion comes out, at those moments. the diving on loose balls, coming up with whatever it may be, you're throwing your body on the line in those situations." leonard had 22 points and 15 rebounds in 38 minutes before exiting with an injured right ankle. 
he stepped on harden's foot while running back in transition with 5:37 left in the third quarter and then played limited minutes before sitting out overtime. the spurs already were without tony parker , who will miss the rest of the playoffs after surgery to repair a ruptured left quadriceps tendon. "it was frustrating because i wanted to play," leonard said. "but i was happy seeing my teammates out there putting in a good effort and getting the win." leonard said he will play in game 6 on thursday in houston. green made a go-ahead 3-pointer and converted a three-point play to make it 109-107 spurs with 30.1 seconds left. he also made a foul shot down the stretch to help san antonio hold on. "i think we all made a decision to be aggressive," green said of leonard's absence, "regardless of what was going to happen. we weren't going to lose the game being on our heels." two questionable plays closed regulation with the game tied at 101. harden was whistled for a charge after dribbling all but a few seconds off the shot clock before driving to the basket on jonathon simmons. the spurs failed to take a shot on their final possession, with mills banking in a 3-pointer after the buzzer sounded. harden also played solid defense, helping hold leonard and lamarcus aldridge to 15-for-42 shooting as he rotated assignments in houston's smaller lineup. "i was trying to keep them in front of me," harden said. "two all-stars and even pau (gasol), who is 7-foot-something. i just tried to be aggressive and do the best that i can and help my team." but the defensive effort and facilitating houston's offense appeared to take a toll on harden. he had four points on 1-for-6 shooting and four turnovers in the final five minutes of regulation and overtime. harden, who finished with nine turnovers, said he was not fatigued despite playing 43 minutes. "i just missed shots," harden said. 
the spurs tried to exploit a size advantage over the rockets' small lineup early, especially when harden was guarding either aldridge or gasol. the spurs' big men only managed 2-for-5 shooting in the opening three minutes, leading san antonio to abandon the strategy midway through the first quarter and sub simmons for gasol. simmons responded by scoring 10 points on 5-for-13 shooting in 26 minutes. tip-ins rockets: houston has lost four straight best-of-seven series when it was tied 2-2. the last time the rockets won a series after being tied 2-2 was in the 1995 western conference finals when they beat the spurs. ... the rockets have not lost to the spurs in three previous postseason series. ... c nene is out for the remainder of the postseason after tearing his left adductor in game 4. ... the team that led at the half had won every game in the series until game 5. spurs: coach gregg popovich opted to start mills after starting rookie dejounte murray in the previous two games in place of parker. ... ginobili has 314 3-pointers in his postseason career, which is third in league history behind ray allen (385) and reggie miller (320). my bad popovich screamed at gasol after the veteran center failed to set a screen for mills on the final play in regulation. the miscue helped contribute to the spurs failing to take a shot at the close of regulation, forcing overtime. "i think i made a wrong read," gasol said. "it was tough because when they put a small on you, you have a tendency to go in. i saw a switch, but i should have stayed up because (patrick) beverley went with lamarcus and we could have played off that. i made a bad, bad read on that play." copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. steve mcqueen to direct authorized tupac shakur documentary academy award-winner steve mcqueen is set to direct a documentary about tupac shakur.
shakur estate trustee tom whalley and amaru entertainment said tuesday that the film is fully sanctioned by the late hip-hop artist's estate. mcqueen is best known for directing "12 years a slave," which won the best picture oscar in 2014 and earned him a best director nomination. the director said in a statement that he looks forward to working with shakur's family to bring his unvarnished story to life. shakur's aunt and late mother's sister gloria cox will serve as an executive producer. "few, if any shined brighter than tupac shakur," said mcqueen, who hinted there was some overlap between himself and shakur during his time at nyu film school in 1993. whalley hopes that the documentary will help take shakur's legacy beyond "the refraction of the headlines, the controversy, and the tragic way his life ended." shakur died in a still-unsolved drive-by shooting in las vegas in 1996, at age 25. cox said that the genesis of the project started with shakur himself in 1996 and was something that his mother afeni shakur continued pursuing until her death last year. "our goal has always been to tell the true story, which has never been done before in such a complete way. my sister always said to me, 'we are not in the business of defending tupac. our job is to allow him to be seen in the most complete way, so his actions, his choices, and his words will allow him to speak for himself,'" cox wrote in a statement. "i believe this film will do exactly that." no release date or timeline was announced for mcqueen's documentary. the director hasn't released a feature film since "12 years a slave," but has the gillian flynn-penned crime drama "widows," starring liam neeson , on his schedule. shakur's legacy is having a moment in hollywood movies. the prolific artist will also get the biopic treatment in "all eyez on me," which hits theaters on june 16. newcomer demetrius shipp jr. is playing shakur. copyright 2017 the associated press. all rights reserved. 
this material may not be published, broadcast, rewritten or redistributed. zhang yimou says 'great wall' story may have been too weak zhang yimou says the disappointing u.s. performance of the biggest budget china-u.s. co-production to date, "the great wall," may have been down to a weak story, but he hopes other filmmakers won't be put off from attempting such ambitious hollywood-chinese collaboration. "the actors are all very good; (star) matt damon and everyone was splendid," the acclaimed director told the associated press on tuesday. "probably the story is a bit weak, or the timing of it wasn't right, or we didn't do a very good job in making the film. there could be many reasons." zhang spoke amid preparations for the beijing opening of the stage play "2047 apologue," which he described as a "conceptual performance" linking chinese traditional culture with an imagined future of how humans will interact with technology. producers of "the great wall" had hoped the movie with a $150 million production budget could buck the trend of china-u.s. co-productions failing to make a splash in both markets, at a time when movie makers wrestle with how to appeal to chinese and western audiences at the same time. the script for the 3-d adventure fantasy that has damon and chinese warriors fighting monsters with china's iconic great wall as protection took hollywood seven years to develop. zhang added elements of chinese culture and his opulent visual style, seen in the romantic kung fu drama "house of flying daggers" and the 2008 beijing olympics ceremonies. "the great wall" has pulled in a disappointing $45 million in the u.s. since its february release, though it has earned $332 million globally. in china, where it was released in december, it made $171 million, making it the eighth-highest earner in the country last year. 
the movie was made by legendary east , the chinese arm of legendary entertainment, a hollywood studio now owned by chinese real estate and theater chain developer wanda group. other companies behind the movie include the state-owned china film group corp.; le vision pictures, a private film company affiliated with chinese tech firm leeco; and hollywood's universal pictures. zhang said "the great wall" marked a milestone in the collaboration of chinese and hollywood producers. "as the chinese saying goes, 'all beginnings are hard.' i feel that this beginning is valuable. i hope that there will be more cooperation like this, that people won't stop just because the result wasn't so good," zhang said. pressed on whether he would attempt a chinese-hollywood co-production again, the director said: "it doesn't have to be me. i hope more people will collaborate like this." zhang's new, much smaller-scale endeavor aims to start a conversation about the relationship between people and technology, and where this relationship is heading. he called "2047 apologue" a "conceptual performance, because it's not a show or a story." instead, he has hired chinese folk art performers and companies from europe and the u.s. to supply technology such as drones and robotic arms for the hourlong show that is broken into several "fragments." it will be performed at the national center for performing arts in beijing from june 16-18, and then tour several chinese cities. it is slated to play in edinburgh in august, as well as other countries that haven't been confirmed yet, the publicity team says. "humankind has been so smart in developing technology that kills; the americans are especially strong in that, right?" said zhang. "when technology has become weapons that help us to kill, what is the relationship between it and us? will it be used on you one day?" 
he said his inspiration for the theme could have come from wanting his children to spend less time on computers, something he never did as a child. one of the performers, peking opera star qiu jirong, said that people were increasingly depending on technology, such as using their phones rather than a bank card to make payments, and "2047 apologue" "is telling you that this is a threat, too." he said his eight-minute dance performance involves a menacing laser that is out to kill him. "i want to escape, but can't," said qiu. "when you think about it, humans made these things, and have released them. "i think these things are people's desires, and people are doomed in their desires." ___ follow louise watt on twitter at twitter.com/louise_watt copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. pinewood elementary school marks 50 years, and counting when frank herman first walked the halls of pinewood elementary school in 1967, the surrounding community was mostly trees and forest. the houses near the school were spread out then, pinewood's first principal said, and mays chapel village, the sprawling residential development next door, was untouched farmland yet to be developed. "there were farms here and woods there," herman, now 88, recalled. "timonium road was just a little lane." fifty years later, the timonium school has reached its golden anniversary. on a rainy april 25, herman joined more than 500 students gathered in pinewood elementary's cafeteria that afternoon to celebrate the milestone with students and faculty past and present. the event included a speech from pinewood's current principal, recognition of past administrators, music and poetry performances by current students and a time capsule presentation. parents and visitors also toured the school later in the day.
pinewood elementary opened in november 1966 to fanfare from the surrounding community, with herman at its helm. as the school's first principal, he joined a faculty of 16 teachers serving 355 students in grades one through six. today, the school enrolls 538 students in kindergarten through fifth grade and employs 32 teachers. as the faces walking the halls have changed, so has the building. the original building received a kindergarten and classroom addition in 1970, bcps spokesman mychael dickerson said. in 1995, a modular addition added more space to accommodate increasing enrollment. school officials said they had no record of the costs of that construction. a 2014-2015 redistricting moved about 100 pinewood students to the new mays chapel elementary school and other schools bordering pinewood's boundary, according to current principal tricia rueter. prior to the boundary changes, pinewood enrolled 622 students with a capacity of 566. the push to build the school began in 1964, when a group of parents and neighbors in the area joined to campaign for a new elementary school. before then, neighborhood students rode buses to now-shuttered hillendale elementary school, which was 10 miles away, and towson elementary school, which was four miles away and is now the county's bykota senior center. some students also commuted three miles to riderwood elementary school, which was built in 1965. by summer, the group got the good news: an architect was drawing plans for the new school. in 1966, bulldozers prepped the land and started construction and by 1967, pinewood elementary was finished. parents and faculty had a neighborhood school close enough to walk to, herman said. a year later, the former principal bought a home on presway road, less than a mile away, so he could walk to school as well. "we couldn't keep the parents out of the school," herman said. "each teacher had three aides. after all these years of sending kids on buses they finally had their own school."
on april 25, a variety of former administrators, teachers and students took the stage to share their stories of the school's past. "the school has always been very special to me," former teacher richard loeschke said. loeschke taught various grades at pinewood from 1971 to 1983, eventually getting a promotion to a county administrative position. he recalled running into former students around the baltimore area. his time at pinewood, he said, molded him into the person he is today. today, most of pinewood's students still come from the immediate neighborhood, between padonia road and seminary avenue to the north and south, interstate 83 to the east and mays chapel to the west. "the faces have changed but the mission, passion and purpose is the same," said rueter. pinewood's focus, she said, continues to be on building the next generation of leaders. as pinewood's former vice principal, rueter said she is happy to be back at the school after a promotion led her to take a principal position for several years at wellwood international elementary school, in pikesville. she returned to pinewood at the beginning of the current school year. she spent much of the year preparing for the 50th anniversary festivities. "it's such a powerful community," rueter said. "it started strong and it's never ebbed. it just continues to be positive. that's what we hope continues." though rueter said she isn't sure if pinewood's building will be the same in another 50 years, she said she hopes the student achievement and strong community involvement she sees continues. that sentiment is shared by kregg cueller, the zone 1 community superintendent for baltimore county public schools. pinewood elementary is in zone 1. "pinewood elementary school continues to accelerate in academic achievement while always focusing on the whole child, which is evident through its many successes and opportunities for students," cueller said in a may 5 email. 
the hope for the future, he said, is to continue to foster a learning environment that allows students to work toward their highest potential. "the integration of the arts perpetuates and nurtures this type of learning environment," cueller said of the school's focus on integrating arts throughout all subjects. "pinewood students continue to show exceptional achievement levels in math and reading." colbert welcomes fellow 'daily show' alums to 'late show' it was a rare tv reunion tuesday as stephen colbert played host to a gang of fellow "daily show" alums on a special edition of cbs' "the late show." along with jon stewart , former longtime anchor of the comedy central fake newscast, colbert welcomed samantha bee, john oliver , ed helms and rob corddry, all of whom, like colbert, sharpened their satirical skills and won fans as "daily show" correspondents before heading out on their own. colbert joked, "we haven't aged a day." a comic sketch flashed back to spoof colbert's departure from "the daily show" in 2005. "i can't believe you're leaving right in the middle of the george w. bush administration," bee told him. "there's never gonna be another president this good for comedy. this guy does something ridiculous like at least once every month!" "i can't believe you're leaving us," echoed helms. "it's like beyonce leaving destiny's child. we're never gonna hear from her again." "we're a family," said stewart, pretending to choke up, "but i guess i'm realizing that you'll all spread your wings and leave me." later, during his guest spot, stewart told colbert, "i've been reading about you." referring to colbert's attention-grabbing joke last week at the expense of president donald trump, he said, "you have a potty mouth." "that i do," colbert replied. "but might i say, i learned it from you, dad." 
growing serious, stewart told colbert, "the things that you say, even if they're crass, even if they in some ways are not respectful enough to the office of the presidency, can insult. but he can injure. for the life of me, i do not understand why in this country we try and hold comedians to a standard we do not hold our leaders to." stewart, who left "the daily show" in 2015, said he misses that platform. "the process of making the show somehow became intertwined with my process of making sense of things that i didn't understand," he said, "so i miss that." later, colbert gathered all of his guests to chat in a semicircle of chairs. "this arrangement we have right now," he said, "is exactly something we would have made fun of on 'the daily show': it looks like a morning show." currently, stewart is developing a project for hbo, where oliver hosts "last week tonight." bee hosts "full frontal" on tbs. helms scored with "the hangover" and its sequels and "the office." corddry created and starred in the comedy series "childrens hospital." copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. flag day: a day of grilling, music, and american spirit locals look at t-shirts and flags provided free of charge by paradiso insurance during last year's flag day event. flag day: a day of grilling, music, and american spirit flag day is right around the corner, and you're invited to the paradiso insurance flag day bbq and celebration! flag day at our agency is an annual celebration, and we pay respects to our flag, country, and all those who have served or are currently serving. plus, we want the whole town covered in red, white, and blue! there is no better feeling than having our community show their patriotic side. we'll be grilling up free hot dogs and hamburgers for all to enjoy, plus there will be live music from radio 98.3.
we had a spectacular time with them last year, and can't wait to have radio 98.3 with us again this year. be sure to tune into the station on june 14th for updates on our event! we'll also be giving away free replacement american flags and t-shirts during this event. even better, the flags and t-shirts are made right here in the good old usa! you probably didn't know this, but over 90% of american flags are actually manufactured outside of the country, just like clothing and apparel. however, we believe in everything being 100% american made and supporting local businesses. everyone who receives a flag during the event will also get a complimentary mounting kit. plus, if you have an older flag that needs to be retired we can provide a proper disposal ceremony for it. this is a family friendly event, so everyone is welcome to attend. we'll have face painting and a coloring station for the kids. max, our office puggle, will also be attending so don't forget to say hello to him when you're here. plus, for all you classic car collectors, we'll have some gorgeous vehicles for you to see. of course, you're welcome to bring your own! we'd love to see your vintage pride and joy at our barbeque. at paradiso insurance, we also take great strides in thanking the many men and women who risk their lives to protect and save others. therefore, all local veterans, firefighters, and emt service members are welcome to join us for the barbeque, as a thank you for their service to our community and country. so, even if you don't want a hamburger or to check out the awesome cars, at least come stop by to thank the wonderful people who make our community a safer place. the flag day barbeque will be held at our office located at 8 east main street, stafford springs, ct from 11:30am to 2:00pm on june 14th. save the date because we hope to see you there! this item was posted by a community contributor.
more inconsistencies in flynn work, now with turkish client targeted in widening investigations of his foreign entanglements, president donald trump 's former national security adviser, michael flynn , is at odds with his former turkish client over two unusual payments totaling $80,000 that flynn's firm sent back last year to the client. the disagreement points to inconsistencies in flynn's accounts to the u.s. government about his work for foreign interests. flynn's company, flynn intel group, told the justice department in march that the two $40,000 payments were consulting fees for unspecified work. but turkish businessman ekim alptekin has told the associated press that the payments from flynn's firm were refunds for unperformed lobbying. the difference matters because flynn's foreign business relationships and the veracity of his disclosures are under scrutiny by congressional, military and intelligence inquiries. congressional committees and the pentagon's inspector general are separately examining whether flynn was fully forthcoming about his foreign contacts and earnings from organizations linked to the governments of russia and turkey. his firm's turkish work occurred while he was a top trump campaign adviser. on monday, former deputy attorney general sally yates told senators that flynn's misstatements about his contacts with russia's ambassador to the u.s. raised concerns that he could be targeted for blackmail. yates also cited the possibility that flynn could have broken federal law by operating as a paid foreign agent for the turkish client without u.s. government permission. the retired army lieutenant general and former chief of the defense intelligence agency formally told the justice department in march that his now-defunct flynn intel group was paid $530,000 for operating as a foreign agent for alptekin's firm, inovo bv, and performing work that could have benefited the turkish government. 
that filing —prompted by justice department pressure — came just weeks after trump fired flynn from his national security post. the president has said he made the decision after it became clear flynn had misled vice president mike pence about conversations with russia's ambassador to the u.s. the paperwork flynn filed with the justice department raised new questions because it cited two consulting payments back to alptekin's company without specifying what, if any, work was performed. alptekin told the associated press in an email that the payments were refunds guided by a verbal agreement he worked out last year with flynn intel that set out how much flynn's firm was to receive each month for lobbying and other contractual work. when alptekin didn't see any lobbying work, he said, he asked flynn intel to refund $80,000 to his firm. but flynn's filing with the justice department did not disclose those discussions or the payment arrangements cited by alptekin. the u.s. foreign agent law requires disclosure of all written and verbal contracts and modifications. national security law experts said the failure to disclose such discussions could spur additional scrutiny of flynn if justice department officials were to determine the missing material was legally significant. the law "says disclosure has to include material fact and makes it a crime to omit such material," said stephen i. vladeck, a professor and national security law expert at the university of texas school of law. flynn's foreign agent filing included only one contract signed by flynn and alptekin. the contract did not mention any adjustments made verbally, alptekin's lobbying demands, arranging for allotting payments or any consulting role for alptekin's company, inovo bv. in the filing, flynn's firm said the description of each payment back to inovo as a "consultancy fee" came from the firm's accounting records. 
similar "consultancy fee" entries described payments to other members of the team hired for the work. asked about the discrepancies between alptekin's statements and the filing, flynn's attorney, robert kelner, said: "we'll stick with what's in the filing." kelner declined to answer additional questions from the ap about the payment arrangement. in a brief statement tuesday, alptekin again said the payments to his firm from flynn intel were refunds for unperformed work. alptekin also suggested that flynn intel's description of the payments as consulting fees was an accounting error. in monday's senate hearing, yates told sen. richard blumenthal, d-conn., that flynn could face legal trouble for failing to disclose his foreign work and payments properly. leaders of a bipartisan house inquiry into flynn's foreign earnings have said they found no evidence that flynn asked for permission from the defense department or the state department to accept foreign payments, though they said any likely penalty for that violation would be fines, not prosecution. the defense department's inspector general is investigating. rep. jason chaffetz , r-utah, who leads the inquiry as chairman of the house oversight and government reform committee, told the ap he had not looked into flynn's foreign agent disclosures but he "would urge the justice department to pursue that if they feel it's necessary." the panel's senior democrat, rep. elijah cummings of maryland, added that flynn's "work on behalf of turkey while a top national security adviser for president trump's campaign raises grave questions." flynn intel's work last year centered on developing evidence for a criminal case against fethullah gulen, a turkish muslim cleric living in pennsylvania. turkish president recep tayyip erdogan wants gulen extradited because he believes gulen inspired last year's attempted coup against him. the obama administration rebuffed turkey's extradition requests. 
it's unclear whether the trump administration will change that stance, though u.s.-turkish relations have warmed under trump, who congratulated erdogan after a recent referendum expanded his presidential powers. international monitors called the referendum an undemocratic power grab. alptekin said he disagrees with flynn intel's decision to register with the justice department as a foreign agent because he says the work wasn't orchestrated by the turkish government and he doesn't have any ties to erdogan's administration. but alptekin serves on an economic committee overseen by turkey's finance ministry. in its justice department filing, flynn intel also disclosed that alptekin had consulted with turkish government officials about the gulen-related work. flynn and alptekin have yet to provide full and consistent explanations. alptekin initially said last fall that his company paid only tens of thousands of dollars, but later acknowledged that the $530,000 in payments listed in flynn's foreign agent filing was correct. alptekin also told the ap that his firm and flynn intel had agreed verbally in august to divide monthly $200,000 installments for lobbying, public relations, research and other work. that arrangement specified flynn's firm would be paid $40,000 a month for lobbying and $15,000 a month for public relations, alptekin said. alptekin justified the two $40,000 payments— one in september and the other in october— as refunds, saying he saw no evidence that flynn intel performed any lobbying. but flynn's firm reported lobbying activity. it registered with congress as a lobbyist in september, midway through the contract. and flynn intel and a contracted public relations firm disclosed in their paperwork with the justice department that it had lobbied a house committee and an arkansas state official. ___ have a tip about this story? 
contact the authors securely at https://www.ap.org/tips ___ follow chad day on twitter: https://twitter.com/chadsday copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. trump defends comey firing, says both parties will thank him president donald trump defended his firing of fbi director james comey , asserting in a flurry of tweets wednesday that republicans and democrats "will be thanking me." trump did not mention any effect the firing might have on the probe into contacts between his 2016 campaign and russia. instead, trump tweeted that he'll name a replacement "who will do a far better job, bringing back the spirit and prestige of the fbi." nevertheless, tuesday's abrupt firing throws into question the future of the investigation into the trump campaign's possible connections to russia and immediately raised suspicions of an underhanded effort to stymie a probe that has shadowed the administration from the outset. trump has ridiculed the investigations as "a hoax" and denied any campaign involvement with the russians. democrats likened comey's ouster to president richard nixon's "saturday night massacre" and renewed calls for the appointment of a special prosecutor, and some republicans also questioned the move. in a flurry of tweets, trump said comey had "lost the confidence of almost everyone in washington," adding: "when things calm down, they will be thanking me!" in his brief letter tuesday to comey, trump said the firing was necessary to restore "public trust and confidence" in the fbi. the administration paired the letter with a scathing review by deputy attorney general rod rosenstein of how comey handled the investigation into democrat hillary clinton 's email practices, including his decision to hold a news conference announcing its findings and releasing "derogatory information" about clinton . 
while comey has drawn anger from democrats since he reopened the email investigation in the closing days of last year's campaign, they didn't buy that justification for his firing. several republicans joined them in raising alarms of how it could affect probes into possible coordination between trump associates and russia to influence the 2016 presidential election. in one of the strongest statements by republicans, sen. richard burr of north carolina, chairman of the senate intelligence committee, said, "i am troubled by the timing and reasoning of director comey's termination." "his dismissal further confuses an already difficult investigation by the committee," burr said. senate democratic leader chuck schumer told trump in a phone call he thought dumping comey was a mistake. on wednesday, trump labeled the senate minority leader "'cryin' chuck schumer.'" trump will now appoint a successor at the fbi, which has been investigating since late july, and who will almost certainly have an impact on how the investigation moves forward and whether the public will accept its outcome. it was only the second firing of an fbi director in history. president bill clinton dismissed william sessions amid allegations of ethical lapses in 1993. democrats compared the ouster to nixon's decision to fire the independent special prosecutor overseeing the watergate investigation in 1973, which prompted the resignations of the justice department's top two officials. "this is nixonian," sen. bob casey , d-pa., declared on twitter. "outrageous," said oregon sen. ron wyden , calling for comey to immediately be summoned to testify to congress about the status of the trump-russia investigation. rep. adam schiff of california, top democrat on the house intelligence committee, said the white house was "brazenly interfering" in the probe. republican sen. john mccain of arizona said congress must form a special committee to investigate russia's interference in the election. 
senate majority leader mitch mcconnell, r-ky., said only: "once the senate receives a nomination, we look forward to a full, fair and timely confirmation process to fill the director position. this is a critical role that is especially important as america faces serious threats at home and abroad." comey was speaking to agents at the fbi's field office in los angeles when the news broke. television screens in the office began flashing the news, and comey initially chuckled, according to a law enforcement official who was present and spoke on condition of anonymity. but comey finished his speech before heading into an office and did not reappear in the main room. he later left los angeles on a plane to return to washington. in his letter to comey, trump thanked him for telling him three times "that i am not under investigation." the fbi has not confirmed that comey ever made those assurances to the president. in public hearings, comey has declined to answer when asked if trump is under investigation, urging lawmakers not to read anything into that statement. comey, 56, was nominated by president barack obama for the fbi post in 2013 to a 10-year term, though that appointment does not ensure a director will serve the full term. praised frequently by both parties for his independence and integrity, he spent three decades in law enforcement. before the past months' controversies, the former deputy attorney general in the george w. bush administration was perhaps best known for a remarkable 2004 standoff with top officials over a federal domestic surveillance program. in march of that year, comey rushed to the hospital bed of attorney general john ashcroft to physically stop white house officials in their bid to get his ailing boss to reauthorize a secret no-warrant wiretapping program. but his prominent role in the 2016 presidential campaign raised questions about his judgment and impartiality. 
though the fbi did not recommend charges against clinton for mishandling classified information, comey was blisteringly critical of her decision to use a personal email account and private internet server during her four years as secretary of state. comey strongly defended his decisions during a senate judiciary committee hearing last week. he said he was "mildly nauseous" at the thought of having swayed the election but also said he would do the same again. clinton has partially blamed her loss on comey's disclosure to congress less than two weeks before election day that the email investigation would be revisited. comey later said the fbi, again, had found no reason to bring any charges. ___ ap writers darlene superville, ken thomas, vivian salama, catherine lucey and sadie gurman in washington and michael balsamo in los angeles contributed to this report. copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. trump to meet top russian diplomat at the white house president donald trump will meet wednesday with vladimir putin 's top diplomat at the white house , officials say, marking the highest level, face-to-face contact with russia of the american leader's young presidency. it would also signal that the two countries have improved ties that trump recently described as being at an "all-time low." trump's talks with russian foreign minister sergey lavrov will take place after the russian's meetings earlier in the day with secretary of state rex tillerson . a russian plan to stabilize syria after more than six years of civil war is the most urgent foreign policy topic on the agenda. but the meeting will be impossible to separate from the trump administration's unfolding political drama in washington, where fbi and congressional investigations are looking into possible collusion between trump campaign associates and the kremlin related to last year's presidential election. u.s. 
intelligence agencies have asserted that moscow meddled in the election to help trump's chances of victory. the stigma of the russia probes has been impossible for trump to shake. trump on tuesday abruptly fired fbi director james comey, ousting the nation's top law enforcement official in the midst of the bureau's investigation into trump's ties with russia. less than a month into trump's presidency, he fired his national security adviser, michael flynn , saying flynn misled senior administration officials about his pre-inauguration talks with sergey kislyak, russia's ambassador in washington. in a senate hearing monday, former acting attorney general sally yates said she bluntly warned trump's white house in january that flynn "essentially could be blackmailed" by the russians because he apparently had lied to his bosses about his contacts with kislyak. trump has said he has no ties to russia and isn't aware of any involvement by his aides in any russian election interference. he calls the various investigations a "hoax" driven by democrats still bitter that their candidate, hillary clinton, was defeated last year. but in the meantime, his hopes for a possible rapprochement with moscow, so regularly repeated during the campaign, have been derailed. ties soured further in april after the u.s. blamed a russian ally, syrian president bashar assad , for a deadly chemical weapons attack on civilians and trump ordered that some 60 cruise missiles be fired at a syrian air base in response. after tillerson visited putin and lavrov in moscow on april 12, trump said flatly, "right now we're not getting along with russia at all." still, tillerson's meeting provided a blueprint for how the former cold war foes might go about improving ties. a main focus is syria, where both governments want to end a civil war that has killed up to 400,000 people, contributed to a global refugee crisis and allowed the islamic state group to emerge as a global terror threat. 
the continued fighting between rebels and assad's military has complicated u.s. efforts to defeat is. lavrov will be coming to the american capital with a russian plan to end the violence, after hashing out an agreement with iran and turkey last week. it focuses on the creation of four de-escalation zones. critical details still need to be finalized and the u.s. response has been cautious, with top officials such as defense secretary jim mattis saying they're still studying the concept and its various unanswered questions. the would-be safe zones would not cover areas where the u.s.-led coalition is fighting is. despite the lack of clarity, the possibility of a meeting between trump and lavrov would in itself be a sign of some progress. the russian diplomat hasn't visited washington at all since 2013, a year before russia's annexation of ukraine's crimea region and two years before it intervened militarily in syria to help assad remain in power. copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. democrats again fall short in a closely-watched election democrats again fell just short in a closely-watched election as heath mello lost the omaha mayoral race on tuesday after a fierce debate within the national party over his anti-abortion views. his loss was a setback for supporters who argued that the democratic national committee and abortion rights groups were wrong to attack the anti-abortion former state senator. it was also another near miss for democrats fighting in typically republican territory since donald trump 's presidential election victory. democrats lost a special election for a house seat in kansas and narrowly missed an outright win in a special election in georgia. 
mello, a 37-year-old catholic from omaha's working-class south side, had become a flashpoint for the internal democratic battle over whether a candidate's position on reproductive rights should disqualify him from support by the national party after its crushing losses around the country last year. tuesday, mello acknowledged the "completely different dynamic" the campaign took on in the closing weeks, but noted what he described as unified support across ideological lines. "we tried to run a campaign that was inclusive from the beginning regardless of political affiliation, regardless of ideology under the banner of change," mello told hundreds in a west omaha hotel ballroom. republican jean stothert, a 63-year-old former nurse and city council member elected in 2013, was elected to a second term. the national abortion and reproductive rights action league slammed the democratic national committee for supporting mello, who voted for abortion restrictions during his eight years in the nebraska legislature. responding to the criticism, the democratic committee chairman tom perez declared that, "every democrat, like every american, should support a woman's right to make her own choices." his comment sparked a fierce debate within the party over whether there should be an abortion rights litmus test, with mello caught in the middle. "it's astounding that our party chairman would say pro-life democrats are not welcome," nebraska democratic party chairwoman jane kleeb told the associated press tuesday as mello conceded defeat. a cbs news poll taken in january found 15 percent of democrats nationally believed that abortion should not be permitted. omaha democratic voter adam gouttierre, a 45-year-old business developer, said democrats in nebraska didn't have the luxury of being choosy. "abortion is one item on the menu of progressive concepts," he said, frustrated at the backlash. "you can't have them all!" at the april 20 rally in omaha, sen. 
bernie sanders, the vermont independent who sought the democratic presidential nomination last year, endorsed mello, telling thousands, "are you ready for a political revolution?" mello had cast himself as a next-generation democrat focused on economic opportunity, while embracing gop-friendly ideas such as public-private partnerships as a way to solve the city's vexing streets problem. "that's the future of the democratic party, in my mind, looking at that pro-growth, progressive, future-focused mentality." copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. us reps, dalai lama take aim at china sore spot tibet as president donald trump appears to be warming to china, a bipartisan group from the u.s. house of representatives took aim wednesday at one of beijing's sore spots: tibet. rep. nancy pelosi accused china of using economic leverage to crush tibetan calls for autonomy. during a meeting with tibetans and the dalai lama at his main temple in the indian hill town of dharmsala, she urged the community not to give up. "you will not be silenced," said pelosi, a california democrat. "the brutal tactics of the chinese government to erase race, culture and language of tibetan people challenges the conscience of the world. we will meet that challenge." the visit by pelosi and seven other u.s. representatives irritated beijing, where a spokesman for the foreign ministry reiterated china's stance that the dalai lama is a dangerous separatist. "the visit by u.s. congressmen to dharmsala and their meeting with the dalai lama has sent a very wrong signal to the outside world about supporting tibetan independence, which violates the u.s. government's commitment not to support independence for tibet," the spokesman, geng shuang, told reporters. he said beijing had complained to the u.s. 
government over the matter, and urged the american representatives "to stop any kind of contact with the dalai lama, and take immediate measures to eliminate the negative impact." but rep. jim sensenbrenner assured that the u.s. congress stood in "solidarity with the cause of the tibetan people to be free from the repression that has been put upon them for a very, very long time from beijing." "without justice there is no freedom," said the wisconsin republican, noting that the u.s. constitution has prohibited government restrictions on the free exercise of religion for more than 220 years. "today there is no justice in tibet for tibetans, for their religion, for their culture, for their language, and for his holiness the dalai lama. ... this is a civil rights issue." china says the himalayan region has been part of the country for more than seven centuries. many tibetans insist they were essentially independent for most of that time. at least 148 tibetans have set themselves on fire since 2009 to protest china's rule. in many cases, china has offered aid packages to foreign governments on the condition that they support china's position on issues such as tibet and taiwan, the self-governing island that beijing has pledged to take control of, by force if necessary. mongolia said in december that it would no longer allow visits by the dalai lama after a recent trip by the exiled tibetan spiritual leader led china to suspend talks on a major loan. "china uses its economic leverage to silence the voices of friends of tibet," pelosi said wednesday. "but if we don't speak out against repression in tibet and the rest of china because of china's economic power, we lose all moral authority to talk about human rights anywhere else in the world." pelosi told the gathering that she would limit her comments on china's "brutal tactics" because the dalai lama had "prayed for me that i would rid myself of my negative attitude about dwelling on the negative too much." 
the dalai lama, meanwhile, said tibetans do not need weapons in their struggle for autonomy, and again prescribed a path of nonviolence and compassion. while he has devolved political power to an elected government, the dalai lama is still widely revered by tibetans as their most influential leader. tibetans who remain in the closely guarded region "are living in fear and anxiety. their life is at risk, but they are still preserving our traditions," said the dalai lama, who fled tibet to india in 1959 during an abortive uprising. "we all are dedicated to the tibetan cause, but should not think of harming the chinese people as such. we need to befriend them" and work through compassion to resolve the tibetan issue, he said. the timing of the u.s. congressional visit may irk trump , who just weeks ago boasted of enjoying cozy conversations and chocolate cake with chinese president xi jinping at trump's florida resort. during xi's official visit last month, beijing also provisionally approved several trademark applications for ivanka trump, the president's daughter. president trump's rhetoric on china has warmed considerably since the u.s. presidential campaign, when he repeatedly called the asian giant a currency manipulator and an economic adversary of the united states. many in the crowd at wednesday's gathering in dharmsala said they were delighted, and relieved, to see a bipartisan u.s. delegation address the tibetan issue. "it perhaps shows that there is huge support for tibet in the u.s. congress. with trump at the helm, things are uncertain," said internet security analyst lobsang gyatso, 34. rinchen, a 27-year-old antiques dealer who fled tibet as a teenager in 2006, said the visit had burnished the tibetan cause and sent a strong message to china. "the mere fact that this delegation is visiting dharmsala gives importance to tibet and the dalai lama," said rinchen, who uses only one name, as is common in the region. 
when people inside tibet hear of the visit, "they will know that the support is real," he said. ___ daigle reported from new delhi. associated press writers louise watt in beijing and ashok sharma contributed to this report. ___ this story has been corrected to show that ivanka trump received provisional approval for several trademarks, not patents. copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. some names trump might consider in picking a new fbi chief with james comey ousted as fbi director, president donald trump will have an opportunity to select a replacement for a new 10-year term. the fbi in the interim will be led by comey's top deputy, andrew mccabe. but trump is likely to reach outside the bureau to find someone to run the storied law enforcement agency. "the fbi is one of our nation's most cherished and respected institutions, and today will mark a new beginning for our crown jewel of law enforcement," trump said in a statement issued by the white house. here are some possible candidates: — ray kelly: the longest-serving police commissioner in new york city, kelly oversaw the force in the years following the sept. 11 attacks when terror threats were routine. his tough-on-crime stance, including support for provocative tactics like stop-and-frisk, could make him a natural ally of attorney general jeff sessions and a go-to-guy for a fellow new yorker like trump. kelly as commissioner defended a police operation, exposed by the associated press, that conducted secret surveillance of muslims. he could partner with trump and sessions on anti-terrorism efforts. — chris christie: though his relationship with trump has been topsy-turvy, the governor of new jersey has known the president for years and could bring law enforcement bona fides to the job. 
christie is a former republican-appointed united states attorney in new jersey, and he cited that background time and again during his 2016 presidential campaign. his legacy as governor took a hit, however, with a bridgegate scandal that was investigated by the fbi, prosecuted, and brought down some of his allies. — david clarke: a wild-card, but the outspoken and polarizing milwaukee county, wisconsin, sheriff has been a fierce supporter of trump and even landed a speaking spot at last summer's republican national convention. a conservative firebrand known for his cowboy hat, clarke has called himself "one of those bare-knuckles fighters" and has been critical of what he called the "hateful ideology" of the black lives matter movement. but he'd be a long shot given that a county jury recently recommended criminal charges against seven milwaukee county jail staffers in the dehydration death of an inmate who went without water for seven days. — trey gowdy: the south carolina republican led the house committee investigation of former secretary of state hillary clinton's actions surrounding the deaths of four americans in benghazi, libya. gowdy is also a former federal prosecutor who boasts of his work on drug trafficking, bank robberies and child pornography cases. he was among lawmakers critical of comey's decision not to prosecute clinton in the email server investigation, saying other government officials would have been prosecuted if they handled classified information like clinton did, but federal officials disagree with that assessment. gowdy said after comey's firing that though he had differences with the former fbi director on some matters, he "never lost sight of the fact that he had a very difficult job." ___ associated press writer sadie gurman contributed to this report. copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. 
fbi chief known for judgment calls is done in by turmoil there was a time when doing the right thing seemed pretty simple to james comey , the fbi director whom president donald trump fired. "there's right, and there's wrong and it ain't hard to tell the difference," he once said flatly. that was before comey lobbed a stink bomb into the 2016 presidential race just before the november election by announcing investigators had found more emails that might — or might not — relate to hillary clinton 's use of a private email setup as secretary of state. and it was before comey publicly confirmed in march that the fbi since last summer had been investigating contacts between the trump campaign and russian officials. before comey put himself at odds with trump by contradicting presidential tweets in which trump asserted his phones had been ordered tapped by president barack obama . before comey confessed last week that he felt "mildly nauseous" at the thought that he might have tipped the election outcome. before the fbi had to correct the record on tuesday regarding misstatements he'd made in his latest testimony on the clinton email case. whew. after months of tumult and tension between comey's fbi and the white house, trump said he was acting to restore "public trust and confidence" in the nation's top law enforcement agency. the administration cited comey's handling of the clinton email investigation as justification for his dismissal. by the time trump cut short comey's 10-year appointment, the fbi director who prided himself on his squeaky-clean reputation was catching criticism from all directions. sen. lindsey graham, r-s.c., said after the firing that "given the recent controversies surrounding the director, i believe a fresh start will serve the fbi and the nation well." democrats accused trump of using the email scandal as a fig leaf for getting rid of the head of the fbi as it investigates possible trump campaign connections to the russians. 
comey has found himself in the spotlight before for standing on what the 6-foot-8 lawyer saw as the moral high ground. before the 2016 presidential campaign, comey was best known for the tale of his dramatic rush to the bedside of then-attorney general john ashcroft in a darkened hospital room in 2004 for a standoff with senior white house officials over federal wiretapping rules. comey, serving as acting attorney general during ashcroft's illness, dashed to the bedside to block bush administration officials from making an end run to get ashcroft's permission to reauthorize a secret no-warrant wiretapping program. "that night was probably the most difficult night of my professional life," comey testified before congress in 2007. he's experienced plenty of turmoil since. former justice department officials and lawmakers from both parties called comey's revelation about clinton's emails just 11 days before the election an improper, astonishing and perplexing intrusion into politics in the critical endgame of the 2016 campaign. it was an unexpected predicament for the man who had painted ethical decision-making as an easy call. but comey's internal certitude has led the fbi official to freelance his positions at times. in 2015, he broke from the white house in suggesting a possible link between rising homicide rates in some american cities and police officers' anxieties about taking actions that could be recorded for viral videos. the white house distanced itself from those remarks, saying there was no scientific evidence to support a connection, or to show that officers were pulling back from their responsibilities. comey, a former republican who is no longer registered with a political party, spent 15 years as a federal prosecutor before serving in the george w. bush administration. his office brought the case that led to martha stewart's conviction on obstruction of justice and lying to government investigators. as an assistant u.s. 
attorney in virginia, he handled the investigation of the 1996 bombing of the khobar towers housing complex in saudi arabia that killed 19 members of the u.s. military. obama, when he nominated comey to the fbi job in 2013, cited his willingness to stand up to power "at key moments when it's mattered most," referencing the hospital-room standoff. but the obama white house left comey dangling after his much-criticized announcements regarding clinton, saying it was up to comey to defend himself in the face of what obama spokesman josh earnest called "significant criticism from a variety of legal experts, including individuals who served in senior department of justice positions in administrations that were led by presidents in both parties." clinton last week said she was "on the way to winning" until comey's letter and the wikileaks release of internal campaign emails scared off voters. during a senate hearing last week, comey testified that, faced with whether to disclose the information about clinton late in the campaign or conceal it, he had to choose between "really bad and catastrophic" and he decided to "walk into the world of really bad." it is longstanding justice department protocol to avoid taking investigative action in the run-up to an election that could affect its outcome. but comey told colleagues he felt obligated to go public after having told congress over the summer that the investigation had been concluded without prosecution. christine chung, a new york lawyer who worked with comey when he was the top federal prosecutor in manhattan, described him last year as ever "determined to do the right thing." the criticism he's faced over the email disclosure, she added, is a "lesson for why good people shouldn't go to washington." ___ associated press writer eric tucker contributed to this report. follow nancy benac on twitter at: http://twitter.com . copyright 2017 the associated press. all rights reserved. 
this material may not be published, broadcast, rewritten or redistributed. us may send patriot missile to lithuania amid moscow threat u.s. defense officials said a long-range patriot missile battery may be deployed to the baltic region later this year as part of a military exercise. the move, if finalized, would be temporary but signal staunch u.s. backing for baltic nations concerned about the threat from russia. u.s. defense secretary jim mattis on wednesday declined to confirm the specific deployment, but said, "we are here in a purely defensive stance. everyone knows this is not an offensive capability. for anyone who says otherwise, i would just say i have too much respect for the russian army to think that they actually believe there's any offensive capability." at a news conference with lithuania president dalia grybauskaite , mattis said the u.s. "will deploy only defensive systems to make certain that sovereignty is respected. the specific systems that we bring are those that we determine necessary." asked about a potential patriot deployment, grybauskaite would only say that "we need all necessary means for defense and for deterrence, and that's what we will decide together." u.s. officials said the patriot surface-to-air missile system could move into the region during the july air defense exercise, but it would be gone by the time a large russian military exercise begins in august and september. they said there will be a u.s. component to the air defense exercise, adding that the u.s. is not considering any long-term change to its air defense status in the region. the potential placement of the patriot system in lithuania has been discussed since last fall. it would be the first time the system has been deployed in the baltics. the officials said the u.s. will keep a close eye on the russian exercise, called zapad, which will take place in russia's kaliningrad territory and western section of the country. they said the u.s. 
will have an enhanced presence in the region at that time to monitor whether russia uses the exercise as an opportunity to mass troops and equipment there and leave some behind when it's over. the officials, who were not authorized to discuss the matter publicly and spoke on condition of anonymity, said russia could have as many as 100,000 troops in the region for the exercise. lithuania, which borders kaliningrad, has deep concerns about the russian threat on its eastern flank. grybauskaite said mattis understands the challenges and threats facing lithuania. nato has strengthened its military support for the baltic region. last july the alliance decided to deploy four multinational battalions in poland and in the three baltic states — latvia, lithuania and estonia — this year as a response to growing russian military activity. those countries have been increasing the size of their military forces, buying new weapons systems and improving their artillery forces. also wednesday, mattis toured the pabrade training area, northeast of vilnius near the belarus border, where a german battle group has been based as part of the nato effort. he walked along displays of tanks and soldiers, asking commanders about how well the forces are getting along together and with their allies. the u.s. has an armored brigade in poland. mattis is in lithuania as part of a three-country trip, including stops in denmark and london. copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. ap releases in-depth review of its coverage of nazi germany the associated press has conducted an in-depth review of its operations in nazi germany, concluding that the news agency acted as "forthrightly and independently as possible." but the review also found ap handled some situations inadequately. 
the review was undertaken after an article published last year contended that the ap allowed nazi propagandists to exert some influence over its news photo report in the 1930s by maintaining a photo subsidiary in germany, registered under a restrictive nazi press law. the author, historian harriet scharnberg, also identified ap german photographers who were drafted into or joined nazi military propaganda units during world war ii, some while still being paid by ap. ap's review disputed scharnberg's conclusion that the news agency was in any way complicit with the nazi regime during the years 1933-41, when the agency was present in the country. the ap was kicked out of germany when the united states entered world war ii in december 1941. "we recognize that ap should have done some things differently during this period, for example protesting when ap photos were exploited by the nazis for propaganda within germany and refusing to employ german photographers with active political affiliations and loyalties," the report says. "however, suggestions that ap at any point sought to help the nazis or their heinous cause are simply wrong," it adds. "due in large part to the ap's aggressive reporting, the dangers of the nazis' ambitions for domination in europe and its brutal treatment of its opponents were revealed to the wider world." the report spells out instances in which ap editors clashed with nazi censors and also demanded that stronger steps be taken to keep the ap german photo service free of nazi propaganda. it also cites ap reporting in the 1930s that alerted readers in the united states to the acts of anti-semitism and cruelty of the nazi regime both in words and photos. ap executive editor sally buzbee said the ap's coverage of nazi germany reflected its core newsgathering principles.
"it is essential to cover tyrannical regimes and other undemocratic movements, when possible from within the borders they control, in order to accurately relay what is happening inside," she said. "that is what we do, without compromising ap's independence or standards." "ap believes it is important to know one's own story — warts and all — and so we have re-examined the period, taking a hard look," says the report's introduction, written by john daniszewski, ap's vice president and editor at large for standards. the report was written by larry heinzerling, an adjunct assistant professor at the columbia graduate school of journalism and retired ap deputy international editor, with contributions by ap investigative researcher randy herschaft. research began more than a year ago with a review of previously unexamined ap archives. that review was then extended to other records — including u.s. military documents, and the oral histories and personal papers of deceased employees. scharnberg also was interviewed. the report notes that louis p. lochner, the ap's berlin bureau chief from 1928-41, was awarded the pulitzer prize in 1939 for his comprehensive coverage of the nazi regime, including the nazis' anti-semitic policies and actions. a german-american and former world war i peace advocate who personally despised the nazis, lochner was aware that some critics back home viewed him as pro-nazi, particularly when he was covering nazi military victories in the first years of world war ii. among the report's key findings for the period 1933-41: —the ap's german photo service, established as a subsidiary in 1931, provided photos to german media after nazis took power in 1933. the nazis quickly brought the ap german photo service and all other german media companies under the supervision of the propaganda ministry.
while ap management insisted that its german photo service production stay neutral, german staff members faced constant pressure from propaganda ministry officials about the ap's photo output, "with some doing a better job of resisting nazi demands than others." —ap's photo captions when they appeared in german media often were rewritten or published under misleading or offensive headlines. while the ap protested and fought against nazi attempts to censor the ap itself, the review found no evidence that ap protested these abuses by pro-nazi media. current ap practice requires a strong response when ap customers willfully distort the meaning of ap content. —after resisting for two years, the ap in late 1935 submitted to an anti-semitic edict that all people working in german media must be of german "aryan" origin. ap's german photo service let go six employees considered jewish by the nazis, while helping them to find work elsewhere. "the ap made the difficult decision to comply because it believed it was critical for ap to remain in germany and gather news and photos during this crucial period," the report said. with ap's aid, all of these employees emigrated and survived the holocaust. —ap's berlin-based american reporters and german photographers covered the first part of world war ii from 1939-41 from the german side of the battle lines. the united states had not yet entered the war but some of this coverage was criticized from within the u.s. embassy in berlin as channeling german official views and disinformation; ap executives in new york assessed the accusations and rejected the criticism, stating that ap reports reflected events as seen by the reporters. —a few of ap's german employees held pro-nazi views and covered the german side of the war enthusiastically. 
one staff and then freelance photographer employed by the ap german service was austrian-born franz roth, an ardent nazi who traveled as a war photographer with the waffen ss to several fronts before and after the ap's expulsion from germany. he died as a combat photographer in 1943. —after 1939, the german government drafted several ap german photo service employees to serve with propaganda units accompanying troops to cover the fighting, requiring that the resulting photos be pooled for use by german media while their salaries still were paid by ap germany. ap management at the time believed their photography had news value in spite of the restrictions caused by traveling with german forces. among the report's key findings for the period 1942-45: —with the u.s. entry into the war against germany in december 1941, ap's american staff members were arrested and interned for five months before being deported in a prisoner exchange. the ap german picture service was seized, handed over to the german foreign ministry and put under control of a waffen ss photographer, helmut laux. most german former ap personnel were forced into laux's operation; others were sent to military units. —in an arrangement reached in neutral portugal in 1942 between laux and the local ap correspondent, laux's operation gathered and sent regular packets of german-censored photos from germany and german-occupied europe to ap's new york and london office via lisbon. in exchange, with the knowledge and approval of u.s. wartime officials, ap sent photographs from the u.s. to neutral countries for ultimate distribution inside germany. the exchange was approved by ap's new york headquarters and ap annual reports at the time made public that the ap was receiving photos from nazi-german-occupied areas. 
with one known exception, the ap report says, the ap images that appeared in german publications through this arrangement during the war were unaltered by the germans, but captions were rewritten by german propagandists to conform to official nazi views. according to the report, ap's management in new york considered obtaining the german photos an important way to fulfil its mission to cover the war as comprehensively as possible. "although the exchange necessitated dealing with the nazi regime, it was the ap's belief then and now that the photos gave the u.s. public a much fuller picture of the war than could have been obtained otherwise," says the ap report. "that included scenes of fighting on the russian front, the results of bombings of german cities and germany's falling war fortunes." ___ online: https://www.ap.org/about/history/ap-in-germany-1933-1945/ copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. russian opposition leader undergoes eye surgery after attack russian opposition leader alexei navalny has undergone eye surgery in spain after being attacked last month. navalny suffered a severe chemical burn in his right eye last month when an attacker doused him with green antiseptic. navalny's supporters identified the attacker as a pro-government activist, but police haven't made any arrests. navalny wrote on instagram on tuesday that he had been operated on the previous day at a barcelona clinic and that doctors expect the vision in his right eye to be restored in several months. after being denied travel documents for five years, navalny, who is serving a five-year suspended sentence in a dubious embezzlement case, was issued a passport to travel last week. 
navalny, who built a reputation with his investigation into official corruption, spearheaded anti-government rallies in march, russia's largest and most widespread in years, and has called on his supporters to protest again in june. the kremlin on wednesday published a decree signed by president vladimir putin boosting security ahead of the world cup in 2018 and during the confederations cup that will be held in russia in june and july. the tightened security measures mean that all protest rallies will be banned in the cities holding the competitions unless they receive permission from authorities and security services — a major blow to navalny's plans to hold rallies across russia on june 12. on wednesday, navalny tweeted that the june protest should go ahead. "the basic constitutional rights of russian citizens cannot be abolished and scrapped by presidential decrees," he said. "no 'special procedure for the approval of rallies' set by putin will not stand in the way of the anti-corruption rallies on june 12." ___ a previous version of this story has been corrected to show that navalny received his passport last week, not last year. copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. hillary clinton to speak at book publishing convention hillary clinton is coming to next month's publishing convention. the former presidential candidate, secretary of state and u.s. senator will speak june 1 at bookexpo, the industry's annual national gathering, convention officials told the associated press on wednesday. the hour-long event is being billed as "an evening with hillary rodham clinton." it will take place at the jacob javits center in new york, site for bookexpo. officials declined to say whether clinton will give a speech and/or will be interviewed onstage. clinton will likely discuss the book of essays she has planned for september. 
the book, currently untitled, is expected to touch upon her loss to donald trump in the 2016 election. clinton was first lady when she spoke at the 1995 convention to promote her book "it takes a village." copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. mytowns headlines barker find your town page manchester 9:40 pm no tax hike in manchester's adopted budget hartford 9:10 pm hartford moves closer to bankruptcy, soliciting proposals from law firms hartford 8:18 pm state pledges $1 million to hartford's bowles park redevelopment california's state board of education will consider how to satisfy an obama-era law under the rule of trump betsy devos k-12 may 10, 2017, 6:00 a.m. california's state board of education will consider how to satisfy an obama-era law under the rule of trump joy resmovits (alex wong / getty images) the time has come for the california state board of education to formulate its plan for satisfying the every student succeeds act , the obama-era replacement of no child left behind. this change will be the major topic of discussion at the board meeting on wednesday and thursday. where no child used a stringent system to reward and punish schools for their performance on test scores, essa, as it's known, gives states much more leeway in deciding how to hold schools accountable for good performance.  with the trump administration in office — and an education secretary who insists that states and school districts do much of the decision-making — states will get even more freedom than they had expected. trump in march  signed a bill that trashed obama's  rules  for essa state compliance. the state board wrote in the introduction to its draft plan that trump has provided "maximum flexibility" for the state to create its own policies for managing about $2.6 billion in federal money. 
the plan, titled "the california way," the board wrote, “has been written to meet, not exceed, federal requirements.” but how much flexibility is too much? the state's plan already has been criticized by  some advocates , who say the draft focuses on equity in word but not in deed. at bare minimum, essa requires that states identify their lowest-performing 5% of high-poverty schools, as well as high schools with persistently low graduation rates, and help them improve. the draft is vague on its prescribed interventions, and largely relies on the california school dashboard — the new school rating tool — to set goals for school performance. the state has until september to submit its plan. stay tuned for our coverage of this debate.  the board will also discuss: changes to its contract with standardized testing vendors. several requests from charter schools.  potentially dropping riverside county as the fiscal agent for the california collaborative for educational excellence. you can watch the meeting live  here . latest updates outcry over netflix films prompts cannes to change rules after a backlash over programming netflix films, the cannes film festival says it will, beginning next year, only accept theatrically released films for its prestigious palme d'or competition. in a statement wednesday, the french festival said it has adapted its rule to require films in competition to be distributed in french movie theaters. the festival said it wanted to "reiterate its support to the traditional mode of exhibition of cinema in france and in the world." cannes this year for the first time selected two films in its official competition from netflix: noah baumbach's "the meyerowitz stories" and bong joon ho's "okja." in france, films that don't obtain theatrical release are prohibited from streaming or subscription video on demand for three years. 
on tuesday, france's national federation of films distributors said the netflix films at cannes were "endangering a whole ecosystem." copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. liveblog header for la-essential-education-updates-southern-california-2017 essential education: l.a. unified restates its tough stand on immigration enforcement may 10, 2017, 6 a.m. welcome to essential education, our daily look at education in california and beyond. here's the latest the l.a. board of education unanimously passed a new set of policies that says clearly: immigration officers, stay off school campuses. uc tweaks its proposal to limit the percentage of out-of-state undergraduates. all updates betsy devos california state university charter schools community colleges for parents higher education hs insider k-12 lausd university of california cultures celebrated at buttonball lane school buttonball lane school students, and their families, got to sample different cultures from around the world on april 26 at the school's annual international night. desserts and snacks from dozens of different countries lined one of the hallways. the gymnasium was filled with interactive crafts stations, each one highlighting a different country's culture. andrinka stamps from ghana, panda crafts from china and babushka pencil toppers from russia were among the offerings, as well as a quiz to determine if a person could find the differences between american and british english. in one classroom, people were able to get henna tattoos from india, provided by parent volunteers. several doors down, a room was filled with stations where children could view egyptian collar jewelry, and then create their own. the centerpiece of the evening was the multicultural performance stage. 
acts included irish step, greek and bollywood kathak dancers, chinese, hindustani and arabic singers, and international chamber music selections from around the world, performed by buttonball students and alumni. everything was generated by buttonball students, parents and staff, and coordinated by the event's co-chairs, angie elawa and carrie wechsler, and a committee of teachers. "we have a wonderful teachers who are part of our committee every year, that are amazing," wechsler said, adding that the event evolved from a means to welcome families from around the world, into a celebration of all of the cultures that come together to form buttonball's community. "it's just a chance for the children to learn about other countries and cultures," wechsler said. some of the teachers also gear lesson plans toward the event. "international night is one of my favorite nights at buttonball," said the school's principal kent hurlburt. "it's a showcase of the many cultures we celebrate at our school, through the lens of food, music, dancing, crafts, singing, and more. i am filled with pride and amazement at the many ways our families come together on this night to celebrate and learn about one another." for more photos, visit courant.com/community/glastonbury. essential education: l.a. unified restates its tough stand on immigration enforcement for parents k-12 lausd may 10, 2017, 5:30 a.m. l.a. school board makes it clear: immigration officers won't be welcomed on campuses howard blume (al seib / los angeles times) the los angeles school board on tuesday unanimously approved a set of policies that board members said would provide families with a higher level of protection from federal immigration raids. among the safeguards in the sweeping set of guidelines: no immigration officers will be allowed on campus without clearance from the superintendent of schools, who will consult with district lawyers. 
until that happens, they won't be let in, even if they arrive with a legally valid subpoena. read more for parents k-12 lausd may 9, 2017, 5:41 p.m. no vote yet on contract for l.a. schools' one-stop online enrollment system howard blume l.a. schools supt. michelle king (al seib/los angeles times) los angeles school officials on tuesday pulled back from a plan to quickly install a one-stop online student enrollment system in the nation's second-largest school system. the apparent problem was that supt. michelle king didn't have a solid majority on the board of education to approve the purchase of the necessary technology. as a result, officials quietly removed a vote on the $24-million, three-year contract from the agenda just before the meeting. the effort to match families with academic programs and draw in new enrollment has become a major early initiative of king, who took office about 15 months ago. she would like some features of the enrollment system ready by the fall. that timeline is now in doubt. a unified enrollment system would allow parents to research all education options for their children at one online location and then fill out just one application form. getting to that point would take several years, although some elements, including a search tool to find descriptions of schools and programs, could have been ready in several months. members of the board of education signaled some discomfort with the plan last week, when senior staff presented the concept at a board committee meeting. some board members may not go along if charter schools are included; others might balk if charters are excluded. charters are privately operated and exempt from some rules that govern traditional campuses. they also are growing in numbers and enrollment, which exacerbates financial strains on l.a. unified. for king and other backers, unified enrollment is one way to try to win students back and otherwise increase enrollment. 
last week, however, staff members working on the project could not cite an example of a district in which unified enrollment had brought students back from charter schools. they said it's difficult to find a district to compare with l.a. unified, the nation's second-largest. l.a. unified has more charter schools and more charter students than any other system, although they still account for only about 16% of district students. for now, charters are not part of unified enrollment. board members last week also expressed concerns about the potential for another technology fiasco, as happened in recent years with an ipads-for-all plan and a student records system. for parents higher education k-12 lausd university of california may 10, 2017, 5:00 a.m. l.a. unified's stand on raids, the school board race's home stretch, unified enrollment: what's new in education today joy resmovits (christina house / for the times) in and around los angeles: l.a. unified reasserts its commitment to standing up to federal immigration raids. l.a. school board candidates are responding to voters'  — and students' — concerns in the home stretch of the may 16 vote. the school board stepped away from  a plan to quickly install an online student enrollment portal.  in california: former san diego unified and long beach unified superintendent carl cohn says it's time for california's schools to meet the needs of all immigrant students . a look back at how chinese immigrants fought to win access to san francisco's schools. the university of california changed its proposal to limit out-of-state undergraduates. nationwide:  the college board's khan academy practice tools boost scores, a new study found. students at the parsons school of design at the new school are working on clothing designs  for people with disabilities. higher education university of california may 9, 2017, 5:24 p.m. 
uc revises its plan to limit the share of spots going to out-of-state students teresa watanabe students at uc berkeley's moffitt library (david butow / for the times) the university of california, aiming to end fighting over how many out-of-state students it admits, on tuesday announced a revised proposal to limit non-californian and international undergraduates. under the proposal, uc would restrict the percentage of nonresident students to 18% at five of its nine undergraduate campuses. uc berkeley, ucla, uc san diego and uc irvine — whose proportion of nonresident students exceeds 18% — would be allowed to keep, but not increase, those higher percentages. the new plan is a retreat from the proposal for a 20% systemwide cap on nonresident students that university officials presented to the uc board of regents in march. the cap, which would have been the first of its kind, drew so much dissension from faculty and lawmakers that it was pulled from action and a vote was delayed until this month. read more
higher education may 10, 2017, 3:26 a.m. these black colleges in atlanta are some of hollywood's best kept filming secrets tre'vell anderson (morehouse college / 20th century fox) on a red clay hill in the heart of atlanta, hundreds of black men saunter up and down morehouse college's brown street on their way to classes. it's a late march afternoon and the magnolia trees, fuchsia and white pansies bloom into a sweet, tea-tinged breeze. to the unfamiliar eye, morehouse seems like many other colleges. but it's not. most schools are not used as filming locations for big-budget hollywood productions.
morehouse, however, has been the setting for two in just a single year, the oscar-nominated “hidden figures” and bet's “the quad.” read more k-12 lausd may 9, 2017, 10:15 a.m. at forums for l.a. school board races, concerns vary by crowd howard blume l.a. unified school board candidates nick melvoin, from left, imelda padilla, steve zimmer and kelly gonez (on screen) answer questions at a united way forum. (gina ferazzi / los angeles times) in the home stretch before the may 16 vote, two very different audiences focused on very different concerns at campaign forums for pivotal los angeles school board contests. the first, sponsored by  united way  of greater los angeles, handed over the gavel and much of the organizing to its young civic leaders program. there were probably few voters among the packed crowd of about 200 on saturday, but students' questions came from their direct experience in schools. a central goal was to get candidates to commit to putting more counselors on campuses. two days later, when parents and community members gathered at palisades charter high school, a key theme was privately operated public charter schools and their status within the  los angeles unified school district . the unavoidable, underlying campaign dynamic was charter advocates versus the teachers union, which, together, have spent millions of dollars to stuff mailboxes, make calls and pound on doors. read more charter schools for parents k-12 lausd may 9, 2017, 9:48 a.m. l.a. school board will weigh $24-million contract for unified enrollment system howard blume supt. michelle king during a visit to windsor hills elementary school. (christina house / for the times) the los angeles board of education on tuesday is scheduled to vote on a contract for an online enrollment system that is supposed to match families with schools and draw in new enrollment. the effort has become a major early initiative of supt. michelle king, who took office about 15 months ago. 
she wants some features of the enrollment system ready by the fall. a district team has been working on the project for months, but tuesday's meeting is key because king is asking the school board to approve a $24-million, three-year technology contract to build the system. there will be other costs as well, including providing staff to help parents use the search and application software at campuses. a unified enrollment system would allow parents to research all education options for their children at one online location and then fill out just one application form. getting to that point will take several years, although some elements, including a search tool to find descriptions of schools and programs, could be ready in several months. not all members of the board of education were won over as of last week, when senior staff presented the concept at a board committee meeting. some board members may not go along if charter schools are included; others might balk if charters are excluded. charters are privately operated and exempt from some rules that govern traditional campuses. they also are growing in numbers and enrollment, which exacerbates financial strains on l.a. unified. for king and other backers, unified enrollment is one way to try to win students back and otherwise increase enrollment. last week, however, staff members working on the project could not cite an example of a district in which unified enrollment had brought students back from charter schools. they said it's difficult to find a district to compare with l.a. unified, the nation's second-largest. l.a. unified has more charter schools and more charter students than any other system, although they still account for only about 16% of district students.   for now, charters are not part of unified enrollment. board members last week also expressed concerns about the potential for another technology fiasco, as happened in recent years with an ipads-for-all plan and a student records system. 
k-12 may 9, 2017, 8:09 a.m. houston has become the most diverse city in the u.s. brittny mejia (gary coronado / los angeles times) the members of the margaret long wisdom high school soccer team hail from central america, mexico, africa and points between. its bench hums with spanish, kinyarwanda, swahili and often english. but its real unifying language — soccer, played hard — is universal. the high school is in southwest houston, a city whose stunning growth and high-volume immigration have turned it into the most racially and ethnically diverse major metropolis in the country, surpassing new york in 2010. houston — with a black, democratic mayor and a powerfully pro-immigrant population — has potentially become one of the battlefronts in texas over the city's “don't ask” policy, which prohibits police from inquiring about the immigration status of a person who hasn't been arrested. read more higher education university of california may 8, 2017, 8:09 p.m. racial tensions help drive ucla student election upsets teresa watanabe chloe pan, left, divya sharmar and sayron stokes all won positions during recent ucla student elections. (luis sinco/los angeles times) at ucla, the furor started with a photo of the undergraduate student body president, making a hand sign associated with the bloods. danny siegel is white. he was wearing a suit and tie. many african american students were angered by what they saw as a man of white privilege mocking their community and clueless about the poverty and despair that drive some in it into gangs. there were those, of course, who said to chill out, the photo was a joke.
but anger over the image appears to have contributed to the  stunning defeat  of siegel's campus party in last week's undergraduate student elections — and the intensity of the reaction was the latest sign of discontent among many university of california students of color who believe that administrators and some fellow students continue to slight them and to discount their needs. read more charter schools higher education k-12 university of california may 9, 2017, 5:00 a.m. nicki minaj's tuition gift, delaine easton's gubernatorial bid, paul ryan's charter visit: what's new in education today joy resmovits (kevin winter / getty images) in and around los angeles: a photo of the ucla student body president sparked furor — and drove an election upset. meet the usc student body president who was the first in his family to attend college. the teenage boy who was killed outside a san diego high school left a suicide note. in california: a california bill targeting junk food  could end the box tops for education program. former state schools chief delaine easton talked to kpcc about why she's now running for governor. she's the only woman in the race. nationwide: a fan tweeted at nicki minaj , asking for help covering college tuition. the singer said yes. house speaker paul d. ryan apparently is  planning a visit  to eva moskowitz's success academy, a high-profile harlem charter school. higher education k-12 may 8, 2017, 1:35 p.m. nicki minaj pays fans' tuition and school expenses in a burst of generosity on social media christie d'zurilla (dimitrios kambouris / getty images) nicki minaj was handing out money on twitter this weekend, specifically for college tuition, student loans, books and other supplies.  the singer was tweeting about a contest related to the billboard music awards when she came across a tweet from someone asking her to pay their tuition. read more higher education may 8, 2017, 9:22 a.m. 
campus conversation: edwin saucedo, usc student body president rosanna xia edwin saucedo speaks during usc's first generation summit in february. (gus ruelas / usc) when edwin saucedo first set foot on campus, he doubted his usc classmates would consider him a peer, let alone someone “good enough” to be a voice for all 19,000 undergraduates.  as he prepares to don his cap and gown this weekend, he reflects on a changing campus and the issues facing higher education today. read more betsy devos charter schools higher education k-12 lausd may 8, 2017, 9:02 a.m. balancing homework with immigration fears, trump's hbcu confusion, a teen with options: what's new in education today joy resmovits daniel garcia, 17, inside his bedroom at his grandmother's home in el sereno. (genaro molina / los angeles times) in and around los angeles: a santee education complex student of modest means earned himself a wealth of college options. local teens balance high school with the adult responsibility of preparing for their parents' potential deportation. young men with autism learn how to interact with police officers. usc's student president, the first in his family to go to college, talks about what drove him to leadership. in california: the state made a much-publicized turn away from punitive discipline measures. but 40% of teachers told their union they haven't been trained in alternative methods. khan academy materials could officially make their way into your southern california classroom. nationwide: a deep look at how school choice leads to the sorting of new york's students by race and class. president trump made a confusing statement about funding historically black colleges, and now he and betsy devos are defending their support of such institutions. higher education k-12 lausd may 8, 2017, 7:42 a.m. america's top universities wanted him. how this l.a. 
teen of modest means earned a wealth of options steve lopez (genaro molina / los angeles times) the first response arrived in february, from  cal state bakersfield . “it's a good feeling when you open a letter and it says, ‘congratulations,'” says noe martinon. it was the first of many. martinon, 18, is a senior at  santee  education complex, south of the santa monica freeway and downtown l.a. martinon's parents went only as far as sixth grade in mexico. victor martinon is a janitor at a real estate company, irma palma is a seamstress in a clothing factory. read more k-12 may 8, 2017, 7:36 a.m. in l.a., teens balance high school with planning in case their parents are deported sonali kohli maria garcia, 18, looks out over macarthur park, an area she used to visit when she went to school nearby. (genaro molina / los angeles times) it was hard not to eavesdrop in the tiny pico-union studio where maria garcia grew up. she was around 9 when her father came home one day from his low-wage job as a garment worker and told her mother about the immigration raid at his downtown l.a. factory. she could hear their relief that her father hadn't been found. garcia, who is now a high school senior, is one of many thousands of teenagers who were born in the u.s. to parents who are in the country illegally. a  2013 usc analysis  found that about 16% of children in los angeles county were u.s. citizens with at least one parent without legal status. in 1999, garcia was one of about 215,000 children born in the u.s. to immigrants in the country illegally,  according to the pew research center . read more k-12 may 8, 2017, 7:28 a.m. teenage boy killed by san diego police after pointing a bb gun at officers called 911 moments earlier cindy chang and maya lau (howard lipin / san diego union-tribune) the 911 caller asked police to check on a 15-year-old boy standing outside of torrey pines high school. 
the boy was wearing a gray shirt and black pants, had a medium build and was not armed, the caller said. it was about 3:30 a.m. saturday — an unusual time for someone to be at the school, located in a wealthy san diego community a few miles from the scenic pacific coastline and world-famous torrey pines golf course. read more k-12 may 8, 2017, 4:41 a.m. reporting from abuja, nigeria 82 freed chibok schoolgirls arrive in nigeria's capital associated press (sunday aghaeze / associated press) the 82 freed chibok schoolgirls arrived in nigeria's capital on sunday to meet president muhammadu buhari as eager families awaited an official list of names and looked forward to reuniting three years after the mass abduction. the newly released girls arrived at the abuja airport and were met by buhari's chief of staff, presidential advisor femi adesina said. read more k-12 may 8, 2017, 3:39 a.m. anatomy of a tragedy: police say boy with bb gun fatally shot by officers at high school kristina davis and dana littlefield (howard lipin / san diego union-tribune) san diego police  are investigating after two officers fatally shot a 15-year-old boy saturday morning as he stood in front of torrey pines high school. police say he was holding a bb gun. here's how authorities say the shooting went down: read more higher education may 5, 2017, 11:32 a.m. penn state fraternity and 18 of its members are charged in student's death associated press jim and evelyn piazza listen as centre county dist. atty. stacy parks miller announces the results of an investigation into the death of their son timothy piazza. (joe hermitt / associated press) a  penn state university  fraternity pledge had toxic levels of alcohol in his body and was badly injured in a series of falls, authorities said friday in announcing criminal charges against 18 members of the fraternity and the fraternity itself. centre county dist. atty. 
stacy parks miller said a grand jury investigation, aided by security camera video from the beta theta pi chapter house, found that friends failed to get help for 19-year-old timothy piazza before his death in february. the grand jury said their actions in some cases may have worsened his injuries. read more liveblog barker: la-essential-education-updates-southern-california-2017-barker-blurb live blog barker essential education: l.a. unified restates its tough stand on immigration enforcement updates » essential education: l.a. unified restates its tough stand on immigration enforcement » 5:30 a.m. l.a. school board makes it clear: immigration officers won't be welcomed on campuses 5:41 p.m. no vote yet on contract for l.a. schools' one-stop online enrollment system 5:00 a.m. l.a. unified's stand on raids, the school board race's home stretch, unified enrollment: what's new in education today 5:24 p.m. uc revises its plan to limit the share of spots going to out-of-state students florida masseur accused of inappropriately touching a teen a florida masseur has been charged with inappropriately touching a teenage girl during a massage. news outlets report 29-year-old orlay palacio was arrested monday and charged with lewd or lascivious conduct on someone under 18. according to a police report, palacio touched the 17-year-old while she and her 18-year-old cousin were at a massage and stretch center in davie on may 3. davie police wrote in a report that palacio would periodically touch her vagina during the massage. police say palacio later removed her underwear and tried to perform oral sex, but the girl refused. it's unclear if palacio has an attorney. copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. 
1 dead, 3 injured in police-involved shooting in bridgeport authorities say a connecticut police officer killed a man and wounded another after the driver of a stolen car struck at least one officer at the end of a car chase. two officers were injured. state police say bridgeport officers stopped the stolen car tuesday afternoon after a pursuit. officials say when officers approached the car, the driver sped up in reverse and struck at least one officer. another officer opened fire, killing the driver and wounding a passenger. authorities say the passenger was brought to a hospital with non-life threatening injuries. two city officers were treated for minor injuries at a hospital. the names of the driver and officers have not been released. police identified the passenger as 21-year-old julian fyffe, of bridgeport. state police are investigating. copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. travel trips and deals: trans-atlantic cruise won't sink your budget here are some of the more interesting events, deals, websites and other travel tidbits that have come across our desk recently: •a 14-night trans-atlantic cruise with an ocean-view cabin for as little as $1,999 per person, double occupancy, is a great deal. throw in free air returning to tampa, fla., and it's an amazing deal. the online vacation center is offering the may 2018 package that includes port calls in key west, fla.; la palma and tenerife in the canary islands; and malaga, spain, before docking in barcelona, spain. it also includes two nights' hotel in barcelona. air add-ons are available from other departure points, including $300 from new york and $350 from chicago. 800-780-9002, http://tinyurl.com/kklsqmm •spring into summer rates are being offered by the seven disney springs resort area hotels in florida for stays through july 31. rates are as low as $79 per night. 
http://tinyurl.com/kx8qznw •sea island, a five-star resort on georgia's coast, is holding a storyteller's weekend june 16-17. three noted authors will discuss their approach to novel writing, and aspiring writers will have a chance to get their work critiqued. http://tinyurl.com/lh4bsth •timberline adventures offers more than 80 hiking and biking tours in the u.s., canada and great britain. 800-417-2453, http://tinyurl.com/kg4rgln •magnificent macaws will be taking to the air at the indianapolis zoo beginning may 27. seven species of the colorful birds have been undergoing training at the zoo and will fly overhead several times a day as well as being on exhibit. http://tinyurl.com/kodtyzm •abbey road on the river, billed as the world's largest beatles-inspired music festival, will be may 25-29 in jeffersonville, ind., just across the river from louisville, ky. more than 50 acts from around the world will perform on multiple stages, including herman's hermits, starring peter noone. http://tinyurl.com/kxx3lkv •the michigan antique festival & classic car show will be june 3-4 in midland. there will be more than 80 acres of antiques and collectibles from nearly 1,000 dealers of "treasures & memories." the car show will feature antique and classic vehicles, a swap meet and cars for sale. http://tinyurl.com/ka4bx5d •the outta sight kite flight will be june 3-4 on the lakefront in kenosha, wis. this is the 15th year for this event that features professional sport kite performances as well as a kite-flying school. http://tinyurl.com/lbnqqb9 •the movement electronic music festival will be may 27-29 at hart plaza in downtown detroit. the event will feature more than 100 acts, including carl cox, the belleville three, adam beyer, barclay crenshaw, danny brown and dixon. http://tinyurl.com/jotukb8 •the summer camp music festival, may 26-28 in chillicothe, ill., will have more than 100 bands performing on seven stages. among the headliners are umphrey's mcgee and moe. 
this is the 17th year for the fest, which draws more than 20,000 music fans. http://tinyurl.com/yatey6p •step back into history on two weekends during the iowa renaissance festival, may 27-29 and june 3-4 in middle amana. this is the 26th year for the fest, which includes period entertainment, such as jousting, swordplay, magic, music and more. there will also be lots of food and vendors. http://tinyurl.com/kpcecnl deals and websites have been checked for availability as of press time. listings are not endorsements. send tips at least a month in advance to chicagotribtravel@gmail.com . phil marty is a freelancer. arrested journalist says he was only trying to do job police said a west virginia journalist was arrested after yelling questions at u.s. health and human services secretary tom price. price and senior white house aide kellyanne conway visited the state capitol in charleston on tuesday to learn about efforts to fight opioid addiction in a state that has the nation's highest overdose death rate. capital police said in a criminal complaint that daniel ralph heyman, 54, was yelling questions at the two. it says he tried to breach secret service security and had to be removed from a hallway at the capitol. he was charged with willful disruption of governmental processes, a misdemeanor. heyman, who works for public news service, said he was arrested after asking repeatedly whether domestic violence would be considered a pre-existing condition under the proposed health care overhaul. "i'm not sure why, but at some point, i think they decided i was just too persistent in asking this question and trying to do my job and so they arrested me," he said during a news conference that was posted on facebook by the american civil liberties union. his attorney, tim depiero, called it a "highly unusual case." he said heyman's only intent was to ask a couple of questions. "i've never had anyone get in trouble criminally for talking too loud," he said. 
"we just don't understand why he got arrested. it just seems way over the top." copyright 2017 the associated press. all rights reserved. this material may not be published, broadcast, rewritten or redistributed. west hartford hosts cultural celebration hello! west hartford hosted its sixth annual cultural celebration, on april 24 at the town hall. organized by the nonprofit, the celebration consisted of performances by different culture groups. different organizations and groups also presented at booths around the auditorium. PK!}U U *cranial/models/tests/test_gensim_models.pyimport unittest import sys sys.path.append('.') # in case file is run from root dir from cranial.models.gensim_models import GensimLDA, GensimLSI, GensimTFIDF, GensimDictionary class TestGensimModels(unittest.TestCase): def setUp(self): # a list of 3 BOW documents, format: (token_id, count) self.inputs = [ [ [0, 2], [1, 2], [2, 3], ], [ [3, 1], [4, 4], ], [ [4, 1], [3, 1], ] ] lda_params = dict(num_topics=2, workers=None, chunksize=2000, passes=50, batch=False, alpha='symmetric', eta=None, decay=0.5, offset=1.0, eval_every=10, iterations=50, gamma_threshold=0.001, random_state=137, minimum_probability=0.01, minimum_phi_value=0.01, per_word_topics=False) id2word = {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'} self.lda = GensimLDA(lda_params=lda_params, id2word=id2word) self.lda.train(self.inputs) def test_gensim_lda_train(self): res = [(itm[0][1] > itm[1][1]) * 1 for itm in self.lda.itransform(self.inputs)] actual = [res[1] == res[2], res[0] == res[1]] expected = [True, False] # just the fact that it runs is good enough # but first and last document should belong to the same cluster self.assertListEqual(actual, expected, "second and third document should belong to the same cluster and " "different from the first one") def test_gensim_lda_topics(self): actual = self.lda.state.topic_names expected = ['e d', 'c a b d e'] self.assertListEqual(actual, expected, "should concatenate topic terms") def 
test_gensim_lda_keywords(self): actual = self.lda.state.token2topics expected = {'a': ['e d', 'c a b d e'], 'b': ['e d', 'c a b d e'], 'c': ['e d', 'c a b d e'], 'd': ['e d', 'c a b d e'], 'e': ['e d', 'c a b d e']} self.assertDictEqual(actual, expected, "every token should be related to every topic...") if __name__ == '__main__': unittest.main() PK!v cranial/models/tests/test_nlp.pyimport unittest import sys sys.path.append('.') # in case file is run from root dir from cranial.models.nlp import BasicDictionary def fetcher(): return ('this is a test sentence' for _ in range(3)) class TestNLPModels(unittest.TestCase): def test_BasicDictionary_train_no_filter(self): inputs = [ ['b', 'b'], ['b', 'c', 'd', 'c', 'd'], ['a', 'b', 'c'] ] d = BasicDictionary(no_below_raw=0, no_above_raw=1, max_num_raw=10, no_below=0, no_above=1, max_num=10, filter_at=10, token_is_tuple=False, protected_tokens=None) d.train(inputs) actual = {attr: getattr(d.state, attr, None) for attr in ['doc_frequency', 'frequency', 'id2token', 'token2id', 'size']} expected = { 'doc_frequency': {'b': 3, 'c': 2, 'd': 1, 'a': 1}, 'frequency': {'b': 4, 'c': 3, 'd': 2, 'a': 1}, 'id2token': ['b', 'c', 'd', 'a'], 'token2id': {'b': 0, 'c': 1, 'd': 2, 'a': 3}, 'size': 4 } self.assertDictEqual(actual, expected, "should count document and overall frequencies; " "create id<->token maps; " "count vocab size") # TODO: make more tests for dictionary if __name__ == '__main__': unittest.main()PK!e-cranial/models/tests/test_spacy_tokenizers.pyimport unittest import sys sys.path.append('.') # in case file is run from root dir from cranial.models.spacy_tokenizers import SpacyWrapper from cranial.re_iter import ReGenerator, ReMap def fetcher(): return ('this is a test sentence' for _ in range(3)) class TestTokenizers(unittest.TestCase): def test_spacy_wrapper_no_fields(self): m = SpacyWrapper() x = ReGenerator(fetcher) x = m.itransform(x) expected = ['this is a test sentence', 'this is a test sentence', 'this is a test
sentence'] actual = [itm.text for itm in x] self.assertListEqual(actual, expected, "each item in result should be a spacy doc which has attribute " "text which should be equal to the original sentence") def test_spacy_wrapper_with_fields(self): m = SpacyWrapper(in_field='text', out_field='doc') x = ReGenerator(fetcher) x = ReMap(lambda s: {'text': s}, x) x = m.itransform(x) expected = ['this is a test sentence', 'this is a test sentence', 'this is a test sentence'] actual = [itm['doc'].text for itm in x] self.assertListEqual(actual, expected, "each item in result should be a dictionary with key 'doc' that contains " "spacy doc which has attribute text which should be equal to " "the original sentence") if __name__ == '__main__': unittest.main()PK!4'cranial/models/tests/test_tokenizers.pyimport unittest import sys import time import shutil import subprocess sys.path.append('.') # in case file is run from root dir from cranial.models.tokenizers import MosesTokenizer def fetcher(): return ('this is a test sentence' for _ in range(3)) class TestTokenizers(unittest.TestCase): @classmethod def tearDownClass(cls): shutil.rmtree('moses_test') @classmethod def setUpClass(cls): # remove any stale clone, then fetch a fresh copy of the moses repo try: shutil.rmtree('moses_test') except FileNotFoundError: pass subprocess.run("git clone https://github.com/moses-smt/mosesdecoder moses_test".split()) def test_moses_tokenizer(self): tk = MosesTokenizer(moses_repo_path='moses_test') texts = ['one text.', 'two text?', 'The final text!'] res = tk.itransform(texts) actual = [_ for _ in res] expected = ['one text .', 'two text ?', 'The final text !'] self.assertListEqual(actual, expected, "output should be tokenized") def test_time_moses_tokenizer(self): tk = MosesTokenizer(moses_repo_path='moses_test') with open('cranial/models/tests/data/just_texts.txt') as f: texts = f.read().split('\n\n') res = tk.itransform(texts * 10) t0 = time.time() _ = [_ for _ in res] print("All separate: ", len(texts) * 10, 'x',
sum([len(txt) for txt in texts])// len(texts), '\ttime: ', time.time() - t0) texts = ['\n\n'.join(texts)] * 10 res = tk.itransform(texts) t0 = time.time() _ = [_ for _ in res] print("by file: ", len(texts), 'x', sum([len(txt) for txt in texts])// len(texts), '\ttime: ', time.time() - t0) texts = ['\n\n'.join(texts)] res = tk.itransform(texts) t0 = time.time() _ = [_ for _ in res] print("All files in one: ", len(texts), 'x', len(texts[0]),'\ttime: ', time.time() - t0) if __name__ == '__main__': unittest.main()PK!*wz z cranial/models/tokenizers.py""" tokenizers that do not use spaCy """ import subprocess import os import collections import logging from cranial.common import logger from cranial.model_base import ModelBase log = logger.create('tokenizers', os.environ.get('MODELS_LOGLEVEL', logging.WARNING)) # streaming log class MosesTokenizer(ModelBase): name = 'moses_tokenizer' def __init__(self, moses_repo_path, language='en', threads=None, **kwargs): """ This wraps around a moses tokenizer - https://github.com/moses-smt/mosesdecoder Note that it is much faster to transform few large chunks of text instead of many small ones. 
So before passing strings into this tokenizer it might be good to batch short texts into a large one with some known separator between individual texts, and then split the result apart again afterwards Parameters ---------- moses_repo_path path to the cloned repo language language code, default 'en'; other languages were not tested but should be supported, see: https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl threads number of threads to pass to the tokenizer command (did not notice any improvements) kwargs additional kwargs to pass to the parent class constructor """ super(MosesTokenizer, self).__init__(**kwargs) self.moses_repo_path = moses_repo_path self.language = language self.threads = threads self.comm = [os.path.join(self.moses_repo_path, 'scripts/tokenizer/tokenizer.perl'), '-q', '-l', self.language] if self.threads is not None and self.threads > 1: self.comm += ['-threads', format(self.threads)] # check command (the argument list is passed without shell=True, otherwise on POSIX only the first item is run as the command and the flags are dropped) result = subprocess.run(self.comm, input="testing...".encode('utf8'), check=False, stderr=subprocess.PIPE) if result.returncode: raise Exception(result.stderr) else: log.info("Moses tokenizer command >> " + ' '.join(self.comm) + ' -- OK') def transform(self, record: str) -> str: """ transform one record Parameters ---------- record raw text Returns ------- tokenized text (tokens are space-separated) """ result = subprocess.run(self.comm, input=record.encode('utf8'), check=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE) return result.stdout.decode('utf8').strip() def add_n_grams(token_list, n=1): """ creates n-grams of every size from 2 to n and appends them to the tokens: for n = 2 ['a', 'b', 'c'] -> ['a', 'b', 'c', 'a_b', 'b_c'] Parameters ---------- token_list input list of strings n n in n-grams, how many adjacent elements to merge Returns ------- list of strings with the n-grams appended (the input list itself if n < 2) """ if n < 2: return token_list else: adds = [] for i in range(2, n + 1): adds.extend(['_'.join(tt) for tt in zip(*[token_list[j:] for j in range(i)])]) return token_list + adds
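The behaviour of `add_n_grams` for different values of `n` may be easier to see on a small input. The sketch below repeats the function body from above so that it runs on its own; the sample tokens are made up for illustration and are not part of the package.

```python
def add_n_grams(token_list, n=1):
    """Append n-grams (joined with '_') of every size from 2 to n."""
    if n < 2:
        return token_list
    adds = []
    for i in range(2, n + 1):
        # zip the list with its own shifted copies: each tuple is a window
        # of i adjacent tokens
        adds.extend('_'.join(tt) for tt in zip(*[token_list[j:] for j in range(i)]))
    return token_list + adds


# hypothetical sample tokens, used only for this illustration
tokens = ['new', 'york', 'city']

bigrams = add_n_grams(tokens, n=2)
trigrams = add_n_grams(tokens, n=3)

print(bigrams)   # original tokens followed by all 2-grams
print(trigrams)  # original tokens, then 2-grams, then 3-grams
```

Note that with `n=3` the result contains both the bigrams and the trigrams, because the loop runs over every size from 2 up to `n`.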
 PK!^M8M8cranial/online_training.pyfrom abc import abstractmethod, ABCMeta import time from collections import deque from concurrent.futures import ThreadPoolExecutor from cranial.common import logger from cranial.model_base import ModelBase, StatefulModel log = logger.get(name='online_learning') class TrainerBase(metaclass=ABCMeta): """ Object responsible for defining how to update a model; when to update is decided by a schedule object (see ScheduleBase) - `is_ready` (a method of the schedule object) will be called every time a transform method of OnlineLearningWrapper is called, if it returns True then OnlineLearningWrapper will try to get training data from its accumulator and use it to update a model, or in case of remote updates will try to load a saved state from a connector. - `update` is a method that defines how to update: call a model.update with accumulated data, or start to load a remotely stored state """ @abstractmethod def update(self, model, data): """ Should take model and training data as arguments and return an updated model. The updating logic can be anything, it can use the data, or not use the data, or maybe completely re-instantiate a model. It is up to the developer and their needs. This method should also return True/False whether update was completed, this will allow OnlineLearningWrapper to call update again even outside of schedule to check again if update was completed Parameters ---------- model model that needs to be updated data data to use for updates Returns a tuple ------- (updated or original model, True/False if model was updated) """ return model, True class AccumulatorBase(metaclass=ABCMeta): def __init__(self): """ This object is responsible for accumulating incoming data and organizing it into training examples """ self._batch = [] @abstractmethod def add(self, record): """ Implement logic to add record to accumulator """ pass @abstractmethod def get_batch(self): """ Implement logic to return data for updates, don't forget to reset batch if it is needed.
""" return self._batch def reset(self): """ removes all stored examples """ self._batch = [] class ScheduleBase(metaclass=ABCMeta): @abstractmethod def is_ready(self): """ Implement logic defining when to update, can depend on a number of examples or a time passed... or checks if download of saved state was complete ... """ return True class OnlineLearningWrapper(ModelBase): def __init__(self, model: StatefulModel, trainer: TrainerBase, schedule: ScheduleBase, accumulator: AccumulatorBase = None, **kwargs): """ This object converts a stateful model to a model that can learn from data passed for inference To learn almost-online three things are needed - Accumulating incoming data into micro-batches. This is done with the accumulator object - Specifying what kind of updates to make (use data to do updates, load from a remote storage, etc...). This is defined in the trainer object. - Deciding when to do updates. This is defined in the schedule object Parameters ---------- model a stateful model to convert into an online learning model trainer an object that specifies how to update the model schedule an object that decides when an update should happen accumulator an object responsible for accumulating incoming data into batches used for updates. This one is optional, the default is None, in that case no incoming data will be stored anywhere and it will not be used for model updates. This should be used in case of remote updates from saved files. kwargs additional kwargs to pass to the parent class constructor """ super(OnlineLearningWrapper, self).__init__(**kwargs) self.trainer = trainer self.model = model self.schedule = schedule self.accumulator = accumulator # self.name = self.model.name self._retry = False self._last_update_data = [] def transform(self, record): """ A higher level composed model (where this wrapper model is just a single step in a chain of transformations) will always call `transform` method for inference.
Since the goal is to learn online, this modified `transform` method should contain both actual transform step and updates to the model if needed. Three things happen here 1. add example to accumulator 2. update model if it's time or need to check on incomplete previous updates 3. actual transform of an input data Parameters ---------- record incoming single record of data that needs to be transformed Returns ------- transformed data """ # 1. add record to accum if self.accumulator is not None: self.accumulator.add(record) # 2. maybe update # first see if need to re-try previous update, this is in case update consisted of just a # future and now need to check again if future was done if self._retry: self.model, success = self.trainer.update(self.model, self._last_update_data) self._retry = not success # now see if schedule thinks it's time to update if self.schedule.is_ready(): # get data for updates (could be [], but its ok, model's update should be able to handle that) self._last_update_data = [] if self.accumulator is None else self.accumulator.get_batch() # trainer updates model, could be based on provided data or not at all self.model, success = self.trainer.update(self.model, self._last_update_data) # set to re-try if update was unsuccessful self._retry = not success # 3. 
finally return the transformation (it does not matter that it is after update at all) return self.model.transform(record) def load(self, *args, **kwargs): """ pass through to the models's load method """ return self.model.load(*args, **kwargs) def save(self, *args, **kwargs): """ pass through to the models's save method Parameters ---------- fpath file path for saving model's state """ return self.model.save(*args, **kwargs) @property def name(self): return self.model.name @name.setter def name(self, new_name): self.model.name = new_name ##### below are specific implementations for accumulators and trainers class CountSchedule(object): def __init__(self, update_freq, start_true=False): """ Schedule that triggers after a specified number of calls Parameters ---------- update_freq number of calls that needs to pass before triggering start_true the very first call will return True, useful in combination with remote loading """ self.update_freq = update_freq self.start_true = start_true self._counter = -1 if start_true else 0 def is_ready(self): """ This is called every time a OnlineLearningWrapper.transform is called. 
This method will count the number of times it is called and will return True on every `update_freq`-th call, otherwise False Returns ------- True/False whether an update needs to happen """ self._counter += 1 return self._counter % self.update_freq == 0 class TimeSchedule(object): def __init__(self, update_freq, start_true=False): """ Schedule that triggers after a specified period of time has passed Parameters ---------- update_freq time in seconds that needs to pass before triggering start_true the very first call will return True, useful in combination with remote loading """ self.update_freq = update_freq self.start_true = start_true self._last_update_time = time.time() self._first_time = start_true def is_ready(self): """ Start update process when enough time passed Returns ------- True/False if passed time is larger than a given value """ if self._first_time: self._first_time = False return True tmp = time.time() - self._last_update_time > self.update_freq if tmp: self._last_update_time = time.time() return tmp class SimpleAccumulator(AccumulatorBase): def __init__(self, max_size=None): """ An accumulator that stores incoming records directly, in the order they arrive. Parameters ---------- max_size if None, then accumulator does not have a max size and every time data is used the accumulator will be emptied.
If not None, then when data is used the accumulator will not be emptied, instead new data will replace the oldest data once the size has reached max_size """ super(SimpleAccumulator, self).__init__() self._batch = deque(maxlen=max_size) self.max_size = max_size def add(self, record): """ In this simple accumulator examples of data are just added directly Parameters ---------- record a new single example of incoming data """ self._batch.append(record) def get_batch(self): """ Request a micro-batch of data for updates, it could be an empty list if no examples are currently in the accumulator Returns ------- a list of data records intended for updating a model """ batch = list(self._batch) if self.max_size is None: self.reset() return batch class LocalTrainer(TrainerBase): def __init__(self, connector=None, wait_future=False): """ This trainer updates a model locally with the data passed to `update` and, if a connector is given, saves the updated model through it Parameters ---------- connector every time model is updated, it will be also saved to a location specified by a connector (connector.put will be called). Connector should already contain a final destination. wait_future if True, saving through the connector is done synchronously inside `update`, otherwise it is submitted to a background thread """ self.connector = connector self.wait_future = wait_future # need its own thread pool so that saving through the connector can happen in a separate thread self._pool = ThreadPoolExecutor(1) self._future = None def update(self, model, data): """ This trainer will use passed data for calling model's update method if data is not an empty list.
Also if connector is given, it will use it to save state remotely (put to a connector) Parameters ---------- model a model to update data data to use for updating the model Returns ------- (updated model, True) this kind of trainer always returns success """ if len(data) > 0: model = model.update(data) # save to a specified location if provided if self.connector is not None: if self.wait_future: model.save(connector=self.connector) else: self._pool.submit(lambda: model.save(connector=self.connector)) return model, True class RemoteLoadTrainer(TrainerBase): def __init__(self, connector, wait_future=False): """ Loads a state from a remote location Parameters ---------- update_freq how often try to load in seconds connector connector should already have a final remote location where to get saved state """ self.connector = connector self.connector.do_read = False self.wait_future = wait_future # need its own thread pool so that reading connector's buffer and unpickling can happen in a separate thread self._pool = ThreadPoolExecutor(1) self._future = None def update(self, model, _): """ This trainer will start a new thread where loading of new state is done and return a future of its result, and if that future already exists then will check if loading was complete, then will swap current model's state with new loaded one Because this trainer only loads remotely stored state, it does not need any accumulated data. 
Parameters ---------- model a model to update _ second argument (passed in data) is not used for remote loading Returns ------- (updated model, success indicator) """ success = False if self._future is None: # this will do everything in the background and will return a state object already loaded into memory # self._future = self._pool.submit(lambda: pickle.loads(self.connector.get().read())) self._future = self._pool.submit(lambda: model.load(connector=self.connector)) if self._future.done() or self.wait_future: try: model.state = self._future.result().state except Exception as e: log.error("NO UPDATE:\t{}".format(e)) # reset future only after it is done self._future = None success = True # set True even if there was an exception in future because the process was complete return model, success PK!F 0: with ThreadPool(self.n_proc) as p: # this is a workaround for limiting input iterator consumption, got it from SO buff = [] for itm in self.iterable_input: buff.append(itm) if len(buff) >= self.per_proc_buffer * self.n_proc: if self.ordered: for itm in p.imap(self.fn, buff): yield itm else: for itm in p.imap_unordered(self.fn, buff): yield itm buff = [] # feed the remaining buffer after input is exhausted if self.ordered: for itm in p.imap(self.fn, buff): yield itm else: for itm in p.imap_unordered(self.fn, buff): yield itm elif self.proc_type in ('sub', 'proc', 'subprocess') and self.n_proc > 0: try: log.info("Trying to terminate previous pool") # this is stupid, but that's how pathos is built self.pool.terminate() self.pool.clear() log.info("Yay! 
Cleared previous process pool") except AttributeError: log.warning("Is this the first time creating a pool...") self.pool = ProcessPool(nodes=self.n_proc) # this is a workaround for limiting input iterator consumption, got it from SO buff = [] for itm in self.iterable_input: buff.append(itm) if len(buff) >= self.per_proc_buffer * self.n_proc: if self.ordered: for itm in self.pool.imap(self.fn, buff): yield itm else: for itm in self.pool.uimap(self.fn, buff): yield itm buff = [] # feed the remaining buffer after input is exhausted if self.ordered: for itm in self.pool.imap(self.fn, buff): yield itm else: for itm in self.pool.uimap(self.fn, buff): yield itm else: for itm in map(self.fn, self.iterable_input): yield itm class ReFilter(ReIterBase): def __init__(self, fn, iterable_input, name='reFilter', verbose=True): """ Filter function that can be used more than once, returns an iterator Parameters ---------- iterable_input iterable input fn filter function name name to use for logging messages """ super().__init__(iterable_input=iterable_input, name=name, verbose=verbose) self.fn = fn def _iter(self): return filter(self.fn, self.iterable_input) class ReDedup(ReIterBase): def __init__(self, iterable_input, dedup_key, name='reFilter', verbose=True): """ This iterator keeps track of seen values of a certain field in items (of type dict) and yield only yet unseen ones The reason this object is here instead of in models is because 1. it does not save the state anywhere 2. 
it does not return an output for every input

        Parameters
        ----------
        iterable_input
            iterable input
        dedup_key
            a key of dict for which to track values
        name
            name to use for logging messages
        """
        super().__init__(iterable_input=iterable_input, name=name, verbose=verbose)
        self.dedup_key = dedup_key
        self._known_values = set()

    def _iter(self):
        # start every pass with an empty set so the iterator can be reused
        self._known_values = set()
        for itm in self.iterable_input:
            val = itm.get(self.dedup_key)
            if val in self._known_values:
                continue
            else:
                self._known_values.add(val)
                yield itm


class ReChain(ReIterBase):
    """
    Analog of itertools.chain.from_iterable that can be iterated multiple times

    Given an iterator where each item is iterable itself, returns a single iterator with all sub-items
    """

    def _iter(self):
        for itm in self.iterable_input:
            for sub_itm in itm:
                yield sub_itm


class ReRepeat(ReIterBase):
    def __init__(self, iterable_input, n=1, name='re-repeat', verbose=True):
        """
        NOT an analog of itertools.repeat

        Returns a multi-use iterator where each item of an input iterator is repeated n times.
        Example: given input = [1, 2, 3] and n = 2 -> [1, 1, 2, 2, 3, 3]

        Parameters
        ----------
        iterable_input
            input iterable
        n
            how many times to repeat each item
        name
            name to use for logging messages
        """
        super().__init__(iterable_input=iterable_input, name=name, verbose=verbose)
        self.n = n

    def _iter(self):
        for itm in self.iterable_input:
            for _ in range(self.n):
                yield itm


class ReCycle(ReIterBase):
    def __init__(self, iterable_input, n=0, name='re-cycle', verbose=True):
        """
        Analog of itertools.cycle, but can be iterated over multiple times

        Returns an iterator that repeats input sequence n times, or infinite number of times if n = 0

        Parameters
        ----------
        iterable_input
            iterable input
        n
            number of times to repeat input
        name
            name to use for logging messages
        """
        super().__init__(iterable_input=iterable_input, name=name, verbose=verbose)
        self.n = n

    def _iter(self):
        n_ = 0
        while True:
            for itm in self.iterable_input:
                yield itm
            n_ += 1
            # with n = 0 this is never true, so the cycle is infinite
            if n_ == self.n:
                break


class ReZip(ReIterBase):
    def __init__(self, *iterable_inputs, name='re-zip', verbose=True):
        """
        Analog of zip, but can be iterated more than once

        Parameters
        ----------
        iterable_inputs
            input iterables to zip together, passed as separate positional arguments
        name
            name to use for logging messages
        """
        super().__init__(iterable_input=iterable_inputs, name=name, verbose=verbose)

    def _iter(self):
        return zip(*self.iterable_input)


class DiskCache(ReIterBase):
    def __init__(self, iterable_input, name='Disk Cache', tmp_file_path=None, tmp_file_dir=None, serializer='json',
                 delete_when_done=True, verbose=True):
        """
        On the first pass the results are stored in a temp file, on all subsequent passes
        iterable_input is not used, instead results are read from temp file.

        Parameters
        ----------
        iterable_input
            input objects
        name
            for logging messages
        tmp_file_path
            file to save iterable, a random uuid4 name is used if None
        tmp_file_dir
            optional directory in which to place the temp file, created if it does not exist
        serializer
            how to serialise objects to file, default 'json' (the only supported value)
        delete_when_done
            if True, deletes file when object is deleted
        """
        super().__init__(iterable_input=iterable_input, name=name, verbose=verbose)
        self.delete_when_done = delete_when_done
        self.serializer = serializer
        assert self.serializer in ['json'], "Unsupported serializer"

        self.tmp_filename = str(uuid.uuid4()) if tmp_file_path is None else tmp_file_path
        if tmp_file_dir is not None:
            self.tmp_filename = os.path.join(tmp_file_dir, self.tmp_filename)
        dir_path = os.path.split(self.tmp_filename)[0]
        if len(dir_path) > 0:
            os.makedirs(dir_path, exist_ok=True)

        self.cleanup()  # in case file already exists
        self.cached = False
        self.file_size = 0

    def _yield_write_json(self):
        log.info("{}:\tSaving iterable to {}".format(self.name, self.tmp_filename))
        with open(self.tmp_filename, 'w') as f:
            for res in self.iterable_input:
                try:
                    self.file_size += f.write(json.dumps(res) + '\n')
                except:
                    # log the offending item before re-raising
                    log.info(res)
                    raise
                yield res
        self.cached = True
        log.info("{}:\tSaved iterable to {}, size {:,}".format(self.name, self.tmp_filename, self.file_size))

    def _yield_read_json(self):
        log.info("{}:\tReading saved iterable from {}".format(self.name, self.tmp_filename))
        with open(self.tmp_filename, 'r') as f:
            for line in f:
                yield json.loads(line)

    def _iter(self):
        if not self.cached:
            if not os.path.isfile(self.tmp_filename):
                return self._yield_write_json()
            else:
                # the file exists but the first pass has not finished, fall back to the input
                return self.iterable_input
        else:
            return self._yield_read_json()

    def cleanup(self):
        """
        remove temp file
        """
        if os.path.isfile(self.tmp_filename):
            os.unlink(self.tmp_filename)

    def __del__(self):
        if self.delete_when_done:
            self.cleanup()


class ReBatch(ReIterBase):
    def __init__(self, iterable_input, batch_size, only_full=False, shuffle=False, buffer_size=None, name='reBatch',
                 verbose=True):
        """
        combine items from input iterator into batches

        Parameters
        ----------
        iterable_input
            iterator
            with individual items
        batch_size
            batch size
        only_full
            if True, skip last batch that has less than batch_size items
        shuffle
            if True, shuffle the buffer before cutting it into batches
        buffer_size
            number of items to accumulate before releasing batches, defaults to batch_size;
            when the buffer holds more than one batch only half of the available batches are released at a time
        name
            name to use for logging
        """
        super().__init__(iterable_input=iterable_input, name=name, verbose=verbose)
        self.batch_size = batch_size
        self.only_full = only_full
        self.shuffle = shuffle
        self.buffer_size = self.batch_size if buffer_size is None else buffer_size

    def _iter(self):
        buffer = []
        i = 0
        for itm in self.iterable_input:
            buffer.append(itm)
            i += 1
            if i == self.buffer_size:
                if self.shuffle:
                    random.shuffle(buffer)

                # release one or more full batches
                n_batches = self.buffer_size // self.batch_size
                if n_batches > 1:
                    # yield only half of available batches
                    # intended for use with buffer_size >> batch_size and shuffle=True
                    n_batches = n_batches // 2
                for i in range(n_batches):
                    batch = buffer[i * self.batch_size: (i + 1) * self.batch_size]
                    yield batch
                buffer = buffer[self.batch_size * n_batches:]
                i = len(buffer)

        # yielding remaining
        if self.shuffle:
            random.shuffle(buffer)

        # number of full batches
        n_batches = len(buffer) // self.batch_size
        for i in range(n_batches):
            batch = buffer[i * self.batch_size: (i + 1) * self.batch_size]
            yield batch

        if not self.only_full and len(buffer) > n_batches * self.batch_size:
            batch = buffer[n_batches * self.batch_size:]
            yield batch


class BucketBatch(ReIterBase):
    def __init__(self, iterable_input, batch_size, buckets, pad_index, only_full=False, field=None, shuffle=False,
                 buffer_size=None, name='Bucket Batch', verbose=True):
        """
        combine items from input iterator into batches of equal-length (padded) sequences

        Parameters
        ----------
        iterable_input
            iterator with individual items
        batch_size
            batch size
        buckets
            list of integers - quantized lengths of sequences, for example with [8, 16, 32, 64]
            a sequence of length 9 to 16 is padded to 16 and so on, sequences longer than 64 are
            truncated to 64, and sequences of length 8 or less are dropped
        pad_index
            index to pad with
        only_full
            if True, skip last batch that has less than batch_size items
        field
            if not None, key or index of the sequence inside each item
        shuffle
            if True, shuffle each bucket's buffer before cutting it into batches
        buffer_size
            number of items to accumulate per bucket before releasing batches, defaults to batch_size
        name
            name to use for logging
        """
        super().__init__(iterable_input=iterable_input, name=name, verbose=verbose)
        self.batch_size = batch_size
        self.buckets = buckets
        self.max_length = buckets[-1]
        self.pad_index = pad_index
        self.only_full = only_full
        self.field = field
        self.shuffle = shuffle
        self.buffer_size = self.batch_size if buffer_size is None else buffer_size

    def _iter(self):
        bucket_batches = {b: [] for b in self.buckets}
        bucket_sizes = {b: 0 for b in self.buckets}
        for itm in self.iterable_input:
            if self.field is not None:
                if isinstance(itm, tuple):
                    itm = list(itm)
                real_itm = itm
                itm = itm[self.field]

            length = len(itm)
            if length == 0:
                continue
            if length > self.max_length:
                itm = itm[:self.max_length]
                length = self.max_length

            for lower_b, upper_b in zip(self.buckets, self.buckets[1:]):
                if lower_b < length <= upper_b:
                    num_pad = upper_b - length
                    itm += [self.pad_index] * num_pad
                    if self.field is not None:
                        real_itm[self.field] = itm
                        itm = real_itm
                    bucket_batches[lower_b].append(itm)
                    bucket_sizes[lower_b] += 1

                    ##########################
                    # yielding
                    if bucket_sizes[lower_b] == self.buffer_size:
                        buffer = bucket_batches[lower_b]
                        if self.shuffle:
                            random.shuffle(buffer)

                        # release one or more full batches
                        n_batches = self.buffer_size // self.batch_size
                        if n_batches > 1:
                            n_batches = n_batches // 2
                        for i in range(n_batches):
                            batch = buffer[i * self.batch_size: (i + 1) * self.batch_size]
                            yield batch
                        bucket_batches[lower_b] = buffer[self.batch_size * n_batches:]
                        bucket_sizes[lower_b] = len(bucket_batches[lower_b])

                    ##########################
                    # since this item was placed, stop the loop
                    break

        ##########################
        # yielding remaining
        for lower_b in self.buckets[:-1]:
            buffer = bucket_batches[lower_b]
            if self.shuffle:
                random.shuffle(buffer)

            # number of full batches
            n_batches = len(buffer) // self.batch_size
            for i in range(n_batches):
                batch = buffer[i * self.batch_size: (i + 1) * self.batch_size]
                yield batch

            # check if there is last batch and it needs to be yielded
            if not self.only_full and len(buffer) > n_batches * self.batch_size:
                batch = buffer[n_batches * self.batch_size:]
                yield batch

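
# The bucketing rule in BucketBatch._iter above (truncate, find the bucket, pad) can be
# sketched on its own as follows. This is a minimal illustration, not part of cranial:
# `pad_to_bucket` is a hypothetical helper name, and unlike BucketBatch it returns a new
# list instead of padding the input in place.

```python
def pad_to_bucket(seq, buckets, pad_index):
    """Return seq padded to the upper edge of its bucket, or None if it fits no bucket."""
    max_length = buckets[-1]
    # sequences longer than the last bucket edge are truncated to it
    seq = list(seq)[:max_length]
    length = len(seq)
    for lower_b, upper_b in zip(buckets, buckets[1:]):
        # same rule as BucketBatch._iter: lower edge exclusive, upper edge inclusive
        if lower_b < length <= upper_b:
            return seq + [pad_index] * (upper_b - length)
    # empty sequence, or one not longer than buckets[0]: no bucket matches
    return None


if __name__ == "__main__":
    buckets = [0, 1, 2, 4]
    print(pad_to_bucket([0, 1, 2], buckets, -1))        # [0, 1, 2, -1]
    print(pad_to_bucket([3, 4, 5, 6, 7], buckets, -1))  # [3, 4, 5, 6]
    print(pad_to_bucket([], buckets, -1))               # None
    print(pad_to_bucket([1, 2, 3], [8, 16], -1))        # None
```

# Starting the bucket list with 0, as the tests for BucketBatch do, keeps short sequences;
# with a first edge of 8 anything of length 8 or less matches no bucket.
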

class Progress(ReIterBase):
    def __init__(self, iterable_input, max_period=2000, name='progress', verbose=True):
        """Utility for logging number of processed items

        Example:
        # >>> a = range(20)
        # >>> b = Progress(a, max_period=7)
        # >>> _ = [_ for _ in b]
        # 2018-02-09T14:46:05PST - re_iter.py - INFO - progress: Start iter number 5
        # 2018-02-09T14:46:05PST - re_iter.py - INFO - progress yielded 1 items
        # 2018-02-09T14:46:05PST - re_iter.py - INFO - progress yielded 2 items
        # 2018-02-09T14:46:05PST - re_iter.py - INFO - progress yielded 5 items
        # 2018-02-09T14:46:05PST - re_iter.py - INFO - progress yielded 7 items
        # 2018-02-09T14:46:05PST - re_iter.py - INFO - progress yielded 14 items
        # 2018-02-09T14:46:05PST - re_iter.py - INFO - progress: Finished iter number 5 total items: 20 total time: 0.0 sec
        # Out[10]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
        """
        super().__init__(iterable_input=iterable_input, name=name, verbose=verbose)
        self.max_period = max(max_period, 5)
        self.checkpoints = []
        k = 1
        while k < max_period:
            self.checkpoints.extend([k, 2 * k, 5 * k])
            k *= 10
        self.checkpoints = [ch for ch in self.checkpoints if ch < max_period]

    def _iter(self):
        t_start = time.time()
        i_start = 0
        ema_speed = 0
        for i, itm in enumerate(self.iterable_input):
            if (i + 1) in self.checkpoints:
                log.info("{} yielded {} items".format(self.name, i + 1))
            elif (i + 1) % self.max_period == 0:
                t_now = time.time()
                speed_now = (i - i_start) / (t_now - t_start)
                ema_speed = speed_now if ema_speed == 0 else (0.9 * ema_speed + 0.1 * speed_now)
                t_start = t_now
                i_start = i
                log.info("{} yielded {} items.\tspeed now {:.2f}\tEMA speed {:.2f}".format(
                    self.name, i + 1, speed_now, ema_speed))
            yield itm

cranial/records_wrapper.py

"""
functions to deal with the need for models to operate on dictionaries only
"""


def wrap_apply_to_fields(fn, in_field, out_field):
    """
    Modifies a given function that performs transformation "in_data -> out_data"
    to the one that performs "{in_field:
    in_data} -> {in_field: in_data, out_field: out_data}"

    Parameters
    ----------
    fn
        original function
    in_field
        field of the input dictionary that holds in_data
    out_field
        field of the output dictionary where the transformed data is stored

    Returns
    -------
    modified function
    """
    def fn_wrapped(record):
        # records without in_field are passed through unchanged
        if in_field in record.keys():
            record[out_field] = fn(record[in_field])
        return record
    return fn_wrapped


def get_copy_fields_fn(fields, defaults=None):
    """
    creates a function to use in map that copies incoming objects but leaves only certain fields

    Parameters
    ----------
    fields
        list of fields to copy, or a dictionary mapping input field names to output field names
    defaults
        if field is empty what default value to use, can be None, a single value, or a
        dictionary of values for each field in fields

    Returns
    -------
    function to copy objects
    """
    if isinstance(fields, list):
        # convert to dictionary
        fields = {f: f for f in fields}

    if defaults is None:
        defaults = {f: None for f in fields.keys()}
    elif not isinstance(defaults, dict):
        defaults = {f: defaults for f in fields.keys()}
    else:
        defaults = {f: defaults.get(f) for f in fields.keys()}

    def copy_fields_fn(obj):
        """
        copy a subset of fields, optionally rename and input defaults for missing values
        """
        return {out_f: obj.get(in_f, defaults[in_f]) for in_f, out_f in fields.items()}

    return copy_fields_fn

cranial/tests/test_cranial.py

"""Provides doctests for unittest discovery."""
import doctest

from cranial import model_base


def load_tests(loader, tests, ignore):
    tests.addTests(doctest.DocTestSuite(model_base))
    return tests

cranial/tests/test_model_base.pyimport unittest import os from cranial.common import logger from cranial.model_base import State, StatefulModel, ModelBase log = logger.get('test_re_iter') class DummyModel(ModelBase): def transform(self, record): return record * 2 class DummyStateful(StatefulModel): def transform(self, record): return record * self.state.n def train(self, iterable): c = 0 for _ in iterable: c += 1 self.state.n = c class TestModelBase(unittest.TestCase): def test_State_save(self): s = State() s.foo = 'bar' s.save('tmp_state') actual = os.path.isfile('tmp_state') os.unlink('tmp_state') self.assertTrue(actual, 'should save state into a file') def test_State_save_load(self): s = State() s.foo = 'bar' s.save('tmp_state') s1 = State.load('tmp_state') os.unlink('tmp_state') actual = str(s1) expected = str(s) self.assertEqual(actual, expected, "saved and loaded states should be the same") def test_ModelBase_init(self): m = DummyModel(foo='bar') actual = getattr(m, 'foo', 'not bar') expected = 'bar' self.assertEqual(actual, expected, 'any argument to init should become an attribute') def test_dummyModel(self): m = DummyModel() inputs = [0, 1, 2, 3, 4] out = m.itransform(inputs) actual = [_ for _ in out] + [_ for _ in out] expected = [0, 2, 4, 6, 8] * 2 self.assertListEqual(actual, expected, 'model should return numbers multiplied by two, twice') def test_StatefulModel_init(self): m = DummyStateful() actual = hasattr(m, 'state') self.assertTrue(actual, 'on init model should create a state') def test_StatefulModel_init_args(self): m = DummyStateful(foo='bar') actual = getattr(m, 'foo', 'not bar') expected = 'bar' self.assertEqual(actual, expected, 'any argument to init should become an attribute') def test_StatefulModel_train(self): m = DummyStateful() inputs = [0, 1, 2, 3, 4] m.train(inputs) actual = m.state.n expected = 5 self.assertEqual(actual, expected, "should count objects and make result as attribute of state") def test_StatefulModel_transform(self): m 
= DummyStateful() inputs = [0, 1, 2, 3, 4] m.state.n = 2 out = m.itransform(inputs) actual = [_ for _ in out] + [_ for _ in out] expected = [0, 2, 4, 6, 8] * 2 self.assertEqual(actual, expected, "model should return numbers multiplied by two, twice") def test_StatefulModel_save(self): m1 = DummyStateful() m1.state.n = 2 m1.save('tmp_model_state') actual = os.path.isfile('tmp_model_state') os.unlink('tmp_model_state') self.assertTrue(actual, 'should save state to a file') def test_StatefulModel_save_load(self): m1 = DummyStateful() m1.state.n = 2 m1.save('tmp_model_state') m2 = DummyStateful() m2.load('tmp_model_state') os.unlink('tmp_model_state') actual = str(m2.state) expected = str(m1.state) self.assertEqual(actual, expected, "saved and loaded states should match") if __name__ == '__main__': unittest.main() PK!_Zv%cranial/tests/test_online_learning.pyimport unittest from cranial.common import logger from cranial.model_base import StatefulModel from cranial.online_training import OnlineLearningWrapper, TrainerBase, \ AccumulatorBase, CountSchedule log = logger.get('test_online_learning') class DummyModel(StatefulModel): def __init__(self): super(DummyModel, self).__init__() self.state.n = 0 def transform(self, record): return record * self.state.n def update(self, iterable): for _ in iterable: self.state.n += 1 return self def train(self, iterable): return self.update(iterable) class DummyTrainer(TrainerBase): is_ready = True def update(self, model, data): model.state.n = 1 return model class DummyAccum(AccumulatorBase): _batch = [1, 2, 3] def add(self, record): pass def get_batch(self): return self._batch class TestModelBase(unittest.TestCase): def test_olwrapper_init(self): t = DummyTrainer() m = DummyModel() a = DummyAccum() s = CountSchedule(1) om = OnlineLearningWrapper(model=m, trainer=t, accumulator=a, schedule=s) actual = [ om.trainer is t, om.model is m, om.accumulator is a ] expected = [True, True, True] self.assertListEqual(actual, expected, 'should 
make args as attributes') if __name__ == '__main__': unittest.main() PK!((cranial/tests/test_re_iter.pyimport unittest import sys sys.path.append('.') # in case file is run from root dir from cranial.re_iter import * from cranial.common import logger log = logger.get(name='test_re_iter') def dummy_fn(x): return 2 * x class TestReIter(unittest.TestCase): def test_ReGenerator(self): gen_fn = lambda: range(5) out = ReGenerator(gen_fn) actual = [_ for _ in out] + [_ for _ in out] expected = [_ for _ in range(5)] + [_ for _ in range(5)] self.assertListEqual(actual, expected, 'should repeat 0->4 sequence twice') def test_ReFilter(self): inpt = [0, 1, 2, 3, 4] out = ReFilter(iterable_input=inpt, fn=lambda x: x % 2) actual = [_ for _ in out] + [_ for _ in out] expected = [1, 3, 1, 3] self.assertListEqual(actual, expected, 'should leave only odd numbers, twice') def test_ReChain(self): inpt = [ [0, 1, 2], [3, ], [4, 5] ] out = ReChain(inpt) actual = [_ for _ in out] + [_ for _ in out] expected = list(range(6)) * 2 self.assertListEqual(actual, expected, 'should extend into a single sequence, twice') def test_ReRepeat(self): inpt = [0, 1, 2, ] out = ReRepeat(inpt, n=2) actual = [_ for _ in out] + [_ for _ in out] expected = [0, 0, 1, 1, 2, 2] * 2 self.assertListEqual(actual, expected, 'should repeat each item twice, twice') def test_ReCycle(self): inpt = [0, 1, 2, ] out = ReCycle(inpt, n=2) actual = [_ for _ in out] + [_ for _ in out] expected = [0, 1, 2, ] * 4 self.assertListEqual(actual, expected, 'should repeat sequence twice, twice') def test_ReZip(self): inpt_1 = [0, 1, 2, ] inpt_2 = [0, 1, 2, 3] out = ReZip(inpt_1, inpt_2) actual = [_ for _ in out] + [_ for _ in out] expected = [(0, 0), (1, 1), (2, 2)] * 2 self.assertListEqual(actual, expected, 'should zip two input sequences (to the end of shortest), twice') def test_ReMap_main_proc(self): inpt = [0, 1, 2, 3, 4] fn = lambda x: 2 * x out = ReMap(iterable_input=inpt, fn=fn) actual = [_ for _ in out] + [_ for _ in out] 
expected = [0, 2, 4, 6, 8] * 2 self.assertListEqual(actual, expected, 'should apply x2 function to input, twice') def test_ReMap_sub_proc(self): inpt = [0, 1, 2, 3, 4] out = ReMap(iterable_input=inpt, fn=dummy_fn, proc_type='sub', n_proc=2, verbose=True) actual = [_ for _ in out] + [_ for _ in out] expected = [0, 2, 4, 6, 8] * 2 self.assertListEqual(actual, expected, 'should apply x2 function to input, twice') def test_ReMap_threads(self): inpt = [0, 1, 2, 3, 4] out = ReMap(iterable_input=inpt, fn=dummy_fn, proc_type='th', n_proc=2, verbose=True) actual = [_ for _ in out] + [_ for _ in out] expected = [0, 2, 4, 6, 8] * 2 self.assertListEqual(actual, expected, 'should apply x2 function to input, twice') def test_ReBatch_no_random_only_full(self): inpt = [0, 1, 2, 3, 4] out = ReBatch(inpt, 2, only_full=True, shuffle=False, buffer_size=None) actual = [_ for _ in out] + [_ for _ in out] expected = [[0, 1], [2, 3]] * 2 self.assertListEqual(actual, expected, 'should make two batches only, twice') def test_ReBatch_no_random(self): inpt = [0, 1, 2, 3, 4] out = ReBatch(inpt, 2, only_full=False, shuffle=False, buffer_size=None) actual = [_ for _ in out] + [_ for _ in out] expected = [[0, 1], [2, 3], [4]] * 2 self.assertListEqual(actual, expected, 'should make two batches only, twice') def test_ReBatch_random_only_full_len(self): inpt = [0, 1, 2, 3, 4] out = ReBatch(inpt, 2, only_full=True, shuffle=True, buffer_size=None) res = [_ for _ in out] + [_ for _ in out] actual = [ len(res), len(res[0]), sum([len(r) for r in res]) ] expected = [4, 2, 8] self.assertListEqual(actual, expected, 'should produce 4 lists of 2 items each') def test_ReBatch_no_random_len(self): inpt = [0, 1, 2, 3, 4] out = ReBatch(inpt, 2, only_full=False, shuffle=True, buffer_size=None) res = [_ for _ in out] + [_ for _ in out] actual = [ len(res), len(res[0]), len(res[2]), sum([len(r) for r in res]) ] expected = [6, 2, 1, 10] self.assertListEqual(actual, expected, 'should produce two lists of two items, 
then ' 'list of one item, then repeat') def test_ReBatch_random_full_only_2iters_difference(self): inpt = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] # larger sequence to decrease the chance of accidental order out = ReBatch(inpt, 4, only_full=True, shuffle=True, buffer_size=None) actual_0 = [_ for _ in out] actual_1 = [_ for _ in out] self.assertNotEqual(actual_0, actual_1, 'should randomize every iteration') def test_ReBatch_random_full_only(self): inpt = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] # larger sequence to decrease the chance of accidental order out = ReBatch(inpt, 4, only_full=True, shuffle=True, buffer_size=None) actual_0 = [_ for _ in out] actual_1 = [_ for _ in out] actual = actual_0 + actual_1 notexpected = [[0, 1, 2, 3], [4, 5, 6, 7]] * 2 self.assertNotEqual(actual, notexpected, 'should not return items in order') def test_BucketBatch_no_random_only_full(self): inpt = [ [0, 1, 2], [0], [0, 1], [3, 4, 5, 6, 7], [1], [2, 3], [7, 8, 9], [2], [4, 5], ] out = BucketBatch(inpt, batch_size=2, buckets=[0, 1, 2, 4], pad_index=-1, only_full=True, shuffle=False, buffer_size=None) actual = [_ for _ in out] expected = [ [ [0, 1, 2, -1], [3, 4, 5, 6], ], [ [0], [1], ], [ [0, 1], [2, 3] ] ] self.assertListEqual(actual, expected, 'first should return a list of two lists of 4 items, ' 'first list should end with padding index: -1; then ' 'should return list of two lists of one item each; finally ' 'should return list of two lists of 2 items each') def test_ReBucketBatch_no_random(self): inpt = [ [0, 1, 2], [0], [0, 1], [3, 4, 5, 6, 7], [1], [2, 3], [7, 8, 9], [2], [4, 5], ] out = BucketBatch(inpt, batch_size=2, buckets=[0, 1, 2, 4], pad_index=-1, only_full=False, shuffle=False, buffer_size=None) actual = [_ for _ in out] expected = [ [ [0, 1, 2, -1], [3, 4, 5, 6], ], [ [0], [1], ], [ [0, 1], [2, 3] ], [ [2], ], [ [4, 5], ], [ [7, 8, 9, -1] ] ] self.maxDiff = None self.assertListEqual(actual, expected, 'first should return a list of two lists of 4 items, ' 'first list should end with 
padding index: -1; then ' 'should return list of two lists of one item each; finally ' 'should return list of two lists of 2 items each; after that ' 'start returning lists of one list of increasing lengths: 1, 2, 4; ' 'final list should have last item -1 again') def test_BucketBatch_random_only_full_len(self): inpt = [ [0, 1, 2], [0], [0, 1], [3, 4, 5, 6], [1], [2, 3], [7, 8, 9], [2], [4, 5], ] out = BucketBatch(inpt, batch_size=2, buckets=[0, 1, 2, 4], pad_index=-1, only_full=True, shuffle=True, buffer_size=10) res = [_ for _ in out] actual = [ [len(r) for r in res], [len(r[0]) for r in res], ] expected = [ [2, 2, 2], [1, 2, 4] ] self.assertListEqual(actual, expected, 'since buffer is large, it waits till the end and ' 'then returns in order of increasing length') def test_BucketBatch_randomization_only_full(self): inpt = [ [0], [1], [2], [3], [4], [5], [6], [7], [8], [9], ] out = BucketBatch(inpt, batch_size=2, buckets=[0, 1, ], pad_index=-1, only_full=True, shuffle=True, buffer_size=10) actual = [_ for _ in out] notexpected = [ [[0], [1]], [[2], [3]], [[4], [5]], [[6], [7]], [[8], [9]], ] self.assertNotEqual(actual, notexpected, 'sequence should be randomized') def test_DiskCache(self): inpt = [0, 1, 2, 3, 4] out = DiskCache(inpt, tmp_file_path='tmp_test_file') file_path = out.tmp_filename res1 = [_ for _ in out] actual = [os.path.isfile(file_path)] res2 = [_ for _ in out] del out actual.append(os.path.isfile(file_path)) actual.append(res1) actual.append(res2) expected = [True, False, inpt, inpt] self.assertListEqual(actual, expected, "should make file, then delete at the end, in " "the meantime should produce input seq twice") if __name__ == '__main__': unittest.main() PK! 
-cranial/tests/test_re_iter_multiprocessing.pyimport sys sys.path.append('.') # in case file is run from root dir import unittest import numpy as np from cranial.common import logger from cranial.re_iter_multiprocessing import * log = logger.get('test_re_iter') def dummy_fn(x): return 2 * x class TestReIter(unittest.TestCase): def test_just_an_example(self): x = list('kjhiwuhewkaalskjqoik c2837y2hdins') i2q = IterToQueue(x) step = Step(MapOperator, operator_kwargs={'fn': ord}, previous_step=i2q, n_workers=2) step2 = Step(MapOperator, operator_kwargs={'fn': chr}, previous_step=step, n_workers=2) step3 = Step(BatchOp, operator_kwargs={'batch_size': 4, 'buffer_size': 100, 'only_full': False, 'shuffle': True}, previous_step=step2, n_workers=2) res = FromQueue(step3) actual1 = [r for r in res] actual2 = [r for r in res] log.info("Example using multiprocessing steps:\nOnce:\n" + str(actual1) + '\nTwice:\n' + str(actual2)) def test_IterToQueue(self): inpt = [0, 1, 2, 3, 4] i2q = IterToQueue(inpt) i2q.start() actual = [] while True: obj = i2q.out_q.get() if obj is None: break actual.append(obj) expected = [0, 1, 2, 3, 4] self.assertListEqual(actual, expected, 'should get out what put in') def test_FromQueue(self): inpt = [0, 1, 2, 3, 4] i2q = IterToQueue(inpt) out = FromQueue(i2q) actual = [_ for _ in out] + [_ for _ in out] expected = [0, 1, 2, 3, 4] * 2 self.assertListEqual(actual, expected, 'should get out what put in, twice') def test_MapOperator_1_proc(self): inpt = [0, 1, 2, 3, 4] in_q = Queue() out_q = Queue() in_q.name = 'in queue' out_q.name = 'out queue' map_op = MapOperator(dummy_fn, in_q, out_q, n_siblings=1) map_op.start() for i in inpt: in_q.put(i) in_q.put(None) actual = [] while True: obj = out_q.get() if obj is None: break actual.append(obj) expected = [0, 2, 4, 6, 8] self.assertListEqual(actual, expected, 'should multiply each input by 2') def test_MapOperator_2_proc(self): inpt = [0, 1, 2, 3, 4] in_q = Queue() out_q = Queue() in_q.name = 'in queue' 
        out_q.name = 'out queue'
        map_ops = [MapOperator(dummy_fn, in_q, out_q, n_siblings=2, verbose=False) for _ in range(2)]
        [op.start() for op in map_ops]
        for i in inpt:
            in_q.put(i)
        in_q.put(None)
        actual = []
        while True:
            obj = out_q.get()
            if obj is None:
                break
            actual.append(obj)
        actual.sort()  # since now they can mix order
        expected = [0, 2, 4, 6, 8]
        self.assertListEqual(actual, expected, 'should multiply each input by 2')

    def test_MapOperator_1_proc_seq(self):
        inpt = [0, 1, 2, 3, 4]
        in_q = Queue()
        out_q = Queue()
        in_q.name = 'in queue'
        out_q.name = 'out queue'
        map_op = MapOperator(lambda i: 'a' * i, in_q, out_q, n_siblings=1, res_is_sequence=True)
        map_op.start()
        for i in inpt:
            in_q.put(i)
        in_q.put(None)
        actual = []
        while True:
            obj = out_q.get()
            if obj is None:
                break
            actual.append(obj)
        expected = ['a'] * 10
        self.assertListEqual(actual, expected, 'should return single list of (0+1+2+3+4)=10 letters "a"')

    def test_Step(self):
        inpt = [0, 1, 2, 3, 4]
        i2q = IterToQueue(inpt)
        step = Step(MapOperator,
                    operator_kwargs={'fn': dummy_fn, 'res_is_sequence': False},
                    previous_step=i2q,
                    n_workers=2)
        out = FromQueue(step)
        actual = [_ for _ in out] + [_ for _ in out]
        actual.sort()  # since order is not guaranteed with n_workers > 1
        expected = [0, 0, 2, 2, 4, 4, 6, 6, 8, 8]
        self.assertListEqual(actual, expected, 'should multiply each input by 2, twice')

    def test_BatchOp_no_random_with_last(self):
        inpt = [0, 1, 2, 3, 4]
        i2q = IterToQueue(inpt)
        batch = Step(BatchOp,
                     operator_kwargs={'batch_size': 2, 'shuffle': False, 'only_full': False, },
                     previous_step=i2q,
                     n_workers=1)
        out = FromQueue(batch)
        actual = [_ for _ in out] + [_ for _ in out]
        expected = [[0, 1], [2, 3], [4], [0, 1], [2, 3], [4]]
        self.assertListEqual(actual, expected, 'should make batches of two, and then last batch of size 1, twice')

    def test_BatchOp_no_random_full_only(self):
        inpt = [0, 1, 2, 3, 4]
        i2q = IterToQueue(inpt)
        batch = Step(BatchOp,
                     operator_kwargs={'batch_size': 2, 'shuffle': False, 'only_full': True, },
                     previous_step=i2q,
                     n_workers=1)
        out = FromQueue(batch)
        actual = [_ for _ in out] + [_ for _ in out]
        expected = [[0, 1], [2, 3], [0, 1], [2, 3]]
        self.assertListEqual(actual, expected, 'should make only batches of two, twice')

    def test_BatchOp_random_full_only(self):
        inpt = [0, 1, 2, 3, 4]
        i2q = IterToQueue(inpt)
        batch = Step(BatchOp,
                     operator_kwargs={'batch_size': 2, 'shuffle': True, 'only_full': True, 'buffer_size': 3},
                     previous_step=i2q,
                     n_workers=1)
        out = FromQueue(batch)
        actual = [_ for _ in out] + [_ for _ in out]
        notexpected = [[0, 1], [2, 3], [0, 1], [2, 3]]
        self.assertNotEqual(actual, notexpected, 'should make random batches of two only, twice')

    def test_BatchOp_random_full_only_len_1(self):
        inpt = [0, 1, 2, 3, 4]
        i2q = IterToQueue(inpt)
        batch = Step(BatchOp,
                     operator_kwargs={'batch_size': 2, 'shuffle': True, 'only_full': True, 'buffer_size': 3},
                     previous_step=i2q,
                     n_workers=1)
        out = FromQueue(batch)
        res = [_ for _ in out] + [_ for _ in out]
        actual = [len(res), [len(r) for r in res]]
        expected = [4, [2, 2, 2, 2]]
        self.assertListEqual(actual, expected, 'should make 4 batches of length 2')

    def test_BatchOp_random_full_only_len_2(self):
        inpt = [0, 1, 2, 3, 4]
        i2q = IterToQueue(inpt)
        batch = Step(BatchOp,
                     operator_kwargs={'batch_size': 2, 'shuffle': True, 'only_full': True, 'buffer_size': 5},
                     previous_step=i2q,
                     n_workers=1)
        out = FromQueue(batch)
        res = [_ for _ in out] + [_ for _ in out]
        actual = [len(res), [len(r) for r in res]]
        expected = [4, [2, 2, 2, 2]]
        self.assertListEqual(actual, expected, 'should make 4 batches of length 2')

    def test_BatchOp_random_full_only_len_3(self):
        inpt = [0, 1, 2, 3, 4]
        i2q = IterToQueue(inpt)
        batch = Step(BatchOp,
                     operator_kwargs={'batch_size': 2, 'shuffle': True, 'only_full': True, 'buffer_size': 50},
                     previous_step=i2q,
                     n_workers=1)
        out = FromQueue(batch)
        res = [_ for _ in out] + [_ for _ in out]
        actual = [len(res), [len(r) for r in res]]
        expected = [4, [2, 2, 2, 2]]
        self.assertListEqual(actual, expected, 'should make 4 batches of length 2')


if __name__ == '__main__':
    unittest.main()

cranial_modeling-0.2.0.dist-info/LICENSE

GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The GNU General Public License is a free, copyleft license for software and other kinds of works.

The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. 
Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. 
A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. 
All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. 
When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. 
This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. 
b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. 
A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. 
But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. 
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. 
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. 
Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. 
The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. 
"Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. 
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. 
If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. 
If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Also add information on how to contact you by electronic and paper mail. If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode: Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. 
The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box".

You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see .

The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read .