A Spherical Joke

Milk production at a dairy farm was low, so the farmer wrote to the local university, asking for help from academia. A multidisciplinary team of professors was assembled, headed by a theoretical physicist, and two weeks of intensive on-site investigation took place. The scholars then returned to the university, notebooks crammed with data, where the task of writing the report was left to the team leader. Shortly thereafter the physicist returned to the farm, saying to the farmer “I have the solution, but it only works in the case of spherical cows in a vacuum.”

More on the topic.

Tregime te Moçme Shqiptare (Old Albanian Tales) {Mitrush Kuteli} [drawings by Gazmend Leka]

Click on any image to start reading. Source: MNVR.



Feynman on social sciences

On News Media

Le Pont du Gard, by Hubert Robert

A car drives over a bridge, and the bridge collapses. What does the news media focus on? The car. The person in the car. Where he came from. Where he planned to go. How he experienced the crash (if he survived). But that is all irrelevant. What’s relevant? The structural stability of the bridge. That’s the underlying risk that has been lurking, and could lurk in other bridges. But the car is flashy, it’s dramatic, it’s a person (non-abstract), and it’s news that’s cheap to produce. News leads us to walk around with the completely wrong risk map in our heads. So terrorism is over-rated. Chronic stress is under-rated. The collapse of Lehman Brothers is overrated. Fiscal irresponsibility is under-rated. Astronauts are over-rated. Nurses are under-rated.

– Taleb, Dobelli

Click Here for the full article.

Surely You’re Joking, Mr. Feynman! (click the image)

Click the image to read

Build your own Google Alerts substitute (Python)

I loved Google Alerts while it worked. When it stopped, I was forced to create my own notifier (in the fastest possible way), so I wrote a basic Python script that, for my particular case, does the job pretty well.

The idea is the following: search Google for the keyword or phrase of interest, get the results back, and find the total number of results. Compare this number to the previous number of results (saved in a file); if it is greater, send an email and save the larger number to the file for the next comparison. You can put this script in a crontab or, on Windows, use Z-cron, and schedule it to run invisibly, say, every hour.
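The compare-and-save step can be tried on its own, without touching Google or email. The following is a minimal sketch, not part of the script itself: the function name `has_new_results` and the temporary file are made up for illustration, and a missing or empty file is treated as a previous count of zero, which is an assumption.

```python
import os
import tempfile


def has_new_results(current_count, path):
    """Return True (and store current_count) if it exceeds the count saved at path."""
    try:
        with open(path) as f:
            previous = int(f.read().strip() or 0)
    except IOError:  # first run: the file does not exist yet
        previous = 0
    if current_count <= previous:
        return False
    # Save the larger number for the next comparison
    with open(path, "w") as f:
        f.write(str(current_count))
    return True


# Demo with a throwaway file (hypothetical counts)
demo_path = os.path.join(tempfile.mkdtemp(), "searched.txt")
print(has_new_results(120, demo_path))  # True: nothing stored yet
print(has_new_results(120, demo_path))  # False: same count as before
print(has_new_results(125, demo_path))  # True: the count grew
```

In the real script, the `True` branch is where the email would be sent.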

After the notification, I would go to Google and check the results from the last 24 hours to see what’s going on.

Below is the Python 2 script that searches for “Nard Ndoka” and sends an email to nardndoka@gmail.com when a new page mentions this name. To use the script, change the URL, email credentials, and file paths accordingly.


import urllib2
import re
from bs4 import BeautifulSoup
import smtplib


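def remove_html_markup(s):
    """Strip HTML tags from s, ignoring '<' and '>' inside quoted attribute values."""
    tag = False    # True while we are inside <...>
    quote = False  # True while we are inside a quoted attribute value
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out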

url = "https://www.google.com/search?q=%22Nard+Ndoka%22"  # URL to search; change accordingly

# Fetch the results page with a browser-like User-Agent
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
find_html = opener.open(url).read()
soupg = BeautifulSoup(find_html, "html.parser")  # name the parser explicitly
results = str(soupg.findAll("div", {"id": "resultStats"}))

text = remove_html_markup(results)  # remove HTML tags
final = re.sub(r"\D", "", text)  # remove non-digit characters


path = r"C:\Users\user\Desktop\searched.txt"  # raw string, so the backslashes are kept literally

# Read the result count saved by the previous run
with open(path) as f:
    value = f.read()

if int(final) <= int(value):
    print "No new results"
else:
    # Save the larger count for the next comparison
    with open(path, "w") as f:
        f.write(final)
    print "New results. Sending mail.."
    def send_email2():
        gmail_user = "nardndoka@gmail.com"
        gmail_pwd = "password"
        FROM = 'nardndoka@gmail.com'
        TO = ['nardndoka@gmail.com']  # must be a list
        SUBJECT = "New Alert"
        # Concatenate; a comma here would turn TEXT into a tuple
        TEXT = "Google Search number was changed for Nard Ndoka. Now is " + final

        # Prepare the actual message: headers, a blank line, then the body
        message = "From: %s\nTo: %s\nSubject: %s\n\n%s\n" % (FROM, ", ".join(TO), SUBJECT, TEXT)
        try:
            # Port 587 with STARTTLS; port 465 would need smtplib.SMTP_SSL instead
            server = smtplib.SMTP("smtp.gmail.com", 587)
            server.ehlo()
            server.starttls()
            server.login(gmail_user, gmail_pwd)
            server.sendmail(FROM, TO, message)
            server.close()
            print 'Successfully sent the mail'
        except Exception as e:
            print "Failed to send mail:", e
    send_email2()
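One caveat about the digit-stripping step in the script: it keeps every digit found in the stats text. If that text also carries the query time, for example "About 1,230 results (0.34 seconds)" (an assumed format; Google's markup changes often), the stored number would mix the count with the timing. A minimal sketch that keeps only the first number, with `parse_result_count` being a hypothetical helper name:

```python
import re


def parse_result_count(stats_text):
    """Return the first number in stats_text as an int, or None if there is none."""
    match = re.search(r"\d+(?:,\d{3})*", stats_text)
    if match is None:
        return None
    return int(match.group(0).replace(",", ""))


sample = "About 1,230 results (0.34 seconds)"  # assumed format, for illustration only
print(re.sub(r"\D", "", sample))   # 1230034: count and timing digits run together
print(parse_result_count(sample))  # 1230
```

The pattern assumes commas as thousands separators; a different locale would need a different pattern.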

Nard Ndoka v1.0: how to build a chatbot that speaks Albanian in under 10 minutes (Python)

Briefly, the idea is this: we use the materials of a chatbot that speaks English, but translate its inputs and outputs into Albanian using Google Translate. So the user writes in Albanian -> this is sent to Google Translate and translated into English -> the chatbot then takes the English text, “understands” it -> and returns an answer in English -> this is then translated into Albanian with Google Translate, and the user reads the answer in Albanian. In short, a game of broken telephone.

I christened the chatbot Nard Ndoka because I liked its level of intelligence. Here is a small piece of my pleasant chat with him; only the candles were missing:

nardi

The lines without ‘>’ are Nardi’s.

If you also want Nardi on your computer, follow the steps below. If anything is unclear, read the links and their documentation.

1. Download Python and PyAIML

2. Download an AIML set. (I used this one)

3. In the directory where you downloaded the AIML files, create a file nardi.py and paste:

#!/usr/bin/env python
# -*- coding: latin-1 -*-
import urllib
import urllib2
import aiml

k = aiml.Kernel()
k.learn("std-startup.xml")
k.respond("load aiml b")
k.setBotPredicate("name", "Nard Ndoka")  # the bot's name

def translate(to_translate, to_langage="auto", langage="auto"):
    """Translate text through the Google Translate mobile page."""
    agents = {'User-Agent':"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)"}
    before_trans = 'class="t0">'  # marker that precedes the translation in the returned HTML
    # quote_plus URL-encodes the text (spaces become '+', other special characters are escaped)
    link = "http://translate.google.com/m?hl=%s&sl=%s&q=%s" % (to_langage, langage, urllib.quote_plus(to_translate))
    request = urllib2.Request(link, headers=agents)
    page = urllib2.urlopen(request).read()
    result = page[page.find(before_trans)+len(before_trans):]
    result = result.split("<")[0]
    return result

if __name__ == '__main__':
    while True:
        # read input -> translate to English -> get Nardi's reply in English -> translate back to Albanian and print it
        print translate(k.respond(translate(raw_input("> "), 'en')), 'sq').decode('utf-8')

Then run the script, and VOILA, Nardi is all yours!
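The round trip (Albanian -> English -> bot -> English -> Albanian) can be sketched with the translator and the bot passed in as plain functions, so it can be tried without network access or an AIML set. Everything named here (`make_chat`, `fake_translate`, `fake_respond`, the phrase table) is a made-up stand-in for illustration, not part of PyAIML or Google Translate.

```python
def make_chat(translate, respond, user_lang="sq", bot_lang="en"):
    """Build a reply function: user language -> bot language -> bot -> user language."""
    def chat(message):
        question = translate(message, bot_lang)  # e.g. Albanian -> English
        answer = respond(question)               # the bot only "understands" English
        return translate(answer, user_lang)      # English -> Albanian
    return chat


# Stand-ins for Google Translate and the AIML kernel (hypothetical data)
PHRASES = {
    ("Si quhesh?", "en"): "What is your name?",
    ("My name is Nard Ndoka.", "sq"): "Une quhem Nard Ndoka.",
}


def fake_translate(text, to_language):
    # Unknown phrases pass through untranslated
    return PHRASES.get((text, to_language), text)


def fake_respond(text):
    if text == "What is your name?":
        return "My name is Nard Ndoka."
    return "I do not understand."


chat = make_chat(fake_translate, fake_respond)
print(chat("Si quhesh?"))  # Une quhem Nard Ndoka.
```

With the real pieces, the same wiring would be `make_chat(translate, k.respond)`, since `translate` takes the text and the target language as its first two arguments.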

Analyzing Job Listings to See Employers’ Most Desired Skills

To get a 1,000-foot view of the job market's requirements and preferences, I thought it would be a good idea to count the frequency of words in job descriptions and requirements posted by the companies themselves. As the main source, I used newsletters sent by the American University in Bulgaria (Career Center) to its alumni, covering open job positions posted from June until the end of August 2013. To see the nature of these job listings, see a sample of a single newsletter. The overall scanned text length was 94,725 words. Now, straight to the results, since I intended this to be just a quickie.

Most used keywords, without any kind of filtering:

top_keywords

Keywords that are related to certain fields of study:

field-related

Languages: (Most of the jobs are located in Bulgaria, hence the high bar for Bulgarian)

languages

Top technologies mentioned:

technologies
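Counts of this kind can be produced with a few lines of Python. This is a minimal sketch under stated assumptions, not necessarily how the charts above were made: the function name `keyword_counts`, the tokenizing pattern, and the sample sentence are all illustrative.

```python
import re
from collections import Counter


def keyword_counts(text, vocabulary=None):
    """Count lowercase words in text; if vocabulary is given, count only those words."""
    words = re.findall(r"[a-z]+", text.lower())
    if vocabulary is not None:
        wanted = set(w.lower() for w in vocabulary)
        words = [w for w in words if w in wanted]
    return Counter(words)


# Made-up sample text standing in for the newsletters
sample = ("Fluent English required. Bulgarian is a plus. "
          "Excellent communication skills in English.")

print(keyword_counts(sample).most_common(1))  # [('english', 2)]
print(keyword_counts(sample, ["English", "Bulgarian", "German"]))
```

Passing a vocabulary gives the filtered views (languages, technologies, fields of study). The simple `[a-z]+` pattern would split names such as "C++", so a technology list would need a looser pattern.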

Most used collocations (sequences of words that occur together unusually often):

University degree; remuneration package; communication skills; problem solving; Human Resources; Competitive remuneration; Shared Services; English language; computer literacy; short-listed candidates; customer service;  Computer Science.
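One common way to find word pairs that occur together "unusually often" is to score adjacent pairs by pointwise mutual information (PMI), which compares how often a pair appears with how often its words appear separately. The sketch below is an assumption about one possible approach, not a description of how the list above was produced; the function name, the `min_count` threshold, and the sample text are made up.

```python
import math
import re
from collections import Counter


def collocations(text, n=5, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    words = re.findall(r"[a-z]+", text.lower())
    word_counts = Counter(words)
    pair_counts = Counter(zip(words, words[1:]))
    total = float(len(words))
    scored = []
    for (a, b), count in pair_counts.items():
        if count < min_count:  # rare pairs get unreliable scores
            continue
        p_pair = count / total
        p_a = word_counts[a] / total
        p_b = word_counts[b] / total
        scored.append((math.log(p_pair / (p_a * p_b)), a + " " + b))
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:n]]


# Made-up sample text
sample = ("good communication skills and good problem solving skills. "
          "communication skills matter. problem solving matters. "
          "good people, good food.")
print(collocations(sample))  # ['problem solving', 'communication skills']
```

Note that this simple version also counts pairs that straddle sentence boundaries.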

Tag Cloud (overall):

wordcloud

Technical Thoughts on Bullshit

First, to clarify why these are technical thoughts and not just thoughts. It all started with my not-so-great idea of having an automated bullshit detector, one that resembles a spam filter, but for bullshit. Since in our times that thing would be super-busy, I rolled up my sleeves (or to put it more realistically, I stopped masturbating). Actually building such a machine is quite challenging, so I thought I would first speculate theoretically on the principles that would make such a machine possible. Beware, theoretical speculations might be only theoretically useful!

Let’s cut the crap and go straight to the bullshit part. One way, the simplest way, to prevent bullshit is to automatically filter out subjects that relate to or contain specific terms such as “initiative”, “leadership”, “prediction”, “synergy”, “the government will” and so forth. The list of keywords can be expanded, reduced or altered according to the user’s tolerance for bullshit. But why these keywords? I’ll be quick: ‘initiatives’ – mostly used by NGOs to initiate wire transfers, ‘leadership’ – used especially in university pamphlets while unintentionally throwing out the dangerous idea that our leaders are educated, ‘prediction’ – mostly used by banks and investors to show us they can see the future but not the past, ‘synergy’ – used by businesses to describe the mutual exploitation that keeps them alive, and ‘the government will’ – used by officials and government PR agencies such as the media.
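As a rough illustration of the keyword approach, here is a minimal sketch. The term list comes from the paragraph above; the function names and the case-insensitive substring matching are assumptions made for the example.

```python
BULLSHIT_TERMS = ["initiative", "leadership", "prediction", "synergy", "the government will"]


def is_bullshit(subject, terms=BULLSHIT_TERMS):
    """Return True if the subject contains any listed term (case-insensitive substring match)."""
    lowered = subject.lower()
    return any(term in lowered for term in terms)


def filter_subjects(subjects, terms=BULLSHIT_TERMS):
    """Keep only the subjects that contain none of the terms."""
    return [s for s in subjects if not is_bullshit(s, terms)]


# Made-up subject lines
inbox = [
    "New Leadership Initiatives for Youth",
    "Bridge maintenance schedule",
    "Unlocking synergy across departments",
]
print(filter_subjects(inbox))  # ['Bridge maintenance schedule']
```

Substring matching means "initiative" also catches "initiatives"; a shorter or longer term list corresponds to a different tolerance level.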

Technically speaking, keyword filtering is easy to implement. Implementally speaking, the verb ‘do’ would have been more straightforward. But you have to understand, ‘implement’ is more sophisticated and longer, and it increases the text length, which bullshit lovers interpret as: “too many lines, too much knowledge”. Also, let’s not forget that ‘implement’ comes from Latin whereas ‘do’ comes from ‘scooby doo’.

Another, mechanical, way of spotting bullshit might consist of checking for overuse of terms and ideas over a long period or a large sample. I first realized this when skimming Jehovah’s Witnesses’ pamphlets and noticing how many times the word ‘truth’ was mentioned. The number of occurrences was so vast that one starts to think there is something fishy going on. Then, an epiphany: it’s the lack of truth that makes them emphasize it. Compensating for the unseen at its dumbest.
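The overuse check could be sketched as a rate comparison: how often a term appears per 1,000 words, against some baseline rate for ordinary text. The baseline value and the `factor` threshold below are pure assumptions for illustration, as are the function names and the sample sentence.

```python
import re


def term_rate(text, term):
    """Occurrences of term per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 1000.0 * words.count(term.lower()) / len(words)


def overused(text, term, baseline_rate, factor=10):
    """True if the term's rate exceeds factor times the baseline rate."""
    return term_rate(text, term) > factor * baseline_rate


# Made-up sample; a real check would need a long-term sample
pamphlet = "The truth is the truth, and truth matters."
print(term_rate(pamphlet, "truth"))                    # 375.0
print(overused(pamphlet, "truth", baseline_rate=1.0))  # True
```

On a sample this short the rate means little; the point of the paragraph above is that the signal only shows up over a long period or a large sample.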

The same happens with the word ‘intelligent’ (and its many synonyms) in business departments. They keep using and overusing these words until they don’t mean anything anymore; until they become conjunctions. Conjunctions of other conjunctions. But in business, and by business here I mean mostly corporate-sized business, the bullshitter actually knows that the bullshitee (i.e. the bullshit receiver) knows he is being bullshitted. I-know-that-you-know-but-let’s-pretend-to-not-know, that kind of situation. And they continue to do it, top-down, bottom-up.

Politics, on the other hand, is in a different league. Here, inferior bullshit is impossible to find. Everything is top-notch, and of course, top-down. In such a situation, our anti-bullshit machine would be pointless, like a submarine looking for liquid matter. Imagine for a moment how bored our machine would become when scanning: bullshit-bullshit-bullshit-bullshit-bullshit-bullshit. Such a machine would not even need 0s and 1s to work; a single state would be enough.

I have to stop at some point because I run the risk of going on indefinitely if I continue to give examples. The truth is, my dream machine is very far from becoming a reality, and we will still be surrounded by bullshitters and bullshitees for a long time. Until then, the least we can do is scooby dooby doo.

Banksy comes back to life

Who is asking about Banksy, you will say, I know, but those who are asking will get an answer in October of this year.

UPDATE: This was about Banksy's show in New York. Take a look: http://www.banksyny.com/home/index
