Monthly Archives: October 2013

Nard Ndoka v1.0: how to build a chatbot that speaks Albanian in under 10 minutes (Python)

In short, the idea is this: we reuse the materials of a chatbot that speaks English, but translate its inputs and outputs to and from Albanian using Google Translate. That is, the user types in Albanian -> the text is sent to Google Translate and translated into English -> the chatbot receives the English text and “understands” it -> it replies in English -> the reply is translated back into Albanian by Google Translate, and the user reads the answer in Albanian. In other words, a game of broken telephone.
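The round trip described above can be sketched independently of any particular translator or bot. This is a minimal Python 3 illustration, not the script from this post: `relay` and the three toy functions are hypothetical names, and the two tiny dictionaries merely stand in for Google Translate.

```python
def relay(user_text, to_english, bot_reply, to_albanian):
    """Albanian in -> English -> bot -> English reply -> Albanian out."""
    english_in = to_english(user_text)
    english_out = bot_reply(english_in)
    return to_albanian(english_out)


# Toy stand-ins (hypothetical): a one-entry dictionary in each direction
# instead of Google Translate, and a canned reply instead of an AIML kernel.
SQ_TO_EN = {"pershendetje": "hello"}
EN_TO_SQ = {"hi there": "tungjatjeta"}


def toy_to_english(text):
    return SQ_TO_EN.get(text.lower(), text)


def toy_to_albanian(text):
    return EN_TO_SQ.get(text.lower(), text)


def toy_bot(text):
    return "hi there" if text == "hello" else "i do not understand"


print(relay("Pershendetje", toy_to_english, toy_bot, toy_to_albanian))
```

Passing the translators and the bot in as plain functions keeps each hop of the telephone replaceable, which helps when one of them misbehaves.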

I christened the chatbot Nard Ndoka because I liked its level of intelligence. Here is a snippet of my pleasant chat with him; only the candles were missing:


The lines without ‘>’ are Nard’s.

If you want Nard on your own computer too, follow the steps below. If anything is unclear, read the links and their documentation.

1. Download Python and PyAIML

2. Download an AIML set. (I used this one)

3. In the directory where you downloaded the AIML files, create a file and paste:

#!/usr/bin/env python
# -*- coding: latin-1 -*-
import urllib2
import aiml

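k = aiml.Kernel()  # create the AIML interpreter
k.learn("std-startup.xml")  # assumption: the AIML set ships a std-startup.xml that handles "load aiml b"
k.respond("load aiml b")
k.setBotPredicate("name", "Nard Ndoka")  # the bot's name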

def translate(to_translate, to_langage="auto", langage="auto"):  # performs the translation via Google Translate
    agents = {'User-Agent':"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)"}
    before_trans = 'class="t0">'
    link = "" % (to_langage, langage, to_translate.replace(" ", "+"))
    request = urllib2.Request(link, headers=agents)
    page = urllib2.urlopen(request).read()
    result = page[page.find(before_trans)+len(before_trans):]
    result = result.split("<")[0]
    return result

if __name__ == '__main__':
    while True:
        # read input -> translate to English -> get Nard's reply in English -> translate back to Albanian and print it
        print translate(k.respond(translate(raw_input("> "), 'en')), 'sq').decode('utf-8')

Then run the script, and VOILA, Nard is all yours!
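The most fragile part of the script is the scraping step: it looks for the marker `class="t0">` in the returned page and keeps everything up to the next `<`. Below is a slightly more defensive sketch of that step in Python 3 (the script above is Python 2). The helper names and the sample HTML are made up for illustration; the real page layout may differ or change at any time.

```python
from urllib.parse import quote_plus

MARKER = 'class="t0">'  # same marker the script searches for


def extract_translation(page, marker=MARKER):
    """Return the text between the marker and the next '<', or '' if absent."""
    start = page.find(marker)
    if start == -1:
        # The original slicing would silently return unrelated text here.
        return ""
    start += len(marker)
    end = page.find("<", start)
    return page[start:] if end == -1 else page[start:end]


def encode_query(text):
    """URL-encode the text; this covers more than replacing spaces with '+'."""
    return quote_plus(text)


# Hypothetical page fragment, only to exercise the helper.
sample_page = '<div dir="ltr" class="t0">how are you</div><div>footer</div>'
print(extract_translation(sample_page))  # how are you
print(encode_query("si je & ku je?"))    # si+je+%26+ku+je%3F
```

Returning an empty string when the marker is missing makes a layout change on the translation page visible, instead of feeding the bot a random slice of HTML.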

Analyzing Job Listings to See Employers’ Most Desired Skills

To get a 1,000-foot view of the job market's requirements and preferences, I thought it would be a good idea to count the frequency of words in job descriptions and requirements posted by the companies themselves. As the main source, I used newsletters sent by the American University in Bulgaria (Career Center) to its alumni, covering open positions posted from June until the end of August 2013. To see the nature of these job listings, see a sample of a single newsletter. The scanned text totalled 94,725 words. Now, straight to the results, since I intended this to be just a quickie.
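For anyone who wants to repeat the exercise, the counting itself takes a few lines. This is a rough Python 3 sketch, not necessarily how the numbers in this post were produced; the tokenizer (lowercase runs of letters) and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase alphabetic tokens in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


# Made-up sample, standing in for the newsletter text.
sample = ("Excellent communication skills. "
          "Communication skills and Excel skills required.")
print(word_frequencies(sample).most_common(2))
# [('skills', 3), ('communication', 2)]
```

A tokenizer this crude drops digits and symbols, so terms like "C++" would need a more careful pattern.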

Most used keywords, without any kind of filtering:


Keywords that are related to certain fields of study:


Languages: (Most of the jobs are located in Bulgaria, hence the high bar for Bulgarian)


Top technologies mentioned:


Most used collocations (sequences of words that occur together unusually often):

University degree; remuneration package; communication skills; problem solving; Human Resources; Competitive remuneration; Shared Services; English language; computer literacy; short-listed candidates; customer service;  Computer Science.
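One common way to find word pairs that occur together "unusually often" is to score adjacent pairs by pointwise mutual information (PMI). The sketch below is an assumption about how such a list could be built, not a record of the tool used for this post; `top_collocations`, its thresholds, and the sample text are all illustrative.

```python
import math
import re
from collections import Counter


def top_collocations(text, min_count=2, top_n=5):
    """Rank adjacent word pairs by an approximate PMI score."""
    words = re.findall(r"[a-z]+", text.lower())
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    total = len(words)
    scored = []
    for (w1, w2), n in bigrams.items():
        if n < min_count:
            continue  # rare pairs get inflated PMI, so skip them
        # PMI ~ log2( P(w1, w2) / (P(w1) * P(w2)) ), using the word count
        # as the denominator for both unigram and bigram probabilities.
        pmi = math.log2(n * total / (unigrams[w1] * unigrams[w2]))
        scored.append((pmi, w1 + " " + w2))
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:top_n]]


# Made-up sample text.
sample = ("customer service and problem solving. good customer service needs "
          "problem solving and more customer service")
print(top_collocations(sample))
```

Because punctuation is stripped before pairing, this version also counts pairs that straddle sentence boundaries; a stricter version would split into sentences first.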

Tag Cloud (overall):


Technical Thoughts on Bullshit

First, let me clarify why these are technical thoughts and not just thoughts. It all started with my not-so-great idea of having an automated bullshit detector, one that resembles a spam filter, but for bullshit. Since in our times that thing would be super-busy, I rolled up my sleeves (or to put it more realistically, I stopped masturbating). Actually building such a machine is quite challenging, so I thought I would first speculate theoretically on the principles that would make such a machine possible. Beware, theoretical speculations might be only theoretically useful!

Let’s cut the crap and go straight to the bullshit part. One way, the simplest way, to prevent bullshit is to automatically filter out subjects related to, or containing, specific terms such as “initiative”, “leadership”, “prediction”, “synergy”, “the government will” and so forth. The list of keywords can be expanded, reduced or altered according to the user’s tolerance level for bullshit. But why these keywords? I’ll be quick: ‘initiatives’ – mostly used by NGOs to initiate wire transfers, ‘leadership’ – used especially in university pamphlets while unintentionally floating the dangerous idea that our leaders are educated, ‘prediction’ – mostly used by banks and investors to show us they can see the future but not the past, ‘synergy’ – used by businesses to describe the mutual exploitation that keeps them alive, and ‘the government will’ – used by officials and government PR agencies such as the media.

Technically speaking, keyword filtering is easy to implement. Implementally speaking, the verb ‘do’ would have been more straightforward. But you have to understand, ‘implement’ is more sophisticated and longer, and it increases the text length, which bullshit lovers interpret as: “too many lines, too much knowledge”. Also, let’s not forget that ‘implement’ comes from Latin whereas ‘do’ comes from ‘scooby doo’.
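To show just how easy, here is a rough Python 3 sketch of the keyword filter described above. The keyword list comes from this post; the function names and the `tolerance` parameter (how many distinct keywords you are willing to put up with) are my own assumptions.

```python
import re

DEFAULT_KEYWORDS = [
    "initiative",
    "leadership",
    "prediction",
    "synergy",
    "the government will",
]


def bullshit_hits(text, keywords=DEFAULT_KEYWORDS):
    """Return the keywords found in the text, including suffixed forms."""
    lowered = text.lower()
    hits = []
    for kw in keywords:
        # \b anchors the start of a word; \w* lets "initiatives" match too.
        if re.search(r"\b" + re.escape(kw) + r"\w*", lowered):
            hits.append(kw)
    return hits


def is_bullshit(text, keywords=DEFAULT_KEYWORDS, tolerance=0):
    """Flag the text when it contains more distinct keywords than tolerated."""
    return len(bullshit_hits(text, keywords)) > tolerance


print(bullshit_hits("Our leadership launched new initiatives."))
# ['initiative', 'leadership']
print(is_bullshit("I fixed the bug."))  # False
```

Raising `tolerance` or trimming the list is the knob for readers with a stronger stomach.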

Another, mechanical, way of spotting bullshit might consist of checking for overuse of terms and ideas over a long period or a large sample. I first realized this when skimming Jehovah’s Witnesses’ pamphlets and noticing how many times the word ‘truth’ was mentioned. The number of occurrences was so vast that one starts to think there is something fishy going on. Then, an epiphany: it’s the lack of truth that makes them emphasize it. Compensating for the unseen at its dumbest.
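The overuse check can be approximated by comparing how often a word appears in the suspect text against some reference text. What follows is a speculative Python 3 sketch: `overused_terms`, the `ratio` and `min_count` thresholds, and both sample texts are assumptions, and picking a sensible reference corpus is the hard part that this snippet waves away.

```python
import re
from collections import Counter


def overused_terms(sample, reference, ratio=3.0, min_count=3):
    """Words whose relative frequency in sample is >= ratio times the reference."""
    def count_words(text):
        words = re.findall(r"[a-z]+", text.lower())
        return Counter(words), max(len(words), 1)

    s_counts, s_total = count_words(sample)
    r_counts, r_total = count_words(reference)
    flagged = []
    for word, n in s_counts.items():
        if n < min_count:
            continue  # ignore words that barely occur
        sample_rate = n / s_total
        # Add-one smoothing so words unseen in the reference do not divide by zero.
        reference_rate = (r_counts[word] + 1) / (r_total + 1)
        if sample_rate / reference_rate >= ratio:
            flagged.append(word)
    return sorted(flagged)


# Made-up texts for illustration.
pamphlet = "the truth is the truth and only the truth"
everyday = "the cat sat on the mat and the dog saw the bird fly away"
print(overused_terms(pamphlet, everyday))
```

Common words such as "the" are frequent in both texts and so are not flagged; only the word that is disproportionately frequent in the sample stands out.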

The same happens with the word ‘intelligent’ (and its many synonyms) in business departments. They keep using and overusing these words until they don’t mean anything anymore; until they become conjunctions. Conjunctions of other conjunctions. But in business, and by business here I mean mostly corporate-sized business, the bullshitter actually knows that the bullshitee (i.e. the bullshit receiver) knows that he is being bullshitted. I-know-that-you-know-but-let’s-pretend-to-not-know, that kind of situation. And they continue to do it, top-down, bottom-up.

Politics, on the other hand, is in a different league. Here, inferior bullshit is impossible to find. Everything is top-notch, and of course, top-down. In such a situation, our anti-bullshit machine would be pointless, like a submarine looking for liquid matter. Imagine for a moment how bored our machine would become when scanning: bullshit-bullshit-bullshit-bullshit-bullshit-bullshit. Such a machine would not even need 0s and 1s to work; a single state would be enough.

I have to stop at some point because I run the risk of going on indefinitely if I keep giving examples. The truth is, my dream machine is very far from becoming a reality, and we will still be surrounded by bullshitters and bullshitees for a long time. Until then, the least we can do is scooby dooby doo.
