Archive for the ‘Computer Science’ Category
For the love of F#
Enter F#, the ideal middle ground that I’ve always wanted. F# is a functional programming language from Microsoft Research that sits on top of the .NET framework and, like all other .NET languages, integrates easily with its less functional cousins. The best feature of F# for me has been that it’s a gentle yet intellectually stimulating introduction to the world of .NET programming that I’ve so cautiously eschewed so far. The thing with learning a new language is not learning new syntax. Anyone can do that over a weekend. The hard part is learning its idiosyncrasies: the libraries that let you do real things, the idiomatic usage of language constructs and the external peculiarities that go with the language, be it makefiles and the umpteen build systems that go with C++, Ant and Maven with Java, or the comfortable world of Visual Studio, which makes you believe building is just one keystroke away.
Learning F#, as obscure as it is, means that I get to learn the massive libraries that go with the .NET world just once, rather than again and again. I can just as easily use them from a different language. Code reuse is a big deal, but something much bigger is the knowledge reuse you get with platforms like .NET.
So, how does one get started with F#? Well, thanks to Microsoft, it has been made absurdly simple. The first thing for anyone programming within the Windows environment is Visual Studio. You can get the free Visual Studio Integrated shell here and F# here.
Once that’s set up, you can hack away.
Signing off,
Vishnu Vyas
How much time does it take to secure a Linux system?
4 hours, yeah.. 4 fucking hours, especially if you are a newbie to the whole “networking” business of iptables, ipchains and similar thingamajigs…
I am trying to set up a small Python-based annotation engine and I am planning to let it out into the wild on the internet (the horror!!). Like any normal chump who sees the whole “web is the way to go for apps” mentality everywhere, I set up my application server behind Apache using mod_proxy and let it run for some time. And “some time” in the fast-moving internet space is 3 days. On the third day I checked my logs and saw lots and lots of random people from all over the world trying to hack my damn server. Well, this story is not about them.
So I decided to set up iptables - that’s a pretty darn good idea, you might say.. except for one thing.. I didn’t know anything about iptables. So, after browsing for almost an hour through tutorials, howtos, message boards and Google Groups (has anyone noticed that the search in Google Groups sucks?), I still couldn’t get anywhere.
Every tutorial out there seems to want to teach me what a TCP packet is, what link layer protocols are, or the whole history of iptables filtering. Many would say that’s great: you learn from the basics, you get your concepts straight. And to them I say “F*#$ you”. I just want to secure my damn server, not take the RHCE. After three more hours of digging and reading about the various “subtleties” of the IP protocol, I finally managed to figure out what to do to secure my server.
Write 2 lines. Yeah, just 2 lines - the result of spending 4 fucking hours is not enlightenment, just getting to write two lines. For those who are using mod_proxy and don’t have a Linux networking guru at their every beck and call, here are those two lines:
/sbin/iptables -A INPUT -p tcp -m tcp -s "your-hostname/ip/trusted subnet" --dport "application server port" -j ACCEPT
/sbin/iptables -A INPUT -p tcp -m tcp --dport "application server port" -j DROP
Here “your-hostname/ip/trusted subnet” should usually refer to the machine on which Apache is running; in my case, the same machine. The “application server port” is the port on which CherryPy listens; by default I think it’s 8080. If you have multiple instances of CherryPy running, you need to add similar rules for each instance (note: add the ACCEPT rules before the DROP rules, since iptables applies the first rule that matches).
Signing off,
Vishnu Vyas
Is Eclipse the next Emacs?
Those who know me know that I am a big fan of Emacs, almost to the point of being religious. And recently I’ve found another one - Eclipse. Emacs, as most would know, is the ultimate editor, written in a dialect of Lisp called elisp (which predates the attempts to standardize Lisp and Common Lisp). It was the result of a time and a place where almost every programmer wrote Lisp, AI was a buzzword and Symbolics was a household name.
Thus Emacs was naturally written in the language of its time - Lisp. With over 30 years under its belt, Emacs is now a mature multipurpose application, so much so that people go to the extent of calling it an operating system. What made Emacs such a huge success story was not only that it was written in Lisp, the language of the day, but that it was also extensible in Lisp, the language that most programmers who first used Emacs knew. Thus every pet peeve of almost every programmer was solvable with just a few lines of elisp. Extensibility - that’s what made Emacs a huge success. With packages for everything from terminal emulation and remote editing to newsreaders and even a web browser, Emacs is one multipurpose software application.
With the coming of the AI winter, Lisp lost ground and eventually gave way to Java. Java, having been heavily used over the past 10-20 years, has become the lingua franca of our time. And with Java we have another Emacs incarnate, something that’s not only written in Java but also extensible in Java - Eclipse. It has the same extensibility as Emacs, though it is not yet as mature in terms of extensions. So, is Eclipse the next Emacs?
Signing Off,
Vishnu Vyas
Flexing my fingers.
It’s been quite a while since I wrote anything of any significance. My blog seems to have moved into a more or less vegetative state. Also, since I am in line for quite some writing in the coming days, I think it’s about time I did some emergency CPR here and got this blog back to life. Anyway, maybe I should start with a story. No, it’s not one about damsels in distress and charming princes. It’s a more mundane story about programming.
This happened not so long ago. I’ve always been a pretty good C++ programmer, and of late I’ve been doing a lot of my programming in Python. Python, if I haven’t mentioned it before, is this amazing dynamic language which is amazingly easy to use and, more importantly, to maintain. It’s one great language, except for its speed. For most practical purposes I never had any problems with the speed of Python. But when you have to wait for an hour to get some output on the data you are processing, it gets irritating. The task here was simple decipherment. I was basically using the EM algorithm (or, to be more precise, the forward-backward algorithm) for deciphering a piece of text. I managed to write a pretty good implementation of it in Python, but it was slow - real slow.
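For anyone who hasn’t met the forward-backward algorithm, here is a minimal sketch of it on a toy hidden Markov model. It is not the code from this story, just an illustration: the function name, the dictionary-based tables and the tiny two-letter example are all assumptions made for the sketch. In a decipherment setting the hidden states would be plaintext letters and the observations cipher letters.

```python
def forward_backward(observations, states, start_p, trans_p, emit_p):
    """Return (posteriors, likelihood) for a small HMM.

    posteriors[t][s] is P(state at position t is s | all observations).
    All tables are plain dicts; these names are illustrative only.
    """
    n = len(observations)

    # Forward pass: alpha[t][s] = P(o_0..o_t, state_t = s)
    alpha = [{} for _ in range(n)]
    for s in states:
        alpha[0][s] = start_p[s] * emit_p[s][observations[0]]
    for t in range(1, n):
        for s in states:
            alpha[t][s] = emit_p[s][observations[t]] * sum(
                alpha[t - 1][r] * trans_p[r][s] for r in states)

    # Backward pass: beta[t][s] = P(o_{t+1}..o_{n-1} | state_t = s)
    beta = [{} for _ in range(n)]
    for s in states:
        beta[n - 1][s] = 1.0
    for t in range(n - 2, -1, -1):
        for s in states:
            beta[t][s] = sum(
                trans_p[s][r] * emit_p[r][observations[t + 1]] * beta[t + 1][r]
                for r in states)

    # Likelihood of the whole observation sequence.
    likelihood = sum(alpha[n - 1][s] for s in states)
    posteriors = [
        {s: alpha[t][s] * beta[t][s] / likelihood for s in states}
        for t in range(n)
    ]
    return posteriors, likelihood


# Toy example: plaintext letters 'a'/'b' hidden behind cipher letters 'x'/'y'.
states = ("a", "b")
start_p = {"a": 0.5, "b": 0.5}
trans_p = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}}
emit_p = {"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.1, "y": 0.9}}
posteriors, likelihood = forward_backward("xy", states, start_p, trans_p, emit_p)
```

The nested loops over states at every position are exactly the kind of inner loop where pure Python crawls and a compiled language shines. A real implementation would also rescale or work in log space to avoid underflow on long texts, which this sketch skips.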
So I sat down and rewrote the forward-backward algorithm in C++ (in the time that my Python program was running) and the speed difference was unbelievable. My C++ code ran 40 times faster than my hand-optimized, Psyco-compiled Python code. If you have programmed in both C++ and Python, you already knew that: in my experience C++ is at least 10-fold faster than Python on average. But that’s not the lesson here.
The most amazing thing was that I actually managed to write, debug and get a working version of the C++ program in less time than I would have expected it to take. That’s the most surprising part. So I’ve decided to share my experience with you guys. One of the main things that really helped me during my C++ development was that not only did I have a very clear goal of what I was doing (which most software projects rarely have), but I also had a very clear idea of how I was going to do it. This was because I had already implemented my original version in Python.
Python, as someone has already said, is executable pseudo-code. I had a very clear idea of what data structure to use where, how to model the various elements (in this case, the plain text, the cipher text, etc.) and how my models interact with each other. This was all already done; the only thing remaining was a more or less manual translation from Python to C++. The whole lesson here is that Python is not only a great language for exploratory programming, but it’s a great language to prototype in as well.
I am sure that if I had started all this in C++ from the beginning, I would have been just too lazy to do all the refactoring that my code would have required. Changing from one type of object-method interface to another is pretty much a pain in C++. On the other hand, by the time I had my Python code running, it was not only a correct working version, but a well designed version as well. Any screw-ups in the initial design were promptly corrected without too much effort. The useless “just in case” virtual functions that would have cropped up in my C++ were not there, because refactoring is so easy in Python that you can add them as you go. And most of all, the bigger logical errors that occur when multiple objects interact with each other in a complicated program are easier to test for in a Python program than in a C++ program.
As an unexpected side effect, I picked up a couple of good habits from Python that I would never have bothered with in C++ for my hobby programming. For example, unit testing. I usually write unit tests only if my projects get big enough that I think it’s worth the trouble, but with Python you always have this simple if __name__ == '__main__' which serves as a poor man’s unit test. Not too much trouble, yet worth every second you invest in writing simple tests there. These days, I do it as a matter of habit for all my Python modules, and that’s one good habit that spontaneously extended to my C++. With a bit of preprocessor magic, you can do pretty much the same type of poor man’s unit testing in C++ as well, and this did save me some pain later.
Now that I’ve rather incoherently rambled on, I would like to summarize my experience. With Python you can not only prototype with great speed and get a clean implementation, you also end up picking up a lot of good habits on the way that make you not only a better Python programmer, but a better C++ programmer as well!
Signing off,
Vishnu Vyas
Frustration, thy name is JavaScript.
Do I need to say any more?
Is language Huffman coded?
Huffman coding, for the uninitiated, is a compression scheme in computer science that assigns short binary representations to frequently used characters and longer binary representations to the less frequently used ones. The idea is that what you lose by encoding less frequently occurring characters with longer bitstrings, you gain by encoding more frequently used characters with shorter bitstrings.
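Here is a minimal sketch of how such a code table can be built, using Python’s standard heapq module. The function name huffman_codes and the sample string are just illustrative choices, and the exact bit patterns depend on how ties are broken; only the code lengths matter for the argument.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a Huffman code table {symbol: bitstring} for the symbols in `text`."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): '0'}

    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The integer tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ''}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in left.items()}
        merged.update({s: '1' + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]


sample = 'aaaaaaaabbbc'          # 'a' is frequent, 'c' is rare
codes = huffman_codes(sample)
encoded_bits = sum(len(codes[ch]) for ch in sample)
```

On this sample the frequent 'a' ends up with a one-bit code and the rarer 'b' and 'c' with two bits each, so the twelve characters fit in 16 bits instead of the 24 that a fixed two-bit code would need.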
Having said that, something I noticed only lately is that human language seems to be encoded using a similar scheme. The more frequently a word is used, the shorter it is. For example, most of the closed class words like prepositions and determiners, and sometimes even commonly used verbs, are short, while the longer words are usually the rarely used ones. Since I’ve been bragging lately about all this computing power that I have, it’s about time to flaunt it.
I used the Enron corpus, which contains about 250,000 unique email messages totalling approximately 1.5 gigs of text; small, but not too small. I plotted the word frequency against the length of the word and the results are as expected.
There is an initial peak when the word length is about 3 characters or so, and then the frequency rapidly declines to almost nothing as the length increases to around 13 or so. A much clearer picture of the encoding that goes on here can be obtained if we normalize the frequency counts by the number of distinct words of a given length. There are very few one letter words (namely the determiner ‘a’) but more two letter words (‘an’, ‘to’, ‘in’, etc.) and so on.
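The counting itself is simple; a rough sketch of it follows. This is not the script used for the plots: the function name, the crude letters-only tokenizer and the toy sentence are assumptions for illustration, and the real run was over the whole corpus.

```python
import re
from collections import Counter, defaultdict


def frequency_by_length(text):
    """Return {length: (total_occurrences, distinct_words, mean_per_word)}.

    mean_per_word is the normalized figure: total occurrences of words of
    that length divided by the number of distinct words of that length.
    """
    # Crude tokenizer (an assumption): lowercase runs of ASCII letters only.
    words = re.findall(r"[a-z]+", text.lower())
    word_counts = Counter(words)

    totals = defaultdict(int)
    distinct = defaultdict(int)
    for word, count in word_counts.items():
        totals[len(word)] += count
        distinct[len(word)] += 1

    return {
        n: (totals[n], distinct[n], totals[n] / distinct[n])
        for n in sorted(totals)
    }


stats = frequency_by_length("A cat sat on a mat in a hat")
for length, (total, kinds, mean) in stats.items():
    print(length, total, kinds, mean)
```

Even in this toy sentence the single one-letter word carries three occurrences, while each three-letter word appears only once, which is the same shape the normalized plot shows at corpus scale.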
At first look the above graph looks negative exponential to me, which is what one would expect if someone were to do a good job on the compression.
It’s really surprising to realise that what usually requires sophisticated critical thinking (schemes like Huffman coding) can also be reproduced by random, unplanned phenomena like linguistic evolution.
Signing Off,
Vishnu Vyas.
Big Numbers 2 - The Hadoop Version.
Well, I’ve talked about big numbers before, but never before did I think that 1.5 gigabytes of text would actually be small. Well, guess what - it is small. Tiny, even. This is what happens when you have the power to process multiple terabytes and all you have is a puny 1.5 gigabytes of data. That’s right, I’m now set up on two clusters, one of 3 nodes and one of 19 nodes, all running Hadoop - an open source implementation modelled on the Google File System and Google’s MapReduce.
So, what is all the fuss about Hadoop and MapReduce? Haven’t people been doing such stuff for a long, long time? Well, yes and no. The idea of distributing your computation and then combining the results has been around for long, but what Hadoop does is that instead of moving your data to the place of computation, it moves the computation to the location of the data. This allows it to run multiple independent jobs called ‘maps’, each of which works on its own chunk of data, and then a ‘reduce’ combines the output of the map step into the final result that one desires. Programming in this model is fun, powerful and, furthermore, really really simple.
I will probably put up some ajaxy demos of some results that I’ve got with my newfound computing power quite soon, so till then, stay tuned.
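To give a flavour of the model, here is a tiny single-machine sketch of a word count written in the map and reduce style. It does not use Hadoop or its API at all; the function names and the in-memory “shuffle” step are stand-ins for what the framework would do across nodes.

```python
from collections import defaultdict


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(chunks):
    # Each map_chunk call touches only its own chunk, so on a cluster
    # every call could run on the node that already holds that chunk.
    mapped = [map_chunk(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


counts = word_count(["the cat", "The dog the end"])
```

The point of the split is that map_chunk never needs to see any other chunk and reduce_counts never needs to see any other key, which is what lets the framework spread both steps over as many machines as it has.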
Signing Off,
Vishnu Vyas
Big Numbers!
You know you are going to be an engineer, if not a scientist, if numbers fascinate you. Every time you encounter a big number for the first time your eyes pop open, baffled at even the existence of the concept - How can such large numbers exist? How many is a mole, really? What! - the sun is 93 million miles from here? How far is 93 million miles, really? These are the kind of questions that you could ponder all day until your brain just tires from thinking about them.
Then there comes a time, so sneakily that you don’t suspect it has come. You become used to the numbers. They become fact. They become usual and finally they become boring. Now the sun is no longer the exciting and unimaginable 93 million miles away that it was when you heard it for the first time. You take it for granted that a mole is just that ugly big number. It’s kind of a coming of age thing, a rite of passage if you will.
And Computer Science is no different; it offers its own bag of big numbers. In fact, you wouldn’t believe me if I told you that most computer scientists are obsessed with big numbers. And so was I. My jaw dropped when I heard things like: this program consumes 300 MB of RAM. It takes 7 hours to finish. We need around 300 computers to do that. The Pan Galactic Gargle Blaster effect. I was just as excited and confused as I was when I heard about the sun or the mole. But after working with large data sets, multi-million word corpora and gigabyte-size databases, for someone whose biggest database was a puny address book, these big numbers have become usual, rather sneakily if I might add. They’ve become matter-of-fact, common and boring. It’s not the Pan Galactic Gargle Blaster anymore. It’s not even the strong Russian vodka that any fellow comrade would swear by. It’s kinda become beer - American beer.
I guess I’ve come of age, the necessary rite of passage before I can contemplate terabytes, exponential order and thousand-node clusters.
Signing Off,
Vish(!hick!) nu Vyas
The Need for Esoteric Languages.
Caution: When I say esoteric, I mean non-mainstream, as opposed to things like INTERCAL or brainfuck.
The first thing that anyone who gets to know me in a professional capacity seems to find unusual about me is that I can program in a couple of languages that are very much non-mainstream, things like Haskell and Smalltalk. They consider it rather time-wasting, if not an utterly useless hobby.
One thing a friend of mine asked is why these languages even exist in the first place, when practically no one uses them. That’s one question that I had never bothered to ask myself, in spite of getting to play with 20 or so languages. He considers languages such as Haskell practically useless, in the sense that there is almost no mainstream development going on in them, and sees very little point in even trying to develop new ones.
Being a language enthusiast, I came up with the plethora of standard reasons that language enthusiasts do. Trite, boring old reasons like productivity, higher expressive power and what not. Then there were the reasons I always dish out in a half-believing manner, like how if it weren’t for Sun’s marketing muscle Smalltalk would have been the order of the day, and things like that. But what struck me as unusual was the part of the question about what purpose, if any at all, they serve, apart from satiating the bloated egos of self-proclaimed language enthusiasts.
Only on some deeper thinking could I answer that question for myself in a much clearer manner. Either through short-sightedness or arrogance, I had never seen this angle. These languages are fertile breeding grounds for newer ideas, paradigms and sometimes even groundbreaking innovations in the way we program (as opposed to just newer linguistic constructs). It is entirely plausible for those same innovations to come from the mainstream languages, and once in a while they do - like the STL, for instance. But generally they don’t.
That, in my opinion, is the bane of any mainstream language. Mainstream languages, by virtue of being mainstream, have a tradition in the way things are done: style guides, language restrictions, limits of the runtime and other restrictions. New innovations, even good ones, need a lot of pushing from within the community to gain any acceptance. In fringe languages like Haskell or ML, on the other hand, there is less community inertia, if any at all, so they can easily push newer innovations; it’s much easier to fork into new territory and explore the unexplored.
These are not just fringe languages, as I’ve called them so far; they are in fact frontier languages. They are usually at the edges of current paradigms, and sometimes they just fall flat over the edge without ever coming up with anything new. On the other hand, sometimes truly interesting ideas come out of them. Many a time, these ideas are incorporated into older, more mainstream languages. But once in a while there comes an idea or a philosophy associated with a language that’s so different, groundbreaking and amazing that it simply is not possible to do the back-porting anymore. Then the language has no choice but to go mainstream - a case in point is Ruby and Ruby on Rails.
That’s what we need those esoteric languages for. That’s where these language enthusiasts come in. They are the ones who will discover the next big thing. That’s precisely the need for esoteric languages.
Signing Off,
Vishnu Vyas.
The New Revolution - Computer Science.
Being an undergraduate in computer science for the past four years has affirmed my belief that computer science is a true and separate science in its own right, as opposed to a branch of engineering or just applied mathematics. The more I learn about what constitutes the fundamentals of computer science, the more I’m sure of it. But to characterize it in a simple way is currently beyond my limited understanding of this vast branch of human knowledge.
Computer Science is as much about computers as microbiology is about microscopes or astronomy is about telescopes. Agreed, both microscopes and telescopes are essential instruments in their fields, yet they are not the objects of study themselves; that onus falls on the field of optics. Similarly, computers are vital instruments in computer science, but they are not the objects of study themselves. The systematic study of the devices that perform the fundamental function of a computer, computation, is itself computer engineering.
Computer Science is the formal study of process and information. It is the study of the fundamental limits within which discretely and unambiguously specified processes (called algorithms) work. It is the study of how information is transformed from one form to another, making it more useful in one way or the other. It is the study of how those above-mentioned processes interact with information and vice versa. Computer Science, in essence, is the study of computation and information.
If we can agree that computer science is the study of computation and information, then why should we go on calling it computer science at all? I’m sure astronomers don’t call astronomy telescope-science, or microbiologists call their field microscope-science. Why not call computer science what it actually is? - Computational and Informational Science.
Signing Off,
Vishnu Vyas.

