Sunil Jagadish


with one comment



Written by Sunil

2011.11.24 at 03:33 AM

Posted in Uncategorized

CAPTCHA Generator for .NET

with 5 comments

I needed a CAPTCHA generator for some stuff I’ve been doing. I looked around and did find some assemblies for .NET, but I thought of coding my own CAPTCHA generator and decided to do this using C#. The result: you can download/contribute here. You can also download a sample Windows Forms application that uses it from the “Releases” section. It can be used in ASP.NET applications as well. The assembly needs permission to write to disk (to store the generated CAPTCHA image).


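using System.Drawing;   // for Color
using CAPTCHA;

// Assumption: CaptchaGenerator has a parameterless constructor.
CaptchaGenerator cgen = new CaptchaGenerator();
string CaptchaText;

// Single custom coloured CAPTCHA
cgen.CaptchaColor = Color.FromArgb(130, 120, 130);
CaptchaText = cgen.GenerateCaptcha(@"G:\test1.gif");
pictureBox1.ImageLocation = @"G:\test1.gif";

// Default random coloured CAPTCHA
CaptchaText = cgen.GenerateCaptcha(@"G:\test1.gif");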
pictureBox1.ImageLocation =

Written by Sunil

2007.07.25 at 07:26 PM

Posted in .NET


with 3 comments

What is an anagram?

Problem: Given a file containing words, rearrange it in such a way that all anagrams appear consecutively.


The result should be:

Here is a simple Perl + unix command line which does the job:

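#!/usr/local/bin/perl -w

use strict;

# Read the word list named on the command line
open(INP, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!";

while(<INP>) {
   chomp;   # drop the newline so it does not end up in the signature
   # Signature: the characters of the word in sorted order
   my $sig = join '', sort split('', $_);
   print "$sig\t$_\n";
}

close(INP);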

Command line: ./ ip.txt | sort | awk -F'\t' '{print $2;}'

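Technique: Every word is associated with a signature. Here, a signature is nothing but the word with all its characters in sorted order. Anagrams share the same signature, so sorting on it places them next to each other.

Here is how the sorted output looks (signature, then the original word):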

abcd    abcd
abcd    cbad
abcd    dcba
pqrs    pqrs
pqrs    sqrp
xyz     xyz
xyz     zyx

Then, awk -F'\t' '{print $2;}' simply prints the original words (column 2).
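The same signature idea can be sketched in a single language. This is a minimal Python sketch, not the script from this post; the function name group_anagrams and the sample word list are made up for illustration.

```python
from collections import defaultdict


def group_anagrams(words):
    """Return the words reordered so that anagrams sit next to each other."""
    groups = defaultdict(list)
    for word in words:
        # Signature: the characters of the word in sorted order.
        signature = "".join(sorted(word))
        groups[signature].append(word)

    ordered = []
    # Walking the signatures in sorted order mirrors the `sort` step.
    for signature in sorted(groups):
        ordered.extend(sorted(groups[signature]))
    return ordered


# Hypothetical input, chosen to match the table shown above.
sample = ["dcba", "zyx", "pqrs", "abcd", "xyz", "sqrp", "cbad"]
for word in group_anagrams(sample):
    print(word)
```

Grouping in a dictionary avoids the separate awk step, because the signature is only ever used as a key and never printed.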

Written by Sunil

2007.07.18 at 12:47 PM

Posted in Algorithms



I’ve been downloading software from my MSDN Subscription account. MSDN provides a custom download application which (un)fortunately cannot schedule downloads to begin at a particular time. Also, I’m not allowed to use DAP or any other download accelerator to download from MSDN. Sharath had pointed me to PTFB some time back. PTFB is trial software, so here is a free version of a similar (but much simpler, with fewer features and less jing-bang) application which does the same job. I’ve called it Click-O-Matic (for lack of a better name :-))

A “just-works” version is ready. In case you are interested in using this, leave a comment here with your e-mail ID and I can mail the setup file to you.

For the geekier ones here, I wrote this in C# using simple PInvoke to simulate the mouse clicks.

Written by Sunil

2007.07.18 at 11:54 AM

Posted in .NET

Smart Device Apps not smart yet on Vista

with 2 comments

After hunting around for a way to get internet connectivity on my WM 5.0 PPC Emulator, I came across an informative thread on MSDN Forums. The two suggestions given there didn’t work for me. Unfortunately the PPC Emulator cannot be cradled in Vista’s Mobile Device Center yet, which is sad. When is a fix coming for this? I hope I won’t have to wait for Orcas. [Ref: VSD team blog]

Written by Sunil

2007.04.28 at 06:36 PM

Posted in Uncategorized

Indexing data using Plucene

with one comment

If you are impressed with what Doug Cutting’s Lucene can do to your data to enable super-fast searching, then you will be happy to know about Plucene (a Perl port of Lucene). Of late I have been playing around with Plucene, indexing GBs of data, which I will have to query in a tight loop later.

So, why Lucene? Lucene stores data in the form of an inverted index which makes retrieval significantly faster compared to a normal indexing scheme.

Here is a very simple example scenario with 3 documents containing different words. This is how a normal indexing scheme and an inverted index would index this data:

Normal Index
Doc1 – Bill Gates, Linus Torvalds, Richard Stallman
Doc2 – Steve Jobs, Scott McNealy, Linus Torvalds, Bill Gates
Doc3 – Bill Gates, Steve Jobs, Larry Ellison, Scott McNealy

Inverted Index
Bill Gates – Doc1, Doc2, Doc3
Linus Torvalds – Doc1, Doc2
Richard Stallman – Doc1
Steve Jobs – Doc2, Doc3
Scott McNealy – Doc2, Doc3
Larry Ellison – Doc3

It is quite clear that a search for “Bill” on the inverted index will right away return the documents in which the term appears, whereas with a normal index we’ll have to go through all the documents and check whether the search term exists in each.
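To make the difference concrete, here is a minimal Python sketch of the two lookups using the three example documents. It only illustrates the idea and says nothing about how Lucene or Plucene store their index on disk; the names build_inverted_index and scan_documents are made up for this example.

```python
from collections import defaultdict

# The three example documents, keyed by document name.
documents = {
    "Doc1": ["Bill Gates", "Linus Torvalds", "Richard Stallman"],
    "Doc2": ["Steve Jobs", "Scott McNealy", "Linus Torvalds", "Bill Gates"],
    "Doc3": ["Bill Gates", "Steve Jobs", "Larry Ellison", "Scott McNealy"],
}


def build_inverted_index(docs):
    """Map each term to the sorted list of documents that contain it."""
    index = defaultdict(set)
    for doc_name, terms in docs.items():
        for term in terms:
            index[term].add(doc_name)
    return {term: sorted(names) for term, names in index.items()}


def scan_documents(docs, term):
    """Normal-index lookup: check every document for the term."""
    return sorted(name for name, terms in docs.items() if term in terms)


inverted = build_inverted_index(documents)

# Inverted index: one dictionary lookup returns the matching documents.
print(inverted["Bill Gates"])

# Normal index: every document has to be scanned for the same answer.
print(scan_documents(documents, "Bill Gates"))
```

Both lookups return the same documents. The inverted index pays the scanning cost once, when it is built, instead of on every query.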

use Plucene::Document;
use Plucene::Document::Field;
use Plucene::Index::Writer;
use Plucene::Analysis::SimpleAnalyzer;

# Use the simple analyzer to tokenize the input in the default way
my $analyzer = Plucene::Analysis::SimpleAnalyzer->new();

# Create an object of Index::Writer which will write into the index
my $writer = Plucene::Index::Writer->new("/usr/local/my_index", $analyzer, 1);

my $DocToIndex;   # path of the file to index; assignment not shown here
open(DOC, $DocToIndex) or die "Cannot open file to index: $!";

# Read from the file, one line at a time
while(my $line = <DOC>) {
    # Create a new Document object, which will contain the fields & corresponding values
    my $doc = Plucene::Document->new;
    # Create a keyword field and store a unique ID in it. Generation of ID not shown here.
    $doc->add(Plucene::Document::Field->Keyword("id" => $id));
    # Create a text field and store each line of the file in it
    $doc->add(Plucene::Document::Field->Text("text" => $line));

    # Add the document to the present index
    $writer->add_document($doc);
}
close(DOC);

# Merge multiple segment files created while indexing
$writer->optimize;
undef $writer;

Plucene would have now created many files in /usr/local/my_index, which together form the index. The purpose of the contents of each file is described in

Next, let’s see how we can query this index.

use Plucene::QueryParser;
use Plucene::Analysis::SimpleAnalyzer;
use Plucene::Search::IndexSearcher;
use Plucene::Search::HitCollector;

my $parser = Plucene::QueryParser->new({
    analyzer => Plucene::Analysis::SimpleAnalyzer->new(),
    default  => "text"
});

# Prepare the query – search for Bill in the text field
my $query = $parser->parse('text:"Bill"');

# Which index to search?
my $searcher = Plucene::Search::IndexSearcher->new("/usr/local/my_index");

my @docs;
# A callback which is called every time a search hit is found.
my $hc = Plucene::Search::HitCollector->new(collect => sub {
    my ($self, $doc, $score) = @_;
    push @docs, $searcher->doc($doc);
});

# Search!
$searcher->search_hc($query, $hc);

# @docs contains Document objects, so, extract only the IDs from it by mapping it to a @results array
my @results = map {
    $_->get("id")->string
} @docs;

# Print the ID of the documents which contained the search term
foreach my $id (@results) {
    print "\nRes: ", $id;
}

This is a very simple use of Plucene. However, it is highly scalable, and many more complex applications can be built on top of it.

Written by Sunil

2007.04.25 at 03:05 PM

Posted in Uncategorized

Let’s Build A Compiler For The CLR


Wow, only now did I realize that Raj has mentioned my name in his book, titled “Let’s Build A Compiler For The CLR”. Thanks Raj! His book is a wonderful piece of writing. Raj has struck a very nice balance between geekiness and simplicity. Do read it if stuff like compilers fascinates you.

Written by Sunil

2007.03.09 at 08:34 PM

Posted in .NET, Programming