Kruskal’s Algorithm

I planned to write a serious of posts on working and analysis of some of the basic algorithms. I also planned to give link to a JAVA implementation of the same. I will start off this series by first describing the Kruskal’s Algorithm for finding the cost of the Minimum Spanning Tree in a graph.

Minimum Spanning Tree: A minimum cost spanning tree is a subgraph induced on the original graph such that the induced sub-graph is a tree on all the vertices on the original graph and the sum of weights of the edges on this graph is the minimum among all the possible spanning trees of the graph.

The basic intuition behind the Kruskal’s algorithm is that the least weighted edge of the original graph will appear in the minimum spanning tree. Hence, sorting the edges based on their weight might be a first step in building the spanning tree. Does that mean, the first n-1 least weight edges are sufficient to build the spanning tree (If n is the number of vertices in the graph) ? We also need to ensure that the edges picked for the spanning tree do not form a cycle. Hence, that is the second key observation. The two observations is formalized below as algorithm:

Vector Edges = list of edges in the input graph
Sort the edge list based on their edge weight
Iterate through the sorted list:
    If adding this edge to the MST does not create a cycle:
        Add this edge to the MST
    Else
       Drop this edge and continue to the next edge in the list

The algorithm above can be easily implemented. The only difficult part of this is in line 4. To check if adding an edge will create a cycle. This can be done a subroutine call to a standard search algorithm in the graph, which can detect a cycle.

Let us now analyse the complexity of the above algorithm.

Line 2: O(|E| log( |E| )

Line 4: O( |E| + |V| )

Since Line 4 is repeated O(|E|) times, the overall complexity will be

O( |E|^2 + |E|*|V| + |E| log( |E| ) ) . And in the worst case,

|E| = O( |V|^2 ),therefore this implementation has a worst case complexity of O( |V|^4  ).

We can modify the above algorithm by using a disjoint-set data structure. The working of this data structure can be found here.

Here is the modified algorithm:

Vector Edges = list of edges in the input graph
Sort the edge list based on their edge weight
Iterate through the sorted list of edge E as (u,v):
    Let pu be the parent of vertex u in the disjoint set
    Let pv be the parent of vertex v in the disjoint set
    if pu !=pv:
        Add the edge to the MST
        Make parent of u = parent of v in the disjoint set data structure

Let us now analyse the complexity of this new algorithm.
Line 2: Sorting takes O( |E| log( |E| ) ). This is same as before

The entire loop takes an amortised complexity of O( |E| log( |V| ) ). This is because Line 4 and Line 5, uses the disjoint set find operation.

And this has an amortised complexity of

O( log( |V| ) ). Hence, the overall complexity of the loop is

O( |E| log( |V| ) ). And as before, if we consider the worst case complexity of

|E| = O( |V|^2 ), the algorithm takes O( |V|^2 log( |V| ) ) time.

Here is a link to the JAVA implementation of Kruskal’s Algorithm –

__name__ for Python class and Python function

This is my first post in the series of small random learnings.

The attribute __name__ is by default present in every python function. But suprisingly an object of a class doesnt have this attribute unless specifically specified.


class example():
    def classFunction():
        return "test"

obj = example()
print obj.__name__  #This is an error

On the other hand ,

def functionA():
    return "A"
def functionB(params):
    return params.__name__

print functionB(functionA) # prints functionA

Neo 4J Graph Database

In this post, I will give a brief introduction to Neo 4J database, installation and basic usage with JAVA .

Firstly, Neo4J is a NO-SQL database. It is used to store data in the form of a graph data structure , i.e. a structure having nodes and edges(called relationships) among the various nodes. This form of database finds its use case in many applications such as storing a social network in systems like facebook , twitter ,etc . More information on what Neo 4J can be found here .

Now, coming to the setup of neo 4J graph database. You can download the installation package from the official site here. Save the tar in a local folder and extract the contents. For this post I am going to run a sample code in neo 4j using eclipse(Details about eclipse can be found here).

Open the eclipse IDE and create a new project , say “firstNeo4jApp”. Add a class named graphDb.java in the default package. Now here comes the important step, to add the neo4j libraries to the build path of eclipse.(Improperly configured build path can lead to errors while compiling the code). To add the neo4j libraries , right click on the project name > Build Path . In the window that appears, select libraries tab and click on external JAR. Now, browse to the downloaded neo4j package folder and go to the lib sub folder. Select all the JAR files in that folder and click on OK. Now, refresh the project and the build path will be configured. You will be able to use the neo4j libraries in you code now.

Let us now look at a simple neo4j example.In this example we will see how to create a node ,set relationship between two nodes and how to get the relationship between two nodes.

Creating a Node in neo 4j


public Node createNode() {
	Node n = null;
	Transaction tx = GraphDb.getDb().handle().beginTx();

	try {
		n = this.gdb.createNode();
		n.setProperty("property1", "property1 key");
		tx.success();
	} finally {
		tx.finish();
	}
	return n;
}

Every write operation to the graph in neo4j must be written within a transaction. For this purpose , we will use the interface org.neo4j.graphdb.Transaction. To start a transaction, we will use the beginTx() function of the org.neo4j.graphdb.GraphDatabaseService interface. this interface also provides the function createNode() which creates a node in the graph. We can set numerous properties to the node based on a simple key-value schema. Also, it is important to finish every transaction. The transaction will not release the locks or memory it has acquired until it has been finished.

Set relationship between two nodes

public static void set(Node A, Node B) {
	Transaction tx = GraphDb.getDb().handle().beginTx();
	Relationship relship = null;

	try {
	        RelationshipIndex rolesIdx = GraphDb.getDb().relationshipIndex();
                
                //Creates a relationship between node A and node B  
	        relship = A.createRelationshipTo(B, RelationshipEnum.RELATED);
                
                //Adds special properties to relation; a key-value pair
		relship.setProperty("key 1", "value 1");
                
                //Adds the relationship object to the graph 
		rolesIdx.add(relship, "name", "related");
		rolesIdx.add(relship, "id", relship.getId());
		tx.success();
	} finally {
		tx.finish();
	}
}

public enum RelationshipEnum implements RelationshipType {
		RELATED
}

Again, since a write operation is performed on the graph, we have to enclose it within a transaction. We can also set properties to the edge(or relationship) like in this case, the property is “key 1” and the value is “value 1”. Finally , the Relationship object is added to the graph. We are intentionally storing it as “name”:”related pair and “id”:id pair,so that we can query based on the name property also.

Get relationship between two nodes

public static List get(Node A, Node B) {
	RelationshipIndex rolesIdx = GraphDb.getDb().relationshipIndex();
        
        // Get the edge(or relationship) between Node A and Node B having the key value pair as "name":"related" 
	IndexHits roles = rolesIdx.get("name", "related", A, B);

	List rships = new ArrayList();

	//Convert the IndexHits to a List 
        try {
             
	    for (Relationship role : roles) {
                rships.add(role);
	    }
	} finally {
		roles.close();
        }

	return rships;

}

As for the querying the graph, since no write operation is performed on the graph, hence , there is no need to enclose it within a transaction.

The above pieces of code gave a overall idea of how to create a node , create a relationship between two nodes and get a particular kind of relationship between two nodes. The entire code for all three can be downloaded here .

Django for beginners

Hello,

This is my first blog post and I want to dedicate this to all those web developers out there , who want to move to django for backend developement.

This post contains details of installing django and setting up the basic settings to get started with. For all those who are wondering what is django, please look up here.

Prerequisites :

1) This post contains the details of installing django on a linux machine. For windows and mac users, the installation procedure might slightly vary. However the setting up of django settings, once the installation is done remains the same.

2) The post is written with the assumption that the user is familiar with basic python. For tutorials on python, please look up here.

Installation Procedure:

Now that we are done with the talking lets get our hands wet. To install django, download the tarball of the official release here .

Please make sure you download the official release from the link in topic 1 of the page mentioned above. Once the tar is downloaded, extract it to a suitable folder of you choice. In my example , I am extracting it to /home/abinav.

To extract the tarbal, run the following command on the terminal:

tar -zxvf Django-1.3.1.tar.gz

Once the folder is extracted, change the working directory into Django-1.3.1 from terminal(command to do that is listed below)

cd Django-1.3.1

Inside this folder, you can notice that there is a script called setup.py . To install django, you have to run this script. Type the following command on the terminal(This requires python to be installed on your computer. You can download it from here.)

python setup.py install

Thats it!! Now django would be installed on your computer.

Creating a new Project

Now that django is installed successfully, we shall look at how to create a new project and set up the settings.

To create a new project, cd into the folder where you want the project files to reside and run the following command

django-admin.py startproject projectname

You can replace the projectname with the suitable project name you want. For this post I will use the name ‘projectname’ through out.

This will create a new directory in your working folder, with the name projectname .

This directory will have the following files:

manage.py

__init__.py

settings.py

urls.py

Before I get into how to setup the settings, I will give a brief about what the files are.

__init__.py is there to tell python that evry subdirectory uunder this directory should be part of sys.path

manage.py is like a linker file, which links tells python to look for the settings in the file called settings.py

settings.py : This is the most important file which contains the settings of your project. We shall look at how to put the settings in the next paragraph.

urls.py : This file contains the urls your project will need to look up and the views it needs to connect to when hit on that url.

Setting up the settings.py file

Now that we have a basic idea about the files, let us look at how to set up a basic settings.py file.

The first line of the settings file conatins  the following line

DEBUG=True

This is to tell django that this is in development stages. So django will provide detailed errors of what went wrong, whenevr a 500 or 404 error is encountered. Once the entire developement is done change this to

DEBUG = False

and the conventional 404 and 500 pages will be displayed.For this post we will work with DEBUG = True .

Next line contains

ADMINS = ()

Fill in the name and email adress , so that django will mail the details of a 500 error whenever it is in DEBUG=False mode. For now we will leave it blank.

Next comes the major part of the settings file.

DATABASES = {
‘default’: {
‘ENGINE’: ‘django.db.backends.’, # Add ‘postgresql_psycopg2’, ‘postgresql’, ‘mysql’, ‘sqlite3’ or ‘oracle’.
‘NAME’: ”, # Or path to database file if using sqlite3.
‘USER’: ”, # Not used with sqlite3.
‘PASSWORD’: ”, # Not used with sqlite3.
‘HOST’: ”, # Set to empty string for localhost. Not used with sqlite3.
‘PORT’: ”, # Set to empty string for default. Not used with sqlite3.
}
}

This contains the details of how django should connect with the database.

‘ENGINE’ : specifies which kind of database to use. For this post we will use mysql database.

Hence,  it should be

‘ENGINE’ : ‘django.db.backends.mysql’

The next line contains the name of the database to use. For this post I will use the database name , ‘projectname_db’.So the line should look like :

‘NAME’ : ‘projectname_db’

The next to lines gives the details of the username and password for connecting to the mysql database. For this post we will assume, the username is ‘user’ and pasasword is ‘password’

‘USER’: ‘user’

‘PASSWORD’ : ‘password’

The last line tells a specific posrt to use while connecting to mysql server. For now we will leave it blank and let mysql use the default port.

the line after DATABASES contains information about timezone and characterset. For now we will use the default values.

The two lines after this are fairly important. They tell django where to look for the media folders which the project uses.

If you are testing it on your computer, put the media files(js,css,images) in the folder /var/www and mention the path in MEDIA_ROOT as follows:

MEDIA_ROOT = ‘/var/www/’

and mention the url as

MEDIA_URL = ‘http://localhost/’

Similarly, static root and static url tell django where to look for static pages. For now we will leave it blank and let django assume that all the pages used are dynamic.

Next bunch of lines give various description of the project, which for now we will ignore.

The next important thing is template directory. Templates are the html files which django backend serves. You can read more about them here.

So create a folder called ‘templates’ within your project folder and include this line in the settings.

TEMPLATE_DIRS = (

‘/templates’

)

Once all this is done, you are now ready to go. You can write your views , models and mention the urls.

You can view the entire settings file here .