From understanding to setup — Installing Neo4j on an Azure virtual machine (Linux/Ubuntu)

Siddhartha Sehgal
Neo4j Developer Blog
6 min read · Dec 30, 2019


What is Neo4j?

“The world’s most flexible, reliable and developer friendly graph database as a service.” It is an online database management system with Create, Read, Update and Delete (CRUD) operations that stores data as a graph.

What is a graph database?

A graph database, also called a graph-oriented database, is a type of NoSQL database that uses graph theory to store, map and query relationships. A graph database is essentially a collection of nodes and edges.

A graph is composed of two elements: a node (or vertex) and a relationship (or edge). Each node represents an entity (a person, place, thing, category or other piece of data), and each relationship represents how two nodes are associated. This general-purpose structure allows you to model all kinds of scenarios — from a system of roads, to a network of devices, to a population’s medical history or anything else defined by relationships.

Why would you ever require a graph database anyway?

A graph database, unlike a relational database management system (RDBMS), treats relationships as first-class citizens. You do not need complex join queries or foreign-key lookups to retrieve related data. We join entities as soon as we know they’re connected, so these mapping methods are unnecessary. Since graph databases employ object-oriented thinking at their core, the data model you draw on your whiteboard is the model of data you can store in your database.

Modern data has, implicitly, lots of relationships. In order to leverage these data connections, organizations need a database technology that stores relationship information as a first-class entity. That technology is a graph database. Unfortunately, legacy RDBMS are poor at handling data relationships. Also, their rigid schemas make it difficult to add different connections or adapt to new business requirements.

Not only do graph databases effectively store data relationships; they’re also flexible when expanding a data model or conforming to changing business needs. You can read more about the advantages of using graph databases.

Neo4j 4.0 was recently released, so do bear in mind which version you are installing. The Neo4j 3.5.x series requires Java 8, whereas Neo4j 4.0 uses Java 11. Also, 4.0 changes the way you connect to the database, introducing the new neo4j:// connection scheme.

Let us get started with Neo4j on a Microsoft Azure virtual machine.

Create a virtual machine on which you can host a Neo4j community version server

The process to create a virtual machine on Azure is quite straightforward. I am using a Linux (Ubuntu 18.04) virtual machine for this post.

Here is a little refresher — go to your Azure portal. Create a resource from the home page itself:

Azure portal homepage

Search ‘virtual machine’ and look under the ‘Compute’ section:

Search for virtual machine in Azure Marketplace

Enter your preferences for name, disk size, region, resource group, authentication, and finally review and create it:

Review settings and create the virtual machine
Virtual machine is created

Once the virtual machine is created (it will start automatically), connect to it with an SSH client of your choice, such as PuTTY or MobaXterm. I will be using MobaXterm for this post.

Connect to the virtual machine and start installing

Use the public or private IP to connect to the virtual machine using the client of your choice. Remember, a dynamically assigned public IP is at risk of changing when the virtual machine is stopped and started again. Alternatively, you can set up a domain name system (DNS) name for the virtual machine and connect using that.

Once the connection is established to the virtual machine, let us install Java, which is necessary for Neo4j. Install Java using the command below:

sudo apt-get install openjdk-8-jre

Bear in mind that if you’re installing Neo4j 4.0, you will need Java 11 instead.
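To make the version pairing explicit, here is a small sketch of a helper that picks the Java package for a given Neo4j version. The function name java_package_for_neo4j is my own, and it only covers the 3.5.x and 4.0.x lines discussed in this post; it prints the install command rather than running it.

```shell
# Map a Neo4j version string to the OpenJDK runtime package it needs.
# Assumption: only the 3.5.x and 4.0.x release lines from this post are covered.
java_package_for_neo4j() {
  case "$1" in
    3.5|3.5.*) echo "openjdk-8-jre" ;;
    4.0|4.0.*) echo "openjdk-11-jre" ;;
    *) echo "unsupported Neo4j version: $1" >&2; return 1 ;;
  esac
}

# Example: print (not run) the install command for Neo4j 4.0.
echo "sudo apt-get install $(java_package_for_neo4j 4.0)"
```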

Installing the latest Neo4j community version

Add the Neo4j repository to the apt sources list:

wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
echo 'deb https://debian.neo4j.org/repo stable/' | sudo tee /etc/apt/sources.list.d/neo4j.list

Update the apt-get repositories list:

sudo apt-get update

Finally, install Neo4j:

sudo apt-get install neo4j

This should set up Neo4j on your Linux machine and make it available to use. You can start (or restart) the Neo4j service using the following command:

sudo service neo4j restart

Wait a few seconds and check to see if the server started. Note that the default port Neo4j is configured to run on is 7474. Since Neo4j is currently only accessible from the Linux machine itself, i.e., locally, use the following command to check if it is working:

sudo curl localhost:7474

The sudo command will only work if your user has root-level (sudo) privileges on the Linux machine. curl itself does not need sudo for this check.

If the service is active, the result should be something like this:

{
  "data" : "http://localhost:7474/db/data/",
  "management" : "http://localhost:7474/db/manage/",
  "bolt" : "bolt://localhost:7687"
}
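Because the service takes a few seconds to come up, the check can be wrapped in a retry loop. This is only a sketch: wait_for_http is a name I made up, and the 30-second default is an arbitrary choice.

```shell
# Poll a URL once per second until it answers or the timeout (seconds) runs out.
# Returns 0 as soon as the URL responds successfully, 1 otherwise.
wait_for_http() {
  url="$1"
  timeout="${2:-30}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # -s hides progress output, -f treats HTTP error codes as failures.
    if curl -sf --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

On the virtual machine you could then run: wait_for_http http://localhost:7474 30 && echo "Neo4j is up"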

Be aware that Neo4j 4.0 adds the neo4j:// connection scheme, which takes over from bolt+routing://; bolt:// remains available for a direct connection to a single server.

Setting up Neo4j to be accessible over the internet

Now we will make the Neo4j service available over the internet. We accomplish this by changing a few settings in the Neo4j config file.

Access the Neo4j config file from the following path on the Linux machine where we installed Neo4j:

/etc/neo4j/neo4j.conf

I am going to open the conf file with vim from the MobaXterm terminal:

sudo vim /etc/neo4j/neo4j.conf

Add/edit the following lines to the config file

Once the neo4j.conf file is open in the Vim editor, press the ‘i’ key to enter ‘INSERT’ mode and make the relevant changes. Once complete, press the ‘Esc’ key, then type ‘:wq’ and press Enter to save the changes and quit the editor:

dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=0.0.0.0:7687
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=0.0.0.0:7474

The settings above enable the Bolt and HTTP (not secure) connectors on the addresses and ports given. Note that 0.0.0.0 makes Neo4j listen on every network interface, so any IP address that can reach the machine can hit ports 7474 and 7687. This is not secure; I suggest you restrict access to the IP address of the machine you intend to access the Neo4j server from, for example through the virtual machine’s inbound port rules.
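If you would rather script these changes than edit the file by hand, something along the following lines could work. It is a sketch, not the official way to configure Neo4j: set_conf is a helper I made up, and the demonstration runs against a scratch file so nothing under /etc is touched.

```shell
# Set key=value in a properties-style file: update the line if the key is
# already present (commented out with '#' or not), otherwise append it.
set_conf() {
  key="$1"; value="$2"; file="$3"
  # Escape dots so the key is matched literally in the regular expression.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -Eq "^#?${pattern}=" "$file"; then
    sed -E "s|^#?${pattern}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
      mv "$file.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Demonstration on a scratch file that mimics one commented-out default.
conf=$(mktemp)
printf '#dbms.connector.bolt.listen_address=:7687\n' > "$conf"
set_conf dbms.connector.bolt.enabled true "$conf"
set_conf dbms.connector.bolt.listen_address 0.0.0.0:7687 "$conf"
set_conf dbms.connector.http.enabled true "$conf"
set_conf dbms.connector.http.listen_address 0.0.0.0:7474 "$conf"
cat "$conf"
```

To apply it for real, point the last argument at /etc/neo4j/neo4j.conf and run the script with root privileges, ideally after taking a backup copy of the file.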

To access port 7474 from outside the Linux machine, you will have to add it to the inbound port rules for the virtual machine. Neo4j Browser talks to the database over Bolt, so port 7687 needs an inbound rule as well. See how to do that here.
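The inbound rule can also be added from the command line with the Azure CLI. The sketch below assumes az is installed and logged in; the rule name AllowNeo4j, the priority 1010, and the helper names are my own choices, and you need to supply your own resource group, network security group name, and client IP. By default it only prints the command; set DRY_RUN=0 to actually run it.

```shell
# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Allow inbound TCP traffic to the Neo4j HTTP (7474) and Bolt (7687) ports
# from a single client IP. All three arguments are values you must supply.
allow_neo4j_ports() {
  resource_group="$1"; nsg_name="$2"; client_ip="$3"
  run az network nsg rule create \
    --resource-group "$resource_group" \
    --nsg-name "$nsg_name" \
    --name AllowNeo4j \
    --priority 1010 \
    --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes "$client_ip" \
    --destination-port-ranges 7474 7687
}
```

For example, allow_neo4j_ports my-resource-group my-vm-nsg 203.0.113.5 would print the rule for a single client address. Limiting the source address this way keeps the ports closed to everyone else.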

Restart the Neo4j service after making these changes:

sudo service neo4j restart

We can now access Neo4j browser by using:

http://<IP address/DNS for linux machine>:7474/browser/

When connected to the Linux machine from your local/other machine, the browser looks like this:

Neo4j browser

The default credentials for Neo4j browser are:

Username: neo4j
Password: neo4j

You will be prompted to change the password on the first login. Set a new password and you are set! You can start writing Cypher queries to generate data, and create some graphs.

You can also increase the heap size available to Neo4j, which helps performance with larger datasets. This is done by uncommenting and updating the following parameters in the conf file:

dbms.memory.heap.initial_size=4g
dbms.memory.heap.max_size=4g

That’s all folks.

Edit: Got Neo4j up and running? Read about next steps to get started on modeling your data in a graph database here:
