Getting started with Apache Cassandra, Quickly

“The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra’s support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.”

The marketing is great, so why wouldn’t you want to try it? :-)

This post is a fast-track guide to setting up Apache Cassandra with Docker. It simulates a cluster of 3 datacenters (DCs) with 2 racks each, one node per rack, and then walks through some introductory commands.

Create/Recreate an Apache Cassandra Docker Environment

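SEEDS=""

# Cleanup: stop and remove containers left over from a previous run.
# On a first run Docker reports errors for the missing containers; these are harmless.
for dc in 1 2 3; do
    for rack in 1 2; do
        docker stop "cassandra-dc${dc}-rack${rack}"
        docker rm "cassandra-dc${dc}-rack${rack}"
        # Every node is added to the seed list
        SEEDS="${SEEDS},cassandra-dc${dc}-rack${rack}"
    done
done
docker network remove cassandra-network

# Remove leading comma
SEEDS=$(echo "$SEEDS" | cut -c2-)

# Recreate: one container per DC/rack combination, all on the same bridge network.
# cassandra:latest tracks the newest release; the output in this post came from Cassandra 3.11.6.
docker network create -d bridge cassandra-network
for dc in 1 2 3; do
    for rack in 1 2; do
        docker run --name "cassandra-dc${dc}-rack${rack}" \
            --network cassandra-network -d \
            -e CASSANDRA_SEEDS="$SEEDS" \
            -e CASSANDRA_DC="DC${dc}" \
            -e CASSANDRA_RACK="RACK${rack}" \
            -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch \
            cassandra:latest
    done
done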
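
The containers typically need some time to join the ring after docker run returns, so commands issued straight away may fail. A minimal sketch of a wait helper follows. The function names count_up_nodes and wait_for_cluster, the 5-second interval, and the attempt limit are illustrative choices, not part of Cassandra or Docker.

```shell
# Count the nodes reported as Up/Normal in `nodetool status` output on stdin.
# grep -c exits non-zero when nothing matches, hence the `|| true`.
count_up_nodes() {
    grep -c '^UN ' || true
}

# Poll one container until the expected number of nodes is UN.
# $1 = container name, $2 = expected node count, $3 = max attempts (default 60).
wait_for_cluster() {
    local container=$1 expected=$2 attempts=${3:-60}
    local i=0 up=0
    while [ "$i" -lt "$attempts" ]; do
        up=$(docker exec "$container" nodetool status 2>/dev/null | count_up_nodes)
        if [ "$up" -eq "$expected" ]; then
            echo "all $expected nodes are UN"
            return 0
        fi
        i=$((i + 1))
        sleep 5
    done
    echo "timed out: $up of $expected nodes are UN" >&2
    return 1
}

# Example (not run here): wait_for_cluster cassandra-dc3-rack2 6
```

The same helper can be rerun later in the post, after a node has been stopped and started again.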

Show the nodes

# docker ps -a | grep cassandra
3bfeeba7a292 cassandra:latest "docker-entrypoint.s…" 29 seconds ago Up 25 seconds 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra-dc3-rack2
6d71ef35b8b9 cassandra:latest "docker-entrypoint.s…" 32 seconds ago Up 28 seconds 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra-dc3-rack1
3dd7ab91b59a cassandra:latest "docker-entrypoint.s…" 34 seconds ago Up 31 seconds 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra-dc2-rack2
dc11e49c374e cassandra:latest "docker-entrypoint.s…" 36 seconds ago Up 33 seconds 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra-dc2-rack1
7e74b5b83142 cassandra:latest "docker-entrypoint.s…" 38 seconds ago Up 36 seconds 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra-dc1-rack2
ad55065d4f6e cassandra:latest "docker-entrypoint.s…" 40 seconds ago Up 38 seconds 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp cassandra-dc1-rack1

Open a shell on one of the nodes

# docker exec -it cassandra-dc3-rack2 bash

Get cluster status

root@3bfeeba7a292:/# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.2 70.01 KiB 256 32.3% 06a3eabc-9e05-4c75-a8fb-bc01f6cec80c RACK1
UN 172.28.0.3 70 KiB 256 32.1% 47bb39e8-7c5c-40f2-9688-28c4ab79c0c9 RACK2
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.4 70.05 KiB 256 34.3% 726d1f71-6c83-4242-8187-5d041cc8b1dd RACK1
UN 172.28.0.5 70.03 KiB 256 31.8% 8abe20cc-aadc-446c-bac8-7e82c203bbd5 RACK2
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.6 70.04 KiB 256 36.3% 853a7c0a-d1b6-4e98-9788-a5eade445a31 RACK1
UN 172.28.0.7 70.07 KiB 256 33.2% 1656459f-e3a7-4553-a5ef-d3be721cdb18 RACK2

Create keyspace with even replication distribution

root@3bfeeba7a292:/# cqlsh
cqlsh> CREATE KEYSPACE firstkeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
cqlsh> use firstkeyspace ;
cqlsh:firstkeyspace> CREATE TABLE emp(emp_id int, emp_name text, emp_age int, PRIMARY KEY(emp_id));
cqlsh:firstkeyspace> CONSISTENCY
Current consistency level is ONE.
cqlsh:firstkeyspace> CONSISTENCY ALL
Consistency level set to ALL.
cqlsh:firstkeyspace> INSERT INTO emp (emp_id, emp_name, emp_age) VALUES(10, 'john', 28);

Show distribution

root@3bfeeba7a292:/# nodetool status firstkeyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.2 92.01 KiB 256 48.5% 06a3eabc-9e05-4c75-a8fb-bc01f6cec80c RACK1
UN 172.28.0.3 92 KiB 256 48.9% 47bb39e8-7c5c-40f2-9688-28c4ab79c0c9 RACK2
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.4 92.05 KiB 256 51.1% 726d1f71-6c83-4242-8187-5d041cc8b1dd RACK1
UN 172.28.0.5 92.03 KiB 256 47.9% 8abe20cc-aadc-446c-bac8-7e82c203bbd5 RACK2
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.6 92.04 KiB 256 51.8% 853a7c0a-d1b6-4e98-9788-a5eade445a31 RACK1
UN 172.28.0.7 92.07 KiB 256 51.7% 1656459f-e3a7-4553-a5ef-d3be721cdb18 RACK2

Create keyspace with unbalanced replication distribution

cqlsh> CREATE KEYSPACE secondkeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 4};
cqlsh> use secondkeyspace ;
cqlsh:secondkeyspace> CREATE TABLE emp(emp_id int, emp_name text, emp_age int, PRIMARY KEY(emp_id));
cqlsh:secondkeyspace> CONSISTENCY ALL
Consistency level set to ALL.
cqlsh:secondkeyspace> INSERT INTO emp (emp_id, emp_name, emp_age) VALUES(10, 'john', 28);
cqlsh:secondkeyspace>

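Note the difference: with 4 replicas spread over 6 nodes, each node now owns roughly 67% of the data instead of roughly 50%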

root@3bfeeba7a292:/# nodetool status secondkeyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.2 98.95 KiB 256 65.2% 06a3eabc-9e05-4c75-a8fb-bc01f6cec80c RACK1
UN 172.28.0.3 98.94 KiB 256 64.3% 47bb39e8-7c5c-40f2-9688-28c4ab79c0c9 RACK2
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.4 82.03 KiB 256 66.3% 726d1f71-6c83-4242-8187-5d041cc8b1dd RACK1
UN 172.28.0.5 82.02 KiB 256 67.6% 8abe20cc-aadc-446c-bac8-7e82c203bbd5 RACK2
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.6 82.04 KiB 256 66.3% 853a7c0a-d1b6-4e98-9788-a5eade445a31 RACK1
UN 172.28.0.7 98.99 KiB 256 70.2% 1656459f-e3a7-4553-a5ef-d3be721cdb18 RACK2
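
The "Owns (effective)" column can be sanity-checked with simple arithmetic: if replicas are spread evenly, each node should own about replicas / nodes of the data. A small sketch, assuming an even spread (the function name expected_ownership is made up for illustration):

```shell
# Expected effective ownership per node, in percent, when the replicas of a
# keyspace are spread evenly over the nodes: replicas / nodes * 100.
expected_ownership() {
    awk -v rf="$1" -v nodes="$2" 'BEGIN { printf "%.1f\n", rf / nodes * 100 }'
}

expected_ownership 3 6   # firstkeyspace, RF 3 on six nodes: prints 50.0
expected_ownership 4 6   # secondkeyspace, RF 4 on six nodes: prints 66.7
```

The measured values above scatter around these figures because token ranges are assigned randomly, and the six percentages add up to about 300% and 400% respectively.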

Create keyspace using all nodes and racks

cqlsh> CREATE KEYSPACE thirdkeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2, 'DC3': 2};
cqlsh> use thirdkeyspace;
cqlsh:thirdkeyspace> CONSISTENCY ALL
Consistency level set to ALL.
cqlsh:thirdkeyspace> CREATE TABLE emp(emp_id int, emp_name text, emp_age int, PRIMARY KEY(emp_id));
cqlsh:thirdkeyspace> INSERT INTO emp (emp_id, emp_name, emp_age) VALUES(10, 'john', 28);

All nodes now store the data: with 2 replicas in each 2-node datacenter, every node owns 100%

root@3bfeeba7a292:/# nodetool status thirdkeyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.2 70.79 KiB 256 100.0% 06a3eabc-9e05-4c75-a8fb-bc01f6cec80c RACK1
UN 172.28.0.3 70.79 KiB 256 100.0% 47bb39e8-7c5c-40f2-9688-28c4ab79c0c9 RACK2
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.4 104.05 KiB 256 100.0% 726d1f71-6c83-4242-8187-5d041cc8b1dd RACK1
UN 172.28.0.5 104.04 KiB 256 100.0% 8abe20cc-aadc-446c-bac8-7e82c203bbd5 RACK2
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.6 104.06 KiB 256 100.0% 853a7c0a-d1b6-4e98-9788-a5eade445a31 RACK1
UN 172.28.0.7 70.8 KiB 256 100.0% 1656459f-e3a7-4553-a5ef-d3be721cdb18 RACK2
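
The replication map for NetworkTopologyStrategy lists one entry per datacenter, and the names have to match the datacenter names the nodes report (DC1, DC2, DC3 here, from CASSANDRA_DC). As a sketch of how that statement is put together, the hypothetical helper nts_keyspace_cql below prints it for any list of datacenters. It assumes the same replica count in every datacenter.

```shell
# Print a CREATE KEYSPACE statement that places the same number of replicas
# in every listed datacenter.
# $1 = keyspace name, $2 = replicas per datacenter, remaining args = DC names.
nts_keyspace_cql() {
    local keyspace=$1 replicas=$2 dc
    local map="'class': 'NetworkTopologyStrategy'"
    shift 2
    for dc in "$@"; do
        map="$map, '$dc': $replicas"
    done
    echo "CREATE KEYSPACE $keyspace WITH replication = {$map};"
}

# Reproduces the statement used for thirdkeyspace above.
nts_keyspace_cql thirdkeyspace 2 DC1 DC2 DC3
```

The printed statement could be passed to cqlsh with its -e option, for example from the Docker host via docker exec.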

Stop a node

# docker stop cassandra-dc1-rack1
cassandra-dc1-rack1

Note the DN (Down/Normal) status of the stopped node

root@r730:/local/docker/cassandra# docker exec -it cassandra-dc3-rack2 bash
root@3bfeeba7a292:/# nodetool status thirdkeyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 172.28.0.2 70.79 KiB 256 100.0% 06a3eabc-9e05-4c75-a8fb-bc01f6cec80c RACK1
UN 172.28.0.3 70.79 KiB 256 100.0% 47bb39e8-7c5c-40f2-9688-28c4ab79c0c9 RACK2
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.4 70.79 KiB 256 100.0% 726d1f71-6c83-4242-8187-5d041cc8b1dd RACK1
UN 172.28.0.5 70.81 KiB 256 100.0% 8abe20cc-aadc-446c-bac8-7e82c203bbd5 RACK2
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.28.0.6 70.83 KiB 256 100.0% 853a7c0a-d1b6-4e98-9788-a5eade445a31 RACK1
UN 172.28.0.7 70.8 KiB 256 100.0% 1656459f-e3a7-4553-a5ef-d3be721cdb18 RACK2

We can still query data, because the default consistency level is ONE and a single live replica is enough

root@3bfeeba7a292:/# cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.6 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> USE thirdkeyspace ;
cqlsh:thirdkeyspace> select * from emp;

emp_id | emp_age | emp_name
--------+---------+----------
10 | 28 | john

But with consistency ALL, reads and writes fail, because every replica must respond and one of them is down

cqlsh:thirdkeyspace> CONSISTENCY ALL
Consistency level set to ALL.
cqlsh:thirdkeyspace> select * from emp;
NoHostAvailable:
cqlsh:thirdkeyspace> INSERT INTO emp (emp_id, emp_name, emp_age) VALUES(10, 'john2', 28);
NoHostAvailable:
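
Why ONE succeeds and ALL fails comes down to how many replicas must answer. The sketch below works that out for a few common consistency levels. The function name required_replicas is illustrative, and the quorum rule used is the usual floor(replicas / 2) + 1. QUORUM and LOCAL_QUORUM are not exercised in this post and are included only for comparison.

```shell
# Number of replicas that must respond at a given consistency level.
# $1 = level, $2 = total replicas across all DCs, $3 = replicas in the local DC.
required_replicas() {
    local level=$1 total=$2 local_rf=$3
    case "$level" in
        ONE)          echo 1 ;;
        QUORUM)       echo $(( total / 2 + 1 )) ;;
        LOCAL_QUORUM) echo $(( local_rf / 2 + 1 )) ;;
        ALL)          echo "$total" ;;
        *)            echo "unknown level: $level" >&2; return 1 ;;
    esac
}

# thirdkeyspace has 2 replicas in each of 3 DCs: 6 in total, 2 locally.
required_replicas ONE 6 2      # prints 1
required_replicas QUORUM 6 2   # prints 4
required_replicas ALL 6 2      # prints 6
```

With one node stopped, 5 of the 6 replicas are reachable, so ONE is satisfied while ALL is not. By the same arithmetic, QUORUM would be expected to succeed as well.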

Start the container again and wait for the node to rejoin the cluster

# docker start cassandra-dc1-rack1
cassandra-dc1-rack1

# docker exec -it cassandra-dc3-rack2 bash
root@3bfeeba7a292:/# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 172.28.0.2 70.79 KiB 256 ? 06a3eabc-9e05-4c75-a8fb-bc01f6cec80c RACK1
UN 172.28.0.3 70.79 KiB 256 ? 47bb39e8-7c5c-40f2-9688-28c4ab79c0c9 RACK2
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 172.28.0.4 70.79 KiB 256 ? 726d1f71-6c83-4242-8187-5d041cc8b1dd RACK1
UN 172.28.0.5 70.81 KiB 256 ? 8abe20cc-aadc-446c-bac8-7e82c203bbd5 RACK2
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 172.28.0.6 70.83 KiB 256 ? 853a7c0a-d1b6-4e98-9788-a5eade445a31 RACK1
UN 172.28.0.7 70.8 KiB 256 ? 1656459f-e3a7-4553-a5ef-d3be721cdb18 RACK2

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

After a short wait, the node reports UN again

root@3bfeeba7a292:/# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 172.28.0.2 208.56 KiB 256 ? 06a3eabc-9e05-4c75-a8fb-bc01f6cec80c RACK1
UN 172.28.0.3 70.79 KiB 256 ? 47bb39e8-7c5c-40f2-9688-28c4ab79c0c9 RACK2
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 172.28.0.4 70.79 KiB 256 ? 726d1f71-6c83-4242-8187-5d041cc8b1dd RACK1
UN 172.28.0.5 70.81 KiB 256 ? 8abe20cc-aadc-446c-bac8-7e82c203bbd5 RACK2
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 172.28.0.6 70.83 KiB 256 ? 853a7c0a-d1b6-4e98-9788-a5eade445a31 RACK1
UN 172.28.0.7 70.8 KiB 256 ? 1656459f-e3a7-4553-a5ef-d3be721cdb18 RACK2

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

Retry now that all replicas are available

root@3bfeeba7a292:/# cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.6 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> USE thirdkeyspace ;
cqlsh:thirdkeyspace> CONSISTENCY ALL;
Consistency level set to ALL.
cqlsh:thirdkeyspace> select * from emp;

emp_id | emp_age | emp_name
--------+---------+----------
10 | 28 | john

(1 rows)
cqlsh:thirdkeyspace> INSERT INTO emp (emp_id, emp_name, emp_age) VALUES(10, 'john2', 28);
cqlsh:thirdkeyspace> select * from emp;

emp_id | emp_age | emp_name
--------+---------+----------
10 | 28 | john2

(1 rows)