Stormbase

Connect

If not already done, go to the download section, download the tar file, copy it to Linux 64 bits server, and untar the file this way:

"> tar -xvf stormbase_tar_file"

Stormbase is installed !

Start Stormbase :

01_database> ./start.sh

Connect to Stormbase :

"01_database> ./sql.sh 2219 > desc; ... <- data"

First script

Create a file foo.sql in 01_database.

" -- Let's create a product table and a sales table -- Then we will join them on the product code column -- First the col_type, col_type is mandatory to join the tables CREATE COL_TYPE T_PROD_CODE AS TEXT -- -- Create the product table, each product is affected to a department CREATE TABLE MY_PRODUCTS ( PRODUCT_CODE T_PROD_CODE , DEPARTMENT TEXT ) -- -- Create the sales table, with quantity sold per product CREATE TABLE MY_SALES ( PRODUCT_CODE T_PROD_CODE , QTY_SOLD NUMBER ) -- -- I insert 2 products INSERT INTO MY_PRODUCTS VALUES ( 'HAM SANDWICH' , 'DELI' ) INSERT INTO MY_PRODUCTS VALUES ( 'COKE' , 'DRINKS' ) -- -- I insert one sales line per product INSERT INTO MY_SALES VALUES ( 'HAM SANDWICH' , 6 ) INSERT INTO MY_SALES VALUES ( 'COKE' , 10 ) -- -- I create the view CREATE VIEW MY_VIEW AS SELECT * FROM MY_PRODUCTS , MY_SALES WHERE MY_PRODUCTS.PRODUCT_CODE = MY_SALES.PRODUCT_CODE -- -- I run the magic command REFRESH DIRTY VIEW -- -- I run a select group by department against the view, two lines are returned, as expected SELECT DEPARTMENT , SUM( QTY_SOLD ) FROM MY_VIEW GROUP BY DEPARTMENT department|sum(qty_sold) DELI|6 DRINKS|10 -- -- Now let's insert another sales line. INSERT INTO MY_SALES VALUES ( 'COKE' , 9 ) -- -- I need to refresh the view to make it available for select REFRESH DIRTY VIEW -- -- Same select as before. SELECT DEPARTMENT , SUM( QTY_SOLD ) FROM MY_VIEW GROUP BY DEPARTMENT department|sum(qty_sold) DELI|6 DRINKS|19

And run it this way :

"01_database> ./sql.sh --file:foo.sql

What's included ?

Directory	Role
01_database	Stormbase itself
02_sql	The CLI (command line interface)
05_jdbc	The jdbc driver

Node.js CLI

Local CLI

"01_database> ./sql.sh [ --file:filename ]"

Remote CLI

"02_sql> ./sql.sh user/password@host:port [ --file:filename ]"

Exit from CLI

"> bye;"

Comments in scripts

"-- This line in commented"

What is Stormbase ?

BI database

Stormbase is a in memory database dedicated to business intelligence and response time. In other words, it is fast. Like other databases it uses tables, views, inserts, selects.

If you want to do small inserts and manage transaction is an ACID way, then Stormbase is not what you need.

If you have hundreds of millions of lines of data or even more. And you want to do select, sum, group by etc... And you don't like to wait, then Stormbase is what you need.

What is a col_type ?

Main difference with other database is that stormbase uses functional data types (the col_type), not technical data types.

Let say I have a product code column on 13 characters. In a regular database I will define it as CHAR(13) or VARCHAR(13).

In stormbase there are only 2 primary data types: text, number. On top of this primary types you can define your col_type (the functional data type).

Assuming you an integer value in a column, will you define it as text or number?

The answer is simple. Are you going to sum this column? If yes then it is a number column, otherwise it is a text column.

So if your integer column contains for example a status code, use text. And if it contains quantities, use number.

I can create a table like this :

"CREATE TABLE my_products ( product_code TEXT, …"

Or like this :

"CREATE COL_TYPE t_prod_code AS TEXT; CREATE TABLE my_products ( product_code t_prod_code, …"

Why would I create a COL_TYPE?

Stormbase’s first rule: you can join columns if and only if they have the same col_type.

Inserts & selects

Big picture of a simple stormbase project is :

1 - Create the col_types you need
2 - Create the tables you need
3 - Insert some data
4 - Create the views you need
5 - Refresh the views
6 - Now you can select from the views
7 - Save (for persistance)
8 - Insert more data
9 - Refresh the views

Stormbase’s second rule: you can only select from views, not from tables.

So if you want to select from a single table you should create a view on top of the table.

Case

Stormbase is not case sensitive, except for file names and parameters.

Col_types & tables

COL_TYPE creation :

"CREATE COL_TYPE col_type_name AS [ TEXT | NUMBER ] ;"

TEXT	Any value that won't be sum.
NUMBER	Any number, integer or float. I can do sum, average, min and max on these columns. Note that you can't group by number values

TABLE creation :

"CREATE TABLE [ BIG | MERGE | BIG_ONLINE ] table_name ( column_name [ col_type_name | TEXT | NUMBER ] , ... ) ;"

BIG	In traditional BI, BIG tables would be your fact tables (at the center of your star schema).
MERGE	First column of a MERGE TABLE becomes the primary key of the table. MERGE tables have an "insert or update" logic, based on the primary key. In traditional BI, MERGE tables would be your dimension tables of your star schema.
BIG_ONLINE	Means non persistant BIG table that has a "cancel and replace" logic. See the REFRESH_ONLINE section.

Inserts

Insert from file :

"INSERT INTO table_name SELECT * FROM FILE ( ‘file_name’ ) [ WHERE ROWNUM < line_max ] ;"

Stormbase will look for file CSV_FILES_DIR/table_name_lowercase/file_name. CSV_FILES_DIR is defined in stormbase.conf (see configuration section).

Single row insert (usually for testing) :

"INSERT INTO table_name VALUES ( value_col_1 , value_col_2 ... ) ;"

Multi sessions insertion is not possible. In other words if Stormbase receives an insert statement while another insert is running, the second insert will receive an error. Keep in mind that Stormbase is dedicated to BI not to transations. So multi session is only possible for SELECTs.

Views

View creation from several tables :

"CREATE VIEW view_name AS SELECT * FROM table_name1 , table_name2 ... WHERE table_name1.col_nameA = table_name2.col_nameB AND …"

Note: Each join between two tables must involve one and only one column in each table. As a consequence, if you have N tables in your from clause you must have N-1 joins in your where clause.

View on top of a single table :

"CREATE VIEW view_name AS SELECT * FROM table_name ;"

Refresh

This is the magic command. It makes the data inserted since last REFRESH DIRTY VIEWS available for selects, it also computes everything Stormbase needs to deliver maximum speed :

"REFRESH DIRTY VIEWS ;"

Please note that the REFRESH DIRTY VIEWS is meant to happen at night, when there are no users connected. The reason for that begin that Stormbase will refuse all SQLs (including selects) during the REFRESH DIRTY VIEWS. Most BI applications get updated once per day, at night. Stormbase also support online refresh, please see after.

Refresh online

Same as REFRESH DIRTY VIEWS but transparent for the end users :

"REFRESH_ONLINE DIRTY VIEWS ;"

Please note that the REFRESH_ONLINE DIRTY VIEWS has limitations.

The tables in which you can insert before a REFRESH_ONLINE DIRTY VIEWS can be used in 2 kinds of views:

- single table view on top of a MERGE table

- BIG (or BIG_ONLINE)/MERGE(s) view. In other words, the view is on top of 1 BIG (or BIG_ONLINE) table and several (MERGE) tables.

So if an insert has been done into a RAW table the REFRESH_ONLINE DIRTY VIEWS will return an error.

The deletes are not accepted except full deletes against BIG_ONLINE tables.

So if a delete has been done against a non BIG_ONLINE table or if this delete is not a full delete then the REFRESH_ONLINE DIRTY VIEWS will return an error.

Save

Save means save to disk for persistence :

"SAVE ;"

Before you save, your stormbase’s data only exist in memory. So if you bounce the server you loose everything you did since last SAVE. Please notice that the keyword is SAVE and *NOT* COMMIT. SAVE will write to disk any modifications (done by CREATE/INSERT/DELETE/REFRESH DIRTY VIEWS) that have occured since last SAVE.

Delete

Delete from a table :

"DELETE table_name WHERE where_clause ;"

Refer to select section for where_clause syntax.

Rebuild the views after DELETEs :

"REFRESH DIRTY VIEWS ;"

The "REFRESH DIRTY VIEWS" must be done between the DELETEs and the SAVE, otherwise the DELETEs are lost.

Drop

Drop a TABLE :

"DROP TABLE table_name ;"

Drop a VIEW :

"DROP VIEW view_name ;"

Drop a COL_TYPE :

"DROP COL_TYPE col_type_name ;"

Drop all TABLEs, VIEWs, COL_TYPEs :

"DROP ALL ;"

Shutdown

Shutdown stormbase server :

"SHUTDOWN ;"

Another way to shutdown is to run ./stop.sh from 01_database. This shell will simply kill the stormbase process. If you use the ./stop.sh script you must be sure that there is no SAVE running, otherwise your data will probably be corrupted.

Describe

Describe all objects :

"DESC ;"

Describe user defined col_type(s) :

"DESC COL_TYPE [ col_type_name ] ;"

Describe all col_type(s) :

"DESC COL_TYPE [ col_type_name ] VERBOSE ;"

Describe all tables :

"DESC TABLE ;"

Describe a table (one row per column) :

"DESC TABLE table_name ;"

Describe all views :

"DESC VIEW ;"

Describe a view (one row per table) :

"DESC VIEW view_name ;"

Describe a view (one row per column) :

"DESC VIEW view_name VERBOSE;"

Permissions

Permission means making some rows visible to some end users and not to others.

END_USER keyword

Create a COL_TYPE of type TEXT named END_USER :

"CREATE COL_TYPE END_USER AS TEXT ;"

Permission TABLEs

A permission TABLE is a TABLE with 2 columns, where the first column is defined as END_USER :

"CREATE TABLE permission_tab_name ( end_user_col_name END_USER, permission_col_name another_col_type ) ;"

Compute permissions

Once you have inserted data in the permission TABLEs :

"REFRESH DIRTY VIEWS ;"

Selects

Use END_USER pseudo column in SELECT to restrict the rows read :

"SELECT ... WHERE END_USER='end_user_name' ;"

Example

"create col_type end_user as text; create col_type t_site as text; create col_type t_dept as text; create col_type t_article as text; create table site_permission ( toto end_user, site t_site); insert into site_permission values('pt','site1'); insert into site_permission values('pt','site2'); insert into site_permission values('rd','site2'); create table article_permission ( toto end_user, article t_article); insert into article_permission values('rd','art1'); create table article( article t_article, dept t_dept); insert into article values('art1','dept1'); create table sales( site t_site, article t_article, qty number); insert into sales values('site1','art1',1); insert into sales values('site2','art1',1); create view v_sales as select * from sales,article where sales.article=article.article; refresh dirty views; select sum(qty) from v_sales where end_user='rd'; -- 1 select sum(qty) from v_sales where end_user='pt'; -- NULL"

Authentication

Authentication means controlling username and password.

By default Stormbase accepts 2 usernames, admin/admin that can do everything and reader/reader for SELECTs only.

Autentication management can be modified to do pretty much anything you want.

Take a look at security.lua in 01_database/_FUNCTIONS. Adding new users or modifying password is very easy for a programmer.

Advanced user may modify this file completely, the programming langage used is LUA.

Syntax

"SELECT group_column_name, ... [ primary_operator | formula ] ( column_name ), … FROM view_name [ WHERE where_definition ] [ GROUP BY group_column_name, … ] [ CALLBACK callback_definition ];"

Primary operators :

"[ SUM | MIN | MAX | AVG | COUNT | COUNTDISTINCT ]"

SUM( column_name )	Returns the sum of all values (number column only).
MIN( column_name )	Returns the mininum value (number column only).
MAX( column_name )	Returns the maximum value (number column only).
AVG( column_name )	Returns the average (number column only). AVG( x ) = SUM( X ) / COUNT( x )
BTE( column_name )	between or equal function : returns true if y <= x and x <= z
COUNT( column_name )	Returns the number of line returned where column_name is not null.
COUNTDISTINCT( column_name )	Returns the number of distinct values.
LIST( column_name )	Returns distinct values concatenated (text column only).
MINSTR( column_name )	Returns the mininum value (text column only).
MAXSTR( column_name )	Returns the average (text column only)

Formula example with primary operators :

"SUM( column1 ) / MAX( column2 ) - AVG( column3 ) "

You may also use a non boolean functions :

"FLOOR( SUM( column1 ) / MAX( column2 ) - AVG( column3 ) ) "

Where clause

Basic where_definition :

"column_name = number_value AND … column_name = 'text_value' AND … "

Same with several values :

"column_name IN ( value1 , value2 … ) AND … "

You can also use a boolean functions:

"[ INSTR | REGEX | BTW(E) | GT(E) | LT(E) | (N)E | IS(N)IN ]( column_name , parameters )"

You can also use a non boolean functions, inside a boolean functions :

"E( FLOOR( x ) , y )"

Boolean functions

INSTR( str1 , str2 )	in string function : returns true if string str2 is in string str2
REGEX( str1 , str2 )	regular expression check function : returns true if str1 matches the regular expression str2
BTW( x , y , z )	between function : returns true if y < x and x < z
BTWE( x , y , z )	between or equal function : returns true if y <= x and x <= z
GT( x , y )	greater than : returns true if x > y
GTE( x , y )	greater than or equal : returns true if x >= y
LT( x , y )	lower than : returns true if x < y
LTE( x , y )	lower than or equal : returns true if x <= y
E( x , y )	equal : returns true if x = y
NE( x , y )	not equal : returns false if x = y
ISIN( x , array )	is in : returns true if x is in array (example { 10 , 20 , 30 })
ISNIN( x , array )	is not in

Non boolean functions

LEAST( x , y )	Obvious
GREATEST( x , y )	Obvious
CEIL( x )	Obvious
FLOOR( x )	Obvious

Callback clause

Callbacks are functions that will be applied to the result set, after the select has been performed :

"CALLBACK callback_fn1( params ) callback_fn2( params ) ..."

Example, sort by column 2 descendant and return only 10 first rows. :

"CALLBACK SORT( 2 , ‘desc’ ) LIMIT( 10 )'

LIMIT( rownum )	Limit result set to first rownum lines.
SORT( col , direction )	Obvious, direction is ‘asc’ or ‘desc’.
WHERE( col , val )	Filters the result set, only lines having column number col equal to val are returned.
GROUP_BY( gby_col )	Groups by column number gby_col and perform a count on the other columns.
RANGE( col , len )	If column number 2 equals 8, range( 2 , 10) will return « 0 – 10 » instead of 8.

Prerequisites

Disable swap

swapoff -a;

Disable THP (=Transparent Huge Page) if activated

On Centos (as root)

echo never > /sys/kernel/mm/transparent_hugepage/enabled;

echo never > /sys/kernel/mm/transparent_hugepage/defrag;

On Redhat (as root)

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled;

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag;

Configuration

Every thing you can configure is in this file : 01_database/stormbase.conf.

Parameters are defined like this :

"parameter_upper_case:parameter_value

No space before or after the :. Order doesn’t matter.

TCP_PORT	Port on which stormbase server will listen. Optional, default is 12345.
FILE_SEPARATOR	Field separator when you insert from File, one ascii character. Optional, default is : , (=comma).
HEADER	Possible values: y (files used to insert data contain header line) or n (opposite).Optional, default is y.
CSV_FILES_DIR	Directory to read data from. If I set it to /tmp and I run : INSERT INTO FOO SELECT * FROM FILE( ‘foo_01.csv’ ) Stormbase will look for file /tmp/foo/foo_01.csv (CSV_FILES_DIR/table_name_lowercase/file_name). Mandatory.
TMP_FILES_DIR	Directory for temporary data. Mandatory
INTERNAL_FILES_DIR	Directory used for persistence by Stormbase with you SAVE. If you want to do a backup, just copy this directory. Mandatory
MAX_PERF_FILES_DIR	Same as INTERNAL_FILES_DIR, but contains the "memory image" of Stormbase's data. If this directory is empty when Stormbase starts, Stormbase will automatically recreate and save it. Mandatory

Shells

01_database/start.sh	starts stormbase
01_database/stop.sh	stops stormbase
01_database/show_log.sh	shows log
01_database/clear.sh	clears Stormbase's internal & max_perf directory
01_database/clear_max_perf.sh	clears Stormbase's max_perf directory
01_database/sql.sh	connects to the local Stormbase instance
01_database/version.sh	provides Stormbase's version

Drivers

The dialog between client and server uses TCP messages.

Node.js driver

The low level connection method are coded in 02_sql/node/stormbase_common.js.

If you want to use it in your own node.js program take a look at 02_sql/node/stormbase_sql.js, this is really simple.

Jdbc driver

Put stormbase_jdbc.jar in your classpath. Here under is an java connection program sample. You can also use a tool such as SQuirreLSQL to connect to stormbase.

"import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.Statement; class toto { public static void main(String[] args) throws Exception { Class.forName("stormbase.Driver"); Connection conn = DriverManager.getConnection("jdbc:stormbase:localhost:2219", "admin", "admin"); Statement s = conn.createStatement(); s.execute("create table foo(dummy number)"); s.executeUpdate("insert into foo values(22)"); s.execute("create view my_view as select * from foo"); ResultSet rs = s.executeQuery("SELECT dummy,COUNT(dummy) FROM my_view GROUP BY dummy"); while(rs.next()){ System.out.println("->"+rs.getString(1)); } conn.commit(); conn.close(); } }"

Getting started

Connect

First script

What's included ?

Node.js CLI

Concepts

What is Stormbase ?

What is a col_type ?

Inserts & selects

Case

Data model

Col_types & tables

Inserts

Views

Refresh

Refresh online

Save

Delete

Drop

Shutdown

Describe

Permissions

Authentication

Select

Syntax

Where clause

Boolean functions

Non boolean functions

Callback clause

Others

Prerequisites

Configuration

Shells

Drivers