create col_type
create col_type <end_user|col_type_name> as <number|text>
create table
create table <|merge|big> table_name (column_name1 <number|text|col_type_name>, ...)
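A minimal sketch combining create col_type and create table. The schema is illustrative (t_item_id, sales, items and their columns are assumptions, not from the original):

```sql
-- a shared col_type joining a fact table to a dimension table
create col_type t_item_id as text;
create big table sales (sale_date text, item_id t_item_id, qty number);
create merge table items (item_id t_item_id, dept text);
```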
create view
create view view_name as select * from table_name1, table_name2, ... where table_name1.column_nameA = table_name2.column_nameB ...
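A sketch of a view joining a hypothetical fact table (sales) to a dimension table (items) on a shared item_id column; names are illustrative:

```sql
-- fact table first, dimension tables after (the order matters for
-- orphan resolution, see "orphans management and dimension priority")
create view v_sales as
select * from sales, items
where sales.item_id = items.item_id;
```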
insert values
insert into table_name values ('col1_value', 'col2_value', ...)
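For instance, with hypothetical items (item_id, dept) and sales (sale_date, item_id, qty) tables; all values quoted as in the syntax above:

```sql
insert into items values ('artA', 'dept1');
insert into sales values ('2024-01-01', 'artA', '2');
```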
refresh dirty view
Refresh the views after an insert
Dirty means that data has been inserted into the first table of the view since the last refresh
refresh dirty view
view data model & parameters
Desc (describe) command
desc <|col_type|table|view|context|computed_columns|parameter>
desc col_type <|verbose>
desc table the_table <|verbose>
desc view the_view <|verbose>
upsert (merge table)
Insert into a merge table will upsert (update if the PK already exists, insert otherwise)
save
Save data on disk (persistence)
save
insert from file
insert into table_name select * from file(<'path/to/file'|'file_name under CSV_FILES_DIR/table_name'>)
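Both file() forms, sketched against a hypothetical sales table (paths and file names are illustrative):

```sql
-- absolute path
insert into sales select * from file('/data/in/sales_2024.csv');
-- or a bare file name, resolved under CSV_FILES_DIR/sales
insert into sales select * from file('sales_2024.csv');
```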
delete
refresh_online
Unlike refresh, refresh_online won't generate downtime
refresh_online dirty view
partitions
Table size is limited to 4G lines, hence the need for partitions
refresh_online & partitions
A partition can have its refresh_online views attached
refresh advanced
Applies deletes + updates + data model changes on tables and views
refresh dirty table
Recomputes views but not max_perf
refresh view view_name
Recomputes permissions if changed and drops the cache
refresh permission
Recomputes computed columns if changed and creates a new version of stormbase.so under _SO_LINUX
refresh computed_column
Drops the cache
refresh cache
Does all of the above, recomputes max_perf and re-executes the init file
refresh dirty view
deletes & refresh_online
Some big tables have a "delete before insert" logic along with refresh_online needs
In previous versions of Stormbase such tables were defined as big_online; now all big tables support this functionality
The big_online keyword is kept for backward compatibility
flat select
select <*|table1.*|col1|table1#col1>, ... from ...
group by select
select col1, col2, ... from ... group by col1, col2, ...
The group by clause can be omitted
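For instance, against a hypothetical v_sales view (dept and qty are illustrative columns):

```sql
select dept, sum(qty), count(qty) from v_sales group by dept;
```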
callback basics
The callback mechanism allows running functions on the result set
select .. from .. <callback|cb> callback_function1 callback_function2 ...
cb is a shortcut for callback
The most common callbacks are sort and limit
More info is provided in the "callback Lua functions" section
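A schematic sketch against a hypothetical v_sales view. The original does not show argument syntax for sort/limit, nor the callback clause's position relative to group by, so both are assumptions here:

```sql
-- sort and limit written bare; argument syntax is an assumption
select dept, sum(qty) from v_sales group by dept cb sort limit;
```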
common aggregation operators
<sum|count|countdistinct|min|max|avg>
other aggregation operators
<unique|countsequence|minstr|maxstr|list|ival|ivallist|ivalbin|p>
Note: uniquesoft can be used instead of unique for backward compatibility, behavior is the same
orphans management and dimension priority
Column item_id in view v_sales can come from table sales or table items
For artA and artB we don't care
For artC we do care because artC is an orphan record (*)
By default SB will look for item_id in non-empty (**) dimension tables (***), then in the fact table
But you can tell SB which one you want using the table#col syntax
(*) exists in the fact table but not in the dimension table
(**) an empty dimension is skipped
(***) dimension1 is the second table of the from clause of the view create statement, etc.
--> !!! the order of dimensions in the from clause of the view create statement is important
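A sketch of the table#col syntax on the v_sales example above (sum(qty) is an illustrative measure):

```sql
-- force item_id resolution from the dimension table (no orphans) ...
select items#item_id, sum(qty) from v_sales group by items#item_id;
-- ... or from the fact table (orphan artC included)
select sales#item_id, sum(qty) from v_sales group by sales#item_id;
```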
escape quote
''
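For instance, inserting a value containing a single quote into a hypothetical items table:

```sql
-- two single quotes escape one inside a literal: stores art'A
insert into items values ('art''A', 'dept1');
```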
Stormbase & Lua
Lua is a programming language.
Stormbase is not developed in Lua, it is developed in native C.
But in some cases (callbacks and where clauses, see here under) Stormbase calls Lua code.
Standard Lua code (provided in the Stormbase installer) is located in _FUNCTIONS/stormbase.lua.
Custom Lua code can also be added in _FUNCTIONS/custom.lua.
utility Lua functions
callback Lua functions
data model modification #1
...
data model modification #2
Since v1.17.08 SB is a lot more flexible
where clause basics
select .. from ... <where|and> where_clause1 and where_clause2 and where_clause3 ..
Where clause syntax:
column_name='value'
column_name in ('value1', 'value2', ...)
column_name nin ('value1', 'value2', ...)
column_name like 'value'
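Combining several where clauses against a hypothetical v_sales view (columns are illustrative):

```sql
select item_id, sum(qty) from v_sales
where dept = 'dept1'
and item_id in ('artA', 'artB')
group by item_id;
```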
where clause / Lua boolean function
Any Lua boolean function can be used in a where clause using one and only one column
(not)exists basics
select .. from view1 and view2.col='something' (exists)
and view2!.col='something' (not exists)
The exists join between view1 and view2 is implicit (the first col_type in common will be used for the join)
Use case: when view2 is based on a non merge table, which is why it can't be included in view1
col_type_checks basics
A col_type_check is a "select col, count(col) from .. where ... group by col"
It is used to filter other parts of the sql
The filter will be applied on the first column that has the same col_type as the col_type_check
If you alias your group by column in the with clause, the filter will apply in priority to a column with this name (if it exists)
with with_name1 ( col_type_check1 ),
with_name2 ( col_type_check2 ),
...
select .. from view1, with_name1, with_name2! ..
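A sketch on the hypothetical v_sales view: the with block filters the main select to the item_ids sold in dept1 (busy_items and all columns are illustrative):

```sql
with busy_items ( select item_id, count(item_id) from v_sales where dept = 'dept1' group by item_id )
select customer_id, sum(qty) from v_sales, busy_items
group by customer_id;
```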
col_type_checks advanced
Col_type_checks can be combined to do intersections etc.
with with_name1 ( col_type_check1 ),
with_name2 ( select col, count(col) from view2, with_name1 where ... group by col ),
...
select .. from view3, with_name2
permissions
end_user
col_type
dynamic sql (advanced)
defragmentation
The CELL_* files in internal storage need defragmentation; defrag is done in offline mode during the save process
continue_on_error/stop_on_error
By default a script execution will stop on the first error
Sometimes this is not what you want (a common case is multiple executions of a create statement)
To continue on all errors: continue_on_error
To stop on all errors: stop_on_error
To continue on some errors: continue_on_error(err_code1, err_code2, ...)
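A typical re-runnable script sketch (table definition is illustrative): tolerate the "already exists" error on the create, then restore strict mode:

```sql
continue_on_error
create merge table items (item_id t_item_id, dept text);
stop_on_error
```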
basics shells
start.sh
stop.sh
bounce.sh
show_log.sh
export.sh
Exports data_model and/or data to a directory
./export.sh <export_dir> <data_model y/n> <data y/n> <nb_thread> <dry_run y/n>
import.sh
Imports data_model and/or data from an export directory
./import.sh <export_dir> <data_model y/n> <data y/n> <dry_run y/n>
sql.sh
From 02_sql
./sql.sh the_user/the_password@sb_host:sb_port [--file:<file.sql> --i]
i means interactive (the file is executed but you must press enter after each execution)
From 01_database (connection to local SB)
./sql.sh [--file:<file.sql> --i]
Note about logging: sql.sh will create a log file (sql.log); this file is a bit different from stdout, it is used to generate this documentation !!
y/n (sql.sh)
n tells sql.sh to "read but not run"
y tells sql.sh to restart normally
display (sql.sh)
something I want to see in sql.log without being executed
it can be on several lines but cannot contain a semicolon
bounce
same as bounce.sh without killing the process
shutdown
same as stop.sh without killing the process
functions
Functions allow advanced sql; the general syntax is this
select column_name, function_parameters from view_name function function_name(function_arguments) group by column_name
function_parameters: p(column_name1), p(column_name2), ...
function_arguments: depends on the function
function fn_having
Fn_having is a filter that applies on an operator result (for instance items that have been sold in more than 3 stores)
fn_having(<count|countdistinct|sum|sumpareto>, operator, value)
operator: = < > <= >=
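The "sold in more than 3 stores" example sketched against a hypothetical v_sales view (store_id is an illustrative column):

```sql
select item_id, p(store_id) from v_sales
function fn_having(countdistinct, >, 3)
group by item_id;
```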
function fn_store/fn_merge basics
Sql is often used to format/display results; fn_store/fn_merge do that
with with_name1 as
(
select group_by_col_name, stored_values from view_name function fn_store ... group by group_by_col_name
),
...
select col_type.val, with_name1.alias1 ... from no_view, * function fn_merge
stored_values: operator(col_name) as alias1, operator(col_name) as alias2 etc..
operator: <sum|count|countdistinct|min|max|minstr|maxstr|uniques>
Note1: all the withs must have 0 or 1 group by column
Note2: all the group by columns must have the same col_type; you refer to this col_type with col_type.val in the fn_merge select
function fn_build_filter
Fn_build_filter builds a boolean matrix, matrix[x][y] = y/n (x: group by col value, y: text column value)
select group_by_col_name, p(text_col_name) from view_name function fn_build_filter(order, columns_kept)
group_by_col_name: optional, a one line matrix is created if not provided
order: <asc|desc>, optional, defaults to asc
columns_kept: 1 means keep the first, 2 keep the first two, etc.; optional, defaults to keep all
function fn_apply_filter
Fn_apply_filter applies the boolean matrix produced by fn_build_filter
select group_by_col_name, stored_values, p(col_name1), p(col_name2) from view_name function fn_apply_filter(with_alias) ... group by group_by_col_name
with_alias: the with holding the fn_build_filter, or null meaning "take the previous with"
group_by_col_name: optional
stored_values: operator(col_name) as alias1, operator(col_name) as alias2 etc..
col_name1: must have the same col_type as the fn_build_filter with's group by column
col_name2: must have the same col_type as the fn_build_filter with's first parameter column
function fn_pivot
Fn_pivot builds a number matrix, matrix[x][y] = num_value (x: group by col value, y: text column value)
select group_by_col_name, p(text_col_name), p(number_col_name) from view_name function fn_pivot(operator, with_alias, custom_function, signature, before_fn, after_fn) group by group_by_col_name
group_by_col_name: optional, a one line matrix is created if not provided
operator: <sum|sumprevious|sumafter|last>
with_alias: the with holding the fn_build_filter, or null meaning "no filter, no sort"
custom_function: optional, can be dynamic code or a function in stormbase.so
signature: <fn_num|fn_num_num|fn_key_num|fn_key3_num2>, the custom function signature (if and only if a custom function is provided)
before_fn: optional, code or null, can be dynamic code or a fn name in stormbase.so
after_fn: optional, code or null, can be dynamic code or a fn name in stormbase.so
Note: custom function usage is explained in the fn_custom section
function fn_filter_pivot
Fn_filter_pivot removes columns from the matrix created by fn_pivot
select * from no_view function fn_filter_pivot(with_alias1, with_alias2)
with_alias1: the with holding the fn_pivot
with_alias2: the with holding the fn_build_filter
Note: generally used with sumafter/sumprevious
function fn_unpivot
Fn_unpivot factorizes the matrix created by fn_pivot
select * from no_view function fn_unpivot(<sum|avg>, with_holding_the_fn_pivot, custom_function)
custom_function: optional, can be dynamic code or code in stormbase.so
function fn_custom
Fn_custom: ...
select group_by_col_name, function_parameters from view_name function fn_custom(custom_function, signature, before_fn, after_fn) group by group_by_col_name
group_by_col_name: optional, a one line matrix is created if not provided
function_parameters: p(col_name1), p(col_name2), etc..
custom_function: optional, can be dynamic code or a function in stormbase.so
signature: <fn_num|fn_num_num|fn_key_num|fn_key3_num2>, the custom function signature (if and only if a custom function is provided)
before_fn: optional, code or null, can be dynamic code or a fn name in stormbase.so
after_fn: optional, code or null, can be dynamic code or a fn name in stormbase.so
reduce memory and disk usage
Use QUERY_TABLE_ALLOWED and SKIP_REFRESH_FOR_NOT_IN_MEMORY_BIG_TABLES
Use these parameters when you need to limit memory and disk usage, and when response time does not matter
need to force a value during an insert
use INSERT_FORCE_VALUE parameter
function sb_export
loop table_name(columns) function sb_export('path/to/directory', 'file_tag', thread_count) ...
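For instance, exporting two columns of a hypothetical sales table with 4 threads (directory and tag are illustrative):

```sql
loop sales(item_id, qty) function sb_export('/data/out', 'sales_dump', 4);
```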
set_text/set_sparse_text
show table statistics (work in progress)
allow errors in select
Sometimes it is easier to ignore errors; it facilitates integration with reporting tools
Parameters: ALLOW_WHERE_ERROR, ALLOW_GROUP_BY_ERROR, ALLOW_EXP_ERROR
sb_parallel
When you use custom C code in a multi-thread context, you might need each thread to treat a given subset of the data
This is possible in SB, and is called "sb_parallel mode"
If a where clause uses the sb_parallel lua function then "sb_parallel mode" is triggered
For instance here under we want each thread to treat the data of a single item_id, hence the sb_parallel(item_id) where clause
In some cases "sb_parallel mode" is replaced by a "mono thread mode"
This happens when pareto is used or when the index of the sb_parallel column cannot be used
fn_compute_info / fn_smart_delete (beta)
set_dirty
refresh a particular view
set_dirty <the_view>
backup
backup 'the_directory'
dates
SB tags
select SB use_index|no_index|log_verbose|no_cache|explain .. from ...
Several SB tags can be used
index vs no_index read
index read versus no_index read (aka sequential read or full scan)
countsequence
countsequence(the_column) counts the number of value changes during reading
countsequence can be an alternative to countdistinct
!!! countsequence will return an incorrect result if PARETO_LIMIT!=1 !!!
hide data from memory
Parameters: MAX_PERF_FILTER & MAX_PERF_FILTER_FN
countdistinct & high sparsity
Parameters: SEQUENCE_COLUMNS & SEQUENCE_COLUMN_COMPANIONS & USE_INDEX_WITH_SORT_LIMIT & REPLACE_COUNTDISTINCT_BY_COUNTSEQUENCE
SEQUENCE_COLUMNS will tell SB to sort the lines according to a given column
number format/precision
SB C interface
loop table_name(col1, col2, ...) function function_name(param1, param2, ...) update (updated_col1, updated_col2, ...) where ...
Parameters, update clause and where clause are optional
SB will execute 3 C functions
1/ <function_name>_before: optional "before function" called once, where you allocate the objects you will need
2/ <function_name>: mandatory "loop function" called once per line fetched, where you populate the objects
3/ <function_name>_after: optional "after function" called once, where you free memory
See the input/output of these functions in common.h
basic read
basic update
passing parameters
mutex
standard update functions
skip
If the before fn returns SKIP, the rest of the loop is skipped
Basic types using loop function fn_computed_col
No coding and no parameters are necessary in 2 cases (the most common cases):
1/ text 1 to 1 relationship, get the department of an item (type A)
2/ coefficient 1 to 1 relationship, get price from qty using item price (type B)
SB parameter
set COMPUTED_COLUMNS='comp_col1,comp_col2,...'
SB C interface
SB needs 2 functions per computed column
1/ <comp_col_name>_before
2/ <comp_col_name>
See input/output of these functions in common.h
priority between computed column and real column
By default a real column has priority over a computed column with the same name
computed column basics
New_customer_id is a computed text column
Its destination col_type is t_customer_id
We are going to always return the same value
So we don't need other columns to compute new_customer_id in this basic case
Please note that needed_col_count can't be zero, because of the in/out value at first position
Hence needed_col_names=["dept"]
computed columns with analytic function
computed columns with param
optimized contexts (type A)
optimized contexts (type B)
using other columns
New_customer_id will have 2 values:
1/ the current customer_id if item_id is artA (ival=10)
2/ null otherwise
using other columns (better code)
Dept was here for clarity (in/out value); let's remove it and use customer_id instead
And we won't need to update the new in/out in case 1/ because it is already correct
SB predefined contexts C API
A context is a custom C object stored in memory by SB that can be used by other parts of your custom C code
Some predefined contexts for the most common operations are provided (see common.h)
using context basics
Sometimes your computed column will need other information because the needed columns are not enough
A context should be used in that case
using custom context
Sometimes a predefined context is not enough, or you need several custom contexts
In such a case you should define your own context
using indexes on computed columns
An index can be used if there is only one needed column and this column has an index (is part of a dimension)
Pre-requisite: one Postgresql (PG) instance running somewhere (the PG companion)
You can use a PG driver to query SB
Query steps: PG client -> SB cluster -> SB and PG companion -> pgsql client
The SB cluster will reply to the pgsql client with a "mix" of SB and PG responses
Only queries compatible with PG are supported in this mode
SB is not a transactional database, but there is a small transactional mode in SB
This mode can be used to store users, messages
The goal is to avoid having another database for that
Corruption is possible in bi mode, but it is not possible in transac mode
Update with sql is possible in transac mode but not in bi mode
Parameters TCP_PORT_TRANSAC and TRANSAC_FILES_DIR must be set in stormbase.conf
create
create table table_name(col1 text, col2 text, ...)
insert
insert into table_name('txt1', 'txt2', ...)
select
select <*|col1, col2, ...> from table_name <|where col1='txt1' and col2='txt2' ...>
update
update table_name set col1='txt1', col2='txt2' <|where col1='txt1' and col2='txt2' ...>
delete
delete table_name <|where col1='txt1' and col2='txt2' ...>
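A minimal transac-mode session sketch following the syntax above (the users table and its columns are illustrative; note that transac insert takes no values keyword):

```sql
create table users (login text, email text);
insert into users ('alice', 'alice@acme.test');
update users set email = 'alice@other.test' where login = 'alice';
select * from users where login = 'alice';
delete users where login = 'alice';
```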