create col_type
create col_type <end_user|col_type_name> as <number|text>
create table
create table <|merge|big> table_name (column_name1 <number|text|col_type_name>, ...)
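A minimal sketch combining create col_type and create table. The schema is illustrative (t_item_id, sales, items and their columns are assumptions, not from the original):

```sql
-- a shared col_type joining a fact table to a dimension table
create col_type t_item_id as text;
create big table sales (sale_date text, item_id t_item_id, qty number);
create merge table items (item_id t_item_id, dept text);
```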
create view
create view view_name as select * from table_name1, table_name2, ... where table_name1.column_nameA = table_name2.column_nameB ...
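A sketch of a view joining a hypothetical fact table (sales) to a dimension table (items) on a shared item_id column; names are illustrative:

```sql
-- fact table first, dimension tables after (the order matters for
-- orphan resolution, see "orphans management and dimension priority")
create view v_sales as
select * from sales, items
where sales.item_id = items.item_id;
```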
insert values
insert into table_name values ('col1_value', 'col2_value', ...)
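For instance, with hypothetical items (item_id, dept) and sales (sale_date, item_id, qty) tables; all values quoted as in the syntax above:

```sql
insert into items values ('artA', 'dept1');
insert into sales values ('2024-01-01', 'artA', '2');
```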
refresh dirty view
Refresh the views after an insert
Dirty means that data has been inserted into the first table of the view since the last refresh
refresh dirty view
view data model & parameters
Desc (describe) command
desc <|col_type|table|view|context|computed_columns|parameter>
desc col_type <|verbose>
desc table the_table <|verbose>
desc view the_view <|verbose>
upsert (merge table)
Insert into a merge table will upsert (update if the PK already exists, insert otherwise)
save
Save data on disk (persistence)
save
insert from file
insert into table_name select * from file(<'path/to/file'|'file_name under CSV_FILES_DIR/table_name'>)
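Both file() forms, sketched against a hypothetical sales table (paths and file names are illustrative):

```sql
-- absolute path
insert into sales select * from file('/data/in/sales_2024.csv');
-- or a bare file name, resolved under CSV_FILES_DIR/sales
insert into sales select * from file('sales_2024.csv');
```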
delete
refresh_online
Unlike refresh, refresh_online won't generate downtime
refresh_online dirty view
partitions
Table size is limited to 4G lines, hence the need for partitions
refresh_online & partitions
A partition can have its refresh_online views attached
refresh advanced
Applies deletes + updates + data model changes on tables and views
refresh dirty table
Recomputes views but not max_perf
refresh view view_name
Recomputes permissions if changed and drops the cache
refresh permission
Recomputes computed columns if changed and creates a new version of stormbase.so under _SO_LINUX
refresh computed_column
Drops the cache
refresh cache
Does all of the above, recomputes max_perf and re-executes the init file
refresh dirty view
deletes & refresh_online
Some big tables have a "delete before insert" logic along with refresh_online needs
In previous versions of Stormbase such tables were defined as big_online; now all big tables support this functionality
The big_online keyword is kept for backward compatibility
flat select
select <*|table1.*|col1|table1#col1>, ... from ...
group by select
select col1, col2, ... from ... group by col1, col2, ...
The group by clause can be omitted
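For instance, against a hypothetical v_sales view (dept and qty are illustrative columns):

```sql
select dept, sum(qty), count(qty) from v_sales group by dept;
```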
callback basics
The callback mechanism allows running functions on the result set
select .. from .. <callback|cb> callback_function1 callback_function2 ...
cb is a shortcut for callback
The most common callbacks are sort and limit
More info is provided in the "callback Lua functions" section
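A schematic sketch against a hypothetical v_sales view. The original does not show argument syntax for sort/limit, nor the callback clause's position relative to group by, so both are assumptions here:

```sql
-- sort and limit written bare; argument syntax is an assumption
select dept, sum(qty) from v_sales group by dept cb sort limit;
```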
common aggregation operators
<sum|count|countdistinct|min|max|avg>
other aggregation operators
<unique|countsequence|minstr|maxstr|list|ival|ivallist|ivalbin|p>
Note: uniquesoft can be used instead of unique for backward compatibility, behavior is the same
orphans management and dimension priority
Column item_id in view v_sales can come from table sales or table items
For artA and artB we don't care
For artC we do care because artC is an orphan record (*)
By default SB will look for item_id in non-empty (**) dimension tables (***), then in the fact table
But you can tell SB which one you want using the table#col syntax
(*) exists in the fact table but not in the dimension table
(**) an empty dimension is skipped
(***) dimension1 is the second table of the from clause of the view create statement, etc.
--> !!! the order of dimensions in the from clause of the view create statement is important
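A sketch of the table#col syntax on the v_sales example above (sum(qty) is an illustrative measure):

```sql
-- force item_id resolution from the dimension table (no orphans) ...
select items#item_id, sum(qty) from v_sales group by items#item_id;
-- ... or from the fact table (orphan artC included)
select sales#item_id, sum(qty) from v_sales group by sales#item_id;
```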
escape quote
''
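For instance, inserting a value containing a single quote into a hypothetical items table:

```sql
-- two single quotes escape one inside a literal: stores art'A
insert into items values ('art''A', 'dept1');
```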
Stormbase & Lua
Lua is a programming language.
Stormbase is not developed in Lua, it is developed in native C.
But in some cases (callbacks and where clauses, see here under) Stormbase calls Lua code.
Standard Lua code (provided in the Stormbase installer) is located in _FUNCTIONS/stormbase.lua.
Custom Lua code can also be added in _FUNCTIONS/custom.lua.
utility Lua functions
callback Lua functions
data model modification #1
...
data model modification #2
Since v1.17.08 SB is a lot more flexible
where clause basics
select .. from ... <where|and> where_clause1 and where_clause2 and where_clause3 ..
Where clause syntax:
column_name='value'
column_name in ('value1', 'value2', ...)
column_name nin ('value1', 'value2', ...)
column_name like 'value'
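Combining several where clauses against a hypothetical v_sales view (columns are illustrative):

```sql
select item_id, sum(qty) from v_sales
where dept = 'dept1'
and item_id in ('artA', 'artB')
group by item_id;
```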
where clause / Lua boolean function
Any Lua boolean function can be used in a where clause using one and only one column
(not)exists basics
select .. from view1 and view2.col='something' (exists)
and view2!.col='something' (not exists)
The exists join between view1 and view2 is implicit (the first col_type in common will be used for the join)
Use case: when view2 is based on a non merge table, which is why it can't be included in view1
col_type_checks basics
A col_type_check is a "select col, count(col) from .. where ... group by col"
It is used to filter other parts of the sql
The filter will be applied on the first column that has the same col_type as the col_type_check
If you alias your group by column in the with clause, the filter will apply in priority to a column with this name (if it exists)
with with_name1 ( col_type_check1 ),
with_name2 ( col_type_check2 ),
...
select .. from view1, with_name1, with_name2! ..
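A sketch on the hypothetical v_sales view: the with block filters the main select to the item_ids sold in dept1 (busy_items and all columns are illustrative):

```sql
with busy_items ( select item_id, count(item_id) from v_sales where dept = 'dept1' group by item_id )
select customer_id, sum(qty) from v_sales, busy_items
group by customer_id;
```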
col_type_checks advanced
Col_type_checks can be combined to do intersections etc.
with with_name1 ( col_type_check1 ),
with_name2 ( select col, count(col) from view2, with_name1 where ... group by col ),
...
select .. from view3, with_name2
permissions
end_user
col_type
dynamic sql (advanced)
defragmentation
The CELL_* files in internal storage need defragmentation; defrag is done in offline mode during the save process
continue_on_error/stop_on_error
By default a script execution will stop on the first error
Sometimes this is not what you want (a common case is multiple executions of a create statement)
To continue on all errors: continue_on_error
To stop on all errors: stop_on_error
To continue on some errors: continue_on_error(err_code1, err_code2, ...)
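A typical re-runnable script sketch (table definition is illustrative): tolerate the "already exists" error on the create, then restore strict mode:

```sql
continue_on_error
create merge table items (item_id t_item_id, dept text);
stop_on_error
```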
basics shells
start.sh
stop.sh
bounce.sh
show_log.sh
export.sh
Exports data_model and/or data to a directory
./export.sh <export_dir> <data_model y/n> <data y/n> <nb_thread> <dry_run y/n>
import.sh
Imports data_model and/or data from an export directory
./import.sh <export_dir> <data_model y/n> <data y/n> <dry_run y/n>
sql.sh
From 02_sql
./sql.sh the_user/the_password@sb_host:sb_port [--file:<file.sql> --i]
i means interactive (the file is executed but you must press enter after each execution)
From 01_database (connection to local SB)
./sql.sh [--file:<file.sql> --i]
Note about logging: sql.sh will create a log file (sql.log); this file is a bit different from stdout, it is used to generate this documentation !!
y/n (sql.sh)
n tells sql.sh to "read but not run"
y tells sql.sh to restart normally
display (sql.sh)
something I want to see in sql.log without being executed
it can be on several lines but cannot contain a semicolon
bounce
same as bounce.sh without killing the process
shutdown
same as stop.sh without killing the process
functions
Functions allow advanced sql; the general syntax is this
select column_name, function_parameters from view_name function function_name(function_arguments) group by column_name
function_parameters: p(column_name1), p(column_name2), ...
function_arguments: depends on the function
function fn_having
Fn_having is a filter that applies on an operator result (for instance items that have been sold in more than 3 stores)
fn_having(<count|countdistinct|sum|sumpareto>, operator, value)
operator: = < > <= >=
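The "sold in more than 3 stores" example sketched against a hypothetical v_sales view (store_id is an illustrative column):

```sql
select item_id, p(store_id) from v_sales
function fn_having(countdistinct, >, 3)
group by item_id;
```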
function fn_store/fn_merge basics
Sql is often used to format/display results; fn_store/fn_merge do that
with with_name1 as
(
select group_by_col_name, stored_values from view_name function fn_store ... group by group_by_col_name
),
...
select col_type.val, with_name1.alias1 ... from no_view, * function fn_merge
stored_values: operator(col_name) as alias1, operator(col_name) as alias2 etc..
operator: <sum|count|countdistinct|min|max|minstr|maxstr|uniques>
Note1: all the withs must have 0 or 1 group by column
Note2: all the group by columns must have the same col_type; you refer to this col_type with col_type.val in the fn_merge select
function fn_build_filter
Fn_build_filter builds a boolean matrix, matrix[x][y] = y/n (x: group by col value, y: text column value)
select group_by_col_name, p(text_col_name) from view_name function fn_build_filter(order, columns_kept)
group_by_col_name: optional, a one line matrix is created if not provided
order: <asc|desc>, optional, defaults to asc
columns_kept: 1 means keep the first, 2 keep the first two, etc.; optional, defaults to keep all
function fn_apply_filter
Fn_apply_filter applies the boolean matrix produced by fn_build_filter
select group_by_col_name, stored_values, p(col_name1), p(col_name2) from view_name function fn_apply_filter(with_alias) ... group by group_by_col_name
with_alias: the with holding the fn_build_filter, or null meaning "take the previous with"
group_by_col_name: optional
stored_values: operator(col_name) as alias1, operator(col_name) as alias2 etc..
col_name1: must have the same col_type as the fn_build_filter with's group by column
col_name2: must have the same col_type as the fn_build_filter with's first parameter column
function fn_pivot
Fn_pivot builds a number matrix, matrix[x][y] = num_value (x: group by col value, y: text column value)
select group_by_col_name, p(text_col_name), p(number_col_name) from view_name function fn_pivot(operator, with_alias, custom_function, signature, before_fn, after_fn) group by group_by_col_name
group_by_col_name: optional, a one line matrix is created if not provided
operator: <sum|sumprevious|sumafter|last>
with_alias: the with holding the fn_build_filter, or null meaning "no filter, no sort"
custom_function: optional, can be dynamic code or a function in stormbase.so
signature: <fn_num|fn_num_num|fn_key_num|fn_key3_num2>, the custom function signature (if and only if a custom function is provided)
before_fn: optional, code or null, can be dynamic code or a fn name in stormbase.so
after_fn: optional, code or null, can be dynamic code or a fn name in stormbase.so
Note: custom function usage is explained in the fn_custom section
function fn_filter_pivot
Fn_filter_pivot removes columns from the matrix created by fn_pivot
select * from no_view function fn_filter_pivot(with_alias1, with_alias2)
with_alias1: the with holding the fn_pivot
with_alias2: the with holding the fn_build_filter
Note: generally used with sumafter/sumprevious
function fn_unpivot
Fn_unpivot factorizes the matrix created by fn_pivot
select * from no_view function fn_unpivot(<sum|avg>, with_holding_the_fn_pivot, custom_function)
custom_function: optional, can be dynamic code or code in stormbase.so
function fn_custom
Fn_custom: ...
select group_by_col_name, function_parameters from view_name function fn_custom(custom_function, signature, before_fn, after_fn) group by group_by_col_name
group_by_col_name: optional, a one line matrix is created if not provided
function_parameters: p(col_name1), p(col_name2), etc..
custom_function: optional, can be dynamic code or a function in stormbase.so
signature: <fn_num|fn_num_num|fn_key_num|fn_key3_num2>, the custom function signature (if and only if a custom function is provided)
before_fn: optional, code or null, can be dynamic code or a fn name in stormbase.so
after_fn: optional, code or null, can be dynamic code or a fn name in stormbase.so
reduce memory and disk usage
Use QUERY_TABLE_ALLOWED and SKIP_REFRESH_FOR_NOT_IN_MEMORY_BIG_TABLES
Use these parameters when you need to limit memory and disk usage, and when response time does not matter
need to force a value during an insert
use INSERT_FORCE_VALUE parameter
function sb_export
loop table_name(columns) function sb_export('path/to/directory', 'file_tag', thread_count) ...
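For instance, exporting two columns of a hypothetical sales table with 4 threads (directory and tag are illustrative):

```sql
loop sales(item_id, qty) function sb_export('/data/out', 'sales_dump', 4);
```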
set_text/set_sparse_text
show table statistics (work in progress)
allow errors in select
Sometimes it is easier to ignore errors; it facilitates integration with reporting tools
Parameters: ALLOW_WHERE_ERROR, ALLOW_GROUP_BY_ERROR, ALLOW_EXP_ERROR
sb_parallel
When you use custom C code in a multi-thread context, you might need each thread to treat a given subset of the data
This is possible in SB, and is called "sb_parallel mode"
If a where clause uses the sb_parallel lua function then "sb_parallel mode" is triggered
For instance here under we want each thread to treat the data of a single item_id, hence the sb_parallel(item_id) where clause
In some cases "sb_parallel mode" is replaced by a "mono thread mode"
This happens when pareto is used or when the index of the sb_parallel column cannot be used
fn_compute_info / fn_smart_delete (beta)
set_dirty
refresh a particular view
set_dirty <the_view>
backup
backup 'the_directory'
dates
SB tags
select SB use_index|no_index|log_verbose|no_cache|explain .. from ...
Several SB tags can be used
index vs no_index read
index read versus no_index read (aka sequential read or full scan)
countsequence
countsequence(the_column) counts the number of value changes during reading
countsequence can be an alternative to countdistinct
!!! countsequence will return an incorrect result if PARETO_LIMIT!=1 !!!
hide data from memory
Parameters: MAX_PERF_FILTER & MAX_PERF_FILTER_FN
countdistinct & high sparsity
Parameters: SEQUENCE_COLUMNS & SEQUENCE_COLUMN_COMPANIONS & USE_INDEX_WITH_SORT_LIMIT & REPLACE_COUNTDISTINCT_BY_COUNTSEQUENCE
SEQUENCE_COLUMNS will tell SB to sort the lines according to a given column
number format/precision
SB C interface
loop table_name(col1, col2, ...) function function_name(param1, param2, ...) update (updated_col1, updated_col2, ...) where ...
Parameters, update clause and where clause are optional
SB will execute 3 C functions
1/ <function_name>_before: optional "before function" called once, where you allocate the objects you will need
2/ <function_name>: mandatory "loop function" called once per line fetched, where you populate the objects
3/ <function_name>_after: optional "after function" called once, where you free memory
See the input/output of these functions in common.h
basic read
basic update
passing parameters
mutex
standard update functions
skip
If the before fn returns SKIP, the rest of the loop is skipped
Basic types using loop function fn_computed_col
No coding and no parameters are necessary in 2 cases (the most common cases):
1/ text 1 to 1 relationship, get the department of an item (type A)
2/ coefficient 1 to 1 relationship, get price from qty using item price (type B)
SB parameter
set COMPUTED_COLUMNS='comp_col1,comp_col2,...'
SB C interface
SB needs 2 functions per computed column
1/ <comp_col_name>_before
2/ <comp_col_name>
See input/output of these functions in common.h
priority between computed column and real column
By default a real column has priority over a computed column with the same name
computed column basics
New_customer_id is a computed text column
Its destination col_type is t_customer_id
We are going to always return the same value
So we don't need other columns to compute new_customer_id in this basic case
Please note that needed_col_count can't be zero, because of the in/out value at first position
Hence needed_col_names=["dept"]
computed columns with analytic function
computed columns with param
optimized contexts (type A)
optimized contexts (type B)
using other columns
New_customer_id will have 2 values:
1/ the current customer_id if item_id is artA (ival=10)
2/ null otherwise
using other columns (better code)
Dept was here for clarity (in/out value); let's remove it and use customer_id instead
And we won't need to update the new in/out in case 1/ because it is already correct
SB predefined contexts C API
A context is a custom C object stored in memory by SB that can be used by other parts of your custom C code
Some predefined contexts for the most common operations are provided (see common.h)
using context basics
Sometimes your computed column will need other information because the needed columns are not enough
A context should be used in that case
using custom context
Sometimes a predefined context is not enough, or you need several custom contexts
In such a case you should define your own context
using indexes on computed columns
An index can be used if there is only one needed column and this column has an index (is part of a dimension)
Pre-requisite: one Postgresql (PG) instance running somewhere (the PG companion)
You can use a PG driver to query SB
Query steps: PG client -> SB cluster -> SB and PG companion -> pgsql client
The SB cluster will reply to the pgsql client with a "mix" of SB and PG responses
Only queries compatible with PG are supported in this mode
SB is not a transactional database, but there is a small transactional mode in SB
This mode can be used to store users, messages
The goal is to avoid having another database for that
Corruption is possible in bi mode, but it is not possible in transac mode
Update with sql is possible in transac mode but not in bi mode
Parameters TCP_PORT_TRANSAC and TRANSAC_FILES_DIR must be set in stormbase.conf
create
create table table_name(col1 text, col2 text, ...)
insert
insert into table_name('txt1', 'txt2', ...)
select
select <*|col1, col2, ...> from table_name <|where col1='txt1' and col2='txt2' ...>
update
update table_name set col1='txt1', col2='txt2' <|where col1='txt1' and col2='txt2' ...>
delete
delete table_name <|where col1='txt1' and col2='txt2' ...>
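A minimal transac-mode session sketch following the syntax above (the users table and its columns are illustrative; note that transac insert takes no values keyword):

```sql
create table users (login text, email text);
insert into users ('alice', 'alice@acme.test');
update users set email = 'alice@other.test' where login = 'alice';
select * from users where login = 'alice';
delete users where login = 'alice';
```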