. See TqlRoadmap
for an introduction.
None of the following operations change the table they act on (updates will be described at a later time). Thus, they follow the functional programming "no side-effects" rule. Generally the operations return another (virtual) table. "RT" is short for "result table". The final returned query result ("result set") is that value of the last operation. "Expr" is an abbreviation for "expression". The descriptions below assume familiarity with TqlColumnTable
For expressions, if an operation involves 2 tables, such as Join, then one can supply "a." and "b." prefixes to column names to distinguish between the first table and the second. If not supplied, the interpreter assumes "a." if there is a naming conflict, meaning the first table is assumed. See TqlPrefix
Here's a brief summary
of the most common SMEQL operators roughly in order of importance:
- calc(table, columnTable) // similar to SELECT clause in SQL
- filter(table, expression) // similar to WHERE clause in SQL
- group(table, columnTable) // roughly similar to GROUP BY in SQL
- join(table_1, table_2, expression)
- orderBy(table, columnTable, [sequenceColumn]) // sorts or produces sequence numbers
- union(table_1, table_2)
RT = calc(table, columnTable)
Purpose: Alter, filter, or rename columns. Similar to SQL "Select" clause.
A column table with at least "expr" and "col" columns. (See TqlColumnTable
Example: Combine the values for two regions into one column.
rt = calc(byRegion, [productID, (northTotal + southTotal) total])
productID northTotal southTotal
10 20 10
11 100 200
12 15 15
13 10 25
Notes: An empty column table is equivalent to "*" in SQL's SELECT, thus "calc(myTable,)" is comparable to "SELECT * FROM myTable".
rt = filter(table, expr)
Purpose: Filter rows by a given boolean expression . Similar to SQL "Where" clause.
rt = join(table_1, table_2, expr)
(thorough description coming later)
Note: If columns between the two tables have the same name, the first table's column will be output. If an alias is needed to preserve columns, then use the Calc operation first to generate an alias. The "a." and "b." notation (described above) can also distinguish between columns of the two tables.
rt = leftJoin(table_1, table_2, expr)
(thorough description coming later)
Note: Overlapping columns are subject to the same rules as the "Join" operator.
rt = group(table, columnTable)
Purpose: Provide aggregate information. Similar to SQL's GROUP BY option.
A column table. "Compute" can only be used if all values are the same for the given group. If not, an error is triggered. (Such duplication may suggest bad normalization of table design, but in practice one is often stuck with bad designs and must find ways to live with them.)
Example: grades = group(exams, [studentID, avg(score) grade]) // average grade per student
Note that the "groupingExpr" parameter has been removed from the draft because it appears to be unnecessary. It would have been an expression that determines how to group related items. If the expression is blank, a totalling is assumed (one result record). See discussion below about the reason for removal.
rt = orderBy(table, columnTable, [infoColumn])
Purpose: To sort results, or at least provide a sequence number. The orderBy function generates:
- An optional sequence number that starts at 1.
- If the orderBy is ASCending, 'A' will have a lower "order" number than 'Z' (assuming ASCII character set).
- If the orderBy is DESCending, 'A' will have a higher "order" number than 'Z' (assuming ASCII character set).
Virtual or actual table to performing ordering on.
Used to determine sorting criteria and ranking. The "order" column determines the sort ranking. If you need to sort on an expression instead of just columns, use the Calc operation first. If the column table is empty, sorting on the primary key is assumed. More on this in "Notes" below. Use "DESC" or "DESCENDING" as the expression for descending order.
An optional column name (string expression) that produces a column that stores the sequence number resulting from the sorting. Sorting is not generally recognized as a "true" relational operator. But, an operator that returns a sequence can still be a valid relational operator, and has many uses.
Notes: The primary key is assumed to be tacked on to the "end" of the sorting criteria. This ensures that the results are deterministic, a requirement of relational theory. Thus, if two columns are specified as the sorting criteria, the primary key is assumed to be the third column (and more if the key is compound). It also means that sorting is on the primary key if no explicit sorting criteria are given.
Sorting in the middle of a multi-operation query is generally futile if a specific output order is expected; for only the final operation affects the actual result order (if supported). However, the "infoColumn" value can still be used for other later computations if "in the middle" since it is a reference-able column in the resulting intermediate table or final result.
To ponder: Should it be "order" instead of "orderBy"?
rt = cols(stringExpr)
Purpose: Converts a string with column-related information into a Column Table. Generally considered a "syntactic convenience operation". See description given earlier [relink]. It may be abbreviated "c", which is a shortcut version of the same operation. This is because it is used so frequently.
(Square brackets are used instead for some examples. See TqlOverloading
rt = union(table_1, table_2)
Purpose: Combines (adds) two tables into one result table. Columns that don't match (based on name and type, if using a typed database) will be excluded from the result.
"Calc" and "Union" do not necessarily return unique results. If uniqueness is desired, then the "Unique" operator can be used. (Uniqueness of row sets is a controversial topic in the relational world. See HowOtherQueryLanguagesAddressSqlFlaws
for a discussion.)
rt = schema(table)
Purpose: Returns a Column Table with schema info. The column "col" will contain the column name and "expr" will contain column type info from the database engine. The type info depends on the specific database engine or driver. For example, an Oracle engine may return "VARCHAR2(30)". TQL does not endorse any particular type system. "Op" may also be populated to give info about very base types to reduce the need to parse and map the "expr" column. It may contain "Char" for character info, "Number" for numeric columns, and perhaps others such as "Date", depending on the database engine. TQL operations that match columns based on type, such as Union, generally will compare on base types, if available from the driver. This allows columns with specific types such as "Integer", "Decimal", and "Float" to all be merged or treated as the same general type.
- Maybe we could use a "subtraction" or "nonJoin" operator to make it easier to implement what would be "NOT IN" in SQL. One can use a left join and then use the empty cells to find the subtracted set, but that is kind of round-about. --top
- A "crossTab" operator(s) for cross-tab transformations.
- Graph traversal operator(s). (A sample is laying around here somewhere, I just forgot where it is.)
- Operators that can access imperative languages for imperative looping. See IteratorVersusQuery.
- An exclude(...) operator that removes given columns from the emitted columns:
x = calc(myTable, [a,b,c,d,e])
exclude(x, [c,d]) // result columns: a, b, e
I am leaning toward removing the expression parameter in the Group operation. The TqlColumnTable
parameter may be sufficient if we supply an ordering in the column-table. Those column-table columns without an operation would be considered the grouping critera. The only reason I can think of having an expression parameter or a second column table is if the order (as listed in column list) of the result columns differ from the order of the grouping columns. However, it is probably a rare enough case such that one can rely on the Calc() operation to adjust such when it does occur.
select count(*) as cnt, aaa, bbb, avg(ccc)
group by aaa, bbb
TQL (or SMEQL):
group(t, [count() cnt, aaa, bbb, avg(ccc)]) // square-bracket syntax variation
(Dots to prevent TabMunging
In some dialects of SQL, if you put a column in the Count function it will not count those rows that have Null in the column. The TQL approach shown here does not prevent such a convention.
Perhaps the closest SQL equivalents should be used for familiarity, such as "Select" for "Calc" and "Where" for "Filter". I chose names I felt were the most meaningful, but entrenched convention and familiarity may trump such. --top
Here's a draft sample for defining views, including user-defined viewed:
view myView(t: table, e:expression, s:scalar) // parameter types optional
x = filter(t, e)
orderBy(x, [a, b], s) // last operation returns result table
myResult = myView(foo, bar, zig)
Expressions are evaluated inside the function, not at the calling point. (I'll also consider curly braces over end-x block syntax.)
 If one wanted to join 3 tables, then they'd have to use two "join" statements. This may seem annoying at first, but other techniques can be used to minimize the need for such constructs. See TqlChainedJoin
 A way to dynamically generate an expression list from a TqlColumnTable
(virtual or actual) should perhaps be considered so that QueryByExample
is easier to implement. Perhaps it can be a function so that one can "chain" expressions for forming an expression string such as "qbe(myAndTable) or qbe(myOrTable) and not qbe(myNotQuery)". More pondering on this is needed. -t (Mar. 2012)
Future or alternative operations are mentioned or linked in the TqlRoadmap