Query Algebra
In this section, we sketch the design and implementation of the query algebra. We will need the following definitions.
using DataKnots:
@VectorTree,
@query,
Collect,
Count,
DataKnot,
Drop,
Each,
Environment,
Exists,
Filter,
First,
Get,
Given,
Group,
Is,
Is0to1,
Is0toN,
Is1to1,
Is1toN,
It,
Join,
Keep,
Label,
Last,
Let,
Lift,
Max,
Min,
Mix,
Nth,
Record,
Sort,
Sum,
Tag,
Take,
Unique,
assemble,
elements,
rewrite_all,
shape,
trivial_pipe,
target_pipe,
uncover
Example Dataset
As a running example, we will use the following dataset of city departments with associated employees. This dataset is serialized as a nested structure with a singleton root record, which holds all department records, each of which holds associated employee records.
chicago_data =
@VectorTree (department = [(name = (1:1)String,
employee = [(name = (1:1)String,
position = (1:1)String,
salary = (0:1)Int,
rate = (0:1)Float64)])],) [
(department = [
(name = "POLICE",
employee = ["JEFFERY A" "SERGEANT" 101442 missing
"NANCY A" "POLICE OFFICER" 80016 missing
"ANTHONY A" "POLICE OFFICER" 72510 missing
"ALBA M" "POLICE CADET" missing 9.46]),
(name = "FIRE",
employee = ["JAMES A" "FIRE ENGINEER-EMT" 103350 missing
"DANIEL A" "FIREFIGHTER-EMT" 95484 missing
"ROBERT K" "FIREFIGHTER-EMT" 103272 missing]),
(name = "OEMC",
employee = ["LAKENYA A" "CROSSING GUARD" missing 17.68
"DORIS A" "CROSSING GUARD" missing 19.38
"BRENDA B" "TRAFFIC CONTROL AIDE" 64392 missing])],
)
]
chicago = DataKnot(Any, chicago_data, :x1to1)
#=>
│ department{name,employee{name,position,salary,rate}} │
┼─────────────────────────────────────────────────────────────────────┼
│ POLICE, [JEFFERY A, SERGEANT, 101442, missing; NANCY A, POLICE OFFI…│
=#
Constructing Queries
In DataKnots, we query data by assembling and running Query
objects. Queries are constructed algebraically: they either come a set of atomic primitive queries, or are built from other queries using query combinators.
For example, consider the query:
Employees = Get(:department) >> Get(:employee)
#-> Get(:department) >> Get(:employee)
This query traverses the dataset through fields department and employee. It is constructed from two primitive queries Get(:department)
and Get(:employee)
connected using the query composition combinator >>
.
Since attribute traversal is so common, DataKnots provides a shorthand notation.
Employees = It.department.employee
#-> It.department.employee
To apply a query to a DataKnot
, we use indexing notation. The output of a query is also a DataKnot
.
chicago[Employees]
#=>
│ employee │
│ name position salary rate │
───┼────────────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 │
2 │ NANCY A POLICE OFFICER 80016 │
3 │ ANTHONY A POLICE OFFICER 72510 │
4 │ ALBA M POLICE CADET 9.46 │
5 │ JAMES A FIRE ENGINEER-EMT 103350 │
6 │ DANIEL A FIREFIGHTER-EMT 95484 │
7 │ ROBERT K FIREFIGHTER-EMT 103272 │
8 │ LAKENYA A CROSSING GUARD 17.68 │
9 │ DORIS A CROSSING GUARD 19.38 │
10 │ BRENDA B TRAFFIC CONTROL AIDE 64392 │
=#
Regular Julia values and functions could be used to create query components. Specifically, any Julia value could be converted to a query primitive, and any Julia function could be converted to a query combinator.
For example, let us find find employees whose salary is greater than $100k. For this purpose, we need to construct a predicate query that compares the salary field with a specific number.
If we were constructing an ordinary predicate function, we would write:
salary_over_100k(emp) = emp.salary > 100000
An equivalent query is constructed as follows:
SalaryOver100K = Lift(>, (Get(:salary), Lift(100000)))
#-> Lift(>, (Get(:salary), Lift(100000)))
This query expression is constructed from two primitive components: Get(:salary)
and Lift(100000)
, which serve as parameters of the Lift(>)
combinator. Here, Lift
is used twice. Lift
applied to a regular Julia value converts it to a constant query primitive while Lift
applied to a function lifts it to a query combinator.
As a shorthand notation for lifting functions and operators, DataKnots supports broadcasting syntax:
SalaryOver100K = It.salary .> 100000
#-> It.salary .> 100000
To test this query, we can append it to the Employees
query using the composition combinator.
chicago[Employees >> SalaryOver100K]
#=>
──┼───────┼
1 │ true │
2 │ false │
3 │ false │
4 │ true │
5 │ false │
6 │ true │
7 │ false │
=#
However, this only gives us a list of bare Boolean values disconnected from the respective employees. To contextualize this output, we can use the Record
combinator.
chicago[Employees >> Record(It.name,
It.salary,
:salary_over_100k => SalaryOver100K)]
#=>
│ employee │
│ name salary salary_over_100k │
───┼─────────────────────────────────────┼
1 │ JEFFERY A 101442 true │
2 │ NANCY A 80016 false │
3 │ ANTHONY A 72510 false │
4 │ ALBA M │
5 │ JAMES A 103350 true │
6 │ DANIEL A 95484 false │
7 │ ROBERT K 103272 true │
8 │ LAKENYA A │
9 │ DORIS A │
10 │ BRENDA B 64392 false │
=#
To actually filter the data using this predicate query, we need to use the Filter
combinator.
EmployeesWithSalaryOver100K = Employees >> Filter(SalaryOver100K)
#-> It.department.employee >> Filter(It.salary .> 100000)
chicago[EmployeesWithSalaryOver100K]
#=>
│ employee │
│ name position salary rate │
──┼────────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 │
2 │ JAMES A FIRE ENGINEER-EMT 103350 │
3 │ ROBERT K FIREFIGHTER-EMT 103272 │
=#
DataKnots provides a number of useful query constructors. For example, to find the number of items produced by a query, we can use the Count
combinator.
chicago[Count(EmployeesWithSalaryOver100K)]
#=>
┼───┼
│ 3 │
=#
In general, query algebra forms an XPath-like domain-specific language. It is designed to let the user construct queries incrementally, with each step being individually crafted and tested. It also encourages the user to create reusable query components and remix them in creative ways.
Compiling Queries
In DataKnots, applying a query to the input data is a two-phase process. First, the query generates a pipeline. Second, this pipeline transforms the input data to the output data.
Let us elaborate on the role of pipelines and queries. In DataKnots, just like pipelines are used to transform data, a query can transform pipelines. That is, a query can be applied to a pipeline to produce a new pipeline.
To run a query on the given data, we apply the query to a trivial pipeline. The generated pipeline is used to actually transform the data.
To demonstrate how to apply a query, let us use EmployeesWithSalaryOver100K
from the previous section. Recall that it could be represented as follows:
Get(:department) >> Get(:employee) >> Filter(Get(:salary) .> 100000)
#-> Get(:department) >> Get(:employee) >> Filter(Get(:salary) .> 100000)
This query is constructed using a composition combinator. A query composition transforms a pipeline by sequentially applying the component queries. Therefore, to find the pipeline of EmployeesWithSalaryOver100K
, we need to start with a trivial pipeline and sequentially tranfrorm it with the queries Get(:department)
, Get(:employee)
and Filter(SalaryOver100K)
.
The trivial pipeline can be obtained from the input data.
p0 = trivial_pipe(chicago)
#-> pass()
We use the function assemble()
to apply a query to a pipeline. To run assemble()
we need to create the environment object.
env = Environment()
p1 = assemble(env, p0, Get(:department))
#-> chain_of(with_elements(column(:department)), flatten())
The pipeline p1
fetches the attribute department from the input data. In general, Get(name)
maps a pipeline to its elementwise composition with column(name)
. For example, when we apply Get(:employee)
to p1
, what we get is the result of compose(p1, column(:employee))
.
p2 = assemble(env, p1, Get(:employee))
#=>
chain_of(chain_of(with_elements(column(:department)), flatten()),
chain_of(with_elements(column(:employee)), flatten()))
=#
To finish assembling the pipeline, we apply Filter(SalaryOver100K)
to p2
. Filter
acts on the input pipeline as follows. First, it assembles the predicate pipeline by applying the predicate query to a trivial pipeline.
pc0 = target_pipe(p2)
#-> wrap()
pc1 = assemble(env, pc0, SalaryOver100K)
#=>
chain_of(
wrap(),
chain_of(
with_elements(
chain_of(
chain_of(
⋮
tuple_lift(>)),
adapt_missing())),
flatten()))
=#
Filter(SalaryOver100K)
then combines the pipelines p2
and pc1
using the pipeline primitive sieve_by()
.
p3 = assemble(env, p2, Filter(SalaryOver100K))
#=>
chain_of(
chain_of(chain_of(with_elements(column(:department)), flatten()),
chain_of(with_elements(column(:employee)), flatten())),
chain_of(
with_elements(
chain_of(
⋮
sieve_by())),
flatten()))
=#
The resulting pipeline could be compacted by simplifying the pipeline expression.
p = rewrite_all(uncover(p3))
#=>
chain_of(with_elements(chain_of(column(:department),
with_elements(column(:employee)))),
flatten(),
flatten(),
with_elements(chain_of(tuple_of(pass(),
chain_of(column(:salary),
tuple_of(
pass(),
filler(100000)),
tuple_lift(>),
adapt_missing(),
block_any())),
sieve_by())),
flatten())
=#
Applying this pipeline to the input data gives us the output of the query.
p(chicago)
#=>
│ employee │
│ name position salary rate │
──┼────────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 │
2 │ JAMES A FIRE ENGINEER-EMT 103350 │
3 │ ROBERT K FIREFIGHTER-EMT 103272 │
=#
API Reference
DataKnots.Environment
— TypeEnvironment()
Query compilation state.
DataKnots.Query
— TypeQuery(op, args...)
A query is implemented as a pipeline transformation that preserves pipeline source. Specifically, a query takes the input pipeline that maps the source to the input target and generates a pipeline that maps the source to the output target.
Parameter op
is a function that performs the transformation; args
are extra arguments passed to the function.
The query transforms an input pipeline p
by invoking op
with the following arguments:
op(env::Environment, q::Pipeline, args...)
The result of op
must be the output pipeline.
Base.getindex
— Methoddb::DataKnot[F::Query; params...] :: DataKnot
Queries db
with F
.
Test Suite
Querying
A Query
is applied to a DataKnot
using the array indexing syntax.
Q = Count(It.department)
chicago[Q]
#=>
┼───┼
│ 3 │
=#
Any parameters to the query should be be passed as keyword arguments.
Q = It.department >>
Filter(Count(It.employee >> Filter(It.salary .> It.AMT)) .>= It.SZ) >>
Count
chicago[Q, AMT=100000, SZ=1]
#=>
┼───┼
│ 2 │
=#
We can use the function assemble()
to see the query plan.
p = assemble(chicago, Count(It.department))
#=>
with_elements(chain_of(column(:department), block_length()))
=#
p(chicago)
#=>
┼───┼
│ 3 │
=#
@query
Query
objects could be constructed using a convenient notation provided by the macro @query
. For example, the query Count(It.department)
could also be written as:
@query count(department)
#-> Count(Get(:department))
The @query
macro could also be used to apply the constructed query to a DataKnot
.
@query chicago count(department)
#=>
┼───┼
│ 3 │
=#
Query parameters could be passed as keyword arguments.
@query chicago AMT=100000 SZ=1 begin
department
filter(count(employee.filter(salary > AMT)) >= SZ)
count()
end
#=>
┼───┼
│ 2 │
=#
The following syntax is recognized by the @query
macro.
A bare field identifier can be used to extract the value of the given field.
@query department
#-> Get(:department)
A sequence of statements in a begin
/end
block becomes a composition of queries.
@query begin
department
employee
salary
max()
end
#-> Get(:department) >> Get(:employee) >> Get(:salary) >> Then(Max)
Expressions separated by .
are also converted to query composition.
@query department.employee.salary.max()
#-> Get(:department) >> Get(:employee) >> Get(:salary) >> Then(Max)
The let
clause is converted to the Given
combinator.
@query begin
department
let max_salary => max(employee.salary)
employee
filter(salary == max_salary)
end
end
#=>
Get(:department) >>
Given(Max(Get(:employee) >> Get(:salary)) >> Label(:max_salary),
Get(:employee) >>
Filter(Lift(==, (Get(:salary), Get(:max_salary)))))
=#
Curly brackets are converted to the Record
combinator.
@query begin
department
{ name, count(employee) }
end
#-> Get(:department) >> Record(Get(:name), Count(Get(:employee)))
@query department.{name, count(employee)}
#-> Get(:department) >> Record(Get(:name), Count(Get(:employee)))
@query department{name, count(employee)}
#-> Get(:department) >> Record(Get(:name), Count(Get(:employee)))
The Pair
constructor =>
can be used for label assignment.
@query size => count(employee)
#-> Count(Get(:employee)) >> Label(:size)
Constants, functions and operators are automatically lifted.
@query department.titlecase(name)
#-> Get(:department) >> Lift(titlecase, (Get(:name),))
@query employee.filter(salary > 100_000)
#-> Get(:employee) >> Filter(Lift(>, (Get(:salary), Lift(100000))))
Logical operators are comparison chains are also supported.
@query employee.filter(50_000 < salary < 100_000)
#=>
Get(:employee) >> Filter(Lift(&,
(Lift(<, (Lift(50000), Get(:salary))),
Lift(<, (Get(:salary), Lift(100000))))))
=#
@query employee.filter(50_000 < salary && salary < 100_000)
#=>
Get(:employee) >> Filter(Lift(&,
(Lift(<, (Lift(50000), Get(:salary))),
Lift(<, (Get(:salary), Lift(100000))))))
=#
@query employee.filter(salary < 50_000 || salary > 100_000)
#=>
Get(:employee) >> Filter(Lift(|,
(Lift(<, (Get(:salary), Lift(50000))),
Lift(>, (Get(:salary), Lift(100000))))))
=#
Queries defined elsewhere could be embedded in a @query
expression using interpolation syntax ($
).
Size = @query count(employee)
#-> Count(Get(:employee))
@query department{name, $Size}
#-> Get(:department) >> Record(Get(:name), Count(Get(:employee)))
Composition
Queries can be composed sequentially using the >>
combinator.
Q = Lift(3) >> (It .+ 4) >> (It .* 6)
#-> Lift(3) >> (It .+ 4) >> (It .* 6)
chicago[Q]
#=>
┼────┼
│ 42 │
=#
The It
query primitive is the identity with respect to >>
.
Q = It >> Q >> It
#-> It >> Lift(3) >> (It .+ 4) >> (It .* 6) >> It
chicago[Q]
#=>
┼────┼
│ 42 │
=#
In @query
notation, the identity query is called it
.
@query it
#-> It
Composition of queries is written as a sequence of statements in a begin
/end
block.
@query begin
3
it + 4
it * 6
end
#-> Lift(3) >> Lift(+, (It, Lift(4))) >> Lift(*, (It, Lift(6)))
@query (3; it + 4; it * 6)
#-> Lift(3) >> Lift(+, (It, Lift(4))) >> Lift(*, (It, Lift(6)))
Alternatively, the .
symbol is used as the composition combinator.
@query (3).(it + 4).(it * 6)
#-> Lift(3) >> Lift(+, (It, Lift(4))) >> Lift(*, (It, Lift(6)))
Record
The query Record(X₁, X₂ … Xₙ)
emits records with the fields generated by X₁
, X₂
… Xₙ
.
Q = It.department >>
Record(It.name,
:size => Count(It.employee))
#-> It.department >> Record(It.name, :size => Count(It.employee))
chicago[Q]
#=>
│ department │
│ name size │
──┼──────────────┼
1 │ POLICE 4 │
2 │ FIRE 3 │
3 │ OEMC 3 │
=#
If a field has no label, an ordinal label (#A
, #B
… #AA
, #AB
…) is assigned.
Q = It.department >> Record(It.name, Count(It.employee))
#-> It.department >> Record(It.name, Count(It.employee))
chicago[Q]
#=>
│ department │
│ name #B │
──┼────────────┼
1 │ POLICE 4 │
2 │ FIRE 3 │
3 │ OEMC 3 │
=#
Similarly, when there are duplicate labels, only the last one survives.
Q = It.department >> Record(It.name, It.employee.name)
#-> It.department >> Record(It.name, It.employee.name)
chicago[Q]
#=>
│ department │
│ #A name │
──┼───────────────────────────────────────────────┼
1 │ POLICE JEFFERY A; NANCY A; ANTHONY A; ALBA M │
2 │ FIRE JAMES A; DANIEL A; ROBERT K │
3 │ OEMC LAKENYA A; DORIS A; BRENDA B │
=#
In @query
notation, Record(X₁, X₂ … Xₙ)
is written as record(X₁, X₂ … Xₙ)
.
@query department.record(name, size => count(employee))
#=>
Get(:department) >> Record(Get(:name),
Count(Get(:employee)) >> Label(:size))
=#
Alternatively, we could use the {}
brackets.
@query {count(department), max(department.count(employee))}
#=>
Record(Count(Get(:department)),
Max(Get(:department) >> Count(Get(:employee))))
=#
When {}
is used in composition, the composition operator .
could be omitted.
@query department.{name, size => count(employee)}
#=>
Get(:department) >> Record(Get(:name),
Count(Get(:employee)) >> Label(:size))
=#
@query department{name, size => count(employee)}
#=>
Get(:department) >> Record(Get(:name),
Count(Get(:employee)) >> Label(:size))
=#
Collect
The query Collect(X)
adds a new field to the input record.
Q = It.department >> Collect(:size => Count(It.employee))
#-> It.department >> Collect(:size => Count(It.employee))
chicago[Q]
#=>
│ department │
│ name employee{name,position,salary,rate} size │
──┼───────────────────────────────────────────────────────────────────┼
1 │ POLICE JEFFERY A, SERGEANT, 101442, missing; NANCY A, POLI… 4 │
2 │ FIRE JAMES A, FIRE ENGINEER-EMT, 103350, missing; DANIEL… 3 │
3 │ OEMC LAKENYA A, CROSSING GUARD, missing, 17.68; DORIS A,… 3 │
=#
More than one field could be added at the same time.
Q = It.department >>
Collect(:size => Count(It.employee),
:avg_salary => Sum(It.employee.salary) ./ It.size)
chicago[Q]
#=>
│ department │
│ name employee{name,position,salary,rate} size avg_salary │
──┼───────────────────────────────────────────────────────────────────┼
1 │ POLICE JEFFERY A, SERGEANT, 101442, missing; N… 4 63492.0 │
2 │ FIRE JAMES A, FIRE ENGINEER-EMT, 103350, mis… 3 100702.0 │
3 │ OEMC LAKENYA A, CROSSING GUARD, missing, 17.… 3 21464.0 │
=#
If the new field has no label, an ordinal label will be assigned to it.
Q = It.department >> Collect(Count(It.employee))
chicago[Q]
#=>
│ department │
│ name employee{name,position,salary,rate} #C │
──┼───────────────────────────────────────────────────────────────────┼
1 │ POLICE JEFFERY A, SERGEANT, 101442, missing; NANCY A, POLICE… 4 │
2 │ FIRE JAMES A, FIRE ENGINEER-EMT, 103350, missing; DANIEL A… 3 │
3 │ OEMC LAKENYA A, CROSSING GUARD, missing, 17.68; DORIS A, C… 3 │
=#
If the record already has a field with the same name, that field is replaced with the new field.
Q = It.department >> Collect(:employee => It.employee.name >> titlecase.(It),
:name => It.name >> titlecase.(It))
chicago[Q]
#=>
│ department │
│ name employee │
──┼───────────────────────────────────────────────┼
1 │ Police Jeffery A; Nancy A; Anthony A; Alba M │
2 │ Fire James A; Daniel A; Robert K │
3 │ Oemc Lakenya A; Doris A; Brenda B │
=#
To remove a field from a record, replace it with the value nothing
.
Q = It.department >> Collect(:size => Count(It.employee),
:employee => nothing)
chicago[Q]
#=>
│ department │
│ name size │
──┼──────────────┼
1 │ POLICE 4 │
2 │ FIRE 3 │
3 │ OEMC 3 │
=#
Collect
can be used as an aggregate primitive.
Q = It.department.employee >> Collect
chicago[Q]
#=>
│ department{name,employee{name,pos… employee{name,position,salary,ra…│
┼─────────────────────────────────────────────────────────────────────┼
│ POLICE, [JEFFERY A, SERGEANT, 101… JEFFERY A, SERGEANT, 101442, mis…│
=#
In @query
notation, Collect(X)
is written as collect(X)
.
@query department.collect(size => count(employee), employee => nothing)
#=>
Get(:department) >> Collect(Count(Get(:employee)) >> Label(:size),
Lift(nothing) >> Label(:employee))
=#
The aggregate primitive Collect
is written as collect()
.
@query department.employee.collect()
#-> Get(:department) >> Get(:employee) >> Then(Collect)
Join
Join(X)
, just like Collect(X)
, adds a field to the input record. As opposed to Collect
, Join(X)
evaluates its argument against the input source.
Q = It.department >> Each(It.employee >> Join(:dept_name => It.name))
#-> It.department >> Each(It.employee >> Join(:dept_name => It.name))
chicago[Q]
#=>
│ employee │
│ name position salary rate dept_name │
───┼───────────────────────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 POLICE │
2 │ NANCY A POLICE OFFICER 80016 POLICE │
3 │ ANTHONY A POLICE OFFICER 72510 POLICE │
4 │ ALBA M POLICE CADET 9.46 POLICE │
5 │ JAMES A FIRE ENGINEER-EMT 103350 FIRE │
6 │ DANIEL A FIREFIGHTER-EMT 95484 FIRE │
7 │ ROBERT K FIREFIGHTER-EMT 103272 FIRE │
8 │ LAKENYA A CROSSING GUARD 17.68 OEMC │
9 │ DORIS A CROSSING GUARD 19.38 OEMC │
10 │ BRENDA B TRAFFIC CONTROL AIDE 64392 OEMC │
=#
At the same time, Join(X)
uses the target source, which allows us to correlate the joined field with the input data.
Q = It.department.employee >>
Filter(Exists(It.salary)) >>
Keep(:the_salary => It.salary) >>
Join(:rank => Count(It.department.employee >> Filter(It.salary .>= It.the_salary)))
chicago[Q]
#=>
│ employee │
│ name position salary rate rank │
──┼─────────────────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 3 │
2 │ NANCY A POLICE OFFICER 80016 5 │
3 │ ANTHONY A POLICE OFFICER 72510 6 │
4 │ JAMES A FIRE ENGINEER-EMT 103350 1 │
5 │ DANIEL A FIREFIGHTER-EMT 95484 4 │
6 │ ROBERT K FIREFIGHTER-EMT 103272 2 │
7 │ BRENDA B TRAFFIC CONTROL AIDE 64392 7 │
=#
If the new field has no label, it will have an ordinal label assigned to it.
Q = It.department >>
Keep(:the_size => Count(It.employee)) >>
Join(Count(It.department >> Filter(Count(It.employee) .>= It.the_size)))
chicago[Q]
#=>
│ department │
│ name employee{name,position,salary,rate} #C │
──┼───────────────────────────────────────────────────────────────────┼
1 │ POLICE JEFFERY A, SERGEANT, 101442, missing; NANCY A, POLICE… 1 │
2 │ FIRE JAMES A, FIRE ENGINEER-EMT, 103350, missing; DANIEL A… 3 │
3 │ OEMC LAKENYA A, CROSSING GUARD, missing, 17.68; DORIS A, C… 3 │
=#
If the record already has a field with the same name, that field is replaced with the new field.
Q = It.department >>
Each(It.employee >>
Keep(:the_position => It.position) >>
Join(:position => It.the_position .* " (" .* It.name .* ")"))
chicago[Q]
#=>
│ employee │
│ name position salary rate │
───┼───────────────────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT (POLICE) 101442 │
2 │ NANCY A POLICE OFFICER (POLICE) 80016 │
3 │ ANTHONY A POLICE OFFICER (POLICE) 72510 │
4 │ ALBA M POLICE CADET (POLICE) 9.46 │
5 │ JAMES A FIRE ENGINEER-EMT (FIRE) 103350 │
6 │ DANIEL A FIREFIGHTER-EMT (FIRE) 95484 │
7 │ ROBERT K FIREFIGHTER-EMT (FIRE) 103272 │
8 │ LAKENYA A CROSSING GUARD (OEMC) 17.68 │
9 │ DORIS A CROSSING GUARD (OEMC) 19.38 │
10 │ BRENDA B TRAFFIC CONTROL AIDE (OEMC) 64392 │
=#
In @query
notation, Join(X)
is written as join(X)
.
@query department.each(employee.join(dept_name => name))
#=>
Get(:department) >> Each(Get(:employee) >> Join(Get(:name) >>
Label(:dept_name)))
=#
Mix
The query Mix(X₁, X₂ … Xₙ)
emits records containing all combinations of elements generated by X₁
, X₂
… Xₙ
.
Q = It.department >> Mix(It.name, It.employee)
#-> It.department >> Mix(It.name, It.employee)
chicago[Q]
#=>
│ department │
│ name employee{name,position,salary,rate} │
───┼────────────────────────────────────────────────────────┼
1 │ POLICE JEFFERY A, SERGEANT, 101442, missing │
2 │ POLICE NANCY A, POLICE OFFICER, 80016, missing │
3 │ POLICE ANTHONY A, POLICE OFFICER, 72510, missing │
4 │ POLICE ALBA M, POLICE CADET, missing, 9.46 │
5 │ FIRE JAMES A, FIRE ENGINEER-EMT, 103350, missing │
6 │ FIRE DANIEL A, FIREFIGHTER-EMT, 95484, missing │
7 │ FIRE ROBERT K, FIREFIGHTER-EMT, 103272, missing │
8 │ OEMC LAKENYA A, CROSSING GUARD, missing, 17.68 │
9 │ OEMC DORIS A, CROSSING GUARD, missing, 19.38 │
10 │ OEMC BRENDA B, TRAFFIC CONTROL AIDE, 64392, missing │
=#
When a field has no label, an ordinal label is assigned.
Q = It.department >> Mix(It.name, It.employee.rate >> round.(It))
chicago[Q]
#=>
│ department │
│ name #B │
──┼──────────────┼
1 │ POLICE 9.0 │
2 │ OEMC 18.0 │
3 │ OEMC 19.0 │
=#
Similarly, duplicate fields are replaced by ordinal labels.
Q = It.department >> Mix(It.name, It.employee.name)
chicago[Q]
#=>
│ department │
│ #A name │
───┼───────────────────┼
1 │ POLICE JEFFERY A │
2 │ POLICE NANCY A │
3 │ POLICE ANTHONY A │
4 │ POLICE ALBA M │
5 │ FIRE JAMES A │
6 │ FIRE DANIEL A │
7 │ FIRE ROBERT K │
8 │ OEMC LAKENYA A │
9 │ OEMC DORIS A │
10 │ OEMC BRENDA B │
=#
In @query
notation, Mix(X₁, X₂ … Xₙ)
is written as mix(X₁, X₂ … Xₙ)
.
@query department.mix(name, employee)
#-> Get(:department) >> Mix(Get(:name), Get(:employee))
Lift
The Lift
constructor is used to convert Julia values and functions to queries.
Lift(val)
makes a query primitive from a Julia value.
Q = Lift("Hello World!")
#-> Lift("Hello World!")
chicago[Q]
#=>
┼──────────────┼
│ Hello World! │
=#
Lifting missing
produces no output.
Q = Lift(missing)
#-> Lift(missing)
chicago[Q]
#=>
(empty)
=#
Lifting a vector produces plural output.
Q = Lift('a':'c')
#-> Lift('a':1:'c')
chicago[Q]
#=>
──┼───┼
1 │ a │
2 │ b │
3 │ c │
=#
When lifting a vector, we can specify the cardinality constraint.
Q = Lift('a':'c', :x1toN)
#-> Lift('a':1:'c', :x1toN)
chicago[Q]
#=>
──┼───┼
1 │ a │
2 │ b │
3 │ c │
=#
Lift
can also convert Julia functions to query combinators.
Inc(X) = Lift(x -> x+1, (X,))
Q = Lift(0) >> Inc(It)
#-> Lift(0) >> Lift(x -> x + 1, (It,))
chicago[Q]
#=>
┼───┼
│ 1 │
=#
Functions of multiple arguments are also supported.
GT(X, Y) = Lift(>, (X, Y))
Q = It.department.employee >>
Record(It.name, It.salary, GT(It.salary, 100000))
#=>
It.department.employee >>
Record(It.name, It.salary, Lift(>, (It.salary, 100000)))
=#
chicago[Q]
#=>
│ employee │
│ name salary #C │
───┼──────────────────────────┼
1 │ JEFFERY A 101442 true │
2 │ NANCY A 80016 false │
3 │ ANTHONY A 72510 false │
4 │ ALBA M │
5 │ JAMES A 103350 true │
6 │ DANIEL A 95484 false │
7 │ ROBERT K 103272 true │
8 │ LAKENYA A │
9 │ DORIS A │
10 │ BRENDA B 64392 false │
=#
Just as functions with no arguments.
Q = Lift(rand, ())
#-> Lift(rand, ())
#? VERSION < v"1.7.0-DEV"
using Random: seed!
seed!(0)
chicago[Q]
#=>
┼──────────┼
│ 0.823648 │
=#
#? VERSION >= v"1.7.0-DEV"
using Random: seed!
seed!(0)
chicago[Q]
#=>
┼──────────┼
│ 0.405699 │
=#
Functions with vector arguments are supported.
using Statistics: mean
Mean(X) = Lift(mean, (X,))
Q = Mean(It.department.employee.salary)
#-> Lift(mean, (It.department.employee.salary,))
chicago[Q]
#=>
┼─────────┼
│ 88638.0 │
=#
Just like with regular values, missing
and vector results are interpreted as no and plural output.
Q = Inc(missing)
#-> Lift(x -> x + 1, (missing,))
chicago[Q]
#=>
(empty)
=#
OneTo(N) = Lift(UnitRange, (1, N))
Q = OneTo(3)
#-> Lift(UnitRange, (1, 3))
chicago[Q]
#=>
──┼───┼
1 │ 1 │
2 │ 2 │
3 │ 3 │
=#
Julia functions are lifted when they are broadcasted over queries.
Q = mean.(It.department.employee.salary)
#-> mean.(It.department.employee.salary)
chicago[Q]
#=>
┼─────────┼
│ 88638.0 │
=#
In @query
notation, values and functions are lifted automatically.
@query "Hello World!"
#-> Lift("Hello World!")
@query missing
#-> Lift(missing)
@query 'a':'c'
#-> Lift(Colon, (Lift('a'), Lift('c')))
@query (0; it + 1)
#-> Lift(0) >> Lift(+, (It, Lift(1)))
@query department.employee{name, salary, salary > 100000}
#=>
Get(:department) >>
Get(:employee) >>
Record(Get(:name), Get(:salary), Lift(>, (Get(:salary), Lift(100000))))
=#
@query mean(department.employee.salary)
#-> Lift(mean, (Get(:department) >> Get(:employee) >> Get(:salary),))
Query-valued functions are also supported. They are not lifted, but applied immediately.
increment(x) = @query $x + 1
@query $increment(1)
#-> Lift(+, (Lift(1), Lift(1)))
Query value functions could also be defined via Lift
.
increment(x) = Lift(+, (x, 1))
@query $increment(1 + 1)
#-> Lift(+, (Lift(+, (Lift(1), Lift(1))), 1))
Each
Each
serves as a barrier for aggregate queries.
Q = It.department >> (It.employee >> Count)
#-> It.department >> It.employee >> Count
chicago[Q]
#=>
┼────┼
│ 10 │
=#
Q = It.department >> Each(It.employee >> Count)
#-> It.department >> Each(It.employee >> Count)
chicago[Q]
#=>
──┼───┼
1 │ 4 │
2 │ 3 │
3 │ 3 │
=#
Note that Record
and Lift
also serve as natural barriers for aggregate queries.
Q = It.department >>
Record(It.name, It.employee >> Count)
#-> It.department >> Record(It.name, It.employee >> Count)
chicago[Q]
#=>
│ department │
│ name #B │
──┼────────────┼
1 │ POLICE 4 │
2 │ FIRE 3 │
3 │ OEMC 3 │
=#
Q = It.department >>
(1 .* (It.employee >> Count))
#-> It.department >> (1 .* It.employee >> Count)
chicago[Q]
#=>
──┼───┼
1 │ 4 │
2 │ 3 │
3 │ 3 │
=#
In @query
notation, Each(X)
is written as each(X)
.
@query department.each(employee.count())
#-> Get(:department) >> Each(Get(:employee) >> Then(Count))
Label
We use the Label()
primitive to assign a label to the output.
Q = Count(It.department) >> Label(:num_dept)
#-> Count(It.department) >> Label(:num_dept)
chicago[Q]
#=>
│ num_dept │
┼──────────┼
│ 3 │
=#
As a shorthand, we can use =>
.
Q = :num_dept => Count(It.department)
#-> :num_dept => Count(It.department)
chicago[Q]
#=>
│ num_dept │
┼──────────┼
│ 3 │
=#
In @query
notation, we could use label(name)
or =>
syntax.
@query count(department).label(num_dept)
#-> Count(Get(:department)) >> Label(:num_dept)
@query num_dept => count(department)
#-> Count(Get(:department)) >> Label(:num_dept)
Tag
We use Tag()
constructor to assign a name to a query.
DeptSize = Count(It.employee) >> Label(:dept_size)
#-> Count(It.employee) >> Label(:dept_size)
DeptSize = Tag(:DeptSize, DeptSize)
#-> DeptSize
Q = It.department >> Record(It.name, DeptSize)
#-> It.department >> Record(It.name, DeptSize)
chicago[Q]
#=>
│ department │
│ name dept_size │
──┼───────────────────┼
1 │ POLICE 4 │
2 │ FIRE 3 │
3 │ OEMC 3 │
=#
Tag()
is also used to assign a name to a query combinator.
SalaryOver(X) = It.salary .> X
SalaryOver(100000)
#-> It.salary .> 100000
SalaryOver(X) = Tag(SalaryOver, (X,), It.salary .> X)
SalaryOver(100000)
#-> SalaryOver(100000)
Q = It.department.employee >>
Filter(SalaryOver(100000))
#-> It.department.employee >> Filter(SalaryOver(100000))
chicago[Q]
#=>
│ employee │
│ name position salary rate │
──┼────────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 │
2 │ JAMES A FIRE ENGINEER-EMT 103350 │
3 │ ROBERT K FIREFIGHTER-EMT 103272 │
=#
Get
We use the Get(name)
to extract the value of a record field.
Q = Get(:department) >> Get(:name)
#-> Get(:department) >> Get(:name)
chicago[Q]
#=>
│ name │
──┼────────┼
1 │ POLICE │
2 │ FIRE │
3 │ OEMC │
=#
As a shorthand, extracting an attribute of It
generates a Get()
query.
Q = It.department.name
#-> It.department.name
chicago[Q]
#=>
│ name │
──┼────────┼
1 │ POLICE │
2 │ FIRE │
3 │ OEMC │
=#
We can also extract fields that have ordinal labels, but the label name is not preserved.
Q = It.department >>
Record(It.name, Count(It.employee)) >>
It.B
chicago[Q]
#=>
──┼───┼
1 │ 4 │
2 │ 3 │
3 │ 3 │
=#
Same notation is used to extract values of context parameters defined with Keep()
or Given()
.
Q = It.department >>
Keep(:dept_name => It.name) >>
It.employee >>
Record(It.dept_name, It.name)
chicago[Q]
#=>
│ employee │
│ dept_name name │
───┼──────────────────────┼
1 │ POLICE JEFFERY A │
2 │ POLICE NANCY A │
3 │ POLICE ANTHONY A │
4 │ POLICE ALBA M │
5 │ FIRE JAMES A │
6 │ FIRE DANIEL A │
7 │ FIRE ROBERT K │
8 │ OEMC LAKENYA A │
9 │ OEMC DORIS A │
10 │ OEMC BRENDA B │
=#
A context parameter is preferred if it has the same name as a record field.
Q = It.department >>
Keep(It.name) >>
It.employee >>
Record(It.name, It.position)
chicago[Q]
#=>
│ employee │
│ name position │
───┼──────────────────────────────┼
1 │ POLICE SERGEANT │
2 │ POLICE POLICE OFFICER │
3 │ POLICE POLICE OFFICER │
4 │ POLICE POLICE CADET │
5 │ FIRE FIRE ENGINEER-EMT │
6 │ FIRE FIREFIGHTER-EMT │
7 │ FIRE FIREFIGHTER-EMT │
8 │ OEMC CROSSING GUARD │
9 │ OEMC CROSSING GUARD │
10 │ OEMC TRAFFIC CONTROL AIDE │
=#
If there is no attribute with the given name, an error is reported.
Q = It.department.employee.ssn
chicago[Q]
#=>
ERROR: cannot find "ssn" at
(0:N) × (name = (1:1) × String, position = (1:1) × String, salary = (0:1) × Int64, rate = (0:1) × Float64)
=#
Regular and named tuples also support attribute lookup.
Q = Lift((name = "JEFFERY A", position = "SERGEANT", salary = 101442)) >>
It.position
chicago[Q]
#=>
│ position │
┼──────────┼
│ SERGEANT │
=#
Q = Lift((name = "JEFFERY A", position = "SERGEANT", salary = 101442)) >>
It.ssn
chicago[Q]
#=>
ERROR: cannot find "ssn" at
(1:1) × NamedTuple{(:name, :position, :salary), Tuple{String, String, Int64}}
=#
Q = Lift(("JEFFERY A", "SERGEANT", 101442)) >>
It.B
chicago[Q]
#=>
┼──────────┼
│ SERGEANT │
=#
Q = Lift(("JEFFERY A", "SERGEANT", 101442)) >>
It.Z
chicago[Q]
#=>
ERROR: cannot find "Z" at
(1:1) × Tuple{String, String, Int64}
=#
When applied to a dictionary with string keys, Get(name)
extracts the value corresponding to name
. If the dictionary does not contain the given key, missing
is returned.
Q = Lift(Dict("name" => "JEFFERY A", "position" => "SERGEANT")) >>
It.position
chicago[Q]
#=>
│ position │
┼──────────┼
│ SERGEANT │
=#
Q = Lift(Dict("name" => "JEFFERY A", "position" => "SERGEANT")) >>
It.ssn
chicago[Q]
#=>
│ ssn │
┼─────┼
(empty)
=#
In @query
notation, Get(:name)
is written as name
.
@query department.name
#-> Get(:department) >> Get(:name)
Keep
and Given
We use the combinator Keep()
to assign a value to a context parameter.
Q = It.department >>
Keep(:dept_name => It.name) >>
It.employee >>
Record(It.dept_name, It.name)
#=>
It.department >>
Keep(:dept_name => It.name) >>
It.employee >>
Record(It.dept_name, It.name)
=#
chicago[Q]
#=>
│ employee │
│ dept_name name │
───┼──────────────────────┼
1 │ POLICE JEFFERY A │
2 │ POLICE NANCY A │
3 │ POLICE ANTHONY A │
4 │ POLICE ALBA M │
5 │ FIRE JAMES A │
6 │ FIRE DANIEL A │
7 │ FIRE ROBERT K │
8 │ OEMC LAKENYA A │
9 │ OEMC DORIS A │
10 │ OEMC BRENDA B │
=#
Several context parameters could be defined together.
Q = It.department >>
Keep(:size => Count(It.employee),
:half => It.size .÷ 2) >>
Each(It.employee >> Take(It.half))
chicago[Q]
#=>
│ employee │
│ name position salary rate │
──┼─────────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 │
2 │ NANCY A POLICE OFFICER 80016 │
3 │ JAMES A FIRE ENGINEER-EMT 103350 │
4 │ LAKENYA A CROSSING GUARD 17.68 │
=#
Keep()
requires that the parameter is labeled.
Q = It.department >>
Keep(Count(It.employee))
chicago[Q]
#-> ERROR: parameter name is not specified
Keep()
will override an existing parameter with the same name.
Q = It.department >>
Keep(:current_name => It.name) >>
It.employee >>
Filter(It.current_name .== "POLICE") >>
Keep(:current_name => It.name) >>
It.current_name
chicago[Q]
#=>
│ current_name │
──┼──────────────┼
1 │ JEFFERY A │
2 │ NANCY A │
3 │ ANTHONY A │
4 │ ALBA M │
=#
Combinator Given()
is used to evaluate a query with the given context parameters.
Q = It.department >>
Given(:size => Count(It.employee),
:half => It.size .÷ 2,
It.employee >> Take(It.half))
#=>
It.department >> Given(:size => Count(It.employee),
:half => div.(It.size, 2),
It.employee >> Take(It.half))
=#
chicago[Q]
#=>
│ employee │
│ name position salary rate │
──┼─────────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 │
2 │ NANCY A POLICE OFFICER 80016 │
3 │ JAMES A FIRE ENGINEER-EMT 103350 │
4 │ LAKENYA A CROSSING GUARD 17.68 │
=#
Given()
does not let any parameters defined within its scope escape it.
Q = It.department >>
Given(Keep(It.name)) >>
It.employee >>
It.name
chicago[Q]
#=>
│ name │
───┼───────────┼
1 │ JEFFERY A │
2 │ NANCY A │
3 │ ANTHONY A │
4 │ ALBA M │
5 │ JAMES A │
6 │ DANIEL A │
7 │ ROBERT K │
8 │ LAKENYA A │
9 │ DORIS A │
10 │ BRENDA B │
=#
Given
has an alias called Let
.
Let
#-> DataKnots.Given
In @query
notation, Keep(X)
and Given(X, Q)
are written as keep(X)
and given(X, Q)
.
@query department.keep(dept_name => name).employee{dept_name, name}
#=>
Get(:department) >>
Keep(Get(:name) >> Label(:dept_name)) >>
Get(:employee) >>
Record(Get(:dept_name), Get(:name))
=#
@query begin
department
given(size => count(employee),
half => size ÷ 2,
employee.take(half))
end
#=>
Get(:department) >> Given(Count(Get(:employee)) >> Label(:size),
Lift(div, (Get(:size), Lift(2))) >>
Label(:half),
Get(:employee) >> Take(Get(:half)))
=#
Alternatively, the let
clause is translated to a Given
expression.
@query begin
department
let dept_name => name
employee{dept_name, name}
end
end
#=>
Get(:department) >> Given(Get(:name) >> Label(:dept_name),
Get(:employee) >> Record(Get(:dept_name),
Get(:name)))
=#
@query begin
department
let size => count(employee), half => size ÷ 2
employee.take(half)
end
end
#=>
Get(:department) >> Given(Count(Get(:employee)) >> Label(:size),
Lift(div, (Get(:size), Lift(2))) >>
Label(:half),
Get(:employee) >> Take(Get(:half)))
=#
Count
, Exists
, Sum
, Max
, Min
Count(X)
, Sum(X)
, Max(X)
, Min(X)
evaluate the X
and emit the number of elements, their sum, maximum, and minimum respectively.
Rate = It.department.employee.rate
Q = Record(Rate,
:count => Count(Rate),
:sum => Sum(Rate),
:max => Max(Rate),
:min => Min(Rate))
#=>
Record(It.department.employee.rate,
:count => Count(It.department.employee.rate),
:sum => Sum(It.department.employee.rate),
:max => Max(It.department.employee.rate),
:min => Min(It.department.employee.rate))
=#
chicago[Q]
#=>
│ rate count sum max min │
┼───────────────────────────────────────────────┼
│ 9.46; 17.68; 19.38 3 46.52 19.38 9.46 │
=#
Count
, Sum
, Max
, and Min
could also be used as aggregate primitives.
Q = Record(Rate,
:count => Rate >> Count,
:sum => Rate >> Sum,
:max => Rate >> Max,
:min => Rate >> Min)
#=>
Record(It.department.employee.rate,
:count => It.department.employee.rate >> Count,
:sum => It.department.employee.rate >> Sum,
:max => It.department.employee.rate >> Max,
:min => It.department.employee.rate >> Min)
=#
chicago[Q]
#=>
│ rate count sum max min │
┼───────────────────────────────────────────────┼
│ 9.46; 17.68; 19.38 3 46.52 19.38 9.46 │
=#
When applied to an empty input, Sum
emits 0
, Min
and Max
emit no output.
Rate = It.employee.rate
Q = It.department >>
Record(It.name,
Rate,
:count => Count(Rate),
:sum => Sum(Rate),
:max => Max(Rate),
:min => Min(Rate))
chicago[Q]
#=>
│ department │
│ name rate count sum max min │
──┼──────────────────────────────────────────────────┼
1 │ POLICE 9.46 1 9.46 9.46 9.46 │
2 │ FIRE 0 0.0 │
3 │ OEMC 17.68; 19.38 2 37.06 19.38 17.68 │
=#
Exists(X)
evaluates X
and emits a Boolean value that indicates whether X
produces at least one value or not.
Q = It.department.employee >>
Record(It.name,
It.salary,
:has_salary => Exists(It.salary),
It.rate,
:has_rate => It.rate >> Exists)
#=>
It.department.employee >> Record(It.name,
It.salary,
:has_salary => Exists(It.salary),
It.rate,
:has_rate => It.rate >> Exists)
=#
chicago[Q]
#=>
│ employee │
│ name salary has_salary rate has_rate │
───┼────────────────────────────────────────────────┼
1 │ JEFFERY A 101442 true false │
2 │ NANCY A 80016 true false │
3 │ ANTHONY A 72510 true false │
4 │ ALBA M false 9.46 true │
5 │ JAMES A 103350 true false │
6 │ DANIEL A 95484 true false │
7 │ ROBERT K 103272 true false │
8 │ LAKENYA A false 17.68 true │
9 │ DORIS A false 19.38 true │
10 │ BRENDA B 64392 true false │
=#
These operations are also available in the @query
notation.
@query begin
department.employee.rate.collect()
{rate, count(rate), sum(rate), max(rate), min(rate)}
end
#=>
Get(:department) >>
Get(:employee) >>
Get(:rate) >>
Then(Collect) >>
Record(Get(:rate),
Count(Get(:rate)),
Sum(Get(:rate)),
Max(Get(:rate)),
Min(Get(:rate)))
=#
@query begin
department
collect(employee.rate)
{rate, rate.count(), rate.sum(), rate.max(), rate.min()}
end
#=>
Get(:department) >>
Collect(Get(:employee) >> Get(:rate)) >>
Record(Get(:rate),
Get(:rate) >> Then(Count),
Get(:rate) >> Then(Sum),
Get(:rate) >> Then(Max),
Get(:rate) >> Then(Min))
=#
@query department.employee{name, exists(salary), rate.exists()}
#=>
Get(:department) >>
Get(:employee) >>
Record(Get(:name), Exists(Get(:salary)), Get(:rate) >> Then(Exists))
=#
Filter
We use Filter()
to filter the input by the given predicate.
Q = It.department >>
Filter(It.name .== "POLICE") >>
It.employee >>
Filter(It.name .== "JEFFERY A")
#=>
It.department >>
Filter(It.name .== "POLICE") >>
It.employee >>
Filter(It.name .== "JEFFERY A")
=#
chicago[Q]
#=>
│ employee │
│ name position salary rate │
──┼───────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 │
=#
The predicate must produce true
of false
values.
Q = It.department >>
Filter(Count(It.employee))
chicago[Q]
#-> ERROR: expected a predicate
The input data is dropped when the output of the predicate contains only false
elements.
Q = It.department >>
Filter(It.employee >> (It.salary .> 100000)) >>
Record(It.name, It.employee.salary)
chicago[Q]
#=>
│ department │
│ name salary │
──┼───────────────────────────────┼
1 │ POLICE 101442; 80016; 72510 │
2 │ FIRE 103350; 95484; 103272 │
=#
In @query
notation, we write filter(X)
.
@query begin
department
filter(name == "POLICE")
employee
filter(name == "JEFFERY A")
end
#=>
Get(:department) >>
Filter(Lift(==, (Get(:name), Lift("POLICE")))) >>
Get(:employee) >>
Filter(Lift(==, (Get(:name), Lift("JEFFERY A"))))
=#
First
, Last
, Nth
We can use First(X)
, Last(X)
and Nth(X, N)
to extract the first, the last, or the N
-th element of the output of X
.
chicago[It.department.name]
#=>
│ name │
──┼────────┼
1 │ POLICE │
2 │ FIRE │
3 │ OEMC │
=#
Q = First(It.department.name)
#-> First(It.department.name)
chicago[Q]
#=>
│ name │
┼────────┼
│ POLICE │
=#
Q = Last(It.department.name)
#-> Last(It.department.name)
chicago[Q]
#=>
│ name │
┼──────┼
│ OEMC │
=#
Q = Nth(It.department.name, 2)
#-> Nth(It.department.name, 2)
chicago[Q]
#=>
│ name │
┼──────┼
│ FIRE │
=#
These operations also have an aggregate form.
Q = It.department.name >> First
#-> It.department.name >> First
chicago[Q]
#=>
│ name │
┼────────┼
│ POLICE │
=#
Q = It.department.name >> Last
#-> It.department.name >> Last
chicago[Q]
#=>
│ name │
┼──────┼
│ OEMC │
=#
Q = It.department.name >> Nth(2)
#-> It.department.name >> Nth(2)
chicago[Q]
#=>
│ name │
┼──────┼
│ FIRE │
=#
Nth
can take a query argument, which is evaluated against the input source and must produce a singular mandatory integer value.
chicago[Nth(It.department.name, Count(It.department) .- 1)]
#=>
│ name │
┼──────┼
│ FIRE │
=#
chicago[It.department.name >> Nth(Count(It.department) .- 1)]
#=>
│ name │
┼──────┼
│ FIRE │
=#
In @query
notation, we write first()
, last()
and nth(N)
.
@query first(department)
#-> First(Get(:department))
@query last(department)
#-> Last(Get(:department))
@query nth(department, 2)
#-> Nth(Get(:department), Lift(2))
@query department.first()
#-> Get(:department) >> Then(First)
@query department.last()
#-> Get(:department) >> Then(Last)
@query department.nth(2)
#-> Get(:department) >> Nth(Lift(2))
Take
and Drop
We use Take(N)
and Drop(N)
to pass or drop the first N
input elements.
Employee =
It.department >>
Filter(It.name .== "POLICE") >>
It.employee
Q = Employee >> Take(3)
#-> It.department >> Filter(It.name .== "POLICE") >> It.employee >> Take(3)
chicago[Q]
#=>
│ employee │
│ name position salary rate │
──┼─────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 │
2 │ NANCY A POLICE OFFICER 80016 │
3 │ ANTHONY A POLICE OFFICER 72510 │
=#
Q = Employee >> Drop(3)
#-> It.department >> Filter(It.name .== "POLICE") >> It.employee >> Drop(3)
chicago[Q]
#=>
│ employee │
│ name position salary rate │
──┼────────────────────────────────────┼
1 │ ALBA M POLICE CADET 9.46 │
=#
Take(-N)
drops the last N
elements, while Drop(-N)
keeps the last N
elements.
Q = Employee >> Take(-3)
chicago[Q]
#=>
│ employee │
│ name position salary rate │
──┼───────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 │
=#
Q = Employee >> Drop(-3)
chicago[Q]
#=>
│ employee │
│ name position salary rate │
──┼─────────────────────────────────────────┼
1 │ NANCY A POLICE OFFICER 80016 │
2 │ ANTHONY A POLICE OFFICER 72510 │
3 │ ALBA M POLICE CADET 9.46 │
=#
Take
and Drop
accept a query argument, which is evaluated against the input source and must produce a singular integer.
Half = Count(Employee) .÷ 2
Q = Employee >> Take(Half)
chicago[Q]
#=>
│ employee │
│ name position salary rate │
──┼─────────────────────────────────────────┼
1 │ JEFFERY A SERGEANT 101442 │
2 │ NANCY A POLICE OFFICER 80016 │
=#
Q = Take(Employee >> It.name)
chicago[Q]
#-> ERROR: expected a singular integer
In @query
notation, we write take(N)
and drop(N)
.
@query department.employee.take(3)
#-> Get(:department) >> Get(:employee) >> Take(Lift(3))
@query department.employee.drop(3)
#-> Get(:department) >> Get(:employee) >> Drop(Lift(3))
Is
The query Is(T)
asserts that the input has the type T
.
Q = It.department.name >> Is(String)
#-> It.department.name >> Is(String)
chicago[Q]
#=>
│ name │
──┼────────┼
1 │ POLICE │
2 │ FIRE │
3 │ OEMC │
=#
When the check fails, an error is reported.
Q = It.department.name >> Is(Int)
chicago[Q]
#-> ERROR: "name"[1]: expected a value of type Int64; got String
In @query
notation, this operation is written as is(T)
.
@query chicago department.name.is(String)
#=>
│ name │
──┼────────┼
1 │ POLICE │
2 │ FIRE │
3 │ OEMC │
=#
Is0to1
, Is0toN
, Is1to1
, Is1toN
The Is1to1
query asserts that the input exists and is singular.
Q = It.department >>
Take(1) >>
Is1to1
#-> It.department >> Take(1) >> Is1to1
chicago[Q]
#=>
│ department │
│ name employee{name,position,salary,rate} │
┼─────────────────────────────────────────────────────────────────────┼
│ POLICE JEFFERY A, SERGEANT, 101442, missing; NANCY A, POLICE OFFIC…│
=#
shape(chicago[Q])
#=>
BlockOf(⋮
x1to1) |>
IsLabeled(:department)
=#
This operation can also be used in a combinator form.
Q >>= Is1to1(It.employee >> Take(1))
#-> It.department >> Take(1) >> Is1to1 >> Is1to1(It.employee >> Take(1))
chicago[Q]
#=>
│ employee │
│ name position salary rate │
┼───────────────────────────────────┼
│ JEFFERY A SERGEANT 101442 │
=#
Other cardinality constraints can also be asserted.
chicago[It.department.name >> Take(1) >> Is0to1] |> shape
#-> BlockOf(String, x0to1) |> IsLabeled(:name)
chicago[It.department.name >> Take(1) >> Is0toN] |> shape
#-> BlockOf(String) |> IsLabeled(:name)
chicago[It.department.name >> Take(1) >> Is1toN] |> shape
#-> BlockOf(String, x1toN) |> IsLabeled(:name)
chicago[Is0to1(It.department.name >> Take(1))] |> shape
#-> BlockOf(String, x0to1) |> IsLabeled(:name)
chicago[Is0toN(It.department.name >> Take(1))] |> shape
#-> BlockOf(String) |> IsLabeled(:name)
chicago[Is1toN(It.department.name >> Take(1))] |> shape
#-> BlockOf(String, x1toN) |> IsLabeled(:name)
When the constraint is not satisfied, an error is reported.
Q = It.department >> Record(It.name, It.employee >> Is1to1)
chicago[Q]
#-> ERROR: "employee": expected a singular value, relative to "department"
These operations could also be used to widen the cardinality constraint.
Q = Count(It.department) >> Is1toN
chicago[Q]
#=>
──┼───┼
1 │ 3 │
=#
shape(chicago[Q])
#-> BlockOf(Int64, x1toN)
In @query
notation, these operations are written as is0to1()
, is0toN()
, is1to1()
, is1toN()
.
@query chicago department.name.take(1).is1to1()
#=>
│ name │
┼────────┼
│ POLICE │
=#
@query chicago is1to1(department.name.take(1))
#=>
│ name │
┼────────┼
│ POLICE │
=#
shape(@query chicago department.name.take(1).is0to1())
#-> BlockOf(String, x0to1) |> IsLabeled(:name)
shape(@query chicago department.name.take(1).is0toN())
#-> BlockOf(String) |> IsLabeled(:name)
shape(@query chicago department.name.take(1).is1toN())
#-> BlockOf(String, x1toN) |> IsLabeled(:name)
shape(@query chicago is0to1(department.name.take(1)))
#-> BlockOf(String, x0to1) |> IsLabeled(:name)
shape(@query chicago is0toN(department.name.take(1)))
#-> BlockOf(String) |> IsLabeled(:name)
shape(@query chicago is1toN(department.name.take(1)))
#-> BlockOf(String, x1toN) |> IsLabeled(:name)
Unique
, Sort
, and Group
We use the Unique
combinator to produce unique elements of a collection.
Q = It.department >>
Record(It.name, Unique(It.employee.position))
#-> It.department >> Record(It.name, Unique(It.employee.position))
chicago[Q]
#=>
│ department │
│ name position │
──┼────────────────────────────────────────────────┼
1 │ POLICE POLICE CADET; POLICE OFFICER; SERGEANT │
2 │ FIRE FIRE ENGINEER-EMT; FIREFIGHTER-EMT │
3 │ OEMC CROSSING GUARD; TRAFFIC CONTROL AIDE │
=#
Unique
also has a primitive query form.
Q = It.department.employee.position >> Unique
#-> It.department.employee.position >> Unique
chicago[Q]
#=>
│ position │
──┼──────────────────────┼
1 │ CROSSING GUARD │
2 │ FIRE ENGINEER-EMT │
3 │ FIREFIGHTER-EMT │
4 │ POLICE CADET │
5 │ POLICE OFFICER │
6 │ SERGEANT │
7 │ TRAFFIC CONTROL AIDE │
=#
In @query
notation, Unique(X)
is written as unique(X)
.
@query department{name, unique(employee.position)}
#=>
Get(:department) >> Record(Get(:name),
Unique(Get(:employee) >> Get(:position)))
=#
The aggregate query form of Unique
is written as unique()
.
@query department.employee.position.unique()
#-> Get(:department) >> Get(:employee) >> Get(:position) >> Then(Unique)
The Sort
combinator sorts the input by the given key.
Q = It.department.employee >>
Sort(It.position)
#-> It.department.employee >> Sort(It.position)
chicago[Q]
#=>
│ employee │
│ name position salary rate │
───┼────────────────────────────────────────────────┼
1 │ LAKENYA A CROSSING GUARD 17.68 │
2 │ DORIS A CROSSING GUARD 19.38 │
3 │ JAMES A FIRE ENGINEER-EMT 103350 │
4 │ DANIEL A FIREFIGHTER-EMT 95484 │
5 │ ROBERT K FIREFIGHTER-EMT 103272 │
6 │ ALBA M POLICE CADET 9.46 │
7 │ NANCY A POLICE OFFICER 80016 │
8 │ ANTHONY A POLICE OFFICER 72510 │
9 │ JEFFERY A SERGEANT 101442 │
10 │ BRENDA B TRAFFIC CONTROL AIDE 64392 │
=#
Arbitrary key expressions are supported.
Q = It.department >>
Sort(Count(It.employee)) >>
Record(It.name, :size => Count(It.employee))
#=>
It.department >>
Sort(Count(It.employee)) >>
Record(It.name, :size => Count(It.employee))
=#
chicago[Q]
#=>
│ department │
│ name size │
──┼──────────────┼
1 │ FIRE 3 │
2 │ OEMC 3 │
3 │ POLICE 4 │
=#
Empty keys are placed on top.
Q = It.department.employee >>
Sort(It.salary)
#-> It.department.employee >> Sort(It.salary)
chicago[Q]
#=>
│ employee │
│ name position salary rate │
───┼────────────────────────────────────────────────┼
1 │ ALBA M POLICE CADET 9.46 │
2 │ LAKENYA A CROSSING GUARD 17.68 │
3 │ DORIS A CROSSING GUARD 19.38 │
4 │ BRENDA B TRAFFIC CONTROL AIDE 64392 │
5 │ ANTHONY A POLICE OFFICER 72510 │
6 │ NANCY A POLICE OFFICER 80016 │
7 │ DANIEL A FIREFIGHTER-EMT 95484 │
8 │ JEFFERY A SERGEANT 101442 │
9 │ ROBERT K FIREFIGHTER-EMT 103272 │
10 │ JAMES A FIRE ENGINEER-EMT 103350 │
=#
More than one key column could be provided.
Q = It.department.employee >>
Sort(It.rate, It.salary)
#-> It.department.employee >> Sort(It.rate, It.salary)
chicago[Q]
#=>
│ employee │
│ name position salary rate │
───┼────────────────────────────────────────────────┼
1 │ BRENDA B TRAFFIC CONTROL AIDE 64392 │
2 │ ANTHONY A POLICE OFFICER 72510 │
3 │ NANCY A POLICE OFFICER 80016 │
4 │ DANIEL A FIREFIGHTER-EMT 95484 │
5 │ JEFFERY A SERGEANT 101442 │
6 │ ROBERT K FIREFIGHTER-EMT 103272 │
7 │ JAMES A FIRE ENGINEER-EMT 103350 │
8 │ ALBA M POLICE CADET 9.46 │
9 │ LAKENYA A CROSSING GUARD 17.68 │
10 │ DORIS A CROSSING GUARD 19.38 │
=#
In @query
notation, Sort
combinator is written sort()
.
@query begin
department
collect(size => count(employee))
sort(size)
end
#=>
Get(:department) >>
Collect(Count(Get(:employee)) >> Label(:size)) >>
Sort(Get(:size))
=#
We use the Group
combinator to group the input by the given key.
Q = It.department.employee >>
Group(It.position)
#-> It.department.employee >> Group(It.position)
chicago[Q]
#=>
│ position employee{name,position,salary,rate} │
──┼───────────────────────────────────────────────────────────────────┼
1 │ CROSSING GUARD LAKENYA A, CROSSING GUARD, missing, 17.68; …│
2 │ FIRE ENGINEER-EMT JAMES A, FIRE ENGINEER-EMT, 103350, missing │
3 │ FIREFIGHTER-EMT DANIEL A, FIREFIGHTER-EMT, 95484, missing; …│
4 │ POLICE CADET ALBA M, POLICE CADET, missing, 9.46 │
5 │ POLICE OFFICER NANCY A, POLICE OFFICER, 80016, missing; AN…│
6 │ SERGEANT JEFFERY A, SERGEANT, 101442, missing │
7 │ TRAFFIC CONTROL AIDE BRENDA B, TRAFFIC CONTROL AIDE, 64392, miss…│
=#
Just like with Sort
, arbitrary key expressions are supported.
Q = It.department >>
Group(:size => Count(It.employee)) >>
Record(It.size, :count => Count(It.department))
chicago[Q]
#=>
│ size count │
──┼─────────────┼
1 │ 3 2 │
2 │ 4 1 │
=#
Empty keys are placed on top.
Q = It.department.employee >>
Group(:grade => It.salary .÷ 10000) >>
Record(It.grade, :n => Count(It.employee))
chicago[Q]
#=>
│ grade n │
──┼──────────┼
1 │ 3 │
2 │ 6 1 │
3 │ 7 1 │
4 │ 8 1 │
5 │ 9 1 │
6 │ 10 3 │
=#
More than one key column could be provided.
Q = It.department.employee >>
Group(ismissing.(It.salary),
ismissing.(It.rate)) >>
Record(It.A, It.B, Count(It.employee))
chicago[Q]
#=>
│ #A #B #C │
──┼──────────────────┼
1 │ false true 7 │
2 │ true false 3 │
=#
In @query
notation, we write group()
.
@query begin
department
group(size => count(employee))
{size, count => count(department)}
end
#=>
Get(:department) >>
Group(Count(Get(:employee)) >> Label(:size)) >>
Record(Get(:size), Count(Get(:department)) >> Label(:count))
=#