Filters
Now we've seen how to identify parts of a page to turn into data,
let's look at how to manipulate that data. scrapelect does this
using filters. We've seen a couple already, like take and text,
but let's look at them more closely.
Every filter takes in a value, and a list of named arguments (which can be empty), and returns the result of applying that filter.
We call a filter with value | filter_name(arg: value, ...). The simplest filter is the id filter, which takes the value and
returns that same value: if we let a: 5 | id();, we get
{ "a": 5 }.
Another useful filter is the dbg filter, which, like id,
returns the original value, but it prints the value to the
terminal as well. dbg has an optional argument msg, a
String, which specifies a message to include, if set. If not
provided to the filter, it prints debug message: ....
a: 1 | dbg();
b: 2 | dbg(message: "from b");
will output to the console:
debug message: 1
from b: 2
and return { "a": 1, "b": 2 }.
Modifying filters
Often, though, it is useful to use a filter that modifies the passed
value in some way. One useful filter is strip, which trims
leading and trailing whitespace from a string, which is often found
inside HTML elements.
trimmed: " hellooooo " | strip();
outputs
{
"trimmed": "hellooooo"
}
Note that this doesn't mutate the original value passed in; that is, if you read another binding as the input to a filter, applying the filter will not change the value in the original binding:
bind: "5";
new: $bind | int();
results (where int converts the value into the integer type):
{
"bind": "5",
"new": 5
}
where bind is still a string but new is an int.
Filter documentation
Documentation for all of the built-in filters is available at docs.rs, which lists filter signatures, descriptions, and examples.
Chaining filters
Filters are designed to be executed in a pipeline, passing the output of one filter to the input of another:
is-not-five: "5" | int() | eq(to: 5) | not();
outputs
{
"is-not-five": false
}
where eq returns whether the value is equal to to, and not
returns the boolean NOT (opposite) of the input value.
Qualifiers
Similar to element blocks, filters can have qualifiers at the end.
The same qualifiers (none for one, ? for optional, and * for list)
can be placed at the end of a filter call to modify its behavior:
value | filter(...)?applies the filter tovalueif it is notnull, or returnsnullif it is.value | filter(...)*, whenvalueis a list, returns the list of every element in the list with the filter applied to it (similar to the map operation in other languages). It is an error to use the*qualifier if the input value is not a list.
Example
floats: "1 2.3 4.5" | split() | float()*;
optional: "3.4" | float()?;
optional2: null | float()?;
returns, where split turns a string into an array of strings
split on a delimiter (by default, whitespace), and float turns
a value into the float type:
{
"floats": [1.0, 2.3, 4.5],
"optional": 3.4,
"optional2": null
}