headlines
MERGE
Syntax
MERGE statements are a hybrid tool for inserting or updating
information to the database in situations where Paglo needs to make an
intelligent decision about which data to insert, which to update (replace),
and which data to leave unchanged. The MERGE statement identifies
the node or nodes in question, and then looks for it. If it finds it, it adds
any missing information, or updates any obsolete information. If it doesn't
find the node, it creates it. You can use MERGE statements to
add nodes that were not originally discovered.
Think of a MERGE statement as a tree of nested condition statements.
Each condition statement identifies a node or a set of nodes that match the
condition. Going down the tree, condition statements are applied one by one.
After each condition statement is satisfied, the children of that condition
statement are then applied, making the statement more and more specific.
Any single condition statement may of course match more than one node. Make
sure that you construct your MERGE statement with that in mind.
If you want to merge data into a single node in your PQL database, make sure
that your condition statements identify only that node and filter out all
others.
Caution: Be careful using MERGE statements. Because a
MERGE statement replaces existing data with new data, it's
possible to accidently delete or overwrite existing data by using such
innocuous-looking statements as these two:
MERGE INTO / VALUES {};
MERGE @'timestamp' INTO / VALUES {};
The possible elements of an MERGE statement include:
pql_statement |
::= |
MERGE [INTO] [@timestamp]
|
timestamp |
::= |
'yyyy-mm-ddThh:mm:dd.uuuuuuZ'
|
conditional_tree_literal |
::= |
{conditional_tree_assignment [,conditional_tree_assignment]} |
Description
It is important to understand when and why the MERGE
statement updates (replaces) data, and when it simply inserts data,
because this statement can accidently delete your entire tree if used
incorrectly.
The MERGE rules are as follows:
- Rule 1:
If no conditions exist, then all existing nodes with the same name are
replaced by the new node. You can add a
TRUEcondition for a node that otherwise has no conditions — such as the top-levelnetworknode — to prevent the entire tree from being replaced. See example of Rule 1. - Rule 2:
If a condition exists but is not met by any node, then a
new node is created. You can force new node creation by adding
a
FALSEcondition for nodes that you always want replaced, such as logs. See example of Rule 2. - Rule 3: If a condition exists and is met by multiple nodes, then all matching nodes are updated.
If there are no conditions on a node because the Crawler is reporting
all known information about the node, then MERGE updates
(replaces), rather than inserts this information.
If the server finds no conditions, it replaces all existing nodes
that have the same name. You can avoid
Rule 1 by adding a condition to
exclude a particular node from this rule, where the node has no other
conditions to distinguish it such as the network node, by adding a
TRUE condition.
The elements of an MERGE statement are defined as follows:
MERGE [INTO] |
The MERGE statement is a hybrid between
INSERT and
UPDATE, for situations where
the Paglo Crawler discovers new data and needs to make intelligent
decisions about how to merge it with existing data. For example, if
either the new or the old data is incomplete, then the server must
decide which data to insert, which data to replace or update, and
which data to keep unchanged.The INTO keyword is optional. The MERGE clause must include a
path to indicate the
location of the data to be added or replaced, and can include a
[@timestamp] option to add or replace history.
A path can be represented by an identifier
or a boolean_expression. The [@timestamp] option updates history, and is represented
by the string: 'yyyy-mm-ddThh:mm:dd.uuuuuuZ', where
uuuuuu is the fraction of a second in microseconds. For
example: 2007-09-02T16:14:52.171176Z . | |
VALUES |
Indicates the data to be added or replaced, and must be
specified with a conditional_tree_literal. A conditional_tree_literal is one or more
conditional_tree_assignments. A
conditional_tree_assignment can be any tree_assignment
with additional conditions.
|
Examples
- Example 1: Simple merge
- Example 2: Compare MERGE to INSERT
- Example 3: Prevent replacing the entire database
- Example 4: Merge processes
- Example 5: Update disk space
- Example 6: Force new node creation
- Example 7: Merge historical data
- Example 9: Complex merge
- Example 10: Use Node IDs in MERGE statements
- Best practices and troubleshooting
Example 1: Simple merge
Suppose you want to add some information to the database, but are
not sure whether it is actually missing or not. Some of the data is new,
and some must update (replace) existing data. You could write an MERGE
statement that specifies that where
b is 2, a should be
inserted, and where b is not 2,
b should be updated to 3, like this:
MERGE INTO / values {
a[b='2'] => {
b => '3'
}
}
This statement starts with the MERGE INTO keywords, indicating
your intention to insert and/or update data. Next is a slash /
to represent the root node where you are taking action. The VALUES
keyword follows, and then the specific values themselves.
The Paglo Crawler specifies information to the server in the form of a complete tree. The server then merges the data with the existing forest. The new trees include conditional paths, and the conditions specify how they should merge with the existing forest.
The resulting output might look something like this:
row:
*:
a:
b: 3
Example 2: Compare INSERT to MERGE
It might be helpful to compare a MERGE statement with a
similar INSERT statement. The following INSERT
statement simply inserts the {a => 'b'} values into
the root:
INSERT INTO / values {
a => 'b'
}
|
MERGE INTO / values {
a[b = '2'] => {
b => '3'
}
}
|
The MERGE statement inserts and/or overrides data, and the
conditional path: {a[b = '2'] => {b => '3'}} indicates what
should be inserted and what should be updated. In this case, where
b = 2, a is inserted, and where b != 2,
b should be updated to 3.
The resulting output is the same as in Example 1.
Example 3: Prevent replacing the entire database
Let's use a more realistic example. Suppose the Paglo Crawler discovers a new device, and that it already has information about one of the interfaces on that device. The Crawler sends a statement like this to the Search Index:
MERGE INTO / values {
network[TRUE] => {
device[interface/mac_address = '00:00' or
interface/ip_address = '1.2.3.4'] => {
interface => {
name => 'eth0',
mac_address => '00:B1:B2:B3:B4:B5',
ip_address => '10.10.10.40'
}
}
}
}
The device node has a condition that specifies which
existing node should be updated. As we know from
Rule 2, if the condition
does not match any existing nodes, then a new node is created.
The interface node has no conditions. As we know from
Rule 1, nodes without conditions
are updated (completely replaced). If the server finds no
conditions, it replaces all existing nodes with that name.
Note that the network node has a condition of [TRUE]. That is
because the top-level network node normally has no conditions.
Rule 1 completely replaces nodes without
conditions. So [TRUE] is used as an exception to Rule 1. If we
do not use this exception, every update would replace the network
node, which is equivalent to replacing the entire tree.
If this statement is run on the sample database, it produces the following results:
row:
*:
network:
device:
interface:
ip_address: 10.10.10.40
mac_address: 00:B1:B2:B3:B4:B5
name: eth0
Example 4: Merge processes
In this example, the Crawler reports the set of running processes for a device:
MERGE INTO / values {
network[TRUE] => {
device[uuid = '1'] => {
running_processes => {
win32_process => 'foo',
win32_process => 'bar'
}
}
}
}
Note that the network node has a condition of [TRUE] to
avoid (Rule 1) replacing the entire
tree. This statement produces the following results:
row:
*:
network:
device:
running_processes:
win32_process: foo
win32_process: bar
Example 5: Update disk space
In this example, the free disk space is updated,
but only for the disk called '/dev/hda':
MERGE INTO / values {
network[TRUE] => {
device[uuid = '1'] => {
disk[name = '/dev/hda'] => {
free_space => 10000
}
}
}
}
This statement would produces results like this:
row:
*:
network:
device:
disk:
free_space: 10000
Example 6: Force new node creation
For nodes that are never replaced, such as log messages, we need a way to force Rule 2 and ensure new node creation. You can do that by adding a [FALSE] condition so that new nodes are added instead of being updated (replaced). For that case, you can use:
MERGE INTO / values {
log[FALSE] => {
message => 'Service started',
timestamp => '1/2/07'
}
}
This statement would produces results like this:
row:
*:
log:
message: Service started
timestamp: 1/2/07
Example 7: Merge historical data
This example merges historical values by adding a timestamp. Note that the
timestamp must be in single quotes: '2008-03-21T15:50:01.528935Z'.
Note the use of an indented format to write a good MERGE statement
that can be read by human eyes. The first 5 lines of the following more complex
statement are boilerplate. They identify the node and its location. From the
sixth line and below, the details about that node are listed:
Note: Paglo timestamps are in coordinated universal time (UTC/GMT). When you view your data in your Paglo Web account, Paglo converts your data to your local time. But when you retrieve timestamps from your database as results to your searches or PQL queries, it is in UTC.
MERGE@'2008-03-21T15:50:01.528935Z' into / values {
network[true] => {
device[interface/mac_address='00:13:20:40:9A:9E'] => {
system[true] => {
meminfo[true] => {
SwapTotal[true] => '3997474816',
SwapFree[true] => '3548553216',
MemFree[true] => '40374272',
HighTotal[true] => '1198030848',
Committed_AS[true] => '1972723712',
SUnreclaim[true] => '19660800',
NFS_Unstable[true] => '0',
VmallocChunk[true] => '64999424',
Writeback[true] => '0',
MemTotal[true] => '2116120576',
VmallocUsed[true] => '47677440',
AnonPages[true] => '1147985920',
Active[true] => '1293049856',
CommitLimit[true] => '5055533056',
Cached[true] => '475234304',
SwapCached[true] => '247386112',
LowTotal[true] => '918089728',
Dirty[true] => '24576',
Mapped[true] => '58638336',
HighFree[true] => '30834688',
SReclaimable[true] => '64770048',
VmallocTotal[true] => '117432320',
Bounce[true] => '0',
Inactive[true] => '676634624',
PageTables[true] => '3608576',
Slab[true] => '84430848',
Buffers[true] => '346124288',
LowFree[true] => '9539584'
}
}
}
}
}
This statement would produces results like this:
row:
*:
network:
device:
system:
meminfo:
lowfree: 9539584
buffers: 346124288
slab: 84430848
pagetables: 3608576
inactive: 676634624
bounce: 0
vmalloctotal: 117432320
sreclaimable: 64770048
highfree: 30834688
mapped: 58638336
dirty: 24576
lowtotal: 918089728
swapcached: 247386112
cached: 475234304
commitlimit: 5055533056
active: 1293049856
anonpages: 1147985920
vmallocused: 47677440
memtotal: 2116120576
writeback: 0
vmallocchunk: 64999424
nfs_unstable: 0
sunreclaim: 19660800
committed_as: 1972723712
hightotal: 1198030848
memfree: 40374272
swapfree: 3548553216
swaptotal: 3997474816
Example 8: Merge without destroying
This example merges device data.
MERGE INTO / values {
network[true] => {
device[interface/mac_address = '00:E0:81:58:FE:CA'
or interface/inet/ip_address = '50.10.33.64'] => {
interface[mac_address = '00:0C:29:93:CE:EE' or
inet/ip_address = '10.10.20.27'] => {
mac_address[true] => '00:0C:29:93:CE:EE',
inet[ip_address = '10.10.20.27'] => {
last_seen[true] => '2008-03-19T18:42:10.0000000Z'
}
}
}
}
}
This statement produces the following results:
row:
row
*
network
device
interface
inet
last_seen 2008-02-19T18:42:10.0000000Z
mac_address 00:0C:29:93:CE:EE
device
interface
inet
last_seen 2008-03-19T18:42:10.0000000Z
mac_address 00:0C:29:93:CE:EE
Example 9: Complex merge
In reality, your MERGE statements may be far more complex than these
examples. To add multiple nodes to your database that were not previously discovered,
you may want to use a different MERGE statement for each node. Here's a
more complex and more realistic example:
MERGE@'2008-03-21T17:15:12.014103Z' into / values {
network[true] => {
device[interface/mac_address = '00:0C:29:93:CE:EE' or
or interface/inet/ip_address = '10.10.20.27'] => {
interface[mac_address = '00:0C:29:93:CE:EE' or
or inet/ip_address = '10.10.20.27'] => {
mac_address[true] => '00:0C:29:93:CE:EE',
inet[ip_address = '10.10.20.27'] => {
ip_address[true] => '10.10.20.27'
}
},
system[true] => {
dns_name[true] => 'maruichi'
},
evidence => {
dhcp[false] => {
value[true] => {
vendor_class[true] => 'MSFT 5.0'
},
value[true] => {
host_name[true] => 'maruichi'
},
value[true] => {
param_request_list[true] =>
'1,15,3,6,44,46,47,31,33,43,252'
}
},
dns_name[false] => {
value[true] => {
value[true] => 'maruichi'}
},
ip_address[false] => {
value[true] => {
value[true] => '10.10.20.27'
}
},
mac_address[false] => {
value[true] => {
value[true] => '00:0C:29:93:CE:EE'
}
},
probe[false] => {
param[true] => {
technique[true] => 'snmp'
},
param[true] => {
port[true] => '161'
},
param[true] => {
request_label[true] => 'snmp_sysdescr'
},
param[true] => {
request[true] => '1.3.6.1.2.1.1.1.0'
},
value[true] => {
response[true] => 'Hardware: x86 Family 6 Model
15 Stepping 8 AT/AT COMPATIBLE
- Software: Windows 2000
Version 5.0 (Build 2195
Uniprocessor Free)'
}
},
probe[false] => {
param[true] => {
technique[true] => 'snmp'
},
param[true] => {
port[true] => '161'
},
param[true] => {
request_label[true] => 'snmp_sysname'
},
param[true] => {
request[true] => '1.3.6.1.2.1.1.5.0'
},
value[true] => {
response[true] => 'MARUICHI'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_DNS_QUERY'
},
value[true] => {
value[true] => 'completed'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_NBNS_QUERY'
},
value[true] => {
value[true] => 'completed'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_ARP_QUERY'
},
value[true] => {
value[true] => 'completed'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_TCP_PORTS'
},
value[true] => {
value[true] => 'completed'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_UDP_PORTS'
},
value[true] => {
value[true] => 'completed'
}
},
scan_status[false] => {
param[true] => {
scan[true] => 'SCAN_SNMP'
},
value[true] => {
value[true] => 'completed'
}
},
subnet[false] => {
value[true] => {
value[true] => '10.10.0.0/16'
}
},
tcp_port[false] => {
param[true] => {
port[true] => '135'
},
value[true] => {
value[true] => 'open'
}
},
tcp_port[false] => {
param[true] => {
port[true] => '139'
},
value[true] => {
value[true] => 'open'
}
},
tcp_port[false] => {
param[true] => {
port[true] => '7'
},
value[true] => {
value[true] => 'open'
}
}
}
}
}
}
This statement would produces results like this:
row:
*:
network:
device:
interface:
mac_address: 00:0C:29:93:CE:EE
inet:
ip_address: 10.10.20.27
system:
dns_name: maruichi
evidence:
mac_address:
value:
value: 00:0C:29:93:CE:EE
ip_address:
value:
value: 10.10.20.27
tcp_port:
value:
value: open
param:
port: 7
tcp_port:
value:
value: open
param:
port: 139
tcp_port:
value:
value: open
param:
port: 135
subnet:
value:
value: 10.10.0.0/16
scan_status:
value:
value: completed
param:
scan: SCAN_SNMP
scan_status:
value:
value: completed
param:
scan: SCAN_UDP_PORTS
scan_status:
value:
value: completed
param:
scan: SCAN_TCP_PORTS
scan_status:
value:
value: completed
param:
scan: SCAN_ARP_QUERY
scan_status:
value:
value: completed
param:
scan: SCAN_NBNS_QUERY
scan_status:
value:
value: completed
param:
scan: SCAN_DNS_QUERY
probe:
value:
response: MARUICHI
param:
request: 1.3.6.1.2.1.1.5.0
param:
request_label: snmp_sysname
param:
port: 161
param:
technique: snmp
probe:
value:
response: Hardware: x86 Family 6 Model 15
Stepping 8 AT/AT COMPATIBLE -
Software: Windows 2000 Version 5.0
(Build 2195 Uniprocessor Free)
param:
request: 1.3.6.1.2.1.1.1.0
param:
request_label: snmp_sysdescr
param:
port: 161
param:
technique: snmp
dns_name:
value:
value: maruichi
dhcp:
value:
param_request_list: 1,15,3,6,44,46,47,31,33,43,252
value:
host_name: maruichi
value:
vendor_class: MSFT 5.0
Example 10: Use Node IDs in MERGE statements
You can use Node IDs in MERGE statements,
in place of lengthy condition statements in the FROM clause. Many typical
MERGE statements include a lengthy path that identifies which node or set
of nodes to merge the new data into, such as: MERGE INTO / . . .
where the slash / represents a path that roots the first clause of the
VALUES tree. Node IDs cut that path down to just a number that uniquely
identifies a single node.
For example, say you have 3 alerts in your database and you want to update
the first one. To merge data into Alert 1, you must write a MERGE
statement that identifies Alert 1, but filters out Alerts 2 and 3. To do
this without using Node ID, you'd write a MERGE statement with some
conditions that identify Alert 1, like this:
MERGE INTO / VALUES {
apps[true] => {
com[true] => {
alerts[true] =>
{
alert[name='Alert 1'] =>
{name[true] => 'Renamed Alert 1'
}
}
}
}
}
Instead, you can use the Node ID of Alert 1 to identify it. To find the Node ID of any particular node, run a query like this:
Query: What is the Node ID of Alert 1?
SELECT /#id FROM /apps/com/paglo/alerts/alert WHERE name = 'Alert 1'
Result: The result indicates that 10357496 is the Node ID
of Alert 1.
row id(): 10357496
Now you can update Alert 1 with a streamlined MERGE statement
like this:
MERGE INTO 10357496 VALUES {
name[true] => 'Renamed Alert 1'
}
For more examples, see Examples and More examples.
Best practices and troubleshooting
WARNING! Be careful with the MERGE statement. With
only a simple but all-encompasing statement, it is possible to delete
your entire database with a statement this simple:
MERGE INTO / VALUES {};
Identifying condition fields
Problems can occur if you use a MERGE statement
in which you use a field in a condition to select a node, in a case where the
field does not actually exist. For example, look at the following MERGE
statement:
MERGE into / values {
network[true] => {
device[interface/mac_address = '00:A0:B0:C0:D0:E0'
OR interface/inet/ip_address = '10.10.10.10'] => {
last_seen[true] => '2008-02-19T18:42:10.0000000Z'
}
}
}
If you run this statement, it creates a device, but does not create the
interface sub-tree. Therefore, if you run this same MERGE
statement again, it creates a second device instead of merely updating
the first device.
Merging data from different sources
When data about a device is gathered from different sources,
they often differ in how they identify the device. One plugin identifies the
device by its IP address, and another by its MAC address. As part of best
practices, always include both addresses in your MERGE statements
so that data from different sources merges properly. Write this condition in the
same order: mac_address OR ip_address as in the following example:
MERGE into / values {
network[true] => {
device[interface/mac_address = '00:A0:B0:C0:D0:E0'
OR interface/inet/ip_address = '10.10.10.10'] => {
last_seen[true] => '2008-02-19T18:42:10.0000000Z'
}
}
}
Bad merge example
It might be helpful to examine a poorly-written MERGE statement
and see what's wrong with it. The following example attempts to identify an
interface that has an IP address of 10.20.30.40 or a MAX address of
00:A0:B0:C0:D0:E0, and instead creates problems in a couple of ways:
Incorrect MERGE statement:
MERGE INTO / values {
network[true] => {
device[interface[inet/ip_address = '10.20.30.40'
or mac_address = '00:A0:B0:C0:D0:E0'] => {
mac_address[true] => '00:A0:B0:C0:D0:E0',
inet => {
ip_address[true] => '10.20.30.40'
}
}
}
}
One problem is that the same clause that provides the conditions to match, also adds the sub-tree. That means that if Paglo doesn't find this sub-tree in the database already, it creates it new.
In other words, this MERGE statement adds an IP address of 10.20.30.40
and a MAC address of 00:A0:B0:C0:D0:E0 if it doesn't find them already in existence.
Was that your intention? If not, you have added data accidently.
Another problem occurs if you run the same MERGE statement again at a
later date, to add or update the database. Since the sub-tree now already exists,
the MERGE statement creates a second inet clause and
results in an interface with two identical IP addresses:
Result:
. . .
interface/
mac_address = '00:A0:B0:C0:D0:E0'
inet/
ip_address = '10.20.30.40'
inet/
ip_address = '10.20.30.40'
. . .
The reason this happens is an omission in the following lines of the original
MERGE statement:
inet => {
ip_address[true] => '10.20.30.40'
}
These lines do not specify which inet tree to update, so the
MERGE statement creates a second inet sub-tree.
To correct this omission, apply [true] to the inet clause:
inet[true] => {
ip_address[true] => '10.20.30.40'
}
This would match any inet sub-tree under interface.
However, what about an interface that has more than one IP address? The above
inet[true] clause matches any and all inet sub-trees
under interface and then sets the IP address for all of them to
10.20.30.40. For example, if you start with this interface in your database:
Database:
../interface/
inet/
ip_address = '10.20.30.40'
inet/
ip_address = '10.10.10.1'
After the MERGE statement you would have this interface in your database:
Result:
../interface/
inet/
ip_address = '10.20.30.40'
inet/
ip_address = '10.20.30.40'
The correct way to write this merge statement is to fully specify the condition clause for every non-leaf node, as follows:
Correct MERGE statement:
MERGE INTO / values {
network[true] => {
device[interface[inet/ip_address = '10.20.30.40'
or mac_address = '00:A0:B0:C0:D0:E0'] => {
mac_address[true] => '00:A0:B0:C0:D0:E0',
inet[ip_address = '10.20.30.40'] => {
ip_address[true] => '10.20.30.40'
}
}
}
}
Conclusion: Remember that when you have a sub-tree that may or may not exist, or when you may have multiple sub-trees with the same node name, construct your condition clauses so that you full identify every non-leaf node, and thereby update the correct sub-tree.

