Cell writeback in Mondrian
Writeback is a feature that allows you to modify OLAP cell values and see the effects ripple through the data set, automatically modifying child and parent cells, and also cells derived using calculations. This allows you to perform ‘what if’ analysis and applications such as budgeting.
I have added experimental support for writeback to Mondrian.
In Mondrian’s case, the term ‘writeback’ is a bit misleading. In a ROLAP system such as Mondrian, writing back to the database would be difficult, since values are stored in a fact table but we allow cells of any granularity to be modified. One modified cell might contain thousands of fact table rows. So, we don’t write cells back to the database, but just retain the modified cells in memory, and propagate the modifications to related cells.
Here’s how to use the experimental writeback support. Some of the details may change later as we make the feature more usable.
First, enable writeback for your
cube. Create a dimension called ‘Scenario
’, and a measure called
‘Atomic Cell Count
’:
<Cube name='Sales'>
<Dimension name='Scenario' foreignKey='time_id'>
<Hierarchy primaryKey='time_id' hasall='true'>
<InlineTable alias='_dummy'>
<ColumnDefs>
<ColumnDef name='foo' type='Numeric'/>
</ColumnDefs>
<Rows/>
</InlineTable>
<Level name='Scenario' column='foo'/>
</Hierarchy>
</Dimension>
<!-- Other dimensions... -->
<Measure name='Atomic' aggregator='count'/>
<!-- Other measures... -->
</Cube>
(Yes, this is a lot of crud to add to your cube definition, and it’s
temporary. In future, we will let you flag a cube as ‘writeback
enabled’, and a [Scenario]
dimension and [Atomic Cell Count]
measure will be created automatically. Also, we will make it easier
for you to create dimensions that have only calculated members,
without resorting to inline tables.)
Next, create a Scenario:
Connection connection;
Scenario scenario = connection.createScenario();
int scenarioId = scenario.getId();
(The Scenario API will soon move to olap4j:
before mondrian-4.0, I
hope. This includes the class mondrian.olap.Scenario
, the method
mondrian.olap.Cell.setValue()
, and the method
mondrian.olap.Connection.createScenario()
. It will be optional for
an olap4j driver to support writeback, but Mondrian’s olap4j driver
will, of course.)
Write a query that uses the scenario. Assuming that scenarioId
above
was 1, the query
SELECT [Measures].[Unit Sales] ON COLUMNS,
{[Product],
[Product].Children,
[Product].[Drink].Children} ON ROWS
FROM [Sales]
WHERE [Scenario].[1]
returns
[Product] [Unit Sales]
========================= ============
(All) 266,773
+ Drink 24,597
+--+ Alcoholic Beverages 6,838
+--+ Beverages 13,573
+--+ Dairy 4,186
+ Food 191,940
+ Non-Consumable 50,236
Choose one of the cells returned from the query and modify its
value. For example, let’s reduce the sales of Drink
by 1,000 from
24,597 to 23,597:
Result result = connection.executeQuery(...);
Cell cell = result.getCell(new int[] {0, 1});
cell.setValue(23597, AllocationPolicy.EQUAL_ALLOCATION);
Execute the query again, and it returns
[Product] [Unit Sales]
========================= ============
(All) 265,773
+ Drink 23,597
+--+ Alcoholic Beverages 6,563
+--+ Beverages 12,990
+--+ Dairy 4,043
+ Food 191,940
+ Non-Consumable 50,236
The value for Drink
is 23,597, as expected, and the values of its
children have been correspondingly reduced.
How the value is allocated to the children (and in fact all
descendants) is decided by the allocation policy. In this case, we
specified EQUAL_ALLOCATION
, which means that all atomic cells have
the same value.
An atomic cell is the finest grained value that can be viewed multidimensionally; for this cube, it is an instance of a particular customer buying a particular product, on a particular promotion, on a particular day, in a particular store. That makes for an awful lot of of atomic cells, but there may be fewer atomic cells than fact table rows. If the fact table does not have a primary key on (customer, product, time, promotion, store) some cells may have more than one fact table row.
If instead we had written
cell.setValue(23597, AllocationPolicy.EQUAL_INCREMENT);
the query would have returned
[Product] [Unit Sales]
========================= ============
(All) 265,773
+ Drink 23,597
+--+ Alcoholic Beverages 6,560
+--+ Beverages 13,022
+--+ Dairy 4,015
+ Food 191,940
+ Non-Consumable 50,236
We notice that Beverages
has not been reduced as much under
EQUAL_INCREMENT
policy than EQUAL_ALLOCATION
policy; the average
value for atomic cells of Beverages must be greater than for Drink as
a whole.
Allocation policies are defined consistent with Analysis Services’
UPDATE CUBE statement.
Mondrian does not currently implement
WEIGHTED_ALLOCATION
or WEIGHTED_INCREMENT
policies.
Treating scenarios as a dimension is an elegant and powerful
idea. Using the Scenario
dimension, you can easily switch from one
scenario to another, or you can compare scenarios side-by-side.
Note that you can also set a connection’s current scenario. This
effectively becomes the default value for the Scenario
dimension in
that connection, so you do not need to specify Scenario in the
slicer. However, there still needs to be an explicit scenario in the
context when you call Cell.setValue()
. I’m not sure whether the
benefit of having a scenario for a connection outweighs the
benefit/confusion, and we may discontinue this feature.
Remember, this is still an experimental feature. There is some cleanup to be done, some performance tuning, and the API needs to be moved into olap4j. But most importantly, it’s not useful until a user interface, such as PAT or JPivot, supports scenarios and modifying cell values.