<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>metrica BI</title>
	<atom:link href="http://www.metrica-bi.de/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.metrica-bi.de</link>
	<description>Microsoft Technologies</description>
	<lastBuildDate>Thu, 26 Oct 2017 20:27:58 +0000</lastBuildDate>
	<language>de-DE</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.9.1</generator>
	<item>
		<title>Fraud analysis with SSAS: Benford&#8217;s law test in OLAP Cubes</title>
		<link>http://www.metrica-bi.de/fraud-analysis-with-ssas-benfords-law-test-in-olap-cubes/</link>
		<comments>http://www.metrica-bi.de/fraud-analysis-with-ssas-benfords-law-test-in-olap-cubes/#comments</comments>
		<pubDate>Fri, 19 Jun 2015 13:54:50 +0000</pubDate>
		<dc:creator><![CDATA[Michael Mukovskiy]]></dc:creator>
				<category><![CDATA[Fraud analysis]]></category>
		<category><![CDATA[Microsoft Business Intelligence]]></category>
		<category><![CDATA[Benford's law]]></category>
		<category><![CDATA[Fraud Analysis]]></category>
		<category><![CDATA[MDX]]></category>
		<category><![CDATA[SSAS]]></category>

		<guid isPermaLink="false">http://www.metrica-bi.de/?p=157</guid>
		<description><![CDATA[Benford’s law states that in many naturally occurring collections of numbers the small digits occur disproportionately often as leading significant digits. For example, in sets which obey the law the number 1 would appear as the most significant digit about&#8230; <a href="http://www.metrica-bi.de/fraud-analysis-with-ssas-benfords-law-test-in-olap-cubes/" class="more-link">Continue Reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><a href="https://en.wikipedia.org/?title=Benford%27s_law">Benford’s law</a> states that in many naturally occurring collections of numbers the small digits occur disproportionately often as leading significant digits. For example, in sets which obey the law the number 1 would appear as the most significant digit about 30% of the time, while larger digits would occur in that position less frequently: 9 would appear less than 5% of the time. </p>
<p>The leading digit “d” tends to occur with a probability of log10(1 + 1/d):</p>
<p>&nbsp;<a href="http://www.metrica-bi.de/wp-content/uploads/2015/06/image.png"><img title="image" style="border-left-width: 0px; border-right-width: 0px; border-bottom-width: 0px; display: inline; border-top-width: 0px" border="0" alt="image" src="http://www.metrica-bi.de/wp-content/uploads/2015/06/image_thumb.png" width="226" height="187"></a> </p>
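<p>As a quick check of the formula, here is a minimal Python sketch that computes the expected share for each leading digit. The names <code>benford_probability</code> and <code>benford</code> are made up for this illustration and are not part of the SSAS solution.</p>

```python
import math

def benford_probability(d):
    """Expected share of numbers whose leading significant digit is d (1..9)."""
    return math.log10(1 + 1 / d)

# Expected shares for the digits 1..9. They sum to 1 because the
# product of (d + 1) / d over d = 1..9 telescopes to 10.
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.1%}")
```

<p>The same nine values could also serve as precalculated constants if you prefer not to compute logarithms at query time.</p>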
<p>Because statistical and financial data often obey Benford’s law, comparing the distribution of leading digits with the theoretical distribution is a widely accepted fraud detection method.</p>
<p>Fortunately, Benford analysis is fairly easy to implement in OLAP cubes.</p>
<p>Here are the extensions to an existing multidimensional SSAS solution:</p>
<ol>
<li>A new dimension “First Digit” (1, 2, 3 … 9)</li>
<li>For every measure/column to be analysed, its measure group gets a new key referencing the dimension “First Digit”</li>
<li>Calculated members in the cube script that present the analysis in a handy form</li>
</ol>
<p>&nbsp;</p>
<h2>Implementation steps for Adventure Works</h2>
<p>1. Let’s define the new dimension <strong>First Digit</strong> with the following query in the Data Source View:</p>
<p><font size="2" face="Courier New">SELECT 1 AS ID, '1...' AS Name UNION ALL<br />SELECT 2 AS ID, '2...' AS Name UNION ALL<br />SELECT 3 AS ID, '3...' AS Name UNION ALL<br />SELECT 4 AS ID, '4...' AS Name UNION ALL<br />SELECT 5 AS ID, '5...' AS Name UNION ALL<br />SELECT 6 AS ID, '6...' AS Name UNION ALL<br />SELECT 7 AS ID, '7...' AS Name UNION ALL<br />SELECT 8 AS ID, '8...' AS Name UNION ALL<br />SELECT 9 AS ID, '9...' AS Name</font></p>
<p>The single attribute of the dimension will look like this:</p>
<p><a href="http://www.metrica-bi.de/wp-content/uploads/2015/06/image1.png"><img title="image" style="border-left-width: 0px; border-right-width: 0px; border-bottom-width: 0px; display: inline; border-top-width: 0px" border="0" alt="image" src="http://www.metrica-bi.de/wp-content/uploads/2015/06/image_thumb1.png" width="244" height="209"></a> </p>
<p>&nbsp;</p>
<p>2. Now let’s assume that we want to analyse the SalesAmount from Reseller Sales. </p>
<p>The measure group gets a new dimension key <strong>SalesAmount_FirstDigit</strong> defined as: </p>
<p><font size="2" face="Courier">&nbsp; CAST(LEFT(CAST(ABS(SalesAmount) AS NVARCHAR(32)), 1) AS INT)</font></p>
<p>Don’t forget to define a regular relationship between the dimension “First Digit” and the corresponding measure group in the cube, using the key SalesAmount_FirstDigit.</p>
<p>We take the measure [Reseller Transaction Count] as the base value for our statistics. </p>
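<p>One thing to watch for: the key expression takes the leftmost character of the absolute value, so an amount below 1 (for example 0.05) would yield 0, which has no member in the dimension “First Digit”. Whether this matters depends on your data. The Python sketch below contrasts the two behaviours. Both function names are hypothetical, and Python’s number formatting only approximates what SQL Server produces.</p>

```python
from decimal import Decimal

def leftmost_character_digit(value):
    """Mimic the idea of CAST(LEFT(CAST(ABS(x) AS NVARCHAR(32)), 1) AS INT)."""
    return int(str(abs(value))[0])

def first_significant_digit(value):
    """Return the leading non-zero digit of value, or None if value is zero."""
    # Decimal stores the digits without sign and decimal point,
    # so the first non-zero entry is the first significant digit.
    for digit in Decimal(str(value)).as_tuple().digits:
        if digit != 0:
            return digit
    return None

for amount in (1234.5, -87.2, 0.05):
    print(amount, leftmost_character_digit(amount), first_significant_digit(amount))
```

<p>If such small amounts occur in your fact table, you could either filter them out or adapt the key expression so that it skips leading zeros.</p>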
<p>&nbsp;</p>
<p>3. We define calculated measures that let us compare the distributions visually and also run a formal statistical assessment (using the Kolmogorov–Smirnov test).</p>
<p>Please see the inline comments for details.</p>
<p><font size="1" face="Courier New">// -----------------------------<br />// Benford's analysis (BEGIN) --<br />//------------------------------<br />//<br />// Change the definition of [Measures].[First Digit Count]<br />// if you want to analyse another measure group.<br />Create Member CurrentCube.[Measures].[First Digit Count]<br />as<br />[Measures].[Reseller Transaction Count]<br />,DISPLAY_FOLDER='Benford''s analysis';</font></p>
<p><font size="1" face="Courier New">// Distribution of the first digit in data<br />Create Member CurrentCube.[Measures].[First Digit Count %]<br />as<br />[Measures].[First Digit Count]/<br />([Measures].[First Digit Count],[First Digit].[First Digit].[All])<br />,Format_String = "Percent", DISPLAY_FOLDER='Benford''s analysis';</font></p>
<p><font size="1" face="Courier New">// Theoretical distribution<br />Create Member CurrentCube.[Measures].[First Digit Benford %]<br />as // Use constants for performance!<br />VBA![Log](1+1/[First Digit].[First Digit].currentmember.member_key)/VBA![Log](10)<br />,Format_String = "Percent", DISPLAY_FOLDER='Benford''s analysis';<br />// Exception for [All]<br />([Measures].[First Digit Benford %],[First Digit].[First Digit].[All])=1;<br />Format_String([Measures].[First Digit Benford %])="Percent"; // format correction</font></p>
<p><font size="1" face="Courier New">// Cumulative value for Kolmogorov–Smirnov test<br />Create Member CurrentCube.[Measures].[First Digit Count % Cumul]<br />as<br />[Measures].[First Digit Count %]<br />+[First Digit].[First Digit].currentmember.prevmember<br />,Format_String = "Percent", DISPLAY_FOLDER='Benford''s analysis';<br />// Recursion seed<br />([Measures].[First Digit Count % Cumul],[First Digit].[First Digit].&amp;[1])<br />=[Measures].[First Digit Count %];<br />Format_String([Measures].[First Digit Count % Cumul])='Percent'; // format correction</font></p>
<p><font size="1" face="Courier New">// Cumulative value for Kolmogorov–Smirnov test<br />Create Member CurrentCube.[Measures].[First Digit Benford % Cumul]<br />as<br />[Measures].[First Digit Benford %]<br />+[First Digit].[First Digit].currentmember.prevmember<br />,Format_String = "Percent", DISPLAY_FOLDER='Benford''s analysis';<br />// Recursion seed<br />([Measures].[First Digit Benford % Cumul],[First Digit].[First Digit].&amp;[1])<br />=[Measures].[First Digit Benford %];<br />// Exception for [All]<br />([Measures].[First Digit Benford % Cumul],[First Digit].[First Digit].[All])=1;<br />Format_String([Measures].[First Digit Benford % Cumul])="Percent"; // format correction</font></p>
<p><font size="1" face="Courier New">// The Kolmogorov–Smirnov statistic (D)<br />Create Member CurrentCube.[Measures].[First Digit Cumul Delta to Benford]<br />as<br />IIF([Measures].[First Digit Count % Cumul]=0,NULL,<br />&nbsp;&nbsp;&nbsp; ABS([Measures].[First Digit Benford % Cumul]-[Measures].[First Digit Count % Cumul]))<br />,Format_String = "Percent", DISPLAY_FOLDER='Benford''s analysis';</font></p>
<p><font size="1" face="Courier New">// The goodness-of-fit Kolmogorov–Smirnov test (SQRT(N)*D)<br />Create Member CurrentCube.[Measures].[First Digit K-S Test]<br />as<br />MAX([First Digit].[First Digit].[First Digit],[Measures].[First Digit Cumul Delta to Benford])<br />*<br />VBA![SQR](([Measures].[First Digit Count],[First Digit].[First Digit].[All]))<br />,Format_String = "#.00", DISPLAY_FOLDER='Benford''s analysis';</font></p>
<p><font size="1" face="Courier New">// -----------------------------<br />// Benford's analysis (END) --<br />//------------------------------</font></p>
<h2>Sample results</h2>
<p>The analysis in tabular form in Excel: </p>
<p><a href="http://www.metrica-bi.de/wp-content/uploads/2015/06/image2.png"><img title="image" style="border-left-width: 0px; border-right-width: 0px; border-bottom-width: 0px; display: inline; border-top-width: 0px" border="0" alt="image" src="http://www.metrica-bi.de/wp-content/uploads/2015/06/image_thumb2.png" width="388" height="545"></a> </p>
<p>The larger the “K-S Test” value, the more likely it is that the data does not obey Benford’s law (see the table of critical values <a href="https://en.wikipedia.org/?title=Benford%27s_law#Statistical_tests">here</a>).</p>
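<p>To make the meaning of the “K-S Test” value more concrete, here is a rough Python sketch of the same calculation (SQRT(N)*D over the cumulative distributions) outside the cube. The function name and the sample counts are invented for illustration, and the critical values are quoted from memory of the table linked above, so please verify them against the source before relying on them.</p>

```python
import math

# Assumption: critical values for the Benford-specific K-S test as listed
# in the linked table. Verify against the source before using them.
CRITICAL_VALUES = {0.10: 1.012, 0.05: 1.148, 0.01: 1.420}

def ks_test_value(counts):
    """Return SQRT(N) * D for first-digit counts listed for the digits 1..9."""
    total = sum(counts)
    observed_cumul = 0.0
    expected_cumul = 0.0
    max_delta = 0.0
    for digit, count in enumerate(counts, start=1):
        observed_cumul += count / total
        expected_cumul += math.log10(1 + 1 / digit)
        # D is the largest gap between the two cumulative distributions.
        max_delta = max(max_delta, abs(expected_cumul - observed_cumul))
    return max_delta * math.sqrt(total)

# Hypothetical counts per leading digit, made up for illustration only.
sample_counts = [301, 176, 125, 97, 79, 67, 58, 51, 46]
value = ks_test_value(sample_counts)
rejected_at = [alpha for alpha, limit in CRITICAL_VALUES.items() if value > limit]
print(f"K-S test value: {value:.2f}, rejected at levels: {rejected_at}")
```

<p>Unlike the cube script, this sketch does not return NULL for digits without data. It is only meant to show how the statistic is compared with the critical values.</p>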
<p>&nbsp;</p>
<p>The graphical analysis could look like this:</p>
<p><a href="http://www.metrica-bi.de/wp-content/uploads/2015/06/image3.png"><img title="image" style="border-left-width: 0px; border-right-width: 0px; border-bottom-width: 0px; display: inline; border-top-width: 0px" border="0" alt="image" src="http://www.metrica-bi.de/wp-content/uploads/2015/06/image_thumb3.png" width="405" height="251"></a> </p>
<p>Here the data for France shows a pretty good fit (K-S Test = 1.19).</p>
<p>&nbsp;</p>
<h2>Implementation notes</h2>
<ul>
<li>The MDX here is not performance optimized. You can start by using precalculated constants for [First Digit Benford %].</li>
<li>You can make some of the calculated measures invisible, for instance the cumulative ones.</li>
<li>Here we made only one measure available for Benford analysis. To analyse further measures/columns, you have to define more keys for the dimension First Digit in your tables, add a selector dimension for [First Digit Count] (unrelated to the data), and do the switching in your cube script. If you have more than one measure/column to analyse per measure group, you have to define role-playing dimensions based on the dimension First Digit.</li>
</ul>
<h2>Warning </h2>
<p><em>Of course, you should not expect every data set to obey Benford’s law. Please refer to the corresponding topics and go through the references </em><a href="https://en.wikipedia.org/?title=Benford%27s_law"><em>here</em></a><em>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.metrica-bi.de/fraud-analysis-with-ssas-benfords-law-test-in-olap-cubes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
