{"id":187,"date":"2007-05-17T17:44:31","date_gmt":"2007-05-17T23:44:31","guid":{"rendered":"http:\/\/fraudbump.wordpress.com\/?p=23"},"modified":"2007-05-17T17:44:31","modified_gmt":"2007-05-17T23:44:31","slug":"applying-google-spellchecker-principle-in-detecting-online-fraud-2","status":"publish","type":"post","link":"https:\/\/amirbekian.com\/blog\/2007\/05\/applying-google-spellchecker-principle-in-detecting-online-fraud-2\/","title":{"rendered":"Applying &#8220;Google spellchecker&#8221; principle in detecting online fraud"},"content":{"rendered":"<p>One of the ways bad guys manage to penetrate\/influence a web site&#8217;s functionality &#8211; is &#8220;poking around&#8221; by hitting different pages &#8211; often on different geolocations (e.g. instead of XYZ.com &#8211; country specific sites XYZ.de, XYZ.ca etc.) &#8211; coupled with &#8220;playing&#8221; with input parameters &#8211; thus looking for input validation breaches or other site inconsistencies. If successful, bad guys can do a lot of harm &#8211; including manipulation of data (e.g. changing a user&#8217;s state by following some quixotic page sequence), stealing information and so on.<\/p>\n<p>Such breaches could be detected at an early stage using a technique I call the &#8220;Google spellchecker&#8221; approach. Anybody who has used Google to check the spelling of a word, or the right collocation\/phrase, knows the underlying principle. It is (paraphrasing eBay&#8217;s motto) &#8220;people are basically educated&#8221;. That is, if we have 5 million hits for one spelling and 5 thousand for the &#8220;competitor&#8221; spelling, then the former is the correct one. 
(BTW, that is one of the basic principles of linguistics: if enough people say &#8216;nucelar&#8217;, it automatically becomes a legitimate word.)<\/p>\n<p>The same principle can be applied to detecting bad behavior:<\/p>\n<ol>\n<li>assign each page a unique ID (normal practice)<\/li>\n<li>define the boundaries of individual user sessions<\/li>\n<li>record the sequence of pages hit during each session, <em>e.g. 23 (login), 887 (account setting landing page), 368 (account setting confirmation), 99 (logout)<\/em>; in other words, create a &#8220;page trail&#8221; for each session<\/li>\n<li>at the end of each session, increment the count of how many times that particular trail has appeared on the radar, <em>e.g. 23,887,368,99 -&gt; 1035 times<\/em><\/li>\n<\/ol>\n<p>Leave the system to bake for some time. Assuming that most people use the site for legitimate purposes, the counts will eventually reflect the &#8220;normal&#8221; usage of the site. Maintaining that information helps detect abnormal usage (<em>e.g. jumping to 368 &#8220;account setting confirmation&#8221; without hitting 887 &#8220;account setting landing page&#8221;<\/em>) very soon after the &#8220;probe&#8221; is made, because a trail that has rarely or never been seen stands out against the common ones. It is important to detect this early: if the hole becomes widely abused, its sequence may approach the &#8220;normality&#8221; level. We should also have some safeguards to avoid false positives: e.g. if a new page is added to the site, we want to know about it (e.g. by keeping page age information) and treat it as an exception.<\/p>\n<p>Naturally, the approach is not bulletproof (hardly any is). Indeed, if fraudsters are sophisticated enough, they could mask their behavior by mimicking legitimate sequences, or by trying to make session tracking more difficult. 
Nevertheless, that would seriously complicate their lives, putting another &#8220;bump&#8221; in their way, so the goal of slowing them down would be fully achieved.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the ways bad guys manage to penetrate\/influence a web site&#8217;s functionality &#8211; is &#8220;poking around&#8221; by hitting different pages &#8211; often on different geolocations (e.g. instead of XYZ.com &#8211; country specific sites XYZ.de, XYZ.ca etc.) &#8211; coupled with &#8220;playing&#8221; with input parameters &#8211; thus looking for input validation breaches or other site inconsistencies. &hellip; <a href=\"https:\/\/amirbekian.com\/blog\/2007\/05\/applying-google-spellchecker-principle-in-detecting-online-fraud-2\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Applying &#8220;Google spellchecker&#8221; principle in detecting online fraud<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[9,14],"class_list":["post-187","post","type-post","status-publish","format-standard","hentry","category-fresh-ideas","tag-cyber-security","tag-input-validation"],"_links":{"self":[{"href":"https:\/\/amirbekian.com\/blog\/wp-json\/wp\/v2\/posts\/187","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/amirbekian.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/amirbekian.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/amirbekian.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/amirbekian.com\/blog\/wp-json\/wp\/v2\/comments?post=187"}],"version-history":[{"count":0,"href":"https:\/\/amirbekian.com\/blog\/wp-json\/wp\/v2\/posts\/187\/revisions"}],"wp:attachment":[{"href":"https:\/\/amirbekian.com\/blog\/wp-json\/wp\/v2\/media?pa
rent=187"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/amirbekian.com\/blog\/wp-json\/wp\/v2\/categories?post=187"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/amirbekian.com\/blog\/wp-json\/wp\/v2\/tags?post=187"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}