MailSentry - rewriting a ruby mail analysis tool in erlang — The Idea Forge

MailSentry - rewriting a ruby mail analysis tool in erlang

I’m going to become an Erlang ninja, and rewriting some of my old Ruby code that could use some parallelizing seems like a good way to start. I’ll begin with a description of the project and why I think Erlang would do a better job than Ruby, and then I’ll set out to prove myself right!

Mail Sentry (Ruby Version)

Years ago I had a grand vision of an email analysis system for hospitals that would scan all outbound email for patient health information or other sensitive data. The system could be configured to block offending mail and notify the privacy office, or to route the message through a challenge-response process that makes the recipient view it over a secure connection rather than letting it go out unsecured.

So I built it using Ruby. It’s never seen any “real world” action, but it works. The gist of it is this:

You take all the sensitive data you have (patient names, addresses, medical record numbers, etc.), and it’s tokenized, hashed and loaded into an in-memory DB.
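
A minimal sketch of that loading step, assuming SHA-256 as the hash and a plain Ruby Hash as the in-memory DB. The helper names (`tokenize`, `hash_token`, `build_index`) and the sample records are hypothetical and not taken from MailSentry's actual code:

```ruby
require 'digest'
require 'set'

# Hypothetical helper: split a value into lowercase alphanumeric tokens.
def tokenize(text)
  text.to_s.downcase.scan(/[a-z0-9]+/)
end

# Hypothetical helper: hash a token so the raw value is not kept in the index.
def hash_token(token)
  Digest::SHA256.hexdigest(token)
end

# Build the in-memory index: hashed token => set of [patient_id, field] pairs.
def build_index(records)
  index = Hash.new { |hash, key| hash[key] = Set.new }
  records.each do |record|
    record[:fields].each do |field, value|
      tokenize(value).each do |token|
        index[hash_token(token)] << [record[:id], field]
      end
    end
  end
  index
end

# Made-up sample data, for illustration only.
records = [
  { id: 1, fields: { first_name: 'Alice', last_name: 'Smith', mrn: '1001' } },
  { id: 2, fields: { first_name: 'Bob',   last_name: 'Smith', mrn: '1002' } }
]

index = build_index(records)
puts index[hash_token('smith')].size
```

Keeping the field name next to each patient id is a design choice in this sketch: it lets a later step tell a first-name match apart from a medical record number match.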

A message analyzer is added as a filter on your organization’s mail relay (sendmail, zmailer, whatever). It rips apart the message and converts binary attachments (PDF, Word, etc.) into tokenized text; it can even do OCR on scanned images. The result is passed off to a DRb service that hosts the in-memory DB of hashed sensitive data.

The algorithm

The assumption is that a list of just the first names of 100 patients/clients/whatever doesn’t actually violate anyone’s privacy, because it doesn’t uniquely identify any specific patient. So the DRb service builds a list of hits against specific patients/entities. For example, it determines that patient 1 had their first name referenced, patient 2 had their last name referenced, and patient 3 had their first name, last name and social security number referenced.
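
The hit-building step could look roughly like the sketch below. It mirrors the three-patient example, but everything in it is an assumption for illustration: the patients are made up, SHA-256 stands in for whatever hash is really used, SSNs are assumed to be stored as digits only, and `collect_hits` is a hypothetical name.

```ruby
require 'digest'
require 'set'

def hash_token(token)
  Digest::SHA256.hexdigest(token)
end

def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Toy index built from made-up patients: hashed token => [[patient_id, field], ...]
def build_toy_index
  patients = {
    1 => { first_name: 'Alice', last_name: 'Smith', ssn: '111223333' },
    2 => { first_name: 'Bob',   last_name: 'Jones', ssn: '444556666' },
    3 => { first_name: 'Carol', last_name: 'Diaz',  ssn: '123456789' }
  }
  index = Hash.new { |hash, key| hash[key] = [] }
  patients.each do |id, fields|
    fields.each do |field, value|
      tokenize(value).each { |token| index[hash_token(token)] << [id, field] }
    end
  end
  index
end

# Group matches by patient: patient_id => set of fields referenced in the message.
def collect_hits(index, message_text)
  hits = Hash.new { |hash, id| hash[id] = Set.new }
  tokenize(message_text).each do |token|
    index.fetch(hash_token(token), []).each do |patient_id, field|
      hits[patient_id] << field
    end
  end
  hits
end

message = 'Alice asked about the Jones file. Carol Diaz, SSN 123456789, is booked for Tuesday.'
hits = collect_hits(build_toy_index, message)
hits.each { |id, fields| puts "patient #{id}: #{fields.to_a.join(', ')}" }
```

Grouping by patient rather than by token is what makes the privacy judgment possible: a lone first name stays a weak hit, while several fields landing on the same patient add up.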

Those “hits” are passed to a rules engine that allows each institution to define what constitutes a breach of privacy. Pretty much anyone would agree that the hits for patient 3 in the example above are enough to uniquely identify them. The rules also specify the action to perform (allow, deny, log, notify, etc.).
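
To make the rules idea concrete, here is a small sketch in plain Ruby. It is not the actual rules engine: the `Rule` struct, the two sample rules, the severity ordering and the `decide` function are all assumptions chosen for illustration.

```ruby
require 'set'

# Hypothetical rule: if a patient's hits include all required fields, take the action.
Rule = Struct.new(:name, :required_fields, :action)

# Sample rules an institution might define, most specific first (first match wins).
RULES = [
  Rule.new('full name plus SSN', Set[:first_name, :last_name, :ssn], :deny),
  Rule.new('full name',          Set[:first_name, :last_name],       :notify)
]

# Assumed ordering used to pick the strictest action across all patients.
SEVERITY = { allow: 0, log: 1, notify: 2, deny: 3 }

# hits: patient_id => Set of fields referenced in the message.
def decide(hits, rules)
  actions = hits.values.map do |fields|
    rule = rules.find { |r| r.required_fields.subset?(fields) }
    rule ? rule.action : :allow
  end
  actions.max_by { |action| SEVERITY.fetch(action) } || :allow
end

hits = {
  1 => Set[:first_name],
  2 => Set[:last_name],
  3 => Set[:first_name, :last_name, :ssn]
}
puts decide(hits, RULES)
```

In this sketch patients 1 and 2 match no rule, so they default to allow, while patient 3 matches the strictest rule and decides the outcome for the whole message.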

Mail Sentry Architecture

The Erlang re-write

This all works great. So why re-write it in Erlang?

Well, we have many processes (managed by the mail relay) generating analysis requests, but only one DRb service to handle them. If we want to scale beyond a certain point, that single service becomes a bottleneck.

So, here’s the plan: let’s rewrite the “Message Analyzer” as an Erlang server. The Ruby is working perfectly for ripping apart the email, running the rules engine, converting the attachments and so on, so let’s leave that part as Ruby.

If it all works out, we should have a super scalable email analysis tool that can handle an organization of any size, with any amount of sensitive data.

Stay tuned!

12 comments ↓

#1 jazzhammer on 06.12.08 at 10:53 am

an impressive idea, and undertaking. hospitals accept a severe risk of confidentiality breach with current practice. they’ll need something in place before they start getting sued.

#2 joshua noble on 06.15.08 at 6:36 am

If you’re looking to mimic some of the rules functionality from rein in erlang you might be interested in eXAT, which I can’t personally vouch for, but would be worth taking a look at I’d think.

#3 joshua noble on 06.15.08 at 6:39 am

Ok, so after reading some code I realized the rules engine is ERES, which is w/in eXAT and is stored here: http://sourceforge.net/projects/eresye/

#4 Luke Galea on 06.15.08 at 8:08 pm

Thanks Joshua. I was actually thinking I’d leave the rules engine in Ruby since it isn’t very computationally intensive. I’ll definitely take a look at ERES though for academic purposes!

#5 MailSentry released — The Idea Forge on 07.27.08 at 8:51 am

[...] I’m not done re-writing the relevant ruby portions of MailSentry in Erlang yet. But I really should release early, release [...]

