A Summer of Code: The Proposal

Hi
Now let's get to what I am going to work on over the summer. As mentioned in the last post, this one elaborates on my understanding of the project and my ideas for it.
I applied for the project Message Queue based Email Archiver under the GNU Mailman organisation. GNU Mailman is an open source organisation that develops and maintains Mailman, the mailing list management system. Mailing lists are used in more places than one might first think. They're probably being used in the institution and organisation(s) you're enrolled in, in the promotional schemes of big corporations, and so on. You might be a member of one without even realising it (think of some of the spam you receive!). Mailman manages these mailing lists, their moderators, owners, ban lists and so on. It also runs a server to which all mails for registered mailing lists are directed: it validates each mail, holds it for moderation by the list admins if required, calculates the recipients and delivers the mail to their inboxes. Wait, one more thing: it also has an option to archive the mails it sends out to list subscribers, and that's where my project steps in :) .

Right now, the archiving is done something like this -

WAIT! If you're freaked out by the complexity, worry not. I'll explain it in broader terms.
The Mailman server and the archive servers are currently separate, i.e. decoupled, which is a good thing: we keep separate things separate wherever we can.
A mail can be archived by different archives, each through its own archive-specific method (like a POST request, or a mail (inside a mail :D) to the archive server).
Despite such a neat design, we have some issues here. Firstly, the POST request method currently isn't secure enough, so the archive server and the Mailman server need to be on the same subnet (or even the same machine). Secondly, what if our archive server decides to go to sleep for a teeny weeny lil second? BOOM. Message lost, an error code received, and no retryability (right now).
Enter message patterns. These beautiful architectural solutions provide some fascinating ways for machines to pass messages among themselves. They also let us set up a queue of messages between our Mailman server and the archive server, where messages are stored until the archive server consumes them as and when it can (asynchronously). What's more, we can attach as many archive servers, or even other web apps, to these queues without bothering our Mailman server. And the question of whether our mail was received by the archive can now be handled by the message queuing system, saving our Mailman server quite a headache. These are just some of the possibilities that message patterns bring in.
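To make the queue idea a little more concrete, here is a minimal in-memory sketch of a queue sitting between Mailman and its archivers. It is not Mailman code and not any real broker's API: `ArchiveQueue`, `attach`, `publish` and `deliver` are hypothetical names chosen for illustration. Each attached archiver gets its own queue, and a message is only removed once the archiver acknowledges it, so an archiver that naps for a second gets the message again later instead of losing it.

```python
from collections import deque
from typing import Callable, Dict

# Hypothetical consumer type: returns True if the archive accepted the message.
Archiver = Callable[[str], bool]


class ArchiveQueue:
    """Hypothetical in-memory stand-in for a queue between Mailman and archivers."""

    def __init__(self) -> None:
        self.queues: Dict[str, deque] = {}
        self.archivers: Dict[str, Archiver] = {}

    def attach(self, name: str, archiver: Archiver) -> None:
        # Each archiver gets its own queue, so attaching one never touches Mailman.
        self.archivers[name] = archiver
        self.queues[name] = deque()

    def publish(self, message: str) -> None:
        # Mailman's only job: hand the message to every queue and move on.
        for queue in self.queues.values():
            queue.append(message)

    def deliver(self, name: str) -> int:
        """Try to drain one archiver's queue; return how many were acknowledged."""
        queue, archiver = self.queues[name], self.archivers[name]
        acked = 0
        while queue:
            # Peek instead of popping: the message stays queued until acknowledged.
            if not archiver(queue[0]):
                break  # archiver is down; keep the message for a later retry
            queue.popleft()
            acked += 1
        return acked
```

If one archiver is offline when `deliver` runs, its messages simply wait in its own queue, while other archivers drain theirs independently. A real broker adds persistence and network transport on top of this basic acknowledge-before-remove behaviour.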

My project involves implementing an interface to accommodate such message queue based systems in our current system, and then implementing message patterns in various backends. The choice of backend depends on the features it provides. I investigated some backends while studying the project and found RabbitMQ quite good for the job, with ZeroMQ and Redis as other possible good candidates. A short note I made on these in my proposal is as follows -
"

RabbitMQ - It offers flexible and customizable routing options between publishers and message queues. This could be helpful for publishing to a select subset of the subscribers listening to our publisher (based on the mailing list). Its reliability features also suit our requirements, as it provides both acknowledgements and personalised queues for each subscriber.
ZeroMQ - Unlike the others, it can run without a dedicated message broker. It basically provides sockets that can be configured into custom message patterns, and can thus be tailored to specific needs. Though highly customizable, it might require explicitly implementing a few features.

Redis - There was some initial confusion about the lack of any reliable way to transfer messages, but after I approached the Redis community for solutions, I found quite a nice solution in this blog post. It offers features like persistence to disk and per-subscriber message reference queues, although I found no approach to provide retryability in case of failure. Overall, it is a lightweight broker that can be used to provide reliability.

"
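As a rough illustration of the interface idea together with RabbitMQ-style routing, the sketch below defines a hypothetical `MessageQueueBackend` interface and a `QueueArchiver` adapter that publishes each message to a backend instead of POSTing it to an archive server. The adapter's `archive_message` method is only loosely modelled on Mailman's archiver interface (here the list is a plain `list_id` string), and all class names are assumptions. The toy in-memory backend imitates the matching rules of an AMQP topic exchange, where `*` matches exactly one dot-separated word and `#` matches zero or more, so an archive can bind to every list (`list.#`) or to a selected one.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List, Tuple


def topic_matches(pattern: str, key: str) -> bool:
    """AMQP topic-style matching: '*' = exactly one word, '#' = zero or more words."""
    def match(p: List[str], k: List[str]) -> bool:
        if not p:
            return not k
        if p[0] == "#":
            # '#' may swallow zero words, or consume one word and try again.
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), key.split("."))


class MessageQueueBackend(ABC):
    """Hypothetical interface each backend (RabbitMQ, Redis, ZeroMQ...) would implement."""

    @abstractmethod
    def publish(self, routing_key: str, body: bytes) -> None: ...


class InMemoryTopicBackend(MessageQueueBackend):
    """Hypothetical toy backend routing messages to bound queues like a topic exchange."""

    def __init__(self) -> None:
        self.bindings: List[Tuple[str, str]] = []  # (queue name, binding pattern)
        self.queues: Dict[str, List[bytes]] = defaultdict(list)

    def bind(self, queue: str, pattern: str) -> None:
        self.bindings.append((queue, pattern))

    def publish(self, routing_key: str, body: bytes) -> None:
        # A queue receives a message at most once, even if several bindings match.
        matched = {q for q, pattern in self.bindings if topic_matches(pattern, routing_key)}
        for queue in matched:
            self.queues[queue].append(body)


class QueueArchiver:
    """Hypothetical archiver adapter: publishes messages instead of POSTing them."""

    def __init__(self, backend: MessageQueueBackend) -> None:
        self.backend = backend

    def archive_message(self, list_id: str, message: bytes) -> None:
        # The routing key carries the list, so consumers can pick the lists they want.
        self.backend.publish(f"list.{list_id}", message)
```

With a real broker, the routing would be configured on the broker itself (for RabbitMQ, a `topic` exchange with queue bindings), and a backend class would only wrap the client library, keeping the archiver adapter unchanged.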

Following is the timeline I proposed after much deliberation with myself, my mentors, and some seniors who had worked on such projects before.

Timeline:

Till May 10	Find out more about message patterns and understand the code architecture of Mailman and the archiver interface. Get familiar with ‘tox’, the testing tool.
May 10 - May 22	Find out more about the various available backends and the message patterns each supports. Discuss and finalise architectural designs for the messaging system with the mentors.
May 23 - June 1	Implement a plugin for IArchiver supporting multiple backends. Test the plugin with dummy messaging systems.
June 2 - June 12	Implement a message queue system for a viable backend. Complete the minimum viable product (MVP).
June 13 - June 19	Write unit tests for the MVP. Ask for community review of the product. Catch up with any leftover tasks.
June 20 - June 27	Submit code for mid-term evaluation. Write basic documentation for the MVP. Reconsider the architecture of the message queues and the plugin.
June 28 - July 8	Improve the message queue system based on the evaluation. Research additional backends and message patterns viable for the job.
July 9 - July 25	Discuss and implement more suitable message queuing patterns. Implement support for suitable alternative backends. Attempt to incorporate list events into the messaging system.
July 26 - August 1	Refactoring week. Seek community review of the existing product and architectural design. Write integration tests.
August 2 - August 13	Complete any leftover work. Write documentation. Make improvements based on the mentors' reviews. Attempt to incorporate extensible metrics (as mentioned in my proposal). Write any remaining tests (system testing). Complete any unachieved milestones.
August 14 - August 20	Tidy up the code. Complete documentation. Perform rigorous testing of the system. Submit for final evaluation.

If I am able to complete the intended tasks of the project, I intend to incorporate events such as list creation, moderator/owner assignment and other list events into a larger, more general archiving system. One approach I think would work well is to implement a handler, or a system of handlers, that directs such list events to the archive servers.
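One possible shape for such a system of handlers, sketched under assumptions: the event names (`list-created`, `moderator-assigned`) and the `ListEvent` and `ListEventDispatcher` classes are hypothetical and not part of Mailman. Handlers register for an event kind, and the dispatcher forwards each event to every matching handler. Handlers registered for `"*"` receive every event, which is how a general archive (for example, one that publishes events to the same queue the archivers read from) could follow all list activity.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, Dict, List


@dataclass
class ListEvent:
    kind: str     # hypothetical event name, e.g. "list-created"
    list_id: str
    data: Dict[str, str] = field(default_factory=dict)


Handler = Callable[[ListEvent], None]


class ListEventDispatcher:
    """Hypothetical dispatcher routing list events to registered handlers."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register(self, kind: str, handler: Handler) -> None:
        # Use kind "*" to receive every event, e.g. for a general archive.
        self.handlers[kind].append(handler)

    def dispatch(self, event: ListEvent) -> int:
        """Call the handlers for this kind plus the catch-all ones; return the count."""
        targets = self.handlers[event.kind] + self.handlers["*"]
        for handler in targets:
            handler(event)
        return len(targets)
```

Keeping the dispatcher separate from the handlers means new event kinds or new archive destinations can be added by registering another handler, without changing the code that emits the events.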

I hope I have explained the objectives of this project well. Please feel free to comment here or contact me by email if you have any suggestions or questions.
Cheers

A Summer of Code

Tuesday, May 3, 2016

The Proposal - Explained
