An Approach to Email Categorization with the ME Model
Peifeng LI, Jinhui LI, Qiaoming ZHU
School of Computer Science and Technology, Soochow University
Suzhou, Jiangsu, China, 215006
Abstract
This paper puts forward a hierarchical approach for categorizing emails with the ME model based on their contents and properties. The approach categorizes emails in two phases. First, it divides emails into two sets: a legitimate set and a spam set. Then it categorizes the emails in each set using a different feature selection method. The paper also describes the pre-processing, the construction of features, and the ME model suited to email categorization, all of which are needed to build the categorizer. Experimental results show that the hierarchical approach is more efficient than the previous approach and that feature selection is an important factor affecting the precision of email categorization.

Introduction
With the popularization of the Internet, email has become one of the most popular ways for people to communicate with each other. Email offers timely convenience, but it also burdens users with processing a large variety of messages. Classifying emails into categories is a convenient and efficient way to help people read them. Email categorization (also called email classification) originated from text categorization: it automatically assigns new emails to pre-defined categories based on their contents and properties.

A variety of approaches to email categorization have been put forward in the past few years. Popular approaches include RIPPER (Cohen, 1996; Provost, 1999), Rough Set (Li, et al., 2004), Rocchio (Yang, et al., 2002; Yang, et al., 2003), Naïve Bayes (Yang, et al., 2003; Bekkerman, et al., 2004), SVM (Bekkerman, et al., 2004), Winnow (Zhu, et al., 2005), and neural networks (Clark, et al., 2003). These works proposed useful approaches to email categorization. Nevertheless, most of them were derived from text categorization, so they classify emails as plain text and ignore the differences between plain texts and emails. An email, however, is a semi-structured text: its header carries structured fields that are useful for categorization. Moreover, the most popular approach in email categorization is the Bayesian approach. It meets the requirements of rapidity and dynamics in email categorization, but its precision is relatively low. The SVM approach, on the other hand, usually achieves the highest precision in text categorization. However, it does not satisfy the requirements of rapidity and dynamics in email categorization, because training its categorization model is time-consuming.

Therefore, this paper introduces the ME model (Berger, et al., 1996) into email categorization and puts forward a hierarchical approach that categorizes emails based on their contents and properties, such as “subject”, “sender”, and “receiver”. This paper also discusses other techniques to improve the performance of the categorizer, such as email pre-processing, feature selection, and iteration.

To Pre-process the Email
After analyzing the structure of an email, we divide it into two parts: contents and properties. The contents include the email body and the subject, which constitute the main part of an email. The properties include header fields such as “From”, “Cc”, “To”, “Date”, “SMTP server”, and “Attached files”. The content part is much like plain text, whereas the property part is what distinguishes an email from a plain text.
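The split into contents and properties can be sketched with Python's standard `email` package. This is a minimal illustration under stated assumptions, not the authors' implementation. The dictionary keys, the UTF-8 fallback charset, the choice to read SMTP server information from `Received` headers, and the `field:value` feature strings produced by the hypothetical `property_features` helper are all choices made for this example.

```python
from email import message_from_string
from email.utils import getaddresses


def split_email(raw: str) -> tuple:
    """Split a raw RFC 822 message into (contents, properties) dicts."""
    msg = message_from_string(raw)
    body_parts = []
    attachments = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        filename = part.get_filename()
        if filename:
            attachments.append(filename)  # attached file: a property, not content
        elif part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"  # assumed fallback
            body_parts.append(payload.decode(charset, errors="replace"))
    contents = {"subject": msg.get("Subject", ""), "body": "\n".join(body_parts)}
    properties = {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "cc": msg.get("Cc", ""),
        "date": msg.get("Date", ""),
        # Assumption: SMTP server information is taken from the Received headers.
        "received": msg.get_all("Received") or [],
        "attachments": attachments,
    }
    return contents, properties


def property_features(properties: dict) -> list:
    """Hypothetical feature construction: one string feature per address and name."""
    feats = []
    for field in ("from", "to", "cc"):
        value = properties[field]
        if not value:
            continue
        for name, addr in getaddresses([value]):
            if addr:
                feats.append(f"{field}:{addr.lower()}")
            if name:
                feats.append(f"{field}-name:{name.lower()}")
    if properties["attachments"]:
        feats.append("has-attachment")
    return feats
```

For a message from “Jason Lee <jason@example.com>”, `property_features` yields strings such as `from:jason@example.com` and `from-name:jason lee`. A categorizer could treat these as binary features alongside the content words.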
To Pre-process the Email Contents
The purpose of pre-processing email contents is to delete unused text and to standardize what remains. Because the email format differs from plain text, email categorization requires the following additional pre-processing:
(1) To convert the native encoding to Unicode. There are many schemas for encoding characters; for example, ASCII and ISO 8859 are two popular schemas for encoding phonetic characters. It is therefore necessary to unify the encoding of the characters. Otherwise, the same word in different emails would be regarded as different words because of their different encodings, especially for ideographic characters. In this step, a conversion tool (Li, 2005) is used to recognize the encoding schema and then convert the text to Unicode, the international standard character encoding.
(2) To filter the HTML tags in the email body. Generally, an email body has one of two styles: plain text or HTML. If the email is in HTML format, it should be converted to plain text, because, except for “” and “”, most HTML markup would confuse the categorizer.
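Steps (1) and (2) can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the paper's tool. The conversion tool (Li, 2005) detects the encoding automatically, whereas the hypothetical `to_unicode` helper here simply trusts the charset declared in the MIME header. The hypothetical `strip_html` helper drops all markup, plus `script`/`style` content, and does not model the tag exceptions mentioned in step (2).

```python
from html.parser import HTMLParser
from typing import Optional


def to_unicode(raw: bytes, declared_charset: Optional[str]) -> str:
    """Step (1): decode raw bytes into a Unicode str.

    Assumption: the charset comes from the MIME header; automatic
    encoding detection is not modeled here.
    """
    try:
        return raw.decode(declared_charset or "utf-8")
    except (LookupError, UnicodeDecodeError):
        # Unknown label or wrong charset: fall back to UTF-8, replacing bad bytes.
        return raw.decode("utf-8", errors="replace")


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of script/style elements."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def strip_html(html: str) -> str:
    """Step (2): drop HTML markup, keep the text, and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

For instance, the word “邮件” (“email”) encoded in GB2312 and in UTF-8 gives two different byte strings. Both decode to the same Unicode string, so the categorizer counts them as one word. Joining text chunks with spaces keeps block-level text apart, but it can split a word that inline markup breaks in the middle.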