dgMaster: data generator, simple.

Developer's guide - 2.2 Data generator class

After writing the form that allows the generator to be fine-tuned, the next step is to create the data generator class itself. Each data generator class needs to implement a simple interface, IRandomiserFunctionality:

/*
 * IRandomiserFunctionality.java
 */

package generator.extenders;

/**
 * Provides the methods that need to be implemented by a data generator class.
 */
public interface IRandomiserFunctionality
{
    /*
     * Passes the panel information on to the actual data generator class.
     * A RandomiserInstance object contains the choices the user made via the panel
     * The method is called immediately after the implementing class's constructor
     */
    public void setRandomiserInstance(RandomiserInstance ri);
    
    /*
     * Generates the actual data. dgMaster knows how to type-cast the object by 
     * looking into the xml data in SystemDefinitions.xml. If the returned object
     * is to be used in a text file, dgMaster uses the toString() method of the object
     * to retrieve the textual representation of the object.
     */
    public Object generate();
    
    
    /*
     * Performs any tidying up at the end of a data-generation session.
     */
    public void destroy();
}

Whenever a data-generation session starts, dgMaster does the following for every data generator that participates in it. It calls setRandomiserInstance exactly once, immediately after invoking the generator's constructor. It then calls the generate method in a round-robin fashion, as many times as requested. Finally, it calls destroy at the end of the data-generation session (fig.1).

 

 

 

Fig.1: Lifecycle of a data generator class.
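
The lifecycle in fig.1 can be sketched in a few lines of Java. This is only an illustration and not dgMaster's actual driver code: RandomiserInstance and IRandomiserFunctionality are reduced to stand-ins so that the example compiles on its own, and CountingGenerator and LifecycleSketch are hypothetical names introduced here. The sketch also assumes that every generator is configured before the first value is generated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

// Stand-in for generator.extenders.RandomiserInstance (assumption: only
// getProperties() is modelled here).
class RandomiserInstance {
    private final LinkedHashMap<String, String> properties = new LinkedHashMap<>();
    LinkedHashMap<String, String> getProperties() { return properties; }
}

// Stand-in with the same three methods as the interface shown earlier.
interface IRandomiserFunctionality {
    void setRandomiserInstance(RandomiserInstance ri);
    Object generate();
    void destroy();
}

// Hypothetical generator that simply records each lifecycle call.
class CountingGenerator implements IRandomiserFunctionality {
    private final String name;
    private final List<String> calls;
    private int count = 0;

    CountingGenerator(String name, List<String> calls) {
        this.name = name;
        this.calls = calls;
    }

    public void setRandomiserInstance(RandomiserInstance ri) {
        calls.add(name + ".setRandomiserInstance");
    }

    public Object generate() {
        calls.add(name + ".generate");
        return name + "-" + (count++);
    }

    public void destroy() {
        calls.add(name + ".destroy");
    }
}

// Hypothetical driver: configure once, generate round-robin, destroy once.
class LifecycleSketch {
    static List<String> run(int rows) {
        List<String> calls = new ArrayList<>();
        List<IRandomiserFunctionality> generators = new ArrayList<>();
        generators.add(new CountingGenerator("A", calls));
        generators.add(new CountingGenerator("B", calls));

        // setRandomiserInstance: exactly once per generator
        for (IRandomiserFunctionality g : generators) {
            g.setRandomiserInstance(new RandomiserInstance());
        }
        // generate: one value per generator per row, in round-robin order
        for (int row = 0; row < rows; row++) {
            for (IRandomiserFunctionality g : generators) {
                g.generate();
            }
        }
        // destroy: once per generator, at the end of the session
        for (IRandomiserFunctionality g : generators) {
            g.destroy();
        }
        return calls;
    }

    public static void main(String[] args) {
        for (String call : run(2)) {
            System.out.println(call);
        }
    }
}
```

Running the sketch prints the recorded calls in order, which makes it easy to see that configuration and clean-up happen once per generator while generate is interleaved across generators.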

 

 

Next, the actual data generator class for the "full name" generator is shown:

/*
 * FullnameRandomiser.java
 */

package generator.randomisers;

import generator.misc.Utils;
import java.io.FileNotFoundException;
import java.util.LinkedHashMap;
import java.util.Random;
import java.util.Vector;
import org.apache.log4j.Logger;
import generator.extenders.IRandomiserFunctionality;
import generator.extenders.RandomiserInstance;

/*
 * Each generator needs to implement the IRandomiserFunctionality interface.
 */
public class FullnameRandomiser implements IRandomiserFunctionality
{
    Logger logger = Logger.getLogger(FullnameRandomiser.class);
    int nulls; //percentage of nulls
    
    //these will be retrieved from the RandomiserInstance
    boolean lTitle, lFirstName, lFirstNameFull, lFirstInitial, lMiddle, lLastname;
    Random nullGen, gen1, gen2, gen3, gen4;
    
    //vectors to load the dictionaries and vector sizes...
    Vector<String> vFirstNames;
    Vector<String> vLastNames;
    Vector<String> vTitles;
    int iFNames, iLNames;
    
    
    //called immediately after the constructor. I have kept the constructors empty
    //as I want dgMaster to create the objects quickly. loadDictionaries could
    // have been part of the constructor though...
    public void setRandomiserInstance(RandomiserInstance ri)
    {
        LinkedHashMap hashMap;
        String sNull, sTitle, sFirstName, sFirstNameFull, sFirstInitial, sMiddle, sLastname;
        
        //add the titles...
        //Mr and Dr are special cases as explained later,
        //they do not have to be added here...
        vTitles = new Vector<String>();
        vTitles.add("Mrs");
        vTitles.add("Ms");
        vTitles.add("Miss");
        
        //retrieve the hashmap from RandomiserInstance
        hashMap = ri.getProperties();
        
        //extract the values from the hashmap
        sNull         = (String)hashMap.get("nullField");
        sTitle        = (String)hashMap.get("includeTitle");
        sFirstName    = (String)hashMap.get("includeFirstName");
        sFirstNameFull= (String)hashMap.get("firstNameFull");
        sFirstInitial = (String)hashMap.get("firstNameInitial");
        sMiddle       = (String)hashMap.get("includeInitialMiddle");
        sLastname     = (String)hashMap.get("includeLastName");
        try
        {
            nulls = Integer.parseInt(sNull);
            lTitle        = Boolean.valueOf(sTitle);
            lFirstName    = Boolean.valueOf(sFirstName);
            lFirstNameFull= Boolean.valueOf(sFirstNameFull);
            lFirstInitial = Boolean.valueOf(sFirstInitial);
            lMiddle       = Boolean.valueOf(sMiddle);
            lLastname     = Boolean.valueOf(sLastname);
        }
        catch(Exception e)
        {
            logger.error("Error setting the hashmap values", e);
        }
        
        //now load the dictionaries (english firstnames and lastnames)
        loadDictionaries();
        
        //nulls generator
        nullGen = new Random();
        
        //firstname, lastname, titles generators randomisers,
        //could have as well used the same one...
        gen1 = new Random();
        gen2 = new Random();
        gen3 = new Random();
        gen4 = new Random();
    }
    
    
    //loads data from two standard dictionaries provided with dgMaster:
    // english first names and english last names.
    private void loadDictionaries()
    {
        Utils utils = new Utils();
        logger.debug("Loading files");
        try
        {
            vFirstNames = utils.readFile("..\\GenGUI\\resources\\en_firstnames.txt");
            vLastNames  = utils.readFile("..\\GenGUI\\resources\\en_lastnames.txt");
        }
        catch(FileNotFoundException fnfe)
        {
            logger.error("Error loading files...", fnfe);
        }
        iFNames = vFirstNames.size();
        iLNames = vLastNames.size();
        logger.debug("Loading files.Done");
    }
    
    
    
    public Object generate()
    {
        
        int i1 = gen1.nextInt(iFNames);
        int i2 = gen2.nextInt(iLNames);
        
        //we have three female titles
        int i3 = gen3.nextInt(3);
        
        //a small probability that someone has the Dr. title
        double drProb = gen3.nextDouble();
        double midlInitProb = gen4.nextDouble();
        
        //check the nulls
        int nullProb = nullGen.nextInt(100);
        if(nullProb<nulls)
            return null;
        
        String aSplitNames[], sFname;
        String sFirstName, sFirstInitial, sLastName, sMiddleInitial, sex, title;
        String returnValue="", space="";
        //form all the data and then include only the ones we need
        
        //get firstname and sex: Firstname,sex (e.g. Michael,m)
        sFname = vFirstNames.elementAt(i1);
        aSplitNames = sFname.split(",");
        sFirstName  = aSplitNames[0];
        sex         = aSplitNames[1];
        sFirstInitial = sFirstName.substring(0,1);
        
        sLastName      = vLastNames.elementAt(i2);
        sMiddleInitial = vLastNames.elementAt(i2).substring(0,1) + ".";
        if(sex.equalsIgnoreCase("m"))
            title = "Mr";
        else
            title = vTitles.elementAt(i3);
        
        //there will only be a small probability that someone's title is Dr
        if(drProb<=0.3)
            title = "Dr";
        
        //there is a fairly high probability that a person will not have a middle initial
        if(midlInitProb<=0.85)
            sMiddleInitial="";
        
        
        //now we have all the data we want, we can form the full name as requested by the user
        returnValue=""; space="";
        if(lTitle)
            returnValue = title + " ";
        
        if(lFirstName)
            if(lFirstNameFull)
                returnValue+= sFirstName + " ";
            else
                returnValue+= sFirstInitial + ". ";
        
        if(lMiddle && sMiddleInitial.length()>0)
            returnValue += sMiddleInitial + " ";
        
        if(lLastname)
            returnValue += sLastName;

        return returnValue.trim();
    }
    
    
    public void destroy()
    {
        //not anything to do here with this generator
    }
    
}

The code above basically does three things:

  1. Receives the RandomiserInstance object, which represents the user's preferences, via setRandomiserInstance(). The RandomiserInstance object contains the hash map with the options the user specified in the related panel. Their values are extracted, the dictionaries are loaded and the random generators are instantiated. Admittedly, some of this work could have been done in the constructor. At present, dgMaster updates its progress bar after calling the setRandomiserInstance method of each data generator, so I have put all of the operations in that method. Nothing prevents you from using the constructor for the usual initialisation, but you still have to use setRandomiserInstance to extract the hash map values.
  2. Generates the actual data. Notice that I have made the generator produce fewer Dr titles, as well as a lower percentage of middle initials (even when the user selected that middle initials should be included). Of course, these percentages could also be user-defined. If you think a certain option is worth adding to your panel, you can include it as well.
  3. Performs any necessary tidying up. This method is called at the end of a data-generation session; however, there is nothing to tidy up in this example.
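
As a rough illustration of the point about user-defined percentages, the sketch below reads a percentage from the properties map and applies it with the same kind of check the generator uses for nulls. The key "drPercentage", the class PercentageOption and its two methods are hypothetical names chosen for this example; they are not part of dgMaster, and the panel would have to store such a value before the generator could read it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of dgMaster.
class PercentageOption {

    // Reads an integer percentage stored as a string under the given key.
    // Falls back to the supplied default when the key is missing or the
    // value is not a number, and clamps the result to the range 0..100.
    static int readPercentage(Map<String, String> props, String key, int fallback) {
        String raw = props.get(key);
        if (raw == null) {
            return fallback;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return Math.max(0, Math.min(100, value));
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Returns true in roughly 'percentage' out of 100 calls,
    // mirroring the nullProb < nulls check in the generator above.
    static boolean hit(Random rnd, int percentage) {
        return rnd.nextInt(100) < percentage;
    }

    public static void main(String[] args) {
        // "drPercentage" is an assumed key that a panel could store.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("drPercentage", "5");

        int drPercentage = readPercentage(props, "drPercentage", 30);
        Random rnd = new Random();
        String title = hit(rnd, drPercentage) ? "Dr" : "Mr";
        System.out.println(title);
    }
}
```

In the full name generator, a value read this way in setRandomiserInstance could take the place of the fixed 0.3 threshold used in generate.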

The next section looks at another case study (a Spanish names generator [PENDING]). The tutorial then explains how you can synchronise generators so that they appear to generate inter-related data. This is important in cases such as the email generator, where you would want the generated email addresses to have usernames that resemble the output of your full names generator.

[PENDING]

