OpenVPMS Loader Kettle plugin documention

I have been unable to find any documention on using the OpenVPMS Loader Kettle plugin.  Hence I have started to write some notes on its use.

I would appreciate and comments, clarifications, corrections and additions. '???' mark things that I either know noting about or am unsure (and no ??? does not mean it is correct - just that I think it is ;-).

The OpenVPMS Loader Kettle plugin allows mass data to be loaded into the OpenVPMS database. 
Since OpenVPMS uses an entity-relationship architecture, rather that the traditional relational database design, one needs a tool that understands the archetypes used by the system. Also, because one does not have the table record keys present in a relational database, there is the potentional problem that having say loaded customer Fred into the system, one cannot at a later time, load in Fred's pets - because Fred's identity number is not known (because it was generated internally by the loader).  The Loader plugin provides for this by allowing you to specify an ID field that can be used at a later time to refer to the record previously loaded.  Thus Fred's customer record can be loaded with say ID 'C004236' (with 004236 being his client number in some other database), and this ID can be used later when loading Fred's pets.  This 'external-key to internal-id' data is recorded in the etl-log file in the database.

The Loader is also capable of automatically generating lookup data.  Say that the customer archetype defines a field 'title' where the contents must be present in the title lookup table, then without auto lookup generation, you would have to first check the data set to be loaded to see what values are used for the titles (eg Mr, Ms, Dr, Professor, Mde, Doctor) and then create the title lookup table with these entries. With auto lookup generation, this is not necessary and the Loader will create the necessary entries for you. (However, you do have the problem that if the incomming data quality is poor (with titles like Mr, MR, Mr. and M R) then you will end up with all these in the lookup table - and you will have to use the Replace facility to clean things up.)

The Kettle Transform step definition window contains the following: (obvious things like the OK button are omitted)

ID Field Name:  the name of the field which you want to use as your 'external-key'. This field can be left empty, but you will not be able to later refer to the records being loaded.
The special field name $id can be used to refer to this ID field.

Batch Size: this set the number of records loaded before a commit is done.  Increasing the number increases the performance at the cost of losing data if a abort occurs.  So setting it to say 99999999 will ensure that only one commit is done at the end of the run, but if anything goes wrong nothing will be loaded because all records added will be rolled back out as a result of the transaction failure.

Generate Lookups: ticking this checkbox invokes the auto lookup generation feature discussed above.  Note that this setting applies to all lookup fields in the data set being loaded.  You cannot enable/disable it for specific fields.  Because of this, its worth cleaning up your data as much as possible prior to loading.

Skip Processed: ??? This checkbox tells the loader what to do about pre-existing records with the same 'external-key'. ie if you have loaded Fred's record in a previous run, what happens when you try to load it again.  Ticking the checkbox says to skip the record, not ticking it says to overwrite the record with the new data. ???

The following can be set for each field:
Field Name: the name of the step's input record

Map To: The entry here identifies the field to be loaded.  For simple fields its the archetype name and field name, for example <party.customerperson>firstName.  For collections it will have a format like <party.customerperson>contacts[0]<contact.location>address where the index [0] defines this as the first contact.

Exclude null: ??? If this is set to N, then the field can have a null value.  If set to Y, then if the field has a null value, then ??? what happens ???

Value: If you leave this blank, then the data from the incomming field is put into the 'map to' field.  However, you can enter expressions here.  These will normally operate on the incoming field data in some way as to produce a different value that is put into the 'map to' field.  The incomming field data is referred to using '$value.  For example, for the customer account type we map to <party.customerperson>type[0] and set the Value to <lookup.customerAccountType>$value
??? other examples of entries in the Value column ???

Exclude default objects: ??? what does this do ???

 

I would appreciate any inpu that people have.

 

Regards, Tim

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: OpenVPMS Loader Kettle plugin documention

Feel free to create a new page to document this in the Implementors Handbook at http://www.openvpms.org/implementors-handbook

Generate Lookups:

No longer required. See OVPMS-1240.

Skipped Processed:

If selected, a row that has been processed successfully already will be skipped.

If not selected, all rows will be reprocessed. Care needs to be taken as this can result in duplicate objects being created.

Exclude null:

If "Exclude null" is set to Y, then a null input won't be set on the output field.

If "Exclude null" is set to N, then a null input will be set on the output field, replacing any default value.

Value:

Appointment mapping example

Field name Map to Value
SCHEDULE <act.customerAppointment>schedule[0]<participation.schedule>entity <party.organisationSchedule>$value
APPOINTID <act.customerAppointment>schedule[0]<participation.schedule>act <act.customerAppointment>$value
APPTYPE <act.customerAppointment>appointmentType[0]<participation.appointmentType>entity <entity.appointmentType>$value
APPOINTID <act.customerAppointment>appointmentType[0]<participation.appointmentType>act <act.customerAppointment>$value
CLIENTID <act.customerAppointment>customer[0]<participation.customer>entity <party.customerperson>$value
APPOINTID <act.customerAppointment>customer[0]<participation.customer>act <act.customerAppointment>$value
PATIENTID <act.customerAppointment>patient[0]<participation.patient>entity <party.patientpet>$value
APPOINTID <act.customerAppointment>patient[0]<participation.patient>act <act.customerAppointment>$value

Exclude default objects:

As of OpenVPMS 1.5, the "Exclude default objects" column is no longer required. See OVPMS-1239 for details.

Re: OpenVPMS Loader Kettle plugin documention

Tim - thanks for this.

Yes I will be happy to clean this up with your corrections and add it to the Implementors Handbook.

Would there be any objections if I added a new top page called say "Notes from an RxWorks conversion", and then add the above and any other stuff I generate as sub pages under that.  In this way all my input will be together and people will understand that its not necessarily gospel/the official line - just my ramblings.

Regards, Tim G

Re: OpenVPMS Loader Kettle plugin documention

Call the top level page "OpenVPMS Loader Kettle plugin" or "OpenVPMS Loader plugin for Pentaho Kettle" or similar. The stuff you have so far applies to all conversions; if there's any RxWorks specific stuff you want to add, create a sub-page for it. As for ramblings, they look good so far. Any inconsistencies can be edited out.

-Tim

 

Syndicate content