talk@lists.collectionspace.org

WE HAVE SUNSET THIS LISTSERV - Join us at collectionspace@lyrasislists.org

Talend / kettle / import Q

Nate Solas
Fri, Feb 17, 2012 3:05 PM

A data import question for the CS implementers, or any smarty who knows.
I've been having good success mapping our data using Talend (Pentaho's wiki
seemed waaay out of date so I didn't look too closely), and it's been going
pretty well. I am, however, totally stuck at the moment trying to deal with
"nested loops" in XML, basically repeating elements. For instance, the
schema for an object, in pseudo-XML:

<schema>
  <title>test object</title>
  ...
  <dimensionGroups>
    <dimensionSet>
      <length>10</length>
      <dimension>width</dimension>
      <units>inches</units>
    </dimensionSet>
    <dimensionSet>
      <length>20</length>
      <dimension>height</dimension>
      <units>inches</units>
    </dimensionSet>
  </dimensionGroups>
  ...
</schema>

In all the output generators for Talend, you can pick one Loop Element to
put each row into (schema), but you can't set up internal loops. There is
some notion of grouping rows and using that for sub-loops, but that would
mean duplicating my entire object schema for each dimension (height, width,
depth, whatever), and I'd possibly hit the same issue for associated people,
title languages, etc.
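
To make the "grouping rows for sub-loops" idea concrete outside of Talend, here is a minimal Python sketch. It assumes the data arrives as flat rows, one per dimension, with the object fields repeated on each row. The `object_id` column and the `objects` wrapper element are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import groupby
from operator import itemgetter

# Hypothetical flat rows: one row per dimension, object fields repeated.
rows = [
    {"object_id": "1", "title": "test object",
     "length": "10", "dimension": "width", "units": "inches"},
    {"object_id": "1", "title": "test object",
     "length": "20", "dimension": "height", "units": "inches"},
]


def rows_to_xml(rows):
    """Group flat rows by object and emit one <schema> element per object."""
    root = ET.Element("objects")  # hypothetical wrapper element
    # groupby only groups adjacent rows, so sort by the key first.
    ordered = sorted(rows, key=itemgetter("object_id"))
    for _, group in groupby(ordered, key=itemgetter("object_id")):
        group = list(group)
        schema = ET.SubElement(root, "schema")
        # Object-level fields are taken from the first row of the group.
        ET.SubElement(schema, "title").text = group[0]["title"]
        groups = ET.SubElement(schema, "dimensionGroups")
        # Each row in the group becomes one repeating <dimensionSet>.
        for row in group:
            dim_set = ET.SubElement(groups, "dimensionSet")
            for name in ("length", "dimension", "units"):
                ET.SubElement(dim_set, name).text = row[name]
    return root


print(ET.tostring(rows_to_xml(rows), encoding="unicode"))
```

The cost is the one described above: every object-level field has to be carried on every row, and each additional repeating group (people, title languages) needs its own grouping pass.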

I can't figure this out at all. My current idea is to generate an entirely
separate XML tree for each repeating section and try to merge them in,
either with placeholders and a post-execute hook, or maybe by passing the
nested XML into the full XML as a "Document" type, but neither of those
seems very clean or likely to work in a scalable, understandable way.

Susan, does Kettle do this, and should I just switch tools? I have the same
problem on import but I'm sort of hacking my way around it. These tools
don't really seem to support the idea of lists within rows (at least at
file generation time), which kind of blows my mind.

Chris Potts, are you solving this another way? Also, before we re-do all
the work, can we take a peek at your "Fine Arts" extensions?

Thanks for any tips.
Nate

Chris Hoffman
Fri, Feb 17, 2012 3:14 PM

Hi Nate,

The nesting groups are a huge problem, I mean challenge. ;-) We are generating separate XML blocks and merging them in afterwards. Yuteh has developed some Java code that is reusable for this purpose. She's written some documentation for it and I'm sure she would be happy to throw it your way.

We'll also be learning how to send these as updates later on.

Later on,
Chris


Talk mailing list
Talk@lists.collectionspace.org
http://lists.collectionspace.org/mailman/listinfo/talk_lists.collectionspace.org

Nate Solas
Fri, Feb 17, 2012 3:47 PM

OK, glad it's not just me. Do you mean merging them in later as in a
separate script, or still as part of a Talend / Kettle job? I'll experiment
a bit and see what I can come up with; I feel like there's got to be a way
to keep things contained in one process...

Thanks,
Nate

Chris Hoffman
Fri, Feb 17, 2012 4:19 PM

Right now I think she runs it as a separate script, but I'm not completely sure. Talend should be able to call it, though.

Yuteh Theresa Cheng
Fri, Feb 17, 2012 5:04 PM

Hi Nate,

The Java code currently runs only outside of Talend (all input/output
filenames and matching/merging tag names are specified on the command
line). I suppose one could create a "java" component inside Talend, but I
didn't spend time investigating that path. Since we always ask Talend's
tAdvancedOutputXml component to create multiple XML files in the main job,
I wasn't sure how much trouble it would be to feed those files into a Java
"merge" component.
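
For readers who want to see the shape of such a merge step, here is a rough Python sketch. It is not the Java code described above, which this thread does not show. It assumes both files share a record element and a key child element to match on. The function name `merge_blocks`, the `id` key, and the `objects` wrapper are hypothetical. The tag-name parameters stand in for what would be passed on a command line.

```python
import xml.etree.ElementTree as ET


def merge_blocks(main_root, block_root, record_tag, key_tag, block_tag):
    """Append repeating blocks from block_root into matching records.

    Records in both trees are matched on the text of their key_tag child.
    Returns the number of blocks that were merged.
    """
    # Index the main records by key so each lookup is a dict access.
    records = {rec.findtext(key_tag): rec for rec in main_root.iter(record_tag)}
    merged = 0
    for wrapper in block_root.iter(record_tag):
        target = records.get(wrapper.findtext(key_tag))
        if target is None:
            continue  # no matching record; skip rather than guess
        for block in wrapper.findall(block_tag):
            target.append(block)
            merged += 1
    return merged


# Hypothetical inputs: a main file and a separately generated block file.
main = ET.fromstring(
    "<objects><schema><id>1</id><title>test object</title></schema></objects>"
)
blocks = ET.fromstring(
    "<objects><schema><id>1</id><dimensionGroups><dimensionSet>"
    "<length>10</length><dimension>width</dimension><units>inches</units>"
    "</dimensionSet></dimensionGroups></schema></objects>"
)

merge_blocks(main, blocks, "schema", "id", "dimensionGroups")
print(ET.tostring(main, encoding="unicode"))
```

Matching on an explicit key rather than on document order means the two files do not have to list records in the same sequence, at the price of requiring a stable identifier in both outputs.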

I'm definitely curious to see the solution Chris Potts has.

Yuteh

Christopher Pott
Mon, Feb 20, 2012 6:20 PM

Hi Nate,

Our implementation wireframes and Fine Arts crosswalk can be found on the CollectionSpace deployments pages at http://wiki.collectionspace.org/display/deploy/SMK . Just ask if you need more information about this.

I gave up using the Talend "AdvancedXMLoutput" module, partly because of the loop issue and partly because I got tired of graphically remapping the relations whenever there were changes. The technique I'm currently using for repeating elements in Talend is described at http://wiki.collectionspace.org/display/deploy/Data+Migration+using+Talend+Open+Studio+-+DRAFT . It shows how to use "aggregate row" to create a (Java) list of objects and then write this list to XML using JAXB. What is not shown is how to package the resulting XML for import, but this is essentially just adding the correct headers and footers.
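
As a language-neutral illustration of that pattern (aggregate rows into a list of objects, serialize the objects, then wrap the result), here is a small Python sketch. It uses dataclasses in place of JAXB-annotated classes, so it is only an analogue of the approach on the wiki page, not a copy of it. The class names, the `object_id` column, and the `records` header and footer are hypothetical and are not the actual CollectionSpace import format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DimensionSet:
    length: str
    dimension: str
    units: str


@dataclass
class CollectionObject:
    title: str
    dimension_sets: List[DimensionSet] = field(default_factory=list)


def aggregate(rows):
    """Collapse flat rows into one CollectionObject per object_id."""
    objects = {}
    for row in rows:
        obj = objects.setdefault(row["object_id"], CollectionObject(row["title"]))
        obj.dimension_sets.append(
            DimensionSet(row["length"], row["dimension"], row["units"])
        )
    return list(objects.values())


def to_xml(obj):
    """Serialize one object, emitting a <dimensionSet> per list entry."""
    schema = ET.Element("schema")
    ET.SubElement(schema, "title").text = obj.title
    groups = ET.SubElement(schema, "dimensionGroups")
    for dim in obj.dimension_sets:
        dim_set = ET.SubElement(groups, "dimensionSet")
        for f in fields(dim):
            ET.SubElement(dim_set, f.name).text = getattr(dim, f.name)
    return ET.tostring(schema, encoding="unicode")


# Placeholder header and footer; the real import wrapper is not shown here.
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n<records>\n'
FOOTER = "</records>\n"


def package(objects):
    """Wrap the serialized objects in the header and footer."""
    return HEADER + "\n".join(to_xml(o) for o in objects) + "\n" + FOOTER


rows = [
    {"object_id": "1", "title": "test object",
     "length": "10", "dimension": "width", "units": "inches"},
    {"object_id": "1", "title": "test object",
     "length": "20", "dimension": "height", "units": "inches"},
]
print(package(aggregate(rows)))
```

Keeping the list on the object means additional repeating groups become additional list fields, rather than extra copies of the whole row.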

We should talk about the possibility of reusing/sharing some Talend output modules (generating collectionspace import files), but we'd have to agree on a best practice before we start creating something suitable for sharing. I can also send our current migration job(s) if it helps, although it may take some explaining (the above example is neater than the current reality).

Regards,

Chris


Fra: talk-bounces@lists.collectionspace.org [mailto:talk-bounces@lists.collectionspace.org] På vegne af Nate Solas
Sendt: 17. februar 2012 16:06
Til: talk@lists.collectionspace.org
Emne: [Talk] Talend / kettle / import Q

A data import question for the CS implementers, or any smarty who knows. I've been having good success mapping our data using Talend (Pentaho's wiki seemed waaay out of date so I didn't look too closely), and it's been going pretty well. I am, however, totally stuck at the moment trying to deal with "nested loops" in XML, basically repeating elements. For instance, the schema for an object, in pseudo-XML:

<schema> <title>test object</title>

...

<dimensionGroups>
<dimensionSet>

  <length>10</length

  <dimension>width</dimension>

  <units>inches</units>

</dimensionSet>

<dimensionSet>

  <length>20</length

  <dimension>height</dimension>

  <units>inches</units>

</dimensionSet>
</dimensionGroups>

...

</schema>

In all the output generators for Talend, you can pick one Loop Element to put each row into (schema), but can't set up internal loops. There is some notion of grouping rows and using that for sub-loops, but that would mean I'd need to duplicate my entire object schema for each dimension (height, width, depth, whatever) and then also possibly the same issue for associated people, title languages, etc.

I can't figure this out at all. My current idea is to generate an entirely separate XML tree for each repeating section and try to merge them in, either with placeholders and a post-execute hook, or maybe by passing the nest XML into the full XML as a "Document" type, but either of those seem very clean or likely to work in a scalable, understandable way.

Susan, does Kettle do this, and should I just switch tools? I have the same problem on import but I'm sort of hacking my way around it. These tools don't really seem to support the idea of lists within rows (at least at file generation time), which kind of blows my mind.

Chris Potts, are you solving this another way? Also, before we re-do all the work, can we take a peek at your "Fine Arts" extensions?

Thanks for any tips.

Nate

NS
Nate Solas
Tue, Feb 21, 2012 2:46 PM

Thanks, Chris, this is a great help! Should have checked the wiki first...
:)

I don't have as much time to mess with the importing scripts this week but
I'll at least try to give some of those techniques a shot. Thanks again,
Nate

On Mon, Feb 20, 2012 at 12:20 PM, Christopher Pott <Christopher.Pott@smk.dk> wrote:

Hi Nate,

Our implementation wireframes and Fine Arts crosswalk can be found on the
CollectionSpace deployments pages at
http://wiki.collectionspace.org/display/deploy/SMK . Just ask if you need
more information about this.

I gave up using the Talend “AdvancedXMLoutput” module partly because of
the loop issue, and because I got tired of graphically remapping the
relations whenever there were changes. The technique I’m currently using
for repeating elements in Talend is described at
http://wiki.collectionspace.org/display/deploy/Data+Migration+using+Talend+Open+Studio+-+DRAFT .
You can see here how it’s possible to use “aggregate row” to create a
(Java) list of objects and then write this list to xml using JAXB. What is
not shown is how to package the resulting xml for import, but this is
essentially just adding the correct headers and footers.

We should talk about the possibility of reusing/sharing some Talend output
modules (generating collectionspace import files), but we’d have to agree
on a best practice before we start creating something suitable for sharing.
I can also send our current migration job(s) if it helps, although it may
take some explaining (the above example is neater than the current reality).

Regards,

Chris



From: talk-bounces@lists.collectionspace.org [mailto:talk-bounces@lists.collectionspace.org] On behalf of Nate Solas
Sent: 17 February 2012 16:06
To: talk@lists.collectionspace.org
Subject: [Talk] Talend / kettle / import Q

A data import question for the CS implementers, or any smarty who knows.
I've been having good success mapping our data using Talend (Pentaho's wiki
seemed waaay out of date so I didn't look too closely), and it's been going
pretty well. I am, however, totally stuck at the moment trying to deal with
"nested loops" in XML, basically repeating elements. For instance, the
schema for an object, in pseudo-XML:

<schema>
  <title>test object</title>
  ...
  <dimensionGroups>
    <dimensionSet>
      <length>10</length>
      <dimension>width</dimension>
      <units>inches</units>
    </dimensionSet>
    <dimensionSet>
      <length>20</length>
      <dimension>height</dimension>
      <units>inches</units>
    </dimensionSet>
  </dimensionGroups>
  ...
</schema>

In all the output generators for Talend, you can pick one Loop Element to
put each row into (schema), but can't set up internal loops. There is some
notion of grouping rows and using that for sub-loops, but that would mean
I'd need to duplicate my entire object schema for each dimension (height,
width, depth, whatever) and then also possibly the same issue for
associated people, title languages, etc.

I can't figure this out at all. My current idea is to generate an entirely
separate XML tree for each repeating section and try to merge them in,
either with placeholders and a post-execute hook, or maybe by passing the
nested XML into the full XML as a "Document" type, but neither of those seems
very clean or likely to work in a scalable, understandable way.

Susan, does Kettle do this, and should I just switch tools? I have the same
problem on import but I'm sort of hacking my way around it. These tools
don't really seem to support the idea of lists within rows (at least at
file generation time), which kind of blows my mind.

Chris Potts, are you solving this another way? Also, before we re-do all
the work, can we take a peek at your "Fine Arts" extensions?

Thanks for any tips.

Nate

