DEIMS -- Keying datasets with Core Area terms

Update Oct 12

The EMLs now have a core area term where applicable, always according to the info collected in DATASET_TOC.
The migrations are adapted accordingly and ready to roll.
Closing.
=============
Update Oct 6

* Downloaded a copy of 'master-toc-snapshot' and, using Excel, reduced and reworked the contents to about 98 rows.

* New table has these columns: description, table name, id, core area, core area 2, core area 3.

  1. description is more like a data set title.
  2. table name is the database table name.
  3. id is the numerical ID in EML.
  4. core area, core area 2 and core area 3 hold the assigned terms.
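
To make the table layout concrete, here is a minimal Python sketch of reading it into a lookup keyed on the EML id. It assumes the spreadsheet was exported as CSV with the six column names above as headers; the function name and the sample rows are made up for illustration and are not taken from the real table.

```python
import csv
import io

# Column names as listed in the notes; the CSV export itself is an assumption.
CORE_COLUMNS = ("core area", "core area 2", "core area 3")


def load_core_area_map(csv_file):
    """Return {eml id: [core area terms]} built from the reduced table."""
    mapping = {}
    for row in csv.DictReader(csv_file):
        eml_id = (row.get("id") or "").strip()
        if not eml_id:
            continue  # skip rows without an EML id
        terms = mapping.setdefault(eml_id, [])
        for column in CORE_COLUMNS:
            term = (row.get(column) or "").strip()
            if term and term not in terms:
                terms.append(term)
    return mapping


# Made-up sample rows, only to show the shape of the data.
sample = io.StringIO(
    "description,table name,id,core area,core area 2,core area 3\n"
    "Example lake chemistry,example_chem,1001,inorganic matter,organic matter,\n"
    "Example soil biota,example_biota,1002,populations,,\n"
)
core_area_map = load_core_area_map(sample)
print(core_area_map)
```

The resulting dictionary plays the role of an associative array of core area terms keyed on EML id.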

97 data rows in all. From here, there are several options:

1) Tweak the EML: add these terms using, say, Perl. An associative array of core area terms keyed on EML id makes this easy. The question is whether the XML migration can then pick up the terms.
2) Use migrate directly: override the xml-dataset-migration, or tweak in prepare.
3) Add the terms manually post-migration.

For now, I'm using approach 1). I tested the migration with an enhanced EML, and it works. Next is the Perl part, to finish prepping the EMLs.
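
A rough sketch of what the EML enhancement in approach 1) could look like, written in Python rather than Perl purely for illustration. It assumes the terms go into a new keywordSet element under dataset, placed after any existing keywordSet (or after abstract). The function name, the thesaurus label "LTER Core Areas" and the sample document are assumptions, not what the actual script does.

```python
import xml.etree.ElementTree as ET


def add_core_areas(eml_text, eml_id, core_areas, thesaurus="LTER Core Areas"):
    """Return eml_text with a keywordSet holding the core area terms for eml_id.

    core_areas is a dict of EML id -> list of terms. The text is returned
    unchanged when the id has no terms or no <dataset> element is found.
    """
    terms = core_areas.get(eml_id, [])
    root = ET.fromstring(eml_text)
    dataset = root.find("dataset")
    if dataset is None or not terms:
        return eml_text

    keyword_set = ET.Element("keywordSet")
    for term in terms:
        ET.SubElement(keyword_set, "keyword").text = term
    ET.SubElement(keyword_set, "keywordThesaurus").text = thesaurus

    # Insert after the last existing keywordSet, else after abstract,
    # else at the end of <dataset>.
    children = list(dataset)
    position = len(children)
    for tag in ("keywordSet", "abstract"):
        hits = [i for i, child in enumerate(children) if child.tag == tag]
        if hits:
            position = hits[-1] + 1
            break
    dataset.insert(position, keyword_set)
    return ET.tostring(root, encoding="unicode")


# Made-up, heavily simplified EML document and mapping.
sample_eml = (
    "<eml><dataset><title>Example</title>"
    "<abstract>Example abstract</abstract><contact/></dataset></eml>"
)
enhanced = add_core_areas(sample_eml, "1001", {"1001": ["populations"]})
print(enhanced)
```

With real EML files the root element is namespaced, and ElementTree rewrites namespace prefixes it does not know about on output, so the prefix would need registering with `ET.register_namespace` before serializing.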

--------------------
Update Oct 1

Website http://tagmcmdata.lternet.edu revived. Added core areas. Plan B is in the works: there are a ton of tables with core area keywords (see TOC tables).

This is related to the overall keywords issue.

We could perhaps grab datasets with core keys using the 'system of record': https://www.drupal.org/node/1117454

The other option is to modify at the source (EML) and then revise the keyword migration.

Here is what's in the TOCs:

Dataset-
dataset code, dataset name, project, discipline [ used for the sections and subsections ]
General table-
dataset code, description, table name [ used for loose tables, such as lost seal, personnel, news, cocs, etc ]

Several items could ID the dataset, e.g. the table name, part of the xml loc, or the packageid inside the metdt url.

COLUMNS:
dataset code, table name, description, glacier, xml loc, metdt url, web station, web datatype, LTER CORE AREA

Glacier core- [ No Core Areas ]
Glacier short [ 4/6 tagged with "Inrgn Matt"]

Limno core- [9/17 : pop, pp, org matt, inorg matt ]
Limno short- [15/27 : Inorg/org Matt, pop ]

Met [ No Core Areas ]

Soils core [ 12/14, mostly pop, disturb, inorg, org ]
Soils short [ 37/39, mostly pop. also: org matt, disturbance ]

Streams core [ 4/27, inorg and org matter ]
Stream short [ 17/40, inorg and org matter, pop, ppr ]

In all, we have 98/327 tables tagged with core areas.
BTW, MASTER_TOC_SNAPSHOT is a union of them all. All you need is this last table.
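
As a quick cross-check of the tally, the per-section counts above can be added up in a few lines of Python. The numbers are copied from the notes. Glacier core and Met are left out because the notes give no table count for them, so the last figure is only what remains if the 327 total covers exactly these sections, which is an assumption.

```python
# (tagged, total) per TOC section, copied from the notes above.
# Glacier core and Met have no core areas and no table count listed.
section_counts = {
    "Glacier short": (4, 6),
    "Limno core": (9, 17),
    "Limno short": (15, 27),
    "Soils core": (12, 14),
    "Soils short": (37, 39),
    "Streams core": (4, 27),
    "Stream short": (17, 40),
}
ALL_TABLES = 327  # total stated in the notes

tagged = sum(t for t, _ in section_counts.values())
listed = sum(n for _, n in section_counts.values())
remaining = ALL_TABLES - listed  # tables not accounted for by the sections above

print(tagged, listed, remaining)  # 98 170 157
```

The tagged total matches the 98 quoted above.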

------

There is at least one table in the database that contains some column with Core Area terms. We need to analyse how to best use this info to update datasets in DEIMS on migration or post-migration.

Status: 

Priority: 

Normal