Zu Hauptinhalten wechseln

FV Decipher Unterstützung

Alle Themen, Ressourcen für FV Decipher benötigt.

 
decipher

Managing Data with the Tab Import Script

  Requires Decipher Cloud

This tool is the command line version of the Edit Data feature found in the downloads menu.

1:  Overview

The Tab Import functionality is a mechanism that allows reading in a tab-delimited file and appending, modifying or deleting records in a survey. The Tab Import tool has two main purposes:

  1. Cleaning existing data
  2. Uploading external data into v2

Syntax:

tabimport [-k key] [-c] [-a] [-N] [--nobackup] [--layout=###] [--yes] surveyPath file.txt

2:  How it Works

The Tab Import tool will upload a tab-delimited file into an existing survey and append or modify the existing data. Tab Import will only recognize variables that are seen as output from the targeted survey. In other words, if you are able to see a variable in the downloaded tab-delimited data, you can modify it using Tab Import.

The "variables.xls" file can be used to match data layouts and you can create custom data layouts using the   New Layout option found in the Downloads menu.

Tab Import updates data based on the variables provided in the header of the tab-delimited file. You can upload partial data sets that only include the variables that you need to update and you are not required to provide the entire data set.

3:  Examples

Tab Import is often used to perform the following tasks:

  • Import external data so clients can utilize reporting and production tools
  • Transform data into alternative tables
    • tab-delimited --> Triple-S, SPSS, fixed-width, CB, etc...
  • Recode and clean data
    • update markers
    • delete/disqualify respondents
    • update data
    • remove duplicates based on IP address

A more advanced tool, transform, lets you apply Python code to your variables and qualify/delete respondents.

The examples below are simple and should help to get you familiar with the Tab Import process and end result.

3.1:  Example 1

To learn the Tab Import tool, we'll begin with a simple scenario. Take a look at the following survey, specifically the question variables and setMarker function call:

<?xml version="1.0" encoding="UTF-8"?>
<survey name="Survey"
    alt="Tab Import Ex #1"
    extraVariables="source,list,url,record,ipAddress,userAgent"
    compat="118"
    state="testing"
    setup="time,quota,term"
    unique="">


<samplesources default="1">
  <samplesource list="1" title="default">
    <exit cond="qualified"><b>Thanks again for completing the survey!</b></exit>
    <exit cond="terminated"><b>Thank you for your selection!</b></exit>
    <exit cond="overquota"><b>Thank you for your selection!</b></exit>
  </samplesource>
</samplesources>

<radio label="Q1" title="Are you...">
  <row label="r1">Male</row>
  <row label="r2">Female</row>
</radio>

<number label="Q2" optional="0" size="3" verify="range(1, 120)" title="How old are you?"></number>
<suspend/>

<exec cond="Q2.ival gt 17">setMarker('Age_17+')</exec>

<html label="End" where="survey">
  <p>Thank you!</p>
</html>

<marker name="qualified"/>
</survey>

The survey above will generate the following tab-delimited data:

uuid                markers             status  Q1  Q2   qtime            vos
t4s87amvy4b1wuvu    Age_17+,qualified   3       2   76   30.245002138445  5
b0e6njaww638bgu7    Age_17+,qualified   3       1   91   32.446309568718  12
25a0yt89t5zeznwz    Age_17+,qualified   3       1   29   28.911985701804  12
cgeua8a308bvxe28    Age_17+,qualified   3       2   66   35.590599491062  13
xzxe66dvqj54nytg    qualified           3       1   5    28.386099702762  5
wt4xjkq8aj401gy6    Age_17+,qualified   3       1   72   29.058076265442  4
p9zp7u66197hfb3t    qualified           3       1   5    30.897337177573  13
80fkbz5kzn1zdnxt    Age_17+,qualified   3       1   27   29.251838559104  4
...
...

We can use the Tab Import tool to alter any of the variables listed above (e.g. markers, Q1, Q2, etc...). However, it's unlikely that we would want to touch variables unique to the respondent like record and uuid.

You may have noticed an error in the survey code. We have a set a marker named "Age_17+" for respondents with an age value greater than 17.

<exec cond="Q2.ival gt 17">setMarker('Age_17+')</exec>

We should have named this marker "Age_18+" instead. This is where the Tab Import tool comes in. In addition to modifying the survey code to set the correct maker, we would need to alter the existing data for those respondents who have already received the incorrect marker.

One way to accomplish this task is to download the tab-delimited data, use a "Find & Replace" tool to change "Age_17" to "Age_18" and use the Tab Import script to update the existing data. Below is the modified tab-delimited file we will upload to our project:

uuid                markers
t4s87amvy4b1wuvu    Age_18+,qualified
b0e6njaww638bgu7    Age_18+,qualified
25a0yt89t5zeznwz    Age_18+,qualified
cgeua8a308bvxe28    Age_18+,qualified
xzxe66dvqj54nytg    qualified
wt4xjkq8aj401gy6    Age_18+,qualified
p9zp7u66197hfb3t    qualified
80fkbz5kzn1zdnxt    Age_18+,qualified
...
...

We will use the uuid variable to update each record accordingly for each respondent. We can use any variable that will uniquely identify each respondent.

Since we're only modifying the markers string, it is the only variable we need to included in the file. Note that if the variables are exactly the same, then no modifications will be made, so you may include the entire data set without affecting the data. In our project's directory, we can run the following command to correct the data:

tabimport -kuuid . file.txt

The Tab Import tool provides helpful information that you should review before confirming. If the number of records update and unchanged looks correct, you may proceed. A backup of the data will be made in the data/ folder.

[user@ project1234]$ tabimport -kuuid . file.txt
Assuming . refers to temp/project1234
Importable fields: uuid markers
Skipped fields:    vlist vos vosr14oe vbrowser vbrowserr14oe vmobiledevice vmobileos
    
Rows read from input file:       100
 not processed                     0 
 unmatched deletions               0
 
Rows in results file:            100
 unchanged:                       17
 rewritten:                       83
 deleted:                          0
 created:                          0

Fields processed:               1500
 updated:                         83
 identical:                     1417
 ignored:                          0 (e.g. record, weight)
 erronous:                         0

Continue import? (Y/N) Y

Import completed. Backup in temp/project1234/data/old-results/001.results.*
temp/project1234: markers in data file and in markers database were mismatched and were updated

That's it! We used the -kuuid option to update respondents based on their uuid variable, that way the data matched accordingly. If we download the data again, we'll see that the appropriate marker was modified and "Age_18+" can be seen instead of "Age_17+".

3.2:  Example 2

Building off of the example above, we'll delete all respondents who do not have the "Age_18+" marker. To delete respondents, we need to supply an extra field unique to the Tab Import script called "action".

Using the custom reporting tool, you can generate a tab-delimited file containing only those respondents that do not have the "Age_18+" marker. It should look similar to this:

uuid                markers     status  Q1  Q2   qtime            vos
xzxe66dvqj54nytg    qualified   3       1   5    28.386099702762  5
wt4xjkq8aj401gy6    qualified   3       1   72   29.058076265442  4
p9zp7u66197hfb3t    qualified   3       1   5    30.897337177573  13
...
...

 

To delete these respondents, we'll add an extra column labelled "actions" and "DEL" as the value:

record    uuid                markers     ...  action
105       xzxe66dvqj54nytg    qualified   ...  DEL
108       wt4xjkq8aj401gy6    qualified   ...  DEL
112       p9zp7u66197hfb3t    qualified   ...  DEL
...
...

Similar to the previous example, if we run the Tab Import tool using this file, we'll see some helpful information as to what will happen and we can confirm or deny the process.

[user@ project1234]$ tabimport -kuuid . file.txt 
Assuming . refers to temp/project1234
Importable fields: record uuid date markers status Q1 Q2 qtime start_date source list url ipAddress userAgent dcua session action
Skipped fields:    vlist vos vosr14oe vbrowser vbrowserr14oe vmobiledevice vmobileos


    
Rows read from input file:        17
 not processed                     0 
 unmatched deletions              17
 
Rows in results file:            100
 unchanged:                       83
 rewritten:                        0
 deleted:                         17
 created:                          0

Fields processed:                  0
 updated:                          0
 identical:                        0
 ignored:                          0 (e.g. record, weight)
 erronous:                         0

Continue import? (Y/N) Y

Import completed. Backup in temp/project1234/data/old-results/002.results.*
temp/project1234: markers in data file and in markers database were mismatched and were updated

The result shows us that 17 records were deleted and 83 records were unchanged.

4:  Options & Modifiers

The Tab Import tool is equipped with a handful of options, as shown in the syntax below:

tabimport [-k key] [-c] [-a] [-N] [--nobackup] [--layout=###] [--yes] surveyPath file.txt

 

Modifier Description
-k key The unique variable to reference each respondent by. (Default: uuid)
(e.g. tabimport -kID . file.txt )
-c Create a new record if no matching key is found. This is useful when uploading new tab-delimited data to the project. (e.g. tabimport -c . extra-data.txt ) If -c is not used, the record will be skipped.
-a Allow importing into notdp variables. (i.e. variables hidden from data downloads). With -a you should see question labels that use question/row/col label.  Otherwise the label is generated through altlabel/variables.xls labels.
-N Suppresses email sent to project team regarding edited data. This is useful for tabimport scripts that run daily.
nobackup Does not create a backup of data. (Be careful!)
--layout Import data using variables modified in a Custom Data Layout
(See: Custom Data Layouts)
--yes Skips the confirmation prompt.
surveyPath The path to the survey. (e.g. selfserve/9d3/proj1234  or  . if in the working directory)
file.txt The tab-delimited file containing the data to upload. Naming convention does not matter.
  Note: Variables contained in this file are case sensitive and must match 100%. Extra fields not found will create warning, but are ignored.

 

4.1:  Delete Respondents

Demonstrated in Example 2, the Tab Import tool allows you to delete respondents by providing an additional field called "action" with "DEL" supplied as the value. Any record containing the "DEL" value will be removed from the data.

4.2:  Disqualify Respondents

Respondents can be disqualified by modifying the "markers" field. Set this field's value to equal "DISQUALIFY:reason" and when tabimported, the respondent's markers will be adjusted appropriately so that they show up as a disqualified complete.

  • The "qualified" marker will appear as "bad:qualified"
  • All quota markers will be removed
  • A new marker, "reason", will be set
    • Note: This can be set to any text. (e.g. DISQUALIFY:speeder )

4.3:  Custom Data Layouts

Using the survey builder? Learn more about the Data Layout Manager.

In the downloads menu of your project, you have the ability to create custom data layouts. This enables you to move variable positions and modify the labels.

newdatalayout.png

When you download data with the labels modified, the system will not be able to detect the proper variables unless you use the --layout=### modifier where ### should be replaced with the ID of the custom data layout (shown at the top of the layout).

For example, create a custom data layout using the   New Layout link. Using this tool, you have the ability to modify the labeling for each variable.
(e.g. Q1r1 -> foo, vos -> bar )

If you download this data in the tab-delimited format to modify variables and re-upload, you will need to let the Tab Import tool know which layout ID to reference when adding the data back in. For example, a programmer can generate the tab-delimited data for a project with the layout using the following command:

generate --layout=617 . all tab > tabbed_data.txt

After adjustments have been made (e.g. updating markers, resetting values, etc...), you can import the data back in using tabimport and the --layout modifier:

tabimport --layout=617 -kuuid . tabbed_data.txt

This will recognize any altered variable names (e.g. Q1r1 -> foo ) and update the data values at Q1 accordingly.

4.4:  Recover Original Data

To recover backup data, you will need to copy the following files:

  • data/old-results/XXX.results.bin to data/results.bin
  • data/old-results/XXX.results.text to data/results.strings

5:  What's Next?

Learn about a similar data modification tool, transform.