Hands On RDM
Data organization
(Welcome! Nice to *see* you!)

You can find the notepad for today’s session at: http://www.crc1382.org/pad02

1. Topics

1.1. Proper organization of data

1.2. Meaningful naming of data

2. Organizing data

You can organize your data in two ways:

  • hierarchical
  • tag-based.

2.1. Hierarchical file system

Folders and subfolders define the structure.

\--PhD
   |---images/
   |    |-- microscopy/
   |    |-- analyzed and ready for publication
   |
   |---data/
   |---text/
   |   |-- version1
   |   |-- sent to supervisor/
   |   |   |-- comments
   |   |   |-- suggestions
   |   |
   |   |-- work in progress
   |
   |---documents/

2.2. Tag-based system

Metadata define data.

Let's imagine we have a file called 2020-02-03-cheesecake.txt.

It is described with these keywords:

2020-02-03
cheesecake
recipe
birthday party
cake
yummy

Downside: you need dedicated software for that, e.g. https://www.tagflow.ch/
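
To illustrate the idea behind tagging, here is a minimal sketch in Python. It is not how any particular tagging tool works; the dictionary layout, the helper names build_index and find, and the second file (2020-05-17-brownies.txt) are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical tag store: each file name maps to its set of keywords.
# The second entry is invented for illustration only.
file_tags = {
    "2020-02-03-cheesecake.txt": {
        "2020-02-03", "cheesecake", "recipe", "birthday party", "cake", "yummy",
    },
    "2020-05-17-brownies.txt": {"2020-05-17", "brownies", "recipe", "cake"},
}


def build_index(file_tags):
    """Invert the mapping: tag -> set of files carrying that tag."""
    index = defaultdict(set)
    for name, tags in file_tags.items():
        for tag in tags:
            index[tag].add(name)
    return index


def find(index, *tags):
    """Return the files that carry all of the given tags, sorted by name."""
    sets = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index(file_tags)
print(find(index, "recipe", "cake"))   # both files
print(find(index, "birthday party"))   # only the cheesecake file
```

The file itself can live anywhere; it is found through its keywords rather than through its folder.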

2.3. Best practices

To arrive at a systematic folder structure, go through the following steps:

  • Define the types of data and file formats; what are your data sets?
  • Collect and include the important contextual information (this can also go into your Data Management Plan).
  • Separate and categorize: e.g. work data, meta-work (applications, CV, etc.), references, private files.

How do you look for your data?

  • By time period?
  • By creator/project collaborators?
  • By activity or collection method?
  • By its type (e.g. presentation, report)?
  • Organize folders by meaningful categories (e.g. data sets)

    Go from general to specific: "Primary / Secondary / Tertiary"

    • e.g. [project] / [sub-project] / [experiment] / [instrument] / [date]
    • e.g. [research area] / [project] / [data or documentation] / [date]
    • e.g. [project] / [type of file] / [data collector name] / [date]
  • Choose a directory naming convention
    • Determine your unique elements
    • Consider ordering and abbreviations
      • e.g. use figures, not figs or figure
      • use only people's last names (add a suffix if needed)
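
The "general to specific" ordering can be sketched in a few lines of Python. This is only an illustration: the function name build_folder, the lower-casing and hyphenation rule, and the example level names are assumptions, not part of any prescribed convention.

```python
from pathlib import PurePosixPath


def build_folder(*levels):
    """Join the levels, ordered from general to specific, into one relative path."""
    # Assumed convention for this sketch: lower case, hyphens instead of spaces.
    cleaned = [level.strip().lower().replace(" ", "-") for level in levels]
    if any(not level for level in cleaned):
        raise ValueError("every level needs a non-empty name")
    return PurePosixPath(*cleaned)


# [project] / [sub-project] / [experiment] / [instrument] / [date]
folder = build_folder("Liver Project", "Imaging", "exp01", "confocal", "2020-01-31")
print(folder)  # liver-project/imaging/exp01/confocal/2020-01-31
```

Building every path through one function keeps the order of the levels and the spelling of folder names the same across the whole project.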

3. Naming

A naming convention is meant to make your life easier.

But it won't always work.

3.1. Things to be aware of

  • Is there an existing naming convention in your research community or at your institute?
  • Adapt it to your needs!

If not, define your own naming convention.

3.2. Your naming convention should be

3.2.1. descriptive

Consider including:

  • Unique identifier (e.g. project name)
  • Conditions (lab instrument, temperature etc.)
  • Run of experiment (sequential)
  • Date (in file properties, too)
  • Version number
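
As a sketch of how these elements could be combined into one file name, consider the following Python function. The order of the elements, the underscore separator, the "run" and "v" prefixes, and the example values are all assumptions chosen for illustration.

```python
from datetime import date


def build_filename(project, condition, run, day, version, suffix):
    """Compose a descriptive file name from the elements listed above."""
    return (
        f"{project}_{condition}_run{run:03d}_"
        f"{day.isoformat()}_v{version:02d}.{suffix}"
    )


name = build_filename("liver", "37C", 7, date(2020, 1, 31), 2, "csv")
print(name)  # liver_37C_run007_2020-01-31_v02.csv
```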

3.2.2. consistent

Make sure to use standards!

  • Date: stick to one date format: YYYY-MM-DD (ISO!) (e.g. 2020-01-31)
  • Numbers: use the same number of digits and, if necessary, pad with leading zeros (e.g. 00123, 03948 etc.)
  • Spelling: choose either American or British English and stick to it
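
One practical benefit of these standards is that an alphabetical listing then matches numerical and chronological order. The short Python sketch below demonstrates this; the file names and dates are invented for the example.

```python
from datetime import date

# Without padding, alphabetical order puts run10 before run2.
unpadded = ["run10.csv", "run2.csv", "run1.csv"]
print(sorted(unpadded))  # ['run1.csv', 'run10.csv', 'run2.csv']

# With a fixed width of five digits, alphabetical order equals numerical order.
padded = [f"run{n:05d}.csv" for n in (10, 2, 1)]
print(sorted(padded))  # ['run00001.csv', 'run00002.csv', 'run00010.csv']

# ISO dates (YYYY-MM-DD) sort chronologically when sorted as plain text.
days = [date(2020, 1, 31), date(2019, 12, 24), date(2020, 2, 3)]
iso_names = [d.isoformat() for d in days]
print(sorted(iso_names) == [d.isoformat() for d in sorted(days)])  # True
```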

3.3. Do not forget to

3.3.1. avoid

  • spaces (in file and folder names); use a hyphen (-), an underscore (_), or camelCase instead.
  • special characters, e.g. / \ : * ? " < > [ ] & $
  • long names. A good maximum length is about 32 characters (e.g. 32CharactersLooksExactlyLikeThis.png)
  • periods other than the one before the file suffix (e.g. not expOut.after12.00hours.csv)
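
These rules can be checked automatically. The following Python sketch is one possible checker, not a standard tool; the function name check_name and the message texts are made up, and it assumes that the 32-character limit applies to the name without its suffix.

```python
MAX_STEM_LENGTH = 32  # assumption: the limit excludes the file suffix
FORBIDDEN = set('/\\:*?"<>[]&$')


def check_name(filename):
    """Return a list of problems found in a file name (empty if there are none)."""
    problems = []
    # Split at the last period: everything before it is the stem.
    stem, dot, _suffix = filename.rpartition(".")
    if not dot:  # no period at all: the whole name is the stem
        stem = filename
    if " " in filename:
        problems.append("contains spaces")
    if FORBIDDEN & set(filename):
        problems.append("contains special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("longer than 32 characters")
    if "." in stem:
        problems.append("period before the file suffix")
    return problems


print(check_name("32CharactersLooksExactlyLikeThis.png"))  # []
print(check_name("expOut.after12.00hours.csv"))
print(check_name("my data?.txt"))
```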

3.3.2. document

Write down how your naming convention works.

  • Use a markdown or plain-text file (e.g. README.md).
  • You can also use a Data Management Plan for that.
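
A README describing the convention can be very short. The Python sketch below renders one from a pattern and a list of its elements; the function name render_readme, the layout of the text, and the example pattern are assumptions for illustration only.

```python
def render_readme(pattern, elements):
    """Render a short README text describing a file naming convention."""
    lines = ["# Naming convention", "", f"Pattern: `{pattern}`", ""]
    for element, meaning in elements.items():
        lines.append(f"- `{element}`: {meaning}")
    return "\n".join(lines) + "\n"


# Hypothetical convention, invented for this example.
readme = render_readme(
    "[project]_[condition]_run[NNN]_[YYYY-MM-DD]_v[NN].[suffix]",
    {
        "project": "unique identifier of the project",
        "condition": "lab instrument, temperature etc.",
        "run": "sequential run of the experiment, three digits",
        "YYYY-MM-DD": "date in ISO format",
        "v": "version number, two digits",
    },
)
print(readme)
```

The resulting text can be saved next to the data, for instance with pathlib.Path("README.md").write_text(readme).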

4. Hands On! Session

4.1. Task 1

You are Lisa, a young PhD student in the field of medicine and bioinformatics. Lisa is working in the lab (with her colleague Heinrich) analyzing structures and she gets results back from the machines as data sets and images.

For the analysis she uses her own code and programs to produce videos, music and more data sets.

She is also very active in giving presentations about her work and frequently publishes papers. Writing her PhD thesis is no problem for her, but she is completely lost when it comes to organizing her data.

Help Lisa establish a data and file system by renaming and structuring her files and folders. There might even be files that are not related to her work and PhD.

4.2. Task 2

Lisa is now happy with the new data structure, its folders and its specific file names. But she is afraid that it will soon be a mess again, or that she will forget the structure and naming convention she has used.

In the end she also needs to hand over all her data to her supervisor, who will be lost if there is no guideline of some sort.

Document the data organization you have just established for Lisa.

5. Useful links

Date: 2021-05-12

Author: Lukas C. Bossert

Created: 2021-05-12 Wed 13:52
