Hands On RDM
Files: Formats, Standards, Conversions
1 Topic
Figure 1: https://xkcd.com/2116/
2 Files
2.1 Formats and Standards
- Every file has a format.
- The file extension (mostly) reveals the format.
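As a rough illustration of how an extension hints at a format, the following Python sketch looks up a filename's suffix in a small table. The table and the function name `guess_format` are assumptions made for this example, not a complete or authoritative registry. The extension is only a hint: renaming a file does not change its content.

```python
from pathlib import Path

# Illustrative mapping only; it covers a few extensions named in this handout.
KNOWN_FORMATS = {
    ".docx": "Microsoft Word",
    ".odt": "LibreOffice Writer",
    ".csv": "comma-separated values",
    ".tsv": "tab-separated values",
    ".png": "raster image",
    ".svg": "vector image",
    ".pdf": "PDF document",
}


def guess_format(filename: str) -> str:
    """Guess a format from the file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    return KNOWN_FORMATS.get(suffix, "unknown")


print(guess_format("encounter.csv"))
print(guess_format("README"))  # no extension, so the format stays unknown
```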
2.1.1 Text
- Microsoft Word (.doc vs. .docx)
- Apple Pages (.pages)
- LibreOffice Writer (.odt)
- Markdown (.md), Plain Text (.txt), LaTeX (.tex)
What else do you use?
2.1.2 Spreadsheet
- Microsoft Excel (.xls vs. .xlsx)
- Apple Numbers (.numbers)
- LibreOffice Calc (.ods)
- CSV (.csv), TSV (.tsv) etc.
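CSV and TSV files are both plain text and differ mainly in the column delimiter. The sketch below, using Python's standard `csv` module, detects the delimiter and parses both variants into the same rows. The sample content and the function name `read_rows` are made up for illustration, and delimiter detection is a heuristic that can fail on unusual files.

```python
import csv
import io

# Hypothetical sample content; real files would come from a spreadsheet export.
csv_text = "id,name,count\n1,alpha,3\n2,beta,5\n"
tsv_text = "id\tname\tcount\n1\talpha\t3\n2\tbeta\t5\n"


def read_rows(text: str) -> list[list[str]]:
    """Detect the delimiter among comma, tab, and semicolon, then parse."""
    dialect = csv.Sniffer().sniff(text, delimiters=",\t;")
    return list(csv.reader(io.StringIO(text), dialect))


# Both variants carry the same table once the delimiter is known.
print(read_rows(csv_text) == read_rows(tsv_text))
```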
2.1.3 Images
Raster vs. Vector
Figure 2: Difference of formats
- Raster: (.jpg, .png)
- Vector: (.pdf, .svg, .eps)
2.1.4 Code
Code files are (usually) stored in open formats.
Recommendation: Jupyter Notebook (.ipynb)
- Code and documentation (literate programming) in one file
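An .ipynb file is itself a JSON text file, which is why code and documentation can live side by side in it. The following sketch builds a tiny hand-written notebook structure and counts its cells by type. The notebook content and the helper `count_cells` are illustrative assumptions; real notebooks carry more metadata.

```python
import json

# A minimal notebook structure (nbformat 4), written by hand for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis notes"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}


def count_cells(nb: dict) -> dict:
    """Count cells per type (markdown = documentation, code = code)."""
    counts: dict = {}
    for cell in nb["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    return counts


# Round-trip through JSON text, as it would be stored in an .ipynb file.
restored = json.loads(json.dumps(notebook))
print(count_cells(restored))
```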
2.1.5 Proprietary
- Is there documentation?
- What do you need to open the file?
- Can you convert the data?
2.2 Conversions and Archive
- To ensure compatibility, convert your document.
- Not every format is suitable for archiving!
For keeping your project files, preferably use:
- PDF (.pdf)
- TIFF (.tiff)
- CSV (.csv)
3 HandsOn!-Session
There are several files in the folder "Toms-data-sets" (https://rwth-aachen.sciebo.de/s/MrqD1tXyXlkcuA7).
You will need to perform certain steps. Document the steps by writing them down, taking screenshots, etc., so that you or a colleague will understand what you did.
3.1 Text files
- Convert the file Medical Report Form.doc into a docx file.
- Fill out the form and save it as pdf or convert it to pdf. Make sure that the filename is properly set (e.g. YYYY-MM-DD_Medical-Report-Form; have a look at the last session: http://crc1382.org/rdm-docs/02_data-organization.html).
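A dated filename of the form YYYY-MM-DD_Medical-Report-Form can also be built programmatically. This is a minimal sketch: the function name `dated_name` and the example date are assumptions for illustration, and the naming rules of your own project take precedence.

```python
from datetime import date
from pathlib import Path


def dated_name(original: str, new_suffix: str, day: date) -> str:
    """Build 'YYYY-MM-DD_Name-With-Hyphens<suffix>' from an existing filename."""
    stem = Path(original).stem.replace(" ", "-")
    return f"{day.isoformat()}_{stem}{new_suffix}"


# The date below is an arbitrary example, not taken from the course material.
print(dated_name("Medical Report Form.doc", ".pdf", date(2024, 1, 31)))
# prints 2024-01-31_Medical-Report-Form.pdf
```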
3.2 Spreadsheets
- Open the file encounter.csv with Excel/LibreOffice (or the spreadsheet program of your choice). Make sure that the file is loaded correctly with the proper encoding and column separation. (If you have no clue how to do that, have a look at the section Useful links.)
- Get the timestamp value of cell G2808.
- Get the sum of the columns Q to X.
- Color column lab_results_count (R) relative to the value (0 = red; the higher the number, the greener the cell).
- Insert a new column (name it horizontal_count_sum) after column immunization_count (X). In this column, calculate the sum of the columns Q to X (horizontally).
- Save this file as xlsx and also as csv with tabs as delimiters. Name the file properly.
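The horizontal sum and the tab-delimited export can also be done outside a spreadsheet program. The sketch below uses Python's standard `csv` module on a made-up excerpt: the column `encounter_id`, the values, and the restriction to two count columns are assumptions for illustration, since the real encounter.csv has more columns.

```python
import csv
import io

# Hypothetical excerpt: only two of the count columns, with made-up values.
sample = "encounter_id,lab_results_count,immunization_count\nA1,2,1\nA2,0,4\n"
count_columns = ["lab_results_count", "immunization_count"]


def add_horizontal_sum(text: str, columns: list[str]) -> str:
    """Append horizontal_count_sum to each row and return tab-delimited text."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = list(reader.fieldnames) + ["horizontal_count_sum"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Sum the selected columns across the row (horizontally).
        row["horizontal_count_sum"] = sum(int(row[c]) for c in columns)
        writer.writerow(row)
    return out.getvalue()


print(add_horizontal_sum(sample, count_columns))
```

To apply this to a real file, the text would be read with an explicit encoding (for example `open(path, encoding="utf-8", newline="")`), which is the scripted counterpart of choosing the encoding in the import dialog.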
3.3 Images
- Take a picture of your computer with the opened files from above (either with a screenshot or with your cellphone).
- Save this picture in Toms-data-sets in the formats jpg and tiff (you might need to convert it).
3.4 Archive and Reuse
- Rename the folder Toms-data-sets to e.g. <YOURNAME>-data-sets (replace <YOURNAME> with your name).
- Check if the content of the file README.txt is accurate (author's information, file names etc.). Update this data documentation file.
- Make this folder archivable by zipping it (use zip etc.).
- Send this file to the members of your breakout session. You can send it via email or save the file in e.g. Sciebo/OneDrive/SharePoint and share a download link.
- You will receive datasets, too. Unzip the received archive and open all the files. Do you run into any errors?
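Zipping a folder and checking the resulting archive can be scripted as well. The following sketch uses Python's standard `zipfile` module on a temporary folder. The folder name, the README content, and the helper `zip_folder` are placeholders chosen for this example.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder: Path, archive: Path) -> list[str]:
    """Zip all files below folder and return the names stored in the archive."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the folder name is kept.
                zf.write(path, path.relative_to(folder.parent).as_posix())
    with zipfile.ZipFile(archive) as zf:
        # testzip() returns None when every member passes its CRC check.
        assert zf.testzip() is None
        return zf.namelist()


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "YOURNAME-data-sets"  # placeholder folder name
    folder.mkdir()
    (folder / "README.txt").write_text("placeholder README\n", encoding="utf-8")
    names = zip_folder(folder, Path(tmp) / "YOURNAME-data-sets.zip")
    print(names)
```

The integrity check only confirms that the archive itself is intact. Whether the recipient can open each file still depends on the formats inside.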
4 Useful links
In this section you will find some links which might be helpful for this topic.
4.1 Conversion tools
Several conversion tools are available online. As always, check what the conditions are and what (personal) data the provider will store.
- https://www.freeconvert.com/
- https://convertio.co/
- https://www.online-convert.com/
- https://pandoc.org/try/ (very powerful, especially when you use it via the command-line interface)
4.2 Tips and Tricks
- Opening a csv-file with Excel: https://www.ablebits.com/office-addins-blog/2014/05/01/convert-csv-excel/
- Making a screenshot (Windows: Official site; macOS: Official site)