A few tips for building an easy-to-analyse Excel file.
Rules of thumb:
One row per patient/limb/organ (if multiple targets are assessed)
The first row is for column names
Don’t add any rows above it (such as merged cells describing the underlying columns: pre-op, post-op, follow-up…)
Do ALL YOUR CALCULATIONS in Excel:
Never calculate durations yourself. Enter the dates and create a new column to calculate each duration (age, follow-up, length of stay…)
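The same principle, sketched in Python rather than Excel purely for illustration: keep the raw dates and derive every duration from them. The column names and the two helper functions below are hypothetical, not part of any existing tool.

```python
from datetime import date

def days_between(start: date, end: date) -> int:
    """Return the number of whole days from start to end."""
    return (end - start).days

def age_in_years(birth: date, on: date) -> int:
    """Return the completed years of age on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)

# Hypothetical patient row: only dates are stored, never durations.
row = {
    "date_of_birth": date(1950, 3, 15),
    "admission_date": date(2020, 1, 10),
    "discharge_date": date(2020, 1, 17),
}

# Durations are derived columns, recomputed from the dates.
length_of_stay = days_between(row["admission_date"], row["discharge_date"])
age_at_admission = age_in_years(row["date_of_birth"], row["admission_date"])
```

If a date turns out to be wrong, correcting it fixes every derived duration at once, which is not the case when durations are typed in by hand.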
Name ALL your columns with an explicit and DIFFERENT title. ALL names have to be unique.
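A minimal sketch of how duplicated column names could be detected once the header row has been read into a list. Treating names that differ only by case or surrounding spaces as duplicates is an assumption made here, and `duplicated_names` is a hypothetical helper.

```python
from collections import Counter

def duplicated_names(header):
    """Return the column names that appear more than once.

    Assumption: names are compared after trimming spaces and lowercasing,
    so "Pain" and "pain " count as the same name.
    """
    counts = Counter(name.strip().lower() for name in header)
    return sorted(name for name, n in counts.items() if n > 1)

# Hypothetical header row.
header = ["patient_id", "Pain", "pain ", "follow_up_date"]
duplicates = duplicated_names(header)
```

An empty result means every column name is unique.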
Each column must have consistent values:
Continuous values
Categorical: from 2 unique values to x values. It can be any one of “Yes”, “Y”, “y” or “yes”, or a list such as “car, bike, motorbike, bus, train”… The main point is to use exactly the same spelling throughout the column.
Datascience replaces all Yes/No values with 1 and 0.
If you need extra information for a few rows, like comments (“yes, but maybe…”), create a new column named column_name comment. This way, you don’t lose any information, and you keep an easy-to-analyse file. Otherwise, if a column is meant to hold 2 different values (yes/no for instance) and a cell contains a comment, the algorithm will identify an extra value, and your whole analysis will be biased.
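To illustrate why a stray comment in a Yes/No column causes trouble, here is a rough sketch of a Yes/No to 1/0 encoding. It is not the actual replacement logic of any particular tool; the accepted spellings and the `encode_yes_no` helper are assumptions.

```python
# Assumed accepted spellings, compared after trimming and lowercasing.
YES = {"yes", "y"}
NO = {"no", "n"}

def encode_yes_no(value):
    """Map a Yes/No cell to 1/0; return None for anything unexpected."""
    text = str(value).strip().lower()
    if text in YES:
        return 1
    if text in NO:
        return 0
    return None

# Hypothetical column content; the last cell mixes a value and a comment.
cells = ["Yes", "no", "Y", "yes, but maybe"]
encoded = [encode_yes_no(c) for c in cells]
unexpected = [c for c in cells if encode_yes_no(c) is None]
```

The cell holding a comment cannot be encoded and ends up as a third value, which is exactly what a separate comment column avoids.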
Never merge cells...!
The descriptive sheet in the Excel file is meant for checking your data: it gives you the unique value count of each column. If you expect 2 values and it gives you 3 or more, you may have a typo in a cell…
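A unique value count of this kind can be sketched in a few lines. This is only an illustration of the idea, not how the descriptive sheet is actually produced; the rows and the `unique_value_counts` helper are made up.

```python
from collections import Counter

def unique_value_counts(rows):
    """Return {column: Counter of its values} for a list of row dictionaries."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            summary.setdefault(column, Counter())[value] += 1
    return summary

# Hypothetical data with a deliberate typo in the "smoker" column.
rows = [
    {"smoker": "yes", "transport": "car"},
    {"smoker": "no", "transport": "bike"},
    {"smoker": "yse", "transport": "car"},
]
summary = unique_value_counts(rows)

# Number of distinct values per column: 3 for "smoker" where 2 were expected.
n_unique = {column: len(counter) for column, counter in summary.items()}
```

Looking at the rarest values in `summary["smoker"]` points straight at the misspelled cell.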
Search the web for advice about your dataset before starting data collection...! A JAMA paper