Sperm - Asc Data Set: Saseko - CSV Data Set
UNIVERSITY OF COPENHAGEN
DEPARTMENT OF BIOSTATISTICS
Importing, exporting
Reading csv, fixed format files
listing, describing, viewing the data
Thomas Scheike
Data examples
Import data
// to read from R
saveold oldsperm
Windows:
cd data
/home/ifsv/bhd252/undervis/stata-course/thomas/data
import excel using mini-data.xlsx
describe

Contains data
  obs:             4
 vars:             5
 size:            68
------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------
A               byte    %10.0g
B               double  %10.0g
C               byte    %10.0g
D               str6    %9s
E               byte    %10.0g
------------------------------------------------------------------------------
Sorted by:
     Note: dataset has changed since last saved

list

     +---------------------------+
     | A     B    C        D   E |
     |---------------------------|
  1. | 1     2   99   anders   1 |
  2. | 2   2.5   30   anders   1 |
  3. | 3     5    5   thomas   2 |
  4. | 4   4.5   25   thomas   2 |
     +---------------------------+
clear
insheet using your_file.csv, delimiter(";")
list in 1/5

     +-----------------------------------------------+
     | v1   obs   abstid   alder   s1e2   konc   vol |
     |-----------------------------------------------|
     |  1     1        .      26      1      0   2.7 |
     |  2     2        4      44      1     26     5 |
     |  3     3        4      36      1     12   5.5 |
     |  4     4        4      40      1     83   4.5 |
     |  5     5        3      37      1     36     2 |
     +-----------------------------------------------+
clear
infix obsnr 1-5 year 6-11 n 12-17 meanage 18-26 meanabs ///
      27-35 meanvol 36-44 meanct 45-53 medct 54-59 usa ///
      60-63 2 first using sperm.asc
format meanage meanabs meanvol meanct medct %4.2f
drop obsnr
list in 1/5

     +-----------------------------------------------------------------+
     | year     n   meanage   meanabs   meanvol   meanct   medct   usa |
     |-----------------------------------------------------------------|
  1. | 1938   200     -1.00     -2.00      3.00   120.63       .     1 |
  2. | 1941    22     31.50     -2.00      2.98   107.00       .     1 |
  3. | 1943    25     -1.00     -2.00      4.50    66.90   66.40     1 |
  4. | 1944    50     -1.00     -2.00      3.19    85.70       .     0 |
  5. | 1945   100     22.00     -2.00      3.40   134.00       .     1 |
     +-----------------------------------------------------------------+
Export data
save NewVersion
save, replace
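
As a hedged illustration of the export side, the sketch below round-trips a dataset through a semicolon-delimited text file and then saves it in Stata format. The auto dataset that ships with Stata and the file names auto_semicolon.csv, auto_copy and auto_copy_old are assumptions chosen for illustration; they are not part of the course data. It assumes Stata 13 or later, where export delimited and import delimited are available.

```stata
* Load an example dataset that ships with Stata (assumption: not course data)
sysuse auto, clear

* Write a semicolon-delimited text file (hypothetical file name)
export delimited using auto_semicolon.csv, delimiter(";") replace

* Read the file back and look at the first rows
import delimited using auto_semicolon.csv, delimiters(";") clear
list make price mpg in 1/5

* Save in the current Stata format, and in an older format with saveold
save auto_copy, replace
saveold auto_copy_old, replace
```

The replace option overwrites an existing file of the same name, so it is worth leaving out when the target file must not be lost.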
Labels
Missing values
// set age to 0 where age is missing
replace age=0 if missing(age)
// set age to 0 where any of age, sex, land is missing
replace age=0 if missing(age,sex,land)
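
A minimal sketch of how missing() behaves with one and with several arguments. The toy data and the variable names id, age, sex and age0 are made up for illustration; the replacement is done on a copy so the original values stay available.

```stata
* Hypothetical toy data; a dot denotes a numeric missing value
clear
input id age sex
1 25 1
2 .  2
3 40 .
end

* Rows where age itself is missing
count if missing(age)

* Rows where age or sex (or both) is missing
count if missing(age, sex)

* Replace missing ages by 0 in a copy of the variable
generate age0 = age
replace age0 = 0 if missing(age)
list
```

Working on a copy (age0) rather than on age makes it easy to check afterwards which values were filled in.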
Sorting
Egen functions
help egen

     +---------------------------------------------+
     | meanct   meanvol      tot     tots      agg |
     |---------------------------------------------|
  1. |      .         .        0        .        . |
  2. | 120.63      3.00   123.63   123.63   61.815 |
  3. | 107.00      2.98   109.98   109.98    54.99 |
  4. |  66.90      4.50     71.4     71.4     35.7 |
  5. |  85.70      3.19    88.89    88.89   44.445 |
     +---------------------------------------------+
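
The commands behind this listing did not survive on the slide. One plausible reconstruction, offered as an assumption rather than the original code, is that tot comes from egen's rowtotal(), tots from plain addition, and agg from egen's rowmean(). The sketch below applies these to the five listed rows, typed in by hand.

```stata
* Toy data copied from the listing above
clear
input meanct meanvol
.      .
120.63 3.00
107.00 2.98
66.90  4.50
85.70  3.19
end

* rowtotal() treats missing values as 0
egen tot = rowtotal(meanct meanvol)

* ordinary addition gives missing as soon as one term is missing
generate tots = meanct + meanvol

* rowmean() averages the nonmissing values in each row
egen agg = rowmean(meanct meanvol)

list
```

The first row shows the difference: rowtotal() reports 0 for a row with no data, while the plain sum stays missing.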
Egen by

     +-------------------------+
     | meanct   usa    meanreg |
     |-------------------------|
  1. | 120.63     1   85.32429 |
  2. | 107.00     1   85.32429 |
  3. |  66.90     1   85.32429 |
  4. |  85.70     0   80.05758 |
  5. | 134.00     1   85.32429 |
     +-------------------------+
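
The command that produced meanreg is not preserved. A likely form, stated here as an assumption, is egen with a by prefix, which computes a group mean and repeats it on every row of the group. The sketch uses only the five listed rows, so its group means will differ from the values above, which were computed on the full data set.

```stata
* Toy data: the five listed rows only
clear
input meanct usa
120.63 1
107.00 1
66.90  1
85.70  0
134.00 1
end

* Mean of meanct within each value of usa, stored on every row of that group
bysort usa: egen meanreg = mean(meanct)

list
```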
gen agecat2=recode(age,21,38,64,100)
generate agecat=autocode(age,4,0,1000)
tabulate agecat, missing
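
To make the two grouping functions concrete, here is a small sketch on made-up ages. The data and the range 0 to 100 passed to autocode() are assumptions for illustration (the line above uses 0 to 1000).

```stata
* Hypothetical ages
clear
input age
18
30
50
70
end

* recode() returns the first cut point that is >= age,
* and the last cut point for larger values
generate agecat2 = recode(age, 21, 38, 64, 100)

* autocode() splits 0 to 100 into 4 equal-width intervals and
* returns the upper bound of the interval containing age
generate agecat = autocode(age, 4, 0, 100)

list
tabulate agecat, missing
```

recode() suits unequal, hand-picked cut points; autocode() is convenient when equal-width groups are enough.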

     +------------------------+
     | meanct   cat20   cat30 |
     |------------------------|
     |      .       0       0 |
     | 120.63       0       0 |
     | 107.00       0       0 |
     |  66.90       0       0 |
     |  85.70       0       0 |
     +------------------------+
Recoding of factors.

     +-----------------------------+
     | meanage   agecat1   agecat2 |
     |-----------------------------|
  1. |       .         .         . |
  2. |   -1.00        -1         3 |
  3. |   31.50         .         . |
  4. |   -1.00        -1         3 |
  5. |   -1.00        -1         3 |
     +-----------------------------+
Recoding
Replace
scalar q1=r(r1)
scalar q2=r(r2)
scalar list
scalar drop q1 q2
help scalar
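
The results r(r1) and r(r2) are left behind by a preceding command. The sketch below assumes that command was _pctile with two percentiles, which is one command that stores its results under these names; the data and the choice of the 25th and 75th percentiles are illustrative assumptions.

```stata
* Toy data (values borrowed from the meanct listings)
clear
input meanct
120.63
107.00
66.90
85.70
134.00
end

* _pctile stores the requested percentiles in r(r1), r(r2), ...
_pctile meanct, percentiles(25 75)

* Copy them into named scalars before another command overwrites r()
scalar q1 = r(r1)
scalar q2 = r(r2)

scalar list q1 q2
display "Interquartile range: " q2 - q1

* Remove the scalars when they are no longer needed
scalar drop q1 q2
```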
Long to wide
(Written by R.)
sort id
list in 1/3

     +-------------------+
     | id   age   height |
     |-------------------|
  1. |  1     9      130 |
  2. |  1    10      140 |
  3. |  1    11      148 |
     +-------------------+

gen index=1
gen testid=(id==id[_n-1])
replace index=index[_n-1]+1 if id==id[_n-1]
drop testid
reshape wide age height, i(id) j(index)
(note: j = 1 2 3)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        6   ->       2
Number of variables                   4   ->       7
j variable (3 values)             index   ->   (dropped)
xij variables:
                                    age   ->   age1 age2 age3
                                 height   ->   height1 height2 height3
-----------------------------------------------------------------------------

list

     +---------------------------------------------------------+
     | id   age1   height1    age2   height2    age3   height3 |
     |---------------------------------------------------------|
  1. |  1      9       130      10       140      11       148 |
  2. |  2    9.5       120   10.23       125   10.78       130 |
     +---------------------------------------------------------+
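
A self-contained sketch of the same reshape, with the six long-format rows typed in by hand from the listings. It numbers the measurements within each id using a by prefix and _n, which is a swapped-in alternative to the gen/replace steps used on the slide, and then converts back to long format to show that the two shapes are interchangeable.

```stata
* Long data: three measurements per child (values taken from the listings)
clear
input id age height
1 9     130
1 10    140
1 11    148
2 9.5   120
2 10.23 125
2 10.78 130
end

* Number the measurements within each id, ordered by age
bysort id (age): generate index = _n

* Long to wide: one row per id, columns age1-age3 and height1-height3
reshape wide age height, i(id) j(index)
list

* Wide to long: back to one row per measurement
reshape long age height, i(id) j(index)
list
```

Putting age in parentheses after id sorts by age within id without making it part of the group, so the numbering follows the order of measurement.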
Wide to Long
use wide-data.dta, clear
(Written by R.)
reshape long age height, i(id) j(index)
(note: j = 1 2 3)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        2   ->       6
Number of variables                   7   ->       4
j variable (3 values)                     ->   index
xij variables:
                         age1 age2 age3   ->   age
                height1 height2 height3   ->   height
-----------------------------------------------------------------------------

list in 1/3

     +---------------------------+
     | id   index   age   height |
     |---------------------------|
  1. |  1       1     9      130 |
  2. |  1       2    10      140 |
  3. |  1       3    11      148 |
     +---------------------------+

Merging files
list

     +--------------------+
     | id   region    bmi |
     |--------------------|
  1. |  1        1   22.3 |
  2. |  2        1   28.3 |
     +--------------------+
(Written by R.)

     +-------------------+
     | id   X_by   alder |
     |-------------------|
  1. |  1      2      27 |
  2. |  2      .      17 |
  3. |  3     28      17 |
     +-------------------+

(Written by R.)

     +----------------------------+
     | id   X_by   alder   region |
     |----------------------------|
  1. |  1      2       2        1 |
  2. |  2      .      17        2 |
  3. |  3     28      17        1 |
     +----------------------------+
(Written by R.)
(Written by R.)

    Result                           # of obs.
    -----------------------------------------
    not matched                             1
        from master                         0  (_merge==1)
        from using                          1  (_merge==2)

    matched                                 2  (_merge==3)
    -----------------------------------------

list

     +----------------------------------------------------+
     | id   region    bmi   X_by   alder           _merge |
     |----------------------------------------------------|
  1. |  1        1   22.3      2      27      matched (3) |
  2. |  2        1   28.3      .      17      matched (3) |
  3. |  3        .      .     28      17   using only (2) |
     +----------------------------------------------------+

(Written by R.)

    Result                           # of obs.
    -----------------------------------------
    not matched                             1
        from master                         1  (_merge==1)
        from using                          0  (_merge==2)

    matched                                 2  (_merge==3)
    -----------------------------------------

list

     +-----------------------------------------------------+
     | id   X_by   alder   region    bmi            _merge |
     |-----------------------------------------------------|
  1. |  1      2      27        1   22.3       matched (3) |
  2. |  2      .      17        1   28.3       matched (3) |
  3. |  3     28      17        .      .   master only (1) |
     +-----------------------------------------------------+

(Written by R.)
// the rows below match those appended from merge4-addid

     +--------------------+
     | id   region    bmi |
     |--------------------|
  1. |  5        2   12.3 |
  2. |  7        2   18.3 |
     +--------------------+

append using merge4-addid, gen(oldnew)
list

     +-----------------------------+
     | id   region    bmi   oldnew |
     |-----------------------------|
  1. |  1        1   22.3        0 |
  2. |  2        1   28.3        0 |
  3. |  5        2   12.3        1 |
  4. |  7        2   18.3        1 |
     +-----------------------------+
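
The merge commands themselves are missing from the slides. The sketch below is one way to obtain a result of the first kind shown above, assuming a one-to-one match on id. The data are typed in from the listings, and the temporary file name background is hypothetical; tempfile only works inside a do-file or program.

```stata
* Using data: background information for three persons
clear
input id X_by alder
1 2  27
2 .  17
3 28 17
end
tempfile background
save `background'

* Master data: region and bmi for two persons
clear
input id region bmi
1 1 22.3
2 1 28.3
end

* One-to-one merge on id; _merge records where each row came from
merge 1:1 id using `background'
list

* Keep only rows found in both files, then drop the bookkeeping variable
keep if _merge == 3
drop _merge
```

Swapping which file is in memory and which is named after using turns the unmatched row from "using only" into "master only", as in the second result above.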

     +--------------------+
     | region        name |
     |--------------------|
  1. | N Cntrl     Krantz |
  2. | N Cntrl     Phipps |
  3. | N Cntrl     Willis |
  4. | NE         Ecklund |
  5. | NE          Franks |
     |--------------------|
  6. | South     Anderson |
  7. | South      Dubnoff |
  8. | South          Lee |
  9. | South       McNeil |
 10. | West       Charles |
     |--------------------|
 11. | West          Cobb |
 12. | West         Grant |
     +--------------------+

file http://www.stata-press.com/data/r13/dollars.dta not Stata format
r(610);

     +--------------------+
     | region        name |
     |--------------------|
  1. | N Cntrl     Krantz |
  2. | N Cntrl     Phipps |
  3. | N Cntrl     Willis |
  4. | NE         Ecklund |
  5. | NE          Franks |
     |--------------------|
  6. | South     Anderson |