Thursday, 24 February 2011
Earthquake update
Family, cat and I are safe; I've blogged more on my personal journal. I'm disallowing comments here and encouraging comments there instead; it'll be easier for me to keep communications in one place.
Sunday, 13 February 2011
Converting a plaintext bibliography to Endnote/RIS format with help from Linux/Terminal
[Update 16/7/2011: See my more recent post on the topic, Launching Ref2RIS - convert your typed bibliography to Endnote format, which makes things even easier.]
You won't want to do this unless you've got literally hundreds of references. Any fewer, and these suggestions are way easier.
1. Format references so they're each on their own line - no blank lines.
2. Use Word's "Find Special" capabilities to replace a phrase in italics with {it}a phrase in italics{endit} and a phrase in bold with {b}a phrase in bold{endb}. (Similarly if the citations contain underlines.)
3. Save as plaintext - say, source.txt. Now the fun begins... My own source text contains 600-odd lines in ACS style, like this:
Bamford, C. H.; Tipper, C. F. H. {it}Comprehensive Chemical Kinetics{endit}; Elsevier: New York, {b}1977{endb}.
House, D. A.{it}Chem. Rev.{endit} {b}1962{endb}, {it}62{endit}, 185
4. Open up Terminal or some other Linux command line.
5. Endnote records are separated by a line reading
ER  - 
(that's two spaces before the hyphen and one after; all these details come from Endnote's help pages). This is the easy part: type in
sed -e 's/$/\nER  - /' source.txt > source1.txt
(The \n in the replacement is understood by GNU sed, the usual sed on Linux.)
6. The start of each Endnote record tells you what kind of citation it is - eg a book, journal etc. To tag as a book every line that includes a colon (ie the one separating the publisher from the city of publication), type in
sed -e 's/^\(.*:\)/TY  - BOOK@@\1/' source1.txt > source2.txt
Note 1: The "@@" is in there as a sign that you'll need to replace this with a new line later; but we want to keep everything on one line for now.
Note 2: This is a good example of why this whole method is highly suspect, because it'll also catch citations which have a colon in the article title or in a typo or whatever. So if you can think of a better sign that a citation is a book then use that instead of the colon.
Alternatively, you could type in
sed -e 's/^\(.*{it}[0-9]*{endit}\)/TY  - JOUR@@\1/' source1.txt > source2.txt
to find every line that contains {it}[some number]{endit} which, in my source, is the best indicator that I'm dealing with a journal. The same caveats apply - you'll get both false positives and false negatives.
Anyway, keep doing what seems best given your source, and fix up the inevitable mistakes by hand until each line starts with TY - something. If you want to give up and just assume that everything that isn't already assigned as something must be a journal then try
sed -E -e '/^(TY|ER) /!s/^/TY  - JOUR@@/' source2.txt > source3.txt
I now have source looking like:
TY  - BOOK@@Bamford, C. H.; Tipper, C. F. H. {it}Comprehensive Chemical Kinetics{endit}; Elsevier: New York, {b}1977{endb}.
ER  - 
TY  - JOUR@@House, D. A.{it}Chem. Rev.{endit} {b}1962{endb}, {it}62{endit}, 185
ER  - 
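Before moving on it's worth checking that nothing slipped through untagged. The following is only a rough sketch of how I'd do that check: the file sample3.txt and its two records are made up for illustration (the second one has deliberately been left without a TY tag), and you'd point the two grep commands at your own source3.txt instead.

```shell
# Scratch directory, so nothing real gets overwritten.
tmp=$(mktemp -d)

# Hypothetical sample in the same shape as source3.txt;
# the second record is missing its TY tag on purpose.
printf '%s\n' \
  'TY  - BOOK@@Bamford, C. H.; Tipper, C. F. H. {it}Comprehensive Chemical Kinetics{endit}; Elsevier: New York, {b}1977{endb}.' \
  'ER  - ' \
  'House, D. A.{it}Chem. Rev.{endit} {b}1962{endb}, {it}62{endit}, 185' \
  'ER  - ' > "$tmp/sample3.txt"

# List, with line numbers, every line that is neither a TY line nor an ER line.
untagged=$(grep -n -v -E '^(TY|ER) ' "$tmp/sample3.txt")
echo "$untagged"

# Count how many records ended up as each type.
counts=$(grep -o -E '^TY +- [A-Z]+' "$tmp/sample3.txt" | sort | uniq -c)
echo "$counts"
```

If the first grep prints nothing, every line starts with either a TY tag or an ER tag; anything it does print is a line to fix by hand.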
7. Now we keep playing with patterns. (You may be able to do large chunks of this with regular find/replace, but for illustrative purposes I'll keep using Terminal.)
For example, in my source the authors are nicely set off: they come after "@@" and before the first "{it}" (or "in {it}"), and if there's more than one of them they're separated by ";". So a few commands:
sed -e 's/@@\(.* in {it}\)/@@A1  - \1/' source3.txt > source4.txt
sed -e '/@@A1 /!s/@@\(.* {it}\)/@@A1  - \1/' source4.txt > source5.txt
(The second command reads the output of the first, and skips any line the first has already tagged.)
sed -e 's/;\(.*;\)/@@A1  - \1/' source5.txt > source6.txt
(This one I had to repeat a few times, depending on how many authors could be cited in one reference; there's supposed to be a way to do it globally but my unix fu is not strong.)
sed -e 's/;\(.*{it}\)/@@A1  - \1/' source8.txt > source9.txt
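For what it's worth, one way to handle all the author semicolons in a single pass is a sed loop: a label (:a), a substitution, and a "t" command that jumps back to the label for as long as the substitution keeps succeeding. This is a sketch rather than what I actually ran. It assumes, as in my source, that the authors all sit before the first "{" on the line, and the files sample5.txt and sample9.txt are made up for the example.

```shell
# Scratch directory, so nothing real gets overwritten.
tmp=$(mktemp -d)

# Hypothetical sample: the first author is already tagged with A1,
# the second is still only separated by a semicolon.
printf '%s\n' \
  'TY  - BOOK@@A1  - Bamford, C. H.; Tipper, C. F. H. {it}Comprehensive Chemical Kinetics{endit}; Elsevier: New York, {b}1977{endb}.' \
  'ER  - ' > "$tmp/sample5.txt"

# :a  marks the top of the loop.
# s/  replaces one semicolon (plus any spaces after it) that occurs before the first "{".
# ta  jumps back to :a if that substitution changed anything.
sed -e ':a' -e 's/^\([^{]*\); */\1@@A1  - /' -e 'ta' "$tmp/sample5.txt" > "$tmp/sample9.txt"

cat "$tmp/sample9.txt"
```

Because the pattern only looks at text before the first "{", the semicolon between the book title and the publisher is left alone.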
Journal titles:
sed -e 's/^\(TY  *- JOUR.*\)\({it}.*{endit} {b}\)/\1@@JO  - \2/' source9.txt > source10.txt
Years:
sed -e 's/\({b}[0-9]*{endb}\)/@@Y1  - \1/' source10.txt > source11.txt
And so forth. You pretty soon start to see why the first suggestion on most lists of ways to convert plaintext citations into RIS format is always "Just type it in / search for it again by hand". The method above is really only suitable if you've got literally hundreds of citations. (I have 639, plus or minus.)
8. Eventually you'll be at a point where you can do a simple find/replace to change @@ to a new line and nuke all the {it} and so forth. This will be a great relief.
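If you'd rather stay in Terminal for this last step too, here's a minimal sketch. I've swapped in awk rather than sed because awk turns "\n" into a real line break whichever flavour of Unix you're on. The sample record is made up, with every field already tagged (VL and SP are the RIS tags for volume and start page), and sample11.txt and sample.ris are hypothetical file names.

```shell
# Scratch directory, so nothing real gets overwritten.
tmp=$(mktemp -d)

# Hypothetical fully tagged record, still on one line with @@ placeholders.
printf '%s\n' \
  'TY  - JOUR@@A1  - House, D. A.@@JO  - {it}Chem. Rev.{endit}@@Y1  - {b}1962{endb}@@VL  - {it}62{endit}@@SP  - 185' \
  'ER  - ' > "$tmp/sample11.txt"

# First gsub: delete the {it}, {endit}, {b} and {endb} markers.
# Second gsub: turn every @@ into a line break.
awk '{
  gsub(/[{](end)?(it|b)[}]/, "")
  gsub(/@@/, "\n")
  print
}' "$tmp/sample11.txt" > "$tmp/sample.ris"

cat "$tmp/sample.ris"
```

Each tag ends up on its own line, with the ER line closing the record.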
9. Rename your final saved file from source12.txt to source12.ris and open with Endnote.
10. Bonus material: if this was a bibliography to a paper using numbered citations in order using eg [1], then in that paper you can do a find/replace on [ -> { and ] -> }, then tell the Endnote plugin to format citations, and voila, the best magic ever. (If the paper uses author/date citations then you'll have to link them by hand, sorry.)
Friday, 4 February 2011
A rule about rhetorical questions
At intermediate and high school we learned the basics of debating. One technique we learned about was the rhetorical question; and we also learned an important rule for their use: Don't ask a rhetorical question if there's a chance your audience will respond with the 'wrong' answer.
@libsmatter reported from an ALIA panel:
What if when our budgets were cut we asked - "so - what do you want us to stop doing?"
which I used to agree with. And I still agree that if our budgets keep getting cut then we'll have to cut services. But that doesn't mean the argument will make everyone say, "Oh, right. Um, we didn't think of that. Here, have an extra million dollars."
Because if an institution wants/needs/thinks it needs to cut the library's budget, it can really easily reply, "You need to keep providing the same service. Be more efficient. Work smarter. And if you can't figure out how to do that for yourselves then well, we'll send in our favourite efficiency experts and cut your staffing for you."
And if we're not prepared to accept that answer then we should be very careful about asking that question.