Unicode

From Liki

Revision as of 22:31, 1 March 2012 by 3doyunlar

All the fun characters that aren't in ASCII are in Unicode. Really, *all* the characters. Now that Unicode is fairly well supported (on *nix, mostly in the UTF-8 encoding), why not use it to snazz up your notes and code? It's just a matter of finding the right information...

Xmodmap

There is an excellent tutorial on using Unicode in X11 by Sven Mascheck at http://www.in-ulm.de/~mascheck/X11/input8bit.html. I follow his second approach, “something like AltGraph + anotherkey.”

First, to get a feeling for what's out there, run

$ gucharmap

which is a searchable GUI character picker. Search for something fun, like *bo* or *ij* (for Sam), and you will find *ぼ* and *ĳ* respectively.

While you certainly can do the search-cut-paste cycle, if you'll be using a character more than a handful of times, it's probably worth binding it to your keyboard. There is a lot of Unicode, and only so much keyboard, so you might need one keyboard setup for taking stat. mech. notes and others for composing runic poetry or displaying your machismo.

Anyhow, the procedure is:

1) Take over an underused key to be your AltGraph-equivalent (like a super-shift). I wasn't using my right alt key much, so I used it. To figure out what its keycode is, run 'xev', which prints X events, and hit your right alt key once or twice. You should see a line that looks something like

state 0x0, keycode 113 (keysym 0xffea, Alt_R), same_screen YES,

among other things. Now find an unused modifier with

$ xmodmap
xmodmap:  up to 3 keys per modifier, (keycodes in parentheses):

shift       Shift_L (0x32),  Shift_R (0x3e)
lock        Caps_Lock (0x42)
control     Control_L (0x25),  Control_R (0x6d)
mod1        Alt_L (0x40),  Alt_L (0x7d),  Meta_L (0x9c)
mod2        Num_Lock (0x4d)
mod3
mod4        Super_L (0x7f),  Hyper_L (0x80)
mod5        Mode_switch (0x5d),  ISO_Level3_Shift (0x7c)

Looks like mod3 is free. It's probably a good idea to record your default keymap at this point

$ xmodmap > ~/.xmodmap-original-mods
$ xmodmap -pke > ~/.xmodmap-original-keys
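
If you would rather not eyeball the modifier table, the check for a free modifier can be sketched in a few lines of Python. This is only an illustration: the function name free_modifiers and the SAMPLE text (copied from the output above) are my own, and it assumes the table layout shown here, with the modifier name as the first word on each line.

```python
# Sketch: find modifiers with no keys bound, given the text printed by
# a bare `xmodmap` call. SAMPLE is the output shown above; the function
# name and the parsing approach are illustrative assumptions.
SAMPLE = """\
xmodmap:  up to 3 keys per modifier, (keycodes in parentheses):

shift       Shift_L (0x32),  Shift_R (0x3e)
lock        Caps_Lock (0x42)
control     Control_L (0x25),  Control_R (0x6d)
mod1        Alt_L (0x40),  Alt_L (0x7d),  Meta_L (0x9c)
mod2        Num_Lock (0x4d)
mod3
mod4        Super_L (0x7f),  Hyper_L (0x80)
mod5        Mode_switch (0x5d),  ISO_Level3_Shift (0x7c)
"""

MODIFIERS = ("shift", "lock", "control",
             "mod1", "mod2", "mod3", "mod4", "mod5")


def free_modifiers(table):
    """Return the modifier names that have no keys listed after them."""
    free = []
    for line in table.splitlines():
        fields = line.split()
        # A free modifier is a line holding only the modifier name.
        if len(fields) == 1 and fields[0] in MODIFIERS:
            free.append(fields[0])
    return free


print(free_modifiers(SAMPLE))
```

To run it against your own setup, feed it the text saved in ~/.xmodmap-original-mods instead of SAMPLE.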

Now create your remapping, changing any definitions you want. My current math-leaning bindings are in ~/.xmodmap:

! .xmodmap
! Following http://www.in-ulm.de/~mascheck/X11/input8bit.html
!
keycode 113 = Mode_switch
clear mod3
add mod3 = Mode_switch

! U208N is subscript N
keycode  10 = 1 exclam U2081 onehalf
keycode  11 = 2 at U2082 twosuperior
keycode  12 = 3 numbersign U2083 threesuperior
keycode  13 = 4 dollar U2084 foursuperior
keycode  14 = 5 percent
! U207B is superscript minus
! U00B9 is superscript one
keycode  15 = 6 asciicircum U207B U00B9
keycode  16 = 7 ampersand
! U2219 is the bullet operator
keycode  17 = 8 asterisk infinity U2219
keycode  18 = 9 parenleft
keycode  19 = 0 parenright emptyset
! U2213 is minus-or-plus
keycode  20 = minus underscore notsign U2213
keycode  21 = equal plus notequal plusminus
! U221A is the square root sign (radical)
! U211A is the set of all rationals
keycode  24 = q Q radical U211A
! U1E84 is LATIN CAPITAL LETTER W WITH DIAERESIS
keycode  25 = w W Greek_omega U1E84
! U2203 is there exists
keycode  26 = e E Greek_epsilon U2203
! U211D is the set of real numbers
keycode  27 = r R Greek_rho U211D
keycode  28 = t T Greek_tau Greek_theta
keycode  29 = y Y Greek_psi Greek_PSI
keycode  30 = u U Greek_eta
! U222C is a double integral
keycode  31 = i I integral U222C
keycode  32 = o O elementof Greek_OMEGA
keycode  33 = p P Greek_pi Greek_PI
keycode  34 = bracketleft braceleft leftsinglequotemark leftdoublequotemark
keycode  35 = bracketright braceright rightsinglequotemark rightdoublequotemark
! U2200 is for all
keycode  38 = a A Greek_alpha U2200
keycode  39 = s S Greek_sigma Greek_SIGMA
! U2202 is partial differential
! (for some reason the keysym partialderivative wasn't working on my eeepc).
keycode  40 = d D U2202 Greek_delta
keycode  41 = f F function Greek_phi
keycode  42 = g G Greek_gamma Greek_GAMMA
keycode  43 = h H degree
keycode  44 = j J
keycode  45 = k K Greek_kappa
keycode  46 = l L Greek_lambda
! U2026 is an ellipsis
keycode  47 = semicolon colon U2026
keycode  48 = apostrophe quotedbl
! U223C is the tilde operator
keycode  49 = grave asciitilde U223C approximate
keycode  50 = Shift_L
! U2261 is identical to (three-bar equals)
keycode  51 = backslash bar U2261
! U2115 is the set of all natural numbers, U2124 is the set of integers
keycode  52 = z Z U2115 U2124
keycode  53 = x X Greek_chi Greek_xi
keycode  54 = c C Greek_chi
keycode  55 = v V Greek_nu
keycode  56 = b B Greek_beta
keycode  57 = n N Greek_DELTA nabla
keycode  58 = m M mu
! includedin = 'subset of'
! includes = 'superset of'
keycode  59 = comma less includedin guillemotleft
keycode  60 = period greater includes guillemotright
keycode  61 = slash question rightarrow questiondown

The names that are not Unicode code points (UXXXX) are X keysyms, which should be interchangeable with the code points as far as xmodmap is concerned. I found the supported keysyms in /usr/include/X11/keysymdef.h on my system. The keysyms following the equals sign decide what character is entered when you press a given key under different conditions. See ‘man xmodmap’ for details, but briefly: the first column is the key by itself, the second is with Shift, the third is with Mode_switch (here Alt_R), and the fourth is with Shift and Mode_switch together. So pressing the *,* key by itself prints *,*, pressing *Shift + ,* prints *<*, pressing *Alt_R + ,* prints *⊂*, and pressing *Shift + Alt_R + ,* prints *«*.
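
To make the column layout concrete, here is a small Python sketch that splits a keycode line into its four conditions and converts between characters and the UXXXX names used above. The helper names (parse_keycode_line, unicode_keysym, keysym_to_char) and the COLUMNS labels are made up for this example; only the first four columns are handled, and named keysyms such as includedin are not looked up.

```python
import re

# Labels for the first four keysym columns of a `keycode` line
# (illustrative names, following the description in the text).
COLUMNS = ("plain", "Shift", "Mode_switch", "Shift+Mode_switch")


def parse_keycode_line(line):
    """Split 'keycode N = a b c d' into (N, {condition: keysym})."""
    left, _, right = line.partition("=")
    words = left.split()
    if len(words) != 2 or words[0] != "keycode":
        raise ValueError("not a keycode line: %r" % line)
    # zip stops at the shorter sequence, so short lines are fine.
    return int(words[1]), dict(zip(COLUMNS, right.split()))


def unicode_keysym(char):
    """Return the 'UXXXX' name xmodmap accepts for one character."""
    return "U{:04X}".format(ord(char))


def keysym_to_char(name):
    """Decode a 'UXXXX' name; return None for named keysyms."""
    if re.fullmatch(r"U[0-9A-Fa-f]{4,6}", name):
        return chr(int(name[1:], 16))
    return None


code, bindings = parse_keycode_line(
    "keycode  59 = comma less includedin guillemotleft")
print(code, bindings["Mode_switch"])
print(unicode_keysym("∂"), keysym_to_char("U2081"))
```

Something like unicode_keysym is handy when gucharmap shows you a character but you need the name to put in ~/.xmodmap.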

Once you've got your keymap set up to your liking, install it (you must be in an X environment to do this) with

$ xmodmap ~/.xmodmap

⊂∞λ β∃∀ℕσ!

LaTeX

Once you've put all this hard work into perfecting your Unicode shortcuts, typing out \rightarrow every time you want a simple → can be a pain. Happily, LaTeX can understand all your UTF-8-encoded Unicode if you include

\usepackage{ucs}
\usepackage[utf8x]{inputenc}

in your preamble. You will still need LaTeX commands for fancy things like AMS'

\xrightarrow[under]{over}

If you use Greek letters, but don't use a font defining them (e.g. you just want Greek letters for math), you need to call ucs with

\usepackage[mathletters]{ucs}

Otherwise you'll get errors like

! Undefined control sequence.                                                   
\u-default-945 #1->\textalpha                                                   

See the ucs.sty manual for details.

Emacs

You can tell Emacs that a file is encoded in UTF-8 by including

-*- coding: utf-8 -*-

in the comments at the beginning of your file.
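
For a feel of what such a cookie looks like inside a real file, here is a rough Python sketch that pulls the declared coding out of the first two lines of some text. The regular expression and the function name declared_coding are my own simplification, not how Emacs itself parses the line.

```python
import re

# Simplified pattern for a '-*- ... coding: NAME ... -*-' cookie.
# This is an illustrative approximation, not Emacs's own parser.
COOKIE = re.compile(r"-\*-.*?coding:\s*([-\w.]+).*?-\*-")


def declared_coding(text):
    """Return the coding named in the first two lines, or None."""
    for line in text.splitlines()[:2]:
        match = COOKIE.search(line)
        if match:
            return match.group(1)
    return None


print(declared_coding("# -*- coding: utf-8 -*-\nx = 1\n"))
print(declared_coding("% -*- mode: latex; coding: utf-8 -*-\n"))
print(declared_coding("no cookie here\n"))
```

The comment character in front of the cookie depends on the file type (# for shell, % for LaTeX, and so on).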

I had some trouble with some Emacs 21 installations not recognizing UTF-8 input in no-window mode (emacs -nw). The problem was that Emacs wasn't listening for encoded input. If you suspect this is happening to you, diagnose with `C-h C <RET>`, which shows the coding systems currently in use. You can set the input coding temporarily with

C-x <RET> k coding <RET>

and the output coding with

C-x <RET> t coding <RET>

For UTF-8 replace “coding” with “utf-8”. Set the input coding permanently in your ~/.emacs file with, for example,

(set-keyboard-coding-system 'utf-8)

See the manual for more details.

Emacs has lots of other Unicode goodies too. Check 'em out.
