Frequency List?

Deasbaid air cùrsaichean chànain amsaa. / Anything about language courses etc.
Zwalla28
Posts: 54
Joined: Thu Feb 14, 2013 1:45 am
Language Level: Tha mo ionnsachadh gun deireadh
Corrections: Please correct my grammar
Location: Canada

Frequency List?

Post by Zwalla28 » Thu Feb 28, 2013 12:09 am

I'm not sure if this is the right section, but it seemed most fitting so I went with it. I apologize if I've erred.

Anyways, do any of you guys know where I might find a frequency list? I've begun memorizing vocab, and it would be nice to know the most frequently used words in Gaelic so that I can be sure to know them. I can find lists like these for many languages, but I've had no luck finding a Gaelic one thus far.

Any input is much appreciated!


Bhiodh gaol agam oirbh gu bràth, ma cheartaicheadh sibh na mearachdan sa' phost os cionn!
Sgeul aigeantach mòr ri linn,
Gu'm bi neart, agus ceart, mar ri treòir,
Do'n fhear sheasas còir an rìgh.

Seonaidh
Posts: 1486
Joined: Fri Apr 04, 2008 8:00 pm
Corrections: I'm fine either way
Location: Faisg air Gleann Rathais

Re: Frequency List?

Post by Seonaidh » Thu Feb 28, 2013 1:21 am

Do such things actually help significantly with other languages?

I must confess, I would only find such lists useful if I was doing some sort of statistical linguistic analysis. When it comes to learning languages, I tend to trust that what I'm learning is reasonably relevant.

The most frequently occurring Gaelic word is probably the word "a", in all its variations (well, with or without a following apostrophe). Is this helpful?

Zwalla28
Posts: 54
Joined: Thu Feb 14, 2013 1:45 am
Language Level: Tha mo ionnsachadh gun deireadh
Corrections: Please correct my grammar
Location: Canada

Re: Frequency List?

Post by Zwalla28 » Thu Feb 28, 2013 2:26 am

No, I suppose it's not.

The idea behind it, though, is that the one or two thousand most common words in a language will make up ~75% of everything said, read, and written. This is what I've read, anyways. I've no real idea where I should start in the way of vocabulary, and this seemed to make sense.
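
To make the coverage idea concrete, here is a minimal Python sketch that measures what share of a text's running words is accounted for by its N most frequent words. The function name `coverage` and the toy sentence are made up for illustration; this is not real corpus data and says nothing about the actual figure for Gaelic.

```python
from collections import Counter


def coverage(counts, top_n):
    """Share of all tokens accounted for by the top_n most frequent words."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(top_n))
    return top / total


# Hypothetical toy text, invented for this example (not corpus data).
sample = "tha an cù mòr agus tha an cat beag agus tha mi sgìth"
counts = Counter(sample.split())

# Share of the 13 running words covered by the 3 most frequent words.
share = coverage(counts, 3)
print(f"{share:.1%}")
```

With a real word list and its counts, the same function would show how quickly coverage grows as N goes from a hundred to a couple of thousand words.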

Níall Beag
Posts: 1314
Joined: Sun Sep 23, 2007 6:58 pm
Language Level: Chan eil gaidhlig agam agus cha bhi
Location: Dún Èideann, Alba

Re: Frequency List?

Post by Níall Beag » Thu Feb 28, 2013 9:46 pm

There's not been a large-scale corpus study of Gaelic, and there would be certain issues in trying to compile such a list, because the question that arises in any such exercise is "what is a word?". As the correspondence between verbs and their verbal nouns is relatively unpredictable (unlike inflections in most languages), you've probably got to count them as different words. And what about dialectal variation: are smaointinn, smaoineachadh and smoineachainn one word or three?

There may be a frequency list in William Lamb's book "Scottish Gaelic Speech and Writing: Register Variation in an Endangered Language", but my copy's a thousand miles away, so you'd have to ask someone else to check for you.

Seonaidh,
Gonnae stop being such a git to the new members? You have your way of doing things, which some of us disagree with. Disagree -- go ahead and disagree -- but don't be so bloody dismissive all the time.

Thrissel
Posts: 647
Joined: Wed Jun 24, 2009 10:33 pm
Language Level: eadar-mheadhanach
Location: Glaschu

Re: Frequency List?

Post by Thrissel » Fri Mar 01, 2013 6:28 pm

Níall Beag wrote:Seonaidh,
Gonnae stop being such a git to the new members? You have your way of doing things, which some of us disagree with. Disagree -- go ahead and disagree -- but don't be so bloody dismissive all the time.

Chan eil mi gad thuigsinn idir a charaid (I don't understand you at all, my friend). Seonaidh should certainly never become a member of any reception committee, but what was "dismissive" about this particular post of his? The way I see it, he just (politely) expressed a suspicion that Zwalla28 might be wasting time with a dubious method of learning and gave an example which seemed to him to illustrate his point well. Perhaps you had wanted to say this for some time and this was just the last straw, but in that case you picked a straw so thin as to be virtually nonexistent.

Níall Beag
Posts: 1314
Joined: Sun Sep 23, 2007 6:58 pm
Language Level: Chan eil gaidhlig agam agus cha bhi
Location: Dún Èideann, Alba

Re: Frequency List?

Post by Níall Beag » Mon Mar 04, 2013 12:38 pm

Yeah, sorry... it was a bad day. I was having a little personal crisis about the near future, and anything, anywhere could have set me off at that point.

GunChleoc
Rianaire
Posts: 4347
Joined: Mon Sep 17, 2007 11:26 am
Language Level: Mion-chùiseach
Corrections: Please correct my grammar
Location: Dùthaich mo chridhe

Re: Frequency List?

Post by GunChleoc » Tue Mar 12, 2013 3:41 pm

Tha mi 'n dòchas gun tig piseach air cùisean dhut, a Neill. (I hope things improve for you, Niall.)

Personally, I found that memorising words only helped me if I also got a chance to use them. So, why not create your own little book with words you come across while you are learning and that you find useful? I recommend writing this by hand, since the act of writing also helps with memorising.

You could also trawl textbooks for word lists. They won't necessarily all be the most frequent words, but many of them will be frequent enough.

akerbeltz
Rianaire
Posts: 1668
Joined: Mon Nov 17, 2008 2:26 am
Language Level: Barail am broinn baraille
Corrections: Please don't analyse my Gaelic
Location: Glaschu

Re: Frequency List?

Post by akerbeltz » Sat Jun 01, 2013 9:54 pm

There is a very short frequency list in Will Lamb's book, based on a 60,000 word corpus.

a
an
a'
air
tha
e
na
agus
bha
ann
's
gu
iad
ach
am
robh
aig
sin
gun
do
is
cha(n)
nach
mi
i
eil
mar
bheil
ri
seo
anns
de
chaidh
nan
ag
bhith
eile
le
fhèin
mu
ris
math
bhiodh
b'
as
airson
esan
dol
aca
thu
gum
ràdh
bho
cho
sinn
nuair
chuir
duine
thuirt
no
f(h)ios
aon
dhan

leis
a-mach
dha
cuideachd
fear
thoirt
aige

bu
ars
'ga
chur
gur
àite
taobh
idir
eadar
fhuair
co-dhiù
ma
bliadhna
nam
uair
mòr
tighinn
obair
agad
rinn
dìreach
thàinig
rud
far
tu
bidh
ùr
suas

Note that this corpus data merges some roots and separates others: for example, a merges the possessive, the relative particle etc., whereas thu and tu are listed separately although they obviously should be the same root.

Each to their own: what one person finds a useful learning aid riles another. Whatever works for you :)

I may be able to come up with something more useful but that will take a moment, gimme 5.

akerbeltz
Rianaire
Posts: 1668
Joined: Mon Nov 17, 2008 2:26 am
Language Level: Barail am broinn baraille
Corrections: Please don't analyse my Gaelic
Location: Glaschu

Re: Frequency List?

Post by akerbeltz » Sun Jun 02, 2013 12:57 am

Ok, here's what I've done. When we did the predictive texting tool for Gaelic, we took all the words in the spellchecker, compared them against a corpus of crawled Gaelic texts on the web, and discarded those which did not appear at all and some which appeared only very infrequently. We didn't use the web corpus per se because that was full of typos. I've made some modifications, but the result is a list of the 10,000 or so most common Gaelic items, ranked relatively. It looks something like this:

36451 a
20934 an
13882 air
12409 na
10009 tha
9602 agus
7180 ann
5086 gu
4434 e
4235 airson
4004 o
3841 am
3408 seo
3133 le
3014 aig
2757 gàidhlig
2720 is
2621 bha
2439 do
2354 mu
2353 ri
2280 mar
2265 de
2137 mi
2034 iad
2034 nan
1961 anns
1782 bho
1737 eile
1704 bhith
1654 bheil
1594 sin
1559 chaidh
1472 sinn
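
The filtering described above could be sketched roughly like this in Python. It is only an illustration under assumptions: the function name `rank_known_words`, the `min_count` cutoff and the tiny inputs are all made up here, and the actual cutoffs and later modifications behind the real list are not stated.

```python
from collections import Counter


def rank_known_words(dictionary_words, corpus_tokens, min_count=2):
    """Count corpus tokens, keeping only dictionary words seen at least min_count times."""
    known = {word.lower() for word in dictionary_words}
    counts = Counter(token.lower() for token in corpus_tokens)
    kept = [(word, n) for word, n in counts.items() if word in known and n >= min_count]
    # Most frequent first; ties are broken alphabetically for a stable order.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical stand-ins: a tiny "spellchecker" list and a tiny "web" text
# that contains a typo ("agsu") and words missing from the list.
dictionary_words = ["a", "an", "air", "tha", "agus", "deach"]
corpus_tokens = "tha an cù air an rathad agus tha an cat air a bhith ann agsu".split()

ranked = rank_known_words(dictionary_words, corpus_tokens, min_count=2)
for word, n in ranked:
    print(n, word)
```

Starting from the spellchecker list rather than from the raw crawl is what keeps typos such as "agsu" out of the result, while the cutoff drops dictionary words that barely occur.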

Things to bear in mind:
  1. this is based on *written* Gaelic on the web. As a result, it's skewed the same way as ALL written corpora that are not based on transcriptions of spoken language. So pàrlamaid occurs more often than it would in spoken Gaelic.
  2. the same root may be represented more than once, i.e. both gàidhlig and ghàidhlig appear. That has both advantages and disadvantages. The disadvantage with nouns is that lenition skews the picture, but I simply didn't have time to adjust that. The good thing is that for verbs, it tells you which forms are more common than others, for example that chaidh is more common than deach.
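
For anyone who wants to adjust for lenition themselves, one naive approach is sketched below in Python. It is an assumption-laden illustration, not what was done for this list: the names `delenite` and `merge_lenited` are invented, the counts are made up, and the rule (drop an h after an initial b, c, d, f, g, m, p, s or t, but only when the unlenited form is itself in the list) can still merge unrelated words.

```python
from collections import Counter

# Initial consonants whose lenition is written with a following "h".
LENITABLE = set("bcdfgmpst")


def delenite(word):
    """Naively strip the 'h' that marks lenition after an initial consonant."""
    if len(word) > 2 and word[0] in LENITABLE and word[1] == "h":
        return word[0] + word[2:]
    return word


def merge_lenited(counts):
    """Fold a lenited form into its base only when the base is itself attested."""
    merged = Counter()
    for word, n in counts.items():
        base = delenite(word)
        merged[base if base in counts else word] += n
    return merged


# Invented counts, for illustration only.
counts = Counter({"gàidhlig": 30, "ghàidhlig": 10, "chaidh": 15, "deach": 5, "tha": 40})
merged = merge_lenited(counts)
print(merged.most_common())
```

Requiring the base form to be attested keeps forms like tha and chaidh from being turned into non-words, and it leaves chaidh and deach as separate entries, so the verb-form information mentioned in point 2 is preserved.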

You can get it here; you'll need LibreOffice (free) or a recent version of MS Office to open it.

Hope this helps a little!

Kennedyflyting
Posts: 4
Joined: Thu Mar 02, 2017 10:17 am
Language Level: Beginner, some vocabulary knowledge
Corrections: Please correct my grammar
Location: Greece

Frequency List?

Post by Kennedyflyting » Thu Mar 02, 2017 11:13 am

Frequency lists can be useful for identifying key vocabulary and making sure you don't learn the equivalent of postillion before you learn the equivalent of postman. Certainly some other languages I have learned or studied, like Russian, are well provided with them and they have been useful for me.