nada

[Python] Loading a JSON File :: pd.read_json( )

Python/Pandas

ds-nada 2023. 8. 9. 00:06

Reference

Pandas In Action

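JSON (JavaScript Object Notation)
- A format for storing and transmitting text data
- Made up of key-value pairs
- Similar to a Python dictionary object
Linter
- Places each key-value pair on its own line so that a JSON response is displayed in a readable form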
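The similarity to a Python dictionary can be checked with the standard library `json` module. This is a minimal sketch: the JSON string is a shortened, hand-written record, and `json.dumps(..., indent=4)` is used here only to imitate the one-pair-per-line layout a linter produces.

```python
import json

# Shortened, hand-written record used only for illustration
raw = '{"year": "2019", "category": "chemistry", "laureates": [{"id": "976", "firstname": "John"}]}'

# json.loads parses JSON text: objects become dicts, arrays become lists
prize = json.loads(raw)
category = prize["category"]
first_laureate = prize["laureates"][0]["firstname"]

# indent=4 re-serializes the data with one key-value pair per line
pretty = json.dumps(prize, indent=4)
print(pretty)
```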

< Loading a JSON File >

pd.read_json(
    path_or_buf,
)

Option
- path_or_buf : path and name of the file

import pandas as pd

nobel = pd.read_json("./Data/nobel.json")
nobel

	prizes
0	{'year': '2019', 'category': 'chemistry', 'lau...
1	{'year': '2019', 'category': 'economics', 'lau...
2	{'year': '2019', 'category': 'literature', 'la...
3	{'year': '2019', 'category': 'peace', 'laureat...
4	{'year': '2019', 'category': 'physics', 'overa...
...	...
641	{'year': '1901', 'category': 'chemistry', 'lau...
642	{'year': '1901', 'category': 'literature', 'la...
643	{'year': '1901', 'category': 'peace', 'laureat...
644	{'year': '1901', 'category': 'physics', 'laure...
645	{'year': '1901', 'category': 'medicine', 'laur...

646 rows × 1 columns

< Result >
→ Each value in prizes is a nested dictionary
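The same result can be reproduced without the file on disk, since `path_or_buf` also accepts a file-like object. The sketch below assumes a document shaped like `nobel.json` (a single top-level `prizes` key holding a list of objects); the two entries are a cut-down miniature, not the real file.

```python
import json
from io import StringIO

import pandas as pd

# Miniature document assumed to mirror the shape of nobel.json
document = {
    "prizes": [
        {"year": "2019", "category": "chemistry",
         "laureates": [{"id": "976", "firstname": "John", "surname": "Goodenough"}]},
        {"year": "2019", "category": "economics",
         "laureates": [{"id": "982", "firstname": "Abhijit", "surname": "Banerjee"}]},
    ]
}

# Wrap the JSON text in StringIO so read_json treats it as a file-like object
sample = pd.read_json(StringIO(json.dumps(document)))

# One column named 'prizes'; every cell is still a Python dictionary
print(sample.shape)
print(type(sample["prizes"][0]))
```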

< Flattening or Normalizing >

pd.json_normalize(
    data,
    record_path = None,
    meta = None,
)

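The process of converting nested data records into a single one-dimensional list
Option
- data : deserialized JSON object(s), i.e. a dictionary or a list of dictionaries
- record_path : path within each object to its list of records
- meta : fields to attach to each record in the resulting table as metadata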
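Without `record_path`, `pd.json_normalize` also flattens nested dictionaries into dotted column names. A small sketch of that behavior follows; the nested `name` dictionary is a hypothetical shape invented for illustration and does not appear in the Nobel data.

```python
import pandas as pd

# Hypothetical record: the nested "name" dictionary is invented for illustration
record = {"id": "976", "name": {"first": "John", "last": "Goodenough"}}

# Nested keys are joined to their parent key with "." by default
flat = pd.json_normalize(record)

# sep changes the joining character
flat_underscore = pd.json_normalize(record, sep="_")

print(sorted(flat.columns))
print(sorted(flat_underscore.columns))
```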

# Extract the top-level keys (`year`, `category`, `laureates`) of the first dictionary in `prizes`
pd.json_normalize(data = nobel['prizes'][0])

	year	category	laureates
0	2019	chemistry	[{'id': '976', 'firstname': 'John', 'surname':...

< Result >
→ laureates still holds nested dictionaries

# Normalize the nested `laureates` records
pd.json_normalize(
    data = nobel['prizes'][0],
    record_path = 'laureates'
)

	id	firstname	surname	motivation	share
0	976	John	Goodenough	"for the development of lithium-ion batteries"	3
1	977	M. Stanley	Whittingham	"for the development of lithium-ion batteries"	3
2	978	Akira	Yoshino	"for the development of lithium-ion batteries"	3

< Result >
→ The records expand into new columns, but the original year and category columns are lost

# Keep the top-level key-value pairs (`year`, `category`)
pd.json_normalize(
    data = nobel['prizes'][0],
    record_path = 'laureates',
    meta = ['year', 'category']
)

	id	firstname	surname	motivation	share	year	category
0	976	John	Goodenough	"for the development of lithium-ion batteries"	3	2019	chemistry
1	977	M. Stanley	Whittingham	"for the development of lithium-ion batteries"	3	2019	chemistry
2	978	Akira	Yoshino	"for the development of lithium-ion batteries"	3	2019	chemistry

# Error
pd.json_normalize(
    data = nobel['prizes'],
    record_path = 'laureates',
    meta = ['year', 'category']
)

Result

    ---------------------------------------------------------------------------
  KeyError                                  Traceback (most recent call last)
  File ~/anaconda3/envs/torch/lib/python3.11/site-packages/pandas/io/json/_normalize.py:399, in _json_normalize.<locals>._pull_field(js, spec, extract_record)
      398     else:
  --> 399         result = result[spec]
      400 except KeyError as e:

  KeyError: 'laureates'

  The above exception was the direct cause of the following exception:

  KeyError                                  Traceback (most recent call last)
  Cell In[6], line 2
        1 # Error
  ----> 2 pd.json_normalize(
        3     data = nobel['prizes'],
        4     record_path = 'laureates',
        5     meta = ['year', 'category']
        6 )

  File ~/anaconda3/envs/torch/lib/python3.11/site-packages/pandas/io/json/_normalize.py:518, in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
      515                 meta_vals[key].append(meta_val)
      516             records.extend(recs)
  --> 518 _recursive_extract(data, record_path, {}, level=0)
      520 result = DataFrame(records)
      522 if record_prefix is not None:

  File ~/anaconda3/envs/torch/lib/python3.11/site-packages/pandas/io/json/_normalize.py:500, in _json_normalize.<locals>._recursive_extract(data, path, seen_meta, level)
      498 else:
      499     for obj in data:
  --> 500         recs = _pull_records(obj, path[0])
      501         recs = [
      502             nested_to_record(r, sep=sep, max_level=max_level)
      503             if isinstance(r, dict)
      504             else r
      505             for r in recs
      506         ]
      508         # For repeating the metadata later

  File ~/anaconda3/envs/torch/lib/python3.11/site-packages/pandas/io/json/_normalize.py:422, in _json_normalize.<locals>._pull_records(js, spec)
      416 def _pull_records(js: dict[str, Any], spec: list | str) -> list:
      417     """
      418     Internal function to pull field for records, and similar to
      419     _pull_field, but require to return list. And will raise error
      420     if has non iterable value.
      421     """
  --> 422     result = _pull_field(js, spec, extract_record=True)
      424     # GH 31507 GH 30145, GH 26284 if result is not list, raise TypeError if not
      425     # null, otherwise return an empty list
      426     if not isinstance(result, list):

  File ~/anaconda3/envs/torch/lib/python3.11/site-packages/pandas/io/json/_normalize.py:402, in _json_normalize.<locals>._pull_field(js, spec, extract_record)
      400 except KeyError as e:
      401     if extract_record:
  --> 402         raise KeyError(
      403             f"Key {e} not found. If specifying a record_path, all elements of "
      404             f"data should have the path."
      405         ) from e
      406     elif errors == "ignore":
      407         return np.nan

  KeyError: "Key 'laureates' not found. If specifying a record_path, all elements of data should have the path."

< Result >
→ Error : some of the dictionaries in the prizes Series have no laureates key
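One way to confirm this diagnosis is to test each dictionary for the key before normalizing. The sketch below uses a two-entry stand-in for `nobel['prizes']`; the second entry is a made-up placeholder, since the source does not show which real entries lack the key.

```python
import pandas as pd

# Stand-in for nobel['prizes']; the second entry is a made-up placeholder
prizes = pd.Series([
    {"year": "2019", "category": "chemistry", "laureates": [{"id": "976"}]},
    {"year": "0000", "category": "placeholder"},
])

# True where the dictionary has the key that record_path needs
has_laureates = prizes.apply(lambda entry: "laureates" in entry)

# Entries that would trigger the KeyError
missing = prizes[~has_laureates]
print(len(missing))
```

On the real data, `nobel['prizes']` would take the place of the stand-in Series.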

dictionary.setdefault(
    key,
    value
)

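Assigns a default value for a dictionary key
If the key is missing, the key-value pair is added and the default value is returned
If the key already exists, the dictionary is left unchanged and the existing value is returned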
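Both cases can be seen on plain dictionaries. A minimal sketch with two hand-written dictionaries:

```python
# Key already present: the dictionary is unchanged and the existing value is returned
with_key = {"laureates": [{"id": "976"}]}
existing = with_key.setdefault("laureates", [])

# Key missing: the pair is inserted and the default is returned
without_key = {"year": "2019"}
inserted = without_key.setdefault("laureates", [])

print(existing)
print(without_key)
```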

def add_laureates_key(entry):
    entry.setdefault('laureates', [])

# This mutates the dictionaries inside prizes in place, so the existing Series does not need to be overwritten
nobel['prizes'].apply(add_laureates_key)

0      None
1      None
2      None
3      None
4      None
       ... 
641    None
642    None
643    None
644    None
645    None
Name: prizes, Length: 646, dtype: object

# Normalize the whole prizes Series now that every entry has a laureates key
winners = pd.json_normalize(
    data = nobel['prizes'],
    record_path = 'laureates',
    meta = ['year', 'category']
)
winners

	id	firstname	surname	motivation	share	year	category
0	976	John	Goodenough	"for the development of lithium-ion batteries"	3	2019	chemistry
1	977	M. Stanley	Whittingham	"for the development of lithium-ion batteries"	3	2019	chemistry
2	978	Akira	Yoshino	"for the development of lithium-ion batteries"	3	2019	chemistry
3	982	Abhijit	Banerjee	"for their experimental approach to alleviatin...	3	2019	economics
4	983	Esther	Duflo	"for their experimental approach to alleviatin...	3	2019	economics
...	...	...	...	...	...	...	...
945	569	Sully	Prudhomme	"in special recognition of his poetic composit...	1	1901	literature
946	462	Henry	Dunant	"for his humanitarian efforts to help wounded ...	2	1901	peace
947	463	Frédéric	Passy	"for his lifelong work for international peace...	2	1901	peace
948	1	Wilhelm Conrad	Röntgen	"in recognition of the extraordinary services ...	1	1901	physics
949	293	Emil	von Behring	"for his work on serum therapy, especially its...	1	1901	medicine

950 rows × 7 columns
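If mutating the original dictionaries is not wanted, the default can be supplied while building new dictionaries instead. This is an alternative sketch, not the approach used above; the stand-in Series is hypothetical and its second entry is a made-up placeholder.

```python
import pandas as pd

# Stand-in for nobel['prizes']; the second entry is a made-up placeholder
prizes = pd.Series([
    {"year": "2019", "category": "chemistry",
     "laureates": [
         {"id": "976", "firstname": "John", "surname": "Goodenough"},
         {"id": "977", "firstname": "M. Stanley", "surname": "Whittingham"},
     ]},
    {"year": "0000", "category": "placeholder"},
])

# Build new dictionaries; dict.get supplies an empty list when the key is missing
records = [{**entry, "laureates": entry.get("laureates", [])} for entry in prizes]

winners = pd.json_normalize(
    records,
    record_path="laureates",
    meta=["year", "category"],
)
print(winners)
```

An entry with an empty laureates list contributes no rows, and the original dictionaries in the Series are left as they were.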
